A Markov Decision Process (MDP)-Based Q-Learning Framework for Optimizing Bowling Strategies in T20 Cricket

Authors

  • Abdurrahman Sabir, Muhammad Irfan Ud Din, Syed Habib Shah, Gohar Ayu, Mujeeb, Shahid Iqbal, Qamruz Zaman

Abstract

Optimizing bowling strategies in T20 cricket is a complex sequential decision-making problem because the game is dynamic, stochastic, and context-dependent. Traditional approaches rely on expert judgment, intuition, or aggregated performance statistics, none of which capture ball-by-ball tactical adaptability or phase-specific strategic requirements. This study develops a Markov Decision Process (MDP)-based Q-learning framework to optimize bowling strategies in T20 cricket using real ball-by-ball data. The framework models each delivery as a state transition. States are defined by match phase (Powerplay, Middle overs, Death overs), delivery length, and bowling line; actions are bowling variations, including yorkers, slower balls, bouncers, and spin deliveries. The Q-learning algorithm learns optimal policies directly from historical match data without requiring pre-specified transition probabilities, and a reward function based on runs conceded and wickets taken guides policy optimization.
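As an illustration of the kind of update described above, the following is a minimal sketch of standard tabular Q-learning applied to logged deliveries. It is not the authors' implementation: the abstract does not report the reward weights, learning rate, or discount factor, so `ALPHA`, `GAMMA`, `WICKET_BONUS`, the `reward` function, and the toy transitions are all hypothetical values chosen only to make the example run.

```python
from collections import defaultdict

ALPHA = 0.1          # learning rate (assumed, not from the paper)
GAMMA = 0.9          # discount factor (assumed, not from the paper)
WICKET_BONUS = 10.0  # hypothetical reward for taking a wicket


def reward(runs_conceded, wicket):
    """Hypothetical reward: penalise runs conceded, reward wickets."""
    return -float(runs_conceded) + (WICKET_BONUS if wicket else 0.0)


def q_learning(transitions, actions, epochs=50):
    """Tabular Q-learning over logged (state, action, runs, wicket, next_state) tuples.

    No transition probabilities are needed: each logged delivery is treated
    as one sampled transition of the MDP.
    """
    q = defaultdict(float)
    for _ in range(epochs):
        for state, action, runs, wicket, next_state in transitions:
            # A next_state of None marks the end of a logged sequence.
            if next_state is None:
                best_next = 0.0
            else:
                best_next = max(q[(next_state, a)] for a in actions)
            target = reward(runs, wicket) + GAMMA * best_next
            q[(state, action)] += ALPHA * (target - q[(state, action)])
    return q


def greedy_policy(q, states, actions):
    """Pick the highest-valued action in each state."""
    return {s: max(actions, key=lambda a: q[(s, a)]) for s in states}


# Toy data: a state is (phase, length, line); the values are illustrative only.
ACTIONS = ["yorker", "slower_ball", "bouncer"]
s_death = ("death", "full", "off_stump")
toy = [
    (s_death, "yorker", 0, True, None),    # dot ball with a wicket
    (s_death, "bouncer", 6, False, None),  # hit for six
]
q = q_learning(toy, ACTIONS)
policy = greedy_policy(q, [s_death], ACTIONS)
```

On this toy log the greedy policy prefers the yorker in the sampled death-overs state, because it is the only action with a positive learned value. With real ball-by-ball data, `next_state` would be the state of the following delivery rather than `None`.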

The methodology comprises four stages: collection and preprocessing of ball-by-ball records for nineteen T20I bowlers from January to December 2024; state-action space design with 105 possible states; MDP formulation with context-aware reward engineering; and Q-learning policy optimization followed by hybrid bowler ranking. Results demonstrate that optimal bowling strategies are highly context-dependent and bowler-specific. In the Powerplay, Nathan Ellis (Mean Q-value = 18.86), Josh Hazlewood (18.01), and Arshdeep Singh (16.87) achieved the highest Mean Q-values through disciplined line-length combinations and swing variations. During the Middle overs, spin bowlers dominated, with Adil Rashid (26.03), Rashid Khan (23.78), and Axar Patel (23.27) recording the highest Mean Q-values. In the Death overs, Rashid Khan (30.33), Gudakesh Motie (30.28), Arshdeep Singh (24.67), and Jofra Archer (21.59) demonstrated exceptional effectiveness through yorkers, slower balls, and variation-based strategies. The hybrid ranking model integrates classical metrics (wickets, economy rate, strike rate, bowling average, dot ball percentage, boundary percentage) with normalized Q-scores using a 65:35 weighting scheme. Arshdeep Singh ranked first overall (0.7844) with the highest classical sum (0.6683) and the maximum Q-score (1.000), followed by Rashid Khan (0.7817) and Gudakesh Motie (0.6935). Adil Rashid benefited most from the inclusion of reinforcement learning (RL), rising from 10th in classical sum to 6th in hybrid rank due to a strong Q-score (0.7805), while Anrich Nortje showed the largest divergence, with a respectable classical sum (0.5105) but a very low Q-score (0.0793), dropping to 16th. The findings confirm that strategic intelligence measured via Q-learning reveals value not captured by traditional statistics, and they support reinforcement learning-based evaluation in cricket analytics.
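The 65:35 weighting can be checked against the figures reported above. The sketch below assumes the hybrid score is a simple weighted sum, with 0.65 applied to the classical sum and 0.35 to the normalized Q-score. The abstract does not state the formula explicitly, but this reading reproduces Arshdeep Singh's reported hybrid score. The function and constant names are hypothetical.

```python
W_CLASSICAL = 0.65  # weight on the classical-metrics sum (inferred from the 65:35 scheme)
W_Q = 0.35          # weight on the normalized Q-score


def hybrid_score(classical_sum, q_score):
    """Weighted combination of classical sum and normalized Q-score (assumed form)."""
    return W_CLASSICAL * classical_sum + W_Q * q_score


# Arshdeep Singh: classical sum 0.6683, Q-score 1.000 (values from the abstract).
score = hybrid_score(0.6683, 1.000)
print(round(score, 4))  # 0.7844, matching the reported hybrid score
```

Worked by hand: 0.65 × 0.6683 + 0.35 × 1.000 = 0.4344 + 0.3500 = 0.7844.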

Published

2026-04-12

How to Cite

Abdurrahman Sabir, Muhammad Irfan Ud Din, Syed Habib Shah, Gohar Ayu, Mujeeb, Shahid Iqbal, Qamruz Zaman. 2026. “A Markov Decision Process (MDP)-Based Q-Learning Framework for Optimizing Bowling Strategies in T20 Cricket”. Metallurgical and Materials Engineering, April, 94-125. https://www.metall-mater-eng.com/index.php/home/article/view/1986.

Section

Research