A Markov Decision Process (MDP)-Based Q-Learning Framework for Optimizing Bowling Strategies in T20 Cricket
Abstract
Optimizing bowling strategies in T20 cricket is a complex sequential decision-making problem because the game is dynamic, stochastic, and context-dependent. Traditional approaches rely on expert judgment, intuition, or aggregated performance statistics, which fail to capture ball-by-ball tactical adaptability and phase-specific strategic requirements. This study develops a Markov Decision Process (MDP)-based Q-learning framework to optimize bowling strategies in T20 cricket using real ball-by-ball data. The framework models each delivery as a state transition. States are defined by match phase (Powerplay, Middle overs, Death overs), delivery length, and bowling line; actions represent bowling variations, including yorkers, slower balls, bouncers, and spin deliveries. The Q-learning algorithm learns optimal policies directly from historical match data without requiring pre-specified transition probabilities, and a reward function based on runs conceded and wickets taken guides policy optimization.
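To make the learning step concrete, the following is a minimal sketch of tabular Q-learning replayed over logged deliveries. It is an illustration under stated assumptions, not the authors' implementation: the reward shape (negative runs plus a fixed wicket bonus), the learning rate, the discount factor, the number of replay passes, the state labels, and the four-item action list are all hypothetical choices made for the example.

```python
from collections import defaultdict

# Hypothetical action labels, loosely following the variations named in the abstract.
ACTIONS = ["yorker", "slower_ball", "bouncer", "spin"]


def reward(runs_conceded, wicket, wicket_bonus=10.0):
    """Assumed reward: penalise runs, add a fixed bonus for a wicket."""
    return -float(runs_conceded) + (wicket_bonus if wicket else 0.0)


def q_learning(transitions, alpha=0.1, gamma=0.9, epochs=50):
    """Replay logged (state, action, runs, wicket, next_state) tuples.

    A state is a (phase, length, line) tuple; next_state is None when the
    logged sequence ends. No transition probabilities are needed: the
    update uses only the observed next state.
    """
    q = defaultdict(float)  # Q-values default to 0.0 for unseen pairs
    for _ in range(epochs):
        for state, action, runs, wicket, next_state in transitions:
            r = reward(runs, wicket)
            if next_state is None:
                best_next = 0.0
            else:
                best_next = max(q[(next_state, a)] for a in ACTIONS)
            td_target = r + gamma * best_next
            q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q


def greedy_policy(q, state):
    """Pick the action with the highest learned Q-value in a state."""
    return max(ACTIONS, key=lambda a: q[(state, a)])


# Toy log: two deliveries from the same (made-up) state.
s = ("death", "full", "off_stump")
log = [
    (s, "yorker", 0, True, None),    # wicket, no runs
    (s, "bouncer", 6, False, None),  # hit for six
]
q_table = q_learning(log)
print(greedy_policy(q_table, s))
```

In this toy log the yorker accumulates a positive Q-value and the bouncer a negative one, so the greedy policy selects the yorker. With real ball-by-ball data the same loop would run over many more transitions and states.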
The methodology comprises four stages: collection and preprocessing of ball-by-ball records for nineteen T20I bowlers from January to December 2024, state-action space design with 105 possible states, MDP formulation with context-aware reward engineering, and Q-learning policy optimization followed by hybrid bowler ranking. Results demonstrate that optimal bowling strategies are highly context-dependent and bowler-specific. In the Powerplay, Nathan Ellis (Mean-Q = 18.86), Josh Hazlewood (18.01), and Arshdeep Singh (16.87) achieved the highest Mean Q-values through disciplined line-length combinations and swing variations. In the Middle overs, spin bowlers dominated, with Adil Rashid (26.03), Rashid Khan (23.78), and Axar Patel (23.27) recording the highest Mean Q-values. In the Death overs, Rashid Khan (30.33), Gudakesh Motie (30.28), Arshdeep Singh (24.67), and Jofra Archer (21.59) demonstrated exceptional effectiveness through yorkers, slower balls, and variation-based strategies. The hybrid ranking model integrates classical metrics (wickets, economy rate, strike rate, bowling average, dot ball percentage, boundary percentage) with normalized Q-scores using a 65:35 weighting scheme (classical metrics to Q-score). Arshdeep Singh ranked first overall (0.7844) with the highest classical sum (0.6683) and the maximum Q-score (1.000), followed by Rashid Khan (0.7817) and Gudakesh Motie (0.6935). Adil Rashid benefited most from RL inclusion, rising from 10th in classical sum to 6th in hybrid rank due to a strong Q-score (0.7805), while Anrich Nortje showed the largest divergence, with a respectable classical sum (0.5105) but a very low Q-score (0.0793), dropping to 16th. The findings confirm that strategic intelligence measured via Q-learning reveals value not captured by traditional statistics, supporting reinforcement learning-based evaluation in cricket analytics.
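The hybrid score can be sketched as a weighted sum. The example below assumes the 65% weight applies to the classical sum and the 35% weight to the normalized Q-score; this reading is consistent with the reported figures, since 0.65 × 0.6683 + 0.35 × 1.000 ≈ 0.7844 matches Arshdeep Singh's hybrid score. The min-max normalization helper is an assumption for illustration, as the abstract says only that Q-scores are normalized.

```python
CLASSICAL_WEIGHT = 0.65  # assumed to apply to the classical sum
Q_WEIGHT = 0.35          # assumed to apply to the normalized Q-score


def hybrid_score(classical_sum, q_score):
    """Weighted combination of the classical sum and the normalized Q-score."""
    return CLASSICAL_WEIGHT * classical_sum + Q_WEIGHT * q_score


def min_max_normalize(values):
    """Assumed normalization: rescale values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# (classical sum, Q-score) pairs as reported in the abstract.
reported = {
    "Arshdeep Singh": (0.6683, 1.000),
    "Anrich Nortje": (0.5105, 0.0793),
}
for name, (classical, q) in reported.items():
    # Arshdeep Singh's value reproduces the reported 0.7844; the value for
    # Anrich Nortje is computed here and is not quoted in the abstract.
    print(name, round(hybrid_score(classical, q), 4))
```

Because the Q-score carries 35% of the weight, a bowler with a moderate classical sum but a Q-score near zero, such as Anrich Nortje, falls well below bowlers with similar classical figures and higher Q-scores.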
License
Copyright (c) 2026 Abdurrahman Sabir, Muhammad Irfan Ud Din, Syed Habib Shah, Gohar Ayu, Mujeeb, Shahid Iqbal, Qamruz Zaman

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal.
- Authors are permitted and encouraged to post their published articles online (e.g., in institutional repositories or on their website, social networks like ResearchGate or Academia), as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).



