Moneybarl: Exploiting pitcher decision-making using reinforcement learning

Gagan Sidhu; Brian Caffo

doi:10.1214/13-AOAS712

Bloomberg School of Public Health

Research output: Contribution to journal › Article › peer-review

5 Scopus citations

Abstract

This manuscript uses machine learning techniques to exploit baseball pitchers' decision making, so-called "Baseball IQ," by modeling the at-bat information, pitch selection and counts, as a Markov Decision Process (MDP). Each state of the MDP models the pitcher's current pitch selection in a Markovian fashion, conditional on the information immediately prior to making the current pitch. This includes the count prior to the previous pitch, his ensuing pitch selection, the batter's ensuing action and the result of the pitch. The necessary Markovian probabilities can be estimated by the relevant observed conditional proportions in MLB pitch-by-pitch game data. These probabilities could be pitcher-specific, using only the data from one pitcher, or general, using the data from a collection of pitchers. Optimal batting strategies against these estimated conditional distributions of pitch selection can be ascertained by Value Iteration. Optimal batting strategies against a pitcher-specific conditional distribution can be contrasted to those calculated from the general conditional distributions associated with a collection of pitchers. In this manuscript, a single season of MLB data is used to calculate the conditional distributions to find optimal pitcher-specific and general (against a collection of pitchers) batting strategies. These strategies are subsequently evaluated by conditional distributions calculated from a different season for the same pitchers. Thus, the batting strategies are conceptually tested via a collection of simulated games, a "mock season," governed by distributions not used to create the strategies. (Simulation is not needed, as exact calculations are available.) 
Instances where the pitcher-specific batting strategy outperforms the general batting strategy suggest that the pitcher is exploitable: knowledge of the conditional distributions of their pitch-making decision process in a different season yielded a strategy that worked better in a new season than a general batting strategy built on a population of pitchers. A permutation-based test of exploitability of the collection of pitchers is given and evaluated under two sets of assumptions. To show the practical utility of the approach, we introduce a spatial component that classifies each pitcher's pitch-types using a batter-parameterized spatial trajectory for each pitch. We found that heuristically labeled "nonelite" batters benefit from using the exploited pitchers' pitcher-specific strategies, whereas (also heuristically labeled) "elite" players do not.
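
The pipeline the abstract describes (estimate conditional pitch-selection proportions, then solve for a batting strategy by Value Iteration) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the state is reduced to the ball-strike count alone (the paper's state also conditions on the previous pitch, the batter's action and the result), the pitch log is invented, the outcome probabilities are made-up numbers, foul balls are ignored, and the reward is simply 1 when the batter reaches base.

```python
from collections import Counter, defaultdict

PITCHES = ("fastball", "curve")

# Assumed outcome model (illustrative numbers only, not estimated from data).
P_BALL_IF_TAKE = {"fastball": 0.30, "curve": 0.60}  # else a called strike
P_HIT_IF_SWING = {"fastball": 0.30, "curve": 0.15}
P_OUT_IF_SWING = {"fastball": 0.40, "curve": 0.35}  # remainder: swinging strike

# Hypothetical pitch log for one pitcher: ((balls, strikes), pitch_type).
PITCH_LOG = [
    ((0, 0), "fastball"), ((0, 0), "fastball"), ((0, 0), "fastball"),
    ((0, 0), "curve"),
    ((0, 2), "curve"), ((0, 2), "curve"), ((0, 2), "fastball"),
    ((0, 2), "curve"),
]

# Hypothetical "general" distribution, used for counts absent from the log.
GENERAL = {"fastball": 0.6, "curve": 0.4}


def estimate_pitch_probs(log):
    """Observed conditional proportions P(pitch type | count)."""
    by_count = defaultdict(Counter)
    for count, pitch in log:
        by_count[count][pitch] += 1
    return {
        count: {p: ctr[p] / sum(ctr.values()) for p in PITCHES}
        for count, ctr in by_count.items()
    }


def state_value(balls, strikes, values):
    """Value of a count, treating walks and strikeouts as terminal."""
    if balls == 4:
        return 1.0  # walk: batter reaches base
    if strikes == 3:
        return 0.0  # strikeout
    return values[(balls, strikes)]


def value_iteration(pitch_probs, fallback, sweeps=10):
    """Batter's best action (take or swing) at every count."""
    states = [(b, s) for b in range(4) for s in range(3)]
    values = {st: 0.0 for st in states}
    policy = {}
    for _ in range(sweeps):
        new_values = {}
        for b, s in states:
            probs = pitch_probs.get((b, s), fallback)
            q = {"take": 0.0, "swing": 0.0}
            for pitch, p in probs.items():
                p_ball = P_BALL_IF_TAKE[pitch]
                q["take"] += p * (
                    p_ball * state_value(b + 1, s, values)
                    + (1 - p_ball) * state_value(b, s + 1, values)
                )
                p_hit, p_out = P_HIT_IF_SWING[pitch], P_OUT_IF_SWING[pitch]
                # A hit is worth 1, an out in play 0, a miss adds a strike.
                q["swing"] += p * (
                    p_hit + (1 - p_hit - p_out) * state_value(b, s + 1, values)
                )
            best = max(q, key=q.get)
            policy[(b, s)] = best
            new_values[(b, s)] = q[best]
        values = new_values
    return values, policy


probs = estimate_pitch_probs(PITCH_LOG)
V, policy = value_iteration(probs, GENERAL)
```

Because every pitch in this simplified model either ends the at-bat or advances the count, the values stop changing after a handful of sweeps. Passing `GENERAL` for every count instead of the pitcher-specific `probs` gives the general strategy that the pitcher-specific one would be compared against.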

Original language	English (US)
Pages (from-to)	926-955
Number of pages	30
Journal	Annals of Applied Statistics
Volume	8
Issue number	2
DOIs	https://doi.org/10.1214/13-AOAS712
State	Published - Jun 2014

Keywords

Algorithmic statistics
Baseball
Markov
Simulation
Sports

ASJC Scopus subject areas

Statistics and Probability
Modeling and Simulation
Statistics, Probability and Uncertainty

Cite this

@article{035af448327e46a3b4b7c0887534d78c,

title = "Moneybarl: Exploiting pitcher decision-making using reinforcement learning",

abstract = "This manuscript uses machine learning techniques to exploit baseball pitchers' decision making, so-called {"}Baseball IQ,{"} by modeling the at-bat information, pitch selection and counts, as a Markov Decision Process (MDP). Each state of the MDP models the pitcher's current pitch selection in a Markovian fashion, conditional on the information immediately prior to making the current pitch. This includes the count prior to the previous pitch, his ensuing pitch selection, the batter's ensuing action and the result of the pitch. The necessary Markovian probabilities can be estimated by the relevant observed conditional proportions in MLB pitch-by-pitch game data. These probabilities could be pitcher-specific, using only the data from one pitcher, or general, using the data from a collection of pitchers. Optimal batting strategies against these estimated conditional distributions of pitch selection can be ascertained by Value Iteration. Optimal batting strategies against a pitcher-specific conditional distribution can be contrasted to those calculated from the general conditional distributions associated with a collection of pitchers. In this manuscript, a single season of MLB data is used to calculate the conditional distributions to find optimal pitcher-specific and general (against a collection of pitchers) batting strategies. These strategies are subsequently evaluated by conditional distributions calculated from a different season for the same pitchers. Thus, the batting strategies are conceptually tested via a collection of simulated games, a {"}mock season,{"} governed by distributions not used to create the strategies. (Simulation is not needed, as exact calculations are available.) 
Instances where the pitcher-specific batting strategy outperforms the general batting strategy suggest that the pitcher is exploitable: knowledge of the conditional distributions of their pitch-making decision process in a different season yielded a strategy that worked better in a new season than a general batting strategy built on a population of pitchers. A permutation-based test of exploitability of the collection of pitchers is given and evaluated under two sets of assumptions. To show the practical utility of the approach, we introduce a spatial component that classifies each pitcher's pitch-types using a batter-parameterized spatial trajectory for each pitch. We found that heuristically labeled {"}nonelite{"} batters benefit from using the exploited pitchers' pitcher-specific strategies, whereas (also heuristically labeled) {"}elite{"} players do not.",

keywords = "Algorithmic statistics, Baseball, Markov, Simulation, Sports",

author = "Gagan Sidhu and Brian Caffo",

year = "2014",

month = jun,

doi = "10.1214/13-AOAS712",

language = "English (US)",

volume = "8",

pages = "926--955",

journal = "Annals of Applied Statistics",

issn = "1932-6157",

publisher = "Institute of Mathematical Statistics",

number = "2",

}

TY - JOUR

T1 - Moneybarl

T2 - Exploiting pitcher decision-making using reinforcement learning

AU - Sidhu, Gagan

AU - Caffo, Brian

PY - 2014/6

Y1 - 2014/6

N2 - This manuscript uses machine learning techniques to exploit baseball pitchers' decision making, so-called "Baseball IQ," by modeling the at-bat information, pitch selection and counts, as a Markov Decision Process (MDP). Each state of the MDP models the pitcher's current pitch selection in a Markovian fashion, conditional on the information immediately prior to making the current pitch. This includes the count prior to the previous pitch, his ensuing pitch selection, the batter's ensuing action and the result of the pitch. The necessary Markovian probabilities can be estimated by the relevant observed conditional proportions in MLB pitch-by-pitch game data. These probabilities could be pitcher-specific, using only the data from one pitcher, or general, using the data from a collection of pitchers. Optimal batting strategies against these estimated conditional distributions of pitch selection can be ascertained by Value Iteration. Optimal batting strategies against a pitcher-specific conditional distribution can be contrasted to those calculated from the general conditional distributions associated with a collection of pitchers. In this manuscript, a single season of MLB data is used to calculate the conditional distributions to find optimal pitcher-specific and general (against a collection of pitchers) batting strategies. These strategies are subsequently evaluated by conditional distributions calculated from a different season for the same pitchers. Thus, the batting strategies are conceptually tested via a collection of simulated games, a "mock season," governed by distributions not used to create the strategies. (Simulation is not needed, as exact calculations are available.) 
Instances where the pitcher-specific batting strategy outperforms the general batting strategy suggest that the pitcher is exploitable: knowledge of the conditional distributions of their pitch-making decision process in a different season yielded a strategy that worked better in a new season than a general batting strategy built on a population of pitchers. A permutation-based test of exploitability of the collection of pitchers is given and evaluated under two sets of assumptions. To show the practical utility of the approach, we introduce a spatial component that classifies each pitcher's pitch-types using a batter-parameterized spatial trajectory for each pitch. We found that heuristically labeled "nonelite" batters benefit from using the exploited pitchers' pitcher-specific strategies, whereas (also heuristically labeled) "elite" players do not.

AB - This manuscript uses machine learning techniques to exploit baseball pitchers' decision making, so-called "Baseball IQ," by modeling the at-bat information, pitch selection and counts, as a Markov Decision Process (MDP). Each state of the MDP models the pitcher's current pitch selection in a Markovian fashion, conditional on the information immediately prior to making the current pitch. This includes the count prior to the previous pitch, his ensuing pitch selection, the batter's ensuing action and the result of the pitch. The necessary Markovian probabilities can be estimated by the relevant observed conditional proportions in MLB pitch-by-pitch game data. These probabilities could be pitcher-specific, using only the data from one pitcher, or general, using the data from a collection of pitchers. Optimal batting strategies against these estimated conditional distributions of pitch selection can be ascertained by Value Iteration. Optimal batting strategies against a pitcher-specific conditional distribution can be contrasted to those calculated from the general conditional distributions associated with a collection of pitchers. In this manuscript, a single season of MLB data is used to calculate the conditional distributions to find optimal pitcher-specific and general (against a collection of pitchers) batting strategies. These strategies are subsequently evaluated by conditional distributions calculated from a different season for the same pitchers. Thus, the batting strategies are conceptually tested via a collection of simulated games, a "mock season," governed by distributions not used to create the strategies. (Simulation is not needed, as exact calculations are available.) 
Instances where the pitcher-specific batting strategy outperforms the general batting strategy suggest that the pitcher is exploitable: knowledge of the conditional distributions of their pitch-making decision process in a different season yielded a strategy that worked better in a new season than a general batting strategy built on a population of pitchers. A permutation-based test of exploitability of the collection of pitchers is given and evaluated under two sets of assumptions. To show the practical utility of the approach, we introduce a spatial component that classifies each pitcher's pitch-types using a batter-parameterized spatial trajectory for each pitch. We found that heuristically labeled "nonelite" batters benefit from using the exploited pitchers' pitcher-specific strategies, whereas (also heuristically labeled) "elite" players do not.

KW - Algorithmic statistics

KW - Baseball

KW - Markov

KW - Simulation

KW - Sports

UR - http://www.scopus.com/inward/record.url?scp=84903752897&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84903752897&partnerID=8YFLogxK

U2 - 10.1214/13-AOAS712

DO - 10.1214/13-AOAS712

M3 - Article

AN - SCOPUS:84903752897

SN - 1932-6157

VL - 8

SP - 926

EP - 955

JO - Annals of Applied Statistics

JF - Annals of Applied Statistics

IS - 2

ER -
