Semi-Parametric Bayesian inference for Multi-Season baseball data

Fernando A. Quintana, Peter Müller, Gary Rosner, Mark Munsell

Research output: Contribution to journal › Article

Abstract

We analyze complete sequences of successes (hits, walks, and sacrifices) for a group of players from the American and National Leagues, collected over 4 seasons. The goal is to describe how players' performances vary from season to season. In particular, we wish to assess and compare the effect of available occasion-specific covariates over seasons. The data are binary sequences for each player and each season. We model dependence in the binary sequence by an autoregressive logistic model. The model includes lagged terms up to a fixed order. For each player and season we introduce a different set of autologistic regression coefficients, i.e., the regression coefficients are random effects that are specific to each season and player. We use a nonparametric approach to define a random effects distribution. The nonparametric model is defined as a mixture with a Dirichlet process prior for the mixing measure. The described model is justified by a representation theorem for order-k exchangeable sequences. Besides the repeated measurements for each season and player, multiple seasons within a given player define an additional level of repeated measurements. We introduce dependence at this level of repeated measurements by relating the season-specific random effects vectors in an autoregressive fashion. We ultimately conclude that while some covariates like the ERA of the opposing pitcher are always relevant, others like an indicator for the game being into the seventh inning may be significant only for certain seasons, and some others, like the score of the game, can safely be ignored.
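The abstract describes the model only in words. As a rough illustration, the Python sketch below evaluates the conditional log-likelihood of one player-season binary sequence under an order-k autologistic model, and propagates season-specific coefficient vectors for one player with an AR(1) recursion. The exact parameterization is an assumption, not the paper's specification: the function names `autologistic_loglik` and `ar_season_effects`, the scalar autoregression coefficient `rho`, the Gaussian innovations, and the toy data are all hypothetical, and the Dirichlet process mixture prior on the random effects is not shown.

```python
import math
import random
from typing import List, Sequence


def log1pexp(eta: float) -> float:
    """Numerically stable log(1 + exp(eta))."""
    if eta > 0:
        return eta + math.log1p(math.exp(-eta))
    return math.log1p(math.exp(eta))


def autologistic_loglik(y: Sequence[int],
                        x: Sequence[Sequence[float]],
                        theta: Sequence[float],
                        beta: Sequence[float]) -> float:
    """Conditional log-likelihood of one player-season binary sequence.

    Assumed (hypothetical) order-k autologistic form, k = len(theta) - 1:
        logit P(y[t] = 1 | past) = theta[0]
                                   + sum_{j=1..k} theta[j] * y[t - j]
                                   + sum_m beta[m] * x[t][m]
    The first k outcomes are conditioned on rather than modelled.
    """
    k = len(theta) - 1
    ll = 0.0
    for t in range(k, len(y)):
        eta = theta[0]
        eta += sum(theta[j] * y[t - j] for j in range(1, k + 1))
        eta += sum(b * v for b, v in zip(beta, x[t]))
        # Bernoulli log-probability written on the logit scale.
        ll += y[t] * eta - log1pexp(eta)
    return ll


def ar_season_effects(theta_first: Sequence[float], rho: float, sd: float,
                      n_seasons: int, rng: random.Random) -> List[List[float]]:
    """Hypothetical AR(1) link across seasons for one player:
    theta_s = rho * theta_{s-1} + N(0, sd^2) noise, componentwise."""
    thetas = [list(theta_first)]
    for _ in range(1, n_seasons):
        prev = thetas[-1]
        thetas.append([rho * p + rng.gauss(0.0, sd) for p in prev])
    return thetas


if __name__ == "__main__":
    rng = random.Random(1)
    # Four seasons of lag-1 coefficients (intercept, lag-1 effect); toy values.
    seasons = ar_season_effects([-0.8, 0.3], rho=0.7, sd=0.1,
                                n_seasons=4, rng=rng)
    # Toy outcomes with one covariate (e.g. a standardized opposing-pitcher ERA).
    y = [0, 1, 0, 0, 1, 1, 0, 1]
    x = [[0.2], [-0.5], [1.0], [0.0], [0.3], [-1.2], [0.4], [0.9]]
    for s, theta in enumerate(seasons, start=1):
        print(s, round(autologistic_loglik(y, x, theta, beta=[0.25]), 3))
```

Conditioning on the first k outcomes sidesteps modelling the start of each sequence. In an actual fit, a likelihood of this kind would be only one ingredient of a posterior sampler over the player-season coefficients; the sketch makes no attempt at that.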

Original language: English (US)
Pages (from-to): 317-338
Number of pages: 22
Journal: Bayesian Analysis
Volume: 3
Issue number: 2
DOIs: 10.1214/08-BA312
State: Published - 2008
Externally published: Yes

Keywords

  • Dirichlet process
  • Partial exchangeability
  • Semiparametric random effects

ASJC Scopus subject areas

  • Applied Mathematics
  • Statistics and Probability

Cite this

Semi-Parametric Bayesian inference for Multi-Season baseball data. / Quintana, Fernando A.; Müller, Peter; Rosner, Gary; Munsell, Mark.

In: Bayesian Analysis, Vol. 3, No. 2, 2008, p. 317-338.

Quintana, Fernando A. ; Müller, Peter ; Rosner, Gary ; Munsell, Mark. / Semi-Parametric Bayesian inference for Multi-Season baseball data. In: Bayesian Analysis. 2008 ; Vol. 3, No. 2. pp. 317-338.
@article{47a039de3dcc44e3a6b752f20f74b890,
title = "Semi-Parametric Bayesian inference for Multi-Season baseball data",
keywords = "Dirichlet process, Partial exchangeability, Semiparametric random effects",
author = "Quintana, {Fernando A.} and Peter M{\"u}ller and Gary Rosner and Mark Munsell",
year = "2008",
doi = "10.1214/08-BA312",
language = "English (US)",
volume = "3",
pages = "317--338",
journal = "Bayesian Analysis",
issn = "1936-0975",
publisher = "Carnegie Mellon University",
number = "2",

}

TY - JOUR

T1 - Semi-Parametric Bayesian inference for Multi-Season baseball data

AU - Quintana, Fernando A.

AU - Müller, Peter

AU - Rosner, Gary

AU - Munsell, Mark

PY - 2008

Y1 - 2008

KW - Dirichlet process

KW - Partial exchangeability

KW - Semiparametric random effects

UR - http://www.scopus.com/inward/record.url?scp=76849089543&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=76849089543&partnerID=8YFLogxK

U2 - 10.1214/08-BA312

DO - 10.1214/08-BA312

M3 - Article

C2 - 21909346

AN - SCOPUS:76849089543

VL - 3

SP - 317

EP - 338

JO - Bayesian Analysis

JF - Bayesian Analysis

SN - 1936-0975

IS - 2

ER -