Common sampling and modeling approaches to analyzing readmission risk that ignore clustering produce misleading results

Huaqing Zhao, Samuel Tanner, Sherita H. Golden, Susan G. Fisher, Daniel J. Rubin

Research output: Contribution to journal › Article › peer-review


Background: There is little consensus on how to sample hospitalizations and analyze multiple variables to model readmission risk. The purpose of this study was to compare readmission rates and the accuracy of predictive models based on different sampling and multivariable modeling approaches.

Methods: We conducted a retrospective cohort study of 17,284 adult diabetes patients with 44,203 discharges from an urban academic medical center between 1/1/2004 and 12/31/2012. Models for all-cause 30-day readmission were developed by four strategies: logistic regression using the first discharge per patient (LR-first), logistic regression using all discharges (LR-all), generalized estimating equations (GEE) using all discharges, and cluster-weighted GEE (CWGEE) using all discharges. Multiple sets of models were developed and internally validated across a range of sample sizes.

Results: The readmission rate was 10.2% among first discharges and 20.3% among all discharges, revealing that sampling only first discharges underestimates a population’s readmission rate. Number of discharges was highly correlated with number of readmissions (r = 0.87, P < 0.001). Accounting for clustering with GEE and CWGEE yielded more conservative estimates of model performance than LR-all. LR-first produced falsely optimistic Brier scores. Model performance was unstable below samples of 6000–8000 discharges and stable in larger samples. GEE and CWGEE performed better in larger samples than in smaller samples.

Conclusions: Hospital readmission risk models should be based on all discharges, as opposed to just the first discharge per patient, and should use methods that account for clustered data.
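The sampling effect reported in the results can be illustrated with a minimal Python sketch. The data below are a hypothetical toy example, not the study cohort, and the variable names are assumptions made for illustration. The sketch shows how a few patients with many discharges raise the readmission rate computed over all discharges compared with the rate computed over first discharges only.

```python
from collections import defaultdict

# Hypothetical toy data (NOT the study cohort):
# each tuple is (patient_id, discharge_sequence, readmitted_within_30_days).
discharges = [
    ("A", 1, 0), ("B", 1, 0), ("D", 1, 0), ("F", 1, 0),
    ("G", 1, 0), ("H", 1, 0), ("I", 1, 0), ("J", 1, 0),
    # Patient C is readmitted repeatedly, so C contributes many discharges.
    ("C", 1, 1), ("C", 2, 1), ("C", 3, 1), ("C", 4, 1), ("C", 5, 0),
    # Patient E returns later than 30 days after the first discharge.
    ("E", 1, 0), ("E", 2, 1), ("E", 3, 0),
]

# Rate over all discharges: every discharge is one observation.
all_rate = sum(flag for _, _, flag in discharges) / len(discharges)

# Rate over first discharges only: one observation per patient.
first = [flag for _, seq, flag in discharges if seq == 1]
first_rate = sum(first) / len(first)

# Per-patient counts expose the clustering: discharges from the same
# patient are not independent observations.
per_patient = defaultdict(lambda: {"discharges": 0, "readmissions": 0})
for patient_id, _, flag in discharges:
    per_patient[patient_id]["discharges"] += 1
    per_patient[patient_id]["readmissions"] += flag

print(f"All-discharge readmission rate:   {all_rate:.4f}")
print(f"First-discharge readmission rate: {first_rate:.4f}")
print(dict(per_patient["C"]))
```

In this toy data, patient C alone accounts for four of the five readmissions, which is the kind of within-patient clustering that an analysis treating discharges as independent ignores.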

Original language: English (US)
Article number: 281
Journal: BMC Medical Research Methodology
Issue number: 1
State: Published - Dec 2020


Keywords

  • Clustering
  • Logistic models
  • Patient readmission
  • Predictive modeling
  • Sampling strategies

ASJC Scopus subject areas

  • Epidemiology
  • Health Informatics


