Regression tree boosting to adjust health care cost predictions for diagnostic mix

John W. Robinson

Research output: Contribution to journal › Article

Abstract

Objective. To assess the ability of regression tree boosting to risk-adjust health care cost predictions, using diagnostic groups and demographic variables as inputs. Systems for risk-adjusting health care cost, described in the literature, have consistently employed deterministic models to account for interactions among diagnostic groups, simplifying their statistical representation, but sacrificing potentially useful information. An alternative is to use a statistical learning algorithm such as regression tree boosting that systematically searches the data for consequential interactions, which it automatically incorporates into a risk-adjustment model that is customized to the population under study.

Data Source. Administrative data for over 2 million enrollees in indemnity, preferred provider organization (PPO), and point-of-service (POS) plans from Thomson Medstat's Commercial Claims and Encounters database.

Study Design. The Agency for Healthcare Research and Quality's Clinical Classification Software (CCS) was used to sort 2001 diagnoses into 260 diagnosis categories (DCs). For each plan type (indemnity, PPO, and POS), boosted regression trees and main effects linear models were fitted to predict concurrent (2001) and prospective (2002) total health care cost per patient, given DCs and demographic variables.

Principal Findings. Regression tree boosting explained 49.7-52.1 percent of concurrent cost variance and 15.2-17.7 percent of prospective cost variance in independent test samples. Corresponding results for main effects linear models were 42.5-47.6 percent and 14.2-16.6 percent.

Conclusions. The combination of regression tree boosting and a diagnostic grouping scheme, such as CCS, represents a competitive alternative to risk-adjustment systems that use complex deterministic models to account for interactions among diagnostic groups.
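To illustrate the contrast drawn in the abstract, the following is a minimal sketch on synthetic data; it is not the study's method, software, or data. It assumes three hypothetical binary diagnosis indicators, an invented cost formula with one two-way interaction, and squared-error boosting with a fixed learning rate. Boosted depth-1 trees (stumps) stand in for a main-effects model, because they can only produce an additive fit, while depth-2 trees can pick up the interaction. All function names and parameter values are illustrative assumptions.

```python
import random


def fit_tree(X, r, depth):
    """Greedy least-squares regression tree on binary features.

    Returns a leaf value (float) or a tuple (feature, subtree_if_0, subtree_if_1).
    """
    mean = sum(r) / len(r)
    if depth == 0:
        return mean
    best = None
    for j in range(len(X[0])):
        sides = ([i for i, x in enumerate(X) if x[j] == 0],
                 [i for i, x in enumerate(X) if x[j] == 1])
        if not sides[0] or not sides[1]:
            continue  # feature is constant in this node, so it cannot split
        sse = 0.0
        for idx in sides:
            m = sum(r[i] for i in idx) / len(idx)
            sse += sum((r[i] - m) ** 2 for i in idx)
        if best is None or sse < best[0]:
            best = (sse, j, sides)
    if best is None:
        return mean
    _, j, (zero, one) = best
    return (j,
            fit_tree([X[i] for i in zero], [r[i] for i in zero], depth - 1),
            fit_tree([X[i] for i in one], [r[i] for i in one], depth - 1))


def predict_tree(tree, x):
    """Walk from the root to a leaf and return the leaf value."""
    while isinstance(tree, tuple):
        j, if_zero, if_one = tree
        tree = if_one if x[j] == 1 else if_zero
    return tree


def boost(X, y, depth, n_trees=100, rate=0.1):
    """Squared-error boosting: each tree is fitted to the current residuals."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    trees = []
    for _ in range(n_trees):
        resid = [yi - pi for yi, pi in zip(y, pred)]
        tree = fit_tree(X, resid, depth)
        trees.append(tree)
        pred = [p + rate * predict_tree(tree, x) for p, x in zip(pred, X)]
    return lambda x: base + rate * sum(predict_tree(t, x) for t in trees)


def r_squared(model, X, y):
    """Share of variance explained on the given sample."""
    mean = sum(y) / len(y)
    ss_res = sum((yi - model(x)) ** 2 for x, yi in zip(X, y))
    ss_tot = sum((yi - mean) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


def simulate(n, rng):
    """Synthetic enrollees: 3 hypothetical diagnosis flags, invented cost formula."""
    X, y = [], []
    for _ in range(n):
        x = [int(rng.random() < 0.3) for _ in range(3)]
        cost = (500 + 1000 * x[0] + 1500 * x[1] + 300 * x[2]
                + 6000 * x[0] * x[1]   # interaction between two diagnoses
                + rng.gauss(0, 500))   # unexplained noise
        X.append(x)
        y.append(cost)
    return X, y


rng = random.Random(0)
X_train, y_train = simulate(1000, rng)
X_test, y_test = simulate(500, rng)  # independent test sample

stumps = boost(X_train, y_train, depth=1)  # additive, main-effects stand-in
trees = boost(X_train, y_train, depth=2)   # can represent two-way interactions
r2_stumps = r_squared(stumps, X_test, y_test)
r2_trees = r_squared(trees, X_test, y_test)
print(f"additive (depth 1):          R^2 = {r2_stumps:.3f}")
print(f"with interactions (depth 2): R^2 = {r2_trees:.3f}")
```

On this synthetic example the depth-2 model should explain more held-out variance than the additive one. That mirrors only the direction of the gap the abstract reports between boosting and main effects linear models, not its size.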

Original language: English (US)
Pages (from-to): 755-772
Number of pages: 18
Journal: Health Services Research
Volume: 43
Issue number: 2
DOI: 10.1111/j.1475-6773.2007.00761.x
State: Published - Apr 2008
Externally published: Yes

Keywords

  • Boosting
  • Case mix
  • Data mining
  • Health care cost
  • Risk adjustment

ASJC Scopus subject areas

  • Nursing (all)
  • Health (social science)
  • Health Professions (all)
  • Health Policy

Cite this

Regression tree boosting to adjust health care cost predictions for diagnostic mix. / Robinson, John W.

In: Health Services Research, Vol. 43, No. 2, 04.2008, p. 755-772.

@article{015425eaf9b345648a2ab0c88aaddf71,
title = "Regression tree boosting to adjust health care cost predictions for diagnostic mix",
abstract = "Objective. To assess the ability of regression tree boosting to risk-adjust health care cost predictions, using diagnostic groups and demographic variables as inputs. Systems for risk-adjusting health care cost, described in the literature, have consistently employed deterministic models to account for interactions among diagnostic groups, simplifying their statistical representation, but sacrificing potentially useful information. An alternative is to use a statistical learning algorithm such as regression tree boosting that systematically searches the data for consequential interactions, which it automatically incorporates into a risk-adjustment model that is customized to the population under study. Data Source. Administrative data for over 2 million enrollees in indemnity, preferred provider organization (PPO), and point-of-service (POS) plans from Thomson Medstat's Commercial Claims and Encounters database. Study Design. The Agency for Healthcare Research and Quality's Clinical Classification Software (CCS) was used to sort 2001 diagnoses into 260 diagnosis categories (DCs). For each plan type (indemnity, PPO, and POS), boosted regression trees and main effects linear models were fitted to predict concurrent (2001) and prospective (2002) total health care cost per patient, given DCs and demographic variables. Principal Findings. Regression tree boosting explained 49.7-52.1 percent of concurrent cost variance and 15.2-17.7 percent of prospective cost variance in independent test samples. Corresponding results for main effects linear models were 42.5-47.6 percent and 14.2-16.6 percent. Conclusions. The combination of regression tree boosting and a diagnostic grouping scheme, such as CCS, represents a competitive alternative to risk-adjustment systems that use complex deterministic models to account for interactions among diagnostic groups.",
keywords = "Boosting, Case mix, Data mining, Health care cost, Risk adjustment",
author = "Robinson, {John W.}",
year = "2008",
month = "4",
doi = "10.1111/j.1475-6773.2007.00761.x",
language = "English (US)",
volume = "43",
pages = "755--772",
journal = "Health Services Research",
issn = "0017-9124",
publisher = "Wiley-Blackwell",
number = "2",
}

TY  - JOUR
T1  - Regression tree boosting to adjust health care cost predictions for diagnostic mix
AU  - Robinson, John W.
PY  - 2008/4
Y1  - 2008/4
AB  - Objective. To assess the ability of regression tree boosting to risk-adjust health care cost predictions, using diagnostic groups and demographic variables as inputs. Systems for risk-adjusting health care cost, described in the literature, have consistently employed deterministic models to account for interactions among diagnostic groups, simplifying their statistical representation, but sacrificing potentially useful information. An alternative is to use a statistical learning algorithm such as regression tree boosting that systematically searches the data for consequential interactions, which it automatically incorporates into a risk-adjustment model that is customized to the population under study. Data Source. Administrative data for over 2 million enrollees in indemnity, preferred provider organization (PPO), and point-of-service (POS) plans from Thomson Medstat's Commercial Claims and Encounters database. Study Design. The Agency for Healthcare Research and Quality's Clinical Classification Software (CCS) was used to sort 2001 diagnoses into 260 diagnosis categories (DCs). For each plan type (indemnity, PPO, and POS), boosted regression trees and main effects linear models were fitted to predict concurrent (2001) and prospective (2002) total health care cost per patient, given DCs and demographic variables. Principal Findings. Regression tree boosting explained 49.7-52.1 percent of concurrent cost variance and 15.2-17.7 percent of prospective cost variance in independent test samples. Corresponding results for main effects linear models were 42.5-47.6 percent and 14.2-16.6 percent. Conclusions. The combination of regression tree boosting and a diagnostic grouping scheme, such as CCS, represents a competitive alternative to risk-adjustment systems that use complex deterministic models to account for interactions among diagnostic groups.
KW  - Boosting
KW  - Case mix
KW  - Data mining
KW  - Health care cost
KW  - Risk adjustment
UR  - http://www.scopus.com/inward/record.url?scp=41149096954&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=41149096954&partnerID=8YFLogxK
U2  - 10.1111/j.1475-6773.2007.00761.x
DO  - 10.1111/j.1475-6773.2007.00761.x
M3  - Article
VL  - 43
SP  - 755
EP  - 772
JO  - Health Services Research
JF  - Health Services Research
SN  - 0017-9124
IS  - 2
ER  - 