Propensity score estimation

neural networks, support vector machines, decision trees (CART), and meta-classifiers as alternatives to logistic regression

Daniel Westreich, Justin T Lessler, Michele Jonsson Funk

Research output: Contribution to journal (Article)

Abstract

Objective: Propensity scores for the analysis of observational data are typically estimated using logistic regression. Our objective in this review was to assess machine learning alternatives to logistic regression, which may accomplish the same goals but with fewer assumptions or greater accuracy.

Study Design and Setting: We identified alternative methods for propensity score estimation and/or classification from the public health, biostatistics, discrete mathematics, and computer science literature, and evaluated these algorithms for applicability to the problem of propensity score estimation, potential advantages over logistic regression, and ease of use.

Results: We identified four techniques as alternatives to logistic regression: neural networks, support vector machines, decision trees (classification and regression trees [CART]), and meta-classifiers (in particular, boosting).

Conclusion: Although the assumptions of logistic regression are well understood, those assumptions are frequently ignored. All four alternatives have advantages and disadvantages compared with logistic regression. Boosting (meta-classifiers) and, to a lesser extent, decision trees (particularly CART), appear to be most promising for use in the context of propensity score analysis, but extensive simulation studies are needed to establish their utility in practice.
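For orientation, the logistic-regression baseline that the abstract refers to can be sketched as follows. This is a minimal illustration, not code from the article: the simulated covariates, the coefficients used to generate treatment, and the helper names (`fit_logistic`, `propensity`, `simulate`) are all assumptions made for the example. It uses only the Python standard library.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logistic(X, t, lr=0.5, epochs=500):
    """Fit P(T=1 | X) by batch gradient ascent on the mean log-likelihood."""
    n, p = len(X), len(X[0])
    w = [0.0] * (p + 1)  # w[0] is the intercept
    for _ in range(epochs):
        grad = [0.0] * (p + 1)
        for xi, ti in zip(X, t):
            pred = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
            err = ti - pred
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj + lr * gj / n for wj, gj in zip(w, grad)]
    return w

def propensity(w, x):
    # Estimated probability of treatment given covariates x.
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))

def simulate(n=500, seed=0):
    # Hypothetical data: two covariates, treatment assigned via a logistic model.
    rng = random.Random(seed)
    X, t = [], []
    for _ in range(n):
        x = [rng.gauss(0, 1), rng.gauss(0, 1)]
        p_treat = sigmoid(-0.5 + 1.0 * x[0] - 0.5 * x[1])
        X.append(x)
        t.append(1 if rng.random() < p_treat else 0)
    return X, t

X, t = simulate()
w = fit_logistic(X, t)
scores = [propensity(w, x) for x in X]  # one propensity score per subject
```

The alternatives named in the abstract (neural networks, support vector machines, CART, boosting) would replace `fit_logistic` and `propensity` with a different model of treatment given covariates; the resulting scores would then be used in the same way.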

Original language: English (US)
Pages (from-to): 826-833
Number of pages: 8
Journal: Journal of Clinical Epidemiology
Volume: 63
Issue number: 8
DOI: 10.1016/j.jclinepi.2009.11.020
State: Published - 2010

Fingerprint

Propensity Score
Decision Trees
Logistic Models
Biostatistics
Mathematics
Support Vector Machine
Public Health

Keywords

  • Classification and regression trees (CART)
  • Logistic regression
  • Neural networks
  • Propensity scores
  • Recursive partitioning algorithms
  • Review

ASJC Scopus subject areas

  • Epidemiology
  • Medicine(all)

Cite this

@article{d97c86f5073c44fdb794de0cc9749bca,
title = "Propensity score estimation: neural networks, support vector machines, decision trees (CART), and meta-classifiers as alternatives to logistic regression",
abstract = "Objective: Propensity scores for the analysis of observational data are typically estimated using logistic regression. Our objective in this review was to assess machine learning alternatives to logistic regression, which may accomplish the same goals but with fewer assumptions or greater accuracy. Study Design and Setting: We identified alternative methods for propensity score estimation and/or classification from the public health, biostatistics, discrete mathematics, and computer science literature, and evaluated these algorithms for applicability to the problem of propensity score estimation, potential advantages over logistic regression, and ease of use. Results: We identified four techniques as alternatives to logistic regression: neural networks, support vector machines, decision trees (classification and regression trees [CART]), and meta-classifiers (in particular, boosting). Conclusion: Although the assumptions of logistic regression are well understood, those assumptions are frequently ignored. All four alternatives have advantages and disadvantages compared with logistic regression. Boosting (meta-classifiers) and, to a lesser extent, decision trees (particularly CART), appear to be most promising for use in the context of propensity score analysis, but extensive simulation studies are needed to establish their utility in practice.",
keywords = "Classification and regression trees (CART), Logistic regression, Neural networks, Propensity scores, Recursive partitioning algorithms, Review",
author = "Westreich, Daniel and Lessler, {Justin T} and Funk, {Michele Jonsson}",
year = "2010",
doi = "10.1016/j.jclinepi.2009.11.020",
language = "English (US)",
volume = "63",
pages = "826--833",
journal = "Journal of Clinical Epidemiology",
issn = "0895-4356",
publisher = "Elsevier USA",
number = "8",
}

TY - JOUR
T1 - Propensity score estimation
T2 - neural networks, support vector machines, decision trees (CART), and meta-classifiers as alternatives to logistic regression
AU - Westreich, Daniel
AU - Lessler, Justin T
AU - Funk, Michele Jonsson
PY - 2010
Y1 - 2010
AB - Objective: Propensity scores for the analysis of observational data are typically estimated using logistic regression. Our objective in this review was to assess machine learning alternatives to logistic regression, which may accomplish the same goals but with fewer assumptions or greater accuracy. Study Design and Setting: We identified alternative methods for propensity score estimation and/or classification from the public health, biostatistics, discrete mathematics, and computer science literature, and evaluated these algorithms for applicability to the problem of propensity score estimation, potential advantages over logistic regression, and ease of use. Results: We identified four techniques as alternatives to logistic regression: neural networks, support vector machines, decision trees (classification and regression trees [CART]), and meta-classifiers (in particular, boosting). Conclusion: Although the assumptions of logistic regression are well understood, those assumptions are frequently ignored. All four alternatives have advantages and disadvantages compared with logistic regression. Boosting (meta-classifiers) and, to a lesser extent, decision trees (particularly CART), appear to be most promising for use in the context of propensity score analysis, but extensive simulation studies are needed to establish their utility in practice.

KW - Classification and regression trees (CART)
KW - Logistic regression
KW - Neural networks
KW - Propensity scores
KW - Recursive partitioning algorithms
KW - Review
UR - http://www.scopus.com/inward/record.url?scp=77953607621&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=77953607621&partnerID=8YFLogxK
U2 - 10.1016/j.jclinepi.2009.11.020
DO - 10.1016/j.jclinepi.2009.11.020
M3 - Article
VL - 63
SP - 826
EP - 833
JO - Journal of Clinical Epidemiology
JF - Journal of Clinical Epidemiology
SN - 0895-4356
IS - 8
ER -