Methods for Handling Missing Variables in Risk Prediction Models

Ulrike Held, Alfons Kessels, Judith Garcia Aymerich, Xavier Basagaña, Gerben Ter Riet, Karel G M Moons, Milo A. Puhan

Research output: Contribution to journal › Article

Abstract

Prediction models should be externally validated before being used in clinical practice. Many published prediction models have never been validated. Uncollected predictor variables in otherwise suitable validation cohorts are the main factor precluding external validation. We used individual patient data from 9 different cohort studies conducted in the United States, Europe, and Latin America that included 7,892 patients with chronic obstructive pulmonary disease who enrolled between 1981 and 2006. Data on 3-year mortality and the predictors of age, dyspnea, and airflow obstruction were available. We simulated missing data by omitting the predictor dyspnea cohort-wide, and we present 6 methods for handling the missing variable. We assessed model performance with regard to discriminative ability and calibration and by using 2 vignette scenarios. We showed that the use of any imputation method outperforms the omission of the cohort from the validation, which is a commonly used approach. Compared with using the full data set without the missing variable (benchmark), multiple imputation with fixed or random intercepts for cohorts was the best approach to impute the systematically missing predictor. Findings of this study may facilitate the use of cohort studies that do not include all predictors and pave the way for more widespread external validation of prediction models even if 1 or more predictors of the model are systematically missing.
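The abstract reports that performance was judged on discriminative ability and calibration, but it does not name the statistics used. As a hedged illustration only, the sketch below assumes two common choices: the c-statistic (area under the ROC curve) for discrimination and the observed/expected event ratio for calibration-in-the-large. The function names and the six made-up patients are assumptions for this example and are not taken from the study.

```python
from typing import Sequence


def c_statistic(risks: Sequence[float], outcomes: Sequence[int]) -> float:
    """Share of (event, non-event) pairs in which the patient with the event
    has the higher predicted risk; tied risks count as one half."""
    events = [r for r, y in zip(risks, outcomes) if y == 1]
    non_events = [r for r, y in zip(risks, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                concordant += 1.0
            elif e == n:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))


def observed_expected_ratio(risks: Sequence[float], outcomes: Sequence[int]) -> float:
    """Calibration-in-the-large: observed events divided by the sum of
    predicted risks. A value of 1 means the overall risk level is right."""
    return sum(outcomes) / sum(risks)


# Made-up predicted 3-year mortality risks and outcomes (1 = died).
risks = [0.10, 0.20, 0.30, 0.40, 0.50, 0.50]
outcomes = [0, 0, 1, 0, 1, 1]

auc = c_statistic(risks, outcomes)
oe = observed_expected_ratio(risks, outcomes)
print(f"c-statistic = {auc:.3f}, O/E ratio = {oe:.2f}")
```

In a validation cohort with an imputed predictor, the same two functions could be applied to the risks computed from each imputed data set and the results compared with the benchmark; how the study itself pooled results across imputations is not stated in the abstract.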

Original language: English (US)
Pages (from-to): 545-551
Number of pages: 7
Journal: American Journal of Epidemiology
Volume: 184
Issue number: 7
DOI: https://doi.org/10.1093/aje/kwv346
State: Published - Oct 1 2016
Externally published: Yes

Fingerprint

Dyspnea
Cohort Studies
Benchmarking
Latin America
Chronic Obstructive Pulmonary Disease
Calibration
Mortality
Datasets

Keywords

  • COPD
  • decision support techniques
  • logistic models
  • Meta-Analysis
  • missing data
  • validation studies

ASJC Scopus subject areas

  • Epidemiology

Cite this

Held, U., Kessels, A., Garcia Aymerich, J., Basagaña, X., Ter Riet, G., Moons, K. G. M., & Puhan, M. A. (2016). Methods for Handling Missing Variables in Risk Prediction Models. American Journal of Epidemiology, 184(7), 545-551. https://doi.org/10.1093/aje/kwv346

@article{407e516c0a474d989f31c3aeab8afe20,
title = "Methods for Handling Missing Variables in Risk Prediction Models",
keywords = "COPD, decision support techniques, logistic models, Meta-Analysis, missing data, validation studies",
author = "Ulrike Held and Alfons Kessels and {Garcia Aymerich}, Judith and Xavier Basaga{\~n}a and {Ter Riet}, Gerben and Moons, {Karel G M} and Puhan, {Milo A.}",
year = "2016",
month = "10",
day = "1",
doi = "10.1093/aje/kwv346",
language = "English (US)",
volume = "184",
pages = "545--551",
journal = "American Journal of Epidemiology",
issn = "0002-9262",
publisher = "Oxford University Press",
number = "7",

}