Automated Survival Prediction in Metastatic Cancer Patients Using High-Dimensional Electronic Medical Record Data

Michael F. Gensheimer, A. Solomon Henry, Douglas J. Wood, Trevor J. Hastie, Sonya Aggarwal, Sara A. Dudley, Pooja Pradhan, Imon Banerjee, Eunpi Cho, Kavitha Ramchandran, Erqi Pollom, Albert C. Koong, Daniel L. Rubin, Daniel T. Chang

Research output: Contribution to journal › Article

Abstract

Background: Oncologists use patients' life expectancy to guide decisions and may benefit from a tool that accurately predicts prognosis. Existing prognostic models generally use only a few predictor variables. We used an electronic medical record dataset to train a prognostic model for patients with metastatic cancer. Methods: The model was trained and tested using 12 588 patients treated for metastatic cancer in the Stanford Health Care system from 2008 to 2017. Data sources included provider note text, labs, vital signs, procedures, medication orders, and diagnosis codes. Patients were divided randomly into a training set used to fit the model coefficients and a test set used to evaluate model performance (80%/20% split). A regularized Cox model with 4126 predictor variables was used. A landmarking approach was used due to the multiple observations per patient, with t0 set to the time of metastatic cancer diagnosis. Performance was also evaluated using 399 palliative radiation courses in test set patients. Results: The C-index for overall survival was 0.786 in the test set (averaged across landmark times). For palliative radiation courses, the C-index was 0.745 (95% confidence interval [CI] = 0.715 to 0.775) compared with 0.635 (95% CI = 0.601 to 0.669) for a published model using performance status, primary tumor site, and treated site (two-sided P < .001). Our model's predictions were well-calibrated. Conclusions: The model showed high predictive performance, which will need to be validated using external data. Because it is fully automated, the model can be used to examine providers' practice patterns and could be deployed in a decision support tool to help improve quality of care.
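
The C-index reported in the abstract measures how well predicted risk orders patients by survival time. The following is a minimal sketch of Harrell's concordance index for right-censored data, not the authors' evaluation code. The function name `concordance_index`, its argument names, and the toy data are assumptions made for illustration. Pairs with tied survival times are skipped for simplicity.

```python
from typing import Sequence


def concordance_index(
    times: Sequence[float],
    events: Sequence[int],
    risks: Sequence[float],
) -> float:
    """Harrell's C-index for right-censored survival data (illustrative sketch).

    times  : observed follow-up time for each patient
    events : 1 if death was observed, 0 if the patient was censored
    risks  : predicted risk score (higher means shorter expected survival)

    A pair (i, j) is comparable when patient i had an observed event and a
    strictly shorter time than patient j. The pair is concordant when the
    model gave patient i the higher risk; tied risks count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            # A censored patient cannot be the earlier member of a pair.
            continue
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs in the data")
    return concordant / comparable


if __name__ == "__main__":
    # Toy data (hypothetical): four patients, the third one censored.
    toy_times = [2.0, 4.0, 6.0, 8.0]
    toy_events = [1, 1, 0, 1]
    toy_risks = [0.9, 0.7, 0.8, 0.1]
    print(concordance_index(toy_times, toy_events, toy_risks))
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which gives context for the reported 0.786 and 0.745.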

Original language: English (US)
Pages (from-to): 568-574
Number of pages: 7
Journal: Journal of the National Cancer Institute
Volume: 111
Issue number: 6
DOIs: https://doi.org/10.1093/jnci/djy178
State: Published - Jun 1 2019
Externally published: Yes

Fingerprint

Electronic Health Records
Survival
Neoplasms
Confidence Intervals
Radiation
Vital Signs
Quality of Health Care
Information Storage and Retrieval
Life Expectancy
Proportional Hazards Models
Delivery of Health Care

ASJC Scopus subject areas

  • Oncology
  • Cancer Research

Cite this

Automated Survival Prediction in Metastatic Cancer Patients Using High-Dimensional Electronic Medical Record Data. / Gensheimer, Michael F.; Henry, A. Solomon; Wood, Douglas J.; Hastie, Trevor J.; Aggarwal, Sonya; Dudley, Sara A.; Pradhan, Pooja; Banerjee, Imon; Cho, Eunpi; Ramchandran, Kavitha; Pollom, Erqi; Koong, Albert C.; Rubin, Daniel L.; Chang, Daniel T.

In: Journal of the National Cancer Institute, Vol. 111, No. 6, 01.06.2019, p. 568-574.

Gensheimer, MF, Henry, AS, Wood, DJ, Hastie, TJ, Aggarwal, S, Dudley, SA, Pradhan, P, Banerjee, I, Cho, E, Ramchandran, K, Pollom, E, Koong, AC, Rubin, DL & Chang, DT 2019, 'Automated Survival Prediction in Metastatic Cancer Patients Using High-Dimensional Electronic Medical Record Data', Journal of the National Cancer Institute, vol. 111, no. 6, pp. 568-574. https://doi.org/10.1093/jnci/djy178
Gensheimer, Michael F.; Henry, A. Solomon; Wood, Douglas J.; Hastie, Trevor J.; Aggarwal, Sonya; Dudley, Sara A.; Pradhan, Pooja; Banerjee, Imon; Cho, Eunpi; Ramchandran, Kavitha; Pollom, Erqi; Koong, Albert C.; Rubin, Daniel L.; Chang, Daniel T. / Automated Survival Prediction in Metastatic Cancer Patients Using High-Dimensional Electronic Medical Record Data. In: Journal of the National Cancer Institute. 2019; Vol. 111, No. 6. pp. 568-574.
@article{bda8f820736745e3bc9ab0b3c32fae9a,
title = "Automated Survival Prediction in Metastatic Cancer Patients Using High-Dimensional Electronic Medical Record Data",
abstract = "Background: Oncologists use patients' life expectancy to guide decisions and may benefit from a tool that accurately predicts prognosis. Existing prognostic models generally use only a few predictor variables. We used an electronic medical record dataset to train a prognostic model for patients with metastatic cancer. Methods: The model was trained and tested using 12 588 patients treated for metastatic cancer in the Stanford Health Care system from 2008 to 2017. Data sources included provider note text, labs, vital signs, procedures, medication orders, and diagnosis codes. Patients were divided randomly into a training set used to fit the model coefficients and a test set used to evaluate model performance (80{\%}/20{\%} split). A regularized Cox model with 4126 predictor variables was used. A landmarking approach was used due to the multiple observations per patient, with t0 set to the time of metastatic cancer diagnosis. Performance was also evaluated using 399 palliative radiation courses in test set patients. Results: The C-index for overall survival was 0.786 in the test set (averaged across landmark times). For palliative radiation courses, the C-index was 0.745 (95{\%} confidence interval [CI] = 0.715 to 0.775) compared with 0.635 (95{\%} CI = 0.601 to 0.669) for a published model using performance status, primary tumor site, and treated site (two-sided P < .001). Our model's predictions were well-calibrated. Conclusions: The model showed high predictive performance, which will need to be validated using external data. Because it is fully automated, the model can be used to examine providers' practice patterns and could be deployed in a decision support tool to help improve quality of care.",
author = "Gensheimer, {Michael F.} and Henry, {A. Solomon} and Wood, {Douglas J.} and Hastie, {Trevor J.} and Sonya Aggarwal and Dudley, {Sara A.} and Pooja Pradhan and Imon Banerjee and Eunpi Cho and Kavitha Ramchandran and Erqi Pollom and Koong, {Albert C.} and Rubin, {Daniel L.} and Chang, {Daniel T.}",
year = "2019",
month = "6",
day = "1",
doi = "10.1093/jnci/djy178",
language = "English (US)",
volume = "111",
pages = "568--574",
journal = "Journal of the National Cancer Institute",
issn = "0027-8874",
publisher = "Oxford University Press",
number = "6",

}

TY - JOUR

T1 - Automated Survival Prediction in Metastatic Cancer Patients Using High-Dimensional Electronic Medical Record Data

AU - Gensheimer, Michael F.

AU - Henry, A. Solomon

AU - Wood, Douglas J.

AU - Hastie, Trevor J.

AU - Aggarwal, Sonya

AU - Dudley, Sara A.

AU - Pradhan, Pooja

AU - Banerjee, Imon

AU - Cho, Eunpi

AU - Ramchandran, Kavitha

AU - Pollom, Erqi

AU - Koong, Albert C.

AU - Rubin, Daniel L.

AU - Chang, Daniel T.

PY - 2019/6/1

Y1 - 2019/6/1

N2 - Background: Oncologists use patients' life expectancy to guide decisions and may benefit from a tool that accurately predicts prognosis. Existing prognostic models generally use only a few predictor variables. We used an electronic medical record dataset to train a prognostic model for patients with metastatic cancer. Methods: The model was trained and tested using 12 588 patients treated for metastatic cancer in the Stanford Health Care system from 2008 to 2017. Data sources included provider note text, labs, vital signs, procedures, medication orders, and diagnosis codes. Patients were divided randomly into a training set used to fit the model coefficients and a test set used to evaluate model performance (80%/20% split). A regularized Cox model with 4126 predictor variables was used. A landmarking approach was used due to the multiple observations per patient, with t0 set to the time of metastatic cancer diagnosis. Performance was also evaluated using 399 palliative radiation courses in test set patients. Results: The C-index for overall survival was 0.786 in the test set (averaged across landmark times). For palliative radiation courses, the C-index was 0.745 (95% confidence interval [CI] = 0.715 to 0.775) compared with 0.635 (95% CI = 0.601 to 0.669) for a published model using performance status, primary tumor site, and treated site (two-sided P < .001). Our model's predictions were well-calibrated. Conclusions: The model showed high predictive performance, which will need to be validated using external data. Because it is fully automated, the model can be used to examine providers' practice patterns and could be deployed in a decision support tool to help improve quality of care.

UR - http://www.scopus.com/inward/record.url?scp=85062421215&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85062421215&partnerID=8YFLogxK

U2 - 10.1093/jnci/djy178

DO - 10.1093/jnci/djy178

M3 - Article

C2 - 30346554

AN - SCOPUS:85062421215

VL - 111

SP - 568

EP - 574

JO - Journal of the National Cancer Institute

JF - Journal of the National Cancer Institute

SN - 0027-8874

IS - 6

ER -