TY - JOUR
T1 - Cardiovascular Event Prediction by Machine Learning
T2 - The Multi-Ethnic Study of Atherosclerosis
AU - Ambale-Venkatesh, Bharath
AU - Yang, Xiaoying
AU - Wu, Colin O.
AU - Liu, Kiang
AU - Hundley, W. Gregory
AU - McClelland, Robyn
AU - Gomes, Antoinette S.
AU - Folsom, Aaron R.
AU - Shea, Steven
AU - Guallar, Eliseo
AU - Bluemke, David A.
AU - Lima, João A.C.
N1 - Publisher Copyright: © 2017 American Heart Association, Inc.
PY - 2017/10/13
Y1 - 2017/10/13
N2 - Rationale: Machine learning may be useful to characterize cardiovascular risk, predict outcomes, and identify biomarkers in population studies. Objective: To test the ability of random survival forests, a machine learning technique, to predict 6 cardiovascular outcomes in comparison to standard cardiovascular risk scores. Methods and Results: We included participants from the MESA (Multi-Ethnic Study of Atherosclerosis). Baseline measurements were used to predict cardiovascular outcomes over 12 years of follow-up. MESA was designed to study progression of subclinical disease to cardiovascular events where participants were initially free of cardiovascular disease. All 6814 participants from MESA, aged 45 to 84 years, from 4 ethnicities, and 6 centers across the United States were included. Seven hundred thirty-five variables from imaging and noninvasive tests, questionnaires, and biomarker panels were obtained. We used the random survival forests technique to identify the top-20 predictors of each outcome. Imaging, electrocardiography, and serum biomarkers featured heavily on the top-20 lists as opposed to traditional cardiovascular risk factors. Age was the most important predictor for all-cause mortality. Fasting glucose levels and carotid ultrasonography measures were important predictors of stroke. Coronary artery calcium score was the most important predictor of coronary heart disease and all atherosclerotic cardiovascular disease combined outcomes. Left ventricular structure and function and cardiac troponin-T were among the top predictors for incident heart failure. Creatinine, age, and ankle-brachial index were among the top predictors of atrial fibrillation. TNF-α (tumor necrosis factor-α) and IL (interleukin)-2 soluble receptors and NT-proBNP (N-Terminal Pro-B-Type Natriuretic Peptide) levels were important across all outcomes. The random survival forests technique performed better than established risk scores with increased prediction accuracy (decreased Brier score by 10%-25%). Conclusions: Machine learning in conjunction with deep phenotyping improves prediction accuracy in cardiovascular event prediction in an initially asymptomatic population. These methods may lead to greater insights on subclinical disease markers without a priori assumptions of causality. Clinical Trial Registration: URL: http://www.clinicaltrials.gov. Unique identifier: NCT00005487.
KW - atrial fibrillation
KW - cardiovascular disease
KW - coronary heart disease
KW - heart failure
KW - machine learning
KW - mortality
KW - stroke
UR - http://www.scopus.com/inward/record.url?scp=85031820117&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85031820117&partnerID=8YFLogxK
U2 - 10.1161/CIRCRESAHA.117.311312
DO - 10.1161/CIRCRESAHA.117.311312
M3 - Article
C2 - 28794054
AN - SCOPUS:85031820117
SN - 0009-7330
VL - 121
SP - 1092
EP - 1101
JO - Circulation Research
JF - Circulation Research
IS - 9
ER -