Using machine learning on cardiorespiratory fitness data for predicting hypertension: The Henry Ford exercise testing (FIT) Project

Sherif Sakr, Radwa Elshawi, Amjad Ahmed, Waqas T. Qureshi, Clinton Brawner, Steven Keteyian, Michael Blaha, Mouaz H. Al-Mallah

Research output: Contribution to journal (Article)

Abstract

This study evaluates and compares the performance of different machine learning techniques in predicting which individuals are at risk of developing hypertension, and who are likely to benefit most from interventions, using cardiorespiratory fitness data. The dataset contains information on 23,095 patients who underwent clinician-referred exercise treadmill stress testing at Henry Ford Health Systems between 1991 and 2009 and had a complete 10-year follow-up. The variables of the dataset include information on vital signs, diagnosis and clinical laboratory measurements. Six machine learning techniques were investigated: LogitBoost (LB), Bayesian Network classifier (BN), Locally Weighted Naive Bayes (LWB), Artificial Neural Network (ANN), Support Vector Machine (SVM) and Random Tree Forest (RTF). Using different validation methods, the RTF model showed the best performance (AUC = 0.93) and outperformed all other machine learning techniques examined in this study. The results also show that it is critical to evaluate machine learning models carefully with several model evaluation methods, because the measured prediction accuracy can differ significantly between them.
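The abstract reports performance as AUC under different validation methods. As a minimal sketch of what that involves, and not the study's own pipeline, the Python below computes AUC from its rank definition and averages it over stratified k-fold splits. The helper names (`auc`, `stratified_folds`, `cross_validated_auc`, `fit_direction`), the single-feature toy classifier and the synthetic data are all assumptions made for illustration; none of them come from the paper.

```python
import random


def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive case receives the higher score; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_folds(y, k, seed=0):
    """Split indices into k disjoint folds, dealing each class out
    round-robin so every fold keeps roughly the overall class balance."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in (0, 1):
        members = [i for i, yi in enumerate(y) if yi == label]
        rng.shuffle(members)
        for j, i in enumerate(members):
            folds[j % k].append(i)
    return folds


def cross_validated_auc(X, y, fit, k=5, seed=0):
    """Train on k-1 folds, score the held-out fold, return one AUC per fold."""
    fold_aucs = []
    for test_idx in stratified_folds(y, k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        scores = [model(X[i]) for i in test_idx]
        fold_aucs.append(auc([y[i] for i in test_idx], scores))
    return fold_aucs


def fit_direction(xs, ys):
    """Hypothetical toy classifier on one numeric feature: score a case by
    the feature itself, flipped if positives have the lower training mean."""
    n_pos = sum(ys)
    mean_pos = sum(x for x, t in zip(xs, ys) if t == 1) / n_pos
    mean_neg = sum(x for x, t in zip(xs, ys) if t == 0) / (len(ys) - n_pos)
    sign = 1.0 if mean_pos >= mean_neg else -1.0
    return lambda x: sign * x


if __name__ == "__main__":
    # Synthetic, noisy one-feature data (illustrative only).
    rng = random.Random(42)
    y = [0] * 50 + [1] * 50
    X = [rng.gauss(0.0, 1.0) + label for label in y]
    per_fold = cross_validated_auc(X, y, fit_direction, k=5)
    print("per-fold AUC:", [round(a, 3) for a in per_fold])
    print("mean AUC:", round(sum(per_fold) / len(per_fold), 3))
```

On noisy data the per-fold values typically vary, which illustrates why a single split can give a different picture of accuracy than an average over several splits.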

Original language: English (US)
Article number: e0195344
Journal: PLoS One
Volume: 13
Issue number: 4
DOI: https://doi.org/10.1371/journal.pone.0195344
State: Published - Apr 1 2018

Fingerprint

exercise test
artificial intelligence
hypertension
Learning systems
Exercise
Hypertension
Testing
forest trees
Clinical laboratories
Exercise equipment
Vital Signs
Clinical Laboratory Techniques
Bayesian networks
methodology
risk groups
Area Under Curve
Support vector machines
exercise equipment
Classifiers
neural networks

ASJC Scopus subject areas

  • Biochemistry, Genetics and Molecular Biology(all)
  • Agricultural and Biological Sciences(all)

Cite this

Sakr, S., Elshawi, R., Ahmed, A., Qureshi, W. T., Brawner, C., Keteyian, S., ... Al-Mallah, M. H. (2018). Using machine learning on cardiorespiratory fitness data for predicting hypertension: The Henry Ford exercise testing (FIT) Project. PLoS One, 13(4), [e0195344]. https://doi.org/10.1371/journal.pone.0195344

Using machine learning on cardiorespiratory fitness data for predicting hypertension: The Henry Ford exercise testing (FIT) Project. / Sakr, Sherif; Elshawi, Radwa; Ahmed, Amjad; Qureshi, Waqas T.; Brawner, Clinton; Keteyian, Steven; Blaha, Michael; Al-Mallah, Mouaz H.

In: PLoS One, Vol. 13, No. 4, e0195344, 01.04.2018.

Sakr, Sherif; Elshawi, Radwa; Ahmed, Amjad; Qureshi, Waqas T.; Brawner, Clinton; Keteyian, Steven; Blaha, Michael; Al-Mallah, Mouaz H. / Using machine learning on cardiorespiratory fitness data for predicting hypertension: The Henry Ford exercise testing (FIT) Project. In: PLoS One. 2018; Vol. 13, No. 4.
@article{ab49fd21a29b4244ac8cfcf4e1517674,
title = "Using machine learning on cardiorespiratory fitness data for predicting hypertension: The Henry Ford exercise testing (FIT) Project",
abstract = "This study evaluates and compares the performance of different machine learning techniques on predicting the individuals at risk of developing hypertension, and who are likely to benefit most from interventions, using the cardiorespiratory fitness data. The dataset of this study contains information of 23,095 patients who underwent clinician-referred exercise treadmill stress testing at Henry Ford Health Systems between 1991 and 2009 and had a complete 10-year follow-up. The variables of the dataset include information on vital signs, diagnosis and clinical laboratory measurements. Six machine learning techniques were investigated: LogitBoost (LB), Bayesian Network classifier (BN), Locally Weighted Naive Bayes (LWB), Artificial Neural Network (ANN), Support Vector Machine (SVM) and Random Tree Forest (RTF). Using different validation methods, the RTF model has shown the best performance (AUC = 0.93) and outperformed all other machine learning techniques examined in this study. The results have also shown that it is critical to carefully explore and evaluate the performance of the machine learning models using various model evaluation methods as the prediction accuracy can significantly differ.",
author = "Sherif Sakr and Radwa Elshawi and Amjad Ahmed and Qureshi, {Waqas T.} and Clinton Brawner and Steven Keteyian and Michael Blaha and Al-Mallah, {Mouaz H.}",
year = "2018",
month = "4",
day = "1",
doi = "10.1371/journal.pone.0195344",
language = "English (US)",
volume = "13",
journal = "PLoS One",
issn = "1932-6203",
publisher = "Public Library of Science",
number = "4",

}

TY - JOUR

T1 - Using machine learning on cardiorespiratory fitness data for predicting hypertension

T2 - The Henry Ford exercise testing (FIT) Project

AU - Sakr, Sherif

AU - Elshawi, Radwa

AU - Ahmed, Amjad

AU - Qureshi, Waqas T.

AU - Brawner, Clinton

AU - Keteyian, Steven

AU - Blaha, Michael

AU - Al-Mallah, Mouaz H.

PY - 2018/4/1

Y1 - 2018/4/1

AB - This study evaluates and compares the performance of different machine learning techniques on predicting the individuals at risk of developing hypertension, and who are likely to benefit most from interventions, using the cardiorespiratory fitness data. The dataset of this study contains information of 23,095 patients who underwent clinician-referred exercise treadmill stress testing at Henry Ford Health Systems between 1991 and 2009 and had a complete 10-year follow-up. The variables of the dataset include information on vital signs, diagnosis and clinical laboratory measurements. Six machine learning techniques were investigated: LogitBoost (LB), Bayesian Network classifier (BN), Locally Weighted Naive Bayes (LWB), Artificial Neural Network (ANN), Support Vector Machine (SVM) and Random Tree Forest (RTF). Using different validation methods, the RTF model has shown the best performance (AUC = 0.93) and outperformed all other machine learning techniques examined in this study. The results have also shown that it is critical to carefully explore and evaluate the performance of the machine learning models using various model evaluation methods as the prediction accuracy can significantly differ.

UR - http://www.scopus.com/inward/record.url?scp=85045648714&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85045648714&partnerID=8YFLogxK

U2 - 10.1371/journal.pone.0195344

DO - 10.1371/journal.pone.0195344

M3 - Article

C2 - 29668729

AN - SCOPUS:85045648714

VL - 13

JO - PLoS One

JF - PLoS One

SN - 1932-6203

IS - 4

M1 - e0195344

ER -