Identifying Ventricular Arrhythmias and Their Predictors by Applying Machine Learning Methods to Electronic Health Records in Patients With Hypertrophic Cardiomyopathy (HCM-VAr-Risk Model)

Moumita Bhattacharya, Dai Yin Lu, Shibani M. Kudchadkar, Gabriela Villarreal Greenland, Prasanth Lingamaneni, Celia Corona Villalobos, Yufan Guan, Joseph Marine, Jeffrey E. Olgin, Stefan Zimmerman, Theodore P. Abraham, Hagit Shatkay, Maria Roselle Abraham

Research output: Contribution to journal (Article)

Abstract

Clinical risk stratification for sudden cardiac death (SCD) in hypertrophic cardiomyopathy (HC) employs rules derived from American College of Cardiology Foundation/American Heart Association (ACCF/AHA) guidelines or the HCM Risk-SCD model (C-index ∼0.69), which use only a few clinical variables. We assessed whether data-driven machine learning methods that consider a wider range of variables can effectively identify HC patients with ventricular arrhythmias (VAr) that lead to SCD. We scanned the electronic health records of 711 HC patients for sustained ventricular tachycardia or ventricular fibrillation. Patients with ventricular tachycardia or ventricular fibrillation (n = 61) were tagged as VAr cases and the remaining patients (n = 650) as non-VAr. The 2-sample t-test and the information gain criterion were used to identify the most informative clinical variables that distinguish VAr from non-VAr; patient records were reduced to include only these variables. Data imbalance stemming from the low number of VAr cases was addressed by applying a combination of over- and undersampling strategies. We trained and tested multiple classifiers under this sampling approach, showing effective classification. We evaluated 93 clinical variables, of which 22 proved predictive of VAr. The ensemble of logistic regression and naïve Bayes classifiers, trained on these 22 variables and corrected for data imbalance, was most effective in separating VAr from non-VAr cases (sensitivity = 0.73, specificity = 0.76, C-index = 0.83). Our method (HCM-VAr-Risk Model) identified 12 new predictors of VAr, in addition to 10 established SCD predictors. In conclusion, this is the first application of machine learning for identifying HC patients with VAr, using clinical attributes. Our model demonstrates good performance (C-index) compared with currently employed SCD prediction algorithms, while addressing the imbalance inherent in clinical data.
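The resampling step and the reported metrics can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which over- and undersampling scheme was used, so plain random resampling is assumed here, and the function names, the `target_per_class` value, and the toy labels and scores are all hypothetical. Only the class sizes (61 VAr, 650 non-VAr) come from the abstract.

```python
import random


def rebalance(cases, controls, target_per_class, seed=0):
    """Random oversampling of the minority class (with replacement) and
    random undersampling of the majority class (without replacement).
    Assumption: the study's actual sampling scheme is not given in the abstract."""
    rng = random.Random(seed)
    oversampled = [rng.choice(cases) for _ in range(target_per_class)]
    undersampled = rng.sample(controls, target_per_class)
    return oversampled, undersampled


def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def c_index(y_true, scores):
    """For a binary outcome the C-index is the fraction of (case, control)
    pairs in which the case receives the higher score; ties count as half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    concordant = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return concordant / (len(pos) * len(neg))


# Patient indices standing in for records: 61 VAr cases, 650 non-VAr.
cases = list(range(61))
controls = list(range(61, 711))
over, under = rebalance(cases, controls, target_per_class=200)

# Toy labels and classifier scores (hypothetical, not study data).
y_true = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.3, 0.7, 0.2, 0.1, 0.4]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

sens, spec = sensitivity_specificity(y_true, y_pred)
auc = c_index(y_true, scores)
print(len(over), len(under), round(sens, 3), round(spec, 3), round(auc, 3))
```

In this sketch resampling is applied to whatever records are passed in; in practice it would normally be restricted to the training portion so that duplicated minority records do not leak into the evaluation data.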

Original language: English (US)
Journal: American Journal of Cardiology
DOI: 10.1016/j.amjcard.2019.02.022
State: Published - Jan 1 2019

Fingerprint

Electronic Health Records
Hypertrophic Cardiomyopathy
Cardiac Arrhythmias
Sudden Cardiac Death
Ventricular Fibrillation
Ventricular Tachycardia
Machine Learning
Logistic Models
Guidelines

ASJC Scopus subject areas

  • Cardiology and Cardiovascular Medicine

Cite this

Identifying Ventricular Arrhythmias and Their Predictors by Applying Machine Learning Methods to Electronic Health Records in Patients With Hypertrophic Cardiomyopathy (HCM-VAr-Risk Model). / Bhattacharya, Moumita; Lu, Dai Yin; Kudchadkar, Shibani M.; Greenland, Gabriela Villarreal; Lingamaneni, Prasanth; Corona Villalobos, Celia; Guan, Yufan; Marine, Joseph; Olgin, Jeffrey E.; Zimmerman, Stefan; Abraham, Theodore P.; Shatkay, Hagit; Abraham, Maria Roselle.

In: American Journal of Cardiology, 01.01.2019.

@article{32d27a656a8d4929843922877765851b,
title = "Identifying Ventricular Arrhythmias and Their Predictors by Applying Machine Learning Methods to Electronic Health Records in Patients With Hypertrophic Cardiomyopathy (HCM-VAr-Risk Model)",
author = "Moumita Bhattacharya and Lu, {Dai Yin} and Kudchadkar, {Shibani M.} and Greenland, {Gabriela Villarreal} and Prasanth Lingamaneni and {Corona Villalobos}, Celia and Yufan Guan and Joseph Marine and Olgin, {Jeffrey E.} and Stefan Zimmerman and Abraham, {Theodore P.} and Hagit Shatkay and Abraham, {Maria Roselle}",
year = "2019",
month = "1",
day = "1",
doi = "10.1016/j.amjcard.2019.02.022",
language = "English (US)",
journal = "American Journal of Cardiology",
issn = "0002-9149",
publisher = "Elsevier Inc.",

}

TY - JOUR

T1 - Identifying Ventricular Arrhythmias and Their Predictors by Applying Machine Learning Methods to Electronic Health Records in Patients With Hypertrophic Cardiomyopathy (HCM-VAr-Risk Model)

AU - Bhattacharya, Moumita

AU - Lu, Dai Yin

AU - Kudchadkar, Shibani M.

AU - Greenland, Gabriela Villarreal

AU - Lingamaneni, Prasanth

AU - Corona Villalobos, Celia

AU - Guan, Yufan

AU - Marine, Joseph

AU - Olgin, Jeffrey E.

AU - Zimmerman, Stefan

AU - Abraham, Theodore P.

AU - Shatkay, Hagit

AU - Abraham, Maria Roselle

PY - 2019/1/1

Y1 - 2019/1/1

UR - http://www.scopus.com/inward/record.url?scp=85063739229&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85063739229&partnerID=8YFLogxK

U2 - 10.1016/j.amjcard.2019.02.022

DO - 10.1016/j.amjcard.2019.02.022

M3 - Article

C2 - 30952382

AN - SCOPUS:85063739229

JO - American Journal of Cardiology

JF - American Journal of Cardiology

SN - 0002-9149

ER -