Machine Learning Methods for Identifying Atrial Fibrillation Cases and Their Predictors in Patients With Hypertrophic Cardiomyopathy: The HCM-AF-Risk Model

Moumita Bhattacharya; Dai Yin Lu; Ioannis Ventoulis; Gabriela V. Greenland; Hulya Yalcin; Yufan Guan; Joseph E. Marine; Jeffrey E. Olgin; Stefan L. Zimmerman; Theodore P. Abraham; M. Roselle Abraham; Hagit Shatkay

doi:10.1016/j.cjco.2021.01.016

School of Medicine

Research output: Contribution to journal › Article › peer-review

Abstract

Background: Hypertrophic cardiomyopathy (HCM) patients have a high incidence of atrial fibrillation (AF) and increased stroke risk, even with low CHA₂DS₂-VASc (congestive heart failure, hypertension, age, diabetes, previous stroke/transient ischemic attack) scores. Hence, there is a need to understand the pathophysiology of AF/stroke in HCM. In this retrospective study, we develop and apply a data-driven, machine learning–based method to identify AF cases, and clinical/imaging features associated with AF, using electronic health record data.

Methods: HCM patients with documented paroxysmal/persistent/permanent AF (n = 191) were considered AF cases, and the remaining patients in sinus rhythm (n = 640) were tagged as No-AF. We evaluated 93 clinical variables; the most informative variables useful for distinguishing AF from No-AF cases were selected based on the 2-sample t test and the information gain criterion.

Results: We identified 18 highly informative variables that are positively (n = 11) and negatively (n = 7) correlated with AF in HCM. Next, patient records were represented via these 18 variables. Data imbalance resulting from the relatively low number of AF cases was addressed via a combination of oversampling and undersampling strategies. We trained and tested multiple classifiers under this sampling approach, showing effective classification. Specifically, an ensemble of logistic regression and naïve Bayes classifiers, trained based on the 18 variables and corrected for data imbalance, proved most effective for separating AF from No-AF cases (sensitivity = 0.74, specificity = 0.70, C-index = 0.80).

Conclusions: Our model (HCM-AF-Risk Model) is the first machine learning–based method for identification of AF cases in HCM. This model demonstrates good performance, addresses data imbalance, and suggests that AF is associated with a more severe cardiac HCM phenotype.
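The abstract reports three evaluation measures: sensitivity, specificity, and C-index. The sketch below shows how these quantities are conventionally computed for a binary AF / No-AF classifier. It is an illustration only, not the authors' code: the function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made up for this example. For a binary outcome the C-index is taken here as the fraction of (AF, No-AF) patient pairs in which the AF patient receives the higher score, with ties counted as half.

```python
from itertools import product


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels (1 = AF, 0 = No-AF)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # AF correctly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # AF missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # No-AF correctly cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # No-AF wrongly flagged
    return tp / (tp + fn), tn / (tn + fp)


def c_index(y_true, scores):
    """Fraction of (AF, No-AF) pairs ranked correctly; ties count as half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    concordant = 0.0
    for p, n in product(pos, neg):
        if p > n:
            concordant += 1.0
        elif p == n:
            concordant += 0.5
    return concordant / (len(pos) * len(neg))


# Toy labels and classifier scores, invented for illustration (not study data).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.35, 0.2, 0.1, 0.05]

# Hypothetical decision threshold of 0.5 turns scores into predicted labels.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

sens, spec = sensitivity_specificity(y_true, y_pred)
auc = c_index(y_true, scores)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} C-index={auc:.2f}")
```

Sensitivity and specificity depend on the chosen threshold, whereas the C-index depends only on how the scores rank patients, which is why the three numbers are reported together.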

Original language	English (US)
Pages (from-to)	801-813
Number of pages	13
Journal	CJC Open
Volume	3
Issue number	6
DOIs	https://doi.org/10.1016/j.cjco.2021.01.016
State	Published - Jun 2021

ASJC Scopus subject areas

Cardiology and Cardiovascular Medicine

Cite this

Bhattacharya, M., Lu, D. Y., Ventoulis, I., Greenland, G. V., Yalcin, H., Guan, Y., Marine, J. E., Olgin, J. E., Zimmerman, S. L., Abraham, T. P., Abraham, M. R., & Shatkay, H. (2021). Machine Learning Methods for Identifying Atrial Fibrillation Cases and Their Predictors in Patients With Hypertrophic Cardiomyopathy: The HCM-AF-Risk Model. CJC Open, 3(6), 801-813. https://doi.org/10.1016/j.cjco.2021.01.016

Bhattacharya, M, Lu, DY, Ventoulis, I, Greenland, GV, Yalcin, H, Guan, Y, Marine, JE, Olgin, JE, Zimmerman, SL, Abraham, TP, Abraham, MR & Shatkay, H 2021, 'Machine Learning Methods for Identifying Atrial Fibrillation Cases and Their Predictors in Patients With Hypertrophic Cardiomyopathy: The HCM-AF-Risk Model', CJC Open, vol. 3, no. 6, pp. 801-813. https://doi.org/10.1016/j.cjco.2021.01.016

@article{4ad23dadd5fa49468db306939c6bfd03,

title = "Machine Learning Methods for Identifying Atrial Fibrillation Cases and Their Predictors in Patients With Hypertrophic Cardiomyopathy: The HCM-AF-Risk Model",

author = "Moumita Bhattacharya and Lu, {Dai Yin} and Ioannis Ventoulis and Greenland, {Gabriela V.} and Hulya Yalcin and Yufan Guan and Marine, {Joseph E.} and Olgin, {Jeffrey E.} and Zimmerman, {Stefan L.} and Abraham, {Theodore P.} and Abraham, {M. Roselle} and Hagit Shatkay",

note = "Funding Information: This work was funded in part by the National Science Foundation (NSF) IIS EAGER grant #1650851, and the National Institutes of Health grants R01 LM012527 and U54 GM104941 (to H.S.), an award from the John Taylor Babbitt (JTB) foundation (Chatham, New Jersey), and startup funds from the University of California San Francisco, Division of Cardiology (to M.R.A.). Publisher Copyright: {\textcopyright} 2021 The Authors",

year = "2021",

month = jun,

doi = "10.1016/j.cjco.2021.01.016",

language = "English (US)",

volume = "3",

pages = "801--813",

journal = "CJC Open",

issn = "2589-790X",

publisher = "Elsevier Inc.",

number = "6",

}

TY - JOUR

T1 - Machine Learning Methods for Identifying Atrial Fibrillation Cases and Their Predictors in Patients With Hypertrophic Cardiomyopathy

T2 - The HCM-AF-Risk Model

AU - Bhattacharya, Moumita

AU - Lu, Dai Yin

AU - Ventoulis, Ioannis

AU - Greenland, Gabriela V.

AU - Yalcin, Hulya

AU - Guan, Yufan

AU - Marine, Joseph E.

AU - Olgin, Jeffrey E.

AU - Zimmerman, Stefan L.

AU - Abraham, Theodore P.

AU - Abraham, M. Roselle

AU - Shatkay, Hagit

N1 - Funding Information: This work was funded in part by the National Science Foundation (NSF) IIS EAGER grant #1650851, and the National Institutes of Health grants R01 LM012527 and U54 GM104941 (to H.S.), an award from the John Taylor Babbitt (JTB) foundation (Chatham, New Jersey), and startup funds from the University of California San Francisco, Division of Cardiology (to M.R.A.). Publisher Copyright: © 2021 The Authors

PY - 2021/6

Y1 - 2021/6

UR - http://www.scopus.com/inward/record.url?scp=85108700898&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85108700898&partnerID=8YFLogxK

U2 - 10.1016/j.cjco.2021.01.016

DO - 10.1016/j.cjco.2021.01.016

M3 - Article

C2 - 34169259

AN - SCOPUS:85108700898

SN - 2589-790X

VL - 3

SP - 801

EP - 813

JO - CJC Open

JF - CJC Open

IS - 6

ER -
