Mixed effect machine learning: A framework for predicting longitudinal change in hemoglobin A1c

Che Ngufor; Holly Van Houten; Brian S. Caffo; Nilay D. Shah; Rozalina G. McCoy

doi:10.1016/j.jbi.2018.09.001

Bloomberg School of Public Health

Research output: Contribution to journal › Article › peer-review

11 Scopus citations

Abstract

Accurate and reliable prediction of clinical progression over time has the potential to improve the outcomes of chronic disease. The classical approach to analyzing longitudinal data is to use (generalized) linear mixed-effect models (GLMM). However, linear parametric models are predicated on assumptions that are often difficult to verify. In contrast, data-driven machine learning methods can be applied to derive insight from the raw data without a priori assumptions. However, the underlying theory of most machine learning algorithms assumes that the data are independent and identically distributed, making them inefficient for longitudinal supervised learning. In this study, we formulate an analytic framework that integrates the random-effects structure of GLMM into non-linear machine learning models capable of exploiting the temporally heterogeneous effects and the sparse, varying-length patient characteristics inherent in longitudinal data. We applied the derived mixed-effect machine learning (MEml) framework to predict longitudinal change in glycemic control, measured by hemoglobin A1c (HbA1c), among well-controlled adults with type 2 diabetes. Results show that MEml is competitive with traditional GLMM but substantially outperforms standard machine learning models that do not account for random effects. Specifically, the accuracy of MEml in predicting glycemic change one, two, three, and four clinical visits in advance was 1.04, 1.08, 1.11, and 1.14 times that of the gradient boosted model, respectively, with similar results for the other methods. To further demonstrate the general applicability of MEml, a series of experiments on real publicly available and synthetic data sets was performed to assess accuracy and robustness. These experiments reinforced the superiority of MEml over the other methods. Overall, results from this study highlight the importance of modeling random effects in machine learning approaches based on longitudinal data.
Our MEml method is highly resistant to correlated data, readily accounts for random effects, and predicts change in a longitudinal clinical outcome in real-world clinical settings with high accuracy.
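The core idea of the framework, plugging a GLMM random-effects structure into a non-linear learner, can be illustrated with a minimal sketch. It assumes the simplest case: a Gaussian outcome with one random intercept per patient, y_ij = f(x_ij) + b_i + ε_ij, with b_i ~ N(0, σ_b²) and ε_ij ~ N(0, σ²). It alternates between fitting the non-linear part f on the outcome with random effects removed and updating b_i, σ² and σ_b² with EM-style formulas, in the spirit of mixed-effects regression trees (MERT, Hajjem, Bellavance and Larocque). This is not the authors' MEml code: the function names (`knn_learner`, `fit_mixed_effect_ml`, `predict_mixed`), the k-nearest-neighbour learner standing in for boosting or forests, and the synthetic data are all hypothetical, and the binary (logit-link) GLMM case is omitted.

```python
import numpy as np


def knn_learner(k=10):
    """Hypothetical stand-in for any non-linear learner (boosting, forests, ...).

    Returns fit(X, y) -> predict(X_query) for a plain k-nearest-neighbour regressor.
    """
    def fit(X_train, y_train):
        X_train = np.asarray(X_train, dtype=float)
        y_train = np.asarray(y_train, dtype=float).copy()

        def predict(X_query):
            X_query = np.asarray(X_query, dtype=float)
            # Squared Euclidean distance from every query row to every training row.
            d = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
            nearest = np.argsort(d, axis=1)[:, :k]
            return y_train[nearest].mean(axis=1)

        return predict

    return fit


def fit_mixed_effect_ml(X, y, groups, learner, n_iter=100, tol=1e-6):
    """Alternate a non-linear fixed-effect fit with random-intercept updates.

    Assumes a Gaussian outcome and a single random intercept per group.
    """
    y = np.asarray(y, dtype=float)
    ids, g = np.unique(groups, return_inverse=True)
    m, N = len(ids), len(y)
    n_i = np.bincount(g, minlength=m)
    b = np.zeros(m)
    sigma2 = sigma2_b = float(np.var(y))  # crude starting values
    for _ in range(n_iter):
        # 1) Fit the fixed-effect learner to the outcome minus current random effects.
        f = learner(X, y - b[g])
        r = y - f(X)
        # 2) Random-intercept BLUP: shrink each group's mean residual toward 0.
        r_bar = np.bincount(g, weights=r, minlength=m) / n_i
        b_new = n_i * sigma2_b / (sigma2 + n_i * sigma2_b) * r_bar
        # 3) EM-style variance updates using the conditional variance of b_i.
        var_b = sigma2_b * sigma2 / (sigma2 + n_i * sigma2_b)
        eps = r - b_new[g]
        sigma2 = (np.sum(eps ** 2) + np.sum(n_i * var_b)) / N
        sigma2_b = float(np.mean(b_new ** 2 + var_b))
        converged = np.max(np.abs(b_new - b)) < tol
        b = b_new
        if converged:
            break
    return f, dict(zip(ids.tolist(), b)), sigma2, sigma2_b


def predict_mixed(f, b_by_group, X, groups):
    """f(x) plus the group's random intercept; unseen groups get b = 0."""
    b = np.array([b_by_group.get(gid, 0.0) for gid in groups])
    return f(np.asarray(X, dtype=float)) + b


# Illustrative synthetic data (invented): 40 patients with 8 visits each.
rng = np.random.default_rng(0)
m, n = 40, 8
groups = np.repeat(np.arange(m), n)
X = rng.uniform(-2, 2, size=(m * n, 1))
b_true = rng.normal(0.0, 1.0, size=m)
y = 2 * np.sin(X[:, 0]) + b_true[groups] + rng.normal(0.0, 0.3, size=m * n)

f, b_hat, s2, s2b = fit_mixed_effect_ml(X, y, groups, knn_learner(k=10))
print("sigma^2 =", round(s2, 3), "sigma_b^2 =", round(s2b, 3))
```

A patient with no history falls back to the population-level prediction f(x), since their random intercept defaults to 0. Because the random-effects updates only see residuals, `knn_learner` could be swapped for any learner with the same fit/predict shape without changing steps 2 and 3.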

Original language	English (US)
Pages (from-to)	56-67
Number of pages	12
Journal	Journal of Biomedical Informatics
Volume	89
DOIs	https://doi.org/10.1016/j.jbi.2018.09.001
State	Published - Jan 2019

Keywords

Glycemic control
Glycosylated hemoglobin
Longitudinal supervised learning
Machine learning
Random-effects
Type 2 diabetes

ASJC Scopus subject areas

Computer Science Applications
Health Informatics

Cite this

@article{99fd3231f2bf4c1881396de776e322d3,

title = "Mixed effect machine learning: A framework for predicting longitudinal change in hemoglobin A1c",

abstract = "Accurate and reliable prediction of clinical progression over time has the potential to improve the outcomes of chronic disease. The classical approach to analyzing longitudinal data is to use (generalized) linear mixed-effect models (GLMM). However, linear parametric models are predicated on assumptions that are often difficult to verify. In contrast, data-driven machine learning methods can be applied to derive insight from the raw data without a priori assumptions. However, the underlying theory of most machine learning algorithms assumes that the data are independent and identically distributed, making them inefficient for longitudinal supervised learning. In this study, we formulate an analytic framework that integrates the random-effects structure of GLMM into non-linear machine learning models capable of exploiting the temporally heterogeneous effects and the sparse, varying-length patient characteristics inherent in longitudinal data. We applied the derived mixed-effect machine learning (MEml) framework to predict longitudinal change in glycemic control, measured by hemoglobin A1c (HbA1c), among well-controlled adults with type 2 diabetes. Results show that MEml is competitive with traditional GLMM but substantially outperforms standard machine learning models that do not account for random effects. Specifically, the accuracy of MEml in predicting glycemic change one, two, three, and four clinical visits in advance was 1.04, 1.08, 1.11, and 1.14 times that of the gradient boosted model, respectively, with similar results for the other methods. To further demonstrate the general applicability of MEml, a series of experiments on real publicly available and synthetic data sets was performed to assess accuracy and robustness. These experiments reinforced the superiority of MEml over the other methods. Overall, results from this study highlight the importance of modeling random effects in machine learning approaches based on longitudinal data.
Our MEml method is highly resistant to correlated data, readily accounts for random effects, and predicts change in a longitudinal clinical outcome in real-world clinical settings with high accuracy.",

keywords = "Glycemic control, Glycosylated hemoglobin, Longitudinal supervised learning, Machine learning, Random-effects, Type 2 diabetes",

author = "Ngufor, Che and {Van Houten}, Holly and Caffo, {Brian S.} and Shah, {Nilay D.} and McCoy, {Rozalina G.}",

note = "Funding Information: This work is supported by the Mayo Clinic Robert D. and Patricia E. Kern Center for the Science of Health Care Delivery (Dr. Ngufor, Dr. Shah, and Dr. McCoy) and by the National Institute of Diabetes and Digestive and Kidney Diseases of the National Institutes of Health under Award Number K23DK114497 (Dr. McCoy). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. Publisher Copyright: {\textcopyright} 2018 Elsevier Inc.",

year = "2019",

month = jan,

doi = "10.1016/j.jbi.2018.09.001",

language = "English (US)",

volume = "89",

pages = "56--67",

journal = "Journal of Biomedical Informatics",

issn = "1532-0464",

publisher = "Academic Press Inc.",

}

TY - JOUR

T1 - Mixed effect machine learning

T2 - A framework for predicting longitudinal change in hemoglobin A1c

AU - Ngufor, Che

AU - Van Houten, Holly

AU - Caffo, Brian S.

AU - Shah, Nilay D.

AU - McCoy, Rozalina G.

N1 - Funding Information: This work is supported by the Mayo Clinic Robert D. and Patricia E. Kern Center for the Science of Health Care Delivery (Dr. Ngufor, Dr. Shah, and Dr. McCoy) and by the National Institute of Diabetes and Digestive and Kidney Diseases of the National Institutes of Health under Award Number K23DK114497 (Dr. McCoy). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. Publisher Copyright: © 2018 Elsevier Inc.

PY - 2019/1

Y1 - 2019/1

N2 - Accurate and reliable prediction of clinical progression over time has the potential to improve the outcomes of chronic disease. The classical approach to analyzing longitudinal data is to use (generalized) linear mixed-effect models (GLMM). However, linear parametric models are predicated on assumptions that are often difficult to verify. In contrast, data-driven machine learning methods can be applied to derive insight from the raw data without a priori assumptions. However, the underlying theory of most machine learning algorithms assumes that the data are independent and identically distributed, making them inefficient for longitudinal supervised learning. In this study, we formulate an analytic framework that integrates the random-effects structure of GLMM into non-linear machine learning models capable of exploiting the temporally heterogeneous effects and the sparse, varying-length patient characteristics inherent in longitudinal data. We applied the derived mixed-effect machine learning (MEml) framework to predict longitudinal change in glycemic control, measured by hemoglobin A1c (HbA1c), among well-controlled adults with type 2 diabetes. Results show that MEml is competitive with traditional GLMM but substantially outperforms standard machine learning models that do not account for random effects. Specifically, the accuracy of MEml in predicting glycemic change one, two, three, and four clinical visits in advance was 1.04, 1.08, 1.11, and 1.14 times that of the gradient boosted model, respectively, with similar results for the other methods. To further demonstrate the general applicability of MEml, a series of experiments on real publicly available and synthetic data sets was performed to assess accuracy and robustness. These experiments reinforced the superiority of MEml over the other methods. Overall, results from this study highlight the importance of modeling random effects in machine learning approaches based on longitudinal data.
Our MEml method is highly resistant to correlated data, readily accounts for random effects, and predicts change in a longitudinal clinical outcome in real-world clinical settings with high accuracy.

AB - Accurate and reliable prediction of clinical progression over time has the potential to improve the outcomes of chronic disease. The classical approach to analyzing longitudinal data is to use (generalized) linear mixed-effect models (GLMM). However, linear parametric models are predicated on assumptions that are often difficult to verify. In contrast, data-driven machine learning methods can be applied to derive insight from the raw data without a priori assumptions. However, the underlying theory of most machine learning algorithms assumes that the data are independent and identically distributed, making them inefficient for longitudinal supervised learning. In this study, we formulate an analytic framework that integrates the random-effects structure of GLMM into non-linear machine learning models capable of exploiting the temporally heterogeneous effects and the sparse, varying-length patient characteristics inherent in longitudinal data. We applied the derived mixed-effect machine learning (MEml) framework to predict longitudinal change in glycemic control, measured by hemoglobin A1c (HbA1c), among well-controlled adults with type 2 diabetes. Results show that MEml is competitive with traditional GLMM but substantially outperforms standard machine learning models that do not account for random effects. Specifically, the accuracy of MEml in predicting glycemic change one, two, three, and four clinical visits in advance was 1.04, 1.08, 1.11, and 1.14 times that of the gradient boosted model, respectively, with similar results for the other methods. To further demonstrate the general applicability of MEml, a series of experiments on real publicly available and synthetic data sets was performed to assess accuracy and robustness. These experiments reinforced the superiority of MEml over the other methods. Overall, results from this study highlight the importance of modeling random effects in machine learning approaches based on longitudinal data.
Our MEml method is highly resistant to correlated data, readily accounts for random effects, and predicts change in a longitudinal clinical outcome in real-world clinical settings with high accuracy.

KW - Glycemic control

KW - Glycosylated hemoglobin

KW - Longitudinal supervised learning

KW - Machine learning

KW - Random-effects

KW - Type 2 diabetes

UR - http://www.scopus.com/inward/record.url?scp=85057830461&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85057830461&partnerID=8YFLogxK

U2 - 10.1016/j.jbi.2018.09.001

DO - 10.1016/j.jbi.2018.09.001

M3 - Article

C2 - 30189255

AN - SCOPUS:85057830461

SN - 1532-0464

VL - 89

SP - 56

EP - 67

JO - Journal of Biomedical Informatics

JF - Journal of Biomedical Informatics

ER -
