TY - JOUR
T1 - Optimized Combination of Multiple Graphs with Application to the Integration of Brain Imaging and (epi)Genomics Data
AU - Bai, Yuntong
AU - Pascal, Zille
AU - Calhoun, Vince
AU - Wang, Yu Ping
N1 - Funding Information:
Manuscript received September 5, 2019; revised November 26, 2019; accepted December 3, 2019. Date of publication December 6, 2019; date of current version June 1, 2020. This work was supported in part by NIH under Grant P20GM103472, Grant R01EB005846, Grant R01GM109068, Grant R01MH104680, Grant R01MH107354, and Grant R01MH094524 and in part by NSF under Grant #1539067. (Corresponding author: Yuntong Bai.) Y. Bai, Z. Pascal, and Y.-P. Wang are with the Biomedical Engineering Department, Tulane University, New Orleans, LA 70118 USA (e-mail: wyp@tulane.edu; ybai1@tulane.edu).
Publisher Copyright:
© 1982-2012 IEEE.
PY - 2020/6
Y1 - 2020/6
N2 - With the rapid development of high-throughput technologies, a growing amount of multi-omics data is being collected, giving rise to a great demand for combining such data for biomedical discovery. Due to the cost and time of labelling data manually, the number of labelled samples is limited. This motivates the need for semi-supervised learning algorithms. In this work, we applied graph-based semi-supervised learning (GSSL) to classify a severe chronic mental disorder, schizophrenia (SZ). An advantage of GSSL is that it can simultaneously analyse more than two types of data, while many existing models focus on pairwise data analysis. In particular, we applied GSSL to the analysis of single nucleotide polymorphism (SNP), functional magnetic resonance imaging (fMRI) and DNA methylation data, which account for genetics, brain imaging (endophenotypes), and environmental factors (epigenomics), respectively. While parameter selection has been an open challenge for most models, another key contribution of this work is that we explored the parameter space to interpret the meaning of each hyper-parameter and established practical guidelines. Based on the practical significance of each hyper-parameter, a relatively small range of candidate values can be determined in a data-driven way to both optimize and speed up the parameter tuning process. We validated the model on both synthetic data and a real SZ dataset of 184 subjects from the Mental Illness and Neuroscience Discovery (MIND) Clinical Imaging Consortium. In comparison to several existing approaches, our algorithm achieved better classification accuracy. We also confirmed the significance of several brain regions associated with SZ.
AB - With the rapid development of high-throughput technologies, a growing amount of multi-omics data is being collected, giving rise to a great demand for combining such data for biomedical discovery. Due to the cost and time of labelling data manually, the number of labelled samples is limited. This motivates the need for semi-supervised learning algorithms. In this work, we applied graph-based semi-supervised learning (GSSL) to classify a severe chronic mental disorder, schizophrenia (SZ). An advantage of GSSL is that it can simultaneously analyse more than two types of data, while many existing models focus on pairwise data analysis. In particular, we applied GSSL to the analysis of single nucleotide polymorphism (SNP), functional magnetic resonance imaging (fMRI) and DNA methylation data, which account for genetics, brain imaging (endophenotypes), and environmental factors (epigenomics), respectively. While parameter selection has been an open challenge for most models, another key contribution of this work is that we explored the parameter space to interpret the meaning of each hyper-parameter and established practical guidelines. Based on the practical significance of each hyper-parameter, a relatively small range of candidate values can be determined in a data-driven way to both optimize and speed up the parameter tuning process. We validated the model on both synthetic data and a real SZ dataset of 184 subjects from the Mental Illness and Neuroscience Discovery (MIND) Clinical Imaging Consortium. In comparison to several existing approaches, our algorithm achieved better classification accuracy. We also confirmed the significance of several brain regions associated with SZ.
KW - Multi-view learning
KW - graph-based analysis
KW - parameter selection
KW - schizophrenia
UR - http://www.scopus.com/inward/record.url?scp=85085905188&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85085905188&partnerID=8YFLogxK
U2 - 10.1109/TMI.2019.2958256
DO - 10.1109/TMI.2019.2958256
M3 - Article
C2 - 31825864
AN - SCOPUS:85085905188
SN - 0278-0062
VL - 39
SP - 1801
EP - 1811
JO - IEEE transactions on medical imaging
JF - IEEE transactions on medical imaging
IS - 6
M1 - 8926394
ER -