TY - JOUR
T1 - Identifying outliers using multiple kernel canonical correlation analysis with application to imaging genetics
AU - Alam, Md Ashad
AU - Calhoun, Vince D.
AU - Wang, Yu Ping
N1 - Funding Information:
The authors wish to thank the NIH (R01GM109068, R01MH104680, R01MH107354, R01AR059781), and NSF (1539067) for their support.
Publisher Copyright:
© 2018 Elsevier B.V.
PY - 2018/9
Y1 - 2018/9
N2 - Identifying significant outliers or atypical objects in multimodal datasets is an essential and challenging problem in biomedical research. This problem is addressed using the influence function of multiple kernel canonical correlation analysis. First, the influence functions (IFs) of the kernel mean element, the kernel covariance operator, the kernel cross-covariance operator, and kernel canonical correlation analysis (kernel CCA) are studied. Second, an IF of multiple kernel CCA, which can be applied to multimodal datasets, is proposed. Third, a visualization method based on the IFs of kernel CCA and multiple kernel CCA is proposed to detect influential observations across multiple data sources. Finally, to validate the method, experiments are performed on both synthesized data and imaging genetics data (e.g., SNP, fMRI, and DNA methylation). To examine the outliers, both a stem-and-leaf display and a distribution-based technique are used. The performance of the proposed approach is illustrated by identifying significant regions of interest (ROIs) among 116 candidate ROIs from the fMRI data of a schizophrenia study. The proposed method and two state-of-the-art statistical methods identified 8, 34, and 10 ROIs, respectively. Based on an online database, the brain mappings of the 7 selected common ROIs indicate irregular brain regions susceptible to schizophrenia. The results demonstrate that the proposed method can analyze outliers and the influence of observations, and that it can be applied to many other types of biomedical data, which are often high-dimensional and multimodal.
KW - Imaging genetics
KW - Influence function
KW - Multimodal datasets
KW - Multiple kernel CCA
KW - Outlier detection
UR - http://www.scopus.com/inward/record.url?scp=85045432897&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85045432897&partnerID=8YFLogxK
U2 - 10.1016/j.csda.2018.03.013
DO - 10.1016/j.csda.2018.03.013
M3 - Article
AN - SCOPUS:85045432897
SN - 0167-9473
VL - 125
SP - 70
EP - 85
JO - Computational Statistics and Data Analysis
JF - Computational Statistics and Data Analysis
ER -