TY - GEN
T1 - Influence function of multiple kernel canonical analysis to identify outliers in imaging genetics data
AU - Alam, Md Ashad
AU - Calhoun, Vince
AU - Wang, Yu Ping
N1 - Funding Information:
We would like to thank the reviewers for their careful reading of the manuscript and their useful comments. The authors also wish to thank the NIH (R01 GM109068, R01 MH104680) and NSF (1539067) for support.
Publisher Copyright:
© 2016 ACM.
PY - 2016/10/2
Y1 - 2016/10/2
N2 - Imaging genetics research has largely focused on discovering unique and co-association effects, while typically neglecting to identify outliers or atypical objects in genetic as well as non-genetic variables. Identifying significant outliers is an essential and challenging issue for imaging genetics and multi-source data analysis. Therefore, identified outliers need to be examined for transcription errors. First, we address the influence function (IF) of the kernel mean element, kernel covariance operator, kernel cross-covariance operator, kernel canonical correlation analysis (kernel CCA), and multiple kernel CCA. Second, we propose an IF of multiple kernel CCA, which can be applied to more than two datasets. Third, we propose a visualization method to detect influential observations in multiple sources of data based on the IF of kernel CCA and multiple kernel CCA. Finally, the proposed methods are capable of analyzing outliers among subjects usually found in biomedical applications, in which the number of dimensions is large. To examine the outliers, we use the stem-and-leaf display. Experiments on both synthesized and imaging genetics data (e.g., SNP, fMRI, and DNA methylation) demonstrate that the proposed visualization can be applied effectively.
AB - Imaging genetics research has largely focused on discovering unique and co-association effects, while typically neglecting to identify outliers or atypical objects in genetic as well as non-genetic variables. Identifying significant outliers is an essential and challenging issue for imaging genetics and multi-source data analysis. Therefore, identified outliers need to be examined for transcription errors. First, we address the influence function (IF) of the kernel mean element, kernel covariance operator, kernel cross-covariance operator, kernel canonical correlation analysis (kernel CCA), and multiple kernel CCA. Second, we propose an IF of multiple kernel CCA, which can be applied to more than two datasets. Third, we propose a visualization method to detect influential observations in multiple sources of data based on the IF of kernel CCA and multiple kernel CCA. Finally, the proposed methods are capable of analyzing outliers among subjects usually found in biomedical applications, in which the number of dimensions is large. To examine the outliers, we use the stem-and-leaf display. Experiments on both synthesized and imaging genetics data (e.g., SNP, fMRI, and DNA methylation) demonstrate that the proposed visualization can be applied effectively.
KW - Data integration
KW - Influence function
KW - Kernel CCA
KW - Multiple kernel CCA
KW - Outlier detection in imaging genetics
UR - http://www.scopus.com/inward/record.url?scp=85009754094&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85009754094&partnerID=8YFLogxK
U2 - 10.1145/2975167.2975189
DO - 10.1145/2975167.2975189
M3 - Conference contribution
AN - SCOPUS:85009754094
T3 - ACM-BCB 2016 - 7th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics
SP - 210
EP - 219
BT - ACM-BCB 2016 - 7th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics
PB - Association for Computing Machinery, Inc
T2 - 7th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics, ACM-BCB 2016
Y2 - 2 October 2016 through 5 October 2016
ER -