TY - JOUR
T1 - Discovering and deciphering relationships across disparate data modalities
AU - Vogelstein, Joshua T.
AU - Bridgeford, Eric W.
AU - Wang, Qing
AU - Priebe, Carey E.
AU - Maggioni, Mauro
AU - Shen, Cencheng
N1 - Publisher Copyright: © 2019, eLife Sciences Publications Ltd. All rights reserved.
PY - 2019
Y1 - 2019
AB - Understanding the relationships between different properties of data, such as whether a genome or connectome has information about disease status, is increasingly important. While existing approaches can test whether two properties are related, they may require unfeasibly large sample sizes and often are not interpretable. Our approach, ‘Multiscale Graph Correlation’ (MGC), is a dependence test that juxtaposes disparate data science techniques, including k-nearest neighbors, kernel methods, and multiscale analysis. Other methods may require double or triple the number of samples to achieve the same statistical power as MGC in a benchmark suite including high-dimensional and nonlinear relationships, with dimensionality ranging from 1 to 1000. Moreover, MGC uniquely characterizes the latent geometry underlying the relationship, while maintaining computational efficiency. In real data, including brain imaging and cancer genetics, MGC detects the presence of a dependency and provides guidance for the next experiments to conduct.
UR - http://www.scopus.com/inward/record.url?scp=85061994950&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85061994950&partnerID=8YFLogxK
U2 - 10.7554/eLife.41690
DO - 10.7554/eLife.41690
M3 - Article
C2 - 30644820
AN - SCOPUS:85061994950
SN - 2050-084X
VL - 8
JO - eLife
JF - eLife
M1 - e41690
ER -