Discovering and deciphering relationships across disparate data modalities

Joshua T. Vogelstein, Eric W. Bridgeford, Qing Wang, Carey E. Priebe, Mauro Maggioni, Cencheng Shen

Research output: Contribution to journal, Article

Abstract

Understanding the relationships between different properties of data, such as whether a genome or connectome has information about disease status, is increasingly important. While existing approaches can test whether two properties are related, they may require unfeasibly large sample sizes and often are not interpretable. Our approach, 'Multiscale Graph Correlation' (MGC), is a dependence test that juxtaposes disparate data science techniques, including k-nearest neighbors, kernel methods, and multiscale analysis. Other methods may require double or triple the number of samples to achieve the same statistical power as MGC in a benchmark suite including high-dimensional and nonlinear relationships, with dimensionality ranging from 1 to 1000. Moreover, MGC uniquely characterizes the latent geometry underlying the relationship, while maintaining computational efficiency. In real data, including brain imaging and cancer genetics, MGC detects the presence of a dependency and provides guidance for the next experiments to conduct.
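As a rough illustration of the kind of dependence test the abstract describes, the sketch below runs MGC on synthetic data. It assumes SciPy 1.4 or later, which ships an MGC implementation as `scipy.stats.multiscale_graphcorr`; that function is not mentioned in this record. The toy data, sample size, noise level, and seed are hypothetical choices for illustration and are not taken from the paper's benchmark suite.

```python
import numpy as np
from scipy.stats import multiscale_graphcorr

# Hypothetical toy data: a noisy linear relationship between two
# one-dimensional properties measured on the same n samples.
rng = np.random.default_rng(0)
n = 30
x = rng.uniform(-1.0, 1.0, size=(n, 1))
y = x + 0.1 * rng.normal(size=(n, 1))

# Permutation-based MGC test. The result unpacks into the test statistic,
# the p-value, and a dictionary of additional outputs.
stat, pvalue, mgc_dict = multiscale_graphcorr(x, y, reps=1000, random_state=0)

print("MGC statistic:", stat)
print("p-value:", pvalue)

# The dictionary holds the map of local correlations over neighborhood
# sizes ("mgc_map") and the estimated optimal scale ("opt_scale"), which
# is the part related to the geometry of the relationship.
print("optimal scale:", mgc_dict["opt_scale"])
print("local correlation map shape:", mgc_dict["mgc_map"].shape)
```

A small p-value here only indicates that the test rejects independence for this toy sample; the number of permutations (`reps`) trades run time against the resolution of the p-value.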

Original language: English (US)
Journal: eLife
Volume: 8
DOIs: https://doi.org/10.7554/eLife.41690
State: Published - Jan 15 2019

Fingerprint

Computational efficiency
Brain
Connectome
Genes
Imaging techniques
Benchmarking
Geometry
Neuroimaging
Brain Neoplasms
Sample Size
Experiments
Genome
Genetics

Keywords

  • computational biology
  • data science
  • human
  • machine learning
  • neuroscience
  • statistics
  • systems biology

ASJC Scopus subject areas

  • Neuroscience(all)
  • Biochemistry, Genetics and Molecular Biology(all)
  • Immunology and Microbiology(all)

Cite this

Vogelstein, J. T., Bridgeford, E. W., Wang, Q., Priebe, C. E., Maggioni, M., & Shen, C. (2019). Discovering and deciphering relationships across disparate data modalities. eLife, 8. https://doi.org/10.7554/eLife.41690

@article{f53cf86b91744c72b588e1ce65d07455,
title = "Discovering and deciphering relationships across disparate data modalities",
abstract = "Understanding the relationships between different properties of data, such as whether a genome or connectome has information about disease status, is increasingly important. While existing approaches can test whether two properties are related, they may require unfeasibly large sample sizes and often are not interpretable. Our approach, 'Multiscale Graph Correlation' (MGC), is a dependence test that juxtaposes disparate data science techniques, including k-nearest neighbors, kernel methods, and multiscale analysis. Other methods may require double or triple the number of samples to achieve the same statistical power as MGC in a benchmark suite including high-dimensional and nonlinear relationships, with dimensionality ranging from 1 to 1000. Moreover, MGC uniquely characterizes the latent geometry underlying the relationship, while maintaining computational efficiency. In real data, including brain imaging and cancer genetics, MGC detects the presence of a dependency and provides guidance for the next experiments to conduct.",
keywords = "computational biology, data science, human, machine learning, neuroscience, statistics, systems biology",
author = "Vogelstein, {Joshua T.} and Bridgeford, {Eric W.} and Qing Wang and Priebe, {Carey E.} and Mauro Maggioni and Cencheng Shen",
year = "2019",
month = "1",
day = "15",
doi = "10.7554/eLife.41690",
language = "English (US)",
volume = "8",
journal = "eLife",
issn = "2050-084X",
publisher = "eLife Sciences Publications",

}

TY - JOUR

T1 - Discovering and deciphering relationships across disparate data modalities

AU - Vogelstein, Joshua T.

AU - Bridgeford, Eric W.

AU - Wang, Qing

AU - Priebe, Carey E.

AU - Maggioni, Mauro

AU - Shen, Cencheng

PY - 2019/1/15

Y1 - 2019/1/15

AB - Understanding the relationships between different properties of data, such as whether a genome or connectome has information about disease status, is increasingly important. While existing approaches can test whether two properties are related, they may require unfeasibly large sample sizes and often are not interpretable. Our approach, 'Multiscale Graph Correlation' (MGC), is a dependence test that juxtaposes disparate data science techniques, including k-nearest neighbors, kernel methods, and multiscale analysis. Other methods may require double or triple the number of samples to achieve the same statistical power as MGC in a benchmark suite including high-dimensional and nonlinear relationships, with dimensionality ranging from 1 to 1000. Moreover, MGC uniquely characterizes the latent geometry underlying the relationship, while maintaining computational efficiency. In real data, including brain imaging and cancer genetics, MGC detects the presence of a dependency and provides guidance for the next experiments to conduct.

KW - computational biology

KW - data science

KW - human

KW - machine learning

KW - neuroscience

KW - statistics

KW - systems biology

UR - http://www.scopus.com/inward/record.url?scp=85061994950&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85061994950&partnerID=8YFLogxK

U2 - 10.7554/eLife.41690

DO - 10.7554/eLife.41690

M3 - Article

C2 - 30644820

AN - SCOPUS:85061994950

VL - 8

JO - eLife

JF - eLife

SN - 2050-084X

ER -