TY - JOUR
T1 - Digitizing omics profiles by divergence from a baseline
AU - Dinalankara, Wikum
AU - Ke, Qian
AU - Xu, Yiran
AU - Ji, Lanlan
AU - Pagane, Nicole
AU - Lien, Anching
AU - Matam, Tejasvi
AU - Fertig, Elana J.
AU - Price, Nathan
AU - Younes, Laurent
AU - Marchionni, Luigi
AU - Geman, Donald
N1 - Funding Information:
We thank Krastan Blagoev, Eddie Luidy-Imada, and Francisco Pereira-Lobo for helpful discussions. This research was supported by NIH National Cancer Institute Grant R01CA200859.
Publisher Copyright:
© 2018 National Academy of Sciences. All rights reserved.
PY - 2018/5/1
Y1 - 2018/5/1
AB - Data collected from omics technologies have revealed pervasive heterogeneity and stochasticity of molecular states within and between phenotypes. A prominent example of such heterogeneity occurs between genome-wide mRNA, microRNA, and methylation profiles from one individual tumor to another, even within a cancer subtype. However, current methods in bioinformatics, such as detecting differentially expressed genes or CpG sites, are population-based and therefore do not effectively model intersample diversity. Here we introduce a unified theory to quantify sample-level heterogeneity that is applicable to a single omics profile. Specifically, we simplify an omics profile to a digital representation based on the omics profiles from a set of samples from a reference or baseline population (e.g., normal tissues). The state of any subprofile (e.g., expression vector for a subset of genes) is said to be “divergent” if it lies outside the estimated support of the baseline distribution and is consequently interpreted as “dysregulated” relative to that baseline. We focus on two cases: single features (e.g., individual genes) and distinguished subsets (e.g., regulatory pathways). Notably, since the divergence analysis is at the individual sample level, dysregulation can be analyzed probabilistically; for example, one can estimate the probability that a gene or pathway is divergent in some population. Finally, the reduction in complexity facilitates a more “personalized” and biologically interpretable analysis of variation, as illustrated by experiments involving tissue characterization, disease detection and progression, and disease–pathway associations.
KW - Cancer
KW - Digitization
KW - Dysregulation
KW - Precision medicine
KW - Stochasticity
UR - http://www.scopus.com/inward/record.url?scp=85046283176&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85046283176&partnerID=8YFLogxK
U2 - 10.1073/pnas.1721628115
DO - 10.1073/pnas.1721628115
M3 - Article
C2 - 29666255
AN - SCOPUS:85046283176
SN - 0027-8424
VL - 115
SP - 4545
EP - 4552
JO - Proceedings of the National Academy of Sciences of the United States of America
JF - Proceedings of the National Academy of Sciences of the United States of America
IS - 18
ER -