Digitizing omics profiles by divergence from a baseline

Wikum Dinalankara; Qian Ke; Yiran Xu; Lanlan Ji; Nicole Pagane; Anching Lien; Tejasvi Matam; Elana J. Fertig; Nathan Price; Laurent Younes; Luigi Marchionni; Donald Geman

doi:10.1073/pnas.1721628115

School of Medicine

Research output: Contribution to journal › Article › peer-review

7 Scopus citations

Abstract

Data collected from omics technologies have revealed pervasive heterogeneity and stochasticity of molecular states within and between phenotypes. A prominent example of such heterogeneity occurs between genome-wide mRNA, microRNA, and methylation profiles from one individual tumor to another, even within a cancer subtype. However, current methods in bioinformatics, such as detecting differentially expressed genes or CpG sites, are population-based and therefore do not effectively model intersample diversity. Here we introduce a unified theory to quantify sample-level heterogeneity that is applicable to a single omics profile. Specifically, we simplify an omics profile to a digital representation based on the omics profiles from a set of samples from a reference or baseline population (e.g., normal tissues). The state of any subprofile (e.g., expression vector for a subset of genes) is said to be “divergent” if it lies outside the estimated support of the baseline distribution and is consequently interpreted as “dysregulated” relative to that baseline. We focus on two cases: single features (e.g., individual genes) and distinguished subsets (e.g., regulatory pathways). Notably, since the divergence analysis is at the individual sample level, dysregulation can be analyzed probabilistically; for example, one can estimate the probability that a gene or pathway is divergent in some population. Finally, the reduction in complexity facilitates a more “personalized” and biologically interpretable analysis of variation, as illustrated by experiments involving tissue characterization, disease detection and progression, and disease–pathway associations.
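The single-feature case described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how the baseline support is estimated, so the per-feature quantile interval used here is an assumption, and the function names `baseline_support`, `digitize`, and `divergence_probability` are hypothetical. The toy values are made up for illustration only.

```python
import numpy as np


def baseline_support(baseline, lower_q=0.025, upper_q=0.975):
    """Estimate a per-feature interval from baseline samples.

    ASSUMPTION: the support is approximated by an empirical quantile
    interval; the paper's actual estimator may differ.
    `baseline` has shape (n_samples, n_features).
    """
    lo = np.quantile(baseline, lower_q, axis=0)
    hi = np.quantile(baseline, upper_q, axis=0)
    return lo, hi


def digitize(profiles, lo, hi):
    """Code each feature as -1 (below the interval), +1 (above), or 0 (inside)."""
    profiles = np.atleast_2d(profiles)
    return np.where(profiles < lo, -1, np.where(profiles > hi, 1, 0))


def divergence_probability(codes):
    """Fraction of samples in which each feature is divergent (nonzero code)."""
    return (codes != 0).mean(axis=0)


# Toy data: 3 baseline samples and 3 test samples over 2 features.
baseline = np.array([[1.0, 10.0], [2.0, 11.0], [3.0, 12.0]])
tumors = np.array([[0.5, 11.0], [2.0, 13.0], [4.0, 11.5]])

# With quantiles 0 and 1 the interval is simply the baseline min and max.
lo, hi = baseline_support(baseline, lower_q=0.0, upper_q=1.0)
codes = digitize(tumors, lo, hi)
probs = divergence_probability(codes)
```

Each row of `codes` is the digital representation of one sample, so it can be computed for a single profile once the baseline interval is fixed. Averaging the nonzero entries over a population gives the kind of per-feature divergence probability the abstract mentions.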

Original language	English (US)
Pages (from-to)	4545-4552
Number of pages	8
Journal	Proceedings of the National Academy of Sciences of the United States of America
Volume	115
Issue number	18
DOIs	https://doi.org/10.1073/pnas.1721628115
State	Published - May 1 2018

Keywords

Cancer
Digitization
Dysregulation
Precision medicine
Stochasticity

ASJC Scopus subject areas

General

Cite this

@article{0ecd21cba38743159007720f9c2d5ec2,

title = "Digitizing omics profiles by divergence from a baseline",

abstract = "Data collected from omics technologies have revealed pervasive heterogeneity and stochasticity of molecular states within and between phenotypes. A prominent example of such heterogeneity occurs between genome-wide mRNA, microRNA, and methylation profiles from one individual tumor to another, even within a cancer subtype. However, current methods in bioinformatics, such as detecting differentially expressed genes or CpG sites, are population-based and therefore do not effectively model intersample diversity. Here we introduce a unified theory to quantify sample-level heterogeneity that is applicable to a single omics profile. Specifically, we simplify an omics profile to a digital representation based on the omics profiles from a set of samples from a reference or baseline population (e.g., normal tissues). The state of any subprofile (e.g., expression vector for a subset of genes) is said to be “divergent” if it lies outside the estimated support of the baseline distribution and is consequently interpreted as “dysregulated” relative to that baseline. We focus on two cases: single features (e.g., individual genes) and distinguished subsets (e.g., regulatory pathways). Notably, since the divergence analysis is at the individual sample level, dysregulation can be analyzed probabilistically; for example, one can estimate the probability that a gene or pathway is divergent in some population. Finally, the reduction in complexity facilitates a more “personalized” and biologically interpretable analysis of variation, as illustrated by experiments involving tissue characterization, disease detection and progression, and disease–pathway associations.",

keywords = "Cancer, Digitization, Dysregulation, Precision medicine, Stochasticity",

author = "Wikum Dinalankara and Qian Ke and Yiran Xu and Lanlan Ji and Nicole Pagane and Anching Lien and Tejasvi Matam and Fertig, {Elana J.} and Nathan Price and Laurent Younes and Luigi Marchionni and Donald Geman",

note = "Funding Information: We thank Krastan Blagoev, Eddie Luidy-Imada, and Francisco Pereira-Lobo for helpful discussions. This research was supported by NIH National Cancer Institute Grant R01CA200859. Publisher Copyright: {\textcopyright} 2018 National Academy of Sciences. All rights reserved.",

year = "2018",

month = may,

day = "1",

doi = "10.1073/pnas.1721628115",

language = "English (US)",

volume = "115",

pages = "4545--4552",

journal = "Proceedings of the National Academy of Sciences of the United States of America",

issn = "0027-8424",

publisher = "National Academy of Sciences",

number = "18",

}

TY - JOUR

T1 - Digitizing omics profiles by divergence from a baseline

AU - Dinalankara, Wikum

AU - Ke, Qian

AU - Xu, Yiran

AU - Ji, Lanlan

AU - Pagane, Nicole

AU - Lien, Anching

AU - Matam, Tejasvi

AU - Fertig, Elana J.

AU - Price, Nathan

AU - Younes, Laurent

AU - Marchionni, Luigi

AU - Geman, Donald

N1 - Funding Information: We thank Krastan Blagoev, Eddie Luidy-Imada, and Francisco Pereira-Lobo for helpful discussions. This research was supported by NIH National Cancer Institute Grant R01CA200859. Publisher Copyright: © 2018 National Academy of Sciences. All rights reserved.

PY - 2018/5/1

Y1 - 2018/5/1

AB - Data collected from omics technologies have revealed pervasive heterogeneity and stochasticity of molecular states within and between phenotypes. A prominent example of such heterogeneity occurs between genome-wide mRNA, microRNA, and methylation profiles from one individual tumor to another, even within a cancer subtype. However, current methods in bioinformatics, such as detecting differentially expressed genes or CpG sites, are population-based and therefore do not effectively model intersample diversity. Here we introduce a unified theory to quantify sample-level heterogeneity that is applicable to a single omics profile. Specifically, we simplify an omics profile to a digital representation based on the omics profiles from a set of samples from a reference or baseline population (e.g., normal tissues). The state of any subprofile (e.g., expression vector for a subset of genes) is said to be “divergent” if it lies outside the estimated support of the baseline distribution and is consequently interpreted as “dysregulated” relative to that baseline. We focus on two cases: single features (e.g., individual genes) and distinguished subsets (e.g., regulatory pathways). Notably, since the divergence analysis is at the individual sample level, dysregulation can be analyzed probabilistically; for example, one can estimate the probability that a gene or pathway is divergent in some population. Finally, the reduction in complexity facilitates a more “personalized” and biologically interpretable analysis of variation, as illustrated by experiments involving tissue characterization, disease detection and progression, and disease–pathway associations.

KW - Cancer

KW - Digitization

KW - Dysregulation

KW - Precision medicine

KW - Stochasticity

UR - http://www.scopus.com/inward/record.url?scp=85046283176&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85046283176&partnerID=8YFLogxK

U2 - 10.1073/pnas.1721628115

DO - 10.1073/pnas.1721628115

M3 - Article

C2 - 29666255

AN - SCOPUS:85046283176

SN - 0027-8424

VL - 115

SP - 4545

EP - 4552

JO - Proceedings of the National Academy of Sciences of the United States of America

JF - Proceedings of the National Academy of Sciences of the United States of America

IS - 18

ER -
