Mathematical modelling of transcriptional heterogeneity identifies novel markers and subpopulations in complex tissues

Niya Wang; Eric P. Hoffman; Lulu Chen; Li Chen; Zhen Zhang; Chunyu Liu; Guoqiang Yu; David M. Herrington; Robert Clarke; Yue Wang

doi:10.1038/srep18909

School of Medicine

Research output: Contribution to journal › Article › peer-review

25 Scopus citations

Abstract

Tissue heterogeneity is both a major confounding factor and an underexploited information source. While a handful of reports have demonstrated the potential of supervised computational methods to deconvolute tissue heterogeneity, these approaches require a priori information on the marker genes or composition of known subpopulations. To address the critical problem of the absence of validated marker genes for many (including novel) subpopulations, we describe convex analysis of mixtures (CAM), a fully unsupervised in silico method, for identifying subpopulation marker genes directly from the original mixed gene expressions in scatter space that can improve molecular analyses in many biological contexts. Validated with predesigned mixtures, CAM on the gene expression data from peripheral leukocytes, brain tissue, and yeast cell cycle, revealed novel marker genes that were otherwise undetectable using existing methods. Importantly, CAM requires no a priori information on the number, identity, or composition of the subpopulations present in mixed samples, and does not require the presence of pure subpopulations in sample space. This advantage is significant in that CAM can achieve all of its goals using only a small number of heterogeneous samples, and is more powerful to distinguish between phenotypically similar subpopulations.
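The geometric idea behind working "in scatter space" can be illustrated with a toy example. The sketch below is not the authors' CAM implementation. It assumes a noise-free linear mixing model X = A S (mixed expression = proportions times subpopulation expression) and a per-gene normalization so that each gene's values across samples sum to 1. Both are assumptions made for illustration, as are all names and numbers. Under these assumptions, a gene expressed in only one subpopulation (a marker gene) lands exactly on a vertex given by the normalized column of A, while every other gene falls inside the convex hull of those vertices.

```python
# Toy illustration only (hypothetical data, not the published CAM code):
# under X = A S, marker genes sit at the vertices of the convex hull
# formed by the normalized gene vectors.

def mix(A, S):
    """Return X = A S for A (samples x subpopulations), S (subpopulations x genes)."""
    n_samples, n_sub, n_genes = len(A), len(S), len(S[0])
    return [[sum(A[i][k] * S[k][j] for k in range(n_sub))
             for j in range(n_genes)]
            for i in range(n_samples)]

def scatter_points(M):
    """Normalize each column of M so that its entries sum to 1."""
    n_rows, n_cols = len(M), len(M[0])
    points = []
    for j in range(n_cols):
        col = [M[i][j] for i in range(n_rows)]
        total = sum(col)
        points.append([v / total for v in col])
    return points

# Assumed mixing proportions: 3 heterogeneous samples, 2 subpopulations.
A = [[0.8, 0.2],
     [0.5, 0.5],
     [0.1, 0.9]]

# Assumed subpopulation expression: gene 0 is a marker of subpopulation 1,
# gene 1 is a marker of subpopulation 2, genes 2 and 3 are shared.
S = [[10.0, 0.0, 4.0, 6.0],
     [0.0, 5.0, 4.0, 2.0]]

X = mix(A, S)                  # observed mixed expression (samples x genes)
points = scatter_points(X)     # one normalized point per gene
vertices = scatter_points(A)   # one vertex per subpopulation

for gene, point in enumerate(points):
    print(gene, [round(v, 4) for v in point])
print("vertices", [[round(v, 4) for v in vertex] for vertex in vertices])
```

In this toy setting the points for genes 0 and 1 coincide with the two vertices, and genes 2 and 3 lie on the segment between them. With real data, noise and the unknown number of subpopulations make vertex detection considerably harder than this sketch suggests.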

Original language	English (US)
Article number	18909
Journal	Scientific Reports
Volume	6
DOIs	https://doi.org/10.1038/srep18909
State	Published - Jan 7 2016

ASJC Scopus subject areas

General

Cite this

@article{81123f067c9f470f852685f200478e40,

title = "Mathematical modelling of transcriptional heterogeneity identifies novel markers and subpopulations in complex tissues",

abstract = "Tissue heterogeneity is both a major confounding factor and an underexploited information source. While a handful of reports have demonstrated the potential of supervised computational methods to deconvolute tissue heterogeneity, these approaches require a priori information on the marker genes or composition of known subpopulations. To address the critical problem of the absence of validated marker genes for many (including novel) subpopulations, we describe convex analysis of mixtures (CAM), a fully unsupervised in silico method, for identifying subpopulation marker genes directly from the original mixed gene expressions in scatter space that can improve molecular analyses in many biological contexts. Validated with predesigned mixtures, CAM on the gene expression data from peripheral leukocytes, brain tissue, and yeast cell cycle, revealed novel marker genes that were otherwise undetectable using existing methods. Importantly, CAM requires no a priori information on the number, identity, or composition of the subpopulations present in mixed samples, and does not require the presence of pure subpopulations in sample space. This advantage is significant in that CAM can achieve all of its goals using only a small number of heterogeneous samples, and is more powerful to distinguish between phenotypically similar subpopulations.",

author = "Niya Wang and Hoffman, {Eric P.} and Lulu Chen and Li Chen and Zhen Zhang and Chunyu Liu and Guoqiang Yu and Herrington, {David M.} and Robert Clarke and Yue Wang",

note = "Funding Information: This work was funded in part by the National Institutes of Health under Grants NS029525, CA160036, CA184902, ES024988, CA149653, and HL111362.",

year = "2016",

month = jan,

day = "7",

doi = "10.1038/srep18909",

language = "English (US)",

volume = "6",

journal = "Scientific Reports",

issn = "2045-2322",

publisher = "Nature Publishing Group",

}

TY - JOUR

T1 - Mathematical modelling of transcriptional heterogeneity identifies novel markers and subpopulations in complex tissues

AU - Wang, Niya

AU - Hoffman, Eric P.

AU - Chen, Lulu

AU - Chen, Li

AU - Zhang, Zhen

AU - Liu, Chunyu

AU - Yu, Guoqiang

AU - Herrington, David M.

AU - Clarke, Robert

AU - Wang, Yue

N1 - Funding Information: This work was funded in part by the National Institutes of Health under Grants NS029525, CA160036, CA184902, ES024988, CA149653, and HL111362.

PY - 2016/1/7

Y1 - 2016/1/7

N2 - Tissue heterogeneity is both a major confounding factor and an underexploited information source. While a handful of reports have demonstrated the potential of supervised computational methods to deconvolute tissue heterogeneity, these approaches require a priori information on the marker genes or composition of known subpopulations. To address the critical problem of the absence of validated marker genes for many (including novel) subpopulations, we describe convex analysis of mixtures (CAM), a fully unsupervised in silico method, for identifying subpopulation marker genes directly from the original mixed gene expressions in scatter space that can improve molecular analyses in many biological contexts. Validated with predesigned mixtures, CAM on the gene expression data from peripheral leukocytes, brain tissue, and yeast cell cycle, revealed novel marker genes that were otherwise undetectable using existing methods. Importantly, CAM requires no a priori information on the number, identity, or composition of the subpopulations present in mixed samples, and does not require the presence of pure subpopulations in sample space. This advantage is significant in that CAM can achieve all of its goals using only a small number of heterogeneous samples, and is more powerful to distinguish between phenotypically similar subpopulations.

AB - Tissue heterogeneity is both a major confounding factor and an underexploited information source. While a handful of reports have demonstrated the potential of supervised computational methods to deconvolute tissue heterogeneity, these approaches require a priori information on the marker genes or composition of known subpopulations. To address the critical problem of the absence of validated marker genes for many (including novel) subpopulations, we describe convex analysis of mixtures (CAM), a fully unsupervised in silico method, for identifying subpopulation marker genes directly from the original mixed gene expressions in scatter space that can improve molecular analyses in many biological contexts. Validated with predesigned mixtures, CAM on the gene expression data from peripheral leukocytes, brain tissue, and yeast cell cycle, revealed novel marker genes that were otherwise undetectable using existing methods. Importantly, CAM requires no a priori information on the number, identity, or composition of the subpopulations present in mixed samples, and does not require the presence of pure subpopulations in sample space. This advantage is significant in that CAM can achieve all of its goals using only a small number of heterogeneous samples, and is more powerful to distinguish between phenotypically similar subpopulations.

UR - http://www.scopus.com/inward/record.url?scp=84954563283&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84954563283&partnerID=8YFLogxK

U2 - 10.1038/srep18909

DO - 10.1038/srep18909

M3 - Article

C2 - 26739359

AN - SCOPUS:84954563283

SN - 2045-2322

VL - 6

JO - Scientific Reports

JF - Scientific Reports

M1 - 18909

ER -
