IntAPT: Integrated assembly of phenotype-specific transcripts from multiple RNA-seq profiles

Xu Shi; Andrew F. Neuwald; Xiao Wang; Tian Li Wang; Leena Hilakivi-Clarke; Robert Clarke; Jianhua Xuan

doi:10.1093/bioinformatics/btaa852

School of Medicine

Research output: Contribution to journal › Article › peer-review

Abstract

Motivation: High-throughput RNA sequencing has revolutionized the scope and depth of transcriptome analysis. Accurate reconstruction of a phenotype-specific transcriptome is challenging due to the noise and variability of RNA-seq data. This requires computational identification of transcripts from multiple samples of the same phenotype, given the underlying consensus transcript structure. Results: We present a Bayesian method, integrated assembly of phenotype-specific transcripts (IntAPT), that identifies phenotype-specific isoforms from multiple RNA-seq profiles. IntAPT features a novel two-layer Bayesian model to capture the presence of isoforms at the group layer and to quantify the abundance of isoforms at the sample layer. A spike-and-slab prior is used to model the isoform expression and to enforce the sparsity of expressed isoforms. Dependencies between the existence of isoforms and their expression are modeled explicitly to facilitate parameter estimation. Model parameters are estimated iteratively using Gibbs sampling to infer the joint posterior distribution, from which the presence and abundance of isoforms can reliably be determined. Studies using both simulations and real datasets show that IntAPT consistently outperforms existing methods for assembling phenotype-specific transcripts. Experimental results demonstrate that, despite sequencing errors, IntAPT exhibits robust performance across multiple samples, resulting in notably improved identification of expressed isoforms of low abundance.
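The two-layer structure named in the abstract can be pictured with a generic, textbook-style spike-and-slab hierarchy. The abstract does not give the paper's actual distributions, so the block below is only an illustrative sketch: the symbols z_k (group-level presence indicator for isoform k), \beta_{k,s} (abundance of isoform k in sample s), \pi (prior inclusion probability), p_slab (the slab density), and y_s (observed data for sample s) are all assumptions introduced here, not notation taken from the article.

```latex
% Generic spike-and-slab sketch; every symbol here is an illustrative assumption,
% not the notation or the exact model of the IntAPT paper.
\begin{align*}
z_k &\sim \mathrm{Bernoulli}(\pi)
  && \text{group layer: is isoform } k \text{ present?} \\
\beta_{k,s} \mid z_k &\sim (1 - z_k)\,\delta_0 + z_k\, p_{\mathrm{slab}}(\beta_{k,s})
  && \text{sample layer: abundance in sample } s \\
y_s \mid \beta_{\cdot,s} &\sim p\bigl(y_s \mid \beta_{1,s}, \dots, \beta_{K,s}\bigr)
  && \text{observed data for sample } s
\end{align*}
```

Under a hierarchy of this kind, a Gibbs sampler would alternate between drawing each z_k from its conditional distribution given the abundances and the data, and drawing each \beta_{k,s} given z_k and the data. The point mass \delta_0 forces the abundance to zero whenever z_k = 0, which is how such a prior enforces sparsity.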

Original language	English (US)
Pages (from-to)	650-658
Number of pages	9
Journal	Bioinformatics
Volume	37
Issue number	5
DOIs	https://doi.org/10.1093/bioinformatics/btaa852
State	Published - Mar 1 2021

ASJC Scopus subject areas

Statistics and Probability
Biochemistry
Molecular Biology
Computer Science Applications
Computational Theory and Mathematics
Computational Mathematics

Cite this

@article{73b18008ee17427682009cc30b48142a,
  title = "IntAPT: Integrated assembly of phenotype-specific transcripts from multiple RNA-seq profiles",
  abstract = "Motivation: High-throughput RNA sequencing has revolutionized the scope and depth of transcriptome analysis. Accurate reconstruction of a phenotype-specific transcriptome is challenging due to the noise and variability of RNA-seq data. This requires computational identification of transcripts from multiple samples of the same phenotype, given the underlying consensus transcript structure. Results: We present a Bayesian method, integrated assembly of phenotype-specific transcripts (IntAPT), that identifies phenotype-specific isoforms from multiple RNA-seq profiles. IntAPT features a novel two-layer Bayesian model to capture the presence of isoforms at the group layer and to quantify the abundance of isoforms at the sample layer. A spike-and-slab prior is used to model the isoform expression and to enforce the sparsity of expressed isoforms. Dependencies between the existence of isoforms and their expression are modeled explicitly to facilitate parameter estimation. Model parameters are estimated iteratively using Gibbs sampling to infer the joint posterior distribution, from which the presence and abundance of isoforms can reliably be determined. Studies using both simulations and real datasets show that IntAPT consistently outperforms existing methods for assembling phenotype-specific transcripts. Experimental results demonstrate that, despite sequencing errors, IntAPT exhibits robust performance across multiple samples, resulting in notably improved identification of expressed isoforms of low abundance.",
  author = "Xu Shi and Neuwald, {Andrew F.} and Xiao Wang and Wang, {Tian Li} and Leena Hilakivi-Clarke and Robert Clarke and Jianhua Xuan",
  year = "2021",
  month = mar,
  day = "1",
  doi = "10.1093/bioinformatics/btaa852",
  language = "English (US)",
  volume = "37",
  pages = "650--658",
  journal = "Bioinformatics",
  issn = "1367-4803",
  publisher = "Oxford University Press",
  number = "5",
}

TY  - JOUR
T1  - IntAPT
T2  - Integrated assembly of phenotype-specific transcripts from multiple RNA-seq profiles
AU  - Shi, Xu
AU  - Neuwald, Andrew F.
AU  - Wang, Xiao
AU  - Wang, Tian Li
AU  - Hilakivi-Clarke, Leena
AU  - Clarke, Robert
AU  - Xuan, Jianhua
PY  - 2021/3/1
Y1  - 2021/3/1
AB  - Motivation: High-throughput RNA sequencing has revolutionized the scope and depth of transcriptome analysis. Accurate reconstruction of a phenotype-specific transcriptome is challenging due to the noise and variability of RNA-seq data. This requires computational identification of transcripts from multiple samples of the same phenotype, given the underlying consensus transcript structure. Results: We present a Bayesian method, integrated assembly of phenotype-specific transcripts (IntAPT), that identifies phenotype-specific isoforms from multiple RNA-seq profiles. IntAPT features a novel two-layer Bayesian model to capture the presence of isoforms at the group layer and to quantify the abundance of isoforms at the sample layer. A spike-and-slab prior is used to model the isoform expression and to enforce the sparsity of expressed isoforms. Dependencies between the existence of isoforms and their expression are modeled explicitly to facilitate parameter estimation. Model parameters are estimated iteratively using Gibbs sampling to infer the joint posterior distribution, from which the presence and abundance of isoforms can reliably be determined. Studies using both simulations and real datasets show that IntAPT consistently outperforms existing methods for assembling phenotype-specific transcripts. Experimental results demonstrate that, despite sequencing errors, IntAPT exhibits robust performance across multiple samples, resulting in notably improved identification of expressed isoforms of low abundance.
UR  - http://www.scopus.com/inward/record.url?scp=85106069205&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85106069205&partnerID=8YFLogxK
U2  - 10.1093/bioinformatics/btaa852
DO  - 10.1093/bioinformatics/btaa852
M3  - Article
C2  - 33016988
AN  - SCOPUS:85106069205
SN  - 1367-4803
VL  - 37
SP  - 650
EP  - 658
JO  - Bioinformatics
JF  - Bioinformatics
IS  - 5
ER  -
