A multi-sample approach increases the accuracy of transcript assembly

Li Song, Sarven Sabunciyan, Guangyu Yang, Liliana Florea

Research output: Contribution to journal › Article

Abstract

Transcript assembly from RNA-seq reads is a critical step in gene expression and subsequent functional analyses. Here we present PsiCLASS, an accurate and efficient transcript assembler based on an approach that simultaneously analyzes multiple RNA-seq samples. PsiCLASS combines mixture statistical models for exonic feature selection across multiple samples with splice graph based dynamic programming algorithms and a weighted voting scheme for transcript selection. PsiCLASS achieves significantly better sensitivity-precision tradeoff, and renders precision up to 2-3 fold higher than the StringTie system and Scallop plus TACO, the two best current approaches. PsiCLASS is efficient and scalable, assembling 667 GEUVADIS samples in 9 h, and has robust accuracy with large numbers of samples.

Original language: English (US)
Article number: 5000
Journal: Nature Communications
Volume: 10
Issue number: 1
DOI: 10.1038/s41467-019-12990-0
State: Published - Dec 1 2019

Fingerprint

assembly
RNA
Pectinidae
Statistical Models
Politics
Dynamic programming
Gene expression
Feature extraction
voting
tradeoffs
assembling
sensitivity

ASJC Scopus subject areas

  • Chemistry(all)
  • Biochemistry, Genetics and Molecular Biology(all)
  • Physics and Astronomy(all)

Cite this

A multi-sample approach increases the accuracy of transcript assembly. / Song, Li; Sabunciyan, Sarven; Yang, Guangyu; Florea, Liliana.

In: Nature Communications, Vol. 10, No. 1, 5000, 01.12.2019.

@article{e64b4a607b494655b6e23ba5bebcee15,
title = "A multi-sample approach increases the accuracy of transcript assembly",
abstract = "Transcript assembly from RNA-seq reads is a critical step in gene expression and subsequent functional analyses. Here we present PsiCLASS, an accurate and efficient transcript assembler based on an approach that simultaneously analyzes multiple RNA-seq samples. PsiCLASS combines mixture statistical models for exonic feature selection across multiple samples with splice graph based dynamic programming algorithms and a weighted voting scheme for transcript selection. PsiCLASS achieves significantly better sensitivity-precision tradeoff, and renders precision up to 2-3 fold higher than the StringTie system and Scallop plus TACO, the two best current approaches. PsiCLASS is efficient and scalable, assembling 667 GEUVADIS samples in 9 h, and has robust accuracy with large numbers of samples.",
author = "Li Song and Sarven Sabunciyan and Guangyu Yang and Liliana Florea",
year = "2019",
month = "12",
day = "1",
doi = "10.1038/s41467-019-12990-0",
language = "English (US)",
volume = "10",
journal = "Nature Communications",
issn = "2041-1723",
publisher = "Nature Publishing Group",
number = "1",
}

TY - JOUR

T1 - A multi-sample approach increases the accuracy of transcript assembly

AU - Song, Li

AU - Sabunciyan, Sarven

AU - Yang, Guangyu

AU - Florea, Liliana

PY - 2019/12/1

Y1 - 2019/12/1

N2 - Transcript assembly from RNA-seq reads is a critical step in gene expression and subsequent functional analyses. Here we present PsiCLASS, an accurate and efficient transcript assembler based on an approach that simultaneously analyzes multiple RNA-seq samples. PsiCLASS combines mixture statistical models for exonic feature selection across multiple samples with splice graph based dynamic programming algorithms and a weighted voting scheme for transcript selection. PsiCLASS achieves significantly better sensitivity-precision tradeoff, and renders precision up to 2-3 fold higher than the StringTie system and Scallop plus TACO, the two best current approaches. PsiCLASS is efficient and scalable, assembling 667 GEUVADIS samples in 9 h, and has robust accuracy with large numbers of samples.

UR - http://www.scopus.com/inward/record.url?scp=85074297526&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85074297526&partnerID=8YFLogxK

U2 - 10.1038/s41467-019-12990-0

DO - 10.1038/s41467-019-12990-0

M3 - Article

C2 - 31676772

AN - SCOPUS:85074297526

VL - 10

JO - Nature Communications

JF - Nature Communications

SN - 2041-1723

IS - 1

M1 - 5000

ER -