CLASS: Constrained transcript assembly of RNA-seq reads

Research output: Contribution to journal (Article)

Abstract

Background: RNA-seq has revolutionized our ability to survey the cellular transcriptome in great detail. However, while several approaches have been developed, the problem of assembling the short reads into full-length transcripts remains challenging.

Results: We developed a novel algorithm and software tool, CLASS (Constraint-based Local Assembly and Selection of Splice variants), for accurately assembling splice variants using local read coverage patterns of RNA-seq reads, contiguity constraints from read pairs and spliced reads, and optionally information about gene structure extracted from cDNA sequence databases. The algorithmic underpinnings of CLASS are: i) a linear program to infer exons, ii) a compact splice graph representation of a gene and its splice variants, and iii) a transcript selection scheme that takes into account contiguity constraints and, where available, knowledge about gene structure.

Conclusion: In comparisons against leading transcript assembly programs, CLASS is more accurate on both simulated and real reads and produces results that are easier to interpret when applied to large scale real data, and therefore is a promising analysis tool for next generation sequencing data.

Availability: CLASS is available from http://sourceforge.net/projects/splicebox.
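
As a rough illustration of points ii) and iii) in the abstract, the sketch below builds a toy splice graph and picks transcripts that explain a set of contiguity constraints. It is not the CLASS implementation. The function names, the encoding of a constraint as a run of consecutive exon indices, and the greedy selection rule are all assumptions made for this example.

```python
def enumerate_transcripts(exons, introns):
    """List every source-to-sink path in a toy splice graph.

    exons   -- list of (start, end) intervals, sorted by start (hypothetical input)
    introns -- set of (i, j) exon-index pairs, i < j, one per observed splice junction
    """
    succ = {i: [] for i in range(len(exons))}
    has_pred = set()
    for i, j in sorted(introns):
        succ[i].append(j)
        has_pred.add(j)

    paths = []

    def walk(node, prefix):
        path = prefix + [node]
        if not succ[node]:          # sink exon: the path is a candidate transcript
            paths.append(tuple(path))
            return
        for nxt in succ[node]:
            walk(nxt, path)

    for source in (i for i in succ if i not in has_pred):
        walk(source, [])
    return paths


def contains_run(path, run):
    """True if the exon indices in `run` appear consecutively in `path`."""
    n = len(run)
    return any(path[k:k + n] == run for k in range(len(path) - n + 1))


def select_transcripts(paths, constraints):
    """Greedy stand-in for transcript selection (an assumption, not CLASS's scheme):
    repeatedly keep the candidate that explains the most unexplained constraints."""
    uncovered = set(constraints)
    chosen = []
    while uncovered:
        best = max(paths, key=lambda p: sum(contains_run(p, c) for c in uncovered))
        gained = {c for c in uncovered if contains_run(best, c)}
        if not gained:              # remaining constraints fit no candidate
            break
        chosen.append(best)
        uncovered -= gained
    return chosen


# Four exons; exons 1 and 2 are alternative middle exons.
exons = [(100, 200), (300, 350), (400, 450), (600, 700)]
introns = {(0, 1), (0, 2), (1, 3), (2, 3)}
candidates = enumerate_transcripts(exons, introns)
selected = select_transcripts(candidates, [(0, 1), (2, 3)])
```

Here each constraint stands for a spliced read or read pair that ties exons together. With the two constraints shown, both candidate transcripts are needed, because neither one alone contains both exon runs.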

Original language: English (US)
Article number: S14
Journal: BMC Bioinformatics
Volume: 14
Issue number: SUPPL.5
DOIs: 10.1186/1471-2105-14-S5-S14
State: Published - Apr 10 2013

Fingerprint

RNA
Genes
Contiguity
Gene
Program assemblers
Transcriptome
Availability Constraints
Exons
Software
Complementary DNA
Databases
Graph Representation
CDNA
Software Tools
Linear Program
Sequencing
Availability
Coverage

ASJC Scopus subject areas

  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Applied Mathematics
  • Structural Biology
  • Medicine(all)

Cite this

CLASS: Constrained transcript assembly of RNA-seq reads. / Song, Li; Florea, Liliana D.

In: BMC Bioinformatics, Vol. 14, No. SUPPL.5, S14, 10.04.2013.


@article{45155dd99d134c20a4448adea365a39f,
title = "CLASS: Constrained transcript assembly of RNA-seq reads",
abstract = "Background: RNA-seq has revolutionized our ability to survey the cellular transcriptome in great detail. However, while several approaches have been developed, the problem of assembling the short reads into full-length transcripts remains challenging. Results: We developed a novel algorithm and software tool, CLASS (Constraint-based Local Assembly and Selection of Splice variants), for accurately assembling splice variants using local read coverage patterns of RNA-seq reads, contiguity constraints from read pairs and spliced reads, and optionally information about gene structure extracted from cDNA sequence databases. The algorithmic underpinnings of CLASS are: i) a linear program to infer exons, ii) a compact splice graph representation of a gene and its splice variants, and iii) a transcript selection scheme that takes into account contiguity constraints and, where available, knowledge about gene structure. Conclusion: In comparisons against leading transcript assembly programs, CLASS is more accurate on both simulated and real reads and produces results that are easier to interpret when applied to large scale real data, and therefore is a promising analysis tool for next generation sequencing data. Availability: CLASS is available from http://sourceforge.net/projects/splicebox.",
author = "Song, Li and Florea, {Liliana D.}",
year = "2013",
month = "4",
day = "10",
doi = "10.1186/1471-2105-14-S5-S14",
language = "English (US)",
volume = "14",
journal = "BMC Bioinformatics",
issn = "1471-2105",
publisher = "BioMed Central",
number = "SUPPL.5",
pages = "S14",
}

TY - JOUR

T1 - CLASS: Constrained transcript assembly of RNA-seq reads

AU - Song, Li

AU - Florea, Liliana D

PY - 2013/4/10

Y1 - 2013/4/10

AB - Background: RNA-seq has revolutionized our ability to survey the cellular transcriptome in great detail. However, while several approaches have been developed, the problem of assembling the short reads into full-length transcripts remains challenging. Results: We developed a novel algorithm and software tool, CLASS (Constraint-based Local Assembly and Selection of Splice variants), for accurately assembling splice variants using local read coverage patterns of RNA-seq reads, contiguity constraints from read pairs and spliced reads, and optionally information about gene structure extracted from cDNA sequence databases. The algorithmic underpinnings of CLASS are: i) a linear program to infer exons, ii) a compact splice graph representation of a gene and its splice variants, and iii) a transcript selection scheme that takes into account contiguity constraints and, where available, knowledge about gene structure. Conclusion: In comparisons against leading transcript assembly programs, CLASS is more accurate on both simulated and real reads and produces results that are easier to interpret when applied to large scale real data, and therefore is a promising analysis tool for next generation sequencing data. Availability: CLASS is available from http://sourceforge.net/projects/splicebox.

UR - http://www.scopus.com/inward/record.url?scp=84876122166&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84876122166&partnerID=8YFLogxK

U2 - 10.1186/1471-2105-14-S5-S14

DO - 10.1186/1471-2105-14-S5-S14

M3 - Article

C2 - 23734605

AN - SCOPUS:84876122166

VL - 14

JO - BMC Bioinformatics

JF - BMC Bioinformatics

SN - 1471-2105

IS - SUPPL.5

M1 - S14

ER -