Rascaf

Improving genome assembly with RNA sequencing data

Li Song, Dhruv S. Shankar, Liliana D Florea

Research output: Contribution to journal › Article

Abstract

Abundant but short second-generation sequencing reads make assembly difficult, leading to fragmented genomes and gene annotations. Gene structure information from RNA sequences can be used to improve the completeness and contiguity of an assembly, but bioinformatics methods have been lacking. Rascaf is a highly efficient tool leveraging long-range continuity information from intron spanning RNA sequencing (RNA-seq) read pairs to detect new contig connections. It determines a heaviest path in an exon block graph that simultaneously represents a gene and the underlying contig relationships. Rascaf is more accurate than its competitors, highly precise, and finds thousands of new verifiable connections in several draft Rosaceae genomes. Lightweight and practical, it can be readily incorporated into sequencing pipelines to improve an assembly and its gene annotations.
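The abstract's central step, finding a heaviest path in a graph of exon blocks, can be illustrated with a generic sketch. This is not Rascaf's implementation: it assumes the graph is a directed acyclic graph, that each node carries a single non-negative weight standing in for read support, and that the block names (`c1:e1` and so on), the weights, and the function name `heaviest_path` are all hypothetical. It only shows how a maximum-weight path can be recovered with a topological order plus dynamic programming.

```python
from collections import defaultdict


def heaviest_path(weights, edges):
    """Return (total_weight, path) for the heaviest path in a DAG.

    weights: dict mapping node -> non-negative weight (hypothetical read support)
    edges:   list of (u, v) pairs over those nodes, assumed to be acyclic
    """
    succ = defaultdict(list)
    indeg = {node: 0 for node in weights}
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1

    # Kahn's algorithm: repeatedly emit nodes with no unprocessed predecessors.
    order = []
    ready = sorted(node for node in weights if indeg[node] == 0)
    while ready:
        u = ready.pop()
        order.append(u)
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    if len(order) != len(weights):
        raise ValueError("graph contains a cycle")

    # best[n] is the weight of the heaviest path ending at n; prev[n] is the
    # predecessor of n on that path (None if the path starts at n).
    best = dict(weights)
    prev = {node: None for node in weights}
    for u in order:
        for v in succ[u]:
            candidate = best[u] + weights[v]
            if candidate > best[v]:
                best[v] = candidate
                prev[v] = u

    # Walk back from the best end node to reconstruct the path.
    end = max(order, key=lambda node: best[node])
    path = []
    node = end
    while node is not None:
        path.append(node)
        node = prev[node]
    path.reverse()
    return best[end], path


# Hypothetical exon blocks on three contigs (c1, c2, c3) with made-up weights.
blocks = {"c1:e1": 5, "c1:e2": 3, "c2:e1": 4, "c3:e1": 1}
links = [("c1:e1", "c1:e2"), ("c1:e2", "c2:e1"), ("c1:e2", "c3:e1")]
total, path = heaviest_path(blocks, links)
print(total, path)
```

In this toy input the selected path runs from contig c1 into contig c2, which is the kind of cross-contig connection the abstract describes; how Rascaf actually builds and weights its exon block graph is described in the paper itself.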

Original language: English (US)
Journal: Plant Genome
Volume: 9
Issue number: 3
DOI: 10.3835/plantgenome2016.03.0027
State: Published - Nov 1 2016

Fingerprint

RNA Sequence Analysis
genome assembly
Molecular Sequence Annotation
sequence analysis
Genome
Rosaceae
Computational Biology
Introns
Genes
Exons
genes
genome
bioinformatics
exons
introns
nucleotide sequences

ASJC Scopus subject areas

  • Agronomy and Crop Science
  • Genetics
  • Plant Science

Cite this

Rascaf: Improving genome assembly with RNA sequencing data. / Song, Li; Shankar, Dhruv S.; Florea, Liliana D.

In: Plant Genome, Vol. 9, No. 3, 01.11.2016.

Song, Li; Shankar, Dhruv S.; Florea, Liliana D. / Rascaf: Improving genome assembly with RNA sequencing data. In: Plant Genome. 2016; Vol. 9, No. 3.

@article{eb23bbf99d044df5886bf6764e8c0e69,
title = "Rascaf: Improving genome assembly with RNA sequencing data",
abstract = "Abundant but short second-generation sequencing reads make assembly difficult, leading to fragmented genomes and gene annotations. Gene structure information from RNA sequences can be used to improve the completeness and contiguity of an assembly, but bioinformatics methods have been lacking. Rascaf is a highly efficient tool leveraging long-range continuity information from intron spanning RNA sequencing (RNA-seq) read pairs to detect new contig connections. It determines a heaviest path in an exon block graph that simultaneously represents a gene and the underlying contig relationships. Rascaf is more accurate than its competitors, highly precise, and finds thousands of new verifiable connections in several draft Rosaceae genomes. Lightweight and practical, it can be readily incorporated into sequencing pipelines to improve an assembly and its gene annotations.",
author = "Li Song and Shankar, {Dhruv S.} and Florea, {Liliana D}",
year = "2016",
month = "11",
day = "1",
doi = "10.3835/plantgenome2016.03.0027",
language = "English (US)",
volume = "9",
journal = "Plant Genome",
issn = "1940-3372",
publisher = "Crop Science Society of America",
number = "3",

}

TY - JOUR

T1 - Rascaf

T2 - Improving genome assembly with RNA sequencing data

AU - Song, Li

AU - Shankar, Dhruv S.

AU - Florea, Liliana D

PY - 2016/11/1

Y1 - 2016/11/1

AB - Abundant but short second-generation sequencing reads make assembly difficult, leading to fragmented genomes and gene annotations. Gene structure information from RNA sequences can be used to improve the completeness and contiguity of an assembly, but bioinformatics methods have been lacking. Rascaf is a highly efficient tool leveraging long-range continuity information from intron spanning RNA sequencing (RNA-seq) read pairs to detect new contig connections. It determines a heaviest path in an exon block graph that simultaneously represents a gene and the underlying contig relationships. Rascaf is more accurate than its competitors, highly precise, and finds thousands of new verifiable connections in several draft Rosaceae genomes. Lightweight and practical, it can be readily incorporated into sequencing pipelines to improve an assembly and its gene annotations.

UR - http://www.scopus.com/inward/record.url?scp=84990928884&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84990928884&partnerID=8YFLogxK

U2 - 10.3835/plantgenome2016.03.0027

DO - 10.3835/plantgenome2016.03.0027

M3 - Article

VL - 9

JO - Plant Genome

JF - Plant Genome

SN - 1940-3372

IS - 3

ER -