FLASH: Fast length adjustment of short reads to improve genome assemblies

Tanja Magoč, Steven L Salzberg

Research output: Contribution to journal (Article)

Abstract

Motivation: Next-generation sequencing technologies generate very large numbers of short reads. Even with very deep genome coverage, short read lengths cause problems in de novo assemblies. The use of paired-end libraries with a fragment size shorter than twice the read length provides an opportunity to generate much longer reads by overlapping and merging read pairs before assembling a genome.

Results: We present FLASH, a fast computational tool to extend the length of short reads by overlapping paired-end reads from fragment libraries that are sufficiently short. We tested the correctness of the tool on one million simulated read pairs, and we then applied it as a pre-processor for genome assemblies of Illumina reads from the bacterium Staphylococcus aureus and human chromosome 14. FLASH correctly extended and merged reads >99% of the time on simulated reads with an error rate of

Original language: English (US)
Article number: btr507
Pages (from-to): 2957-2963
Number of pages: 7
Journal: Bioinformatics
Volume: 27
Issue number: 21
DOI: 10.1093/bioinformatics/btr507
State: Published - Nov 2011

Fingerprint

Adjustment
Genome
Genes
Overlapping
Fragment
Chromosomes, Human, Pair 14
Human Chromosomes
Chromosomes
Merging
Bacteria
Sequencing
Libraries
Chromosome
Error Rate
Staphylococcus aureus
Correctness
Coverage
Technology
Human

ASJC Scopus subject areas

  • Biochemistry
  • Molecular Biology
  • Computational Theory and Mathematics
  • Computer Science Applications
  • Computational Mathematics
  • Statistics and Probability
  • Medicine(all)

Cite this

FLASH: Fast length adjustment of short reads to improve genome assemblies. / Magoč, Tanja; Salzberg, Steven L.

In: Bioinformatics, Vol. 27, No. 21, btr507, 11.2011, p. 2957-2963.

@article{12824c1093eb45408e0ff6ca590c1d94,
title = "FLASH: Fast length adjustment of short reads to improve genome assemblies",
abstract = "Motivation: Next-generation sequencing technologies generate very large numbers of short reads. Even with very deep genome coverage, short read lengths cause problems in de novo assemblies. The use of paired-end libraries with a fragment size shorter than twice the read length provides an opportunity to generate much longer reads by overlapping and merging read pairs before assembling a genome. Results: We present FLASH, a fast computational tool to extend the length of short reads by overlapping paired-end reads from fragment libraries that are sufficiently short. We tested the correctness of the tool on one million simulated read pairs, and we then applied it as a pre-processor for genome assemblies of Illumina reads from the bacterium Staphylococcus aureus and human chromosome 14. FLASH correctly extended and merged reads >99{\%} of the time on simulated reads with an error rate of",
author = "Magoč, Tanja and Salzberg, Steven L.",
year = "2011",
month = "11",
doi = "10.1093/bioinformatics/btr507",
language = "English (US)",
volume = "27",
pages = "2957--2963",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "Oxford University Press",
number = "21"
}

TY - JOUR

T1 - FLASH: Fast length adjustment of short reads to improve genome assemblies

AU - Magoč, Tanja

AU - Salzberg, Steven L

PY - 2011/11

Y1 - 2011/11

AB - Motivation: Next-generation sequencing technologies generate very large numbers of short reads. Even with very deep genome coverage, short read lengths cause problems in de novo assemblies. The use of paired-end libraries with a fragment size shorter than twice the read length provides an opportunity to generate much longer reads by overlapping and merging read pairs before assembling a genome. Results: We present FLASH, a fast computational tool to extend the length of short reads by overlapping paired-end reads from fragment libraries that are sufficiently short. We tested the correctness of the tool on one million simulated read pairs, and we then applied it as a pre-processor for genome assemblies of Illumina reads from the bacterium Staphylococcus aureus and human chromosome 14. FLASH correctly extended and merged reads >99% of the time on simulated reads with an error rate of

UR - http://www.scopus.com/inward/record.url?scp=80054913451&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=80054913451&partnerID=8YFLogxK

U2 - 10.1093/bioinformatics/btr507

DO - 10.1093/bioinformatics/btr507

M3 - Article

C2 - 21903629

AN - SCOPUS:80054913451

VL - 27

SP - 2957

EP - 2963

JO - Bioinformatics

JF - Bioinformatics

SN - 1367-4803

IS - 21

M1 - btr507

ER -