Automated correction of genome sequence errors

Pawel Gajer, Michael Schatz, Steven L Salzberg

Research output: Contribution to journal › Article

Abstract

By using information from an assembly of a genome, a new program called AutoEditor significantly improves base calling accuracy over that achieved by previous algorithms. This in turn improves the overall accuracy of genome sequences and facilitates the use of these sequences for polymorphism discovery. We describe the algorithm and its application in a large set of recent genome sequencing projects. The number of erroneous base calls in these projects was reduced by 80%. In an analysis of over one million corrections, we found that AutoEditor made just one error per 8828 corrections. By substantially increasing the accuracy of base calling, AutoEditor can dramatically accelerate the process of finishing genomes, which involves closing all gaps and ensuring minimum quality standards for the final sequence. It also greatly improves our ability to discover single nucleotide polymorphisms (SNPs) between closely related strains and isolates of the same species.
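To put the figures quoted in the abstract on a common scale, the short sketch below converts "one error per 8828 corrections" into a per-correction error rate (about 0.011%) and an expected count of wrong corrections for a given number of edits (about 113 per million). The function names are illustrative only and are not part of AutoEditor, and the sample size of exactly 1,000,000 is an assumption: the abstract says only "over one million".

```python
# Back-of-the-envelope arithmetic on the numbers quoted in the abstract.
# All function names are hypothetical helpers; none come from AutoEditor.

CORRECTIONS_PER_ERROR = 8828   # "one error per 8828 corrections"
ERROR_REDUCTION = 0.80         # "reduced by 80%"


def correction_error_rate(corrections_per_error: int) -> float:
    """Fraction of corrections that are themselves wrong."""
    return 1.0 / corrections_per_error


def expected_wrong_corrections(n_corrections: int, corrections_per_error: int) -> float:
    """Expected number of wrong corrections among n_corrections edits."""
    return n_corrections / corrections_per_error


def remaining_errors(initial_errors: float, reduction: float) -> float:
    """Erroneous base calls left after removing the given fraction."""
    return initial_errors * (1.0 - reduction)


if __name__ == "__main__":
    rate = correction_error_rate(CORRECTIONS_PER_ERROR)
    print(f"error rate per correction: {rate:.4%}")

    # ASSUMPTION: exactly one million corrections, a lower bound on
    # the "over one million" analysed in the paper.
    wrong = expected_wrong_corrections(1_000_000, CORRECTIONS_PER_ERROR)
    print(f"expected wrong corrections per million: {wrong:.1f}")

    # ASSUMPTION: a hypothetical project starting with 1000 erroneous calls.
    left = remaining_errors(1000, ERROR_REDUCTION)
    print(f"errors remaining out of 1000: {left:.0f}")
```

The 80% reduction and the 1-in-8828 figure measure different things: the first is how many existing errors are removed, the second is how often a correction introduces a new one.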

Original language: English (US)
Pages (from-to): 562-569
Number of pages: 8
Journal: Nucleic Acids Research
Volume: 32
Issue number: 2
DOI: 10.1093/nar/gkh216
State: Published - 2004
Externally published: Yes

Fingerprint

Genome
Single Nucleotide Polymorphism

ASJC Scopus subject areas

  • Genetics

Cite this

Automated correction of genome sequence errors. / Gajer, Pawel; Schatz, Michael; Salzberg, Steven L.

In: Nucleic Acids Research, Vol. 32, No. 2, 2004, p. 562-569.

@article{bfb825a94246470aab782ff7e336254c,
title = "Automated correction of genome sequence errors",
author = "Gajer, Pawel and Schatz, Michael and Salzberg, {Steven L.}",
year = "2004",
doi = "10.1093/nar/gkh216",
language = "English (US)",
volume = "32",
pages = "562--569",
journal = "Nucleic Acids Research",
issn = "1362-4962",
publisher = "Oxford University Press",
number = "2",

}

TY - JOUR
T1 - Automated correction of genome sequence errors
AU - Gajer, Pawel
AU - Schatz, Michael
AU - Salzberg, Steven L
PY - 2004
Y1 - 2004
UR - http://www.scopus.com/inward/record.url?scp=1342306398&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=1342306398&partnerID=8YFLogxK
U2 - 10.1093/nar/gkh216
DO - 10.1093/nar/gkh216
M3 - Article
C2 - 14744981
AN - SCOPUS:1342306398
VL - 32
SP - 562
EP - 569
JO - Nucleic Acids Research
JF - Nucleic Acids Research
SN - 1362-4962
IS - 2
ER -