Abstract
Background: Many bacterial genome sequences completed using the Sanger method may contain assembly errors, due in part to low sequence coverage driven by cost. Findings: To illustrate the need for re-sequencing of pre-nextgen genomes and to validate sequenced genomes, we conducted a series of experiments using high-coverage sequencing data generated on an Illumina MiSeq sequencer from the genomic DNA of Bacteroides fragilis NCTC 9343, Salmonella enterica subsp. enterica serovar Paratyphi A str. ATCC 9150, Vibrio cholerae O1 biovar El Tor str. N16961, Bacillus halodurans C-125 and Caulobacter crescentus CB15, all of which had previously been sequenced by the Sanger method during the early 2000s. Conclusions: This study revealed a number of discrepancies between the published assemblies and sequence read alignments for all five bacterial species, suggesting that continued use of these error-containing genomes and their genetic information may contribute to false conclusions and/or incorrect future discoveries.
Original language | English (US) |
---|---|
Article number | 122 |
Journal | BioData Mining |
Volume | 7 |
Issue number | 1 |
DOIs | |
State | Published - Nov 7 2014 |
Keywords
- Bacteria
- Brucella
- Genomes
- Genomics
- Microbiota
- Mycobacterium
- Salmonella
- Sequences
- Vibrio
ASJC Scopus subject areas
- Biochemistry
- Molecular Biology
- Genetics
- Computer Science Applications
- Computational Theory and Mathematics
- Computational Mathematics