TY - JOUR
T1 - Serendipitous discovery of Wolbachia genomes in multiple Drosophila species
AU - Salzberg, Steven L.
AU - Dunning Hotopp, Julie C.
AU - Delcher, Arthur L.
AU - Pop, Mihai
AU - Smith, Douglas R.
AU - Eisen, Michael B.
AU - Nelson, William C.
N1 - Funding Information:
We thank Hean Koo for help with genome data management, and Hervé Tettelin and Martin Wu for helpful comments on the manuscript. We also thank Agencourt Bioscience, the Washington University Genome Sequencing Center and the NIH for making sequence data publicly available through the NCBI Trace Archive. S.L.S., A.L.D., and M.P. were supported in part by the NIH under grants R01-LM06845 and R01-LM007938 to SLS. J.D.H. was supported by funds from National Science Foundation Frontiers in Integrative Biological Research under grant EF-0328363.
Publisher Copyright:
© 2005 Salzberg et al.
PY - 2005
Y1 - 2005
AB - Background: The Trace Archive is a repository for the raw, unanalyzed data generated by large-scale genome sequencing projects. The existence of this data offers scientists the possibility of discovering additional genomic sequences beyond those originally sequenced. In particular, if the source DNA for a sequencing project came from a species that was colonized by another organism, then the project may yield substantial amounts of genomic DNA, including near-complete genomes, from the symbiotic or parasitic organism. Results: By searching the publicly available repository of DNA sequencing trace data, we discovered three new species of the bacterial endosymbiont Wolbachia pipientis in three different species of fruit fly: Drosophila ananassae, D. simulans, and D. mojavensis. We extracted all sequences with partial matches to a previously sequenced Wolbachia strain and assembled those sequences using customized software. For one of the three new species, the data recovered were sufficient to produce an assembly that covers more than 95% of the genome; for a second species the data produce the equivalent of a 'light shotgun' sampling of the genome, covering an estimated 75-80% of the genome; and for the third species the data cover approximately 6-7% of the genome. Conclusions: The results of this study reveal an unexpected benefit of depositing raw data in a central genome sequence repository: new species can be discovered within this data. The differences between these three new Wolbachia genomes and the previously sequenced strain revealed numerous rearrangements and insertions within each lineage and hundreds of novel genes. The three new genomes, with annotation, have been deposited in GenBank.
UR - http://www.scopus.com/inward/record.url?scp=18944377180&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=18944377180&partnerID=8YFLogxK
U2 - 10.1186/gb-2005-6-3-r23
DO - 10.1186/gb-2005-6-3-r23
M3 - Article
C2 - 15774024
AN - SCOPUS:18944377180
SN - 1474-7596
VL - 6
JO - Genome biology
JF - Genome biology
IS - 3
M1 - R23
ER -