Base-Calling Algorithm with Vocabulary (BCV) Method for Analyzing Population Sequencing Chromatograms

Yuri S. Fantin; Alexey D. Neverov; Alexander V. Favorov; Maria V. Alvarez-Figueroa; Svetlana I. Braslavskaya; Maria A. Gordukova; Inga V. Karandashova; Konstantin V. Kuleshov; Anna I. Myznikova; Maya S. Polishchuk; Denis A. Reshetov; Yana A. Voiciehovskaya; Andrei A. Mironov; Vladimir P. Chulanov

doi:10.1371/journal.pone.0054835

Research output: Contribution to journal › Article › peer-review

6 Scopus citations

Abstract

Sanger sequencing is a common method of reading DNA sequences. It is less expensive than high-throughput methods, and it is appropriate for numerous applications including molecular diagnostics. However, sequencing mixtures of similar DNA of pathogens with this method is challenging. This is important because most clinical samples contain such mixtures, rather than pure single strains. The traditional solution is to sequence selected clones of PCR products, a complicated, time-consuming, and expensive procedure. Here, we propose the base-calling with vocabulary (BCV) method that computationally deciphers Sanger chromatograms obtained from mixed DNA samples. The inputs to the BCV algorithm are a chromatogram and a dictionary of sequences that are similar to those we expect to obtain. We apply the base-calling function on a test dataset of chromatograms without ambiguous positions, as well as one with 3-14% sequence degeneracy. Furthermore, we use BCV to assemble a consensus sequence for an HIV genome fragment in a sample containing a mixture of viral DNA variants and to determine the positions of the indels. Finally, we detect drug-resistant Mycobacterium tuberculosis strains carrying frameshift mutations mixed with wild-type bacteria in the pncA gene, and roughly characterize bacterial communities in clinical samples by direct 16S rRNA sequencing.
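The abstract gives only BCV's inputs, a chromatogram and a dictionary of expected sequences. It does not say how the two are combined. The Python sketch below is a hypothetical illustration of the general idea and is not the authors' algorithm. It assumes the chromatogram has already been reduced to normalised per-position base signals that are aligned to equal-length vocabulary sequences. It then models the mixed signal as a non-negative weighted sum of one-hot encoded vocabulary entries and fits the weights by non-negative least squares, using cyclic coordinate descent. The helper names (one_hot, flatten_trace, estimate_mixture) and the input format are assumptions. Peak detection, alignment and indels, which the paper addresses, are out of scope here.

```python
# Hypothetical illustration only; this is NOT the BCV algorithm from the paper.
# Assumes traces are pre-aligned, equal-length, per-position base signal dicts.
from typing import Dict, List, Sequence

BASES = "ACGT"


def one_hot(seq: str) -> List[float]:
    """Flatten a sequence into a 4*L vector with 1.0 at each position's base."""
    vec = [0.0] * (4 * len(seq))
    for i, base in enumerate(seq):
        vec[4 * i + BASES.index(base)] = 1.0
    return vec


def flatten_trace(trace: Sequence[Dict[str, float]]) -> List[float]:
    """Flatten per-position signals such as {'G': 0.7, 'C': 0.3} into a 4*L vector."""
    return [pos.get(b, 0.0) for pos in trace for b in BASES]


def estimate_mixture(trace: Sequence[Dict[str, float]],
                     vocabulary: Sequence[str],
                     sweeps: int = 500) -> Dict[str, float]:
    """Fit w >= 0 minimising ||A w - b||^2 by cyclic coordinate descent.

    Column k of A is the one-hot encoding of vocabulary[k]; b is the flattened trace.
    """
    b = flatten_trace(trace)
    cols = [one_hot(s) for s in vocabulary]
    if any(len(c) != len(b) for c in cols):
        raise ValueError("all vocabulary sequences must match the trace length")
    norms = [sum(x * x for x in c) for c in cols]
    w = [0.0] * len(cols)
    r = list(b)  # residual b - A w; w starts at zero
    for _ in range(sweeps):
        for k, c in enumerate(cols):
            # Exact minimiser along coordinate k, then clipped at zero.
            step = sum(ci * ri for ci, ri in zip(c, r)) / norms[k]
            new_wk = max(0.0, w[k] + step)
            delta = new_wk - w[k]
            if delta:
                r = [ri - delta * ci for ri, ci in zip(r, c)]
                w[k] = new_wk
    return dict(zip(vocabulary, w))
```

For example, take a four-position trace whose third position reads 0.7 G / 0.3 C. Fitted against the vocabulary ACGT, ACCT and TTTT, it should give weights close to 0.7, 0.3 and 0. Coordinate descent is used only to keep the sketch dependency-free. For real data, scipy.optimize.nnls would be a natural replacement.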

Original language	English (US)
Article number	e54835
Journal	PloS one
Volume	8
Issue number	1
DOIs	https://doi.org/10.1371/journal.pone.0054835
State	Published - Jan 28 2013
Externally published	Yes

ASJC Scopus subject areas

General

Cite this

Fantin, Y. S., Neverov, A. D., Favorov, A. V., Alvarez-Figueroa, M. V., Braslavskaya, S. I., Gordukova, M. A., Karandashova, I. V., Kuleshov, K. V., Myznikova, A. I., Polishchuk, M. S., Reshetov, D. A., Voiciehovskaya, Y. A., Mironov, A. A., & Chulanov, V. P. (2013). Base-Calling Algorithm with Vocabulary (BCV) Method for Analyzing Population Sequencing Chromatograms. PloS one, 8(1), Article e54835. https://doi.org/10.1371/journal.pone.0054835

Fantin, YS, Neverov, AD, Favorov, AV, Alvarez-Figueroa, MV, Braslavskaya, SI, Gordukova, MA, Karandashova, IV, Kuleshov, KV, Myznikova, AI, Polishchuk, MS, Reshetov, DA, Voiciehovskaya, YA, Mironov, AA & Chulanov, VP 2013, 'Base-Calling Algorithm with Vocabulary (BCV) Method for Analyzing Population Sequencing Chromatograms', PloS one, vol. 8, no. 1, e54835. https://doi.org/10.1371/journal.pone.0054835

@article{837f99bb5aa14f3ea2ee292ddc7f4829,
  title = "Base-Calling Algorithm with Vocabulary (BCV) Method for Analyzing Population Sequencing Chromatograms",
  abstract = "Sanger sequencing is a common method of reading DNA sequences. It is less expensive than high-throughput methods, and it is appropriate for numerous applications including molecular diagnostics. However, sequencing mixtures of similar DNA of pathogens with this method is challenging. This is important because most clinical samples contain such mixtures, rather than pure single strains. The traditional solution is to sequence selected clones of PCR products, a complicated, time-consuming, and expensive procedure. Here, we propose the base-calling with vocabulary (BCV) method that computationally deciphers Sanger chromatograms obtained from mixed DNA samples. The inputs to the BCV algorithm are a chromatogram and a dictionary of sequences that are similar to those we expect to obtain. We apply the base-calling function on a test dataset of chromatograms without ambiguous positions, as well as one with 3-14% sequence degeneracy. Furthermore, we use BCV to assemble a consensus sequence for an HIV genome fragment in a sample containing a mixture of viral DNA variants and to determine the positions of the indels. Finally, we detect drug-resistant Mycobacterium tuberculosis strains carrying frameshift mutations mixed with wild-type bacteria in the pncA gene, and roughly characterize bacterial communities in clinical samples by direct 16S rRNA sequencing.",
  author = "Fantin, {Yuri S.} and Neverov, {Alexey D.} and Favorov, {Alexander V.} and Alvarez-Figueroa, {Maria V.} and Braslavskaya, {Svetlana I.} and Gordukova, {Maria A.} and Karandashova, {Inga V.} and Kuleshov, {Konstantin V.} and Myznikova, {Anna I.} and Polishchuk, {Maya S.} and Reshetov, {Denis A.} and Voiciehovskaya, {Yana A.} and Mironov, {Andrei A.} and Chulanov, {Vladimir P.}",
  year = "2013",
  month = jan,
  day = "28",
  doi = "10.1371/journal.pone.0054835",
  language = "English (US)",
  volume = "8",
  journal = "PloS one",
  issn = "1932-6203",
  publisher = "Public Library of Science",
  number = "1",
}

TY  - JOUR
T1  - Base-Calling Algorithm with Vocabulary (BCV) Method for Analyzing Population Sequencing Chromatograms
AU  - Fantin, Yuri S.
AU  - Neverov, Alexey D.
AU  - Favorov, Alexander V.
AU  - Alvarez-Figueroa, Maria V.
AU  - Braslavskaya, Svetlana I.
AU  - Gordukova, Maria A.
AU  - Karandashova, Inga V.
AU  - Kuleshov, Konstantin V.
AU  - Myznikova, Anna I.
AU  - Polishchuk, Maya S.
AU  - Reshetov, Denis A.
AU  - Voiciehovskaya, Yana A.
AU  - Mironov, Andrei A.
AU  - Chulanov, Vladimir P.
PY  - 2013/1/28
Y1  - 2013/1/28
N2  - Sanger sequencing is a common method of reading DNA sequences. It is less expensive than high-throughput methods, and it is appropriate for numerous applications including molecular diagnostics. However, sequencing mixtures of similar DNA of pathogens with this method is challenging. This is important because most clinical samples contain such mixtures, rather than pure single strains. The traditional solution is to sequence selected clones of PCR products, a complicated, time-consuming, and expensive procedure. Here, we propose the base-calling with vocabulary (BCV) method that computationally deciphers Sanger chromatograms obtained from mixed DNA samples. The inputs to the BCV algorithm are a chromatogram and a dictionary of sequences that are similar to those we expect to obtain. We apply the base-calling function on a test dataset of chromatograms without ambiguous positions, as well as one with 3-14% sequence degeneracy. Furthermore, we use BCV to assemble a consensus sequence for an HIV genome fragment in a sample containing a mixture of viral DNA variants and to determine the positions of the indels. Finally, we detect drug-resistant Mycobacterium tuberculosis strains carrying frameshift mutations mixed with wild-type bacteria in the pncA gene, and roughly characterize bacterial communities in clinical samples by direct 16S rRNA sequencing.
AB  - Sanger sequencing is a common method of reading DNA sequences. It is less expensive than high-throughput methods, and it is appropriate for numerous applications including molecular diagnostics. However, sequencing mixtures of similar DNA of pathogens with this method is challenging. This is important because most clinical samples contain such mixtures, rather than pure single strains. The traditional solution is to sequence selected clones of PCR products, a complicated, time-consuming, and expensive procedure. Here, we propose the base-calling with vocabulary (BCV) method that computationally deciphers Sanger chromatograms obtained from mixed DNA samples. The inputs to the BCV algorithm are a chromatogram and a dictionary of sequences that are similar to those we expect to obtain. We apply the base-calling function on a test dataset of chromatograms without ambiguous positions, as well as one with 3-14% sequence degeneracy. Furthermore, we use BCV to assemble a consensus sequence for an HIV genome fragment in a sample containing a mixture of viral DNA variants and to determine the positions of the indels. Finally, we detect drug-resistant Mycobacterium tuberculosis strains carrying frameshift mutations mixed with wild-type bacteria in the pncA gene, and roughly characterize bacterial communities in clinical samples by direct 16S rRNA sequencing.
UR  - http://www.scopus.com/inward/record.url?scp=84873844628&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84873844628&partnerID=8YFLogxK
U2  - 10.1371/journal.pone.0054835
DO  - 10.1371/journal.pone.0054835
M3  - Article
C2  - 23382983
AN  - SCOPUS:84873844628
SN  - 1932-6203
VL  - 8
JO  - PloS one
JF  - PloS one
IS  - 1
M1  - e54835
ER  - 
