Phymm and PhymmBL: Metagenomic phylogenetic classification with interpolated Markov models

Arthur Brady; Steven L. Salzberg

doi:10.1038/nmeth.1358

Research output: Contribution to journal › Article › peer-review

335 Scopus citations

Abstract

Metagenomics projects collect DNA from uncharacterized environments that may contain thousands of species per sample. One main challenge facing metagenomic analysis is phylogenetic classification of raw sequence reads into groups representing the same or similar taxa, a prerequisite for genome assembly and for analyzing the biological diversity of a sample. New sequencing technologies have made metagenomics easier, by making sequencing faster, and more difficult, by producing shorter reads than previous technologies. Classifying sequences from reads as short as 100 base pairs has until now been relatively inaccurate, requiring researchers to use older, long-read technologies. We present Phymm, a classifier for metagenomic data, that has been trained on 539 complete, curated genomes and can accurately classify reads as short as 100 base pairs, a substantial improvement over previous composition-based classification methods. We also describe how combining Phymm with sequence alignment algorithms improves accuracy.

Original language	English (US)
Pages (from-to)	673-676
Number of pages	4
Journal	Nature Methods
Volume	6
Issue number	9
DOIs	https://doi.org/10.1038/nmeth.1358
State	Published - 2009
Externally published	Yes

ASJC Scopus subject areas

Biotechnology
Biochemistry
Molecular Biology
Cell Biology

Cite this

@article{5d395d5aae3f4a5487d0f8583bea107b,
title = "Phymm and PhymmBL: Metagenomic phylogenetic classification with interpolated Markov models",
abstract = "Metagenomics projects collect DNA from uncharacterized environments that may contain thousands of species per sample. One main challenge facing metagenomic analysis is phylogenetic classification of raw sequence reads into groups representing the same or similar taxa, a prerequisite for genome assembly and for analyzing the biological diversity of a sample. New sequencing technologies have made metagenomics easier, by making sequencing faster, and more difficult, by producing shorter reads than previous technologies. Classifying sequences from reads as short as 100 base pairs has until now been relatively inaccurate, requiring researchers to use older, long-read technologies. We present Phymm, a classifier for metagenomic data, that has been trained on 539 complete, curated genomes and can accurately classify reads as short as 100 base pairs, a substantial improvement over previous composition-based classification methods. We also describe how combining Phymm with sequence alignment algorithms improves accuracy.",
author = "Brady, Arthur and Salzberg, {Steven L.}",
note = "Funding Information: We thank A. Delcher for helpful discussions regarding IMM configuration. This work was supported in part by US National Institutes of Health grants R01-LM006845 and R01-GM083873.",
year = "2009",
doi = "10.1038/nmeth.1358",
language = "English (US)",
volume = "6",
pages = "673--676",
journal = "Nature Methods",
issn = "1548-7091",
publisher = "Nature Publishing Group",
number = "9",
}

TY - JOUR
T1 - Phymm and PhymmBL: Metagenomic phylogenetic classification with interpolated Markov models
AU - Brady, Arthur
AU - Salzberg, Steven L.
N1 - Funding Information: We thank A. Delcher for helpful discussions regarding IMM configuration. This work was supported in part by US National Institutes of Health grants R01-LM006845 and R01-GM083873.
PY - 2009
Y1 - 2009
AB - Metagenomics projects collect DNA from uncharacterized environments that may contain thousands of species per sample. One main challenge facing metagenomic analysis is phylogenetic classification of raw sequence reads into groups representing the same or similar taxa, a prerequisite for genome assembly and for analyzing the biological diversity of a sample. New sequencing technologies have made metagenomics easier, by making sequencing faster, and more difficult, by producing shorter reads than previous technologies. Classifying sequences from reads as short as 100 base pairs has until now been relatively inaccurate, requiring researchers to use older, long-read technologies. We present Phymm, a classifier for metagenomic data, that has been trained on 539 complete, curated genomes and can accurately classify reads as short as 100 base pairs, a substantial improvement over previous composition-based classification methods. We also describe how combining Phymm with sequence alignment algorithms improves accuracy.
UR - http://www.scopus.com/inward/record.url?scp=69549135124&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=69549135124&partnerID=8YFLogxK
U2 - 10.1038/nmeth.1358
DO - 10.1038/nmeth.1358
M3 - Article
C2 - 19648916
AN - SCOPUS:69549135124
SN - 1548-7091
VL - 6
SP - 673
EP - 676
JO - Nature Methods
JF - Nature Methods
IS - 9
ER -
