TY - JOUR
T1 - Bracken: estimating species abundance in metagenomics data
AU - Lu, Jennifer
AU - Breitwieser, Florian P.
AU - Thielen, Peter
AU - Salzberg, Steven L.
N1 - Publisher Copyright: © 2017 Lu et al.
PY - 2017
Y1 - 2017
N2 - Metagenomic experiments attempt to characterize microbial communities using high-throughput DNA sequencing. Identification of the microorganisms in a sample provides information about the genetic profile, population structure, and role of microorganisms within an environment. Until recently, most metagenomics studies focused on high-level characterization at the level of phyla, or alternatively sequenced the 16S ribosomal RNA gene that is present in bacterial species. As the cost of sequencing has fallen, though, metagenomics experiments have increasingly used unbiased shotgun sequencing to capture all the organisms in a sample. This approach requires a method for estimating abundance directly from the raw read data. Here we describe a fast, accurate new method that computes the abundance at the species level using the reads collected in a metagenomics experiment. Bracken (Bayesian Reestimation of Abundance after Classification with KrakEN) uses the taxonomic assignments made by Kraken, a very fast read-level classifier, along with information about the genomes themselves to estimate abundance at the species level, the genus level, or above. We demonstrate that Bracken can produce accurate species- and genus-level abundance estimates even when a sample contains multiple near-identical species.
KW - Bayesian estimation
KW - Metagenomics
KW - Microbiome
KW - Species abundance
UR - http://www.scopus.com/inward/record.url?scp=85026281530&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85026281530&partnerID=8YFLogxK
U2 - 10.7717/peerj-cs.104
DO - 10.7717/peerj-cs.104
M3 - Article
AN - SCOPUS:85026281530
SN - 2376-5992
VL - 2017
JO - PeerJ Computer Science
JF - PeerJ Computer Science
IS - 1
M1 - e104
ER -