Bracken: Estimating species abundance in metagenomics data

Jennifer Lu, Florian P. Breitwieser, Peter Thielen, Steven L Salzberg

Research output: Contribution to journal › Article

Abstract

Metagenomic experiments attempt to characterize microbial communities using high-throughput DNA sequencing. Identification of the microorganisms in a sample provides information about the genetic profile, population structure, and role of microorganisms within an environment. Until recently, most metagenomics studies focused on high-level characterization at the level of phyla, or alternatively sequenced the 16S ribosomal RNA gene that is present in bacterial species. As the cost of sequencing has fallen, though, metagenomics experiments have increasingly used unbiased shotgun sequencing to capture all the organisms in a sample. This approach requires a method for estimating abundance directly from the raw read data. Here we describe a fast, accurate new method that computes the abundance at the species level using the reads collected in a metagenomics experiment. Bracken (Bayesian Reestimation of Abundance after Classification with KrakEN) uses the taxonomic assignments made by Kraken, a very fast read-level classifier, along with information about the genomes themselves to estimate abundance at the species level, the genus level, or above. We demonstrate that Bracken can produce accurate species- and genus-level abundance estimates even when a sample contains multiple near-identical species.
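The abstract says Bracken combines Kraken's read-level assignments with information about the reference genomes to re-estimate abundance. As a rough illustration only, the sketch below reallocates reads that Kraken could assign no lower than a genus G among the species S under that genus using Bayes' rule, P(S|G) = P(G|S) P(S) / P(G). It assumes P(G|S), the fraction of reads from genome S that Kraken leaves at G, has already been computed from the reference database, and it uses a simplified prior P(S) taken from the species-level read counts. The function and parameter names are hypothetical; this is not Bracken's actual implementation.

```python
from typing import Dict


def redistribute_genus_reads(
    species_reads: Dict[str, int],
    genus_reads: int,
    p_genus_given_species: Dict[str, float],
) -> Dict[str, float]:
    """Hypothetical sketch: split genus-level reads among the genus's species.

    species_reads: reads Kraken assigned directly to each species in genus G.
    genus_reads: reads Kraken could only assign to G itself.
    p_genus_given_species: assumed precomputed P(G | S), the fraction of reads
        drawn from species S's genome that Kraken classifies at G.
    """
    total_species = sum(species_reads.values())
    if total_species == 0:
        # No species-level evidence: leave counts unchanged (a simplification).
        return {s: float(n) for s, n in species_reads.items()}

    # Simplified prior P(S): share of this genus's species-level reads assigned to S.
    prior = {s: n / total_species for s, n in species_reads.items()}

    # Normalizer P(G) = sum over species S of P(G | S) * P(S).
    p_genus = sum(p_genus_given_species[s] * prior[s] for s in species_reads)

    estimates: Dict[str, float] = {}
    for s, n in species_reads.items():
        # Posterior P(S | G); fall back to the prior if P(G) is zero.
        posterior = (
            p_genus_given_species[s] * prior[s] / p_genus if p_genus > 0 else prior[s]
        )
        # Direct species reads plus this species' share of the genus-level reads.
        estimates[s] = n + posterior * genus_reads
    return estimates
```

For example, with 800 reads at species A, 200 at species B, 500 left at their genus, and assumed P(G|A) = 0.1 and P(G|B) = 0.4, both species receive half of the genus-level reads (about 1050 and 450 in total), because B's reads are more likely than A's to stop at the genus level. The total read count is preserved.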

Original language: English (US)
Article number: 104
Journal: PeerJ
Volume: 2017
Issue number: 1
DOIs: 10.7717/peerj-cs.104
State: Published - 2017

Keywords

  • Bayesian estimation
  • Metagenomics
  • Microbiome
  • Species abundance

ASJC Scopus subject areas

  • Neuroscience (all)
  • Medicine (all)
  • Biochemistry, Genetics and Molecular Biology (all)
  • Agricultural and Biological Sciences (all)

Cite this

Bracken: Estimating species abundance in metagenomics data. / Lu, Jennifer; Breitwieser, Florian P.; Thielen, Peter; Salzberg, Steven L.

In: PeerJ, Vol. 2017, No. 1, 104, 2017.

Lu, Jennifer; Breitwieser, Florian P.; Thielen, Peter; Salzberg, Steven L. / Bracken: Estimating species abundance in metagenomics data. In: PeerJ. 2017; Vol. 2017, No. 1.
@article{a4642c63e9d44c8fa1b1dab81e4c67ce,
title = "Bracken: Estimating species abundance in metagenomics data",
abstract = "Metagenomic experiments attempt to characterize microbial communities using high-throughput DNA sequencing. Identification of the microorganisms in a sample provides information about the genetic profile, population structure, and role of microorganisms within an environment. Until recently, most metagenomics studies focused on high-level characterization at the level of phyla, or alternatively sequenced the 16S ribosomal RNA gene that is present in bacterial species. As the cost of sequencing has fallen, though, metagenomics experiments have increasingly used unbiased shotgun sequencing to capture all the organisms in a sample. This approach requires a method for estimating abundance directly from the raw read data. Here we describe a fast, accurate new method that computes the abundance at the species level using the reads collected in a metagenomics experiment. Bracken (Bayesian Reestimation of Abundance after Classification with KrakEN) uses the taxonomic assignments made by Kraken, a very fast read-level classifier, along with information about the genomes themselves to estimate abundance at the species level, the genus level, or above. We demonstrate that Bracken can produce accurate species- and genus-level abundance estimates even when a sample contains multiple near-identical species.",
keywords = "Bayesian estimation, Metagenomics, Microbiome, Species abundance",
author = "Jennifer Lu and Breitwieser, {Florian P.} and Peter Thielen and Salzberg, {Steven L}",
year = "2017",
doi = "10.7717/peerj-cs.104",
language = "English (US)",
volume = "2017",
journal = "PeerJ",
issn = "2167-8359",
publisher = "PeerJ",
number = "1",

}

TY - JOUR

T1 - Bracken

T2 - Estimating species abundance in metagenomics data

AU - Lu, Jennifer

AU - Breitwieser, Florian P.

AU - Thielen, Peter

AU - Salzberg, Steven L

PY - 2017

Y1 - 2017

N2 - Metagenomic experiments attempt to characterize microbial communities using high-throughput DNA sequencing. Identification of the microorganisms in a sample provides information about the genetic profile, population structure, and role of microorganisms within an environment. Until recently, most metagenomics studies focused on high-level characterization at the level of phyla, or alternatively sequenced the 16S ribosomal RNA gene that is present in bacterial species. As the cost of sequencing has fallen, though, metagenomics experiments have increasingly used unbiased shotgun sequencing to capture all the organisms in a sample. This approach requires a method for estimating abundance directly from the raw read data. Here we describe a fast, accurate new method that computes the abundance at the species level using the reads collected in a metagenomics experiment. Bracken (Bayesian Reestimation of Abundance after Classification with KrakEN) uses the taxonomic assignments made by Kraken, a very fast read-level classifier, along with information about the genomes themselves to estimate abundance at the species level, the genus level, or above. We demonstrate that Bracken can produce accurate species- and genus-level abundance estimates even when a sample contains multiple near-identical species.

AB - Metagenomic experiments attempt to characterize microbial communities using high-throughput DNA sequencing. Identification of the microorganisms in a sample provides information about the genetic profile, population structure, and role of microorganisms within an environment. Until recently, most metagenomics studies focused on high-level characterization at the level of phyla, or alternatively sequenced the 16S ribosomal RNA gene that is present in bacterial species. As the cost of sequencing has fallen, though, metagenomics experiments have increasingly used unbiased shotgun sequencing to capture all the organisms in a sample. This approach requires a method for estimating abundance directly from the raw read data. Here we describe a fast, accurate new method that computes the abundance at the species level using the reads collected in a metagenomics experiment. Bracken (Bayesian Reestimation of Abundance after Classification with KrakEN) uses the taxonomic assignments made by Kraken, a very fast read-level classifier, along with information about the genomes themselves to estimate abundance at the species level, the genus level, or above. We demonstrate that Bracken can produce accurate species- and genus-level abundance estimates even when a sample contains multiple near-identical species.

KW - Bayesian estimation

KW - Metagenomics

KW - Microbiome

KW - Species abundance

UR - http://www.scopus.com/inward/record.url?scp=85026281530&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85026281530&partnerID=8YFLogxK

U2 - 10.7717/peerj-cs.104

DO - 10.7717/peerj-cs.104

M3 - Article

AN - SCOPUS:85026281530

VL - 2017

JO - PeerJ

JF - PeerJ

SN - 2167-8359

IS - 1

M1 - 104

ER -