Balrog: A universal protein model for prokaryotic gene prediction

Markus J. Sommer, Steven L. Salzberg

Research output: Contribution to journal › Article › peer-review

Abstract

Low-cost, high-throughput sequencing has led to an enormous increase in the number of sequenced microbial genomes, with well over 100,000 genomes in public archives today. Automatic genome annotation tools are integral to understanding these organisms, yet older gene finding methods must be retrained on each new genome. We have developed a universal model of prokaryotic genes by fitting a temporal convolutional network to amino-acid sequences from a large, diverse set of microbial genomes. We incorporated the new model into a gene finding system, Balrog (Bacterial Annotation by Learned Representation Of Genes), which does not require genome-specific training and which matches or outperforms other state-of-the-art gene finding tools. Balrog is freely available under the MIT license at https://github.com/salzberg-lab/Balrog.

Original language: English (US)
Article number: e1008727
Journal: PLoS Computational Biology
Volume: 17
Issue number: 2
State: Published - Feb 26 2021

ASJC Scopus subject areas

  • Ecology, Evolution, Behavior and Systematics
  • Modeling and Simulation
  • Ecology
  • Molecular Biology
  • Genetics
  • Cellular and Molecular Neuroscience
  • Computational Theory and Mathematics
