Microbial gene identification using interpolated Markov models

Steven L. Salzberg, Arthur L. Delcher, Simon Kasif, Owen White

Research output: Contribution to journal, Article, peer-review

757 Scopus citations

Abstract

This paper describes a new system, GLIMMER, for finding genes in microbial genomes. In a series of tests on Haemophilus influenzae, Helicobacter pylori and other complete microbial genomes, this system has proven to be very accurate at locating virtually all the genes in these sequences, outperforming previous methods. A conservative estimate based on experiments on H. pylori and H. influenzae is that the system finds > 97% of all genes. GLIMMER uses interpolated Markov models (IMMs) as a framework for capturing dependencies between nearby nucleotides in a DNA sequence. An IMM-based method makes predictions based on a variable context; i.e., a variable-length oligomer in a DNA sequence. The context used by GLIMMER changes depending on the local composition of the sequence. As a result, GLIMMER is more flexible and more powerful than fixed-order Markov methods, which have previously been the primary content-based technique for finding genes in microbial DNA.
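The variable-context idea in the abstract can be illustrated with a minimal Python sketch. This is not GLIMMER's implementation: the abstract does not say how contexts are weighted, so the rule used here (a context's weight grows linearly with how often it was seen in training, capped at 1) is a simplifying assumption, and the names `train_counts`, `imm_prob`, and `full_weight_at` are hypothetical. The sketch only shows the general shape of an interpolated Markov model, in which predictions from longer contexts are blended with predictions from shorter ones, so that well-sampled long contexts dominate and rarely seen ones fall back to shorter oligomers.

```python
from collections import defaultdict

ALPHABET = "ACGT"


def train_counts(sequences, max_order):
    """Count how often each base follows each context of length 0..max_order."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for i, base in enumerate(seq):
            # Record the base under every context length available at position i.
            for k in range(min(i, max_order) + 1):
                counts[seq[i - k:i]][base] += 1
    return counts


def imm_prob(counts, context, base, max_order, full_weight_at=4):
    """Interpolated probability of `base` given the preceding `context`.

    `full_weight_at` is an assumed tuning parameter: a context seen at least
    this many times is trusted fully; rarer contexts get a smaller weight.
    """
    # Order 0: overall base frequencies with add-one smoothing.
    zero = counts.get("", {})
    prob = (zero.get(base, 0) + 1) / (sum(zero.values()) + len(ALPHABET))

    # Blend in progressively longer contexts, shortest first.
    for k in range(1, min(len(context), max_order) + 1):
        seen = counts.get(context[-k:], {})
        total = sum(seen.values())
        if total == 0:
            break  # an unseen context implies all longer ones are unseen too
        weight = min(1.0, total / full_weight_at)
        prob = weight * (seen.get(base, 0) / total) + (1 - weight) * prob
    return prob


if __name__ == "__main__":
    counts = train_counts(["ACGTACGTACGT"], max_order=2)
    for b in ALPHABET:
        print(b, imm_prob(counts, "AC", b, max_order=2))
```

Because each step is a convex combination of two probability distributions, the four base probabilities for any context still sum to 1. A fixed-order model corresponds to the special case where only one context length is ever used, which is why sparse training data hurts it more.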

Original language: English (US)
Pages (from-to): 544-548
Number of pages: 5
Journal: Nucleic Acids Research
Volume: 26
Issue number: 2
State: Published - Jan 15 1998
Externally published: Yes

ASJC Scopus subject areas

  • Genetics
