Microbial gene identification using interpolated Markov models

Steven L. Salzberg, Arthur L. Delcher, Simon Kasif, Owen White

Research output: Contribution to journal (Article)

Abstract

This paper describes a new system, GLIMMER, for finding genes in microbial genomes. In a series of tests on Haemophilus influenzae, Helicobacter pylori and other complete microbial genomes, this system has proven to be very accurate at locating virtually all the genes in these sequences, outperforming previous methods. A conservative estimate based on experiments on H. pylori and H. influenzae is that the system finds > 97% of all genes. GLIMMER uses interpolated Markov models (IMMs) as a framework for capturing dependencies between nearby nucleotides in a DNA sequence. An IMM-based method makes predictions based on a variable context; i.e., a variable-length oligomer in a DNA sequence. The context used by GLIMMER changes depending on the local composition of the sequence. As a result, GLIMMER is more flexible and more powerful than fixed-order Markov methods, which have previously been the primary content-based technique for finding genes in microbial DNA.
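
As a rough illustration of the variable-context idea in the abstract, the Python sketch below blends next-base estimates from contexts of increasing length, trusting a longer context more when it was seen more often in training. It is not GLIMMER's implementation: the class name `SimpleIMM`, the count-based weight `total / (total + k)`, the constant `k`, and the additive smoothing are all assumptions chosen for brevity, and the weighting scheme used in the paper is not reproduced here.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"  # the sketch assumes uppercase DNA with no ambiguity codes


class SimpleIMM:
    """Toy interpolated Markov model (illustrative, not GLIMMER's scheme)."""

    def __init__(self, max_order=3, pseudo=1.0, k=5.0):
        self.max_order = max_order
        self.pseudo = pseudo  # additive smoothing per base (assumption)
        self.k = k            # hypothetical constant controlling the weights
        # counts[context][base] = how often `base` followed `context`
        self.counts = defaultdict(lambda: defaultdict(int))

    def train(self, sequences):
        """Count every base after each preceding context of length 0..max_order."""
        for seq in sequences:
            for i, base in enumerate(seq):
                for order in range(self.max_order + 1):
                    if i >= order:
                        self.counts[seq[i - order:i]][base] += 1

    def prob(self, context, base):
        """Blend estimates from order 0 up to the longest available context."""
        if self.max_order == 0:
            context = ""
        else:
            context = context[-self.max_order:]
        p = 1.0 / len(ALPHABET)  # start from a uniform estimate
        for order in range(len(context) + 1):
            ctx = context[len(context) - order:]
            seen = self.counts.get(ctx)
            total = sum(seen.values()) if seen else 0
            hits = seen.get(base, 0) if seen else 0
            est = (hits + self.pseudo) / (total + len(ALPHABET) * self.pseudo)
            lam = total / (total + self.k)  # more data, more trust in this order
            p = lam * est + (1.0 - lam) * p
        return p

    def log_score(self, seq):
        """Sum of log probabilities of each base given its preceding context."""
        score = 0.0
        for i, base in enumerate(seq):
            context = seq[max(0, i - self.max_order):i]
            score += math.log(self.prob(context, base))
        return score


model = SimpleIMM(max_order=3)
model.train(["ACGT" * 10])
score = model.log_score("ACGTACGT")
```

A fixed-order model would correspond to using only the longest context; here a rarely seen long context contributes little, and the estimate falls back toward shorter ones.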

Original language: English (US)
Pages (from-to): 544-548
Number of pages: 5
Journal: Nucleic Acids Research
Volume: 26
Issue number: 2
DOIs: 10.1093/nar/26.2.544
State: Published - Jan 15 1998

Fingerprint

Microbial Genes
Microbial Genome
Haemophilus influenzae
Helicobacter pylori
Genes
Nucleotides
DNA

ASJC Scopus subject areas

  • Genetics

Cite this

Microbial gene identification using interpolated Markov models. / Salzberg, Steven L.; Delcher, Arthur L.; Kasif, Simon; White, Owen.

In: Nucleic Acids Research, Vol. 26, No. 2, 15.01.1998, p. 544-548.

Salzberg, Steven L. ; Delcher, Arthur L. ; Kasif, Simon ; White, Owen. / Microbial gene identification using interpolated Markov models. In: Nucleic Acids Research. 1998 ; Vol. 26, No. 2. pp. 544-548.
@article{9ee3e4a68acb4d6ea0d5a704cedf7ee8,
title = "Microbial gene identification using interpolated Markov models",
abstract = "This paper describes a new system, GLIMMER, for finding genes in microbial genomes. In a series of tests on Haemophilus influenzae, Helicobacter pylori and other complete microbial genomes, this system has proven to be very accurate at locating virtually all the genes in these sequences, outperforming previous methods. A conservative estimate based on experiments on H. pylori and H. influenzae is that the system finds > 97{\%} of all genes. GLIMMER uses interpolated Markov models (IMMs) as a framework for capturing dependencies between nearby nucleotides in a DNA sequence. An IMM-based method makes predictions based on a variable context; i.e., a variable-length oligomer in a DNA sequence. The context used by GLIMMER changes depending on the local composition of the sequence. As a result, GLIMMER is more flexible and more powerful than fixed-order Markov methods, which have previously been the primary content-based technique for finding genes in microbial DNA.",
author = "Salzberg, {Steven L} and Delcher, {Arthur L.} and Simon Kasif and Owen White",
year = "1998",
month = "1",
day = "15",
doi = "10.1093/nar/26.2.544",
language = "English (US)",
volume = "26",
pages = "544--548",
journal = "Nucleic Acids Research",
issn = "1362-4962",
publisher = "Oxford University Press",
number = "2",

}

TY - JOUR

T1 - Microbial gene identification using interpolated Markov models

AU - Salzberg, Steven L

AU - Delcher, Arthur L.

AU - Kasif, Simon

AU - White, Owen

PY - 1998/1/15

Y1 - 1998/1/15

N2 - This paper describes a new system, GLIMMER, for finding genes in microbial genomes. In a series of tests on Haemophilus influenzae, Helicobacter pylori and other complete microbial genomes, this system has proven to be very accurate at locating virtually all the genes in these sequences, outperforming previous methods. A conservative estimate based on experiments on H. pylori and H. influenzae is that the system finds > 97% of all genes. GLIMMER uses interpolated Markov models (IMMs) as a framework for capturing dependencies between nearby nucleotides in a DNA sequence. An IMM-based method makes predictions based on a variable context; i.e., a variable-length oligomer in a DNA sequence. The context used by GLIMMER changes depending on the local composition of the sequence. As a result, GLIMMER is more flexible and more powerful than fixed-order Markov methods, which have previously been the primary content-based technique for finding genes in microbial DNA.

UR - http://www.scopus.com/inward/record.url?scp=0032518163&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0032518163&partnerID=8YFLogxK

U2 - 10.1093/nar/26.2.544

DO - 10.1093/nar/26.2.544

M3 - Article

VL - 26

SP - 544

EP - 548

JO - Nucleic Acids Research

JF - Nucleic Acids Research

SN - 1362-4962

IS - 2

ER -