Microbial gene identification using interpolated Markov models

Steven L. Salzberg; Arthur L. Delcher; Simon Kasif; Owen White

doi:10.1093/nar/26.2.544

Research output: Contribution to journal › Article › peer-review

757 Scopus citations

Abstract

This paper describes a new system, GLIMMER, for finding genes in microbial genomes. In a series of tests on Haemophilus influenzae, Helicobacter pylori and other complete microbial genomes, this system has proven to be very accurate at locating virtually all the genes in these sequences, outperforming previous methods. A conservative estimate based on experiments on H. pylori and H. influenzae is that the system finds > 97% of all genes. GLIMMER uses interpolated Markov models (IMMs) as a framework for capturing dependencies between nearby nucleotides in a DNA sequence. An IMM-based method makes predictions based on a variable context; i.e., a variable-length oligomer in a DNA sequence. The context used by GLIMMER changes depending on the local composition of the sequence. As a result, GLIMMER is more flexible and more powerful than fixed-order Markov methods, which have previously been the primary content-based technique for finding genes in microbial DNA.
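The idea of a variable context can be illustrated with a small sketch. The code below is not GLIMMER's implementation: the class name `InterpolatedMarkovModel`, the `pseudo` parameter, and the count-based weight `total / (total + pseudo)` are assumptions chosen for illustration, and the abstract does not specify how GLIMMER actually weights its contexts. The sketch only shows the general principle: the estimate for the next base blends predictions from contexts of increasing length, and a longer context gets more weight the more often it was seen in training.

```python
from collections import defaultdict
import math

BASES = "ACGT"


class InterpolatedMarkovModel:
    """Toy interpolated Markov model over DNA (illustrative only)."""

    def __init__(self, max_order=3, pseudo=10.0):
        self.max_order = max_order
        # Hypothetical smoothing constant: larger values trust long contexts less.
        self.pseudo = pseudo
        # counts[context][base] = number of times `base` followed `context`
        self.counts = defaultdict(lambda: defaultdict(int))

    def train(self, sequences):
        for seq in sequences:
            for i, base in enumerate(seq):
                # Record the base under every context length from 0 to max_order.
                for k in range(min(i, self.max_order) + 1):
                    self.counts[seq[i - k:i]][base] += 1

    def prob(self, context, base):
        """P(base | context), interpolated from order 0 up to max_order."""
        context = context[-self.max_order:] if self.max_order else ""
        p = 0.25  # uniform fallback before any counts are consulted
        for k in range(len(context) + 1):
            ctx = context[len(context) - k:]
            row = self.counts.get(ctx)
            total = sum(row.values()) if row else 0
            if total == 0:
                break  # unseen context: longer contexts are unseen too
            lam = total / (total + self.pseudo)  # weight grows with evidence
            p = lam * (row.get(base, 0) / total) + (1 - lam) * p
        return p

    def log_score(self, seq):
        """Log-probability of a sequence under the model."""
        return sum(math.log(self.prob(seq[:i], b)) for i, b in enumerate(seq))


if __name__ == "__main__":
    model = InterpolatedMarkovModel(max_order=3, pseudo=10.0)
    model.train(["ACGTACGTACGT"])
    print(model.prob("ACG", "T"), model.prob("ACG", "A"))
```

In a gene finder, one such model would typically be trained on known coding sequence and the score of a candidate region compared against a model of non-coding sequence; that comparison is omitted here.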

Original language	English (US)
Pages (from-to)	544-548
Number of pages	5
Journal	Nucleic Acids Research
Volume	26
Issue number	2
DOIs	https://doi.org/10.1093/nar/26.2.544
State	Published - Jan 15 1998
Externally published	Yes

ASJC Scopus subject areas

Genetics

Cite this

@article{9ee3e4a68acb4d6ea0d5a704cedf7ee8,

title = "Microbial gene identification using interpolated Markov models",

abstract = "This paper describes a new system, GLIMMER, for finding genes in microbial genomes. In a series of tests on Haemophilus influenzae, Helicobacter pylori and other complete microbial genomes, this system has proven to be very accurate at locating virtually all the genes in these sequences, outperforming previous methods. A conservative estimate based on experiments on H. pylori and H. influenzae is that the system finds > 97% of all genes. GLIMMER uses interpolated Markov models (IMMs) as a framework for capturing dependencies between nearby nucleotides in a DNA sequence. An IMM-based method makes predictions based on a variable context; i.e., a variable-length oligomer in a DNA sequence. The context used by GLIMMER changes depending on the local composition of the sequence. As a result, GLIMMER is more flexible and more powerful than fixed-order Markov methods, which have previously been the primary content-based technique for finding genes in microbial DNA.",

author = "Salzberg, {Steven L.} and Delcher, {Arthur L.} and Simon Kasif and Owen White",

note = "Funding Information: Thanks to Mark Borodovsky and Alexander Lukashin for kindly sharing the results of GeneMarkHMM on the H. pylori genome. S.L.S. is supported by the National Human Genome Research Institute at NIH under Grant No. K01-HG00022-1. S.L.S. and A.L.D. are supported by the National Science Foundation under Grant No. IRI-9530462. S.K. is supported by NSF IRI-9529227. O.W. is supported by the Department of Energy Grant No. DE-FC02-95ER61962.A003.",

year = "1998",

month = jan,

day = "15",

doi = "10.1093/nar/26.2.544",

language = "English (US)",

volume = "26",

pages = "544--548",

journal = "Nucleic Acids Research",

issn = "0305-1048",

publisher = "Oxford University Press",

number = "2",

}

TY - JOUR

T1 - Microbial gene identification using interpolated Markov models

AU - Salzberg, Steven L.

AU - Delcher, Arthur L.

AU - Kasif, Simon

AU - White, Owen

N1 - Funding Information: Thanks to Mark Borodovsky and Alexander Lukashin for kindly sharing the results of GeneMarkHMM on the H. pylori genome. S.L.S. is supported by the National Human Genome Research Institute at NIH under Grant No. K01-HG00022-1. S.L.S. and A.L.D. are supported by the National Science Foundation under Grant No. IRI-9530462. S.K. is supported by NSF IRI-9529227. O.W. is supported by the Department of Energy Grant No. DE-FC02-95ER61962.A003.

PY - 1998/1/15

Y1 - 1998/1/15

N2 - This paper describes a new system, GLIMMER, for finding genes in microbial genomes. In a series of tests on Haemophilus influenzae, Helicobacter pylori and other complete microbial genomes, this system has proven to be very accurate at locating virtually all the genes in these sequences, outperforming previous methods. A conservative estimate based on experiments on H. pylori and H. influenzae is that the system finds > 97% of all genes. GLIMMER uses interpolated Markov models (IMMs) as a framework for capturing dependencies between nearby nucleotides in a DNA sequence. An IMM-based method makes predictions based on a variable context; i.e., a variable-length oligomer in a DNA sequence. The context used by GLIMMER changes depending on the local composition of the sequence. As a result, GLIMMER is more flexible and more powerful than fixed-order Markov methods, which have previously been the primary content-based technique for finding genes in microbial DNA.

AB - This paper describes a new system, GLIMMER, for finding genes in microbial genomes. In a series of tests on Haemophilus influenzae, Helicobacter pylori and other complete microbial genomes, this system has proven to be very accurate at locating virtually all the genes in these sequences, outperforming previous methods. A conservative estimate based on experiments on H. pylori and H. influenzae is that the system finds > 97% of all genes. GLIMMER uses interpolated Markov models (IMMs) as a framework for capturing dependencies between nearby nucleotides in a DNA sequence. An IMM-based method makes predictions based on a variable context; i.e., a variable-length oligomer in a DNA sequence. The context used by GLIMMER changes depending on the local composition of the sequence. As a result, GLIMMER is more flexible and more powerful than fixed-order Markov methods, which have previously been the primary content-based technique for finding genes in microbial DNA.

UR - http://www.scopus.com/inward/record.url?scp=0032518163&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0032518163&partnerID=8YFLogxK

U2 - 10.1093/nar/26.2.544

DO - 10.1093/nar/26.2.544

M3 - Article

C2 - 9421513

AN - SCOPUS:0032518163

SN - 0305-1048

VL - 26

SP - 544

EP - 548

JO - Nucleic Acids Research

JF - Nucleic Acids Research

IS - 2

ER -
