TY - JOUR
T1 - Microbial gene identification using interpolated Markov models
AU - Salzberg, Steven L.
AU - Delcher, Arthur L.
AU - Kasif, Simon
AU - White, Owen
N1 - Funding Information:
Thanks to Mark Borodovsky and Alexander Lukashin for kindly sharing the results of GeneMarkHMM on the H. pylori genome. S.L.S. is supported by the National Human Genome Research Institute at NIH under Grant No. K01-HG00022-1. S.L.S. and A.L.D. are supported by the National Science Foundation under Grant No. IRI-9530462. S.K. is supported by NSF IRI-9529227. O.W. is supported by the Department of Energy Grant No. DE-FC02-95ER61962.A003.
PY - 1998/1/15
Y1 - 1998/1/15
N2 - This paper describes a new system, GLIMMER, for finding genes in microbial genomes. In a series of tests on Haemophilus influenzae, Helicobacter pylori and other complete microbial genomes, this system has proven to be very accurate at locating virtually all the genes in these sequences, outperforming previous methods. A conservative estimate based on experiments on H. pylori and H. influenzae is that the system finds > 97% of all genes. GLIMMER uses interpolated Markov models (IMMs) as a framework for capturing dependencies between nearby nucleotides in a DNA sequence. An IMM-based method makes predictions based on a variable context; i.e., a variable-length oligomer in a DNA sequence. The context used by GLIMMER changes depending on the local composition of the sequence. As a result, GLIMMER is more flexible and more powerful than fixed-order Markov methods, which have previously been the primary content-based technique for finding genes in microbial DNA.
AB - This paper describes a new system, GLIMMER, for finding genes in microbial genomes. In a series of tests on Haemophilus influenzae, Helicobacter pylori and other complete microbial genomes, this system has proven to be very accurate at locating virtually all the genes in these sequences, outperforming previous methods. A conservative estimate based on experiments on H. pylori and H. influenzae is that the system finds > 97% of all genes. GLIMMER uses interpolated Markov models (IMMs) as a framework for capturing dependencies between nearby nucleotides in a DNA sequence. An IMM-based method makes predictions based on a variable context; i.e., a variable-length oligomer in a DNA sequence. The context used by GLIMMER changes depending on the local composition of the sequence. As a result, GLIMMER is more flexible and more powerful than fixed-order Markov methods, which have previously been the primary content-based technique for finding genes in microbial DNA.
UR - http://www.scopus.com/inward/record.url?scp=0032518163&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0032518163&partnerID=8YFLogxK
U2 - 10.1093/nar/26.2.544
DO - 10.1093/nar/26.2.544
M3 - Article
C2 - 9421513
AN - SCOPUS:0032518163
SN - 0305-1048
VL - 26
SP - 544
EP - 548
JO - Nucleic acids research
JF - Nucleic acids research
IS - 2
ER -