TY - JOUR
T1 - A probabilistic method for identifying start codons in bacterial genomes
AU - Suzek, Baris E.
AU - Ermolaeva, Maria D.
AU - Schreiber, Mark
AU - Salzberg, Steven L.
N1 - Funding Information: S.L.S. and M.D.E. were supported in part by grants IRI-9902923 and KDI-9988088 from the National Science Foundation and grant R01-LM06845 from the National Institutes of Health. M.S. was supported by a Targeted PhD Scholarship from Otago University obtained on his behalf by Christopher Brown.
PY - 2002
Y1 - 2002
AB - As the pace of genome sequencing has accelerated, the need for highly accurate gene prediction systems has grown. Computational systems for identifying genes in prokaryotic genomes have sensitivities of 98-99% or higher (Delcher et al., Nucleic Acids Res., 27, 4636-4641, 1999). These accuracy figures are calculated by comparing the locations of verified stop codons to the predictions. Determining the accuracy of start codon prediction is more problematic, however, due to the relatively small number of start sites that have been confirmed by independent, non-computational methods. Nonetheless, the accuracy of gene finders at predicting the exact gene boundaries at both the 5′ and 3′ ends of genes is of critical importance for microbial genome annotation, especially in light of the important signaling information that is sometimes found on the 5′ end of a protein coding region. In this paper we propose a probabilistic method to improve the accuracy of gene identification systems at finding precise translation start sites. The new system, RBSfinder, is tested on a validated set of genes from Escherichia coli, for which it improves the accuracy of start site locations predicted by computational gene finding systems from the range 67-77% to 90% correct.
UR - http://www.scopus.com/inward/record.url?scp=0036137327&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0036137327&partnerID=8YFLogxK
U2 - 10.1093/bioinformatics/17.12.1123
DO - 10.1093/bioinformatics/17.12.1123
M3 - Article
C2 - 11751220
AN - SCOPUS:0036137327
SN - 1367-4803
VL - 17
SP - 1123
EP - 1130
JO - Bioinformatics
JF - Bioinformatics
IS - 12
ER -