We have developed a simple procedure to identify protein homologs in genomic databases. The program, called ORF, is based on comparisons of predicted secondary structure. Protein structure is far better conserved than amino acid sequence, and structure-based methods have been effective in exploiting this fact to find homologs, even among proteins with scant sequence identity. ORF is a secondary structure-based method that operates solely on predictions from sequence and requires no experimentally determined information about the structure. The approach is illustrated by an example: Thymidylate synthase, a highly conserved enzyme essential to thymidine biosynthesis in both prokaryotes and eukaryotes, is thought to be used by Archaea, but a corresponding gene has yet to be identified. Here, a candidate thymidylate synthase is identified as a previously unassigned open reading frame from the genome of Methanococcus jannaschii, viz., MJ0757. Using primary structure information alone, the optimally aligned sequence identity between MJ0757 and Escherichia coli thymidylate synthase is 7%, well below the threshold of sensitivity for detection by sequence-based methods.
|Original language||English (US)|
|Number of pages||6|
|Journal||Proceedings of the National Academy of Sciences of the United States of America|
|State||Published - Mar 17 1998|
ASJC Scopus subject areas