We describe a postgenomic in silico approach for identifying genes that are likely to be essential and estimate their proportion in haploid genomes. With the knowledge of all sites eligible for mutagenesis and an experimentally determined partial list of nonessential genes from genome mutagenesis, a Bayesian statistical method provides reasonable predictions of essential genes with a subsaturation level of random mutagenesis. For mutagenesis, a transposon such as Himar1 is suitable as it inserts randomly into TA sites. All of the possible insertion sites may be determined a priori from the genome sequence, and with this information, data on experimentally hit TA sites may be used to predict the proportion of genes that cannot be mutated. As a model, we used the Mycobacterium tuberculosis genome. Using the Himar1 transposon, we created a genetically defined collection of 1,425 insertion mutants. Based on our Bayesian statistical analysis using Markov chain Monte Carlo and the observed frequencies of transposon insertions in all of the genes, we estimated that the M. tuberculosis genome contains 35% (95% confidence interval, 28%-41%) essential genes. This analysis further revealed seven functional groups with high probabilities of being enriched in essential genes. The PE-PGRS (Pro-Glu polymorphic GC-rich repetitive sequence) family of genes, which are unique to mycobacteria, the polyketide/nonribosomal peptide synthase family, and mycolic and fatty acid biosynthesis gene families were disproportionately enriched in essential genes. At subsaturation levels of mutagenesis with a random transposon such as Himar1, this approach permits a statistical prediction of both the proportion and identities of essential genes of sequenced genomes.
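The two inputs the abstract relies on, the a priori list of TA sites and the chance that a mutable gene is simply missed at subsaturation, can be illustrated with a minimal Python sketch. This is not the authors' Bayesian Markov chain Monte Carlo model. The function names `ta_sites` and `prob_unhit`, the toy sequence, and the assumption that each insertion lands independently and uniformly on one TA site are illustrative choices only.

```python
def ta_sites(seq: str) -> list[int]:
    """Return the 0-based start position of every TA dinucleotide in seq."""
    s = seq.upper()
    return [i for i in range(len(s) - 1) if s[i:i + 2] == "TA"]


def prob_unhit(gene_sites: int, total_sites: int, insertions: int) -> float:
    """Probability that a nonessential gene receives no insertion.

    Simplifying assumption (not taken from the paper): every insertion
    lands independently and uniformly on one of total_sites TA sites.
    """
    if total_sites <= 0 or not 0 <= gene_sites <= total_sites:
        raise ValueError("need 0 <= gene_sites <= total_sites and total_sites > 0")
    if insertions < 0:
        raise ValueError("insertions must be non-negative")
    return (1.0 - gene_sites / total_sites) ** insertions


# Toy sequence, purely illustrative; a real analysis would scan the
# full genome sequence and group TA sites by annotated gene.
toy_genome = "ATGTACCGTATAGGCTAA"
toy_positions = ta_sites(toy_genome)  # [3, 8, 10, 15]
```

Under this simplified model, a gene with few TA sites has a high chance of going unhit even if it is nonessential. That is why the absence of an insertion alone does not establish essentiality at subsaturation, and why a statistical model over all genes is needed.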
|Original language||English (US)|
|Number of pages||6|
|Journal||Proceedings of the National Academy of Sciences of the United States of America|
|State||Published - Jun 10 2003|