Virulence and immunity are poorly understood in Mycobacterium tuberculosis. We sequenced the complete genome of the M. tuberculosis clinical strain CDC1551 and performed a whole-genome comparison with the laboratory strain H37Rv in order to identify polymorphic sequences with potential relevance to disease pathogenesis, immunity, and evolution. We found large-sequence and single-nucleotide polymorphisms in numerous genes. Polymorphic loci included a phospholipase C, a membrane lipoprotein, members of an adenylate cyclase gene family, and members of the PE/PPE gene family, some of which have been implicated in virulence or the host immune response. Several gene families, including the PE/PPE gene family, also had significantly higher synonymous and nonsynonymous substitution frequencies compared to the genome as a whole. We tested a large sample of M. tuberculosis clinical isolates for a subset of the large-sequence and single-nucleotide polymorphisms and found widespread genetic variability at many of these loci. We performed phylogenetic and epidemiological analysis to investigate the evolutionary relationships among isolates and the origins of specific polymorphic loci. A number of these polymorphisms appear to have occurred multiple times as independent events, suggesting that these changes may be under selective pressure. Together, these results demonstrate that polymorphisms among M. tuberculosis strains are more extensive than initially anticipated, and genetic variation may have an important role in disease pathogenesis and immunity.
ASJC Scopus subject areas
- Molecular Biology