Nucleotide sequences of DNA within clusters of transcription start sites identified by Cap Analysis of Gene Expression (CAGE) have some distinctive features. DNA within such clusters is enriched in cytosine and guanine, and its GC-skew is consistent with selection of the coding strand as the one in which the G content exceeds the C content. On the other hand, in the coding strand the frequency of tracts of the avoided cytosine, normalized to the expectation calculated from the local content of the nucleotide in the cluster, is significantly higher than that of tracts of the preferred guanine. Similarly, the statistical significance of the C-rich variant of the binding site for the transcription factor Sp1 in the coding strand is higher than that of the G-rich variant. Yet it is unlikely that the choice of the Sp1 site variant is induced by the coding strand selection. Rather, it is more likely that both variants are roughly equiprobable, and that functional Sp1 binding acts as a selection factor that counteracts the mutations bringing about the GC-skew.
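The two quantities discussed above, GC-skew and the tract frequency normalized to its expectation, can be illustrated with a minimal Python sketch. The definitions used here are assumptions for illustration and are not taken from the paper: GC-skew is computed as (G - C) / (G + C), and the expected number of tracts of length `k` is taken from an independent-nucleotide model using the local frequency of the base in the sequence. The function names `gc_skew` and `tract_enrichment` and the default `k = 3` are hypothetical.

```python
def gc_skew(seq: str) -> float:
    """Return (G - C) / (G + C); positive when G exceeds C on this strand."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    if g + c == 0:
        return 0.0  # skew is undefined without G or C; report 0 by convention
    return (g - c) / (g + c)


def tract_enrichment(seq: str, base: str, k: int = 3) -> float:
    """Observed / expected count of length-k windows made only of `base`.

    Assumption: the expectation comes from an independent-nucleotide model,
    expected = (L - k + 1) * p**k, where p is the local frequency of `base`
    in `seq`. Overlapping windows are counted.
    """
    seq, base = seq.upper(), base.upper()
    windows = len(seq) - k + 1
    if windows <= 0:
        raise ValueError("sequence is shorter than the tract length k")
    p = seq.count(base) / len(seq)
    expected = windows * p ** k
    if expected == 0:
        return 0.0  # base absent from the sequence, so no tract is possible
    observed = sum(1 for i in range(windows) if seq[i:i + k] == base * k)
    return observed / expected


if __name__ == "__main__":
    # Toy coding-strand fragment, invented for illustration only.
    cluster = "GGCGGAGGCCCGTGGGAGCCCG"
    print("GC-skew:", gc_skew(cluster))
    print("C-tract enrichment:", tract_enrichment(cluster, "C"))
    print("G-tract enrichment:", tract_enrichment(cluster, "G"))
```

Because each enrichment is normalized by the local content of its own nucleotide, a strand can have a positive GC-skew while its C tracts are still more enriched than its G tracts, which is the pattern the abstract describes.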
- Homo sapiens
- cap analysis of gene expression
- transcription factor