The finished human genome-assemblies comprise several hundred un-sequenced euchromatic gaps, which may be rich in long polypurine/polypyrimidine stretches. Human chromosome 20 (chr 20) currently has three unfinished gaps remaining on its q-arm. All three gaps are within gene-dense regions and/or overlap disease-associated loci, including the DLGAP4 locus. In this study, we sequenced ∼99 of all three unfinished gaps on human chr 20, determined their complete genomic sizes and assessed epigenetic profiles using a combination of Sanger sequencing, mate pair paired-end high-throughput sequencing and chromatin, methylation and expression analyses. We found histone 3 trimethylated at Lysine 27 to be distributed across all three gaps in immortalized B-lymphocytes. In one gap, five novel CpG islands were predominantly hypermethylated in genomic DNA from peripheral blood lymphocytes and human cerebellum. One of these CpG islands was differentially methylated and paternally hypermethylated. We found all chr 20 gaps to comprise structured non-coding RNAs (ncRNAs) and to be conserved in primates. We verified expression for 13 candidate ncRNAs, some of which showed tissue specificity. Four ncRNAs expressed within the gap at DLGAP4 show elevated expression in the human brain. Our data suggest that unfinished human genome gaps are likely to comprise numerous functional elements.
ASJC Scopus subject areas