Long non-coding RNAs (lncRNAs), which account for a large proportion of the non-coding transcripts across the human genome, are evolutionarily conserved and biologically functional. At least one-third of the phenotype-related loci identified by genome-wide association studies (GWAS) map to non-coding intervals. However, the relationships between phenotype-related loci and lncRNAs are largely unknown. Using data from the 1000 Genomes Project, we compared single-nucleotide polymorphisms (SNPs) within the sequences of lncRNA and protein-coding genes as defined in the Ensembl database. We further annotated the phenotype-related SNPs reported by GWAS at lncRNA intervals. Because prostate cancer (PCa) risk-related loci were enriched in lncRNAs, we then performed a meta-analysis of two existing GWAS for discovery, followed by replication in an additional sample set, which revealed PCa risk-related loci in lncRNA regions. The SNP density in lncRNA regions was similar to that in protein-coding regions, but both were less polymorphic than their surrounding regions. Of the 1998 phenotype-related SNPs identified by GWAS, 52 were located directly within lncRNA intervals, a 1.5-fold enrichment relative to the genome as a whole. Eight PCa risk-related loci mapped to lncRNA genes, an enrichment of more than 5-fold. We also identified a new PCa risk-related SNP, rs3787016, in an lncRNA region at 19q13 (per-allele odds ratio = 1.19; 95% confidence interval: 1.11-1.27; P = 7.22 × 10⁻⁷). lncRNAs may be important for interpreting and mining GWAS data. However, the catalog of lncRNAs needs to be better characterized to fully evaluate the relationship between phenotype-related loci and lncRNAs.
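As a rough illustration of the enrichment arithmetic quoted above, a fold enrichment can be read as the fraction of GWAS loci falling in lncRNA intervals divided by the fraction expected by chance. The sketch below assumes the expected fraction is a single background proportion (for example, the share of all catalogued SNPs lying in lncRNA intervals). The function names are hypothetical, and the one-sided binomial tail is one simple significance check, not necessarily the test used in this study.

```python
from math import exp, lgamma, log


def fold_enrichment(hits: int, total: int, background_fraction: float) -> float:
    """Observed fraction of loci in the target intervals divided by the
    fraction expected from a background proportion (an assumed input)."""
    if total <= 0 or not 0 < background_fraction < 1:
        raise ValueError("total must be positive and background_fraction in (0, 1)")
    return (hits / total) / background_fraction


def log_binom_pmf(k: int, n: int, p: float) -> float:
    """Log of the Binomial(n, p) probability of exactly k successes,
    computed with lgamma so large n does not overflow."""
    return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
            + k * log(p) + (n - k) * log(1 - p))


def binomial_upper_tail(hits: int, total: int, p: float) -> float:
    """One-sided P(X >= hits) for X ~ Binomial(total, p): the chance of seeing
    at least this many loci in the intervals if loci landed at random."""
    return sum(exp(log_binom_pmf(k, total, p)) for k in range(hits, total + 1))


if __name__ == "__main__":
    # 52 of 1998 GWAS SNPs fall in lncRNA intervals (from the abstract).
    # 0.0174 is a placeholder background fraction, back-calculated from the
    # reported 1.5-fold figure; it is NOT a value reported by the study.
    fe = fold_enrichment(52, 1998, 0.0174)
    p_value = binomial_upper_tail(52, 1998, 0.0174)
    print(f"fold enrichment = {fe:.2f}, one-sided binomial P = {p_value:.3g}")
```

In practice the background fraction would come from the same lncRNA annotation (here, Ensembl) and SNP catalog used to count the hits; changing either shifts both the enrichment and its P value.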