TY - JOUR
T1 - EAGLE
T2 - An algorithm that utilizes a small number of genomic features to predict tissue/cell type-specific enhancer-gene interactions
AU - Gao, Tianshun
AU - Qian, Jiang
N1 - Publisher Copyright:
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.
Copyright:
Copyright 2020 Elsevier B.V., All rights reserved.
PY - 2019/9/25
Y1 - 2019/9/25
N2 - Long-range regulation by distal enhancers is crucial for many biological processes. The existing methods for enhancer-target gene prediction often require many genomic features. This makes them difficult to be applied to many cell types, in which the relevant datasets are not always available. Here, we design a tool EAGLE, an enhancer and gene learning ensemble method for identification of Enhancer-Gene (EG) interactions. Unlike existing tools, EAGLE used only six features derived from the genomic features of enhancers and gene expression datasets. Cross-validation revealed that EAGLE outperformed other existing methods. Enrichment analyses on special transcriptional factors, epigenetic modifications, and eQTLs demonstrated that EAGLE could distinguish the interacting pairs from non- interacting ones. Finally, EAGLE was applied to mouse and human genomes and identified 7,680,203 and 7,437,255 EG interactions involving 31,375 and 43,724 genes, 138,547 and 177,062 enhancers across 89 and 110 tissue/cell types in mouse and human, respectively. The obtained interactions are accessible through an interactive database enhanceratlas.org. The EAGLE method is available at https://github.com/EvansGao/EAGLE and the predicted datasets are available in http://www.enhanceratlas.org/.
AB - Long-range regulation by distal enhancers is crucial for many biological processes. The existing methods for enhancer-target gene prediction often require many genomic features. This makes them difficult to be applied to many cell types, in which the relevant datasets are not always available. Here, we design a tool EAGLE, an enhancer and gene learning ensemble method for identification of Enhancer-Gene (EG) interactions. Unlike existing tools, EAGLE used only six features derived from the genomic features of enhancers and gene expression datasets. Cross-validation revealed that EAGLE outperformed other existing methods. Enrichment analyses on special transcriptional factors, epigenetic modifications, and eQTLs demonstrated that EAGLE could distinguish the interacting pairs from non- interacting ones. Finally, EAGLE was applied to mouse and human genomes and identified 7,680,203 and 7,437,255 EG interactions involving 31,375 and 43,724 genes, 138,547 and 177,062 enhancers across 89 and 110 tissue/cell types in mouse and human, respectively. The obtained interactions are accessible through an interactive database enhanceratlas.org. The EAGLE method is available at https://github.com/EvansGao/EAGLE and the predicted datasets are available in http://www.enhanceratlas.org/.
UR - http://www.scopus.com/inward/record.url?scp=85095656180&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85095656180&partnerID=8YFLogxK
U2 - 10.1101/781427
DO - 10.1101/781427
M3 - Article
AN - SCOPUS:85095656180
JO - Advances in Water Resources
JF - Advances in Water Resources
SN - 0309-1708
ER -