TY - JOUR
T1 - Global Mapping of Transcription Factor Binding Sites by Sequencing Chromatin Surrogates
T2 - A Perspective on Experimental Design, Data Analysis, and Open Problems
AU - Wei, Yingying
AU - Wu, George
AU - Ji, Hongkai
N1 - Funding Information:
This work was partially supported by NIH Grants R01HG005220 and
PY - 2013/5
Y1 - 2013/5
AB - Mapping genome-wide binding sites of all transcription factors (TFs) in all biological contexts is a critical step toward understanding gene regulation. The state-of-the-art technologies for mapping transcription factor binding sites (TFBSs) couple chromatin immunoprecipitation (ChIP) with high-throughput sequencing (ChIP-seq) or tiling array hybridization (ChIP-chip). These technologies have limitations: they are low-throughput with respect to surveying many TFs. Recent advances in genome-wide chromatin profiling, including development of technologies such as DNase-seq, FAIRE-seq and ChIP-seq for histone modifications, make it possible to predict in vivo TFBSs by analyzing chromatin features at computationally determined DNA motif sites. This promising new approach may allow researchers to monitor the genome-wide binding sites of many TFs simultaneously. In this article, we discuss various experimental design and data analysis issues that arise when applying this approach. Through a systematic analysis of the data from the Encyclopedia of DNA Elements (ENCODE) project, we compare the predictive power of individual chromatin marks and of combinations of marks using supervised and unsupervised learning methods, and evaluate the value of integrating information from public ChIP and gene expression data. We also highlight the challenges and opportunities for developing novel analytical methods, such as resolving the one-motif-multiple-TF ambiguity and distinguishing functional from non-functional TF binding targets among the predicted binding sites.
KW - ChIP-seq
KW - DNase-seq
KW - FAIRE-seq
KW - Motif
KW - Next-generation sequencing
KW - Transcription factor binding sites
UR - http://www.scopus.com/inward/record.url?scp=84877081486&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84877081486&partnerID=8YFLogxK
U2 - 10.1007/s12561-012-9066-5
DO - 10.1007/s12561-012-9066-5
M3 - Article
AN - SCOPUS:84877081486
SN - 1867-1764
VL - 5
SP - 156
EP - 178
JO - Statistics in Biosciences
JF - Statistics in Biosciences
IS - 1
ER -