TY - JOUR
T1 - Predicting gene expression from sequence
AU - Beer, Michael A.
AU - Tavazoie, Saeed
N1 - Funding Information: We thank the members of the Tavazoie laboratory and Sohail Tavazoie for helpful discussion and review of this work. M.B. is a Lewis Thomas Fellow of Princeton University. S.T. is supported in part by grants from NSF CAREER, DARPA, and DOE.
PY - 2004/04/16
Y1 - 2004/04/16
AB - We describe a systematic genome-wide approach for learning the complex combinatorial code underlying gene expression. Our probabilistic approach identifies local DNA-sequence elements and the positional and combinatorial constraints that determine their context-dependent role in transcriptional regulation. The inferred regulatory rules correctly predict expression patterns for 73% of genes in Saccharomyces cerevisiae, utilizing microarray expression data and sequences in the 800 bp upstream of genes. Application to Caenorhabditis elegans identifies predictive regulatory elements and combinatorial rules that control the phased temporal expression of transcription factors, histones, and germline-specific genes. Successful prediction requires diverse and complex rules utilizing AND, OR, and NOT logic, with significant constraints on motif strength, orientation, and relative position. This system generates a large number of mechanistic hypotheses for focused experimental validation, and establishes a predictive dynamical framework for understanding cellular behavior from genomic sequence.
UR - http://www.scopus.com/inward/record.url?scp=1942453302&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=1942453302&partnerID=8YFLogxK
U2 - 10.1016/S0092-8674(04)00304-6
DO - 10.1016/S0092-8674(04)00304-6
M3 - Article
C2 - 15084257
AN - SCOPUS:1942453302
VL - 117
SP - 185
EP - 198
JO - Cell
JF - Cell
SN - 0092-8674
IS - 2
ER -