Predicting gene expression from sequence

Michael Beer, Saeed Tavazoie

Research output: Contribution to journal › Article

Abstract

We describe a systematic genome-wide approach for learning the complex combinatorial code underlying gene expression. Our probabilistic approach identifies local DNA-sequence elements and the positional and combinatorial constraints that determine their context-dependent role in transcriptional regulation. The inferred regulatory rules correctly predict expression patterns for 73% of genes in Saccharomyces cerevisiae, utilizing microarray expression data and sequences in the 800 bp upstream of genes. Application to Caenorhabditis elegans identifies predictive regulatory elements and combinatorial rules that control the phased temporal expression of transcription factors, histones, and germline specific genes. Successful prediction requires diverse and complex rules utilizing AND, OR, and NOT logic, with significant constraints on motif strength, orientation, and relative position. This system generates a large number of mechanistic hypotheses for focused experimental validation, and establishes a predictive dynamical framework for understanding cellular behavior from genomic sequence.
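The abstract describes regulatory rules that combine motif occurrences with AND, OR, and NOT logic under constraints on orientation and position. The Python sketch below is only a rough illustration of what evaluating one such rule on an 800-bp upstream window could look like. It is not the authors' method: the paper learns motifs and rules probabilistically from expression data, whereas here the motifs `MOTIF_A`, `MOTIF_B` and `MOTIF_C`, all helper names, the 300-bp distance threshold, and the rule itself are hypothetical placeholders. Matching is exact, so motif strength is not modeled, and the window is assumed to end immediately upstream of the start codon.

```python
from dataclasses import dataclass

# Hypothetical placeholder motifs (NOT motifs reported in the paper).
MOTIF_A = "AAGGTC"
MOTIF_B = "CTTAGG"
MOTIF_C = "GCATGT"

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string (A/C/G/T/N)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


@dataclass(frozen=True)
class Hit:
    start: int   # 0-based offset of the match in the upstream window
    strand: str  # "+" = consensus on the given strand, "-" = reverse complement


def find_hits(upstream: str, consensus: str) -> list[Hit]:
    """Exact-match scan of both strands; a palindromic site is reported once, as '+'."""
    seq, fwd = upstream.upper(), consensus.upper()
    rev = revcomp(fwd)
    k = len(fwd)
    hits = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window == fwd:
            hits.append(Hit(i, "+"))
        elif window == rev:
            hits.append(Hit(i, "-"))
    return hits


def distance_to_start(hit: Hit, k: int, window_len: int) -> int:
    """Bases between the motif's 3' end and the end of the window
    (assumption: the window ends right before the start codon)."""
    return window_len - (hit.start + k)


def example_rule(upstream: str, max_dist: int = 300) -> bool:
    """Hypothetical rule:
    ((A within max_dist of the start AND B on the '+' strand) OR >= 2 copies of A)
    AND NOT C anywhere in the window."""
    n = len(upstream)
    a_hits = find_hits(upstream, MOTIF_A)
    b_hits = find_hits(upstream, MOTIF_B)
    c_hits = find_hits(upstream, MOTIF_C)
    a_near = any(distance_to_start(h, len(MOTIF_A), n) <= max_dist for h in a_hits)  # position constraint
    b_forward = any(h.strand == "+" for h in b_hits)                                 # orientation constraint
    return ((a_near and b_forward) or len(a_hits) >= 2) and not c_hits               # AND / OR / NOT
```

For example, an 800-bp window built from 500 filler bases ("A"), `MOTIF_A`, 100 filler bases, `MOTIF_B`, and 188 filler bases places `MOTIF_A` 294 bp from the start, so `example_rule` returns `True`; replacing `MOTIF_B` with its reverse complement makes the orientation clause fail, and the rule returns `False`.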

Original language: English (US)
Pages (from-to): 185-198
Number of pages: 14
Journal: Cell
Volume: 117
Issue number: 2
DOIs: 10.1016/S0092-8674(04)00304-6
State: Published - Apr 16 2004
Externally published: Yes

Fingerprint

Gene expression
Genes
Gene Expression
Caenorhabditis elegans
Histones
Saccharomyces cerevisiae
DNA sequences
Transcription Factors
Microarrays
Learning
Genome
Yeast

ASJC Scopus subject areas

  • Cell Biology
  • Molecular Biology

Cite this

Predicting gene expression from sequence. / Beer, Michael; Tavazoie, Saeed.

In: Cell, Vol. 117, No. 2, 16.04.2004, p. 185-198.

Beer, Michael; Tavazoie, Saeed. / Predicting gene expression from sequence. In: Cell. 2004; Vol. 117, No. 2. pp. 185-198.
@article{d7a0f70f6f89419ebc3a6a4478053e77,
title = "Predicting gene expression from sequence",
author = "Michael Beer and Saeed Tavazoie",
year = "2004",
month = "4",
day = "16",
doi = "10.1016/S0092-8674(04)00304-6",
language = "English (US)",
volume = "117",
pages = "185--198",
journal = "Cell",
issn = "0092-8674",
publisher = "Cell Press",
number = "2",

}

TY - JOUR

T1 - Predicting gene expression from sequence

AU - Beer, Michael

AU - Tavazoie, Saeed

PY - 2004/4/16

Y1 - 2004/4/16

UR - http://www.scopus.com/inward/record.url?scp=1942453302&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=1942453302&partnerID=8YFLogxK

U2 - 10.1016/S0092-8674(04)00304-6

DO - 10.1016/S0092-8674(04)00304-6

M3 - Article

C2 - 15084257

AN - SCOPUS:1942453302

VL - 117

SP - 185

EP - 198

JO - Cell

JF - Cell

SN - 0092-8674

IS - 2

ER -