Bayesian coclustering of Anopheles gene expression time series: Study of immune defense response to multiple experimental challenges

Nicholas A. Heard, Christopher C. Holmes, David A. Stephens, David J. Hand, George Dimopoulos

Research output: Contribution to journal › Article

Abstract

We present a method for Bayesian model-based hierarchical coclustering of gene expression data and use it to study the temporal transcription responses of an Anopheles gambiae cell line upon challenge with multiple microbial elicitors. The method fits statistical regression models to the gene expression time series for each experiment and performs coclustering on the genes by optimizing a joint probability model, characterizing gene coregulation between multiple experiments. We compute the model using a two-stage Expectation-Maximization-type algorithm, first fixing the cross-experiment covariance structure and using efficient Bayesian hierarchical clustering to obtain a locally optimal clustering of the gene expression profiles and then, conditional on that clustering, carrying out Bayesian inference on the cross-experiment covariance using Markov chain Monte Carlo simulation to obtain an expectation. For the problem of model choice, we use a cross-validatory approach to decide between individual experiment modeling and varying levels of coclustering. Our method successfully generates tightly coregulated clusters of genes that are implicated in related processes and therefore can be used for analysis of global transcript responses to various stimuli and prediction of gene functions.

Original language: English (US)
Pages (from-to): 16939-16944
Number of pages: 6
Journal: Proceedings of the National Academy of Sciences of the United States of America
Volume: 102
Issue number: 47
DOI: 10.1073/pnas.0408393102
State: Published - Nov 22 2005

Fingerprint

Anopheles
Cluster Analysis
Gene Expression
Genes
Anopheles gambiae
Markov Chains
Bayes Theorem
Statistical Models
Multigene Family
Transcriptome
Regression Analysis
Cell Line

Keywords

  • Expectation-Maximization
  • Markov chain Monte Carlo
  • Microarray
  • Model-based clustering

ASJC Scopus subject areas

  • Genetics
  • General

Cite this

Bayesian coclustering of Anopheles gene expression time series: Study of immune defense response to multiple experimental challenges. / Heard, Nicholas A.; Holmes, Christopher C.; Stephens, David A.; Hand, David J.; Dimopoulos, George.

In: Proceedings of the National Academy of Sciences of the United States of America, Vol. 102, No. 47, 22.11.2005, p. 16939-16944.


@article{c8ca358f5e4d4a16a57998165c6a3e5f,
title = "Bayesian coclustering of Anopheles gene expression time series: Study of immune defense response to multiple experimental challenges",
abstract = "We present a method for Bayesian model-based hierarchical coclustering of gene expression data and use it to study the temporal transcription responses of an Anopheles gambiae cell line upon challenge with multiple microbial elicitors. The method fits statistical regression models to the gene expression time series for each experiment and performs coclustering on the genes by optimizing a joint probability model, characterizing gene coregulation between multiple experiments. We compute the model using a two-stage Expectation-Maximization-type algorithm, first fixing the cross-experiment covariance structure and using efficient Bayesian hierarchical clustering to obtain a locally optimal clustering of the gene expression profiles and then, conditional on that clustering, carrying out Bayesian inference on the cross-experiment covariance using Markov chain Monte Carlo simulation to obtain an expectation. For the problem of model choice, we use a cross-validatory approach to decide between individual experiment modeling and varying levels of coclustering. Our method successfully generates tightly coregulated clusters of genes that are implicated in related processes and therefore can be used for analysis of global transcript responses to various stimuli and prediction of gene functions.",
keywords = "Expectation-Maximization, Markov chain Monte Carlo, Microarray, Model-based clustering",
author = "Heard, {Nicholas A.} and Holmes, {Christopher C.} and Stephens, {David A.} and Hand, {David J.} and Dimopoulos, George",
year = "2005",
month = "11",
day = "22",
doi = "10.1073/pnas.0408393102",
language = "English (US)",
volume = "102",
pages = "16939--16944",
journal = "Proceedings of the National Academy of Sciences of the United States of America",
issn = "0027-8424",
number = "47",
}

TY - JOUR

T1 - Bayesian coclustering of Anopheles gene expression time series

T2 - Study of immune defense response to multiple experimental challenges

AU - Heard, Nicholas A.

AU - Holmes, Christopher C.

AU - Stephens, David A.

AU - Hand, David J.

AU - Dimopoulos, George

PY - 2005/11/22

Y1 - 2005/11/22

N2 - We present a method for Bayesian model-based hierarchical coclustering of gene expression data and use it to study the temporal transcription responses of an Anopheles gambiae cell line upon challenge with multiple microbial elicitors. The method fits statistical regression models to the gene expression time series for each experiment and performs coclustering on the genes by optimizing a joint probability model, characterizing gene coregulation between multiple experiments. We compute the model using a two-stage Expectation-Maximization-type algorithm, first fixing the cross-experiment covariance structure and using efficient Bayesian hierarchical clustering to obtain a locally optimal clustering of the gene expression profiles and then, conditional on that clustering, carrying out Bayesian inference on the cross-experiment covariance using Markov chain Monte Carlo simulation to obtain an expectation. For the problem of model choice, we use a cross-validatory approach to decide between individual experiment modeling and varying levels of coclustering. Our method successfully generates tightly coregulated clusters of genes that are implicated in related processes and therefore can be used for analysis of global transcript responses to various stimuli and prediction of gene functions.

KW - Expectation-Maximization

KW - Markov chain Monte Carlo

KW - Microarray

KW - Model-based clustering

UR - http://www.scopus.com/inward/record.url?scp=28044449342&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=28044449342&partnerID=8YFLogxK

U2 - 10.1073/pnas.0408393102

DO - 10.1073/pnas.0408393102

M3 - Article

C2 - 16287981

AN - SCOPUS:28044449342

VL - 102

SP - 16939

EP - 16944

JO - Proceedings of the National Academy of Sciences of the United States of America

JF - Proceedings of the National Academy of Sciences of the United States of America

SN - 0027-8424

IS - 47

ER -