TY - JOUR
T1 - Bayesian coclustering of Anopheles gene expression time series
T2 - Study of immune defense response to multiple experimental challenges
AU - Heard, Nicholas A.
AU - Holmes, Christopher C.
AU - Stephens, David A.
AU - Hand, David J.
AU - Dimopoulos, George
PY - 2005/11/22
Y1 - 2005/11/22
N2 - We present a method for Bayesian model-based hierarchical coclustering of gene expression data and use it to study the temporal transcription responses of an Anopheles gambiae cell line upon challenge with multiple microbial elicitors. The method fits statistical regression models to the gene expression time series for each experiment and performs coclustering on the genes by optimizing a joint probability model, characterizing gene coregulation between multiple experiments. We compute the model using a two-stage Expectation-Maximization-type algorithm, first fixing the cross-experiment covariance structure and using efficient Bayesian hierarchical clustering to obtain a locally optimal clustering of the gene expression profiles and then, conditional on that clustering, carrying out Bayesian inference on the cross-experiment covariance using Markov chain Monte Carlo simulation to obtain an expectation. For the problem of model choice, we use a cross-validatory approach to decide between individual experiment modeling and varying levels of coclustering. Our method successfully generates tightly coregulated clusters of genes that are implicated in related processes and therefore can be used for analysis of global transcript responses to various stimuli and prediction of gene functions.
KW - Expectation-Maximization
KW - Markov chain Monte Carlo
KW - Microarray
KW - Model-based clustering
UR - http://www.scopus.com/inward/record.url?scp=28044449342&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=28044449342&partnerID=8YFLogxK
U2 - 10.1073/pnas.0408393102
DO - 10.1073/pnas.0408393102
M3 - Article
C2 - 16287981
AN - SCOPUS:28044449342
VL - 102
SP - 16939
EP - 16944
JO - Proceedings of the National Academy of Sciences of the United States of America
JF - Proceedings of the National Academy of Sciences of the United States of America
SN - 0027-8424
IS - 47
ER -