TY - JOUR
T1 - A Bayesian model for cross-study differential gene expression
AU - Scharpf, Robert B.
AU - Tjelmeland, Håkon
AU - Parmigiani, Giovanni
AU - Nobel, Andrew B.
N1 - Funding Information:
Robert B. Scharpf is Postdoctoral Fellow, Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD 21205 (E-mail: xfan@sta.cuhk.edu.hk). Håkon Tjelmeland is Professor, Department of Mathematical Sciences, Norwegian University of Science and Technology, NO-7491 Trondheim, Norway (E-mail: haakont@stat.ntnu.no). Giovanni Parmigiani is Professor, Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health and Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, Baltimore, MD 21205 (E-mail: gp@jhu.edu). Andrew B. Nobel is Professor, Department of Statistics, University of North Carolina, Chapel Hill, NC 27599 (E-mail: nobel@email.unc.edu). Scharpf’s work was supported by U.S. National Institute of Environmental Health Sciences training grant 5T32ES012871, National Heart, Lung, and Blood Institute training grant 5T32HL007024, and National Science Foundation grant DMS 034211. Nobel’s research was supported in part by National Science Foundation grant DMS 0406361 and U.S. Environmental Protection Agency grant RD-83272001.
PY - 2009/12
Y1 - 2009/12
N2 - In this article we define a hierarchical Bayesian model for microarray expression data collected from several studies and use it to identify genes that show differential expression between two conditions. Key features include shrinkage across both genes and studies, and flexible modeling that allows for interactions between platforms and the estimated effect, as well as concordant and discordant differential expression across studies. We evaluate the performance of our model comprehensively, using both artificial data and a "split-study" validation approach that provides an agnostic assessment of the model's behavior under both the null hypothesis and a realistic alternative. The simulation results from the artificial data demonstrate the advantages of the Bayesian model. Furthermore, the simulations provide guidelines for when the Bayesian model is most likely to be useful. Most notably, in small studies the Bayesian model generally outperforms other methods when evaluated on several performance measures across a range of simulation parameters, with the differences diminishing for larger sample sizes in the individual studies. The split-study validation illustrates appropriate shrinkage of the Bayesian model in the absence of platform, sample, and annotation differences that otherwise complicate experimental data analyses. Finally, we fit our model to four breast cancer studies using different technologies (cDNA and Affymetrix) to estimate differential expression in estrogen receptor-positive tumors versus estrogen receptor-negative tumors. Software and data for reproducing our analysis are publicly available.
KW - Bayesian hierarchical model
KW - Bayesian meta-analysis
KW - Differential expression
KW - Gene expression
KW - Multiple studies
UR - http://www.scopus.com/inward/record.url?scp=74049121238&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=74049121238&partnerID=8YFLogxK
U2 - 10.1198/jasa.2009.ap07611
DO - 10.1198/jasa.2009.ap07611
M3 - Article
C2 - 21127725
AN - SCOPUS:74049121238
SN - 0162-1459
VL - 104
SP - 1295
EP - 1310
JO - Journal of the American Statistical Association
JF - Journal of the American Statistical Association
IS - 488
ER -