The practical effect of batch on genomic prediction.

Hilary S. Parker, Jeffrey T. Leek

Research output: Contribution to journal › Article › peer-review

Abstract

Measurements from microarrays and other high-throughput technologies are susceptible to non-biological artifacts like batch effects. It is known that batch effects can alter or obscure the set of significant results and biological conclusions in high-throughput studies. Here we examine the impact of batch effects on predictors built from genomic technologies. To investigate batch effects, we collected publicly available gene expression measurements with known outcomes, and estimated batches using date. Using these data we show (1) the impact of batch effects on prediction depends on the correlation between outcome and batch in the training data, and (2) removing expression measurements most affected by batch before building predictors may improve the accuracy of those predictors. These results suggest that (1) training sets should be designed to minimize correlation between batches and outcome, and (2) methods for identifying batch-affected probes should be developed to improve prediction results for studies with high correlation between batches and outcome.
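The two findings in the abstract can be illustrated with a minimal sketch. It is not the authors' method: the function names are hypothetical, Cramér's V stands in for "correlation between outcome and batch", and a per-probe one-way ANOVA F statistic stands in for "most affected by batch". The sketch assumes an expression matrix with probes in rows and samples in columns, and one categorical batch label per sample (for example, a processing date).

```python
import numpy as np

def cramers_v(batch, outcome):
    """Association between two categorical label vectors (0 = none, 1 = perfect)."""
    b_levels, b_idx = np.unique(np.asarray(batch).ravel(), return_inverse=True)
    o_levels, o_idx = np.unique(np.asarray(outcome).ravel(), return_inverse=True)
    # Contingency table of batch (rows) by outcome (columns).
    table = np.zeros((len(b_levels), len(o_levels)))
    np.add.at(table, (b_idx, o_idx), 1)
    n = table.sum()
    expected = table.sum(axis=1, keepdims=True) * table.sum(axis=0, keepdims=True) / n
    chi2 = ((table - expected) ** 2 / expected).sum()
    k = min(table.shape) - 1
    return float(np.sqrt(chi2 / (n * k))) if k > 0 else 0.0

def batch_f_statistic(expr, batch):
    """One-way ANOVA F statistic for each probe (row of expr) across batches."""
    expr = np.asarray(expr, dtype=float)
    levels, idx = np.unique(np.asarray(batch).ravel(), return_inverse=True)
    n_samples, n_groups = expr.shape[1], len(levels)
    grand_mean = expr.mean(axis=1, keepdims=True)
    ss_between = np.zeros(expr.shape[0])
    ss_within = np.zeros(expr.shape[0])
    for j in range(n_groups):
        block = expr[:, idx == j]
        group_mean = block.mean(axis=1, keepdims=True)
        ss_between += block.shape[1] * (group_mean - grand_mean).ravel() ** 2
        ss_within += ((block - group_mean) ** 2).sum(axis=1)
    return (ss_between / (n_groups - 1)) / (ss_within / (n_samples - n_groups))

def drop_batch_affected(expr, batch, fraction=0.1):
    """Remove the given fraction of probes with the largest batch F statistic.

    The 10% default is an arbitrary illustration, not a value from the paper.
    Returns the filtered matrix and the indices of the retained probes.
    """
    expr = np.asarray(expr, dtype=float)
    f_stat = batch_f_statistic(expr, batch)
    n_drop = int(round(fraction * expr.shape[0]))
    keep = np.sort(np.argsort(f_stat)[: expr.shape[0] - n_drop])
    return expr[keep], keep
```

In a workflow of this kind, `cramers_v` would be checked on the training set before fitting a predictor, and `drop_batch_affected` would be applied only when that association is high. How many probes to remove is a tuning choice that this sketch leaves open.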

Original language: English (US)
Pages (from-to): Article 10
Journal: Statistical Applications in Genetics and Molecular Biology
Volume: 11
Issue number: 3
State: Published - 2012
Externally published: Yes

ASJC Scopus subject areas

  • Statistics and Probability
  • Molecular Biology
  • Genetics
  • Computational Mathematics
