Imputation methods to improve inference in SNP association studies

James Y. Dai, Ingo Ruczinski, Michael Leblanc, Charles Kooperberg

Research output: Contribution to journal › Article › peer-review


Missing single nucleotide polymorphisms (SNPs) are quite common in genetic association studies. Subjects with missing SNPs are often discarded in analyses, which may seriously undermine the inference of SNP-disease association. In this article, we develop two haplotype-based imputation approaches and one tree-based imputation approach for association studies. The emphasis is on evaluating the impact of imputation on parameter estimation, compared to the standard practice of ignoring missing data. The haplotype-based approaches build on haplotype reconstruction by the expectation-maximization (EM) algorithm or a weighted EM (WEM) algorithm, depending on whether case-control status is taken into account. The tree-based approach uses a Gibbs sampler to iteratively sample from a full conditional distribution, which is obtained from the classification and regression tree (CART) algorithm. We employ a standard multiple imputation procedure to account for the uncertainty of imputation. We apply the methods to simulated data as well as to a case-control study on developmental dyslexia. Our results suggest that imputation generally improves efficiency over the standard practice of ignoring missing data. The tree-based approach performs comparably to the haplotype-based approaches and has a computational advantage. The WEM approach yields the smallest bias at the price of increased variance.
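The abstract refers to a standard multiple imputation procedure without spelling out how the results from the imputed data sets are combined. A minimal sketch follows, assuming the combination step uses Rubin's rules, which is the usual choice but is not confirmed by the abstract. The function name `pool_rubin` and the example numbers are hypothetical and are not taken from the article.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Combine m completed-data estimates and variances with Rubin's rules.

    Assumption: this is the "standard multiple imputation procedure";
    the abstract does not state the combination rule explicitly.
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching lengths")
    q_bar = mean(estimates)      # pooled point estimate
    w = mean(variances)          # within-imputation variance
    b = variance(estimates)      # between-imputation variance (divisor m - 1)
    t = w + (1 + 1 / m) * b      # total variance of the pooled estimate
    return q_bar, t


# Hypothetical log odds ratios and squared standard errors from m = 3
# imputed data sets (illustrative values only).
log_or = [0.40, 0.50, 0.60]
var = [0.04, 0.05, 0.06]
est, total_var = pool_rubin(log_or, var)
print(est, total_var)
```

The between-imputation term is what carries the uncertainty of imputation into the final standard error: analyzing a single imputed data set would report only the within-imputation variance and so understate the total.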

Original language: English (US)
Pages (from-to): 690-702
Number of pages: 13
Journal: Genetic epidemiology
Issue number: 8
State: Published - Dec 2006


Keywords

  • CART
  • EM algorithm
  • Gibbs sampler
  • Linkage disequilibrium
  • Multiple imputation

ASJC Scopus subject areas

  • Epidemiology
  • Genetics (clinical)

