Association Tests that Accommodate Genotyping Uncertainty

Thomas Louis, Benilton S. Carvalho, Daniele Fallin, Rafael A. Irizarry, Qing Li, Ingo Ruczinski, Vanja Dukić, Ken Rice

Research output: Chapter in Book/Report/Conference proceeding (Chapter)

Abstract

High-throughput single nucleotide polymorphism (SNP) arrays, typically used in genome-wide association studies with a trait of interest, provide estimates of genotypes for up to several million loci. Most genotype estimates are very accurate, but genotyping errors do occur and can influence test statistics, p-values and ranks. Some SNPs are harder to call than others due to probe properties and other technical/biological factors; uncertainties can be associated with features of interest. SNP- and case-specific genotype posterior probabilities are available, but they are typically not used or used only informally, for example by setting aside the most uncertain calls. To improve on these approaches we take full advantage of Bayesian structuring and develop an analytic framework that accommodates genotype uncertainties. We show that the power of a score test (and statistical information more generally) is directly a function of the correlation of the genotype probabilities with the true genotypes. We demonstrate that compared to picking a single AA, AB or BB genotype or to setting aside difficult calls, Bayesian structuring can substantially increase statistical information for detecting a true association and for ranking SNPs, whether the ranking be frequentist or optimal Bayes. This improvement is primarily associated with genotypes that are difficult to call.
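The abstract's central claim, that the information in a score test depends on how well the genotype summary correlates with the true genotype, can be illustrated with a small simulation. The following is a minimal sketch under assumptions that are not taken from the chapter: genotypes follow Hardy-Weinberg proportions, each genotype is observed through a single Gaussian-noise intensity, the trait is quantitative with a linear genotype effect, and the score statistic is written as sqrt(n) times the sample correlation. All function names and parameter values (`simulate`, `noise_sd`, `beta`, and so on) are hypothetical. The sketch compares a single best call (AA, AB or BB coded 0, 1, 2) with the posterior expected genotype.

```python
import math
import random


def pearson(u, v):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)


def genotype_posterior(x, prior, noise_sd):
    """Posterior P(genotype = k | intensity x) for k = 0, 1, 2.

    Assumed measurement model (illustrative only): x ~ Normal(k, noise_sd).
    """
    weights = [pk * math.exp(-0.5 * ((x - k) / noise_sd) ** 2)
               for k, pk in enumerate(prior)]
    total = sum(weights)
    return [w / total for w in weights]


def simulate(n=20000, allele_freq=0.3, noise_sd=0.7, beta=0.2, seed=1):
    """Simulate true genotypes, best calls, expected genotypes and a trait."""
    rng = random.Random(seed)
    q = allele_freq
    prior = [(1 - q) ** 2, 2 * q * (1 - q), q ** 2]  # Hardy-Weinberg
    true_g, hard_call, dosage, trait = [], [], [], []
    for _ in range(n):
        g = rng.choices([0, 1, 2], weights=prior)[0]
        x = g + rng.gauss(0.0, noise_sd)           # noisy intensity
        post = genotype_posterior(x, prior, noise_sd)
        true_g.append(g)
        hard_call.append(max(range(3), key=lambda k: post[k]))  # best call
        dosage.append(post[1] + 2 * post[2])       # posterior mean genotype
        trait.append(beta * g + rng.gauss(0.0, 1.0))
    return true_g, hard_call, dosage, trait


def score_z(trait, predictor):
    """Score statistic for 'no linear association' in a linear model.

    With the null variance estimated using divisor n, the statistic
    equals sqrt(n) times the sample correlation.
    """
    return math.sqrt(len(trait)) * pearson(trait, predictor)


if __name__ == "__main__":
    g, call, dose, y = simulate()
    print("corr(true, best call):        ", round(pearson(g, call), 3))
    print("corr(true, expected genotype):", round(pearson(g, dose), 3))
    print("score Z, best call:           ", round(score_z(y, call), 2))
    print("score Z, expected genotype:   ", round(score_z(y, dose), 2))
```

Under this assumed model the posterior expected genotype correlates more strongly with the true genotype than the best call does, and the gap widens as `noise_sd` grows, which is consistent with the abstract's statement that the improvement comes mainly from hard-to-call genotypes. The score statistics themselves are random, so their ordering in any single simulated data set is not guaranteed.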

Original language: English (US)
Title of host publication: Bayesian Statistics 9
Publisher: Oxford University Press
ISBN (Print): 9780191731921, 9780199694587
DOI: 10.1093/acprof:oso/9780199694587.003.0013
State: Published - Jan 19 2012

Keywords

  • Association studies
  • Bayesian structuring and ranking
  • Genotype uncertainty
  • Single nucleotide polymorphism

ASJC Scopus subject areas

  • Mathematics (all)

Cite this

Louis, T., Carvalho, B. S., Fallin, D., Irizarry, R. A., Li, Q., Ruczinski, I., ... Rice, K. (2012). Association Tests that Accommodate Genotyping Uncertainty. In Bayesian Statistics 9. Oxford University Press. https://doi.org/10.1093/acprof:oso/9780199694587.003.0013
