On multi-marker tests for association in case-control studies

Margaret Anne Taub, Holger R. Schwender, Samuel G. Younkin, Thomas Louis, Ingo Ruczinski

Research output: Contribution to journal (Article)

Abstract

Genome-wide association studies (GWAs) have identified thousands of DNA loci associated with a variety of traits. Statistical inference is almost always based on single-marker hypothesis tests of association and the respective p-values with Bonferroni correction. Since commercially available genomic arrays interrogate hundreds of thousands or even millions of loci simultaneously, many causal yet undetected loci are believed to exist because the conditional power to achieve a genome-wide significance level can be low, in particular for markers with small effect sizes and low minor allele frequencies and in studies with modest sample size. However, linkage disequilibrium (LD) induces correlation between neighboring markers in the human genome and hence between their test statistics, and this correlation can be incorporated into multi-marker hypothesis tests, thereby increasing power to detect association. Herein, we establish a theoretical benchmark by quantifying the maximum power achievable for multi-marker tests of association in case-control studies, achievable only when the causal marker is known. Using the fact that genotype correlations within an LD block translate into an asymptotically multivariate normal distribution for score test statistics, we develop a set of weights for the markers that maximize the non-centrality parameter, and assess the relative loss of power for other approaches. We find that the method of Conneely and Boehnke (2007), based on the maximum absolute test statistic observed in an LD block, is a practical and powerful method in a variety of settings. We also explore how prior biological or functional knowledge, used to narrow down the locus of the causal marker, can affect power, and conclude that this prior knowledge has to be very strong and specific for the power to approach the maximum achievable level, or even to beat the power observed for methods such as the one proposed by Conneely and Boehnke (2007).

Original language: English (US)
Article number: 00252
Journal: Frontiers in Genetics
Volume: 4
Issue number: DEC
DOI: 10.3389/fgene.2013.00252
State: Published - 2013

Fingerprint

Linkage Disequilibrium
Case-Control Studies
Benchmarking
Genome-Wide Association Study
Normal Distribution
Human Genome
Gene Frequency
Sample Size
Genotype
Genome
Weights and Measures
DNA

Keywords

  • Genome-wide association studies
  • Linkage disequilibrium
  • Multi-marker tests
  • Multiplicity adjustment
  • Single nucleotide polymorphisms

ASJC Scopus subject areas

  • Genetics
  • Molecular Medicine
  • Genetics (clinical)

Cite this

On multi-marker tests for association in case-control studies. / Taub, Margaret Anne; Schwender, Holger R.; Younkin, Samuel G.; Louis, Thomas; Ruczinski, Ingo.

In: Frontiers in Genetics, Vol. 4, No. DEC, 00252, 2013.

@article{85234e7da4c541f09ac5bf2a81dd46d3,
title = "On multi-marker tests for association in case-control studies",
abstract = "Genome-wide association studies (GWAs) have identified thousands of DNA loci associated with a variety of traits. Statistical inference is almost always based on single marker hypothesis tests of association and the respective p-values with Bonferroni correction. Since commercially available genomic arrays interrogate hundreds of thousands or even millions of loci simultaneously, many causal yet undetected loci are believed to exist because the conditional power to achieve a genome-wide significance level can be low, in particular for markers with small effect sizes and low minor allele frequencies and in studies with modest sample size. However, the correlation between neighboring markers in the human genome due to linkage disequilibrium (LD) resulting in correlated marker test statistics can be incorporated into multi-marker hypothesis tests, thereby increasing power to detect association. Herein, we establish a theoretical benchmark by quantifying the maximum power achievable for multi-marker tests of association in case-control studies, achievable only when the causal marker is known. Using that genotype correlations within an LD block translate into an asymptotically multivariate normal distribution for score test statistics, we develop a set of weights for the markers that maximize the non-centrality parameter, and assess the relative loss of power for other approaches. We find that the method of Conneely and Boehnke (2007) based on the maximum absolute test statistic observed in an LD block is a practical and powerful method in a variety of settings. We also explore the effect on the power that prior biological or functional knowledge used to narrow down the locus of the causal marker can have, and conclude that this prior knowledge has to be very strong and specific for the power to approach the maximum achievable level, or even beat the power observed for methods such as the one proposed by Conneely and Boehnke (2007).",
keywords = "Genome-wide association studies, Linkage disequilibrium, Multi-marker tests, Multiplicity adjustment, Single nucleotide polymorphisms",
author = "Taub, {Margaret Anne} and Schwender, {Holger R.} and Younkin, {Samuel G.} and Louis, Thomas and Ruczinski, Ingo",
year = "2013",
doi = "10.3389/fgene.2013.00252",
language = "English (US)",
volume = "4",
journal = "Frontiers in Genetics",
issn = "1664-8021",
publisher = "Frontiers Media S. A.",
number = "DEC",
}

TY - JOUR

T1 - On multi-marker tests for association in case-control studies

AU - Taub, Margaret Anne

AU - Schwender, Holger R.

AU - Younkin, Samuel G.

AU - Louis, Thomas

AU - Ruczinski, Ingo

PY - 2013

Y1 - 2013

N2 - Genome-wide association studies (GWAs) have identified thousands of DNA loci associated with a variety of traits. Statistical inference is almost always based on single marker hypothesis tests of association and the respective p-values with Bonferroni correction. Since commercially available genomic arrays interrogate hundreds of thousands or even millions of loci simultaneously, many causal yet undetected loci are believed to exist because the conditional power to achieve a genome-wide significance level can be low, in particular for markers with small effect sizes and low minor allele frequencies and in studies with modest sample size. However, the correlation between neighboring markers in the human genome due to linkage disequilibrium (LD) resulting in correlated marker test statistics can be incorporated into multi-marker hypothesis tests, thereby increasing power to detect association. Herein, we establish a theoretical benchmark by quantifying the maximum power achievable for multi-marker tests of association in case-control studies, achievable only when the causal marker is known. Using that genotype correlations within an LD block translate into an asymptotically multivariate normal distribution for score test statistics, we develop a set of weights for the markers that maximize the non-centrality parameter, and assess the relative loss of power for other approaches. We find that the method of Conneely and Boehnke (2007) based on the maximum absolute test statistic observed in an LD block is a practical and powerful method in a variety of settings. We also explore the effect on the power that prior biological or functional knowledge used to narrow down the locus of the causal marker can have, and conclude that this prior knowledge has to be very strong and specific for the power to approach the maximum achievable level, or even beat the power observed for methods such as the one proposed by Conneely and Boehnke (2007).

KW - Genome-wide association studies

KW - Linkage disequilibrium

KW - Multi-marker tests

KW - Multiplicity adjustment

KW - Single nucleotide polymorphisms

UR - http://www.scopus.com/inward/record.url?scp=84892381433&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84892381433&partnerID=8YFLogxK

U2 - 10.3389/fgene.2013.00252

DO - 10.3389/fgene.2013.00252

M3 - Article

C2 - 24379823

AN - SCOPUS:84892381433

VL - 4

JO - Frontiers in Genetics

JF - Frontiers in Genetics

SN - 1664-8021

IS - DEC

M1 - 00252

ER -