Rapid Testing of SNPs and Gene-Environment Interactions in Case-Parent Trio Data Based on Exact Analytic Parameter Estimation

Holger Schwender, Margaret Anne Taub, Terri L Beaty, Mary L. Marazita, Ingo Ruczinski

Research output: Contribution to journal, Article

Abstract

Case-parent trio studies concerned with children affected by a disease and their parents aim to detect single nucleotide polymorphisms (SNPs) showing a preferential transmission of alleles from the parents to their affected offspring. A popular statistical test for detecting such SNPs associated with disease in this study design is the genotypic transmission/disequilibrium test (gTDT) based on a conditional logistic regression model, which usually needs to be fitted by an iterative procedure. In this article, we derive exact closed-form solutions for the parameter estimates of the conditional logistic regression models when testing for an additive, a dominant, or a recessive effect of a SNP, and show that such analytic parameter estimates also exist when considering gene-environment interactions with binary environmental variables. Because the genetic model underlying the association between a SNP and a disease is typically unknown, it might further be beneficial to use the maximum over the gTDT statistics for the possible effects of a SNP as the test statistic. We therefore propose a procedure enabling a fast computation of the test statistic and the permutation-based p-value of this MAX gTDT. All these methods are applied to whole-genome scans of the case-parent trios from the International Cleft Consortium. These applications show that our procedures dramatically reduce the required computing time compared to the conventional iterative methods, allowing, for example, the analysis of hundreds of thousands of SNPs in a few minutes instead of several hours.
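
To illustrate why a closed-form estimate can exist, here is a minimal sketch for the dominant model only. It is derived here from the standard conditional likelihood with one case and three pseudo-controls per trio, and it is not taken from the article, whose formulas may be written differently. Under a dominant coding, only two mating types are informative: one heterozygous and one homozygous-reference parent (2 of the 4 possible offspring genotypes carry the risk allele) and two heterozygous parents (3 of 4 carry it). Setting the score to zero then gives a quadratic in exp(beta). The function name and the count arguments `n2`, `a2`, `n3`, `a3` are hypothetical names chosen for this example.

```python
import math


def dominant_gtdt_estimate(n2, a2, n3, a3):
    """Closed-form conditional-logistic estimate for a dominant SNP effect.

    n2: trios with one heterozygous and one homozygous-reference parent
    a2: how many of those n2 affected children carry the risk allele
    n3: trios with two heterozygous parents
    a3: how many of those n3 affected children carry the risk allele
    (All names are assumptions made for this sketch.)

    Returns (beta_hat, standard_error).
    """
    if not (0 <= a2 <= n2 and 0 <= a3 <= n3):
        raise ValueError("carrier counts must lie between 0 and the trio counts")
    n = n2 + n3
    a = a2 + a3
    if a == 0 or a == n:
        raise ValueError("estimate is not finite: all or no cases are carriers")

    # Score equation with t = exp(beta):
    #   a - n2 * t / (t + 1) - 3 * n3 * t / (3 * t + 1) = 0
    # which rearranges to
    #   3 * (a - n) * t**2 + (4 * a - n2 - 3 * n3) * t + a = 0.
    # The leading coefficient is negative and the constant term positive,
    # so exactly one root is positive.
    b_coef = 4 * a - n2 - 3 * n3
    disc = b_coef ** 2 + 12 * (n - a) * a
    t = (b_coef + math.sqrt(disc)) / (6 * (n - a))

    beta_hat = math.log(t)
    # Observed information (negative second derivative of the log-likelihood).
    info = n2 * t / (t + 1) ** 2 + 3 * n3 * t / (3 * t + 1) ** 2
    return beta_hat, 1.0 / math.sqrt(info)


# Example with made-up counts; the Wald statistic is (beta / se) ** 2.
beta, se = dominant_gtdt_estimate(n2=40, a2=26, n3=15, a3=13)
wald = (beta / se) ** 2
print(beta, se, wald)
```

Because the estimate depends only on four counts per SNP, it can be evaluated without any iterative fitting, which is the kind of saving the abstract describes. Additive and recessive codings and the gene-environment case are not covered by this sketch.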

Original language: English (US)
Pages (from-to): 766-773
Number of pages: 8
Journal: Biometrics
Volume: 68
Issue number: 3
DOI: 10.1111/j.1541-0420.2011.01713.x
State: Published - Sep 2012

Fingerprint

Gene-Environment Interaction
Single Nucleotide Polymorphism
Genotype-Environment Interaction
Nucleotides
Polymorphism
Parameter Estimation
Genes
Testing
Logistic Models
Conditional Logistic Regression
Test Statistic
Statistics
Logistic Regression Model
Logistics

Keywords

  • Conditional logistic regression
  • Family-based design
  • Genome-wide association studies
  • Genotypic transmission/disequilibrium test
  • International Cleft Consortium
  • MAX test

ASJC Scopus subject areas

  • Applied Mathematics
  • Statistics and Probability
  • Agricultural and Biological Sciences(all)
  • Biochemistry, Genetics and Molecular Biology(all)
  • Immunology and Microbiology(all)
  • Medicine(all)

Cite this

Rapid Testing of SNPs and Gene-Environment Interactions in Case-Parent Trio Data Based on Exact Analytic Parameter Estimation. / Schwender, Holger; Taub, Margaret Anne; Beaty, Terri L; Marazita, Mary L.; Ruczinski, Ingo.

In: Biometrics, Vol. 68, No. 3, 09.2012, p. 766-773.

@article{36603941f8cf42cd986fe7be191b1701,
title = "Rapid Testing of SNPs and Gene-Environment Interactions in Case-Parent Trio Data Based on Exact Analytic Parameter Estimation",
abstract = "Case-parent trio studies concerned with children affected by a disease and their parents aim to detect single nucleotide polymorphisms (SNPs) showing a preferential transmission of alleles from the parents to their affected offspring. A popular statistical test for detecting such SNPs associated with disease in this study design is the genotypic transmission/disequilibrium test (gTDT) based on a conditional logistic regression model, which usually needs to be fitted by an iterative procedure. In this article, we derive exact closed-form solutions for the parameter estimates of the conditional logistic regression models when testing for an additive, a dominant, or a recessive effect of a SNP, and show that such analytic parameter estimates also exist when considering gene-environment interactions with binary environmental variables. Because the genetic model underlying the association between a SNP and a disease is typically unknown, it might further be beneficial to use the maximum over the gTDT statistics for the possible effects of a SNP as test statistic. We therefore propose a procedure enabling a fast computation of the test statistic and the permutation-based p-value of this MAX gTDT. All these methods are applied to whole-genome scans of the case-parent trios from the International Cleft Consortium. These applications show our procedures dramatically reduce the required computing time compared to the conventional iterative methods allowing, for example, the analysis of hundreds of thousands of SNPs in a few minutes instead of several hours.",
keywords = "Conditional logistic regression, Family-based design, Genome-wide association studies, Genotypic transmission/disequilibrium test, International Cleft Consortium, MAX test",
author = "Schwender, Holger and Taub, {Margaret Anne} and Beaty, {Terri L} and Marazita, {Mary L.} and Ruczinski, Ingo",
year = "2012",
month = "9",
doi = "10.1111/j.1541-0420.2011.01713.x",
language = "English (US)",
volume = "68",
pages = "766--773",
journal = "Biometrics",
issn = "0006-341X",
publisher = "Wiley-Blackwell",
number = "3",

}

TY - JOUR

T1 - Rapid Testing of SNPs and Gene-Environment Interactions in Case-Parent Trio Data Based on Exact Analytic Parameter Estimation

AU - Schwender, Holger

AU - Taub, Margaret Anne

AU - Beaty, Terri L

AU - Marazita, Mary L.

AU - Ruczinski, Ingo

PY - 2012/9

Y1 - 2012/9

AB - Case-parent trio studies concerned with children affected by a disease and their parents aim to detect single nucleotide polymorphisms (SNPs) showing a preferential transmission of alleles from the parents to their affected offspring. A popular statistical test for detecting such SNPs associated with disease in this study design is the genotypic transmission/disequilibrium test (gTDT) based on a conditional logistic regression model, which usually needs to be fitted by an iterative procedure. In this article, we derive exact closed-form solutions for the parameter estimates of the conditional logistic regression models when testing for an additive, a dominant, or a recessive effect of a SNP, and show that such analytic parameter estimates also exist when considering gene-environment interactions with binary environmental variables. Because the genetic model underlying the association between a SNP and a disease is typically unknown, it might further be beneficial to use the maximum over the gTDT statistics for the possible effects of a SNP as test statistic. We therefore propose a procedure enabling a fast computation of the test statistic and the permutation-based p-value of this MAX gTDT. All these methods are applied to whole-genome scans of the case-parent trios from the International Cleft Consortium. These applications show our procedures dramatically reduce the required computing time compared to the conventional iterative methods allowing, for example, the analysis of hundreds of thousands of SNPs in a few minutes instead of several hours.

KW - Conditional logistic regression

KW - Family-based design

KW - Genome-wide association studies

KW - Genotypic transmission/disequilibrium test

KW - International Cleft Consortium

KW - MAX test

UR - http://www.scopus.com/inward/record.url?scp=84866750913&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84866750913&partnerID=8YFLogxK

U2 - 10.1111/j.1541-0420.2011.01713.x

DO - 10.1111/j.1541-0420.2011.01713.x

M3 - Article

C2 - 22150644

AN - SCOPUS:84866750913

VL - 68

SP - 766

EP - 773

JO - Biometrics

JF - Biometrics

SN - 0006-341X

IS - 3

ER -