A Composite Likelihood Approach to Latent Multivariate Gaussian Modeling of SNP Data with Application to Genetic Association Testing

Fang Han, Wei Pan

Research output: Contribution to journal › Article

Abstract

Many statistical tests have been proposed for case-control data to detect disease association with multiple single nucleotide polymorphisms (SNPs) in linkage disequilibrium. The main reason for the existence of so many tests is that each test aims to detect one or two aspects of many possible distributional differences between cases and controls, largely due to the lack of a general and yet simple model for discrete genotype data. Here we propose a latent variable model to represent SNP data: the observed SNP data are assumed to be obtained by discretizing a latent multivariate Gaussian variate. Because the latent variate is multivariate Gaussian, its distribution is completely characterized by its mean vector and covariance matrix, in contrast to much more complex forms of a general distribution for discrete multivariate SNP data. We propose a composite likelihood approach for parameter estimation. A direct application of this latent variable model is to association testing with multiple SNPs in a candidate gene or region. In contrast to many existing tests that aim to detect only one or two aspects of many possible distributional differences of discrete SNP data, we can exclusively focus on testing the mean and covariance parameters of the latent Gaussian distributions for cases and controls. Our simulation results demonstrate potential power gains of the proposed approach over some existing methods.
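
The latent-variable idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code, and it rests on assumptions this record does not state. Each latent coordinate is taken as standard normal (latent means fixed at zero). The two cut points per SNP are set from the minor-allele frequency under Hardy-Weinberg equilibrium. The composite likelihood is assumed to be the pairwise one, a sum of bivariate log-likelihoods over SNP pairs, and it is maximized by a coarse grid search. All function names (genotype_edges, bvn_cdf, pair_loglik, fit_latent_correlations, simulate) are hypothetical, and the simulated data are illustrative only.

```python
import math
import random
from collections import Counter

EDGE = 8.0  # latent values beyond +/-8 SD are treated as infinite
SQRT_2PI = math.sqrt(2.0 * math.pi)


def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def norm_ppf(q):
    """Inverse standard normal CDF by bisection (adequate for a sketch)."""
    lo, hi = -EDGE, EDGE
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if norm_cdf(mid) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def genotype_edges(maf):
    """Assumed cut points for a N(0, 1) latent variable: genotypes 0/1/2 get
    Hardy-Weinberg frequencies (1-p)^2, 2p(1-p), p^2."""
    return (-math.inf, norm_ppf((1.0 - maf) ** 2), norm_ppf(1.0 - maf ** 2), math.inf)


def discretize(z, edges):
    """Observed genotype = number of interior cut points the latent value reaches."""
    return int(z >= edges[1]) + int(z >= edges[2])


def bvn_cdf(a, b, rho, steps=400):
    """P(X <= a, Y <= b) for a standard bivariate normal with correlation rho,
    from Phi2(a, b) = int_{-inf}^{a} phi(x) Phi((b - rho x) / sqrt(1 - rho^2)) dx,
    integrated by the trapezoid rule on [-EDGE, min(a, EDGE)]."""
    a = min(a, EDGE)
    if a <= -EDGE:
        return 0.0
    s = math.sqrt(1.0 - rho * rho)
    h = (a + EDGE) / steps
    total = 0.0
    for i in range(steps + 1):
        x = -EDGE + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * math.exp(-0.5 * x * x) / SQRT_2PI * norm_cdf((b - rho * x) / s)
    return total * h


def pair_loglik(table, edges1, edges2, rho):
    """Bivariate log-likelihood of one SNP pair's 3x3 genotype table; each cell
    probability is a rectangle probability of the latent normal pair."""
    F = [[bvn_cdf(a, b, rho) for b in edges2] for a in edges1]
    ll = 0.0
    for (g1, g2), n in table.items():
        p = F[g1 + 1][g2 + 1] - F[g1][g2 + 1] - F[g1 + 1][g2] + F[g1][g2]
        ll += n * math.log(max(p, 1e-300))
    return ll


def fit_latent_correlations(genotypes, edges):
    """Maximize the pairwise composite likelihood sum_{j<k} pair_loglik. With
    unconstrained pairwise correlations it splits into one grid search per pair
    (positive-definiteness of the full matrix is not enforced here)."""
    grid = [k / 20.0 for k in range(-19, 20)]  # -0.95, -0.90, ..., 0.95
    fitted = {}
    for j in range(len(edges)):
        for k in range(j + 1, len(edges)):
            table = Counter((g[j], g[k]) for g in genotypes)
            fitted[(j, k)] = max(grid, key=lambda r: pair_loglik(table, edges[j], edges[k], r))
    return fitted


def simulate(n, mafs, rho, seed=2012):
    """Hypothetical data: AR(1)-style latent chain with corr(z_j, z_k) = rho^|j-k|."""
    rng = random.Random(seed)
    edges = [genotype_edges(p) for p in mafs]
    data = []
    for _ in range(n):
        z, row = rng.gauss(0.0, 1.0), []
        for e in edges:
            row.append(discretize(z, e))
            z = rho * z + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        data.append(row)
    return data, edges


genotypes, edges = simulate(2000, [0.3, 0.2, 0.4], rho=0.6)
fitted = fit_latent_correlations(genotypes, edges)
print(fitted)  # generating values: 0.6 for (0,1) and (1,2), 0.36 for (0,2)
```

The grid search keeps the sketch dependency-free. A practical fit would more likely use a numerical optimizer and would also estimate latent means, since the abstract states that the association test compares the mean and covariance parameters between cases and controls.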

Original language: English (US)
Pages (from-to): 307-315
Number of pages: 9
Journal: Biometrics
Volume: 68
Issue number: 1
DOI: 10.1111/j.1541-0420.2011.01649.x
State: Published - Mar 2012
Externally published: Yes

Keywords

  • Genome-wide association study
  • GWAS
  • Latent model
  • Logistic regression
  • Multimarker analysis
  • Multivariate discrete distribution

ASJC Scopus subject areas

  • Applied Mathematics
  • Statistics and Probability
  • Agricultural and Biological Sciences(all)
  • Biochemistry, Genetics and Molecular Biology(all)
  • Immunology and Microbiology(all)
  • Medicine(all)

Cite this

A Composite Likelihood Approach to Latent Multivariate Gaussian Modeling of SNP Data with Application to Genetic Association Testing. / Han, Fang; Pan, Wei.

In: Biometrics, Vol. 68, No. 1, March 2012, pp. 307-315.

@article{cc5fe382d3cf457981f6ad2ad2eeb2ca,
title = "A Composite Likelihood Approach to Latent Multivariate Gaussian Modeling of SNP Data with Application to Genetic Association Testing",
abstract = "Many statistical tests have been proposed for case-control data to detect disease association with multiple single nucleotide polymorphisms (SNPs) in linkage disequilibrium. The main reason for the existence of so many tests is that each test aims to detect one or two aspects of many possible distributional differences between cases and controls, largely due to the lack of a general and yet simple model for discrete genotype data. Here we propose a latent variable model to represent SNP data: the observed SNP data are assumed to be obtained by discretizing a latent multivariate Gaussian variate. Because the latent variate is multivariate Gaussian, its distribution is completely characterized by its mean vector and covariance matrix, in contrast to much more complex forms of a general distribution for discrete multivariate SNP data. We propose a composite likelihood approach for parameter estimation. A direct application of this latent variable model is to association testing with multiple SNPs in a candidate gene or region. In contrast to many existing tests that aim to detect only one or two aspects of many possible distributional differences of discrete SNP data, we can exclusively focus on testing the mean and covariance parameters of the latent Gaussian distributions for cases and controls. Our simulation results demonstrate potential power gains of the proposed approach over some existing methods.",
keywords = "Genome-wide association study, GWAS, Latent model, Logistic regression, Multimarker analysis, Multivariate discrete distribution",
author = "Fang Han and Wei Pan",
year = "2012",
month = "3",
doi = "10.1111/j.1541-0420.2011.01649.x",
language = "English (US)",
volume = "68",
pages = "307--315",
journal = "Biometrics",
issn = "0006-341X",
publisher = "Wiley-Blackwell",
number = "1",

}

TY - JOUR

T1 - A Composite Likelihood Approach to Latent Multivariate Gaussian Modeling of SNP Data with Application to Genetic Association Testing

AU - Han, Fang

AU - Pan, Wei

PY - 2012/3

Y1 - 2012/3

N2 - Many statistical tests have been proposed for case-control data to detect disease association with multiple single nucleotide polymorphisms (SNPs) in linkage disequilibrium. The main reason for the existence of so many tests is that each test aims to detect one or two aspects of many possible distributional differences between cases and controls, largely due to the lack of a general and yet simple model for discrete genotype data. Here we propose a latent variable model to represent SNP data: the observed SNP data are assumed to be obtained by discretizing a latent multivariate Gaussian variate. Because the latent variate is multivariate Gaussian, its distribution is completely characterized by its mean vector and covariance matrix, in contrast to much more complex forms of a general distribution for discrete multivariate SNP data. We propose a composite likelihood approach for parameter estimation. A direct application of this latent variable model is to association testing with multiple SNPs in a candidate gene or region. In contrast to many existing tests that aim to detect only one or two aspects of many possible distributional differences of discrete SNP data, we can exclusively focus on testing the mean and covariance parameters of the latent Gaussian distributions for cases and controls. Our simulation results demonstrate potential power gains of the proposed approach over some existing methods.

AB - Many statistical tests have been proposed for case-control data to detect disease association with multiple single nucleotide polymorphisms (SNPs) in linkage disequilibrium. The main reason for the existence of so many tests is that each test aims to detect one or two aspects of many possible distributional differences between cases and controls, largely due to the lack of a general and yet simple model for discrete genotype data. Here we propose a latent variable model to represent SNP data: the observed SNP data are assumed to be obtained by discretizing a latent multivariate Gaussian variate. Because the latent variate is multivariate Gaussian, its distribution is completely characterized by its mean vector and covariance matrix, in contrast to much more complex forms of a general distribution for discrete multivariate SNP data. We propose a composite likelihood approach for parameter estimation. A direct application of this latent variable model is to association testing with multiple SNPs in a candidate gene or region. In contrast to many existing tests that aim to detect only one or two aspects of many possible distributional differences of discrete SNP data, we can exclusively focus on testing the mean and covariance parameters of the latent Gaussian distributions for cases and controls. Our simulation results demonstrate potential power gains of the proposed approach over some existing methods.

KW - Genome-wide association study

KW - GWAS

KW - Latent model

KW - Logistic regression

KW - Multimarker analysis

KW - Multivariate discrete distribution

UR - http://www.scopus.com/inward/record.url?scp=84858865245&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84858865245&partnerID=8YFLogxK

U2 - 10.1111/j.1541-0420.2011.01649.x

DO - 10.1111/j.1541-0420.2011.01649.x

M3 - Article

C2 - 21838810

AN - SCOPUS:84858865245

VL - 68

SP - 307

EP - 315

JO - Biometrics

JF - Biometrics

SN - 0006-341X

IS - 1

ER -