Examining the effect of linkage disequilibrium between markers on the Type I error rate and power of nonparametric multipoint linkage analysis of two-generation and multigenerational pedigrees in the presence of missing genotype data

Yoonhee Kim, Priya Duggal, Elizabeth M. Gillanders, Ho Kim, Joan E. Bailey-Wilson

Research output: Contribution to journal › Article

Abstract

Because most multipoint linkage analysis programs currently assume linkage equilibrium between markers when inferring parental haplotypes, ignoring linkage disequilibrium (LD) may inflate the Type I error rate. We investigated the effect of LD on the Type I error rate and power of nonparametric multipoint linkage analysis of two-generation and multigenerational multiplex families. Using genome-wide single nucleotide polymorphism (SNP) data from the Collaborative Study of the Genetics of Alcoholism, we modified the original data set into 30 total data sets in order to consider six different patterns of missing data for five different levels of SNP density. To assess power, we designed simulated traits based on existing marker genotypes. For the Type I error rate, we simulated 1,000 qualitative traits from random distributions, unlinked to any of the marker data. Overall, the different levels of SNP density examined here had only small effects on power (except sibpair data). Missing data had a substantial effect on power, with more completely genotyped pedigrees yielding the highest power (except sibpair data). Most of the missing data patterns did not cause large increases in the Type I error rate if the SNP markers were more than 0.3 cM apart. However, in a dense 0.25-cM map, removing genotypes on founders and/or founders and parents in the middle generation caused substantial inflation of the Type I error rate, which corresponded to the increasing proportion of persons with missing data. Results also showed that long high-LD blocks have severe effects on Type I error rates.
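
The Type I error estimate described in the abstract (traits simulated with no linkage to any marker, then counted when the test still rejects) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the 0.05 threshold, and the use of uniform random p-values as a stand-in for a real linkage statistic are all assumptions made for illustration.

```python
import random


def empirical_type1_error(null_p_values, alpha=0.05):
    """Return the fraction of null replicates with a p-value at or below alpha."""
    if not null_p_values:
        raise ValueError("need at least one null replicate")
    rejections = sum(1 for p in null_p_values if p <= alpha)
    return rejections / len(null_p_values)


def simulate_null_p_values(n_replicates=1000, seed=0):
    """Hypothetical stand-in for running a linkage test on unlinked traits.

    A well-calibrated test gives Uniform(0, 1) p-values under the null,
    so uniform draws are used here instead of a real linkage analysis.
    """
    rng = random.Random(seed)
    return [rng.random() for _ in range(n_replicates)]


# 1,000 replicates mirrors the number of simulated traits in the abstract.
null_p = simulate_null_p_values(n_replicates=1000)
rate = empirical_type1_error(null_p, alpha=0.05)
print(f"empirical Type I error rate at alpha = 0.05: {rate:.3f}")
```

With a calibrated test the printed rate should sit near the nominal threshold; an estimate well above it is the kind of inflation the abstract reports for dense maps with missing founder genotypes.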

Original language: English (US)
Pages (from-to): 41-51
Number of pages: 11
Journal: Genetic Epidemiology
Volume: 32
Issue number: 1
DOI: 10.1002/gepi.20260
State: Published - Jan 2008
Externally published: Yes

Keywords

  • False Positives
  • Linkage disequilibrium
  • Pedigree structure
  • SNPs
  • Type I error rate

ASJC Scopus subject areas

  • Genetics (clinical)
  • Epidemiology

Cite this

@article{ec4f92f85a9f483bb754e1013c99b0de,
title = "Examining the effect of linkage disequilibrium between markers on the {Type I} error rate and power of nonparametric multipoint linkage analysis of two-generation and multigenerational pedigrees in the presence of missing genotype data",
abstract = "Because most multipoint linkage analysis programs currently assume linkage equilibrium between markers when inferring parental haplotypes, ignoring linkage disequilibrium (LD) may inflate the Type I error rate. We investigated the effect of LD on the Type I error rate and power of nonparametric multipoint linkage analysis of two-generation and multigenerational multiplex families. Using genome-wide single nucleotide polymorphism (SNP) data from the Collaborative Study of the Genetics of Alcoholism, we modified the original data set into 30 total data sets in order to consider six different patterns of missing data for five different levels of SNP density. To assess power, we designed simulated traits based on existing marker genotypes. For the Type I error rate, we simulated 1,000 qualitative traits from random distributions, unlinked to any of the marker data. Overall, the different levels of SNP density examined here had only small effects on power (except sibpair data). Missing data had a substantial effect on power, with more completely genotyped pedigrees yielding the highest power (except sibpair data). Most of the missing data patterns did not cause large increases in the Type I error rate if the SNP markers were more than 0.3 cM apart. However, in a dense 0.25-cM map, removing genotypes on founders and/or founders and parents in the middle generation caused substantial inflation of the Type I error rate, which corresponded to the increasing proportion of persons with missing data. Results also showed that long high-LD blocks have severe effects on Type I error rates.",
keywords = "False Positives, Linkage disequilibrium, Pedigree structure, SNPs, Type I error rate",
author = "Yoonhee Kim and Priya Duggal and Gillanders, {Elizabeth M.} and Ho Kim and Bailey-Wilson, {Joan E.}",
year = "2008",
month = "1",
doi = "10.1002/gepi.20260",
language = "English (US)",
volume = "32",
pages = "41--51",
journal = "Genetic Epidemiology",
issn = "0741-0395",
publisher = "Wiley-Liss Inc.",
number = "1",

}

TY - JOUR

T1 - Examining the effect of linkage disequilibrium between markers on the Type I error rate and power of nonparametric multipoint linkage analysis of two-generation and multigenerational pedigrees in the presence of missing genotype data

AU - Kim, Yoonhee

AU - Duggal, Priya

AU - Gillanders, Elizabeth M.

AU - Kim, Ho

AU - Bailey-Wilson, Joan E.

PY - 2008/1

Y1 - 2008/1

AB - Because most multipoint linkage analysis programs currently assume linkage equilibrium between markers when inferring parental haplotypes, ignoring linkage disequilibrium (LD) may inflate the Type I error rate. We investigated the effect of LD on the Type I error rate and power of nonparametric multipoint linkage analysis of two-generation and multigenerational multiplex families. Using genome-wide single nucleotide polymorphism (SNP) data from the Collaborative Study of the Genetics of Alcoholism, we modified the original data set into 30 total data sets in order to consider six different patterns of missing data for five different levels of SNP density. To assess power, we designed simulated traits based on existing marker genotypes. For the Type I error rate, we simulated 1,000 qualitative traits from random distributions, unlinked to any of the marker data. Overall, the different levels of SNP density examined here had only small effects on power (except sibpair data). Missing data had a substantial effect on power, with more completely genotyped pedigrees yielding the highest power (except sibpair data). Most of the missing data patterns did not cause large increases in the Type I error rate if the SNP markers were more than 0.3 cM apart. However, in a dense 0.25-cM map, removing genotypes on founders and/or founders and parents in the middle generation caused substantial inflation of the Type I error rate, which corresponded to the increasing proportion of persons with missing data. Results also showed that long high-LD blocks have severe effects on Type I error rates.

KW - False Positives

KW - Linkage disequilibrium

KW - Pedigree structure

KW - SNPs

KW - Type I error rate

UR - http://www.scopus.com/inward/record.url?scp=38149016948&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=38149016948&partnerID=8YFLogxK

U2 - 10.1002/gepi.20260

DO - 10.1002/gepi.20260

M3 - Article

VL - 32

SP - 41

EP - 51

JO - Genetic Epidemiology

JF - Genetic Epidemiology

SN - 0741-0395

IS - 1

ER -