Discrepancies in dbSNP confirmation rates and allele frequency distributions from varying genotyping error rates and patterns

Adele A. Mitchell, Michael E. Zwick, Aravinda Chakravarti, David J. Cutler

Research output: Contribution to journal › Article › peer-review

33 Scopus citations

Abstract

Summary: Three recent publications have examined the quality and completeness of the public single nucleotide polymorphism database (dbSNP) and have come to dramatically different conclusions regarding dbSNP's false positive rate and the proportion of dbSNPs that are expected to be common. These studies employed different genotyping technologies and different protocols for setting minimum acceptable genotyping quality thresholds. Because heterozygous sites typically have lower quality scores than homozygous sites, a higher minimum quality threshold reduces the number of false positive SNPs, but it also yields fewer heterozygotes and therefore fewer confirmed SNPs. To account for the different confirmation rates and distributions of minor allele frequencies, we propose that the three confirmation studies have different false positive and false negative rates. We developed a mathematical model to predict SNP confirmation rates and the apparent distribution of minor allele frequencies under user-specified false positive and false negative rates. We applied this model to the three published studies and to our own resequencing effort. We conclude that the dbSNP false positive rate is ∼15-17% and that the reported confirmation studies have vastly different genotyping error rates and patterns.
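
The abstract does not give the model's equations, so the following is only a minimal sketch of the general idea, not the authors' model. It assumes a single minor allele frequency for all true SNPs, a per-SNP false negative rate, and a per-site false positive rate for the confirmation study; the function names and all parameter values are hypothetical.

```python
def probability_polymorphic_in_sample(maf: float, n_chromosomes: int) -> float:
    """Chance that both alleles of a true SNP appear among n sampled chromosomes."""
    return 1.0 - (1.0 - maf) ** n_chromosomes - maf ** n_chromosomes


def expected_confirmation_rate(
    db_false_positive: float,      # fraction of database entries that are not real SNPs
    maf: float,                    # assumed minor allele frequency of the true SNPs
    n_chromosomes: int,            # chromosomes genotyped in the confirmation study
    study_false_negative: float,   # chance the study misses a SNP that is variable in its sample
    study_false_positive: float,   # chance the study calls a monomorphic site variable
) -> float:
    """Expected fraction of database entries that a study reports as confirmed.

    Simplifying assumption (ours): true SNPs are confirmed only when they are
    variable in the sample and not missed; false entries are "confirmed" only
    through a genotyping false positive.
    """
    true_fraction = 1.0 - db_false_positive
    confirm_true = probability_polymorphic_in_sample(maf, n_chromosomes) * (
        1.0 - study_false_negative
    )
    confirm_false = study_false_positive
    return true_fraction * confirm_true + db_false_positive * confirm_false


if __name__ == "__main__":
    # Two hypothetical studies of the same database that differ only in
    # how often they miss real SNPs (for example, stricter quality thresholds).
    lenient = expected_confirmation_rate(0.16, 0.2, 48, 0.10, 0.02)
    strict = expected_confirmation_rate(0.16, 0.2, 48, 0.30, 0.02)
    print(f"lenient threshold: {lenient:.3f}, strict threshold: {strict:.3f}")
```

Under these assumptions, two studies of the same database can report quite different confirmation rates purely because their false negative rates differ, which is the kind of discrepancy the abstract attributes to differing genotyping error rates.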

Original language: English (US)
Pages (from-to): 1022-1032
Number of pages: 11
Journal: Bioinformatics
Volume: 20
Issue number: 7
State: Published - May 1, 2004
Externally published: Yes

ASJC Scopus subject areas

  • Statistics and Probability
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Computational Theory and Mathematics
  • Computational Mathematics
