Summary: Three recent publications have examined the quality and completeness of the public single nucleotide polymorphism database (dbSNP) and have come to dramatically different conclusions regarding dbSNP's false positive rate and the proportion of dbSNP entries that are expected to be common. These studies employed different genotyping technologies and different protocols for determining minimum acceptable genotyping quality thresholds. Because heterozygous sites typically have lower quality scores than homozygous sites, a higher minimum quality threshold reduces the number of false positive SNPs, but it also yields fewer heterozygote calls and therefore fewer confirmed SNPs. To account for the different confirmation rates and distributions of minor allele frequencies, we propose that the three confirmation studies have different false positive and false negative rates. We developed a mathematical model to predict SNP confirmation rates and the apparent distribution of minor allele frequencies under user-specified false positive and false negative rates. We applied this model to the three published studies and to our own resequencing effort. We conclude that the dbSNP false positive rate is ∼15-17% and that the reported confirmation studies have vastly different genotyping error rates and patterns.
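The summary does not give the form of the model, so the following is only an illustrative sketch of how a confirmation rate could depend on error rates. It is not the authors' model. It assumes, purely for illustration, that a fraction of database entries are false positives, that a confirmation study misses real SNPs at a single false negative rate, and that it wrongly confirms non-SNPs at a single false positive rate. The function name and all parameter names are hypothetical.

```python
def expected_confirmation_rate(db_false_positive: float,
                               study_false_negative: float,
                               study_false_positive: float) -> float:
    """Illustrative expected fraction of database SNPs a study confirms.

    Assumptions (not taken from the source): each database entry is either
    a real SNP or a false positive, and the study's error rates are
    constant across sites and independent of allele frequency.
    """
    for rate in (db_false_positive, study_false_negative, study_false_positive):
        if not 0.0 <= rate <= 1.0:
            raise ValueError("all rates must lie in [0, 1]")

    # Real SNPs are confirmed unless the study misses them.
    confirmed_real = (1.0 - db_false_positive) * (1.0 - study_false_negative)
    # Database false positives are "confirmed" only by a study error.
    confirmed_spurious = db_false_positive * study_false_positive
    return confirmed_real + confirmed_spurious


# Hypothetical inputs: 16% database false positives, a study that misses
# 10% of real SNPs and wrongly confirms 1% of non-SNPs.
rate = expected_confirmation_rate(0.16, 0.10, 0.01)
print(f"expected confirmation rate: {rate:.4f}")
```

Under these simplifying assumptions, two studies of the same database report different confirmation rates whenever their error rates differ, which is the qualitative point the summary makes. A fuller treatment would let the false negative rate depend on minor allele frequency and sample size.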
ASJC Scopus subject areas
- Statistics and Probability
- Molecular Biology
- Computer Science Applications
- Computational Theory and Mathematics
- Computational Mathematics