TY - JOUR
T1 - Haplotype inference in random population samples
AU - Lin, Shin
AU - Cutler, David J.
AU - Zwick, Michael E.
AU - Chakravarti, Aravinda
N1 - Funding Information:
We thank J. S. Liu and D. Fallin for generously providing their haplotype reconstruction software, and two anonymous reviewers for their suggestions. This research was partially supported by National Institutes of Health grants HG01847 and MH60007. S.L. was supported by Johns Hopkins University's Medical Scientist Training Program grant GM07309.
PY - 2002/11/1
Y1 - 2002/11/1
N2 - Contemporary genotyping and sequencing methods do not provide information on linkage phase in diploid organisms. Applying statistical methods to infer and reconstruct linkage phase in samples of diploid sequences can potentially save time and labor. The Stephens-Smith-Donnelly (SSD) algorithm is one such method; it incorporates concepts from population genetics theory into a Markov chain Monte Carlo technique. We applied a modified SSD method, as well as the expectation-maximization and partition-ligation algorithms, to sequence data from eight loci spanning >1 Mb on the human X chromosome. We demonstrate that the modified SSD method is more accurate than the other algorithms and is superior in terms of the number of sites that may be processed. We also find phase reconstructions by the modified SSD method to be highly accurate over regions with high linkage disequilibrium (LD). If only polymorphisms with a minor allele frequency >0.2 are analyzed and scored according to the fraction of neighbor relations correctly called, reconstructions are 95.2% accurate over entire 100-kb stretches and 98.6% accurate within blocks of high LD.
AB - Contemporary genotyping and sequencing methods do not provide information on linkage phase in diploid organisms. Applying statistical methods to infer and reconstruct linkage phase in samples of diploid sequences can potentially save time and labor. The Stephens-Smith-Donnelly (SSD) algorithm is one such method; it incorporates concepts from population genetics theory into a Markov chain Monte Carlo technique. We applied a modified SSD method, as well as the expectation-maximization and partition-ligation algorithms, to sequence data from eight loci spanning >1 Mb on the human X chromosome. We demonstrate that the modified SSD method is more accurate than the other algorithms and is superior in terms of the number of sites that may be processed. We also find phase reconstructions by the modified SSD method to be highly accurate over regions with high linkage disequilibrium (LD). If only polymorphisms with a minor allele frequency >0.2 are analyzed and scored according to the fraction of neighbor relations correctly called, reconstructions are 95.2% accurate over entire 100-kb stretches and 98.6% accurate within blocks of high LD.
UR - http://www.scopus.com/inward/record.url?scp=0036842635&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0036842635&partnerID=8YFLogxK
U2 - 10.1086/344347
DO - 10.1086/344347
M3 - Article
C2 - 12386835
AN - SCOPUS:0036842635
SN - 0002-9297
VL - 71
SP - 1129
EP - 1137
JO - American Journal of Human Genetics
JF - American Journal of Human Genetics
IS - 5
ER -