Haplotype and missing data inference in nuclear families

Shin Lin, Aravinda Chakravarti, David J. Cutler

Research output: Contribution to journal › Article

Abstract

Determining linkage phase from population samples with statistical methods is accurate only within regions of high linkage disequilibrium (LD). Yet affected individuals in a genetic mapping study, including case-control studies, may share sequences identical-by-descent stretching over tens to hundreds of kilobases, quite possibly across regions of low LD in the population. At the same time, inferring phase from nuclear families may be hampered by missing family members, missing genotypes, and the noninformativity of certain genotype patterns. In this study, we reformulate our previous haplotype reconstruction algorithm, and its associated computer program, to phase parents with information derived from population samples as well as from their offspring. In applications of our algorithm to 100-kb stretches, simulated in accordance with a Wright-Fisher model with typical levels of LD in humans, we find that phase reconstruction for 160 trios with 10% missing data is highly accurate (>90%) over the entire length. Furthermore, our algorithm can estimate allelic status for missing data with high accuracy (>95%). Finally, the program scales to large inputs, easily handling thousands of segregating sites in ≥1000 chromosomes.
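Why some trio genotype patterns are noninformative can be seen from Mendelian transmission alone. The sketch below is not the authors' algorithm, which also draws on population samples; it only applies textbook trio rules at biallelic SNPs. The genotype encoding (0, 1 or 2 copies of the alternate allele, `None` for missing) and the function names `transmitted_alleles`, `phase_trio` and `untransmitted` are hypothetical choices for illustration. Sites left as `None` are the ones that Mendelian rules cannot resolve: a heterozygous child with no homozygous parent (for example, all three heterozygous, or a parent missing), or a missing child genotype.

```python
from typing import List, Optional, Tuple

# Hypothetical encoding for this sketch: a genotype is the number of copies
# (0, 1 or 2) of the alternate allele at a biallelic SNP; None means missing.
Genotype = Optional[int]
Allele = Optional[int]  # 0 = reference, 1 = alternate, None = undetermined


def _carries(genotype: Genotype, allele: int) -> bool:
    """True if a parent with this genotype could transmit `allele`.
    A missing genotype is treated as compatible with either allele."""
    if genotype is None:
        return True
    return genotype >= 1 if allele == 1 else genotype <= 1


def transmitted_alleles(father: Genotype, mother: Genotype,
                        child: Genotype) -> Tuple[Allele, Allele]:
    """Return the child's (paternal, maternal) alleles at one SNP from
    Mendelian rules alone; (None, None) when the pattern is noninformative."""
    if child is None:
        return None, None
    if child in (0, 2):
        # A homozygous child received the same allele from each parent.
        paternal = maternal = child // 2
    elif father in (0, 2):
        # Heterozygous child, homozygous father: father's allele is fixed.
        paternal = father // 2
        maternal = 1 - paternal
    elif mother in (0, 2):
        # Heterozygous child, homozygous mother: mother's allele is fixed.
        maternal = mother // 2
        paternal = 1 - maternal
    else:
        # Heterozygous child and no homozygous parent observed: ambiguous.
        return None, None
    if not (_carries(father, paternal) and _carries(mother, maternal)):
        raise ValueError(f"Mendelian inconsistency: {father}, {mother}, {child}")
    return paternal, maternal


def phase_trio(father: List[Genotype], mother: List[Genotype],
               child: List[Genotype]) -> Tuple[List[Allele], List[Allele]]:
    """Phase the child's paternal and maternal haplotypes site by site."""
    pairs = [transmitted_alleles(f, m, c) for f, m, c in zip(father, mother, child)]
    return [p for p, _ in pairs], [m for _, m in pairs]


def untransmitted(parent: List[Genotype], transmitted: List[Allele]) -> List[Allele]:
    """A parent's other haplotype: genotype minus the transmitted allele.
    Pairing transmitted and untransmitted alleles phases the parent,
    assuming no crossover in that parent's meiosis within the region."""
    return [None if g is None or t is None else g - t
            for g, t in zip(parent, transmitted)]


father = [0, 1, 2, 1, None, 1]
mother = [1, 1, 0, 1, 0, 0]
child = [1, 1, 1, 2, 1, None]
paternal, maternal = phase_trio(father, mother, child)
print(paternal)                          # [0, None, 1, 1, 1, None]
print(maternal)                          # [1, None, 0, 1, 0, None]
print(untransmitted(father, paternal))   # [0, None, 1, 0, None, None]
```

In this toy trio, site 1 (all three heterozygous) stays unresolved, site 4 is still phased despite the missing paternal genotype because the mother is homozygous, and site 5 (missing child genotype) is unresolved. Gaps of this kind are where information from population samples, as described in the abstract, would be needed.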

Original language: English (US)
Pages (from-to): 1624-1632
Number of pages: 9
Journal: Genome Research
Volume: 14
Issue number: 8
DOI: https://doi.org/10.1101/gr.2204604
State: Published - Aug 2004

Fingerprint

Linkage Disequilibrium
Nuclear Family
Haplotypes
Genotype
Population
Software
Chromosomes
Parents

ASJC Scopus subject areas

  • Genetics

Cite this

Lin, S., Chakravarti, A., & Cutler, D. J. (2004). Haplotype and missing data inference in nuclear families. Genome Research, 14(8), 1624-1632. https://doi.org/10.1101/gr.2204604

@article{8e3eaff7ca72490cb561f39b93949738,
title = "Haplotype and missing data inference in nuclear families",
abstract = "Determining linkage phase from population samples with statistical methods is accurate only within regions of high linkage disequilibrium (LD). Yet, affected individuals in a genetic mapping study, including those involving cases and controls, may share sequences identical-by-descent stretching on the order of 10s to 100s of kilobases, quite possibly over regions of low LD in the population. At the same time, inferring phase from nuclear families may be hampered by missing family members, missing genotypes, and the noninformativity of certain genotype patterns. In this study, we reformulate our previous haplotype reconstruction algorithm, and its associated computer program, to phase parents with information derived from population samples as well as from their offspring. In applications of our algorithm to 100-kb stretches, simulated in accordance to a Wright-Fisher model with typical levels of LD in humans, we find that phase reconstruction for 160 trios with 10{\%} missing data is highly accurate (>90{\%}) over the entire length. Furthermore, our algorithm can estimate allelic status for missing data at high accuracy (>95{\%}). Finally, the input capacity of the program is vast, easily handling thousands of segregating sites in ≥1000 chromosomes.",
author = "Shin Lin and Aravinda Chakravarti and Cutler, {David J.}",
year = "2004",
month = "8",
doi = "10.1101/gr.2204604",
language = "English (US)",
volume = "14",
pages = "1624--1632",
journal = "Genome Research",
issn = "1088-9051",
publisher = "Cold Spring Harbor Laboratory Press",
number = "8",

}

TY - JOUR

T1 - Haplotype and missing data inference in nuclear families

AU - Lin, Shin

AU - Chakravarti, Aravinda

AU - Cutler, David J.

PY - 2004/8

Y1 - 2004/8

AB - Determining linkage phase from population samples with statistical methods is accurate only within regions of high linkage disequilibrium (LD). Yet, affected individuals in a genetic mapping study, including those involving cases and controls, may share sequences identical-by-descent stretching on the order of 10s to 100s of kilobases, quite possibly over regions of low LD in the population. At the same time, inferring phase from nuclear families may be hampered by missing family members, missing genotypes, and the noninformativity of certain genotype patterns. In this study, we reformulate our previous haplotype reconstruction algorithm, and its associated computer program, to phase parents with information derived from population samples as well as from their offspring. In applications of our algorithm to 100-kb stretches, simulated in accordance to a Wright-Fisher model with typical levels of LD in humans, we find that phase reconstruction for 160 trios with 10% missing data is highly accurate (>90%) over the entire length. Furthermore, our algorithm can estimate allelic status for missing data at high accuracy (>95%). Finally, the input capacity of the program is vast, easily handling thousands of segregating sites in ≥1000 chromosomes.

UR - http://www.scopus.com/inward/record.url?scp=4444319904&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=4444319904&partnerID=8YFLogxK

U2 - 10.1101/gr.2204604

DO - 10.1101/gr.2204604

M3 - Article

C2 - 15256514

AN - SCOPUS:4444319904

VL - 14

SP - 1624

EP - 1632

JO - Genome Research

JF - Genome Research

SN - 1088-9051

IS - 8

ER -