Inferring rare disease risk variants based on exact probabilities of sharing by multiple affected relatives

Alexandre Bureau, Samuel G. Younkin, Margaret M. Parker, Joan E. Bailey-Wilson, Mary L. Marazita, Jeffrey C. Murray, Elisabeth Mangold, Hasan Albacha-Hejazi, Terri L Beaty, Ingo Ruczinski

Research output: Contribution to journal › Article

Abstract

Motivation: Family-based designs are regaining popularity for genomic sequencing studies because they provide a way to test cosegregation with disease of variants that are too rare in the population to be tested individually in a conventional case-control study.

Results: When only a few affected subjects per family are sequenced, the probability that a variant would be shared by all affected relatives, given that it occurred in any one family member, provides evidence against the null hypothesis of a complete absence of linkage and association. A P-value can be obtained as the sum of the probabilities of sharing events as extreme as, or more extreme than, those observed in one or more families. We generalize an existing closed-form expression for exact sharing probabilities to more than two relatives per family. When pedigree founders are related, we show that an approximation of sharing probabilities based on empirical estimates of kinship among founders, obtained from genome-wide marker data, is accurate for low levels of kinship. We also propose a more generally applicable approach based on Monte Carlo simulations. We applied this method to a study of 55 multiplex families with apparent non-syndromic forms of oral clefts from four distinct populations, with whole-exome sequences available for two or three affected members per family. The rare single nucleotide variant rs149253049 in ADAMTS9, shared by affected relatives in three Indian families, achieved significance after correction for multiple comparisons (P = 2 × 10⁻⁶).

Availability and implementation: Source code and binaries of the R package RVsharing are freely available for download at http://cran.r-project.org/web/packages/RVsharing/index.html.
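
As a minimal sketch of what an exact sharing probability means, the Python code below computes it by brute-force enumeration of transmissions, not by the closed-form expression the abstract refers to. It assumes that a single copy of the variant enters the family through exactly one founder, that each founder is equally likely to be that source, and that the pedigree has no inbreeding loops. The pedigree encoding (a dict from each non-founder to its two parents) and all function names are hypothetical and are not the RVsharing interface. The returned value is P(all listed affected relatives carry the variant | at least one of them carries it).

```python
from fractions import Fraction


def founders_of(pedigree):
    """Individuals that appear only as parents (no parents listed)."""
    members = set(pedigree)
    for parents in pedigree.values():
        members.update(parents)
    return sorted(m for m in members if m not in pedigree)


def topological_order(pedigree):
    """Order non-founders so that parents always come before children."""
    placed = set(founders_of(pedigree))
    remaining = dict(pedigree)
    order = []
    while remaining:
        ready = sorted(c for c, ps in remaining.items() if all(p in placed for p in ps))
        if not ready:
            raise ValueError("pedigree contains a cycle")
        for child in ready:
            order.append(child)
            placed.add(child)
            del remaining[child]
    return order


def carrier_sets(pedigree, source):
    """Exact distribution of carrier sets when `source` introduces one copy.

    Assumption: no other copy exists in the family and there are no loops,
    so at most one parent of any child can carry the copy.
    """
    dist = {frozenset([source]): Fraction(1)}
    for child in topological_order(pedigree):
        new_dist = {}
        for carriers, prob in dist.items():
            n_carrier_parents = sum(p in carriers for p in pedigree[child])
            if n_carrier_parents > 1:
                raise ValueError("inbreeding loop: both parents carry the copy")
            if n_carrier_parents == 1:
                # The carrier parent transmits the copy with probability 1/2.
                with_child = carriers | {child}
                new_dist[with_child] = new_dist.get(with_child, 0) + prob / 2
                new_dist[carriers] = new_dist.get(carriers, 0) + prob / 2
            else:
                new_dist[carriers] = new_dist.get(carriers, 0) + prob
        dist = new_dist
    return dist


def sharing_probability(pedigree, affected):
    """P(all affected carry | at least one affected carries), exact."""
    affected = set(affected)
    all_share = Fraction(0)
    any_carry = Fraction(0)
    # Each founder is assumed equally likely to be the source, so the common
    # weight 1/len(founders) cancels in the ratio and is omitted.
    for founder in founders_of(pedigree):
        for carriers, prob in carrier_sets(pedigree, founder).items():
            if affected <= carriers:
                all_share += prob
            if affected & carriers:
                any_carry += prob
    return all_share / any_carry


siblings = {"S1": ("F", "M"), "S2": ("F", "M")}
cousins = {"P1": ("G1", "G2"), "P2": ("G1", "G2"),
           "C1": ("P1", "S1"), "C2": ("P2", "S2")}
print(sharing_probability(siblings, ["S1", "S2"]))  # 1/3
print(sharing_probability(cousins, ["C1", "C2"]))   # 1/15
```

Under these assumptions the sketch gives 1/3 for an affected sibling pair and 1/15 for affected first cousins. Exact rational arithmetic avoids rounding, but the enumeration grows exponentially with pedigree size, which is one reason closed-form expressions or Monte Carlo simulation are preferable for larger families.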

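The multi-family P-value in the abstract can be sketched by enumerating every pattern of sharing (shared or not shared) across the families in which the variant was seen. This Python sketch assumes that families are independent under the null hypothesis, that family i shares the variant with its null sharing probability p_i and fails to share with probability 1 - p_i, and that "as or more extreme" means a pattern whose null probability is no larger than that of the observed pattern. The function name is hypothetical, and the example probabilities below are illustrative values, not data from the study.

```python
from fractions import Fraction
from itertools import product


def multi_family_pvalue(share_probs, observed):
    """Exact P-value for sharing of one variant across several families.

    share_probs[i]: null probability that all sequenced affected relatives in
        family i share the variant, given that it was seen in family i.
    observed[i]: True if the variant was shared by all of them.
    """
    if len(share_probs) != len(observed):
        raise ValueError("share_probs and observed must have the same length")

    def pattern_prob(pattern):
        # Families are treated as independent under the null hypothesis.
        prob = Fraction(1)
        for p, shared in zip(share_probs, pattern):
            prob *= p if shared else 1 - p
        return prob

    p_obs = pattern_prob(observed)
    all_patterns = product((True, False), repeat=len(share_probs))
    # Assumption: "as or more extreme" = "no more probable under the null".
    return sum(
        (q for q in map(pattern_prob, all_patterns) if q <= p_obs), Fraction(0)
    )


# Hypothetical: a sibling pair (1/3) shares, a first-cousin pair (1/15) does not.
print(multi_family_pvalue([Fraction(1, 3), Fraction(1, 15)], [True, False]))  # 17/45
# Hypothetical: three first-cousin pairs all share the variant.
print(multi_family_pvalue([Fraction(1, 15)] * 3, [True, True, True]))  # 1/3375
```

Passing Fraction values keeps the comparison with the observed pattern exact, so ties are counted correctly; with floats, rounding could move a tied pattern in or out of the sum. The enumeration covers 2^n patterns, which is practical only for a modest number of families.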
Original language: English (US)
Pages (from-to): 2189-2196
Number of pages: 8
Journal: Bioinformatics
Volume: 30
Issue number: 15
DOIs: https://doi.org/10.1093/bioinformatics/btu198
State: Published - Aug 1 2014

ASJC Scopus subject areas

  • Biochemistry
  • Molecular Biology
  • Computational Theory and Mathematics
  • Computer Science Applications
  • Computational Mathematics
  • Statistics and Probability
  • Medicine (all)

Cite this

Inferring rare disease risk variants based on exact probabilities of sharing by multiple affected relatives. / Bureau, Alexandre; Younkin, Samuel G.; Parker, Margaret M.; Bailey-Wilson, Joan E.; Marazita, Mary L.; Murray, Jeffrey C.; Mangold, Elisabeth; Albacha-Hejazi, Hasan; Beaty, Terri L; Ruczinski, Ingo.

In: Bioinformatics, Vol. 30, No. 15, 01.08.2014, p. 2189-2196.

Bureau, A, Younkin, SG, Parker, MM, Bailey-Wilson, JE, Marazita, ML, Murray, JC, Mangold, E, Albacha-Hejazi, H, Beaty, TL & Ruczinski, I 2014, 'Inferring rare disease risk variants based on exact probabilities of sharing by multiple affected relatives', Bioinformatics, vol. 30, no. 15, pp. 2189-2196. https://doi.org/10.1093/bioinformatics/btu198
@article{5c96c6ffca414c0bb2417ec78e0af6dd,
title = "Inferring rare disease risk variants based on exact probabilities of sharing by multiple affected relatives",
author = "Alexandre Bureau and Younkin, {Samuel G.} and Parker, {Margaret M.} and Bailey-Wilson, {Joan E.} and Marazita, {Mary L.} and Murray, {Jeffrey C.} and Elisabeth Mangold and Hasan Albacha-Hejazi and Beaty, {Terri L} and Ingo Ruczinski",
year = "2014",
month = "8",
day = "1",
doi = "10.1093/bioinformatics/btu198",
language = "English (US)",
volume = "30",
pages = "2189--2196",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "Oxford University Press",
number = "15",

}

TY - JOUR

T1 - Inferring rare disease risk variants based on exact probabilities of sharing by multiple affected relatives

AU - Bureau, Alexandre

AU - Younkin, Samuel G.

AU - Parker, Margaret M.

AU - Bailey-Wilson, Joan E.

AU - Marazita, Mary L.

AU - Murray, Jeffrey C.

AU - Mangold, Elisabeth

AU - Albacha-Hejazi, Hasan

AU - Beaty, Terri L

AU - Ruczinski, Ingo

PY - 2014/8/1

Y1 - 2014/8/1

UR - http://www.scopus.com/inward/record.url?scp=84904246839&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84904246839&partnerID=8YFLogxK

U2 - 10.1093/bioinformatics/btu198

DO - 10.1093/bioinformatics/btu198

M3 - Article

C2 - 24740360

AN - SCOPUS:84904246839

VL - 30

SP - 2189

EP - 2196

JO - Bioinformatics

JF - Bioinformatics

SN - 1367-4803

IS - 15

ER -