A correction for sample overlap in genome-wide association studies in a polygenic pleiotropy-informed framework

Schizophrenia and Bipolar Disorder Working Groups of the Psychiatric Genomics Consortium

doi:10.1186/s12864-018-4859-7

Research output: Contribution to journal › Article › peer-review

Abstract

Background: There is considerable evidence that many complex traits have a partially shared genetic basis, termed pleiotropy. It is therefore useful to consider integrating genome-wide association study (GWAS) data across several traits, usually at the summary statistic level. A major practical challenge arises when these GWAS have overlapping subjects. This is particularly an issue when estimating pleiotropy using methods that condition the significance of one trait on the significance of a second, such as the covariate-modulated false discovery rate (cmfdr).

Results: We propose a method for correcting for sample overlap at the summary statistic level. We quantify the expected amount of spurious correlation between the summary statistics from two GWAS due to sample overlap, and use this estimated correlation in a simple linear correction that adjusts the joint distribution of test statistics from the two GWAS. The correction is appropriate for GWAS with case-control or quantitative outcomes. Our simulations and data example show that without correcting for sample overlap, the cmfdr is not properly controlled, leading to an excessive number of false discoveries and an excessive false discovery proportion. Our correction for sample overlap is effective in that it restores proper control of the false discovery rate, at very little loss in power.

Conclusions: With our proposed correction, it is possible to integrate GWAS summary statistics with overlapping samples in a statistical framework that is dependent on the joint distribution of the two GWAS.
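The two steps named in the Results (estimating the correlation that sample overlap induces between null test statistics, then applying a linear correction to their joint distribution) can be sketched in Python with NumPy. This is a hedged illustration under stated assumptions, not the authors' code. The quantitative-trait formula is the common approximation rho = N_s * r / sqrt(N1 * N2), with r the phenotypic correlation among shared subjects; the case-control formula follows the shared-subject expression of Lin and Sullivan (2009); and the "linear correction" is assumed here to be symmetric whitening of the paired z-scores by Sigma^(-1/2), which may differ from the exact transform used in the paper. All function names are hypothetical.

```python
import numpy as np


def overlap_corr_quantitative(n1, n2, n_shared, pheno_corr):
    """Expected null correlation of z-scores from two quantitative-trait GWAS.

    Assumed approximation: rho = n_shared * pheno_corr / sqrt(n1 * n2),
    where pheno_corr is the phenotypic correlation among shared subjects.
    """
    return n_shared * pheno_corr / np.sqrt(n1 * n2)


def overlap_corr_case_control(n1_cases, n1_controls, n2_cases, n2_controls,
                              shared_cases, shared_controls):
    """Expected null correlation for two case-control GWAS (Lin & Sullivan form).

    Assumes shared cases are cases in both studies and shared controls are
    controls in both studies.
    """
    n1 = n1_cases + n1_controls
    n2 = n2_cases + n2_controls
    term_controls = shared_controls * np.sqrt(
        (n1_cases * n2_cases) / (n1_controls * n2_controls))
    term_cases = shared_cases * np.sqrt(
        (n1_controls * n2_controls) / (n1_cases * n2_cases))
    return (term_controls + term_cases) / np.sqrt(n1 * n2)


def decorrelate(z1, z2, rho):
    """Whiten paired z-scores with Sigma^(-1/2), Sigma = [[1, rho], [rho, 1]].

    Under the null, the returned pair has (approximately) identity covariance.
    Assumption: this symmetric whitening stands in for the paper's correction.
    """
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    sigma = np.array([[1.0, rho], [rho, 1.0]])
    vals, vecs = np.linalg.eigh(sigma)  # Sigma is symmetric positive definite
    inv_sqrt = vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T
    z = np.vstack([np.asarray(z1, dtype=float), np.asarray(z2, dtype=float)])
    out = inv_sqrt @ z  # shape (2, number of SNPs)
    return out[0], out[1]


if __name__ == "__main__":
    # Hypothetical example: two studies of 10,000 subjects sharing 5,000,
    # with phenotypic correlation 0.2 among the shared subjects.
    rho = overlap_corr_quantitative(10_000, 10_000, 5_000, 0.2)
    rng = np.random.default_rng(0)
    null_z = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]],
                                     size=100_000)
    a, b = decorrelate(null_z[:, 0], null_z[:, 1], rho)
    print("rho:", rho)
    print("corr before:", np.corrcoef(null_z[:, 0], null_z[:, 1])[0, 1])
    print("corr after:", np.corrcoef(a, b)[0, 1])
```

Whitening with the symmetric inverse square root treats both traits alike; a one-sided alternative would keep z1 and replace z2 with (z2 - rho * z1) / sqrt(1 - rho^2). Either way, the corrected statistics would then be passed to a conditional or covariate-modulated FDR procedure in place of the raw z-scores.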

Original language	English (US)
Article number	494
Journal	BMC genomics
Volume	19
Issue number	1
DOIs	https://doi.org/10.1186/s12864-018-4859-7
State	Published - Jun 25 2018

Keywords

Covariate-modulated false discovery rate
Cross-phenotype association
Data integration
Meta-analysis with shared subjects

ASJC Scopus subject areas

Biotechnology
Genetics

Cite this

@article{a55d0c51fdb44761a2b4bd2af2ccdd79,

title = "A correction for sample overlap in genome-wide association studies in a polygenic pleiotropy-informed framework",

abstract = "Background: There is considerable evidence that many complex traits have a partially shared genetic basis, termed pleiotropy. It is therefore useful to consider integrating genome-wide association study (GWAS) data across several traits, usually at the summary statistic level. A major practical challenge arises when these GWAS have overlapping subjects. This is particularly an issue when estimating pleiotropy using methods that condition the significance of one trait on the significance of a second, such as the covariate-modulated false discovery rate (cmfdr). Results: We propose a method for correcting for sample overlap at the summary statistic level. We quantify the expected amount of spurious correlation between the summary statistics from two GWAS due to sample overlap, and use this estimated correlation in a simple linear correction that adjusts the joint distribution of test statistics from the two GWAS. The correction is appropriate for GWAS with case-control or quantitative outcomes. Our simulations and data example show that without correcting for sample overlap, the cmfdr is not properly controlled, leading to an excessive number of false discoveries and an excessive false discovery proportion. Our correction for sample overlap is effective in that it restores proper control of the false discovery rate, at very little loss in power. Conclusions: With our proposed correction, it is possible to integrate GWAS summary statistics with overlapping samples in a statistical framework that is dependent on the joint distribution of the two GWAS.",

keywords = "Covariate-modulated false discovery rate, Cross-phenotype association, Data integration, Meta-analysis with shared subjects",

author = "{Schizophrenia and Bipolar Disorder Working Groups of the Psychiatric Genomics Consortium} and Marissa LeBlanc and Verena Zuber and Thompson, {Wesley K.} and Andreassen, {Ole A.} and Arnoldo Frigessi and Andreassen, {Bettina Kulle} and Stephan Ripke and Neale, {Benjamin M.} and Aiden Corvin and Walters, {James T.R.} and Farh, {Kai How} and Phil Lee and Brendan Bulik-Sullivan and Collier, {David A.} and Hailiang Huang and Pers, {Tune H.} and Ingrid Agartz and Esben Agerbo and Margot Albus and Madeline Alexander and Farooq Amin and Bacanu, {Silviu A.} and Martin Begemann and Belliveau, {Richard A.} and Judit Bene and Elizabeth Bevilacqua and Bigdeli, {Tim B.} and Black, {Donald W.} and Richard Bruggeman and Buccola, {Nancy G.} and Buckner, {Randy L.} and Wiepke Cahn and Guiqing Cai and Cairns, {Murray J.} and Dominique Campion and Cantor, {Rita M.} and Carr, {Vaughan J.} and Noa Carrera and Catts, {Stanley V.} and Chambert, {Kimberly D.} and Chan, {Raymond C.K.} and Liang, {Kung Yee} and Maher, {Brion S.} and Gerald Nestadt and Pulver, {Ann E.} and Weinberger, {Daniel R.} and Mahon, {Pamela B.} and McMahon, {Francis J.} and Zandi, {Peter P.} and Potash, {James B.}",

note = "Publisher Copyright: {\textcopyright} 2018 The Author(s).",

year = "2018",

month = jun,

day = "25",

doi = "10.1186/s12864-018-4859-7",

language = "English (US)",

volume = "19",

journal = "BMC genomics",

issn = "1471-2164",

publisher = "BioMed Central",

number = "1",

}

TY - JOUR

T1 - A correction for sample overlap in genome-wide association studies in a polygenic pleiotropy-informed framework

AU - Schizophrenia and Bipolar Disorder Working Groups of the Psychiatric Genomics Consortium

AU - LeBlanc, Marissa

AU - Zuber, Verena

AU - Thompson, Wesley K.

AU - Andreassen, Ole A.

AU - Frigessi, Arnoldo

AU - Andreassen, Bettina Kulle

AU - Ripke, Stephan

AU - Neale, Benjamin M.

AU - Corvin, Aiden

AU - Walters, James T.R.

AU - Farh, Kai How

AU - Lee, Phil

AU - Bulik-Sullivan, Brendan

AU - Collier, David A.

AU - Huang, Hailiang

AU - Pers, Tune H.

AU - Agartz, Ingrid

AU - Agerbo, Esben

AU - Albus, Margot

AU - Alexander, Madeline

AU - Amin, Farooq

AU - Bacanu, Silviu A.

AU - Begemann, Martin

AU - Belliveau, Richard A.

AU - Bene, Judit

AU - Bevilacqua, Elizabeth

AU - Bigdeli, Tim B.

AU - Black, Donald W.

AU - Bruggeman, Richard

AU - Buccola, Nancy G.

AU - Buckner, Randy L.

AU - Cahn, Wiepke

AU - Cai, Guiqing

AU - Cairns, Murray J.

AU - Campion, Dominique

AU - Cantor, Rita M.

AU - Carr, Vaughan J.

AU - Carrera, Noa

AU - Catts, Stanley V.

AU - Chambert, Kimberly D.

AU - Chan, Raymond C.K.

AU - Liang, Kung Yee

AU - Maher, Brion S.

AU - Nestadt, Gerald

AU - Pulver, Ann E.

AU - Weinberger, Daniel R.

AU - Mahon, Pamela B.

AU - McMahon, Francis J.

AU - Zandi, Peter P.

AU - Potash, James B.

PY - 2018/6/25

Y1 - 2018/6/25

N2 - Background: There is considerable evidence that many complex traits have a partially shared genetic basis, termed pleiotropy. It is therefore useful to consider integrating genome-wide association study (GWAS) data across several traits, usually at the summary statistic level. A major practical challenge arises when these GWAS have overlapping subjects. This is particularly an issue when estimating pleiotropy using methods that condition the significance of one trait on the significance of a second, such as the covariate-modulated false discovery rate (cmfdr). Results: We propose a method for correcting for sample overlap at the summary statistic level. We quantify the expected amount of spurious correlation between the summary statistics from two GWAS due to sample overlap, and use this estimated correlation in a simple linear correction that adjusts the joint distribution of test statistics from the two GWAS. The correction is appropriate for GWAS with case-control or quantitative outcomes. Our simulations and data example show that without correcting for sample overlap, the cmfdr is not properly controlled, leading to an excessive number of false discoveries and an excessive false discovery proportion. Our correction for sample overlap is effective in that it restores proper control of the false discovery rate, at very little loss in power. Conclusions: With our proposed correction, it is possible to integrate GWAS summary statistics with overlapping samples in a statistical framework that is dependent on the joint distribution of the two GWAS.

AB - Background: There is considerable evidence that many complex traits have a partially shared genetic basis, termed pleiotropy. It is therefore useful to consider integrating genome-wide association study (GWAS) data across several traits, usually at the summary statistic level. A major practical challenge arises when these GWAS have overlapping subjects. This is particularly an issue when estimating pleiotropy using methods that condition the significance of one trait on the significance of a second, such as the covariate-modulated false discovery rate (cmfdr). Results: We propose a method for correcting for sample overlap at the summary statistic level. We quantify the expected amount of spurious correlation between the summary statistics from two GWAS due to sample overlap, and use this estimated correlation in a simple linear correction that adjusts the joint distribution of test statistics from the two GWAS. The correction is appropriate for GWAS with case-control or quantitative outcomes. Our simulations and data example show that without correcting for sample overlap, the cmfdr is not properly controlled, leading to an excessive number of false discoveries and an excessive false discovery proportion. Our correction for sample overlap is effective in that it restores proper control of the false discovery rate, at very little loss in power. Conclusions: With our proposed correction, it is possible to integrate GWAS summary statistics with overlapping samples in a statistical framework that is dependent on the joint distribution of the two GWAS.

KW - Covariate-modulated false discovery rate

KW - Cross-phenotype association

KW - Data integration

KW - Meta-analysis with shared subjects

UR - http://www.scopus.com/inward/record.url?scp=85049066693&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85049066693&partnerID=8YFLogxK

U2 - 10.1186/s12864-018-4859-7

DO - 10.1186/s12864-018-4859-7

M3 - Article

C2 - 29940862

AN - SCOPUS:85049066693

SN - 1471-2164

VL - 19

JO - BMC genomics

JF - BMC genomics

IS - 1

M1 - 494

ER -