SNP Prioritization Using a Bayesian Probability of Association

John R. Thompson; Martin Gögele; Christian X. Weichenberger; Mirko Modenese; John Attia; Jennifer H. Barrett; Michael Boehnke; Alessandro De Grandi; Francisco S. Domingues; Andrew A. Hicks; Fabio Marroni; Cristian Pattaro; Fabrizio Ruggeri; Giuseppe Borsani; Giorgio Casari; Giovanni Parmigiani; Andrea Pastore; Arne Pfeufer; Christine Schwienbacher; Daniel Taliun; CKDGen Consortium; Caroline S. Fox; Peter P. Pramstaller; Cosetta Minelli

doi:10.1002/gepi.21704

Research output: Contribution to journal › Article › peer-review

11 Scopus citations

Abstract

Prioritization is the process whereby a set of possible candidate genes or SNPs is ranked so that the most promising can be taken forward into further studies. In a genome-wide association study, prioritization is usually based on the P-values alone, but researchers sometimes take account of external annotation information about the SNPs, such as whether the SNP lies close to a good candidate gene. Using external information in this way is inherently subjective and is often not formalized, making the analysis difficult to reproduce. Building on previous work that has identified 14 important types of external information, we present an approximate Bayesian analysis that produces an estimate of the probability of association. The calculation combines four sources of information: the genome-wide data, SNP information derived from bioinformatics databases, empirical SNP weights, and the researchers' subjective prior opinions. The calculation is fast enough to be applied to millions of SNPs and, although it does rely on subjective judgments, those judgments are made explicit so that the final SNP selection can be reproduced. We show that the resulting probability of association is intuitively more appealing than the P-value because it is easier to interpret and it makes allowance for the power of the study. We illustrate the use of the probability of association for SNP prioritization by applying it to a meta-analysis of kidney function genome-wide association studies and demonstrate that SNP selection performs better using the probability of association than using P-values alone.
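The abstract does not give the formula behind the probability of association. As a rough illustration of how a per-SNP effect estimate can be combined with a prior probability of association, the sketch below uses Wakefield's approximate Bayes factor, a widely used normal approximation that may differ from the authors' calculation. The function names, the prior standard deviation `prior_sd`, and all numeric inputs are assumptions chosen for illustration; in practice `prior_prob` would be the quantity informed by annotation, empirical weights, and subjective opinion.

```python
import math


def log_approx_bayes_factor(beta_hat, se, prior_sd):
    """Log of a Wakefield-style approximate Bayes factor for H1 versus H0.

    Assumes beta_hat ~ N(beta, se^2) and, under H1, beta ~ N(0, prior_sd^2).
    This is an illustrative stand-in, not the paper's stated formula.
    """
    v = se ** 2                 # sampling variance of the estimate
    w = prior_sd ** 2           # prior variance of the effect under H1
    z2 = (beta_hat / se) ** 2   # squared Wald statistic
    shrink = w / (v + w)
    return 0.5 * math.log(1.0 - shrink) + 0.5 * z2 * shrink


def probability_of_association(beta_hat, se, prior_sd, prior_prob):
    """Posterior probability of association from the Bayes factor and prior."""
    log_odds = (log_approx_bayes_factor(beta_hat, se, prior_sd)
                + math.log(prior_prob / (1.0 - prior_prob)))
    # Numerically stable logistic transform of the posterior log-odds.
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    e = math.exp(log_odds)
    return e / (1.0 + e)


# Two hypothetical SNPs with the same z-score (and so the same P-value)
# but different standard errors, i.e. different amounts of information.
precise = probability_of_association(0.04, 0.01, prior_sd=0.2, prior_prob=1e-4)
imprecise = probability_of_association(0.40, 0.10, prior_sd=0.2, prior_prob=1e-4)

# The same SNP with a prior raised by (hypothetical) favourable annotation.
annotated = probability_of_association(0.04, 0.01, prior_sd=0.2, prior_prob=1e-3)

print(precise, imprecise, annotated)
```

Under these assumptions, two SNPs with identical P-values receive different probabilities because the Bayes factor depends on the standard error as well as the z-score, and raising the prior probability for a well-annotated SNP raises its posterior probability.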

Original language	English (US)
Pages (from-to)	214-221
Number of pages	8
Journal	Genetic Epidemiology
Volume	37
Issue number	2
DOIs	https://doi.org/10.1002/gepi.21704
State	Published - Feb 2013
Externally published	Yes

Keywords

Genome-wide studies
Prior knowledge
Replication

ASJC Scopus subject areas

Genetics(clinical)
Epidemiology

Cite this

Thompson, J. R., Gögele, M., Weichenberger, C. X., Modenese, M., Attia, J., Barrett, J. H., Boehnke, M., De Grandi, A., Domingues, F. S., Hicks, A. A., Marroni, F., Pattaro, C., Ruggeri, F., Borsani, G., Casari, G., Parmigiani, G., Pastore, A., Pfeufer, A., Schwienbacher, C., ... Minelli, C. (2013). SNP Prioritization Using a Bayesian Probability of Association. Genetic Epidemiology, 37(2), 214-221. https://doi.org/10.1002/gepi.21704

Thompson, JR, Gögele, M, Weichenberger, CX, Modenese, M, Attia, J, Barrett, JH, Boehnke, M, De Grandi, A, Domingues, FS, Hicks, AA, Marroni, F, Pattaro, C, Ruggeri, F, Borsani, G, Casari, G, Parmigiani, G, Pastore, A, Pfeufer, A, Schwienbacher, C, Taliun, D, CKDGen Consortium, Fox, CS, Pramstaller, PP & Minelli, C 2013, 'SNP Prioritization Using a Bayesian Probability of Association', Genetic Epidemiology, vol. 37, no. 2, pp. 214-221. https://doi.org/10.1002/gepi.21704

@article{c74394e9535e40abb33e23e9a15158e8,

title = "SNP Prioritization Using a Bayesian Probability of Association",

keywords = "Genome-wide studies, Prior knowledge, Replication",

author = "Thompson, {John R.} and Martin G{\"o}gele and Weichenberger, {Christian X.} and Mirko Modenese and John Attia and Barrett, {Jennifer H.} and Michael Boehnke and {De Grandi}, Alessandro and Domingues, {Francisco S.} and Hicks, {Andrew A.} and Fabio Marroni and Cristian Pattaro and Fabrizio Ruggeri and Giuseppe Borsani and Giorgio Casari and Giovanni Parmigiani and Andrea Pastore and Arne Pfeufer and Christine Schwienbacher and Daniel Taliun and {CKDGen Consortium} and Fox, {Caroline S.} and Pramstaller, {Peter P.} and Cosetta Minelli",

year = "2013",

month = feb,

doi = "10.1002/gepi.21704",

language = "English (US)",

volume = "37",

pages = "214--221",

journal = "Genetic Epidemiology",

issn = "0741-0395",

publisher = "Wiley-Liss Inc.",

number = "2",

}

TY - JOUR

T1 - SNP Prioritization Using a Bayesian Probability of Association

AU - Thompson, John R.

AU - Gögele, Martin

AU - Weichenberger, Christian X.

AU - Modenese, Mirko

AU - Attia, John

AU - Barrett, Jennifer H.

AU - Boehnke, Michael

AU - De Grandi, Alessandro

AU - Domingues, Francisco S.

AU - Hicks, Andrew A.

AU - Marroni, Fabio

AU - Pattaro, Cristian

AU - Ruggeri, Fabrizio

AU - Borsani, Giuseppe

AU - Casari, Giorgio

AU - Parmigiani, Giovanni

AU - Pastore, Andrea

AU - Pfeufer, Arne

AU - Schwienbacher, Christine

AU - Taliun, Daniel

AU - CKDGen Consortium

AU - Fox, Caroline S.

AU - Pramstaller, Peter P.

AU - Minelli, Cosetta

PY - 2013/2

Y1 - 2013/2

KW - Genome-wide studies

KW - Prior knowledge

KW - Replication

UR - http://www.scopus.com/inward/record.url?scp=84872395349&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84872395349&partnerID=8YFLogxK

U2 - 10.1002/gepi.21704

DO - 10.1002/gepi.21704

M3 - Article

C2 - 23280596

AN - SCOPUS:84872395349

SN - 0741-0395

VL - 37

SP - 214

EP - 221

JO - Genetic Epidemiology

JF - Genetic Epidemiology

IS - 2

ER -
