Mining gold dust under the genome wide significance level: A two-stage approach to analysis of GWAS

Gang Shi; Eric Boerwinkle; Alanna C. Morrison; Charles C. Gu; Aravinda Chakravarti; D. C. Rao

doi:10.1002/gepi.20556



30 Scopus citations

Abstract

We propose a two-stage approach to analyzing genome-wide association data in order to identify a set of promising single-nucleotide polymorphisms (SNPs). In stage one, we select a list of top signals from single-SNP analyses by controlling the false discovery rate (FDR). In stage two, we use least absolute shrinkage and selection operator (LASSO) regression to reduce false positives. The proposed approach was evaluated using simulated quantitative traits based on genome-wide SNP data on 8,861 Caucasian individuals from the Atherosclerosis Risk in Communities (ARIC) Study. Our first stage, targeted at controlling false negatives, yields better power than using a Bonferroni-corrected significance level. The LASSO regression in stage two reduces the number of significant SNPs: it removes false-positive SNPs and, because of linkage disequilibrium, also some of the true-positive SNPs at the simulated causal loci. Interestingly, the LASSO regression preserves the power from stage one, i.e., the number of causal loci detected by the LASSO regression in stage two is almost the same as in stage one, while false positives are reduced further. Real data on systolic blood pressure in the ARIC study were analyzed using our two-stage approach, which identified two significant SNPs, one of which was reported to be genome-wide significant in a meta-analysis with a much larger sample size. In contrast, a single-SNP association scan did not yield any significant results.
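The two stages described in the abstract can be sketched on toy data. This is not the authors' code: it assumes the Benjamini-Hochberg step-up procedure for the stage-one FDR screen, a normal approximation to the single-SNP regression t test, and a plain coordinate-descent LASSO with a hand-picked penalty `lam`. The simulated genotypes (one causal SNP, one proxy SNP in linkage disequilibrium with it, and null SNPs), the parameter values, and all function names are hypothetical.

```python
import math
import random


def marginal_pvalue(x, y):
    """Two-sided p-value for the slope of y ~ x (normal approximation to the t test)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    r = sxy / math.sqrt(sxx * syy)
    t = r * math.sqrt((n - 2) / (1.0 - r * r))
    return math.erfc(abs(t) / math.sqrt(2.0))


def bonferroni(pvals, alpha=0.05):
    """Indices of tests with p <= alpha / m."""
    m = len(pvals)
    return [j for j, p in enumerate(pvals) if p <= alpha / m]


def benjamini_hochberg(pvals, q=0.05):
    """Indices selected by the Benjamini-Hochberg step-up procedure at FDR level q."""
    m = len(pvals)
    order = sorted(range(m), key=lambda j: pvals[j])
    k = 0
    for rank, j in enumerate(order, start=1):
        if pvals[j] <= rank * q / m:
            k = rank  # largest rank whose p-value passes its threshold
    return sorted(order[:k])


def standardize(col):
    n = len(col)
    mu = sum(col) / n
    sd = math.sqrt(sum((v - mu) ** 2 for v in col) / n)
    return [(v - mu) / sd for v in col]


def lasso(columns, y, lam, max_sweeps=200, tol=1e-8):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam * ||b||_1 on standardized columns."""
    n = len(y)
    X = [standardize(c) for c in columns]  # now (1/n) * ||x_j||^2 == 1
    my = sum(y) / n
    resid = [v - my for v in y]  # residual at b = 0, intercept = mean(y)
    beta = [0.0] * len(X)
    for _ in range(max_sweeps):
        biggest = 0.0
        for j, xj in enumerate(X):
            rho = sum(a * r for a, r in zip(xj, resid)) / n + beta[j]
            new = math.copysign(max(abs(rho) - lam, 0.0), rho)  # soft-thresholding
            delta = new - beta[j]
            if delta != 0.0:
                resid = [r - delta * a for r, a in zip(resid, xj)]
                beta[j] = new
                biggest = max(biggest, abs(delta))
        if biggest < tol:
            break
    return beta


def two_stage(genotypes, y, q=0.05, lam=0.1):
    """Stage 1: FDR screen of single-SNP tests. Stage 2: joint LASSO on the survivors."""
    pvals = [marginal_pvalue(g, y) for g in genotypes]
    stage1 = benjamini_hochberg(pvals, q)
    beta = lasso([genotypes[j] for j in stage1], y, lam)
    stage2 = [j for j, b in zip(stage1, beta) if b != 0.0]
    return pvals, stage1, stage2


def simulate(n=2000, m=40, maf=0.3, effect=1.0, ld_copy=0.85, seed=1):
    """Hypothetical toy data: SNP 0 is causal, SNP 1 is an LD proxy of SNP 0, the rest are null."""
    rng = random.Random(seed)

    def draw():  # additive genotype 0/1/2 under Hardy-Weinberg equilibrium
        return int(rng.random() < maf) + int(rng.random() < maf)

    genotypes = [[draw() for _ in range(n)] for _ in range(m)]
    genotypes[1] = [g if rng.random() < ld_copy else draw() for g in genotypes[0]]
    y = [effect * g + rng.gauss(0.0, 1.0) for g in genotypes[0]]
    return genotypes, y


if __name__ == "__main__":
    genotypes, y = simulate()
    pvals, stage1, stage2 = two_stage(genotypes, y)
    print("Bonferroni:     ", bonferroni(pvals))
    print("Stage 1 (BH):   ", stage1)
    print("Stage 2 (LASSO):", stage2)
```

With these settings the causal SNP survives both stages, and every SNP that passes the Bonferroni level also passes the FDR screen (when q equals alpha this always holds). Whether the LD proxy is kept in stage two depends on `lam` and on the sampled data, which is the kind of within-locus pruning the abstract describes; the sketch makes no claim about how the authors chose the penalty.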

Original language	English (US)
Pages (from-to)	111-118
Number of pages	8
Journal	Genetic Epidemiology
Volume	35
Issue number	2
DOIs	https://doi.org/10.1002/gepi.20556
State	Published - Feb 2011
Externally published	Yes

Keywords

Association
FDR
LASSO
Multi-marker
Power

ASJC Scopus subject areas

Epidemiology
Genetics (clinical)


Cite this

@article{b655232be84343c8bae185251cb18748,
  title = "Mining gold dust under the genome wide significance level: A two-stage approach to analysis of {GWAS}",
  abstract = "We propose a two-stage approach to analyzing genome-wide association data in order to identify a set of promising single-nucleotide polymorphisms (SNPs). In stage one, we select a list of top signals from single-SNP analyses by controlling the false discovery rate (FDR). In stage two, we use least absolute shrinkage and selection operator (LASSO) regression to reduce false positives. The proposed approach was evaluated using simulated quantitative traits based on genome-wide SNP data on 8,861 Caucasian individuals from the Atherosclerosis Risk in Communities (ARIC) Study. Our first stage, targeted at controlling false negatives, yields better power than using a Bonferroni-corrected significance level. The LASSO regression in stage two reduces the number of significant SNPs: it removes false-positive SNPs and, because of linkage disequilibrium, also some of the true-positive SNPs at the simulated causal loci. Interestingly, the LASSO regression preserves the power from stage one, i.e., the number of causal loci detected by the LASSO regression in stage two is almost the same as in stage one, while false positives are reduced further. Real data on systolic blood pressure in the ARIC study were analyzed using our two-stage approach, which identified two significant SNPs, one of which was reported to be genome-wide significant in a meta-analysis with a much larger sample size. In contrast, a single-SNP association scan did not yield any significant results.",
  keywords = "Association, FDR, LASSO, Multi-marker, Power",
  author = "Shi, Gang and Boerwinkle, Eric and Morrison, {Alanna C.} and Gu, {Charles C.} and Chakravarti, Aravinda and Rao, {D. C.}",
  year = "2011",
  month = feb,
  doi = "10.1002/gepi.20556",
  language = "English (US)",
  volume = "35",
  pages = "111--118",
  journal = "Genetic Epidemiology",
  issn = "0741-0395",
  publisher = "Wiley-Liss Inc.",
  number = "2",
}

TY  - JOUR
T1  - Mining gold dust under the genome wide significance level: A two-stage approach to analysis of GWAS
AU  - Shi, Gang
AU  - Boerwinkle, Eric
AU  - Morrison, Alanna C.
AU  - Gu, Charles C.
AU  - Chakravarti, Aravinda
AU  - Rao, D. C.
PY  - 2011/2
Y1  - 2011/2
AB  - We propose a two-stage approach to analyzing genome-wide association data in order to identify a set of promising single-nucleotide polymorphisms (SNPs). In stage one, we select a list of top signals from single-SNP analyses by controlling the false discovery rate (FDR). In stage two, we use least absolute shrinkage and selection operator (LASSO) regression to reduce false positives. The proposed approach was evaluated using simulated quantitative traits based on genome-wide SNP data on 8,861 Caucasian individuals from the Atherosclerosis Risk in Communities (ARIC) Study. Our first stage, targeted at controlling false negatives, yields better power than using a Bonferroni-corrected significance level. The LASSO regression in stage two reduces the number of significant SNPs: it removes false-positive SNPs and, because of linkage disequilibrium, also some of the true-positive SNPs at the simulated causal loci. Interestingly, the LASSO regression preserves the power from stage one, i.e., the number of causal loci detected by the LASSO regression in stage two is almost the same as in stage one, while false positives are reduced further. Real data on systolic blood pressure in the ARIC study were analyzed using our two-stage approach, which identified two significant SNPs, one of which was reported to be genome-wide significant in a meta-analysis with a much larger sample size. In contrast, a single-SNP association scan did not yield any significant results.
KW  - Association
KW  - FDR
KW  - LASSO
KW  - Multi-marker
KW  - Power
UR  - http://www.scopus.com/inward/record.url?scp=78751507648&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=78751507648&partnerID=8YFLogxK
U2  - 10.1002/gepi.20556
DO  - 10.1002/gepi.20556
M3  - Article
C2  - 21254218
AN  - SCOPUS:78751507648
SN  - 0741-0395
VL  - 35
SP  - 111
EP  - 118
JO  - Genetic Epidemiology
JF  - Genetic Epidemiology
IS  - 2
ER  - 
