TY - JOUR
T1 - Filtering genetic variants and placing informative priors based on putative biological function
AU - Friedrichs, Stefanie
AU - Malzahn, Dörthe
AU - Pugh, Elizabeth W.
AU - Almeida, Marcio
AU - Liu, Xiao Qing
AU - Bailey, Julia N.
N1 - Funding Information:
The authors would like to thank Zheyang Wu and Peng Wei for their comments and suggestions, as well as the GAW organizers for all their efforts. SF and DM were supported by the Deutsche Forschungsgemeinschaft (DFG, grant Research Training Group "Scaling Problems in Statistics" RTG 1644; grant Klinische Forschergruppe (KFO) 241: TP5, BI 576/5-1). EWP and JNB acknowledge support by National Institutes of Health (NIH) grants (HHSN268201200008I, R01 NS055057). XQL was supported by the University of Manitoba start-up funds. T2D-GENES is supported by NIH grants U01 DK085524, U01 DK085501, U01 DK085526, U01 DK085584 and U01 DK085545, the SAFHS by grant P01 HL045222, the SAFDS by grant R01 DK047482, and the SAFGS by grant R01 DK053889. Genetic Analysis Workshop 19 was supported by NIH grant R01 GM031575.
Publisher Copyright:
© 2016 Friedrichs et al.
PY - 2016/2/3
Y1 - 2016/2/3
N2 - High-density genetic marker data, especially sequence data, imply an immense multiple testing burden. This can be ameliorated by filtering genetic variants, exploiting or accounting for correlations between variants, jointly testing variants, and by incorporating informative priors. Priors can be based on biological knowledge or predicted variant function, or even be used to integrate gene expression or other omics data. Based on Genetic Analysis Workshop (GAW) 19 data, this article discusses diversity and usefulness of functional variant scores provided, for example, by PolyPhen2, SIFT, or RegulomeDB annotations. Incorporating functional scores into variant filters or weights and adjusting the significance level for correlations between variants yielded significant associations with blood pressure traits in a large family study of Mexican Americans (GAW19 data set). Marker rs218966 in gene PHF14 and rs9836027 in MAP4 significantly associated with hypertension; additionally, rare variants in SNUPN significantly associated with systolic blood pressure. Variant weights strongly influenced the power of kernel methods and burden tests. Apart from variant weights in test statistics, prior weights may also be used when combining test statistics or to informatively weight p values while controlling false discovery rate (FDR). Indeed, power improved when gene expression data for FDR-controlled informative weighting of association test p values of genes was used. Finally, approaches exploiting variant correlations included identity-by-descent mapping and the optimal strategy for joint testing rare and common variants, which was observed to depend on linkage disequilibrium structure.
AB - High-density genetic marker data, especially sequence data, imply an immense multiple testing burden. This can be ameliorated by filtering genetic variants, exploiting or accounting for correlations between variants, jointly testing variants, and by incorporating informative priors. Priors can be based on biological knowledge or predicted variant function, or even be used to integrate gene expression or other omics data. Based on Genetic Analysis Workshop (GAW) 19 data, this article discusses diversity and usefulness of functional variant scores provided, for example, by PolyPhen2, SIFT, or RegulomeDB annotations. Incorporating functional scores into variant filters or weights and adjusting the significance level for correlations between variants yielded significant associations with blood pressure traits in a large family study of Mexican Americans (GAW19 data set). Marker rs218966 in gene PHF14 and rs9836027 in MAP4 significantly associated with hypertension; additionally, rare variants in SNUPN significantly associated with systolic blood pressure. Variant weights strongly influenced the power of kernel methods and burden tests. Apart from variant weights in test statistics, prior weights may also be used when combining test statistics or to informatively weight p values while controlling false discovery rate (FDR). Indeed, power improved when gene expression data for FDR-controlled informative weighting of association test p values of genes was used. Finally, approaches exploiting variant correlations included identity-by-descent mapping and the optimal strategy for joint testing rare and common variants, which was observed to depend on linkage disequilibrium structure.
UR - http://www.scopus.com/inward/record.url?scp=84956651957&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84956651957&partnerID=8YFLogxK
U2 - 10.1186/s12863-015-0313-x
DO - 10.1186/s12863-015-0313-x
M3 - Article
C2 - 26866982
AN - SCOPUS:84956651957
VL - 17
JO - BMC Genetics
JF - BMC Genetics
SN - 1471-2156
IS - 2
M1 - S8
ER -