TY - JOUR
T1 - A Penalized Regression Framework for Building Polygenic Risk Models Based on Summary Statistics From Genome-Wide Association Studies and Incorporating External Information
AU - Chen, Ting Huei
AU - Chatterjee, Nilanjan
AU - Landi, Maria Teresa
AU - Shi, Jianxin
N1 - Publisher Copyright: © 2020 American Statistical Association.
PY - 2020
Y1 - 2020
N2 - Large-scale genome-wide association studies (GWAS) provide opportunities for developing genetic risk prediction models that have the potential to improve disease prevention, intervention or treatment. The key step is to develop polygenic risk score (PRS) models with high predictive performance for a given disease, which typically requires a large training dataset for selecting truly associated single nucleotide polymorphisms (SNPs) and estimating their effect sizes accurately. Here, we develop a comprehensive penalized regression framework for fitting ℓ1-regularized regression models to GWAS summary statistics. We propose incorporating pleiotropy and annotation information into PRS (PANPRS) development through suitable formulation of penalty functions and associated tuning parameters. Extensive simulations show that PANPRS performs as well as or better than existing PRS methods when no functional annotation or pleiotropy is incorporated. When functional annotation data and pleiotropy are informative, PANPRS substantially outperforms existing PRS methods in simulations. Finally, we applied our methods to build PRS for type 2 diabetes and melanoma and found that incorporating relevant functional annotations and GWAS of genetically related traits improved prediction of these two complex diseases. Supplementary materials for this article, including a standardized description of the materials available for reproducing the work, are available as an online supplement.
AB - Large-scale genome-wide association studies (GWAS) provide opportunities for developing genetic risk prediction models that have the potential to improve disease prevention, intervention or treatment. The key step is to develop polygenic risk score (PRS) models with high predictive performance for a given disease, which typically requires a large training dataset for selecting truly associated single nucleotide polymorphisms (SNPs) and estimating their effect sizes accurately. Here, we develop a comprehensive penalized regression framework for fitting ℓ1-regularized regression models to GWAS summary statistics. We propose incorporating pleiotropy and annotation information into PRS (PANPRS) development through suitable formulation of penalty functions and associated tuning parameters. Extensive simulations show that PANPRS performs as well as or better than existing PRS methods when no functional annotation or pleiotropy is incorporated. When functional annotation data and pleiotropy are informative, PANPRS substantially outperforms existing PRS methods in simulations. Finally, we applied our methods to build PRS for type 2 diabetes and melanoma and found that incorporating relevant functional annotations and GWAS of genetically related traits improved prediction of these two complex diseases. Supplementary materials for this article, including a standardized description of the materials available for reproducing the work, are available as an online supplement.
KW - Genetic pleiotropy
KW - Genetic risk prediction
KW - Genome wide association study
KW - Lasso
KW - Polygenic risk score
KW - Summary statistics
UR - http://www.scopus.com/inward/record.url?scp=85090867637&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85090867637&partnerID=8YFLogxK
U2 - 10.1080/01621459.2020.1764849
DO - 10.1080/01621459.2020.1764849
M3 - Article
C2 - 34483403
AN - SCOPUS:85090867637
SN - 0162-1459
JO - Journal of the American Statistical Association
JF - Journal of the American Statistical Association
ER -