TY - JOUR
T1 - Pathogen exposure misclassification can bias association signals in GWAS of infectious diseases when using population-based common control subjects
AU - Duchen, Dylan
AU - Vergara, Candelaria
AU - Thio, Chloe L.
AU - Kundu, Prosenjit
AU - Chatterjee, Nilanjan
AU - Thomas, David L.
AU - Wojcik, Genevieve L.
AU - Duggal, Priya
N1 - Funding Information:
Funding from the National Institute of Allergy and Infectious Diseases, project number 2R01AI148049, along with a COVID-19 supplement under the same grant number (D.L.T., P.D., G.L.W.) and Burroughs Wellcome Fund, MD-GEM training grant (D.D.), supported this study. Access and use of the UK Biobank was approved using application number 17712. G.L.W. was additionally supported by the National Human Genome Research Institute (NHGRI) grant R35HG011944. The study was designed by D.D., C.C., G.L.W., and P.D. Data collection was led by D.L.T., C.L.T., N.C., P.K., and P.D. Simulation was performed by D.D. Statistical analyses were designed and performed by D.D., C.C., G.L.W., and P.D. The manuscript was first drafted by D.D., C.C., G.L.W., and P.D. All authors contributed to the final manuscript. All authors declare no competing interests.
Publisher Copyright:
© 2022 American Society of Human Genetics
PY - 2023/2/2
Y1 - 2023/2/2
N2 - Genome-wide association studies (GWASs) have been performed to identify host genetic factors for a range of phenotypes, including infectious diseases. The use of population-based common control subjects from biobanks and extensive consortia is a valuable way to increase sample sizes in the identification of associated loci with minimal additional expense. Non-differential misclassification of the outcome has been reported when the control subjects are not well characterized, which often attenuates the true effect size. However, for infectious diseases, the comparison of affected subjects to population-based common control subjects regardless of pathogen exposure can also result in selection bias. Through simulated comparisons of pathogen-exposed cases and population-based common control subjects, we demonstrate that not accounting for pathogen exposure can result in biased effect estimates and spurious genome-wide significant signals. Further, the observed association can be distorted depending upon the strength of the association between a locus and pathogen exposure and the prevalence of pathogen exposure. We also used a real data example from the hepatitis C virus (HCV) genetic consortium comparing HCV spontaneous clearance to persistent infection with both well-characterized control subjects and population-based common control subjects from the UK Biobank. We find biased effect estimates for known HCV clearance-associated loci and potentially spurious HCV clearance associations. These findings suggest that the choice of control subjects is especially important for infectious diseases or outcomes that are conditional upon environmental exposures.
KW - GWAS
KW - common controls
KW - genetic epidemiology
KW - infectious disease
KW - misclassification bias
KW - population-based controls
UR - http://www.scopus.com/inward/record.url?scp=85147457546&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85147457546&partnerID=8YFLogxK
U2 - 10.1016/j.ajhg.2022.12.013
DO - 10.1016/j.ajhg.2022.12.013
M3 - Article
C2 - 36649706
AN - SCOPUS:85147457546
SN - 0002-9297
VL - 110
SP - 336
EP - 348
JO - American Journal of Human Genetics
JF - American Journal of Human Genetics
IS - 2
ER -