Large prospective cohorts originally assembled to study environmental risk factors are increasingly exploited to study gene-environment interactions. Given the cost of genetic studies in large samples, being able to select a subsample for genotyping that contains most of the information from the cohort would lead to substantial savings. We consider nested case-control and case-cohort sampling designs, with and without stratification, and compare their efficiency relative to the entire cohort for estimating the effects of genetic and environmental risk factors and their interactions. Asymptotic calculations show that the relative efficiencies of case-cohort and nested case-control designs implementing the same sampling stratification are similar over a range of scenarios for the relationships among genes, environmental exposures, and disease status. Sampling equal numbers of exposed and unexposed subjects improves efficiency when the exposure is rare. The case-cohort designs had a slight advantage in simulations of sampling designs within the Framingham Offspring Study, using the interaction between apolipoprotein E and smoking on the risk of coronary heart disease as an example. It was possible to estimate the interaction effect with precision close to that of the full cohort when using case-cohort or nested case-control samples containing fewer than half the subjects of the cohort.