Background: Questions remain regarding the utility of self-reported ethnicity (SRE) in genetic and epidemiologic research. It is not clear whether conditioning on SRE provides adequate protection from inflated type I error rates due to population stratification and admixture. We address this question using data obtained from the Multi-Ethnic Study of Atherosclerosis (MESA), which enrolled individuals from 4 self-reported ethnic groups. We compare the agreement between SRE and genetic based measures of ancestry (GBMA), and conduct simulation studies based on observed MESA data to evaluate the performance of each measure under various conditions.Results: Four clusters are identified using 96 ancestry informative markers. Three of these clusters are well delineated, but 30% of the self-reported Hispanic-Americans are misclassified. We also found that MESA SRE provides type I error rates that are consistent with the nominal levels. More extensive simulations revealed that this finding is likely due to the multi-ethnic nature of the MESA. Finally, we describe situations where SRE may perform as well as a GBMA in controlling the effect of population stratification and admixture in association tests.Conclusions: The performance of SRE as a control variable in genetic association tests is more nuanced than previously thought, and may have more value than it is currently credited with, especially when smaller replication studies are being considered in multi-ethnic samples.
ASJC Scopus subject areas