Analysis of Latino populations from GALA and MEC studies reveals genomic loci with biased local ancestry estimation

Bogdan Pasaniuc, Sriram Sankararaman, Dara G. Torgerson, Christopher Gignoux, Noah Zaitlen, Celeste Eng, William Rodriguez-Cintron, Rocio Chapela, Jean G. Ford, Pedro C. Avila, Jose Rodriguez-Santana, Gary K. Chen, Loic Le Marchand, Brian Henderson, David Reich, Christopher A. Haiman, Esteban Gonzàlez Burchard, Eran Halperin

Research output: Contribution to journal › Article

Abstract

Motivation: Local ancestry analysis of genotype data from recently admixed populations (e.g. Latinos, African Americans) provides key insights into population history and disease genetics. Although methods for local ancestry inference have been extensively validated in simulations (under many unrealistic assumptions), no empirical study of local ancestry accuracy in Latinos exists to date. Hence, interpreting findings that rely on local ancestry in Latinos is challenging.

Results: Here, we use 489 nuclear families from the mainland USA, Puerto Rico and Mexico in conjunction with 3204 unrelated Latinos from the Multiethnic Cohort study to provide the first empirical characterization of local ancestry inference accuracy in Latinos. Our approach for identifying errors does not rely on simulations but on the observation that local ancestry in families follows Mendelian inheritance. We measure the rate of local ancestry assignments that lead to Mendelian inconsistencies in local ancestry in trios (MILANC), which provides a lower bound on errors in the local ancestry estimates. We show that MILANC rates observed in simulations underestimate the rate observed in real data, and that MILANC varies substantially across the genome. Second, across a wide range of methods, we observe that loci with large deviations in local ancestry also show enrichment in MILANC rates. Therefore, local ancestry estimates at such loci should be interpreted with caution. Finally, we reconstruct ancestral haplotype panels to be used as reference panels in local ancestry inference and show that ancestry inference is significantly improved by incorporating these reference panels.
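The idea behind MILANC can be illustrated with a small sketch. This is not the authors' implementation: it assumes each individual's local ancestry at a locus is given as an unordered pair of ancestry labels, and the labels (`EUR`, `NAM`, `AFR`), the function names and the toy calls are all hypothetical. A child's call is treated as consistent when one of its two ancestry labels could have come from the father and the other from the mother.

```python
from typing import Iterable, List, Tuple

# Hypothetical representation: the ancestry of the two chromosomes
# an individual carries at one locus, as an unordered pair of labels.
Ancestry = Tuple[str, str]
TrioCall = Tuple[Ancestry, Ancestry, Ancestry]  # (father, mother, child)


def is_mendelian_consistent(father: Ancestry, mother: Ancestry, child: Ancestry) -> bool:
    """True if the child's pair can be formed from one paternal and one maternal label."""
    a, b = child
    return (a in father and b in mother) or (b in father and a in mother)


def milanc_rate(calls: Iterable[TrioCall]) -> float:
    """Fraction of trio-locus calls that violate Mendelian inheritance of ancestry."""
    call_list: List[TrioCall] = list(calls)
    if not call_list:
        raise ValueError("no trio calls supplied")
    inconsistent = sum(
        not is_mendelian_consistent(f, m, c) for f, m, c in call_list
    )
    return inconsistent / len(call_list)


# Toy data (made up for illustration only).
calls = [
    (("EUR", "EUR"), ("NAM", "NAM"), ("EUR", "NAM")),  # consistent
    (("EUR", "EUR"), ("EUR", "EUR"), ("EUR", "AFR")),  # inconsistent: no parent carries AFR
    (("EUR", "AFR"), ("NAM", "EUR"), ("AFR", "NAM")),  # consistent
    (("AFR", "AFR"), ("NAM", "NAM"), ("AFR", "AFR")),  # inconsistent: mother carries no AFR
]
rate = milanc_rate(calls)
print(rate)
```

A check of this kind can only flag calls that contradict each other within a trio. Errors that happen to leave the trio consistent go undetected, which is why the abstract describes MILANC as a lower bound on the error rate.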

Original language: English (US)
Pages (from-to): 1407-1415
Number of pages: 9
Journal: Bioinformatics
Volume: 29
Issue number: 11
DOI: https://doi.org/10.1093/bioinformatics/btt166
State: Published - Jun 1 2013

Fingerprint

Hispanic Americans
Biased
Genomics
Locus
Population
Genes
Inconsistency
Puerto Rico
Population Genetics
Mexico
Nuclear Family
African Americans
Haplotypes
Cohort Studies
History
Genotype
Genome
Cohort Study
Simulation
Haplotype

ASJC Scopus subject areas

  • Biochemistry
  • Molecular Biology
  • Computational Theory and Mathematics
  • Computer Science Applications
  • Computational Mathematics
  • Statistics and Probability
  • Medicine(all)

Cite this

Pasaniuc, B., Sankararaman, S., Torgerson, D. G., Gignoux, C., Zaitlen, N., Eng, C., ... Halperin, E. (2013). Analysis of Latino populations from GALA and MEC studies reveals genomic loci with biased local ancestry estimation. Bioinformatics, 29(11), 1407-1415. https://doi.org/10.1093/bioinformatics/btt166

@article{351dd8b3805947f2b7c0fa7024f812c3,
title = "Analysis of Latino populations from GALA and MEC studies reveals genomic loci with biased local ancestry estimation",
abstract = "Motivation: Local ancestry analysis of genotype data from recently admixed populations (e.g. Latinos, African Americans) provides key insights into population history and disease genetics. Although methods for local ancestry inference have been extensively validated in simulations (under many unrealistic assumptions), no empirical study of local ancestry accuracy in Latinos exists to date. Hence, interpreting findings that rely on local ancestry in Latinos is challenging. Results: Here, we use 489 nuclear families from the mainland USA, Puerto Rico and Mexico in conjunction with 3204 unrelated Latinos from the Multiethnic Cohort study to provide the first empirical characterization of local ancestry inference accuracy in Latinos. Our approach for identifying errors does not rely on simulations but on the observation that local ancestry in families follows Mendelian inheritance. We measure the rate of local ancestry assignments that lead to Mendelian inconsistencies in local ancestry in trios (MILANC), which provides a lower bound on errors in the local ancestry estimates. We show that MILANC rates observed in simulations underestimate the rate observed in real data, and that MILANC varies substantially across the genome. Second, across a wide range of methods, we observe that loci with large deviations in local ancestry also show enrichment in MILANC rates. Therefore, local ancestry estimates at such loci should be interpreted with caution. Finally, we reconstruct ancestral haplotype panels to be used as reference panels in local ancestry inference and show that ancestry inference is significantly improved by incorporating these reference panels.",
author = "Bogdan Pasaniuc and Sriram Sankararaman and Torgerson, {Dara G.} and Christopher Gignoux and Noah Zaitlen and Celeste Eng and William Rodriguez-Cintron and Rocio Chapela and Ford, {Jean G.} and Avila, {Pedro C.} and Jose Rodriguez-Santana and Chen, {Gary K.} and {Le Marchand}, Loic and Brian Henderson and David Reich and Haiman, {Christopher A.} and {Gonz{\`a}lez Burchard}, Esteban and Eran Halperin",
year = "2013",
month = "6",
day = "1",
doi = "10.1093/bioinformatics/btt166",
language = "English (US)",
volume = "29",
pages = "1407--1415",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "Oxford University Press",
number = "11",

}

TY - JOUR

T1 - Analysis of Latino populations from GALA and MEC studies reveals genomic loci with biased local ancestry estimation

AU - Pasaniuc, Bogdan

AU - Sankararaman, Sriram

AU - Torgerson, Dara G.

AU - Gignoux, Christopher

AU - Zaitlen, Noah

AU - Eng, Celeste

AU - Rodriguez-Cintron, William

AU - Chapela, Rocio

AU - Ford, Jean G.

AU - Avila, Pedro C.

AU - Rodriguez-Santana, Jose

AU - Chen, Gary K.

AU - Le Marchand, Loic

AU - Henderson, Brian

AU - Reich, David

AU - Haiman, Christopher A.

AU - Gonzàlez Burchard, Esteban

AU - Halperin, Eran

PY - 2013/6/1

Y1 - 2013/6/1

AB - Motivation: Local ancestry analysis of genotype data from recently admixed populations (e.g. Latinos, African Americans) provides key insights into population history and disease genetics. Although methods for local ancestry inference have been extensively validated in simulations (under many unrealistic assumptions), no empirical study of local ancestry accuracy in Latinos exists to date. Hence, interpreting findings that rely on local ancestry in Latinos is challenging. Results: Here, we use 489 nuclear families from the mainland USA, Puerto Rico and Mexico in conjunction with 3204 unrelated Latinos from the Multiethnic Cohort study to provide the first empirical characterization of local ancestry inference accuracy in Latinos. Our approach for identifying errors does not rely on simulations but on the observation that local ancestry in families follows Mendelian inheritance. We measure the rate of local ancestry assignments that lead to Mendelian inconsistencies in local ancestry in trios (MILANC), which provides a lower bound on errors in the local ancestry estimates. We show that MILANC rates observed in simulations underestimate the rate observed in real data, and that MILANC varies substantially across the genome. Second, across a wide range of methods, we observe that loci with large deviations in local ancestry also show enrichment in MILANC rates. Therefore, local ancestry estimates at such loci should be interpreted with caution. Finally, we reconstruct ancestral haplotype panels to be used as reference panels in local ancestry inference and show that ancestry inference is significantly improved by incorporating these reference panels.

UR - http://www.scopus.com/inward/record.url?scp=84878275481&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84878275481&partnerID=8YFLogxK

U2 - 10.1093/bioinformatics/btt166

DO - 10.1093/bioinformatics/btt166

M3 - Article

C2 - 23572411

AN - SCOPUS:84878275481

VL - 29

SP - 1407

EP - 1415

JO - Bioinformatics

JF - Bioinformatics

SN - 1367-4803

IS - 11

ER -