Fast and accurate inference of local ancestry in Latino populations

Yael Baran, Bogdan Pasaniuc, Sriram Sankararaman, Dara G. Torgerson, Christopher Gignoux, Celeste Eng, William Rodriguez-Cintron, Rocio Chapela, Jean G. Ford, Pedro C. Avila, Jose Rodriguez-Santana, Esteban Gonzàlez Burchard, Eran Halperin

Research output: Contribution to journal (Article)

Abstract

Motivation: It is becoming increasingly evident that the analysis of genotype data from recently admixed populations is providing important insights into medical genetics and population history. Such analyses have been used to identify novel disease loci, to understand recombination rate variation and to detect recent selection events. The utility of such studies crucially depends on accurate and unbiased estimation of the ancestry at every genomic locus in recently admixed populations. Although various methods have been proposed and shown to be extremely accurate in two-way admixtures (e.g. African Americans), only a few approaches have been proposed and thoroughly benchmarked on multi-way admixtures (e.g. Latino populations of the Americas).

Results: To address these challenges we introduce here methods for local ancestry inference which leverage the structure of linkage disequilibrium in the ancestral population (LAMP-LD), and incorporate the constraint of Mendelian segregation when inferring local ancestry in nuclear family trios (LAMP-HAP). Our algorithms uniquely combine hidden Markov models (HMMs) of haplotype diversity within a novel window-based framework to achieve superior accuracy as compared with published methods. Further, unlike previous methods, the structure of our HMM does not depend on the number of reference haplotypes but on a fixed constant, and it is thereby capable of utilizing large datasets while remaining highly efficient and robust to over-fitting. Through simulations and analysis of real data from 489 nuclear trio families from the mainland US, Puerto Rico and Mexico, we demonstrate that our methods achieve superior accuracy compared with published methods for local ancestry inference in Latinos.
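To make the idea of HMM-based local ancestry inference concrete, the following is a minimal sketch under strong simplifying assumptions. It is not the LAMP-LD or LAMP-HAP algorithm described in the abstract: it has no window-based framework, no model of linkage disequilibrium, and no trio constraint. It assumes a single phased haplotype, known per-ancestry allele frequencies, and a constant ancestry-switch probability between adjacent SNPs. The function name `viterbi_ancestry` and the parameters `allele_freqs` and `switch_prob` are hypothetical and chosen only for illustration. The one property it shares with the abstract's description is that the number of hidden states is fixed by the number of ancestries rather than by the number of reference haplotypes.

```python
import math


def viterbi_ancestry(haplotype, allele_freqs, switch_prob=0.01):
    """Most likely ancestry label at each SNP of one haplotype (toy model).

    haplotype:    list of 0/1 alleles, one per SNP.
    allele_freqs: allele_freqs[k][i] is the assumed frequency of allele 1
                  at SNP i in ancestral population k.
    switch_prob:  assumed probability that ancestry changes between two
                  adjacent SNPs (illustrative value, not from the paper).
    """
    n_anc = len(allele_freqs)
    n_snps = len(haplotype)
    if n_anc < 2:
        raise ValueError("need at least two ancestral populations")
    if not 0.0 < switch_prob < 1.0:
        raise ValueError("switch_prob must lie strictly between 0 and 1")
    if n_snps == 0:
        return []

    def log_emit(k, i):
        # Clamp frequencies so that log() never receives zero.
        p = min(max(allele_freqs[k][i], 1e-6), 1.0 - 1e-6)
        return math.log(p if haplotype[i] == 1 else 1.0 - p)

    log_stay = math.log(1.0 - switch_prob)
    log_move = math.log(switch_prob / (n_anc - 1))

    # Uniform prior over ancestries at the first SNP.
    score = [math.log(1.0 / n_anc) + log_emit(k, 0) for k in range(n_anc)]
    back = []  # back[i - 1][k] = best predecessor of state k at SNP i

    for i in range(1, n_snps):
        new_score, pointers = [], []
        for k in range(n_anc):
            best_prev = max(
                range(n_anc),
                key=lambda j: score[j] + (log_stay if j == k else log_move),
            )
            trans = log_stay if best_prev == k else log_move
            new_score.append(score[best_prev] + trans + log_emit(k, i))
            pointers.append(best_prev)
        score = new_score
        back.append(pointers)

    # Trace the best path backwards from the best final state.
    state = max(range(n_anc), key=lambda k: score[k])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path
```

For example, calling `viterbi_ancestry` with a haplotype whose first half matches the common alleles of one population and whose second half matches another returns a label sequence with a single ancestry switch at the boundary, which is the kind of per-locus ancestry call the abstract refers to.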

Original language: English (US)
Article number: bts144
Pages (from-to): 1359-1367
Number of pages: 9
Journal: Bioinformatics
Volume: 28
Issue number: 10
DOI: https://doi.org/10.1093/bioinformatics/bts144
State: Published - May 2012

Fingerprint

Hidden Markov models
Hispanic Americans
Population
Haplotype
Nuclear Family
Haplotypes
Markov Model
Locus
Linkage Disequilibrium
Unbiased Estimation
Puerto Rico
Overfitting
Medical Genetics
Segregation
Mexico
Genotype
Large Data Sets
Recombination
Leverage
African Americans

ASJC Scopus subject areas

  • Biochemistry
  • Molecular Biology
  • Computational Theory and Mathematics
  • Computer Science Applications
  • Computational Mathematics
  • Statistics and Probability
  • Medicine(all)

Cite this

Baran, Y., Pasaniuc, B., Sankararaman, S., Torgerson, D. G., Gignoux, C., Eng, C., ... Halperin, E. (2012). Fast and accurate inference of local ancestry in Latino populations. Bioinformatics, 28(10), 1359-1367. [bts144]. https://doi.org/10.1093/bioinformatics/bts144

@article{069022095f4442998a5ff285622ff4f1,
title = "Fast and accurate inference of local ancestry in Latino populations",
author = "Yael Baran and Bogdan Pasaniuc and Sriram Sankararaman and Torgerson, {Dara G.} and Christopher Gignoux and Celeste Eng and William Rodriguez-Cintron and Rocio Chapela and Ford, {Jean G.} and Avila, {Pedro C.} and Jose Rodriguez-Santana and Burchard, {Esteban Gonz{\`a}lez} and Eran Halperin",
year = "2012",
month = "5",
doi = "10.1093/bioinformatics/bts144",
language = "English (US)",
volume = "28",
pages = "1359--1367",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "Oxford University Press",
number = "10",

}

TY - JOUR

T1 - Fast and accurate inference of local ancestry in Latino populations

AU - Baran, Yael

AU - Pasaniuc, Bogdan

AU - Sankararaman, Sriram

AU - Torgerson, Dara G.

AU - Gignoux, Christopher

AU - Eng, Celeste

AU - Rodriguez-Cintron, William

AU - Chapela, Rocio

AU - Ford, Jean G.

AU - Avila, Pedro C.

AU - Rodriguez-Santana, Jose

AU - Burchard, Esteban Gonzàlez

AU - Halperin, Eran

PY - 2012/5

Y1 - 2012/5

UR - http://www.scopus.com/inward/record.url?scp=84861127863&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84861127863&partnerID=8YFLogxK

U2 - 10.1093/bioinformatics/bts144

DO - 10.1093/bioinformatics/bts144

M3 - Article

C2 - 22495753

AN - SCOPUS:84861127863

VL - 28

SP - 1359

EP - 1367

JO - Bioinformatics

JF - Bioinformatics

SN - 1367-4803

IS - 10

M1 - bts144

ER -