Fast and accurate inference of local ancestry in Latino populations

Yael Baran; Bogdan Pasaniuc; Sriram Sankararaman; Dara G. Torgerson; Christopher Gignoux; Celeste Eng; William Rodriguez-Cintron; Rocio Chapela; Jean G. Ford; Pedro C. Avila; Jose Rodriguez-Santana; Esteban Gonzàlez Burchard; Eran Halperin

doi:10.1093/bioinformatics/bts144

Research output: Contribution to journal › Article › peer-review

140 Scopus citations

Abstract

Motivation: It is becoming increasingly evident that the analysis of genotype data from recently admixed populations is providing important insights into medical genetics and population history. Such analyses have been used to identify novel disease loci, to understand recombination rate variation and to detect recent selection events. The utility of such studies crucially depends on accurate and unbiased estimation of the ancestry at every genomic locus in recently admixed populations. Although various methods have been proposed and shown to be extremely accurate in two-way admixtures (e.g. African Americans), only a few approaches have been proposed and thoroughly benchmarked on multi-way admixtures (e.g. Latino populations of the Americas).

Results: To address these challenges we introduce here methods for local ancestry inference which leverage the structure of linkage disequilibrium in the ancestral population (LAMP-LD), and incorporate the constraint of Mendelian segregation when inferring local ancestry in nuclear family trios (LAMP-HAP). Our algorithms uniquely combine hidden Markov models (HMMs) of haplotype diversity within a novel window-based framework to achieve superior accuracy as compared with published methods. Further, unlike previous methods, the structure of our HMM does not depend on the number of reference haplotypes but on a fixed constant, and it is thereby capable of utilizing large datasets while remaining highly efficient and robust to over-fitting. Through simulations and analysis of real data from 489 nuclear trio families from the mainland US, Puerto Rico and Mexico, we demonstrate that our methods achieve superior accuracy compared with published methods for local ancestry inference in Latinos.

Original language	English (US)
Article number	bts144
Pages (from-to)	1359-1367
Number of pages	9
Journal	Bioinformatics
Volume	28
Issue number	10
DOIs	https://doi.org/10.1093/bioinformatics/bts144
State	Published - May 2012
Externally published	Yes

ASJC Scopus subject areas

Statistics and Probability
Biochemistry
Molecular Biology
Computer Science Applications
Computational Theory and Mathematics
Computational Mathematics

Cite this

Baran, Y., Pasaniuc, B., Sankararaman, S., Torgerson, D. G., Gignoux, C., Eng, C., Rodriguez-Cintron, W., Chapela, R., Ford, J. G., Avila, P. C., Rodriguez-Santana, J., Burchard, E. G., & Halperin, E. (2012). Fast and accurate inference of local ancestry in Latino populations. Bioinformatics, 28(10), 1359-1367. Article bts144. https://doi.org/10.1093/bioinformatics/bts144

@article{069022095f4442998a5ff285622ff4f1,

title = "Fast and accurate inference of local ancestry in Latino populations",

abstract = "Motivation: It is becoming increasingly evident that the analysis of genotype data from recently admixed populations is providing important insights into medical genetics and population history. Such analyses have been used to identify novel disease loci, to understand recombination rate variation and to detect recent selection events. The utility of such studies crucially depends on accurate and unbiased estimation of the ancestry at every genomic locus in recently admixed populations. Although various methods have been proposed and shown to be extremely accurate in two-way admixtures (e.g. African Americans), only a few approaches have been proposed and thoroughly benchmarked on multi-way admixtures (e.g. Latino populations of the Americas). Results: To address these challenges we introduce here methods for local ancestry inference which leverage the structure of linkage disequilibrium in the ancestral population (LAMP-LD), and incorporate the constraint of Mendelian segregation when inferring local ancestry in nuclear family trios (LAMP-HAP). Our algorithms uniquely combine hidden Markov models (HMMs) of haplotype diversity within a novel window-based framework to achieve superior accuracy as compared with published methods. Further, unlike previous methods, the structure of our HMM does not depend on the number of reference haplotypes but on a fixed constant, and it is thereby capable of utilizing large datasets while remaining highly efficient and robust to over-fitting. Through simulations and analysis of real data from 489 nuclear trio families from the mainland US, Puerto Rico and Mexico, we demonstrate that our methods achieve superior accuracy compared with published methods for local ancestry inference in Latinos.",

author = "Yael Baran and Bogdan Pasaniuc and Sriram Sankararaman and Torgerson, {Dara G.} and Christopher Gignoux and Celeste Eng and William Rodriguez-Cintron and Rocio Chapela and Ford, {Jean G.} and Avila, {Pedro C.} and Jose Rodriguez-Santana and Burchard, {Esteban Gonz{\`a}lez} and Eran Halperin",

note = "Funding: This study was supported in part by a fellowship from the Edmond J. Safra Center for Bioinformatics at Tel Aviv University. E.H. and Y.B. were partially supported by the Israeli Science Foundation, grant no. 04514831, and by the IBM open collaborative research. B.P. was supported by National Institutes of Health grant R01 HG006399.",

year = "2012",

month = may,

doi = "10.1093/bioinformatics/bts144",

language = "English (US)",

volume = "28",

pages = "1359--1367",

journal = "Bioinformatics",

issn = "1367-4803",

publisher = "Oxford University Press",

number = "10",

}

TY - JOUR

T1 - Fast and accurate inference of local ancestry in Latino populations

AU - Baran, Yael

AU - Pasaniuc, Bogdan

AU - Sankararaman, Sriram

AU - Torgerson, Dara G.

AU - Gignoux, Christopher

AU - Eng, Celeste

AU - Rodriguez-Cintron, William

AU - Chapela, Rocio

AU - Ford, Jean G.

AU - Avila, Pedro C.

AU - Rodriguez-Santana, Jose

AU - Burchard, Esteban Gonzàlez

AU - Halperin, Eran

N1 - Funding: This study was supported in part by a fellowship from the Edmond J. Safra Center for Bioinformatics at Tel Aviv University. E.H. and Y.B. were partially supported by the Israeli Science Foundation, grant no. 04514831, and by the IBM open collaborative research. B.P. was supported by National Institutes of Health grant R01 HG006399.

PY - 2012/5

Y1 - 2012/5

N2 - Motivation: It is becoming increasingly evident that the analysis of genotype data from recently admixed populations is providing important insights into medical genetics and population history. Such analyses have been used to identify novel disease loci, to understand recombination rate variation and to detect recent selection events. The utility of such studies crucially depends on accurate and unbiased estimation of the ancestry at every genomic locus in recently admixed populations. Although various methods have been proposed and shown to be extremely accurate in two-way admixtures (e.g. African Americans), only a few approaches have been proposed and thoroughly benchmarked on multi-way admixtures (e.g. Latino populations of the Americas). Results: To address these challenges we introduce here methods for local ancestry inference which leverage the structure of linkage disequilibrium in the ancestral population (LAMP-LD), and incorporate the constraint of Mendelian segregation when inferring local ancestry in nuclear family trios (LAMP-HAP). Our algorithms uniquely combine hidden Markov models (HMMs) of haplotype diversity within a novel window-based framework to achieve superior accuracy as compared with published methods. Further, unlike previous methods, the structure of our HMM does not depend on the number of reference haplotypes but on a fixed constant, and it is thereby capable of utilizing large datasets while remaining highly efficient and robust to over-fitting. Through simulations and analysis of real data from 489 nuclear trio families from the mainland US, Puerto Rico and Mexico, we demonstrate that our methods achieve superior accuracy compared with published methods for local ancestry inference in Latinos.

AB - Motivation: It is becoming increasingly evident that the analysis of genotype data from recently admixed populations is providing important insights into medical genetics and population history. Such analyses have been used to identify novel disease loci, to understand recombination rate variation and to detect recent selection events. The utility of such studies crucially depends on accurate and unbiased estimation of the ancestry at every genomic locus in recently admixed populations. Although various methods have been proposed and shown to be extremely accurate in two-way admixtures (e.g. African Americans), only a few approaches have been proposed and thoroughly benchmarked on multi-way admixtures (e.g. Latino populations of the Americas). Results: To address these challenges we introduce here methods for local ancestry inference which leverage the structure of linkage disequilibrium in the ancestral population (LAMP-LD), and incorporate the constraint of Mendelian segregation when inferring local ancestry in nuclear family trios (LAMP-HAP). Our algorithms uniquely combine hidden Markov models (HMMs) of haplotype diversity within a novel window-based framework to achieve superior accuracy as compared with published methods. Further, unlike previous methods, the structure of our HMM does not depend on the number of reference haplotypes but on a fixed constant, and it is thereby capable of utilizing large datasets while remaining highly efficient and robust to over-fitting. Through simulations and analysis of real data from 489 nuclear trio families from the mainland US, Puerto Rico and Mexico, we demonstrate that our methods achieve superior accuracy compared with published methods for local ancestry inference in Latinos.

UR - http://www.scopus.com/inward/record.url?scp=84861127863&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84861127863&partnerID=8YFLogxK

U2 - 10.1093/bioinformatics/bts144

DO - 10.1093/bioinformatics/bts144

M3 - Article

C2 - 22495753

AN - SCOPUS:84861127863

SN - 1367-4803

VL - 28

SP - 1359

EP - 1367

JO - Bioinformatics

JF - Bioinformatics

IS - 10

M1 - bts144

ER -
