Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype

Daehwan Kim, Joseph M. Paggi, Chanhee Park, Christopher Bennett, Steven L. Salzberg

Research output: Contribution to journal › Article

Abstract

The human reference genome represents only a small number of individuals, which limits its usefulness for genotyping. We present a method named HISAT2 (hierarchical indexing for spliced alignment of transcripts 2) that can align both DNA and RNA sequences using a graph Ferragina Manzini index. We use HISAT2 to represent and search an expanded model of the human reference genome in which over 14.5 million genomic variants in combination with haplotypes are incorporated into the data structure used for searching and alignment. We benchmark HISAT2 using simulated and real datasets to demonstrate that our strategy of representing a population of genomes, together with a fast, memory-efficient search algorithm, provides more detailed and accurate variant analyses than other methods. We apply HISAT2 for HLA typing and DNA fingerprinting; both applications form part of the HISAT-genotype software that enables analysis of haplotype-resolved genes or genomic regions. HISAT-genotype outperforms other computational methods and matches or exceeds the performance of laboratory-based assays.

Original language: English (US)
Pages (from-to): 907-915
Number of pages: 9
Journal: Nature Biotechnology
Volume: 37
Issue number: 8
DOI: 10.1038/s41587-019-0201-4
State: Published - Aug 1 2019

Fingerprint

Genes
Genotype
Genome
Human Genome
Haplotypes
Benchmarking
Histocompatibility Testing
DNA Fingerprinting
DNA
Software
Computational methods
RNA
Data structures
Assays
Population
Data storage equipment
Datasets

ASJC Scopus subject areas

  • Biotechnology
  • Bioengineering
  • Applied Microbiology and Biotechnology
  • Molecular Medicine
  • Biomedical Engineering

Cite this

Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. / Kim, Daehwan; Paggi, Joseph M.; Park, Chanhee; Bennett, Christopher; Salzberg, Steven L.

In: Nature Biotechnology, Vol. 37, No. 8, 01.08.2019, p. 907-915.

Kim, Daehwan; Paggi, Joseph M.; Park, Chanhee; Bennett, Christopher; Salzberg, Steven L. / Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. In: Nature Biotechnology. 2019; Vol. 37, No. 8. pp. 907-915.

@article{b00f9d3380e44bd9b0cd380c81b4a47b,
title = "Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype",
abstract = "The human reference genome represents only a small number of individuals, which limits its usefulness for genotyping. We present a method named HISAT2 (hierarchical indexing for spliced alignment of transcripts 2) that can align both DNA and RNA sequences using a graph Ferragina Manzini index. We use HISAT2 to represent and search an expanded model of the human reference genome in which over 14.5 million genomic variants in combination with haplotypes are incorporated into the data structure used for searching and alignment. We benchmark HISAT2 using simulated and real datasets to demonstrate that our strategy of representing a population of genomes, together with a fast, memory-efficient search algorithm, provides more detailed and accurate variant analyses than other methods. We apply HISAT2 for HLA typing and DNA fingerprinting; both applications form part of the HISAT-genotype software that enables analysis of haplotype-resolved genes or genomic regions. HISAT-genotype outperforms other computational methods and matches or exceeds the performance of laboratory-based assays.",
author = "Daehwan Kim and Paggi, {Joseph M.} and Chanhee Park and Christopher Bennett and Salzberg, {Steven L}",
year = "2019",
month = "8",
day = "1",
doi = "10.1038/s41587-019-0201-4",
language = "English (US)",
volume = "37",
pages = "907--915",
journal = "Nature Biotechnology",
issn = "1087-0156",
publisher = "Nature Publishing Group",
number = "8",
}

TY - JOUR

T1 - Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype

AU - Kim, Daehwan

AU - Paggi, Joseph M.

AU - Park, Chanhee

AU - Bennett, Christopher

AU - Salzberg, Steven L

PY - 2019/8/1

Y1 - 2019/8/1

AB - The human reference genome represents only a small number of individuals, which limits its usefulness for genotyping. We present a method named HISAT2 (hierarchical indexing for spliced alignment of transcripts 2) that can align both DNA and RNA sequences using a graph Ferragina Manzini index. We use HISAT2 to represent and search an expanded model of the human reference genome in which over 14.5 million genomic variants in combination with haplotypes are incorporated into the data structure used for searching and alignment. We benchmark HISAT2 using simulated and real datasets to demonstrate that our strategy of representing a population of genomes, together with a fast, memory-efficient search algorithm, provides more detailed and accurate variant analyses than other methods. We apply HISAT2 for HLA typing and DNA fingerprinting; both applications form part of the HISAT-genotype software that enables analysis of haplotype-resolved genes or genomic regions. HISAT-genotype outperforms other computational methods and matches or exceeds the performance of laboratory-based assays.

UR - http://www.scopus.com/inward/record.url?scp=85071193100&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85071193100&partnerID=8YFLogxK

U2 - 10.1038/s41587-019-0201-4

DO - 10.1038/s41587-019-0201-4

M3 - Article

VL - 37

SP - 907

EP - 915

JO - Nature Biotechnology

JF - Nature Biotechnology

SN - 1087-0156

IS - 8

ER -