HIV-1 Full-Genome Phylogenetics of Generalized Epidemics in Sub-Saharan Africa: Impact of Missing Nucleotide Characters in Next-Generation Sequences

Oliver Ratmann, Chris Wymant, Caroline Colijn, Siva Danaviah, Max Essex, Simon Frost, Astrid Gall, Simani Gaseitsiwe, Mary Grabowski, Ronald H Gray, Stephane Guindon, Arndt Von Haeseler, Pontiano Kaleebu, Michelle Kendall, Alexey Kozlov, Justen Manasa, Bui Quang Minh, Sikhulile Moyo, Vlad Novitsky, Rebecca Nsubuga, Sureshnee Pillay, Thomas C Quinn, David Serwadda, Deogratius Ssemwanga, Alexandros Stamatakis, Jana Trifinopoulos, Maria J Wawer, Andy Leigh Brown, Tulio De Oliveira, Paul Kellam, Deenan Pillay, Christophe Fraser

Research output: Contribution to journal › Article

Abstract

To characterize HIV-1 transmission dynamics in regions where the burden of HIV-1 is greatest, the "Phylogenetics and Networks for Generalised HIV Epidemics in Africa" consortium (PANGEA-HIV) is sequencing full-genome viral isolates from across sub-Saharan Africa. We report the first 3,985 PANGEA-HIV consensus sequences from four cohort sites (Rakai Community Cohort Study, n = 2,833; MRC/UVRI Uganda, n = 701; Mochudi Prevention Project, n = 359; Africa Health Research Institute Resistance Cohort, n = 92). Next-generation sequencing success rates varied: more than 80% of the viral genome from the gag to the nef genes could be determined for all sequences from South Africa, 75% of sequences from Mochudi, 60% of sequences from MRC/UVRI Uganda, and 22% of sequences from Rakai. Partial sequencing failure was primarily associated with low viral load, increased for amplicons closer to the 3′ end of the genome, was not associated with subtype diversity except HIV-1 subtype D, and remained significantly associated with sampling location after controlling for other factors. We assessed the impact of the missing data patterns in PANGEA-HIV sequences on phylogeny reconstruction in simulations. We found a threshold in terms of taxon sampling below which the patchy distribution of missing characters in next-generation sequences (NGS) has an excess negative impact on the accuracy of HIV-1 phylogeny reconstruction, which is attributable to tree reconstruction artifacts that accumulate when branches in viral trees are long. The large number of PANGEA-HIV sequences provides unprecedented opportunities for evaluating HIV-1 transmission dynamics across sub-Saharan Africa and identifying prevention opportunities. Molecular epidemiological analyses of these data must proceed cautiously because sequence sampling remains below the identified threshold and a considerable negative impact of missing characters on phylogeny reconstruction is expected.
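The cohort figures in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only and is not part of the published analysis. It assumes that "all sequences from South Africa" refers to the Africa Health Research Institute Resistance Cohort and corresponds to a share of 100%, and it treats the rounded percentages as exact, so the derived counts are approximations. The names COHORTS, total_sequences, and approx_high_coverage are hypothetical.

```python
# Cohort sizes and the share of sequences with more than 80% of the
# gag-to-nef region determined, as quoted in the abstract.
# Assumption: "all sequences from South Africa" is read as a share of 1.00
# for the Africa Health Research Institute Resistance Cohort.
COHORTS = {
    "Rakai Community Cohort Study": (2833, 0.22),
    "MRC/UVRI Uganda": (701, 0.60),
    "Mochudi Prevention Project": (359, 0.75),
    "Africa Health Research Institute Resistance Cohort": (92, 1.00),
}


def total_sequences(cohorts):
    """Sum the per-cohort sequence counts."""
    return sum(n for n, _ in cohorts.values())


def approx_high_coverage(cohorts):
    """Approximate per-cohort count of sequences above the 80% cutoff.

    The abstract reports rounded percentages, so these counts are estimates.
    """
    return {name: round(n * share) for name, (n, share) in cohorts.items()}


total = total_sequences(COHORTS)
high = approx_high_coverage(COHORTS)
pooled_share = sum(high.values()) / total

print(f"total sequences: {total}")
for name, count in high.items():
    print(f"{name}: about {count} sequences above the cutoff")
print(f"pooled share above the cutoff: {pooled_share:.1%}")
```

The four cohort sizes add up to the 3,985 sequences reported. Under the stated assumptions, roughly a third of all sequences clear the 80% cutoff, which is consistent with the abstract's caution about missing characters.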

Original language: English (US)
Pages (from-to): 1083-1098
Number of pages: 16
Journal: AIDS Research and Human Retroviruses
Volume: 33
Issue number: 11
DOI: https://doi.org/10.1089/aid.2017.0061
State: Published - Nov 1 2017

Fingerprint

Africa South of the Sahara
HIV-1
Nucleotides
HIV
Genome
Phylogeny
Uganda
Viral Genome
nef Genes
Consensus Sequence
South Africa
Viral Load
Artifacts
Cohort Studies
Health

Keywords

  • human immunodeficiency virus, phylogenomics, phylodynamics, HIV-1 molecular epidemiology, sub-Saharan Africa, PANGEA

ASJC Scopus subject areas

  • Immunology
  • Virology
  • Infectious Diseases

Cite this

HIV-1 Full-Genome Phylogenetics of Generalized Epidemics in Sub-Saharan Africa: Impact of Missing Nucleotide Characters in Next-Generation Sequences. / Ratmann, Oliver; Wymant, Chris; Colijn, Caroline; Danaviah, Siva; Essex, Max; Frost, Simon; Gall, Astrid; Gaseitsiwe, Simani; Grabowski, Mary; Gray, Ronald H; Guindon, Stephane; Von Haeseler, Arndt; Kaleebu, Pontiano; Kendall, Michelle; Kozlov, Alexey; Manasa, Justen; Minh, Bui Quang; Moyo, Sikhulile; Novitsky, Vlad; Nsubuga, Rebecca; Pillay, Sureshnee; Quinn, Thomas C; Serwadda, David; Ssemwanga, Deogratius; Stamatakis, Alexandros; Trifinopoulos, Jana; Wawer, Maria J; Brown, Andy Leigh; De Oliveira, Tulio; Kellam, Paul; Pillay, Deenan; Fraser, Christophe.

In: AIDS Research and Human Retroviruses, Vol. 33, No. 11, 01.11.2017, p. 1083-1098.

Ratmann, O, Wymant, C, Colijn, C, Danaviah, S, Essex, M, Frost, S, Gall, A, Gaseitsiwe, S, Grabowski, M, Gray, RH, Guindon, S, Von Haeseler, A, Kaleebu, P, Kendall, M, Kozlov, A, Manasa, J, Minh, BQ, Moyo, S, Novitsky, V, Nsubuga, R, Pillay, S, Quinn, TC, Serwadda, D, Ssemwanga, D, Stamatakis, A, Trifinopoulos, J, Wawer, MJ, Brown, AL, De Oliveira, T, Kellam, P, Pillay, D & Fraser, C 2017, 'HIV-1 Full-Genome Phylogenetics of Generalized Epidemics in Sub-Saharan Africa: Impact of Missing Nucleotide Characters in Next-Generation Sequences', AIDS Research and Human Retroviruses, vol. 33, no. 11, pp. 1083-1098. https://doi.org/10.1089/aid.2017.0061
Ratmann, Oliver ; Wymant, Chris ; Colijn, Caroline ; Danaviah, Siva ; Essex, Max ; Frost, Simon ; Gall, Astrid ; Gaseitsiwe, Simani ; Grabowski, Mary ; Gray, Ronald H ; Guindon, Stephane ; Von Haeseler, Arndt ; Kaleebu, Pontiano ; Kendall, Michelle ; Kozlov, Alexey ; Manasa, Justen ; Minh, Bui Quang ; Moyo, Sikhulile ; Novitsky, Vlad ; Nsubuga, Rebecca ; Pillay, Sureshnee ; Quinn, Thomas C ; Serwadda, David ; Ssemwanga, Deogratius ; Stamatakis, Alexandros ; Trifinopoulos, Jana ; Wawer, Maria J ; Brown, Andy Leigh ; De Oliveira, Tulio ; Kellam, Paul ; Pillay, Deenan ; Fraser, Christophe. / HIV-1 Full-Genome Phylogenetics of Generalized Epidemics in Sub-Saharan Africa: Impact of Missing Nucleotide Characters in Next-Generation Sequences. In: AIDS Research and Human Retroviruses. 2017 ; Vol. 33, No. 11. pp. 1083-1098.
@article{3aa5d55af28343abb8e225edc1599049,
title = "HIV-1 Full-Genome Phylogenetics of Generalized Epidemics in Sub-Saharan Africa: Impact of Missing Nucleotide Characters in Next-Generation Sequences",
abstract = "To characterize HIV-1 transmission dynamics in regions where the burden of HIV-1 is greatest, the {"}Phylogenetics and Networks for Generalised HIV Epidemics in Africa{"} consortium (PANGEA-HIV) is sequencing full-genome viral isolates from across sub-Saharan Africa. We report the first 3,985 PANGEA-HIV consensus sequences from four cohort sites (Rakai Community Cohort Study, n = 2,833; MRC/UVRI Uganda, n = 701; Mochudi Prevention Project, n = 359; Africa Health Research Institute Resistance Cohort, n = 92). Next-generation sequencing success rates varied: more than 80{\%} of the viral genome from the gag to the nef genes could be determined for all sequences from South Africa, 75{\%} of sequences from Mochudi, 60{\%} of sequences from MRC/UVRI Uganda, and 22{\%} of sequences from Rakai. Partial sequencing failure was primarily associated with low viral load, increased for amplicons closer to the 3′ end of the genome, was not associated with subtype diversity except HIV-1 subtype D, and remained significantly associated with sampling location after controlling for other factors. We assessed the impact of the missing data patterns in PANGEA-HIV sequences on phylogeny reconstruction in simulations. We found a threshold in terms of taxon sampling below which the patchy distribution of missing characters in next-generation sequences (NGS) has an excess negative impact on the accuracy of HIV-1 phylogeny reconstruction, which is attributable to tree reconstruction artifacts that accumulate when branches in viral trees are long. The large number of PANGEA-HIV sequences provides unprecedented opportunities for evaluating HIV-1 transmission dynamics across sub-Saharan Africa and identifying prevention opportunities. Molecular epidemiological analyses of these data must proceed cautiously because sequence sampling remains below the identified threshold and a considerable negative impact of missing characters on phylogeny reconstruction is expected.",
keywords = "human immunodeficiency virus, phylogenomics, phylodynamics, HIV-1 molecular epidemiology, sub-Saharan Africa, PANGEA",
author = "Oliver Ratmann and Chris Wymant and Caroline Colijn and Siva Danaviah and Max Essex and Simon Frost and Astrid Gall and Simani Gaseitsiwe and Mary Grabowski and Gray, {Ronald H} and Stephane Guindon and {Von Haeseler}, Arndt and Pontiano Kaleebu and Michelle Kendall and Alexey Kozlov and Justen Manasa and Minh, {Bui Quang} and Sikhulile Moyo and Vlad Novitsky and Rebecca Nsubuga and Sureshnee Pillay and Quinn, {Thomas C} and David Serwadda and Deogratius Ssemwanga and Alexandros Stamatakis and Jana Trifinopoulos and Wawer, {Maria J} and Brown, {Andy Leigh} and {De Oliveira}, Tulio and Paul Kellam and Deenan Pillay and Christophe Fraser",
year = "2017",
month = "11",
day = "1",
doi = "10.1089/aid.2017.0061",
language = "English (US)",
volume = "33",
pages = "1083--1098",
journal = "AIDS Research and Human Retroviruses",
issn = "0889-2229",
publisher = "Mary Ann Liebert Inc.",
number = "11",

}

TY - JOUR
T1 - HIV-1 Full-Genome Phylogenetics of Generalized Epidemics in Sub-Saharan Africa
T2 - Impact of Missing Nucleotide Characters in Next-Generation Sequences
AU - Ratmann, Oliver
AU - Wymant, Chris
AU - Colijn, Caroline
AU - Danaviah, Siva
AU - Essex, Max
AU - Frost, Simon
AU - Gall, Astrid
AU - Gaseitsiwe, Simani
AU - Grabowski, Mary
AU - Gray, Ronald H
AU - Guindon, Stephane
AU - Von Haeseler, Arndt
AU - Kaleebu, Pontiano
AU - Kendall, Michelle
AU - Kozlov, Alexey
AU - Manasa, Justen
AU - Minh, Bui Quang
AU - Moyo, Sikhulile
AU - Novitsky, Vlad
AU - Nsubuga, Rebecca
AU - Pillay, Sureshnee
AU - Quinn, Thomas C
AU - Serwadda, David
AU - Ssemwanga, Deogratius
AU - Stamatakis, Alexandros
AU - Trifinopoulos, Jana
AU - Wawer, Maria J
AU - Brown, Andy Leigh
AU - De Oliveira, Tulio
AU - Kellam, Paul
AU - Pillay, Deenan
AU - Fraser, Christophe
PY - 2017/11/1
Y1 - 2017/11/1
N2 - To characterize HIV-1 transmission dynamics in regions where the burden of HIV-1 is greatest, the "Phylogenetics and Networks for Generalised HIV Epidemics in Africa" consortium (PANGEA-HIV) is sequencing full-genome viral isolates from across sub-Saharan Africa. We report the first 3,985 PANGEA-HIV consensus sequences from four cohort sites (Rakai Community Cohort Study, n = 2,833; MRC/UVRI Uganda, n = 701; Mochudi Prevention Project, n = 359; Africa Health Research Institute Resistance Cohort, n = 92). Next-generation sequencing success rates varied: more than 80% of the viral genome from the gag to the nef genes could be determined for all sequences from South Africa, 75% of sequences from Mochudi, 60% of sequences from MRC/UVRI Uganda, and 22% of sequences from Rakai. Partial sequencing failure was primarily associated with low viral load, increased for amplicons closer to the 3′ end of the genome, was not associated with subtype diversity except HIV-1 subtype D, and remained significantly associated with sampling location after controlling for other factors. We assessed the impact of the missing data patterns in PANGEA-HIV sequences on phylogeny reconstruction in simulations. We found a threshold in terms of taxon sampling below which the patchy distribution of missing characters in next-generation sequences (NGS) has an excess negative impact on the accuracy of HIV-1 phylogeny reconstruction, which is attributable to tree reconstruction artifacts that accumulate when branches in viral trees are long. The large number of PANGEA-HIV sequences provides unprecedented opportunities for evaluating HIV-1 transmission dynamics across sub-Saharan Africa and identifying prevention opportunities. Molecular epidemiological analyses of these data must proceed cautiously because sequence sampling remains below the identified threshold and a considerable negative impact of missing characters on phylogeny reconstruction is expected.

KW - human immunodeficiency virus
KW - phylogenomics
KW - phylodynamics
KW - HIV-1 molecular epidemiology
KW - sub-Saharan Africa
KW - PANGEA
UR - http://www.scopus.com/inward/record.url?scp=85032619182&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85032619182&partnerID=8YFLogxK
U2 - 10.1089/aid.2017.0061
DO - 10.1089/aid.2017.0061
M3 - Article
C2 - 28540766
AN - SCOPUS:85032619182
VL - 33
SP - 1083
EP - 1098
JO - AIDS Research and Human Retroviruses
JF - AIDS Research and Human Retroviruses
SN - 0889-2229
IS - 11
ER -