Identifying Transmission Clusters with Cluster Picker and HIV-TRACE

Rebecca Rose, Susanna L. Lamers, James J. Dollar, Mary Grabowski, Emma B. Hodcroft, Manon Ragonnet-Cronin, Joel O. Wertheim, Andrew Redd, Danielle German, Oliver B. Laeyendecker

Research output: Contribution to journal, Article

Abstract

We compared the behavior of two approaches (Cluster Picker and HIV-TRACE) at varying genetic distances to identify transmission clusters. We used three HIV gp41 sequence datasets originating from the Rakai Community Cohort Study: (1) next-generation sequence (NGS) data from nine linked couples; (2) NGS data from longitudinal sampling of 14 individuals; and (3) Sanger consensus sequences from a cross-sectional dataset (n = 1,022) containing 91 epidemiologically linked heterosexual couples. We calculated the optimal genetic distance threshold to separate linked versus unlinked NGS datasets using a receiver operating characteristic (ROC) curve analysis. We evaluated the number, size, and composition of clusters detected by Cluster Picker and HIV-TRACE at six genetic distance thresholds (1%-5.3%) on all three datasets. We further tested the effect of using all NGS data, versus only a single variant for each patient/time point, for datasets (1) and (2). The optimal gp41 genetic distance threshold to distinguish linked and unlinked couples and individuals was 5.3% and 4%, respectively. HIV-TRACE tended to detect larger and fewer clusters, whereas Cluster Picker detected more clusters containing only two sequences. For NGS datasets (1) and (2), HIV-TRACE and Cluster Picker detected all linked pairs at 3% and 4% genetic distances, respectively. However, at 5.3% genetic distance, 20% of couples in dataset (3) did not cluster using either program, and for more than one-third of couples, cluster assignments were discordant. We suggest caution in choosing thresholds for clustering analyses in a generalized epidemic.
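The two ideas in the abstract, choosing a distance cutoff from linked versus unlinked pairs and grouping sequences that fall within that cutoff, can be sketched in a few lines of Python. This is an illustration only, not the implementation of Cluster Picker or HIV-TRACE: the function names are hypothetical, the distance is a plain proportion of mismatched sites (an assumption standing in for the model-based distances real tools use), the cutoff is chosen by maximizing Youden's J (one common ROC criterion, assumed here), and the clustering simply takes connected components of a distance-threshold graph.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def best_threshold(linked, unlinked):
    """Return (cutoff, J) maximizing Youden's J = sensitivity + specificity - 1.

    A pair is called 'linked' when its distance is <= the cutoff.
    `linked` and `unlinked` are lists of pairwise distances with known status.
    """
    best_j, best_t = -1.0, None
    for t in sorted(set(linked) | set(unlinked)):
        sens = sum(d <= t for d in linked) / len(linked)
        spec = sum(d > t for d in unlinked) / len(unlinked)
        j = sens + spec - 1
        if j > best_j:  # keep the smallest cutoff that reaches the best J
            best_j, best_t = j, t
    return best_t, best_j


def threshold_clusters(seqs: dict, threshold: float):
    """Connected components of the graph linking sequences within `threshold`.

    Returns a list of sets of sequence names; singletons are omitted.
    """
    parent = {name: name for name in seqs}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(seqs, 2):
        if p_distance(seqs[a], seqs[b]) <= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in seqs:
        groups.setdefault(find(name), set()).add(name)
    return [g for g in groups.values() if len(g) > 1]


# Toy data, invented for illustration only.
toy = {
    "A": "AAAAAAAAAA",
    "B": "AAAAAAAAAT",
    "C": "AAAAAAAATT",
    "D": "CCCCCCCCCC",
}
clusters = threshold_clusters(toy, 0.1)
```

In the toy data, A and C differ at 20% of sites yet land in the same cluster because each is within 10% of B. Chaining of this kind is one plausible reason a network-style method could report fewer, larger clusters than a method that caps the distance within each cluster.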

Original language: English (US)
Pages (from-to): 211-218
Number of pages: 8
Journal: AIDS Research and Human Retroviruses
Volume: 33
Issue number: 3
DOI: 10.1089/aid.2016.0205
State: Published - Mar 1 2017

Fingerprint

HIV
HIV Envelope Protein gp41
HIV-2
Heterosexuality
Consensus Sequence
Datasets
Cluster Analysis
HIV-1
Cohort Studies

Keywords

  • HIV
  • Uganda
  • viral clustering

ASJC Scopus subject areas

  • Immunology
  • Infectious Diseases
  • Virology

Cite this

Identifying Transmission Clusters with Cluster Picker and HIV-TRACE. / Rose, Rebecca; Lamers, Susanna L.; Dollar, James J.; Grabowski, Mary; Hodcroft, Emma B.; Ragonnet-Cronin, Manon; Wertheim, Joel O.; Redd, Andrew; German, Danielle; Laeyendecker, Oliver B.

In: AIDS Research and Human Retroviruses, Vol. 33, No. 3, 01.03.2017, p. 211-218.

@article{5d5aeceb1cbf4d4c9847ef835c4bbedc,
title = "Identifying Transmission Clusters with Cluster Picker and HIV-TRACE",
abstract = "We compared the behavior of two approaches (Cluster Picker and HIV-TRACE) at varying genetic distances to identify transmission clusters. We used three HIV gp41 sequence datasets originating from the Rakai Community Cohort Study: (1) next-generation sequence (NGS) data from nine linked couples; (2) NGS data from longitudinal sampling of 14 individuals; and (3) Sanger consensus sequences from a cross-sectional dataset (n = 1,022) containing 91 epidemiologically linked heterosexual couples. We calculated the optimal genetic distance threshold to separate linked versus unlinked NGS datasets using a receiver operating characteristic (ROC) curve analysis. We evaluated the number, size, and composition of clusters detected by Cluster Picker and HIV-TRACE at six genetic distance thresholds (1{\%}-5.3{\%}) on all three datasets. We further tested the effect of using all NGS data, versus only a single variant for each patient/time point, for datasets (1) and (2). The optimal gp41 genetic distance threshold to distinguish linked and unlinked couples and individuals was 5.3{\%} and 4{\%}, respectively. HIV-TRACE tended to detect larger and fewer clusters, whereas Cluster Picker detected more clusters containing only two sequences. For NGS datasets (1) and (2), HIV-TRACE and Cluster Picker detected all linked pairs at 3{\%} and 4{\%} genetic distances, respectively. However, at 5.3{\%} genetic distance, 20{\%} of couples in dataset (3) did not cluster using either program, and for more than one-third of couples, cluster assignments were discordant. We suggest caution in choosing thresholds for clustering analyses in a generalized epidemic.",
keywords = "HIV, Uganda, viral clustering",
author = "Rebecca Rose and Lamers, {Susanna L.} and Dollar, {James J.} and Mary Grabowski and Hodcroft, {Emma B.} and Manon Ragonnet-Cronin and Wertheim, {Joel O.} and Andrew Redd and Danielle German and Laeyendecker, {Oliver B.}",
year = "2017",
month = "3",
day = "1",
doi = "10.1089/aid.2016.0205",
language = "English (US)",
volume = "33",
pages = "211--218",
journal = "AIDS Research and Human Retroviruses",
issn = "0889-2229",
publisher = "Mary Ann Liebert Inc.",
number = "3",

}

TY - JOUR

T1 - Identifying Transmission Clusters with Cluster Picker and HIV-TRACE

AU - Rose, Rebecca

AU - Lamers, Susanna L.

AU - Dollar, James J.

AU - Grabowski, Mary

AU - Hodcroft, Emma B.

AU - Ragonnet-Cronin, Manon

AU - Wertheim, Joel O.

AU - Redd, Andrew

AU - German, Danielle

AU - Laeyendecker, Oliver B.

PY - 2017/3/1

Y1 - 2017/3/1

N2 - We compared the behavior of two approaches (Cluster Picker and HIV-TRACE) at varying genetic distances to identify transmission clusters. We used three HIV gp41 sequence datasets originating from the Rakai Community Cohort Study: (1) next-generation sequence (NGS) data from nine linked couples; (2) NGS data from longitudinal sampling of 14 individuals; and (3) Sanger consensus sequences from a cross-sectional dataset (n = 1,022) containing 91 epidemiologically linked heterosexual couples. We calculated the optimal genetic distance threshold to separate linked versus unlinked NGS datasets using a receiver operating characteristic (ROC) curve analysis. We evaluated the number, size, and composition of clusters detected by Cluster Picker and HIV-TRACE at six genetic distance thresholds (1%-5.3%) on all three datasets. We further tested the effect of using all NGS data, versus only a single variant for each patient/time point, for datasets (1) and (2). The optimal gp41 genetic distance threshold to distinguish linked and unlinked couples and individuals was 5.3% and 4%, respectively. HIV-TRACE tended to detect larger and fewer clusters, whereas Cluster Picker detected more clusters containing only two sequences. For NGS datasets (1) and (2), HIV-TRACE and Cluster Picker detected all linked pairs at 3% and 4% genetic distances, respectively. However, at 5.3% genetic distance, 20% of couples in dataset (3) did not cluster using either program, and for more than one-third of couples, cluster assignments were discordant. We suggest caution in choosing thresholds for clustering analyses in a generalized epidemic.

KW - HIV

KW - Uganda

KW - viral clustering

UR - http://www.scopus.com/inward/record.url?scp=85014485013&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85014485013&partnerID=8YFLogxK

U2 - 10.1089/aid.2016.0205

DO - 10.1089/aid.2016.0205

M3 - Article

C2 - 27824249

AN - SCOPUS:85014485013

VL - 33

SP - 211

EP - 218

JO - AIDS Research and Human Retroviruses

JF - AIDS Research and Human Retroviruses

SN - 0889-2229

IS - 3

ER -