TY - JOUR
T1 - Identifying Transmission Clusters with Cluster Picker and HIV-TRACE
AU - Rose, Rebecca
AU - Lamers, Susanna L.
AU - Dollar, James J.
AU - Grabowski, Mary K.
AU - Hodcroft, Emma B.
AU - Ragonnet-Cronin, Manon
AU - Wertheim, Joel O.
AU - Redd, Andrew D.
AU - German, Danielle
AU - Laeyendecker, Oliver
N1 - Publisher Copyright: © 2017, Mary Ann Liebert, Inc.
PY - 2017/3
Y1 - 2017/3
AB - We compared the behavior of two approaches (Cluster Picker and HIV-TRACE) at varying genetic distances to identify transmission clusters. We used three HIV gp41 sequence datasets originating from the Rakai Community Cohort Study: (1) next-generation sequence (NGS) data from nine linked couples; (2) NGS data from longitudinal sampling of 14 individuals; and (3) Sanger consensus sequences from a cross-sectional dataset (n = 1,022) containing 91 epidemiologically linked heterosexual couples. We calculated the optimal genetic distance threshold to separate linked versus unlinked NGS datasets using a receiver operating characteristic (ROC) curve analysis. We evaluated the number, size, and composition of clusters detected by Cluster Picker and HIV-TRACE at six genetic distance thresholds (1% to 5.3%) on all three datasets. For datasets (1) and (2), we further tested the effect of using all NGS variants versus only a single variant for each patient/time point. The optimal gp41 genetic distance thresholds to distinguish linked from unlinked couples and individuals were 5.3% and 4%, respectively. HIV-TRACE tended to detect fewer, larger clusters, whereas Cluster Picker detected more clusters containing only two sequences. For NGS datasets (1) and (2), HIV-TRACE and Cluster Picker detected all linked pairs at 3% and 4% genetic distances, respectively. However, at 5.3% genetic distance, 20% of couples in dataset (3) did not cluster using either program, and for more than one-third of couples, cluster assignments were discordant. We suggest caution in choosing thresholds for clustering analyses in a generalized epidemic.
KW - HIV
KW - Uganda
KW - viral clustering
UR - http://www.scopus.com/inward/record.url?scp=85014485013&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85014485013&partnerID=8YFLogxK
U2 - 10.1089/aid.2016.0205
DO - 10.1089/aid.2016.0205
M3 - Article
C2 - 27824249
AN - SCOPUS:85014485013
SN - 0889-2229
VL - 33
SP - 211
EP - 218
JO - AIDS Research and Human Retroviruses
JF - AIDS Research and Human Retroviruses
IS - 3
ER -
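The abstract describes two distance-based steps: choosing a genetic distance cut-off from known linked and unlinked pairs with an ROC analysis, and grouping sequences whose pairwise distance falls at or below that cut-off. The Python sketch below illustrates both under stated assumptions. It is not the authors' code or either program's implementation. Pairwise distances are assumed to be precomputed (HIV-TRACE uses TN93 distances; here they are simply supplied as proportions, so 0.04 means 4%). Clustering is HIV-TRACE-style single linkage: every pair within the threshold is an edge, and each connected component with at least two members is a cluster. Cluster Picker, which works on a phylogenetic tree with node-support thresholds, is not modelled. The abstract does not state which ROC criterion defined the "optimal" threshold, so Youden's J (TPR minus FPR) is an assumption. `threshold_clusters`, `youden_threshold`, and the toy sequence IDs are hypothetical names.

```python
from itertools import combinations


def threshold_clusters(ids, dist, threshold):
    """Hypothetical HIV-TRACE-style clustering (illustrative sketch only).

    `dist` maps frozenset({a, b}) to a pairwise distance as a proportion.
    Pairs at or below `threshold` are linked, and clusters are connected
    components with two or more members.
    """
    parent = {i: i for i in ids}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(ids, 2):
        if dist[frozenset((a, b))] <= threshold:
            parent[find(a)] = find(b)  # merge the two components

    groups = {}
    for i in ids:
        groups.setdefault(find(i), []).append(i)
    # Singletons are unclustered sequences, not clusters.
    return sorted(sorted(g) for g in groups.values() if len(g) >= 2)


def youden_threshold(linked, unlinked):
    """Pick the cut-off that maximizes Youden's J = TPR - FPR (assumed criterion).

    A pair is called "linked" when its distance is <= the cut-off.
    """
    best_t, best_j = None, float("-inf")
    for t in sorted(set(linked) | set(unlinked)):
        tpr = sum(d <= t for d in linked) / len(linked)
        fpr = sum(d <= t for d in unlinked) / len(unlinked)
        if tpr - fpr > best_j:
            best_t, best_j = t, tpr - fpr
    return best_t, best_j


if __name__ == "__main__":
    # Toy data (made up): two close pairs; every other pair is 10% apart.
    ids = ["A1", "A2", "B1", "B2", "C1"]
    close = {("A1", "A2"): 0.012, ("B1", "B2"): 0.035}
    dist = {frozenset(p): close.get(p, 0.10) for p in combinations(ids, 2)}

    print(threshold_clusters(ids, dist, 0.015))  # [['A1', 'A2']]
    print(threshold_clusters(ids, dist, 0.04))   # [['A1', 'A2'], ['B1', 'B2']]
    print(youden_threshold([0.012, 0.035, 0.02], [0.08, 0.10, 0.05]))  # (0.035, 1.0)
```

Raising the threshold from 1.5% to 4% pulls the B1/B2 pair into a cluster. This mirrors, on toy data, the abstract's point that cluster counts and membership depend on the chosen cut-off.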