Abstract
Background –
The structure of the HIV transmission networks can be dictated by just a few individuals. Public health intervention, such as ensuring people living with HIV adhere to antiretroviral therapy and remain virally-suppressed, can help control the spread of the virus. However, such intervention requires utilizing limited public health resource allocations. Determining which individuals are most at risk of transmitting HIV could allow public health officials to focus their limited resources on these individuals.
Setting –
Molecular epidemiology can help prioritize people living with HIV by patterns of transmission inferred from their sampled viral sequences. Such prioritization has been previously suggested and performed by monitoring cluster growth. In this paper, we introduce ProACT (Prioritization using AnCesTral edge lengths), a phylogenetic approach for prioritizing individuals living with HIV.
Methods –
ProACT starts from a phylogeny inferred from sequence data and orders individuals according to their terminal branch length, breaking ties using ancestral branch lengths. We evaluated ProACT on a real dataset of 926 HIV-1 subtype B pol data obtained in San Diego between 2005 and 2014, and a simulation datasets modelling the same epidemic. Prioritization methods are compared by their ability to predict individuals who transmit most after the prioritization.
Results –
Across all simulation conditions and most real data sampling conditions, ProACT outperformed monitoring cluster growth for multiple metrics of prioritization efficacy.
Conclusion –
The simple strategy used by ProACT improves the effectiveness of prioritization compared to state-of-the-art methods that rely on monitoring the growth of transmission clusters defined based on genetic distance.
1. Introduction
The transmission of Human Immunodeficiency Virus (HIV) resembles scale-free networks,1 likely due to the scale-free properties of sexual contacts and injection drug use.2,3 As a result, public health intervention may be more effective when targeted at people living with HIV who are more likely to grow the transmission network. However, the best method to target individuals for specific interventions remains an open question.
Continued treatment is one way of reducing transmissions. Antiretroviral therapy (ART) effectively suppresses the HIV virus in the majority of cases, stops the progression of the disease, and prevents onward transmission, provided the individual continuously adheres to the treatment.4 In many (though not all) health care systems, ART is made available routinely to newly diagnosed patients, but further care is needed. Not every diagnosed person initiates ART, and not all cases of ART initiation lead to a sustained suppression through time. Individuals who start ART but fail to sustain it or who are otherwise unsuppressed can still infect others. Thus, a possible intervention is to help diagnosed individuals to stay on ART and to remain continually suppressed.5 Such interventions require allocation of clinical staff who would follow up with patients to provide them further assistance in sustenance of ART. The health system can also provide increased viral load testing to adhering individuals to ensure suppression.
Another form of intervention targets undiagnosed and HIV negative individuals. The health system can use partner tracing to identify the sexual partners of diagnosed individuals as best as possible, test these individuals, and offer them either treatment for positives or PrEP prevention for negatives.6 Although contact tracing may have ethical concerns, it is a common and appropriate tool for controlling infectious diseases (e.g. HIV, Tb, SARS-CoV-2) when delivered by trained public health staff.7 Contact tracing for HIV is resource-intensive and is often viewed as insufficiently used.8 Community-level reductions in viral loads are associated with decreased HIV incidence rates9 (though empirical data on effectiveness of contact tracing are scant10).
These interventions are costly and cannot be undertaken for every known individual. If diagnosed people at risk of not being suppressed can be predicted accurately, the public health system can prioritize followup efforts to these individuals. For tracing, care provided to positive contacts and PrEP given to high-risk negative contacts can decrease HIV spread within the community only if enough unsuppressed positives and high-risk contacts are found; thus, individuals with many such contacts should be prioritized. Moreover, if these high-priority individuals show associations with geographical or demographic groups (beyond what is known), the public health system can design strategies for further outreach, testing, and PrEP administration for the impacted groups. The goal of prioritization is to increase the number of prevented transmissions for a fixed budget by increasing the chance of finding the right targets for followups, tracing, and testing. However, predicting tendency for future transmissions is difficult, especially if we aim to be more specific than broad demographic characteristics.
Molecular epidemiology suggests a solution: prioritizing based on patterns of transmission inferred from HIV sequence data.1,11–17 The inference of transmission networks using phylogenetic or distance-based methods has been the subject of much research.18–21 However, in this work, instead of being concerned with inferring exact patterns of transmissions, we ask the following question: given molecular data from a set of sequenced HIV positive individuals (“samples” for short), who should be prioritized for further intervention?
Prioritizing care based on molecular epidemiology has been studied recently. Wertheim et al.15 present a method for prioritizing samples based on performing transmission clustering (i.e., grouping individuals with low viral genetic distance into transmission clusters) and ordering clusters by growth rate. On a large dataset from New York, they show that the approach is able to predict individuals who have relatively larger numbers of transmission links in the near future. Moshiri et al.22 have studied the same question in simulations and have shown that monitoring cluster growth can be used for predicting future transmissions substantially better than a random guess, whether clusters are defined using genetic distances or using phylogenetic methods. Balaban et al.23 showed in simulations that using an approach similar to that of Wertheim et al.15 but defining clusters using phylogenetic trees gives a small but consistent improvement over clusters defined using distances. McLaughlin et al.24 developed a predictive model in which they combined viral lineage-level diversification rates inferred from phylogenetic trees with sociodemographic and clinical data, and after aggregating by patients’ area of residence, they were able to predict the distribution of new HIV cases in British Columbia, Canada 8 between 2008 and 2018 with higher predictive power than a phylogenetically uninformed model.
In this paper, we introduce a new method for ordering samples based on their phylogenetic relationships. Instead of relying on clustering individuals and then ordering clusters based on their growth, we seek to order individuals without clustering, without relying on parametric models, and without incorporating demographic information about patients. Instead, we seek to simply exploit patterns in the phylogeny, and in particular, in branch lengths.
2. Methods
ProACT (Prioritization using AnCesTral edge lengths) takes as input the inferred phylogenetic tree of sampled HIV viruses, rooted using an outgroup or clock-based methods (e.g., MinVar-root25). ProACT simply orders samples in ascending order of incident branch length of their associated virus, breaking ties based on the incident branch lengths of parent nodes, then those of grandparent nodes, etc., and ultimately by sampling times if available (see Supplementary Figure S1 for a formal description). ProACT is very fast (scaling quasilinearly), and scalable methods exist for inferring26,27 and rooting25 very large trees.
ProACT is motivated and tested in a context similar to those health care systems that enjoy enough resources to offer ART to all (or most) diagnosed individuals. Thus, each sample can be assumed to be given ART at a time close to when their HIV is sequenced, but they may fail to be suppressed for the remainder of their life. These conditions describe the current common practice of care in many places and is increasingly adopted elsewhere.
To motivate ProACT, we start with the observation that, in a ten-year epidemic simulation (described in detail below), when a phylogeny is inferred from sequences obtained at a given time point in an epidemic, the more a node transmits, the shorter its incident branch length tends to be (Figs. 1d–e and S2). Using the Kendall’s tau-b test,28 we found a statistically significant anticorrelation between the incident branch lengths of individuals sampled within the first 9 years of the epidemic and the number of individuals they infected during year 10, regardless of whether we use true (τB = −0.0431, p << 10−10) or inferred (τB = −0.0.0354, p << 10−10) phylogenies. Though not obvious, this observation can be explained by the constraints placed upon the viral phylogeny by the transmission history (Fig. 1a–c).
Figure 1:

(a) Individual A transmits to individuals B and C at times t1 and t2. (b) Viral samples are obtained at times tA, tB, and tC. The viral phylogeny is constrained by each transmission event’s bottleneck. The most likely phylogeny matches the transmission history (left), but with deeper coalescences (less likely), it may not match (right). (c) Moving from tB to tC, the branch length incident to individual A shortens upon the addition of individual C in the likely event that the coalescence of the lineage from C with the lineage from A is more recent than its coalescence with the lineage from B (left), or it remains constant in the event of a less likely deeper coalescence (right). Regardless, the length of this branch never increases. In simulation, we can observe this trend: as time progresses, the incident branch lengths tend to decrease, both in true (Fig. S2) and inferred (d) phylogenies, and as the number of transmissions from a given individual increases, incident edge lengths tend to decrease. Patterns are similar with the true simulated tree (“True”) or the tree inferred from sequences (“Est.”) (e).
In many places, HIV samples are typically sequenced upon beginning Antiretroviral Therapy (ART). Let’s assume for simplicity that every individual in the given dataset has at some point initiated ART; then, future transmissions by individuals in the dataset can happen only if the source stops ART or is otherwise unsuppressed. If, in the future, individual u transmits to individual v, there are two possible scenarios regarding the placement of the leaf corresponding to v in the existing phylogeny: (1) v is placed on the edge incident to u, so the edge incident to u will shorten, or (2) v is placed further up the tree, so the edge incident to u will remain the same length. Although Scenario 2 is possible, Scenario 1 is more likely.29 Thus, as time goes by, a terminal branch can only shorten or stay fixed, and it will most often shorten because of transmissions from the individual represented by that branch. This pattern, easily observed in simulations (Fig. 1d), leads to shorter branches for samples who have transmitted recently. The first time a sample infect others, its terminal branch length likely decreases, and further transmissions further decrease the length (Fig. 1d). Thus, one expects nodes with smaller incident branch length to be more likely to have transmitted since their sampling time. Moreover, they are more likely not to be suppressed compared to individuals on long branches. Since only individuals that are not suppressed transmit, a higher probability of a lack of suppression makes an individual a good candidate for intervention to ensure suppression.
2.1. Simulations
We used FAVITES to simulate a sexual contact network, transmission network, viral phylogeny, and viral sequences. We chose FAVITES parameters (Table S2) to emulate HIV transmission in San Diego (2005–2014). We mostly used the parameters from Moshiri et al.22 with some modifications to better capture reality. We explored the impact of changing three of the main parameters (Table 1) to obtain nine model conditions (each replicated 20 times).
Table 1:
HIV simulation parameters. Values for the base model condition are shown in bold.
| Parameter | Values |
|---|---|
| ART Initiation Rate (λ+, year−1) | 1, 2, 4 |
| ART Termination Rate (λ−, year−1) | 0.12 (0.25x), 0.24 (0.5x), 0.48 (1x), 0.96 (2x), 1.92 (4x) |
| Expected Degree (Ed) | 10, 20, 30 |
Sexual Contact Network.
To simulate scale-free sexual contact networks, Moshiri et al.22 used the Barabási–Albert (BA) model.30 HIV sexual networks also typically include densely-connected communities,31 a property the BA model fails to directly model. To add communities, we independently simulated 20 BA sexual contact networks, each with 5,000 individuals, and connected these communities randomly akin to the Erdős–Rényi (ER) model.32 In the base condition, the expected degree of connection within the community was chosen to be 10, and the expected degree of connection outside the community was set to 1. Estimates from the literature put the number of contacts at 3–4 during a single year.33 Because our simulated sexual contacts remain static over the 10 year simulation period, we explore mean degrees (Ed) between 10 and 30.
Transmission Model.
Transmissions are modeled using a compartmental epidemiological model with 5 states (a simplified version of the Granich et al.34 model): Susceptible (S), Acute HIV Untreated (AU), Acute HIV Treated (AT), Chronic HIV Untreated (CU), and Chronic HIV Treated (CT). Uninfected individuals (S) can only transition to state AU. Each of the infected states I = {AU, AT, CU, CT} has its own “infectiousness rate” λS,.. An uninfected individual u within nx sexual partners in state x ∈ I transitions from S to AU by a Poisson process with rate λu = ∑x∈I nxλS,x. Since ART significantly reduces the risk of transmission, rates are chosen such that λS,AU > λS,CU > λS,AT > λS,CT ≈ 0. We refer to λAU,AT = λCU,CT as the rate of ART initiation (λ+), and we refer to λAT,AU = λCT,CU as the rate of ART termination (λ−). We explore increasing or decreasing λ− by up to 4x, and we explore increasing λ+ by up to 4x (Table 1). At the beginning of the simulation, all initially infected (i.e., “seed”) individuals are distributed among the 4 infected states according to their steady-state proportions (Moshiri et al.22 put all of them in state AU.) The steady-state proportions were determined empirically (Fig. S3).
Viral and Sequence Evolution.
The viral phylogeny evolves inside the transmission tree under a coalescent model with logistic within-host viral population growth and a bottleneck event at the time of transmission (i.e., initial viral population size is 1).35 This process produces a separate viral phylogeny for each seed individual. These trees were merged by simulating a seed tree with one leaf per seed node under a non-homogeneous Yule model36 with rate function λ(t) = exp(−t2) + 1 scaled to have a height of 25 years to match San Diego estimates (see Moshiri et al.22 for details). Once the true tree was available, a mutation rate was sampled for each branch independently from a truncated normal random variable from 0 to infinity with a location parameter of 0.0008 and a scale parameter of 0.0005 to scale branch lengths from years to expected number of per-site mutations.22 Sequences were evolved according to the GTR model.37 Unlike Moshiri et al.22 who sampled all infected individuals in year 10 of simulations, we sample viral sequences the first time an individual begins ART (i.e., state AT or CT). The goal is to model standards of care in the health care systems that afford wide-spread sequencing.
Tree Inference.
Instead of providing ProACT with the simulated tree, using FastTree-2,26 a phylogenetic tree was inferred under the GTR+Γ model from the sequences obtained in the 6 first 9 years of the simulation, leaving the 10th year for the evaluation step. These trees were then MinVar-rooted.25 To measure the impact of imperfect sampling, we randomly subsampled the sequence datasets (at 25%, 50%, or 75% levels) before inferring phylogenies.
2.2. PIRC San Diego Dataset
To test ProACT on real data, we used an alignment of 926 HIV-1 subtype B pol sequences collected by the UC San Diego Primary Infection Resource Consortium (PIRC) between 1996 and 2018 in San Diego. PIRC is one of the largest longitudinal cohorts of samples in the United States. PIRC design strives to include acute infections (as much as 40% of individuals are recruited during early infection). A phylogenetic tree was inferred from the alignment under the GTR+Γ model using FastTree-226 and was MinVar-rooted.25 For each decile, the tree was pruned to only contain samples obtained up to the end of that decile using TreeSwift.38
2.3. Evaluation Procedure
ProACT was run on each of the estimated trees of simulated and real data. The metrics we use for evaluation are different between simulated and real data.
2.3.1. Simulated Data
Transmission/Person.
Because the true transmission histories are known in simulation, we simply average the number of infections caused by highly prioritized individuals in a one-year time frame after the prioritization (i.e., year 10) to obtain a raw outcome measure. Then, we measure the cumulative moving average (CMA) of the outcome measure by the top samples. The higher the CMA in a prioritization, the higher the number of future transmissions from these top individuals, and thus, the higher the effectiveness of the prioritization. Let A = {1,…,n} denote the n individuals sampled during years 1–9 of our simulations. Let c(i) denote the number of individuals directly infected by individual i in year 10 of the simulations. For any subset of individuals s ⊆ A, let C(s) = 1/|s| ∑i∈s c(i) denote their average c(i). For an ordering (x1,…,xn) of A, the (unadjusted) CMA up to i is C({x1,…,xi}).
Adjusted Transmissions/Person.
Sorting individuals by their transmissions in year 10 enables us to compute the optimal CMA curve, and the mean number of transmissions gives us the expected value of the CMA for a random prioritization. Across experimental conditions, the maximum and random expectations vary. Thus, to enable proper comparison of effects of prioritization across conditions, we also report an adjusted CMA normalizing above the random prioritization and over the optimal prioritization. Let (o1,…,on) denote the optimal ordering (i.e., in descending order of c(i)), with ties broken arbitrarily. We define the adjusted CMA up to i as (C({x1,…,xi})–C(A))/(C({o1,…,oi})–C(A)). We use this equation to measure the effectiveness of a selection of the top i individuals from each ordering of all individuals (exploring i for 1 to 10% of all samples; i.e., |A|/10). For this metric, 1 indicates the optimal ordering and 0 indicates an ordering that is no better than random (a negative value indicates an ordering that is worse than random).
Kendall’s tau-b.
We also use Kendall’s tau-b coefficient to measure the correlation between the optimal ordering and the ordering obtained using each method. Kendall’s tau-b is a rank correlation coefficient adjusted for ties with values ranging between −1 and 1, with −1 signifying perfect inversion, 1 signifying perfect agreement, and 0 signifying the absence of association.28
2.3.2. Real Data
The sequences were sorted in ascending order of sample time and were divided into deciles. For each decile, two sets are formed: past (sequences up to the decile) and future (sequences after the decile). A tree was inferred from the sequences in pre as described earlier and was used as input to ProACT. We then evaluated how our outcome measure correlates with the position of each individual in the ordering. We quantify the correlation using Kendall’s tau-b.
To define outcome measures, we cannot rely on infections by individuals. Instead, we use the approach used by Wertheim et al.15 We compute pairwise distances between each sequence in pre and each sequence in post under the Tamura-Nei 93 (TN93) model39 as in HIV-TRACE.19 A natural function to compute the risk of a given individual u in pre, similar to that proposed by Wertheim et al.,15 is to simply count the number of individuals in post who are genetic links to u, i.e., ∑v∈post I(d(u,v) ≤ 1.5%). In other words, the score function is simply a step function with value 1 for all distances less than or equal to 1.5% and 0 for all other distances. However, the selection of 1.5% as the distance threshold, despite being common practice in many HIV transmission clustering analyses, is somewhat arbitrary, and a step function exactly at this threshold may be overly strict (e.g. should a pairwise distance of 1.51% be ignored?).
To generalize this notion of scoring links, we utilized three analytical score functions: the step function, a sigmoid function with the choice of λ = 100 and λ = 5 (smoother versions of the step function) and an empirical smooth step function learnt from the data by fitting a mixture model onto the distribution of pairwise TN93 distances (Fig. S4). For each function, for each decile, we performed a Kendall’s tau-b test to compare the prioritization approaches.28 To generate a null distribution for visualization, we randomly shuffled the individuals in pre; however, p-values are computed using the tau-b test (not the empirical shuffling).
3. Results
3.1. Simulation Results
3.1.1. Default Condition
ProACT dramatically increased the performance compared to random ordering according to all of our outcome measures (Fig. 2). Focusing on the transmissions per person measure, while the population mean was 0.05, the ProACT’s CMA was close to 0.15 for the top 1% of prioritized samples and gradually reduced to 0.1 for the top 10% (Fig. 2a). The top 1,000 individuals in the ProACT ordering (3% of the population) transmitted 0.12 times (median across our 20 replicates), which was 2.4x higher than the median population average (Fig. 2c; see also Fig. S5 for numbers other than 1,000). As desired, selecting fewer people from the top of ProACT prioritization resulted in more transmissions per person (Fig. 2a). Compared to optimal ordering, however, the adjusted score both increased and decreased as more individuals were selected (Fig. 2b). The adjusted metric shows that while ProACT substantially outperformed random ordering, it did not come close to the effectiveness that could be achieved using the (hypothetical) perfect ordering. The Kendall’s tau-b also showed a positive correlation between ProACT and optimal ordering; although the correlation coefficient is far from perfect (Fig. 2d), the correlations are statistically significant in all replicates (p < 10−9; see Fig. S6).
Figure 2:

Effectiveness of prioritization on simulated datasets. The simulations were 10 years in length, prioritization was performed 9 years into the simulation, and the effectiveness of prioritization was computed during the last year of the simulation using four metrics (a–d). “Cluster Growth” denotes prioritization by inferring transmission clusters using HIV-TRACE at year 9 of the simulation and sorting clusters in descending order of growth rate since year 8. All curves were calculated using 20 simulation replicates. (a) Cumulative Moving Average (CMA) of the number of transmissions per person across the first decile of prioritized samples for the default simulation parameter set (see Fig. S8 for all model conditions, which show similar patterns). The horizontal axis depicts the quantile of highest-prioritized samples (e.g. x = 0.01 denotes the top percentile), and the vertical axis depicts their average number of transmission sper person. Global average across all individuals (i.e., expectation under random ordering) is shown in red. The curves labeled with percentages denote subsampled datasets. (b) CMA of adjusted number of transmissions per person for the default model condition (see Fig. S9 for all model conditions, which show similar patterns). For adjusted Transmissions/Person, 1 indicates the optimal ordering, and 0 indicates random ordering. All other settings are similar to part a. (c) Average of the raw number of transmissions per person for the top 1,000 individuals (see Fig. S5 for other counts) in a prioritized list vs. simulation parameter set (1,000 individuals correspond to 1%–6% of all individuals across conditions). The violin plots are across 20 replicates and contain box plots with medians shown as white dots. Red horizontal lines show population mean (i.e., random prioritization). (d) Kendall tau-b correlation between the optimal ordering of samples (i.e., based on their number of transmissions in year 10) and the orderings by the two prioritization methods. See Figure S7 for subsampled data. Distributions are across 20 replicates and are shown for each simulation condition.
Wertheim et al.15 have presented a method for prioritizing samples by clustering individuals based on viral genetic distance, tracking the size of each cluster over time, and prioritizing clusters in descending order of the growth rate. The approach can be extended to also order individuals (i.e., individuals belonging to clusters with high growth rates are prioritized higher; see Methods for details). ProACT consistently outperformed prioritization using cluster growth (Fig. 2). For example, the top 1,000 individuals according to cluster growth transmitted on average to 0.06 other people, which, while higher than the population average, was half the 0.12 transmissions per person according to ProACT. Kendall tau-b results similarly indicate that 7 ProACT has better correlation with the optimal ordering.
3.1.2. Impact of Simulation Parameters
We then tested the impact of three simulation parameters, namely the rate of stopping ART, the rate of starting ART, and the node degree in the sexual network (Figs. 2cd and S5–S7). As we increased the rate of stopping ART (λ−) (i.e., lower adherence), the gap between ProACT and cluster growth grew. For example, the mean number of transmissions per person among the top 1,000 individuals chosen using ProACT and cluster growth were respectively 0.169 and 0.076 (a 1.21x improvement) when λ− = 4x (Fig. 2c). This 1.21x improvement briefly increased to 1.26x and subsequently decreased to 1.01x, 0.69x, and 0.63x improvement as we reduced the rate or ART termination to 2x, 1x, 0.5x, and 0.25x. Kendall tau-b correlations show similar patterns (Fig. 2d); while almost all replicates of λ− = 4x have p < 10−20, for the 0.25x case, all replicates have p > 10−10 and one of the replicates has p > 10−3 (Fig. S6).
As we increased the rate of starting ART (λ+) (i.e., with faster diagnoses), as expected, the raw number of new infections caused per capita decreased (Fig. 2c, S5). While ProACT remained effective in finding high priority individuals, its performance compared to optimal ordering slightly degraded with higher λ+ (Figs. 2d and S7). Also, the gap between ProACT and cluster growth decreased slightly. When observing the mean number of transmissions per person among the top 1,000 individuals chosen by each method (Fig. 2c), ProACT gave a 1.01x, 1.03x, and 0.71x improvement over cluster growth for λ+ set to 1x, 2x, and 4x, respectively.
Changing the number of sexual contacts per person (Ed), which controls the speed of spread, had nonuniform effects (Figs. 2cd). Increasing Ed from 10 to 20 did not substantially impact the performance of ProACT. However, for Ed = 30, we observed a small but noticeable reduction in the improvements of ProACT compared to the cluster growth (Figs. 2d and S7).
3.1.3. Impact of Incomplete Sampling
Subsampling the total dataset to include 3/4, 1/2, or 1/4 of all samples had only a marginal impact on the performance of ProACT according to the CMA metric (Figs. 2ab, S8, S9). Only at 25% sampling level did we observe a small reduction in the performance of ProACT compared to the optimal ordering. For example, with λ+ = 2x, ProACT’s performance remained quite similar across ≥ 1/2 sampling levels, but a reduction in performance was observed for the 1/4 sampling level for both ProACT and cluster growth (Fig. S7).
According to Kendall’s tau-b, which measures the entire order rather than just the top individuals, there was a more noticeable degradation in performance due to sampling (Fig. S7). In particular, reduced sampling increased the variance across replicate simulations (note the wider distributions for reduced sampling in Fig. S7). Moreover, statistical significance of the correlations degrades with lower sampling (Fig. S6). With 1/4 sampling, unlike full sampling, many model conditions include some replicates where the ProACT ordering is not significantly better than random according to Kendall’s tau-b.
3.1.4. Second Order Effects
We next asked if prioritization is effective in detecting people whose contacts also transmit abundantly. To do so, we explored a new outcome measure: the total number of transmissions from all contacts of a sample. Prioritizing samples whose contacts are likely to transmit can give public health officials a chance to find undiagnosed individuals (likely to transmit) through partner tracing from diagnosed individuals and to prioritize PrEP for uninfected individuals in sexual contact with prioritized individuals.
Across all model parameters, ProACT ordering outperformed random ordering and cluster growth according to the number transmissions per neighbors (Fig. 3). For example, contacts of the top 1,000 individuals according to ProACT transmitted to 2.23 individuals on average (median across replicates), which is more than twice the number of across all individuals in the network (1.08). Just as with the previous outcome measure, advantages of ProACT over random prioritization or cluster growth were most pronounced for lower λ+ and higher λ− (Fig. 3c). The Kendall tau-b coefficients for the correlation between ProACT and the optimal ordering were high (Figs. 3d and S10); in fact, they were higher for the transmissions from contacts compared to transmissions from the prioritized person (e.g. median coefficient was 0.084 for contacts and 0.033 for the individuals in the default condition). These coefficients were highly significant across all models and sampling levels (Fig. S11). Thus, ProACT was even more effective in finding individuals with active contact than it was for finding individuals who were not suppressed. These results were largely robust to reduced sampling, showing similar patterns of average performance but increased variance across replicates (Fig. S10 and S11).
Figure 3:

Second order effects. (a) CMA of the number of infections from contacts of the top individuals according to each ordering; other settings similar to Figure 2a. (b) Similar to part (a), but adjusted for random and optimal ordering. (c) Number of transmissions from neighbors for the top 1,000 individuals in a prioritized list vs. simulation parameter set. (d) Kendall’s tau-b correlation between the optimal ordering of samples and their ordering by the two prioritization methods. See Figure S10 for subsampled data.
Further interrogating the properties of an individual and their ordering, we observed a substantial correlation between the number of contacts of samples in the sexual network and their position in the ProACT ordering (Fig. 4). Thus, while ProACT only considers the phylogeny, it prioritized those individuals that had high degrees in the sexual contact network (hidden to ProACT). These correlations were strongest for networks with high degree and weakest when the rate of diagnosis was very high. Reducing sampling did not substantially affect mean correlation but increased variance over replicates (Fig. S12).
Figure 4:

ProACT captures sexual contact network degrees. Kendall’s tau-b correlation between the number of contacts of each individual and their ordering by the prioritization methods. See Figure S12 for subsampled data.
3.2. Real San Diego Dataset
On real datasets, unlike the simulated data, the desired outcome measure, the number of new transmissions per person, is not known. Instead, we have to use inferred relationships. HIV-TRACE (used in our cluster growth approach) defines a pair of samples as “genetically linked” if their sequences are very similar (TN93 distance below 1.5%). We similarly use the TN93 sequence similarity as an outcome measure, but in addition to using a fixed threshold, we also use smoother functions (Fig. S13). We measure the number of linked individuals using a step function (1 if TN93 distance is below 1.5% and 0 otherwise) and an empirical smooth step function determined by fitting a mixture of three Gaussians to the distribution of pairwise TN93 distances (Material and Methods). We also explore an analytical smooth step function (parameterized sigmoid). Note that, when the step function is used, our outcome measure (computed for future transmissions) is exactly the same as what the cluster growth method uses for prioritizing (albeit, using past data). Thus, it is reasonable to suspect that the step function may favor cluster growth. As we move to smoother functions of distance to count genetic links, our measure is expected to become less biased in favor of HIV-TRACE.
Using ProACT or cluster growth to prioritize individuals results in orderings of individuals with positive Kendall’s tau-b correlations to the number of future genetic links regardless of the decile and the function used to count genetic links (Fig. 5). These correlations are statistically significant in almost all cases (Table 2). The correlation coefficient ranges between 0.4 (ProACT; 10% time) and 0.1 (cluster growth; 20% time) for empirical function, and between 0.6 (cluster growth; 10% time) and 0.1 (ProACT; 80% time) for the step function.
Figure 5:

Kendall’s tau-b test results for ProACT ordering on real data using four score functions: an empirical smooth step function, a strict step function around 1.5%, and the sigmoid score functions with λ = 100 and λ = 5. The full San Diego dataset was split into two sets (pre and post) at each decile (shown on the horizontal axis). The individuals in pre were ordered using ProACT and by cluster growth, and they were given a “score” computed using a score function (see MEthods). Kendall’s tau-b correlation coefficient was computed for each ordering with respect to the optimal possible ordering (i.e., sorting in descending order of the score). The null distribution was visualized by randomly shuffling the individuals in pre, and test p-values are shown in Table 2.
Table 2:
Kendall’s tau-b test for a null hypothesis that a given prioritization yields a total outcome measure no better than random. We show p-values for the real San Diego dataset for the first through ninth deciles using two outcome measure functions. Tests that failed to reject the null hypothesis with (uncorrected) p-value < 0.00138 (corresponding to α = 0.05 with a Bonferroni multiple hypothesis testing correct with n = 36) are marked with †.
| Empirical Smooth Step Function (FastTree) | |||||||||
|---|---|---|---|---|---|---|---|---|---|
| 10% | 20% | 30% | 40% | 50% | 60% | 70% | 80% | 90% | |
| GD+CG | †2×10−3 | †2×10−2 | 5×10−6 | 2×10−4 | 5×10−5 | 6×10−7 | 2×10−9 | 2×10−8 | 2×10−11 |
| ProACT | 5×10−8 | 1×10−4 | 6×10−6 | 2×10−7 | 2×10−8 | 2×10−11 | 1×10−11 | 1×10−11 | 1×10−17 |
| Step Function Around 1.5% | |||||||||
| 10% | 20% | 30% | 40% | 50% | 60% | 70% | 80% | 90% | |
| GD+CG | 4×10−12 | 1×10−19 | 3×10−28 | 7×10−25 | 2×10−19 | 8×10−12 | 1×10−17 | 5×10−14 | 2×10−25 |
| ProACT | 1×10−5 | 5×10−8 | 3×10−7 | 2×10−10 | 1×10−6 | 1×10−6 | 1×10−4 | †7×10−3 | 4×10−7 |
The comparison between ProACT and cluster growth depends on the choice of the function to count links. When counting the number of links using the step function, cluster growth consistently outperforms ProACT for all deciles of the dataset. These results should be interpreted with the caveat that we count HIV-TRACE links both to prioritize and to evaluate accuracy. However, according to the empirical smooth step function learned from the TN93 distances, ProACT outperforms cluster growth in all except one time point, where they are tied.
To further test the impact of the link-counting function applied to TN93 distances on the relative accuracy of methods, we used a sigmoid function to replace the step function while keeping the inflection point at 1.5% (Fig. S13). We observed that as the outcome measure function becomes more smooth, ProACT’s performance improves with respect to prioritization by cluster growth (Fig. 5, Table S1). Based on the more smooth sigmoid function (λ = 5), ProACT outperforms cluster growth in all but one case where they are tied. Thus, simply counting distances close to 1.5% as partial links leads to evaluations that favor ProACT.
As time increases, both methods experience seemingly downward trends in their tau coefficients, but the null distribution of tau coefficients also tightens (Fig. 5). Thus, both methods consistently do significantly better than expected by random chance and there is no clear relationship between p-values of individual tool and time (Table 2). However, both for the step function and the sigmoid functions, ProACT’s relative performance with respect to cluster growth tends to improved over time.
4. Discussion
We start by discussing observed results and then comment on practical implications of this paper both for public health and for future research.
4.1. Discussion of Results
In our simulations, ProACT was the least effective in conditions with very low rate of ART termination, which correspond to very high adherence, or high rates of ART initiation. As expected, the total number of new infections per sample is low when adherence is high, reducing the opportunity for improving the ordering. Thus, ProACT is the most beneficial where termination of ART or late diagnosis lead to diagnosed individuals who transmit frequently.
ProACT was quite robust to impacts of subsampling individuals and only at 1/4 sampling did we start to lose accuracy. We remind the reader that a 1/4 sampling does not mean that 1/4 of all infected individuals are in the dataset. Rather, it means that 1/4 of diagnosed individuals are available to us. Recall that, in our model, diagnosed individuals are immediately sequenced and put on ART (which they may or may not sustain). At any point in time, a large partition of individuals who are infected are not diagnosed and thus not sampled. In other words, the full sampling case should not be misunderstood as including undiagnosed people. Rather, lack of full sampling corresponds to a case where some samples are known to some clinic but are not included in the study, perhaps due to a lack of sequencing or data sharing.
ProACT far outperformed random ordering. However, we note that, despite the strong performance, there is much room left for future improvement: the adjusted outcome measure is consistently below 8% when selecting up to 10% of top-priority samples. Thus, there is much room for improvement in identifying high-value individuals. It will be unrealistic to expect that any statistical method based solely on sequence data (and perhaps also commonly available metadata, such as sampling times) will be able to come close to the optimal ordering. Nevertheless, methods better than ProACT could likely be developed in the future. Moreover, here, we used ML methods to infer trees and used mutation rate branch lengths. We made these choices mostly for computational expediency. However, ProACT algorithm can be applied on the potentially more accurate Bayesian estimates of the phylogeny. Also, one can attempt to use ProACT after dating the tree. Whether either adjustment results in substantial improvements should be studied in the future.
4.2. Implications of Results
We proposed a useful approach for thinking about the effectiveness of molecular epidemiology. Instead of focusing on the accuracy of methods of reconstructing phylogenetic trees or transmission networks, a question fraught with difficulties, we asked a more practical question. Given molecular epidemiological data, can the methods, whether phylogenetic or clustering-based, prioritize samples for increased attention by public health? Using molecular epidemiology for prioritization is, of course, not a new idea. For example, Wertheim et al.15 presented a method to prioritize samples based on the growth rates of their transmission clusters. Vasylyeva et al.40 performed a phylogeographic analysis to reconstruct HIV movement among different locations in Ukraine in order to infer region-level risk prioritization. Much earlier even, Mellors et al.41 predicted HIV patient prognosis by quantifying HIV RNA in plasma; predicted prognosis can subsequently be used as a prioritization rank. However, we hope that our formal definition of the prioritization problem, in addition to our extensive simulations and developed metrics of evaluation, will stir further work in this area. It is likely that more advanced methods than our simple prioritization approach can improve performance beyond ProACT in the future.
ProACT prioritizes individuals, not clusters. Prioritizing individuals based on their perceived risk of future transmission promises to be perhaps more effective than targeting clusters. This targeted approach also poses ethical questions that have to be considered. We may not want the algorithm to be biased towards particular demographic attributes. ProACT does not use any metadata in its prioritization, reducing risks of such biases. Nevertheless, it is possible that factors such as the depth of the sampling of a demographic group can in fact change branch length patterns in the phylogeny and make ProACT less or more effective for certain demographic groups. These broader implications of individual prioritization and impacts of demographics on the performance of ProACT should be studied more carefully in future.
Our results indicated that ProACT ordering is a function of features of the sexual contact network. For example, we showed that ProACT orders correlate with the degree of nodes in the sexual network (e.g. Figure 4). These results are significant given the fact that ProACT is given no direct information about the sexual network. The fact that ProACT captures contact network features means that, even if prioritized samples are already suppressed, their sexual contacts, identified using tracing, can be good targets for interventive care.
The main practical question is what can be done with a prioritized list of known samples. We mentioned that using follow-ups, public health officials can try to ensure sustenance of ART for prioritized individuals, and using partner tracing, they can target PrEP and HIV testing to contacts of prioritized individuals. Follow-ups, PrEP, and targeted testing are all expensive and can benefit from prioritization. Given a fixed budget, ProACT can help decide who should get follow-ups, who should be traced, and who should be tested frequently. For example, assuming an epidemic is similar to our default simulation condition, if we afford to contact trace 1,000 known individuals and effectively prevent transmission from all of their contacts (using ART or PrEP), we would prevent 2,230 further transmissions using ProACT as opposed to only 1,080 if the 1,000 individuals are randomly chosen. Thus, ProACT is a way to target prevention strategies to increase their effectiveness.
One may ask whether ordering by branch lengths results in orderings that fail to change with time to reflect changes in the epidemic. To answer this question, on the San Diego PIRC data, we asked how fast the ProACT ordering changes as time progresses. To do so, we computed Kendall’s tau-b correlations to the ProACT ordering obtained using only the first decile of the dataset (Fig. S14). There was a strong but diminishing correlation with the initial ordering. The correlations started at 1 (by definition) and gradually decreased in the ninth decile to 0.522. The results show that as desired, ProACT orders do in fact change with time, albeit gradually. The gradual change implies that certain individuals remain high-priority as time progresses. In practical use, ProACT ordering should be combined with clinical knowledge about the status of individual patients. For example, high priority individuals according to ProACT can be given lower priority if they manage to constantly remain suppressed with multiple follow-ups. Future versions of ProACT should allow users to provide a history of interventions and change the result accordingly. More broadly, the ProACT should be considered one of the tools for prioritizing clinical care, and other data that it does not incorporate should also be considered.
Finally, a question faced by public health officials is whether the cost of targeting diagnosed individuals for follow-ups and partner tracing is worth the potential reduction in future cases. The answer to that question will inevitably depend on who is targeted. For example, in our default simulation case, targeting individuals randomly can directly prevent 0.05 transmissions per chosen person in the next 12 months, whereas targeting top 1,000 individuals according to ProACT would directly target 0.12 transmissions. Thus, prioritization can in fact change the cost-benefit analyses. Moreover, given a prioritization, one can use simulations to predict the outcome measure for the top individuals (similar to Fig. S9) and use metrics such as quality-adjusted life-year (QALY) to estimate how many top individuals should be targeted for the cost to justify the benefits.
Supplementary Material
Acknowledgements
We thank Susan B. Little for providing the San Diego HIV sequence dataset used in this study (access obtained through a PIRC proposal). We also thank Joel O. Wertheim and Sanjay R. Mehta for fruitful discussions that helped motivate the development of ProACT.
Funding: This work was supported by the National Institutes of Health (5P30AI027767-28, AI100665, AI106039, and MH100974) and a developmental grant from the University of California, San Diego Center for AIDS Research (P30 AI036214), supported by the National Institutes of Health.
Footnotes
Code and Data Availability
ProACT is freely available at https://github.com/niemasd/ProACT in open-source. All experiment data are available online: https://github.com/niemasd/ProACT-Paper-Final
References
- 1.Wertheim JO, Leigh Brown AJ, Hepler NL, et al. The global transmission network of HIV-1. Journal of Infectious Diseases 209, 304–313. doi: 10.1093/infdis/jit524 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 2.Little S, Pond SLK, Anderson CM, et al. Using HIV networks to inform real time prevention interventions. PLoS ONE 9. doi: 10.1371/journal.pone.0098443 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 3.Schneeberger ADRN, Mercer CH, Gregson SA, et al. Scale-free networks and sexually transmitted diseases: a description of observed patterns of sexual contacts in Britain and Zimbabwe. Sexually Transmitted Diseases 31, 380–387 (2004). [DOI] [PubMed] [Google Scholar]
- 4.Cohen MS, Chen YQ, McCauley M, et al. Prevention of HIV-1 Infection with Early Antiretroviral Therapy. New England Journal of Medicine 365, 493–505. doi: 10.1136/jfprhc-2012-100379. arXiv: arXiv:1011.1669v3 (2011). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5.Poon AF, Gustafson R, Daly P, et al. Near real-time monitoring of HIV transmission hotspots from routine HIV genotyping: an implementation case study. Lancet HIV 3, e231–e238. doi: 10.1016/S2352-3018(16)00046-1 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6.Gotz HM, van Rooijen MS, Vriens P, et al. Initial evaluation of use of an online partner notification tool for STI, called ‘suggest a test’: a cross sectional pilot study. Sexually Transmitted Infections 90, 195–200. doi: 10.1136/sextrans-2013-051254 (2014). [DOI] [PubMed] [Google Scholar]
- 7.Guttman N & Lev E. Ethical Issues in COVID-19 Communication to Mitigate the Pandemic: Dilemmas and Practical Implications. Health Communication, 1–8. doi: 10.1080/10410236.2020.1847439 (November. 2020). [DOI] [PubMed] [Google Scholar]
- 8.Katz DA, Hogben M, Dooley SW, et al. Increasing Public Health Partner Services for Human Immunodeficiency Virus: Results of a Second National Survey. Sexually Transmitted Diseases 37, 469–475. doi: 10.1097/OLQ.0b013e3181e7104d (August. 2010). [DOI] [PubMed] [Google Scholar]
- 9.Das M, Chu PL, Santos GM, et al. Decreases in Community Viral Load Are Accompanied by Reductions in New HIV Infections in San Francisco. PLoS ONE 5 (ed Carr JK) e11068. doi: 10.1371/journal.pone.0011068 (June 2010). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10.Hogben M, McNally T, McPheeters M, et al. The Effectiveness of HIV Partner Counseling and Referral Services in Increasing Identification of HIV-Positive Individuals. American Journal of Preventive Medicine 33, S89–S100. doi: 10.1016/j.amepre.2007.04.015 (August. 2007). [DOI] [PubMed] [Google Scholar]
- 11.Bbosa N, Ssemwanga D, Nsubuga RN, et al. Phylogeography of HIV-1 suggests that Ugandan fishing communities are a sink for, not a source of, virus from general populations. Scientific Reports 9, 1051. doi: 10.1038/s41598-018-37458-x (December. 2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12.Villandré L, Labbe A, Brenner B, et al. Assessing the role of transmission chains in the spread of HIV-1 among men who have sex with men in Quebec, Canada. PLoS ONE 14, e0213366. doi: 10.1371/journal.pone.0213366 (March. 2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13.Oster AM, France AM, Panneer N, et al. Identifying Clusters of Recent and Rapid HIV Transmission Through Analysis of Molecular Surveillance Data. Journal of Acquired Immune Deficiency Syndromes 79, 543–550. doi: 10.1097/QAI.0000000000001856 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14.Ragonnet-Cronin M, Hu YW, Morris SR, et al. HIV transmission networks among transgender women in Los Angeles County, CA, USA: a phylogenetic analysis of surveillance data. The Lancet HIV 6, e164–e172. doi: 10.1016/S2352-3018(18)30359-X (March. 2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 15.Wertheim JO, Murrell B, Mehta SR, et al. Growth of HIV-1 Molecular Transmission Clusters in New York City. The Journal of Infectious Diseases 218, 1943–1953. doi: 10.1093/infdis/jiy431 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 16.Wertheim JO, Kosakovsky Pond SL, Little SJ, et al. Using HIV Transmission Networks to Investigate Community Effects in HIV Prevention Trials. PLoS ONE 6, e27775. doi: 10.1371/journal.pone.0027775 (2011). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 17.Smith DM, May SJ, Tweeten S, et al. A public health model for the molecular surveillance of HIV transmission in San Diego, California. AIDS 23, 225–232. doi: 10.1097/QAD.0b013e32831d2a81 (January. 2009). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 18.Leitner T & Romero-Severson E. Phylogenetic patterns recover known HIV epidemiological relationships and reveal common transmission of multiple variants. Nature Microbiology 3, 983–988. doi: 10.1038/s41564-018-0204-9 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 19.Kosakovsky Pond SL, Weaver S, Leigh Brown AJ, et al. HIV-TRACE (TRAnsmission Cluster Engine): a Tool for Large Scale Molecular Epidemiology of HIV-1 and Other Rapidly Evolving Pathogens. Molecular Biology and Evolution 35, 1812–1819. doi: 10.1093/molbev/msy016 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 20.Ragonnet-Cronin M, Hodcroft E, Hué S, et al. Automated analysis of phylogenetic clusters. BMC Bioinformatics 14, 317. doi: 10.1186/1471-2105-14-317 (2013). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 21.Prosperi MC, Ciccozzi M, Fanti I, et al. A novel methodology for large-scale phylogeny partition. Nature Communications 2, 321. doi: 10.1038/ncomms1325 (2011). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 22.Moshiri N, Ragonnet-Cronin M, Wertheim JO, et al. FAVITES: simultaneous simulation of transmission networks, phylogenetic trees, and sequences. Bioinformatics, bty921. doi: 10.1093/bioinformatics/bty921. arXiv: 1805.08905 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 23.Balaban M, Moshiri N, Mai U, et al. TreeCluster: clustering biological sequences using phylogenetic trees. PLoS ONE 14, e0221068. doi: 10.1371/journal.pone.0221068 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 24.McLaughlin A, Sereda P, Oliveira N, et al. Detection of HIV transmission hotspots in British Columbia, Canada: A novel framework for the prioritization and allocation of treatment and prevention resources. EBioMedicine 48, 405–413. doi: 10.1016/j.ebiom.2019.09.026 (October. 2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 25.Mai U, Sayyari E & Mirarab S. Minimum variance rooting of phylogenetic trees and implications for species tree reconstruction. PLoS ONE 12. doi: 10.1371/journal.pone.0182238 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 26.Price MN, Dehal PS & Arkin AP. FastTree 2 - Approximately maximum-likelihood trees for large alignments. PLoS ONE 5. doi: 10.1371/journal.pone.0009490 (2010). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 27.Nguyen LT, Schmidt HA, Von Haeseler A, et al. IQ-TREE: A fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Molecular Biology and Evolution 32, 268–274. doi: 10.1093/molbev/msu300 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 28.Kendall MG. A New Measure of Rank Correlation. Biometrika 30, 81–93. doi: 10.1093/biomet/30.1-2.81 (1938). [DOI] [Google Scholar]
- 29.Romero-Severson EO, Bulla I & Leitner T. Phylogenetically resolving epidemiologic linkage. Proceedings of the National Academy of Sciences 113, 2690–2695. doi: 10.1073/pnas.1522930113. arXiv: arXiv:1408.1149 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 30.Barabási AL & Albert R. Emergence of scaling in random networks. Science 286, 509–512. doi: 10.1126/science.286.5439.509. arXiv: 9910332 [cond-mat] (1999). [DOI] [PubMed] [Google Scholar]
- 31.Rothenberg RB, Potterat JJ, Woodhouse DE, et al. Social network dynamics and HIV transmission. AIDS 12, 1529–1536. doi: 10.1097/00002030-199812000-00016 (1998). [DOI] [PubMed] [Google Scholar]
- 32.Erdős P & Rényi A. On Random Graphs I. Publicationes Mathematicae Debrecen 6, 290–297 (1959). [Google Scholar]
- 33.Rosenberg ES, Sullivan PS, Dinenno EA, et al. Number of casual male sexual partners and associated factors among men who have sex with men: Results from the National HIV Behavioral Surveillance system. BMC Public Health 11. doi: 10.1186/1471-2458-11-189 (2011). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 34.Granich RM, Gilks CF, Dye C, et al. Universal voluntary HIV testing with immediate antiretroviral therapy as a strategy for elimination of HIV transmission: a mathematical model. The Lancet 373, 48–57. doi: 10.1016/S0140-6736(08)61697-9 (January. 2009). [DOI] [PubMed] [Google Scholar]
- 35.Ratmann O, Hodcroft EB, Pickles M, et al. Phylogenetic tools for generalized HIV-1 epidemics: Findings from the PANGEA-HIV methods comparison. Molecular Biology and Evolution 34, 185–203. doi: 10.1093/molbev/msw217 (January. 2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 36.Le Gat Y Recurrent Event Modeling Based on the Yule Process, Volume 2 1–128 (ISTE Ltd, London, 2016). [Google Scholar]
- 37.Tavaré S Some Probabilistic and Statistical Problems in the Analysis of DNA Sequences. Lectures on Mathematics in the Life Sciences 17, 57–86 (1986). [Google Scholar]
- 38.Moshiri N TreeSwift: A massively scalable Python tree package. SoftwareX 11, 100436. doi: 10.1016/j.softx.2020.100436 (January. 2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 39.Tamura K & Nei M. Estimation of the number of nucleotide substitutions in the control region of mitochondrial DNA in humans and chimpanzees. Molecular Biology and Evolution 10, 512–526. doi: 10.1093/oxfordjournals.molbev.a040023 (1993). [DOI] [PubMed] [Google Scholar]
- 40.Vasylyeva TI, Liulchuk M, Friedman SR, et al. Molecular epidemiology reveals the role of war in the spread of HIV in Ukraine. Proceedings of the National Academy of Sciences 115, 1051–1056. doi: 10.1073/pnas.1701447115 (January. 2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 41.Mellors JW, Rinaldo CR, Gupta P, et al. Prognosis in HIV-1 infection predicted by the quantity of virus in plasma. Science 272, 1167–1170. doi: 10.1126/science.272.5265.1167 (May 1996). [DOI] [PubMed] [Google Scholar]
Associated Data
This section collects any data citations, data availability statements, or supplementary materials included in this article.
