PLOS Computational Biology. 2022 Oct 31;18(10):e1010624. doi: 10.1371/journal.pcbi.1010624

Optimal sequence-based design for multi-antigen HIV-1 vaccines using minimally distant antigens

Eric Lewitus 1,2, Jennifer Hoang 1,2, Yifan Li 1,2, Hongjun Bai 1,2, Morgane Rolland 1,2,*
Editor: Rob J De Boer3
PMCID: PMC9621458  PMID: 36315492

Abstract

The immense global diversity of HIV-1 is a significant obstacle to developing a safe and effective vaccine. We recently showed that infections established with multiple founder variants are associated with the development of neutralization breadth years later. We propose a novel vaccine design strategy that integrates the variability observed in acute HIV-1 infections with multiple founder variants. We developed a probabilistic model to simulate this variability, yielding a set of sequences that present the minimal diversity seen in an infection with multiple founders. We applied this model to a subtype C consensus sequence for the Envelope (Env) (used as input) and showed that the simulated Env sequences mimic the mutational landscape of an infection with multiple founder variants, including diversity at antibody epitopes. The derived set of multi-founder-variant-like, minimally distant antigens is designed to be used as a vaccine cocktail specific to an HIV-1 subtype or circulating recombinant form and is expected to promote the development of broadly neutralizing antibodies.

Author summary

Diverse HIV-1 populations are generally thought to promote neutralizing responses. Current leading HIV-1 vaccine design strategies maximize the distance between antigens to attempt to cover global HIV-1 diversity or serialize immunizations to recapitulate the temporal evolution of HIV-1 during infection. To date, no vaccine has elicited broadly neutralizing antibodies. As we recently demonstrated that infection with multiple HIV-1 founder variants is predictive of neutralization breadth, we propose a novel strategy that endeavors to promote the development of broadly neutralizing antibodies by replicating the diversity of multi-founder variant acute infections. By training an HIV-1 Env consensus sequence on the diversity from acute infections with multiple founders, we derived in silico a set of minimally distant antigens that is representative of the diversity seen in a multi-founder acute infection. As the model is particular to the input sequence, it can produce antigens specific to any HIV-1 subtype or circulating recombinant form (CRF). We applied this to HIV-1 subtype C and obtained a set of minimally distant antigens that can be used as a vaccine cocktail.

Introduction

The diversification of HIV-1 is mediated by low replication fidelity, large population size, high recombination rates [1–3] and escape from immune pressures, including that exerted by neutralizing antibodies (nAbs) [4,5]. In the first months of infection, nAbs direct their response to recognize Envelope (Env) targets in autologous viruses [6] before heterologous targets are recognized typically a couple of years later [7,8]. While most individuals living with HIV-1 develop nAbs, only 10–25% of individuals elicit nAbs that can neutralize >70% of a diverse virus panel [9,10]. The maturation time needed to induce broadly nAbs (bnAbs) indicates that it is a complex, multi-factorial process. So far, this process has not been reproduced by vaccine candidates. Several viral factors have been associated with the development of bnAbs, including viral load, viral subtype, CD4+ T cell count, infection duration and viral diversity [11–20].

Theoretical and empirical data show that increased Env diversity in acute or early infection may prime the immune system to develop bnAbs. A modeling study by Luo and Perelson showed that broadly neutralizing responses could emerge earlier in infections founded by multiple strains as bnAbs were less likely to develop after infection with single variants due to competitive exclusion by the autologous antibody response [21]. Relatedly, the large increase in diversity that occurs following a super-infection has been linked to the subsequent development of bnAbs in these individuals [22–24]. We recently analyzed 3,482 HIV-1 Env sequences sampled from 70 people living with HIV-1 (PLWH) who were diagnosed in acute infection [25–27] and compared the sequence data to neutralization breadth measurements performed on samples collected between six months and four years after infection [28]. Participants had been enrolled in the prospective HIV-1 acute infection cohort RV217 [29]. In this cohort, more than 3,000 seronegative individuals from four countries (Kenya, Tanzania, Thailand and Uganda) were tested twice-weekly for HIV-1 RNA and 155 acute infections were identified. RV217 PLWH were followed for up to five years after viremia was detected. We showed that individuals with infections established with multiple HIV-1 founder viruses were more likely to develop neutralization breadth than those with infections established with single founder viruses [27]. This finding was reproduced in the cohort of placebo recipients who were infected during the RV144 vaccine efficacy trial [27,30,31]. We interpret recent data from small cohorts of infants living with HIV-1 as also supporting this relationship between multi-variant infections and the development of breadth. Infections with more diverse viral populations were associated with the development of neutralization breadth [32] and the proportion of infections with multiple founders was high among infants (7/12) in a different small cohort [33].

Infections with multiple HIV-1 founder variants account for approximately 25% of infections [25,34–37]. These multi-founder infections occur when multiple sequences are transmitted from a single transmitter who was likely in the chronic phase of their infection. Hence, all the sequences in the recipient are closely related, phylogenetically linked, and show around 1% intra-participant diversity (i.e., minimally distant). We previously showed that infections with multiple founder variants have clinically relevant features as viral load set point was 0.3 log10 higher in infections with multiple founders when compared to single founder infections [36]. We also recently showed that higher engagement of B cells in the first months of infection was associated with the development of bnAbs years later [28]. This finding, together with the fact that infection with multiple HIV-1 founder variants was predictive of the development of neutralization breadth [27], emphasizes that the early events of HIV-1 infection play a critical role in the ontogeny of bnAbs. This led us to propose that a vaccine constituted by multi-founder-like, minimally distant antigens (differing by ~1% in Env) corresponding to the variability we observed in infections with multiple founder variants could initiate the induction of bnAbs. We hypothesize that a set of antigens that show minimal differences and differ primarily at surface sites, including sites that correspond to bnAb epitopes, would promote the maturation of neutralizing responses through toggle responses between variant epitopes.

An HIV-1 vaccine that elicits bnAbs is vital to prevent infection from circulating viruses. Serial administration of the bnAb VRC01 prevented HIV-1 acquisition, albeit blocking only the small fraction of circulating viruses that were highly sensitive to the bnAb (IC80 <1 μg/mL) (corresponding to 28 of 162 infections) [38]. These results emphasize the enormous challenge associated with the development of a protective HIV-1 vaccine and highlight the need for new strategies to develop vaccine candidates that could elicit neutralization breadth. Here we present a probabilistic model for simulating Env alignments that resemble a set of minimally distant antigens representative of the variability seen in multi-founder acute infections. We showed that the model recapitulated features of multi-founder variant infections, including divergence or pairwise distances across sequences and diversity at key Env antibody epitopes. Using this probabilistic model, we derived minimally distant antigens (differing by ~1%) for subtype C Env sequences and selected five sequences that can be used as a vaccine cocktail.

Results

A probabilistic model designed to simulate sequence alignments

The model was based on two training alignments, F1 and F2. F1 corresponded to all the sequences descended from the major founder variant, i.e. the major founder lineage in a participant infected with multiple founder variants. F2 grouped all sequences descended from minor founder variants in that participant (Fig 1A). Infections with multiple founder variants can be established with two founders that are clearly delineated; however, rare lineages and recombinant forms (based on the extant or unsampled founder variants) are often identified, sometimes as singletons. While the model can be adapted to n founder lineages, for simplicity, here we considered the major founder lineage F1 and grouped the other sequences in that participant (all closely phylogenetically related) as representing F2. Hence, a group is not necessarily comprised of sequences descending from a unique founder variant but represents a genetically differentiable cluster with a divergent evolutionary pattern; identifying and distinguishing such clusters allows us to recapitulate the diversity of acute infections with multiple founders without biasing towards one founder variant population. The percentage of non-consensus residues at each site (πj) was calculated for F1 and F2 separately, giving a mutational landscape for each founder lineage (major and minor viral population) (Fig 1B). Next, a transition probability matrix, Θ, was computed according to the procedure described by Le & Gascuel (LG matrix) [39] for the overall rate to define the probability of each amino acid (or gap) transitioning into any other amino acid (or gap) (Fig 1C); the empirical transition probabilities were computed on an alignment of 172 subtype C Env sequences sampled since 2011 from the LANL HIV Database. Finally, a single sequence, aligned to the training alignment reference frame, was required to seed the model (Fig 1D). The single seeding sequence corresponds to the design target and can be any sequence for which a set of multi-founder-like antigens is to be derived, e.g. the consensus (or most recent common ancestor) from a set of sequences corresponding to a particular participant or to a specific HIV-1 subtype or Circulating Recombinant Form (CRF). At each iteration, n, sequence simulation was trained on either F1 or F2, such that site j of the seeding sequence had a probability of mutating proportional to πj to a new residue defined by Θj, where π was drawn from either F1 or F2 (Fig 1E). The model then outputs simulated sequences in an alignment (Fig 1F). Here we showed equal numbers of sequences derived from each founder lineage model; however, the proportion of simulated sequences trained on each founder lineage alignment can be specified. Instead of considering the variability in a given participant, the model can also rely on a pooled dataset of major vs. minor founder lineages from multiple participants. In that case, at each site, the mutational pattern can be that of any individual in the pooled set (to retain within-host diversity characteristics and forbid inter-host transitions).
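The simulation step described above can be illustrated with a minimal Python sketch. It is not the published implementation: it assumes that the mutation probability at site j equals πj (the text only states that it is proportional to πj), it represents Θ as a nested dictionary, and the function names and the `uniform_theta` placeholder (standing in for the empirical LG-style matrix) are hypothetical.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY-"  # 20 residues plus the gap character


def uniform_theta():
    """Placeholder transition matrix (assumption): every change equally likely.

    In the paper, Theta is an empirical matrix estimated from subtype C Env
    sequences; a uniform matrix is used here only to keep the sketch runnable.
    """
    return {a: {b: 1.0 for b in AMINO_ACIDS if b != a} for a in AMINO_ACIDS}


def site_nonconsensus_fraction(alignment):
    """Return pi_j, the fraction of non-consensus residues in each column."""
    pi = []
    for column in zip(*alignment):
        # Ties are broken alphabetically so the result is deterministic.
        consensus = max(sorted(set(column)), key=column.count)
        pi.append(sum(res != consensus for res in column) / len(column))
    return pi


def simulate_sequence(seed, pi, theta, rng):
    """Mutate site j of the seed with probability pi[j] (assumed equal to pi_j)."""
    out = []
    for j, res in enumerate(seed):
        if rng.random() < pi[j]:
            targets = [a for a in theta[res] if a != res]
            weights = [theta[res][a] for a in targets]
            res = rng.choices(targets, weights=weights, k=1)[0]
        out.append(res)
    return "".join(out)


def simulate_alignment(seed, f1, f2, theta, n, rng=None, f1_share=0.5):
    """Simulate n sequences: a share trained on F1, the remainder on F2."""
    if rng is None:
        rng = random.Random(0)
    pi_f1 = site_nonconsensus_fraction(f1)
    pi_f2 = site_nonconsensus_fraction(f2)
    n_f1 = round(n * f1_share)
    sims = [simulate_sequence(seed, pi_f1, theta, rng) for _ in range(n_f1)]
    sims += [simulate_sequence(seed, pi_f2, theta, rng) for _ in range(n - n_f1)]
    return sims
```

Because mutations are only introduced where the training alignment is variable, the simulated sequences stay anchored to the seeding sequence; the `f1_share` argument mirrors the statement that the proportion of sequences trained on each lineage can be specified.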

Fig 1. Protocol for simulating alignments with multiple founder lineages.

(A) Alignments of the major (F1) and minor (F2) founder lineages from a participant infected with multiple founder variants are used to train the algorithm; this can also be done by pooling a set of participants in each training alignment. (B) The percentage of non-consensus amino acids is calculated at each site separately for the training alignments. (C) A transition probability matrix is calculated based on a set of empirical HIV-1 sequences. (D) A seeding sequence is input to seed the sequence simulation. (E) For each site, a residue is simulated from a multinomial probability distribution defined by the transition probability matrix and percentage of non-consensus amino acids at that site based on one of the founder variants, with an equal probability of sampling from each variant model. (F) The simulated sequences are output.

Simulated sequences reproduced the variability of acute infections with multiple founder variants

We simulated sequences trained on alignments corresponding to the major (F1) and minor (F2) founder lineages sampled from six RV217 participants who had infections established with multiple founder variants and whose plasma, years later, neutralized >70% of viruses on a 34-virus panel [27,28] (Table 1). Sequences were obtained between 4 and 34 days post-diagnosis. Highlighter plots (S1 Fig) and tree topologies (S2 Fig) indicated that each infection was established with multiple founder variants (Fig 2A and 2B). For the major founder lineage F1, the mean percentage of non-consensus residues per site across the six participants was 0.06% (max = 10%) and median pairwise amino acid diversity was 0.001 (min = 0, max = 0.005). For the minor founder lineage F2, mean percentage of non-consensus residues per site (1.11%, max = 50%) and median diversity (0.019, min = 0.006, max = 0.034) were both significantly higher than for F1 (Mann-Whitney U test, P<0.015). An empirical transition probability matrix, Θ, was computed on an alignment of 172 subtype C Env sequences sampled since 2011 and simulations were seeded with the consensus derived from sequences from an independent RV217 participant (id = 10066). One thousand sequences were simulated and trained in equal proportions on major and minor founder lineage alignments for each participant separately, where the percentage of non-consensus residues at each site, πj, was defined by either the major founder lineages, F1(πj), or minor founder lineages, F2(πj). The percentage of non-consensus residues in simulated sequences at each site was highly correlated (median R² = 0.99, min = 0.98, max = 1.00) with the percentage in the training alignment for each founder lineage (Fig 2C and 2D). One thousand sequences were also simulated while trained on a pool of all founder lineage alignments across participants (Fig 2E), where F1(πj) was the maximum percentage of non-consensus residues found at site j in a given individual across all F1 alignments and F2(πj) was the maximum percentage of non-consensus residues found at site j in a given individual across all F2 alignments. The 95% confidence intervals of median pairwise diversity in simulated sequences included median pairwise diversity of training alignments for sequences simulated on each participant and for the pooled set of participants (Fig 2F). Similarly, the 95% confidence intervals of the median number of polymorphic sites per simulated sequence as well as the number of polymorphisms per sequence at CD4bs, V1-V2, V3 and MPER contact sites included the median number for training alignments for sequences simulated on each participant (Fig 2G).
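The pooled definition of πj (the per-site maximum across participants) and the median pairwise diversity used to compare simulated and training alignments can be sketched as follows. This is an illustration only: the distance measure is assumed to be an uncorrected p-distance that skips gapped positions, which this section does not specify, and the function names are hypothetical.

```python
from itertools import combinations
from statistics import median


def pooled_pi(per_participant_pi):
    """Per-site maximum of pi_j across the participants' alignments of one lineage."""
    return [max(site_values) for site_values in zip(*per_participant_pi)]


def p_distance(a, b):
    """Fraction of differing sites among columns where neither sequence has a gap.

    Assumption: an uncorrected p-distance; the paper may use another measure.
    """
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        return 0.0
    return sum(x != y for x, y in compared) / len(compared)


def median_pairwise_diversity(alignment):
    """Median p-distance over all pairs of sequences (needs at least two sequences)."""
    return median(p_distance(a, b) for a, b in combinations(alignment, 2))
```

Taking the per-site maximum keeps, at every site, a mutational pattern that was actually observed within a single host, consistent with the stated aim of retaining within-host diversity characteristics.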

Table 1. RV217 infections with multiple founder variants used as training alignments.

The RV217 participant ID, Env subtype/CRF, number of sequences sampled from each founder lineage, the days post-diagnosis that sequences were sampled, and the peak neutralization breadth reached by the participant within three years of diagnosis are reported.

ID      Subtype    Founder lineage 1 (n sequences)   Founder lineage 2 (n sequences)   Days post-diagnosis   Peak neutralization breadth (%)
10220   A1         7                                 13                                15, 31                74
30124   A1         4                                 14                                4, 32                 82
20337   C          9                                 9                                 7, 34                 79
40123   CRF01_AE   8                                 13                                7, 29                 82
40363   CRF01_AE   10                                10                                7, 28                 88
40436   CRF01_AE   5                                 14                                4, 28                 77

Fig 2. Simulated sequences reproduced the variability of acute HIV-1 infections with multiple founder variants.

(A) A highlighter plot and (B) phylogeny for Env sequences sampled at 7 and 28 days post-diagnosis from an RV217 participant (id = 40363) with an infection with multiple founder variants. (C) The percentage of non-consensus residues at each Env site in sequences simulated from the major founder variant or major lineage 1 (top, blue line) and the minor founder variants grouped here as lineage 2 (bottom, blue line) after seeding with the consensus sequence from an independent acutely-infected RV217 participant (id = 10066); values for sequences belonging to the major and minor founder lineages in 40363 are shown in open and filled pink circles, respectively. (D) Regression plots of the percentage of non-consensus residues in the training alignment as a function of non-consensus residues in the simulated alignment for (top, blue fill and pink border) founder lineage 1 and (bottom, blue border and pink fill) founder lineage 2. (E) Phylogeny of sequences sampled at 4–34 days post-diagnosis from 6 RV217 participants with infections with multiple founder variants. Tips are colored to represent the population corresponding to the major (open circles) and minor (closed circles) founder populations for each participant (for simplicity, multiple founder variants or singleton sequences are grouped under the minor lineage). (F) For sequences simulated under each training alignment (see panel E), the pairwise diversity of the training alignment (pink) and of the sequences simulated under that training alignment (blue); and the pairwise diversity of sequences simulated under the pooled alignment (blue). Solid lines represent 25% and 75% interquartile ranges. (G) Barplots of the number of polymorphic sites per sequence in sequences simulated under each training alignment and the pooled-participants training alignment (blue) at all sites, CD4bs, V1-V2 contact sites, V3 contact sites, and MPER sites. Dashed whiskers indicate maximum values. Pink dots represent median values for training alignments.

The model was designed to simulate sequences that are specific to the seeding sequence rather than mimic the composition of the training alignment. Therefore, the simulated alignment should be genetically closer to the seeding sequence than to the training alignment. Indeed, the percentage of mismatched ungapped sites between the consensus of simulated sequences and seeding sequence (0%) was significantly lower (Mann-Whitney U test, P = 0.001) than between the consensus of simulated and consensus of the training sequences (8.27–16.12%) for sequences simulated with each training alignment (S3 Fig).
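The comparison of consensus sequences reported above can be expressed as a short sketch. The functions below are illustrative rather than the authors' code: `consensus` breaks ties alphabetically (an assumption), and sites where either sequence carries a gap are excluded to match the "ungapped sites" wording.

```python
from collections import Counter


def consensus(alignment):
    """Most frequent residue per column; ties broken alphabetically (assumption)."""
    out = []
    for column in zip(*alignment):
        counts = Counter(column)
        top = max(counts.values())
        out.append(min(res for res, c in counts.items() if c == top))
    return "".join(out)


def percent_mismatch_ungapped(a, b):
    """Percentage of mismatched sites among columns ungapped in both sequences."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x != y for x, y in pairs) / len(pairs)
```

Under this sketch, a value of 0% between the consensus of the simulated sequences and the seeding sequence means that no site mutated in a majority of simulated sequences.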

Simulated sequences replicated the diversity found in infections with multiple founder variants

We compared diversity and divergence estimates for sequences simulated under the pooled set of major and minor founder lineage alignments to Env sequences sampled during acute infection from RV217 participants infected with either a single founder variant (and who developed <35% neutralization breadth) (n = 12) or multiple founder variants (and who developed >70% neutralization breadth) (n = 6). The 95% confidence intervals of diversity and divergence estimates for simulated sequences included the median values for Env alignments of multi-founder variant acute infections and were higher than those of infections with single founders (Fig 3A–3D).

Fig 3. Simulated sequences replicated the diversity found in infections with multiple founder variants.

Violin plots of (A) median pairwise diversity, (B) maximum pairwise diversity, (C) median divergence from the consensus, and (D) maximum divergence from the consensus for sequences sampled from RV217 participants during acute infection with a single founder variant (n = 53, grey), sequences simulated under the pooled-participants probabilistic model (n = 5000, blue), and sequences sampled from RV217 participants during acute infection with multiple founder variants (n = 6, pink). (E) Phylogenies constructed with sequences from a participant with a single founder variant sampled at 1 and 29 days post-diagnosis (id = 10066), a sample of simulated sequences from the pooled-participants model (five from major founder lineage 1 and five from minor founder lineage 2, blue), and sequences from a participant with multiple founder variants sampled at 7 and 28 days post-diagnosis (id = 40363) (pink). Phylogenies are shown at the same scale; the dashed box shows the top phylogeny at ten-times magnification. Violin plots of spectral density profile summary statistics λ* (F), η (G) and ln-transformed ψ (H) for the same groups. Median values are shown above each plot.

Similarly, phylogenies of simulated sequences were more similar to phylogenies of infections with multiple founder variants than to those with single founder variants (Fig 3E). Spectral density profiles of the modified graph Laplacian [40] were computed for Env alignments from infections with single and multiple founder variants in RV217 and for down-sampled alignments from sequences simulated under the pooled set of major and minor founder lineages alignment. Spectral density profile summary statistics each capture a unique aspect of phylogenetic topology: λ* is proportional to non-synonymous/synonymous rates, η is inversely proportional to transition-transversion rate ratio, and ψ is proportional to rate heterogeneity [41]. The simulated alignment was downsampled 100 times for 5 sequences simulated under each of the founder lineage training alignments, F1 and F2. The 95% confidence intervals of λ*, η, and ln-transformed ψ for simulated sequences included the median values for Env alignments of acute infections with multiple founder variants but none included the median value for infections with a single founder variant (Fig 3F–3H).

In silico-derived antigenic sequences for HIV-1 Env subtype C

A consensus sequence was generated from 172 subtype C Env sequences sampled after 2010 (Fig 4A). One thousand sequences were simulated under the pooled founders training alignment and seeded with the subtype C Env consensus sequence derived from the subtype C alignment. One hundred samples of ten sequences (five simulated under F1 and five under F2) were analyzed. The sampled sequences produced phylogenies with bimodal or multimodal topologies (Fig 4B). The median pairwise distance of subtype C Env sequences sampled after 2010 to the consensus was 0.160 [IQR = 0.149–0.172], while the mean of median pairwise distance to the subtype C consensus across samples of simulated sequences was 0.015 [IQR = 0.013–0.016] (Fig 4C). Across the subtype C alignment, a mean of 15.8% of the residues at a site were non-consensus residues and 79.1% of sites (681/861) were polymorphic, whereas across samples of simulated sequences an average (mean of median) of 3.08% [IQR = 2.93–3.16%] of residues at a site were non-consensus and 17.3% [IQR = 15.2–16%] (148/856) of sites were polymorphic (Fig 4D). B-cell epitopes and exposed sites were predicted for the subtype C consensus. Across samples of simulated sequences, an average of 12.3% [IQR = 11.7–13.2%] of sites (26.73/213) with >50% epitope probability were polymorphic and 16.2% [IQR = 15.0–16.9%] of sites (42.65/260) at predicted exposed sites were polymorphic (Fig 4D). Finally, all samples of simulated sequences were polymorphic at one or more Ab epitope sites. Five Abs corresponding to critical Env targets for neutralization were considered: VRC01 (CD4bs) [42], CAP256-VRC26.25 (V2 apex) [43], PGT121 (V3) [44], 10E8 (MPER) [45], and 35O22 (interface between gp120 and gp41) [46]. At VRC01 epitope sites (n = 36), a median of 12 sites were polymorphic per sample of simulated sequences and 17 sites were polymorphic in at least one sample; at CAP256-VRC26.25 epitope sites (n = 6), 1 site was polymorphic per sample and 4 were polymorphic in at least one sample; at PGT121 sites (n = 13), 2 and 8; at 10E8 sites (n = 10), 2 and 2; and at 35O22 sites (n = 12), 2 and 6 (Fig 4E).
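The downsampling and counting of polymorphic epitope sites described above can be sketched in a few lines. The sketch makes several assumptions: site indices are 0-based positions in the alignment, a gap counts as a distinct state, and the epitope site list passed in is a placeholder rather than the actual contact sites used in the paper.

```python
import random


def polymorphic_sites(alignment, sites=None):
    """Return the column indices that carry more than one distinct state.

    If `sites` is given (e.g. a hypothetical list of epitope positions),
    only those columns are examined.
    """
    columns = range(len(alignment[0])) if sites is None else sites
    return [j for j in columns if len({seq[j] for seq in alignment}) > 1]


def downsample(f1_sims, f2_sims, k, rng):
    """Draw k simulated sequences from each founder lineage model without replacement."""
    return rng.sample(f1_sims, k) + rng.sample(f2_sims, k)
```

Repeating `downsample` 100 times with k = 5 and recording `polymorphic_sites` for each antibody's site list would give the per-sample counts and the set of sites polymorphic in at least one sample.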

Fig 4. Sequences simulated from the HIV-1 subtype C Env consensus.

(A) Phylogeny of subtype C Env sequences sampled after 2010. The consensus is marked in purple. (B) Four phylogenies constructed from sequences simulated under the pooled-participants model seeded with the subtype C consensus; each phylogeny is comprised of the subtype C consensus (purple) and five sequences randomly selected from simulations under major founder lineage 1 and five under minor founder lineage 2. (C) Density plots of divergence from the consensus for the alignment corresponding to each intra-host phylogeny of simulated sequences (blue) and for the inter-host subtype C phylogeny (purple). (D) Barplot of the percentage of non-consensus residues at each site in the simulated alignment. Epitope prediction probability for the subtype C consensus across sites (solid grey line). Dashes indicate the subtype C consensus predicted exposed residues and predicted B-cell epitopes that are polymorphic in the simulated alignment. (E) Polymorphisms (blue squares) at contact sites for five representative antibodies in 100 downsampled simulated alignments comprised of five sequences simulated under each founder variant lineage.

Finally, an alignment of simulated sequences was constructed for five Ab epitopes representative of key Env targets (VRC01, CAP256-VRC26.25, PGT121, 10E8, 35O22). There were 166 sequences with non-consensus residues in at least three of the five Ab epitopes (Fig 5A–5F). From these, two candidate sequences generated by F1 and two by F2 that were maximally divergent (within the framework of minimal diversity) were selected as candidate antigens (Fig 5G and S1 File). Together with the subtype C consensus, the candidate sequences had 20 polymorphic sites with 2–5 different residues per polymorphic site (Fig 5H). We predicted the structure of the simulated sequences using AlphaFold2 [47]. Predicted local distance difference tests (pLDDTs) showed generally good confidence across all domains, with median pLDDTs between 86.25 and 88.44, which was comparable to the median pLDDT of the subtype C Env consensus (87.77) that was used to seed the simulations. The sequences generated by F1 were more similar to the seed sequence than those generated by F2 (Fig 5I), as expected, and the protein structure prediction indicated that all simulated sequences should fold to a structured protein.
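One way to picture the candidate selection is a two-step filter-then-pick procedure, sketched below. The selection algorithm is not specified in this section, so the greedy farthest-point rule, the Hamming distance, and all function names are assumptions made purely for illustration; the epitope dictionary maps hypothetical antibody names to 0-based site indices.

```python
from itertools import combinations


def epitopes_hit(seq, consensus, epitopes):
    """Count epitopes in which seq has at least one non-consensus residue."""
    return sum(
        any(seq[j] != consensus[j] for j in sites) for sites in epitopes.values()
    )


def hamming(a, b):
    """Number of differing positions between two aligned sequences."""
    return sum(x != y for x, y in zip(a, b))


def pick_divergent(candidates, k):
    """Greedy farthest-point choice of k mutually distant sequences (assumption)."""
    candidates = list(dict.fromkeys(candidates))  # drop duplicates, keep order
    if len(candidates) <= k:
        return candidates
    # Start from the most distant pair, then add the sequence farthest from the set.
    chosen = list(max(combinations(candidates, 2), key=lambda p: hamming(*p)))
    while len(chosen) < k:
        rest = [c for c in candidates if c not in chosen]
        chosen.append(max(rest, key=lambda c: min(hamming(c, s) for s in chosen)))
    return chosen[:k]
```

In this sketch, sequences with `epitopes_hit(...) >= 3` would be retained first, and `pick_divergent(..., 2)` would then be applied separately to the F1-derived and F2-derived sequences to obtain two candidates from each.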

Fig 5. In silico-derived sequences for an HIV-1 subtype C vaccine cocktail.

Line plots of the number of non-consensus residues per simulated sequence are represented for F1 (dashed blue line) and F2 (blue line) at epitope sites for five representative antibodies: (A) VRC01, (B) CAP256-VRC26.25, (C) PGT121, (D) 10E8, and (E) 35O22. (F) Divergence of simulated sequences from the subtype C consensus based on the five Ab epitope sites (n = 77). In A-F, sequences are sorted along the x-axis by their y-axis value; the black line traces the value across sequences. (G) Phylogeny of simulated subtype C Env sequences with tips corresponding to the subtype C consensus (purple) and candidate antigen sequences generated by F1 (blue) and F2 (blue). (H) Multiple sequence alignment of antibody epitope sites for the subtype C consensus and candidate antigen sequences; non-consensus residues are shown in larger font and regions corresponding to the epitopes of VRC01, CAP256-VRC26.25, PGT121, 10E8, and 35O22 sites are highlighted below the alignment; a logo plot above the 5 candidate sequences represents the diversity found in circulating subtype C sequences at the 5 representative antibody epitopes. (I) Sequence differences from the consensus to candidate antigen sequences mapped on the predicted structure of the consensus C sequence. The Cα atoms of similar, different, and insertion/deletion (indel) sites are shown as yellow, red and black spheres, respectively.

Discussion

While bnAb infusions in humans can prevent HIV-1 infection [38], no vaccine candidate has shown the induction of such bnAbs in a vaccine efficacy trial [48–54], emphasizing the need for novel vaccine strategies. Here we developed a new vaccine design approach that emulates the diversity observed in HIV-1 infections with multiple founder variants. This multi-founder-like vaccine design derives from the finding that individuals with infections established with multiple founder variants were more likely to develop bnAbs than individuals with infections established with single founder variants [27]. We showed that our design strategy reproduced the diversity seen in infections with multiple founder variants and we applied this approach to design a subtype C-specific vaccine candidate constituted of a set of five minimally distant Env sequences centered on an updated subtype C consensus.

First, we developed a probabilistic method to design antigens that reflect the diversity seen in acute infections with multiple founder variants. This method was trained on sequences descended from major and minor founder lineages sampled during acute infection from one of six individuals who developed neutralization breadth against HIV-1 or on a pooled alignment of sequences from all six individuals. When seeded with an independent acute infection sequence, our method generated a set of antigens that recapitulated the variability of infections with multiple founder variants, including polymorphisms at critical Env target epitopes (CD4bs, V1-V2, V3 and MPER). Importantly, the derived sequences remained close to the seeding sequence, suggesting that the simulated sequences would preserve the structural integrity of the chosen central sequence. By training on alignments sampled from different infections with multiple founder variants, we showed that the variability patterns of simulated sequences differed according to the training alignment. However, for all training alignments, our method generated sequences that reflected a median genetic distance of ~1%, conforming to the infections with multiple founder variants that we want to emulate in order to elicit bnAbs.

Second, we applied this method to design a subtype C-specific vaccine candidate. Seeded with the Env subtype C consensus (derived from 172 independent subtype C Env sequences sampled since 2011), we generated a set of minimally distant antigens that preserved the composition of subtype C sequences while recapitulating the variability of infections with multiple founder variants. The divergence from the consensus of simulated sequences was an order of magnitude smaller than that of independent subtype C sequences and so met the criteria of representing diversity around the seeding sequence while reducing the total inter-host genetic space. We selected four candidate antigens that exemplified diversity at epitope sites for representative bnAbs. However, alternative selection criteria could be used to design antigens with features that could promote other types of immune responses, such as Fc effector functions [55,56]. An important limitation of this method is its reliance on alignments from only six individuals to simulate multi-founder-like sequences. While each individual developed bnAbs, precise mechanisms behind the association between multi-founder variant diversity and the development of neutralization breadth are unknown. We tried to overcome this limitation by pooling founder lineage alignments to capture the scope of diversity found in acute infection in these broad neutralizers rather than relying on one individual to define a prototypical infection with multiple founder variants. While we chose to use a subtype C consensus as seeding sequence in order to design a subtype C vaccine candidate (because over half of the people living with HIV-1 live with subtype C), only one of the six infections we used in our pooled founder lineage alignment corresponded to subtype C; the others were A1 (n = 2) and CRF01_AE (n = 3). It has not been reported that patterns of variability in acute infections with multiple founders differ by subtype; nonetheless, it is possible that subtype C-specific patterns exist and would not necessarily be captured by our approach. Additionally, we separated sequences into two founder lineages, major and minor. This does not account for the complexity seen in some infections with multiple founder variants, which can include recombinants between extant and/or unsampled sequences and rare or even unique sequences. Hence, our model simplified the landscape of acute viral diversity in these individuals. Nonetheless, if datasets of increased depth are available, the model can be used to simulate the multitude of distinct sequences that can be identified in a set of hundreds of Env sequences from acute HIV-1 infection.

In silico methods for vaccine design have gained a foothold over the last decade. Computational approaches for updating vaccines against Influenza have proposed models for predicting antigenic diversity over time, including multivariate regression on physicochemical properties of circulating variants [57], phylogenetic weighting of antigenic evolution [58–60], and dynamic fitness models of antigenic alleles [61]. In silico vaccine design methods have been used for HIV-1 for over two decades to overcome the obstacle posed by HIV-1’s extreme diversity [62–64]. There is more diversity within each HIV-1 subtype or CRF than what can be seen across a viral species [65]. Gaschen and colleagues showed that a centralized sequence, consensus or ancestral, would better represent HIV-1 than any sequence derived from a PLWH [63,66]. To cope with HIV-1 diversity, some designs, such as the mosaic approach, seek to integrate a fraction of the diversity seen in HIV-1 sequences [67–69]. These variability-inclusive strategies are reminiscent of the diversity seen in super-infections, which have previously been associated with the development of neutralization breadth. As such, mosaic antigens are designed to be maximally distant to cover a large fraction of circulating viruses. The rationale is that immunizations with these diverse mosaic inserts, for example corresponding to consensus sequences for group M, subtype B and subtype C, could lead to the development of antibody responses against these distant viruses, thereby potentiating broadly cross-reactive responses. Two vaccine efficacy trials are testing the Mosaic design (one reached futility criteria in 2021: https://www.jnj.com/johnson-johnson-and-global-partners-announce-results-from-phase-2b-imbokodo-hiv-vaccine-clinical-trial-in-young-women-in-sub-saharan-africa). An opposite strategy to Mosaic designs was to focus on only the most conserved elements of HIV-1 [70–73]. This stemmed from the realization that variable segments of HIV-1 functioned as decoys eliciting immune responses that were not optimal and that only a small fraction of HIV-1 diversity could be integrated in a vaccine candidate of practical size. Another currently leading strategy, the germline targeting approach, seeks to improve on the longitudinal process seen in infected individuals who later developed breadth [74]: it reproduces the directional process that leads to breadth in a minority of individuals by using antigens that correspond to stepwise stages of the co-evolution between the virus and the neutralizing response. Our approach is also based on a process seen in natural infections, whereby infections with multiple founder variants were linked to the subsequent development of neutralization breadth. This multi-founder-like design of minimally distant antigens is also akin to the conserved elements vaccine design rationale. We consider that the ‘noisification’ of a central consensus sequence at target sites for key antibodies will trigger responses to these antibody epitopes and that the toggle between these minimally distant epitopes will promote a desirable affinity maturation process leading to the development of bnAbs. The fact that our vaccine design was derived from the variability seen in multi-founder acute infections suggests that this strategy with a cocktail of minimally distant antigens may be best suited as a priming immunization. Whether subsequent immunizations should consist of the same set of antigens or a subset of them, or one or more distinct antigens, will need to be evaluated with experimental assays.

In summary, our in silico method generates a set of antigens that bear distinct epitopes but maintain a minimal global distance across Env, constituting a projected formula for increasing the probability of eliciting bnAbs. We hypothesize that this generic approach can serve to design vaccine candidates with enhanced bnAb-eliciting properties for any given sequence. As such, this approach can be used to design cocktail vaccine candidates adapted to any HIV-1 subtype or circulating recombinant form. While this model can also be used to design an HIV-1 group M vaccine cocktail, the idea of a successful universal HIV-1 vaccine is far-fetched when considering lessons from the past forty years of HIV-1 vaccine research.

Materials and Methods

RV217 participant sequences

We used env sequences that we previously generated via single genome amplification of HIV-1 from plasma samples collected in the first five weeks after HIV-1 diagnosis in acute infection in participants from the RV217 cohort [25–27,29]. All participants were antiretroviral treatment naïve. We included Env sequences from twelve participants with infections with single founders (median = 10, min = 10, max = 28) who developed <35% neutralization breadth and from six participants with infections with multiple founders who developed >70% neutralization breadth (median = 11, min = 10, max = 13) (Table 1). Another participant with multiple founder variants and >70% neutralization breadth was excluded because the development of neutralization breadth occurred following superinfection. Infections with multiple founder variants are illustrated with highlighter plots [34] (S1 Fig); we previously reported that these individuals neutralized 74–88% of a 34-virus panel at 435–2115 days post-diagnosis [28]. Sequences belonging to each founder lineage in multi-founder acute infections were used in the training dataset.

Independent subtype C sequences

Subtype C Env sequences sampled since 2011 were downloaded from the Los Alamos National Laboratory HIV Sequence Database (https://www.hiv.lanl.gov/components/sequence/HIV/search/search.html). Sequences were excluded if the individuals had been vaccinated, or if the sequence did not have a complete open reading frame or a sampling year. One sequence was downloaded per individual, and sequences were removed if they were non-independent or outliers. Hypermutated sequences, identified with Hypermut 2.0 [75] (using https://github.com/philliplab/hypermutR) at a Fisher’s exact test threshold of P<0.1, were removed. Sequences were de-duplicated at 95% identity.
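The text does not state which tool or algorithm was used for de-duplication at 95% identity. The following is only a minimal sketch of one common approach (greedy filtering on pairwise identity over aligned sequences); the function names `identity` and `deduplicate`, and the choice to ignore columns where both sequences have a gap, are assumptions made for illustration.

```python
def identity(a, b):
    """Fraction of identical positions between two aligned sequences.

    Columns where both sequences carry a gap are ignored (an assumption).
    """
    pairs = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    if not pairs:
        return 1.0
    return sum(x == y for x, y in pairs) / len(pairs)


def deduplicate(sequences, threshold=0.95):
    """Greedy filter: keep a sequence only if it is less than `threshold`
    identical to every sequence already kept (hypothetical procedure)."""
    kept = []
    for seq in sequences:
        if all(identity(seq, k) < threshold for k in kept):
            kept.append(seq)
    return kept


if __name__ == "__main__":
    toy = ["A" * 100, "A" * 99 + "C", "A" * 50 + "C" * 50]
    print(len(deduplicate(toy)))
```

Note that a greedy filter depends on input order; a clustering tool would be a reasonable alternative.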

Probabilistic model for sequence simulation

A probabilistic model was developed to simulate Env sequences that replicated the variability of infections with multiple founder variants. The model is trained on two Env alignments, corresponding to the major (F1) and minor (F2) founder lineages identified in a given participant (the model can also be trained on a pooled set of individual lineages). A first-order Markov transition probability matrix, Θ, is estimated as described by Le and Gascuel (LG matrix) [39]; in brief, transition rates are directly computed between pairs of amino acids, including transition rates from/to gaps based on an alignment of empirical sequences. We suggest using a large dataset of independent sequences that would correspond to the subtype of the desired target vaccine candidate (i.e., subtype C sequences if the goal is a subtype C vaccine which implies that the seed sequence corresponds to subtype C). For each alignment,

  1. The percentage of non-consensus residues was calculated for each alignment site.

  2. Given a site, j, with percentage of non-consensus residues, πj, the amino acid in the seeding sequence at that site, k, and the transition probabilities from k, Θk, the probability of transitioning from k to k′ was written as

$$P(k' \mid k, \pi_j, \Theta_k) = \frac{\Theta_k[k']\,\pi_j}{\sum\left(\Theta_k[-k]\right)}$$

where Pj(k′) is calculated for each site j. For each simulated sequence, πj is trained on either F1 or F2 and at each site in the sequence, j, the new residue is then drawn from a multinomial probability distribution based on Pj(k′). The denominator, Σ(Θk[−k]), is included to force the probability distribution to sum to 1, such that Pj(k′≠k) = 1-Pj(k′ = k). For each seeding sequence (e.g., a subtype C consensus sequence), whichever model is initially drawn to simulate the initial amino acid in a sequence is consistently applied to generate the following amino acids in that sequence. For the pooled sets, the less diverse major founder lineages were pooled together and the more diverse minor founder lineages constituted a second set. For each pooled founder lineage alignment, at each site j, πj was estimated as the maximum percentage of non-consensus residues in any individual alignment in that pool (i.e., this retained within-host diversity levels); however, this could be alternatively modeled such that, at each site j, πj was randomly drawn from an individual alignment in the pooled alignment.
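The sampling scheme described above can be sketched as follows. This is an illustrative reading of the model, not the authors' code: πj is expressed as a fraction (0 to 1) rather than a percentage, `theta` is assumed to be a dictionary of dictionaries holding transition weights from each residue, and the parameter `p_f1` (the probability of choosing the F1 model for a sequence) is hypothetical.

```python
import random


def site_distribution(k, pi_j, theta_k):
    """Return P_j(k') for every residue k' given the seed residue k.

    theta_k maps residue k' -> transition weight from k (one row of Theta).
    Weights to residues other than k are renormalised to sum to 1, scaled
    by pi_j, and the remaining mass 1 - pi_j stays on k.
    """
    others = {r: w for r, w in theta_k.items() if r != k and w > 0}
    total = sum(others.values())
    if total == 0 or pi_j <= 0:
        return {k: 1.0}
    dist = {r: pi_j * w / total for r, w in others.items()}
    dist[k] = 1.0 - pi_j
    return dist


def simulate_sequence(seed, pi_f1, pi_f2, theta, rng, p_f1=0.5):
    """Simulate one sequence from a seed using either the F1 or F2 model.

    The lineage model is drawn once and applied to every site.
    """
    pi = pi_f1 if rng.random() < p_f1 else pi_f2
    residues_out = []
    for j, k in enumerate(seed):
        dist = site_distribution(k, pi[j], theta[k])
        residues = list(dist)
        weights = [dist[r] for r in residues]
        residues_out.append(rng.choices(residues, weights=weights, k=1)[0])
    return "".join(residues_out)


if __name__ == "__main__":
    # Toy transition weights, not an empirical matrix.
    toy_theta = {"A": {"A": 0.0, "S": 3.0, "T": 1.0}}
    rng = random.Random(1)
    print(simulate_sequence("AAAA", [0.0, 0.5, 0.0, 0.1], [0.2] * 4, toy_theta, rng))
```

With πj = 0 at a site the seed residue is always retained, so conserved sites in the training alignment stay identical to the seed.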

Sequence simulation

One thousand sequences were simulated using a probabilistic model with an empirical transition probability matrix, Θ, computed on an alignment of 172 subtype C Env sequences sampled since 2011 and seeded with a consensus sequence corresponding to sequences from an independent acutely-infected RV217 participant (id = 10066) or a subtype C consensus. Seven sets of sequences were simulated: one trained on each of the six individual training alignments separately and one on a pooled alignment of all of the training sequences. For the pooled model, the F1(πj) and F2(πj) were the maximum percentage of non-consensus residues found at site j in a given individual across all F1 and F2 alignments, respectively. The probability of drawing from each variant model was recorded, so the outputs could be analyzed separately.
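As a small illustration of the pooling rule, the sketch below takes per-site πj vectors from several individual alignments and returns the per-site maximum. It also includes the random-draw alternative mentioned in the model description. The function names and the list-of-lists input layout are assumptions.

```python
import random


def pooled_pi_max(per_individual_pi):
    """Per-site maximum of pi_j across individual alignments."""
    if len({len(p) for p in per_individual_pi}) != 1:
        raise ValueError("all pi vectors must cover the same alignment sites")
    return [max(site) for site in zip(*per_individual_pi)]


def pooled_pi_random(per_individual_pi, rng):
    """Alternative: at each site, draw pi_j from one randomly chosen individual."""
    if len({len(p) for p in per_individual_pi}) != 1:
        raise ValueError("all pi vectors must cover the same alignment sites")
    return [rng.choice(site) for site in zip(*per_individual_pi)]


if __name__ == "__main__":
    f1_individuals = [[0.0, 0.1, 0.0], [0.2, 0.0, 0.0]]
    print(pooled_pi_max(f1_individuals))
    print(pooled_pi_random(f1_individuals, random.Random(0)))
```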

Sequence analysis

Consensus sequences were computed with a majority rule. Sequences were aligned to the HXB2 reference in MAFFT v7.419 [76]. For alignments of sequences sampled from participants in RV217, the percentage of non-consensus (as well as non-gap, non-ambiguous) residues at each site was calculated as the number of residues at each site different from the majority consensus residue for that alignment divided by the total number of sequences. Polymorphic sites were defined as sites with at least one amino acid different from the consensus. For simulated sequences, the percentage of non-consensus residues and polymorphic sites were defined against the seeding sequence. Contact sites for known HIV-1 antibodies (n = 116) were previously reported in studies of natural HIV-1 infection (https://www.hiv.lanl.gov/components/sequence/HIV/featuredb/search/env_ab_search_pub.comp).
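The per-site quantities defined above can be sketched in a few lines. This is a minimal illustration assuming the alignment is a list of equal-length strings, that gaps are written `-` and ambiguous residues `X` or `?`, and that ties in the majority rule are broken by first occurrence; none of these conventions is specified in the text.

```python
from collections import Counter

IGNORED = set("-X?")  # gap and ambiguity symbols (assumed)


def majority_consensus(alignment):
    """Majority-rule consensus; ties go to the residue seen first."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*alignment))


def non_consensus_fraction(alignment, reference=None):
    """Per-site fraction of residues that differ from the reference and are
    neither gaps nor ambiguous, divided by the total number of sequences.
    The reference defaults to the majority consensus; for simulated
    sequences, pass the seeding sequence instead.
    """
    ref = reference if reference is not None else majority_consensus(alignment)
    n = len(alignment)
    return [
        sum(1 for r in col if r != c and r not in IGNORED) / n
        for col, c in zip(zip(*alignment), ref)
    ]


def polymorphic_sites(alignment, reference=None):
    """0-based indices of sites with at least one differing amino acid."""
    fractions = non_consensus_fraction(alignment, reference)
    return [j for j, f in enumerate(fractions) if f > 0]


if __name__ == "__main__":
    toy = ["MKT", "MRT", "MK-", "MKT"]
    print(majority_consensus(toy), non_consensus_fraction(toy), polymorphic_sites(toy))
```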

A maximum-likelihood model of pairwise sequence distance that corrects for sequence length was computed using the dist.ml function [77]. Sequence divergence was calculated against the seeding sequence for each alignment. Phylogenies of aligned sequences were constructed with IQ-TREE 2 [78] based on the model with the lowest BIC identified with ModelFinder [79]. The modified graph Laplacian (MGL) was computed for the distance matrix of the reconstructed phylogeny of sequences; eigenvalues calculated from the MGL define the connectivity of the phylogeny in terms of substitutions. Spectral density profile summary statistics represent different aspects of the topology of the phylogeny: the longest path through the phylogeny, λ*, which is a correlate of non-synonymous/synonymous substitution rates; the proportion of long versus short branching-events, ψ, which is a correlate of rate heterogeneity; and the occurrence of branching-events, η, which is a correlate of transition-transversion rates [40,41]. Spectral density profile summary statistics λ*, ψ, and η were estimated for phylogenies reconstructed from empirical and simulated sequences. Simulated alignments were iteratively down-sampled 100 times to a random set of 10 sequences. Divergence, pairwise distance, and phylogenetic metrics were calculated on each down-sampled alignment.
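The down-sampling step can be sketched as below. To keep the example self-contained, divergence is computed as a simple p-distance (fraction of mismatched non-gap positions), which is a stand-in and not the maximum-likelihood distance from dist.ml used in the study; the function names and the fixed random seed are assumptions.

```python
import random


def p_distance(a, b):
    """Fraction of aligned, non-gap positions at which two sequences differ."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x != y for x, y in pairs) / len(pairs)


def downsampled_divergence(alignment, seed_seq, n_reps=100, size=10, rng=None):
    """Mean divergence from the seed in each of n_reps random subsets."""
    if rng is None:
        rng = random.Random(0)
    means = []
    for _ in range(n_reps):
        subset = rng.sample(alignment, size)  # sampling without replacement
        means.append(sum(p_distance(s, seed_seq) for s in subset) / size)
    return means


if __name__ == "__main__":
    toy = ["AAAA"] * 5 + ["AAAC"] * 15
    reps = downsampled_divergence(toy, "AAAA")
    print(len(reps), min(reps), max(reps))
```

Down-sampling to 10 sequences brings the simulated alignments close to the size of the empirical within-host alignments, so the metrics are compared at a similar sampling depth.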

Subtype C sequence antigen prediction

A phylogeny for subtype C sequences was constructed with IQ-TREE 2 [78] based on the model with the lowest BIC identified with ModelFinder [79]. Divergence from the majority consensus was computed for each sequence and pairwise distances were computed for all sequences.

For the subtype C consensus, exposed residues (i.e., accessible to antibodies) were defined as we previously described [27] and B-cell epitopes were predicted with Bepi-Pred 2.0 [80] using an epitope prediction threshold of 0.5. The number of polymorphic sites among simulated sequences corresponding to predicted B-cell epitopes was quantified.

To select candidate antigen sequences, simulated sequences were filtered to retain those that had a non-consensus residue (with respect to the subtype C consensus) in at least three of five key Ab epitopes (VRC01, CAP256-VRC26.25, PGT121, 10E8, and 35O22). Of these sequences with minimal variability, the two maximally divergent sequences simulated by F1 and the two simulated by F2 were selected as candidate antigens. Maximally divergent sequences were selected to cover as much genetic space as possible within the simulated minimal divergence.
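One way to read this selection procedure is sketched below. The input layout (a list of `(sequence, lineage)` pairs and a dictionary of 0-based epitope positions per antibody) is hypothetical, and "maximally divergent" is interpreted here as the largest number of mismatches to the consensus, which is an assumption; the study used model-based distances.

```python
def mismatches(seq, ref):
    """Number of aligned positions at which seq differs from ref."""
    return sum(a != b for a, b in zip(seq, ref))


def epitopes_hit(seq, consensus, epitope_sites):
    """Count epitopes with at least one non-consensus residue."""
    return sum(
        any(seq[j] != consensus[j] for j in sites)
        for sites in epitope_sites.values()
    )


def select_candidates(simulated, consensus, epitope_sites,
                      min_epitopes=3, per_lineage=2):
    """Keep sequences varying in >= min_epitopes epitopes, then take the
    per_lineage most divergent sequences from each founder model."""
    selected = {}
    for lineage in ("F1", "F2"):
        eligible = [
            seq for seq, lin in simulated
            if lin == lineage
            and epitopes_hit(seq, consensus, epitope_sites) >= min_epitopes
        ]
        eligible.sort(key=lambda s: mismatches(s, consensus), reverse=True)
        selected[lineage] = eligible[:per_lineage]
    return selected


if __name__ == "__main__":
    consensus = "AAAAAA"
    # Toy epitope positions; real contact sites come from the LANL database.
    sites = {"ab1": [0], "ab2": [1], "ab3": [2], "ab4": [3], "ab5": [4]}
    sims = [("SSSAAA", "F1"), ("SSSSAA", "F1"), ("SSSSSA", "F1"),
            ("SAAAAA", "F1"), ("TTTAAA", "F2"), ("TTTTAA", "F2")]
    print(select_candidates(sims, consensus, sites))
```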

Structure prediction and visualization

The structures of one subunit of the Env trimer for the subtype C consensus (the seed sequence) and for the four in silico-derived antigenic sequences were predicted with ColabFold [81]. The alignment was prepared using MMseqs2 [82] and the structure prediction was carried out with AlphaFold2 [47]. Before submission to ColabFold, the signal peptide and the sequence after the transmembrane helix were removed from each sequence. The structure figure was rendered with PyMol (https://pymol.org/). If a substitution was between a pair of highly similar residues (RK, QE, QN, ED, DN, TS, SA, VI, IL, LM, and FY), the residue was colored yellow; other changes were colored red.
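The coloring rule can be expressed as a short lookup. The sketch below assumes the seed and antigen sequences are already aligned and of equal length, treats each similar pair as symmetric (R to K and K to R are both "similar"), and uses 1-based positions; the function name is hypothetical and the output would still need to be passed to a structure viewer.

```python
# Pairs of highly similar residues, taken from the list in the text.
SIMILAR_PAIRS = {
    frozenset(pair)
    for pair in ("RK", "QE", "QN", "ED", "DN", "TS", "SA", "VI", "IL", "LM", "FY")
}


def substitution_colors(seed, antigen):
    """Map 1-based positions of substitutions to 'yellow' (similar pair)
    or 'red' (any other change). Unchanged positions are omitted."""
    colors = {}
    for pos, (a, b) in enumerate(zip(seed, antigen), start=1):
        if a == b:
            continue
        colors[pos] = "yellow" if frozenset((a, b)) in SIMILAR_PAIRS else "red"
    return colors


if __name__ == "__main__":
    print(substitution_colors("RKAV", "KKWI"))
```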

Statistical analysis

The Shapiro-Wilk normality test was used to determine whether data were normally distributed. If data were normally distributed, pairwise comparisons were made using a Student’s t-test; otherwise, a Mann-Whitney U test was used. Two-sample Kolmogorov-Smirnov tests were used to compare distributions. Statistical tests were used only to compare empirical data, not simulated data. Comparisons with simulated data were made by assessing inclusion/exclusion of values within the 95% confidence intervals of simulated data.
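The comparison against simulated data can be illustrated with a simple interval check. The text does not specify how the 95% interval was computed; the sketch below assumes an empirical central interval (2.5th to 97.5th percentile with linear interpolation), and the function names are hypothetical.

```python
def percentile(values, q):
    """Empirical quantile (0 <= q <= 1) with linear interpolation."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must not be empty")
    pos = q * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    frac = pos - lo
    return ordered[lo] * (1 - frac) + ordered[hi] * frac


def central_interval(values, level=0.95):
    """Lower and upper bounds of the central `level` interval."""
    tail = (1 - level) / 2
    return percentile(values, tail), percentile(values, 1 - tail)


def within_interval(x, simulated_values, level=0.95):
    """True if an empirical value falls inside the simulated interval."""
    low, high = central_interval(simulated_values, level)
    return low <= x <= high


if __name__ == "__main__":
    simulated = list(range(101))
    print(central_interval(simulated), within_interval(50, simulated))
```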

Supporting information

S1 Fig. Highlighter plots of 6 RV217 infections with multiple founder variants.

For each individual, a highlighter plot is constructed from sequences sampled during acute infection using the consensus as the master sequence. The number of days post-diagnosis at which each sequence was sampled is listed to the right of each plot.

(TIF)

S2 Fig. Phylogenies of 6 RV217 infections with multiple founder variants.

For each individual, a phylogeny was constructed from sequences sampled during acute infection and rooted on the majority consensus sequence.

(TIF)

S3 Fig. Mismatched sites between seeding, training, and simulated sequences.

Correlation plot of the percentage of mismatched non-gapped sites between the consensus of the seeding alignment, simulated alignment, and training alignment for sequences simulated under each RV217 multi-founder infection.

(TIF)

S1 File. Candidate antigen sequences simulated from a subtype C Env consensus sequence and trained on pooled founder alignments sampled from six multi-founder acute infections in RV217.

(TXT)

Acknowledgments

We are indebted to the RV217 participants and clinical team. We also thank Julie Ake, Merlin Robb and Sandhya Vasan.

The views expressed are those of the authors and should not be construed to represent the positions of the U.S. Army, the Department of Defense, or the Department of Health and Human Services.

Data Availability

Sequences analyzed in this study are available in GenBank under accession numbers MN791130–MN792579 and ON959609–ON959788. The code and data generated during this study are available at https://www.hivresearch.org/publication-supplements.

Funding Statement

This work was supported by a cooperative agreement between The Henry M. Jackson Foundation for the Advancement of Military Medicine, Inc., and the U.S. Department of the Army [W81XWH-18-2-0040]. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. Roberts JD, Bebenek K, Kunkel TA. The accuracy of reverse transcriptase from HIV-1. Science. 1988;242(4882):1171–3. Epub 1988/11/25. doi: 10.1126/science.2460925.
  • 2. Smyth RP, Davenport MP, Mak J. The origin of genetic diversity in HIV-1. Virus Res. 2012;169(2):415–29. Epub 2012/06/26. doi: 10.1016/j.virusres.2012.06.015.
  • 3. Cuevas JM, Geller R, Garijo R, Lopez-Aldeguer J, Sanjuan R. Extremely High Mutation Rate of HIV-1 In Vivo. PLoS Biol. 2015;13(9):e1002251. Epub 2015/09/17. doi: 10.1371/journal.pbio.1002251; PubMed Central PMCID: PMC4574155.
  • 4. Richman DD, Wrin T, Little SJ, Petropoulos CJ. Rapid evolution of the neutralizing antibody response to HIV type 1 infection. Proc Natl Acad Sci U S A. 2003;100(7):4144–9. Epub 2003/03/20. doi: 10.1073/pnas.0630530100; PubMed Central PMCID: PMC153062.
  • 5. Wei X, Decker JM, Wang S, Hui H, Kappes JC, Wu X, et al. Antibody neutralization and escape by HIV-1. Nature. 2003;422(6929):307–12. Epub 2003/03/21. doi: 10.1038/nature01470.
  • 6. Tomaras GD, Yates NL, Liu P, Qin L, Fouda GG, Chavez LL, et al. Initial B-cell responses to transmitted human immunodeficiency virus type 1: virion-binding immunoglobulin M (IgM) and IgG antibodies followed by plasma anti-gp41 antibodies with ineffective control of initial viremia. J Virol. 2008;82(24):12449–63. Epub 2008/10/10. doi: 10.1128/JVI.01708-08; PubMed Central PMCID: PMC2593361.
  • 7. Mikell I, Sather DN, Kalams SA, Altfeld M, Alter G, Stamatatos L. Characteristics of the earliest cross-neutralizing antibody response to HIV-1. PLoS Pathog. 2011;7(1):e1001251. Epub 2011/01/21. doi: 10.1371/journal.ppat.1001251; PubMed Central PMCID: PMC3020924.
  • 8. Gray ES, Madiga MC, Hermanus T, Moore PL, Wibmer CK, Tumba NL, et al. The neutralization breadth of HIV-1 develops incrementally over four years and is associated with CD4+ T cell decline and high viral load during acute infection. J Virol. 2011;85(10):4828–40. Epub 2011/03/11. doi: 10.1128/JVI.00198-11; PubMed Central PMCID: PMC3126191.
  • 9. Simek MD, Rida W, Priddy FH, Pung P, Carrow E, Laufer DS, et al. Human immunodeficiency virus type 1 elite neutralizers: individuals with broad and potent neutralizing activity identified by using a high-throughput neutralization assay together with an analytical selection algorithm. J Virol. 2009;83(14):7337–48. Epub 2009/05/15. doi: 10.1128/JVI.00110-09; PubMed Central PMCID: PMC2704778.
  • 10. Hraber P, Seaman MS, Bailer RT, Mascola JR, Montefiori DC, Korber BT. Prevalence of broadly neutralizing antibody responses during chronic HIV-1 infection. AIDS. 2014;28(2):163–9. Epub 2013/12/24. doi: 10.1097/QAD.0000000000000106; PubMed Central PMCID: PMC4042313.
  • 11. Sather DN, Armann J, Ching LK, Mavrantoni A, Sellhorn G, Caldwell Z, et al. Factors associated with the development of cross-reactive neutralizing antibodies during human immunodeficiency virus type 1 infection. J Virol. 2009;83(2):757–69. Epub 2008/11/07. doi: 10.1128/JVI.02036-08; PubMed Central PMCID: PMC2612355.
  • 12. Piantadosi A, Panteleeff D, Blish CA, Baeten JM, Jaoko W, McClelland RS, et al. Breadth of neutralizing antibody response to human immunodeficiency virus type 1 is affected by factors early in infection but does not influence disease progression. J Virol. 2009;83(19):10269–74. Epub 2009/07/31. doi: 10.1128/JVI.01149-09; PubMed Central PMCID: PMC2748011.
  • 13. Moore PL, Gray ES, Wibmer CK, Bhiman JN, Nonyane M, Sheward DJ, et al. Evolution of an HIV glycan-dependent broadly neutralizing antibody epitope through immune escape. Nat Med. 2012;18(11):1688–92. Epub 2012/10/23. doi: 10.1038/nm.2985; PubMed Central PMCID: PMC3494733.
  • 14. Klein F, Diskin R, Scheid JF, Gaebler C, Mouquet H, Georgiev IS, et al. Somatic mutations of the immunoglobulin framework are generally required for broad and potent HIV-1 neutralization. Cell. 2013;153(1):126–38. Epub 2013/04/02. doi: 10.1016/j.cell.2013.03.018; PubMed Central PMCID: PMC3792590.
  • 15. Wibmer CK, Bhiman JN, Gray ES, Tumba N, Abdool Karim SS, Williamson C, et al. Viral escape from HIV-1 neutralizing antibodies drives increased plasma neutralization breadth through sequential recognition of multiple epitopes and immunotypes. PLoS Pathog. 2013;9(10):e1003738. Epub 2013/11/10. doi: 10.1371/journal.ppat.1003738; PubMed Central PMCID: PMC3814426.
  • 16. Sather DN, Carbonetti S, Malherbe DC, Pissani F, Stuart AB, Hessell AJ, et al. Emergence of broadly neutralizing antibodies and viral coevolution in two subjects during the early stages of infection with human immunodeficiency virus type 1. J Virol. 2014;88(22):12968–81. Epub 2014/08/15. doi: 10.1128/JVI.01816-14; PubMed Central PMCID: PMC4249098.
  • 17. Gao F, Bonsignori M, Liao HX, Kumar A, Xia SM, Lu X, et al. Cooperation of B cell lineages in induction of HIV-1-broadly neutralizing antibodies. Cell. 2014;158(3):481–91. Epub 2014/07/30. doi: 10.1016/j.cell.2014.06.022; PubMed Central PMCID: PMC4150607.
  • 18. Landais E, Huang X, Havenar-Daughton C, Murrell B, Price MA, Wickramasinghe L, et al. Broadly Neutralizing Antibody Responses in a Large Longitudinal Sub-Saharan HIV Primary Infection Cohort. PLoS Pathog. 2016;12(1):e1005369. Epub 2016/01/15. doi: 10.1371/journal.ppat.1005369; PubMed Central PMCID: PMC4713061.
  • 19. Rusert P, Kouyos RD, Kadelka C, Ebner H, Schanz M, Huber M, et al. Determinants of HIV-1 broadly neutralizing antibody induction. Nat Med. 2016;22(11):1260–7. Epub 2016/11/01. doi: 10.1038/nm.4187.
  • 20. Kouyos RD, Rusert P, Kadelka C, Huber M, Marzel A, Ebner H, et al. Tracing HIV-1 strains that imprint broadly neutralizing antibody responses. Nature. 2018;561(7723):406–10. Epub 2018/09/12. doi: 10.1038/s41586-018-0517-0.
  • 21. Luo S, Perelson AS. Competitive exclusion by autologous antibodies can prevent broad HIV-1 antibodies from arising. Proc Natl Acad Sci U S A. 2015;112(37):11654–9. Epub 2015/09/02. doi: 10.1073/pnas.1505207112; PubMed Central PMCID: PMC4577154.
  • 22. Powell RL, Kinge T, Nyambi PN. Infection by discordant strains of HIV-1 markedly enhances the neutralizing antibody response against heterologous virus. J Virol. 2010;84(18):9415–26. Epub 2010/07/16. doi: 10.1128/JVI.02732-09; PubMed Central PMCID: PMC2937625.
  • 23. Cortez V, Odem-Davis K, McClelland RS, Jaoko W, Overbaugh J. HIV-1 superinfection in women broadens and strengthens the neutralizing antibody response. PLoS Pathog. 2012;8(3):e1002611. Epub 2012/04/06. doi: 10.1371/journal.ppat.1002611; PubMed Central PMCID: PMC3315492.
  • 24. Sheward DJ, Marais J, Bekker V, Murrell B, Eren K, Bhiman JN, et al. HIV Superinfection Drives De Novo Antibody Responses and Not Neutralization Breadth. Cell Host Microbe. 2018;24(4):593–9 e3. Epub 2018/10/03. doi: 10.1016/j.chom.2018.09.001; PubMed Central PMCID: PMC6185870.
  • 25. Rolland M, Tovanabutra S, Dearlove B, Li Y, Owen CL, Lewitus E, et al. Molecular dating and viral load growth rates suggested that the eclipse phase lasted about a week in HIV-1 infected adults in East Africa and Thailand. PLoS Pathog. 2020;16(2):e1008179. Epub 2020/02/07. doi: 10.1371/journal.ppat.1008179; PubMed Central PMCID: PMC7004303.
  • 26. Dearlove B, Tovanabutra S, Owen CL, Lewitus E, Li Y, Sanders-Buell E, et al. Factors influencing estimates of HIV-1 infection timing using BEAST. PLoS Comput Biol. 2021;17(2):e1008537. Epub 2021/02/02. doi: 10.1371/journal.pcbi.1008537; PubMed Central PMCID: PMC7877758.
  • 27. Lewitus E, Townsley SM, Li Y, Donofrio GC, Dearlove BL, Bai H, et al. HIV-1 infections with multiple founders associate with the development of neutralization breadth. PLoS Pathog. 2022;18(3):e1010369. Epub 2022/03/19. doi: 10.1371/journal.ppat.1010369; PubMed Central PMCID: PMC8967031.
  • 28. Townsley SM, Donofrio GC, Jian N, Leggat DJ, Dussupt V, Mendez-Rivera L, et al. B cell engagement with HIV-1 founder virus envelope predicts development of broadly neutralizing antibodies. Cell Host Microbe. 2021. Epub 2021/03/05. doi: 10.1016/j.chom.2021.01.016.
  • 29. Robb ML, Eller LA, Kibuuka H, Rono K, Maganga L, Nitayaphan S, et al. Prospective Study of Acute HIV-1 Infection in Adults in East Africa and Thailand. N Engl J Med. 2016;374(22):2120–30. Epub 2016/05/19. doi: 10.1056/NEJMoa1508952; PubMed Central PMCID: PMC5111628.
  • 30. Rolland M, Edlefsen PT, Larsen BB, Tovanabutra S, Sanders-Buell E, Hertz T, et al. Increased HIV-1 vaccine efficacy against viruses with genetic signatures in Env V2. Nature. 2012. Epub 2012/09/11. doi: 10.1038/nature11519.
  • 31. Lewitus E, Sanders-Buell E, Bose M, O’Sullivan AM, Poltavee K, Li Y, et al. RV144 vaccine imprinting constrained HIV-1 evolution following breakthrough infection. Virus Evol. 2021;7(2):veab057. Epub 2021/09/18. doi: 10.1093/ve/veab057; PubMed Central PMCID: PMC8438874.
  • 32. Marichannegowda MH, Mengual M, Kumar A, Giorgi EE, Tu JJ, Martinez DR, et al. Different evolutionary pathways of HIV-1 between fetus and mother perinatal transmission pairs indicate unique immune selection in fetuses. Cell Rep Med. 2021;2(7):100315. Epub 2021/08/03. doi: 10.1016/j.xcrm.2021.100315; PubMed Central PMCID: PMC8324465.
  • 33. Mishra N, Sharma S, Dobhal A, Kumar S, Chawla H, Singh R, et al. Broadly neutralizing plasma antibodies effective against autologous circulating viruses in infants with multivariant HIV-1 infection. Nat Commun. 2020;11(1):4409. Epub 2020/09/04. doi: 10.1038/s41467-020-18225-x; PubMed Central PMCID: PMC7468291.
  • 34. Keele BF, Giorgi EE, Salazar-Gonzalez JF, Decker JM, Pham KT, Salazar MG, et al. Identification and characterization of transmitted and early founder virus envelopes in primary HIV-1 infection. Proc Natl Acad Sci U S A. 2008;105(21):7552–7. Epub 2008/05/21. doi: 10.1073/pnas.0802203105; PubMed Central PMCID: PMC2387184.
  • 35. Abrahams MR, Anderson JA, Giorgi EE, Seoighe C, Mlisana K, Ping LH, et al. Quantitating the multiplicity of infection with human immunodeficiency virus type 1 subtype C reveals a non-poisson distribution of transmitted variants. J Virol. 2009;83(8):3556–67. Epub 2009/02/06. doi: 10.1128/JVI.02132-08; PubMed Central PMCID: PMC2663249.
  • 36. Janes H, Herbeck JT, Tovanabutra S, Thomas R, Frahm N, Duerr A, et al. HIV-1 infections with multiple founders are associated with higher viral loads than infections with single founders. Nat Med. 2015;21(10):1139–41. Epub 2015/09/01. doi: 10.1038/nm.3932; PubMed Central PMCID: PMC4598284.
  • 37. Tully DC, Ogilvie CB, Batorsky RE, Bean DJ, Power KA, Ghebremichael M, et al. Differences in the Selection Bottleneck between Modes of Sexual Transmission Influence the Genetic Composition of the HIV-1 Founder Virus. PLoS Pathog. 2016;12(5):e1005619. Epub 2016/05/11. doi: 10.1371/journal.ppat.1005619; PubMed Central PMCID: PMC4862634.
  • 38. Corey L, Gilbert PB, Juraska M, Montefiori DC, Morris L, Karuna ST, et al. Two Randomized Trials of Neutralizing Antibodies to Prevent HIV-1 Acquisition. N Engl J Med. 2021;384(11):1003–14. Epub 2021/03/18. doi: 10.1056/NEJMoa2031738; PubMed Central PMCID: PMC8189692.
  • 39. Le SQ, Gascuel O. An improved general amino acid replacement matrix. Mol Biol Evol. 2008;25(7):1307–20. Epub 2008/03/28. doi: 10.1093/molbev/msn067.
  • 40. Lewitus E, Morlon H. Characterizing and Comparing Phylogenies from their Laplacian Spectrum. Syst Biol. 2016;65(3):495–507. Epub 2015/12/15. doi: 10.1093/sysbio/syv116.
  • 41. Lewitus E, Rolland M. A non-parametric analytic framework for within-host viral phylogenies and a test for HIV-1 founder multiplicity. Virus Evol. 2019;5(2):vez044. Epub 2019/11/09. doi: 10.1093/ve/vez044; PubMed Central PMCID: PMC6826062.
  • 42. Wu X, Yang ZY, Li Y, Hogerkorp CM, Schief WR, Seaman MS, et al. Rational design of envelope identifies broadly neutralizing human monoclonal antibodies to HIV-1. Science. 2010;329(5993):856–61. Epub 2010/07/10. doi: 10.1126/science.1187659; PubMed Central PMCID: PMC2965066.
  • 43. Doria-Rose NA, Bhiman JN, Roark RS, Schramm CA, Gorman J, Chuang GY, et al. New Member of the V1V2-Directed CAP256-VRC26 Lineage That Shows Increased Breadth and Exceptional Potency. J Virol. 2016;90(1):76–91. Epub 2015/10/16. doi: 10.1128/JVI.01791-15; PubMed Central PMCID: PMC4702551.
  • 44. Julien JP, Sok D, Khayat R, Lee JH, Doores KJ, Walker LM, et al. Broadly neutralizing antibody PGT121 allosterically modulates CD4 binding via recognition of the HIV-1 gp120 V3 base and multiple surrounding glycans. PLoS Pathog. 2013;9(5):e1003342. Epub 2013/05/10. doi: 10.1371/journal.ppat.1003342; PubMed Central PMCID: PMC3642082.
  • 45. Soto C, Ofek G, Joyce MG, Zhang B, McKee K, Longo NS, et al. Developmental Pathway of the MPER-Directed HIV-1-Neutralizing Antibody 10E8. PLoS One. 2016;11(6):e0157409. Epub 2016/06/15. doi: 10.1371/journal.pone.0157409; PubMed Central PMCID: PMC4907498.
  • 46. Huang J, Kang BH, Pancera M, Lee JH, Tong T, Feng Y, et al. Broad and potent HIV-1 neutralization by a human antibody that binds the gp41-gp120 interface. Nature. 2014;515(7525):138–42. Epub 2014/09/05. doi: 10.1038/nature13601; PubMed Central PMCID: PMC4224615.
  • 47. Jumper J, Evans R, Pritzel A, Green T, Figurnov M, Ronneberger O, et al. Highly accurate protein structure prediction with AlphaFold. Nature. 2021;596(7873):583–9. Epub 2021/07/16. doi: 10.1038/s41586-021-03819-2; PubMed Central PMCID: PMC8371605.
  • 48. Pitisuttithum P, Gilbert P, Gurwith M, Heyward W, Martin M, van Griensven F, et al. Randomized, double-blind, placebo-controlled efficacy trial of a bivalent recombinant glycoprotein 120 HIV-1 vaccine among injection drug users in Bangkok, Thailand. J Infect Dis. 2006;194(12):1661–71. Epub 2006/11/17. doi: 10.1086/508748.
  • 49. Flynn NM, Forthal DN, Harro CD, Judson FN, Mayer KH, Para MF, et al. Placebo-controlled phase 3 trial of a recombinant glycoprotein 120 vaccine to prevent HIV-1 infection. J Infect Dis. 2005;191(5):654–65. Epub 2005/02/03. doi: 10.1086/428404.
  • 50. Buchbinder SP, Mehrotra DV, Duerr A, Fitzgerald DW, Mogg R, Li D, et al. Efficacy assessment of a cell-mediated immunity HIV-1 vaccine (the Step Study): a double-blind, randomised, placebo-controlled, test-of-concept trial. Lancet. 2008;372(9653):1881–93. Epub 2008/11/18. doi: 10.1016/S0140-6736(08)61591-3; PubMed Central PMCID: PMC2721012.
  • 51. Rerks-Ngarm S, Pitisuttithum P, Nitayaphan S, Kaewkungwal J, Chiu J, Paris R, et al. Vaccination with ALVAC and AIDSVAX to prevent HIV-1 infection in Thailand. N Engl J Med. 2009;361(23):2209–20. Epub 2009/10/22. doi: 10.1056/NEJMoa0908492.
  • 52. Gray GE, Allen M, Moodie Z, Churchyard G, Bekker LG, Nchabeleng M, et al. Safety and efficacy of the HVTN 503/Phambili study of a clade-B-based HIV-1 vaccine in South Africa: a double-blind, randomised, placebo-controlled test-of-concept phase 2b study. Lancet Infect Dis. 2011;11(7):507–15. Epub 2011/05/17. doi: 10.1016/S1473-3099(11)70098-6; PubMed Central PMCID: PMC3417349.
  • 53. Hammer SM, Sobieszczyk ME, Janes H, Karuna ST, Mulligan MJ, Grove D, et al. Efficacy trial of a DNA/rAd5 HIV-1 preventive vaccine. N Engl J Med. 2013;369(22):2083–92. Epub 2013/10/09. doi: 10.1056/NEJMoa1310566; PubMed Central PMCID: PMC4030634.
  • 54. Gray GE, Bekker LG, Laher F, Malahleha M, Allen M, Moodie Z, et al. Vaccine Efficacy of ALVAC-HIV and Bivalent Subtype C gp120-MF59 in Adults. N Engl J Med. 2021;384(12):1089–100. Epub 2021/03/25. doi: 10.1056/NEJMoa2031499; PubMed Central PMCID: PMC7888373.
  • 55.Chung AW, Ghebremichael M, Robinson H, Brown E, Choi I, Lane S, et al. Polyfunctional Fc-effector profiles mediated by IgG subclass selection distinguish RV144 and VAX003 vaccines. Sci Transl Med. 2014;6(228):228ra38. Epub 2014/03/22. doi: 10.1126/scitranslmed.3007736 . [DOI] [PubMed] [Google Scholar]
  • 56.Mdluli T, Jian N, Slike B, Paquin-Proulx D, Donofrio G, Alrubayyi A, et al. RV144 HIV-1 vaccination impacts post-infection antibody responses. PLoS Pathog. 2020;16(12):e1009101. Epub 2020/12/09. doi: 10.1371/journal.ppat.1009101 ; PubMed Central PMCID: PMC7748270. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 57.Ren X, Li Y, Liu X, Shen X, Gao W, Li J. Computational Identification of Antigenicity-Associated Sites in the Hemagglutinin Protein of A/H1N1 Seasonal Influenza Virus. PLoS One. 2015;10(5):e0126742. Epub 2015/05/16. doi: 10.1371/journal.pone.0126742 ; PubMed Central PMCID: PMC4433265. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 58.Steinbruck L, McHardy AC. Inference of genotype-phenotype relationships in the antigenic evolution of human influenza A (H3N2) viruses. PLoS Comput Biol. 2012;8(4):e1002492. Epub 2012/04/26. doi: 10.1371/journal.pcbi.1002492 ; PubMed Central PMCID: PMC3330098. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 59.Neher RA, Bedford T, Daniels RS, Russell CA, Shraiman BI. Prediction, dynamics, and visualization of antigenic phenotypes of seasonal influenza viruses. Proc Natl Acad Sci U S A. 2016;113(12):E1701–9. Epub 2016/03/10. doi: 10.1073/pnas.1525578113 ; PubMed Central PMCID: PMC4812706. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 60.Kratsch C, Klingen TR, Mumken L, Steinbruck L, McHardy AC. Determination of antigenicity-altering patches on the major surface protein of human influenza A/H3N2 viruses. Virus Evol. 2016;2(1):vev025. Epub 2016/10/25. doi: 10.1093/ve/vev025 ; PubMed Central PMCID: PMC4989879. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 61.Luksza M, Lassig M. A predictive fitness model for influenza. Nature. 2014;507(7490):57–61. Epub 2014/02/28. doi: 10.1038/nature13087 . [DOI] [PubMed] [Google Scholar]
  • 62.Korber B, Gaschen B, Yusim K, Thakallapally R, Kesmir C, Detours V. Evolutionary and immunological implications of contemporary HIV-1 variation. Br Med Bull. 2001;58:19–42. Epub 2001/11/21. doi: 10.1093/bmb/58.1.19 . [DOI] [PubMed] [Google Scholar]
  • 63.Gaschen B, Taylor J, Yusim K, Foley B, Gao F, Lang D, et al. Diversity considerations in HIV-1 vaccine selection. Science. 2002;296(5577):2354–60. Epub 2002/06/29. doi: 10.1126/science.1070441 . [DOI] [PubMed] [Google Scholar]
  • 64.Rolland M. HIV-1 phylogenetics and vaccines. Curr Opin HIV AIDS. 2019;14(3):227–32. Epub 2019/03/30. doi: 10.1097/COH.0000000000000545 . [DOI] [PubMed] [Google Scholar]
  • 65.Dearlove B, Lewitus E, Bai H, Li Y, Reeves DB, Joyce MG, et al. A SARS-CoV-2 vaccine candidate would likely match all currently circulating variants. Proc Natl Acad Sci U S A. 2020;117(38):23652–62. Epub 2020/09/02. doi: 10.1073/pnas.2008281117 ; PubMed Central PMCID: PMC7519301. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 66.Rolland M, Jensen MA, Nickle DC, Yan J, Learn GH, Heath L, et al. Reconstruction and function of ancestral center-of-tree human immunodeficiency virus type 1 proteins. J Virol. 2007;81(16):8507–14. Epub 2007/06/01. doi: 10.1128/JVI.02683-06 ; PubMed Central PMCID: PMC1951385. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 67.Fischer W, Perkins S, Theiler J, Bhattacharya T, Yusim K, Funkhouser R, et al. Polyvalent vaccines for optimal coverage of potential T-cell epitopes in global HIV-1 variants. Nat Med. 2007;13(1):100–6. Epub 2006/12/26. doi: 10.1038/nm1461 . [DOI] [PubMed] [Google Scholar]
  • 68.Barouch DH, O’Brien KL, Simmons NL, King SL, Abbink P, Maxfield LF, et al. Mosaic HIV-1 vaccines expand the breadth and depth of cellular immune responses in rhesus monkeys. Nat Med. 2010;16(3):319–23. Epub 2010/02/23. doi: 10.1038/nm.2089 ; PubMed Central PMCID: PMC2834868. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 69.Barouch DH, Tomaka FL, Wegmann F, Stieh DJ, Alter G, Robb ML, et al. Evaluation of a mosaic HIV-1 vaccine in a multicentre, randomised, double-blind, placebo-controlled, phase 1/2a clinical trial (APPROACH) and in rhesus monkeys (NHP 13–19). Lancet. 2018;392(10143):232–43. Epub 2018/07/27. doi: 10.1016/S0140-6736(18)31364-3 ; PubMed Central PMCID: PMC6192527. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 70.Letourneau S, Im EJ, Mashishi T, Brereton C, Bridgeman A, Yang H, et al. Design and pre-clinical evaluation of a universal HIV-1 vaccine. PLoS One. 2007;2(10):e984. Epub 2007/10/04. doi: 10.1371/journal.pone.0000984 ; PubMed Central PMCID: PMC1991584. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 71.Rolland M, Nickle DC, Mullins JI. HIV-1 group M conserved elements vaccine. PLoS Pathog. 2007;3(11):e157. Epub 2007/12/07. doi: 10.1371/journal.ppat.0030157 ; PubMed Central PMCID: PMC2098811. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 72.Gaiha GD, Rossin EJ, Urbach J, Landeros C, Collins DR, Nwonu C, et al. Structural topology defines protective CD8(+) T cell epitopes in the HIV proteome. Science. 2019;364(6439):480–4. Epub 2019/05/03. doi: 10.1126/science.aav5095 ; PubMed Central PMCID: PMC6855781. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 73.Moyo N, Wee EG, Korber B, Bahl K, Falcone S, Himansu S, et al. Tetravalent Immunogen Assembled from Conserved Regions of HIV-1 and Delivered as mRNA Demonstrates Potent Preclinical T-Cell Immunogenicity and Breadth. Vaccines (Basel). 2020;8(3). Epub 2020/07/10. doi: 10.3390/vaccines8030360 ; PubMed Central PMCID: PMC7563622. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 74.Jardine J, Julien JP, Menis S, Ota T, Kalyuzhniy O, McGuire A, et al. Rational HIV immunogen design to target specific germline B cell receptors. Science. 2013;340(6133):711–6. Epub 2013/03/30. doi: 10.1126/science.1234150 ; PubMed Central PMCID: PMC3689846. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 75.Rose PP, Korber BT. Detecting hypermutations in viral sequences with an emphasis on G→A hypermutation. Bioinformatics. 2000;16(4):400–1. Epub 2000/06/27. doi: 10.1093/bioinformatics/16.4.400 . [DOI] [PubMed] [Google Scholar]
  • 76.Katoh K, Standley DM. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol Biol Evol. 2013;30(4):772–80. Epub 2013/01/19. doi: 10.1093/molbev/mst010 ; PubMed Central PMCID: PMC3603318. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 77.Schliep KP. phangorn: phylogenetic analysis in R. Bioinformatics. 2011;27(4):592–3. Epub 2010/12/21. doi: 10.1093/bioinformatics/btq706 ; PubMed Central PMCID: PMC3035803. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 78.Minh BQ, Schmidt HA, Chernomor O, Schrempf D, Woodhams MD, von Haeseler A, et al. IQ-TREE 2: New Models and Efficient Methods for Phylogenetic Inference in the Genomic Era. Mol Biol Evol. 2020;37(5):1530–4. Epub 2020/02/06. doi: 10.1093/molbev/msaa015 ; PubMed Central PMCID: PMC7182206. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 79.Kalyaanamoorthy S, Minh BQ, Wong TKF, von Haeseler A, Jermiin LS. ModelFinder: fast model selection for accurate phylogenetic estimates. Nat Methods. 2017;14(6):587–9. Epub 2017/05/10. doi: 10.1038/nmeth.4285 ; PubMed Central PMCID: PMC5453245. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 80.Jespersen MC, Peters B, Nielsen M, Marcatili P. BepiPred-2.0: improving sequence-based B-cell epitope prediction using conformational epitopes. Nucleic Acids Res. 2017;45(W1):W24–W9. Epub 2017/05/05. doi: 10.1093/nar/gkx346 ; PubMed Central PMCID: PMC5570230. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 81.Mirdita M, Schutze K, Moriwaki Y, Heo L, Ovchinnikov S, Steinegger M. ColabFold: making protein folding accessible to all. Nat Methods. 2022;19(6):679–82. Epub 2022/06/01. doi: 10.1038/s41592-022-01488-1 ; PubMed Central PMCID: PMC9184281. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 82.Mirdita M, Steinegger M, Soding J. MMseqs2 desktop and local web server app for fast, interactive sequence searches. Bioinformatics. 2019;35(16):2856–8. Epub 2019/01/08. doi: 10.1093/bioinformatics/bty1057 ; PubMed Central PMCID: PMC6691333. [DOI] [PMC free article] [PubMed] [Google Scholar]
PLoS Comput Biol. doi: 10.1371/journal.pcbi.1010624.r001

Decision Letter 0

Rob J De Boer

19 Jun 2022

Dear Dr Rolland,

Thank you very much for submitting your manuscript "Optimal sequence-based design for multi-antigen HIV-1 vaccines using minimally distant antigens" for consideration at PLOS Computational Biology.

As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. In light of the reviews (below this email), we would like to invite the resubmission of a significantly-revised version that takes into account the reviewers' comments.

As you can see from both reviews, the methods used in the paper should be much better explained!

We cannot make any decision about publication until we have seen the revised manuscript and your response to the reviewers' comments. Your revised manuscript is also likely to be sent to reviewers for further evaluation.

When you are ready to resubmit, please upload the following:

[1] A letter containing a detailed list of your responses to the review comments and a description of the changes you have made in the manuscript. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

[2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file).

Important additional instructions are given below your reviewer comments.

Please prepare and submit your revised manuscript within 60 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email. Please note that revised manuscripts received after the 60-day due date may require evaluation and peer review similar to newly submitted manuscripts.

Thank you again for your submission. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,

Rob J. De Boer

Deputy Editor

PLOS Computational Biology

***********************

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: Lewitus et al describe a method to simulate HIV antigenic diversity using a hidden Markov model, intended to replicate the diversity found in individuals who produce broadly neutralising antibodies. The intended application is future vaccine design. This is naturally a potentially worthwhile direction of research and a potentially fruitful approach. Unfortunately the presented manuscript has simply too many unanswered questions for me to be able to recommend publication.

Unanswered question #1: What is the actual model here? As a reader unfamiliar with how HMMs are used in this field, the description of how the model works (L458-482) is unsatisfactory. Neither the underlying process nor the observation process are described. Step 1 is simply “a transition rate matrix was estimated”, but how was it estimated, and indeed, what are we transitioning between? Is this a model where evolution happens over time, or between subsequent sites on the genome? Nowhere in the paper is this made clear. The mathematical notation is equally confusing, e.g. “n_k is the number of occurrences” – the number of occurrences of what? Why does theta’, which is (I think) a matrix, occur multiple times as a constant in the multinomial p.m.f.? With j defined as a site and k as an amino acid, how can we ever have j=k? This is entirely incoherent. It makes it impossible for me to properly evaluate the potential strengths and weaknesses of the model. I accept that this may be one of a standard class of HMMs which I am not familiar with – but no other example is cited, and even if this is true it does not excuse the failings of the notation in this manuscript.

Unanswered question #2: What is to be made of the difference in variances in divergence and pairwise distance between the simulated and actual (multiple) sequences in figure 4? I presume that the reported p-values are for t-tests (i.e. equality of means); it seems highly unlikely that p>0.05 would be obtained if testing for equality of variances. It would seem that there is potentially much more diversity in real data than this model can readily capture in simulations. In that case, how generally useful is this?
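The distinction drawn here between equal means and equal variances can be illustrated with a minimal sketch. The two samples below are hypothetical values invented for illustration, not data from the manuscript; they share a mean but differ strongly in spread, so a test on means alone would not separate them.

```python
from statistics import mean, variance

# Hypothetical pairwise-distance values (illustration only, not from the paper).
real = [0.2, 0.8, 1.4, 2.0, 2.6]        # wide spread
simulated = [1.2, 1.3, 1.4, 1.5, 1.6]   # narrow spread, same centre

# A t-test compares means: here the gap is essentially zero.
mean_gap = mean(real) - mean(simulated)

# A variance comparison (e.g. an F-ratio) looks at spread: here it is large.
variance_ratio = variance(real) / variance(simulated)

print(f"difference in means: {mean_gap:.3f}")
print(f"ratio of sample variances: {variance_ratio:.1f}")
```

Reporting a spread statistic alongside the comparison of means would show directly whether the simulations reproduce the variability of the real sequences.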

Unanswered question #3: Is the phylogenetic pattern of one very diverse collection of sequences and one not very diverse collection typical of those samples identified as multiple founder? I would not say, on the face of it, that figures 5A or 6D look like multi-founder infections; in fact it rather looks like viruses in “founder 2” are closer to those in “founder 1” than they are to each other. So what actually links those in “founder 2” to each other? Why does that pattern not suggest that all samples are descendants of a single founder strain with a bimodal distribution of subsequent mutations? Are the authors sure that they have robustly identified infections with multiple founder strains, as the one presented example seems to beg a lot of questions? If the multiple-founder status of individual 40363 is established in another way, the question remains – is this pattern typical?

As a final comment: the use of p-values in a simulation study is always a little unsatisfactory because if a difference actually exists, any desired p threshold can be met by simply doing more simulations.
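The scaling behind this comment can be written out. As a sketch, assume that both groups being compared contain n values each and grow together with the number of simulations, that the true difference in means is \delta, and that both groups share a standard deviation \sigma. These symbols are introduced here for illustration and do not appear in the manuscript.

```latex
t \;\approx\; \frac{\delta}{\sigma\sqrt{2/n}} \;=\; \frac{\delta}{\sigma}\,\sqrt{\frac{n}{2}}
```

For any \delta \neq 0 the statistic grows in proportion to \sqrt{n}, so the p-value falls below any chosen threshold once n is large enough, whereas a standardised effect size such as \delta/\sigma does not depend on n.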

Reviewer #2: Lewitus et al describe a method for 'evolving' HIV env populations in silico from a seed sequence, where the goal is to simulate the type of within-host diversity that naturally occurs after infection with more than one founder variant. The authors propose that selected sequences from these in silico-evolved populations could be used as (multi-valent) HIV vaccine antigens, to elicit superior antibody neutralization breadth. This vaccine antigen design approach is supported by recent work by the authors, currently under review, showing that individuals with multiple founder variants were more likely to develop neutralization breadth years later. The authors conclude their results section by applying their method to "evolve" a subtype C consensus sequence, and propose that the approach could be applied to any subtype (or recombinant) consensus to generate subtype-specific vaccine antigens.

This is a novel approach that addresses an important health problem, but I have a number of concerns, which, if founded, could considerably reduce the relevance of the approach.

1. Dual-founder infections are initiated by two different (yet still closely related) sequences (as both founders are transmitted from the same donor). But, the authors do not seed their simulations with two different founders. Rather they start with a single seed, and 'evolve' it under two different rates, to produce descendant populations. These rates are based on empirical data from an individual with a dual founder infection (40363), and the approach does yield diverse descendant populations. But, the second rate, which was computed based on the diversity in the sequences that were identified as the descendants of 40363's second founder virus, seems unbelievably high. Is 40363's dataset really a "representative" one to use for this purpose? Perhaps more importantly, is this extremely high rate even accurate? Looking at the partial phylogeny of 40363's sequences (Figure 5A, bottom) I wondered if the sequences that were identified as the descendants of their second founder virus might in fact comprise two populations, namely: the *actual* descendants of the second founder virus (three sequences at the bottom of this tree) and recombinants of the first and second founders (the three divergent sequences in the middle of this tree). This indeed would produce a transition rate matrix with a very high rate. But, evolving sequences this way may not actually yield appropriate vaccine antigens.

This is worth addressing because the authors identify a "bimodal phylogenetic topology" as a key correlate of eliciting bNabs (line 376). Dual founder viruses however will produce a specific type of bimodal tree topology because the two founder sequences are different (as mentioned above there may also be recombination between descendant founders, yielding populations that look "intermediate" in a tree). But the bimodal phylogenies produced by the authors' approach could be of a very different type than 'natural' ones. The partial trees shown in 5A, and the presentation of results as tree summary statistics (where topological information is lost) is insufficient to alleviate my concern. (note: the authors do train the model on founder variants from six other participants, though it is not specified whether these also had widely different rate matrices like 40363's did, and they do present favorable summary statistics, but again no trees are shown).

To summarize, the authors should address:

- why they chose to simulate multiple founder descendant populations by applying different mutation rates to a single founder (rather than applying similar rates to distinct yet closely related founders)

- whether the vastly different rates for 40363's two founder populations are truly representative

- whether the phylogenies inferred from the simulated sequences are topologically similar to sequences sampled from real dual-variant infections.

- the potential implications of recombination in the empirical data, as well as the approach

2. Though the authors indicate that individuals with dual-founder infections are more likely to eventually produce bNAbs, this takes years because these antibodies are produced as a result of an ongoing co-evolutionary process where Abs adapt to ever-evolving immune-escape variants in vivo (and some current clinical trials are sequentially immunizing with increasingly "evolved" immunogens to recapitulate this). The authors' approach however appears to design immunogens based on the immediate descendants of the founder viruses only, but does not address the issue of time, nor this biological process of Ab/HIV co-evolution. Or, are the authors proposing that this approach could be used to "evolve" immunogens in silico to generate sequential immunogens? This question ties into my question 3d) below regarding whether this process is really a Markov chain.

3. The paper would benefit from additional explanation of key biological and computational concepts to make it more accessible to readers with diverse expertise. Specifically:

a) Can the authors define what they mean by "minimally distant antigens"? I suspect that lines 408-410 contain the necessary information, but these lines don't come until the paper's conclusion. Adding to this confusion is that the authors, in their evolution of the subtype C strain, specifically select four *maximally diverse* sequences, presumably as their candidate vaccine antigens.

b) The authors should explicitly state, early on, that multi-founder virus infections happen when more than one donor sequence is transmitted to the recipient (and therefore that these viruses, though distinct from one another, are still quite closely related). While this will be obvious to experts, failing to state this could confuse non-experts, particularly as the authors mention superinfection and within- and between- subtype HIV diversity (where the scale of this type of diversity is far, far higher than that seen in a dual- or multi-variant infection).

c) The authors should avoid using the term "founder variant" to refer to BOTH the founder variant and its descendants. Use a different term for the latter. Also, the authors should state explicitly how the founder virus sequence is inferred from its descendant populations (by taking the population consensus of early descendants)

d) It is not clear that the terms "Markov chain" and "hidden Markov model" are used appropriately. Authors state that "sequences are simulated using a Markov chain". From the figure however, it seems that a set of output sequences (i.e. s1, s2, s3, etc) are produced by applying the relevant transition matrix (F1 or F2, each with a 0.5 probability) to the seed sequence. Is this correct, or are the output sequences produced stepwise from one another (ie s3 is generated by "mutating" s2, which was generated by "mutating" s1)? If all simulated sequences are derived directly from the seed, then the use of "Markov chain" is somewhat misleading (as only the first step of the chain is ever performed, though repeatedly). Similarly, can the authors explain why they refer to the model as a *hidden* Markov model? Specifically, the founder viruses and transition matrices, which would normally be the aspects of the model that would be "hidden", are instead treated as known. Given this, can the authors clarify their use of this terminology?

e) The notation in the equations on p. 22 (step 3) is hard to follow, particularly the definitions of \Theta and \Theta'. If j is a site, k is an amino acid, and i_j is *also* a site, what does \Theta[i_j,] mean (the comma must be a typo), and what does \Theta'[i_{j=k}] mean? Then, what does \Theta[i_{j \neq k}] mean? We think what you mean is that i_j is the amino acid at site j, though the rest of the notation is still unclear.

Moreover, in step 4, the authors' statement that a new amino acid was "drawn from a multinomial probability distribution" seems unnecessarily complex. In their setting, is N ever anything other than 1, and are n_1, n_2, ..., n_k ever anything other than a tuple of 0s and 1s where exactly one element is 1? It seems that it would be simpler to say that the new amino acid would be drawn according to the transition probability matrix.
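The simplification raised in this comment follows directly from the multinomial probability mass function. As a sketch, write p_1, ..., p_K for the probabilities of the K possible amino acids and m for the residue that is drawn; K, p_k and m are labels chosen here for illustration, not the manuscript's notation.

```latex
P(n_1,\dots,n_K) \;=\; \frac{N!}{n_1!\,\cdots\,n_K!}\,\prod_{k=1}^{K} p_k^{\,n_k},
\qquad
N = 1,\; n_m = 1,\; n_{k \neq m} = 0
\;\;\Longrightarrow\;\;
P \;=\; \frac{1!}{1!\,\prod_{k \neq m} 0!}\; p_m \;=\; p_m .
```

With a single draw per site, the multinomial therefore reduces to a categorical draw in which residue m is chosen with probability p_m, which is the corresponding entry of the transition probability matrix.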

A clearer way to rewrite steps 3 and 4 may be to write \Theta_{k,l} for the transition probability of going from amino acid k to amino acid l (since j is used to denote a site). And, for each site j, make a transition with probability \pi_j, and stay put with probability 1-\pi_j. Then, if we have chosen to make a transition, we pick a new state according to the probabilities
P(transition from i_j to k) = \Theta_{i_j, k}.
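The per-site rule proposed in this comment can be sketched in a few lines of Python. This is an illustration of the reviewer's suggested formulation under stated assumptions, not the authors' implementation: the function name, the dictionary layout of the matrix and the toy inputs are all hypothetical.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def mutate_sequence(seed, pi, theta, rng):
    """Return one mutated copy of `seed`.

    pi[j]       : probability that site j makes a transition (assumed input)
    theta[a][b] : probability of moving from residue a to residue b;
                  each row is assumed to sum to 1
    """
    out = []
    for j, residue in enumerate(seed):
        if rng.random() < pi[j]:
            # Transition chosen: draw the new residue from row theta[i_j].
            row = theta[residue]
            out.append(rng.choices(list(row), weights=list(row.values()), k=1)[0])
        else:
            # Stay put with probability 1 - pi[j].
            out.append(residue)
    return "".join(out)

# Toy inputs: only site 1 can change, and K always becomes R.
seed = "MKV"
pi = [0.0, 1.0, 0.0]
theta = {a: {b: (1.0 if b == a else 0.0) for b in AMINO_ACIDS} for a in AMINO_ACIDS}
theta["K"] = {b: (1.0 if b == "R" else 0.0) for b in AMINO_ACIDS}

result = mutate_sequence(seed, pi, theta, random.Random(0))
print(result)
```

Because the toy probabilities are all 0 or 1, the outcome does not depend on the random seed; with realistic values each call would return a different variant of the seed sequence.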


f) Speaking of \Theta and \Theta', is the term "transition rate matrix" used correctly here? As I understand it, this process is a discrete time process so these are "transition probability matrices", not "rate" matrices.

g) Could the authors provide more detail on how the transition matrices are found. Are these directly computed from the empirical distribution of all mutations in each alignment, where the reference sequence is the consensus of the alignment (ie the founder virus)? If so, this should be stated directly.
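One way to read the procedure described in this question is sketched below: take the column-wise consensus of an alignment as the founder, count every consensus-to-observed residue pair, and normalise each row. Whether the authors computed their matrices this way is exactly what the comment asks, so the code is an assumption made for illustration, and the function names and toy alignment are hypothetical.

```python
from collections import Counter, defaultdict

def consensus(alignment):
    """Most common residue in each column (ties resolved by first occurrence)."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*alignment))

def transition_probabilities(alignment):
    """Row-normalised counts of consensus residue -> observed residue."""
    ref = consensus(alignment)
    counts = defaultdict(Counter)
    for seq in alignment:
        for a, b in zip(ref, seq):
            counts[a][b] += 1
    return {
        a: {b: n / sum(row.values()) for b, n in row.items()}
        for a, row in counts.items()
    }

# Toy alignment of equal-length sequences (illustration only).
alignment = ["MKV", "MKV", "MRV", "MKV"]
theta = transition_probabilities(alignment)
print(consensus(alignment))
print(theta["K"])
```

In this toy case the consensus is the unmutated sequence, and the row for K records that it was retained in three of four sequences and replaced by R in one.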

4. How could this approach be used to develop a universal vaccine for HIV-1 group M? (lines 412-413)?

5. The discussion should feature a discussion of potential caveats/limitations of the approach.

Additional comments:

1) In figure 1E, the notation P(F1 U F2)= 0.5 is confusing. I assume this means that F1 and F2 have an equal (0.5) probability of being applied to a given output sequence. If so, this should be written as P(F1)=0.5; P(F2)=0.5

2) The results (lines 140-142) state that mutations occur... "such that at each iteration, site j of the seeding sequence had a probability of mutating proportional to \pi_j to a new residue defined by \Theta, where \pi_j and \Theta were randomly drawn from either F_1 or F_2" (Fig 1E).

But, this wording suggests that each *site* in a sequence could be subjected to either the mutation rules of F1 or the mutation rules of F2, independent of the other sites in that sequence. The legend of Figure 1E suggests the same. However, this approach could not produce the output shown in Figure 2. Instead, all sites within a given sequence must be subjected to either the F1 or F2 transition matrix... (?). Please clarify at what level (i.e. sequence or site) the choice of mutation rules is made.

**********

Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code —e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: No: Code is stated as being to follow but it is not currently available

Reviewer #2: Yes

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Figure Files:

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

Data Requirements:

Please note that, as a condition of publication, PLOS' data policy requires that you make available all data used to draw the conclusions outlined in your manuscript. Data must be deposited in an appropriate repository, included within the body of the manuscript, or uploaded as supporting information. This includes all numerical values that were used to generate graphs, histograms etc. For an example in PLOS Biology see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5.

Reproducibility:

To enhance the reproducibility of your results, we recommend that you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option to publish peer-reviewed clinical study protocols. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1010624.r003

Decision Letter 1

Rob J De Boer

23 Sep 2022

Dear Dr Rolland,

Thank you very much for submitting your manuscript "Optimal sequence-based design for multi-antigen HIV-1 vaccines using minimally distant antigens" for consideration at PLOS Computational Biology. As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. The reviewers appreciated the attention to an important topic. Based on the reviews, we are likely to accept this manuscript for publication, providing that you modify the manuscript according to the review recommendations.

Please prepare and submit your revised manuscript within 30 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email.

When you are ready to resubmit, please upload the following:

[1] A letter containing a detailed list of your responses to all review comments, and a description of the changes you have made in the manuscript. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

[2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file).

Important additional instructions are given below your reviewer comments.

Thank you again for your submission to our journal. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,

Rob J. De Boer

Section Editor

PLOS Computational Biology

***********************

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: I thank the authors for the extensive work done to respond to the first round of reviews. My remaining comments are minor:

I think that the process by which a transition matrix is estimated from the Los Alamos data is either missing a citation or underdescribed (L150, L429)

On line 164 the number of polymorphisms from training alignments is described as summarised by the mean, but in the legend of figure 2G the text indicates the median was used (L204).

Also in figure 2G legend the "dashed lines" could refer just to the error bars (L204); this is a reference to the horizontal dashed lines presumably.

I don't follow the argument that the bimodal or multimodal phylogenies in figure 4B must reflect recombination. These are simulated sequences and recombination was not explicitly modelled. If the argument is that recombination in the real alignment is the cause of these patterns, or that the simulation process generates patterns that mimic recombination, then this needs fleshing out.

The x-axis is missing from figure 4D.

Reviewer #2: The manuscript by Lewitus et al is substantially improved and many of the concerns raised by reviewers have been addressed. There are still some points that would benefit from clarification, mostly related to the mathematical notation, and some additional minor suggestions to improve the MS, as follows.

Related to the mathematical notation and model:

1. line 94-

- One improvement would be to use two separate names for \pi when it comes from F_1 and F_2. It appears that F_1(\pi_j) and F_2(\pi_j) are used to denote this (is this correct?), but it would be better to make this explicit, as this notation is introduced without explanation in lines 157-160

- The authors present the steps starting at the level of a single site, but this later requires some backtracking ("actually all of these sites follow the same template laid out for the entire sequence..." [paraphrasing]). Instead, a better starting point may be "at each iteration, choose one of F_1 and F_2." Everything flows better from there, since the entire sequence is generated from that choice. (The same holds to some extent for lines 121-123.)

line 110: Is the transition matrix calculated for the overall rate, or for each position? It appears that it is the former, but the notation at line 119 seems to imply that it is per-site, which is confusing.

line 118: This is where the results could say: "At each iteration, choose one of F_1 or F_2. In this example, we will assume F_1 is chosen. Site j of the seeding sequence will have a probability of mutation proportional to F_1(\pi_j); if it mutates, it will mutate according to the transition matrix \theta".

line 119-120: Is there any reason to not simply do N/2 iterations with F_1 and then N/2 iterations with F_2? Doing so could also simplify this description.

line 145: perhaps change to "across *the 6* participants" to make it clear that these summary stats are for the 6 participants from RV217 with the multiple founder infections (i.e. those in figure 2)

lines 149-151: Suggestion to move this description of how \theta is defined to the first mention of \theta at line 110.

line 287: It is unclear how fig 5a-f show this. There is no scale on the x-axis.

line 439: Is the denominator needed? The normalization should presumably be inherent in \theta. Also, is \theta_k[k'] "the probability of a transition from k to k', assuming such a transition happens"? If so, should this be \theta_k[k'] in the numerator?

line 441: P_n has not been defined anywhere; this is just P, for this step. Perhaps you meant to say "At each step n" in line 434. This would also be a good place to say "first pick F_1 or F_2", or "we first simulate N/2 sequences based on F_1 and then N/2 sequences based on F_2".
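
As an illustration of the simulation step Reviewer #2 describes above (at each iteration choose F_1 or F_2, mutate site j with a probability given by the chosen founder's per-site value, then draw the new state from \theta), a minimal Python sketch follows. It is not the authors' code. The function names, the four-letter nucleotide alphabet, the use of the per-site values directly as probabilities, the 50/50 choice between founders, and the uniform toy matrix are all assumptions made for illustration.

```python
import random

ALPHABET = "ACGT"  # assumed nucleotide alphabet, for illustration only


def simulate_sequence(seed, site_probs, theta, rng):
    """Return one simulated sequence derived from the seeding sequence.

    Site j mutates with probability site_probs[j] (a stand-in for F(pi_j));
    if it mutates, the new state is drawn using row theta[k] of the
    transition matrix, where k is the current state at that site.
    """
    out = []
    for j, k in enumerate(seed):
        if rng.random() < site_probs[j]:
            targets = [b for b in ALPHABET if b != k]
            weights = [theta[k][b] for b in targets]
            out.append(rng.choices(targets, weights=weights, k=1)[0])
        else:
            out.append(k)
    return "".join(out)


def simulate_alignment(seed, f1_probs, f2_probs, theta, n, rng=None):
    """Simulate n sequences, choosing F_1 or F_2 once per iteration."""
    rng = rng or random.Random(0)
    seqs = []
    for _ in range(n):
        # Hypothetical choice rule: each founder is picked with probability 1/2.
        probs = f1_probs if rng.random() < 0.5 else f2_probs
        seqs.append(simulate_sequence(seed, probs, theta, rng))
    return seqs


# Toy inputs: a uniform transition matrix and two made-up per-site profiles.
toy_theta = {a: {b: (0.0 if a == b else 1 / 3) for b in ALPHABET} for a in ALPHABET}
toy_seed = "ACGTACGT"
toy_alignment = simulate_alignment(
    toy_seed, [0.1] * 8, [0.3] * 8, toy_theta, n=5
)
```

In this sketch `random.choices` treats the weights as relative, so the row of `theta` does not need to be normalised by hand; that is a property of the Python call and says nothing about whether the denominator in the manuscript's equation is required.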

Other comments:

1. The manuscript is much more clear, but a brief accessible explanation of why "minimally distant" antigens are optimal for multivalent immunogen design, as early as possible in the introduction, would also be helpful.

2. The authors' clarification in the response letter that the approach will primarily produce designs for immunogens used in *priming* vaccinations is helpful - as these are unlikely, on their own, to elicit NAb responses. However, I recommend that this is made more clear in the discussion (e.g. perhaps at the end of line 394 or concluding paragraph) as well as briefly in the abstract and/or author summary.

3. 5/6 of the training alignments are from individuals with CRF02_AE or A1, yet the model was used to produce simulated subtype C sequences (presumably as this is the most prevalent subtype worldwide). Some readers may wonder whether this may be a concern; the authors may wish to briefly address this in the discussion.

4. The colors in Figure 2 are re-used in a confusing way. e.g. - maroon is used for simulated data in 2C, D, but the legend of 2F indicates that maroon denotes training data. Similarly, blue is used in 2B, C and D to denote F2, but in 2F it denotes simulated data. MF is also the same blue color, but since this is from the pooled alignment, it may be better to use a different color entirely. It could be helpful to match Figure 3 as well.

5. It would be helpful to have slightly more information on the alignment of 172 subtype C sequences from LANL, as there appear to be far more than this since 2011. Was this restricted to single-genome amplified sequences? HIV RNA (or was DNA also allowed)? What is meant by "sequences were de-duplicated at 95% identity"?

6. recommendation to avoid using "infected" when describing people with HIV, instead use "person with HIV" or "person living with HIV" e.g. lines 42, 65, 368

7. small typos: line 42 comma should be a period. line 372: "breath" should be "breadth".
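
Comment 5 above asks what "de-duplicated at 95% identity" means. The letter does not say how the authors did it, so the following Python sketch shows only one common reading: a greedy filter over aligned sequences that drops any sequence at least 95% identical to one already kept. The function names, the handling of gap columns, and the greedy order are assumptions, not the manuscript's procedure.

```python
def identity(a, b):
    """Fraction of matching columns, ignoring columns where either sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def deduplicate(seqs, threshold=0.95):
    """Keep a sequence only if it is below the identity threshold to every kept one."""
    kept = []
    for s in seqs:
        if all(identity(s, k) < threshold for k in kept):
            kept.append(s)
    return kept
```

Under this reading the result depends on the input order, because the first member of each near-identical group is the one retained.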

**********

Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code —e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: No: As in the previous version, it is stated that code will be provided but it is not currently present

Reviewer #2: Yes

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Figure Files:

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

Data Requirements:

Please note that, as a condition of publication, PLOS' data policy requires that you make available all data used to draw the conclusions outlined in your manuscript. Data must be deposited in an appropriate repository, included within the body of the manuscript, or uploaded as supporting information. This includes all numerical values that were used to generate graphs, histograms etc. For an example in PLOS Biology see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5.

Reproducibility:

To enhance the reproducibility of your results, we recommend that you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option to publish peer-reviewed clinical study protocols. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols

References:

Review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript.

If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1010624.r005

Decision Letter 2

Rob J De Boer

3 Oct 2022

Dear Dr Rolland,

We are pleased to inform you that your manuscript 'Optimal sequence-based design for multi-antigen HIV-1 vaccines using minimally distant antigens' has been provisionally accepted for publication in PLOS Computational Biology.

Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow up email. A member of our team will be in touch with a set of requests.

Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.

IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.

Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us now if you or your institution is planning to press release the article. All press must be co-ordinated with PLOS.

Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Computational Biology. 

Best regards,

Rob J. De Boer

Section Editor

PLOS Computational Biology

***********************************************************

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1010624.r006

Acceptance letter

Rob J De Boer

17 Oct 2022

PCOMPBIOL-D-21-02212R2

Optimal sequence-based design for multi-antigen HIV-1 vaccines using minimally distant antigens

Dear Dr Rolland,

I am pleased to inform you that your manuscript has been formally accepted for publication in PLOS Computational Biology. Your manuscript is now with our production department and you will be notified of the publication date in due course.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.

Soon after your final files are uploaded, unless you have opted out, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

Thank you again for supporting PLOS Computational Biology and open-access publishing. We are looking forward to publishing your work!

With kind regards,

Anita Estes

PLOS Computational Biology | Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom ploscompbiol@plos.org | Phone +44 (0) 1223-442824 | ploscompbiol.org | @PLOSCompBiol

Associated Data

    Supplementary Materials

    S1 Fig. Highlighter plots of 6 RV217 infections with multiple founder variants.

    For each individual, a highlighter plot is constructed from sequences sampled during acute infection using the consensus as the master sequence. The number of days post-diagnosis at which each sequence was sampled is listed to the right of each plot.

    (TIF)

    S2 Fig. Phylogenies of 6 RV217 infections with multiple founder variants.

    For each individual, a phylogeny is constructed from sequences sampled during acute infection and rooted on the majority consensus sequence.

    (TIF)

    S3 Fig. Mismatched sites between seeding, training, and simulated sequences.

    Correlation plot of the percentage of mismatched non-gapped sites between the consensus of the seeding alignment, simulated alignment, and training alignment for sequences simulated under each RV217 multi-founder infection.

    (TIF)

    S1 File. Candidate antigen sequences simulated from a subtype C Env consensus sequence and trained on pooled founder alignments sampled from six multi-founder acute infections in RV217.

    (TXT)

    Attachment

    Submitted filename: Response_to_reviewers_20220815.docx

    Attachment

    Submitted filename: Response_PCOMPBIOL-D-21-02212R1_20220929.docx

    Data Availability Statement

    Sequences analyzed in this study are available in GenBank under accession numbers: MN791130–MN792579, ON959609–ON959788. The code and data generated during this study are available at https://www.hivresearch.org/publication-supplements.


    Articles from PLOS Computational Biology are provided here courtesy of PLOS
