Modeling sequence evolution in HIV-1 infection with recombination

Elena E Giorgi; Bette T Korber; Alan S Perelson; Tanmoy Bhattacharya

doi:10.1016/j.jtbi.2013.03.026

. Author manuscript; available in PMC: 2014 Jul 21.

Published in final edited form as: J Theor Biol. 2013 Apr 6;329:82–93. doi: 10.1016/j.jtbi.2013.03.026

Modeling sequence evolution in HIV-1 infection with recombination

Elena E Giorgi ^a,^b, Bette T Korber ^a,^c, Alan S Perelson ^a,^c, Tanmoy Bhattacharya ^a,^c

PMCID: PMC3667750 NIHMSID: NIHMS465784 PMID: 23567647

Abstract

Previously we proposed two simplified models of early HIV-1 evolution. Both showed that under a model of neutral evolution and exponential growth, the mean Hamming distance (HD) between genetic sequences grows linearly with time. In this paper we describe a more realistic continuous-time, age-dependent mathematical model of infection and viral replication, and show through simulations that even in this more complex description, the mean Hamming distance grows linearly with time. This remains unchanged when we introduce recombination, though the confidence intervals of the mean HD obtained ignoring recombination are overly conservative.

Keywords: HIV, population dynamics, viral evolution

1. Introduction

The theory of population genetics has been applied to a vast number of organisms, from primates to bacteria and virus. Combined with coalescent theory and phylogenetic methods, it can trace back in time the dynamics and evolution of species. Previously (Lee et al., 2009; Giorgi et al., 2010), we developed an intra-host evolutionary model for HIV-1 during early infection, and tested the genetic diversity of early HIV-1 samples against a null model of neutral evolution. The method allows one to distinguish infections that are established by a single virus from those initiated by multiple viruses. It also allows one to estimate the time since infection and the onset of immune selection. Though the model is now widely used and performs well on different datasets (Keele et al., 2008; Wood et al., 2009; Keele et al., 2009), including very large ones obtained through 454 sequencing (Fischer et al., 2010), the viral lifecycle described in Lee et al. (2009) was a simplification of the underlying biology. Here we study a more realistic model of HIV replication and how the new approach affects the results.

Following in Sewall Wright’s footsteps, Kimura (1968) used the Fokker-Planck equation to develop a continuous time model of population genetics. This was later expanded by Gillespie (1984) who developed explicit simulation frameworks for continuous time stochastic models in population genetics.

In the first part of this paper, we develop a continuous time, age-dependent model to describe the genetic drift in a growing viral population. We use a deterministic approach and derive a model using the Von Foerster equation and renewal condition with age-dependent birth and death rates. We then expand this into a completely stochastic model using the Fokker-Planck equation and follow, through simulation, the variation in genetic distances across a finite population of viral sequences. We measure genetic distances in the sample using Hamming distances (HD), defined as the number of base positions at which two sequences differ. Model simulations show that even in this fully randomized model, the growth in mean HD, averaged over multiple runs, is linear in time, and that different choices of birth and death rates only affect the rate of the growth but not this behavior.

In the second part of the paper we study how recombination affects the results. Genetic recombination takes place when a newly constructed nucleic acid molecule arises from multiple template strains. Recombination events in HIV-1 are common (Onafuwa-Nuga et al., 2009; Robertson et al., 1995; Wooley et al., 1997; Sabino et al., 1994) and result from multiple infections of the same target cell (Chen et al., 2005; Dang et al., 2004; Jung et al., 2002; Levy et al., 2009). A number of studies have indicated that the HIV recombination rate is several times larger than its mutation rate (Batorsky et al., 2011; Jetzt et al., 2000; Levy et al., 2009; Shriner et al., 2004; Suryavanshi et al., 2007; Neher et al., 2010; Zhang et al., 2010), and that recombinants reach fixation and undergo a rapid expansion in the population, thus representing an evolutionary shortcut (Ramirez et al., 2008).

Previous recombination studies have suggested that recombination affects a virus’s ability to escape CTL responses (Mostowy et al., 2011) and has a significant impact on the emergence of drug resistance (Althaus et al., 2005; Fraser, 2005; Kouyos et al., 2009; Moutouhet al., 1996; Vijay et al., 2008). In Vijay et al. (2008) the authors use a simulation to study the effects of recombination on the changes in mean Hamming distance with time. However, in the works cited above, it is assumed that the host’s immune response plays a major role in defining the fitness of the viral strains. In this paper, we address recombination in our original model of exponential viral growth with no positive selection pressure from the environment, a scenario that is relevant to early dynamics of viral growth in the acute phase of HIV infection prior to the initiation of an effective adaptive immune response. Under this scenario, we see that recombination does not affect the linearity in time of the mean HD growth, though it does change the growth of the variance of the HD distribution.

2. Results

2.1. Continuous time model of viral evolution

In Lee et al. (2009) we described two simplified models of viral evolution, where cell infection and cell death happen at fixed, discrete times. In this section we present a more realistic model based on a continuous time birth and death process. We also introduce age-dependency in the model, where age represents the time since a cell was infected. The corresponding stochastic formulation is presented in Appendix A.

As stated in Lee et al. (2009), we assume that one unique genetic strain, called the transmitted/founder virus, initiates the infection. In general, it takes some time before the host’s immune response kicks in, and since during the early phase of the infection the viral population is much smaller than the target cell population, we assume that viral evolution is initially driven by exponential growth and random accumulation of mutations at allowed sites (Ribeiro et al., 2010). Genetic diversity can also be achieved through recombination, which takes place when two or more distinct strains infect one cell and give rise to new, recombined genomes. However, when the viral population is sufficiently small with respect to the target cell population, the effect of recombination during this initial phase of the infection should be negligible. Under these hypotheses, mutations from the founder strain accumulate randomly following a Poisson distribution (Lee et al., 2009).

In all our analyses, we ignore the presence of proteins such as APOBEC, which results in hypermutated sequences and can modify the mutation rate. We have included the capability to identify and exclude APOBEC mediated hypermutation in previous implementations of our model (Giorgi et al., 2010).

Once a virion enters a cell, the HIV enzyme reverse transcriptase transcribes the viral RNA into DNA, which is then integrated into the host genome. A viral genome integrated into the DNA of the host cell is called a provirus. The time from viral entry into a target cell until the first virions start budding out (called the eclipse phase) has been estimated to last about 24 hours (Perelson et al., 1996; Markowitz et al., 2003), although in some cases the virus can lie dormant inside the cell for years. The infected cell will on average survive another 24 hours while producing virus (Markowitz et al., 2003), and during this time it will produce tens of thousands of new viral particles (Chen et al., 2007). HIV is rapidly cleared from circulation, with an average half-life of roughly 45 minutes (De Boer et al., 2010; Ramratnam et al., 1999), and on average, in early infection, between 6 and 10 virions go on to successfully infect new cells (Ribeiro et al., 2010). The number of virions that successfully infect new cells in this initial phase is called the basic reproductive ratio, usually denoted R₀.

In all that follows, we neglect the time a virus spends outside a cell and choose to follow provirus instead. We only consider mutations that may occur during reverse transcription, and neglect the extremely rare mutations that happen after integration in the cell’s genome.

Given this framework, we let I(a, g, t) be the number of infected cells that at time t are of age a (i.e. a represents the time elapsed since the cell was infected) and generation g (i.e. the genome it carries has undergone g infection cycles since the transmitted virus, with one reverse transcription event occurring at each infection). Here a and t are non-negative, real variables, and g is a non-negative integer.

We assume that both the birth rate α(a) (the number of cells infected by each cell of age a per unit time) and death rate β(a) (the number of cells of age a that die per unit time) are functions that depend only on the age of the infected cell, a, and not on time t. Notice that α(a) is in fact an infection rate. Here we choose to call it birth rate to show that the classic birth and death process provides a good infection model when neglecting the time the virus spends outside the cell. Let Λ denote the lag before new virions start budding out of the infected cell (i.e. the length of the eclipse phase) and impose that α(a) = 0 for a ≤ Λ. In the discussion that follows, we implicitly assume that all functions evaluate to zero on negative arguments.

It has been shown (Von Foerster, 1959; Nisbet et al., 1982; Huddleston et al., 1983) that the dynamics of an age-structured birth and death system is determined by the following differential equations, called the Von Foerster equation and the renewal condition, respectively:

\begin{matrix} \frac{\partial}{\partial t} I (a, g, t) + \frac{\partial}{\partial a} I (a, g, t) = - β (a) I (a, g, t) \\ I (0, g + 1, t) = \int_{0}^{t} I (a, g, t) α (a) d a \end{matrix}

(1)

with initial conditions

I (a, g, 0) = {\begin{matrix} I_{0} & if a = g = 0 \\ 0 & otherwise . \end{matrix}

(2)

The first equation in (1) describes how infected cells are lost via death, whereas the second one describes how newly infected cells are formed via births. This is a classic age-structured birth and death system (Murray, 2002; Kevorkian, 2002), with the addition of the parameter g for generation cycles. Such models are called age-maturity structured and have been described in the context of cell kinetics, where the parameter g in that case represents the number of cell divisions (Bernard et al., 2003; Zilman et al., 2010).

Let $f (a) = α (a) e^{- \int_{0}^{a} β (s) d s}$ . Notice that $e^{- \int_{0}^{a} β (s) d s}$ can be interpreted as the expected fraction of cells whose age at death is greater than a. Therefore, f(a) represents the expected rate at which an infected cell survives up to age a and starts new infections. Let f̃ be the Fourier transform of f, i.e.

\tilde{f} (k) = \int_{Λ}^{\infty} f (a) e^{ika} d a .

(3)

We assume that β(a) is a strictly positive function with lim_a_→∞ β(a) ≠ 0 and such that lim_a_→∞ f (a) = 0. It follows (see Appendix B) that a closed solution of Eq. (1) is given by

I (g, t) = \frac{I_{0}}{2 π} \int_{0}^{t} e^{- \int_{0}^{a} β (s) d s} \int_{- \infty}^{\infty} \tilde{f} {(k)}^{g} e^{- i k (t - a)} d k d a

(4)

2.2. Mean Hamming Distance

In Keele et al. (2008) we attained plasma samples from HIV-1 subjects early after infection, from which viral sequences were derived. Roughly 80% of those subjects were infected by a single viral strain, and we called those infections homogeneous. Under our model of neutral exponential growth, mutations from the founder strain accumulate randomly following a Poisson distribution due to reverse transcription errors introduced when the virus infects a cell. As mutations accumulate, the mean of the Poisson distribution grows with time. Therefore, given a sample of viral sequences taken from a homogeneously infected subject, it is possible to count the number of mutations across all pairs of sequences, fit a Poisson distribution to the frequency counts, and use the mean of the Poisson to estimate how much time has elapsed since the beginning of the infection.

The Hamming distance (HD) between two genetic sequences is defined as the number of bases at which the two differ. In our original model (Lee et al., 2009; Giorgi et al., 2010), we showed that for homogeneous infections, in the absence of selection, the mean HD is given by

E [H D] (t) = \sum_{d} d P (H D = d ∣ t) \approx \sum_{d} \frac{\sum_{g} I (g, t) P_{Binom} (d; 2 {g N}_{B}, ε)}{\sum_{g} I_{g} (t)}

(5)

where ε is the viral mutation rate per base per generation, N_B is the length of the sequences, and g is the number of generation steps since the most recent common ancestor, or MRCA. P_Binom(d; n, p) is the binomial distribution with parameters n and p:

P_{Binom} (d; n, p) = (\begin{matrix} n \\ d \end{matrix}) p^{d} {(1 - p)}^{n - d} .

Here we are neglecting the possibility of multiple mutations at the same site, as these are expected to be very rare in early HIV infection (Lee et al., 2009).

In the synchronous infection model described in Lee et al. (2009), we assume that each provirus produces R₀ new proviruses and that all infected cells are of the same generation and are born and die synchronously. We calculated Eq. (5) for $I (g, t) = R_{0}^{g}$ (synchronous model) and for a slightly more complicated, but still simplified, scenario in which cells produce virions in two synchronous bursts, as a first approximation to an asynchronous infection. In both cases we saw that the mean HD (calculated over all pairs of sequences in a given sample) grows linearly with time and is proportional to 2gεN_B. In Lee et al. (2009) we showed that the MRCA is at most a couple of generations away from the transmitted/founder strain, and it coincides with the transmitted/founder strain in most cases. Since, in HIV infections, one replication cycle is roughly two days (Markowitz et al., 2003), we can use this result to estimate the time since the MRCA in early homogeneous infections (Keele et al., 2008; Wood et al., 2009; Keele et al., 2009).

We now show that the linearity in time of the mean HD growth remains unchanged when substituting the more general expression of I(g, t) from Eq. (4) into Eq. (5). We first derive the expression of the mean HD at generation g when the death rate is constant, i.e. β(a) = β₀ for all a > 0, and the birth rate is a piecewise linear function in age of the form α(a) = 0 for all a ≤ Λ, and α(a) = α₀(a − Λ) for all a > Λ. Expanding Eq. (4) using these particular birth and death rates, one obtains

I (g, t) = I_{0} \frac{α_{0}^{g}}{(2 g)!} [{(t - Λ g)}^{2 g} - {(Λ g)}^{2 g}] e^{- β_{0} t}

(6)

Given the biological properties of HIV infection, we can impose the condition that both β and α be positive, analytic, and asymptotically constant functions, and, in addition, that α(a) be null both at 0 and at infinity. We can then extend the above derivation of I(g, t) for general death and birth rates by noting that a bounded death rate function β(a) can always be approximated with a step function (a function that is constant on a given partition), and f̃ in Eq. (4) can be calculated for a generic birth rate from Eq. (3) using the Taylor expansion of α.

Finally, we use a simulation based on equations (4) and (5) to generate I(g, t) for different choices of piecewise linear death rate functions and a piecewise linear birth rate function as defined above. For these choices of birth and death rate functions, we generate the full population of infected cells, out of which we randomly sample N₀ sequences at each day since the start of infection. At any given time t, we assume that each infected cell in I(g, t) carries one proviral strain that has undergone exactly g infection cycles from the MRCA. Therefore, for each one of the N₀ sequences, knowing how many infection cycles it has undergone, we randomly draw from a Poisson distribution the number of mutations it has accumulated from the MRCA. We then calculate the mean intersequence HD assuming a star phylogeny, in other words, for any two given sequences s₁ and s₂ in the sample, HD[s₁, s₂] = HD₀[s₀, s₁] + HD₀[s₀, s₁], where s₀ is the MRCA, and HD₀ is the Hamming distance from the MRCA. Finally, we calculate the mean HD over all pairs and average over 10,000 runs. In Figure 1 we show the results when both death and birth rates are linear functions of age. Different choices of birth and death rates do not change the linearity in growth of the mean HD, but only affect the rate at which the mean HD grows (results not shown). This result confirms what we have previously shown in Lee et al. (2009) using a discrete generation model, namely that the mean HD is proportional to 2εgN_B.

We generated I(g, t) using the continuous, age-dependent model described in Section 1. Here *N_B* = 2, 600 base pairs, α(a) = α₀(a − Λ) for a > Λ, and β(a) = β₀a. At each day we sampled N₀ = 30 sequences, calculated the mean HD and then averaged over 10,000 runs. The blue dots shot the mean HD for one particular run, the red dots the average over all runs, and the black dashed line underneath shows the mean HD calculated from the theoretical framework. The vertical dashed lines are the 95% confidence intervals.

2.3. Recombination

We now discuss the effect of recombination on the growth of the mean HD. We start with the same scenario described in Lee et al. (2009): a single infecting strain establishes infection, no selection, exponentially growing viral population, and a negligible probability of observing doubly mutated sites (we show in Lee et al. (2009) that the probability of back mutations is $\leq O (\frac{g^{2} ε^{2}}{N_{B}})$ , which is negligible in our scenario of early infections). This is an apt description of early, heterosexually transmitted, HIV infections (Keele et al., 2008), and, under this scenario, prior to the onset of selection, the HD frequency counts follow a Poisson distribution. However, while in a Poisson model the mean HD of a sample of N sequences should approximately equal the HD variance times $\frac{N}{N - 1}$ , the HD variance was 4.7% lower than expected when averaging across all homogeneous subjects described in Keele et al. (2008). This difference fell within the 95% CI and was not statistically significant. Nonetheless, given the current advances in sequencing technology and the availability of larger sample sizes (i.e. 454 sequencing (Margulies, 2005)) it’s interesting to investigate whether this difference is due to recombination, which we neglected in our original model.

We now show that in the presence of recombination the HD frequency counts no longer follow a Poisson distribution, thus explaining possible divergences between the variance and the mean. As discussed in the previous section, in the absence of recombination, our simulation results show that both the continuous and stochastic models yield the same basic results as the previous simplified discrete generation model, namely that the mean HD is proportional to 2εgN_B.

Preliminary phylogenetic analyses conducted on viral samples attained at serial time points from early, homogeneous infections suggest that the mutation rate ε may vary greatly across patients (Wallstrom et al., 2010). Therefore, in all that follows, we revert to our simplified, synchronous infection model and assume that changes in the mean HD growth rate caused by possible divergences from such a simplified model are folded into the parameter ε, which needs to be measured in each patient individually.

To allow for the possibility of multiple strains infecting the same cell and possibly giving rise to a recombinant strain, we introduce a recombination rate 0 < ρ < 1. As a simplification, at each generation we let each viral strain either undergo a mutation event or a recombination (not both) with probabilities ρ and (1 − ρ)ε respectively. The more general case can easily be solved using the same methods. We assume ρ is constant in time and, for simplicity, we also assume that the parental strains recombine at a single position θ, which is a uniformly distributed random variable between 0 and the length N_B of the genome. Given the MRCA s₀, let HD₀(s, t) be the Hamming distance of sequence s from s₀, where s was sampled at time t. For sufficiently small sample sizes, early homogeneous infections follow a star-like phylogeny and both the HD₀ and the intersequence HD follow a Poisson distribution (Lee et al., 2009; Keele et al., 2008). In particular, the expectations are related, i.e. E[HD] = 2E[HD₀]. Under these assumptions, the HD₀ probability distribution at time t + dt is given by

\tilde{P} ({H D}_{0} = d; t + d t) = (1 - ρ) P_{ε} ({H D}_{0} = d; t + d t) + ρ P_{ρ} ({H D}_{0} = d; t + d t)

(7)

where P_ε and P_ρ are the probability of a genome being at distance d from s₀ after a mutation or a recombination event, respectively. Suppose that P_ρ has mean μ_ρ and variance $σ_{ρ}^{2}$ . Similarly, we denote by μ_ε and $σ_{ε}^{2}$ the mean and variance of P_ε. Then

\tilde{μ} (t) = E ({H D}_{0} ∣ t) = (1 - ρ) μ_{ε} (t) + ρ μ_{ρ} (t)

(8)

and

{\tilde{σ}}^{2} = Var ({H D}_{0}) = σ_{ε}^{2} + 2 ρ μ_{ε} (μ_{ε} - μ_{ρ}) + ρ (μ_{2, ρ} - μ_{2, ε}) - ρ^{2} {(μ_{ε} - μ_{ρ})}^{2}

(9)

where μ_2,_ε and μ_2,_ρ indicate the second moments of P_ε and P_ρ, respectively. When ρ = 0, we get ${\tilde{σ}}^{2} = σ_{ε}^{2}$ and when ρ = 1, ${\tilde{σ}}^{2} = σ_{ρ}^{2}$ . Notice also that when μ_ε = μ_ρ one has

{\tilde{σ}}^{2} (t) = σ_{ε}^{2} (t) + ρ (μ_{2, ρ} (t) - μ_{2, ε} (t))

(10)

In what follows, we suppress the variable t when it is clear from the context. When viral genomes undergo mutation

P_{ε} ({H D}_{0} = d; t + d t) = \sum_{k = 0}^{d} \tilde{P} (d - k; t) P_{m} (k; d t)

(11)

where P_m(k; dt) is the probability of having k mutations appear in a time interval dt. Assuming the independence of P̃ and P_m, we can use the fact that cumulants add under convolution. Let dλ be the mean of the distribution P_m in the time interval dt (for example, in a discrete, synchronous model, it would be the mutation rate times the sequence length). Then, omitting O(dλ²) terms,

\begin{matrix} μ_{ε} (t + d t) & = & \tilde{μ} (t) + d λ \\ σ_{ε}^{2} (t + d t) & = & {\tilde{σ}}^{2} (t) + d λ \\ μ_{2, ε} (t + d t) & = & {\tilde{μ}}_{2} (t) + (1 + 2 \tilde{μ} (t)) d λ \end{matrix}

(12)

2.4. Recombination alone

In order to compute P_ρ, it is useful to first envision a scenario in which an initial, genetically diverse population of I₀ sequences is allowed to grow exponentially with no accumulation of mutations. In other words, we set ε = 0 and ρ = 1. Suppose the HD₀ distribution of the I₀ sequences has initial mean μ₀ and variance $σ_{0}^{2}$ . At each generation a recombination position θ is drawn randomly from a uniform distribution, with θ = 0, …, N_B, where the two parental strings split and recombine. If θ = 0, then the recombinant is exactly the first sequence, whereas if θ = N_B, the recombinant is the second sequence. Under this scenario it is easy to see that the mean Hamming distance of a population of sequences will remain unchanged from the initial μ₀, whereas the total variance in the population will decline. As we show in Appendix C, at any generation g, the expectation values of the mean HD₀, μ(g), and the estimator of the variance s²(g) are given by

μ (g) = μ_{0}

(13)

and

s^{2} (g) = \frac{N_{B} - 1}{N_{B}} μ_{0} + \frac{1}{N_{B}} σ_{0}^{2} + 2 \frac{σ_{0}^{2} - μ_{0}}{g + 2} [{(\frac{N_{B}}{N_{B} + 1})}^{g - 1} + O (N_{B}^{- 1})] \prod_{j = 0}^{g - 1} \frac{N_{j} - 1}{N_{j}}

(14)

where μ₀ and $σ_{0}^{2}$ are the mean and variance HD₀ of the initial population of I₀ sequences, N_j is the size of the population at generation j, and N_B is the total sequence length.

In order to test the formulae in (14), we devised a simulation based on the previously described synchronous infection model (Lee et al., 2009) where no site is mutated more than once (infinite site assumption). We start at generation g = 0 with I₀ sequences, each with an initial HD₀ drawn from a probability distribution with mean and variance μ₀ and $σ_{0}^{2}$ respectively (in the simulations shown in the figures we used $μ_{0} = σ_{0}^{2} = 10$ ). At each later generation we resample $I_{0} R_{0}^{g}$ sequences from the previous population of size $I_{0} R_{0}^{g - 1}$ , and introduce recombination and mutation events as described below.

We first tested the scenario where ρ = 0 and ε = 0.003 (we used a much larger value than the estimated 10⁻⁵ HIV-1 mutation rate in order to make mutations accumulate faster since by the time we reach a population size of 2¹⁰ or larger the simulation becomes computationally intensive). In the mutation only scenario, we randomly draw the number of positions to mutate from a Poisson distribution with mean εN_B, where N_B is the length of the sequences. If HD₀(s, g − 1) is the HD₀ of a sequence s at generation g − 1, and d is randomly drawn from Pois(λ = εN_B), then HD₀(s, g) = HD₀(s, g − 1)+d, and the d positions where these mutations take place are drawn randomly from a uniform distribution. Conversely, in the recombination only scenario, i.e., when ρ = 1 and ε = 0, at every generation g, for every sequence s in the sample, we draw the two parents from the previous generation and a recombination position θ uniformly and form s as the recombinant child.

In Figures 2 and 3 we show the mean HD₀ and variance HD₀ for the two scenarios respectively, averaged over 10,000 runs. The simulation is represented by the black dots, and the theoretical results are shown in the red dashed line. 95% confidence intervals are shown in gray, though in Figure 2 they are barely visible. In both scenarios the simulation results fall within the 95% confidence intervals from the theoretical expectations. Note that in Figure 3, since the fluctuations at different times are positively correlated, all points remain above the mean, following a large fluctuation in the first generation.

Comparison between theory (red dashed line) and simulation (black dots) for an exponentially growing population with ρ = 0 (no recombination) and ε = 0.003. Mean and variance of the HDs are averaged over 10,000 runs. At generation g = 0 the initial population is made of 100 sequences with HD mean and variance λ₀ = 10. Confidence intervals (±1.96 times the standard deviation over runs) are represented by vertical gray segments, which are smaller than the size of the symbols in the figure and hence are not visible.

Comparison between theory (red dashed line) and simulation (black dots) for an exponentially growing population with ρ = 1 and ε = 0 (recombination only). Mean and variance of HD are averaged over 10,000 runs. At generation g = 0 the initial population is made of 100 sequences with HD mean and variance λ₀ = 10. Confidence intervals (±1.96 times the standard deviation over runs) are represented by vertical gray segments. Errors at different time points are positively correlated.

2.5. Mutations and Recombination

In Appendix C we generalize the methods used to derive Eq. (14) and find recursive formulae for the general case when both ρ and ε are non-zero. For the mean HD₀ we obtain

μ (g) = N_{B} ε (1 - ρ) g + μ_{0} .

(15)

This shows that even in the presence of recombination, the mean HD₀ grows linearly like in the mutation only scenario. The slope of the linear growth here is diminished by a factor 1 − ρ, which reflects the fact that in our model a recombining sequence does not mutate.

Let ξ = (1 − ρ)ε, and let x and y be any given positions in the genome. In Appendix D we derive the following expressions for the HD₀ mean μ_x at position x, the HD₀ variance estimator $s_{x}^{2}$ at position x, and the HD₀ covariance estimator s_xy between any two positions x and y:

μ_{x} (g + 1) = μ_{x} (g) + ξ

(16)

s_{x}^{2} (g + 1) = \frac{N_{g} - 1}{N_{g}} s_{x}^{2} (g) + ξ (1 - ξ)

(17)

s_{x y} (g + 1) = (1 - ρ \frac{y - x}{N_{B} + 1}) \frac{N_{g} - 1}{N_{g}} s_{x y} (g) - \frac{N_{g + 1}}{N_{g + 1} - 1} ξ^{2} + ξ (ε - \frac{1}{N_{g + 1}}) - 2 ε ξ^{2} \frac{1 - ε}{N_{g + 1}}

(18)

By summing Eqs. (16), (17), and (18) across all positions x and y we get the mean HD₀ and the variance estimator s²(g) in the general case when both mutation and recombination are present. Again, we verified these theoretical derivations against numerous simulation runs. The simulation was designed as follows.

For a fixed recombination rate ρ and mutation rate ε, we first draw the number of recombination events from a binomial distribution with probability ρ. Subsequently, for each sequence s in the new generation, if s is a recombinant, we uniformly draw the two parents and a recombination position θ, and form s as the recombinant child. Else, we randomly draw a number of mutated positions from a Poisson distribution with mean εN_B, where N_B is the length of the sequences. If HD₀(s, g − 1) is the HD₀ of a sequence s at generation g − 1, and d is randomly drawn from a Poisson distribution with mean εN_B, then HD₀(s, g) = HD₀(s, g − 1) + d, and the d positions where these mutations take place are drawn randomly from a uniform distribution.

Because of computational limitations, we limited the number of generations to 6–8 and simulated 10,000 sets, then averaged the HD₀ means and variances and compared the results to the theoretical results discussed in the previous sections. Instead of simulating from the very beginning of the infection, we started at generation g = 0 with I₀ = 100 sequences, each proviral sequence with an HD₀ from the founder strain randomly drawn from a Poisson distribution with mean λ = 10. The code was written in R (R Development Core Team, 2010).

In Figures 4, 5, and 6 we compare the simulation results with the theoretical derivations. The simulation results are denoted with the black dots and the theoretical derivations with the red dashed lines. Notice that because Eq. (D.3) and Eq. (D.4) were obtained up to terms of the order N⁻², where N is the sample size, when we carry the summation over all positions to obtain the general expression for s²(g), the fluctuations become of the order N⁻¹. In Figure 6 we omit the comparison with the theoretical derivation for the first two generations. In fact, for these, the sample size is small enough that fluctuations are sizable. However, starting from g = 3, theory and simulation overlap almost perfectly.

Comparison between theory (red dashed line) and simulation (black dots) of the contribution of each position (denoted x and y, located at one third and two thirds of the genome) to the variance of HD₀ for an exponentially growing population. Here ρ = 0.5 and ε = 0.003. These results were obtained from 10,000 independent simulation runs with the same initial conditions, as described in the main text. The simulation is represented by the black dots, and the theory by the dashed red line. Confidence intervals (±1.96 times the standard deviation over runs) are represented by vertical gray segments. Notice that the errors are positively correlated.

Comparison between theory (red dashed line) and simulation (black dots) of the contribution of each position pair (denoted x and y, which here we took to be one third and two thirds of the genome) to the variance of HD₀ for an exponentially growing population. Here ρ = 0.5 and ε = 0.003. These results were obtained from 10,000 independent simulation runs with the same initial conditions, as described in the main text. The simulation is represented by the black dots, and the theory by the dashed red line. Confidence intervals (±1.96 times the standard deviation over runs) are represented by vertical gray segments.

Comparison between theory (red dashed line) and simulation (black dots) of the mean and variance HD₀ for an exponentially growing population with ρ = 0.5 and ε = 0.003. The left panel represents the mean, and the right panel the variance, both averaged over 10,000 runs. Like before, at generation g = 0 the initial population is made of 100 sequences with HD mean and variance λ₀ = 10. The simulation is represented by the black dots, and the theory by the dashed red line. Confidence intervals (±1.96 times the standard deviation over runs) are represented by vertical gray segments. In the right panel, the first few generations are omitted. The theory we developed neglects early stochastic events, which are of the order $\frac{1}{N_{g}}$ , where *N_g* is the population size at generation g. Hence, we find that only when the population size is large enough, do theory and simulation overlap.

We performed a simulation with parameters taken from the early homogeneous sample SUMA, previously described in Keele et al. (2008) and Lee et al. (2009) (Figure 7). This time the initial population consisted of a single infecting strain and, at each generation, we sampled N = 35 sequences, which was the sample size of the original alignment. Under this particular scenario, on average, the HD variance was lower than the mean HD. This was consistent with what observed across all homogeneous patients in Keele et al. (2008), for whom the HD variance was, on average, 4.7% lower than what expected from a Poisson model. According to our original Poisson model, SUMA was estimated to be 5 generations into the infection, which was consistent with a Fiebig stage II. In Figure 7 we show that adding recombination to the model does indeed cause the HD frequency counts to no longer follow a Poisson distribution, but the effect is so small that the estimate attained through our original model still remains valid.

Simulation runs based on the patient SUMA (Keele et al., 2008). Assuming a single strain initiated the infection, at every generation we sampled N = 35 sequences, calculated the HD mean and HD variance, and then repeated 10,000 times. The top panels represent the simulation run in the absence of recombination, the bottom panels represent the simulation when recombination was present (ρ = 2 × 10⁻⁵, as estimated in Neher et al. (2010)). The red dots represent the HD mean (left) and HD variance (right) averaged over 10,000 runs. The black vertical bars represent the 95 % CIs. Finally, the purple diamond denotes SUMA’s HD mean (left) and HD variance (right). As one can see, the effect of recombination is so small that both models fit the data well, thus validating our previous approach.

3. Conclusions

We developed a model to study the rates of evolution and genetic diversification in acute HIV-1 infection. We used the model to examine the effects of recombination in acute HIV infection sampled prior to the onset of selection and during the exponential growth of the viral population. Previously, we neglected recombination and used a simplified model of infection where all infected cells produced virions and died synchronously (Lee et al., 2009). Using this model we showed that the mean intersequence HD grows linearly with time as E(HD) ∝ 2εN_Bg, where ε is the mutation rate, N_B is the length of the sequences, and g is the generation number. We applied this model to homogeneously infected subjects (Keele et al., 2008; Wood et al., 2009) and, using the linearity in time of the mean HD, for each homogeneously infected subject, were able to estimate the time since the most recent common ancestor.

In this paper, we explored a more complex model of HIV replication. This was motivated by the fact that in general, our estimates in Keele et al. (2008) were well correlated with the time since the infection obtained from clinical data. However, in a Poisson model, we expect the mean HD₀ to be equal to the variance of the HD₀ times $\frac{N - 1}{N}$ , where N is the sample size. Even though not statistically divergent from a Poisson distribution, all homogeneous subjects presented in Keele et al. (2008) had a variance HD that was on average 4.7% lower than the expected theoretical value had the distribution been a Poisson. Here we showed that even in an age-dependent continuous model of infection, the mean HD₀ still grows linearly with time, and this continues to hold in a fully stochastic model. While recombination did not affect the linearity of the mean HD₀ growth, in the presence of recombination the HD₀ distribution is no longer Poisson. In this paper we modeled the new dependency of the variance HD₀ with time, as it no longer equals that of the mean HD₀. We also showed how this new framework helps explain some of the divergence between mean and variance HD in homogeneous samples previously described in Keele et al. (2008). Furthermore, because the effect is small, our work validates our previous methods that estimate the time of infection based on a Poisson model.

This work suggests that our previous, simplified model (Lee et al., 2009) provides reasonable estimates of the time and number of generations since a founder strain initiates the infection even in the presence of in vivo recombination. It is also robust against instances of in vitro recombination (Salazar-Gonzales et al., 2008), which becomes a relevant issue particularly when using 454 deep sequencing (Fischer et al., 2010). For 454 samples in particular, where sample sizes can reach the tens of thousands, our method provides a robust alternative to computationally intense methods such as those employed in the software package BEAST (Drummond et al., 2007).

Finally, we notice that the mathematical framework developed here can be used in cell division models where it is relevant to keep track of the number of divisions cells undergo (generations) (Zilman et al., 2010; Bernard et al., 2003).

Highlights.

We describe a continuous time, age-dependent model of early HIV infection.
We show that the new model yields the same results as a simplified model we published in 2009.
We introduce recombination in the model.
We show how recombination affects the results compared to our previous model.

Acknowledgments

This work was supported by the US Department of Energy through the LANL LDRD program, the Center for HIV/AIDS Vaccine Immunology, and NIH grants U19-AI067854-07, UM1-AI100645-01, AI028433, and OD011095.

Appendix A. Stochastic model

We present a stochastic formulation of the continuous model presented in Section 2.1. In all that follow we use the same mathematical framework developed for birth and death stochastic processes, with the added parameter g for generation. To keep the discussion more general, we will use the word “births” instead of “new infections.” Clearly, when applied to the infection model described in this paper, new births are newly infected cells.

Denote U = ℝ₊ × ℤ₊ the space of all pairs u = (a, g), as defined in Section 2. We define a state of the system as a function I: U → ℤ₊ that associates to every u ∈ U a positive integer I(u) = I(a, g), the number of infected cells of age a and generation g.

Let Inline graphic be the space of all functions I: ℝ₊ × ℤ₊ → ℤ₊. At any given time t ∈ ℝ₊, and for any given state I ∈ , let P (I; t) be the probability of the system being described by the state I at time t. Even though P: × ℝ₊ →[0, 1], for simplicity we omit the dependency in time and consider P as an element in S( Inline graphic ), where S( ) = { → [0, 1]}, the space of all operators from into [0, 1].

To better illustrate the derivation, we first show how to obtain the master equation discretized in time increments Δt. Define Δ: ℝ₊ × Inline graphic → as follows. Given τ ∈ ℝ₊, I ∈ , and (a, g) ∈ U, then Δ_τ I(a, g) = I(a + τ, g) − I(a, g). From the above notice that for every I ∈ and every (a, g) ∈ U, (I + Δ_τ I)(a, g) = I(a + τ, g). Δ_τ I represents the difference between state I at time t + τ and state I at time t when cells age only (e.g. no deaths nor births). It follows that, given I_t₊_τ, the state at time t is I_t₊_τ − Δ_τ I.

We also define the Kroenecker Delta operator as δ: U → Inline graphic as follows. Given u₀ ∈ U, δ_u₀∈ is defined so that for every u ∈ U such that u ≠ u₀, δ_u₀ (u) = 0 and δ_u₀(u₀) = 1.

Suppose now that at a given time t, the system is completely described by the state I. We wish to compute the probability P(I, t + Δt). In other words, we want to see how the system evolves after an increment in time of Δt. There are three possible contributions to the term P(I, t + Δt). In what follows we assume that for every age a and generation g, out of all infected cells I(a, g) only one can die or be born at the time. Hence, the contributions are:

No deaths, no births, cells can only survive. The contribution in this case is
$\begin{matrix} P (I - Δ_{Δ t} I; t) \prod_{u \in U} {(1 - β (u) Δ t)}^{(I - Δ_{Δ t} I) (u)} \\ \prod_{u \in U} {(1 - α (u) Δ t)}^{(I - Δ_{Δ t} I) (u)} \end{matrix}$ (A.1)
One cellu₀ dies. The contribution (summed over all possible u₀ ∈ U ) is
$\sum_{u_{0} \in U} P (I + δ_{u_{0}}; t) (\prod_{u \neq u_{0}} {(1 - β (u) Δ t)}^{I (u)}) β (u_{0}) Δ t (I (u_{0}) + 1)$ (A.2)
One cellu₀ is born. The contribution (summed over all possible u₀ ∈ U ) is
$\begin{matrix} \sum_{u \in U} P (I - δ_{u_{0} = (0, g + 1)}; t) (\prod_{u} {(1 - β (u) Δ t)}^{I (u)}) \\ α (u) Δ t (I (u) - δ_{u_{0} = (0, g + 1)}) \end{matrix}$ (A.3)

We expand each of the above terms in Δt and neglect terms in O(Δt²) to obtain, for each piece respectively:

$P (I - Δ_{Δ t} I; t) (1 - \sum_{u \in U} (I - Δ_{Δ t} I) (u) [β (u) + α (u)] Δ t)$ (A.4)
$\sum_{u \in U} P (I + δ_{u}; t) (I (u) + 1) β (u) Δ t$ (A.5)
$\sum_{u \in U} P (I - δ_{u_{0} = (0, g + 1)}; t) (I (u) - δ_{u_{0} = (0, g + 1)}) α (u) Δ t$ (A.6)

The first order master equation of the process, discretized in time, is therefore

\begin{matrix} \frac{P (I; t + Δ t) - P (I - Δ_{Δ t} I; t)}{Δ t} = \\ = - P (I - Δ_{Δ t} I; t) (\sum_{u \in U} (I - Δ_{Δ t} I) (u) [β (u) + α (u)]) + \\ + \sum_{u \in U} P (I + δ_{u}; t) (I (u) + 1) β (u) + \\ + \sum_{u \in U} P (I - δ_{(0, g + 1)}; t) (I (u) - δ_{u_{0} = (0, g + 1)}) α (u) \end{matrix}

(A.7)

To obtain the continuous master equation we have to take the limit for Δt → 0 on both sides of Eq. (A.7). In order to do so, we use an approximation that is a direct consequence of the definition of functional derivative (Gelfand et al., 1963): given x, y ∈ Inline graphic and f ∈ S( ), the following holds

f (x + y) ≃ f (x) + \int_{U} y (u) \frac{\partial f}{\partial x (u)} d u

(A.8)

Using the above, the left-hand side of Eq. (A.7) becomes

\frac{\partial P}{\partial t} - \int_{U} \frac{\partial I}{\partial a} (u) \frac{\partial P}{\partial I (u)} d u

(A.9)

Notice that in the absence of births and deaths (i.e. α = β = 0), the above reduces to

\frac{\partial P}{\partial t} (I; t) - \int_{U} \frac{\partial I}{\partial a} (u) \frac{\partial P}{\partial I (u)} (I; t) d u = 0

(A.10)

which describes the survival of all cells present at time t = 0.

In order to expand the right-hand side (RHS) of Eq. (A.7), we collect the terms in β together, and similarly for the terms in α. We approximate to the first order in Δt and let Δt → 0.

RHS (β) = P (I; t) (\sum_{u} I (u) β (u) + \sum_{u} β (u) - \sum_{u} I (u) β (u)) + \sum_{u} (\frac{\partial P}{\partial I (u)} (I)) I (u) β (u) + o (Δ t)

(A.11)

In the limit the above becomes

P (I; t) \int_{U} β (u) d u + \int_{U} β (u) \frac{\partial P}{\partial I (u)} I (u) d u

(A.12)

By expanding RHS(α) in a similar way, one obtains the continuous master equation

\begin{matrix} \frac{\partial P}{\partial t} (I; t) = \int_{U} \frac{\partial I}{\partial a} (u) \frac{\partial P}{\partial I (u)} (I; t) d u = \\ = \int_{U} μ (u) [P (I; t) + \frac{\partial P}{\partial I (u)} (I; t) I (u)] d u + \\ - \int_{\partial U} α (u) [P (I; t) - \frac{\partial P}{\partial I (u)} (I; t) I (u)] d u + \\ - \int_{U} \frac{\partial P}{\partial I (0, g + 1)} (I; t) I (u) α (u) d u \end{matrix}

(A.13)

where ∂U = {u = (a, g) ∈ U|a = 0, g > 0}. In all that follows we assume that newborn cells cannot give birth, which implies that α_|_∂U = 0. As a consequence, the second term on the RHS is null. We now use the following two claims to re-write the above equation.

Claim Appendix A.1

Let u ∈ U. We define a map S( Inline graphic ) → S( ) that to every P ∈ S( ) associates a P̃_u ∈ S( ) so defined: P̃_u(I) = I(u)P (I). Then

\frac{\partial {\tilde{P}}_{u} (I)}{\partial I (u)} = P (I) + I (u) \frac{\partial P (I)}{\partial I (u)}

(A.14)

Proof

Given ε > 0 and u₀ ∈ U, define δ_u₀,ε ∈ Inline graphic as the following Kroencker delta: δ_u₀,ε(u) = ε δ(u − u₀). Then

{\tilde{P}}_{u} (I + δ_{u, ε}) - {\tilde{P}}_{u} (I) = (I + δ_{u, ε}) (u) P (I + δ_{u_{0}, ε}) - I (u) P (I) = δ_{u, ε} (u) P (I + δ_{u_{0}, ε}) + I (u) [P (I + δ_{u, ε}) - P (I)]

(A.15)

By dividing by ε and letting ε → 0 we get the assert.

Claim Appendix A.2

Given u₀ ≠ u, we have

\frac{\partial {\tilde{P}}_{u} (I)}{\partial I (u_{0})} = I (u) \frac{\partial P (I)}{\partial I (u_{0})}

(A.16)

As a consequence, we can rewrite the master equation as follows

\begin{matrix} \frac{\partial P}{\partial t} (I; t) - \int_{U} \frac{\partial I}{\partial a} (u) \frac{\partial P}{\partial I (u)} (I; t) d u \\ = \int_{U} \frac{\partial}{\partial I (u)} [I (u) P (I; t)] μ (u) d u \\ - \int_{U} \frac{\partial}{\partial I (u_{0})} [I (u) P (I; t)] α (u) d u \end{matrix}

(A.17)

where u₀ = (0, g + 1).

Eq. (A.17) reduces to well known results when the birth and death rates are both constant, in which case, if we integrate both sides over the domain Inline graphic = {I ∈ |∫_U I = N_t}, where N_t is the total number of cells at time t, and N₀ is the initial number of cells. we obtain

\frac{\partial P}{\partial t} (N_{t}; t) = β [(N_{t} + 1) P (N_{t} + 1; t) - N_{t} P (N_{t}; t)] + α [(N_{t} - 1) P (N_{t} - 1; t) - N_{t} P (N_{t}; t)]

(A.18)

which is the master equation of the classic linear birth and death process. This is fully discussed in Reichl (1998), where a solution is given in terms of the generating function F(z; t) = Σ_{N_t}P (N_t; t) z^N_t by

F (z; t) = {[\frac{β (z - 1) e^{(α - β) t} - α z + β}{α (z - 1) e^{(α - β) t} - α z + β}]}^{N_{0}} .

(A.19)

Here N₀ is the initial number of infected cells.

Appendix B. Continuous model, closed form solution

We now develop the closed form solution to Eq. (1). One can verify that, given a smooth function ε(g, x) such that ε(g, x) = 0 when x < 0, then

I (a, g, t) = ϕ (g, t - a) e^{- \int_{0}^{a} β (s) d s}

(B.1)

satisfies the first equation in (1). Substituting into the renewal condition, one gets that ε should also satisfy

ϕ (g + 1, t) = \int_{Λ}^{\infty} α (a) e^{- \int_{0}^{a} β (s) d s} ϕ (g, t - a) d a .

(B.2)

Given the initial population of I₀ cells, at any given time t, the only cells at g = 0 are the survivors from the initial population. In other words:

I (a, 0, t) = {\begin{matrix} I_{0} e^{- \int_{0}^{a} β (s) d s} & if a = t \\ 0 & otherwise . \end{matrix}

(B.3)

It follows that for g = 0, ε is completely determined by ε(0, t − a) = δ(t − a)I₀, where δ(t − a) is the singular density. When g = 1

I_{0} \int_{0}^{\infty} f (a) δ (a - t) d a = I_{0} f (t) = ϕ (1, t)

(B.4)

and therefore one can show recursively that for g ≥ 1

ϕ (g, t) = I_{0} \int_{0}^{\infty} \dots \int_{0}^{\infty} (\prod_{i = 1}^{g} f (a_{i}) {d a}_{i}) δ (\sum_{i = 1}^{g} a_{i} - t) .

(B.5)

By reordering the above and substituting f with its Fourier transform f̃ one obtains

\begin{matrix} ϕ (g, t) = I_{0} \int_{0}^{\infty} \dots \int_{0}^{\infty} \int_{- \infty}^{\infty} e^{i k (\sum a_{i} - t)} \frac{d k}{2 π} (\prod_{i = 1}^{g} f (a_{i}) {d a}_{i}) = \\ = \frac{I_{0}}{2 π} \int_{- \infty}^{\infty} e^{- ikt} (\prod_{i = 1}^{g} \int_{0}^{\infty} f (a_{i}) e^{{ika}_{i}} {d a}_{i}) d k = \\ = \frac{I_{0}}{2 π} \int_{- \infty}^{\infty} e^{- ikt} \tilde{f} {(k)}^{g} d k \end{matrix}

(B.6)

Substituting Eq. (B.6) in Eq. (B.1), one gets Eq. (4):

I (g, t) = \frac{I_{0}}{2 π} \int_{0}^{t} e^{- \int_{0}^{a} β (s) d s} \int_{- \infty}^{\infty} \tilde{f} {(k)}^{g} e^{- i k (t - a)} d k d a

(B.7)

which reduces the problem to a quadrature.

Appendix C. HD variance for an exponentially growing population in the presence of recombination alone

We now illustrate how to obtain expressions for the mean HD₀ and the expected value of the variance of HD₀ as functions of generation g in the case where the population grows exponentially, the mutation rate is ε = 0, and the recombination rate is ρ = 1.

For every position x between 1 and N_B, we compute both the mean HD₀ and the HD₀ variance at that position across all sequences, and for every pair of positions x and y, we will compute the covariance across all sequences, and finally we will obtain the total mean HD₀ and HD₀ variance by summing across all positions.

Since there are no new mutations, at any generation g, the mean HD at d position x is given by ${\bar{H D}}_{x} (g) = \frac{d}{N_{B}}$ , and hence

μ (g) = \sum_{x = 1}^{N_{B}} {\bar{H D}}_{x} (g) = d .

(C.1)

An unbiased estimator of the variance at position x is given by

s_{x}^{2} (g) = \frac{N_{g}}{N_{g} - 1} (\bar{H D_{x}^{2}} - {\bar{H D}}_{x}^{2}),

(C.2)

Here, for simplicity of notation, we have omitted the dependency in g. So, if $s_{x}^{2} (g + 1)$ is the expected variance at generation g + 1, summing over all sequences s we obtain

s^{2} (g + 1) = \frac{1}{N_{g}} (\sum_{s} {({H D}_{x} - \frac{1}{N_{g}} \sum_{s^{'}} {H D}_{x})}^{2}) = \frac{N_{g} - 1}{N_{g}} s^{2} (g) .

(C.3)

This is not quite the total variance as we still need to add in the contribution from the covariance between every pair of positions x and y.

To understand how the expected covariance s_x_,_y between any two sites x and y changes with generation g, we make the following observation: if the split position θ falls between the two positions, then x and y will be inherited independently to the next generation. If, on the other hand, θ does not split the two positions, then the new covariance will be a random variate with expectation the second cumulant of the covariance at the previous distribution. In order to use this observation, we split the set of sequences in two groups: the ones derived from two parent sequences which recombined somewhere between x and y, and those derived instead from a recombination split outside x and y.

Let N₁ be the number of sequences in the first group, and N₂ the number of sequences in the second group, so that N₁ + N₂ = N_g (when clear from the context, we omit the dependency in g for simplicity of notation). We define $w_{i} = E (\frac{N_{i}}{N_{g}})$ and $κ_{1 x}^{i}$ and $κ_{11}^{i}$ the cumulants of the HD in the i-th group at a fixed generation g. An unbiased estimator of the covariance is given by

s_{x y} (g) = \frac{1}{N_{g} - 1} \sum_{s \in N_{g}} ({H D}_{x} - {\bar{H D}}_{x}) ({H D}_{y} - {\bar{H D}}_{y})

(C.4)

where N_g is the total number of sequences at generation g, HD_x and HD_y are the Hamming distances at positions x and y respectively, and ${\bar{H D}}_{x}$ and ${\bar{H D}}_{y}$ their means. By splitting the summation over sequences in N₁ and sequences in N₂, one gets

s_{x y} (g) = \sum_{i = 1, 2} \frac{N_{i} - 1}{N_{g} - 1} s_{i} + \sum_{i = 1, 2} \frac{N_{i}}{N_{g} - 1} ({\bar{H D}}_{x, i} - {\bar{H D}}_{x}) ({\bar{H D}}_{y, i} - {\bar{H D}}_{y})

(C.5)

where s_i is the variance over subset N_i, and ${\bar{H D}}_{x, i}$ and ${\bar{H D}}_{y, i}$ are the means at positions x and y respectively within group N_i. Taking expectations on both sides of the above equation, one obtains

E (s_{x y}) ∣_{g + 1} = \sum_{i = 1, 2} w_{i} κ_{11}^{i} + \frac{N_{g + 1}}{N_{g + 1} - 1} [\sum_{i = 1, 2} w_{i} κ_{1 x}^{i} κ_{1 y}^{i} - (\sum_{i = 1, 2} w_{i} κ_{1 x}^{i}) (\sum_{j = 1, 2} w_{j} κ_{1 y}^{j})] .

(C.6)

Notice that the first term on the right hand side is the weighted average of the $κ_{11}^{i}$ ’s, and the second one is the weighted covariance between $κ_{1 x}^{i}$ and $κ_{1 y}^{i}$ .

The first cumulants are the mean of the corresponding distribution, hence $κ_{1 x}^{i} = κ_{1 y}^{i} = d$ at all times, whereas with the above definitions of N₁ and N₂, we get $κ_{11}^{1} = 0$ (the split position separates x and y and hence the number of mutations at each site become independent variables), and $κ_{11}^{2} = \frac{N_{g} - 1}{N_{g}} E (s_{x y})$ . Furthermore, $w_{2} = (1 - \frac{y - x}{N_{B} + 1})$ . Therefore, substituting all of the above into Equation (C.6), we obtain

E (s_{x y}) (g + 1) = (1 - \frac{y - x}{N_{B} + 1}) \frac{N_{g} - 1}{N_{g}} E (s_{x y}) (g)

(C.7)

By adding across all positions we get

\begin{matrix} s^{2} (g) = \sum_{x = 1}^{N_{B}} s_{x}^{2} (g) + 2 \sum_{x < y} s_{x y} (g) = \\ = \prod_{j = 0}^{g - 1} \frac{N_{j} - 1}{N_{j}} [\sum_{x = 1}^{N_{B}} s_{x}^{2} (0) + 2 \sum_{x < y} {(1 - \frac{y - x}{N_{B} + 1})}^{g} s_{x, y} (0)] \end{matrix}

(C.8)

With these recursive relations, we have reduced the problem to computing the initial quantities $s_{x}^{2} (0)$ and s_x_,_y(0). Consider the initial population to be made of I₀ sequences such that the HD₀ of any given sequence s from the founder strain s₀ is a random variable from a probability distribution P with mean and variance μ₀ and σ₀ respectively. Then the moment generating function M of the HD₀ distribution is given by

M (u_{1}, \dots, u_{N_{B} - 1}) = {[\frac{1}{N_{B}} (1 + \sum_{i = 1}^{N_{B} - 1} e^{u_{i}})]}^{μ_{0} N_{B}^{- 1}} .

(C.9)

Here N_B is the length of the genomes, and u₁, … u_{N_B−1} are independent variables representing the positions in the genome (there’s a total of N_B positions, but only N_B − 1 are independent due to the constraint that the initial mean is μ₀). Taking the first and second derivatives, we find

s_{x}^{2} (0) = \frac{N_{B} - 1}{N_{B}^{2}} μ_{0} + \frac{1}{N_{B}^{2}} σ_{0}^{2}

(C.10)

and

s_{x y} (0) = \frac{1}{N_{B}^{2}} (σ_{0}^{2} - μ_{0})

(C.11)

It’s easy to verify that summing the above two equations over all positions, one gets the initial variance $σ_{0}^{2}$ .

Combining equations (C.8), (C.10), and (C.11), we get the expression for the total HD variance at generation g > 0:

\begin{matrix} σ_{ρ}^{2} (g) = d (1 - \frac{1}{N_{B}}) - 2 \frac{d}{N_{B}^{2}} \sum_{x < y \leq N_{B}} {(1 - \frac{y - x}{N_{B} + 1})}^{g} \prod_{j = 0}^{g - 1} \frac{N_{j} - 1}{N_{j}} = \\ = d (1 - \frac{1}{N_{B}}) - 2 \frac{d}{g + 2} [{(\frac{N_{B}}{N_{B} + 1})}^{g - 1} + O (N_{B}^{- 1})] \prod_{j = 0}^{g - 1} \frac{N_{j} - 1}{N_{j}} \end{matrix}

(C.12)

where we have used the following relationship:

2 \sum_{x = 1}^{N_{B} - 1} \sum_{y = x + 1}^{N_{B}} {(1 - \frac{y - x}{N_{B} + 1})}^{g} = \frac{2}{{(N_{B} + 1)}^{g}} \sum_{k = 1}^{N_{B} - 1} k {(k + 1)}^{g} = \frac{N_{B}^{g + 1}}{(g + 2) {(N_{B} + 1)}^{g - 1}} + O (N_{B}) .

(C.13)

Appendix D. HD variance for an exponentially growing population when both recombination and mutation rate are non-null

We now extend the methods used to derive Eq. (14) and find recursive formulae for the general case when both ρ and ε are non-null. Eq. (C.6) can be generalized to an indefinite number of partitions N_i such that Σ_i N_i = N_g, the total number of sequences at generation g. We use this to find analogous recursive formulae for the cumulants of the mean HD₀ distribution when both recombination and mutation events are present.

Fix positions x and y and choose N₁ to be the number of sequences derived from a recombination event that split positions x and y; N₂ the number of sequences derived from either a recombination event that did not split positions x and y, or from a division with no mutation; and, finally, denote N_k, with k = 3, 4 or 5 the number of sequences such that only the x position has mutated, only the y position, or both, respectively. Like before, let $w_{i} = E (\frac{N_{i}}{N_{g}})$ . Assume that mutations happen with a rate ε (per site, per generation), and recombination events happen with a rate ρ. Given this set-up, we notice that

E (w_{j}) = {\begin{matrix} \frac{y - x}{N_{B} + 1} ρ & if & j = 1 \\ (1 - \frac{y - x}{N_{B} + 1}) ρ + (1 - ρ) (1 - ε) & if & j = 2 \\ ε (1 - ρ) (1 - ε) & if & j = 3 or 4 \\ (1 - ρ) ε^{2} & otherwise . \end{matrix}

(D.1)

Let ξ = ε(1 − ρ). Carrying out similar computations as sketched in the previous section, one notices that, for every positions x and y

μ_{x} (g + 1) = μ_{x} (g) + ξ

(D.2)

s_{x}^{2} (g + 1) = \frac{N_{g} - 1}{N_{g}} s_{x}^{2} (g) + ξ (1 - ξ)

(D.3)

s_{x y} (g + 1) = (1 - ρ \frac{y - x}{N_{B} + 1}) \frac{N_{g} - 1}{N_{g}} s_{x y} (g) - \frac{N_{g + 1}}{N_{g + 1} - 1} ξ^{2} + ξ (ε - \frac{1}{N_{g + 1}}) - 2 ε ξ^{2} \frac{1 - ε}{N_{g + 1}}

(D.4)

By summing Eq. (D.2) over all positions x one obtains Eq. (15). Similarly, by summing Eq. (D.3) and Eq. (D.4) across all positions, and noticing that we can still use the same initial values we used in Appendix C, one obtains the general expression for s²(g).

Footnotes

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

References

Althaus CL, Bonhoeffer S. Stochastic interplay between mutation and recombination during the acuisition of drug resistance mutations in human immunodeficiency virus type 1. J Virol. 2005;79:13572–8. doi: 10.1128/JVI.79.21.13572-13578.2005. [DOI] [PMC free article] [PubMed] [Google Scholar]
Batorsky R, Kearney MF, Palmer SE, Maldarelli F, Rouzine IM, Coffin JM. Estimate of effective recombination rate and average selection coefficient for HIV in chronic infection. Proc Natl Acad Sci USA. 2011;108:5661–6. doi: 10.1073/pnas.1102036108. [DOI] [PMC free article] [PubMed] [Google Scholar]
Bernard S, Pujo-Menjouet L, Mackey MC. Analysis of cell kinetics using a cell division marker: mathematical modeling of experimental data. Biophys J. 84:3414–24. doi: 10.1016/S0006-3495(03)70063-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
Chen J, Dang Q, Unutmaz D, Pathak VK, Maldarelli F, Powell D, Hu WS. Mechanisms of nonrandom human immunodefficiency virus type 1 infection and double infection: preference in virus entry is important but is not the sole factor. J Virol. 2005;79:4140–49. doi: 10.1128/JVI.79.7.4140-4149.2005. [DOI] [PMC free article] [PubMed] [Google Scholar]
Chen HY, DiMascio M, Perelson AS, Ho DD, Zhang L. Determination of virus burst size in vivo using a single-cycle SIV in rhesus macaques. Proc Natl Acad Sci USA. 2007;104:19079–84. doi: 10.1073/pnas.0707449104. [DOI] [PMC free article] [PubMed] [Google Scholar]
Dang Q, Chen J, Unutmaz D, Coffin JM, Pathak VK, Powell D, KewalRamani VN, Maldarelli F, Hu WS. Nonrandom HIV-1 infection and double infection via direct and cell-mediated pathways. Proc Natl Acad Sci USA. 2004;101:632–37. doi: 10.1073/pnas.0307636100. [DOI] [PMC free article] [PubMed] [Google Scholar]
De Boer RJ, Ribeiro RM, Perelson AS. Current estimates for HIV-1 production imply rapid viral clearance in lymphoid tissues. PLoS Comput Biol. 2010;6:e1000906. doi: 10.1371/journal.pcbi.1000906. [DOI] [PMC free article] [PubMed] [Google Scholar]
Drummond AJ, Rambaut A. BEAST: Bayesian evolutionary analysis by sampling trees. BMC Evol Biol. 2007;7:214–23. doi: 10.1186/1471-2148-7-214. [DOI] [PMC free article] [PubMed] [Google Scholar]
Fiebig EW, Wright DJ, Rawal BD, Garrett PE, Schumacher RT, et al. Dynamics of HIV viremia and antibody seroconversion in plasma donors: implications for diagnosis and staging of primary HIV infection. AIDS. 2003;17:1871–79. doi: 10.1097/00002030-200309050-00005. [DOI] [PubMed] [Google Scholar]
Fischer W, Bhattacharya T, Keele BF, Giorgi EE, Hraber PT, Perelson AS, Shaw GM, Korber BT, et al. Rapid mutational escape from cytotoxic T-cell responses in acute HIV-1 infection—an ultra-deep view. PLoS ONE. 2010;5:e12303. [Google Scholar]
Fraser C. HIV recombination: what is the impact on antiretroviral therapy? J R Soc Interface. 2005;2(489) doi: 10.1098/rsif.2005.0064. [DOI] [PMC free article] [PubMed] [Google Scholar]
Gelfand Fomin SV. Calculus of Variations. Englewood Cliffs, NJ: 1963. [Google Scholar]
Giorgi EE, Funkhouser B, Athreya G, Perelson AS, Korber BT, Bhattacharya T. Estimating time since infection in early homogeneous HIV-1 samples using a Poisson model. BMC Bioinformatics. 2010;11:532–9. doi: 10.1186/1471-2105-11-532. [DOI] [PMC free article] [PubMed] [Google Scholar]
Gillespie JH. The status of the neutral theory. Science. 1984;224:732–33. doi: 10.1126/science.224.4650.732. [DOI] [PubMed] [Google Scholar]
Huddleston JV. Population Dynamics with age- and time-dependent birth and death rates. Bull of Math Biol. 1983;45:827–36. [Google Scholar]
Jetzt AE, Yu H, Klarmann GJ, Ron Y, Preston BD, Dougherty JP. High rate of recombination throughout the human immunodeficiency virus type I genome. J Virol. 2000;74:1234–40. doi: 10.1128/jvi.74.3.1234-1240.2000. [DOI] [PMC free article] [PubMed] [Google Scholar]
Jung A, Meier R, Vartanian JP, Bocharov G, Jung V, Fischer U, Meese E, Wain-Hobson S, Meyerhans A. Multiply infected spleen cells in HIV patients. Nature. 2002;418:144. doi: 10.1038/418144a. [DOI] [PubMed] [Google Scholar]
Keele BF, Giorgi EE, Salazar-Gonzalez JF, Decker JM, Pham KT, et al. Identification and characterization of transmitted and early founder virus envelopes in primary HIV-1 infection. Proc Natl Acad Sci USA. 2008;105:7552–57. doi: 10.1073/pnas.0802203105. [DOI] [PMC free article] [PubMed] [Google Scholar]
Keele BF, Li H, Learn GH, Hraber P, Giorgi EE, Grayson T, Sun C, Chen Y, Yeh WW, Letvin NL, Mascola JR, Nabel GJ, Haynes BF, Bhattacharya T, Perelson AS, Korber BT, Hahn BH, Shaw GM. Low-dose rectal inoculation of rhesus macaques by SIVsmE660 or SIVmac251 recapitulates human mucosal infection by HIV-1. J Exp Med. 2009;206:1117–34. doi: 10.1084/jem.20082831. [DOI] [PMC free article] [PubMed] [Google Scholar]
Kevorkian J. Partial Differential Equations: Analytical Solutions Techniques. 2. Springer-Verlag; New York: 2002. [Google Scholar]
Kimura M. Evolutionary rate at the molecular level. Nature. 1968;217:624–626. doi: 10.1038/217624a0. [DOI] [PubMed] [Google Scholar]
Kouyos RD, Fouchet D, Bonhoffer S. Recombination and drug resistance in HIV: population dynamics and stochasticity. Epidemics. 2009;1:58–69. doi: 10.1016/j.epidem.2008.11.001. [DOI] [PubMed] [Google Scholar]
Lee HY, Giorgi EE, et al. Modeling sequence evolution in acute HIV-1 infection. J Theor Biol. 2009;261:341–60. doi: 10.1016/j.jtbi.2009.07.038. [DOI] [PMC free article] [PubMed] [Google Scholar]
Levy DN, Aldrovandi GM, Kutsch O, Shaw GM. Dynamics of HIV-1 recombination in its natural target cells. Proc Natl Acad Sci USA. 2004;101:4204–09. doi: 10.1073/pnas.0306764101. [DOI] [PMC free article] [PubMed] [Google Scholar]
Markowitz M, Louie M, Hurley A, Sun E, Di Mascio M, et al. A novel antiviral intervention results in more accurate assessment of human immunodeficiency virus type 1 replication dynamics and T-cell decay in vivo. J Virol. 2003;77:5037–38. doi: 10.1128/JVI.77.8.5037-5038.2003. [DOI] [PMC free article] [PubMed] [Google Scholar]
Margulies M, Egholm M, Altman WE, Attiya S, Bader JS, et al. Genome sequencing in microfabricated high-density picolitre reactors. Nature. 2005;437:376–380. doi: 10.1038/nature03959. [DOI] [PMC free article] [PubMed] [Google Scholar]
Mostowy R, Kouyos RD, Fouchet D, Bonhoffer S. The role of recombination for the coevolutionary dynamics of HIV and the immune response. PLoS One. 2011;6:e16052. doi: 10.1371/journal.pone.0016052. [DOI] [PMC free article] [PubMed] [Google Scholar]
Moutouh L, Corbeil J, Richman DD. Recombination leads to the rapid emergency of HIV-1 dually resistant mutants under selective drug pressure. Proc Nat Acad Sci USA. 1996;93:6106–11. doi: 10.1073/pnas.93.12.6106. [DOI] [PMC free article] [PubMed] [Google Scholar]
Murray JD. Mathematical Biology. 3. Springer-Verlag; Berlin Heidelberg: 2002. [Google Scholar]
Neher RA, Leitner T. Recombination rate and selection strength in HIV intra-patient evolution. PLoS Comput Biol. 2010;6:e1000660. doi: 10.1371/journal.pcbi.1000660. [DOI] [PMC free article] [PubMed] [Google Scholar]
Nisbet RM, Gurney WSC. Modeling Fluctuating Populations. J Wiley and Sons; 1982. [Google Scholar]
Onafuwa-Nuga A, Telesnitsky A. The remarkable frequency of human immunodeficiency virus type 1 genetic recombination. Microbiol Mol Biol Rev. 2009;73:451–80. doi: 10.1128/MMBR.00012-09. [DOI] [PMC free article] [PubMed] [Google Scholar]
Perelson AS, Neumann AU, Markowitz M, Leonard JM, Ho DD. HIV-1 dynamics in vivo: virion clearance rate, infected cell life-span, and viral generation time. Science. 1996;271:1582–1586. doi: 10.1126/science.271.5255.1582. [DOI] [PubMed] [Google Scholar]
R Development Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing; Vienna, Austria: 2010. URL http://www.R-project.org. [Google Scholar]
Ramirez BC, Simon-Loriere E, Galetto R, Negroni M. Implications of recombination for HIV diversity. Vir Res. 2008;134:64–73. doi: 10.1016/j.virusres.2008.01.007. [DOI] [PubMed] [Google Scholar]
Ramratnam B, Bonhoeffer S, Binley J, Hurley A, Zhang L, et al. Rapid production and clearance of HIV-1 and hepatitis C virus assessed by large volume plasma apheresis. Lancet. 1999;354:1782–85. doi: 10.1016/S0140-6736(99)02035-8. [DOI] [PubMed] [Google Scholar]
Reichl LE. A Modern Course in Statistical Physics. John Wiley and Sons; 1998. [Google Scholar]
Robertson DL, Sharp PM, McCutchan FE, Hahn BH. Recombination in HIV-1. Nature. 1995;374:124. doi: 10.1038/374124b0. [DOI] [PubMed] [Google Scholar]
Ribeiro RM, Qin L, Chavez LL, Li D, Self SG, Perelson AS. Estimation of the initial viral growth rate and basic reproductive number during acute HIV-1 infection. J Virol. 2010;12:6096–102. doi: 10.1128/JVI.00127-10. [DOI] [PMC free article] [PubMed] [Google Scholar]
Sabino EC, Shpaer EG, Morgado MG, Korber BT, Diaz RS, Bongertz V, Cavalcante S, Galvo-Castro B, Mullins JI, Mayer A. Identification of human immunodeficiency virus type 1 envelope genes recombinant between subtypes B and F in two epidemiologically linked individuals from Brazil. J Virol. 1994;68:6340–6. doi: 10.1128/jvi.68.10.6340-6346.1994. [DOI] [PMC free article] [PubMed] [Google Scholar]
Salazar-Gonzalez JF, Bailes E, Pham KT, Salazar MG, Guffey MB, Keele BF, Derdeyn CA, Farmer P, Hunter E, Allen S, Manigart O, Mulenga J, Anderson JA, Swanstrom R, Haynes BF, Athreya GS, Korber BT, Sharp PM, Shaw GM, Hahn BH. Deciphering human immunodefficiency virus type 1 transmission and early envelope diversification by single-genome amplification and sequencing. J Virol. 2008;82:3952–70. doi: 10.1128/JVI.02660-07. [DOI] [PMC free article] [PubMed] [Google Scholar]
Shriner D, Rodrigo AG, Nickle DC, Mullins JI. Pervasive genomic recombination of HIV-1 in vivo. Genetics. 2004;167:1573–83. doi: 10.1534/genetics.103.023382. [DOI] [PMC free article] [PubMed] [Google Scholar]
Suryavanshi GW, Dixit NM. Emergence of recombinant forms of HIV: dynamics and scaling. PLoS Comp Biol. 2007;3:e205. doi: 10.1371/journal.pcbi.0030205. [DOI] [PMC free article] [PubMed] [Google Scholar]
Vijay NNV, Ajmani R, Perelson AS, Dixit NM. Recombination increases human immunodeficiency virus fitness, but not necessarily diversity. J Gen Virology. 2008;89:1467. doi: 10.1099/vir.0.83668-0. [DOI] [PubMed] [Google Scholar]
Von Foerster H. Some remarks on changing populations. In: Stohlman F Jr, editor. The Kinetic of Cellular Proliferation. Grune & Stratton; New York: 1959. pp. 382–407. [Google Scholar]
Wallstrom TC, Daniels MG, Bhattacharya T. Personal communication. 2010.
Wood N, Bhattacharya T, Keele BF, Giorgi EE, Liu M, Gaschen B, Daniels M, Ferrari G, Haynes BF, McMichael A, Shaw GM, Hahn BH, Korber B, Seoighe C. HIV evolution in early infection: selection pressures, patterns of insertion and deletion, and the impact of APOBEC. PLoS Pathog. 2009;5:e1000414. doi: 10.1371/journal.ppat.1000414. [DOI] [PMC free article] [PubMed] [Google Scholar]
Wooley DP, Smith RA, Czajak S, Desrosiers RC. Direct demonstration of retroviral recombination in rhesus monkey. J Virol. 1997;71:9650–3. doi: 10.1128/jvi.71.12.9650-9653.1997. [DOI] [PMC free article] [PubMed] [Google Scholar]
Zhang M, Foley B, Schultz AK, Macke JP, Bulla I, Stanke M, Morgenstern B, Korber B, Leitner T. The role of recombination in the emergence of a complex and dynamic HIV epidemic. Retrovirology. 2010;7:25. doi: 10.1186/1742-4690-7-25. [DOI] [PMC free article] [PubMed] [Google Scholar]
Zilman A, Ganusov VV, Perelson AS. Stochastic models of lymphocyte proliferation and death. PLoS ONE. 2010;5:e12775. doi: 10.1371/journal.pone.0012775. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R1] Althaus CL, Bonhoeffer S. Stochastic interplay between mutation and recombination during the acuisition of drug resistance mutations in human immunodeficiency virus type 1. J Virol. 2005;79:13572–8. doi: 10.1128/JVI.79.21.13572-13578.2005. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R2] Batorsky R, Kearney MF, Palmer SE, Maldarelli F, Rouzine IM, Coffin JM. Estimate of effective recombination rate and average selection coefficient for HIV in chronic infection. Proc Natl Acad Sci USA. 2011;108:5661–6. doi: 10.1073/pnas.1102036108. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R3] Bernard S, Pujo-Menjouet L, Mackey MC. Analysis of cell kinetics using a cell division marker: mathematical modeling of experimental data. Biophys J. 84:3414–24. doi: 10.1016/S0006-3495(03)70063-0. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R4] Chen J, Dang Q, Unutmaz D, Pathak VK, Maldarelli F, Powell D, Hu WS. Mechanisms of nonrandom human immunodefficiency virus type 1 infection and double infection: preference in virus entry is important but is not the sole factor. J Virol. 2005;79:4140–49. doi: 10.1128/JVI.79.7.4140-4149.2005. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R5] Chen HY, DiMascio M, Perelson AS, Ho DD, Zhang L. Determination of virus burst size in vivo using a single-cycle SIV in rhesus macaques. Proc Natl Acad Sci USA. 2007;104:19079–84. doi: 10.1073/pnas.0707449104. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R6] Dang Q, Chen J, Unutmaz D, Coffin JM, Pathak VK, Powell D, KewalRamani VN, Maldarelli F, Hu WS. Nonrandom HIV-1 infection and double infection via direct and cell-mediated pathways. Proc Natl Acad Sci USA. 2004;101:632–37. doi: 10.1073/pnas.0307636100. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R7] De Boer RJ, Ribeiro RM, Perelson AS. Current estimates for HIV-1 production imply rapid viral clearance in lymphoid tissues. PLoS Comput Biol. 2010;6:e1000906. doi: 10.1371/journal.pcbi.1000906. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R8] Drummond AJ, Rambaut A. BEAST: Bayesian evolutionary analysis by sampling trees. BMC Evol Biol. 2007;7:214–23. doi: 10.1186/1471-2148-7-214. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R9] Fiebig EW, Wright DJ, Rawal BD, Garrett PE, Schumacher RT, et al. Dynamics of HIV viremia and antibody seroconversion in plasma donors: implications for diagnosis and staging of primary HIV infection. AIDS. 2003;17:1871–79. doi: 10.1097/00002030-200309050-00005. [DOI] [PubMed] [Google Scholar]

[R10] Fischer W, Bhattacharya T, Keele BF, Giorgi EE, Hraber PT, Perelson AS, Shaw GM, Korber BT, et al. Rapid mutational escape from cytotoxic T-cell responses in acute HIV-1 infection—an ultra-deep view. PLoS ONE. 2010;5:e12303. [Google Scholar]

[R11] Fraser C. HIV recombination: what is the impact on antiretroviral therapy? J R Soc Interface. 2005;2(489) doi: 10.1098/rsif.2005.0064. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R12] Gelfand Fomin SV. Calculus of Variations. Englewood Cliffs, NJ: 1963. [Google Scholar]

[R13] Giorgi EE, Funkhouser B, Athreya G, Perelson AS, Korber BT, Bhattacharya T. Estimating time since infection in early homogeneous HIV-1 samples using a Poisson model. BMC Bioinformatics. 2010;11:532–9. doi: 10.1186/1471-2105-11-532. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R14] Gillespie JH. The status of the neutral theory. Science. 1984;224:732–33. doi: 10.1126/science.224.4650.732. [DOI] [PubMed] [Google Scholar]

[R15] Huddleston JV. Population Dynamics with age- and time-dependent birth and death rates. Bull of Math Biol. 1983;45:827–36. [Google Scholar]

[R16] Jetzt AE, Yu H, Klarmann GJ, Ron Y, Preston BD, Dougherty JP. High rate of recombination throughout the human immunodeficiency virus type I genome. J Virol. 2000;74:1234–40. doi: 10.1128/jvi.74.3.1234-1240.2000. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R17] Jung A, Meier R, Vartanian JP, Bocharov G, Jung V, Fischer U, Meese E, Wain-Hobson S, Meyerhans A. Multiply infected spleen cells in HIV patients. Nature. 2002;418:144. doi: 10.1038/418144a. [DOI] [PubMed] [Google Scholar]

[R18] Keele BF, Giorgi EE, Salazar-Gonzalez JF, Decker JM, Pham KT, et al. Identification and characterization of transmitted and early founder virus envelopes in primary HIV-1 infection. Proc Natl Acad Sci USA. 2008;105:7552–57. doi: 10.1073/pnas.0802203105. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R19] Keele BF, Li H, Learn GH, Hraber P, Giorgi EE, Grayson T, Sun C, Chen Y, Yeh WW, Letvin NL, Mascola JR, Nabel GJ, Haynes BF, Bhattacharya T, Perelson AS, Korber BT, Hahn BH, Shaw GM. Low-dose rectal inoculation of rhesus macaques by SIVsmE660 or SIVmac251 recapitulates human mucosal infection by HIV-1. J Exp Med. 2009;206:1117–34. doi: 10.1084/jem.20082831. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R20] Kevorkian J. Partial Differential Equations: Analytical Solutions Techniques. 2. Springer-Verlag; New York: 2002. [Google Scholar]

[R21] Kimura M. Evolutionary rate at the molecular level. Nature. 1968;217:624–626. doi: 10.1038/217624a0. [DOI] [PubMed] [Google Scholar]

[R22] Kouyos RD, Fouchet D, Bonhoffer S. Recombination and drug resistance in HIV: population dynamics and stochasticity. Epidemics. 2009;1:58–69. doi: 10.1016/j.epidem.2008.11.001. [DOI] [PubMed] [Google Scholar]

[R23] Lee HY, Giorgi EE, et al. Modeling sequence evolution in acute HIV-1 infection. J Theor Biol. 2009;261:341–60. doi: 10.1016/j.jtbi.2009.07.038. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R24] Levy DN, Aldrovandi GM, Kutsch O, Shaw GM. Dynamics of HIV-1 recombination in its natural target cells. Proc Natl Acad Sci USA. 2004;101:4204–09. doi: 10.1073/pnas.0306764101. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R25] Markowitz M, Louie M, Hurley A, Sun E, Di Mascio M, et al. A novel antiviral intervention results in more accurate assessment of human immunodeficiency virus type 1 replication dynamics and T-cell decay in vivo. J Virol. 2003;77:5037–38. doi: 10.1128/JVI.77.8.5037-5038.2003. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R26] Margulies M, Egholm M, Altman WE, Attiya S, Bader JS, et al. Genome sequencing in microfabricated high-density picolitre reactors. Nature. 2005;437:376–380. doi: 10.1038/nature03959. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R27] Mostowy R, Kouyos RD, Fouchet D, Bonhoffer S. The role of recombination for the coevolutionary dynamics of HIV and the immune response. PLoS One. 2011;6:e16052. doi: 10.1371/journal.pone.0016052. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R28] Moutouh L, Corbeil J, Richman DD. Recombination leads to the rapid emergency of HIV-1 dually resistant mutants under selective drug pressure. Proc Nat Acad Sci USA. 1996;93:6106–11. doi: 10.1073/pnas.93.12.6106. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R29] Murray JD. Mathematical Biology. 3. Springer-Verlag; Berlin Heidelberg: 2002. [Google Scholar]

[R30] Neher RA, Leitner T. Recombination rate and selection strength in HIV intra-patient evolution. PLoS Comput Biol. 2010;6:e1000660. doi: 10.1371/journal.pcbi.1000660. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R31] Nisbet RM, Gurney WSC. Modeling Fluctuating Populations. J Wiley and Sons; 1982. [Google Scholar]

[R32] Onafuwa-Nuga A, Telesnitsky A. The remarkable frequency of human immunodeficiency virus type 1 genetic recombination. Microbiol Mol Biol Rev. 2009;73:451–80. doi: 10.1128/MMBR.00012-09. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R33] Perelson AS, Neumann AU, Markowitz M, Leonard JM, Ho DD. HIV-1 dynamics in vivo: virion clearance rate, infected cell life-span, and viral generation time. Science. 1996;271:1582–1586. doi: 10.1126/science.271.5255.1582. [DOI] [PubMed] [Google Scholar]

[R34] R Development Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing; Vienna, Austria: 2010. URL http://www.R-project.org. [Google Scholar]

[R35] Ramirez BC, Simon-Loriere E, Galetto R, Negroni M. Implications of recombination for HIV diversity. Vir Res. 2008;134:64–73. doi: 10.1016/j.virusres.2008.01.007. [DOI] [PubMed] [Google Scholar]

[R36] Ramratnam B, Bonhoeffer S, Binley J, Hurley A, Zhang L, et al. Rapid production and clearance of HIV-1 and hepatitis C virus assessed by large volume plasma apheresis. Lancet. 1999;354:1782–85. doi: 10.1016/S0140-6736(99)02035-8. [DOI] [PubMed] [Google Scholar]

[R37] Reichl LE. A Modern Course in Statistical Physics. John Wiley and Sons; 1998. [Google Scholar]

[R38] Robertson DL, Sharp PM, McCutchan FE, Hahn BH. Recombination in HIV-1. Nature. 1995;374:124. doi: 10.1038/374124b0. [DOI] [PubMed] [Google Scholar]

[R39] Ribeiro RM, Qin L, Chavez LL, Li D, Self SG, Perelson AS. Estimation of the initial viral growth rate and basic reproductive number during acute HIV-1 infection. J Virol. 2010;12:6096–102. doi: 10.1128/JVI.00127-10. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R40] Sabino EC, Shpaer EG, Morgado MG, Korber BT, Diaz RS, Bongertz V, Cavalcante S, Galvo-Castro B, Mullins JI, Mayer A. Identification of human immunodeficiency virus type 1 envelope genes recombinant between subtypes B and F in two epidemiologically linked individuals from Brazil. J Virol. 1994;68:6340–6. doi: 10.1128/jvi.68.10.6340-6346.1994. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R41] Salazar-Gonzalez JF, Bailes E, Pham KT, Salazar MG, Guffey MB, Keele BF, Derdeyn CA, Farmer P, Hunter E, Allen S, Manigart O, Mulenga J, Anderson JA, Swanstrom R, Haynes BF, Athreya GS, Korber BT, Sharp PM, Shaw GM, Hahn BH. Deciphering human immunodefficiency virus type 1 transmission and early envelope diversification by single-genome amplification and sequencing. J Virol. 2008;82:3952–70. doi: 10.1128/JVI.02660-07. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R42] Shriner D, Rodrigo AG, Nickle DC, Mullins JI. Pervasive genomic recombination of HIV-1 in vivo. Genetics. 2004;167:1573–83. doi: 10.1534/genetics.103.023382. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R43] Suryavanshi GW, Dixit NM. Emergence of recombinant forms of HIV: dynamics and scaling. PLoS Comp Biol. 2007;3:e205. doi: 10.1371/journal.pcbi.0030205. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R44] Vijay NNV, Ajmani R, Perelson AS, Dixit NM. Recombination increases human immunodeficiency virus fitness, but not necessarily diversity. J Gen Virology. 2008;89:1467. doi: 10.1099/vir.0.83668-0. [DOI] [PubMed] [Google Scholar]

[R45] Von Foerster H. Some remarks on changing populations. In: Stohlman F Jr, editor. The Kinetic of Cellular Proliferation. Grune & Stratton; New York: 1959. pp. 382–407. [Google Scholar]

[R46] Wallstrom TC, Daniels MG, Bhattacharya T. Personal communication. 2010.

[R47] Wood N, Bhattacharya T, Keele BF, Giorgi EE, Liu M, Gaschen B, Daniels M, Ferrari G, Haynes BF, McMichael A, Shaw GM, Hahn BH, Korber B, Seoighe C. HIV evolution in early infection: selection pressures, patterns of insertion and deletion, and the impact of APOBEC. PLoS Pathog. 2009;5:e1000414. doi: 10.1371/journal.ppat.1000414. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R48] Wooley DP, Smith RA, Czajak S, Desrosiers RC. Direct demonstration of retroviral recombination in rhesus monkey. J Virol. 1997;71:9650–3. doi: 10.1128/jvi.71.12.9650-9653.1997. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R49] Zhang M, Foley B, Schultz AK, Macke JP, Bulla I, Stanke M, Morgenstern B, Korber B, Leitner T. The role of recombination in the emergence of a complex and dynamic HIV epidemic. Retrovirology. 2010;7:25. doi: 10.1186/1742-4690-7-25. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R50] Zilman A, Ganusov VV, Perelson AS. Stochastic models of lymphocyte proliferation and death. PLoS ONE. 2010;5:e12775. doi: 10.1371/journal.pone.0012775. [DOI] [PMC free article] [PubMed] [Google Scholar]

PERMALINK

Modeling sequence evolution in HIV-1 infection with recombination

Elena E Giorgi

Bette T Korber

Alan S Perelson

Tanmoy Bhattacharya

Abstract

1. Introduction

2. Results

2.1. Continuous time model of viral evolution

2.2. Mean Hamming Distance

Figure 1.

2.3. Recombination

2.4. Recombination alone

Figure 2.

Figure 3.

2.5. Mutations and Recombination

Figure 4.

Figure 5.

Figure 6.

Figure 7.

3. Conclusions

Highlights.

Acknowledgments

Appendix A. Stochastic model

Claim Appendix A.1

Proof

Claim Appendix A.2

Appendix B. Continuous model, closed form solution

Appendix C. HD variance for an exponentially growing population in the presence of recombination alone

Appendix D. HD variance for an exponentially growing population when both recombination and mutation rate are non-null

Footnotes

References

ACTIONS

PERMALINK

RESOURCES

Cite

Add to Collections

PERMALINK

Modeling sequence evolution in HIV-1 infection with recombination

Elena E Giorgi

Bette T Korber

Alan S Perelson

Tanmoy Bhattacharya

Abstract

1. Introduction

2. Results

2.1. Continuous time model of viral evolution

2.2. Mean Hamming Distance

Figure 1.

2.3. Recombination

2.4. Recombination alone

Figure 2.

Figure 3.

2.5. Mutations and Recombination

Figure 4.

Figure 5.

Figure 6.

Figure 7.

3. Conclusions

Highlights.

Acknowledgments

Appendix A. Stochastic model

Claim Appendix A.1

Proof

Claim Appendix A.2

Appendix B. Continuous model, closed form solution

Appendix C. HD variance for an exponentially growing population in the presence of recombination alone

Appendix D. HD variance for an exponentially growing population when both recombination and mutation rate are non-null

Footnotes

References

ACTIONS

PERMALINK

RESOURCES

Similar articles

Cited by other articles

Links to NCBI Databases