eLife. 2024 Dec 27;13:RP97350. doi: 10.7554/eLife.97350

Eco-evolutionary dynamics of adapting pathogens and host immunity

Pierre Barrat-Charlaix 1,2,3, Richard A Neher 1,2
Editors: Armita Nourmohammad4, Aleksandra M Walczak5
PMCID: PMC11677248  PMID: 39728926

Abstract

As pathogens spread in a population of hosts, immunity is built up, and the pool of susceptible individuals is depleted. This generates selective pressure, to which many human RNA viruses, such as influenza virus or SARS-CoV-2, respond with rapid antigenic evolution and frequent emergence of immune evasive variants. However, the hosts’ immune systems adapt, and older immune responses wane, such that escape variants only enjoy a growth advantage for a limited time. If variant growth dynamics and reshaping of host immunity operate on comparable time scales, viral adaptation is determined by eco-evolutionary interactions that are not captured by models of rapid evolution in a fixed environment. Here, we use a Susceptible/Infected model to describe the interaction between an evolving viral population and a dynamic but immunologically diverse host population. We show that depending on strain cross-immunity, heterogeneity of the host population, and durability of immune responses, escape variants initially grow exponentially, but lose their growth advantage before reaching high frequencies. Their subsequent dynamics follow an anomalous random walk determined by future escape variants and result in variant trajectories that are unpredictable. This model can explain the apparent contradiction between the clearly adaptive nature of antigenic evolution and the quasi-neutral dynamics of high-frequency variants observed for influenza viruses.

Research organism: Viruses

Introduction

Many human RNA viruses adapt rapidly to evade pre-existing immunity and re-infect humans multiple times over their lifetime. The most prominent examples of this evolution are influenza virus and SARS-CoV-2 (Roemer et al., 2023; Petrova and Russell, 2018), for which the changing virus population is surveilled in great detail and vaccines are updated regularly. To improve the match between the virus population and the vaccine, several groups are working on predictive models to anticipate the variants that dominate future viral populations (Morris et al., 2018; Meijers et al., 2023).

A common framework to model the rapid evolutionary dynamics of RNA viruses is to consider a population located away from the fitness optimum and with many accessible beneficial mutations (Tsimring et al., 1996). In this setting, clones compete to accumulate beneficial mutations as quickly as possible. In a process called a selective sweep, successful variants expand and tend to be the ancestors of the future population, while less successful mutants eventually disappear. The resulting fitness distribution is a wave traveling along the fitness axis, the so-called traveling fitness wave (Rouzine et al., 2003; Desai and Fisher, 2007; Neher, 2013). As the pathogen circulates, hosts develop immunity, which leads to a ‘deterioration of the environment’ for the pathogen that approximately balances the increase in average fitness due to adaptation.

The traveling wave framework has been used extensively in this context because it offers a straightforward way to approach the prediction problem: each variant is assumed to have a fixed fitness relative to other variants, and inferring the fitness of all competing variants should allow prediction of the future composition of the population. Indeed, current methods typically infer the instantaneous growth advantage of a strain based on past and present circulation and then project this growth advantage forward in time (Luksza and Lässig, 2014; Neher et al., 2014; Huddleston et al., 2020). While future mutations can reshuffle the relative fitness of lineages and thereby limit predictability, in these models the lineage that is most fit at any given time is the most likely to dominate in the long run.

One shortcoming of the traveling wave approach is the lack of an explicit representation of the epidemiological dynamics and of the hosts’ immunity. Indeed, fitness is only an effective parameter that summarizes the complex interplay between viral antigenic properties and the hosts’ immune systems. As such, it cannot explicitly describe important phenomena such as the build-up of host immunity to new variants, variant-specific immunity, or the interaction between strains through antigenic cross-reactivity. Taking the hosts’ immunity and viral cross-immunity into account has the potential to strongly improve predictions (Meijers et al., 2023) or to explain why prediction is difficult (Barrat-Charlaix et al., 2021).

The interaction between epidemiological dynamics and hosts’ immunity is often modeled using generalizations of the Susceptible-Infected-Recovered (SIR) model that include multiple viral strains (Gupta et al., 1998; Gog and Grenfell, 2002). In this setting, the natural question is that of the ultimate fate of the pathogen: will it go extinct, diversify to the point of speciation, or reach the so-called Red Queen State where new strains continuously replace old ones (Yan et al., 2019; Marchi et al., 2021; Chardès et al., 2023; Rouzine and Rozhnova, 2018)? To remain tractable, these studies typically approximate population immunity as a low-dimensional landscape in which the viral population evolves, and ignore the complex heterogeneity in the immunity of different individuals. Furthermore, immunity is often assumed to be long-lived, and evolution of the pathogen in a stable low-dimensional landscape gives rise to traveling waves.

Here, we study how novel variants of a virus shape the host population’s immunity, which in turn changes their own growth dynamics. To do so, we use a multi-strain SIR model combining immune waning and heterogeneous immunity of the hosts. Such heterogeneity has been demonstrated for influenza virus in individuals of different ages (Lee et al., 2019; Welsh et al., 2023). We show that this model generically leads to a situation where novel immune evasive variants emerge. In a homogeneous population of hosts, this leads to a succession of selective sweeps where novel variants compete against each other and replace previously circulating variants. However, in a heterogeneous population with more rapid waning of immunity, initially growing variants lose their selective advantage before reaching fixation due to immunological adjustment of the host population. The phenomenology of our epidemiological model is reminiscent of ecological systems such as consumer-resource models, where adaptation by one species shifts the global equilibrium and distribution of other species but does not necessarily result in a selective sweep (Good et al., 2018). In these systems, adaptation usually cannot be modeled by a fixed fitness parameter for each strain but rather depends on the composition of the population (Tikhonov and Monasson, 2018).

Strain dynamics in our model differ qualitatively from what is expected in the traveling wave scenario. While adaptive mutations are highly overrepresented in genetic diversity, they cease having a growth advantage when reaching intermediate frequencies, a process we call ‘expiring fitness.’ Once the fitness effect of a mutation has expired, its frequency randomly changes up or down as subsequent adaptive mutations occur on the same or on different genomic backgrounds.

This resemblance to neutral evolution could have important consequences for the predictability of viral evolution. It relates to recent observations that the evolution of influenza is not as predictable as typical models would suggest (Huddleston et al., 2020; Barrat-Charlaix et al., 2021). In particular, we observed in Barrat-Charlaix et al., 2021 that the frequency trajectories of mutants of A/H3N2 influenza show features that are expected in neutral evolution but hard to explain in a traveling wave framework.

Results

Multi-strain SIR model

We describe the interaction of several viral strains and host immunity using a Susceptible/Infected compartmental model, similar to those used by Gog and Grenfell, 2002 and Yan et al., 2019. In the most general form, the model describes $N$ variants of the virus labeled $a \in \{1, \dots, N\}$ circulating among $M$ groups of hosts with distinct exposure histories labeled $i \in \{1, \dots, M\}$ (immune groups). These groups could be different age cohorts or could be geographically separated. For each group $i$, we define compartments $I_i^a$ and $S_i^a$ as, respectively, the number of individuals of this group infected by or susceptible to strain $a$. We assume that the total population of hosts is 1, so that we always have $0 \le I_i^a, S_i^a \le 1$, and values of $I_i^a$ and $S_i^a$ can be interpreted as fractions of the host population.

As with usual compartmental models, we assume that the dynamics are driven by the interaction of susceptible and infected hosts, leading to infections and gains of immunity. The rate at which hosts of group $i$ susceptible to variant $a$ are infected by $a$ is $\alpha S_i^a \sum_{j=1}^{M} C_{ij} I_j^a$. Here, $\alpha$ is an overall infection rate, while $C_{ij}$ represents the probability of an encounter between individuals of groups $i$ and $j$. Thus, the above rate takes into account infections with strain $a$ caused by hosts of all groups. Considering that infected hosts recover at rate $\delta$, we can write the dynamics of $I_i^a$:

$$\dot{I}_i^a = \alpha S_i^a \sum_{j=1}^{M} C_{ij} I_j^a - \delta I_i^a. \qquad (1)$$

When a host of group $i$ is infected by strain $b$, it gains immunity against the infecting strain $b$, but also to other strains $a$ with probability $0 \le K_i^{ab} \le 1$. Thus, $S_i^a$ decreases at a rate proportional to $K_i^{ab}$ and to the number of hosts infected by $b$, for every strain $b$. Since susceptibles to $a$ are depleted by infections from other strains, the dynamics of all strains are coupled. This coupling is determined by the matrices $K_i$ of dimension $N \times N$, which in general differ between groups $i$ with different prior exposure history. Additionally, waning of immunity at rate $\gamma$ causes immune hosts to re-enter the susceptible compartment. We can now write the dynamics of $S_i^a$ as

$$\dot{S}_i^a = -\alpha \sum_{b=1}^{N} \sum_{j=1}^{M} S_i^a K_i^{ab} C_{ij} I_j^b + \gamma \left(1 - S_i^a\right), \qquad (2)$$

where the first term accounts for immunity gains (or loss of susceptibility) due to infections or cross-immunity, while the second represents immune waning. This model, introduced by Gog and Grenfell, 2002, assumes that immunity builds up through exposure and not only through infection. This explains why the change in $S_i^a$ is simply proportional to $S_i^a I_j^b$, regardless of the susceptibility of hosts to strain $b$. Alternative models that require infection for acquisition of immunity have qualitatively similar dynamics, but are mathematically more complex (Appendix 1.5). We also represent loss of susceptibility to $a$ due to exposure to $a$ using a trivial cross-immunity term $K_i^{aa} = 1$.
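A minimal NumPy sketch of the right-hand sides of Equations 1 and 2 may help make the index structure concrete. It assumes arrays indexed as `S[i, a]`, `I[i, a]`, `K[i, a, b]` and `C[i, j]`; the helper name `sir_rhs` is hypothetical, and this is an illustration rather than the authors' simulation code.

```python
import numpy as np


def sir_rhs(S, I, K, C, alpha, gamma, delta=1.0):
    """Right-hand sides of Equations 1 and 2 (hypothetical helper).

    S, I : arrays of shape (M, N); S[i, a], I[i, a] for immune group i and strain a
    K    : array of shape (M, N, N); K[i, a, b] = cross-immunity to a from exposure to b in group i
    C    : array of shape (M, M); contact probabilities C[i, j]
    """
    # Force of infection of strain a on group i: sum_j C_ij * I_j^a
    force = C @ I                                   # shape (M, N)
    # Equation 1: new infections minus recoveries
    dI = alpha * S * force - delta * I
    # Equation 2: exposure to every strain b removes susceptibility to a with weight K_i^{ab}
    exposure = np.einsum("iab,ib->ia", K, force)    # sum_b K_i^{ab} sum_j C_ij I_j^b
    dS = -alpha * S * exposure + gamma * (1.0 - S)
    return dS, dI
```

With a single group and a single strain at $S = \delta/\alpha$ and $I = (\gamma/\delta)(1 - \delta/\alpha)$, both derivatives vanish, which gives a quick consistency check of the implementation.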

An important property of our model is that the probability of generating cross-immunity can differ between groups. The motivation is that strains $a$ and $b$ may be perceived as antigenically different by some immune systems, leading to a low $K_i^{ab}$, but as highly similar by others, leading to $K_i^{ab} \approx 1$. Such a heterogeneous response by different immune systems has been observed experimentally for influenza: in Lee et al., 2019 and Welsh et al., 2023, for instance, it was found that a given mutation in an influenza strain may allow escape from the antibodies of some individuals, i.e., low $K_i^{ab}$, while it had little effect on the serum of other individuals, i.e., high $K_i^{ab}$. Heterogeneous immune responses could be caused by varying histories of strain exposure for different individuals, for instance, due to differences in age or geographical region. If immune groups correspond to age cohorts, mixing between groups is rapid, and we can simplify the connectivity between groups to $C_{ij} = 1/M$. If immune groups are shaped by geographic differences in exposure, the connectivity would be close to 1 on the diagonal ($1 - C_{ii} \ll 1$) while off-diagonal terms would be small ($C_{ij} \ll 1$ for $i \neq j$).

Invasion of an adaptive variant

Hosts’ immune heterogeneity and strain cross-immunity play two different roles in the model. Cross-immunity allows the model to reach a non-trivial equilibrium where multiple strains co-exist, while immune heterogeneity affects the convergence to this equilibrium.

To illustrate this, we design a simple scenario with only two strains: a wild-type and a variant. Accordingly, indices $(a, b)$ describing strains take values in $\{wt, v\}$. We consider that the two strains share the same infectivity rate $\alpha$, which amounts to saying that they would have the same reproductive rate in a fully naive population. The case where the two strains differ in intrinsic fitness is explored in detail in Appendix 1.7. In brief, as long as the difference in intrinsic fitness is not too large compared to cross-immunity effects, the qualitative results given below hold, while larger intrinsic fitness differences lead to more classical dynamics such as selective sweeps.

In the first version of this scenario, we only consider one immune group, that is, $M = 1$. We can thus drop the indices $i, j \in \{1, \dots, M\}$, and we have a single cross-immunity matrix $K$ that we parametrize as

$$K = \begin{pmatrix} 1 & b \\ f & 1 \end{pmatrix}, \qquad (3)$$

with $0 \le b, f \le 1$, and rows and columns ordered as $(wt, v)$. Here, $b$ quantifies the amount of ‘backward’ immunity to the wild-type caused by the variant: a large $b$ means that an infection by the variant is likely to cause immunity to the wild-type. Conversely, $f$ quantifies the ‘forward’ immunity: infections by the wild-type causing immunity to the variant. If $f = b = 1$, the two strains are antigenically indistinguishable, and thus essentially identical for the model. Conversely, if $f = b = 0$, the two strains are completely distinct and do not interact through cross-immunity.

The dynamical equations now take a simplified form:

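$$\dot{S}^a = -\alpha S^a \sum_{b \in \{wt, v\}} K^{ab} I^b + \gamma \left(1 - S^a\right), \qquad \dot{I}^a = \left(\alpha S^a - \delta\right) I^a. \qquad (4)$$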

We can immediately derive the equilibrium state for this simplified case. We first define the reproductive number of strain $a$ as $R^a = \alpha S^a / \delta$: $I^a$ grows when $R^a > 1$ and declines when $R^a < 1$. The equilibrium susceptibility is, therefore, $S^a_{eq} = \delta / \alpha$, such that $R^a = 1$. Setting $\dot{S}^a = 0$ with this susceptibility, the equilibrium prevalence is determined by the inverse of the cross-immunity matrix $K$:

$$I_{eq}^a = \frac{\gamma}{\delta}\left(1 - \frac{\delta}{\alpha}\right) \left(K^{-1} \mathbf{1}\right)_a, \qquad (5)$$

with $\mathbf{1}$ being the vector $[1; 1]$. The order of magnitude of the prevalence is given by the ratio of the rate of waning $\gamma(1 - \delta/\alpha)$ and the recovery rate $\delta$. In the following, we frequently use values $\alpha = 3$ and $\gamma = 5 \cdot 10^{-3}$ in units of $\delta$ (inverse generations), i.e., we set $\delta = 1$. At equilibrium, this corresponds to a fraction of 0.003 of the host population being infected at any time. If the generation time is a week, which is roughly the case for respiratory viruses such as influenza virus or SARS-CoV-2, the fraction of hosts infected in any year is about 0.15, which is of similar magnitude to empirical estimates for influenza (Kucharski et al., 2018).
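Equation 5 can be evaluated with a linear solve rather than an explicit matrix inverse, which is the numerically preferable way to apply $K^{-1}$. The sketch below assumes NumPy, a single immune group, and a hypothetical helper name `equilibrium_prevalence`; it reproduces the order of magnitude quoted above.

```python
import numpy as np


def equilibrium_prevalence(K, alpha, gamma, delta=1.0):
    """Equation 5 for a single immune group: I_eq = (gamma/delta)(1 - delta/alpha) K^{-1} 1."""
    ones = np.ones(K.shape[0])
    # Solve K y = 1 instead of forming the inverse explicitly
    return (gamma / delta) * (1.0 - delta / alpha) * np.linalg.solve(K, ones)


# One strain with the parameters used in the text (delta = 1, i.e. time in generations)
I_eq = equilibrium_prevalence(np.array([[1.0]]), alpha=3.0, gamma=5e-3)
print(I_eq[0])          # prevalence at any time, about 0.0033
print(52 * I_eq[0])     # infections per year with weekly generations, about 0.17
```

The unrounded values, about 0.0033 per generation and 0.17 per year, are consistent in magnitude with the rounded figures given in the text.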

It is also straightforward to compute the fraction of infections caused by the variant at equilibrium, hereafter referred to as the frequency of the variant. We find that this frequency is

$$\beta = \frac{1 - f}{(1 - b) + (1 - f)}. \qquad (6)$$

In the case where $b = f$, the variant will ultimately settle at frequency 1/2. This includes the case $b = f = 0$, where the two strains are completely independent and do not interact. In contrast, if $b \neq f$, the final frequency of the variant can in principle be anywhere between 0 and 1. For example, if $b > f$, the variant is more likely to cause immunity to the wild-type than the wild-type is to cause immunity to the variant. In this case, $\beta > 1/2$ and the variant will be the dominant variant.
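As a quick consistency check, Equation 6 can be compared with the ratio $I^v_{eq} / (I^{wt}_{eq} + I^v_{eq})$ obtained from Equation 5, since the common prefactor cancels. A hedged NumPy sketch with hypothetical function names:

```python
import numpy as np


def variant_frequency(f, b):
    """Equation 6: equilibrium fraction of infections due to the variant (b and f not both 1)."""
    return (1.0 - f) / ((1.0 - b) + (1.0 - f))


def variant_frequency_from_K(f, b):
    """Same quantity from Equation 5; the common prefactor cancels in the ratio."""
    K = np.array([[1.0, b], [f, 1.0]])   # rows and columns ordered (wt, v), as in Equation 3
    I = np.linalg.solve(K, np.ones(2))
    return I[1] / I.sum()


print(variant_frequency(0.65, 0.8))      # Figure 1 parameters: about 0.64
```

Both expressions are undefined for $f = b = 1$, where $K$ is singular and the two strains are indistinguishable.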

We are primarily interested in an ‘invasion’ scenario where only the wild-type is initially present in the population, that is, $I^v = 0$ for $t < 0$. Cross-immunity with the resident strain reduces the fraction of hosts susceptible to the variant below one even though the variant has not circulated yet. But the number of hosts susceptible to the variant is always larger than the equilibrium value $\delta/\alpha$ unless $f = 1$. As a result, the growth rate of the variant is initially positive and given by

$$s(t = 0) = \frac{\delta (1 - f)(\alpha - \delta)}{\delta + f(\alpha - \delta)}. \qquad (7)$$

The variant thus initially increases exponentially until it has become frequent enough to substantially affect the immunity landscape, before eventually settling into an equilibrium with the wild-type. The details of the equilibrium reached by the system in the absence of additional mutant variants are given in Appendix 1.1. Figure 1 explores different scenarios numerically.
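Equation 7 follows from the susceptibility to the variant just before it appears: with the wild-type at its single-strain equilibrium, $I^{wt} = (\gamma/\delta)(1 - \delta/\alpha)$, setting $\dot{S}^v = 0$ in Equation 4 gives $S^v = \gamma / (\gamma + \alpha f I^{wt})$, and the growth rate is $s(0) = \alpha S^v - \delta$. A short sketch comparing this direct computation with the closed form (function names are hypothetical):

```python
def initial_growth_rate(f, alpha, delta=1.0):
    """Closed form of Equation 7."""
    return delta * (1.0 - f) * (alpha - delta) / (delta + f * (alpha - delta))


def initial_growth_rate_direct(f, alpha, gamma, delta=1.0):
    """Growth rate alpha * S^v - delta with the wild-type at its single-strain equilibrium."""
    I_wt = (gamma / delta) * (1.0 - delta / alpha)
    # Balance between cross-immunity from wild-type infections and waning (Equation 4)
    S_v = gamma / (gamma + alpha * f * I_wt)
    return alpha * S_v - delta


print(initial_growth_rate(0.65, alpha=3.0))   # Figure 1 parameters: about 0.30 per generation
```

The waning rate $\gamma$ cancels, so the initial advantage depends only on $f$, $\alpha$ and $\delta$, and vanishes when $f = 1$.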

Figure 1. Invasion of an immune escape mutant.


Top row: one immune group. Middle row: $M = 10$ immune groups and fast mixing, $C_{ij} = 1/M$. Bottom row: $M = 10$ immune groups and slow mixing, $C_{ij} = 1/(10M)$. Other parameters are the same for all rows: in units of $\delta$, we set $\alpha = 3$ and $\gamma = 5 \cdot 10^{-3}$, and $f = 0.65$, $b = 0.8$, $\varepsilon = 0.01$. For all rows, graphs represent: Left: number of hosts infectious with the wild-type and the variant; Middle: number of hosts susceptible to the wild-type and the variant, with the equilibrium value $\delta/\alpha$ as a gray line; Right: fraction of the infections due to the variant. The thick gray line shows the expected equilibrium frequency $\beta$ in the case with one immune group, given in Equation 6. The dashed line shows the trajectory of a constant fitness logistic growth with the same initial growth rate.

The top row of Figure 1 shows the dynamics of the model after the introduction of the variant in a homogeneous population ($M = 1$). As expected, the number of infections by the variant initially rises while the number of susceptibles $S^v$ decreases. However, as $S^v$ goes below the critical value $\delta/\alpha$, $I^v$ starts to decline and then oscillates around the equilibrium value before finally converging to it. The mathematical properties of these oscillations are discussed in Appendix 1.8.

However, these strong and slowly damped oscillations are not what is observed in circulating viruses. For instance, in the first oscillation in the specific example of Figure 1, the prevalence of the wild-type $I^{wt}$ drops to microscopic levels and the frequency of the variant approaches one. During stochastic circulation in a finite population of hosts, the wild-type would likely be lost. The theoretical equilibrium reached at long times is thus not very relevant, and what would actually be observed is a selective sweep by the variant.

Oscillations are the consequence of the rapid rise of the variant followed by an overshoot. This effect is mitigated by immunological heterogeneity, as shown in the following example with $M = 10$ groups. For group $i = 1$, the cross-immunity matrix $K_1$ takes the same form as in the previous scenario, given by Equation 3. However, for other groups, we assume that the two strains are virtually identical, with the cross-immunity having the form

$$K_{i>1} = \begin{pmatrix} 1 & 1 - \varepsilon \\ 1 - \varepsilon & 1 \end{pmatrix}, \qquad (8)$$

where $\varepsilon \ll 1$. Our reasoning is that we expect an adaptive variant to escape the existing immunity for part of the host population, here immune group 1, while having little effect on the rest of the hosts.

One consequence of having many groups that are indifferent to the variant is that the global excess susceptibility to the variant is lower. If mixing is rapid, the initial growth rate of the variant is smaller by a factor of about $M$ compared to the one-group case. If mixing is slow, the initial growth of the variant is as rapid as in the one-group case, but the variant then spreads only slowly across groups. Globally, the frequency of the variant thus never reaches values close to one, and population-wide oscillations are reduced.

The middle and bottom rows of Figure 1 show the result of the invasion for $M$ groups, for the fast and slow mixing cases, respectively. In both scenarios, the initial numbers of hosts susceptible to the variant are now closer to $\delta/\alpha$. When mixing is fast, the frequency of the variant initially resembles a standard selective sweep (dashed line in Figure 1) before saturating, while dynamics are more complicated in the slow mixing case. Either way, the main effect of the immune groups is that the overshoot past the equilibrium is much smaller and the damping of the oscillations stronger. As a result, the frequency of the variant approaches its equilibrium value without effectively sweeping to fixation first.
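To reproduce the qualitative contrast between the top and middle rows of Figure 1, one can integrate Equations 1 and 2 directly. The following is a rough sketch rather than the authors' code: it assumes forward-Euler integration with step `dt = 0.1`, a hypothetical initial variant prevalence `I_v0 = 1e-6` spread evenly over groups, fast mixing only, and the parameter values from the Figure 1 caption; `simulate_invasion` is a hypothetical name.

```python
import numpy as np


def simulate_invasion(M, f=0.65, b=0.8, eps=0.01, alpha=3.0, gamma=5e-3,
                      delta=1.0, I_v0=1e-6, T=1000.0, dt=0.1):
    """Forward-Euler integration of Equations 1-2 for a wild-type (index 0) and a variant (index 1).

    Group 0 uses Equation 3, groups 1..M-1 use Equation 8; mixing is fast (C_ij = 1/M).
    Returns the variant frequency (fraction of all infections) at each time step.
    """
    K = np.tile(np.array([[1.0, 1.0 - eps], [1.0 - eps, 1.0]]), (M, 1, 1))
    K[0] = [[1.0, b], [f, 1.0]]
    C = np.full((M, M), 1.0 / M)
    # Start from the wild-type-only equilibrium ...
    I = np.zeros((M, 2))
    I[:, 0] = (gamma / delta) * (1.0 - delta / alpha)
    S = np.empty((M, 2))
    S[:, 0] = delta / alpha
    # ... where susceptibility to the variant balances cross-immunity and waning
    S[:, 1] = gamma / (gamma + alpha * K[:, 1, 0] * I[:, 0])
    I[:, 1] = I_v0 / M                       # hypothetical small seed of the variant
    freq = np.empty(int(T / dt))
    for step in range(freq.size):
        force = C @ I
        dI = alpha * S * force - delta * I
        dS = -alpha * S * np.einsum("iab,ib->ia", K, force) + gamma * (1.0 - S)
        I, S = I + dt * dI, S + dt * dS
        total = I.sum(axis=0)
        freq[step] = total[1] / total.sum()
    return freq


one_group = simulate_invasion(M=1)
ten_groups = simulate_invasion(M=10)
print(one_group.max(), ten_groups.max())    # compare peak variant frequencies
```

Under these assumptions, the single-group run should peak much closer to frequency 1 than the ten-group run, in line with Figure 1. For quantitative work, an adaptive ODE solver or a smaller step would be preferable to this fixed-step scheme.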

Notably, the equilibrium frequency in the above examples does not depend on $M$, and Equation 6 remains valid for $\varepsilon = 0$. This invariance is a consequence of the fact that for $\varepsilon = 0$, the variant and wild-type strains are completely equivalent in immune groups $i > 1$, and the equilibrium is only determined by cross-immunity in group $i = 1$ (Appendix 1.4). For small $\varepsilon$ the equilibrium shifts slightly, but Equation 6 remains a good approximation.

While this simple two-strain model predicts that the two strains come to an equilibrium at frequency β, their frequency will of course continue to change due to the emergence of additional strains, which we will discuss below.

Even though the variant has a clear growth rate advantage when it appears, this does not result in it replacing the wild-type. This contrasts with the typical ‘selective sweep’ that occurs when the growth rate advantage stays constant, illustrated as a dashed line in Figure 1. We refer to frequency trajectories of a variant that first rise exponentially before settling at an intermediate frequency as partial sweeps. As we will discuss below, such partial sweeps can lead to qualitatively different evolutionary dynamics and affect their predictability.

If the initial growth is due to higher susceptibility, it is misleading to think of it as an intrinsic fitness advantage of the variant. Instead, the initial growth is the result of an imbalance in the immune state of the host population, which gradually disappears as the variant becomes more frequent, as shown in the central panels of Figure 1. In this sense, our model is comparable to ecological systems where interactions between organisms cannot be fully explained using a fixed scalar fitness for each strain but rather depend on the composition of the viral and host population. In particular, the stalling of the frequency increase that gives rise to partial sweeps is reminiscent of consumer-resource models (Tikhonov and Monasson, 2018; Good et al., 2018), highlighting the link between ecological and epidemiological models. An important consequence of these dynamics is that the equilibrium frequency reached by the variant, and its ultimate fate, are hard to predict from the observation of a partial frequency trajectory.

Ultimate fate of the invading variant

In the invasion scenario discussed above, dynamics stop after the initial variant has reached an equilibrium frequency. However, as the viral population evolves, new adaptive mutants can appear. In the framework of the SIR model, a new strain translates into extending the cross-immunity matrix by one row and one column. Each new variant performs its own partial sweep and saturates at frequencies $\beta_2, \beta_3, \dots$, sampled from some distribution $P_\beta$. This process is shown in panel A of Figure 2, using the SIR model to simulate up to $N = 7$ variants. For the sake of illustration, it shows a simple scenario where the initial variant appears at time 0 in a homogeneous wild-type population, and subsequent mutants appear at regular time intervals. Simulations are performed using $M = 10$ immune groups, resulting in a slight overshoot of the equilibrium frequency for each trajectory.

Figure 2. Dynamics of partial sweeps and subsequent fixation.


(A) Simulation of the Susceptible-Infected-Recovered (SIR) model (Equation 1 and Equation 2) with additional strains appearing at regular time intervals. The fraction of infections (frequency) caused by each strain is shown as a function of time. The first strain to appear at $t = 0$ is the variant of interest, and curves are shown in shades of red if they appear on the background of this variant, and of blue if they appear on the background of the wild-type. (B) Same as A but with frequencies stacked vertically. The black line delimiting the red and blue areas represents the frequency at which the mutations defining the original variant are found. (C) Three realizations of the random walk of Equation 9, all starting at $x \approx 0.5$. Two instances converge rapidly to frequencies 0 and 1, corresponding to apparent selective sweeps, while the remaining one oscillates for a longer time. (D) Representation of a partial sweep using the expiring fitness parametrization of Equation 11. The frequency $x$ of the variant is shown as a blue line saturating at value $\beta$ (gray line). The thin dashed line shows a selective sweep with constant fitness advantage $s_0$. The fitness $s$ is shown as a red dashed line, using the right axis.

Here, we focus on the mutation or set of mutations A that defines the initial variant. The initial growth rate advantage conferred by A eventually disappears, meaning that after some time we can consider it neutral. As subsequent mutants appear, they do so either on the background of the wild-type, in which case they do not carry A, or on the background of the initial variant, in which case they do carry A. If we suppose that recombination is negligible, the frequency of A increases or decreases as each new variant undergoes its own partial sweep. This process is shown in panel B of Figure 2, with shades of red (resp. blue) indicating a variant carrying A (resp. not carrying A). The thick line between the red and blue surfaces indicates the frequency at which mutation A is found, and in practice moves up and down randomly.

The scenario illustrated in Figure 2 suggests that many aspects of the variant dynamics can be approximated by a simple abstraction: if $x$ is the frequency of a mutation A, a new variant has a probability $x$ to appear on the background of A and thus carry A, and a probability $1 - x$ to not carry A. If new mutants emerge well separated in time with rate $\rho$, meaning that they reach equilibrium before the next variant emerges, and if new variants have similar cross-immunity with all existing variants (see Appendix 1.6), the dynamics of $x(t)$ are described by a particular random walk: in each time interval $dt$, a partial sweep of amplitude $\beta$ occurs with probability $\rho\, dt\, P_\beta(\beta)$, changing $x$ in the following way:

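$$x(t + dt) = x(t) + \Delta x, \quad \text{where} \quad \Delta x = \begin{cases} -\beta x & \text{with probability } 1 - x, \\ \beta (1 - x) & \text{with probability } x. \end{cases} \qquad (9)$$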

For example, if a new mutant appearing on the background of A performs a partial sweep of amplitude $\beta$, the frequency of A among the fraction $1 - \beta$ of strains not concerned by the sweep will still be $x$, and its frequency among the fraction $\beta$ of strains concerned by the sweep will be 1. The new frequency is thus $(1 - \beta)x + \beta$, which gives a frequency change of $\Delta x = (1 - x)\beta$. A similar reasoning gives the frequency change when the new mutant appears on the wild-type background. Finally, if no sweep occurs in the time interval $dt$, that is, with probability $1 - \rho\, dt$, $x$ remains unchanged. The resulting frequency dynamics of mutations have many similarities to the effect of ‘genetic draft’, that is, the frequency dynamics of neutral mutations due to linked selective sweeps (Gillespie, 2000).

Examples of trajectories from the random walk are shown in panel C of Figure 2, all starting around $x_0 \approx 1/2$. Two trajectories converge monotonically to 0 and 1. This is a consequence of an interesting property of Equation 9: the probability for $\Delta x$ to be positive increases with $x$, but the magnitude of upward steps decreases as $1 - x$, and symmetrically for downward steps. This leads to trajectories converging almost exponentially to 0 or 1: it can in fact be shown that trajectories that always go downward or always go upward represent a finite and relatively large fraction of all possible trajectories (see Appendix 2.4). On the other hand, steps away from the closest boundary are unlikely but much larger, resulting in ‘jack-pot’ events (Hallatschek, 2018). This can be seen in the blue trajectory in Figure 2C, which oscillates for a longer time.
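The random walk of Equation 9 is straightforward to simulate. In this sketch, time is counted in sweeps, since the waiting times between sweeps (rate $\rho$) only rescale the time axis, and the amplitude distribution $P_\beta$ is a hypothetical uniform distribution on $[0, 0.5]$; `partial_sweep_walk` is an illustrative name.

```python
import numpy as np


def partial_sweep_walk(x0, n_sweeps, sample_beta, rng):
    """Iterate Equation 9 sweep by sweep (time counted in sweeps rather than in units of 1/rho)."""
    x = x0
    path = [x]
    for _ in range(n_sweeps):
        beta = sample_beta(rng)
        if rng.random() < x:
            x = x + beta * (1.0 - x)    # new variant on the background of A: A is carried along
        else:
            x = x - beta * x            # new variant on the wild-type background: A is diluted
        path.append(x)
    return np.array(path)


rng = np.random.default_rng(0)
# Hypothetical amplitude distribution P_beta: uniform on [0, 0.5]
paths = [partial_sweep_walk(0.5, 100, lambda r: r.uniform(0.0, 0.5), rng) for _ in range(3)]
```

Plotting the three `paths` typically shows the behavior described above: monotone runs toward 0 or 1, occasionally interrupted by large jack-pot steps away from the nearest boundary.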

It is also interesting to look at the moments of the step size $\Delta x$. The first two are easily computed, and we find, per unit time,

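$$\langle \Delta x \rangle = 0, \qquad \langle \Delta x^2 \rangle = \rho \langle \beta^2 \rangle_{P_\beta}\, x (1 - x). \qquad (10)$$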

The first moment being 0 means that for this random walk, increasingly probable but small steps toward the closest boundary (0 or 1) are exactly compensated by rarer but larger steps away from the boundary. Importantly, this means that on average, the trajectory of mutation A is not biased toward either fixation or loss, regardless of the frequency that the initial partial sweep brought it to. For instance, a mutation seen at frequency $x_0$ should on average stay at this frequency, which means in practice that in a finite population, it has a chance $x_0$ to reach 1 and fix, and a chance $1 - x_0$ to reach 0 and vanish.
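Both properties can be checked numerically. The sketch below computes the conditional moments of a single sweep, which equal $0$ and $\beta^2 x (1 - x)$ (multiplying by $\rho$ and averaging over $P_\beta$ gives Equation 10), and then estimates by Monte Carlo the fraction of walks started at $x_0$ that end near fixation. The uniform $P_\beta$ on $[0, 0.5]$, the number of runs and the number of sweeps are arbitrary assumptions, and `step_moments` is a hypothetical name.

```python
import numpy as np


def step_moments(x, beta):
    """Mean and second moment of Delta x for one partial sweep of amplitude beta (Equation 9)."""
    up, down = beta * (1.0 - x), -beta * x
    mean = x * up + (1.0 - x) * down
    second = x * up**2 + (1.0 - x) * down**2
    return mean, second


# Monte Carlo check of the fixation probability, with hypothetical P_beta = uniform on [0, 0.5]
rng = np.random.default_rng(1)
x0, runs, sweeps = 0.3, 2000, 300
x = np.full(runs, x0)
for _ in range(sweeps):
    beta = rng.uniform(0.0, 0.5, size=runs)
    on_A = rng.random(runs) < x
    x = np.where(on_A, x + beta * (1.0 - x), x - beta * x)
print(x.mean(), (x > 0.5).mean())   # both close to x0, up to sampling noise
```

Because the walk has zero drift, the expected value of `x` stays at $x_0$; after a few hundred sweeps almost all runs sit near 0 or 1, so the fraction above 0.5 approximates the fixation probability.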

On the other hand, the second moment resembles neutral drift (Kimura, 1964): in neutral evolution, allele frequency also undergoes a zero-average random walk with the second moment having the form $x(1 - x)/N$, with $N$ being the population size. Therefore, this model would predict an ‘effective population size’ $N_e^{-1} = \rho \langle \beta^2 \rangle_{P_\beta}$, completely independent of the size of the viral population. However, there are important differences from neutral drift: in neutral evolution, higher moments of order $k > 2$ decay as $N^{1-k}$ and are thus negligible in large populations, whereas here they are independent of $N$ and scale as $\langle \beta^k \rangle_{P_\beta}$. Depending on higher moments of $P_\beta$, allele dynamics will deviate qualitatively from neutral behavior.
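The comparison with neutral drift can be made concrete by matching second moments. The sketch below uses hypothetical helpers and compares only the prefactors $\rho \langle \beta^k \rangle$ and $N^{1-k}$, leaving out the $x$-dependent factors. It sets $N = N_e$ and contrasts the higher-order terms, under the assumption of one sweep of amplitude 0.5 every 100 generations.

```python
import numpy as np


def draft_moments(rho, betas, k_max=4):
    """Prefactors rho * <beta^k> of the partial-sweep walk, estimated from sampled amplitudes.

    The k = 2 entry is the inverse effective population size, 1 / N_e = rho * <beta^2>.
    """
    betas = np.asarray(betas, dtype=float)
    return {k: rho * np.mean(betas**k) for k in range(2, k_max + 1)}


def neutral_moments(N, k_max=4):
    """Scaling N^(1 - k) of the k-th moment under neutral drift, for comparison."""
    return {k: float(N) ** (1 - k) for k in range(2, k_max + 1)}


rho = 0.01                      # hypothetical: one partial sweep every 100 generations
betas = np.full(1000, 0.5)      # hypothetical: all sweeps of amplitude 0.5
draft = draft_moments(rho, betas)
N_e = 1.0 / draft[2]            # about 400 for these values
neutral = neutral_moments(N_e)
for k in (3, 4):
    print(k, draft[k], neutral[k])
```

With these numbers, the third- and fourth-order prefactors of the partial-sweep walk exceed their neutral counterparts by factors of roughly 200 and 40,000, which illustrates why the dynamics cannot be reduced to neutral drift with a rescaled population size.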

Abstraction as ‘expiring’ fitness advantage

In general, the dynamics of the SIR model proposed in Equations 1 and 2 depend on the interactions between $N$ strains through an $N \times N$ cross-immunity matrix. While this model is useful to give a mechanistic explanation of partial sweeps, it is in general impractical to analyze, and to simulate numerically for many variants. The random walk model introduced above is simple to analyze and simulate, but assumes that variants rise to their equilibrium frequency instantaneously.

To explore the consequences of partial sweeps over broader parameter ranges, we propose an empirical model that has the same qualitative properties as the over-damped SIR model, namely a growth rate that decreases as a strain becomes more frequent and partial sweep trajectories, but is simpler to analyze and simulate on a large scale. In this effective model, the growth rate $s$ of the variant is not explicitly set by the susceptibility dynamics in the host population, but instead decays at a rate proportional to the frequency $x$ of the variant:

$$\dot{x} = s\, x (1 - x) \quad \text{and} \quad \dot{s} = -\nu\, x\, s. \qquad (11)$$

The dynamics of $x$ in the first equation are simply given by the usual logistic growth with fitness $s$. To mimic increasing immunity against the invading variant, the growth advantage $s$ decreases proportionally to the abundance of the variant (second part of Equation 11). The initial value $s_0$ is connected to the invasion rate of the SIR models given in Equation 38.

The dynamics of this new model are represented in panel D of Figure 2, with an initial frequency $x_0 \ll 1$ and an initial growth rate $s_0 = 0.05$. The initial growth of $x$ is identical to a classical selective sweep of fitness $s_0$ (represented as a dashed line). However, its fitness advantage gradually ‘expires,’ as shown by the red line in the figure. As the variant progressively ‘runs out of steam,’ its frequency finally saturates at a value $\beta$ given by (Appendix 2.2)

$$\beta = 1 - e^{-s_0/\nu}. \qquad (12)$$

This final value $\beta$ depends only on the ratio between the initial fitness advantage $s_0$ and the rate of fitness decay $\nu$. For a large enough $s_0$, $\beta$ can be arbitrarily close to 1, meaning that this model still accommodates full selective sweeps as a special case. In the general case, $x$ reaches its final value $\beta < 1$ and remains there forever unless other variants appear.
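A short derivation shows where Equation 12 comes from: dividing the two parts of Equation 11 gives $dx/ds = -(1 - x)/\nu$, so $\ln(1 - x) - s/\nu$ is conserved. Since $s \to 0$ at long times, $1 - \beta = (1 - x_0)\, e^{-s_0/\nu}$, which reduces to Equation 12 for $x_0 \to 0$. The sketch below integrates Equation 11 with a forward-Euler scheme (step size and horizon are arbitrary choices) and compares the result with this prediction, using $s_0 = 0.05$ as in Figure 2D and, as an assumption, $\nu = 0.05$.

```python
import math


def expiring_fitness(x0, s0, nu, T, dt=0.05):
    """Forward-Euler integration of Equation 11; returns the final frequency x and fitness s."""
    x, s = x0, s0
    for _ in range(int(T / dt)):
        dx = s * x * (1.0 - x)      # logistic growth with the current fitness s
        ds = -nu * x * s            # fitness decays in proportion to the variant's frequency
        x, s = x + dt * dx, s + dt * ds
    return x, s


x0, s0, nu = 1e-3, 0.05, 0.05
x_end, s_end = expiring_fitness(x0, s0, nu, T=1500.0)
beta_exact = 1.0 - (1.0 - x0) * math.exp(-s0 / nu)   # from the conserved quantity, finite x0
beta_eq12 = 1.0 - math.exp(-s0 / nu)                  # Equation 12, the x0 -> 0 limit
print(x_end, beta_exact, beta_eq12)
```

Inverting Equation 12, saturating at frequency $\beta$ requires $s_0/\nu = -\ln(1 - \beta)$; for example, $\beta = 0.99$ corresponds to $s_0/\nu \approx 4.6$.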

It is important to state that the main aim of this effective model is to qualitatively reproduce the phenomenology of the SIR model, in particular the partial sweeps, while being easier to simulate. It recapitulates the salient features of invading immune evasive variants: (i) initial exponential growth, and (ii) eventual saturation at an intermediate frequency. We can thus use it to analyze the long-term consequences of the random walk dynamics of Figure 2. However, we do not expect the frequency $x$ of the variant to have quantitatively equivalent dynamics in the two models. In particular, due to its simplicity, this model does not show the complex oscillatory behavior of the SIR model. Appendix 1.9 discusses in more detail the links between the parameters of the two models and their fundamental differences. While we can express the rate $\nu$ at which the growth rate declines in terms of the parameters of the simplest SIR models, for models with many groups or with oscillatory dynamics, the decay rate of the growth advantage should be interpreted as an effective parameter that captures a generic effect of reduced growth with increasing circulation.

Consequences for predictability and population dynamics

Accurate prediction of the dominant viral variants of the future could improve the choice of antigens in vaccines against rapidly evolving viruses. Specifically, if a potentially adaptive mutation is observed in a viral population, one would want to know whether the corresponding variants will grow in frequency and, if so, up to what frequency. The typical traveling wave framework predicts that fast-growing variants keep growing until an even fitter one appears. This way of thinking about the prediction problem has shown mixed results. In the case of A/H3N2 influenza, for instance, we showed that there are few signatures suggesting that fit variants consistently grow in frequency (Barrat-Charlaix et al., 2021).

In Figure 3, we reproduce some of our results from Barrat-Charlaix et al., 2021 and extend them to SARS-CoV-2. To quantify predictability, we ask the following question: given the state of a viral population at times $0, 1, \dots, t$, what can we say about variant frequencies at times $t+1, \dots$? We performed a retrospective analysis of viral evolution and identified all amino acid mutations that were observed to grow from frequency 0 to an arbitrary threshold $x$. Adaptive mutations should in principle be overrepresented in this group, and if they provide a persistent fitness advantage, we would expect them on average to keep growing beyond $x$. Figure 3 shows these trajectories for the amino acid substitutions in the HA protein of A/H3N2 influenza, using data from 2000–2023, and for the SARS-CoV-2 genome, using data from 2020–2023. Panels on the left show all trajectories that reached $x = 0.4$, with their average displayed in black. The panels on the right show the average trajectory for different threshold values $x$ between 0.1 and 0.8.
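The selection step of this retrospective analysis can be sketched as follows. This is a hypothetical helper rather than the pipeline behind Figure 3: it assumes each mutation's frequency trajectory is a list of values on a common time grid, keeps trajectories that start at frequency 0 and later reach the threshold, aligns them at the first time point where they do, and averages over the trajectories still observed at each later offset.

```python
from statistics import mean


def aligned_mean_trajectory(trajectories, threshold, horizon):
    """Mean frequency at offsets 0..horizon after first reaching `threshold`.

    trajectories: iterable of frequency lists on a common time grid (assumed format).
    Only trajectories starting at frequency 0 are considered. Entry k of the result
    is the mean frequency k steps after crossing, or None if no trajectory is
    observed that far.
    """
    aligned = []
    for traj in trajectories:
        if not traj or traj[0] != 0:
            continue
        crossing = next((t for t, x in enumerate(traj) if x >= threshold), None)
        if crossing is not None:
            aligned.append(traj[crossing:crossing + horizon + 1])
    result = []
    for k in range(horizon + 1):
        values = [segment[k] for segment in aligned if len(segment) > k]
        result.append(mean(values) if values else None)
    return result


trajs = [
    [0.0, 0.2, 0.45, 0.6, 1.0],  # rises and fixes
    [0.0, 0.1, 0.4, 0.3, 0.0],   # rises, then is lost
    [0.0, 0.1, 0.2, 0.1, 0.0],   # never reaches 0.4: excluded
]
print(aligned_mean_trajectory(trajs, 0.4, 2))
```

Averaging only over trajectories still observed at a given offset is one simple choice; real surveillance data would additionally require handling uneven sampling and time binning.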

Figure 3. Retrospective analysis of predictability of viral evolution: frequency trajectories of all amino acid substitutions that are observed to rise from frequency 0 to $x$. Top: influenza virus A/H3N2 from 2000 to 2023. Bottom: SARS-CoV-2 from 2020 to 2023.

Left: all trajectories for $x = 0.4$, with blue ones ultimately vanishing and red ones ultimately fixing. The average of all trajectories is shown as a thick black line. Right: only the average trajectories for different values of $x$ (gray lines).


Figure 3—figure supplement 1. Example of mutation frequency trajectories that are increasing up to a frequency of 0.5 for H3N2/HA influenza and the expiring fitness model.


For the latter, the parameters used are $\alpha = s_0 = 0.03$ and three values of $\rho$ to illustrate different clonal interference regimes. In each case, 10 randomly selected trajectories are plotted, with blue indicating final loss and red final fixation.

While the dynamics of the variants of the two viruses cannot be compared directly due to vastly different sampling intensities and different rates of adaptation, the qualitative patterns differ strikingly. In the case of influenza, trajectories of seemingly adaptive mutations show little inertia and on average hover around $x$ instead of growing. This surprising result is in line with Barrat-Charlaix et al., 2021, which used data from the period 2000–2018. In contrast, trajectories of SARS-CoV-2 mutations behave much more smoothly, with steady growth beyond $x$. On longer timescales, however, we observe a systematic decrease in frequency: this is explained by the particular early dynamics of SARS-CoV-2, in which new variants arose at a rapid pace and replaced old ones. This process, often called clonal interference, reduces long-term predictability.

In our setting of eco-evolutionary adaptation, the random walk model predicts that the probability of fixation of an immune evasive variant is given by the final frequency $\beta$ of its initial partial sweep. Subsequent allele dynamics and diversity are governed by an anomalous coalescent process driven by the random walk defined in Equation 9, which leaves little room for predicting evolution. This abstraction should hold when partial sweeps are effectively instantaneous and do not overlap, meaning that the rate $\rho$ at which new variants emerge is small compared to their initial growth rate $s_0$.
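To make the statement about fixation concrete, the sketch below simulates one simple form of a partial-sweep random walk. Equation 9 is not reproduced in this section, so the update rule is an assumption for illustration: each new escape variant arises on the background of a focal allele with probability equal to the allele's current frequency $x$ and then sweeps to a fraction $\beta$ of the population, so that $x \to x + \beta(1-x)$ with probability $x$ and $x \to (1-\beta)x$ otherwise. The expected change is $x\beta(1-x) - (1-x)\beta x = 0$, so the walk is a martingale and an allele at frequency $x$ fixes with probability $x$; a new variant whose own partial sweep brings it to $\beta$ therefore fixes with probability $\beta$.

```python
import random


def fixation_probability(x0, beta, replicates=5000, eps=1e-6, seed=1):
    """Monte Carlo fixation probability of an allele at frequency x0.

    Assumed walk: each partial sweep of size beta lands on the allele's background
    with probability x (x -> x + beta * (1 - x)) and elsewhere otherwise
    (x -> (1 - beta) * x). Loss or fixation is declared within eps of 0 or 1.
    """
    rng = random.Random(seed)
    fixed = 0
    for _ in range(replicates):
        x = x0
        while eps < x < 1.0 - eps:
            if rng.random() < x:
                x += beta * (1.0 - x)
            else:
                x *= 1.0 - beta
        fixed += x >= 1.0 - eps
    return fixed / replicates


# Should be close to the starting frequency, up to Monte Carlo error.
print(fixation_probability(0.3, 0.3))
```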

To explore the behavior of our partial sweep model in a more general setting, we simulate the evolutionary dynamics of a population under a Wright-Fisher model with expiring fitness dynamics. Simulations involve a population of $N$ viruses with a genotype in which each position can be in one of two possible states, $\sigma_i \in \{0, 1\}$. Fitness effects $s_i$ are associated with mutations at each position, and the total fitness of a virus is given by $F = \sum_i \sigma_i s_i$. At each generation, viruses with fitness $F$ expand by a factor $e^F$, and the next generation is constructed by sampling $N$ individuals from the previous one. Following Equation 11, mutational effects $s_i$ decrease by an amount $\nu x_i s_i$, where $x_i$ is the frequency at which mutation $i$ is found in the population.
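One generation of this Wright-Fisher scheme can be sketched in plain Python. Genomes are stored as tuples of 0/1 states and `s` holds the current effects $s_i$. The function names, the data layout, updating $s_i$ with the post-sampling frequencies, and taking $x_i$ as the frequency of the allele favored by $s_i$ (state 1 if $s_i > 0$, state 0 otherwise) are assumptions of this sketch, not details of the authors' implementation.

```python
import math
import random


def wf_generation(genomes, s, nu, rng):
    """Advance the population by one Wright-Fisher generation with expiring fitness.

    genomes: list of N tuples of 0/1 states (sigma_i).
    s: list of fitness effects s_i, updated in place as s_i -> s_i - nu * x_i * s_i.
    rng: a random.Random instance.
    Returns the genomes of the next generation.
    """
    n = len(genomes)
    # Growth factor e^F with F = sum_i sigma_i * s_i, computed once per distinct genotype.
    growth = {g: math.exp(sum(si for si, sigma in zip(s, g) if sigma)) for g in set(genomes)}
    # Sample N offspring with probabilities proportional to e^F.
    offspring = rng.choices(genomes, weights=[growth[g] for g in genomes], k=n)
    # Expiring fitness: x_i is the frequency of the allele favored by s_i.
    for i in range(len(s)):
        freq_1 = sum(g[i] for g in offspring) / n
        x_i = freq_1 if s[i] > 0 else 1.0 - freq_1
        s[i] -= nu * x_i * s[i]
    return offspring


def allele_frequency(genomes, i):
    """Frequency of state 1 at position i."""
    return sum(g[i] for g in genomes) / len(genomes)
```

With `nu = 0` a beneficial mutant performs a classical sweep, while `nu > 0` makes its growth stall at an intermediate frequency as its effect decays.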

We simulate the emergence of adaptive variants in the following way. At a constant rate $\rho$, we pick one sequence position $i$ that has no polymorphism and set the fitness effect of the corresponding mutation to an initial value $s_i$, with an amplitude drawn from the probability distribution $P_s$ and the sign chosen such that the mutation is adaptive. In practice, we use an exponential distribution $P_s \propto e^{-s/s_0}$, meaning that the typical magnitude of initial fitness effects is set by a single parameter $s_0$. The corresponding distribution of partial sweep sizes is described in Appendix 2.3. At the same time, we introduce the corresponding mutant in the population at a low frequency, picking its background genotype from a random existing strain. The behavior of the model is determined by (i) the distribution $P_\beta$ of partial sweep sizes, which depends on $\nu/s_0$, and (ii) the ratio $\rho/s_0$ of the variant emergence rate to their growth rate, which determines how often sweeps overlap and interact. The probability of two sweeps overlapping is defined in Appendix 2, Equation 42.
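The emergence step can be sketched in the same style. Everything here is an illustrative assumption rather than the authors' implementation: new variants arrive as a Poisson process with rate $\rho$ per generation (counts are built from exponential waiting times, since the Python standard library has no Poisson sampler), magnitudes are drawn from an exponential distribution with mean $s_0$, and the mutant overwrites a small number of randomly chosen individuals set by a hypothetical initial frequency `freq`. Because $F = \sum_i \sigma_i s_i$, $s_i$ is positive when the new state is 1 and negative when it is 0.

```python
import random


def poisson_count(rng, rate):
    """Number of events in one generation of a Poisson process with the given rate."""
    if rate <= 0:
        return 0
    count, t = 0, rng.expovariate(rate)
    while t < 1.0:  # accumulate exponential waiting times within one unit of time
        count += 1
        t += rng.expovariate(rate)
    return count


def introduce_variants(genomes, s, rho, s0, freq, rng):
    """Introduce new adaptive variants for one generation (hypothetical helper).

    For each arrival: pick a monomorphic position i, draw |s_i| from an exponential
    distribution with mean s0, orient s_i so that the new state is favored under
    F = sum_i sigma_i s_i, and copy the mutated genome of a random existing strain
    into about freq * N random slots. `genomes` is modified in place.
    Returns the positions that received a new variant.
    """
    n = len(genomes)
    used = []
    for _ in range(poisson_count(rng, rho)):
        monomorphic = [i for i in range(len(s)) if len({g[i] for g in genomes}) == 1]
        if not monomorphic:
            break  # every position is already polymorphic
        i = rng.choice(monomorphic)
        background = rng.choice(genomes)
        new_state = 1 - background[i]
        magnitude = rng.expovariate(1.0 / s0)
        s[i] = magnitude if new_state == 1 else -magnitude
        mutant = background[:i] + (new_state,) + background[i + 1:]
        for slot in rng.sample(range(n), max(1, round(freq * n))):
            genomes[slot] = mutant
        used.append(i)
    return used
```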

We use this simulation to address the question of predictability: given the state of the population at generations $0, 1, \dots, t$, can we predict its state at future times $t+1, \dots$? Specifically, we ask whether we can predict the frequency $x(t+\Delta t)$ of a variant A, given that it is at frequency $x$ at time $t$, as we did previously for the influenza virus (Barrat-Charlaix et al., 2021; see Figure 3). The dynamics of isolated selective sweeps ($\rho/s_0 \ll 1$, $\nu/s_0 \ll 1$) should be perfectly predictable: after an initial stochastic phase when the variant is very rare, its frequency grows monotonically to fixation. This predictability decreases with increasing $\rho/s_0$ due to clonal interference (Schiffels et al., 2011; Strelkowa and Lässig, 2012), for example when an adaptive variant is outcompeted by an even more adaptive one. We also expect predictability to decrease with increasing $\nu/s_0$, since sweeps are then partial and their ultimate fixation is determined by subsequent variants, with dynamics that resemble a random walk.

To quantify these effects, we select from a long simulation all rising frequency trajectories of adaptive mutations that cross an arbitrary threshold $x$. Panel A of Figure 4 shows the average $x(t)$ of rising frequency trajectories after crossing the threshold $x = 0.5$. We use four rates of fitness decay, $\nu \in \{0, s_0/3, s_0, 3s_0\}$, and low clonal interference, $\rho/s_0 = 0.05$. The case $\nu = 0$ corresponds to a classical traveling wave scenario with constant fitness effects and, as expected, is the most predictable: the average trajectory rises well above 0.5. For larger values of $\nu/s_0$, corresponding to a quicker decay of fitness, predictability gradually declines and becomes negligible for $\nu/s_0 \gtrsim 1$. This agrees well with the random walk model, in which the expected change in frequency $\Delta x$ is zero.

Figure 4. Simulations under the Wright-Fisher model with expiring fitness.

(A) Average frequency dynamics of immune escape mutations that are found to cross the frequency threshold $x = 0.5$, for four different rates of fitness decay. If the growth advantage is lost rapidly (high $\nu/s_0$), the trajectories crossing $x$ have little inertia, while a stable growth advantage (small $\nu/s_0$) leads to steadily increasing frequencies. (B, C, D) Ultimate probability of fixation $p_\mathrm{fix}(x)$ of trajectories found crossing the frequency threshold $x$. Each panel corresponds to a different rate of emergence of immune escape variants, with four rates of fitness decay per panel. Increased clonal interference $\rho/s_0$ and fitness decay $\nu/s_0$ both result in a gradual loss of predictability. We use $s_0 = 0.03$. (E) Time to the most recent common ancestor $T_\mathrm{MRCA}$ for the simulated population, as a function of the prediction obtained using the random walk, $N_e = 1/(\rho\langle\beta^2\rangle)$. Points correspond to different choices of parameters $\rho$ and $P_\beta$, and a darker color indicates a higher probability of overlap as computed in Appendix 2.2.


Figure 4—figure supplement 1. Probability of fixation $p_\mathrm{fix}(x)$ of mutation frequency trajectories found crossing the frequency threshold $x$.


Fitness effects are exponentially distributed with fixed scale $s_0 = 0.03$. The blue-to-red color gradient corresponds to increasing rates $\rho$ at which mutations are introduced. The strong clonal interference regime is obtained when $\rho/s_0 > 1$, in which case beneficial mutations are introduced in close succession and compete for fixation. At low $\rho/s_0$, trajectories are very predictable and an increasing trajectory almost certainly fixes. Even for the highest $\rho/s_0$, $p_\mathrm{fix}(x)$ remains significantly larger than $x$ and the dynamics are visibly not neutral.

To explore parameter space more systematically, we quantify predictability as the probability of fixation $p_\mathrm{fix}$ of rising variants that cross threshold $x$. In a perfectly predictable scenario with well-separated selective sweeps, $p_\mathrm{fix}$ should be close to 1 regardless of $x$, while it should be equal to $x$ in an unpredictable setting such as neutral evolution.
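A minimal sketch of this estimator, assuming each trajectory is a list of frequencies starting at 0 and ending either lost (0) or fixed (1): for each threshold, keep the trajectories that reach it and report the fraction that end fixed. The function and argument names are hypothetical. Values near 1 indicate predictable, sweep-like dynamics, and values near the diagonal $p_\mathrm{fix} = x$ indicate neutral-like dynamics.

```python
def fixation_probability_curve(trajectories, thresholds):
    """Estimate p_fix(x) for each threshold x (hypothetical helper).

    trajectories: frequency lists that start at 0 and end at 0 (lost) or 1 (fixed).
    Returns {x: fraction of trajectories reaching x that end fixed}, with None
    when no trajectory reaches x.
    """
    curve = {}
    for x in thresholds:
        crossing = [traj for traj in trajectories if traj[0] == 0 and max(traj) >= x]
        if crossing:
            curve[x] = sum(traj[-1] == 1 for traj in crossing) / len(crossing)
        else:
            curve[x] = None
    return curve


trajs = [[0, 0.3, 0.6, 1], [0, 0.4, 0.2, 0], [0, 0.1, 0]]
print(fixation_probability_curve(trajs, [0.25, 0.5]))
```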

In panels B, C, and D of Figure 4, the fixation probabilities are shown for three values of $\rho/s_0$ and four values of $\nu/s_0$. Clonal interference increases from left to right across these panels (increasing $\rho/s_0$), while the intensity of fitness decay increases from blue to red curves (increasing $\nu$). Increasing either $\rho/s_0$ or $\nu/s_0$ reduces $p_\mathrm{fix}$ towards the dashed diagonal corresponding to $p_\mathrm{fix} = x$. However, as observed previously (Barrat-Charlaix et al., 2021), in the classic scenario with stable fitness effects ($\nu/s_0 = 0$), considerable predictability remains even in cases of strong interference (blue curve in panel D and Figure 4—figure supplement 1). The strong interference setting is explored in more detail in Appendix 2.1 up to values of $\rho/s_0 \sim 30$, using similar simulations but without expiring fitness ($\nu = 0$). Figure 4—figure supplement 1 shows that even in these cases of strong interference, $p_\mathrm{fix}$ remains significantly above the diagonal.

Finally, we use our simulation to investigate typical levels of diversity in the population and the time to the most recent common ancestor. One quantity that can easily be estimated from the random walk model is the average pairwise coalescence time $T_2$, that is, the typical time separating two random strains from their most recent common ancestor (MRCA). In Appendix 2.5, we show that under the random walk approximation $T_2 = 1/(\rho\langle\beta^2\rangle_{P_\beta})$, which in neutral models of evolution would correspond to the effective population size $N_e$. A more detailed analysis of the coalescent process reveals that the random walk approximation corresponds to the so-called $\Lambda$-coalescent (Schweinsberg, 2000; Berestycki, 2009).
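The random-walk prediction for $T_2$ can be evaluated in closed form for the exponential distribution of initial effects used in the simulations, if one assumes that each partial sweep reaches the deterministic size $\beta = 1 - e^{-s/\nu}$ of Equation 12 (ignoring overlaps and the initial stochastic phase). For $s$ exponentially distributed with mean $s_0$, $\langle e^{-ks}\rangle = 1/(1 + ks_0)$, hence $\langle\beta^2\rangle = 1 - 2\nu/(\nu + s_0) + \nu/(\nu + 2s_0)$. The sketch below computes $T_2 = 1/(\rho\langle\beta^2\rangle)$ this way and checks $\langle\beta^2\rangle$ against a Monte Carlo average; function names and parameter values are illustrative.

```python
import math
import random


def mean_beta_squared(s0, nu):
    """<beta^2> for beta = 1 - exp(-s/nu), with s exponentially distributed with mean s0."""
    return 1.0 - 2.0 * nu / (nu + s0) + nu / (nu + 2.0 * s0)


def pairwise_coalescence_time(rho, s0, nu):
    """Random-walk estimate T2 = 1 / (rho * <beta^2>), in the time units of rho."""
    return 1.0 / (rho * mean_beta_squared(s0, nu))


def mean_beta_squared_mc(s0, nu, samples=100_000, seed=0):
    """Monte Carlo estimate of <beta^2>, used to check the closed form."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        s = rng.expovariate(1.0 / s0)   # initial effect with mean s0
        beta = 1.0 - math.exp(-s / nu)  # deterministic partial sweep size (Equation 12)
        total += beta * beta
    return total / samples


s0 = 0.03
print(mean_beta_squared(s0, s0), mean_beta_squared_mc(s0, s0))
print(pairwise_coalescence_time(0.05 * s0, s0, s0))
```

At $\nu = s_0$ the closed form gives $\langle\beta^2\rangle = 1/3$, i.e., $T_2 = 3/\rho$.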

In panel E of Figure 4, the average time to the common ancestor of pairs of strains in the population is plotted as a function of $T_2$ predicted by the random walk model. Each point in the figure corresponds to one long simulation with a given distribution of partial sweep sizes $P_\beta$ and a given $\rho$, which together set $T_2$; darker colors indicate a higher probability of overlap as computed in Appendix 2.2. We find good agreement between the empirical time to MRCA and the estimate from the random walk, at least as long as the probability of overlap between successive partial sweeps is small (indicated by shading). With increasing overlap, coalescence slows down and diversity increases: points in darker shades of red tend to have a larger time to MRCA than expected from the distribution of $\beta$. This is intuitive: if another adaptive variant emerges before the previous one has reached its final frequency, it has a lower probability of landing on the same background and thus tends to compete with the first variant. This leads to a smaller effective $\beta$, which slows the dynamics.

Discussion

Evolutionary adaptation is often pictured as an optimization problem in a static environment. In many cases, however, this environment is changed by the presence of the evolving species, for example, because a host population develops immunity or because the ecology itself is dynamic. Here, we have explored the consequences of such eco-evolutionary dynamics in a case of host-pathogen co-evolution where different variants of a pathogen shape each other’s environment through the generation of cross-immunity.

Influenza virus evolution has been the subject of intense research, with efforts to predict the composition of future viral populations (Bush et al., 1999; Luksza and Lässig, 2014; Neher et al., 2014; Huddleston et al., 2020). The A/H3N2 subtype in particular undergoes rapid antigenic change through frequent substitutions in prominent epitopes on its surface proteins (Smith et al., 2004; Bhatt et al., 2011; Neher et al., 2016; Kistler and Bedford, 2023). Given the clear signal of adaptive evolution, one might expect A/H3N2 to be predictable in the sense that variants that grow keep growing. Yet, it has been difficult to find convincing signals of fit, antigenically novel variants that consistently grow and replace their competitors (Barrat-Charlaix et al., 2021; Huddleston et al., 2020). In contrast, SARS-CoV-2 evolution has been consistently predictable in the sense that its dynamics are well modeled by exponentially growing variants that compete for a common pool of susceptible hosts. However, even in this case, taking into account the immune adaptation of hosts leads to a better description of variant dynamics (Meijers et al., 2023).

We have shown that depending on (i) the heterogeneity of immunity in the population, (ii) the asymmetry between backward and forward cross-immune recognition, and (iii) waning or turnover of immunity, immune escape can lead either to dynamics dominated by selective sweeps or to dynamics in which escape mutations have an initial growth advantage that dissipates before the variant fixes. The former scenario is observed when initial growth is fast, backward immunity is high, and waning is slow compared to variant dynamics. In this case, new variants can rise to high frequency driven by their own advantage and fix. Immunological heterogeneity slows down the initial rise, allowing population immunity to respond and adjust before the variant has fixed.

This process of partial sweeps reconciles two seemingly contradictory observations. On the one hand, HA evolution in human influenza A virus is clearly driven by adaptive immune escape, and most substitutions are clustered in epitope regions (Bhatt et al., 2013). On the other hand, most substitutions do not sweep to fixation but tend to meander in a quasi-neutral fashion (Barrat-Charlaix et al., 2021). In the partial sweep scenario proposed here, diversity is dominated by immune escape mutations that are rapidly brought to macroscopic frequency by their initial growth advantage, but whose ultimate fate is determined mostly by subsequent mutations.

In any real-world scenario, there will be a variety of mutations, including some that perform complete selective sweeps, either because they escape the immunity of a large fraction of the population ($M$ small), because they generate robust immunity against previous strains (‘back-boost’; Fonville et al., 2014), or because they increase the intrinsic transmissibility of the virus (for example, by reverting a previous escape mutation that had a deleterious effect on transmissibility). The degree to which partial sweeps matter will vary from virus to virus and will change over time. Recently emerged viruses circulate in a homogeneous immune landscape and adapt to the new host for some time, consistent with the rapid and complete sweeps of variants in SARS-CoV-2. Similarly, the influenza virus A/H1N1pdm, which emerged in humans in 2009, exhibited more consistent trajectory dynamics than A/H3N2 (Barrat-Charlaix et al., 2021).

More generally, qualitative features of the partial sweep dynamics investigated here are expected in any system where the environment responds to evolutionary changes on time scales comparable to the time it takes for adaptive variants to take over, leading to eco-evolutionary dynamics (Pelletier et al., 2009). In ecological systems involving eukaryotes, it is the evolutionary part of this interaction that is thought of as slow, while ecology is fast. For rapidly adapting RNA viruses in human populations with long-lived immunological memory, models often assume that viral adaptation is fast while hosts have long-lasting memory. The most complex and least predictable dynamics are expected when the evolutionary and ecological time scales are similar, and different host-pathogen systems will fall at different points along this axis.

Materials and methods

Code availability

Data availability

Sequence data of influenza viruses was obtained from GISAID (Shu and McCauley, 2017). We thank the teams involved in sample collection, sequencing, and processing of these data for their contribution to global surveillance of influenza virus circulation. A table acknowledging all originating and submitting laboratories is provided as supplementary information.

Sequence data of SARS-CoV-2 viruses was obtained from NCBI and restricted to data from North America to ensure more homogeneous sampling. We are grateful to all teams involved in the collection and generation of these data for generously sharing these data openly.

Acknowledgements

We gratefully acknowledge research support from the University of Basel (core funding) and the Swiss National Science Foundation (grant 310030_188547).

Appendix 1

SIR model

Equilibrium of the SIR model with one immune group

To help us compute the equilibrium reached by the SIR model, we introduce additional notation: the vectors $\mathbf{S} = [S_1, \dots, S_N]$ and $\mathbf{I} = [I_1, \dots, I_N]$ represent the compartments of hosts susceptible and infectious to each strain, respectively; the matrix $K$ describes the cross-immunity; and $\mathbf{1}$ is the vector of dimension $N$ whose elements are all equal to 1.

We derive the equilibrium state for Equations 1 and 2. For each strain $a$, equilibrium for $I_a$ is reached when $S_a = \delta/\alpha$. We thus have $\mathbf{S}^\mathrm{eq} = (\delta/\alpha)\mathbf{1}$. Introducing this in the equation for the derivative of $S_a$, we obtain the following equilibrium:

$$\mathbf{S}^\mathrm{eq} = \frac{\delta}{\alpha}\mathbf{1}, \qquad \mathbf{I}^\mathrm{eq} = \frac{\gamma}{\delta}\left(1 - \frac{\delta}{\alpha}\right)K^{-1}\mathbf{1}. \qquad (13)$$

An interesting remark is that at equilibrium, $\mathbf{I}$ is of order $\gamma/\delta \ll 1$. Note that the structure of $K$ makes it invertible in most cases. Indeed, we impose $K_{aa} = 1$ and $0 \le K_{ab} < 1$ for $a \neq b$.

Equilibrium for two viruses with one immune group

We consider the case where two viruses are present, called wild-type (wt) and mutant (m). The cross-immunity is represented by a 2×2 matrix

$$K = \begin{bmatrix} 1 & b \\ f & 1 \end{bmatrix}, \qquad (14)$$

As shown in the previous section, the equilibrium is given by

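$$S_\mathrm{wt} = S_\mathrm{m} = \frac{\delta}{\alpha}, \qquad \mathbf{I} = \frac{\gamma}{\delta}\left(1 - \frac{\delta}{\alpha}\right)K^{-1}\mathbf{1}, \qquad (15)$$

where $\mathbf{I}$ stands for $[I_\mathrm{wt}, I_\mathrm{m}]$ and $\mathbf{1}$ for $[1, 1]$. It is straightforward to invert the cross-immunity matrix, and we obtain

$$I_\mathrm{wt} = \frac{\gamma(1 - \delta\alpha^{-1})}{\delta(1 - bf)}(1 - b), \qquad I_\mathrm{m} = \frac{\gamma(1 - \delta\alpha^{-1})}{\delta(1 - bf)}(1 - f). \qquad (16)$$

Note that without cross-immunity, the number of hosts infected by either virus would be $\gamma(1 - \delta\alpha^{-1})/\delta$. Positive values of $b$ and $f$ thus lower the equilibrium values of $I_\mathrm{m}$ and $I_\mathrm{wt}$ with respect to the absence of cross-immunity.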

It is interesting to compute the fraction of infections due to the mutant at equilibrium. This is easily derived from the relations above:

$$\frac{I_\mathrm{m}}{I_\mathrm{wt} + I_\mathrm{m}} = \frac{1 - f}{2 - b - f}. \qquad (17)$$

A few observations can be made:

  • if $b = 1$ and $f < 1$, then the wild-type vanishes at equilibrium and the mutant reaches frequency 1. In this case, the presence of the mutant alone is enough to keep $S_\mathrm{wt}$ at its threshold value $R_0^{-1}$, making it impossible for the wild-type to grow.

  • Conversely, if $f = 1$, then the mutant stays at frequency 0.

  • If $b, f < 1$, the mutant reaches an intermediate frequency $0 < x < 1$, with $x > 0.5$ if $b > f$ and $x < 0.5$ if $b < f$.
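Equations 16 and 17 can be checked by integrating the single-group dynamics until they settle. The sketch assumes that Equations 1 and 2 take the one-group form $\dot S_a = -\alpha S_a \sum_b K_{ab} I_b + \gamma(1 - S_a)$ and $\dot I_a = \alpha S_a I_a - \delta I_a$ (the $\phi_a = 1$ case of Equation 31 below); the parameter values, step size, and function names are illustrative. It starts from the invasion scenario, with the wild-type at its single-strain equilibrium and the mutant at low prevalence.

```python
def sir_rhs(state, alpha, delta, gamma, K):
    """Single-group two-strain dynamics; state = [S_wt, S_m, I_wt, I_m]."""
    S, I = state[:2], state[2:]
    dS = [-alpha * S[a] * sum(K[a][c] * I[c] for c in range(2)) + gamma * (1.0 - S[a])
          for a in range(2)]
    dI = [alpha * S[a] * I[a] - delta * I[a] for a in range(2)]
    return dS + dI


def integrate(state, dt, steps, *params):
    """Classical fourth-order Runge-Kutta integration of sir_rhs."""
    for _ in range(steps):
        k1 = sir_rhs(state, *params)
        k2 = sir_rhs([x + 0.5 * dt * k for x, k in zip(state, k1)], *params)
        k3 = sir_rhs([x + 0.5 * dt * k for x, k in zip(state, k2)], *params)
        k4 = sir_rhs([x + dt * k for x, k in zip(state, k3)], *params)
        state = [x + dt * (p + 2.0 * q + 2.0 * r + w) / 6.0
                 for x, p, q, r, w in zip(state, k1, k2, k3, k4)]
    return state


alpha, delta, gamma, b, f = 2.0, 1.0, 0.2, 0.3, 0.5
K = [[1.0, b], [f, 1.0]]
# Wild-type at its single-strain equilibrium, mutant introduced at low prevalence.
start = [delta / alpha, delta / (delta + f * (alpha - delta)),
         gamma / delta * (1.0 - delta / alpha), 1e-4]
final = integrate(start, 0.1, 6000, alpha, delta, gamma, K)
prefactor = gamma / delta * (1.0 - delta / alpha) / (1.0 - b * f)
print("I_wt", final[2], "vs", prefactor * (1.0 - b))
print("I_m ", final[3], "vs", prefactor * (1.0 - f))
print("mutant frequency", final[3] / (final[2] + final[3]), "vs", (1.0 - f) / (2.0 - b - f))
```

With $b = 0.3 < f = 0.5$, the mutant settles below frequency 0.5, as stated in the last case above.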

Equilibrium without the mutant

We first derive the equilibrium before the mutant virus is introduced, in the case with only one immune group. Recall that in this case there is a single cross-immunity matrix, of the form

$$K = \begin{pmatrix} 1 & b \\ f & 1 \end{pmatrix},$$

where b is the immunity to the wild-type caused by an infection with the mutant, and f the reverse.

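Since the mutant is absent from the host population, we set $I_\mathrm{m} = 0$. The equilibrium values of $S_\mathrm{wt}$ and $I_\mathrm{wt}$ are easily obtained from the dynamical equations:

$$S_\mathrm{wt} = \delta/\alpha = R_0^{-1}, \qquad I_\mathrm{wt} = \frac{\gamma}{\delta}(1 - S_\mathrm{wt}). \qquad (18)$$

We then set the derivative of $S_\mathrm{m}$ to 0:

$$-\alpha f S_\mathrm{m} I_\mathrm{wt} + \gamma(1 - S_\mathrm{m}) = 0 \quad\Rightarrow\quad S_\mathrm{m} = \frac{1}{1 + f(R_0 - 1)} = \frac{\delta}{\delta + f(\alpha - \delta)} > \frac{\delta}{\alpha} = S_\mathrm{wt}. \qquad (19)$$

Since we assume $f < 1$, the initial fraction of hosts susceptible to the mutant is larger than $\delta/\alpha$, allowing the initial growth of the mutant. Using the dynamical equation for $I_\mathrm{m}$, the initial per-capita growth rate of the mutant can be written as

$$\left.\frac{\dot I_\mathrm{m}}{I_\mathrm{m}}\right|_{t=0} = \alpha S_\mathrm{m} - \delta = \delta\left(\frac{\alpha}{\delta + f(\alpha - \delta)} - 1\right). \qquad (20)$$

If $f = 0$, the growth rate is $\alpha - \delta$, i.e., the one expected in a fully naive population. If $f = 1$, however, the growth rate is 0, as the wild-type confers perfect immunity to the mutant.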

The equations above generalize to more immune groups. Cross-immunity matrices $K^i$ now depend on parameters $f_i$ and $b_i$, and the initial fraction of susceptibles to the mutant in immune group $i$ is given by

$$S_i^\mathrm{m} = \frac{\delta}{\delta + f_i(\alpha - \delta)}. \qquad (21)$$

In a given immune group $i$, the mutant growth rate is proportional to $S_i^\mathrm{m} - \delta/\alpha$. The growth rate of the mutant will thus initially be faster in immune groups for which it is antigenically different, i.e., $f_i < 1$, than in groups where it is similar to the wild-type, i.e., $f_i \approx 1$.

In the case of a well-mixed population, that is, $C_{ij} = 1/M$, the total number of mutant infections $I_\mathrm{m} = \sum_i I_i^\mathrm{m}$ grows exponentially with a time-dependent rate, obtained by differentiating $I_\mathrm{m}$:

$$\dot I_\mathrm{m} = \left(\frac{\alpha}{M}\sum_{i=1}^{M} S_i^\mathrm{m} - \delta\right) I_\mathrm{m}. \qquad (22)$$

In particular, using the invasion scenario from the main text with $\varepsilon = 0$ (i.e., $f_i = 1$) in the $M - 1$ groups $i > 1$ and an arbitrary value $f$ in group 1, we obtain the following growth rate at $t = 0$:

$$\dot I_\mathrm{m} = \frac{\delta}{M}\left(\frac{\alpha}{\delta + f(\alpha - \delta)} - 1\right) I_\mathrm{m}. \qquad (23)$$

That is, the initial growth rate for M groups is M times smaller than the one in the single group case.
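Equations 19–23 can be evaluated directly. The short sketch below (function names and parameter values are illustrative) computes the mutant's initial susceptible fraction and per-capita growth rate in a single group, and the well-mixed rate of Equation 22 when the mutant escapes immunity in only one of $M$ groups.

```python
def initial_mutant_susceptibles(alpha, delta, f):
    """Equation 19 (or 21 with f = f_i): S_m = delta / (delta + f * (alpha - delta))."""
    return delta / (delta + f * (alpha - delta))


def initial_growth_rate(alpha, delta, f, M=1):
    """Per-capita initial growth rate: Equation 20 for M = 1, Equation 23 otherwise."""
    return (alpha * initial_mutant_susceptibles(alpha, delta, f) - delta) / M


def well_mixed_growth_rate(alpha, delta, susceptibles):
    """Equation 22: (alpha / M) * sum_i S_i^m - delta, with M = len(susceptibles)."""
    return alpha / len(susceptibles) * sum(susceptibles) - delta


alpha, delta, f, M = 2.0, 1.0, 0.5, 4
# Group 1 with forward cross-immunity f, the other M - 1 groups with f_i = 1.
groups = [initial_mutant_susceptibles(alpha, delta, f)] + \
         [initial_mutant_susceptibles(alpha, delta, 1.0)] * (M - 1)
print(initial_growth_rate(alpha, delta, f), initial_growth_rate(alpha, delta, f, M),
      well_mixed_growth_rate(alpha, delta, groups))
```

The last two numbers agree, and both are $M$ times smaller than the single-group rate.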

In the case of a population that is not well mixed, i.e., with arbitrary $C_{ij}$, it is not possible to write a pseudo-exponential growth rate as in Equation 22. However, the initial growth rate will clearly also be smaller than in the single-group case, since the mutant initially grows only in group $i = 1$.

Equilibrium with M immune groups

For $M$ immune groups and arbitrary cross-immunity matrices $K^i$, the equilibrium frequency of the two strains is not easy to compute. However, it is possible to give an analytical expression in the regime of fast mixing, $C_{ij} = M^{-1}$, when the two strains differ immunologically in only one group, i.e., for the matrices

$$K^1 = \begin{pmatrix} 1 & b \\ f & 1 \end{pmatrix} \quad\text{and}\quad K^j = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix} \quad\text{for } j > 1.$$

Note that this corresponds to the situation studied in the main text with fast mixing and $\varepsilon \to 0$. We show here that in this case, the equilibrium frequency in all immune groups is the same as the one obtained for a single immune group with matrix $K^1$. In other words, the expression for $\beta$ in Equation 6 is still valid.

To prove this, we assume the following form for the solution of the dynamical equations:

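$$\forall i \in \{1, \dots, M\},\ \forall a: \qquad S_i^a = \frac{M\delta}{\alpha}\frac{I_i^a}{I^a} \quad\text{and}\quad \frac{I_i^a}{I_i} = \frac{I^a}{I} = \nu_a, \qquad (24)$$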

where the index a runs over all strains (here wild-type and mutant), and where we have defined the infectious levels for group i, for strain a and globally:

$$I_i = \sum_a I_i^a, \qquad I^a = \sum_i I_i^a, \qquad I = \sum_{i,a} I_i^a.$$

Note that the second equation in Equation 24 means that the frequency $\nu_a$ of strain $a$ is the same across all immune groups, and consequently also globally.

We now show that injecting these expressions for $S$ and $I$ into the dynamical system and solving for $\nu_a$ gives the expected result. First, note that with this choice of $S_i^a$, the derivative of $I_i^a$ given by Equation 1 immediately vanishes. We thus concentrate on $\dot S_i^a$ given by Equation 2. For any immune group $i$ and strain $a$, we have

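$$\dot S_i^a = 0 = -\delta\frac{I_i^a}{I^a}\sum_b K^i_{ab} I^b + \gamma\left(1 - \frac{M\delta}{\alpha}\frac{I_i^a}{I^a}\right),$$

where we have used $C_{ij} = 1/M$ and $I^b = \sum_j I_j^b$ to remove the sum over immune groups. Multiplying this equation by $\nu_a = I^a/I$, we obtain

$$-\delta I_i^a \sum_b K^i_{ab}\nu_b + \gamma\left(\nu_a - \frac{M\delta}{\alpha}\frac{I_i^a}{I}\right) = 0.$$

We now eliminate $I_i^a$ by using the expression $I_i^a = I_i\nu_a$:

$$-\delta I_i \nu_a \sum_b K^i_{ab}\nu_b + \gamma\left(1 - \frac{M\delta}{\alpha}\frac{I_i}{I}\right)\nu_a = 0 \quad\Rightarrow\quad \delta I_i \sum_b K^i_{ab}\nu_b - \gamma\left(1 - \frac{M\delta}{\alpha}\frac{I_i}{I}\right) = 0.$$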

Note that this last expression is true for all strains a. Considering any two strains a and b, we can thus write

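$$\delta I_i \sum_c K^i_{ac}\nu_c - \gamma\left(1 - \frac{M\delta}{\alpha}\frac{I_i}{I}\right) = 0, \qquad \delta I_i \sum_c K^i_{bc}\nu_c - \gamma\left(1 - \frac{M\delta}{\alpha}\frac{I_i}{I}\right) = 0 \quad\Rightarrow\quad \sum_c \left(K^i_{ac} - K^i_{bc}\right)\nu_c = 0,$$

where the last expression is obtained by subtracting the two previous ones and dividing by $\delta I_i > 0$. First, we see that for $i > 1$ any frequency vector $\nu$ is a solution, since $K^i_{ab} = 1$ for all $a, b$. For $i = 1$, defining $\nu_\mathrm{m} = \beta$ and $\nu_\mathrm{wt} = 1 - \beta$ and using the expression for $K^1$, we obtain

$$(1 - f)(1 - \beta) + (b - 1)\beta = 0 \quad\Rightarrow\quad \beta = \frac{1 - f}{(1 - b) + (1 - f)}$$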

as claimed.

This result is not completely trivial and deserves comment. In this setting, the mutant escapes immunity built by the wild-type in only a fraction $1/M$ of the population, and yet it reaches the same frequency as in the case with one immune group. This can be rationalized as follows: for immune groups $i > 1$, the cross-immunity matrix is such that the wild-type and mutant strains are completely equivalent. If immune group 1 were absent, the mutant could thus equilibrate at any frequency between 0 and 1; since it was initially introduced at a very low frequency, it would remain marginal in immune groups $i > 1$. However, since its ‘natural’ equilibrium frequency in group $i = 1$ is $\beta$ and since the groups are connected, equilibrium is reached when the mutant reaches frequency $\beta$ in all groups.

Note that if we take the situation of the main text with

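$$K^i = \begin{pmatrix} 1 & 1 - \varepsilon \\ 1 - \varepsilon & 1 \end{pmatrix}$$

and $\varepsilon > 0$, the expressions above do not hold. However, if $\varepsilon \ll 1$, the perturbation is small, and we expect an equilibrium frequency close to $\beta$, which is the case in Figure 1.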

Realistic modeling of the host’s immune state

The SIR model proposed in the main text relies on the assumption that immunity is acquired through exposure. This explains terms like $\alpha S_a \sum_b K_{ab} I_b$ in the derivative of $S_a$: acquiring immunity to $a$ through cross-immunity $K_{ab}$ requires a combination of prior susceptibility and exposure to strain $b$. Importantly, this does not depend on the immune state of the hosts with respect to strain $b$.

A more realistic representation would be one where acquiring immunity to strain $a$ from exposure to $b$ requires being infected by $b$. However, this would require tracking the immune state of hosts more precisely, as we would need to separate hosts into two groups, namely

  • hosts who are susceptible to both a and b, and can thus acquire immunity to a through infection by b;

  • hosts who are susceptible to a but immune to b, and can no longer acquire immunity to a through infection by b.

To test whether our results are robust to such changes in hypothesis, we write a simple SIR model with two strains $a$ and $b$ where cross-immunity is only activated through infection rather than exposure. To properly track the immune status of the hosts, we introduce the compartments $R_a$ and $R_b$, representing hosts immune to only $a$ or only $b$, respectively, and $R_{ab}$, representing hosts immune to both $a$ and $b$. The compartment $R_0 = 1 - R_a - R_b - R_{ab}$ groups hosts susceptible to both strains. In this case, it is simpler to write the dynamics in terms of the compartments $I$ and $R$ rather than $I$ and $S$ as in the main text. For simplicity, we do not use immune groups here. The dynamics involve two equations for the infected:

$$\dot I_a = \alpha(1 - R_a - R_{ab})I_a - \delta I_a, \qquad \dot I_b = \alpha(1 - R_b - R_{ab})I_b - \delta I_b, \qquad (25)$$

and three for the immune:

$$\dot R_a = \alpha(1 - K_{ba})R_0 I_a - \gamma R_a, \qquad \dot R_b = \alpha(1 - K_{ab})R_0 I_b - \gamma R_b, \qquad \dot R_{ab} = \alpha R_0(K_{ba}I_a + K_{ab}I_b) + \alpha(R_a I_b + R_b I_a) - \gamma R_{ab}. \qquad (26)$$

In Appendix 1—figure 1, we show that both the dynamical and equilibrium properties of this model are qualitatively the same as those of the model in the main text. The left panel shows that the dynamics of this new model do not differ qualitatively from those of the main text: in particular, in the invasion scenario, the frequency of the variant converges to an equilibrium value after some oscillations. The right panel shows that this equilibrium value $\beta$ differs from, but remains relatively close to, the one from the main text.

Appendix 1—figure 1. Comparison of SIR model implementations.


Left: Dynamics of the frequency of the variant for the Susceptible-Infected-Recovered (SIR) model from Equations 25 and 26, using the invasion scenario from the main text. Two 2×2 cross-immunity matrices are used, with off-diagonal parameters $f$ and $b$ chosen to give the same equilibrium. The gray line represents the equilibrium that would be obtained using the model of the main text. Right: Equilibrium frequency $\beta$ for this new SIR model (y-axis) versus the $\beta$ from the main text (x-axis). Each point corresponds to a given pair $(f, b)$.

Change in frequency when adding subsequent strains

This section shows that under certain conditions, adding a new variant to the SIR model does not change the relative frequencies of previous variants. This is an important condition for the random walk of the main text to be valid.

Here is a quick summary of the results proved below. Adding a variant to the SIR model involves adding a column $\mathbf{b}$ and a row $\mathbf{f}$ to the cross-immunity matrix. If each of these vectors depends on only one parameter, i.e., $\mathbf{b} = b\mathbf{1}$ and $\mathbf{f} = f\mathbf{1}$, then the relative frequencies of previous strains are unchanged in the new equilibrium. In practice, this means that the new strain must be at an equal ‘antigenic distance’ from all previous strains. A possible interpretation is an antigenic space of infinite dimension, in which every mutation explores a new antigenic region.

We start from an initial situation with $N$ variants and an $N \times N$ cross-immunity matrix $K$. At equilibrium, the number of hosts infected by each virus $a \in \{1, \dots, N\}$ is given by the elements of the vector $\mathbf{I}$, which can be computed from the cross-immunity matrix and the parameters of the model:

$$\mathbf{I} = \frac{\gamma(1 - \delta/\alpha)}{\gamma + \delta}K^{-1}\mathbf{1}_N, \qquad (27)$$

where $\mathbf{1}_N$ is a vector of length $N$ containing only 1’s. The relative frequency of variant $a$ with respect to variant $b$ is simply defined as

$$f_{ab} = \frac{I_a}{I_a + I_b}. \qquad (28)$$

We assume that the initial population has reached equilibrium.

We now add a new virus to this population, with index $N + 1$. The new cross-immunity matrix $\tilde K$ is written as

$$\tilde K = \begin{bmatrix} K & \mathbf{b} \\ \mathbf{f}^T & 1 \end{bmatrix}, \qquad (29)$$

where $\mathbf{b}$ and $\mathbf{f}$ are two vectors of length $N$. This is a general way to write that the backward cross-immunity to variant $a$ caused by an infection with the new variant $N + 1$ is $b_a$. Conversely, the forward cross-immunity to variant $N + 1$ caused by an infection with an old variant $a$ is $f_a$.

This new cross-immunity matrix will of course result in a new equilibrium for the number of infected hosts, given by the vector $\tilde{\mathbf{I}}$:

$$\tilde{\mathbf{I}} = \frac{\gamma(1 - \delta/\alpha)}{\gamma + \delta}\tilde K^{-1}\mathbf{1}_{N+1}.$$

The question we ask here is whether the relative frequency of two variants $1 \le a, b \le N$ is changed by the addition of the new variant. In other words, we want to know whether the equality below holds:

$$\frac{I_a}{I_a + I_b} \overset{?}{=} \frac{\tilde I_a}{\tilde I_a + \tilde I_b}.$$

Below, we prove this equality under the following condition on the cross-immunity vectors $\mathbf{b}$ and $\mathbf{f}$ of the new variant:

$$\mathbf{b} = b\,\mathbf{1}_N \quad\text{and}\quad \mathbf{f} = f\,\mathbf{1}_N, \qquad (30)$$

where $0 < b, f < 1$ are scalars. This amounts to saying that cross-immunity is the same between the new variant $N + 1$ and any old variant $a$, i.e., that the new variant is at an equal antigenic distance from all previous variants.

To prove the equality, we compute $\tilde K^{-1}\mathbf{1}_{N+1}$. To do so, we use the following formula for inverting a block matrix:

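$$\begin{bmatrix} A & B \\ C & D \end{bmatrix}^{-1} = \begin{bmatrix} A^{-1} + A^{-1}B\Lambda C A^{-1} & -A^{-1}B\Lambda \\ -\Lambda C A^{-1} & \Lambda \end{bmatrix},$$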

where we defined $\Lambda = (D - CA^{-1}B)^{-1}$. The following identifications map to our problem:

$$A = K, \quad B = \mathbf{b}, \quad C = \mathbf{f}^T, \quad D = 1.$$

We immediately see that $\Lambda = (1 - \mathbf{f}^T K^{-1}\mathbf{b})^{-1}$ reduces to a scalar, which we denote $\lambda$ for clarity. We also define the scalar $\mu = \mathbf{1}_N^T K^{-1}\mathbf{1}_N$. A few manipulations give the following for $\tilde K^{-1}$:

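$$\tilde K^{-1} = \begin{bmatrix} K^{-1} + \lambda K^{-1}\mathbf{b}\mathbf{f}^T K^{-1} & -\lambda K^{-1}\mathbf{b} \\ -\lambda \mathbf{f}^T K^{-1} & \lambda \end{bmatrix}$$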

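Multiplying this by $\mathbf{1}_{N+1}$ gives the expression below. We drop the prefactor $\gamma(1 - \delta/\alpha)/(\gamma + \delta)$, which is common to all variants and therefore does not affect relative frequencies. Here, I therefore stands for $K^{-1}\mathbf{1}_N$: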

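$$\begin{aligned}
\tilde I = \tilde K^{-1}\mathbf{1}_{N+1}
&= \begin{bmatrix} \underbrace{(K^{-1} + \lambda K^{-1}\mathbf{b}\mathbf{f}^T K^{-1})\mathbf{1}_N - \lambda K^{-1}\mathbf{b}}_{\text{size } N} \\ \underbrace{-\lambda \mathbf{f}^T K^{-1}\mathbf{1}_N + \lambda}_{\text{scalar}} \end{bmatrix} \\
&= \begin{bmatrix} I + bf\mu\lambda I - b\lambda I \\ \lambda(1 - \mu f) \end{bmatrix} \\
&= \begin{bmatrix} (1 - b\lambda + bf\lambda\mu)\, I \\ \lambda(1 - \mu f) \end{bmatrix},
\end{aligned}$$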

where we used the equalities $K^{-1}\mathbf{b} = b\,K^{-1}\mathbf{1}_N = b\,I$ and $\mathbf{f}^T K^{-1}\mathbf{1}_N = f\mu$.

This result shows that, after adding the new variant, the number of hosts infected by each of the previous variants is simply multiplied by the same scalar $1 - b\lambda(1 - f\mu)$. Hence, the relative frequencies of the original variants are conserved when a new one is added.
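A short numerical check can make this block-inversion argument concrete. The sketch below assumes NumPy and a randomly drawn, hypothetical 4×4 matrix K. The helper names `equilibrium_infected` and `add_equidistant_variant` are illustrative and are not taken from the authors' code. The sketch builds $\tilde K$ as in Equation 29 with uniform b and f (Equation 30). It then checks that every old variant is rescaled by the same factor $1 - b\lambda(1 - f\mu)$.

```python
import numpy as np

def equilibrium_infected(K, alpha=3.0, delta=1.0, gamma=5e-3):
    """Equilibrium infected vector of Eq. (27): gamma(1 - delta/alpha)/(gamma + delta) * K^{-1} 1."""
    prefactor = gamma * (1.0 - delta / alpha) / (gamma + delta)
    return prefactor * np.linalg.solve(K, np.ones(K.shape[0]))

def add_equidistant_variant(K, b, f):
    """Cross-immunity matrix of Eq. (29) with uniform vectors b*1 and f*1, as in Eq. (30)."""
    N = K.shape[0]
    K_new = np.ones((N + 1, N + 1))  # bottom-right entry stays equal to 1
    K_new[:N, :N] = K
    K_new[:N, N] = b  # column: immunity to old variants conferred by the new one
    K_new[N, :N] = f  # row: immunity to the new variant conferred by old ones
    return K_new

# Hypothetical example: 4 variants with random off-diagonal cross-immunity in [0, 0.3)
rng = np.random.default_rng(0)
N = 4
K = rng.uniform(0.0, 0.3, size=(N, N))
np.fill_diagonal(K, 1.0)
b, f = 0.2, 0.5

I_old = equilibrium_infected(K)
I_new = equilibrium_infected(add_equidistant_variant(K, b, f))

# Every old variant should be rescaled by the same factor 1 - b*lambda*(1 - f*mu)
ratios = I_new[:N] / I_old
mu = np.ones(N) @ np.linalg.solve(K, np.ones(N))
lam = 1.0 / (1.0 - b * f * mu)
predicted_factor = 1.0 - b * lam * (1.0 - f * mu)
print(ratios, predicted_factor)
```

The rescaling factor is common to all old variants, so any ratio such as $I_0/(I_0 + I_1)$ is unchanged. With a non-uniform column b, the entries of `ratios` would in general no longer coincide.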

Case with intrinsic fitness effects

In the SIR model of the main text, we assume that the transmission rate α is the same for all strains. It is also interesting to investigate the case where this transmission rate varies. Here, we study a simple extension of the SIR model without immune groups. It has two variants, mutant and wild-type, with respective transmission rates $\alpha_{wt} = \alpha\phi_{wt}$ and $\alpha_m = \alpha\phi_m$. The quantities $\phi_{wt}, \phi_m \in [\delta/\alpha, \infty)$ can be interpreted as intrinsic fitness values of the two strains. Note that if $\phi_a < \delta/\alpha$, strain a cannot grow even in a fully susceptible population. As usual, cross-immunity is defined by the matrix K with off-diagonal terms f and b.

The equations of motion are now

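$$\dot S_a = -\alpha S_a \sum_{c \in \{wt, m\}} \phi_c K_{ac} I_c + \gamma(1 - S_a), \qquad \dot I_a = \alpha\phi_a S_a I_a - \delta I_a. \qquad (31)$$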

Computing the equilibrium, we immediately obtain

$$S_a = \frac{\delta}{\alpha\phi_a}, \qquad I = \frac{\gamma}{\delta}\, G^{-1} h, \qquad (32)$$

where we have defined the following quantities:

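$$h_a = \frac{\alpha\phi_a - \delta}{\alpha\phi_a}, \qquad G = \begin{pmatrix} 1 & bs \\ f s^{-1} & 1 \end{pmatrix}, \qquad s = \frac{\phi_m}{\phi_{wt}}. \qquad (33)$$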

The quantities $h_a$ can be interpreted as a scaled growth rate of each variant in a fully susceptible population. The matrix G combines the cross-immunity with the ratio of fitness values φ. It is straightforward to generalize these equations to an arbitrary number of strains. The relevant quantity is then the scaled cross-immunity matrix defined by $G_{ab} = \frac{\phi_b}{\phi_a} K_{ab}$.

Inverting G and simplifying the equations a bit, we obtain

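$$I_{wt} = \frac{\gamma}{\delta}\,\frac{\alpha\phi_{wt} - \delta}{\alpha\phi_{wt}}\,\frac{1 - b\xi}{1 - bf}, \qquad I_m = \frac{\gamma}{\delta}\,\frac{\alpha\phi_m - \delta}{\alpha\phi_m}\,\frac{1 - f\xi^{-1}}{1 - bf}, \qquad (34)$$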

where we defined

$$\xi = \frac{\alpha\phi_m - \delta}{\alpha\phi_{wt} - \delta}. \qquad (35)$$

Note the interesting structure of Equations 34. For each variant, they involve a first term, $(\alpha\phi_a - \delta)/(\alpha\phi_a)$, that depends only on the intrinsic growth rate of that variant. The second term, $(1 - b\xi)/(1 - bf)$ or $(1 - f\xi^{-1})/(1 - bf)$, involves cross-immunity and the relative growth rate through ξ.

These equilibrium equations give us two conditions for the co-existence of the two variants,

$$b < \xi^{-1} \quad \text{and} \quad f < \xi, \qquad (36)$$

respectively corresponding to $I_{wt} > 0$ and $I_m > 0$. We mention three interesting cases below.

  • If the mutant has an intrinsic fitness disadvantage, $\xi < 1$, it can only invade if $f < \xi$. Recall that f represents the probability that a host becomes immune to the mutant after infection by the wild-type. The condition therefore means that the immune 'niche' of the mutant, $1 - f$, must be larger than its fitness deficit, $1 - \xi$.

  • Conversely, if the mutant is fitter, $\xi > 1$, it is always able to invade, since $f < 1 < \xi$. The wild-type only survives if $b < \xi^{-1}$, meaning that the immunity to the wild-type caused by the mutant must be small enough.

  • If one considers a situation with total cross-immunity, i.e., $b = f = 1$, the only way a mutant invades is if $\xi > 1$, meaning $\phi_m > \phi_{wt}$. The result is then a full selective sweep.
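The equilibrium above can be evaluated numerically to see the coexistence conditions at work. This is a minimal sketch assuming NumPy and the values α = 3, δ = 1, γ = 5·10⁻³ used elsewhere in this appendix; the function names are hypothetical. It computes I in two ways, from $G^{-1}h$ (Equations 32 and 33) and from the closed form of Equation 34. The example mutant has a 10% intrinsic disadvantage ($\phi_m = 0.9$, $\phi_{wt} = 1$, so ξ = 0.85).

```python
import numpy as np

ALPHA, DELTA, GAMMA = 3.0, 1.0, 5e-3  # values used elsewhere in this appendix

def equilibrium_via_G(phi_wt, phi_m, b, f):
    """Eqs. (32)-(33): I = (gamma/delta) G^{-1} h, returned as (I_wt, I_m)."""
    phi = np.array([phi_wt, phi_m])
    h = (ALPHA * phi - DELTA) / (ALPHA * phi)
    s = phi_m / phi_wt
    G = np.array([[1.0, b * s], [f / s, 1.0]])
    return GAMMA / DELTA * np.linalg.solve(G, h)

def equilibrium_closed_form(phi_wt, phi_m, b, f):
    """Eqs. (34)-(35). A negative entry signals that the coexistence equilibrium
    does not exist, i.e., condition (36) is violated."""
    xi = (ALPHA * phi_m - DELTA) / (ALPHA * phi_wt - DELTA)
    h_wt = (ALPHA * phi_wt - DELTA) / (ALPHA * phi_wt)
    h_m = (ALPHA * phi_m - DELTA) / (ALPHA * phi_m)
    I_wt = GAMMA / DELTA * h_wt * (1 - b * xi) / (1 - b * f)
    I_m = GAMMA / DELTA * h_m * (1 - f / xi) / (1 - b * f)
    return np.array([I_wt, I_m]), xi

# Mutant with a 10% intrinsic disadvantage: xi = (2.7 - 1) / (3 - 1) = 0.85
I_a = equilibrium_via_G(1.0, 0.9, b=0.5, f=0.6)
I_b, xi = equilibrium_closed_form(1.0, 0.9, b=0.5, f=0.6)  # f < xi: coexistence
I_c, _ = equilibrium_closed_form(1.0, 0.9, b=0.5, f=0.9)   # f > xi: no coexistence
print(xi, I_a, I_b, I_c)
```

With f = 0.6 < ξ both variants persist. With f = 0.9 > ξ, the formula yields a negative $I_m$, i.e., no coexistence equilibrium, in line with Equation 36.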

Oscillations of the SIR model

The SI model from the main text tends to oscillate while returning to equilibrium. Here, we study this behavior in the simple case of one immune group (M=1) and two viruses (wild-type and variant).

The idea is to linearize the dynamical equations around the equilibrium. This gives us

$$\dot X = QX,$$

where $X = [S_{wt}, S_m, I_{wt}, I_m]$ denotes the deviation from equilibrium and

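$$Q = \begin{pmatrix} -\alpha\gamma/\delta & 0 & -\delta & -\delta b \\ 0 & -\alpha\gamma/\delta & -\delta f & -\delta \\ g_1 & 0 & 0 & 0 \\ 0 & g_2 & 0 & 0 \end{pmatrix}, \qquad g_1 = \frac{\gamma}{\delta}(\alpha - \delta)\frac{1 - b}{1 - bf}, \qquad g_2 = \frac{\gamma}{\delta}(\alpha - \delta)\frac{1 - f}{1 - bf}.$$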

To quantify the rate of convergence to equilibrium and the frequency of the oscillations, we need the eigenvalues of the matrix Q. For small enough γ, we can show that the four eigenvalues are

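$$\lambda_1 = -\frac{1}{2}\frac{\alpha}{\delta}\gamma \pm i\left(\gamma(\alpha - \delta) - \frac{1}{4}\frac{\alpha^2\gamma^2}{\delta^2}\right)^{1/2}, \qquad \lambda_2 = -\frac{1}{2}\frac{\alpha}{\delta}\gamma \pm i\left(\gamma(\alpha - \delta)\frac{(1 - b)(1 - f)}{1 - bf} - \frac{1}{4}\frac{\alpha^2\gamma^2}{\delta^2}\right)^{1/2}.$$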

This is only valid if the terms in the square roots above are positive, which requires γ to be small enough. In our setting, we assume $\gamma \ll \alpha, \delta$, so this will always hold.

From the eigenvalues we can compute:

  • The rate of convergence to equilibrium, $\alpha\gamma/2\delta$. Convergence is thus slower for a smaller waning rate γ.

  • The two oscillation frequencies that appear:
    $\omega_{high} = \frac{\sqrt{\gamma(\alpha - \delta)}}{2\pi}, \qquad \omega_{low} = \omega_{high}\sqrt{\frac{(1 - b)(1 - f)}{1 - bf}}.$

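Note that since $(1 - b)(1 - f)/(1 - bf) \le 1$, we always have $\omega_{low} \le \omega_{high}$. The values of the main text are $\alpha = 3$, $\delta = 1$ and $\gamma = 5\cdot10^{-3}$, in units of inverse generations. With these values we obtain $\omega_{high} \simeq 0.016$. If one generation is 1 week, this gives a period $\omega_{high}^{-1} \simeq 60$ weeks, that is, approximately 1 year.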
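The eigenvalue expressions can be compared with a direct numerical diagonalization of Q. The sketch assumes NumPy and the main-text values α = 3, δ = 1, γ = 5·10⁻³. The cross-immunities b = 0.5 and f = 0.3 are chosen only for illustration. The sketch builds Q as written above and compares its eigenvalues with the predicted real part $-\alpha\gamma/2\delta$ and the predicted imaginary parts. It also recomputes $\omega_{high}$ and the corresponding period.

```python
import numpy as np

alpha, delta, gamma = 3.0, 1.0, 5e-3  # main-text values (inverse generations)
b, f = 0.5, 0.3                       # illustrative cross-immunities (assumption)

c = alpha * gamma / delta             # damping term alpha*gamma/delta
A = gamma / delta * (alpha - delta) / (1 - b * f)
g1, g2 = A * (1 - b), A * (1 - f)
Q = np.array([
    [-c, 0.0, -delta, -delta * b],
    [0.0, -c, -delta * f, -delta],
    [g1, 0.0, 0.0, 0.0],
    [0.0, g2, 0.0, 0.0],
])
eig = np.linalg.eigvals(Q)

# Predicted: real part -c/2, imaginary parts +/- sqrt(kappa - c^2/4) for the two modes
kappa = np.array([gamma * (alpha - delta),
                  gamma * (alpha - delta) * (1 - b) * (1 - f) / (1 - b * f)])
imag_pred = np.sqrt(kappa - c**2 / 4)

omega_high = np.sqrt(gamma * (alpha - delta)) / (2 * np.pi)
omega_low = omega_high * np.sqrt((1 - b) * (1 - f) / (1 - b * f))
print(np.sort(eig.imag), np.sort(np.concatenate([imag_pred, -imag_pred])))
print(omega_high, omega_low, 1 / omega_high)
```

For these parameters the numerical and predicted eigenvalues agree to machine precision. The period is $1/\omega_{high} \approx 63$ generations, consistent with the roughly 60-week period quoted above.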

Link between parameters of the SIR and expiring fitness models

The effective expiring fitness model used in the second part of this work is characterized by the system of differential equations

$$\dot x = s x(1 - x) \quad \text{and} \quad \dot s = -\nu x s,$$

where x is the frequency of the mutant strain. Here, we try to express the dynamics of the SIR model in this form, in order to relate its parameters to the quantities s and ν.

We first focus on the case with two strains and one immune group. The frequency of the mutant is $x = I_m/(I_m + I_{wt})$. Using the logit function $\psi(x) = \log(x/(1 - x))$ and the dynamical equations of the SIR model, we find

$$\frac{d}{dt}\psi(x) = \alpha(S_m - S_{wt}), \qquad (37)$$

which allows us to define the fitness in the SIR case as $s = \alpha(S_m - S_{wt})$. At the beginning of the invasion, the initial growth rate is readily computed:

$$s(t=0) = \frac{(1 - f)(\alpha - \delta)\,\delta}{\delta + f(\alpha - \delta)}, \qquad (38)$$

which is the same as the initial growth rate of $I_m$. Note that if $f = 1$, the initial growth rate is 0.
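Equation 38 can be checked by evaluating $s = \alpha(S_m - S_{wt})$ directly at the pre-invasion equilibrium. This plain-Python sketch relies on one assumption that is not spelled out in the text. Before the mutant appears ($I_m \to 0$), $S_m$ is taken to be at the steady state of $\dot S_m = -\alpha S_m f I_{wt} + \gamma(1 - S_m)$, i.e., Equation 31 with φ = 1. Meanwhile, $S_{wt}$ and $I_{wt}$ take their single-strain equilibrium values. The function names are hypothetical.

```python
def initial_growth_rate_from_equilibrium(alpha, delta, gamma, f):
    """s(t=0) = alpha (S_m - S_wt), with the wild-type at its single-strain equilibrium."""
    S_wt = delta / alpha
    I_wt = gamma / delta * (1 - delta / alpha)
    # Assumption: before invasion (I_m -> 0), S_m sits at the steady state of
    # dS_m/dt = -alpha S_m f I_wt + gamma (1 - S_m), i.e. Eq. (31) with phi = 1.
    S_m = gamma / (gamma + alpha * f * I_wt)
    return alpha * (S_m - S_wt)

def initial_growth_rate_eq38(alpha, delta, f):
    """Closed form of Eq. (38); note that it does not depend on gamma."""
    return (1 - f) * (alpha - delta) * delta / (delta + f * (alpha - delta))

for f in (0.0, 0.5, 0.9, 1.0):
    print(f, initial_growth_rate_from_equilibrium(3.0, 1.0, 5e-3, f),
          initial_growth_rate_eq38(3.0, 1.0, f))
```

The two functions agree for every f, and the result does not depend on γ, as Equation 38 indicates.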

We then compute the time derivative of s early in the invasion, when $I_{wt}$, $S_{wt}$, and $S_m$ are close to their equilibrium values. In this case, a straightforward calculation gives

$$\dot s = -\alpha(I_{wt} + I_m)\,\alpha(S_m - bS_{wt})\,x \simeq -\alpha(I_{wt} + I_m)\,x s, \qquad (39)$$

where the approximation is valid if $1 - b \ll 1$. This would give a fitness expiry rate $\nu = \alpha(I_{wt} + I_m)$ in the case of the SIR model.

These results can also be obtained in the case of immune groups. We then have the following expressions for s and ν at $t \simeq 0$:

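$$s = \frac{\alpha}{M}\sum_{i=1}^{M}(S_i^m - S_i^{wt}), \qquad \dot s = -\alpha(I_{wt} + I_m)\,x\,\frac{\alpha}{M}\sum_{i=1}^{M}(S_i^m - b_i S_i^{wt}) \simeq -\alpha(I_{wt} + I_m)\,x s, \qquad (40)$$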

where the second expression is again valid if $1 - b_i \ll 1$.

Another question is the link between the cross-immunity parameters b and f and the distribution of partial sweep sizes β. The relation between cross-immunity and β given by Equation 6 shows that the distribution of partial sweep sizes depends on the distributions of both b and f. As we have no prior on how b and f should be distributed, we explore the case where $1 - f$ and $1 - b$ are exponentially distributed. In other words, we define $\epsilon_f = 1 - f$ and $\epsilon_b = 1 - b$, with the following distributions:

$$P(\epsilon_f) \propto e^{-\epsilon_f/\lambda} \quad \text{and} \quad P(\epsilon_b) \propto e^{-\epsilon_b/\mu}.$$

The partial sweep size then becomes $\beta = \epsilon_f/(\epsilon_b + \epsilon_f)$. Note that both ε should remain smaller than 1, which is not guaranteed with exponential distributions. However, this is not problematic if μ and λ are small enough.

These assumptions allow us to compute the distribution of β:

$$P(\beta) = \frac{\mu/\lambda}{\left(\frac{\mu}{\lambda}\beta + (1 - \beta)\right)^2}$$

with support on the interval $0 \le \beta \le 1$. Appendix 1—figure 2 shows the various shapes that $P(\beta)$ takes for different values of the parameter $\mu/\lambda$. Note that if $\mu > \lambda$, $1 - b$ tends to be larger than $1 - f$, i.e., b tends to be lower than f, and β is biased towards zero. If $\mu = \lambda$, $P(\beta)$ becomes uniform on [0, 1].

Appendix 1—figure 2. Distribution of partial sweep size β if $1 - f$ and $1 - b$ are exponentially distributed with respective scales λ and μ.


Left: Probability distribution function P(β) for various values of μ/λ. Right: Mean and standard deviation of β as a function of μ/λ.

Assuming that f and b are exponentially distributed away from one is reasonable, and it allows an analytical derivation of $P(\beta)$. Of course, any other distribution of f and b could be considered. For this reason, we use a Beta distribution for $P_\beta$ in the analysis of the main text, as it can accommodate various shapes.
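A Monte Carlo sketch can confirm the expression for P(β). It assumes NumPy and illustrative scales λ = 0.05 and μ = 0.1. These are small, so that ε > 1 is vanishingly rare and no truncation is applied. The sketch samples $\epsilon_f$ and $\epsilon_b$ from the distributions above. It then compares the empirical cumulative distribution of β with $P(\beta < x) = \frac{(\mu/\lambda)x}{(\mu/\lambda)x + 1 - x}$, which is what one obtains by integrating P(β) from 0 to x.

```python
import numpy as np

def beta_cdf(x, mu_over_lambda):
    """P(beta < x), obtained by integrating P(beta) = k / (k beta + 1 - beta)^2 with k = mu/lambda."""
    k = mu_over_lambda
    return k * x / (k * x + 1 - x)

rng = np.random.default_rng(1)
lam, mu = 0.05, 0.1   # illustrative scales (assumption); mu/lambda = 2
n = 200_000
eps_f = rng.exponential(scale=lam, size=n)  # eps_f = 1 - f
eps_b = rng.exponential(scale=mu, size=n)   # eps_b = 1 - b
beta = eps_f / (eps_f + eps_b)

xs = np.linspace(0.1, 0.9, 9)
empirical = np.array([np.mean(beta < x) for x in xs])
max_gap = np.max(np.abs(empirical - beta_cdf(xs, mu / lam)))
print(max_gap, beta.mean())
```

The maximum gap between the empirical and analytical distributions is of the order of the sampling noise and shrinks as n grows.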

Appendix 2

Expiring fitness model and random walk

Clonal interference

In non-recombining genomes with a high mutation rate, the appearance of many adaptive mutations in close succession leads to clonal interference. In this regime, beneficial mutations present on different backgrounds compete for fixation. The success of a mutation then depends not only on its fitness effect but also on the global state of the population. For this reason, clonal interference reduces predictability: the dynamics are not deterministic but depend on the precise structure of which mutation appeared on which background. For instance, a beneficial mutation that increases in frequency can be outcompeted by a fitter one before it has time to fix, which makes extrapolating frequency trajectories difficult.

We conduct simulations, based on those of the main text, to quantify how much clonal interference reduces predictability. We study a large population of $N = 10^5$ genomes of length L, where each genome position i can be in either of two states $\sigma_i \in \{0, 1\}$. A fitness effect $s_i \in \mathbb{R}$ is associated with each position, and the fitness of an individual is $F = \sum_i s_i \sigma_i$.

To simulate the adaptation of the population, we proceed as follows. At a constant rate ρ, we pick a non-polymorphic position i and set its fitness effect $s_i$. Its magnitude is sampled from the distribution $P_s$. Its sign is chosen so that the new mutation is beneficial: positive if $\sigma_i = 0$ is the more frequent state, negative otherwise. At the same time, the corresponding mutation ($\sigma_i = 0$ if $s_i < 0$, $\sigma_i = 1$ otherwise) is introduced in the population at a small frequency $\delta f = 0.01$. We choose $P_s$ to be an exponential distribution with scale $s_0 = 0.02$, in agreement with findings on the distribution of fitness effects in Schiffels et al., 2011, Rice et al., 2015.

There is no fitness decay in this simulation, and the parameters N, $s_0$, and $\delta f$ take values such that neutral drift is mostly irrelevant to the dynamics of mutations. For example, without accounting for interference, a mutation of effect $s_0$ introduced at frequency $\delta f$ fixes with probability $p_{fix}(\delta f) \simeq 1 - e^{-40}$. As a consequence, two parameters govern the dynamics: the scale of fitness effects $s_0$ and the rate ρ at which beneficial mutations are introduced. Their ratio measures the amount of clonal interference. At low $\rho/s_0$, mutations are mostly independent, while at high $\rho/s_0$ they strongly interfere.
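The value $1 - e^{-40}$ can be reproduced with Kimura's diffusion result for the fixation probability, $p_{fix}(x) = (1 - e^{-2Nsx})/(1 - e^{-2Ns})$ (Kimura, 1964). Using this form with the factor 2N is our assumption; it gives $2N s_0 \delta f = 40$ for the values above. The plain-Python sketch computes $1 - p_{fix}$ directly, since $p_{fix}$ itself rounds to 1 in double precision.

```python
import math

N, s0, df = 10**5, 0.02, 0.01  # population size, fitness scale, introduction frequency (from the text)

def kimura_pfix(x, s, N):
    """Diffusion fixation probability (1 - e^{-2Nsx}) / (1 - e^{-2Ns})."""
    return math.expm1(-2 * N * s * x) / math.expm1(-2 * N * s)

exponent = 2 * N * s0 * df
# 1 - p_fix = (e^{-2Nsx} - e^{-2Ns}) / (1 - e^{-2Ns}), computed directly to avoid cancellation
loss_probability = (math.exp(-exponent) - math.exp(-2 * N * s0)) / -math.expm1(-2 * N * s0)
print(exponent, loss_probability, kimura_pfix(df, s0, N))
```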

We measure the probability of fixation $p_{fix}(x)$ of mutations found in a frequency bin $[x - \delta x, x + \delta x]$ over a long simulation. We only consider mutations whose frequency is increasing: their frequency was below x at all earlier times and was at some point measured in the frequency bin. Figure 4—figure supplement 1 shows $p_{fix}(x)$ as a function of x for different values of $\rho/s_0$. As intuition suggests, low clonal interference leads to easily predictable fixations. Whatever the frequency x at which it is observed, a mutation that is increasing in frequency fixes with a very high probability. Increasing clonal interference clearly makes the dynamics less predictable and closer to neutrality, with $p_{fix}$ approaching the diagonal. However, even in a regime of strong interference, e.g., $\rho/s_0 = 32$, deviations from neutrality remain very clear.

Expiring fitness effects: Sweep size and probability of overlap

This section gives a few results about the expiring fitness equations from the main text, which we repeat here for reference:

$$\dot x = s x(1 - x), \qquad \dot s = -\nu x s.$$

First, we prove the expression for the amplitude of the partial sweeps. Dividing the equation for $\dot x$ by the one for $\dot s$, we obtain

$$\frac{dx}{ds} = \nu^{-1}(x - 1).$$

This immediately gives us

$$x(s) = 1 + \lambda e^{s/\nu}$$

with a constant λ. At t = 0, we have $s = s_0$ and $x = x_0 \ll 1$, while for $t \to \infty$ we have $s = 0$ and $x = \beta$, to be determined. The t = 0 condition gives $\lambda = (x_0 - 1)e^{-s_0/\nu}$, and the $t \to \infty$ limit gives $\beta = 1 + \lambda$. Assuming $x_0 \ll 1$, we obtain the result of the main text:

$$\beta = 1 - e^{-s_0/\nu}.$$
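The closed form for β can be checked by integrating the two expiring fitness equations numerically. The plain-Python sketch below uses a fixed-step fourth-order Runge–Kutta scheme, chosen here for simplicity and not necessarily the method used by the authors. The values $x_0 = 10^{-4}$, $s_0 = 0.05$, $\nu = 0.1$ and the integration horizon are illustrative assumptions. The sketch compares the long-time frequency with both $1 + (x_0 - 1)e^{-s_0/\nu}$ and the $x_0 \ll 1$ limit.

```python
import math

def expiring_fitness_rhs(x, s, nu):
    """Right-hand side of dx/dt = s x (1 - x), ds/dt = -nu x s."""
    return s * x * (1 - x), -nu * x * s

def integrate_partial_sweep(x0, s0, nu, t_max=2000.0, dt=0.1):
    """Fixed-step RK4 integration; returns the final (x, s)."""
    x, s = x0, s0
    for _ in range(int(t_max / dt)):
        k1 = expiring_fitness_rhs(x, s, nu)
        k2 = expiring_fitness_rhs(x + dt / 2 * k1[0], s + dt / 2 * k1[1], nu)
        k3 = expiring_fitness_rhs(x + dt / 2 * k2[0], s + dt / 2 * k2[1], nu)
        k4 = expiring_fitness_rhs(x + dt * k3[0], s + dt * k3[1], nu)
        x += dt / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        s += dt / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
    return x, s

x0, s0, nu = 1e-4, 0.05, 0.1                      # illustrative values (assumption)
x_inf, s_inf = integrate_partial_sweep(x0, s0, nu)
beta_exact = 1 + (x0 - 1) * math.exp(-s0 / nu)    # keeps the x0 correction
beta_approx = 1 - math.exp(-s0 / nu)              # result quoted for x0 << 1
print(x_inf, beta_exact, beta_approx)
```

The integrated frequency matches $1 + (x_0 - 1)e^{-s_0/\nu}$ closely. The $x_0 \ll 1$ formula differs from it only by about $x_0 e^{-s_0/\nu}$.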

We now look for an expression of the probability that two partial sweeps overlap. First, we estimate the time it takes for one partial sweep to complete. Although we could not solve the differential equations of the main text analytically, we can give an approximate expression for the time-dependent frequency x during the partial sweep:

$$x(t) \simeq \frac{\beta x_0}{x_0 + (\beta - x_0)e^{-st}},$$

where β is a function of s and ν, and $x_0 = x(t=0)$. This is simply the expression of a logistic growth starting at $x_0$ and saturating at β. From there, we compute the time $T_r(s)$ it takes a partial sweep of initial fitness s to reach a frequency $r\beta$, with $x_0\beta^{-1} < r < 1$. We quickly find

$$T_r(s) = s^{-1}\log\left(\frac{\beta - x_0}{x_0}\,\frac{r}{1 - r}\right).$$

We consider that two consecutive partial sweeps with initial fitness values $s_1$ and $s_2$ overlap if the first one has not yet reached frequency $r\beta_1$ when the second one has already reached $(1 - r)\beta_2$. In the figure of the main text, we use r = 3/4. An overlap thus occurs if the first sweep is not yet at 3/4 of its final value while the second one is already at 1/4 of its final value. For an overlap to occur, the time τ between the two partial sweeps must therefore be smaller than $T_r(s_1) - T_{1-r}(s_2)$. For sweeps happening at rate ρ, τ is exponentially distributed. The overlap probability is then $1 - \exp\big(-\rho(T_r(s_1) - T_{1-r}(s_2))\big)$ when $T_r(s_1) > T_{1-r}(s_2)$, and zero otherwise. Since the two sweeps have random initial fitness effects, the overall probability for two consecutive sweeps to overlap is

$$\Pr(\text{overlap}) = \int_0^\infty ds_1 \int_0^\infty ds_2\, P_s(s_1)P_s(s_2)\, \max\left\{0,\; 1 - e^{-\rho\left(T_r(s_1) - T_{1-r}(s_2)\right)}\right\}.$$

This integrates over all possible pairs of sweep amplitudes (or initial fitnesses) and weights them by the probability that the time between the two sweeps leads to an overlap. This quantity, computed numerically, sets the scale of the color bar in panel E of Figure 4 of the main text.
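The overlap probability can also be estimated by Monte Carlo instead of numerical quadrature. This sketch assumes NumPy, an exponential $P_s$ with scale $s_0 = 0.03$, $\nu = 0.1$, and an initial frequency $x_0 = 10^{-3}$. These values are illustrative, not those behind Figure 4E. The sketch also adds one convention not stated in the text: when $r\beta \le x_0$, the logistic formula for $T_r$ breaks down, and we simply set $T_r = 0$. The function names are hypothetical.

```python
import numpy as np

def sweep_size(s, nu):
    """beta = 1 - exp(-s/nu), the x0 << 1 result above."""
    return -np.expm1(-s / nu)

def time_to_fraction(s, r, nu, x0):
    """T_r(s) from the logistic approximation.

    Assumption (not in the text): when r*beta <= x0 the log argument is <= 1,
    and we set T_r = 0 instead of letting the formula go negative or undefined.
    """
    beta = sweep_size(s, nu)
    arg = (beta - x0) / x0 * r / (1 - r)
    return np.log(np.maximum(arg, 1.0)) / s

def overlap_probability(rho, s0, nu, x0=1e-3, r=0.75, n=400_000, seed=0):
    """Monte Carlo estimate of Pr(overlap) with P_s exponential of scale s0."""
    rng = np.random.default_rng(seed)
    s1 = rng.exponential(s0, n)
    s2 = rng.exponential(s0, n)
    window = time_to_fraction(s1, r, nu, x0) - time_to_fraction(s2, 1 - r, nu, x0)
    # tau ~ Exp(rho): Pr(tau < window) = 1 - exp(-rho * window) if window > 0, else 0
    return float(np.mean(-np.expm1(-rho * np.maximum(window, 0.0))))

for rho in (0.003, 0.01, 0.1):
    print(rho, overlap_probability(rho, s0=0.03, nu=0.1))
```

The same fitness samples are reused for each ρ, so the estimate increases monotonically with the sweep rate, as expected.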

Distribution of partial sweep size β

This section discusses the distribution of the partial sweep size β in the context of Equation 11 of the main text. It also explains the choice of parameters for panel E of Figure 4.

A first interesting case is when fitness effects are exponentially distributed with parameter $s_0$:

$$P(s) = s_0^{-1}e^{-s/s_0}.$$

This is the distribution we use in most of the population simulations. We compute the corresponding distribution of β in a straightforward way:

$$P(\beta < x) = P\big(s < -\nu\log(1 - x)\big) = s_0^{-1}\int_0^{-\nu\log(1 - x)} e^{-s/s_0}\,ds = 1 - (1 - x)^{\nu/s_0}.$$

Taking the derivative with respect to x, we obtain

$$P(\beta) \propto (1 - \beta)^{\nu/s_0 - 1}.$$

This distribution can take various shapes: for $\nu/s_0 > 1$ it peaks at 0, and for $\nu/s_0 < 1$ it peaks at 1. We can also compute the first two moments of β:

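$$\langle\beta\rangle = \frac{s_0}{s_0 + \nu} \quad \text{and} \quad \langle\beta^2\rangle = \frac{s_0}{s_0 + \nu}\,\frac{2s_0}{2s_0 + \nu}.$$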
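These moments are easy to verify by sampling. The sketch assumes NumPy and illustrative values $s_0 = 0.02$ and $\nu = 0.05$, so that $\nu/s_0 = 2.5 > 1$ and P(β) peaks at 0. It draws exponential fitness effects and maps them to $\beta = 1 - e^{-s/\nu}$. It then compares the sample moments and the cumulative probability at x = 1/2 with the formulas above.

```python
import numpy as np

rng = np.random.default_rng(2)
s0, nu = 0.02, 0.05                 # illustrative values (assumption)
s = rng.exponential(scale=s0, size=1_000_000)
beta = -np.expm1(-s / nu)           # beta = 1 - exp(-s/nu)

mean_pred = s0 / (s0 + nu)
second_pred = s0 / (s0 + nu) * 2 * s0 / (2 * s0 + nu)
cdf_gap = abs(np.mean(beta < 0.5) - (1 - 0.5 ** (nu / s0)))
print(beta.mean(), mean_pred)
print(np.mean(beta**2), second_pred)
```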

When investigating coalescence times in the main text, we use a different distribution of fitness effects. In this case, we want finer control over the second moment of β, so we sample β directly from a Beta distribution. The Beta distribution has support on [0, 1] and can accommodate many different shapes. It is defined by two parameters a and b:

$$P(\beta) \propto \beta^{a-1}(1 - \beta)^{b-1}.$$

In our case, it is more practical to parametrize it by its mean m and variance v. For given m and $v < m(1 - m)$, we have

$$a = \left(\frac{m(1 - m)}{v} - 1\right)m, \qquad b = \left(\frac{m(1 - m)}{v} - 1\right)(1 - m).$$

For panel E of Figure 4, in order to explore a wide range of distributions, we used three values of m and, for each m:

  • a low variance $v = \varepsilon m^2$, with $\varepsilon = 10^{-5}$;

  • a high variance $v = m(1 - m)/3$.

For a given set of parameters defining a Beta distribution, we set the fitness effects by first sampling a β for each new adaptive mutation and then computing s using Equation 12 of the main text. For each distribution $P_\beta$, the simulation is performed for 6 values of $\rho \in \{0.003, 0.01, 0.018, 0.032, 0.056, 0.1\}$.
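The parametrization by mean and variance, followed by the mapping from β to s, can be sketched as follows. This assumes NumPy. It also assumes that Equation 12 of the main text is the inversion $s = -\nu\log(1 - \beta)$ of $\beta = 1 - e^{-s/\nu}$. The mean m = 0.2 and ν = 0.1 are illustrative, since the three values of m used for Figure 4E are not listed here. The function names are hypothetical.

```python
import numpy as np

def beta_params_from_moments(m, v):
    """Beta(a, b) parameters with mean m and variance v, valid for 0 < v < m(1 - m)."""
    if not 0 < v < m * (1 - m):
        raise ValueError("variance must satisfy 0 < v < m(1 - m)")
    c = m * (1 - m) / v - 1
    return c * m, c * (1 - m)

def sample_fitness_effects(m, v, nu, n, rng):
    """Draw beta ~ Beta(a, b), then s = -nu log(1 - beta) (assumed form of Eq. 12)."""
    a, b = beta_params_from_moments(m, v)
    beta = rng.beta(a, b, size=n)
    return beta, -nu * np.log1p(-beta)

rng = np.random.default_rng(3)
m = 0.2  # illustrative mean; the three values used for Figure 4E are not given here
results = {}
for v in (1e-5 * m**2, m * (1 - m) / 3):  # the 'low' and 'high' variance settings
    beta, s = sample_fitness_effects(m, v, nu=0.1, n=200_000, rng=rng)
    results[v] = (beta.mean(), beta.var(), s.min())
    print(f"v={v:.3g}: mean={beta.mean():.4f}, var={beta.var():.3g}")
```

The high-variance setting $v = m(1 - m)/3$ always gives $a = 2m$ and $b = 2(1 - m)$, i.e., a U- or J-shaped distribution when $m < 1/2$. The low-variance setting concentrates β tightly around m.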

Random walk: Monotonic trajectories

Here, we compute the probability that, in the random walk defined in the main text, a trajectory starting at $x_0$ converges straight to 0 without ever taking an upward step. Reaching 0 requires an infinite number of downward steps. This probability is nevertheless finite, since the steps become increasingly likely to go down. For simplicity, we compute it for a fixed β.

If the random walk always goes down, its position at time step t is $x_t = (1 - \beta)^t x_0$. Since the probability of going down is $1 - x_t$, the probability of always going down is

$$P_{down} = \prod_{t=0}^{\infty}\left(1 - (1 - \beta)^t x_0\right). \qquad (41)$$

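We simplify this expression by taking the logarithm and assuming that $(1 - \beta)^t \ll 1$ for $t \ge 1$. Then $\log\big(1 - (1 - \beta)^t x_0\big) \simeq -(1 - \beta)^t x_0$ for $t \ge 1$, and the resulting geometric series can be summed: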

$$P_{down} \simeq (1 - x_0)\, e^{-x_0(\beta^{-1} - 1)}.$$

Since the random walk is invariant under the change $x \to 1 - x$, we can easily compute the probability of a trajectory always going up. This in turn gives the probability of a monotonic trajectory going straight to either boundary, 0 or 1.
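The exact product of Equation 41 and its approximation are straightforward to compare numerically. This plain-Python sketch truncates the product at t = 100, as for the 'exact' curve of the figure. It uses the $x \to 1 - x$ symmetry to obtain the probability of a monotonic trajectory towards either boundary. The function names are hypothetical.

```python
import math

def p_down_exact(x0, beta, t_max=100):
    """Product of Eq. (41), truncated at t_max as for the 'exact' curve of the figure."""
    p = 1.0
    for t in range(t_max + 1):
        p *= 1.0 - (1.0 - beta) ** t * x0
    return p

def p_down_approx(x0, beta):
    """Approximation (1 - x0) exp(-x0 (1/beta - 1))."""
    return (1.0 - x0) * math.exp(-x0 * (1.0 / beta - 1.0))

def p_monotonic(x0, beta, approx=False):
    """Straight to 0 or straight to 1; the second term uses the symmetry x -> 1 - x."""
    p_down = p_down_approx if approx else p_down_exact
    return p_down(x0, beta) + p_down(1.0 - x0, beta)

for x0 in (0.1, 0.3, 0.5):
    print(x0, p_down_exact(x0, 0.5), p_down_approx(x0, 0.5), p_monotonic(x0, 0.5))
```

For β = x0 = 0.5, the exact value is about 0.289 and the approximation about 0.303. The approximation slightly overestimates because $\log(1 - y) < -y$.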

Appendix 2—figure 1. Probability of a strictly monotonic trajectory in the random walk of the main text, as a function of β (fixed) and the initial value $x_0$.


The ‘exact’ solution is obtained by numerically computing the product in Equation 41 up to t=100.

The approximation is quite good, as can be seen from Appendix 2—figure 1. The same figure also shows that this probability is relatively high, even for $x_0$ far from either boundary.

Coalescent

Consider a partial sweep happening between generations t and t+1, with probability ρ. One individual A in generation t then has βN children in generation t+1. Any individual in generation t+1 has a probability β of having A as a direct ancestor, and a probability 1 − β of not having it. Consider n lineages at generation t+1 and look backward in time. The probability that k specific lineages among the n all have A as an ancestor is $\beta^k$. Averaging over $P_\beta$, we find the probability that k specific lineages have a common ancestor in the previous generation:

$$q_k = \rho\langle\beta^k\rangle.$$

Another useful quantity is the probability $\lambda_n(k)$ that, given n lineages and a partial sweep in the previous generation, a particular set of exactly k lineages merges one generation back. If a partial sweep of known amplitude β took place, two conditions must hold. The set of k lineages must merge in this generation, with probability $\beta^k$. The other n − k lineages must not merge, with probability $(1 - \beta)^{n-k}$. Averaging over $P_\beta$ gives

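$$\lambda_n(k) = \langle\beta^k(1 - \beta)^{n-k}\rangle = \int_0^1 \beta^k(1 - \beta)^{n-k}P(\beta)\,d\beta = \int_0^1 \beta^{k-2}(1 - \beta)^{n-k}\,\beta^2 P(\beta)\,d\beta. \qquad (42)$$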

This turns out to be the definition of the Λ-coalescent with $\Lambda(\beta) \propto \beta^2 P(\beta)$ (Schweinsberg, 2000; Berestycki, 2009). The Λ-coalescent is a general model for genealogies with multiple mergers. We mention two interesting subcases:

Finally, we derive a few more properties of the partial sweep coalescent and show the explicit link to Kingman's coalescent when $\beta \ll 1$. Using the $\lambda_n(k)$, we can compute the times $T_n$ during which exactly n lineages are present in parallel in the genealogy. If n lineages are present, any coalescence lowers the number of lineages below n. The time $T_n$ is thus exponentially distributed with rate $\nu(n)$, where $\nu(n)$ is the total rate of coalescence given n lineages:

$$\nu(n) = \rho\sum_{k=2}^{n}\binom{n}{k}\lambda_n(k).$$

Since $\rho\sum_{k=0}^{n}\binom{n}{k}\lambda_n(k) = \rho\langle(1 - \beta + \beta)^n\rangle = \rho$, we finally obtain

$$\langle T_n\rangle^{-1} = \nu(n) = \rho\left\langle 1 - (1 - \beta)^n - n\beta(1 - \beta)^{n-1}\right\rangle. \qquad (43)$$

For $n\beta \ll 1$, we recover the Kingman coalescent. For simplicity, we assume a constant β and expand $\langle T_n\rangle^{-1}$ to second order in $n\beta$, to obtain

$$\langle T_n\rangle^{-1} \simeq \rho\beta^2\,\frac{n(n - 1)}{2} = \frac{n(n - 1)}{2N_e}.$$

These are the times expected for the Kingman coalescent with population size $N_e = 1/(\rho\beta^2)$.
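The mean times of Equation 43 can be compared directly with their Kingman counterparts. This sketch assumes NumPy and a constant β = 0.25. The rate ρ is chosen so that $N_e = 1/(\rho\beta^2) = 1000$, an illustrative choice. The function names are hypothetical.

```python
import numpy as np

def mean_Tn_partial_sweep(n, rho, beta):
    """<T_n> = 1 / nu(n) from Eq. (43), for a constant sweep size beta."""
    n = np.asarray(n, dtype=float)
    rate = rho * (1 - (1 - beta) ** n - n * beta * (1 - beta) ** (n - 1))
    return 1.0 / rate

def mean_Tn_kingman(n, Ne):
    """Kingman expectation <T_n> = 2 Ne / (n (n - 1))."""
    n = np.asarray(n, dtype=float)
    return 2.0 * Ne / (n * (n - 1))

beta = 0.25                   # constant sweep size (illustrative)
rho = 1.0 / (1000 * beta**2)  # assumption: chosen so that Ne = 1/(rho beta^2) = 1000
Ne = 1.0 / (rho * beta**2)
ns = np.arange(2, 31)
partial = mean_Tn_partial_sweep(ns, rho, beta)
kingman = mean_Tn_kingman(ns, Ne)
print(np.column_stack([ns, partial, kingman])[:5])
```

For n = 2 the two coincide exactly, since $\nu(2) = \rho\beta^2$. For large n, the partial sweep value saturates at $1/\rho$ while the Kingman value keeps decreasing.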

In the large-n limit, we also obtain $\langle T_n\rangle \simeq \rho^{-1}$, since terms of the type $(1 - \beta)^n$ vanish. This is expected, as coalescences only take place when a partial sweep happens, which occurs at rate ρ. It is another qualitative difference from the Kingman coalescent. Since $\langle T_n\rangle \ge \rho^{-1}$ for all n, one must wait a time of order $\rho^{-1}$ to observe the first coalescence, even in large trees. The shortest branches are thus always of order $\rho^{-1}$. In contrast, in the Kingman process, the shortest branches vanish as the number of lineages n increases. This difference is clearly visible in the terminal branches of the trees in Appendix 2—figure 3.

Appendix 2—figure 2. Average coalescence times $T_n$ for a partial sweep coalescent with effective population size $N_e$ and a Kingman coalescent with population size $N_e$.


For simplicity, a constant β is used. Left: a high value, β = 0.25. Right: a low value, β = 0.05. For low β, the two coalescent processes are very similar up to large n; they differ considerably when β is larger. Note that for the partial sweep process, $T_n$ never goes below $\rho^{-1}$.

Appendix 2—figure 3. Realizations of different coalescence processes for 30 lineages (leaves).


Left: partial sweep coalescent, with constant β = 0.4 and ρ = 0.00625, such that $N_e = (\rho\beta^2)^{-1} = 1000$. Right: Kingman coalescent with population size $N = N_e = 1000$.

Funding Statement

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Contributor Information

Richard A Neher, Email: richard.neher@unibas.ch.

Armita Nourmohammad, University of Washington, United States.

Aleksandra M Walczak, École Normale Supérieure - PSL, France.

Funding Information

This paper was supported by the following grants:

  • Schweizerischer Nationalfonds zur Förderung der Wissenschaftlichen Forschung 310030_188547 to Pierre Barrat-Charlaix, Richard A Neher.

  • University of Basel to Pierre Barrat-Charlaix, Richard A Neher.

Additional information

Competing interests

No competing interests declared.

Reviewing editor, eLife.

Author contributions

Conceptualization, Formal analysis, Investigation, Visualization, Methodology, Writing - original draft, Writing - review and editing.

Conceptualization, Supervision, Visualization, Writing - original draft, Writing - review and editing.

Additional files

Supplementary file 1. GISAID acknowledgements table listing submitting and originating laboratories for all sequences used in this study.
elife-97350-supp1.zip (457.5KB, zip)
MDAR checklist

Data availability

Accession numbers for all sequences from GISAID are provided as Supplementary file 1.

References

  1. Barrat P. ExpiringFitnessFigures. swh:1:rev:816f5c4102aedcc63310cd6d6c3e6b42bce4cb26. Software Heritage. 2024a. https://archive.softwareheritage.org/swh:1:dir:e9653cbc20efe427cdd82742fca1eba42162e80d;origin=https://github.com/PierreBarrat/ExpiringFitnessFigures;visit=swh:1:snp:d8a15f3a76638b766a17c7ed4a7929ab5563c3b2;anchor=swh:1:rev:816f5c4102aedcc63310cd6d6c3e6b42bce4cb26
  2. Barrat P. PartialSweepSIR.jl. swh:1:rev:0cc02a6f288d911a1df8a65c05945c1049f0810c. Software Heritage. 2024b. https://archive.softwareheritage.org/swh:1:dir:c8e2dbd354f328a4a36b525bf6db0231d9b9b30a;origin=https://github.com/PierreBarrat/PartialSweepSIR.jl;visit=swh:1:snp:df0fa2ed449d3aa56f5da9e7feb0c744f0554050;anchor=swh:1:rev:0cc02a6f288d911a1df8a65c05945c1049f0810c
  3. Barrat P. WrightFisher.jl. swh:1:rev:75dc0d1bf5af7f5447effaa10225141fd63c633d. Software Heritage. 2024c. https://archive.softwareheritage.org/swh:1:dir:10016634ff6f0bc1b5bc8b86422e03cb8aa23d2c;origin=https://github.com/PierreBarrat/WrightFisher.jl;visit=swh:1:snp:c00c2b309c36fcbc51167a48f3da199112a78c6e;anchor=swh:1:rev:75dc0d1bf5af7f5447effaa10225141fd63c633d
  4. Barrat-Charlaix P, Huddleston J, Bedford T, Neher RA. Limited predictability of amino acid substitutions in seasonal influenza viruses. Molecular Biology and Evolution. 2021;38:2767–2777. doi: 10.1093/molbev/msab065.
  5. Berestycki N. Recent progress in coalescent theory. Ensaios Matemáticos. 2009;16:1–193. doi: 10.21711/217504322009/em161.
  6. Bhatt S, Holmes EC, Pybus OG. The genomic rate of molecular adaptation of the human influenza A virus. Molecular Biology and Evolution. 2011;28:2443–2451. doi: 10.1093/molbev/msr044.
  7. Bhatt S, Lam TT, Lycett SJ, Leigh Brown AJ, Bowden TA, Holmes EC, Guan Y, Wood JLN, Brown IH, Kellam P, Combating Swine Influenza Consortium. Pybus OG. The evolutionary dynamics of influenza A virus adaptation to mammalian hosts. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences. 2013;368:20120382. doi: 10.1098/rstb.2012.0382.
  8. Bolthausen E, Sznitman AS. On Ruelle’s probability cascades and an abstract cavity method. Communications in Mathematical Physics. 1998;197:247–276. doi: 10.1007/s002200050450.
  9. Brunet É, Derrida B, Mueller AH, Munier S. Effect of selection on ancestry: An exactly soluble case and its phenomenological generalization. Physical Review E. 2007;76:041104. doi: 10.1103/PhysRevE.76.041104.
  10. Bush RM, Bender CA, Subbarao K, Cox NJ, Fitch WM. Predicting the evolution of human influenza A. Science. 1999;286:1921–1925. doi: 10.1126/science.286.5446.1921.
  11. Chardès V, Mazzolini A, Mora T, Walczak AM. Evolutionary stability of antigenically escaping viruses. PNAS. 2023;120:e2307712120. doi: 10.1073/pnas.2307712120.
  12. Desai MM, Fisher DS. Beneficial mutation selection balance and the effect of linkage on positive selection. Genetics. 2007;176:1759–1798. doi: 10.1534/genetics.106.067678.
  13. Fonville JM, Wilks SH, James SL, Fox A, Ventresca M, Aban M, Xue L, Jones TC, Le NMH, Pham QT, Tran ND, Wong Y, Mosterin A, Katzelnick LC, Labonte D, Le TT, van der Net G, Skepner E, Russell CA, Kaplan TD, Rimmelzwaan GF, Masurel N, de Jong JC, Palache A, Beyer WEP, Le QM, Nguyen TH, Wertheim HFL, Hurt AC, Osterhaus A, Barr IG, Fouchier RAM, Horby PW, Smith DJ. Antibody landscapes after influenza virus infection or vaccination. Science. 2014;346:996–1000. doi: 10.1126/science.1256427.
  14. Gillespie JH. Genetic drift in an infinite population. The pseudohitchhiking model. Genetics. 2000;155:909–919. doi: 10.1093/genetics/155.2.909.
  15. Gog JR, Grenfell BT. Dynamics and selection of many-strain pathogens. PNAS. 2002;99:17209–17214. doi: 10.1073/pnas.252512799.
  16. Good BH, Martis S, Hallatschek O. Adaptation limits ecological diversification and promotes ecological tinkering during the competition for substitutable resources. PNAS. 2018;115:E10407–E10416. doi: 10.1073/pnas.1807530115.
  17. Gupta S, Ferguson N, Anderson R. Chaos, persistence, and evolution of strain structure in antigenically diverse infectious agents. Science. 1998;280:912–915. doi: 10.1126/science.280.5365.912.
  18. Hallatschek O. Selection-like biases emerge in population models with recurrent jackpot events. Genetics. 2018;210:1053–1073. doi: 10.1534/genetics.118.301516.
  19. Huddleston J, Barnes JR, Rowe T, Xu X, Kondor R, Wentworth DE, Whittaker L, Ermetal B, Daniels RS, McCauley JW, Fujisaki S, Nakamura K, Kishida N, Watanabe S, Hasegawa H, Barr I, Subbarao K, Barrat-Charlaix P, Neher RA, Bedford T. Integrating genotypes and phenotypes improves long-term forecasts of seasonal influenza A/H3N2 evolution. eLife. 2020;9:e60067. doi: 10.7554/eLife.60067.
  20. Kimura M. Diffusion models in population genetics. Journal of Applied Probability. 1964;1:177–232. doi: 10.2307/3211856.
  21. Kistler KE, Bedford T. An atlas of continuous adaptive evolution in endemic human viruses. Cell Host & Microbe. 2023;31:1898–1909. doi: 10.1016/j.chom.2023.09.012.
  22. Kucharski AJ, Lessler J, Cummings DAT, Riley S. Timescales of influenza A/H3N2 antibody dynamics. PLOS Biology. 2018;16:e2004974. doi: 10.1371/journal.pbio.2004974.
  23. Lee JM, Eguia R, Zost SJ, Choudhary S, Wilson PC, Bedford T, Stevens-Ayers T, Boeckh M, Hurt AC, Lakdawala SS, Hensley SE, Bloom JD. Mapping person-to-person variation in viral mutations that escape polyclonal serum targeting influenza hemagglutinin. eLife. 2019;8:e49324. doi: 10.7554/eLife.49324.
  24. Luksza M, Lässig M. A predictive fitness model for influenza. Nature. 2014;507:57–61. doi: 10.1038/nature13087.
  25. Marchi J, Lässig M, Walczak AM, Mora T. Antigenic waves of virus-immune coevolution. PNAS. 2021;118:e2103398118. doi: 10.1073/pnas.2103398118.
  26. Meijers M, Ruchnewitz D, Eberhardt J, Łuksza M, Lässig M. Population immunity predicts evolutionary trajectories of SARS-CoV-2. Cell. 2023;186:5151–5164. doi: 10.1016/j.cell.2023.09.022.
  27. Morris DH, Gostic KM, Pompei S, Bedford T, Łuksza M, Neher RA, Grenfell BT, Lässig M, McCauley JW. Predictive modeling of influenza shows the promise of applied evolutionary biology. Trends in Microbiology. 2018;26:102–118. doi: 10.1016/j.tim.2017.09.004.
  28. Neher RA. Genetic draft, selective interference, and population genetics of rapid adaptation. Annual Review of Ecology, Evolution, and Systematics. 2013;44:195–215. doi: 10.1146/annurev-ecolsys-110512-135920.
  29. Neher RA, Hallatschek O. Genealogies of rapidly adapting populations. PNAS. 2013;110:437–442. doi: 10.1073/pnas.1213113110.
  30. Neher RA, Russell CA, Shraiman BI. Predicting evolution from the shape of genealogical trees. eLife. 2014;3:e03568. doi: 10.7554/eLife.03568.
  31. Neher RA, Bedford T, Daniels RS, Russell CA, Shraiman BI. Prediction, dynamics, and visualization of antigenic phenotypes of seasonal influenza viruses. PNAS. 2016;113:E1701–E1709. doi: 10.1073/pnas.1525578113.
  32. Neher R. Flu frequencies. 0868cff. GitHub. 2024. https://github.com/nextstrain/flu_frequencies/tree/fixation
  33. Pelletier F, Garant D, Hendry AP. Eco-evolutionary dynamics. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences. 2009;364:1483–1489. doi: 10.1098/rstb.2009.0027.
  34. Petrova VN, Russell CA. The evolution of seasonal influenza viruses. Nature Reviews. Microbiology. 2018;16:47–60. doi: 10.1038/nrmicro.2017.118.
  35. Rice DP, Good BH, Desai MM. The evolutionarily stable distribution of fitness effects. Genetics. 2015;200:321–329. doi: 10.1534/genetics.114.173815.
  36. Roemer C, Sheward DJ, Hisner R, Gueli F, Sakaguchi H, Frohberg N, Schoenmakers J, Sato K, O’Toole Á, Rambaut A, Pybus OG, Ruis C, Murrell B, Peacock TP. SARS-CoV-2 evolution in the Omicron era. Nature Microbiology. 2023;8:1952–1959. doi: 10.1038/s41564-023-01504-w.
  37. Rouzine IM, Wakeley J, Coffin JM. The solitary wave of asexual evolution. PNAS. 2003;100:587–592. doi: 10.1073/pnas.242719299.
  38. Rouzine IM, Rozhnova G. Antigenic evolution of viruses in host populations. PLOS Pathogens. 2018;14:e1007291. doi: 10.1371/journal.ppat.1007291.
  39. Schiffels S, Szöllosi GJ, Mustonen V, Lässig M. Emergent neutrality in adaptive asexual evolution. Genetics. 2011;189:1361–1375. doi: 10.1534/genetics.111.132027.
  40. Schweinsberg J. Coalescents with simultaneous multiple collisions. Electronic Journal of Probability. 2000;5:1–50. doi: 10.1214/EJP.v5-68.
  41. Shu Y, McCauley J. GISAID: Global initiative on sharing all influenza data - from vision to reality. Euro Surveillance. 2017;22:30494. doi: 10.2807/1560-7917.ES.2017.22.13.30494.
  42. Smith DJ, Lapedes AS, de Jong JC, Bestebroer TM, Rimmelzwaan GF, Osterhaus A, Fouchier RAM. Mapping the antigenic and genetic evolution of influenza virus. Science. 2004;305:371–376. doi: 10.1126/science.1097211.
  43. Strelkowa N, Lässig M. Clonal interference in the evolution of influenza. Genetics. 2012;192:671–682. doi: 10.1534/genetics.112.143396.
  44. Tikhonov M, Monasson R. Innovation rather than improvement: a solvable high-dimensional model highlights the limitations of scalar fitness. Journal of Statistical Physics. 2018;172:74–104. doi: 10.1007/s10955-018-1956-6.
  45. Tsimring LS, Levine H, Kessler DA. RNA virus evolution via a fitness-space model. Physical Review Letters. 1996;76:4440–4443. doi: 10.1103/PhysRevLett.76.4440.
  46. Welsh FC, Eguia RT, Lee JM, Haddox HK, Galloway J, Chau NVV, Loes AN, Huddleston J, Yu TC, Le MQ, Nhat NTD, Thanh NTL, Greninger AL, Chu HY, Englund JA, Bedford T, Boni MF, Bloom JD. Age-dependent heterogeneity in the antigenic effects of mutations to influenza hemagglutinin. bioRxiv. 2023. doi: 10.1101/2023.12.12.571235.
  47. Yan L, Neher RA, Shraiman BI. Phylodynamic theory of persistence, extinction and speciation of rapidly adapting pathogens. eLife. 2019;8:e44205. doi: 10.7554/eLife.44205.

eLife Assessment

Armita Nourmohammad 1

This important study provides a new perspective on how human immunity shapes the antigenic evolution of pathogens. By combining theory and simulation the authors make a compelling case for the importance of eco-evolutionary interactions in population-level virus-host dynamics, which arise due to coupling between the dynamics of immune memories and viral variants. Although the work does not propose improved data-driven viral forecasting methods, it makes a conceptual contribution that advances the field's understanding of this problem's intrinsic difficulty.

Reviewer #1 (Public review):

Anonymous

In this work, the authors study the dynamics of fast-adapting pathogens under immune pressure in a host population with prior immunity. In an immunologically diverse population, an antigenically escaping variant can perform a partial sweep, as opposed to a sweep in a homogeneous population. In a certain parameter regime, the frequency dynamics can be mapped onto a random walk with zero mean, which is reminiscent of neutral dynamics, albeit with differences in higher-order moments. Next, they develop a simplified effective model of time-dependent selection with expiring fitness advantage, and posit that the resulting partial sweep dynamics could explain the behaviour of influenza trajectories empirically found in earlier work (Barrat-Charlaix et al. Molecular Biology and Evolution, 2021). Finally, the authors put forward an interesting hypothesis: the mode of evolution is connected to the age of a lineage since ingression into the human population. A mode of meandering frequency trajectories and delayed fixation has indeed been observed in one of the long-established subtypes of human influenza, albeit so far only over a limited period from 2013 to 2020. The paper is overall interesting and well-written.

In the revised version, the authors have addressed questions on the role of clonal interference by new simulations in the SI, clarified the connection between the SIR model and vanishing-fitness models, and placed their analysis into the broader context of consumer resource dynamics.

However, the general conclusion, as stated in the abstract, that variant trajectories become unpredictable as a consequence of the SIR dynamics remains somewhat misleading. Two aspects contribute to this problem. (1) The empirical observation of ‘quasi-neutrality’, i.e. the absence of a net frequency increase inferred as an average over many trajectories at intermediate frequencies, does not imply that individual trajectories are neutral (i.e., fully stochastic and unpredictable) over the time span of observation. Rather, it only says that some trajectories have a positive and some a negative selection coefficient over that time span. (2) As stated by the authors, the observation of average quasi-neutrality is indeed incompatible with the travelling wave model, where initially successful new variants are assumed to retain a fixed, positive selection coefficient from origination to fixation. This observation also limits predictions by extrapolation, where a positive selection coefficient inferred at small frequency is assumed to remain the same at later times and higher frequencies. However, predictions derived from Gog and Grenfell's multi-strain SIR model, as used by several authors, do not make the assumption of fixed selection coefficients and incorporate trajectory-specific, time-dependent expiration effects into their model predictions. This distinction remains blurred throughout the text of the paper.

Reviewer #3 (Public review):

Anonymous

In this work the authors present a multi-strain SIR model in which viruses circulate in a heterogeneous population with different groups characterized by different cross-immunity structures. They reformulate the qualitative features of these SIR dynamics as a random walk characterized by new variants saturating at intermediate frequencies. Then they recast their microscopic description into an effective formalism in which viral strains lose fitness independently from one another. They study several features of this process numerically and analytically, such as the average variant frequency, the probability of fixation, and the coalescent time. They compare qualitatively the dynamics of this model to variant dynamics in RNA viruses such as flu and SARS-CoV-2.

The idea that vanishing fitness mechanisms that produce partial sweeps may explain important features of flu evolution is very interesting. Its simplicity and potential generality make it a powerful framework. As noted by the authors, this may have important implications for predictability of virus evolution and such a framework may be beneficial when trying to build predictive models for vaccine design. The vanishing fitness model is well analyzed and produces interesting structures in the strain coalescent. Even though the comparison with data is largely qualitative, this formalism would be helpful when developing more accurate microscopic ingredients that could reproduce viral dynamics quantitatively.

This general framework has the potential to be more universal than human RNA viruses, in situations where invading mutants would saturate at intermediate frequencies.

The qualitative connection between the coarse-grained features of these vanishing fitness dynamics and structured SIR processes offers additional intuition relevant to host-pathogen interactions, although as noted by the authors other ecological processes could drive similar evolutionary patterns. The additions in the revised manuscript, which substantiate the connection between the SIR and the vanishing fitness description more thoroughly, are important to better appreciate the scope of the work.

eLife. 2024 Dec 27;13:RP97350. doi: 10.7554/eLife.97350.3.sa3

Author response

Pierre Barrat-Charlaix 1, Richard A Neher 2

The following is the authors’ response to the original reviews.

Response to reviewers

We thank the Editor and the Reviewers for their constructive reviews. In the light of this feedback, we have made a number of changes and additions to the manuscript, which we think improve the presentation and, we hope, address most of the reviewers' concerns.

Main changes:

• We added a new SI section (B1) with a population dynamics simulation in the high clonal interference regime and without expiring fitness (see R1: (1)).

• We added a new SI section (A9) with the derivation of the equilibrium state of our SIR model in the case of 𝑀 immune groups and in the limit 𝜀 → 0 (see R1: (5)).

• The text of the section “Abstraction as ‘expiring’ fitness advantage” has been revised.

• We added a new SI section (A4) describing the links between parameters of the “expiring fitness” and SIR models.

All three reviewers had concerns about the relation between our SIR model and the “expiring fitness” model, which we hope are addressed by the last two items listed above. In particular, we would like to underline the following points:

• The goal of our SIR model is to give a mechanistic explanation of partial sweeps using traditional epidemiological models. While ecological models (e.g. consumer-resource models) can give rise to the same phenomenology, we believe that in the context of host-pathogen interactions it is relevant to show explicitly that SIR models can result in partial sweeps.

• The expiring fitness model is mainly an effective model: it reproduces some qualitative features of the SIR but does not quantitatively match all aspects of the frequency dynamics in SIR models.

• It is possible to link the parameters of the SIR (𝛼,𝛾,𝑏,𝑓) and expiring fitness (𝑠,𝑥,𝜈) models at the beginning of the invasion of the variant (new SI section A4). However, the two models also differ in significant ways (the SIR model can, for example, oscillate, while the effective model cannot). The correspondence of quantities like the initial invasion rate and the ‘expiration rate’ of fitness effects is thus only expected to hold for some time after the emergence of a novel variant.

Public reviews:

Reviewer 1:

Summary

In this work, the authors study the dynamics of fast-adapting pathogens under immune pressure in a host population with prior immunity. In an immunologically diverse population, an antigenically escaping variant can perform a partial sweep, as opposed to a sweep in a homogeneous population. In a certain parameter regime, the frequency dynamics can be mapped onto a random walk with zero mean, which is reminiscent of neutral dynamics, albeit with differences in higher-order moments. Next, they develop a simplified effective model of time-dependent selection with expiring fitness advantage, and posit that the resulting partial sweep dynamics could explain the behaviour of influenza trajectories empirically found in earlier work (Barrat-Charlaix et al. Molecular Biology and Evolution, 2021). Finally, the authors put forward an interesting hypothesis: the mode of evolution is connected to the age of a lineage since ingression into the human population. A mode of meandering frequency trajectories and delayed fixation has indeed been observed in one of the long-established subtypes of human influenza, albeit so far only over a limited period from 2013 to 2020. The paper is overall interesting and well-written. Some aspects, detailed below, are not yet fully convincing and should be treated in a substantial revision.

We thank the reviewer for their constructive criticism. The deep split in the A/H3N2 HA segment from 2013 to 2020 is indeed one of the more striking examples of such meandering frequency dynamics in otherwise rapidly adapting populations, but the rise and fall of H1N1pdm clade 5a.2a.1 in recent years might be a more recent example. We argue that such meandering dynamics might be a common contributor to seasonal influenza dynamics, even if it only spans 3 to 6 years.

(1) The quasi-neutral behaviour of amino acid changes above a certain frequency (reported in Fig. 3), which is the main overlap between influenza data and the authors’ model, is not a specific property of that model. Rather, it is a generic property of travelling wave models and, more broadly, of evolution under clonal interference (Rice et al. Genetics 2015, Schiffels et al. Genetics 2011). The authors should discuss in more detail the relation to this broader class of models with emergent neutrality. Moreover, the authors’ simulations of the model dynamics only extend up to the onset of clonal interference, 𝜌/𝑠0 = 1 (see Fig. 4). Additional simulations deeper in the clonal interference regime (e.g. 𝜌/𝑠0 = 5) would show the behaviour in this regime more clearly.

We agree with the reviewer that we did not discuss in detail the effects of clonal interference on quasi-neutrality and predictability. As suggested, we conducted additional simulations of our population model in the regime of high clonal interference (𝜌/𝑠0 ≫ 1) and without expiring fitness effects. The results are shown in a new section of the supplementary information. These simulations show, as expected, that increasing clonal interference tends to decrease predictability: the fixation probability of an adaptive mutation found at frequency 𝑥 moves closer to 𝑥 as 𝜌 increases. However, even in a case of strong interference, 𝜌/𝑠0 = 32, 𝑝fix remains significantly different from the neutral expectation. We conclude that while dynamics do tend towards quasi-neutrality under strong interference, this effect alone is unlikely to explain the observed dynamics of H3N2 influenza. In our previous publication (Barrat-Charlaix et al., MBE, 2021) we also investigated the effect of epistatic interactions between mutations, alongside strong clonal interference. We concluded that, while most of these processes make evolution less predictable and push 𝑝fix towards the diagonal, it is hard to reproduce the empirical observations with realistic parameters. The “expiring fitness” model, however, reproduces them quite readily.

But there are qualitative differences between quasi-neutrality in traveling wave models and the expiring fitness model. In the traveling wave, a genotype carrying an adaptive mutation is always fitter than if it didn’t carry the mutation. Quasi-neutrality emerges from the accumulation of fitness variation at other loci and the fact that the coalescence time is not much bigger than the inverse selection coefficient of the mutation. In the expiring fitness model, the selective effect of the mutation itself goes away with time. We now discuss the literature on quasi-neutrality and cite Rice et al. 2015 and Schiffels et al. 2011.
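To make the contrast concrete, here is a minimal deterministic sketch in Python. It assumes one specific form for the expiring advantage, ds/dt = −ν x s, in which the advantage erodes in proportion to the variant's own frequency; this form, the parameter values and the function names are illustrative assumptions, not the authors' implementation. Setting ν = 0 recovers a fixed selection coefficient and hence a complete sweep.

```python
import math


def expiring_fitness_sweep(s0, nu, x0=1e-3, dt=0.01, t_max=2000.0):
    """Euler-integrate an assumed deterministic expiring-fitness model.

    Assumed dynamics (illustrative, not taken from the paper):
        dx/dt = s * x * (1 - x)   # logistic growth at the current advantage s
        ds/dt = -nu * x * s       # advantage erodes as the variant becomes common
    With nu = 0 the advantage never expires and this is an ordinary sweep.
    Returns the final frequency x and the remaining advantage s.
    """
    x, s = x0, s0
    for _ in range(round(t_max / dt)):
        # Update x and s simultaneously from their values at the current step.
        x, s = x + s * x * (1.0 - x) * dt, s - nu * x * s * dt
    return x, s


def predicted_plateau(s0, nu, x0=1e-3):
    """Plateau frequency implied by the assumed dynamics.

    Dividing the two equations gives ds/dx = -nu / (1 - x), so
    s = s0 + nu * ln((1 - x) / (1 - x0)); growth stops when s reaches 0.
    """
    return 1.0 - (1.0 - x0) * math.exp(-s0 / nu)


x_full, _ = expiring_fitness_sweep(s0=0.05, nu=0.0)
x_part, s_left = expiring_fitness_sweep(s0=0.05, nu=0.1)
print(f"fixed advantage:    final x = {x_full:.3f}")
print(f"expiring advantage: final x = {x_part:.3f} "
      f"(predicted plateau {predicted_plateau(0.05, 0.1):.3f})")
```

With a fixed advantage the variant sweeps to fixation; with an expiring advantage it stalls close to the analytic plateau 1 − (1 − x0)·exp(−s0/ν), i.e. it performs a partial sweep.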

In this context, I also note that the modelling results of this paper, in particular the stalling of frequency increase and the decrease in the number of fixations, are very similar to established results obtained from similar dynamical assumptions in the broader context of consumer resource models; see, e.g., Good et al. PNAS 2018. The authors should place their model in this broader context.

We thank the reviewer for pointing out the link between consumer resource models and our work. We further strengthened our discussion of the similarity of the phenomenology to models typically used in ecology and made an effort to highlight the link between consumer-resource models and ours in the introduction and in the part on the SIR model.

(2) The main conceptual problem of this paper is the inference of generic non-predictability from the quasi-neutral behaviour of influenza changes. There is no question that new mutations limit the range of predictions, this problem being most important in lineages with diverse immune groups such as influenza A(H3N2). However, inferring generic non-predictability from quasi-neutrality is logically problematic because predictability refers to individual trajectories, while quasi-neutrality is a property obtained by averaging over many trajectories (Fig. 3). Given an SIR dynamical model for trajectories, as employed here and elsewhere in the literature, the up and down of individual trajectories may be predictable for a while even though allele frequencies do not increase on average. The authors should discuss this point more carefully.

We agree with the reviewer that the deterministic SIR model is of course predictable. Similarly, a partial sweep is predictable. But we argue that expiring fitness makes evolution less predictable in two ways: (i) When a new adaptive mutation emerges and rises in frequency, we typically don’t know how rapidly its fitness effect is ‘expiring’. Thus even if we can measure its instantaneous growth rate accurately, we can’t predict its fate far into the future. (ii) Compared to the situation where fitness effects are not expiring, the time to fixation is longer and there are more opportunities for novel mutations to emerge and change the course of the trajectory. We have tried to make this point clearer in the manuscript.
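Point (i) can be illustrated with assumed expiring-fitness dynamics of the form dx/dt = s x(1 − x), ds/dt = −ν x s (an illustrative choice, not the authors' code). In this sketch, two hypothetical variants with the same initial advantage s0 grow at essentially the same rate at first, yet end up at very different frequencies because their advantages expire at different, a priori unknown, rates ν.

```python
import math


def trajectory(s0, nu, x0=1e-3, dt=0.01, t_max=1500.0):
    """Frequency trajectory under the assumed expiring-fitness dynamics
    dx/dt = s x (1 - x), ds/dt = -nu x s (an illustrative assumption)."""
    x, s = x0, s0
    xs = [x]
    for _ in range(round(t_max / dt)):
        x, s = x + s * x * (1.0 - x) * dt, s - nu * x * s * dt
        xs.append(x)
    return xs


def early_growth_rate(xs, t_obs, dt=0.01):
    """Average exponential growth rate of x over the first t_obs time units."""
    return math.log(xs[round(t_obs / dt)] / xs[0]) / t_obs


s0 = 0.05
for nu in (0.02, 0.2):  # hypothetical slow vs. fast expiry of the advantage
    xs = trajectory(s0, nu)
    print(f"nu = {nu}: early growth rate {early_growth_rate(xs, 20.0):.4f}, "
          f"final frequency {xs[-1]:.2f}")
```

An observer who measures only the early growth rate sees two nearly identical variants; the difference in their fates is set by ν, which is not visible at that stage.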

(3) To analyze predictability and population dynamics (section 5), the authors use a Wright-Fisher model with expiring fitness dynamics. While here the two sources of the emerging neutrality are easily tuneable (expiring fitness and clonal interference), the connection of this model to the SIR model needs to be substantiated: what is the starting selection 𝑠0 as a function of the SIR parameters (𝑓,𝑏,𝑀,𝜀), the selection decay 𝜈 = 𝜈(𝑓,𝑏,𝑀,𝜀,𝛾)? This would enable the comparison of the partial sweep timing in both models and corroborate the mapping of the SIR onto the simplified W-F model. In addition, the authors’ point would be strengthened if the SIR partial sweeps in Fig.1 and Fig.2 were obtained for a combination of parameters that results in a realistic timescale of partial sweeps.

We added a new section to the SI (A4) that relates the parameters of the SIR and expiring fitness models. In particular, we compute the initial growth rate 𝑠0 and a proxy for the fitness expiry rate 𝜈 as functions of the SIR parameters 𝛼,𝛾,𝑓,𝑏,𝑀 at the moment the variant is introduced. The initial growth rate depends primarily on the degree of immune escape 𝑓, while the expiration rate 𝜈 is related to the incidence 𝐼wt + 𝐼𝑚. However, as the two models have fundamentally different dynamics, these relations are only valid on time scales shorter than potential oscillations of the SIR model. Beyond that, the connection between the models is mostly qualitative: both rely on the fact that the growth rate of a strain diminishes as the strain becomes more frequent, and both give rise to partial sweeps.

In Figure 1, a partial sweep takes roughly 100 to 200 generations to complete (bottom right panel). If we consider H3N2 influenza and take one generation to be one week, this corresponds to a sweep time of 2 to 4 years, which is slightly slower than, but roughly in line with, observed selective sweeps. This time is harder to define when oscillatory dynamics take place (middle right panel), but the time from the introduction of the mutant to the peak frequency is again about 4 years. The other parameters of the model correspond to a waning time of 200 weeks and immune escape of the order of a 20-30% change in susceptibility.
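The conversion from generations to years used above is simple arithmetic. The sketch below assumes, as the response does, one generation per week, and uses 365.25/7 weeks per year; the function name is hypothetical.

```python
WEEKS_PER_YEAR = 365.25 / 7  # about 52.18 weeks in a year


def generations_to_years(generations, weeks_per_generation=1.0):
    """Convert model generations to years, assuming one generation = one week."""
    return generations * weeks_per_generation / WEEKS_PER_YEAR


for g in (100, 200):
    print(f"{g} generations ~ {generations_to_years(g):.1f} years")
```

The same conversion puts the 200-week waning time at a little under 4 years.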

Reviewer 2:

Summary

This work addresses a puzzling finding in the viral forecasting literature: high-frequency viral variants evince signatures of neutral dynamics, despite strong evidence for adaptive antigenic evolution. The authors explicitly model interactions between the dynamics of viral adaptations and of the environment of host immune memory, making a solid theoretical and simulation-based case for the essential role of host-pathogen eco-evolutionary dynamics. While the work does not directly address improved data-driven viral forecasting, it makes a valuable conceptual contribution to the key dynamical ingredients (and perhaps intrinsic limitations) of such efforts.

Strengths

This paper follows up on previous work from these authors and others concerning the problem of predicting future viral variant frequency from variant trajectory (or phylogenetic tree) data, and a model of evolving fitness. This is a problem of high impact: if such predictions are reliable, they empower vaccine design and immunization strategies. A key feature of this previous work is a “traveling fitness wave” picture, in which absolute fitnesses of genotypes degrade at a fixed rate due to an advancing external field, or “degradation of the environment”. The authors have contributed to these modeling efforts, as well as to work that critically evaluates fitness prediction (references 11 and 12). A key point of that prior work was the finding that fitness metrics performed no better than a baseline neutral model estimate (Hamming distance to a consensus nucleotide sequence). Indeed, the apparent good performance of their well-adopted “local branching index” (LBI) was found to be an artifact of its tendency to function as a proxy for the neutral predictor. A commendable strength of this line of work is the scrutiny and critique the authors apply to their own previous projects. The current manuscript follows with a theory and simulation treatment of model elaborations that may explain previous difficulties, as well as point to the intrinsic hardness of the viral forecasting inference problem.

This work abandons the mathematical expedience of traveling fitness waves in favor of explicitly coupled eco-evolutionary dynamics. The authors develop a multi-compartment susceptible/infected model of the host population, with variant cross-immunity parameters, immune waning, and infectious contact among compartments, alongside the viral growth dynamics. Studying the invasion of adaptive variants in this setting, they discover dynamics that differ qualitatively from the fitness wave setting: instead of a succession of adaptive fixations, invading variants have a characteristic “expiring fitness”: as the immune memories of the host population reconfigure in response to an adaptive variant, the fitness advantage transitions to quasi-neutral behavior. Although their minimal model is not designed for inference, the authors have shown how an elaboration of host immunity dynamics can reproduce a transition to neutral dynamics. This is a valuable contribution that clarifies previously puzzling findings and may facilitate future elaborations for fitness inference methods.

The authors provide open access to their modeling and simulation code, facilitating future applications of their ideas or critiques of their conclusions.

We thank the reviewer for their summary, assessment, and constructive critique.

(1) The current modeling work does not make direct contact with data. I was hoping to see a more direct application of the model to a data-driven prediction problem. In the end, although the results are compelling as is, this disconnect leaves me wondering whether the proposed model captures the phenomena in detail, beyond the qualitative phenomenology of expiring fitness. I would imagine that some data are available on cross-immunity between strains of influenza and SARS-CoV-2, so hopefully some validation of these mechanisms would be possible.

We agree with the reviewer that quantitatively confronting our model with data would be very interesting. Unfortunately, most available serological data for influenza and SARS-CoV-2 are obtained using post-infection sera from previously naive animal models. To test our model, we would require human serology data, ideally demographically resolved, and a way to link serology to transmission dynamics. Furthermore, our model is mostly an explanation for qualitative features of variant dynamics and their apparent lack of predictability. We therefore considered quantitative validation using data to be out of the scope of this work.

(2) After developing the SIR model, the authors introduce an effective “expiring fitness” model that avoids the oscillatory behavior of the SIR model. I hoped this could be motivated more directly, perhaps as a limit of the SIR model with many immune groups. As is, the expiring fitness model seems to lose the eco-evolutionary interpretability of the SIR model, retreating to a more phenomenological approach. In particular, it’s not clear how the fitness decay parameter 𝜈 and the initial fitness advantage 𝑠0 relate to the key ecological parameters: the strain cross-immunity and immune group interaction matrices.

The expiring fitness model emerges as a limiting case, at least qualitatively, of the SIR model when the growth rate of the new variant is small compared to the waning rate and the SIR model does not oscillate. This is readily achieved with many immune groups, which reconcile the large effect of many escape mutations with the lack of oscillation by confining the escape to a fraction of the population. Beyond that, the expiring fitness model is mainly an effective model that allows us to study the consequences of partial sweeps on predictability on long timescales. As stated in the “Main changes” section at the start of this reply, we added an SI section which links the parameters of the two models. However, we underline that beyond the phenomenon of partial sweeps, the dynamics of the two models are different.

Reviewer 3:

Summary

In this work the authors start by presenting a multi-strain SIR model in which viruses circulate in a heterogeneous population with different groups characterized by different cross-immunity structures. They argue that this model can be reformulated as a random walk characterized by new variants saturating at intermediate frequencies. Then they recast their microscopic description into an effective formalism in which viral strains lose fitness independently from one another. They study several features of this process numerically and analytically, such as the average variant frequency, the probability of fixation, and the coalescent time. They compare qualitatively the dynamics of this model to variant dynamics in RNA viruses such as flu and SARS-CoV-2.

Strengths

The idea that vanishing fitness mechanisms that produce partial sweeps may explain important features of flu evolution is very interesting. Its simplicity and potential generality make it a powerful framework. As noted by the authors, this may have important implications for predictability of virus evolution and such a framework may be beneficial when trying to build predictive models for vaccine design. The vanishing fitness model is well analyzed and produces interesting structures in the strain coalescent. Even though the comparison with data is largely qualitative, this formalism would be helpful when developing more accurate microscopic ingredients that could reproduce viral dynamics quantitatively. This general framework has the potential to be more universal than human RNA viruses, in situations where invading mutants would saturate at intermediate frequencies.

We thank the reviewer for their positive remarks and constructive criticism below.

Weaknesses

The authors build the narrative around a multi-strain SIR model in which viruses circulate in a heterogeneous population, but the connection of this model to the rest of the paper is not well supported by the analysis. When presenting the coarse-grained random walk description in section 3 of the Results, there is no quantitative relation between the ingredients of the random walk (most importantly 𝑃(𝛽)) and the SIR model, just a qualitative argument that strains would initially grow exponentially and saturate at intermediate frequencies. So essentially any other microscopic description with these two features would give rise to the same random walk.

As also highlighted in the responses to the other reviewers, we now discuss how the parameters of the SIR model are related to the initial growth rate and the ‘expiration’ rate of the effective model. While the phenomenology of the SIR model is of course richer, this correspondence describes its overdamped limit qualitatively well.

Currently it is unclear whether the specific choices for population heterogeneity and cross-immunity structure in the SIR model matter for the main results of the paper. In section 2, it seems that the main effects of these ingredients are reduced oscillations in variant frequencies and a rescaled initial growth rate. But ultimately a homogeneous population would also produce steady-state coexistence between strains, and the oscillation amplitude likely depends on parameter choices. Thus a homogeneous population may lead to a similar coarse-grained random walk.

The reviewer is correct that the primary effect of using many immune groups is to slow down the rise of a novel variant, which in turn dampens the oscillations. Having multiple immune groups widens the parameter space in which partial sweeps without dramatic oscillations are observed. For slow sweeps, similar dynamics are observed in a homogeneous population.

Similarly, it’s unclear how the SIR model relates to the vanishing fitness framework, other than on a qualitative level given by the fact that both descriptions produce variants saturating at intermediate frequencies. Other microscopic ingredients may lead to a similar description, yet with quantitative differences.

Both of these points were also raised by other reviewers and we agree that it is worth discussing them at greater length. We now discuss how the parameters of the ‘expiring fitness’ model relate to those of the SIR. We also discuss how other models such as ecological models give rise to similar coarse grained models.

At the same time, from the current analysis the reader cannot appreciate the impact of such a mean field approximation where strains lose fitness independently from one another, and under what conditions such assumption may be valid.

In the SIR model, the rate at which strains lose fitness does depend on the precise state of the host population through the quantities 𝑆𝑚 and 𝑆wt, which is apparent in equation (A27) of the new SI section. The fact that a new variant shifts the equilibrium frequencies of previous strains proportionally is valid if the “antigenic space” is of very high dimension, as explained in the SI section “Change in frequency when adding subsequent strains”. It would indeed be interesting to explore relaxations of this assumption by considering a larger class of cross-immunity matrices 𝐾. However, in the expiring fitness model, the fact that strains lose fitness independently from each other is a necessary simplification.
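The proportional shift can be turned into the coarse-grained random walk discussed by Reviewer 3. The sketch below is a hypothetical Python illustration, not the authors' code: the plateau frequency 𝛽 of each new variant is drawn from a uniform distribution standing in for 𝑃(𝛽), and each variant arises within a focal lineage with probability equal to that lineage's current frequency. Under these assumptions the expected frequency after a step is x(1 − 𝛽) + 𝛽x = x, so the walk has zero mean and the fraction of lineages that end up near fixation should approximately equal their starting frequency, as for neutral alleles.

```python
import random


def lineage_random_walk(x0, n_variants, rng, beta_lo=0.1, beta_hi=0.4):
    """Frequency of a focal lineage after n_variants partial sweeps.

    Assumptions (a sketch, not the paper's code): each new escape variant
    settles at a plateau frequency beta drawn uniformly from [beta_lo, beta_hi]
    (a stand-in for P(beta)), arises inside the focal lineage with probability
    equal to the lineage's current frequency x, and shrinks every pre-existing
    strain by the same factor (1 - beta).
    """
    x = x0
    for _ in range(n_variants):
        beta = rng.uniform(beta_lo, beta_hi)
        if rng.random() < x:
            x = x * (1.0 - beta) + beta  # the new variant belongs to the lineage
        else:
            x = x * (1.0 - beta)         # the new variant arose elsewhere
    return x


rng = random.Random(1)
x0 = 0.3
finals = [lineage_random_walk(x0, 200, rng) for _ in range(4000)]
mean_final = sum(finals) / len(finals)
frac_high = sum(x > 0.5 for x in finals) / len(finals)
print(f"mean final frequency {mean_final:.3f}, fraction near fixation {frac_high:.3f}")
```

The uniform 𝑃(𝛽) is only a placeholder; any distribution on (0, 1) keeps the zero-mean property, while the shape of 𝑃(𝛽) affects the higher moments of the walk.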

In summary, the central and most thoroughly supported results in this paper refer to a vanishing fitness model for human RNA viruses. The current narrative, built around the SIR model as a general work on host-pathogen eco-evolution in the abstract, introduction, discussion and even title, does not seem to match the key results and may mislead readers. The SIR description rather seems one of the several possible models, featuring a negative frequency dependent selection, that would produce coarse-grained dynamics qualitatively similar to the vanishing fitness description analyzed here.

We have revised the text throughout to make the connections between the different parts of the manuscript, in particular the SIR model and the expiring fitness model, clearer. We agree that the phenomenology of the expiring fitness model is more general than the case of human RNA viruses described by the SIR model, but we think this generality is an attractive feature of the coarse-graining, not a shortcoming. Indeed, other settings with negative frequency-dependent selection, or ecosystems that adapt on an appropriate time scale, generate similar dynamics.

Recommendations for the authors:

Reviewer 1:

(4) Line 74: what does fitness mean?

Many population dynamics models, including those used for viral forecasting, attach a scalar fitness to each strain. The growth rate of each strain is then computed by subtracting the population-average fitness from the strain’s fitness. Fitness is meant in this sense in this sentence.
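A minimal sketch of this convention, with hypothetical fitness values and function names: the growth rate is the strain's fitness minus the frequency-weighted mean fitness, so the frequency-weighted growth rates sum to zero and an Euler step of dx_i/dt = x_i(f_i − f̄) keeps the total frequency at one.

```python
def growth_rates(freqs, fitness):
    """Growth rate of each strain: its fitness minus the population-mean fitness."""
    mean_fitness = sum(x * f for x, f in zip(freqs, fitness))
    return [f - mean_fitness for f in fitness]


def step(freqs, fitness, dt=0.1):
    """One Euler step of dx_i/dt = x_i * (f_i - mean fitness)."""
    rates = growth_rates(freqs, fitness)
    return [x + x * r * dt for x, r in zip(freqs, rates)]


freqs = [0.5, 0.3, 0.2]
fitness = [0.0, 0.1, 0.2]  # hypothetical scalar fitness values
print(growth_rates(freqs, fitness))
print(sum(step(freqs, fitness)))
```

Only fitness differences matter here: adding the same constant to every strain's fitness leaves all growth rates unchanged.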

(5) Fig. 1: The equilibrium frequency in the middle and bottom rows is hardly smaller than the equilibrium frequency in the top row for one immune group. This is surprising since for M=10, the variant escapes in only 1/10th of the population, which naively should impact the equilibrium frequency more strongly. Could the authors comment on this?

This is indeed non-trivial, and a hand-waving argument can be made by considering the extreme case 𝜀 = 0. The variant is then completely neutral for the immune groups 𝑖 > 1, and would be at equilibrium at any frequency in these immune groups. Its equilibrium frequency is then only determined by group 1, which is the only one breaking degeneracy. For 𝜀 > 0 but small, we naturally expect a small deviation from the 𝜀 = 0 case and thus 𝛽 should only change slightly.

A more rigorous argument with a mathematical proof in the case 𝜀 = 0 is now given in section A4 of the supplementary information.

(6) Fig. 1: The caption states that the simulations are performed with 𝜀 = 0.99. Is this a typo? It seems that it should be 𝜀 = 0.01, as in equation (7) and just below it.

This was indeed a typo. It is now fixed.

(7) Fig. 3: The data analysis should be improved. To link the average frequency trajectories to the standard population genetics of conditional fixation probabilities, the focal time should always be the time at which the trajectory crosses the threshold frequency for the first time. Plotting some trajectories from a later time onwards, on their downward path destined to loss, introduces a systematic bias towards negative clonal interference (for these trajectories, the time between the first and second crossing of the threshold frequency is simply omitted). The focal time of first crossing can easily be obtained, e.g., by linear interpolation of the trajectory between subsequent time points of frequency evaluation. In light of the modified procedure, the statements on the inertia of the trajectories after crossing 𝑥⋆ (line 356) should be re-examined.

The way we process the data is already in line with the suggestions of the reviewer. In particular, we use as focal time the first time at which a trajectory is found in the threshold frequency bin. Trajectories that are never seen in the bin because of limited time-resolution are simply ignored.
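To make the two procedures concrete, here is a sketch assuming a trajectory sampled at discrete times, given as lists `times` and `freqs`. `first_time_in_bin` returns the first sampled time whose frequency lies in the threshold bin and returns `None` for trajectories never observed there (which are then discarded). `first_crossing_interpolated` implements the reviewer's alternative, linearly interpolating the first upward crossing. Both function names and the bin half-width are illustrative assumptions, not the analysis code used for Fig. 3.

```python
from typing import Optional, Sequence


def first_time_in_bin(times: Sequence[float], freqs: Sequence[float],
                      threshold: float, half_width: float) -> Optional[float]:
    """Return the first sampled time whose frequency lies within half_width of threshold.

    Returns None when the trajectory is never observed in the bin; such
    trajectories are discarded rather than interpolated.
    """
    for t, x in zip(times, freqs):
        if abs(x - threshold) <= half_width:
            return t
    return None


def first_crossing_interpolated(times: Sequence[float], freqs: Sequence[float],
                                threshold: float) -> Optional[float]:
    """Return the first time the frequency reaches threshold from below, by linear interpolation."""
    if freqs and freqs[0] >= threshold:
        return times[0]
    for i in range(1, len(freqs)):
        x0, x1 = freqs[i - 1], freqs[i]
        if x0 < threshold <= x1:
            t0, t1 = times[i - 1], times[i]
            return t0 + (threshold - x0) * (t1 - t0) / (x1 - x0)
    return None


times = [0, 1, 2, 3, 4]
freqs = [0.05, 0.2, 0.45, 0.6, 0.3]
print(first_time_in_bin(times, freqs, threshold=0.5, half_width=0.1))  # 2
print(first_crossing_interpolated(times, freqs, threshold=0.5))        # about 2.333
```

Both procedures select the first approach to the threshold, so neither starts a trajectory on its later downward path; they differ only in how finely the focal time is resolved.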

In Fig. 3, no trajectory is on a downward path at the focal time, i.e., when it first crosses the threshold frequency. Our earlier work on the predictability of influenza (Barrat-Charlaix et al., 2021) has a similar figure, which may have created confusion.

(8) Fig. 4: authors write 𝛼/ 𝑠0 in the figure, but should be 𝜈/ 𝑠0.

Fixed.

(9) Line 420: authors refer to the blue curve in panel B as the case with strong interference. However, strong interference is for higher 𝜌/ 𝑠0, that is panel D (see point 1).

Fixed.

(10) Line 477: typo “there will a variety of mutations”.

Fixed.

Reviewer 2:

Should 𝛼 be 𝜈 in Figure 4 legends?

Thank you very much for spotting this error. We fixed it.

Equations 4-5 could be further simplified.

We factorised the 𝐼 term in equation 4. In equation 5, we preferred to keep the 1 − 𝛿/𝛼 term, as this quantity appears in several calculations concerning the model. For instance, 𝑆 = 𝛿/𝛼 at equilibrium.
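As a numerical check of the equilibrium value 𝑆 = 𝛿/𝛼 quoted above, here is a sketch assuming a single-strain textbook SIRS model in population fractions, with transmission rate α, recovery rate δ and a waning rate γ: dS/dt = −αSI + γR, dI/dt = αSI − δI, dR/dt = δI − γR. This is not the manuscript's equations 4–5; γ and all numerical values are illustrative assumptions. Setting dI/dt = 0 with I > 0 gives S = δ/α, and the integration converges to that value.

```python
def simulate_sirs(alpha, delta, gamma, s0=0.99, i0=0.01, dt=0.01, t_max=2000.0):
    """Forward-Euler integration of a textbook SIRS model in population fractions."""
    s, i = s0, i0
    r = 1.0 - s0 - i0
    for _ in range(int(t_max / dt)):
        new_inf = alpha * s * i  # new infections per unit time
        recov = delta * i        # recoveries per unit time
        wane = gamma * r         # loss of immunity per unit time
        s += dt * (-new_inf + wane)
        i += dt * (new_inf - recov)
        r += dt * (recov - wane)
    return s, i, r


alpha, delta, gamma = 0.5, 0.2, 0.05  # hypothetical parameter values
s, i, r = simulate_sirs(alpha, delta, gamma)
print(s, delta / alpha)  # both close to 0.4
```

With these values the endemic infected fraction is I = γ(1 − δ/α)/(δ + γ) = 0.12, which the simulation also reaches after damped oscillations.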

The sentence before equation 8 references 𝑃𝛽(𝛽), but this wasn’t previously introduced.

We now introduce 𝑃(𝛽) at the beginning of the section Ultimate fate of the variant.

In the last paragraph of page 12, “monotonously” maybe should be “monotonically”.

Fixed.

For the supplement section B, you might want a more descriptive title than “other”.

We renamed this section to Expiring fitness model and random walk.

Reviewer 3:

To expand on my previous comments, my main concerns regard the connection of section 2 and the SIR model with the rest of the paper.

In the first paragraph of page 9, the authors argue that a stochastic version of the SIR model would lead to different fixation dynamics in homogeneous and heterogeneous populations due to the oscillations. This paragraph is quite speculative; numerical simulations would be necessary to quantitatively assess to what extent these two scenarios actually differ in a stochastic setting, and how that depends on parameters.

Likewise, the connection between the SIR model, the random walk coarse-grained description and the vanishing fitness model can be investigated through numerical simulations of a stochastic SIR model with the chosen population and cross-immunity structures and, e.g., 10–20 strains. This would allow for a direct comparison of individual strain dynamics rather than frequency averages, as well as of other scalar properties such as higher moments, coalescence, and the fixation probability once a given frequency is reached. It would also be possible to characterize numerically the SIR P(beta), bridging the gap with the random walk description. It’s not obvious to me that the SIR P(beta) would not depend on the population size in the presence of birth-death stochasticity, potentially changing the scaling of the moments. I appreciate that such simulations may be computationally expensive, but similar numerical studies have been performed in previous phylodynamics studies, so they shouldn’t be out of reach.

Alternatively, the authors should consider re-centering the narrative directly on the random walk of the vanishing fitness model, mentioning the SIR model more briefly as one possible qualitative route to it. Either way, the authors should comment on other ways in which this coarse-grained dynamics could arise.

In the vanishing fitness model, where the variants’ fitnesses are independent, is an infinite-dimensional antigenic space implicitly assumed? If so, this should be explained in the main text.

A long simulation of the SIR model would indeed be interesting, but is numerically demanding and our current simulation framework doesn’t scale well for many strains and susceptibilities. We thus refrained from adding extensive simulations.

In Figure 2B of the main text, the simulation with 7 strains illustrates the qualitative match between the expiring fitness and the SIR model. However, it is clearly not long enough to discuss statistical properties of the corresponding random walk. Furthermore, we do not expect the individual strain dynamics of the SIR and expiring fitness models to match. The latter depends on only a few parameters (𝜈, 𝑠0), while the former depends on the full state of the host population and of the previous variants.

In the section linking the parameters of the two models, we now discuss the distribution 𝑃(𝛽) of the SIR model for two strains and a specific choice of distribution for the cross-immunity 𝑏 and 𝑓.

Minor comments:

There is some back and forth in the writing. For instance, when introducing the model, 𝐶𝑖𝑗 is first defined as 1/𝑀; a few paragraphs later, the authors introduce another limit in which 𝐶𝑖𝑖 is much higher than any 𝐶𝑖𝑗; and only then do they specify that the former is the fast mixing scenario.

Another example is in section 2: in the first paragraph, the authors put forward that heterogeneity and cross-immunity have different impacts on the dynamics, but the meaning attributed to these ingredients becomes clear only much later, after the homogeneous population analysis. Making the writing more linear would make it easier for the reader to follow the authors’ train of thought.

We removed the paragraph below Equation (1) mentioning the 𝐶𝑖𝑗 = 1/ 𝑀 case, which we hope will linearize the writing.

When mentioning geographical structure, why would geography affect how immunity sees pairs of viral strains (differences in 𝐾)?

Geographic structure could influence cross-immunity because of exposure histories of hosts. For instance in the case of influenza, different geographical regions do not have the same dominating strains in each season, and hosts from different regions may thus build up different immunity.

In the current narrative there are some speculations about non-scalar fitness, especially in section 2. The heterogeneity in this section does not seem strong enough to produce a disordered landscape that defies the notion of scalar fitness in the way some complex ecological systems do. A more parsimonious explanation for the coexistence dynamics observed here may be negative frequency-dependent selection.

Our language here was not very precise, and we agree that the phenomenology we describe is related to that of frequency-dependent selection (mediated via the immunity of the host population, which integrates past frequencies). Traveling wave models typically use fitness functions that are independent of the population distribution and account for evolution only via an increasing average fitness. We have made the discussion more accurate by stating that we consider a case where fitness depends explicitly on present and past population composition, which includes the case of negative frequency-dependent selection.

I don’t understand the comparison with genetic drift (typo here, draft) in the last paragraph of section 3, given that there is no stochasticity in the growth and death dynamics.

We compare the random walk to genetic drift because of the expression of the second moment of the step size. The genetic draft has the same functional form. If one defines the effective population size as in the text, the drift due to random sampling of alleles (neutral drift) and the changes in strain frequency in our model have the same first and second moments. The stochasticity here does not come from the dynamics, which are indeed deterministic, but from the appearance of new mutations (variants) on backgrounds that are randomly sampled in the population. This latter property is shared with genetic draft.
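For reference, the neutral-drift benchmark can be computed exactly, assuming a haploid Wright-Fisher population of N individuals: the next-generation count is Binomial(N, x), so the frequency change Δx has mean 0 and variance x(1 − x)/N. The sketch below evaluates both moments by summing over the binomial distribution. It does not reproduce the manuscript's definition of the effective population size, and the function name and N = 200 are illustrative.

```python
from math import comb


def wright_fisher_step_moments(x, n):
    """Exact mean and variance of the one-generation frequency change under neutral Wright-Fisher sampling."""
    probs = [comb(n, k) * x**k * (1 - x) ** (n - k) for k in range(n + 1)]
    mean = sum(p * (k / n - x) for k, p in enumerate(probs))
    second = sum(p * (k / n - x) ** 2 for k, p in enumerate(probs))
    return mean, second - mean**2


for x in (0.1, 0.5, 0.9):
    mean, var = wright_fisher_step_moments(x, n=200)
    print(x, mean, var, x * (1 - x) / 200)  # variance matches x(1 - x)/N
```

Any process whose frequency steps have zero mean and variance proportional to x(1 − x) matches these two moments, whatever the source of the randomness; this is the sense in which the comparison with drift and draft is made.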

In the vanishing fitness model, I think the reader would benefit from having 𝑃(𝑠) in the main text, and it should be made clearer which simulations assume which choice of 𝑃(𝑠).

We added the expression of 𝑃(𝑠) in the main text. Simulations use the value 𝑠0 = 0.03, which we added in the caption of Figure 4.
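To illustrate how 𝑠0 and the expiry rate 𝜈 shape a single variant in isolation, here is a sketch under simplifying assumptions that are not necessarily the manuscript's exact formulation: the selective advantage decays exponentially, s(t) = s e^(−𝜈t), and the frequency follows logistic dynamics dx/dt = s(t) x (1 − x). Integrating gives logit x(∞) = logit x(0) + s/𝜈, so the variant stalls at an intermediate frequency unless s/𝜈 is large. Drawing s from an exponential distribution with mean 𝑠0 is a further illustrative stand-in for the 𝑃(𝑠) given in the main text, and 𝜈 = 0.01 is a hypothetical value.

```python
import math
import random


def final_frequency(x0, s, nu):
    """Asymptotic frequency: logit(x) increases by the integral of s*exp(-nu*t), i.e. by s/nu."""
    logit = math.log(x0 / (1 - x0)) + s / nu
    return 1 / (1 + math.exp(-logit))


def integrate_frequency(x0, s, nu, dt=0.01, t_max=2000.0):
    """Forward-Euler integration of dx/dt = s*exp(-nu*t)*x*(1-x), for comparison."""
    x, t = x0, 0.0
    while t < t_max:
        x += dt * s * math.exp(-nu * t) * x * (1 - x)
        t += dt
    return x


s0, nu, x0 = 0.03, 0.01, 1e-3  # nu and x0 are hypothetical illustrative values
print(final_frequency(x0, s0, nu), integrate_frequency(x0, s0, nu))

rng = random.Random(1)
samples = [final_frequency(x0, rng.expovariate(1 / s0), nu) for _ in range(10000)]
print(sum(f > 0.5 for f in samples) / len(samples))  # fraction of variants that exceed 50%
```

In this toy setting the ratio s/𝜈 alone sets how far a variant rises, which is why results are naturally expressed in terms of 𝜈/𝑠0.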

When comparing the model and data, is the point that COVID is not reproduced due to clonal interference? It seems from the plot that flu has clonal interference as well though. Why is that negligible?

A similar point has been raised by the first reviewer (see R1-(1)). Clonal interference is not negligible, but we find it to be insufficient to explain the observations made for H3N2 influenza, namely the lack of inertia of frequency trajectories or the probability of fixation. This is shown in the new section (B1) of the SI. Both SARS-CoV-2 and H3N2 influenza experience clonal interference, but the former is more predictable than the latter. Our point is that expiring fitness effects should be stronger in influenza because of the higher immune heterogeneity of the host population, making it less predictable than SARS-CoV-2.

Does the fixation probability as a function of frequency threshold match the flu data for some parameters sets?

For H3N2 influenza, the fixation probability is found to be equal to the threshold frequency (see Barrat-Charlaix MBE 2021; this is also indirectly visible in Fig. 3). In Figure 4, we find that either a high expiry rate, or an intermediate expiry rate combined with clonal interference, matches this observation.
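The benchmark behind this comparison is the neutral expectation: without selection, the frequency is a martingale, so a variant observed at frequency x⋆ eventually fixes with probability x⋆. A small sketch, assuming a haploid Wright-Fisher population of N = 20 (an illustrative size), propagates the exact distribution of allele counts and recovers this result numerically. The function name is hypothetical.

```python
from math import comb


def fixation_probability(i0, n, generations=500):
    """Probability of fixation from i0 copies in a neutral Wright-Fisher population of size n.

    Propagates the exact distribution of allele counts; after many generations
    essentially all probability mass sits on the absorbing states 0 and n.
    """
    # transition[i][j] = P(j copies next generation | i copies now)
    transition = [
        [comb(n, j) * (i / n) ** j * (1 - i / n) ** (n - j) for j in range(n + 1)]
        for i in range(n + 1)
    ]
    dist = [0.0] * (n + 1)
    dist[i0] = 1.0
    for _ in range(generations):
        dist = [sum(dist[i] * transition[i][j] for i in range(n + 1)) for j in range(n + 1)]
    return dist[n]


n = 20
for i0 in (2, 5, 10, 15):
    print(i0 / n, fixation_probability(i0, n))  # the two numbers agree
```

Under positive selection that persists, fixation probabilities would instead exceed the current frequency, so agreement with x⋆ is a signature of quasi-neutral dynamics at high frequency.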

It would be instructive to see examples of the individual variant dynamics of the vanishing fitness model compared to the presented data.

We added an extra SI figure (S7) showing 10 randomly selected trajectories of individual variants in the case of H3N2/HA influenza and for the expiring fitness model with different parameter choices.

Figure 4E has no colorbar label. The reader shouldn’t have to look for what that means in the bottom of the SIs. In panels A and B the label should be 𝜈, not 𝛼. Same thing in most equations of page 42.

We added the colorbar label to the figure and also updated the caption: a darker color corresponds to a higher probability that sweeps overlap. We fixed the 𝜈–𝛼 confusion in the SI and in the caption of the figure.

Associated Data


    Supplementary Materials

    Supplementary file 1. GISAID acknowledgements table listing submitting and originating laboratories for all sequences used in this study.
    elife-97350-supp1.zip (457.5KB, zip)
    MDAR checklist

    Data Availability Statement

    Sequence data of influenza viruses was obtained from GISAID (Shu and McCauley, 2017). We thank the teams involved in sample collection, sequencing, and processing of these data for their contribution to global surveillance of influenza virus circulation. A table acknowledging all originating and submitting laboratories is provided as supplementary information.

    Sequence data of SARS-CoV-2 viruses was obtained from NCBI and restricted to data from North America to ensure more homogeneous sampling. We are grateful to all teams involved in the collection and generation of these data for generously sharing these data openly.

    Accession numbers for all sequences from GISAID are provided as Supplementary file 1.


    Articles from eLife are provided here courtesy of eLife Sciences Publications, Ltd
