Significance
Viruses, such as influenza, evolve under the selection of host immune systems. Previously infected individuals become immune, forcing the virus to find susceptible hosts or mutate, chasing it away in antigenic space. We formulate this viral escape process in terms of a low-dimensional wave moving in antigenic space. The dimensionality of the antigenic space impacts the persistence, as well as stability, of viral evolution. We uncover a characteristic timescale for the persistence of the viral strain, which is an order of magnitude longer than individual host immunity and emerges collectively from the pressure of the chasing immune systems. These results offer intuition about the antigenic turnover of viruses and highlight the importance of the effective dimensionality of coevolution.
Keywords: viral evolution, fitness wave, coevolution, host–pathogen dynamics
Abstract
The evolution of many microbes and pathogens, including circulating viruses such as seasonal influenza, is driven by immune pressure from the host population. In turn, the immune systems of infected populations get updated, chasing viruses even farther away. Quantitatively understanding how these dynamics result in observed patterns of rapid pathogen and immune adaptation is instrumental to epidemiological and evolutionary forecasting. Here we present a mathematical theory of coevolution between immune systems and viruses in a finite-dimensional antigenic space, which describes the cross-reactivity of viral strains and immune systems primed by previous infections. We show the emergence of an antigenic wave that is pushed forward and canalized by cross-reactivity. We obtain analytical results for shape, speed, and angular diffusion of the wave. In particular, we show that viral–immune coevolution generates an emergent timescale, the persistence time of the wave’s direction in antigenic space, which can be much longer than the coalescence time of the viral population. We compare these dynamics to the observed antigenic turnover of influenza strains, and we discuss how the dimensionality of antigenic space impacts the predictability of the evolutionary dynamics. Our results provide a concrete and tractable framework to describe pathogen–host coevolution.
The evolution of viral pathogens under the selective pressure of their hosts’ immunity is an example of rapid coevolution. Viruses adapt in the usual Darwinian sense by evading immunity through antigenic mutations, while immune repertoires adapt by creating memory against previously encountered strains. Some mechanisms of in-host immune evolution, such as the affinity maturation process, are important for the rational design of vaccines. Examples are the seasonal human influenza virus, where vaccine strain selection can be informed by predicting viral evolution in response to collective immunity (1), as well as chronic infections such as HIV (2–5), where coevolution occurs within each host. Because of the relatively short time scales of selection and strain turnover, these dynamics also provide a laboratory for studying evolution and its link to ecology (6).
It is useful to think of both viral strains and immune protections as living in a common antigenic space (6), corresponding to an idealized “shape space” of binding motifs between antibodies and their cognate epitopes (7). While the space of molecular recognition is high-dimensional, projections onto a low-dimensional effective shape space have provided useful descriptions of the antigenic evolution. In the example of influenza, neutralization data from hemagglutination-inhibition assays can be projected onto a two-dimensional antigenic space (8–10). Mapping historical antigenic evolution in this space suggests coevolutionary dynamics pushing the virus away from its past positions, where collective immunity has developed. Importantly, the evolution of influenza involves competitive interactions of antigenically distinct clades in the viral population, generating a “Red Queen” dynamics of pathogen evolution (11, 12). Genomic analysis of influenza data has revealed evolution by clonal interference (13); this mode of evolution is well known from laboratory microbial populations (14). In addition, the viral population may split into subtypes. Such splitting or “speciation” events, which are marked by a decoupling of the corresponding immune interactions, happened in the evolution of influenza B (15) and of noroviruses (16).
The joint dynamics of viral strains and the immune systems of the host population can be modeled using agent-based simulations (17, 18) that track individual hosts and strains. Such approaches have been used to study the effect of competition on viral genetic diversity (19), to study geographical effects (20), and to study the effect of vaccination (21). Alternatively, systems of coupled differential equations known as susceptible-infected-recovered (SIR) models may be adapted to incorporate evolutionary mechanisms of antigenic adaptation (6, 22, 23). Agent-based simulations in two dimensions were used to recapitulate the ballistic evolution characteristic of influenza A (18) and to predict the occurrence of splitting and extinction events (24). In parallel, theory was developed to study the Red Queen effect (12, 25), based on the well-established theory of the traveling fitness wave (26–28). While effectively set in one dimension, this class of models can nonetheless predict extinction and splitting events assuming an infinite antigenic genome (12).
In this work, we propose a coevolutionary theory in an antigenic interaction space of arbitrary dimension $d$, which is described by joint nonlinear stochastic differential equations coupling the population densities of viruses and of protected hosts. We show that these equations admit a $d$-dimensional antigenic wave solution, and we study its motion, shape, and stability, using simulations and analytical approximations. Based on these results, we discuss how canalization and predictability of antigenic evolution depend on the dimensionality $d$.
Results
Coarse-Grained Model of Viral–Immune Coevolution.
Our model describes the joint temporal evolution of populations of viruses and immune protections in some effective antigenic space of dimension $d$. Both viral strains and immune protections are labeled by their position (or “phenotype”) $\mathbf{x}$ in that common antigenic space (Fig. 1A). In that space, viruses randomly move as a result of antigenic mutations and proliferate through infections of new hosts. Immune memories are added at the past positions of viruses. Immune memories distributed across the host population provide protection that reduces the effective fitness of the virus. We coarse grain that description by summarizing the viral population by a density $n(\mathbf{x},t)$ of hosts infected by a particular viral strain $\mathbf{x}$ and immunity by a density $h(\mathbf{x},t)$ of immune memories specific to strain $\mathbf{x}$ in the host population.
Fig. 1.
A simple model of viral–host coevolution predicts the emergence of an antigenic wave. (A) Schematic of the coevolution model. Viruses proliferate while effectively diffusing in antigenic space (here in two dimensions) through mutations, with diffusion coefficient $D$. Past virus positions are replaced by immune protections (light blue). Immune protections create a fitness gradient for the viruses (green gradient), favoring strains at the front. Both viral and immune populations are coarse grained into densities in antigenic space. (B) Snapshot of a numerical simulation of Eqs. 2 and 3 showing the existence of a wave solution. The blue colormap represents the density of immune protections left behind by past viral strains. The current virus density is shown in red. (C) Close-up of the viral population, showing fitness isolines. The wave moves in the direction of the fitness gradient (arrow) through the enhanced growth of strains at the edge of the wave (black dots). (D) Distribution of fitness across the viral population (corresponding to the projection of B along the fitness gradient). Parameters for B–D: , , , .
At each infection cycle, each host may infect $R_0$ unprotected hosts, where $R_0$ is called the basic reproduction number. However, a randomly picked host is susceptible to strain $\mathbf{x}$ with probability $(1-c(\mathbf{x},t))^M$, where $c(\mathbf{x},t)$ is the coverage of strain $\mathbf{x}$ by immune memories of the population and $M$ is the number of immune memories carried by each host. Because of cross-reactivity, which allows immune memories to confer protection against close-by strains, immune coverage is given as a function of the density of immune memories $h(\mathbf{x},t)$,
$$c(\mathbf{x},t) = \frac{1}{M}\int d\mathbf{x}'\, h(\mathbf{x}',t)\, H(\mathbf{x}-\mathbf{x}'), \tag{1}$$
where $H(\mathbf{x}-\mathbf{x}') = \exp(-|\mathbf{x}-\mathbf{x}'|/r)$ is a cross-reactivity kernel describing how well memory $\mathbf{x}'$ protects against strain $\mathbf{x}$, and $r$ is the range of the coverage provided by cross-reactivity. In summary, the effective growth rate, or “fitness,” of the virus is given by $f(\mathbf{x},t) = \ln\left[R_0\,(1-c(\mathbf{x},t))^M\right]$.
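As an illustration of how coverage and fitness follow from a given density of immune memories, here is a minimal one-dimensional sketch. It assumes an exponential cross-reactivity kernel of range `r`, a density `h` normalized so that it integrates to `M`, and fitness defined as the log of the effective reproduction number; the function name and the grid discretization are hypothetical choices made for this example.

```python
import numpy as np

def coverage_and_fitness(x, h, R0, M, r):
    """Discretized coverage c(x) and fitness f(x) on a uniform 1D grid.

    Assumptions of this sketch: exponential kernel exp(-|x - x'|/r),
    h integrates to M, and f = ln(R0) + M * ln(1 - c).
    """
    dx = x[1] - x[0]
    # Kernel matrix: how well a memory at x' covers a strain at x.
    kernel = np.exp(-np.abs(x[:, None] - x[None, :]) / r)
    c = kernel @ h * dx / M
    # f diverges to -inf where coverage is complete (c = 1).
    with np.errstate(divide="ignore"):
        f = np.log(R0) + M * np.log1p(-c)
    return c, f

# Example: memories spread as a Gaussian bump around x = 0.
x = np.linspace(-10.0, 10.0, 201)
M, R0, r = 3, 2.0, 2.0
h = np.exp(-x**2 / 2.0)
h *= M / (h.sum() * (x[1] - x[0]))   # normalize so that h integrates to M
c, f = coverage_and_fitness(x, h, R0, M, r)
```

In this sketch fitness is lowest where memories are concentrated and approaches ln(R0) far from them, which is the gradient that pushes viral strains away from past positions.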
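The coupled dynamics of the viral density $n(\mathbf{x},t)$ and the immune memory density $h(\mathbf{x},t)$ are then described by the stochastic differential equations (with time in units of infection cycles throughout):
$$\partial_t n(\mathbf{x},t) = f(\mathbf{x},t)\, n(\mathbf{x},t) + D\, \nabla^2 n(\mathbf{x},t) + \sqrt{n(\mathbf{x},t)}\; \eta(\mathbf{x},t), \tag{2}$$
$$\partial_t h(\mathbf{x},t) = \frac{1}{N_h}\left[ n(\mathbf{x},t) - N(t)\, \frac{h(\mathbf{x},t)}{M} \right]. \tag{3}$$
Here $f(\mathbf{x},t)$ is the fitness of strain $\mathbf{x}$, and $\eta$ is a Gaussian white noise in time and space, $\langle \eta(\mathbf{x},t)\,\eta(\mathbf{x}',t') \rangle = \delta(\mathbf{x}-\mathbf{x}')\,\delta(t-t')$, accounting for demographic noise (29). This stochastic term is crucial, as it will drive the evolution of the wave. The diffusion constant $D$ describes the effect of infinitesimal mutations on the phenotype, $D = \mu \langle \delta x^2 \rangle / 2$, where $\mu$ is the mean number of mutations per cycle, and $\langle \delta x^2 \rangle$ is the mean-square effect of each mutation along each antigenic dimension (assuming that mutations do not have a systematic bias, $\langle \delta x \rangle = 0$). The continuous-diffusion assumption implied by Eq. 2 is valid only when there are many small mutation effects, $\mu \gg 1$ and $\delta x \ll r$ (with $r$ the cross-reactivity range), in contrast with regimes where mutations are rare but have a substantial fitness effect drawn from a distribution (25, 30). Our choice is simpler in that it describes the mutation process through a single parameter $D$. Along with the choice of the cross-reactivity kernel $H$, it also naturally preserves the isotropy of the antigenic space.
The total viral population size, or number of infected hosts, $N(t) = \int d\mathbf{x}\, n(\mathbf{x},t)$, is subject to fluctuations. At the same time, the host population size $N_h$ remains constant because newly added memories (first term of right-hand side of Eq. 3) overwrite existing ones picked uniformly at random (second term of right-hand side of Eq. 3). Since each host carries $M$ immune receptors, we have $\int d\mathbf{x}\, h(\mathbf{x},t) = M$.
If we assume that the system reaches an evolutionary steady state, with stable viral population size $N$, then Eq. 3 can be integrated explicitly:
$$h(\mathbf{x},t) = \frac{1}{N_h} \int_{-\infty}^{t} dt'\, e^{-(t-t')/\tau}\, n(\mathbf{x},t'), \tag{4}$$
with $\tau = M N_h / N$. Eq. 4 shows how the density of protections reflects the past evolution of the viral population.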
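The exponential memory of past viral densities can be checked numerically at a single antigenic position. The sketch below assumes the memory dynamics take the form dh/dt = (n - N h / M) / N_h at constant viral population size N, so that the turnover time is tau = M N_h / N; the function names and parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np

def memory_density_euler(n_hist, N, Nh, M, dt=1.0):
    """Euler integration of dh/dt = (n - N*h/M)/Nh at one position, with h(0) = 0."""
    h = 0.0
    for n in n_hist:
        h += dt * (n - N * h / M) / Nh
    return h

def memory_density_closed(n_hist, N, Nh, M, dt=1.0):
    """Discretized exponential-memory integral with turnover time tau = M*Nh/N."""
    tau = M * Nh / N
    n_hist = np.asarray(n_hist, dtype=float)
    T = len(n_hist)
    ages = dt * (T - 1 - np.arange(T))        # time elapsed since each past cycle
    return float(np.sum(np.exp(-ages / tau) * n_hist) * dt / Nh)

# Example: a constant local viral density seen for four turnover times.
N, Nh, M = 1.0e3, 1.0e6, 5                    # tau = 5000 infection cycles
n_hist = np.full(20000, 50.0)
h_euler = memory_density_euler(n_hist, N, Nh, M)
h_closed = memory_density_closed(n_hist, N, Nh, M)
```

Both estimates approach the saturation value n M / N from below, and they agree as long as the time step is small compared with tau.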
Antigenic Waves.
We simulated Eqs. 2 and 3 on a square lattice (Methods) and found a stable wave solution (Fig. 1 B–D). The wave has a stable population size and moves approximately ballistically through antigenic space, pushed from behind by the immune memories left in the trail of past viral strains (Fig. 1B). These memories exert an immune pressure on the viruses, forming a fitness gradient across the width of the wave (Fig. 1C), favoring the few strains that are farthest from immune memories, at the edge of the wave.
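For readers who want to experiment, the following is a minimal sketch of one infection cycle on a periodic square lattice. It is not the simulation scheme of the Methods, which is not reproduced here: the Poisson growth step, the nearest-neighbor hopping rule with probability `p_jump`, the FFT-based coverage, and all parameter values are assumptions made for this illustration.

```python
import numpy as np

def lattice_step(n, h, rng, R0=2.0, M=1, r=5.0, Nh=1e6, p_jump=0.2):
    """One infection cycle on a periodic L x L lattice (lattice spacing 1).

    n: integer counts of infected hosts per site; h: memory density with h.sum() == M.
    """
    L = n.shape[0]
    # Periodic distances to the origin, for an exponential cross-reactivity kernel.
    d1 = np.minimum(np.arange(L), L - np.arange(L))
    H = np.exp(-np.hypot(d1[:, None], d1[None, :]) / r)
    # Coverage c = (1/M) * (h circularly convolved with H), computed by FFT.
    c = np.fft.irfft2(np.fft.rfft2(h) * np.fft.rfft2(H), s=h.shape) / M
    c = np.clip(c, 0.0, 1.0 - 1e-12)
    f = np.log(R0) + M * np.log1p(-c)
    # Growth with demographic noise: Poisson offspring with mean n * exp(f).
    n_grown = rng.poisson(n * np.exp(f))
    # Mutation: each virus hops to one of its 4 nearest neighbors with prob p_jump.
    movers = rng.binomial(n_grown, p_jump)
    hops = rng.multinomial(movers, [0.25] * 4)          # shape (L, L, 4)
    n_new = n_grown - movers
    for k, (shift, axis) in enumerate([(1, 0), (-1, 0), (1, 1), (-1, 1)]):
        n_new = n_new + np.roll(hops[..., k], shift, axis=axis)
    # Immune update: add memories at current viral positions and remove existing
    # ones uniformly at random, which keeps h.sum() fixed at M.
    h_new = h + (n - n.sum() * h / M) / Nh
    return n_new, h_new

# Example: a viral cluster with immune memory placed just behind it.
rng = np.random.default_rng(0)
L = 32
n = np.zeros((L, L), dtype=np.int64)
n[16, 16] = 1000
h = np.zeros((L, L))
h[16, 14] = 1.0
for _ in range(5):
    n, h = lattice_step(n, h, rng)
```

With this hopping rule the per-dimension mean-square displacement per cycle is `p_jump / 2`, which corresponds to an effective diffusion coefficient of `p_jump / 4` in lattice units.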
We assume that the solution of the coupled evolution equations, Eqs. 2 and 3, takes the form of a moving quasispecies in a $d$-dimensional antigenic space,
[5] |
Here, we have written the solution in a comoving frame, in which a motion with constant speed takes place in the direction of the coordinate , and fluctuations in the other dimensions, , centered around for , are assumed to be independent. In the next sections, we analyze solutions of this form. First, we project the -dimensional antigenic wave onto the one-dimensional fitness space; this projection produces a traveling fitness wave (26–28, 31, 32) that determines the antigenic speed and the mean pair coalescence time of the viral genealogy. Second, we study the shape of the -dimensional quasispecies and determine the fluctuations in the transverse directions. These fluctuations produce a key result of this paper: Immune interactions canalize the evolution of the antigenic wave; this constraint can be quantified by characteristic time scales governing the transverse antigenic fluctuations. Canalization is most pronounced in spaces of low dimensionality and, as we discuss below, affects the predictability of antigenic evolution.
Speed of Antigenic Evolution.
Projected onto the fitness axis, the solution is approximately Gaussian (Fig. 1D). This representation suggests a strong similarity to the fitness wave solution found in models of rapidly adapting populations with an infinite reservoir of beneficial mutations (26–28, 31, 32). To make the analogy rigorous, we must assume that the fitness gradient in antigenic space is approximately constant, meaning that fitness isolines are straight and equidistant. Mutations along the gradient direction have a fitness effect that is linear in the displacement, while mutations along perpendicular directions are neutral and can be treated independently. Note that while we will use this projection onto fitness to compute the speed of the antigenic wave, the underlying antigenic wave remains in $d$ dimensions; we will come back to transverse fluctuations in the next sections.
There are several models of fitness waves that differ in the assumptions on the statistics of mutational effects. Our assumption of diffusive motion makes our projected dynamics equivalent to those studied in ref. 32, which itself builds on earlier work (27). This equivalence results from the two key assumptions of the mutation model in antigenic space: Mutations have a small effect, and their distribution is isotropic, meaning that there are as many deleterious as beneficial mutations. In the limit where the wave is small compared to the adaptation time scale, , the wave may be replaced by a Dirac delta function at in Eq. 4. One can then calculate explicitly the immune density (upstream of the wave) and coverage (downstream of the wave, using Eq. 1):
[6] |
[7] |
where for and 0 otherwise. This idealized exponential trail of immune protections corresponds to the blue trace in Fig. 1B and the coverage or fitness gradient to the isolines in Fig. 1C.
In the moving frame of the wave, , with , the local immune protection and viral fitness can be expanded locally for (see ref. 25 for a similar treatment in a one-dimensional antigenic space):
[8] |
where is the average population fitness, and
[9] |
is the fitness gradient (for , see below). Rescaling the antigenic variable as , this process is equivalent to the evolution of a population where mutation effects are described by diffusion in fitness space with coefficient . This is precisely the model from which the fitness wave solution of refs. 27 and 32 was described (SI Appendix). In the following we use results from these works to describe the antigenic wave. However, we note that in the usual fitness wave theory, population is kept constant by construction, which implies that fitness is relevant only when compared to the mean of the population. By contrast, in our model population size is itself a dynamical variable, and fitness is defined as an absolute growth rate. In this version of the model, the fitness of the whole viral population undergoes continuous negative drift due to the constant adaptation of immune systems, encoded in the term in Eq. 8. This negative fitness drift has an analogous effect to subtracting the mean fitness in models with constant population size, making the equivalence possible.
The fitness wave theory allows us to make an analytical prediction about the properties of the antigenic wave. Let us start with its population size , which is regulated by how fast the immune system catches up with the wave. The immune turnover time in Eq. 4 is inversely proportional to : The larger the population size is, the faster immune memories are updated, increasing the immune pressure on current viral strains (lower ) and thus decreasing . As the moving wave reaches a stable moving state, its size becomes stable over time, giving the condition , which in turn constrains the ratio between the wave’s size and speed:
[10] |
But the fitness wave theory predicts that the speed of the wave itself depends on the population size. The larger is, the more outliers at the nose of the fitness wave, and the farther out they may jump in antigenic space, establishing fitter ancestors of the future population. This results in a fitness wave whose speed depends only weakly on population size and mutation rate (ref. 32 and SI Appendix),
[11] |
where and are the diffusivity and wave speed in fitness space, which are related to their counterparts in antigenic space through the scaling factor . Substituting this scaling into Eq. 11 yields a relation between antigenic speed and population size,
[12] |
which closes the system of equations: Using the definition of (Eq. 9), Eqs. 10 and 12 completely determine and as a function of the model’s parameters (through a transcendental equation; see SI Appendix). We validated these theoretical predictions for and by comparing them to numerical simulations, which show good agreement over a wide range of parameters (Fig. 2 A and B). We note that the alternative fitness wave model of Desai and Fisher (28) predicts different scaling relations between speed and population size, including for an arbitrary distribution of fitness effects (30). The major difference to our description is that we assume infinitesimal and reversible fitness effects. Relaxing that assumption to account for rare but strong mutational effects would affect Eq. 12, but the dependence on would still be logarithmic at most.
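The closure described above can be solved by a simple fixed-point iteration. The sketch below rests on assumed forms of the three relations, chosen to be consistent with the model definition and with the standard diffusive fitness-wave result: a fitness gradient s = (M / r)(R0^(1/M) - 1), a stationarity condition N / v = N_h s, and a speed v = D^(2/3) s^(1/3) [24 ln(N (D s^2)^(1/3))]^(1/3). These expressions, the function name, and the parameter values are assumptions of this example and should be checked against the SI Appendix before use.

```python
import math

def solve_wave(D, R0, M, r, Nh, tol=1e-12, max_iter=200):
    """Solve the assumed closure for population size N and antigenic speed v."""
    s = (M / r) * (R0 ** (1.0 / M) - 1.0)   # assumed fitness gradient
    A = D ** (2.0 / 3.0) * s ** (1.0 / 3.0)  # prefactor of the speed relation
    B = (D * s * s) ** (1.0 / 3.0)           # (diffusivity in fitness space)^(1/3)
    v = A                                    # initial guess
    for _ in range(max_iter):
        arg = Nh * s * v * B                 # equals N * (D s^2)^(1/3) with N = Nh*s*v
        if arg <= 1.0:
            raise ValueError("log argument <= 1: outside the large-N regime")
        v_new = A * (24.0 * math.log(arg)) ** (1.0 / 3.0)
        converged = abs(v_new - v) <= tol * v_new
        v = v_new
        if converged:
            break
    N = Nh * s * v
    return N, v, s

# Example in units of the cross-reactivity range (r = 1); values are illustrative.
N, v, s = solve_wave(D=3e-6, R0=2.0, M=1, r=1.0, Nh=1e10)
```

Because the population size enters only through a logarithm, the iteration contracts quickly and typically converges in about ten steps.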
Fig. 2.
Analytical prediction of wave properties. Shown are the numerical versus analytical predictions for the wave’s population size (A), speed (B), width along the wave’s direction of motion (C), and width in the direction perpendicular to motion (D), with dimensions. Lengths are in units of the cross-reactivity range (so that , with no loss of generality). Parameters: (squares), (circles), or (triangles); (solid symbols) or 3 (open symbols); (small symbols) or 5 (large symbols).
Shape of the Antigenic Wave.
The width of the wave in the direction of motion is given by Fisher’s theorem, which relates the rate of change of the average fitness to its variance in the population: . In our description fitness and the antigenic dimension are linearly related with coefficient , implying . The result of that prediction for is validated against numerical simulations in Fig. 2C.
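Fisher's theorem can be illustrated numerically in its simplest setting. The sketch below considers a population under pure selection, without mutation or drift, in which the weight of each type grows as exp(f dt), and compares the finite-difference rate of change of the mean fitness with the fitness variance. The function names and the Gaussian fitness sample are hypothetical choices for this example.

```python
import numpy as np

def mean_fitness_rate(f, w, dt=1e-4):
    """Finite-difference d<f>/dt when each type's weight grows as exp(f * dt)."""
    w = w / w.sum()
    mean0 = np.sum(w * f)
    w1 = w * np.exp(f * dt)
    w1 = w1 / w1.sum()
    mean1 = np.sum(w1 * f)
    return (mean1 - mean0) / dt

def fitness_variance(f, w):
    """Population variance of fitness under weights w."""
    w = w / w.sum()
    mean = np.sum(w * f)
    return np.sum(w * (f - mean) ** 2)

# Example: fitness values spread around zero with standard deviation 0.05.
rng = np.random.default_rng(1)
f_vals = rng.normal(0.0, 0.05, size=10_000)
weights = np.ones_like(f_vals)
rate = mean_fitness_rate(f_vals, weights)
var = fitness_variance(f_vals, weights)
```

The two quantities agree up to a correction of order dt times the third cumulant of the fitness distribution, which is negligible for the small time step used here.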
The wave is led by an antigenic “nose” formed by a few outlying strains of reduced cross-reactivity with the concurrent immune population, generating high fitness. These strains have phenotype and fitness . They serve as founder strains from which the bulk of the future population will derive some time later (SI Appendix). As a result, two strains taken at random can trace back to their most recent common ancestor to some average time in the past, where is a numerical factor estimated from simulations (32).
To explain the width of the wave in the other phenotypic dimensions than that of motion (), we note that in these directions evolution is neutral. Two strains taken at random in the bulk are expected to have drifted, or “diffused” in physical language, by an average squared displacement from their common ancestor, so that their mean-square distance is along . If one assumes an approximately Gaussian wave of width , the mean-square distance between two random strains along should be equal to . Equating the two estimates yields . Fig. 2D checks the validity of this prediction against simulations.
Both longitudinal and transversal fluctuations in antigenic space are instances of quantitative traits under interference selection generated by multiple small-effect mutations. The width of these traits is governed by the common relation , which expresses the effective neutrality of the underlying genetic mutations (33). This relation says that antigenic variations in all dimensions scale in the same way with the model parameters, and the wave should have an approximately spherical shape. Consistently, here we find a wave with a fixed ratio between transverse and longitudinal variations. This implies a slightly asymmetric shape (which may be nonuniversal and depend on the microscopic assumptions of our mutation model).
In what parameter regime is our theory valid? The fitness wave theory we built upon is meant to be valid in the limit of large population size, . In addition, we assumed that the fitness landscape was locally linear across the wave. This approximation should be valid all the way up to the tip of the wave, given by , since this is where the selection of future founder strains happens. This condition translates into , implying [using and Eqs. 9 and 12], where is in squared antigenic units per infection cycle. This result means that one infection cycle will not produce enough mutations for the virus to leave the cross-reactivity range. In that limit, another assumption is automatically fulfilled, namely that the width of the wave be small compared to the span of immune memory: . Our simulations, which run in the regime of very slow effective diffusion () and have relatively large population sizes (), satisfy these conditions. This explains the good agreement between analytics and numerics.
Equations of Motion of the Wave’s Position.
The wave solution allows for a simplified picture. The wave travels in the direction of the fitness gradient (equivalently, down the gradient of immune coverage) with speed (Fig. 3A). Occasionally the population splits into two separate waves that then travel away from each other and from their common ancestor (Fig. 3B). The tip of the wave’s nose, which contains the high-fitness individual that will seed the future population, determines its future position in antigenic space. In the directions perpendicular to the fitness gradient, this position diffuses neutrally with coefficient . This motivates us to write effective equations of motion for the mean position of the wave:
[13] |
[14] |
where and are Gaussian white noises in the directions along, and perpendicular to, the fitness gradient . is an effective diffusivity in the direction of motion resulting from the fluctuations at the nose tip. These fluctuations are different from those suggested by , as they involve feedback mechanisms between the wave’s speed , size , and advancement of the fitness nose . In the following, we do not consider these fluctuations and focus on perpendicular fluctuations instead.
Fig. 3.
Stochastic behavior of the wave: diffusive motion, splits, and extinctions. (A) The wave moves forward in antigenic space but is driven by its nose tip, which undergoes antigenic drift (diffusion) in directions perpendicular to its direction of motion. These fluctuations deviate that direction, resulting in effective angular diffusion. (B) When antigenic drift is large, the wave may randomly split into subpopulations, creating independent waves going in different directions. Each wave can also go extinct as size fluctuations bring it to 0. (C) Cartoon illustrating the wave’s angular diffusion. Selection and drift combine to create an inertial random walk of persistence time . (D) Analytical prediction (Eq. 17) for the persistence time, versus estimates from simulations. Symbols and colors are the same as in Fig. 2.
Angular Diffusion and Persistence of the Antigenic Wave.
In the description of Eqs. 13 and 14, the viral wave is pushed by immune protections left in its trail. The fitness gradient, and thus the direction of motion, points in the direction that is set by the wave’s own path. This creates an inertial effect that stabilizes forward motion. On the other hand, fluctuations in perpendicular directions are expected to deviate the course of that motion, contributing to effective angular diffusion. To study this behavior, we assume that motion is approximately straight in direction and study small fluctuations in the perpendicular directions, , with (as illustrated in Fig. 3C). Eqs. 13 and 14 simplify to (SI Appendix)
[15] |
where is an effective memory time scale combining the host’s actual immune memory and the cross-reactivity with strains encountered in the past.
Eq. 15 may be solved in Fourier space. Defining , it becomes
[16] |
To understand the behavior at long times , we expand at small : or equivalently in the temporal domain . This implies that the direction of motion, , undergoes effective angular diffusion in the long run: . The persistence time of that inertial motion,
[17] |
does not depend explicitly on speed, population size, or the dimension of antigenic space. However, a larger diffusivity implies larger and while reducing the persistence time. Likewise, a larger reproduction number or smaller memory capacity speeds up the wave and increases its size, but also reduces its persistence time. This implies that, for a fixed number of hosts , larger epidemic waves not only move faster across antigenic space, but also change course faster.
This persistence time scales as the time it would take a single virus drifting neutrally to escape the cross-reactivity range, . For comparison, the much shorter time scale for a population of viruses to escape from the cross-reactivity range ,
[18] |
scales with the inverse incidence rate . This is consistent with the whole population having been infected at least once every infection cycles. This separation of time scales is consistent with the observation that evolution in the transverse directions is driven by neutral drift, which is much slower than adaptive evolution in the longitudinal direction. Both and are longer than the coalescence time of the viral population, , since they reflect long-term memory from the immune system. However, while is related to the reinfection period and is thus bounded by the hosts’ immune memory (itself bounded by their lifetime, which we do not consider), can be longer than that. This is possible due to inertial effects, which are allowed by the high-order dynamics of Eq. 15 generated by the immune system. This is very much like when, in mechanics, a massive object set in motion in a given direction will keep that direction without the need for an external force to maintain it.
The high-frequency behavior of Eq. 16 has a logarithmic divergence, meaning that the total power of is infinite unless we impose a (ultraviolet) cutoff. Such a regularization emerges from the fine structure of the wave. While the motion of the wave is driven by its nose tip, the immune pressure extends back only to the recent past of the bulk of the distribution, which stands at a distance away from the nose. In other words, there is a lag (and thus a gap in antigenic space) between the most innovative variants that drive viral evolution and the majority of currently circulating variants that drive host immunity. Mathematically, this implies that the domain of integration of the first term in the right-hand side of Eq. 15 should start at , which regularizes the divergence. A more careful analysis provided in SI Appendix shows that this regularization does not affect the long-term diffusive behavior of the wave.
Canalization, Speciations, and Predictability of Antigenic Evolution.
We now examine how deflections of the wave in the transverse direction determine the predictability and stability of the viral quasispecies. Assuming , angular diffusion causes motion to be deflected as (SI Appendix) . Crucially, this deflection depends on the dimension of the antigenic space, because the displacement acts additively in each of the transversal coordinates. Higher dimension means more deviation from the predictable course of the wave and thus less predictability. We can define a predictability time scale
[19] |
which is the time it takes for prediction errors to become of the order of the cross-reactivity range. In low dimensions, this time scales as a weighted geometric mean between and . However, at high dimensions may be significantly reduced, causing loss of predictability even below . The prediction time scale is distinct from the previously discussed persistence time: involves the integrated displacement in the transversal direction, while quantifies the diffusion of the tangent velocity vector. Thus, may be interpreted as quantifying the predictability of the actual location of the next viral population in antigenic space, while gives the predictability of the general direction of evolution, which changes more slowly. Therefore, the persistence time is both harder to extract from data and less relevant for actionable predictions.
To get a sense of numbers, we can compare our results with epidemiological data, taking the evolution of influenza as an example, with an infection cycle time of 3 d. It is assumed that individuals lose immunity to the circulating strain of the flu within y cycles, meaning that the wave would travel a distance in ; i.e., . For instance, with to , , and , we may choose to get a speed of the same order, , and y. By contrast, the predictability time scale is much shorter and depends on dimension, albeit slowly, ranging from y for to about 2 y for . We stress that these numbers are obtained by scaling laws and should not be taken as precise quantitative predictions.
Large deflections may also cause speciations, or splits, which occur when two substrains coexist long enough to become independent from the immune standpoint. This happens when two sublineages see the difference of their transverse positions become larger than , within some limited period given by the coalescence time. We estimated the rate of such splitting events using a saddle-point approximation (SI Appendix):
[20] |
with some numerical factor. Simulations confirmed the validity of this scaling (Fig. 4A).
Fig. 4.
Rate of speciation. (A) Rescaled rate of splitting events, defined as the emergence of two substrains at distance from each other in antigenic space, meaning that they are becoming antigenically independent. The predicted scaling, , as well as the definition of the collective variable as a function of the model parameters, is given by Eq. 20. The line shows a linear fit of the logarithm of the ordinate. (B) Predicted rate of splitting as a function of the dimension , for , , , and , with . Symbols and colors are the same as in Fig. 2.
The splitting rate grows with the dimension (Fig. 4B), consistent with the intuition that departure from canalized evolution is easier when more directions of escape are available. Splitting events are expected to strongly affect our ability to predict the future course of the wave. However, the rarity of such events (exponential scaling of ) means that they will have a lower impact on predictability than deflections. These results provide a theoretical and quantitative basis from which to assess the effect of dimension on predictability and possibly estimate from antigenic time course data of real viral populations.
Discussion
In this work, we have developed an analytical theory for studying antigenic waves of viral evolution in response to immune pressure. We showed that predictability is limited by two features of antigenic evolution, transversal diffusion and lineage speciations of the antigenic wave, both of which explicitly depend on the dimensionality of antigenic space.
To derive these results, we explicitly embedded the antigenic phenotype in a $d$-dimensional Euclidean space. This description is different from previous work that considered one- (25) or infinite-dimensional antigenic spaces (12). It allows for the possibility of compensatory mutations and makes it easier to compare results with empirical studies of viral evolution projected onto low-dimensional spaces (8, 9). Unlike these studies, however, our work does not address the question of how an effective dimension of antigenic space arises from the molecular architecture of immune interactions. Rather, we focused on the implications of the dimensionality of antigenic space for phenotypic evolution and its predictability.
Our results suggest a hierarchy of time scales for viral evolution. The shortest is the coalescence time , which determines population turnover. Then comes , which is the time it takes the viral population to escape immunity elicited at a previous time point. The longest time scale is the persistence time , which governs the angular diffusion of the wave’s direction, but has no bearing on the prediction of the actual position of the dominant strain in antigenic space. That time scale is due to inertial effects. It does not rely directly on the hosts’ immune memories and may thus exceed their individual lifetimes. Finally, the prediction time scale , beyond which prediction accuracy falls below the resolution of cross-reactivity, scales between and at low dimensions. This prediction time scale measures the predictability of transversal fluctuations and is thus the most relevant for actual predictions of future dominant strains in antigenic space. Importantly, it decreases with the dimension of the antigenic space and may become arbitrarily low at very high dimensions. The fact that the evolution of influenza strains is hard to predict beyond 1 y suggests that the effective dimension may indeed be large.
Our solution builds on the fitness wave solution for a diffusion model of mutation effects (27, 32). It implies a particular dependence of the wave’s speed on the population size, Eq. 12. General distributions of noninfinitesimal mutational effects, such as those considered in ref. 30, would yield different expressions for the speed. However, we expect most of our other results to hold, in particular all expressions that do not carry an explicit logarithmic dependence on , as well as the effective equations of motion for the wave. Our results strongly rely on the assumption of a homogeneous, isotropic antigenic space. We expect our results to be affected by anisotropies (e.g., in the mutational or the cross-reactivity kernels) or by structure in the intrinsic fitness landscape (i.e., not linked to immunity). Such structure may funnel the wave in preferred directions, hinder it, or favor its splitting. Generally, the local geometry and metric of the space are expected to determine the evolutionary behavior. For instance, Yan et al. (12) assumed a Hamming distance metric in an effectively infinite antigenic space, meaning that any mutation is both an escape mutation and a candidate for a lineage split. By contrast, in our geometry, escape happens only in the direction of the wave, while splits originate from mutations perpendicular to that direction, due to the choice of a Euclidean metric. While our results emphasize the role of the effective dimension , studying other geometrical effects is an interesting topic for future work.
Despite these caveats, it is interesting to ask whether the effective antigenic dimension can be extracted from data. A possible scheme for doing so starts by inferring the effective model parameters. may be estimated from exponential epidemic growth in a susceptible population. Dependence of key quantities on such as is weak. may be assumed to be of the order of the number of antigenically distinct infections encountered during a host’s lifetime, to 6 (every 15 y). may be inferred from , which can be estimated from cross-immunity assays or from the incidence rate . Alternatively, since is the inverse time it takes for mutations to neutrally evade immunity, it could be estimated directly from genomic data by computing the time for unselected mutations (whose rate is inferred from synonymous mutations) to affect antigenic sites. Interestingly, if and can be inferred independently, predictions about the wave’s shape, width, angular diffusion, and splitting do not depend on the particular choice of fitness wave theory. Assuming that all these parameters are known, the splitting rate, which depends sensitively on (Fig. 4B), could be used to infer an effective dimension. Since splitting is rare and may not be observed in practice, one could define instead partial splits, where a sublineage diverges an antigenic distance from the main lineage, for which the same scaling as Eq. 20 holds (SI Appendix). Alternatively, our results could be used to check the consistency of dimensionality-reduction schemes based on serological assays (8–10), by testing our predicted relations between the speed of the wave, its width and length, and angular diffusion properties, and ask what choice of dimension best agrees with our theory.
Our framework should be applicable to general host–pathogen systems. For instance, coevolution between viral phages and bacteria protected by the CRISPR-Cas system (34) is governed by the same principles of escape and adaptation as vertebrate immunity. Even more generally, our theory (Eqs. 2 and 3) may be relevant to the coupled dynamics of predators and prey interacting in space (geographical or phenotypic), opening potential avenues for experimental tests of these theories in synthetic microbial systems. Given the current context of the global severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) pandemic, it is natural to ask whether our results could be applied to predict its evolution. While our theory describes the long-term coevolution of viral strains with the hosts’ immune systems, in which most hosts have been exposed to at least one strain of the virus, SARS-CoV-2 is still in a phase of growth and has not exhausted the reservoir of susceptible hosts. As the situation develops, it will be interesting to see whether the virus follows a Red Queen type of evolution like influenza, goes extinct, or splits into many antigenically independent sublineages. While our model may shed light on these questions, fine microscopic details such as geographical and population structure pose additional challenges for predictions.
Methods
We simulated discrete population dynamics of infected hosts and immune protections (all integers) on a square lattice with lattice size ranging from to . Each time step corresponds to a single infection cycle, . At each time step, 1) viral fitness is computed at each occupied lattice site from the immune coverage Eq. 1; 2) viruses at each occupied lattice site are grown according to their fitness, ; 3) viruses are mutated by jumping to nearby sites on the lattice; and 4) the immune system is updated according to a discrete version of Eq. 3, by implementing and then removing protections at random (so that remains constant).
To implement step 1, we used a combination of exact computation of Eq. 1 and approximate methods, including one based on nonhomogeneous fast Fourier transforms (35, 36). Details are given in SI Appendix.
To implement step 3, we drew the number of mutants at each occupied site from a binomial distribution . The number of new mutations affecting each of these mutants is drawn from a Poisson distribution of mean conditioned on having at least one mutation. The new location of each mutant is drawn as , with (rounding is applied to each dimension), where is a vector of random orientation and modulus drawn from a Gamma distribution of mean and shape parameter 20. This distribution was chosen to maximize the number of nonzero jumps while maintaining isotropy. We then define .
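The mutation step can be illustrated with a minimal Python sketch, which is not the simulation code used for this work. The function and parameter names (`mutate_site`, `p_mutant`, `mean_mutations`, `mean_jump`) are hypothetical, the binomial success probability is left as a free input, and we assume that the jumps of a mutant are summed before rounding; the text above does not fix these details.

```python
import math
import random


def zero_truncated_poisson(mean, rng):
    """Draw k >= 1 from a Poisson(mean) distribution conditioned on k > 0."""
    while True:
        # Knuth's multiplication method; simple rejection of k = 0.
        # Adequate for a sketch, slow if mean is extremely small.
        limit, k, prod = math.exp(-mean), 0, rng.random()
        while prod > limit:
            k += 1
            prod *= rng.random()
        if k > 0:
            return k


def random_direction(dim, rng):
    """Unit vector with isotropic orientation (normalized Gaussian vector)."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 0.0:
            return [c / norm for c in v]


def mutate_site(site, n_viruses, p_mutant, mean_mutations, mean_jump, rng,
                shape=20.0):
    """Return the new lattice positions of the mutants leaving one site."""
    # Number of mutants at this site: binomial(n_viruses, p_mutant).
    n_mutants = sum(rng.random() < p_mutant for _ in range(n_viruses))
    new_sites = []
    for _ in range(n_mutants):
        n_mut = zero_truncated_poisson(mean_mutations, rng)
        shift = [0.0] * len(site)
        for _ in range(n_mut):
            # Jump modulus: Gamma with shape 20 and mean mean_jump
            # (scale = mean / shape); orientation is isotropic.
            length = rng.gammavariate(shape, mean_jump / shape)
            direction = random_direction(len(site), rng)
            shift = [s + length * u for s, u in zip(shift, direction)]
        # Assumption: rounding is applied per dimension to the summed jumps.
        new_sites.append(tuple(x + round(s) for x, s in zip(site, shift)))
    return new_sites
```

For example, `mutate_site((0, 0), 1000, 0.01, 0.5, 2.0, random.Random(0))` returns a list of integer lattice coordinates, one per mutant.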
To find the wave solution more rapidly, the viral population was initialized as a Gaussian distribution centered at with size and width in all dimensions, to which additional viruses are randomly added within the interval along (, , and being all given by the theory prediction). Immune protections are placed according to Eq. 6. The first 20,000 time steps serve to reach steady state and are discarded from the analysis. When a population extinction () or explosion () occurs, the simulation is resumed at an earlier checkpoint to avoid reequilibrating. Simulations are ended after steps or after 20 consecutive extinctions or explosions from the same checkpoint.
To analyze the organization of viruses in phenotypic space, we save snapshots of the simulation at regular time intervals. For each saved snapshot we take all of the coordinates with and then cluster them into separate lineages through the Python scikit-learn DBSCAN algorithm (37, 38) with the minimal number of samples . The parameter ε (`eps` in scikit-learn) defines the maximum distance between two samples that are considered to be in the neighborhood of each other. We perform the clustering for different values of ε and select the value that minimizes the variance of the 10th nearest-neighbor distance. Clustering results are not sensitive to this choice. This preliminary clustering step is refined by merging two clusters if their centroids are closer than the sum of their radii, where the radius of a cluster is the maximum distance of any of its points from its centroid.
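The refinement of the preliminary clusters can be sketched as follows. This is an illustrative Python sketch rather than the analysis code itself: it takes the clusters as plain lists of coordinate tuples (for instance, as grouped from DBSCAN labels beforehand), and it assumes that merging is repeated until no pair of clusters satisfies the criterion, which the text above does not specify. The function names are hypothetical.

```python
import math


def centroid(points):
    """Mean position of a list of coordinate tuples."""
    dim = len(points[0])
    return tuple(sum(p[i] for p in points) / len(points) for i in range(dim))


def radius(points, center):
    """Maximum distance of any point of the cluster from its centroid."""
    return max(math.dist(p, center) for p in points)


def merge_overlapping_clusters(clusters):
    """Merge clusters whose centroids are closer than the sum of their radii.

    clusters: list of lists of coordinate tuples.
    Returns a new list of clusters; the input lists are not modified.
    """
    clusters = [list(c) for c in clusters]
    merged = True
    while merged:  # assumption: repeat until no pair can be merged
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                ci, cj = centroid(clusters[i]), centroid(clusters[j])
                reach = radius(clusters[i], ci) + radius(clusters[j], cj)
                if math.dist(ci, cj) < reach:
                    clusters[i].extend(clusters.pop(j))
                    merged = True
                    break
            if merged:
                break
    return clusters
```

Centroids and radii are recomputed after every merge, so a merged cluster can absorb further neighbors in later passes.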
From the clustered lineages we obtain a series of observables for each lineage, such as its speed, computed as the time derivative of the position of its center. The width of the lineage profile in the direction of motion as well as in the perpendicular direction is obtained by taking the standard deviation of the desired component of the distances of all of the lineage viruses from the lineage centroid. Reported numbers are time averages of these observables. We also track the separate trajectories of the lineages in antigenic space. A split of a lineage into two new lineages is called when two clusters are detected where previously there was one, and their distance is larger than , the chosen threshold for calling a split.
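As an illustration of these observables, the following Python sketch computes the speed and the two widths of a single lineage from two consecutive snapshots. It rests on assumptions that are ours: the derivative is approximated by a finite difference of the centroid position, and in more than two dimensions the perpendicular width is averaged over the perpendicular directions. The function name `lineage_observables` is hypothetical.

```python
import math


def centroid(points):
    """Mean position of a list of coordinate tuples."""
    dim = len(points[0])
    return tuple(sum(p[i] for p in points) / len(points) for i in range(dim))


def lineage_observables(points_prev, points_now, dt):
    """Return (speed, width along motion, width perpendicular to motion)."""
    c_prev, c_now = centroid(points_prev), centroid(points_now)
    # Finite-difference estimate of the velocity of the lineage center.
    velocity = [(b - a) / dt for a, b in zip(c_prev, c_now)]
    speed = math.sqrt(sum(v * v for v in velocity))
    if speed == 0.0:
        raise ValueError("direction of motion is undefined for zero speed")
    direction = [v / speed for v in velocity]

    par_sq, perp_sq = 0.0, 0.0
    for p in points_now:
        rel = [x - c for x, c in zip(p, c_now)]
        par = sum(r * u for r, u in zip(rel, direction))
        par_sq += par * par
        # Clamp at zero to absorb floating-point round-off.
        perp_sq += max(0.0, sum(r * r for r in rel) - par * par)

    n, dim = len(points_now), len(c_now)
    # Components are measured from the centroid, so their mean vanishes and
    # the standard deviation reduces to a root mean square.
    width_par = math.sqrt(par_sq / n)
    width_perp = math.sqrt(perp_sq / n / (dim - 1)) if dim > 1 else 0.0
    return speed, width_par, width_perp
```

Time averages of the returned values over successive snapshots would then give the reported numbers.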
To estimate the persistence time, we first subsample the trajectory so that the distance between consecutive points is larger than , so that fast fluctuations in the population size do not affect the inference. We take the resulting trajectory angles and smooth them with a sliding window of 5. Then we divide the trajectory into subsegments and compute the angles’ mean-square displacement (MSD) over all lineages and all subsegments. We consider only time lags larger than twice the typical smoothing time, and, if the MSD trace is long enough, we also require the time lag to be larger than . Finally, we keep only time lag bins with at least 10 datapoints. We fit the resulting time series to a linear function and get the persistence time as . We compute the reduced χ² as a goodness-of-fit score. Results are shown for simulations that had enough statistics to perform the fit, lasted at least cycles, and had a reduced χ² below 3.
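The main steps of this procedure can be sketched in Python for a single two-dimensional trajectory. The sketch is simplified and makes several assumptions of its own: the trajectory is a list of `(t, x, y)` tuples, the division into subsegments and the pooling over lineages are omitted, and the thresholds (`min_dist`, `bin_width`, `min_lag`) are free inputs. It returns the slope and intercept of the linear fit; the conversion of the slope into a persistence time follows the relation given in the text and is not reproduced here.

```python
import math


def subsample(track, min_dist):
    """Keep points farther than min_dist from the previously kept point."""
    kept = [track[0]]
    for t, x, y in track[1:]:
        _, x0, y0 = kept[-1]
        if math.hypot(x - x0, y - y0) > min_dist:
            kept.append((t, x, y))
    return kept


def heading_angles(track):
    """Unwrapped direction of motion between consecutive points."""
    times, angles = [], []
    for (t0, x0, y0), (t1, x1, y1) in zip(track, track[1:]):
        a = math.atan2(y1 - y0, x1 - x0)
        if angles:
            # Unwrap: pick the 2*pi shift closest to the previous angle.
            a += 2 * math.pi * round((angles[-1] - a) / (2 * math.pi))
        times.append(0.5 * (t0 + t1))
        angles.append(a)
    return times, angles


def smooth(values, window=5):
    """Centered moving average; ends where the window does not fit are dropped."""
    half = window // 2
    return [sum(values[i - half:i + half + 1]) / window
            for i in range(half, len(values) - half)]


def angular_msd(times, angles, bin_width, min_lag=0.0, min_count=10):
    """Mean-square angular displacement, binned by time lag."""
    sums, counts = {}, {}
    for i in range(len(angles)):
        for j in range(i + 1, len(angles)):
            lag = times[j] - times[i]
            if lag <= min_lag:
                continue
            b = int(lag // bin_width)
            sums[b] = sums.get(b, 0.0) + (angles[j] - angles[i]) ** 2
            counts[b] = counts.get(b, 0) + 1
    bins = sorted(b for b in counts if counts[b] >= min_count)
    return [(b + 0.5) * bin_width for b in bins], [sums[b] / counts[b] for b in bins]


def linear_fit(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def angular_msd_slope(track, min_dist, bin_width, window=5, min_lag=0.0,
                      min_count=10):
    """Subsample, smooth the angles, and fit the angular MSD to a line."""
    times, angles = heading_angles(subsample(track, min_dist))
    half = window // 2
    angles = smooth(angles, window)
    times = times[half:len(times) - half]
    lags, msd = angular_msd(times, angles, bin_width, min_lag, min_count)
    if len(lags) < 2:
        raise ValueError("not enough time lag bins to perform the fit")
    return linear_fit(lags, msd)
```

A goodness-of-fit score such as the reduced χ² would additionally require an error estimate for each time lag bin, which this sketch does not compute.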
Acknowledgments
This study was supported by European Research Council Grant COG 724208, by Grant ANR-19-CE45-0018 “RESP-REP” from the Agence Nationale de la Recherche, and by Deutsche Forschungsgemeinschaft (DFG) Grant CRC 1310 “Predictability in Evolution.”
Footnotes
The authors declare no competing interest.
This article is a PNAS Direct Submission.
This article contains supporting information online at https://www.pnas.org/lookup/suppl/doi:10.1073/pnas.2103398118/-/DCSupplemental.
Data Availability
There are no data underlying this work.
References
1. Morris D. H., et al., Predictive modeling of influenza shows the promise of applied evolutionary biology. Trends Microbiol. 26, 102–118 (2018).
2. Wang S., et al., Manipulating the selection forces during affinity maturation to generate cross-reactive HIV antibodies. Cell 160, 785–797 (2015).
3. Barton J. P., et al., Relative rate and location of intra-host HIV evolution to evade cellular immunity are predictable. Nat. Commun. 7, 11660 (2016).
4. Nourmohammad A., Otwinowski J., Plotkin J. B., Host-pathogen coevolution and the emergence of broadly neutralizing antibodies in chronic infections. PLoS Genet. 12, e1006171 (2016).
5. Nourmohammad A., Eksin C., Optimal evolutionary control for artificial selection on molecular phenotypes. Phys. Rev. X 11, 011044 (2021).
6. Gandon S., Day T., Metcalf C. J. E., Grenfell B. T., Forecasting epidemiological and evolutionary dynamics of infectious diseases. Trends Ecol. Evol. 31, 776–788 (2016).
7. Segel L. A., Perelson A. S., Shape space: An approach to the evaluation of cross-reactivity effects, stability and controllability in the immune system. Immunol. Lett. 22, 91–99 (1989).
8. Smith D. J., Lapedes A. S., Jong J. C. D., Mapping the antigenic and genetic evolution of influenza virus. Science 305, 371–377 (2004).
9. Bedford T., et al., Integrating influenza antigenic dynamics with molecular evolution. eLife 2014, e01914 (2014).
10. Fonville J. M., et al., Antibody landscapes after influenza virus infection or vaccination. Science 346, 996–1000 (2014).
11. Van Valen L., A new evolutionary law. Evol. Theor. 1, 1–30 (1973).
12. Yan L., Neher R. A., Shraiman B. I., Phylodynamic theory of persistence, extinction and speciation of rapidly adapting pathogens. eLife 8, e44205 (2019).
13. Strelkowa N., Lässig M., Clonal interference in the evolution of influenza. Genetics 192, 671–682 (2012).
14. Gerrish P. J., Lenski R. E., The fate of competing beneficial mutations in an asexual population. Genetica 102/103, 127–144 (1998).
15. Rota P. A., et al., Lineages of influenza type B virus since 1983. Virology 68, 59–68 (1990).
16. White P. A., Evolution of norovirus. Clin. Microbiol. Infect. 20, 741–745 (2014).
17. Ferguson N. M., Galvani A. P., Bush R. M., Ecological and immunological determinants of influenza evolution. Nature 422, 428–433 (2003).
18. Bedford T., Rambaut A., Pascual M., Canalization of the evolutionary trajectory of the human influenza virus. BMC Biol. 10, 38 (2012).
19. Zinder D., Bedford T., Gupta S., Pascual M., The roles of competition and mutation in shaping antigenic and genetic diversity in influenza. PLoS Pathog. 9, e1003104 (2013).
20. Wen F., Bedford T., Cobey S., Explaining the geographical origins of seasonal influenza A (H3N2). Proc. Biol. Sci. 283, 20161312 (2016).
21. Wen F. T., Malani A., Cobey S., The beneficial effects of vaccination on the evolution of seasonal influenza. bioRxiv [Preprint] (2017). 10.1101/162545 (Accessed 18 June 2021).
22. Gog J. R., Grenfell B. T., Dynamics and selection of many-strain pathogens. Proc. Natl. Acad. Sci. U.S.A. 99, 17209–17214 (2002).
23. Koelle K., Kamradt M., Pascual M., Understanding the dynamics of rapidly evolving pathogens through modeling the tempo of antigenic change: Influenza as a case study. Epidemics 1, 129–137 (2009).
24. Marchi J., Lässig M., Mora T., Walczak A. M., Multi-lineage evolution in viral populations driven by host immune systems. Pathogens 8, 1–16 (2019).
25. Rouzine I. M., Rozhnova G., Antigenic evolution of viruses in host populations. PLoS Pathog. 14, e1007291 (2018).
26. Rouzine I. M., Wakeley J., Coffin J. M., The solitary wave of asexual evolution. Proc. Natl. Acad. Sci. U.S.A. 100, 587–592 (2003).
27. Cohen E., Kessler D. A., Levine H., Front propagation up a reaction rate gradient. Phys. Rev. E Stat. Nonlinear Soft Matter Phys. 72, 1–11 (2005).
28. Desai M. M., Fisher D. S., Beneficial mutation-selection balance and the effect of linkage on positive selection. Genetics 176, 1759–1798 (2007).
29. Hallatschek O., The noisy edge of traveling waves. Proc. Natl. Acad. Sci. U.S.A. 108, 1783–1787 (2011).
30. Good B. H., Rouzine I. M., Balick D. J., Hallatschek O., Desai M. M., Distribution of fixed beneficial mutations and the rate of adaptation in asexual populations. Proc. Natl. Acad. Sci. U.S.A. 109, 4950–4955 (2012).
31. Tsimring L. S., Levine H., Kessler D. A., RNA virus evolution via a fitness-space model. Phys. Rev. Lett. 76, 4440–4443 (1996).
32. Neher R., Hallatschek O., Genealogies of rapidly adapting populations. Proc. Natl. Acad. Sci. U.S.A. 110, 437–442 (2013).
33. Held T., Klemmer D., Lässig M., Survival of the simplest in microbial evolution. Nat. Commun. 10, 2472 (2019).
34. Westra E. R., Van Houte S., Gandon S., Whitaker R., The ecology and evolution of microbial CRISPR-Cas adaptive immune systems. Philos. Trans. R. Soc. Lond. B Biol. Sci. 374, 20190101 (2019).
35. Keiner J., Kunis S., Potts D., Using NFFT 3—a software library for various nonequispaced fast Fourier transforms. ACM Trans. Math Software 36, 1–30 (2009).
36. Potts D., Steidl G., Nieslony A., Fast convolution with radial kernels at nonequispaced knots. Numer. Math. 98, 329–351 (2004).
37. Pedregosa F., et al., Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).
38. Ester M., Kriegel H. P., Sander J., Xu X., “A density-based algorithm for discovering clusters in large spatial databases with noise” in Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, Simoudis E., Han J., Fayyad U., Eds. (AAAI Press, 1996), pp. 226–231.