2022 Feb 16;596:127071. doi: 10.1016/j.physa.2022.127071

Epidemiological theory of virus variants

Giacomo Cacciapaglia a,b, Corentin Cot a,b, Adele de Hoffer c, Stefan Hohenegger a,b, Francesco Sannino d,e,f, Shahram Vatani a,b
PMCID: PMC8848575  PMID: 35185268

Abstract

We propose a physics-inspired mathematical model underlying the temporal evolution of competing virus variants that relies on the existence of (quasi) fixed points capturing the large time scale invariance of the dynamics. To motivate our result we first modify the time-honoured compartmental models of the SIR type to account for the existence of competing variants and then show how their evolution can be naturally re-phrased in terms of flow equations ending at quasi fixed points. As the natural next step we employ (near) scale invariance to organise the time evolution of the competing variants within the effective description of the epidemic Renormalisation Group framework. We test the resulting theory against the time evolution of COVID-19 virus variants that validate the theory empirically.

Keywords: Epidemiology, Renormalisation group, Time evolution, Fixed points, Mutation and variants, Field theory

1. Introduction

In the wake of the ongoing pandemic of SARS-CoV-2, epidemiologists are currently witnessing a surge of data showing not only the spread and time evolution of the virus but also its genetic evolution, notably the emergence of mutations. While understanding their biological properties and assessing the danger they pose for humans is of paramount importance, the very occurrence of new variants is also a crucial component in the time evolution of the pandemic itself. Indeed, the companion paper [1] analyses the emergence of new variants as one of the driving forces behind the dynamics of the SARS-CoV-2 pandemic (notably the multi-wave structure) through the analysis of the above-mentioned data by means of machine learning and numerical techniques. The current paper is aimed at providing a theoretical framework to model the interplay and competition of different virus variants within a given population in an efficient manner.

The application of mathematical modelling to describe and predict the spread of infectious diseases has a history going back more than a century [2], [3], [4], [5], [6], [7], [8], [9], [10], [11], [12]. The pioneering SIR model [13] is an example of compartmental models, which are based on dividing the population into different classes and model the spread of the disease via a set of first order differential equations in time that describes the flow of individuals between these different compartments. Such models are deterministic in the sense that the solutions are uniquely determined through the initial conditions supplemented to the differential equations (apart from certain parameters related to the infectivity of the disease and the rate at which individuals may recover from an infection). These models can be refined by further subdividing the population into more compartments depending on the biological, geographical and social particularities of the situation. We refer the reader to some of the excellent reviews that are available in the literature for more details, e.g. [14], [15], [16], [17], [18].

Complementary to this, there exist other models of a stochastic nature: in these models, the microscopic processes leading to the spread of the disease are understood in a probabilistic sense and time is typically a discretised variable. Models of this type include lattice and percolation models, many of which are inspired by chemical or diffusion processes. We refer the reader to the excellent reviews [19], [20] for more details. These models are related to the compartmental models mentioned above through processes that effectively reduce the number of degrees of freedom, such as mean field approximations and averaging procedures.

While very different in their original modelling of the problem, the approaches outlined above exhibit very interesting properties, such as criticality and symmetries related to a re-scaling of time. The former is the observation that, in many cases, whether only a very small number of individuals is affected or a significant fraction of the population becomes infected depends on certain threshold parameters, and the transition between the two regimes is rather sharp. This similarity to phase transitions in Physics has led to further approaches using universality classes of field theories, such as in the pioneering works [21], [22], [23], [24], [25], [26].

Furthermore, a direct use of large and short time-scale invariance has been proposed in [27]. There, a framework of flow equations for the total number of infected individuals was introduced that effectively encapsulates the symmetries respected by the epidemic curves. Inspired by the similarity of this flow system and its symmetries to the flow equations of high energy physics, which govern the rate of change of interaction couplings with respect to some reference energy scale, this approach was baptised the epidemic Renormalisation Group (eRG) [27]. Specifically, time itself plays the role of the renormalisation scale, and the coupling strength in theories of fundamental interactions [28], [29] can be interpreted as an epidemiological quantity, such as the cumulative number of infected individuals [27]. After demonstrating the power of reorganising the epidemic diffusion process around time-scale invariance, the approach has been extended to account for human interactions and mobility across different regions of the world in [30]. When combined with mobility data provided by Google and Apple [31] as well as US flight data [32], the framework was used to deduce the impact of lockdowns on the global spread of the virus. This led to the prediction, a few months in advance, of the second pandemic wave that started in the fall of 2020 in Europe [33]. The framework has been extended to contain complex fixed points [34] in the flow equations in order to provide a first fully consistent mathematical description of multi-wave pandemics. The extended framework, dubbed Complex eRG, related the interwave period to the timing of the insurgence of the next wave [35]. Last but not least, a slight modification of the approach has been used to incorporate the first impact of the US vaccination campaign [32].

In the comprehensive work of [36] we further showed how the eRG framework is related to the traditional compartmental models of SIR type [37], and how they emerge from microscopic percolation models which are stochastic in nature.

The investigations above were purely epidemiological in nature, however with the advent of massive genome sequencing a new era commenced in which, as we shall see, these data become an integral part of refined epidemiological models of the type discussed above.

In order to acquire intuition on the virus variant diffusion and dynamics, in Section 2 we will use modified SIR-inspired compartmental models [13]. In particular we will describe the competition between two virus variants in terms of a SIIR model. Focussing on the temporal flow of the cumulative numbers of infected, the analysis points to a more efficient description in terms of an eRG system of flow equations. From a theoretical physics standpoint, we find amusing the appearance of a degeneracy in the system, specifically in terms of the asymptotic number of infected, which is similar to the emergence of a marginal operator regulating the end-point of the flow. These features seem peculiar to the rather special systems and will grant a deeper understanding at a more microscopic level. It will be interesting to study similar features in other ‘microscopic’ systems e.g. such as percolation models described in [25], [26]. We therefore study the virus mutation version of the eRG (MeRG) in Section 3. We show that the flow-equations can be efficiently described in terms of a gradient flow diagrammatically. The flow diagram reveals the existence of fixed points to be interpreted as initial and final stages of the cumulative number of infected individuals by different virus variants. We discover that virus mutations and their variant evolutions can be efficiently represented in terms of theoretical physics concepts such as critical surfaces and (quasi)-fixed points. We observe a similarity between the appearance of a virus mutation and switching on a relevant operator from the perspective of fixed points controlling the final stage of a single wave pandemic. These lend a natural and profound interpretation of the complex fixed points eRG (CeRG) [34], [35], which mathematically models multi wave pandemics.

To further substantiate our findings, in Section 4 we use the developed theory to describe epidemiological data of COVID-19 for several regions of the world where variants of concern have first emerged. The genome sequencing data were extracted from the online repository GISAID (gisaid.org/), the epidemiological data for the US states were taken from the New York Times GitHub repository, and the epidemiological data for the other countries from Our World in Data. We first used the GISAID data to extract the percentage of each variant among the sequences collected at each specific date. Then, by multiplying this percentage by the number of new cases reported within the country, we estimated the number of new cases related to each variant. This method allowed us to generate the cumulative number of cases per variant, fit it with a logistic function and extract the associated infection rate.
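The per-variant estimate described above amounts to a pointwise product followed by a running sum. The following is a minimal sketch in Python, assuming that the new-case counts and the variant fractions have already been aligned on the same dates. The function name and the numbers are purely illustrative and are not taken from GISAID or from the case repositories; the subsequent logistic fit is not included here.

```python
import numpy as np

def variant_cumulative_cases(new_cases, variant_fraction):
    """Estimate the cumulative cases of one variant.

    new_cases        : new reported cases per date (all variants together)
    variant_fraction : fraction (0..1) of sequenced samples belonging to
                       the variant on the same dates
    """
    new_cases = np.asarray(new_cases, dtype=float)
    variant_fraction = np.asarray(variant_fraction, dtype=float)
    if new_cases.shape != variant_fraction.shape:
        raise ValueError("both series must cover the same dates")
    # New cases attributed to the variant on each date
    variant_new = variant_fraction * new_cases
    # Running sum gives the cumulative number of cases of the variant
    return np.cumsum(variant_new)

# Made-up numbers, for illustration only
new_cases = [100, 120, 150, 200]
fraction = [0.0, 0.1, 0.2, 0.5]
cumulative = variant_cumulative_cases(new_cases, fraction)
```

The resulting series is the quantity that would then be fitted with a logistic function.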

2. Compartmental models as mutation templates

The eRG framework and its potential generalisations to several pathogens (e.g. virus variants or different bacteria types) are effective descriptions of the system, capturing its dynamics via certain key fields (typically the cumulative number of infected individuals within a population). As such, other (‘microscopic’) degrees of freedom of the system have been ‘integrated out’ (see [36] for more details on this point of view). While this approach is very efficient in simulating infection numbers, it is non-trivial to model new effects of the dynamics (such as the emergence of variants via mutations). To this end, we find it convenient to first obtain intuition by studying more basic models, of a compartmental nature, before using the obtained knowledge to propose eRG-like models.

2.1. Simple SIIR model

As a very simple ‘first-principle’ model, we consider a compartmental model to describe the temporal evolution of two variants of a disease within a population of size N. The latter is split into four different classes (compartments):

  • Susceptible individuals: we denote by NS(t) the number of individuals who can become infected with either variant of the disease.

  • Infectious individuals: we denote by NI1(t) and NI2(t) the number of individuals that are currently infected with the two variants of the disease, respectively. Individuals in these two compartments can infect susceptible individuals, with the same variant of the disease, if getting in contact with them, with a well-defined constant rate. We assume that it is not possible to be infected with both variants simultaneously.

  • Removed individuals: we denote by NR(t) the number of individuals that can neither be infected themselves with any of the two variants, nor infect susceptible individuals. While this removal may not only be due to recovery from a previous infection with either of the two variants, we assume that the latter grants permanent immunity with respect to both variants.

We assume that N is sufficiently large such that the relative numbers of susceptible S(t), infectious I1,2(t) and removed individuals R(t) are continuously differentiable functions of time

$$S,\, I_{1,2},\, R:\ \mathbb{R}_+ \longrightarrow [0,1]. \tag{2.1}$$

Without loss of generality, we place the outbreak of the epidemic at time t=0. Furthermore, we consider the population to be stable in time, i.e. we impose the constraint

$$S(t)+I_1(t)+I_2(t)+R(t)=1, \qquad \forall\, t\in\mathbb{R}_+. \tag{2.2}$$

The dynamics of the epidemic is described by individuals being moved between the four compartments introduced above with certain fixed1 (i.e. time-independent) rates:

  • γ1,2 are the rates at which susceptible individuals turn to infected once in contact with an individual infected by the two variants of the disease respectively.

  • ε1,2 are the removal rates at which infectious individuals (carrying variants 1 or 2 respectively) become non-infectious. This includes recovery from the disease as well as other removal mechanisms (e.g. death of the individual).

Schematically, the flow between the compartments (S,I1,I2,R) is shown in Fig. 1, and it can be described by the following system of coupled first order differential equations

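$$\begin{aligned}
\frac{dS}{dt}(t) &= -\left(\gamma_1 I_1(t)+\gamma_2 I_2(t)\right) S(t), \\
\frac{dI_1}{dt}(t) &= \gamma_1 I_1(t)\, S(t)-\varepsilon_1 I_1(t), \\
\frac{dI_2}{dt}(t) &= \gamma_2 I_2(t)\, S(t)-\varepsilon_2 I_2(t), \\
\frac{dR}{dt}(t) &= \varepsilon_1 I_1(t)+\varepsilon_2 I_2(t),
\end{aligned} \tag{2.3}$$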

together with the following initial conditions

$$S(0)=S_0, \quad I_1(0)=I_{1,0}, \quad I_2(0)=I_{2,0}, \quad R(0)=0, \qquad \text{with} \quad 0\leq S_0, I_{1,0}, I_{2,0}\leq 1, \quad S_0+I_{1,0}+I_{2,0}=1. \tag{2.4}$$

Fig. 1. Schematic flow between the four compartments of the SIIR model.

Concretely we shall consider variant 2 to have been created through a mutation of variant 1. To quantify the effect of the second variant on the evolution of the pandemic, we will study the evolution of the number of infected under variant 1, $I_1(t)$, for $I_{2,0}\neq 0$, and compare the results with the control case, defined in the absence of the second variant, i.e. $I_{2,0}=0$. To better connect the results in this section with the eRG approach, we focus on the cumulative number of infected $I_{c,1}(t)$, where the cumulative numbers for both variants are defined as

$$I_{c,i}(t)=N\, I_{i,0}+N\gamma_i\int_0^t I_i(t')\, S(t')\, dt', \qquad i=1,2. \tag{2.5}$$
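As an illustration of how the system (2.3), (2.4) and the cumulative numbers (2.5) can be evaluated numerically, the following is a minimal forward-Euler sketch in Python. The values of γ1, ε1, S0, I1,0 and I2,0 are those quoted in the caption of Fig. 2; the values of γ2 and ε2 (chosen such that σ2 = 1.5), the step size, the time horizon and all function and variable names are assumptions made for this example only.

```python
import numpy as np

def integrate_siir(gamma1, gamma2, eps1, eps2, S0, I10, I20,
                   N=1.0, dt=0.1, t_max=3000.0):
    """Forward-Euler integration of the SIIR system with cumulative counts."""
    n = int(round(t_max / dt))
    t = dt * np.arange(n + 1)
    S, I1, I2, R, Ic1, Ic2 = (np.empty(n + 1) for _ in range(6))
    S[0], I1[0], I2[0], R[0] = S0, I10, I20, 0.0
    Ic1[0], Ic2[0] = N * I10, N * I20          # cumulative numbers at t = 0
    for k in range(n):
        new1 = gamma1 * I1[k] * S[k]           # new infections, variant 1
        new2 = gamma2 * I2[k] * S[k]           # new infections, variant 2
        S[k + 1] = S[k] - dt * (new1 + new2)
        I1[k + 1] = I1[k] + dt * (new1 - eps1 * I1[k])
        I2[k + 1] = I2[k] + dt * (new2 - eps2 * I2[k])
        R[k + 1] = R[k] + dt * (eps1 * I1[k] + eps2 * I2[k])
        Ic1[k + 1] = Ic1[k] + dt * N * new1    # Euler step of the integral in (2.5)
        Ic2[k + 1] = Ic2[k] + dt * N * new2
    return {"t": t, "S": S, "I1": I1, "I2": I2, "R": R, "Ic1": Ic1, "Ic2": Ic2}

# sigma1 = 0.03 / 0.025 = 1.2; sigma2 = 0.0375 / 0.025 = 1.5 (illustrative choice)
sol = integrate_siir(gamma1=0.03, gamma2=0.0375, eps1=0.025, eps2=0.025,
                     S0=0.9, I10=0.099, I20=0.001)
```

The Euler update preserves the constraint (2.2) up to rounding errors, which provides a simple consistency check on the output.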

We remark that the model introduced above does not trace the origin of the variant 2 of the virus as a mutation of the first one. The latter could be encoded in a more evolutionary version of the 4-compartment model, where a mutated variant could appear at some time t0>0. To be concrete, the evolutionary model can be described by the following set of differential equations:

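$$\begin{aligned}
\frac{dS}{dt} &= -\left(\gamma_1 I_1+\gamma_2 I_2\right) S, \\
\frac{dI_1}{dt} &= \gamma_1 I_1 S-\varepsilon_1 I_1-\beta(t)\, I_1, \\
\frac{dI_2}{dt} &= \gamma_2 I_2 S-\varepsilon_2 I_2+\beta(t)\, I_1, \\
\frac{dR}{dt} &= \varepsilon_1 I_1+\varepsilon_2 I_2,
\end{aligned}
\qquad \text{with} \qquad
\begin{aligned}
&S(0)=S_0, \quad I_1(0)=I_{1,0}, \quad I_2(0)=R(0)=0, \\
&0\leq S_0, I_{1,0}\leq 1, \\
&S_0+I_{1,0}=1,
\end{aligned} \tag{2.6}$$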

where β(t) is a time-dependent rate with point-like support that converts at time t0>0 (the instant in which the variant appears) a fraction of the infectious individuals of variant 1 into those of variant 2. This function is defined as:

$$\beta(t)=\begin{cases} \beta_0 & \text{if } t=t_0, \\ 0 & \text{if } t\neq t_0. \end{cases} \tag{2.7}$$

However, since we are excluding the possibility of a re-infection of removed individuals (with either of the two variants) and we are only interested in the impact of the appearance of the mutation on the dynamics of variant 1 (i.e. the evolution for t>t0), the above SIIR model (2.6) is equivalent to the simpler system (2.3) for a suitable choice of the boundary conditions (2.4).

2.2. Impact on the cumulative number of infected

While an analytic solution of the system (2.3), (2.4) appears very difficult, we can study numerical solutions using a simple Euler integration method. Since to this end we have to consider fixed values for γ1,2 and ε1,2, we first study the dependence of Ic,1(t) on γ2 and ε2. In analogy to the SIR model, it is convenient to define the reproduction numbers of the two variants as follows:

$$R_{0,1}\equiv\sigma_1=\frac{\gamma_1}{\varepsilon_1}, \qquad R_{0,2}\equiv\sigma_2=\frac{\gamma_2}{\varepsilon_2}. \tag{2.8}$$

As we will see, the time dependence of the cumulative number of infected mainly depends on the two reproduction numbers instead of the detailed values of infection and recovery rates, under certain conditions.

This point is illustrated in Fig. 2, where we show solutions $I_{c,1}(t)$ for fixed $\sigma_1=1.2$ and for four values of $\sigma_2$. The various curves in each panel correspond to different values of $\gamma_2$ (where $\varepsilon_2$ also varies accordingly to keep $\sigma_2$ fixed). These results, along with further checks we have performed, suggest that the cumulative number of infected by the original variant only depends on $\sigma_2$ as long as its value is similar to that of $\sigma_1$, i.e. if the two variants have similar reproduction numbers (quantitatively, $\sigma_2$ can be at most 50% larger than $\sigma_1$). Conversely, for $\sigma_2\gg\sigma_1$, the solutions of the SIIR model are sensitive to the specific values of $\gamma_2$ and $\varepsilon_2$. The results in Fig. 2 also highlight that the shape of $I_{c,1}(t)$, or equivalently its asymptotic value at large times, is substantially affected by the presence of the second variant only if $\sigma_2$ is significantly larger than $\sigma_1$.

Fig. 2. Time evolution of $I_{c,1}$ for different values of $\gamma_2$ with $\sigma_2=0.75$ (top left panel), $\sigma_2=1.25$ (top right panel), $\sigma_2=1.5$ (bottom left panel) and $\sigma_2=2$ (bottom right panel). In all cases we have used $\gamma_1=0.03$ and $\varepsilon_1=0.025$ (such that $\sigma_1=1.2$) as well as $S_0=0.9$, $I_{1,0}=0.099$ and $I_{2,0}=0.001$.

To quantify the observations above, we can use the fact that the numerical solutions for Ic,1(t) are fairly well approximated by a logistic function of the form

$$I_{c,1}(t)\approx\frac{N A}{1+e^{-\lambda(t-\kappa)}}, \qquad \text{with} \quad A\in[0,1], \quad \lambda,\kappa\in\mathbb{R}, \tag{2.9}$$

for different values of the parameters (A,λ,κ). Here, NA is the asymptotic value of the cumulative number of infected

$$N A=\lim_{t\to\infty} I_{c,1}(t), \tag{2.10}$$

while $\lambda$ is an effective infection rate, i.e. a measure of how infectious the variant is (and is generally related to $\sigma_1$), and $\kappa$ is a parameter that allows one to shift the onset of the spread of variant 1. Exemplary fits of numerical solutions of $I_{c,1}(t)$ for different values of $(\sigma_1,\sigma_2)$, along with the numerical fitting parameters, are shown in Fig. 3. We remark, however, that fitting the cumulative number of infected individuals with logistic functions is not the only viable approximation and we shall encounter other possibilities (depending on specific cases) in Section 2.3.
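A fit of the form (2.9) can be sketched with a standard least-squares routine. The example below uses `scipy.optimize.curve_fit` on synthetic, noise-free data generated from the logistic function itself, so that the recovered parameters can be compared with known values. The helper names, the heuristic starting values and the numbers are assumptions of this sketch and do not reproduce the fitting procedure used for Fig. 3.

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic(t, A, lam, kappa, N=1.0):
    """Logistic ansatz: N*A / (1 + exp(-lam*(t - kappa)))."""
    return N * A / (1.0 + np.exp(-lam * (t - kappa)))

def fit_logistic(t, Ic, N=1.0):
    """Least-squares estimate of (A, lam, kappa) from a cumulative curve Ic(t)."""
    t = np.asarray(t, dtype=float)
    Ic = np.asarray(Ic, dtype=float)
    # Heuristic starting point (an assumption of this sketch):
    slope = np.gradient(Ic, t)
    A0 = Ic[-1] / N                       # plateau height
    lam0 = 4.0 * slope.max() / Ic[-1]     # a logistic has maximal slope N*A*lam/4
    kappa0 = t[np.argmax(slope)]          # inflection point
    model = lambda tt, A, lam, kappa: logistic(tt, A, lam, kappa, N)
    popt, _ = curve_fit(model, t, Ic, p0=[A0, lam0, kappa0])
    return popt

# Synthetic test curve with known parameters (illustrative values)
t = np.linspace(0.0, 100.0, 201)
Ic_synthetic = logistic(t, A=0.6, lam=0.15, kappa=40.0)
A_fit, lam_fit, kappa_fit = fit_logistic(t, Ic_synthetic)
```

Applied to a numerical solution of the SIIR model, the same routine would return the triple (A, λ, κ) for that solution; the quality of such a fit has to be judged case by case.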

Fig. 3. Numerical solution (blue) and fitted logistic function (red) of $I_{c,1}$ as a function of time for different combinations of $\sigma_{1,2}$: top row: $(\sigma_1,\sigma_2)=(0.75,0.5),(0.75,1.25),(0.75,2)$, middle row: $(\sigma_1,\sigma_2)=(1.5,0.5),(1.5,1.25),(1.5,2)$, bottom row: $(\sigma_1,\sigma_2)=(2,0.5),(2,1.25),(2,2)$. All plots use $S_0=0.9$, $I_{1,0}=0.099$, $I_{2,0}=0.001$.

Finally, using the result that $I_{c,1}$ can be approximated using a logistic function, we can quantify more concretely the impact of the appearance of variant 2. To this end, we plot in Fig. 4 the values $(A,\lambda,\kappa)$ of the approximations of $I_{c,1}$ as functions of $\sigma_2$. These examples suggest that an existing variant of a disease is strongly impacted by the appearance of a new variant only if the latter is significantly more infectious, i.e. if $\sigma_2$ is much larger than $\sigma_1$ (in many cases approximately by a factor of 2).

Fig. 4. Fit parameters $(A,\lambda,\kappa)$ for different values of $\sigma_1$ as functions of $\sigma_2$. All plots are shown for $S_0=0.9$, $I_{1,0}=0.099$, $I_{2,0}=0.001$.

We can summarise the basic findings suggested by the simple model (2.3), (2.4) through the following three points:

  • Despite not being an exact solution, logistic functions (sigmoids) are good approximations to describe competing variants of a disease.

  • Key parameters of a variant in competition with a second variant that is not significantly more infectious can be described, to first approximation, by the reproduction numbers σ1,2 alone (rather than individually γ1,2 and ε1,2).

  • In order to have a significant impact on an existing variant (i.e. to change key parameters such as the asymptotic numbers of infected individuals, the effective infection rates, etc.), a new variant needs to have a much higher reproduction number, $\sigma_2\gg\sigma_1$.

2.3. Towards RG flows

Having understood the impact of variants on the evolution of the infections in a simple SIIR model, we now want to understand the dynamics of the system in analogy to the renormalisation group in particle physics [28], [29]. This is a preliminary step that will help us gear up towards the inclusion of variants in the eRG framework. In practice, the solutions of the system of Eqs. (2.3), (2.4) can be thought of as describing the flow of the physical system (here represented by the cumulative number of infected by the two variants) in the plane

$$P=\left\{(I_{c,1},I_{c,2})\in[0,N]\times[0,N]\ \middle|\ I_{c,1}+I_{c,2}\leq N\right\}. \tag{2.11}$$

The trajectories are parametrised by time $t$. Following the eRG approach [27], we shall try to interpret these trajectories as renormalisation group flows from a repulsive fixed point at $(I_{c,1},I_{c,2})=(0,0)$ to a new fixed point $(I_{c,1},I_{c,2})\neq(0,0)$ in the plane P. In this regard, we shall consider the initial conditions (2.4) as small deformations away from the initial fixed point. Keeping $S_0$ fixed, this gives a line (a co-dimension 1 surface) of starting points of trajectories in the plane P, which can be found by solving (2.3) with the initial conditions satisfying

$$S_0=1-I_{1,0}-I_{2,0}. \tag{2.12}$$

We then plot $(I_{c,1}(t),I_{c,2}(t))$ for successive times $t\in[0,\infty)$. The result is shown schematically, for different values of $\sigma_{1,2}$, in Fig. 5. We remark that here (and in the following) we have assumed $\varepsilon_1=\varepsilon_2$, i.e. that the removal rates for both variants are the same.2

Fig. 5. Trajectories for different initial conditions in the P plane for $S_0=0.999$ and $(\sigma_1,\sigma_2)=(0.85,0.95)$ (left panel) and $(\sigma_1,\sigma_2)=(1.075,1.1)$ (right panel).

Before analysing in more detail the cases of σ1,2<1 and σ1,2>1 (which show certain qualitative differences), we first remark that the flows, i.e. the lines in Fig. 5, do not continue indefinitely but end at definite points in the P-plane. These can be considered fixed points of the flow, as the evolution of the epidemic stops once these points are reached. More concretely, we observe that the line of initial conditions flows to another line of end-points. In fact, from the point of view of the SIIR model, the endpoints are equivalent, and correspond to the same final state of the system. To understand this, we recall that the asymptotic solutions of the model (2.3), (2.4) correspond to the eradication of both variants

$$\lim_{t\to\infty} I_1(t)=0=\lim_{t\to\infty} I_2(t). \tag{2.13}$$

Therefore at the endpoints in Fig. 5 (which correspond to $t\to\infty$), only susceptible and removed individuals remain in the system and

$$\lim_{t\to\infty}\left(I_{c,1}+I_{c,2}\right)(t)=\lim_{t\to\infty} N R(t). \tag{2.14}$$

Hence, the co-dimension 1 line of end-points in Fig. 5 reflects different weighted re-distributions of the removed individuals into Ic,1 and Ic,2. In the SIIR model, however, the removed individuals are indistinguishable as they are collected in a single compartment R, and all the final configurations in the end-point line correspond to the same type of final state of the SIIR compartments (i.e. with I1=I2=0). In other words, while the flow lines keep track of which individuals have been infected with which variant, this distinction becomes irrelevant at the level of the SIIR system when all previously infected individuals have become (indistinguishable) removed individuals. We shall see that this redundancy of the endpoints of the ‘flow’ of the system has a very natural interpretation from the perspective of the epidemic renormalisation group.
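The relation (2.14) can be made explicit with a short derivation, sketched here from the definition (2.5), the system (2.3) and the constraints (2.2) and (2.4); it is added for the reader's convenience and holds at any finite time before the limit is taken:

```latex
\begin{aligned}
\frac{d}{dt}\left(I_{c,1}+I_{c,2}\right)(t)
  &= N\left(\gamma_1 I_1(t)+\gamma_2 I_2(t)\right) S(t)
   = -N\,\frac{dS}{dt}(t), \\
\left(I_{c,1}+I_{c,2}\right)(t)
  &= N\left(I_{1,0}+I_{2,0}\right) + N\left(S_0 - S(t)\right)
   = N\left(1-S(t)\right)
   = N\left(I_1(t)+I_2(t)+R(t)\right).
\end{aligned}
```

Taking the limit of large times and using (2.13) then reproduces (2.14): only the sum of the two cumulative numbers is fixed by the final state, not their individual values.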

2.3.1. Flow for σ1,2<1

We shall now analyse in more detail the trajectories of the system in the P-plane and we first focus on the case

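$$\sigma_1\equiv\frac{\gamma_1}{\varepsilon_1}<1, \qquad \text{and} \qquad \sigma_2\equiv\frac{\gamma_2}{\varepsilon_2}<1. \tag{2.15}$$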

In the simple SIR model [13], the values of $\sigma_{1,2}$ correspond to the basic reproduction numbers of each variant. Physically, they give the average number of (susceptible) individuals who are infected by a single infectious one during the period in which the latter is infectious. The condition $\sigma_i<1$, therefore, implies that the number of infectious individuals $I_i(t)$ is a monotonically decreasing function of time: since $S(t)\leq 1$, Eq. (2.3) gives $\frac{dI_i}{dt}(t)=\left(\gamma_i S(t)-\varepsilon_i\right) I_i(t)\leq\varepsilon_i\left(\sigma_i-1\right) I_i(t)<0$ $\forall\, t>0$.

As is visible in the left panel of Fig. 5, under the condition (2.15) the end points of the trajectories of the system follow an approximate linear relation in the P-plane. Concretely, upon defining the positions of the end-points as

$$I_{c,i}^{\text{asym}}(I_{1,0},I_{2,0})=\frac{1}{N}\lim_{t\to\infty} I_{c,i}(t), \qquad i=1,2, \tag{2.16}$$

which depend on the initial conditions (S0,I1,0,I2,0) subject to (2.12), we find to good approximation (see Fig. 6 for an example) the relation

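$$I_{c,2}^{\text{asym}}(I_{1,0},I_{2,0})\approx a\, I_{c,1}^{\text{asym}}(I_{1,0},I_{2,0})+b, \qquad \text{with} \quad a=-\frac{1+\frac{1}{\sigma_2} W\!\left(-S_0\,\sigma_2\, e^{-\sigma_2}\right)}{1+\frac{1}{\sigma_1} W\!\left(-S_0\,\sigma_1\, e^{-\sigma_1}\right)}, \quad b=1+\frac{1}{\sigma_2} W\!\left(-S_0\,\sigma_2\, e^{-\sigma_2}\right), \tag{2.17}$$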

where $W$ is the Lambert function. The coefficients $(a,b)$ can be inferred from the points $\left(I_{c,1}^{\text{asym}}(1-S_0,0),\,0\right)$ and $\left(0,\,I_{c,2}^{\text{asym}}(0,1-S_0)\right)$, which can be determined analytically from the SIR model [13] with 3 compartments (i.e. with only one variant of the disease).
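The two anchor points of the line can be evaluated numerically with the principal branch of the Lambert function. The sketch below uses `scipy.special.lambertw` and assumes the standard single-variant SIR final-size relation with R(0)=0, namely that the asymptotic fraction r of ever-infected individuals satisfies 1 − r = S0 exp(−σ r). The function names are illustrative, and the parameter values are those quoted for Fig. 6.

```python
import numpy as np
from scipy.special import lambertw

def sir_final_size(sigma, S0):
    """Asymptotic ever-infected fraction 1 + W(-S0*sigma*exp(-sigma))/sigma
    of a single-variant SIR model with R(0) = 0 (principal branch of W)."""
    w = lambertw(-S0 * sigma * np.exp(-sigma), k=0)
    return 1.0 + w.real / sigma

def endpoint_line(sigma1, sigma2, S0):
    """Slope a and intercept b of the straight line through the points
    (x0, 0) and (0, b), where x0 and b are the single-variant final sizes."""
    x0 = sir_final_size(sigma1, S0)   # only variant 1 present
    b = sir_final_size(sigma2, S0)    # only variant 2 present
    a = -b / x0
    return a, b

sigma1, sigma2, S0 = 0.85, 0.95, 0.999
a, b = endpoint_line(sigma1, sigma2, S0)
x0 = sir_final_size(sigma1, S0)
```

By construction the line passes through both single-variant endpoints; how well it describes the intermediate endpoints has to be checked against the numerical solutions.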

Fig. 6. Left panel: Approximation (2.17) of the codimension 1 surface (dashed black line) that represents the end-points of the trajectories in the P-plane. Each red dot represents $(I_{c,1}^{\text{asym}},I_{c,2}^{\text{asym}})$ at asymptotic time $t\to\infty$ for different choices of $(I_{1,0},I_{2,0})$ satisfying (2.12). Right panel: Numerical approximations (dashed black lines) for equal time-slices $(I_{c,1}(t),I_{c,2}(t))$ for finite $t$ (red dots). Both plots use $S_0=0.999$ and $(\sigma_1,\sigma_2)=(0.85,0.95)$.

In fact, even at finite time $t<\infty$, we have found numerical evidence for a linear relation between $I_{c,1}(t)$ and $I_{c,2}(t)$ with different initial conditions $(I_{1,0},I_{2,0})$: the right panel of Fig. 6 shows equal time slices of the flows in the P-plane, i.e. $(I_{c,1}(t),I_{c,2}(t))$ with different $(I_{1,0},I_{2,0})$ evaluated at the same time $t<\infty$ (red points), which are approximated by linear curves (dashed black lines).

Next, in order to make contact with a possible eRG approach to describe the flow of the system in the P-plane, we consider the time derivatives of (Ic,1(t),Ic,2(t)). According to the definition (2.5) they are given by

$$\frac{dI_{c,i}}{dt}(t)=N\gamma_i\, I_i(t)\, S(t), \qquad i=1,2. \tag{2.18}$$

After dividing by the total size of the population for simplicity, from studying the numerical solutions of (2.3), (2.4) we observe that these time derivatives can be well approximated by linear functions in Ic,i

$$\beta_i(I_{c,j}(t))\equiv\frac{1}{N}\frac{dI_{c,i}}{dt}(t)\approx\lambda_i\left(1-\frac{I_{c,i}(t)}{N A_i}\right), \qquad i=1,2, \tag{2.19}$$

where (λi,Ai) are constants that are implicitly functions of the initial conditions (I1,0,I2,0). Indeed, examples of the approximation are shown in Fig. 7. Here, in order to better gauge the impact of the initial conditions, we have parametrised them as

$$I_{1,0}=(1-S_0)\, r, \qquad I_{2,0}=(1-S_0)(1-r), \qquad \text{with} \quad r\in[0,1]. \tag{2.20}$$

In fact, one can also represent the trajectory of the system (for fixed initial conditions $(I_{1,0},I_{2,0})$) in the P-plane as following the vector field $\frac{1}{N}\left(\frac{dI_{c,1}}{dt},\frac{dI_{c,2}}{dt}\right)$, as is schematically shown in Fig. 8. This plot better illustrates the concept of flow of the system in the two dimensional plane: the system slows down as it approaches the fixed point. This is represented by the colour code of the flow lines (arrows in Fig. 8), matching the length of the derivative vector, and is also visible from the fact that the black dots of the trajectory (which represent the system after equal time intervals) become denser as they approach the fixed point.

Fig. 7. Linear approximation of $\left(I_{c,1},\frac{dI_{c,1}}{dt}\right)$ (left panel) and $\left(I_{c,2},\frac{dI_{c,2}}{dt}\right)$ (right panel) as a function of the parameter $r$ (defined in Eq. (2.20)). The red dots represent numerical solutions of $\left(I_{c,i},\frac{dI_{c,i}}{dt}\right)$ for different values of time $t$, grouped according to the initial conditions. All plots use $S_0=0.999$ and $(\sigma_1,\sigma_2)=(0.85,0.95)$.

Fig. 8. Trajectory of the system in the P-plane for a fixed initial condition (black line) following the vector field $\frac{1}{N}\left(\frac{dI_{c,1}}{dt},\frac{dI_{c,2}}{dt}\right)$ (the colour of the vectors represents the length). We have used the numerical parameters $r=\frac{1}{2}$, $S_0=0.999$ and $(\sigma_1,\sigma_2)=(0.85,0.95)$.

We can express the approximated derivatives (2.19) in the following form:

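$$\beta_i(I_{c,j})=\frac{1}{N}\frac{dI_{c,i}}{dt}(t)\approx\nabla_i\Phi(I_{c,j}(t)), \qquad \text{with} \quad \Phi(I_{c,j})=\sum_{j=1}^{2} I_{c,j}\,\lambda_j\left(1-\frac{I_{c,j}}{2 N A_j}\right), \tag{2.21}$$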

where we have used that βi in (2.19) only depends on Ic,i and we have defined the gradient operator in the P-plane

$$\nabla_i\equiv\frac{\partial}{\partial I_{c,i}}, \qquad i=1,2. \tag{2.22}$$

Hence, for fixed initial conditions, the trajectory of the system in the P-plane can (approximately) be described as a gradient flow. Notice in this regard that Φ (through (λi,Ai)), also implicitly depends on (I1,0,I2,0).

Before moving on to the cases σ1,2>1, in passing we make three more remarks:

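  • We find evidence for the fact that $(\lambda_i,A_i)$ depend also on $\sigma_i$, but not on $\sigma_{j\neq i}$. Concretely, the numerical solutions can be approximated by
    $$\lambda_i\approx\tilde{\lambda}_i\, r, \qquad A_i\approx\tilde{A}_i\, r\left(1+\frac{1}{\sigma_i} W\!\left(-S_0\,\sigma_i\, e^{-\sigma_i}\right)\right), \tag{2.23}$$
    where $(\tilde{\lambda}_i,\tilde{A}_i)$ are independent of $\sigma_{j\neq i}$.
  • The differential Eq. (2.19)
    $$\frac{dI_{c,i}}{dt}(t)=\lambda_i N\left(1-\frac{I_{c,i}}{N A_i}\right), \qquad \text{with} \quad I_{c,i}(t=0)=I_{1,0}, \tag{2.24}$$
    can be solved analytically
    $$I_{c,i}(t)=A_i N+\left(I_{1,0}-N A_i\right) e^{-\frac{\lambda_i}{A_i} t}. \tag{2.25}$$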
  • As we have explained, varying the initial conditions (I1,0,I2,0) while keeping S0 fixed as in (2.12) leads to a codimension 1 surface of fixed points. However, by changing at the same time (σ1,σ2), it is possible to find a family of trajectories that flow to a single point in the P-plane (an example is schematically shown in Fig. 9). This can be achieved for any choice of σ1,2 and is not limited to σ1,2<1.
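The closed-form solution of the linear flow equation quoted in the second remark above can be checked numerically: the sketch below compares a central finite-difference derivative of the solution with the right-hand side of the flow equation. The initial value is passed as a generic argument `Ic0`, and all parameter values are arbitrary illustrative choices rather than fitted values.

```python
import numpy as np

def Ic_linear_flow(t, lam, A, N, Ic0):
    """Closed-form solution A*N + (Ic0 - N*A) * exp(-lam*t/A)."""
    return A * N + (Ic0 - N * A) * np.exp(-lam * t / A)

def flow_rhs(Ic, lam, A, N):
    """Right-hand side lam*N*(1 - Ic/(N*A)) of the linear flow equation."""
    return lam * N * (1.0 - Ic / (N * A))

# Illustrative parameters (assumptions of this sketch)
lam, A, N, Ic0 = 0.002, 0.005, 1.0e6, 500.0
t = np.linspace(0.0, 20.0, 41)
h = 1.0e-5  # step of the central finite difference

numerical_derivative = (Ic_linear_flow(t + h, lam, A, N, Ic0)
                        - Ic_linear_flow(t - h, lam, A, N, Ic0)) / (2.0 * h)
expected_derivative = flow_rhs(Ic_linear_flow(t, lam, A, N, Ic0), lam, A, N)
```

The solution starts at `Ic0` and relaxes exponentially, with rate λ/A, towards the plateau value A N.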

Fig. 9. Modified trajectories where in addition to the initial condition also $(\sigma_1,\sigma_2)$ have been modified in such a way to lead to a single point in the P-plane.

2.3.2. Flow for σ1,2>1

We now move to analysing the case where both $\sigma_{1,2}>1$. The first difference from the previous case is that neither the asymptotic end points $(I_{c,1}^{\text{asym}},I_{c,2}^{\text{asym}})$ (see (2.16) for the definition) of the trajectories in the P-plane nor equal-time slices for finite $t$ can be approximated by linear relations. As an illustration, Fig. 10 shows the corresponding plots for $(\sigma_1,\sigma_2)=(1.2,1.3)$, highlighting visible deviations from a linear regime. We notice that these deviations become more significant the larger the difference between $\sigma_1$ and $\sigma_2$.

Fig. 10. Left panel: endpoints of the trajectories of the system in the P-plane for different initial conditions. Each red dot represents $(I_{c,1}^{\text{asym}},I_{c,2}^{\text{asym}})$ at asymptotic time $t\to\infty$ for different choices of $(I_{1,0},I_{2,0})$ satisfying (2.12). The black lines correspond to (linear) approximations for initial conditions leading to $I_{c,1,2}^{\text{asym}}\ll 1$ respectively. Right panel: Equal time-slices $(I_{c,1}(t),I_{c,2}(t))$ for finite $t$ (red dots) show deviations from approximate linear relations between $(I_{c,1},I_{c,2})$. Both plots use $S_0=0.999$ and $(\sigma_1,\sigma_2)=(1.2,1.3)$.

We now turn to the differential Eqs. (2.18) for $I_{c,i}(t)$. The main difference compared to the case $\sigma_{1,2}<1$ is that the factors $N\gamma_i I_i(t) S(t)$ can no longer be approximated by linear functions in $I_{c,i}$. Instead, we find evidence that the time-derivatives can be approximated by a quadratic function of the form

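$$\beta_i(I_{c,j}(t))\equiv\frac{1}{N}\frac{dI_{c,i}}{dt}(t)\approx\lambda_i\, I_{c,i}(t)\left(1-\frac{I_{c,i}(t)}{N A_i}\right)+\delta_i, \qquad i=1,2, \tag{2.26}$$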

where $(\lambda_i,A_i,\delta_i)$ are constants that implicitly depend on the initial conditions $(I_{1,0},I_{2,0})$ (we remark that $\delta_i\ll 1$ is generically a rather small parameter, since it is related to the initial conditions $I_{i,0}$). Indeed, examples of the approximations as functions of the initial conditions parametrised by $r$, as defined in Eq. (2.20), are shown in Fig. 11. By varying $\sigma_{1,2}$, we find that the quadratic approximation (2.26) becomes less satisfactory the larger the difference between $\sigma_1$ and $\sigma_2$.

Fig. 11. Quadratic approximation of $\left(I_{c,1},\frac{dI_{c,1}}{dt}\right)$ (left panel) and $\left(I_{c,2},\frac{dI_{c,2}}{dt}\right)$ (right panel) as a function of the parameter $r$ (defined in Eq. (2.20)). The red dots represent numerical solutions of $\left(I_{c,i},\frac{dI_{c,i}}{dt}\right)$ for different values of time $t$, grouped according to the initial conditions. All plots use $S_0=0.999$ and $(\sigma_1,\sigma_2)=(1.2,1.3)$.

As before, we can also represent the trajectory of the system (for fixed initial conditions $(I_{1,0},I_{2,0})$) in the P-plane as following the vector field $\frac{1}{N}\left(\frac{dI_{c,1}}{dt},\frac{dI_{c,2}}{dt}\right)$, as schematically shown in Fig. 12. Using (2.26), we can therefore write

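$$\beta_i(I_{c,j})=\frac{1}{N}\frac{dI_{c,i}}{dt}(t)\approx\nabla_i\Phi(I_{c,j}(t)), \qquad \text{with} \quad \Phi(I_{c,j})=\sum_{j=1}^{2}\left[I_{c,j}^2\,\frac{\lambda_j}{2}\left(1-\frac{2\, I_{c,j}}{3 N A_j}\right)+\delta_j\, I_{c,j}\right], \tag{2.27}$$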

where we have used the fact that $\beta_i$ in (2.26) only depends on $I_{c,i}$. The gradient operator in the P-plane, $\nabla_i$, is defined in (2.22). Hence, for fixed initial conditions, the trajectory of the system in the P-plane can be (approximately) described as a gradient flow. Notice in this regard that $\Phi$ also implicitly depends on $(I_{1,0},I_{2,0})$ through $(\lambda_i,A_i)$.
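That the cubic function Φ reproduces the quadratic approximation (2.26) under differentiation can be verified numerically. The sketch below compares a central finite-difference gradient of Φ with the quadratic β-functions at a sample point of the P-plane. All names and parameter values are illustrative assumptions and are not fitted to the SIIR solutions.

```python
import numpy as np

def beta_quadratic(Ic, lam, A, delta, N):
    """Quadratic approximation lam*Ic*(1 - Ic/(N*A)) + delta."""
    return lam * Ic * (1.0 - Ic / (N * A)) + delta

def potential(point, params, N):
    """Cubic potential: sum over both variants of
    Ic^2 * lam/2 * (1 - 2*Ic/(3*N*A)) + delta*Ic."""
    total = 0.0
    for Ic, (lam, A, delta) in zip(point, params):
        total += Ic**2 * lam / 2.0 * (1.0 - 2.0 * Ic / (3.0 * N * A)) + delta * Ic
    return total

def numerical_gradient(f, point, h=1.0e-6):
    """Central finite-difference gradient of f at the given point."""
    point = np.asarray(point, dtype=float)
    grad = np.zeros_like(point)
    for i in range(point.size):
        step = np.zeros_like(point)
        step[i] = h
        grad[i] = (f(point + step) - f(point - step)) / (2.0 * h)
    return grad

# Illustrative (lam_i, A_i, delta_i) for the two variants, with N = 1
N = 1.0
params = [(0.05, 0.3, 1.0e-4), (0.06, 0.4, 2.0e-4)]
point = np.array([0.1, 0.2])

grad_phi = numerical_gradient(lambda p: potential(p, params, N), point)
betas = np.array([beta_quadratic(Ic, lam, A, delta, N)
                  for Ic, (lam, A, delta) in zip(point, params)])
```

Since Φ is a sum of terms that each depend on a single cumulative number, each component of the gradient depends only on the corresponding variable.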

Fig. 12. Trajectory of the system in the P-plane for a fixed initial condition (black line) following the vector field $\frac{1}{N}\left(\frac{dI_{c,1}}{dt},\frac{dI_{c,2}}{dt}\right)$ (the colour of the vectors represents the length). We have used the numerical parameters $r=\frac{1}{2}$, $S_0=0.999$ and $(\sigma_1,\sigma_2)=(1.2,1.3)$.

2.3.3. Flow for σ1<1 and σ2>1

We briefly comment on the case where one of the σi (without loss of generality, we can choose it to be σ1) is smaller than 1 and the other one larger than 1. This corresponds to a situation, where I1(t) is a monotonically decreasing function of time, while I2(t) has a maximum before tending to 0 for large t.

The situation is a combination of the cases discussed in the previous subsections. Notably, the time evolution of Ic,1 can be approximated by an equation of the form (2.21) with a function Φ that is quadratic in Ic,1, while the time evolution of Ic,2 can be approximated by an equation of the form (2.27) with a Φ that is cubic in Ic,2. We remark that, as before, the approximations become less satisfactory the larger the difference between σ1 and σ2.

3. Mutation eRG (MeRG) approach

The SIIR model discussed in the previous Section is a simple model that allows one to gain basic intuition about the dynamics of two competing variants of a disease, without assuming too much about the basic ‘microscopic’ processes that govern its spread. However, due to its simplicity (notably the fact that infection and recovery rates are assumed to be constant in time), this model is not particularly useful for confronting (or even predicting) real world data. In principle, it is possible to extend the model by allowing for time-dependent (γi,εi) and/or adding additional compartments, however at the price of complicating the analysis and losing predictive power.

Hence, in the following we will use the intuition we have gained about the pandemic with two variants to extend the eRG approach. The latter is an economical effective approach that entails a high degree of predictivity in terms of the time structure of the pandemic. As discussed in [27], the eRG framework makes use of particular symmetries in the time evolution of an epidemic to give a simplified description of certain key quantities (namely the cumulative number of infected individuals), while more microscopic degrees of freedom have already been taken into account (or ‘integrated out’). See [36] for a more detailed description of this philosophy.

3.1. The model

The eRG approach consists in defining β-functions that govern the time evolution of the system at the global level. To this end, in analogy to particle physics, for each variant we first define an epidemic ‘coupling strength’ as a monotonic, differentiable and bijective, function αi=fi(Ic,i) of the cumulative number of infected individuals. Different choices of fi correspond to different renormalisation schemes and we expect physical results in general not to depend on the choice. The β-functions are then defined as

$$\beta_i = \frac{d\alpha_i}{dt} = \sum_{j=1}^{2}\frac{df_i}{dI_{c,j}}\,\frac{dI_{c,j}}{dt}(t)\,, \qquad i=1,2. \tag{3.1}$$

The intuition obtained from the SIIR model suggests modelling these β-functions as polynomials in Ic,i that are either linear (for variants with a basic reproduction number σi<1) or quadratic (for variants with a basic reproduction number σi>1). In fact, comparing the equations that emerged in the context of the SIIR model, i.e. (2.19) with (2.26) (and neglecting3 the small coefficient δi), we see that both can be brought into the same framework (3.1) simply by different choices of the functions fi, namely

$$f_i(I_{c,i}) = \begin{cases} c_i \ln(I_{c,i}) & \text{for } \sigma_i<1\,,\\ c_i\, I_{c,i} & \text{for } \sigma_i>1\,, \end{cases} \tag{3.2}$$

for some constant ci. Moreover, (2.21), (2.27) suggest further modelling the beta function in the form of a gradient equation, i.e.

$$\beta_i = \frac{d\alpha_i}{dt} = \sum_{j=1}^{2}\frac{df_i}{dI_{c,j}}\,\frac{dI_{c,j}}{dt}(t) = \nabla_i \Phi(I_{c,j})\,, \tag{3.3}$$

where the gradient operator is defined in (2.22) and Φ is a quadratic or cubic function, respectively. In fact, taking these results together with the liberty of scheme-redefinitions and comparing with the general form of the eRG for single variants (see [34]) we are naturally led to choose for simplicity fi(Ic,i)=Ic,i and model Φ as

$$\Phi(I_{c,j}) = \sum_{j=1}^{2} I_{c,j}^2\,\frac{\lambda_j}{2}\left(1-\frac{2 I_{c,j}}{3 N A_j}\right). \tag{3.4}$$

In practice, this means that we are modelling the time-evolution of each variant as independent of the other, except for a potential change in the parameters (λi,Ai). From the intuition obtained from the SIIR model, this is justified as long as the two variants do not have effective reproduction numbers that are largely different from one another (see Section 2.2). We shall see that this assumption also leads to reasonable results compared to real world data. Indeed, the system (3.3) for Φ given in (3.4) allows for an analytic solution of (Ic,1,Ic,2)(t) which can be written in terms of logistic functions

$$I_{c,i}(t) = \frac{N A_i}{1+e^{-\lambda_i (t-\kappa_i)}}\,, \qquad i=1,2, \tag{3.5}$$

which well reproduce the data for the first wave of the COVID-19 pandemic [27]. Here κ1,2 are integration constants that determine which trajectory the system follows in the P-plane. These parameters resemble the initial conditions (I1,0,I2,0) from the perspective of the SIIR model.
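The statement that the logistic functions (3.5) solve the flow equations can be verified numerically. The following minimal sketch assumes the scheme fi(Ic,i)=Ic,i, so that each variant obeys dIc/dt = λ Ic (1 − Ic/(NA)); the function names and the numerical values (population normalised to N=1) are illustrative choices, not part of the original analysis.

```python
import math


def logistic_cumulative(t, lam, kappa, n_a):
    """Cumulative infected I_c(t) = N*A / (1 + exp(-lam * (t - kappa))), cf. Eq. (3.5)."""
    return n_a / (1.0 + math.exp(-lam * (t - kappa)))


def beta_quadratic(i_c, lam, n_a):
    """Quadratic beta-function lam * I_c * (1 - I_c / (N*A)) for a single variant."""
    return lam * i_c * (1.0 - i_c / n_a)


def max_residual(lam, kappa, n_a, times, h=1e-4):
    """Largest mismatch between a central finite-difference dI_c/dt and the beta-function."""
    worst = 0.0
    for t in times:
        derivative = (logistic_cumulative(t + h, lam, kappa, n_a)
                      - logistic_cumulative(t - h, lam, kappa, n_a)) / (2.0 * h)
        expected = beta_quadratic(logistic_cumulative(t, lam, kappa, n_a), lam, n_a)
        worst = max(worst, abs(derivative - expected))
    return worst


# Illustrative values only (assumed, in units where N = 1).
LAM, KAPPA, N_A = 0.2, 16.0, 0.3
residual = max_residual(LAM, KAPPA, N_A, [float(t) for t in range(0, 61, 5)])
print(f"max |dI_c/dt - beta| = {residual:.2e}")
```

The residual is limited only by the finite-difference step, and the integration constant κ is the time at which the cumulative number reaches half of its asymptotic value NA.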

3.2. Structure of the β-functions in the P-plane

The vector field (β1(Ic,i),β2(Ic,i)) in the P-plane is schematically plotted in Fig. 13. It has four fixed points, i.e. points (Ic,1(k),Ic,2(k)) (for k=0,1,2,3) where

$$\beta_i\big(I_{c,1}^{(k)}, I_{c,2}^{(k)}\big) = 0\,, \qquad i=1,2, \tag{3.6}$$

explicitly given by

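$$\begin{aligned} P_0 &= \big(I_{c,1}^{(0)}, I_{c,2}^{(0)}\big) = (0,0)\,, & P_1 &= \big(I_{c,1}^{(1)}, I_{c,2}^{(1)}\big) = (N A_1, 0)\,,\\ P_2 &= \big(I_{c,1}^{(2)}, I_{c,2}^{(2)}\big) = (0, N A_2)\,, & P_3 &= \big(I_{c,1}^{(3)}, I_{c,2}^{(3)}\big) = (N A_1, N A_2)\,. \end{aligned} \tag{3.7}$$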

Among them, P0 is repulsive in all directions (i.e. in Fig. 13 all arrows point away from it) and corresponds to the case where no disease is (and never has been) present. In fact, moving away from this fixed point by infecting even only a small number of individuals of the population (with either of the two variants) causes the system to flow to one of the other three fixed points. Among them, P1,2 are repulsive in one direction, but attractive in the other. Since they are characterised by Ic,2=0 or Ic,1=0 respectively, they correspond to the endpoints of scenarios in which variant 2 or variant 1 is never present in the population (i.e. all infected individuals are infected with only one of the two variants). These fixed points can be reached only by the flow lines represented in black in Fig. 13, which are initiated by a deformation away from P0 along Ic,1 or Ic,2 only, respectively.4 Any deformation that switches on both Ic,1 and Ic,2 (i.e. any scenario in which individuals infected with both variants are present in the population) causes the system to flow to the fixed point P3; an example of such a flow is indicated in red in Fig. 13. Which trajectory is realised depends on the initial deformation, which is represented by the parameters κ1,2 in the solution (3.5) (and which is akin to the choice of different initial conditions in the case of the SIIR model).

Fig. 13.

Schematic structure of the RG-equations in the P-plane (the values used for the plot are Nλ1=0.2, Nλ2=0.25 and A1=0.3, A2=0.4). The β-functions exhibit four fixed points: P0,1,2,3 of which P0 is repulsive in all directions, P1,2 have one attractive direction and one repulsive one, while P3 is attractive in all directions. Different trajectories as solutions of (3.3) connecting the fixed points are indicated by the dotted lines. The colouring of the vectors indicates the norm (β1,β2), which respectively leads to smaller or larger distances between different points of the flow lines.
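The fixed-point structure described above can be checked with a few lines of code. This minimal sketch assumes the scheme fi(Ic,i)=Ic,i, for which βi = λi Ic,i (1 − Ic,i/(NAi)), and borrows the numbers of the caption for illustration with the population normalised to N=1 (an assumption of this example). Since each βi depends only on its own Ic,i, the Jacobian is diagonal and its entries directly give the stability along each axis. All function names are hypothetical.

```python
def beta(point, lam, n_a):
    """Components (beta_1, beta_2) of the flow at a point (I_c1, I_c2)."""
    return [lam[k] * point[k] * (1.0 - point[k] / n_a[k]) for k in range(2)]


def jacobian_diagonal(point, lam, n_a):
    """Diagonal entries d(beta_k)/d(I_ck) = lam_k * (1 - 2 * I_ck / (N*A_k))."""
    return [lam[k] * (1.0 - 2.0 * point[k] / n_a[k]) for k in range(2)]


def classify(point, lam, n_a):
    """Label each axis direction: a positive eigenvalue repels, a negative one attracts."""
    return tuple("repulsive" if eig > 0 else "attractive"
                 for eig in jacobian_diagonal(point, lam, n_a))


# Illustrative values (assumed, in units where N = 1).
LAM = (0.2, 0.25)
N_A = (0.3, 0.4)
FIXED_POINTS = {
    "P0": (0.0, 0.0),
    "P1": (N_A[0], 0.0),
    "P2": (0.0, N_A[1]),
    "P3": (N_A[0], N_A[1]),
}

for name, point in FIXED_POINTS.items():
    print(name, beta(point, LAM, N_A), classify(point, LAM, N_A))
```

The classification reproduces the pattern of the figure: two repulsive directions at P0, one of each kind at P1 and P2, and two attractive directions at P3.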

3.3. Critical surfaces and mutations

The assumption of a deformation away from the fixed point P0 in Fig. 13 (which represents the complete absence of the disease in the population) along a generic direction is not realistic, since it would correspond to the simultaneous appearance of infected individuals of both variants in the population. A more likely scenario would be the appearance of one variant first, while a second deformation at a later stage introduces the second variant. This dynamics can be understood from an RG perspective as the switching on of a relevant operator.

To make this statement more precise, we first need to introduce the concept of a critical surface associated with a fixed point of the β-functions. A critical surface consists of all points in the P-plane from where the RG-flow leads to the fixed point in question. Concretely, for the fixed points P1,2,3, the critical surfaces are

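$$\begin{aligned} \mathcal{C}_{P_1} &= \{(I_{c,1},I_{c,2})\in\mathbb{P}\,|\, I_{c,1}>0 \text{ and } I_{c,2}=0\}\,,\\ \mathcal{C}_{P_2} &= \{(I_{c,1},I_{c,2})\in\mathbb{P}\,|\, I_{c,1}=0 \text{ and } I_{c,2}>0\}\,,\\ \mathcal{C}_{P_3} &= \{(I_{c,1},I_{c,2})\in\mathbb{P}\,|\, I_{c,1}>0 \text{ and } I_{c,2}>0\}\,. \end{aligned} \tag{3.8}$$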

A relevant operator (from the perspective of the fixed point in question) corresponds to a direction that drives the theory away from the critical surface, such that it flows to a new critical point. In the case at hand, CP1 and CP2 each have one critical direction orthogonal to them, which, from an epidemiological perspective, precisely corresponds to the appearance of the second variant. A small deformation at any point of CP1,2 (for example due to a relevant mutation of the virus) causes the system to deviate from the critical surface and ultimately flow towards P3.
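The effect of a relevant deformation can be illustrated by integrating the flow numerically. The sketch below is only an illustration under stated assumptions: it uses the scheme fi(Ic,i)=Ic,i, a simple forward-Euler integrator (chosen for brevity, not taken from the paper), illustrative parameters with N=1, and hypothetical function names. A starting point on CP1 flows to P1, whereas the same point with a tiny admixture of the second variant ends up at P3.

```python
def beta(i1, i2, lam, n_a):
    """Flow components for the two variants in the scheme f_i = I_ci."""
    return (lam[0] * i1 * (1.0 - i1 / n_a[0]),
            lam[1] * i2 * (1.0 - i2 / n_a[1]))


def critical_surface(i1, i2):
    """Name of the critical surface of Eq. (3.8) containing the point, or None at the origin."""
    if i1 > 0 and i2 == 0:
        return "C_P1"
    if i1 == 0 and i2 > 0:
        return "C_P2"
    if i1 > 0 and i2 > 0:
        return "C_P3"
    return None


def flow_endpoint(i1, i2, lam, n_a, dt=0.01, steps=100_000):
    """Follow the flow with forward Euler steps for a total time dt * steps."""
    for _ in range(steps):
        b1, b2 = beta(i1, i2, lam, n_a)
        i1 += dt * b1
        i2 += dt * b2
    return i1, i2


# Illustrative values (assumed, in units where N = 1).
LAM = (0.2, 0.25)
N_A = (0.3, 0.4)

on_surface = flow_endpoint(0.05, 0.0, LAM, N_A)    # stays on C_P1, ends at P1
deformed = flow_endpoint(0.05, 1e-6, LAM, N_A)     # relevant deformation, ends at P3
print(critical_surface(0.05, 0.0), on_surface)
print(critical_surface(0.05, 1e-6), deformed)
```

However small the initial value of Ic,2 is, the endpoint changes from P1 to P3; only the time spent near P1 depends on the size of the deformation.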

In a scenario with only two variants, the fixed point P3 has no relevant deformations and is attractive along all directions. Instead, small fluctuations along trajectories leading towards P3 (such as the red path shown in Fig. 13) can be interpreted as irrelevant operators being switched on. To make this statement more concrete, instead of (Ic,1,Ic,2) we can consider the following functions5

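$$O_+ = \frac{A_1 A_2}{A_1^2+A_2^2}\left(\frac{I_{c,1}}{N A_2}+\frac{I_{c,2}}{N A_1}\right), \quad \text{and} \quad O_- = \frac{A_1 A_2}{A_1^2+A_2^2}\left(-\frac{I_{c,1}}{N A_1}+\frac{I_{c,2}}{N A_2}\right). \tag{3.9}$$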

Notice that, at the fixed point P3, we have

$$O_-(I_{c,1}=N A_1,\, I_{c,2}=N A_2) = 0\,. \tag{3.10}$$

Using the explicit solutions of (Ic,1,Ic,2)(t) in (3.5) we have schematically plotted O± and their β-functions as functions of time in Fig. 14. The latter can be written in the form

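$$\beta_\pm = \frac{dO_\pm}{dt} = \frac{1}{N(A_1^2+A_2^2)}\,\nabla_\pm \Phi(O_\pm)\,, \quad \text{with} \quad \nabla_\pm = \frac{\partial}{\partial O_\pm}\,. \tag{3.11}$$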

We can then interpret the time evolution of the system as a flow of O+ from the repulsive fixed point O+=0 to the attractive fixed point O+=1, while O− can be interpreted as an irrelevant operator that is switched on along the way, which does not drive the system away from the fixed point (O+,O−)=(1,0).

Fig. 14.

Parametric Plot of the operators O± (left panel) and their β-functions (right panel) as functions of time. The plots use Nλ1=0.2, Nλ2=0.25, A1=0.3, A2=0.4 and κ1=16, κ2=25.
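The behaviour of the operators O± along a trajectory can be reproduced from the logistic solutions (3.5). The sketch below is illustrative: it takes the numbers quoted in the caption as plain values with N=1 (an assumption), and it assumes that O− carries a relative minus sign in front of the Ic,1 term, which is the choice that makes O− vanish at P3. The function names are hypothetical.

```python
import math


def logistic(t, lam, kappa, n_a):
    """Logistic solution I_c(t) = N*A / (1 + exp(-lam * (t - kappa)))."""
    return n_a / (1.0 + math.exp(-lam * (t - kappa)))


def operators(i1, i2, a1, a2, n=1.0):
    """Rotated combinations (O_plus, O_minus) of the cumulative numbers of infected.

    The sign of the first term of O_minus is an assumption of this sketch.
    """
    prefactor = a1 * a2 / (a1 ** 2 + a2 ** 2)
    o_plus = prefactor * (i1 / (n * a2) + i2 / (n * a1))
    o_minus = prefactor * (-i1 / (n * a1) + i2 / (n * a2))
    return o_plus, o_minus


# Illustrative values (assumed, in units where N = 1).
A1, A2 = 0.3, 0.4
LAM1, LAM2 = 0.2, 0.25
KAPPA1, KAPPA2 = 16.0, 25.0


def operators_at(t):
    """Evaluate (O_plus, O_minus) along the logistic trajectory at time t."""
    i1 = logistic(t, LAM1, KAPPA1, A1)
    i2 = logistic(t, LAM2, KAPPA2, A2)
    return operators(i1, i2, A1, A2)


for t in (0, 20, 40, 200):
    o_plus, o_minus = operators_at(t)
    print(f"t={t:3d}  O+={o_plus:.4f}  O-={o_minus:+.4f}")
```

Along the trajectory O+ interpolates between 0 and 1, while O− is switched on at intermediate times and returns to zero at the endpoint.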

3.4. Multi wave dynamics and CeRG

We now consider in more detail a particular case of a flow along the critical surface CP1 with a relevant operator being switched on along the way (i.e. the second variant appearing at some moment t0>0) through a small fluctuation. If σ2 is not too large compared to σ1, for some time after the relevant deformation is turned on, the theory stays close to the (initial) critical surface CP1 and flows towards the fixed point P1. At some later time, the relevant deformation along direction Ic,2 becomes too large and the flow runs significantly away from the critical surface CP1. If the deformation appears well after I1(t) has reached a maximum, the flow can be described as a crossover flow (see Fig. 15 for a schematic example): the RG-flow can be decomposed into a flow along the original critical surface CP1 followed by a flow perpendicular to it. The latter drives the system from the proximity of the fixed point P1 to the new one P3. From the perspective of the new fixed point, the second phase of the flow looks like an RG flow from a UV fixed point P1 to an IR one P3.

Fig. 15.

Left panel: Schematic example of a crossover flow. During the first part, the system follows closely the critical surface CP1 (i.e. parallel to the Ic,1-axis) and comes close to the fixed point P1 without reaching it. After staying for some time in the vicinity of P1 the system enters into the second part of the flow to the fixed point P3. Right panel: Cumulative numbers of infected as functions of time. The total number of infected shows a two-wave structure with a linear growth phase at around t=30 corresponding to the time that the RG flow is in proximity to P1. The numerical values used for the plots are Nλ1=0.2, Nλ2=0.25 and A1=0.3, A2=0.4.

Since during the first part of the flow, the number of active infected of the second variant I2(t) is fairly small, this flow is very well approximated by a usual eRG dynamics (see Eq. (3.3)), which has been shown [27] to describe real world data very well. Once the system reaches the vicinity of the fixed point P1, it will then enter into a quasi-linear growth phase, in which the number of active infected with respect to both variants is small and therefore the total number of infected (Ic,1+Ic,2)(t) only grows linearly (see Fig. 15). However, after a certain time, the number of infected Ic,2 will grow exponentially (while the number of infected with respect to the original variant remains small) and the system enters into the crossover phase. Now, the β-function for Ic,2 is, once again, essentially modelled by a standard eRG equation (see Eq. (3.3)) describing the flow to P3.

In this picture, the two-wave structure is explained as the (more or less successive) appearance of two different variants of the disease. In particular, the linear-growth phase (that has for example been observed in real world data in the inter-wave period of the COVID-19 pandemic [35]) is explained by the fact that the system comes close to a fixed point, which, however, it cannot reach. It nevertheless spends significant time in its proximity. A similar reasoning underlies the Complex epidemic Renormalisation Group (CeRG) approach [34]. The CeRG β-function of the following type was proposed for the combined number of cumulative infected Ic,tot=Ic,1+Ic,2

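$$\beta_{\rm CeRG} = \frac{dI_{c,{\rm tot}}}{dt}(t) = \lambda\, I_{c,{\rm tot}}(t)\left[\left(1-\zeta\,\frac{I_{c,{\rm tot}}(t)}{A}\right)^{2}-\delta\right]^{p_1}\left(1-\frac{I_{c,{\rm tot}}(t)}{A}\right)^{p_2}, \tag{3.12}$$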

with A the asymptotic number of infected, λ the infection rate, ζ>1 and δ<0. Indeed, besides the fixed points Ic,tot=0 and Ic,tot=A, the beta-function (3.12) also has the complex fixed points Ic,tot=(A/ζ)(1±i√|δ|), which cannot be reached by the flow, but are responsible for the linear-growth phase.
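The role of the complex fixed points can be made concrete with a small numerical example. The sketch below evaluates a β-function of the form (3.12) and the location of its complex zeros; the numbers are illustrative assumptions of this example (with δ taken negative, as required in the text), and the function names are hypothetical.

```python
import cmath


def cerg_beta(i_tot, lam, a, zeta, delta, p1, p2):
    """CeRG beta-function of the form (3.12) for real 0 <= i_tot <= a and delta < 0."""
    bracket = (1.0 - zeta * i_tot / a) ** 2 - delta
    return lam * i_tot * bracket ** p1 * (1.0 - i_tot / a) ** p2


def complex_fixed_points(a, zeta, delta):
    """Zeros of the square bracket: (a / zeta) * (1 +/- sqrt(delta)), complex for delta < 0."""
    root = cmath.sqrt(delta)  # equals i * sqrt(|delta|) when delta < 0
    return (a / zeta) * (1.0 + root), (a / zeta) * (1.0 - root)


# Illustrative values (assumptions of this sketch).
LAM, A, A_OVER_ZETA = 0.2075, 0.6996, 0.2995
DELTA, P1, P2 = -0.0021, 0.336, 0.959
ZETA = A / A_OVER_ZETA

z_plus, z_minus = complex_fixed_points(A, ZETA, DELTA)
beta_near_complex_pair = cerg_beta(A_OVER_ZETA, LAM, A, ZETA, DELTA, P1, P2)
print("complex fixed points:", z_plus, z_minus)
print("beta at Re(fixed point):", beta_near_complex_pair)
print("beta at I_tot = 0.15:   ", cerg_beta(0.15, LAM, A, ZETA, DELTA, P1, P2))
```

On the real axis the β-function is strongly suppressed near A/ζ but stays positive, so the flow slows down without stopping, which is the origin of the quasi-linear growth phase.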

To compare (3.3), (3.4) with the β-function in (3.12), we assume that σ1 and σ2 are not significantly different from one another and that the mutation occurs significantly after the maximum number of infected of the first variant (such that the condition of a crossover flow is satisfied). Furthermore, we can write the following combined β-function for the total cumulative number of infected

βtot=1NdIc,1dt(t)+dIc,2dt(t)θNA1Ic,1λ1Ic,11Ic,1A1+θIc,2ξN(A1+A2)Ic,2λ2Ic,2NA11Ic,2NA1NA2, (3.13)

where θ is the Heaviside step-function. Furthermore, we can use the solutions (3.5) of (Ic,1,Ic,2)(t) as logistic functions to schematically plot (3.13): the left panel of Fig. 16 shows a parametric plot of (Ic,1(t)+Ic,2(t), d(Ic,1+Ic,2)/dt (t)) for different values of t. The latter are very well approximated by (3.13) (shown by the thin black line), except for a small region around Ic,tot≈A1, in which the beta-function does not in fact reach zero, but interpolates between the two terms in (3.13). This region corresponds to a non-trivial interaction between the variants and governs the transition from the first part of the flow (close to the original critical surface) to the crossover flow. It precisely corresponds to the linear growth region in the context of the CeRG: indeed, a similar shape of the beta-function can also be achieved through a function of the form (3.12), as is shown in the right panel of Fig. 16. The region around Ic,tot≈A1 corresponds to the RG-flow not quite reaching a zero, thus leading to the quasi-linear growth phase.

Fig. 16.

Schematic plot of the beta-function of the total number of infected computed from the solutions (3.5) (red dots). Left panel: comparison to (3.13), right panel: fitting with a β-function of the type (3.12), with κ=0.2075, δ=0.0021, p1=0.336, p2=0.959, A=0.6996 and A/ζ=0.2995.

4. Explicit examples from the COVID-19 pandemic

In the following we present selected examples of number of infected individuals for different countries during the COVID-19 pandemic. These serve merely as illustration for some of the points we have outlined in the previous sections.

4.1. Second wave in California

As a first example, we consider the evolution of the SARS-CoV-2 Epsilon6 variant relative to the other variants in California. The daily number of new cases is shown in Fig. 17 and we have also indicated changes in the social distancing and lockdown measures imposed at the same time. To obtain the number of new infections with the Epsilon variant relative to the others, we have used data from GISAID [38] that provide the sequenced genomes of samples taken in California from Sep/2020 until Apr/2021 (we refer to [1] for more details on our methodology): while only a fraction of all positive test samples per day is genetically analysed, we assume that the distribution of the Epsilon variant in this subset is representative of the distribution of the variant among all infected individuals in California. We have therefore first calculated the percentage of each variant among the sequences analysed at each specific date. By multiplying this percentage with the total number of new cases (obtained from [39], [40]) within all of California, we are able to extract an approximation of the number of new cases per day for each variant. The statistical uncertainty inherent in this procedure has been estimated and taken into account when fitting the cumulative number of cases (see Fig. 18 below). Due to the large number of tests and genome sequencings performed in California and the rather short duration of our study, the resulting uncertainty is moderate. Indeed, the maximal number of new infections for both curves in Fig. 17 occurs during a period of 2–3 months (Nov/2020 until Jan/2021), during which furthermore the regional lockdown measures have stayed unchanged. Therefore, in order to model the second wave of the COVID-19 pandemic in California, our implicit assumption that the parameters of the eRG model (i.e. λ1,2 and A1,2) are time-independent seems a reasonable starting point. Indeed, from the data in Fig. 17 we can calculate the cumulative number of infected individuals, which is shown in Fig. 18 along with an approximation in terms of logistic functions (3.5) with constant coefficients. Denoting the cumulative number of individuals infected with the Epsilon variant and the other ones by Ic,2 and Ic,1 respectively, the parameters extracted from the fit are given by

Fig. 17.

Number of daily new infections in California as a function of time since 08/Sep/2020. The epidemiological data have been extracted from [39], [40], while the number of cases for the Epsilon variant are based on genome sequencing data extracted from [38]. Finally, the information about lockdown and social distancing measures is taken from [41].

Fig. 18.

Cumulative number of infected individuals with respect to the different variants in California since 08/Sep/2020 and their approximation in terms of logistic functions. The latter have in fact been optimised to fit the daily new infections taking into account their statistical errors.

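$$\begin{aligned} N A_1 &= (1.644 \pm 0.021)\cdot 10^{6}\,, & \kappa_1 &= 105.92 \pm 0.40\,, & \lambda_1 &= 0.0590 \pm 0.0009\,,\\ N A_2 &= (1.090 \pm 0.014)\cdot 10^{6}\,, & \kappa_2 &= 123.00 \pm 0.32\,, & \lambda_2 &= 0.0799 \pm 0.0007\,. \end{aligned} \tag{4.1}$$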

These numbers are in fact determined by fitting the derivative of Ic,1,2 (i.e. the daily new cases), as shown in Fig. 19. With the help of these approximations, we can compute the function Φ in (3.4) which in turn allows us to compute the β-functions (3.3). The corresponding flow of the system in the (Ic,1,Ic,2)-plane is shown in the left panel of Fig. 20, where we have chosen the scheme fi(Ic,i)=Ic,i. The right panel of that figure gives the relative deviation of the β-functions from the time derivative of (Ic,1,Ic,2) as given by the number of daily new cases. The plot also indicates changes imposed by the government on the social distancing measures among the population, leading to a stronger deviation of the modelled β-functions from the actual data.
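The procedure used to split the daily new cases among variants (scaling the variant share in the sequenced subset to the total number of new cases) can be sketched as follows. The binomial standard error used for the uncertainty is an assumption of this example, since the text does not specify how the statistical uncertainty was estimated; the function name and the numbers in the usage line are hypothetical.

```python
import math


def variant_daily_cases(total_new_cases, n_sequenced, n_variant):
    """Estimate the daily new cases of one variant from sequencing data.

    The share of the variant among the sequenced samples is assumed to be
    representative of all new cases of that day. Returns (estimate, one_sigma),
    where one_sigma uses a binomial standard error on the share (an assumption).
    """
    if n_sequenced <= 0:
        raise ValueError("need at least one sequenced sample")
    if not 0 <= n_variant <= n_sequenced:
        raise ValueError("n_variant must lie between 0 and n_sequenced")
    share = n_variant / n_sequenced
    share_err = math.sqrt(share * (1.0 - share) / n_sequenced)
    return total_new_cases * share, total_new_cases * share_err


# Hypothetical day: 10000 new cases, 200 sequenced samples, 50 of them the variant.
estimate, sigma = variant_daily_cases(10_000, 200, 50)
print(f"variant cases: {estimate:.0f} +/- {sigma:.0f}")
```

The uncertainty shrinks with the square root of the number of sequenced samples, which is why regions with extensive sequencing yield a cleaner separation of the variants.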

Fig. 19.

Comparison of the time-dependence of the β-functions (3.3) for (Ic,1,Ic,2) with the numbers of daily new infections.

Fig. 20.

Left panel: flow of the system in the (Ic,1,Ic,2)-plane. The vectors indicate the β-function as given by (3.3), (3.4) (with fi(Ic,i)=Ic,i for i=1,2). Right panel: relative difference of the modelled β-function (3.3) with the actual number of daily infected along the flow of the system in the (Ic,1,Ic,2)-plane.

From the left panel of Fig. 20 we can see that the flow of the system for the most part is not close to either of the two axes, and thus does not have the hallmarks of a crossover flow. This is also clear from the fact that the maxima of daily new infections of the variant B.1.427 and the remaining ones are not very well separated in time (see Fig. 17). Nevertheless, we expect that the formulation of the flow in terms of the relevant operator O+ and the irrelevant operator O− defined in Eq. (3.9) may still give a viable description. The time dependence of O± is shown in the left panel of Fig. 21, while the right panel of the same figure shows an approximation of the corresponding β-functions. Finally, the flow of the system in the (O+,O−)-plane (along with the gradient vector field (3.11)) is shown in Fig. 22.

Fig. 21.

Left panel: The functions O± defined in Eq. (3.9) computed from the actual numbers of infected individuals as a function of time. Right panel: The corresponding β-functions together with their approximations (dashed black lines) implied by (4.1).

Fig. 22.

Flow of the system in the (O+,O−)-plane based on the data for California.

We remark that the main reason for the deviation of the flow from a gradient flow (as is showcased in the right panel of Fig. 20) can be attributed to the fact that the active number of infected individuals is not zero at the beginning and the end of the flow (see Fig. 19). This is, in fact, because we are not describing the entire pandemic in California, but only the period of September 2020 to April 2021, when a second wave hit the state. Therefore, we are not really describing the flow from one fixed point to another, but rather the flow between two linear-growth phases of the system. As such, the description in terms of the β-function (3.3) is only approximate, which is particularly visible at the beginning and the end of the flow.

4.2. Second and third waves in the UK

As a next example, we consider the time evolution of the SARS-CoV-2 Alpha7 variant in the United Kingdom. The daily number of new cases starting from 01/July/2020 is shown in Fig. 23, where we have also indicated changes in the social distancing and lockdown measures. The number of new infections with the Alpha variant has been extracted using GISAID [38] combined with epidemiological data from [39]. Due to the high number of PCR tests and genome analyses performed in the UK, the inherent statistical uncertainty is rather small (see e.g. Fig. 24). In contrast to the evolution (of a single wave) in California in the previous subsection, the time evolution studied in this example spans two distinct waves lasting roughly 5 months (Oct/2020 until Feb/2021). As Fig. 23 indicates, during this period,8 the lockdown measures have not remained constant, which potentially leads to additional effects, as we shall remark later on. From Fig. 23 we can compute the cumulative number of infected individuals, which is shown in Fig. 24. The latter also shows approximations in terms of logistic functions: denoting the cumulative number of individuals infected with the Alpha variant and the other ones by Ic,2 and Ic,1 respectively, the parameters extracted from the fit are given by

Fig. 23.

Number of daily new infections in the United Kingdom as a function of time since 01/July/2020. The epidemiological data have been extracted from [39], while the number of cases for the Alpha variant are based on genome sequencing data extracted from [38]. Finally, the information about lockdown and social distancing measures is taken from [42].

Fig. 24.

Cumulative number of infected individuals with respect to the different variants in the United Kingdom since 01/July/2020 and their approximation in terms of logistic functions. The latter have in fact been optimised to fit the daily new infections.

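$$\begin{aligned} N A_1 &= (1.895 \pm 0.035)\cdot 10^{6}\,, & \kappa_1 &= 136.06 \pm 0.78\,, & \lambda_1 &= 0.0447 \pm 0.0009\,,\\ N A_2 &= (2.007 \pm 0.030)\cdot 10^{6}\,, & \kappa_2 &= 197.12 \pm 0.33\,, & \lambda_2 &= 0.0812 \pm 0.0014\,. \end{aligned} \tag{4.2}$$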

As in the case of California, these numbers are in fact determined by fitting the derivative of Ic,1,2 (i.e. the daily new cases), as indicated in Fig. 25. While the fit of β2 correctly captures a single peak, the fit of β1 gives a single maximum rather than two. The appearance of the second maximum in the red curve in the left part of Fig. 25 (stemming from the maximum of the orange curve in Fig. 23 in the end of Dec/2020) cannot be explained with statistical uncertainties inherent in the way we have extracted the relative number of new infections with the Alpha variant and may have different reasons:

Fig. 25.

Comparison of the time-dependence of the β-functions (3.3) for (Ic,1,Ic,2) with the numbers of daily new infections in the UK.

  • (i) change of the infection rate of all variants due to modified social behaviour and/or travelling habits related to the Christmas holidays

  • (ii) local geographic effects not captured by the data

  • (iii) appearance of additional (subdominant) variants of the virus

  • (iv) non-trivial interaction of the virus variants that are not captured by the beta-functions (3.3)

Since the effect is rather small, we shall continue with the approximations in Fig. 25 and leave the analysis of this effect (notably the possibility (iv)) for future work. With these approximations we can compute the function Φ in (3.4) which in turn determines the β-functions (3.3). The corresponding flow of the system in the (Ic,1,Ic,2)-plane is shown in the left panel of Fig. 26, where we have chosen the scheme fi(Ic,i)=Ic,i. The right panel of that figure gives the relative deviation of the β-functions from the time derivative of (Ic,1,Ic,2) as given by the number of daily new cases.

Fig. 26.

Left panel: Vector field and flow of the system in the (Ic,1,Ic,2)-plane. Right panel: relative difference of the modelled β-function (3.3) with the actual number of daily infected along the flow of the system in the (Ic,1,Ic,2)-plane for the UK. As mentioned before, the deviations from the beta-function predicted by the eRG approach may be related to the changes in the lockdown measures or may indicate additional subleading effects not captured in the current approach.

From the flow diagram in Fig. 26 we can see that the system remains close to the Ic,1-axis. We can therefore try to see whether it can be parametrised in a fashion resembling a crossover flow and whether it is possible to explain the two-wave structure from the flow close to a fixed point. To this end, we have plotted the time derivative of the total number of infected individuals Ic,tot=Ic,1+Ic,2 in Fig. 27 along with approximations along the lines of Eq. (3.12), (3.13) (along with its 0.999 confidence interval). Indeed, we can see that the function has a pronounced local minimum, which models the proximity of the system to the fixed point near the Ic,1-axis and which is responsible for the short linear growth phase in the end of November/beginning of December, as can be seen in Fig. 23.

Fig. 27.

Approximation of the time derivative of Ic,tot=Ic,1+Ic,2 (red dots). Left panel: βtot as defined in (3.13); Right panel: βCeRG as in Eq. (3.12).

4.3. First and second waves in South Africa

As a final example, we consider the time evolution of the SARS-CoV-2 Beta9 variant in South Africa. The daily number of new cases starting from 08/March/2020 is shown in Fig. 28. The number of new infections with the Beta variant has been extracted using GISAID [38] combined with epidemiological data from [39]. Due to the rather small number of genome sequences of samples, the separation between the Beta variant and others is afflicted with a rather large (time-dependent) statistical uncertainty, which needs to be taken into account in the following and which makes the interpretation of certain results delicate. In Fig. 29 we have plotted the cumulative number of infected individuals alongside their uncertainties. This figure also shows an approximation in terms of logistic functions, which is weighted by the (time-dependent) uncertainties: denoting the cumulative number of individuals infected with the variant B.1.351 by Ic,2 and the cumulative number of infected with the other variants by Ic,1, the fit parameters along Eq. (3.5) are given by

Fig. 28.

Number of daily new infections in South Africa as a function of time since 08/March/2020. The epidemiological data have been extracted from [39], while the number of cases for the Beta variant are based on genome sequencing data extracted from [38]. Finally, the information about lockdown and social distancing measures is taken from [43].

Fig. 29.

Cumulative number of infected individuals for the variant B.1.351 (right panel) and all other variants (left panel) since 08/March/2020 and their approximation in terms of logistic functions. The error bars represent the (accumulated) statistical uncertainty following the rather low number of genome sequencings compared to the number of (positive) tests.

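$$\begin{aligned} N A_1 &= (0.636 \pm 0.009)\cdot 10^{6}\,, & \kappa_1 &= 131.86 \pm 0.35\,, & \lambda_1 &= 0.0739 \pm 0.0012\,,\\ N A_2 &= (0.852 \pm 0.020)\cdot 10^{6}\,, & \kappa_2 &= 298.91 \pm 1.08\,, & \lambda_2 &= 0.0424 \pm 0.0007\,. \end{aligned} \tag{4.3}$$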

These numbers are determined by fitting (part of) the derivative of Ic,1,2 (i.e. the daily new cases), as indicated in Fig. 30. With these approximations we may compute the function Φ in (3.4) which in turn gives an approximation of the β-functions (3.3). The flow of the system following the latter in the (Ic,1,Ic,2)-plane is shown in Fig. 31 (as before, we use the scheme fi(Ic,i)=Ic,i). In the right panel, for better visibility, we have combined the error bars of the (black) data points into a grey region: while the black trajectory is not actually hitting the predicted fixed point of the beta-function, the latter is within the error bars. It is therefore difficult to say whether this indicates additional subleading effects in the time-evolution of the virus, or merely statistical uncertainty. Apart from this effect, the flow strongly resembles a crossover flow, which for the first part follows the Ic,1-axis, coming close to a fixed point at (NA1,0). However, the appearance of the variant B.1.351 triggers a crossover flow to a new fixed point.

Fig. 30.

Comparison of the time-dependence of the β-functions (3.3) for (Ic,1,Ic,2) with the numbers of daily new infections in South Africa. The fits (along with their 0.99 confidence interval) are weighted with respect to the statistical uncertainty of the data. This explains for example, why the maximum of the red curve around day 300 in the left panel is effectively neglected: it is afflicted with a very high statistical uncertainty.

Fig. 31.

Vector field and flow of the system in the (Ic,1,Ic,2)-plane. Left panel: the black dots represent the actual numbers of cumulative infected (together with their error bars), while the red dots represent the predicted eRG flow which lies within the error bars. Right panel: for better visibility we have fused the error bars into the grey zone representing the statistical error of the cumulative numbers of infected individuals.

The fact that the flow in Fig. 31 stays close to the Ic,1-axis during the first part of the flow, allows us to model it in the form of a crossover flow. To this end, we have plotted the time derivative of the total number of infected individuals Ic,tot=Ic,1+Ic,2 in Fig. 32 along with approximations along the lines of Eq. (3.12), (3.13). Indeed, we can see that the function has a pronounced local minimum, which models the proximity of the system to the fixed point near the Ic,1-axis and which is responsible for the linear growth phase between the end of August 2020 and the end of November 2020 as can be seen in Fig. 28.
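The local minimum of the total β-function can be illustrated directly from the logistic fits. The sketch below uses only the central values of (4.3) and ignores their uncertainties; it evaluates the summed time derivative of the two logistic functions (3.5) on a daily grid between κ1 and κ2 and locates its minimum. It is a rough illustration with hypothetical function names, not a reproduction of the fits shown in the figures.

```python
import math


def daily_new_cases(t, n_a, kappa, lam):
    """Time derivative of the logistic (3.5): lam * N*A * s * (1 - s), with s the sigmoid."""
    s = 1.0 / (1.0 + math.exp(-lam * (t - kappa)))
    return lam * n_a * s * (1.0 - s)


# Central values of the fit (4.3); time is measured in days since 08/March/2020.
OTHERS = dict(n_a=0.636e6, kappa=131.86, lam=0.0739)
BETA_VARIANT = dict(n_a=0.852e6, kappa=298.91, lam=0.0424)


def total_daily(t):
    """Sum of the modelled daily new cases of both groups of variants."""
    return daily_new_cases(t, **OTHERS) + daily_new_cases(t, **BETA_VARIANT)


# Scan the days between the two inflection points kappa_1 and kappa_2.
days = range(int(OTHERS["kappa"]), int(BETA_VARIANT["kappa"]) + 1)
day_of_minimum = min(days, key=total_daily)

peak_others = OTHERS["lam"] * OTHERS["n_a"] / 4.0          # maximum of a single logistic wave
peak_beta = BETA_VARIANT["lam"] * BETA_VARIANT["n_a"] / 4.0

print("day of minimum:", day_of_minimum)
print("daily cases at minimum:", round(total_daily(day_of_minimum)))
print("peak heights:", round(peak_others), round(peak_beta))
```

In this simplified two-logistic picture the number of daily new cases at the minimum is more than an order of magnitude below either peak, so the cumulative number of infected grows only slowly between the two waves.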

Fig. 32.

Approximation of the time derivative of Ic,tot=Ic,1+Ic,2 (red dots). Left panel: βtot as defined in (3.13); Right panel: βCeRG as in Eq. (3.12) together with its 0.99 confidence interval.
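The origin of the local minimum in the time derivative of Ic,tot can be illustrated with a minimal sketch. It assumes that the cumulative number of infected individuals of each variant follows an independent logistic function; it does not use the actual functions of Eqs. (3.12), (3.13). All parameter values in WAVES are hypothetical, chosen only for illustration and not fitted to the South African data.

```python
import math

def logistic_rate(t, size, rate, t_mid):
    """Time derivative of a logistic curve size / (1 + exp(-rate * (t - t_mid)))."""
    s = 1.0 / (1.0 + math.exp(-rate * (t - t_mid)))
    return size * rate * s * (1.0 - s)

# Hypothetical (final size, growth rate, midpoint in days) for two variants.
# These are illustrative assumptions, not values taken from the paper.
WAVES = [(1.0, 0.10, 100.0), (1.5, 0.10, 180.0)]

def total_rate(t):
    """Daily new cases of both variants combined (derivative of Ic,tot)."""
    return sum(logistic_rate(t, *wave) for wave in WAVES)

days = list(range(0, 301))
rates = [total_rate(t) for t in days]

# Interior local minima of the combined rate on the daily grid.
minima = [
    days[i]
    for i in range(1, len(days) - 1)
    if rates[i] < rates[i - 1] and rates[i] < rates[i + 1]
]

for day in minima:
    print(f"local minimum of d(Ic,tot)/dt near day {day}: {total_rate(day):.5f}")
```

Around the minimum the combined rate is small but non-zero and changes only slowly, so the cumulative number grows approximately linearly there. In this toy picture, that is the phase in which the first variant has almost saturated while the second has not yet taken off.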

5. Conclusions

In this paper we have extended the epidemic Renormalisation Group (eRG) approach to study the time evolution of several competing variants of a disease. As a first step, in order to gain intuition, we have analysed a simple compartmental model (termed SIIR) with two different groups of infectious individuals (with different infection and recovery rates). Numerical solutions of the model indicate that, as long as the reproduction numbers of the two variants are not too different, the cumulative numbers of infected individuals of the two variants can be well approximated by independent logistic functions (sigmoids). Moreover, we have approximated the dynamics in terms of flow equations that describe the trajectories of the system in the P-plane, keeping track of the cumulative numbers of infected individuals. The resulting Eqs. (2.21), (2.27) can be compactly formulated in terms of the gradient of a single function Φ, which is quadratic for variants with a reproduction number σ<1 and cubic for σ>1. Furthermore, from a theoretical perspective, we find it striking that the endpoints of the flows are characterised by co-dimension 1 surfaces in the P-plane that represent equivalent systems from the perspective of the SIIR model, akin to surfaces of fixed points that are related by the action of marginal operators in the context of conformal field theories.
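A numerical solution of this type can be sketched as follows. The sketch assumes a standard mass-action form of a two-variant compartmental model, written for population fractions, with infection rates g1, g2 and recovery rates e1, e2 (so that the reproduction numbers are g1/e1 and g2/e2). The precise equations of Section 2 are not reproduced here, and all names and parameter values are hypothetical.

```python
def siir_rhs(state, g1, g2, e1, e2):
    """Right-hand side of an assumed SIIR model (population fractions).

    state = (S, I1, I2, R, C1, C2), where C1, C2 are the cumulative
    infected fractions of the two variants.
    """
    S, I1, I2, R, C1, C2 = state
    new1 = g1 * S * I1  # new infections per unit time, variant 1
    new2 = g2 * S * I2  # new infections per unit time, variant 2
    return (
        -(new1 + new2),
        new1 - e1 * I1,
        new2 - e2 * I2,
        e1 * I1 + e2 * I2,
        new1,
        new2,
    )

def rk4_step(state, dt, *params):
    """One classical fourth-order Runge-Kutta step."""
    def shifted(k, h):
        return tuple(s + h * ki for s, ki in zip(state, k))
    k1 = siir_rhs(state, *params)
    k2 = siir_rhs(shifted(k1, 0.5 * dt), *params)
    k3 = siir_rhs(shifted(k2, 0.5 * dt), *params)
    k4 = siir_rhs(shifted(k3, dt), *params)
    return tuple(
        s + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )

def simulate(g1=0.25, g2=0.30, e1=0.10, e2=0.10, i0=1e-4, days=400, dt=0.1):
    """Integrate the model; all default values are illustrative assumptions."""
    state = (1.0 - 2.0 * i0, i0, i0, 0.0, i0, i0)
    history = [state]
    for _ in range(int(round(days / dt))):
        state = rk4_step(state, dt, g1, g2, e1, e2)
        history.append(state)
    return history

history = simulate()
S, I1, I2, R, C1, C2 = history[-1]
print(f"final cumulative infected fractions: variant 1 = {C1:.4f}, variant 2 = {C2:.4f}")
```

Plotting the C1 and C2 columns of history against time gives the sigmoid-shaped curves that can then be compared with independent logistic functions.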

As a second step, we have used the intuition gained from the simple SIIR model to propose a generalisation of the eRG framework to include the dynamics of multiple variants. We have defined the flow Eq. (3.3), which we call the beta-function in analogy to the RG framework. Writing it in the form of a gradient equation, we have analysed its fixed points in the P-plane, as well as different trajectories connecting them. In particular, we have made contact with the CeRG approach [34], [35], which has modelled the multi-wave structure of epidemics with the help of complex fixed points: in this regard, quasi-linear growth phases separating two waves are explained by the system coming close to a complex fixed point, which it cannot reach. In the current paper we have shown that such a behaviour can occur naturally through the appearance of a new variant of the disease.
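The notions of a gradient flow and of repulsive and attractive fixed points can be made concrete in the simplest setting of a single variant. The sketch below assumes a logistic flow dI/dt = λ I (1 − I/A), which is the gradient of a cubic function Φ; it is not the two-variant function of Eq. (3.4). The values of LAM and A are hypothetical.

```python
LAM = 0.1  # hypothetical growth rate
A = 1.0    # hypothetical value of the cumulative number at the end of the wave

def phi(x):
    """Assumed cubic potential whose gradient gives the flow."""
    return LAM * (x ** 2 / 2.0 - x ** 3 / (3.0 * A))

def flow(x):
    """Flow dI/dt = LAM * I * (1 - I / A); it vanishes at the fixed points."""
    return LAM * x * (1.0 - x / A)

def flow_slope(x):
    """Derivative of the flow; its sign decides the stability of a fixed point."""
    return LAM * (1.0 - 2.0 * x / A)

def numerical_gradient(f, x, h=1e-6):
    """Central finite difference, used to check that flow = d(phi)/dx."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

fixed_points = [0.0, A]
stability = {
    x: "repulsive" if flow_slope(x) > 0 else "attractive" for x in fixed_points
}

def integrate(x0, days=300, dt=0.1):
    """Explicit Euler integration of the flow starting from x0."""
    x = x0
    for _ in range(int(round(days / dt))):
        x += dt * flow(x)
    return x

print(stability)
print("end point of the flow started at 1e-4:", integrate(1e-4))
```

A system placed exactly at the repulsive fixed point I = 0 stays there, while any small positive perturbation flows to the attractive fixed point I = A.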

Finally, we have confronted our model with data from the spread of different variants of SARS-CoV-2 in California, the United Kingdom and South Africa, thus empirically validating our approach.

In the future it will be interesting to further generalise and extend the approach developed here. On the one hand, going beyond two competing variants will lead to a richer structure of fixed points for the system, thus allowing us to model more complex multi-wave pandemics. On the other hand, our analysis has so far not taken into account other factors that govern the time evolution of pandemics, such as the impact of vaccines and non-pharmaceutical interventions, as well as the possibility of re-infections (through only partial immunity granted by recovery from a given variant). We hope to be able to return to these points in the future. Finally, we shall use the model developed in this paper for a computer-aided analysis of the spread of different SARS-CoV-2 variants in Europe and the USA along the lines of the work in our companion paper [1].

CRediT authorship contribution statement

Giacomo Cacciapaglia: Conceptualisation, Methodology, Writing – original draft. Corentin Cot: Conceptualisation, Methodology, Writing – original draft. Stefan Hohenegger: Conceptualisation, Methodology, Writing – original draft. Francesco Sannino: Conceptualisation, Methodology, Writing – original draft. Shahram Vatani: Conceptualisation, Methodology, Writing – original draft.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Footnotes

1

As we shall explain below, this assumption is one of the reasons why this model is not well suited to describing realistic epidemics or to confronting real-world data. However, it simplifies the analysis while still providing some intuition on the dynamics of two competing variants of a disease.

2

As we have seen in Section 2.2, choosing different ε1,2 leads to only small deviations, as long as σ1 and σ2 are not too different from one another.

3

As we have remarked above, in the SIIR model the (small) parameters δi are related to the (finite) initial conditions Ii,0 of the (active) number of infectious individuals which is necessary to start the time evolution of the system. In the context of the eRG, as we shall discuss in more detail below, we model the point (Ic,1,Ic,2)=(0,0) (i.e. the absence of the disease) as a fixed point, albeit a repulsive one, which requires δi=0.

4

In Section 3.3 below we shall discuss in more detail the case in which infectious individuals of the other variant appear along one of the black trajectories in Fig. 13. This scenario models, for example, the appearance of a mutation of an already present variant. In this case the system will flow to the fixed point P3 (rather than continue towards P1 or P2).

5

SH would like to thank Michele Della Morte for useful exchanges on the form of these functions as O(2) transformations of (Ic,1,Ic,2).

6

This variant comprises the lineages B.1.429 and B.1.427 under the Phylogenetic Assignment of Named Global Outbreak Lineages (pangolin) tool and was first detected in California in July 2020.

7

This variant is also called lineage B.1.1.7 under the pangolin tool and was first found in Nov. 2020 (in a sample dating from September 2020) in the UK.

8

We also remark that the time period includes the Christmas holidays, which traditionally lead to increased social activity and travel among the population.

9

This variant is also called lineage B.1.351 under the pangolin tool and was first found in the Eastern Cape province of South Africa.

References

  • 1. de Hoffer A., Vatani S., Cot C., Cacciapaglia G., Chiusano M.L., Cimarelli A., Conventi F., Giannini A., Hohenegger S., Sannino F. 2021. Variant-driven multi-wave pattern of COVID-19 via a machine learning analysis of spike protein mutations.
  • 2. Hamer W. Age-incidence in relation with cycles of disease prevalence. Trans. Epidem. Soc. Lond. 1896;15:64–77.
  • 3. Hamer W. Epidemic disease in England: The evidence of variability and of persistency of type; Lecture 1. Lancet. 1906:569–574.
  • 4. Hamer W. Epidemic disease in England: The evidence of variability and of persistency of type; Lecture 2. Lancet. 1906:655–662.
  • 5. Hamer W. Epidemic disease in England: The evidence of variability and of persistency of type; Lecture 3. Lancet. 1906:733–739.
  • 6. Ross R. The Prevention of Malaria. second ed. John Murray; London: 1911.
  • 7. Ross R. An application of the theory of probabilities to the study of a priori pathometry: Part I. Proc. R. Soc. A. 1916;92:204–230.
  • 8. Ross R., Hudson H. An application of the theory of probabilities to the study of a priori pathometry: Part II. Proc. R. Soc. A. 1916;93:212–225.
  • 9. Ross R., Hudson H. An application of the theory of probabilities to the study of a priori pathometry: Part III. Proc. R. Soc. A. 1916;93:225–240.
  • 10. McKendrick A. The rise and fall of epidemics. Paludism (Trans. Comm. Study Malar. India) 1912;1:54–66.
  • 11. McKendrick A. Studies on the theory of continuous probabilities, with special reference to its bearing on natural phenomena of a progressive nature. Proc. Lond. Math. Soc. 1914;13:401–416.
  • 12. McKendrick A. Applications of mathematics to medical problems. Proc. Edinburgh Math. Soc. 1926;44:98–130.
  • 13. Kermack W.O., McKendrick A., Walker G.T. A contribution to the mathematical theory of epidemics. Proc. R. Soc. Lond. Ser. A Math. Phys. Eng. Sci. 1927;115:700–721.
  • 14. Perc M., Jordan J.J., Rand D.G., Wang Z., Boccaletti S., Szolnoki A. Statistical physics of human cooperation. Phys. Rep. 2017;687:1–51.
  • 15. Wang Z., Andrews M.A., Wu Z.-X., Wang L., Bauch C.T. Coupled disease–behavior dynamics on complex networks: A review. Phys. Life Rev. 2015;15:1–29. doi: 10.1016/j.plrev.2015.07.006.
  • 16. Wang Z., Bauch C.T., Bhattacharyya S., d’Onofrio A., Manfredi P., Perc M., Perra N., Salathé M., Zhao D. Statistical physics of vaccination. Phys. Rep. 2016;664:1–113.
  • 17. Hethcote H.W. The mathematics of infectious diseases. SIAM Rev. 2000;42(4)
  • 18. Bailey N. The Mathematical Theory of Infectious Diseases. second ed. Hafner; New York: 1975.
  • 19. Essam J.W. Percolation theory. Rep. Progr. Phys. 1980;43:833.
  • 20. Stauffer D. Scaling theory of percolation clusters. Phys. Rep. 1979;54:1–74.
  • 21. Domb C. Fluctuation phenomena and stochastic processes. Nature. 1959;184:509–512.
  • 22. Peliti L. Path integral approach to birth-death processes on a lattice. J. Phys. France (Paris) 1985;46:1469–1483.
  • 23. Doi M. Second quantization representation for classical many-particle system. J. Phys. A: Math. Gen. 1976;9:1465.
  • 24. Doi M. Stochastic theory of diffusion-controlled reaction. J. Phys. A: Math. Gen. 1976;9:1479.
  • 25. Cardy J.L., Grassberger P. Epidemic models and percolation. J. Phys. A: Math. Gen. 1985;18(6):L267–L271.
  • 26. Grassberger P. On the critical behavior of the general epidemic process and dynamical percolation. Math. Biosci. 1983;63(2):157–172.
  • 27. Della Morte M., Orlando D., Sannino F. Renormalization group approach to pandemics: The COVID-19 case. Front. Phys. 2020;8:144.
  • 28. Wilson K.G. Renormalization group and critical phenomena. 1. Renormalization group and the Kadanoff scaling picture. Phys. Rev. B. 1971;4:3174–3183.
  • 29. Wilson K.G. Renormalization group and critical phenomena. 2. Phase space cell analysis of critical behavior. Phys. Rev. B. 1971;4:3184–3205.
  • 30. Cacciapaglia G., Sannino F. Interplay of social distancing and border restrictions for pandemics (COVID-19) via the epidemic Renormalisation Group framework. Sci. Rep. 2020;10:15828. doi: 10.1038/s41598-020-72175-4.
  • 31. Cacciapaglia G., Cot C., Sannino F. Mining Google and Apple mobility data: Temporal anatomy for COVID-19 social distancing. Sci. Rep. 2021;11:4150. doi: 10.1038/s41598-021-83441-4.
  • 32. Cacciapaglia G., Cot C., Islind A.S., Óskarsdóttir M., Sannino F. Impact of US vaccination strategy on COVID-19 wave dynamics. Sci. Rep. 2021;11(1):1–11. doi: 10.1038/s41598-021-90539-2.
  • 33. Cacciapaglia G., Cot C., Sannino F. Second wave COVID-19 pandemics in Europe: A temporal playbook. Sci. Rep. 2020;10:15514. doi: 10.1038/s41598-020-72611-5.
  • 34. Cacciapaglia G., Sannino F. Evidence for complex fixed points in pandemic data. Front. Appl. Math. Stat. 2021;7
  • 35. Cacciapaglia G., Cot C., Sannino F. Multiwave pandemic dynamics explained: How to tame the next wave of infectious diseases. Sci. Rep. 2021;11:6638. doi: 10.1038/s41598-021-85875-2.
  • 36. Cacciapaglia G., Cot C., Della Morte M., Hohenegger S., Sannino F., Vatani S. 2021. The field theoretical ABC of epidemic dynamics.
  • 37. Della Morte M., Sannino F. Renormalisation group approach to pandemics as a time-dependent SIR model. Front. Phys. 2021;8:583.
  • 38. GISAID, https://www.gisaid.org.
  • 39. Our World in Data, https://ourworldindata.org.
  • 40. New York Times GitHub, https://github.com/nytimes/covid-19-data/blob/master/us-states.csv.
  • 41. COVID-19 pandemic in California, https://en.wikipedia.org/wiki/COVID-19_pandemic_in_California.
  • 42. COVID-19 pandemic in the United Kingdom, https://en.wikipedia.org/wiki/COVID-19_pandemic_in_the_United_Kingdom.
  • 43. COVID-19 pandemic in South Africa, https://en.wikipedia.org/wiki/COVID-19_pandemic_in_South_Africa.

Articles from Physica A are provided here courtesy of Elsevier
