2021 Nov 19;154:111621. doi: 10.1016/j.chaos.2021.111621

Modeling the effect of the vaccination campaign on the COVID-19 pandemic

Mattia Angeli (a), Georgios Neofotistos (a), Marios Mattheakis (a), Efthimios Kaxiras (a, b)
PMCID: PMC8603113  PMID: 34815624

Abstract

Population-wide vaccination is critical for containing the SARS-CoV-2 (COVID-19) pandemic when combined with restrictive and prevention measures. In this study we introduce SAIVR, a mathematical model able to forecast the COVID-19 epidemic evolution during the vaccination campaign. SAIVR extends the widely used Susceptible-Infectious-Removed (SIR) model by considering the Asymptomatic (A) and Vaccinated (V) compartments. The model contains several parameters and initial conditions that are estimated by employing a semi-supervised machine learning procedure. After an unsupervised neural network is trained to solve the SAIVR differential equations, a supervised framework estimates the initial conditions and parameters that best fit the recent infectious curves of 27 countries. Informed by these results, we performed an extensive study of the temporal evolution of the pandemic under varying daily roll-out rates, vaccine efficacies, and a broad range of societal vaccine hesitancy/denial levels. The concept of herd immunity is questioned by studying future scenarios that involve different vaccination efforts and more infectious COVID-19 variants.

Keywords: COVID-19, Machine learning, Neural networks, Vaccines

1. Introduction

The World Health Organization (WHO) declared the SARS-CoV-2 (COVID-19) outbreak in Wuhan a pandemic on March 11, 2020. Since then, COVID-19 has become a serious global health threat due to its rapid spread, its transmission through asymptomatic infected individuals, and its complex epidemiological dynamics. As of May 2021, more than 3 million lives had already been lost to the virus, whose spread has so far been extremely difficult to contain.

By the end of 2020, the successful development of effective vaccines and the onset of their widespread distribution in most of the world’s countries were hailed as the decisive means to contain the pandemic. However, important questions linger on whether the vaccination effort will succeed in effectively eradicating the disease. The appearance and wide spread of more contagious SARS-CoV-2 strains, the onset and scale of the vaccine deployment, and high levels of vaccine hesitancy/denial in society are among the key factors hindering the vaccination effort and the achievement of herd immunity. Modeling the impact of these key factors on the evolution of the pandemic is of critical importance for assessing how effective vaccination is against it.

In studying past epidemics, scientists have systematically applied “random mixing” compartmental models, which assume that an infectious individual can spread the disease to any susceptible member of the population before recovering or being removed, as originally considered by Kermack and McKendrick [1]. These models divide the total population into compartments that correspond to stages of the infection and describe the flows among them.

In the present study we propose a new model named SAIVR, which incorporates two important characteristics of the COVID-19 epidemic, namely the considerable transmission of the disease by asymptomatic infected individuals and the vaccination campaign with WHO-approved vaccines. More recent modeling approaches involve agent-based simulations [2], heterogeneous social networks [3], [4], [5], [6], [7], [8], and Bayesian inference models [9]. Although a large number of research studies are currently investigating the epidemiological characteristics of COVID-19 [10], [11], [12], [13], [14], [15], [16], [17], [18], we believe that a simple but efficient model, which can capture the basics of the complex behavior of the pandemic including the vaccine roll-out, can offer useful guidance on the pandemic’s near-term and longer-term evolution. Using a recently developed semi-supervised machine learning approach [19], [20], [21], we systematically reproduced the pandemic dynamics during spring 2021 in several different countries. We then used the model to assess the importance of a rapid vaccination campaign in preventing future outbreaks driven by more infectious variants.

The work is organized as follows. In Section 2 we introduce the SAIVR model and its parameters. The machine learning approach that we used to reproduce the infectious curves of 27 selected countries/states is described in detail in Section 3. In Section 4 we study future scenarios involving more infectious variants and make quantitative arguments about how they might affect herd immunity. Section 5 is devoted to concluding remarks.

2. The SAIVR model

One of the first attempts to mathematically describe the spread of an infectious disease is due to Kermack and McKendrick [1]. In 1927 they introduced the so-called Susceptible-Infectious-Removed (SIR) model. The SIR model describes the dynamics of a (fixed) population of N individuals split into three compartments:

  • S(t) is the Susceptible compartment, which counts the individuals who are susceptible to the disease but not yet infected;

  • I(t) is the Infectious compartment that counts the number of infectious individuals;

  • R(t) is the Removed compartment. It represents the number of those who can no longer be infected either because they recovered and gained long-term immunity or because they passed away.

The model involves two positive parameters, β and γ, which govern the flow from one compartment to the other:

  • β is the transmission rate or effective contact rate of the disease: an infected individual comes into contact with β other individuals per unit time (the fraction of them that are susceptible to contracting the disease is S/N);

  • γ is the removal rate; 1/γ is the mean number of days an infected individual spends in the Infectious compartment.

The SIR model obeys the following system of ordinary differential equations (ODE):

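$$\frac{dI}{dt} = \beta \frac{IS}{N} - \gamma I \qquad \text{(1a)}$$
$$\frac{dS}{dt} = -\beta \frac{IS}{N} \qquad \text{(1b)}$$
$$\frac{dR}{dt} = \gamma I \qquad \text{(1c)}$$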
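As a minimal illustration of Eqs. (1a)–(1c), the sketch below integrates the SIR model with a classical fourth-order Runge–Kutta (RK4) step in plain Python. The function names, the step size and the example values (β = 0.16 and γ = 1/12, borrowed from the averages quoted in Section 4 purely for illustration) are assumptions of this sketch and are not taken from the authors' code [46].

```python
def sir_rhs(state, beta, gamma, n):
    """Right-hand side of Eqs. (1a)-(1c) for state = (S, I, R)."""
    s, i, _ = state
    new_infections = beta * i * s / n
    return (-new_infections, new_infections - gamma * i, gamma * i)


def rk4_step(rhs, state, dt, *args):
    """Advance dZ/dt = rhs(Z, *args) by one classical Runge-Kutta step."""
    def shift(k, h):
        return tuple(x + h * dx for x, dx in zip(state, k))

    k1 = rhs(state, *args)
    k2 = rhs(shift(k1, dt / 2), *args)
    k3 = rhs(shift(k2, dt / 2), *args)
    k4 = rhs(shift(k3, dt), *args)
    return tuple(x + dt / 6 * (a + 2 * b + 2 * c + d)
                 for x, a, b, c, d in zip(state, k1, k2, k3, k4))


def simulate_sir(s0, i0, r0, beta, gamma, days, steps_per_day=10):
    """Integrate the SIR model and return (S, I, R) at every whole day."""
    n = s0 + i0 + r0
    dt = 1.0 / steps_per_day
    state = (s0, i0, r0)
    trajectory = [state]
    for _ in range(days):
        for _ in range(steps_per_day):
            state = rk4_step(sir_rhs, state, dt, beta, gamma, n)
        trajectory.append(state)
    return trajectory


if __name__ == "__main__":
    traj = simulate_sir(999_000, 1_000, 0, beta=0.16, gamma=1 / 12, days=365)
    peak_day = max(range(len(traj)), key=lambda d: traj[d][1])
    print(f"I peaks on day {peak_day} at about {traj[peak_day][1]:,.0f} people")
```

Because the three right-hand sides sum to zero, S + I + R stays equal to N up to rounding, which is a quick sanity check for any solver of Eq. (1).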

Although the SIR model has been adopted to study epidemic outbreaks in many previous works [2], [22], [23], [24], [25], [26], [27], it lacks a few important aspects of the ongoing pandemic. First, it has been reported [28], [29] that an important fraction of those carrying the virus are asymptomatic. Since they often escape contact tracing due to the absence of symptoms, they can spread the disease while remaining undetected. Furthermore, a global vaccination campaign started in December 2020. Vaccination safely transfers people from the Susceptible to the Removed compartment, bypassing the Infectious one and thus reducing the likelihood of an outbreak.

The SAIVR model extends the SIR model by incorporating the two aforementioned additional compartments:

  • A(t) is the Asymptomatic/Undetected compartment, which counts the individuals who, despite being infected, are not tested or traced. This mainly applies to those who recover from the infection without suffering any symptoms.

  • V(t) is the Vaccinated compartment. It accounts for those who have received a vaccine shot but are not yet fully immunized by it.

We emphasize that in the SAIVR model the fully vaccinated population is already included in the model as part of the ‘Removed’ compartment, while the ‘Vaccinated’ population is an additional compartment that adds further predictive power to the model by taking into consideration the fact that it takes a few weeks (often due to the necessity of a second vaccine shot) to reach full immunization; full immunization, in the context of the model, is equivalent to moving a person to the ‘Removed’ compartment.

The SAIVR model ODEs read:

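$$\frac{dI}{dt} = \beta_1 \frac{IS}{N} + \alpha_2 \frac{AS}{N} + \zeta \frac{IV}{N} - \gamma I, \qquad \text{(2a)}$$
$$\frac{dA}{dt} = \alpha_1 \frac{AS}{N} + \beta_2 \frac{IS}{N} + \eta \frac{AV}{N} - \gamma A, \qquad \text{(2b)}$$
$$\frac{dS}{dt} = -\beta \frac{IS}{N} - \alpha \frac{AS}{N} - \delta \frac{S}{N} + (1-\lambda)\,\epsilon V, \qquad \text{(2c)}$$
$$\frac{dV}{dt} = \delta \frac{S}{N} - \eta \frac{AV}{N} - \zeta \frac{IV}{N} - \epsilon V, \qquad \text{(2d)}$$
$$\frac{dR}{dt} = \gamma I + \gamma A + \lambda \epsilon V. \qquad \text{(2e)}$$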

The compartment inter-dependencies and flows are presented in Fig. 1. The parameters of the SAIVR model are the following:

  • β1 describes the rate at which individuals are exposed to symptomatic infection: an infected symptomatic individual comes into contact with and infects β1 susceptible individuals per unit time;

  • α1 is the asymptomatic infection rate: an infected asymptomatic individual comes into contact with α1 susceptible individuals per unit time;

  • β2 describes the rate at which susceptible individuals become asymptomatically infected after coming into contact with a symptomatic individual;

  • α2 describes the rate at which susceptible individuals become symptomatically infected after coming into contact with an asymptomatic individual;

  • γ retains the same meaning as in the SIR model, representing the mean removal rate; 1/γ is the mean amount of time individuals spend in either the Infectious or the Asymptomatic compartment;

  • ζ is the rate at which a vaccinated (but not yet immune) individual comes into contact with, and is infected by, a symptomatic infectious individual;

  • η is the transmission rate at which asymptomatic individuals come into contact with and infect vaccinated (but not yet immune) individuals;

  • δ is the first-shot vaccination rate;

  • λ is the vaccine efficacy;

  • 1/ϵ is the mean amount of time an individual spends in the Vaccinated compartment before reaching immunity and moving to the Removed compartment.
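The SAIVR system of Eq. (2) can be integrated in the same way. The sketch below assumes that populations are expressed as fractions of N (so N = 1, matching the percentages of Table 1) and that the unsubscripted β and α in Eq. (2c) stand for β1 + β2 and α1 + α2; the latter is our reading, chosen because it makes the five right-hand sides sum to zero so that the total population is conserved. Default values come from Table 1 and Section 4, while the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SaivrParams:
    beta1: float = 0.16    # symptomatic infection rate (average fitted value, Sec. 4)
    alpha1: float = 0.20   # asymptomatic infection rate (average fitted value, Sec. 4)
    gamma: float = 1 / 12  # removal rate
    delta: float = 0.0     # first-shot vaccination rate
    lam: float = 0.95      # vaccine efficacy (Table 1)
    eps: float = 1 / 21    # V -> R rate (Table 1)
    eta: float = 1e-2      # V -> A rate (Table 1)
    zeta: float = 5e-3     # V -> I rate (Table 1)
    beta2: float = 1e-3    # S -> A after contact with I (Table 1)
    alpha2: float = 1e-2   # S -> I after contact with A (Table 1)


def saivr_rhs(z, p):
    """Right-hand side of Eqs. (2a)-(2e); z = (S, A, I, V, R) as fractions of N."""
    s, a, i, v, r = z
    n = s + a + i + v + r
    # Assumption: beta = beta1 + beta2 and alpha = alpha1 + alpha2 in Eq. (2c).
    beta, alpha = p.beta1 + p.beta2, p.alpha1 + p.alpha2
    ds = (-beta * i * s - alpha * a * s - p.delta * s) / n + (1 - p.lam) * p.eps * v
    da = (p.alpha1 * a * s + p.beta2 * i * s + p.eta * a * v) / n - p.gamma * a
    di = (p.beta1 * i * s + p.alpha2 * a * s + p.zeta * i * v) / n - p.gamma * i
    dv = (p.delta * s - p.eta * a * v - p.zeta * i * v) / n - p.eps * v
    dr = p.gamma * (i + a) + p.lam * p.eps * v
    return (ds, da, di, dv, dr)


def simulate(z0, p, days, steps_per_day=10):
    """Classical RK4 integration; returns the state at every whole day."""
    dt = 1.0 / steps_per_day
    z = tuple(z0)
    out = [z]

    def shift(base, k, h):
        return tuple(x + h * dx for x, dx in zip(base, k))

    for _ in range(days):
        for _ in range(steps_per_day):
            k1 = saivr_rhs(z, p)
            k2 = saivr_rhs(shift(z, k1, dt / 2), p)
            k3 = saivr_rhs(shift(z, k2, dt / 2), p)
            k4 = saivr_rhs(shift(z, k3, dt), p)
            z = tuple(x + dt / 6 * (q1 + 2 * q2 + 2 * q3 + q4)
                      for x, q1, q2, q3, q4 in zip(z, k1, k2, k3, k4))
        out.append(z)
    return out


if __name__ == "__main__":
    z0 = (0.98, 0.002, 0.01, 0.0, 0.008)  # hypothetical S0, A0, I0, V0, R0
    for delta in (0.0, 0.01):
        traj = simulate(z0, SaivrParams(delta=delta), days=200)
        peak = max(z[1] + z[2] for z in traj)
        print(f"delta={delta}: peak A+I = {peak:.3f}")
```

With δ = 0 and V0 = 0 the Vaccinated compartment stays empty and the model reduces to a two-strain SIR-like system, which is a convenient consistency check.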

Fig. 1.


Illustration of the SAIVR model compartments and their inter-dependencies denoted by incoming and outgoing arrows and relevant flow parameters.

Countries and states do not respond to the disease as static entities passively facing the pandemic. They react by actively imposing (and relaxing) restrictive measures, learning how to treat the infected effectively, adjusting social interactions, and launching vaccination campaigns. Finally, the virus itself evolves into more infectious variants [30].

Country-specific parameters can be obtained by fitting the SAIVR model to a selected infectious wave that occurred in a given country. SAIVR has 14 adjustable parameters and initial conditions that need to be estimated; given the scarcity of data (only the infectious and vaccinated populations are known), optimizing them is a challenging problem. To address this, we fixed some of them and estimated the others, mostly with a novel fitting method based on semi-supervised neural networks, which we present in the following section.

3. Solving the SAIVR model with neural networks

In order to apply the SAIVR model we need a realistic estimate of the parameters and initial conditions for the system of Eq. (2). To obtain them we employed machine learning, a powerful method that has been extensively used for disease modeling [21], [31], [32], [33] and dynamical system forecasting [19], [34], [35]. Our approach employs a semi-supervised procedure that determines the optimal set of initial conditions and parameters of the SAIVR model, yielding solutions that best fit a given data-set. A sketch of this procedure is shown in Fig. 2.

Fig. 2.


Semi-supervised network architecture. During the unsupervised procedure (blue box), a time sequence $t$, a set of initial conditions $Z_0$ and parameter bundles $\Theta$ are fed as input to a six-layer fully connected network (FCN). The network output $Z_{NN}$ is combined with a function $f(t)$, as in Eq. (4), to form a tentative parametric solution $\hat{Z}(t)$ of the ODE system in Eq. (2). The quality of $\hat{Z}(t)$ is probed by the loss function $L$. Once the network has learned the solutions, the inverse problem is solved (red box): an optimization algorithm selects the initial conditions and parameters in the bundles that best fit a given data-set. The loss $L_{inv}$ depends on the infectious population $I_{Data}$ of a given country/state during the time sequence $t$. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

3.1. Unsupervised learning

The unsupervised part (blue box) consists of a data-free Neural Network (NN) that is trained to discover solutions for an ODE system of the form:

$$\frac{dZ}{dt} = g(Z), \qquad Z(t=0) = Z_0, \qquad \text{(3)}$$

where $Z = (S(t), A(t), I(t), V(t), R(t))$ and $g(Z)$ is the right-hand side of Eq. (2). The NN takes as input a time sequence $t$, a set of initial conditions $Z_0$, and model parameters $\Theta$.

As described below, $t$ is the set of days spanned by a given epidemic wave, from $t_0$ to $t_0 + \Delta t$; $Z_0$ are the initial compartment populations and $\Theta$ a subset of the SAIVR model parameters. The initial conditions and parameters are randomly sampled at each iteration $n$ over predefined intervals called bundles [19], [21], so that the network learns an entire family of solutions. The inputs propagate through the network until an output vector $Z_{NN}$ with the same dimensions as the target solution $Z$ is produced. The learned solutions $\hat{Z}$ satisfy the initial conditions identically because they take the parametric form:

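$$\hat{Z} = Z_0 + f(t)\,\left(Z_{NN} - Z_0\right), \qquad \text{(4)}$$

where $f(t) = 1 - e^{-t}$ [20], so that $f(0) = 0$ and $\hat{Z}(0) = Z_0$. The loss function

$$L = \left\langle \left( \frac{d\hat{Z}}{dt} - g(\hat{Z}) \right)^2 \right\rangle \qquad \text{(5)}$$

depends solely on the network predictions, averaged ($\langle \cdot \rangle$) over all iterations $n$, which provides an unsupervised learning framework. Time derivatives are computed using automatic differentiation and back-propagation [36].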
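The trial solution of Eq. (4) and the residual loss of Eq. (5) can be sketched without a neural-network library. In the snippet below, `net` is a hypothetical stand-in for the trained network (any callable that maps $t$ to a tuple), and time derivatives are taken by central finite differences instead of the automatic differentiation used in the paper; both are simplifications for illustration only.

```python
import math


def f(t):
    """f(t) = 1 - exp(-t); f(0) = 0, so Z_hat(0) = Z0 for any network output."""
    return 1.0 - math.exp(-t)


def trial_solution(t, z0, net):
    """Eq. (4): Z_hat(t) = Z0 + f(t) * (Z_NN(t) - Z0), component by component."""
    ft = f(t)
    return tuple(z + ft * (n - z) for z, n in zip(z0, net(t)))


def residual_loss(ts, z0, net, g, h=1e-4):
    """Eq. (5)-style loss: mean over ts of |dZ_hat/dt - g(Z_hat)|^2.

    Central differences replace automatic differentiation here.
    """
    total = 0.0
    for t in ts:
        z_hat = trial_solution(t, z0, net)
        ahead = trial_solution(t + h, z0, net)
        behind = trial_solution(t - h, z0, net)
        dzdt = [(p - m) / (2 * h) for p, m in zip(ahead, behind)]
        total += sum((d - r) ** 2 for d, r in zip(dzdt, g(z_hat)))
    return total / len(ts)


def decay(z):
    """Toy right-hand side g(Z) = -Z, used only to exercise the loss."""
    return tuple(-x for x in z)


if __name__ == "__main__":
    ts = [0.1 * k for k in range(50)]
    print(residual_loss(ts, (1.0,), lambda t: (0.0,), decay))  # close to zero
    print(residual_loss(ts, (1.0,), lambda t: (1.0,), decay))  # 1.0
```

The toy ODE dZ/dt = −Z with Z(0) = 1 is chosen because a network output of 0 turns Eq. (4) into its exact solution e^{−t}, so the loss vanishes up to discretization error, while a constant output of 1 leaves a unit residual at every time.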

3.2. Fitting a dataset

Once the NN is trained to provide solutions for the system of Eq. (2), its weights and biases are fixed and the trained network is used in a supervised pipeline that estimates the initial conditions and parameters, leading to solutions $\tilde{Z}(t)$ that fit given data. This procedure is illustrated in the red box of Fig. 2. A solution $\hat{Z}$ is generated by the network starting from $\bar{Z}_0$ and $\bar{\Theta}$ randomly selected in the bundles. A stochastic gradient descent optimizer then adjusts $\bar{Z}_0$ and $\bar{\Theta}$ in order to minimize the loss function

$$L_{inv} = \left\langle \left( \tilde{I}(t) - I_{Data}(t) \right)^2 \right\rangle \qquad \text{(6)}$$

where $I_{Data}(t)$ is the infectious population of a given country/state and $\tilde{I}(t)$ is its NN fit.
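The inverse problem can be illustrated on a reduced example. The sketch below is only a stand-in under several assumptions: it fits (β, γ, I0) of a plain SIR model instead of the full SAIVR parameter set, replaces the trained network with forward-Euler integration, normalizes the loss of Eq. (6) by the data peak, and uses projected gradient descent with finite-difference gradients and backtracking in log-parameter space rather than a stochastic optimizer. As in Appendix A2, it keeps the best of several random restarts. All names and numerical settings are hypothetical.

```python
import math
import random


def simulate_infectious(beta, gamma, i0, days, steps_per_day=4):
    """Daily infectious fraction I(t) of an SIR model with N = 1.

    Stand-in for the trained network: forward-Euler integration.
    """
    dt = 1.0 / steps_per_day
    s, i = 1.0 - i0, i0
    out = [i]
    for _ in range(days):
        for _ in range(steps_per_day):
            new = beta * s * i
            s, i = s - dt * new, i + dt * (new - gamma * i)
        out.append(i)
    return out


def inverse_loss(theta, data):
    """Mean squared misfit as in Eq. (6), normalised by the data peak (assumption)."""
    beta, gamma, i0 = theta
    pred = simulate_infectious(beta, gamma, i0, len(data) - 1)
    scale = max(data)
    return sum(((p - d) / scale) ** 2 for p, d in zip(pred, data)) / len(data)


def fit(data, bundles, restarts=3, iters=100, seed=0):
    """Best-of-`restarts` projected gradient descent on log-parameters."""
    rng = random.Random(seed)
    lows = [math.log(a) for a, _ in bundles]
    highs = [math.log(b) for _, b in bundles]

    def loss_at(x):
        return inverse_loss([math.exp(v) for v in x], data)

    def clamp(x):  # keep every parameter inside its bundle
        return [min(max(v, lo_k), hi_k) for v, lo_k, hi_k in zip(x, lows, highs)]

    best_x, best_loss, start_losses = None, math.inf, []
    for _ in range(restarts):
        x = [rng.uniform(lo_k, hi_k) for lo_k, hi_k in zip(lows, highs)]  # log-uniform start
        loss = loss_at(x)
        start_losses.append(loss)
        step = 0.1
        for _ in range(iters):
            grad = []
            for k in range(len(x)):  # central finite differences
                xp, xm = list(x), list(x)
                xp[k] += 1e-5
                xm[k] -= 1e-5
                grad.append((loss_at(xp) - loss_at(xm)) / 2e-5)
            while step > 1e-8:  # backtracking: accept only improving steps
                cand = clamp([v - step * g for v, g in zip(x, grad)])
                cand_loss = loss_at(cand)
                if cand_loss < loss:
                    x, loss, step = cand, cand_loss, step * 1.5
                    break
                step *= 0.5
            else:
                break  # no improving step left: this restart has converged
        if loss < best_loss:
            best_x, best_loss = x, loss
    return [math.exp(v) for v in best_x], best_loss, start_losses


if __name__ == "__main__":
    data = simulate_infectious(0.25, 0.10, 0.001, days=60)  # synthetic "I_Data"
    bundles = [(0.1, 0.4), (0.05, 0.15), (1e-4, 1e-2)]      # (beta, gamma, I0)
    theta, loss, _ = fit(data, bundles)
    print("fitted beta, gamma, I0:", [round(v, 4) for v in theta], "loss:", loss)
```

Working with logarithms of the parameters puts β, γ and I0, which differ by orders of magnitude, on a comparable scale for a single step size; clamping to the bundles mirrors the restriction of the search to the intervals the network was trained on.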

The machine learning approach presented in this work provides numerical solutions to a nonlinear system of ODEs without statistical error, since no data is used in the first part of the process. The supervised part only learns which parameters and initial conditions of the SAIVR model best fit the given data; the statistical error of the noisy data therefore does not affect the accuracy of the learned ODE solutions.

3.3. Applying the method to real data

We used this method to reproduce the most recent COVID-19 waves in 27 countries or states. To test the generality of the model, we selected epidemic waves that occurred under a broad range of geopolitical conditions, restrictive measures, time periods and vaccination efforts. The bundles and fixed parameters used during the unsupervised training of the network are listed in Table 1.

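Table 1.

Bundles and fixed parameters.

Bundles (intervals sampled during unsupervised training):

| I0 | A0 | V0 | R0 | β1 | γ | α1 | δ |
|---|---|---|---|---|---|---|---|
| [0.1%, 2%] | [0.1%, 2%] | [0%, 60%] | [0%, 30%] | [0.1, 0.25] | [0.07, 0.12] | [0.1, 0.25] | [0, 0.03] |

Fixed parameters:

| ϵ | λ | η | ζ | β2 | α2 |
|---|---|---|---|---|---|
| 1/21 | 0.95 | 10⁻² | 5×10⁻³ | 10⁻³ | 10⁻² |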
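Sampling one training input from the bundles of Table 1 might look as follows. The paper states only that the values are sampled randomly over these intervals; the uniform distribution, the dictionary keys and the choice to close the population with S0 = 1 − I0 − A0 − V0 − R0 are assumptions of this sketch.

```python
import random

# Bundles of Table 1 as (low, high); percentages are written as fractions of N.
BUNDLES = {
    "I0": (0.001, 0.02),
    "A0": (0.001, 0.02),
    "V0": (0.0, 0.60),
    "R0": (0.0, 0.30),
    "beta1": (0.10, 0.25),
    "gamma": (0.07, 0.12),
    "alpha1": (0.10, 0.25),
    "delta": (0.0, 0.03),
}

# Parameters kept fixed during training (Table 1).
FIXED = {"eps": 1 / 21, "lam": 0.95, "eta": 1e-2, "zeta": 5e-3,
         "beta2": 1e-3, "alpha2": 1e-2}


def sample_inputs(rng):
    """Draw one network input: a uniform sample from every bundle plus fixed values."""
    x = {name: rng.uniform(lo, hi) for name, (lo, hi) in BUNDLES.items()}
    # Assumption: S0 closes the population, so the five compartments sum to 1.
    x["S0"] = 1.0 - x["I0"] - x["A0"] - x["V0"] - x["R0"]
    x.update(FIXED)
    return x


if __name__ == "__main__":
    rng = random.Random(0)
    print(sample_inputs(rng))
    lo, hi = BUNDLES["gamma"]
    print(f"removal time 1/gamma spans {1 / hi:.1f}-{1 / lo:.1f} days")
```

The γ bundle corresponds to removal times 1/γ between about 8.3 and 14.3 days, in line with the 1–2 week removal time discussed below, and even the largest sampled I0, A0, V0 and R0 leave S0 ≥ 6% of the population.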

We found that the model is only weakly sensitive to the choice of most parameters, so we kept some of them fixed during training. The value of the vaccine efficacy λ is based on Refs. [37], [38], which report efficacies of 94.8% and 94.1% for the Pfizer-BioNTech and Moderna mRNA vaccines, respectively. The V→I and V→A rates ζ and η are derived from the order-of-magnitude ratio of infected individuals in the vaccinated and placebo cohorts of Ref. [37]; β2 and α2 are set to 0.001 and 0.01, respectively, in order-of-magnitude agreement with the same clinical-trial results. The V→R rate ϵ is the inverse of the time an individual takes to acquire vaccine protection after the first shot. We set 1/ϵ = 21 days to reflect the fact that the second vaccine shot is usually administered about three weeks after the first one.

The remaining parameters Θ = (α1, β1, δ, γ) depend strongly on the restrictive measures taken and on the pace of the vaccination campaign, i.e., they are country dependent. We therefore sampled them in bundles so that the network could learn solutions corresponding to a broad range of parameters and fit multiple countries. The decay rate γ is the inverse of the removal time, which is about 1–2 weeks [39]. The main symptomatic infection rate β1 was sampled in an interval consistent with previous reports [2]. Early estimates that 80% of the infected population is asymptomatic have been considered too high and have since been revised downward [28], [29]; the initial studies estimating this proportion were limited by heterogeneity in case definitions, incomplete symptom assessment, and inadequate retrospective and prospective follow-up of symptoms. We let the main asymptomatic infection rate α1 vary in the same interval as β1. The first-shot vaccination rate δ was selected based on known vaccination reports (see Appendix A).

Finally, the initial-condition bundles Z0 = (S0, A0, I0, V0, R0) are defined over broad intervals able to cover the expected (S, A, I, V, R) populations at any given time for all the cases considered. Although the initial infected population I0 is known, we still included it among the quantities fit by the network. We found that doing so makes the network generalize better, improving the fit of a given epidemic wave.

We then performed the fitting procedure described in Section 3.2 using the infectious populations of 27 countries/states. The data (including the number of vaccine shots administered) were retrieved from the ‘Our World in Data’ GitHub repository [40]. Strictly speaking, the Infectious population of the SAIVR model is the number of people who are actively infected by the virus on a given day, a number that should not be confused with the daily new cases. We therefore computed it as the total number of cases minus the number of recovered and deceased individuals.
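The distinction between active infections and daily new cases can be made concrete with two short helpers. The function names and the three-day series are hypothetical and only illustrate the arithmetic described above; they do not reproduce the column layout of the Our World in Data files.

```python
def active_infectious(total_cases, recovered, deaths):
    """Active infections per day: cumulative cases minus recovered and deaths."""
    if not len(total_cases) == len(recovered) == len(deaths):
        raise ValueError("all series must cover the same days")
    return [c - r - d for c, r, d in zip(total_cases, recovered, deaths)]


def daily_new_cases(total_cases):
    """Day-to-day increments of the cumulative case count (not used as I(t))."""
    return [b - a for a, b in zip(total_cases, total_cases[1:])]


if __name__ == "__main__":
    cases = [100, 150, 180]  # hypothetical cumulative counts
    recovered = [10, 40, 90]
    deaths = [1, 2, 3]
    print(active_infectious(cases, recovered, deaths))  # [89, 108, 87]
    print(daily_new_cases(cases))                       # [50, 30]
```

In this toy series the active count falls on the third day even though 30 new cases are reported, which is exactly why the two quantities must not be interchanged when fitting I(t).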

We first applied the method to study the most recent COVID-19 wave in some of the countries with the fastest vaccination campaigns, which managed to administer a first shot to at least 30% of their population. Fig. 3 presents real data (red points), fits (black solid lines), and some predictions (black dashed lines) for the infectious populations of Israel, the UK, Hungary, France, Romania and Serbia. We also studied the USA, although, due to the large size of the country, we focused only on the largest states or those with the highest vaccination rates in the first quarter of 2021 (see Fig. 4). To assess the generality of the model and the fitting procedure, we applied it to 12 other countries spread throughout the world that had a high number of cases at the end of spring 2021. The corresponding fits are shown in Figs. S1 and S2 of the Supplementary Material. As can be seen, the model reproduces all these epidemic curves well, although it misses some abrupt and rapid events that can be captured by more sophisticated multiple-wave models [2]. All the parameters determined by the network can be found in the Supplementary Material; their values lie within the bundles of Table 1. In particular, 1/γ was found in the 10–12 day range and β1 varied within [0.14, 0.19], while α1 was more volatile.

Fig. 3.


Infectious population percentage (red dots) of some selected countries in which a high percentage of their population has received a vaccine shot. The infectious population is expressed as a function of time (days). The date at which the wave began is pointed out on the horizontal axis. The neural network fits and predictions are shown by black solid and dashed lines respectively. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Fig. 4.


Infectious population percentage (red dots) of ten selected US states. The infectious population is expressed as a function of time (days). The date at which the wave began is pointed out on the horizontal axis. The neural network fits and predictions are shown by black solid and dashed lines respectively. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

4. Insights on the future: vaccine hesitancy, herd immunity and new variants

In this section we use the results of the analysis performed in the previous section to study how the vaccination campaign is affecting the pandemic and its future evolution. Unless otherwise specified, we set β1=0.16, α1=0.2 and γ=1/12, the average values retrieved from fitting real data. We start by pointing out how the vaccine efficacy is a key factor in halting the spread of the virus and how hesitancy is challenging the vaccination campaigns. Finally, we discuss the concept of herd immunity and how it is affected by more infectious COVID-19 variants.

4.1. Vaccination efficacy and hesitancy

Fig. 5 presents the total infected (I+A) population for increasing values of the vaccination onset time (T0), daily vaccination rate (δ), vaccine efficacy (λ) and vaccine hesitancy/denial population percentage. In the top panel, the total infected population is shown as a function of the vaccination rate δ and the vaccine efficacy λ. As can be seen, even vaccines with relatively low efficacy can rapidly reduce the infected population. In Fig. 5b we show how the number of infected individuals evolves as a function of δ and of the percentage of the population that avoids getting vaccinated. These findings suggest that vaccine hesitancy, which affects a significant proportion of the population, might seriously threaten the achievement of herd immunity, especially if the situation is worsened by the appearance of more infectious COVID-19 strains.

Fig. 5.


Total infected population as a function of vaccination rate, vaccine efficacy and vaccine denial population percentage. Results are obtained by numerically solving the SAIVR model for $I_0 = 10^{-5} N_{pop}$ and $A_0 = 0.2 \times I_0$, where $N_{pop} = 10^6$. The parameters of the model are those obtained by applying machine learning to the epidemic curves. a) Infected population vs. vaccination rate δ and vaccine efficacy λ. b) Infected population as a function of the percentage of the population that avoids getting vaccinated and the vaccination rate δ.

4.2. Herd immunity and new COVID-19 variants

The achievement of herd immunity has been hailed as the ultimate goal of a successful vaccination campaign. ‘Herd immunity’, also known as ‘population immunity’, is the indirect protection from an infectious disease that occurs when a sufficient portion of the population is immune, either through vaccination or through immunity developed from previous infection. Once the herd immunity threshold is met, the spread of the infectious disease is kept under control, current outbreaks extinguish, and endemic transmission of the pathogen is interrupted. Earlier estimates placed the threshold at about 60–70% of the population [41], [42], [43]. In reality, highly transmissible strains tend to raise the threshold, possibly putting this goal out of reach. Furthermore, persistent vaccine hesitancy makes vaccinating more than 60–65% of the population unlikely, even in countries at the global forefront of the vaccination effort.

We quantitatively investigate the likelihood of resurgent COVID-19 epidemics after 50%, 60%, and 70% of the population has been immunized, for different sizes of newly introduced infections, different COVID-19 variants and different vaccine deployment paces. Herd immunity protection is affected by the initial value of the removed population (R0 at t=0), which comprises both recovered and fully vaccinated individuals, assuming permanent immunity in both cases. In each scenario, we study the epidemic evolution after a cluster of newly infected individuals is introduced into the population. I0 represents the newly infected load at t=0 (I0 = 1%, 0.1%, 0.01%, 0.001% of the total population).

In the first part of the study we considered the less infectious variants spreading during spring 2021, using the parameters retrieved in Section 3.3. The second scenario involves a more infectious COVID-19 strain such as the Delta variant, which has been reported to spread more efficiently [44]. Finally, we explore cases with and without a continuing vaccine roll-out.

Fig. 6 presents the results obtained by numerically solving the SAIVR model for the aforementioned cases. The top row presents the time evolution of an outbreak in a population in which 50% (left panel), 60% (middle panel), and 70% (right panel) of the individuals have been immunized, for different numbers of initially infected individuals I0. Results are obtained by solving the SAIVR model for β1=0.16, α1=0.20, and γ=1/12, the average values obtained for the countries considered in Section 3.3 and listed in the Supplementary Material. As can be seen, when the immune portion of the population is only 50%, the outbreaks are contained but not eradicated, as the virus spreads in low-intensity waves that make the disease endemic. For these less infectious variants, an immune fraction larger than 60% is enough to eradicate the disease.

Fig. 6.


Top row: Time evolution of epidemics following the introduction of infected individuals into a population that has already been vaccinated at 50% (left panel), 60% (middle panel), and 70% (right panel), with permanent immunity and no further vaccine roll-out, for four different numbers of newly infected individuals (I0 = 1%, 0.1%, 0.01% and 0.001% of the total population; the color code is presented in the legends). Results are obtained from numerically solving the SAIVR model for β1=0.16, α1=0.20, which represent the average of the countries’ fitted values (see text for details and for the values of the other parameters). Middle row: Same as the top row but with β1=0.25, α1=0.25, which represent a more contagious COVID-19 variant. As shown, vaccination coverage of 50% and 60% cannot prevent the resurgence of outbreaks. Bottom row: Same as the middle row, but with an ongoing vaccine roll-out at rate δ=0.001 per day. As shown, a continuing vaccine roll-out lowers the intensity of the resurgent waves and prevents subsequent outbreaks.

The middle row of Fig. 6 presents the evolution of outbreaks driven by a more contagious variant, with β1=0.25 and α1=0.25. As shown, if the immunized portion of the population is only 50% or 60%, the resurgence of outbreaks cannot be prevented (60% immunity protection makes the disease endemic). Only when 70% of the population is immunized is the disease eradicated.
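A back-of-the-envelope check of these thresholds, which is our own estimate and not part of the analysis above, follows from linearizing Eqs. (2a) and (2b) around an infection-free population with susceptible fraction s while neglecting the Vaccinated compartment. New symptomatic and asymptomatic infections are then generated by the matrix (s/γ)·[[β1, α2], [β2, α1]] (rows: newly infected I and A; columns: infecting I and A), and spreading stops once its largest eigenvalue drops below 1, i.e. once the immune fraction exceeds 1 − γ/ρ, where ρ is the largest eigenvalue of [[β1, α2], [β2, α1]]. The function names are hypothetical.

```python
import math


def dominant_eigenvalue_2x2(a, b, c, d):
    """Largest eigenvalue of [[a, b], [c, d]]; real because b * c >= 0 here."""
    return (a + d) / 2 + math.sqrt(((a - d) / 2) ** 2 + b * c)


def herd_immunity_threshold(beta1, alpha1, gamma, beta2=1e-3, alpha2=1e-2):
    """Immune fraction above which linearised I/A growth stops (V neglected).

    Matrix rows: new infections in (I, A); columns: infector in (I, A).
    Defaults for beta2 and alpha2 are the fixed values of Table 1.
    """
    rho = dominant_eigenvalue_2x2(beta1, alpha2, beta2, alpha1)
    return 1.0 - gamma / rho


if __name__ == "__main__":
    spring_2021 = herd_immunity_threshold(beta1=0.16, alpha1=0.20, gamma=1 / 12)
    variant = herd_immunity_threshold(beta1=0.25, alpha1=0.25, gamma=1 / 12)
    # prints 58.4% and 67.1%
    print(f"averaged fits: {spring_2021:.1%}, more contagious variant: {variant:.1%}")
```

With the averaged fits this gives roughly 58%, and with β1 = α1 = 0.25 roughly 67%, in line with the qualitative picture of Fig. 6 (60% immunity suffices for the first case, while 70% is needed for the more contagious variant). The full simulations also include vaccination dynamics and finite outbreak sizes that this linear estimate ignores.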

In both the top and the middle rows, no vaccines are deployed during the outbreaks. The bottom row instead considers the evolution of the highly infectious variant with a constant vaccine roll-out (rate δ=0.001). As can be seen, since individuals are partially protected even after only the first vaccine shot, continuing vaccine administration rapidly lowers the intensity of the resurgent waves and helps prevent subsequent outbreaks.

Although recent reports on highly infectious variants claim that most vaccines are still about 90% effective in preventing serious illness [45], their performance in halting asymptomatic transmission is still unclear. In this study, we assumed that the vaccine efficacy against more infectious variants is the same as against the less infectious ones. Even under this optimistic assumption, simply increasing the infection rates moves the herd immunity threshold to higher values.

5. Conclusions

Compartmental models are efficient tools for describing the time evolution of disease outbreaks. They provide useful intuition on the impact of non-pharmaceutical interventions in decreasing infection incidence rates.

In this work, we have augmented the classic SIR model with the ability to accommodate asymptomatic transmission and vaccinated individuals. The SAIVR model is a straightforward deterministic model that does not take into consideration age, gender or geographic clustering. Despite this, its simplicity and the insights it offers on how key epidemiological variables affect individuals are among its main strengths. Its power also lies in the fact that, as factors such as new variants are added to the model, it is easy to adjust its parameters and provide best-fit curves between the data and the model predictions.

Since the inclusion of the Asymptomatic and Vaccinated compartments increased the number of parameters and initial conditions of the model, we employed a novel semi-supervised framework to estimate most of them. An unsupervised neural network solves the model’s differential equations over a range of parameters and initial conditions. A supervised approach then incorporates data and determines the initial conditions and modeling parameters that best fit the 27 epidemic curves considered. As expected from the heterogeneity of the country sample, the fitted parameters differ across countries, although they follow similar trends.

We used these results to shed light on the impact of the vaccination campaign on the future of the pandemic. We pointed out that vaccine hesitancy is one of the most important hurdles of the campaign and that further efforts should be made to support people and give them correct information about vaccines. Because of hesitancy, vaccinating the critical number of people who have to be immune in order to prevent future outbreaks (i.e., herd immunity) is likely to be out of reach. Widely circulating coronavirus variants are also a threat, as they move the herd immunity threshold to higher values. This points out the importance of rapidly reducing the infection rate by any means, such as imposing restrictive measures in case highly infectious new variants appear before the herd immunity threshold is reached. These results underline the need to continue the vaccination effort and to drive for high vaccination coverage in order to contain outbreaks generated by new and possibly more infectious variants.

Data availability

The code used to perform the fitting is available on GitHub [46]. All study data are either included in the article and supporting information or available in Ref. [40].

CRediT author statement

E.K. conceived the proposed model and supervised the study. M.A. and M.M. designed the machine learning procedure and code. M.A. and G.N. performed numerical experiments, collected data and analyzed the results. M.A. wrote the initial draft of the manuscript. All authors critically revised, improved, and reviewed the manuscript in various ways, and gave final approval for publication.

Declaration of Competing Interest

The authors declare that they have no conflict of interest.

Appendix A. Computational methods

We implemented a fully connected feed forward neural network that consists of six hidden layers with 48 neurons per layer and Sigmoid activation functions. The code is written in PyTorch [36] and published on GitHub [46]. In the following we list the technical details of the learning procedure described in Section 3:

A1. Unsupervised learning

The network parameters (weights and biases) are updated with an adaptive learning rate using the Adam algorithm [47] until the loss L of Eq. (5) becomes smaller than $10^{-8}$. The learning rate ranged from $10^{-3}$ to $10^{-6}$. The (initial) infectious and asymptomatic populations involved in realistic situations are a small fraction of the total population, and the ODEs in Eq. (2) are highly nonlinear and extremely sensitive to initial conditions. Consequently, training the unsupervised network for the initial conditions given in Table 1 is very challenging. To cope with this, we started by training the network with initial-condition bundles taken in ‘safer’ intervals (I0, A0 ∈ [10%, 30%]), where the network was able to quickly learn the solutions with high accuracy. Then, we gradually decreased I0 and A0 until they reached the values in Table 1, while continuing to train the network starting from the previously trained weights and biases. After this annealing procedure, the network was able to learn the solutions of the family of ODEs defined by Table 1.
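The annealing of the I0 and A0 bundles can be expressed as a schedule of intervals. The geometric interpolation, the number of stages and the `train_stage` callback below are hypothetical; the text only states that the bounds were decreased gradually from [10%, 30%] to the Table 1 values while training continued from the previous weights.

```python
def annealing_schedule(start=(0.10, 0.30), end=(0.001, 0.02), stages=5):
    """Shrink an (I0, A0) bundle from a 'safe' interval to the target one.

    Bounds are interpolated geometrically (an assumption), so each stage
    lowers them by the same factor.
    """
    (s_lo, s_hi), (e_lo, e_hi) = start, end
    schedule = []
    for k in range(stages + 1):
        w = k / stages
        schedule.append((s_lo ** (1 - w) * e_lo ** w, s_hi ** (1 - w) * e_hi ** w))
    return schedule


def anneal(train_stage, **schedule_kwargs):
    """Call a (hypothetical) training routine once per stage.

    train_stage(lo, hi) is expected to keep training the same network, starting
    from its current weights, with I0 and A0 sampled in [lo, hi].
    """
    for lo, hi in annealing_schedule(**schedule_kwargs):
        train_stage(lo, hi)


if __name__ == "__main__":
    for lo, hi in annealing_schedule():
        print(f"I0, A0 in [{lo:.2%}, {hi:.2%}]")
```

Geometric rather than linear steps are used here because the bounds span two orders of magnitude; a linear schedule would spend most stages far from the small target values.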

A2. Supervised learning and fitting procedure

The SAIVR parameters and initial conditions are updated with a stochastic gradient descent algorithm. Whenever the infection wave started before the vaccination campaign, we divided the time series into two parts: before ($\delta = 0$) and after ($\delta > 0$) the start of the vaccination campaign. The first-shot vaccination rate $\delta$ can change abruptly in time for logistical, social, or political reasons. Therefore, instead of fitting $\delta$ as we did with the other parameters in the bundles, we estimated it from real data, with the aim of reproducing the average number of vaccinated people. The Vaccinated compartment counts those individuals who have received the first shot but have not yet been fully immunized. We defined the average number of people in the vaccinated compartment in a given time interval $[t_0, t_f]$ as $\tilde{V} = \int_{t_0}^{t_f} V(t)\,\epsilon\,dt$, where $\epsilon$ is the V→R rate. We then computed $\tilde{V}_{\mathrm{Data}}$ with the same procedure using data for a given country, where $V_{\mathrm{Data}}(t)$ is the difference between the total number of people who received at least one shot and the number of those fully vaccinated. Finally, $\delta$ and $V_0$ were selected so that $\tilde{V}$ matches $\tilde{V}_{\mathrm{Data}}$.
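The $\delta$-matching step can be sketched as follows. Eq. (2) is not reproduced in this appendix, so the sketch replaces the V equation with a toy linear model $dV/dt = \delta - \epsilon V$, holds $V_0$ fixed (the paper selects $\delta$ and $V_0$ together), approximates $\tilde{V}$ with the trapezoidal rule, and finds $\delta$ by bisection. The toy model, the numeric values, and all function names are hypothetical; real counts would be divided by the population before use.

```python
import math
from typing import Callable, List, Sequence


def v_tilde(times: Sequence[float], v: Sequence[float], eps: float) -> float:
    """Trapezoidal approximation of the integral of eps * V(t) over [t0, tf]."""
    total = 0.0
    for k in range(1, len(times)):
        total += 0.5 * (times[k] - times[k - 1]) * eps * (v[k] + v[k - 1])
    return total


def v_data_series(at_least_one: Sequence[float],
                  fully: Sequence[float]) -> List[float]:
    """V_Data(t): people with at least one shot minus fully vaccinated people."""
    return [a - f for a, f in zip(at_least_one, fully)]


def match_delta(model_v_tilde: Callable[[float], float], target: float,
                lo: float = 0.0, hi: float = 1.0, tol: float = 1e-12) -> float:
    """Bisection for delta, assuming model_v_tilde increases with delta."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if model_v_tilde(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Toy stand-in for the V equation (NOT the SAIVR system):
# dV/dt = delta - eps * V with V(0) = V0, in population fractions.
EPS = 1.0 / 21.0                       # hypothetical V -> R rate (per day)
TIMES = [float(t) for t in range(61)]  # 60-day window, daily samples


def toy_v(t: float, delta: float, v0: float = 0.0) -> float:
    return delta / EPS + (v0 - delta / EPS) * math.exp(-EPS * t)


def toy_model_v_tilde(delta: float) -> float:
    return v_tilde(TIMES, [toy_v(t, delta) for t in TIMES], EPS)


# Synthetic target generated with delta = 0.004; real use would build it
# from v_data_series(...) applied to a country's vaccination counts.
target = toy_model_v_tilde(0.004)
print(match_delta(toy_model_v_tilde, target))  # close to 0.004
```

Bisection is enough here because, for a fixed $V_0$, $\tilde{V}$ grows monotonically with the inflow rate $\delta$; a joint search over $\delta$ and $V_0$ would need a second dimension.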

Although $I_0$ was provided in the data, we did not fix it during the fitting, because we empirically observed that the network generalizes better when $I_0$ is left free.

The fitting procedure was repeated 20 times for each dataset, and we kept the parameters and initial conditions that fit the ground-truth data with the lowest loss of Eq. (6).
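A minimal sketch of the best-of-20 selection, assuming each repetition differs only in its random initial guess (the appendix does not say what varies between runs). A toy one-dimensional loss with two minima stands in for Eq. (6), and plain gradient descent stands in for the stochastic gradient descent on the SAIVR parameters; all names are hypothetical.

```python
import random
from typing import Callable, List, Tuple

Fit = Tuple[float, float]  # (fitted parameter, final loss)


def toy_loss(x: float) -> float:
    """Non-convex stand-in for Eq. (6): minima near x = -1 (deeper) and x = +1."""
    return (x * x - 1.0) ** 2 + 0.3 * x


def toy_grad(x: float) -> float:
    return 4.0 * x * (x * x - 1.0) + 0.3


def fit_once(rng: random.Random, lr: float = 0.01, n_steps: int = 2000) -> Fit:
    """One run: random initial guess, then plain gradient descent."""
    x = rng.uniform(-2.0, 2.0)
    for _ in range(n_steps):
        x -= lr * toy_grad(x)
    return x, toy_loss(x)


def best_of_restarts(fit: Callable[[random.Random], Fit],
                     n_restarts: int = 20, seed: int = 0) -> Tuple[Fit, List[Fit]]:
    """Repeat the fit and keep the run with the lowest final loss."""
    rng = random.Random(seed)
    runs = [fit(rng) for _ in range(n_restarts)]
    return min(runs, key=lambda run: run[1]), runs


best, runs = best_of_restarts(fit_once)
print(best)  # lands in the deeper minimum, near x = -1.04
```

Runs that start to the right of the local maximum settle in the shallower minimum; repeating the fit and keeping the lowest loss is what protects against reporting such a run.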

Supplementary material

Supplementary material associated with this article can be found, in the online version, at 10.1016/j.chaos.2021.111621

Appendix B. Supplementary materials

Supplementary Data S1

Supplementary Raw Research Data. This is open data under the CC BY license http://creativecommons.org/licenses/by/4.0/

mmc1.pdf (185.5KB, pdf)

References

1. Kermack W.O., McKendrick A.G. A contribution to the mathematical theory of epidemics. Proc R Soc Lond A. 1927;115(772):700–721.
2. Kaxiras E., Neofotistos G. Multiple epidemic wave model of the COVID-19 pandemic: modeling study. J Med Internet Res. 2020;22(7):e20912. doi: 10.2196/20912.
3. Barthélemy M., Barrat A., Pastor-Satorras R., Vespignani A. Dynamical patterns of epidemic outbreaks in complex heterogeneous networks. J Theor Biol. 2005;235(2):275–288. doi: 10.1016/j.jtbi.2005.01.011.
4. Ferrari M.J., Bansal S., Meyers L.A., Bjørnstad O.N. Network frailty and the geometry of herd immunity. Proc R Soc B. 2006;273(1602):2743–2748. doi: 10.1098/rspb.2006.3636.
5. Volz E. SIR dynamics in random networks with heterogeneous connectivity. J Math Biol. 2007;56(3):293–310. doi: 10.1007/s00285-007-0116-4.
6. Tagliazucchi E., Balenzuela P., Travizano M., Mindlin G., Mininni P. Lessons from being challenged by COVID-19. Chaos Solitons Fractals. 2020;137:109923. doi: 10.1016/j.chaos.2020.109923.
7. Liu Q.-H., Ajelli M., Aleta A., Merler S., Moreno Y., Vespignani A. Measurability of the epidemic reproduction number in data-driven contact networks. Proc Natl Acad Sci. 2018;115(50):12680–12685. doi: 10.1073/pnas.1811115115.
8. Zhang M., Verbraeck A., Meng R., Chen B., Qiu X. Modeling spatial contacts for epidemic prediction in a large-scale artificial city. J Artif Soc Soc Simul. 2016;19(4):3. doi: 10.18564/jasss.3148.
9. Groendyke C., Welch D., Hunter D.R. Bayesian inference for contact networks given epidemic data. Scand J Stat. 2010. doi: 10.1111/j.1467-9469.2010.00721.x.
10. Sanche S., Lin Y.T., Xu C., Romero-Severson E., Hengartner N., Ke R. High contagiousness and rapid spread of severe acute respiratory syndrome coronavirus 2. Emerg Infect Dis. 2020;26(7):1470–1477. doi: 10.3201/eid2607.200282.
11. Li Q., Guan X., Wu P., Wang X., Zhou L., Tong Y., Ren R., Leung K.S., Lau E.H., Wong J.Y., Xing X., Xiang N., Wu Y., Li C., Chen Q., Li D., Liu T., Zhao J., Liu M., Tu W., Chen C., Jin L., Yang R., Wang Q., Zhou S., Wang R., Liu H., Luo Y., Liu Y., Shao G., Li H., Tao Z., Yang Y., Deng Z., Liu B., Ma Z., Zhang Y., Shi G., Lam T.T., Wu J.T., Gao G.F., Cowling B.J., Yang B., Leung G.M., Feng Z. Early transmission dynamics in Wuhan, China, of novel coronavirus-infected pneumonia. New Engl J Med. 2020;382(13):1199–1207. doi: 10.1056/NEJMoa2001316. PMID: 31995857.
12. Du Z., Wang L., Cauchemez S., Xu X., Wang X., Cowling B.J., et al. Risk for transportation of coronavirus disease from Wuhan to other cities in China. Emerg Infect Dis. 2020;26(5):1049–1052. doi: 10.3201/eid2605.200146.
13. Rothe C., Schunk M., Sothmann P., Bretzel G., Froeschl G., Wallrauch C., et al. Transmission of 2019-nCoV infection from an asymptomatic contact in Germany. New Engl J Med. 2020;382(10):970–971. doi: 10.1056/NEJMc2001468.
14. Wynants L., Calster B.V., Collins G.S., Riley R.D., Heinze G., Schuit E., et al. Prediction models for diagnosis and prognosis of COVID-19: systematic review and critical appraisal. BMJ. 2020:m1328. doi: 10.1136/bmj.m1328.
15. Koh J., Shah S.U., Chua P.E.Y., Gui H., Pang J. Epidemiological and clinical characteristics of cases during the early phase of COVID-19 pandemic: a systematic review and meta-analysis. Front Med. 2020;7:295. doi: 10.3389/fmed.2020.00295.
16. Riccardo F., Ajelli M., Andrianou X.D., Bella A., Del Manso M., Fabiani M., et al. Epidemiological characteristics of COVID-19 cases and estimates of the reproductive numbers 1 month into the epidemic, Italy, 28 January to 31 March 2020. Eurosurveillance. 2020;25(49):2000790. doi: 10.2807/1560-7917.ES.2020.25.49.2000790.
17. Khalili M., Karamouzian M., Nasiri N., Javadi S., Mirzazadeh A., Sharifi H. Epidemiological characteristics of COVID-19: a systematic review and meta-analysis. Epidemiol Infect. 2020;148:e130. doi: 10.1017/S0950268820001430.
18. Li J., Huang D.Q., Zou B., Yang H., Hui W.Z., Rui F., et al. Epidemiology of COVID-19: a systematic review and meta-analysis of clinical characteristics, risk factors, and outcomes. J Med Virol. 2021;93(3):1449–1458. doi: 10.1002/jmv.26424.
19. Flamant C., Protopapas P., Sondak D. Solving differential equations using neural network solution bundles. 2020. arXiv:2006.14372.
20. Mattheakis M., Sondak D., Dogra A.S., Protopapas P. Hamiltonian neural networks for solving differential equations. 2020. arXiv:2001.11107.
21. Paticchio A., Scarlatti T., Mattheakis M., Protopapas P., Brambilla M. Semi-supervised neural networks solve an inverse problem for modeling COVID-19 spread. 2020. arXiv:2010.05074.
22. Saito M.M., Imoto S., Yamaguchi R., Sato H., Nakada H., Kami M., et al. Extension and verification of the SEIR model on the 2009 influenza A (H1N1) pandemic in Japan. Math Biosci. 2013;246(1):47–54. doi: 10.1016/j.mbs.2013.08.009.
23. Fang H., Chen J., Hu J. Modelling the SARS epidemic by a lattice-based Monte–Carlo simulation. 2005. doi: 10.1109/iembs.2005.1616239.
24. Smirnova A., deCamp L., Chowell G. Forecasting epidemics through nonparametric estimation of time-dependent transmission rates using the SEIR model. Bull Math Biol. 2017;81(11):4343–4365. doi: 10.1007/s11538-017-0284-3.
25. Alanazi S.A., Kamruzzaman M.M., Alruwaili M., Alshammari N., Alqahtani S.A., Karime A. Measuring and preventing COVID-19 using the SIR model and machine learning in smart health care. J Healthc Eng. 2020;2020:1–12. doi: 10.1155/2020/8857346.
26. Palladino A., Nardelli V., Atzeni L.G., Cantatore N., Cataldo M., Croccolo F., Estrada N., Tombolini A. Modelling the spread of COVID19 in Italy using a revised version of the SIR model. 2020. arXiv:2005.08724.
27. Cooper I., Mondal A., Antonopoulos C.G. A SIR model assumption for the spread of COVID-19 in different communities. Chaos Solitons Fractals. 2020;139:110057. doi: 10.1016/j.chaos.2020.110057.
28. Buitrago-Garcia D., Egli-Gany D., Counotte M.J., Hossmann S., Imeri H., Ipekci A.M., et al. Occurrence and transmission potential of asymptomatic and presymptomatic SARS-CoV-2 infections: a living systematic review and meta-analysis. PLoS Med. 2020;17(9):1–25. doi: 10.1371/journal.pmed.1003346.
29. Byambasuren O., Cardona M., Bell K., Clark J., McLaws M.-L., Glasziou P. Estimating the extent of asymptomatic COVID-19 and its potential for community transmission: systematic review and meta-analysis. medRxiv. 2020. doi: 10.1101/2020.05.10.20097543.
30. Hossain M.K., Hassanzadeganroudsari M., Apostolopoulos V. The emergence of new strains of SARS-CoV-2. What does it mean for COVID-19 vaccines? Expert Rev Vaccines. 2021;0(0):1–4. doi: 10.1080/14760584.2021.1915140. PMID: 33896316.
31. Wang R., Maddix D., Faloutsos C., Wang Y., Yu R. Bridging physics-based and data-driven modeling for learning dynamical systems. 2021. arXiv:2011.10616.
32. Yang Z., Zeng Z., Wang K., Wong S.-S., Liang W., Zanin M., Liu P., Cao X., Gao Z., Mai Z., Liang J., Liu X., Li S., Li Y., Ye F., Guan W., Yang Y., Li F., Luo S., Xie Y., Liu B., Wang Z., Zhang S., Wang Y., Zhong N., He J. Modified SEIR and AI prediction of the epidemics trend of COVID-19 in China under public health interventions. J Thorac Dis. 2020;12(3):165–174. doi: 10.21037/jtd.2020.02.64. https://jtd.amegroups.com/article/view/36385.
33. Zou D., Wang L., Xu P., Chen J., Zhang W., Gu Q. Epidemic model guided machine learning for COVID-19 forecasts in the United States. medRxiv. 2020. doi: 10.1101/2020.05.24.20111989.
34. Ayed I., Bézenac E.D., Pajot A., Gallinari P. Learning partially observed PDE dynamics with neural networks. 2019. https://openreview.net/forum?id=HyefgnCqFm.
35. Chen R.T.Q., Rubanova Y., Bettencourt J., Duvenaud D.K. Neural ordinary differential equations. In: Bengio S., Wallach H., Larochelle H., Grauman K., Cesa-Bianchi N., Garnett R., editors. Advances in neural information processing systems. vol. 31. Curran Associates, Inc.; 2018. https://proceedings.neurips.cc/paper/2018/file/69386f6bb1dfed68692a24c8686939b9-Paper.pdf.
36. Paszke A., Gross S., Chintala S., Chanan G., Yang E., DeVito Z., Lin Z., Desmaison A., Antiga L., Lerer A. Automatic differentiation in PyTorch. 2017.
37. Polack F.P., Thomas S.J., Kitchin N., Absalon J., Gurtman A., Lockhart S., Perez J.L., Pérez Marc G., Moreira E.D., Zerbini C., Bailey R., Swanson K.A., Roychoudhury S., Koury K., Li P., Kalina W.V., Cooper D., Frenck R.W., Hammitt L.L., Türeci O., Nell H., Schaefer A., Ünal S., Tresnan D.B., Mather S., Dormitzer P.R., Sahin U., Jansen K.U., Gruber W.C. Safety and efficacy of the BNT162b2 mRNA COVID-19 vaccine. New Engl J Med. 2020;383(27):2603–2615. doi: 10.1056/NEJMoa2034577. PMID: 33301246.
38. Baden L.R., El Sahly H.M., Essink B., Kotloff K., Frey S., Novak R., Diemert D., Spector S.A., Rouphael N., Creech C.B., McGettigan J., Khetan S., Segall N., Solis J., Brosz A., Fierro C., Schwartz H., Neuzil K., Corey L., Gilbert P., Janes H., Follmann D., Marovich M., Mascola J., Polakowski L., Ledgerwood J., Graham B.S., Bennett H., Pajon R., Knightly C., Leav B., Deng W., Zhou H., Han S., Ivarsson M., Miller J., Zaks T. Efficacy and safety of the mRNA-1273 SARS-CoV-2 vaccine. New Engl J Med. 2021;384(5):403–416. doi: 10.1056/NEJMoa2035389. PMID: 33378609.
39. George N., Tyagi N.K., Prasad J.B. COVID-19 pandemic and its average recovery time in Indian states. Clin Epidemiol Glob Health. 2021;11:100740. doi: 10.1016/j.cegh.2021.100740.
40. Our World in Data GitHub repository. 2021. https://github.com/owid/covid-19-data.
41. Omer S.B., Yildirim I., Forman H.P. Herd immunity and implications for SARS-CoV-2 control. JAMA. 2020;324(20):2095–2096. doi: 10.1001/jama.2020.20892.
42. Chowdhury S., Roychowdhury S., Chaudhuri I. Universality and herd immunity threshold: revisiting the SIR model for COVID-19. Int J Mod Phys C. 2020;0(0):2150128. doi: 10.1142/S012918312150128X.
43. Aguas R., Corder R.M., King J.G., Gonçalves G., Ferreira M.U., Gomes M.G.M. Herd immunity thresholds for SARS-CoV-2 estimated from unfolding epidemics. medRxiv. 2020. doi: 10.1101/2020.07.23.20160762.
44. Arora P., Kempf A., Nehlmeier I., Sidarovich A., Krüger N., Graichen L., Moldenhauer A.-S., Winkler M.S., Schulz S., Jäck H.-M., Stankov M.V., Behrens G.M.N., Pöhlmann S., Hoffmann M. Increased lung cell entry of B.1.617.2 and evasion of antibodies induced by infection and BNT162b2 vaccination. bioRxiv. 2021. doi: 10.1101/2021.06.23.449568.
45. Bernal J.L., Andrews N., Gower C., Gallagher E., Simmons R., Thelwall S., Stowe J., Tessier E., Groves N., Dabrera G., Myers R., Campbell C., Amirthalingam G., Edmunds M., Zambon M., Brown K., Hopkins S., Chand M., Ramsay M. Effectiveness of COVID-19 vaccines against the B.1.617.2 variant. medRxiv. 2021. doi: 10.1101/2021.05.22.21257658.
46. MLCD: machine learning COVID dynamics. 2021. https://github.com/mattangeli/MLCD-Machine-Learning-Covid-Dynamics.
47. Kingma D.P., Ba J. Adam: a method for stochastic optimization. 2017. arXiv:1412.6980.

Articles from Chaos, Solitons, and Fractals are provided here courtesy of Elsevier
