2020 Aug 12;413:132693. doi: 10.1016/j.physd.2020.132693

Predicting the evolution of the COVID-19 epidemic with the A-SIR model: Lombardy, Italy and São Paulo state, Brazil

Armando GM Neves a,c,⁎,1, Gustavo Guerrero b,c,1
PMCID: PMC7419264  PMID: 32834253

Abstract

The presence of a large number of infected individuals with few or no symptoms is an important epidemiological difficulty and the main mathematical feature of COVID-19. The A-SIR model, i.e. a SIR (Susceptible–Infected–Removed) model with a compartment for infected individuals with no symptoms or few symptoms, was proposed by Gaeta (2020). In this paper we investigate a slightly generalized version of the same model and propose a scheme for fitting the parameters of the model to real data using only the time series of deceased individuals. The scheme is applied to the concrete cases of Lombardy, Italy and São Paulo state, Brazil, showing different aspects of the epidemic. In both cases we see strong evidence that the adoption of social distancing measures contributed to a slower increase in the number of deceased individuals when compared to the baseline of no reduction in the infection rate. Both for Lombardy and São Paulo we show that we may have good fits to the data up to the present, but with very large differences in the future behavior. The reasons behind such disparate outcomes are the uncertainty on the value of a key parameter, the probability that an infected individual is fully symptomatic, and on the intensity of the social distancing measures adopted. This conclusion reinforces the necessity of trying to determine the real number of infected individuals in a population, symptomatic or asymptomatic.

Keywords: COVID-19, Epidemics, Mathematical modeling, SIR-type models

1. Introduction

Although there are good models for predicting the time evolution of epidemics of diseases such as influenza or measles, models of the same type have not worked well for COVID-19. An important feature of COVID-19 is that it may be asymptomatic or mildly symptomatic in some patients, while causing severe respiratory symptoms in others. As a consequence, there is a large number of undocumented infections [1].

The lack of tests for assessing the health state of large samples of the population contributes to the spread of COVID-19, as asymptomatic individuals may not isolate themselves. Although there is no clear distinction between symptomatic and asymptomatic, in this paper we will use the acronym MSA (mildly symptomatic or asymptomatic) to mean a case of COVID-19 mild enough not to cause death or lead to hospitalization, and probably unreported due to the lack of tests.

We consider that a good step towards a reliable predictive model for COVID-19 was taken in [2]. In Section 2 we will describe a slight generalization of the A-SIR model proposed in that work and use it in the rest of this paper to predict the possible evolution of the COVID-19 epidemic in Lombardy, Italy and São Paulo state, Brazil.

Of course, as the pandemic progresses several different attempts are being made to understand it from a mathematical point of view. It is hard to quote and describe all the approaches, many of them still unpublished, so we cite here only some examples. Fanelli and Piazza [3] use the SIR model augmented by a compartment for dead individuals, with parameter estimation by the stochastic differential evolution algorithm [4]. Other papers [5], [6], [7], [8], [9], [10] use compartmental models other than SIR or A-SIR, or age-structured populations. Examples of papers where the spatial localization of the population is taken into account are [11] and [12].

The A-SIR model is just the traditional SIR (Susceptible–Infected–Removed) model for epidemics, introduced by Kermack and McKendrick [13] almost 100 years ago, with one extra compartment to account for the MSA infected individuals. The MSA can still transmit the disease to susceptible individuals but, as they are mostly unaware of their condition, it is reasonable to assume that they will keep transmitting the disease for longer periods than fully symptomatic individuals, who will probably isolate themselves after a few days. Of course, when the epidemic is already well developed, a large fraction of the population may have been MSA infected and, as these individuals heal, they will contribute to a large decrease in the number of susceptible people.

Although the proportion of symptomatic cases has already been estimated to be 16% for the development of the disease in China [1], we will take the liberty to explore the possibility that this proportion may be larger or smaller.

In support of the possibility that there are fewer symptomatic cases than previously estimated, we cite [14]. Referring to 11 European countries, the report states that “In all countries, we estimate there are orders of magnitude fewer infections detected than true infections, mostly likely due to mild and asymptomatic infections as well as limited testing capacity”. Fig. 1 in that paper illustrates that.

Fig. 1. Typical behavior in the A-SIR model of the fractions of susceptible, symptomatic infected, MSA infected and symptomatic removed individuals. Parameter values: β0 = 0.5, μ = 0.5, ξ = 2/3, γs = 1/7, γa = 1/21. The initial conditions are S(0) = 1, I(0) = A(0) = 0.0001, R(0) = 0.

Supporting the other possibility, Lavezzo et al. [15] state that at Vò, Italy, the asymptomatic cases were a fraction of 43.2% of the total. Although clearly casting some doubt, we also cite [16]. The paper states “Among the participants with positive results for SARS-CoV-2, symptoms of Covid-19 were reported (...) by 57% of those in the overall population-screening group. However, 29% of participants who tested negative in the overall population-screening group also reported having symptoms”.

One reason for the uncertainty in the outcome of mathematical models for COVID-19 is that the models usually contain parameters for which reasonable values are taken, but sometimes without full scientific support. In particular, the models are extremely sensitive to the infection rate β0, see 1. We will show in this paper how to ignore the data on the number of currently infected people. These are prone to not only a large uncertainty, because of the MSA cases, but also underreporting of the symptomatic cases due to the lack of tests. We will use only the data on the number of deaths due to the COVID-19. Such a decision has also been taken in [5]. The reason behind it is that we believe that the deaths data are a more faithful indicator than the number of cases. In most instances only the patients in severe conditions are being tested for COVID-19. We are aware, however, that underreporting in deaths is also possible due to the lack of tests, and also that political manipulation – both for increasing and for decreasing deaths numbers – cannot be discarded.

For the time being we restrict ourselves to the study of the development of COVID-19 in Lombardy and in the state of São Paulo. Both cases result in good fits of the model to the data. It will turn out that an important part of the fitting procedure is the way the value of the infection rate β0 is tuned to the data.

One important conclusion supported by our good fits – both in Lombardy and in São Paulo – is that the adopted social distancing measures taken in both localities did contribute to diminishing the number of deaths due to COVID-19 with respect to the expected behavior if no measures were taken.

An important question is what will happen when the social distancing measures currently in place in most countries are relaxed. One bad possibility is that a second wave of COVID-19 will arise. If not mitigated, the potential number of deaths in the second wave may be larger than the deaths in the first wave. Another possibility is that sufficient herd immunity will have been acquired by the populations after the present epidemic and no large increase of cases should happen after relaxation of the social distancing.

We will show in this paper that neither of the above possibilities can be ruled out for Lombardy. Part of our ignorance is due to the fact that one key parameter of the A-SIR model, the probability that a newly infected individual is symptomatic, is still largely unknown. Another reason for not being able to predict the future of the epidemic is that we do not know how much the social distancing measures adopted were effective in reducing the infection rate of the model.

In the case of São Paulo state, Brazil, the fraction of deaths up to now is much smaller than in Lombardy. Although this is good, it also means that the population is still very susceptible. Strong economic pressure is being exerted on politicians for relaxation of the social distancing measures. We predict that even in the best of the possibilities, the number of infected individuals will steadily grow for a large period and in its peak it will be much larger than present values. Thus, social distancing measures should not be relaxed before the number of infected individuals is small. We see that the increase in the number of cases may be catastrophic if social distancing measures are not strengthened.

The paper is organized as follows. Section 2 starts with a mathematical description of the model and all its parameters. Then we will talk about the linear regime, i.e. the behavior of the solutions of the model for a short time after the beginning of the epidemic. Finally, we will describe conditions for the population fraction of infected individuals to decrease and also the limit behavior after the epidemic is finished. Section 3 describes the procedure for finding values of the parameters such that the output of the A-SIR model fits well the deaths data, both in Lombardy and in São Paulo. The paper is closed by Section 4, in which we draw some conclusions on the results obtained, and by Section 5, in which we account for some changes in the conclusions because of new data released during the time the paper was being written.

2. The A-SIR model

Let S(t), I(t), A(t) and R(t) be the population fractions at time t respectively of susceptible, symptomatic infected, MSA infected and removed individuals. By susceptible, we mean individuals who were not yet infected by the SARS-CoV-2 virus. By symptomatic we mean fully symptomatic individuals and by MSA we mean individuals who have either no symptoms, or few symptoms. By removed we mean individuals who were either healed after infection, isolated (at a hospital or at home) or deceased. The fraction of removed individuals is the sum of the symptomatic removed Rs(t) and the MSA removed Ra(t), according to whether the individuals were fully symptomatic before removal, or had mild or no symptoms. The time span we are going to consider is of a few months, thus we may ignore births and disregard deaths by reasons other than infection. In particular, we suppose that MSA or susceptible individuals do not die and that no infected individual becomes susceptible again, at least for the time span we are considering.

The A-SIR model is described by the following set of nonlinear ordinary differential equations:

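\[
\begin{aligned}
\dot{S}(t) &= -\beta_0 S\,(I + \mu A),\\
\dot{I}(t) &= \beta_0 \xi S\,(I + \mu A) - \gamma_s I,\\
\dot{A}(t) &= \beta_0 (1-\xi) S\,(I + \mu A) - \gamma_a A,\\
\dot{R}_s(t) &= \gamma_s I,\\
\dot{R}_a(t) &= \gamma_a A.
\end{aligned}
\tag{1}
\]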

The latter two equations are not essential for solving the system. In fact, we may calculate Rs and Ra simply by integrating respectively I(t) and A(t). Another simple property of the model, proved by summing all the 5 equations, is that the sum of the fractions S, I, A, Rs and Ra is a constant. If we take μ=1, this is exactly the same model as in [2], although with different notation.

All parameters above are considered to be positive and are interpreted as follows:

  • β0 is the infection rate of symptomatic individuals;

  • μ ∈ (0,1] is a reduction factor such that the infection rate for the MSA is μβ0;

  • ξ ∈ (0,1) is the probability that a new infection event leads to a symptomatic case;

  • γs and γa are respectively the inverses of the mean time symptomatic and MSA individuals remain infective. We suppose that γs>γa.

The mean removal time for symptomatic individuals will be considered to be around a week. This does not mean that individuals with symptoms will be healed after one week, but that these individuals, after showing symptoms for some days, will either be hospitalized or stay isolated at home. We will then take γs = 1/7 (days)⁻¹. Following [2], we will take γa = 1/21 (days)⁻¹, meaning that MSA individuals will remain active in the population for a longer time than symptomatic individuals.

We believe that we may take values such as above for γs and γa without risk of overestimating or underestimating the size of the epidemic. Other values could have been considered, but would not change the main conclusions. In the following we will explain how to use mortality data to infer the value of the infection rate β0. It will turn out that the value of the parameter μ changes almost nothing in the numerical predictions. On the contrary, we will see that the remaining parameter ξ drastically alters the future outcome of the model.

As a preparation for understanding things to come, we show in Fig. 1 a typical graph of the fractions S, I, A and Rs as functions of time for a seemingly reasonable choice of parameters. The graphs are obtained by numerically solving Eqs. (1).

For this choice of parameters, note that about 65% of the population are symptomatic removed 100 days after the start of the epidemic. Another thing to notice in the graphs of Fig. 1 is that, although we used ξ = 2/3 expecting to obtain that 2/3 of the cases are symptomatic, this does not happen. In fact, by time t = 40 the fraction of MSA individuals is larger than the fraction of symptomatic individuals, even though the probability of a case being symptomatic is larger than the probability of an MSA case. The reason for obtaining a large number of MSA individuals is not related to ξ, but to the fact that the mean time γa⁻¹ that an individual spends as MSA is larger than the mean time γs⁻¹ spent by a symptomatic individual.
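The behavior just described can be reproduced with a few lines of code. The following is a minimal sketch, not the authors' implementation: the text does not say which integrator was used, so a classical fourth-order Runge-Kutta scheme with a step of 0.05 days is assumed here, and all function and variable names are illustrative. The parameter values and initial conditions are those quoted in the caption of Fig. 1.

```python
def asir_rhs(state, beta0, mu, xi, gamma_s, gamma_a):
    """Right-hand side of Eqs. (1); state = (S, I, A, Rs, Ra)."""
    S, I, A, Rs, Ra = state
    force = beta0 * S * (I + mu * A)  # rate of new infections
    return (
        -force,
        xi * force - gamma_s * I,
        (1.0 - xi) * force - gamma_a * A,
        gamma_s * I,
        gamma_a * A,
    )


def rk4_step(state, dt, *params):
    """One classical Runge-Kutta step of size dt (an assumed integrator)."""
    def shifted(base, k, h):
        return tuple(b + h * ki for b, ki in zip(base, k))

    k1 = asir_rhs(state, *params)
    k2 = asir_rhs(shifted(state, k1, dt / 2), *params)
    k3 = asir_rhs(shifted(state, k2, dt / 2), *params)
    k4 = asir_rhs(shifted(state, k3, dt), *params)
    return tuple(
        s + dt / 6 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def solve_asir(state0, t_end, dt, *params):
    """Integrate from t = 0 to t_end; returns a list of (t, state) pairs."""
    n_steps = round(t_end / dt)
    state = tuple(state0)
    trajectory = [(0.0, state)]
    for n in range(1, n_steps + 1):
        state = rk4_step(state, dt, *params)
        trajectory.append((n * dt, state))
    return trajectory


# Values from the caption of Fig. 1.
PARAMS = (0.5, 0.5, 2 / 3, 1 / 7, 1 / 21)  # beta0, mu, xi, gamma_s, gamma_a
STATE0 = (1.0, 1e-4, 1e-4, 0.0, 0.0)       # S, I, A, Rs, Ra

trajectory = solve_asir(STATE0, 100.0, 0.05, *PARAMS)
t_final, (S, I, A, Rs, Ra) = trajectory[-1]
print(f"t = {t_final:.0f}: S = {S:.4f}, I = {I:.5f}, A = {A:.4f}, Rs = {Rs:.4f}")
```

The sum of the five fractions is conserved by the scheme up to round-off, which gives a quick sanity check on any implementation of Eqs. (1).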

To better understand this important issue, in Fig. 2 we plot, for the same set of parameters and initial conditions, the ratio I(t)/(I(t)+A(t)) of symptomatic to total cases. The figure shows that, because of the initial condition I(0) = A(0), the ratio of symptomatic to total infected individuals starts equal to 1/2. After a transient, it becomes almost constant around 0.6, not 2/3, and then decays to 0. Part of this behavior will be explained in the next subsection.

Fig. 2. The fraction of symptomatic cases as a function of time in the A-SIR model for the same parameter values and initial conditions as in Fig. 1.

2.1. The linear regime

Another feature of the A-SIR model, already noticed in [2], is that the fraction S(t) is very well approximated by 1 for the initial times. We may use this to approximate the solution of Eqs. (1) for small times. Substituting S(t) by 1 in (1), we get

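\[
\begin{aligned}
\dot{I}(t) &= (\beta_0 \xi - \gamma_s)\, I + \beta_0 \xi \mu\, A,\\
\dot{A}(t) &= \beta_0 (1-\xi)\, I + \bigl(\mu \beta_0 (1-\xi) - \gamma_a\bigr) A,
\end{aligned}
\tag{2}
\]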

which is a linear system of ordinary differential equations with constant coefficients. Although the exact system (1) cannot be exactly solved, its linear approximation for initial times can be solved in terms of the eigenvalues and eigenvectors of its coefficient matrix

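\[
M = \begin{pmatrix}
\beta_0 \xi - \gamma_s & \beta_0 \xi \mu \\
\beta_0 (1-\xi) & \mu \beta_0 (1-\xi) - \gamma_a
\end{pmatrix}.
\tag{3}
\]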

As M is 2 × 2, its eigenvalues can be easily calculated as roots of a quadratic polynomial. It can be shown that the eigenvalues of M are always real and that the smaller of them, denoted λ−, is negative. The larger eigenvalue of M will be denoted λ+ and is positive, provided that β0 is not too small, as will be seen ahead. Although the formula for λ+ in terms of the parameters of the model is somewhat long, we may invert it and find a rather simple formula for β0 as a function of λ+ and the remaining parameters:

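\[
\beta_0 = \frac{(\lambda_+ + \gamma_s)(\lambda_+ + \gamma_a)}{(\lambda_+ + \gamma_a)\,\xi + (\lambda_+ + \gamma_s)\,\mu\,(1-\xi)}.
\tag{4}
\]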

In Section 3, we will use the above formula along with an estimate of λ+ derived from the data to fix the parameter β0.

A straightforward, but lengthy, calculation shows that for any initial conditions the solution of Eqs. (2) satisfies

\[
\frac{I(t)}{I(t)+A(t)} \xrightarrow[t\to\infty]{} \rho,
\tag{5}
\]

with

\[
\rho = 1 - \frac{\beta_0 (1-\xi)}{\lambda_+ + \gamma_a + \beta_0 (1-\xi)(1-\mu)}.
\tag{6}
\]

In the above equation λ+ and β0 are related through Eq. (4). For the parameter values of Fig. 2 we have ρ=0.610411, which is approximately the height of the plateau in that graph.
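The relations between β0, λ+ and ρ are easy to check numerically. The sketch below, with illustrative function names that are not taken from the paper, computes the eigenvalues of the matrix M in Eq. (3) from its trace and determinant, inverts the relation through Eq. (4), and evaluates Eq. (6) for the parameter values of Fig. 2.

```python
import math


def eigenvalues(beta0, mu, xi, gamma_s, gamma_a):
    """Return (lambda_minus, lambda_plus) for the 2x2 matrix M of Eq. (3)."""
    m11 = beta0 * xi - gamma_s
    m12 = beta0 * xi * mu
    m21 = beta0 * (1 - xi)
    m22 = mu * beta0 * (1 - xi) - gamma_a
    trace = m11 + m22
    det = m11 * m22 - m12 * m21
    # The discriminant equals (m11 - m22)^2 + 4*m12*m21 >= 0, so both roots are real.
    root = math.sqrt(trace * trace - 4 * det)
    return (trace - root) / 2, (trace + root) / 2


def beta0_from_lambda(lam_plus, mu, xi, gamma_s, gamma_a):
    """Eq. (4): infection rate as a function of the Malthusian parameter."""
    a = lam_plus + gamma_s
    b = lam_plus + gamma_a
    return a * b / (b * xi + a * mu * (1 - xi))


def plateau_ratio(lam_plus, beta0, mu, xi, gamma_a):
    """Eq. (6): limit of I/(I+A) in the linear regime."""
    x = beta0 * (1 - xi)
    return 1 - x / (lam_plus + gamma_a + x * (1 - mu))


# Parameter values of Figs. 1 and 2.
beta0, mu, xi, gamma_s, gamma_a = 0.5, 0.5, 2 / 3, 1 / 7, 1 / 21

lam_minus, lam_plus = eigenvalues(beta0, mu, xi, gamma_s, gamma_a)
beta0_back = beta0_from_lambda(lam_plus, mu, xi, gamma_s, gamma_a)
rho = plateau_ratio(lam_plus, beta0, mu, xi, gamma_a)

print(f"lambda_minus = {lam_minus:.6f}, lambda_plus = {lam_plus:.6f}")
print(f"beta0 recovered from Eq. (4) = {beta0_back:.6f}")
print(f"rho = {rho:.6f}")
```

Feeding the computed λ+ back into Eq. (4) returns the original β0, and the value of ρ agrees with the one quoted above for Fig. 2.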

This agreement illustrates the more general fact that the exact solution of Eqs. (2) is a good approximation to the true solution of Eqs. (1) at the beginning of the epidemic. Of course, the exact solution of Eqs. (2) breaks down as an approximation for larger times. This is also shown in Fig. 2, because I(t)/(I(t)+A(t)) is not always close to the value defined in Eq. (6).

Another consequence of Fig. 2 is that the ratio of symptomatic to total infected individuals is a dynamic quantity. It cannot be included as a parameter in the model’s equations as in [1]. Moreover, as shown by Eq. (6), not even in the short interval of time in which the ratio I(t)/(I(t)+A(t)) is approximately constant does it equal the parameter ξ.

We will call linear regime the time interval in which the true solution of (1) is well approximated by the solution of (2). For the parameter values in Figs. 1 and 2 the linear regime lasts approximately up to time t = 25, at which the fractions of symptomatic and MSA individuals are already quite high.

In the linear regime, each of the quantities I(t), A(t), Rs(t) and R(t) is approximated by an exact solution of the form c1 e^{λ+ t} + c2 e^{λ− t}, where c1 and c2 are constants depending on which quantity we are calculating. As the term e^{λ− t} quickly tends to 0, we see that I(t), A(t), Rs(t) and R(t) are all approximated by c1 e^{λ+ t}, i.e., all of them are exponentially growing in the linear regime whenever λ+ > 0. Most importantly, all of them grow at the same rate determined by the largest eigenvalue λ+. For this reason, λ+ is called the Malthusian parameter of the model [17].

2.2. Conditions for the fraction of infected individuals to decrease

It can be shown [17] that the total fraction of infected individuals I(t)+A(t) in the solutions of (1) will decrease for all t > 0 if and only if λ+ ≤ 0. The same condition is generally written in terms not of λ+, but of the basic reproduction ratio R0. R0 is defined as the mean number of individuals infected by a single infected individual during its whole infective period if the population is entirely susceptible. If R0 ≤ 1, it can be shown that the number of infected individuals will decrease from the start and keep decreasing in the A-SIR model. It is straightforward to calculate R0 for the A-SIR model following any of the recipes given in [17]. The result is

\[
R_0 = \beta_0 \left( \frac{\xi}{\gamma_s} + \frac{\mu (1-\xi)}{\gamma_a} \right).
\tag{7}
\]

As commented before, the Malthusian parameter λ+ will be positive if β0 is sufficiently large. The precise condition is that the right-hand side of the above equation is larger than 1, i.e. β0 > γsγa/(γaξ + γsμ(1−ξ)).
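As a small worked example, the sketch below evaluates Eq. (7) and the threshold value of β0 for the parameter values of Fig. 1. The function names are illustrative only.

```python
def basic_reproduction_ratio(beta0, mu, xi, gamma_s, gamma_a):
    """Eq. (7): R0 for the A-SIR model."""
    return beta0 * (xi / gamma_s + mu * (1 - xi) / gamma_a)


def critical_beta0(mu, xi, gamma_s, gamma_a):
    """Smallest beta0 for which R0 exceeds 1 (equivalently, lambda_plus > 0)."""
    return gamma_s * gamma_a / (gamma_a * xi + gamma_s * mu * (1 - xi))


# Parameter values of Fig. 1.
beta0, mu, xi, gamma_s, gamma_a = 0.5, 0.5, 2 / 3, 1 / 7, 1 / 21

r0 = basic_reproduction_ratio(beta0, mu, xi, gamma_s, gamma_a)
beta_c = critical_beta0(mu, xi, gamma_s, gamma_a)

print(f"R0 = {r0:.4f}")
print(f"critical beta0 = {beta_c:.4f}")
# At the threshold, Eq. (7) gives exactly R0 = 1.
print(f"R0 at threshold = {basic_reproduction_ratio(beta_c, mu, xi, gamma_s, gamma_a):.4f}")
```

For these values R0 = 49/12, roughly 4.08, so β0 = 0.5 is about four times larger than the threshold.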

If R0 > 1 and the whole population is susceptible, the total number of infected individuals will initially increase, but as the number of susceptible individuals decreases, contagion becomes more difficult and, consequently, the total number of infected will reach a maximum at some time t*. In the simpler SIR model, it can be shown that t* is the time such that S(t*) = 1/R0. In the A-SIR model, no such simple condition exists, as we have two types of infected individuals and the number of one type may increase at the same time the other decreases. An instance of that is shown in Fig. 1 in the interval between the maximum point of I and the maximum point of A.

Gaeta [2] provided conditions for each of the fractions I and A to decrease. As Fig. 2 illustrates, whenever γs > γa and t is large enough, the fraction of MSA individuals is much larger than the fraction of symptomatic infected individuals, so that I(t)+A(t) ≈ A(t). We may use this fact to give an approximate condition for the fraction of total infected individuals to decrease.

In fact, the third equation in (1) shows that A(t) decreases whenever

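\[
S(t) < \frac{\gamma_a A}{\beta_0 (1-\xi)(I + \mu A)}
     = \frac{\gamma_a}{\beta_0 (1-\xi)\left[\dfrac{I+A}{A} - (1-\mu)\right]}.
\]

Replacing (I+A)/A in the above formula by its approximate value 1 for large times, we get the approximate condition

\[
S(t) < \frac{\gamma_a}{\beta_0 (1-\xi)\,\mu}
\tag{8}
\]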

for the decrease of the total number of infected individuals. We stress that the above condition is approximate and holds whenever γs>γa and t is large enough.

Notice that the threshold on the right-hand side of (8) is higher for smaller β0. As S(t) starts close to 1 and decreases, the threshold will be easier to attain if β0 is smaller. In other words, if β0 is small, fewer people have to be infected before the number of infected individuals starts to decrease.

2.3. The asymptotic equilibrium

All solutions of the A-SIR equations (1) converge as t → ∞ to the disease-free equilibrium in which S = S∞, I and A are both null and R = 1 − S∞. For a very contagious virus such as SARS-CoV-2, i.e. for large β0, S∞ is close to 0. In other words, almost the entire population is eventually infected in an unmitigated epidemic caused by a sufficiently contagious virus. This is illustrated in Fig. 1, in which S∞ can be numerically calculated to be 0.0181302.

Individuals exit the susceptible compartment of the model either as symptomatic or MSA infected, and they do so in the proportions ξ and 1 − ξ, respectively. Since they will eventually become either symptomatic removed or MSA removed, the symptomatic removed and MSA removed fractions at equilibrium obey

\[
\frac{R_s(\infty)}{R_a(\infty)} = \frac{\xi}{1-\xi}.
\]

Moreover, Rs(∞) + Ra(∞) = 1 − S∞. Solving the set formed by the latter two equations, we obtain that Rs(∞) = (1 − S∞) ξ. In the important case of a very contagious virus,

\[
R_s(\infty) \approx \xi.
\tag{9}
\]

This approximation is also well illustrated in Fig. 1.
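The value of S∞ can be cross-checked without integrating the differential equations. The relation used below is not stated in the text; it is a standard final-size argument, added here only as a sketch. Dividing the first equation of (1) by S, integrating over all times, and using the last two equations of (1) to express the time integrals of I and A yields ln(S(0)/S∞) = R0 (S(0) − S∞) + β0 (I(0)/γs + μ A(0)/γa), with R0 as in Eq. (7). The code solves this equation for S∞ by bisection; the function name and the tolerance are illustrative choices.

```python
import math


def final_susceptible(beta0, mu, xi, gamma_s, gamma_a, s0, i0, a0, tol=1e-13):
    """Solve ln(s0/s) = R0*(s0 - s) + seed for s in (0, s0) by bisection."""
    r0 = beta0 * (xi / gamma_s + mu * (1 - xi) / gamma_a)  # Eq. (7)
    seed = beta0 * (i0 / gamma_s + mu * a0 / gamma_a)      # initial infecteds

    def f(s):
        return math.log(s0 / s) - r0 * (s0 - s) - seed

    # f is decreasing on (0, 1/R0), positive near 0 and negative at the
    # upper end when seed > 0, so the root is bracketed.
    lo, hi = 1e-300, min(s0, 1.0 / r0)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Parameter values and initial conditions of Fig. 1.
s_inf = final_susceptible(0.5, 0.5, 2 / 3, 1 / 7, 1 / 21, 1.0, 1e-4, 1e-4)
rs_inf = (2 / 3) * (1.0 - s_inf)  # Rs at equilibrium as given in the text

print(f"S_inf = {s_inf:.7f}")
print(f"Rs(inf) = {rs_inf:.4f}")
```

The result should land close to the value of S∞ quoted above for Fig. 1, and the corresponding Rs(∞) is about 0.65, slightly below ξ = 2/3 as Eq. (9) suggests.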

3. Fitting the A-SIR model to real COVID-19 epidemic data

According to Crisanti [18] and Fenga [19], the lack of efficient testing has been responsible for a substantial underestimation of the number of cases in the COVID-19 epidemic in Italy. In particular, probably due to testing preferentially the most severe cases, the mortality rate in the regions of Lombardy and Emilia-Romagna was three times larger than in the neighboring region of Veneto, in which testing for COVID-19 was widespread [18]. The same problem caused by lack of tests is reported also in Brazil and elsewhere.

Because of this, we believe that the cumulative number of deaths is a much more faithful indicator of the evolution of the epidemic than the number of confirmed cases, both in Lombardy and in São Paulo. We chose the deaths data in both locations as the sources for our fitting. This approach was also taken e.g. in [14]. The official deaths data, plotted in Fig. 3, were collected respectively in [20] and [21], along with the official numbers of confirmed cases.

Fig. 3. The number of accumulated deaths due to COVID-19 in Lombardy since Feb. 24, 2020 (left) and in São Paulo state since Mar. 17, 2020 (right). Data collected respectively in [20] and [21].

We need one extra parameter, ω, to relate the outcome of the model to the number of reported deaths used here. In fact, the A-SIR model does not make any prediction for the population fraction D(t) of individuals dead due to COVID-19 up to time t. As only the symptomatic cases may die, it is natural to suppose that

\[
D(t) = \omega R_s(t),
\tag{10}
\]

where ω ∈ (0,1) is thus interpreted as the case fatality rate. The value of ω must also be found. Because of the lack of tests, an examination of data for several countries, as in [22], shows that the ratio of deaths to confirmed cases varies broadly among them.

Before entering into details, we describe the fitting procedure in general. Both in Lombardy and in São Paulo state, the epidemic started uncontrolled. Noticing the logarithmic vertical scale, we can see in both panels of Fig. 3 that in the first days the number of deaths seemed to increase exponentially, as expected for the linear regime, see Section 2.1. We will call these first days the phase of uncontrolled epidemic. After this phase, in both locations the number of deaths started to increase at a lower rate. An important task will be to assess whether this lower rate is a natural consequence of the A-SIR model, or whether it is due to the mitigation measures adopted in both locations.

As will be fully explained in the following, the first step will be using the deaths data in the uncontrolled epidemic phase to estimate the values of λ+ and ω.

In the next steps, for each phase of social distancing we will have another parameter ϵ indicating the intensity of the adopted measures. In Lombardy we will consider two phases of different intensities ϵ1 and ϵ2 for the period of social distancing. In São Paulo state, only one phase of social distancing will be considered. We will see that it is possible to use the number of deaths data to obtain estimates for ξ and for the intensities ϵi. Although in general an optimal choice for these parameters exists, we will see that many possible choices are almost as good for the purpose of fitting the data with the model. It results that more than one possible good fit of the model to the data exists. We will explore the consequences of this approximate degeneracy in the optimization procedure.

Since parameters γa and γs are fixed, as already explained, and β0 will be related to λ+ by Eq. (4), the parameter μ still remains undetermined. After several numerical experiments we noticed that, as long as λ+ is determined and β0 is related to it by (4), the results of the model are to a large extent independent of μ. This is a consequence of the fact that the initial behavior of the number of deaths according to the model is dictated by the linear regime, i.e., by λ+, whereas the final behavior is dictated by ξ, see Eq. (9). The above phenomenon is illustrated in Fig. 4, in which the results of A-SIR numerical solutions with two different values of μ show no noticeable differences. Thus, there will be no problem in adopting for μ any fixed value. We assume μ = 0.5 for the rest of this paper.

Fig. 4. Two almost indistinguishable solutions of the A-SIR model with γs = 1/7, γa = 1/21, ξ = 0.16, λ+ = 0.3. The left panel was produced with μ = 0.25 and the right panel with μ = 0.6. Observe that β0 is calculated by Eq. (4), thus it gets disparate values in the two simulations: 1.06995 for the solution in the left panel, and 0.555397 for the solution in the right panel. The initial conditions were S(0) = 1, I(0) = A(0) = 6.22×10⁻⁶ and Rs(0) = 0 for both cases.

The goodness of a fit between the number of deaths reported in the data and the A-SIR model will be quantified by a cost function to be minimized. For future reference, our choice of cost function is

\[
c_d(\lambda_+, \xi, \mu, \omega) = \frac{N}{d} \sqrt{\sum_{i=0}^{d} \bigl( D(i) - \omega R_s(i) \bigr)^2}.
\tag{11}
\]

In the above formula, besides other notations already introduced, N is the total population of the location (Lombardy or São Paulo state) and d is the last day up to which the numbers of reported deaths D(i) are being compared to the prediction of the model ωRs(i). We do not indicate the dependence of the cost function on the fixed parameters γa = 1/21 and γs = 1/7. The cost function also depends on the initial conditions to be used in the numerical solution of the model’s differential equations. The initial conditions will be specified when needed. The value of d depends on which data are used for estimating the parameters. For example, for estimating λ+ and ω we will take d to be the last day of the uncontrolled epidemic phase. For estimating the intensity of social distancing measures, we will use a larger value for d. Of course neither the square root nor the multiplication by N/d is necessary in the definition of the cost function, but they are included for convenience.
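A direct transcription of the cost function into code may help fix ideas. The sketch below assumes that Eq. (11) reads c_d = (N/d) · sqrt(Σ_{i=0}^{d} (D(i) − ωRs(i))²), with D(i) and Rs(i) given as population fractions on integer days. The two series used in the demonstration are synthetic placeholders invented for illustration, not epidemic data, and the function and variable names are hypothetical.

```python
import math


def cost(deaths, rs_model, omega, population, d):
    """Cost function of Eq. (11) under the reading stated in the lead-in.

    deaths[i]   -- reported deaths on day i, as a fraction of the population
    rs_model[i] -- model value of Rs on day i
    """
    squared_error = sum(
        (deaths[i] - omega * rs_model[i]) ** 2 for i in range(d + 1)
    )
    return population / d * math.sqrt(squared_error)


# Synthetic demonstration only: an exponentially growing Rs and "data"
# generated from it with a known omega, so the minimum is known in advance.
N = 1e7
rs_model = [1e-6 * math.exp(0.3 * day) for day in range(15)]
true_omega = 0.05
deaths = [true_omega * value for value in rs_model]

for omega in (0.03, 0.04, 0.05, 0.06):
    print(f"omega = {omega:.2f}: cost = {cost(deaths, rs_model, omega, N, 14):.4f}")
```

In an actual fit, `rs_model` would come from a numerical solution of Eqs. (1) and the cost would be minimized over the free parameters, for instance by scanning a grid in the (λ+, ω) plane.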

We will describe first the fitting procedure for Lombardy, where more data are available and then proceed to São Paulo state.

3.1. Fitting the epidemic in Lombardy

The national lockdown in Italy started on Mar. 9, 2020, day number 14 after Feb. 24, the start date of the Italian time-series. We will define Mar. 9, 2020 to be the end of the uncontrolled epidemic phase for Lombardy.

For the numerical solution of Eqs. (1) in Lombardy we use initial conditions

\[
S(0) = 1, \qquad I(0) = 1.66 \times 10^{-5}, \qquad
A(0) = \frac{1-\rho}{\rho}\, I(0), \qquad
R_s(0) = \frac{1}{\omega}\, 6 \times 10^{-7}.
\tag{12}
\]

Here we used that the population of Lombardy is 10⁷ inhabitants and that the number of confirmed COVID-19 cases on day 0 was 166 [20], accounting for the value of I(0). As the initial number of MSA is unknown, our choice for A(0) is a natural one such that the fraction I(t)/(I(t)+A(t)) is equal to ρ already on day 0, see (5) and (6). The initial condition for Rs(0) is also the natural one using the number of people dead due to COVID-19 on day 0, obtained from the data [20], and taking into account Eq. (10).

We stress that the initial conditions for I(0) and A(0) are the only places in our fitting procedure where some information on the number of confirmed cases is used. We will not use such information on any other day. It is conceivable that on day 0 the number of confirmed cases is more reliable than in later days. Moreover, the exact values of I(0) and A(0) do not matter so much, as I(t) and A(t) initially grow exponentially by a rate to be determined by the number of deaths.
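The construction of the initial conditions (12) can be sketched as follows. The helper name and its defaults are illustrative; the population, the number of confirmed cases on day 0 and the day-0 deaths (6, as implied by the value of Rs(0) in (12)) are taken from the text, while λ+ = 0.3216, ω = 0.051, ξ = 0.1 and μ = 0.5 are the values used for Lombardy in this section. β0 and ρ are obtained from Eqs. (4) and (6).

```python
def lombardy_initial_state(lam_plus, omega, xi=0.1, mu=0.5,
                           gamma_s=1 / 7, gamma_a=1 / 21,
                           population=1e7, cases_day0=166, deaths_day0=6):
    """Build (S, I, A, Rs) at day 0 following Eq. (12)."""
    # Eq. (4): infection rate from the Malthusian parameter.
    a = lam_plus + gamma_s
    b = lam_plus + gamma_a
    beta0 = a * b / (b * xi + a * mu * (1 - xi))
    # Eq. (6): symptomatic fraction of infected in the linear regime.
    x = beta0 * (1 - xi)
    rho = 1 - x / (lam_plus + gamma_a + x * (1 - mu))

    i0 = cases_day0 / population
    a0 = (1 - rho) / rho * i0             # makes I/(I+A) equal to rho on day 0
    rs0 = deaths_day0 / population / omega  # from D = omega * Rs, Eq. (10)
    return {"S": 1.0, "I": i0, "A": a0, "Rs": rs0, "beta0": beta0, "rho": rho}


state = lombardy_initial_state(lam_plus=0.3216, omega=0.051)
for name, value in state.items():
    print(f"{name} = {value:.6g}")
```

Because ρ depends on ξ and μ through Eqs. (4) and (6), A(0) has to be recomputed whenever those parameters are changed during the fit.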

In Fig. 5 we show contour plots of the objective function c14(λ+,0.1,0.5,ω). Observe that we are considering only the deaths data up to day 14, i.e. in the uncontrolled epidemic phase. We arbitrarily fixed μ=0.5 and ξ=0.1. The plots show that, for the chosen values of μ and ξ, in a large region in the (λ+,ω) plane the cost function has a single local minimum, which is probably a global minimum. The set of parameter values that minimize the cost function, i.e., provide the best fit of the model to the data up to day 14, is λ+=0.3216, ω=0.051.

Fig. 5. Left: Contour plots of c14(λ+,0.1,0.5,ω) for a broad range in λ+ and ω. Right: Zoom into the region of interest (darkest area of the left panel). Darker colors are smaller values and lighter colors are higher values of the cost function. In the white area the cost function has larger values than in the colored area. It seems clear that the function has a global minimum in the darker area of the right panel. The numerically determined location of the minimum point is λ+ = 0.3216, ω = 0.051. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Fig. 6 compares the epidemic data with the solution of Eqs. (1) with initial conditions (12) and parameters γs = 1/7, γa = 1/21 (fixed), μ = 0.5, ξ = 0.1 (arbitrarily chosen) and λ+ = 0.3216, ω = 0.051 (optimally chosen with respect to the preceding values). The dots in the figure are the data for D(i) divided by ω, and should thus approximate the curve for Rs(t) up to t = 14. Numerical experiments (not presented for conciseness) with different values for μ and ξ confirm that the graph of Rs(t) up to t = 14, to be adjusted to data, hardly changes. Thus, the fit shown in the figure remains equally good independent of the values of μ and ξ. As already mentioned, μ is quite irrelevant as long as β0 is calculated as a function of λ+. Also, the value of ξ is not relevant for reproducing the first days of the epidemic.

Fig. 6. Results of the A-SIR model, Eqs. (1) with initial conditions (12) and parameters γs = 1/7, γa = 1/21, μ = 0.5, ξ = 0.1, λ+ = 0.3216, ω = 0.051. The latter two were chosen to minimize the cost function, c14, using the former four parameters. The blue dots correspond to the number of deaths reported in pandemic data divided by ω. The graph of S(t) lies out of the range shown in the figure. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

It is clear from Fig. 6 that after day 14 (start of the national lockdown in Italy) the data increase more slowly than the Rs predicted for an uncontrolled epidemic. This is good evidence that the lockdown was effective in reducing the number of deaths due to COVID-19 in Lombardy.

We will incorporate the effect of social distancing measures in our model by introducing a decrease of the infection rate β0 to a smaller value ϵ1β0, where 0<ϵ1<1. More precisely, in order to avoid introducing a discontinuous function into the system of differential equations, we replace β0 in Eqs. (1) by the smooth function

β(t)=β0r(t) (13)

with

r(t)=1−(1−ϵ1)θ(t−14). (14)

In the above formula, θ(t) may be any continuous approximation of the unit step function. We used

θ(t)=(1/2)(1+erf(t)), (15)

where erf(z)=(2/√π)∫_0^z exp(−t²)dt is the error function, i.e. a rescaled integral of the Gaussian density. The graph of θ(t) is shown in Fig. 7. Any similar continuous function switching from values close to 0 to values close to 1 in an interval around 0 with size of order 1 can be used without relevant changes. The important thing is that β(t)≈β0 for t<14 and β(t)≈ϵ1β0 for t>14.
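The smooth switch of Eqs. (14) and (15) can be sketched in a few lines of Python. This is only an illustration, reading Eq. (14) as r(t)=1−(1−ϵ1)θ(t−14); the function names `theta` and `r_single` and the value `eps1 = 0.5` are hypothetical choices made for the example, not fitted parameters.

```python
import math


def theta(t):
    """Smooth approximation of the unit step, Eq. (15)."""
    return 0.5 * (1.0 + math.erf(t))


def r_single(t, eps1, t_start=14.0):
    """Reduction factor of Eq. (14): about 1 before t_start, about eps1 after."""
    return 1.0 - (1.0 - eps1) * theta(t - t_start)


eps1 = 0.5  # illustrative value only
for day in (10, 13, 14, 15, 18):
    # beta(t) = beta0 * r(t), so r(t) is the fraction of beta0 still in effect
    print(day, round(r_single(day, eps1), 4))
```

At t=14 the factor sits exactly halfway between 1 and ϵ1, and a few days on either side it is indistinguishable from the two plateau values.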

Fig. 7.

Graph of the transition function θ(t) considered in Eq. (15).

In the second step of our fitting procedure, we will fix the values of λ+ and ω already estimated and evaluate the values for the intensity ϵ1 of the first phase of the lockdown, and parameter ξ, which was not relevant in the uncontrolled epidemic phase. We referred above to the first phase of the lockdown, because a strengthening of it occurred on Mar. 22, day 27 after Feb. 24. Therefore, to estimate ϵ1 and ξ we will use d=27 in the cost function, Eq. (11).
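The day numbers used throughout this section are simply calendar offsets from day 0. A minimal sketch of the conversion for Lombardy follows, assuming the year 2020 for all dates; the helper name `day_number` is hypothetical.

```python
from datetime import date

# Day 0 for Lombardy, as stated in the text (year 2020 assumed)
DAY_ZERO_LOMBARDY = date(2020, 2, 24)


def day_number(d, day_zero=DAY_ZERO_LOMBARDY):
    """Number of days elapsed between day_zero and the date d."""
    return (d - day_zero).days


print(day_number(date(2020, 3, 9)))   # start of the national lockdown
print(day_number(date(2020, 3, 22)))  # strengthening of the lockdown
```

The two calls reproduce the values d=14 and d=27 used in the cost functions c14 and c27.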

Fig. 8 shows contour plots of the cost function c27 with parameters γs, γa, μ, λ+ and ω having the same values as in Fig. 6, but now allowing ξ and the new parameter ϵ1 to vary. The left panel shows a larger region in the (ϵ1,ξ) plane and the right panel shows in detail the region in which c27 attains its smallest values. We notice that, contrary to the analogous plots in Fig. 5, in which a clear global minimum is evident, this time the region where the cost function is close to its minimum looks like a large “canyon”.

Fig. 8.

Left: Contour plot of c27 as a function of ϵ1 and ξ in a large region of parameter choices. Right: Zoom to the smaller region contained in the dark area of the left panel plot. The remaining parameters have the same values as in Fig. 6. Darker colors are smaller values and lighter colors are higher values of the cost function. The global minimum is located at (0.513,0.256), but the cost function is rather close to the minimum in the large darker region in the right panel. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Referring to the darker area in the right panel of Fig. 8, we will call optimistic choices for (ϵ1,ξ) those points in the darker region with smaller values of ξ. In fact, because of Eqs. (9), (10), such a choice will lead to a small expected number of dead people at the end of the epidemic. On the contrary, points in the darker region with larger values of ξ will be called pessimistic choices for (ϵ1,ξ).

We show in Fig. 9 plots of two solutions to Eqs. (1) with optimization up to day 27. Both choices for (ϵ1,ξ) are almost equally good in fitting the data for days 0 to 27. However, the results on the left panel were obtained with an optimistic choice for (ϵ1,ξ), while the results on the right panel were obtained with a pessimistic choice.

Fig. 9.

Plots of two solutions to Eqs. (1) with different possibilities for the effect of the social distancing measures introduced by the lockdown of Mar. 9. In both panels, the full lines show, for comparison, what would be the solutions if the epidemic remained uncontrolled, whereas the dashed lines in corresponding colors show the effect of social distancing measures beginning at t=14. The results on the left panel were produced with ϵ1=0.566, ξ=0.04, and those on the right panel were obtained with ϵ1=0.52, ξ=0.5. The remaining parameters are the optimal ones determined using the results presented in Fig. 6. The choice for the left panel is an example of what we called an optimistic choice. The right panel corresponds to a pessimistic choice for the parameters. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

In both panels of Fig. 9 we also see that after day 27 the green dashed curve of Rs(t) grows faster than the data points. This suggests that the strengthening of the lockdown in Lombardy after Mar. 22 did have an effect in slowing down the growth in the number of deaths.

We will then introduce a further parameter ϵ2 such that β(t)≈ϵ2β0 for 27<t<100. In order to see the effects of relaxing the social isolation measures, we will also restore β(t) smoothly to its value β0 for t>100. Precisely, we will take

r(t)=1−(1−ϵ1)θ(t−14)−(ϵ1−ϵ2)θ(t−27)+(1−ϵ2)θ(t−100). (16)
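As a quick consistency check of Eq. (16), read as r(t)=1−(1−ϵ1)θ(t−14)−(ϵ1−ϵ2)θ(t−27)+(1−ϵ2)θ(t−100), the following sketch evaluates r(t) well inside each phase. The function name `r_lombardy` is hypothetical; the values of ϵ1 and ϵ2 are the ones quoted in the text for the optimistic choice.

```python
import math


def theta(t):
    """Smooth unit step of Eq. (15)."""
    return 0.5 * (1.0 + math.erf(t))


def r_lombardy(t, eps1, eps2, t1=14.0, t2=27.0, t3=100.0):
    """Reduction factor of Eq. (16) with switches at days t1, t2 and t3."""
    return (1.0
            - (1.0 - eps1) * theta(t - t1)
            - (eps1 - eps2) * theta(t - t2)
            + (1.0 - eps2) * theta(t - t3))


eps1, eps2 = 0.566, 0.268  # optimistic choice quoted in the text
for day in (5, 20, 60, 120):
    # expected plateaus: 1, eps1, eps2 and 1 again
    print(day, round(r_lombardy(day, eps1, eps2), 3))
```

Each step term cancels the previous plateau and installs the next one, so the successive values are 1, ϵ1, ϵ2 and finally 1 again after the relaxation on day 100.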

The value for ϵ2 will be determined by minimizing the cost function up to day 63, April 27. It turns out that the best choice for ϵ2 will depend on the choice made for ξ and ϵ1. In Fig. 10 we show the behavior of the cost function, c63, when varying ϵ2 for both the optimistic and pessimistic choices for (ϵ1,ξ) already considered in Fig. 9.

Fig. 10.

Plots of the cost function c63 for several values of ϵ2 and two choices of (ϵ1,ξ), the same optimistic (blue) and pessimistic (red) choices already used in the panels of Fig. 9. The remaining parameter values are the ones used in Fig. 6. The minima of the cost function occur at ϵ2=0.268 (optimistic case), and ϵ2=0.186 (pessimistic case). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

As the final result of the fitting procedure for Lombardy, we show in Fig. 11 the graphs of the solutions to the A-SIR model with two different sets of parameter values such that the model fits the deaths data rather well.

Fig. 11.

Expected outcomes for the COVID-19 epidemic in Lombardy for up to 150 days after Feb. 24, compared with the deaths data. We are supposing, as a possibility, that the social isolation measures are completely relaxed on day 100, June 3. An optimistic outcome is presented in the first row, and a pessimistic one in the second row. The plots in the left and right columns differ only in the range of the vertical scale. The parameter values valid for both rows are: γs=1/7, γa=1/21, λ+=0.3216, ω=0.051, μ=0.5, see Fig. 6. For the first row only, ϵ1=0.566, ξ=0.04, ϵ2=0.268. For the second row only, ϵ1=0.52, ξ=0.5, ϵ2=0.186. The red lines appearing in the second column are the graphs of γa/(β(t)(1−ξ)μ), see Eq. (8). Notice that, to a good approximation, when the graph of S(t) is below the red line, the fraction of MSA infected individuals decreases. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

A conclusion to be drawn from the results in Fig. 11 is that the A-SIR model can fit the deaths data for the epidemic in Lombardy extremely well if we consider that the social distancing measures adopted there were less intense from Mar. 9 to Mar. 22, days 14 to 27, and more intense from Mar. 22 to June 3, days 27 to 100. On the other hand, the goodness of the fits does not tell us very much about the future, when the social distancing measures are relaxed. Besides the two parameter choices shown, we have many others almost as good as them. The choices shown here are extreme in the sense that ξ=0.04 is not too much above the minimum value of ξ for the points in the darker region in the right panel of Fig. 8, and ξ=0.5 is the maximum value in the same region. Due to Eqs. (10), (9), these are also close to the extreme possibilities for the number of deaths at the end of the epidemic.

The optimistic choice, corresponding to the first row of Fig. 11, is such that after complete relaxation of the social distancing, the number of cases continues decreasing and the number of accumulated deaths increases slowly. This happens because the fraction of susceptible individuals since day 38 falls below the threshold on the right-hand side of Eq. (8) and does not increase very much above it after the social isolation measures are removed on day 100.

On the other hand, for the pessimistic choice, depicted in the second row of Fig. 11, the susceptible fraction is below the threshold on the right-hand side of Eq. (8) between days 27 and 100. The red line representing the threshold is not shown in the figure in this time interval, because the threshold is larger than 1. But the susceptible fraction stays high above the threshold after day 100, when social isolation measures are removed. As a consequence, there is a fast increase of the number of cases after day 100, i.e. a second wave of COVID-19 cases, potentially almost as intense as it would have been if the epidemic remained uncontrolled since the beginning.
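The role of the threshold can be illustrated numerically. The sketch below reads the red-line expression of Eq. (8) as γa/(β(t)(1−ξ)μ) with γa=1/21, which is our reading of the formula. The value `BETA0 = 0.57` is a hypothetical placeholder chosen only for illustration (the paper computes β0 from λ+ by a relation not reproduced in this section), and the function name `msa_threshold` is likewise an assumption.

```python
def msa_threshold(beta0, reduction, xi, mu, gamma_a=1.0 / 21.0):
    """Susceptible fraction below which the MSA infected fraction decreases,
    read here as gamma_a / (beta(t) * (1 - xi) * mu), beta(t) = beta0 * reduction."""
    return gamma_a / (beta0 * reduction * (1.0 - xi) * mu)


BETA0 = 0.57       # hypothetical placeholder, not a value from the paper
xi, mu = 0.5, 0.5  # pessimistic choice for xi, arbitrary mu as in the text

phases = (("uncontrolled", 1.0), ("days 14 to 27", 0.52), ("days 27 to 100", 0.186))
for label, reduction in phases:
    print(label, round(msa_threshold(BETA0, reduction, xi, mu), 3))
```

Whatever the value of β0, the threshold scales as the inverse of the reduction factor, so the lockdown with ϵ2=0.186 raises it by a factor 1/0.186, roughly 5.4, relative to the uncontrolled phase. This is why the threshold can exceed 1 during the strict lockdown and fall well below the susceptible fraction once the measures are removed.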

If we had more knowledge either on the value of ξ or on the values of ϵ1 and ϵ2, we might choose among the most optimistic, the most pessimistic, or some intermediate possibility between them. Due to lack of such knowledge, we present all possibilities.

The value of ξ could be estimated either by clinical research or by large-scale population screening. Such studies are under way, and their results might help us remove the uncertainties. In Section 5 we will comment that if the results of a population screening performed in Spain [23] were extrapolated to Lombardy, this would restrict the possibilities for ξ.

As far as ϵ1 and ϵ2 are concerned, some attempts have been made to measure the intensity of the social distancing using cell phone localization data, in particular for the regions in Italy [24]. A problem is that we do not know how to relate these measures with the decreases ϵ1 and ϵ2 in the infection rate.

3.2. Fitting the epidemic in São Paulo state

In Brazil the first imported case of COVID-19 was identified in the city of São Paulo on Feb. 26, 2020. The first official death, also in São Paulo, occurred on Mar. 17, 2020. For this reason, we chose that date as day 0 for the epidemic in Brazil. Although there have been no nation-wide social distancing measures in Brazil up to now, many state governors and mayors, including the governor of São Paulo state, decreed such measures. It is difficult to identify a clear starting date, as measures were gradual, but it seems reasonable to choose Mar. 23, day number 6 after the first death, as the end of the uncontrolled epidemic phase in the state of São Paulo. Mar. 23 was, in fact, the first day on which all schools were closed in São Paulo state.

After the start of the social distancing measures, cell phone localization data [25] show a tendency of slowly decreasing efficiency of these measures. Although some metropolitan areas in Brazil have already declared lockdown, no such measure has been declared in São Paulo up to the time of the latest data shown in this paper, Apr. 29. As a consequence, we prefer to use a single reduction of infection rates in fitting the epidemic in São Paulo.

The fitting procedure for São Paulo is thus similar to the one for Lombardy, with the exception that we use only one social distancing reduction factor ϵ1 for the entire period of the data. Taking into account that the population of the state of São Paulo is N=4.6×10⁷ people, the initial conditions used for the numerical solution of the ODEs (1) were

S(0)=1, I(0)=3.56522×10⁻⁶,
A(0)=((1−ρ)/ρ)I(0), Rs(0)=(1/ω)×2.17391×10⁻⁸. (17)

The choice of these conditions is analogous to what was done for Lombardy.
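The numerical constants in Eq. (17) can be related to the population size by simple arithmetic. The short sketch below assumes that the compartments are population fractions, so that a head count is a fraction multiplied by N. Interpreting 2.17391×10⁻⁸ as 1/N, i.e. a single death on day 0, is our reading and is not stated explicitly in this section; the helper name `rs0` is hypothetical.

```python
N = 4.6e7  # population of São Paulo state, as given in the text

# Second factor of Rs(0) in Eq. (17): compare with 1/N
one_over_N = 1.0 / N
print(f"{one_over_N:.5e}")

# I(0) is a population fraction; multiplying by N recovers a head count
I0 = 3.56522e-6
print(round(I0 * N))


def rs0(omega):
    """Rs(0) of Eq. (17), read as (1/omega) * (1/N)."""
    return one_over_N / omega


print(rs0(0.074))  # with the omega fitted for São Paulo
```

Under this reading, Rs(0) is the fraction of removed symptomatic individuals that corresponds to one recorded death when a fraction ω of them die.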

The uncontrolled epidemic phase lasted from day 0 (Mar. 17) to day 6 (Mar. 23). We used the deaths during that phase to obtain estimates for λ+ and ω. Fig. 12 shows that the cost function c6(λ+,0.1,0.5,ω) apparently has a global minimum at (λ+,ω)=(0.302,0.074).

Fig. 12.

Contour plot of c6(λ+,0.1,0.5,ω) for the state of São Paulo. Darker colors are smaller values and lighter colors are higher values of the cost function. In the large white area the cost function has larger values than in the colored area. It seems clear that the function has a global minimum in the darker area. The numerically determined location of the minimum point is λ+=0.302, ω=0.074. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

After determining λ+ and ω, we used the deaths data up to day 43 to try to determine the parameter ξ and the reducing factor ϵ1 for the infection rate during the social distancing measures started on day 6. Similar to Lombardy, Fig. 13 shows a canyon shaped region in which the cost function c43 is close to its minimum. The parameters used in the figure, besides the optimal values for λ+ and ω, are specified in its caption.

Fig. 13.

Contour plot of c43 as a function of ϵ1 and ξ. The remaining parameters are γs=1/7, γa=1/21 (fixed), μ=0.5 (arbitrary) and λ+=0.302, ω=0.074 (determined by minimizing c6). Darker colors are smaller values and lighter colors are higher values of the cost function. The blue canyon shaped region is where c43 is close to its minimum. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

As in the case of Lombardy, we are faced with the fact that good fits of the A-SIR model with the deaths data can be obtained for a continuum of values for ξ. For small values of ξ we have optimistic outcomes, in the sense that the number of deaths at the end of the epidemic is smaller. For larger values of ξ, we have pessimistic scenarios. Fig. 14 shows results both of an optimistic and a pessimistic solution, with different vertical scales.

Fig. 14.

Plots of expected outcomes for the COVID-19 epidemic in the state of São Paulo for up to 150 days after Mar. 17. The blue dots compare the results with epidemic deaths data. The optimistic and pessimistic outcomes are presented in the first and second rows, respectively. The plots in the two columns differ only in the range of the vertical scale. The parameter values valid for both rows are: γs=1/7, γa=1/21, λ+=0.302, ω=0.074, μ=0.5. For the first row only, ϵ1=0.395, ξ=0.02 and the social isolation measures were relaxed on day 100. For the second row only, ϵ1=0.445, ξ=0.5 and the social isolation measures were relaxed only on day 120. The red lines appearing in the second column are the graphs of γa/(β(t)(1−ξ)μ), see Eq. (8). Notice that, to a good approximation, when the graph of S(t) is below the red line, the fraction of MSA infected individuals decreases. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

One important difference between the cases of São Paulo and Lombardy is that the social distancing period started much earlier in São Paulo. As a consequence, the fraction of deceased people in São Paulo is much smaller up to now. The data points in the graphs of the second column in Fig. 14 are almost invisible. Of course, this is good, but it also has a bad side. In both optimistic and pessimistic cases the fraction of susceptible individuals remains for a long period above the threshold on the right-hand side of Eq. (8). Consequently, the fraction of infected individuals will be growing for a longer time when compared to Lombardy.

In the optimistic case, we see that if the social distancing measures remain with the present intensity until day 100 (Jun. 25), the fraction of infected individuals will increase up to a maximum by day 80 and then start decreasing. If the social distancing measures are removed on day 100, the fraction of infected individuals will resume growth for some days and then decrease again.

In the pessimistic case, the number of infected individuals will grow to a number much larger than the present one. Even if the social distancing measures are relaxed after day 120, there will still be a rapid increase in the number of infected people.

In both cases, we predict that the number of infected individuals at present will still increase considerably before attaining its peak value, which will be by day 90 in the optimistic case, or by day 130 in the pessimistic case. Given that the health services in São Paulo are already operating close to their maximum capacity, in both cases social distancing needs to be intensified in order to raise the threshold for the decrease of the infected fraction and prevent the collapse of these services.

4. Conclusions

The COVID-19 pandemic forced us, as scientists, to tackle the difficult task of trying to understand a new disease at the same time as it is killing people in our neighborhoods and stressing our health services. As shown in this paper, the lack of solid knowledge on basic questions also leaves us ignorant of what may happen in the future, even when the present situation is well described by a simple mathematical model. We hope that more basic research may help fill the knowledge gaps, but that will probably take time.

Since the beginning of the pandemic, we have seen many papers with very different predictions on the population fraction that may die as a consequence of COVID-19, most of them too catastrophic. Part of this disparity is due to our ignorance on basic facts about the virus and the disease, as already remarked, and on the number of mildly symptomatic or asymptomatic cases. Another part is a consequence of the difficulty in estimating the many parameters in any realistic mathematical model.

In this paper we used a model as simple as possible in order to have the minimum number of parameters, and devised a procedure to estimate these parameters based only on the more faithful data, the number of deaths. We avoided using information on the number of confirmed cases, or trying to guess underreporting factors. If we had considered more complete models, we would probably have to estimate a larger number of parameters, resulting in a larger uncertainty. We believe that our results, uncertain as they are, may be useful in showing both a worst and a best possible outcome. One important result of our calculations is that the epidemic both in Lombardy and in São Paulo would have been much worse if social distancing measures had not been taken.

We saw that in an optimistic scenario, the number of cases of COVID-19 in Lombardy might not increase after the social distancing measures are relaxed. However, we also saw that it is possible that the number of cases shows a new quick rise, making it necessary either to keep these measures for longer or to use alternative measures. As social distancing is being relaxed in many countries, particularly in Italy, it is possible that the uncertainty in our results will be resolved in the coming days, according to whether or not the number of cases grows rapidly.

In São Paulo state, as the fraction of infected individuals is up to now much smaller than in Lombardy, herd immunity is still far off. As a consequence, our calculations predict that the fraction of infected individuals will still increase for some time, even in the most optimistic case. As stressed before, strengthening social distancing measures could alleviate this situation.

We believe that the two locations in which we fitted the model to real data may serve as examples of what may happen in other locations.

5. Post scriptum

As this paper was being written, more data became available both for Lombardy and for São Paulo state. The paper was finished on May 21, but in all figures of Section 3 we decided to keep the data only up to Apr. 29, because the conclusions would not change too much. We report some differences in this section.

The strict lockdown in Italy ended on May 4, but social isolation measures are being slowly relaxed. If we had added the most recent data to Fig. 11, we would still have good fits of the data to the model in both cases, suggesting that the infection rate has not yet increased very much.

If in Eq. (10) we use t=85 and the datum for the population fraction of deceased individuals up to that day, we obtain an estimate Rs(85)=0.0306. Taking into account Eq. (9) and the fact that Rs is an increasing function, this means that although values of ξ smaller than 0.0306 are allowed by Fig. 8, these must be ruled out. The optimistic value ξ=0.04 used in Fig. 11 remains as a possibility.

The first results of a serological study of a large random sample of the population in Spain appeared on May 13 [23]. Although we have not used any data from Spain in this paper, one result in the cited report is interesting to consider here. On page 12 of [23] a map shows the percentages of people having antibodies against SARS-CoV-2 in all provinces of Spain. In the provinces where the epidemic was stronger, these percentages vary between 10.9% and 14.2%. This means that the fraction of susceptible individuals varies between 0.858 and 0.891 in these provinces. Of course these susceptible fractions cannot be blindly extended to any other location, but if we extrapolate them to Lombardy, that would also rule out the optimistic value, ξ=0.04, because it produces considerably smaller values for the susceptible fraction.

Regarding the epidemic in São Paulo, Table 1 shows the ratio of the maximum predicted number of symptomatic infected individuals to the present value (day 65) of the same quantity in three situations: the pessimistic and optimistic situations considered in Fig. 14, and an intermediate situation (graphs not shown) with ξ=0.12, ϵ1=0.405 and social distancing measures with the present intensity up to day 120. The table also shows the predicted date of the maximum.

Table 1.

Predicted day/date for the maximum of symptomatic infected individuals in the state of São Paulo and predicted ratio of the maximum number of symptomatic infected individuals to the present number of symptomatic infected individuals, day 65 (May 21). We consider three cases: the optimistic and pessimistic cases of Fig. 14 and an intermediate one with ξ=0.12, ϵ1=0.405 and social distancing lasting up to day 120.

Case          Day / Date       Imax/I(65)
Optimistic    80 / June 5      1.38
Pessimistic   130 / July 25    39.8
Intermediate  100 / June 25    5.49

We see that even in the optimistic situation, the A-SIR model predicts an increase of 38% in the number of symptomatic cases in only 15 days (from day 65 to day 80). Intensified social distancing measures might help to mitigate this situation. In the other two cases, the recommendation for intensifying social distancing is of course even stronger.
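The entries of Table 1 can be cross-checked with a few lines of Python. This is only a bookkeeping sketch: it converts the day numbers to calendar dates, assuming day 0 is Mar. 17, 2020, and turns each ratio into a percentage increase over day 65. The dictionary `table1` and the helper `date_of_day` are hypothetical names; the numbers are those of Table 1.

```python
from datetime import date, timedelta

DAY_ZERO_SP = date(2020, 3, 17)  # day 0 for São Paulo state
PRESENT_DAY = 65                 # May 21, reference day of Table 1


def date_of_day(n, day_zero=DAY_ZERO_SP):
    """Calendar date corresponding to day number n."""
    return day_zero + timedelta(days=n)


# case -> (day of the predicted maximum, ratio Imax / I(65)), from Table 1
table1 = {
    "Optimistic": (80, 1.38),
    "Pessimistic": (130, 39.8),
    "Intermediate": (100, 5.49),
}

for case, (peak_day, ratio) in table1.items():
    increase_percent = (ratio - 1.0) * 100.0
    days_ahead = peak_day - PRESENT_DAY
    print(case, date_of_day(peak_day).isoformat(),
          f"+{increase_percent:.0f}%", f"in {days_ahead} days")
```

The first row gives the 38% increase quoted in the text, and the three dates agree with the Day/Date column of the table.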

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

We thank the members of the COVID-19 Modeling Task Force in Minas Gerais, Brazil, for suggesting many references and for discussions of results in earlier phases of this research.

Funding

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Communicated by V.M. Perez-Garcia

References


Articles from Physica D. Nonlinear Phenomena are provided here courtesy of Elsevier
