Royal Society Open Science. 2020 Dec 2;7(12):201878. doi: 10.1098/rsos.201878

SI epidemic model applied to COVID-19 data in mainland China

J Demongeot, Q Griette, P Magal
PMCID: PMC7813244  PMID: 33489297

Abstract

This article is devoted to parameter identification in the SI model. We consider several methods, starting with an exponential fit to the early cumulative data of SARS-CoV2 in mainland China. This methodology provides a way to compute the parameters at the early stage of the epidemic. Next, we establish an identifiability result. Then we use the Bernoulli–Verhulst model as a phenomenological model to fit the data and derive some results on parameter identification. The last part of the paper is devoted to numerical algorithms that fit a daily piecewise constant rate of transmission.

Keywords: corona virus, reported and unreported cases, parameters identification, epidemic mathematical model

1. Introduction

Estimating the average transmission rate is one of the most crucial challenges in the epidemiology of communicable diseases. This rate conditions the entry into the epidemic phase of the disease and its return to the extinction phase, if it has diminished sufficiently. It combines three factors: the coefficient of virulence, linked to the infectious agent (in the case of infectious transmissible diseases); the coefficient of susceptibility, linked to the host (these two being summarized in the probability of transmission); and the number of contacts per unit of time between individuals [1]. The coefficient of virulence may change over time owing to mutation over the course of the disease history. The second and third factors may also change if mitigation measures are taken, as was the case in China from the start of the pandemic [2]. Monitoring the decrease in the average transmission rate is an excellent way to monitor the effectiveness of these mitigation measures. Estimating the rate is therefore a central problem in the fight against epidemics.

The goal of this article is to understand how to compare the SI model to the reported epidemic data, so that the model can be used to predict the future evolution of epidemic spread and to test various possible scenarios of social mitigation measures. For t ≥ t0, the SI model is the following:

$$S'(t) = -\tau(t)\,S(t)\,I(t) \quad\text{and}\quad I'(t) = \tau(t)\,S(t)\,I(t) - \nu\,I(t), \tag{1.1}$$

where S(t) is the number of susceptible and I(t) the number of infectious at time t. This system is supplemented by initial data

$$S(t_0) = S_0 \geq 0 \quad\text{and}\quad I(t_0) = I_0 \geq 0. \tag{1.2}$$

In this model, the rate of transmission τ(t) combines the number of contacts per unit of time and the probability of transmission. The transmission of the pathogen from the infectious to the susceptible individuals is described by a mass action law τ(t) S(t) I(t) (which is also the flux of new infectious).

The quantity 1/ν is the average duration of the infectious period and νI(t) is the flux of recovering or dying individuals. At the end of the infectious period, we assume that a fraction f ∈ (0, 1] of the infectious individuals is reported. Let CR(t) be the cumulative number of reported cases. We assume that

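$$\mathrm{CR}(t) = \mathrm{CR}_0 + \nu f\,\mathrm{CI}(t), \quad\text{for } t \geq t_0, \tag{1.3}$$

where

$$\mathrm{CI}(t) = \int_{t_0}^{t} I(\sigma)\,d\sigma. \tag{1.4}$$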
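To make the model concrete, here is a minimal Python sketch that integrates system (1.1) together with CR′(t) = νf I(t), which is the differentiated form of (1.3) and (1.4). The function name `simulate_si`, the classical Runge–Kutta scheme and the step size are illustrative assumptions and are not taken from the article. The parameter values in the usage lines are the ones quoted later in remark 2.1, and CR0 = 0 is chosen only for illustration.

```python
def simulate_si(tau, s0, i0, cr0, nu, f, t0, t_end, steps_per_day=100):
    """Integrate (1.1) with CR'(t) = nu*f*I(t) using classical RK4.

    `tau` is a callable t -> transmission rate. Returns a list of
    (t, S, I, CR) tuples, one per step (hypothetical helper).
    """
    def rhs(t, s, i):
        new_infections = tau(t) * s * i      # mass action flux tau(t) S I
        return (-new_infections, new_infections - nu * i, nu * f * i)

    n = max(1, round((t_end - t0) * steps_per_day))
    h = (t_end - t0) / n
    t, s, i, cr = t0, s0, i0, cr0
    path = [(t, s, i, cr)]
    for _ in range(n):
        k1 = rhs(t, s, i)
        k2 = rhs(t + h / 2, s + h / 2 * k1[0], i + h / 2 * k1[1])
        k3 = rhs(t + h / 2, s + h / 2 * k2[0], i + h / 2 * k2[1])
        k4 = rhs(t + h, s + h * k3[0], i + h * k3[1])
        s += h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        i += h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        cr += h / 6 * (k1[2] + 2 * k2[2] + 2 * k3[2] + k4[2])
        t += h
        path.append((t, s, i, cr))
    return path


# Usage with a constant transmission rate over 10 days.
S0, I0, NU, F = 1.4e9, 1521.0, 0.2, 0.5
TAU0 = 3.3214e-10
path = simulate_si(lambda t: TAU0, S0, I0, 0.0, NU, F, t0=0.0, t_end=10.0)
t_end, s_end, i_end, cr_end = path[-1]
```

Because S′ + I′ + CR′/f = 0, the quantity S + I + (CR − CR0)/f stays equal to S0 + I0 along the solution, which gives a quick sanity check of any integrator used here.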

Assumption 1.1. —

We assume that

  • S0 > 0 the number of susceptible individuals at time t0 when we start to use the model;

  • 1/ν > 0 the average duration of infectious period;

  • f > 0 the fraction of reported individuals;

are known parameters.

Throughout this paper, the parameter S0 = 1.4 × 10⁹ will be the entire population of mainland China (since COVID-19 is a newly emerging disease). The actual number of susceptibles S0 can be smaller since some individuals can be partially (or totally) immunized by previous infections or other factors. This is also true for SARS-CoV2, even if COVID-19 is a newly emerging disease. In fact, for COVID-19 the level of susceptibility may depend on blood group and genetic lineage. It is indeed suspected that the blood group O is associated with a lower susceptibility to SARS-CoV2 while a gene cluster inherited from Neanderthal has been identified as a risk factor for severe symptoms [3,4].

At the very beginning of the epidemic, the average duration of the infectious period 1/ν is unknown, since the virus has never been investigated before. Therefore, at the very beginning of the COVID-19 epidemic, medical doctors and public health scientists used previously estimated average durations of the infectious period to make public health recommendations. Here we show that the average infectious period is impossible to estimate by using only the time series of reported cases, and must therefore be identified by other means. In fact, with the data of SARS-CoV2 in mainland China, we can fit the cumulative number of reported cases almost perfectly for any positive value 1/ν < 3.3 days. In the literature, several estimations were obtained: 11 days in [5], 9.5 days in [6], 8 days in [7] and 3.5 days in [8]. The recent survey by Byrne et al. [9] focuses on this subject.

Result.

In §3, our analysis shows that:

  • It is hopeless to estimate the exact value of the duration of infectiousness by using SI models. Several values of the average duration of the infectious period give the exact same fit to the data.

  • We can estimate an upper bound for the duration of infectiousness by using SI models. In the case of SARS-CoV2 in mainland China, this upper bound is 3.3 days.

In [10], it is reported that transmission of COVID-19 infection may occur from an infectious individual who is not yet symptomatic. In [11], it is reported that COVID-19-infected individuals generally develop symptoms, including mild respiratory symptoms and fever, on average 5–6 days after the infection date (with a confidence of 95%, range 1–14 days). In [12], it is reported that the median time prior to symptom onset is 3 days, the shortest 1 day, and the longest 24 days. It is evident that these time periods play an important role in understanding COVID-19 transmission dynamics. Here the fraction of reported individuals f is unknown as well.

Result.

In §3, our analysis shows that:

  • It is hopeless to estimate the fraction of reported by using the SI models. Several values for the fraction of reported give the exact same fit to the data.

  • We can estimate a lower bound for the fraction of reported cases f. We obtain 3.83 × 10⁻⁵ < f ≤ 1. This lower bound is not significant. Therefore, we cannot say anything about the fraction of unreported cases from this class of models.

As a consequence, the parameters 1/ν and f have to be estimated by another method, for instance by a direct survey applied to an appropriate sample of the population in order to evaluate the two parameters.

The goal of this article is to focus on the estimation of the two remaining parameters. Namely, knowing the above-mentioned parameters, we plan to identify

  • I0 the initial number of infectious at time t0;

  • τ(t) the rate of transmission at time t.

This problem has already been considered in several articles. In the early 1970s, London & Yorke [13,14] already discussed the time-dependent rate of transmission in the context of measles, chickenpox and mumps. More recently, in Wang & Ruan [15] the question of reconstructing the rate of transmission was considered for the 2002–2004 SARS outbreak in China. In Chowell et al. [16], a specific form was chosen for the rate of transmission and applied to the Ebola outbreak in Congo. Another approach was also proposed in Smirnova et al. [17].

In §2, we will explain how to apply the method introduced in Liu et al. [18] to fit the early cumulative data of SARS-CoV2 in China. This method provides a way to compute I0 and τ0 = τ(t0) at the early stage of the epidemic. In §3, we establish an identifiability result in the spirit of Hadeler [19].

In §4, we use the Bernoulli–Verhulst model as a phenomenological model to describe the data. As observed in several articles, the data from mainland China (and other countries as well) can be fitted very well by this model. As a consequence, we obtain an explicit formula for τ(t) and I0 expressed as a function of the parameters of the Bernoulli–Verhulst model and the remaining parameters of the SI model. This approach gives a very good description of this set of data. Its disadvantage is that it requires an evaluation of the final size CR∞ from the very beginning (or at least an estimation of this quantity).

Therefore, in order to be predictive, we will explore in the remaining sections of the paper the possibility of constructing a day-by-day rate of transmission. Here we should refer to Bakhta et al. [20] where another novel forecasting method was proposed.

In §5, we will prove that the daily cumulative data can be approached perfectly by at most one sequence of day-by-day piecewise constant transmission rates. In §6, we propose a numerical method to compute such a (piecewise constant) rate of transmission. Section 7 is devoted to the discussion, and we will present some figures showing the daily basic reproduction number for the COVID-19 outbreak in mainland China.

2. Estimating τ(t0) and I0 at the early stage of the epidemic

In this section, we apply the method presented in [21] to the SI model. At the early stage of the epidemic, we can assume that S(t) is almost constant and equal to S0. We can also assume that τ(t) remains constant and equal to τ0 = τ(t0). Therefore, by substituting these values into the I-equation of system (1.1) we obtain

$$I'(t) = (\tau_0 S_0 - \nu)\,I(t).$$

Therefore,

$$I(t) = I_0 \exp\bigl(\chi_2 (t - t_0)\bigr),$$

where

$$\chi_2 = \tau_0 S_0 - \nu. \tag{2.1}$$

By using (1.3), we obtain

$$\mathrm{CR}(t) = \mathrm{CR}_0 + \nu f I_0\,\frac{e^{\chi_2 (t - t_0)} - 1}{\chi_2}. \tag{2.2}$$

We obtain a first phenomenological model for the cumulative number of reported cases (valid only at the early stage of the epidemic)

$$\mathrm{CR}(t) = \chi_1 e^{\chi_2 t} - \chi_3. \tag{2.3}$$

In figure 1, we compare the model to the COVID-19 data for mainland China. The data used in the article are taken from [22–24] and reported in appendix A. In order to estimate the parameter χ3, we minimize the distance between CRData(t) + χ3 and the best exponential fit $t \mapsto \chi_1 e^{\chi_2 t}$ (i.e. we use the Matlab function fit(t, data,‘exp1’)).

Figure 1.


In this figure, we plot the best fit of the exponential model to the cumulative number of reported cases of COVID-19 in mainland China between 19 February and 1 March. We obtain χ1 = 3.7366, χ2 = 0.2650 and χ3 = 615.41 with t0 = 19 Feb. The parameter χ3 is obtained by minimizing the error between the best exponential fit and the data.
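The fitting idea behind (2.3) can be sketched in a few lines of Python. This is not the Matlab `fit(…,'exp1')` procedure used in the article: as a simplifying assumption, the sketch fits the exponential by linear least squares on log(CR + χ3) and selects χ3 from a grid of candidate shifts. The helper names and the data are hypothetical; the series is synthetic, generated from values close to those reported in figure 1, and is not the appendix A data.

```python
import math


def fit_log_linear(ts, ys):
    """Least-squares line through (t, log y): returns (chi1, chi2, sse)."""
    n = len(ts)
    logs = [math.log(y) for y in ys]
    t_mean = sum(ts) / n
    log_mean = sum(logs) / n
    sxx = sum((t - t_mean) ** 2 for t in ts)
    sxy = sum((t - t_mean) * (lg - log_mean) for t, lg in zip(ts, logs))
    chi2 = sxy / sxx
    log_chi1 = log_mean - chi2 * t_mean
    sse = sum((lg - (log_chi1 + chi2 * t)) ** 2 for t, lg in zip(ts, logs))
    return math.exp(log_chi1), chi2, sse


def fit_shifted_exponential(ts, cr_data, chi3_candidates):
    """Pick the shift chi3 for which CR + chi3 is closest to an exponential."""
    best = None
    for chi3 in chi3_candidates:
        shifted = [c + chi3 for c in cr_data]
        if min(shifted) <= 0:
            continue                          # logarithm undefined, skip
        chi1, chi2, sse = fit_log_linear(ts, shifted)
        if best is None or sse < best[3]:
            best = (chi1, chi2, chi3, sse)
    return best


# Synthetic series (assumption): chi1 = 3.7366, chi2 = 0.2650, chi3 = 615.
days = [float(d) for d in range(20, 31)]
synthetic = [3.7366 * math.exp(0.2650 * d) - 615.0 for d in days]
chi1_fit, chi2_fit, chi3_fit, _ = fit_shifted_exponential(
    days, synthetic, [float(c) for c in range(0, 1001)])
```

On noisy data, the log-space residual weights small counts more heavily than a fit in the original scale would, so the two approaches can give somewhat different parameters.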

The estimated initial number of infected and transmission rate.

By using (1.3) and (2.3), we obtain

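$$I_0 = \frac{\mathrm{CR}'(t_0)}{\nu f} = \frac{\chi_1 \chi_2 e^{\chi_2 t_0}}{\nu f}, \tag{2.4}$$

and by using (2.1)

$$\tau_0 = \frac{\chi_2 + \nu}{S_0}. \tag{2.5}$$

Remark 2.1. —

Fixing f = 0.5 and ν = 0.2, we obtain

$$I_0 = \frac{3.7366 \times 0.2650 \times \exp(0.2650 \times 19)}{0.2 \times 0.5} = 1521$$

and

$$\tau_0 = \frac{0.2650 + 0.2}{1.4 \times 10^{9}} = 3.3214 \times 10^{-10}.$$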
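The arithmetic of remark 2.1 can be checked with a short Python snippet that evaluates formulae (2.4) and (2.5) for the values quoted above. The variable names are illustrative. The result agrees with the remark up to the rounding of the reported digits.

```python
import math

chi1, chi2 = 3.7366, 0.2650    # exponential-fit parameters reported in figure 1
t0 = 19.0                      # value of t0 used in remark 2.1
nu, f = 0.2, 0.5               # fixed in remark 2.1
s0 = 1.4e9                     # population of mainland China

i0 = chi1 * chi2 * math.exp(chi2 * t0) / (nu * f)   # formula (2.4)
tau0 = (chi2 + nu) / s0                              # formula (2.5)
print(f"I0 = {i0:.1f}, tau0 = {tau0:.4e}")
```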

The influence of the errors made in the estimations (at the early stage of the epidemic) has been considered in the recent article by Roda et al. [25]. To understand this problem, let us first consider the case of the rate of transmission τ(t) = τ0 in the model (1.1). In that case (1.1) becomes

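$$S'(t) = -\tau_0\,S(t)\,I(t) \quad\text{and}\quad I'(t) = \tau_0\,S(t)\,I(t) - \nu\,I(t). \tag{2.6}$$

By using the S-equation of model (2.6) we obtain

$$S(t) = S_0 \exp\left(-\tau_0 \int_{t_0}^{t} I(\sigma)\,d\sigma\right) = S_0 \exp\bigl(-\tau_0\,\mathrm{CI}(t)\bigr),$$

where CI(t) is the cumulative number of infectious individuals. Substituting S(t) by this formula in the I-equation of (2.6) we obtain

$$I'(t) = S_0 \exp\bigl(-\tau_0\,\mathrm{CI}(t)\bigr)\,\tau_0\,\mathrm{CI}'(t) - \nu\,I(t).$$

Therefore, by integrating the above equation between t0 and t we obtain

$$\mathrm{CI}'(t) = I_0 + S_0\bigl[1 - \exp\bigl(-\tau_0\,\mathrm{CI}(t)\bigr)\bigr] - \nu\,\mathrm{CI}(t). \tag{2.7}$$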

Remarkably, equation (2.7) is monotone. We refer to Smith [26] for a comprehensive presentation on monotone systems. By applying a comparison principle to (2.7), we are in a position to confirm the intuition about epidemics SI models. Note that the monotone properties are only true for the cumulative number of infectious (this is false for the number of infectious).

Theorem 2.2. —

Let t > t0 be fixed. The cumulative number of infectious CI(t) is strictly increasing with respect to the following quantities

  • (i)

    I0 > 0 the initial number of infectious individuals;

  • (ii)

    S0 > 0 the initial number of susceptible individuals;

  • (iii)

    τ > 0 the transmission rate;

  • (iv)

    1/ν > 0 the average duration of the infectiousness period.

Error in the estimated initial number of infected and transmission rate.

Assume that the parameters χ1 and χ2 are estimated with a 95% confidence interval

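$$\chi_{1,95\%}^{-} \leq \chi_1 \leq \chi_{1,95\%}^{+}$$

and

$$\chi_{2,95\%}^{-} \leq \chi_2 \leq \chi_{2,95\%}^{+}.$$

We obtain

$$I_{0,95\%}^{-} := \frac{\chi_{1,95\%}^{-}\,\chi_{2,95\%}^{-}\,e^{\chi_{2,95\%}^{-} t_0}}{\nu f} \leq I_0 \leq I_{0,95\%}^{+} := \frac{\chi_{1,95\%}^{+}\,\chi_{2,95\%}^{+}\,e^{\chi_{2,95\%}^{+} t_0}}{\nu f} \tag{2.8}$$

and

$$\tau_{0,95\%}^{-} := \frac{\chi_{2,95\%}^{-} + \nu}{S_0} \leq \tau_0 \leq \tau_{0,95\%}^{+} := \frac{\chi_{2,95\%}^{+} + \nu}{S_0}. \tag{2.9}$$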

Remark 2.3. —

By using the data for mainland China, we obtain

$$\chi_{1,95\%}^{-} = 1.57,\quad \chi_{1,95\%}^{+} = 5.89,\quad \chi_{2,95\%}^{-} = 0.24,\quad \chi_{2,95\%}^{+} = 0.28. \tag{2.10}$$

In figure 2, we plot the upper solution CR⁺(t) (obtained by using $I_0 = I_{0,95\%}^{+}$ and $\tau_0 = \tau_{0,95\%}^{+}$) and the lower solution CR⁻(t) (obtained by using $I_0 = I_{0,95\%}^{-}$ and $\tau_0 = \tau_{0,95\%}^{-}$), which delimit the blue region. The black curve corresponds to the best estimated values I0 = 1521 and τ0 = 3.3214 × 10⁻¹⁰.
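The bounds (2.8) and (2.9) follow from the interval end points in (2.10) by direct substitution. The Python sketch below evaluates them, assuming the same values f = 0.5, ν = 0.2 and t0 = 19 as in remark 2.1 (the text of remark 2.3 does not restate them, so this is an assumption). Function and variable names are illustrative.

```python
import math

t0, nu, f, s0 = 19.0, 0.2, 0.5, 1.4e9      # assumed as in remark 2.1
chi1_lo, chi1_hi = 1.57, 5.89              # 95% interval for chi1, from (2.10)
chi2_lo, chi2_hi = 0.24, 0.28              # 95% interval for chi2, from (2.10)


def initial_infected(chi1, chi2):
    """Formula (2.4), increasing in both chi1 and chi2 for positive values."""
    return chi1 * chi2 * math.exp(chi2 * t0) / (nu * f)


def transmission_rate(chi2):
    """Formula (2.5), increasing in chi2."""
    return (chi2 + nu) / s0


i0_lo, i0_hi = initial_infected(chi1_lo, chi2_lo), initial_infected(chi1_hi, chi2_hi)
tau0_lo, tau0_hi = transmission_rate(chi2_lo), transmission_rate(chi2_hi)
print(f"I0 in [{i0_lo:.0f}, {i0_hi:.0f}], tau0 in [{tau0_lo:.3e}, {tau0_hi:.3e}]")
```

Since both formulae are monotone in χ1 and χ2, evaluating them at the interval end points is enough to obtain the extreme values.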

Figure 2.


In this figure, the black curve corresponds to the cumulative number of reported cases CR(t) obtained from the model (2.6) with CR′(t) = νf I(t) by using the values I0 = 1521 and τ0 = 3.32 × 10⁻¹⁰ obtained from our method and the early data from 19 February to 1 March. The blue region corresponds to the 95% confidence interval when the rate of transmission τ(t) is constant and equal to the estimated value τ0 = 3.32 × 10⁻¹⁰.

Recall that the final size of the epidemic corresponds to the positive equilibrium of (2.7)

$$0 = I_0 + S_0\bigl[1 - \exp\bigl(-\tau_0\,\mathrm{CI}_\infty\bigr)\bigr] - \nu\,\mathrm{CI}_\infty. \tag{2.11}$$

In figure 2, the changes in the parameters I0 and τ0 (in (2.8) and (2.9)) do not affect significantly the final size.
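Equation (2.11) has no closed-form solution, but its positive root is easy to bracket: the right-hand side equals I0 > 0 at CI∞ = 0 and is negative at CI∞ = (I0 + S0)/ν. The sketch below solves it by bisection for the best estimates of I0 and τ0 with ν = 0.2. The function name is hypothetical, and taking CR0 = 0 when converting to reported cases through (1.3) is an assumption made only for illustration.

```python
import math


def final_size_ci(i0, s0, tau0, nu, iters=200):
    """Positive root of 0 = I0 + S0*(1 - exp(-tau0*C)) - nu*C by bisection."""
    def g(c):
        # -expm1(-x) equals 1 - exp(-x) and stays accurate for small x.
        return i0 + s0 * (-math.expm1(-tau0 * c)) - nu * c

    lo, hi = 0.0, (i0 + s0) / nu       # g(lo) = I0 > 0 and g(hi) < 0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


NU, F = 0.2, 0.5
ci_inf = final_size_ci(i0=1521.0, s0=1.4e9, tau0=3.3214e-10, nu=NU)
cr_inf_model = NU * F * ci_inf         # reported final size, assuming CR0 = 0
```

The right-hand side of (2.11) is concave in CI∞ and positive at zero, so the positive root is unique and bisection cannot converge to a spurious solution.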

3. Theoretical formula for τ(t)

By using the S-equation of model (1.1) we obtain

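$$S(t) = S_0 \exp\left(-\int_{t_0}^{t} \tau(\sigma)\,I(\sigma)\,d\sigma\right),$$

next by using the I-equation of model (1.1) we obtain

$$I'(t) = S_0 \exp\left(-\int_{t_0}^{t} \tau(\sigma)\,I(\sigma)\,d\sigma\right)\tau(t)\,I(t) - \nu\,I(t),$$

and by taking the integral between t0 and t we obtain a Volterra integral equation for the cumulative number of infectious

$$\mathrm{CI}'(t) = I_0 + S_0\left[1 - \exp\left(-\int_{t_0}^{t} \tau(\sigma)\,I(\sigma)\,d\sigma\right)\right] - \nu\,\mathrm{CI}(t), \tag{3.1}$$

which is equivalent to (by using (1.3))

$$\mathrm{CR}'(t) = \nu f\left(I_0 + S_0\left[1 - \exp\left(-\frac{1}{\nu f}\int_{t_0}^{t} \tau(\sigma)\,\mathrm{CR}'(\sigma)\,d\sigma\right)\right]\right) + \nu\,\mathrm{CR}_0 - \nu\,\mathrm{CR}(t). \tag{3.2}$$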

The following result gives the time-dependent rate of transmission τ(t) for which the SI model matches a prescribed cumulative number of reported cases exactly.

Theorem 3.1. —

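Let S0, ν, f, I0 > 0 and CR0 ≥ 0 be given. Let t → I(t) be the second component of system (1.1). Let $\widehat{\mathrm{CR}} : [t_0, \infty) \to \mathbb{R}$ be a twice continuously differentiable function satisfying

$$\widehat{\mathrm{CR}}(t_0) = \mathrm{CR}_0, \tag{3.3}$$

$$\widehat{\mathrm{CR}}'(t_0) = \nu f I_0, \tag{3.4}$$

$$\widehat{\mathrm{CR}}'(t) > 0, \quad \forall t \geq t_0, \tag{3.5}$$

and

$$\nu f (I_0 + S_0) - \widehat{\mathrm{CR}}'(t) - \nu\bigl(\widehat{\mathrm{CR}}(t) - \mathrm{CR}_0\bigr) > 0, \quad \forall t \geq t_0. \tag{3.6}$$

Then

$$\widehat{\mathrm{CR}}(t) = \mathrm{CR}_0 + \nu f \int_{t_0}^{t} I(s)\,ds, \quad \forall t \geq t_0, \tag{3.7}$$

if and only if

$$\tau(t) = \frac{\nu f\left(\widehat{\mathrm{CR}}''(t)/\widehat{\mathrm{CR}}'(t) + \nu\right)}{\nu f (I_0 + S_0) - \widehat{\mathrm{CR}}'(t) - \nu\bigl(\widehat{\mathrm{CR}}(t) - \mathrm{CR}_0\bigr)}. \tag{3.8}$$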

Proof. —

Assume first (3.7) is satisfied. Then by using equation (3.1) we deduce that

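$$S_0 \exp\left(-\int_{t_0}^{t} \tau(\sigma)\,I(\sigma)\,d\sigma\right) = I_0 + S_0 - I(t) - \nu\,\mathrm{CI}(t).$$

Therefore,

$$\int_{t_0}^{t} \tau(\sigma)\,I(\sigma)\,d\sigma = \ln\left[\frac{S_0}{I_0 + S_0 - I(t) - \nu\,\mathrm{CI}(t)}\right] = \ln(S_0) - \ln\bigl[I_0 + S_0 - I(t) - \nu\,\mathrm{CI}(t)\bigr]$$

therefore by taking the derivative on both sides

$$\tau(t)\,I(t) = \frac{I'(t) + \nu I(t)}{I_0 + S_0 - I(t) - \nu\,\mathrm{CI}(t)} \;\Longleftrightarrow\; \tau(t) = \frac{\bigl(I'(t)/I(t)\bigr) + \nu}{I_0 + S_0 - I(t) - \nu\,\mathrm{CI}(t)} \tag{3.9}$$

and by using the fact that CR(t) − CR0 = νfCI(t) we obtain (3.8).

Conversely, assume that τ(t) is given by (3.8). Then if we define $\tilde{I}(t) = \widehat{\mathrm{CR}}'(t)/(\nu f)$ and $\widetilde{\mathrm{CI}}(t) = \bigl(\widehat{\mathrm{CR}}(t) - \mathrm{CR}_0\bigr)/(\nu f)$, by using (3.3) we deduce that

$$\widetilde{\mathrm{CI}}(t) = \int_{t_0}^{t} \tilde{I}(\sigma)\,d\sigma,$$

and by using (3.4)

$$\tilde{I}(t_0) = I_0. \tag{3.10}$$

Moreover from (3.8), we deduce that $\tilde{I}(t)$ satisfies (3.9). By using (3.10), we deduce that $t \mapsto \widetilde{\mathrm{CI}}(t)$ is a solution of (3.1). By uniqueness of the solution of (3.1), we deduce that $\widetilde{\mathrm{CI}}(t) = \mathrm{CI}(t)$ for all t ≥ t0, or equivalently $\widehat{\mathrm{CR}}(t) = \mathrm{CR}_0 + \nu f \int_{t_0}^{t} I(s)\,ds$ for all t ≥ t0. The proof is completed. ▪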

Formula (3.8) was already obtained by Hadeler ([19], see corollary 2).
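Formula (3.8) translates directly into code once the cumulative curve and its first two derivatives are available. The following Python sketch is an illustration with a hypothetical function name; it takes the three values at a single time t and raises an error when conditions (3.5) or (3.6) fail. As a consistency check, it is evaluated at t = t0 for the constant-rate model (2.6), where CR′(t0) = νf I0 and CR″(t0) = νf (τ0 S0 − ν) I0, so the formula should return τ0.

```python
def tau_from_cumulative(cr, d_cr, d2_cr, s0, i0, nu, f, cr0):
    """Transmission rate from formula (3.8) at one time point.

    cr, d_cr, d2_cr are the cumulative reported cases and their first
    and second time derivatives at that time.
    """
    denominator = nu * f * (i0 + s0) - d_cr - nu * (cr - cr0)
    if d_cr <= 0.0 or denominator <= 0.0:
        raise ValueError("conditions (3.5) and (3.6) must hold")
    return nu * f * (d2_cr / d_cr + nu) / denominator


# Consistency check at t0 for a constant rate tau0 (values from remark 2.1).
S0, I0, NU, F, CR0 = 1.4e9, 1521.0, 0.2, 0.5, 0.0
TAU0 = 3.3214e-10
d_cr_t0 = NU * F * I0                          # from (3.4)
d2_cr_t0 = NU * F * (TAU0 * S0 - NU) * I0      # nu*f*I'(t0) with S(t0) = S0
tau_t0 = tau_from_cumulative(CR0, d_cr_t0, d2_cr_t0, S0, I0, NU, F, CR0)
```

In practice the derivatives have to be estimated from data, and the second derivative in particular is sensitive to noise, which is why the choice of smoothing matters.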

4. Explicit formula for τ(t) and I0

Many phenomenological models have been compared to the data during the first phase of the COVID-19 outbreak. We refer to the paper of Tsoularis & Wallace [27] for a nice survey on the generalized logistic equations. Let us consider here for example, the Bernoulli–Verhulst equation

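$$\mathrm{CR}'(t) = \chi_2\,\mathrm{CR}(t)\left(1 - \left(\frac{\mathrm{CR}(t)}{\mathrm{CR}_\infty}\right)^{\theta}\right), \quad \forall t \geq t_0, \tag{4.1}$$

supplemented with the initial data

$$\mathrm{CR}(t_0) = \mathrm{CR}_0 \geq 0.$$

Let us recall the explicit formula for the solution of (4.1)

$$\mathrm{CR}(t) = \frac{e^{\chi_2 (t - t_0)}\,\mathrm{CR}_0}{\left[1 + \dfrac{\chi_2 \theta}{\mathrm{CR}_\infty^{\theta}} \displaystyle\int_{t_0}^{t} \bigl(e^{\chi_2 (\sigma - t_0)}\,\mathrm{CR}_0\bigr)^{\theta}\,d\sigma\right]^{1/\theta}} = \frac{e^{\chi_2 (t - t_0)}\,\mathrm{CR}_0}{\left[1 + \dfrac{\mathrm{CR}_0^{\theta}}{\mathrm{CR}_\infty^{\theta}}\bigl(e^{\chi_2 \theta (t - t_0)} - 1\bigr)\right]^{1/\theta}}. \tag{4.2}$$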
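The closed form (4.2) is straightforward to evaluate numerically. The Python sketch below uses an algebraically equivalent rearrangement, obtained by factoring exp(χ2θ(t − t0)) out of the bracket, so that no large exponential is computed for late times. The function name is illustrative, and the parameter values are those quoted in §4 of the article for mainland China.

```python
import math


def bernoulli_verhulst(t, t0, cr0, cr_inf, chi2, theta):
    """Solution (4.2) of the Bernoulli-Verhulst equation (4.1).

    Rearranged as CR0 / [a + (1 - a) * exp(-chi2*theta*(t - t0))]**(1/theta)
    with a = (CR0/CR_inf)**theta, which is equal to (4.2).
    """
    a = (cr0 / cr_inf) ** theta
    decay = math.exp(-chi2 * theta * (t - t0))
    return cr0 / (a + (1.0 - a) * decay) ** (1.0 / theta)


# Values quoted in section 4: CR0 = 198, CR_inf = 67102, chi2 = 0.66, theta = 0.22.
T0, CR0, CR_INF, CHI2, THETA = 0.0, 198.0, 67102.0, 0.66, 0.22
curve = [bernoulli_verhulst(float(d), T0, CR0, CR_INF, CHI2, THETA)
         for d in range(0, 61)]
```

The rearranged form makes the two limits visible: the bracket equals 1 at t = t0, giving CR0, and tends to a as t grows, giving CR∞.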

Assumption 4.1. —

We assume that the cumulative numbers of reported cases CRData(ti) are known for a sequence of times t0 < t1 < · · · < tn+1 (see figure 3).

Figure 3.


In this figure, we plot the best fit of the Bernoulli–Verhulst model to the cumulative number of reported cases of COVID-19 in China. We obtain χ2 = 0.66 and θ = 0.22. The black dots correspond to data for the cumulative number of reported cases and the red curve corresponds to the model.

Estimated initial number of infected.

By combining (1.3) and the Bernoulli–Verhulst equation (4.1) for t → CR(t), we deduce the initial number of infected

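$$I_0 = \frac{\mathrm{CR}'(t_0)}{\nu f} = \frac{\chi_2\,\mathrm{CR}_0\left(1 - \left(\mathrm{CR}_0/\mathrm{CR}_\infty\right)^{\theta}\right)}{\nu f}. \tag{4.3}$$

Remark 4.2. —

We fix f = 0.5. From the COVID-19 data in mainland China and formula (4.3) (with CR0 = 198), we obtain

$$I_0 = 1909 \quad\text{for } \nu = 0.1$$

and

$$I_0 = 954 \quad\text{for } \nu = 0.2.$$

By using (4.1), we deduce that

$$\mathrm{CR}''(t) = \chi_2\,\mathrm{CR}'(t)\left(1 - \left(\frac{\mathrm{CR}(t)}{\mathrm{CR}_\infty}\right)^{\theta}\right) - \frac{\chi_2 \theta}{\mathrm{CR}_\infty^{\theta}}\,\mathrm{CR}(t)\,\bigl(\mathrm{CR}(t)\bigr)^{\theta - 1}\,\mathrm{CR}'(t) = \chi_2\,\mathrm{CR}'(t)\left(1 - \left(\frac{\mathrm{CR}(t)}{\mathrm{CR}_\infty}\right)^{\theta}\right) - \frac{\chi_2 \theta}{\mathrm{CR}_\infty^{\theta}}\,\bigl(\mathrm{CR}(t)\bigr)^{\theta}\,\mathrm{CR}'(t),$$

therefore

$$\mathrm{CR}''(t) = \chi_2\,\mathrm{CR}'(t)\left(1 - (1 + \theta)\left(\frac{\mathrm{CR}(t)}{\mathrm{CR}_\infty}\right)^{\theta}\right). \tag{4.4}$$

Estimated rate of transmission.

By using the Bernoulli–Verhulst equation (4.1) and substituting (4.4) in (3.8), we obtain

$$\tau(t) = \frac{\nu f\left(\chi_2\left(1 - (1 + \theta)\left(\mathrm{CR}(t)/\mathrm{CR}_\infty\right)^{\theta}\right) + \nu\right)}{\nu f (I_0 + S_0) + \nu\,\mathrm{CR}_0 - \mathrm{CR}(t)\left(\chi_2\left(1 - \left(\mathrm{CR}(t)/\mathrm{CR}_\infty\right)^{\theta}\right) + \nu\right)}. \tag{4.5}$$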

This formula (4.5) combined with (4.2) gives an explicit formula for the rate of transmission.

Since CR(t) < CR∞, by considering the sign of the numerator and the denominator of (4.5), we obtain the following proposition.

Proposition 4.3. —

The rate of transmission τ(t) given by (4.5) is non-negative for all t ≥ t0 if

$$\nu \geq \chi_2 \theta \tag{4.6}$$

and

$$f (I_0 + S_0) + \nu\,\mathrm{CR}_0 > \mathrm{CR}_\infty (\chi_2 + \nu). \tag{4.7}$$
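Proposition 4.3 can be illustrated numerically by evaluating (4.5) along the Bernoulli–Verhulst curve (4.2) for the two values of ν used in the article. The Python sketch below is an illustration with hypothetical helper names. It uses the parameter values quoted in §4, computes I0 from (4.3), and records the smallest value of τ(t) over an arbitrarily chosen window of 120 days.

```python
S0, F = 1.4e9, 0.5
CR0, CR_INF, CHI2, THETA = 198.0, 67102.0, 0.66, 0.22


def cr_bv(s):
    """Bernoulli-Verhulst solution (4.2) at s = t - t0 (rearranged form)."""
    a = (CR0 / CR_INF) ** THETA
    return CR0 / (a + (1.0 - a) * 2.718281828459045 ** (-CHI2 * THETA * s)) ** (1.0 / THETA)


def tau_explicit(s, nu):
    """Formula (4.5) evaluated on the Bernoulli-Verhulst curve."""
    cr = cr_bv(s)
    x = (cr / CR_INF) ** THETA
    i0 = CHI2 * CR0 * (1.0 - (CR0 / CR_INF) ** THETA) / (nu * F)   # (4.3)
    numerator = nu * F * (CHI2 * (1.0 - (1.0 + THETA) * x) + nu)
    denominator = nu * F * (i0 + S0) + nu * CR0 - cr * (CHI2 * (1.0 - x) + nu)
    return numerator / denominator


# Smallest value of tau over 120 days for each nu considered in the article.
min_tau = {nu: min(tau_explicit(float(s), nu) for s in range(0, 121))
           for nu in (0.2, 0.1)}
```

With these values, χ2θ is about 0.145, so ν = 0.2 satisfies (4.6) and ν = 0.1 does not; the minimum of τ is accordingly positive in the first case and negative in the second.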

Compatibility of the model SI with the COVID-19 data for mainland China.

The SI model is compatible with the data only when τ(t) stays positive for all t ≥ t0. From our estimation for the COVID-19 data in China, we obtain χ2 θ = 0.14. Therefore from (4.6), we deduce that the model is compatible with the data only when

$$1/\nu \leq 1/0.14 = 3.3\ \text{days}. \tag{4.8}$$

This means that the average duration of infectious period 1/ν must be shorter than 3.3 days.

Similarly, the condition (4.7) implies

$$f \geq \frac{\mathrm{CR}_\infty \chi_2 + (\mathrm{CR}_\infty - \mathrm{CR}_0)\,\nu}{S_0 + I_0} \geq \frac{\mathrm{CR}_\infty \chi_2 + (\mathrm{CR}_\infty - \mathrm{CR}_0)\,\chi_2 \theta}{S_0 + I_0}$$

and since we have CR0 = 198 and CR∞ = 67 102, we obtain

$$f \geq \frac{67\,102 \times 0.66 + (67\,102 - 198) \times 0.14}{1.4 \times 10^{9}} \approx 3.83 \times 10^{-5}. \tag{4.9}$$

So according to this estimation the fraction of reported cases 0 < f ≤ 1 can be almost as small as we want.

Figure 4 illustrates proposition 4.3. We observe that the formula for the rate of transmission (4.5) becomes negative whenever ν < χ2θ. In figure 5, we plot the numerical simulation obtained from (1.1)–(1.3) when t → τ(t) is replaced by the explicit formula (4.5). It is surprising that we can reproduce the original Bernoulli–Verhulst curve perfectly even when τ(t) becomes negative (see figure 3). This was not guaranteed at first, since the I-class is losing individuals that are recovering.

Figure 4.


In this figure, we plot the rate of transmission obtained from formula (4.5) with f = 0.5, χ2 θ = 0.145 < ν = 0.2 (in (a)) and ν = 0.1 < χ2 θ = 0.145 (in (b)), χ2 = 0.66 and θ = 0.22, and CR∞ = 67 102, which is the latest value obtained from the cumulative number of reported cases for China.

Figure 5.


In this figure, we plot the number of reported cases by using model (1.1) and (1.3), with the rate of transmission obtained in (4.5). The parameter values are f = 0.5, ν = 0.1 or ν = 0.2, χ2 = 0.66 and θ = 0.22, and CR∞ = 67 102 is the latest value obtained from the cumulative number of reported cases for China. Furthermore, we use S0 = 1.4 × 10⁹ for the total population of China and I0 = 954, which is obtained from formula (4.3). The black dots correspond to observed data for the cumulative number of reported cases and the blue curve corresponds to the model.

5. Computing numerically a day-by-day piecewise constant rate of transmission

Assumption 5.1. —

We assume that the rate of transmission τ(t) is piecewise constant and for each i = 0, …, n,

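$$\tau(t) = \tau_i, \quad\text{whenever } t_i \leq t < t_{i+1}. \tag{5.1}$$

For t ∈ [ti−1, ti], we deduce by using assumption 5.1 that

$$\int_{t_0}^{t} \tau(\sigma)\,\mathrm{CR}'(\sigma)\,d\sigma = \sum_{j=0}^{i-2} \int_{t_j}^{t_{j+1}} \tau_j\,\mathrm{CR}'(\sigma)\,d\sigma + \int_{t_{i-1}}^{t} \tau_{i-1}\,\mathrm{CR}'(\sigma)\,d\sigma.$$

Therefore by using (3.2), for t ∈ [ti−1, ti], we obtain

$$\mathrm{CR}'(t) = \nu f\left(I_0 + S_0\left[1 - \Pi_{i-1} \exp\left(-\frac{\tau_{i-1}}{\nu f}\bigl[\mathrm{CR}(t) - \mathrm{CR}(t_{i-1})\bigr]\right)\right]\right) + \nu\,\mathrm{CR}_0 - \nu\,\mathrm{CR}(t), \tag{5.2}$$

where

$$\Pi_{i-1} = \exp\left(-\sum_{j=0}^{i-2} \frac{\tau_j}{\nu f}\bigl[\mathrm{CR}(t_{j+1}) - \mathrm{CR}(t_j)\bigr]\right). \tag{5.3}$$

By fixing τi−1 = 0 on the right-hand side of (5.2), we get

$$\mathrm{CR}'(t) \geq \nu f\bigl(I_0 + S_0\left[1 - \Pi_{i-1}\right]\bigr) + \nu\,\mathrm{CR}_0 - \nu\,\mathrm{CR}(t),$$

and when τi−1 → ∞ we obtain

$$\mathrm{CR}'(t) \leq \nu f (I_0 + S_0) + \nu\,\mathrm{CR}_0 - \nu\,\mathrm{CR}(t).$$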

By using the theory of monotone ordinary differential equations [26], we deduce that the map τi → CR(ti) is monotone increasing, and we get the following result.

Theorem 5.2. —

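Let assumptions 1.1, 4.1 and 5.1 be satisfied. Let I0 be fixed. Then we can find a unique sequence τ0, τ1, …, τn of non-negative numbers such that t → CR(t) the solution of (3.2) fits exactly the data at any time ti, that is to say that

$$\mathrm{CR}(t_i) = \mathrm{CR}_{\mathrm{Data}}(t_i), \quad \forall i = 1, \ldots, n+1,$$

if and only if the following two conditions are satisfied for each i = 0, 1, …, n + 1,

$$\mathrm{CR}_{\mathrm{Data}}(t_i) \geq e^{-\nu (t_i - t_{i-1})}\,\mathrm{CR}_{\mathrm{Data}}(t_{i-1}) + \int_{t_{i-1}}^{t_i} \nu e^{-\nu (t_i - \sigma)}\,d\sigma\,\Bigl(f\bigl(I_0 + S_0\bigl[1 - \Pi_{i-1}^{\mathrm{Data}}\bigr]\bigr) + \mathrm{CR}_0\Bigr), \tag{5.4}$$

where

$$\Pi_{i-1}^{\mathrm{Data}} = \exp\left(-\sum_{j=0}^{i-2} \frac{\tau_j}{\nu f}\bigl[\mathrm{CR}_{\mathrm{Data}}(t_{j+1}) - \mathrm{CR}_{\mathrm{Data}}(t_j)\bigr]\right) \tag{5.5}$$

and

$$\mathrm{CR}_{\mathrm{Data}}(t_i) \leq e^{-\nu (t_i - t_{i-1})}\,\mathrm{CR}_{\mathrm{Data}}(t_{i-1}) + \int_{t_{i-1}}^{t_i} \nu e^{-\nu (t_i - \sigma)}\,d\sigma\,\bigl(f (I_0 + S_0) + \mathrm{CR}_0\bigr). \tag{5.6}$$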

Remark 5.3. —

The above theorem means that the data are identifiable for this model SI if and only if the conditions (5.4) and (5.6) are satisfied. Moreover, in that case, we can find a unique sequence of transmission rates τi ≥ 0 which gives a perfect fit to the data.
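Conditions (5.4) and (5.6) give, for each day, an interval in which the next data point must fall for a non-negative τ to exist. Since the integral of ν exp(−ν(ti − σ)) over [ti−1, ti] equals 1 − exp(−ν(ti − ti−1)), both bounds have a closed form. The Python sketch below computes them for one step; the function name and argument names are illustrative, and the usage line assumes the first step (where the empty sum in (5.5) gives Π = 1) with the values CR0 = 198 and I0 = 954 quoted in §4 for ν = 0.2.

```python
import math


def one_step_bounds(cr_prev, dt, pi_prev, s0, i0, nu, f, cr0):
    """Lower and upper bounds (5.4) and (5.6) for CR_Data(t_i).

    cr_prev is CR_Data(t_{i-1}), dt = t_i - t_{i-1} and pi_prev is the
    product term (5.5) built from the previously identified rates.
    """
    decay = math.exp(-nu * dt)
    weight = 1.0 - decay        # integral of nu*exp(-nu*(t_i - sigma)) d sigma
    lower = decay * cr_prev + weight * (f * (i0 + s0 * (1.0 - pi_prev)) + cr0)
    upper = decay * cr_prev + weight * (f * (i0 + s0) + cr0)
    return lower, upper


# First step (i = 1): pi_prev = 1 because the sum in (5.5) is empty.
lower_1, upper_1 = one_step_bounds(cr_prev=198.0, dt=1.0, pi_prev=1.0,
                                   s0=1.4e9, i0=954.0, nu=0.2, f=0.5, cr0=198.0)
```

The lower bound corresponds to no new infections during the day, so that the infectious class only decays at rate ν, while the upper bound corresponds to the whole susceptible population being infected at once.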

6. Numerical simulations

In this section, we propose a numerical method to fit the day-by-day rate of transmission. The goal is to take advantage of the monotone property of CR(t) with respect to τi on the time interval [ti, ti+1]. Recently, more sophisticated methods were proposed by Bakhta et al. [20] by using several types of approximation methods for the rate of transmission.

We start with the simplest algorithm, algorithm 1, in order to show the difficulties in identifying the rate of transmission.

Algorithm 1

Step 1: We fix S0 = 1.4 × 10⁹, ν = 0.1 or ν = 0.2 and f = 0.5. We consider the system

$$S'(t) = -\tau\,S(t)\,I(t), \quad I'(t) = \tau\,S(t)\,I(t) - \nu\,I(t) \quad\text{and}\quad \mathrm{CR}'(t) = \nu f\,I(t), \tag{6.1}$$

on the time interval t ∈ [t0, t1]. This system is supplemented by the initial values S(t0) = S0, I(t0) = I0 given by formula (2.4) (if we consider the data only at the early stage) or formula (4.3) (if we consider all the data), and CR(t0) = CRData(t0) obtained from the data.

The map τ → CR(t1) being monotone increasing, we can apply a bisection method to find the unique value τ0 solving

$$\mathrm{CR}(t_1) = \mathrm{CR}_{\mathrm{Data}}(t_1).$$

Then we proceed by induction.

Step i: For each integer i = 1, …, n we consider the system

$$S'(t) = -\tau\,S(t)\,I(t), \quad I'(t) = \tau\,S(t)\,I(t) - \nu\,I(t) \quad\text{and}\quad \mathrm{CR}'(t) = \nu f\,I(t), \tag{6.2}$$

on the time interval t ∈ [ti, ti+1]. This system is supplemented by the initial values S(ti) and I(ti) obtained from the previous iteration, and by CR(ti) = CRData(ti) obtained from the data.

The map τ → CR(ti+1) being monotone increasing, we can apply a bisection method to find the unique value τi solving

$$\mathrm{CR}(t_{i+1}) = \mathrm{CR}_{\mathrm{Data}}(t_{i+1}).$$
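The steps of algorithm 1 can be sketched in Python as follows. This is an illustrative implementation under stated assumptions, not the authors' code: the ODE is integrated with a classical Runge–Kutta scheme, days are one time unit apart, and the bisection bracket [−2 × 10⁻⁹, 2 × 10⁻⁹] is an arbitrary choice that also allows negative rates. The data are synthetic, generated by the model itself from a known sequence of rates, so that the recovered rates can be compared with the true ones.

```python
def advance(s, i, cr, tau, nu, f, steps=50):
    """Advance system (6.1) by one day with constant tau (classical RK4)."""
    h = 1.0 / steps

    def rhs(s_, i_):
        new_infections = tau * s_ * i_
        return (-new_infections, new_infections - nu * i_, nu * f * i_)

    for _ in range(steps):
        k1 = rhs(s, i)
        k2 = rhs(s + h / 2 * k1[0], i + h / 2 * k1[1])
        k3 = rhs(s + h / 2 * k2[0], i + h / 2 * k2[1])
        k4 = rhs(s + h * k3[0], i + h * k3[1])
        ds, di, dcr = [h / 6 * (k1[j] + 2 * k2[j] + 2 * k3[j] + k4[j])
                       for j in range(3)]
        s, i, cr = s + ds, i + di, cr + dcr
    return s, i, cr


def bisect_tau(s, i, cr, target, nu, f, lo, hi, iters=100):
    """Find tau in [lo, hi] such that CR one day later equals target."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if advance(s, i, cr, mid, nu, f)[2] < target:   # CR increases with tau
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def algorithm_1(cr_data, s0, i0, nu, f, tau_lo=-2e-9, tau_hi=2e-9):
    """Day-by-day rates; S and I are carried over from the previous step."""
    s, i = s0, i0
    taus = []
    for k in range(len(cr_data) - 1):
        tau = bisect_tau(s, i, cr_data[k], cr_data[k + 1], nu, f, tau_lo, tau_hi)
        s, i, _ = advance(s, i, cr_data[k], tau, nu, f)
        taus.append(tau)
    return taus


# Synthetic data (assumption): generated by the model from known daily rates.
S0, NU, F, I0 = 1.4e9, 0.2, 0.5, 1521.0
true_taus = [3.3e-10, 3.0e-10, 2.5e-10, 2.0e-10, 1.5e-10]
state = (S0, I0, 0.0)
cr_data = [0.0]
for tau in true_taus:
    state = advance(*state, tau, NU, F)
    cr_data.append(state[2])
recovered = algorithm_1(cr_data, S0, I0, NU, F)
```

With data generated by the model itself, the recovered rates match the true ones closely. With real data, any mismatch between the carried-over I(ti) and the slope of the data is passed on to the next day.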

In figure 6, we plot an example of such a perfect fit, which is the same for ν = 0.1 and ν = 0.2. In figure 7, we plot the rate of transmission obtained numerically for ν = 0.2 in (a) and ν = 0.1 in (b). This is an example of a negative rate of transmission. Figure 7 should be compared to figure 4 which gives a similar result.

Figure 6.


In this figure, we plot the perfect fit to the cumulative number of reported cases of COVID-19 in China. We fix the parameters f = 0.5 and ν = 0.2 or ν = 0.1 and we apply our algorithm 1 to obtain the perfect fit. The black dots correspond to data for the cumulative number of reported cases and the blue curve corresponds to the model.

Figure 7.


In this figure, we plot the rate of transmission obtained for the reported cases of COVID-19 in China with the parameters f = 0.5 and ν = 0.2 in (a) and ν = 0.1 in (b). This rate of transmission corresponds to the perfect fit obtained in figure 6.

In figures 8–10, we use algorithm 1 and plot the rate of transmission obtained from the reported cases of COVID-19 in China, with the parameters fixed as f = 0.5 and ν = 0.2. In figures 8–10, we observe an oscillating rate of transmission that alternates between positive and negative values. These oscillations are due to the amplification of the error in the numerical method itself. In figure 8, we run the same simulation as in figure 9 but over a shorter period. In figure 8, we can see that the change in the slope of CR(t) at the points t = ti separating two consecutive days (the black dots) is amplified from one day to the next.

Figure 8.


In (a), we plot the cumulative number of reported cases obtained from the data (black dots) and the model (blue curve). In (b), we plot the daily rate of transmission obtained by using algorithm 1. We see that we can fit the data perfectly. But the method is very unstable. We obtain a rate of transmission that oscillates from positive to negative values back and forth.

Figure 10.


We apply algorithm 1 to the regularized data. In (a), we plot the regularized cumulative number of reported cases obtained from the data (black dots) and the model (blue curve). In (b), we plot the daily rate of transmission obtained by using algorithm 1. We see that we can fit the data perfectly. But the method is very unstable. We obtain a rate of transmission that oscillates from positive to negative values back and forth.

Figure 9.


In (a), we plot the cumulative number of reported cases obtained from the data (black dots) and the model (blue curve) on a period six times longer than in figure 8. In (b), we plot the daily rate of transmission obtained by using algorithm 1. We see that we can fit the data perfectly. But the method is very unstable like on figure 8. We obtain a rate of transmission that oscillates from positive to negative values back and forth.

In figure 10, we first smooth the original cumulative data by using the Matlab function CRData = smoothdata(CRData,‘gaussian’,50) to regularize the data and we apply algorithm 1. Unfortunately, smoothing the data does not help to solve the instability problem in figure 10.

We need to introduce a correction when choosing the next initial value I(ti). In algorithm 1, the errors arise because the relationship

$$\mathrm{CR}'(t) = \nu f\,I(t),$$

is not respected at the points t = ti, and the algorithm should enforce it there.

In figure 11, we smooth the data first by using the Matlab function CRData = smoothdata(CRData,‘gaussian’,50), and we apply algorithm 2 by approximating equation (6.6) by

$$I_i = \frac{\mathrm{CR}_{\mathrm{Data}}(t_i) - \mathrm{CR}_{\mathrm{Data}}(t_{i-1})}{\nu \times f}. \tag{6.3}$$

In figure 11, we no longer observe the oscillations of the rate of transmission.

Figure 11.


In this figure, we plot the rate of transmission obtained by using the reported cases of COVID-19 in China with the parameters f = 0.5 and ν = 0.2. We first regularize the data by applying the Matlab function CRData = smoothdata(CRData, ‘gaussian’,50). Then we apply algorithm 2 to the regularized data. In (a), we plot the regularized cumulative number of reported cases obtained after smoothing (black dots) and the model (blue curve). In (b), we plot the daily rate of transmission obtained by using algorithm 2. We see that we can fit the data perfectly and this time the rate of transmission is becoming reasonable.

Algorithm 2

We fix S0 = 1.4 × 10⁹, ν = 0.1 or ν = 0.2 and f = 0.5. Then we fit the data by using the method described in §2 to estimate the parameters χ1, χ2 and χ3 from day 1 to 10. Then we use

$$S_0 = 1.40005 \times 10^{9}, \quad I_0 = \frac{\chi_2 \chi_1 \exp\bigl(\chi_2 (t_0 - 1)\bigr)}{f \nu} \quad\text{and}\quad \mathrm{CR}_0 = \chi_1 \exp(\chi_2 t_0) - \chi_3. \tag{6.4}$$

For each integer i = 0, …, n, we consider the system

$$S'(t) = -\tau\,S(t)\,I(t), \quad I'(t) = \tau\,S(t)\,I(t) - \nu\,I(t) \quad\text{and}\quad \mathrm{CR}'(t) = \nu f\,I(t), \tag{6.5}$$

for t ∈ [ti, ti+1]. Then the map τ → CR(ti+1) being monotone increasing, we can apply a bisection method to find the unique τi solving

$$\mathrm{CR}(t_{i+1}) = \mathrm{CR}_{\mathrm{Data}}(t_{i+1}).$$

The key idea of this new algorithm is the following correction on the I-component of the system. We start a new step by using the value S(ti) obtained from the previous iteration and

$$I_i = \frac{\mathrm{CR}_{\mathrm{Data}}'(t_i)}{\nu f} \tag{6.6}$$

and

$$\mathrm{CR}_i = \mathrm{CR}_{\mathrm{Data}}(t_i). \tag{6.7}$$
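A minimal Python sketch of algorithm 2 follows, using the backward-difference approximation (6.3) for the correction (6.6). Several choices here are assumptions made for illustration rather than the authors' implementation: the Runge–Kutta integrator, the bisection bracket, the one-day spacing of the data, the use of formula (4.3) instead of (6.4) for the first value I0, and the input data, which are a synthetic Bernoulli–Verhulst curve built from the parameter values quoted in §4.

```python
def bernoulli_verhulst(s, cr0, cr_inf, chi2, theta):
    """Solution (4.2) at s = t - t0, in an equivalent rearranged form."""
    a = (cr0 / cr_inf) ** theta
    return cr0 / (a + (1.0 - a) * 2.718281828459045 ** (-chi2 * theta * s)) ** (1.0 / theta)


def advance(s, i, cr, tau, nu, f, steps=50):
    """Advance system (6.5) by one day with constant tau (classical RK4)."""
    h = 1.0 / steps

    def rhs(s_, i_):
        new_infections = tau * s_ * i_
        return (-new_infections, new_infections - nu * i_, nu * f * i_)

    for _ in range(steps):
        k1 = rhs(s, i)
        k2 = rhs(s + h / 2 * k1[0], i + h / 2 * k1[1])
        k3 = rhs(s + h / 2 * k2[0], i + h / 2 * k2[1])
        k4 = rhs(s + h * k3[0], i + h * k3[1])
        ds, di, dcr = [h / 6 * (k1[j] + 2 * k2[j] + 2 * k3[j] + k4[j])
                       for j in range(3)]
        s, i, cr = s + ds, i + di, cr + dcr
    return s, i, cr


def bisect_tau(s, i, cr, target, nu, f, lo, hi, iters=100):
    """Find tau in [lo, hi] such that CR one day later equals target."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if advance(s, i, cr, mid, nu, f)[2] < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def algorithm_2(cr_data, s0, i0, nu, f, tau_lo=-2e-9, tau_hi=2e-9):
    """Day-by-day rates with I reset from the data at every step."""
    s = s0
    taus, fitted = [], [cr_data[0]]
    for k in range(len(cr_data) - 1):
        # Correction (6.3): I_i comes from the data, not from the last step.
        i = i0 if k == 0 else (cr_data[k] - cr_data[k - 1]) / (nu * f)
        tau = bisect_tau(s, i, cr_data[k], cr_data[k + 1], nu, f, tau_lo, tau_hi)
        s, _, cr_next = advance(s, i, cr_data[k], tau, nu, f)
        taus.append(tau)
        fitted.append(cr_next)
    return taus, fitted


S0, NU, F = 1.4e9, 0.2, 0.5
CR0, CR_INF, CHI2, THETA = 198.0, 67102.0, 0.66, 0.22
cr_data = [bernoulli_verhulst(float(d), CR0, CR_INF, CHI2, THETA) for d in range(31)]
I0 = CHI2 * CR0 * (1.0 - (CR0 / CR_INF) ** THETA) / (NU * F)   # formula (4.3)
taus, fitted = algorithm_2(cr_data, S0, I0, NU, F)
```

Because I is re-initialized from the data at every step, an error made on one day is not carried into the next, which is the mechanism the correction relies on. The backward difference in (6.3) is only a first-order estimate of the slope, so the recovered rates depend on how the data were regularized.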

In figure 12, we plot several types of regularized cumulative data in (a) and several types of regularized daily data in (b). Among the different regularization methods, an important one is the Bernoulli–Verhulst best-fit approximation.

Figure 12.


In this figure, we plot the cumulative number of reported cases (a) and the daily number of reported cases (b). The black curves are obtained by applying the cubic spline Matlab function ‘spline(Days,DATA)’ to the cumulative data. The left-hand side is obtained by using the cubic spline function and right-hand side is obtained by using the derivative of the cubic spline interpolation. The blue curves are obtained by using cubic spline function to the day-by-day values of cumulative number of cases obtained from the best fit of the Bernoulli–Verhulst model. The orange curves are obtained by computing the rolling weekly daily number of cases (we use the Matlab function ‘smoothdata(DAILY,‘movmean’,7)’) and then by applying the cubic spline function to the corresponding cumulative number of cases. The yellow curves are obtained by using the Gaussian weekly smoothing to the daily number of cases (we use the Matlab function ‘smoothdata(DAILY,‘gaussian’,7)’) and then by applying the cubic spline function to the corresponding cumulative number of cases.

In figure 13, we plot the rate of transmission t → τ(t) obtained by using algorithm 2. We can see that the original data give a negative transmission rate while at the other extreme the Bernoulli–Verhulst model seems to give the most regularized transmission rate. In figure 13a, we observe that we now recover almost perfectly the theoretical transmission rate obtained in §4. In figure 13b, the rolling weekly average regularization and in figure 13c the Gaussian weekly average regularization still vary a lot and in both cases, the transmission rate becomes negative after some time. In figure 13d, the original data give a transmission rate that is negative from the beginning. We conclude that it is crucial to find a ‘good’ regularization of the daily number of cases. So far the best regularization method is obtained by using the best fit of the Bernoulli–Verhulst model.

Figure 13.


In this figure, we plot the transmission rates t → τ(t) obtained by using algorithm 2 with the parameters f = 0.5 and ν = 0.2. We use the cumulative data obtained by using (a) the Bernoulli–Verhulst regularization, (b) the rolling weekly average regularization, (c) the Gaussian weekly average regularization and in (d) we use the original cumulative data.

Remark 6.1. —

For each simulation in figure 13b,c, it is possible to obtain a transmission rate t → τ(t) that is non-negative for all time t by increasing sufficiently the parameter ν. Nevertheless, we do not present these simulations here because the corresponding values of ν to obtain a non-negative τ(t) are unrealistic.

In figure 14a–d, we plot the daily basic reproduction number corresponding to figure 13a–d, respectively. The red line corresponds to R0 = 1. We see some complex behaviour in figure 14b–d, which is again unrealistic.

Figure 14.


In this figure, we plot the daily basic reproduction number t → R0(t) = τ(t)S(t)/ν obtained by using algorithm 2 with the parameters f = 0.5 and ν = 0.2. We use the cumulative data obtained by using (a) the Bernoulli–Verhulst regularization, (b) the rolling weekly average regularization, (c) the Gaussian weekly average regularization and in (d) we use the original cumulative data.

7. Discussion

Estimating the parameters of an epidemiological model is always difficult and generally requires strong assumptions about their value and their consistency and constancy over time. Despite this, it is often shown that many sets of parameter values are compatible with a good fit of the observed data. The new approach developed in this article consists first of all in postulating a phenomenological model of growth of infectious, based on the very classic model of Verhulst, proposed in demography in 1838 [28]. Then, obtaining explicit formulae for important parameter values such as the transmission rate or the initial number of infected (or for lower and/or upper limits of these values), gives an estimate allowing an almost perfect reconstruction of the observed dynamics.

The use of phenomenological models can also be regarded as a way of smoothing the data. Indeed, the errors concerning the observations of new infected cases are numerous:

  • the census is rarely regular: many countries report late the cases that occurred during the weekend and, at varying times, add data from specific counts, such as those from homes for the elderly;

  • the number of cases observed is still underestimated, and calculating the number of unreported new infected cases is always a difficult problem [21];

  • the raw data are sometimes reduced for medical reasons (poor diagnosis or lack of detection tools) or for reasons of domestic state policy.

For all these causes of error, it is important to choose the appropriate smoothing method (moving average, spline, Gaussian kernel, auto-regression, generalized linear model, etc.). In this article, several methods were used and the one which allowed the model to perfectly match the smoothed data was retained.
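As an illustration of two of the smoothing methods named above, the following minimal sketch applies a centred rolling weekly average and a Gaussian-weighted weekly average to daily counts derived from the first cumulative values of table 1. The window of seven days, the Gaussian standard deviation `sigma`, the truncation of the window at the boundaries and the function names are all assumptions made for this example; they are not necessarily the regularizations used to produce figures 13 and 14.

```python
import math

def rolling_weekly_average(values, half_width=3):
    """Centred moving average over up to 2*half_width + 1 days.

    The window is truncated at both ends of the series (an assumption).
    """
    n = len(values)
    smoothed = []
    for i in range(n):
        window = values[max(0, i - half_width):min(n, i + half_width + 1)]
        smoothed.append(sum(window) / len(window))
    return smoothed

def gaussian_weekly_average(values, half_width=3, sigma=1.5):
    """Gaussian-weighted average over the same window.

    Weights are renormalised so that they sum to one, including at the edges.
    """
    n = len(values)
    smoothed = []
    for i in range(n):
        total, weight_sum = 0.0, 0.0
        for j in range(max(0, i - half_width), min(n, i + half_width + 1)):
            w = math.exp(-0.5 * ((j - i) / sigma) ** 2)
            total += w * values[j]
            weight_sum += w
        smoothed.append(total / weight_sum)
    return smoothed

# Cumulative confirmed cases for 19-31 January (table 1).
cumulative = [198, 291, 440, 571, 830, 1287, 1975,
              2744, 4515, 5974, 7711, 9692, 11791]
# Daily new cases are the day-to-day differences of the cumulative series.
daily = [b - a for a, b in zip(cumulative, cumulative[1:])]

print(rolling_weekly_average(daily))
print(gaussian_weekly_average(daily))
```

Both functions return a series of the same length as the input, so the smoothed daily counts can be summed again to give a regularized cumulative curve.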

In this article, we developed several methods to understand how to reconstruct the rate of transmission from the data. In §2, we reconsidered the method presented in [21] based on an exponential fit to the early data. The approach gives a first estimation of I0 and τ0. In §3, we prove a result to connect the time-dependent cumulative reported data and the transmission rate. In §4, we compare the data to the Bernoulli–Verhulst model and we use this model as a phenomenological model. The Bernoulli–Verhulst model fits the data for mainland China very well. Next, by replacing the data by the solution of the Bernoulli–Verhulst model, we obtain an explicit formula for the transmission rate. We thereby derive some conditions on the parameters for the applicability of the SI model to the data for mainland China. In §5, we discretized the rate of transmission and we observed that, given some daily cumulative data, we can get at most one perfect fit of the data. Therefore, in §6, we provide two algorithms to compute numerically the daily rates of transmission. Such numerical questions turn out to be a delicate problem. This problem was previously considered by another French group, Bakhta et al. [20]. Here we use some simple ideas to approximate the derivative of the cumulative reported cases, combined with some smoothing method applied to the data.
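To make the role of the phenomenological model concrete, the sketch below evaluates a closed-form solution of a generalized logistic equation, N'(t) = χ N(t)[1 − (N(t)/N∞)^θ], which is the form usually meant by the Bernoulli–Verhulst model. The symbols `n0`, `n_inf`, `chi` and `theta`, as well as the numerical values, are assumptions chosen for illustration; the notation and fitted values of §4 may differ. A Runge–Kutta integration of the same equation is included only as a numerical check of the formula.

```python
import math

def bernoulli_verhulst(t, n0, n_inf, chi, theta):
    """Closed-form solution of N' = chi * N * (1 - (N / n_inf)**theta), N(0) = n0."""
    u0 = (n_inf / n0) ** theta
    return n_inf / (1.0 + (u0 - 1.0) * math.exp(-chi * theta * t)) ** (1.0 / theta)

def integrate_rk4(n0, n_inf, chi, theta, t_end, steps=10000):
    """Integrate the same equation with the classical fourth-order Runge-Kutta scheme."""
    def rhs(n):
        return chi * n * (1.0 - (n / n_inf) ** theta)

    h = t_end / steps
    n = n0
    for _ in range(steps):
        k1 = rhs(n)
        k2 = rhs(n + 0.5 * h * k1)
        k3 = rhs(n + 0.5 * h * k2)
        k4 = rhs(n + h * k3)
        n += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return n

# Illustrative parameters only (not the values fitted in the article).
n0, n_inf, chi, theta = 200.0, 60000.0, 0.6, 0.4

for t in (0, 10, 20, 40):
    print(t, round(bernoulli_verhulst(t, n0, n_inf, chi, theta), 1))

# The closed form and the numerical integration should agree closely.
print(abs(bernoulli_verhulst(20, n0, n_inf, chi, theta)
          - integrate_rk4(n0, n_inf, chi, theta, 20.0)))
```

Replacing noisy cumulative data by such a smooth curve is what allows its derivative, and hence a transmission rate, to be written explicitly.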

To conclude this article, we plot the daily basic reproduction number

$$R_0(t) = \frac{\tau(t)\,S(t)}{\nu}$$

as a function of the time t and the parameters f or ν. The above simple formula for R0 is not the real basic reproduction number in the sense of the number of new infections produced by a single infectious individual. Rather, it is a simple formula that indicates the tendency of the number of infectious individuals to grow or decay. In figure 15a, the daily basic reproduction number is almost independent of f, while in figure 15b, R0(t) depends on ν mostly for small values of ν. The red curve on each surface in figure 15 corresponds to the turning point (i.e. the time t ≥ t0 for which R0(t) = 1). We also see that the turning point does not depend much on these parameters.
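The computation of the daily basic reproduction number and of the turning point can be sketched as follows. The function names and the input series `tau` and `S` are hypothetical and purely synthetic; only the formula R0(t) = τ(t)S(t)/ν and the value ν = 0.2 come from the text. In this discrete setting, the turning point is taken to be the first day on which R0(t) falls to 1 or below, which is an assumption of the sketch.

```python
def daily_reproduction_number(tau, susceptible, nu):
    """Return R0(t) = tau(t) * S(t) / nu for each day."""
    if nu <= 0:
        raise ValueError("nu must be positive")
    return [t * s / nu for t, s in zip(tau, susceptible)]

def turning_point(r0_values):
    """Index of the first day on which R0(t) <= 1, or None if there is none."""
    for day, r0 in enumerate(r0_values):
        if r0 <= 1.0:
            return day
    return None

# Synthetic inputs for illustration only (not values estimated in the article).
nu = 0.2                     # value quoted in the figure captions
S = [1.4e9] * 6              # susceptible population, held constant here
tau = [4.0e-10, 3.0e-10, 2.0e-10, 1.5e-10, 1.0e-10, 0.5e-10]

r0 = daily_reproduction_number(tau, S, nu)
print([round(x, 2) for x in r0])
print("turning point at day index:", turning_point(r0))
```

With a decreasing transmission rate, R0(t) crosses 1 once, and the day of the crossing marks the change from growth to decay of the number of infectious individuals.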

Figure 15.

In this figure, we plot the daily basic reproduction number R0(t) = τ(t)S(t)/ν and we vary the parameter f in (a) and ν in (b).

Concerning contagious diseases, public health physicians are constantly facing four challenges. The first concerns the estimation of the average transmission rate. Until now, no explicit formula had been obtained in the case of the SIR model, according to the observed data of the epidemic, that is to say the number of reported cases of infected patients. Here, from realistic simplifying assumptions, a formula is provided (formula (4.5)), making it possible to accurately reconstruct theoretically the curve of the observed cumulative cases. The second challenge concerns the estimation of the mean duration of the infectious period for infected patients. As for the transmission rate, the same realistic assumptions make it possible to obtain an upper limit to this duration (inequality (4.8)), which makes it possible to better guide the individual quarantine measures decided by the authorities in charge of public health. This upper bound also makes it possible to obtain a lower bound for the percentage of unreported infected patients (inequality (4.8)), which gives an idea of the quality of the census of cases of infected patients, which is the third challenge faced by epidemiologists, specialists of contagious diseases. The fourth challenge is the estimation of the average transmission rate for each day of the infectious period (dependent on the distribution of the transmission over the ‘ages’ of infectivity), which will be the subject of further work and which poses formidable problems, in particular those related to the age (biological age or civil age) class of the patients concerned. Another interesting prospect is the extension of methods developed in the present paper to the contagious non-infectious diseases (i.e. without causal infectious agent), such as social contagious diseases, the best example being that of the pandemic linked to obesity [29–31], for which many concepts and modelling methods remain available.

Appendix A. Supplementary table

We use cumulative reported data from the National Health Commission of the People’s Republic of China and the Chinese CDC for mainland China. Before 11 February, the data were based on confirmed testing. From 11 February to 15 February, the data included cases that were not tested for the virus, but were clinically diagnosed based on medical imaging showing signs of pneumonia. There were 17 409 such cases from 10 February to 15 February. The data from 10 February to 15 February specified both types of reported cases. From 16 February, the data did not separate the two types of reporting, but reported the sum of both types. We subtracted 17 409 cases from the cumulative reported cases after 15 February to obtain the cumulative reported cases based only on confirmed testing after 15 February. The data are given in table 1 with this adjustment.
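The adjustment described above can be written as a short sketch. The function and constant names are hypothetical; the figure of 17 409 clinically diagnosed cases and the cut-off date of 15 February come from the text, and the sample values are taken from table 1.

```python
from datetime import date

CLINICALLY_DIAGNOSED = 17409    # cases reported without a confirming test
CUTOFF = date(2020, 2, 15)      # last day on which both types were reported separately

def confirmed_only(day, reported_cumulative):
    """Cumulative reported cases based only on confirmed testing."""
    if day > CUTOFF:
        return reported_cumulative - CLINICALLY_DIAGNOSED
    return reported_cumulative

# A few entries from table 1: (date, cumulative value as listed before adjustment).
sample = [
    (date(2020, 2, 14), 49970),
    (date(2020, 2, 15), 51091),
    (date(2020, 2, 16), 70548),
    (date(2020, 3, 18), 80928),
]

for day, value in sample:
    print(day.isoformat(), confirmed_only(day, value))
```

Values up to and including 15 February are left unchanged, while later values are shifted down by the same constant, so the adjusted series remains non-decreasing across the change in reporting.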

Table 1.

Cumulative data describing confirmed cases in mainland China from 20 January to 18 March 2020. The data are taken from [22–24].

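January
Day   | 19 | 20 | 21 | 22 | 23 | 24 | 25
Cases | 198 | 291 | 440 | 571 | 830 | 1287 | 1975
Day   | 26 | 27 | 28 | 29 | 30 | 31
Cases | 2744 | 4515 | 5974 | 7711 | 9692 | 11 791

February
Day   | 1 | 2 | 3 | 4 | 5 | 6 | 7
Cases | 14 380 | 17 205 | 20 438 | 24 324 | 28 018 | 31 161 | 34 546
Day   | 8 | 9 | 10 | 11 | 12 | 13 | 14
Cases | 37 198 | 40 171 | 42 638 | 44 653 | 46 472 | 48 467 | 49 970
Day   | 15 | 16 | 17 | 18 | 19 | 20 | 21
Cases | 51 091 | 70 548 − 17 409 | 72 436 − 17 409 | 74 185 − 17 409 | 75 002 − 17 409 | 75 891 − 17 409 | 76 288 − 17 409
Day   | 22 | 23 | 24 | 25 | 26 | 27 | 28
Cases | 76 936 − 17 409 | 77 150 − 17 409 | 77 658 − 17 409 | 78 064 − 17 409 | 78 497 − 17 409 | 78 824 − 17 409 | 79 251 − 17 409
Day   | 29
Cases | 79 824 − 17 409

March
Day   | 1 | 2 | 3 | 4 | 5 | 6 | 7
Cases | 79 824 − 17 409 | 79 824 − 17 409 | 79 824 − 17 409 | 80 409 − 17 409 | 80 552 − 17 409 | 80 651 − 17 409 | 80 695 − 17 409
Day   | 8 | 9 | 10 | 11 | 12 | 13 | 14
Cases | 80 735 − 17 409 | 80 754 − 17 409 | 80 778 − 17 409 | 80 793 − 17 409 | 80 813 − 17 409 | 80 824 − 17 409 | 80 844 − 17 409
Day   | 15 | 16 | 17 | 18
Cases | 80 860 − 17 409 | 80 881 − 17 409 | 80 894 − 17 409 | 80 928 − 17 409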

Data accessibility

The data in my paper are public and can be found at: https://en.wikipedia.org/wiki/COVID-19_pandemic_in_mainland_China; http://www.nhc.gov.cn/yjb/pzhgli/new_list.shtml.

Authors' contributions

P.M. conceived and designed the study, and analysed the data. P.M. and Q.G. carried out the analysis and performed numerical simulations, and all authors conducted the literature review. All authors participated in writing and reviewing of the manuscript.

Competing interests

The authors declare no conflict of interest.

Funding

This research was funded by the Agence Nationale de la Recherche in France (Project name: MPCUII (P.M.) and (Q.G.)).

References

  • 1. Magal P, Ruan S. 2014. Susceptible-infectious-recovered models revisited: from the individual level to the population level. Math. Biosci. 250, 26–40. (doi:10.1016/j.mbs.2014.02.001)
  • 2. Qiu Y, Chen X, Shi W. 2020. Impacts of social and economic factors on the transmission of coronavirus disease 2019 (COVID-19) in China. J. Popul. Econ. 33, 1127–1172. (doi:10.1007/s00148-020-00778-2)
  • 3. Zeberg H, Pääbo S. 2020. The major genetic risk factor for severe COVID-19 is inherited from Neanderthals. Nature. (doi:10.1038/s41586-020-2818-3)
  • 4. Guillon P, Clément M, Sébille V, Rivain JG, Chou CF, Ruvoën-Clouet N, Le Pendu J. 2008. Inhibition of the interaction between the SARS-CoV spike protein and its cellular receptor by anti-histo-blood group antibodies. Glycobiology 18, 1085–1093. (doi:10.1093/glycob/cwn093)
  • 5. Zhou F. et al. 2020. Clinical course and risk factors for mortality of adult inpatients with COVID-19 in Wuhan, China: a retrospective cohort study. Lancet 395, 1054–1062. (doi:10.1016/S0140-6736(20)30566-3)
  • 6. Hu Z. et al. 2020. Clinical characteristics of 24 asymptomatic infections with COVID-19 screened among close contacts in Nanjing, China. Sci. China Life Sci. 63, 706–711. (doi:10.1007/s11427-020-1661-4)
  • 7. Ma S, Zhang J, Zeng M, Yun Q, Guo W, Zheng Y, Zhao S, Wang MH, Yang Z. 2020. Epidemiological parameters of coronavirus disease 2019: a pooled analysis of publicly reported individual data of 1155 cases from seven countries. medRxiv. (doi:10.1101/2020.03.21.20040329)
  • 8. Li R, Pei S, Chen B, Song Y, Zhang T, Yang W, Shaman J. 2020. Substantial undocumented infection facilitates the rapid dissemination of novel coronavirus (SARS-CoV-2). Science 368, 489–493. (doi:10.1126/science.abb3221)
  • 9. Byrne AW. et al. 2020. Inferred duration of infectious period of SARS-CoV-2: rapid scoping review and analysis of available evidence for asymptomatic and symptomatic COVID-19 cases. BMJ Open 10, e039856. (doi:10.1136/bmjopen-2020-039856)
  • 10. Rothe C. 2020. Transmission of 2019-nCoV infection from an asymptomatic contact in Germany. N. Engl. J. Med. 382, 970–971. (doi:10.1056/NEJMc2001468)
  • 11. Report of the WHO-China Joint Mission on Coronavirus Disease 2019 (COVID-19). https://www.who.int/docs/default-source/coronaviruse/who-china-joint-mission-on-covid-19-final-report.pdf.
  • 12. Yang Z. et al. 2020. Modified SEIR and AI prediction of the epidemics trend of COVID-19 in China under public health interventions. J. Thoracic Dis. 12, 165–174. (doi:10.21037/jtd.2020.02.64)
  • 13. London WP, Yorke JA. 1973. Recurrent outbreaks of measles, chickenpox and mumps: I. Seasonal variation in contact rates. Am. J. Epidemiol. 98, 453–468. (doi:10.1093/oxfordjournals.aje.a121575)
  • 14. Yorke JA, London WP. 1973. Recurrent outbreaks of measles, chickenpox and mumps: II. Systematic differences in contact rates and stochastic effects. Am. J. Epidemiol. 98, 469–482. (doi:10.1093/oxfordjournals.aje.a121576)
  • 15. Wang W, Ruan S. 2004. Simulating the SARS outbreak in Beijing with limited data. J. Theor. Biol. 227, 369–379. (doi:10.1016/j.jtbi.2003.11.014)
  • 16. Chowell G, Hengartner NW, Castillo-Chavez C, Fenimore PW, Hyman JM. 2004. The basic reproductive number of Ebola and the effects of public health measures: the cases of Congo and Uganda. J. Theor. Biol. 229, 119–126. (doi:10.1016/j.jtbi.2004.03.006)
  • 17. Smirnova A, deCamp L, Chowell G. 2019. Forecasting epidemics through nonparametric estimation of time-dependent transmission rates using the SEIR model. Bull. Math. Biol. 81, 4343–4365. (doi:10.1007/s11538-017-0284-3)
  • 18. Liu Z, Magal P, Seydi O, Webb G. 2020. Predicting the cumulative number of cases for the COVID-19 epidemic in China from early data. Math. Biosci. Eng. 17, 3040–3051. (doi:10.3934/mbe.2020172)
  • 19. Hadeler KP. 2011. Parameter identification in epidemic models. Math. Biosci. 229, 185–189. (doi:10.1016/j.mbs.2010.12.004)
  • 20. Bakhta A, Boiveau T, Maday Y, Mula O. 2020. Epidemiological short-term forecasting with model reduction of parametric compartmental models: application to the first pandemic wave of COVID-19 in France. (http://arxiv.org/abs/2009.09200) (doi:10.3390/biology10010022)
  • 21. Liu Z, Magal P, Seydi O, Webb G. 2020. Understanding unreported cases in the 2019-nCov epidemic outbreak in Wuhan, China, and the importance of major public health interventions. MDPI Biol. 9, 50. (doi:10.3390/biology9030050)
  • 22. Data sourced from Wikipedia, which used NHC daily reports: https://en.wikipedia.org/wiki/COVID-19_pandemic_in_mainland_China.
  • 23. The National Health Commission of the People’s Republic of China: http://www.nhc.gov.cn/yjb/pzhgli/new_list.shtml.
  • 24. Chinese Center for Disease Control and Prevention: http://www.chinacdc.cn/jkzt/crb/zl/szkb_11803/jszl_11809/.
  • 25. Roda WC, Varughese MB, Han D, Li MY. 2020. Why is it difficult to accurately predict the COVID-19 epidemic? Inf. Dis. Modell. 5, 271–281. (doi:10.1016/j.idm.2020.03.001)
  • 26. Smith HL. 1995. Monotone dynamical systems, an introduction to the theory of competitive and cooperative systems. Math. Surveys and Monographs, vol. 41. Providence, RI: American Mathematical Society.
  • 27. Tsoularis A, Wallace J. 2002. Analysis of logistic growth models. Math. Biosci. 179, 21–55. (doi:10.1016/S0025-5564(02)00096-2)
  • 28. Verhulst P-F. 1838. Notice sur la loi que la population poursuit dans son accroissement. Correspondance mathématique et physique 10, 113–121.
  • 29. Demongeot J, Taramasco C. 2014. Evolution of social networks: the example of obesity. Biogerontology 15, 611–626. (doi:10.1007/s10522-014-9542-z)
  • 30. Demongeot J, Hansen O, Taramasco C. 2015. Complex systems and contagious social diseases: example of obesity. Virulence 7, 129–140. (doi:10.1080/21505594.2015.1082708)
  • 31. Demongeot J, Jelassi M, Taramasco C. 2017. From susceptibility to frailty in social networks: the case of obesity. Math. Pop. Studies 24, 219–245. (doi:10.1080/08898480.2017.1348718)

Articles from Royal Society Open Science are provided here courtesy of The Royal Society