PLOS ONE. 2020 Oct 20;15(10):e0240578. doi: 10.1371/journal.pone.0240578

On the use of growth models to understand epidemic outbreaks with application to COVID-19 data

Chénangnon Frédéric Tovissodé 1, Bruno Enagnon Lokonon 1, Romain Glèlè Kakaï 1,*
Editor: Maria Alessandra Ragusa2
PMCID: PMC7575103  PMID: 33079964

Abstract

The initial phase dynamics of an epidemic without containment measures is commonly well modelled using exponential growth models. However, in the presence of containment measures, the exponential model becomes less appropriate. Under the implementation of an isolation measure for detected infectives, we propose to model epidemic dynamics by fitting a flexible growth model curve to reported positive cases, and to infer the overall epidemic dynamics by introducing information on the detection/testing effort and recovery and death rates. The resulting modelling approach is close to the Susceptible-Infectious-Quarantined-Recovered model framework. We focused on predicting the peaks (time and size) in positive cases, active cases and new infections. We applied the approach to data from the COVID-19 outbreak in Italy. Fits on limited data before the observed peaks illustrate the ability of the flexible growth model to approach the estimates from the whole data.

Introduction

COVID-19 is a pandemic caused by the new coronavirus strain SARS-CoV-2, which emerged from Wuhan, China [1, 2]. A total of 21 026 758 COVID-19 cases and 755 786 related deaths had been reported across the world as of August 15, 2020 [3]. The worldwide social and economic ravages of COVID-19 immediately motivated the use of mathematical models to understand the course of the epidemic and plan effective control strategies. These include, for instance, the SIR (Susceptible, Infectious, Recovered), SEIR (Susceptible, Exposed, Infectious, Recovered) and its variants, SIDR (Susceptible, Infectious, Recovered, Dead) and SIQR (Susceptible, Infectious, Quarantined, Recovered) models [4–7]. These modelling approaches use mechanistic models which incorporate key physical laws or mechanisms involved in the dynamics of the population at risk and the pathogen [8]. A second class of approaches uses empirical phenomenological models, which do not require specific knowledge of the physical laws or mechanisms that give rise to the observed epidemic data [9]; it was considered, for instance, by [10] and [11] to understand both the short and long term dynamics of COVID-19. A new curve fitting-like approach, namely fractal interpolation [12, 13], was also proposed by [14–16] to account for the high noise and reporting bias in data from the COVID-19 pandemic. As is generally the case with dynamic biological systems [17, 18], mathematical model development and adaptation are fundamental requirements to guide public health policies.

When facing an epidemic outbreak, public health officials are mostly interested in data driven, mathematically motivated, practical and computationally efficient approaches that can: i) generate estimates of key transmission parameters; ii) give insight into the contribution of different transmission pathways; iii) assess the impact of control interventions (e.g. social distancing, test + isolation, vaccination campaigns); iv) optimize the impact of control strategies; and v) generate short and long-term forecasts [8]. With regard to the current COVID-19 outbreak, politicians and public health officials are mostly worried about the ability of the disease to saturate the health system, reducing the survival of patients, including those consulting for reasons other than the epidemic itself. High interest is thus currently given to accurate forecasting of the epidemic peak time and size, epidemic size and duration, as well as their sensitivity to control interventions, in order to optimize the impact of control strategies.

An exponential-growth model is usually assumed to characterize the early phase of epidemics. However, this assumption can fail to capture the profile of the epidemic growth appropriately, eventually giving rise to unrealistic epidemic forecasts [10, 19]. With the ultimate aim of guiding control interventions that limit the spread of epidemics, and with a focus on the COVID-19 pandemic, this work considered a flexible growth curve fitting approach to understand the dynamics of epidemics. We used the generic growth model of [20] to model the course of reported positive cases and a binomial regression to model removals (recoveries and deaths). Thereafter, we inferred the overall dynamics of the epidemic, in terms of observables (reported cases, active/quarantined cases) and unobservables (new infections, lost cases), and predicted quantities of interest such as the peak (time and size) in reported cases, active cases and new infections. The performance of the approach was assessed through an application to daily case reporting data from Italy, which has virtually completed a whole COVID-19 outbreak wave, thus offering the possibility to compare predicted outputs to real events.

Methods

We used a growth curve approach for modeling the course of an epidemic along time. We followed [8, 10, 21] who, among others, used growth models to forecast epidemic dynamics.

Structural model for epidemic incidence

Let $C_t$ denote the size of the detected infected population at time $t$, i.e. the cumulative number of infected, identified and isolated individuals. We assumed for convenience that $C_t$ is continuous and denote by $\dot{C}_t$ its first derivative with respect to $t$. Also let $I_t$ be the true size of infectives at $t$, related to $C_t$ through

$$\dot{C}_t = \delta_t I_t \tag{1}$$

where $\delta_t \in (0, 1]$ is the detection rate, which is closely related to the testing effort (number of tests, tracing of contact persons of identified cases and targeting exposed people) and is assumed at least twice differentiable with respect to $t$. We resorted to the generic growth model of [20] for the identified positive cases:

$$C_t = \frac{K}{(1 + u_t)^{1/\nu}} \tag{2}$$

with $u_t = [1 + \nu\omega\rho(t - \tau)]^{-1/\rho}$. In Eq (2), $K > 0$ is the ultimate epidemic size (detected), $\omega > 0$ is the “intrinsic” growth constant, $\nu$ and $\rho$ are powers ($\nu > 0$ and $-1 < \rho < \nu^{-1}$) characterizing respectively the rates of change with respect to the initial size $C_0 = \delta_0 I_0$ (number of cases detected at time $t = 0$) and the ultimate size $K$, and $\tau$ is a constant of integration, determined by the initial conditions of the epidemic and implicitly the detection rate $\delta_0$ through $C_0 = K[1 + (1 - \nu\omega\rho\tau)^{-1/\rho}]^{-1/\nu}$ for $\rho \neq 0$ and $C_0 = K(1 + e^{\nu\omega\tau})^{-1/\nu}$ for $\rho = 0$. The growth model in Eq (2) is flexible enough to handle various shapes of epidemic dynamics. Indeed, if $K \to \infty$ and $\nu\rho \to 0$, Eq (2) specializes to the exponential growth model

$$C_t = e^{\omega(t - \tau)} \tag{3}$$

where $\omega$ is the exponential growth rate. Apart from Eq (3), other special or limiting cases of Eq (2) include the hyper-Gompertz ($\nu \to 0$ while $\omega\nu^{1+\rho}$ is constant) and the Gompertz ($\nu \to 0$, $\rho \to 0$ while $\omega\nu$ is constant), the Bertalanffy-Richards ($\rho \to 0$), the hyper-logistic ($\nu = 1$) and the logistic ($\nu = 1$ and $\rho \to 0$) growth models [20]. From Eq (2), the observed epidemic incidence $\dot{C}_t$ is given by

$$\dot{C}_t = \frac{K\omega\, u_t^{1+\rho}}{(1 + u_t)^{\frac{\nu+1}{\nu}}}. \tag{4}$$

In order to ensure the restriction $-1 < \rho < \nu^{-1}$, we set $\rho = \rho_0\frac{\nu+1}{\nu} - 1$ with $\rho_0 \in (0, 1)$ free of $\nu$.
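The growth curve and its incidence can be checked numerically. The following is a minimal Python sketch of Eqs (2) and (4), assuming illustrative parameter values that are hypothetical (they are not estimates from this work); the function names are likewise ours. It compares Eq (4) with a finite-difference derivative of Eq (2) and confirms that the logistic curve is recovered for $\nu = 1$, $\rho \to 0$.

```python
import math

def u_t(t, omega, nu, rho, tau):
    """u_t from Eq (2); for rho -> 0 it tends to exp(-nu * omega * (t - tau))."""
    if abs(rho) < 1e-12:
        return math.exp(-nu * omega * (t - tau))
    return (1.0 + nu * omega * rho * (t - tau)) ** (-1.0 / rho)

def cumulative(t, K, omega, nu, rho, tau):
    """Cumulative detected cases C_t, Eq (2)."""
    return K * (1.0 + u_t(t, omega, nu, rho, tau)) ** (-1.0 / nu)

def incidence(t, K, omega, nu, rho, tau):
    """Detected incidence dC_t/dt, Eq (4)."""
    u = u_t(t, omega, nu, rho, tau)
    return K * omega * u ** (1.0 + rho) * (1.0 + u) ** (-(nu + 1.0) / nu)

def rho_from_rho0(rho0, nu):
    """Reparameterization rho = rho0 * (nu + 1) / nu - 1, with rho0 in (0, 1)."""
    return rho0 * (nu + 1.0) / nu - 1.0

# Illustrative parameter values (hypothetical, chosen only for the check).
pars = dict(K=1000.0, omega=0.2, nu=0.8, rho=0.3, tau=20.0)

# Eq (4) should match a central finite difference of Eq (2).
h, t = 1e-5, 15.0
finite_diff = (cumulative(t + h, **pars) - cumulative(t - h, **pars)) / (2 * h)
analytic = incidence(t, **pars)

# Logistic special case (nu = 1, rho -> 0): C_t = K / (1 + exp(-omega * (t - tau))).
logistic = 1000.0 / (1.0 + math.exp(-0.2 * (25.0 - 20.0)))
generic = cumulative(25.0, K=1000.0, omega=0.2, nu=1.0, rho=0.0, tau=20.0)
```

Working with $\rho_0$ rather than $\rho$ during optimization keeps any value in $(0, 1)$ admissible, which is why the helper `rho_from_rho0` is included.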

Active cases and outcomes

The number $A_t$ of detected and active cases along an epidemic outbreak is of high interest for public health officials. Indeed, $A_t$ must be kept under the carrying capacity of the health system to avoid overload and disruption. The derivative $\dot{A}_t$ of the detected and active cases satisfies

$$\dot{A}_t = \dot{C}_t - R_t \tag{5}$$

where $R_t = \alpha_t A_t$ denotes the number of removed and permanently immune individuals (mortality and recovery) at time $t$, and $\alpha_t$ is the unit time removal probability, i.e. the probability of an outcome (recovery or death) per unit time, averaged over the active cases. Eq (5) fits in the SIQR (Susceptible, Infectious, Quarantined, Recovered) model framework [22] with the detected active cases referred to as “quarantined” and the strong assumption that $\alpha_t$ is constant along the epidemic outbreak (see the third equation in system (6) in [22]). The removal probability can more generally be given the logistic form $\alpha_t = \frac{e^{\eta_t}}{1 + e^{\eta_t}}$ with $\eta_t = X_t\beta + \kappa t$, where $X_t = (X_{t1}, X_{t2}, \cdots, X_{tq})$ is a vector of $q$ covariates (known constants), $\beta$ is the $q$-vector of associated effects, and $\kappa$ determines the change in the log-odds of having an outcome per unit time. These changes in $\alpha_t$ can be due to an improvement in the health care system during the epidemic outbreak (increase in recovery ratio) or a deterioration of the health care system for infected individuals (increase in mortality ratio due to the outbreak). The general solution of the differential Eq (5) turns out to have the form

$$A_t = \begin{cases} \left[A_0 + \int_0^t \dot{C}_s e^{\alpha_s s}\,ds\right] e^{-\alpha_t t} & \text{if } \kappa = 0 \\[4pt] \left[A_0\left(1 + e^{X_t\beta}\right)^{1/\kappa} + \int_0^t \dot{C}_s\left(1 + e^{\eta_s}\right)^{1/\kappa} ds\right]\left(1 + e^{\eta_t}\right)^{-1/\kappa} & \text{if } \kappa \neq 0 \end{cases} \tag{6}$$

where $A_0$ is the number of active cases at time $t = 0$. Indeed, when $\kappa = 0$, taking the first derivative of $A_t$ yields $\dot{A}_t = \dot{C}_t e^{\alpha_t t} e^{-\alpha_t t} + \left[A_0 + \int_0^t \dot{C}_s e^{\alpha_s s}\,ds\right](-\alpha_t)e^{-\alpha_t t}$, resulting in $\dot{A}_t = \dot{C}_t - \alpha_t A_t$, which is Eq (5). For $t = 0$, the integral in Eq (6) vanishes, resulting as expected in $A_t = A_0$ since $e^{-\alpha_t t} = 1$. When $\kappa \neq 0$, the first derivative of $A_t$ is $\dot{A}_t = \dot{C}_t\left(1 + e^{\eta_t}\right)^{1/\kappa}\left(1 + e^{\eta_t}\right)^{-1/\kappa} + A_t\left(-\frac{1}{\kappa}\right)\left(\kappa e^{\eta_t}\right)\left(1 + e^{\eta_t}\right)^{-1} = \dot{C}_t - \frac{e^{\eta_t}}{1 + e^{\eta_t}}A_t$, which reduces to $\dot{A}_t = \dot{C}_t - \alpha_t A_t$ in accordance with Eq (5). Here, for $t = 0$, $e^{\eta_t} = e^{X_t\beta}$ so that $A_t = A_0$.
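As a numerical sanity check on Eq (6), the sketch below evaluates the $\kappa \neq 0$ branch by composite Simpson quadrature and compares it with a direct Runge-Kutta integration of Eq (5). It assumes an intercept-only removal model ($X_t\beta = \beta$), uses the log-normal estimates reported in Table 1, and takes $A_0 = 3$ as an assumed starting value that is not given in the text.

```python
import math

# Log-normal estimates from Table 1.
K, OMEGA, NU, RHO, TAU = 253124.1, 0.0896, 0.8553, 0.3159, 39.3877
BETA, KAPPA = -4.0229, 0.0076
A0 = 3.0  # assumed number of active cases at t = 0 (hypothetical)

def incidence(t):
    """Detected incidence, Eq (4)."""
    u = (1.0 + NU * OMEGA * RHO * (t - TAU)) ** (-1.0 / RHO)
    return K * OMEGA * u ** (1.0 + RHO) * (1.0 + u) ** (-(NU + 1.0) / NU)

def alpha(t):
    """Logistic removal probability with eta_t = BETA + KAPPA * t."""
    return 1.0 / (1.0 + math.exp(-(BETA + KAPPA * t)))

def weight(s):
    """Integrating factor (1 + exp(eta_s))^(1/kappa) appearing in Eq (6)."""
    return (1.0 + math.exp(BETA + KAPPA * s)) ** (1.0 / KAPPA)

def active_closed_form(t, n=2000):
    """Eq (6), kappa != 0, with the integral computed by composite Simpson (n even)."""
    h = t / n
    total = incidence(0.0) * weight(0.0) + incidence(t) * weight(t)
    for i in range(1, n):
        s = i * h
        total += (4 if i % 2 else 2) * incidence(s) * weight(s)
    return (A0 * weight(0.0) + total * h / 3.0) / weight(t)

def active_rk4(t, n=2000):
    """Classical RK4 integration of Eq (5): dA/dt = dC/dt - alpha_t * A."""
    rhs = lambda s, a: incidence(s) - alpha(s) * a
    h, a, s = t / n, A0, 0.0
    for _ in range(n):
        k1 = rhs(s, a)
        k2 = rhs(s + h / 2, a + h * k1 / 2)
        k3 = rhs(s + h / 2, a + h * k2 / 2)
        k4 = rhs(s + h, a + h * k3)
        a += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        s += h
    return a

a_closed = active_closed_form(55.71)
a_ode = active_rk4(55.71)
```

The two routes should agree closely, since the factor $(1 + e^{\eta_t})^{1/\kappa}$ is the exact integrating factor of Eq (5) when the covariates do not change with time.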

There are no general closed form solutions for the integrals in Eq (6), unless $\dot{C}_t$ and $\alpha_t$ are purposely chosen as functions of time to simplify the integral. $A_t$ can, however, be obtained in practice from Eq (6) using a numerical integration routine such as the function integrate in R [23] or the function integral in Matlab [24]. Nevertheless, to circumvent this issue during estimation under the generic growth model in Eq (2), we discretized the active cases $A_t$ by assuming a binomial removal process $R_t$ conditional on the detected unit time new cases $Y_t$ as

$$R_t \mid A_{t-1}, Y_t \sim \mathrm{BIN}(A_{t-1} + Y_t, \alpha_t) \tag{7}$$
$$A_t = A_{t-1} + Y_t - R_t \tag{8}$$

where $\mathrm{BIN}(n, \alpha)$ denotes a binomial distribution with $n$ trials and success probability $\alpha$, and $Y_t$ is a non-negative process with expectation $\lambda_t = \dot{C}_t$. Clearly, the bivariate process $\{A_t, R_t\}$ defined by Eqs (7) and (8) is not stationary. However, since $Y_t \geq 0$ and $\dot{C}_t \to 0$ as $t \to \infty$, we have $Y_t \to 0$ in distribution as $t \to \infty$, and if the removal probability $\alpha_t$ does not approach zero as $t \to \infty$, then $A_t \to 0$ as $t \to \infty$.
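Taking expectations in Eqs (7) and (8) gives the recursion $E[A_t] = (E[A_{t-1}] + \lambda_t)(1 - \alpha_t)$, because $E[R_t \mid A_{t-1}, Y_t] = \alpha_t(A_{t-1} + Y_t)$. The sketch below iterates this mean recursion with the log-normal estimates of Table 1, an assumed $A_0 = 3$ and an illustrative horizon of 150 days (both hypothetical choices), and locates the day on which the expected number of active cases is largest.

```python
import math

# Log-normal estimates from Table 1.
K, OMEGA, NU, RHO, TAU = 253124.1, 0.0896, 0.8553, 0.3159, 39.3877
BETA, KAPPA = -4.0229, 0.0076
A0 = 3.0  # assumed number of active cases at t = 0 (hypothetical)

def incidence(t):
    """Expected daily new detected cases lambda_t, Eq (4)."""
    u = (1.0 + NU * OMEGA * RHO * (t - TAU)) ** (-1.0 / RHO)
    return K * OMEGA * u ** (1.0 + RHO) * (1.0 + u) ** (-(NU + 1.0) / NU)

def alpha(t):
    """Daily removal probability, logistic in time."""
    return 1.0 / (1.0 + math.exp(-(BETA + KAPPA * t)))

def expected_active(n_days, a0=A0):
    """Mean of A_t implied by Eqs (7)-(8), one entry per day from t = 0."""
    path = [a0]
    for t in range(1, n_days + 1):
        path.append((path[-1] + incidence(t)) * (1.0 - alpha(t)))
    return path

path = expected_active(150)
peak_day = max(range(len(path)), key=lambda t: path[t])
peak_size = path[peak_day]
```

Because the recursion is linear in $A_{t-1}$, it is exact for the mean of the process; it says nothing about the day-to-day binomial variability, which would need a simulation.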

Peak of detected cases

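The epidemic peak is an important event in the disease dynamics and can be estimated for a better management of the epidemic. An epidemic described by the exponential growth model in Eq (3) ($K \to \infty$ or $\nu\rho \to 0$) does not peak. Otherwise, the peak in the detected number of infected individuals corresponds to the maximum of the incidence rate $\dot{C}_t$. This maximum is attained when $\ddot{C}_t = \frac{\partial \dot{C}_t}{\partial t} = 0$. We have from Eq (4)

$$\ddot{C}_t = \nu\omega u_t^{\rho}\left[\frac{\nu+1}{\nu}\,\frac{u_t}{1 + u_t} - (1 + \rho)\right]\dot{C}_t. \tag{9}$$

Solving $\ddot{C}_t = 0$ for $t$ using Eq (9) yields the peak time $t_p = \tau + \left\{\left[\frac{1 - \nu\rho}{\nu(1 + \rho)}\right]^{\rho} - 1\right\}/(\nu\omega\rho)$ which reads,

$$t_p = \tau + \frac{1}{\omega[\rho_0 - (1 - \rho_0)\nu]}\left[\left(\frac{\rho_0}{1 - \rho_0}\right)^{1 - \rho_0(\nu+1)/\nu} - 1\right] \tag{10}$$

on replacing $\rho = \rho_0\frac{\nu+1}{\nu} - 1$. Inserting $t_p$ in Eq (4) and denoting $u_p = \nu\frac{1 + \rho}{1 - \rho\nu}$ gives the peak

$$\dot{C}_p = K\omega u_p^{1+\rho}\left(\frac{\nu + 1}{1 - \rho\nu}\right)^{-\frac{\nu+1}{\nu}}. \tag{11}$$

At the peak in detected cases, the cumulative number of detected cases is $C_p = K(1 + u_p)^{-1/\nu}$.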
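The closed forms for the peak can be evaluated directly. A short Python sketch follows, using the log-normal estimates of Table 1; the function names are ours, and both parameterizations of the peak time (in $\rho$ and in $\rho_0$) are coded so that they can be compared. The formulas require $\rho \neq 0$.

```python
import math

def peak_time(omega, nu, rho, tau):
    """Peak time obtained by solving Eq (9) = 0 for t (rho != 0)."""
    ratio = (1.0 - nu * rho) / (nu * (1.0 + rho))
    return tau + (ratio ** rho - 1.0) / (nu * omega * rho)

def peak_time_rho0(omega, nu, rho0, tau):
    """The same peak time written in terms of rho0, Eq (10)."""
    exponent = 1.0 - rho0 * (nu + 1.0) / nu
    scale = omega * (rho0 - (1.0 - rho0) * nu)
    return tau + ((rho0 / (1.0 - rho0)) ** exponent - 1.0) / scale

def peak_incidence(K, omega, nu, rho):
    """Peak size of the detected incidence, Eq (11)."""
    u_p = nu * (1.0 + rho) / (1.0 - rho * nu)
    return K * omega * u_p ** (1.0 + rho) * ((nu + 1.0) / (1.0 - rho * nu)) ** (-(nu + 1.0) / nu)

def cumulative_at_peak(K, nu, rho):
    """Cumulative detected cases at the peak, C_p = K (1 + u_p)^(-1/nu)."""
    u_p = nu * (1.0 + rho) / (1.0 - rho * nu)
    return K * (1.0 + u_p) ** (-1.0 / nu)

# Log-normal estimates from Table 1.
K, omega, nu, rho, tau = 253124.1, 0.0896, 0.8553, 0.3159, 39.3877
rho0 = (rho + 1.0) * nu / (nu + 1.0)  # inverse of rho = rho0 * (nu + 1) / nu - 1

t_p = peak_time(omega, nu, rho, tau)
t_p_alt = peak_time_rho0(omega, nu, rho0, tau)
c_dot_p = peak_incidence(K, omega, nu, rho)
c_p = cumulative_at_peak(K, nu, rho)
```

With the rounded estimates, `t_p` and `c_dot_p` land close to the peak statistics reported for detected cases in Table 2; small differences are expected from rounding of the inputs.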

Overall epidemic dynamics

An important interest in modelling the epidemic incidence is the derivation of quantities related to the overall dynamics of the epidemic, in both detected and undetected cases.

Total cases: detected and lost

Let $S_t$ denote the cumulative number of cases from the start of the epidemic outbreak to time $t$, and let $\dot{S}_t$ be the first derivative of $S_t$. We also introduce $\Lambda_t$, the cumulative number of lost cases (with first derivative $\dot{\Lambda}_t$), i.e. people who were infected, undetected, and removed from infectives (mortality and recovery).

The size of the lost cases is determined by the unit time removal rate $\pi_t \in (0, 1)$ from undetected infectives ($\pi_t$ is an average over all infectives, i.e. irrespective of the time since infection onset). The lost rate $\pi_t$, which is assumed at least twice differentiable with respect to $t$, depends on various factors such as the disease related mortality, the average infection duration, the natural proportion of asymptomatics within infectives, and the existence and use of medicines that may reduce symptoms (induced asymptomatics). It is worth noting that $\pi_t$ can be estimated from the removal rate $\alpha_t$ in the detected cases, taking into account various factors that may induce a difference between the two rates. For instance, since the undetected cases include asymptomatics, disease related mortality may be lower and the recovery rate higher in undetected as compared to detected cases. However, the efficiency of the health care system in treating identified and isolated cases can reduce mortality, thereby reducing $\alpha_t$, but also improve recovery, thereby increasing $\alpha_t$.

With the above notations, the lost cases count $\Lambda_t$ satisfies the differential equation,

$$\dot{\Lambda}_t = \pi_t(1 - \delta_t)I_t \tag{12}$$

whereas the cumulative number of cases $S_t$ is given on setting $\upsilon_t = (1 - \pi_t)(1 - \delta_t)$ by

$$S_t = C_t + \Lambda_t + \upsilon_t I_t. \tag{13}$$

The factor $\upsilon_t$ represents at time $t$ the proportion of infectives who will potentially continue to spread the epidemic after adequate contacts (i.e. contacts sufficient for transmission) with susceptibles. In other words, the number of undetected, currently infective individuals is $(1 - \pi_t)(\delta_t^{-1} - 1)\dot{C}_t$. From Eq (1), the infectives $I_t$ and its first derivative with respect to time $\dot{I}_t$ are given for $t \geq 0$ by

$$I_t = \delta_t^{-1}\dot{C}_t \tag{14}$$
$$\dot{I}_t = \delta_t^{-1}\left[\ddot{C}_t - \dot{\delta}_t\delta_t^{-1}\dot{C}_t\right] \tag{15}$$

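where $\dot{\delta}_t$ is the first derivative of the detection rate $\delta_t$ with respect to $t$. Straightforward algebraic operations then give the number of new cases and the cumulative number of cases as

$$\dot{S}_t = \left[\pi_t\delta_t^{-1} + (1 - \pi_t)\left(1 - (\delta_t^{-1} - 1)\delta_t^{-1}\dot{\delta}_t\right) + \dot{\upsilon}_t\delta_t^{-1}\right]\dot{C}_t + (1 - \pi_t)(\delta_t^{-1} - 1)\ddot{C}_t \tag{16}$$
$$S_t = C_t + \Lambda_t + (1 - \pi_t)(\delta_t^{-1} - 1)\dot{C}_t \tag{17}$$

where $\dot{\upsilon}_t = -(1 - \pi_t)\dot{\delta}_t - (1 - \delta_t)\dot{\pi}_t$ with $\dot{\pi}_t$ the first derivative of the lost rate $\pi_t$, and the cumulative number of lost cases $\Lambda_t$ is given for $t \geq 0$ by

$$\Lambda_t = S_0 + \int_0^t \pi_s(\delta_s^{-1} - 1)\dot{C}_s\,ds \tag{18}$$

with $S_0$ the cumulative number of all cases until the first detection date $t = 0$. The total size of the epidemic is $S_\infty = C_\infty + \Lambda_\infty$ since $\dot{C}_t \to 0$ as $t \to \infty$. Under Turner's growth model, $S_\infty = K + \Lambda_\infty$.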

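Let us assume a constant detection rate $\delta_t = \delta$, closely related to the detection effort but also to the average duration from infection to recovery or death of non-isolated cases. Assuming in addition a constant lost rate ($\pi_t = \pi$), we have $\dot{\delta}_t = \dot{\pi}_t = \dot{\upsilon}_t = 0$, and the new cases $\dot{S}_t$ and its accumulation $S_t$, as well as the lost cases $\Lambda_t$, simplify to

$$\dot{S}_t = [1 + \pi(\delta^{-1} - 1)]\dot{C}_t + (1 - \pi)(\delta^{-1} - 1)\ddot{C}_t \tag{19}$$
$$S_t = S_0 + [1 + \pi(\delta^{-1} - 1)]C_t + (1 - \pi)(\delta^{-1} - 1)\dot{C}_t \tag{20}$$
$$\Lambda_t = S_0 + \pi(\delta^{-1} - 1)C_t. \tag{21}$$

The total epidemic size is here $S_\infty = S_0 + [1 + \pi(\delta^{-1} - 1)]K$.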
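To make the constant-rate formulas concrete, the sketch below codes Eqs (19) to (21) and checks that Eq (20) is the same as Eq (17) once Eq (21) is substituted. The rates are assumptions for illustration: $\pi = 0.1$/day and $\delta = 1/30$ per day (one third of infectives detected over a 10-day window), with $K$ taken from the log-normal fit in Table 1; the values passed for $C_t$, $\dot{C}_t$ and $S_0$ in the identity check are arbitrary.

```python
def new_infections(c_dot, c_ddot, delta, pi_lost):
    """Daily new infections under constant rates, Eq (19)."""
    gap = 1.0 / delta - 1.0
    return (1.0 + pi_lost * gap) * c_dot + (1.0 - pi_lost) * gap * c_ddot

def cumulative_infections(s0, c, c_dot, delta, pi_lost):
    """Cumulative infections under constant rates, Eq (20)."""
    gap = 1.0 / delta - 1.0
    return s0 + (1.0 + pi_lost * gap) * c + (1.0 - pi_lost) * gap * c_dot

def lost_cases(s0, c, delta, pi_lost):
    """Cumulative lost (undetected and removed) cases, Eq (21)."""
    return s0 + pi_lost * (1.0 / delta - 1.0) * c

# Assumed constant rates (illustrative) and K from Table 1.
delta, pi_lost, K = 1.0 / 30.0, 0.1, 253124.1

# Multiplier linking the detected final size K to the total final size.
total_multiplier = 1.0 + pi_lost * (1.0 / delta - 1.0)
total_size_minus_s0 = total_multiplier * K

# Consistency of Eq (20) with Eq (17): S_t = C_t + Lambda_t + (1 - pi)(1/delta - 1) dC_t/dt.
s0, c, c_dot = 10.0, 5000.0, 300.0  # arbitrary values for the identity check
lhs = cumulative_infections(s0, c, c_dot, delta, pi_lost)
rhs = c + lost_cases(s0, c, delta, pi_lost) + (1.0 - pi_lost) * (1.0 / delta - 1.0) * c_dot
```

Under these assumed rates each detected case stands for roughly 3.9 infections in the final size, which shows how strongly the inferred total depends on the detection rate.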

Epidemic peak

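At the time $t_p$ of the peak of reported cases ($\ddot{C}_t = 0$) under constant detection and lost rates, the number of new infections is $\dot{S}_p = [1 + \pi(\delta^{-1} - 1)]\dot{C}_p$ with $\dot{C}_p$ given in Eq (11). This, however, corresponds to the peak in the overall new cases $\dot{S}_t$ only under the unrealistic assumption $\delta = 1$. The peak of new infections occurs when the second derivative $\ddot{S}_t$ of $S_t$ with respect to $t$ vanishes ($\ddot{S}_t = 0$). We have from Eq (16)

$$\ddot{S}_t = \left[\pi_t\delta_t^{-1} + (1 - \pi_t)\left(1 - (\delta_t^{-1} - 1)\delta_t^{-1}\dot{\delta}_t\right) + \dot{\upsilon}_t\delta_t^{-1}\right]\ddot{C}_t + (1 - \pi_t)(\delta_t^{-1} - 1)\dddot{C}_t + \Psi_t \tag{22}$$

where $\dddot{C}_t$ (the third derivative of $C_t$ with respect to $t$) and $\Psi_t$ are given by

$$\dddot{C}_t = \nu^2\omega^2 u_t^{2\rho}\left[(1 + \rho)(2\rho + 1) - \frac{3(\nu + 1)(\rho + 1)}{\nu}z_t + \frac{(\nu + 1)(2\nu + 1)}{\nu^2}z_t^2\right]\dot{C}_t \tag{23}$$
$$\Psi_t = \Big\{\dot{\pi}_t\big[\delta_t^{-1}\big(1 + (\delta_t^{-1} - 1)\dot{\delta}_t\big) - 1\big] + (1 - \pi_t)\delta_t^{-1}\big[\ddot{\upsilon}_t - (\delta_t^{-1} - 1)\ddot{\delta}_t\big] + \delta_t^{-2}\dot{\delta}_t\big[(1 - \pi_t)[\dot{\pi}_t(1 - \delta_t) + \dot{\delta}_t(2\delta_t^{-1} - \pi_t)] - \pi_t\big]\Big\}\dot{C}_t - \big[\dot{\pi}_t(\delta_t^{-1} - 1) + (1 - \pi_t)\delta_t^{-2}\dot{\delta}_t\big]\ddot{C}_t \tag{24}$$

with $z_t = \frac{u_t}{1 + u_t}$, and $\ddot{\pi}_t$ and $\ddot{\delta}_t$ the second derivatives of respectively $\pi_t$ and $\delta_t$ with respect to $t$. Eq (23) follows from Eq (9) on writing $\ddot{C}_t = g_t\dot{C}_t$, so that $\dddot{C}_t = (\dot{g}_t + g_t^2)\dot{C}_t$. The peak time and value depend on the particular forms of $\delta_t$ and $\pi_t$ as functions of time. Here, we restrict attention to the simple situation with constant positive detection and lost rates ($\delta_t = \delta$ with $\delta \in (0, 1)$ and $\pi_t = \pi$ with $\pi \in (0, 1)$) where $\dot{\delta}_t = \dot{\pi}_t = \Psi_t = 0$ and Eq (22) reduces to

$$\ddot{S}_t = [1 + \pi(\delta^{-1} - 1)]\ddot{C}_t + (1 - \pi)(\delta^{-1} - 1)\dddot{C}_t. \tag{25}$$

It appears that the peak of new infections occurs before the time $t_p$ of the peak in detected cases. Indeed, at $t = t_p$, we have $\ddot{C}_t = 0$, $(1 - \pi)(\delta^{-1} - 1) > 0$ and $\dddot{C}_t < 0$ so that $\ddot{S}_t < 0$, i.e. $\dot{S}_t$ is already in its descending phase. Eq (25) indicates that at the time $t_P$ of the peak of new infections, $\ddot{C}_t$ is equal to $\ddot{C}_P = -\zeta\dddot{C}_P$ where $\zeta = \frac{(1 - \pi)(1 - \delta)}{\pi + \delta(1 - \pi)}$ and $\dddot{C}_P$ is given by Eq (23) with $t = t_P$. The lower $\zeta$, the lower $|\ddot{C}_P|$, and the lower the difference $t_p - t_P$ (delay of the observed peak). Differentiating $\zeta$ with respect to $\delta$ gives $\frac{\partial\zeta}{\partial\delta} = -\frac{1 - \pi}{[\pi + \delta(1 - \pi)]^2} < 0$, hence the higher $\delta$, the lower the delay between the observed peak time and the time of the peak in new infections. Using Eqs (9) and (23), $\ddot{S}_t$ becomes

$$\ddot{S}_t = \nu\omega u_t^{\rho}\delta^{-1}\left\{\frac{\nu\omega(1 - \pi)(1 - \delta)}{1 + \nu\omega\rho(t - \tau)}\left[(1 + \rho)(2\rho + 1) - \frac{3(\nu + 1)(\rho + 1)}{\nu}z_t + \frac{(\nu + 1)(2\nu + 1)}{\nu^2}z_t^2\right] + [\delta + \pi(1 - \delta)]\left[\frac{\nu + 1}{\nu}z_t - (1 + \rho)\right]\right\}\dot{C}_t \tag{26}$$

which does not have a closed form root. The root $t_P$ can, however, be obtained using numerical root finding routines such as the R function uniroot or the Matlab function fzero. Afterwards, the peak size $\dot{S}_P$ (the maximum number of new infections) is obtained using Eq (19).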
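The root search can be sketched in a few lines. Rather than coding Eq (26), the Python example below builds $\dot{S}_t$ from Eqs (4), (9) and (19) and applies bisection to a central finite-difference slope of $\dot{S}_t$; this is a swapped-in numerical shortcut, not necessarily the routine used for the reported results. It assumes the log-normal estimates of Table 1, $\pi = 0.1$/day and $\delta = 1/30$ per day (one third of infectives detected over 10 days), and a search bracket of days 10 to 34 chosen by us.

```python
import math

# Log-normal estimates from Table 1 and assumed constant rates.
K, OMEGA, NU, RHO, TAU = 253124.1, 0.0896, 0.8553, 0.3159, 39.3877
DELTA, PI_LOST = 1.0 / 30.0, 0.1

def u(t):
    return (1.0 + NU * OMEGA * RHO * (t - TAU)) ** (-1.0 / RHO)

def incidence(t):
    """Detected incidence, Eq (4)."""
    ut = u(t)
    return K * OMEGA * ut ** (1.0 + RHO) * (1.0 + ut) ** (-(NU + 1.0) / NU)

def incidence_slope(t):
    """Second derivative of C_t, Eq (9)."""
    ut = u(t)
    z = ut / (1.0 + ut)
    return NU * OMEGA * ut ** RHO * ((NU + 1.0) / NU * z - (1.0 + RHO)) * incidence(t)

def new_infections(t):
    """Daily new infections under constant rates, Eq (19)."""
    gap = 1.0 / DELTA - 1.0
    return (1.0 + PI_LOST * gap) * incidence(t) + (1.0 - PI_LOST) * gap * incidence_slope(t)

def slope(t, h=1e-4):
    """Central finite-difference approximation of the derivative of Eq (19)."""
    return (new_infections(t + h) - new_infections(t - h)) / (2.0 * h)

def bisect(f, lo, hi, tol=1e-8):
    """Plain bisection; requires a sign change of f on [lo, hi]."""
    f_lo = f(lo)
    if f_lo * f(hi) > 0:
        raise ValueError("root is not bracketed")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)

t_P = bisect(slope, 10.0, 34.0)      # time of the peak in new infections
s_dot_P = new_infections(t_P)        # size of that peak, Eq (19)
```

The upper end of the bracket sits just below the peak time of detected cases, where $\dot{S}_t$ is already decreasing, so the sign change is guaranteed for these parameter values.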

Statistical models and inference

Let us consider a record of new confirmed infected cases $Y_1, Y_2, \cdots, Y_n$, active cases $A_0, A_1, \cdots, A_{n-1}$, removed cases $R_1, R_2, \cdots, R_n$ (available from Eq (8) as $R_t = Y_t - A_t + A_{t-1}$) and the associated vectors of covariates $X_1, X_2, \cdots, X_n$ at $n$ time points. The parameters $K$, $\omega$, $\nu$, $\rho_0$, $\tau$, and $\kappa$ can be estimated using maximum likelihood (ML) by assigning to each $Y_t$ an appropriate statistical distribution with expectation $\lambda_t = \dot{C}_t$ and a dispersion parameter $\sigma > 0$, and probability density function (pdf) or probability mass function (pmf) $f(Y_t|\theta)$ where $\theta = (K, \omega, \nu, \rho_0, \tau, \kappa, \beta, \sigma)$. We subsequently considered inference under log-normal and negative binomial distributions.

Log-normal model

Epidemic incidence case data are generally fitted through non-linear least squares applied at logarithmic scale [19, 25, 26]. To deal with zero incidence cases, the logarithmic transform is usually applied on the shifted cases $Y_t + 1$. Mimicking this procedure in a likelihood inference framework, we consider a log-normal distribution assumption for the shifted incidence cases, i.e. $Y_t + 1 \sim \mathrm{LN}(\lambda_t + 1, \sigma)$. The pdf of $Y_t$, adapted from [27], reads

$$f(Y_t|\theta) = \frac{1}{\sigma(Y_t + 1)\sqrt{2\pi}}\exp\left\{-\frac{1}{2}\left(\frac{\log(Y_t + 1) - \log(\lambda_t + 1)}{\sigma} + \frac{\sigma}{2}\right)^2\right\} \tag{27}$$

so that $Y_t$ has expectation $E[Y_t] = \lambda_t$ and variance $\mathrm{Var}[Y_t] = (\lambda_t + 1)^2(e^{\sigma^2} - 1)$.
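The mean-parameterized log-normal density can be verified numerically. The sketch below codes the logarithm of Eq (27) and integrates the density by the midpoint rule over $x = \log(y + 1)$ to recover the total mass, the mean and the variance; the values $\lambda = 100$ and $\sigma = 0.4332$ are illustrative inputs (the latter borrowed from Table 1), and the helper names are ours.

```python
import math

def lognormal_logpdf(y, lam, sigma):
    """Log of Eq (27): density of Y_t when Y_t + 1 is log-normal with mean lam + 1."""
    z = (math.log(y + 1.0) - math.log(lam + 1.0)) / sigma + sigma / 2.0
    return -math.log(sigma * (y + 1.0)) - 0.5 * math.log(2.0 * math.pi) - 0.5 * z * z

def numeric_moments(lam, sigma, n=20000):
    """Midpoint-rule mass, mean and variance of Y_t, integrating over x = log(y + 1)."""
    centre = math.log(lam + 1.0) - sigma ** 2 / 2.0
    lo, hi = centre - 12.0 * sigma, centre + 12.0 * sigma
    h = (hi - lo) / n
    mass = mean = second = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * h
        y = math.exp(x) - 1.0
        w = math.exp(lognormal_logpdf(y, lam, sigma)) * math.exp(x) * h  # dy = e^x dx
        mass += w
        mean += y * w
        second += y * y * w
    return mass, mean, second - mean ** 2

mass, mean, var = numeric_moments(lam=100.0, sigma=0.4332)
expected_var = (100.0 + 1.0) ** 2 * (math.exp(0.4332 ** 2) - 1.0)
```

The $\sigma/2$ shift inside the square is what makes $\lambda_t$ the mean rather than the median of the shifted counts.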

Negative binomial model

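Since incidence cases are counts, $Y_t$ can be assumed to follow the negative binomial distribution, i.e. $Y_t \sim \mathrm{NB}(\lambda_t, \sigma)$ with pmf

$$f(Y_t|\theta) = \frac{\Gamma(Y_t + 1/\sigma)}{\Gamma(Y_t + 1)\Gamma(1/\sigma)}\left(\frac{1}{\sigma\lambda_t + 1}\right)^{1/\sigma}\left(\frac{\sigma\lambda_t}{\sigma\lambda_t + 1}\right)^{Y_t}. \tag{28}$$

The incidence case $Y_t$ then has expectation $E[Y_t] = \lambda_t$ and variance $\mathrm{Var}[Y_t] = \lambda_t(1 + \sigma\lambda_t)$.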
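A quick numerical check of the negative binomial parameterization follows. The Python sketch writes the log-pmf in the mean-dispersion form whose moments are $E[Y_t] = \lambda_t$ and $\mathrm{Var}[Y_t] = \lambda_t(1 + \sigma\lambda_t)$, then sums the pmf over a long range of counts; $\lambda = 50$ is an illustrative value, $\sigma = 0.1466$ is borrowed from Table 1, and the truncation at 2000 is our choice.

```python
import math

def negbin_logpmf(y, lam, sigma):
    """Log-pmf of a negative binomial with mean lam and variance lam * (1 + sigma * lam)."""
    r = 1.0 / sigma
    log_q = -math.log1p(sigma * lam)           # log of 1 / (sigma * lam + 1)
    log_p = math.log(sigma * lam) + log_q      # log of sigma * lam / (sigma * lam + 1)
    return (math.lgamma(y + r) - math.lgamma(y + 1.0) - math.lgamma(r)
            + r * log_q + y * log_p)

lam, sigma = 50.0, 0.1466
probs = [math.exp(negbin_logpmf(y, lam, sigma)) for y in range(2000)]

mass = sum(probs)
mean = sum(y * p for y, p in enumerate(probs))
var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
```

As $\sigma \to 0$ the variance tends to $\lambda_t$, so the model approaches a Poisson distribution and $\sigma$ measures overdispersion relative to it.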

Likelihood inference

Based on the information $\{Y_t, R_t\}$ for $t = 1, 2, \cdots, n$, the conditional log-likelihood of the parameter $\theta$ given $A_0$ is

$$\ell(\theta) = \sum_{t=1}^{n}\left[\log f(Y_t|\theta) + \log f_B(R_t|\theta)\right] \tag{29}$$

where $f_B(R_t|\theta) = \binom{A_{t-1} + Y_t}{R_t}\alpha_t^{R_t}(1 - \alpha_t)^{A_{t-1} + Y_t - R_t}$ is the binomial probability mass function for $R_t$ (see Eq (7)). The function $\ell(\cdot)$ can be maximized to obtain the maximum likelihood estimate $\hat{\theta}$ of $\theta$ using an optimization routine such as the function optim in R or the function fminsearch in Matlab. Let $H(\theta)$ be the Hessian matrix of $\ell(\theta)$ and define the covariance $\Sigma(\theta) = -[H(\theta)]^{-1}$. The large sample distribution (i.e. for $n \to \infty$) of the maximum likelihood estimator is multivariate normal with mean $\hat{\theta}$ and covariance matrix $\hat{\Sigma} = \Sigma(\hat{\theta})$.
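The log-likelihood in Eq (29) is straightforward to code. The sketch below is a minimal Python version under stated assumptions: a log-normal model for $Y_t$, an intercept-only removal model ($\eta_t = \beta + \kappa t$), and $\rho$ passed directly instead of $\rho_0$. The data are synthetic and noise-free, generated from the Table 1 estimates purely to exercise the function; they are not the Italian series.

```python
import math

def incidence(t, K, omega, nu, rho, tau):
    """Expected daily new detected cases lambda_t, Eq (4)."""
    u = (1.0 + nu * omega * rho * (t - tau)) ** (-1.0 / rho)
    return K * omega * u ** (1.0 + rho) * (1.0 + u) ** (-(nu + 1.0) / nu)

def lognormal_logpdf(y, lam, sigma):
    """Log of Eq (27)."""
    z = (math.log(y + 1.0) - math.log(lam + 1.0)) / sigma + sigma / 2.0
    return -math.log(sigma * (y + 1.0)) - 0.5 * math.log(2.0 * math.pi) - 0.5 * z * z

def binomial_logpmf(r, n, p):
    """Log-pmf of BIN(n, p) at r, via log-gamma functions."""
    return (math.lgamma(n + 1) - math.lgamma(r + 1) - math.lgamma(n - r + 1)
            + r * math.log(p) + (n - r) * math.log1p(-p))

def log_likelihood(theta, Y, R, a0):
    """Eq (29): sum over t of log f(Y_t) + log f_B(R_t), conditional on A_0 = a0."""
    K, omega, nu, rho, tau, beta, kappa, sigma = theta
    total, active = 0.0, a0
    for t, (y, r) in enumerate(zip(Y, R), start=1):
        lam = incidence(t, K, omega, nu, rho, tau)
        alpha = 1.0 / (1.0 + math.exp(-(beta + kappa * t)))
        total += lognormal_logpdf(y, lam, sigma) + binomial_logpmf(r, active + y, alpha)
        active = active + y - r  # Eq (8)
    return total

# Synthetic, noise-free data built from the Table 1 estimates (illustration only).
theta_ref = (253124.1, 0.0896, 0.8553, 0.3159, 39.3877, -4.0229, 0.0076, 0.4332)
Y, R, active = [], [], 3
for t in range(1, 22):
    y = round(incidence(t, *theta_ref[:5]))
    r = round((active + y) / (1.0 + math.exp(-(theta_ref[5] + theta_ref[6] * t))))
    Y.append(y)
    R.append(r)
    active = active + y - r

ll_ref = log_likelihood(theta_ref, Y, R, a0=3)
theta_bad = (253124.1, 0.045) + theta_ref[2:]   # same parameters with omega halved
ll_bad = log_likelihood(theta_bad, Y, R, a0=3)
```

In practice the negative of `log_likelihood` would be handed to a general-purpose optimizer, and the Hessian at the optimum would be inverted to obtain $\hat{\Sigma}$; neither step is shown here.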

Application to reported COVID-19 new cases in Italy

The data

In order to test the reliability of Turner's growth model in predicting the dynamics of an epidemic, we used data from one of the countries which had completed a whole COVID-19 outbreak wave. The daily case reporting data for Italy were obtained from https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data/csse_covid_19_time_series. We used only the confirmed data (2020-02-20 to 2020-07-11) accessed on 2020-07-28, discarding the latest data subject to possible reporting delay, as indicated by the Istituto Superiore di Sanità (ISS) at https://www.epicentro.iss.it/en/coronavirus/sars-cov-2-dashboard.

Data analysis

All analyses were performed in R [23]. We fitted Turner's growth model curve to the whole Italian data. Both the log-normal and the negative binomial distributions were used, and the fit with the lowest root mean square error (RMSE) computed for the daily new positive cases was selected as the best. Then, we derived peak statistics (time and size) for daily new reported cases and active cases. We also inferred the daily new infections by assuming constant detection and lost rates and estimated its peak (time and size). The detection ($\delta$ = 0.033/day) and the lost rates ($\pi$ = 0.1/day) for Italy were obtained from [7]. These rates follow from the assumptions that the average duration from infection to recovery or death of non-isolated cases is 10 days (hence $\pi$ = 0.1/day) and that during this detection window, 1/3 of infectives test positive (hence $\delta$ = 0.033/day).

To assess the ability of the model to predict the peak of the new positive cases in countries which have not yet reached the peak, we retrospectively fitted the model to the Italian data before the observed peak (day 29 after the notification of the first case), using data of the first two weeks, and then data of the first three weeks. For these analyses with limited data, we fitted the full Turner's growth model to the positive cases, but also its special cases, namely the hyper-Gompertz ($\nu \to 0$ while $\omega\nu^{1+\rho}$ is constant), the Gompertz ($\nu \to 0$, $\rho \to 0$ while $\omega\nu$ is constant), the Bertalanffy-Richards ($\rho \to 0$), the hyper-logistic ($\nu = 1$) and the logistic ($\nu = 1$ and $\rho \to 0$) models, using the log-normal distribution for the daily counts. We then computed Akaike's Information Criterion (AIC), defined as $\mathrm{AIC} = -2\hat{\ell} + 2N_p$ with $\hat{\ell}$ the maximized log-likelihood and $N_p$ the number of parameters in a fitted model. Finally, we retained and presented the best fit (lowest AIC value).

Results

Modelling the whole Italian data

Table 1 shows parameter estimates using the whole Italian COVID-19 daily case reporting data from 2020-02-20 to 2020-07-11, with standard errors and 95% confidence intervals. The log-normal distribution based fit recorded the lowest RMSE, and was thus retained for subsequent analyses. The confidence bounds for the parameter $\rho$ ($\hat{\rho} = 0.32$ with CI($\rho$) = [0.29, 0.35]) indicated that neither the logistic growth model ($\rho \to 0$ and $\nu = 1$) nor the Bertalanffy-Richards growth model ($\rho \to 0$) was appropriate for this dataset. It was noted that $\nu$ was not significantly different from 1 ($\hat{\nu} = 0.85$ with CI($\nu$) = [0.70, 1.05]), hence the hyper-logistic model ($\nu = 1$) was found to be compatible with the data. The fitted equation is, for $t \geq 0$,

$$\hat{C}_t = 253124.1\left\{1 + [1 + 0.0242(t - 39.3877)]^{-3.1659}\right\}^{-1.1691} \tag{30}$$

with a coefficient of determination of $R^2$ = 99.97%. The curves fitted to the new positive cases and the cumulative number of positive cases are shown in Fig 1(A) and 1(B). It can be observed in Fig 1(A) that the peak of new positive cases occurred 29 days after the notification of the first case, whereas the maximum likelihood estimate of the theoretical peak time is five days later, as shown in Table 2 ($\hat{t}_p = 34.10$, CI($t_p$) = [31.94, 36.41] days). The theoretical peak size is on average 5 298 new positive cases ($\hat{\dot{C}}_p = 5298.96$, CI($\dot{C}_p$) = [4609.72, 6091.25] new cases) against a maximum of 6 248 observed new positive cases.

Table 1. Estimate, standard error (SE) and 95% confidence interval (CI95%) of Turner’s growth model parameters fitted to the Italian COVID-19 daily case reporting data from 2020-02-20 to 2020-07-11, using the log-normal distribution (RMSE = 514.24, $R^2$ = 99.97%) and the negative binomial distribution (RMSE = 530.93, $R^2$ = 99.93%).

| Model parameter | Log-normal fit: Estimate | SE | CI95% | Negative binomial fit: Estimate | SE | CI95% |
|---|---|---|---|---|---|---|
| K | 253124.1 | 12623.0 | [229554.2, 279114.1] | 242952.6 | 169.6 | [242951.2, 242954.0] |
| ω | 0.0896 | 0.0113 | [0.0700, 0.1146] | 0.0902 | 0.0098 | [0.0729, 0.1117] |
| ν | 0.8553 | 0.0906 | [0.6951, 1.0526] | 0.8300 | 0.0771 | [0.6918, 0.9959] |
| ρ | 0.3159 | 0.0142 | [0.2892, 0.3451] | 0.3231 | 0.0124 | [0.2996, 0.3484] |
| τ | 39.3877 | 2.5181 | [34.7491, 44.6456] | 39.3457 | 2.2393 | [35.1927, 43.9888] |
| β | -4.0229 | 0.0060 | [-4.0348, -4.0111] | -4.0229 | 0.0060 | [-4.0348, -4.0111] |
| κ | 0.0076 | 0.0001 | [0.0075, 0.0078] | 0.0076 | 0.0001 | [0.0075, 0.0078] |
| σ | 0.4332 | 0.0257 | [0.3857, 0.4867] | 0.1466 | 0.0175 | [0.1160, 0.1853] |

Notes: RMSE = root mean square error; β and κ define the daily removal rate from detected cases as $\alpha_t = \frac{e^{\beta + \kappa t}}{1 + e^{\beta + \kappa t}}$; σ is the log-normal/negative binomial distribution scale parameter (see the pdf in Eq (27) and pmf in Eq (28)).

Fig 1. Log-normal fit of Turner’s model to the COVID-19 daily case reporting data from Italy (2020-02-20 to 2020-07-11).

New reported cases (A), cumulative positive cases (B), active (quarantined) cases (C) and estimated (average) daily new infections based on a detection rate of δ = 0.033/day and a lost rate (recovery or death) of non-detected cases of π = 0.1/day (D).

Table 2. Estimate, standard error (SE) and 95% confidence interval of peak statistics using the COVID-19 daily case reporting data from Italy (2020-02-20 to 2020-07-11).

| Quantity | Peak statistic | Estimate | SE | CI95% | Observed |
|---|---|---|---|---|---|
| Detected | Time (day) | 34.10 | 1.14 | [31.94, 36.41] | 29 |
| Detected | New positive cases | 5298.96 | 376.73 | [4609.72, 6091.25] | 6248 |
| Actives (isolated) | Time (day) | 55.71 | 0.96 | [53.87, 57.62] | 58 |
| Actives (isolated) | Active cases | 111069.88 | 6759.93 | [98580.39, 125141.70] | 114683 |
| New infections | Time (day) | 27.52 | 1.02 | [25.62, 29.56] | - |
| New infections | New infections | 22748.38 | 1351.44 | [19726.30, 26233.44] | - |

Notes: - = not available.

From the estimate of the parameter β given in Table 1 ($\hat{\beta} = -4.02$ with CI(β) = [−4.03, −4.01]), it appears that the daily removal rate (recoveries and deaths) averaged $\hat{\alpha}_0 = 1.8\%$ in the very early phase of the epidemic ($t \approx 0$ day). Then, from the estimate of κ ($\hat{\kappa} = 0.0076$ with CI(κ) = [0.0075, 0.0078]), it appears that the removal rate increased with time, i.e. the odds for an active case to recover or die within a day increased on average by 5.5% over a week. Fig 1(C) displays the active cases and the corresponding fitted curve using the removal probability along with the fitted Eq (30). The active cases were predicted to peak on day 56 ($\hat{t}_a = 55.71$ days, CI($t_a$) = [53.87, 57.62] days) at 111 070 active cases ($\hat{A}_a = 111069.88$ cases, CI($A_a$) = [98 580.39, 125 141.70] cases), whereas the observed peak amounted to 114 683 cases and occurred 58 days after the notification of the first case.
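Both removal-rate figures quoted above follow from the logistic form of $\alpha_t$: the initial rate is the inverse logit of $\hat{\beta}$, and a one-week change multiplies the odds by $e^{7\hat{\kappa}}$. A short Python check, using the Table 1 estimates (variable names are ours):

```python
import math

beta_hat, kappa_hat = -4.0229, 0.0076   # estimates from Table 1

# Daily removal probability at t = 0: inverse logit of beta.
alpha_0 = 1.0 / (1.0 + math.exp(-beta_hat))

# Multiplicative change in the odds of removal over seven days.
weekly_odds_ratio = math.exp(7.0 * kappa_hat)
weekly_increase_percent = 100.0 * (weekly_odds_ratio - 1.0)
```

Because $\alpha_t$ is small here, the relative change in the odds and in the probability itself are nearly the same.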

The daily new infections, inferred by assuming a constant detection rate ($\delta$ = 0.033/day) and a constant lost rate ($\pi$ = 0.1/day), are depicted in Fig 1(D). The peak in new infections likely occurred about 28 days ($\hat{t}_P = 27.52$ days, CI($t_P$) = [25.62, 29.56] days) after the notification of the first case, and averaged 22 748 new infections ($\hat{\dot{S}}_P = 22748.38$, CI($\dot{S}_P$) = [19726.30, 26233.44] new infections) (Table 2). The ratio of the number of infectives to the number of active cases decreased from 44.70 at the first notification day to 11.41 one week later (averaging 22.95, CI = [22.01, 23.93] over this period) and to 2.99 at peak time, 22 days later.

Retrospective fits

The AICs of the retrospective fits of Turner's growth model and its special cases to the Italian COVID-19 data of the first two weeks and the first three weeks are presented in Table 3. It can be observed that the best fits correspond to the hyper-logistic growth model for both the data of the first two weeks (AIC = 483.03) and the data of the first three weeks (AIC = 863.58). Although parsimony indicated the hyper-logistic model fits as the best, the differences ΔAIC in AIC with respect to the full Turner's growth model fit were mild (|ΔAIC| < 2).

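Table 3. AIC of Turner’s growth model fitted to the Italian COVID-19 daily case reporting data of the first two weeks and the first three weeks from 2020-02-20, with a log-normal distribution for the positive cases.

| Dataset | Growth model | Restrictions | NFGMP | AIC | ΔAIC |
|---|---|---|---|---|---|
| Data of the first two weeks | Full Turner | - | 5 | 484.49 | 0 |
| Data of the first two weeks | Bertalanffy-Richards | $\rho \to 0$ | 4 | 504.45 | 19.97 |
| Data of the first two weeks | Hyper-logistic | $\nu = 1$ | 4 | 483.03 | -1.46 |
| Data of the first two weeks | Logistic | $\nu = 1$ and $\rho \to 0$ | 3 | 530.22 | 45.73 |
| Data of the first two weeks | Hyper-Gompertz | $\nu \to 0$ and $\omega\nu^{1+\rho}$ is constant | 3 | 542.92 | 58.43 |
| Data of the first two weeks | Gompertz | $\nu \to 0$, $\rho \to 0$ and $\omega\nu$ is constant | 2 | 499.50 | 15.02 |
| Data of the first three weeks | Full Turner | - | 5 | 864.21 | 0 |
| Data of the first three weeks | Bertalanffy-Richards | $\rho \to 0$ | 4 | 901.78 | 37.57 |
| Data of the first three weeks | Hyper-logistic | $\nu = 1$ | 4 | 863.58 | -0.63 |
| Data of the first three weeks | Logistic | $\nu = 1$ and $\rho \to 0$ | 3 | 945.16 | 80.95 |
| Data of the first three weeks | Hyper-Gompertz | $\nu \to 0$ and $\omega\nu^{1+\rho}$ is constant | 3 | 966.71 | 102.49 |
| Data of the first three weeks | Gompertz | $\nu \to 0$, $\rho \to 0$ and $\omega\nu$ is constant | 2 | 896.52 | 32.30 |

Notes: - = not applicable; NFGMP = Number of free growth parameters; ΔAIC = difference between the AIC of a special growth model fit and the AIC of the full Turner’s growth model fit.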
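The model selection step amounts to computing $\mathrm{AIC} = -2\hat{\ell} + 2N_p$ for each candidate and keeping the smallest value. The Python sketch below defines the criterion and applies the selection to the two-week AIC values listed in Table 3; the dictionary layout and the log-likelihood passed to `aic` in the last line are our own illustrative choices.

```python
def aic(loglik, n_params):
    """Akaike's Information Criterion from a maximized log-likelihood."""
    return -2.0 * loglik + 2.0 * n_params

# AIC values for the first two weeks of data, copied from Table 3.
aic_two_weeks = {
    "Full Turner": 484.49,
    "Bertalanffy-Richards": 504.45,
    "Hyper-logistic": 483.03,
    "Logistic": 530.22,
    "Hyper-Gompertz": 542.92,
    "Gompertz": 499.50,
}

# Difference with respect to the full model, and the model with the lowest AIC.
reference = aic_two_weeks["Full Turner"]
delta_aic = {name: value - reference for name, value in aic_two_weeks.items()}
best_model = min(aic_two_weeks, key=aic_two_weeks.get)

# Example of the definition itself with a hypothetical log-likelihood value.
example = aic(loglik=-100.0, n_params=4)
```

A negative ΔAIC smaller than 2 in absolute value, as for the hyper-logistic fit, means the restricted model is preferred on parsimony while being statistically hard to distinguish from the full model.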

Table 4 shows the estimates of the hyper-logistic growth model parameters for the two shortened datasets. It appears that the estimates of the intrinsic growth parameter ω increased slightly with data availability, from $\hat{\omega} = 0.05$ (CI(ω) = [0.04, 0.07]) using the data of the first two weeks, to $\hat{\omega} = 0.07$ (CI(ω) = [0.06, 0.08]) using the data of the first three weeks, and to $\hat{\omega} = 0.09$ (CI(ω) = [0.07, 0.11]) using the whole dataset from Italy.

Table 4. Estimate, standard error (SE) and 95% confidence interval (CI95%) of the parameters of the hyper-logistic growth model fitted using the log-normal distribution to the COVID-19 daily case reporting data from Italy for the first two weeks (RMSE = 92.16, $R^2$ = 99.68%) and for the first three weeks (RMSE = 224.41, $R^2$ = 99.87%) from 2020-02-20.

| Model parameter | First two weeks data: Estimate | SE | CI95% | First three weeks data: Estimate | SE | CI95% |
|---|---|---|---|---|---|---|
| K | 260124.1 | 930.9 | [258305.9, 261955.1] | 260122.6 | 633.1 | [258884.8, 261366.3] |
| ω | 0.0518 | 0.0066 | [0.0404, 0.0665] | 0.0661 | 0.0050 | [0.0569, 0.0768] |
| ρ | 0.3401 | 0.0183 | [0.3061, 0.3779] | 0.3075 | 0.0121 | [0.2846, 0.3322] |
| τ | 55.5287 | 4.3775 | [47.5790, 64.8068] | 47.7007 | 2.0405 | [43.8645, 51.8724] |
| β | -4.7679 | 0.2379 | [-5.2341, -4.3017] | -3.4678 | 0.1013 | [-3.6663, -3.2693] |
| κ | 0.1144 | 0.0196 | [0.0761, 0.1528] | -0.0100 | 0.0058 | [-0.0214, 0.0013] |
| σ | 0.2081 | 0.0393 | [0.1437, 0.3014] | 0.2165 | 0.0334 | [0.1600, 0.2930] |

Notes: RMSE = root mean square error; β and κ define the daily removal rate from detected cases as $\alpha_t = \frac{e^{\beta + \kappa t}}{1 + e^{\beta + \kappa t}}$; σ is the log-normal distribution scale parameter (see pdf in Eq (27)).

The estimates of the peak time and size from the two shortened datasets are shown in Table 5. The forecast of the peak time from the data of the first two weeks was day 44 ($\hat{t}_p = 43.38$, CI($t_p$) = [39.04, 48.22] days), which overestimated the observed peak time (day 29). The estimate from the data of the first three weeks reduced the delay, with $\hat{t}_p = 38.97$ (CI($t_p$) = [36.80, 41.27]) days. The forecast of the peak size from the data of the first two weeks was 3 794 ($\hat{\dot{C}}_p = 3793.60$, CI($\dot{C}_p$) = [3 032.63, 4 745.53]) new positive cases (Table 5), which underestimated the observed peak (6 248 new positive cases). The forecast from the data of the first three weeks also underestimated the peak but is less biased, with $\hat{\dot{C}}_p = 4733.35$ (CI($\dot{C}_p$) = [4 136.58, 5 416.22]) new positive cases (Table 5).

Table 5. Estimate, standard error (SE) and 95% confidence interval (CI95%) of peak statistics for new positive cases from the hyper-logistic growth model fitted to the COVID-19 daily case reporting data from Italy for the first two weeks and for the first three weeks from 2020-02-20.

| Peak statistic | Data of the first two weeks: Estimate | SE | CI95% | Data of the first three weeks: Estimate | SE | CI95% |
|---|---|---|---|---|---|---|
| Time (day) | 43.38 | 2.34 | [39.04, 48.22] | 38.97 | 1.14 | [36.80, 41.27] |
| New positive cases | 3793.60 | 433.34 | [3032.63, 4745.53] | 4733.35 | 325.46 | [4136.58, 5416.22] |

Summary and perspectives

This work proposes the use of a flexible growth model to model case reporting data from an epidemic outbreak with containment measures including at least the isolation of individuals who tested positive. The generic growth model of [20] offers a flexible framework with the possibility to recover many special growth models such as the common exponential and the logistic growth models, the hyper-logistic, the hyper-Gompertz, the Gompertz and the Bertalanffy-Richards growth models. Since the special models are all nested within the generic model framework, the most appropriate model can be identified using information criteria such as Akaike's Information Criterion (AIC), but a likelihood ratio test [28] can also be conducted for models with different numbers of free parameters. Where additional information can be obtained on the ability to detect infective individuals, the proposed framework allows this information to be included so as to infer the dynamics of the epidemic beyond the identified (positive) cases, without resorting to mechanistic/compartmental models. Nevertheless, we considered a constant (average) detection rate whereas the detection rate obviously changes over the epidemic course with the detection effort (number of tests, tracing of contact persons).

From our application to the COVID-19 outbreak data in Italy, the hyper-logistic model is the most appropriate model for the dataset. It appears that the modelling approach can predict the dynamics of an epidemic using data from the first few days of an outbreak, at least in this example, although with some bias: the predicted peak time for the positive cases (using only the data of the first two or three weeks) overestimates the observed peak time, and the predicted peak size underestimates the observed peak size. These biases can be attributed, for instance, to the increase in the testing effort and isolation (and the subsequent decrease in the growth rate) in Italy, where only about 3 762 tests/day were performed in the first three weeks from 2020-02-20 and about 21 248 tests/day were performed in the subsequent three weeks. Our estimate of the ratio of the number of infectives to the number of active cases averaged 22.95 in the first week of the outbreak, within the range [5, 25] obtained by [7] using the SIQR model. Our proposal thus offers a valid alternative to mechanistic models, for instance, the piecewise exponential growth used by [7] within the SIQR model framework on the Italian early outbreak data.

In a very limited data situation, we suggest a further reduction of the number of model parameters to be estimated. Indeed, since the parameter τ in the growth model in Eq (2) is a constant of integration determined by the initial conditions of the epidemic, it can be expressed in terms of other parameters and the number of cases C0 detected at time t = 0 as $\tau = \frac{1}{\nu\omega\rho}\left\{1 - \left[\left(\frac{K}{C_0}\right)^{\nu} - 1\right]^{-\rho}\right\}$ for ρ ≠ 0 and $\tau = \frac{1}{\nu\omega}\log\left[\left(\frac{K}{C_0}\right)^{\nu} - 1\right]$ for ρ = 0. Consideration of a procedure where τ is not estimated as a free parameter may lead to parsimony, with inference conditional on the number of individuals tested positive at time t = 0. Inference on the effective reproduction number and the sensitivity of the epidemic dynamics to the containment measures under the generic growth model framework is considered for future work.
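The reparameterisation above can be sketched numerically as follows. The function `tau_from_initial_cases` implements the two expressions for τ given in the text. The function `generic_growth` is an assumption: it takes the closed form C(t) = K / [1 + (1 + νωρ(t − τ))^(−1/ρ)]^(1/ν) for the growth curve, with the exponential limit at ρ = 0, which should be checked against Eq (2) before use. All parameter values are hypothetical.

```python
import math


def tau_from_initial_cases(K, C0, nu, omega, rho):
    """Constant of integration tau implied by C(0) = C0 (requires 0 < C0 < K)."""
    x = (K / C0) ** nu - 1.0
    if rho == 0:
        return math.log(x) / (nu * omega)
    return (1.0 - x ** (-rho)) / (nu * omega * rho)


def generic_growth(t, K, nu, omega, rho, tau):
    """ASSUMED closed form of the generic growth curve (verify against Eq (2))."""
    if rho == 0:
        inner = math.exp(-nu * omega * (t - tau))
    else:
        inner = (1.0 + nu * omega * rho * (t - tau)) ** (-1.0 / rho)
    return K / (1.0 + inner) ** (1.0 / nu)


if __name__ == "__main__":
    # Hypothetical parameter values, for illustration only.
    K, C0, nu, omega = 100000.0, 3.0, 0.5, 0.2
    for rho in (0.1, 0.0):
        tau = tau_from_initial_cases(K, C0, nu, omega, rho)
        c_start = generic_growth(0.0, K, nu, omega, rho, tau)
        print(f"rho={rho}: tau={tau:.4f}, C(0)={c_start:.6f}")
```

Under the assumed curve, plugging the computed τ back in returns C(0) = C0, so τ no longer needs to be estimated as a free parameter once C0 is treated as known.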

Acknowledgments

The authors are grateful to an anonymous reviewer for drawing their attention to fractal-wavelet-based models for the COVID-19 pandemic. RGK acknowledges the support from the African German Network of Excellence in Sciences (AGNES).

Data Availability

All relevant data are within the manuscript and its Supporting Information files.

Funding Statement

The authors received no specific funding for this work.

References

  • 1. Giordano G, Blanchini F, Bruno R, Colaneri P, Di Filippo A, Di Matteo A, et al. Modelling the COVID-19 epidemic and implementation of population-wide interventions in Italy. Nature Medicine. 2020; p. 1–6.
  • 2. Velavan TP, Meyer CG. The COVID-19 epidemic. Tropical medicine & international health. 2020;25(3):278. doi: 10.1111/tmi.13383
  • 3. WHO. Coronavirus disease 2019 (COVID-19): situation report, 208; 2020.
  • 4. Anastassopoulou C, Russo L, Tsakris A, Siettos C. Data-based analysis, modelling and forecasting of the COVID-19 outbreak. PloS one. 2020;15(3):e0230405. doi: 10.1371/journal.pone.0230405
  • 5. Casella F. Can the COVID-19 epidemic be controlled on the basis of daily test reports? IEEE Control Systems Letters. 2020;5(3):1079–1084. doi: 10.1109/LCSYS.2020.3009912
  • 6. Kucharski AJ, Russell TW, Diamond C, Liu Y, Edmunds J, Funk S, et al. Early dynamics of transmission and control of COVID-19: a mathematical modelling study. The lancet infectious diseases. 2020.
  • 7. Pedersen MG, Meneghini M. Quantifying undetected COVID-19 cases and effects of containment measures in Italy. ResearchGate Preprint (online 21 March 2020). 2020;10.
  • 8. Chowell G. Fitting dynamic models to epidemic outbreaks with quantified uncertainty: A primer for parameter uncertainty, identifiability, and forecasts. Infectious Disease Modelling. 2017;2(3):379–398. doi: 10.1016/j.idm.2017.08.001
  • 9. Chowell G, Sattenspiel L, Bansal S, Viboud C. Mathematical models to characterize early epidemic growth: A review. Physics of life reviews. 2016;18:66–97. doi: 10.1016/j.plrev.2016.07.005
  • 10. Golinski A, Spencer PD. Modeling the Covid-19 Epidemic using Time Series Econometrics. medRxiv. 2020. doi: 10.1101/2020.06.01.20118612
  • 11. Agosto A, Giudici P. A Poisson autoregressive model to understand COVID-19 contagion dynamics. SSRN ePrint. 2020.
  • 12. Guariglia E. Primality, fractality, and image analysis. Entropy. 2019;21(3):1–12. doi: 10.3390/e21030304
  • 13. Guariglia E. Entropy and fractal antennas. Entropy. 2016;18(3):1–17. doi: 10.3390/e18030084
  • 14. Păcurar CM, Necula BR. An analysis of COVID-19 spread based on fractal interpolation and fractal dimension. Chaos, Solitons & Fractals. 2020;139:1–23.
  • 15. Materassi M. Some fractal thoughts about the COVID-19 infection outbreak. Chaos, Solitons & Fractals: X. 2019;4(7):1696–1711.
  • 16. Kosmidis K, Macheras P. A fractal kinetics SI model can explain the dynamics of COVID-19 epidemics. PLOS ONE. 2020;15(8):1–9. doi: 10.1371/journal.pone.0237304
  • 17. Bianca C, Pennisi M, Motta S, Ragusa MA. Immune system network and cancer vaccine. In: AIP Conference Proceedings. 1. American Institute of Physics; 2011. p. 945–948.
  • 18. Bianca C, Pappalardo F, Pennisi M, Ragusa M. Persistence analysis in a Kolmogorov-type model for cancer-immune system competition. In: AIP Conference Proceedings. 1. American Institute of Physics; 2013. p. 1797–1800.
  • 19. Chowell G, Viboud C. Is it growing exponentially fast?–impact of assuming exponential growth for characterizing and forecasting epidemics with initial near-exponential growth dynamics. Infectious disease modelling. 2016;1(1):71–78. doi: 10.1016/j.idm.2016.07.004
  • 20. Turner ME, Bradley EL, Kirk KA, Pruitt KM. A theory of growth. Mathematical Biosciences. 1976;29(3):367–373. doi: 10.1016/0025-5564(76)90112-7
  • 21. Chowell G, Luo R, Sun K, Roosa K, Tariq A, Viboud C. Real-time forecasting of epidemic trajectories using computational dynamic ensembles. Epidemics. 2020;30:100379. doi: 10.1016/j.epidem.2019.100379
  • 22. Hethcote H, Zhien M, Shengbing L. Effects of quarantine in six endemic models for infectious diseases. Mathematical biosciences. 2002;180(1-2):141–160. doi: 10.1016/S0025-5564(02)00111-6
  • 23. R Core Team. R: A Language and Environment for Statistical Computing; 2019. Available from: https://www.R-project.org/.
  • 24. MATLAB. version 9.0.0 (R2016a). Natick, Massachusetts: The MathWorks Inc.; 2016.
  • 25. Chowell G, Nishiura H, Bettencourt LM. Comparative estimation of the reproduction number for pandemic influenza from daily case notification data. Journal of the Royal Society Interface. 2007;4(12):155–166. doi: 10.1098/rsif.2006.0161
  • 26. Viboud C, Simonsen L, Chowell G. A generalized-growth model to characterize the early ascending phase of infectious disease outbreaks. Epidemics. 2016;15:27–37. doi: 10.1016/j.epidem.2016.01.002
  • 27. Limpert E, Stahel WA, Abbt M. Log-normal Distributions across the Sciences: Keys and Clues. BioScience. 2001;51(5):341. doi: 10.1641/0006-3568(2001)051[0341:LNDATS]2.0.CO;2
  • 28. Wilks SS. The large-sample distribution of the likelihood ratio for testing composite hypotheses. The annals of mathematical statistics. 1938;9(1):60–62. doi: 10.1214/aoms/1177732360


Articles from PLoS ONE are provided here courtesy of PLOS