2022 Jan 11;36(9):2907–2917. doi: 10.1007/s00477-022-02170-w

A stochastic Bayesian bootstrapping model for COVID-19 data

Julia Calatayud 1, Marc Jornet 2, Jorge Mateu 1
PMCID: PMC8749118  PMID: 35035283

Abstract

We provide a stochastic modeling framework for the incidence of COVID-19 in Castilla-Leon (Spain) for the period March 1, 2020 to February 12, 2021, which encompasses four waves. Each wave is appropriately described by a generalized logistic growth curve. Accordingly, the four waves are modeled through a sum of four generalized logistic growth curves. Pointwise values of the twenty input parameters are fitted by a least-squares optimization procedure. Taking into account the significant variability in the daily reported cases, the input parameters and the errors are regarded as random variables on an abstract probability space. Their probability distributions are inferred from a Bayesian bootstrap procedure. This framework is shown to offer a more accurate estimation of the COVID-19 reported cases than the deterministic formulation.

Keywords: Bayesian bootstrap, COVID-19 reported infections and waves, Deterministic and stochastic modeling, Least-squares fitting, Multiple generalized logistic growth curves, Random parameters and errors

Introduction

COVID-19 is an infectious disease caused by the coronavirus SARS-CoV-2. It was detected for the first time in Wuhan, China, in December 2019, and quickly spread around the globe, becoming an ongoing pandemic. The virus is rapidly transmitted between persons through small droplets and aerosols. The most common symptoms of the disease are fever, dry cough and fatigue. Data to date suggest that 80% of infections are mild or asymptomatic, 15% are severe, and 5% are critical. Lethality strongly depends on age and comorbidities. As of July 2021, more than 190 million people have been infected and more than 4 million people have died. To contain the spread of the virus and alleviate the pressure on health systems, governments imposed several restrictions such as city or country lockdowns, quarantines, social distancing and hygiene measures, curfews, mandatory masks, etc. (World Health Organization 2021; Wu et al. 2020; Kraemer et al. 2020; Wu et al. 2020).

The use of mathematical models is an effective tool to describe and predict the evolution of epidemics and to propose targeted measures (Chitnis et al. 2010; Remuzzi and Remuzzi 2020). Due to the fast transmissibility of SARS-CoV-2 and the containment measures frequently implemented by governments, the modeling of its spread is a difficult problem. For example, as already noticed by other researchers (Acedo et al. 2016; Muñoz-Fernández et al. 2021; Moein et al. 2021), the usual autonomous SIR (susceptible-infected-recovered) model cannot capture the quick variations of COVID-19 reported infections.

When aggregated time-series data are present, the logistic differential equation model may be useful to fit the measurements on COVID-19 infections (Wang et al. 2020). The curve is characterized by an increasing growth in the beginning period, combined with a decreasing growth at a later stage. Generalizations of the logistic growth curve, to allow for more general sigmoid shapes, have been employed to model COVID-19 (Pelinovsky et al. 2020; Wu et al. 2020; Aviv-Sharon and Aharoni 2020; Lee et al. 2020). Those approaches seem to be adequate when fitting a single wave of the COVID-19 epidemic. For multiple phases of growth, as occurs with COVID-19 infection waves, a single S-shaped trajectory is not applicable. In this line, ideas from (Meyer 1994) on the bi-logistic model have been used to fit cumulative cases of COVID-19 along two different outbreaks or waves (Zhang et al. 2020; Salpasaranis and Stylianakis 2020; da Silva et al. 2020). These types of models, usually called phenomenological or statistical, are often useful to reproduce and forecast the course of an epidemic (Chowell et al. 2016; Pell et al. 2018), when the insight is limited, treatments and interventions are rapidly changing, data vary abruptly, and mechanistic models (compartmental models with laws of transmission) present difficulties.

The incidence of an epidemic and its modeling have intrinsic uncertainties that are irreducible (random uncertainty). Thus, to better mimic reality, models should incorporate stochastic components (Smith 2013). For example, reference (Lee et al. 2020) also included stochastic effects through a Bayesian formalization. Bayesian inference has been suggested for other models to accommodate COVID-19 data, such as the Gompertz curve (Berihuete et al. 2021). However, the use of the Bayesian bootstrap (Rubin 1981) to infer input uncertainties does not seem to have been investigated so far.

In this paper, we investigate the use of four combined generalized logistic differential equations to model the four waves of the COVID-19 epidemic in Castilla-Leon (a Spanish autonomous region with 2.5 million inhabitants) at once. The temporal period runs from March 1, 2020 to February 12, 2021. The calibration of model parameters is conducted by a least-squares minimization procedure. Due to data measurement errors, some form of stochasticity is introduced into the model. The input parameters and the errors are considered as random variables, whose probability distributions are inferred from a Bayesian bootstrap technique.

The plan of the paper is the following. Section 2 provides the methodology, including a brief description of the data, and the deterministic and stochastic approaches. Section 3 presents the results (numerical calculations, fittings and plots). A discussion comes in Sect. 4. The paper ends with some final conclusions in Sect. 5.

Methods

Data

We have data on the number of new daily COVID-19 reported infections in the Spanish autonomous region of Castilla-Leon. This region is the largest community in Spain by area; it is located in the northwest of the country and has a population of around 2.5 million. The data correspond to a period of almost one year, from March 1, 2020 to February 12, 2021, which encompasses four waves of the epidemic. The cases have been retrieved from the open data portal of Castilla-Leon. This dataset only captures a small fraction of the true burden, due to asymptomatic cases, lack of resources and omission of suspected but not confirmed cases. In this paper, we treat Castilla-Leon as a whole where people interact homogeneously.

In Fig. 1, the number of daily new reported infections, for 349 consecutive days, is depicted in the left panel, while the accumulated number of daily reported infections is shown in the right panel. We note that there are four clear waves of the epidemic. The first wave corresponds to the first entrance of the virus in Spain and ended in summer 2020 due to the severe lockdown imposed by authorities. The second wave started after summer 2020 due to relaxation of measures and overlapped with a larger third wave along autumn. Finally, the fourth wave began after the end-of-year vacations and ended in February 2021 due to some restrictions and the vaccination program. A significant variability in the daily data, with abruptly increasing and decreasing magnitudes, is observed between nearby days, which points to a stochastic component in the data. This may be due to highly variable factors, such as the quantity of tests available and performed, symptomatology, etc. The implementations and computations are performed with Mathematica®, version 12.0, and are included as supplementary material, where the data are available (variable vtotal).

Fig. 1. Left panel: number of new daily reported infections. Right panel: accumulated daily number of reported infections

Multiple generalized logistic growth curves

In this subsection, the reported infections of the COVID-19 disease presented in Fig. 1 are modeled. Each wave of the epidemic—the accumulated version from the right panel of Fig. 1—is described by means of a generalized logistic growth curve. The four waves are then modeled by juxtaposing four generalized logistic growth curves. The input parameters of the complete model are calibrated by certain optimization procedures. This framework provides a smooth curve that fits the data from both panels of Fig. 1. Note that in this subsection, stochastic effects are not taken into account yet.

A generalized logistic differential equation

The Malthusian model, proposed by T.R. Malthus in 1798 in an essay (Malthus 1999), describes an exponential growth in a population through the ordinary differential equation

$$y'(t)=a\,y(t),$$

where $a$ is a positive parameter defined as the intrinsic growth rate. Given an initial condition $y(t_0)=y_0$, the solution of the above equation is given by

$$y(t)=y_0\,e^{a(t-t_0)}.$$

A modern formulation of the Malthusian growth model can be read at any introductory text (Murray 2002). In the field of population ecology, it is considered as the first law of population dynamics (Turchin 2001).

In order to capture the decrease in the growth rate with time, P.F. Verhulst proposed in 1838 (Verhulst 1838; Kingsland 1982) the logistic model, given by

$$y'(t)=a\,y(t)\left(1-\frac{y(t)}{K}\right),$$

where $K>0$ is the carrying capacity (the limit of $y(t)$ as $t$ tends to infinity). This is a Bernoulli-type ordinary differential equation. Given an initial condition $y(t_0)=y_0$, the solution is known in closed form

$$y(t)=\frac{K}{1+\left(-1+\frac{K}{y_0}\right)e^{-a(t-t_0)}}.$$

This function was employed to forecast an Ebola epidemic (Chowell et al. 2014). The saturation effect implicitly captures public health interventions, without complex mechanistic assumptions about the transmission process.

To allow for more flexible S-shaped curves to model growth phenomena over time, the following modification of the logistic differential equation has been suggested in the literature

$$y'(t)=a\,y(t)\left(1-\left(\frac{y(t)}{K}\right)^{b}\right). \tag{1}$$

Originally, (1) was used for the analysis of tumor growth (Marusic et al. 1994; Spratt et al. 1996; Birch 1999; Sachs et al. 2001), though applications in epidemiology are also found. Examples of diseases include SARS (Hsieh et al. 2004; Hsieh 2009), dengue fever (Hsieh 2005), influenza H1N1 (Hsieh 2010), Zika (Chowell et al. 2016), Ebola (Pell et al. 2018), and COVID-19 (Pelinovsky et al. 2020; Wu et al. 2020; Aviv-Sharon and Aharoni 2020; Lee et al. 2020). Here $b>0$ is a power that controls the asymmetry of the curve and how fast the limiting number $K$ is approached. It endows the model with higher flexibility. When $b=1$, the classical logistic differential equation is obtained, and when $b$ tends to 0, the Gompertz equation is recovered. This generalization belongs to the class of Bernoulli differential equations too. If $y(t_0)=y_0$ is the initial condition, the solution takes the form

$$y(t)=\frac{K}{\left(1+\left(-1+\left(\frac{K}{y_0}\right)^{b}\right)e^{-ab(t-t_0)}\right)^{1/b}}. \tag{2}$$

This is called a generalized logistic growth curve (or sometimes Richards’ curve). It may be appropriate to model the aggregated cases of a single wave of the COVID-19 epidemic. For new cases (not accumulated), consecutive differences y(t)-y(t-1) are considered.
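As a minimal sketch, the closed form (2) and the consecutive differences can be written in a few lines of Python. The function names and the parameter values below are illustrative assumptions (they are not fitted to any data), and the authors' own implementation is in Mathematica.

```python
import math

def richards(t, a, b, K, y0, t0):
    """Generalized logistic growth curve, closed form (2)."""
    c = (K / y0) ** b - 1.0
    return K / (1.0 + c * math.exp(-a * b * (t - t0))) ** (1.0 / b)

def daily_new(t, a, b, K, y0, t0):
    """New cases at time t as the consecutive difference y(t) - y(t-1)."""
    return richards(t, a, b, K, y0, t0) - richards(t - 1, a, b, K, y0, t0)

# Made-up parameter values, chosen only to show the S shape.
params = dict(a=0.3, b=0.5, K=0.02, y0=1e-4, t0=0.0)
curve = [richards(t, **params) for t in range(0, 101)]
```

The curve starts at `y0` when `t = t0` and approaches `K` from below; setting `b = 1` gives back the classical logistic solution.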

Combination of growth curves

For multiple phases of growth, a single sigmoid curve is not appropriate to describe such data. Thus, we propose a combination of generalized logistic growth curves of the form (2). This is an extension of the work initiated by P.S. Meyer for the bi-logistic model (Meyer 1994), with subsequent applications in sociology (Fenner et al. 2013), agriculture (Shehu 2015) or epidemiology (Zhang et al. 2020; Salpasaranis and Stylianakis 2020; da Silva et al. 2020; Lavrova et al. 2017), for instance.

Mathematically, a combination of generalized logistic growth curves takes the following form

$$y(t)=\sum_{i}\frac{K_i}{\left(1+\left(-1+\left(\frac{K_i}{y_{0,i}}\right)^{b_i}\right)e^{-a_i b_i (t-t_{0,i})}\right)^{1/b_i}}. \tag{3}$$

This sum of trajectories, with four terms $i=1,2,3,4$, allows modeling the accumulated cases of the four concatenated COVID-19 waves. In practice, one should try fitting several concatenated models and compare their goodness-of-fit. In our case, we tried three terms, but the fit was not good. For new cases (not accumulated), consecutive differences $y(t)-y(t-1)$ are considered. It is important to note that the four generalized logistic growth curves are not independent (the four waves are not treated independently). Finally, notice that $t_{0,i}$ captures the beginning of the $i$-th wave. Each $K_i$ measures the highest infection level of the $i$-th wave.
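The sum (3) can be sketched as follows. This is an illustration with two hypothetical waves and made-up parameter values; the dictionary layout and function names are assumptions of this sketch, not part of the original implementation.

```python
import math

def richards(t, a, b, K, y0, t0):
    """One generalized logistic growth curve of the form (2)."""
    c = (K / y0) ** b - 1.0
    return K / (1.0 + c * math.exp(-a * b * (t - t0))) ** (1.0 / b)

def multi_wave(t, waves):
    """Sum (3): `waves` is a list of dicts with keys a, b, K, y0, t0."""
    return sum(richards(t, **w) for w in waves)

# Two hypothetical waves (values are not fitted to any data).
waves = [
    dict(a=0.3, b=0.5, K=0.02, y0=1e-4, t0=0.0),
    dict(a=0.2, b=1.0, K=0.03, y0=1e-4, t0=60.0),
]
cumulative = [multi_wave(t, waves) for t in range(0, 151)]
# Daily new cases as consecutive differences of the cumulative curve.
new_cases = [cumulative[t] - cumulative[t - 1] for t in range(1, 151)]
```

Because every term is evaluated at every time, each wave contributes to the whole trajectory, which is why the waves are not treated independently in the fit.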

A deterministic fit

Let $d_l$ be the number of new reported cases at time $l\in\{1,\ldots,349\}$. Let $I_l$ be the number of accumulated reported cases at time $l\in\{1,\ldots,349\}$, scaled by the total population in Castilla-Leon ($N_0=2.408\times 10^{6}$ inhabitants). The following simple relations hold

$$d_l=(I_l-I_{l-1})\times N_0,\qquad I_l\times N_0=\sum_{j=1}^{l}d_j.$$
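For illustration, the two relations can be checked on a short series. The daily counts below are made up, and the convention $I_0=0$ for the first day is an assumption of this sketch.

```python
from itertools import accumulate

N0 = 2.408e6  # total population, as given in the text

d = [3, 5, 8, 13, 21]  # made-up daily new reported cases

# Scaled accumulated cases: I_l * N0 = d_1 + ... + d_l
I = [total / N0 for total in accumulate(d)]

# Recover the daily counts: d_l = (I_l - I_{l-1}) * N0, taking I_0 = 0
d_back = [I[0] * N0] + [(I[l] - I[l - 1]) * N0 for l in range(1, len(I))]
```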

A combination of generalized logistic growth curves as in (3) is used to model $\{I_l\}_{l=1}^{349}$. The parameters of (3) are estimated at once by a deterministic least-squares procedure (Stanescu et al. 2009),

$$\min_{\text{parameters}}\sum_{l=1}^{349}\left(I_l-y(l)\right)^2. \tag{4}$$

In most of the cases, the model curve will not go through all the data, so this minimum value will not be zero. The minimum gives a measure of how good the fit is.
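A minimal sketch of the objective in (4) is given below, evaluated on synthetic data generated from a single hypothetical wave. Only the objective is shown; the minimization itself (performed by the authors with Mathematica) would require a numerical optimizer, which is left out here. All names and values are illustrative assumptions.

```python
import math

def richards(t, a, b, K, y0, t0):
    c = (K / y0) ** b - 1.0
    return K / (1.0 + c * math.exp(-a * b * (t - t0))) ** (1.0 / b)

def multi_wave(t, waves):
    return sum(richards(t, **w) for w in waves)

def sse(waves, I):
    """Objective (4): sum over l = 1..len(I) of (I_l - y(l))^2."""
    return sum((I_l - multi_wave(l, waves)) ** 2
               for l, I_l in enumerate(I, start=1))

# Synthetic, noise-free data from one made-up wave.
true_waves = [dict(a=0.3, b=0.5, K=0.02, y0=1e-4, t0=0.0)]
I = [multi_wave(l, true_waves) for l in range(1, 61)]

# A perturbed parameter set gives a strictly larger objective value.
perturbed = [dict(true_waves[0], K=0.025)]
```

With real, noisy data the minimum of `sse` is positive, and its size is the goodness-of-fit measure mentioned above.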

Parametric randomization of the model

We now proceed with modeling the daily random variability of the data through a probabilistic setting. Following a Bayesian formalism, the parameters and the errors will be regarded as random variables on an abstract probability space. Their probability distributions will then be inferred by means of Bayesian bootstrapping. In this context, the output of the model is a stochastic process, which will render a stochastic fit of the data, rather than an averaged estimation.

Randomization

Part of the variability of the data is not captured by the deterministic model. We assume the errors are random variables naturally defined in a probabilistic framework. We also consider the lack of knowledge on the input parameters, which prescribe the constitutive laws of the system, represented within a probabilistic framework (Smith 2013, chapter 1), (Le Maître and Knio 2010, chapter 1). Thus, we consider in this new setup the parameters and the errors of the model as random variables.

The field of uncertainty quantification studies the impact of random uncertainties on models (Smith 2013). This quantification is necessary to evaluate the discrepancies between the model predictions and the current system behavior. Inverse uncertainty quantification deals with inference of the probability distributions of the parameters from the data. These probability distributions are not, in general, independent. Forward uncertainty quantification extracts the main statistical content of the model output, once the probability distributions of the parameters are fixed (Le Maître and Knio 2010; Xiu 2010). The various stages of the probabilistic modeling process are schematically illustrated in Fig. 2. Inverse uncertainty quantification is not an easy task. Here we rely on the Bayesian bootstrap technique. Forward uncertainty quantification will be conducted via Monte Carlo simulation.

Fig. 2. Schematic illustration of the various stages of the probabilistic modeling process

Bayesian bootstrap

Given a mathematical model, the model error coming from a deterministic least-squares optimization technique may be regarded as a random variable $X$. The error varies with time, and thus the errors at different instants of time may be seen as copies of $X$. In this context, the errors of the model are independent and identically distributed random variables $X_1,\ldots,X_m$, where $m$ is the length of the data. Actually, these random variables are unknown; only different realizations $x_1,\ldots,x_m$ are available from the data and the deterministic fit. The bootstrap methodology assumes that the observed residuals $x_1,\ldots,x_m$ are all possible distinct values of $X$, based on the principle that all observed variables are discrete.

The Bayesian bootstrap, developed by D.B. Rubin (Rubin 1981), infers the distribution of $X$ by resampling $x_1,\ldots,x_m$ with Dirichlet weights of coefficients $1,\ldots,1$. This procedure corresponds to the following hierarchical Bayesian statistical model

$$(X_1,\ldots,X_m)\,|\,(p_1,\ldots,p_m)\ \overset{m\ \text{times}}{\sim}\ \mathrm{Cat}(m,(p_1,\ldots,p_m)),$$

with

$$(p_1,\ldots,p_m)\sim\mathrm{Dir}(m,(0,\ldots,0)).$$

Bayes’ theorem yields the posterior distribution

$$(p_1,\ldots,p_m)\,|\,(x_1,\ldots,x_m)\sim\mathrm{Dir}(m,(1,\ldots,1)),$$

because, for $p_1+\cdots+p_m=1$,

$$\pi(p_1,\ldots,p_m\,|\,x_1,\ldots,x_m)\propto\pi(x_1,\ldots,x_m\,|\,p_1,\ldots,p_m)\times\pi(p_1,\ldots,p_m)\propto\prod_{k=1}^{m}p_k\times\prod_{k=1}^{m}p_k^{-1}=1\propto\mathrm{Dir}(m,(1,\ldots,1)).$$

That is, the posterior proportions are uniformly distributed on the simplex. Here $\mathrm{Cat}$ is the Categorical distribution on $\{x_1,\ldots,x_m\}$ and $\mathrm{Dir}$ stands for the Dirichlet distribution, which is a conjugate prior. Parameters $(0,\ldots,0)$ and $(1,\ldots,1)$ come from the frequencies in $x_1,\ldots,x_m$ before and after observing them, respectively. Note that $\mathrm{Dir}(m,(0,\ldots,0))$ is an improper prior.

The $\mathrm{Dir}(m,(1,\ldots,1))$ distribution can be sampled as follows. Draw independent realizations $u_1,\ldots,u_{m-1}\sim\mathrm{Unif}(0,1)$ and sort them so that $u_1\le\cdots\le u_{m-1}$. Set $u_0=0$ and $u_m=1$. Define $g_k=u_k-u_{k-1}$, $k=1,\ldots,m$. Then the vector $(g_1,\ldots,g_m)$ is one realization of $\mathrm{Dir}(m,(1,\ldots,1))$: the gaps are nonnegative and sum to 1.
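The sampling recipe above can be sketched in Python with the standard library only. The function name `dirichlet_ones` and the optional `rng` argument are assumptions of this sketch.

```python
import random

def dirichlet_ones(m, rng=random):
    """One draw from Dir(m, (1,...,1)) via gaps between sorted uniforms."""
    u = sorted(rng.random() for _ in range(m - 1))
    cuts = [0.0] + u + [1.0]          # u_0 = 0 and u_m = 1
    return [cuts[k] - cuts[k - 1] for k in range(1, m + 1)]

rng = random.Random(0)                # seeded generator for reproducibility
g = dirichlet_ones(10, rng)           # ten nonnegative weights summing to 1
```

Each call returns one weight vector; repeated calls give independent draws from the posterior of the proportions.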

For each resampling of $(x_1,\ldots,x_m)$ with Dirichlet weights of coefficients $(1,\ldots,1)$, the input parameters of the model are determined by a least-squares fitting. Formally, it is assumed that

$$\text{parameters}=\Lambda(X_1,\ldots,X_m)\ \text{almost surely},\qquad \Lambda=\text{least-squares fitting operator}.$$

This gives rise to samples of the input parameters, so their (posterior) probability distributions may be inferred as

$$\text{parameters}\,|\,(x_1,\ldots,x_m)\ \sim\ \text{parameters}\,|\,(X_1,\ldots,X_m)\times(X_1,\ldots,X_m)\,|\,(p_1,\ldots,p_m)\times(p_1,\ldots,p_m)\,|\,(x_1,\ldots,x_m).$$

This allows solving the problem of inverse uncertainty quantification.

Monte Carlo simulation

Monte Carlo simulation is a popular method for forward uncertainty quantification. It is simple to implement and robust. It uses a collection $\{\gamma_1,\ldots,\gamma_M\}$ of independent random realizations of the model response, usually obtained from deterministic numerical techniques. The statistics of the model response are derived from the statistics of that sample. For example, the mean of a function $h$ of the model output is estimated by $\frac{1}{M}\sum_{k=1}^{M}h(\gamma_k)$, by the law of large numbers. Monte Carlo simulation essentially amounts to conducting $M$ deterministic resolutions, where $M$ is generally large. The robustness of the method is due to its independence of the random dimensionality, the variable $t$, or regularity issues. Thus, in contrast to spectral methods, its use is advantageous when there is a large number of input random parameters or the variable $t$ may be large. Further, if the inverse parameter estimation method generates realizations of the parameters (such as Bayesian methods), then Monte Carlo simulation seems the logical option for forward uncertainty quantification. The convergence of the Monte Carlo estimate behaves as $M^{-1/2}$ due to the central limit theorem. It is assessed based on the estimated statistics. The reader is referred to (Smith 2013; Le Maître and Knio 2010; Xiu 2010) for similar discussions on Monte Carlo sampling.
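As an illustration, pointwise Monte Carlo statistics of a set of sample paths can be computed as below. The function name is hypothetical, and the quantile rule (a floor-index empirical quantile) is a simple choice made for this sketch; other quantile definitions give slightly different bands.

```python
import statistics

def pointwise_summary(paths, lower=0.025, upper=0.975):
    """Mean and empirical quantiles at each time index.

    `paths` is a list of M equally long lists (one per realization).
    """
    means, lo, hi = [], [], []
    for values in zip(*paths):        # all realizations at one time index
        s = sorted(values)
        means.append(statistics.fmean(s))
        lo.append(s[int(lower * (len(s) - 1))])
        hi.append(s[int(upper * (len(s) - 1))])
    return means, lo, hi

# Toy ensemble of M = 100 paths with two time points each.
paths = [[k, 2 * k] for k in range(1, 101)]
means, lo, hi = pointwise_summary(paths)
```

The interval between `lo` and `hi` at each index is a region of probability 0.95 under the empirical distribution of the sample.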

A stochastic fit

The aim here is to randomize the deterministic model. The input parameters and the model errors are assumed to be random variables. Note that to apply the Bayesian bootstrap methodology, one needs errors that are independent and identically distributed. However, the fit for the accumulated infections does not yield independent residuals. Indeed, two consecutive residuals are correlated, because of the increasing character of the curve. In addition, the fit for the daily new infections does not yield identically distributed errors, and a resampling of those residuals may give rise to negative data points, which does not make sense. To fix these issues and achieve, as far as possible, an independent and identically distributed sample, the (natural) logarithms of the daily new infections will be considered. While the parameters of model (3) for cumulative infections are calibrated by least-squares fitting, the resampling is performed for the residuals obtained for the logarithms of the daily new infections. For each resampling of these residuals, we come back to cumulative infections and refit model (3). In this way, realizations of the model parameters are obtained for each refit.

The following steps summarize the procedure for estimating the probability distributions of the parameters:

  • Step 1

    Determine the residuals for the logarithms of the daily new infections: $x_l=\log(d_l)-\log((y(l)-y(l-1))\times N_0)$. The parameters of $y$ were previously determined by the deterministic least-squares fitting (4).

  • Step 2

    Start a FOR loop to generate $M$ bootstrap samples.

  • Step 3

    Resample the residuals with Dirichlet weights of coefficients $(1,\ldots,1)$. Keep the resampling as a vector $x_{\text{iteration}}$ with components $x_{\text{iteration}}(l)$.

  • Step 4

    Define $z_l=\log(d_l)+x_{\text{iteration}}(l)$. These are the new generated data for the logarithms of the daily new infections.

  • Step 5

    Let $\tilde{I}_l=\left(\sum_{j=1}^{l}e^{z_j}\right)/N_0$. These are the new generated data for the accumulated infections.

  • Step 6

    Fit the deterministic model (3) to $\tilde{I}_l$ by least-squares fitting. Keep the parameters as a vector $\lambda_{\text{iteration}}$.

  • Step 7

    End the FOR loop.

  • An ensemble of realizations $(x_1,\ldots,x_M)$ for the residuals (errors) of the logarithms of the daily new infections. An ensemble of realizations $(\lambda_1,\ldots,\lambda_M)$ for the input parameters.
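The steps above can be sketched as a single Python function. This is an illustration under stated assumptions, not the authors' Mathematica code: the least-squares refit is passed in as a hypothetical callable `fit` (cumulative series in, parameter vector out), `fitted_daily` holds the model's daily new cases $(y(l)-y(l-1))\times N_0$ from the deterministic fit, and all daily counts are assumed positive so that logarithms exist.

```python
import math
import random

def bayesian_bootstrap_fits(d, fitted_daily, fit, N0, M, rng=random):
    """Return M resampled residual vectors and M refitted parameter vectors."""
    m = len(d)
    # Step 1: residuals of the logarithms of the daily new infections.
    x = [math.log(dl) - math.log(fl) for dl, fl in zip(d, fitted_daily)]
    residual_draws, param_draws = [], []
    for _ in range(M):                                   # Steps 2 and 7
        # Step 3: Dir(m, (1,...,1)) weights from uniform spacings, then resample.
        cuts = [0.0] + sorted(rng.random() for _ in range(m - 1)) + [1.0]
        g = [cuts[k] - cuts[k - 1] for k in range(1, m + 1)]
        x_iter = rng.choices(x, weights=g, k=m)
        # Step 4: new data for the logarithms of the daily new infections.
        z = [math.log(dl) + xl for dl, xl in zip(d, x_iter)]
        # Step 5: new data for the accumulated infections, scaled by N0.
        total, I_tilde = 0.0, []
        for zl in z:
            total += math.exp(zl)
            I_tilde.append(total / N0)
        # Step 6: refit the deterministic model to the generated data.
        param_draws.append(fit(I_tilde))
        residual_draws.append(x_iter)
    return residual_draws, param_draws
```

A usage example would pass a real least-squares routine as `fit`; for a quick check, any function of the cumulative series (for instance `lambda I: I[-1]`) can stand in for it.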

Once $(x_1,\ldots,x_M)$ and $(\lambda_1,\ldots,\lambda_M)$ are available, the following steps outline the procedure to calculate the realizations of the model output:

  • Step 1

    Start a FOR loop over $k=1,\ldots,M$.

  • Step 2

    With parameters $\lambda_k$, evaluate $\alpha_k(l)=\log((y(l)-y(l-1))\times N_0)$. This gives a vector $\alpha_k$, a discrete sample path of the model for the logarithms of the daily new infections.

  • Step 3

    Incorporate the error: $\beta_k=\alpha_k+x_k$. This is a discrete sample path of the model for the logarithms of the daily new infections, taking into account the random errors.

  • Step 4

    Let $\gamma_k(l)=\left(\sum_{j=1}^{l}e^{\beta_k(j)}\right)/N_0$. Here $\gamma_k$ is a discrete sample path for the accumulated infections.

  • Step 5

    End the FOR loop.

  • The $M$ discrete sample paths $(\gamma_1,\ldots,\gamma_M)$ of the model output for the accumulated infections. From them, statistics such as the mean, the variance, quantiles, etc. may be determined by means of Monte Carlo simulation.
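The second procedure can be sketched in the same spirit. Here `model_daily` is a hypothetical callable that returns the modeled daily new cases $(y(l)-y(l-1))\times N_0$ for a given parameter vector and day; it and the other names are assumptions of this sketch.

```python
import math

def sample_paths(model_daily, residual_draws, param_draws, N0):
    """Build one cumulative sample path gamma_k per (x_k, lambda_k) pair."""
    paths = []
    for x_k, lam_k in zip(residual_draws, param_draws):   # Steps 1 and 5
        m = len(x_k)
        # Step 2: model path for the logarithms of the daily new infections.
        alpha = [math.log(model_daily(lam_k, l)) for l in range(1, m + 1)]
        # Step 3: incorporate the resampled errors.
        beta = [a + x for a, x in zip(alpha, x_k)]
        # Step 4: back to accumulated infections, scaled by N0.
        total, gamma = 0.0, []
        for b in beta:
            total += math.exp(b)
            gamma.append(total / N0)
        paths.append(gamma)
    return paths
```

The returned list of paths is the ensemble from which pointwise means and probability regions are then estimated.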

Results

Deterministic fit

We have used the built-in function FindFit in Mathematica® to obtain the optimal parameters in (4), which are (up to four significant digits) as follows:

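$$
\begin{aligned}
&y_{0,1}=2.9\times 10^{-6}, && b_1=0.1675, && a_1=0.7396, && K_1=0.01982, && t_{0,1}=0,\\
&y_{0,2}=0.008758, && b_2=0.8369, && a_2=0.1142, && K_2=0.008834, && t_{0,2}=113.7,\\
&y_{0,3}=0.01955, && b_3=4.595, && a_3=0.02152, && K_3=0.05543, && t_{0,3}=212.1,\\
&y_{0,4}=0.001298, && b_4=0.8733, && a_4=0.1505, && K_4=0.03213, && t_{0,4}=303.6.
\end{aligned}
$$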

The fitted model is depicted in Fig. 3. In the left panel, the aggregated reported infections in percentages, $y(l)\times 100$, are shown, while the right panel shows the estimated new daily infections, $(y(l)-y(l-1))\times N_0$. The combination of generalized logistic growth curves allows for a good fit of the COVID-19 incidence, at least from an averaged (smoothed) point of view. In the following subsection, stochasticity will be taken into account to deal with the daily random variability of the measurements. In Fig. 4, the predictability of the model is quantified. We use two training sets, up to $t=200$ and up to $t=300$, and predict the epidemic size a few days later. The forecast is reasonably good for two weeks. It seems to be better at $t=200$ than at $t=300$, possibly due to the fact that $t=200$ corresponds to the peak of the second wave, while $t=300$ corresponds to the very early growth phase of the fourth wave. Predictions seem to be better when a larger dataset of the wave is available.
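As a rough check, the fitted curve can be evaluated directly from the parameter values reported above. The sketch below uses those rounded values and the population figure from the text; the variable names and the choice of evaluating at integer days 0 to 349 are assumptions, and small differences from the published figures are to be expected because of rounding.

```python
import math

N0 = 2.408e6  # population used in the text to scale the counts

# Fitted values as reported in the text (four significant digits).
WAVES = [
    dict(y0=2.9e-6,   b=0.1675, a=0.7396,  K=0.01982,  t0=0.0),
    dict(y0=0.008758, b=0.8369, a=0.1142,  K=0.008834, t0=113.7),
    dict(y0=0.01955,  b=4.595,  a=0.02152, K=0.05543,  t0=212.1),
    dict(y0=0.001298, b=0.8733, a=0.1505,  K=0.03213,  t0=303.6),
]

def richards(t, a, b, K, y0, t0):
    c = (K / y0) ** b - 1.0
    return K / (1.0 + c * math.exp(-a * b * (t - t0))) ** (1.0 / b)

def y(t):
    """Model (3) with the reported parameter values."""
    return sum(richards(t, **w) for w in WAVES)

# Quantities plotted in the two panels of the deterministic fit.
cumulative_pct = [y(l) * 100 for l in range(0, 350)]
daily_new = [(y(l) - y(l - 1)) * N0 for l in range(1, 350)]
```

The final cumulative value stays below the sum of the four $K_i$ (about 11.6%), since each wave approaches its own limit from below.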

Fig. 3. Left panel: deterministic fit for the daily accumulated reported infections in percentages. Right panel: deterministic fit for the new daily reported infections

Fig. 4. Left panel: deterministic prediction for the daily accumulated reported infections in percentages. Right panel: deterministic prediction for the new daily reported infections. The vertical dashed line indicates the end of the calibration period

Stochastic fit

Table 1 reports the estimated marginal posterior statistics (mean, median, standard deviation, and quantiles 0.025 and 0.975) of the corresponding parameters. Figure 5 plots histograms for some marginal posterior distributions. Realizations $(\lambda_1,\ldots,\lambda_M)$ are employed, for $M=1000$ bootstrap samples (larger bootstrap samples do not render significant differences at the scale of the figures). In Fig. 6, the fit of the randomized model is shown, both for the accumulated reported infections (left panel) and the new reported infections per day (right panel). We draw mean values and regions of probability 0.95 for $(\gamma_1,\ldots,\gamma_M)$. Visual inspection shows that the estimated mean values for the model response are very similar to the deterministic fit from Fig. 3. Thus, the deterministic fit is extended by incorporating probabilistic features. In the stochastic approach, the probability regions must contain the variability of the data, but in a correct way, in the sense that the realizations generated should resemble the pattern of the data. A stochastic method whose probability regions are unnecessarily wide is not good, despite containing all recorded measurements. To better appreciate the similarity between the real data and the stochastic model, compared to the deterministic counterpart, some realizations ($\gamma_1$, $\gamma_2$ and $\gamma_3$) are plotted in Fig. 7. In Fig. 8, the predictability of the model is illustrated. We use two sets of incidence data, up to $t=200$ and up to $t=300$, as in the deterministic subsection.

Table 1. Estimated marginal posterior statistics (mean, median, standard deviation, and quantiles 0.025 and 0.975) of the input random parameters

| Parameter | Mean | Median | SD | Quantile 0.025 | Quantile 0.975 |
|---|---|---|---|---|---|
| $b_1$ | 0.350 | 0.283 | 0.224 | 0.103 | 0.892 |
| $a_1$ | 0.580 | 0.513 | 0.191 | 0.361 | 0.962 |
| $K_1$ | 0.018 | 0.018 | 0.004 | 0.010 | 0.024 |
| $y_{0,2}$ | 0.011 | 0.010 | 0.003 | 0.005 | 0.018 |
| $b_2$ | 0.564 | 0.674 | 0.316 | 0.074 | 0.930 |
| $a_2$ | 0.248 | 0.137 | 0.203 | 0.104 | 0.769 |
| $K_2$ | 0.011 | 0.010 | 0.005 | 0.006 | 0.019 |
| $t_{0,2}$ | 115.1 | 114.8 | 2.839 | 111.7 | 118.4 |
| $y_{0,3}$ | 0.019 | 0.019 | 0.002 | 0.016 | 0.021 |
| $b_3$ | 4.183 | 4.490 | 0.857 | 2.146 | 4.985 |
| $a_3$ | 0.025 | 0.022 | 0.043 | 0.020 | 0.027 |
| $K_3$ | 0.055 | 0.055 | 0.004 | 0.049 | 0.060 |
| $t_{0,3}$ | 212.6 | 212.4 | 1.412 | 211.5 | 213.7 |
| $y_{0,4}$ | 0.001 | 0.001 | 0.0003 | 0.0004 | 0.002 |
| $b_4$ | 0.712 | 0.814 | 0.269 | 0.131 | 0.984 |
| $a_4$ | 0.220 | 0.168 | 0.138 | 0.122 | 0.690 |
| $K_4$ | 0.032 | 0.032 | 0.003 | 0.026 | 0.037 |
| $t_{0,4}$ | 304.4 | 304.2 | 1.138 | 302.5 | 306.8 |

Fig. 5. Histograms for some marginal posterior distributions

Fig. 6. Left panel: stochastic fit for the accumulated daily reported infections in percentages. Right panel: stochastic fit for the new daily reported infections

Fig. 7. Left panel: some realizations of the stochastic model for the accumulated daily reported infections in percentages. Right panel: some realizations of the stochastic model for the new daily reported infections

Fig. 8. Left panel: stochastic prediction for the daily accumulated reported infections in percentages. Right panel: stochastic prediction for the new daily reported infections. The vertical dashed line indicates the end of the calibration period

Discussion

The COVID-19 spread in Castilla-Leon is modeled along the four waves in a single equation, both from the deterministic and the stochastic points of view. Generalized logistic differential equations are concatenated four times to deal with the four waves at once. This procedure extends the use of the generalized logistic differential equation to more than one wave, compared to (Pelinovsky et al. 2020; Wu et al. 2020; Aviv-Sharon and Aharoni 2020; Lee et al. 2020), and to more than two waves, compared to (Zhang et al. 2020; Salpasaranis and Stylianakis 2020; da Silva et al. 2020) from the work (Meyer 1994) on the bi-logistic model. The deterministic model gives a good fit for COVID-19 reported cases, especially for accumulated cases. Daily new reported infections present significant variability, perhaps due to highly variable factors, such as the number of tests available and performed, symptomatology, etc. Thus, stochasticity is incorporated, to extend the deterministic model and to obtain realizations that resemble the irregular dynamics of the reported new cases more closely (Smith 2013). The Bayesian bootstrap (Rubin 1981), together with a trick to deal with residuals, is employed to set probability distributions for the input parameters and the errors. The potential of the Bayesian bootstrap for mathematical models with uncertainties does not seem to have been investigated. Our approach resembles the previous use of the frequentist bootstrap (Efron 1979) for mathematical models with deterministic parameters and random errors (Dogan 2007). Apart from fitting the available data, predictions have been performed by using two training sets, for a fairly long forecast period of 15 days.

Phenomenological models like the one proposed in this paper may be useful to generate reasonable near-term forecasts of the incidence of starting and advanced epidemic outbreaks, without accounting for possible underlying mechanisms of the studied phenomenon (such as temporal or spatial dependencies), which would clearly be the basis for further analysis. This idea was also discussed in (Chowell et al. 2016; Pell et al. 2018) for a single generalized logistic model.

We justify the use of the Bayesian bootstrap by commenting on the infeasibility of other techniques:

  • Maximum entropy principle. This principle has been used in the literature to infer consistent probability distributions for input parameters (Dorini and Sampaio 2012). The probability density function of the parameter is taken by maximizing the Shannon entropy functional, often restricted to a certain support, to a mean value equal to the deterministic estimate, and rarely to a variance if available. In the case studied in the present paper, reliable supports of the parameters are unknown a priori.

  • General Bayesian model. Not restricted to the Bayesian bootstrap, Markov Chain Monte Carlo algorithms may solve the problem for any set of prior distributions (Smith 2013; Lesaffre and Lawson 2012). However, the large number of input random parameters in our case study prevented us from using this option.

  • Itô-type stochastic differential equations. One could naturally ask about the incorporation of a white noise (formal derivative of Brownian motion) into the generalized logistic differential equation (1) (Mao 2007; Allen 2007). However, accumulated infections give rise to increasing sample paths, which would contradict the everywhere non-differentiability of Itô processes.

Some limitations of the present work, which define potential avenues for future research, are the following: (a) Ideally, the probabilistic interval for the model output should be narrower. It would be of interest to investigate alternative deterministic models (the averaged, smooth curve) or Bayesian approaches. (b) We needed to consider the logarithms of the daily new infections to have an adequate resampling of residuals for which increasing sample paths (accumulated infections) or negative values (daily new infections) were not a problem. It would be interesting to directly define a model error for the accumulated infections that takes into account the correlation between successive days. This correlation is due to the increasing character of the curve. (c) Spatial, behavioral, or environmental effects have been neglected. These conditions may provide a more faithful fit of the COVID-19 data, albeit at the expense of higher complexity. Nonetheless, when uncertainty is present on the phenomenon itself and the data, sometimes it may be preferable to simply consider the model as uncertain, rather than augmenting its complexity.

Conclusion

We have shown in the paper that the sum of four generalized logistic growth curves allows for a proper fit of the accumulated reported infections along the four waves of the COVID-19 epidemic in Castilla-Leon (Spain). Daily new reported infections are described by consecutive differences. The input parameters are pointwise calibrated by least-squares fitting. However, this calibration lacks a probabilistic interpretation.

Taking into account the significant variability and noise in the daily reported data, stochasticity is incorporated into the model by treating the input parameters and the model errors as random variables. This conception of uncertainty matches the Bayesian formalism of statistics. The Bayesian bootstrap is an adequate approach for inverse uncertainty quantification and infers the probability distributions of the parameters. The model response is stochastic and includes realizations that permit a more reliable fit of the daily new COVID-19 reported cases, compared to the smooth deterministic counterpart.
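The core mechanism of the Bayesian bootstrap (Rubin 1981) can be sketched on a toy statistic. The example below is a simplified illustration and not the full procedure applied to the twenty model parameters: it draws Dirichlet(1, ..., 1) weights over a small set of hypothetical residuals and uses them to approximate the posterior distribution of their mean. The data values, the number of draws, and the function names are assumptions made for the example.

```python
import random


def dirichlet_uniform_weights(n, rng):
    """Draw Dirichlet(1, ..., 1) weights by normalizing independent Exp(1) variables."""
    draws = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(draws)
    return [d / total for d in draws]


def bayesian_bootstrap_mean(data, n_draws, seed=0):
    """Posterior draws of the mean: one weighted average per set of Dirichlet weights."""
    rng = random.Random(seed)
    posterior = []
    for _ in range(n_draws):
        w = dirichlet_uniform_weights(len(data), rng)
        posterior.append(sum(wi * xi for wi, xi in zip(w, data)))
    return posterior


# Hypothetical residuals used only for illustration.
data = [0.12, -0.05, 0.30, -0.22, 0.08, 0.01]
posterior = sorted(bayesian_bootstrap_mean(data, n_draws=2000))

# Rough central 95% interval from the empirical quantiles of the draws.
lower, upper = posterior[50], posterior[1949]
print(lower, upper)
```

Unlike the classical bootstrap, which assigns integer resampling counts to the observations, each draw here reweights all observations continuously, and the spread of the resulting statistics is read as a posterior distribution.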

Funding

This paper has been partially funded by projects PID2019-107392RB-I00 from the Spanish Ministry of Science, AICO/2019/198 from Generalitat Valenciana, and PID2020-115270GB-I00 from the Spanish Agencia Estatal de Investigación (MCIN/AEI/10.13039/501100011033).

Data Availability Statement

The implementations and computations in Mathematica® are included as supplementary material, where the data are available (variable vtotal). The cases have been retrieved from the open data portal of Castilla-Leon.

Declarations

Conflict of interest

The authors declare that there is no conflict of interest regarding the publication of this article.

Footnotes

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Contributor Information

Julia Calatayud, Email: calatayj@uji.es.

Marc Jornet, Email: marc.jornet@uv.es.

Jorge Mateu, Email: mateu@uji.es.

References

  1. Acedo L, Moraño JA, Santonja FJ, Villanueva RJ. A deterministic model for highly contagious diseases: the case of varicella. Phys A. 2016;450:278–286. doi: 10.1016/j.physa.2015.12.153.
  2. Allen E. Modeling with Itô stochastic differential equations. Dordrecht: Springer Science & Business Media; 2007.
  3. Aviv-Sharon E, Aharoni A. Generalized logistic growth modeling of the COVID-19 pandemic in Asia. Inf Disease Model. 2020;5:502–509. doi: 10.1016/j.idm.2020.07.003.
  4. Berihuete A, Sánchez-Sánchez M, Suárez-Llorens A. A Bayesian model of COVID-19 cases based on the Gompertz curve. Mathematics. 2021;9(3):228. doi: 10.3390/math9030228.
  5. Birch CP. A new generalized logistic sigmoid growth equation compared with the Richards growth equation. Ann Bot. 1999;83(6):713–723. doi: 10.1006/anbo.1999.0877.
  6. Chitnis N, Schpira A, Smith D, Hay SI, Smith T, Steketee R. Mathematical modelling to support malaria control and elimination. Roll Back Malar Prog Impact Ser (World Health Organization, Progress & impact series). 2010;5:1–48.
  7. Chowell G, Hincapie-Palacio D, Ospina JF, Pell B, Tariq A, Dahal S, Moghadas SM, Smirnova A, Simonsen L, Viboud C. Using phenomenological models to characterize transmissibility and forecast patterns and final burden of Zika epidemics. PLoS Curr. 2016;8.
  8. Chowell G, Simonsen L, Viboud C, Kuang Y. Is West Africa approaching a catastrophic phase or is the Ebola epidemic slowing down? Different models yield different answers for Liberia. PLoS Curr. 2014;6.
  9. da Silva EV, da Silva Melo J, Leite MA. Modelo bi-logístico aplicado aos primeiros 1015 casos de COVID-19 em indígenas do Estado do Amapá e norte do Pará. Sci Knowl Focus. 2020;3(2):77–88.
  10. Dogan G. Bootstrapping for confidence interval estimation and hypothesis for parameters of system dynamics models. Syst Dyn Rev. 2007;23:415–436. doi: 10.1002/sdr.362.
  11. Dorini FA, Sampaio R. Some results on the random wear coefficient of the Archard model. J Appl Mech. 2012;79(5):051008–051014. doi: 10.1115/1.4006453.
  12. Efron B. Bootstrap methods: Another look at the jackknife. Ann Stat. 1979;7(1):1–26. doi: 10.1214/aos/1176344552.
  13. Fenner T, Levene M, Loizou G. A bi-logistic growth model for conference registration with an early bird deadline. Open Phys. 2013;11(7):904–909. doi: 10.2478/s11534-013-0275-4.
  14. Hsieh YH. Richards model: a simple procedure for real-time prediction of outbreak severity. In: Zhou Y, Wu J, Ma Z, editors. Modeling and dynamics of infectious diseases. World Scientific; 2009. pp. 216–236.
  15. Hsieh YH. Pandemic influenza A (H1N1) during winter influenza season in the southern hemisphere. Influenza Other Respir Viruses. 2010;4(4):187–197. doi: 10.1111/j.1750-2659.2010.00147.x.
  16. Hsieh YH, Lee JY, Chang HL. SARS epidemiology modeling. Emerg Infect Dis. 2004;10(6):1165. doi: 10.3201/eid1006.031023.
  17. Hsieh YH, Ma S. Intervention measures, turning point, and reproduction number for dengue, Singapore, 2005. Am J Trop Med Hyg. 2009;80(1):66–71. doi: 10.4269/ajtmh.2009.80.66.
  18. Kingsland S. The refractory model: the logistic curve and the history of population ecology. Q Rev Biol. 1982;57:29–52. doi: 10.1086/412574.
  19. Kraemer MU, Yang CH, Gutierrez B, Wu CH, Klein B, Pigott DM, Brownstein JS. The effect of human mobility and control measures on the COVID-19 epidemic in China. Science. 2020;368(6490):493–497. doi: 10.1126/science.abb4218.
  20. Lavrova AI, Postnikov EB, Manicheva OA, Vishnevsky BI. Bi-logistic model for disease dynamics caused by Mycobacterium tuberculosis in Russia. Roy Soc Open Sci. 2017;4(9):171033. doi: 10.1098/rsos.171033.
  21. Le Maître OP, Knio OM. Spectral methods for uncertainty quantification: with applications to computational fluid dynamics. Netherlands: Springer Science & Business Media; 2010.
  22. Lee SY, Lei B, Mallick B. Estimation of COVID-19 spread curves integrating global data and borrowing information. PLoS ONE. 2020;15(7):e0236860. doi: 10.1371/journal.pone.0236860.
  23. Lesaffre E, Lawson AB. Bayesian biostatistics. New York: Wiley, Statistics in Practice; 2012.
  24. Malthus TR. An essay on the principle of population. Oxford: Oxford World’s Classics Paperbacks, Oxford University Press; 1999.
  25. Mao X. Stochastic differential equations and applications. Elsevier; 2007.
  26. Marusic M, Bajzer Z, Vuk-Pavlovic S, Freyer JP. Tumor growth in vivo and as multicellular spheroids compared by mathematical models. Bull Math Biol. 1994;56:617–631. doi: 10.1007/BF02460714.
  27. Meyer PS. Bi-logistic growth. Technol Forecast Soc Chang. 1994;47(1):89–102. doi: 10.1016/0040-1625(94)90042-6.
  28. Moein S, Nickaeen N, Roointan A, Borhani N, Heidary Z, Javanmard SH, Ghaisari J, Gheisari Y. Inefficiency of SIR models in forecasting COVID-19 epidemic: a case study of Isfahan. Sci Rep. 2021;11(1):1–9. doi: 10.1038/s41598-021-84055-6.
  29. Muñoz-Fernández GA, Seoane JM, Seoane-Sepúlveda JB. A SIR-type model describing the successive waves of COVID-19. Chaos Soliton Fract. 2021;144:110682. doi: 10.1016/j.chaos.2021.110682.
  30. Murray JD. Mathematical biology I: an introduction. New York: Springer; 2002.
  31. Pelinovsky E, Kurkin A, Kurkina O, Kokoulina M, Epifanova A. Logistic equation and COVID-19. Chaos Soliton Fract. 2020;140:110241. doi: 10.1016/j.chaos.2020.110241.
  32. Pell B, Kuang Y, Viboud C, Chowell G. Using phenomenological models for forecasting the 2015 Ebola challenge. Epidemics. 2018;22:62–70. doi: 10.1016/j.epidem.2016.11.002.
  33. Remuzzi A, Remuzzi G. COVID-19 and Italy: What next? Lancet. 2020;395(10231):1225–1228. doi: 10.1016/S0140-6736(20)30627-9.
  34. Rubin DB. The Bayesian bootstrap. Ann Stat. 1981;9(1):130–134. doi: 10.1214/aos/1176345338.
  35. Sachs RK, Hlatky LR, Hahnfeldt P. Simple ODE models of tumor growth and anti-angiogenic or radiation treatment. Math Comput Model. 2001;33(12–13):1297–1305. doi: 10.1016/S0895-7177(00)00316-2.
  36. Salpasaranis K, Stylianakis V. Forecasting models of the coronavirus (COVID-19) cumulative confirmed cases using a hybrid genetic programming method. Eur J Eng Technol Res. 2020;5(12):52–60.
  37. Shehu V. Simple Logistic and Bi-Logistic Growth used as forecasting models of greenhouse areas in Albanian agriculture. J Multidiscip Eng Sci Technol. 2015;2(9):2648–2653.
  38. Smith RC. Uncertainty quantification: theory, implementation, and applications. SIAM; 2013.
  39. Spratt JS, Meyer JS, Spratt JA. Rates of growth of human neoplasms: Part II. J Surg Oncol. 1996;61(1):68–83. doi: 10.1002/1096-9098(199601)61:1<68::AID-JSO2930610102>3.0.CO;2-E.
  40. Stanescu D, Chen-Charpentier BM, Jensen BJ, Colberg PJS. Random coefficient differential models of growth of anaerobic photosynthetic bacteria. Electron T Numer Ana. 2009;34:44–58.
  41. Turchin P. Does population ecology have general laws? Oikos. 2001;94(1):17–26. doi: 10.1034/j.1600-0706.2001.11310.x.
  42. Verhulst PF. Notice sur la loi que la population suit dans son accroissement. Corr Math et Phys. 1838;10:113–121.
  43. Wang P, Zheng X, Li J, Zhu B. Prediction of epidemic trends in COVID-19 with logistic model and machine learning technics. Chaos Soliton Fract. 2020;139:110058. doi: 10.1016/j.chaos.2020.110058.
  44. World Health Organization (WHO). Coronavirus disease (COVID-19) pandemic. 2021. Available at https://www.who.int/emergencies/diseases/novel-coronavirus-2019. Accessed 22 July 2021.
  45. Wu YC, Chen CS, Chan YJ. The outbreak of COVID-19: an overview. J Chin Med Assoc. 2020;83(3):217. doi: 10.1097/JCMA.0000000000000270.
  46. Wu K, Darcet D, Wang Q, Sornette D. Generalized logistic growth modeling of the COVID-19 outbreak: comparing the dynamics in the 29 provinces in China and in the rest of the world. Nonlinear Dyn. 2020;101(3):1561–1581. doi: 10.1007/s11071-020-05862-6.
  47. Wu F, Zhao S, Yu B, Chen YM, Wang W, Song ZG, Hu Y, Tao ZW, Tian JH, Pei YY, Yuan ML, Zhang YL, Dai FH, Liu Y, Wang QM, Zheng JJ, Xu L, Holmes EC, Zhang YL. A new coronavirus associated with human respiratory disease in China. Nature. 2020;579(265):265–269. doi: 10.1038/s41586-020-2008-3.
  48. Xiu D. Numerical methods for stochastic computations: a spectral method approach. New York: Cambridge Texts in Applied Mathematics, Princeton University Press; 2010.
  49. Zhang L, Tao Y, Zhuang G, Fairley CK. Characteristics analysis and implications on the COVID-19 reopening of Victoria, Australia. Innovation. 2020;1(3):100049. doi: 10.1016/j.xinn.2020.100049.


Articles from Stochastic Environmental Research and Risk Assessment are provided here courtesy of Nature Publishing Group
