Abstract
In this paper, we propose a hierarchical Bayesian approach for modeling the evolution of the 7-day moving average of the number of deaths due to COVID-19 in a country, state or city. The proposed approach is based on a Gaussian process regression model. The main advantage of this model is that it treats the nonlinear function f used for modeling the observed data as an unknown random parameter, in contrast to the usual approaches, which set f as a known mathematical function. This assumption allows the development of a Bayesian approach with a Gaussian process prior over f. In order to estimate the parameters of interest, we develop an MCMC algorithm based on the Metropolis-within-Gibbs sampling algorithm. We also present a procedure for making predictions. The proposed method is illustrated in a case study in which we model the 7-day moving average of the number of deaths recorded in the state of São Paulo, Brazil. The results show that the proposed method is very effective in modeling and predicting the values of the 7-day moving average.
Keywords: COVID-19, Bayesian approach, Gaussian process, predictions, MCMC algorithm
1. Introduction
Especially in the year 2020, many articles were published describing modeling procedures for the number of cases and/or deaths due to COVID-19 in many countries. The interest in this kind of modeling lies mainly in the projections that these models may provide, which can assist government agents in making decisions regarding the intensification of social isolation, the acquisition of hospital equipment, and an increase in the number of intensive care units in hospitals, among others.
In general, the published works model the accumulated number of cases (or deaths) by using some nonlinear growth model. For example, Musa et al. [5] considered a simple exponential growth model to analyze the initial phase of the COVID-19 epidemic in Africa; Aviv-Sharon and Aharoni [1] modeled the data from the Philippines and Taiwan using the generalized logistic model; Vasconcelos et al. [8] applied the Richards growth model to the data collected in China, France, Germany, Iran, Italy, South Korea, and Spain; and Wu et al. [10] calibrated the logistic growth model, the generalized logistic growth model, and the generalized Richards model for the number of cases recorded in China, among others.
Since the growth in the cumulative number of cases and deaths by COVID-19 has, in general, evolved heterogeneously over time, fitting only one of these growth models may not be adequate to explain the entire study period. This heterogeneous evolution is due to the occurrence of more than one wave of the pandemic or to the accuracy of the statistical reports on the number of cases and deaths recorded. For example, the number of cases may grow quickly if the number of diagnostic tests is increased. On the other hand, the tests that need to be performed before a death is included in the mortality statistics may take some days. This implies under-reporting followed by over-reporting. In addition, the number of cases and deaths may be under-reported on weekends and appear in the statistical reports only a few days later.
An alternative is to model, for instance, the 7-day moving average in order to minimize the discrepancies that may be contained in the dataset. In this paper, we assume for the 7-day moving average dataset an additive model composed of a nonlinear function f plus a random error ϵ. However, in contrast to the usual approach, which is based on setting f as a known mathematical function, we assume that f is an unknown random parameter. In order to estimate it from the data, we adopt a Bayesian approach, placing a Gaussian process prior over the unknown nonlinear function. That is, we assume a probability distribution over all possible functions that fit a set of points equally well [9]. In addition, we add one more hierarchical level by placing prior distributions on the parameters of the Gaussian process. The main advantage of assuming a Gaussian process prior over f is that we estimate f by 'smooth functions' obtained by generating values from a multivariate normal distribution with an adequate covariance matrix and linking the generated points by lines.
To estimate all parameters of interest, we developed an MCMC algorithm based on the Metropolis-within-Gibbs sampling algorithm. We also present an MCMC algorithm for making predictions. The proposed method is illustrated in a case study in which we model the 7-day moving average recorded in the state of São Paulo, Brazil. The results show that the proposed method is very effective in modeling the values of the 7-day moving average of the number of deaths due to COVID-19. In the course of 481 days of the pandemic, we ran the estimation procedure four times, on the 100th, 180th, 280th and 465th days. In these four analyses, the mean square error of the fitted model was smaller than 0.05, indicating a very good performance of the proposed method.
The three main advantages of the proposed method are:
- It is very flexible, since it is not restricted to a parametric mathematical function;
- It does not require fitting a set of parametric models followed by the application of a model comparison procedure;
- It is not difficult to implement computationally, since the estimation procedure is based on a Metropolis-within-Gibbs sampling algorithm.
Although we develop the paper with a focus on the 7-day moving average of the number of deaths, the method can also be used for modeling the moving average of the number of cases. In addition, the method is not restricted to the 7-day moving average, and a user may apply it to a d-day moving average dataset, for d>1.
The remainder of the paper is organized as follows. In Section 2, we present the Bayesian approach for modeling the 7-day moving average for the number of deaths recorded in a country, state and/or city. In this section, we also describe the MCMC algorithms used to estimate the parameters of interest and make predictions. Section 3 presents an application of the proposed method to a case study. Section 4 concludes the paper with the final remarks.
2. Bayesian model for 7-day moving average
Let be the number of deaths by COVID-19 recorded in a country, state or city on the tth day, for , where t = 0 represents the day that the first death was recorded, and N is the last day considered in the analysis. Consider be the 7-day moving average of the number of deaths due to COVID-19, for .
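As an illustration of how a moving average series can be obtained from daily counts, the sketch below computes a d-day moving average with NumPy. It assumes a trailing window (each value averages the current day and the d - 1 preceding days); the function name `moving_average` and the toy counts are hypothetical and are not taken from the dataset analyzed in this paper.

```python
import numpy as np

def moving_average(daily_counts, d=7):
    """Trailing d-day moving average of a daily count series."""
    counts = np.asarray(daily_counts, dtype=float)
    if d < 1 or counts.size < d:
        raise ValueError("need d >= 1 and at least d observations")
    # Uniform weights of width d; mode='valid' keeps only complete windows,
    # so the first output is the mean of the first d days.
    return np.convolve(counts, np.ones(d) / d, mode="valid")

# Hypothetical daily death counts (not data from the paper).
deaths = [1, 0, 3, 2, 4, 6, 5, 7, 9, 8]
ma7 = moving_average(deaths, d=7)
```

With ten daily counts and d = 7, the result has four values, one for each complete 7-day window.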
Assume that values are generated according to the following additive model
| (1) |
where is an unknown function and is a random error assumed as being generated from a normal distribution with mean 0 and variance , , with , for and .
At this point, it is usual to complete the model (1) by setting f as a known nonlinear mathematical function. However, there may be several nonlinear parametric models that fit the observed data points equally well. For this reason, it is common to fit a set of candidate models and then choose the best one using some model selection criterion, such as AIC or BIC. That is, the analysis stays limited to the set of models previously chosen by the user. In addition, the complexity and/or flexibility of the parametric models considered are limited by the number of parameters in the model.
In order to make the modeling flexible and not restricted to a set of parametric models, we hereafter assume that the unknown nonlinear function f is a parameter of interest. To estimate it from an observed dataset, we adopt a Bayesian approach. Thus, consider be the set of all possible functions that can explain the data. Let be a probability distribution defined over , in a way that, a finite set of follows a multivariate normal distribution with mean vector of dimension and covariance of dimension and elements , for . In other words, we are assuming that a priori , i.e. a Gaussian process prior over f, where represents the n-variate normal distribution.
Thus, setting up in order to represent our noninformative prior knowledge about the expected value of and letting be an unknown quantity, we propose the following hierarchical Bayesian model
where and represent the inverse gamma and gamma distributions, respectively, and represents the inverse-Wishart distribution with parameters γ and is a matrix of dimension with elements , for . Each term is calculated according to the squared exponential kernel, i.e.
| (2) |
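To make the role of the squared exponential kernel concrete, the following sketch builds a covariance matrix from it and draws one function from a zero-mean Gaussian process prior. The parametrization exp(-(x - x')^2 / (2 ν^2)), the rescaled input grid and the jitter term are assumptions made for illustration; they are not necessarily the exact form of Equation (2).

```python
import numpy as np

def squared_exponential(x1, x2, nu=1.0):
    """Squared exponential kernel exp(-(x - x')^2 / (2 nu^2)).

    ASSUMPTION: this length-scale parametrization is illustrative only.
    """
    x1 = np.asarray(x1, dtype=float).reshape(-1, 1)
    x2 = np.asarray(x2, dtype=float).reshape(1, -1)
    return np.exp(-0.5 * (x1 - x2) ** 2 / nu ** 2)

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)            # rescaled day index (hypothetical)
K = squared_exponential(x, x, nu=0.2)
K_jittered = K + 1e-8 * np.eye(x.size)   # small jitter for numerical stability
# One draw from the zero-mean Gaussian process prior evaluated at x.
f_draw = rng.multivariate_normal(np.zeros(x.size), K_jittered)
```

Smaller values of ν make neighbouring function values less correlated, so the drawn curves wiggle more; larger values give smoother curves.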
We also assume that a, b, c, d, g and h are known hyperparameters. We set all of them equal to 0.1 in order to obtain noninformative prior distributions. We now point out the reasons that led us to consider this structure of prior distributions. The choice of the Gamma distribution for lies in the fact that it is a natural conjugate prior. Analogously, the inverse-Wishart distribution is the natural conjugate prior for . Since parameters and ν assume only positive values, , a natural choice of prior distribution is the Gamma distribution.
Using the Bayes theorem, the joint posterior distribution for is
| (3) |
where is the likelihood function from an n-variate normal distribution, for , is the identity matrix of dimension and the superscript represents the matrix transpose.
The conditional posterior distributions are given by
| (4) |
| (5) |
| (6) |
| (7) |
| (8) |
where the symbol • represents all other parameters and the observed data.
Since Equations (4)–(7) have a known form and Equation (8) does not, we consider a Metropolis-within-Gibbs sampling (MWGS) algorithm to obtain estimates for the parameters of interest. In each iteration of the MWGS algorithm, we update the parameters according to Algorithm 1.
In order to update parameter ν via the Metropolis-Hastings algorithm, let be a candidate value generated from a candidate-generating density . The value is accepted with probability , where
Now, it is necessary to specify the candidate-generating density . Usually is chosen such that it is easy to sample from it. Two common choices are:
- , i.e. the candidate-generating density is given by the prior distribution. In this case, simplifies to
This case is known in the literature as independent Metropolis-Hastings (IMH). Although the choice of the prior distribution as the candidate-generating density is mathematically attractive, it may lead to many rejections of the proposed moves and slow convergence of the algorithm. This happens especially when no prior information is available and the prior distribution has a large variance.
- An alternative to the IMH is to explore the neighborhood of the current value of the chain in order to propose a new value. Thus, let , where is a symmetric density, i.e. the probability of generating a move from ν to depends only on the distance between them. In this case, simplifies to
| (9) |
This case is known in the literature as random walk Metropolis (RWM).
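The random walk Metropolis update described above can be sketched as follows. Because the proposal is symmetric, the acceptance ratio reduces to the ratio of target densities at the candidate and current values. The target used here is a toy Gamma(2, 1) density standing in for the conditional posterior of ν; the function names, the step size and the target are assumptions for illustration, not the quantities of Equation (8).

```python
import numpy as np

def rwm_step(nu, log_target, step_sd, rng):
    """One random walk Metropolis update for a scalar parameter."""
    candidate = nu + rng.normal(0.0, step_sd)      # symmetric proposal
    log_ratio = log_target(candidate) - log_target(nu)
    # Accept with probability min(1, target(candidate) / target(nu)).
    if np.log(rng.uniform()) < log_ratio:
        return candidate, True
    return nu, False

def log_gamma_density(nu, shape=2.0, rate=1.0):
    """Unnormalized log Gamma(shape, rate) density; -inf outside nu > 0."""
    if nu <= 0:
        return -np.inf
    return (shape - 1.0) * np.log(nu) - rate * nu

rng = np.random.default_rng(1)
nu, draws = 1.0, []
for _ in range(5000):
    nu, _accepted = rwm_step(nu, log_gamma_density, step_sd=1.0, rng=rng)
    draws.append(nu)
draws = np.array(draws)
```

Candidates that fall outside the support (ν ≤ 0) receive log-density -inf and are therefore always rejected, which keeps the chain positive without any transformation of the parameter.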
In the remainder of the paper, we adopt the RWM algorithm for updating ν. However, this does not prevent the proposed method from being applied with the Metropolis step given by the IMH. In order to implement the RWM, we set up with . However, as discussed by Chib and Greenberg [3], the choice of has a great influence on the efficiency of the algorithm. If is small, then the random perturbations will be small in magnitude and almost all will be accepted, requiring a large number of iterations to reach convergence. On the other hand, if is too large, then too many of the proposed moves will be rejected, considerably slowing down convergence.
According to Bedard [2], Roberts et al. [6], Mattingly et al. [4] and Saraiva and Suzuki [7], one may fix the value of by testing some values in a few pilot runs and then choosing a value for which the acceptance rate lies between and . Thus, following this procedure, we ran 10 pilot runs of the algorithm with L = 10,000 iterations for , where is a grid from 0.1 to 1 with increments of size 0.1. For the acceptance rate was , for the acceptance rate was , for the acceptance rate was , for the acceptance rate was , and for the other values tested the acceptance rate was smaller than . Thus, we fix , the mean of the tested values that led to an acceptance rate between and .
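The pilot-run procedure can be sketched as below: for each step size on a grid from 0.1 to 1, a short random walk Metropolis run is performed and its acceptance rate recorded, and the step sizes whose rate falls inside a target band are averaged. The toy target, the run length of 2000 iterations and the band limits 0.20 and 0.50 are hypothetical choices for this illustration, not the values used in the paper.

```python
import numpy as np

def acceptance_rate(log_target, step_sd, n_iter, rng, start=1.0):
    """Fraction of accepted moves in a random walk Metropolis pilot run."""
    current, accepted = start, 0
    for _ in range(n_iter):
        candidate = current + rng.normal(0.0, step_sd)
        if np.log(rng.uniform()) < log_target(candidate) - log_target(current):
            current, accepted = candidate, accepted + 1
    return accepted / n_iter

def log_target(nu):
    # Toy Gamma(50, 50) stand-in for the conditional posterior of nu
    # (NOT Equation (8) of the paper).
    return -np.inf if nu <= 0 else 49.0 * np.log(nu) - 50.0 * nu

rng = np.random.default_rng(2)
grid = np.round(np.arange(0.1, 1.01, 0.1), 1)    # 0.1, 0.2, ..., 1.0
rates = np.array([acceptance_rate(log_target, s, 2000, rng) for s in grid])
lower, upper = 0.20, 0.50                         # hypothetical target band
in_band = grid[(rates >= lower) & (rates <= upper)]
# Average the step sizes whose acceptance rate lies in the band; fall back
# to the step size closest to the middle of the band if none qualifies.
step_sd = in_band.mean() if in_band.size else grid[np.argmin(np.abs(rates - 0.35))]
```

The acceptance rate decreases as the step size grows, so the band typically selects a contiguous stretch of the grid.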
The RWM for updating ν is implemented according to the steps of Algorithm 2. Using Algorithms 1 and 2, we implemented the MWGS to get estimates for parameters . The steps for the implementation of this algorithm are given in Algorithm 3.
After running the L iterations of Algorithm 3, we discard the first B iterations as burn-in. We also consider jumps of size J, i.e. only one draw out of every J is kept from the original sequence, yielding a subsequence of size to make inferences. The estimates for the parameters of interest are given by the average of the generated values. For example, the estimates for and are and , where
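To show how Gibbs and Metropolis steps interleave, the sketch below runs a Metropolis-within-Gibbs sampler on a deliberately simplified model: the function values have a zero-mean Gaussian process prior with a squared exponential kernel, the error variance has an inverse gamma prior, and the kernel parameter ν has a Gamma prior. This simplified model omits the inverse-Wishart level of the hierarchical model proposed here, and the kernel form, starting values, step size and synthetic data are all assumptions for illustration; it is not Algorithm 3.

```python
import numpy as np

def se_kernel(x, nu, jitter=1e-6):
    # Squared exponential kernel (assumed form) with jitter on the diagonal.
    d = x[:, None] - x[None, :]
    return np.exp(-0.5 * d**2 / nu**2) + jitter * np.eye(x.size)

def log_cond_nu(nu, f, x, g, h):
    # log N(f | 0, K_nu) + log Gamma(nu | g, h), up to an additive constant.
    if nu <= 0:
        return -np.inf
    K = se_kernel(x, nu)
    _, logdet = np.linalg.slogdet(K)
    quad = f @ np.linalg.solve(K, f)
    return -0.5 * (logdet + quad) + (g - 1.0) * np.log(nu) - h * nu

def mwgs(x, y, n_iter=300, step_sd=0.05, a=0.1, b=0.1, g=0.1, h=0.1, seed=0):
    rng = np.random.default_rng(seed)
    n = y.size
    f, sigma2, nu = y.copy(), 1.0, 0.3
    draws = {"f": np.empty((n_iter, n)),
             "sigma2": np.empty(n_iter),
             "nu": np.empty(n_iter)}
    for it in range(n_iter):
        # Gibbs step: f | sigma2, nu, y is multivariate normal with
        # mean (K + s2 I)^{-1} K y and covariance s2 (K + s2 I)^{-1} K.
        K = se_kernel(x, nu)
        A = np.linalg.solve(K + sigma2 * np.eye(n), K)
        cov = sigma2 * A
        f = rng.multivariate_normal(A @ y, 0.5 * (cov + cov.T))
        # Gibbs step: sigma2 | f, y is inverse gamma.
        resid = y - f
        sigma2 = 1.0 / rng.gamma(a + 0.5 * n, 1.0 / (b + 0.5 * resid @ resid))
        # Metropolis step: random walk proposal for nu.
        cand = nu + rng.normal(0.0, step_sd)
        log_ratio = log_cond_nu(cand, f, x, g, h) - log_cond_nu(nu, f, x, g, h)
        if np.log(rng.uniform()) < log_ratio:
            nu = cand
        draws["f"][it], draws["sigma2"][it], draws["nu"][it] = f, sigma2, nu
    return draws

rng = np.random.default_rng(42)
x = np.linspace(0.0, 1.0, 30)
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.1, x.size)   # synthetic data
draws = mwgs(x, y)
```

The two conjugate updates are sampled directly, while ν, whose conditional has no standard form in this toy model either, is updated by a single random walk Metropolis move per iteration.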
and and are the generated values for and in the th iteration of the algorithm, respectively, for . For the other parameters, the estimates are obtained in a similar way. The credibility interval ( ) for each one of the parameters is given by the quantiles and of the sampled values.
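The post-processing described above (burn-in, thinning, posterior mean and quantile-based credibility interval) can be sketched as follows. The chain is simulated noise standing in for MCMC output, the 95% level is an assumption, and L = 55,000 is chosen only because, with B = 5000 and J = 10, it yields (L - B)/J = 5000 retained draws.

```python
import numpy as np

def summarize_chain(chain, burn_in, jump, level=0.95):
    """Posterior mean and equal-tailed credibility interval after thinning."""
    kept = np.asarray(chain, dtype=float)[burn_in::jump]   # drop B, keep 1 in J
    alpha = 1.0 - level
    lower, upper = np.quantile(kept, [alpha / 2.0, 1.0 - alpha / 2.0])
    return {"size": kept.size, "mean": kept.mean(), "interval": (lower, upper)}

rng = np.random.default_rng(3)
chain = rng.normal(loc=2.0, scale=0.5, size=55_000)   # stand-in for MCMC output
summary = summarize_chain(chain, burn_in=5_000, jump=10)
```

Here `summary["size"]` is 5000, the number of draws left after discarding the burn-in and keeping one value in every ten.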
2.1. Predictions
Having defined the estimation procedure for the parameters , another interest lies in predicting the value for a new input . That is, the interest is in the predictive distribution
| (10) |
where is the joint posterior distribution for , given in Equation (3). However, these integrals do not have a closed-form solution. Hence, we present below an MCMC algorithm to obtain a workable approximation for the above integral.
From model (1), the marginal distribution for is given by an n-variate normal distribution with mean vector and covariance matrix , . Since , the joint distribution for is
where and is a row vector, , composed by the covariance among and , , for .
By using the properties of the multivariate normal distribution, the conditional posterior distribution for is given by
| (11) |
At this point, as we know the value of the 7-day moving average on the nth day, we fix . In addition, we set up each in order to get these values according to Equation (2). Then, a sample from the conditional posterior distribution in (11) can be generated by implementing Algorithm 4.
After running the algorithm with the same number of iterations L, burn-in B and jump J as Algorithm 3, an approximation for the integral in (10) is given by
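The conditioning step behind this kind of prediction can be illustrated with the standard result for a zero-mean Gaussian process with Gaussian noise: given fixed parameter values, the new observation is normal with mean k*ᵀ(K + σ²I)⁻¹y and variance k** + σ² - k*ᵀ(K + σ²I)⁻¹k*. The sketch below applies this formula with a squared exponential kernel used directly as the covariance; this simplification, the kernel parametrization, the parameter values and the synthetic data are assumptions and do not reproduce Algorithm 4.

```python
import numpy as np

def se_kernel(x1, x2, nu):
    # Squared exponential kernel with length-scale nu (assumed parametrization).
    d = np.asarray(x1, dtype=float)[:, None] - np.asarray(x2, dtype=float)[None, :]
    return np.exp(-0.5 * d**2 / nu**2)

def predictive_moments(x, y, x_new, nu, sigma2):
    """Mean and variance of y_new | y for a zero-mean GP with Gaussian noise."""
    C = se_kernel(x, x, nu) + sigma2 * np.eye(len(x))   # Cov(y)
    k_star = se_kernel(x, [x_new], nu)[:, 0]            # Cov(y, f(x_new))
    w = np.linalg.solve(C, k_star)
    mean = w @ y
    var = 1.0 + sigma2 - w @ k_star    # k(x_new, x_new) = 1 for this kernel
    return mean, var

rng = np.random.default_rng(4)
x = np.linspace(0.0, 1.0, 30)
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.1, x.size)   # synthetic data
mean, var = predictive_moments(x, y, x_new=1.0 + 1.0 / 29, nu=0.2, sigma2=0.01)
# One predictive draw for fixed (nu, sigma2); an MCMC scheme would repeat
# this for every retained draw of the parameters.
y_new = rng.normal(mean, np.sqrt(var))
```

Averaging such draws over the retained parameter values gives a Monte Carlo approximation of the predictive distribution, and their quantiles give a prediction band.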
where is the th iteration of the algorithm.
The predictions for the next n + j days, for , are obtained in a similar way by generating a sample from according to Algorithm 4, where is given by Equation (11), setting up and , for .
In many cases, the interest lies in predicting the value for a new input conditional on the last recorded values and not on all past values . For example, one may be interested in the value given the recorded 7-day moving average in the last s = 30 days, . This prediction is also made using Algorithm 4, just replacing by and adapting the parameters to the dimension of . The predictions for the next n + j days are made as described in the paragraph above, with the corresponding adaptations. The advantage of this kind of prediction is that its computation time is smaller than that of the prediction procedure conditional on .
3. Application
In this section, we apply the proposed method to a real dataset. The dataset refers to the 7-day moving average for the number of deaths recorded in the state of São Paulo, Brazil. As an illustration of this dataset, we present in Figure 1 the number of deaths recorded in the period from 17 March 2020 (first case) to 16 July 2021 and the 7-day moving average recorded in the period from 23 March 2020 (t = 0) to 27 April 2021 (n = 480).
Figure 1.
Number of deaths by day and 7-day moving average values. (a) Number of deaths. (b) 7-day moving average.
In order to estimate parameters of interest and make the predictions, we apply Algorithms 3 and 4 with iterations, B = 5000 iterations and J = 10. Thus, we got a sample of size 5000 to make inferences. Using these values, we ran our first analysis on the 100th day after the recording of the first 7-day moving average. Figure 2 shows the observed value (symbols •) and the credibility band of (blue area) determined by the proposed method. This figure also shows the prediction band of (black area) and the recorded values for the next ten days. The mean square error (MSE) of the predicted values in relation to the recorded values was 0.0056. As one can note, the prediction band contains all recorded values. Both results show a very satisfactory performance of the proposed method.
Figure 2.
Confidence and prediction bands ( ).
Figure 3 shows the residual plots. Figure 3(a) shows the quantile-quantile plot of the residuals in relation to the normal distribution. As one can note, the normality assumption is satisfied. Figure 3(b) shows the plot of the predicted values versus the standardized residuals. As one can note, the points are uniformly distributed, indicating that there is no evidence to reject the assumption of homogeneity of the variance. This plot also shows that the residuals are not correlated. Both results also show the very satisfactory performance of the proposed method.
Figure 3.
Residuals plot. (a) QQPlot, (b) residuals × predicted.
The estimated value for is , with a credibility interval ( ) given by . Figure 4 shows the plots of the ergodic mean (ErM) and the estimated autocorrelation function (ACF) of the sampled values for . As one can note, there is no reason to doubt the convergence of the sampled values, since the ErM values stabilize satisfactorily and there is no significant autocorrelation.
Figure 4.
ErM and ACF from the sampled values for . (a) ErM for . (b) ACF for .
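The two convergence diagnostics used here can be computed as in the following sketch: the ergodic mean is the running average of the sampled values, and the autocorrelation function is the sample autocorrelation at increasing lags. The chain is simulated white noise standing in for thinned MCMC draws, and the ±1.96/√S band is the usual large-sample approximation for an uncorrelated series; both are assumptions for illustration.

```python
import numpy as np

def ergodic_mean(chain):
    """Running mean of the chain after 1, 2, ..., S draws."""
    chain = np.asarray(chain, dtype=float)
    return np.cumsum(chain) / np.arange(1, chain.size + 1)

def autocorrelation(chain, max_lag=30):
    """Sample autocorrelation function for lags 0..max_lag."""
    c = np.asarray(chain, dtype=float) - np.mean(chain)
    denom = c @ c
    return np.array([(c[: c.size - k] @ c[k:]) / denom for k in range(max_lag + 1)])

rng = np.random.default_rng(5)
chain = rng.normal(size=5000)            # stand-in for thinned MCMC draws
erm = ergodic_mean(chain)
acf = autocorrelation(chain)
# Approximate 95% band for the ACF of an uncorrelated series.
band = 1.96 / np.sqrt(chain.size)
```

A flat ergodic mean and autocorrelations that stay inside the band at positive lags are the patterns interpreted in the text as no evidence against convergence.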
We also verify the convergence of the sampled values for . The results are similar to those presented for the sampled values for . As an illustration, we show in Figure 5 the ErM and the ACF for the sampled values for . The was chosen at random among the , for . The estimate for is , and the credibility interval ( ) is . The recorded value was 43.43. That is an absolute percentage error of . Analogously to the sampled values for , there is no reason to doubt the convergence of the sampled values for .
Figure 5.
ErM and ACF from the sampled values for . (a) ErM for . (b) ACF for .
The estimate for parameter is with credibility interval . The estimate for parameter ν is with credibility interval and acceptance rate of . The convergence checking for both parameters is similar to that presented for and , i.e. the ergodic mean stabilizes satisfactorily and there is no significant autocorrelation.
3.1. Predictions based on the last 30 days
Consider now the interest in predicting the evolution of the 7-day moving average values for the next 15 days after the nth day conditional on the values recorded in the last thirty days. That is, the interest is to predict the values for conditional on . In the following, we present the results from three analyses run on the 180th, 280th and 465th days.
In order to obtain these predictions, we apply Algorithm 4 just replacing by and adapting the dimension of the parameters . We run this algorithm with the same values of L, burn-in B and jump J used in Section 3. Figure 6 shows the results of the analysis carried out on the 180th day. The MSE of the fitted model is 0.0123. As one can note, the five 7-day moving average values recorded after the 180th day are inside the prediction band, but the values recorded after the 185th day are outside the band. This is good news because it indicates that the reduction in the values of the 7-day moving average was greater than expected. In this period of 15 days, the moving average fell from 191.43 (180th day) to 153.29 (195th day), a reduction of about 19.9%.
Figure 6.
Recorded values and confidence band determined by the proposed method, 180th day. (a) 2nd analysis.
Figure 7 shows the results of the analysis run on the 280th day. The MSE of the fitted model is 0.0471. As one can note, only the last two recorded values are outside the prediction band. In this period of 15 days, the 7-day moving average increased by about 78.8%, going from 119.14 (280th day) to 213 (295th day).
Figure 7.
Recorded values and confidence band determined by the proposed method. (a) 2nd analysis.
Figure 8 shows the results of the analysis run on the 465th day. The MSE of the fitted model is 0.0437. Note that the credibility band determined by the proposed method based on the values recorded in the last thirty days indicates a stabilization of the moving average, but the recorded values were all below this region. This is very good news because it shows that a greater reduction than expected occurred. The reduction was about 38.3%, going from 550.86 (465th day) to 340.14 (480th day).
Figure 8.
Recorded values and confidence band determined by the proposed method, 280th day. (a) 2nd analysis.
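The relative changes over the three 15-day periods discussed in this subsection follow directly from the recorded endpoints of the 7-day moving average quoted in the text. The short sketch below computes them; the function name `relative_change` is hypothetical, and the third period is taken to start on the day of the analysis (the 465th).

```python
def relative_change(start, end):
    """Signed relative change from start to end, in percent."""
    return 100.0 * (end - start) / start

# Endpoints of the 7-day moving average quoted in the text.
periods = {
    "180th -> 195th day": (191.43, 153.29),
    "280th -> 295th day": (119.14, 213.00),
    "465th -> 480th day": (550.86, 340.14),
}
for label, (start, end) in periods.items():
    print(f"{label}: {relative_change(start, end):+.1f}%")
# prints -19.9%, +78.8% and -38.3% for the three periods, respectively
```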
Figure 9 shows the values recorded in the last 30 days and the credibility band determined by the proposed method for the next 15 days. As one can note, this plot indicates that no large increase in the values of the 7-day moving average is expected in the next 15 days.
Figure 9.
Recorded values and confidence band determined by the proposed method, 465th day. (a) 2nd analysis.
4. Final remarks
This article presented a hierarchical Bayesian methodology for modeling the evolution of the 7-day moving average of the number of deaths due to COVID-19. We opted to model the 7-day moving average instead of the cumulative number of deaths, as is usual, because the moving average smooths the possible discrepancies in the published statistical reports on the number of deaths.
Contrary to the usual approaches, which are based on fitting a set of parametric models followed by a comparison using some model selection criterion, such as AIC or BIC, the proposed approach assumes that the nonlinear mathematical function is an unknown random quantity that needs to be estimated from the observed data. We therefore adopt a Bayesian nonlinear regression approach with a Gaussian process prior.
The adoption of the Gaussian process prior means that we define a probability distribution over functions and that the inference takes place directly in the space of functions. This approach is considered nonparametric, since we use a prior distribution on the space of functions, which corresponds to an infinite-dimensional parameter space. Since making inferences about an infinite number of parameters is impractical, the Gaussian process assumption has the advantage that, conditional on a dataset, a finite number of parameters (function values) can be explicitly represented by a multivariate normal distribution, making the estimation procedure simple to implement computationally.
In addition, since the Gaussian process is heavily influenced by the choice of the covariance function , we assume one more hierarchical level by putting a prior distribution over and on its hyperparameters. The inference of the parameters of interest is carried out using a Metropolis-within-Gibbs algorithm. We also present an MCMC algorithm to make predictions.
The proposed method is illustrated in a case study in which we model the 7-day moving average values recorded in the state of São Paulo, Brazil, in the period from 03/23/2020 to 04/27/2021. In this period, we ran the estimation procedure four times in order to verify the performance of the method. The results show a very good performance of the method: the MSE values were all smaller than 0.05, and there is no reason to doubt the convergence of the values sampled by the MCMC algorithm.
From a statistical data analysis point of view, the proposed method is attractive because it does not require the assumption of a parametric model, and the inference is made by an MCMC algorithm that can be easily implemented in free software, such as R. The computational codes were implemented in the R language and can be obtained by e-mailing the authors.
Disclosure statement
No potential conflict of interest was reported by the author(s).
References
- 1. Aviv-Sharon E. and Aharoni A., Generalized logistic growth modeling of the COVID-19 pandemic in Asia, Infect. Dis. Model 5 (2020), pp. 502–509.
- 2. Bedard M., Weak convergence of Metropolis algorithms for non-i.i.d. target distributions, Ann. Appl. Probab. 17 (2007), pp. 1222–1244.
- 3. Chib S. and Greenberg E., Understanding the Metropolis-Hastings algorithm, Am. Stat. 49 (1995), pp. 327–335.
- 4. Mattingly J.C., Pillai N.S., and Stuart A.M., Diffusion limits of the random walk Metropolis algorithm in high dimensions, Ann. Appl. Probab. 22 (2011), pp. 881–930.
- 5. Musa S.S., Zhao S., and Wang M.H., Estimation of exponential growth rate and basic reproduction number of the coronavirus disease 2019 (COVID-19) in Africa, Infect. Dis. Poverty 9 (2020), Article number: 96.
- 6. Roberts G., Gelman A., and Gilks W., Weak convergence and optimal scaling of random walk Metropolis algorithms, Ann. Appl. Probab. 7 (1997), pp. 110–120.
- 7. Saraiva E.F. and Suzuki A.K., Bayesian computational methods for estimation of two-parameters Weibull distribution in presence of right-censored data, Chil. J. Stat. 8 (2017), pp. 25–43.
- 8. Vasconcelos G.L., Macêdo A.M.S., Ospina R., Almeida F.A.G., Duarte-Filho G.C., Brum A.A., and Souza I.C.L., Modelling fatality curves of COVID-19 and the effectiveness of intervention strategies, PeerJ 8 (2020), Article ID e9421.
- 9. Wang J., An intuitive tutorial to Gaussian processes regression, preprint (2021). Available at https://arxiv.org/pdf/2009.10862.pdf
- 10. Wu K., Darcet D., and Wang Q., Generalized logistic growth modeling of the COVID-19 outbreak: Comparing the dynamics in the 29 provinces in China and in the rest of the world, Nonlinear Dyn. 101 (2020), pp. 1561–1581.









