Chaos Solitons Fractals. 2020 Jul 25;140:110151. doi: 10.1016/j.chaos.2020.110151

Modeling and forecasting the spread and death rate of coronavirus (COVID-19) in the world using time series models

Mohsen Maleki a, Mohammad Reza Mahmoudi b,c, Mohammad Hossein Heydari d, Kim-Hung Pho e
PMCID: PMC7381941  PMID: 32834639

Abstract

Coronaviruses are a large family of viruses that affect the neurological, gastrointestinal, hepatic and respiratory systems. The number of confirmed cases is increasing daily in many countries, especially in the United States of America, Spain, Italy, Germany, China, Iran, South Korea and others. The spread of COVID-19 poses many dangers and requires strict plans and policies. To design such plans and policies, forecasting the future number of confirmed cases is critical. Time series models are useful for modeling data that are gathered and indexed by time. Symmetry of the error distribution is an essential assumption in classical time series models, but there are practical cases in which this assumption is not satisfied. In our methodology, the error distribution is assumed to be a two-piece scale mixture of normal (TP–SMN) distribution. The proposed time series models perform better than ordinary Gaussian and symmetric models (especially for COVID-19 datasets). They were first fitted to the historical COVID-19 datasets; then the model with the best fit to each dataset was selected. Finally, the selected models were applied to predict the number of confirmed cases and the death rate of COVID-19 in the world.

Keywords: Coronaviruses, COVID-19, Forecasting, Time series modeling, Two-piece scale mixtures of normal distributions

1. Introduction

Coronaviruses are a large family of viruses that affect the neurological, gastrointestinal, hepatic and respiratory systems. Members of this family circulate among humans, bats, mice, livestock, birds and other animals [1], [2], [3]. In 2003, a coronavirus called SARS coronavirus (SARS-CoV) spread from animal to animal [4]. In 2012, another coronavirus, named MERS coronavirus (MERS-CoV), spread substantially from human to human [4]. Late in 2019, the World Health Organization (WHO) reported many cases of respiratory disease in China. It was verified that most of the reported cases had been in contact with people who had gone to a seafood market in Wuhan [5]. Recently, a new type of coronavirus, named COVID-19 (also called 2019-nCoV), has been spreading in Wuhan [6]. Scientists believe that COVID-19 behaves in humans similarly to related coronaviruses in bats. However, more scientific studies are needed to identify the main source of COVID-19. According to the reports, COVID-19 has been observed in other cities in China and in about 198 other countries (up to 06 February 2020). The Centers for Disease Control and Prevention (CDC) verified that COVID-19 is transmitted from human to human. According to the CDC's reports, COVID-19 spreads through close contact, through the air, or by touching surfaces or objects that carry viral particles. COVID-19 is a dangerous virus because its incubation period can last up to 14 days [7], and it can spread to others during the incubation period. A recent study indicates that the median incubation period and the median age of confirmed cases are 3 days and 47.0 years, respectively [8].

The number of confirmed cases has increased daily in many countries, especially in the United States of America, Italy, Spain, Germany, Iran, China and other countries. The spread of COVID-19 poses many dangers and requires strict plans and policies. To design such plans and policies, forecasting the future number of confirmed cases is critical. The number of unreported COVID-19 cases in China was estimated mathematically in [9]. Using a data-driven analysis, the authors estimated that there were 469 unreported COVID-19 cases in China during 1–15 January 2020. Based on information from Japanese passengers in Wuhan, Nishiura et al. [10] estimated the infection rate of COVID-19 in Wuhan. The results indicated an infection rate of 9.5% and a death rate between 0.3% and 0.6%. Since the population considered was very small, the accuracy of these estimated rates is doubtful. Based on a mathematical model, Tang et al. [11] concluded that the transmission risk of COVID-19 is on average about 6.47 persons and predicted the time at which the peak of COVID-19 would be reached. Using information on 47 patients, Thompson [12] estimated the probability of sustained human-to-human transmission of COVID-19 to be 0.4. Based on two different scenarios, Jung et al. [13] concluded that the risk of death is 5.1% and 8.4%, respectively. Al-qaness et al. [14] proposed an optimization method, named FPASSA-ANFIS, to model the number of confirmed COVID-19 cases and to predict future values using previously recorded data from China. Their technique combines an adaptive neuro-fuzzy inference system (ANFIS), the flower pollination algorithm and the salp swarm algorithm. The salp swarm algorithm is used to improve the flower pollination algorithm and avoid its drawbacks, such as getting trapped in a local optimum. The FPASSA-ANFIS model improves the performance and accuracy of the neuro-fuzzy system by determining the ANFIS parameters with the salp swarm and flower pollination algorithms. The performance and applicability of the FPASSA-ANFIS technique were studied using the real COVID-19 outbreak dataset provided by the WHO, and the technique was then applied to forecast the confirmed cases in the following days.

Modeling, forecasting and estimating the characteristics of epidemics have been considered in previous research: for example, forecasting the cases and transmission risk of West Nile virus (WNV) [15], forecasting hepatitis A virus infection [16], forecasting seasonal outbreaks of influenza [17, 18], forecasting Ebola outbreaks [19], estimating the infection rate of SARS [20], modeling influenza A (H1N1-2009) [21], and predicting MERS outbreaks [22].

Time series models are useful for modeling data that are gathered and indexed by time. Time series analysis has been used effectively to model, estimate and forecast practical problems; see refs. [23], [24], [25], [26], [27], [28], [29], [30], [31], [32]. Symmetry of the error distribution is an essential assumption in classical models. However, in many real-world cases the assumption of symmetrically distributed error terms is not satisfied (see e.g. refs. [25], [26], [27], [28], [29], [30], [31], [32]). In our methodology we therefore consider time series models based on two-piece distributions, specifically the two-piece scale mixture of normal (TP–SMN) distributions introduced in refs. [32], [33], [34], [35], [36], [37], [38]. The proposed time series models include the symmetric Gaussian model as well as symmetric/asymmetric and light-/heavy-tailed non-Gaussian models. They were first fitted to the historical COVID-19 datasets; then the model with the best fit to each dataset was selected. Finally, the selected models were used to predict the number of confirmed cases and the death rate of COVID-19 in the world. In this study,

  • 1. An improved time series model is introduced using TP–SMN distributions.

  • 2. The new predictive model is applied to predict and estimate the confirmed cases and death rate of COVID-19 in the world, using past and current datasets.

2. Preliminaries

Autoregressive moving-average (ARMA) processes are a useful and accurate class of time series models for modeling and forecasting real datasets. An ARMA model describes a time series through two linear components: one is a linear combination of past values of the series, called the autoregressive (AR) part, and the other is a linear combination of uncorrelated errors, called the moving average (MA) part. This model was first introduced by Peter Whittle [39] and later used in refs. [40,41].

Definition 2.1

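The process $\{X_t\}$ is an ARMA process of order $(p, q)$, written $\{X_t\} \sim ARMA(p, q)$, if

$$X_t - \alpha_1 X_{t-1} - \cdots - \alpha_p X_{t-p} = Z_t + \eta_1 Z_{t-1} + \cdots + \eta_q Z_{t-q}, \quad t = 0, \pm 1, \pm 2, \ldots, \quad \{Z_t\} \sim WN(0, \sigma^2), \tag{1}$$

where $WN(0, \sigma^2)$ refers to a set of uncorrelated and identically distributed zero-mean random variables with variance $\sigma^2$.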

It should be noted that the cases q=0, and p=0, are called the AR(p) and the MA(q) models, respectively.

Following the general two-piece distributions of ref. [33] based on the scale mixtures of normal (SMN) family, the probability density function (pdf) of the TP–SMN family for $y \in \mathbb{R}$, denoted by $Y \sim TP\text{-}SMN(\mu, \sigma, \gamma, \nu)$, is

$$g(y \mid \mu, \sigma, \gamma, \nu) = \begin{cases} 2(1-\gamma)\, f_{SMN}(y \mid \mu, \sigma(1-\gamma), \nu), & y \le \mu, \\ 2\gamma\, f_{SMN}(y \mid \mu, \sigma\gamma, \nu), & y > \mu, \end{cases} \tag{2}$$

where $0 < \gamma < 1$ is the slant coefficient and $f_{SMN}(\cdot \mid \mu, \sigma, \nu)$ is the pdf of the SMN family.

Lemma 2.1

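Let $Y \sim TP\text{-}SMN(\mu, \sigma, \gamma, \nu)$. Then $Y$ has the stochastic representation

$$Y = S_1 Y^{-} + S_2 Y^{+}, \tag{3}$$

where $Y^{-} \sim SMN(\mu, \sigma_1, \nu)\, I_{A}(y)$ and $Y^{+} \sim SMN(\mu, \sigma_2, \nu)\, I_{A^c}(y)$, with $\sigma_1 = \sigma(1-\gamma)$, $\sigma_2 = \sigma\gamma$ and $A = (-\infty, \mu)$. Here $SMN(\cdot)\, I_{A}(\cdot)$ is the SMN distribution truncated to $A$, and $S = (S_1, S_2)^T$ with $S_1 + S_2 = 1$ has the following probability mass function (pmf):

$$P(S = s) = \left(\frac{\sigma_1}{\sigma_1 + \sigma_2}\right)^{s_1} \left(\frac{\sigma_2}{\sigma_1 + \sigma_2}\right)^{s_2}; \quad s_1, s_2 = 0, 1, \quad s_1 + s_2 = 1. \tag{4}$$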
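Lemma 2.1 suggests a direct way to simulate from the family. The following Python sketch does this for the Gaussian member only (mixing variable fixed at 1, an assumption made here for simplicity): it picks the left or right piece with probabilities σ1/(σ1+σ2) and σ2/(σ1+σ2) and then draws a half-normal deviate on that side of μ. The name `sample_tp_normal` and the parameter values are illustrative and do not come from the paper.

```python
import random

def sample_tp_normal(mu, sigma, gamma, rng):
    """Draw one two-piece normal variate as a mixture of two truncated normals."""
    sigma1 = sigma * (1.0 - gamma)   # scale of the left piece (y <= mu)
    sigma2 = sigma * gamma           # scale of the right piece (y > mu)
    # S1 = 1 with probability sigma1 / (sigma1 + sigma2) = 1 - gamma
    if rng.random() < sigma1 / (sigma1 + sigma2):
        return mu - abs(rng.gauss(0.0, sigma1))
    return mu + abs(rng.gauss(0.0, sigma2))

rng = random.Random(2020)            # fixed seed so the run is reproducible
draws = [sample_tp_normal(0.0, 1.0, 0.3, rng) for _ in range(50000)]
share_left = sum(d <= 0.0 for d in draws) / len(draws)
print("share of draws at or below mu:", round(share_left, 3))
```

With γ = 0.3 the share of draws at or below μ should be close to 1 − γ = 0.7, which is a quick check that the mixing probabilities in (4) are used correctly.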

Lemma 2.2

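Let $Y \sim TP\text{-}SMN(\mu, \sigma, \gamma, \nu)$. Then

  • a) $E(Y) = \mu - b\Delta$;

  • b) $Var(Y) = \sigma^2 \left[ c_2 k_2(\nu) - b^2 c_1^2 \right]$,

where $\Delta = \sigma(1 - 2\gamma)$, $b = \sqrt{2/\pi}\, k_1(\nu)$, $c_r = \gamma^{r+1} + (-1)^r (1-\gamma)^{r+1}$ and $k_r(\nu) = E(U^{-r/2})$, in which $U$ is the scale mixing variable (details are given in [32], [33], [34], [35], [36], [37], [38]).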
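As a numerical sanity check of Lemma 2.2, the following Python sketch evaluates the two-piece density in the Gaussian special case and compares the closed-form mean and variance with numerical integration. It assumes the mixing variable U is degenerate at 1, so that k_1(ν) = k_2(ν) = 1; the function names and the midpoint integrator are illustrative choices, not the authors' code.

```python
import math

def normal_pdf(y, mu, sigma):
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def tpn_pdf(y, mu, sigma, gamma):
    """Two-piece normal pdf: scale sigma*(1-gamma) left of mu, sigma*gamma right of mu."""
    if y <= mu:
        return 2.0 * (1.0 - gamma) * normal_pdf(y, mu, sigma * (1.0 - gamma))
    return 2.0 * gamma * normal_pdf(y, mu, sigma * gamma)

def tpn_moments(mu, sigma, gamma):
    """Closed-form mean and variance with k1 = k2 = 1 (Gaussian member)."""
    b = math.sqrt(2.0 / math.pi)
    delta = sigma * (1.0 - 2.0 * gamma)
    c1 = gamma ** 2 - (1.0 - gamma) ** 2
    c2 = gamma ** 3 + (1.0 - gamma) ** 3
    return mu - b * delta, sigma ** 2 * (c2 - b ** 2 * c1 ** 2)

def integrate(f, lo, hi, n=20000):
    """Midpoint rule on n equal cells."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))

mu, sigma, gamma = 1.0, 2.0, 0.3
mean, var = tpn_moments(mu, sigma, gamma)
num_mean = integrate(lambda y: y * tpn_pdf(y, mu, sigma, gamma), mu - 20.0, mu + 20.0)
num_var = integrate(lambda y: (y - num_mean) ** 2 * tpn_pdf(y, mu, sigma, gamma),
                    mu - 20.0, mu + 20.0)
print(mean, num_mean)
print(var, num_var)
```

For heavy-tailed members such as the t case, k_1 and k_2 depend on ν and the constants above would need to be replaced accordingly.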

3. ARMA process based on the two-piece distributions

3.1. The TP–SMN–ARMA process

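Consider the ARMA(p, q) model (1) with independent and identically distributed (i.i.d.) noises from the TP–SMN family,

$$\{Z_t\} \overset{i.i.d.}{\sim} TP\text{-}SMN(b\Delta, \sigma, \gamma, \nu), \quad t = 0, \pm 1, \pm 2, \ldots, \tag{5}$$

and let $\boldsymbol{\alpha} = (\alpha_1, \ldots, \alpha_p)^T$ and $\boldsymbol{\eta} = (\eta_1, \ldots, \eta_q)^T$ be the AR and MA coefficients of the TP–SMN–ARMA model, respectively. In this work, we denote this model by $\{X_t\} \sim TP\text{-}SMN\text{-}ARMA(p, q)$ with model parameter $\boldsymbol{\Theta} = (\boldsymbol{\alpha}, \boldsymbol{\eta}, \mu, \sigma_1, \sigma_2, \nu)^T$ (based on the TP–SMN representation from Lemma 2.1).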

Remark 3.1

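Let $\{X_t\} \sim TP\text{-}SMN\text{-}ARMA(p, q)$. The process $\{X_t\}$ can be represented as a one-sided MA(∞) process, $X_t = \sum_{j=0}^{\infty} \psi_j Z_{t-j}$. If the condition $\sum_{j=0}^{\infty} |\psi_j| < \infty$ is satisfied, then $X_t$ converges in mean, and the process is strictly stationary with the following mean and covariance functions:

$$\mu_X(t) = E(X_t) = \mu_z \frac{1 + \eta_1 + \cdots + \eta_q}{1 - \alpha_1 - \cdots - \alpha_p}; \quad \gamma_X(h) = Cov(X_t, X_{t+h}) = \sigma_z^2\, \xi(h), \tag{6}$$

where $\mu_z = E(Z_t)$ and $\sigma_z^2 = Var(Z_t)$ (given by Lemma 2.2), and $\xi(h) = \sum_{j=0}^{\infty} \psi_{j+|h|} \psi_j$. Also, $\gamma_X(h) \to 0$ as $h \to \infty$ (see ref. [42]).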
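The quantities in this remark can be computed numerically once the MA(∞) weights are known. The sketch below is a minimal Python illustration; it assumes the standard recursion ψ_0 = 1, ψ_j = η_j + Σ_k α_k ψ_{j−k} (with η_j = 0 for j > q), which is the textbook recursion for causal ARMA processes and is not spelled out in the text above. The truncation length `m` and all function names are illustrative.

```python
def psi_weights(alpha, eta, m):
    """First m MA(infinity) weights of X_t - sum(alpha_k X_{t-k}) = Z_t + sum(eta_j Z_{t-j})."""
    psi = [1.0]
    for j in range(1, m):
        value = eta[j - 1] if j <= len(eta) else 0.0
        for k in range(1, min(j, len(alpha)) + 1):
            value += alpha[k - 1] * psi[j - k]
        psi.append(value)
    return psi

def arma_mean(alpha, eta, mu_z):
    """Stationary mean mu_z * (1 + sum(eta)) / (1 - sum(alpha))."""
    return mu_z * (1.0 + sum(eta)) / (1.0 - sum(alpha))

def arma_acvf(alpha, eta, sigma2_z, h, m=500):
    """Autocovariance sigma_z^2 * sum_j psi_{j+|h|} psi_j, truncated after m terms."""
    h = abs(h)
    psi = psi_weights(alpha, eta, m + h)
    return sigma2_z * sum(psi[j + h] * psi[j] for j in range(m))

# AR(1) with coefficient 0.5 and unit noise variance: gamma(0) = 1 / (1 - 0.25)
print(arma_acvf([0.5], [], 1.0, 0))
print(arma_acvf([0.5], [], 1.0, 1))
```

The truncated sum is only meaningful when the weights are absolutely summable, that is, when the stationarity condition of the remark holds.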

3.2. Maximum-Likelihood estimates

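Let $\mathbf{X} = (X_1, \ldots, X_n)^T$ be the sample and $\mathbf{x}_{t-1} = (X_{t-1}, \ldots, X_{t-p})^T$ a sub-sample of $\mathbf{X}$. Also, let $\mathbf{z}_{t-1} = (Z_{t-1}, \ldots, Z_{t-q})^T$ for $t = 1, \ldots, n$ be the errors conditional on the initial values $\mathbf{X}_0 = (X_0, \ldots, X_{-p+1})^T$ and $\mathbf{Z}_0 = (Z_0, \ldots, Z_{-q+1})^T$. Since the ARMA(p, q) model has the Markov property,

$$L(\boldsymbol{\Theta}) = f_X(\mathbf{X} \mid \mathbf{X}_0, \mathbf{Z}_0, \boldsymbol{\Theta}) = \prod_{t=1}^{n} g(Z_t \mid \mathbf{X}_0, \mathbf{Z}_0, \boldsymbol{\Theta}),$$

where $L(\boldsymbol{\Theta})$ is the likelihood function conditional on the initial values (see ref. [40] for more details on choosing the initial values and constructing the conditional likelihood function). The conditional log-likelihood function is therefore

$$l(\boldsymbol{\Theta}) = \sum_{t=1}^{n} l_t(\boldsymbol{\Theta}) = \sum_{t=1}^{n} \log g\left(X_t - \boldsymbol{\alpha}^T \mathbf{x}_{t-1} - \boldsymbol{\eta}^T \mathbf{z}_{t-1}\right), \tag{7}$$

where $g(\cdot)$ is the TP–SMN pdf given in (2).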
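The conditional log-likelihood can be evaluated by reconstructing the errors recursively from the observations. The Python sketch below illustrates this for the Gaussian member of the family only; it assumes zero initial values for both the series and the errors (one common convention, not necessarily the authors' choice), and the names `tpn_logpdf`, `conditional_loglik` and `loc` are illustrative.

```python
import math

def tpn_logpdf(z, loc, sigma, gamma):
    """Log-density of the two-piece normal with location loc and slant gamma."""
    if z <= loc:
        weight, scale = 2.0 * (1.0 - gamma), sigma * (1.0 - gamma)
    else:
        weight, scale = 2.0 * gamma, sigma * gamma
    u = (z - loc) / scale
    return math.log(weight) - math.log(scale) - 0.5 * math.log(2.0 * math.pi) - 0.5 * u * u

def conditional_loglik(x, alpha, eta, loc, sigma, gamma):
    """Sum of log g(X_t - alpha'x_{t-1} - eta'z_{t-1}) with zero initial values (assumption)."""
    p, q = len(alpha), len(eta)
    x_hist = [0.0] * p               # most recent value first: X_{t-1}, ..., X_{t-p}
    z_hist = [0.0] * q               # most recent value first: Z_{t-1}, ..., Z_{t-q}
    total = 0.0
    for xt in x:
        ar_part = sum(a * v for a, v in zip(alpha, x_hist))
        ma_part = sum(e * v for e, v in zip(eta, z_hist))
        zt = xt - ar_part - ma_part  # reconstructed error at time t
        total += tpn_logpdf(zt, loc, sigma, gamma)
        x_hist = ([xt] + x_hist)[:p]
        z_hist = ([zt] + z_hist)[:q]
    return total

print(conditional_loglik([1.0, 0.5], [0.5], [], 0.0, 2.0, 0.5))
```

In practice this function would be handed to a numerical optimiser or used to monitor the EM-type iterations described next.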

The SMN densities in the pdf (2) are complex, so finding the maximum-likelihood (ML) estimates of the model parameters directly from (7) is not tractable. However, Lemma 2.1 yields a convenient hierarchical form of the TP–SMN family which, combined with the proposed ARMA model, allows an EM-type algorithm to be used to estimate the parameters.

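Considering Lemma 2.1 and the stochastic representation of the SMN family (ref. [43]), let $\mathbf{D} = (\mathbf{X}, \mathbf{U}, \mathbf{S})^T$ be the complete data, where $\mathbf{X}$ are the observations and $\mathbf{U} = (U_1, \ldots, U_n)^T$ and $\mathbf{S}_t = (S_{t1}, S_{t2})^T$, $t = 1, \ldots, n$, are the missing (latent) data. The TP–SMN–ARMA model defined by (1) and (5) has the following hierarchical representation:

$$\begin{aligned} X_t \mid \mathbf{x}_{t-1}, \mathbf{z}_{t-1}, U_t = u_t, S_{ti} = 1 &\sim N\left(\boldsymbol{\alpha}^T \mathbf{x}_{t-1} + \boldsymbol{\eta}^T \mathbf{z}_{t-1} + \mu,\ u_t^{-1} \sigma_i^2\right) I_{A_t}(x_t)^{2-i}\, I_{A_t^c}(x_t)^{i-1}, \\ U_t \mid S_{ti} = 1 &\sim H(u_t \mid \nu), \\ \mathbf{S}_t &\sim Multinomial\left(1, \frac{\sigma_1}{\sigma_1 + \sigma_2}, \frac{\sigma_2}{\sigma_1 + \sigma_2}\right), \end{aligned} \tag{11}$$

for $t = 1, \ldots, n$ and $i = 1, 2$, where $A_t = (-\infty,\ \boldsymbol{\alpha}^T \mathbf{x}_{t-1} + \boldsymbol{\eta}^T \mathbf{z}_{t-1} + \mu)$ and $N(\cdot)\, I_A(\cdot)$ is the normal distribution truncated to $A$.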

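The hierarchical form of the TP–SMN–ARMA process given in (11) and the ECME algorithm, which is a generalization of the EM algorithm [44], are used to find the ML estimates. For the proposed TP–SMN–ARMA(p, q) model and (11), ignoring constants, the conditional log-likelihood function of the complete data is

$$cl(\boldsymbol{\Theta}) = -n \log(\sigma_1 + \sigma_2) - \frac{1}{2} \sum_{t=1}^{n} \sum_{i=1}^{2} S_{ti} U_t \left( \frac{X_t - \boldsymbol{\alpha}^T \mathbf{x}_{t-1} - \boldsymbol{\eta}^T \mathbf{z}_{t-1} - \mu}{\sigma_i} \right)^2 + \sum_{t=1}^{n} \sum_{i=1}^{2} S_{ti} \log h(U_t \mid \nu), \tag{12}$$

where $\boldsymbol{\Theta} = (\boldsymbol{\alpha}, \boldsymbol{\eta}, \mu, \sigma_1, \sigma_2, \nu)^T$.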

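Remark 3.2

The conditional expectations $\hat{s}_{t1} = E[S_{t1} \mid \hat{\boldsymbol{\Theta}}, \mathbf{X}] = I_{(-\infty,\ \hat{\boldsymbol{\alpha}}^T \mathbf{x}_{t-1} + \hat{\boldsymbol{\eta}}^T \mathbf{z}_{t-1} + \hat{\mu}]}(x_t)$ and $\hat{s}_{t2} = 1 - \hat{s}_{t1}$, and $\hat{w}_{ti} = E[U_t S_{ti} \mid \hat{\boldsymbol{\Theta}}, \mathbf{X}] = \hat{\kappa}_{ti} \hat{s}_{ti}$ with $\hat{\kappa}_{ti} = E[U_t \mid \hat{\boldsymbol{\Theta}}, \mathbf{X}, S_{ti} = 1]$, $t = 1, \ldots, n$, $i = 1, 2$, for the TP–SMN–ARMA members are as follows:

  • TP–N–ARMA model: $\hat{\kappa}_{ti} = 1$,

  • TP–T–ARMA model: $\hat{\kappa}_{ti} = \dfrac{\hat{\nu} + 1}{\hat{\nu} + d_{ti}}$,

  • TP–SL–ARMA model: $\hat{\kappa}_{ti} = \dfrac{2\hat{\nu} + 1}{d_{ti}} \cdot \dfrac{P_1(\hat{\nu} + 3/2,\ d_{ti}/2)}{P_1(\hat{\nu} + 1/2,\ d_{ti}/2)}$,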

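  • TP–CN–ARMA model: $\hat{\kappa}_{ti} = \dfrac{\hat{\tau}^2 \hat{\nu}\, e^{-\hat{\tau} d_{ti}/2} + (1 - \hat{\nu})\, e^{-d_{ti}/2}}{\hat{\tau} \hat{\nu}\, e^{-\hat{\tau} d_{ti}/2} + (1 - \hat{\nu})\, e^{-d_{ti}/2}}$,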

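where $d_{ti} = (x_t - \hat{\boldsymbol{\alpha}}^T \mathbf{x}_{t-1} - \hat{\boldsymbol{\eta}}^T \mathbf{z}_{t-1} - \hat{\mu})^2 / \hat{\sigma}_i^2$, and $P_x(a, b)$ is the cumulative distribution function of the Gamma(a, b) distribution at $x$.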
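The weights in this remark are simple to compute for the normal and Student-t members. The Python sketch below shows one way to do so for a single observation; it takes the residual x_t − α'x_{t−1} − η'z_{t−1} − μ as input, so the side indicator reduces to the sign of that residual. The function name `e_step_weights` and the convention `nu=None` for the normal member are illustrative assumptions, and the slash and contaminated-normal members are deliberately left out.

```python
def e_step_weights(resid, sigma1, sigma2, nu=None):
    """Return (s1, s2) and (w1, w2) for one residual.

    resid is x_t minus the fitted conditional location; nu=None selects the
    normal member (kappa = 1), otherwise the Student-t weight (nu+1)/(nu+d).
    """
    s1 = 1.0 if resid <= 0.0 else 0.0    # observation falls on the left piece
    s2 = 1.0 - s1
    weights = []
    for s, sig in ((s1, sigma1), (s2, sigma2)):
        d = (resid / sig) ** 2           # squared standardised residual d_ti
        kappa = 1.0 if nu is None else (nu + 1.0) / (nu + d)
        weights.append(kappa * s)
    return (s1, s2), tuple(weights)

print(e_step_weights(-2.0, 1.0, 3.0, nu=3.0))
print(e_step_weights(0.5, 1.0, 3.0))
```

Large standardised residuals receive small weights in the t case, which is what makes the subsequent weighted least-squares updates robust to outliers.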

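The function $Q(\boldsymbol{\Theta} \mid \hat{\boldsymbol{\Theta}}^{(k)}) = E[cl(\boldsymbol{\Theta}) \mid \hat{\boldsymbol{\Theta}}^{(k)}, \mathbf{X}]$ must be maximized. At the (k+1)th iteration, the E-step of the ECME algorithm is as follows:

$$Q(\boldsymbol{\Theta} \mid \hat{\boldsymbol{\Theta}}^{(k)}) = -n \log(\sigma_1 + \sigma_2) - \frac{1}{2} \sum_{t=1}^{n} \sum_{i=1}^{2} \hat{w}_{ti}^{(k)} \left( \frac{X_t - \boldsymbol{\alpha}^T \mathbf{x}_{t-1} - \boldsymbol{\eta}^T \mathbf{z}_{t-1} - \mu}{\sigma_i} \right)^2 + \sum_{t=1}^{n} \sum_{i=1}^{2} E\left[S_{ti} \log h(U_t \mid \nu) \mid \hat{\boldsymbol{\Theta}}^{(k)}, \mathbf{X}\right],$$

where $\hat{w}_{ti}^{(k)} = \hat{\kappa}_{ti}^{(k)} \hat{s}_{ti}^{(k)}$ is obtained from the conditional expectations given in the preceding remark.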

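The CM-steps of the ECME algorithm are as follows:

$$\begin{aligned} \hat{\boldsymbol{\alpha}}^{(k+1)} &= \left( \sum_{t=1}^{n} \hat{\zeta}_t^{(k)} \mathbf{x}_{t-1} \mathbf{x}_{t-1}^T \right)^{-1} \sum_{t=1}^{n} \hat{\zeta}_t^{(k)} \left( X_t - \hat{\boldsymbol{\eta}}^{T(k)} \mathbf{z}_{t-1} - \hat{\mu}^{(k)} \right) \mathbf{x}_{t-1}, \\ \hat{\boldsymbol{\eta}}^{(k+1)} &= \left( \sum_{t=1}^{n} \hat{\zeta}_t^{(k)} \mathbf{z}_{t-1} \mathbf{z}_{t-1}^T \right)^{-1} \sum_{t=1}^{n} \hat{\zeta}_t^{(k)} \left( X_t - \hat{\boldsymbol{\alpha}}^{T(k+1)} \mathbf{x}_{t-1} - \hat{\mu}^{(k)} \right) \mathbf{z}_{t-1}, \\ \hat{\mu}^{(k+1)} &= \frac{\sum_{t=1}^{n} \hat{\zeta}_t^{(k)} \left( X_t - \hat{\boldsymbol{\alpha}}^{T(k+1)} \mathbf{x}_{t-1} - \hat{\boldsymbol{\eta}}^{T(k+1)} \mathbf{z}_{t-1} \right)}{\sum_{t=1}^{n} \hat{\zeta}_t^{(k)}}, \end{aligned}$$

where $\hat{\zeta}_t^{(k)} = \sum_{i=1}^{2} \hat{w}_{ti}^{(k)} / \sigma_i^{2(k)}$.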

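Following these CM-steps, the updates $\hat{\sigma}_i^{(k+1)}$, $i = 1, 2$, are obtained by solving the depressed cubic equations $\sigma_i^3 + p\sigma_i + q = 0$, $i = 1, 2$, where $p = -\frac{1}{n} \sum_{t=1}^{n} \hat{w}_{ti}^{(k)} \left( X_t - \hat{\boldsymbol{\alpha}}^{T(k+1)} \mathbf{x}_{t-1} - \hat{\boldsymbol{\eta}}^{T(k+1)} \mathbf{z}_{t-1} - \hat{\mu}^{(k+1)} \right)^2$ and $q = p\sigma_2 I(i = 1) + p\sigma_1 I(i = 2)$. Since $p < 0$ and $q < 0$, the coefficient sequence $(1, p, q)$ changes sign exactly once, so by Descartes' rule of signs each equation has exactly one root in $(0, +\infty)$.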
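The scale update therefore reduces to finding the single positive root of a cubic with no quadratic term. The Python sketch below does this by bisection, which is one simple option among several (a closed-form Cardano solution or Newton's method would also work); the function names, the tolerance and the helper `update_sigma` are illustrative and not taken from the paper.

```python
def positive_root(p, q, tol=1e-12):
    """Unique positive root of s**3 + p*s + q = 0 when p < 0 and q < 0 (bisection)."""
    if not (p < 0 and q < 0):
        raise ValueError("expected p < 0 and q < 0")

    def f(s):
        return s ** 3 + p * s + q

    lo, hi = 0.0, 1.0
    while f(hi) < 0:                     # f(0) = q < 0 and f grows without bound
        hi *= 2.0
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def update_sigma(weighted_sq_resid_sum, n, other_sigma):
    """One scale update: p = -(1/n) * sum_t w_ti * resid_t**2, q = p * other_sigma."""
    p = -weighted_sq_resid_sum / n
    return positive_root(p, p * other_sigma)

print(positive_root(-3.0, -2.0))   # s**3 - 3s - 2 = (s - 2)(s + 1)**2
```

Because each equation for σ1 involves the current value of σ2 and vice versa, the two updates are applied in turn within each iteration.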

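Finally, the CML-step of the ECME algorithm is as follows:

$$\nu^{(k+1)} = \arg\max_{\nu}\ l\left( \hat{\boldsymbol{\alpha}}^{T(k+1)}, \hat{\boldsymbol{\eta}}^{T(k+1)}, \hat{\mu}^{(k+1)}, \hat{\sigma}_1^{(k+1)}, \hat{\sigma}_2^{(k+1)}, \nu \right).$$

The algorithm is iterated until the convergence condition $\left| l(\hat{\boldsymbol{\Theta}}^{(k+1)}) / l(\hat{\boldsymbol{\Theta}}^{(k)}) - 1 \right| \le \varepsilon$ is satisfied, where $\varepsilon$ is a known, fixed tolerance.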

4. Modeling the confirmed cases and death rate of coronavirus

4.1. Confirmed cases COVID-19 data in the world

The coronavirus (COVID-19) has spread to about 203 countries of the world. Daily data on COVID-19 in the world are reported by the China National Health Commission (NHC) and the World Health Organization (WHO). In this part we fit the time series models described above to the total confirmed cases in the world, including and excluding China, from 22-Jan-2020 up to 08-Apr-2020.

Time series plots of the total and daily confirmed cases in the world from 22-Jan-2020 up to 08-Apr-2020, and of the stationary series obtained by differencing of order 3 (i.e. $\nabla^3 X_t = X_t - 3X_{t-1} + 3X_{t-2} - X_{t-3}$), are given in Fig. 1 and Fig. 2, respectively. The Dickey–Fuller test on the differenced series gives a p-value of 0.01 against the alternative hypothesis of stationarity, so the unit-root null hypothesis is rejected.
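Third-order differencing applies the first-difference operator three times, which expands to X_t − 3X_{t−1} + 3X_{t−2} − X_{t−3}. The short Python sketch below shows both forms on a toy series (a cubic in t, chosen here only because its third difference is constant); the function name `difference` is illustrative.

```python
def difference(series, order=1):
    """Apply the first-difference operator `order` times."""
    out = list(series)
    for _ in range(order):
        out = [b - a for a, b in zip(out, out[1:])]
    return out

toy = [t ** 3 for t in range(8)]         # stand-in for a cumulative case count
d3 = difference(toy, order=3)

# Same result from the expanded formula X_t - 3X_{t-1} + 3X_{t-2} - X_{t-3}
expanded = [toy[t] - 3 * toy[t - 1] + 3 * toy[t - 2] - toy[t - 3]
            for t in range(3, len(toy))]
print(d3 == expanded)
```

Each round of differencing shortens the series by one observation, so the third difference of n observations has n − 3 values.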

Fig. 1. Time series plot of the total confirmed cases of COVID-19 in the world from 22-Jan up to 08-Apr of 2020.

Fig. 2. Stationary time series plot of the COVID-19 in the world (differenced with order three).

Clearly, the number of cases (total and daily) on any day depends on the number of cases on the previous day(s), so the ARMA model can be a suitable model for the COVID-19 case data.

Two well-known model selection criteria are the Akaike information criterion ($AIC = 2k - 2l(\hat{\boldsymbol{\Theta}})$; ref. [45]) and the Bayesian information criterion ($BIC = k \log n - 2l(\hat{\boldsymbol{\Theta}})$; ref. [46]), where $k$ is the number of parameters estimated in the fitted model and $n$ is the sample size. These criteria were used to choose the best TP–SMN–ARMA model and its orders. Together with the partial autocorrelation function (PACF) in Fig. 3, they indicate that the following TP–T–ARMA(7,0) model is the best:

$$X_t + 0.8994X_{t-1} + 0.9817X_{t-2} + 0.9336X_{t-3} + 0.7858X_{t-4} + 0.6506X_{t-5} + 0.4597X_{t-6} + 0.2662X_{t-7} = Z_t,$$

where

$$\{Z_t\} \sim TP\text{-}T(\mu = 8.847374,\ \sigma = 2766.178,\ \gamma = 0.5362869,\ \nu = 2.100046).$$
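Both selection criteria are simple functions of the maximised log-likelihood, the number of estimated parameters k and the sample size n. The Python sketch below computes them and picks the candidate with the smallest value; the candidate names and log-likelihood values are made-up placeholders for illustration, not results from this study.

```python
import math

def aic(loglik, k):
    """Akaike information criterion: 2k - 2*loglik."""
    return 2.0 * k - 2.0 * loglik

def bic(loglik, k, n):
    """Bayesian information criterion: k*log(n) - 2*loglik."""
    return k * math.log(n) - 2.0 * loglik

# Hypothetical candidates: (label, maximised log-likelihood, number of parameters)
candidates = [("model A", -120.0, 4), ("model B", -100.0, 9), ("model C", -99.5, 12)]
n_obs = 75
best_by_aic = min(candidates, key=lambda c: aic(c[1], c[2]))
best_by_bic = min(candidates, key=lambda c: bic(c[1], c[2], n_obs))
print(best_by_aic[0], best_by_bic[0])
```

BIC penalises additional parameters more heavily than AIC once log n exceeds 2, so the two criteria can disagree when the candidates differ mainly in order.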

Fig. 3. PACF of the stationary transformed total COVID-19 data in the world.

The histogram of the estimated errors (residuals), with the estimated TP–T density (nearly symmetric but heavy-tailed) superimposed on it, shows that the estimated model performs well on the COVID-19 data (Fig. 4). To further demonstrate the good fit of the model, we removed the last 10 observations (2020-Mar-30 up to 2020-Apr-08), then refitted the TP–SMN–ARMA model and forecast these values. Fig. 5, Fig. 6 and Table 1 show that the forecasts are close to the real values of the COVID-19 data in the world. Table 1 contains the predictions and their 98% confidence intervals.

Fig. 4. Histogram of the residuals of the fitted time series model on COVID-19 data in the world with superimposed estimated TP–T density.

Fig. 5. Time series plot of real values and predicted COVID-19 data from 2020-Mar-30 up to 2020-Apr-08 with 98% confidence interval.

Fig. 6. Time series plot of COVID-19 data and predicted data from 2020-Mar-30 up to 08-Apr of 2020.

Table 1. The real values of the COVID-19 in the world data from 2020-Mar-30 up to 2020-Apr-08 with predictions and 98% confidence interval.

| Date | Real value | Prediction | Lower | Upper |
|---|---|---|---|---|
| 2020-Mar-30 | 785,828 | 783,114 | 776,624 | 789,937 |
| 2020-Mar-31 | 859,620 | 852,651 | 845,272 | 859,197 |
| 2020-Apr-01 | 936,637 | 937,797 | 930,885 | 944,428 |
| 2020-Apr-02 | 1,016,734 | 1,016,045 | 1,008,173 | 1,022,633 |
| 2020-Apr-03 | 1,118,414 | 1,101,645 | 1,093,850 | 1,108,143 |
| 2020-Apr-04 | 1,203,235 | 1,223,923 | 1,215,528 | 1,230,375 |
| 2020-Apr-05 | 1,274,653 | 1,286,735 | 1,277,745 | 1,295,487 |
| 2020-Apr-06 | 1,348,564 | 1,348,163 | 1,338,874 | 1,357,682 |
| 2020-Apr-07 | 1,430,981 | 1,426,889 | 1,417,614 | 1,435,226 |
| 2020-Apr-08 | 1,518,023 | 1,520,874 | 1,511,512 | 1,529,308 |

The mean absolute percentage error (MAPE) index, given by

$$MAPE = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{\hat{X}_i - X_i}{X_i} \right|,$$

where $\hat{X}_{n+1} = E(X_{n+1} \mid X_n, \ldots, X_1)$, and reported as a percentage, is then used to evaluate the accuracy of the predictions. For the proposed predictions it is 0.60%, which shows the suitability of the proposed model for prediction. For the ordinary Gaussian–ARMA model (the simplest TP–SMN–ARMA member) this criterion is 0.89%. The AIC and BIC for the best fitted TP–SMN–ARMA model are 1290.49 and 1298.02, and for the best fitted Gaussian–ARMA model they are 1524.14 and 1544.12, respectively.
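The reported MAPE can be reproduced directly from the real values and predictions listed in Table 1. The following Python sketch does so; the function name `mape` is illustrative, and the two lists are simply the "Real value" and "Prediction" columns of Table 1.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, returned as a fraction (multiply by 100 for %)."""
    if len(actual) != len(predicted):
        raise ValueError("series must have the same length")
    return sum(abs((p - a) / a) for a, p in zip(actual, predicted)) / len(actual)

# Table 1: total confirmed cases, 2020-Mar-30 to 2020-Apr-08
real = [785828, 859620, 936637, 1016734, 1118414,
        1203235, 1274653, 1348564, 1430981, 1518023]
pred = [783114, 852651, 937797, 1016045, 1101645,
        1223923, 1286735, 1348163, 1426889, 1520874]

value = mape(real, pred)
print(f"MAPE = {100 * value:.2f}%")
```

The largest individual relative errors in this window occur on 2020-Apr-03 and 2020-Apr-04, which are also the two dates on which the real value falls outside the reported interval.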

Finally, the p-value of 0.972 from the Box–Pierce test and the p-value of 0.931 from the Ljung–Box test indicate that the residuals are independent. The autocorrelation function (ACF) plot of the residuals in Fig. 7 also shows the suitability of the TP–T–ARMA(7,0) model for the total confirmed cases of the COVID-19 dataset.
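Both portmanteau tests are based on the sample autocorrelations of the residuals: the Box–Pierce statistic is n Σ ρ_k² and the Ljung–Box statistic is n(n+2) Σ ρ_k²/(n−k), summed over lags k = 1, …, h. The Python sketch below computes the two statistics only; converting them to p-values requires a chi-square tail probability (for example `scipy.stats.chi2.sf`, if SciPy is available), and the choice of the number of lags and of the degrees of freedom is left to the analyst. Function names are illustrative.

```python
def autocorr(x, k):
    """Sample autocorrelation of x at lag k."""
    n = len(x)
    mean = sum(x) / n
    c0 = sum((v - mean) ** 2 for v in x)
    ck = sum((x[t] - mean) * (x[t + k] - mean) for t in range(n - k))
    return ck / c0

def portmanteau(x, h):
    """Return (Box-Pierce, Ljung-Box) statistics using lags 1..h."""
    n = len(x)
    rho = [autocorr(x, k) for k in range(1, h + 1)]
    box_pierce = n * sum(r * r for r in rho)
    ljung_box = n * (n + 2) * sum(r * r / (n - k) for k, r in enumerate(rho, start=1))
    return box_pierce, ljung_box

toy_residuals = [1.0, -1.0, 1.0, -1.0]   # strongly alternating toy example
print(portmanteau(toy_residuals, 1))
```

Small statistics (large p-values), as reported above, mean there is no evidence of remaining autocorrelation in the residuals.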

Fig. 7. ACF of the residuals of fitted time series model to total COVID-19 in the world data.

4.2. Death rate of COVID-19 data

In this section we model the death rate of COVID-19 in the world from 02-Feb-2020 up to 08-Apr-2020. These daily data are also reported by the China National Health Commission (NHC) and the World Health Organization (WHO).

Time series plots of the death rate of coronavirus in the world from 02-Feb-2020 up to 08-Apr-2020, and of the stationary series obtained by differencing of order 3 (i.e. $\nabla^3 X_t = X_t - 3X_{t-1} + 3X_{t-2} - X_{t-3}$), are given in Fig. 8 and Fig. 9, respectively. The Dickey–Fuller test gives a p-value of 0.01, which demonstrates the stationarity of the differenced data.

Fig. 8. Time series plot of the death rate of COVID-19 in the world from 02-Feb-2020 up to 08-Apr-2020.

Fig. 9. Stationary time series plot of the death rate of COVID-19 in the world (differenced with order three).

Using the same model selection criteria and methodology as for the previous dataset, the best TP–SMN–ARMA model with the best fitted orders is TP–T–ARMA(7,1). The PACF given in Fig. 10 supports this choice. Therefore the following TP–SMN–ARMA model is the best:

$$X_t + 1.3760X_{t-1} + 1.4183X_{t-2} + 1.1401X_{t-3} + 0.9269X_{t-4} + 0.6482X_{t-5} + 0.3181X_{t-6} + 0.1752X_{t-7} = Z_t - 0.0628Z_{t-1},$$

where

$$\{Z_t\} \sim TP\text{-}T(\mu = 0.056836,\ \sigma = 0.2664454,\ \gamma = 0.297544,\ \nu = 2.826561).$$

Fig. 10. PACF of the stationary transformed death rate of COVID-19 data in the world.

The histogram of the estimated errors (residuals), with the estimated TP–T density (heavy-tailed and asymmetric) superimposed on it, shows that the estimated model performs well on the death rate of COVID-19 in the world (Fig. 11). As for the previous dataset, we removed the last 10 observations (2020-Mar-30 up to 2020-Apr-08), then refitted the TP–SMN–ARMA model and forecast these values.

Fig. 11. Histogram of the residuals of the fitted time series model on the death rate of COVID-19 in the world data with superimposed estimated TP–T density.

Figs. 12 and 13 and Table 2 show that the forecasts are close to the real values of the death rate of COVID-19 in the world. Table 2 contains the predictions and their 98% confidence intervals.

Fig. 12. Time series plot of real values and predicted death rate of COVID-19 in the world data from 2020-Mar-30 up to 2020-Apr-08 with 98% confidence interval.

Fig. 13. Time series plot of death rate of COVID-19 in the world data and predicted data from 2020-Mar-30 up to 2020-Apr-08.

Table 2. The real values of the death rate of COVID-19 in the world data from 2020-Mar-30 up to 2020-Apr-08 with predictions and 98% confidence interval.

| Date | Real value | Prediction | Lower | Upper |
|---|---|---|---|---|
| 2020-Mar-30 | 18.59 | 19.00 | 18.55 | 19.39 |
| 2020-Mar-31 | 19.19 | 18.75 | 18.29 | 19.17 |
| 2020-Apr-01 | 19.55 | 19.45 | 18.98 | 19.89 |
| 2020-Apr-02 | 20.03 | 19.87 | 19.41 | 20.31 |
| 2020-Apr-03 | 20.48 | 20.32 | 19.86 | 20.76 |
| 2020-Apr-04 | 20.79 | 20.97 | 20.51 | 21.41 |
| 2020-Apr-05 | 20.86 | 21.16 | 20.70 | 21.60 |
| 2020-Apr-06 | 21.13 | 20.86 | 20.41 | 21.31 |
| 2020-Apr-07 | 21.35 | 21.12 | 20.68 | 21.59 |
| 2020-Apr-08 | 21.12 | 21.51 | 21.09 | 22.00 |

The MAPE for these predictions is 1.30%, which demonstrates the suitability of the proposed model for prediction. For the ordinary Gaussian–ARMA model (the simplest TP–SMN–ARMA member) this criterion is 1.70%. The AIC and BIC for the best fitted TP–SMN–ARMA model are 4.42 and 18.05, and for the best fitted Gaussian–ARMA model they are 76.68 and 95.07, respectively.

Finally, the p-value of 0.974 from the Box–Pierce test and the p-value of 0.873 from the Ljung–Box test indicate the independence of the residuals. The ACF plot of the residuals in Fig. 14 also demonstrates the suitability of the TP–T–ARMA(7,1) model for the death rate of COVID-19 in the world dataset.

Fig. 14. ACF of the residuals of the fitted time series model to the death rate of COVID-19 in the world data.

5. Conclusion

Coronaviruses are a large family of viruses that affect the neurological, gastrointestinal, hepatic and respiratory systems. The number of confirmed cases is increasing daily in many countries, especially in China, Iran, South Korea, Italy and others. The spread of COVID-19 poses many dangers and requires strict plans and policies. To design such plans and policies, forecasting the future number of confirmed cases is critical. Time series models are useful for modeling data that are gathered and indexed by time. Classical time series models are based on the symmetry of the error distribution, but there are many real-world situations in which the assumption of symmetrically distributed error terms is not satisfied. In our methodology, we considered time series models based on the two-piece scale mixture of normal (TP–SMN) distributions. The proposed time series models were first fitted to the historical COVID-19 datasets. Then, the model with the best fit to each dataset was selected. Finally, the selected models were applied to forecast the number of confirmed COVID-19 cases. The results indicate that the introduced approach performs well in forecasting future confirmed COVID-19 cases. All criteria also show that the proposed models are more suitable than the ordinary Gaussian time series model (which is also the simplest member of the proposed family). A sample copy of the code is available from the authors upon request.

Funding

No fund.

Declaration of Competing Interest

The authors declare no conflict of interest.

References

  • 1. Chen Y., Liu Q., Guo D. Emerging coronaviruses: genome structure, replication, and pathogenesis. J Med Virol. 2020. doi: 10.1002/jmv.25681.
  • 2. Ge X.Y., Li J.L., Yang X.L., Chmura A.A., Zhu G., Epstein J.H., Mazet J.K., Hu B., Zhang W., Peng C. Isolation and characterization of a bat SARS-like coronavirus that uses the ACE2 receptor. Nature. 2013;503:535–538. doi: 10.1038/nature12711.
  • 3. Wang L.F., Shi Z., Zhang S., Field H., Daszak P., Eaton B.T. Review of bats and SARS. Emerg Infect Dis. 2006;12:1834. doi: 10.3201/eid1212.060401.
  • 4. Cauchemez S., Van Kerkhove M., Riley S., Donnelly C., Fraser C., Ferguson N. Transmission scenarios for Middle East Respiratory Syndrome Coronavirus (MERS-CoV) and how to tell them apart. Euro Surveill Bull Eur Sur Les Mal Transm Eur Commun Dis Bull. 2013;18:20503.
  • 5. World Health Organization. Novel Coronavirus (2019-nCoV). 2020. Available online: https://www.who.int/ (accessed on 27 January 2020).
  • 6. Lu R., Zhao X., Li J., Niu P., Yang B., Wu H., Wang W., Song H., Huang B., Zhu N. Genomic characterisation and epidemiology of 2019 novel coronavirus: implications for virus origins and receptor binding. Lancet. 2020;395:565–574. doi: 10.1016/S0140-6736(20)30251-8.
  • 7. Cheng Z.J., Shan J. 2019 Novel Coronavirus: where we are and what we know. Infection. 2020. doi: 10.1007/s15010-020-01401-y.
  • 8. Guan W.J., Ni Z.Y., Hu Y., Liang W.H., Ou C.Q., He J.X., Liu L., Shan H., Lei C.L., Hui D.S. Clinical characteristics of 2019 novel coronavirus infection in China. medRxiv. 2020. doi: 10.1101/2020.02.06.20020974. Available online: https://www.medrxiv.org/content/early/2020/02/09/2020.02.06.20020974.full.pdf (accessed on 9 February 2020).
  • 9. Zhao S., Musa S.S., Lin Q., Ran J., Yang G., Wang W., Lou Y., Yang L., Gao D., He D. Estimating the unreported number of Novel Coronavirus (2019-nCoV) cases in China in the first half of January 2020: a data-driven modelling analysis of the early outbreak. J Clin Med. 2020;9:388. doi: 10.3390/jcm9020388.
  • 10. Nishiura H., Kobayashi T., Yang Y., Hayashi K., Miyama T., Kinoshita R., Linton N.M., Jung S.m., Yuan B., Suzuki A. The rate of underascertainment of Novel Coronavirus (2019-nCoV) infection: estimation using Japanese passengers data on evacuation flights. J Clin Med. 2020;9:419. doi: 10.3390/jcm9020419.
  • 11. Tang B., Wang X., Li Q., Bragazzi N.L., Tang S., Xiao Y., Wu J. Estimation of the transmission risk of the 2019-nCoV and its implication for public health interventions. J Clin Med. 2020;9:462. doi: 10.3390/jcm9020462.
  • 12. Thompson R.N. Novel Coronavirus outbreak in Wuhan, China, 2020: intense surveillance is vital for preventing sustained transmission in new locations. J Clin Med. 2020;9:498. doi: 10.3390/jcm9020498.
  • 13. Jung S.M., Akhmetzhanov A.R., Hayashi K., Linton N.M., Yang Y., Yuan B. Real time estimation of the risk of death from novel coronavirus (2019-nCoV) infection: inference using exported cases. J Clin Med. 2020;9:523. doi: 10.3390/jcm9020523.
  • 14. Al-qaness M.A.A., Ewees A.A., Fan H., Abd El Aziz M. Optimization method for forecasting confirmed cases of COVID-19 in China. J Clin Med. 2020;9:674. doi: 10.3390/jcm9030674.
  • 15. DeFelice N.B., Little E., Campbell S.R., Shaman J. Ensemble forecast of human West Nile virus cases and mosquito infection rates. Nat Commun. 2017;8:1–6. doi: 10.1038/ncomms14592.
  • 16. Ture M., Kurt I. Comparison of four different time series methods to forecast hepatitis A virus infection. Expert Syst Appl. 2006;31:41–46.
  • 17. Shaman J., Karspeck A. Forecasting seasonal outbreaks of influenza. Proc Natl Acad Sci USA. 2012;109:20425–20430. doi: 10.1073/pnas.1208772109.
  • 18. Shaman J., Karspeck A., Yang W., Tamerius J., Lipsitch M. Real-time influenza forecasts during the 2012–2013 season. Nat Commun. 2013;4:1–10. doi: 10.1038/ncomms3837.
  • 19. Shaman J., Yang W., Kandula S. Inference and forecast of the current West African Ebola outbreak in Guinea, Sierra Leone and Liberia. PLoS Curr. 2014;6. doi: 10.1371/currents.outbreaks.3408774290b1a0f2dd7cae877c8b8ff6.
  • 20. Massad E., Burattini M.N., Lopez L.F., Coutinho F.A. Forecasting versus projection models in epidemiology: the case of the SARS epidemics. Med Hypotheses. 2005;65:17–22. doi: 10.1016/j.mehy.2004.09.029.
  • 21. Ong J.B.S., Mark I., Chen C., Cook A.R., Lee H.C., Lee V.J. Real-time epidemic monitoring and forecasting of H1N1-2009 using influenza-like illness from general practice and family doctor clinics in Singapore. PLoS ONE. 2010;5. doi: 10.1371/journal.pone.0010036.
  • 22. Nah K., Otsuki S., Chowell G., Nishiura H. Predicting the international spread of Middle East respiratory syndrome (MERS). BMC Infect Dis. 2016;16:356. doi: 10.1186/s12879-016-1675-z.
  • 23. Mahmoudi M.R., Maleki M., Pak A. Testing the difference between two independent time series models. Iran J Sci Technol A (Sciences). 2017;41:665–669.
  • 24. Mahmoudi M.R., Maleki M. A new method to detect periodically correlated structure. Comput Stat. 2017;32:1569–1581.
  • 25. Maleki M., Arellano-Valle R.B. Maximum a-posteriori estimation of autoregressive processes based on finite mixtures of scale-mixtures of skew-normal distributions. J Stat Comput Sim. 2017;87:1061–1083.
  • 26. Maleki M., Nematollahi A.R. Autoregressive models with mixture of scale mixtures of Gaussian innovations. Iran J Sci Technol A (Sciences). 2017;41:1099–1107.
  • 27. Zarrin P., Maleki M., Khodadadi Z., Arellano-Valle R.B. Time series process based on the unrestricted skew normal process. J Stat Comput Sim. 2018;89:38–51.
  • 28. Maleki M., Arellano-Valle R.B., Dey D.K., Mahmoudi M.R., Jalali S.M. A Bayesian approach to robust skewed autoregressive process. Calcutta Statistical Association Bulletin. 2018;69:165–182.
  • 29. Hajrajabi A., Maleki M. Nonlinear semiparametric autoregressive model with finite mixtures of scale mixtures of skew normal innovations. J Appl Stat. 2019;46:2010–2029.
  • 30. Maleki M., Wraith D., Mahmoudi M.R., Contreras-Reyes J.E. Asymmetric heavy-tailed vector auto-regressive processes with application to financial data. J Stat Comput Sim. 2020;90:324–340.
  • 31. Ghasami S., Khodadadi Z., Maleki M. Autoregressive processes with generalized hyperbolic innovations. Commun Stat Simul Comput. 2018. doi: 10.1080/03610918.2018.1535066.
  • 32. Ghasami S., Maleki M., Khodadadi Z. Leptokurtic and Platykurtic class of robust symmetrical and asymmetrical time series models. J Comput Appl Math. 2020. doi: 10.1016/j.cam.2020.112806.
  • 33. Arellano-Valle R.B., Gómez H., Quintana F.A. Statistical inference for a general class of asymmetric distributions. J Stat Plan Infer. 2005;128:427–443.
  • 34. Maleki M., Mahmoudi M.R. Two-piece location-scale distributions based on scale mixtures of normal family. Commun Stat Theory Methods. 2017;46:12356–12369.
  • 35. Moravveji M., Khodadadi Z., Maleki M. A Bayesian analysis of two-piece distributions based on the scale mixtures of normal family. Iran J Sci Technol A (Sciences). 2019;43:991–1001.
  • 36. Maleki M., Mahmoudi M.R., Contreras-Reyes J.E. Robust mixture modeling based on two-piece scale mixtures of normal family. Axioms. 2019;8(2):38.
  • 37. Maleki M., Barkhordar Z., Khodadadi Z., Wraith D. A robust class of homoscedastic nonlinear regression models. J Stat Comput Sim. 2019;89:2765–2781. doi: 10.1080/02664763.2020.1854203.
  • 38. Hoseinzaseh A., Maleki M., Khodadadi Z., Contreras-Reyes J.E. The Skew-Reflected-Gompertz distribution for analyzing symmetric and asymmetric data. J Comput Appl Math. 2019;349:132–141.
  • 39. Whittle P. Hypothesis Testing in Time Series Analysis. Almquist and Wicksell; 1951.
  • 40. Box George, Jenkins Gwilym M, Reinsel Gregory C. Time series analysis: forecasting and control. 3rd ed. Prentice-Hall; 1994. ISBN 0130607746.
  • 41. Brockwell P.J., Davis R.A. Time series: theory and methods. 2nd ed. Springer; New York: 2009.
  • 42. Brockwell P.J., Davis R.A. Introduction to time series and forecasting (Springer texts in statistics). 2nd ed. Springer Science & Business Media; 2002.
  • 43. Andrews D.F., Mallows C.L. Scale mixtures of normal distributions. J R Stat Soc B. 1974;36:99–102.
  • 44. Dempster A.P., Laird N.M., Rubin D.B. Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc B. 1977;39:1–22.
  • 45. Akaike H. A new look at the statistical model identification. IEEE T Automat Contr. 1974;19:716–723.
  • 46. Schwarz G. Estimating the dimension of a model. Ann Stat. 1978;6:461–464.

Articles from Chaos, Solitons, and Fractals are provided here courtesy of Elsevier
