A new hybrid prediction model of cumulative COVID-19 confirmed data

Guohui Li; Kang Chen; Hong Yang

doi:10.1016/j.psep.2021.10.047

2021 Nov 2;157:1–19. doi: 10.1016/j.psep.2021.10.047

PMCID: PMC8560186 PMID: 34744323

Graphical Abstract

The flow chart of GVMD-ELM-ARIMA

Keywords: COVID-19, Cumulative confirmed data, Prediction, Variational mode decomposition

Abstract

Establishing an accurate and efficient prediction model is of great significance for governments and other social organizations in formulating prevention and control policies and curbing the explosive spread of the pandemic. To improve the prediction accuracy of cumulative COVID-19 confirmed data, a new hybrid prediction model based on gradient-based optimizer variational mode decomposition (GVMD), extreme learning machine (ELM), and autoregressive integrated moving average (ARIMA), named GVMD-ELM-ARIMA, is proposed. To solve the problem of selecting the mode number $k$ and the penalty factor $α$ in variational mode decomposition (VMD), this paper proposes GVMD, which determines $k$ and $α$ adaptively. Firstly, GVMD decomposes the cumulative COVID-19 confirmed data into several intrinsic mode functions (IMFs) and a residual component (IMFr). Secondly, the IMFs are predicted by ELM. Then, IMFr is predicted by ARIMA. Finally, the final prediction is obtained by summing the predictions of the IMFs and IMFr. The cumulative COVID-19 confirmed data of the United States, India and Russia are used to verify the effectiveness of the model. Taking the United States as an example, compared with the single models, the hybrid models reduce the average MAPE by 47.27%, the average RMSE by 44.50%, and the average MAE by 55.34%. Compared with GVMD-ELM-ELM, the proposed GVMD-ELM-ARIMA reduces the MAPE by 60%, the RMSE by 56.85%, and the MAE by 61.61%. The experimental results show that GVMD-ELM-ARIMA has the best prediction accuracy among the compared models and provides a new method for predicting cumulative COVID-19 confirmed data.

1. Introduction

Coronavirus disease 2019 (COVID-19), caused by a new type of RNA virus, has plunged the whole world into crisis because of its strong transmissibility. Establishing an accurate and efficient prediction model is of great significance for governments and other social organizations in formulating prevention and control policies and curbing the explosive spread of the pandemic. At present, the prediction of pandemic data has been studied extensively (Santosh, 2020a, Santosh, 2020b, Bhapkar et al., 2020, Xiang et al., 2021), and the approaches can be roughly divided into three categories: epidemic models, single prediction models, and hybrid prediction models.

Epidemic models used to study COVID-19 are mainly divided into three types: the susceptible infectious (SI) model (Cong et al., 2020), the susceptible infectious recovered (SIR) model (Cooper et al., 2020, Rafieenasab et al., 2020, Singh and Gupta, 2021), and the susceptible exposed infectious recovered (SEIR) model (Annas et al., 2020, Feng et al., 2021). Epidemic models can predict the future numbers of infected and susceptible people, but they are complex to establish, and the transmission coefficient must be obtained from a large number of experiments. Therefore, they can only roughly estimate the trend of the epidemic and cannot achieve accurate prediction.

In addition to epidemic models, single-model prediction is the most common method of studying COVID-19. Single models can be subdivided into statistical models and artificial intelligence models. Statistical models mainly include the logistic regression model (Ibrahim and Al-Najafi, 2020), the linear regression model (Ladha et al., 2020), and the multiple linear regression model (Rath et al., 2020). These statistical models train quickly, require few input factors, and are simple and easy to understand. However, their accuracy is low, and they have difficulty handling highly complex data. Unlike statistical models, single artificial intelligence models can usually process such signals well and reach the expected prediction accuracy. Therefore, some researchers have used artificial intelligence models to predict COVID-19, such as the support vector machine (SVM) (Balli, 2021), the artificial neural network (ANN) (Moftakhar et al., 2020, Hamadneh et al., 2021), and long short-term memory (LSTM) (Shyam Sunder Reddy and Padmanabha Reddy, 2020, Shahid et al., 2020). These models achieve a certain prediction effect, but all of them have limitations to varying degrees.

In order to improve the prediction accuracy of COVID-19 confirmed data and avoid the defects of single intelligent models as far as possible, some researchers have proposed hybrid models. By combining the strengths of different models, they expect to achieve a better result. Hybrid models are mainly divided into two categories: epidemic hybrid models combined with artificial intelligence, and hybrid models based on decomposition. The former combine artificial intelligence technology with an epidemic model to predict the future development of the epidemic from several aspects (Zhang et al., 2020, Zhang and Liu, 2021). Such models can optimize the parameters and improve the prediction accuracy to a certain extent. However, the nature of the epidemic model remains unchanged; after the introduction of artificial intelligence there are too many parameters, and the ability to capture the inherent characteristics of the data is limited. The latter introduce a mode decomposition algorithm that decomposes the data into components of different complexity. This kind of model can therefore better capture the inherent characteristics of the data and reduce the complexity of the original data, so as to achieve a better prediction effect (Yang et al., 2020a, Wang et al., 2020). Hybrid models based on decomposition have been applied in many fields (Shrivastava et al., 2016, Wang et al., 2017, Chu et al., 2021). In the field of COVID-19 confirmed data prediction, some researchers have proposed hybrid prediction models based on decomposition. Singh et al. (2020) combined discrete wavelet decomposition (DWD) and the autoregressive integrated moving average (ARIMA) model to predict death data in Italy, Spain, and other countries. The result shows that the prediction error after wavelet decomposition is greatly reduced and the prediction accuracy is high.
But the essence of wavelet decomposition is the Fourier transform, which cannot describe the local characteristics of the signal in the time domain, so it is not suitable for decomposing non-stationary signals. Qiang et al. (2021) proposed a hybrid prediction model for the COVID-19 outbreak based on ensemble empirical mode decomposition (EEMD). The hybrid model decomposes the real data into several components of different frequencies by EEMD, and then predicts each component by ARIMA. This hybrid model adapts well to each country and has good prediction accuracy. However, EEMD and its improved variants, such as complementary ensemble empirical mode decomposition (CEEMD) and complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN), have two shortcomings. The first is the lack of a strict mathematical foundation (Li et al., 2018). The second is the existence of modal aliasing (Jiang et al., 2019, Lang et al., 2020, Yang et al., 2021a): similar frequencies are often separated into multiple modes, which can lead to significant prediction error.

Compared with EEMD, CEEMD and CEEMDAN, variational mode decomposition (VMD) not only has a solid theoretical basis, but also overcomes modal aliasing (Dragomiretskiy and Zosso, 2014, Li et al., 2020a, Yang et al., 2021b) and can better capture the inherent characteristics of data. However, this approach is only beginning to be applied to COVID-19 confirmed data prediction. Therefore, in order to avoid the shortcomings of other decomposition algorithms and further improve the prediction accuracy of COVID-19 confirmed data, VMD is introduced in this paper. Unfortunately, the VMD mode number $k$ and penalty factor $α$ are usually set manually, which depends mainly on subjective judgment and is not convincing (Chen et al., 2017, Zheng et al., 2017). The gradient-based optimizer (GBO) is a metaheuristic algorithm proposed by Ahmadianfar et al. (2020) that has strong exploration, exploitation and convergence capabilities and can effectively avoid local optima. Therefore, this paper proposes gradient-based optimizer variational mode decomposition (GVMD) to determine the mode number $k$ and penalty factor $α$ , which improves the adaptability of VMD. Although GVMD can optimally decompose real data to obtain intrinsic mode functions (IMFs), there are still some small components that cannot be decomposed, which are called the residual component (IMFr). In previous studies, since IMFr is small, its influence is often ignored. However, in the study of COVID-19 the impact of IMFr cannot be ignored, so we follow Tang et al. (2020) in determining the treatment of IMFr. The IMFs obtained by GVMD are smooth and consist mainly of low-frequency components, whereas IMFr consists mainly of high-frequency components and fluctuates greatly.
Extreme learning machine (ELM) has the advantages of fast speed and strong versatility (Kasun et al., 2013, Chorowski et al., 2014), while ARIMA can deal with high-frequency components, has few parameters and is simple to apply. According to the different characteristics of the IMFs and IMFr, ELM is used to predict the IMFs, and ARIMA is introduced to analyze and predict IMFr to improve the prediction accuracy.

To improve prediction accuracy of cumulative COVID-19 confirmed data, a new hybrid prediction model based on gradient-based optimizer variational mode decomposition (GVMD), extreme learning machine (ELM), and autoregressive integrated moving average (ARIMA), named GVMD-ELM-ARIMA, is proposed. The main work of this paper is as follows:

(1)
At present, most of the hybrid prediction models for COVID-19 are based on the combination of epidemiology and artificial intelligence, and the prediction model of decomposition and integration is just beginning. Considering the shortcoming of EEMD and its improved decomposition algorithm, in order to improve the prediction accuracy, this paper introduces VMD to decompose the cumulative COVID-19 confirmed data.
(2)
To solve the problem of selecting the $k$ value and the penalty factor $α$ in variational mode decomposition (VMD), this paper proposes gradient-based optimizer variational mode decomposition (GVMD), which realizes the self-adaptive determination of $k$ value and $α$ value.
(3)
Although the accuracy of the hybrid model is already high, some error remains. In order to reduce the error and improve the prediction performance, this paper analyzes IMFr and introduces ARIMA to predict it, which further improves the prediction accuracy.
(4)
GVMD-ELM-ARIMA is proposed. Firstly, GVMD decomposes the cumulative COVID-19 confirmed data into several intrinsic mode functions (IMFs), and the residual component (IMFr) is obtained. Secondly, the IMFs are predicted by ELM. Then, IMFr is predicted by ARIMA. Finally, the final prediction is obtained by summing the predictions of the IMFs and IMFr. The cumulative confirmed data of the United States, India, and Russia are used for simulation experiments, and several comparative models are added. The experimental results show that GVMD-ELM-ARIMA has the best prediction accuracy among the compared models and provides a new method for predicting cumulative COVID-19 confirmed data.
(5)
The GVMD-ELM-ARIMA method is used to predict cumulative COVID-19 confirmed data in the United States, India and Russia in the future, and its effectiveness is tested.

The rest of this paper is organized as follows. In Section 2, the basic theory of VMD, GVMD, ARIMA and ELM is briefly introduced. In Section 3, the GVMD-ELM-ARIMA hybrid prediction model is proposed. In Section 4, three case studies are performed and the effectiveness of the proposed model is analyzed. Section 5 presents the multi-step prediction of COVID-19 cumulative data by GVMD-ELM-ARIMA. Section 6 is the discussion, and the last section is the conclusion. For the convenience of reading, the nomenclature used in this paper is given in Appendix A.

2. Basic theory

2.1. Variational mode decomposition

Variational mode decomposition is a signal processing method developed as an alternative to EMD, which decomposes the signal adaptively by solving a variational problem.

Firstly, each IMF component obtained by VMD is defined as a signal with adjustable frequency and amplitude:

u_{k} (t) = A_{k} (t) \cos (φ_{k} (t))

(1)

where, $u_{k} (t)$ is the kth modal component, $A_{k} (t)$ and $φ_{k} (t)$ are instantaneous amplitude and phase respectively. The principle of VMD is as follows (Yang et al., 2020b):

(1)
Hilbert transform is performed on $u_{k} (t)$ , and unilateral spectrum of modal function is obtained by establishing analytical signal:
$(δ (t) + \frac{j}{π t}) * u_{k} (t)$ (2)
(2)
Mix the analytical signal and center frequency of each IMF to realize the spectrum shift to the base band:
$[(δ (t) + \frac{j}{π t}) * u_{k} (t)] e^{- j ω_{k} t}$ (3)
(3)
Estimate the bandwidth of each IMF component.
(4)
Constraint variational model:

\begin{matrix} \min_{\{u_{k}\}, \{ω_{k}\}} \sum_{k = 1}^{K} {‖ \partial_{t} [(δ (t) + \frac{j}{π t}) * u_{k} (t)] e^{- j ω_{k} t} ‖}_{2}^{2} \\ s.t. \sum_{k = 1}^{K} u_{k} (t) = f (t) \end{matrix}

(4)

where, $\{u_{k}\}$ and $\{ω_{k}\}$ represent the set of sub-signals and their center frequencies, $K$ represents the total number of sub-signals, and $δ (t)$ is the Dirac distribution.

To solve the variational problem, the augmented Lagrange function $L (\{u_{k}\}, \{ω_{k}\}, θ)$ is used to transform the constrained problem into an unconstrained one (Chen et al., 2020), as shown in formula (5).

\begin{matrix} L (\{u_{k}\}, \{ω_{k}\}, θ) = C \sum_{k = 1}^{K} {‖ \partial_{t} [(δ (t) + \frac{j}{π t}) * u_{k} (t)] e^{- j ω_{k} t} ‖}_{2}^{2} \\ + {‖ f (t) - \sum_{k = 1}^{K} u_{k} (t) ‖}_{2}^{2} + 〈 θ (t), f (t) - \sum_{k = 1}^{K} u_{k} (t) 〉 \end{matrix}

(5)

where, $C$ represents the quadratic penalty factor (the penalty factor written as $α$ in the rest of this paper), $θ (t)$ represents the Lagrange multiplier and $f (t)$ represents the original signal.

To solve formula (5), the alternating direction method of multipliers (ADMM) is introduced. $u_{k}^{n + 1}$ , $ω_{k}^{n + 1}$ and $θ^{n + 1}$ are updated alternately, where $n$ represents the number of iterations. The specific process is as follows (Li et al., 2019):

{\hat{u}}_{k}^{n + 1} (ω) = \frac{\hat{f} (ω) - \sum_{i \neq k} {\hat{u}}_{i} (ω) + \frac{\hat{θ} (ω)}{2}}{1 + 2 C {(ω - ω_{k})}^{2}}

(6)

ω_{k}^{n + 1} = \frac{\int_{0}^{\infty} ω {| {\hat{u}}_{k} (ω) |}^{2} d ω}{\int_{0}^{\infty} {| {\hat{u}}_{k} (ω) |}^{2} d ω}

(7)

{\hat{θ}}^{n + 1} (ω) = {\hat{θ}}^{n} (ω) + τ [\hat{f} (ω) - \sum_{k = 1}^{K} {\hat{u}}_{k}^{n + 1} (ω)]

(8)

The iteration stops when the relative change of the modes falls below the accuracy threshold $e$ . The convergence condition is as follows:

\sum_{k = 1}^{K} \frac{{‖ {\hat{u}}_{k}^{n + 1} - {\hat{u}}_{k}^{n} ‖}_{2}^{2}}{{‖ {\hat{u}}_{k}^{n} ‖}_{2}^{2}} < e

(9)
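
The update loop of formulas (6) to (9) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the implementation used in this paper: it works on the one-sided FFT spectrum without boundary mirroring, it subtracts only the other modes when updating mode k (as in the original VMD formulation), and the function name `vmd_sketch`, the defaults for `tau`, `tol` and `max_iter`, and the evenly spaced initial center frequencies are all choices made for the example.

```python
import numpy as np

def vmd_sketch(f, K, C, tau=0.0, tol=1e-7, max_iter=500):
    """Simplified VMD. Returns (modes, center frequencies in cycles/sample)."""
    f = np.asarray(f, dtype=float)
    N = len(f)
    freqs = np.fft.rfftfreq(N)                  # one-sided frequency axis, 0 .. 0.5
    f_hat = np.fft.rfft(f)
    u_hat = np.zeros((K, freqs.size), dtype=complex)
    omega = (np.arange(K) + 0.5) * 0.5 / K      # assumed initial center frequencies
    theta_hat = np.zeros(freqs.size, dtype=complex)

    for _ in range(max_iter):
        u_prev = u_hat.copy()
        for k in range(K):
            # Sum of all modes except mode k.
            others = u_hat.sum(axis=0) - u_hat[k]
            # Formula (6): Wiener-filter-like update of mode k.
            u_hat[k] = (f_hat - others + theta_hat / 2) / (1 + 2 * C * (freqs - omega[k]) ** 2)
            # Formula (7): center frequency = power-weighted mean frequency.
            power = np.abs(u_hat[k]) ** 2
            omega[k] = np.sum(freqs * power) / (np.sum(power) + 1e-12)
        # Formula (8): dual ascent on the Lagrange multiplier (inactive when tau = 0).
        theta_hat = theta_hat + tau * (f_hat - u_hat.sum(axis=0))
        # Formula (9): stop when the summed relative change of the modes is small.
        num = np.sum(np.abs(u_hat - u_prev) ** 2, axis=1)
        den = np.sum(np.abs(u_prev) ** 2, axis=1) + 1e-12
        if np.sum(num / den) < tol:
            break

    modes = np.fft.irfft(u_hat, n=N, axis=1)    # back to the time domain
    return modes, omega

# Toy usage: separate two cosines with known frequencies.
t = np.arange(1000)
signal = np.cos(2 * np.pi * 0.05 * t) + np.cos(2 * np.pi * 0.2 * t)
modes, omega = vmd_sketch(signal, K=2, C=2000)
```

On this toy signal the two recovered center frequencies settle near 0.05 and 0.2 cycles per sample. A production decomposition would normally add signal mirroring at the boundaries, which this sketch omits.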

2.2. VMD optimized by gradient-based optimizer

When VMD is used for decomposition, it is necessary to determine the mode number $k$ and penalty factor $α$ (Yang et al., 2020c). The selection of $k$ has a great influence on the decomposition effect. When $k$ is too small, the original data may be insufficiently decomposed, causing large errors in the final prediction result. If $k$ is too large, the original data may be over-decomposed, which results in modal aliasing and increases the algorithm complexity. $α$ affects the initial center constraint strength of the mode and also has a great influence on the decomposition result.

In this paper, VMD optimized by the gradient-based optimizer (GBO) is proposed, named GVMD. GBO is a metaheuristic algorithm proposed in 2020 (Ahmadianfar et al., 2020). The key to GVMD lies in the selection of its fitness function. The multiscale fuzzy entropy (MFE) reflects the complexity of a signal (X.X. Zheng et al., 2017; J.D. Zheng et al., 2017): the more complex the signal is, the greater its multiscale fuzzy entropy, and vice versa. If an IMF component has strong regularity and low noise, its complexity is low and so is its multiscale fuzzy entropy. Therefore, when GBO is chosen to optimize VMD, multiscale fuzzy entropy is selected as the fitness function. After each evaluation of the fitness function, the candidate solutions are compared and updated to determine the optimal combination $[k, α]$ . The IMF components are then obtained by substituting $[k, α]$ into VMD. The search principle is briefly as follows:

(1)
Set the population size and initial position.
(2)
Calculate the multiscale fuzzy entropy corresponding to each point in the population, and get the best Xbest and worst Xworst.
(3)
To balance exploration and exploitation during the search, the gradient search rule (GSR) is introduced. The GSR value is calculated from the best point Xbest and the worst point Xworst in the population, and the position of the population is updated as follows:
$X_{n + 1} = X_{n} - GSR$ (10)
(4)
In order to make better use of the region near $X_{n}$ , the movement direction (DM) is introduced, and a new point $X1_{n}^{m}$ is generated by the value of DM. $X1_{n}^{m}$ is defined as follows:
$X1_{n}^{m} = x_{n}^{m} - GSR + DM$ (11)
(5)
At the same time, two other special points $X2_{n}^{m}$ and $X3_{n}^{m}$ are generated, and the next iteration position is obtained by using the updated $X1_{n}^{m}$ , $X2_{n}^{m}$ , $X3_{n}^{m}$ :
$x_{n}^{m + 1} = r_{a} \times (r_{b} \times X1_{n}^{m} + (1 - r_{b}) \times X2_{n}^{m}) + (1 - r_{a}) \times X3_{n}^{m}$ (12)
where, $r_{a}$ and $r_{b}$ are two random numbers in [0,1].
(6)
In order to improve the performance of the optimization algorithm on complex problems, the local escaping operator (LEO) is introduced. It is applied to the updated position together with the best and the worst points of the population, which helps the search escape local optima and accelerates convergence. The specific optimization flow chart is shown in Fig. 1.

Fig. 1 — The specific optimization flow chart.
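
The position update of formula (12) can be sketched as follows for a two-dimensional candidate $[k, α]$ . This is only an illustrative fragment, not the full GBO: the GSR, DM and LEO computations are omitted, the three candidate points are made-up values, the helper names are hypothetical, and the bounds are assumptions (the lower and upper limits for $k$ in particular are placeholders chosen for the example).

```python
import numpy as np

def combine_positions(x1, x2, x3, rng):
    """Formula (12): blend three candidate positions with random weights in [0, 1]."""
    r_a, r_b = rng.random(2)
    return r_a * (r_b * x1 + (1.0 - r_b) * x2) + (1.0 - r_a) * x3

def clip_to_bounds(x, lower, upper):
    """Keep [k, alpha] inside the search range and round k to an integer (assumed handling)."""
    x = np.clip(np.asarray(x, dtype=float), lower, upper)
    x[0] = np.round(x[0])
    return x

rng = np.random.default_rng(0)
lower = np.array([2.0, 1500.0])    # hypothetical bounds for [k, alpha]
upper = np.array([10.0, 3500.0])

# Made-up candidate points standing in for X1, X2 and X3.
x1 = np.array([3.2, 3400.0])
x2 = np.array([4.1, 2900.0])
x3 = np.array([2.7, 3100.0])

new_position = clip_to_bounds(combine_positions(x1, x2, x3, rng), lower, upper)
```

Because the weights lie in [0, 1], the blended point is a convex combination of the three candidates, so it always stays inside the box they span before clipping.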

2.3. Autoregressive integrated moving average

Autoregressive moving average (ARMA), proposed by Box and Jenkins, is a model for random time series that has few parameters and is simple to apply (Li et al., 2020b). The ARMA family has three basic forms: the autoregressive (AR) model, the moving average (MA) model and the autoregressive moving average (ARMA) model. The basic structure is as follows:

X_{t} = φ_{1} X_{t - 1} + φ_{2} X_{t - 2} + \dots + φ_{p} X_{t - p} + δ + u_{t} + θ_{1} u_{t - 1} + θ_{2} u_{t - 2} + \dots + θ_{q} u_{t - q}

(13)

where, $φ_{1}, φ_{2}, \dots, φ_{p}$ are the autoregressive coefficients, $θ_{1}, θ_{2}, \dots, θ_{q}$ are the moving average coefficients, $p$ and $q$ are the autoregressive and moving average orders, $δ$ is a constant term, and $u_{t}, u_{t - 1}, \dots, u_{t - q}$ is an independent white noise sequence. The model is denoted as ARMA(p, q). If the original sequence is non-stationary and becomes stationary after d-order differencing, then the original sequence can be expressed as an ARIMA(p, d, q) sequence.
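
As a small worked illustration of formula (13), the sketch below computes a one-step ARMA(p, q) forecast by setting the unknown current noise term $u_{t}$ to its mean of zero, and shows the differencing step that turns ARIMA(p, d, q) into ARMA(p, q). The function names and the numbers in the usage lines are assumptions made for the example, not values from this paper.

```python
import numpy as np

def arma_one_step(x_hist, u_hist, phi, theta, delta=0.0):
    """One-step forecast from formula (13) with the current noise u_t set to 0.

    x_hist: past observations, most recent last (at least p values)
    u_hist: past noise terms (residuals), most recent last (at least q values)
    """
    phi = np.asarray(phi, dtype=float)
    theta = np.asarray(theta, dtype=float)
    # Reverse so that index 0 is lag 1, index 1 is lag 2, and so on.
    x_recent = np.asarray(x_hist, dtype=float)[::-1][:len(phi)]
    u_recent = np.asarray(u_hist, dtype=float)[::-1][:len(theta)]
    return float(phi @ x_recent + delta + theta @ u_recent)

def difference(x, d):
    """Apply d-order differencing, the 'I' step of ARIMA(p, d, q)."""
    return np.diff(np.asarray(x, dtype=float), n=d)

# ARMA(2, 1) example: 0.5 * 2.0 + 0.2 * 1.0 + 1.0 + 0.4 * 0.1 = 2.24
forecast = arma_one_step([1.0, 2.0], [0.1], phi=[0.5, 0.2], theta=[0.4], delta=1.0)
```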

2.4. Extreme learning machine

ELM is a machine learning method based on the feedforward neural network (FNN), which is suitable for both supervised and unsupervised learning problems (Huang et al., 2015). ELM can be regarded as a special FNN that improves on the FNN and its back-propagation algorithm. Its characteristic is that the weights of the hidden layer nodes are random or manually given and do not need to be updated; the learning process only calculates the output weights (Ding et al., 2015). Its network structure is shown in Fig. 2. Suppose that there are N arbitrary samples $(x_{i}, t_{i})$ , where $X = [x_{1}, x_{2}, \dots, x_{N}]$ is the input matrix and $T = [t_{1}, t_{2}, \dots, t_{N}]$ is the target matrix, namely the set of target vectors of the individual samples. Here $x_{i} = {(x_{i 1}, x_{i 2}, \dots, x_{i n})}^{T} \in R^{n}, i = 1, 2, \dots, N$ is the $n \times 1$ input vector, and $t_{i} = {(t_{i 1}, t_{i 2}, \dots, t_{i m})}^{T} \in R^{m}, i = 1, 2, \dots, N$ is the $m \times 1$ target vector.

For an ELM network with L hidden layer nodes, the hidden layer output for an input $x$ is as follows:

\begin{matrix} h (x) = [h_{1} (x), h_{2} (x), \dots, h_{L} (x)] \\ h_{i} (x) = g (w_{i} \cdot x + b_{i}), i = 1, 2, \dots, L \end{matrix}

(14)

where, $g$ is the activation function, $w_{i} = [w_{i 1}, w_{i 2}, \dots, w_{i n}]$ is the input weight vector, and $b_{i}$ is the bias of the ith hidden layer unit. The output of ELM is:

\begin{matrix} Y (X) = [y (x_{1}), y (x_{2}), \dots, y (x_{N})] \\ y (x) = \sum_{i = 1}^{L} β_{i} h_{i} (x) \end{matrix}

(15)

where, $β_{i}$ is the output weight connecting the ith hidden layer node to the output layer.

In ELM, hidden layer node parameters such as $w_{i}$ and $b_{i}$ are randomly generated according to any continuous probability distribution and are independent of the training data. The goal of ELM learning is to minimize the error between the output $Y (X)$ and the target matrix $T$ :

\min_{β} {‖ Y (X) - T ‖}^{2}

(16)

The least-squares method is used to solve for the output weights: the least-squares solution of $Y (X) = T$ is taken as the optimal solution $β^{*}$ . The solution process of $β^{*}$ is as follows:

T = Y (X) = H β, β \in R^{L \times m}

(17)

\begin{matrix} H = {[{h (x_{1})}^{T}, \dots, {h (x_{N})}^{T}]}^{T} = \begin{bmatrix} h_{1} (x_{1}) & \cdots & h_{L} (x_{1}) \\ \vdots & \ddots & \vdots \\ h_{1} (x_{N}) & \cdots & h_{L} (x_{N}) \end{bmatrix} \\ T = {[{t_{1}}^{T}, \dots, {t_{N}}^{T}]}^{T} \end{matrix}

(18)

In ELM, once the input weight $w_{i}$ and hidden layer node bias $b_{i}$ are determined, the output matrix H of the hidden layer is uniquely determined and the optimal solution $β^{*}$ is obtained:

β^{*} = H^{+} T

(19)

where, $H^{+}$ is the Moore-Penrose generalized inverse matrix of the matrix $H$ .
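
The training procedure of formulas (14) to (19) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions rather than the configuration used in this paper: the sigmoid activation, the standard normal distribution for the random weights, the number of hidden nodes and the toy sine target are all choices made for the example.

```python
import numpy as np

def hidden_output(X, W, b):
    """Hidden layer output matrix H with a sigmoid activation g (assumed choice)."""
    return 1.0 / (1.0 + np.exp(-(X @ W + b)))

def elm_fit(X, T, n_hidden, rng):
    """Train an ELM. X has shape (N, n), T has shape (N, m)."""
    W = rng.normal(size=(X.shape[1], n_hidden))  # random input weights, never updated
    b = rng.normal(size=n_hidden)                # random hidden biases, never updated
    H = hidden_output(X, W, b)                   # shape (N, L)
    beta = np.linalg.pinv(H) @ T                 # formula (19): Moore-Penrose solution
    return W, b, beta

def elm_predict(X, W, b, beta):
    return hidden_output(X, W, b) @ beta

# Toy usage: fit one period of a sine wave.
rng = np.random.default_rng(0)
X = np.linspace(0.0, 1.0, 200).reshape(-1, 1)
T = np.sin(2 * np.pi * X)
W, b, beta = elm_fit(X, T, n_hidden=40, rng=rng)
train_rmse = float(np.sqrt(np.mean((elm_predict(X, W, b, beta) - T) ** 2)))
```

Only `beta` is learned; `W` and `b` stay at their random initial values, which is why training reduces to a single pseudo-inverse.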

3. The proposed prediction model

Based on the theoretical basis of Section 2, a new hybrid prediction model based on gradient-based optimizer variational mode decomposition (GVMD), extreme learning machine (ELM), and autoregressive integrated moving average (ARIMA), named GVMD-ELM-ARIMA, is proposed. Its flow chart is shown in Fig. 3. The detailed steps are as follows:

Fig. 3 — The flow chart of GVMD-ELM-ARIMA.

Step 1: GVMD decomposition process.

The cumulative COVID-19 confirmed data are substituted into GVMD for optimization. The optimal $k$ and $α$ of VMD are obtained through several iterations, and some intrinsic mode functions (IMFs) are obtained. The residual component IMFr is obtained by the difference between the original data and the sum of IMFs.

Step 2: ELM prediction process.

An ELM prediction model is established for each of the modal functions $[IMF1, IMF2, \dots, IMFn]$ obtained by GVMD, and the predicted results are denoted as $[IMF1', IMF2', \dots, IMFn']$ .

Step 3: ARIMA prediction process.

The ARIMA model is used to predict the high-frequency residual component IMFr. Firstly, the stationarity of the series is tested. Then the optimal model order is determined by the Akaike information criterion (AIC). Finally, the prediction result $IMFr'$ is obtained.

Step 4: Hybrid reconstruction process.

$[IMF1', IMF2', \dots, IMFn']$ and $IMFr'$ are summed to complete the prediction and obtain the final output $X_{mixed}$ . The mixed output formula is as follows:

X_{mixed} = \sum_{i = 1}^{n} IMFi' + IMFr'

(20)
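
The four steps above can be summarized in a short sketch. The decomposition and the two predictors are passed in as functions, so the same skeleton works with any decomposer and any pair of forecasting models; the toy stand-ins in the usage lines (a fixed split of the series and a "repeat the last value" forecast) are placeholders for illustration only and are not GVMD, ELM or ARIMA.

```python
import numpy as np

def hybrid_forecast(y, decompose, predict_imf, predict_residual):
    """Decompose, predict each part separately, then sum the predictions."""
    y = np.asarray(y, dtype=float)
    imfs = decompose(y)                        # Step 1: array of shape (n_modes, len(y))
    residual = y - imfs.sum(axis=0)            # IMFr = original data minus the sum of IMFs
    imf_preds = [predict_imf(imf) for imf in imfs]   # Step 2: one prediction per IMF
    residual_pred = predict_residual(residual)       # Step 3: prediction of IMFr
    return sum(imf_preds) + residual_pred      # Step 4: formula (20)

# Toy stand-ins, for illustration only.
toy_decompose = lambda y: np.vstack([0.5 * y, 0.25 * y])
last_value = lambda series: series[-1]

y = np.array([1.0, 2.0, 3.0, 4.0])
x_mixed = hybrid_forecast(y, toy_decompose, last_value, last_value)
```

With these stand-ins the residual is 0.25 * y, so the reconstructed forecast is 2.0 + 1.0 + 1.0 = 4.0, the last observed value.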

4. Simulation experiment

In this section, data source, comparison models, evaluation index of prediction result, detailed simulation experiment and result are given.

4.1. The experimental data source

Real cumulative COVID-19 confirmed data are used to carry out one-step prediction simulation experiments in this research. The data come from the United States, India and Russia, cover January 22, 2020 to January 27, 2021, and are recorded daily. They can be obtained at https://data.humdata.org/dataset/novel-coronavirus-2019-ncov-cases. Each country has 372 data points, and each country is simulated separately. The first 327 values are used as training samples; the model then produces 45 predicted values, which are compared with the remaining 45 real values. The real cumulative COVID-19 confirmed data are shown in Fig. 4.
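
The 327/45 split can be sketched as follows, together with one common way of turning a series into input/target pairs for one-step prediction. The sliding-window construction and the window length of 7 days are assumptions made for illustration, since the input window is not specified here, and the `series` array is a placeholder rather than real case counts.

```python
import numpy as np

def make_windows(series, lag):
    """Build (inputs, targets) so that each target is predicted from the previous `lag` values."""
    series = np.asarray(series, dtype=float)
    X = np.array([series[i:i + lag] for i in range(len(series) - lag)])
    y = series[lag:]
    return X, y

# Placeholder for one country's 372 daily cumulative values.
series = np.arange(372, dtype=float)

train, test = series[:327], series[327:]      # first 327 days for training, last 45 for testing
X_train, y_train = make_windows(train, lag=7) # lag = 7 is an assumed window length
```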

4.2. Compared model description

In order to demonstrate the performance of GVMD-ELM-ARIMA, five different models are selected for comparison. The single prediction models are LSTM and ELM, and the hybrid models are CEEMDAN-ELM, GVMD-ELM and GVMD-ELM-ELM. ELM and LSTM predict the cumulative COVID-19 confirmed data directly. CEEMDAN-ELM predicts each component decomposed by CEEMDAN with ELM and then reconstructs the result. GVMD-ELM predicts each component decomposed by GVMD with ELM and then reconstructs the result. GVMD-ELM-ELM predicts both the IMFs and IMFr obtained by GVMD with ELM and then reconstructs the result.

4.3. The evaluation index

In order to evaluate the performance of the prediction models quantitatively, mean absolute percentage error (MAPE), root mean square error (RMSE) and mean absolute error (MAE) are selected. Their formulas are as follows:

MAPE = \frac{1}{n} \sum_{i = 1}^{n} | \frac{x_{i} - {\hat{x}}_{i}}{x_{i}} |

(21)

RMSE = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} {(x_{i} - {\hat{x}}_{i})}^{2}}

(22)

MAE = \frac{1}{n} \sum_{i = 1}^{n} | x_{i} - {\hat{x}}_{i} |

(23)

where, $x_{i}$ is the ith actual value, ${\hat{x}}_{i}$ is the ith predicted value, and n is the total number of samples.
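
The three indices can be computed as in the sketch below. It assumes the conventional form of MAPE, in which each absolute error is divided by the actual value, and it reports MAPE as a fraction rather than a percentage; the sample numbers are made up for the example.

```python
import numpy as np

def mape(actual, predicted):
    """Mean absolute percentage error as a fraction (error divided by the actual value)."""
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    return float(np.mean(np.abs((actual - predicted) / actual)))

def rmse(actual, predicted):
    """Root mean square error."""
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

def mae(actual, predicted):
    """Mean absolute error."""
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    return float(np.mean(np.abs(actual - predicted)))

# Made-up values: errors of 10 on actual values of 100 and 200.
actual = [100.0, 200.0]
predicted = [110.0, 190.0]
scores = (mape(actual, predicted), rmse(actual, predicted), mae(actual, predicted))
```

For these two points the relative errors are 0.10 and 0.05, so the MAPE is 0.075, while RMSE and MAE are both 10.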

4.4. American data simulation experiment

4.4.1. GVMD decomposition result

The cumulative COVID-19 confirmed data are affected by uncertainties such as weather, temperature and government measures to combat the pandemic, so their fluctuation range is large. In order to predict them more accurately, the initial optimization parameters of GVMD are set as shown in Table 1.

Table 1.

GVMD initial optimization parameters setting.

The main parameter	The parameter value
The population size	30
Maximum number of iterations	50
Regularization parameter optimization range	[1500,3500]
Kernel function width optimization range	(Xiang et al., 2021, Annas et al., 2020)


The cumulative confirmed data of 372 days in the United States are substituted into GVMD for optimization, and the optimal fitness value is obtained after 50 iterations. The optimal parameters are found to be $k = 3$ and $α = 3369$ , so the data are decomposed into 3 IMF modes. The GVMD decomposition result is shown in Fig. 5.

VMD is one of the most advanced mode decomposition methods, and its main difficulty, the choice of the $k$ value and $α$ value, is overcome by GBO. To demonstrate the superiority of GVMD, CEEMDAN, the most advanced of the current improved EEMD variants, is selected for comparison. The CEEMDAN decomposition result is shown in Fig. 6.

Fig. 6 shows that IMF components obtained by CEEMDAN are disturbed by noise, and a relatively large swing appears at the right end, thus affecting the whole component sequence. Therefore, large errors will be caused and the decomposition result will be distorted. The component waveform distortion of GVMD is smaller than that of CEEMDAN, and GVMD has better noise robustness. In addition, according to the decomposition result of the original data, only low-frequency components exist after the decomposition of GVMD, and there is no high-frequency component with high complexity. However, CEEMDAN has many high-frequency components, which is not conducive to the prediction.

4.4.2. ARIMA processing process

After GVMD, the residual component can be obtained as follows:

IMFr = Y - \sum_{i = 1}^{k} IMFi

(24)

where, Y represents the cumulative confirmed data. The residual component is shown in Fig. 7.

Fig. 7 shows that the residual component of the cumulative confirmed data in the United States after decomposition is relatively large, reaching about 20,000 cases. Therefore, it is necessary to deal with the residual component. In this paper, the ARIMA model is used for this purpose. The procedure is as follows:

1)
Stationarity test

The stationarity test gives Hp = 1 and Hpvalue = 1.000E-03. The residual component therefore has the characteristics of a stationary sequence and does not need differencing.
2)
Correlation test

Compute the autocorrelation function (ACF) and partial autocorrelation function (PACF) of the data to identify the form of the ARMA model. The results are shown in Fig. 8.

Fig. 8 shows that both the ACF and the PACF tail off, so a mixed model of AR and MA is selected.

3)
AIC order determination

It is difficult to determine the order from the ACF and PACF charts alone. In this paper, the more effective AIC is used for order determination, and the AIC order heat map is shown in Fig. 9.

4)
Establish a model

The orders AR = 5 and MA = 3 are determined from the heat map, the model ARIMA(5, 0, 3) is established, and the prediction of the residual component, $IMFr'$ , is obtained.
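
The idea of AIC-based order determination can be illustrated with the simplified sketch below. It is deliberately not the full procedure described above: instead of fitting every ARMA(p, q) combination by maximum likelihood, it fits pure AR(p) models by least squares and scores them with the Gaussian AIC, $n \ln (RSS / n) + 2 (p + 1)$ . The function names and the simulated AR(2) test series are assumptions made for the example.

```python
import numpy as np

def ar_aic(x, p, max_p):
    """AIC of a least-squares AR(p) fit, conditioning on the first max_p values
    so that every candidate order is scored on the same samples."""
    x = np.asarray(x, dtype=float)
    n = len(x) - max_p
    target = x[max_p:]
    # Column j holds the series shifted by lag j.
    lags = [x[max_p - j:len(x) - j] for j in range(1, p + 1)]
    A = np.column_stack([np.ones(n)] + lags)   # intercept plus p lag columns
    coef, *_ = np.linalg.lstsq(A, target, rcond=None)
    rss = float(np.sum((target - A @ coef) ** 2))
    return n * np.log(rss / n) + 2 * (p + 1)

def select_ar_order(x, max_p):
    """Return the order with the lowest AIC and the full list of AIC scores."""
    scores = [ar_aic(x, p, max_p) for p in range(max_p + 1)]
    return int(np.argmin(scores)), scores

# Simulated AR(2) series used only to exercise the selection.
rng = np.random.default_rng(1)
x = np.zeros(600)
for t in range(2, 600):
    x[t] = 0.6 * x[t - 1] - 0.3 * x[t - 2] + rng.normal()

best_p, scores = select_ar_order(x, max_p=6)
```

Extending this to a two-dimensional grid over (p, q), as in a heat map of AIC values, requires an ARMA estimator and is left out of this sketch.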

4.4.3. Prediction result of each model

This section presents in detail the prediction result and performance of GVMD-ELM-ARIMA proposed in this paper. In order to show the superiority and advancement of this model, the five models in Section 4.2 are compared and analyzed. The comparison diagram and error diagram of the prediction model are firstly shown to observe the prediction effect. Finally, the error box plot and the evaluation index in Section 4.3 are used for quantitative analysis. In this paper, the prediction result of each model is firstly presented in Fig. 10. Then the prediction comparison result of each model is shown in Fig. 11. Fig. 12 further shows the prediction error distribution diagram.

Fig. 10 — the prediction result of ELM, LSTM, CEEMDAN-ELM, GVMD-ELM, GVMD-ELM-ELM and GVMD-ELM-ARIMA.

Fig. 11 — The prediction comparison result of each model.

Fig. 12 — The prediction error distribution diagram.

Fig. 10 shows that GVMD-ELM-ARIMA gives the better prediction: compared with the other models, its prediction result coincides most closely with the original data. Fig. 11 shows that the prediction points of GVMD-ELM-ARIMA and CEEMDAN-ELM are the closest to the original data points and those of LSTM are the farthest. According to the prediction error distribution diagram in Fig. 12, the predicted value of GVMD-ELM-ARIMA is the closest to the true value. From day 1 to day 35, its error stays close to the base line; on days 35 to 40, the error fluctuates slightly. Among the single models, ELM performs better, and its prediction error is small from day 1 to day 10; however, its prediction error over days 11 to 30 lies above the base line. To complement this qualitative analysis with a quantitative one, the error box diagram and the evaluation index of each model are shown in Fig. 13 and Table 2 respectively.

Table 2.

The evaluation index.

Model	MAPE	RMSE	MAE
ELM	0.0047	1.41E+05	1.06E+05
LSTM	0.0063	1.7557E+05	1.43E+05
CEEMDAN-ELM	0.0042	1.53E+05	7.31E+04
GVMD-ELM	0.0032	8.56E+04	7.01E+04
GVMD-ELM-ELM	0.0030	7.88E+04	6.59E+04
GVMD-ELM-ARIMA	0.0012	3.40E+04	2.53E+04
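The three indices in the table can be computed as in the following minimal sketch, which assumes the standard definitions of MAPE, RMSE and MAE. Returning MAPE as a fraction rather than a percentage is an assumption about how the tables report it, and the sample values are hypothetical, not data from this paper.

```python
import math


def mape(actual, predicted):
    """Mean absolute percentage error, returned as a fraction (0.05 means 5%)."""
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error, in the same unit as the data."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean absolute error, in the same unit as the data."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical values for illustration only (not data from the paper).
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]

print("MAPE:", mape(actual, predicted))
print("RMSE:", rmse(actual, predicted))
print("MAE: ", mae(actual, predicted))
```

Because RMSE squares each error before averaging, it weights large errors more heavily than MAE and is never smaller than MAE on the same data.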


Fig. 13 shows that the proposed GVMD-ELM-ARIMA has the best prediction effect, with a maximum error of 74,703.42, a minimum error of −86,186.61 and a median error of −7387.99. Compared with the other five models, GVMD-ELM-ARIMA has the smallest overall fluctuation range and a relatively uniform prediction error distribution, and its prediction performance is markedly better than that of the other models.

According to the evaluation index in Table 2, it can be concluded that:

(1)
Compared with the single models (ELM and LSTM), the hybrid models reduce the average MAPE by 47.27%, the average RMSE by 44.50%, and the average MAE by 55.34%.
(2)
Compared with the hybrid models that leave the residual component untreated (CEEMDAN-ELM, GVMD-ELM), the hybrid models that treat it (GVMD-ELM-ELM, GVMD-ELM-ARIMA) reduce the average MAPE, RMSE and MAE by 43.24%, 52.72% and 36.31%, respectively.
(3)
Compared with GVMD-ELM-ELM, the proposed GVMD-ELM-ARIMA reduces the MAPE by 60%, the RMSE by 56.85%, and the MAE by 61.61%.
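The percentage reductions listed above can be recomputed from Table 2 as in the sketch below. It assumes each reduction is the relative decrease of the group mean of a metric, which matches the figures in items (2) and (3). The function and variable names are illustrative and not taken from the paper.

```python
# Table 2 values (United States), copied from the rounded table entries.
TABLE_2 = {
    "ELM": {"MAPE": 0.0047, "RMSE": 1.41e5, "MAE": 1.06e5},
    "LSTM": {"MAPE": 0.0063, "RMSE": 1.7557e5, "MAE": 1.43e5},
    "CEEMDAN-ELM": {"MAPE": 0.0042, "RMSE": 1.53e5, "MAE": 7.31e4},
    "GVMD-ELM": {"MAPE": 0.0032, "RMSE": 8.56e4, "MAE": 7.01e4},
    "GVMD-ELM-ELM": {"MAPE": 0.0030, "RMSE": 7.88e4, "MAE": 6.59e4},
    "GVMD-ELM-ARIMA": {"MAPE": 0.0012, "RMSE": 3.40e4, "MAE": 2.53e4},
}


def mean_metric(models, metric):
    """Average one metric over a group of models."""
    return sum(TABLE_2[m][metric] for m in models) / len(models)


def reduction_percent(baseline_models, improved_models, metric):
    """Relative decrease (in %) of the group mean from baseline to improved."""
    base = mean_metric(baseline_models, metric)
    improved = mean_metric(improved_models, metric)
    return 100.0 * (base - improved) / base


untreated = ["CEEMDAN-ELM", "GVMD-ELM"]        # residual component not processed
treated = ["GVMD-ELM-ELM", "GVMD-ELM-ARIMA"]   # residual component processed

for metric in ("MAPE", "RMSE", "MAE"):
    item_2 = reduction_percent(untreated, treated, metric)
    item_3 = reduction_percent(["GVMD-ELM-ELM"], ["GVMD-ELM-ARIMA"], metric)
    print(f"{metric}: item (2) {item_2:.2f}%, item (3) {item_3:.2f}%")
```

Applied to item (1), with ELM and LSTM as the baseline group and the four hybrid models as the improved group, the same function gives 47.27% for MAPE and 44.50% for RMSE. For MAE the rounded table entries give about 52.9%, so the reported 55.34% may have been computed from unrounded values.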

From the evaluation indices in Table 2 and the comparisons above, the following conclusions can be drawn:

(1)
The prediction accuracy of the hybrid model is better than that of the single prediction model, which also demonstrates the advantages of the hybrid model based on mode decomposition in the prediction.
(2)
The accuracy of the hybrid model after processing the residual component is higher than that of the hybrid model without processing the residual component. It shows that the processing of the residual component can further improve the prediction accuracy.
(3)
GVMD-ELM-ARIMA has the minimum MAPE, RMSE and MAE, which shows that it has better prediction performance and can effectively improve the prediction accuracy.

4.5. India simulation result

Because the experimental process has been described in detail in Section 4.4, this section only briefly presents the results for the cumulative confirmed data in India. The prediction comparison result of each model is shown in Fig. 14, the prediction error distribution diagram in Fig. 15, and the error box diagram in Fig. 16. The evaluation index of each model is shown in Table 3.

Fig. 14 — The prediction comparison result of each model.

Fig. 15 — The prediction error distribution diagram.

Table 3.

The evaluation index of each model.

Model	MAPE	RMSE	MAE
ELM	5.11E-04	8.53E+02	5.32E+03
LSTM	8.16E-04	1.11E+04	4.53E+04
CEEMDAN-ELM	3.29E-04	4.75E+05	3.42E+03
GVMD-ELM	1.20E-04	1.59E+04	1.24E+03
GVMD-ELM-ELM	1.17E-04	1.58E+03	1.22E+03
GVMD-ELM-ARIMA	9.55E-05	1.51E+03	9.92E+02


Figs. 14–16 show that GVMD-ELM-ARIMA has the best prediction effect among the compared models. Its predicted points lie close to the original data points, and its errors fluctuate only slightly around the base line. The accuracy of the hybrid models is again higher than that of the single models, and the proposed GVMD-ELM-ARIMA again has the smallest MAPE, RMSE and MAE. The experimental conclusion in this section is therefore consistent with the result in Section 4.4.

4.6. Russia simulation result

In order to save space, this section briefly presents the experimental process and result of the cumulative confirmed data in Russia. The prediction comparison result of each model is shown in Fig. 17. The prediction error distribution diagram is shown in Fig. 18, and the error box diagram is shown in Fig. 19. The evaluation index of each model is shown in Table 4.

Fig. 17 — The prediction comparison result of each model.

Fig. 18 — The prediction error distribution diagram.

Table 4.

The evaluation index of each model.

Model	MAPE	RMSE	MAE
ELM	2.83E-03	1.33E+04	9.84E+03
LSTM	5.73E-03	2.66E+04	1.96E+04
CEEMDAN-ELM	2.09E-04	1.18E+05	7.43E+03
GVMD-ELM	1.36E-04	6.45E+04	4.71E+03
GVMD-ELM-ELM	1.14E-04	6.67E+03	3.98E+03
GVMD-ELM-ARIMA	4.79E-05	1.81E+03	1.57E+02


The experimental conclusion in this section is the same as that in Section 4.4, so it is not repeated here.

5. Multi-step prediction

The GVMD-ELM-ARIMA model proposed in this paper is used to predict the cumulative COVID-19 confirmed data in the United States, India and Russia, and its prediction effect is verified. The data are daily records from January 5, 2021 to October 18, 2021, and can be obtained from https://data.humdata.org/dataset/novel-coronavirus-2019-ncov-cases.

Each country has 287 daily cumulative COVID-19 confirmed records, and the data of each country are simulated separately. To support multi-step prediction and reduce the influence of outliers, the average of the confirmed data over every 7 days is taken as one sample point. The 266 records (38 sample points) from January 5, 2021 to September 27, 2021 form the training set, and the 21 records (3 sample points) from September 28, 2021 to October 18, 2021 form the test set. A total of 10 values are predicted, and the first three are compared with the test set to verify the prediction effect. Fig. 20 illustrates the selection of the data sets, and Fig. 21 shows the forecast results of the cumulative COVID-19 data for the United States, India and Russia.
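The data preparation described above can be sketched as follows. The sketch assumes that the 7-day averages are taken over consecutive, non-overlapping blocks starting on the first day, which is consistent with 266 = 38 × 7 and 21 = 3 × 7. The daily series is a synthetic stand-in, not real case data, and the function name is illustrative.

```python
def weekly_means(daily, window=7):
    """Average consecutive, non-overlapping blocks of `window` daily values."""
    if len(daily) % window != 0:
        raise ValueError("series length must be a multiple of the window")
    return [
        sum(daily[i:i + window]) / window
        for i in range(0, len(daily), window)
    ]


# Synthetic stand-in for one country's 287 daily cumulative counts (not real data).
daily = [1000.0 + 50.0 * day for day in range(287)]

points = weekly_means(daily)    # 287 / 7 = 41 weekly sample points
train = points[:38]             # first 266 days
test = points[38:]              # last 21 days

print(len(points), len(train), len(test))
```

Averaging over a week also smooths day-to-day reporting irregularities, which is one way to limit the influence of outliers mentioned in the text.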

Fig. 20 — Schematic diagram of data set selection, (a) United States, (b) India, (c) Russia.

Fig. 21 shows the 10 predicted points. Each point represents the average value of one week, so together they cover 70 days. The first three predicted values are close to the real test data, which supports the effectiveness of the proposed model.

6. Discussion

Qiang et al. (2021) proposed a hybrid prediction model for the COVID-19 outbreak based on ensemble empirical mode decomposition (EEMD). CEEMDAN is an improved version of EEMD that alleviates modal aliasing and is more widely used in practice. However, each time an IMF is calculated, the algorithm adds noise to the signal many times and then takes the overall average, which increases its complexity. In practice, it can still cause modal aliasing, intermittency and high-frequency anomalies. VMD, by contrast, is essentially a set of adaptive Wiener filters with strong robustness to noise, so VMD is introduced in this paper.

VMD gives a better decomposition, but its result depends strongly on the mode number $k$ and the parameter $α$, and the setting of these parameters directly affects the decomposition result. GBO has a fast convergence rate and can avoid becoming trapped in local optima. Therefore, this paper combines GBO and VMD to propose GVMD, which determines the mode number $k$ and the parameter $α$ adaptively. As shown in Figs. 5 and 6, compared with the CEEMDAN decomposition result, the GVMD decomposition result is relatively smooth and mainly consists of low-frequency components, indicating that GVMD is suitable for data decomposition in this experiment. GVMD has strong adaptability and can be applied to signal denoising, feature extraction, fault diagnosis and time series prediction.

ELM is fast and widely applicable. Tables 2–4 show that the prediction error of ELM alone is small in this experiment, so ELM is selected to predict the IMFs, giving GVMD-ELM. Although GVMD-ELM achieves a better prediction effect, some error remains. ARIMA can handle high-frequency components with few parameters and is simple to apply. To further reduce the error, ARIMA is therefore introduced to process the high-frequency IMFr, giving GVMD-ELM-ARIMA.
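The final step of the hybrid model, reconstructing the overall forecast from the component forecasts, can be sketched as below. The sketch assumes that reconstruction means point-by-point summation of the IMF forecasts and the residual forecast, as is usual for decomposition-based hybrid models. The component values are hypothetical stand-ins for ELM and ARIMA outputs, not results from this paper.

```python
def reconstruct(imf_forecasts, residual_forecast):
    """Sum the component forecasts point by point to obtain the final forecast."""
    components = list(imf_forecasts) + [residual_forecast]
    horizon = len(residual_forecast)
    if any(len(component) != horizon for component in components):
        raise ValueError("all component forecasts must share one horizon")
    return [sum(values) for values in zip(*components)]


# Hypothetical component forecasts over a 3-step horizon (not values from the paper).
imf_forecasts = [
    [100.0, 102.0, 104.0],  # stand-in for the ELM forecast of the first IMF
    [10.0, 9.0, 11.0],      # stand-in for the ELM forecast of the second IMF
]
residual_forecast = [0.5, -0.5, 0.25]  # stand-in for the ARIMA forecast of IMFr

final = reconstruct(imf_forecasts, residual_forecast)
print(final)
```

Keeping the predictors separate from the reconstruction step makes it easy to swap the residual model, which is the difference between GVMD-ELM-ELM and GVMD-ELM-ARIMA.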

At present, there are many studies on the prediction of cumulative COVID-19 confirmed data, but most of them are based on a single model. In this paper, several hybrid models are compared with single models. The simulation results in Sections 4.4, 4.5 and 4.6 show that the prediction accuracy of the hybrid models is higher than that of the single models. Although the hybrid models achieve a better prediction effect, some error remains, so IMFr is processed for the first time in this paper. Comparing GVMD-ELM-ELM and GVMD-ELM-ARIMA with the other prediction models shows that processing IMFr further improves the prediction accuracy. Comparing GVMD-ELM-ELM with GVMD-ELM-ARIMA shows that ARIMA handles IMFr better than ELM: the MAPE, RMSE and MAE in Tables 2–4 are all reduced. Therefore, the ARIMA model is applied to the residual component after the decomposition of the cumulative COVID-19 confirmed data.

Among the hybrid models, the proposed GVMD-ELM-ARIMA achieves the best single-step prediction effect, with the lowest MAPE, RMSE and MAE. To further test GVMD-ELM-ARIMA, the cumulative COVID-19 confirmed data of the three countries are predicted over multiple steps. Fig. 21 shows the multi-step prediction results of GVMD-ELM-ARIMA for the cumulative COVID-19 confirmed data. The method achieves reasonable results, realizes a 40-day prediction, and can predict the development of the epidemic as more data become available. It provides a new method for cumulative COVID-19 confirmed data prediction.

The GVMD-ELM-ARIMA proposed in this paper provides a new method for the prediction of cumulative COVID-19 confirmed data, but, given the authors' limited time and knowledge, some problems remain to be studied further.

(1)
For the IMFs and IMFr obtained by decomposition, only a few methods are compared and the best of them is selected. More prediction models could be explored to study their effect on prediction.
(2)
Neural networks need a certain amount of data as support. When the amount of data is small, prediction accuracy suffers. Other methods suited to medium- and short-term prediction can therefore be considered.
(3)
Using online real-time data to compare the predicted data with the actual data promptly and to correct errors would be a good direction for improvement.
(4)
In multi-step prediction, the prediction accuracy decreases as the prediction step increases, which is also a direction worthy of improvement.

7. Conclusions

To improve the prediction accuracy of the cumulative COVID-19 confirmed data and to avoid both the low prediction accuracy of single models and the deficiencies of CEEMDAN, GVMD-ELM-ARIMA is proposed in this paper. The main conclusions and innovations are as follows:

(1)
VMD is applied to the prediction of the cumulative COVID-19 confirmed data, which avoids the disadvantages of CEEMDAN, such as modal aliasing and long operation time, and improves the prediction accuracy.
(2)
VMD requires the mode number $k$ and the parameter $α$ to be preset, and both directly affect the decomposition result. To solve this problem, GVMD is proposed: the multiscale fuzzy entropy is used as the fitness value to determine $k$ and $α$, which enhances the adaptability of VMD.
(3)
To further improve the prediction accuracy of the cumulative COVID-19 confirmed data, ARIMA is used for the first time to analyze and predict the residual component after GVMD.
(4)
GVMD-ELM-ARIMA is proposed. It and the five comparison models are applied to the cumulative COVID-19 confirmed data of the United States, India and Russia in one-step simulation tests. The results show that the GVMD-ELM-ARIMA model has good prediction accuracy.
(5)
The GVMD-ELM-ARIMA model is applied to the latest epidemic data of the United States, India and Russia for multi-step prediction, and its prediction results are verified. It provides a new method for predicting cumulative COVID-19 confirmed data.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

This work was supported by the National Natural Science Foundation of China (No. 51709228).

Biographies

Guohui Li received the bachelor's degree in electronic information engineering from Chongqing University of Technology, Chongqing, China, in 2001, the master's degree in circuits and systems from University of Electronic Science and Technology of China, Chengdu, China, in 2004, and the doctoral degree in acoustics from Northwestern Polytechnical University, Xi’an, China, in 2015. He is an associate professor in the School of Electronic Engineering, Xi’an University of Posts and Telecommunications, Shaanxi, China. His research interests include chaotic signal processing and neural network prediction.

Kang Chen received the bachelor's degree in optoelectronic information science and engineering from Jiangxi Science and Technology Normal University, Jiangxi, China, in 2019. He is currently pursuing the master's degree in electronics and communication engineering at Xi’an University of Posts and Telecommunications, Xi’an, China. His research interests include chaotic signal processing and neural network prediction.

Hong Yang received the bachelor's and master's degrees in mechanical and electronic engineering from University of Electronic Science and Technology of China, Chengdu, China, in 2003 and 2006, respectively, and the doctoral degree in acoustics from Northwestern Polytechnical University, Xi’an, China, in 2015. She is an associate professor in the School of Electronic Engineering, Xi’an University of Posts and Telecommunications, Shaanxi, China. Her research interests include chaotic signal processing and neural network prediction.

Appendix A. The nomenclature used in this paper

List of nomenclature
AIC	Akaike information criterion
ANN	Artificial neural network
AR	Autoregressive
ARIMA	Autoregressive integrated moving average
EEMD	Ensemble empirical mode decomposition
CEEMD	Complementary ensemble empirical mode decomposition
CEEMDAN	Complete ensemble empirical mode decomposition with adaptive noise
COVID-19	Coronavirus disease 2019
DWD	Discrete wavelet decomposition
ELM	Extreme learning machine
FNN	Feedforward neural network
GBO	Gradient-based optimizer
GSR	Gradient search rule
GVMD	Gradient-based optimizer variational mode decomposition
IMF	Intrinsic mode function
IMFr	Intrinsic mode function residual component
IMFs	Intrinsic mode functions
LEO	Local escaping operator
LSTM	Long short-term memory
MA	Moving average
MAE	Mean absolute error
MAPE	Mean absolute percentage error
MFE	Multiscale fuzzy entropy
RMSE	Root mean square error
SEIR	Susceptible exposed infectious recovered
SI	Susceptible infectious
SIR	Susceptible infectious recovered
SVM	Support vector machine
VMD	Variational mode decomposition


References

Ahmadianfar I., Bozorg-Haddad O., Chu X.F. Gradient-based optimizer: a new metaheuristic optimization algorithm. Inf. Sci. 2020;540:131–159.
Annas S., Pratama M.I., Rifandi M., Sanusi W., Side S. Stability analysis and numerical simulation of SEIR model for pandemic COVID-19 spread in Indonesia. Chaos Solitons Fractals. 2020;139 (1-17). doi: 10.1016/j.chaos.2020.110072.
Balli S. Data analysis of COVID-19 pandemic and short-term cumulative case forecasting using machine learning time series methods. Chaos Solitons Fractals. 2021;142 (1-7). doi: 10.1016/j.chaos.2020.110512.
Bhapkar H.R., Mahalle P.N., Dey N., Santosh K.C. Revisited COVID-19 mortality and recovery rates: are we missing recovery time period? J. Med. Syst. 2020;44(12):202 (1-5). doi: 10.1007/s10916-020-01668-6.
Chen D.N., Zhang Y.D., Yao C.Y., Lai B.W., Lv S.J. Fault diagnosis method based on variational mode decomposition and multi-scale permutation entropy. Comput. Integr. Manuf. Syst. 2017;23(12):2604–2612.
Chen J.X., Zhu Z., Zhang X.D. Feature cognitive model combined by an improved variational mode and singular value decomposition for fault signals. Cogn. Comput. Syst. 2020;2(2):66–71.
Chorowski J., Wang J., Zurada J.M. Review and performance comparison of SVM- and ELM-based classifiers. Neurocomputing. 2014;128:507–516.
Chu J.W., Dong Y.C., Han X.X., Xie J., Xu X.Y., Xie G. Short-term prediction of urban PM2.5 based on a hybrid modified variational mode decomposition and support vector regression model. Environ. Sci. Pollut. Res. 2021;28(1):56–72. doi: 10.1007/s11356-020-11065-8.
Cong W., Jie Y., Xu W., Min L. Analysis on early spatiotemporal transmission characteristics of COVID-19. Acta Phys. Sin. 2020;69(8) (1-10).
Cooper I., Mondal A., Antonopoulos C.G. A SIR model assumption for the spread of COVID-19 in different communities. Chaos Solitons Fractals. 2020;139 (1-19). doi: 10.1016/j.chaos.2020.110057.
Ding S.F., Zhao H., Zhang Y.N., Xu X.Z., Nie R. Extreme learning machine: algorithm, theory and applications. Artif. Intell. Rev. 2015;44(1):103–115.
Dragomiretskiy K., Zosso D. Variational mode decomposition. IEEE Trans. Signal Process. 2014;62(3):531–544.
Feng S., Feng Z.B., Ling C., Chang C., Feng Z.K. Prediction of the COVID-19 epidemic trends based on SEIR and AI models. PLOS One. 2021;16(1) (1-15). doi: 10.1371/journal.pone.0245101.
Hamadneh N.N., Tahir M., Khan W.A. Using artificial neural network with prey predator algorithm for prediction of the COVID-19: the case of Brazil and Mexico. Mathematics. 2021;9(2):180.
Huang G., Huang G.B., Song S.J., You K.Y. Trends in extreme learning machines: a review. Neural Netw. 2015;61:32–48. doi: 10.1016/j.neunet.2014.10.001.
Ibrahim M.A., Al-Najafi A. Modeling, control, and prediction of the spread of COVID-19 using compartmental, logistic, and Gauss models: a case study in Iraq and Egypt. Processes. 2020;8(11):1400.
Jiang Y., Huang G.Q., Yang Q.S., Yan Z.T., Zhang C.F. A novel probabilistic wind speed prediction approach using real time refined variational model decomposition and conditional kernel density estimation. Energy Convers. Manag. 2019;185:758–773.
Kasun L.L.C., Zhou H.M., Huang G.B., Vong C.M. Representational learning with ELMs for big data. IEEE Intell. Syst. 2013;28(6):31–34.
Ladha N., Bhardwaj P., Charan J., Mitra P., Goyal J.P., Sharma P., Singh K., Misra S. Association of environmental parameters with COVID-19 in Delhi, India. Indian J. Clin. Biochem. 2020:1–5. doi: 10.1007/s12291-020-00921-6.
Lang X., Rehman N.U., Zhang Y.F., Xie L., Su H.Y. Median ensemble empirical mode decomposition. Signal Process. 2020;176 (1-8).
Li G.H., Zheng M.T., Yang H. Cycle analysis method of tree ring and solar activity based on variational mode decomposition and Hilbert transform. Adv. Meteorol. 2019;2019:1715673–1715678 (1-8).
Li G.H., Chang W.N., Yang H. A new hybrid model for underwater acoustic signal prediction. Complexity. 2020;2020 (1-19).
Li G.H., Chang W.N., Yang H. A novel combined prediction model for monthly mean precipitation with error correction strategy. IEEE Access. 2020;8:141432–141445.
Li Y.X., Li Y.A., Chen X., Yu J. Research on ship-radiated noise denoising using secondary variational mode decomposition and correlation coefficient. Sensors. 2018;18(1):48 (1-17). doi: 10.3390/s18010048.
Moftakhar L., Seif M., Safe M.S. Exponentially increasing trend of infected patients with COVID-19 in Iran: a comparison of neural network and ARIMA forecasting models. Iran. J. Public Health. 2020;49:92–100. doi: 10.18502/ijph.v49iS1.3675.
Qiang X.L., Aamir M., Naeem M., Ali S., Aslam A., Shao Z.H. Analysis and forecasting COVID-19 outbreak in Pakistan using decomposition and ensemble model. Comput. Mater. Contin. 2021;68(1):841–856.
Rafieenasab S., Zahiri A.P., Roohi E. Prediction of peak and termination of novel coronavirus COVID-19 epidemic in Iran. Int. J. Mod. Phys. C. 2020;31(11) (1-23).
Rath S., Tripathy A., Tripathy A.R. Prediction of new active cases of coronavirus disease (COVID-19) pandemic using multiple linear regression model. Diabetes Metab. Syndr. Clin. Res. Rev. 2020;14(5):1467–1474. doi: 10.1016/j.dsx.2020.07.045.
Santosh K.C. COVID-19 prediction models and unexploited data. J. Med. Syst. 2020;44(9):170 (1-4). doi: 10.1007/s10916-020-01645-z.
Santosh K.C. AI-driven tools for coronavirus outbreak: need of active learning and cross-population train/test models on multitudinal/multimodal data. J. Med. Syst. 2020;44(5):93 (1-5). doi: 10.1007/s10916-020-01562-1.
Shahid F., Zameer A., Muneeb M. Predictions for COVID-19 with deep learning models of LSTM, GRU and Bi-LSTM. Chaos Solitons Fractals. 2020;140. doi: 10.1016/j.chaos.2020.110212.
Shrivastava N.A., Lohia K., Panigrahi B.K. A multiobjective framework for wind speed prediction interval forecasts. Renew. Energy. 2016;87:903–910.
Shyam Sunder Reddy K., Padmanabha Reddy Y.C.A., Mallikarjuna Rao Ch. Recurrent neural network based prediction of number of COVID-19 cases in India. Mater. Today Proc. 2020. doi: 10.1016/j.matpr.2020.11.117.
Singh P., Gupta A. Generalized SIR (GSIR) epidemic model: An improved framework for the predictive monitoring of COVID-19 pandemic. ISA Trans. 2021. doi: 10.1016/j.isatra.2021.02.016.
Singh S., Parmar K.S., Kumar J. Development of new hybrid model of discrete wavelet decomposition and autoregressive integrated moving average (ARIMA) models in application to one month forecast the casualties cases of COVID-19. Chaos Solitons Fractals. 2020;135 (1-8). doi: 10.1016/j.chaos.2020.109866.
Tang Z.P., Zhang T.T., Wu J.C., Du X.X., Chen K.J. Multistep-ahead stock price forecasting based on secondary decomposition technique and extreme learning machine optimized by the differential evolution algorithm. Math. Probl. Eng. 2020;2020:2604915 (1-13).
Wang D.Y., Wei S., Luo H.Y., Yue C.Q., Grunder O. A novel hybrid model for air quality index forecasting based on two-phase decomposition technique and modified extreme learning machine. Sci. Total Environ. 2017;580:719–733. doi: 10.1016/j.scitotenv.2016.12.018.
Wang R.H., Li C.S., Fu W.L., Tang G. Deep learning method based on gated recurrent unit and variational mode decomposition for short-term wind power interval prediction. IEEE Trans. Neural Netw. Learn. Syst. 2020;31(10):3814–3827. doi: 10.1109/TNNLS.2019.2946414.
Xiang Y., Jia Y.H., Chen L.L., Guo L., Shu B.Z., Long E.S. COVID-19 epidemic prediction and the impact of public health interventions: a review of COVID-19 epidemic models. Infect. Dis. Model. 2021;6:324–342. doi: 10.1016/j.idm.2021.01.001.
Yang H., Gao L.P., Li G.H. Underwater acoustic signal prediction based on MVMD and optimized kernel extreme learning machine. Complexity. 2020;2020 (1-18).
Yang H., Li L.L., Li G.H. A new denoising method for underwater acoustic signal. IEEE Access. 2020;8:201874–201888.
Yang H., Gao L.P., Li G.H. Underwater acoustic signal prediction based on correlation variational mode decomposition and error compensation. IEEE Access. 2020;8:103941–103955.
Yang H., Li L.L., Li G.H., Guan Q.R. A novel feature extraction method for ship-radiated noise. Def. Technol. 2021. doi: 10.1016/j.dt.2021.03.012.
Yang H., Cheng Y.X., Li G.H. A denoising method for ship radiated noise based on Spearman variational mode decomposition, spatial-dependence recurrence sample entropy, improved wavelet threshold denoising, and Savitzky-Golay filter. Alex. Eng. J. 2021;60(3):3379–3400.
Zhang G.P., Liu X.D. Prediction and control of COVID-19 spreading based on a hybrid intelligent model. PLOS One. 2021;16(2) (1-12). doi: 10.1371/journal.pone.0246360.
Zhang H., Cui W.T., Kang Z.J., Yang T., Lou B., Chi Y.T., Long H., Ma M., Yuan Q., Zhang S.P., Zhang D., Xin J.M. Predicting COVID-19 using hybrid AI mode. IEEE Trans. Cybern. 2020;50(7):2891–2904. doi: 10.1109/TCYB.2020.2990162.
Zheng J.D., Pan H.Y., Cheng J.S. Rolling bearing fault detection and diagnosis based on composite multiscale fuzzy entropy and ensemble support vector machines. Mech. Syst. Signal Process. 2017;85:746–759.
Zheng X.X., Zhou G.W., Ren H.H., Fu Y. A rolling bearing fault diagnosis method based on variational mode decomposition and permutation entropy. J. Vib. Shock. 2017;36(22):22–28.

[bib1] Ahmadianfar I., Bozorg-Haddad O., Chu X.F. Gradient-based optimizer: a new metaheuristic optimization algorithm. Inf. Sci. 2020;540:131–159. [Google Scholar]

[bib2] Annas S., Pratama M.I., Rifandi M., Sanusi W., Side S. Stability analysis and numerical simulation of SEIR model for pandemic COVID-19 spread in Indonesia. Chaos, Solitons and Fractals. Chaos Solitons Fractals. 2020;139 doi: 10.1016/j.chaos.2020.110072. (1-17) [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib3] Balli S. Data analysis of COVID-19 pandemic and short-term cumulative case forecasting using machine learning time series methods. Chaos Solitons Fractals. 2021;142 doi: 10.1016/j.chaos.2020.110512. (1-7) [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib4] Bhapkar H.R., Mahalle P.N., Dey N., Santosh K.C. Revisited COVID-19 mortality and recovery rates: are we missing recovery time period? J. Med. Syst. 2020;44(12):202. doi: 10.1007/s10916-020-01668-6. : 202(1-5) [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib5] Chen D.N., Zhang Y.D., Yao C.Y., Lai B.W., Lv S.J. Fault diagnosis method based on variational mode decomposition and multi-scale permutation entropy. Comput. Integr. Manuf. Syst. 2017;23(12):2604–2612. [Google Scholar]

[bib6] Chen J.X., Zhu Z., Zhang X.D. Feature cognitive model combined by an improved variational mode and singular value decomposition for fault signals. Cogn. Comput. Syst. 2020;2(2):66–71. [Google Scholar]

[bib7] Chorowski J., Wang J., Zurada J.M. Review and performance comparison of SVM- and ELM-based classifiers. Neurocomputing. 2014;128:507–516. [Google Scholar]

[bib8] Chu J.W., Dong Y.C., Han X.X., Xie J., Xu X.Y., Xie G. Short-term prediction of urban PM2.5 based on a hybrid modified variational mode decomposition and support vector regression model. Environ. Sci. Pollut. Res. 2021;28(1):56–72. doi: 10.1007/s11356-020-11065-8. [DOI] [PubMed] [Google Scholar]

[bib9] Cong W., Jie Y., Xu W., Min L. Analysis on early spatiotemporal transmission characteristics of COVID-19. Acta Phys. Sin. 2020;69(8) (1-10) [Google Scholar]

[bib10] Cooper I., Mondal A., Antonopoulos C.G. A SIR model assumption for the spread of COVID-19 in different communities. Chaos Solitons Fractals. 2020;139 doi: 10.1016/j.chaos.2020.110057. (1-19) [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib11] Ding S.F., Zhao H., Zhang Y.N., Xu X.Z., Nie R. Extreme learning machine: algorithm, theory and applications. Artif. Intell. Rev. 2015;44(1):103–115. [Google Scholar]

[bib12] Dragomiretskiy K., Zosso D. Variational mode decomposition. IEEE Trans. Signal Process. 2014;62(3):531–544. [Google Scholar]

[bib13] Feng S., Feng Z.B., Ling C., Chang C., Feng Z.K. Prediction of the COVID-19 epidemic trends based on SEIR and AI models. PLOS One. 2021;16(1) doi: 10.1371/journal.pone.0245101. (1-15) [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib14] Hamadneh N.N., Tahir M., Khan W.A. Using artificial neural network with prey predator algorithm for prediction of the COVID-19: the case of Brazil and Mexico. Mathematics. 2021;9(2):180. [Google Scholar]

[bib15] Huang G., Huang G.B., Song S.J., You K.Y. Trends in extreme learning machines: a review. Neural Netw. 2015;61:32–48. doi: 10.1016/j.neunet.2014.10.001. [DOI] [PubMed] [Google Scholar]

[bib16] Ibrahim M.A., Al-Najafi A. Modeling, control, and prediction of the spread of COVID-19 using compartmental, logistic, and Gauss models: a case study in Iraq and Egypt. Processes. 2020;8(11):1400. [Google Scholar]

[bib17] Jiang Y., Huang G.Q., Yang Q.S., Yan Z.T., Zhang C.F. A novel probabilistic wind speed prediction approach using real time refined variational model decomposition and conditional kernel density estimation. Energy Convers. Manag. 2019;185:758–773. [Google Scholar]

[bib18] Kasun L.L.C., Zhou H.M., Huang G.B., Vong C.M. Representational learning with ELMs for big data. IEEE Intell. Syst. 2013;28(6):31–34. [Google Scholar]

[bib19] Ladha N., Bhardwaj P., Charan J., Mitra P., Goyal J.P., Sharma P., Singh K., Misra S. Association of environmental parameters with COVID-19 in Delhi, India. Indian J. Clin. Biochem. 2020:1–5. doi: 10.1007/s12291-020-00921-6. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib20] Lang X., Rehman N.U., Zhang Y.F., Xie L., Su H.Y. Median ensemble empirical mode decomposition. Signal Process. 2020;176 (1-8) [Google Scholar]

[bib21] Li G.H., Zheng M.T., Yang H. Cycle analysis method of tree ring and solar activity based on variational mode decomposition and Hilbert transform. Adv. Meteorol. 2019;2019:1715673–1715678. (1-8) [Google Scholar]

[bib22] Li G.H., Chang W.N., Yang H. A new hybrid model for underwater acoustic signal prediction. Complexity. 2020;2020 (1-19) [Google Scholar]

[bib23] Li G.H., Chang W.N., Yang H. A novel combined prediction model for monthly mean precipitation with error correction strategy. IEEE Access. 2020;8:141432–141445. [Google Scholar]

[bib24] Li Y.X., Li Y.A., Chen X., Yu J. Research on ship-radiated noise denoising using secondary variational mode decomposition and correlation coefficient. Sensors. 2018;18(1) doi: 10.3390/s18010048. 48(1-17) [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib25] Moftakhar L., Seif M., Safe M.S. Exponentially increasing trend of infected patients with COVID-19 in Iran: a comparison of neural network and ARIMA forecasting models. Iran. J. Public Health. 2020;49:92–100. doi: 10.18502/ijph.v49iS1.3675. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib26] Qiang X.L., Aamir M., Naeem M., Ali S., Aslam A., Shao Z.H. Analysis and forecasting COVID-19 outbreak in Pakistan using decomposition and ensemble model. Comput. Mater. Contin. 2021;68(1):841–856.

[bib27] Rafieenasab S., Zahiri A.P., Roohi E. Prediction of peak and termination of novel coronavirus COVID-19 epidemic in Iran. Int. J. Mod. Phys. C. 2020;31(11):1–23.

[bib28] Rath S., Tripathy A., Tripathy A.R. Prediction of new active cases of coronavirus disease (COVID-19) pandemic using multiple linear regression model. Diabetes Metab. Syndr. Clin. Res. Rev. 2020;14(5):1467–1474. doi: 10.1016/j.dsx.2020.07.045.

[bib29] Santosh K.C. COVID-19 prediction models and unexploited data. J. Med. Syst. 2020;44(9):170 (1–4). doi: 10.1007/s10916-020-01645-z.

[bib30] Santosh K.C. AI-driven tools for coronavirus outbreak: need of active learning and cross-population train/test models on multitudinal/multimodal data. J. Med. Syst. 2020;44(5):93 (1–5). doi: 10.1007/s10916-020-01562-1.

[bib31] Shahid F., Zameer A., Muneeb M. Predictions for COVID-19 with deep learning models of LSTM, GRU and Bi-LSTM. Chaos Solitons Fractals. 2020;140. doi: 10.1016/j.chaos.2020.110212.

[bib32] Shrivastava N.A., Lohia K., Panigrahi B.K. A multiobjective framework for wind speed prediction interval forecasts. Renew. Energy. 2016;87:903–910.

[bib33] Shyam Sunder Reddy K., Padmanabha Reddy Y.C.A., Mallikarjuna Rao Ch. Recurrent neural network based prediction of number of COVID-19 cases in India. Mater. Today Proc. 2020. doi: 10.1016/j.matpr.2020.11.117.

[bib34] Singh P., Gupta A. Generalized SIR (GSIR) epidemic model: An improved framework for the predictive monitoring of COVID-19 pandemic. ISA Trans. 2021. doi: 10.1016/j.isatra.2021.02.016.

[bib35] Singh S., Parmar K.S., Kumar J. Development of new hybrid model of discrete wavelet decomposition and autoregressive integrated moving average (ARIMA) models in application to one month forecast the casualties cases of COVID-19. Chaos Solitons Fractals. 2020;135:1–8. doi: 10.1016/j.chaos.2020.109866.

[bib36] Tang Z.P., Zhang T.T., Wu J.C., Du X.X., Chen K.J. Multistep-ahead stock price forecasting based on secondary decomposition technique and extreme learning machine optimized by the differential evolution algorithm. Math. Probl. Eng. 2020;2020:2604915 (1–13).

[bib37] Wang D.Y., Wei S., Luo H.Y., Yue C.Q., Grunder O. A novel hybrid model for air quality index forecasting based on two-phase decomposition technique and modified extreme learning machine. Sci. Total Environ. 2017;580:719–733. doi: 10.1016/j.scitotenv.2016.12.018.

[bib38] Wang R.H., Li C.S., Fu W.L., Tang G. Deep learning method based on gated recurrent unit and variational mode decomposition for short-term wind power interval prediction. IEEE Trans. Neural Netw. Learn. Syst. 2020;31(10):3814–3827. doi: 10.1109/TNNLS.2019.2946414.

[bib39] Xiang Y., Jia Y.H., Chen L.L., Guo L., Shu B.Z., Long E.S. COVID-19 epidemic prediction and the impact of public health interventions: a review of COVID-19 epidemic models. Infect. Dis. Model. 2021;6:324–342. doi: 10.1016/j.idm.2021.01.001.

[bib40] Yang H., Gao L.P., Li G.H. Underwater acoustic signal prediction based on MVMD and optimized kernel extreme learning machine. Complexity. 2020;2020:1–18.

[bib41] Yang H., Li L.L., Li G.H. A new denoising method for underwater acoustic signal. IEEE Access. 2020;8:201874–201888.

[bib42] Yang H., Gao L.P., Li G.H. Underwater acoustic signal prediction based on correlation variational mode decomposition and error compensation. IEEE Access. 2020;8:103941–103955.

[bib43] Yang H., Li L.L., Li G.H., Guan Q.R. A novel feature extraction method for ship-radiated noise. Def. Technol. 2021. doi: 10.1016/j.dt.2021.03.012.

[bib44] Yang H., Cheng Y.X., Li G.H. A denoising method for ship radiated noise based on Spearman variational mode decomposition, spatial-dependence recurrence sample entropy, improved wavelet threshold denoising, and Savitzky-Golay filter. Alex. Eng. J. 2021;60(3):3379–3400.

[bib45] Zhang G.P., Liu X.D. Prediction and control of COVID-19 spreading based on a hybrid intelligent model. PLoS One. 2021;16(2):1–12. doi: 10.1371/journal.pone.0246360.

[bib46] Zhang H., Cui W.T., Kang Z.J., Yang T., Lou B., Chi Y.T., Long H., Ma M., Yuan Q., Zhang S.P., Zhang D., Xin J.M. Predicting COVID-19 using hybrid AI model. IEEE Trans. Cybern. 2020;50(7):2891–2904. doi: 10.1109/TCYB.2020.2990162.

[bib47] Zheng J.D., Pan H.Y., Cheng J.S. Rolling bearing fault detection and diagnosis based on composite multiscale fuzzy entropy and ensemble support vector machines. Mech. Syst. Signal Process. 2017;85:746–759.

[bib48] Zheng X.X., Zhou G.W., Ren H.H., Fu Y. A rolling bearing fault diagnosis method based on variational mode decomposition and permutation entropy. J. Vib. Shock. 2017;36(22):22–28.
