Keywords: COVID-19, Spatial information, Time series, Deep learning, STG-Net, Confirmed cases forecasting
Abstract
Modern urban populations are dense and highly mobile, and COVID-19 is highly transmissible and has a long incubation period. Models that consider only the time series of COVID-19 transmission therefore cannot adequately capture how the epidemic spreads, because the distance between cities and their population density also have a significant impact on transmission. Current cross-domain transmission prediction models do not fully exploit spatio-temporal information or the fluctuation trend of the data, and cannot reasonably predict the trend of infectious diseases by integrating multi-source spatio-temporal information. To solve this problem, this paper proposes STG-Net, a COVID-19 prediction network based on multivariate spatio-temporal information. It introduces a Spatial Information Mining module (SIM) and a Temporal Information Mining module (TIM) to mine the spatio-temporal information of the data more deeply, and uses a slope feature method to capture the fluctuation trend of the data. We also introduce a Gramian Angular Field module (GAF), which converts one-dimensional data into two-dimensional images and further enhances the network's feature mining capability in the time and feature dimensions; the spatio-temporal information is ultimately combined to predict daily newly confirmed cases. We tested the network on datasets from China, Australia, the United Kingdom, France, and the Netherlands. The experimental results show that STG-Net has better prediction performance than existing prediction models, with an average coefficient of determination (R2) of 98.23 % on the datasets from the five countries, as well as good long- and short-term prediction ability and good overall robustness.
1. Introduction
Since the outbreak of COVID-19, the epidemic has spread rapidly in many countries and regions around the world, and the World Health Organization has classified the epidemic as a “global pandemic” [1]. It has become a great challenge for governmental agencies to quickly designate appropriate prevention strategies for this major public health event. Therefore, early prediction of the epidemic trend is needed to help the government prevent and effectively control a large-scale outbreak of COVID-19. In terms of epidemic prediction, the main methods used by researchers are mathematical epidemic models and deep learning methods.
In mathematical epidemiological models, most studies have analyzed the spread of COVID-19 by modifying and adapting two models, SIR (Susceptible, Infected, and Recovered) and SEIR (Susceptible, Exposed, Infected, and Recovered) [2], [3]. Liao et al. [4] proposed a time-window-based SIR prediction model that dynamically analyzes data through time windows and then uses machine learning methods to predict the basic reproduction number and exponential growth rate of the epidemic. Singh et al. [5] proposed a generalized SIR (GSIR) prediction framework that captures successive waves of the pandemic from the data and models the impact of government decisions through dynamic modeling of parameters. Liu et al. [6] developed a region-based SEIR model that considers infections within spatio-temporal regions during commuting and quarantine, demonstrating the importance of spatio-temporal restriction policies such as social distancing for epidemic prevention and control. Cooper et al. [7] proposed a new model based on the SIR model that can dynamically track changes in community infection. However, traditional prediction models consider few factors, whereas the spread of COVID-19 is influenced by several, such as population movement [8], virus variants [9], and city size [10]. Therefore, more advanced methods are needed to predict COVID-19 more effectively.
With the development of computer technology, deep learning techniques have been widely applied to the diagnosis and prediction of COVID-19. For diagnosis, Wang et al. [11] proposed PSTCNN, a self-adjusting convolutional neural network guided by the particle swarm optimization (PSO) algorithm, which can automatically adjust the hyperparameters in COVID-19 diagnosis and significantly improve the performance of the algorithm. Wang et al. [12] proposed WE-SAJ, a deep learning model for the fast diagnosis and detection of COVID-19, which analyzes the CT images of patients and uses wavelet entropy for feature extraction, a two-layer FNN for classification, and the adaptive Jaya algorithm for training. For prediction, Liao et al. [13] proposed a time-dependent SIRVD prediction model that combines deep learning techniques with a mathematical model of infectious diseases incorporating mortality and vaccination rates, resulting in a 50 % performance improvement in single-day prediction. Chandra et al. [14] performed multi-step (short-term) COVID-19 prediction for India using a bidirectional LSTM and successfully captured the first wave (2020) and second wave (2021) of infections in India. Bhimala et al. [15] analyzed the relationship between weather factors and COVID-19 cases and added temperature and humidity as features to an LSTM network, thus improving its predictive power. However, with the rapid development of modern transportation and accelerated population movement, diseases are more likely to spread across regions, and ordinary prediction methods cannot incorporate geographical information well [16]. Graph Neural Networks (GNNs) are well suited to passing information between spatial nodes, and using them can better mine potential regional features and yield more accurate predictions of epidemic trends [17]. Panagopoulos et al. [18] proposed a spatio-temporal prediction network that constructs graph information from the social networks of human groups and uses graph neural networks to mine the potential relationship between population movement and the spread of COVID-19 in order to predict future confirmed cases. Although the above methods have obtained good prediction results, they have certain limitations. For example, most methods consider only the temporal aspect of the data and do not simultaneously take into account population and location information between cities. The factors considered are limited and rely on a series of assumptions, which makes it difficult to obtain the necessary data and limits generalization to most countries.
To address the above problems, we propose a novel fusion network (STG-Net) containing three modules, in which the Spatial Information Mining module (SIM) uses Graph Convolutional Neural Network (GCN) to mine spatial information of each province (state) of each country, and the Temporal Information Mining module (TIM) uses long short-term memory network (LSTM) to mine the temporal correlation of data.
In order to be able to better integrate the feature dimension with the temporal dimension, we also propose a Gramian Angular Field transformation module (GAF), which transforms the one dimensional temporal data into two dimensional image data and uses a Convolutional Neural Network (CNN) for feature extraction. STG-Net fuses the multivariate information of the three modules and uses an end-to-end training approach to penalize the loss functions of the three modules simultaneously using a joint optimization method. The contributions of this paper can be summarized as follows:
- The data features are diversified by introducing a data enhancement method based on slope features, while a weighted moving average method is used for noise elimination of the data.
- A multivariate spatio-temporal information prediction network (STG-Net) containing a Spatial Information Mining module (SIM), a Temporal Information Mining module (TIM), and a Gramian Angular Field transformation module (GAF) is proposed to mine the spatial information, temporal information, and temporal and feature fusion information of the data by the three modules, respectively.
- The STG-Net was used in each of the five countries in different continents to make accurate predictions of the epidemic with very strong generalizability, while the prediction performance was improved compared to other comparative methods.
The rest of the paper is organized as follows. Section 2 presents the proposed method. Section 3 reports the experiments and analyzes the results to demonstrate the effectiveness of STG-Net. Section 4 discusses the findings, and the last section summarizes the paper.
2. Materials and methods
Our prediction process is divided into three main parts: data collection, data pre-processing, and construction of the STG-Net prediction network. The overall workflow is shown in Fig. 1 .
Fig. 1.
Overview of the STG-Net Predictive Workflow. (a) Collection of COVID-19 Time Series Data. (b) Pre-processing of collected COVID-19 Data. (c) Training of STG-Net network using processed data for predicting COVID-19 infections.
Step 1: Data collection. First, the COVID-19 dataset is obtained, including the daily number of new confirmed cases, the cumulative number of confirmed cases, recovered cases, and deaths in each province (state) of a single country. Then the distances between the geographic centroids of the provinces (states) and the resident population of each province (state) are collected and used to construct the adjacency matrix.
Step 2: Data pre-processing. The COVID-19 dataset obtained in the first step is processed using the slope feature approach. Then the data are noise-reduced and smoothed using the weighted moving average method.
Step 3: Construction of STG-Net prediction network. Based on the data obtained in the previous step, the input vector of STG-Net is constructed. The three modules of STG-Net are jointly optimized to select the best model parameters. The number of COVID-19 infections is then predicted.
STG-Net is designed to predict the daily number of new confirmed COVID-19 cases. In the rest of this section, we describe each component in detail.
2.1. COVID-19 data preparation
2.1.1. Data source
Since the outbreak of novel coronavirus pneumonia, many institutions have released public data sources and updated them in real time, helping researchers analyze and predict the outbreak. The data used in this paper are time-series data collected by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University (https://github.com/CSSEGISandData/COVID-19) [19], which provide variables such as country, province (state), longitude, and latitude, together with the cumulative number of confirmed cases, cumulative cured cases, and cumulative deaths for each date. The countries considered in this paper include China, Australia, the Netherlands, the United Kingdom, and France. The time series data for these five countries cover the dates from January 22, 2020 to October 6, 2022. Each country has a total of 990 data points. Table 1 shows a portion of the sample data for China in the dataset. The data format for the other countries used in the study is the same as that for China.
Table 1.
Sample data from Johns Hopkins University of global confirmed cases.
| Province/State | Country/Region | Latitude | Longitude | 1/22/20 | 1/23/20 | … | 9/29/21 | 9/30/21 | … |
|---|---|---|---|---|---|---|---|---|---|
| Beijing | China | 40.1824 | 116.4142 | 14 | 22 | … | 1123 | 1124 | … |
| Chongqing | China | 30.0572 | 107.874 | 6 | 9 | … | 603 | 603 | … |
| Fujian | China | 26.0789 | 117.9874 | 1 | 5 | … | 1282 | 1282 | … |
| Gansu | China | 35.7518 | 104.2861 | 0 | 2 | … | 199 | 199 | … |
2.1.2. Data pre-processing
In general, there will be noise in the collected data, which will affect the subsequent use of deep learning methods to extract temporal features. To deal with this problem, we smoothed the data using a weighted moving average method to handle data noise and anomalies. We give greater weight to the most recent case features relative to the historical case features. For example, for confirmed cases, the most recent confirmed cases have the most influence, and historical confirmed cases become progressively less influential as time lengthens. The equations are as follows.
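$$\mathrm{WMA}_t = \frac{\sum_{i=1}^{n} w_i\, x_{t-i+1}}{\sum_{i=1}^{n} w_i} \tag{1}$$

The variable $n$ is the window size, $w_i$ is the weight coefficient, $x_t$ is the observed value at time $t$, and $\mathrm{WMA}_t$ is the moving average at time $t$. Usually we set the weight coefficients to $w_i = n - i + 1$: the nearest one has a numerical weight of $n$, the next closest one is $n-1$, and so on, as shown in Equation (2).

$$\mathrm{WMA}_t = \frac{n\,x_t + (n-1)\,x_{t-1} + \cdots + x_{t-n+1}}{n + (n-1) + \cdots + 1} \tag{2}$$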
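The smoothing step can be illustrated with a minimal sketch of a linearly weighted moving average. The function name `weighted_moving_average`, the sample values, and the choice to leave the first few points unsmoothed are assumptions made for illustration, not details taken from the paper.

```python
def weighted_moving_average(series, window):
    """Linearly weighted moving average.

    Within each window the newest value gets weight `window`, the one
    before it `window - 1`, and so on down to 1 for the oldest value.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    weights = list(range(1, window + 1))  # ordered oldest -> newest
    total = sum(weights)
    smoothed = []
    for t in range(len(series)):
        if t + 1 < window:
            # Not enough history yet: keep the raw value (assumption).
            smoothed.append(float(series[t]))
            continue
        chunk = series[t - window + 1 : t + 1]  # ordered oldest -> newest
        smoothed.append(sum(w * x for w, x in zip(weights, chunk)) / total)
    return smoothed


# Hypothetical daily counts with one spike on the third day.
daily_cases = [10, 12, 30, 14, 16]
print(weighted_moving_average(daily_cases, window=3))
```

With a window of 3, the spike of 30 is damped but recent days still dominate each average, which matches the stated goal of giving the most recent cases the largest influence.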
Recent fluctuations in the data often indicate its near-term trend. For example, if the number of daily new cases is rising, it is likely to keep rising in the near future. Therefore, to capture the recent fluctuations of the data and at the same time improve the learning efficiency of the model, the slope information is also used as a feature in this paper. The slope value $k_i^t$ of the $i$-th feature at time $t$ is defined in Equation (3).

$$k_i^t = \frac{x_i^t - x_i^{t-1}}{t - (t-1)} = x_i^t - x_i^{t-1} \tag{3}$$

Here $x_i^t$ is the value of the $i$-th feature at moment $t$ and $x_i^{t-1}$ is the value of the $i$-th feature at moment $t-1$. When $k_i^t > 0$, the data has an upward trend; when $k_i^t < 0$, the data has a downward trend.
We used the cumulative number of confirmed cases, the daily number of deaths, the daily number of recoveries, and the daily number of new confirmed cases and their slopes on two adjacent days as the inputs to the network. Also, the number of new confirmed cases on the next day is used as the output. We use MinMaxScaler for normalization so that the input data are between [0,1], which puts the indicators in the same order of magnitude and facilitates a comprehensive comparative evaluation.
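The slope feature and the normalization can be sketched as follows. This is a plain-Python illustration under stated assumptions: the slope is taken as the difference between two adjacent days, the first day (which has no predecessor) is assigned 0.0, and `min_max_scale` is a hand-written stand-in for a min-max scaler rather than the library class used by the authors.

```python
def slope_feature(values):
    """Day-over-day change; positive means rising, negative means falling.

    The first day has no predecessor, so its slope is set to 0.0 (assumption).
    """
    return [0.0] + [float(values[t] - values[t - 1]) for t in range(1, len(values))]


def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant series: no spread to rescale
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical daily new confirmed cases.
new_cases = [5, 8, 8, 6, 10]
slopes = slope_feature(new_cases)
scaled_cases = min_max_scale(new_cases)
scaled_slopes = min_max_scale(slopes)
print(slopes)
print(scaled_cases)
```

Scaling each feature separately keeps the raw counts and their slopes on the same order of magnitude before they are stacked into one input vector.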
2.2. Multivariate spatio-temporal information prediction network (STG-Net)
We propose a new multivariate spatio-temporal information prediction network (STG-Net), which is based on three modules: 1) Spatial Information Mining module (SIM), 2) Temporal Information Mining module (TIM), and 3) Gramian Angular Field transformation module (GAF). The first two modules are used to mine the spatial and temporal information of the data, respectively, while the third module is designed to enhance the feature representation and better extract the multidimensional features of the data (Fig. 2 ).
Fig. 2.
Diagrammatic representation of our STG-Net, starting with (A) an overview of the processing flow, followed by a zoomed-in view of each module, including (B) the Spatial Information Mining module (SIM), (C) the Temporal Information Mining module (TIM), and (D) the Gramian Angular Field transformation module (GAF).
In this work, we choose to use a joint optimization approach to penalize the loss functions of the three modules simultaneously to achieve the best performance of STG-Net. A detailed description of each module will be given below.
2.2.1. Spatial information mining module (SIM)
To better mine the spatial information of the data, we first propose a spatial information mining module. We use $G = (V, E)$ to represent the network topology between regions of a country, where $V$ is the set of regional nodes of the country and $E$ is the set of edges between regions. For the value of the edge between two adjacent nodes $v_i$ and $v_j$ in the network topology, we consider two factors, distance and population [20]. Taking Australia as an example, its seven state regions and one capital region are chosen as the set of nodes $V$. The adjacency matrix $A$ is used to represent the network topology of these eight regions (as shown in Fig. 3).
Fig. 3.
Construction of Network Topology Using Australia as an Example (a) Determining node positions by geographic centroid latitude and longitude of each state in Australia. (b) Building network structure by connecting neighboring nodes of each state in Australia.
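For distance, the smaller the distance between nodes $v_i$ and $v_j$, the greater the probability that the virus propagates between them. Therefore, we take the latitude and longitude of the geographic centroids of nodes $v_i$ and $v_j$ and calculate the distance between the two nodes by Equation (4). For node $v_i$ with $(\mathrm{lon}_i, \mathrm{lat}_i)$ and node $v_j$ with $(\mathrm{lon}_j, \mathrm{lat}_j)$, we get the distance:

$$d_{ij} = 2R \arcsin\left(\sqrt{\sin^2\left(\frac{\varphi_i - \varphi_j}{2}\right) + \cos\varphi_i \cos\varphi_j \sin^2\left(\frac{\lambda_i - \lambda_j}{2}\right)}\right) \tag{4}$$

Where:

$$\lambda_i = \mathrm{lon}_i \cdot \frac{\pi}{180} \tag{5}$$

$$\varphi_i = \mathrm{lat}_i \cdot \frac{\pi}{180} \tag{6}$$

$$\lambda_j = \mathrm{lon}_j \cdot \frac{\pi}{180} \tag{7}$$

$$\varphi_j = \mathrm{lat}_j \cdot \frac{\pi}{180} \tag{8}$$

In the above equations, $R$ represents the radius of the Earth. $d_{ij}$ denotes the distance between the points $v_i$ and $v_j$. $(\mathrm{lon}_i, \mathrm{lat}_i)$ and $(\mathrm{lon}_j, \mathrm{lat}_j)$ denote the longitude and latitude of $v_i$ and $v_j$ respectively. $\lambda$ and $\varphi$ represent the longitude and latitude converted into radians respectively.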
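The great-circle distance between two centroids can be computed with the haversine formula. The sketch below is one common way to write it; the constant `EARTH_RADIUS_KM` (mean Earth radius of 6371 km) and the function name `haversine_km` are assumptions, and the coordinates are the Beijing and Chongqing entries of Table 1.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; assumed value


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in kilometres between two (lon, lat) points in degrees."""
    # Convert degrees to radians before applying the trigonometric functions.
    lam1, phi1, lam2, phi2 = map(math.radians, (lon1, lat1, lon2, lat2))
    a = (
        math.sin((phi2 - phi1) / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin((lam2 - lam1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


beijing = (116.4142, 40.1824)    # (longitude, latitude) from Table 1
chongqing = (107.874, 30.0572)   # (longitude, latitude) from Table 1
distance = haversine_km(*beijing, *chongqing)
print(round(distance), "km")
```

Because the formula works on centroids, the result is a distance between region centres rather than between the nearest borders of the two regions.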
We also consider the effect of the population size of each state on the graph matrix. The larger the population, the greater the population mobility between states and the more easily the virus spreads. We calculate the population mobility between nodes $v_i$ and $v_j$ by Equation (9).
| (9) |
Equation (9) is computed from the resident population of $v_i$ and the resident population of $v_j$.
Finally, we calculate the weight of each edge in the edge set by considering the distance and population factors together, and obtain the adjacency matrix by Equation (10).
| (10) |
After constructing the adjacency matrix, we use a Graph Convolutional Neural Network (GCN) to mine the feature relationships implicit in the graph nodes. For each node, the GCN takes its neighbor relationships and its own feature information as input and obtains the output after multiple convolutional layers. Our GCN consists of three layers and uses the symmetrically normalized Laplacian matrix for forward propagation, with the weight $W$ serving as the parameter in the forward propagation equation. The feedforward computation of the GCN is shown in Equations (11) and (12).

$$H^{(l+1)} = \sigma\left(\tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} H^{(l)} W^{(l)}\right) \tag{11}$$

$$\tilde{A} = A + I \tag{12}$$

$l$ denotes the index of the current convolution layer and $H^{(l)}$ the node features at that layer; $\tilde{A}$ denotes the adjacency matrix with self-loops, obtained by adding self-loops (the identity matrix $I$) to the original adjacency matrix $A$; $\tilde{D}$ is the degree matrix corresponding to $\tilde{A}$; $\tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}}$ is the normalized form of the Laplacian matrix; $W^{(l)}$ denotes the weight coefficient of the $l$-th layer; and $\sigma$ denotes the ReLU function. The network structure of the GCN is shown in Fig. 4.
Fig. 4.
GCN network structure.
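A single graph-convolution step with symmetric normalization can be sketched in plain Python as below. This is a toy illustration of the standard propagation rule, not the authors' PyTorch implementation: the three-node chain graph, the feature values, and the weight matrix are hypothetical.

```python
import math


def normalized_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2 for a square, non-negative adjacency matrix."""
    n = len(adj)
    # Add self-loops so every node also keeps its own features.
    a_tilde = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_tilde]
    return [[a_tilde[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def gcn_layer(a_hat, features, weights):
    """One propagation step: ReLU(A_hat @ H @ W)."""
    z = matmul(matmul(a_hat, features), weights)
    return [[max(0.0, v) for v in row] for row in z]


# Hypothetical chain graph 0 - 1 - 2 with two features per node.
adj = [[0.0, 1.0, 0.0],
       [1.0, 0.0, 1.0],
       [0.0, 1.0, 0.0]]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights = [[0.5], [-0.25]]  # maps 2 input features to 1 output feature

a_hat = normalized_adjacency(adj)
out = gcn_layer(a_hat, features, weights)
print(out)
```

In the setting described above, the 0/1 entries of `adj` would be replaced by the edge weights built from distance and population, and several such layers would be stacked.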
2.2.2. Temporal information mining module (TIM)
To better mine the temporal information of the data, we add a temporal information mining module to the multivariate spatio-temporal information prediction network. It automatically learns the temporal correlation and structure of the data, enabling a better analysis of its periodicity and trend. This module mainly consists of an LSTM model [21], which adds a Cell and three Gates compared to the original RNN. The Cell preserves useful information from the previous moment, and the value of each Gate determines how much of the input information is retained. Specifically, the feedforward calculation of the LSTM is shown in Equations (13)-(18).

$$f_t = \sigma\left(W_f [h_{t-1}, x_t] + b_f\right) \tag{13}$$

$$i_t = \sigma\left(W_i [h_{t-1}, x_t] + b_i\right) \tag{14}$$

$$\tilde{C}_t = \tanh\left(W_c [h_{t-1}, x_t] + b_c\right) \tag{15}$$

$$C_t = f_t \circ C_{t-1} + i_t \circ \tilde{C}_t \tag{16}$$

$$o_t = \sigma\left(W_o [h_{t-1}, x_t] + b_o\right) \tag{17}$$

$$h_t = o_t \circ \tanh(C_t) \tag{18}$$

$W_f$, $W_i$, $W_o$ denote the parameter matrices associated with the three Gates, $b_f$, $b_i$, $b_o$ denote the bias parameters associated with the three Gates, $W_c$ and $b_c$ are the corresponding parameters of the candidate cell state, $\sigma$ is the Sigmoid function, and $\circ$ represents the operation of multiplying the elements of two vectors at the same position. The network structure of the LSTM is shown in Fig. 5.
Fig. 5.
Network structure of LSTM.
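To make the role of the Cell and the three Gates concrete, here is a minimal sketch of one step of a standard LSTM cell in plain Python. The parameter layout (a dictionary mapping gate names to a weight matrix and a bias acting on the concatenation of the previous hidden state and the current input) and the all-zero toy parameters are assumptions chosen so the arithmetic is easy to follow.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def affine(weight, vec, bias):
    """Compute weight @ vec + bias for a matrix given as a list of rows."""
    return [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weight, bias)]


def lstm_step(x_t, h_prev, c_prev, params):
    """One step of a standard LSTM cell.

    params maps "f", "i", "o", "c" to (W, b) pairs acting on [h_prev, x_t].
    """
    z = h_prev + x_t  # list concatenation: [h_{t-1}, x_t]
    f = [sigmoid(v) for v in affine(*params["f"][:1], z, params["f"][1])]    # forget gate
    i = [sigmoid(v) for v in affine(*params["i"][:1], z, params["i"][1])]    # input gate
    o = [sigmoid(v) for v in affine(*params["o"][:1], z, params["o"][1])]    # output gate
    g = [math.tanh(v) for v in affine(*params["c"][:1], z, params["c"][1])]  # candidate state
    c_t = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    h_t = [ok * math.tanh(ck) for ok, ck in zip(o, c_t)]
    return h_t, c_t


# Toy cell with hidden size 1 and input size 1; all parameters are zero,
# so every gate evaluates to 0.5 and the candidate state to 0.
zero = ([[0.0, 0.0]], [0.0])
params = {"f": zero, "i": zero, "o": zero, "c": zero}
h_t, c_t = lstm_step([1.0], [0.0], [2.0], params)
print(h_t, c_t)
```

With these zero parameters the forget gate keeps exactly half of the previous cell state, which shows how the gate value controls how much past information is retained.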
2.2.3. Gramian angular field transformation module (GAF)
Finally, to better integrate the feature dimension with the temporal dimension, we propose a Gramian angular field transformation module. Time series data are usually treated as one-dimensional, yet besides the feature values they carry a second dimension, time itself. Transforming a one-dimensional time series into two-dimensional image data can not only enhance the data [22] but also uncover deeper information in it [23]. We extract information from the variation of the COVID-19 feature time series by converting it into a two-dimensional Gramian matrix using the Gramian angular field transformation; the construction process is shown in Algorithm 1.
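| Algorithm 1: Gramian matrix transformation | |
|---|---|
| Input: $X$, $T$. Output: $G$, Y. $X$ is a one-dimensional time series, $G$ is a two-dimensional Gramian matrix, and $T$ is the selected time step. | |
| 1: | Normalize the time series features $X$ |
| 2: | for each value $x_i$ in $X$ do |
| 3-4: | rescale $x_i$ to the scaled value $\tilde{x}_i$ |
| 5: | end |
| 6: | obtain the scaled series $\tilde{X}$ |
| 7: | Construct polar coordinates by using the time stamp as the radius $r$ and the inverse cosine of the scaled value as the angle $\phi$ |
| 8: | for each scaled value $\tilde{x}_i$ with time stamp $t_i$ do |
| 9-10: | $\phi_i = \arccos(\tilde{x}_i)$, $r_i = t_i$ |
| 11: | end |
| 12: | obtain $\phi$, $r$ |
| 13: | The angles $\phi$ in polar coordinates are summed in pairs, and the cosine of each sum is taken as the value of the corresponding entry of the matrix |
| 14: | for $i = 1, \dots, T$ do |
| 15: | for $j = 1, \dots, T$ do |
| 16: | $G_{ij} = \cos(\phi_i + \phi_j)$ |
| 17: | end |
| 18: | return $G$, Y |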
By the above method we can then convert the one dimensional time series into two dimensional image data by constructing polar coordinates (Fig. 6 ), and then we add a Convolutional Neural Network (CNN) to the Gramian Angular Field transformation module for feature extraction and prediction.
Fig. 6.
Conversion Process of Gramian Angular Field. (a) Scaling of a One-Dimensional Time Series. (b) Transformation of the Scaled Time Series to Polar Coordinates. (c) Final Gram Matrix Obtained.
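The three stages in Fig. 6 (scaling, polar encoding, Gram matrix) can be sketched in a few lines of plain Python. The sketch builds the summation form of the Gramian angular field, where each entry is the cosine of the sum of two angles. Rescaling to the interval [-1, 1] is an assumption made here so that the inverse cosine covers its full range; the scaling interval actually used in the paper is not stated in this section.

```python
import math


def gramian_angular_field(series):
    """Summation-type Gramian angular field of a one-dimensional series."""
    lo, hi = min(series), max(series)
    if hi == lo:
        raise ValueError("series must contain at least two distinct values")
    # (a) Rescale to [-1, 1] (assumed interval) and clamp against rounding error.
    scaled = [2.0 * (x - lo) / (hi - lo) - 1.0 for x in series]
    scaled = [min(1.0, max(-1.0, v)) for v in scaled]
    # (b) Polar encoding: the angle is the inverse cosine of the scaled value.
    phi = [math.acos(v) for v in scaled]
    # (c) Entry (i, j) is the cosine of the pairwise angle sum.
    return [[math.cos(a + b) for b in phi] for a in phi]


# Hypothetical three-day window of a single feature.
gaf = gramian_angular_field([0.0, 1.0, 2.0])
for row in gaf:
    print([round(v, 3) for v in row])
```

The resulting matrix is symmetric, and its size equals the window length, so a window of T days yields a T x T single-channel image for the CNN.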
The core network in the Gramian angular field transformation module is a Convolutional Neural Network (CNN) [24]. The convolutional neural network mainly consists of an input layer, a hidden layer and an output layer, and the hidden layer contains several convolutional layers, pooling layers and fully connected layers. The convolutional layer is the core part of the CNN, and its main task is to extract the features of the image. The computation in the convolutional layer is to convolve the input feature vector with the convolutional kernel, and the result is then transformed by the activation function to obtain a new feature. The specific computation is shown in Equation (19).
$$x_j^{l} = f\left(\sum_{i} x_i^{l-1} * w_{ij}^{l} + b_j^{l}\right) \tag{19}$$

$x_j^{l}$ is the output value of the $j$-th neuron in the $l$-th layer, $w_{ij}^{l}$ is the weight of the convolution kernel, $x_i^{l-1}$ is the input value, $b_j^{l}$ is the bias, $*$ is the convolution operation, and $f$ is the activation function. The image features can be extracted using alternating operations of the convolution and pooling layers, and then the output values of the samples are obtained through the fully connected layer. The network structure of the CNN is shown in Fig. 7.
Fig. 7.
Network structure of CNN.
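The convolution-plus-activation computation can be illustrated with a minimal single-channel sketch. It slides the kernel over the image without padding and without flipping the kernel (the cross-correlation convention used by deep learning frameworks), then applies ReLU; the choice of ReLU, the 3 x 3 input, and the kernel values are assumptions for illustration.

```python
def conv2d_relu(image, kernel, bias=0.0):
    """Single-channel 'valid' convolution followed by ReLU.

    The kernel is applied without flipping, as in deep learning frameworks.
    """
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for r in range(len(image) - kh + 1):
        row = []
        for c in range(len(image[0]) - kw + 1):
            s = sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(kh)
                for j in range(kw)
            )
            row.append(max(0.0, s + bias))  # activation applied to sum plus bias
        out.append(row)
    return out


# Hypothetical 3 x 3 input and 2 x 2 kernel.
image = [[1.0, 2.0, 3.0],
         [4.0, 5.0, 6.0],
         [7.0, 8.0, 9.0]]
kernel = [[-1.0, 0.0],
          [0.0, 1.0]]
feature_map = conv2d_relu(image, kernel, bias=0.5)
print(feature_map)
```

A 2 x 2 kernel on a 3 x 3 input gives a 2 x 2 feature map; pooling and fully connected layers would follow in a complete network.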
Finally, we select the data of the previous days within the time window as the time series features of the nodes and input them to STG-Net, which contains the Spatial Information Mining module (SIM), the Temporal Information Mining module (TIM), and the Gramian Angular Field transformation module (GAF), to predict the number of newly confirmed COVID-19 cases on the following day.
| (20) |
| (21) |
Equations (20) and (21) denote the value of the $i$-th feature on day $t$ and the set of all features for day $t$ in that country, respectively. We use a joint optimization approach that penalizes the loss functions of the three modules simultaneously, and we obtain the best model performance by minimizing the final objective.
| (22) |
The final objective in Equation (22) combines the loss of the Spatial Information Mining module (SIM), the loss of the Temporal Information Mining module (TIM), and the loss of the Gramian Angular Field transformation module (GAF), weighted by adjustment factors.
2.2.4. Model evaluation metrics
In this paper, three evaluation metrics are used to compare the actual and predicted values and assess the performance of the model: Mean Absolute Error (MAE), Mean Squared Error (MSE), and R-squared (R2). MAE reflects the actual magnitude of the prediction error, MSE is the expected value of the squared difference between the predicted and actual values and reflects the degree of variation in the data, and R2 evaluates the model's goodness of fit. The calculations are shown in Equations (23)-(25).

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| \hat{y}_i - y_i \right| \tag{23}$$

$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( \hat{y}_i - y_i \right)^2 \tag{24}$$

$$R^2 = 1 - \frac{\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}{\sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2} \tag{25}$$

In the above equations, $n$ is the total number of samples, $\hat{y}_i$ is the prediction value of the model, $y_i$ is the true value of the sample, and $\bar{y}$ is the average of the sample.
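The three metrics are straightforward to compute directly from their standard definitions. The sketch below uses plain Python; the function names and the four sample values are hypothetical.

```python
def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical true and predicted values.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]
print(mae(y_true, y_pred), mse(y_true, y_pred), r2_score(y_true, y_pred))
```

Note that MAE and MSE computed on min-max scaled data are on the scaled range, so they are not directly comparable with errors expressed in raw case counts.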
3. Numerical results
3.1. Parameter setting
Our experiments are conducted on an NVIDIA GTX 1070 GPU using the open-source NumPy, pandas, and PyTorch libraries, and the main hyperparameters used in the experiments are shown in Table 2.
Table 2.
Hyperparameters used for training.
| Hyperparameter | Possible values |
|---|---|
| Epochs | {300} |
| Batch Size | {32} |
| Learning rate | {0.005,0.0005,0.00005} |
| Optimizer | {Adam} |
In this paper, all three modules of STG-Net are implemented using the PyTorch package and use MSE as the loss function. Table 3 shows the number of layers, the total number of parameters, and the model size for each module.
Table 3.
Number of layers, total number of parameters and the model size for each module.
| Module | Parameter | Values |
|---|---|---|
| SIM | Number of layers | 6 |
| SIM | Total params | 120,193 |
| SIM | Total model size | 0.458 MB |
| TIM | Number of layers | 3 |
| TIM | Total params | 50,497 |
| TIM | Total model size | 0.193 MB |
| GAF | Number of layers | 5 |
| GAF | Total params | 30,015 |
| GAF | Total model size | 0.114 MB |
The dataset used in the experiments is the historical data of the epidemic in Australia from April 29, 2020 to October 6, 2022, and we set the time series window size to 3 days. The data is divided into a training set and a test set to train and test our prediction model.
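The windowing and the train/test division can be sketched as follows. The 3-day window comes from the text; the function names, the toy data, and the chronological split with `train_fraction=0.8` are assumptions, since the split ratio is not given in this section.

```python
def make_windows(features, targets, window=3):
    """Pair each run of `window` consecutive days with the next day's target.

    features: list of per-day feature lists; targets: per-day new confirmed cases.
    """
    inputs, labels = [], []
    for end in range(window, len(features)):
        inputs.append(features[end - window:end])  # days end-window .. end-1
        labels.append(targets[end])                # the following day
    return inputs, labels


def chronological_split(inputs, labels, train_fraction=0.8):
    """Split without shuffling so the test set lies strictly after the training set."""
    cut = int(len(inputs) * train_fraction)  # train_fraction is an assumed value
    return (inputs[:cut], labels[:cut]), (inputs[cut:], labels[cut:])


# Toy data: 8 days, one feature per day.
features = [[float(d)] for d in range(8)]
targets = [10 + d for d in range(8)]
inputs, labels = make_windows(features, targets, window=3)
(train_x, train_y), (test_x, test_y) = chronological_split(inputs, labels)
print(len(train_x), len(test_x))
```

Keeping the split chronological avoids leaking future observations into training, which matters for any time series forecast.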
3.2. Experiments and analysis results
3.2.1. Method validity experiment
In our proposed STG-Net network, the Temporal Information Mining (TIM) module employs LSTM as the backbone network, while the Gramian Angular Field (GAF) transformation module uses the Gramian angular field transformation to obtain a two-dimensional representation of the data. To verify the effectiveness of these choices, a series of validity tests were performed on the test set. First, we tested the data on LSTM, BiLSTM, GRU, and BiGRU networks; the results are shown in Table 4.
Table 4.
LSTM, BiLSTM, GRU and BiGRU experimental results.
| Method | MAE | MSE | R2 |
|---|---|---|---|
| GRU | 0.1225 | 0.0172 | 80.98 % |
| BiGRU | 0.0868 | 0.0111 | 85.69 % |
| BiLSTM | 0.0721 | 0.0081 | 86.30 % |
| LSTM[25] | 0.0331(↓0.0894) | 0.0029(↓0.0143) | 87.87 %(↑6.89 %) |
To better highlight the differences in the prediction results of the four methods, we have conducted a statistical analysis on the average prediction performance indices obtained from Table 4 and visualized the results as shown in Fig. 8 . After comparison, we finally chose the LSTM network proposed by Yan et al. [25] as our baseline.
Fig. 8.
Comparison of Results for LSTM, BiLSTM, GRU, and BiGRU (The LSTM model has better prediction performance compared to GRU, BiGRU, and BiLSTM, with a maximum decrease of 0.0894 in MAE, a maximum decrease of 0.0143 in MSE, and a maximum increase of 6.89 % in R2).
When processing one-dimensional time-series data, we have the option of transforming the one-dimensional time-series signal into a two-dimensional image, and then using advanced image processing techniques such as convolutional neural networks to extract more data features from the one-dimensional time-series data. Methods for transforming one-dimensional data into a two-dimensional image include Gramian Angular Field (GAF), Continuous Wavelet Transform (CWT), Markov Transition Field (MTF), Recurrence Plot (RP), and Short-Time Fourier Transform (STFT). We utilized these methods to transform our collected one-dimensional test data into two-dimensional images, as shown in Fig. 9 .
Fig. 9.
The one-dimensional data is transformed into two-dimensional images using Gramian Angular Field (GAF), Continuous Wavelet Transform (CWT), Markov Transition Field (MTF), Recurrence Plot (RP), and Short-Time Fourier Transform (STFT). (a) The image transformed using GAF, (b) The image transformed using CWT, (c) The image transformed using MTF, (d) The image transformed using RP, (e) The image transformed using STFT.
We conducted experiments on the dataset using the GAF, CWT, MTF, RP, and STFT methods, as shown in Table 5. The GAF method produced the best results. Relative to CWT, which has the lowest R2 of the four alternatives, GAF reduces MAE by 0.0893 and MSE by 0.0177 and raises R2 by 21.27 percentage points. These results demonstrate the efficacy of the GAF method.
Table 5.
Performance of GAF, CWT, MTF, RP, and STFT methods on the dataset.
| Method | MAE | MSE | R2 |
|---|---|---|---|
| CWT | 0.1424 | 0.0221 | 71.38 % |
| STFT | 0.1502 | 0.0296 | 79.02 % |
| RP | 0.0586 | 0.0065 | 88.99 % |
| MTF | 0.0552 | 0.0044 | 92.52 % |
| GAF | 0.0531(↓0.0893) | 0.0044(↓0.0177) | 92.65 %(↑21.27 %) |
3.2.2. Ablation experiments
Our STG-Net network contains three modules: the Temporal Information Mining module (TIM), the Spatial Information Mining module (SIM), and the Gramian Angular Field transformation module (GAF). Yan et al.'s [25] method was taken as our baseline, and we used the control variable approach to demonstrate the positive impact of each module on the network's prediction ability. Firstly, we added each of our designed modules individually to the baseline, then added two different module combinations to the baseline, and finally added all three modules to the baseline. Table 6 shows the comparison of performance in single-day prediction on the Australian test set for different module combination methods.
Table 6.
Predicted effects of ablation experiments.
| Method | MAE | MSE | R2 |
|---|---|---|---|
| LSTM [25] | 0.0331 | 0.0029 | 87.87 % |
| SIM | 0.0323 | 0.0028 | 89.39 % |
| GAF | 0.0223 | 0.0013 | 92.65 % |
| TIM | 0.0226 | 0.0015 | 93.48 % |
| TIM + SIM | 0.0159 | 0.0012 | 95.05 % |
| SIM + GAF | 0.0153 | 0.0011 | 95.12 % |
| TIM + GAF | 0.0147 | 0.0010 | 96.15 % |
| TIM + SIM + GAF | 0.0113(↓0.0218) | 0.0007(↓0.0022) | 97.10 %(↑9.23 %) |
As shown in Table 6, each individual module of STG-Net improves on the baseline prediction performance, and combining multiple modules further enhances the model. The best prediction performance is achieved by fusing TIM, SIM, and GAF, with an MAE of 0.0113, an MSE of 0.0007, and an R2 of 97.10 %, which is 9.23 percentage points higher than that of the baseline. To provide a more intuitive comparison, Fig. 10 visualizes the prediction results of the TIM module, which has the highest R2 among the single modules, the TIM + SIM two-module combination, and the fused TIM + SIM + GAF model.
Fig. 10.
Comparison of Prediction Results between Baseline and Different Combinations of TIM, SIM, and GAF Modules (The Network Prediction Value with the Fusion of TIM, SIM, and GAF Modules is Closer to the Ground Truth).
3.2.3. Robustness experiments
To demonstrate the generalizability of our method, we used data from four further countries, China, France, the Netherlands, and the United Kingdom, for validation. The COVID-19 data cover the same period for all four countries, from April 29, 2020 to October 6, 2022. The prediction results of STG-Net for each of these four countries are shown in Table 7.
Table 7.
Prediction effects of the four countries.
| Country | MAE | MSE | R2 |
|---|---|---|---|
| China | 0.0072 | 0.0002 | 99.45 % |
| France | 0.0128 | 0.0008 | 96.48 % |
| Netherlands | 0.0066 | 0.0002 | 99.33 % |
| United Kingdom | 0.0139 | 0.4334 | 98.40 % |
As can be seen from Table 7, STG-Net performs very well in single-day forecasting on all four country datasets, with R2 scores above 95 %. This indicates that STG-Net has good robustness. Fig. 11 presents the prediction results for the four countries.
Fig. 11.
Robustness Experiment of STG-Net (Performance of STG-Net on Single-Day Prediction on Datasets from China, France, the Netherlands, and the UK).
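The three evaluation metrics reported in the tables of this section can be computed as in the following minimal sketch. It uses the standard definitions of MAE, MSE, and the coefficient of determination; the function names and the sample values are hypothetical and are not taken from the STG-Net implementation.

```python
def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical normalized daily case counts and predictions (illustration only).
y_true = [0.10, 0.20, 0.40, 0.30]
y_pred = [0.12, 0.18, 0.37, 0.33]

print(mae(y_true, y_pred), mse(y_true, y_pred), r2(y_true, y_pred))
```

If targets and predictions are both scaled to [0, 1] (an assumption suggested by the magnitude of the reported errors), every absolute error is at most 1, so the MSE cannot exceed the MAE. This gives a quick consistency check for any reported row.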
3.2.4. Comparison experiment
To verify the improvement in performance of STG-Net over existing methods, we first compared STG-Net with the Autoregressive (AR) model, the Moving Average (MA) model, the Autoregressive Integrated Moving Average (ARIMA) model, and the DeepAR model. Table 8 compares the R2 coefficient in single-day forecasting between STG-Net and these four forecasting models.
Table 8.
Experimental results of comparison between STG-Net and AR, MA, ARIMA, and DeepAR models in terms of R2 coefficient.
| Method | Australia | China | France | Netherlands | United Kingdom |
|---|---|---|---|---|---|
| AR | 71.03 % | 78.20 % | 51.53 % | 44.85 % | 71.45 % |
| MA | 37.10 % | 42.24 % | 40.12 % | 52.37 % | 49.68 % |
| ARIMA | 59.91 % | 82.07 % | 56.95 % | 37.85 % | 81.85 % |
| DeepAR | 86.75 % | 84.36 % | 87.26 % | 90.18 % | 88.36 % |
| STG-Net | 97.10 % | 99.45 % | 96.48 % | 99.33 % | 98.83 % |
To better compare the experimental results, we averaged the results of the five methods across the five countries and evaluated the generalizability of the models by comparing the average evaluation metrics. As shown in Fig. 12, STG-Net shows a significant performance improvement over the other four methods; the largest gain in average R2, 53.94 percentage points, is over the MA model.
Fig. 12.
Comparison of Average Performance between STG-Net and AR, MA, ARIMA, and DeepAR Models (STG-Net showed significant improvement compared to the other four methods with the highest R2 coefficient increase of 53.94 %).
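The averaging step can be reproduced directly from the values in Table 8. The sketch below is only a worked check of the arithmetic: it computes the five-country mean R2 of each method and the gain of STG-Net over each baseline, expressed in percentage points. The variable names are illustrative.

```python
# R2 values (%) from Table 8, in the order:
# Australia, China, France, Netherlands, United Kingdom.
r2_table = {
    "AR": [71.03, 78.20, 51.53, 44.85, 71.45],
    "MA": [37.10, 42.24, 40.12, 52.37, 49.68],
    "ARIMA": [59.91, 82.07, 56.95, 37.85, 81.85],
    "DeepAR": [86.75, 84.36, 87.26, 90.18, 88.36],
    "STG-Net": [97.10, 99.45, 96.48, 99.33, 98.83],
}

# Mean R2 of each method over the five countries.
mean_r2 = {method: sum(vals) / len(vals) for method, vals in r2_table.items()}

# Gain of STG-Net over each baseline, in percentage points.
gains = {
    method: mean_r2["STG-Net"] - avg
    for method, avg in mean_r2.items()
    if method != "STG-Net"
}

# Baseline over which STG-Net gains the most.
weakest = max(gains, key=gains.get)

for method, gain in gains.items():
    print(f"{method}: +{gain:.2f} points")
print("largest gain over:", weakest)
```

The largest difference is obtained against the MA model, whose mean R2 is the lowest of the four baselines.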
We also replicated existing deep learning prediction models from previous studies and compared them on the test sets from the five countries. VOC-DL uses the method of Liao et al. [9], and BD-LSTM uses the method of Chandra et al. [14]. Table 9 compares STG-Net with these two forecasting models for single-day forecasting.
Table 9.
Comparison of experimental results.
| Method | Country | MAE | MSE | R2 |
|---|---|---|---|---|
| BD-LSTM [14] | Australia | 0.0850 | 0.0090 | 84.22 % |
| BD-LSTM [14] | China | 0.0334 | 0.0024 | 93.10 % |
| BD-LSTM [14] | France | 0.0432 | 0.0028 | 88.40 % |
| BD-LSTM [14] | Netherlands | 0.0436 | 0.0025 | 90.95 % |
| BD-LSTM [14] | United Kingdom | 0.0312 | 0.0038 | 90.57 % |
| BD-LSTM [14] | Average | 0.0472 | 0.0041 | 89.45 % |
| VOC-DL [9] | Australia | 0.0688 | 0.0066 | 88.49 % |
| VOC-DL [9] | China | 0.0339 | 0.0019 | 94.41 % |
| VOC-DL [9] | France | 0.0413 | 0.0026 | 89.27 % |
| VOC-DL [9] | Netherlands | 0.0167 | 0.0009 | 96.65 % |
| VOC-DL [9] | United Kingdom | 0.0265 | 0.0023 | 94.19 % |
| VOC-DL [9] | Average | 0.0374 | 0.0028 | 92.60 % |
| STG-Net | Australia | 0.0113 | 0.0007 | 97.10 % |
| STG-Net | China | 0.0072 | 0.0002 | 99.45 % |
| STG-Net | France | 0.0128 | 0.0008 | 96.48 % |
| STG-Net | Netherlands | 0.0066 | 0.0002 | 99.33 % |
| STG-Net | United Kingdom | 0.0133 | 0.0005 | 98.83 % |
| STG-Net | Average | 0.0102 | 0.0004 | 98.23 % |
We selected the R2 coefficient, which measures the model's fitting effect, for visualization, and the comparison results are shown in Fig. 13 .
Fig. 13.
Comparison of Test Results of BD-LSTM, STG-Net and VOC-DL Methods in Australia, China, France, Netherlands and United Kingdom.
To better compare the experimental results, we averaged the results of the three methods over the five countries and evaluated the generalizability of the models by comparing the averaged evaluation metrics. The comparison results are shown in Table 10.
Table 10.
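Comparison of the average prediction performance of the five countries.
| Method | MAE | MSE | R2 |
|---|---|---|---|
| BD-LSTM [14] | 0.0472 | 0.0041 | 89.45 % |
| VOC-DL [9] | 0.0374 | 0.0028 | 92.60 % |
| STG-Net | 0.0102 | 0.0004 | 98.23 % |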
To better observe the differences in the prediction results of the three methods, we visualized the average prediction performance indices from Table 10, as shown in Fig. 14. Overall, our proposed STG-Net performed the best, with an average MAE of 0.0102, MSE of 0.0004, and R2 of 98.23 % across the five countries. Relative to BD-LSTM and VOC-DL, MAE and MSE were reduced by up to 0.0369 and 0.0037, respectively, and R2 was improved by up to 8.8 %. These results indicate that our approach outperforms both comparison methods.
Fig. 14.
Comparison of Experimental Results (Comparison of average prediction indices of BD-LSTM, STG-Net, and VOC-DL methods in Australia, China, France, Netherlands, and the United Kingdom. STG-Net performed the best with an average MAE of 0.0102, MSE of 0.0004, and R2 of 98.23 % in the five countries).
3.2.5. Long and short-term prediction experiments
To test the effectiveness of STG-Net in long- and short-term prediction, we conducted prediction experiments on the datasets from the five countries at horizons of 3, 7, 14, and 28 days. The experimental results are shown in Table 11.
Table 11.
Long and short-term prediction experimental results.
| Horizon | Country | MAE | MSE | R2 |
|---|---|---|---|---|
| 3-day | Australia | 0.0113 | 0.0005 | 99.06 % |
| 3-day | China | 0.0111 | 0.0006 | 98.04 % |
| 3-day | France | 0.0181 | 0.0014 | 93.81 % |
| 3-day | Netherlands | 0.0095 | 0.0004 | 98.38 % |
| 3-day | United Kingdom | 0.0187 | 0.0012 | 96.82 % |
| 3-day | Average | 0.0137 | 0.00082 | 97.22 % |
| 7-day | Australia | 0.0193 | 0.0016 | 97.48 % |
| 7-day | China | 0.0168 | 0.0020 | 94.98 % |
| 7-day | France | 0.0199 | 0.0016 | 94.02 % |
| 7-day | Netherlands | 0.0149 | 0.0010 | 96.77 % |
| 7-day | United Kingdom | 0.0233 | 0.0020 | 95.60 % |
| 7-day | Average | 0.01884 | 0.00164 | 95.77 % |
| 14-day | Australia | 0.0343 | 0.0056 | 91.39 % |
| 14-day | China | 0.0272 | 0.0059 | 85.68 % |
| 14-day | France | 0.0267 | 0.0031 | 88.72 % |
| 14-day | Netherlands | 0.0246 | 0.0027 | 91.87 % |
| 14-day | United Kingdom | 0.0332 | 0.0040 | 97.10 % |
| 14-day | Average | 0.0292 | 0.00426 | 90.95 % |
| 28-day | Australia | 0.0374 | 0.0048 | 96.46 % |
| 28-day | China | 0.0493 | 0.0122 | 84.90 % |
| 28-day | France | 0.0435 | 0.0072 | 88.32 % |
| 28-day | Netherlands | 0.0451 | 0.0069 | 89.61 % |
| 28-day | United Kingdom | 0.0464 | 0.0060 | 93.52 % |
| 28-day | Average | 0.0443 | 0.00742 | 90.56 % |
As can be seen from Table 11, STG-Net performs well in both long- and short-term forecasting. The 3-day and 7-day predictions perform best, with average R2 coefficients above 95 %. The 14-day and 28-day predictions are slightly worse, but the average R2 coefficient remains above 90 %. We attribute this decline to rapid changes in the number of infections caused by government intervention policies in individual countries, which make it harder for the model to fit the long-term trend. Overall, STG-Net still shows good generality in its long- and short-term prediction ability.
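This section does not state how the multi-day predictions are produced. One common way to set up such an experiment is direct multi-step forecasting, in which each training sample pairs a window of past values with the next `horizon` values. The sketch below illustrates that setup under this assumption; `make_windows`, `lookback`, and the toy series are hypothetical and do not describe the authors' pipeline.

```python
def make_windows(series, lookback, horizon):
    """Build (input window, target window) pairs for direct multi-step forecasting.

    Each input holds `lookback` consecutive values and each target holds the
    `horizon` values that immediately follow it.
    """
    inputs, targets = [], []
    last_start = len(series) - lookback - horizon
    for start in range(last_start + 1):
        split = start + lookback
        inputs.append(series[start:split])
        targets.append(series[split:split + horizon])
    return inputs, targets


# Toy series standing in for normalized daily new cases (illustration only).
series = list(range(10))
X, Y = make_windows(series, lookback=4, horizon=3)

for x, y in zip(X, Y):
    print(x, "->", y)
```

With a fixed amount of data, a longer horizon leaves fewer training pairs and asks the model to extrapolate further from the same inputs, which is consistent with the lower R2 observed at the 14-day and 28-day horizons.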
4. Discussion
Since the outbreak of COVID-19, the epidemic has continued to spread rapidly, with a huge impact on countries and regions worldwide, and accurately predicting the trend of its spread is a key concern for researchers. In this paper, we propose a new multivariate spatio-temporal prediction network (STG-Net), which performs information mining and feature transformation through a spatial information mining module, a temporal information mining module, and a Gramian angular field transformation, and jointly optimizes the loss functions of the three modules to predict the trend of COVID-19. The advantage of STG-Net over existing prediction models is that it not only accounts for changes in time-series features and analyzes the data dynamically, but also incorporates the internal spatial features of regions, using a GCN to mine spatial topological structure information, which improves the prediction performance of the model. Compared with traditional time-series prediction models, STG-Net uses the Gramian angular field transformation to convert time-series features into two-dimensional images, which exposes deeper feature information and further improves prediction performance. In feature engineering, this paper introduces the slope trend as an “incremental feature” added to the dataset, which captures the fluctuation trend of the data and makes the model more sensitive to the volatility of the dataset, so that it can better predict large oscillations in the epidemic trend.
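The two feature transformations mentioned above can be illustrated with a minimal sketch. It assumes that the slope feature is the first difference of the series and that the Gramian angular field is the summation variant (GASF), in which the series is rescaled to [-1, 1], mapped to angles with arccos, and expanded into the matrix of cos(φ_i + φ_j). Both choices, and the function names, are assumptions made for illustration and may differ from the exact formulation used in STG-Net.

```python
import math


def slope_feature(series):
    """First difference as a simple slope proxy (assumed form).

    The first entry is padded with 0.0 so the output has the same length
    as the input.
    """
    return [0.0] + [b - a for a, b in zip(series, series[1:])]


def gasf(series):
    """Gramian Angular Summation Field of a one-dimensional series."""
    lo, hi = min(series), max(series)
    if hi == lo:
        raise ValueError("a constant series cannot be rescaled to [-1, 1]")
    # Rescale to [-1, 1]; the clamp guards against floating-point overshoot.
    scaled = [max(-1.0, min(1.0, (2 * x - hi - lo) / (hi - lo))) for x in series]
    # Polar encoding: each value becomes an angle in [0, pi].
    phi = [math.acos(x) for x in scaled]
    # Entry (i, j) of the image is cos(phi_i + phi_j).
    return [[math.cos(a + b) for b in phi] for a in phi]


# Toy series (illustration only).
image = gasf([1.0, 2.0, 3.0])
slopes = slope_feature([1.0, 3.0, 6.0])

for row in image:
    print([round(v, 3) for v in row])
print(slopes)
```

A series of length n yields an n × n symmetric image, which is what allows two-dimensional convolutional layers to be applied to data that was originally one-dimensional.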
As shown by the experimental results in Section 3.2.4, STG-Net showed a significant improvement in performance over the AR, MA, ARIMA, and DeepAR models, with the largest increase in the R2 coefficient being 53.94 %. STG-Net also showed advantages over the works of other researchers, with maximum reductions of 0.0369 and 0.0037 in MAE and MSE, respectively, and a maximum increase of 8.8 % in R2.
Like all other modeling studies, the study presented in this paper has some limitations. First, the modeling experiments assume that national policies affecting epidemic trends do not change, whereas in practice national policies can promote or inhibit the spread of the epidemic to some extent. Second, STG-Net considers infection data and spatial location information but not the social association information of each infected person. The social relationships of infected persons are also an important factor in the spread of the virus, and in the future we may add a social relationship mining module to make more accurate predictions by mining the social contact information of infected persons. Finally, as observed in the long- and short-term prediction experiments in Section 3.2.5, STG-Net performs well in short-term predictions at three-day and seven-day horizons, but its performance is slightly inferior in long-term predictions at 14-day and 28-day horizons. In the future, we will focus on improving long-term predictions by incorporating higher-dimensional feature vectors to better capture the evolution of the epidemic.
5. Conclusion
We propose a new multivariate spatio-temporal prediction network (STG-Net). We first introduce the slope trend as an “incremental feature” into the dataset, and propose a spatial information mining module, a temporal information mining module, and a Gramian angular field transformation module to mine the spatio-temporal information of the data. We then train the model on the COVID-19 dataset by jointly optimizing the loss function and predict the epidemic trends in five countries. The experimental results show that STG-Net predicts newly confirmed cases well in all five countries, with the coefficient of determination R2 exceeding 95 % in each, which indicates good robustness. STG-Net performs best in predicting the development of the epidemic in China, where R2 reaches 99.45 % and the mean absolute error (MAE) and mean square error (MSE) are 0.0072 and 0.0002, respectively. In addition, compared with the existing prediction models evaluated, STG-Net showed better robustness and accuracy, with an average R2 of 98.23 % and significantly lower average MAE and MSE of 0.0102 and 0.0004, respectively, on the test sets of the five countries. STG-Net also performs well in long- and short-term forecasting: the test results from the five countries show that the best performance is achieved for 3-day and 7-day forecasts, with average R2 coefficients above 95 %, while 14-day and 28-day forecasts are slightly worse, with average R2 coefficients still above 90 %. Overall, STG-Net shows superior prediction performance, generalizability, and long- and short-term prediction ability, and can serve as a useful reference for determining epidemic trends and for predicting similar infectious diseases.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgements
Thanks to the authors for their help. Yucheng Song provided help with experiments. Huaiyi Chen provided assistance with statistical analysis. Xiaomeng Song provided writing assistance. Zhifang Liao provided dataset curation. Yan Zhang helped us proofread the article.
Data availability
I have indicated the link to the data in the article
References
- 1. Gulati A., Pomeranz C., Qamar Z., et al. A comprehensive review of manifestations of novel coronaviruses in the context of deadly COVID-19 global pandemic. Am. J. Med. Sci. 2020;360(1):5–34. doi: 10.1016/j.amjms.2020.05.006.
- 2. AlQadi H., Bani-Yaghoub M. Incorporating global dynamics to improve the accuracy of disease models: example of a COVID-19 SIR model. PLoS One. 2022;17(4):e0265815. doi: 10.1371/journal.pone.0265815.
- 3. Qiu Z., et al. Application of genetic algorithm combined with improved SEIR model in predicting the epidemic trend of COVID-19, China. Sci. Rep. 2022;12(1):1–9. doi: 10.1038/s41598-022-12958-z.
- 4. Liao Z., Lan P., Liao Z., Zhang Y., Liu S. TW-SIR: time-window based SIR for COVID-19 forecasts. Sci. Rep. 2020;10(1):1–15. doi: 10.1038/s41598-020-80007-8.
- 5. Singh P., Gupta A. Generalized SIR (GSIR) epidemic model: an improved framework for the predictive monitoring of COVID-19 pandemic. ISA Trans. 2022;124:31–40. doi: 10.1016/j.isatra.2021.02.016.
- 6. Liu J., Ong G.P., Pang V.J. Modelling effectiveness of COVID-19 pandemic control policies using an area-based SEIR model with consideration of infection during interzonal travel. Transp. Res. Part A Policy Pract. 2022;161:25–47. doi: 10.1016/j.tra.2022.05.003.
- 7. Cooper I., Mondal A., Antonopoulos C.G., Mishra A. Dynamical analysis of the infection status in diverse communities due to COVID-19 using a modified SIR model. Nonlinear Dyn. 2022;109(1):19–32. doi: 10.1007/s11071-022-07347-0.
- 8. Nanda R.O., et al. Community mobility and COVID-19 dynamics in Jakarta, Indonesia. Int. J. Environ. Res. Public Health. 2022;19(11):11. doi: 10.3390/ijerph19116671.
- 9. Liao Z., Song Y., Ren S., Song X., Fan X., Liao Z. VOC-DL: Deep learning prediction model for COVID-19 based on VOC virus variants. Comput. Methods Prog. Biomed. 2022;224. doi: 10.1016/j.cmpb.2022.106981.
- 10. Ribeiro H.V., Sunahara A.S., Sutton J., Perc M., Hanley Q.S. City size and the spreading of COVID-19 in Brazil. PLoS One. 2020;15(9):e0239699. doi: 10.1371/journal.pone.0239699.
- 11. Wang W., Pei Y., Wang S.-H., Górriz J.M., Zhang Y.-D. PSTCNN: explainable COVID-19 diagnosis using PSO-guided self-tuning CNN. Biocell. 2023;47(2):373–384. doi: 10.32604/biocell.2021.0xxx.
- 12. Wang W., Zhang X., Wang S.-H., Zhang Y.-D. Covid-19 diagnosis by WE-SAJ. Syst. Sci. Control Eng. 2022;10(1):325–335. doi: 10.1080/21642583.2022.2045645.
- 13. Liao Z., Lan P., Fan X., Kelly B., Innes A., Liao Z. SIRVD-DL: a COVID-19 deep learning prediction model based on time-dependent SIRVD. Comput. Biol. Med. 2021;138. doi: 10.1016/j.compbiomed.2021.104868.
- 14. Chandra R., Jain A., Chauhan D.S. Deep learning via LSTM models for COVID-19 infection forecasting in India. PLoS One. 2022;17(1):e0262708. doi: 10.1371/journal.pone.0262708.
- 15. Bhimala K.R., Patra G.K., Mopuri R., Mutheneni S.R. Prediction of COVID-19 cases using the weather integrated deep learning approach for India. Transbound. Emerg. Dis. 2022;69(3):1349–1363. doi: 10.1111/tbed.14102.
- 16. Fritz C., Dorigatti E., Rügamer D. Combining graph neural networks and spatio-temporal disease models to improve the prediction of weekly COVID-19 cases in Germany. Sci. Rep. 2022;12(1):1. doi: 10.1038/s41598-022-07757-5.
- 17. Li Y., Wang Y., Ma K. Integrating transformer and GCN for COVID-19 forecasting. Sustainability. 2022;14(16):16. doi: 10.3390/su141610393.
- 18. Panagopoulos G., Nikolentzos G., Vazirgiannis M. Transfer graph neural networks for pandemic forecasting. Proc. AAAI Conf. Artif. Intell. 2021;35(6):6. doi: 10.1609/aaai.v35i6.16616.
- 19. Dong E., Du H., Gardner L. An interactive web-based dashboard to track COVID-19 in real time. Lancet Infect. Dis. 2020;20(5):533–534. doi: 10.1016/S1473-3099(20)30120-1.
- 20. Leiva G.C., Sathler D., Orrico Filho R.D. Urban structure and population mobility: implications for social distance and dissemination of COVID-19. Rev. Bras. Estud. Popul. 2020:e0118.
- 21. ArunKumar K.E., Kalaga D.V., Kumar C.M.S., Kawaji M., Brenza T.M. Comparative analysis of Gated Recurrent Units (GRU), long Short-Term memory (LSTM) cells, autoregressive Integrated moving average (ARIMA), seasonal autoregressive Integrated moving average (SARIMA) for forecasting COVID-19 trends. Alex. Eng. J. 2022;61(10):7585–7603.
- 22. Xu H., et al. Human activity recognition based on Gramian angular field and deep convolutional neural network. IEEE Access. 2020;8:199393–199405. doi: 10.1109/ACCESS.2020.3032699.
- 23. Long Y., Zhou W., Luo Y. A fault diagnosis method based on one-dimensional data enhancement and convolutional neural network. Measurement. 2021;180. doi: 10.1016/j.measurement.2021.109532.
- 24. Verma H., Mandal S., Gupta A. Temporal deep learning architecture for prediction of COVID-19 cases in India. Expert Syst. Appl. 2022;195. doi: 10.1016/j.eswa.2022.116611.
- 25. Yan B., et al. An improved method for the fitting and prediction of the number of COVID-19 confirmed cases based on LSTM. arXiv, May 13, 2020. doi: 10.48550/arXiv.2005.03446.