2022 Apr 8;114:69–77. doi: 10.1016/j.gr.2022.03.014

Improving performance of deep learning predictive models for COVID-19 by incorporating environmental parameters

Roshan Wathore a,b, Samyak Rawlekar c, Saima Anjum a, Ankit Gupta a,b, Hemant Bherwani a,b, Nitin Labhasetwar a,b, Rakesh Kumar b,d
PMCID: PMC8990533  PMID: 35431596


Keywords: COVID-19, SARS-CoV-2, LSTM, Multivariate time series forecasting, Deep Learning

Abstract

The Coronavirus disease 2019 (COVID-19) pandemic has severely crippled the economy on a global scale. Effective and accurate forecasting models are essential for proper management and preparedness of the healthcare system and resources, eventually aiding in preventing the rapid spread of the disease. With the intention of providing better forecasting tools for the management of the pandemic, the current research work analyzes the effect of including environmental parameters in the forecasting of daily COVID-19 cases. Three univariate variants of the long short-term memory (LSTM) model (basic/vanilla, stacked, and bi-directional) were employed for the prediction of daily cases in 8 cities across 3 countries with varying climatic zones (tropical, sub-tropical, and frigid), namely India (New Delhi and Nagpur), USA (Yuma and Los Angeles) and Sweden (Stockholm, Skane, Uppsala and Vastra Gotaland). The results were compared to a basic multivariate LSTM model with environmental parameters (temperature (T) and relative humidity (RH)) as additional inputs. Periods with no or minimal lockdown were chosen specifically in these cities to observe the uninhibited spread of COVID-19 and explore its dependence on daily environmental parameters. The multivariate LSTM model showed the best overall performance; the mean absolute percentage error (MAPE) improved by an average of 64% relative to the univariate models upon the inclusion of the above environmental parameters. Correlation with temperature was generally positive for the cold regions and negative for the warm regions. RH showed mixed correlations, most likely driven by its temperature dependence and the effect of allied local factors. The results suggest that the inclusion of environmental parameters could significantly improve the performance of LSTMs for predicting daily cases of COVID-19, although other positive and negative confounding factors can affect the forecasting power.

1. Introduction

The Coronavirus Disease 2019 (COVID-19) pandemic caused by the Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) has severely impacted the social, economic and environmental aspects of human lives. Various sectors, industries and businesses have been harshly affected by the restrictions imposed during this global crisis (Bherwani et al., 2021; Ranjbari et al., 2021). The worldwide response to COVID-19 included travel bans, social distancing, lockdown of non-essential services and working from home in an attempt to “flatten the curve” and reduce the burden on the healthcare system. The extent and effectiveness of these restrictions varied greatly with administrative/political, demographic, economic, and environmental factors.

This pandemic has spurred tremendous research efforts to better understand the virus survival and transmission. One widely explored study area, the effect of environmental conditions on virus transmission, has been well reviewed (Wathore et al., 2020; Gautam et al., 2021a; Bherwani et al., 2020), with most studies reporting that increased temperature, UV radiation and wind speed reduce the risk of COVID-19 spread. Wang et al. (2020) analyzed data from 100 Chinese cities and 1005 US counties and found that high temperature and high humidity reduce the transmission of COVID-19. In Indonesia, Tosepu et al. (2020) determined that only average temperature was significantly correlated with the pandemic. Data from Italian province capitals showed that low wind speed, high moisture and occurrences of foggy days were associated with increased transmission of COVID-19 (Coccia, 2020). Wu et al. (2020) reported that with a unit increase in temperature, COVID-19 cases reduced by about 3%, whereas a unit increase in humidity caused the cases to shrink by about 1%. Lin et al. (2020) concluded that low temperature with increased humidity causes increased spread of the virus and increased prevalence of the disease in Asian countries. Dbouk and Drikakis (2020) explored energy and mass balance correlations with respect to the viability of the virus and concluded that high temperature and lower humidity significantly reduce the prevalence of the virus and the transmission of the disease. Some studies have suggested that the correlation with humidity and temperature is not as linear as projected, and that confounding variables such as physical distancing and restricted movement are an integral part of the rise and fall of cases; hence it may not be possible to pinpoint the exact relation of virus viability and transmissibility with the environmental attributes (Bherwani et al., 2020; Gupta et al., 2021).
Bherwani et al. (2020a) incorporated environmental parameters (T and RH) into a Susceptible-Exposed-Infectious-Removed (SEIR) model and showed that the inclusion of environmental parameters is essential for improved model performance and systematic planning for handling the pandemic. However, allied parameters such as administrative restrictions, lockdowns, and social and physical distancing have confounding effects and need detailed delineation in order to separate their impacts from those of the environmental attributes.

Another study area which has been explored is the use of various modeling techniques to predict COVID-19 cases in a country or region. Accurate forecasting is essential for managing a pandemic; it facilitates better decision-making and the development of practical measures and strategic plans to enable preparedness and allocate (health) resources appropriately. Statistical models such as the Auto Regressive Integrated Moving Average (ARIMA) have been used to predict COVID-19 cases (Bayyurt and Bayyurt, 2020; Benvenuto et al., 2020; Sahai et al., 2020; Singh et al., 2020). Other popular forecasting methods include epidemiological models such as the SEIR models and their variations (Bherwani et al., 2020; Gupta et al., 2020; Xu et al., 2020). Recently, deep learning models such as Long Short-Term Memory (LSTM) and its variants have demonstrated improvements in forecasting power compared to traditional forecasting approaches (Azarafza et al., 2020; Chimmula and Zhang, 2020). Devaraj et al. (2021) compared ARIMA and two LSTM variants (basic/vanilla and stacked) for daily COVID-19 cases in India and Chennai and determined that the stacked LSTM significantly outperformed the ARIMA model (46% reduction in mean absolute percentage error (MAPE)). Shoaib et al. (2021) concluded that LSTM showed the best predictive performance when the cases in 4 countries (Pakistan, USA, India and Brazil) were predicted using varied techniques including LSTM, ARIMA, Artificial Neural Network Models (ANN), Exponential Smoothing/Error Trend Seasonality (ETS) and Gene Expression Programming (GEP). Srivastava et al. (2021) carried out a comparative study of LSTM, ARIMA, Holt’s Linear forecasting model, Exponential smoothing, and Moving-average model algorithms to forecast the number of new cases in six countries (Italy, Spain, France, USA, China, and Australia), and also concluded that LSTM gave the best performance.
Kırbaş et al. (2020) carried out a study in eight European countries, using confirmed cases as the validation parameter, with ARIMA, Nonlinear Autoregression Neural Network (NARNN) and LSTM; again, LSTM was found to be the most accurate model, with MAPE ranging from 0.16% to 2.5%. Moreover, modified versions of the LSTM such as the stacked LSTM and bi-directional LSTM (Bi-LSTM) have also been compared and have shown better results than the basic LSTM at predicting COVID-19 cases (Arora et al., 2020; Shastri et al., 2020; Zeroual et al., 2020).

Given the above insights, it is important to note that the literature cited above on forecasting COVID-19 cases using LSTMs and their variants has at least one of the following shortcomings:

  1. The models incorporated are univariate, implying the assumption that no external factors influence the transmission of the virus. The input is the historical cases and the output is the predicted cases. Of the above-mentioned studies, only Devaraj et al. (2021) have considered multivariate LSTMs, using additional input parameters such as number of deaths, recoveries, latitude and longitude.

  2. The duration considered for the training of forecasting models fails to capture various dynamic changes in the spread of the virus. Such a time period would likely capture only a monotonic increase or decrease in the daily COVID-19 cases without capturing the peaks. Of the above-mentioned studies, only Devaraj et al. (2021), Chimmula and Zhang (2020) and Shoaib et al. (2021) have considered both the rise and drop in cases.

  3. Modeling is focused on the number of cases in a larger region such as a country or state, with limited studies on city/district/county/province level data (Azarafza et al., 2020; Devaraj et al., 2021). Availability of accurate district/city level information facilitates better decision making than availability of state/country level information.

The current work attempts to 1) include the effects of environmental parameters, and 2) forecast daily cases using various LSTM variants owing to their better forecasting power. To the best of the authors' knowledge, only Bhimala et al. (2021) have proposed weather-integrated multivariate LSTM models to improve model performance; however, they assume a single value of each parameter for every state in India, which is not practically applicable because environmental and meteorological parameters can exhibit hyperlocal variation even at the city scale.

In order to fill the gaps outlined above, this study looks into the performance of 3 univariate LSTM models (Basic LSTM, Stacked LSTM, and Bi-directional LSTM) to forecast daily cases in 8 cities across 3 countries – India (New Delhi and Nagpur), USA (Los Angeles and Yuma) and Sweden (Stockholm, Skane, Uppsala, and Vastra Gotaland) with varying climatic zones (tropical, sub-tropical and near-frigid respectively). The methods section describes the dataset, the preprocessing methods, the different LSTM variants used and the metrics for evaluating model performance. The results are then explained and the conclusions derived from them are discussed. The distinguishing contribution of the paper is the incorporation of environmental parameters in LSTM models used for forecasting daily COVID-19 cases, which is delineated in the results and discussion sections. Finally, the limitations of this study and the potential for future work are outlined.

2. Methodology

2.1. Dataset

A total of 8 study cities across 3 countries (India, Sweden, and the USA) with varying climate zones (tropical, temperate, and frigid) were considered. The study periods were chosen so as to capture periods with low or minimal lockdown to observe the near-uninhibited spread of the virus. Additionally, locations across different climates were considered to explore the effect of environmental parameters (daily average RH and T) on the daily cases. Table 1 summarizes the study locations, date ranges for analysis, and the environmental parameters observed during the study period. Daily cases and environmental parameters were taken from publicly available datasets. The duration of the analysis ranged from 159 to 204 days (Table 1).

Table 1.

| Sr. No | Location (Country) | Duration | Temperature (°C): Range (Average, Std Dev) | RH (%): Range (Average, Std Dev) | Data Source |
|---|---|---|---|---|---|
| 1 | Stockholm (Sweden) | 24th February – 5th August (164 days) | −0.7 to 24.8 (11.1, 6.3) | 30 to 97 (66.6, 13.5) | Coronalevel.com, 2021 (URL 01); Timeanddate.com (URL 02) |
| 2 | Skane (Sweden) | 28th February – 18th September (204 days) | −0.6 to 22.7 (12, 5.8) | 49 to 96 (72.5, 10.6) | Coronalevel.com (URL 01); Timeanddate.com (URL 02) |
| 3 | Uppsala (Sweden) | 4th March – 18th September (199 days) | −2.2 to 23.5 (11.2, 6.4) | 38 to 97 (68.2, 13.2) | Coronalevel.com (URL 01); Timeanddate.com (URL 02) |
| 4 | Vastra Gotaland (Sweden) | 28th February – 17th September (203 days) | −4.8 to 21.4 (10.5, 5.9) | 41 to 95 (72.4, 13.6) | Coronalevel.com (URL 01); Timeanddate.com (URL 02) |
| 5 | Yuma (USA) | 26th April – 24th October (182 days) | 22.8 to 39.3 (32.5, 3.3) | 10.2 to 54.7 (26.3, 9.0) | USA Facts (URL 03); Weather Underground (URL 04) |
| 6 | Los Angeles (USA) | 20th April – 16th October (180 days) | 14.9 to 36.9 (23.0, 3.6) | 13.7 to 76.6 (53.8, 15.3) | USA Facts (URL 03); Weather Underground (URL 04) |
| 7 | New Delhi (India) | 12th May – 23rd October (165 days) | 26 to 37.5 (31.2, 2.5) | 27 to 97.8 (69.2, 13.7) | covid19india.org (URL 05); CCR (URL 06) |
| 8 | Nagpur (India) | 12th May – 17th October (159 days) | 23.5 to 37.95 (28.3, 3.0) | 9.5 to 86.9 (53.4, 15.0) | covid19india.org (URL 05); CCR (URL 06) |

During the onset of the COVID-19 pandemic, the Swedish strategy relied on voluntary measures, with no specific and strict lockdowns in force (Ludvigsson, 2020), making the country an ideal case to explore the spread of COVID-19 with minimal interference from external factors. India went into complete lockdown from 24th March 2020, which resulted in significant reductions in pollution levels across the country. The lockdown consisted of 4 phases, with the fourth phase lasting from 18 to 31 May 2020, during which restrictions gradually started lifting. This was followed by 6 unlocking phases from June to November (Ambade et al., 2021a; Ambade et al., 2021b; Ambade et al., 2021c; Ambade et al., 2021d; Chelani and Gautam, 2021; Gautam et al., 2021). In Arizona, USA, the statewide lockdown order expired in May 2020, which led to a sharp rise in cases that eventually declined by October 2020. Similarly, in California, restrictions were gradually relaxed starting from May 2020.

2.2. Data preprocessing and model preparation

The daily cases and daily averaged environmental data for the selected cities were then preprocessed. Missing environmental data were imputed by linear interpolation using the pandas interpolate method, which treats the values as equally spaced (https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.interpolate.html). A lag of 6 days was applied to the environmental parameters to account for the virus incubation period (Cheng et al., 2021; WHO, 2020). A running average of 5–7 days, chosen by location, was applied to the daily cases time series to smooth the sharp rises and drops caused by infrastructural lags in testing and reduced testing on weekends, and to remove within-week variations (Adiga et al., 2021). The data were split into 80% for training and 20% for testing. The inputs were normalized and reshaped before being passed to the models. Models were prepared in Python using the Keras library (https://keras.io). The univariate LSTM variants considered for this study were the basic LSTM, Bidirectional LSTM (Bi-LSTM), and Stacked LSTM. A basic multivariate LSTM was applied by incorporating two environmental parameters: daily averaged T and RH. These models are briefly explained below.
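Before the model descriptions, the preprocessing steps above can be illustrated with a minimal sketch. It is not the authors' code: the column names (`cases`, `T`, `RH`), the helper names, the synthetic data, the trailing running average, and the min-max scaling fitted on the training portion are all assumptions made for illustration; the text only states that the inputs were "normalized and reshaped".

```python
import numpy as np
import pandas as pd

LAG_DAYS = 6      # lag applied to T and RH (incubation period, from the text)
SMOOTH_DAYS = 7   # running-average window; the text uses 5-7 days by location
LOOKBACK = 7      # lookback period from Table 2
TRAIN_FRAC = 0.8  # 80/20 train/test split

def preprocess(df):
    """Return a frame with smoothed cases and lagged, gap-filled T and RH."""
    out = pd.DataFrame(index=df.index)
    # Linear interpolation treats the rows as equally spaced (daily data).
    env = df[["T", "RH"]].interpolate(method="linear")
    # Environmental values from LAG_DAYS earlier are paired with today's cases.
    out["T_lag"] = env["T"].shift(LAG_DAYS)
    out["RH_lag"] = env["RH"].shift(LAG_DAYS)
    # Trailing running mean damps weekend dips and within-week variation.
    out["cases"] = df["cases"].rolling(SMOOTH_DAYS).mean()
    return out.dropna()

def split_and_scale(frame, columns):
    """Chronological split, then min-max scaling fitted on the training part."""
    values = frame[columns].to_numpy(dtype=float)
    n_train = int(len(values) * TRAIN_FRAC)
    lo = values[:n_train].min(axis=0)
    hi = values[:n_train].max(axis=0)
    scaled = (values - lo) / (hi - lo)
    return scaled[:n_train], scaled[n_train:]

def make_windows(values, lookback=LOOKBACK):
    """Reshape (time, features) into (samples, lookback, features) inputs,
    with next-day targets taken from column 0 (the cases column)."""
    X, y = [], []
    for i in range(len(values) - lookback):
        X.append(values[i:i + lookback])
        y.append(values[i + lookback, 0])
    return np.array(X), np.array(y)

# Synthetic stand-in data (not the study's data) with a few missing readings.
rng = np.random.default_rng(0)
days = pd.date_range("2020-04-20", periods=180, freq="D")
raw = pd.DataFrame({
    "cases": rng.poisson(200, size=180).astype(float),
    "T": 23 + 4 * rng.standard_normal(180),
    "RH": 54 + 15 * rng.standard_normal(180),
}, index=days)
raw.loc[raw.index[[10, 11, 50]], ["T", "RH"]] = np.nan

clean = preprocess(raw)
train_uni, test_uni = split_and_scale(clean, ["cases"])
train_multi, test_multi = split_and_scale(clean, ["cases", "T_lag", "RH_lag"])
X_uni, y_uni = make_windows(train_uni)        # univariate inputs
X_multi, y_multi = make_windows(train_multi)  # cases + lagged T and RH
print(X_uni.shape, X_multi.shape)
```

The univariate and multivariate inputs differ only in the size of the last axis (1 versus 3 features per day), which is the only change the multivariate model needs on the data side.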

LSTM: First introduced by Hochreiter and Schmidhuber (1997), a Long Short-Term Memory (LSTM) network is a variant of a Recurrent Neural Network (RNN). Traditional RNNs are capable of storing only short-term past information, i.e. the previous time step. RNNs are not suitable for longer-term predictions because their gradients are prone to vanishing (i.e. the solution does not converge) or exploding (i.e. the solution diverges). LSTMs, on the other hand, are capable of retaining past information over a longer period of time, thus tackling the problem of long-term dependencies and giving more accurate predictions. LSTMs are hence well suited for forecasting of time series (Gers et al., 2000).

LSTM uses three gates, as indicated in Fig. 1 and the subsequent equations. The forget gate (ft) is responsible for forgetting unnecessary information, while the input gate (it) is used for adding new or useful information. The output gate (ot) controls the flow of information and updates the hidden state at every time step (Arora et al., 2020).

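$$f_t = \sigma\left(W_f \cdot [C_{t-1}, h_{t-1}, x_t] + b_f\right) \tag{1}$$
$$i_t = \sigma\left(W_i \cdot [C_{t-1}, h_{t-1}, x_t] + b_i\right) \tag{2}$$
$$o_t = \sigma\left(W_o \cdot [C_t, h_{t-1}, x_t] + b_o\right) \tag{3}$$
$$\tilde{C}_t = \tanh\left(W_c \cdot [h_{t-1}, x_t] + b_c\right) \tag{4}$$
$$C_t = f_t C_{t-1} + i_t \tilde{C}_t \tag{5}$$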

Fig. 1. Schematic of the LSTM Model.

Here:

  1. $f_t$ is the forget gate
  2. $i_t$ is the input gate
  3. $o_t$ is the output gate
  4. $x_t$ is the input vector
  5. $h_t$ and $C_t$ are the hidden state and cell state vectors, and $\tilde{C}_t$, defined by Eq. (4), is the candidate cell state
  6. $b_f, b_i, b_o, b_c$ are the bias vectors
  7. $W_f, W_i, W_o, W_c$ are the weight matrices
  8. $\sigma$ represents the sigmoid activation function
  9. $\tanh$ represents the hyperbolic tangent activation function
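The gate equations (1) to (5) can be traced with a small NumPy sketch of a single time step. This is an illustration under stated assumptions, not the Keras implementation used in the study: the function name `lstm_step`, the dictionary layout of the weights, and the random values are hypothetical, and the hidden-state update `h_t = o_t * tanh(C_t)` is the commonly used form, which the text does not state explicitly.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One time step of the gated update in Eqs. (1)-(5).

    W and b are dicts keyed by "f", "i", "o", "c". The gate matrices act on
    the concatenation [C, h_{t-1}, x_t]; the candidate matrix W["c"] acts on
    [h_{t-1}, x_t].
    """
    gate_in = np.concatenate([c_prev, h_prev, x_t])
    f_t = sigmoid(W["f"] @ gate_in + b["f"])           # Eq. (1), forget gate
    i_t = sigmoid(W["i"] @ gate_in + b["i"])           # Eq. (2), input gate
    c_tilde = np.tanh(W["c"] @ np.concatenate([h_prev, x_t]) + b["c"])  # Eq. (4)
    c_t = f_t * c_prev + i_t * c_tilde                 # Eq. (5), new cell state
    out_in = np.concatenate([c_t, h_prev, x_t])        # Eq. (3) uses the new C_t
    o_t = sigmoid(W["o"] @ out_in + b["o"])            # Eq. (3), output gate
    h_t = o_t * np.tanh(c_t)  # assumed standard hidden-state update
    return h_t, c_t

# Hypothetical sizes: 16 hidden units (Table 2) and 3 input features.
rng = np.random.default_rng(0)
units, features = 16, 3
W = {k: 0.1 * rng.standard_normal((units, 2 * units + features)) for k in "fio"}
W["c"] = 0.1 * rng.standard_normal((units, units + features))
b = {k: np.zeros(units) for k in "fioc"}

h, c = np.zeros(units), np.zeros(units)
window = rng.standard_normal((7, features))  # one 7-day lookback window
for x_t in window:
    h, c = lstm_step(x_t, h, c, W, b)
print(h.shape, c.shape)
```

After the last day of the window, the final hidden state `h` is what a subsequent dense layer would map to the predicted number of cases.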

Stacked-LSTM: The Stacked LSTM is a modified version of the LSTM with multiple hidden layers and memory cells; a typical schematic is shown in Fig. 2. It comprises multiple stacked LSTM layers, leading to increased model complexity and depth. The output of each LSTM layer is used as an input for the subsequent LSTM layer (Kuo and Chen, 2020; Shastri et al., 2020). Finally, the output from the final LSTM layer is passed to a fully connected Dense layer, which applies the updated weights to predict the model output. Additional information on stacked LSTMs is provided in Shastri et al. (2020).

Fig. 2. Schematic of the Stacked LSTM Model.

Bidirectional-LSTM: The bidirectional LSTM is a modification of the LSTM that processes the input in both the forward and backward directions. This is achieved with the help of two hidden layers, as indicated in Fig. 3. Additional details are provided in Shastri et al. (2020).

Fig. 3. Schematic of the Bidirectional LSTM Model.

Multivariate LSTM: For this study, a multivariate basic LSTM model with daily averaged T and RH as additional inputs was incorporated. Fig. 4A-H shows the results of the multivariate LSTM for the eight cities considered in the study.

Fig. 4. Performance of the Multivariate LSTM for the 8 cities considered in this study. A) Vastra Gotaland; B) Stockholm; C) Skane; D) Uppsala; E) Yuma; F) Los Angeles; G) New Delhi; and H) Nagpur.

Model Architecture:

The following model architectures were used for this study:

  • The LSTM model consists of the input layer, a single hidden layer, and the dense layer.

  • The Stacked LSTM consists of the input layer followed by two LSTM layers and the dense layer.

  • The Bi-Directional LSTM consists of the input layer, the Bi-LSTM layer, and the dense layer.

  • The Multivariate model consists of the input layer, a single hidden layer, and the dense layer.

The model parameters are summarized in Table 2.

Table 2. LSTM model parameters.

| Parameter | Value |
|---|---|
| Hidden units | 16 |
| Batch size | 1 |
| Lookback period | 7 days |
| Optimizer | Adam (learning rate = 0.01) |
| Loss function | Mean Squared Error |
| Number of epochs | 1000 |
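The four architectures and the parameters in Table 2 could be expressed in Keras roughly as follows. This is a hedged sketch rather than the authors' code: the function name `build_model` is hypothetical, 16 units are assumed for every LSTM layer (Table 2 gives a single "hidden units" value), and the multivariate model is assumed to take 3 features per day (cases, T and RH).

```python
from tensorflow import keras
from tensorflow.keras import layers

UNITS = 16     # hidden units (Table 2)
LOOKBACK = 7   # lookback period in days (Table 2)

def build_model(kind, n_features):
    """Build one of the four architectures listed above."""
    inputs = keras.Input(shape=(LOOKBACK, n_features))
    if kind == "stacked":
        # The first layer returns the full sequence so the second can consume it.
        x = layers.LSTM(UNITS, return_sequences=True)(inputs)
        x = layers.LSTM(UNITS)(x)
    elif kind == "bidirectional":
        x = layers.Bidirectional(layers.LSTM(UNITS))(inputs)
    else:
        # "basic" and "multivariate" share the single-hidden-layer design;
        # they differ only in n_features.
        x = layers.LSTM(UNITS)(inputs)
    outputs = layers.Dense(1)(x)
    model = keras.Model(inputs, outputs)
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.01),
                  loss="mse")
    return model

models = {
    "basic": build_model("basic", n_features=1),
    "stacked": build_model("stacked", n_features=1),
    "bidirectional": build_model("bidirectional", n_features=1),
    "multivariate": build_model("multivariate", n_features=3),
}
for name, model in models.items():
    print(name, model.count_params())

# Training would then follow Table 2, for example:
# model.fit(X_train, y_train, epochs=1000, batch_size=1, verbose=0)
```

Note that the multivariate model adds only a small number of trainable weights relative to the basic model, whereas the stacked and bidirectional variants are two to three times larger, which is relevant to the overfitting argument made in the discussion.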

2.3. Metrics Used

For this work, the accuracy of the above-indicated models was evaluated using the mean absolute percentage error (MAPE) calculated by the following equation:

$$\mathrm{MAPE} = \frac{1}{N}\sum_{i=1}^{N}\left|\frac{y_i - \hat{y}_i}{y_i}\right| \tag{6}$$

Other metrics used were the Pearson’s correlation coefficient (r), coefficient of determination (R2), and the root mean square error (RMSE).

$$r = \frac{\sum_{i}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i}(x_i - \bar{x})^2 \sum_{i}(y_i - \bar{y})^2}} \tag{7}$$
$$R^2 = 1 - \frac{\sum_{i=1}^{N}(\hat{y}_i - y_i)^2}{\sum_{i=1}^{N}(y_i - \bar{y})^2} \tag{8}$$
$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{N}(y_i - \hat{y}_i)^2}{N}} \tag{9}$$

where $y_i$ are the actual values, $\hat{y}_i$ are the values predicted by the model, $\bar{y}$ is the mean of the y-variable and $\bar{x}$ is the mean of the x-variable. All metrics were calculated in Python using the sklearn.metrics module (https://scikit-learn.org/stable/modules/classes.html#module-sklearn.metrics).
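For reference, the four metrics can be written out directly in NumPy. The study itself reports using sklearn.metrics; the sketch below is only meant to make the formulas concrete. The function names and the toy values are hypothetical, and MAPE is multiplied by 100 on the assumption that it is reported as a percentage, as in the tables.

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(100.0 * np.mean(np.abs((y_true - y_pred) / y_true)))

def rmse(y_true, y_pred):
    """Root mean square error, in the units of y."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

def pearson_r(x, y):
    """Pearson correlation coefficient between two series."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    dx, dy = x - x.mean(), y - y.mean()
    return float(np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2)))

# Toy values for illustration only (not taken from the study).
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 180.0, 400.0]
print(mape(actual, predicted), rmse(actual, predicted), r2(actual, predicted))
```

Because R2 compares the model against a constant prediction at the mean of the actual values, it becomes negative when the model does worse than that baseline, which is how a negative value can arise in practice.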

3. Results & Discussion

Table 3 shows the model R2, RMSE and MAPE values for all the LSTM variants for the locations considered. For the colder regions, the MAPE is higher than for the other locations owing to the relatively lower number of cases observed in these regions and the lack of explicit peaks. In general, the multivariate LSTM model significantly outperformed the other models, displaying an average improvement of 61–71% in the MAPE compared to the univariate models (Table 4). The improvement is similar to that reported by Shetty and Pai (2021), who observed a 66% improvement in the MAPE (from 20.73% to 7.03%) after implementing a cuckoo search algorithm for better forecasting of COVID-19 cases in the state of Karnataka, India.

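Table 3. Summary of R2, MAPE and RMSE values obtained for the various LSTM models.

| City | Model | R2 | MAPE (%) | RMSE |
|---|---|---|---|---|
| Vastra Gotaland | Basic | 0.716 | 16.8 | 9.828 |
| Vastra Gotaland | Stacked | 0.526 | 20.7 | 12.385 |
| Vastra Gotaland | Bidirectional | 0.64 | 17.7 | 12.002 |
| Vastra Gotaland | Multivariate | 0.925 | 8.9 | 6.685 |
| Stockholm | Basic | 0.881 | 13.2 | 11.852 |
| Stockholm | Stacked | 0.553 | 25.5 | 22.311 |
| Stockholm | Bidirectional | 0.804 | 18.9 | 14.769 |
| Stockholm | Multivariate | 0.969 | 8.7 | 7.944 |
| Skane | Basic | 0.811 | 6 | 3.997 |
| Skane | Stacked | 0.678 | 8 | 5.217 |
| Skane | Bidirectional | 0.673 | 6.2 | 5.179 |
| Skane | Multivariate | 0.995 | 0.6 | 0.486 |
| Uppsala | Basic | 0.842 | 8.5 | 0.959 |
| Uppsala | Stacked | 0.596 | 7 | 4.889 |
| Uppsala | Bidirectional | 0.931 | 8.1 | 0.632 |
| Uppsala | Multivariate | 0.993 | 2.1 | 0.175 |
| Yuma | Basic | 0.841 | 10.9 | 5.659 |
| Yuma | Stacked | 0.63 | 18.4 | 8.528 |
| Yuma | Bidirectional | 0.859 | 9.3 | 5.325 |
| Yuma | Multivariate | 0.99 | 3 | 0.892 |
| Los Angeles | Basic | 0.568 | 4.7 | 57.703 |
| Los Angeles | Stacked | 0.11 | 5.5 | 82.798 |
| Los Angeles | Bidirectional | 0.336 | 6.3 | 71.487 |
| Los Angeles | Multivariate | 0.978 | 0.8 | 9.325 |
| New Delhi | Basic | 0.885 | 4.7 | 171.525 |
| New Delhi | Stacked | 0.866 | 4.9 | 184.932 |
| New Delhi | Bidirectional | 0.896 | 4.5 | 163.258 |
| New Delhi | Multivariate | 0.794 | 3 | 142.112 |
| Nagpur | Basic | 0.473 | 18.1 | 208.935 |
| Nagpur | Stacked | −0.29 | 23.3 | 326.86 |
| Nagpur | Bidirectional | 0.83 | 11.5 | 118.522 |
| Nagpur | Multivariate | 0.964 | 5.4 | 71.77 |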

Table 4. MAPE statistics for the various LSTM variants used in this study.

| Location | Multivariate LSTM | Basic LSTM | Stacked LSTM | Bi-directional LSTM |
|---|---|---|---|---|
| Vastra Gotaland | 8.9 | 16.8 | 20.7 | 17.7 |
| Stockholm | 8.7 | 13.2 | 25.5 | 18.9 |
| Skane | 0.6 | 6 | 8 | 6.2 |
| Uppsala | 2.1 | 8.5 | 7 | 8.1 |
| Yuma | 3 | 10.9 | 18.4 | 9.3 |
| Los Angeles | 0.8 | 4.7 | 5.5 | 6.3 |
| New Delhi | 3 | 4.7 | 4.9 | 4.5 |
| Nagpur | 5.4 | 18.1 | 23.3 | 11.5 |
| Average (Std Dev) | 4.1 (3.3) | 10.4 (5.3) | 14.2 (8.6) | 10.3 (5.4) |
| Average Improvement (%) of Multivariate LSTM | | 60.8 | 71.3 | 60.6 |
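The "Average Improvement" row of Table 4 can be reproduced from the per-location MAPE values. The sketch below assumes that the improvement is the relative reduction in mean MAPE, that is, (mean univariate MAPE − mean multivariate MAPE) / mean univariate MAPE; this definition is inferred from the table values rather than stated in the text, and the variable names are hypothetical.

```python
# Per-location MAPE values (%) copied from Table 4, in table order.
mape_by_model = {
    "Multivariate": [8.9, 8.7, 0.6, 2.1, 3.0, 0.8, 3.0, 5.4],
    "Basic": [16.8, 13.2, 6.0, 8.5, 10.9, 4.7, 4.7, 18.1],
    "Stacked": [20.7, 25.5, 8.0, 7.0, 18.4, 5.5, 4.9, 23.3],
    "Bidirectional": [17.7, 18.9, 6.2, 8.1, 9.3, 6.3, 4.5, 11.5],
}

def mean(values):
    return sum(values) / len(values)

def improvement(univariate, multivariate):
    """Relative reduction (%) in mean MAPE when moving to the multivariate model."""
    return 100.0 * (mean(univariate) - mean(multivariate)) / mean(univariate)

improvements = {
    name: improvement(values, mape_by_model["Multivariate"])
    for name, values in mape_by_model.items()
    if name != "Multivariate"
}
overall = mean(list(improvements.values()))

for name, value in improvements.items():
    print(f"{name}: {value:.1f}%")
print(f"Mean over the three univariate variants: {overall:.0f}%")
```

Under this assumed definition the sketch gives 60.8%, 71.3% and 60.6% for the Basic, Stacked and Bi-directional LSTMs, matching the last row of Table 4, and their mean of about 64% matches the figure quoted in the abstract.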

MAPE values for the multivariate LSTM ranged from 0.6% (Skane) to 8.9% (Vastra Gotaland). The average model error across the 8 locations is 4.1%. These MAPE results are comparable to those of Shastri et al. (2020), where MAPEs ranged from 2.17% to 4% and from 2.00% to 10.00% for various LSTM variants in India and the USA respectively. MAPE values observed by Kırbaş et al. (2020) for 8 European countries ranged from 0.16% to 2.5%. Abbasimehr and Paki (2021) obtained MAPE values of 0.81% and 0.77% for the USA and India respectively with a Bayesian-optimized LSTM (mean MAPE of 2.6% for 10 countries). Chowdhury et al. (2021) predicted daily cases in Bangladesh using LSTM and achieved a MAPE of 4.51%. In New Delhi, Arora et al. (2020) observed a MAPE of 6.17% for weekly predictions with Bi-LSTM, whereas in this work the MAPE is 4.5% for Bi-LSTM and 3% for the Multivariate LSTM, although the time period of analysis is different.

While the MAPE values in this study are higher than those observed by Kırbaş et al. (2020) and Abbasimehr and Paki (2021), this can be partly attributed to the different model architectures, parameters and regions considered (countries vs cities), and to the significantly lower number of daily cases observed in our study, especially in Sweden, which further amplifies the relative errors.

R2 values for the multivariate LSTM ranged from 0.925 to 0.995 at all locations except New Delhi (0.794; Table 3), which is comparable to the LSTM results from Shoaib et al. (2021), who looked into country-level daily cases. The R2 values for the multivariate LSTM in this study are generally higher than those of the other LSTM variants, likely due to the ample environmental data available over the past week for predictions (since the lookback period is 7 days for the models). The Stacked LSTM was on average the worst performing model, with MAPE ranging from 4.9% (New Delhi) to 25.5% (Stockholm) and an average of 14.2% across the 8 locations considered. Bi-LSTM showed a higher MAPE than the Basic LSTM for 3 of the 4 locations in Sweden (all except Uppsala) but improved performance in India; the overall performance was similar to that of the Basic LSTM, with average MAPE of 10.3% (Bi-LSTM) and 10.4% (Basic LSTM).

While a majority of studies have seen better performance from the more complex LSTM variants, this was not the case in our study. Stacked LSTM and Bidirectional LSTM, in general, allow for greater model complexity and are suited to more complex input patterns. Passing univariate inputs into these LSTM variants makes the models prone to overfitting, which in turn degrades model performance on the test dataset. For instance, Said et al. (2021) also observed reduced performance of Bi-LSTM against Basic LSTM in Qatar, where they looked into multivariate time series data enriched with data related to lockdown measures. While several LSTM results are available, a direct one-to-one comparison with this study would be fallacious due to the varying locations, model architectures and time periods considered. Reporting absolute error metrics (e.g. RMSE, MAE) rather than relative errors such as MAPE further prevents comparisons (ArunKumar et al., 2021). A comprehensive summary of COVID-19 prediction models and their results has been compiled elsewhere (Ghafouri-Fard et al., 2021).

Table 5 shows the correlations of the smoothed daily cases with the environmental parameters after applying a lag period of 6 days to account for the virus incubation period. Correlations with temperature were generally positive for the colder regions in Sweden and negative for the warmer regions in the USA and India. No correlation with temperature was observed in Uppsala, likely due to the small number of daily cases over the last two months of the period considered, which is essentially a flat curve. Moreover, since the considered period captures peaks, the correlations are subject to both the rise and fall of cases. From these observations, it can be inferred that there is a range of temperature which is ideal for virus transmission and survival, with colder temperatures generally favoring virus spread and vice versa. RH showed mixed correlations with the daily cases. These correlations, however, can also be driven by the inherent dependence of RH on temperature, making it difficult to determine the relationship of RH with virus survival and transmission.

Table 5. Correlations of smoothed daily cases with environmental parameters after considering a lag period of 6 days.

| City | Correlation with Temperature | Correlation with RH | Temperature (Avg, Std) | RH (Avg, Std) |
|---|---|---|---|---|
| Vastra Gotaland | 0.561 | −0.287 | 10.5, 5.9 | 72.4, 13.6 |
| Stockholm | 0.435 | −0.377 | 11.1, 6.3 | 66.6, 13.5 |
| Skane | 0.572 | 0.303 | 12, 5.8 | 72.5, 10.6 |
| Uppsala | 0.006 | −0.207 | 11.2, 6.4 | 68.2, 13.2 |
| Yuma | 0.200 | −0.248 | 32.5, 3.3 | 26.3, 9.0 |
| Los Angeles | −0.157 | 0.293 | 23.0, 3.6 | 53.8, 15.3 |
| New Delhi | −0.174 | −0.014 | 31.2, 2.5 | 69.2, 13.7 |
| Nagpur | −0.632 | −0.004 | 28.3, 3.0 | 53.4, 15.0 |
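A lagged correlation of the kind reported in Table 5 can be computed as sketched below. This is an illustration only: the function name `lagged_correlation` and the synthetic series are hypothetical, and the study's own data and exact procedure are not reproduced. The sketch pairs the cases on day t with the environmental value on day t − 6 before taking the Pearson correlation.

```python
import numpy as np

def lagged_correlation(cases, env, lag=6):
    """Pearson r between cases on day t and the environmental value on day t - lag."""
    cases = np.asarray(cases, float)
    env = np.asarray(env, float)
    if lag > 0:
        # Drop the first `lag` case values and the last `lag` environmental
        # values so that index k pairs cases[k + lag] with env[k].
        cases, env = cases[lag:], env[:-lag]
    return float(np.corrcoef(cases, env)[0, 1])

# Synthetic example: cases respond negatively to temperature 6 days earlier.
rng = np.random.default_rng(1)
temperature = rng.standard_normal(120)
cases = np.full(120, 100.0)
cases[6:] = 100.0 - 3.0 * temperature[:-6]

r_lagged = lagged_correlation(cases, temperature, lag=6)
r_same_day = lagged_correlation(cases, temperature, lag=0)
print(r_lagged, r_same_day)
```

In this constructed example the 6-day lagged correlation recovers the built-in negative relationship, while the same-day correlation is much weaker, which illustrates why the lag is applied before correlating.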

Fig. 4A-H shows the time series of the actual and predicted daily cases for the multivariate LSTM in the 8 cities considered. In general, the multivariate model performs well in predicting and following the trend of the daily cases. The model accounts for the clearly observable peaks in daily cases in the warmer regions of Los Angeles, Nagpur, New Delhi, and Yuma and in the cold region of Vastra Gotaland. In New Delhi, the model captures the double peaks as well. For the colder regions, where there is a lack of explicit peaks, the MAPE is higher due to the relatively lower number of cases observed in these regions.

4. Conclusion

This work is specifically aimed at forecasting the uninhibited spread of COVID-19, with minimal interference from other parameters, so as to test the hypothesis that integrating environmental parameters into conventional models such as LSTM improves model prediction. The datasets considered ranged from 5 to 7 months and included a period of uncertainty when researchers were still characterizing the virus transmission and survival. In conclusion, the research demonstrates the potential of deep learning models that take environmental parameters as inputs to improve prediction of daily COVID-19 cases in the selected locations, 8 cities across the globe with varying climatic zones. The multivariate LSTM model significantly outperformed the univariate models. The proposed T and RH integrated multivariate LSTM model can help decision-makers and authorities to effectively manage lockdown measures, resources and available infrastructure (Das et al., 2021; Lemaitre et al., 2021; Tomar and Gupta, 2020).

5. Limitations and future work

It is important to note that the work done here can be further improved. While additional data were readily available, their inclusion would have been subject to various additional influences such as lockdowns, festivals and social distancing parameters, which would have potentially introduced bias in the models. The training set is still small in the context of deep learning, which limits this work to a simple model architecture; with more data, the predictive power of LSTMs is expected to increase, which would further enable the development and optimization of deeper and more complex models. The model does not account for many other factors, such as demographic, socio-economic, political and infrastructural factors, asymptomatic individuals, pollution levels, lockdown status, vaccination status, and many other non-linear factors which can significantly affect the transmission of the disease. Additional information on the characterization of the virus and virus-laden particles (bioaerosols) and their influence on local microenvironments may also provide additional insights on the pandemic (Gollakota et al., 2021). It is expected that the inclusion of such parameters would further improve the predictive power of these forecasting models.

Nevertheless, the universal availability of city-level weather information, including weather forecasts, enables quick and easy integration of these parameters into forecasting models. It is also expected that the more complex LSTM variants used in univariate form in this study (stacked and bidirectional LSTMs) would improve further upon the integration of environmental parameters and additional training data. Additionally, there are ways to improve the accuracy of these forecasting models, such as data augmentation, generative adversarial networks, and transfer learning using some of the earlier epidemiological models as a pre-trained network.

The outcome of this work suggests that the inclusion of daily averaged environmental parameters could significantly improve the prediction capability of deep learning forecasting models for COVID-19. Hence, it is recommended to integrate publicly available weather data (historical and forecast) for enhanced accuracy in the forecasting of city-level COVID-19 cases, although other positive and negative confounding factors can affect the forecasting power.

CRediT authorship contribution statement

Roshan Wathore: Investigation, Method, Resources, Software, Visualization, Writing – original draft. Samyak Rawlekar: Resources, Software, Writing – original draft. Saima Anjum: Investigation, Resources, Software. Ankit Gupta: Formal Analysis, Visualization, Validation. Hemant Bherwani: Conceptualization, Formal Analysis, Project Administration, Resources, Supervision, Writing – original draft, Writing – review & editing. Nitin Labhasetwar: Supervision, Validation, Writing – review & editing. Rakesh Kumar: Supervision, Validation, Writing – review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

  1. Abbasimehr H., Paki R. Prediction of COVID-19 confirmed cases combining deep learning methods and Bayesian optimization. Chaos Solitons Fractals. 2021;142 doi: 10.1016/j.chaos.2020.110511.
  2. Adiga, A., Wang, L., Hurt, B., Peddireddy, A., Porebski, P., Venkatramanan, S., Lewis, B., Marathe, M., 2021. All Models Are Useful: Bayesian Ensembling for Robust High Resolution COVID-19 Forecasting. MedRxiv Prepr. Serv. Health Sci. 2021.03.12.21253495. https://doi.org/10.1101/2021.03.12.21253495.
  3. Ambade, B., Kumar, Amit, Kumar, Ashwini, Sahu, L.K., 2021a. Temporal variability of atmospheric particulate-bound polycyclic aromatic hydrocarbons (PAHs) over central east India: sources and carcinogenic risk assessment. Air Qual. Atmosphere Health. https://doi.org/10.1007/s11869-021-01089-5.
  4. Ambade B., Kurwadkar S., Sankar T.K., Kumar A. Emission reduction of black carbon and polycyclic aromatic hydrocarbons during COVID-19 pandemic lockdown. Air Qual. Atmosphere Health. 2021;14:1081–1095. doi: 10.1007/s11869-021-01004-y.
  5. Ambade B., Sankar T.K., Kumar A., Gautam A.S., Gautam S. COVID-19 lockdowns reduce the Black carbon and polycyclic aromatic hydrocarbons of the Asian atmosphere: source apportionment and health hazard evaluation. Environ. Dev. Sustain. 2021;23:12252–12271. doi: 10.1007/s10668-020-01167-1.
  6. Ambade B., Sankar T.K., Panicker A.S., Gautam A.S., Gautam S. Characterization, seasonal variation, source apportionment and health risk assessment of black carbon over an urban region of East India. Urban Clim. 2021;38 doi: 10.1016/j.uclim.2021.100896.
  7. Arora P., Kumar H., Panigrahi B.K. Prediction and analysis of COVID-19 positive cases using deep learning models: A descriptive case study of India. Chaos Solitons Fractals. 2020;139 doi: 10.1016/j.chaos.2020.110017.
  8. ArunKumar K.E., Kalaga D.V., Kumar C.M.S., Kawaji M., Brenza T.M. Forecasting of COVID-19 using deep layer Recurrent Neural Networks (RNNs) with Gated Recurrent Units (GRUs) and Long Short-Term Memory (LSTM) cells. Chaos Solitons Fractals. 2021;146 doi: 10.1016/j.chaos.2021.110861.
  9. Azarafza M., Azarafza M., Tanha J. COVID-19 Infection Forecasting based on Deep Learning in Iran (preprint) Epidemiology. 2020 doi: 10.1101/2020.05.16.20104182.
  10. Bayyurt, L., Bayyurt, B., 2020. Forecasting of COVID-19 Cases and Deaths Using ARIMA Models. medRxiv 2020.04.17.20069237. https://doi.org/10.1101/2020.04.17.20069237.
  11. Benvenuto D., Giovanetti M., Vassallo L., Angeletti S., Ciccozzi M. Application of the ARIMA model on the COVID-2019 epidemic dataset. Data Brief. 2020;29 doi: 10.1016/j.dib.2020.105340.
  12. Bherwani H., Gupta A., Anjum S., Anshul A., Kumar R. Exploring dependence of COVID-19 on environmental factors and spread prediction in India. Npj Clim. Atmospheric Sci. 2020;3:38. doi: 10.1038/s41612-020-00142-x.
  13. Bherwani H., Kumar S., Musugu K., Nair M., Gautam S., Gupta A., Ho C.-H., Anshul A., Kumar R. Assessment and valuation of health impacts of fine particulate matter during COVID-19 lockdown: a comprehensive study of tropical and sub tropical countries. Environ. Sci. Pollut. Res. 2021;28:44522–44537. doi: 10.1007/s11356-021-13813-w.
  14. Bhimala K.R., Patra G.K., Mopuri R., Mutheneni S.R. Prediction of COVID-19 cases using the weather integrated deep learning approach for India. Transbound. Emerg. Dis. 2021 doi: 10.1111/tbed.14102.
  15. Chelani A., Gautam S. Lockdown during COVID-19 pandemic: A case study from Indian cities shows insignificant effects on persistent property of urban air quality. Geosci. Front. 2021;101284 doi: 10.1016/j.gsf.2021.101284.
  16. Cheng C., Zhang D., Dang D., Geng J., Zhu P., Yuan M., Liang R., Yang H., Jin Y., Xie J., Chen S., Duan G. The incubation period of COVID-19: a global meta-analysis of 53 studies and a Chinese observation study of 11 545 patients. Infect. Dis. Poverty. 2021;10:119. doi: 10.1186/s40249-021-00901-9.
  17. Chimmula V.K.R., Zhang L. Time series forecasting of COVID-19 transmission in Canada using LSTM networks. Chaos Solitons Fractals. 2020;135 doi: 10.1016/j.chaos.2020.109864.
  18. Chowdhury A.A., Hasan K.T., Hoque K.K.S. Analysis and Prediction of COVID-19 Pandemic in Bangladesh by Using ANFIS and LSTM Network. Cogn. Comput. 2021;13:761–770. doi: 10.1007/s12559-021-09859-0.
  19. Coccia M. Factors determining the diffusion of COVID-19 and suggested strategy to prevent future accelerated viral infectivity similar to COVID. Sci. Total Environ. 2020;729 doi: 10.1016/j.scitotenv.2020.138474.
  20. Das A., Dhar A., Goyal S., Kundu A., Pandey S. COVID-19: Analytic results for a modified SEIR model and comparison of different intervention strategies. Chaos Solitons Fractals. 2021;144 doi: 10.1016/j.chaos.2020.110595.
  21. Dbouk T., Drikakis D. Weather impact on airborne coronavirus survival. Phys. Fluids. 2020;32 doi: 10.1063/5.0024272.
  22. Devaraj J., Madurai Elavarasan R., Pugazhendhi R., Shafiullah G.M., Ganesan S., Jeysree A.K., Khan I.A., Hossain E. Forecasting of COVID-19 cases using deep learning models: Is it reliable and practically significant? Results Phys. 2021;21 doi: 10.1016/j.rinp.2021.103817.
  23. Gautam A.S., Kumar S., Gautam S., Anand A., Kumar R., Joshi A., Bauddh K., Singh K. Pandemic induced lockdown as a boon to the Environment: trends in air pollution concentration across India. Asia-Pac. J. Atmospheric Sci. 2021;57:741–756. doi: 10.1007/s13143-021-00232-7.
  24. Gautam S., Samuel C., Gautam A.S., Kumar S. Strong link between coronavirus count and bad air: a case study of India. Environ. Dev. Sustain. 2021;23:16632–16645. doi: 10.1007/s10668-021-01366-4.
  25. Gers F.A., Schmidhuber J., Cummins F. Learning to forget: continual prediction with LSTM. Neural Comput. 2000;12:2451–2471. doi: 10.1162/089976600300015015.
  26. Ghafouri-Fard S., Mohammad-Rahimi H., Motie P., Minabi M.A.S., Taheri M., Nateghinia S. Application of machine learning in the prediction of COVID-19 daily new cases: A scoping review. Heliyon. 2021;7 doi: 10.1016/j.heliyon.2021.e08143.
  27. Gollakota A.R.K., Gautam S., Santosh M., Sudan H.A., Gandhi R., Sam Jebadurai V., Shu C.-M. Bioaerosols: Characterization, pathways, sampling strategies, and challenges to geo-environment and health. Gondwana Res. 2021;99:178–203. doi: 10.1016/j.gr.2021.07.003.
  28. Gupta A., Bherwani H., Gautam S., Anjum S., Musugu K., Kumar N., Anshul A., Kumar R. Air pollution aggravating COVID-19 lethality? Exploration in Asian cities using statistical models. Environ. Dev. Sustain. 2021 doi: 10.1007/s10668-020-00878-9.
  29. Gupta, R., Pandey, G., Chaudhary, P., Pal, S., 2020. SEIR and Regression Model based COVID-19 outbreak predictions in India. medRxiv 2020.04.01.20049825. https://doi.org/10.1101/2020.04.01.20049825.
  30. Hochreiter S., Schmidhuber J. Long short-term memory. Neural Comput. 1997;9:1735–1780. doi: 10.1162/neco.1997.9.8.1735.
  31. Kırbaş İ., Sözen A., Tuncer A.D., Kazancıoğlu F.Ş. Comparative analysis and forecasting of COVID-19 cases in various European countries with ARIMA, NARNN and LSTM approaches. Chaos Solitons Fractals. 2020;138 doi: 10.1016/j.chaos.2020.110015.
  32. Kuo C.-E., Chen G.-T. Automatic Sleep Staging Based on a Hybrid Stacked LSTM Neural Network: Verification Using Large-Scale Dataset. IEEE Access. 2020;8:111837–111849. doi: 10.1109/ACCESS.2020.3002548.
  33. Lemaitre J.C., Grantz K.H., Kaminsky J., Meredith H.R., Truelove S.A., Lauer S.A., Keegan L.T., Shah S., Wills J., Kaminsky K., Perez-Saez J., Lessler J., Lee E.C. A scenario modeling pipeline for COVID-19 emergency planning. Sci. Rep. 2021;11:7534. doi: 10.1038/s41598-021-86811-0.
  34. Lin J., Huang W., Wen M., Li D., Ma S., Hua J., Hu H., Yin S., Qian Y., Chen P., Zhang Q., Yuan N., Sun S. Containing the spread of coronavirus disease 2019 (COVID-19): Meteorological factors and control strategies. Sci. Total Environ. 2020;744 doi: 10.1016/j.scitotenv.2020.140935.
  35. Ludvigsson J.F. The first eight months of Sweden’s COVID-19 strategy and the key actions and actors that were involved. Acta Paediatr. 2020;109:2459–2471. doi: 10.1111/apa.15582.
  36. Ranjbari M., Shams Esfandabadi Z., Zanetti M.C., Scagnelli S.D., Siebers P.-O., Aghbashlo M., Peng W., Quatraro F., Tabatabaei M. Three pillars of sustainability in the wake of COVID-19: A systematic review and future research agenda for sustainable development. J. Clean. Prod. 2021;297 doi: 10.1016/j.jclepro.2021.126660.
  37. Sahai A.K., Rath N., Sood V., Singh M.P. ARIMA modelling & forecasting of COVID-19 in top five affected countries. Diabetes Metab. Syndr. Clin. Res. Rev. 2020;14:1419–1427. doi: 10.1016/j.dsx.2020.07.042.
  38. Said A.B., Erradi A., Aly H.A., Mohamed A. Predicting COVID-19 cases using bidirectional LSTM on multivariate time series. Environ. Sci. Pollut. Res. 2021;28:56043–56052. doi: 10.1007/s11356-021-14286-7.
  39. Shastri S., Singh K., Kumar S., Kour P., Mansotra V. Time series forecasting of Covid-19 using deep learning models: India-USA comparative case study. Chaos Solitons Fractals. 2020;140 doi: 10.1016/j.chaos.2020.110227.
  40. Shetty R.P., Pai P.S. Forecasting of COVID 19 Cases in Karnataka State using Artificial Neural Network (ANN) J. Inst. Eng. India Ser. B. 2021;1–11 doi: 10.1007/s40031-021-00623-4.
  41. Shoaib M., Salahudin H., Hammad M., Ahmad S., Khan A.A., Khan M.M., Baig M.A.I., Ahmad F., Ullah M.K. Performance Evaluation of Soft Computing Approaches for Forecasting COVID-19 Pandemic Cases. SN Comput. Sci. 2021;2:372. doi: 10.1007/s42979-021-00764-9.
  42. Singh S., Sundram B.M., Rajendran K., Law K.B., Aris T., Ibrahim H., Dass S.C., Gill B.S. Forecasting daily confirmed COVID-19 cases in Malaysia using ARIMA models. J. Infect. Dev. Ctries. 2020;14:971–976. doi: 10.3855/jidc.13116.
  43. Srivastava, Y., Bhardwaj, S., R, P., 2021. Covid-19 Forecasting and Analysis Using Different Time - Series Model and Algorithms. Int. J. Curr. Res. Rev. 184–189. https://doi.org/10.31782/IJCRR.2021.SP191.
  44. Tomar A., Gupta N. Prediction for the spread of COVID-19 in India and effectiveness of preventive measures. Sci. Total Environ. 2020;728 doi: 10.1016/j.scitotenv.2020.138762.
  45. Tosepu R., Gunawan J., Effendy D.S., Ahmad L.O.A.I., Lestari H., Bahar H., Asfian P. Correlation between weather and Covid-19 pandemic in Jakarta Indonesia. Sci. Total Environ. 2020;725 doi: 10.1016/j.scitotenv.2020.138436.
  46. URL 01, Coronalevel.com, 2021. Development of Coronavirus cases: Stockholm, Sweden (549,733 cases) [WWW Document]. URL https://coronalevel.com/Sweden/Stockholm/ (accessed 4.20.21).
  47. URL 02 Time and Date AS, 2021. World Temperatures — Weather Around The World [WWW Document]. URL https://www.timeanddate.com/weather/ (accessed 4.20.21).
  48. URL 03, USAFacts, 2021. US COVID-19 cases and deaths by state [WWW Document]. USAFacts.org. URL https://usafacts.org/visualizations/coronavirus-covid-19-spread-map/ (accessed 4.20.21).
  49. URL 04, The Weather Company, 2021. Weather History & Data Archive | Weather Underground [WWW Document]. URL https://www.wunderground.com/history (accessed 4.20.21).
  50. URL 05, COVID19INDIA, 2021. Coronavirus in India: Latest Map and Case Count [WWW Document]. URL https://www.covid19india.org (accessed 4.20.21).
  51. URL 06, CPCB, 2021. CCR [WWW Document]. URL https://app.cpcbccr.com/ccr/#/caaqm-dashboard-all/caaqm-landing (accessed 4.20.21).
  52. Wang J., Tang K., Feng K., Lv W. High Temperature and High Humidity Reduce the Transmission of COVID-19. SSRN Electron. J. 2020 doi: 10.2139/ssrn.3551767.
  53. Wathore R., Gupta A., Bherwani H., Labhasetwar N. Understanding air and water borne transmission and survival of coronavirus: Insights and way forward for SARS-CoV-2. Sci. Total Environ. 2020;749 doi: 10.1016/j.scitotenv.2020.141486.
  54. WHO, 2020. Transmission of SARS-CoV-2: implications for infection prevention precautions [WWW Document]. URL https://www.who.int/news-room/commentaries/detail/transmission-of-sars-cov-2-implications-for-infection-prevention-precautions (accessed 3.6.21).
  55. Wu Y., Jing W., Liu J., Ma Q., Yuan J., Wang Y., Du M., Liu M. Effects of temperature and humidity on the daily new cases and new deaths of COVID-19 in 166 countries. Sci. Total Environ. 2020;729 doi: 10.1016/j.scitotenv.2020.139051.
  56. Xu C., Yu Y., Chen Y., Lu Z. Forecast analysis of the epidemics trend of COVID-19 in the USA by a generalized fractional-order SEIR model. Nonlinear Dyn. 2020;101:1–14. doi: 10.1007/s11071-020-05946-3.
  57. Zeroual A., Harrou F., Dairi A., Sun Y. Deep learning methods for forecasting COVID-19 time-Series data: A Comparative study. Chaos Solitons Fractals. 2020;140 doi: 10.1016/j.chaos.2020.110121.

Articles from Gondwana Research are provided here courtesy of Elsevier