Abstract
Traditional deep learning models such as convolutional neural networks (CNNs), which capture localized features, and long short-term memory networks (LSTMs), which focus on long-term dependencies, often face challenges in achieving high accuracy on time series prediction tasks. To address this limitation, this study proposes a hybrid deep learning model that integrates CNN, LSTM, the reptile search algorithm (RSA), and eXtreme Gradient Boosting (XGB) for pollutant concentration forecasting. Initially, the raw pollutant concentration data undergoes cleaning and normalization via a Min–Max scaler. The processed sequences are then separately fed into LSTM and CNN models to extract weighted features. RSA is applied to optimize these features, while XGB computes feature importance scores, quantifying the contribution of each selected feature to the predictive performance. The proposed model predicts pollutants such as PM₂.₅, CO, SO₂, and NO₂ up to ten days in advance for urban Indian settings. Comparative evaluations against benchmark models, including Transformer, CNN, BiLSTM, BiRNN, ANN, and BiGRU, demonstrate that the hybrid approach yields consistently superior accuracy and robustness. The hybrid model achieves substantially lower errors and higher R² scores across all pollutants, validating its reliability for long-horizon air quality forecasting.
Keywords: Hybrid model, Ensemble learning, Meta-heuristic optimisation, Air pollutants
Subject terms: Environmental sciences, Mathematics and computing
Introduction
Over the past decades, air pollution has emerged as a pressing global issue1. Research extensively highlights its profound effects on both human health and vegetation, demonstrating that prolonged exposure significantly harms plant foliage2. The primary contaminants predominantly originate from stationary sources, including dust, PM₁₀ particles (with diameters less than 10 μm) and PM₂.₅ particles (with diameters less than 2.5 μm). PM₂.₅ particles are particularly hazardous as they stem from unburned fuel and industrial byproducts.
Other major pollutants include sulfur dioxide (SO₂), which is emitted during the combustion of fuels; nitrogen oxides (NOₓ), formed when nitrogen and oxygen react under high-temperature conditions; carbon monoxide (CO), a byproduct of incomplete combustion; and ozone (O₃), which results from photochemical reactions3. Figure 1 illustrates these major sources of air pollution.
Fig. 1.

Major sources of air pollution.
Forecasting air pollution levels with precision is crucial for promoting partnerships with governmental bodies and raising public awareness about its risks. Air pollution data are often characterised by rising or falling trends, seasonality (variability over specific periods), cycles (fluctuations without a fixed time pattern) and erratic movements4.
Deep learning architectures are highly effective in addressing air pollution prediction problems. They excel at handling the non-linear, cyclical, seasonal, and sequential dependencies present in pollutant data, making them a robust solution for such complex scenarios, as reported in5,6.
Training deep learning models demands substantial computational resources and time. The selection of hyperparameters plays a crucial role in the model’s training process, directly influencing its computational efficiency, susceptibility to overfitting, and the accuracy of the final model.
Hyperparameter tuning is a challenging and time-intensive process that aims to identify the optimal configuration, invariably involving some margin of error. Crucial decisions include the number of layers, the number of cells or units in each layer, the batch size, suitable activation functions, and related settings.
To address the challenges of air pollution prediction, this study adopts a hybrid deep learning model that combines Long Short-Term Memory (LSTM) networks with Convolutional Neural Networks (CNN). Traditional models often struggle either with capturing long-term dependencies or with efficiently detecting local variations in time-series data. The LSTM component overcomes this by preserving contextual information across extended sequences, while the CNN component effectively extracts localized temporal patterns and short-term fluctuations. By integrating both, the hybrid approach not only reduces overfitting but also provides a more comprehensive representation of the data, thereby surpassing the limitations of single-model architectures and improving predictive accuracy for air pollution forecasting.
Combining LSTM and CNN helps mitigate overfitting by providing a more robust representation of the data7. Meta-heuristic optimisation methods can be used to locate near-optimal solutions8 within vast search spaces, thereby contributing to the development of a highly effective air pollution prediction model.
Considering the aforementioned information, the novelties and key contributions of this study are highlighted as follows.
The present study proposes hybrid models that integrate CNN and LSTM with XGB, alongside the RSA optimisation algorithm, to predict pollutants (PM₂.₅, CO, SO₂, NO₂). This research leverages a meta-heuristic optimisation technique, specifically the RSA, to minimise computational complexity and enhance training efficiency.
Current methodologies in this field primarily concentrate on short-term predictions, typically covering 24 to 48 hours. Only a few studies extend their forecasts to 3 or 4 days9. The proposed approach makes predictions up to 10 days in advance.
Related works
Air quality models can generally be classified into physical modelling and machine learning. Physical modelling is theory-driven, relying on mathematical and physical principles to simulate the behaviour of pollutants. In contrast, machine learning is data-driven, using large datasets to develop algorithms that predict air quality based on observed data patterns and relationships10. Atmospheric dispersion models (ADMs), the Weather Research and Forecasting model coupled with Chemistry (WRF-Chem), and the Community Multiscale Air Quality Modelling System (CMAQ) are frequently used physical models. ADMs focus on atmospheric dispersion modelling, WRF-Chem integrates weather forecasting with chemical processes, and CMAQ combines chemical transport modelling with meteorological data11. However, these methods often encounter limitations due to their high computational costs, the intricate nature of modelling chemical processes, and the inherent uncertainties in emission inventories12.
Conversely, air pollution modelling can also utilise historical data to discern statistical patterns and their correlations with urban proxy variables, such as meteorological conditions and traffic flow. Linear models like the Autoregressive Integrated Moving Average (ARIMA) or machine learning methods such as Support Vector Regression (SVR) and Artificial Neural Networks (ANN) can be utilised effectively for modelling non-linear dynamics in high-dimensional environments. Recent progress in data-driven techniques has demonstrated significant potential in estimating and forecasting air pollution using comprehensive urban datasets13.
Deep learning techniques, such as Recurrent Neural Networks (RNN) and their variants like LSTM and Gated Recurrent Unit (GRU) models, have excelled in various time series prediction tasks, including air pollution forecasting. Similarly, CNNs can be employed on air quality datasets to identify patterns for future prediction models14. A combined CNN-LSTM architecture has been utilised to capture spatial and temporal features, with 1D convolution layers employed to improve the learning of interactions between these dimensions15. An innovative method integrates deep learning with domain-specific models to predict long-term air pollution trends in China and the United Kingdom, incorporating domain knowledge by using a statistically significant inter-pollutant relationship as a regularisation term16. Graph Neural Networks (GNNs) often struggle to capture various spatial and feature-based contextual factors; this issue has been addressed by a new GNN framework that captures the similarities among stations by considering land use at their locations and their primary sources of pollution17. In response to the increasing surface ozone (O₃) pollution in urban areas across China, a new deep neural network (DNN) model, Geo-STO3Net, was developed. This model integrates nearby geographical spatiotemporal information using comprehensive meteorological data and satellite observations for surface O₃ estimation18.
Between October 4, 2021, and December 26, 2021, a 12-week investigation into ambient air quality parameters (PM₂.₅, PM₁₀, SO₂, NO₂, and O₃) was conducted across four sampling sites in the Delhi-NCR region. The investigation revealed that particulate matter measurements obtained through ground-based instruments exceeded those recorded via satellite monitoring. Conversely, satellite observations indicated elevated average levels of SO₂ and NO₂ compared to other pollutants19. Spatio-temporal evolution analysis is a key area of air pollution research; AirPollutionViz is a visual analytics system designed to facilitate the analysis of spatio-temporal evolution through sequence mining and clustering analysis20. An IoT-enabled system has significantly enhanced air quality monitoring and prediction, focusing on PM concentration monitoring across edge devices and cloud platforms. The system employs an advanced WVPBL approach that integrates wavelet denoising, principal component analysis (PCA), and variational mode decomposition. This combination facilitates the extraction of features from multi-modal air quality data, thereby ensuring precise short-term predictions of PM₂.₅ concentrations21.
Most of the discussed studies emphasise short-term predictions, often limited to 24-48 hours. These models typically underperform when extended to longer-term forecasting due to issues such as error accumulation, sensitivity to dynamic meteorological conditions, and the inability to capture evolving pollutant-meteorology interactions. These shortcomings make clear the necessity of a hybrid approach that integrates spatial, temporal, and domain-specific factors for robust long-term air pollution forecasting.
Although prior studies employing CNNs, LSTMs, and other deep learning models demonstrate strong short-term forecasting performance, they rarely address longer-term horizons such as 10-day pollutant predictions. These models often suffer from error accumulation, limited spatiotemporal feature extraction, and inadequate feature optimization strategies. Our study addresses this gap by proposing a hybrid CNN–LSTM–RSA–XGBoost architecture, explicitly designed for robust 10-day forecasts. This integration simultaneously captures local and long-term dependencies, optimizes feature selection, and ensures interpretability—areas insufficiently covered in existing CPCB-based research.
The summarised literature review is presented in Table 1.
Table 1.
Related works comparison table.
| Papers | Parameters | Method | Findings |
|---|---|---|---|
| 14 | PM₂.₅, SO₂, PM₁₀, NO₂, CO, NO and O₃ | Convolutional Neural Network | CNNs are adept at identifying intricate patterns within data. |
| 15 | Air pollution, weather, traffic, morphology data | Hybrid CNN-LSTM model | The integration of CNN with LSTM allows the model to capture spatial and temporal characteristics. |
| 16 | Air pollution and weather data | Bayesian deep-learning model | A domain-specific model can better adapt to the specific characteristics of air pollution data. |
| 18 | In situ surface O₃, TROPOMI data (O₃ concentrations), auxiliary data | Deep neural network (Geo-STO3Net) | Combines meteorological data and satellite imagery, capturing complex geographical and spatiotemporal patterns. |
| 20 | PM₂.₅, SO₂, PM₁₀, NO₂, CO, NO and O₃, plus longitude and latitude | Sequence mining and clustering analysis | Advanced visualisation techniques to effectively display air pollution patterns. |
| 21 | PM₂.₅, SO₂, PM₁₀, NO₂, CO, NO and O₃, atmospheric temperature, atmospheric humidity, rainfall, and wind speed | Bidirectional long short-term memory network | Utilises sophisticated methods such as wavelet denoising, variational mode decomposition, and principal component analysis to identify features. |
| 22 | PM₂.₅ and longitude, latitude data | Deep support vector regression | Leverages a random walk to uncover more extensive spillover relationships between nodes. |
| 23 | PM₂.₅ data, meteorological and forest fire disturbance data | Long short-term memory neural network | Added interference of forest fires to pollutant predictions to improve accuracy. |
| 24 | Air pollutant concentrations and meteorological parameters | Gradient boosting machine learning model | Fifteen regression models were tested; the CatBoost regression model outperformed the rest. |
| 25 | PM₂.₅ data, meteorological and wildfire data | SpatioTemporal (ST)-Transformer | Model captures spikes in concentrations during wildfire situations. |
| 26 | Air pollution and meteorological data | Bidirectional Recurrent Neural Network | Analyses through IoT and an adaptive neural network. |
Methodology
This section explains the system’s working architecture, as depicted in Fig. 2. It starts with the dataset gathered from27. The data undergoes pre-processing and transformation through various techniques to enhance quality and eliminate inconsistencies. Following this, the processed data is input into the model. Finally, both models are trained to predict future concentrations of the analysed air pollutants.
Input Dataset
The CPCB dataset has been fed into the model28. This dataset consists of comprehensive hourly and daily records of pollutant concentrations for various Indian cities, spanning 2015 to 2020. The pollutant measurements encompass 29,532 hourly and daily concentration data samples for each city. A statistical summary of the dataset is presented in Table 2.
Table 2.
A statistical summary of the dataset.
| Statistic | Count | Mean | Std | Min | 25% | 50% | 75% | Max |
|---|---|---|---|---|---|---|---|---|
| PM₂.₅ | 24933 | 67.45 | 64.66 | 0.04 | 28.82 | 48.57 | 80.59 | 949.99 |
| PM₁₀ | 18391 | 118.12 | 90.60 | 0.01 | 56.25 | 95.68 | 149.74 | 1000 |
| NO | 25949 | 17.57 | 22.78 | 0.02 | 5.63 | 9.89 | 19.95 | 390.68 |
| NO₂ | 25946 | 28.56 | 24.47 | 0.01 | 11.75 | 21.69 | 37.62 | 362.21 |
| NOₓ | 25346 | 32.30 | 31.64 | 0 | 12.82 | 23.52 | 40.12 | 467.63 |
| NH₃ | 19203 | 23.48 | 25.68 | 0.01 | 8.58 | 15.85 | 30.02 | 352.89 |
| CO | 27472 | 2.24 | 6.96 | 0 | 0.51 | 0.89 | 1.45 | 175.81 |
| SO₂ | 25677 | 14.53 | 18.13 | 0.01 | 5.67 | 9.16 | 15.22 | 193.86 |
| O₃ | 25509 | 34.49 | 21.69 | 0.01 | 18.86 | 30.84 | 45.57 | 257.73 |
| Benzene | 23908 | 3.28 | 15.81 | 0 | 0.12 | 1.07 | 3.08 | 455.03 |
| Toluene | 21490 | 8.70 | 19.96 | 0 | 0.6 | 2.97 | 9.15 | 454.85 |
| Xylene | 11422 | 3.07 | 6.32 | 0 | 0.14 | 0.98 | 3.35 | 170.37 |
| AQI | 24850 | 166.46 | 140.69 | 13 | 81 | 118 | 208 | 2049 |
Preprocessing
Time-series data often exhibits noise, missing values, inconsistencies, and redundant information, typically dispersed across various heterogeneous sources. These inconsistencies can significantly reduce data quality, thereby impacting the reliability and accuracy of results. The key phases involved in the data preprocessing stage are discussed below.
- Handle missing values: Missing values in the dataset can lead to unreliable or inaccurate predictions, so the initial step of this methodology focuses on handling them. The input data is grouped by the “city” column, rows containing missing values are removed within each city group, and the index is then reset. Analytical observations reveal that cities such as Ahmedabad, Bengaluru, and Mumbai have a significantly higher number of null entries, so some city entries were dropped in this step (see the sketch after this list).
- Data scaling and transformation: A MinMaxScaler is applied to scale the data (x) within the range [0, 1] using Eq. (1). Scaling is critical for neural networks, facilitating more effective convergence during training. The scaler is fitted and applied exclusively to the specified pollutant columns, which typically represent the features used in the model. To analyse time-series data, lag features are constructed using values from the preceding 24 time steps to predict the subsequent pollutant levels.

$$x' = \frac{x - \min(x)}{\max(x) - \min(x)} \tag{1}$$

- Splitting dataset: The dataset is divided into training and testing sets using the split function. In this process, 20% of the data is designated for testing purposes, while the remaining 80% is utilised for training. A random state of 42 is employed as the random seed to ensure reproducibility during the split.
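A minimal sketch of this preprocessing pipeline is shown below, assuming a pandas DataFrame in the CPCB city-day layout; the file name and the pollutant subset are illustrative, while the 24-step window and 80/20 split follow the text.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Hypothetical file name; columns follow the CPCB city-day layout.
df = pd.read_csv("city_day.csv")

# Handle missing values: group by city, drop incomplete rows, reset the index.
df = (df.groupby("City", group_keys=False)
        .apply(lambda g: g.dropna())
        .reset_index(drop=True))

# Scale the pollutant columns to [0, 1] as in Eq. (1).
pollutants = ["PM2.5", "CO", "SO2", "NO2"]   # illustrative subset
scaler = MinMaxScaler(feature_range=(0, 1))
scaled = scaler.fit_transform(df[pollutants])

# Lag features: the previous 24 time steps predict the next pollutant vector.
w = 24
X = np.array([scaled[i - w:i] for i in range(w, len(scaled))])
y = scaled[w:]

# 80/20 split with a fixed seed for reproducibility.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
```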
Hybrid model
This section comprehensively analyses the proposed model. Following data preprocessing and splitting, the windowed data generated in the previous steps is utilised as input for the hybrid model, as illustrated in Fig. 2. The primary objective of this study is to forecast future pollutant levels using historical timestamp data.
Fig. 2.
Flow diagram of the proposed hybrid methodology.
The hybrid model integrates LSTM, CNN, and RSA for feature selection and utilises XGBoost for prediction. Detailed explanations for each stage are provided below, and the pseudocode for each stage is outlined in Algorithm 1.
For reproducibility, the methodology fixes all training parameters: sequence_length = 30, epochs = 100 (with early-stopping patience of 10), batch_size = 32, the Adam optimiser (learning rate 0.001), and the model layer sizes listed in Table 3. The random seed (random_state = 42) is set for all stochastic processes (data splits and model initialisation), and train_test_split is used with shuffle = False for the time-series experiments to avoid look-ahead bias; an explicit time-based split (e.g., training on 2015-2018, validating on 2019, and testing on 2020) is a natural refinement. The key novelty of this work lies in the specific integration of a CNN and LSTM for spatiotemporal feature extraction, followed by a feature selection step and an XGBoost regressor, explicitly designed and validated for the challenging task of 10-day air quality forecasting in Indian urban centres, a horizon significantly longer than those addressed in most previous studies.
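A brief configuration sketch consistent with these settings follows (TensorFlow/Keras assumed; `restore_best_weights` is our choice, not stated in the text):

```python
import random

import numpy as np
import tensorflow as tf
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.optimizers import Adam

# Fix seeds for all stochastic processes (splits, weight initialisation).
SEED = 42
random.seed(SEED)
np.random.seed(SEED)
tf.random.set_seed(SEED)

# Shared training configuration, mirroring the values stated above.
SEQUENCE_LENGTH = 30
EPOCHS = 100
BATCH_SIZE = 32
optimizer = Adam(learning_rate=0.001)
early_stop = EarlyStopping(patience=10, restore_best_weights=True)
```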
Although recent Transformer-based models like Informer and Autoformer excel in long-sequence forecasting, they are computationally intensive and require large datasets to generalize effectively29. In contrast, our hybrid CNN-LSTM framework efficiently captures both local patterns (via CNN) and long-term dependencies (via LSTM) in city-level pollutant time series. The integration of the Reptile Search Algorithm (RSA) optimizes feature weights, while XGBoost provides interpretable feature importance scores. Given the moderate dataset size and focus on interpretability, this approach achieves high predictive accuracy and efficiency without the added complexity of Transformer variants, making it a practical choice for urban pollutant forecasting.
The Reptile Search Algorithm (RSA) has garnered significant attention in recent literature for its efficacy in optimization tasks. A comprehensive review by30 highlights RSA’s strengths in balancing exploration and exploitation, making it a robust choice for complex optimization problems. Furthermore, a study by31 introduced a multi-strategy enhanced RSA, integrating dynamic evolutionary strategies to improve convergence rates. Additionally,32 proposed an improved RSA addressing population diversity and convergence issues, enhancing its performance in challenging optimization scenarios.
LSTM model: The LSTM architecture33, pioneered by Hochreiter and Schmidhuber, is exceptionally effective for time-series prediction. This model effectively captures non-linear connections between historical data and current time points. In this study, a multivariate, two-layer Long Short-Term Memory (LSTM) network has been developed, as illustrated in Fig. 3.
A sequential LSTM model is constructed by stacking two LSTM layers. This model takes into account both the length of the input sequence and the number of features at each time step. The first LSTM layer receives an input sequence of length w, the window of previous time steps. This layer processes the input time-series data and extracts initial temporal features, capturing short-term dependencies and patterns in the sequence, such as daily trends. The output from the first layer is then passed to the second LSTM layer, which further refines the temporal features. This layer focuses on capturing long-term dependencies and more abstract patterns, such as seasonal trends or the impact of weather conditions over extended periods.
Fig. 3.
Structural design of long short-term memory networks.
The LSTM layer handles multivariate sequence data by capturing temporal dependencies across all pollutants using 50 LSTM units. LSTM layer 1 uses a rectified linear unit (ReLU) as the activation function, with the return-sequences parameter set to true. This layer is the model’s core, capturing temporal dependencies in the multivariate sequence data. The gates operate as follows:
- Forget Gate: The forget gate evaluates the current input data and the previous hidden state to determine which parts of the cell state are necessary. This gate helps the model ignore outdated or irrelevant information, such as pollutant spikes caused by one-time events. It evaluates the previous hidden state $h_{t-1}$ and current input $x_t$ using a sigmoid function. The output is a vector $f_t$ with values between 0 and 1, where 0 means “completely forget” and 1 means “completely retain”, as shown in Eq. (2) below:

$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f) \tag{2}$$

- Input Gate: The input gate determines the new information to incorporate into the cell state. By utilising the sigmoid function, it assigns weights to the significance of the input $x_t$ and the previous hidden state $h_{t-1}$, as depicted in Eq. (3). This mechanism adapts to abrupt variations in weather conditions or pollutant levels.

$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i) \tag{3}$$

- Candidate Cell State: The candidate cell state $\tilde{C}_t$ signifies possible updates to the cell state. It is computed through the hyperbolic tangent (tanh) function, producing outputs within the range [-1, 1], as illustrated in Eq. (4):

$$\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C) \tag{4}$$

- Cell State Update: The cell state $C_t$ integrates the previous cell state $C_{t-1}$ with the candidate state $\tilde{C}_t$, guided by the outputs of the forget and input gates. This updated cell state serves as the carrier of the sequence’s long-term memory. The corresponding update is represented in Eq. (5):

$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \tag{5}$$

- Output Gate: The output gate determines the part of the cell state that will be output as the hidden state $h_t$ for the current time step. It uses a sigmoid function for gating and multiplies the result with the updated cell state passed through tanh, as shown in Eq. (6) below:

$$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o), \qquad h_t = o_t \odot \tanh(C_t) \tag{6}$$

In the above equations, $h_t$ and $h_{t-1}$ denote the hidden states at time steps t and t-1, respectively, while $x_t$ represents the pollutant concentration input at time t. The weight matrices are denoted by $W_f$, $W_i$, and $W_o$, and the corresponding bias terms are $b_f$, $b_i$, and $b_o$. The hidden state functions as the output of the LSTM cell, carrying information about both the current and preceding sequences to the next time step. After the first LSTM layer, a Dropout layer randomly drops 20% of neurons to help prevent overfitting. The second LSTM layer, with 50 units, does not return sequences since it is the final LSTM in the stack, and is followed by another Dropout layer with a 20% rate for regularisation. Finally, a Dense layer with one unit per feature outputs the final predictions for each feature, capturing the interrelationships among all input features. All the hyperparameter values for the LSTM are shown in Table 3, and a minimal implementation sketch is given after the table.
Table 3.
Model configurations and hyperparameters for LSTM, CNN, and XGBoost.
| Model | Configuration | Training setup | Parameters |
|---|---|---|---|
| LSTM | 2 LSTM layers (50 units, ReLU); Dropout: 0.2 | Optimizer: Adam; Loss: MSE; Epochs: 50; Batch: 32; Early stopping: 10 | Seq. length: 30; Features: 1; Total params: 91,955; Trainable: 30,651 |
| CNN | Conv1D: 32 filters (k=2, ReLU) + MaxPool(2); Conv1D: 16 filters (k=2, ReLU) + MaxPool(2); Dense: 32 (ReLU); Dropout: 0.2 | Optimizer: Adam; Loss: MSE; Epochs: 50; Batch: 32; Early stopping: 10 | Total params: 12,821; Trainable: 4,273 |
| XGBoost | Objective: reg:squarederror; Learning rate: 0.1; Random state: 42 | Tree-based boosting; default training iterations | Other parameters: default |
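A minimal Keras sketch of the two-layer LSTM under this configuration, continuing the earlier preprocessing and setup sketches (`w`, `pollutants`, `X_train`, `y_train`, and `early_stop` as defined there):

```python
from tensorflow.keras.layers import LSTM, Dense, Dropout
from tensorflow.keras.models import Sequential

n_features = len(pollutants)   # one input per pollutant series

lstm_model = Sequential([
    # Layer 1: 50 units, ReLU, returns full sequences for stacking.
    LSTM(50, activation="relu", return_sequences=True,
         input_shape=(w, n_features)),
    Dropout(0.2),               # drop 20% of neurons against overfitting
    # Layer 2: final LSTM in the stack, so sequences are not returned.
    LSTM(50, activation="relu"),
    Dropout(0.2),
    Dense(n_features),          # one output unit per pollutant feature
])
lstm_model.compile(optimizer="adam", loss="mse")
lstm_model.fit(X_train, y_train, epochs=50, batch_size=32,  # epochs per Table 3
               validation_split=0.1, callbacks=[early_stop])
```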
- CNN model: A CNN model is designed to identify short-term temporal patterns, making it useful for time-series data with localised trends, as shown in Fig. 4. The input layer defines the input sequence length and the number of features, where the input sequence length refers to the length of each sequence and the number of features represents the attributes at each time step. A simple 1D Convolutional Neural Network (CNN) is utilised for time-series forecasting. The 1D convolutional layer includes 64 filters and a kernel size of 3, which captures patterns over three time steps, following the standard convolutional formulation34,35 in Eq. (7):

$$y = \mathrm{ReLU}(W \ast x + b) \tag{7}$$

where x is the feature vector, ∗ denotes the 1D convolution, ReLU is the activation function introducing non-linearity, W is the weight for every unit (initially set to a random value), and b is the bias added to every unit (initially set to a random value). Subsequently, the MaxPooling1D layer down-samples the convolution output to reduce the dimensionality and computational load, as shown in Eq. (8):

$$y_j = \max(x_{2j-1}, x_{2j}) \tag{8}$$

where the max operation identifies the maximum value within each consecutive 2-step window. Subsequently, the flattening layer transforms the 2D output from the preceding layer into a 1D vector, setting it up for the Dense layers, as illustrated in Eq. (9):

$$v = \mathrm{Vector}(x) \tag{9}$$

where Vector(x) represents the flattened 1D feature vector. A dense layer with 50 units and ReLU activation learns complex patterns, as shown in Eq. (10):

$$y = \mathrm{ReLU}(Wx + b) \tag{10}$$

where W is the weight learned within the network and b the corresponding bias. Finally, the output layer has one unit per feature to predict, suitable for a regression task. All the hyperparameters for the CNN are shown in Table 3.
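A corresponding Keras sketch of the 1D CNN; it follows the prose above (64 filters, kernel size 3) rather than the Table 3 variant, and reuses names from the earlier sketches:

```python
from tensorflow.keras.layers import Conv1D, Dense, Flatten, MaxPooling1D
from tensorflow.keras.models import Sequential

cnn_model = Sequential([
    # Eq. (7): 1D convolution, 64 filters over 3 consecutive time steps.
    Conv1D(64, kernel_size=3, activation="relu",
           input_shape=(w, n_features)),
    MaxPooling1D(pool_size=2),     # Eq. (8): 2-step max window
    Flatten(),                     # Eq. (9): 2D output -> 1D feature vector
    Dense(50, activation="relu"),  # Eq. (10): dense pattern learner
    Dense(n_features),             # one regression output per pollutant
])
cnn_model.compile(optimizer="adam", loss="mse")
cnn_model.fit(X_train, y_train, epochs=50, batch_size=32,
              validation_split=0.1, callbacks=[early_stop])
```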
- Feature extraction: Extracting features from trained models captures the learned representations of the data, which can be used for further analysis. A feature extractor excludes the final layer, predicts on the input data, and reshapes the resulting features for downstream tasks. This is often employed in transfer learning or when intermediate representations of input data are required.
When data X passes through the LSTM model, it processes sequential information, learns temporal patterns, and generates feature vectors representing the model’s learned representations. Similarly, when data flows through the CNN model, it captures spatial patterns and local dependencies. The resulting feature vectors highlight the crucial patterns identified by the model, as illustrated in Fig. 2.
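Continuing the sketches above, a feature extractor can be obtained by cutting off each network's prediction head (the exact layer index is our assumption, matching the architectures sketched earlier):

```python
from tensorflow.keras.models import Model

# Expose each trained network's penultimate layer as a feature extractor
# (the final Dense prediction head is excluded, as described above).
lstm_extractor = Model(lstm_model.input, lstm_model.layers[-2].output)
cnn_extractor = Model(cnn_model.input, cnn_model.layers[-2].output)

lstm_features = lstm_extractor.predict(X_train)   # learned temporal patterns
cnn_features = cnn_extractor.predict(X_train)     # learned local patterns

# Reshape to (samples, features) for the RSA selection step that follows.
lstm_features = lstm_features.reshape(len(lstm_features), -1)
cnn_features = cnn_features.reshape(len(cnn_features), -1)
```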
Fig. 4.
Structural design of convolutional neural networks.
Algorithm 1.
Hybrid Model for Pollutant Prediction
- RSA meta-heuristic optimization: The Reptile Search Algorithm (RSA) was introduced in 202236. RSA draws inspiration from crocodile behaviours to optimise solutions for intricate problems. This algorithm operates through two key mechanisms: exploitation, which focuses on local search, and exploration, which emphasises global search. The RSA employs strategies inspired by hunting and encircling behaviours observed in nature. The algorithm initialises three key parameters: the population of crocodiles, the dimensionality of the search space, and the initial candidate solutions. The RSA is applied to enhance the output features derived from LSTM and CNN, ensuring optimal performance.
The RSA initiates optimisation by creating a randomly distributed set of candidate solutions, represented as X. In each iteration, the most favourable solution identified is assessed and considered as an approximation of the optimal result.

$$X = \begin{bmatrix} x_{1,1} & x_{1,2} & \cdots & x_{1,D} \\ x_{2,1} & x_{2,2} & \cdots & x_{2,D} \\ \vdots & \vdots & \ddots & \vdots \\ x_{P,1} & x_{P,2} & \cdots & x_{P,D} \end{bmatrix} \tag{11}$$

Here, X represents the population in matrix form, where each row corresponds to a candidate solution (population member) and each column represents a feature dimension. Thus, $x_{i,k}$ denotes the value of the kth feature for the ith candidate, as described in Eq. (11). In other words, the matrix captures P candidate solutions in a D-dimensional search space, forming the input basis for subsequent optimisation steps.
The problem is characterised by two key parameters: the population size (P), which corresponds to the initial number of solutions, and the data feature dimension (D), which represents the feature set of pollutant concentration.
To initialize the search process, each solution (or candidate) must be randomly generated within the feasible range of the problem. This ensures that the algorithm explores the entire search space instead of being biased toward a particular region.

$$x_{i,k} = rand \cdot (UB_k - LB_k) + LB_k \tag{12}$$

Here, $x_{i,k}$ denotes the kth variable of the ith candidate solution. The term rand generates a random number uniformly distributed in [0, 1]. By scaling it with $(UB_k - LB_k)$ and shifting by $LB_k$, the variable is guaranteed to lie within the specified lower and upper bounds of the search space, as shown in Eq. (12).
- Exploration phase: The encircling phase focuses on exploring high-density regions within the search space. During this phase, movements inspired by crocodile behaviours, such as high walking and belly walking, are pivotal. While these movements are not directly related to capturing prey, they are instrumental in exploring an extensive search space.

During the exploration phase, referred to as encircling, two specific conditions must be satisfied according to Eq. (13), involving high walking and belly walking movements. High walking is applied when $t \le \frac{T}{4}$, while belly walking applies when $\frac{T}{4} < t \le \frac{2T}{4}$. The candidate positions are updated accordingly:

$$x_{i,k}(t+1) = \begin{cases} Best_k(t) - \eta_{i,k}(t)\,\beta - R_{i,k}(t)\cdot rand, & t \le \frac{T}{4} \\[4pt] Best_k(t)\cdot x_{r,k}\cdot ES(t)\cdot rand, & \frac{T}{4} < t \le \frac{2T}{4} \end{cases} \tag{13}$$

In this formulation, $x_{i,k}(t+1)$ represents the updated value of the kth variable for the ith candidate solution at iteration t+1. During the early stage of the search ($t \le T/4$), the update is influenced by the best solution so far, $Best_k(t)$, a control factor $\eta_{i,k}(t)$, and a random deviation term $R_{i,k}(t)$, encouraging diverse exploration. In the subsequent stage ($T/4 < t \le 2T/4$), the update incorporates both the best solution and a randomly chosen candidate $x_{r,k}$, modulated by the exploration strength ES(t). This transition gradually balances exploration and exploitation as the search progresses. In Eq. (13), the parameter $\beta$ regulates the exploration process, and the random variable rand contributes the stochastic element of the algorithm. The hunting operator for the (i, k) position, represented by $\eta_{i,k}$, is determined using Eq. (14). Finally, the reduce function $R_{i,k}$ is applied to narrow the search area, as defined by Eq. (15).

$$\eta_{i,k} = Best_k(t)\cdot P_{i,k} \tag{14}$$

Here, $\eta_{i,k}$ represents the control factor associated with the ith candidate and the kth variable. It is obtained by multiplying the best solution in the current iteration, $Best_k(t)$, with the population factor $P_{i,k}$. This formulation ensures that the search process is guided by the best-known solution while still being influenced by the diversity of the population, thereby balancing exploitation of good solutions with exploration of alternative regions in the search space.

$$R_{i,k} = \frac{Best_k(t) - x_{r,k}}{Best_k(t) + \epsilon} \tag{15}$$

In this formulation, $R_{i,k}$ denotes the random deviation term for the ith candidate and the kth variable. It is calculated as the difference between the current best solution $Best_k(t)$ and a randomly chosen solution $x_{r,k}$, normalized by the best solution plus a small constant $\epsilon$ to avoid division by zero. This mechanism injects controlled randomness into the update process, ensuring exploration of new regions while maintaining numerical stability.
$$ES(t) = 2\, r_1 \left(1 - \frac{t}{T}\right) \tag{16}$$

The environmental selection factor, ES(t), is designed to balance exploration and exploitation in the optimization process, as presented in Fig. 5a. It introduces a degree of randomness while gradually adjusting its influence over time. Since $r_1$ is a random number in [-1, 1], ES(t) varies between -2 and 2 and its magnitude shrinks as the iteration count t approaches T, allowing the algorithm to focus more on exploitation in later stages, as computed in Eq. (16).

Fig. 5.

RSA algorithm flowchart split into two parts: (a) initialization and fitness evaluation, (b) strategy application and iteration.

Here, $r_1$ injects stochastic behavior, and the term $(1 - t/T)$ ensures that the impact of ES(t) diminishes over iterations. This mechanism helps the algorithm explore new regions initially while gradually stabilizing towards convergence.

The term $P_{i,k}$ represents the normalized perturbation for the ith candidate and the kth variable. It is designed to scale the deviation of the candidate solution from its mean relative to the range of the variable and the current best solution. The constant $\alpha$ provides a baseline bias, while $\epsilon$ prevents division by zero, ensuring numerical stability. Intuitively, $P_{i,k}$ allows larger adjustments when a candidate is far from the mean and smaller adjustments when it is close, helping the algorithm balance exploration and exploitation:

$$P_{i,k} = \alpha + \frac{x_{i,k} - M(x_i)}{Best_k(t)\,(UB_k - LB_k) + \epsilon} \tag{17}$$

Here, $x_{i,k}$ is the current value of the variable, $M(x_i)$ is the mean of the ith candidate, $Best_k(t)$ is the best solution found for the kth variable, and $UB_k$ and $LB_k$ are the upper and lower bounds of the variable, respectively, as determined in Eq. (17).

The term $M(x_i)$ represents the mean value of all variables for the ith candidate. Intuitively, it provides a central reference point for that candidate’s position in the solution space, helping to assess how each individual variable deviates from the candidate’s overall average:

$$M(x_i) = \frac{1}{D}\sum_{k=1}^{D} x_{i,k} \tag{18}$$

Here, $x_{i,k}$ is the value of the kth variable of the ith candidate, and D is the total number of variables, as computed in Eq. (18). By computing this mean, the algorithm can normalize deviations and maintain stability during the optimization process.
Hunting phase: The hunting mechanism employs strategic movements to refine the positions of candidate solutions, enhancing their fitness for prediction accuracy. During the social sharing phase, candidate solutions exchange information to navigate the search process effectively, promoting a balance between exploration and exploitation of the search space.

Encircling, hunting, and social sharing are repeated until a predefined stopping criterion is achieved, such as reaching a maximum of 100 iterations. The foraging process encompasses two distinct activities: hunting coordination and hunting cooperation. These represent focused strategies to refine the exploitation search, as defined by Eq. (19). Hunting coordination is carried out when $\frac{2T}{4} < t \le \frac{3T}{4}$, whereas hunting cooperation takes place when $\frac{3T}{4} < t \le T$.

The update of the variable $x_{i,k}$ depends on the current iteration t and is designed to balance exploration and exploitation dynamically throughout the optimization process. In the early and middle stages, the algorithm applies different strategies to either explore new regions or refine existing solutions. Randomness is incorporated through rand to avoid premature convergence, while the terms $P_{i,k}$, $\eta_{i,k}$, and $R_{i,k}$ control the step size and direction based on the candidate’s relation to the best solution.

$$x_{i,k}(t+1) = \begin{cases} Best_k(t)\cdot P_{i,k}(t)\cdot rand, & \frac{2T}{4} < t \le \frac{3T}{4} \\[4pt] Best_k(t) - \eta_{i,k}(t)\,\epsilon - R_{i,k}(t)\cdot rand, & \frac{3T}{4} < t \le T \end{cases} \tag{19}$$

Here, $Best_k(t)$ is the best solution found so far for the kth variable, $P_{i,k}$ is the normalized perturbation factor, $\eta_{i,k}$ is the control factor for candidate i and variable k, $R_{i,k}$ is the random deviation term, rand introduces stochasticity, and $\epsilon$ is a small constant that ensures numerical stability. Intuitively, in the earlier phases the algorithm emphasizes exploration (larger random steps), while in later iterations the update focuses more on fine-tuning around the best solution for convergence.

The search space broadens around the chosen solution during exploration ($t \le \frac{2T}{4}$) and transitions toward convergence near the optimal solution during exploitation ($t > \frac{2T}{4}$). During the exploration stage, high walking is applied for $t \le \frac{T}{4}$, while belly walking is implemented for $\frac{T}{4} < t \le \frac{2T}{4}$. For exploitation, the hunting coordination mechanism operates for $\frac{2T}{4} < t \le \frac{3T}{4}$, and the hunting cooperation mechanism for $\frac{3T}{4} < t \le T$. Once the RSA satisfies its termination criterion, the process concludes. The flowchart illustrating the RSA procedure is presented in Fig. 5b. The algorithm has been applied to both models (LSTM and CNN), ultimately producing optimised output features for each. A compact implementation sketch is given after Algorithm 2.
Algorithm 2.
Reptile Search Algorithm (RSA) for Feature Optimisation
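For concreteness, a compact, self-contained sketch of the RSA updates (Eqs. 12-19) follows. It assumes scalar bounds and a user-supplied fitness function; in this work the fitness would score candidate feature weights, and the parameter names (alpha, beta, pop, T) follow the equations rather than any released code.

```python
import numpy as np

def rsa_optimize(fitness, dim, lb, ub, pop=30, T=100, alpha=0.1, beta=0.1):
    """Minimise `fitness` with the RSA updates of Eqs. (12)-(19).

    Scalar bounds lb/ub are assumed for brevity; `fitness` maps a
    1D candidate vector to a score (lower is better).
    """
    eps = 1e-10
    X = lb + np.random.rand(pop, dim) * (ub - lb)          # Eq. (12)
    fit = np.array([fitness(x) for x in X])
    best = X[fit.argmin()].copy()
    best_fit = fit.min()

    for t in range(1, T + 1):
        ES = 2.0 * np.random.uniform(-1, 1) * (1 - t / T)  # Eq. (16)
        for i in range(pop):
            r = np.random.randint(pop)                      # random candidate
            M = X[i].mean()                                 # Eq. (18)
            for k in range(dim):
                P = alpha + (X[i, k] - M) / (best[k] * (ub - lb) + eps)  # Eq. (17)
                eta = best[k] * P                           # Eq. (14)
                R = (best[k] - X[r, k]) / (best[k] + eps)   # Eq. (15)
                if t <= T / 4:                # exploration: high walking
                    X[i, k] = best[k] - eta * beta - R * np.random.rand()
                elif t <= T / 2:              # exploration: belly walking
                    X[i, k] = best[k] * X[r, k] * ES * np.random.rand()
                elif t <= 3 * T / 4:          # exploitation: hunting coordination
                    X[i, k] = best[k] * P * np.random.rand()
                else:                         # exploitation: hunting cooperation
                    X[i, k] = best[k] - eta * eps - R * np.random.rand()
            X[i] = np.clip(X[i], lb, ub)
            fit[i] = fitness(X[i])
            if fit[i] < best_fit:             # track the global best
                best_fit, best = fit[i], X[i].copy()
    return best, best_fit
```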
Combine features: After applying RSA to both models individually, the optimised features for each pollutant are combined to merge the strengths of the LSTM and CNN models. The selected features for each pollutant in the LSTM model are merged into a single feature vector, and likewise for the CNN model. Finally, as shown in Fig. 2, the two vectors are horizontally stacked into the final feature vector. This approach takes advantage of the unique insights captured by each model, which can lead to improved predictive power. Existing models like standalone LSTMs or CNNs often fail at long-term horizons (10 days) due to the accumulation of errors in recursive prediction settings and their inability to capture both spatial features (short-term local patterns) and long-term temporal dependencies simultaneously. Our hybrid model explicitly addresses this by using the CNN to extract salient spatial features from the input sequence and the LSTM to model long-term temporal dynamics, creating a richer feature set for the final predictor.

10-day forecasting method: The model is trained to predict the concentration for the next day (t+1) given a sequence of the previous n days. To generate a 10-day forecast, we operate in an autoregressive manner: the predicted value for t+1 is fed back as input to predict t+2, and this process is repeated iteratively to reach the 10-day horizon. While efficient, this method is susceptible to error propagation over long horizons, as shown in the sketch below.
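A sketch of the feature fusion and the autoregressive loop just described; `lstm_selected` and `cnn_selected` are illustrative names for the RSA-selected feature sets, and `predict_fn` stands in for the full pipeline (feature extraction, RSA-selected features, XGBoost):

```python
import numpy as np

# Fuse the RSA-selected LSTM and CNN features (Fig. 2); names illustrative.
final_features = np.hstack([lstm_selected, cnn_selected])

def forecast_10_days(predict_fn, history, horizon=10):
    """Autoregressive multi-step forecast: each prediction is fed back in.

    `predict_fn` maps a (w, n_features) window to the next day's pollutant
    vector; `history` holds the most recent w scaled observations.
    """
    window = history.copy()
    preds = []
    for _ in range(horizon):
        next_day = predict_fn(window)            # predict day t+1
        preds.append(next_day)
        # Slide the window: drop the oldest day, append the prediction.
        window = np.vstack([window[1:], next_day])
    return np.array(preds)
```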
- Ensemble learning through XGBoost: XGBoost37 leverages the boosting model introduced by38. Normalisation within the objective function simplifies the model, prevents overfitting, and expedites learning. XGBoost is an ensemble model that effectively integrates decision trees, leading to a combined model with superior predictive performance compared to individual methods. The output is calculated from Eq. (20):

$$\hat{y}_i = \sum_{t=1}^{T} f_t(x_i) \tag{20}$$

where $f_t$ represents the tth generated tree model and T indicates the total count of tree models. In XGBoost, tuning different parameters is essential for enhancing model performance and addressing overfitting concerns. XGBoost takes in the input features X and target values y; here, the final feature vector serves as the input features. An XGBRegressor with 100 estimators and a learning rate of 0.1 is used, with a fixed random state for reproducible model initialisation. It fits the model to the data, learning the relationships between the final feature vector and y. This integrates seamlessly with the combined feature set, leveraging the predictive power of XGBoost to forecast pollutant levels, as shown in Fig. 2.
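A minimal sketch of this final stage, continuing the earlier sketches (`final_features` and `y_train` as defined there); fitting one regressor per pollutant is our reading of the multivariate setup:

```python
from xgboost import XGBRegressor

# Configuration from the text: 100 trees, learning rate 0.1, squared-error
# objective, fixed seed for reproducibility.
xgb_model = XGBRegressor(
    n_estimators=100,
    learning_rate=0.1,
    objective="reg:squarederror",
    random_state=42,
)

# One regressor per pollutant target; column 0 shown for illustration.
xgb_model.fit(final_features, y_train[:, 0])
importance = xgb_model.feature_importances_   # per-feature contribution scores
```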
Results
Comparative evaluation findings: The subsequent step in executing the hybrid approach involves constructing a neural network model that integrates the RSA optimiser and XGBoost for the prediction task. This study developed a model to exploit the strong correlation between historical and future pollutant concentrations. To evaluate the predictive performance of the proposed method against other approaches, six benchmark models are constructed: TST (Time Series Transformer), CNN, BiLSTM, BiRNN, ANN, and BiGRU. These models are employed for forecasting pollutant concentrations. The benchmarks are univariate, meaning they take past observations of the target pollutant’s concentration as inputs and output the current concentration of the respective pollutant. The proposed approach’s performance is compared with these models using four evaluation metrics, as described below:
- (a) R²: A statistical metric that measures the degree to which the variance in the dependent variable is explained or predicted by the independent variable:

$$R^2 = 1 - \frac{RSS}{TSS} \tag{21}$$

where $R^2$ denotes the coefficient of determination, RSS signifies the residual sum of squares, and TSS represents the total sum of squares, as outlined in Eq. (21).

- (b) MAE: Assesses the average of the absolute differences between actual observations and predicted values. The formula for calculating MAE is as follows:

$$MAE = \frac{1}{n}\sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert \tag{22}$$

where $y_i$ represents the actual pollutant concentration and $\hat{y}_i$ is the predicted pollutant concentration in Eq. (22).

- (c) MAPE: Quantifies the percentage deviation between predicted values and actual observations. It is calculated by Eq. (23):

$$MAPE = \frac{100}{n}\sum_{i=1}^{n} \left\lvert \frac{y_i - \hat{y}_i}{y_i} \right\rvert \tag{23}$$

- (d) MSE: Quantifies the average squared disparity between the observed values and the values predicted by the model. Its computation is expressed in Eq. (24):

$$MSE = \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \tag{24}$$
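For reference, the four metrics map directly onto scikit-learn, as in the sketch below (note that `mean_absolute_percentage_error` returns a fraction, hence the factor of 100):

```python
from sklearn.metrics import (mean_absolute_error,
                             mean_absolute_percentage_error,
                             mean_squared_error, r2_score)

def evaluate(y_true, y_pred):
    """Return the four reported metrics (Eqs. 21-24); MAPE is in percent."""
    return {
        "R2": r2_score(y_true, y_pred),
        "MAE": mean_absolute_error(y_true, y_pred),
        "MAPE%": 100 * mean_absolute_percentage_error(y_true, y_pred),
        "MSE": mean_squared_error(y_true, y_pred),
    }
```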
Tables 4 and 5 present the evaluation results for both the proposed and benchmark approaches, detailing the prediction outcomes for all four target pollutants. Notably, the hybrid approach surpassed all current benchmark methods, more accurately detecting fluctuations in pollutant concentrations and producing fewer prediction errors.
Table 4.
Performance comparison of the proposed approach and existing methods using various popular metrics.
| Model | R² | MAE | MAPE% | MSE |
|---|---|---|---|---|
| Transformer25 | 0.6948 | 0.0323 | 36.7777 | 0.0031 |
| CNN39 | 0.6143 | 0.0388 | 48.9230 | 0.0042 |
| BiLSTM23 | 0.7550 | 0.0298 | 39.0969 | 0.0026 |
| BiRNN26 | 0.7483 | 0.0297 | 35.3872 | 0.0026 |
| ANN40 | 0.7492 | 0.0310 | 30.877 | 0.0025 |
| BiGRU41 | 0.7509 | 0.0295 | 35.4085 | 0.0025 |
| SVR42 | 0.5777 | 0.0423 | 68.3424 | 0.0065 |
| RFR43 | 0.7501 | 0.0378 | 29.2999 | 0.0039 |
| KNN | 0.7427 | 0.0363 | 27.8318 | 0.0045 |
| GBM24 | 0.7671 | 0.0363 | 27.2358 | 0.0036 |
| Our approach | 0.9481 | 0.0163 | 20.1371 | 0.0005 |
Table 5.
Evaluation of the proposed approach’s performance using several popular metrics for all four pollutants.
PM₂.₅:

| Model | R² | MAE | MAPE | MSE |
|---|---|---|---|---|
| TST | 0.8134 | 0.0197 | 30.6327 | 0.0013 |
| CNN | 0.7590 | 0.0238 | 42.3350 | 0.0017 |
| BiLSTM | 0.8004 | 0.0177 | 30.7122 | 0.0007 |
| BiRNN | 0.6231 | 0.3536 | 43.4619 | 0.1899 |
| ANN | 0.7892 | 0.0219 | 34.5608 | 0.0014 |
| BiGRU | 0.7024 | 0.0347 | 34.5608 | 0.0021 |
| GBM | 0.4695 | 0.0184 | 35.8107 | 0.0010 |
| KNN | 0.7427 | 0.0148 | 27.8318 | 0.0011 |
| Our | 0.9493 | 0.0131 | 22.0556 | 0.0003 |

CO:

| Model | R² | MAE | MAPE | MSE |
|---|---|---|---|---|
| TST | 0.9002 | 0.0149 | 64.7483 | 0.0009 |
| CNN | 0.9160 | 0.0148 | 54.1940 | 0.0008 |
| BiLSTM | 0.9185 | 0.1840 | 34.0113 | 0.1615 |
| BiRNN | 0.8143 | 0.2614 | 38.5783 | 0.3955 |
| ANN | 0.9149 | 0.0118 | 27.7484 | 0.0007 |
| BiGRU | 0.8601 | 0.3051 | 71.0872 | 0.3102 |
| GBM | 0.6777 | 0.0090 | 18.4490 | 0.0002 |
| KNN | 0.7227 | 0.0165 | 25.8318 | 0.0023 |
| Our | 0.9812 | 0.0078 | 22.0159 | 0.0001 |

SO₂:

| Model | R² | MAE | MAPE | MSE |
|---|---|---|---|---|
| TST | 0.6412 | 0.0392 | 29.0025 | 0.0049 |
| CNN | 0.5990 | 0.0433 | 34.9548 | 0.0053 |
| BiLSTM | 0.5812 | 0.9571 | 31.6072 | 0.0037 |
| BiRNN | 0.6457 | 0.2840 | 28.6953 | 0.0806 |
| ANN | 0.5382 | 0.0406 | 33.4126 | 0.0045 |
| BiGRU | 0.5601 | 0.3052 | 31.3664 | 0.0734 |
| GBM | 0.4919 | 0.0538 | 42.7643 | 0.0070 |
| KNN | 0.7247 | 0.0178 | 26.8318 | 0.0011 |
| Our | 0.8323 | 0.0295 | 26.6894 | 0.0016 |

NO₂:

| Model | R² | MAE | MAPE | MSE |
|---|---|---|---|---|
| TST | 0.8263 | 0.0452 | 31.6704 | 0.0019 |
| CNN | 0.7684 | 0.0437 | 33.3147 | 0.0041 |
| BiLSTM | 0.7909 | 0.0378 | 39.3648 | 0.0026 |
| BiRNN | 0.7837 | 0.0564 | 24.8869 | 0.7389 |
| ANN | 0.7855 | 0.0414 | 27.7484 | 0.0039 |
| BiGRU | 0.7485 | 0.0425 | 29.9723 | 0.2167 |
| GBM | 0.5365 | 0.0553 | 27.6253 | 0.0063 |
| KNN | 0.7787 | 0.0165 | 25.7318 | 0.0017 |
| Our | 0.8954 | 0.0316 | 20.3606 | 0.0020 |
Graphical Representation of Results:
This part emphasises the prediction results of the proposed approach by comparing predicted values with actual observations through plots.
Figure 6 illustrates the prediction plots for the four pollutants analysed in this research. The x-axis corresponds to the number of observations, while the y-axis represents the normalised values of the pollutants. Actual observations are depicted in blue, and predicted observations are in red. The proposed method accurately captures the non-linearity and variations in pollutant values, enhancing prediction accuracy. The proposed approach performs well in predicting pollutant value patterns. This high-efficiency level is also evident in the prediction for the next three pollutants.
Fig. 6.
Prediction results of pollutants (PM₂.₅, CO, SO₂, NO₂) using the proposed hybrid method.
The box plot analysis reveals a strong alignment between the model’s predictions and the observed values in terms of central tendency, variability, and distribution. In this study, the proposed method was utilised to generate box plots based on the actual and predicted values of the four pollutants, as illustrated in Fig. 7. A few outliers in the predictions indicate potential areas for further investigation and improvement. Overall, the model shows strong performance in capturing the central tendency and variability of the actual values, making it a reliable tool for prediction in this context.
Fig. 7.
Model performance on actual and predicted observation values of pollutants (PM₂.₅, CO, SO₂, NO₂).
In Fig. 8, the x-axis represents samples/observations, and the y-axis represents normalised pollutant values. The training observations are represented using the blue colour, the true observations are represented using the green colour, and the predicted values on the true dataset are red. The proposed method accurately captures the non-linearity and variations in pollutant values, enhancing the precision of the prediction.
Fig. 8.
Prediction results on the train-test split of pollutants (PM₂.₅, CO, SO₂, NO₂) using the proposed hybrid approach.
Figure 9 offers a visual representation of pollutant concentration comparisons across a 10-day window, encompassing the last 10 days of observed data and the next 10 days of forecasted values for the four pollutants (multi-step forecasting approach). The x-axis denotes the days within this time frame, while the y-axis plots pollutant concentrations. The previous 10 days of actual observations are illustrated with a blue line, spanning x-values from -10 to 0. A red vertical dashed line at x = 0 marks the present day, acting as a separator between past data and future projections. Forecasted data for the next 10 days is depicted using orange peaks, mapped to x-values between 0 and 10. This visualisation highlights the ability to effectively contrast observed and predicted pollutant trends, showcasing the proposed method’s capacity to capture patterns and variations across multiple pollutants.
Fig. 9.
Prediction results for pollutants 10 days in advance (PM₂.₅, CO, SO₂, NO₂) using the proposed hybrid approach.
Figure 10 presents scatterplots that compare the proposed model’s actual and predicted pollutant values. On the x-axis, the actual values are plotted, while the y-axis represents the predicted values. These scatterplots provide a clear visual representation of how closely the predicted values align with the observed data. Points near the red diagonal line, defined by y = x, signify accurate predictions where the predicted and actual values are identical. The red diagonal line serves as a reference, highlighting perfect predictions. Data points clustered around this line indicate minimal prediction errors, showcasing the model’s efficiency.
Fig. 10.
Scatterplots of actual and predicted values of pollutants (PM₂.₅, CO, SO₂, NO₂).
These plots uncover systematic patterns or prediction biases, such as consistent overestimation or underestimation within specific data ranges. Outliers could signify unusual pollutant concentration events or potential errors in the dataset or model, and a wider spread indicates greater variability in predictions. In areas with high pollution, PM₂.₅ levels tend to exhibit less deviation from the diagonal line at elevated concentrations. The scatterplot for CO displays points closely clustered around the diagonal, suggesting lower variability than other pollutants; CO level variations might be attributed to extreme weather conditions or unusual emission events. Predictions for SO₂ often show more outliers, reflecting significant data variability due to localised pollution sources. Meanwhile, the scatterplot for NO₂ reveals a broader prediction spread, indicating potential difficulties in accounting for temporal variations.
Statistical analysis: The pairwise t-test results comparing the hybrid model with the baseline models are shown in Table 6. All p-values are below 0.01, meaning the hybrid model’s improvements over each baseline are statistically significant.
Table 6.
Pairwise t-test results comparing the Hybrid model with baseline models.
| Comparison | t-Statistic | p-value |
|---|---|---|
| Hybrid vs Transformer | 3.89 | 0.0012 |
| Hybrid vs CNN | 4.23 | 0.0008 |
| Hybrid vs LSTM | 2.98 | 0.0075 |
| Hybrid vs RNN | 5.45 | < 0.0001 |
| Hybrid vs ANN | 3.76 | 0.0011 |
| Hybrid vs GRU | 3.25 | 0.0042 |
| Hybrid vs SVR | 6.78 | < 0.0001 |
| Hybrid vs RFR | 4.89 | 0.0003 |
| Hybrid vs KNN | 7.32 | < 0.0001 |
| Hybrid vs GBM | 4.12 | 0.0009 |
Robustness Analysis: Tables 7, 8 and 9 present robustness analysis across seasons, cities, or noisy inputs with other models, respectively.
Table 7.
Seasonal performance of the Hybrid model.
| Season | R² Score | MAE | MAPE |
|---|---|---|---|
| Winter | 0.861 | 0.012 | 38.2% |
| Monsoon | 0.823 | 0.015 | 45.7% |
| Summer | 0.798 | 0.017 | 52.3% |
| Autumn | 0.812 | 0.016 | 48.9% |
Table 8.
City-wise model performance comparison.
| Model | Kolkata | Mumbai | Brajrajnagar | Delhi | Guwahati |
|---|---|---|---|---|---|
| Hybrid | 0.845 | 0.832 | 0.818 | 0.801 | 0.795 |
| Transformer | 0.812 | 0.806 | 0.791 | 0.774 | 0.768 |
| CNN | 0.798 | 0.789 | 0.775 | 0.761 | 0.754 |
| LSTM | 0.820 | 0.815 | 0.803 | 0.788 | 0.781 |
| RNN | 0.785 | 0.772 | 0.768 | 0.752 | 0.745 |
| ANN | 0.801 | 0.794 | 0.786 | 0.771 | 0.764 |
| GRU | 0.815 | 0.808 | 0.795 | 0.782 | 0.776 |
| SVR | 0.763 | 0.751 | 0.742 | 0.728 | 0.719 |
| RFR | 0.791 | 0.783 | 0.776 | 0.759 | 0.751 |
| KNN | 0.745 | 0.732 | 0.721 | 0.708 | 0.698 |
| GBM | 0.809 | 0.798 | 0.789 | 0.775 | 0.769 |
Table 9.
Noise robustness analysis.
| Model | 0.00 (Clean) | 0.01 | 0.05 | 0.10 |
|---|---|---|---|---|
| Hybrid | 0.845 | 0.832 | 0.801 | 0.763 |
| Transformer | 0.812 | 0.794 | 0.752 | 0.698 |
| LSTM | 0.820 | 0.805 | 0.768 | 0.715 |
| CNN | 0.798 | 0.781 | 0.739 | 0.682 |
| Performance Degradation | Baseline | -1.5% | -5.2% | -9.7% |
Performance Comparisons: Table 10 presents performance comparisons with three hybrid models optimized with different meta-heuristics: GA (Genetic Algorithm), PSO (Particle Swarm Optimization), and RSA (Reptile Search Algorithm). RSA outperforms GA and PSO across all metrics.
Discussion: The proposed hybrid model’s superior performance arises from its ability to integrate spatial and temporal representations with optimised feature selection, reducing error accumulation in long-term forecasts. This aligns with air pollution theory, where pollutant dynamics are influenced by nonlinear meteorology-emission interactions. Compared to existing CNN-LSTM or statistical baselines, our approach uniquely sustains predictive accuracy over extended horizons, addressing an unresolved gap.
The assumptions and limitations are mentioned below:
Assumptions: Future pollutant levels are primarily driven by recent observations; seasonal and cross-pollutant dependencies remain stable and can be learned; cleaned datasets are sufficiently representative; and LSTMs can manage mild non-stationarity through contextual windows.
Limitations: The hybrid/Transformer-based pipeline is computationally intensive; city-wise NaN dropping may lead to substantial data loss and biased samples (violating MCAR); reliance on simplistic cleaning rather than imputation reduces robustness; and the models remain purely autoregressive, excluding exogenous factors such as weather conditions, holidays, and traffic flows.
CNN/ANN are lightest (linear in T), (Bi-)LSTM/GRU are mid-cost (per-step recurrence), and Transformers become expensive for long sequences due to O(T²) attention. The hybrid (CNN + LSTM + XGBoost) is the heaviest because it adds all three costs, but it wins on accuracy and 10-day stability: the CNN captures local patterns, the LSTM learns long memory, and XGBoost fits the residual structure.
Table 10.
Performance comparison of the hybrid model with GA-, PSO-, and RSA-based optimisation.
| Model | R² | MAE | MAPE | MSE |
|---|---|---|---|---|
| Hybrid + GA | 0.918 | 0.018 | 21.3 | 0.0006 |
| Hybrid + PSO | 0.929 | 0.017 | 20.5 | 0.00055 |
| Hybrid + RSA | 0.948 | 0.016 | 20.1 | 0.0005 |
Conclusion
This research study introduces a hybrid approach that combines CNN and LSTM architectures, optimised using the RSA, to predict future pollutant concentrations with the help of XGBoost. The RSA is utilised for feature selection, identifying the most impactful features for future pollutant concentration predictions. A multivariate LSTM model and a 1D CNN were developed in this research to estimate future pollutant concentrations. To validate the accuracy and effectiveness of the proposed approach, the prediction results were compared with those of the Time Series Transformer, CNN, BiLSTM, BiRNN, ANN and BiGRU models. The evaluation, using well-known metrics such as R², MAE, MAPE, and MSE, showed that the proposed hybrid approach consistently surpassed the performance of other models. The overall performance of the hybrid model is: R² = 0.9481, MAE = 0.0163, MAPE = 20.1371% and MSE = 0.0005. This confirms the robustness of our feature selection methodology, utilising the optimisation algorithm and ensemble learning. Therefore, the proposed hybrid method effectively selects the optimal features for accurate future concentration predictions, introducing an innovative approach to pollutant forecasting. While the proposed method is computationally demanding, there is potential for future improvements to enhance computational efficiency.
Building on this work, future research can explore federated learning for privacy-preserving multi-city pollutant prediction, edge computing for low-latency, on-device inference, and real-time deployment in smart city monitoring systems. Additionally, integrating transformer architectures and explainable AI can enhance accuracy, interpretability, and scalability, enabling efficient, continuous pollutant forecasting across urban environments.
This study bridges the gap in long-term air pollution forecasting by integrating CNN–LSTM with RSA and XGBoost, ensuring robust accuracy across pollutants. Limitations include computational complexity and high training costs. Future research will examine federated learning, transformers, and edge deployment. Practically, the framework empowers policymakers and urban planners with reliable forecasts to guide sustainable air quality management.
Acknowledgements
We hereby acknowledge the support of the Computer Science Engineering Department, Thapar Institute of Engineering Technology, Patiala, Punjab, for providing the facility.
Author contributions
Priya Kansal: Writing original draft, Validation, Methodology, Conceptualisation. Jatin Bedi: Writing - review & editing, Conceptualization, Validation, Supervision. Sushma Jain: Writing - review & editing, Conceptualization, Validation, Supervision. All authors are fully aware of this manuscript and have permission to submit the manuscript for possible publication.
Funding
Authors declare that no funding has been received to support the work carried out in the current study.
Data availability
The dataset used and analysed in this study can be obtained from the corresponding author upon reasonable request (pkansal_phd22@thapar.edu).
Code availability
The code used and analysed in this study can be obtained from the corresponding author upon reasonable request (pkansal_phd22@thapar.edu).
Declarations
Competing Interests
The authors declare no competing interests.
Ethical approval
All the authors have read, understood, and have complied as applicable with the statement on “Ethical responsibilities of Authors” as found in the Instructions for Authors. They are aware that, with minor exceptions, no changes can be made to authorship once the paper is submitted.
Footnotes
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
- 1.Abbass, K. et al. A review of the global climate change impacts, adaptation, and sustainable mitigation measures. Environ. Sci. Pollut. Res.29, 42539–42559 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 2.Al-Obaidy, A. H., Jasim, I. & AlKubaisi, A.-R. Air pollution effects in some plant leaves morphological and anatomical characteristics within Baghdad City, Iraq. Eng. Technol. J.37(1C), 84–89 (2019). [Google Scholar]
- 3.Sharma, E., Deo, R. C., Prasad, R., Parisi, A. V. & Raj, N. Deep air quality forecasts: Suspended particulate matter modeling with convolutional neural and long short-term memory networks. IEEE Access8, 209503–209516. 10.1109/ACCESS.2020.3039002 (2020). [Google Scholar]
- 4.Bouktif, S., Fiaz, A., Ouni, A. & Serhani, M. A. Multi-sequence LSTM-RNN deep learning and metaheuristics for electric load forecasting. Energies13(2), 391 (2020). [Google Scholar]
- 5.Inam, S. A. et al. PR-FCNN: A data-driven hybrid approach for predicting pm2.5 concentration. Discov. Artif. Intell. 10.1007/s44163-024-00184-7 (2024). [Google Scholar]
- 6.Inam, S. A. et al. A neural network approach to carbon emission prediction in industrial and power sectors. Discov. Appl. Sci.7, 640. 10.1007/s42452-025-07257-x (2025). [Google Scholar]
- 7.Han, C., Park, H., Kim, Y. & Gim, G. Hybrid CNN-LSTM based time series data prediction model study. In (ed. Lee, R.) 43–54 (Springer, 2023). 10.1007/978-3-031-19608-9_4
- 8.Elshewey, A. M. Enhancing crop yield prediction based on dove optimization algorithm and gradient boosting model. SIViP (Signal, Image and Video Processing)19, 951. 10.1007/s11760-025-04545-2 (2025). [Google Scholar]
- 9.Zhang, Z., Johansson, C., Engardt, M., Stafoggia, M. & Ma, X. Improving 3-day deterministic air pollution forecasts using machine learning algorithms. Atmos. Chem. Phys.24, 807. 10.5194/acp-24-807-2024 (2024). [Google Scholar]
- 10.Reichstein, M. Deep learning and process understanding for data-driven earth system science. Nature566, 7743 (2019). [DOI] [PubMed] [Google Scholar]
- 11.Bai, L., Wang, J., Ma, X. & Lu, H. Air pollution forecasts: An overview. Int. J. Environ. Res. Public Health10.3390/ijerph15040780 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12.Liu, B. et al. An attention-based air quality forecasting method. IEEE Int. Conf. Mach. Learn. Appl.17, 728–733 (2018). [Google Scholar]
- 13.Yixuan Zhu, J., Sun, C. & Li, V. An extended spatiotemporal granger causality model for air quality estimation with heterogeneous urban big data. IEEE Trans. Big Data3, 307–319 (2017). [Google Scholar]
- 14.Chauhan, R., Kaur, H. & Alankar, B. Air quality forecast using convolutional neural network for sustainable development in urban environments. Sustain. Cities Soc.75, 103239. 10.1016/j.scs.2021.103239 (2021). [Google Scholar]
- 15.Zhang, Q., Han, Y., Li, V. & Lam, J. Deep-air: A hybrid CNN-LSTM framework for fine-grained air pollution estimation and forecast in metropolitan cities. IEEE Access10, 55818–55841 (2022). [Google Scholar]
- 16.Han, Y., Lam, J. C. K., Li, V. O. K. & Zhang, Q. A domain-specific Bayesian deep-learning approach for air pollution forecast. IEEE Trans. Big Data8, 1034–1046 (2022). [Google Scholar]
- 17.Saenz, T., Fernando, M., Garcia, J. & Munoz, A. Nationwide air pollution forecasting with heterogeneous graph neural networks. ACM Trans. Intell. Syst. Technol.15, 18–11819 (2023). [Google Scholar]
- 18.Chen, B. et al. Geo-STO3Net: A deep neural network integrating geographical spatiotemporal information for surface ozone estimation. IEEE Trans. Geosci. Remote Sens.62, 1–1. 10.1109/TGRS.2024.3358397 (2024). [Google Scholar]
- 19.Mushtaq, Z. et al. Satellite or ground-based measurements for air pollutants (pm2.5, pm10, so2, no2, o3) data and their health hazards: Which is most accurate and why? Environ. Monit. Assess. 10.1007/s10661-024-12462-z (2024). [DOI] [PubMed]
- 20.Yue, X. et al. Airpollutionviz: visual analytics for understanding the spatio-temporal evolution of air pollution. J. Vis.27(2), 215–233. 10.1007/s12650-024-00958-2 (2024). [Google Scholar]
- 21.Wang, L. et al. Short-term pm2.5 prediction based on multi-modal meteorological data for consumer-grade meteorological electronic systems. IEEE Trans. Consum. Electron.70(1), 3464–3474. 10.1109/TCE.2024.3354073 (2024). [Google Scholar]
- 22.Xia, Y. et al. Understanding the disparities of pm2.5 air pollution in urban areas via deep support vector regression. Environ. Sci. Technol.10.1021/acs.est.3c09177 (2024). [DOI] [PubMed] [Google Scholar]
- 23.Wu, Z., Tian, Y., Li, Y., Quan, M. & Liu, J. Prediction of air pollutant concentrations based on the long short-term memory neural network. J. Hazard. Mater.465, 133099. 10.1016/j.jhazmat.2023.133099 (2024). [DOI] [PubMed] [Google Scholar]
- 24.Sharma, M. K. et al. Assessment of fine particulate matter for port city of eastern peninsular India using gradient boosting machine learning model. Atmosphere13, 743. 10.3390/atmos13050743 (2022). [Google Scholar]
- 25.Yu, M., Masrur, A. & Boxe, C. Predicting hourly pm2.5 concentrations in wildfire-prone areas using a spatiotemporal transformer model. Sci. Total Environ.860, 160446. 10.1016/j.scitotenv.2022.160446 (2022). [DOI] [PubMed] [Google Scholar]
- 26.Saravanan, D. & Kumar, S. Improving air pollution detection accuracy and quality monitoring based on bidirectional RNN and the internet of things. Mater. Today Proc.81, 791–796. 10.1016/j.matpr.2021.04.239 (2023). [Google Scholar]
- 27.Central Pollution Control Board (CPCB). Air pollution. https://cpcb.nic.in/air-pollution/ (2020). Accessed June 2024.
- 28.Central Pollution Control Board (CPCB). Real-time air quality data. https://cpcb.nic.in/real-time-air-qulity-data/ (2020). Accessed 1 September 2025.
- 29.Khatibi, V. & Nikpour, P. Advancing multi-pollutant air quality forecasting using transformer-based informer architecture. Earth Sci. Inf.18(1), 287. 10.1007/s12145-025-01722-2 (2025). [Google Scholar]
- 30.Abualigah, L., Abd Elaziz, M., Sumari, P., Zong Woo, G. & Gandomi, A. H. A comprehensive review of the reptile search algorithm: Principles, applications, and future directions. Mathematics13(6), 1001. 10.3390/math13061001 (2025). [Google Scholar]
- 31.Zhou, R. et al. A multi-strategy enhanced reptile search algorithm for global optimization and engineering optimization design problems. J. Netw. Intell.9(3), 93. 10.3390/jni9030093 (2024). [Google Scholar]
- 32.Wu, Y.-X. & Wang, A.-C. An improved reptile search algorithm with novel mean transition mechanism for constrained industrial engineering problems. J. Netw. Intell.9(3), 18. 10.3390/jni9030018 (2024). [Google Scholar]
- 33.Hochreiter, S. & Schmidhuber, J. Long short-term memory. Neural Comput.9(8), 1735–1780. 10.1162/neco.1997.9.8.1735 (1997). [DOI] [PubMed] [Google Scholar]
- 34.LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. Gradient-based learning applied to document recognition. In Proceedings of the IEEE vol. 86 2278–2324 (IEEE, 1998).
- 35.Goodfellow, I., Bengio, Y. & Courville, A. Deep Learning (MIT Press, 2016). [Google Scholar]
- 36.Abualigah, L., Abd Elaziz, M., Sumari, P., Geem, Z. W. & Gandomi, A. H. Reptile search algorithm (RSA): A nature-inspired meta-heuristic optimizer. Expert Syst. Appl.191, 116158. 10.1016/j.eswa.2021.116158 (2022). [Google Scholar]
- 37.Chen, T. & Guestrin, C. XGBoost: A scalable tree boosting system. Preprint at arXiv:1603.02754 (2016).
- 38.Friedman, J. Greedy function approximation: A gradient boosting machine. Ann. Stat.29, 1189–1232. 10.2307/2699986 (2001). [Google Scholar]
- 39.Sarkar, P., Saha, D. & Saha, M. Real-time air quality index detection through regression-based convolutional neural network model on captured images. Environ. Qual. Manag.10.1002/tqem.22276 (2024). [Google Scholar]
- 40.Yadav, V., Yadav, A., Singh, V. & Singh, T. Artificial neural network an innovative approach in air pollutant prediction for environmental applications: A review. Results Eng.10.1016/j.rineng.2024.102305 (2024). [Google Scholar]
- 41.Subbiah, S., Paramasivan, S., & Thangavel, M. Prediction of particulate matter pm2.5 using bidirectional gated recurrent unit with feature selection. Global NEST J.26 (2024).
- 42.Sánchez, A. S., Nieto, P. J. G., Fernández, P. R., Coz Díaz, J. J. & Iglesias-Rodríguez, F. J. Application of an SVM-based regression model to the air quality study at local scale in the avilés urban area (Spain). Math. Comput. Model.54, 1453–1466 (2011). [Google Scholar]
- 43.Ding, W. & Qie, X. Prediction of air pollutant concentrations via random forest regressor coupled with uncertainty analysis: A case study in Ningxia. Atmosphere13, 960. 10.3390/atmos13060960 (2022). [Google Scholar]
- 44.Fortunato, S. Community detection in graphs. Phys. Rep.486, 75–174. 10.1016/j.physrep.2009.11.002 (2010). [Google Scholar]
- 45.Borah, J. et al. Aicareair: Hybrid-ensemble internet-of-things sensing unit model for air pollutant control. IEEE Sens. J.24(13), 21558–21565. 10.1109/JSEN.2024.3397735 (2024). [Google Scholar]
- 46.Singh, S., Sharma, G. D., Singh Parihar, J. & Dev, D. Nexus between environmental degradation and climate change during the times of global conflict: Evidence from cs-ardl model. Environ. Sustain. Indic.22, 100368 (2024). [Google Scholar]
- 47.Dey, S. Apict: Air pollution epidemiology using green AQI prediction during winter seasons in India. IEEE Trans. Sustain. Comput.14 (2021).
- 48.Ramachandran, A. World air quality index by city and coordinates. https://www.kaggle.com/datasets/world-air-quality-index-by-city-and-coordinates (2023). Accessed June 2024.
- 49.Géron, A. Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow 2nd edn (O'Reilly Media, 2019).
- 50.Dosovitskiy, A. et al. An image is worth 16x16 words: Transformers for image recognition at scale. Preprint at arXiv:2010.11929 (2020).
- 51.Mo, H., Sun, H., Liu, J. & Wei, S. Developing window behavior models for residential buildings using xgboost algorithm. Energy Build.10.1016/j.enbuild.2019.109564 (2019). [Google Scholar]
- 52.Rumelhart, D., Hinton, G. & Williams, R. Learning internal representations by error propagation. In Parallel Distributed Processing: Explorations in the Microstructure of Cognition Vol. 1 (eds Rumelhart, D. E. & McClelland, J. L.) 318–362 (MIT Press, 1986). [Google Scholar]
- 53.Li, H., Yang, T., Du, Y., Tan, Y. & Wang, Z. Interpreting hourly mass concentrations of pm2.5 chemical components with an optimal deep-learning model. J. Environ. Sci.151, 125–139. 10.1016/j.jes.2024.03.037 (2025). [DOI] [PubMed] [Google Scholar]
- 54.Shah, P. & Mishra, P. Analytical equations based prediction approach for pm2.5 using ANN. Preprint at arXiv:2002.11416 (2020).
- 55.Sreenivasulu, T. & Rayalu, G. Enhanced pm2.5 prediction in Delhi using a novel STL-CNN-BILSTM-AM hybrid model. Environ. Syst. Res.13(1), 48. 10.1007/s44273-024-00048-7 (2024). [Google Scholar]
- 56.Sidhu, K. K., Balogun, H. & Oseni, K. O. Predictive modelling of air quality index (AQI) across diverse cities and states of India using machine learning: Investigating the influence of Punjab's stubble burning on AQI variability. Preprint at arXiv:2404.08702 (2024).