Scientific Reports. 2025 Nov 7;15:39041. doi: 10.1038/s41598-025-23940-w

A hybrid approach leveraging meta-heuristic and ensemble learning for time-sensitive prediction of pollutant concentrations

Priya Kansal 1, Jatin Bedi 1, Sushma Jain 1
PMCID: PMC12595109  PMID: 41203671

Abstract

Traditional deep learning models such as convolutional neural networks (CNNs), which capture localized features, and long short-term memory networks (LSTMs), which focus on long-term dependencies, often face challenges in achieving higher accuracy for time series prediction tasks. To address this limitation, this study proposes a hybrid deep learning model that integrates CNN, LSTM, the reptile search algorithm (RSA), and eXtreme Gradient Boosting (XGB) for pollutant concentration forecasting. Initially, the raw pollutant concentration data undergoes cleaning and normalization via a Min–Max scaler. The processed sequences are then separately fed into LSTM and CNN models to extract weighted features. RSA is applied to optimize these features, while XGB computes feature importance scores, quantifying the contribution of each selected feature to the predictive performance. The proposed model predicts pollutants such as PM2.5, CO, SO2, and NO2 up to ten days in advance for urban Indian settings. Comparative evaluations against benchmark models—including Transformer, CNN, BiLSTM, BiRNN, ANN, and BiGRU—demonstrate that the hybrid approach yields consistently superior accuracy and robustness. The hybrid model achieves substantially lower errors and higher R² scores across all pollutants, validating its reliability for long-horizon air quality forecasting.

Keywords: Hybrid model, Ensemble learning, Meta-heuristic optimisation, Air pollutants

Subject terms: Environmental sciences, Mathematics and computing

Introduction

Over the past decades, air pollution has emerged as a pressing global issue1. Research extensively highlights its profound effects on both human health and vegetation, demonstrating that prolonged exposure significantly harms plant foliage2. The primary contaminants predominantly originate from stationary sources, including dust, PM10 particles (with diameters less than 10 μm) and PM2.5 particles (with diameters less than 2.5 μm). PM2.5 particles are particularly hazardous as they stem from unburned fuel and industrial byproducts.

Other major pollutants include sulfur dioxide (SO2), which is emitted during the combustion of fuels; nitrogen oxides (NOx), formed when nitrogen and oxygen react under high-temperature conditions; carbon monoxide (CO), a byproduct of incomplete combustion; and ozone (O3), which results from photochemical reactions3. Figure 1 illustrates these major sources of air pollution.

Fig. 1. Major sources of air pollution.

Forecasting air pollution levels with precision is crucial for promoting partnerships with governmental bodies and raising public awareness about its risks. Air pollution data are often characterised by rising or falling trends, seasonality (variability over specific periods), cycles (fluctuations without a fixed time pattern) and erratic movements4.

Deep learning architectures are highly effective in addressing air pollution prediction problems. They excel at handling the non-linear, cyclical, seasonal, and sequential dependencies present in pollutant data, making them a robust solution for such complex scenarios, as reported in5 and6.

Training deep learning models demands substantial computational resources and time. The selection of hyperparameters plays a crucial role in the model’s training process, directly influencing its computational efficiency, susceptibility to overfitting, and the accuracy of the final model.

Hyperparameter tuning is a challenging and time-intensive process that aims to identify the optimal configuration, invariably involving some margin of error. Crucial factors to evaluate involve deciding on the number of layers, specifying the quantity of cells in each layer, selecting the units, determining the batch size, opting for suitable activation functions, and configuring related settings.

To address the challenges of air pollution prediction, this study adopts a hybrid deep learning model that combines Long Short-Term Memory (LSTM) networks with Convolutional Neural Networks (CNN). Traditional models often struggle either with capturing long-term dependencies or with efficiently detecting local variations in time-series data. The LSTM component overcomes this by preserving contextual information across extended sequences, while the CNN component effectively extracts localized temporal patterns and short-term fluctuations. By integrating both, the hybrid approach not only reduces overfitting but also provides a more comprehensive representation of the data, thereby surpassing the limitations of single-model architectures and improving predictive accuracy for air pollution forecasting.

Combining LSTM and CNN helps mitigate overfitting by providing a more robust representation of the data7. Meta-heuristic methods can additionally be used to locate near-optimal solutions8 within vast search spaces, thereby contributing to developing a highly effective model to predict air pollution.

Considering the aforementioned information, the novelties and key contributions of this study are highlighted as follows.

  • The present study proposes hybrid models that integrate CNN and LSTM with XGB, alongside the RSA optimisation algorithm, to predict pollutants (PM2.5, CO, O3, NO2).

  • This research leverages a meta-heuristic optimisation technique, specifically the RSA, to minimise computational complexity and enhance training efficiency.

  • Current methodologies in this field primarily concentrate on short-term predictions, typically covering 24 to 48 hours. Only a few studies extend their forecasts to 3 or 4 days9. This innovative approach can make predictions up to 10 days in advance.

Related works

Air quality models can generally be classified into physical modelling and machine learning. Physical modelling is theory-driven, relying on mathematical and physical principles to simulate the behaviour of pollutants. In contrast, machine learning is data-driven, using large datasets to develop algorithms that predict air quality based on observed data patterns and relationships10. Physical modelling is frequently realised in systems such as atmospheric dispersion models (ADMs), the Weather Research and Forecasting model coupled with Chemistry (WRF-Chem), and the Community Multiscale Air Quality Modelling System (CMAQ). ADMs focus on atmospheric dispersion modelling, WRF-Chem integrates weather forecasting with chemical processes, and CMAQ combines chemical transport modelling with meteorological data11. However, these methods often encounter limitations due to their high computational costs, the intricate nature of modelling chemical processes, and the inherent uncertainties in emission inventories12. Conversely, air pollution modelling can also utilise historical data to discern statistical patterns and their correlations with urban proxy variables, such as meteorological conditions and traffic flow. Linear models like Autoregressive Integrated Moving Average (ARIMA) or machine learning methods such as Support Vector Regression (SVR) and Artificial Neural Networks (ANN) can be utilised effectively for modelling non-linear dynamics in high-dimensional environments. Recent progress in data-driven techniques has demonstrated significant potential in estimating and forecasting air pollution using comprehensive urban datasets13. Deep learning techniques, such as Recurrent Neural Networks (RNN) and their variants like LSTM and Gated Recurrent Unit (GRU) models, have excelled in various time series prediction tasks, including air pollution forecasting. Similarly, CNNs can be employed on air quality datasets to identify patterns for future prediction models14.
A combined CNN-LSTM architecture is utilised to capture spatial and temporal features, with 1D convolution layers employed to improve the learning of interactions between these dimensions15. An innovative method integrates deep learning with domain-specific models to predict long-term air pollution trends in China and the United Kingdom. This method incorporates domain-specific knowledge by using the statistically significant relationship between PM2.5 and PM10 as a regularisation term16. Graph Neural Networks (GNNs) often struggle to capture various spatial and feature-based contextual factors. This issue has been addressed by a new GNN framework that successfully captures the similarities among stations by considering land use at their locations and their primary sources of pollution17. In response to the increasing surface ozone (O3) pollution in urban areas across China, a new deep neural network (DNN) model, Geo-STO3Net, was developed. This model integrates nearby geographical spatiotemporal information using comprehensive meteorological data and satellite observations for surface O3 estimation18. Between October 4, 2021, and December 26, 2021, a 12-week investigation into ambient air quality parameters (PM2.5, PM10, SO2, NO2, and O3) was conducted across four sampling sites in the Delhi-NCR region. The investigation revealed that measurements of PM2.5 obtained through ground-based instruments exceeded those recorded via satellite monitoring. Conversely, satellite observations indicated elevated average levels of SO2 and NO2 compared to other pollutants19. Spatio-temporal evolution analysis is a key area of air pollution research. AirPollutionViz is a visual analytics system designed to facilitate the analysis of spatio-temporal evolution through sequence mining and clustering analysis20.
An IoT-enabled system has significantly enhanced air quality monitoring and prediction, focusing on PM concentration monitoring across edge devices and cloud platforms. The system employs an advanced WVPBL approach that integrates wavelet denoising, principal component analysis (PCA), and variational mode decomposition. This combination facilitates the extraction of features from multi-modal air quality data, thereby ensuring precise short-term predictions of PM2.5 concentrations21.

Most of the discussed studies emphasise short-term predictions, often limited to 24–48 hours. However, these models typically underperform when extended to longer-term forecasting due to issues such as error accumulation, sensitivity to dynamic meteorological conditions, and the inability to capture evolving pollutant–meteorology interactions. These shortcomings underscore the necessity of developing a hybrid approach that integrates spatial, temporal, and domain-specific factors for robust long-term air pollution forecasting.

Although prior studies employing CNNs, LSTMs, and other deep learning models demonstrate strong short-term forecasting performance, they rarely address longer-term horizons such as 10-day pollutant predictions. These models often suffer from error accumulation, limited spatiotemporal feature extraction, and inadequate feature optimization strategies. Our study addresses this gap by proposing a hybrid CNN–LSTM–RSA–XGBoost architecture, explicitly designed for robust 10-day forecasts. This integration simultaneously captures local and long-term dependencies, optimizes feature selection, and ensures interpretability—areas insufficiently covered in existing CPCB-based research.

The summarised literature review is presented in Table 1.

Table 1.

Related works comparison table.

Papers Parameters Method Findings
14 PM2.5, SO2, PM10, NH3, CO, NO2 and O3 Convolutional Neural Network CNNs are adept at identifying intricate patterns within data.
15 Air pollution, Weather, Traffic, Morphology data Hybrid CNN-LSTM model The integration of CNN with LSTM allows the model to capture spatial and temporal characteristics.
16 Air Pollution and Weather data Bayesian deep-learning model Domain-specific models can better adapt to the specific characteristics of air pollution data.
18 In situ surface O3, TROPOMI data (O3 concentration data), Auxiliary Data Deep neural network (Geo-STO3Net) Combines meteorological data and satellite imagery, capturing complex geographical and spatiotemporal patterns.
20 PM2.5, SO2, PM10, NH3, CO, NO2 and O3, and longitude, latitude Sequence mining and clustering analysis Advanced visualization techniques to effectively display air pollution patterns.
21 PM2.5, SO2, PM10, NH3, CO, NO2 and O3, atmospheric temperature, atmospheric humidity, rainfall, and wind speed Bidirectional long short-term memory network Utilizes sophisticated methods such as wavelet denoising, variational mode decomposition, and principal component analysis to identify features.
22 PM2.5 and longitude, latitude data Deep support vector regression It leverages a random walk to uncover more extensive spillover relationships between nodes.
23 PM2.5 data, Meteorological and forest fire disturbance data Long short-term memory neural network Added interference of forest fires to pollutant predictions to improve accuracy.
24 Air pollutant concentration and Meteorological parameters Gradient boosting machine learning model Fifteen regression models were tested; the CatBoost regression model outperformed.
25 PM2.5 data, Meteorological and wildfire data SpatioTemporal (ST)-Transformer Model captures spikes in PM2.5 concentrations during wildfire situations.
26 Air pollution and Meteorological data Bidirectional Recurrent Neural Network Analyses through IoT and adaptive neural networks.

Methodology

This section explains the system’s working architecture, as depicted in Fig. 2. It starts with the dataset gathered from27. The data undergoes pre-processing and transformation through various techniques to enhance quality and eliminate inconsistencies. Following this, the processed data is input into the model. Finally, both models are trained to predict future concentrations of the analysed air pollutants.

Input Dataset

The CPCB dataset has been fed into the model28. This dataset consists of comprehensive hourly and daily records of pollutant concentrations for various Indian cities, spanning 2015 to 2020. The pollutant measurements encompass 29,532 hourly and daily concentration data samples for each city. A statistical summary of the dataset is presented in Table 2.

Table 2.

A statistical summary of the dataset.

Statistic Count Mean Std Min 25% 50% 75% Max
PM2.5 24933 67.45 64.66 0.04 28.82 48.57 80.59 949.99
PM10 18391 118.12 90.60 0.01 56.25 95.68 149.74 1000
NO 25949 17.57 22.78 0.02 5.63 9.89 19.95 390.68
NO2 25946 28.56 24.47 0.01 11.75 21.69 37.62 362.21
NOx 25346 32.30 31.64 0 12.82 23.52 40.12 467.63
NH3 19203 23.48 25.68 0.01 8.58 15.85 30.02 352.89
CO 27472 2.24 6.96 0 0.51 0.89 1.45 175.81
SO2 25677 14.53 18.13 0.01 5.67 9.16 15.22 193.86
O3 25509 34.49 21.69 0.01 18.86 30.84 45.57 257.73
Benzene 23908 3.28 15.81 0 0.12 1.07 3.08 455.03
Toluene 21490 8.70 19.96 0 0.6 2.97 9.15 454.85
Xylene 11422 3.07 6.32 0 0.14 0.98 3.35 170.37
AQI 24850 166.46 140.69 13 81 118 208 2049

Preprocessing

Time-series data often exhibits noise, missing values, inconsistencies, and redundant information, typically dispersed across various heterogeneous sources. These inconsistencies can significantly reduce data quality, thereby impacting the reliability and accuracy of results. The key phases involved in the data preprocessing stage are discussed below.

  • Handle missing values: Missing values in the dataset can lead to unreliable or inaccurate predictions, so the initial preprocessing step focuses on handling them. The input data is grouped by the “city” column, any rows containing missing values are removed within each city group, and the index is then reset. Analytical observations reveal that cities such as Ahmedabad, Bengaluru, and Mumbai have a significantly higher number of null entries; entries for some cities were therefore dropped.

  • Data scaling and transformation: A MinMaxScaler is applied to scale the data (x) within the range [0, 1] using Eq. (1). Scaling is critical for neural networks, facilitating more effective convergence during training. The scaler is fitted and applied exclusively to the specified pollutant columns, which typically represent the features used in the model. To analyse time-series data, lag features are constructed using values from the preceding 24 time steps to predict the subsequent pollutant levels.
    $x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$  (1)
  • Splitting dataset: The dataset is divided into training and testing sets using the split function. In this process, 20% of the data is designated for testing purposes, while the remaining 80% is utilised for training. A random state of 42 is employed as the random seed to ensure reproducibility during the split.
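The preprocessing steps above can be sketched as follows. This is a minimal illustration assuming pandas and scikit-learn; the "City" column name and pollutant list are placeholders, while the 24-step window, 80/20 split, and seed 42 follow the description above:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split

def prepare_sequences(df, pollutant_cols, window=24):
    """Drop rows with missing pollutant values within each city group,
    scale the pollutant columns to [0, 1], and build lag windows."""
    df = (df.groupby("City", group_keys=False)
            .apply(lambda g: g.dropna(subset=pollutant_cols))
            .reset_index(drop=True))
    scaled = MinMaxScaler().fit_transform(df[pollutant_cols])
    X, y = [], []
    for i in range(window, len(scaled)):
        X.append(scaled[i - window:i])   # previous `window` time steps
        y.append(scaled[i])              # next step's pollutant levels
    X, y = np.array(X), np.array(y)
    # 80/20 split with the fixed seed described above
    return train_test_split(X, y, test_size=0.2, random_state=42)
```

Each training sample is thus a (window, n_pollutants) matrix of scaled lag values, and each target is the pollutant vector at the following time step.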

Hybrid model

This section comprehensively analyses the proposed model. Following data preprocessing and splitting, the windowed data generated in the previous steps is utilised as input for the hybrid model, as illustrated in Fig. 2. The primary objective of this study is to forecast future pollutant levels using historical timestamp data.

Fig. 2. Flow diagram of the proposed hybrid methodology.

The hybrid model integrates LSTM, CNN, and RSA for feature selection and utilises XGBoost for prediction. Detailed explanations for each stage are provided below, and the pseudocode for each stage is outlined in Algorithm 1.

The methodology details all parameters: sequence_length = 30, epochs = 100 (with an early-stopping patience of 10), batch_size = 32, the Adam optimiser (learning rate 0.001), and all model layer sizes. A fixed random seed (random_state = 42) is set for all stochastic processes (data splits and model initialisation). Using train_test_split(shuffle=False) is appropriate for time series, as it avoids look-ahead bias; an explicit time-based split (e.g., training on 2015–2018, validating on 2019, and testing on 2020) is a natural refinement. The key novelty of this work lies in the specific integration of a CNN and LSTM for spatiotemporal feature extraction, followed by a feature selection step and an XGBoost regressor, explicitly designed and validated for the challenging task of 10-day air quality forecasting in Indian urban centres, a horizon significantly longer than those addressed in most previous studies.
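The explicit time-based split mentioned above could be sketched like this (a minimal illustration assuming a pandas DataFrame; the "Date" column name is an assumption):

```python
import pandas as pd

def time_based_split(df, date_col="Date"):
    """Chronological split: train on 2015-2018, validate on 2019, test on 2020.
    Never mixes future rows into training, avoiding look-ahead bias."""
    df = df.sort_values(date_col)
    year = pd.to_datetime(df[date_col]).dt.year
    train = df[year <= 2018]
    val = df[year == 2019]
    test = df[year == 2020]
    return train, val, test
```

Unlike a shuffled random split, every validation and test observation here lies strictly after the training period.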

Although recent Transformer-based models like Informer and Autoformer excel in long-sequence forecasting, they are computationally intensive and require large datasets to generalize effectively29. In contrast, our hybrid CNN-LSTM framework efficiently captures both local patterns (via CNN) and long-term dependencies (via LSTM) in city-level pollutant time series. The integration of the Reptile Search Algorithm (RSA) optimizes feature weights, while XGBoost provides interpretable feature importance scores. Given the moderate dataset size and focus on interpretability, this approach achieves high predictive accuracy and efficiency without the added complexity of Transformer variants, making it a practical choice for urban pollutant forecasting.
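How tree-based boosting quantifies each feature's contribution can be illustrated on synthetic data. scikit-learn's GradientBoostingRegressor is used here as a stand-in for XGBoost (an assumption made to keep the sketch dependency-light); its normalised feature_importances_ attribute is analogous to XGBoost's importance scores:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic stand-in for CNN/LSTM-derived features: feature 0 dominates the target.
rng = np.random.default_rng(42)
X = rng.random((200, 8))
y = 3.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(0.0, 0.05, 200)

model = GradientBoostingRegressor(learning_rate=0.1, random_state=42)
model.fit(X, y)
scores = model.feature_importances_   # one normalised score per input feature
ranked = np.argsort(scores)[::-1]     # indices from most to least important
```

The dominant synthetic feature receives the highest score, which is exactly the mechanism used to rank the extracted features before prediction.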

The Reptile Search Algorithm (RSA) has garnered significant attention in recent literature for its efficacy in optimization tasks. A comprehensive review by30 highlights RSA’s strengths in balancing exploration and exploitation, making it a robust choice for complex optimization problems. Furthermore, a study by31 introduced a multi-strategy enhanced RSA, integrating dynamic evolutionary strategies to improve convergence rates. Additionally,32 proposed an improved RSA addressing population diversity and convergence issues, enhancing its performance in challenging optimization scenarios.

  • LSTM model: The LSTM architecture33, pioneered by Hochreiter and Schmidhuber, is exceptionally effective for time-series prediction. This model effectively captures non-linear connections between historical data and current time points. In this study, a multivariate, two-layer Long Short-Term Memory (LSTM) network has been developed, as illustrated in Fig. 3.

A sequential LSTM model is constructed by stacking two LSTM layers. This model takes into account both the length of the input sequence and the number of features at each time step. The first LSTM layer has the input sequence ’w’, i.e. the window size of the previous time step. This layer processes the input time-series data and extracts initial temporal features, capturing short-term dependencies and patterns in the sequence, such as daily trends. The output from the first layer is then passed to the second LSTM layer, which further refines the temporal features. This layer focuses on capturing long-term dependencies and more abstract patterns, such as seasonal trends or the impact of weather conditions over extended periods.

Fig. 3. Structural design of long short-term memory networks.

The LSTM layer handles multivariate sequence data by capturing temporal dependencies across all pollutants using 50 LSTM units. The first LSTM layer uses a rectified linear unit (ReLU) as the activation function, with the return-sequences parameter set to true. This layer is the model’s core, capturing temporal dependencies in the multivariate sequence data. Here is what each gate does:

  • Forget Gate: The forget gate evaluates the current input data and the previous hidden state to determine which parts of the cell state are necessary. This gate helps the model ignore outdated or irrelevant information, such as pollutant spikes caused by one-time events. It evaluates the previous hidden state $h_{t-1}$ and current input $x_t$ using a sigmoid function. The output is a vector $f_t$ with values between 0 and 1, where 0 means “completely forget” and 1 means “completely retain”, as shown in Eq. (2) below:
    $f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$  (2)
  • Input Gate: The input gate determines the new information to incorporate into the cell state. Utilising the sigmoid function, it assigns weights to the significance of the input $x_t$ and the previous hidden state $h_{t-1}$, as depicted in Eq. (3). This mechanism adapts to abrupt variations in weather conditions or pollutant levels.
    $i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$  (3)
  • Candidate Cell State: The candidate cell state $\tilde{C}_t$ signifies possible updates to the cell state. It is computed through the hyperbolic tangent ($\tanh$) function, producing outputs within the range [-1, 1], as illustrated in Eq. (4).
    $\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C)$  (4)
  • Cell State Update: The cell state $C_t$ integrates the previous cell state $C_{t-1}$ with the candidate state $\tilde{C}_t$, guided by the outputs of the forget and input gates. This updated cell state serves as the carrier of the sequence’s long-term memory. The corresponding update is represented in Eq. (5).
    $C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$  (5)
  • Output Gate: The output gate determines the part of the cell state that will be output as the hidden state $h_t$ for the current time step. It uses a sigmoid function for gating and multiplies it with the updated cell state passed through $\tanh$, as shown in Eq. (6) below:
    $o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o), \quad h_t = o_t \odot \tanh(C_t)$  (6)
    In the above equations, $h_t$ and $h_{t-1}$ denote the hidden states at time steps $t$ and $t-1$, respectively, while $x_t$ represents the pollutant concentration input at time $t$. The weight matrices are denoted by $W_f$, $W_i$, $W_C$, and $W_o$, and the corresponding bias terms are $b_f$, $b_i$, $b_C$, and $b_o$. The hidden state functions as the output of the LSTM cell, carrying information about both the current and preceding sequences to the next time step. After the first LSTM layer, a Dropout layer randomly drops 20% of neurons to help prevent overfitting. The second LSTM layer, with 50 units, does not return sequences since it is the final LSTM in the stack; it is followed by another Dropout layer with a 20% rate for regularisation. Finally, a Dense layer with one unit per feature outputs the final predictions, capturing the interrelationships among all input features. All hyperparameter values for the LSTM are shown in Table 3.
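The gate equations (2)–(6) can be checked with a minimal NumPy sketch of a single LSTM cell step (weights here are random illustrations, not the trained model’s parameters):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM cell update. Each W[g] maps the concatenation
    [h_prev, x_t] to the pre-activation of gate g."""
    z = np.concatenate([h_prev, x_t])
    f = sigmoid(W["f"] @ z + b["f"])        # forget gate, Eq. (2)
    i = sigmoid(W["i"] @ z + b["i"])        # input gate, Eq. (3)
    c_tilde = np.tanh(W["c"] @ z + b["c"])  # candidate cell state, Eq. (4)
    c = f * c_prev + i * c_tilde            # cell state update, Eq. (5)
    o = sigmoid(W["o"] @ z + b["o"])        # output gate
    h = o * np.tanh(c)                      # hidden state, Eq. (6)
    return h, c
```

Since the output gate lies in (0, 1) and tanh in (-1, 1), the hidden state is always bounded, which is what lets stacked LSTM layers pass stable activations forward.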

Table 3.

Model configurations and hyperparameters for LSTM, CNN, and XGBoost.

Model Configuration Training setup Parameters
LSTM: 2 LSTM layers (50 units, ReLU); Dropout 0.2. Training: Adam optimizer, MSE loss, 50 epochs, batch size 32, early-stopping patience 10. Parameters: sequence length 30, features 1; total 91,955 (trainable 30,651).
CNN: Conv1D 32 filters (k=2, ReLU) + MaxPool(2); Conv1D 16 filters (k=2, ReLU) + MaxPool(2); Dense 32 (ReLU); Dropout 0.2. Training: Adam optimizer, MSE loss, 50 epochs, batch size 32, early-stopping patience 10. Parameters: total 12,821 (trainable 4,273).
XGBoost: objective reg:squarederror; learning rate 0.1; random state 42. Training: tree-based boosting with default training iterations. Other parameters: default.
  • CNN model: A CNN model is designed to identify short-term temporal patterns, making it useful for time-series data with localised trends, as shown in Fig. 4. The input layer defines the input sequence length and the number of features, where the input sequence length refers to the length of each sequence and the number of features represents the attributes at each time step. A simple 1D Convolutional Neural Network (CNN) has been utilised for time-series forecasting. The 1D convolutional layer includes 64 filters and a kernel size of 3, which captures patterns over three time steps. The convolutional layers follow the standard formulation34,35, as shown in Eq. (7):
    $y = \mathrm{ReLU}(W * x + b)$  (7)

where x is the input feature vector, ReLU is the activation function introducing non-linearity, W is the filter weight of each unit (initially set to a random value), and b is the bias added to each unit (initially set to a random value). Subsequently, the MaxPooling1D layer down-samples the convolution output to reduce dimensionality and computational load, as shown in Eq. (8). The mathematical representation is as follows:

$y_i = \max(x_{2i}, x_{2i+1})$  (8)

In the above equation, max(x) identifies the maximum value within each consecutive 2-step window. Subsequently, the flattening layer transforms the 2D output from the preceding layer into a 1D vector, setting it up for the Dense layers, as illustrated in Eq. (9). The mathematical expression is represented as follows:

$y = \mathrm{Vector}(x)$  (9)

where Vector(x) represents the flattened 1D feature vector. A dense layer with 50 units and ReLU activation, used for learning complex patterns, is shown in Eq. (10). The mathematical representation is as follows:

$y = \mathrm{ReLU}(Wx + b)$  (10)

where W is the weight learned within the network and b is the corresponding bias. Finally, the output layer has one unit per feature to predict, suitable for a regression task. All CNN hyperparameters are shown in Table 3.
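Equations (7) and (8) can be illustrated with a toy NumPy sketch of a valid 1D convolution followed by non-overlapping max pooling (an illustration of the operations only, not the trained network):

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def conv1d(x, w, b):
    """Valid 1D convolution with ReLU, Eq. (7): slide kernel w over x."""
    k = len(w)
    return relu(np.array([np.dot(w, x[t:t + k]) + b
                          for t in range(len(x) - k + 1)]))

def maxpool1d(x, size=2):
    """Maximum over each consecutive, non-overlapping window, Eq. (8)."""
    n = len(x) // size
    return x[:n * size].reshape(n, size).max(axis=1)
```

For instance, convolving a length-6 input with a length-2 kernel yields 5 activations, and 2-step pooling then halves that to 2 values (the trailing element is dropped), mirroring how each Conv1D + MaxPool pair shrinks the sequence.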

  • Feature extraction: Extracting features from trained models captures the learned representations of data, which can be used for further analysis. A feature extractor excludes the final layer, predicts on the input data, and reshapes the resulting features for downstream tasks. This is often employed in transfer learning or when intermediate representations of input data are required.

    When data X passes through the LSTM model, it processes sequential information, learns temporal patterns, and generates feature vectors representing the model’s learned representations. Similarly, when data flows through the CNN model, it captures spatial patterns and local dependencies. The resulting feature vectors highlight the crucial patterns identified by the model, as illustrated in Fig. 2.
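The paper does not give code for combining the two branches’ outputs; one plausible sketch of flattening and concatenating the extracted feature tensors before the RSA/XGBoost stages (shapes and names here are hypothetical) is:

```python
import numpy as np

def fuse_features(lstm_feats, cnn_feats):
    """Flatten each branch's (samples, ...) feature tensor to 2D and
    concatenate along the feature axis, yielding one row per sample."""
    lstm_flat = lstm_feats.reshape(len(lstm_feats), -1)
    cnn_flat = cnn_feats.reshape(len(cnn_feats), -1)
    return np.hstack([lstm_flat, cnn_flat])
```

The fused matrix keeps one row per input window, so downstream feature selection and boosting operate on a single consistent design matrix.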

Fig. 4. Structural design of convolutional neural networks.

Algorithm 1. Hybrid Model for Pollutant Prediction.

  • RSA meta-heuristic optimization: The Reptile Search Algorithm (RSA) was introduced in 202236. RSA draws inspiration from crocodile behaviours to optimise solutions for intricate problems. This algorithm operates through two key mechanisms: exploitation, which focuses on local search, and exploration, which emphasises global search. The RSA employs strategies inspired by hunting and encircling behaviours observed in nature. The algorithm initialises three key parameters: the population of crocodiles, the dimensionality of the search space, and the initial candidate solutions. The RSA is applied to enhance the output features derived from LSTM and CNN, ensuring optimal performance.

    The RSA initiates optimisation by creating a randomly distributed set of candidate solutions, represented as (X). In each iteration, the most favourable solution identified is assessed and considered as an approximation of the optimal result.
    $X = \begin{bmatrix} x_{1,1} & x_{1,2} & \cdots & x_{1,D} \\ x_{2,1} & x_{2,2} & \cdots & x_{2,D} \\ \vdots & \vdots & \ddots & \vdots \\ x_{P,1} & x_{P,2} & \cdots & x_{P,D} \end{bmatrix}$  (11)

Here, X represents the dataset in matrix form, where each row corresponds to a candidate solution (population member) and each column represents a feature dimension. Thus, $x_{i,j}$ denotes the value of the $j$th feature for the $i$th candidate, as described in Eq. (11). In other words, the matrix captures P candidate solutions in a D-dimensional search space, forming the input basis for subsequent optimization steps.

The problem is characterised by two key parameters: the population size (P), which corresponds to the initial number of solutions, and the data feature dimension (D), which represents the feature set of pollutant concentration.

To initialize the search process, each solution (or candidate) must be randomly generated within the feasible range of the problem. This ensures that the algorithm explores the entire search space instead of being biased toward a particular region.

$x_{i,j} = \mathrm{rand} \times (UB - LB) + LB$  (12)

Here, $x_{i,j}$ denotes the $j$th variable of the $i$th candidate solution. The term rand generates a random number uniformly distributed in [0, 1]. By scaling it with $(UB - LB)$ and shifting by $LB$, the variable is guaranteed to lie within the specified lower and upper bounds of the search space, as shown in Eq. (12).

  • Exploration phase: The encircling phase focuses on exploring high-density regions within the search space. During this phase, movements inspired by crocodile behaviours, such as high walking and belly walking, are pivotal. While these movements are not directly related to capturing prey, they are instrumental in exploring an extensive search space.

    During the exploration phase, referred to as encircling, two specific conditions govern Eq. (13), corresponding to the high walking and belly walking movements. High walking is applied when $t \le \frac{T}{4}$, while belly walking applies when $\frac{T}{4} < t \le \frac{2T}{4}$. Subsequently, the value of $x_{i,j}$ is updated based on these parameters:
    x_ik(t+1) = Best_k(t) × (−η_ik(t)) × β − R_ik(t) × rand,   t ≤ T/4  (high walking)
    x_ik(t+1) = Best_k(t) × x_rk × ES(t) × rand,   T/4 < t ≤ T/2  (belly walking)    (13)

In this formulation, x_ik(t+1) represents the updated value of the k-th variable of the i-th candidate solution at iteration t+1. During the early stage of the search (t ≤ T/4), the update is influenced by the best solution so far, Best_k(t), a control factor η_ik, and a random deviation term R_ik, encouraging diverse exploration. In the subsequent stage (T/4 < t ≤ T/2), the update incorporates both the best solution and a randomly chosen candidate x_rk, modulated by the exploration strength ES(t). This transition gradually balances exploration and exploitation as the search progresses. In Eq. (13), the parameter β regulates the exploration process, while the random terms contribute to the stochastic elements of the algorithm. The hunting operator for the k-th position, represented by η_ik, is determined using Eq. (14). Finally, the reduce function R_ik is applied to narrow the search area, as defined by Eq. (15).

η_ik = Best_k(t) × P_ik    (14)

Here, η_ik represents the control factor associated with the i-th candidate and the k-th variable. It is obtained by multiplying the best solution in the current iteration, Best_k(t), with the percentage-difference factor P_ik. This formulation ensures that the search process is guided by the best-known solution while still being influenced by the diversity of the population, thereby balancing exploitation of good solutions with exploration of alternative regions of the search space.

R_ik = (Best_k(t) − x_rk) / (Best_k(t) + ε)    (15)

In this formulation, R_ik denotes the random deviation term for the i-th candidate and the k-th variable. It is calculated as the difference between the current best solution Best_k(t) and a randomly chosen solution x_rk, normalized by the best solution plus a small constant ε to avoid division by zero. This mechanism injects controlled randomness into the update process, ensuring exploration of new regions while maintaining numerical stability.

ES(t) = 2 × r × (1 − t/T),   r ∈ [−1, 1]    (16)

The environmental selection factor, ES(t), is designed to balance exploration and exploitation in the optimization process, as presented in Fig. 5a. It introduces a degree of randomness while gradually reducing its influence over time. Intuitively, the magnitude of ES(t) shrinks as the iteration count t approaches the maximum number of iterations T, allowing the algorithm to focus more on exploitation in later stages.

Fig. 5.

Fig. 5

RSA Algorithm flowchart split into two parts: (a) Initialization and fitness evaluation, (b) Strategy application and iteration.

Here, r is a random number in [−1, 1] that injects stochastic behavior, and the factor (1 − t/T) ensures that the impact of ES(t) diminishes over iterations. This mechanism helps the algorithm explore new regions initially while gradually stabilizing towards convergence.

The term P_ik represents the normalized perturbation for the i-th candidate and the k-th variable. It is designed to scale the deviation of the candidate solution from its mean relative to the range of the variable and the current best solution. The constant α provides a baseline bias, while ε prevents division by zero, ensuring numerical stability. Intuitively, P_ik allows larger adjustments when a candidate is far from its mean and smaller adjustments when it is close, helping the algorithm balance exploration and exploitation.

P_ik = α + (x_ik − M(x_i)) / (Best_k(t) × (UB_k − LB_k) + ε)    (17)

Here, x_ik is the current value of the variable, M(x_i) is the mean position of the i-th candidate, Best_k(t) is the best solution found for the k-th variable, and UB_k, LB_k are the upper and lower bounds of the variable, respectively.

The term M(x_i) represents the mean value of all variables of the i-th candidate. Intuitively, it provides a central reference point for that candidate’s position in the solution space, helping to assess how each individual variable deviates from the candidate’s overall average.

M(x_i) = (1/D) Σ_{k=1}^{D} x_ik    (18)

Here, x_ik is the value of the k-th variable of the i-th candidate, and D is the total number of variables. By computing this mean, the algorithm can normalize deviations and maintain stability during the optimization process.

In the above equations, α and ε represent small constants, while r denotes a randomly selected value in [−1, 1]. The evolutionary sense, ES(t), is characterised by a probability ratio, as described in Eq. (16); it decreases progressively from 2 to −2 over the iterations. The perturbation factor is determined using Eq. (17): P_ik indicates the percentage difference between the optimal value and the current solution relative to the candidate’s average position, as computed in Eq. (18).
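For clarity, the operators of Eqs. (14)–(18) reduce to a few elementwise formulas. A minimal Python sketch follows; the names `alpha` and `eps`, and their default values, are assumptions for illustration:

```python
import random

def hunting_operator(best_k, P_ik):
    # Eq. (14): eta_ik = Best_k(t) * P_ik
    return best_k * P_ik

def reduce_function(best_k, x_rk, eps=1e-10):
    # Eq. (15): R_ik = (Best_k(t) - x_rk) / (Best_k(t) + eps)
    return (best_k - x_rk) / (best_k + eps)

def evolutionary_sense(t, T, rng=random):
    # Eq. (16): ES(t) = 2 * r * (1 - t / T), r drawn uniformly from [-1, 1]
    return 2.0 * rng.uniform(-1.0, 1.0) * (1.0 - t / T)

def percent_difference(x_ik, mean_i, best_k, ub_k, lb_k, alpha=0.1, eps=1e-10):
    # Eq. (17): P_ik = alpha + (x_ik - M(x_i)) / (Best_k(t) * (UB_k - LB_k) + eps)
    return alpha + (x_ik - mean_i) / (best_k * (ub_k - lb_k) + eps)

def candidate_mean(x_i):
    # Eq. (18): M(x_i) = (1/D) * sum over k of x_ik
    return sum(x_i) / len(x_i)
```

Each helper is a pure function of the current candidate, the best-so-far solution, and the bounds, so the update step can compose them freely.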

  • Hunting phase: The hunting mechanism employs strategic movements to refine the positions of candidate solutions, enhancing their fitness for prediction accuracy. During the social sharing phase, candidate solutions exchange information to navigate the search process effectively, promoting a balance between exploration and exploitation of the search space.

Encircling, hunting, and social sharing are repeated until a predefined stopping criterion is met, such as reaching a maximum of 100 iterations. The foraging process encompasses two distinct activities: hunting coordination and hunting cooperation. These represent focused strategies to refine the exploitation search, as defined by Eq. (19). Hunting coordination is carried out when T/2 < t ≤ 3T/4, whereas hunting cooperation takes place when 3T/4 < t ≤ T.

The update of the variable x_ik depends on the current iteration t and is designed to balance exploration and exploitation dynamically throughout the optimization process. In the early and middle stages, the algorithm applies different strategies to either explore new regions or refine existing solutions. Randomness is incorporated through rand to avoid premature convergence, while the terms η_ik, R_ik, and P_ik control the step size and direction based on the candidate’s relation to the best solution.

x_ik(t+1) = Best_k(t) × P_ik(t) × rand,   T/2 < t ≤ 3T/4  (hunting coordination)
x_ik(t+1) = Best_k(t) − η_ik(t) × ε − R_ik(t) × rand,   3T/4 < t ≤ T  (hunting cooperation)    (19)

Here:

  • Best_k(t) is the best solution found so far for the k-th variable,

  • P_ik is the normalized perturbation factor,

  • η_ik is the control factor for candidate i and variable k,

  • R_ik is the random deviation term,

  • rand introduces stochasticity, and

  • ε is a small constant to ensure numerical stability. Intuitively, in the earlier phases the algorithm emphasizes exploration (larger random steps), while in later iterations the update focuses more on fine-tuning around the best solution for convergence.

The search space broadens around the chosen solution during the first half of the iterations (t ≤ T/2) and transitions toward convergence near the optimal solution during the second half (t > T/2). During the exploration stage, the high walking strategy is applied when t ≤ T/4, while belly walking is implemented when T/4 < t ≤ T/2. For exploitation, the hunting coordination mechanism operates when T/2 < t ≤ 3T/4, and the hunting cooperation mechanism is utilized when 3T/4 < t ≤ T. Once the RSA satisfies its termination criterion, the process concludes. The flowchart illustrating the RSA procedure is presented in Fig. 5b. The algorithm is applied to both models (LSTM and CNN), ultimately producing optimised output features for each model.
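Putting the four phases together, a single RSA position update can be sketched as follows. This is a schematic reading of Eqs. (13) and (19), not the authors' implementation; the `beta` and `eps` defaults are assumed values:

```python
import random

def rsa_update(best_k, eta_ik, R_ik, P_ik, ES_t, x_rk,
               t, T, beta=0.1, eps=1e-10, rng=random):
    """One RSA position update for variable k of a candidate, dispatching
    on the iteration quarter: high walking, belly walking (exploration),
    then hunting coordination and hunting cooperation (exploitation)."""
    r = rng.random()
    if t <= T / 4:                       # high walking, Eq. (13) first case
        return best_k * (-eta_ik) * beta - R_ik * r
    elif t <= T / 2:                     # belly walking, Eq. (13) second case
        return best_k * x_rk * ES_t * r
    elif t <= 3 * T / 4:                 # hunting coordination, Eq. (19)
        return best_k * P_ik * r
    else:                                # hunting cooperation, Eq. (19)
        return best_k - eta_ik * eps - R_ik * r
```

In a full run, this update is applied to every variable of every candidate at each of the (here, 100) iterations, with η_ik, R_ik, P_ik and ES(t) recomputed from the current population.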

Algorithm 2.

Algorithm 2

Reptile Search Algorithm (RSA) for Feature Optimisation

  • Combine Features: After applying RSA to both models individually, the optimised features for each pollutant are combined to merge the strengths of the LSTM and CNN models. The selected features for each pollutant from the LSTM model are merged into a single feature vector, and the selected features from the CNN model are combined likewise. Finally, as shown in Fig. 2, the two horizontally stacked feature vectors are concatenated into the final feature vector. This approach exploits the distinct patterns captured by each model, which can lead to improved predictive power. Standalone LSTMs or CNNs often fail at long horizons (10 days) because errors accumulate in recursive prediction settings and because they cannot simultaneously capture spatial features (short-term local patterns) and long-term temporal dependencies. The hybrid model explicitly addresses this by using the CNN to extract salient spatial features from the input sequence and the LSTM to model long-term temporal dynamics, creating a richer feature set for the final predictor. 10-day forecasting method: the model is trained to predict the concentration for the next day (t+1) given a sequence of the previous n days. To generate a 10-day forecast, it operates in an autoregressive manner: the predicted value for t+1 is fed back as input to predict t+2, and this process is repeated iteratively to reach the 10-day horizon. While efficient, this method is susceptible to error propagation over long horizons.
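The autoregressive roll-out described above can be sketched generically; `model_predict` is a stand-in for the trained hybrid predictor, and the naive mean model below is purely illustrative:

```python
def forecast_autoregressive(model_predict, history, horizon=10):
    """Iteratively predict `horizon` steps ahead: each one-step prediction
    (t+1) is appended to the input window to predict t+2, and so on.
    Errors therefore compound as the horizon grows."""
    window = list(history)
    preds = []
    for _ in range(horizon):
        y_next = model_predict(window)   # one-step-ahead model
        preds.append(y_next)
        window = window[1:] + [y_next]   # slide the window forward
    return preds

# Toy stand-in model: predicts the mean of the current window
naive = lambda w: sum(w) / len(w)
ten_day = forecast_autoregressive(naive, [0.2, 0.4, 0.6], horizon=10)
```

Swapping `naive` for the trained pipeline (feature extraction plus the XGBoost head) yields the 10-day forecasts discussed in the results.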

  • Ensemble Learning through XGBoost: XGBoost37 leverages the boosting model introduced by38. Regularisation within the objective function simplifies the model, prevents overfitting, and expedites learning. XGBoost is an ensemble model that effectively integrates decision trees, leading to a combined model with superior predictive performance compared to individual methods. The output function is calculated from Eq. (20).
    ŷ_i = Σ_{t=1}^{T} f_t(x_i)    (20)
    where ŷ_i represents the prediction of the generated tree ensemble, f_t refers to the newly constructed tree model, and T indicates the total count of tree models. In XGBoost, tuning the parameters is essential for enhancing model performance and addressing overfitting concerns. XGBoost takes in the input features X and target values y; here, the final feature vector serves as the input features. An XGBRegressor with 100 estimators, a learning rate of 0.1, and a fixed random state (for reproducibility) is fitted to the data, training it to learn the relationship between the final feature vector and y. This integrates seamlessly with the combined feature set, leveraging the predictive power of XGBoost to forecast pollutant levels, as shown in Fig. 2.
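To make Eq. (20)'s additive form concrete, the following toy pure-Python booster fits depth-1 regression stumps to residuals, so the final prediction is a sum of tree outputs. It is a didactic stand-in, not XGBoost itself (which adds a regularised objective and second-order updates); in the paper's setting this corresponds to fitting `xgboost.XGBRegressor(n_estimators=100, learning_rate=0.1, random_state=...)` on the final feature vector:

```python
def fit_stump(x, residual):
    """Best single-split stump on a 1-D feature, minimising squared error."""
    best = None
    for s in sorted(set(x)):
        left = [r for xi, r in zip(x, residual) if xi <= s]
        right = [r for xi, r in zip(x, residual) if xi > s]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, s, lm, rm)
    _, s, lm, rm = best
    return lambda xi: lm if xi <= s else rm

def boost(x, y, n_trees=100, lr=0.1):
    """Additive ensemble of Eq. (20): y_hat(x) = sum_t lr * f_t(x)."""
    trees, pred = [], [0.0] * len(y)
    for _ in range(n_trees):
        tree = fit_stump(x, [yi - pi for yi, pi in zip(y, pred)])
        trees.append(tree)
        pred = [pi + lr * tree(xi) for pi, xi in zip(pred, x)]
    return lambda xi: sum(lr * t(xi) for t in trees)

# Fit a step function: the ensemble converges towards the targets
model = boost([1, 2, 3, 4], [0.0, 0.0, 1.0, 1.0], n_trees=100, lr=0.1)
```

Each new stump corrects the residual left by the current ensemble, which is the essence of the boosted sum in Eq. (20).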

Results

  • Comparative evaluation findings: The subsequent step in executing the hybrid approach involves constructing a neural network model that integrates the RSA optimiser and XGBoost for the prediction task. This study developed a model to exploit the strong correlation between historical and future pollutant concentrations. To evaluate the predictive performance of the proposed method against other approaches, six benchmark models are constructed: TST (Transformer), CNN, BiLSTM, BiRNN, ANN, and BiGRU. These models are employed for forecasting pollutant concentrations. They are univariate, meaning they take past observations of the target pollutant’s concentration as inputs and output the current concentration of the respective pollutant. The proposed approach’s performance is compared with these models using four evaluation metrics, as described below:

  • (a)
    R²: A statistical metric that measures the degree to which the variance in the dependent variable is explained or predicted by the independent variables.
    R² = 1 − RSS / TSS    (21)

where R² denotes the coefficient of determination, RSS signifies the residual sum of squares, and TSS represents the total sum of squares, as outlined in Eq. (21).

  • (b)
    MAE: It assesses the average of the absolute differences between actual observations and predicted values. The formula for calculating MAE is as follows:
    MAE = (1/n) Σ_{i=1}^{n} |y_i − ŷ_i|    (22)

where y_i represents the actual pollutant concentration and ŷ_i is the predicted value of the pollutant concentration in Eq. (22).

  • (c)
    MAPE: Quantifies the percentage deviation between predicted values and actual observations. It is calculated by Eq. (23):
    MAPE = (100/n) Σ_{i=1}^{n} |(y_i − ŷ_i) / y_i|    (23)
  • (d)
    MSE: quantifies the average squared disparity between the observed values in a statistical analysis and the values predicted by the model. Its computation is expressed in Eq. (24).
    MSE = (1/n) Σ_{i=1}^{n} (y_i − ŷ_i)²    (24)
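A minimal reference implementation of the four metrics in Eqs. (21)–(24), assuming nonzero actual values for MAPE:

```python
def r2(y, y_hat):
    # Eq. (21): R^2 = 1 - RSS / TSS
    mean_y = sum(y) / len(y)
    rss = sum((a - p) ** 2 for a, p in zip(y, y_hat))
    tss = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - rss / tss

def mae(y, y_hat):
    # Eq. (22): mean absolute error
    return sum(abs(a - p) for a, p in zip(y, y_hat)) / len(y)

def mape(y, y_hat):
    # Eq. (23): mean absolute percentage error (actual values must be nonzero)
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(y, y_hat)) / len(y)

def mse(y, y_hat):
    # Eq. (24): mean squared error
    return sum((a - p) ** 2 for a, p in zip(y, y_hat)) / len(y)
```

These definitions match the tabulated scores: lower MAE/MAPE/MSE and higher R² indicate better predictions.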

Tables 4 and 5 present the evaluation results for both the proposed and benchmark approaches, detailing the prediction outcomes for all four target pollutants. Notably, the hybrid approach surpassed all current benchmark methods, more accurately detecting fluctuations in pollutant concentrations and producing fewer prediction errors.

Table 4.

Performance comparison of the proposed approach and existing methods using various popular metrics.

Model R² MAE MAPE% MSE
Transformer25 0.6948 0.0323 36.7777 0.0031
CNN39 0.6143 0.0388 48.9230 0.0042
BiLSTM23 0.7550 0.0298 39.0969 0.0026
BiRNN26 0.7483 0.0297 35.3872 0.0026
ANN40 0.7492 0.0310 30.877 0.0025
BiGRU41 0.7509 0.0295 35.4085 0.0025
SVR42 0.5777 0.0423 68.3424 0.0065
RFR43 0.7501 0.0378 29.2999 0.0039
KNN 0.7427 0.0363 27.8318 0.0045
GBM24 0.7671 0.0363 27.2358 0.0036
Our approach 0.9481 0.0163 20.1371 0.0005

Table 5.

Evaluation of the proposed approach’s performance using several popular metrics for all four pollutants.

Model PM2.5 Model CO
R² MAE MAPE MSE R² MAE MAPE MSE
TST 0.8134 0.0197 30.6327 0.0013 TST 0.9002 0.0149 64.7483 0.0009
CNN 0.7590 0.0238 42.3350 0.0017 CNN 0.9160 0.0148 54.1940 0.0008
BiLSTM 0.8004 0.0177 30.7122 0.0007 BiLSTM 0.9185 0.1840 34.0113 0.1615
BiRNN 0.6231 0.3536 43.4619 0.1899 BiRNN 0.8143 0.2614 38.5783 0.3955
ANN 0.7892 0.0219 34.5608 0.0014 ANN 0.9149 0.0118 27.7484 0.0007
BiGRU 0.7024 0.0347 34.5608 0.0021 BiGRU 0.8601 0.3051 71.0872 0.3102
GBM 0.4695 0.0184 35.8107 0.0010 GBM 0.6777 0.0090 18.4490 0.0002
KNN 0.7427 0.0148 27.8318 0.0011 KNN 0.7227 0.0165 25.8318 0.0023
Our 0.9493 0.0131 22.0556 0.0003 Our 0.9812 0.0078 22.0159 0.0001
Model SO2 Model NO2
R² MAE MAPE MSE R² MAE MAPE MSE
TST 0.6412 0.0392 29.0025 0.0049 TST 0.8263 0.0452 31.6704 0.0019
CNN 0.5990 0.0433 34.9548 0.0053 CNN 0.7684 0.0437 33.3147 0.0041
BiLSTM 0.5812 0.9571 31.6072 0.0037 BiLSTM 0.7909 0.0378 39.3648 0.0026
BiRNN 0.6457 0.2840 28.6953 0.0806 BiRNN 0.7837 0.0564 24.8869 0.7389
ANN 0.5382 0.0406 33.4126 0.0045 ANN 0.7855 0.0414 27.7484 0.0039
BiGRU 0.5601 0.3052 31.3664 0.0734 BiGRU 0.7485 0.0425 29.9723 0.2167
GBM 0.4919 0.0538 42.7643 0.0070 GBM 0.5365 0.0553 27.6253 0.0063
KNN 0.7247 0.0178 26.8318 0.0011 KNN 0.7787 0.0165 25.7318 0.0017
Our 0.8323 0.0295 26.6894 0.0016 Our 0.8954 0.0316 20.3606 0.0020
  • Graphical Representation of Results:

This part emphasises the prediction results of the proposed approach by comparing predicted values with actual observations through plots.

Figure 6 illustrates the prediction plots for the four pollutants analysed in this research. The x-axis corresponds to the number of observations, while the y-axis represents the normalised values of the pollutants. Actual observations are depicted in blue, and predicted observations in red. The proposed method accurately captures the non-linearity and variations in pollutant values, enhancing prediction accuracy. It performs well in tracking pollutant value patterns, and this high level of efficiency is also evident in the predictions for the other three pollutants.

Fig. 6.

Fig. 6

Prediction results of pollutants (PM2.5, CO, SO2, NO2) using the proposed hybrid method.

The box plot analysis reveals a strong alignment between the model’s predictions and the observed values in terms of central tendency, variability, and distribution. In this study, the proposed method was utilised to generate box plots based on the actual and predicted values of the four pollutants, as illustrated in Fig. 7. A few outliers in the predictions indicate potential areas for further investigation and improvement. Overall, the model shows strong performance in capturing the central tendency and variability of the actual values, making it a reliable tool for prediction in this context.

Fig. 7.

Fig. 7

Model performances on actual and predicted observation values of pollutants (PM2.5, CO, SO2, NO2).

In Fig. 8, the x-axis represents samples/observations, and the y-axis represents normalised pollutant values. The training observations are represented using the blue colour, the true observations are represented using the green colour, and the predicted values on the true dataset are red. The proposed method accurately captures the non-linearity and variations in pollutant values, enhancing the precision of the prediction.

Fig. 8.

Fig. 8

Prediction results on train-test split of pollutants (PM2.5, CO, SO2, NO2) using the proposed hybrid approach.

Figure 9 offers a visual comparison of pollutant concentrations across a 10-day window, encompassing the last 10 days of observed data and the next 10 days of forecasted values for the four pollutants (multi-step forecasting). The x-axis denotes the days within this time frame, while pollutant concentrations are plotted on the y-axis. The previous 10 days of actual observations are illustrated with a blue line, spanning x-values from −10 to 0. A red vertical dashed line at x = 0 marks the present day, acting as a separator between past data and future projections. Forecasted data for the next 10 days are depicted using orange peaks, mapped to x-values between 0 and 10. This visualisation effectively contrasts observed and predicted pollutant trends, showcasing the proposed method’s capacity to capture patterns and variations across multiple pollutants.

Fig. 9.

Fig. 9

Prediction results for pollutants 10 days in advance (PM2.5, CO, SO2, NO2) using the proposed hybrid approach.

Figure 10 presents scatterplots that compare the proposed model’s actual and predicted pollutant values. On the x-axis, the actual values are plotted, while the y-axis represents the predicted values. These scatterplots provide a clear visual representation of how closely the predicted values align with the observed data. Points near the red diagonal line, defined by y = x, signify accurate predictions where the predicted and actual values are identical. The red diagonal line serves as a reference, highlighting perfect predictions. Data points clustered around this line indicate minimal prediction errors, showcasing the model’s efficiency.

Fig. 10.

Fig. 10

Scatterplots of pollutants (PM2.5, CO, SO2, NO2): actual and predicted values.

The scatterplots also uncover systematic patterns or prediction biases, such as consistent overestimation or underestimation within specific data ranges. Outliers could signify unusual pollutant concentration events or potential errors in the dataset or model, and a wider spread indicates greater variability in predictions. In areas with high pollution, PM2.5 levels tend to exhibit less deviation from the diagonal line at elevated concentrations. For CO, the scatterplot displays points closely clustered around the diagonal, suggesting lower variability than the other pollutants; remaining CO variations might be attributed to extreme weather conditions or unusual emission events. Predictions for SO2 often show more outliers, reflecting significant data variability due to localised pollution sources. Meanwhile, the scatterplot for NO2 reveals a broader prediction spread, indicating potential difficulties in accounting for temporal variations.

  • Statistical Analysis: The pairwise t-test results comparing the hybrid model with the baseline models are shown in Table 6. All p-values are below 0.01, meaning the hybrid model’s improvements over each baseline are statistically significant.

Table 6.

Pairwise t-test results comparing the Hybrid model with baseline models.

Comparison t-Statistic p-value
Hybrid vs Transformer 3.89 0.0012
Hybrid vs CNN 4.23 0.0008
Hybrid vs LSTM 2.98 0.0075
Hybrid vs RNN 5.45 Inline graphic
Hybrid vs ANN 3.76 0.0011
Hybrid vs GRU 3.25 0.0042
Hybrid vs SVR 6.78 Inline graphic
Hybrid vs RFR 4.89 0.0003
Hybrid vs KNN 7.32 Inline graphic
Hybrid vs GBM 4.12 0.0009
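For reproducibility, the paired t-statistic underlying Table 6 can be computed from per-sample error differences; the error values below are illustrative, not the study's data (p-values would then come from the t-distribution with n−1 degrees of freedom, e.g. via `scipy.stats`):

```python
import math
import statistics

def paired_t(errors_a, errors_b):
    """Paired t-statistic on per-sample error differences, as used for
    the model comparisons in Table 6: t = mean(d) / (stdev(d) / sqrt(n))."""
    d = [a - b for a, b in zip(errors_a, errors_b)]
    n = len(d)
    return statistics.mean(d) / (statistics.stdev(d) / math.sqrt(n))

# Example: baseline errors consistently exceed hybrid errors
t_stat = paired_t([0.9, 1.1, 1.0, 1.2, 0.8], [0.5, 0.6, 0.4, 0.7, 0.3])
```

A large positive t-statistic indicates the first model's errors are systematically higher than the second's.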
  • Robustness Analysis: Tables 7, 8 and 9 present robustness analyses across seasons, cities, and noisy inputs, respectively, in comparison with other models.

Table 7.

Seasonal performance of the Hybrid model.

Season R2 Score MAE MAPE
Winter 0.861 0.012 38.2%
Monsoon 0.823 0.015 45.7%
Summer 0.798 0.017 52.3%
Autumn 0.812 0.016 48.9%

Table 8.

City-wise model performance comparison.

Model Kolkata Mumbai Brajrajnagar Delhi Guwahati
Hybrid 0.845 0.832 0.818 0.801 0.795
Transformer 0.812 0.806 0.791 0.774 0.768
CNN 0.798 0.789 0.775 0.761 0.754
LSTM 0.820 0.815 0.803 0.788 0.781
RNN 0.785 0.772 0.768 0.752 0.745
ANN 0.801 0.794 0.786 0.771 0.764
GRU 0.815 0.808 0.795 0.782 0.776
SVR 0.763 0.751 0.742 0.728 0.719
RFR 0.791 0.783 0.776 0.759 0.751
KNN 0.745 0.732 0.721 0.708 0.698
GBM 0.809 0.798 0.789 0.775 0.769

Table 9.

Noise robustness analysis.

Model 0.00 (Clean) 0.01 0.05 0.10
Hybrid 0.845 0.832 0.801 0.763
Transformer 0.812 0.794 0.752 0.698
LSTM 0.820 0.805 0.768 0.715
CNN 0.798 0.781 0.739 0.682
Performance degradation (Hybrid) Baseline −1.5% −5.2% −9.7%
  • Performance Comparisons: Table 10 compares the hybrid model optimized with three different meta-heuristics: GA (genetic algorithm), PSO (particle swarm optimization), and RSA (reptile search algorithm). RSA outperforms GA and PSO across all metrics.

  • Discussion: The proposed hybrid model’s superior performance arises from its ability to integrate spatial features, temporal dependencies, and optimised feature selection, reducing error accumulation in long-term forecasts. This aligns with air pollution theory, where pollutant dynamics are influenced by nonlinear meteorology–emission interactions. Compared to existing CNN–LSTM or statistical baselines, our approach uniquely sustains predictive accuracy over extended horizons, addressing an unresolved gap.

The assumptions and limitations are mentioned below:

  • Assumptions: Future pollutant levels are primarily driven by recent observations; seasonal and cross-pollutant dependencies remain stable and can be learned; cleaned datasets are sufficiently representative; and LSTMs can manage mild non-stationarity through contextual windows.

  • Limitations: The hybrid/Transformer-based pipeline is computationally intensive; city-wise NaN dropping may lead to substantial data loss and biased samples (violating MCAR); reliance on simplistic cleaning rather than imputation reduces robustness; and the models remain purely autoregressive, excluding exogenous factors such as weather conditions, holidays, and traffic flows.

In terms of computational cost, CNN/ANN are the lightest (linear in T), (Bi-)LSTM/GRU are mid-cost (per-step recurrence), and Transformers become expensive for long sequences (O(T²) attention). The hybrid (CNN+LSTM+XGBoost) is the heaviest because it adds all three costs, but it wins on accuracy and 10-day stability: the CNN captures local patterns, the LSTM learns long memory, and XGBoost fits the residual structure.

Table 10.

Performance Comparison.

Model Inline graphic MAE MAPE MSE
Hybrid + GA 0.918 0.018 21.3 0.0006
Hybrid + PSO 0.929 0.017 20.5 0.00055
Hybrid + RSA 0.948 0.016 20.1 0.0005

Conclusion

This research study introduces a hybrid approach that combines CNN and LSTM architectures, optimised using the reptile search algorithm, to predict future pollutant concentrations with the help of XGBoost. The RSA is utilised for feature selection, identifying the most impactful features for future pollutant concentration predictions. A multivariate LSTM model and a 1D CNN were developed in this research to estimate future pollutant concentrations. To validate the accuracy and effectiveness of the proposed approach, the prediction results were compared with those of the TimeSeriesTransformer, CNN, BiLSTM, BiRNN, ANN and BiGRU models. The evaluation, using well-known metrics such as R², MAE, MAPE, and MSE, showed that the proposed hybrid approach consistently surpassed the performance of the other models. The overall performance of the hybrid model is: R² = 0.9481, MAE = 0.0163, MAPE = 20.1371 and MSE = 0.0005. This confirms the robustness of the feature-selection methodology, which combines the optimisation algorithm with ensemble learning. Therefore, the proposed hybrid method effectively selects the optimal features for accurate future concentration predictions, introducing an innovative approach to pollutant forecasting. While the proposed method is computationally demanding, there is potential for future work to enhance its computational efficiency.

Building on this work, future research can explore federated learning for privacy-preserving multi-city pollutant prediction, edge computing for low-latency, on-device inference, and real-time deployment in smart city monitoring systems. Additionally, integrating transformer architectures and explainable AI can enhance accuracy, interpretability, and scalability, enabling efficient, continuous pollutant forecasting across urban environments.

This study bridges the gap in long-term air pollution forecasting by integrating CNN–LSTM with RSA and XGBoost, ensuring robust accuracy across pollutants. Limitations include computational complexity and high training costs. Future research will examine federated learning, transformers, and edge deployment. Practically, the framework empowers policymakers and urban planners with reliable forecasts to guide sustainable air quality management.

Acknowledgements

We hereby acknowledge the support of the Computer Science Engineering Department, Thapar Institute of Engineering Technology, Patiala, Punjab, for providing the facility.

Author contributions

Priya Kansal: Writing original draft, Validation, Methodology, Conceptualisation. Jatin Bedi: Writing - review & editing, Conceptualization, Validation, Supervision. Sushma Jain: Writing - review & editing, Conceptualization, Validation, Supervision. All authors are fully aware of this manuscript and have permission to submit the manuscript for possible publication.

Funding

Authors declare that no funding has been received to support the work carried out in the current study.

Data availability

The dataset used and analysed in this study can be made available by the corresponding author upon reasonable request at pkansal_phd22@thapar.edu.

Code availability

The code used and analysed in this study can be made available by the corresponding author upon reasonable request at pkansal_phd22@thapar.edu.

Declarations

Competing Interests

The authors declare no competing interests.

Ethical approval

All the authors have read, understood, and have complied as applicable with the statement on “Ethical responsibilities of Authors” as found in the Instructions for Authors. They are aware that, with minor exceptions, no changes can be made to authorship once the paper is submitted.

Footnotes

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  • 1.Abbass, K. et al. A review of the global climate change impacts, adaptation, and sustainable mitigation measures. Environ. Sci. Pollut. Res.29, 42539–42559 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2.Al-Obaidy, A. H., Jasim, I. & AlKubaisi, A.-R. Air pollution effects in some plant leaves morphological and anatomical characteristics within Baghdad City, Iraq. Eng. Technol. J.37(1C), 84–89 (2019). [Google Scholar]
  • 3.Sharma, E., Deo, R. C., Prasad, R., Parisi, A. V. & Raj, N. Deep air quality forecasts: Suspended particulate matter modeling with convolutional neural and long short-term memory networks. IEEE Access8, 209503–209516. 10.1109/ACCESS.2020.3039002 (2020). [Google Scholar]
  • 4.Bouktif, S., Fiaz, A., Ouni, A. & Serhani, M. A. Multi-sequence LSTM-RNN deep learning and metaheuristics for electric load forecasting. Energies13(2), 391 (2020). [Google Scholar]
  • 5.Inam, S. A. et al. PR-FCNN: A data-driven hybrid approach for predicting PM2.5 concentration. Discov. Artif. Intell.10.1007/s44163-024-00184-7 (2024). [Google Scholar]
  • 6.Inam, S. A. et al. A neural network approach to carbon emission prediction in industrial and power sectors. Discov. Appl. Sci.7, 640. 10.1007/s42452-025-07257-x (2025). [Google Scholar]
  • 7.Han, C., Park, H., Kim, Y., & Gim, G. In: (ed. Lee, R.) Hybrid CNN-LSTM Based Time Series Data Prediction Model Study 43–54 (Springer, 2023). 10.1007/978-3-031-19608-9_4
  • 8.Elshewey, A. M. Enhancing crop yield prediction based on dove optimization algorithm and gradient boosting model. SIViP (Signal, Image and Video Processing)19, 951. 10.1007/s11760-025-04545-2 (2025). [Google Scholar]
  • 9.Zhang, Z., Johansson, C., Engardt, M., Stafoggia, M. & Ma, X. Improving 3-day deterministic air pollution forecasts using machine learning algorithms. Atmos. Chem. Phys.24, 807. 10.5194/acp-24-807-2024 (2024). [Google Scholar]
  • 10.Reichstein, M. Deep learning and process understanding for data-driven earth system science. Nature566, 7743 (2019). [DOI] [PubMed] [Google Scholar]
  • 11.Bai, L., Wang, J., Ma, X. & Lu, H. Air pollution forecasts: An overview. Int. J. Environ. Res. Public Health10.3390/ijerph15040780 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12.Liu, B. et al. An attention-based air quality forecasting method. IEEE Int. Conf. Mach. Learn. Appl.17, 728–733 (2018). [Google Scholar]
  • 13.Yixuan Zhu, J., Sun, C. & Li, V. An extended spatiotemporal granger causality model for air quality estimation with heterogeneous urban big data. IEEE Trans. Big Data3, 307–319 (2017). [Google Scholar]
  • 14. Chauhan, R., Kaur, H. & Alankar, B. Air quality forecast using convolutional neural network for sustainable development in urban environments. Sustain. Cities Soc. 75, 103239. 10.1016/j.scs.2021.103239 (2021).
  • 15. Zhang, Q., Han, Y., Li, V. & Lam, J. Deep-AIR: A hybrid CNN-LSTM framework for fine-grained air pollution estimation and forecast in metropolitan cities. IEEE Access 10, 55818–55841 (2022).
  • 16. Han, Y., Lam, J. C. K., Li, V. O. K. & Zhang, Q. A domain-specific Bayesian deep-learning approach for air pollution forecast. IEEE Trans. Big Data 8, 1034–1046 (2022).
  • 17. Saenz, T., Fernando, M., Garcia, J. & Munoz, A. Nationwide air pollution forecasting with heterogeneous graph neural networks. ACM Trans. Intell. Syst. Technol. 15, 18–11819 (2023).
  • 18. Chen, B. et al. Geo-STO3Net: A deep neural network integrating geographical spatiotemporal information for surface ozone estimation. IEEE Trans. Geosci. Remote Sens. 62. 10.1109/TGRS.2024.3358397 (2024).
  • 19. Mushtaq, Z. et al. Satellite or ground-based measurements for air pollutants (pm2.5, pm10, so2, no2, o3) data and their health hazards: Which is most accurate and why? Environ. Monit. Assess. 10.1007/s10661-024-12462-z (2024).
  • 20. Yue, X. et al. AirPollutionViz: Visual analytics for understanding the spatio-temporal evolution of air pollution. J. Vis. 27(2), 215–233. 10.1007/s12650-024-00958-2 (2024).
  • 21. Wang, L. et al. Short-term pm2.5 prediction based on multi-modal meteorological data for consumer-grade meteorological electronic systems. IEEE Trans. Consum. Electron. 70(1), 3464–3474. 10.1109/TCE.2024.3354073 (2024).
  • 22. Xia, Y. et al. Understanding the disparities of pm2.5 air pollution in urban areas via deep support vector regression. Environ. Sci. Technol. 10.1021/acs.est.3c09177 (2024).
  • 23. Wu, Z., Tian, Y., Li, Y., Quan, M. & Liu, J. Prediction of air pollutant concentrations based on the long short-term memory neural network. J. Hazard. Mater. 465, 133099. 10.1016/j.jhazmat.2023.133099 (2024).
  • 24. Sharma, M. K. et al. Assessment of fine particulate matter for port city of eastern peninsular India using gradient boosting machine learning model. Atmosphere 13, 743. 10.3390/atmos13050743 (2022).
  • 25. Yu, M., Masrur, A. & Boxe, C. Predicting hourly pm2.5 concentrations in wildfire-prone areas using a spatiotemporal transformer model. Sci. Total Environ. 860, 160446. 10.1016/j.scitotenv.2022.160446 (2022).
  • 26. Saravanan, D. & Kumar, S. Improving air pollution detection accuracy and quality monitoring based on bidirectional RNN and the internet of things. Mater. Today Proc. 81, 791–796. 10.1016/j.matpr.2021.04.239 (2023).
  • 27. Central Pollution Control Board. Air pollution. https://cpcb.nic.in/air-pollution/ (2020). Accessed June 2024.
  • 28. Central Pollution Control Board. Real-time air quality data. https://cpcb.nic.in/real-time-air-qulity-data/ (2020). Accessed 1 September 2025.
  • 29. Khatibi, V. & Nikpour, P. Advancing multi-pollutant air quality forecasting using transformer-based informer architecture. Earth Sci. Inform. 18(1), 287. 10.1007/s12145-025-01722-2 (2025).
  • 30. Abualigah, L., Abd Elaziz, M., Sumari, P., Geem, Z. W. & Gandomi, A. H. A comprehensive review of the reptile search algorithm: Principles, applications, and future directions. Mathematics 13(6), 1001. 10.3390/math13061001 (2025).
  • 31. Zhou, R. et al. A multi-strategy enhanced reptile search algorithm for global optimization and engineering optimization design problems. J. Netw. Intell. 9(3), 93. 10.3390/jni9030093 (2024).
  • 32. Wu, Y.-X. & Wang, A.-C. An improved reptile search algorithm with novel mean transition mechanism for constrained industrial engineering problems. J. Netw. Intell. 9(3), 18. 10.3390/jni9030018 (2024).
  • 33. Hochreiter, S. & Schmidhuber, J. Long short-term memory. Neural Comput. 9(8), 1735–1780. 10.1162/neco.1997.9.8.1735 (1997).
  • 34. LeCun, Y., Bottou, L., Bengio, Y. & Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 86(11), 2278–2324 (1998).
  • 35. Goodfellow, I., Bengio, Y. & Courville, A. Deep Learning (MIT Press, 2016).
  • 36. Abualigah, L., Abd Elaziz, M., Sumari, P., Geem, Z. W. & Gandomi, A. H. Reptile search algorithm (RSA): A nature-inspired meta-heuristic optimizer. Expert Syst. Appl. 191, 116158. 10.1016/j.eswa.2021.116158 (2022).
  • 37. Chen, T. & Guestrin, C. XGBoost: A scalable tree boosting system. Preprint at arXiv:1603.02754 (2016).
  • 38. Friedman, J. Greedy function approximation: A gradient boosting machine. Ann. Stat. 29, 1189–1232. 10.2307/2699986 (2001).
  • 39. Sarkar, P., Saha, D. & Saha, M. Real-time air quality index detection through regression-based convolutional neural network model on captured images. Environ. Qual. Manag. 10.1002/tqem.22276 (2024).
  • 40. Yadav, V., Yadav, A., Singh, V. & Singh, T. Artificial neural network: An innovative approach in air pollutant prediction for environmental applications: A review. Results Eng. 10.1016/j.rineng.2024.102305 (2024).
  • 41. Subbiah, S., Paramasivan, S. & Thangavel, M. Prediction of particulate matter pm2.5 using bidirectional gated recurrent unit with feature selection. Global NEST J. 26 (2024).
  • 42. Sánchez, A. S., Nieto, P. J. G., Fernández, P. R., Coz Díaz, J. J. & Iglesias-Rodríguez, F. J. Application of an SVM-based regression model to the air quality study at local scale in the Avilés urban area (Spain). Math. Comput. Model. 54, 1453–1466 (2011).
  • 43. Ding, W. & Qie, X. Prediction of air pollutant concentrations via random forest regressor coupled with uncertainty analysis: A case study in Ningxia. Atmosphere 13, 960. 10.3390/atmos13060960 (2022).
  • 44. Fortunato, S. Community detection in graphs. Phys. Rep. 486, 75–174. 10.1016/j.physrep.2009.11.002 (2010).
  • 45. Borah, J. et al. Aicareair: Hybrid-ensemble internet-of-things sensing unit model for air pollutant control. IEEE Sens. J. 24(13), 21558–21565. 10.1109/JSEN.2024.3397735 (2024).
  • 46. Singh, S., Sharma, G. D., Singh Parihar, J. & Dev, D. Nexus between environmental degradation and climate change during the times of global conflict: Evidence from CS-ARDL model. Environ. Sustain. Indic. 22, 100368 (2024).
  • 47. Dey, S. Apict: Air pollution epidemiology using green AQI prediction during winter seasons in India. IEEE Trans. Sustain. Comput. 14 (2021).
  • 48. Ramachandran, A. World air quality index by city and coordinates. https://www.kaggle.com/datasets/world-air-quality-index-by-city-and-coordinates (2023). Accessed June 2024.
  • 49. Géron, A. Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow 2nd edn (O'Reilly Media, 2019).
  • 50. Dosovitskiy, A. et al. An image is worth 16x16 words: Transformers for image recognition at scale. Preprint at arXiv:2010.11929 (2020).
  • 51. Mo, H., Sun, H., Liu, J. & Wei, S. Developing window behavior models for residential buildings using XGBoost algorithm. Energy Build. 10.1016/j.enbuild.2019.109564 (2019).
  • 52. Rumelhart, D., Hinton, G. & Williams, R. Learning internal representations by error propagation. In Parallel Distributed Processing: Explorations in the Microstructure of Cognition Vol. 1 (eds Rumelhart, D. E. & McClelland, J. L.) 318–362 (MIT Press, 1986).
  • 53. Li, H., Yang, T., Du, Y., Tan, Y. & Wang, Z. Interpreting hourly mass concentrations of pm2.5 chemical components with an optimal deep-learning model. J. Environ. Sci. 151, 125–139. 10.1016/j.jes.2024.03.037 (2025).
  • 54. Shah, P. & Mishra, P. Analytical equations based prediction approach for pm2.5 using ANN. Preprint at arXiv:2002.11416 (2020).
  • 55. Sreenivasulu, T. & Rayalu, G. Enhanced pm2.5 prediction in Delhi using a novel STL-CNN-BILSTM-AM hybrid model. Environ. Syst. Res. 13(1), 48. 10.1007/s44273-024-00048-7 (2024).
  • 56. Sidhu, K. K., Balogun, H. & Oseni, K. O. Predictive modelling of air quality index (AQI) across diverse cities and states of India using machine learning: Investigating the influence of Punjab's stubble burning on AQI variability. Preprint at arXiv:2404.08702 (2024).


Data Availability Statement

The dataset and code used and analysed in this study are available from the corresponding author upon reasonable request (pkansal_phd22@thapar.edu).


