Scientific Reports. 2026 Feb 7;16:7665. doi: 10.1038/s41598-026-38436-4

Particulate matter (PM2.5 and PM10) prediction using Fourier series decomposition in combination with LSTM and SVM

Mohamed Bennis1, Mohamed Youssfi1, Rachida El Morabet2, Majed Alsubih3, Roohul Abad Khan3
PMCID: PMC12946331  PMID: 41654624

Abstract

Sustainable development globally is highly impacted by increased air pollution, which is attributed to a growing population and expanding commercial and industrial activities. Combustion gas emissions from transportation, social and other activities are a major cause of air pollution. To mitigate the adverse impact of air pollution on human health, forecasting PM10 and PM2.5 is a necessity. This study employs a Fourier series decomposition approach in combination with support vector machine and long short-term memory machine learning algorithms to predict PM10 and PM2.5. Hourly data was obtained from December 2020 to November 2021 for Mohammedia city in Morocco. The models’ performance was evaluated using RMSE, MAE and R2. The LSTMF and SVMF models, which combine the learners with Fourier series decomposition, performed better than the standalone SVM and LSTM models. For hourly PM10 prediction, the LSTMF model performed best during the autumn season, closely followed by the winter season. For PM2.5 prediction, the autumn-season model outperformed the models for all other seasons. These results were based on hourly prediction by season. This study also produced seven-step-ahead forecasts for PM10 and PM2.5. The LSTMF model performed best, with R2 values of 0.95 (winter), 0.93 (spring), 0.85 (summer) and 0.96 (autumn) for PM2.5. For PM10 the LSTMF performance was also good, with R2 values of 0.84 (winter), 0.92 (spring), 0.84 (summer), and 0.92 (autumn). This study highlights how hourly prediction can be used to identify patterns and trends in particulate matter concentration in advance. This will aid decision and policy makers in adopting mitigation measures and policies ahead of time to address air pollution during peak hours.

Keywords: Particulate matter, Seasonal variation, SVM, LSTM, Fourier series

Subject terms: Climate sciences, Environmental sciences, Environmental social sciences

Introduction

Events which can lead to health issues, injury, loss of life, or damage to livelihoods, infrastructure and logistic services are termed hazards1. Air pollution is one such hazard and is considered a serious risk to the environment and human health2. In recent years, the multifold increase in industrialization and urbanization has triggered extreme air pollution events that often gain global attention3. Ambient air pollution includes particulate matter (PM10 and PM2.5), generated artificially or naturally, that remains suspended in the atmosphere with aerodynamic diameters of up to 2.5 and 10 μm respectively4. Particulates are identified as the main culprit in air pollution because their concentrations are significantly dominant compared to other air pollutants5.

Deabji et al. (2025) studied health risk assessment and source apportionment of aerosol particles in Morocco for the city of Fez and the rural Atlas Mohammed V (AMV) site6. Bouma et al. (2025) assessed exposure methods based on nitrogen dioxide, black carbon, particulate matter and ultrafine particles7. Saidi et al. (2023) modelled air quality in the city of Marrakech, Morocco, in terms of particulate matter8. Oufdou et al. (2021) forecasted daily surface ozone concentrations using statistical models in the Grand Casablanca region of Morocco9. Sekmoudi et al. (2020) assessed air quality during the COVID-19 lockdown in terms of PM2.5 and nitrogen dioxide10. Ajdour et al. (2020) used CHIMERE/WRF for air quality modelling in the Agadir city of Morocco11. However, no such modelling has been carried out for the city of Mohammedia, Morocco.

Recently, several studies have applied decomposition and optimization approaches to remove noise from data series and improve model adaptation. Li and Li (2023) used the whale optimization algorithm (WOA) together with CEEMDAN decomposition to predict the air quality index (AQI)12. Wei and Du (2025) employed the improved chimp optimization algorithm (IChOA) for model optimization when predicting PM2.5. Another study13 also used a decomposition approach based on the whale optimization algorithm for forecasting PM2.5, and a further study14 compared LSTM and SVM models while employing a complete ensemble empirical mode decomposition approach for predicting PM2.5. Although there are several studies on modelling particulate matter, literature on modelling particulate matter for the city of Mohammedia, Morocco is still lacking. Hence this study was carried out to address this research gap. The novelty of this study is that it compares models for forecasting hourly PM2.5 and PM10 concentrations, and that it employs Fourier series and seasonal-trend decomposition using Loess for preprocessing. The objectives of this study are (a) hourly forecasting of PM2.5 and PM10 concentrations in Mohammedia city, (b) comparative performance evaluation of the support vector machine and long short-term memory algorithms, and (c) determining the impact of decomposing the data series on model performance. Figure 1 provides an overview of the proposed model architecture employed in this study.

Fig. 1. Proposed model for forecasting PM2.5 and PM10.

Methodology

Data collection

The authors obtained hourly PM2.5 and PM10 concentration data from the Department of Air Quality Service, National Climate Center, General Directorate of Meteorology, for Mohammedia city, Morocco. Mohammedia is an old port city located on the Atlantic Ocean coast, and currently hosts an oil refinery and adjacent industrial areas. The data cover December 2020 to November 2021 (Figs. 2 and 3); 70% of the data was used for training the models and 30% for testing. This study employs Fourier series transform methods as preprocessing techniques to decompose the PM2.5 and PM10 time series into their respective components: trend, seasonal, and residual components using STL, and sinusoidal frequency components using the Fourier transform. Preprocessing of data for modelling particulate matter is also reported by Ameri et al. (2023)14. Fourier decomposition breaks a periodic signal down into sine and cosine functions, separating complicated waveforms into simpler, more predictable parts. The outcome is a series in which each term is a harmonic, i.e., a multiple of the base frequency. The coefficients of these terms (a₀, aₙ, bₙ) give the amplitude of each sine and cosine wave. This is helpful for analysing signals in physics, engineering, and data compression. The Fourier transform gives a continuous version of this decomposition for signals that do not repeat.

Fig. 2. Hourly concentration of PM2.5 during winter (a), spring (b), summer (c) and autumn season (d).

Fig. 3. Hourly concentration of PM10 during winter (a), spring (b), summer (c) and autumn season (d).

For a periodic function f(t) with period T, the Fourier series is:

f(t) = a_0 + \sum_{n=1}^{\infty}\left[a_n \cos\!\left(\frac{2\pi n t}{T}\right) + b_n \sin\!\left(\frac{2\pi n t}{T}\right)\right]    (1)

a_0 is the average (constant) component; a_n and b_n are the coefficients that determine the amplitudes of the cosine and sine terms, and are calculated as:

a_0 = \frac{1}{T}\int_{0}^{T} f(t)\,dt    (2)
a_n = \frac{2}{T}\int_{0}^{T} f(t)\cos\!\left(\frac{2\pi n t}{T}\right) dt    (3)
b_n = \frac{2}{T}\int_{0}^{T} f(t)\sin\!\left(\frac{2\pi n t}{T}\right) dt    (4)
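As an illustration of Eqs. (1)–(4), the Fourier coefficients of a uniformly sampled periodic signal can be estimated by replacing the integrals with sample means over one period. The sketch below is ours, not the authors' code; the 24-hour test signal and all function names are illustrative:

```python
import numpy as np

def fourier_coefficients(f, T, n_harmonics, n_samples=4096):
    """Numerically estimate a0, an, bn of Eqs. (2)-(4) over one period of f."""
    t = np.linspace(0.0, T, n_samples, endpoint=False)
    y = f(t)
    a0 = y.mean()  # (1/T) * integral of f over one period
    an, bn = [], []
    for n in range(1, n_harmonics + 1):
        an.append(2.0 * np.mean(y * np.cos(2 * np.pi * n * t / T)))
        bn.append(2.0 * np.mean(y * np.sin(2 * np.pi * n * t / T)))
    return a0, np.array(an), np.array(bn)

def fourier_reconstruct(t, T, a0, an, bn):
    """Evaluate the truncated series of Eq. (1) at times t."""
    y = np.full_like(t, a0, dtype=float)
    for n, (a, b) in enumerate(zip(an, bn), start=1):
        y += a * np.cos(2 * np.pi * n * t / T) + b * np.sin(2 * np.pi * n * t / T)
    return y

# Illustrative signal with a daily (24 h) and a half-daily (12 h) cycle
T = 24.0
sig = lambda t: 20 + 5 * np.cos(2 * np.pi * t / 24) + 2 * np.sin(2 * np.pi * t / 12)
a0, an, bn = fourier_coefficients(sig, T, n_harmonics=3)
```

With exact harmonics of the base period, the recovered coefficients match the amplitudes used to build the signal, and the truncated series reproduces it.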

Dataset description

The study uses a proprietary dataset of continuous hourly measurements of PM₂.₅ and PM₁₀ concentrations. A total of 7,442 hourly observations were available for PM₂.₅, of which 214 values (2.88%) were missing. For PM₁₀, 7,441 hourly observations were recorded, with 215 missing values (2.89%). Summary descriptive statistics for both pollutants indicate substantial variability. The mean concentration was 23.91 µg/m³ for PM₂.₅ and 28.49 µg/m³ for PM₁₀, with standard deviations of 24.44 µg/m³ and 24.64 µg/m³, respectively. Both pollutants exhibited extreme peaks, reaching 841.88 µg/m³ for PM₂.₅ and 510.14 µg/m³ for PM₁₀. The distributions were right-skewed, as shown by the difference between the median and the upper percentiles: for PM₂.₅ the 25th, 50th, and 75th percentiles were 11.36 µg/m³, 18.74 µg/m³, and 29.98 µg/m³, while for PM₁₀ they were 14.64 µg/m³, 22.28 µg/m³, and 34.25 µg/m³. These statistics suggest the presence of episodic high-pollution events influencing the upper tail of the distributions. A gap-length distribution was also computed. Most missing segments were between 1 and 3 h, but both pollutants exhibited gaps of up to 77 consecutive hours, indicating that naive imputation would distort the temporal structure.

The structure of the gap lengths was also examined. For PM₂.₅, missing values were not uniformly distributed: although short gaps of one to three hours represented 60% of all gap occurrences, they accounted for only 5.13% of all missing points. Gaps of 48, 58, and 77 consecutive hours each occurred once, but together represented 85.51% of all missing points, with the longest uninterrupted gap of 77 h alone covering 35.98% of all missing PM₂.₅ values. A similar pattern was observed for PM₁₀. Short gaps occurred most frequently, constituting 63.63% of all gaps, yet represented only 5.59% of the missing observations. As with PM₂.₅, the majority of missing values came from a small number of extended gaps: a 77-hour gap alone represented 35.81% of all missing PM₁₀ values, while additional gaps of 48 and 58 h contributed 22.33% and 26.98%, respectively (Fig. 4). These findings indicate that missingness was not random but concentrated in a few prolonged outages, likely due to sensor malfunction or communication interruptions. This justified the use of a seasonal-aware imputation method, as long gaps cannot be adequately reconstructed using simple linear or forward-fill approaches.
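The gap-length distribution described above can be computed directly from an hourly series by measuring each contiguous run of missing values. A minimal sketch, using a hypothetical pandas series (helper name and toy data are ours):

```python
import numpy as np
import pandas as pd

def gap_lengths(series: pd.Series) -> pd.Series:
    """Lengths (in consecutive samples) of each run of missing values."""
    is_na = series.isna()
    # Label each contiguous run: the group id changes whenever is_na flips.
    group = (is_na != is_na.shift()).cumsum()
    runs = is_na.groupby(group).agg(['first', 'size'])
    # Keep only runs that are missing, and return their lengths.
    return runs.loc[runs['first'], 'size'].reset_index(drop=True)

# Hypothetical hourly series with one 3-h gap and one 5-h gap
idx = pd.date_range('2021-01-01', periods=24, freq='h')
x = pd.Series(np.arange(24.0), index=idx)
x.iloc[4:7] = np.nan
x.iloc[15:20] = np.nan
print(gap_lengths(x).tolist())  # [3, 5]
```

The same histogram of run lengths is what distinguishes the many short 1–3 h gaps from the few long 48–77 h outages reported above.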

Fig. 4. Distribution of missing gap lengths and Q-Q plots for PM2.5 and PM10.

To further evaluate the underlying data distribution, Q-Q plots against a theoretical normal distribution were generated for each pollutant. Both pollutants deviate significantly from normality. The middle portion of each distribution aligns moderately with the theoretical line, but strong upward curvature appears in the upper tail, indicating strong right-skewness. Extreme upper-tail deviations correspond to outlier pollution events.

Missing values handling

Missing observations were imputed before any other preprocessing step. Given that both PM₂.₅ and PM₁₀ exhibit strong and recurrent seasonality (daily, weekly, and monthly periodic patterns), simple interpolation techniques risked introducing bias or smoothing out real pollutant peaks. To address this, we adopted a seasonal interpolation approach that imputes missing values in three structured steps: (1) extract the seasonal profile, (2) interpolate the seasonally-adjusted residual series, and (3) recompose the series by adding the seasonal component back. Missing gaps ranged from single points to blocks of up to 77 h. Reconstructed segments were compared to nearby observed intervals to confirm imputation fidelity. The average RMSE was less than 5 µg/m³, indicating that the seasonal-aware interpolation maintained temporal integrity. Accordingly, two parameter configurations were used, as presented in Table 1.

Table 1.

Gap size of data, strategy and parameter values.

Gap size Strategy Parameter values
≤ 48 h aggressive interpolation preserving fast dynamics lo_frac = 0.05, lo_delta = 0.05
> 48 h smoother interpolation avoiding oscillation artifacts lo_frac = 0.04, lo_delta = 0.08

lo_frac controls how much of the series influences the local regression. Low values (0.04–0.05) ensure that interpolation remains local in time, reconstructing peaks rather than smoothing them away. lo_delta introduces linear interpolation when the sampling distance is sufficiently small, reducing computational burden while retaining fidelity. Although the time series exhibits daily, weekly, and seasonal variations, only a single seasonal period can be specified in the implementation of the seasonal interpolation algorithm. We evaluated multiple options (24, 168, and 744 h), and selected 24 h because the exploratory analysis revealed that the diurnal cycle is the strongest and most consistent pattern across the entire dataset. To explain further let the observed time series be denoted by Eq. 5.

X(t),\quad t = 1, 2, \ldots, N    (5)

A seasonal component S(t) was estimated using all observations occurring at the same hour on different days (dominant diurnal period = 24 h). This forms Eq. 6.

X(t) = S(t) + R(t)    (6)

Where:

S(t) captures the stable 24-hour periodic pattern,

R(t) represents the de-seasonalized residuals.

In practice, R(t) is the only part subject to interpolation; the seasonal pattern S(t) is untouched. The residuals were interpolated using locally weighted regression (LOESS), a non-parametric method that fits a local polynomial to a moving window of data:

\hat{R}(t) = \hat{\beta}_0(t) + \hat{\beta}_1(t)\,t,\qquad (\hat{\beta}_0, \hat{\beta}_1) = \arg\min_{\beta_0,\,\beta_1} \sum_{i \in \mathcal{N}(t)} w_i(t)\,\bigl(R(t_i) - \beta_0 - \beta_1 t_i\bigr)^2    (7)

After interpolating the residual series, the seasonal signal was reintroduced to restore the original time-series structure:

\hat{X}(t) = S(t) + \hat{R}(t)    (8)
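The three-step imputation of Eqs. (5)–(8) can be sketched in a few lines. For brevity, this sketch interpolates the residuals linearly rather than with LOESS, and uses a synthetic diurnal series; the function name and data are illustrative, not the authors' code:

```python
import numpy as np
import pandas as pd

def seasonal_impute(series: pd.Series, period: int = 24) -> pd.Series:
    """Three-step seasonal imputation: extract S(t), interpolate R(t), recompose."""
    # Step 1: seasonal profile S(t) = mean of all observations at the same phase
    phase = np.arange(len(series)) % period
    profile = series.groupby(phase).transform('mean')  # NaNs are ignored
    # Step 2: interpolate the de-seasonalized residuals R(t) = X(t) - S(t)
    residual = (series - profile).interpolate(limit_direction='both')
    # Step 3: recompose X̂(t) = S(t) + R̂(t) at the missing positions only
    return series.fillna(profile + residual)

# Synthetic hourly series with a strong diurnal cycle and a 6-h gap
t = np.arange(24 * 10)
x = pd.Series(20 + 10 * np.sin(2 * np.pi * t / 24))
x.iloc[50:56] = np.nan
filled = seasonal_impute(x, period=24)
```

Because only the residual is interpolated, the diurnal shape survives inside the gap instead of being flattened by a straight line.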

Multi-seasonality decomposition, outlier detection

Air quality measurements collected at hourly resolution exhibited several superimposed seasonal behaviours. In our dataset, exploratory analysis confirmed a daily cycle linked to traffic emissions, a weekly pattern differentiating weekdays and weekends, and longer-term seasonal variation associated with meteorology and heating demand. To correctly isolate these temporal structures, we used Multiple Seasonal-Trend decomposition using LOESS (MSTL). MSTL is an extension of STL that supports multiple seasonalities simultaneously through iterative LOESS smoothing. Using MSTL ensures that predictable recurrence is removed before anomaly detection, thereby avoiding the misclassification of legitimate daily pollution peaks as outliers. After decomposition, outlier detection was performed exclusively on the residual component. For anomaly detection, we employed the Isolation Forest algorithm, a tree-based method that isolates abnormal observations by recursively partitioning the feature space. Observations requiring fewer partitions were considered anomalous; these were removed and replaced using temporal interpolation.

For Multi-seasonality decomposition

Formally, each observation x(t), at timestamp t, is decomposed into:

x(t) = S_1(t) + S_2(t) + \cdots + S_k(t) + T(t) + R(t)    (9)

where:

  • S₁(t), S₂(t), …, Sₖ(t) are the seasonal components,

  • T(t) is the long-term trend,

  • R(t) is the residual component (irregular fluctuations and anomalies).

Seasonal components were defined through the expected periodicities of the data:

P_1 = 24 \text{ h (daily)},\quad P_2 = 168 \text{ h (weekly)},\quad P_3 = 744 \text{ h (monthly)}

For Outlier Detection :

The anomaly score for an observation is computed as:

s(t, n) = 2^{-E[h(t)]/c(n)}    (10)

where:

  • h(t) = average path length required to isolate the point R(t),

  • c(n) = normalization term based on expected path length for a dataset of size n.

In our implementation, n_estimators = 512 and contamination = 0.005 produced the most stable behavior. The large number of trees improved robustness, while the low contamination value reflected the expected low proportion of anomalies.
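A minimal sketch of residual-based outlier detection with the Isolation Forest settings quoted above (n_estimators = 512, contamination = 0.005). The deseasonalize helper here is a crude stand-in for MSTL that only subtracts per-period mean profiles, and the data are synthetic; it illustrates the pipeline, not the study's exact implementation:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

def deseasonalize(x: np.ndarray, periods=(24, 168)) -> np.ndarray:
    """Crude stand-in for MSTL: subtract the mean profile of each period."""
    resid = x.astype(float).copy()
    for p in periods:
        phase = np.arange(len(resid)) % p
        profile = np.array([resid[phase == k].mean() for k in range(p)])
        resid -= profile[phase]
    return resid

rng = np.random.default_rng(0)
t = np.arange(24 * 7 * 4)                        # four weeks of hourly data
x = 25 + 8 * np.sin(2 * np.pi * t / 24) + rng.normal(0, 1, t.size)
x[100] += 60                                     # inject one gross outlier

# Detect anomalies on the residual component only, as in the text
resid = deseasonalize(x)
iso = IsolationForest(n_estimators=512, contamination=0.005, random_state=0)
labels = iso.fit_predict(resid.reshape(-1, 1))   # -1 marks anomalies
anomalies = np.where(labels == -1)[0]
```

Detecting on the residual rather than the raw series is the key design choice: the injected spike is flagged, while the equally high but predictable daily peaks are not.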

Support vector machine (SVM)

A machine learning method for predicting continuous outcomes, Support Vector Regression (SVR) is based on Support Vector Machines (SVM). It finds a hyperplane that best approximates the data within a given error tolerance after converting input features into a high-dimensional space using a kernel function. The Radial Basis Function (RBF) kernel is one of the most popular kernel types because of how well it captures non-linear interactions.

A weighted sum of kernel evaluations between the input and a collection of training points called support vectors is used in SVR to make predictions. The model’s complexity and prediction accuracy are balanced by parameters such as the error tolerance ε, the kernel width γ, and the regularization constant C.

Support Vector Machine (SVM) algorithm is a reliable and flexible method that can be used for both classification and regression tasks. SVM functions similarly to classification in regression applications, with the main difference being how the loss function and margin are defined. By using kernel functions, the approach converts input data into a high-dimensional, nonlinear feature space, making it possible to choose the best hyperplane for prediction. Polynomial, sigmoid, linear, and radial basis function (RBF) kernels are examples of common kernel functions. Because of its ability to capture intricate, nonlinear interactions, the RBF kernel was used in this study to forecast PM2.5 concentrations. To improve predictive accuracy and guarantee model generalization, the SVM model was calibrated using an epsilon-insensitive loss function and a regularization parameter, with the epsilon value set to 1. The general equation used to predict regression function is presented in Eq. 11.

f(x) = \sum_{i=1}^{N}\left(\alpha_i - \alpha_i^{*}\right) K(x, x_i) + b    (11)

where f(x) is the predicted output (PM value), x the input feature vector, xᵢ the support vectors from the training data, αᵢ and αᵢ* the Lagrange multipliers determined during training, K(x, xᵢ) the kernel function measuring the similarity between the input and a support vector, b the bias term, and N the number of support vectors.
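Eq. (11) can be checked against a fitted model: scikit-learn's SVR exposes the support vectors xᵢ, the dual coefficients (αᵢ − αᵢ*) and the bias b, so the kernel expansion can be evaluated by hand and compared with the library's own prediction. A sketch on synthetic data (variable names are ours):

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)

model = SVR(kernel='rbf', C=100, gamma=0.01, epsilon=0.1).fit(X, y)

def predict_manual(x):
    """Evaluate Eq. (11) by hand: f(x) = sum_i (a_i - a_i*) K(x, x_i) + b."""
    sv = model.support_vectors_                  # x_i
    dual = model.dual_coef_.ravel()              # alpha_i - alpha_i*
    gamma = 0.01                                 # must match the SVR setting
    K = np.exp(-gamma * ((sv - x) ** 2).sum(axis=1))  # RBF kernel values
    return dual @ K + model.intercept_[0]        # plus bias term b

x0 = X[0]
```

The manual expansion and model.predict agree to machine precision, which is a useful sanity check when reporting the kernel form of the model.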

Long-short term memory (LSTM)

A sophisticated subtype of recurrent neural networks (RNNs), Long Short-Term Memory (LSTM) networks were created to solve the vanishing gradient issue that usually prevents traditional RNN architectures from training. Because LSTM models can capture long-term dependencies, they perform better in tasks that require sequential data processing, such as classification and time-series forecasting. LSTM gains its architectural advantage from three specific gates: the input gate, the forget gate, and the output gate. These gates work together to regulate the transmission, dismissal, and retention of data across time steps. In particular, the input gate controls how much fresh data enters the memory cell, the forget gate chooses which data from the previous state should be discarded, and the output gate controls which data should be transmitted to the next hidden state. Thanks to this gating mechanism, LSTM networks can effectively maintain pertinent temporal patterns while preventing learning loss over lengthy sequences14. The typical equations used in the LSTM algorithm are presented as Eq. 12–Eq. 17.

Forget gate

f_t = \sigma\bigl(W_f \cdot [h_{t-1}, x_t] + b_f\bigr)    (12)

Input gate

i_t = \sigma\bigl(W_i \cdot [h_{t-1}, x_t] + b_i\bigr)    (13)
\tilde{C}_t = \tanh\bigl(W_C \cdot [h_{t-1}, x_t] + b_C\bigr)    (14)

Cell state update

C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t    (15)

Output Gate

o_t = \sigma\bigl(W_o \cdot [h_{t-1}, x_t] + b_o\bigr)    (16)
h_t = o_t \odot \tanh(C_t)    (17)
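The gate equations (12)–(17) can be written out directly as a single forward step. The sketch below is a plain NumPy implementation with random weights, intended only to make the data flow through the gates concrete; it is not the trained model used in this study:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM forward step implementing Eqs. (12)-(17)."""
    z = np.concatenate([h_prev, x_t])        # concatenated [h_{t-1}, x_t]
    f = sigmoid(W['f'] @ z + b['f'])         # forget gate,   Eq. (12)
    i = sigmoid(W['i'] @ z + b['i'])         # input gate,    Eq. (13)
    c_tilde = np.tanh(W['c'] @ z + b['c'])   # candidate C~t, Eq. (14)
    c = f * c_prev + i * c_tilde             # cell update,   Eq. (15)
    o = sigmoid(W['o'] @ z + b['o'])         # output gate,   Eq. (16)
    h = o * np.tanh(c)                       # hidden state,  Eq. (17)
    return h, c

# Tiny sanity run: 2 inputs, 3 hidden units, random weights, zero biases
rng = np.random.default_rng(0)
n_in, n_hid = 2, 3
W = {k: rng.normal(size=(n_hid, n_hid + n_in)) for k in 'fico'}
b = {k: np.zeros(n_hid) for k in 'fico'}
h, c = np.zeros(n_hid), np.zeros(n_hid)
for x_t in rng.normal(size=(5, n_in)):
    h, c = lstm_step(x_t, h, c, W, b)
```

Note that the hidden state is always bounded (|h| < 1, since it is an output-gated tanh of the cell state), while the cell state C_t can accumulate information over many steps; this separation is what mitigates the vanishing gradient.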

To prevent data leakage, a 5-fold time-series cross-validation method was used to train and validate the SVM and LSTM models. The RBF kernel was used for SVM with a regularization constant C = 100, a kernel width γ = 0.01, and an error margin ε = 0.1; grid search was used to optimize these parameters on the training set. The LSTM model comprised two hidden layers (64 and 32 neurons) with ReLU activation, dropout of 0.2, the Adam optimizer (learning rate 0.001), and early stopping after 10 epochs without improvement. To ensure that there was no overlap with the test data, Fourier decomposition was applied only to the training section prior to model fitting. To verify the enhancement offered by the proposed hybrid models, model performance was compared with baseline persistence and moving-average predictors.
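The time-series cross-validation described above can be reproduced with scikit-learn's TimeSeriesSplit, which keeps every validation block strictly after its training block so no future information leaks into training. A sketch with the quoted SVM hyperparameters on a synthetic lagged series (the data and lag construction are illustrative):

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit, cross_val_score
from sklearn.svm import SVR

# Synthetic hourly PM-like series turned into lagged supervised features:
# predict each hour from the previous 24 hours
rng = np.random.default_rng(2)
t = np.arange(24 * 30)
pm = 20 + 5 * np.sin(2 * np.pi * t / 24) + rng.normal(0, 1, t.size)
lags = 24
X = np.array([pm[i - lags:i] for i in range(lags, len(pm))])
y = pm[lags:]

# 5 expanding-window folds: each validation block follows its training block
tscv = TimeSeriesSplit(n_splits=5)
svr = SVR(kernel='rbf', C=100, gamma=0.01, epsilon=0.1)
scores = cross_val_score(svr, X, y, cv=tscv, scoring='r2')
```

An ordinary shuffled k-fold would mix future hours into the training folds and overstate accuracy; the expanding-window scheme is the time-series-safe alternative.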

Performance evaluation metrics

The model performance evaluation metrics used in this study are Mean Absolute Error (MAE), Root Mean Square Error (RMSE) and the coefficient of determination (R2). These metrics were selected based on published literature that used similar metrics, keeping the study results comparable (Li and Li, 2023; Wei and Du, 2025; Zeng et al., 2024).

\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|    (18)
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}    (19)
R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}    (20)
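Eqs. (18)–(20) translate directly into code. A minimal sketch with a small worked example (the numbers are ours, for illustration only):

```python
import numpy as np

def mae(y, yhat):
    """Mean Absolute Error, Eq. (18)."""
    return np.mean(np.abs(y - yhat))

def rmse(y, yhat):
    """Root Mean Square Error, Eq. (19)."""
    return np.sqrt(np.mean((y - yhat) ** 2))

def r2(y, yhat):
    """Coefficient of determination, Eq. (20)."""
    ss_res = np.sum((y - yhat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot

# Worked example: errors are -2, 2, -3, 1
y = np.array([10.0, 20.0, 30.0, 40.0])
yhat = np.array([12.0, 18.0, 33.0, 39.0])
# mae = 2.0, rmse = sqrt(4.5) ~ 2.121, r2 = 1 - 18/500 = 0.964
```

RMSE penalizes large errors more heavily than MAE, which is why both are reported alongside R2 throughout the results tables.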

Results and discussion

This study compares the SVM and LSTM models for predicting PM2.5 and PM10 concentrations. Additionally, this study deployed a Fourier series decomposition approach to improve the prediction performance of the models; the models with decomposition applied are denoted SVMF and LSTMF. The metrics employed to evaluate model accuracy are MAE, RMSE and R2.

PM2.5 seasonal variation and modelling

PM2.5 concentration variation across the four seasons is presented in Fig. 5. During winter, the PM2.5 concentration averaged 18–24 µg/m³. The peak concentration was observed between 7 pm and 1 am, i.e., from sunset to late night, indicating commercial and recreational activity by the local population. During spring, the average concentration was 14–19 µg/m³, with two peaks: one between 1 pm and 3 pm, and a second, longer one between 5 pm and 12 am. During summer, average PM2.5 ranged from 18 to 28 µg/m³. Short peaks were observed between 7 pm and 9 pm, between 11 pm and 12 am, between 4 am and 6 am, and as a spike at 7 am; the highest peak occurred between 10 am and 1 pm. During autumn, the average PM2.5 concentration was 20–32 µg/m³. Peak hours started from 7 pm and rose until 2 am, after which there was a very sharp drop in PM2.5 concentration.

Fig. 5. PM2.5 time series plot of the observed and predicted values for the four seasons and the two best models.

From Table 2 it can be deduced that the models with the decomposition approach performed better than those without Fourier decomposition for PM2.5 concentration in terms of RMSE, MAE and R2. Across the four seasons (winter, spring, summer and autumn), the models performed best for the autumn season when predicting PM2.5. The prediction accuracy was consistently ordered SVM < LSTM < SVMF < LSTMF for all four seasons. Overall, Table 2 shows that decomposition using Fourier series helps the models perform better than models without data preprocessing. In terms of MAE, RMSE and R2, the SVMF and LSTMF models have better accuracy than the standalone SVM and LSTM models, as presented in Fig. 5.

Table 2.

PM2.5 evaluation metrics of the four models.

TIME 1AM 2AM 3AM 4AM 5AM 6AM 7AM 8AM 9AM 10AM 11AM 12PM 1PM 2PM 3PM 4PM 5PM 6PM 7PM 8PM 9PM 10PM 11PM 12AM
WINTER SVM MAE 5.23 4.44 4.00 4.02 5.55 3.61 6.44 4.58 4.23 4.73 5.04 4.62 6.38 4.73 7.85 4.78 4.06 8.54 5.41 7.52 9.04 8.51 8.98 8.66
RMSE 9.24 5.86 5.47 5.88 7.07 5.32 7.54 7.36 5.59 7.18 6.30 8.18 7.47 6.62 7.96 7.16 5.32 8.92 7.98 8.74 11.6 9.10 6.35 9.35
R2 0.44 0.54 0.58 0.48 0.51 0.76 0.56 0.63 0.72 0.76 0.63 0.50 0.53 0.40 0.43 0.46 0.45 0.35 0.51 0.39 0.46 0.44 0.54 0.88
LSTM MAE 5.89 5.99 3.06 2.83 4.70 2.71 3.95 4.38 3.45 3.96 4.51 5.31 3.54 4.48 5.09 4.08 3.75 4.73 4.18 9.45 8.97 6.83 2.94 6.11
RMSE 7.60 8.31 4.28 4.97 6.52 4.40 5.52 5.71 4.00 5.74 6.25 5.84 4.38 6.22 6.89 5.49 4.93 7.57 6.11 10.1 10.4 10.8 5.13 1.91
R2 0.42 0.33 0.57 0.67 0.59 0.66 0.52 0.70 0.78 0.77 0.67 0.42 0.76 0.48 0.47 0.49 0.52 0.53 0.46 0.40 0.39 0.43 0.71 0.79
SVMF MAE 3.64 2.64 2.93 2.24 2.33 2.18 3.97 2.86 1.89 2.02 2.49 3.46 7.76 3.76 6.00 3.70 2.47 8.33 2.88 7.45 8.20 6.77 9.10 7.63
RMSE 4.94 3.02 4.60 4.24 4.29 3.04 5.75 3.17 2.13 4.15 3.61 3.69 7.85 4.59 7.80 4.38 4.07 9.61 4.42 8.68 10.2 7.21 9.59 7.91
R2 0.60 0.73 0.85 0.73 0.89 0.93 0.78 0.88 0.77 0.75 0.73 0.85 0.72 0.70 0.71 0.65 0.71 0.54 0.65 0.61 0.52 0.66 0.70 0.76
LSTMF MAE 2.07 2.51 1.82 1.85 1.83 1.61 2.09 2.04 1.17 2.02 2.24 2.23 2.22 3.63 3.10 3.06 2.11 2.16 2.88 2.08 3.64 3.58 8.63 1.87
RMSE 3.67 3.97 3.06 2.64 3.16 2.87 3.72 2.43 1.96 3.10 3.11 3.59 3.81 3.81 4.06 3.33 3.10 3.88 4.01 2.76 3.86 3.98 9.28 8.77
R2 0.69 0.76 0.91 0.80 1.00 0.74 0.74 0.75 0.95 0.83 0.72 0.76 0.94 0.72 0.86 0.91 0.70 0.77 0.72 0.65 0.77 0.62 0.74 0.70
SPRING SVM MAE 4.96 4.70 4.40 4.45 6.01 3.92 6.76 4.81 4.52 5.36 4.79 4.58 8.01 5.86 8.48 5.65 4.86 ### 7.26 8.08 10.3 9.51 11.82 12.17
RMSE 8.77 6.20 6.02 6.52 7.66 5.77 7.91 7.73 5.98 8.13 5.99 8.11 9.38 8.20 8.59 8.46 6.36 ### ### 9.40 13.2 10.1 8.36 13.14
R2 0.46 0.52 0.54 0.45 0.48 0.72 0.54 0.61 0.68 0.70 0.66 0.51 0.44 0.34 0.41 0.41 0.39 0.26 0.38 0.36 0.40 0.39 0.41 0.63
LSTM MAE 5.59 6.34 3.37 3.14 5.10 2.95 4.15 4.60 3.69 4.48 4.29 5.26 4.45 5.56 5.50 4.82 4.49 6.58 5.60 10.1 10.2 7.64 3.87 8.59
RMSE 7.21 8.79 4.72 5.51 7.06 4.78 5.79 6.00 4.28 6.50 5.94 5.79 5.51 7.71 7.44 6.49 5.90 ### 8.20 10.9 12.0 12.1 6.76 2.69
R2 0.44 0.32 0.53 0.62 0.55 0.63 0.50 0.67 0.75 0.70 0.70 0.42 0.63 0.41 0.44 0.44 0.45 0.39 0.35 0.37 0.33 0.38 0.54 0.56
SVMF MAE 3.45 2.79 3.22 2.48 2.52 2.37 4.17 3.01 2.02 2.29 2.37 3.43 9.74 4.66 6.48 4.37 2.96 ### 3.87 8.01 9.39 7.56 11.97 10.72
RMSE 4.69 3.19 5.06 4.70 4.64 3.30 6.04 3.33 2.28 4.70 3.43 3.65 9.86 5.69 8.42 5.18 4.88 ### 5.94 9.33 11.7 8.06 12.62 11.12
R2 0.63 0.70 0.79 0.68 0.84 0.88 0.76 0.85 0.73 0.68 0.76 0.86 0.60 0.58 0.68 0.57 0.62 0.40 0.49 0.56 0.44 0.58 0.53 0.55
LSTMF MAE 1.96 2.65 2.01 2.05 1.99 1.75 2.19 2.14 1.25 2.29 2.13 2.21 2.78 4.50 3.35 3.61 2.53 3.02 3.87 2.24 4.17 4.00 11.35 2.64
RMSE 3.49 4.21 3.37 2.93 3.42 3.12 3.90 2.55 2.10 3.51 2.95 3.56 4.78 4.72 4.38 3.93 3.71 5.40 5.38 2.97 4.42 4.44 12.21 12.33
R2 0.72 0.73 0.85 0.74 0.94 0.70 0.71 0.72 0.90 0.76 0.74 0.77 0.78 0.60 0.81 0.81 0.61 0.57 0.55 0.60 0.66 0.54 0.56 0.50
SUMMER SVM MAE 5.03 4.56 4.32 4.84 6.30 4.14 7.48 4.78 4.20 5.00 7.43 6.56 6.50 4.66 7.92 5.16 4.03 8.64 5.54 6.97 7.51 7.09 8.62 9.04
RMSE 8.88 6.01 5.91 7.09 8.03 6.10 8.75 7.68 5.56 7.58 9.29 11.6 7.61 6.53 8.02 7.73 5.28 9.02 8.16 8.10 9.64 7.58 6.09 9.76
R2 0.46 0.53 0.55 0.42 0.46 0.69 0.50 0.61 0.72 0.74 0.41 0.35 0.53 0.40 0.43 0.44 0.46 0.35 0.50 0.42 0.54 0.52 0.56 0.85
LSTM MAE 5.66 6.15 3.31 3.41 5.34 3.11 4.59 4.57 3.43 4.18 6.65 7.54 3.61 4.42 5.13 4.40 3.72 4.78 4.28 8.76 7.45 5.69 2.82 6.38
RMSE 7.30 8.52 4.63 5.99 7.40 5.05 6.41 5.96 3.97 6.06 9.20 8.29 4.46 6.14 6.95 5.93 4.90 7.65 6.25 9.43 8.72 9.05 4.93 2.00
R2 0.44 0.33 0.54 0.58 0.53 0.60 0.46 0.67 0.79 0.74 0.44 0.29 0.75 0.49 0.46 0.47 0.52 0.52 0.45 0.43 0.45 0.50 0.73 0.76
SVMF MAE 3.50 2.71 3.16 2.70 2.64 2.50 4.61 2.99 1.88 2.13 3.67 4.91 7.90 3.71 6.05 4.00 2.45 8.43 2.95 6.91 6.81 5.64 8.73 7.96
RMSE 4.75 3.09 4.97 5.12 4.87 3.49 6.68 3.31 2.12 4.38 5.32 5.23 7.99 4.53 7.86 4.73 4.04 9.72 4.53 8.05 8.52 6.01 9.20 8.26
R2 0.62 0.72 0.80 0.63 0.81 0.84 0.70 0.85 0.77 0.72 0.48 0.60 0.71 0.70 0.71 0.61 0.71 0.53 0.64 0.65 0.61 0.77 0.72 0.74
LSTMF MAE 1.99 2.57 1.97 2.23 2.08 1.85 2.42 2.13 1.16 2.14 3.30 3.17 2.26 3.58 3.13 3.30 2.10 2.19 2.95 1.93 3.02 2.98 8.28 1.96
RMSE 3.53 4.08 3.31 3.19 3.59 3.29 4.32 2.54 1.95 3.28 4.58 5.10 3.88 3.76 4.09 3.59 3.08 3.92 4.11 2.56 3.21 3.31 8.91 9.16
R2 0.72 0.75 0.86 0.70 0.90 0.68 0.66 0.72 0.95 0.80 0.47 0.53 0.93 0.72 0.85 0.87 0.71 0.76 0.71 0.70 0.90 0.72 0.76 0.67
AUTUMN SVM MAE 5.60 5.10 4.05 4.07 5.24 3.63 6.05 4.11 3.82 4.70 4.54 4.40 6.24 4.76 7.79 5.09 4.12 8.62 5.28 6.04 7.46 7.01 9.69 8.97
RMSE 9.90 6.73 5.54 5.95 6.68 5.34 7.08 6.60 5.06 7.13 5.68 7.79 7.31 6.66 7.90 7.62 5.39 9.00 7.78 7.03 9.58 7.49 6.85 9.69
R2 0.41 0.49 0.57 0.48 0.52 0.76 0.58 0.68 0.77 0.77 0.68 0.52 0.54 0.40 0.43 0.44 0.45 0.35 0.51 0.47 0.55 0.52 0.51 0.85
LSTM MAE 6.31 6.88 3.10 2.86 4.44 2.73 3.71 3.93 3.12 3.93 4.07 5.05 3.47 4.51 5.05 4.34 3.80 4.77 4.08 7.60 7.40 5.63 3.17 6.34
RMSE 8.14 9.53 4.34 5.03 6.16 4.42 5.19 5.12 3.62 5.70 5.63 5.56 4.29 6.26 6.84 5.84 5.00 7.64 5.96 8.18 8.66 8.95 5.54 1.98
R2 0.40 0.30 0.57 0.67 0.61 0.66 0.54 0.75 0.84 0.77 0.72 0.43 0.77 0.48 0.47 0.47 0.51 0.52 0.47 0.48 0.45 0.51 0.67 0.77
SVMF MAE 3.90 3.03 2.96 2.27 2.20 2.19 3.73 2.57 1.71 2.00 2.25 3.29 7.59 3.78 5.96 3.94 2.50 8.41 2.81 5.99 6.77 5.57 9.82 7.91
RMSE 5.29 3.46 4.66 4.29 4.05 3.05 5.41 2.84 1.93 4.12 3.25 3.51 7.68 4.62 7.74 4.66 4.13 9.70 4.32 6.98 8.46 5.94 10.35 8.20
R2 0.57 0.65 0.84 0.72 0.93 0.93 0.81 0.95 0.82 0.75 0.79 0.88 0.73 0.69 0.72 0.62 0.70 0.54 0.66 0.73 0.61 0.77 0.65 0.74
LSTMF MAE 2.22 2.88 1.85 1.87 1.73 1.62 1.96 1.83 1.05 2.01 2.02 2.13 2.17 3.65 3.08 3.25 2.14 2.18 2.81 1.68 3.00 2.95 9.31 1.94
RMSE 3.94 4.56 3.10 2.68 2.98 2.88 3.49 2.18 1.78 3.08 2.80 3.42 3.73 3.84 4.03 3.54 3.14 3.91 3.91 2.22 3.19 3.27 10.01 9.09
R2 0.65 0.68 0.90 0.79 0.94 0.74 0.77 0.80 0.95 0.83 0.77 0.79 0.95 0.71 0.86 0.88 0.70 0.76 0.74 0.78 0.90 0.73 0.69 0.68

Kim et al. (2022)4 employed a tree-based machine learning algorithm, the light gradient boosting algorithm, to predict PM2.5 and reported an RMSE of 7.48 µg/m³ and an R2 of 0.83. Masood and Ahmad (2020)15 used an ANN to predict PM2.5 in the city of Delhi, India and reported an R2 of 0.856 for an average PM2.5 concentration of 137.77 µg/m³; when SVM was employed, it gave an R2 of 0.730. Tao et al. (2023)16 employed LSTM in combination with Map curve and Cramer's V for predicting PM2.5 concentration in Iraq and reported R2 values of 0.95 and 0.91 respectively. Seng et al. (2021)17 used an LSTM-based neural network for predicting PM2.5 and reported RMSE values in the range 8.85–10.48 µg/m³ and R2 values in the range 0.96–0.97. Meng et al. (2025)18 reported RMSE values in the ranges 18.39–24.78, 18.99–25.66 and 17.53–23.17, and R2 values in the ranges 0.91–0.942, 0.911–0.935 and 0.92–0.95, for three datasets respectively: PM2.5 in combination with PM10, NO2, SO2, O3 and CO (dataset-1), dataset-1 in combination with temperature, dew point, wind speed and pressure, and PM2.5 alone (dataset-3). They observed that the BiLSTM network in combination with an attention mechanism performed best when predicting PM2.5 individually rather than in combination with other air pollutant datasets.

PM10 seasonal variation and modelling

PM10 concentration variation across the four seasons is presented in Fig. 6. During winter, the PM10 concentration averaged 18–30 µg/m³. The peak concentration was observed between 8 pm and 12 am, attributable to local population activities such as dining out, increased traffic and commercial activity. During spring, the average concentration was 18–24 µg/m³. In contrast to the two peaks observed for PM2.5 in spring, the PM10 concentration rose continuously from 3 pm to 10 pm and then declined until 2 am. During summer, average PM10 was in the range 18–24 µg/m³, with peaks between 7 am and 1 pm and a further increase between 3 pm and 7 pm. During autumn, the average PM10 concentration was 20–41 µg/m³. Peak hours started from 6 pm and rose until 2 am, after which there was a sudden drop between 4 am and 5 am. Bathmanabhan and Saragur Madanayak (2010)19 reported PM10 concentrations of 189 µg/m³, 102 µg/m³ and 135 µg/m³ during the post-monsoon, winter and summer seasons respectively in the urban city of Chennai, India.

Fig. 6. PM10 time series plot of the observed and predicted values for the four seasons and the two best models.

Similar to the PM2.5 model performance, from Table 3 it can be deduced that the models with the decomposition approach performed better than those without Fourier decomposition for predicting PM10. Across the four seasons (winter, spring, summer and autumn), the models performed best for the autumn season when predicting PM10, followed closely by the winter season. The prediction accuracy was consistently ordered SVM < LSTM < SVMF < LSTMF for all four seasons. Overall, Table 3 shows that decomposition using Fourier series helps the models perform better than models without data preprocessing. In terms of MAE, RMSE and R2, the SVMF and LSTMF models have better accuracy than the standalone SVM and LSTM models, as presented in Fig. 6.

Table 3.

PM10 evaluation metrics of the four models.

TIME 1AM 2AM 3AM 4AM 5AM 6AM 7AM 8AM 9AM 10AM 11AM 12PM 1PM 2PM 3PM 4PM 5PM 6PM 7PM 8PM 9PM 10PM 11PM 12AM
WINTER SVM MAE 5.58 4.81 3.73 4.36 5.25 3.89 6.81 4.26 4.51 4.40 5.48 4.90 6.02 4.33 8.51 4.51 4.42 8.98 5.86 6.87 8.52 9.25 9.52 9.31
RMSE 10.1 6.30 5.17 6.30 6.42 5.80 8.00 6.68 5.90 7.77 5.88 8.62 6.73 7.12 7.46 6.58 4.89 8.25 8.38 9.51 12.2 9.95 6.69 10.25
R2 0.47 0.59 0.55 0.44 0.46 0.68 0.52 0.68 0.66 0.82 0.67 0.55 0.50 0.44 0.41 0.49 0.49 0.39 0.55 0.42 0.43 0.42 0.59 0.81
LSTM MAE 6.33 5.69 3.23 2.67 4.98 2.55 3.58 4.64 3.14 3.64 4.27 5.73 3.76 4.10 5.38 3.73 3.42 5.06 3.91 10.2 9.62 7.46 3.23 5.52
RMSE 7.17 9.02 4.03 5.34 7.15 4.14 5.80 6.20 4.24 6.17 6.80 5.41 4.07 5.67 6.40 5.01 4.47 7.99 5.56 10.8 11.2 9.99 4.68 2.04
R2 0.45 0.35 0.61 0.62 0.63 0.71 0.55 0.65 0.74 0.82 0.64 0.38 0.69 0.51 0.50 0.54 0.48 0.57 0.50 0.43 0.41 0.40 0.67 0.85
SVMF MAE 3.36 2.49 2.65 2.05 2.21 2.07 4.25 2.59 2.00 2.19 2.33 3.26 8.21 4.00 5.47 3.96 2.62 7.88 3.04 7.06 8.84 6.35 8.21 7.04
RMSE 5.40 3.19 4.97 4.57 3.95 3.31 5.22 2.93 1.96 4.44 3.83 3.45 7.07 4.18 7.41 4.65 3.87 8.69 4.06 7.90 11.2 7.85 10.21 7.46
R2 0.56 0.78 0.78 0.69 0.81 0.84 0.85 0.94 0.83 0.70 0.78 0.80 0.66 0.64 0.78 0.59 0.78 0.50 0.69 0.56 0.49 0.59 0.75 0.84
LSTMF MAE 2.23 2.65 1.71 1.74 1.95 1.48 2.21 2.19 1.26 1.87 2.06 2.40 2.06 3.95 3.27 2.82 1.91 2.33 2.63 2.25 3.88 3.79 9.21 2.02
RMSE 3.38 3.74 3.24 2.90 3.33 3.04 3.96 2.60 1.77 2.91 3.28 3.37 4.07 4.08 4.41 3.62 3.31 4.11 3.71 3.01 3.64 4.23 9.83 9.29
R2 0.73 0.82 0.86 0.75 0.96 0.69 0.80 0.67 0.91 0.78 0.75 0.71 0.86 0.77 0.81 0.96 0.77 0.70 0.66 0.58 0.81 0.59 0.69 0.65
SPRING SVM MAE 4.64 5.11 4.81 4.82 5.62 4.22 7.24 4.42 4.26 5.77 5.09 4.14 8.74 6.34 7.73 6.14 4.60 12.7 7.96 7.55 10.8 10.3 12.68 11.16
RMSE 9.32 6.58 6.35 5.99 6.93 6.07 8.68 8.40 6.54 8.81 6.43 8.76 8.57 7.67 9.37 9.14 5.92 13.2 11.2 10.1 14.5 11.0 7.81 11.90
R2 0.50 0.48 0.50 0.49 0.44 0.79 0.59 0.67 0.75 0.64 0.62 0.46 0.47 0.31 0.43 0.39 0.36 0.29 0.41 0.33 0.43 0.36 0.45 0.66
LSTM MAE 5.99 5.87 3.65 2.92 5.56 3.13 4.36 4.34 3.94 4.11 3.94 4.85 4.01 5.94 5.12 5.07 4.21 5.98 5.30 11.0 11.0 6.98 3.51 7.87
RMSE 6.69 9.34 5.09 5.83 7.62 5.24 5.28 5.54 4.53 6.90 5.34 6.20 5.89 8.36 6.85 6.88 6.24 9.88 7.57 11.5 13.2 13.2 6.29 2.86
R2 0.41 0.35 0.50 0.56 0.52 0.57 0.53 0.63 0.70 0.64 0.74 0.39 0.59 0.44 0.47 0.41 0.48 0.41 0.37 0.40 0.30 0.40 0.50 0.60
SVMF MAE 3.11 2.63 2.98 2.27 2.38 2.25 4.49 2.71 2.18 2.42 2.55 3.20 9.00 5.10 5.87 4.62 2.74 12.4 4.18 7.30 10.3 8.02 11.37 11.54
RMSE 4.97 2.97 4.63 5.08 4.34 3.02 6.53 3.07 2.15 5.02 3.17 3.96 10.5 5.28 9.26 5.61 5.26 14.0 6.42 8.43 12.8 7.36 13.72 10.17
R2 0.57 0.63 0.71 0.72 0.91 0.83 0.83 0.80 0.77 0.75 0.71 0.91 0.56 0.55 0.61 0.61 0.57 0.43 0.53 0.59 0.42 0.55 0.57 0.50
LSTMF MAE 2.06 2.51 1.85 1.89 2.10 1.85 2.40 2.35 1.34 2.49 2.34 2.37 2.62 4.86 3.57 3.34 2.30 2.85 4.19 2.07 3.90 3.61 12.11 2.47
RMSE 3.75 3.99 3.08 3.17 3.64 3.36 3.68 2.41 1.99 3.16 3.22 3.33 4.34 4.45 4.12 3.62 3.38 4.94 4.99 2.74 4.12 4.86 11.14 11.47
R2 0.77 0.78 0.79 0.80 0.88 0.74 0.76 0.67 0.83 0.71 0.78 0.82 0.70 0.55 0.89 0.85 0.67 0.60 0.58 0.63 0.69 0.58 0.52 0.46
SUMMER SVM MAE 4.56 4.27 4.68 4.37 5.98 3.76 6.93 4.38 4.60 4.54 8.05 7.04 5.85 4.97 7.14 4.77 4.40 8.16 6.01 7.33 7.95 7.70 9.13 8.45
RMSE 9.47 5.61 6.50 7.78 7.58 6.64 9.24 8.11 6.04 8.08 8.41 10.9 8.36 6.88 7.43 6.98 5.60 8.25 8.93 8.86 8.96 7.16 5.64 9.14
R2 0.41 0.56 0.51 0.46 0.49 0.73 0.46 0.64 0.67 0.81 0.44 0.32 0.57 0.37 0.46 0.40 0.41 0.38 0.46 0.39 0.58 0.57 0.51 0.79
LSTM MAE 5.12 6.51 3.62 3.13 4.97 3.28 4.93 4.30 3.69 4.51 6.16 8.21 3.92 4.77 4.63 4.67 3.47 4.46 4.05 9.49 7.86 6.14 2.56 5.97
RMSE 7.81 8.06 4.96 5.42 6.78 5.44 7.04 5.64 3.60 5.61 8.69 7.47 4.03 6.61 7.34 6.44 5.27 8.39 5.92 10.3 9.55 8.51 5.40 1.88
R2 0.40 0.35 0.49 0.64 0.50 0.66 0.42 0.72 0.86 0.79 0.40 0.31 0.79 0.46 0.49 0.44 0.48 0.56 0.42 0.47 0.42 0.53 0.68 0.71
SVMF MAE 3.79 2.46 2.97 2.89 2.85 2.31 5.01 3.23 1.75 2.32 3.96 4.57 7.16 4.07 6.58 3.70 2.32 7.66 2.71 6.52 7.20 6.14 9.23 7.18
RMSE 5.20 3.32 5.24 5.53 4.41 3.17 7.22 2.98 2.32 4.66 5.70 5.56 8.52 4.96 7.20 5.10 3.72 8.87 4.18 7.45 8.04 5.70 8.32 8.77
R2 0.59 0.77 0.84 0.67 0.89 0.92 0.63 0.81 0.82 0.67 0.52 0.55 0.64 0.66 0.66 0.57 0.65 0.57 0.59 0.62 0.56 0.71 0.66 0.68
LSTMF MAE 2.14 2.81 1.79 2.42 1.97 1.98 2.25 2.30 1.08 1.93 3.12 2.92 2.48 3.86 2.91 3.02 1.97 2.33 3.21 1.79 3.29 3.19 7.59 1.78
RMSE 3.87 3.87 3.04 2.92 3.33 3.46 4.73 2.72 1.77 2.96 4.90 5.36 3.66 3.99 4.43 3.85 2.89 3.65 4.39 2.35 2.95 3.05 8.29 8.31
R2 0.66 0.82 0.92 0.76 0.97 0.63 0.60 0.67 0.87 0.85 0.50 0.50 0.85 0.67 0.91 0.93 0.66 0.70 0.77 0.73 0.81 0.76 0.69 0.73
AUTUMN SVM MAE 5.97 4.68 4.46 3.74 4.73 3.41 5.73 4.33 4.04 5.07 4.09 4.64 5.66 5.00 8.36 4.58 4.46 9.17 5.63 5.68 8.07 6.51 10.49 9.42
RMSE 9.25 6.20 5.05 6.35 6.14 5.00 6.46 5.96 5.54 7.60 6.21 8.47 7.83 6.10 8.56 8.05 5.03 9.75 7.02 7.46 8.84 8.00 7.45 8.93
R2 0.38 0.46 0.63 0.52 0.49 0.72 0.53 0.63 0.81 0.81 0.71 0.56 0.58 0.37 0.47 0.40 0.41 0.38 0.46 0.43 0.51 0.48 0.55 0.79
LSTM MAE 5.83 6.20 3.40 3.07 4.20 2.54 3.90 3.69 3.30 4.25 4.36 4.60 3.72 4.78 5.35 4.65 3.53 5.15 3.68 6.95 6.70 5.27 2.98 6.89
RMSE 7.57 10.0 3.99 5.49 5.85 4.00 5.49 4.81 3.41 5.41 5.08 5.23 4.61 6.78 7.30 6.33 4.72 8.16 6.36 8.59 9.42 9.64 5.15 1.85
R2 0.38 0.27 0.53 0.63 0.65 0.60 0.59 0.79 0.89 0.82 0.79 0.39 0.84 0.44 0.50 0.51 0.48 0.57 0.51 0.52 0.48 0.47 0.73 0.81
SVMF MAE 3.56 3.27 2.78 2.42 2.06 2.01 3.47 2.77 1.59 2.18 2.07 3.03 7.08 4.00 6.35 3.68 2.34 7.67 2.60 6.45 6.34 5.02 8.96 7.32
RMSE 4.77 3.70 4.36 4.01 3.73 3.31 5.00 2.57 1.79 3.76 3.08 3.17 7.24 4.22 7.17 4.29 3.80 8.89 4.73 7.37 9.17 6.36 9.68 8.77
R2 0.52 0.62 0.76 0.77 0.87 0.91 0.77 0.93 0.75 0.70 0.85 0.79 0.66 0.74 0.79 0.57 0.65 0.48 0.60 0.77 0.66 0.85 0.70 0.80
LSTMF MAE 2.09 3.04 1.68 1.70 1.84 1.71 1.80 1.72 0.99 1.82 2.18 2.00 1.99 3.36 3.36 3.05 2.00 2.33 2.62 1.54 3.20 3.14 8.45 1.78
RMSE 3.72 4.97 2.89 2.87 2.69 3.12 3.27 2.31 1.88 2.82 2.97 3.69 3.44 4.04 4.32 3.88 3.36 4.23 3.57 2.07 3.38 3.44 10.97 8.31
R2 0.71 0.63 0.82 0.74 0.98 0.69 0.71 0.74 0.92 0.88 0.69 0.72 0.88 0.66 0.78 0.83 0.76 0.70 0.81 0.84 0.95 0.66 0.63 0.72

Kim et al. (2022) reported tree-based learning algorithms predicting PM10 with an R2 of 0.86 and an RMSE of 13.15 µg m−3.4 Gualtieri et al. (2025) assessed the precision of the Copernicus Atmosphere Monitoring Service European forecasts of PM10, with RMSE values in the range of 22.2 µg m−3 to 37.8 µg m−3.20 Alsowaidan et al. (2024) used an XGBoost model to predict PM10 and reported an RMSE of 0.255 and R2 of 0.999, while a random forest approach gave an RMSE of 6.295 and R2 of 0.889.21 Gaowa et al. (2024) predicted PM10 concentration with R2 in the range of 0.78 to 0.81 using random forest, multilayer neural network and long short-term memory neural network approaches.22 Lee et al. (2023) developed an AI model to predict PM10 concentration from chemical and physical data with an accuracy of 90%.23 Zhai and Cheng (2020) predicted PM10 concentration using LSTM and MLR (multiple linear regression) models and observed MAE between 34 µg m−3 and 59 µg m−3 and RMSE between 35 µg m−3 and 49 µg m−3.24

Discussion

Figures 5 and 6 show the measured time series as dashed lines for all four seasons, together with forecasts from the two better-performing models: SVMF (red line) and LSTMF (green line). In both figures the forecasts track the measured PM2.5 and PM10 values closely, demonstrating that the models accurately capture the particulate matter trends. Tables 2 and 3 show that hourly prediction accuracy ranges from average to good. Previous studies have generally reported overall prediction accuracy computed from hourly data rather than attempting hour-by-hour prediction. However, for air quality assessment the peak hours are particularly responsible for adverse health impacts, and they are also the periods when the general population is most exposed to air pollution; this makes hourly prediction capability essential. One caveat is that the noise-cancellation step also removes sudden peaks. Since sudden extreme peaks are rare events, cancelling them should not materially compromise the forecasts' value for health risk estimation.

To capture the general trend, Fig. 7 presents seven-day-ahead predictions from each model used in this study; the R2 values for the best model, LSTMF, are included in the figure. From Fig. 7 it can be inferred that each model predicts particulate matter concentration with good precision. For the LSTMF model, the RMSE was 3.95 µg m−3 and the MAE 2.66 µg m−3 in winter; in spring the MAE was 3.06 µg m−3 and the RMSE 4.28 µg m−3; in summer the MAE was 2.67 µg m−3 and the RMSE 3.94 µg m−3; and in autumn the MAE was 2.47 µg m−3 and the RMSE 3.84 µg m−3. These results show that although hour-wise prediction has lower performance, the hourly models still perform well, and the Fourier decomposition approach enhances the prediction performance of both the SVM and LSTM models. Ameri et al. (2023) employed a CEEMDAN decomposition approach and reported results similar to this study, concluding that preprocessing the data does improve prediction accuracy.14 Chu et al. (2023) used an ensemble empirical decomposition approach in combination with CEEMDAN to predict PM2.5 with an R2 of 0.91.25 Zeng et al. (2024) predicted PM2.5 by combining LSTM with variational mode decomposition (VMD) and the whale optimization algorithm (WOA), reporting RMSE values of 18.12, 20.17 and 5.36 for the 1–6 h, 7–12 h and 13–24 h intervals respectively.13 These studies suggest that model accuracy improves with longer prediction intervals, consistent with the lower hour-wise accuracy observed here; accordingly, when this study used the hourly data without splitting it into hour-wise intervals, model accuracy increased significantly.
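The study's exact Fourier decomposition settings are not reproduced here, but the underlying idea, keeping only the low-frequency harmonics of the series and discarding the high-frequency residual as noise before training the SVM/LSTM, can be sketched as follows. The function name, harmonic count, and synthetic signal are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def fourier_smooth(x, n_harmonics=10):
    """Reconstruct a series from its n_harmonics lowest-frequency Fourier
    terms; the discarded high-frequency remainder acts as cancelled noise."""
    coeffs = np.fft.rfft(x)
    filtered = np.zeros_like(coeffs)
    filtered[: n_harmonics + 1] = coeffs[: n_harmonics + 1]  # keep DC + harmonics
    return np.fft.irfft(filtered, n=len(x))

# Synthetic hourly PM series: a daily cycle plus noise (illustrative only).
rng = np.random.default_rng(1)
t = np.arange(24 * 28)                       # four weeks of hourly samples
signal = 25 + 6 * np.sin(2 * np.pi * t / 24) + rng.normal(0, 3, t.size)

smooth = fourier_smooth(signal, n_harmonics=30)
residual = signal - smooth
# The smoothed series retains the daily cycle but far less noise variance.
print(np.var(residual) < np.var(signal))
```

The smoothed component (and, in hybrid schemes, the residual as well) would then be fed to the downstream regressor; as noted above, this filtering also attenuates rare extreme spikes.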

Fig. 7

Scatter plots of measured versus forecasted PM2.5 and PM10 for the 7-day-ahead predictions across all four seasons.

Conclusion

This study was carried out to forecast PM2.5 and PM10 in the city of Mohammedia, Morocco. Support vector machine and long short-term memory models were used for prediction, with a Fourier series decomposition approach employed for data preprocessing. The study evaluated model performance with and without the decomposition step. Integrating the decomposition approach significantly improved the accuracy of both the SVM and LSTM models. Among all the models, LSTMF performed best and adapted well to both long-term and short-term fluctuations.

The seasonal predictions provided insight into the variation of particulate matter and allowed the impact of season on model performance to be evaluated. The LSTMF model performed best in the autumn season for predicting both PM2.5 and PM10. From the results it can be concluded that combining deep learning algorithms with mathematical decomposition provides a robust framework for air quality forecasting.

The results also show that hourly prediction reduces model performance. Nevertheless, hourly prediction can aid decision makers and policy developers in identifying air pollution peaks and devising mitigation methods and strategies to curb short-term peaks of particulate matter. Future work could explore hybrid models or the inclusion of meteorological variables to further enhance forecasting performance, with potential applications in early warning systems and real-time pollution management strategies.

The use of data from a single monitoring site may limit how broadly the results can be applied to other metropolitan settings. The models use only historical particulate matter concentrations and do not explicitly include meteorological or emission-related variables, which could further enhance forecasting ability, especially during extreme events. Additionally, the decomposition and interpolation techniques applied may attenuate infrequent short-term pollution spikes. Future research should consider longer forecasting horizons, more predictors, and multi-site datasets.

Funding

The authors extend their appreciation to the Deanship of Research and Graduate Studies at King Khalid University for funding this work through Large Research Project under grant number RGP2/67/46. The database used in this article was funded under project scheme, programme ibn khaldoun d appui à la recherche dans le domaine des sciences humaines et sociales, CNRST, Morocco, for project entitled “Health security in Casablanca” project number IK/2018/23.

Author contributions

M.B. contributed to methodology, data curation, and writing – original draft. M.Y. was responsible for conceptualization, validation, supervision, and resources. R.E.M. carried out formal analysis, investigation, software development, and visualization. M.A. contributed to methodology and review & editing. R.A.K. handled project administration and contributed to writing. All authors reviewed and approved the final manuscript.

Data availability

The datasets analysed during the current study were obtained from the “Direction de la Météorologie Nationale” (Morocco) and are not publicly available due to data sharing restrictions. However, they are available from the corresponding author (Bennis Mohamed) upon reasonable request.

Declarations

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

1. Anggraini, T. S., Irie, H., Sakti, A. D. & Wikantika, K. Global air quality index prediction using integrated spatial observation data and geographics machine learning. Sci. Remote Sens. 11, 100197 (2025).
2. Ravindiran, G. et al. Ensemble stacking of machine learning models for air quality prediction for Hyderabad City in India. iScience 28, 111894 (2025).
3. Wei, M. & Du, X. Apply a deep learning hybrid model optimized by an improved chimp optimization algorithm in PM2.5 prediction. Mach. Learn. Appl. 19, 100624 (2025).
4. Kim, B. Y., Lim, Y. K. & Cha, J. W. Short-term prediction of particulate matter (PM10 and PM2.5) in Seoul, South Korea using tree-based machine learning algorithms. Atmos. Pollut. Res. 13, 101547 (2022).
5. Srijiranon, K. Neuro-fuzzy transformation with minimize entropy principle to create new features for particulate matter prediction. Appl. Sci. 11, 16 (2021).
6. Deabji, N. et al. A twin site study of size-resolved composition, source apportionment and health impacts of aerosol particles in Morocco. Atmos. Environ. 355, 121273 (2025).
7. Bouma, F. et al. Comparison of air pollution mortality effect estimates using different long-term exposure assessment modelling methods. Environ. Res. 279, 121832 (2025).
8. Saidi, L., Valari, M. & Ouarzazi, J. Air quality modeling in the city of Marrakech, Morocco using a local anthropogenic emission inventory. Atmos. Environ. 293, 119445 (2023).
9. Oufdou, H., Bellanger, L., Bergam, A. & Khomsi, K. Forecasting daily surface ozone concentration in the grand Casablanca region using parametric and nonparametric statistical models. Atmosphere (Basel) 12, 19 (2021).
10. Sekmoudi, I., Khomsi, K. & Faieq, S. COVID-19 lockdown improves air quality in Morocco. (2020).
11. Ajdour, A. et al. Towards air quality modeling in Agadir City (Morocco). Mater. Today Proc. 24, 17–23 (2020).
12. Li, Y. & Li, R. A hybrid model for daily air quality index prediction and its performance in the face of impact effect of COVID-19 lockdown. Process. Saf. Environ. Prot. 176, 673–684 (2023).
13. Zeng, T. et al. A hybrid optimization prediction model for PM2.5 based on VMD and deep learning. Atmos. Pollut. Res. 15, 102152 (2024).
14. Ameri, R. et al. Forecasting PM2.5 concentration based on integrating of CEEMDAN decomposition method with SVM and LSTM. Ecotoxicol. Environ. Saf. 266, 115572 (2023).
15. Masood, A. & Ahmad, K. A model for particulate matter (PM2.5) prediction for Delhi based on machine learning approaches. Procedia Comput. Sci. 167, 2101–2110 (2020).
16. Tao, H. et al. Machine learning algorithms for high-resolution prediction of spatiotemporal distribution of air pollution from meteorological and soil parameters. Environ. Int. 175, 107931 (2023).
17. Seng, D., Zhang, Q., Zhang, X., Chen, G. & Chen, X. Spatiotemporal prediction of air quality based on LSTM neural network. Alexandria Eng. J. 60, 2021–2032 (2021).
18. Meng, X., Xie, C., Tang, X. & Pan, Y. Prediction of PM2.5 concentration based on attention mechanism and convolutional BiLSTM network. Discov. Appl. Sci. (2025).
19. Bathmanabhan, S. & Saragur Madanayak, S. N. Analysis and interpretation of particulate matter – PM10, PM2.5 and PM1 emissions from the heterogeneous traffic near an urban roadway. Atmos. Pollut. Res. 1, 184–194 (2010).
20. Gualtieri, G. et al. Assessing capability of Copernicus Atmosphere Monitoring Service to forecast PM2.5 and PM10 hourly concentrations in a European air quality hotspot. Atmos. Pollut. Res. 16, 102567 (2025).
21. Alsowaidan, S., Al-Hurban, A., Alsaber, A. & Anbar, A. Assessment of seasonal variations in the air quality index (2019–2022) in Al-Jahra city, Kuwait. Kuwait J. Sci. 51, 100280 (2024).
22. Gaowa, S. et al. Using artificial neural networks to predict indoor particulate matter and TVOC concentration in an office building: model selection and method development. Energy Built Environ. doi: 10.1016/j.enbenv.2024.03.001 (2024).
23. Lee, J. Y. Y., Miao, Y., Chau, R. L. T., Hernandez, M. & Lee, P. K. H. Artificial intelligence-based prediction of indoor bioaerosol concentrations from indoor air quality sensor data. Environ. Int. 174, 107900 (2023).
24. Zhai, W. & Cheng, C. A long short-term memory approach to predicting air quality based on social media data. Atmos. Environ. 237, 117411 (2020).
25. Chu, Y. et al. Three-hourly PM2.5 and O3 concentrations prediction based on time series decomposition and LSTM model with attention mechanism. Atmos. Pollut. Res. 14, 101879 (2023).


Articles from Scientific Reports are provided here courtesy of Nature Publishing Group
