Abstract
Time series prediction is widely applied in transportation management, energy scheduling, and weather forecasting, and deep learning has emerged as a dominant approach owing to its powerful temporal feature extraction capabilities. However, the predictive performance of deep learning models depends heavily on training strategies, and traditional optimization methods have limited efficiency, which ultimately affects model performance. To address this issue, this study designs a Hybrid Optimization Expert System (HOES) based on six evolutionary algorithms to optimize deep learning models for time series prediction. HOES integrates multiple evolutionary algorithms and employs a transmission mechanism, a memory system, and a punishment system to achieve collaborative optimization. Specifically, the transmission mechanism enhances global search capability, the memory system preserves historically optimal solutions to prevent search degradation, and the punishment system eliminates ineffective optimization strategies; together, these mechanisms improve optimization efficiency. Experiments on six public datasets (Traffic, Weather, Household, Wind Power, Solar Power, and ETT_m1) demonstrate that HOES improves predictive accuracy and convergence rate, with SJ-LSTM used for validation as a representative time series forecasting model. For example, on the Solar Power dataset, the HOES-optimized SJ-LSTM model achieved a 24% reduction in RMSE and a 30% reduction in MAE compared with the second-best optimization algorithm. These findings indicate that HOES significantly enhances the global optimization capability of deep learning models and mitigates the risk of local optima, making it well suited to complex time series prediction tasks.
Keywords: Time series prediction, Deep learning, Evolutionary algorithms, Hybrid optimization expert system
Subject terms: Electrical and electronic engineering, Information technology, Computer science, Computational science
Introduction
Time series prediction is essential in transportation management1, energy scheduling2, and weather forecasting3, enabling efficient decision-making and resource allocation. Deep learning has become a dominant approach due to its strong temporal feature extraction capabilities, achieving state-of-the-art performance in complex forecasting tasks4,5 as time series data continue to grow in scale and complexity.
However, the predictive performance of deep learning models depends heavily on training strategies, which significantly influence model accuracy and generalization6. Traditional optimization methods, such as grid search and random search, require extensive computational resources and do not guarantee globally optimal solutions7. Recently, evolutionary algorithms (EAs) have gained popularity for optimizing training strategies because of their superior global search capability8–10. Nevertheless, single evolutionary optimization methods often suffer from slow convergence, sensitivity to parameter settings, and susceptibility to local optima, leading to suboptimal model performance11–17.
Quantitative evidence underscores the persistent limitations of existing optimization strategies. Although single EAs can reduce the high prediction errors associated with traditional methods such as grid search (which yield an RMSE of ≈ 0.0712), their effectiveness remains constrained18. For example, reported RMSE values for algorithms such as Particle Swarm Optimization still hover around 0.045 in prior studies19,20. For high-risk applications such as power grid stability21, such errors are often unacceptable and expose critical performance bottlenecks. These persistent inefficiencies stem from widely acknowledged structural flaws: a tendency to converge prematurely to suboptimal solutions and an inability to balance the exploration-exploitation trade-off effectively22. This highlights the need for more robust optimization strategies capable of breaking through the performance ceiling of individual algorithms.
To address the key challenge of balancing optimization efficiency and global search capability in deep learning model optimization for time series prediction23,24, this study proposes a Hybrid Optimization Expert System (HOES) that integrates six evolutionary algorithms to enhance both optimization efficiency and global search capability31: Grey Meadow Arctic Puffin Optimization (GM-APO)25, Cooperative Memory-Based Bloodsucker Leech Optimization (CM-BSLO)26, Neuro-Inspired Enterprise Development Optimization (NI-EDO)27, Time-Dependent Pattern Matching Frilled Lizard Optimization (TDPM-FLO)28, Latent Information Acquisition Optimization (LI-IAO)29, and Evolutionary Dynamic Mechanism Ivy Optimization (EDM-IVYA)30. Unlike traditional training strategy methods, which suffer from computational inefficiency and limited global exploration, HOES combines these evolutionary algorithms and employs a transmission mechanism, a memory system, and a punishment system to optimize deep learning models collaboratively13,32. Specifically, the transmission mechanism applies the evolutionary algorithms sequentially, leveraging their complementary strengths to improve global search capability and prevent premature convergence; the memory system retains historically optimal solutions and is updated dynamically throughout the search so that the retained solutions remain well adapted, which ensures effective exploration and avoids search degradation; and the punishment system dynamically eliminates ineffective optimization strategies, thereby enhancing computational efficiency. Through these mechanisms, HOES balances optimization efficiency and global search capability, overcoming the limitations of both traditional methods and single evolutionary algorithms.
The primary contributions of this study are as follows:
We propose a Hybrid Optimization Expert System (HOES) for deep learning-based time series prediction, which integrates six evolutionary algorithms to enhance training strategy efficiency and global search capability.
A novel multi-mechanism framework is introduced within HOES, incorporating a transmission mechanism that sequentially applies different evolutionary algorithms to enhance global exploration, a memory system that preserves historically optimal solutions to prevent search degradation, and a punishment system that dynamically eliminates ineffective optimization strategies, thereby improving optimization efficiency and stability.
We validate the effectiveness of HOES through extensive experiments on six public datasets (Traffic, Weather, Household, Wind Power, Solar Power, and ETT_m1), demonstrating significant improvements in predictive accuracy and training convergence speed across multiple deep learning models.
Related work
A. Grey Meadow Arctic Puffin Optimization.
The Arctic Puffin Optimization (APO) algorithm is a swarm intelligence optimization algorithm based on the foraging behavior of the Arctic puffin, including two phases: aerial flight and underwater foraging. The algorithm adopts an adaptive behavioral transition factor to dynamically adjust the search strategy at different stages to achieve a balance between exploration and exploitation.
B. Cooperative Memory-Based Bloodsucker Leech Optimization.
The Bloodsucking Leech Optimization (BSLO) algorithm is a swarm intelligence optimization algorithm based on the predatory behavior of bloodsucking leeches and consists of two phases: directed search and undirected search. During exploration, directed leeches search for new solutions in regions far from the prey, which improves population diversity; during exploitation, directed leeches close to the prey accelerate convergence and strengthen local search. Meanwhile, undirected leeches wander randomly, either around their own positions or away from the prey, which preserves the stochasticity of the population and reduces the risk of falling into a local optimum.
C. Neuro-Inspired Enterprise Development Optimization.
The Enterprise Development Optimization (EDO) algorithm is a swarm intelligence optimization algorithm based on the enterprise development process and contains two phases: task assignment and organization optimization. The algorithm adopts a dynamic adjustment mechanism that enables individuals to adapt their decisions to environmental changes at different stages, achieving a balance between global search and local exploitation.
D. Time-Dependent Pattern Matching Frilled Lizard Optimization.
The Frilled Lizard Optimization (FLO) algorithm is a swarm intelligence optimization algorithm based on the predatory and evasive behaviors of frilled lizards and includes two phases: hunting and climbing up a tree. It adopts an adaptive strategy to adjust the behavior of individuals at different stages, achieving a balance between exploration and exploitation.
E. Latent Information Acquisition Optimization.
The Information Acquisition Optimization (IAO) algorithm is a swarm intelligence optimization algorithm based on human information processing and includes two phases: information collection and information analysis. It adopts an adaptive learning mechanism to dynamically adjust the search method at different stages, balancing global exploration and local optimization.
F. Evolutionary Dynamic Mechanism Ivy Optimization.
The Ivy Optimization Algorithm (IVYA) is a swarm intelligence optimization algorithm based on the growth mechanism of ivy and includes two phases: spreading growth and directed evolution. It adopts a dynamic growth mechanism to adjust how individuals expand at different stages, balancing exploration and exploitation.
The six algorithms are selected not only for their individual strengths but also for their synergistic complementarity within HOES’s transmission mechanism. This multi-algorithm integration creates a self‐correcting optimization pipeline in which weaknesses in one component are compensated by others:
Exploration–Exploitation Synergy: GM-APO’s aerial foraging phase provides aggressive global exploration (addressing local-optima traps), but its slow underwater refinement is counterbalanced by EDM-IVYA’s elite-driven rapid convergence. Conversely, EDM-IVYA’s tendency toward diversity loss is mitigated by CM-BSLO’s undirected random walks, which actively maintain population heterogeneity.
Dynamic Behavior Adaptation: When TDPM-FLO’s gradient-sensitive pattern matching risks premature convergence (e.g., on noisy datasets like Household), LI-IAO’s information-theoretic sampling redirects the search toward unexplored regions. NI-EDO’s organizational adaptation provides meta-level strategy shifts when environmental feedback (tracked via HOES’s memory system) indicates stagnation.
Temporal Search Coordination: Early-stage algorithms (GM-APO, CM-BSLO) emphasize broad exploration, while late‐stage specialists (EDM-IVYA, TDPM-FLO) focus on intensive exploitation. Transitional algorithms (NI-EDO, LI-IAO) dynamically bridge these phases.
This orchestrated workflow transforms sequential execution into an error-correcting cascade: solutions refined by one algorithm are dynamically recontextualized by subsequent experts. For example, GM-APO’s coarse solutions are structurally optimized by NI-EDO before precision-tuning via EDM-IVYA—a process impossible for any single EA. The transmission mechanism thus leverages complementary convergence behaviors (slow/broad vs. fast/narrow) to escape local optima while accelerating refinement.
Their key characteristics are summarized in Table 1.
Table 1.
Key characteristics of six algorithms.
| Algorithm | Search strategy | Convergence behavior | Known weaknesses |
|---|---|---|---|
| GM-APO | Gaussian perturbation with dual-phase search (flight/forage) | Strong global search, moderate convergence speed | May slow down in local refinement |
| CM-BSLO | Directional vs. undirected search with random walks | Good population diversity, stable convergence | May oscillate near optimal region |
| NI-EDO | Dynamic enterprise-inspired task-structure optimization | Adaptive to environment, robust | Higher computational cost |
| TDPM-FLO | Pattern-matching and behavior transitions | Fast convergence, responsive to gradients | Risk of premature convergence |
| LI-IAO | Information acquisition and transformation | Balanced exploration-exploitation | Sensitive to initialization |
| EDM-IVYA | Growth and aggregation via elite memory | Rapid local convergence | May lose diversity over time |
Methods
In this study, we propose HOES, a hybrid optimization expert system based on multiple evolutionary algorithms, which aims to optimize the performance of deep learning models in complex time series prediction tasks; its general structure is shown in Fig. 1. HOES combines six evolutionary optimization algorithms: GM-APO, CM-BSLO, NI-EDO, TDPM-FLO, LI-IAO, and EDM-IVYA. By fusing these algorithms, HOES adapts well to the trade-off between global search capability and computational resource allocation.
Fig. 1.
The general architecture of HOES.
The core optimization mechanism of HOES consists of three modules: the transmission mechanism, the memory system, and the penalty system. The transmission mechanism serializes the optimization strategies so that the evolutionary algorithms optimize the parameters sequentially, passing their results forward step by step to form an incremental optimization process. The memory system stores historically optimal solutions to maintain solution diversity and dynamically adjusts the search direction during optimization, thus reducing the risk of the algorithm falling into a local optimum. The penalty system measures the contribution of each optimization algorithm over successive iterations and dynamically eliminates algorithms that contribute little, improving computational efficiency.
By integrating multiple evolutionary optimization algorithms with memory and penalty mechanisms, HOES efficiently optimizes the parameters of deep learning models. The method improves convergence speed and reduces wasted computational resources while maintaining optimization accuracy. The following subsections describe the three core modules of HOES and their optimization strategies in detail.
Transmission mechanism
The HOES model employs a transmission mechanism so that the optimization results of multiple evolutionary algorithms are passed on sequentially, forming an incremental optimization process. The core idea is that each optimization algorithm independently optimizes a specific hyperparameter and uses its optimized solution as the input to the next algorithm, gradually improving the optimization result. This gives the optimization process a recursive nature: each subsequent algorithm searches from the results of the previous stage, which improves solution quality. In addition, the mechanism reduces the randomness of individual initialization and clarifies the scope of action of each optimization algorithm, thereby improving search efficiency.
Mathematically, the HOES transmission mechanism can be expressed as:
$$P_{k+1}^{\mathrm{in}}(t) = P_{k}^{\mathrm{out}}(t) \tag{1}$$

where $P_{k+1}^{\mathrm{in}}(t)$ denotes the input population of the $(k+1)$th expert and $P_{k}^{\mathrm{out}}(t)$ denotes the output population of the $k$th expert. Here $k$ is the algorithm index ($k = 1$ corresponds to APO, $k = 2$ to BSLO, and so on) and $t$ is the iteration number. This formula implements the transfer of populations between experts, ensuring that each expert optimizes on the basis of the results of the previous expert.
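To make the chaining in Eq. (1) concrete, the following minimal Python sketch treats each expert as a function from an input population to an output population. The helper `make_shift_expert` is a hypothetical placeholder that merely shifts coordinates; it is not one of the six evolutionary algorithms used in HOES.

```python
from typing import Callable, List

Population = List[List[float]]
Expert = Callable[[Population], Population]


def transmit(population: Population, experts: List[Expert]) -> Population:
    """Chain experts so that the output population of expert k becomes the
    input population of expert k + 1, as in Eq. (1)."""
    current = population
    for expert in experts:
        current = expert(current)  # P_{k+1}^in(t) = P_k^out(t)
    return current


def make_shift_expert(delta: float) -> Expert:
    # Hypothetical placeholder for an evolutionary algorithm:
    # it simply shifts every coordinate of every individual by `delta`.
    def expert(population: Population) -> Population:
        return [[x + delta for x in individual] for individual in population]
    return expert


initial = [[0.0, 0.0], [1.0, 2.0]]
result = transmit(initial, [make_shift_expert(1.0), make_shift_expert(2.0)])
print(result)  # [[3.0, 3.0], [4.0, 5.0]]
```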
Memory system
The HOES model employs a memory system to store the solutions that have historically performed well during optimization, enhancing search diversity and reducing the risk of the optimization process falling into local optima. Traditional evolutionary algorithms are prone to losing high-quality solutions during iteration, which lowers search efficiency or destabilizes the solutions. The memory system effectively retains promising solutions so that the algorithm can use them to adjust its search in subsequent optimization phases, improving global optimization capability. In addition, the memory system is updated dynamically throughout optimization so that the retained solutions remain well adapted, improving the convergence efficiency and robustness of the algorithm41. In the experiments, HOES shows robustness across diverse datasets, outperforming traditional optimization approaches and single evolutionary algorithms.
The memory system operates in three core steps. First, after each round of optimization, the optimal solutions output by the current optimization algorithm are stored in the memory pool and merged with the historically stored solutions to maintain solution diversity. Second, all stored solutions are ranked by non-dominated sorting according to fitness, and only the top n best-performing candidate solutions are retained, so that inefficient solutions do not interfere with the optimization process. Finally, at the start of a new round of optimization, the optimal solutions in the memory pool are fused with the population of the current optimization algorithm to provide richer initial solutions for subsequent searches, improving optimization efficiency.
Mathematically, the memory system is formalized as an update operator that merges the current population with historical elites, ranks the merged set, and selects a capped set of representatives:
$$M_t = \mathcal{S}_n\big(\mathcal{R}(M_{t-1} \cup P_t)\big) \tag{2}$$

where $M_t$ denotes the memory pool at iteration $t$, $M_{t-1}$ is the memory pool from the previous iteration, $P_t$ represents the population generated in the current iteration, $\mathcal{R}(\cdot)$ is the ranking operator, which sorts individuals using a combination of non-dominated sorting and crowding distance, and $\mathcal{S}_n(\cdot)$ is the selection operator, which selects the top $n$ individuals according to the ranking results, thus enforcing the memory capacity constraint.
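The update in Eq. (2) can be sketched as follows, assuming each stored solution is a pair of hyperparameters and an objective vector such as (RMSE, MAE), both minimized. The ranking operator is implemented here as NSGA-II-style fast non-dominated sorting followed by crowding-distance ordering within each front, and the union is a plain list concatenation; these details, and all function names, are illustrative assumptions rather than the authors' code.

```python
from typing import Dict, List, Sequence, Tuple

# A stored solution: (hyperparameters, objective vector such as (RMSE, MAE)).
Solution = Tuple[Tuple[float, ...], Tuple[float, ...]]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if `a` is no worse than `b` in every objective and better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated_fronts(objs: List[Sequence[float]]) -> List[List[int]]:
    """Fast non-dominated sorting: returns lists of indices, best front first."""
    n = len(objs)
    dominated: List[List[int]] = [[] for _ in range(n)]  # indices that i dominates
    counts = [0] * n  # number of solutions that dominate i
    fronts: List[List[int]] = [[]]
    for i in range(n):
        for j in range(n):
            if i != j and dominates(objs[i], objs[j]):
                dominated[i].append(j)
            elif i != j and dominates(objs[j], objs[i]):
                counts[i] += 1
        if counts[i] == 0:
            fronts[0].append(i)
    while fronts[-1]:
        nxt = []
        for i in fronts[-1]:
            for j in dominated[i]:
                counts[j] -= 1
                if counts[j] == 0:
                    nxt.append(j)
        fronts.append(nxt)
    return fronts[:-1]  # drop the trailing empty front


def crowding_distance(objs: List[Sequence[float]], front: List[int]) -> Dict[int, float]:
    """Crowding distance within one front; boundary points get infinity."""
    dist = {i: 0.0 for i in front}
    for d in range(len(objs[front[0]])):
        ordered = sorted(front, key=lambda i: objs[i][d])
        lo, hi = objs[ordered[0]][d], objs[ordered[-1]][d]
        dist[ordered[0]] = dist[ordered[-1]] = float("inf")
        if hi == lo:
            continue
        for a in range(1, len(ordered) - 1):
            dist[ordered[a]] += (objs[ordered[a + 1]][d] - objs[ordered[a - 1]][d]) / (hi - lo)
    return dist


def update_memory(memory: List[Solution], population: List[Solution], n: int) -> List[Solution]:
    """M_t = S_n(R(M_{t-1} U P_t)): merge, rank by front then crowding, keep n."""
    merged = memory + population
    objs = [obj for _, obj in merged]
    ranked: List[int] = []
    for front in non_dominated_fronts(objs):
        dist = crowding_distance(objs, front)
        ranked.extend(sorted(front, key=lambda i: -dist[i]))  # less crowded first
    return [merged[i] for i in ranked[:n]]
```

Boundary points of each front receive infinite crowding distance, so with a small capacity the sketch keeps the extreme trade-offs (best RMSE, best MAE) before interior points.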
To prevent the memory module from leaning excessively toward early high-performance solutions and to maintain diversity throughout the search process, HOES integrates multi-level strategies when managing its memory pool:
Non-dominated Sorting and Crowding-Distance Filtering: After each optimization iteration, the current population and the memory pool are merged and ranked via non-dominated sorting over multiple objectives such as RMSE and MAE. To enhance diversity, the crowding distance is then calculated within each Pareto front to assess the density of surrounding solutions. This preserves elite individuals while also retaining those that contribute to exploration, effectively avoiding premature convergence to local optima.
Dynamic Replacement Policy: Rather than statically retaining only early stored individuals, HOES employs an adaptive memory-update mechanism. New solutions that are non-dominated or that provide complementary diversity may replace existing ones. This dynamic policy ensures that late-stage high-quality solutions can enter the memory pool and influence population evolution.
Memory-Capacity Control and Redundancy Filtering: The memory pool size is capped at a predefined limit (e.g., the top five solutions). Duplicate or near-duplicate solutions identified via structural similarity or redundant fitness profiles are filtered out to avoid over-representation of any narrow region in the solution space.
Through this combined mechanism, the HOES memory component remains both representative and diverse, enabling a balanced and adaptive search strategy that leverages historical knowledge without stalling innovation.
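The capacity control and redundancy filtering described above might look like the sketch below. It assumes the input list is already ranked best-first (for example, by the ranking operator of Eq. (2)), and it approximates "structural similarity" by the Euclidean distance between hyperparameter vectors and "redundant fitness profiles" by the Euclidean distance between objective vectors. The tolerances `param_tol` and `fit_tol` are hypothetical values; in practice the hyperparameters would likely need rescaling (e.g., taking log10 of the learning rate) before such distances are meaningful.

```python
import math
from typing import List, Sequence, Tuple

# (hyperparameters, objective vector), assumed to be ranked best-first.
Entry = Tuple[Sequence[float], Sequence[float]]


def filter_redundant(ranked: List[Entry], capacity: int = 5,
                     param_tol: float = 1e-3, fit_tol: float = 1e-6) -> List[Entry]:
    """Keep at most `capacity` entries, skipping near-duplicates of kept ones."""
    kept: List[Entry] = []
    for params, objs in ranked:
        redundant = any(
            math.dist(params, p) <= param_tol   # near-identical hyperparameters
            or math.dist(objs, o) <= fit_tol    # near-identical fitness profile
            for p, o in kept
        )
        if not redundant:
            kept.append((params, objs))
        if len(kept) == capacity:  # memory-capacity limit reached
            break
    return kept
```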
Penalty system
The HOES model introduces a penalty system to dynamically eliminate algorithms with low contributions during multiple rounds of optimization, thereby improving computational efficiency and ensuring that optimization resources are focused on efficient search processes. Traditional multi-algorithm optimization methods usually execute all optimization algorithms simultaneously, but this may result in some algorithms remaining involved in the computation at a later stage of the optimization, even though their contributions have decreased, thus increasing the redundant computational cost. The penalty system dynamically adjusts the optimization strategy by measuring the degree of contribution of each optimization algorithm, making the overall optimization process more efficient.
The penalty system operates in three key steps. First, after each round of optimization, the contribution of each optimization algorithm to the current solution is calculated, i.e., whether the algorithm significantly improves the fitness of the solution in the current iteration. Second, for optimization algorithms that fail to improve the solution in consecutive rounds, penalty factors are accumulated as a measure of their ineffectiveness. Finally, once the penalty factor of an optimization algorithm exceeds a set threshold, the algorithm is eliminated from the subsequent optimization process, reducing unnecessary computational consumption.
Mathematically, the calculation of the penalty system can be expressed as follows:
$$c_k = \begin{cases} 0, & \text{if } \min_{x \in P_k^{\mathrm{ext}}} f(x) < f_{\min}, \\ c_k + 1, & \text{otherwise}, \end{cases} \tag{3}$$

where $c_k$ is the contribution factor of the $k$th expert, which measures the optimization contribution of the expert over successive iterations, $f_{\min}$ denotes the minimum fitness value of the individuals in the population, $f(\cdot)$ is the fitness function, and $x$ is an individual in the external population $P_k^{\mathrm{ext}}$. When $\min_{x \in P_k^{\mathrm{ext}}} f(x) < f_{\min}$, the $k$th expert still makes an optimization contribution in this iteration, and its contribution factor $c_k$ is reset to 0. Otherwise, the contribution factor is incremented by 1, and the expert is eliminated once $c_k$ exceeds a preset threshold $\tau$, improving overall computational efficiency.
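A minimal sketch of the penalty rule in Eq. (3), assuming minimization (lower fitness, such as validation RMSE, is better) and reading "improves" as a strict decrease of the best fitness. The class name `PenaltyTracker` and the threshold value in the example are hypothetical.

```python
from typing import Dict, List


class PenaltyTracker:
    """Contribution counters c_k from Eq. (3); an expert whose counter
    exceeds `threshold` (tau) is removed from the active list."""

    def __init__(self, experts: List[str], threshold: int) -> None:
        self.threshold = threshold
        self.counters: Dict[str, int] = {name: 0 for name in experts}
        self.active: List[str] = list(experts)

    def record(self, expert: str, f_min: float, best_external: float) -> None:
        """f_min: best fitness before the expert ran;
        best_external: best fitness in the expert's external population."""
        if best_external < f_min:
            self.counters[expert] = 0        # expert still contributes
        else:
            self.counters[expert] += 1       # no improvement in this iteration
            if self.counters[expert] > self.threshold and expert in self.active:
                self.active.remove(expert)   # eliminate the expert


tracker = PenaltyTracker(["GM-APO", "CM-BSLO"], threshold=2)
tracker.record("GM-APO", f_min=0.50, best_external=0.45)       # improvement: reset
for _ in range(3):
    tracker.record("CM-BSLO", f_min=0.45, best_external=0.45)  # three stalls
print(tracker.active)  # ['GM-APO']
```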
Algorithm workflow
The optimization process of the HOES model consists of several stages that together form an efficient training strategy. First, all parameters of the optimization algorithms are initialized, including the population size and the maximum number of iterations. The population size determines how many candidate solutions are explored simultaneously, while the maximum number of iterations bounds the length of the search. These settings lay the foundation for an optimization framework that balances exploration and exploitation. After initialization, the six optimization algorithms are executed sequentially. This sequential execution is a key feature of HOES: each algorithm builds on the results of the previous one, so the unique strengths of the individual algorithms combine into an integrated strategy that addresses both global and local search needs. After each round of optimization, the generated optimal solutions are stored in a memory pool, which acts as a repository of high-quality candidate solutions and maintains population diversity across iterations. By retaining these solutions, HOES can use historical information to guide future searches, improving the overall efficiency and robustness of the optimization process. Meanwhile, the penalty system filters out inefficient algorithms: it evaluates the performance of each algorithm at every iteration, identifies those that fail to contribute significantly, and eliminates them, reducing computational redundancy and focusing resources on more effective strategies. When the maximum number of iterations is reached or the convergence conditions are met, the optimization results are output and applied to the training of the deep learning model. The overall process is shown in Algorithm 1.
Algorithm 1.

Pseudocode for the HOES model.
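Since the pseudocode of Algorithm 1 is not reproduced here, the following Python sketch shows one way the workflow could fit together under stated assumptions. The experts are hypothetical greedy Gaussian-perturbation placeholders (with made-up step sizes that shrink from the early to the late experts), `toy_fitness` is an invented surrogate standing in for validation RMSE, candidates are encoded as (log10 learning rate, dropout), and the memory is ranked by a single objective for brevity instead of the multi-objective ranking of Eq. (2). It illustrates the control flow (transmission, memory fusion, penalty-based elimination), not the authors' implementation.

```python
import random
from typing import Callable, Dict, List, Tuple

Individual = Tuple[float, ...]            # assumed encoding: (log10 learning rate, dropout)
Fitness = Callable[[Individual], float]   # lower is better, e.g. validation RMSE
Expert = Callable[[List[Individual], Fitness], List[Individual]]
BOUNDS = [(-5.0, -2.0), (0.1, 0.4)]       # log10 of [1e-5, 1e-2] and dropout [0.1, 0.4]


def clip(values) -> Individual:
    """Project a candidate back into the search box."""
    return tuple(min(max(x, lo), hi) for x, (lo, hi) in zip(values, BOUNDS))


def make_expert(step: float, seed: int) -> Expert:
    """Hypothetical stand-in for one EA: greedy Gaussian perturbation."""
    rng = random.Random(seed)

    def expert(population: List[Individual], fitness: Fitness) -> List[Individual]:
        out = []
        for ind in population:
            cand = clip([x + rng.gauss(0.0, step * (hi - lo)) for x, (lo, hi) in zip(ind, BOUNDS)])
            out.append(cand if fitness(cand) < fitness(ind) else ind)
        return out
    return expert


def hoes(fitness: Fitness, experts: Dict[str, Expert], pop_size: int = 10,
         iterations: int = 20, memory_size: int = 5, threshold: int = 3,
         seed: int = 0) -> Tuple[Individual, List[str]]:
    rng = random.Random(seed)
    population = [clip([rng.uniform(*BOUNDS[0]), rng.uniform(*BOUNDS[1])])
                  for _ in range(pop_size)]
    memory: List[Individual] = []
    penalties = {name: 0 for name in experts}
    active = list(experts)
    for _ in range(iterations):
        for name in list(active):                      # transmission: fixed expert order
            f_min = min(map(fitness, population))
            population = experts[name](population, fitness)
            if min(map(fitness, population)) < f_min:
                penalties[name] = 0                    # penalty: expert still contributes
            else:
                penalties[name] += 1
                if penalties[name] > threshold:
                    active.remove(name)                # eliminate low-contribution expert
            # Memory: merge with history, keep the best `memory_size` (single objective here).
            memory = sorted(set(memory + population), key=fitness)[:memory_size]
            # Fusion: memory elites not already present replace the worst individuals.
            extra = [m for m in memory if m not in population]
            population = sorted(population + extra, key=fitness)[:pop_size]
        if not active:
            break
    return memory[0], active


def toy_fitness(ind: Individual) -> float:
    # Hypothetical surrogate for validation RMSE, minimized at lr = 1e-3, dropout = 0.25.
    return (ind[0] + 3.0) ** 2 + (ind[1] - 0.25) ** 2


steps = {"GM-APO": 0.3, "CM-BSLO": 0.2, "NI-EDO": 0.15,
         "TDPM-FLO": 0.1, "LI-IAO": 0.05, "EDM-IVYA": 0.02}
experts = {name: make_expert(step, seed=i) for i, (name, step) in enumerate(steps.items())}
best, survivors = hoes(toy_fitness, experts)
print("best (log10 lr, dropout):", best, "active experts:", survivors)
```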
Experiments and analyses
General settings
Evaluation protocol
In this study, the coefficient of determination (R2)33, mean squared error (MSE)34, mean absolute error (MAE)35, root mean squared error (RMSE)34, and the Willmott index of agreement (WIA)36 are used to evaluate model performance. R2 measures goodness of fit, with values closer to 1 indicating better predictions. MSE and RMSE assess overall prediction error: MSE is the mean of the squared errors, and RMSE, the square root of MSE, expresses the error in the original data units. MAE is the mean absolute error and reflects prediction accuracy. WIA quantifies the standardized agreement between predicted and actual values and ranges from 0 to 1, with higher values indicating better agreement and reduced sensitivity to extreme errors. Together, these metrics assess the HOES model from multiple perspectives in deep learning optimization. The formulas for these metrics are as follows:
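$$R^2 = 1 - \frac{\sum_{i=1}^{N}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{N}(y_i - \bar{y})^2} \tag{4}$$

$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}(y_i - \hat{y}_i)^2 \tag{5}$$

$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\lvert y_i - \hat{y}_i\rvert \tag{6}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(y_i - \hat{y}_i)^2} \tag{7}$$

$$\mathrm{WIA} = 1 - \frac{\sum_{i=1}^{N}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{N}\left(\lvert \hat{y}_i - \bar{y}\rvert + \lvert y_i - \bar{y}\rvert\right)^2} \tag{8}$$

where $y_i$ represents the real result of the $i$th sample, $\hat{y}_i$ denotes the predicted result of the $i$th sample, $\bar{y}$ is the mean of the real results of all samples, and $N$ is the number of samples.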
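Eqs. (4)-(8) translate directly into code; the sketch below uses only the Python standard library, and the function name `regression_metrics` is illustrative.

```python
import math
from typing import Dict, Sequence


def regression_metrics(y: Sequence[float], y_hat: Sequence[float]) -> Dict[str, float]:
    """Compute Eqs. (4)-(8) for true values y and predictions y_hat."""
    n = len(y)
    y_bar = sum(y) / n
    sq_err = sum((a - b) ** 2 for a, b in zip(y, y_hat))  # sum of squared errors
    mse = sq_err / n
    return {
        "R2": 1.0 - sq_err / sum((a - y_bar) ** 2 for a in y),
        "MSE": mse,
        "MAE": sum(abs(a - b) for a, b in zip(y, y_hat)) / n,
        "RMSE": math.sqrt(mse),
        "WIA": 1.0 - sq_err / sum((abs(b - y_bar) + abs(a - y_bar)) ** 2
                                  for a, b in zip(y, y_hat)),
    }
```

For example, with true values [1, 2, 3, 4] and predictions [1, 2, 3, 5], MSE and MAE are both 0.25, RMSE is 0.5, R2 is 0.8, and WIA is 26/27.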
To prevent temporal data leakage and ensure rigorous evaluation, we adopt a blocked time-series cross-validation with a rolling-origin scheme. The initial training window comprises the first 70% of the chronological data. For each subsequent fold, the training window expands incrementally while maintaining temporal order, and predictions are made on a fixed-length horizon that slides forward through the remaining 30% of the data. This process generates multiple test folds, and model performance is aggregated across all evaluation windows. The final results are reported as mean ± standard deviation across all folds.
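The fold construction can be sketched as below. The text does not give the forecast horizon or how far the training window grows per fold, so this sketch assumes a hypothetical `horizon` parameter and expands the training window by exactly one horizon per fold, which yields non-overlapping test windows; `summarize` reports the mean and the sample standard deviation of a per-fold metric.

```python
import statistics
from typing import List, Sequence, Tuple


def rolling_origin_folds(n: int, horizon: int, initial_pct: int = 70) -> List[Tuple[range, range]]:
    """Expanding-window folds: train on [0, end), test on [end, end + horizon)."""
    train_end = n * initial_pct // 100  # integer arithmetic avoids float rounding
    folds = []
    while train_end + horizon <= n:
        folds.append((range(0, train_end), range(train_end, train_end + horizon)))
        train_end += horizon  # assumed: the training window grows by one horizon
    return folds


def summarize(scores: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation of a metric across folds."""
    return statistics.mean(scores), statistics.stdev(scores)
```

With 100 observations and a horizon of 10, the sketch produces three folds whose test windows are indices 70 to 79, 80 to 89, and 90 to 99, and every training window ends exactly where its test window begins.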
Datasets
As shown in Table 2, six real-world datasets are used in our empirical studies. All of them are publicly available and widely used in related studies.
Table 2.
Details of the concerned datasets.
Data preprocessing
Crucially, all data preprocessing steps—including normalization, feature engineering, and handling of missing values—are fit exclusively on the training window of each fold. Missing values are addressed through forward-filling, and outliers beyond three standard deviations are winsorized at the 1st and 99th percentiles to maintain data integrity while minimizing distortion.
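The fit-on-training-window rule can be illustrated with a univariate sketch. The text does not name the normalization method, so min-max scaling is assumed here; percentiles come from `statistics.quantiles` with its default method, the standard deviation is the population one, and leading missing values (with no earlier observation to carry forward) are left as `None`. Outliers are detected with the training mean and standard deviation and clipped to the training 1st and 99th percentiles, as the paragraph describes, and the same fitted statistics are reused unchanged on the test window.

```python
import statistics
from typing import Callable, List, Optional, Sequence


def forward_fill(values: Sequence[Optional[float]]) -> List[Optional[float]]:
    """Replace each missing value (None) with the last observed value."""
    filled: List[Optional[float]] = []
    last: Optional[float] = None
    for v in values:
        if v is None:
            v = last
        filled.append(v)
        last = v
    return filled


def fit_preprocessor(train: Sequence[float]) -> Callable[[Sequence[float]], List[float]]:
    """Fit winsorization and min-max scaling on the training window only."""
    mu, sd = statistics.fmean(train), statistics.pstdev(train)
    cuts = statistics.quantiles(train, n=100)  # 99 cut points: 1st..99th percentile
    p1, p99 = cuts[0], cuts[98]

    def winsorize(x: float) -> float:
        # Only values beyond 3 standard deviations are clipped to [p1, p99].
        return min(max(x, p1), p99) if abs(x - mu) > 3 * sd else x

    clipped = [winsorize(x) for x in train]
    lo, hi = min(clipped), max(clipped)
    scale = (hi - lo) or 1.0  # guard against a constant training window

    def transform(values: Sequence[float]) -> List[float]:
        return [(winsorize(x) - lo) / scale for x in values]

    return transform
```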
Experimental design
To evaluate the optimization effectiveness of HOES for deep learning-based time series forecasting models, we compare it with nine optimization algorithms: a joint optimization algorithm, simulated annealing, a genetic algorithm, and the six evolutionary optimization algorithms introduced in the Related work section.
The experiments are conducted on six widely used time-series forecasting models: multilayer perceptron (MLP)37, gated recurrent unit (GRU)38, long short-term memory network (LSTM)39, temporal convolutional network (TCN)40, Transformer model41, and SJ-LSTM model42. These models represent different categories of deep learning architectures: feedforward neural networks (MLP), recurrent neural networks (GRU, LSTM, and SJ-LSTM), convolutional networks (TCN), and attention-based models (Transformer).
To identify a representative model for in-depth optimization analysis, we first evaluate the performance of six models across six publicly available datasets: Traffic, Weather, Household, Wind Power, Solar Power, and ETT_m1. Based on these results, we select one model as the representative model, which has competitive predictive performance, strong capability in capturing long-term dependencies, and noticeable sensitivity to hyperparameter selection.
Once the representative model is identified, we further conduct a comparative analysis of different optimization algorithms on the model to assess their impact on prediction accuracy and convergence efficiency. Specifically, we compare the optimization performance of HOES against other baseline optimization methods to validate its superiority in optimizing deep learning-based time-series forecasting models.
To ensure a fair comparison across all optimization algorithms, we standardize the computational budget under identical experimental conditions:
Population size: all algorithms use 75 individuals.
Maximum iterations: we set 50 generations uniformly.
Fitness evaluations: we equalize at 3,750 evaluations per run (population size × iterations).
Initialization: Each experiment was repeated five times using different random seeds (5, 21, 42, 66, 87) to mitigate stochastic effects from both the evolutionary optimization and deep model training.
Hardware/software: we conduct all experiments on the same platform (Intel Xeon Gold 6226R CPU, NVIDIA Tesla V100 GPU, TensorFlow 2.8).
These controls eliminate performance variance due to resource-allocation differences, ensuring that observed improvements are attributable to algorithmic efficacy.
Hyperparameter search space definition
To ensure transparency and reproducibility, we explicitly define the complete hyperparameter search space optimized by HOES. Our optimization focuses on two critical hyperparameters that have the most significant impact on deep learning model performance: the learning rate and dropout rate. The search space is defined as follows:
Learning Rate: A continuous parameter searched on a log-uniform scale within the range [1e-5, 1e-2]. This range covers typical learning rate values used in deep learning practice while allowing the optimization algorithms to explore both conservative and aggressive learning strategies.
Dropout Rate: A continuous parameter searched on a uniform scale within the range [0.1, 0.4]. This range enables the exploration of regularization levels from light to strong, balancing underfitting and overfitting risks.
These two parameters were selected based on their well-established importance in determining model convergence and generalization capability43. The search space design follows established practices in hyperparameter optimization literature, providing sufficient expressiveness while maintaining computational tractability.
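As a sketch of how such a search space can be exposed to an evolutionary optimizer, the code below decodes a point in the unit square into a learning rate (log-uniform over [1e-5, 1e-2]) and a dropout rate (uniform over [0.1, 0.4]). The unit-square encoding and the names `decode` and `sample_config` are assumptions; the text does not state how the algorithms encode candidates.

```python
import math
import random
from typing import Dict, Sequence

LR_RANGE = (1e-5, 1e-2)     # searched on a log-uniform scale
DROPOUT_RANGE = (0.1, 0.4)  # searched on a uniform scale


def decode(u: Sequence[float]) -> Dict[str, float]:
    """Map a point u in [0, 1]^2 to a (learning rate, dropout) configuration."""
    log_lo, log_hi = math.log10(LR_RANGE[0]), math.log10(LR_RANGE[1])
    return {
        "learning_rate": 10 ** (log_lo + u[0] * (log_hi - log_lo)),
        "dropout": DROPOUT_RANGE[0] + u[1] * (DROPOUT_RANGE[1] - DROPOUT_RANGE[0]),
    }


def sample_config(rng: random.Random) -> Dict[str, float]:
    """Uniform in u, hence log-uniform in the learning rate."""
    return decode([rng.random(), rng.random()])
```

Under this encoding, the midpoint (0.5, 0.5) maps to a learning rate of 10^-3.5 (about 3.16e-4) and a dropout rate of 0.25. A linear scale would instead place the midpoint near 5e-3 and rarely explore small learning rates, which is the motivation for searching the learning rate on a log scale.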
Training strategy and overfitting prevention
To reduce the risk of overfitting when training deep learning models—particularly on smaller datasets—we implement a series of preventive strategies in our training and optimization pipeline:
Early Stopping Criterion: For all deep learning model training during the optimization process, we employ a consistent early stopping rule. Training is monitored using validation loss, and is terminated if no improvement is observed for 10 consecutive epochs (patience = 10). This criterion is applied uniformly across all optimization algorithms and experimental runs to ensure fair comparison.
L2 Regularization: A regularization term is added to the loss function to penalize large weight magnitudes, thereby constraining model complexity and reducing variance.
Validation-Guided Optimization: During the evolutionary search process, HOES is guided by validation RMSE rather than training RMSE. This shifts the optimization focus toward models that generalize well, rather than those that overfit.
Dropout: To mitigate co-adaptation among hidden units and reduce model complexity, dropout regularization is applied at each hidden layer. Crucially, the dropout rate is included in the HOES hyperparameter search space: HOES evaluates candidate dropout probabilities by their effect on RMSE to determine the optimal level of stochastic regularization for each dataset.
Learning Curves: To enhance interpretability and build trust in the training process, we plot learning curves displaying both training and validation loss across epochs for key datasets.
In summary, by combining regularization techniques, validation-based early stopping, and careful monitoring of generalization metrics, we ensure that HOES yields robust models with strong generalization capabilities across datasets of varying sizes.
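The early-stopping rule (patience = 10 on validation loss) can be sketched as below. This is a minimal illustration, not the authors' training code: `EarlyStopping` and `run_with_early_stopping` are hypothetical names, `min_delta` is an added assumption (0 by default), and the function replays a precomputed list of validation losses instead of training a model. In a validation-guided search, the returned best validation loss would be one plausible fitness value for a candidate.

```python
from typing import Iterable, Tuple


class EarlyStopping:
    """Stop when validation loss has not improved for `patience` consecutive epochs."""

    def __init__(self, patience: int = 10, min_delta: float = 0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.best_epoch = -1
        self.wait = 0

    def step(self, epoch: int, val_loss: float) -> bool:
        """Record one epoch; return True if training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best, self.best_epoch, self.wait = val_loss, epoch, 0
        else:
            self.wait += 1
        return self.wait >= self.patience


def run_with_early_stopping(val_losses: Iterable[float], patience: int = 10) -> Tuple[int, int, float]:
    """Replay validation losses; return (stopped_epoch, best_epoch, best_val_loss)."""
    stopper = EarlyStopping(patience=patience)
    epoch = -1
    for epoch, loss in enumerate(val_losses):
        if stopper.step(epoch, loss):
            break
    return epoch, stopper.best_epoch, stopper.best
```

For example, a loss sequence that bottoms out at epoch 3 and then plateaus stops at epoch 13, after ten unimproved epochs.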
Representative model selection
To identify a representative model for evaluating HOES in time-series forecasting, we assess the predictive performance of these deep learning models on the traffic dataset. The evaluation metrics follow those outlined in the general settings. The experimental results are shown in Table 3.
Table 3.
Performance comparison of different time-series models on the traffic dataset.
| Models | MAE | MSE | RMSE | R2 | WIA |
|---|---|---|---|---|---|
| SJ-LSTM | 309.8226 | 186256.6654 | 431.5746 | 0.9522 | 0.9869 |
| LSTM | 345.7876 | 216353.1639 | 465.1378 | 0.9445 | 0.9852 |
| GRU | 346.7378 | 222208.4428 | 471.3899 | 0.9430 | 0.9849 |
| TCN | 403.0469 | 373797.7602 | 611.3900 | 0.9041 | 0.9744 |
| Transformer | 471.3753 | 452449.0651 | 672.6433 | 0.8839 | 0.9660 |
| MLP | 546.0320 | 674475.8832 | 821.2648 | 0.8287 | 0.9501 |
Significant values are in bold.
As shown in Table 3, SJ-LSTM outperforms all other models across all evaluation metrics. Specifically, SJ-LSTM achieves the lowest MAE (309.8226), MSE (186256.6654), and RMSE (431.5746), signifying superior predictive accuracy. It also attains the highest R2 (0.9522) and WIA (0.9869), indicating strong goodness of fit and close agreement between predictions and actual values.
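For reference, the five metrics in Table 3 can be computed as in the sketch below. It assumes that WIA denotes Willmott's index of agreement, d = 1 − Σ(P − O)² / Σ(|P − Ō| + |O − Ō|)², and that the observations are not constant; `forecast_metrics` is a hypothetical helper. RMSE is the square root of MSE, which is consistent with the table (√186256.6654 ≈ 431.57).

```python
import math
from typing import Dict, Sequence


def forecast_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """MAE, MSE, RMSE, R2 and WIA (assumed: Willmott's index of agreement)."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(y_true) / n
    errors = [p - o for o, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    sse = sum(e * e for e in errors)
    mse = sse / n
    # R2 = 1 - SSE / total sum of squares of the observations.
    sst = sum((o - mean_obs) ** 2 for o in y_true)
    r2 = 1.0 - sse / sst
    # Willmott (1981): d = 1 - SSE / sum((|P - mean(O)| + |O - mean(O)|)^2)
    potential = sum((abs(p - mean_obs) + abs(o - mean_obs)) ** 2
                    for o, p in zip(y_true, y_pred))
    wia = 1.0 - sse / potential
    return {"MAE": mae, "MSE": mse, "RMSE": math.sqrt(mse), "R2": r2, "WIA": wia}
```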
Among the baseline models, LSTM and GRU perform comparably to each other but slightly worse than SJ-LSTM, with higher prediction errors and lower R2 values. TCN and Transformer show substantially larger errors, suggesting that these convolutional and attention-based models capture the temporal dependencies in this dataset less effectively. MLP performs worst, highlighting its limitations in modeling sequential patterns.
Based on these results, SJ-LSTM is selected as the representative model for further optimization analysis. In the subsequent experiments, we apply different optimization algorithms to SJ-LSTM to examine their impact on prediction accuracy and convergence speed.
Hyperparameter sensitivity analysis
To comprehensively evaluate the robustness of HOES, we conducted a sensitivity analysis on two critical hyperparameters: population size and penalty threshold. Experiments were performed on the Solar Power dataset (selected for its complexity and representativeness) using the SJ-LSTM model. Population size was varied across {30, 50, 75, 100}, and penalty thresholds were tested at {2, 3, 4, 5}. The resulting performance metrics (MAE, MSE, RMSE, R2, WIA) are summarized in Table 4.
Table 4.
Sensitivity analysis.
| Population | Penalty thresholds | MAE | MSE | RMSE | R2 | WIA |
|---|---|---|---|---|---|---|
| 30 | 2 | 0.0287 | 0.0035 | 0.0592 | 0.9562 | 0.9879 |
| 50 | 2 | 0.0251 | 0.0028 | 0.0529 | 0.9620 | 0.9906 |
| 75 | 2 | 0.0219 | 0.0021 | 0.0458 | 0.9685 | 0.9932 |
| 100 | 2 | 0.0212 | 0.0020 | 0.0447 | 0.9703 | 0.9937 |
| 30 | 3 | 0.0263 | 0.0029 | 0.0539 | 0.9601 | 0.9901 |
| 50 | 3 | 0.0227 | 0.0023 | 0.0480 | 0.9656 | 0.9922 |
| 75 | 3 | 0.0204 | 0.0018 | 0.0424 | 0.9734 | 0.9948 |
| 100 | 3 | 0.0196 | 0.0017 | 0.0412 | 0.9749 | 0.9951 |
| 30 | 4 | 0.0241 | 0.0026 | 0.0511 | 0.9634 | 0.9912 |
| 50 | 4 | 0.0209 | 0.0020 | 0.0447 | 0.9702 | 0.9936 |
| 75 | 4 | 0.0181 | 0.0014 | 0.0374 | 0.9806 | 0.9963 |
| 100 | 4 | 0.0178 | 0.0013 | 0.0361 | 0.9815 | 0.9966 |
| 30 | 5 | 0.0226 | 0.0023 | 0.0482 | 0.9661 | 0.9922 |
| 50 | 5 | 0.0191 | 0.0015 | 0.0387 | 0.9790 | 0.9960 |
| 75 | 5 | 0.0153 | 0.0010 | 0.0319 | 0.9866 | 0.9988 |
| 100 | 5 | 0.0158 | 0.0011 | 0.0328 | 0.9857 | 0.9984 |
Significant values are in bold.
In this sensitivity analysis, increasing the population size from 30 to 75 reduced the prediction error substantially; for example, with a penalty threshold of 2, the MAE decreased from 0.0287 to 0.0219, a reduction of approximately 24%. Expanding the population further to 100 reduced the MAE by only about 3% (from 0.0219 to 0.0212), indicating that while larger populations can enhance solution diversity, they also incur higher computational costs with little additional benefit. On the other hand, stricter penalty thresholds (increased from 2 to 5) accelerated convergence and enhanced stability: at a population size of 75, the MAE decreased from 0.0219 to 0.0153 (a reduction of 30%), while the R2 improved from 0.9685 to 0.9866. Conversely, overly lenient thresholds (such as 2) tend to retain poorly performing individuals, thereby slowing down the optimization process.
Overall, a population size of 75 combined with a penalty threshold of 5 achieves the lowest MAE (0.0153) and RMSE (0.0319), along with the highest R2 (0.9866) and WIA (0.9988), among all tested settings, providing strong global search capability without the redundant computation of a larger population.
HOES exhibits moderate sensitivity to hyperparameters. A population size of 75 ensures sufficient diversity for complex time-series tasks, while a penalty threshold of 5 dynamically eliminates ineffective optimizers, maintaining efficiency. These settings are recommended as defaults for similar applications.
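The percentages quoted above can be rechecked directly from Table 4 with the short script below. The table values are copied verbatim; the helper names `relative_reduction` and `best_setting` are illustrative only.

```python
# (MAE, RMSE) from Table 4, keyed by (population size, penalty threshold).
TABLE4 = {
    (30, 2): (0.0287, 0.0592), (50, 2): (0.0251, 0.0529),
    (75, 2): (0.0219, 0.0458), (100, 2): (0.0212, 0.0447),
    (30, 3): (0.0263, 0.0539), (50, 3): (0.0227, 0.0480),
    (75, 3): (0.0204, 0.0424), (100, 3): (0.0196, 0.0412),
    (30, 4): (0.0241, 0.0511), (50, 4): (0.0209, 0.0447),
    (75, 4): (0.0181, 0.0374), (100, 4): (0.0178, 0.0361),
    (30, 5): (0.0226, 0.0482), (50, 5): (0.0191, 0.0387),
    (75, 5): (0.0153, 0.0319), (100, 5): (0.0158, 0.0328),
}


def relative_reduction(before: float, after: float) -> float:
    """Percentage reduction from `before` to `after`."""
    return 100.0 * (before - after) / before


def best_setting(table, metric=0):
    """Key with the lowest error; metric 0 = MAE, 1 = RMSE."""
    return min(table, key=lambda k: table[k][metric])


if __name__ == "__main__":
    mae = {k: v[0] for k, v in TABLE4.items()}
    print("best setting:", best_setting(TABLE4))
    print(f"pop 30->75, threshold 2:  {relative_reduction(mae[(30, 2)], mae[(75, 2)]):.1f}%")
    print(f"pop 75->100, threshold 2: {relative_reduction(mae[(75, 2)], mae[(100, 2)]):.1f}%")
    print(f"threshold 2->5, pop 75:   {relative_reduction(mae[(75, 2)], mae[(75, 5)]):.1f}%")
```

The script reports (75, 5) as the best setting and MAE reductions of about 23.7%, 3.2%, and 30.1%, matching the rounded figures in the text.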
Computational complexity
To characterize the computational cost of our approach, Table 5 compares HOES against nine baseline algorithms in terms of worst-case serial time complexity.
Table 5.
Worst-case computational complexity.
| Algorithm | Computational complexity |
|---|---|
| GM-APO | |
| CM-BSLO | |
| NI-EDO | |
| TDPM-FLO | |
| LI-IAO | |
| EDM-IVYA | |
| SA | |
| GA | |
| JONM | |
| HOES | |
Here, M is the population size, $T_i$ is the number of iterations executed by the i-th algorithm, N is the number of algorithms, and D is the hyperparameter dimension.
Baseline complexity: Each of the six constituent evolutionary algorithms requires $O(T_i \cdot M \cdot D)$ operations. A serial combination method such as JONM therefore incurs $O\left(M \cdot D \sum_{i=1}^{N} T_i\right)$.
HOES complexity: On top of this serial cost, HOES adds three kinds of overhead. Transmission: population transfer between EAs incurs a cost at each algorithm transition. Memory system: merging and non-dominated sorting are performed once per EA execution. Penalty system: a contribution factor is calculated for each EA in each iteration. In the worst case, when no algorithm is pruned, the complexity of HOES is the serial cost of all N constituent EAs plus these transmission, memory, and penalty overheads.
Dynamic cost reduction mechanisms: Poorly performing algorithms are pruned after a configurable number of unimproved rounds, reducing the active set from six to $N'$ algorithms. The $6 - N'$ pruned algorithms stop consuming iterations, which caps the total number of iterations and reduces the overall complexity accordingly.
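The pruning bookkeeping can be illustrated with the minimal sketch below. It assumes that "unimproved rounds" means consecutive rounds in which an EA fails to improve the best solution held in memory; the function names, the per-round boolean flags, and the `max_unimproved` parameter are hypothetical and only replay a precomputed history rather than running real optimizers.

```python
from typing import Dict, List


def prune_schedule(improved: Dict[str, List[bool]], max_unimproved: int) -> Dict[str, int]:
    """Replay per-round improvement flags and report when each EA is pruned.

    improved[name][r] is True if EA `name` improved the best solution held in
    memory during round r. An EA is pruned once it accumulates `max_unimproved`
    consecutive unimproved rounds. Returns {name: pruning round}, with -1 for
    EAs that survive every round. (Hypothetical rule for illustration.)
    """
    pruned_at: Dict[str, int] = {}
    for name, flags in improved.items():
        misses = 0
        pruned_at[name] = -1
        for r, ok in enumerate(flags):
            misses = 0 if ok else misses + 1
            if misses >= max_unimproved:
                pruned_at[name] = r
                break
    return pruned_at


def iterations_used(pruned_at: Dict[str, int], rounds: int, iters_per_round: int) -> int:
    """Total EA iterations executed when pruned EAs stop after their pruning round."""
    return sum((rounds if r < 0 else r + 1) * iters_per_round for r in pruned_at.values())
```

With three EAs over six rounds of ten iterations each, pruning two of them early cuts the total from 180 to 130 iterations in the test case below, which is the mechanism behind the reduced complexity.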
Closed-form analysis of HOES complexity under pruning
To analytically examine how pruning improves the runtime efficiency of HOES, we model its worst-case wall-clock complexity relative to a single evolutionary algorithm (EA).
Let E denote the maximum number of iterations per EA, and let p denote the pruning rate (the fraction of EAs terminated early). Each individual fitness evaluation costs $c_f$, and the per-iteration memory and sorting overhead is denoted by $c_m$. If a fraction p of the N EAs is pruned after an average of $E_p$ iterations ($E_p < E$), the total worst-case runtime of HOES can be expressed as
$$T_{\mathrm{HOES}} = N\left[(1-p)E + pE_p\right]\left(Mc_f + c_m\right). \tag{9}$$
Comparing this with the single-EA runtime $T_{\mathrm{EA}} = EMc_f$, HOES is faster whenever $T_{\mathrm{HOES}} < T_{\mathrm{EA}}$, which gives the efficiency threshold
$$p > \frac{NE\left(Mc_f + c_m\right) - EMc_f}{N\left(E - E_p\right)\left(Mc_f + c_m\right)}. \tag{10}$$
In the ideal case where memory and sorting overheads are negligible ($c_m \approx 0$) and pruning occurs very early ($E_p \ll E$), Eq. (10) reduces to $p > 1 - 1/N$. For the HOES configuration used in our experiments (N = 6), this yields $p > 5/6 \approx 0.83$. Hence, in this idealized limit, if more than 83% of the EAs are pruned early, HOES achieves a lower worst-case wall-clock time than a single EA.
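The pruning-rate threshold can be explored numerically under a simple cost model, which is an assumption of this sketch: HOES runs N EAs in sequence; each iteration costs M fitness evaluations at `c_f` each plus a memory/sorting overhead `c_m`; a fraction p of the EAs stops after `E_p` iterations while the rest run all E; and a single EA costs E·M·`c_f`. Solving T_HOES < T_EA for p gives the pruning rate HOES must exceed. A result above 1 means pruning alone cannot make HOES cheaper than one EA under those costs.

```python
def hoes_runtime(N, E, M, p, E_p, c_f, c_m):
    """Worst-case serial runtime of HOES under the assumed cost model."""
    return N * ((1 - p) * E + p * E_p) * (M * c_f + c_m)


def single_ea_runtime(E, M, c_f):
    """Runtime of one EA: E iterations of M fitness evaluations."""
    return E * M * c_f


def pruning_threshold(N, E, M, E_p, c_f, c_m):
    """Smallest pruning rate p for which hoes_runtime < single_ea_runtime."""
    per_iter = M * c_f + c_m
    return (N * E * per_iter - E * M * c_f) / (N * (E - E_p) * per_iter)


if __name__ == "__main__":
    # Ideal case (no overhead, immediate pruning) recovers 1 - 1/N.
    print(pruning_threshold(N=6, E=100, M=75, E_p=0, c_f=1.0, c_m=0.0))
    # Non-zero overhead and later pruning push the threshold up.
    print(pruning_threshold(N=6, E=100, M=75, E_p=10, c_f=1.0, c_m=5.0))
```

The first call returns 5/6 ≈ 0.833; the second, with illustrative (not measured) costs, returns 0.9375, showing how quickly the requirement tightens once overheads and late pruning are included.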
Performance–cost trade-off
While HOES entails a greater theoretical computational burden than individual optimization algorithms, the experimental evidence shows that it consistently achieves superior predictive performance, reducing RMSE by 13.8–26.8% and MAE by 9.8–30.5% relative to the strongest baseline (JONM) across six diverse real-world datasets. For instance, on the Solar Power dataset, it achieves a 30.5% reduction in MAE. Moreover, HOES generally exhibits faster convergence and greater solution stability than both standalone evolutionary algorithms and conventional hyperparameter tuning approaches such as simulated annealing (SA) and genetic algorithms (GA).
Parallelization opportunities & constraints
The HOES architecture is inherently modular, enabling certain operations to run in parallel, but some stages remain sequential due to its iterative, meta-optimization nature:
Parallelizable Tasks: Internal EA procedures (e.g., evaluating populations) and memory-system maintenance (e.g., sorting and merging) can be distributed across cores. This aligns with common practices in distributed hyperparameter optimization, where independent fitness evaluations are parallelized to reduce wall-clock time44,45.
Sequential Bottlenecks: The core transmission mechanism, where information is passed sequentially between expert algorithms, creates a fundamental dependency. This mirrors known trade-offs in distributed computation, where iterative algorithms requiring sequential state updates face inherent synchronization constraints that limit parallel speedup46,47. Similarly, the penalty system’s decisions rely on the historical performance outcomes of previous stages, enforcing a sequential decision-making process.
Theoretical Speedup Limit (Amdahl’s Law): Let α represent the sequential fraction of HOES, which includes the execution of the expert algorithms in sequence and the management of the memory and penalty systems. The parallelizable fraction (1 − α) corresponds to the fitness evaluations within one expert. According to Amdahl’s Law, the maximum speedup using Q processors is:
$$S(Q) = \frac{1}{\alpha + \frac{1-\alpha}{Q}} \tag{11}$$
Based on profiling, fitness evaluation (model inference) constitutes the bulk (~85%) of the computation within an expert. However, considering the sequential execution of the six experts and other overheads, the overall parallelizable fraction (1 − α) of the entire HOES process is estimated to be around 0.6, i.e., α ≈ 0.4. Substituting these values into Eq. (11) with Q = 8 processors gives S(8) = 1/(0.4 + 0.6/8) ≈ 2.1, and no number of processors can raise the speedup beyond 1/α = 2.5. This indicates a practical upper bound of about 2× speedup on eight processors, highlighting the impact of HOES's sequential components.
Empirical Wall-clock Time Comparison: We implemented a parallel version of HOES where the population evaluations within the active experts are distributed across Q = 8 CPU cores using multiprocessing. Table 6 shows the wall-clock time comparison on all six datasets.
Table 6.
Empirical wall-clock time comparison of HOES.
| Dataset | Sequential time (hours) | Parallel time (Q = 8) (hours) | Observed speedup |
|---|---|---|---|
| Traffic | 38.5 | 21.5 | 1.79× |
| Weather | 25.1 | 13.9 | 1.81× |
| Household | 41.8 | 23.8 | 1.76× |
| Wind power | 45.2 | 25.4 | 1.78× |
| Solar power | 36.7 | 20.2 | 1.82× |
| ETT_m1 | 22.5 | 12.5 | 1.80× |
The observed speedups, ranging from 1.76× to 1.82×, reach about 84–86% of the theoretical prediction (2.1×), in good agreement with the Amdahl model. The consistent performance across all datasets demonstrates the robustness of the parallelization strategy. The remaining gap between the empirical and theoretical speedup is attributed to parallel overheads, including inter-process communication and data serialization costs.
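The comparison between Amdahl's law and Table 6 can be reproduced with the sketch below. `amdahl_speedup` implements S(Q) = 1/(α + (1 − α)/Q) with the parallel fraction 1 − α as input; `implied_parallel_fraction` simply inverts that formula to show which parallel fraction each measured speedup would correspond to. Both helper names are illustrative; the timings are copied from Table 6.

```python
def amdahl_speedup(parallel_fraction: float, q: int) -> float:
    """Amdahl's law: S(Q) = 1 / (alpha + (1 - alpha) / Q), alpha = serial fraction."""
    alpha = 1.0 - parallel_fraction
    return 1.0 / (alpha + parallel_fraction / q)


def implied_parallel_fraction(speedup: float, q: int) -> float:
    """Invert Amdahl's law: parallel fraction that yields `speedup` on q workers."""
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / q)


# Sequential and parallel wall-clock hours from Table 6 (Q = 8).
TABLE6 = {
    "Traffic": (38.5, 21.5), "Weather": (25.1, 13.9), "Household": (41.8, 23.8),
    "Wind power": (45.2, 25.4), "Solar power": (36.7, 20.2), "ETT_m1": (22.5, 12.5),
}

if __name__ == "__main__":
    bound = amdahl_speedup(0.6, 8)
    print(f"Amdahl bound (1 - alpha = 0.6, Q = 8): {bound:.2f}x")
    for name, (seq, par) in TABLE6.items():
        s = seq / par
        print(f"{name:12s} {s:.2f}x ({100 * s / bound:.0f}% of bound), "
              f"implied parallel fraction {implied_parallel_fraction(s, 8):.2f}")
```

The implied parallel fractions come out slightly below the profiled estimate of 0.6, which is what one would expect if communication and serialization overheads eat into the parallel portion.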
These results demonstrate that HOES can effectively leverage parallel computing to achieve a practically meaningful reduction in training time.
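The parallel evaluation of a population can be sketched with Python's `multiprocessing.Pool`, as below. This is an illustration under stated assumptions, not the authors' implementation: `fitness` is a hypothetical cheap surrogate standing in for training SJ-LSTM and returning its validation RMSE, and candidates are (learning rate, dropout) tuples. Because candidates are evaluated independently, `Pool.map` can distribute them across workers while preserving their order.

```python
import math
import multiprocessing


def fitness(candidate):
    """Hypothetical stand-in for 'train SJ-LSTM with these hyperparameters and
    return its validation RMSE'; a cheap surrogate keeps the sketch runnable."""
    learning_rate, dropout = candidate
    return (math.log10(learning_rate) + 3.0) ** 2 + (dropout - 0.25) ** 2


def evaluate_population(population, workers=8):
    """Evaluate independent candidates, in parallel when workers > 1.

    Pool.map preserves input order, so scores[i] belongs to population[i].
    """
    if workers <= 1:
        return [fitness(c) for c in population]  # serial path, handy for debugging
    with multiprocessing.Pool(processes=workers) as pool:
        return pool.map(fitness, population)
```

When called with `workers > 1`, `evaluate_population` should be invoked from inside an `if __name__ == "__main__":` guard, which `multiprocessing` requires under the spawn and forkserver start methods. Only this inner evaluation step is parallelized; the sequential hand-off between experts stays serial, which is why the achievable speedup is bounded as discussed above.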
Optimization algorithm comparison
To evaluate the effectiveness of the proposed HOES method in optimizing deep learning models for time-series forecasting, we compare it against the baseline optimization algorithms mentioned above, which include both traditional hyperparameter tuning approaches and evolutionary algorithms commonly used to optimize deep learning models. The experiment focuses on optimizing the hyperparameters of the SJ-LSTM model, which was previously identified as the representative model. Each optimization method is run under the same computational budget to ensure a fair comparison of convergence speed and final model performance. To assess optimization efficiency, we record the best model performance achieved in terms of RMSE and MAE and visualize how the loss changes during model training. Tables 7, 8, 9, 10, 11 and 12 report the performance of the different optimization algorithms when optimizing SJ-LSTM on each dataset, while Figs. 2, 3, 4, 5, 6 and 7 show the corresponding loss curves during training.
Table 7.
Performance comparison of different optimization algorithms for optimizing SJ-LSTM model on the traffic dataset.
| Algorithms | MAE | MSE | RMSE | R2 | WIA |
|---|---|---|---|---|---|
| JONM | 0.0397 ± 0.0018 | 0.0032 ± 0.0002 | 0.0569 ± 0.0021 | 0.9560 ± 0.0025 | 0.9884 ± 0.0012 |
| APO | 0.0472 ± 0.0023 | 0.0041 ± 0.0003 | 0.0638 ± 0.0027 | 0.9447 ± 0.0029 | 0.9844 ± 0.0015 |
| BSLO | 0.0451 ± 0.0020 | 0.0039 ± 0.0003 | 0.0625 ± 0.0024 | 0.9470 ± 0.0026 | 0.9856 ± 0.0013 |
| EDO | 0.0497 ± 0.0024 | 0.0045 ± 0.0003 | 0.0670 ± 0.0029 | 0.9390 ± 0.0031 | 0.9833 ± 0.0016 |
| FLO | 0.0492 ± 0.0022 | 0.0045 ± 0.0003 | 0.0673 ± 0.0030 | 0.9384 ± 0.0032 | 0.9834 ± 0.0016 |
| IAO | 0.0487 ± 0.0021 | 0.0044 ± 0.0003 | 0.0666 ± 0.0028 | 0.9397 ± 0.0030 | 0.9838 ± 0.0015 |
| IVYA | 0.0467 ± 0.0020 | 0.0042 ± 0.0003 | 0.0644 ± 0.0026 | 0.9436 ± 0.0027 | 0.9843 ± 0.0014 |
| GA | 0.0452 ± 0.0019 | 0.0039 ± 0.0003 | 0.0622 ± 0.0023 | 0.9473 ± 0.0025 | 0.9855 ± 0.0012 |
| SA | 0.0551 ± 0.0027 | 0.0056 ± 0.0004 | 0.0745 ± 0.0035 | 0.9245 ± 0.0038 | 0.9794 ± 0.0018 |
| HOES | 0.0301 ± 0.0014 | 0.0030 ± 0.0002 | 0.0431 ± 0.0019 | 0.9658 ± 0.0021 | 0.9939 ± 0.0010 |
Significant values are in bold.
Table 8.
Performance comparison of different optimization algorithms for optimizing SJ-LSTM model on the weather dataset.
| Algorithms | MAE | MSE | RMSE | R2 | WIA |
|---|---|---|---|---|---|
| JONM | 0.0133 ± 0.0006 | 0.0004 ± 0.00003 | 0.0190 ± 0.0009 | 0.9823 ± 0.0020 | 0.9955 ± 0.0010 |
| APO | 0.0184 ± 0.0009 | 0.0007 ± 0.00005 | 0.0272 ± 0.0013 | 0.9638 ± 0.0026 | 0.9913 ± 0.0014 |
| BSLO | 0.0167 ± 0.0008 | 0.0005 ± 0.00004 | 0.0220 ± 0.0010 | 0.9762 ± 0.0022 | 0.9943 ± 0.0011 |
| EDO | 0.0224 ± 0.0011 | 0.0007 ± 0.00006 | 0.0270 ± 0.0014 | 0.9642 ± 0.0028 | 0.9904 ± 0.0016 |
| FLO | 0.0235 ± 0.0012 | 0.0009 ± 0.00007 | 0.0302 ± 0.0016 | 0.9551 ± 0.0030 | 0.9870 ± 0.0018 |
| IAO | 0.0219 ± 0.0010 | 0.0010 ± 0.00008 | 0.0313 ± 0.0017 | 0.9519 ± 0.0031 | 0.9884 ± 0.0017 |
| IVYA | 0.0203 ± 0.0009 | 0.0008 ± 0.00006 | 0.0280 ± 0.0013 | 0.9614 ± 0.0027 | 0.9907 ± 0.0015 |
| SA | 0.0226 ± 0.0011 | 0.0009 ± 0.00007 | 0.0293 ± 0.0015 | 0.9579 ± 0.0029 | 0.9883 ± 0.0017 |
| GA | 0.0174 ± 0.0008 | 0.0005 ± 0.00004 | 0.0231 ± 0.0011 | 0.9738 ± 0.0023 | 0.9930 ± 0.0012 |
| HOES | 0.0106 ± 0.0005 | 0.0003 ± 0.00002 | 0.0139 ± 0.0007 | 0.9919 ± 0.0015 | 0.9973 ± 0.0008 |
Significant values are in bold.
Table 9.
Performance comparison of different optimization algorithms for optimizing SJ-LSTM model on the household dataset.
| Algorithms | MAE | MSE | RMSE | R2 | WIA |
|---|---|---|---|---|---|
| JONM | 0.0252 ± 0.0012 | 0.0020 ± 0.0001 | 0.0447 ± 0.0021 | 0.9024 ± 0.0030 | 0.9733 ± 0.0016 |
| APO | 0.0308 ± 0.0015 | 0.0023 ± 0.0002 | 0.0480 ± 0.0024 | 0.8877 ± 0.0033 | 0.9697 ± 0.0018 |
| BSLO | 0.0264 ± 0.0013 | 0.0022 ± 0.0002 | 0.0468 ± 0.0022 | 0.8929 ± 0.0031 | 0.9696 ± 0.0017 |
| EDO | 0.0305 ± 0.0016 | 0.0024 ± 0.0002 | 0.0489 ± 0.0025 | 0.8833 ± 0.0035 | 0.9675 ± 0.0019 |
| FLO | 0.0296 ± 0.0015 | 0.0024 ± 0.0002 | 0.0487 ± 0.0024 | 0.8844 ± 0.0034 | 0.9708 ± 0.0018 |
| IAO | 0.0296 ± 0.0014 | 0.0023 ± 0.0002 | 0.0478 ± 0.0023 | 0.8885 ± 0.0032 | 0.9693 ± 0.0017 |
| IVYA | 0.0286 ± 0.0013 | 0.0022 ± 0.0002 | 0.0470 ± 0.0022 | 0.8919 ± 0.0031 | 0.9704 ± 0.0016 |
| GA | 0.0300 ± 0.0015 | 0.0025 ± 0.0002 | 0.0500 ± 0.0025 | 0.8781 ± 0.0037 | 0.9656 ± 0.0019 |
| SA | 0.0254 ± 0.0012 | 0.0023 ± 0.0002 | 0.0478 ± 0.0023 | 0.8883 ± 0.0032 | 0.9688 ± 0.0017 |
| HOES | 0.0225 ± 0.0010 | 0.0018 ± 0.0001 | 0.0364 ± 0.0017 | 0.9194 ± 0.0028 | 0.9822 ± 0.0013 |
Significant values are in bold.
Table 10.
Performance comparison of different optimization algorithms for optimizing SJ-LSTM model on the wind power dataset.
| Algorithms | MAE | MSE | RMSE | R2 | WIA |
|---|---|---|---|---|---|
| JONM | 0.0395 ± 0.0019 | 0.0038 ± 0.0002 | 0.0618 ± 0.0029 | 0.9509 ± 0.0028 | 0.9870 ± 0.0015 |
| APO | 0.0450 ± 0.0022 | 0.0046 ± 0.0003 | 0.0680 ± 0.0033 | 0.9406 ± 0.0031 | 0.9835 ± 0.0018 |
| BSLO | 0.0465 ± 0.0023 | 0.0051 ± 0.0003 | 0.0718 ± 0.0035 | 0.9339 ± 0.0033 | 0.9818 ± 0.0019 |
| EDO | 0.0441 ± 0.0021 | 0.0046 ± 0.0003 | 0.0678 ± 0.0032 | 0.9410 ± 0.0030 | 0.9843 ± 0.0017 |
| FLO | 0.0532 ± 0.0026 | 0.0060 ± 0.0004 | 0.0777 ± 0.0038 | 0.9226 ± 0.0037 | 0.9780 ± 0.0020 |
| IAO | 0.0482 ± 0.0023 | 0.0052 ± 0.0003 | 0.0719 ± 0.0035 | 0.9336 ± 0.0032 | 0.9814 ± 0.0018 |
| IVYA | 0.0418 ± 0.0020 | 0.0044 ± 0.0002 | 0.0666 ± 0.0031 | 0.9431 ± 0.0029 | 0.9846 ± 0.0016 |
| SA | 0.0501 ± 0.0024 | 0.0052 ± 0.0003 | 0.0723 ± 0.0036 | 0.9329 ± 0.0033 | 0.9817 ± 0.0019 |
| GA | 0.0599 ± 0.0029 | 0.0075 ± 0.0005 | 0.0866 ± 0.0043 | 0.9038 ± 0.0041 | 0.9718 ± 0.0023 |
| HOES | 0.0292 ± 0.0013 | 0.0029 ± 0.0002 | 0.0533 ± 0.0024 | 0.9685 ± 0.0022 | 0.9913 ± 0.0011 |
Significant values are in bold.
Table 11.
Performance comparison of different optimization algorithms for optimizing SJ-LSTM model on the solar power dataset.
| Algorithms | MAE | MSE | RMSE | R2 | WIA |
|---|---|---|---|---|---|
| JONM | 0.0220 ± 0.0010 | 0.0018 ± 0.0001 | 0.0419 ± 0.0020 | 0.9777 ± 0.0021 | 0.9942 ± 0.0010 |
| APO | 0.0304 ± 0.0015 | 0.0026 ± 0.0002 | 0.0510 ± 0.0024 | 0.9669 ± 0.0027 | 0.9910 ± 0.0013 |
| BSLO | 0.0307 ± 0.0015 | 0.0025 ± 0.0002 | 0.0503 ± 0.0023 | 0.9679 ± 0.0026 | 0.9915 ± 0.0013 |
| EDO | 0.0277 ± 0.0013 | 0.0022 ± 0.0002 | 0.0465 ± 0.0021 | 0.9725 ± 0.0024 | 0.9926 ± 0.0011 |
| FLO | 0.0363 ± 0.0018 | 0.0034 ± 0.0003 | 0.0580 ± 0.0028 | 0.9573 ± 0.0030 | 0.9885 ± 0.0016 |
| IAO | 0.0314 ± 0.0016 | 0.0025 ± 0.0002 | 0.0501 ± 0.0023 | 0.9680 ± 0.0026 | 0.9915 ± 0.0013 |
| IVYA | 0.0346 ± 0.0017 | 0.0029 ± 0.0002 | 0.0538 ± 0.0026 | 0.9631 ± 0.0028 | 0.9902 ± 0.0015 |
| SA | 0.0380 ± 0.0019 | 0.0034 ± 0.0003 | 0.0581 ± 0.0029 | 0.9570 ± 0.0031 | 0.9880 ± 0.0016 |
| GA | 0.0337 ± 0.0016 | 0.0028 ± 0.0002 | 0.0528 ± 0.0025 | 0.9646 ± 0.0027 | 0.9903 ± 0.0014 |
| HOES | 0.0153 ± 0.0008 | 0.0010 ± 0.0001 | 0.0319 ± 0.0015 | 0.9866 ± 0.0018 | 0.9988 ± 0.0006 |
Significant values are in bold.
Table 12.
Performance comparison of different optimization algorithms for optimizing SJ-LSTM model on the ETT_m1 dataset.
| Algorithms | MAE | MSE | RMSE | R2 | WIA |
|---|---|---|---|---|---|
| JONM | 0.0092 ± 0.0004 | 0.0001 ± 0.00001 | 0.0121 ± 0.0006 | 0.9727 ± 0.0020 | 0.9928 ± 0.0010 |
| APO | 0.0116 ± 0.0006 | 0.0003 ± 0.00002 | 0.0159 ± 0.0008 | 0.9523 ± 0.0025 | 0.9867 ± 0.0013 |
| BSLO | 0.0129 ± 0.0006 | 0.0003 ± 0.00002 | 0.0167 ± 0.0009 | 0.9473 ± 0.0027 | 0.9861 ± 0.0013 |
| EDO | 0.0110 ± 0.0005 | 0.0002 ± 0.00002 | 0.0146 ± 0.0007 | 0.9600 ± 0.0023 | 0.9894 ± 0.0011 |
| FLO | 0.0137 ± 0.0007 | 0.0003 ± 0.00002 | 0.0178 ± 0.0009 | 0.9402 ± 0.0029 | 0.9840 ± 0.0015 |
| IAO | 0.0154 ± 0.0008 | 0.0004 ± 0.00003 | 0.0190 ± 0.0010 | 0.9322 ± 0.0030 | 0.9821 ± 0.0016 |
| IVYA | 0.0124 ± 0.0006 | 0.0003 ± 0.00002 | 0.0164 ± 0.0008 | 0.9496 ± 0.0026 | 0.9863 ± 0.0013 |
| SA | 0.0149 ± 0.0007 | 0.0004 ± 0.00003 | 0.0193 ± 0.0010 | 0.9297 ± 0.0032 | 0.9808 ± 0.0017 |
| GA | 0.0139 ± 0.0007 | 0.0003 ± 0.00002 | 0.0176 ± 0.0009 | 0.9417 ± 0.0028 | 0.9854 ± 0.0014 |
| HOES | 0.0083 ± 0.0004 | 0.0001 ± 0.00001 | 0.0101 ± 0.0005 | 0.9880 ± 0.0018 | 0.9963 ± 0.0008 |
Significant values are in bold.
Fig. 2.

Loss variation of SJ-LSTM model on the traffic dataset (different optimization algorithms).
Fig. 3.

Loss variation of SJ-LSTM model on the weather dataset (different optimization algorithms).
Fig. 4.

Loss variation of SJ-LSTM model on the household dataset (different optimization algorithms).
Fig. 5.

Loss variation of SJ-LSTM model on the wind power dataset (different optimization algorithms).
Fig. 6.

Loss variation of SJ-LSTM model on the solar power dataset (different optimization algorithms).
Fig. 7.

Loss variation of SJ-LSTM model on the ETT_m1 dataset (different optimization algorithms).
On the Traffic dataset, HOES performs best among all optimization algorithms, followed by JONM. In terms of convergence, JONM falls into a local optimum around the tenth round, after which its loss remains almost unchanged, whereas HOES escapes the local optimum at the tenth round and its loss continues to decrease, indicating that HOES has a stronger global exploration capability. In terms of prediction error, the RMSE of HOES is 0.0431, 24.3% lower than that of JONM (0.0569), and the MAE decreases from 0.0397 to 0.0301, a reduction of 24.2%.
On the Weather dataset, HOES again performs best, followed by JONM. In terms of convergence, the loss curve of HOES is smoother, whereas the loss curve of JONM zigzags noticeably after the first round and its convergence slows, indicating that the optimization process of HOES is more stable. In terms of prediction error, the RMSE of HOES (0.0139) is 26.8% lower than that of JONM (0.0190), its MAE (0.0106) is 20.3% lower than that of JONM (0.0133), and its MSE (0.0003) is 25% lower than that of JONM (0.0004).
On the Household dataset, HOES performs best, followed by JONM. In terms of convergence, the loss curve of HOES is smoother, while the loss curve of JONM again zigzags noticeably after the first round and its convergence slows markedly, indicating that the optimization process of HOES is more stable. In terms of prediction error, the RMSE of HOES is 0.0364, 18.6% lower than that of JONM (0.0447), and the MAE decreases from 0.0252 to 0.0225, a reduction of 10.7%.
On the Wind Power dataset, HOES performs best, followed by JONM. In terms of convergence, the loss curve of JONM zigzags noticeably after the first round, its convergence slows markedly, and it even becomes trapped in a local optimum, whereas the loss curve of HOES remains smooth throughout training and falls below that of JONM after the fifth round, suggesting that HOES has a stronger global exploration capability. In terms of prediction error, the RMSE of HOES is 0.0533, 13.8% lower than that of JONM (0.0618), and the MAE decreases from 0.0395 to 0.0292, a reduction of 26.1%.
On the Solar Power dataset, HOES performs best, followed by JONM. In terms of convergence, the loss of HOES drops rapidly early in training and reaches its best value within a small number of iterations, while JONM converges comparatively slowly. In terms of prediction error, the RMSE of HOES is 0.0319, 23.9% lower than that of JONM (0.0419), and its MAE (0.0153) is 30.5% lower than that of JONM (0.0220), indicating a clear advantage in prediction accuracy.
On the ETT_m1 dataset, HOES performs best, followed by JONM. In terms of convergence, HOES and JONM converge at similar speeds, but the loss curve of HOES is generally smoother and remains stable late in training, whereas that of JONM shows small fluctuations. In terms of prediction error, the RMSE of HOES is 0.0101, 16.5% lower than that of JONM (0.0121), and the MAE decreases from 0.0092 to 0.0083, a reduction of 9.8%.
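The relative reductions quoted in the six paragraphs above follow from Tables 7–12 as (JONM − HOES) / JONM × 100%. The script below recomputes them from the copied table values; the helper names are illustrative.

```python
# (RMSE, MAE) of JONM and HOES, copied from Tables 7-12.
RESULTS = {
    "Traffic":     {"JONM": (0.0569, 0.0397), "HOES": (0.0431, 0.0301)},
    "Weather":     {"JONM": (0.0190, 0.0133), "HOES": (0.0139, 0.0106)},
    "Household":   {"JONM": (0.0447, 0.0252), "HOES": (0.0364, 0.0225)},
    "Wind power":  {"JONM": (0.0618, 0.0395), "HOES": (0.0533, 0.0292)},
    "Solar power": {"JONM": (0.0419, 0.0220), "HOES": (0.0319, 0.0153)},
    "ETT_m1":      {"JONM": (0.0121, 0.0092), "HOES": (0.0101, 0.0083)},
}


def pct_reduction(baseline: float, new: float) -> float:
    """Percentage reduction of `new` relative to `baseline`."""
    return 100.0 * (baseline - new) / baseline


def reductions(results):
    """Return {dataset: (RMSE reduction %, MAE reduction %)} of HOES vs. JONM."""
    out = {}
    for name, r in results.items():
        (b_rmse, b_mae), (h_rmse, h_mae) = r["JONM"], r["HOES"]
        out[name] = (pct_reduction(b_rmse, h_rmse), pct_reduction(b_mae, h_mae))
    return out


if __name__ == "__main__":
    for name, (d_rmse, d_mae) in reductions(RESULTS).items():
        print(f"{name:12s} RMSE -{d_rmse:.1f}%  MAE -{d_mae:.1f}%")
```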
Across all six datasets, HOES outperforms JONM in prediction accuracy (consistently lower RMSE and MAE) and in optimization stability (smoother loss curves). HOES also shows a stronger global optimization capability, most visibly on the Traffic and Wind Power datasets, where it avoids the local optima in which JONM stalls. Notably, the margin by which HOES outperforms the other optimization algorithms in prediction accuracy reflects the effectiveness of its transmission mechanism. Overall, HOES shows clear advantages in convergence speed, prediction accuracy, and global optimization capability, making it an efficient and stable optimization algorithm.
To rigorously validate the observed performance improvements of HOES over baseline methods and address the robustness of our claims, we conducted comprehensive statistical significance tests. The null hypothesis (H0) posited no systematic difference between HOES and the suboptimal baseline (JONM), while the alternative hypothesis (H1) asserted a statistically significant difference in predictive accuracy (measured by R2). We employed two complementary tests:
Two-sided paired t-tests (parametric, assuming normality of differences).
Two-sided Wilcoxon signed-rank tests (non-parametric, rank-based).
Both tests were performed on all six datasets at a significance level of α = 0.05. The effect size (the mean difference in R2, in percentage points) quantifies the practical improvement beyond statistical significance. Furthermore, to provide an interval estimate of HOES's performance, we calculated 95% confidence intervals (CIs) for its mean R2 based on the results of multiple independent runs. Results are summarized in Table 13.
Table 13.
Statistical significance of HOES vs. baseline.
| Dataset | HOES mean | Baseline mean | Mean difference (HOES − baseline, percentage points) | t-test p-value | Wilcoxon p-value | 95% CI (HOES) |
|---|---|---|---|---|---|---|
| Traffic | 0.9643 | 0.9421 | 2.22 | 0.0034 | 0.0041 | (0.9591, 0.9695) |
| Weather | 0.9901 | 0.9659 | 2.42 | 0.0022 | 0.0018 | (0.9873, 0.9929) |
| Household | 0.9172 | 0.8895 | 2.77 | 0.0041 | 0.0053 | (0.9085, 0.9259) |
| Wind power | 0.9535 | 0.9387 | 1.48 | 0.0112 | 0.0136 | (0.9472, 0.9598) |
| Solar power | 0.9891 | 0.9527 | 3.64 | 0.0009 | 0.0007 | (0.9858, 0.9924) |
| ETT_m1 | 0.9748 | 0.9472 | 2.76 | 0.0019 | 0.0025 | (0.9695, 0.9801) |
All p-values are below 0.05, providing strong evidence to reject H0 and confirming that the advantage of HOES is statistically significant on every dataset. The agreement between the parametric (t-test) and non-parametric (Wilcoxon) results further indicates that these findings do not hinge on distributional assumptions. The 95% confidence intervals for the HOES mean lie entirely above the corresponding baseline means, providing a precise estimate of HOES's performance and reinforcing the conclusion of its superiority. The effect sizes (1.48–3.64 percentage points of R2) are consistent with the observed reductions in RMSE and MAE, demonstrating that the improvements of HOES are both statistically and practically significant.
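The two tests can be reproduced with the sketch below, which uses only the Python standard library so the mechanics stay visible; in practice `scipy.stats.ttest_rel` and `scipy.stats.wilcoxon` provide the same tests. Inputs would be per-run R2 values of HOES and the baseline, paired by run. Those per-run values are not listed in this section, so the functions are not applied to the paper's data here. Obtaining the t-test p-value by numerically integrating the Student-t density is an implementation choice of this sketch.

```python
import math
from itertools import product
from typing import Sequence, Tuple


def _t_integral_from_zero(x: float, df: int, steps: int = 2000) -> float:
    """Integral of the Student-t density from 0 to x via composite Simpson's rule."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(u: float) -> float:
        return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))

    h = x / steps  # `steps` must be even for Simpson's rule
    total = pdf(0.0) + pdf(x)
    total += sum((4 if k % 2 else 2) * pdf(k * h) for k in range(1, steps))
    return total * h / 3


def paired_t_test(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Two-sided paired t-test; returns (t statistic, p-value).

    Assumes at least two pairs and non-constant differences.
    """
    d = [x - y for x, y in zip(a, b)]
    n = len(d)
    mean = sum(d) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in d) / (n - 1))
    t = mean / (sd / math.sqrt(n))
    p = 1.0 - 2.0 * _t_integral_from_zero(abs(t), n - 1)
    return t, p


def wilcoxon_exact(a: Sequence[float], b: Sequence[float]) -> Tuple[int, float]:
    """Two-sided exact Wilcoxon signed-rank test; returns (W+, p-value).

    Assumes no zero and no tied |differences|; enumerates all 2**n sign
    patterns under H0, so it is only practical for small n (about n <= 20).
    """
    d = [x - y for x, y in zip(a, b)]
    n = len(d)
    order = sorted(range(n), key=lambda i: abs(d[i]))
    rank = {i: r + 1 for r, i in enumerate(order)}
    w_plus = sum(rank[i] for i in range(n) if d[i] > 0)
    total = n * (n + 1) // 2
    w = min(w_plus, total - w_plus)
    hits = 0
    for signs in product((False, True), repeat=n):
        s = sum(r for r, positive in zip(range(1, n + 1), signs) if positive)
        if min(s, total - s) <= w:
            hits += 1
    return w_plus, hits / 2 ** n
```

As a sanity check, ten runs in which HOES beats the baseline every time give an exact two-sided Wilcoxon p-value of 2/1024 ≈ 0.002, the smallest value attainable with ten pairs.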
Ablation studies
To quantitatively evaluate the contribution of each core mechanism in HOES, we conducted comprehensive ablation studies across all six datasets. We compared the performance of the following variants against the full HOES model:
w/o memory: HOES without the memory system
Static Memory: HOES with a static memory pool (no dynamic replacement policy; the memory is initialized and fixed after the first iteration).
w/o Penalty: HOES without the penalty system (all six algorithms run for the full number of iterations).
Single-Mechanism HOES: A degraded version using only the transmission mechanism with EDM-IVYA (selected as the best-performing single EA from our preliminary analysis) instead of the full portfolio.
We report both the predictive accuracy (RMSE, MAE) and computational cost (Total Fitness Evaluations, Wall-clock Time in hours). The performance delta (Δ) for RMSE is calculated as (Variant_RMSE - Full_HOES_RMSE) / Full_HOES_RMSE * 100%. The results are summarized in Tables 14, 15, 16, 17, 18 and 19.
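The ΔRMSE definition above amounts to a one-line function. The sketch below applies it to the Traffic RMSE values from Table 14; `delta_rmse` is an illustrative name.

```python
def delta_rmse(variant_rmse: float, full_rmse: float) -> float:
    """Relative RMSE change of an ablation variant vs. the full model, in percent."""
    return 100.0 * (variant_rmse - full_rmse) / full_rmse


# RMSE values from Table 14 (Traffic).
TRAFFIC = {"Full HOES": 0.0431, "w/o memory": 0.0512, "Static memory": 0.0475,
           "w/o Penalty": 0.0449, "Single-mechanism": 0.0490}

if __name__ == "__main__":
    full = TRAFFIC["Full HOES"]
    for name, rmse in TRAFFIC.items():
        print(f"{name:16s} {delta_rmse(rmse, full):5.1f}%")
```

The printed values (0.0, 18.8, 10.2, 4.2, 13.7) match the ΔRMSE column of Table 14.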
Table 14.
Comprehensive ablation study results on the traffic dataset.
| Variant | RMSE | ΔRMSE (%) | MAE | Fitness evals | Time (hours) |
|---|---|---|---|---|---|
| Full HOES | 0.0431 ± 0.0019 | 0 | 0.0301 ± 0.0014 | 2250 | 38.5 |
| w/o memory | 0.0512 ± 0.0023 | 18.80% | 0.0378 ± 0.0018 | 2550 | 42.1 |
| Static memory | 0.0475 ± 0.0021 | 10.20% | 0.0341 ± 0.0016 | 2400 | 40.2 |
| w/o Penalty | 0.0449 ± 0.0020 | 4.20% | 0.0315 ± 0.0015 | 3750 | 58.7 |
| Single-mechanism | 0.0490 ± 0.0022 | 13.70% | 0.0352 ± 0.0017 | 1875 | 32.1 |
Table 15.
Comprehensive ablation study results on the weather dataset.
| Variant | RMSE | ΔRMSE (%) | MAE | Fitness evals | Time (hours) |
|---|---|---|---|---|---|
| Full HOES | 0.0139 ± 0.0007 | 0 | 0.0106 ± 0.0005 | 2100 | 25.1 |
| w/o memory | 0.0168 ± 0.0009 | 20.90% | 0.0131 ± 0.0007 | 2450 | 28.3 |
| Static Memory | 0.0152 ± 0.0008 | 9.40% | 0.0119 ± 0.0006 | 2300 | 26.5 |
| w/o Penalty | 0.0145 ± 0.0008 | 4.30% | 0.0111 ± 0.0006 | 3750 | 41.2 |
| Single-mechanism | 0.0159 ± 0.0009 | 14.40% | 0.0125 ± 0.0007 | 1875 | 21 |
Table 16.
Comprehensive ablation study results on the household dataset.
| Variant | RMSE | ΔRMSE (%) | MAE | Fitness evals | Time (hours) |
|---|---|---|---|---|---|
| Full HOES | 0.0364 ± 0.0017 | 0 | 0.0225 ± 0.0010 | 2350 | 41.8 |
| w/o memory | 0.0451 ± 0.0022 | 23.90% | 0.0298 ± 0.0014 | 2650 | 45.5 |
| Static memory | 0.0408 ± 0.0020 | 12.10% | 0.0261 ± 0.0012 | 2500 | 43.2 |
| w/o Penalty | 0.0380 ± 0.0018 | 4.40% | 0.0237 ± 0.0011 | 3750 | 62.5 |
| Single-mechanism | 0.0425 ± 0.0021 | 16.80% | 0.0274 ± 0.0013 | 1875 | 35.3 |
Table 17.
Comprehensive ablation study results on the wind power dataset.
| Variant | RMSE | ΔRMSE (%) | MAE | Fitness evals | Time (hours) |
|---|---|---|---|---|---|
| Full HOES | 0.0533 ± 0.0024 | 0 | 0.0292 ± 0.0013 | 2450 | 45.2 |
| w/o Memory | 0.0655 ± 0.0031 | 22.90% | 0.0379 ± 0.0018 | 2750 | 48.9 |
| Static memory | 0.0592 ± 0.0028 | 11.10% | 0.0335 ± 0.0016 | 2600 | 46.8 |
| w/o Penalty | 0.0554 ± 0.0026 | 3.90% | 0.0308 ± 0.0015 | 3750 | 66.1 |
| Single-mechanism | 0.0618 ± 0.0029 | 16.00% | 0.0351 ± 0.0017 | 1875 | 38.5 |
Table 18.
Comprehensive ablation study results on the solar power dataset.
| Variant | RMSE | ΔRMSE (%) | MAE | Fitness evals | Time (hours) |
|---|---|---|---|---|---|
| Full HOES | 0.0319 ± 0.0015 | 0 | 0.0153 ± 0.0008 | 1950 | 36.7 |
| w/o Memory | 0.0401 ± 0.0020 | 25.70% | 0.0201 ± 0.0010 | 2200 | 40.1 |
| Static Memory | 0.0362 ± 0.0018 | 13.50% | 0.0178 ± 0.0009 | 2100 | 38.2 |
| w/o Penalty | 0.0330 ± 0.0016 | 3.40% | 0.0160 ± 0.0008 | 3750 | 58.9 |
| Single-mechanism | 0.0375 ± 0.0019 | 17.60% | 0.0185 ± 0.0009 | 1875 | 31.5 |
Table 19.
Comprehensive ablation study results on the ETT_m1 dataset.
| Variant | RMSE | ΔRMSE (%) | MAE | Fitness evals | Time (hours) |
|---|---|---|---|---|---|
| Full HOES | 0.0101 ± 0.0005 | 0 | 0.0083 ± 0.0004 | 2050 | 22.5 |
| w/o Memory | 0.0125 ± 0.0006 | 23.80% | 0.0104 ± 0.0005 | 2350 | 25.3 |
| Static Memory | 0.0113 ± 0.0006 | 11.90% | 0.0095 ± 0.0005 | 2200 | 23.6 |
| w/o Penalty | 0.0105 ± 0.0005 | 4.00% | 0.0087 ± 0.0004 | 3750 | 35.8 |
| Single-mechanism | 0.0119 ± 0.0006 | 17.80% | 0.0099 ± 0.0005 | 1875 | 19.1 |
Impact of the Memory System: Removing the memory system (w/o Memory) caused the largest and most consistent drop in predictive accuracy across all six datasets, with RMSE increases ranging from 18.8 to 25.7%. This strongly supports the rationale that preserving and dynamically updating historically optimal solutions is crucial for maintaining population diversity and preventing premature convergence, and it suggests that the memory system is the primary driver of HOES's superior performance. The Static Memory variant also performed markedly worse than the full model (RMSE increases of 9.4–13.5%), demonstrating the necessity of the dynamic replacement policy to adaptively guide the search.
Impact of the Penalty System: Disabling the penalty system (w/o Penalty) caused only minor accuracy degradation (RMSE increases of 3.4–4.4%) but a sharp rise in computational cost: the fitness evaluations consistently reached the maximum of 3750, and wall-clock time increased by approximately 53–80% relative to the full HOES. This is consistent with the penalty system’s intended role of improving computational efficiency by pruning ineffective optimizers at the cost of only a small loss in final solution quality.
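To illustrate how pruning saves evaluations, here is a minimal sketch of one possible penalty rule, assuming that an optimizer is removed after `patience` consecutive rounds without improving the global best. `PenaltyTracker`, `patience`, and `run_rounds` are hypothetical names, and the actual HOES criterion may differ; the labels A, B, and C are placeholders for algorithms such as GM-APO or CM-BSLO.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PenaltyTracker:
    """Hypothetical penalty rule (not the paper's exact criterion): an optimizer
    that fails to improve the global best for `patience` consecutive rounds is
    removed from the active pool, so it stops consuming fitness evaluations."""

    patience: int
    active: List[str] = field(default_factory=list)
    strikes: Dict[str, int] = field(default_factory=dict)

    def register(self, names: List[str]) -> None:
        self.active = list(names)
        self.strikes = {name: 0 for name in names}

    def report(self, name: str, improved: bool) -> None:
        if name not in self.active:
            return  # already pruned
        if improved:
            self.strikes[name] = 0  # a success resets the penalty counter
            return
        self.strikes[name] += 1
        # Never prune the last remaining optimizer.
        if self.strikes[name] >= self.patience and len(self.active) > 1:
            self.active.remove(name)


def run_rounds(tracker: PenaltyTracker, history: List[Dict[str, bool]],
               evals_per_call: int) -> int:
    """Replay per-round improvement flags and return the evaluations spent.
    Each active optimizer costs `evals_per_call` evaluations per round."""
    spent = 0
    for round_flags in history:
        for name in list(tracker.active):  # copy: pruning mutates the list
            spent += evals_per_call
            tracker.report(name, round_flags.get(name, False))
    return spent


if __name__ == "__main__":
    tracker = PenaltyTracker(patience=2)
    tracker.register(["A", "B", "C"])
    spent = run_rounds(tracker, [{"A": True}] * 3, evals_per_call=10)
    print(tracker.active, spent)  # ['A'] 70
```

Here the two stagnating optimizers are pruned after the second round, so the run spends 70 rather than 90 evaluations. Without any pruning every optimizer keeps drawing on the budget, which is consistent with the w/o Penalty runs exhausting the evaluation cap.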
Synergy of Multiple Mechanisms: The full HOES consistently outperformed the Single-mechanism variant across all datasets (RMSE increases of 13.7–17.8% for the Single-mechanism variant), even though the latter had a lower computational cost. This underscores the importance of the hybrid, multi-algorithm design: the combination of the sequential transmission mechanism, the adaptive memory system, and penalty-based resource allocation underlies the robust performance of HOES.
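The sequential transmission mechanism can be pictured as a relay in which each optimizer starts from the population left by its predecessor. The sketch below assumes exactly that hand-off and swaps the six evolutionary algorithms for a toy Gaussian perturbation search with greedy acceptance on a sphere function; `transmit`, `make_perturbation_optimizer`, and `sphere` are hypothetical names chosen for illustration, not part of HOES.

```python
import random
from typing import Callable, List, Sequence

Vector = List[float]
Fitness = Callable[[Vector], float]
Optimizer = Callable[[List[Vector], Fitness, random.Random], List[Vector]]


def sphere(x: Vector) -> float:
    """Toy objective standing in for a model's validation loss."""
    return sum(v * v for v in x)


def make_perturbation_optimizer(step: float, iters: int) -> Optimizer:
    """Toy stand-in for one evolutionary algorithm (an assumption): Gaussian
    perturbation of each individual with greedy keep-if-better acceptance."""
    def optimize(pop: List[Vector], fitness: Fitness, rng: random.Random) -> List[Vector]:
        pop = [list(ind) for ind in pop]  # do not mutate the caller's population
        for _ in range(iters):
            for i in range(len(pop)):
                cand = [v + rng.gauss(0.0, step) for v in pop[i]]
                if fitness(cand) < fitness(pop[i]):
                    pop[i] = cand
        return pop
    return optimize


def transmit(optimizers: Sequence[Optimizer], pop: List[Vector],
             fitness: Fitness, rng: random.Random) -> List[Vector]:
    """Run the optimizers one after another; each starts from the population
    handed over by its predecessor (hypothetical reading of transmission)."""
    for optimize in optimizers:
        pop = optimize(pop, fitness, rng)
    return pop


if __name__ == "__main__":
    rng = random.Random(0)
    init = [[rng.uniform(-5.0, 5.0) for _ in range(4)] for _ in range(8)]
    # Coarse-to-fine chain: large steps explore, small steps refine.
    chain = [make_perturbation_optimizer(s, 50) for s in (1.0, 0.1, 0.01)]
    final = transmit(chain, init, sphere, rng)
    print(min(map(sphere, init)), "->", min(map(sphere, final)))
```

Because each stage accepts only improvements, no individual gets worse along the chain; ordering the stages from large to small step sizes is one simple way to let early stages explore and later stages refine.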
Conclusion
This study proposes a Hybrid Optimization Expert System (HOES) that integrates six evolutionary algorithms (GM-APO, CM-BSLO, NI-EDO, TDPM-FLO, LI-IAO, and EDM-IVYA) with a transmission mechanism, a memory system, and a penalty system to optimize deep learning models for time series prediction. Experimental results on six public datasets (Traffic, Weather, Household, Wind Power, Solar Power, and ETT_m1) demonstrate that HOES improves both predictive accuracy and convergence speed. For instance, on the Solar Power dataset, the HOES-optimized SJ-LSTM model achieves a 24% reduction in RMSE and a 30% reduction in MAE compared with the second-best algorithm. The memory system preserves historically optimal solutions to help the search avoid local optima, while the penalty system dynamically eliminates ineffective algorithms to keep the computational cost in check. HOES performs robustly across diverse datasets and outperforms both traditional optimization approaches and single evolutionary algorithms. These findings highlight its potential for complex time series tasks and for practical applications in transportation, energy, and environmental forecasting. Future work will focus on extending HOES to broader domains, reducing its computational overhead, and enhancing its adaptability to dynamic environments.
Author contributions
P.W.: Investigation, formal analysis, funding acquisition, writing—original draft, writing—review and editing. C.F.: Supervision, project administration. X.Y.: Conceptualization, methodology, formal analysis, investigation, writing—original draft, writing—review and editing. X.C.: Software, visualization, writing—original draft, writing—review and editing. J.G.: Formal analysis, investigation, supervision, project administration, visualization. X.D.: Formal analysis, investigation, validation. Y.W.: Software, visualization, investigation. Z.L.: Formal analysis, visualization, investigation, methodology, writing—original draft, funding acquisition.
Funding
This research is supported by the National Funded Postdoctoral Research Program (GZC20241900), the Natural Science Foundation Program of Xinjiang Uygur Autonomous Region (2024D01A141), the Key Project of Open Fund (ZSAQ202401), the Tianchi Talents Program of Xinjiang Uygur Autonomous Region, the Postdoctoral Fund of Xinjiang Uygur Autonomous Region, and the Key Laboratory of Remote Sensing Application and Innovation (LRSAI-2025004).
Data availability
The appendix is available at https://github.com/cx098/HOEs.
Declarations
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Contributor Information
Changyuan Fan, Email: fcy@cuit.edu.cn.
Xiwen Yang, Email: always_on_the_way1@163.com.
References
- 1. Ghosh, B., Basu, B. & O’Mahony, M. Multivariate short-term traffic flow forecasting using time-series analysis. IEEE Trans. Intell. Transp. Syst. 10, 246–254 (2009).
- 2. Liang, J. & Tang, W. Scenario reduction for stochastic day-ahead scheduling: A mixed autoencoder based time-series clustering approach. IEEE Trans. Smart Grid 12, 2652–2662 (2020).
- 3. Rasp, S., Pritchard, M. S. & Gentine, P. Deep learning to represent subgrid processes in climate models. Proc. Natl. Acad. Sci. 115, 9684–9689 (2018).
- 4. Benidis, K. et al. Deep learning for time series forecasting: Tutorial and literature survey. ACM Comput. Surv. 55, 1–36 (2022).
- 5. Zhou, H. et al. Informer: Beyond efficient transformer for long sequence time-series forecasting. In Proc. AAAI Conference on Artificial Intelligence 35, 11106–11115 (2021).
- 6. Singh, U., Tamrakar, S., Saurabh, K., Vyas, R. & Vyas, O. P. Optimizing parameters of deep learning models for stock price prediction. In 2024 15th International Conference on Computing Communication and Networking Technologies (ICCCNT) 1–7 (IEEE, 2024).
- 7. Bergstra, J. & Bengio, Y. Random search for hyper-parameter optimization. J. Mach. Learn. Res. 13, 281–305 (2012).
- 8. Li, Z., Li, S. & Luo, X. Using quadratic interpolated beetle antennae search to enhance robot arm calibration accuracy. IEEE Robot. Autom. Lett. 7, 12046–12053 (2022).
- 9. Wang, H. B., Gao, T. Q., Kinugawa, J. & Kosuge, K. Finding measurement configurations for accurate robot calibration: Validation with a cable-driven robot. IEEE Trans. Robot. 33, 1156–1169 (2017).
- 10. Jiao, J., Ye, H., Zhu, Y. & Liu, M. Robust odometry and mapping for multi-LiDAR systems with online extrinsic calibration. IEEE Trans. Robot. 38, 351–371 (2022).
- 11. Aleti, A. & Moser, I. A systematic literature review of adaptive parameter control methods for evolutionary algorithms. ACM Comput. Surv. 49, 1–35 (2016).
- 12. Li, Z., Li, S., Francis, A. & Luo, X. A novel calibration system for robot arm via an open dataset and a learning perspective. IEEE Trans. Circuits Syst. II Express Briefs 69, 5169–5173 (2022).
- 13. Chen, T., Li, S., Qiao, Y. & Luo, X. A robust and efficient ensemble of diversified evolutionary computing algorithms for accurate robot calibration. IEEE Trans. Instrum. Meas. 73, 1–14 (2024).
- 14. Ibarrondo, R., Gatti, G. & Sanz, M. Quantum genetic algorithm with individuals in multiple registers. IEEE Trans. Evol. Comput. 28, 788–797 (2024).
- 15. Patil, P. V., Kumaran, K., Vachhani, L., Ravitharan, S. & Chauhan, S. Robust state and unknown input estimator and its application to robot localization. IEEE/ASME Trans. Mechatron. 27, 5147–5158 (2022).
- 16. Wei, P. et al. Efficient adaptive learning rate for convolutional neural network based on quadratic interpolation Egret swarm optimization algorithm. Heliyon 18, e37814 (2024).
- 17. Wei, P. et al. A novel black widow optimization algorithm based on Lagrange interpolation operator for ResNet18. Biomimetics 10, 6 (2025).
- 18. Li, L. et al. Hyperband: A novel bandit-based approach to hyperparameter optimization. Comput. Res. Repository 18, 6765–6816 (2017).
- 19. Moayedi, H. et al. Optimization of ANFIS with GA and PSO estimating α ratio in driven piles. Eng. Comput. 36, 1–12 (2020).
- 20. Yuan, X. et al. Short-term wind power prediction based on LSSVM–GSA model. Energy Convers. Manag. 101, 393–401 (2015).
- 21. Li, Y. et al. A deep-learning intelligent system incorporating data augmentation for short-term voltage stability assessment of power systems. Appl. Energy 308, 118347 (2021).
- 22. Guresen, E. et al. Using artificial neural network models in stock market index prediction. Expert Syst. Appl. 38, 10389–10397 (2011).
- 23. Li, Z., Li, S. & Luo, X. Efficient industrial robot calibration via a novel unscented Kalman filter-incorporated variable step-size Levenberg–Marquardt algorithm. IEEE Trans. Instrum. Meas. 72, 1–12 (2023).
- 24. Landgraf, C., Ernst, K., Schleth, G., Fabritius, M. & Huber, M. F. A hybrid neural network approach for increasing the absolute accuracy of industrial robots. In Proc. IEEE 17th International Conference on Automation Science and Engineering, Lyon, France, 468–474 (2021).
- 25. Wang, W. et al. Arctic puffin optimization: A bio-inspired metaheuristic algorithm for solving engineering design optimization. Adv. Eng. Softw. 195, 103694 (2024).
- 26. Bai, J. et al. Blood-sucking leech optimizer. Adv. Eng. Softw. 195, 103696 (2024).
- 27. Truong, D. N. & Chou, J. S. Metaheuristic algorithm inspired by enterprise development for global optimization and structural engineering problems with frequency constraints. Eng. Struct. 318, 118679 (2024).
- 28. Falahah, I. A. et al. Frilled lizard optimization: A novel bio-inspired optimizer for solving engineering applications. Comput. Mater. Contin. 79, 3631–3678 (2024).
- 29. Wu, X. et al. Information acquisition optimizer: A new efficient algorithm for solving numerical and constrained engineering optimization problems. J. Supercomput. (2024).
- 30. Ghasemi, M. et al. Optimization based on the smart behavior of plants with its engineering applications: Ivy algorithm. Knowl.-Based Syst. 295, 111850 (2024).
- 31. Torkaman, T., Roshanfar, M., Dargahi, J. & Hooshiar, A. Embedded six-DOF force-torque sensor for soft robots with learning-based calibration. IEEE Sens. J. 23, 4204–4215 (2023).
- 32. Han, H., Bai, X., Hou, Y. & Qiao, J. Multitask particle swarm optimization with heterogeneous domain adaptation. IEEE Trans. Evol. Comput. 28, 178–192 (2024).
- 33. Draper, N. R. & Smith, H. Applied Regression Analysis (Wiley, 1998).
- 34. Chai, T. & Draxler, R. R. Root mean square error (RMSE) or mean absolute error (MAE)? Geosci. Model Dev. Discuss. 7, 1525–1534 (2014).
- 35. Willmott, C. J. & Matsuura, K. Advantages of the mean absolute error (MAE) over the root mean square error (RMSE) in assessing average model performance. Clim. Res. 30, 79–82 (2005).
- 36. Willmott, C. J. On the validation of models. Phys. Geogr. 2, 184–194 (1981).
- 37. Rumelhart, D. E., Hinton, G. E. & Williams, R. J. Learning representations by back-propagating errors. Nature 323, 533–536 (1986).
- 38. Cho, K., van Merriënboer, B., Bahdanau, D. & Bengio, Y. On the properties of neural machine translation: Encoder-decoder approaches. arXiv:1409.1259 (2014).
- 39. Hochreiter, S. & Schmidhuber, J. Long short-term memory. Neural Comput. 9, 1735–1780 (1997).
- 40. Bai, S., Kolter, J. Z. & Koltun, V. An empirical evaluation of generic convolutional and recurrent networks for sequence modeling. arXiv:1803.0127 (2018).
- 41. Vaswani, A. et al. Attention is all you need. Adv. Neural Inf. Process. Syst. 30 (2017).
- 42. Mao, W. et al. Modeling air quality prediction using a deep learning approach: Method optimization and evaluation. Sustain. Cities Soc. 65, 102567 (2021).
- 43. Lin, H. et al. Learning rate dropout. IEEE Trans. Neural Netw. Learn. Syst. 34, 9029–9039 (2023).
- 44. Si, B., Liu, F. & Li, Y. Metamodel-based hyperparameter optimization of optimization algorithms in building energy optimization. Buildings 13, 167 (2023).
- 45. Morteza, A. & Chou, R. A. Distributed batch matrix multiplication: Trade-offs in download rate, randomness, and privacy. arXiv:2509.15047 (2025).
- 46. Raiaan, M. A. K. et al. A systematic review of hyperparameter optimization techniques in convolutional neural networks. Decis. Anal. J. 11 (2024).
- 47. Bischl, B. et al. Hyperparameter optimization: Foundations, algorithms, best practices, and open challenges. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery (2023).