Abstract
This paper presents an improved teaching-learning-based optimization (TLBO) algorithm, called RLTLBO, for solving optimization problems. First, a new learning mode that considers the effect of the teacher is presented. Second, the Q-Learning method from reinforcement learning (RL) is introduced to build a switching mechanism between the two learning modes in the learner phase. Finally, random opposition-based learning (ROBL) is adopted after both the teacher and learner phases to improve the local optima avoidance ability of RLTLBO. These strategies effectively enhance the convergence speed and accuracy of the proposed algorithm. RLTLBO is evaluated on 23 standard benchmark functions and eight CEC2017 test functions to verify its optimization performance. The results reveal that the proposed algorithm is effective and efficient in solving the benchmark test functions. Moreover, RLTLBO is also applied to eight industrial engineering design problems. Compared with the basic TLBO and seven state-of-the-art algorithms, the results illustrate that RLTLBO has superior performance and promising prospects for dealing with real-world optimization problems. The source code of RLTLBO is publicly available at https://github.com/WangShuang92/RLTLBO.
1. Introduction
In recent years, real-world optimization problems have become increasingly complex and diverse across a wide range of fields and disciplines. Traditional (mathematical) optimization methods, such as Newton's method and gradient descent, can no longer meet the needs of current optimization problems. Thus, nontraditional methods, especially metaheuristic algorithms, are becoming increasingly pervasive among researchers [1–3]. Metaheuristics are algorithms based on intuition or experience that can provide a feasible solution at an acceptable cost (in computing time and computational resources), although the deviation between the feasible solution and the optimal solution may not be predictable in advance. Metaheuristic optimization algorithms have the merits of being flexible, having few parameters, and avoiding local optima. Additionally, they can be rapidly deployed and thus have been utilized for solving various optimization problems over the past decades [4, 5]. Some of the most representative metaheuristic algorithms are the genetic algorithm (GA) [6], differential evolution (DE) [7], simulated annealing (SA) [8], the arithmetic optimization algorithm (AOA) [9], the heat transfer relation-based optimization algorithm (HTOA) [10], particle swarm optimization (PSO) [11], the salp swarm algorithm (SSA) [12], the grey wolf optimizer (GWO) [13], the whale optimization algorithm (WOA) [14], the aquila optimizer (AO) [15], and the remora optimization algorithm (ROA) [16].
Teaching-learning-based optimization (TLBO) is a meta-heuristic algorithm proposed by Rao et al. in 2011 [17]. The TLBO method is inspired by the teaching-learning process in a class and simulates the influence of a teacher on learners. Due to its advantages of rapid convergence, absence of algorithm-specific parameters, and easy implementation, TLBO has become a popular optimization algorithm and has been successfully applied to real-world problems in diverse fields. Aouf et al. [18] applied TLBO to optimize the parameters of an ANFIS structure to obtain the optimal trajectory and traveling time, addressing the navigation problem of a mobile robot in an unknown environment. Singh et al. [19] studied the application of TLBO to the optimal coordination of directional overcurrent relays (DOCRs) in a looped power system. Multiobjective TLBO was applied to the motif discovery problem (MDP) in bioinformatics by Gonzalez-Alvarez et al. [20] and obtained better solutions than other biology-based multiobjective evolutionary algorithms. All the above applications suggest that TLBO can be effectively applied to many optimization problems in various fields.
Improvements and hybridizations of TLBO, together with their applications, have also been studied by several researchers [21]. Kumar and Singh [22] developed a chaotic version of TLBO with different chaotic mechanisms. A local search method was also incorporated to guide the search direction between local and global search and to improve solution quality. Its application to clustering problems demonstrated the effectiveness of this algorithm. Taheri et al. [23] proposed a balanced TLBO with three modifications, called BTLBO. A weighted mean replaced the mean value in the teacher phase to maintain diversity. A tutoring phase was added as a powerful local search mechanism for exploiting regions around the best solution. A restarting phase was introduced to improve the exploration ability by replacing inactive learners with randomly initialized learners. Ma et al. [24] proposed a modified TLBO (MTLBO) by introducing a population group mechanism into the basic TLBO. All students were divided into two groups and updated by different updating strategies. The MTLBO was also applied to establish the NOx emission model of a circulating fluidized bed boiler. Xu et al. [25] introduced a dynamic-opposite learning (DOL) strategy into TLBO to overcome premature convergence. The asymmetric search space and the dynamically changing characteristics of DOL help DOLTLBO to holistically improve its exploitation and exploration capabilities. Dong et al. [26] presented a KTLBO algorithm for computationally expensive constrained optimization. A kriging-assisted two-phase optimization framework was used to alternately conduct global and local searches, thereby accelerating the search. KTLBO was also adopted to design the structure of a blended-wing-body underwater glider. Ren et al. [27] developed a multiobjective elitist feedback TLBO (MEFTO) for multiobjective optimization problems. The elitism strategy was used to store the best solutions obtained thus far.
The proposed feedback phase allowed students to choose whether to study directly with the teacher or to motivate themselves, providing a novel way for students to improve themselves. Zhang et al. [28] proposed a hybrid algorithm based on TLBO and a neural network algorithm (NNA) named TLNNA to solve engineering optimization problems. The experimental results suggested that TLNNA has improved global search ability and fast convergence speed. By considering the features of the WOA and TLBO, Lakshmi and Mohanaiah [29] proposed a hybrid WOA-TLBO algorithm. This was also applied to solve the facial emotion recognition (FER) functional problem, and the reported results showed its effectiveness and high accuracy.
The TLBO variants proposed previously have improved search ability and accelerated the convergence process, but they still struggle with premature convergence and insufficient learning processes. Thus, in this paper, we propose an improved TLBO algorithm to solve industrial engineering optimization problems. Given the characteristics of TLBO, reinforcement learning (RL) from machine learning is introduced into the learner phase, enabling the algorithm to choose a more suitable learning mode, which trains the search agents to perform more beneficial actions. In addition, a random opposition-based learning (ROBL) strategy is added after the whole learner phase to accelerate convergence and avoid local optima. The proposed improved TLBO with RL and ROBL strategies is called RLTLBO. The standard and CEC2017 benchmark functions and eight engineering design problems are used to test the exploration and exploitation capabilities of the proposed method. The RLTLBO algorithm is compared with several existing algorithms: the basic TLBO and the salp swarm algorithm (SSA), which are classical algorithms; the aquila optimizer (AO), Harris hawks optimization (HHO) [30], and the horse herd optimization algorithm (HOA) [31], which are recently proposed methods; and the memory-based grey wolf optimizer (mGWO) [32], the modified ant lion optimizer (MALO) [33], and the dynamic sine cosine algorithm (DSCA) [34], which are recent improved algorithms. The experimental results show that the proposed RLTLBO method is superior to these state-of-the-art algorithms in exploration and exploitation capabilities. Moreover, eight industrial engineering design problems are used to evaluate the effectiveness of the algorithm in solving real-world optimization problems.
The rest of this paper is organized as follows: Section 2 provides a brief overview of the basic TLBO, RL, and ROBL strategies. Section 3 describes the proposed RLTLBO algorithm in detail. Simulations, experiments and an analysis of the results are presented in Section 4. Section 5 describes industrial engineering design problems. Finally, Section 6 concludes the paper.
2. Related Work
2.1. Teaching-Learning-Based Optimization
The TLBO algorithm mimics the influence of a teacher on the output of learners, which can be reflected by learners' grades. As a highly learned person, the teacher gives their knowledge to the learners. The outcome of the learners is affected by the quality of the teacher. It is obvious that learners trained by a good teacher can achieve better results in terms of their grades. The optimization process of TLBO is divided into two phases: the teacher phase and the learner phase.
2.1.1. Teacher Phase
The teacher phase simulates the teaching process of a teacher. The best one in the class is selected as the teacher, and then the teacher tries their best to improve the overall level of the class. The teaching process can be formulated as follows:
Xnew = Xold + rand × (Xteacher − TF × Mean)  (1)
where Xnew and Xold represent the positions of an individual after and before learning, that is, the candidate solutions after and before updating. Xteacher is the position of the teacher, which is the best individual in the population. Mean indicates the average position of the search agents in the population. TF is a teaching factor that determines the change of the mean value, and rand is a random number between 0 and 1. The value of TF can be either 1 or 2; it is a heuristic step chosen randomly with equal probability as TF = round(1 + rand(0, 1)(2 − 1)).
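The teacher phase update of equation (1) can be sketched as follows; the population size, dimension, and sphere objective are illustrative assumptions:

```python
import numpy as np

def teacher_phase(population, fitness, rng):
    """One TLBO teacher phase step per equation (1):
    Xnew = Xold + rand * (Xteacher - TF * Mean)."""
    teacher = population[np.argmin(fitness)]   # best individual acts as the teacher
    mean = population.mean(axis=0)             # average position of the class
    tf = rng.integers(1, 3)                    # teaching factor: 1 or 2 with equal probability
    r = rng.random(population.shape)           # rand in [0, 1)
    return population + r * (teacher - tf * mean)

# Usage on a small random class of 5 learners in 3 dimensions:
rng = np.random.default_rng(0)
pop = rng.uniform(-10, 10, size=(5, 3))
fit = (pop ** 2).sum(axis=1)                   # sphere fitness as a stand-in objective
new_pop = teacher_phase(pop, fit, rng)
print(new_pop.shape)  # (5, 3)
```

In practice the new positions are evaluated and accepted only if they improve the fitness, as in the learner phase.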
2.1.2. Learner Phase
In addition to learning new knowledge from the teacher, learners can also increase their knowledge through interaction. In the mutual learning process, a learner randomly interacts with another learner and learns from the one with the better grade. The expression of the learner phase can be written as follows:
Xnew = Xold + rand × (Xr1 − Xr2), if f(Xr1) < f(Xr2)
Xnew = Xold + rand × (Xr2 − Xr1), otherwise  (2)
where Xr1 and Xr2 indicate the positions of two learners randomly selected from the population, and f (·) is the fitness value. The comparison between the two learners determines the learning direction: the individual with a poor grade learns from the individual with a better grade. The new individual is accepted if it improves after learning; otherwise, it is rejected.
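A minimal sketch of the learner phase of equation (2) with greedy acceptance, again using a sphere objective as an assumed stand-in:

```python
import numpy as np

def learner_phase(population, f, rng):
    """One TLBO learner phase step per equation (2): each learner moves
    toward the better of two randomly chosen peers and away from the worse,
    keeping the new position only if it improves the fitness."""
    n, dim = population.shape
    new_pop = population.copy()
    for i in range(n):
        r1, r2 = rng.choice(n, size=2, replace=False)  # two distinct random learners
        x1, x2 = population[r1], population[r2]
        if f(x1) < f(x2):                              # minimization: x1 has the better grade
            cand = population[i] + rng.random(dim) * (x1 - x2)
        else:
            cand = population[i] + rng.random(dim) * (x2 - x1)
        if f(cand) < f(population[i]):                 # greedy acceptance
            new_pop[i] = cand
    return new_pop

rng = np.random.default_rng(1)
sphere = lambda x: float((x ** 2).sum())
pop = rng.uniform(-10, 10, size=(6, 4))
out = learner_phase(pop, sphere, rng)
print(out.shape)  # (6, 4)
```

The greedy acceptance guarantees that no learner's fitness deteriorates in this step.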
The flow chart of the TLBO algorithm is shown in Figure 1.
Figure 1.

The flowchart of TLBO.
2.2. Reinforcement Learning (RL)
Machine learning algorithms are also widely used to solve various optimization problems [35]. Machine learning methods generally fall into four categories, as shown in Figure 2: supervised learning, unsupervised learning, semisupervised learning, and reinforcement learning (RL). In RL algorithms, the agent is trained to learn optimal actions in a complex environment. The agent is trained in different ways and uses its training experience in subsequent actions. RL methods generally consist of model-free and model-based approaches. The model-free approaches can be divided into two subgroups: value-based and policy-based methods. The value-based algorithms are convenient for coordinating with meta-heuristic algorithms because they are model-free and policy-free, providing higher flexibility [36]. In value-based RL approaches, the reinforcement agent learns from its actions and experience in the environment, such as through rewards and penalties. The agent measures the success of an action in completing the task goal through the reward or penalty and then makes a decision based on its achievement.
Figure 2.

Classification of the reinforcement learning algorithms.
The Q-Learning method is one of the representative value-based RL algorithms. In Q-Learning, the agent takes random actions and then obtains a reward or penalty. Experience is gradually constructed based on the agent's actions. Throughout the process of building experience, a table called the Q-Table is maintained [37]. The agent considers all possible actions and updates its state according to the Q-Table values, selecting the action that maximizes the reward for the current state. Therefore, the agent's action determines whether it explores or exploits the environment.
Compared with RL methods, meta-heuristic algorithms often require deep expert knowledge to establish the balance between different phases. RL methods can help discover optimal parameter designs and more balanced strategies, allowing the algorithm to switch between the exploration and exploitation phases. Metaheuristic methods usually operate with specific policies in certain situations, and thus their dynamism is lower than that of RL algorithms, especially value-based methods. The agent in value-based methods is online and performs beneficial actions through a reward-penalty mechanism without following any policy. Much research on combining meta-heuristics and RL has been presented in the literature [38–44].
2.3. Random Opposition-Based Learning (ROBL)
Random opposition-based learning (ROBL) is a variant of opposition-based learning (OBL) [45] proposed by Long et al. in 2019 [46]. OBL is a powerful optimization tool that simultaneously considers the fitness of an estimate and its corresponding opposite estimate to achieve a better candidate solution. In contrast to the basic OBL, ROBL utilizes a random term to improve the OBL strategy, which is defined as follows:
xrobl,j = lj + uj − rand × xj,  j = 1, 2, …, n  (3)
where xrobl,j and xj indicate the opposite and original solutions, and uj and lj are the upper and lower bounds of the problem in the jth dimension. The opposite solution is randomly selected in the opposite half of the search space. This solution is not only opposite but also random, with a wider range of distributions. An example of ROBL solutions is shown in Figure 3. The opposite solution with a random term described by equation (3) is more stochastic than that of the basic OBL and can effectively help the algorithm jump out of local optima.
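Equation (3) can be sketched in a few lines; the bounds and population size below are illustrative assumptions:

```python
import numpy as np

def robl(x, lower, upper, rng):
    """Random opposition-based learning per equation (3): the opposite point
    l_j + u_j - rand_j * x_j; with rand_j = 1 it reduces to the basic OBL point."""
    return lower + upper - rng.random(x.shape) * x

rng = np.random.default_rng(2)
x = rng.uniform(-10, 10, size=(4, 3))       # 4 candidate solutions in 3 dimensions
x_robl = robl(x, -10.0, 10.0, rng)
x_obl = -10.0 + 10.0 - x                    # basic OBL opposites, for comparison
print(x_robl.shape)  # (4, 3)
```

Because each dimension is scaled by an independent random number, the ROBL points scatter around the basic OBL points, which is the wider distribution shown in Figure 3.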
Figure 3.

Example of ROBL solutions. Three sets of solutions (original solution, corresponding opposite solution (xobl), and random opposite solution (xrobl)) are labeled in a two-dimensional search space. The random opposite solutions are not only in the symmetric positions but are also distributed over a wider range.
3. The Proposed RLTLBO Algorithm
3.1. New Learning Mode
The basic TLBO algorithm performs the learner phase after the teacher phase in each iteration. The search agent learns from other individuals in the learner phase. However, in the actual learning process, the way students learn from each other varies from person to person. Different students might choose different learning modes, such as formal communication, group discussion, or presentations. Moreover, students might adjust their learning mode according to their learning situation during the learning process. Therefore, in this paper, we introduce another learning mode to diversify the learning methods of the students, which can be described as follows:
| (4) |
where Xr3 is the position of a learner randomly selected from the population. t and T are the current and maximum number of iterations.
In this learning mode, the effect of the teacher is introduced. Mutual learning between students is not always beneficial, and the partial intervention of the teacher is sometimes more helpful for the students' improvement. Students not only learn from each other but also ask the teacher for help. At the beginning of the iterations, the weight of mutual learning among students is larger, and the algorithm pays more attention to random learning, which maintains population diversity and increases global searchability. In the later iteration stage, students consult the teacher more and approach the teacher, enhancing the algorithm's local searchability.
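The exact form of equation (4) is not reproduced here; purely to illustrate the iteration-dependent weighting this paragraph describes, the sketch below blends a random-peer term and a teacher term with an assumed linear schedule w = t/T (this is an illustration, not the paper's equation):

```python
import numpy as np

def mode2_step(x, x_r3, x_teacher, t, T, rng):
    # Illustration only: early iterations (small t) weight the random peer X_r3,
    # late iterations weight the teacher. The linear schedule w = t/T is assumed.
    w = t / T
    return x + (1 - w) * rng.random(x.shape) * (x_r3 - x) \
             + w * rng.random(x.shape) * (x_teacher - x)

rng = np.random.default_rng(3)
x, peer, teacher = rng.random((3, 5))          # one learner, a random peer, the teacher
early = mode2_step(x, peer, teacher, t=1, T=500, rng=rng)
late = mode2_step(x, peer, teacher, t=499, T=500, rng=rng)
print(early.shape, late.shape)
```

At t near 0 the step is dominated by the peer term (exploration); at t near T it is dominated by the teacher term (exploitation), matching the behavior described above.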
3.2. Learner Phase with RL Strategy
To enable students to adjust their learning mode more effectively, Q-Learning from RL is introduced to handle the switching between the two learning modes. The student uses the Q-Table values as a guide to decide between the different learning modes. The Q-Table is updated using a reward-penalty mechanism. The student selects the best state by calculating the benefit of each possible state and taking the learning mode with the highest Q-value for the next step. The student obtains a reward or a penalty according to its actions after each step. The general pattern of the RL agent and environment framework is shown in Figure 4.
Figure 4.

Reinforcement learning agent and environment framework. at represents the current action; st and st+1 indicate the current and the next state, and rt and rt+1 indicate the current and the next reward, respectively.
In the Q-Learning method, a reward table, which can be provided by the user, is used to reward or penalize the agent for its state-action pairs. The reward table in this work contains a positive (+1) or negative (−1) reward for each state and action pair. The Q-Table can be considered the agent's experience and is initialized to zero for all entries. The student then updates the Q-Table using the Bellman equation (5) and prepares the Q-Table for the next iteration [44].
Qt+1(st, at) = Qt(st, at) + λ[rt+1 + γ · maxa Qt(st+1, a) − Qt(st, at)]  (5)
where st and st+1 indicate the current and the next state, respectively, Qt and Qt+1 are the current Q-value and the pre-estimated Q-value for the next state st+1, and at represents the current action. λ and γ are the learning rate and discount factor, respectively, both numbers between 0 and 1. The learning rate determines how fast the algorithm learns and controls the convergence of the learning process. The discount factor defines how much the algorithm learns from its mistakes and controls the importance of future rewards. rt+1 indicates the immediate reward or penalty the agent receives for taking the current action.
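The Bellman update of equation (5) for a two-state, two-action Q-Table (one state per learning mode) can be sketched as follows; the λ and γ values here are illustrative assumptions:

```python
import numpy as np

def q_update(Q, s, a, r, s_next, lam=0.1, gamma=0.9):
    """Bellman update of equation (5); lam (learning rate) and gamma
    (discount factor) values are illustrative assumptions."""
    Q[s, a] += lam * (r + gamma * Q[s_next].max() - Q[s, a])
    return Q

Q = np.zeros((2, 2))                         # all entries start at zero
Q = q_update(Q, s=0, a=1, r=1.0, s_next=1)   # reward +1 for this state-action pair
print(Q[0, 1])  # 0.1
```

Starting from an all-zero table, a single +1 reward moves the corresponding entry by λ × 1 = 0.1, and future updates propagate rewards backward through the γ-discounted maximum term.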
In each iteration, the agent uses equation (5) to calculate and weigh each possible state and action for the next step before choosing the best action (learning mode 1 or learning mode 2), that is, the one with the highest likelihood of approaching the optimal solution. Examples of the reward table and Q-Table are displayed in Figure 5. This RL strategy helps establish a switching mechanism between the different learning modes in the learner phase and find the most suitable decision scheme. The four possible actions are listed below:
When the student is learning in learning mode 1, they decide to stay in learning mode 1
When the student is learning in learning mode 2, they decide to stay in learning mode 2
When the student is learning in learning mode 1, they decide to switch to learning mode 2
When the student is learning in learning mode 2, they decide to switch to learning mode 1
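With a 2 × 2 Q-Table (rows: current mode; columns: mode chosen for the next step), greedily picking the column with the largest Q-value realizes these four actions. A minimal sketch, with assumed Q-values:

```python
import numpy as np

def next_mode(Q, mode):
    """Pick the next learning mode greedily from the Q-Table row
    of the current mode; columns index the mode chosen next."""
    return int(np.argmax(Q[mode]))

Q = np.array([[0.2, 0.5],   # from mode 1: stay (0.2) vs. switch to mode 2 (0.5)
              [0.1, 0.4]])  # from mode 2: switch to mode 1 (0.1) vs. stay (0.4)
print(next_mode(Q, 0))  # 1 -> switch from mode 1 to mode 2
print(next_mode(Q, 1))  # 1 -> stay in mode 2
```

Each (row, column) pair of the table corresponds to one of the four actions listed above.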
Figure 5.

The reward table and Q-Table example of RLTLBO. (a) Reward Table sample (b) Q-Table sample.
The most important value of the RL strategy is that it helps the algorithm switch between the different learning modes as needed during the learner phase. As a result, the algorithm can find better solutions faster and more effectively in the search space, considerably increasing search efficiency. Therefore, the convergence speed of the algorithm is effectively improved.
3.3. The Detail of RLTLBO
In the improved TLBO algorithm, the teacher phase of the basic TLBO is carried out first. Then, the learner phase with the RL strategy is implemented to achieve an effective and efficient investigation of the search space. Finally, ROBL is added to enhance the ability to avoid local optima. The random opposite solution increases the probability of the algorithm finding a better solution. This variant of TLBO, which incorporates RL, is named RLTLBO. The pseudocode and the flowchart of the proposed RLTLBO algorithm are shown in Algorithm 1 and Figure 6, respectively.
Algorithm 1.

Pseudocode of RLTLBO.
Figure 6.

The flowchart of RLTLBO.
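For reference, a condensed, self-contained sketch of this loop (teacher phase, ROBL, Q-Table-driven learner phase). The reward values, greedy mode selection, and the weighting in mode 2 are simplifications and assumptions, not the paper's exact pseudocode:

```python
import numpy as np

def sphere(x):
    return float((x ** 2).sum())

def rltlbo_sketch(f, lb, ub, n=20, dim=5, T=100, seed=0):
    """Simplified RLTLBO-style loop under stated assumptions."""
    rng = np.random.default_rng(seed)
    pop = rng.uniform(lb, ub, size=(n, dim))
    fit = np.array([f(x) for x in pop])
    Q = np.zeros((2, 2))
    mode = 0
    for t in range(T):
        best = pop[np.argmin(fit)].copy()
        best_fit = fit.min()
        mean = pop.mean(axis=0)
        # --- teacher phase, equation (1), with greedy acceptance ---
        tf = rng.integers(1, 3)
        cand = np.clip(pop + rng.random((n, dim)) * (best - tf * mean), lb, ub)
        cf = np.array([f(x) for x in cand])
        mask = cf < fit
        pop[mask], fit[mask] = cand[mask], cf[mask]
        # --- ROBL, equation (3), after the phase ---
        opp = np.clip(lb + ub - rng.random((n, dim)) * pop, lb, ub)
        of = np.array([f(x) for x in opp])
        mask = of < fit
        pop[mask], fit[mask] = opp[mask], of[mask]
        # --- learner phase: greedy mode choice from the Q-Table (assumption) ---
        action = int(np.argmax(Q[mode]))
        best = pop[np.argmin(fit)].copy()
        for i in range(n):
            r1, r2 = rng.choice(n, size=2, replace=False)
            if action == 0:   # mode 1: peer-to-peer learning, equation (2)
                d = pop[r1] - pop[r2] if fit[r1] < fit[r2] else pop[r2] - pop[r1]
                c = pop[i] + rng.random(dim) * d
            else:             # mode 2: iteration-weighted teacher term (assumed form)
                w = t / T
                c = pop[i] + (1 - w) * rng.random(dim) * (pop[r1] - pop[i]) \
                           + w * rng.random(dim) * (best - pop[i])
            c = np.clip(c, lb, ub)
            cf_i = f(c)
            if cf_i < fit[i]:
                pop[i], fit[i] = c, cf_i
        # reward +1 if the global best improved this iteration, else -1 (assumed)
        r = 1.0 if fit.min() < best_fit else -1.0
        Q[mode, action] += 0.1 * (r + 0.9 * Q[action].max() - Q[mode, action])
        mode = action
    return fit.min()

result = rltlbo_sketch(sphere, -100.0, 100.0)
print(result < 1.0)
```

Even this simplified version converges quickly on the sphere function, illustrating how the greedy acceptance in both phases combines with ROBL's opposite candidates.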
3.4. Computational Complexity Analysis
RLTLBO mainly consists of three components: initialization, fitness evaluation, and position updating. In the initialization phase, the computational complexity of generating positions is O(N). Then, the computational complexity of evaluating the fitness of the solutions is O(2 × N) during the iteration process. Finally, ROBL is utilized to keep the algorithm from falling into local optima. Thus, the computational complexity of position updating in RLTLBO is O(2 × N × D), where D is the dimension of the problem. Therefore, the total computational complexity of the proposed RLTLBO algorithm is O(3 × N + 2 × N × D).
4. Numerical Experiments and Results
In this section, two kinds of benchmark functions are used to evaluate the performance of the proposed RLTLBO algorithm. The standard benchmark functions are tested first to assess the algorithm on twenty-three simple numerical problems. Then, the CEC2017 benchmark functions are utilized to evaluate the algorithm on complex numerical problems. RLTLBO is compared with three types of existing algorithms: the classic methods TLBO and SSA; the recently proposed algorithms HOA [31], AO, and HHO [30]; and the improved algorithms mGWO [32], MALO [33], and DSCA [34]. For consistency across all tests, we set the population size to N = 30, the dimension to D = 30, and the maximum number of iterations to T = 500. All algorithms are run 30 times independently, and the average values and standard deviations are presented as the final experimental results. All experiments are implemented in MATLAB R2020b on a PC with an Intel (R) Core (TM) i5-9500 CPU @ 3.00 GHz and 16 GB of RAM running Windows 10.
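The reporting protocol above (30 independent runs, then mean and standard deviation of the final objective values) can be sketched as follows; `run_once` is a hypothetical stand-in for one full optimization run with a given random seed:

```python
import numpy as np

def report(run_once, n_runs=30):
    """Run a stochastic optimizer n_runs times with different seeds
    and report the mean and standard deviation of its results."""
    results = np.array([run_once(seed) for seed in range(n_runs)])
    return results.mean(), results.std()

# Hypothetical stand-in: one "run" just draws a seeded random value.
mean, std = report(lambda seed: np.random.default_rng(seed).random())
print(0.0 <= mean <= 1.0, std >= 0.0)  # True True
```

Seeding each run independently keeps the 30 trials reproducible while still sampling the algorithm's stochastic behavior.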
4.1. Standard Benchmark Function Experiments
Standard benchmark functions [47] can be divided into three types: unimodal, multimodal and fixed-dimension multimodal functions. Unimodal functions only have one global optimum and no local optima, which can be used to evaluate an algorithm's convergence rate and exploitation capability. Multimodal and fixed-dimension multimodal functions have a global optimum and multiple local optima. This characteristic makes these functions effective for testing the exploration and local optima avoidance abilities of an algorithm. The benchmark function details are listed in Tables 1–3.
Table 1.
Unimodal benchmark functions.
| Function | Dim | Range | f min |
|---|---|---|---|
| F 1(x)=∑i=1nxi2 | 30 | [−100, 100] | 0 |
| F 2(x)=∑i=1n|xi|+∏i=1n|xi| | 30 | [−10, 10] | 0 |
| F 3(x)=∑i=1n(∑j−1ixj)2 | 30 | [−100, 100] | 0 |
| F 4(x)=maxi{|xi|, 1 ≤ i ≤ n} | 30 | [−100, 100] | 0 |
| F 5(x)=∑i=1n−1[100(xi+1 − xi2)2+(xi − 1)2] | 30 | [−30, 30] | 0 |
| F 6(x)=∑i=1n(⌊xi+0.5⌋)2 | 30 | [−100, 100] | 0 |
| F 7(x)=∑i=1nixi4+random[0,1) | 30 | [−1.28, 1.28] | 0 |
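Two of the listed benchmarks, implemented directly from their formulas: the unimodal F1 (sphere) above and, from Table 2, the multimodal F9 (Rastrigin); both have a global minimum of 0 at the origin.

```python
import numpy as np

def f1(x):
    """F1: sphere function, sum of squares."""
    return float(np.sum(x ** 2))

def f9(x):
    """F9: Rastrigin function, sum of x_i^2 - 10*cos(2*pi*x_i) + 10."""
    return float(np.sum(x ** 2 - 10 * np.cos(2 * np.pi * x) + 10))

x0 = np.zeros(30)            # 30-dimensional, matching the Dim column
print(f1(x0), f9(x0))  # 0.0 0.0
```

F1's single basin makes it a test of exploitation, while F9's cosine term creates a grid of local optima that tests exploration.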
Table 2.
Multimodal benchmark functions.
| Function | Dim | Range | f min |
|---|---|---|---|
| 30 | [−500, 500] | −418.9829 × Dim | |
| F 9(x)=∑i=1n[xi2 − 10 cos(2πxi)+10] | 30 | [−5.12, 5.12] | 0 |
| 30 | [−32, 32] | 0 | |
| 30 | [−600, 600] | 0 | |
| 30 | [−50, 50] | 0 | |
| 30 | [−50, 50] | 0 |
Table 3.
Fixed-dimension multimodal benchmark functions.
| Function | Dim | Range | f min |
|---|---|---|---|
| F 14(x)=(1/500+∑j=125 1/(j+∑i=12(xi − aij)6))−1 | 2 | [−65, 65] | 0.998 |
| F 15(x)=∑i=111[ai − x1(bi2+bix2)/(bi2+bix3+x4)]2 | 4 | [−5, 5] | 0.00030 |
| F 16(x)=4x12 − 2.1x14+1/3x16+x1x2 − 4x22+x24 | 2 | [−5, 5] | −1.0316 |
| F 17(x)=(x2 − 5.1/(4π2)x12+5/π x1 − 6)2+10(1 − 1/(8π))cos x1+10 | 2 | [−5, 5] | 0.398 |
| 2 | [−2, 2] | 3 | |
| F 19(x)=−∑i=14ciexp(−∑j=13aij(xj − pij)2) | 3 | [−1, 2] | −3.86 |
| F 20(x)=−∑i=14ciexp(−∑j=16aij(xj − pij)2) | 6 | [0, 1] | −3.32 |
| F 21(x)=−∑i=15[(X − ai)(X − ai)T+ci]−1 | 4 | [0, 10] | −10.1532 |
| F 22(x)=−∑i=17[(X − ai)(X − ai)T+ci]−1 | 4 | [0, 10] | −10.4028 |
| F 23(x)=−∑i=110[(X − ai)(X − ai)T+ci]−1 | 4 | [0, 10] | −10.5363 |
4.1.1. Qualitative Results
The results for the 23 standard benchmark functions are shown in Table 4, with the best results in bold. For the unimodal functions F1–F7, the RLTLBO algorithm achieves the best average values and standard deviations among all comparative algorithms on most functions, and only obtains worse results on F5 and F6. RLTLBO obtains the theoretical optimum on F1 and F3. It can be concluded from the comparison results that RLTLBO is strongly competitive on the unimodal functions, which indicates that its excellent exploitation capability comes from the RL mechanism.
Table 4.
Results of algorithms on 23 standard benchmark functions.
| Function | RLTLBO | TLBO | mGWO | MALO | DSCA | HOA | AO | HHO | SSA | |
|---|---|---|---|---|---|---|---|---|---|---|
| F1 | Mean | 0.00E + 00 | 3.90E − 79 | 4.26E − 19 | 1.37E − 03 | 2.55E − 288 | 3.13E − 136 | 2.34E − 104 | 8.97E − 98 | 1.30E − 07 |
| Std | 0.00E + 00 | 6.59E − 79 | 1.08E − 18 | 1.56E − 03 | 0.00E + 00 | 1.21E − 135 | 1.08E − 103 | 4.16E − 97 | 1.09E − 07 | |
| F2 | Mean | 1.29E − 223 | 4.17E − 40 | 3.37E − 12 | 6.86E + 01 | 5.92E − 171 | 4.44E − 68 | 2.82E − 53 | 1.34E − 48 | 1.79E + 00 |
| Std | 0.00E + 00 | 3.21E − 40 | 2.54E − 12 | 4.90E + 01 | 0.00E + 00 | 2.42E − 67 | 1.13E − 52 | 5.75E − 48 | 1.15E + 00 | |
| F3 | Mean | 0.00E + 00 | 2.50E − 17 | 6.41E − 01 | 4.81E + 03 | 1.43E − 241 | 2.23E + 02 | 2.22E − 101 | 7.16E − 79 | 1.61E + 03 |
| Std | 0.00E + 00 | 4.35E − 17 | 1.46E + 00 | 2.18E + 03 | 0.00E + 00 | 5.03E + 02 | 1.22E − 100 | 3.56E − 78 | 1.03E + 03 | |
| F4 | Mean | 3.07E − 221 | 1.72E − 32 | 2.42E − 03 | 1.64E + 01 | 1.97E − 134 | 5.04E − 65 | 3.20E − 53 | 2.51E − 48 | 1.11E + 01 |
| Std | 0.00E + 00 | 1.76E − 32 | 3.02E − 03 | 4.23E + 00 | 1.08E − 133 | 1.84E − 64 | 1.75E − 52 | 8.46E − 48 | 3.74E + 00 | |
| F5 | Mean | 2.65E + 01 | 2.42E + 01 | 2.64E + 01 | 9.86E − 01 | 2.85E + 01 | 2.89E + 01 | 6.82E − 03 | 1.22E − 02 | 2.55E + 02 |
| Std | 4.01E − 01 | 7.41E − 01 | 8.44E − 01 | 5.21E + 00 | 3.59E − 01 | 7.45E − 02 | 1.66E − 02 | 1.79E − 02 | 3.44E + 02 | |
| F6 | Mean | 9.03E − 02 | 2.57E − 06 | 4.54E − 01 | 5.00E − 04 | 6.01E + 00 | 6.46E + 00 | 4.43E − 05 | 9.58E − 05 | 1.28E − 07 |
| Std | 1.15E − 01 | 7.98E − 06 | 3.20E − 01 | 3.05E − 04 | 1.61E − 01 | 4.76E − 01 | 6.15E − 05 | 1.24E − 04 | 1.13E − 07 | |
| F7 | Mean | 3.57E − 05 | 1.12E − 03 | 4.61E − 03 | 1.05E − 04 | 2.54E − 04 | 5.88E − 02 | 9.62E − 05 | 1.68E − 04 | 1.81E − 01 |
| Std | 4.71E − 05 | 3.06E − 04 | 1.64E − 03 | 7.89E − 05 | 2.88E − 04 | 4.10E − 02 | 7.92E − 05 | 1.36E − 04 | 8.96E − 02 | |
| F8 | Mean | −7.36E + 03 | −7.85E + 03 | −6.58E + 03 | −1.22E + 04 | −3.96E + 03 | −4.30E + 03 | −8.92E + 03 | −1.25E + 04 | −7.56E + 03 |
| Std | 6.78E + 02 | 9.32E + 02 | 1.24E + 03 | 1.08E + 03 | 4.31E + 02 | 7.82E + 02 | 3.77E + 03 | 8.42E + 01 | 7.07E + 02 | |
| F9 | Mean | 0.00E + 00 | 1.41E + 01 | 1.70E + 01 | 8.44E + 01 | 0.00E + 00 | 5.06E + 01 | 0.00E + 00 | 0.00E + 00 | 5.19E + 01 |
| Std | 0.00E + 00 | 6.20E + 00 | 9.11E + 00 | 3.15E + 01 | 0.00E + 00 | 9.32E + 01 | 0.00E + 00 | 0.00E + 00 | 1.88E + 01 | |
| F10 | Mean | 8.88E − 16 | 7.05E − 15 | 1.14E + 00 | 4.77E + 00 | 8.88E − 16 | 6.10E − 15 | 8.88E − 16 | 8.88E − 16 | 2.62E + 00 |
| Std | 0.00E + 00 | 1.60E − 15 | 1.88E + 00 | 2.64E + 00 | 0.00E + 00 | 2.42E − 15 | 0.00E + 00 | 0.00E + 00 | 8.98E − 01 | |
| F11 | Mean | 0.00E + 00 | 3.29E − 04 | 4.86E − 03 | 6.05E − 02 | 0.00E + 00 | 1.18E − 01 | 0.00E + 00 | 0.00E + 00 | 2.24E − 02 |
| Std | 0.00E + 00 | 1.80E − 03 | 9.13E − 03 | 2.33E − 02 | 0.00E + 00 | 2.57E − 01 | 0.00E + 00 | 0.00E + 00 | 1.45E − 02 | |
| F12 | Mean | 8.32E − 04 | 5.38E − 07 | 3.51E − 02 | 1.60E − 05 | 8.37E − 01 | 1.23E + 00 | 3.04E − 06 | 1.02E − 05 | 7.22E + 00 |
| Std | 1.52E − 03 | 2.76E − 06 | 4.56E − 02 | 1.16E − 05 | 1.08E − 01 | 2.42E − 01 | 4.59E − 06 | 1.12E − 05 | 3.01E + 00 | |
| F13 | Mean | 2.00E + 00 | 7.41E − 02 | 3.83E − 01 | 1.70E − 03 | 2.76E + 00 | 3.08E + 00 | 4.57E − 05 | 8.69E − 05 | 2.19E + 01 |
| Std | 1.17E + 00 | 8.70E − 02 | 2.15E − 01 | 3.95E − 03 | 5.11E − 02 | 1.83E − 01 | 1.18E − 04 | 9.70E − 05 | 1.44E + 01 | |
| F14 | Mean | 1.06E + 00 | 9.98E − 01 | 9.98E − 01 | 1.46E + 00 | 1.35E + 00 | 2.78E + 00 | 4.06E + 00 | 1.36E + 00 | 1.16E + 00 |
| Std | 3.62E − 01 | 0.00E + 00 | 3.81E − 12 | 7.69E − 01 | 6.1E − 01 | 2.07E + 00 | 4.46E + 00 | 9.52E − 01 | 4.57E − 01 | |
| F15 | Mean | 3.55E − 04 | 3.82E − 04 | 3.04E − 03 | 1.40E − 03 | 8.91E − 04 | 6.77E − 03 | 5.00E − 04 | 4.01E − 04 | 3.55E − 03 |
| Std | 1.02E − 04 | 1.54E − 04 | 6.91E − 03 | 3.62E − 03 | 3.99E − 04 | 5.47E − 03 | 1.10E − 04 | 2.36E − 04 | 6.71E − 03 | |
| F16 | Mean | −1.03E + 00 | −1.03E + 00 | −1.03E + 00 | −1.03E + 00 | −1.03E + 00 | −9.99E − 01 | −1.03E + 00 | −1.03E + 00 | −1.03E + 00 |
| Std | 6.58E − 16 | 6.95E − 16 | 3.39E − 08 | 1.65E − 13 | 3.99E − 04 | 3.29E − 02 | 3.01E − 04 | 3.76E − 09 | 1.83E − 14 | |
| F17 | Mean | 3.98E − 01 | 3.98E − 01 | 3.98E − 01 | 3.98E − 01 | 4.09E − 01 | 3.99E − 01 | 3.98E − 01 | 3.98E − 01 | 3.98E − 01 |
| Std | 0.00E + 00 | 0.00E + 00 | 6.52E − 09 | 5.57E − 14 | 1.06E − 02 | 1.08E − 03 | 1.09E − 04 | 4.60E − 06 | 7.21E − 15 | |
| F18 | Mean | 3.00E + 00 | 3.00E + 00 | 3.00E + 00 | 3.00E + 00 | 3.00E + 00 | 4.94E + 00 | 3.03E + 00 | 3.00E + 00 | 3.00E + 00 |
| Std | 4.95E − 16 | 1.24E − 15 | 1.03E − 07 | 5.76E − 13 | 8.33E − 04 | 6.82E + 00 | 5.73E − 02 | 3.88E − 07 | 2.87E − 13 | |
| F19 | Mean | −3.86E + 00 | −3.86E + 00 | −3.86E + 00 | −3.86E + 00 | −3.82E + 00 | −3.86E + 00 | −3.85E + 00 | −3.86E + 00 | −3.86E + 00 |
| Std | 2.71E − 15 | 3.16E − 15 | 1.08E − 06 | 6.39E − 13 | 2.33E − 02 | 6.99E − 04 | 6.96E − 03 | 2.07E − 03 | 1.09E − 12 | |
| F20 | Mean | −3.31E + 00 | −3.30E + 00 | −3.23E + 00 | −3.23E + 00 | −2.80E + 00 | −3.25E + 00 | −3.16E + 00 | −3.08E + 00 | −3.23E + 00 |
| Std | 2.95E − 02 | 4.12E − 02 | 6.47E − 02 | 5.14E − 02 | 2.71E − 01 | 9.05E − 02 | 8.91E − 02 | 1.22E − 01 | 6.22E − 02 | |
| F21 | Mean | −1.02E + 01 | −1.02E + 01 | −9.98E + 00 | −7.62E + 00 | −3.27E + 00 | −9.43E + 00 | −1.01E + 01 | −5.18E + 00 | −8.07E + 00 |
| Std | 6.04E − 09 | 1.41E − 03 | 9.30E − 01 | 2.82E + 00 | 1.54E + 00 | 9.62E − 01 | 2.09E − 02 | 7.51E − 01 | 3.28E + 00 | |
| F22 | Mean | −1.04E + 01 | −1.01E + 01 | −1.04E + 01 | −7.06E + 00 | −3.87E + 00 | −9.36E + 00 | −1.04E + 01 | −5.08E + 00 | −9.32E + 00 |
| Std | 1.23E − 07 | 1.25E + 00 | 4.45E − 04 | 3.48E + 00 | 1.17E + 00 | 1.69E + 00 | 5.50E − 02 | 6.94E − 03 | 2.51E + 00 | |
| F23 | Mean | −1.05E + 01 | −1.01E + 01 | −1.05E + 01 | −7.31E + 00 | −4.19E + 00 | −9.63E + 00 | −1.05E + 01 | −5.24E + 00 | −7.89E + 00 |
| Std | 1.57E − 07 | 1.57E + 00 | 3.42E − 04 | 3.55E + 00 | 1.11E + 00 | 1.52E + 00 | 2.23E − 02 | 9.58E − 01 | 3.59E + 00 | |
For the multimodal and fixed-dimension multimodal functions F8–F23, it can be seen from Table 4 that RLTLBO achieves the smallest average values and standard deviations on 12 of all 16 test functions compared to other methods, which indicates a very high accuracy and stability. Several poor results appear on F8 and F12–F14, but they are not the worst results. The satisfying results on the multimodal and fixed-dimension multimodal functions prove that the exploration and local optima avoidance capabilities of the RLTLBO are excellent, which might be derived from the ROBL strategy.
Figure 7 provides the convergence curves of RLTLBO and the comparative algorithms on the 23 standard benchmark functions. The convergence rate reflected by these curves shows the improvement in exploration and exploitation more intuitively. For F1–F4, F7, F9–F11, and F15–F21, RLTLBO presents a faster convergence speed than the other meta-heuristic algorithms, and its convergence accuracy is also the best. RLTLBO ranks second in terms of convergence speed on F22 and F23. For benchmark functions F5, F6, F8, and F12–F14, RLTLBO does not perform very well, consistent with the results in Table 4.
Figure 7.

Convergence curves of 23 standard benchmark functions.
4.1.2. The Wilcoxon Test
Table 5 lists the Wilcoxon rank-sum test [48] results, which assess the statistical significance of the performance differences between RLTLBO and the comparative algorithms. A p-value below 0.05 indicates a statistically significant difference between the two compared methods. The overwhelming majority of p-values in Table 5 are below 0.05, indicating that the differences between RLTLBO and the other methods are statistically significant. Combined with the results in Table 4, it can be concluded that RLTLBO outperforms the comparative algorithms, with strong exploration and exploitation capabilities.
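As an illustration, the rank-sum comparison used here can be reproduced with a small self-contained implementation. The sketch below uses the normal approximation to the rank-sum statistic (the exact small-sample p-values reported in Table 5, such as 6.10E − 05, come from the exact distribution and will differ slightly from this approximation); the two run arrays are hypothetical data, not values from the paper.

```python
import math

def rank_sum_p(a, b):
    """Two-sided Wilcoxon rank-sum test via the normal approximation.

    Returns an approximate p-value for the null hypothesis that the two
    independent samples a and b come from the same distribution.
    """
    pooled = sorted((v, i) for i, v in enumerate(list(a) + list(b)))
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1                         # extend the group of tied values
        mean_rank = (i + j) / 2.0 + 1.0    # average rank assigned to ties
        for k in range(i, j + 1):
            ranks[pooled[k][1]] = mean_rank
        i = j + 1
    n1, n2 = len(a), len(b)
    w = sum(ranks[:n1])                    # rank sum of the first sample
    mu = n1 * (n1 + n2 + 1) / 2.0
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (w - mu) / sigma
    return math.erfc(abs(z) / math.sqrt(2.0))  # two-sided p-value

# Final fitness values from two hypothetical sets of independent runs.
runs_a = [1e-40, 3e-41, 5e-42, 2e-40, 8e-41, 1e-41, 4e-40, 6e-41, 2e-41, 9e-41]
runs_b = [1e-3, 4e-3, 2e-3, 7e-4, 5e-3, 3e-3, 9e-4, 2e-3, 6e-3, 1e-3]
p = rank_sum_p(runs_a, runs_b)  # every value of runs_a is smaller -> tiny p
```

Since every value in `runs_a` is below every value in `runs_b`, the resulting p-value falls far below the 0.05 threshold, and the two samples would be judged significantly different.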
Table 5.
p-Values from the Wilcoxon rank-sum test for the results in Table 4.
| Function | RLTLBO vs. | | | | | | | |
|---|---|---|---|---|---|---|---|---|
| | TLBO | mGWO | MALO | DSCA | HOA | AO | HHO | SSA |
| F1 | 6.10E − 05 | 6.10E − 05 | 6.10E − 05 | NaN | 6.10E − 05 | 6.10E − 05 | 6.10E − 05 | 6.10E − 05 |
| F2 | 6.10E − 05 | 6.10E − 05 | 6.10E − 05 | 6.10E − 04 | 6.10E − 05 | 6.10E − 05 | 6.10E − 05 | 6.10E − 05 |
| F3 | 6.10E − 05 | 6.10E − 05 | 6.10E − 05 | 1.56E − 02 | 6.10E − 05 | 6.10E − 05 | 6.10E − 05 | 6.10E − 05 |
| F4 | 6.10E − 05 | 6.10E − 05 | 6.10E − 05 | 6.10E − 05 | 6.10E − 05 | 6.10E − 05 | 6.10E − 05 | 6.10E − 05 |
| F5 | 6.10E − 05 | 3.30E − 01 | 6.10E − 05 | 6.10E − 05 | 6.10E − 05 | 6.10E − 05 | 6.10E − 05 | 8.54E − 04 |
| F6 | 6.10E − 05 | 1.22E − 04 | 6.10E − 05 | 6.10E − 05 | 6.10E − 05 | 6.10E − 05 | 6.10E − 05 | 6.10E − 05 |
| F7 | 6.10E − 05 | 6.10E − 05 | 4.89E − 01 | 6.10E − 04 | 6.10E − 05 | 4.89E − 01 | 7.30E − 02 | 6.10E − 05 |
| F8 | 1.03E − 02 | 6.37E − 02 | 6.10E − 05 | 6.10E − 05 | 6.10E − 05 | 1.21E − 01 | 6.10E − 05 | 5.61E − 01 |
| F9 | 6.10E − 05 | 6.10E − 05 | 6.10E − 05 | NaN | 1.25E − 01 | NaN | NaN | 6.10E − 05 |
| F10 | 6.10E − 05 | 6.10E − 05 | 6.10E − 05 | NaN | 6.10E − 05 | NaN | NaN | 6.10E − 05 |
| F11 | NaN | 1.95E − 03 | 6.10E − 05 | NaN | 3.12E − 02 | NaN | NaN | 6.10E − 05 |
| F12 | 6.10E − 05 | 6.10E − 05 | 6.10E − 05 | 6.10E − 05 | 6.10E − 05 | 6.10E − 05 | 6.10E − 05 | 6.10E − 05 |
| F13 | 3.05E − 04 | 6.10E − 04 | 6.10E − 05 | 3.89E − 01 | 2.01E − 03 | 6.10E − 05 | 6.10E − 05 | 3.05E − 04 |
| F14 | NaN | 6.10E − 05 | 6.10E − 05 | 6.10E − 05 | 6.10E − 05 | 6.10E − 05 | 6.10E − 05 | 6.10E − 05 |
| F15 | 8.90E − 01 | 2.01E − 03 | 1.83E − 04 | 6.10E − 05 | 6.10E − 05 | 6.10E − 05 | 8.36E − 03 | 6.10E − 05 |
| F16 | NaN | 6.10E − 05 | 6.10E − 05 | 6.10E − 05 | 6.10E − 05 | 6.10E − 05 | 1.22E − 04 | 6.10E − 05 |
| F17 | NaN | 6.10E − 05 | 2.44E − 04 | 6.10E − 05 | 6.10E − 05 | 6.10E − 05 | 6.10E − 05 | 9.76E − 04 |
| F18 | NaN | 6.10E − 05 | 6.10E − 05 | 6.10E − 05 | 6.10E − 05 | 6.10E − 05 | 6.10E − 05 | 6.10E − 05 |
| F19 | NaN | 6.10E − 05 | 6.10E − 05 | 6.10E − 05 | 6.10E − 05 | 6.10E − 05 | 6.10E − 05 | 6.10E − 05 |
| F20 | 8.52E − 01 | 4.13E − 02 | 1.35E − 01 | 6.10E − 05 | 2.01E − 03 | 6.10E − 05 | 6.10E − 05 | 3.05E − 04 |
| F21 | 1.68E − 01 | 6.10E − 05 | 4.79E − 02 | 6.10E − 05 | 6.10E − 05 | 6.10E − 05 | 6.10E − 05 | 1.03E − 02 |
| F22 | 6.25E − 02 | 6.10E − 05 | 2.56E − 02 | 6.10E − 05 | 6.10E − 05 | 6.10E − 05 | 6.10E − 05 | 4.13E − 02 |
| F23 | 7.81E − 03 | 6.10E − 05 | 6.10E − 05 | 6.10E − 05 | 6.10E − 05 | 6.10E − 05 | 6.10E − 05 | 2.56E − 02 |
4.2. CEC2017 Benchmark Function Experiments
The standard benchmark function experiments demonstrate the superior performance of the proposed RLTLBO algorithm on relatively simple optimization problems. CEC2017 [49], one of the most challenging test suites, helps assess performance on complex optimization problems. Several hybrid and composition functions, which are precisely the types of landscape that the standard test set lacks, are selected to further test RLTLBO. The function details and the comparison results are presented in Tables 6 and 7. As before, each method runs 30 times with 30 search agents and 500 iterations. From Table 7, RLTLBO achieves both the best average and the best standard deviation on five of the eight functions; on the remaining three, its averages and standard deviations remain among the best. RLTLBO completely outperforms the TLBO, MALO, HOA, AO, HHO, and SSA methods. The statistical results are listed in Table 8: only seven p-values across all test functions exceed 0.05, indicating considerable differences between RLTLBO and the compared methods. These results suggest that RLTLBO achieves excellent performance on complex problems as well.
Table 6.
Descriptions of the benchmark functions from CEC2017.
| Function | Name | Dim | Range | f min |
|---|---|---|---|---|
| Hybrid functions (N is the number of basic functions) | ||||
| C13 | Hybrid function 3 (N = 3) | 10 | [−100, 100] | 1300 |
| C14 | Hybrid function 4 (N = 4) | 10 | [−100, 100] | 1400 |
| C15 | Hybrid function 5 (N = 4) | 10 | [−100, 100] | 1500 |
| C19 | Hybrid function 6 (N = 5) | 10 | [−100, 100] | 1900 |
| Composite functions (N is the number of basic functions) | ||||
| C22 | Composite function 2 (N = 3) | 10 | [−100, 100] | 2200 |
| C25 | Composite function 5 (N = 5) | 10 | [−100, 100] | 2500 |
| C28 | Composite function 8 (N = 6) | 10 | [−100, 100] | 2800 |
| C29 | Composite function 9 (N = 6) | 10 | [−100, 100] | 2900 |
Table 7.
Comparison results of algorithms on CEC2017.
| Function | | RLTLBO | TLBO | mGWO | MALO | DSCA | HOA | AO | HHO | SSA |
|---|---|---|---|---|---|---|---|---|---|---|
| C13 | Mean | 4.38E + 03 | 6.04E + 03 | 4.35E + 03 | 1.78E + 04 | 6.25E + 05 | 1.53E + 06 | 1.77E + 04 | 1.70E + 04 | 1.46E + 04 |
| | Std | 2.76E + 03 | 4.33E + 03 | 2.99E + 03 | 1.30E + 04 | 4.55E + 05 | 1.28E + 06 | 1.39E + 04 | 1.03E + 04 | 1.29E + 04 |
| C14 | Mean | 1.46E + 03 | 1.47E + 03 | 1.47E + 03 | 2.75E + 03 | 4.78E + 03 | 3.87E + 03 | 2.36E + 03 | 2.20E + 03 | 3.35E + 03 |
| | Std | 1.81E + 01 | 2.40E + 01 | 1.98E + 01 | 2.02E + 03 | 3.76E + 03 | 1.99E + 03 | 1.12E + 03 | 1.05E + 03 | 3.10E + 03 |
| C15 | Mean | 1.62E + 03 | 1.73E + 03 | 1.74E + 03 | 8.28E + 03 | 7.97E + 03 | 2.49E + 04 | 5.91E + 03 | 7.35E + 03 | 1.06E + 04 |
| | Std | 5.96E + 01 | 1.44E + 02 | 2.36E + 02 | 5.72E + 03 | 3.62E + 03 | 1.54E + 04 | 2.16E + 03 | 3.10E + 03 | 7.51E + 03 |
| C19 | Mean | 2.00E + 03 | 2.11E + 03 | 2.65E + 03 | 1.54E + 04 | 3.37E + 04 | 1.69E + 04 | 2.10E + 04 | 1.67E + 04 | 8.46E + 03 |
| | Std | 9.63E + 00 | 3.19E + 02 | 1.68E + 03 | 1.23E + 04 | 3.00E + 04 | 1.34E + 04 | 2.88E + 04 | 1.37E + 04 | 6.44E + 03 |
| C22 | Mean | 2.30E + 03 | 2.30E + 03 | 2.30E + 00 | 2.30E + 03 | 2.55E + 03 | 2.47E + 03 | 2.31E + 03 | 2.41E + 03 | 2.33E + 03 |
| | Std | 1.99E + 01 | 8.68E + 00 | 9.25E − 01 | 2.88E + 01 | 8.10E + 01 | 4.58E + 02 | 5.85E + 00 | 3.85E + 02 | 1.69E + 02 |
| C25 | Mean | 2.92E + 03 | 2.93E + 03 | 2.92E + 03 | 2.93E + 03 | 3.12E + 03 | 2.97E + 03 | 2.94E + 03 | 2.93E + 03 | 2.92E + 03 |
| | Std | 2.32E + 01 | 2.41E + 01 | 2.33E + 01 | 2.38E + 01 | 6.48E + 01 | 2.35E + 01 | 2.50E + 01 | 6.24E + 01 | 2.45E + 01 |
| C28 | Mean | 3.23E + 03 | 3.30E + 03 | 3.33E + 03 | 3.31E + 03 | 3.40E + 03 | 3.50E + 03 | 3.44E + 03 | 3.45E + 03 | 3.29E + 03 |
| | Std | 1.15E + 02 | 1.60E + 02 | 1.12E + 02 | 1.47E + 02 | 9.48E + 01 | 1.06E + 02 | 1.09E + 02 | 1.45E + 02 | 1.68E + 02 |
| C29 | Mean | 3.18E + 03 | 3.19E + 03 | 3.17E + 03 | 3.27E + 03 | 3.38E + 03 | 3.38E + 03 | 3.26E + 03 | 3.37E + 03 | 3.27E + 03 |
| | Std | 1.84E + 01 | 2.16E + 01 | 2.13E + 01 | 6.15E + 01 | 5.77E + 01 | 6.58E + 01 | 5.87E + 01 | 1.20E + 02 | 7.20E + 01 |
Table 8.
p-Values from the Wilcoxon rank-sum test for the results in Table 7.
| Function | RLTLBO vs. | | | | | | | |
|---|---|---|---|---|---|---|---|---|
| | TLBO | mGWO | MALO | DSCA | HOA | AO | HHO | SSA |
| C13 | 2.90E − 02 | 3.59E − 01 | 1.81E − 02 | 6.10E − 05 | 6.10E − 05 | 6.10E − 05 | 3.36E − 03 | 1.22E − 04 |
| C14 | 1.35E − 01 | 4.37E − 02 | 6.10E − 05 | 6.10E − 05 | 6.10E − 05 | 6.10E − 05 | 1.83E − 04 | 6.10E − 05 |
| C15 | 3.36E − 03 | 8.36E − 03 | 6.10E − 05 | 6.10E − 05 | 6.10E − 05 | 6.10E − 05 | 6.10E − 05 | 6.10E − 05 |
| C19 | 1.24E − 02 | 3.30E − 02 | 8.54E − 04 | 6.10E − 05 | 6.10E − 05 | 6.10E − 05 | 6.10E − 05 | 6.10E − 05 |
| C22 | 4.27E − 03 | 4.13E − 02 | 4.04E − 02 | 6.10E − 05 | 6.10E − 05 | 6.10E − 05 | 6.10E − 05 | 8.47E − 02 |
| C25 | 5.61E − 01 | 8.47E − 01 | 8.47E − 01 | 6.10E − 05 | 2.01E − 03 | 1.21E − 02 | 1.69E − 02 | 3.62E − 01 |
| C28 | 2.48E − 02 | 1.51E − 02 | 4.79E − 02 | 1.81E − 02 | 1.53E − 03 | 6.10E − 04 | 5.37E − 03 | 4.21E − 02 |
| C29 | 4.54E − 03 | 6.10E − 04 | 6.10E − 05 | 6.10E − 05 | 6.10E − 05 | 8.54E − 04 | 6.10E − 05 | 1.51E − 02 |
5. Experiments on Industrial Engineering Design Problems
In this section, eight well-known constrained industrial engineering design problems, including the welded beam design problem, pressure vessel design problem, tension and compression spring design problem, speed reducer design problem, three-bar truss design problem, car crashworthiness design problem, tubular column design problem, and frequency-modulated sound wave design problem, are solved to further verify the performance of the proposed RLTLBO algorithm. The results of RLTLBO are compared to various optimization methods proposed in previous studies.
5.1. Welded Beam Design Problem
The purpose of this problem is to minimize the cost of the welded beam (Figure 8). Four variables need to be optimized: the thickness of the weld (h), the thickness of the bar (b), the length of the bar (l), and the height of the bar (t). The mathematical formulation is as follows:
Minimize

f(h, l, t, b) = 1.10471h²l + 0.04811tb(14.0 + l)

subject to

g1 = τ(x) − τmax ≤ 0, g2 = σ(x) − σmax ≤ 0, g3 = h − b ≤ 0, g4 = 0.10471h² + 0.04811tb(14.0 + l) − 5.0 ≤ 0, g5 = 0.125 − h ≤ 0, g6 = δ(x) − δmax ≤ 0, g7 = P − Pc(x) ≤ 0 | (6) |

Variable range

0.1 ≤ h, b ≤ 2, 0.1 ≤ l, t ≤ 10 | (7) |

where the shear stress in the weld τ(x), the bending stress in the beam σ(x), the end deflection of the beam δ(x), and the buckling load on the bar Pc(x) are computed from the design variables and the constants P = 6,000 lb, L = 14 in., τmax = 13,600 psi, σmax = 30,000 psi, and δmax = 0.25 in. | (8) |
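As a quick numerical check, the cost function can be evaluated directly. The sketch below (a verification aid using the standard welded-beam cost from the literature, not part of RLTLBO itself) reproduces the RLTLBO cost reported in Table 9 from its design variables:

```python
def welded_beam_cost(h, l, t, b):
    """Fabrication cost of the welded beam: weld material plus bar material."""
    return 1.10471 * h**2 * l + 0.04811 * t * b * (14.0 + l)

# Design variables found by RLTLBO (Table 9)
cost = welded_beam_cost(0.205730, 3.253000, 9.036600, 0.205730)
# agrees with the tabulated optimum cost 1.695200 to about 1e-4
```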
Figure 8.

Welded beam design problem.
The RLTLBO is compared to the SMA [50], WOA, MPA [51], MVO [52], GA, and HS [53] methods. The comparison results presented in Table 9 show the superiority of the RLTLBO algorithm, which attains a smaller cost than the other algorithms.
Table 9.
Comparison results for the welded beam design problem.
| Algorithm | Optimum variables | | | | Optimum cost |
|---|---|---|---|---|---|
| | h | l | t | b | |
| RLTLBO | 0.205730 | 3.253000 | 9.036600 | 0.205730 | 1.695200 |
| SMA [50] | 0.205400 | 3.258900 | 9.038400 | 0.205800 | 1.696040 |
| WOA [14] | 0.205396 | 3.484293 | 9.037426 | 0.206276 | 1.730499 |
| MPA [51] | 0.205728 | 3.470509 | 9.036624 | 0.205730 | 1.724853 |
| MVO [52] | 0.205463 | 3.473193 | 9.044502 | 0.205695 | 1.726450 |
| GA [6] | 0.248900 | 6.173000 | 8.178900 | 0.253300 | 2.430000 |
| HS [53] | 0.244200 | 6.223100 | 8.291500 | 0.240000 | 2.380700 |
5.2. Pressure Vessel Design Problem
The objective of this problem is to minimize the fabrication cost of a cylindrical pressure vessel that meets the pressure requirements. As shown in Figure 9, four structural parameters need to be optimized: the thickness of the shell (Ts), the thickness of the head (Th), the inner radius (R), and the length of the cylindrical section without the head (L). The formulation with its four optimization constraints can be described as follows:
Minimize

f(Ts, Th, R, L) = 0.6224TsRL + 1.7781ThR² + 3.1661Ts²L + 19.84Ts²R

subject to

g1 = −Ts + 0.0193R ≤ 0, g2 = −Th + 0.00954R ≤ 0, g3 = −πR²L − (4/3)πR³ + 1,296,000 ≤ 0, g4 = L − 240 ≤ 0 | (9) |

Variable range

0 ≤ Ts, Th ≤ 99, 10 ≤ R, L ≤ 200 | (10) |
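The standard pressure-vessel objective and constraints can be coded in a few lines. As a sanity check, the sketch below (an illustration, not part of RLTLBO) evaluates the WOA design reported in Table 10 and recovers its tabulated cost:

```python
import math

def pressure_vessel_cost(ts, th, r, l):
    """Total cost: material, forming, and welding of the cylindrical vessel."""
    return (0.6224 * ts * r * l + 1.7781 * th * r**2
            + 3.1661 * ts**2 * l + 19.84 * ts**2 * r)

def pressure_vessel_constraints(ts, th, r, l):
    """Constraint values g_i, all of which must satisfy g_i <= 0."""
    return [
        -ts + 0.0193 * r,                     # shell thickness vs. radius
        -th + 0.00954 * r,                    # head thickness vs. radius
        -math.pi * r**2 * l - 4.0 / 3.0 * math.pi * r**3 + 1296000.0,  # volume
        l - 240.0,                            # length limit
    ]

# WOA design from Table 10: the cost matches the tabulated 6059.741
cost = pressure_vessel_cost(0.8125, 0.4375, 42.098270, 176.638998)
```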
Figure 9.

Pressure vessel design problem.
From the results in Table 10, it is obvious that RLTLBO can obtain superior optimal values compared to AO, SMA, WOA, GWO, MVO, GA, and ES [54].
Table 10.
Comparison results for the pressure vessel design problem.
| Algorithm | Optimum variables | Optimum cost | |||
|---|---|---|---|---|---|
| Ts | Th | R | L | ||
| RLTLBO | 0.7698901 | 0.4201098 | 42.536830 | 171.348900 | 5926.77920 |
| AO [15] | 1.0540000 | 0.1828060 | 59.621900 | 38.8050000 | 5949.22580 |
| SMA [50] | 0.7931000 | 0.3932000 | 40.671100 | 196.217800 | 5994.18570 |
| WOA [14] | 0.8125000 | 0.4375000 | 42.098270 | 176.638998 | 6059.74100 |
| GWO [13] | 0.8125000 | 0.4345000 | 42.089200 | 176.758700 | 6051.56390 |
| MVO [52] | 0.8125000 | 0.4375000 | 42.090738 | 176.738690 | 6060.80660 |
| GA [6] | 0.8125000 | 0.4375000 | 42.097398 | 176.654050 | 6059.94634 |
| ES [54] | 0.8125000 | 0.4375000 | 42.098087 | 176.640518 | 6059.74560 |
5.3. Tension/Compression Spring Design Problem
This problem aims to minimize the weight of the tension/compression spring (Figure 10). Three variables need to be optimized: the wire diameter (d), the mean coil diameter (D), and the number of active coils (N). This problem can be described as follows:
Minimize

f(d, D, N) = (N + 2)Dd²

subject to

g1 = 1 − D³N/(71785d⁴) ≤ 0, g2 = (4D² − dD)/(12566(Dd³ − d⁴)) + 1/(5108d²) − 1 ≤ 0, g3 = 1 − 140.45d/(D²N) ≤ 0, g4 = (D + d)/1.5 − 1 ≤ 0 | (11) |

Variable range

0.05 ≤ d ≤ 2.00, 0.25 ≤ D ≤ 1.30, 2.00 ≤ N ≤ 15.0 | (12) |
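Again the tabulated optimum can be checked numerically. The sketch below (a verification aid using the standard spring formulation) evaluates the weight and the deflection constraint g1 at the RLTLBO design from Table 11; g1 is nearly active there, as expected at a constrained optimum:

```python
def spring_weight(d, D, N):
    """Weight of the tension/compression spring: (N + 2) * D * d^2."""
    return (N + 2.0) * D * d**2

def spring_g1(d, D, N):
    """Minimum-deflection constraint g1 <= 0 of the standard formulation."""
    return 1.0 - (D**3 * N) / (71785.0 * d**4)

# RLTLBO design from Table 11: weight matches the tabulated 0.010938
w = spring_weight(0.0551180, 0.505900, 5.1167000)
```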
Figure 10.

Tension/compression spring design problem.
The RLTLBO is compared to the AO, SSA, WOA, GWO, PSO, GA, and HS algorithms. The results listed in Table 11 show that RLTLBO obtains the best weight among all compared algorithms.
Table 11.
Comparison results for the tension/compression spring design problem.
| Algorithm | Optimum variables | | | Optimum weight |
|---|---|---|---|---|
| | d | D | N | |
| RLTLBO | 0.0551180 | 0.505900 | 5.1167000 | 0.01093800 |
| AO [15] | 0.0502439 | 0.352620 | 10.542500 | 0.01116500 |
| SSA [12] | 0.0512070 | 0.345215 | 12.004032 | 0.01267630 |
| WOA [14] | 0.0512070 | 0.345215 | 12.004032 | 0.01267630 |
| GWO [13] | 0.0516900 | 0.356737 | 11.288850 | 0.01266600 |
| PSO [11] | 0.0517280 | 0.357644 | 11.244543 | 0.01267470 |
| GA [6] | 0.0514800 | 0.351661 | 11.632201 | 0.01270478 |
| HS [53] | 0.0511540 | 0.349871 | 12.076432 | 0.01267060 |
5.4. Speed Reducer Design Problem
In this case, the purpose is to minimize the weight of the speed reducer (Figure 11). Seven variables are considered: the face width (x1), module of the teeth (x2), number of teeth in the pinion (x3, a discrete design variable), length of the first shaft between bearings (x4), length of the second shaft between bearings (x5), diameter of the first shaft (x6), and diameter of the second shaft (x7). The mathematical formulation is as follows:
Figure 11.

Speed reducer design problem.
Minimize

f(x) = 0.7854x1x2²(3.3333x3² + 14.9334x3 − 43.0934) − 1.508x1(x6² + x7²) + 7.4777(x6³ + x7³) + 0.7854(x4x6² + x5x7²) | (13) |

subject to

g1 = 27/(x1x2²x3) − 1 ≤ 0, g2 = 397.5/(x1x2²x3²) − 1 ≤ 0, g3 = 1.93x4³/(x2x3x6⁴) − 1 ≤ 0, g4 = 1.93x5³/(x2x3x7⁴) − 1 ≤ 0, g5 = √((745x4/(x2x3))² + 16.9 × 10⁶)/(110x6³) − 1 ≤ 0, g6 = √((745x5/(x2x3))² + 157.5 × 10⁶)/(85x7³) − 1 ≤ 0, g7 = x2x3/40 − 1 ≤ 0, g8 = 5x2/x1 − 1 ≤ 0, g9 = x1/(12x2) − 1 ≤ 0, g10 = (1.5x6 + 1.9)/x4 − 1 ≤ 0, g11 = (1.1x7 + 1.9)/x5 − 1 ≤ 0 | (14) |

Variable range

2.6 ≤ x1 ≤ 3.6, 0.7 ≤ x2 ≤ 0.8, 17 ≤ x3 ≤ 28, 7.3 ≤ x4 ≤ 8.3, 7.3 ≤ x5 ≤ 8.3, 2.9 ≤ x6 ≤ 3.9, 5.0 ≤ x7 ≤ 5.5 | (15) |
Compared to AO, PSO, AOA, GA, SCA [55], HS, and FA [56], RLTLBO achieves better results in the speed reducer problem, as shown in Table 12.
Table 12.
Comparison results for the speed reducer design problem.
| Algorithm | Optimum variables | | | | | | | Optimum weight |
|---|---|---|---|---|---|---|---|---|
| | x1 | x2 | x3 | x4 | x5 | x6 | x7 | |
| RLTLBO | 3.497600 | 0.7000 | 17.0000 | 7.30000 | 7.800000 | 3.350060 | 5.285530 | 2995.43740 |
| AO [15] | 3.502100 | 0.7000 | 17.0000 | 7.30990 | 7.747600 | 3.364100 | 5.299400 | 3007.73280 |
| PSO [11] | 3.500100 | 0.7000 | 17.0002 | 7.51770 | 7.783200 | 3.350800 | 5.286700 | 3145.92200 |
| AOA [9] | 3.503840 | 0.7000 | 17.0000 | 7.30000 | 7.729330 | 3.356490 | 5.286700 | 2997.91570 |
| GA [6] | 3.510253 | 0.7000 | 17.0000 | 8.35000 | 7.800000 | 3.362201 | 5.287723 | 3067.56100 |
| SCA [55] | 3.508755 | 0.7000 | 17.0000 | 7.30000 | 7.800000 | 3.461020 | 5.289213 | 3030.56300 |
| HS [53] | 3.520124 | 0.7000 | 17.0000 | 8.37000 | 7.800000 | 3.366970 | 5.288719 | 3029.00200 |
| FA [56] | 3.507495 | 0.7001 | 17.0000 | 7.719674 | 8.080854 | 3.351512 | 5.287051 | 3010.13749 |
5.5. Three-Bar Truss Design Problem
The three-bar truss design problem aims to minimize the weight of a truss with three bars by adjusting the cross-sectional areas of the bars (A1, A2, and A3) (Figure 12). Three main constraints need to be satisfied: deflection, stress, and buckling. The mathematical form of this problem is as follows:
Minimize

f(x1, x2) = (2√2x1 + x2) · l

subject to

g1 = ((√2x1 + x2)/(√2x1² + 2x1x2))P − σ ≤ 0, g2 = (x2/(√2x1² + 2x1x2))P − σ ≤ 0, g3 = (1/(x1 + √2x2))P − σ ≤ 0 | (16) |

Variable range: 0 ≤ x1, x2 ≤ 1, where l = 100 cm, P = 2 kN/cm², and σ = 2 kN/cm².
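The weight objective is simple enough to check by hand or in code. The sketch below (assuming the usual convention that the two symmetric outer bars share one area variable, x1 = A1 = A3 and x2 = A2) reproduces the SSA weight reported in Table 13:

```python
import math

def truss_weight(x1, x2, l=100.0):
    """Weight of the three-bar truss; x1 = A1 = A3, x2 = A2 (cm^2)."""
    return (2.0 * math.sqrt(2.0) * x1 + x2) * l

# SSA design from Table 13: weight matches the tabulated 263.89584
w = truss_weight(0.78866541, 0.408275784)
```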
Figure 12.

Three-bar truss design problem.
The result of RLTLBO is listed in Table 13 alongside those of AO, SSA, AOA, MVO, and GOA [57]. It can be observed that RLTLBO outperforms the other algorithms in the literature.
Table 13.
Comparison results for the three-bar truss design problem.
| Algorithm | Optimum variables | | Optimum weight |
|---|---|---|---|
| | x1 | x2 | |
| RLTLBO | 0.78842 | 0.40811 | 263.8523 |
| AO [15] | 0.7926 | 0.3966 | 263.8684 |
| SSA [12] | 0.78866541 | 0.408275784 | 263.89584 |
| AOA [9] | 0.79369 | 0.39426 | 263.9154 |
| MVO [52] | 0.78860276 | 0.40845307 | 263.8958499 |
| GOA [57] | 0.788897555578973 | 0.407619570115153 | 263.895881496069 |
5.6. Car Crashworthiness Design Problem
The car crashworthiness design problem aims to minimize the weight by optimizing eleven influence variables [58], including the thickness of B-Pillar inner (x1), B-pillar reinforcement (x2), floor side inner (x3), cross members (x4), door beam (x5), door beltline reinforcement (x6) and roof rail (x7), materials of B-Pillar inner (x8) and floor side inner (x9), barrier height (x10), and barrier hitting position (x11). This problem can be formulated as follows.
The weight objective (17), the safety constraints (18), and the variable ranges (19) follow the formulation given in [58].
RLTLBO, DE, GA, FA, CS [59], GOA, and EOBL-GOA [58] are applied to solve the car crashworthiness problem. As shown in Table 14, the proposed RLTLBO achieves the best result among the compared methods.
Table 14.
Comparison results for the car crashworthiness design problem.
| Algorithm | RLTLBO | DE [7] | GA [6] | FA [56] | CS [59] | GOA [57] | EOBL-GOA [58] |
|---|---|---|---|---|---|---|---|
| x1 | 0.50000 | 0.50000 | 0.50005 | 0.50000 | 0.50000 | 0.50000 | 0.50000 |
| x2 | 1.11621 | 1.11670 | 1.28017 | 1.36000 | 1.11643 | 1.11670 | 1.11643 |
| x3 | 0.50000 | 0.50000 | 0.50001 | 0.50000 | 0.50000 | 0.50000 | 0.50000 |
| x4 | 1.30215 | 1.30208 | 1.03302 | 1.20200 | 1.30208 | 1.30208 | 1.30208 |
| x5 | 0.50000 | 0.50000 | 0.50001 | 0.50000 | 0.50000 | 0.50000 | 0.50000 |
| x6 | 1.50000 | 1.50000 | 0.50000 | 1.12000 | 1.50000 | 1.50000 | 1.50000 |
| x7 | 0.50000 | 0.50000 | 0.50000 | 0.50000 | 0.50000 | 0.50000 | 0.50000 |
| x8 | 0.34500 | 0.34500 | 0.34994 | 0.34500 | 0.34500 | 0.34500 | 0.34500 |
| x9 | 0.332814 | 0.192000 | 0.192000 | 0.192000 | 0.192000 | 0.192000 | 0.192000 |
| x10 | −19.58840 | −19.54935 | 10.31190 | 8.87307 | −19.54935 | −19.54935 | −19.54935 |
| x11 | 0.019066 | −0.004310 | 0.001670 | −18.998080 | −0.004310 | −0.004310 | −0.004310 |
| Optimal weight | 22.84240 | 22.84298 | 22.85653 | 22.84298 | 22.84294 | 22.84474 | 22.84294 |
5.7. Tubular Column Design Problem
The main intention is to find the minimum cost of a uniform tubular column that can carry a compressive load P = 2,500 kgf. The column is made of a material with a yield stress (σy) of 500 kgf/cm², a modulus of elasticity (E) of 0.85 × 10⁶ kgf/cm², and a density (ρ) of 0.0025 kgf/cm³. The length (L) of the column is 250 cm. The cost of the column consists of material and construction costs. This problem is shown in Figure 13, and the optimization model is as follows.
Figure 13.

Tubular column design problem [59].
Minimize

f(d, t) = 9.82dt + 2d

subject to

g1 = P/(πdtσy) − 1 ≤ 0, g2 = 8PL²/(π³Edt(d² + t²)) − 1 ≤ 0, g3 = 2.0/d − 1 ≤ 0, g4 = d/14 − 1 ≤ 0, g5 = 0.2/t − 1 ≤ 0, g6 = t/0.8 − 1 ≤ 0 | (20) |

where d is the mean diameter and t the thickness of the tubular section.
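Taking the cost as 9.82dt + 2d, which reproduces the tabulated values, the optimum can be verified in a few lines. The sketch below (an illustration; only the stress and buckling constraints are coded, and both are nearly active at the optimum) evaluates the CS design from Table 15:

```python
import math

# Problem constants from the text
P, SIGMA_Y, E, L = 2500.0, 500.0, 0.85e6, 250.0

def column_cost(d, t):
    """Material plus construction cost of the uniform tubular column."""
    return 9.82 * d * t + 2.0 * d

def column_constraints(d, t):
    """Normalized stress (g1) and buckling (g2) constraints, g_i <= 0."""
    return [
        P / (math.pi * d * t * SIGMA_Y) - 1.0,
        8.0 * P * L**2 / (math.pi**3 * E * d * t * (d**2 + t**2)) - 1.0,
    ]

# CS design from Table 15: cost matches the tabulated 26.53217
cost = column_cost(5.45139, 0.29196)
```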
From the comparison results in Table 15, we can see that RLTLBO can obtain superior optimal cost compared to mGWO, DSCA, HOA, AO, HHO, and CS.
Table 15.
Comparison results for the tubular column design problem.
| Algorithm | Optimum variables | Optimum cost | |
|---|---|---|---|
| d | t | ||
| RLTLBO | 5.45120 | 0.29196 | 26.53130 |
| mGWO | 5.45080 | 0.29201 | 26.53270 |
| DSCA | 5.50250 | 0.29214 | 26.79030 |
| HOA | 5.26260 | 0.35487 | 28.86470 |
| AO | 5.46300 | 0.29656 | 26.83540 |
| HHO | 5.44380 | 0.29313 | 26.55820 |
| CS [59] | 5.45139 | 0.29196 | 26.53217 |
5.8. Frequency-Modulated Sound Waves Design Problem
This problem aims to optimize the six parameters of a frequency-modulated (FM) sound wave synthesizer [60]. The parameter vector X = {a1, ω1, a2, ω2, a3, ω3} defines a sound wave, where ai (i = 1, 2, 3) are the amplitudes and ωi (i = 1, 2, 3) are the angular frequencies. The minimum value of this problem is f(X) = 0. The objective function is the sum of squared errors between the target wave and the estimated wave. This problem is modeled as follows.
Minimize

f(X) = ∑t=0…100 (y(t) − y0(t))² | (21) |

where

y(t) = a1 sin(ω1tθ + a2 sin(ω2tθ + a3 sin(ω3tθ))), y0(t) = 1.0 sin(5.0tθ − 1.5 sin(4.8tθ + 2.0 sin(4.9tθ))), and θ = 2π/100, with each parameter searched in [−6.4, 6.35]. | (22) |
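The objective is deterministic, so it can be coded and spot-checked directly. The sketch below (an illustration using the standard FM parameter-estimation formulation) builds the estimated and target waves with the same generator; evaluating the objective at the target parameter vector itself gives exactly zero error:

```python
import math

THETA = 2.0 * math.pi / 100.0  # sampling step theta = 2*pi/100

def fm_wave(a1, w1, a2, w2, a3, w3, t):
    """Three-operator frequency-modulated sound wave y(t)."""
    return a1 * math.sin(w1 * t * THETA
                         + a2 * math.sin(w2 * t * THETA
                                         + a3 * math.sin(w3 * t * THETA)))

def fm_error(params):
    """Sum of squared errors between the estimated and the target wave."""
    a1, w1, a2, w2, a3, w3 = params
    err = 0.0
    for t in range(101):
        y = fm_wave(a1, w1, a2, w2, a3, w3, t)
        y0 = fm_wave(1.0, 5.0, -1.5, 4.8, 2.0, 4.9, t)  # target wave
        err += (y - y0) ** 2
    return err

# The target parameter vector reproduces the target wave exactly
zero = fm_error([1.0, 5.0, -1.5, 4.8, 2.0, 4.9])  # -> 0.0
```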
The RLTLBO is compared with the GWO, MFO [61], PSO, TSA [62], and FFA [63] algorithms, and the comparison results are listed in Table 16. The proposed method clearly finds a much better solution than the comparative algorithms.
Table 16.
Comparison results for the frequency-modulated sound waves design problem.
| Algorithm | Optimum variables | | | | | | Optimum cost |
|---|---|---|---|---|---|---|---|
| | a1 | ω1 | a2 | ω2 | a3 | ω3 | |
| RLTLBO | −0.97498 | −5.0327 | −1.5640 | −4.7840 | −2.0060 | 4.9055 | 0.21738 |
| GWO [13] | −0.66540 | −0.1684 | 1.5173 | −0.1287 | −4.1335 | −4.8997 | 8.47250 |
| MFO [61] | 0.61410 | 0.0432 | −4.3251 | 4.7923 | 0.8339 | 0.1278 | 11.89690 |
| PSO [11] | −0.58860 | 5.0145 | −3.2779 | −4.9324 | −0.8562 | −0.1476 | 13.18070 |
| TSA [62] | 0.34150 | 4.7881 | 1.4309 | 0.1158 | 0.0975 | 0.5480 | 25.10520 |
| FFA [63] | −0.56270 | 0.0525 | −3.4797 | 4.8930 | 1.1491 | −4.8345 | 17.42910 |
In general, the excellent performance in solving industrial engineering design problems suggests that RLTLBO can be widely used in real-world optimization problems.
6. Conclusion
This study presents an improved teaching-learning-based optimization algorithm (RLTLBO) that incorporates reinforcement learning (RL) and random opposition-based learning (ROBL) strategies. To remedy the insufficient learning process of the basic algorithm, a new learning mode considering the effect of the teacher is introduced in the learner phase. Switching between this new mode and the inherent learning mode is governed by the Q-learning mechanism of RL. This mechanism helps the individuals learn more thoroughly, accelerating the convergence of RLTLBO. To improve local optima avoidance, the ROBL strategy is appended after both the teacher and learner phases. The proposed RLTLBO algorithm is tested on 23 standard and eight CEC2017 benchmark functions to analyze its search performance. Experimental results show that it is competitive with other state-of-the-art meta-heuristic algorithms. To further verify the superiority of RLTLBO, eight industrial engineering design problems are solved, and the results are again highly competitive with those of the comparative algorithms.
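The Q-learning switch summarized above can be illustrated with a minimal sketch (the two states, the reward of 1 for an improving step, and the parameter values here are simplifying assumptions for illustration, not the released implementation):

```python
import random

def choose_mode(q_table, state, epsilon=0.1):
    """Pick one of the two learner-phase modes, epsilon-greedily."""
    if random.random() < epsilon or q_table[state][0] == q_table[state][1]:
        return random.randint(0, 1)          # explore (or break a tie)
    return 0 if q_table[state][0] > q_table[state][1] else 1

def update_q(q_table, state, action, reward, next_state,
             alpha=0.1, gamma=0.9):
    """Standard one-step Q-learning update of the chosen mode's value."""
    best_next = max(q_table[next_state])
    q_table[state][action] += alpha * (reward + gamma * best_next
                                       - q_table[state][action])

# Two states (e.g. "last step improved" / "did not"), two learning modes.
q = [[0.0, 0.0], [0.0, 0.0]]
# A mode-1 step that improved the learner earns a positive reward.
update_q(q, state=0, action=1, reward=1.0, next_state=0)
```

After a few rewarded steps, `choose_mode` increasingly favors whichever learning mode has paid off in that state, which is the switching behavior the algorithm relies on.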
The code for RLTLBO is provided at https://github.com/WangShuang92/RLTLBO and can be applied to more practical problems. However, the algorithm still suffers from premature convergence on several benchmark functions, which can be studied in the future. Moreover, RLTLBO can currently solve only single-objective problems; binary and multiobjective versions are worthwhile directions for future research. Further applications of the algorithm in different fields are also valuable, including text clustering, scheduling problems, appliance management, parameter estimation, feature selection, text classification, image segmentation, network applications, and sentiment analysis.
Acknowledgments
This research was funded by National fund cultivation project of Sanming University (PYS2107 and PYT2105), the Sanming University Introduces High-level Talents to Start Scientific Research Funding Support Project (21YG01S, 20YG01, and 20YG14), Fujian Natural Science Foundation Project (2021J011128), Bidding project for higher education research of Sanming University (SHE2101), the Guiding Science and Technology Projects in Sanming City (2020-S-39, 2020-G-61, and 2021-S-8), the Educational Research Projects of Young and Middle-aged Teachers in Fujian Province (JAT200638 and JAT200618), and the Scientific Research and Development Fund of Sanming University (B202029 and B202009), Open Research Fund of Key Laboratory of Agricultural Internet of Things in Fujian Province (ZD2101), Ministry of Education Cooperative Education Project (202002064014), School level education and teaching reform project of Sanming University (J2010306 and J2010305), Higher education research project of Sanming University (SHE2102 and SHE2013), and 2021 project of the 14th Five-year Plan of Education science in Fujian Province (FJJKBK21-138).
Contributor Information
Shuang Wang, Email: wang_shuang9279@163.com.
Heming Jia, Email: jiaheminglucky99@126.com.
Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.
Conflicts of Interest
On behalf of all authors, the corresponding author states that there are no conflicts of interest.
References
- 1.Abualigah L., Diabat A. Advances in sine cosine algorithm: a comprehensive survey. Artificial Intelligence Review . 2021;54(4):2567–2608. doi: 10.1007/s10462-020-09909-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 2.Abualigah L., Diabat A. A comprehensive survey of the grasshopper optimization algorithm: results, variants, and applications. Neural Computing & Applications . 2020;32(19) doi: 10.1007/s00521-020-04789-8.15533 [DOI] [Google Scholar]
- 3.Deng W., Zhang X., Zhou Y., et al. An enhanced fast non-dominated solution sorting genetic algorithm for multi-objective problems. Information Sciences . 2022;585:441–453. doi: 10.1016/j.ins.2021.11.052. [DOI] [Google Scholar]
- 4.Kumar Y., Singh P. K. Improved cat swarm optimization algorithm for solving global optimization problems and its application to clustering. Applied Intelligence . 2018;48(9):2681–2697. doi: 10.1007/s10489-017-1096-8. [DOI] [Google Scholar]
- 5.Wu E. Q., Zhou M., Hu D., et al. IEEE Transactions on Cybernetics . IEEE; 2020. Self-paced dynamic infinite mixture model for fatigue evaluation of pilots’ brains; pp. 1–6. [DOI] [PubMed] [Google Scholar]
- 6.Holland J. H. Genetic algorithms. Scientific American . 1992;267(1):66–72. doi: 10.1038/scientificamerican0792-66. [DOI] [Google Scholar]
- 7.Storn R., Price K. Differential evolution-a simple and efficient heuristic for global optimization over continuous spaces. Journal of Global Optimization . 1997;11(4):341–359. doi: 10.1023/a:1008202821328. [DOI] [Google Scholar]
- 8.Kirkpatrick S., Gelatt C. D., Vecchi M. P. Optimization by simulated annealing. Science . 1983;220(4598):671–680. doi: 10.1126/science.220.4598.671. [DOI] [PubMed] [Google Scholar]
- 9.Abualigah L., Diabat A., Mirjalili S., Abd Elaziz M., Gandomi A. H. The arithmetic optimization algorithm. Computer Methods in Applied Mechanics and Engineering . 2021;376 doi: 10.1016/j.cma.2020.113609.113609 [DOI] [Google Scholar]
- 10.Asef F., Majidnezhad V., Feizi-Derakhshi M.-R., Parsa S. Heat transfer relation-based optimization algorithm (HTOA) Soft Computing . 2021;25(13):8129–8158. doi: 10.1007/s00500-021-05734-0. [DOI] [Google Scholar]
- 11.Kennedy J., Eberhart R. Particle swarm optimization. Proceedings of the 1995 IEEE International Conference on Neural Networks, IEEE ICNN; December 1995; Perth, Australia. IEEE; pp. 1942–1948. [DOI] [Google Scholar]
- 12.Mirjalili S., Gandomi A. H., Mirjalili S. Z., Saremi S., Faris H., Mirjalili S. M. Salp swarm algorithm: a bio-inspired optimizer for engineering design problems. Advances in Engineering Software . 2017;114:163–191. doi: 10.1016/j.advengsoft.2017.07.002. [DOI] [Google Scholar]
- 13.Mirjalili S., Mirjalili S. M., Lewis A. Grey wolf optimizer. Advances in Engineering Software . 2014;69:46–61. doi: 10.1016/j.advengsoft.2013.12.007. [DOI] [Google Scholar]
- 14.Mirjalili S., Lewis A. The whale optimization algorithm. Advances in Engineering Software . 2016;95:51–67. doi: 10.1016/j.advengsoft.2016.01.008. [DOI] [Google Scholar]
- 15.Abualigah L., Yousri D., Elaziz M. A., Ewees A. A., Alqaness M. A. A., Gandomi A. H. Aquila optimizer: a novel meta-heuristic optimization algorithm. Computers & Industrial Engineering . 2021;157 doi: 10.1016/j.cie.2021.107250.107250 [DOI] [Google Scholar]
- 16.Jia H., Peng X., Lang C. Remora optimization algorithm. Expert Systems with Applications . 2021;185:p. 115665. doi: 10.1016/j.eswa.2021.115665. [DOI] [Google Scholar]
- 17.Rao R. V., Savsani V. J., Vakharia D. P. Teaching–learning-based optimization: a novel method for constrained mechanical design optimization problems. Computer-Aided Design . 2011;43:303–315. doi: 10.1016/j.cad.2010.12.015. [DOI] [Google Scholar]
- 18.Aouf A., Boussaid L., Sakly A. TLBO-based adaptive neurofuzzy controller for mobile robot navigation in a strange environment. Computational Intelligence and Neuroscience . 2018;2018:8. doi: 10.1155/2018/3145436.3145436 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 19.Singh M., Panigrahi B. K., Abhyankar A. R. Optimal coordination of directional over-current relays using teaching learning-based optimization (TLBO) algorithm. International Journal of Electrical Power & Energy Systems . 2013;50:33–41. doi: 10.1016/j.ijepes.2013.02.011. [DOI] [Google Scholar]
- 20.Gonzalez-Alvarez D. L., Vega-Rodriguez M. A., Gomez-Pulido J. A., Sanchez-Perez J. M. Multiobjective teaching-learning-based optimization (MO-TLBO) for motif finding. Proceedings of the IEEE International Conference on Intelligence and Informatics CINTI’12; November 2012; Budapest Hungary. IEEE; pp. 1–6. [DOI] [Google Scholar]
- 21.Zou F., Chen D., Xu Q. A survey of teaching–learning-based optimization. Neurocomputing . 2018;335:366–383. doi: 10.1016/j.neucom.2018.06.076. [DOI] [Google Scholar]
- 22.Kumar Y., Singh P. K. A chaotic teaching learning based optimization algorithm for clustering problems. Applied Intelligence . 2019;49:1036–1062. doi: 10.1007/s10489-018-1301-4. [DOI] [Google Scholar]
- 23.Taheri A., Rahimizadeh K., Rao R. V. An efficient balanced teaching-learning-based optimization algorithm with individual restarting strategy for solving global optimization problems. Information Sciences . 2021;576:68–104. doi: 10.1016/j.ins.2021.06.064. [DOI] [Google Scholar]
- 24. Ma Y., Zhang X., Song J., Chen L. A modified teaching-learning-based optimization algorithm for solving optimization problem. Knowledge-Based Systems. 2020;212:106599. doi: 10.1016/j.knosys.2020.106599.
- 25. Xu Y., Yang Z., Li X., Kang H., Yang X. Dynamic opposite learning enhanced teaching-learning-based optimization. Knowledge-Based Systems. 2020;188:104966. doi: 10.1016/j.knosys.2019.104966.
- 26. Dong H., Wang P., Song B. Kriging-assisted teaching-learning-based optimization (KTLBO) to solve computationally expensive constrained problems. Information Sciences. 2020;556:404–435. doi: 10.1016/j.ins.2020.09.073.
- 27. Ren Z., Jiang R., Yang F., Qiu J. A multi-objective elitist feedback teaching-learning-based optimization algorithm and its application. Expert Systems with Applications. 2021;188:115972. doi: 10.1016/j.eswa.2021.115972.
- 28. Zhang Y., Jin Z., Chen Y. Hybrid teaching-learning-based optimization and neural network algorithm for engineering design optimization problems. Knowledge-Based Systems. 2020;187:104836. doi: 10.1016/j.knosys.2019.07.007.
- 29. Lakshmi A. V., Mohanaiah P. WOA-TLBO: whale optimization algorithm with teaching-learning-based optimization for global optimization and facial emotion recognition. Applied Soft Computing. 2021;110:107623. doi: 10.1016/j.asoc.2021.107623.
- 30. Heidari A. A., Mirjalili S., Faris H., Aljarah I., Mafarja M., Chen H. Harris hawks optimization: algorithm and applications. Future Generation Computer Systems. 2019;97:849–872. doi: 10.1016/j.future.2019.02.028.
- 31. Miarnaeimi F., Azizyan G., Rashki M. Horse herd optimization algorithm: a nature-inspired algorithm for high-dimensional optimization problems. Knowledge-Based Systems. 2020;213:106711. doi: 10.1016/j.knosys.2020.106711.
- 32. Gupta S., Deep K. A memory-based grey wolf optimizer for global optimization tasks. Applied Soft Computing. 2020;93:106367. doi: 10.1016/j.asoc.2020.106367.
- 33. Wang S., Sun K., Zhang W., Jia H. Multilevel thresholding using a modified ant lion optimizer with opposition-based learning for color image segmentation. Mathematical Biosciences and Engineering. 2021;18:3092–3143. doi: 10.3934/mbe.2021155.
- 34. Li Y., Zhao Y., Liu J. Dynamic sine cosine algorithm for large-scale global optimization problems. Expert Systems with Applications. 2021;177:114950. doi: 10.1016/j.eswa.2021.114950.
- 35. Talbi E. Machine learning into metaheuristics: a survey and taxonomy of data-driven metaheuristics. ACM Computing Surveys. 2020;54:1–32. doi: 10.1145/3459664.
- 36. Drugan M. Reinforcement learning versus evolutionary computation: a survey on hybrid algorithms. Swarm and Evolutionary Computation. 2019;44:228–246. doi: 10.1016/j.swevo.2018.03.011.
- 37. Lingam G., Rout R. R., Somayajulu D. V. L. N. Adaptive deep Q-learning model for detecting social bots and influential users in online social networks. Applied Intelligence. 2019;49:3947–3964. doi: 10.1007/s10489-019-01488-3.
- 38. Liu F., Zeng G. Study of genetic algorithm with reinforcement learning to solve the TSP. Expert Systems with Applications. 2009;36:6995–7001. doi: 10.1016/j.eswa.2008.08.026.
- 39. Samma H., Mohamad-Saleh J., Suandi S. A., Lahasan B. Q-learning-based simulated annealing algorithm for constrained engineering design problems. Neural Computing & Applications. 2020;32:5147–5161. doi: 10.1007/s00521-019-04008-z.
- 40. Chen Q., Huang M., Xu Q., Wang H., Wang J. Reinforcement learning-based genetic algorithm in optimizing multidimensional data discretization scheme. Mathematical Problems in Engineering. 2020;20:1–13. doi: 10.1155/2020/1698323.
- 41. Xu Y., Pi D. A reinforcement learning-based communication topology in particle swarm optimization. Neural Computing & Applications. 2020;32:1–26. doi: 10.1007/s00521-019-04527-9.
- 42. Emary E., Zawbaa H. M., Grosan C. Experienced gray wolf optimization through reinforcement learning and neural networks. IEEE Transactions on Neural Networks and Learning Systems. 2019;29:681–694. doi: 10.1109/TNNLS.2016.2634548.
- 43. Ghafoorian M., Taghizadeh N., Beigy H. Automatic abstraction in reinforcement learning using ant system algorithm. Proceedings of the AAAI Spring Symposium Series; March 2017; Stanford, CA, USA. pp. 9–14.
- 44. Seyyedabbasi A., Aliyev R., Kiani F., Gulle M. U., Shah M. A. Hybrid algorithms based on combining reinforcement learning and metaheuristic methods to solve global optimization problems. Knowledge-Based Systems. 2021;222:107044. doi: 10.1016/j.knosys.2021.107044.
- 45. Tizhoosh H. Opposition-based learning: a new scheme for machine intelligence. Proceedings of the International Conference on Computational Intelligence for Modelling; November 2005; Vienna, Austria. pp. 695–701.
- 46. Long W., Jiao J., Liang X., Cai S., Xu M. A random opposition-based learning grey wolf optimizer. IEEE Access. 2019;7:113810. doi: 10.1109/access.2019.2934994.
- 47. Yao X., Liu Y., Lin G. Evolutionary programming made faster. IEEE Transactions on Evolutionary Computation. 1999;3:82–102. doi: 10.1109/4235.771163.
- 48. Derrac J., García S., Molina D., Herrera F. A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm intelligence algorithms. Swarm and Evolutionary Computation. 2011;1:3–18. doi: 10.1016/j.swevo.2011.02.002.
- 49. Awad N. H., Ali M. Z., Suganthan P. N., Liang J. J., Qu B. Y. Problem definitions and evaluation criteria for the CEC2017 special session and competition on single objective real-parameter numerical optimization. IEEE Congress on Evolutionary Computation; 2016.
- 50. Li S. M., Chen H. L., Wang M. J., Heidari A. A., Mirjalili S. Slime mould algorithm: a new method for stochastic optimization. Future Generation Computer Systems. 2020;111:300–323. doi: 10.1016/j.future.2020.03.055.
- 51. Faramarzi A., Heidarinejad M., Mirjalili S., Gandomi A. H. Marine predators algorithm: a nature-inspired metaheuristic. Expert Systems with Applications. 2020;152:113377. doi: 10.1016/j.eswa.2020.113377.
- 52. Mirjalili S., Mirjalili S. M., Hatamlou A. Multi-verse optimizer: a nature-inspired algorithm for global optimization. Neural Computing & Applications. 2015;27:495–513. doi: 10.1007/s00521-015-1870-7.
- 53. Geem Z. W., Kim J. H., Loganathan G. V. A new heuristic optimization algorithm: harmony search. SIMULATION. 2001;76:60–68. doi: 10.1177/003754970107600201.
- 54. Rechenberg I. Evolutionsstrategien. Berlin, Germany: Springer Berlin Heidelberg; 1978. pp. 83–114.
- 55. Mirjalili S. SCA: a sine cosine algorithm for solving optimization problems. Knowledge-Based Systems. 2016;96:120–133. doi: 10.1016/j.knosys.2015.12.022.
- 56. Baykasoğlu A., Ozsoydan F. B. Adaptive firefly algorithm with chaos for mechanical design optimization problems. Applied Soft Computing. 2015;36:152–164. doi: 10.1016/j.asoc.2015.06.056.
- 57. Saremi S., Mirjalili S., Lewis A. Grasshopper optimisation algorithm: theory and application. Advances in Engineering Software. 2017;105:30–47. doi: 10.1016/j.advengsoft.2017.01.004.
- 58. Yildiz B. S., Pholdee N., Bureerat S., Yildiz A. R., Sait S. M. Enhanced grasshopper optimization algorithm using elite opposition-based learning for solving real-world engineering problems. Engineering with Computers. 2021. doi: 10.1007/s00366-021-01368-w.
- 59. Gandomi A. H., Yang X. S., Alavi A. H. Cuckoo search algorithm: a metaheuristic approach to solve structural optimization problems. Engineering with Computers. 2013;29:17–35. doi: 10.1007/s00366-011-0241-y.
- 60. Abdollahzadeh B., Gharehchopogh F. S., Mirjalili S. Artificial gorilla troops optimizer: a new nature-inspired metaheuristic algorithm for global optimization problems. International Journal of Intelligent Systems. 2021;36:1–72. doi: 10.1002/int.22535.
- 61. Mirjalili S. Moth-flame optimization algorithm: a novel nature-inspired heuristic paradigm. Knowledge-Based Systems. 2015;89:228–249. doi: 10.1016/j.knosys.2015.07.006.
- 62. Kaur S., Awasthi L. K., Sangal A. L., Dhiman G. Tunicate swarm algorithm: a new bio-inspired based metaheuristic paradigm for global optimization. Engineering Applications of Artificial Intelligence. 2020;90:1–29. doi: 10.1016/j.engappai.2020.103541.
- 63. Shayanfar H., Gharehchopogh F. S. Farmland fertility: a new metaheuristic algorithm for solving continuous optimization problems. Applied Soft Computing. 2018;71:728–746. doi: 10.1016/j.asoc.2018.07.033.
Data Availability Statement
The data used to support the findings of this study are available from the corresponding author upon request.
