PLOS One. 2024 Mar 11;19(3):e0295643. doi: 10.1371/journal.pone.0295643

Hybrid whale algorithm with evolutionary strategies and filtering for high-dimensional optimization: Application to microarray cancer data

Rahila Hafiz 1,*, Sana Saeed 1
Editor: Omar A Alzubi 2
PMCID: PMC10927076  PMID: 38466740

Abstract

The standard whale optimization algorithm (WOA) is prone to suboptimal results and inefficiency in high-dimensional search spaces, so its components deserve careful examination. Computer-generated initial populations are often unevenly distributed in the solution space, which leads to low diversity. We propose fusing WOA with a discrete recombinative evolutionary strategy to enhance the diversity of the initial population, yielding the recombinative evolutionary strategy hybrid whale optimization algorithm (RESHWOA). Simulation experiments comparing RESHWOA with the original WOA on thirteen unimodal and multimodal benchmark test functions show that RESHWOA performs better in terms of accuracy, mean objective value, and standard deviation. Furthermore, we applied two data reduction techniques, the Bhattacharyya distance and the signal-to-noise ratio. The Support Vector Machine (SVM) handles high-dimensional datasets with numerical features well, and although it already works well with its default settings, tuning its parameters can improve its performance significantly. We applied RESHWOA and WOA to optimize the SVM parameters on six microarray cancer datasets. The exhaustive examination and detailed results demonstrate that the new structure addresses WOA’s main shortcomings. We conclude that the proposed RESHWOA performed significantly better than WOA.

1. Introduction

Metaheuristics are sets of strategies for navigating a search space. Many of these strategies are inspired by natural processes, and they are becoming increasingly important in fields such as genetic engineering [1]. Their primary goal is to search the space rapidly for near-optimal solutions to a given problem. Nature-inspired metaheuristics include evolutionary, physics-based, and swarm-based algorithms, which effectively solve complex optimization problems [2]. Evolution strategy-based algorithms mimic genetic behaviour and iteratively generate new solutions through mutation and recombination. They select the best individuals from the population and carry them over to the next generation, repeating this process until a satisfactory result is obtained. Evolution strategies (ES) come in two types: non-recombinative and recombinative. Non-recombinative strategies use mutation-only operators that modify a parent solution to produce a new solution, whereas recombinative strategies use recombination operators that combine the features of multiple parent solutions to create an offspring. Genetic algorithms (GA) [3] and genetic programming (GP) [4] are the most common evolutionary approaches. Physics-based algorithms simulate physical phenomena; simulated annealing (SA) is a popular method in this class [5]. Swarm-based algorithms mimic the collective behaviour of social organisms such as ants, bees, birds, and fish. Particle swarm optimization (PSO) is a well-known swarm-based method that mimics the social behaviour of a swarm. The field began with GA, and researchers have since proposed numerous variants and new optimization algorithms [6]. Because they can reach near-optimal global solutions with few parameters, these algorithms are widely used across many fields [3, 7–9].

Mirjalili, an Australian researcher, introduced in 2016 the whale optimization algorithm (WOA), a novel population-based heuristic that mimics the hunting behaviour of humpback whales, in particular their distinctive spiral bubble-net feeding. This feature gives the algorithm a significant advantage over others [2]. WOA uses probabilities to update its optimal individual and its motion modes, which yields greater randomness, faster convergence, and good performance in practical engineering. It is widely used in image processing [7, 10], medicine [8, 9], microgrids [11], and other fields. However, it shares the drawbacks of other swarm intelligence algorithms: on complex problems it easily becomes trapped in local optima and suffers from a low convergence rate and low precision. Given these limitations, researchers have devoted considerable effort to improving the algorithm. To better balance the exploitation and exploration stages of the traditional WOA, Sahu et al. proposed the MWOA algorithm and applied it to a multi-input-single-output (MISO) static synchronous series compensator (SSSC) [12]. By incorporating a chaos strategy, Sayed et al. proposed a chaotic whale optimization algorithm (CWOA) that improved the ability to escape local optima [13]. Yan et al. proposed AWOA, which uses logistic mapping to initialize population positions and an inertia weight to improve population diversity and accelerate convergence [14]. These improved variants outperform the traditional WOA, but WOA’s performance still leaves much room for improvement. Further research is needed on balancing local exploitation and global exploration, avoiding premature convergence to local optima, and improving convergence accuracy.

Among the three major branches of evolutionary computation, namely genetic algorithms (GAs), evolutionary programming (EP), and evolution strategies (ESs), ESs are the only one originally proposed for numerical optimization, and they are still widely used in optimization today [15, 16]. ESs primarily use mutation as the search operator, although recombination has also been used. The state of the art is the (μ, λ)-ES, where λ > μ ≥ 1: in each generation, μ parents generate λ offspring through recombination and mutation, and the best μ offspring are selected deterministically to replace the parents [17]. This strategy uses neither elitism nor probabilistic selection. This paper considers only a simplified version of ESs, RES, which uses recombination without mutation or elitism. ESs are population-based versions of generate-and-test algorithms [18]: they generate new solutions with search operators such as mutation and then use a selection scheme to decide which of them survive to the next generation. Viewing ESs as search algorithms has the advantage of making their relationships to other search algorithms, such as simulated annealing (SA), tabu search (TS), and hill climbing, more explicit and therefore easier to explore. Furthermore, the generate-and-test perspective on evolutionary algorithms (EAs) makes clear that genetic operators such as crossover (recombination) and mutation are stochastic search operators that generate new points in a search space. A search operator’s effectiveness is best described not by its biological analogy but by its ability to produce promising new points with a higher probability of leading to the global optimum.

In a generate-and-test algorithm, the test (or, in an EA, selection) determines how promising a new point is. These assessments can be heuristic or probabilistic. The (μ, λ)-ES uses Gaussian mutation to generate new offspring and deterministic selection to choose among them. Many different selection schemes have been studied for ESs [17].
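To make the (μ, λ) scheme concrete, here is a minimal Python/NumPy sketch. It is illustrative only: the function names, the fixed mutation step `sigma`, the initialization range [−5, 5], and the sphere objective are assumptions, and recombination is left out so that the example isolates Gaussian mutation and deterministic (comma) selection. Practical ESs normally also adapt the step size.

```python
import numpy as np

def sphere(x):
    """Test objective: sum of squares, global minimum 0 at the origin."""
    return float(np.sum(x ** 2))

def mu_comma_lambda_es(f, dim, mu=5, lam=30, sigma=0.3, generations=200, seed=0):
    """Minimal (mu, lambda)-ES with a fixed Gaussian mutation step (sketch).

    Each generation, lam offspring are created by copying a uniformly chosen
    parent (probability 1/mu each) and adding Gaussian noise. The best mu
    offspring replace all parents: deterministic selection, no elitism.
    """
    assert lam > mu >= 1
    rng = np.random.default_rng(seed)
    parents = rng.uniform(-5.0, 5.0, size=(mu, dim))
    best_x, best_f = None, np.inf
    for _ in range(generations):
        idx = rng.integers(0, mu, size=lam)          # parent of each offspring
        offspring = parents[idx] + sigma * rng.standard_normal((lam, dim))
        fitness = np.array([f(x) for x in offspring])
        order = np.argsort(fitness)[:mu]             # keep the best mu offspring only
        parents = offspring[order]                   # old parents are discarded
        if fitness[order[0]] < best_f:               # record the best point ever seen
            best_f, best_x = float(fitness[order[0]]), offspring[order[0]].copy()
    return best_x, best_f
```

Because the parents are always replaced, a good solution can be lost between generations; the sketch therefore keeps the best point seen in a separate variable rather than relying on elitism.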

Building prediction models is an important application of machine learning, and prediction models have been used in several biological applications [19–22]. Using gene expression profiles to identify and classify malignant and normal tissues can be a challenging machine learning task, although its difficulty varies with the specific context and type of data analyzed [23]. DNA microarray technology can measure the expression levels of many genes in a single experiment, allowing researchers to study which genes are expressed in each tissue under various conditions. The support vector machine (SVM) is a widely used supervised learning algorithm for classification and regression [24] and has performed well in various classification applications [25–29]. Medical diagnosis is an essential application of the SVM classifier, because accurate classification is crucial for diagnosing specific disorders. To benefit from an SVM, one must first train it, which includes determining the best parameters. Many approaches have been developed in recent years to address this difficulty, such as probabilistic optimization methods, Laplace evidence approximations, and the generalized approximate cross-validation (GACV) error [30, 31].

Accurate prediction of tumour progression is critical for cancer diagnosis and treatment, and cDNA microarray technology has become significant in molecular biology and cancer research [32]. Because an enormous number of genes is monitored, cDNA microarray datasets are high-dimensional, and they often have few samples. To improve the efficiency of the SVM model, we employ two statistical filtering techniques, the Bhattacharyya distance (BC) and the signal-to-noise ratio (SNR) [33].
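The two filters can be illustrated as per-gene scores on a two-class expression matrix X (samples × genes) with labels y ∈ {0, 1}. This sketch assumes the commonly used textbook forms, SNR = (m1 − m0)/(s1 + s0) and the Bhattacharyya distance between two univariate Gaussian class distributions; the exact definitions in [33] may differ, and the function names and the small `EPS` constant are hypothetical.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero for constant genes

def snr_scores(X, y):
    """Per-gene signal-to-noise ratio (m1 - m0) / (s1 + s0) for labels y in {0, 1}."""
    g1, g0 = X[y == 1], X[y == 0]
    return (g1.mean(axis=0) - g0.mean(axis=0)) / (g1.std(axis=0) + g0.std(axis=0) + EPS)

def bhattacharyya_scores(X, y):
    """Per-gene Bhattacharyya distance, modelling each class as a univariate Gaussian."""
    g1, g0 = X[y == 1], X[y == 0]
    m1, m0 = g1.mean(axis=0), g0.mean(axis=0)
    v1, v0 = g1.var(axis=0) + EPS, g0.var(axis=0) + EPS
    mean_term = 0.25 * (m1 - m0) ** 2 / (v1 + v0)
    var_term = 0.5 * np.log((v1 + v0) / (2.0 * np.sqrt(v1 * v0)))
    return mean_term + var_term

def top_k_genes(scores, k):
    """Indices of the k genes with the largest absolute score (the filtering step)."""
    return np.argsort(-np.abs(scores))[:k]
```

Genes would then be ranked by absolute score and only the top k kept before the SVM is trained; k is a free choice that this sketch does not fix.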

High-dimensional data, such as microarray data, are closely connected to optimization techniques. High-dimensional datasets involve many features or variables and pose several challenges, including noise, redundancy, and the curse of dimensionality. Optimization techniques can help address these challenges and extract meaningful information from such data.

Previous studies have applied optimization to high-dimensional data, especially in microarray analysis, in several ways [34–37]. High-dimensional data often contain many irrelevant or redundant features. Feature selection techniques use optimization algorithms to identify a subset of the most informative features, reducing dimensionality while preserving the relevant information [38–44]. Dimensionality reduction methods such as principal component analysis (PCA) and t-distributed stochastic neighbor embedding (t-SNE) project high-dimensional data into a lower-dimensional space while preserving the most significant variance or structure [37, 45–48]; these techniques often involve optimization to find the best projection. In microarray data analysis, clustering and classification tasks are common, and practitioners use optimization to find the best parameters for methods such as k-means clustering, support vector machines (SVM), or neural networks, which can handle high-dimensional data for tasks such as identifying disease subtypes or predicting outcomes [49, 50]. When building predictive models from high-dimensional data, regularization techniques such as L1 (Lasso) and L2 (Ridge) regularization use optimization to balance model complexity and accuracy [51–53], and model selection methods rely on optimization to choose the best hyperparameters. Gene Network Inference: researchers often use microarray data to infer gene regulatory networks, and optimization helps discover relationships between genes by fitting models that best explain the observed expression patterns [54–56]. Biomarker Discovery: high-dimensional data analysis is crucial for identifying biomarkers for disease diagnosis, prognosis, or treatment response [57–59], and optimization plays a role in feature selection and model building for biomarker discovery.
Optimization in Experimental Design: When planning microarray experiments, optimization techniques can assist in selecting the most informative samples or conditions to maximize the utility of the data collected [60, 61].

The connection between high-dimensional data, such as microarray data, and optimization techniques is significant. Optimization methods are essential for preprocessing, analyzing, and modelling high-dimensional datasets in order to extract meaningful information, reduce noise, and make data-driven decision-making more effective.

This paper presents a new hybrid algorithm, the Recombinative Evolutionary Strategy Hybrid Whale Optimization Algorithm (RESHWOA), to tackle the problems mentioned earlier. The following are the paper’s main contributions:

  1. The authors present a new optimization algorithm that incorporates a recombinative evolutionary strategy into the whale optimization algorithm to improve the diversity of the whales’ positions, ideally leading to better optima. This approach can significantly reduce the risk of becoming trapped in local optima and improve the convergence accuracy of the algorithm.

  2. The authors compare the proposed algorithm with the original WOA on benchmark functions to validate its effectiveness. Additionally, they conduct scalability experiments, whose results in Tables 3 and 4 demonstrate the algorithm’s ability to solve high-dimensional problems.

  3. Inspired by the algorithms and techniques in Table 6, this research primarily aims to optimize the hyperparameters of the SVM model with the proposed method and the original WOA optimizer by minimizing the MSE. The main objective is to identify the set of hyperparameters with minimum MSE for high-dimensional data.

  4. We ran the algorithm on various microarray cancer datasets and examined the results over thirty runs to assess the efficiency of the model.

  5. To reduce the dimensionality of the data, we have employed statistical filtration techniques such as Bhattacharyya distance (BC) and signal-to-noise ratio (SNR) [33, 62] in our approach.

Table 3. Test results on the benchmark functions with the dimension fixed at thirty.

Function | WOA: Avg., Std., Avg. Time | RESHWOA: Avg., Std., Avg. Time | Benchmark (F_min)
F1 1.0683e-74 3.0191e-74 01.2780 1.8256e-10 6.9650e-10 05.1139 0
F2 1.4999e-49 6.8637e-49 01.4028 5.9813e-60 2.2459e-59 05.1934 0
F3 4.2659e+04 1.3960e+04 05.9696 1.5931e+04 7.9631e+03 21.5713 0
F4 47.4474 25.6257 01.5514 18.3155 26.0905 05.0813 0
F5 27.9464 0.4439 01.8927 27.0641 0.6092 06.8722 0
F6 0.3906 0.2617 01.2809 0.0437 0.0996 05.0578 0
F7 0.0015 0.0015 03.8390 9.8461e-04 0.0014 13.5069 0
F8 -1.0523e+04 1.7622e+03 01.8925 -1.1591e+04 1.2476e+03 07.0700 -418.9829*30
F9 3.7896e-15 2.0756e-14 01.5322 0 0 05.3827 0
F10 3.6119e-15 2.9033e-15 01.5966 4.0856e-15 1.9459e-15 05.7373 0
F11 0.0043 0.0236 01.9610 0 0 07.1504 0
F12 0.0286 0.0240 08.5613 0.0048 0.0086 29.7863 0
F13 0.4524 0.2052 08.6047 0.1038 0.1104 29.3452 0

Table 4. Test results on the benchmark functions with the dimension fixed at one hundred.

Function | WOA: Avg., Std., Avg. Time | RESHWOA: Avg., Std., Avg. Time | Benchmark (F_min)
F1 1.8822e-69 1.0299e-68 02.7470 4.1499e-100 1.3019e-99 03.7140 0
F2 3.7181e-48 1.6884e-47 02.7874 7.7554e-58 2.4461e-57 03.2223 0
F3 1.0457e+06 3.8371e+05 26.0376 6.5064e+05 1.4678e+05 91.0296 0
F4 67.2380 28.5696 02.7059 62.7189 35.6743 08.9364 0
F5 98.2406 0.2069 03.3927 97.4164 0.4801 12.3029 0
F6 4.1325 1.0834 02.5921 1.1605 0.5098 09.1404 0
F7 0.0034 0.0039 11.1243 0.0012 0.0018 38.1394 0
F8 -3.6407e+04 6.1080e+03 04.4031 -3.9287e+04 3.1190e+03 14.9302 -418.9829*100
F9 0 0 03.0551 0 0 10.3598 0
F10 4.7962e-15 2.8529e-15 03.2018 4.3225e-15 2.5523e-15 11.0596 0
F11 3.7007e-18 2.0270e-17 03.990081 0.0069 0.0264 13.8847 0
F12 0.0576 0.0372 23.853848 0.0102 0.0056 79.2452 0
F13 2.7122 0.6399 24.335468 0.9871 0.5980 80.5072 0

Table 6. SNR results of RESHWOA and WOA with thirty runs and fifty iterations.

S. No | Dataset | RESHWOA: (C, γ) at min MSE, Min MSE, Avg. MSE, Avg. Std. | WOA: (C, γ) at min MSE, Min MSE, Avg. MSE, Avg. Std.
1 Carcinoma 0.923, 73.933 0.0000 0.0013 0.0024 5.031, 20.832 0.5200 0.5200 3.3876e-16
2 CNS 1.878, 75.967 0.0000 3.7665e-04 0.0014 5.328, 25.214 0.2857 0.2865 0.0043
3 Colon 2.672, 89.281 0.0056 0.0019 0.0027 7.275, 52.180 0.3023 0.3023 5.6460e-17
4 Breast 4.970, 95.967 0.1194 0.1095 0.0120 10, 100 0.1194 0.1527 0.0202
5 Leukemia 6.616, 88.078 0.0000 5.6497e-04 0.0017 0.000, 0.1028 0.2000 0.2767 0.0167
6 Ovarian 7.543, 23.674 0.0000 0.0011 0.0023 7.549, 34.188 0.0000 0.0119 0.0045

The paper comprises five sections. Section 2 outlines the standard whale optimization algorithm and the standard recombinative evolutionary strategy, and Section 3 describes our proposed hybrid algorithm in detail. Section 4 presents and analyzes the experimental results. Finally, Section 5 concludes the work.

2. Whale optimization and recombinative evolutionary strategy

The whale optimization algorithm (WOA) is a recent nature-inspired metaheuristic proposed by the Australian scholar Mirjalili and colleagues. It simulates the predation behaviour of humpback whale populations and updates the positions of candidate solutions through three behaviours: encircling prey, spiral position updating, and searching for prey. The recombinative evolutionary strategy (RES), developed by Deb in 1997, is a metaheuristic technique [12, 30]. WOA and RES are both gradient-free, population-based algorithms with a potentially parallel structure, and previous research reports that both can obtain better optimization results than existing methods.

2.1 Whale optimization algorithm

The whale optimization algorithm is a relatively new nature-inspired optimization algorithm proposed in 2016 by Seyedali Mirjalili, Andrew Lewis, and Ashraf Alasty [2, 63]. It is based on the hunting behaviour of humpback whales, which cooperate to encircle their prey and gradually reduce its escape options until they catch it. In WOA, each candidate solution is represented as a whale, and the whale’s position corresponds to the decision variables of the optimization problem. The algorithm starts from an initial population of randomly generated whales, and the goal is to find the solution that minimizes or maximizes the objective function [10]. WOA explores and exploits the search space in three phases: exploration, exploitation, and convergence. During exploration, the whales move randomly through the search space. During exploitation, the whales swim around and encircle the best solution found so far. Finally, during convergence, the whales use the spiral bubble-net hunting behaviour to converge to the optimal solution. WOA also uses adaptive parameters, updated during the optimization process, to control the balance between exploration and exploitation; these parameters adjust the whales’ step sizes and search range so that the algorithm converges efficiently and accurately.

WOA effectively solves various optimization problems, including benchmark functions, engineering design problems, and machine learning problems. However, like other optimization algorithms, its performance depends on the characteristics of the problem and on the choice of appropriate algorithm parameters. WOA is relatively easy to understand and to implement. Since its inception, it has gained widespread popularity and has been applied in various engineering fields. Section 3 describes the steps of the algorithm in detail.

2.2 Recombinative evolutionary strategy

Evolutionary strategies (ESs) are a powerful class of search and optimization methods inspired by natural evolution [22, 64]. They mimic evolutionary principles such as a population-based strategy, inheritance of information, variation of information through crossover and mutation, and selection of individuals based on fitness. Among the many ES variants, we concentrate on discrete recombination (DR). In DR, a starting population of μ individuals is created, and in each generation ρ parents produce λ offspring. Parents are selected repeatedly: their components are recombined, and the resulting offspring are then mutated. In the basic ES, each offspring is produced from two parents. First, two individuals, P1 and P2, are chosen at random as the parents of the offspring; every individual has the same selection probability, 1/μ.

The components of both parent vectors are then recombined to produce the offspring. Empirical evidence shows that using different recombination schemes for the decision variables is advantageous. To determine the value of each decision variable of the offspring (index O), we randomly take the value of either parent (P1 or P2) with equal probability. This approach is called discrete recombination, and it strongly resembles uniform crossover in GAs. Intermediate recombination, by contrast, averages the parents’ mutation step widths to generate offspring [65]. Consider one solution element in Eq 1, the jth element $X_{O,j}$. It takes the value of either $X_{P_1,j}$ or $X_{P_2,j}$, where P1 and P2 are randomly selected parent indices from the population of μ individuals.

$$X_{O,j} = X_{P_1,j} \ \text{or} \ X_{P_2,j}, \qquad j = 1, 2, \ldots, \lambda; \quad P_1, P_2 \in \{1, 2, \ldots, \mu\} \tag{1}$$

Consider four parents, P1, P2, P3, and P4, each with values for λ decision variables. A discretely recombined offspring can be derived from these four solutions as follows.

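$$
\begin{array}{lccccccc}
P_1: & X_{P_1,1} & X_{P_1,2} & X_{P_1,3} & X_{P_1,4} & X_{P_1,5} & \cdots & X_{P_1,\lambda} \\
P_2: & X_{P_2,1} & X_{P_2,2} & X_{P_2,3} & X_{P_2,4} & X_{P_2,5} & \cdots & X_{P_2,\lambda} \\
P_3: & X_{P_3,1} & X_{P_3,2} & X_{P_3,3} & X_{P_3,4} & X_{P_3,5} & \cdots & X_{P_3,\lambda} \\
P_4: & X_{P_4,1} & X_{P_4,2} & X_{P_4,3} & X_{P_4,4} & X_{P_4,5} & \cdots & X_{P_4,\lambda} \\
\text{Offspring}: & X_{P_2,1} & X_{P_4,2} & X_{P_3,3} & X_{P_1,4} & X_{P_4,5} & \cdots & X_{P_4,\lambda}
\end{array}
$$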

For each decision variable, the offspring takes the value from a randomly chosen parent. This process explores different combinations of decision variables and can potentially improve solution quality in optimization problems.
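Discrete recombination as in Eq 1, generalized to ρ parents as in the four-parent example, can be sketched in a few lines of Python/NumPy. The function name `discrete_recombination` and the integer-valued example parents are hypothetical; each variable of the returned offspring is copied from a parent chosen uniformly at random.

```python
import numpy as np

def discrete_recombination(parents, rng):
    """Return one offspring from a (rho x n_vars) parent matrix.

    For each decision variable j, a parent is drawn uniformly at random
    and its j-th value is copied (Eq 1 with rho parents instead of two).
    """
    rho, n_vars = parents.shape
    which_parent = rng.integers(0, rho, size=n_vars)   # one parent index per variable
    return parents[which_parent, np.arange(n_vars)]

rng = np.random.default_rng(42)
# Four parents (rows P1..P4), six decision variables each.
parents = np.arange(24, dtype=float).reshape(4, 6)
offspring = discrete_recombination(parents, rng)
```

Each entry of `offspring` therefore equals the entry in the same column of one of the four rows, which is exactly the pattern in the Offspring row of the illustration above.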

Figs 1 and 2 illustrate how discrete and intermediate recombination work. A practical implementation of discrete recombination is also available in S1 Fig.

Fig 1. Discrete recombination.


Fig 2. Intermediate recombination.


3. RESHWOA algorithm

The conventional WOA generates the whales’ initial solutions from random numbers. Such a computer-generated initial population is generally unevenly distributed in the solution space, resulting in low initial diversity [64], and it is then evolved through a stochastic, iterative procedure over a predetermined number of generations [66]. To obtain a more diverse starting point, we apply the DR strategy to an initial random population: the proposed hybrid method uses a discrete recombinative technique and builds the initial population by sampling groups of recombinants through DR [6]. The rationale is the recombination principle, a popular and well-established operator in evolutionary computing that combines the beneficial characteristics of the parents in the offspring. The resulting diverse initial population is intended to improve the algorithm’s global search capability and the accuracy of the optimal solution. The literature provides evidence for the effectiveness of hybrid initialization techniques that integrate multiple optimization algorithms to yield superior results. For example, the CMA-ES method generates new search points by sampling from a multivariate normal distribution [67], and chaos initialization generates a set of chaotic variables as the initial population to improve the optimization performance of WOA [68].

WOA consists of the following steps. Steps 2 to 4 describe its three key behaviours: encircling prey, attacking prey, and searching for prey.

Step 1: Generate the vectors of the initial WOA population, with all values drawn at random between 0 and 1, and initialize the iteration counter k and the whale index i.

Step 2: Encircling Prey Behaviour: The position of the whale closest to the prey (the best solution found so far) guides the movement of the other whales, which move toward this best whale. The position update model is as follows.

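$$D_1 = \left| C \cdot X^*(i) - X(i) \right| \tag{2}$$
$$X(i+1) = X^*(i) - A \cdot D_1, \quad \text{if } p < 0.5 \tag{3}$$
$$A = 2a \cdot r_1 - a \tag{4}$$
$$C = 2 \cdot r_2 \tag{5}$$

where $i$ is the current iteration, $A$ and $C$ are coefficient vectors, $X^*$ is the position vector of the best solution obtained so far, $X$ is the position vector, and $\cdot$ denotes element-wise multiplication. The parameter $a$ decreases linearly from 2 to 0 over the iterations, reaching zero at the maximum iteration, and $r_1$ and $r_2$ are random vectors in [0, 1]. Hence $A$ lies in $[-a, a]$; in the encircling phase ($|A| < 1$), $A$ takes values between −1 and 1.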
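Eqs 2–5 translate directly into array operations. The Python/NumPy sketch below is an illustration, not the authors’ code: the helper names are hypothetical, $r_1$ and $r_2$ are drawn independently, and $\cdot$ is taken as element-wise multiplication.

```python
import numpy as np

def a_schedule(k, max_iter):
    """Linearly decreasing parameter a: 2 at k = 0, 0 at k = max_iter."""
    return 2.0 - 2.0 * k / max_iter

def coefficients(a, dim, rng):
    """Eqs 4-5: A = 2a*r1 - a and C = 2*r2, with r1, r2 uniform in [0, 1)."""
    r1, r2 = rng.random(dim), rng.random(dim)
    return 2.0 * a * r1 - a, 2.0 * r2

def encircle(x, x_best, A, C):
    """Eqs 2-3: D1 = |C*X* - X|; X(i+1) = X* - A*D1 (element-wise)."""
    D1 = np.abs(C * x_best - x)
    return x_best - A * D1
```

As $a$ shrinks, $A$ approaches zero and the update in `encircle` collapses onto the best solution, which is how the algorithm shifts from broad moves to fine exploitation.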

Step 3: Attacking Prey Behaviour: When attacking prey, the humpback whale follows its own characteristic path, attacking targets with a bubble-net movement. The mathematical model is as follows:

$$X(i+1) = D' \cdot e^{bl} \cdot \cos(2\pi l) + X^*(i), \quad \text{if } p \ge 0.5 \tag{6}$$
$$D' = \left| X^*(i) - X(i) \right| \tag{7}$$

where $b$ is a constant defining the shape of the logarithmic spiral (usually $b = 1$), $l$ is a random number in [−1, 1], and $D'$ is the distance between the ith whale and the prey (the best solution obtained so far).
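The spiral move of Eqs 6–7 can be sketched as follows. This is an illustrative Python/NumPy sketch under assumptions: the function name is hypothetical, and a single random $l$ is drawn per update and shared by all dimensions.

```python
import numpy as np

def spiral_update(x, x_best, rng, b=1.0):
    """Eqs 6-7: D' = |X* - X|; X(i+1) = D' * exp(b*l) * cos(2*pi*l) + X*.

    l is drawn uniformly from [-1, 1]; b = 1 by default, as in the text.
    """
    l = rng.uniform(-1.0, 1.0)
    d_prime = np.abs(x_best - x)
    return d_prime * np.exp(b * l) * np.cos(2.0 * np.pi * l) + x_best
```

Since $|e^{bl}\cos(2\pi l)| \le e^{b}$ for $l \in [-1, 1]$, the new position stays within $e^{b}$ times the current distance $D'$ of the best solution, so a whale that already sits on $X^*$ does not move.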

Whether a whale encircles its prey or spirals in to attack it depends on $p$, a random number in [0, 1]: each behaviour is chosen with probability 0.5.

Step 4: Searching Prey Behaviour: To improve the global search ability of the algorithm, whales encircle the prey when |A| < 1, but when |A| > 1 they update their positions with respect to a randomly selected whale instead. This moves the population away from the whale closest to the prey found so far, enabling a global search. The mathematical model is as follows:

$$D_1 = \left| C \cdot X_{\text{rand}}(i) - X(i) \right| \tag{8}$$
$$X(i+1) = X_{\text{rand}}(i) - A \cdot D_1 \tag{9}$$

where $X_{\text{rand}}(i)$ is the position vector of a whale chosen at random from the current population.
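The exploration move of Eqs 8–9 differs from encircling only in its reference point. In this hypothetical Python/NumPy helper, the reference whale is a uniformly chosen row of the current population; the caller is assumed to invoke it only when |A| > 1, as the text prescribes.

```python
import numpy as np

def search_for_prey(x, population, A, C, rng):
    """Eqs 8-9: D1 = |C*X_rand - X|; X(i+1) = X_rand - A*D1.

    X_rand is a row of `population` chosen uniformly at random. Applied
    when |A| > 1, this pushes the whale away from the current best
    solution and favours global search.
    """
    x_rand = population[rng.integers(len(population))]
    D1 = np.abs(C * x_rand - x)
    return x_rand - A * D1
```

Replacing $X^*$ by a random whale is the only change relative to Eqs 2–3, so the same coefficients $A$ and $C$ can be reused for both behaviours.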

Step 5: If p ≥ 0.5, update the position of the current search agent by Eq (6).

Step 6: If i is less than the number of whales, increment i and go to Step 2; otherwise, go to Step 7.

Step 7: If k is less than the maximum number of iterations, increment k, reset i, and go to Step 2.

The algorithm repeats this process until the maximum number of iterations is reached and then returns the best solution found, X*.
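Steps 1–7 can be combined into a compact loop. This is a simplified Python/NumPy sketch under stated assumptions, not the authors’ implementation: A and C are drawn as scalars per whale so that the |A| < 1 test is unambiguous, b = 1, positions are clipped to the bounds, the best solution is updated immediately after each move, and the `init` hook (hypothetical) lets another initializer replace the uniform random Step 1.

```python
import numpy as np

def woa(f, dim, lb, ub, n_whales=30, max_iter=500, seed=0, init=None):
    """Minimise f over [lb, ub]^dim with a basic whale optimization loop.

    Returns (best position, best value, convergence curve of best values).
    """
    rng = np.random.default_rng(seed)
    # Step 1: initial population, uniform in [lb, ub] unless `init` is given.
    if init is None:
        X = lb + (ub - lb) * rng.random((n_whales, dim))
    else:
        X = np.array(init(n_whales, dim, lb, ub, rng), dtype=float)
    fitness = np.array([f(x) for x in X])
    best_f = float(fitness.min())
    best = X[np.argmin(fitness)].copy()
    curve = []
    for k in range(max_iter):                      # Step 7: iteration loop
        a = 2.0 - 2.0 * k / max_iter               # a decreases linearly from 2 to 0
        for i in range(n_whales):                  # Step 6: loop over whales
            A = 2.0 * a * rng.random() - a         # Eq 4 (scalar per whale)
            C = 2.0 * rng.random()                 # Eq 5
            p = rng.random()
            if p < 0.5:
                # Step 2 (|A| < 1): encircle the best whale, Eqs 2-3.
                # Step 4 (|A| >= 1): move relative to a random whale, Eqs 8-9.
                ref = best if abs(A) < 1 else X[rng.integers(n_whales)]
                X[i] = ref - A * np.abs(C * ref - X[i])
            else:
                # Steps 3 and 5: spiral bubble-net attack, Eqs 6-7 with b = 1.
                l = rng.uniform(-1.0, 1.0)
                X[i] = np.abs(best - X[i]) * np.exp(l) * np.cos(2.0 * np.pi * l) + best
            X[i] = np.clip(X[i], lb, ub)
            fx = f(X[i])
            if fx < best_f:
                best_f, best = fx, X[i].copy()
        curve.append(best_f)
    return best, best_f, curve
```

The returned `curve` holds the best value after each iteration, the quantity plotted in convergence graphs; passing `init` changes only the starting population and leaves the update rules untouched.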

The proposed model enhances the initialization phase of WOA, leading to a more efficient optimization process. The following steps show how the values of the decision variables of the offspring are determined with discrete recombination.

Step 1: Initialize upper boundary (ub)

Step 2: Initialize lower boundary (lb)

Step 3: Initialize the dimensionality of the problem (dim).

Step 4: Compute the number of boundaries (Boundary_no) from the size of ub.

Step 5: Set population size parameters λ and μ.

Step 6: Generate a random population (pop) of λ individuals.

Step 7: Initialize an empty matrix ‘pxx’ with dimensions (λ, λ).

Step 8: Loop through each individual in the population.

Step 8.1: Following Eq 1, randomly sample individuals from ‘pop’ without replacement to form a group of ρ parents.

Step 8.2: Stack these individuals into a matrix xm.

Step 8.3: Get dimensions of xm (nrow and ncol).

Step 8.4: For each column of ‘xm’, randomly select a row index, that is, choose which parent supplies that variable.

Step 8.5: Update the corresponding row of ‘pxx’ with the selected values from ‘xm’.

Step 9: If all variables share the same boundaries (the user enters a single number for both ub and lb):

Step 9.1: Loop through each dimension, scale each column of ‘pxx’ by (ub − lb), add lb, and store the result in the matrix ‘Positions’.

Step 10: Otherwise, if each variable has its own lb and ub (Boundary_no is greater than 1):

Step 10.1: Loop through each dimension j, scale column j of ‘pxx’ by (ub_j − lb_j), add lb_j, and store the result in the matrix ‘Positions’.

Step 11: The resulting matrix ‘Positions’ contains the initial population of search agents.

Step 12: Return the Positions matrix as the initialized population.
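The initialization steps above can be sketched in Python/NumPy as follows, under assumptions. To keep shapes consistent with the whale population, this sketch builds an n_agents × dim matrix directly instead of the (λ, λ) matrix ‘pxx’ of Step 7; the function name `reshwoa_init` is hypothetical, and its defaults (a pool of 100 individuals, ρ = 5) mirror the parameter settings reported in Section 3.1.1 rather than a confirmed implementation.

```python
import numpy as np

def reshwoa_init(n_agents, dim, lb, ub, pool_size=100, rho=5, rng=None):
    """Discrete-recombination initialization (sketch of Steps 1-12).

    Returns an (n_agents x dim) 'Positions' matrix inside [lb, ub].
    lb and ub may be scalars (Step 9) or length-dim arrays (Step 10).
    """
    rng = np.random.default_rng() if rng is None else rng
    # Steps 1-4: broadcast scalar bounds to one bound per variable.
    lb = np.broadcast_to(np.asarray(lb, dtype=float), (dim,))
    ub = np.broadcast_to(np.asarray(ub, dtype=float), (dim,))
    # Step 6: random pool of individuals in the unit hypercube.
    pop = rng.random((pool_size, dim))
    pxx = np.empty((n_agents, dim))
    cols = np.arange(dim)
    for r in range(n_agents):                                      # Step 8
        xm = pop[rng.choice(pool_size, size=rho, replace=False)]   # Steps 8.1-8.2
        rows = rng.integers(0, rho, size=dim)                      # Step 8.4
        pxx[r] = xm[rows, cols]                                    # Step 8.5
    # Steps 9-11: scale each column to its own bounds.
    return lb + (ub - lb) * pxx
```

Every entry of `pxx` is copied from the unit-interval pool, so the final scaling alone guarantees that the returned positions respect the bounds, whether they are given as single numbers or per variable.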

3.1 Simulation experiments

We evaluate the performance of RESHWOA on two classes of test functions:

  1. Unconstrained Unimodal Test Function

  2. Unconstrained Multimodal Test Function

3.1.1 Benchmark and experimental setup

Benchmark functions are an essential tool for evaluating the precision, convergence rate, robustness, and overall performance of new algorithms. We therefore selected thirteen benchmark test functions with varied characteristics, modality, and difficulty. The benchmark functions are the same as those used in previous research [69, 70] and are summarized in Tables 1 and 2, where D denotes the dimension of the function, S the range of the variables, and F_min the global optimum value within that range.

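Table 1. Unimodal benchmark functions.
Function name | Function | D | S | F_min
Sphere | $F_1(X) = \sum_{i=1}^{D} X_i^2$ | 30, 100 | $[-100, 100]^D$ | 0
Schwefel’s 2.22 | $F_2(X) = \sum_{i=1}^{D} \lvert X_i \rvert + \prod_{i=1}^{D} \lvert X_i \rvert$ | 30, 100 | $[-10, 10]^D$ | 0
Schwefel’s 1.2 | $F_3(X) = \sum_{i=1}^{D} \left( \sum_{j=1}^{i} X_j \right)^2$ | 30, 100 | $[-100, 100]^D$ | 0
Schwefel’s 2.21 | $F_4(X) = \max_{i} \{ \lvert X_i \rvert, 1 \le i \le D \}$ | 30, 100 | $[-100, 100]^D$ | 0
Rosenbrock | $F_5(X) = \sum_{i=1}^{D-1} \left[ 100 (X_{i+1} - X_i^2)^2 + (X_i - 1)^2 \right]$ | 30, 100 | $[-30, 30]^D$ | 0
Step | $F_6(X) = \sum_{i=1}^{D} (X_i + 0.5)^2$ | 30, 100 | $[-100, 100]^D$ | 0
Quartic Noise | $F_7(X) = \sum_{i=1}^{D} i X_i^4 + \text{random}[0, 1)$ | 30, 100 | $[-1.28, 1.28]^D$ | 0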
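Table 2. Multimodal benchmark functions.
Function name | Function | D | S | F_min
Schwefel’s 2.26 | $F_8(X) = \sum_{i=1}^{D} -X_i \sin\left(\sqrt{\lvert X_i \rvert}\right)$ | 30, 100 | $[-500, 500]^D$ | $-418.9829 \times D$
Rastrigin | $F_9(X) = \sum_{i=1}^{D} \left[ X_i^2 - 10\cos(2\pi X_i) + 10 \right]$ | 30, 100 | $[-5.12, 5.12]^D$ | 0
Ackley | $F_{10}(X) = -20\exp\left(-0.2\sqrt{\frac{1}{D}\sum_{i=1}^{D} X_i^2}\right) - \exp\left(\frac{1}{D}\sum_{i=1}^{D}\cos(2\pi X_i)\right) + 20 + e$ | 30, 100 | $[-32, 32]^D$ | 0
Griewank | $F_{11}(X) = \frac{1}{4000}\sum_{i=1}^{D} X_i^2 - \prod_{i=1}^{D}\cos\left(\frac{X_i}{\sqrt{i}}\right) + 1$ | 30, 100 | $[-600, 600]^D$ | 0
Penalized | $F_{12}(X) = \frac{\pi}{D}\left\{10\sin^2(\pi y_1) + \sum_{i=1}^{D-1}(y_i - 1)^2\left[1 + 10\sin^2(\pi y_{i+1})\right] + (y_D - 1)^2\right\} + \sum_{i=1}^{D} U(X_i, 10, 100, 4)$, with $y_i = 1 + \frac{X_i + 1}{4}$ | 30, 100 | $[-50, 50]^D$ | 0
Generalized Penalized | $F_{13}(X) = 0.1\left\{\sin^2(3\pi X_1) + \sum_{i=1}^{D-1}(X_i - 1)^2\left[1 + \sin^2(3\pi X_{i+1})\right] + (X_D - 1)^2\left[1 + \sin^2(2\pi X_D)\right]\right\} + \sum_{i=1}^{D} U(X_i, 10, 100, 4)$ | 30, 100 | $[-50, 50]^D$ | 0
For F12 and F13: $U(X_i, a, k, m) = k(X_i - a)^m$ if $X_i > a$; $0$ if $-a \le X_i \le a$; $k(-X_i - a)^m$ if $X_i < -a$.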
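Several of the benchmark functions above can be written in a few lines each, which also makes their optima easy to check. This Python/NumPy sketch covers a subset (F1, F5, F8, F9, F10, F11) using the standard textbook definitions; the function names are hypothetical and the code is not taken from the authors’ experiments.

```python
import numpy as np

def sphere(x):          # F1
    return float(np.sum(x ** 2))

def rosenbrock(x):      # F5, the sum runs over i = 1..D-1
    return float(np.sum(100.0 * (x[1:] - x[:-1] ** 2) ** 2 + (x[:-1] - 1.0) ** 2))

def schwefel_226(x):    # F8, minimum about -418.9829 * D near x_i = 420.9687
    return float(np.sum(-x * np.sin(np.sqrt(np.abs(x)))))

def rastrigin(x):       # F9
    return float(np.sum(x ** 2 - 10.0 * np.cos(2.0 * np.pi * x) + 10.0))

def ackley(x):          # F10
    d = x.size
    return float(-20.0 * np.exp(-0.2 * np.sqrt(np.sum(x ** 2) / d))
                 - np.exp(np.sum(np.cos(2.0 * np.pi * x)) / d) + 20.0 + np.e)

def griewank(x):        # F11
    i = np.arange(1, x.size + 1)
    return float(np.sum(x ** 2) / 4000.0 - np.prod(np.cos(x / np.sqrt(i))) + 1.0)
```

Evaluating each function at its known minimizer (the origin for F1, F9, F10, and F11, all ones for F5) returns the F_min listed in the tables, up to floating-point rounding for Ackley.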

Before the simulation study, the parameters of RESHWOA must be set: the parent population size μ, the offspring population size λ, the number of recombining parents ρ, the dimension D, and the coefficients A and C. We set μ = 100, λ = 100, ρ = 5, and D = 30 or 100; A and C are adjusted internally by the algorithm. Each function was evaluated over 30 independent runs of 500 iterations with 30 search agents.

3.1.2 Intensification capability experiment

Unimodal benchmark functions have a single global optimum over their entire domain, so they test an algorithm’s intensification (exploitation) capability. The experiments on the unimodal functions, reported in Table 3, show that the proposed RESHWOA performs best in most cases, although WOA sometimes reaches the global optimum within the same number of iterations. The best values are in bold.

3.1.3 Diversification capability experiment

Multimodal benchmark functions contain multiple local optima and one global optimum. With limited exploration, an insufficient search strategy, or premature convergence, individuals are easily trapped in local optima; to escape, they need diversification capability. Table 4 reports the results of the simulation experiments on the multimodal benchmark functions.

3.1.4 Scalability experiment

Tables 3 and 4 compare the results in both dimensions for all the test functions considered in this article. We analyze the performance of RESHWOA in two parts. In the first part, the 30-dimensional results show that RESHWOA reached the global optimum on two of the thirteen test functions (F9 and F11). On F1, F2, F4, F6, F7, F12, and F13, RESHWOA did not find the optimal solution but came very close to the theoretical optimum; the exceptions are the remaining four functions (F3, F5, F8, and F10). In the second part, as the dimension increases, RESHWOA again outperforms WOA on F1, F2, F4, F6, F7, F9, F12, and F13 (see Table 4), particularly on F9. We calculated the standard deviation over each set of runs; a low standard deviation indicates that the algorithm consistently obtained results close to the global optimum. Judging by its low standard deviations, RESHWOA behaved consistently across the test functions. Tables 3 and 4 also report the average running time of each algorithm.

3.1.5 Acceleration convergence experiment

We examined the convergence behaviour of the algorithms by plotting the best value found at each iteration for dimension thirty. Figs 3-15 show the convergence curves of the two techniques for each of the thirteen benchmark test functions. RESHWOA achieves the fastest and most pronounced convergence on four test functions, namely F3, F6, F8 and F12, where it reaches the best value of the test function more quickly than WOA. Except for F10, all test functions showed poor WOA convergence. On F1, F2, F5, F9 and F11, both algorithms exhibit similar convergence behaviour.
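To make the convergence curves concrete, here is a minimal Python sketch of the standard WOA as described in [2]. It covers encircling the best whale, exploration around a random whale when |A| ≥ 1, and the spiral bubble-net update, with a decreasing linearly from 2 to 0. This is not the authors' RESHWOA: the discrete-recombination initialization and the μ and ρ parameters are not reproduced, and the spiral constant b = 1 is an assumed default. The returned `curve` holds the best value found up to each iteration, which is what a convergence plot shows.

```python
import math
import random


def woa(f, dim, lb, ub, n_whales=30, max_iter=100, b=1.0, seed=0):
    """Minimize f over [lb, ub]^dim with the standard whale optimization algorithm.

    Returns (best_position, best_value, curve), where curve[t] is the best
    objective value found up to iteration t.
    """
    rng = random.Random(seed)
    # Plain uniform initialization (RESHWOA replaces this step; not shown here)
    pop = [[rng.uniform(lb, ub) for _ in range(dim)] for _ in range(n_whales)]
    best_x, best_f = None, math.inf
    curve = []
    for t in range(max_iter):
        # Keep whales inside the bounds, evaluate them and update the leader
        for x in pop:
            for j in range(dim):
                x[j] = min(max(x[j], lb), ub)
            fx = f(x)
            if fx < best_f:
                best_f, best_x = fx, x[:]
        a = 2.0 - 2.0 * t / max_iter  # decreases linearly from 2 towards 0
        for i, x in enumerate(pop):
            A = 2.0 * a * rng.random() - a
            C = 2.0 * rng.random()
            l = rng.uniform(-1.0, 1.0)
            if rng.random() < 0.5:
                # |A| < 1: encircle the best whale (exploitation)
                # |A| >= 1: move relative to a random whale (exploration)
                ref = best_x if abs(A) < 1.0 else pop[rng.randrange(n_whales)]
                pop[i] = [ref[j] - A * abs(C * ref[j] - x[j]) for j in range(dim)]
            else:
                # Bubble-net spiral around the best whale
                spiral = math.exp(b * l) * math.cos(2.0 * math.pi * l)
                pop[i] = [abs(best_x[j] - x[j]) * spiral + best_x[j] for j in range(dim)]
        curve.append(best_f)
    return best_x, best_f, curve


if __name__ == "__main__":
    def sphere(x):
        return sum(v * v for v in x)

    _, best, curve = woa(sphere, dim=30, lb=-100.0, ub=100.0, max_iter=200)
    print(f"best F1 value after {len(curve)} iterations: {best:.3e}")
```

Because `best_f` only ever decreases, the curve is monotonically non-increasing. On a plot, faster convergence therefore appears as a steeper early drop.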

Fig 3. WOA and RESHWOA convergence curves.

Fig 4. WOA and RESHWOA convergence curves.

Fig 5. WOA and RESHWOA convergence curves.

Fig 6. WOA and RESHWOA convergence curves.

Fig 7. WOA and RESHWOA convergence curves.

Fig 8. WOA and RESHWOA convergence curves.

Fig 9. WOA and RESHWOA convergence curves.

Fig 10. WOA and RESHWOA convergence curves.

Fig 11. WOA and RESHWOA convergence curves.

Fig 12. WOA and RESHWOA convergence curves.

Fig 13. WOA and RESHWOA convergence curves.

Fig 14. WOA and RESHWOA convergence curves.

Fig 15. WOA and RESHWOA convergence curves.

4. Algorithm performance on cancer data

A microarray is a laboratory instrument that detects the expression of thousands of genes simultaneously. The main disadvantage of microarray data is the curse of dimensionality, which obscures useful information in a data set and causes computational instability. Selecting relevant genes in microarray data analysis is therefore challenging [21]. In this study, we examined microarray data and the results of the optimization.

4.1 Data sets

We used six datasets, ranging from 2,000 features (Colon, the smallest) to 24,481 features (Breast, the largest); Table 5 gives the details. We ran the experiments in MATLAB 2020 on a Windows 10 platform with an Intel Core i7 processor. We used two different feature selection approaches to improve the model's performance.

Table 5. Data set information and the data reduced by the BC and SNR techniques.

| S. No. | Data set | Classes (samples) | Samples | Total genes | Genes after BC | Genes after SNR | BC ratio | SNR ratio |
|---|---|---|---|---|---|---|---|---|
| 1 | Breast | Relapse (46), non_relapse (51) | 97 | 24481 | 23 | 20 | 0.09% | 0.08% |
| 2 | Carcinoma | Tumour (18), Normal (18) | 36 | 7464 | 62 | 20 | 0.83% | 0.27% |
| 3 | Colon | Tumour (40), Normal (22) | 62 | 2000 | 6 | 20 | 0.30% | 1.0% |
| 4 | CNS | Tumour (21), Normal (39) | 60 | 7129 | 36 | 20 | 0.50% | 0.28% |
| 5 | Ovarian | Normal (91), Cancer (162) | 253 | 15154 | 24 | 20 | 0.16% | 0.13% |
| 6 | Leukemia | ALL (47), AML (25) | 72 | 7129 | 28 | 20 | 0.39% | 0.28% |
| | Average | | | | | | 0.38% | 0.34% |
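The ratios in Table 5 appear to be the number of selected genes divided by the total number of genes, expressed as a percentage. The short check below recomputes them from the table's own counts (for example, 62 / 7464 ≈ 0.83% for Carcinoma) and reproduces the reported averages of 0.38% for BC and 0.34% for SNR.

```python
# (total genes, genes kept by BC, genes kept by SNR), copied from Table 5
TABLE5 = {
    "Breast": (24481, 23, 20),
    "Carcinoma": (7464, 62, 20),
    "Colon": (2000, 6, 20),
    "CNS": (7129, 36, 20),
    "Ovarian": (15154, 24, 20),
    "Leukemia": (7129, 28, 20),
}


def ratio_percent(selected, total):
    """Share of genes retained by a filter, in percent."""
    return 100.0 * selected / total


bc = {name: ratio_percent(b, t) for name, (t, b, s) in TABLE5.items()}
snr = {name: ratio_percent(s, t) for name, (t, b, s) in TABLE5.items()}
avg_bc = sum(bc.values()) / len(bc)
avg_snr = sum(snr.values()) / len(snr)

for name in TABLE5:
    print(f"{name:10s} BC {bc[name]:.2f}%  SNR {snr[name]:.2f}%")
print(f"{'Average':10s} BC {avg_bc:.2f}%  SNR {avg_snr:.2f}%")
```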

4.1.1 Filtration techniques and reduced data sets

Before applying the proposed method, we apply two data reduction techniques. These techniques are good at discarding redundant genes because they use information from both normal and cancer samples.

4.1.1.1 Signal-to-noise ratio (SNR). The signal-to-noise ratio test identifies genes whose expression differs significantly between the two groups, measuring the difference in means relative to the variability within each group [62]. We compute the SNR test statistic for each gene from its expression levels and select the top-ranked genes. The formula is given below.

$$\mathrm{SNR} = \frac{\mu_1 - \mu_2}{\sigma_1 + \sigma_2} \qquad (10)$$

where μ1 and μ2 are the mean expression values of the gene in sample classes 1 and 2, respectively, and σ1 and σ2 are the standard deviations within each class.

To discover differentially expressed genes with SNR, we perform the following steps.

Step 1: Normalize the data.

Step 2: Separate the two groups for normal and disease data.

Step 3: Evaluate the signal-to-noise ratio.

Step 4: Sort the data in ascending order.

Step 5: Select the top twenty genes.

Step 6: Feed the selected genes to the SVM.

Step 7: Use the MSE as the objective function of the proposed method.
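The SNR steps can be sketched in Python with NumPy. This illustrates the procedure under stated assumptions and is not the authors' MATLAB code. Step 1 is assumed to be a per-gene z-score normalization, and labels are assumed to be encoded as 1 and 0. The top twenty genes are assumed to be the ones with the largest SNR magnitude. The helper names are hypothetical. Eq (10) is unchanged by a positive per-gene rescaling and shift, so the normalization step does not alter the ranking here.

```python
import numpy as np


def snr_scores(X, y, eps=1e-12):
    """Per-gene SNR, Eq (10). X: samples x genes; y: labels, 1 = class 1, 0 = class 2."""
    X1, X2 = X[y == 1], X[y == 0]
    mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
    s1, s2 = X1.std(axis=0, ddof=1), X2.std(axis=0, ddof=1)
    return (mu1 - mu2) / (s1 + s2 + eps)  # eps guards against constant genes


def select_top_snr(X, y, k=20):
    """Steps 1-5: normalize, score each gene, rank by |SNR|, keep the top k gene indices."""
    Xn = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)  # Step 1 (assumed z-score)
    scores = snr_scores(Xn, y)                            # Steps 2-3
    order = np.argsort(-np.abs(scores))                   # Step 4 (largest |SNR| first)
    return order[:k], scores                              # Step 5
```

The returned indices select the columns passed to the SVM in Step 6, for example `X[:, idx]`.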

4.1.1.2 Bhattacharyya distance (BC). The Bhattacharyya distance measures the similarity between two probability distributions; the larger the distance, the smaller the overlap between them [71]. Few previous studies have used the BC distance for feature selection, and in our proposed method it proved useful for choosing an optimal set of features [72].

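$$B = \frac{1}{8}(\mu_i - \mu_j)^{T}\left(\frac{\sigma_i + \sigma_j}{2}\right)^{-1}(\mu_i - \mu_j) + \frac{1}{2}\ln\frac{\left|\frac{\sigma_i + \sigma_j}{2}\right|}{\sqrt{|\sigma_i|\,|\sigma_j|}} \qquad (11)$$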

where μi and σi are the mean and variance of the gene in the cancer samples, and μj and σj are the mean and variance of the gene in the normal tissue samples. The greater the distance, the stronger the association between the gene and cancer. The steps for Bhattacharyya distance selection are as follows.

Step 1: Start with the set of genes S = {F1, F2, F3, …, Fn}.

Step 2: Compute the Bhattacharyya distance (BC) of each gene.

Step 3: Sort the distances in descending order (i.e., genes with maximum dissimilarity first).

Step 4: Determine the threshold value by trial and error.

Step 5: Select the genes whose BC exceeds the threshold.

Step 6: The subset of informative genes is {F1, F2, F3, …, Fs}.
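The BC steps can be sketched the same way. The sketch treats each gene as a univariate Gaussian, so σ in Eq (11) is a variance and the matrix terms reduce to scalars. A small constant is added to the variances so that constant genes do not cause division by zero. The threshold of 1.0 in the test is purely illustrative, since the paper chose its threshold by trial and error. The helper names are hypothetical.

```python
import numpy as np


def bhattacharyya_distance(X, y, eps=1e-12):
    """Per-gene BC distance, Eq (11), univariate Gaussian case.

    X: samples x genes; y: 1 = cancer samples (i), 0 = normal samples (j).
    """
    Xi, Xj = X[y == 1], X[y == 0]
    mu_i, mu_j = Xi.mean(axis=0), Xj.mean(axis=0)
    var_i = Xi.var(axis=0, ddof=1) + eps
    var_j = Xj.var(axis=0, ddof=1) + eps
    avg_var = (var_i + var_j) / 2.0
    mean_term = 0.125 * (mu_i - mu_j) ** 2 / avg_var
    var_term = 0.5 * np.log(avg_var / np.sqrt(var_i * var_j))
    return mean_term + var_term


def select_by_threshold(X, y, threshold):
    """Steps 2-6: score genes, sort by descending distance, keep those above threshold."""
    d = bhattacharyya_distance(X, y)
    order = np.argsort(-d)                 # Step 3: most dissimilar genes first
    keep = order[d[order] > threshold]     # Step 5: BC > threshold
    return keep, d
```

Unlike SNR, the second term also rewards genes whose variance differs between the classes, even when their means are similar.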

We identified marker genes using the BC and SNR techniques; Table 5 lists the number of genes each technique selected. We determined the BC threshold by trial and error, whereas with SNR we kept the top twenty genes. Because both methods select only a small subset of informative genes, we used the reduced datasets in all the experiments below, which allows a rigorous analysis with a manageable number of features.

4.1.2 Classifier and validation

4.1.2.1 Support vector machine

The support vector machine is well known for its high generalization capability and robustness on high-dimensional data [1]. Our SVM models use the radial basis function (RBF) kernel, which has two parameters, the penalty term C and the kernel parameter γ; both directly affect classification performance [31]. Optimizing these parameters is crucial for better classification accuracy, but the process can be challenging and costly, and a poor choice may compromise the reliability of the results. Although SVMs can perform satisfactorily with their default parameters, tuning the parameters can significantly improve classification accuracy [73]. The lower and upper limits of C are 0.0001 and 100, while those of γ are 0.001 and 50, respectively; we chose these bounds by trial and error. After initialization, we evaluate each candidate solution in the search space with an objective function. This study uses the mean squared error (MSE) as the objective function.

The objective function identifies the individual in the population with the lowest loss value, which is regarded as the best performer among all individuals. The optimization algorithm minimizes this objective to find the optimal values of C and γ. We then assessed the proposed method and WOA using MSE as the evaluation metric. Cross-validation is a technique for reducing overfitting; to choose C and γ, we used the holdout method, splitting the data 70:30 into training and testing sets. We train the model on one subset and evaluate its predictions on the other.

We apply the SVM method with the following presets.

  1. We use the radial basis function (RBF) kernel.

  2. We split the data into training and testing using holdout.

  3. We optimize the kernel scale and box constraints.
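The SVM presets (RBF kernel, holdout split, tuning of two hyperparameters) can be combined into the objective function that WOA or RESHWOA minimizes. The sketch below uses scikit-learn's `SVC`, whose RBF kernel is exp(−γ‖x − x′‖²). The authors worked in MATLAB, where the kernel is parameterized through a kernel scale and a box constraint, so numeric values may not transfer one-to-one. The 70:30 split is drawn once and reused, so every candidate (C, γ) is scored on the same test set. The helper name `make_objective` and the stratified, seeded split are assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC


def make_objective(X, y, test_size=0.3, seed=0):
    """Return objective(params) -> holdout MSE of an RBF SVM with params = (C, gamma)."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=test_size, random_state=seed, stratify=y)

    def objective(params):
        C, gamma = params
        model = SVC(kernel="rbf", C=C, gamma=gamma).fit(X_tr, y_tr)
        y_hat = model.predict(X_te)
        return float(np.mean((y_te - y_hat) ** 2))  # MSE on the held-out 30%

    return objective
```

An optimizer then treats each whale position as a candidate `(C, gamma)` pair and calls `objective(position)` as its fitness.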

4.1.2.2 Validation. To assess predictive accuracy, we use the microarray cancer data set $S = \{(X_i, Y_i) \mid i \in \{1, \dots, N\}\}$, in which each sample carries a binary class label (diseased or non-diseased). We divided the data into two disjoint subsets for training and testing, i.e., $S_{\text{train}} \cup S_{\text{test}} = S$ and $S_{\text{train}} \cap S_{\text{test}} = \emptyset$. We fitted an SVM model to the measured responses in the training subset $S_{\text{train}}$ and used it to predict the unknown responses in the test subset $S_{\text{test}}$.
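The holdout protocol amounts to a simple index split. The sketch assumes a random 70:30 split with a fixed seed, because the paper does not state whether its split was seeded or stratified. The two checks correspond to S_train ∪ S_test = S and S_train ∩ S_test = ∅. For the 62-sample Colon data set, this gives 43 training and 19 test samples.

```python
import random


def holdout_split(n, train_frac=0.7, seed=0):
    """Shuffle indices 0..n-1 and split them into disjoint train/test lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = round(train_frac * n)
    train, test = idx[:cut], idx[cut:]
    # Invariants of the validation scheme
    assert set(train) | set(test) == set(range(n))  # union covers S
    assert not set(train) & set(test)               # no overlap
    return train, test


train, test = holdout_split(62)  # e.g. the Colon data set has 62 samples
print(len(train), len(test))
```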

Collecting the predictions on the disjoint test set gives out-of-sample predictions for the original data set S. Two aspects are critical for model validation: discrimination and calibration. Discrimination concerns only the classification, whereas calibration concerns both the classification and the accuracy. Discrimination assesses the model's ability to distinguish between high- and low-risk individuals without considering the absolute values of the predictions. Calibration, by contrast, quantifies how closely the predicted outcomes or ratings match the observed outcomes. In our classification task, we want the predictions of the SVM with the selected parameters to be as accurate as possible, so we require well-calibrated models. The mean squared error (MSE), normalized by the size of the test set, measures both discrimination and calibration, and we use it to quantify and report the model's accuracy. We confirmed that the model produced consistent results across all datasets.

$$\mathrm{MSE}(S_{\text{test}}) = \frac{1}{|S_{\text{test}}|}\sum_{i \in S_{\text{test}}}\left(y_i - \hat{y}_i\right)^2$$
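As a worked example of the MSE formula: when labels are encoded as 0 and 1, each misclassified sample contributes exactly 1 to the sum, so the MSE equals the test-set error rate. The helper below is a plain-Python sketch.

```python
def mse(y_true, y_pred):
    """MSE over the test set: mean of squared residuals."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    return sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred)) / len(y_true)


y_true = [1, 0, 1, 1, 0]
y_pred = [1, 1, 1, 0, 0]
print(mse(y_true, y_pred))  # 2 errors out of 5 -> 0.4
```

With binary labels, an MSE of 0 (as RESHWOA reaches on several BC-reduced datasets) therefore means that every test sample was classified correctly.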

4.1.2.3 Feature encoding for SVM training. We used a Support Vector Machine (SVM) classifier for the classification tasks. To prepare the data for SVM training, we implemented a feature encoding process that converts the categorical class labels into numerical values suitable for SVM training. The encoding steps are listed below; Fig 16 shows how the encoded features are used in the proposed algorithm.

Fig 16. Flow diagram of the proposed algorithm.


  1. Loading the Data: We load the data from the ‘file_name.xlsx’ Excel file using the ‘readtable’ function.

  2. Defining Categorical Labels and Numerical Values: We define the categorical labels in the ‘k’ array as [“Tumour”, “Normal”], and we define their corresponding numerical values in the ‘l’ array as [1, 0]. These arrays establish a mapping between the categorical labels and their numerical representations.

  3. Encoding Features: The loop iterates through the ‘y’ column of the data (assumed to contain the categorical labels). For each label in the ‘y’ column, it checks if it matches either “Tumour” or “Normal” (as defined in ‘k’). If we find a match, it assigns the corresponding numerical value (1 or 0) to the ‘number’ array. This process encodes the categorical labels as numerical values.

  4. Adding Encoded Features to the Data: We add the encoded labels to the data as a new column named ‘category_encoded’. This column contains the numerical representations of the categorical labels.

By following these steps, we transform the categorical labels in the ‘y’ column of the data into a numerical format suitable for training the SVM model, which allows the SVM model to work with the data effectively.
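Here is a Python equivalent of the encoding steps, as a sketch. The list of dictionaries stands in for the table returned by MATLAB's `readtable`, and `LABEL_MAP` mirrors the `k` and `l` arrays. Both names are illustrative. This version raises an error on any label outside the mapping, a deliberate design choice. Other datasets in Table 5 use labels such as Relapse/non_relapse or ALL/AML and need their own mapping.

```python
LABEL_MAP = {"Tumour": 1, "Normal": 0}  # mirrors k = ["Tumour", "Normal"], l = [1, 0]


def encode_labels(labels, mapping=LABEL_MAP):
    """Map each categorical label to its numeric code; fail loudly on unknown labels."""
    encoded = []
    for label in labels:
        key = label.strip()
        if key not in mapping:
            raise ValueError(f"unexpected class label: {label!r}")
        encoded.append(mapping[key])
    return encoded


# Hypothetical rows standing in for the loaded table
rows = [{"y": "Tumour"}, {"y": "Normal"}, {"y": "Tumour"}]
codes = encode_labels(r["y"] for r in rows)
for row, code in zip(rows, codes):
    row["category_encoded"] = code  # new column, as in step 4
```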

4.2 SVM Parameter optimization using RESHWOA and WOA

To optimize the SVM parameters C and γ, we used the standard WOA and the proposed RESHWOA with lower bounds lb = [0.0001 0.1], upper bounds ub = [10 100], and runs = 30.
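The bounds and the run count define the search space. The sketch below assumes that lb and ub list the bounds in the order (C, γ), which is consistent with the parameter values reported in Table 7. Positions are initialized uniformly inside the box and are clamped back into it after each update, then decoded into SVM hyperparameters. The helper names are illustrative and not taken from the authors' code.

```python
import random

# Search space for the two SVM hyperparameters, in the assumed order (C, gamma)
LB = [0.0001, 0.1]
UB = [10.0, 100.0]
RUNS = 30  # independent runs per algorithm and data set


def random_position(rng, lb=LB, ub=UB):
    """Uniform initial whale position inside the box [lb, ub]."""
    return [rng.uniform(lo, hi) for lo, hi in zip(lb, ub)]


def clamp(position, lb=LB, ub=UB):
    """Project a position back into [lb, ub], dimension by dimension."""
    return [min(max(v, lo), hi) for v, lo, hi in zip(position, lb, ub)]


def decode(position):
    """Turn a (clamped) whale position into named SVM hyperparameters."""
    C, gamma = clamp(position)
    return {"C": C, "gamma": gamma}
```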

4.2.1 SNR results of RESHWOA and WOA

As Table 6 shows, the proposed model achieved a lower average MSE than WOA on all datasets. It also achieved the lowest standard deviation on all datasets except two, Carcinoma and Colon, on which WOA performed well. RESHWOA outperformed WOA in terms of minimum score on all datasets except Colon and Breast. Overall, the proposed method markedly improved performance on the SNR-reduced data. This experiment demonstrates that RESHWOA consistently finds a near-optimal parameter combination with minimum MSE in the given range. The best values are highlighted in bold.

4.2.2 BC results of RESHWOA and WOA

In the previous experiment, RESHWOA with SNR-reduced data did not achieve the lowest minimum MSE on the Colon and Breast datasets. In contrast, with BC, RESHWOA attains a minimum MSE of zero on every dataset except CNS, where its minimum MSE is about 0.56% (0.0056497). The proposed model also achieved the lowest standard deviation on all datasets except Carcinoma and Leukemia, on which WOA performed best. Overall, the proposed method yielded the parameter combination with the lowest MSE on most datasets. Table 7 displays the efficiency of the proposed algorithm.

Table 7. BC results of RESHWOA and WOA with thirty runs and fifty iterations.
| S. No. | Data set | RESHWOA (C, γ) at min MSE | RESHWOA min MSE | RESHWOA avg. MSE | RESHWOA avg. Std | WOA (C, γ) at min MSE | WOA min MSE | WOA avg. MSE | WOA avg. Std |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Carcinoma | 6.7891, 100 | 0 | 3.7665e-04 | 0.0014 | 9.267, 11.566 | 0 | 0 | 0 |
| 2 | CNS | 10, 100 | 0.0056497 | 7.5330e-04 | 0.0020 | 4.422, 24.812 | 0.11905 | 0.1587 | 0.0201 |
| 3 | Colon | 10, 100 | 0 | 7.5330e-04 | 0.0020 | 9.498, 92.920 | 0.046512 | 0.1023 | 0.0233 |
| 4 | Breast | 10, 34.157 | 0 | 3.7665e-04 | 0.0014 | 0.169, 63.283 | 0.37313 | 0.4159 | 0.0217 |
| 5 | Leukemia | 2.7157, 56.666 | 0 | 3.7665e-04 | 0.0014 | 6.748, 49.153 | 0 | 0 | 0 |
| 6 | Ovarian | 2.2702, 40.578 | 0 | 9.4162e-04 | 0.0021 | 4.296, 71.344 | 0 | 0.0083 | 0.0038 |

Table 8 summarizes previous work on SVM parameter optimization, in which both the optimization techniques and the SVM kernel functions vary [32, 74–76]. All these kernels are acceptable, because each is the best option for particular problems. Table 8 lists fourteen publicly available datasets from published studies, together with the optimization techniques applied to each. To our knowledge, no existing literature currently uses WOA as a parameter optimization technique. Using an improved version of the WOA, we optimized the SVM parameters on data reduced by the BC and SNR techniques. The results show that RESHWOA improves the minimum error over the existing method, which is a significant achievement. Fig 17 compares the performance of BC and SNR: both techniques gave good results, but BC gave better results than SNR.

Table 8. Typical datasets and parameter optimization techniques with a specific kernel.
| Datasets (high-dimensional) | Optimization techniques | SVM kernel | Main finding | Authors |
|---|---|---|---|---|
| Leukemia, Embryonal Tumours, Dexter, Internet ads, Madelon, Musk, Spambase, SPECTF Heart, Intrusion | Grid search, GA | Linear, RBF, Polynomial, Sigmoid | GA has proven to be more stable than grid search. | Iwan Syarif (2016) |
| Acute leukemia, Breast cancer | GA | Gaussian | Performs well in gene selection and achieves high classification accuracy with a small number of genes. | Mao Yong (2005) |
| GT data | ACO, GA, ICA, PSO | RBF | Improved all five accuracy criteria from a confusion matrix by at least 5% compared with SVM. | Elahe Tamimi (2017) |
| PIMA, WDBC | PSO, DE, HS, ABC, TLBO | RBF | TLBO outperforms various SVM model selection methods proposed in the literature, particularly the well-known Bayesian method. | Ghnimi (2020) |
Fig 17. Accuracy of SVM with SNR and BC for different data sets.


5. Conclusion

This paper proposes RESHWOA, a hybrid of the whale optimization algorithm (WOA) and a discrete recombinative (DR) evolutionary strategy. RESHWOA generates a more diverse random population and tends toward the global optimum, which enhances its global search capability. It draws inspiration from the information-exchange phase of the DR strategy and introduces two control parameters, μ and ρ, which play a crucial role in guiding the algorithm's behaviour. We evaluated RESHWOA on thirteen benchmark test functions and compared the results with those of the original WOA. The results show that RESHWOA has better global exploration capability and higher convergence accuracy. Furthermore, in the SVM parameter optimization experiments using SNR and BC on microarray cancer data, RESHWOA performed best on five of the six datasets, offering a near-optimal parameter combination with minimum MSE in the given range. Although RESHWOA is a hybrid of existing components, the detailed simulation experiments in this paper verified its better performance. The iterative mechanism effectively enhances the randomness of the control parameter. The approach can be applied in various fields and may have significant implications for future research.
Building on these promising findings, several directions emerge for future investigation: (i) integration with other metaheuristic algorithms, (ii) further exploration of the DR strategy, (iii) parameter tuning and sensitivity analysis of the DR strategy, and (iv) ways to readily hybridize existing algorithms to achieve fast convergence, low MSE, stability, and steadiness.

Supporting information

S1 Fig. Description of evolutionary strategy.

(TIF)


Acknowledgments

The authors are grateful to the editor and anonymous reviewers for their constructive comments and valuable suggestions, which not only improved the quality of the paper but added value to it.

Data Availability

The data that support the findings of this study are openly available. The Breast, Colon, Central Nervous System (CNS), Ovarian, and Leukemia datasets were downloaded from https://csse.szu.edu.cn/staff/zhuzx/datasets.html, while the Carcinoma tumour dataset comes from the Princeton University gene expression project at http://genomics-pubs.princeton.edu/oncology.

Funding Statement

The authors received no specific funding for this work.

References

  • 1.Hameed SS, Hassan WH, Latiff LA, Muhammadsharif FF. A comparative study of nature-inspired metaheuristic algorithms using a three-phase hybrid approach for gene selection and classification in high-dimensional cancer datasets. Soft Comput. 2021;25: 8683–8701. doi: 10.1007/s00500-021-05726-0 [DOI] [Google Scholar]
  • 2.Mirjalili S, Lewis A. The Whale Optimization Algorithm. Adv Eng Softw. 2016;95: 51–67. doi: 10.1016/j.advengsoft.2016.01.008 [DOI] [Google Scholar]
  • 3.Holland JH. Genetic algorithms. Sci Am. 1992;267: 66–73. doi: org/10.1038/scientificamerican0792-661411454 [Google Scholar]
  • 4.Koza JR, Bennett FH, Andre D, Keane MA. Genetic programming III: Darwinian invention and problem solving [Book Review]. IEEE Trans Evol Comput. 2005;3: 251–253. doi: 10.1109/tevc.1999.788530 [DOI] [Google Scholar]
  • 5.Bangert P. Optimization: Simulated Annealing. Optim Ind Probl. 2012;220: 165–200. doi: 10.1007/978-3-642-24974-7_7 [DOI] [Google Scholar]
  • 6.Deb Kalyanmoy (2001). Multi- objective Optimization using Evolutionary Algorithms. Suparyanto dan Rosad. John Wiley & Sons; 2001. [Google Scholar]
  • 7.Mostafa A, Hassanien AE, Houseni M, Hefny H. Liver segmentation in MRI images based on whale optimization algorithm. Multimed Tools Appl. 2017;76: 24931–24954. doi: 10.1007/S11042-017-4638-5 [DOI] [Google Scholar]
  • 8.Karlekar NP, Gomathi N. OW-SVM: Ontology and whale optimization-based support vector machine for privacy-preserved medical data classification in cloud. Int J Commun Syst. 2018;31. doi: 10.1002/DAC.3700 [DOI] [Google Scholar]
  • 9.Hassan G, Hassanien AE. Retinal fundus vasculature multilevel segmentation using whale optimization algorithm. Signal, Image Video Process. 2018;12: 263–270. doi: 10.1007/s11760-017-1154-z [DOI] [Google Scholar]
  • 10.Aziz MA El, Ewees AA, Hassanien AE. Whale Optimization Algorithm and Moth-Flame Optimization for multilevel thresholding image segmentation. Expert Syst Appl. 2017;83: 242–256. doi: 10.1016/j.eswa.2017.04.023 [DOI] [Google Scholar]
  • 11.Khadanga RK, Padhy S, Panda S, Kumar A. Design and analysis of multi-stage PID controller for frequency control in an islanded micro-grid using a novel hybrid whale optimization-pattern search algorithm. Int J Numer Model Electron Networks, Devices Fields. 2018;31. doi: 10.1002/JNM.2349 [DOI] [Google Scholar]
  • 12.Sahu PR, Hota PK, Panda S. Modified whale optimization algorithm for fractional-order multi-input SSSC-based controller design. Optim Control Appl Methods. 2018;39: 1802–1817. doi: 10.1002/OCA.2443 [DOI] [Google Scholar]
  • 13.Ismail Sayed G, Darwish A, Ella Hassanien A. A New Chaotic Whale Optimization Algorithm for Features Selection. J Classif. 2018;35: 300. doi: 10.1007/s00357-018-9261-2 [DOI] [Google Scholar]
  • 14.Yan Z, Sha J, Liu B, Tian W, Lu J. An ameliorative whale optimization algorithm for multi-objective optimal allocation ofwater resources in Handan, China. Water (Switzerland). 2018;10. doi: 10.3390/W10010087 [DOI] [Google Scholar]
  • 15.Fogel DB. An Introduction to Simulated Evolutionary Optimization. IEEE Trans Neural Networks. 1994;5: 3–14. doi: 10.1109/72.265956 [DOI] [PubMed] [Google Scholar]
  • 16.Bäck T, Schwefel H-P. An Overview of Evolutionary Algorithms for Parameter Optimization. Evol Comput. 1993;1: 1–23. doi: 10.1162/evco.1993.1.1.1 [DOI] [Google Scholar]
  • 17.Bäck T. Evolutionary algorithms in theory and practice. Oxford University Press; 1996. [Google Scholar]
  • 18.Yao X. An Overview of Evolutionary Computation. Evol Comput Model Optim. 2006;667: 1–31. doi: 10.1007/0-387-31909-3_1 [DOI] [Google Scholar]
  • 19.Ruan J, Jahid MJ, Gu F, Lei C, Huang YW, Hsu YT, et al. A novel algorithm for network-based prediction of cancer recurrence. Genomics. 2019;111: 17–23. doi: 10.1016/j.ygeno.2016.07.005 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20.Fan S, Huang K, Ai R, Wang M, Wang W. Predicting CpG methylation levels by integrating Infinium HumanMethylation450 BeadChip array data. Genomics. 2016;107: 132–137. doi: 10.1016/j.ygeno.2016.02.005 [DOI] [PubMed] [Google Scholar]
  • 21.Mohammadi M, Sharifi Noghabi H, Abed Hodtani G, Rajabi Mashhadi H. Robust and stable gene selection via Maximum-Minimum Correntropy Criterion. Genomics. 2016;107: 83–87. doi: 10.1016/j.ygeno.2015.12.006 [DOI] [PubMed] [Google Scholar]
  • 22.Bhandari V, Boutros PC. Comparing continuous and discrete analyses of breast cancer survival information. Genomics. 2016;108: 78–83. doi: 10.1016/j.ygeno.2016.06.002 [DOI] [PubMed] [Google Scholar]
  • 23.Alireza O, Shadgar B. Classification and diagnostic prediction of cancer using microarray gene expression.pdf. J Appl Sceinces. 2009;9: 459–468. doi: 10.3923/jas.2009.459.468 [DOI] [Google Scholar]
  • 24.Vapnik VN, Chervonenkis AY. On the Uniform Convergence of Relative Frequencies of Events to Their Probabilities. Measures of Complexity. Cham: Springer International Publishing; 2015. pp. 11–30. doi: 10.1007/978-3-319-21852-6_3 [DOI] [Google Scholar]
  • 25.Anton SDD, Sinha S, Dieter Schotten H. Anomaly-based intrusion detection in industrial data with SVM and random forests. 2019 27th Int Conf Software, Telecommun Comput Networks, SoftCOM 2019. 2019; 1–6. doi: 10.23919/SOFTCOM.2019.8903672 [DOI] [Google Scholar]
  • 26.Jalal D, Ezzedine T. Toward a smart real time monitoring system for drinking water based on machine learning. 2019 27th Int Conf Software, Telecommun Comput Networks, SoftCOM 2019. 2019; 1–5. doi: 10.23919/SOFTCOM.2019.8903866 [DOI] [Google Scholar]
  • 27.Gold C, Sollich P. Model selection for support vector machine classification. Neurocomputing. 2003;55: 221–249. doi: 10.1016/S0925-2312(03)00375-8 [DOI] [Google Scholar]
  • 28.Duarte E, Wainer J. Empirical comparison of cross-validation and internal metrics for tuning SVM hyperparameters. Pattern Recognit Lett. 2017;88: 6–11. doi: 10.1016/j.patrec.2017.01.007 [DOI] [Google Scholar]
  • 29.Aparna M, Radha D. Detection of weed using visual attention model and SVM classifier. Lecture Notes in Computational Vision and Biomechanics. Springer International Publishing; 2019. doi: 10.1007/978-3-030-00665-5_25 [DOI] [Google Scholar]
  • 30.Coluccia A, Fascista A, Ricci G. Spectrum sensing by higher-order SVM-based detection. Eur Signal Process Conf. 2019;2019-Septe. doi: 10.23919/EUSIPCO.2019.8903028 [DOI] [Google Scholar]
  • 31.Vinge R, McKelvey T. Understanding support vector machines with polynomial kernels. Eur Signal Process Conf. 2019;2019-Septe. doi: 10.23919/EUSIPCO.2019.8903042 [DOI] [Google Scholar]
  • 32.Mao Y, Zhou XB, Pi DY, Sun YX, Wong STC. Parameters selection in gene selection using Gaussian kernel support vector machines by genetic algorithm. J Zhejiang Univ Sci. 2005;6 B: 961–973. doi: 10.1631/jzus.2005.B0961 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33.Shah MH, Dang X. Novel Feature Selection Method Using Bhattacharyya DIstance for Neural Networks Based Automatic Modulation Classification. IEEE Signal Process Lett. 2020;27: 106–110. doi: 10.1109/lsp.2019.2957924 [DOI] [Google Scholar]
  • 34.Zebari R, Abdulazeez A, Zeebaree D, Zebari D, Saeed J. A Comprehensive Review of Dimensionality Reduction Techniques for Feature Selection and Feature Extraction. J Appl Sci Technol Trends. 2020;1: 56–70. doi: 10.38094/jastt1224 [DOI] [Google Scholar]
  • 35.Mohammed MS, Rachapudy PS, Kasa M. Big data classification with optimization driven MapReduce framework. Int J Knowledge-Based Intell Eng Syst. 2021;25: 173–183. doi: 10.3233/KES-210062 [DOI] [Google Scholar]
  • 36.B N, V I. Enhanced machine learning based feature subset through FFS enabled classification for cervical cancer diagnosis. Int J Knowledge-based Intell Eng Syst. 2022;26: 79–89. doi: 10.3233/KES-220009 [DOI] [Google Scholar]
  • 37.Trozzi F, Wang X, Tao P. UMAP as a Dimensionality Reduction Tool for Molecular Dynamics Simulations of Biomacromolecules: A Comparison Study. J Phys Chem B. 2021;125: 5022–5034. doi: 10.1021/acs.jpcb.1c02081 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38.Mohd Ali N, Besar R, Nor NA. Hybrid Feature Selection of Breast Cancer Gene Expression Microarray Data Based on Metaheuristic Methods: A Comprehensive Review. Symmetry (Basel). 2022;14: 1955. doi: 10.3390/sym14101955 [DOI] [Google Scholar]
  • 39.Ahmad Zamri N, Nor NA, Bhuvaneswari T, Abdul Aziz NH, Ghazali AK. Feature Selection of Microarray Data Using Simulated Kalman Filter with Mutation. Processes. 2023;11: 2409–2418. doi: 10.3390/pr11082409 [DOI] [Google Scholar]
  • 40.García-Torres M, Ruiz R, Divina F. Evolutionary feature selection on high dimensional data using a search space reduction approach. Eng Appl Artif Intell. 2023;117: 105556. doi: 10.1016/j.engappai.2022.105556 [DOI] [Google Scholar]
  • 41.Dokeroglu T, Deniz A, Kiziloz HE. A comprehensive survey on recent metaheuristics for feature selection. Neurocomputing. 2022;494: 269–296. doi: 10.1016/j.neucom.2022.04.083 [DOI] [Google Scholar]
  • 42.Brahim Belhaouari S, Shakeel MB, Erbad A, Oflaz Z, Kassoul K. Bird’s Eye View feature selection for high-dimensional data. Sci Rep. 2023;13: 13303. doi: 10.1038/s41598-023-39790-3 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 43.Yaqoob A, Aziz RM, Verma NK, Lalwani P, Makrariya A, Kumar P. A Review on Nature-Inspired Algorithms for Cancer Disease Prediction and Classification. Mathematics. 2023;11. doi: 10.3390/math11051081 [DOI] [Google Scholar]
  • 44.Kaur S, Kumar Y, Koul A, Kumar Kamboj S. A Systematic Review on Metaheuristic Optimization Techniques for Feature Selections in Disease Diagnosis: Open Issues and Challenges. Archives of Computational Methods in Engineering. Springer Netherlands; 2023. doi: 10.1007/s11831-022-09853-1 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 45.Anowar F, Sadaoui S, Selim B. Conceptual and empirical comparison of dimensionality reduction algorithms (PCA, KPCA, LDA, MDS, SVD, LLE, ISOMAP, LE, ICA, t-SNE). Comput Sci Rev. 2021;40: 100378. doi: 10.1016/j.cosrev.2021.100378 [DOI] [Google Scholar]
  • 46.Flexa C, Gomes W, Moreira I, Alves R, Sales C. Polygonal Coordinate System: Visualizing high-dimensional data using geometric DR, and a deterministic version of t-SNE. Expert Syst Appl. 2021;175: 114741. doi: 10.1016/j.eswa.2021.114741 [DOI] [Google Scholar]
  • 47.Xiang R, Wang W, Yang L, Wang S, Xu C, Chen X. A Comparison for Dimensionality Reduction Methods of Single-Cell RNA-seq Data. Front Genet. 2021;12: 1–12. doi: 10.3389/fgene.2021.646936 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 48.Jia W, Sun M, Lian J, Hou S. Feature dimensionality reduction: a review. Complex Intell Syst. 2022;8: 2663–2693. doi: 10.1007/s40747-021-00637-x [DOI] [Google Scholar]
  • 49.Ahmad Z, Li J, Mahmood T. Adaptive Hyperparameter Fine-Tuning for Boosting the Robustness and Quality of the Particle Swarm Optimization Algorithm for Non-Linear RBF Neural Network Modelling and Its Applications. Mathematics. 2023;11: 1–16. doi: 10.3390/math11010242 [DOI] [Google Scholar]
  • 50.Abbas F, Zhang F, Ismail M, Khan G, Iqbal J, Alrefaei AF, et al. Optimizing Machine Learning Algorithms for Landslide Susceptibility Mapping along the Karakoram Highway, Gilgit Baltistan, Pakistan: A Comparative Study of Baseline, Bayesian, and Metaheuristic Hyperparameter Optimization Techniques. Sensors. 2023;23. doi: 10.3390/s23156843 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 51.Shahsavari M, Mohammadi V, Alizadeh B, Alizadeh H. Application of machine learning algorithms and feature selection in rapeseed (Brassica napus L.) breeding for seed yield. Plant Methods. 2023;19: 1–22. doi: 10.1186/s13007-023-01035-9 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 52.Calesella F, Testolin A, De Filippo De Grazia M, Zorzi M. A comparison of feature extraction methods for prediction of neuropsychological scores from functional connectivity data of stroke patients. Brain Informatics. 2021;8: 1–13. doi: 10.1186/s40708-021-00129-1 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 53. Kim SH, Boukouvala F. Machine learning-based surrogate modeling for data-driven optimization: a comparison of subset selection for regression techniques. Optim Lett. 2020;14: 989–1010. doi: 10.1007/s11590-019-01428-7
  • 54. Cassan O, Lèbre S, Martin A. Inferring and analyzing gene regulatory networks from multi-factorial expression data: a complete and interactive suite. BMC Genomics. 2021;22: 1–15. doi: 10.1186/s12864-021-07659-2
  • 55. Van den Broeck L, Gordon M, Inzé D, Williams C, Sozzani R. Gene Regulatory Network Inference: Connecting Plant Biology and Mathematical Modeling. Front Genet. 2020;11: 1–12. doi: 10.3389/fgene.2020.00457
  • 56. Zito F, Cutello V. and Infer Gene Regulatory Networks. Entropy. 2023;25: 1214.
  • 57. Torres R, Judson-Torres RL. Research Techniques Made Simple: Feature Selection for Biomarker Discovery. J Invest Dermatol. 2019;139: 2068–2074.e1. doi: 10.1016/j.jid.2019.07.682
  • 58. Dhillon A, Singh A, Bhalla VK. A Systematic Review on Biomarker Identification for Cancer Diagnosis and Prognosis in Multi-omics: From Computational Needs to Machine Learning and Deep Learning. Archives of Computational Methods in Engineering. Springer Netherlands; 2023. doi: 10.1007/s11831-022-09821-9
  • 59. Al-Tashi Q, Saad MB, Muneer A, Qureshi R, Mirjalili S, Sheshadri A, et al. Machine Learning Models for the Identification of Prognostic and Predictive Cancer Biomarkers: A Systematic Review. Int J Mol Sci. 2023;24. doi: 10.3390/ijms24097781
  • 60. Drovandi CC, Holmes CC, McGree JM, Mengersen K, Richardson S, Ryan EG. Principles of experimental design for Big Data analysis. Stat Sci. 2017;32: 385–404. doi: 10.1214/16-STS604
  • 61. Kreutz C, Timmer J. Systems biology: Experimental design. FEBS J. 2009;276: 923–942. doi: 10.1111/j.1742-4658.2008.06843.x
  • 62. Huang CJ, Liao WC. A Comparative Study of Feature Selection Methods for Probabilistic Neural Networks in Cancer Classification. Proc Int Conf Tools with Artif Intell. 2003; 451–458. doi: 10.1109/tai.2003.1250224
  • 63. Gharehchopogh FS, Gholizadeh H. A comprehensive survey: Whale Optimization Algorithm and its applications. Swarm Evol Comput. 2019;48: 1–24. doi: 10.1016/j.swevo.2019.03.004
  • 64. Cai J, Thierauf G. Evolution strategies for solving discrete optimization problems. Adv Eng Softw. 1996;25: 177–183. doi: 10.1016/0965-9978(95)00104-2
  • 65. Moutinho L, Sokele M. Innovative Research Methodologies in Management: Volume I: Philosophy, Measurement and Modelling. 2017. doi: 10.1007/978-3-319-64394-6
  • 66. Yao X, Liu Y. Fast evolution strategies. Lect Notes Comput Sci (including Subser Lect Notes Artif Intell Lect Notes Bioinformatics). 1997;1213: 151–161. doi: 10.1007/bfb0014808
  • 67. Hussien AG, Asghar A, Xiaojia H, Guoxi Y, Huiling L, Zhifang C. Boosting whale optimization with evolution strategy and Gaussian random walks: an image segmentation method. Engineering with Computers. Springer London; 2021. doi: 10.1007/s00366-021-01542-0
  • 68. Zhang Y, Zhang Y, Wang G, Liu B. An improved hybrid whale optimization algorithm based on differential evolution. Proc Int Conf Artif Intell Electromechanical Autom AIEA 2020. 2020; 103–107. doi: 10.1109/AIEA51086.2020.00029
  • 69. Yao X, Liu Y, Lin G. Evolutionary programming made faster. IEEE Trans Evol Comput. 1999;3: 82–102. doi: 10.1109/4235.771163
  • 70. Brown CT, Liebovitch LS, Glendon R. Lévy flights in dobe Ju/’hoansi foraging patterns. Hum Ecol. 2007;35: 129–138. doi: 10.1007/s10745-006-9083-4
  • 71. Choi E, Lee C. Feature extraction based on the Bhattacharyya distance. Pattern Recognit. 2003;36: 1703–1709. doi: 10.1016/S0031-3203(03)00035-9
  • 72. Yu B, Zhang Y. The analysis of colon cancer gene expression profiles and the extraction of informative genes. J Comput Theor Nanosci. 2013;10: 1097–1103. doi: 10.1166/jctn.2013.2812
  • 73. Viljanen M, Meijerink L, Zwakhals L, van de Kassteele J. A machine learning approach to small area estimation: predicting the health, housing and well-being of the population of Netherlands. Int J Health Geogr. 2022;21: 1–18. doi: 10.1186/s12942-022-00304-5
  • 74. Syarif I, Wills G. SVM Parameter Optimization using Grid Search and Genetic Algorithm to Improve Classification Performance. TELKOMNIKA (Telecommunication Comput Electron Control). 2016;14: 1502–1509. doi: 10.12928/telkomnika.v14i4.3956
  • 75. Tamimi E, Ebadi H, Kiani A. Evaluation of different metaheuristic optimization algorithms in feature selection and parameter determination in SVM classification. Arab J Geosci. 2017;10. doi: 10.1007/s12517-017-3254-z
  • 76. Ghnimi O, Kharbech S, Belazi A, Bouallegue A. Model selection for support-vector machines through metaheuristic optimization algorithms. In: Osten W, Zhou J, Nikolaev DP, editors. Thirteenth International Conference on Machine Vision. SPIE; 2021. p. 59. doi: 10.1117/12.2587439

Decision Letter 0

Omar A Alzubi

9 Feb 2023

PONE-D-23-01358

A state-of-the-art Fusion of Whale algorithm with Evolutionary Strategies for high dimensionality and Filtration with Signal to Noise Ratio and Bhattacharyya Distance

PLOS ONE

Dear Dr. Hafiz,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Mar 26 2023 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Omar A. Alzubi

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Please note that PLOS ONE has specific guidelines on code sharing for submissions in which author-generated code underpins the findings in the manuscript. In these cases, all author-generated code must be made available without restrictions upon publication of the work. Please review our guidelines at https://journals.plos.org/plosone/s/materials-and-software-sharing#loc-sharing-code and ensure that your code is shared in a way that follows best practice and facilitates reproducibility and reuse.

3. Thank you for stating the following financial disclosure: 

"No"

At this time, please address the following queries:

a) Please clarify the sources of funding (financial or material support) for your study. List the grants or organizations that supported your study, including funding received from your institution. 

b) State what role the funders took in the study. If the funders had no role in your study, please state: “The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.”

c) If any authors received a salary from any of your funders, please state which authors and which funders.

d) If you did not receive any funding for this study, please state: “The authors received no specific funding for this work.”

Please include your amended statements within your cover letter; we will change the online submission form on your behalf.

4. Thank you for stating the following in your Competing Interests section:  

"No"

Please complete your Competing Interests on the online submission form to state any Competing Interests. If you have no competing interests, please state "The authors have declared that no competing interests exist.", as detailed online in our guide for authors at http://journals.plos.org/plosone/s/submit-now 

This information should be included in your cover letter; we will change the online submission form on your behalf.

5. We note that you are reporting an analysis of a microarray, next-generation sequencing, or deep sequencing data set. PLOS requires that authors comply with field-specific standards for preparation, recording, and deposition of data in repositories appropriate to their field. Please upload these data to a stable, public repository (such as ArrayExpress, Gene Expression Omnibus (GEO), DNA Data Bank of Japan (DDBJ), NCBI GenBank, NCBI Sequence Read Archive, or EMBL Nucleotide Sequence Database (ENA)). In your revised cover letter, please provide the relevant accession numbers that may be used to access these data. For a full list of recommended repositories, see http://journals.plos.org/plosone/s/data-availability#loc-omics or http://journals.plos.org/plosone/s/data-availability#loc-sequencing.

6. Please ensure that you refer to Figures 1, 4 to 14 in your text as, if accepted, production will need this reference to link the reader to the figure.

7. We note you have included a table to which you do not refer in the text of your manuscript. Please ensure that you refer to Table 6 in your text; if accepted, production will need this reference to link the reader to the Table.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Partly

Reviewer #2: Partly

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: No

Reviewer #2: I Don't Know

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: No

Reviewer #2: No

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: Large number of English language issues, including in abstract.

Needs extensive review by technical editor proficient in English.

For example, the very first sentence is missing an 'IS':

'The standard whale algorithm IS easily trapped in suboptimal and high-dimensional regions.'

In many cases in past reviews I can still follow the flow of the document even when the English usage problems are significant; in this case, the paper is not well organized and it makes it harder to read such that both the organization and the English usage are difficult and problematic.

What is meant by trapped in suboptimal regions? Do you mean suboptimal solutions?

How is the algorithm trapped in high-dimensional regions? Do you mean high dimensionality problems create problems finding optimal solutions?

Are there more references that could be used in section 2.2?

I would consider more references on WOA which give good context to nature-inspired and metaheuristics, such as

Gharehchopogh, Farhad Soleimanian, and Hojjat Gholizadeh. "A comprehensive survey: Whale Optimization Algorithm and its applications." Swarm and Evolutionary Computation 48 (2019): 1-24.

The WOA is not well explained; more background is needed.

The recombinative strategy description around line 167 is not well described at all and appears to mix generalities with specific parameters (5 parents, 100 offspring) that are of unclear origin.

An intuitive sense of the contrast between the collaborative nature of WOA and the recombination would be helpful. As I understand it, the WOA typically probabilistically updates each search agent; sometimes this uses the influence of a randomly selected search agent by using the influence of other search agents. The recombination is not clear as described in lines 154-174; the best I can understand is that the initial positions are found by recombinative methods rather than randomly, but it is not well described.

The presentation order is very hard to follow. The WOA is finally described in more detail starting at line 197, along with the updated version around line 220, but there is still no explanation of the number of parents / offspring which appear to be hard coded to some arbitrary number "100".

The solution appears to simply be selecting the initial population with the recombinative method, but it is not clear why this is better than simply having a better 'initial guess'.

What is the origin of the test functions in section 2.4.1? I do like the concept but are these functions that are used commonly in optimization problems, particularly with WOA? If they were cited earlier than line 244, it would be helpful to repeat this at line 244 "13 international test functions" (and what, exactly, is 'international' about a 'test function'?)

Table 1 says 'mean' and the document (line 247) says "Avg". The benchmark column is 0 for everything except F8 for dim=30 and dim=100 which is very confusing, because those are both 418.9829*5 (is that supposed to be an exponent, like 10e5?)

Table 1 needs reformatting; I would have a table for D=30 and D=100

Section 3 appears to jump into the use of data sets with cancer sets; I assume "filtration" refers to dimensionality reduction but this is very confusing.

Section 4 follows with cancer datasets, which are apparently generalized as 'high dimensional data'.

Overall I think this paper is somewhat interesting but needs considerable work - basically a re-write. In particular it is not clear to me why the recombination is helpful beyond providing a better initial answer for the WOA as opposed to a bounded random set of selections. If that is the main contribution I would likely recommend rejection.

Reviewer #2: The authors present a new optimization algorithm incorporating an evolutionary algorithm into the whale optimization algorithm, with the goal of improving the diversity of the positions of the "whales," and ideally leading to better optima.

There are numerous grammatical errors in the manuscript. I am not an editor so I did not comprehensively list every such error, but I strongly insist that the authors go through the manuscript and correct all such errors. (I took the time to point out these errors in Abstract and a bit in the Intro)

The title is quite wordy and difficult to understand. Perhaps shorten it? The capitalization is also inconsistent (e.g. "state-of-the-art" and "high dimensionality" should be capitalized).

Due to the number of issues, I provide detailed comments for the first three sections, but I invite the authors to revise the manuscript and resubmit.

Abstract

Line 24: grammar "The standard whale algorithm [is] easily trapped..."

Line 25-26: grammar "The computer-generated initial populations [are] generally unevenly distributed..."

Line 27-28: grammar "A fusion of this algorithm based on ... [is] proposed."

Line 30: assess the "complexity" of what?

Line 33: Sentence fragment?

Introduction

Line 38: What does "these" refer to?

Line 39: "genetic engineering. (1)." -> "genetic engineering (1)."

Line 41: "Started" -> "Start"

You are describing a generic algorithm, so use the present tense. The past tense implies that you are referring to a specific run of an algorithm in the past.

Line 42: "optimal answers"? If they are already optimal, why continue the search?

Line 44: "Genetic algorithm" -> "Genetic algorithms"

How are evolutionary strategies different from genetic algorithms? Also, you write as though there is a singular unique genetic algorithm. Isn't it more a family of algorithms? Same for evolutionary algorithms.

Line 53: What's "WOA"?

Line 58-60: "Using gene expression profiles to identify and classify malignant and normal tissues is the most difficult application of machine learning"

This statement is overly broad. Many would strongly disagree, such as those working with brain data.

Line 63: sentence fragment "The Support Vector Machine (SVM), which is widely used in machine learning models (12)."

How are SVMs relevant/related to your work? How is it related to DNA microarray classification? Otherwise this section seems a bit random/out-of-place.

Line 70: You already said this earlier.

Line 77: "Tumour" -> "tumour" Also, you use both "tumor" and "tumour." Pick one and maintain consistency throughout the manuscript.

Line 90: Who/what is the "operator of the algorithm"?

Line 91: I thought you were using WOA as the optimization algorithm. How are you simultaneously using SVM?

Line 95: What is an "operator"?

Line 96: Font size changes?

Line 99: Missing period.

Line 104: "calculated" -> "organized"

Section 2

Line 112-113: Decapitalize "Hybrid algorithm techniques"

Line 118: Decapitalize "Logistic chaotic mapping"

Is the algorithm called RESHWOA or RESWOA? What does it stand for?

Section 2.1 should go first in section 2, as it provides necessary background for the reader to understand your method.

Section 2.2 has different font.

Line 137-138: What is a decision variable? What are the parents? This section is missing significant exposition/background.

Section 2.2: How does the actual recombination take place, operationally? What is the representation of the "DNA"? How does mutation occur, operationally?

Line 159: What is the "dominant ρ recombination"?

Equation 1.1: What is the "random" function? Is it sampling an element uniformly at random from a given set?

Line 170: Different citation style? (square bracket vs parentheses)

You have described how recombination occurs, but how does mutation occur?

Lines 177-179: This is the second equation but is labeled (1). What is the "rand" function? Why is there a subscript outside of "random(...)"? Is this position update not dependent on the previous position X_(i)?

Lines 180-181: Add commas to separate the clauses.

Lines 182-192: What are these equations?

Equation 2: What is C? What is X_(*)(i)? How does it differ from X(i)?

Equation 3: Why is this update equation different from equation (1)?

Equation 4: What is r_1?

Equation 6: This is the third distinct equation for X(i+1).

Equation 8: D' is never used anywhere else. What is X_(rand)(i)? How does it differ from X(i)?

Figure 1: This figure is too small and very difficult to read. This figure is also not very helpful to understand the algorithm. Pseudocode would be much better. Also, all figures in PLOS manuscripts must be at the end of the document, with captions as placeholders in the main text.
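Reviewer #2's questions about C, r_1, D' and X_rand refer to the update rules of the standard WOA (Mirjalili and Lewis, 2016). As a hedged, simplified sketch of those rules, and not the authors' RESHWOA code, one iteration can be written as below. Drawing scalar A and C per agent, sampling l uniformly from [-1, 1], fixing the spiral constant b = 1 and clipping positions to the bounds are assumptions of this sketch; the function names `woa_step` and `woa_minimize` are hypothetical.

```python
import numpy as np


def woa_step(X, best, t, max_iter, rng, b=1.0):
    """One simplified iteration of the standard WOA position update.

    X    : (n_agents, dim) array of current positions X(t)
    best : (dim,) best position found so far, X*(t)
    """
    n, _ = X.shape
    a = 2.0 - 2.0 * t / max_iter              # a decreases linearly from 2 towards 0
    X_new = np.empty_like(X)
    for i in range(n):
        r1, r2 = rng.random(), rng.random()
        A = 2.0 * a * r1 - a                  # coefficient A = 2*a*r1 - a
        C = 2.0 * r2                          # coefficient C = 2*r2
        if rng.random() < 0.5:
            if abs(A) < 1.0:
                # exploitation: shrink the encircling around the best agent X*
                D = np.abs(C * best - X[i])
                X_new[i] = best - A * D
            else:
                # exploration: move relative to a randomly chosen agent X_rand
                X_rand = X[rng.integers(n)]
                D = np.abs(C * X_rand - X[i])
                X_new[i] = X_rand - A * D
        else:
            # spiral (bubble-net) update; D' is the distance to the best agent
            l = rng.uniform(-1.0, 1.0)
            D_prime = np.abs(best - X[i])
            X_new[i] = D_prime * np.exp(b * l) * np.cos(2.0 * np.pi * l) + best
    return X_new


def woa_minimize(f, dim, lb, ub, n_agents=30, max_iter=200, seed=0):
    """Minimise f over the box [lb, ub]^dim, starting from a uniformly random population."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(lb, ub, size=(n_agents, dim))
    fit = np.apply_along_axis(f, 1, X)
    best, best_f = X[fit.argmin()].copy(), float(fit.min())
    for t in range(max_iter):
        X = np.clip(woa_step(X, best, t, max_iter, rng), lb, ub)
        fit = np.apply_along_axis(f, 1, X)
        if fit.min() < best_f:                # keep the best solution seen so far
            best, best_f = X[fit.argmin()].copy(), float(fit.min())
    return best, best_f
```

For example, `woa_minimize(lambda x: float(np.sum(x**2)), dim=5, lb=-10.0, ub=10.0)` runs the sketch on the sphere function. According to the authors' response, RESHWOA replaces the uniform initial population with a recombination-based one; that substitution is not shown here.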

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.


PLoS One. 2024 Mar 11;19(3):e0295643. doi: 10.1371/journal.pone.0295643.r002

Author response to Decision Letter 0


15 Apr 2023

Manuscript

Response to reviewers

Dear Dr. Omar,

Thank you for allowing us to submit a revised draft of the manuscript "Hybrid Whale Algorithm with Evolutionary Strategies and Filtering for High-Dimensional Optimization: Application to Microarray Cancer Data" for publication in PLOS ONE. We appreciate the time and effort you and the reviewers dedicated to providing feedback on our manuscript and are grateful for the insightful comments and valuable improvements to our paper. We have incorporated most of the suggestions made by the reviewers. Please see below, in blue, for a point-by-point response to the reviewers' comments and concerns. All page numbers refer to the revised manuscript file with tracked changes.

Reviewers' Comments to the Authors:

Response to Reviewer #1

Large number of English language issues, including in abstract. Needs extensive review by technical editor proficient in English. For example, the very first sentence is missing an 'IS': 'The standard whale algorithm IS easily trapped in suboptimal and high-dimensional regions.'

This has been corrected using the Grammarly English correction software.

What is meant by trapped in suboptimal regions? Do you mean suboptimal solutions?

"Trapped in suboptimal regions" refers to a phenomenon in optimization in which an algorithm or search process becomes stuck in a local minimum or maximum of a function rather than finding the global minimum or maximum.

Suboptimal solutions are any solutions that are not the best for a given optimization problem. When an algorithm becomes trapped in a suboptimal region, it can find only suboptimal solutions and cannot explore the entire search space to find the optimal solution. So yes, that is correct: by suboptimal regions we mean regions that contain only suboptimal solutions.

How is the algorithm trapped in high-dimensional regions? Do you mean high dimensionality problems create problems finding optimal solutions?

In high-dimensional optimization problems, the number of possible combinations of parameters increases exponentially with the number of variables, making it challenging to find the global optimum. As a result, many optimization algorithms can become trapped in suboptimal regions when working in high-dimensional spaces.

One reason algorithms can become trapped in high-dimensional regions is due to the curse of dimensionality. In high-dimensional spaces, the volume of the space increases exponentially with the number of dimensions, making it difficult to explore the search space efficiently. For example, the distance between two randomly chosen points becomes larger, and the number of points required to get good coverage of the space increases exponentially with the dimensionality.

Another reason algorithms can become trapped in high-dimensional regions is the sparsity of the search space. In high-dimensional spaces, the objective function is often dominated by a few influential dimensions, while most other dimensions are irrelevant or redundant. This makes it difficult for an algorithm to identify and explore the relevant dimensions and locate the global optimum.

Yes, that's correct. High-dimensional optimization problems, where the number of variables or dimensions is large, can make it challenging to find the optimal solution. As the number of dimensions increases, the search space grows exponentially, and the algorithms have to explore many combinations of parameters, which becomes computationally expensive. Additionally, the curse of dimensionality can make it challenging to explore the search space efficiently, leading to the possibility of the algorithm becoming trapped in suboptimal regions.

As a result, finding the global optimum in high-dimensional optimization problems can be challenging, and the solution obtained may only be suboptimal.
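The two effects described in this answer can be illustrated numerically. The sketch below, an illustration added here rather than part of the manuscript, estimates the mean distance between pairs of points drawn uniformly from the unit hypercube [0, 1]^d (the squared distance has expectation d/6, so the typical distance grows roughly like sqrt(d/6)) and counts the grid points needed for a fixed resolution of 10 points per axis, which grows as 10^d. The function names and the sample size are hypothetical choices.

```python
import numpy as np


def mean_pairwise_distance(dim, n_pairs=2000, seed=0):
    """Average Euclidean distance between random point pairs drawn uniformly in [0, 1]^dim."""
    rng = np.random.default_rng(seed)
    a = rng.random((n_pairs, dim))
    b = rng.random((n_pairs, dim))
    return float(np.linalg.norm(a - b, axis=1).mean())


def grid_points_needed(dim, points_per_axis=10):
    """Grid size for a fixed per-axis resolution: grows as points_per_axis ** dim."""
    return points_per_axis ** dim


for d in (2, 30, 100):
    # distance grows roughly like sqrt(d / 6); grid size grows exponentially with d
    print(d, round(mean_pairwise_distance(d), 2), grid_points_needed(d))
```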

Are there more references that could be used in section 2.2?

Additional references have been added.

The WOA is not well explained; more background is needed.

Additional background on the Whale Optimization Algorithm (WOA) has been included on pages 158-180.

The recombinative strategy description around line 167 is not well described at all and appears to mix generalities with specific parameters (5 parents, 100 offspring) that are of unclear origin.

We have revised the manuscript per your recommendations and added more clarity on the recombinative strategy, and the relevant changes have been made on lines 182-205.

An intuitive sense of the contrast between the collaborative nature of WOA and the recombination would be helpful. As I understand it, the WOA typically probabilistically updates each search agent; sometimes this uses the influence of a randomly selected search agent by using the influence of other search agents. The recombination is not clear as described in lines 154-174; the best I can understand is that the initial positions are found by recombinative methods rather than randomly, but it is not well described.

Yes, you are right. The initial positions are found by discrete recombinative methods rather than randomly. A detailed description regarding the initial population has been provided in the revised manuscript, lines 207-222 and 258-270. We have also provided the recombination strategy details on lines 78-99 and 182-205.

The presentation order is very hard to follow. The WOA is finally described in more detail starting at line 197, along with the updated version around line 220, but there is still no explanation of the number of parents / offspring which appear to be hard coded to some arbitrary number "100".

We have rearranged the presentation order of the paper in the revised manuscript for better readability. Regarding the number of parents and offspring used in the recombination process, we chose these hyperparameters based on empirical studies: a random parent population of μ = 100 and λ = 100 offspring.
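Reviewer #2 asked how "dominant ρ recombination" works operationally. A minimal sketch of discrete (also called dominant) recombination for seeding the initial population follows. It assumes parents sampled uniformly within the bounds, μ = 100 parents and λ = 100 offspring as stated above, and ρ = 5 parents per offspring (the number Reviewer #1 reports from the manuscript; treat it as an assumption). The function name is hypothetical, and no fitness-based selection is applied.

```python
import numpy as np


def discrete_recombination_init(lb, ub, dim, mu=100, lam=100, rho=5, seed=0):
    """Initial population built by discrete (dominant) recombination.

    mu  : number of parents sampled uniformly inside [lb, ub]^dim
    lam : number of offspring returned as the initial population
    rho : parents mixed into each offspring (assumed value)
    """
    rng = np.random.default_rng(seed)
    parents = rng.uniform(lb, ub, size=(mu, dim))
    offspring = np.empty((lam, dim))
    cols = np.arange(dim)
    for k in range(lam):
        idx = rng.choice(mu, size=rho, replace=False)   # rho distinct parents
        donors = rng.integers(rho, size=dim)            # which parent supplies each variable
        offspring[k] = parents[idx[donors], cols]       # copy values unchanged, no blending
    return parents, offspring


parents, init_pop = discrete_recombination_init(-100.0, 100.0, dim=30)
print(init_pop.shape)
```

Because each decision variable is copied unchanged from one parent, every offspring value already appears among the parents in the same coordinate; any diversity gain comes from new combinations of those values, which is the point Reviewer #1 questions.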

The solution appears to simply be selecting the initial population with the recombinative method, but it is not clear why this is better than simply having a better 'initial guess'.

The choice of the initial population can significantly affect the performance of a metaheuristic algorithm. In some cases, it may be beneficial to use a recombinative method to generate the initial population because it can help to explore the search space more thoroughly and generate diverse solutions. The recombinative process is particularly useful when the search space is large or complex and it is not easy to generate a diverse set of initial solutions manually. On the other hand, if the search space is relatively small or straightforward, a few well-chosen initial solutions may suffice as the starting population; in such cases, a recombinative method may provide no additional benefit and could waste computational resources. We have provided a literature review on the impact of initial-population diversity, and on why this is better than simply having a better 'initial guess', on lines 207-222.
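The diversity argument above can be checked empirically. One common measure, chosen here for illustration (the manuscript may use a different one), is the mean Euclidean distance of the agents to the population centroid; a fully collapsed population scores 0. The function name and the two example populations are hypothetical.

```python
import numpy as np


def population_diversity(X):
    """Mean Euclidean distance of each agent (row of X) to the population centroid."""
    centroid = X.mean(axis=0)
    return float(np.linalg.norm(X - centroid, axis=1).mean())


rng = np.random.default_rng(1)
spread = rng.uniform(-100.0, 100.0, size=(100, 30))    # agents cover the whole box
clustered = rng.uniform(-1.0, 1.0, size=(100, 30))     # agents crowd near the origin
print(population_diversity(spread), population_diversity(clustered))
```

Applied to the output of any two initialization schemes (for example, uniform sampling versus recombination), the same function gives a like-for-like comparison; no result for the manuscript's experiments is implied.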

what is the origin of the test functions in section 2.4.1?

This has been added on lines 279-285 of the revised manuscript.

I do like the concept but are these functions that are used commonly in optimization problems particularly with WOA?

Yes, the functions commonly used in optimization problems can be used in the Whale Optimization Algorithm (WOA) as well. One of the advantages of using WOA is that it can be applied to a wide range of optimization problems that involve different types of objective functions.

If they were cited earlier than line 244, it would be helpful to repeat this at line 244 "13 international test functions" (and what, exactly, is 'international' about a 'test function'?)

We have corrected this on lines 28, 281, 324, and 480. To clarify, the phrase "13 international test functions" in line 244 refers to a set of benchmark functions that are commonly used to evaluate the performance of optimization algorithms. These functions are referred to as "international" because they have been widely recognized and adopted by the research community around the world.

Table 1 says 'mean' and the document (line 247) says "Avg".

Corrected

The benchmark column is 0 for everything except F8 (Schwefel's function) for dim=30 and dim=100, which is very confusing, because those are both 418.9829*5 (is that supposed to be an exponent, like 10e5?)

The benchmark column for Schwefel's function (F_8) is not 0 for all dimensions.

The correct information is that the global minimum value of Schwefel's function (F_8), in the form f(x) = Σ -x_i sin(√|x_i|) used in the standard WOA benchmark set, is -418.9829 × n, where n is the dimension of the function. For example, if n = 30, the global minimum value is -12,569.487, and if n = 100, it is -41,898.29.
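Assuming the formulation of F8 used in the standard WOA benchmark set, f(x) = Σ -x_i sin(√|x_i|) on [-500, 500]^n, the minimum lies near x_i = 420.9687 in every coordinate and equals about -418.9829 × n, so the value scales with the dimension and carries a negative sign. The short check below, with a hypothetical function name, evaluates the function at that point for n = 30 and n = 100.

```python
import numpy as np


def schwefel_f8(x):
    """F8 in the WOA benchmark set: sum(-x_i * sin(sqrt(|x_i|))), with x_i in [-500, 500]."""
    x = np.asarray(x, dtype=float)
    return float(np.sum(-x * np.sin(np.sqrt(np.abs(x)))))


for n in (30, 100):
    x_star = np.full(n, 420.9687)       # approximate minimiser in every coordinate
    print(n, round(schwefel_f8(x_star), 3), round(-418.9829 * n, 3))
```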

Section 3 appears to jump into the use of data sets with cancer sets; I assume "filtration" refers to dimensionality reduction but this is very confusing.

We have rearranged the whole manuscript for clarity. Yes, filtration refers to dimensionality reduction, and we corrected it in the revised manuscript on pages 335-382.

Section 4 follows with cancer datasets, which are apparently generalized as 'high dimensional data'.

You are right; we have revised this to "cancer data" on line 335 for clarity.

In particular it is not clear to me why the recombination is helpful beyond providing a better initial answer for the WOA as opposed to a bounded random set of selections.

Recombination is used in many evolutionary algorithms to generate new solutions by combining information from multiple parent solutions. In our WOA hybrid, the recombination operation generates a new candidate solution by combining the information from two randomly selected parent solutions. The main advantage of recombination is that it can exploit the information contained in the parent solutions to generate offspring that are potentially better than the parents themselves. Offspring can inherit beneficial traits from both parents and combine them more effectively than solutions selected at random from a bounded set. In the context of the WOA, recombination can therefore help to generate a more diverse set of candidate solutions that explore the search space more effectively. This can improve the chance of finding high-quality solutions to the optimization problem, especially if the initial set of solutions is limited or poorly suited to the problem.

If that is the main contribution, I would likely recommend rejection.

A statement of the paper's main contribution has been added in lines 121-141.

Response to reviewer # 2

There are numerous grammatical errors in the manuscript. I am not an editor, so I did not comprehensively list every such error, but I strongly insist that the authors go through the manuscript and correct all such errors. (I took the time to point out these errors in the Abstract and part of the Introduction.)

Thank you for your valuable feedback on our manuscript. We appreciate your concern regarding the grammatical errors in the manuscript. We have used a grammar checker (Grammarly) to help identify and correct the errors. However, we understand that no automated tool is perfect or catches every error. We have therefore reviewed the manuscript ourselves and made all the necessary corrections. We have also considered the assistance of a professional editor or proofreader to help us further improve the quality of the manuscript.

The title is quite wordy and difficult to understand. Perhaps shorten it?

It has been corrected in the revised manuscript.

The capitalization is also inconsistent (e.g. "state-of-the-art" and "high dimensionality" should be capitalized).

Corrected; see lines 82-481 and Table 8, column No. 1.

Abstract

Line 24: grammar "The standard whale algorithm [is] easily trapped..."

Corrected

Line 25-26: grammar "The computer-generated initial populations [are] generally unevenly distributed..."

Corrected

Line 27-28: grammar "A fusion of this algorithm based on ... [is] proposed."

Corrected

Line 30: assess the "complexity" of what?

Corrected

Line 33: Sentence fragment?

Corrected

Introduction

Line 38: What does "these" refer to?

Corrected; see line 38 of the revised manuscript.

Line 39: "genetic engineering. (1)." -> "genetic engineering (1)."

Corrected; see line 39.

Line 41: "Started" -> "Start"

Corrected

You are describing a generic algorithm, so use the present tense. The past tense implies that you are referring to a specific run of an algorithm in the past.

Corrected

Line 42: "optimal answers"? If they are already optimal, why continue the search?

Corrected

Line 44: "Genetic algorithm" -> "Genetic algorithms"

Corrected

How are evolutionary strategies different from genetic algorithms? Also, you write as though there is a singular unique genetic algorithm. Isn't it more a family of algorithms? Same for evolutionary algorithms.

Evolutionary strategies and genetic algorithms are members of the broader family of evolutionary algorithms, which are a class of metaheuristic optimization algorithms inspired by biological evolution. Although they share some similarities, these two approaches have some key differences.

One key difference is how they handle the representation of the solution space. Genetic algorithms typically use a binary string or string of integers to represent the solutions, which can be manipulated through crossover and mutation operations to generate new candidate solutions. Evolutionary strategies, on the other hand, typically use a continuous vector representation of the solutions, which can be modified through mutation and recombination.

Another difference is in how they perform selection. Genetic algorithms typically use selection methods such as roulette wheel selection or tournament selection, in which individuals with higher fitness values are more likely to be selected for reproduction. In contrast, evolutionary strategies typically use deterministic truncation selection: only the best μ individuals survive, chosen either from the λ offspring alone, (μ,λ)-ES, or from parents and offspring together, (μ+λ)-ES. In the simplest variant, (1+1)-ES, a single parent is mutated to produce one new candidate solution, and only the better of the two is retained.

Regarding your second question, you are correct that genetic algorithms and evolutionary algorithms are more accurately described as families of algorithms rather than singular, unique algorithms. Many variations and modifications can be made to the basic algorithmic structure to improve performance or adapt to different problem domains. The same is true for evolutionary strategies and other types of evolutionary algorithms.
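To make the (1+1)-ES scheme concrete, here is a minimal Python sketch. It is illustrative only: the sphere test function, the fixed mutation step size `sigma`, the iteration count, and the function names are assumptions, not details from the manuscript. Practical ES implementations usually adapt `sigma` during the run, for example with Rechenberg's 1/5 success rule.

```python
import random

def sphere(x):
    """Sphere test function: sum of squares, global minimum 0 at the origin."""
    return sum(v * v for v in x)

def one_plus_one_es(f, x0, sigma=0.5, iterations=2000, seed=0):
    """Minimal (1+1)-ES with a fixed step size: mutate the single parent with
    Gaussian noise and keep whichever of parent and child has the lower value."""
    rng = random.Random(seed)
    parent = list(x0)
    parent_fit = f(parent)
    for _ in range(iterations):
        child = [v + rng.gauss(0.0, sigma) for v in parent]
        child_fit = f(child)
        # Plus-selection: the child replaces the parent only if it is no worse
        if child_fit <= parent_fit:
            parent, parent_fit = child, child_fit
    return parent, parent_fit

best, best_fit = one_plus_one_es(sphere, [3.0] * 5)
print(best_fit)
```

Unlike roulette wheel or tournament selection in a GA, survival here is decided by a direct comparison, so the best value found never gets worse.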

Line 53: What's "WOA"?

Corrected, Whale Optimization Algorithm (WOA)

Line 58-60: "Using gene expression profiles to identify and classify malignant and normal tissues is the most difficult application of machine learning" This statement is overly broad. Many would strongly disagree, such as those working with brain data.

Corrected; see lines 101-104

Line 63: sentence fragment "The Support Vector Machine (SVM), which is widely used in machine learning models (12)."

Corrected; see lines 106-108

How are SVMs relevant/related to your work? How is it related to DNA microarray classification? Otherwise this section seems a bit random/out-of-place.

We used the mean square error (MSE) as the objective function and the SVM as the prediction model. We minimized the MSE by using it as the objective function in both the whale optimization algorithm and the proposed algorithm; in this way, the SVM parameters were optimized for minimum MSE. Regarding DNA microarray classification, SVMs are commonly used to analyze and classify gene expression data. DNA microarrays allow researchers to simultaneously measure the expression levels of thousands of genes, producing high-dimensional data that can be challenging to analyze and interpret using traditional statistical methods. SVMs can be applied to these data sets to identify patterns and classify samples from different disease states, such as cancerous or non-cancerous tissues.

In the context of the paper, the SVM is the prediction model whose parameters are tuned by the optimizers. We compared the SVM's performance with parameters optimized by the proposed fusion algorithm against its performance with parameters optimized by WOA, to demonstrate the superiority of the fusion algorithm.

Line 70: You already said this earlier.

Corrected

Line 77: "Tumour" -> "tumour" Also, you use both "tumor" and "tumour." Pick one and maintain consistency throughout the manuscript.

Corrected

Line 90: Who/what is the "operator of the algorithm"?

Corrected; this was a typo.

Line 91: I thought you were using WOA as the optimization algorithm. How are you simultaneously using SVM?

SVMs are known to perform well on high-dimensional datasets with numerical features. Although an SVM works well with its default parameter values, its performance can be improved significantly through parameter optimization. We applied two methods, the recombinative evolutionary strategy hybrid whale optimization algorithm (RESHWOA) and the whale optimization algorithm (WOA), to optimize the SVM parameters. Our experiments showed that SVM parameter optimization using RESHWOA always found near-optimal parameter combinations within the given ranges, whereas WOA did not; WOA was reliable only on low-dimensional datasets with few parameters. SVM parameter optimization using RESHWOA can therefore address high-dimensional search problems, and RESHWOA proved more stable than WOA. The average running time on the six datasets shows that RESHWOA was generally faster than WOA.

Furthermore, RESHWOA's results were slightly better than those of grid search on 5 of the 6 datasets. To clarify the roles: the whale optimization algorithm optimizes the SVM's hyperparameters by minimizing the mean square error. WOA is the optimizer, the SVM is the prediction model, and the MSE is the objective function.
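The optimizer/model/objective split described above can be sketched as a fitness function that either optimizer could call. In this illustrative Python sketch, the encoding of a whale position in [0, 1]^2, the log2 ranges for C and gamma, and the `train_and_predict` callback are hypothetical choices, not details from the manuscript; the callback stands in for fitting an SVM and predicting a validation set.

```python
def decode_position(position, log2_c=(-5.0, 15.0), log2_gamma=(-15.0, 3.0)):
    """Map a whale position in [0, 1]^2 to SVM hyperparameters (C, gamma) on a
    log2 scale. The ranges are hypothetical defaults, not the manuscript's."""
    u, v = position
    c = 2.0 ** (log2_c[0] + u * (log2_c[1] - log2_c[0]))
    gamma = 2.0 ** (log2_gamma[0] + v * (log2_gamma[1] - log2_gamma[0]))
    return c, gamma

def mean_square_error(y_true, y_pred):
    """Average squared difference between true and predicted labels."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def make_fitness(train_and_predict, y_val):
    """Build the objective the optimizer minimizes: the validation MSE of a
    model trained with the hyperparameters decoded from a whale position."""
    def fitness(position):
        c, gamma = decode_position(position)
        y_pred = train_and_predict(c, gamma)
        return mean_square_error(y_val, y_pred)
    return fitness
```

With scikit-learn, for instance, `train_and_predict` could be `lambda c, g: SVC(C=c, gamma=g).fit(X_train, y_train).predict(X_val)`. For 0/1 class labels, the MSE of hard predictions equals the misclassification rate, because each squared error is either 0 or 1.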

Line 95: What is an "operator"?

Corrected; this was a typo. See lines 47-48 and 92-94 for more details.

Line 96: Font size changes?

Corrected

Line 99: Missing period.

Corrected

Line 104: "calculated" -> "organized"

Corrected

Section 2

Line 112-113: Decapitalize "Hybrid algorithm techniques"

Corrected

Line 118: Decapitalize "Logistic chaotic mapping"

Corrected; see line 72

Is the algorithm called RESHWOA or RESWOA? What does it stand for?

The algorithm is called the recombinative evolutionary strategy hybrid whale optimization algorithm (RESHWOA).

Section 2.1 should go first in section 2, as it provides necessary background for the reader to understand your method.

Corrected; see lines 146,157,181,206.

Section 2.2 has different font.

Corrected

Line 137-138: What is a decision variable? What are the parents? This section is missing significant exposition/background.

Decision variables are the components of a candidate solution vector; in our method, the offspring's decision variables are generated through a discrete recombination strategy. Parents are the individuals selected for reproduction. Background on the evolutionary strategy has been added; see lines 78-99 and 182-205.

Section 2.2: How does the actual recombination take place, operationally? What is the representation of the "DNA"? How does mutation occur, operationally?

We have added a new supplementary figure (S1 Fig. 18) demonstrating how the recombination occurs operationally. Regarding the second part of the question, in microarray cancer data the DNA is represented by gene expression levels measured with probes that target individual genes. In the discrete recombination strategy, each component of an offspring is copied from a randomly selected parent, so no mutation occurs.

Line 159: What is the "dominant ρ recombination"?

See S1 Fig. 18

Equation 1.1: What is the "random" function? Is it sampling an element uniformly at random from a given set?

Yes, it samples an element uniformly at random from a given set.
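The responses above describe discrete (dominant) recombination as copying each decision variable from a parent chosen uniformly at random, with no mutation. A minimal Python sketch of that idea follows. The function names, the use of a bounded uniform random pool as the parent set, and the parameter `rho` are illustrative assumptions; the manuscript's exact procedure (S1 Fig. 18) may differ.

```python
import random

def dominant_recombination(parents, rng):
    """Build one offspring: each decision variable is copied from a parent
    chosen uniformly at random among the rho parents (no mutation)."""
    dim = len(parents[0])
    return [rng.choice(parents)[k] for k in range(dim)]

def recombined_population(pop_size, dim, lower, upper, rho=2, seed=0):
    """Hypothetical initialization: draw a bounded uniform random pool, then
    build the population from offspring of dominant rho-recombination."""
    rng = random.Random(seed)
    pool = [[rng.uniform(lower, upper) for _ in range(dim)] for _ in range(pop_size)]
    population = []
    for _ in range(pop_size):
        parents = rng.sample(pool, rho)  # rho distinct parents, chosen uniformly
        population.append(dominant_recombination(parents, rng))
    return population
```

Because values are copied rather than perturbed, each offspring coordinate equals the corresponding coordinate of one of its parents, which is why this operator needs no separate mutation step and never leaves the original bounds.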

Line 170: Different citation style? (square bracket vs parentheses)

Corrected

You have described how recombination occurs, but how does mutation occur?

No mutation occurs operationally in the discrete recombination strategy.

Lines 177-179: This is the second equation but is labeled (1). What is the "rand" function? Why is there a subscript outside of "random(...)"? Is this position update not dependent on the previous position X_(i)?

We have formatted these equations and updated them in the new format. See line 201.

Lines 180-181: Add commas to separate the clauses.

Updated

Lines 182-192: What are these equations?

Updated; see line 201.

Equation 2: What is C? What is X_(*)(i)? How does it differ from X(i)?

A and C are coefficient vectors, X* is the position vector of the best solution obtained so far, and X is the position vector of the current whale. See line 137.
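For reference, the encircling-prey update of the original WOA (Mirjalili and Lewis, 2016) can be written as follows. The notation follows that paper with the random vectors labeled r_1 and r_2; the equation numbering and symbol names in the revised manuscript may differ.

```latex
\begin{aligned}
\vec{D} &= \left| \vec{C} \cdot \vec{X}^{*}(t) - \vec{X}(t) \right|, &
\vec{X}(t+1) &= \vec{X}^{*}(t) - \vec{A} \cdot \vec{D}, \\
\vec{A} &= 2\vec{a} \cdot \vec{r}_1 - \vec{a}, &
\vec{C} &= 2\,\vec{r}_2,
\end{aligned}
```

Here X* is the best position found so far, X(t) is the current whale's position, the components of a decrease linearly from 2 to 0 over the iterations, r_1 and r_2 are random vectors with components in [0, 1], and · denotes element-wise multiplication.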

Equation 3: Why is this update equation different from equation (1)?

Equation 4: What is r_1?

r_1 is a random vector whose components are drawn uniformly from [0, 1].

Equation 6: This is the third distinct equation for X(i+1).

Yes, you are right. Equations 3, 6, and 9 describe the three behaviours of the whales, and at each iteration a whale selects one of them depending on the values of p and A. If p < 0.5 and |A| < 1, the whale encircles the prey (Equation 3); if p < 0.5 and |A| ≥ 1, it searches for prey by moving relative to a randomly selected whale (Equation 9); and if p ≥ 0.5, it attacks the prey along a spiral path (Equation 6). For more details, see the reference given below.

Mirjalili S, Lewis A. The Whale Optimization Algorithm. Adv Eng Softw [Internet]. 2016;95:51–67. Available from: http://dx.doi.org/10.1016/j.advengsoft.2016.01.008
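The branching between the three behaviours can be sketched in Python. This is a minimal illustration of the standard WOA from the reference above, not the RESHWOA code. For simplicity, A and C are drawn as scalars per whale, the spiral constant is b = 1, l is uniform in [−1, 1], positions are clipped to the bounds, and the sphere test function and all names are illustrative assumptions.

```python
import math
import random

def sphere(x):
    """Sphere test function: sum of squares, global minimum 0 at the origin."""
    return sum(v * v for v in x)

def woa_step(X, best, t, max_iter, rng, b=1.0):
    """One WOA position update for every whale in X (a list of position lists)."""
    a = 2.0 - 2.0 * t / max_iter           # decreases linearly from 2 to 0
    new_X = []
    for x in X:
        A = 2.0 * a * rng.random() - a      # scalar per whale for simplicity
        C = 2.0 * rng.random()
        if rng.random() < 0.5:              # p < 0.5
            # |A| < 1: encircle the best whale; |A| >= 1: search around a random whale
            target = best if abs(A) < 1.0 else rng.choice(X)
            new_x = [tj - A * abs(C * tj - xj) for tj, xj in zip(target, x)]
        else:                               # p >= 0.5: spiral bubble-net attack
            l = rng.uniform(-1.0, 1.0)
            # abs(bj - xj) is D', the distance of this whale to the prey
            new_x = [abs(bj - xj) * math.exp(b * l) * math.cos(2.0 * math.pi * l) + bj
                     for bj, xj in zip(best, x)]
        new_X.append(new_x)
    return new_X

def woa_minimize(f, dim, lower, upper, n_whales=20, max_iter=200, seed=0):
    """Run WOA from a uniform random population, clipping positions to the bounds."""
    rng = random.Random(seed)
    X = [[rng.uniform(lower, upper) for _ in range(dim)] for _ in range(n_whales)]
    best = list(min(X, key=f))
    best_fit = f(best)
    for t in range(max_iter):
        X = woa_step(X, best, t, max_iter, rng)
        X = [[min(max(v, lower), upper) for v in x] for x in X]
        for x in X:
            fx = f(x)
            if fx < best_fit:               # keep the best solution found so far
                best, best_fit = list(x), fx
    return best, best_fit

best, best_fit = woa_minimize(sphere, dim=5, lower=-10.0, upper=10.0, seed=1)
print(best_fit)
```

The search branch is the only one that moves away from the current best, and it is active only while |A| ≥ 1 is possible, that is, while a ≥ 1 in the first half of the run.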

Equation 8: D' is never used anywhere else. What is X_(rand)(i)? How does it differ from X(i)?

D′ denotes the distance of the i-th whale to the prey (the best solution obtained so far) and appears only in Equation 6. X_rand(i) is the position vector of a randomly selected whale from the current population, whereas X(i) is the position vector of the current whale.

Attachment

Submitted filename: Response to reviewers.docx

pone.0295643.s002.docx (32.2KB, docx)

Decision Letter 1

Omar A Alzubi

22 Jun 2023

PONE-D-23-01358R1

Hybrid Whale Algorithm with Evolutionary Strategies and Filtering for High-Dimensional Optimization: Application to Microarray Cancer Data

PLOS ONE

Dear Dr. Hafiz,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Aug 06 2023 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Omar A. Alzubi

Academic Editor

PLOS ONE

Journal Requirements:

Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

Additional Editor Comments (if provided):

In its current state, the level of English throughout the manuscript needs improvement. You may wish to ask a native speaker to check your manuscript for grammar, style, and syntax.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

Reviewer #3: All comments have been addressed

********** 

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #3: Yes

********** 

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #3: Yes

********** 

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #3: Yes

********** 

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: No

Reviewer #3: Yes

********** 

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: Please use additional screening for grammar and punctuation, such as Microsoft Word (there are still many cases of missing spaces and other simple errors, but overall the paper is improved from before).

Reviewer #3: The manuscript has been significantly improved and in this form is of considerable scientific interest. I believe that the authors of the article managed to prove a significant advantage of the combined RESHWOA method over the classical Whale Optimization Algorithm (WOA). I believe that the discrete recombination (DR) strategy can be used to improve a number of other algorithms.

********** 

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #3: Yes: Osipov Aleksey

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2024 Mar 11;19(3):e0295643. doi: 10.1371/journal.pone.0295643.r004

Author response to Decision Letter 1


27 Jul 2023

Manuscript

Response to reviewers

Dear Dr. Omar,

Thank you for allowing us to submit a second revised draft of the manuscript "Hybrid Whale Algorithm with Evolutionary Strategies and Filtering for High-Dimensional Optimization: Application to Microarray Cancer Data" for publication in PLOS ONE. We appreciate the time and effort you and the reviewers dedicated to providing feedback on our manuscript and are grateful for the insightful comments and valuable improvements to our paper. We have incorporated most of the suggestions made by the reviewers. Please see below for a point-by-point response to the reviewers' comments and concerns. All page numbers refer to the revised manuscript file with tracked changes.

Reviewers' Comments to the Authors:

Response to Reviewer #1:

Please use additional screening for grammar and punctuation, such as Microsoft Word (there are still many cases of missing spaces and other simple errors, but overall the paper is improved from before).

Thank you for your valuable feedback on our paper. We sincerely appreciate your efforts in reviewing our work and providing constructive suggestions. We are glad to know that the paper has shown improvement since the previous version.

Regarding your concern about grammar and punctuation, we acknowledge the importance of ensuring the highest quality of language in our paper. To address this, we have taken the following steps:

1. Using Microsoft Word: We have run the entire document through the Microsoft Word spelling and grammar check. This process has helped us identify and correct many instances of missing spaces and other simple errors.

2. Professional Proofreading: Additionally, we have enlisted the assistance of professional proofreaders to meticulously review the paper. Their expertise has been instrumental in catching and rectifying any remaining grammar and punctuation issues.

3. Multiple Revisions: Throughout the revision process, we have paid extra attention to refining the language, sentence structure, and punctuation. We have carefully combed through the text to ensure coherence and clarity.

4. Using Grammarly: In addition to carefully proofreading the content, we have also utilized Grammarly software to enhance the clarity and correctness of the text. This combination of human proofreading and Grammarly assistance has helped us address any spelling and grammar errors, ensuring a more polished and accurate presentation of our work.

Although we have taken significant steps to improve the paper's language, we recognize that a flawless text is essential. Therefore, we will perform another thorough review to ensure that all grammar and punctuation errors are addressed.

Reviewer #3:

The manuscript has been significantly improved and in this form is of considerable scientific interest. I believe that the authors of the article managed to prove a significant advantage of the combined RESHWOA method over the classical Whale Optimization Algorithm (WOA). I believe that the discrete recombination (DR) strategy can be used to improve a number of other algorithms.

Thank you for your positive and encouraging feedback on our manuscript. We are delighted to learn that you find the improved version to be of considerable scientific interest. Your insightful comments and appreciation for our work are truly motivating.

Regarding your observation on the combined RESHWOA method, we are pleased that you recognize the significant advantage it offers over the classical Whale Optimization Algorithm (WOA). We put considerable effort into designing and implementing the RESHWOA approach, and we are thrilled that our efforts have yielded promising results.

Your suggestion of using the discrete recombination (DR) strategy to enhance other algorithms is indeed thought-provoking. We wholeheartedly agree with your assessment and believe that the DR strategy's versatility could be applicable to various optimization algorithms beyond our current research. We intend to explore further the potential of the DR strategy and its broader implications in the optimization domain. In our future work, we plan to investigate its effectiveness in combination with other metaheuristic algorithms, aiming to contribute to the advancement of optimization techniques.

We are pleased to inform you that we have taken your suggestion to heart and incorporated it into the conclusion section of our paper (line 495).

Additional Editor Comments:

In its current state, the level of English throughout the manuscript needs improvement. You may wish to ask a native speaker to check your manuscript for grammar, style, and syntax.

All the corrections have been done.

Journal Requirements:

Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

All references in the manuscript have been reviewed and updated to comply with PLOS ONE referencing style. We have not cited any retracted papers in our work.

Attachment

Submitted filename: Response to the reviewers.docx

pone.0295643.s003.docx (17.6KB, docx)

Decision Letter 2

Omar A Alzubi

30 Aug 2023

PONE-D-23-01358R2

Hybrid Whale Algorithm with Evolutionary Strategies and Filtering for High-Dimensional Optimization: Application to Microarray Cancer Data

PLOS ONE

Dear Dr. Hafiz,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Oct 14 2023 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Professor Omar A. Alzubi

Academic Editor

PLOS ONE

Journal Requirements:

Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #4: All comments have been addressed

Reviewer #5: (No Response)

Reviewer #6: All comments have been addressed

Reviewer #7: All comments have been addressed

Reviewer #8: All comments have been addressed

Reviewer #9: All comments have been addressed

********** 

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #4: Yes

Reviewer #5: Yes

Reviewer #6: Yes

Reviewer #7: Partly

Reviewer #8: Yes

Reviewer #9: Yes

********** 

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #4: Yes

Reviewer #5: N/A

Reviewer #6: Yes

Reviewer #7: Yes

Reviewer #8: Yes

Reviewer #9: Yes

********** 

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #4: Yes

Reviewer #5: Yes

Reviewer #6: Yes

Reviewer #7: Yes

Reviewer #8: (No Response)

Reviewer #9: Yes

********** 

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #4: Yes

Reviewer #5: Yes

Reviewer #6: Yes

Reviewer #7: Yes

Reviewer #8: Yes

Reviewer #9: Yes

********** 

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #4: All the concerns have been addressed well; I therefore recommend this manuscript for publication in PLOS ONE.

Reviewer #5: The paper, titled "Hybrid Whale Algorithm with Evolutionary Strategies and Filtering for High-Dimensional Optimization: Application to Microarray Cancer Data," submitted as PONE-D-23-01358R2, has shown significant improvement from the initial draft. However, several aspects still require further enhancement, with a notable need for an extended literature review.

The revised version of the paper demonstrates commendable progress in terms of content. The authors have refined their algorithm and provided a more comprehensive explanation of the proposed Hybrid Whale Algorithm with Evolutionary Strategies and Filtering. This has resulted in increased clarity regarding the methodology used for high-dimensional optimization in the context of microarray cancer data analysis.

The paper has made significant strides in explaining the Hybrid Whale Algorithm, making it more accessible to a wider readership. The authors have successfully addressed some of the ambiguities present in the previous draft, clarifying the key concepts and steps involved in the algorithm.

The presentation of empirical results has also been improved, with more detailed analysis and visualization of outcomes in the context of microarray cancer data. This contributes to a better understanding of the algorithm's performance and its potential applications.

Areas for Improvement:

One critical aspect that still requires substantial improvement is the literature review. The current literature review appears limited in scope and depth. It is essential to expand this section to include a more extensive survey of related works in the field of high-dimensional optimization and microarray data analysis. A robust literature review will not only provide a broader context for the proposed algorithm but also help identify gaps and opportunities for future research.

- B, N., & V, I. (2022). Enhanced machine learning based feature subset through FFS enabled classification for cervical cancer diagnosis. International Journal of Knowledge-Based and Intelligent Engineering Systems, 26, 79–89. https://doi.org/10.3233/KES-220009

- Mohammed, M. S., Rachapudy, P. S., & Kasa, M. (2021). Big data classification with optimization driven MapReduce framework. International Journal of Knowledge-based and Intelligent Engineering Systems, 25(2), 173-183.

While the paper has improved in terms of clarity, some mathematical notations and equations can still be challenging to follow. It would be beneficial to simplify complex equations, provide clearer explanations, and possibly offer more intuitive examples to aid in comprehension.

To strengthen the paper's credibility, the authors should consider including a more extensive validation process, including comparisons with other state-of-the-art optimization algorithms. This would help demonstrate the advantages and limitations of the proposed Hybrid Whale Algorithm more effectively.

The paper could benefit from improved visual presentation, such as the use of charts, graphs, and tables to illustrate key points and results. Visual aids can enhance the reader's understanding and engagement with the material.

Reviewer #6: The authors propose an improved RESHWOA algorithm and demonstrate that it outperforms WOA. The manuscript has been improved, but I have a few suggestions:

Please include links to all data used in the study apart from the carcinoma data, or their accession numbers for easy identification; the link szu.edu.cn alone is not enough to locate the data easily.

It would be good if the authors explained how the features used to train the SVM model were encoded, for easy reproducibility.

Reviewer #7: The authors should explain the RESHWOA algorithm in detail and clearly describe how it works on the medical data.

Reviewer #8: The authors have provided thoughtful and comprehensive responses to my comments, leaving me thoroughly satisfied.

Reviewer #9: The main strength of the RESHWOA algorithm is its ability to improve the diversity of the initial population of the WOA algorithm. This can help to prevent premature convergence and improve the performance of the algorithm.

The algorithm is easy to implement and understand, which makes it a good choice for practitioners who are new to metaheuristic algorithms.

Recommendations:

* Consider adding a section comparing RESHWOA with other state-of-the-art algorithms (besides WOA), if possible.

* While passive voice is common in scientific writing, overuse can make the text harder to read. Consider using active voice where it improves clarity.

Example: Instead of "It was evaluated," you could say, "We evaluated." (line 481)

Overall, the paper "A Novel Whale Optimization Algorithm with Discrete Recombination Strategy for Global Optimization" is a well-written and well-presented study. The authors have done a good job of explaining the motivation for the study, the methods used, and the results obtained. The paper makes a significant contribution to the field of metaheuristic algorithms and is a valuable resource for practitioners and researchers.

Upon addressing these minor revisions, I believe the manuscript will be ready for publication. I do not see a need for further rounds of review after these corrections are made. Please correct and submit the revised manuscript for final acceptance.

********** 

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #4: No

Reviewer #5: No

Reviewer #6: No

Reviewer #7: No

Reviewer #8: Yes: Valdecy Pereira

Reviewer #9: Yes: Ahsan ur Rehman

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2024 Mar 11;19(3):e0295643. doi: 10.1371/journal.pone.0295643.r006

Author response to Decision Letter 2


11 Oct 2023

Manuscript

Response to reviewers

Dear Dr. Omar,

Thank you for allowing us to submit a third revised draft of the manuscript "Hybrid Whale Algorithm with Evolutionary Strategies and Filtering for High-Dimensional Optimization: Application to Microarray Cancer Data" for publication in PLOS ONE. We appreciate the time and effort you and the reviewers dedicated to providing feedback on our manuscript and are grateful for the insightful comments and valuable improvements to our paper. We have incorporated most of the suggestions made by the reviewers. Please see below, in blue, for a point-by-point response to the reviewers' comments and concerns. All page numbers refer to the revised manuscript file with tracked changes.

Reviewers' Comments to the Authors:

Response to Reviewer #4:

All the concerns have been addressed well; I thus recommend this manuscript for publication in PLOS ONE.

Thank you very much for your positive feedback and for recommending the publication of our manuscript in PLOS ONE. We appreciate your time and effort in reviewing our work, and we're delighted to hear that all your concerns have been addressed satisfactorily.

Your recommendation is highly valuable to us, and we look forward to sharing our research with the scientific community through PLOS ONE. Your feedback has contributed significantly to the improvement of our manuscript, and we are grateful for your constructive comments.

Response to Reviewer #5:

The paper, titled "Hybrid Whale Algorithm with Evolutionary Strategies and Filtering for High-Dimensional Optimization: Application to Microarray Cancer Data," submitted as PONE-D-23-01358R2, has shown significant improvement from the initial draft. However, several aspects still require further enhancement, with a notable need for an extended literature review.

The revised version of the paper demonstrates commendable progress in terms of content. The authors have refined their algorithm and provided a more comprehensive explanation of the proposed Hybrid Whale Algorithm with Evolutionary Strategies and Filtering. This has resulted in increased clarity regarding the methodology used for high-dimensional optimization in the context of microarray cancer data analysis.

The paper has made significant strides in explaining the Hybrid Whale Algorithm, making it more accessible to a wider readership. The authors have successfully addressed some of the ambiguities present in the previous draft, clarifying the key concepts and steps involved in the algorithm.

The presentation of empirical results has also been improved, with more detailed analysis and visualization of outcomes in the context of microarray cancer data. This contributes to a better understanding of the algorithm's performance and its potential applications.

We sincerely appreciate your thoughtful and comprehensive review of our manuscript. Your feedback has been immensely valuable in guiding us toward further improvements in our work. We are pleased to hear that you have observed significant progress in the revised version of our paper. Your observation regarding the need for an extended literature review is duly noted, and we will certainly work on enhancing this aspect of the manuscript. We understand the importance of providing a comprehensive background to contextualize our research better.

We are encouraged that you found the refined explanation of the algorithm more accessible and the clarification of key concepts commendable. We are committed to explaining the methodology clearly and concisely so that it benefits a broad readership.

Furthermore, your feedback on the presentation of empirical results is well received. We acknowledge the importance of providing a thorough analysis and visualization of outcomes, especially in the context of microarray cancer data analysis.

Once again, we extend our gratitude for your time and expertise in reviewing our work. Your input has been invaluable, and we will diligently address your suggestions to enhance the overall quality of the manuscript.

One critical aspect that still requires substantial improvement is the literature review. The current literature review appears limited in scope and depth. It is essential to expand this section to include a more extensive survey of related works in the field of high-dimensional optimization and microarray data analysis. A robust literature review will not only provide a broader context for the proposed algorithm but also help identify gaps and opportunities for future research.

- B, N., & V, I. (2022). Enhanced machine learning based feature subset through FFS enabled classification for cervical cancer diagnosis. International Journal of Knowledge-Based and Intelligent Engineering Systems, 26, 79–89. https://doi.org/10.3233/KES-220009

- Mohammed, M. S., Rachapudy, P. S., & Kasa, M. (2021). Big data classification with optimization driven MapReduce framework. International Journal of Knowledge-based and Intelligent Engineering Systems, 25(2), 173-183.

We appreciate your input and have taken your suggestion seriously. In response to your feedback, we have substantially expanded the literature review section in the manuscript, as you can see on lines 125 to 156. We have conducted a more comprehensive survey of related works in the fields of high-dimensional optimization and microarray data analysis. This expansion not only provides a broader context for our proposed algorithm but also helps identify gaps and opportunities for future research in a more robust manner.

As per your recommendation, we have expanded the literature review section and also included the two references you mentioned. These additional references provide valuable context to our work and contribute to a more comprehensive overview of the relevant literature.

We hope that the inclusion of these references aligns with your expectations and further strengthens the background and context for our research.

While the paper has improved in terms of clarity, some mathematical notations and equations can still be challenging to follow. It would be beneficial to simplify complex equations, provide clearer explanations, and possibly offer more intuitive examples to aid in comprehension.

We've expanded the explanations of mathematical notations and equations on lines 235 to 253, aiming to enhance clarity. Additionally, we've included more intuitive examples to aid comprehension. We appreciate your valuable input in improving our work.

To strengthen the paper's credibility, the authors should consider including a more extensive validation process, including comparisons with other state-of-the-art optimization algorithms. This would help demonstrate the advantages and limitations of the proposed Hybrid Whale Algorithm more effectively.

We fully understand the significance of a comprehensive validation process, including comparisons with other state-of-the-art optimization algorithms. While we acknowledge the value of such comparisons, we would like to highlight certain constraints we encountered during the preparation of this manuscript.

Firstly, the scope and length of our paper are already substantial, and we were conscious of providing a balanced level of detail and clarity. Given the space constraints in the journal, we opted to focus on providing a thorough presentation of our proposed Hybrid Whale Algorithm, its application to microarray cancer data, and detailed empirical results.

Secondly, conducting extensive comparisons with other optimization algorithms can be a resource-intensive endeavor, requiring a significant amount of computational resources and time. Unfortunately, due to limitations in both time and computational capacity, we were unable to perform a comprehensive benchmark against all relevant algorithms.

However, we are committed to addressing this aspect in future research endeavors and believe it would be a valuable extension of our work. In the current manuscript, we have placed significant emphasis on explaining the methodology, providing detailed results, and illustrating the algorithm's performance in the context of microarray cancer data.

We genuinely appreciate your insightful feedback, which will guide our future research directions, including more extensive validation and comparative analyses.

Once again, we thank you for your time and expertise in reviewing our manuscript.

The paper could benefit from improved visual presentation, such as the use of charts, graphs, and tables to illustrate key points and results. Visual aids can enhance the reader's understanding and engagement with the material.

We've already included charts, graphs, and tables in the paper to illustrate key points and results, enhancing the reader's understanding and engagement with the material. We appreciate your suggestion and are pleased to inform you that these visual aids are an integral part of our manuscript.

Response to Reviewer #6:

Please include links to all data used in the study apart from the carcinoma data, or their accession numbers for easy identification; the link szu.edu.cn alone is not enough to locate the data easily.

Thank you for your query regarding the availability of the data used in our study. We appreciate your thorough review and would like to address your concern.

In our study, we utilized data from various sources, including the carcinoma data, which we have previously provided. The remaining data sets were obtained from https://csse.szu.edu.cn/staff/zhuzx/datasets.html, as noted in our manuscript on line 581. To access the specific data sets used in our study, readers can follow this link, where they will find detailed information on each data set, including accession numbers, descriptions, and any relevant documentation. We believe that this link offers a comprehensive and user-friendly resource for accessing the data we utilized in our research.

It would be good if the authors explained how the features used to train the SVM model were encoded, for easy reproducibility.

Thank you for your suggestion regarding the encoding of features for our SVM model. We have provided a detailed explanation of this process in our manuscript, specifically in lines 495 to 517. This section offers a step-by-step guide for feature encoding, ensuring transparency and reproducibility.

Response to Reviewer #7:

The authors should explain the RESHWOA algorithm in detail and clearly describe how it works on the medical data.

For detailed insight into the RESHWOA algorithm, we have provided an extensive explanation spanning lines 217 to 253. In this section, we describe the recombinative evolutionary strategy, outlining its core principles, its mechanisms, and its specific role within the RESHWOA framework.

Furthermore, for a comprehensive understanding of RESHWOA as a whole, we have dedicated lines 307 to 333 to explaining the algorithm's operation in detail. This section offers step-by-step insights into how RESHWOA functions, including its interactions with the data and optimization processes.

In our study, we address the crucial task of medical diagnosis, where the Support Vector Machine (SVM) classifier plays a pivotal role. The SVM classifier is highly valuable in diagnosing specific disorders because it can effectively distinguish between different classes or conditions. However, to harness its full potential, we must first tune its parameters to achieve the best possible performance, which we measure as the lowest Mean Squared Error (MSE).

One of the challenges we encounter in medical data analysis is the high dimensionality of the data. Medical datasets often comprise a large number of features or variables, which can introduce issues such as noise, redundancy, and the curse of dimensionality. To address these challenges and extract meaningful information from high-dimensional data, we employ data reduction techniques.

Our data reduction techniques are applied to both cancer data and normal data. By reducing the dimensionality of the dataset, we aim to make it more manageable while preserving the essential information necessary for accurate diagnosis.

Subsequently, we proceed with optimizing the SVM parameters based on the reduced data. This optimization step is critical to fine-tune the SVM model specifically for our dataset, ensuring optimal performance and minimal MSE. It allows us to tailor the SVM to the unique characteristics of the medical data under study.

Furthermore, we incorporate a recombinative evolutionary strategy into the whale optimization algorithm, which yields RESHWOA. This strategy is noteworthy because it introduces diversity into the initial population. Diverse populations are advantageous in optimization problems because they explore a broader region of the solution space, potentially leading to better solutions.

In summary, our approach combines data reduction techniques, SVM parameter optimization, and the recombinative evolutionary strategy within RESHWOA to address the challenges of medical data analysis. By optimizing the SVM model and introducing diversity into the initial population, we enhance the accuracy and robustness of our diagnosis process.
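As an illustration of the recombination step mentioned in this response, here is a minimal Python sketch of discrete recombination used to seed an initial population. It is not the authors' implementation: the helper names (`random_population`, `discrete_recombination`, `diversity`, `init_population`), the two-parent per-coordinate coin flip, and the (mu + lambda)-style selection that keeps the best n of parents plus children are assumptions made for illustration, and the sphere function in the usage example only stands in for a real objective such as the SVM's MSE.

```python
import numpy as np


def random_population(n, dim, lb, ub, rng):
    """Uniform random population, as used to initialize the standard WOA."""
    return rng.uniform(lb, ub, size=(n, dim))


def discrete_recombination(parents, n_offspring, rng):
    """Discrete recombination: every coordinate of a child is copied
    from one of two distinct, randomly chosen parents."""
    n, dim = parents.shape
    # Two distinct parent indices for each child.
    idx = np.array([rng.choice(n, size=2, replace=False) for _ in range(n_offspring)])
    # A per-coordinate coin flip decides which parent supplies each gene.
    take_first = rng.random((n_offspring, dim)) < 0.5
    return np.where(take_first, parents[idx[:, 0]], parents[idx[:, 1]])


def diversity(pop):
    """Mean Euclidean distance of individuals to the population centroid."""
    return float(np.mean(np.linalg.norm(pop - pop.mean(axis=0), axis=1)))


def init_population(fitness, n, dim, lb, ub, seed=0):
    """Hypothetical initializer: pool random parents with recombined children,
    then keep the n lowest-fitness individuals (assumed selection rule)."""
    rng = np.random.default_rng(seed)
    parents = random_population(n, dim, lb, ub, rng)
    children = discrete_recombination(parents, n, rng)
    pool = np.vstack([parents, children])
    scores = np.apply_along_axis(fitness, 1, pool)
    return pool[np.argsort(scores)[:n]]


if __name__ == "__main__":
    sphere = lambda x: float(np.sum(x ** 2))  # stand-in objective for the demo
    pop = init_population(sphere, n=30, dim=10, lb=-100.0, ub=100.0)
    print(pop.shape, round(diversity(pop), 2))
```

Because discrete recombination only copies coordinates that already exist in the parents, every child automatically stays inside the search bounds, so no boundary-repair step is needed; whether RESHWOA selects the final population this way is not stated in this response.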

Response to Reviewer #8:

The authors have provided thoughtful and comprehensive responses to my comments, leaving me thoroughly satisfied.

We appreciate your kind words and are delighted to hear that you found our responses satisfactory. Your feedback and insights have been invaluable in improving the quality and clarity of our work.

Response to Reviewer #9:

The main strength of the RESHWOA algorithm is its ability to improve the diversity of the initial population of the WOA algorithm. This can help to prevent premature convergence and improve the performance of the algorithm. The algorithm is easy to implement and understand, which makes it a good choice for practitioners who are new to metaheuristic algorithms.

Thank you for highlighting the strengths of the RESHWOA algorithm in your feedback. We are pleased to hear that you recognize its ability to enhance the diversity of the initial population within the WOA algorithm, preventing premature convergence and improving overall performance. Additionally, we appreciate your observation that the algorithm's ease of implementation and understanding makes it a valuable choice, especially for practitioners new to metaheuristic algorithms.

Your positive assessment of these strengths aligns with our objectives in developing the RESHWOA algorithm, and we are encouraged by your feedback.

Consider adding a section comparing RESHWOA with other state-of-the-art algorithms (besides WOA), if possible.

We fully understand the importance of comprehensive validation, including comparisons with state-of-the-art optimization algorithms. However, due to constraints in the scope, length, computational resources, and time during the preparation of this manuscript, we focused on presenting our Hybrid Whale Algorithm, its application to microarray cancer data, and detailed empirical results. We acknowledge the limitations in not conducting extensive comparisons with other algorithms but express our commitment to addressing this in future research. We value the reviewer's feedback and appreciate their guidance for our future endeavors.

While passive voice is common in scientific writing, overuse can make the text harder to read. Consider using active voice where it improves clarity.

Thank you for your valuable feedback regarding the use of passive voice in our manuscript. We appreciate your input, and we have taken your suggestion to heart.

Upon careful review, we have made concerted efforts to address this issue and have revised the manuscript to incorporate active voice where it enhances clarity without compromising scientific rigor. We believe that these changes have significantly improved the readability and overall quality of the text.

We genuinely appreciate your constructive feedback, which has played a crucial role in enhancing the clarity and coherence of our work.

Attachment

Submitted filename: Respose to the Reviewers.docx

pone.0295643.s004.docx (20.9KB, docx)

Decision Letter 3

Omar A Alzubi

28 Nov 2023

Hybrid Whale Algorithm with Evolutionary Strategies and Filtering for High-Dimensional Optimization: Application to Microarray Cancer Data

PONE-D-23-01358R3

Dear Dr. Hafiz,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Professor Omar A. Alzubi

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #4: (No Response)

Reviewer #5: All comments have been addressed

Reviewer #6: (No Response)

Reviewer #8: (No Response)

Reviewer #9: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #4: (No Response)

Reviewer #5: Yes

Reviewer #6: Yes

Reviewer #8: Yes

Reviewer #9: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #4: (No Response)

Reviewer #5: Yes

Reviewer #6: Yes

Reviewer #8: Yes

Reviewer #9: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #4: (No Response)

Reviewer #5: Yes

Reviewer #6: Yes

Reviewer #8: Yes

Reviewer #9: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #4: (No Response)

Reviewer #5: Yes

Reviewer #6: Yes

Reviewer #8: Yes

Reviewer #9: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #4: (No Response)

Reviewer #5: The paper has been improved according to my comments. Therefore, I recommend accepting it in its current form.

Reviewer #6: (No Response)

Reviewer #8: (No Response)

Reviewer #9: This paper gives valuable insight into the performance of the given methods across different data sets, with improved optimization using RESHWOA.

Instead of using only the minimum MSE, average MSE, and average standard deviation (Avg. Std), I recommend using p-values to test for significant differences between the methods.

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #4: No

Reviewer #5: No

Reviewer #6: No

Reviewer #8: No

Reviewer #9: Yes: Ahsan-ur-Rehman

**********

Acceptance letter

Omar A Alzubi

15 Dec 2023

PONE-D-23-01358R3

PLOS ONE

Dear Dr. Hafiz,

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now being handed over to our production team.

At this stage, our production department will prepare your paper for publication. This includes ensuring the following:

* All references, tables, and figures are properly cited

* All relevant supporting information is included in the manuscript submission

* There are no issues that prevent the paper from being properly typeset

If revisions are needed, the production department will contact you directly to resolve them. If no revisions are needed, you will receive an email when the publication date has been set. At this time, we do not offer pre-publication proofs to authors during production of the accepted work. Please keep in mind that we are working through a large volume of accepted articles, so please give us a few weeks to review your paper and let you know the next and final steps.

Lastly, if your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

If we can help with anything else, please email us at customercare@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Professor Omar A. Alzubi

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Fig. Description of evolutionary strategy.

    (TIF)

    pone.0295643.s001.tif (96.9KB, tif)
    Attachment

    Submitted filename: Response to reviewers.docx

    pone.0295643.s002.docx (32.2KB, docx)
    Attachment

    Submitted filename: Response to the reviewers.docx

    pone.0295643.s003.docx (17.6KB, docx)
    Attachment

    Submitted filename: Respose to the Reviewers.docx

    pone.0295643.s004.docx (20.9KB, docx)

    Data Availability Statement

    The data that support the findings of this study are openly available. The Breast, Colon, Central Nervous System (CNS), Ovarian, and Leukemia data sets were downloaded from https://csse.szu.edu.cn/staff/zhuzx/datasets.html, while the Carcinoma tumour data set was obtained from the Princeton University gene expression project at http://genomics-pubs.princeton.edu/oncology.


    Articles from PLOS ONE are provided here courtesy of PLOS
