. 2023 Feb 17;84:104718. doi: 10.1016/j.bspc.2023.104718

An augmented Snake Optimizer for diseases and COVID-19 diagnosis

Ruba Abu Khurma a, Dheeb Albashish b, Malik Braik b, Abdullah Alzaqebah c, Ashwaq Qasem d, Omar Adwan a,e
PMCID: PMC9935299  PMID: 36811003

Abstract

Feature Selection (FS) techniques extract the most recognizable features for improving the performance of classification methods for medical applications. In this paper, two intelligent wrapper FS approaches based on a new metaheuristic algorithm named the Snake Optimizer (SO) are introduced. The binary SO, called BSO, is built based on an S-shape transform function to handle the binary discrete values in the FS domain. To improve the exploration of the search space by BSO, three evolutionary crossover operators (i.e., one-point crossover, two-point crossover, and uniform crossover) are incorporated and controlled by a switch probability. The two newly developed FS algorithms, BSO and BSO-CV, are implemented and assessed on a real-world COVID-19 dataset and 23 disease benchmark datasets. According to the experimental results, the improved BSO-CV significantly outperformed the standard BSO in terms of accuracy and running time in 17 datasets. Furthermore, it shrinks the COVID-19 dataset’s dimension by 89% as opposed to the BSO’s 79%. Moreover, the adopted operator on BSO-CV improved the balance between exploitation and exploration capabilities in the standard BSO, particularly in searching and converging toward optimal solutions. The BSO-CV was compared against the most recent wrapper-based FS methods; namely, the hyperlearning binary dragonfly algorithm (HLBDA), the binary moth flame optimization with Lévy flight (LBMFO-V3), the coronavirus herd immunity optimizer with greedy crossover operator (CHIO-GC), as well as four filter methods with an accuracy of more than 90% in most benchmark datasets. These optimistic results reveal the great potential of BSO-CV in reliably searching the feature space.

Keywords: Snake Optimizer, Feature selection, COVID-19, Transfer function, Greedy crossover

1. Introduction

The volume of medical data expands steadily to keep up with the rapid changes in medical equipment. Nowadays, machine learning and data science techniques play a vital role in medical diagnosis, particularly in discriminating between various forms of cancer. This diagnostic task is considered a classification task in machine learning, aiming to classify the input medical data into several discrete cases (e.g., benign and malignant). Many of the features collected in the medical domain are redundant, noisy, or irrelevant to classification tasks. Using irrelevant, noisy, and redundant features degrades the performance of classification models in medical diagnosis; as a result, the final decision in this domain becomes shaky and untrustworthy [1]. Therefore, it is necessary to pick only the proper features on which to train the learning model. This boosts the effectiveness of the classifier’s output while reducing the learning model’s training time, particularly when dealing with large datasets. In machine learning, FS methods are considered essential preprocessing algorithms for optimizing the efficiency of classification methods by identifying a meaningful pattern to support the classifiers’ final judgment.

The FS task in the medical domain essentially involves devising a procedure to obtain a suitable subset of features from the original large dataset (i.e., all features). This subset includes features crucial to the current problem while excluding unnecessary or redundant ones. If all features are utilized for the classification of medical tasks, the learning model becomes prone to overfitting due to the curse of dimensionality, and ultimate performance suffers in terms of accuracy or runtime [2]. Therefore, the primary purpose of FS algorithms rests on two objectives: generating a smaller version of the original dataset by selecting the most relevant features and excluding irrelevant and redundant ones, while improving classification performance [3], [4]. Implementing the FS process has a significant effect in avoiding the curse of dimensionality, which makes the learning methods less likely to overfit [5].

Typically, the feature selection process is divided into five stages [5]: initialization, subset discovery/search, subset evaluation, stopping criterion, and final subset validation. The subsets of features are generated in subset discovery, where each subset is chosen from the whole set of features (all dataset features). The search approach, in particular, explores the search space to identify the optimal feature subset. In reality, each feature in the new subset is checked for eligibility using a forward or backward elimination procedure. The quality of the selected features is assessed using a subset evaluation function. For this assignment, the majority of the FS approaches employ a predictive model with a suitable fitness function (i.e., accuracy). The halting condition is utilized to prevent the FS techniques from becoming trapped in an indefinite loop. Most stopping conditions include the maximum number of iterations as a predefined parameter [6].

FS algorithms can be divided into four classes based on the evaluation method used: filter, embedded, wrapper, and hybrid-based methods [6]. Filter-based FS approaches leverage statistical assessment metrics to rank features. Each feature is granted a score based on the designated metric (for example, information gain (IG) or F-score), and the features are then ranked by their scores (ascending or descending). The high-score features are considered the most effective in the current domain. Generally, filter-based methods have no real interaction with the classifier (predictive) model; as a result, they are faster than wrapper and embedded methods. Many filters have been adopted in the literature, such as ReliefF [7], mutual information, absolute cosine (AC), and mRMR [8]. The second type is embedded methods, where the FS process is integrated into the classifier learning to become a single process, such as SVM-recursive feature elimination (SVM-RFE) [9]. A hybridization of embedded feature selection with a filter method can be found in our latest study, where we combined the AC with SVM-RFE to handle redundancy in SVM-RFE, denoted SVM(AC) [1].
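To make the filter idea concrete, the score-and-rank pass can be sketched with information gain (a generic Python illustration, not code from the paper; the helper names are ours):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy H(y) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """IG of a discrete feature: H(y) minus the weighted entropy
    of the labels after splitting on the feature's values."""
    n = len(labels)
    groups = {}
    for v, y in zip(feature_values, labels):
        groups.setdefault(v, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional

def rank_features(feature_columns, labels, k):
    """Rank features by IG (descending) and keep the indices of the top k."""
    scores = [(i, information_gain(col, labels))
              for i, col in enumerate(feature_columns)]
    scores.sort(key=lambda s: s[1], reverse=True)
    return [i for i, _ in scores[:k]]
```

A feature that perfectly separates the classes receives IG equal to H(y), while a constant feature receives 0, so the ranking keeps the former. No classifier is consulted, which is why filter methods are fast.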

In contrast to filter methods, wrapper methods make use of a predictive model (e.g., K-Nearest-Neighbor (KNN)) as part of the assessment phase to evaluate the fitness value of the acquired feature subset. The wrapper-based approach finds an appropriate subset (i.e., solution) for the current task. However, because the total number of potential solutions is 2^n, where n is the number of dimensions, it is difficult to find a near-optimal subset of features in terms of objective fitness due to the vast search space. This problem becomes even more complicated as n increases dramatically in many fields during the data collection phase, so the complexity of these problems grows. This indicates that standard brute-force techniques are unfeasible and that advanced search techniques should be utilized instead. Hence, one of the promising techniques for these problems is Meta-heuristic Algorithms (MAs).

MAs are intelligent algorithms that involve mathematical operations and make several attempts to identify an optimal solution from a set of random solutions with the assistance of the learning model for a particular task [10]. MAs compute either a single objective or multiple objectives to select the optimal solution. To be precise, MAs use the information obtained during the search to guide the optimization process. They usually merge numerous solutions to generate a highly proficient one (e.g., crossover in the Genetic Algorithm (GA)) and thereby avoid getting stuck in local minima. While MAs search for the optimal solution, they usually perform two stages of search: exploration and exploitation [11]. During the exploration stage, the investigation covers a variety of regions to identify more locations of high-quality solutions. In contrast, at the exploitation stage, available resources are focused on a specific search location. The main challenge for MAs is to strike a balance between exploration and exploitation [3].

Using MAs for FS problems is considered a multi-objective task, where the primary goal is to preserve a minimum number of selected features while improving classification performance. However, these two objectives are contradictory, and the optimal decision should be determined by making a trade-off between them. Recently, numerous MAs have been adopted for FS in medical classification and diagnosis applications. They are utilized in wrapper or hybrid wrapper-filter approaches. These include Moth Flame Optimization (MFO) [12], the Coronavirus Herd Immunity Optimizer (CHIO) [2], Particle Swarm Optimization (PSO) [11], [13], the Rat Swarm Optimizer (RSO) [14], and the Mine Blast Algorithm (MBA) [15]. Given the advantages of MAs in FS problems, we chose the Snake Optimizer (SO), one of the most recent MAs, for this work. The SO is a newly invented, continuous, nature-inspired method that mimics snakes’ mating and fighting behaviors. SO includes mating and fighting modes: the former occurs at cold temperatures, while in the latter the snakes fight until each male gets a female and each female gets the best male. If no food is encountered, the exploration stage starts to search for food; in contrast, if the snakes eat the food, this is a case of exploitation.

SO has several particular advantages over other MAs. First, it has a novel natural inspiration: this is the first time the mating behavior of snakes has been proposed for solving optimization problems. Second, experimental results and statistical comparisons prove the effectiveness and efficiency of SO on different landscapes concerning the exploration–exploitation balance and convergence speed [16]. Third, it has high stability and good convergence, and it is simple to implement and parameter-less [17].

However, many optimization tasks (e.g., feature selection) include discrete search space and decision variables. Besides, updating the population impacts the population’s diversity; as a result, the exploration stage needs to be improved to fully explore the search space [15], [18].

In this study, the authors strive to exploit the swarm-based SO algorithm to build a wrapper-based approach that addresses several medical classification problems. This depends on nominating the most valuable and informative medical features in a specific dataset, those required for generating the best medical classification model with higher performance, fewer features, and less running time. Increasing the effectiveness of medical models using AI tools can serve as a low-cost diagnostic aid with fewer side effects on patients. Therefore, SO was adopted to search the feature space for the best feature subset. Since SO was developed to deal with continuous optimization problems and had never before been applied to a discrete search space, in the first experiments the authors generated a new binary version of SO, called BSO, using a common and widely used S-shaped transfer function. The BSO was validated by examining its performance using several evaluation measures, such as accuracy, sensitivity, specificity, fitness value, number of selected features, running time, convergence curves, box plots, convergence speed, Holm’s test, and Friedman’s test. In the second experiment, new evolutionary greedy crossover operators (GC) (i.e., one-point, two-point, and uniform crossovers) are integrated with SO to enhance its explorative power in the feature space; these are controlled by a switch probability. The two newly developed FS algorithms, BSO and BSO-CV, were implemented and assessed on a real-world COVID-19 dataset and 23 disease benchmark datasets. According to the experimental results, the improved BSO-CV significantly outperformed the standard BSO in terms of accuracy and running time on 17 datasets. Furthermore, it shrinks the COVID-19 dataset’s dimension by 89%, as opposed to the BSO’s 79%. Moreover, the adopted operator in BSO-CV improved the balance between exploitation and exploration in the standard BSO, particularly in searching and converging toward optimal solutions. The BSO-CV was compared against the most recent wrapper-based FS methods, namely the hyperlearning binary dragonfly algorithm (HLBDA), the binary moth flame optimization with Lévy flight (LBMFO-V3), and the coronavirus herd immunity optimizer with greedy crossover operator (CHIO-GC), as well as four filter methods, achieving an accuracy of more than 90% on most benchmark datasets. These optimistic results reveal the great potential of BSO-CV in reliably searching the feature space.
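The three crossover operators and the switch probability can be sketched on binary feature vectors as follows (a hedged illustration: the uniform choice among operators and the no-crossover fallback are our assumptions, not the paper’s exact BSO-CV procedure):

```python
import random

def one_point(p1, p2, rng):
    """One-point crossover: cut both parents at one index and swap tails."""
    cut = rng.randrange(1, len(p1))
    return p1[:cut] + p2[cut:]

def two_point(p1, p2, rng):
    """Two-point crossover: take the middle segment from the second parent."""
    a, b = sorted(rng.sample(range(1, len(p1)), 2))
    return p1[:a] + p2[a:b] + p1[b:]

def uniform(p1, p2, rng):
    """Uniform crossover: each bit comes from either parent with equal chance."""
    return [x if rng.random() < 0.5 else y for x, y in zip(p1, p2)]

def crossover(p1, p2, switch_prob, rng):
    """Apply one of the three operators, gated by a switch probability."""
    if rng.random() >= switch_prob:
        return list(p1)  # no crossover on this step
    operator = rng.choice([one_point, two_point, uniform])
    return operator(p1, p2, rng)
```

Mixing segments of two good solutions this way injects diversity into the swarm, which is the explorative boost the GC operators are meant to provide.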

The manuscript is organized as follows: Section 2 reviews related works. The theoretical and mathematical background of SO is presented in Section 3. In Section 4, the details of the new BSO and BSO-CV methods for FS are outlined. Then, in Section 5, the obtained results and related comparisons are reported and discussed, followed by the computational statistical test analysis in Section 6. Finally, the conclusion and several recommendations for future research are given in Section 7.

2. Literature review

Medical applications are a critical research area for machine learning scientists. Recently, many studies that exploit artificial intelligence and data science techniques have assisted in developing medical models. This depends on medical images, patient medical files, and other features to predict disease occurrence at an early stage [5].

2.1. Evolutionary feature selection for disease diagnosis

This subsection sheds light on recent research in the field of medical applications that has developed evolutionary FS models to support physicians. In [19], the authors developed a new model for the early prediction of diabetes. The new model used Grey Wolf Optimization (GWO) and Adaptive Particle Swarm Optimization (APSO) to improve the Multilayer Perceptron (MLP). They were able to reduce the number of selected features and achieve high-performance results: GWO-MLP and APGWO-MLP obtained accuracies of 96% and 97%, respectively.

Mazaher et al. [20] developed a new computer-aided diagnosis (CAD) system to detect different types of cardiac arrhythmia disorders using the ElectroCardioGram (ECG) signal. After the preprocessing steps, different features of the ECG signals were segmented and analyzed. Several metaheuristic algorithms were used in combination with the selected features. The best results were obtained using a multi-objective optimization algorithm called the Non-dominated Sorting Genetic Algorithm (NSGA-II). The feed-forward neural network accuracy for heart disease classification was 98.75%. In [21], the authors proposed a new model based on the Marine Predator Algorithm (MPA) to extract the most significant feature subset and enhance the classification accuracy using the k-Nearest Neighbors (k-NN) classifier. The MPA-KNN was applied to 18 medical datasets and achieved the best results compared with other meta-heuristic algorithms.

The Moth Flame Optimization algorithm (MFO) [22] was one of the optimization algorithms used in developing FS approaches to handle medical diagnosis [12], [18], and [23]. In [12], Khurma et al. generated eight binary MFO versions using eight transfer functions. Then, they applied the Levy flight operator in combination with transfer functions to increase the diversity of the algorithm and support the exploration of the search space. The proposed approach achieved an accuracy of 83% over 23 datasets. In [18], an FS model based on the Moth Flame optimization algorithm (MFO) was proposed. The performance of MFO was improved by adopting an adaptive method to update the position of a solution. The proposed MFO was tested on sixteen medical datasets, and the results showed promising classification results. Another study [23] proposed the MFO using Levy flight and different selection mechanisms: random selection, tournament selection, and roulette wheel selection methods to decrease the bias of the MFO algorithm toward exploitation. The proposed methods were tested using 23 medical data sets. Their results showed an enhanced behavior of MFO in the exploration, convergence, and diversity of solutions.

Dhanusha et al. [24] proposed a new model for Alzheimer’s disease (AD) based on imaging data and the clinical profile. The memetic metaheuristic model was called the Chaotic Shuffled Frog Leaping Algorithm (CSFLA). It used chaotic mapping when a solution in the search space obtained the worst result. CSFLA [24] was a simple model with few parameters; it generated smaller subsets of features with less computation time and the best performance compared with other algorithms when paired with a deep neural network.

Jaddi et al. [25], employed the Cell Separation Algorithm (CSA) for cancer classification based on applying feature selection to microRNA data. The authors enhanced the movement of virtual cells in the CSA to achieve a balance between global and local search. The improved CSA (I-CSA) was tested using 22 classifiers on 25 test functions and four general biological classification problems, and an experiment for feature selection from microRNA data was performed. The accuracy of each cancer type was also compared with the accuracy of 77 classifiers reported in previous studies. The proposed approach obtained 100% accuracy in 25 out of 29 classes.

Abouelmagd et al. [26] applied the Coral Reefs Optimization (CRO) algorithm for FS of breast cancer. This was based on using five classifiers. The algorithm achieved an accuracy of 100% in four algorithms and 99.1% using one classifier. In the study performed by Alweshah [2], an FS approach was applied to determine the most informative subset of features for several medical problems. The Coronavirus Herd Immunity Optimizer (CHIO) was used with and without a Greedy Crossover (GC) operator to improve the exploration of the CHIO. The CHIO and CHIO-GC were applied to 24 medical datasets. The results show that CHIO-GC was better than CHIO in terms of accuracy, the number of selected features, F-measure, and convergence speed. The CHIO-GC obtained an accuracy of 79% on medical benchmark datasets and an accuracy of 93% on the COVID-19 dataset.

Kanya [27] developed a new CAD system that uses mammogram images for early detection of breast cancer. The authors applied feature extraction, feature selection, and other preprocessing steps. They proposed the Weighted Adaptive Binary Teaching Learning Based Optimization (WA-BTLBO) with an XGBoost classifier. The experiments showed high-accuracy results in classifying mammogram images as normal or abnormal.

2.2. Machine learning techniques for tackling COVID-19: Background

Dey [28] proposed a hybrid model applied in two stages. The first stage fine-tuned the parameters of Convolutional Neural Networks (CNNs) to extract features from the COVID-19 patient’s infected lungs. The second stage applied the Manta Ray Foraging-based Golden Ratio Optimizer (MRFGRO) to select the most informative feature subset. The proposed model achieved classification accuracies of 99.15%, 99.42%, and 95.57% on three COVID-19 datasets, respectively.

Aslan [29] presented a classification model that extracted features using a CNN and identified the hyperparameters of the algorithms via Bayesian Optimization. The main contribution of that study was using Artificial Neural Networks (ANNs) for lung image segmentation; it also classified the chest images computed from the COVID-19 Radiography Database. Using classifiers together with the best hyperparameters produced optimized results; the best achieved result was 96.29% using SVM. In [30], an evolutionary algorithm, a deep learning algorithm, and an advanced interpretation model were combined into one framework to help clinical decision-makers deal with different pandemic cases promptly. The feature selection stage was implemented using a genetic algorithm, and a deep artificial neural network achieved an AUC of 0.883.

Bandyopadhyay et al. [31] proposed a two-stage method that applies feature extraction and feature selection for detecting COVID-19 from CT scan images. For feature extraction, the CNN DenseNet architecture was used. Harris Hawks Optimization (HHO), Simulated Annealing (SA), and chaotic maps were combined to perform feature selection. The method was applied to the SARS-COV-2 CT-Scan dataset, and the achieved accuracy was 98.42%.

Deniz et al. [32] used the genetic algorithm and Extreme Learning Machines (MG-ELM), a multi-threaded genetic feature selection algorithm, to predict the risk level of COVID-19 patients. The authors studied the effects of the multi-threaded genetic algorithm implementation with statistical analysis. To verify the efficiency of MG-ELM, they compared their results with traditional and more recent techniques. The proposed algorithm outperformed other algorithms in terms of prediction accuracy.

Kurnaz et al. [33] applied an FS approach using the Crow Learning Algorithm and an ANN. The FS was used to select the relevant features for COVID-19 disease. The experiments were applied to a COVID-19 dataset from a Brazilian hospital. The experimental results showed an accuracy of 94.31%.

In the study conducted by Kukker et al. [34], reinforcement learning was applied to determine COVID-19 using chest X-ray images. The author used the JAYA-Optimization algorithm, Wavelet Transform, feature extraction, and Principal Component Analysis feature reduction technique on X-ray images. The obtained accuracy of the COVID-19 prediction using the proposed method was 87.75%.

Ragab et al. [35], used the ensemble method for the detection of COVID-19. In addition, Gaussian filtering was used to eliminate noise and enhance image quality. Furthermore, a Shark Optimization Algorithm (SOA) with Recurrent Neural Networks (RNN) was applied to extract features. An Improved Bat Algorithm with a Multiclass Support Vector Machine (IBA-MSVM) was used for CT scan classification. The results showed promising classification performance over other approaches.

In [36], the authors introduced a novel HyperLearning Binary Dragonfly Algorithm (HLBDA) to select the most promising features from the COVID-19 dataset. The results showed that the HLBDA achieved higher results than other related algorithms on the same dataset. In [37], the Ensemble Support vector machine with Ludo Game-based Swarm Algorithm (ESLGSA) was used for the COVID-19 prediction from the CT and X-ray images. The proposed approach reduced the physical labeling of the images. The accuracy results were 99.64% while the AUC was 0.9257.

According to a recent study [38], the authors utilized PSO with a convolutional neural network (PSTCNN) to discover COVID-19 using chest computed tomography (CT) medical images. In more detail, the authors used PSO to self-tune the CNN’s hyperparameters to improve diagnosis performance. The proposed PSTCNN achieved an accuracy of 93.99%±1.78% for binary classification. However, the PSTCNN is only utilized for tuning three hyperparameters (i.e., the coefficient that controls the decay rates of the past gradient, the square of the decay rates of the past gradient, and the learning rate).

The authors in [39] utilized deep learning with the self-adaptive Jaya algorithm (WE-SAJ) for COVID-19 CT image diagnosis. The proposed model first extracted wavelet entropy features from the CT images, then utilized the self-adaptive Jaya algorithm to train the model. Finally, they employed a 2-layer feedforward neural network (FNN) as the classifier. The proposed WE-SAJ model achieved more than 85% sensitivity. Although this model was well designed, it requires hyperparameter tuning to improve the obtained results and achieve fast convergence.

The bottom line is that many evolutionary algorithms have been applied in medical applications to diagnose several diseases. The findings showed that applying these intelligent algorithms with FS as a preprocessing stage can enhance the classification results. Concerning COVID-19, several machine-learning algorithms have been utilized to detect COVID-19. However, the critical aspects of medical diagnosis pushed researchers to propose new methodologies and new enhancement strategies to optimize the random search.

According to the No-Free-Lunch theorem, there is still room for proposing new algorithms to diagnose diseases. In this study, the recent SO algorithm is proposed for the first time for medical diagnosis. A binary version is produced to perform FS within a wrapper framework. Furthermore, crossover operators are proposed to enhance the search capability and generate more balance between the exploration and exploitation phases. The target is to enforce more diversity among solutions and help entrapped solutions escape from local minima.

3. Snake optimizer (SO)

The SO algorithm is inspired by the behavior of snakes in nature [16]. The following points show the main SO steps. First, SO initializes a set of random solutions in the search space using Eq. (1):

Snake_i = Snake_min + rand × (Snake_max − Snake_min) (1)

where Snake_i is the location in the search space of the ith solution in the swarm, rand is a random number in [0, 1], and Snake_max and Snake_min are the maximum and minimum values, respectively, for the studied problem.

The population is divided into two parts (50% male and 50% female) using Eqs. (2), (3):

Num_male ≈ Num/2 (2)
Num_female = Num − Num_male (3)

where Num is the size of the population (all snakes), Num_male is the number of male solutions, and Num_female is the number of female solutions.
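Eqs. (1)–(3) translate directly into code; the sketch below (variable names are ours) initializes the swarm and splits it into the two groups:

```python
import random

def init_population(num, dim, x_min, x_max, seed=0):
    """Eq. (1): Snake_i = Snake_min + rand * (Snake_max - Snake_min),
    then Eqs. (2)-(3): split the swarm into male and female halves."""
    rng = random.Random(seed)
    population = [[x_min + rng.random() * (x_max - x_min) for _ in range(dim)]
                  for _ in range(num)]
    num_male = num // 2              # Eq. (2)
    males = population[:num_male]
    females = population[num_male:]  # Eq. (3): Num - Num_male solutions
    return males, females
```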

Get the best solution from the male group (Snake_best_male) and the female group (Snake_best_female), and find the location of the food, L_food. Two other concepts are defined: the temperature (Temperature) and the quantity of food (Quantity), as in Eqs. (4), (5), respectively.

Temperature = exp(−Cur_iter / Tot_iter) (4)

where Cur_iter is the current iteration and Tot_iter is the total number of iterations.

Quantity = Const1 × exp((Cur_iter − Tot_iter) / Tot_iter) (5)

where Const1 is a constant equal to 0.5.
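The two schedules in Eqs. (4)–(5) can be coded directly (using the constants stated in the text):

```python
import math

def temperature(cur_iter, tot_iter):
    """Eq. (4): Temperature = exp(-Cur_iter / Tot_iter),
    decaying from 1 toward exp(-1) over the run."""
    return math.exp(-cur_iter / tot_iter)

def food_quantity(cur_iter, tot_iter, const1=0.5):
    """Eq. (5): Quantity = Const1 * exp((Cur_iter - Tot_iter) / Tot_iter),
    growing toward Const1 as the iterations proceed."""
    return const1 * math.exp((cur_iter - tot_iter) / tot_iter)
```

Early in the run, Quantity stays below the 0.25 threshold (0.5·e⁻¹ ≈ 0.18 at iteration 0), so the swarm explores; later it crosses the threshold and the temperature decides between moving to the food and fighting or mating.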

Exploring the search space (food is not found): this depends on a specified threshold value. If Quantity < 0.25, the solutions search globally by updating their locations with respect to a specified random location in the search space. This is modeled by Eqs. (6)–(9):

Snake_male_i(iter+1) = Snake_male_rand(iter) ± Const2 × AB_male × ((Snake_max − Snake_min) × rand + Snake_min) (6)

where Snake_male_i is the ith male solution, Snake_male_rand is the location of a random male solution, rand is a random number in [0, 1], and AB_male is the ability of the male solution to find the food, computed using Eq. (7):

AB_male = exp(−Fitness_male_rand / Fitness_male_i) (7)

where Fitness_male_rand is the fitness of Snake_male_rand, Fitness_male_i is the fitness of the ith solution in the male group, and Const2 is a constant equal to 0.05.

Snake_female_i(iter+1) = Snake_female_rand(iter) ± Const2 × AB_female × ((Snake_max − Snake_min) × rand + Snake_min) (8)

where Snake_female_i is the ith female solution, Snake_female_rand is the location of a random female solution, rand is a random number in [0, 1], and AB_female is the ability of the female solution to find the food, computed using Eq. (9):

AB_female = exp(−Fitness_female_rand / Fitness_female_i) (9)

where Fitness_female_rand is the fitness of Snake_female_rand and Fitness_female_i is the fitness of the ith solution in the female group.
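Since Eqs. (6)–(9) have the same shape for both groups, a single helper can sketch the exploration step (the random ± sign is our reading of the diversity operator described at the end of this section):

```python
import math
import random

CONST2 = 0.05  # Const2 from Eqs. (6) and (8)

def explore_step(x_rand, fit_rand, fit_i, x_min, x_max, rng):
    """Eqs. (6)-(9): build a new position from a random peer's location,
    scaled by the foraging ability AB = exp(-fit_rand / fit_i)."""
    ability = math.exp(-fit_rand / fit_i)   # Eq. (7) or (9)
    sign = 1 if rng.random() < 0.5 else -1  # the +/- diversity operator
    return [xr + sign * CONST2 * ability * ((x_max - x_min) * rng.random() + x_min)
            for xr in x_rand]
```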

Exploiting the search space (food is found): if the quantity of food is greater than a specified threshold (Quantity > 0.25), then the temperature is checked. If Temperature > 0.6 (hot), the solutions will move to the food only:

Snake_(i,j)(iter+1) = L_food ± Const3 × Temperature × rand × (L_food − Snake_(i,j)(iter)) (10)

where Snake_(i,j) is the location of a solution (male or female), L_food is the location of the best solution, and Const3 is a constant equal to 2.

If Temperature < 0.6 (cold), the snake will be in fight mode or mating mode.

Fight mode:

Snake_male_i(iter+1) = Snake_male_i(iter) ± Const3 × FAM × rand × (Snake_female_best − Snake_male_i(iter)) (11)

where Snake_male_i is the ith male location, Snake_female_best is the location of the best solution in the female group, and FAM is the fighting ability of the male solution.

Snake_female_i(iter+1) = Snake_female_i(iter) ± Const3 × FAF × rand × (Snake_male_best − Snake_female_i(iter)) (12)

where Snake_female_i is the ith female location, Snake_male_best is the location of the best solution in the male group, and FAF is the fighting ability of the female solution.

FAM and FAF can be computed from the following equations:

FAM = exp(−Fitness_female_best / Fitness_i) (13)
FAF = exp(−Fitness_male_best / Fitness_i) (14)

where Fitness_female_best is the fitness of the best solution in the female group, Fitness_male_best is the fitness of the best solution in the male group, and Fitness_i is the fitness of the ith solution.

Mating mode.

Snake_male_i(iter+1) = Snake_male_i(iter) ± Const3 × MAm × rand × (Quantity × Snake_female_i(iter) − Snake_male_i(iter)) (15)
Snake_female_i(iter+1) = Snake_female_i(iter) ± Const3 × MAf × rand × (Quantity × Snake_male_i(iter) − Snake_female_i(iter)) (16)

where Snake_female_i is the location of the ith solution in the female group, Snake_male_i is the location of the ith solution in the male group, and MAm and MAf are the mating abilities of males and females, respectively, computed as follows:

MAm = exp(−Fitness_female_i / Fitness_male_i) (17)
MAf = exp(−Fitness_male_i / Fitness_female_i) (18)

If the egg hatches, the worst male solution and the worst female solution are selected and replaced:

Snake_male_worst = Snake_min + rand × (Snake_max − Snake_min) (19)
Snake_female_worst = Snake_min + rand × (Snake_max − Snake_min) (20)

where Snake_male_worst is the worst solution in the male group and Snake_female_worst is the worst solution in the female group. The diversity operator ± gives each update a chance to increase or decrease a solution’s location, so solutions can move through the search space in all possible directions.
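Putting the thresholds together, the per-iteration mode choice described in this section reduces to a small dispatcher (the fight-versus-mate decision is left abstract, since the text does not specify its probability):

```python
def choose_mode(quantity, temp, q_threshold=0.25, t_threshold=0.6):
    """Select the SO update rule for the current iteration:
    explore when food is scarce, chase the food when it is hot,
    otherwise enter fight or mating mode."""
    if quantity < q_threshold:
        return "explore"           # Eqs. (6)-(9)
    if temp > t_threshold:
        return "move_to_food"      # Eq. (10)
    return "fight_or_mate"         # Eqs. (11)-(18)
```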

4. Proposed binary snake optimizer (BSO)

The FS problem deals with binary solutions that move in a discrete search space. The goal of the FS problem is to find the optimal subset of features: the minimum number of features that yields the maximum classification performance. In this study, SO is converted for the first time into a binary form to tackle the FS problem. The original version of SO was developed to deal with a continuous search space. Generating a binary version of SO requires representing a solution as a binary vector whose elements are restricted to either ‘0’ or ‘1’. Concerning the update strategy in the algorithm, the solutions change their positions in the feature space; this requires transfer functions to guarantee that the solution’s elements remain either ‘0’ or ‘1’.

4.1. S-shaped transfer function

The transfer function used in this study is the sigmoid function of Eq. (21). The main task of the sigmoid function is to generate a probability for each element of a solution. If this probability is greater than the random threshold, the value is ‘0’; otherwise, the value is ‘1’, as presented in Eq. (22), where X_i^d(t) is the ith snake at iteration t in dimension d. Algorithm 1 presents the pseudo-code of the binary version of the SO algorithm (BSO), and Fig. 2 shows the flowchart of the BSO.

S(X_i^d(t)) = 1 / (1 + e^{-X_i^d(t)})  (21)
X_i^d(t+1) = { 0 if rand < S(X_i^d(t));  1 if rand >= S(X_i^d(t)) }  (22)
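Eqs. (21) and (22) can be sketched as a small helper; this is a minimal illustration, not the authors' implementation:

```python
import math
import random

def sigmoid(x):
    # Eq. (21): S-shaped transfer function mapping a continuous
    # position component to a probability in (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

def binarize(position, rng=random.Random(42)):
    # Eq. (22): the element becomes 0 when rand < S(x), 1 otherwise.
    return [0 if rng.random() < sigmoid(x) else 1 for x in position]
```

Applying `binarize` after each continuous position update keeps every solution element in {0, 1} as required by the FS search space.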

Fig. 2. The flowchart of the proposed BSO for feature selection.

In the FS approach presented in this work, the transfer function (TF) displayed in Fig. 1, which implements Eq. (21), represents the probability of changing the positions of the elements.


Fig. 1. The sigmoidal transfer function for converting continuous data to discrete.

4.2. BSO for feature selection

To prepare the BSO for the FS problem, two main aspects should be considered: the solution representation and the fitness function. The FS problem requires initializing a solution using a binary vector. The length of this vector is the dimensionality of the problem. Hence, each bit of this vector represents a feature in a dataset. The values of the elements are either ‘0’ or ‘1’. ‘0’ means that the corresponding feature is not selected, while ‘1’ means the corresponding feature is selected. Fig. 3 shows the binary representation of the solutions.
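The representation described above can be illustrated with a tiny helper that maps a binary solution vector to the indices of the selected features (an illustrative sketch only):

```python
def selected_features(mask):
    """Map a binary solution vector to the indices of selected features
    ('1' = the feature is selected, '0' = it is not)."""
    return [i for i, bit in enumerate(mask) if bit == 1]

# Example: a 6-feature dataset where features 0, 2 and 5 are kept.
assert selected_features([1, 0, 1, 0, 0, 1]) == [0, 2, 5]
```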

Fig. 3. The binary representation of the solutions for the feature selection task.

Evaluating all feature subsets of a dataset, as brute-force algorithms do, makes the search algorithm's running time exponential: 2^n subsets must be examined, where n is the number of features. Hence, reducing the number of features increases the efficiency of the search algorithm. For this reason, an FS algorithm is multiobjective: its main target is to find a solution with the minimum number of features and the maximum classification performance. In this study, the K-nearest neighbor (K-NN) algorithm was used as the classifier, with the parameter k set to 5 [40]. The second aspect of the FS problem is the fitness function. Solutions are evaluated based on the number of selected features and the classification error rate, as in Eq. (23), where γ_R(D) is the classification error rate, |SF| is the size of the selected feature subset, |AF| is the total number of features in the dataset, and α and β are weights balancing the two objectives.

Fitness = α · γ_R(D) + β · |SF| / |AF|  (23)
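Eq. (23) can be sketched as follows; the section does not state the exact weight values, so `alpha = 0.99` with `beta = 1 - alpha` (a common convention in wrapper FS) is an assumption:

```python
def fitness(error_rate, n_selected, n_total, alpha=0.99):
    """Eq. (23): weighted sum of the classification error rate and the
    feature-reduction ratio. alpha = 0.99 (beta = 1 - alpha) is an
    assumed setting, not taken from the paper."""
    beta = 1.0 - alpha
    return alpha * error_rate + beta * (n_selected / n_total)
```

Lower fitness is better: the first term rewards accuracy, the second rewards smaller feature subsets.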

4.3. Evolutionary crossover operators

The crossover operator is one of the primary evolutionary operators widely used to enhance swarm-based algorithms. Integrating it into the structure of a swarm-based algorithm yields greater exploration of the search space: solutions are re-positioned and distributed to undiscovered regions. Strengthening the diversity of the algorithm helps the optimizer alleviate the local-minima problem and move closer to the global best solution. The resulting algorithm is abbreviated BSO-CV. Eq. (24) shows the crossover function: each solution S_i in the search space is combined with the position of one of the fittest solutions in the swarm, and the roulette-wheel selection operator is used to pick that fittest solution S_w.

S_i(iter + 1) = S_i(iter) ⊗ S_w(iter)  (24)

Three types of crossover operators are integrated with the SO algorithm; the roulette-wheel selection operator chooses randomly among them in each run of the SO.

  • One-point crossover: randomly selects a single point in the current solution; the elements after that point are exchanged between the two solutions. The crossover is applied to the current solution S_i and the best solution S_w. It occurs when the random number r ∈ [0, 0.33].

  • Two-point crossover: randomly selects two points in the current solution; the elements between the two points are interchanged between S_i and S_w. It occurs when r ∈ (0.33, 0.67].

  • Uniform crossover: the elements of the current solution S_i and the best solution S_w are shuffled according to a pre-determined ratio. For example, a ratio of 30% means that 30% of the solution's elements are exchanged between the two solutions. It occurs when r ∈ (0.67, 1].
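The three operators and the switch probability described above can be sketched as follows; this is a minimal illustration on list-encoded binary solutions, not the authors' code (the 30% uniform-exchange ratio follows the example in the text):

```python
import random

def one_point(a, b, rng):
    p = rng.randrange(1, len(a))            # single cut point
    return a[:p] + b[p:]                    # exchange the tail segment

def two_point(a, b, rng):
    i, j = sorted(rng.sample(range(1, len(a)), 2))
    return a[:i] + b[i:j] + a[j:]           # swap the middle segment

def uniform(a, b, rng, ratio=0.3):
    # exchange roughly `ratio` of the elements with the fittest solution
    return [bi if rng.random() < ratio else ai for ai, bi in zip(a, b)]

def crossover(current, fittest, rng=random.Random(1)):
    r = rng.random()                        # switch probability
    if r <= 0.33:
        return one_point(current, fittest, rng)
    elif r <= 0.67:
        return two_point(current, fittest, rng)
    return uniform(current, fittest, rng)
```

Each call draws r once and dispatches to one operator, mirroring the probability ranges given in the bullet list.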

Fig. 4 illustrates the techniques followed by the different crossover operators. Algorithm 3 shows the pseudocode of the greedy crossover operator.


Fig. 4. Evolutionary crossover operators: (i) one-point crossover; (ii) two-point crossover; (iii) uniform crossover.

4.4. Complexity analysis of BSO and BSO-CV

The time complexity of the BSO and BSO-CV algorithms was analyzed using Big-O notation (i.e., the worst case). In particular, the complexity analysis of these methods for feature selection tasks depends primarily on the initialization process, the dataset dimension (d), the cost of the fitness function (c), the number of iterations of the optimization algorithm (K), the population size n (i.e., the sum of the male and female populations, n_m + n_f), and the number of running experiments (V). In addition, the S-shaped transfer function is used to produce the binary versions of the BSO and BSO-CV. Based on these notations, the general computational complexity of the BSO and BSO-CV can be formulated in Big-O terms as follows:

O(BSO) = O(init.) + O(K × pop. update) + O(K × fitness eval.) + O(K × selection)  (25)

By calculating the Big-O cost of each phase in Eq. (25), the time complexities of BSO and BSO-CV can be represented as follows:

O(BSO) = O((n_f + n_m)d) + O(VK(n_f + n_m)d) + O(VK(n_f + n_m)c) + O(VK(n_f + n_m)d)  (26)
O(BSO-CV) = O((n_f + n_m)d) + O(VK(n_f + n_m)d) + O(2VK(n_f + n_m)c + VK(n_f + n_m)d) + O(VK(n_f + n_m)d)  (27)

As shown in Eq. (26), the complexity depends mainly on the number of iterations and the size of the population. Since (n_f + n_m)d ≪ VK(n_f + n_m)d and (n_f + n_m)d ≪ VKc(n_f + n_m), the term (n_f + n_m)d can be ruled out of Eq. (26). Thus, the time complexity of the BSO can be expressed as follows:

O(BSO) ≈ O(VK(n_f + n_m)d + VK(n_f + n_m)c)  (28)

For the BSO-CV, the time complexity is the same as that of the BSO except that the crossover (CV) operator is added in each iteration; its time complexity is given in Eq. (27).

5. Experimental results and discussion

5.1. Experiment settings and parameters setup

Some preliminary experiments were carried out to determine the input parameters that enabled the proposed method to produce better output. To ensure fairness, the algorithm configurations were identical throughout the experiments. The classifier used within the BSO wrapper framework is K-nearest neighbor (KNN). KNN takes each unclassified data instance in the feature space as input and uses a similarity measure to assign it to a particular category. This labeling method is a form of supervised learning commonly used in disease diagnosis. In this study, K = 5 is used for voting, and the class-membership decision is based on the majority of votes. The parameter settings are: 30 runs, 100 iterations, and a population size of 100.
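A minimal majority-vote K-NN with k = 5, as described above, might look like this; it is a stand-in sketch using Euclidean distance, not the authors' implementation:

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, x, k=5):
    """Majority-vote K-NN (k = 5, as in the paper's setup).
    train_X: list of feature vectors; train_y: their class labels;
    x: the unclassified instance to label."""
    dists = sorted(
        (math.dist(p, x), label) for p, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]
```

In the wrapper framework, `train_X` would contain only the columns selected by the current binary solution, so the classifier's error feeds directly into the fitness function.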

5.2. Evaluation measures

The proposed BSO and BSO-CV are evaluated using accuracy, the number of selected features, running time, sensitivity and specificity, convergence curves, boxplots, and the T-test. Descriptions of the accuracy, sensitivity, and specificity measures follow, along with their formulas and meaning for disease diagnosis. Eqs. (29), (30), and (31) show the mathematical formulas of classification accuracy, sensitivity, and specificity, respectively.

Accuracy = (TP + TN) / (TP + TN + FP + FN)  (29)

where:

  • True positives (TPs): instances that are actually sick (have the disease) and that the model diagnoses as sick.

  • True negatives (TNs): instances that are actually well (do not have the disease) and that the model diagnoses as well.

  • False positives (FPs): instances that are actually well but that the model diagnoses as sick.

  • False negatives (FNs): instances that are actually sick but that the model diagnoses as well.

Sensitivity = TP / (TP + FN)  (30)
Specificity = TN / (TN + FP)  (31)
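Eqs. (29)–(31) can be computed directly from the confusion-matrix counts:

```python
def diagnosis_metrics(tp, tn, fp, fn):
    """Eqs. (29)-(31): accuracy, sensitivity (true-positive rate) and
    specificity (true-negative rate) from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }
```

For example, a screening model with 40 TPs, 50 TNs, 5 FPs, and 5 FNs has an accuracy of 0.9, while its sensitivity and specificity reveal how the errors split between missed patients and false alarms.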

5.3. Description of the benchmark datasets

Table 1 shows the datasets used in this study: 23 medical benchmark datasets plus a real COVID-19 dataset. Of the benchmark datasets, twelve were downloaded from the UCI repository (Diagnostic, Original, Prognostic, Coimbra, BreastEW, Retinopathy, Dermatology, ILPD-Liver, Lymphography, Parkinsons, ParkinsonC, and Prostate). Seven datasets were downloaded from KEEL (SPECT, Cleveland, HeartEW, Hepatitis, SAHeart, Spectfheart, and Thyroid0387). Two datasets (Heart and Pima-diabetes) were downloaded from Kaggle. The remaining three datasets (Leukemia, Colon, and Prostate_GE) were downloaded from the scikit-feature repository (https://jundongl.github.io/scikit-feature/datasets.html).

Table 1.

Medical benchmark datasets.

Number Dataset Number of features Number of instances Number of classes
1 Diagnostic 30 569 2
2 Original 9 699 2
3 Prognostic 33 194 2
4 Coimbra 9 115 2
5 BreastEW 30 596 2
6 Retinopathy 19 1151 2
7 Dermatology 34 366 6
8 ILPD-Liver 10 583 2
9 Lymphography 18 148 4
10 Parkinsons 22 194 2
11 ParkinsonC 753 755 2
12 SPECT 22 267 2
13 Cleveland 13 297 5
14 HeartEW 13 270 2
15 Hepatitis 18 79 2
16 SAHeart 9 461 2
17 Spectfheart 43 266 2
18 Thyroid0387 21 7200 3
19 Heart 13 302 5
20 Pima-diabetes 9 768 2
21 Leukemia 7129 72 2
22 Colon 2000 62 2
23 Prostate_GE 5966 102 2

5.4. A real world COVID-19 dataset

Recently, the world has suffered from the spread of the coronavirus disease, caused by a contagious virus. The disease spread so widely that it was classified as a pandemic. It caused many deaths, and the number of patients exceeded the capacity of hospitals to accommodate them. Machine learning techniques have been used to treat the disease and control its spread [41], [42]. In this study, the real COVID-19 dataset was downloaded from https://github.com/AtharvaPeshkar/Covid-19-Patient-Health-Analytics. The purpose is to validate the BSO and BSO-CV by examining their ability to detect the disease. Table 2 shows the features of the dataset. This study intends to predict the death and recovery conditions from the given factors. Patients' records with missing values for both the “death” and “recov” status were removed from the main dataset. For the training and testing methodology, the dataset was split evenly: 50% training and 50% testing.
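The preprocessing described above can be sketched as follows; the field names `death` and `recov` follow the text, and the random shuffle before the 50/50 split is an assumption about how the split was performed:

```python
import random

def prepare_split(records, rng=random.Random(0)):
    """Drop patients whose 'death' and 'recov' status are both missing,
    then split the remainder evenly into train and test halves.
    The shuffle-based split is an assumption, not the authors' code."""
    usable = [r for r in records
              if r.get("death") is not None or r.get("recov") is not None]
    rng.shuffle(usable)
    half = len(usable) // 2
    return usable[:half], usable[half:]
```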

Table 2.

Covid-19 real dataset.

No Feature name Description
1 Id Patient identifier
2 Location Patient location (local address)
3 Country Country of origin of the patient
4 Gender Gender of the patient
5 Age Age of the patient
6 Sym_on Date the patient shows symptoms
7 Hosp_vis Date the patient visits hospital
8 vis_wuhan The patient has visited Wuhan
9 From_wuhan The patient is from Wuhan
10 Symptom1 A symptom presented by the patient
11 Symptom2 A symptom presented by the patient
12 symptom3 A symptom presented by the patient
13 Symptom4 A symptom presented by the patient
14 Symptom5 A symptom presented by the patient
15 Symptom6 A symptom presented by the patient

5.5. Results and discussion

As already stated, FS approaches try to reduce the dimension of the problem space by choosing the most informative features that perform at the most significant level. The binary version of the SO, known as BSO, is employed on the datasets used for this purpose. Table 3 depicts the classification accuracy, sensitivity, specificity, and precision, as well as the number of selected features and the best average fitness value (AVE). Additionally, the standard deviation (STD) of each measurement was recorded. The BSO demonstrates notable improvements by achieving strong results with fewer features: it reaches a classification accuracy greater than 90% on 11 datasets, and greater than 95% on 8 of them. The suggested technique reduced the data by more than 50% on 22 datasets, and on four datasets (ParkinsonC, Leukemia, Colon, and Prostate_GE) the reduction rate exceeded 90%, which minimized complexity and saved resources. For the COVID-19 dataset, the BSO correctly identified the occurrences at a 95% rate while employing, on average, just 3.5 features, yielding a 76% reduction rate, as displayed in Table 10.

Table 3.

The results of SO without feature selection.

Benchmark | Accuracy AVE (STD) | Sensitivity AVE (STD) | Specificity AVE (STD) | Time AVE (STD)
Diagnostic 0.7499 (0.0425) 0.4796 (0.0991) 0.9098 (0.0356) 28.9987 (3.9533)
Original 0.9691 (0.0087) 0.9699 (0.0133) 0.9698 (0.0256) 54.5091 (2.5939)
Prognostic 0.7184 (0.0587) 0.9193 (0.0754) 0.6956 (0.0889) 75.9876 (4.9834)
Coimbra 0.5145 (0.0976) 0.4239 (0.1815) 0.6137 (0.1573) 29.8974 (39.9873)
Retinopathy 0.6523 (0.0303) 0.6742 (0.0517) 0.6333 (0.0407) 149.9875 (29.7789)
Dermatology 0.6934 (0.0423) 0.8332 (0.0459) 0.9211 (0.0772) 30.9987 (2.3456)
ILPD-Liver 0.7728 (0.0444) 0.8060 (0.9281) 0.2066 (0.0735) 55.1002 (2.7344)
Lymphography 0.9043 (0.0739) 0.4162 (0.5348) 0.7000 (0.0948) 59.9567 (49.4922)
Parkinsons 0.8471 (0.0492) 0.5703 (0.1452) 0.9410 (0.0546) 72.5678 (0.9988)
ParkinsonC 0.7307 (0.0295) 0.2707 (0.0595) 0.8938 (0.0313) 77.6543 (5.9866)
SPECT 0.6717 (0.0448) 0.7249 (0.0625) 0.5971 (0.0847) 96.5789 (25.5431)
Cleveland 0.4712 (0.0569) 0.1969 (0.0291) 0.8142 (0.0128) 15.8569 (5.2949)
HeartEW 0.8284 (0.0412) 0.8481 (0.0635) 0.8064 (0.0682) 56.9973 (34.7790)
Hepatitis 0.8778 (0.0759) 0.0789 (0.2138) 0.9533 (0.0491) 45.7563 (2.7331)
SAHeart 0.6239 (0.0516) 0.7774 (0.0477) 0.3216 (0.1003) 97.4367 (5.7791)
Spectfheart 0.7692 (0.0505) 0.3404 (0.1621) 0.8869 (0.0438) 104.8872 (3.4456)
Thyroid0387 0.9382 (0.0065) 0.5463 (0.0329) 0.7536 (0.0129) 155.9893 (7.8893)
Heart 0.8567 (0.5679) 0.8087 (0.0994) 0.6432 (0.3451) 29.6388 (2.4167)
Pima-diabetes 0.7107 (0.0287) 0.8021 (0.0347) 0.5345 (0.0554) 69.8896 (5.7689)
Leukemia 0.8714 (0.0868) 0.9773 (0.0495) 0.6872 (0.2041) 67.9987 (5.7654)
Prostate_GE 0.8711 (0.0535) 0.8813 (0.0941) 0.8594 (0.0763) 250.6578 (9.6678)
BreastEW 0.9596 (0.0154) 0.9824 (0.0158) 0.9212 (0.0404) 7.8890 (0.6789)
Colon 0.7528 (0.1285) 0.8701 (0.1508) 0.6002 (0.2365) 9.9865 (0.8976)

Table 10.

Results of BSO and BSO-CV on COVID-19.

Results of BSO on the COVID-19 dataset

Evaluation measure Average Standard deviation Minimum Maximum
Accuracy 0.9378 0.0136 0.9167 0.9500
# Selected features 3.1000 2.5582 2.5000 3.5000
Fitness value 0.1371 0.0054 0.1304 0.1390
Running time 21.6646 0.0412 21.6157 21.7439
Sensitivity 0.2706 0.0610 0.1765 0.3571
Specificity 0.9631 0.0197 0.9314 1.0000

Results of BSO-CV on the COVID-19 dataset

Evaluation measure Average Standard deviation Minimum Maximum
Accuracy 0.9560 0.0123 0.9167 0.9661
#Selected features 1.7000 2.4967 1.5000 2.4400
Fitness value 0.1351 0.0039 0.1299 0.1366
Running time 4.4075 0.1031 4.2226 4.6160
Sensitivity 0.2973 0.1132 0.1111 0.5000
Specificity 0.9664 0.0231 0.9118 0.9905

These results stem from how snakes behave, as conditions such as food quantity and temperature lead snakes to change their locations. Additionally, the male and female groups can investigate more positions in the search space during the initialization phase.

In general, population-based evolutionary algorithms can fall into local optima because solutions are investigated in a guided manner around the best solution's space. As a population-based evolutionary algorithm, SO can likewise become trapped in a local-optimum region. Therefore, we use the well-known crossover (CV) technique to avoid trapping in local optima and to balance the exploration and exploitation phases. Furthermore, by using the CV technique and the position-update operator, more positions in the search space are discovered, allowing the search to move beyond the local-optimum region. Table 4 shows a classification accuracy comparison between the BSO and the augmented BSO with a crossover operator, called BSO-CV. It demonstrates that the BSO-CV outperforms the standard BSO on 16 datasets and matches its classification accuracy on 6 datasets. Furthermore, on six datasets where the average accuracies are similar (Original, Coimbra, Dermatology, Pima-diabetes, Leukemia, and Colon), the minimum accuracy of the BSO-CV is better than that of the BSO, which indicates the effect of covering more positions in the search space. Thus, the new solutions generated by the CV operators play an essential role in widening the search ability of the algorithm and strengthening its power to avoid local optima.

Table 4.

Results of the proposed BSO vs augmented BSO with crossover in terms of average, standard deviation, minimum, maximum accuracy.

Benchmark | Average (BSO, BSO-CV) | Standard deviation (BSO, BSO-CV) | Minimum (BSO, BSO-CV) | Maximum (BSO, BSO-CV)
Diagnostic 0.9912 0.9930 0.0091 0.0124 0.9549 0.9725 0.9800 1.0000
Original 0.9886 0.9886 0.0060 0.0060 0.9357 0.9557 0.9700 1.0000
Prognostic 0.8368 0.8474 0.0166 0.0299 0.7995 0.8421 0.8733 0.8947
Coimbra 0.8182 0.8182 0.0000 0.0000 0.7582 0.7782 0.7910 0.8182
Retinopathy 0.7348 0.7391 0.0074 0.0100 0.6917 0.7004 0.7210 0.7478
Dermatology 1.0000 1.0000 0.0000 0.0000 0.9290 0.9400 0.9600 1.0000
ILPD-Liver 0.7949 0.7966 0.0000 0.0054 0.7397 0.7466 0.7711 0.7966
Lymphography 0.9452 0.9729 0.0628 0.0351 0.8571 0.9286 0.95550 1.0000
Parkinsons 0.9845 0.9850 0.0250 0.0242 0.9474 0.9500 0.9676 1.0000
ParkinsonC 0.7833 0.7873 0.0158 0.0131 0.7332 0.7632 0.7900 0.8133
SPECT 0.8047 0.8180 0.0479 0.0266 0.7989 0.8889 0.8419 0.8889
Cleveland 0.6817 0.7041 0.0390 0.0145 0.6097 0.7737 0.7022 0.7241
HeartEW 0.9148 0.9370 0.0250 0.0250 0.6800 0.7078 0.9411 0.9630
Hepatitis 0.9750 0.9875 0.0530 0.0395 0.8362 0.9350 0.98100 1.0000
SAHeart 0.7478 0.7500 0.0112 0.0185 0.7391 0.8730 0.7509 0.7826
Spectfheart 0.9051 0.9084 0.0406 0.0267 0.7091 0.8846 0.9445 0.9615
Thyroid0387 0.9894 0.9881 0.0032 0.0032 0.9145 0.9400 0.9831 0.9944
Heart 0.9074 0.9259 0.0000 0.0195 0.7082 0.7682 0.9033 0.9259
Pima-diabetes 0.8182 0.8182 0.0000 0.0000 0.8344 0.8959 0.7982 0.8182
Leukemia 1.0000 1.0000 0.0000 0.0000 0.9066 0.9321 0.9870 1.0000
Prostate_GE 0.9900 1.0000 0.0316 0.0000 0.8571 0.8571 0.9822 1.0000
BreastEW 0.9895 0.9912 0.0092 0.0123 0.8093 0.9080 0.9810 1.0000
Colon 0.9571 0.9571 0.0690 0.0690 0.9000 0.9049 0.9744 1.0000

On the other hand, based on the number of selected features, Table 5 shows that the BSO-CV achieves a higher reduction rate by selecting fewer features on 14 datasets compared with the BSO. Moreover, the BSO-CV surpasses the BSO in reduction rate on the COVID-19 dataset, shrinking the dataset's dimension by 89% as opposed to the BSO's 79%, as shown in Table 10. As a result, BSO and BSO-CV show a highly significant reduction rate for solving FS problems.

Table 5.

Results of the proposed BSO vs augmented BSO with crossover in terms of the average, standard deviation, minimum and maximum number of selected features.

Benchmark | Average (BSO, BSO-CV) | Standard deviation (BSO, BSO-CV) | Minimum (BSO, BSO-CV) | Maximum (BSO, BSO-CV)
Diagnostic 14.4000 13.0000 2.6331 1.4907 9.0000 11.0000 18.0000 15.0000
Original 3.3000 3.2000 0.4830 0.4216 3.0000 3.0000 4.0000 4.0000
Prognostic 15.5000 14.2000 2.6352 2.2509 12.0000 10.0000 21.0000 18.0000
Coimbra 6.1000 6.0000 0.3162 0.0000 6.0000 6.0000 7.0000 6.0000
Retinopathy 11.2000 10.5000 2.2010 1.9003 8.0000 8.0000 14.0000 14.0000
Dermatology 17.3000 17.3000 1.7670 1.7670 14.0000 15.0000 19.0000 19.0000
ILPD-Liver 4.0000 3.6000 0.4714 0.5164 3.0000 3.0000 5.0000 4.0000
Lymphography 9.4000 9.1000 3.3015 1.9120 16.0000 7.0000 27.0000 12.0000
Parkinsons 9.1000 8.5000 1.9692 1.3540 7.0000 4.0000 13.0000 10.0000
ParkinsonC 461.4000 466.1000 23.5523 20.1299 425.0000 437.0000 496.0000 508.0000
SPECT 12.5000 12.6000 1.5811 1.8379 9.0000 10.0000 15.0000 15.0000
Cleveland 6.4000 6.5000 0.9661 0.9718 5.0000 5.0000 8.0000 8.0000
HeartEW 7.3000 6.8000 1.4181 0.6325 5.0000 6.0000 9.0000 8.0000
Hepatitis 5.9000 6.9000 1.8529 1.9120 2.0000 4.0000 8.0000 9.0000
SAHeart 3.1000 2.6000 0.5676 0.5164 2.0000 2.0000 4.0000 3.0000
Spectfheart 22.7000 25.0000 3.3015 3.0551 16.0000 19.0000 27.0000 30.0000
Thyroid0387 10.0000 9.5000 0.9428 1.8409 9.0000 6.0000 12.0000 12.0000
Heart 5.2000 4.2000 0.9189 1.3166 4.0000 3.0000 6.0000 6.0000
Pima-diabetes 4.0000 4.0000 0.0000 0.0000 4.0000 4.0000 4.0000 4.0000
Leukemia 3502.8000 3498.1000 32.8830 22.0880 3453.0000 3468.0000 3541.0000 3533.0000
Prostate_GE 2969.8000 2982.5000 28.9513 48.0954 2934.0000 2898.0000 3028.0000 3055.0000
BreastEW 15.8000 16.0000 1.6193 2.7889 13.0000 12.0000 19.0000 20.0000
Colon 965.3000 970.3000 15.8749 12.6232 937.0000 945.0000 986.0000 994.0000

Furthermore, since the BSO-CV selects fewer features while maintaining competitive accuracy compared with the BSO, it achieves the lowest fitness values during the algorithm's iterations; the reduction rate and the classification error rate are the two objectives of the fitness function, as shown in Eq. (23). Table 5 shows the best number of selected features, while the best fitness values for both BSO and BSO-CV are illustrated in Table 6.

Table 6.

Results of the proposed BSO vs augmented BSO with crossover in terms of the average, standard deviation, minimum, and maximum fitness values.

Benchmark | Average (BSO, BSO-CV) | Standard deviation (BSO, BSO-CV) | Minimum (BSO, BSO-CV) | Maximum (BSO, BSO-CV)
Diagnostic 0.0230 0.0217 0.0033 0.0005 0.0204 0.0211 0.0321 0.0224
Original 0.0122 0.0114 0.0029 0.0022 0.0105 0.0105 0.0176 0.0176
Prognostic 0.1869 0.1813 0.0117 0.0112 0.1625 0.1593 0.2122 0.1877
Coimbra 0.0929 0.0928 0.0004 0.0000 0.0928 0.0928 0.0939 0.0928
Retinopathy 0.2353 0.2337 0.0015 0.0010 0.2329 0.2323 0.2382 0.2355
Dermatology 0.0051 0.0051 0.0005 0.0004 0.0041 0.0044 0.0056 0.0056
ILPD-Liver 0.2259 0.2255 0.0005 0.0005 0.2249 0.2249 0.2269 0.2259
Lymphography 0.0598 0.0528 0.0175 0.0235 0.0386 0.0067 0.0749 0.0738
Parkinsons 0.0598 0.0851 0.0168 0.0106 0.0549 0.0789 0.1056 0.1056
ParkinsonC 0.2297 0.2245 0.0094 0.0030 0.2220 0.2224 0.2483 0.2291
SPECT 0.1738 0.1608 0.0179 0.0087 0.1348 0.1540 0.1927 0.1736
Cleveland 0.1738 0.3390 0.0089 0.0010 0.3346 0.3172 0.3578 0.3520
HeartEW 0.1670 0.1611 0.0079 0.0098 0.1521 0.1513 0.1719 0.1712
Hepatitis 0.0218 0.0100 0.0292 0.0190 0.0028 0.0022 0.0652 0.0641
SAHeart 0.2940 0.2934 0.0006 0.0006 0.2928 0.2928 0.2950 0.2939
Spectfheart 0.0948 0.0841 0.0167 0.0117 0.0613 0.0622 0.1164 0.0995
Thyroid0387 0.0208 0.0192 0.0025 0.0026 0.0169 0.0152 0.0254 0.0222
Heart 0.1840 0.1813 0.0114 0.0078 0.1700 0.1700 0.2067 0.1867
Pima-diabetes 0.1862 0.1862 0.0000 0.0000 0.1862 0.1862 0.1862 0.1862
Leukemia 0.1463 0.1463 0.0000 0.0000 0.1463 0.1463 0.1464 0.1464
Prostate_GE 0.0545 0.0545 0.0000 0.0000 0.0544 0.0544 0.0546 0.0546
BreastEW 0.0079 0.0062 0.0038 0.0027 0.0053 0.0040 0.0138 0.0134
Colon 0.2524 0.2523 0.0001 0.0001 0.2522 0.2522 0.2525 0.2524

Evolutionary algorithms are iterative by nature: they begin with random solutions and iteratively update them to generate new ones. Then, based on the fitness function, the fitness of the current best solution is compared with that of newly generated solutions, and the best is kept. For the proposed BSO and BSO-CV, Table 7 presents the running times of both algorithms. The BSO-CV clearly takes less time than the BSO on 17 datasets. Furthermore, the BSO-CV saves computational resources and time by reaching the best fitness value in early iterations, since it uses a crossover operator and covers more positions in the search space. In other words, it balances the exploration and exploitation phases, which gives it the power to avoid falling into local optima.

Table 7.

Results of the proposed BSO vs augmented BSO with crossover in terms of the average, standard deviation, minimum, and maximum running times.

Benchmark | Average (BSO, BSO-CV) | Standard deviation (BSO, BSO-CV) | Minimum (BSO, BSO-CV) | Maximum (BSO, BSO-CV)
Diagnostic 24.0234 20.1709 1.2632 0.0919 22.3863 20.0304 25.7645 20.2939
Original 52.4918 21.7947 1.4929 0.6899 50.6546 19.8469 55.1332 22.1207
Prognostic 68.5666 19.3182 1.7727 2.1484 65.8117 18.1769 71.0640 24.6123
Coimbra 21.6436 31.3520 4.3715 1.5690 14.0520 30.1216 27.5464 35.1497
Retinopathy 143.4651 25.5549 8.6208 1.1699 122.1240 24.5064 148.8203 28.7080
Dermatology 19.8110 19.6653 1.2157 0.0583 18.0606 19.5964 21.5541 19.7621
ILPD-Liver 40.2314 21.2359 1.6953 0.14384 38.0070 21.0012 43.2428 21.5077
Lymphography 48.6365 18.7374 47.3621 18.5243 50.1288 19.0404 2.4862 0.1618
Parkinsons 61.0268 18.4078 0.9944 0.1112 59.6189 18.2514 62.7003 18.5701
ParkinsonC 71.5125 85.9408 2.7348 0.7091 67.6873 85.0221 74.2297 87.0784
SPECT 88.4608 19.4803 1.5437 0.2716 86.3784 19.0815 91.1695 19.8700
Cleveland 13.7458 34.7516 4.1938 6.1588 7.3716 21.4041 18.0061 41.4505
HeartEW 31.8354 19.6732 1.3055 0.0925 29.7337 19.5130 33.8312 19.8095
Hepatitis 35.6952 17.9791 1.5000 0.1082 33.8597 17.8650 38.6481 18.1687
SAHeart 86.1272 20.5537 1.6604 0.0840 83.2163 20.4461 88.0913 20.7296
Spectfheart 92.3180 19.3628 1.2729 0.1233 90.6496 19.2185 94.3931 19.5442
Thyroid0387 128.0856 169.6468 4.6228 7.6117 118.4025 158.0242 131.3219 179.4991
Heart 28.5277 19.3467 1.3055 0.4925 26.3888 17.9694 30.4858 19.7040
Pima-diabetes 64.7067 22.0212 1.7239 0.8515 65.8117 19.7021 66.9316 22.7713
Leukemia 56.6581 45.1058 2.4862 0.3344 53.3672 44.2620 59.1867 45.3924
Prostate_GE 149.3277 81.1939 7.0456 4.4155 135.0467 70.0391 155.6277 86.0762
BreastEW 4.5585 21.0010 0.0729 0.1930 4.5031 20.7807 4.7405 21.3079
Colon 5.8711 23.8118 0.5273 0.1527 5.5947 23.6718 7.0661 24.1146

Since the BSO-CV achieved the best fitness values in early iterations, the proposed algorithm has a fast convergence speed. Fig. 6 illustrates the convergence curves of the BSO and BSO-CV algorithms. The curves confirm the benefit of covering more positions in the search space and the effect of balancing the exploration and exploitation phases. As shown in the SAHeart, Spectfheart, SPECT, Heart, and COVID-19 subfigures, the BSO becomes trapped in local optima, whereas the BSO-CV escapes them and does not get stuck in a local-optimum solution.

Fig. 6. Convergence curves for BSO-CV and BSO methods on the medical benchmark datasets and the COVID-19 dataset.

Additionally, Fig. 5 shows box plots of classification accuracy for the BSO and BSO-CV methods on the tested datasets. The y-axis represents classification accuracy, while the x-axis represents the tested methods. The distribution of the BSO-CV is clearly better than that of the BSO, since the median of the BSO-CV's box plots is greater than (or, in some cases, equal to) the median of the BSO's, which demonstrates the robustness of the proposed method. Fig. 7 shows the average maximum and minimum accuracy of BSO and BSO-CV for all datasets.

Fig. 5. Boxplots for BSO-CV and BSO methods on the medical benchmark datasets and COVID-19 dataset.

Fig. 7. Average of maximum and minimum accuracy of BSO and BSO-CV for all datasets.

In classification problems, sensitivity and specificity are essential for determining how well a model can identify true positives and true negatives for each category. Furthermore, more diversification in the obtained solutions means covering a wider region of the search space, which makes the algorithm more sensitive and specific. Table 8 and Table 9 show the sensitivity and specificity of the proposed BSO and BSO-CV, respectively. As clearly shown, the BSO-CV is more sensitive and specific than the BSO on 22 datasets and performs better on the COVID-19 dataset. The reason is the crossover technique, which gives the BSO-CV the ability to discover more positions in the search space, balances the exploration and exploitation phases, and enables it to avoid being trapped in local optima.

Table 8.

Comparison results of the proposed BSO vs augmented BSO with crossover in terms of average, standard deviation, minimum and maximum sensitivity.

Benchmark | Average (BSO, BSO-CV) | Standard deviation (BSO, BSO-CV) | Minimum (BSO, BSO-CV) | Maximum (BSO, BSO-CV)
Diagnostic 0.9635 0.9684 0.0113 0.0222 0.9437 0.9221 0.9825 1.0000
Original 0.9704 0.9718 0.0094 0.0118 0.9577 0.9420 0.9867 0.9853
Prognostic 0.1933 0.2103 0.1549 0.2268 0.0000 0.0000 0.4286 0.6667
Coimbra 0.6407 0.7156 0.1371 0.1273 0.5455 0.4444 1.0000 0.8571
Retinopathy 0.6443 0.6726 0.0328 0.6239 0.5882 0.0293 0.7025 0.7154
Dermatology 0.9500 0.9817 0.0304 0.0158 0.9200 0.9500 1.0000 1.0000
ILPD-Liver 0.8170 0.8352 0.0496 0.0629 0.7284 0.6829 0.8875 0.8941
Lymphography 0.4473 0.4861 0.4239 0.4749 0.0000 0.0000 0.9286 1.0000
Parkinsons 0.9494 0.9531 0.0558 0.0286 0.8519 0.9000 1.0000 1.0000
ParkinsonC 0.8835 0.8861 0.0340 0.0369 0.8182 0.8073 0.9252 0.9292
SPECT 0.5112 0.5405 0.1152 0.0747 0.3529 0.4000 0.7059 0.6667
Cleveland 0.1792 0.5405 0.0953 0.0948 0.0000 0.0769 0.3000 0.3636
HeartEW 0.8297 0.8783 0.0581 0.0786 0.7931 0.7097 0.9630 0.9667
Hepatitis 0.2000 0.3283 0.2297 0.3234 0.0000 0.0000 0.5000 1.0000
SAHeart 0.3854 0.4916 0.0531 0.0914 0.3333 0.3333 0.5000 0.6071
Spectfheart 0.8520 0.8661 0.0404 0.0705 0.7857 0.7436 0.9024 0.9474
Thyroid0387 0.8472 0.8345 0.0816 0.0986 0.7097 0.6923 0.9655 1.0000
Heart 0.8244 0.8544 0.0694 0.0847 0.7353 0.7241 0.9355 0.9643
Pima-diabetes 0.5008 0.5362 0.0473 0.0586 0.4483 0.3750 0.6102 0.5800
Leukemia 0.6470 0.7895 0.2734 0.1701 0.3333 0.5000 1.0000 1.0000
Prostate_GE 0.8392 0.8769 0.0882 0.0892 0.7273 0.7273 1.0000 1.0000
BreastEW 0.9411 0.9521 0.0219 0.0159 0.9130 0.9306 0.9857 0.9726
Colon 0.5238 0.5850 0.1903 0.2304 0.3333 0.2500 1.0000 1.0000

Table 9.

Results of the proposed BSO vs augmented BSO with crossover in terms of average, standard deviation, minimum and maximum specificity.

Benchmark | Average (BSO, BSO-CV) | Standard deviation (BSO, BSO-CV) | Minimum (BSO, BSO-CV) | Maximum (BSO, BSO-CV)
Diagnostic 0.9058 0.9133 0.0387 0.0355 0.8571 0.8605 0.9778 0.9722
Original 0.9735 0.9794 0.0149 0.0111 0.9531 0.9577 1.0000 0.9867
Prognostic 0.8909 0.9218 0.0419 0.0527 0.8214 0.8125 0.9688 1.0000
Coimbra 0.7087 0.7636 0.1253 0.1587 0.5714 0.5000 1.0000 1.0000
Retinopathy 0.7088 0.7130 0.0450 0.6577 0.6449 0.0383 0.7822 0.7565
Dermatology 0.9404 0.9415 0.0354 0.0285 0.8824 0.8889 1.0000 0.9800
ILPD-Liver 0.2442 0.2980 0.0624 0.1057 0.1379 0.1667 0.3611 0.4545
Lymphography 0.7311 0.7699 0.0839 0.1392 0.6154 0.5357 0.8571 1.0000
Parkinsons 0.6498 0.6745 0.0756 0.1030 0.5000 0.4545 0.8889 0.8182
ParkinsonC 0.2856 0.2901 0.0149 0.0573 0.1622 0.1842 0.4048 0.3714
SPECT 0.6075 0.7784 0.1131 0.6389 0.3889 0.9000 0.7778 0.0787
Cleveland 0.6215 0.6469 0.0417 0.0647 0.5714 0.5490 0.7021 0.7647
HeartEW 0.7152 0.7208 0.1111 0.0996 0.5714 0.6000 0.8333 0.9130
Hepatitis 0.9552 0.9565 0.0514 0.0504 0.8571 0.8571 1.0000 1.0000
SAHeart 0.7440 0.7901 0.0700 0.0504 0.7049 0.8571 0.9492 1.0000
Spectfheart 0.4165 0.4879 0.1507 0.1328 0.2222 0.2000 0.7000 0.7000
Thyroid0387 0.9828 0.9815 0.0037 0.0039 0.9759 0.9780 0.9866 0.9908
Heart 0.6629 0.6686 0.0890 0.0946 0.5200 0.5357 0.8095 0.8000
Pima-diabetes 0.8007 0.8211 0.0319 0.0390 0.7573 0.7292 0.8632 0.8614
Leukemia 0.9657 0.9678 0.0738 0.0564 0.7778 0.8571 1.0000 1.0000
Prostate_GE 0.8852 0.8887 0.0753 0.0998 0.8000 0.6667 1.0000 1.0000
BreastEW 0.8920 0.9015 0.0697 0.0472 0.7442 0.8298 1.0000 0.9545
Colon 0.8276 0.8663 0.1704 0.1635 0.4444 0.5556 1.0000 1.0000

Finally, the conducted results show superior performance of the proposed BSO and BSO-CV in solving FS problems. The crossover technique has a significant effect on balancing the exploration and exploitation phases, which plays a vital role in allowing the algorithm to converge more quickly and avoid becoming stuck in local optima. This results in a more reliable ML algorithm.

5.6. Comparison with other meta-heuristic algorithms in the literature

The above results show that the BSO-CV achieved promising results in classification accuracy, running time, sensitivity, specificity, convergence, and boxplots. It also achieves competitive results regarding the fitness value and the size of the selected feature subset. To validate these results and show their reliability, the proposed BSO-CV is compared against seven methods from the literature: LBMFO-V3 [12] on the 23 medical datasets; HLBDA [36] on the COVID-19 dataset; CHIO-GC [2] on the 23 medical datasets and the COVID-19 dataset; and finally four filter methods used in previous studies, namely Chi-square, Relief, correlation-based feature selection (CFS), and information gain (IG) [12].

5.6.1. Comparison with CHIO-GC

Table 11 shows a comparison between the proposed BSO-CV and the CHIO-GC in terms of average accuracy and average feature subset size. The BSO-CV outperforms the CHIO-GC on all datasets except three: ParkinsonC, Prognostic, and Coimbra. Regarding feature subset size, the BSO-CV outperforms the CHIO-GC in 57% of the datasets. On ten datasets, the CHIO-GC achieved smaller feature subsets than the BSO-CV: Diagnostic, Prognostic, Coimbra, Retinopathy, ParkinsonC, SPECT, HeartEW, Spectfheart, Thyroid0387, and BreastEW.

Table 11.

Comparison results of the BSO-CV with LBMFO-V3 and CHIO-GC.

Benchmark: Average accuracy (BSO-CV, CHIO-GC, LBMFO-V3), Average selection size (BSO-CV, CHIO-GC, LBMFO-V3)
Diagnostic 0.9930 0.9033 0.9100 14.4000 13.3700 13.9991
Original 0.9886 0.9710 0.9683 3.3000 5.1040 5.5000
Prognostic 0.8474 0.6716 0.9312 15.5000 14.6202 3.5103
Coimbra 0.8182 0.8896 0.9312 6.1000 3.6007 3.5103
Retinopathy 0.7391 0.6436 0.5380 11.2000 7.2647 6.9002
Dermatology 1.0000 0.8006 0.8442 17.3000 18.4900 18.3541
ILPD-Liver 0.7966 0.7716 0.7143 4.0000 4.0000 4.0000
Lymphography 0.9729 0.8343 0.8002 9.4000 10.0622 9.7520
Parkinsons 0.9850 0.7903 0.7689 9.1000 9.7383 10.3584
ParkinsonC 0.7873 0.8400 0.8190 461.4000 365.8322 369.1070
SPECT 0.8180 0.6960 0.6576 12.5000 9.6050 10.7832
Cleveland 0.7041 0.5966 0.5333 6.4000 6.8097 6.6899
HeartEW 0.9370 0.9116 0.9388 7.3000 7.0105 6.3100
Hepatitis 0.9875 0.7903 0.7500 5.9000 8.2011 8.3569
SAHeart 0.7500 0.7036 0.6992 3.1000 3.1551 3.2222
Spectfheart 0.9084 0.7303 0.7013 22.7000 21.0030 20.4598
Thyroid0387 0.9881 0.9603 0.9776 10.0000 8.0116 8.4563
Heart 0.9259 0.8126 0.7603 5.2000 6.1505 6.2752
Pima-diabetes 0.8182 0.7956 0.8065 4.0000 6.8387 6.7612
Leukemia 1.0000 0.9900 1.0000 3502.8000 3560.5107 3570.7137
Prostate_GE 1.0000 0.6010 0.5056 2969.8000 2979.4116 2984.7153
BreastEW 0.9912 0.9400 0.9398 16.0000 13.7303 13.9714
Colon 0.9571 0.7176 0.6667 970.3000 1000.0067 991.5551

In Fig. 8, the BSO-CV is compared with the CHIO-GC in terms of classification accuracy and the number of selected features using the 23 benchmark medical datasets. As shown in the subfigure on the left side, the BSO-CV achieved an average classification accuracy of 89.9% across the 23 datasets, outperforming the CHIO-GC's 78.5%. In terms of feature selection size, the BSO-CV achieved a smaller average of 350.2 features across all datasets, compared with 351.3 features for the CHIO-GC.

Fig. 8.

Fig. 8

Average accuracy and average number of selected features using BSO-CV and CHIO-GC methods on 23 benchmark medical datasets.

As mentioned in [2], the CHIO-GC employs a greedy crossover approach that always takes the best candidate to generate new solutions, thereby discarding the worst ones. However, worst solutions can evolve into better solutions in later generations under different search techniques [43]. In contrast, the BSO-CV employs a roulette wheel mechanism for choosing the crossover operator (single-point, two-point, or uniform), which enhances the diversity of the generated solutions and avoids trapping in local optima. In addition, the SO algorithm initializes the population as two groups, males and females, which makes the initial population more exploratory. For these reasons, the BSO-CV shows better performance than the CHIO-GC.
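The roulette-wheel choice among the three crossover operators can be sketched as follows (a generic illustration of roulette-wheel selection; the operator weights shown are an assumption, since the paper does not state them here):

```python
import random

def roulette_pick(operators, weights):
    """Pick one operator with probability proportional to its weight."""
    total = sum(weights)
    r = random.uniform(0, total)
    acc = 0.0
    for op, w in zip(operators, weights):
        acc += w
        if r <= acc:
            return op
    return operators[-1]  # guard against floating-point edge cases

# Illustrative assumption: the three operators are equally weighted
ops = ["one_point", "two_point", "uniform"]
weights = [1.0, 1.0, 1.0]
```

Unlike a greedy choice, every operator keeps a nonzero chance of being applied, which is what sustains the diversity of the generated solutions.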

Fig. 9 graphically shows the results of the BSO-CV compared with the CHIO-GC on the COVID-19 dataset. The BSO-CV achieved an accuracy of 95.9%, whereas the CHIO-GC achieved a lower accuracy of 93.2%. Regarding the number of selected features, Fig. 9 also reports the average selection sizes of both methods.

Fig. 9.

Fig. 9

Average accuracy and average number of selected features using BSO-CV and CHIO-GC methods on COVID-19 dataset.

5.6.2. Comparison with LBMFO-V3

Table 11 shows that the BSO-CV outperformed the LBMFO-V3 in accuracy on all datasets except Prognostic, Coimbra, ParkinsonC, and HeartEW, i.e., the proposed BSO-CV was superior in 83% of the datasets. Table 11 also shows that the BSO-CV selected smaller feature subsets than the LBMFO-V3 in 57% of the datasets.

Fig. 10 graphically shows the results of this comparison. The subfigure on the left side shows that the BSO-CV achieved an average accuracy of 89.9% across all datasets, whereas the LBMFO-V3 achieved 76.5%; the BSO-CV therefore outperforms the LBMFO-V3 by 13.4 percentage points on average. The subfigure on the right side shows that the BSO-CV selected an average of 350.2 features across all datasets, slightly fewer than the LBMFO-V3's average of 351.7.

Fig. 10.

Fig. 10

Average accuracy and the average number of selected features using BSO-CV and LBMFO-V3 methods on 23 benchmark medical datasets.

As mentioned earlier, the proposed BSO-CV employs different crossover operators, which give the algorithm the power to cover more regions of the search space and avoid falling into local optima regions. In other words, it creates a balance between the exploration and exploitation phases. In addition, the MFO suffers from slow population diversity [44], whereas the BSO-CV mitigates this issue by using groups of males and females in the initialization phase. For these reasons, the BSO-CV shows better performance than the LBMFO-V3.

5.6.3. Comparison with HLBDA

Fig. 11 compares the BSO-CV with the HLBDA in terms of classification accuracy and the number of selected features on the COVID-19 dataset. In the subfigure on the left side, the BSO-CV achieved an accuracy of 95.9%, while the HLBDA achieved an accuracy of 91.5%. On the other hand, in the subfigure on the right side, the HLBDA achieved an average selection size of 1.7 features, whereas the BSO-CV selected a slightly larger average of 2.3 features.

Fig. 11.

Fig. 11

Average accuracy and the average number of selected features using BSO-CV and HLBDA methods on the COVID-19 dataset.

The hyperlearning approach was introduced into the HLBDA algorithm to enhance the search capability of the BDA by considering both individual and group bests, which helps it circumvent the local optima problem. However, its search strategy is still restricted in the number of positions it can reach in the search space. By comparison, the proposed BSO-CV outperforms the HLBDA by using crossover operators to reach more positions in the search space.

5.7. Comparison with filter-based methods

In this subsection, the proposed BSO-CV, as a wrapper-based method, is compared against four common filter-based approaches, namely Chi-square, Relief, CFS, and IG. Table 12 shows the average accuracy achieved by these methods after applying them 30 times to the 23 medical benchmark datasets. It can be observed from Table 12 that the BSO-CV exceeded all the filters on all datasets except where the Chi-square filter is applied to the SPECT and Thyroid0387 datasets. Fig. 12 shows the accuracy of the BSO-CV and the four filter methods; the red line representing the BSO-CV occupies the largest area of the accuracy radar chart. In terms of the number of selected features, Table 13 shows that the BSO-CV was superior in 81% of the datasets.
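Two of these filter baselines can be reproduced with scikit-learn's univariate selectors (a hedged sketch: the dataset and the `k` value are illustrative, mutual information stands in for IG, and Relief and CFS would need third-party packages):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, chi2, mutual_info_classif

# 569 samples, 30 non-negative features (chi2 requires non-negative inputs)
X, y = load_breast_cancer(return_X_y=True)

# Chi-square filter: scores each feature independently of any classifier
chi_sel = SelectKBest(chi2, k=16).fit(X, y)

# Information gain, approximated here by mutual information with the label
ig_sel = SelectKBest(mutual_info_classif, k=16).fit(X, y)

print("chi2 keeps:", chi_sel.get_support().sum(), "features")
print("IG keeps:", ig_sel.get_support().sum(), "features")
```

Because filters score features independently of the classifier, they are fast but cannot account for feature interactions, which is the gap wrapper methods such as the BSO-CV aim to close.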

Table 12.

Comparison between the proposed BSO-CV and filter methods using the classification accuracy.

Benchmark BSO-CV Chi-square Relief CFS IG
Diagnostic 0.9930 0.5714 0.9585 0.9533 0.9349
Original 0.9886 0.9091 0.6426 0.6860 0.6759
Prognostic 0.8447 0.5910 0.7727 0.7576 0.7577
Coimbra 0.8182 0.3846 0.6672 0.5763 0.5578
Retinopathy 0.7391 0.6349 0.5036 0.4783 0.5393
Dermatology 1.0000 0.7250 0.7248 0.4732 0.4021
ILPD-Liver 0.7949 0.7106 0.5119 0.5223 0.5264
Lymphography 0.9729 0.8824 0.5886 0.5533 0.5204
Parkinsons 0.9850 0.7581 0.7588 0.7360 0.7150
ParkinsonC 0.7873 0.6593 0.6590 0.6487 0.6376
SPECT 0.8180 0.9667 0.5651 0.5508 0.5460
Cleveland 0.7041 0.3940 0.1181 0.0398 0.0826
HeartEW 0.9370 0.9334 0.6153 0.5757 0.6202
Hepatitis 0.9875 0.7778 0.5538 0.5857 0.6417
SAHeart 0.7500 0.6471 0.5024 0.5115 0.5227
SPECTfheart 0.9084 0.7000 0.6079 0.6279 0.5551
Thyroid0387 0.9881 1.0000 0.6379 0.6955 0.9773
Heart 0.9259 0.5333 0.6317 0.5575 0.6114
Pima-diabetes 0.8182 0.6905 0.5147 0.5426 0.5264
Leukemia 1.0000 0.7120 0.6883 0.6759 0.6410
Prostate_GE 1.0000 0.5042 0.5033 0.4786 0.4421
BreastEW 0.9912 0.9365 0.8160 0.8029 0.8128
Colon 0.9571 0.5850 0.5641 0.5116 0.5097

Fig. 12.

Fig. 12

Accuracy-based comparison between the BSO-CV and other common filter methods.

Table 13.

Comparison between the proposed BSO-CV and filter methods based on the number of selected features.

Benchmark BSO-CV Chi-square Relief CFS IG
Diagnostic 13.0000 18.0000 17.5000 16.0000 15.0000
Original 3.2000 4.0000 4.0000 4.0000 4.0000
Prognostic 14.2000 20.0000 18.0000 19.0000 21.0000
Coimbra 6.0000 7.0000 7.0000 7.0000 7.0000
Retinopathy 10.5000 14.0000 13.0000 11.0000 12.0000
Dermatology 17.3000 19.0000 15.0000 14.0000 16.0000
ILPD-Liver 3.6000 5.0000 4.0000 4.0000 4.0000
Lymphography 9.1000 16.0000 27.0000 12.0000 15.0000
Parkinsons 8.5000 13.0000 10.0000 9.0000 11.0000
ParkinsonC 466.1000 495.0000 496.0000 460.0000 461.0000
SPECT 12.6000 15.0000 13.0000 14.0000 14.0000
Cleveland 6.5000 7.0000 8.0000 7.0000 7.0000
HeartEW 6.8000 9.0000 8.0000 7.0000 9.0000
Hepatitis 6.9000 9.0000 8.0000 7.0000 9.0000
SAHeart 2.6000 4.0000 3.0000 3.0000 4.0000
SPECTfheart 25.0000 30.0000 27.0000 26.0000 29.0000
Thyroid0387 9.5000 10.0000 12.0000 11.0000 10.0000
Heart 4.2000 6.0000 5.0000 6.0000 6.0000
Pima-diabetes 4.0000 4.0000 4.0000 4.0000 4.0000
Leukemia 3498.1000 3500.0000 3530.0000 3533.0000 3540.0000
Prostate_GE 2982.5000 2972.0000 2540.5000 2533.0000 2966.5000
BreastEW 16.0000 19.0000 17.0000 20.0000 18.0000
Colon 970.3000 990.0000 985.0000 977.0000 980.0000

6. Computational statistical test analysis

The accuracy of the proposed BSO and its improved variant BSO-CV was evaluated in the previous section using two performance measures, and their results on FS problems were contrasted with those of other meta-heuristic and filter-based techniques previously reported in the literature. The average and standard deviation of the best solutions found over 30 independent runs serve as the performance measures. These metrics give a broad picture of how well the proposed algorithms handle FS problems: the first reflects their average performance, and the second their consistency across the 30 independent runs. Although these statistical measures suggest that the proposed BSO and BSO-CV are generally reliable and resilient, they cannot compare the 30 independent runs individually. In other words, they show that the proposed BSO and BSO-CV maintain a sound balance of exploitation and exploration, but they cannot by themselves demonstrate statistical superiority.

This part applies Friedman's and Holm's statistical tests to compare the independent runs and to verify that the results were not produced by chance. Friedman's test is a well-known non-parametric statistical test that is widely used to evaluate algorithms' performance levels; its goal is to ascertain whether there is a fundamental difference between the outputs of the various algorithms [45]. The null hypothesis underlying this test is that there is no variation in the accuracy of the compared algorithms. The algorithm with the best performance receives the lowest rank, and the algorithm with the worst performance receives the highest rank. The Friedman and Holm procedure requires finding the p-value of Friedman's test for the results of the FS problems under consideration. If Friedman's test produces a p-value equal to or less than the significance level (0.05 in this case), the null hypothesis is rejected, indicating statistically significant variations in how well the compared algorithms work. This test is followed by a post-hoc procedure, where Holm's method is used to examine pairwise comparisons between the algorithms; the algorithm with the lowest rank according to Friedman's test is typically used as the control technique in the post-hoc analysis.
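The Friedman procedure described above can be run directly with SciPy; the accuracy arrays below are illustrative placeholders for three hypothetical methods over 23 datasets, not the paper's data:

```python
import numpy as np
from scipy.stats import friedmanchisquare

rng = np.random.default_rng(42)
# Illustrative per-dataset accuracies for three hypothetical methods
base = rng.uniform(0.6, 0.9, size=23)
acc_a = base + 0.10                         # consistently the best method
acc_b = base + rng.normal(0, 0.02, 23)      # close to the baseline
acc_c = base - 0.05                         # consistently the worst method

stat, p = friedmanchisquare(acc_a, acc_b, acc_c)
if p <= 0.05:
    print("reject the null hypothesis: the methods differ significantly")
```

Each argument to `friedmanchisquare` is one method's per-dataset results; the test ranks the methods within each dataset, exactly the ranking scheme reported in Table 14 and Table 16.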

In the following subsections, Friedman's and Holm's tests are applied to show that the average results in Table 11, Table 12 are statistically significant and that the differences from the other wrapper-based and filter-based FS methods did not occur by chance.

6.1. A statistical test of BSO compared to other wrapper FS methods

To quantify the statistical disparity between the BSO-CV and the other wrapper FS methods, Friedman's test [46] was performed with a significance level of α = 0.05. Accordingly, based on the outcomes shown in Table 11, the BSO-CV method was ranked together with the other FS methods. A summary of the rankings of the FS approaches determined by Friedman's test on the accuracy results in Table 11 is provided in Table 14.

Table 14.

A summary of the ranking results obtained by applying Friedman’s test to the results given in Table 11.

Algorithm Rank
BSO-CV 1.282608
CHIO-GC 2.260869
LBMFO-V3 2.456521

The p-value reported by Friedman's statistical test on the accuracy results of the FS problems in Table 11 is 1.119E−4. The null hypothesis of equivalent performance was therefore rejected, emphasizing a statistically significant difference between the classification rates of the compared algorithms. According to the results in Table 14, the BSO-CV algorithm outperformed all other comparative algorithms with statistical significance on the datasets listed in Table 1, attaining the lowest rank of 1.282608 at a significance level of 5%. CHIO-GC is the second-best performing algorithm on these datasets, and LBMFO-V3 the third best. Overall, Friedman's test on the FS problems in Table 1 produces the ranking BSO-CV, CHIO-GC, and LBMFO-V3, in that order. Holm's test was then used to determine whether the differences between the BSO-CV and the other algorithms in Table 14 are statistically significant; the results of this pairwise comparison on the FS benchmark datasets outlined in Table 1 are shown in Table 15.

Table 15.

Results of Holm’s test method based on the results in Tables 14 for α=0.05.

i Algorithm z = (R0 − Ri)/SE p-value α÷i Hypothesis
2 LBMFO-V3 3.980932 6.864535E−05 0.025000 Rejected
1 CHIO-GC 3.317444 9.084512E−04 0.050000 Rejected

The findings of Holm’s technique, shown in Table 15, reject every hypothesis whose p-value is less than the adjusted threshold α ÷ i. These findings demonstrate that the BSO-CV statistically outperforms the other competing algorithms. Furthermore, they indicate that the BSO-CV has avoided local optimal solutions by striking a sound balance between its exploration and exploitation capabilities.
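The z statistic in Table 15 follows the standard post-hoc formula z = (R0 − Ri)/SE with SE = sqrt(k(k + 1)/(6N)) for k algorithms over N datasets. Plugging in the mean ranks from Table 14 (k = 3 algorithms, N = 23 datasets) reproduces the reported values:

```python
import math

def holm_z(r_control, r_other, k, n):
    """Post-hoc z statistic comparing two Friedman mean ranks."""
    se = math.sqrt(k * (k + 1) / (6.0 * n))
    return (r_other - r_control) / se

k, n = 3, 23                 # three algorithms, 23 benchmark datasets
r_bso_cv = 1.282608          # control method: best (lowest) rank in Table 14
z_lbmfo = holm_z(r_bso_cv, 2.456521, k, n)
z_chio = holm_z(r_bso_cv, 2.260869, k, n)
print(round(z_lbmfo, 4), round(z_chio, 4))
```

The two z values match the 3.980932 and 3.317444 entries of Table 15, and each corresponding p-value falls below its Holm threshold α ÷ i, confirming the rejections.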

6.2. A statistical test of BSO-CV compared to filter-based FS methods

As shown in Table 12, the results of the proposed BSO-CV are compared with those of well-known filter-based FS approaches to evaluate its robustness further. The acquired accuracy results are then statistically assessed using Friedman’s test, as shown in Table 16.

Table 16.

A summary of the ranking results obtained by Friedman’s test on the results presented in Table 12.

Method Rank
BSO-CV 1.130434
Chi-square 2.478260
Relief 3.521739
CFS 3.782608
IG 4.086956

For the accuracy results in Table 12, the p-value computed by Friedman's test is 1.144E−10. According to the findings presented in Table 16, the BSO-CV algorithm is the most effective overall and statistically significant: it achieved the best rank of 1.130434 at a significance level of α = 5%. With a rank of 2.478260, the Chi-square method comes in second place. The rankings in Table 16 show that the BSO-CV proposed in this work outperforms the filter-based FS approaches evaluated in Table 12. In conclusion, the proposed BSO-CV comes out on top, followed by Chi-square, Relief, CFS, and IG, in that order.

The statistical significance of any differences between the filter-based FS method and the BSO-CV method is then determined using Holm’s test. Based on the datasets mentioned in Table 1, Table 17 presents the statistical findings of Holm’s test.

Table 17.

Results of Holm’s method based on the statistical results of the results in Table 16 (Friedman’s test with α=0.05).

i Method z = (R0 − Ri)/SE p-value α÷i Hypothesis
4 IG 6.341032 2.2823009E−10 0.012500 Rejected
3 CFS 5.688279 1.283257E−08 0.016666 Rejected
2 Relief 5.128776 2.916314E−07 0.025000 Rejected
1 Chi-square 2.890764 0.003843 0.050000 Rejected

For the outcomes in Table 17, Holm's test rejects every hypothesis whose p-value is less than the adjusted threshold α ÷ i. The data in Table 17 show that the BSO-CV outperformed the filter-based approaches when used as an FS method. These statistically significant results indicate that the BSO-CV successfully avoided local solutions while exploring and exploiting the search space of the FS datasets. In addition, the proposed BSO-CV selects the most critical features with minimum redundancy. Thus, the BSO-CV significantly outperformed the state-of-the-art filter methods on most of the benchmark datasets in Table 1. These findings support the proposed BSO's efficiency in addressing FS tasks in the medical domain.

7. Conclusion and future works

Due to the importance of caring for people's lives, diseases must be diagnosed accurately and objectively. Recently, swarm-based algorithms have proved their capability to perform disease classification very efficiently using data mining techniques such as feature selection. This study converts the recently developed SO algorithm into a binary version and enhances it using evolutionary crossover methods. The enhanced BSO is applied to 23 medical benchmark datasets and a real-world COVID-19 dataset. Various measures are used to evaluate the performance of the proposed algorithm, including accuracy, the number of selected features, fitness value, running time, sensitivity, specificity, convergence curves, boxplots, and the T-test. The average, standard deviation, minimum, and maximum values are reported for each evaluation measure.
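The binary conversion referred to here relies on the S-shaped transfer function named in the abstract. A minimal sketch of the general technique (the function below is a generic sigmoid-based binarization, not the paper's exact BSO update rule):

```python
import math
import random

def s_shape_binarize(position, rng=random.random):
    """Map a continuous position vector to a binary feature mask:
    each feature is selected with probability sigmoid(x)."""
    mask = []
    for x in position:
        prob = 1.0 / (1.0 + math.exp(-x))  # S-shaped transfer value in (0, 1)
        mask.append(1 if rng() < prob else 0)
    return mask

# Deterministic illustration using a fixed threshold of 0.5:
print(s_shape_binarize([2.0, -2.0], rng=lambda: 0.5))  # → [1, 0]
```

Large positive positions are almost always mapped to 1 (feature kept) and large negative positions to 0 (feature dropped), which is how a continuous optimizer such as SO is adapted to the discrete FS search space.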

The developed binary version of SO shows promising results for solving medical classification problems, achieving an accuracy of more than 90% on 11 datasets and exceeding 95% on 8 datasets. In addition, since the BSO can fall into local optima like any evolutionary algorithm, this paper proposes an augmented BSO with crossover operators to avoid this drawback and balance the exploration and exploitation phases. The BSO-CV performs better than the BSO, outperforming it on 17 datasets and shrinking the COVID-19 dataset's dimension by 89% as opposed to the BSO's 79%. Since the BSO-CV achieved the best fitness values in early iterations, it has a fast convergence speed; it also saves resources by consuming less time than the BSO, and it is more sensitive and specific. Additionally, the proposed BSO-CV was compared with CHIO-GC, LBMFO-V3, and HLBDA, and the results show its superior performance, achieving the best accuracies with higher reduction rates. Finally, comparing the BSO-CV with well-known filter-based FS methods shows that it outperforms the standard filter methods, confirming its effectiveness in solving FS problems.

In the future, we plan to use the proposed BSO algorithm in cyber security applications such as intrusion detection systems, ransomware detection, and blockchain applications. Furthermore, researchers can direct their efforts to generate a multi-objective version of the SO and use it in the electroencephalography (EEG) field. Other operators and modification methods can be integrated with the SO algorithm to generate a hybrid binary version that can balance the exploration and exploitation phases of the search process.

CRediT authorship contribution statement

Ruba Abu Khurma: Proposed and evolved the mathematical models of the proposed algorithm, Prepared the experiments, tables, diagrams and pseudo-code of the proposed algorithm, Executed the programs and experimental scenarios of the work, Full revision of the entire paper. Dheeb Albashish: Proposed and evolved the mathematical models, Methodology, Formal analysis, Investigation, Validation, Supervision, Full revision of the entire paper. Malik Braik: Implemented the proposed BSO and BSO-CV, Discussed the results, Assisted in the writing. Abdullah Alzaqebah: Discussed all the computational results of the proposed algorithm, Checked the validation of the results and the references. Ashwaq Qasem: Examined the technical concepts in the paper, the readability of the full paper and English grammar, and computed the t-test. Omar Adwan: Offered feedback on the work and helped shape and analyze the work.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

Data will be made available on request.

References

  • 1. Sahran S., Albashish D., Abdullah A., Abd Shukor N., Pauzi S.H.M. Absolute cosine-based SVM-RFE feature selection method for prostate histopathological grading. Artif. Intell. Med. 2018;87:78–90. doi: 10.1016/j.artmed.2018.04.002.
  • 2. Alweshah M., Alkhalaileh S., Al-Betar M.A., Bakar A.A. Coronavirus herd immunity optimizer with greedy crossover for feature selection in medical diagnosis. Knowl.-Based Syst. 2022;235. doi: 10.1016/j.knosys.2021.107629.
  • 3. Albashish D., Hammouri A.I., Braik M., Atwan J., Sahran S. Binary biogeography-based optimization based SVM-RFE for feature selection. Appl. Soft Comput. 2021;101.
  • 4. Braik M. Enhanced Ali Baba and the forty thieves algorithm for feature selection. Neural Comput. Appl. 2022:1–32. doi: 10.1007/s00521-022-08015-5.
  • 5. Abu Khurma R., Aljarah I., Sharieh A., Abd Elaziz M., Damaševičius R., Krilavičius T. A review of the modification strategies of the nature inspired algorithms for feature selection problem. Mathematics. 2022;10(3):464.
  • 6. Xue B., Zhang M., Browne W.N., Yao X. A survey on evolutionary computation approaches to feature selection. IEEE Trans. Evol. Comput. 2015;20(4):606–626.
  • 7. Deng Z., Chung F.-L., Wang S. Robust relief-feature weighting, margin maximization, and fuzzy optimization. IEEE Trans. Fuzzy Syst. 2010;18(4):726–744.
  • 8. Ramírez-Gallego S., Lastra I., Martínez-Rego D., Bolón-Canedo V., Benítez J.M., Herrera F., Alonso-Betanzos A. Fast-mRMR: Fast minimum redundancy maximum relevance algorithm for high-dimensional big data. Int. J. Intell. Syst. 2017;32(2):134–152.
  • 9. Guyon I., Elisseeff A. An introduction to variable and feature selection. J. Mach. Learn. Res. 2003;3(Mar):1157–1182.
  • 10. Abdel-Basset M., Abdel-Fatah L., Sangaiah A.K. Metaheuristic algorithms: A comprehensive review. Comput. Intell. Multimed. Big Data Cloud Eng. Appl. 2018:185–231.
  • 11. Saw T., Myint P.H. Feature selection to classify healthcare data using wrapper method with PSO search. Int. J. Inf. Technol. Comput. Sci. 2019;11(9):31–37.
  • 12. Abu Khurmaa R., Aljarah I., Sharieh A. An intelligent feature selection approach based on moth flame optimization for medical diagnosis. Neural Comput. Appl. 2021;33(12):7165–7204.
  • 13. Alweshah M. Hybridization of arithmetic optimization with great Deluge algorithms for feature selection problems in medical diagnosis. Jordanian J. Comput. Inf. Technol. 2022;8(2).
  • 14. Awadallah M.A., Al-Betar M.A., Braik M.S., Hammouri A.I., Doush I.A., Zitar R.A. An enhanced binary Rat Swarm Optimizer based on local-best concepts of PSO and collaborative crossover operators for feature selection. Comput. Biol. Med. 2022. doi: 10.1016/j.compbiomed.2022.105675.
  • 15. Alweshah M., Alkhalaileh S., Albashish D., Mafarja M., Bsoul Q., Dorgham O. A hybrid mine blast algorithm for feature selection problems. Soft Comput. 2021;25(1):517–534.
  • 16. Hashim F.A., Hussien A.G. Snake Optimizer: A novel meta-heuristic optimization algorithm. Knowl.-Based Syst. 2022.
  • 17. Rawa M. Towards avoiding cascading failures in transmission expansion planning of modern active power systems using hybrid snake-Sine cosine optimization algorithm. Mathematics. 2022;10(8):1323.
  • 18. Khurma R.A., Aljarah I., Sharieh A. Rank based moth flame optimisation for feature selection in the medical application. In: 2020 IEEE Congress on Evolutionary Computation (CEC). IEEE; 2020. pp. 1–8.
  • 19. Le T.M., Vo T.M., Pham T.N., Dao S.V.T. A novel wrapper-based feature selection for early diabetes prediction enhanced with a metaheuristic. IEEE Access. 2020;9:7869–7884.
  • 20. Mazaheri V., Khodadadi H. Heart arrhythmia diagnosis based on the combination of morphological, frequency and nonlinear features of ECG signals and metaheuristic feature selection algorithm. Expert Syst. Appl. 2020;161.
  • 21. Abd Elminaam D.S., Nabil A., Ibraheem S.A., Houssein E.H. An efficient marine predators algorithm for feature selection. IEEE Access. 2021;9:60136–60153.
  • 22. Mirjalili S. Moth-flame optimization algorithm: A novel nature-inspired heuristic paradigm. Knowl.-Based Syst. 2015;89:228–249.
  • 23. Khurma R.A., Aljarah I., Sharieh A. A simultaneous moth flame optimizer feature selection approach based on levy flight and selection operators for medical diagnosis. Arab. J. Sci. Eng. 2021;46(9):8415–8440.
  • 24. Dhanusha C., Senthil Kumar A., Jagadamba G., Musirin I.B. Evolving chaotic shuffled frog leaping memetic metaheuristic model-based feature subset selection for Alzheimer's disease detection. In: Sustainable Communication Networks and Application. Springer; 2022. pp. 679–692.
  • 25. Jaddi N.S., Abadeh M.S. Cell separation algorithm with enhanced search behaviour in miRNA feature selection for cancer diagnosis. Inf. Syst. 2022;104.
  • 26. Abouelmagd L.M., Shams M.Y., El-Attar N.E., Hassanien A.E. Feature selection based coral reefs optimization for breast cancer classification. In: Medical Informatics and Bioimaging using Artificial Intelligence. Springer; 2022. pp. 53–72.
  • 27. Kanya Kumari L., Naga Jagadesh B. An adaptive teaching learning based optimization technique for feature selection to classify mammogram medical images in breast cancer detection. Int. J. Syst. Assur. Eng. Manag. 2022:1–14.
  • 28. Dey A., Chattopadhyay S., Singh P.K., Ahmadian A., Ferrara M., Senu N., Sarkar R. MRFGRO: a hybrid meta-heuristic feature selection method for screening COVID-19 using deep features. Sci. Rep. 2021;11(1):1–15. doi: 10.1038/s41598-021-02731-z.
  • 29. Aslan M.F., Sabanci K., Durdu A., Unlersen M.F. COVID-19 diagnosis using state-of-the-art CNN architecture features and Bayesian Optimization. Comput. Biol. Med. 2022. doi: 10.1016/j.compbiomed.2022.105244.
  • 30. Davazdahemami B., Zolbanin H.M., Delen D. An explanatory machine learning framework for studying pandemics: The case of COVID-19 emergency department readmissions. Decis. Support Syst. 2022. doi: 10.1016/j.dss.2022.113730.
  • 31. Bandyopadhyay R., Basu A., Cuevas E., Sarkar R. Harris hawks optimisation with simulated annealing as a deep feature selection method for screening of COVID-19 CT-scans. Appl. Soft Comput. 2021;111. doi: 10.1016/j.asoc.2021.107698.
  • 32. Deniz A., Kiziloz H.E., Sevinc E., Dokeroglu T. Predicting the severity of COVID-19 patients using a multi-threaded evolutionary feature selection algorithm. Expert Syst. 2022.
  • 33. Kurnaz S., et al. Feature selection for diagnose coronavirus (COVID-19) disease by neural network and Caledonian crow learning algorithm. Appl. Nanosci. 2022:1–16. doi: 10.1007/s13204-021-02159-x.
  • 34. Kukker A., Sharma R. JAYA-optimized fuzzy reinforcement learning classifier for COVID-19. IETE J. Res. 2022:1–12.
  • 35. Ragab M., Eljaaly K., Alhakamy N.A., Alhadrami H.A., Bahaddad A.A., Abo-Dahab S.M., Khalil E.M. Deep ensemble model for COVID-19 diagnosis and classification using chest CT images. Biology. 2022;11(1):43. doi: 10.3390/biology11010043.
  • 36. Too J., Mirjalili S. A hyper learning binary dragonfly algorithm for feature selection: A COVID-19 case study. Knowl.-Based Syst. 2021;212.
  • 37. Irene D.S., Beulah J.R. An efficient COVID-19 detection from CT images using ensemble support vector machine with ludo game-based swarm optimisation. Comput. Methods Biomech. Biomed. Eng. Imaging Vis. 2022:1–12.
  • 38. Wang W., Pei Y., Wang S., Gorriz J.M., Zhang Y. PSTCNN: Explainable COVID-19 diagnosis using PSO-guided self-tuning CNN. Biocell. 2022;47(2):373–384. doi: 10.32604/biocell.2021.0xxx.
  • 39. Wang W., Zhang X., Wang S.-H., Zhang Y.-D. Covid-19 diagnosis by WE-SAJ. Syst. Sci. Control Eng. 2022;10(1):325–335. doi: 10.1080/21642583.2022.2045645.
  • 40. Khurma R.A., Aljarah I., Sharieh A., Mirjalili S. EvoloPy-FS: An open-source nature-inspired optimization framework in Python for feature selection. In: Evolutionary Machine Learning Techniques. Springer; 2020. pp. 131–173.
  • 41. Jiang X., Coffee M., Bari A., Wang J., Jiang X., Huang J., Shi J., Dai J., Cai J., Zhang T., et al. Towards an artificial intelligence framework for data-driven prediction of coronavirus clinical severity. Comput. Mater. Continua. 2020;63(1):537–551.
  • 42. Soomro T.A., Zheng L., Afifi A.J., Ali A., Yin M., Gao J. Artificial intelligence (AI) for medical imaging to combat coronavirus disease (COVID-19): A detailed review with direction for future research. Artif. Intell. Rev. 2021:1–31. doi: 10.1007/s10462-021-09985-z.
  • 43. Alzaqebah A., Aljarah I., Al-Kadi O. A hierarchical intrusion detection system based on extreme learning machine and nature-inspired optimization. Comput. Secur. 2023;124.
  • 44. Li Y., Zhu X., Liu J. An improved moth-flame optimization algorithm for engineering problems. Symmetry. 2020;12(8):1234.
  • 45. Mustafa H.M., Ayob M., Albashish D., Abu-Taleb S. Solving text clustering problem using a memetic differential evolution algorithm. PLoS One. 2020;15(6). doi: 10.1371/journal.pone.0232816.
  • 46. Wang Z., Li M., Li J. A multi-objective evolutionary algorithm for feature selection based on mutual information with a new redundancy measure. Inform. Sci. 2015;307:73–88.



Articles from Biomedical Signal Processing and Control are provided here courtesy of Elsevier
