Abstract
Feature Selection (FS) techniques extract the most recognizable features for improving the performance of classification methods for medical applications. In this paper, two intelligent wrapper FS approaches based on a new metaheuristic algorithm named the Snake Optimizer (SO) are introduced. The binary SO, called BSO, is built based on an S-shape transform function to handle the binary discrete values in the FS domain. To improve the exploration of the search space by BSO, three evolutionary crossover operators (i.e., one-point crossover, two-point crossover, and uniform crossover) are incorporated and controlled by a switch probability. The two newly developed FS algorithms, BSO and BSO-CV, are implemented and assessed on a real-world COVID-19 dataset and 23 disease benchmark datasets. According to the experimental results, the improved BSO-CV significantly outperformed the standard BSO in terms of accuracy and running time in 17 datasets. Furthermore, it shrinks the COVID-19 dataset’s dimension by 89% as opposed to the BSO’s 79%. Moreover, the adopted operator on BSO-CV improved the balance between exploitation and exploration capabilities in the standard BSO, particularly in searching and converging toward optimal solutions. The BSO-CV was compared against the most recent wrapper-based FS methods; namely, the hyperlearning binary dragonfly algorithm (HLBDA), the binary moth flame optimization with Lévy flight (LBMFO-V3), the coronavirus herd immunity optimizer with greedy crossover operator (CHIO-GC), as well as four filter methods with an accuracy of more than 90% in most benchmark datasets. These optimistic results reveal the great potential of BSO-CV in reliably searching the feature space.
Keywords: Snake Optimizer, Feature selection, COVID-19, Transfer function, Greedy crossover
1. Introduction
The volume of medical data is steadily expanding to keep up with the rapid changes in medical equipment. Nowadays, machine learning and data science techniques play a vital role in medical diagnosis, particularly in discriminating between various forms of cancer. This diagnostic task is treated as a classification task in machine learning, aiming to classify the input medical data into several discrete cases (e.g., benign and malignant). Many of the features collected in the medical domain are redundant, noisy, or irrelevant to classification tasks. Using irrelevant, noisy, and redundant features degrades classification model performance in medical diagnosis. As a result, the final decision in this domain becomes shaky and untrustworthy [1]. Therefore, it is necessary to pick only the proper features on which to train the learning model. This boosts the effectiveness of the classifier’s output while reducing the learning model’s time consumption, particularly when dealing with large datasets. In machine learning, FS methods are considered essential preprocessing algorithms for optimizing the efficiency of classification methods by identifying a meaningful pattern to support the final judgment of the classifiers.
The FS task in the medical domain essentially involves devising a procedure to obtain a suitable subset of features from the original large dataset (i.e., all features). This subset includes features crucial to the current problem while excluding unnecessary or redundant ones. If all features are utilized for the classification of medical tasks, the learning model becomes prone to overfitting due to the curse of dimensionality, and the ultimate performance suffers either in terms of accuracy or computation time [2]. Therefore, the primary purpose of FS algorithms relies on two objectives: generating a smaller version of the original dataset by selecting the most relevant features and excluding irrelevant and redundant ones, while simultaneously improving classification performance [3], [4]. Implementing the FS process has a significant effect in avoiding the curse of dimensionality, which makes the learning methods less likely to overfit [5].
Typically, the feature selection process is divided into five stages [5]: initialization, subset discovery/search, subset evaluation, stopping criterion, and final subset validation. The subsets of features are generated in subset discovery, where each subset is chosen from the whole set of features (all dataset features). The search approach, in particular, explores the search space to identify the optimal feature subset. In reality, each feature in the new subset is checked for eligibility using a forward or backward elimination procedure. The quality of the selected features is assessed using a subset evaluation function. For this assignment, the majority of the FS approaches employ a predictive model with a suitable fitness function (i.e., accuracy). The halting condition is utilized to prevent the FS techniques from becoming trapped in an indefinite loop. Most stopping conditions include the maximum number of iterations as a predefined parameter [6].
FS algorithms can be divided into four classes based on using the evaluation methods: filter, embedded, wrapper and hybrid-based methods [6]. Filter-based FS approaches leverage statistical assessment metrics to rank features. Each feature is granted a score based on the designated metric (for example, information gain (IG) and F-score). Then, each feature is ranked based on its obtained score (ascending or descending). The high-score features are considered the most effective in the current domain. Generally, the filter-based methods have no real interaction with the classifier (predictive) model. As a result, they are faster than the wrapper and embedded methods. Many filters have been adopted in the literature, such as ReliefF [7], mutual information, absolute cosine (AC), and mRMR [8]. The second type is embedded methods, where the FS process is integrated into the classifier learning to become a single process, such as SVM-recursive feature elimination (SVM-RFE) [9]. The hybridization of the embedded feature selection with the filter method can be found in our latest study, where we combined the AC with the SVM-RFE to handle the redundancy in the SVM-RFE(SVM(AC)) [1].
In contrast to the filter methods, wrapper methods make use of a predictive model (e.g., K-Nearest-Neighbor (KNN)) as part of the assessment phase to evaluate the fitness value of the acquired feature subset. The wrapper-based approach finds an appropriate subset (i.e., solution) for the current task. However, because the total number of potential solutions is 2^n, where n is the number of dimensions, it is difficult to find a near-optimal subset of features in terms of objective fitness due to the vast search space. In addition, this problem becomes more complicated as n increases dramatically in many fields during the data collection phase. Thus, the complexity of these problems increases. This indicates that standard brute-force techniques are unfeasible and that advanced search techniques should be utilized instead. Hence, one of the promising families of techniques that can be used for these problems is Meta-heuristic Algorithms (MAs).
MAs are intelligent algorithms that involve mathematical operations and make several efforts to identify an optimal solution from a set of random solutions with the assistance of the learning model for a particular task [10]. MAs compute either a single or multiple objectives to select the optimal solution. To be precise, MAs use the information obtained during the search to guide the optimization process. They usually merge numerous solutions to generate a highly proficient one, e.g., crossover in Genetic Algorithm (GA) and avoid getting stuck in local minima. While the MAs search for the optimal solution, they usually perform two stages of search: exploration and exploitation [11]. During the exploration stage, the investigation covers a variety of environments to identify more locations for high-quality solutions. In contrast, at the exploitation stage, available resources are focused on a specific search location. The main challenge for MAs is to strike a balance between exploration and exploitation [3].
Using MAs for FS problems is considered a multi-objective task, where the primary goal is to preserve a minimum number of selected features while improving classification performance. However, these two objectives are contradictory, and the optimal decision should be determined by making a trade-off between them. Recently, numerous MAs have been adopted for FS in medical classification and diagnosis applications. They are utilized in wrapper or hybrid wrapper-filter approaches. These include Moth Flame Optimization (MFO) [12], Coronavirus Herd Immunity Optimizer (CHIO) [2], Particle Swarm Optimization (PSO) [11], [13], Rat Swarm Optimizer (RSO) [14], and Mine Blast Algorithm (MBA) [15]. Building on the advantages of MAs in FS problems, we chose the Snake Optimizer (SO), one of the most recent MAs, for this task. The SO is a newly invented, continuous, nature-inspired method that mimics snakes’ mating and fighting behaviors. SO includes mating and fighting modes. The former occurs in cold temperatures, while in the latter, the snakes fight until the male gets the female and the female gets the best male. If no food is encountered, the exploration stage is started to search for food. In contrast, if the snakes eat the food, this is a case of exploitation.
SO has several distinctive aspects compared with other MAs. First, it has a novel natural inspiration: this is the first time the mating behavior of snakes has been proposed for solving optimization problems. Second, experimental results and statistical comparisons prove the effectiveness and efficiency of SO on different landscapes concerning the exploration–exploitation balance and convergence speed [16]. Third, it has high stability and good convergence, and it is simple to implement and parameter-less [17].
However, many optimization tasks (e.g., feature selection) include discrete search space and decision variables. Besides, updating the population impacts the population’s diversity; as a result, the exploration stage needs to be improved to fully explore the search space [15], [18].
In this study, the authors strive to exploit the swarm-based SO algorithm to build a wrapper-based approach to address several medical classification problems. This depends on nominating the most valuable and informative medical features in a specific dataset that are required for generating the best medical classification model with higher performance, fewer features, and less running time. Increasing the effectiveness of medical models using AI tools can be considered an alternative to physicians at less cost and with fewer side effects on patients. Therefore, SO was adopted to search the feature space for the best feature subset. Since SO was developed to deal with continuous optimization problems and had never been used before to deal with a discrete search space, in the first experiments the authors generated a new binary version of SO called BSO using a common and widely used S-shape transform function. The BSO was validated by examining its performance using several evaluation measures such as accuracy, sensitivity, specificity, fitness value, number of selected features, running time, convergence curves, box plots, convergence speed, Holm’s test, and Friedman’s test. In the second experiment, new evolutionary greedy crossover operators (GC) (i.e., one-point, two-point, and uniform crossovers), controlled by a switch probability, are integrated with SO to enhance its explorative power in the feature space. The two newly developed FS algorithms, BSO and BSO-CV, are implemented and assessed on a real-world COVID-19 dataset and 23 disease benchmark datasets. According to the experimental results, the improved BSO-CV significantly outperformed the standard BSO in terms of accuracy and running time on 17 datasets. Furthermore, it shrinks the COVID-19 dataset’s dimension by 89% as opposed to the BSO’s 79%.
Moreover, the adopted operator on BSO-CV improved the balance between exploitation and exploration capabilities in the standard BSO, particularly in searching and converging toward optimal solutions. The BSO-CV was compared against the most recent wrapper-based FS methods; namely, the hyperlearning binary dragonfly algorithm (HLBDA), the binary moth flame optimization with Lévy flight (LBMFO-V3), the coronavirus herd immunity optimizer with greedy crossover operator (CHIO-GC), as well as four filter methods with an accuracy of more than 90% in most benchmark datasets. These optimistic results reveal the great potential of BSO-CV in reliably searching the feature space.
The manuscript is organized as follows: Section 2 provides a review of related works. The theoretical and mathematical background of SO is presented in Section 3. In Section 4, the details of the new BSO and BSO-CV methods for FS are outlined. Then, in Section 5, the obtained results and related comparisons are reported and discussed. This is followed by the statistical test analysis in Section 6. Finally, the conclusion and several recommendations for future research are presented in Section 7.
2. Literature review
Medical applications are a critical and crucial research area for machine learning scientists. Recently, many studies that exploit artificial intelligence and data science techniques have assisted in developing medical models. This depends on medical images, patient medical files, and other features to predict disease occurrence at an early stage [5].
2.1. Evolutionary feature selection for disease diagnosis
This subsection sheds light on recent research in the field of medical applications that has developed evolutionary FS models to support physicians. In [19], the authors developed a new model for the early prediction of diabetes. The model used Grey Wolf Optimization (GWO) and Adaptive Particle Swarm Optimization (APSO) to improve the Multilayer Perceptron (MLP). They were able to reduce the number of selected features and achieve high-performance results: GWO-MLP and APGWO-MLP obtained accuracies of 96% and 97%, respectively.
Mazaher et al. [20] developed a new computer-aided diagnosis (CAD) system to detect different types of cardiac arrhythmia disorders using the ElectroCardioGram (ECG) signal. After the preprocessing steps, different features of the ECG signals were segmented and analyzed. Several metaheuristic algorithms were used in combination with the selected features. The best results were obtained using a multi-objective optimization algorithm called the Non-dominated Sorting Genetic Algorithm (NSGA-II); the feed-forward neural network accuracy for heart disease classification was 98.75%. In [21], the authors proposed a new model based on the Marine Predator Algorithm (MPA) to extract the most significant feature subset and enhance the classification accuracy using k-Nearest Neighbors (k-NN). The MPA-KNN was applied to 18 medical datasets and achieved the best results compared with other meta-heuristic algorithms.
The Moth Flame Optimization algorithm (MFO) [22] was one of the optimization algorithms used in developing FS approaches to handle medical diagnosis [12], [18], and [23]. In [12], Khurma et al. generated eight binary MFO versions using eight transfer functions. Then, they applied the Levy flight operator in combination with transfer functions to increase the diversity of the algorithm and support the exploration of the search space. The proposed approach achieved an accuracy of 83% over 23 datasets. In [18], an FS model based on the Moth Flame optimization algorithm (MFO) was proposed. The performance of MFO was improved by adopting an adaptive method to update the position of a solution. The proposed MFO was tested on sixteen medical datasets, and the results showed promising classification results. Another study [23] proposed the MFO using Levy flight and different selection mechanisms: random selection, tournament selection, and roulette wheel selection methods to decrease the bias of the MFO algorithm toward exploitation. The proposed methods were tested using 23 medical data sets. Their results showed an enhanced behavior of MFO in the exploration, convergence, and diversity of solutions.
Dhanusha et al. [24], proposed a new model for Alzheimer’s disease (AD) based on the imaging data and clinical profile. The memetic metaheuristic model was called the Chaotic Shuffled Frog Leaping Algorithm (CSFLA). It used chaotic mapping when the solution in the search space obtained the worst result. CSFLA [24] was a simple model with few parameters and generated smaller subsets of features, less computation time, and best performance compared with other algorithms in the deep neural network.
Jaddi et al. [25], employed the Cell Separation Algorithm (CSA) for cancer classification based on applying feature selection to microRNA data. The authors enhanced the movement of virtual cells in the CSA to achieve a balance between global and local search. The improved CSA (I-CSA) was tested using 22 classifiers on 25 test functions and four general biological classification problems, and an experiment for feature selection from microRNA data was performed. The accuracy of each cancer type was also compared with the accuracy of 77 classifiers reported in previous studies. The proposed approach obtained 100% accuracy in 25 out of 29 classes.
Abouelmagd et al. [26] applied the Coral Reefs Optimization (CRO) algorithm for FS of breast cancer. This was based on using five classifiers. The algorithm achieved an accuracy of 100% in four algorithms and 99.1% using one classifier. In the study performed by Alweshah [2], an FS approach was applied to determine the most informative subset of features for several medical problems. The Coronavirus Herd Immunity Optimizer (CHIO) was used with and without a Greedy Crossover (GC) operator to improve the exploration of the CHIO. The CHIO and CHIO-GC were applied to 24 medical datasets. The results show that CHIO-GC was better than CHIO in terms of accuracy, the number of selected features, F-measure, and convergence speed. The CHIO-GC obtained an accuracy of 79% on medical benchmark datasets and an accuracy of 93% on the COVID-19 dataset.
Kanya [27], developed a new CAD to use mammogram images for early detection of breast cancer. The authors applied feature extraction, selection, and other preprocessing steps. They proposed the Weighted Adaptive Binary Teaching Learning Based Optimization (WA-BTLBO) and XGBoost classifier. The experiments showed high-accuracy results in classifying mammogram images as normal or abnormal.
2.2. Machine learning techniques for tackling COVID-19: Background
Dey [28] proposed a hybrid model applied in two stages. The first stage fine-tuned the parameters of Convolutional Neural Networks (CNNs) to extract features from the COVID-19 patient’s infected lungs. The second stage applied the Manta Ray Foraging-based Golden Ratio Optimizer (MRFGRO) to select the most informative feature subset. The proposed model achieved classification accuracies of 99.15%, 99.42%, and 95.57% on three COVID-19 datasets, respectively.
Aslan [29] presented a classification model that extracted features using a CNN. Furthermore, the study identified the hyperparameters of the algorithms by Bayesian Optimization. Its main contribution was using Artificial Neural Networks (ANNs) for lung image segmentation; it also classified the chest images computed from the COVID-19 Radiography Database. Using classifiers together with the best hyperparameters produced optimized results, with the best-achieved result being 96.29% using SVM. In [30], an evolutionary algorithm, a deep learning algorithm, and an advanced interpretation model were combined into one framework to help clinical decision-makers deal with different pandemic cases promptly. The feature selection stage was implemented using a genetic algorithm, and a deep artificial neural network achieved an AUC of 0.883.
Bandyopadhyay et al. [31] proposed a two-stage method that applies feature extraction and feature selection for detecting COVID-19 from CT scan images. For feature extraction, the CNN DenseNet architecture was used. Harris Hawks Optimization (HHO), Simulated Annealing (SA), and chaotic maps were combined to perform feature selection. The method was applied to the SARS-COV-2 CT-Scan dataset, and the achieved accuracy was 98.42%. Deniz et al. [32] used the genetic algorithm and Extreme Learning Machines (MG-ELM), a multi-threaded genetic feature selection algorithm, to predict the risk level of COVID-19 patients. The authors studied the effects of the multi-threaded genetic algorithm implementation with statistical analysis. To verify the efficiency of MG-ELM, they compared their results with traditional and more recent techniques; the proposed algorithm outperformed the others in terms of prediction accuracy. Kurnaz et al. [33] applied an FS approach using the Crow Learning Algorithm and an ANN. The FS was used to select the relevant features for COVID-19 disease. The experiments were applied to a COVID-19 dataset from a Brazilian hospital, and the experimental results showed an accuracy of 94.31%.
In the study conducted by Kukker et al. [34], reinforcement learning was applied to determine COVID-19 using chest X-ray images. The author used the JAYA-Optimization algorithm, Wavelet Transform, feature extraction, and Principal Component Analysis feature reduction technique on X-ray images. The obtained accuracy of the COVID-19 prediction using the proposed method was 87.75%.
Ragab et al. [35], used the ensemble method for the detection of COVID-19. In addition, Gaussian filtering was used to eliminate noise and enhance image quality. Furthermore, a Shark Optimization Algorithm (SOA) with Recurrent Neural Networks (RNN) was applied to extract features. An Improved Bat Algorithm with a Multiclass Support Vector Machine (IBA-MSVM) was used for CT scan classification. The results showed promising classification performance over other approaches.
In [36], the authors introduced a novel HyperLearning Binary Dragonfly Algorithm (HLBDA) to select the most promising features from the COVID-19 dataset. The results showed that the HLBDA achieved higher results than other related algorithms on the same dataset. In [37], the Ensemble Support vector machine with Ludo Game-based Swarm Algorithm (ESLGSA) was used for the COVID-19 prediction from the CT and X-ray images. The proposed approach reduced the physical labeling of the images. The accuracy results were 99.64% while the AUC was 0.9257.
According to a recent study [38], the authors utilized PSO with a convolutional neural network (PSTCNN) to discover COVID-19 using chest computed tomography (CT) medical images. In more detail, the authors use PSO to perform self-tuning for the CNN’s hyperparameters to improve diagnosis performance. The proposed (PSTCNN) achieved an accuracy value of 93.99%±1.78% for binary classification. However, the PSTCNN is only utilized for tuning three hyperparameters (i.e., the coefficient that controls the decay rates of the past gradient, the square of the decay rates of the past gradient, and the learning rate).
The authors in [39] utilized deep learning with the self-adaptive Jaya algorithm (WE-SAJ) for COVID-19 CT image diagnosis. The proposed model first extracted the wavelet entropy features from the CT images, then utilized the self-adaptive Jaya algorithm for training the model. Finally, they employed a 2-layer feedforward neural network (FNN) as the classifier. Their proposed WE-SAJ model achieved more than 85% sensitivity. Although this model was well designed, it requires hyperparameter tuning to improve the obtained results and achieve fast convergence.
The bottom line is that many evolutionary algorithms have been applied in medical applications to diagnose several diseases. The findings showed that applying these intelligent algorithms with FS as a preprocessing stage can enhance the classification results. Concerning COVID-19, several machine-learning algorithms have been utilized to detect COVID-19. However, the critical aspects of medical diagnosis pushed researchers to propose new methodologies and new enhancement strategies to optimize the random search.
According to the No-Free-Lunch theorem, there is still room for proposing new algorithms to diagnose diseases. In this study, the recent SO algorithm is proposed for the first time for medical diagnosis. A binary version is produced to perform FS within a wrapper framework. Furthermore, crossover operators are proposed to enhance the search capability and generate more balance between the exploration and exploitation phases. The target is to enforce more diversity among solutions and assist entrapped solutions in jumping out of local minima.
3. Snake optimizer (SO)
The SO algorithm is inspired by the behavior of snakes in nature [16]. The following points show the main SO steps: SO initializes a set of random solutions in the search space using Eq. (1).

X_i = X_min + rand × (X_max − X_min)    (1)

where X_i is the location in the search space of the i-th solution in the swarm, rand is a random number in [0, 1], and X_min and X_max are the minimum and the maximum values, respectively, for the studied problem.
The population is divided into two parts (50% male and 50% female) using Eqs. (2), (3):

N_m ≈ N/2    (2)

N_f = N − N_m    (3)

where N is the size of the population (all snakes), N_m is the number of male solutions, and N_f is the number of female solutions.
Get the best solution from the male group (X_best,m) and the female group (X_best,f), and find the location of the food X_food. Two other concepts are defined, the temperature (Temp) and the quantity of food (Q), as in Eqs. (4), (5), respectively.

Temp = exp(−t / T)    (4)

where t is the current iteration and T is the number of all iterations.

Q = c1 × exp((t − T) / T)    (5)

where c1 is a constant value equal to 0.5.
Exploring the search space (food is not found): this depends on using a specified threshold value. If Q < Threshold (Threshold = 0.25 in [16]), the solutions search globally by updating their locations with respect to a specified random location in the search space. This is modeled by Eqs. (6)–(9).

X_i,m(t+1) = X_rand,m(t) ± c2 × A_m × ((X_max − X_min) × rand + X_min)    (6)

where X_i,m is the i-th male solution, X_rand,m is the location of a random male solution, rand is a random number in [0, 1], and A_m is the ability of the male solution to find the food, which can be computed using Eq. (7):

A_m = exp(−f_rand,m / f_i,m)    (7)

where f_rand,m is the fitness of X_rand,m, f_i,m is the fitness of the i-th solution in the male group, and c2 is a constant equal to 0.05.
X_i,f(t+1) = X_rand,f(t) ± c2 × A_f × ((X_max − X_min) × rand + X_min)    (8)

where X_i,f is the i-th female solution, X_rand,f is the location of a random female solution, rand is a random number in [0, 1], and A_f is the ability of the female solution to find the food, which can be computed using Eq. (9):

A_f = exp(−f_rand,f / f_i,f)    (9)

where f_rand,f is the fitness of X_rand,f, f_i,f is the fitness of the i-th solution in the female group, and c2 is a constant equal to 0.05.
Exploiting the search space (food is found): if the quantity of food Q is greater than the specified threshold, then the temperature is checked. If Temp > 0.6 (hot), the solutions will move to the food only:

X_i,j(t+1) = X_food ± c3 × Temp × rand × (X_food − X_i,j(t))    (10)

where X_i,j is the location of a solution (male or female), X_food is the location of the best solution, and c3 is a constant value equal to 2.
If Temp < 0.6 (cold), the snake will be in the fight mode or the mating mode.

Fight mode:

X_i,m(t+1) = X_i,m(t) + c3 × FAM × rand × (Q × X_best,f − X_i,m(t))    (11)

where X_i,m is the male location, X_best,f is the location of the best solution in the female group, and FAM is the fighting ability of the male solution.

X_i,f(t+1) = X_i,f(t) + c3 × FAF × rand × (Q × X_best,m − X_i,f(t))    (12)

where X_i,f is the female location, X_best,m is the location of the best solution in the male group, and FAF is the fighting ability of the female solution.

FAM and FAF can be computed from the following equations:

FAM = exp(−f_best,f / f_i)    (13)

FAF = exp(−f_best,m / f_i)    (14)

where f_best,f is the fitness of the best solution in the female group, f_best,m is the fitness of the best solution in the male group, and f_i is the fitness of the i-th solution.
Mating mode:

X_i,m(t+1) = X_i,m(t) + c3 × MAm × rand × (Q × X_i,f(t) − X_i,m(t))    (15)

X_i,f(t+1) = X_i,f(t) + c3 × MAf × rand × (Q × X_i,m(t) − X_i,f(t))    (16)

where X_i,f is the location of the i-th solution in the female group, X_i,m is the location of the i-th solution in the male group, and MAm and MAf are the mating abilities of males and females, respectively; they can be computed as follows:

MAm = exp(−f_i,f / f_i,m)    (17)

MAf = exp(−f_i,m / f_i,f)    (18)
If the egg hatches, select the worst male solution and the worst female solution and replace them:

X_worst,m = X_min + rand × (X_max − X_min)    (19)

X_worst,f = X_min + rand × (X_max − X_min)    (20)

where X_worst,m is the worst solution in the male group and X_worst,f is the worst solution in the female group. The diversity factor (the ± operator) allows solutions’ locations to increase or decrease, giving a high probability of moving solutions in all possible directions across the search space.
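The control flow of the steps above can be sketched as a compact, real-valued implementation. This is an illustrative simplification, not the authors’ code: the constants c1 = 0.5, c2 = 0.05, and c3 = 2 follow the text, while the food-quantity and temperature thresholds (0.25 and 0.6) and the merged fight/mating update are assumed values.

```python
import numpy as np

def snake_optimizer(fitness, dim, n=30, T=100, lb=-10.0, ub=10.0, seed=0):
    """Simplified, real-valued SO sketch. The Q/Temp thresholds (0.25, 0.6)
    are assumed; fight and mating modes are merged into one cold-mode update."""
    rng = np.random.default_rng(seed)
    c1, c2, c3 = 0.5, 0.05, 2.0
    X = lb + rng.random((n, dim)) * (ub - lb)            # Eq. (1): random init
    nm = n // 2                                          # Eqs. (2)-(3): male/female split
    gbest, gval = None, np.inf
    for t in range(1, T + 1):
        f = np.array([fitness(x) for x in X])
        if f.min() < gval:                               # track the food (global best)
            gval, gbest = f.min(), X[f.argmin()].copy()
        temp = np.exp(-t / T)                            # Eq. (4): temperature
        Q = c1 * np.exp((t - T) / T)                     # Eq. (5): food quantity
        for i in range(n):
            r, sgn = rng.random(), np.sign(rng.random() - 0.5)
            if Q < 0.25:                                 # exploration: no food found
                j = rng.integers(nm) if i < nm else nm + rng.integers(n - nm)
                A = np.exp(-f[j] / (f[i] + 1e-12))       # Eqs. (7)/(9): foraging ability
                X[i] = X[j] + sgn * c2 * A * ((ub - lb) * r + lb)
            elif temp > 0.6:                             # hot: move toward the food, Eq. (10)
                X[i] = gbest + sgn * c3 * temp * r * (gbest - X[i])
            else:                                        # cold: fight/mating-style move
                F = np.exp(-gval / (f[i] + 1e-12))       # Eqs. (13)/(14): fighting ability
                X[i] = X[i] + c3 * F * r * (Q * gbest - X[i])
            X[i] = np.clip(X[i], lb, ub)
    return gbest, gval
```

Run on a sphere function, the sketch reproduces the intended dynamics: early iterations explore around random peers, mid iterations crowd the food, and late (cold) iterations contract toward the scaled best position.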
4. Proposed binary snake optimizer (BSO)
The FS problem deals with binary solutions that move in a discrete search space. The goal of the FS problem is to find the optimal subset of features. This feature subset represents the minimum number of features that have the maximum classification performance. In this study, the SO optimizer is converted for the first time into binary to tackle the FS problem. The original version of SO was developed to deal with continuous search space. Generating a binary version of SO requires representing a solution using a binary vector. The values of the solution’s elements are restricted to either ‘0’ or ‘1’. Concerning the update strategy in the algorithm, the solutions change their positions in the feature space. This requires using transfer functions to guarantee that the solution’s elements are either ‘0’ or ‘1’.
4.1. S-shaped transfer function
The transfer function used in this study is the sigmoid function in Eq. (21). Its main task is to generate a probability for each element of a solution. If this probability is greater than a random threshold, the element’s value is set to ‘0’; otherwise, it is set to ‘1’, as presented in Eq. (22), where v_i^d(t) is the continuous value of the i-th solution at iteration t in dimension d. Algorithm 1 represents the pseudo-code of the binary version of the SO algorithm (BSO). Fig. 2 shows the flowchart of the BSO.

S(v) = 1 / (1 + e^(−v))    (21)

x_i^d(t+1) = 0 if S(v_i^d(t+1)) > rand, and 1 otherwise    (22)

where rand is a random number drawn uniformly from [0, 1].
Fig. 2.
The flowchart of the proposed BSO for feature selection.
In the FS approach presented in this work, the transfer function (TF) displayed in Fig. 1, which plots Eq. (21), is used to represent the probability of changing the positions of the elements.
Fig. 1.
The sigmoidal transfer function for converting continuous data to discrete.
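A minimal NumPy sketch of Eqs. (21)–(22), assuming the binarization rule as stated above (a transfer probability above the random threshold yields ‘0’, otherwise ‘1’):

```python
import numpy as np

def s_shaped(v):
    """Sigmoid transfer function of Eq. (21): maps a continuous value to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-v))

def binarize(v, rng):
    """Binarization rule of Eq. (22): a bit becomes 0 when the transfer
    probability exceeds a uniform random threshold, and 1 otherwise."""
    return (s_shaped(v) <= rng.random(np.shape(v))).astype(int)
```

Each continuous position produced by the SO update equations is passed through `binarize` before the solution is evaluated, keeping every element in {0, 1}.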
4.2. BSO for feature selection
To prepare the BSO for the FS problem, two main aspects should be considered: the solution representation and the fitness function. The FS problem requires initializing a solution using a binary vector. The length of this vector is the dimensionality of the problem. Hence, each bit of this vector represents a feature in a dataset. The values of the elements are either ‘0’ or ‘1’. ‘0’ means that the corresponding feature is not selected, while ‘1’ means the corresponding feature is selected. Fig. 3 shows the binary representation of the solutions.
Fig. 3.
The binary representation of the solutions for feature selection task.
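The encoding of Fig. 3 can be illustrated with a short snippet; the feature names below are hypothetical:

```python
import numpy as np

features = ["age", "bmi", "glucose", "bp", "insulin"]   # hypothetical feature names
solution = np.array([1, 0, 1, 0, 1])                    # one binary solution vector

# '1' means the corresponding feature is selected, '0' means it is excluded
selected = [f for f, bit in zip(features, solution) if bit == 1]
print(selected)
```

The classifier is then trained only on the columns whose bits are ‘1’, here `age`, `glucose`, and `insulin`.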
Selecting all the features of a dataset, such as in brute-force algorithms, causes the search algorithm’s running time to be exponential: the running time is O(2^n), where n is the number of features in a dataset. Hence, reducing the number of features will increase the efficiency of the search algorithm. For this reason, an FS algorithm is multiobjective because its main target is to find a solution with the minimum number of features and maximum classification performance. In this study, the K-nearest neighbor (K-NN) algorithm was used to train the dataset. The parameter K is set to 5 [40]. The second aspect of the FS problem is the fitness function. The evaluation of the solutions depends on the number of selected features and the classification error rate, as in Eq. (23), where γ_R is the error rate of classification, R is the selected feature subset, and C is the set of all features in a dataset.

Fitness = α × γ_R + β × (|R| / |C|)    (23)

where α ∈ [0, 1] weights the classification error and β = 1 − α weights the selected-feature ratio.
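A sketch of the wrapper evaluation of Eq. (23), using a NumPy-only leave-one-out K-NN as a stand-in for the learner; the weight alpha = 0.99 and the leave-one-out scheme are assumptions, not necessarily the authors’ exact setup:

```python
import numpy as np

def knn_error(Xs, y, k=5):
    """Leave-one-out error of a plain K-NN classifier (K = 5 as stated above)."""
    n = len(y)
    err = 0
    for i in range(n):
        d = np.linalg.norm(Xs - Xs[i], axis=1)
        d[i] = np.inf                        # leave the query point out
        nn = np.argsort(d)[:k]               # indices of the k nearest neighbors
        votes = np.bincount(y[nn])
        err += int(votes.argmax() != y[i])
    return err / n

def fitness(solution, X, y, alpha=0.99):
    """Fitness of Eq. (23): alpha * error_rate + (1 - alpha) * |R| / |C|.
    alpha = 0.99 is an assumed weight favoring accuracy."""
    mask = np.asarray(solution, dtype=bool)
    if not mask.any():
        return 1.0                           # an empty subset is worst-case
    return alpha * knn_error(X[:, mask], y) + (1 - alpha) * mask.sum() / mask.size
```

Lower fitness is better: the optimizer is rewarded both for reducing the error rate and for shrinking the selected subset R relative to the full set C.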
4.3. Evolutionary crossover operators
The crossover operator is one of the primary evolutionary operators that have been widely used to enhance the swarm-based algorithm. Integrating the crossover operator in the structure of a swarm-based algorithm causes a greater exploration of the search space. This means that the solutions are re-positioned and distributed to undiscovered regions of the search space. Empowering the diversity of the algorithm assists the optimizer to alleviate the local minima problem and being close to the global best solution. This algorithm is abbreviated as BSO-CV. Eq. (24) shows the crossover function. Each solution in the search space is linked with the position of one of the fittest solutions in the swarm. The roulette wheel selection operator is used to find out the fittest solution .
X_i_new = { OnePoint(X_i, X_best) if 0 ≤ r < 1/3; TwoPoint(X_i, X_best) if 1/3 ≤ r < 2/3; Uniform(X_i, X_best) if 2/3 ≤ r ≤ 1 } | (24) |

where r is a uniform random number in [0, 1].
Three types of crossover operators are used and integrated with the SO algorithm. The roulette wheel selection operator is used to select among these types randomly in each run of the SO.
• One-point crossover: randomly selects a single point in the current solution; the elements after that point are exchanged between the current solution X_i and the best solution X_best. A one-point crossover occurs when 0 ≤ r < 1/3, where r is a random number in [0, 1].
• Two-point crossover: randomly selects two points in the current solution; the elements between the two points are interchanged between X_i and X_best. A two-point crossover occurs when 1/3 ≤ r < 2/3.
• Uniform crossover: the elements of the current solution X_i and the best solution X_best are shuffled based on a pre-determined ratio. For example, if the ratio is 30%, then 30% of the elements in the solution are exchanged between the two solutions. A uniform crossover occurs when 2/3 ≤ r ≤ 1.
Fig. 4 shows the techniques followed by different types of crossover operators. Algorithm 3 shows the Greedy crossover operator pseudocode.
Fig. 4.
Evolutionary crossover operators: (i) one-point crossover. (ii) Two-point crossover. (iii) Uniform crossover.
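The three operators and the random switch between them can be sketched as follows (a minimal illustration assuming list-encoded binary solutions and the equal one-third switching probabilities described above):

```python
import random

def one_point(a, b, rng):
    """Swap the tails of the two parents after a random cut point."""
    p = rng.randrange(1, len(a))
    return a[:p] + b[p:]

def two_point(a, b, rng):
    """Take the segment between two random points from the second parent."""
    p, q = sorted(rng.sample(range(1, len(a)), 2))
    return a[:p] + b[p:q] + a[q:]

def uniform(a, b, rng, ratio=0.3):
    """Copy a fixed fraction of randomly chosen positions from the second parent."""
    child = list(a)
    for i in rng.sample(range(len(a)), int(ratio * len(a))):
        child[i] = b[i]
    return child

def crossover(current, best, rng=None):
    """Choose one of the three operators by a uniform random number r."""
    rng = rng or random.Random(0)
    r = rng.random()
    if r < 1 / 3:
        return one_point(current, best, rng)
    if r < 2 / 3:
        return two_point(current, best, rng)
    return uniform(current, best, rng)
```

Every child position comes from one of the two parents, so the result remains a valid binary solution of the same length.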
4.4. Complexity analysis of BSO and BSO-CV
The time complexity of the BSO and BSO-CV algorithms was analyzed using the Big-O notation (i.e., the worst case). In particular, the time complexity of these methods for feature selection tasks depends primarily on the initialization process, the dataset dimensions (D), the cost of the fitness function (C), the number of iterations of the optimization algorithm (T), the population size n (i.e., the number of male + female individuals), and the number of running experiments (R). In addition, the S-shaped transfer function is used to produce the binary versions of the BSO and BSO-CV. Based on the above notations, the general computational complexity of the BSO and BSO-CV can be formulated in the Big-O case as follows:
O(BSO) = O(R × (initialization + T × (fitness evaluation + position updating))) | (25) |
By calculating the Big-O for each phase in Eq. (25), the time complexity for BSO can be represented as the following:
O(BSO) = O(R × (n × D + T × n × C + T × n × D)) | (26) |
O(BSO) = O(R × n × (D + T × C + T × D)) | (27) |
As shown in Eq. (27), the main parameters of the complexity rely on the number of iterations as well as the size of the population. Besides, T × C ≫ D and T × D ≫ D, so the standalone initialization component D can be ruled out from the time complexity given in Eq. (27). Thus, the time complexity of the BSO can be viewed as the following:
O(BSO) = O(R × T × n × (C + D)) | (28) |
For the BSO-CV, the time complexity is the same as that of the BSO except that the crossover (CV) step is added in each iteration. Since the crossover costs O(D) per solution, this term is absorbed into the D component, and the time complexity of the BSO-CV remains O(R × T × n × (C + D)).
5. Experimental results and discussion
5.1. Experiment settings and parameters setup
Some preliminary experiments were carried out to determine the input parameters that enabled the proposed method to produce better output. To ensure fairness, the algorithm configurations were identical throughout the experiments. The classifier used in the BSO wrapper framework is the K-nearest neighbor (KNN). The KNN receives each unclassified new data instance in the feature space as an input, then uses a similarity measure to classify it into a particular category. This labeling method is a kind of supervised learning that is commonly used in disease diagnosis. In this study, K = 5 is used for voting, and the decision of class membership is based on the majority of votes. The parameter settings are as follows: the number of runs is 30, the number of iterations is 100, and the population size is 100.
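A minimal distance-based majority-vote KNN in the spirit of the description above (a plain Euclidean implementation for illustration, not the authors' code):

```python
from collections import Counter

def knn_predict(train_X, train_y, x, k=5):
    """Classify x by majority vote among its k nearest training
    neighbors, ranked by squared Euclidean distance."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]
```

Squared distances are used since they preserve the neighbor ranking; k = 5 matches the setting stated above.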
5.2. Evaluation measures
The proposed BSO and BSO-CV are evaluated using accuracy, the number of selected features, running time, sensitivity and specificity, convergence curves, boxplots, and the T-test. The following are descriptions of the accuracy, sensitivity, and specificity measures, along with their formulas and their meaning relative to disease diagnosis. Eq. (29), Eq. (30), and Eq. (31) show the mathematical formulas of the classification accuracy, sensitivity, and specificity, respectively.
Accuracy = (TP + TN) / (TP + TN + FP + FN) | (29) |
where:
• True positives (TPs): instances that are actually sick (have the disease) and that the model diagnoses as sick.
• True negatives (TNs): instances that are actually well (do not have the disease) and that the model diagnoses as well.
• False positives (FPs): instances that are actually well but that the model diagnoses as sick.
• False negatives (FNs): instances that are actually sick but that the model diagnoses as well.
Sensitivity = TP / (TP + FN) | (30) |

Specificity = TN / (TN + FP) | (31) |
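The four counts above combine into the three measures as follows (a direct transcription of Eqs. (29)–(31), with label 1 encoding "sick"):

```python
def confusion_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity from binary labels (1 = sick)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return accuracy, sensitivity, specificity
```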
5.3. Description of the benchmark datasets
Table 1 shows the datasets used in this study. 23 medical benchmark datasets are used in the experiments, in addition to a real COVID-19 dataset. Twelve of the benchmark datasets were downloaded from the UCI repository (Diagnostic, Original, Prognostic, Coimbra, BreastEW, Retinopathy, Dermatology, ILPD-Liver, Lymphography, Parkinsons, ParkinsonC, and Prostate). Seven datasets were downloaded from KEEL (SPECT, Cleveland, HeartEW, Hepatitis, SAHeart, Spectfheart, and Thyroid0387). Two datasets (Heart and Pima-diabetes) were downloaded from Kaggle. The remaining three datasets (Leukemia, Colon, and Prostate_GE) were downloaded from the scikit-feature repository (https://jundongl.github.io/scikit-feature/datasets.html).
Table 1.
Medical benchmark datasets.
| Number | Dataset | Number of features | Number of instances | Number of classes |
|---|---|---|---|---|
| 1 | Diagnostic | 30 | 569 | 2 |
| 2 | Original | 9 | 699 | 2 |
| 3 | Prognostic | 33 | 194 | 2 |
| 4 | Coimbra | 9 | 115 | 2 |
| 5 | BreastEW | 30 | 596 | 2 |
| 6 | Retinopathy | 19 | 1151 | 2 |
| 7 | Dermatology | 34 | 366 | 6 |
| 8 | ILPD-Liver | 10 | 583 | 2 |
| 9 | Lymphography | 18 | 148 | 4 |
| 10 | Parkinsons | 22 | 194 | 2 |
| 11 | ParkinsonC | 753 | 755 | 2 |
| 12 | SPECT | 22 | 267 | 2 |
| 13 | Cleveland | 13 | 297 | 5 |
| 14 | HeartEW | 13 | 270 | 2 |
| 15 | Hepatitis | 18 | 79 | 2 |
| 16 | SAHeart | 9 | 461 | 2 |
| 17 | Spectfheart | 43 | 266 | 2 |
| 18 | Thyroid0387 | 21 | 7200 | 3 |
| 19 | Heart | 13 | 302 | 5 |
| 20 | Pima-diabetes | 9 | 768 | 2 |
| 21 | Leukemia | 7129 | 72 | 2 |
| 22 | Colon | 2000 | 62 | 2 |
| 23 | Prostate_GE | 5966 | 102 | 2 |
5.4. A real world COVID-19 dataset
Recently, the world has suffered from the spread of COVID-19, caused by a contagious virus. The disease spread so widely that it was classified as a pandemic. It caused many deaths, and the number of patients exceeded the capacity of hospitals to accommodate them. Machine learning techniques have been used to treat the disease and control its spread [41], [42]. In this study, the COVID-19 real dataset was downloaded from https://github.com/AtharvaPeshkar/Covid-19-Patient-Health-Analytics. The purpose is to validate the BSO and BSO-CV by examining their ability to detect the disease. Table 2 shows the features of the dataset. This study intends to predict the death and recovery outcomes depending on the given factors. The patients’ records that contain missing values for both the “death” and “recov” status are removed from the main dataset. For the training and testing methodology, the dataset was split evenly into 50% training and 50% testing.
Table 2.
Covid-19 real dataset.
| No | Feature name | Description |
|---|---|---|
| 1 | Id | Patient identifier |
| 2 | Location | Patient location (local address) |
| 3 | Country | Country of origin of the patient |
| 4 | Gender | Gender of the patient |
| 5 | Age | Age of the patient |
| 6 | Sym_on | Date the patient shows symptoms |
| 7 | Hosp_vis | Date the patient visits hospital |
| 8 | vis_wuhan | The patient has visited Wuhan |
| 9 | From_wuhan | The patient is from Wuhan |
| 10 | Symptom1 | A symptom presented by the patient |
| 11 | Symptom2 | A symptom presented by the patient |
| 12 | symptom3 | A symptom presented by the patient |
| 13 | Symptom4 | A symptom presented by the patient |
| 14 | Symptom5 | A symptom presented by the patient |
| 15 | Symptom6 | A symptom presented by the patient |
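The preprocessing step described above can be sketched as follows (assuming each patient record is a dictionary whose missing "death"/"recov" statuses are None; the field names are taken from the description, not from the authors' code):

```python
import random

def prepare_split(records, rng=None):
    """Drop patients whose 'death' or 'recov' status is missing,
    then split the remainder evenly into 50% train / 50% test."""
    rng = rng or random.Random(7)
    clean = [r for r in records
             if r.get("death") is not None and r.get("recov") is not None]
    rng.shuffle(clean)
    half = len(clean) // 2
    return clean[:half], clean[half:]
```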
5.5. Results and discussion
As was already stated, FS approaches try to reduce the problem space dimension by choosing the most informative features. The binary version of the SO, known as BSO, was applied to the datasets for this purpose. Table 3 depicts the classification accuracy, sensitivity, specificity, and precision, as well as the number of selected features and the best average fitness value (AVE). Additionally, each measurement’s standard deviation (STD) was recorded. Superior improvements are demonstrated by obtaining ideal results with fewer features when using the BSO: in 11 datasets, it achieves a classification accuracy greater than 90%, and in 8 datasets, over 95%. The suggested technique reduced the data in 22 datasets by more than 50%, and in four datasets the reduction rate exceeded 90%, which minimized the complexity and saved resources. Moreover, for the COVID-19 dataset, the BSO correctly identified the occurrences with a 95% proportion by employing just, on average, 3.5 features, yielding a 76% reduction rate, as displayed in Table 10.
Table 3.
The results of SO without feature selection.
| Benchmark | Stat. measure | Accuracy | Sensitivity | Specificity | Time |
|---|---|---|---|---|---|
| Diagnostic | AVE | 0.7499 | 0.4796 | 0.9098 | 28.9987 |
| | STD | 0.0425 | 0.0991 | 0.0356 | 3.9533 |
| Original | AVE | 0.9691 | 0.9699 | 0.9698 | 54.5091 |
| | STD | 0.0087 | 0.0133 | 0.0256 | 2.5939 |
| Prognostic | AVE | 0.7184 | 0.9193 | 0.6956 | 75.9876 |
| | STD | 0.0587 | 0.0754 | 0.0889 | 4.9834 |
| Coimbra | AVE | 0.5145 | 0.4239 | 0.6137 | 29.8974 |
| | STD | 0.0976 | 0.1815 | 0.1573 | 39.9873 |
| Retinopathy | AVE | 0.6523 | 0.6742 | 0.6333 | 149.9875 |
| | STD | 0.0303 | 0.0517 | 0.0407 | 29.7789 |
| Dermatology | AVE | 0.6934 | 0.8332 | 0.9211 | 30.9987 |
| | STD | 0.0423 | 0.0459 | 0.0772 | 2.3456 |
| ILPD-Liver | AVE | 0.7728 | 0.8060 | 0.2066 | 55.1002 |
| | STD | 0.0444 | 0.9281 | 0.0735 | 2.7344 |
| Lymphography | AVE | 0.9043 | 0.4162 | 0.7000 | 59.9567 |
| | STD | 0.0739 | 0.5348 | 0.0948 | 49.4922 |
| Parkinsons | AVE | 0.8471 | 0.5703 | 0.9410 | 72.5678 |
| | STD | 0.0492 | 0.1452 | 0.0546 | 0.9988 |
| ParkinsonC | AVE | 0.7307 | 0.2707 | 0.8938 | 77.6543 |
| | STD | 0.0295 | 0.0595 | 0.0313 | 5.9866 |
| SPECT | AVE | 0.6717 | 0.7249 | 0.5971 | 96.5789 |
| | STD | 0.0448 | 0.0625 | 0.0847 | 25.5431 |
| Cleveland | AVE | 0.4712 | 0.1969 | 0.8142 | 15.8569 |
| | STD | 0.0569 | 0.0291 | 0.0128 | 5.2949 |
| HeartEW | AVE | 0.8284 | 0.8481 | 0.8064 | 56.9973 |
| | STD | 0.0412 | 0.0635 | 0.0682 | 34.7790 |
| Hepatitis | AVE | 0.8778 | 0.0789 | 0.9533 | 45.7563 |
| | STD | 0.0759 | 0.2138 | 0.0491 | 2.7331 |
| SAHeart | AVE | 0.6239 | 0.7774 | 0.3216 | 97.4367 |
| | STD | 0.0516 | 0.0477 | 0.1003 | 5.7791 |
| Spectfheart | AVE | 0.7692 | 0.3404 | 0.8869 | 104.8872 |
| | STD | 0.0505 | 0.1621 | 0.0438 | 3.4456 |
| Thyroid0387 | AVE | 0.9382 | 0.5463 | 0.7536 | 155.9893 |
| | STD | 0.0065 | 0.0329 | 0.0129 | 7.8893 |
| Heart | AVE | 0.8567 | 0.8087 | 0.6432 | 29.6388 |
| | STD | 0.5679 | 0.0994 | 0.3451 | 2.4167 |
| Pima-diabetes | AVE | 0.7107 | 0.8021 | 0.5345 | 69.8896 |
| | STD | 0.0287 | 0.0347 | 0.0554 | 5.7689 |
| Leukemia | AVE | 0.8714 | 0.9773 | 0.6872 | 67.9987 |
| | STD | 0.0868 | 0.0495 | 0.2041 | 5.7654 |
| Prostate_GE | AVE | 0.8711 | 0.8813 | 0.8594 | 250.6578 |
| | STD | 0.0535 | 0.0941 | 0.0763 | 9.6678 |
| BreastEW | AVE | 0.9596 | 0.9824 | 0.9212 | 7.8890 |
| | STD | 0.0154 | 0.0158 | 0.0404 | 0.6789 |
| Colon | AVE | 0.7528 | 0.8701 | 0.6002 | 9.9865 |
| | STD | 0.1285 | 0.1508 | 0.2365 | 0.8976 |
Table 10.
Results of BSO and BSO-CV on COVID-19.
Results of BSO on the COVID-19 dataset:

| Evaluation measure | Average | Standard deviation | Minimum | Maximum |
|---|---|---|---|---|
| Accuracy | 0.9378 | 0.0136 | 0.9167 | 0.9500 |
| # Selected features | 3.1000 | 2.5582 | 2.5000 | 3.5000 |
| Fitness value | 0.1371 | 0.0054 | 0.1304 | 0.1390 |
| Running time | 21.6646 | 0.0412 | 21.6157 | 21.7439 |
| Sensitivity | 0.2706 | 0.0610 | 0.1765 | 0.3571 |
| Specificity | 0.9631 | 0.0197 | 0.9314 | 1.0000 |
Results of BSO-CV on the COVID-19 dataset:

| Evaluation measure | Average | Standard deviation | Minimum | Maximum |
|---|---|---|---|---|
| Accuracy | 0.9560 | 0.0123 | 0.9167 | 0.9661 |
| #Selected features | 1.7000 | 2.4967 | 1.5000 | 2.4400 |
| Fitness value | 0.1351 | 0.0039 | 0.1299 | 0.1366 |
| Running time | 4.4075 | 0.1031 | 4.2226 | 4.6160 |
| Sensitivity | 0.2973 | 0.1132 | 0.1111 | 0.5000 |
| Specificity | 0.9664 | 0.0231 | 0.9118 | 0.9905 |
These results stem from how snakes behave, as conditions such as food quantity and temperature lead snakes to change their locations. Additionally, the groups of males and females can investigate more positions in the search space during the initialization phase.
In general, population-based evolutionary algorithms can fall into local optima because the solutions are investigated in a guided manner around the best solution’s space. As a population-based algorithm, the SO can likewise become trapped in a local optima region. Therefore, we use the well-known crossover (CV) technique to avoid trapping into local optima and to balance the exploration and exploitation phases. Furthermore, by using the CV technique and the position-updating operator, more positions in the search space will be discovered to go beyond the local optima region. Table 4 shows a classification accuracy comparison between the BSO and the BSO augmented with a crossover operator, called BSO-CV. It is demonstrated that the BSO-CV outperforms the standard BSO in 16 datasets and has the same classification accuracy as BSO in 6 datasets. Furthermore, the minimum accuracy of the BSO-CV is better than that of the BSO in six datasets, as opposed to reaching similar accuracy on average, which indicates the effect of covering more positions in the search space. Thus, the solutions newly generated by the CV operators play an essential role in widening the searchability of the algorithm and gaining the power to avoid local optima.
Table 4.
Results of the proposed BSO vs augmented BSO with crossover in terms of average, standard deviation, minimum, maximum accuracy.
| Benchmark | Average | | Standard deviation | | Minimum | | Maximum | |
|---|---|---|---|---|---|---|---|---|
| | BSO | BSO-CV | BSO | BSO-CV | BSO | BSO-CV | BSO | BSO-CV |
| Diagnostic | 0.9912 | 0.9930 | 0.0091 | 0.0124 | 0.9549 | 0.9725 | 0.9800 | 1.0000 |
| Original | 0.9886 | 0.9886 | 0.0060 | 0.0060 | 0.9357 | 0.9557 | 0.9700 | 1.0000 |
| Prognostic | 0.8368 | 0.8474 | 0.0166 | 0.0299 | 0.7995 | 0.8421 | 0.8733 | 0.8947 |
| Coimbra | 0.8182 | 0.8182 | 0.0000 | 0.0000 | 0.7582 | 0.7782 | 0.7910 | 0.8182 |
| Retinopathy | 0.7348 | 0.7391 | 0.0074 | 0.0100 | 0.6917 | 0.7004 | 0.7210 | 0.7478 |
| Dermatology | 1.0000 | 1.0000 | 0.0000 | 0.0000 | 0.9290 | 0.9400 | 0.9600 | 1.0000 |
| ILPD-Liver | 0.7949 | 0.7966 | 0.0000 | 0.0054 | 0.7397 | 0.7466 | 0.7711 | 0.7966 |
| Lymphography | 0.9452 | 0.9729 | 0.0628 | 0.0351 | 0.8571 | 0.9286 | 0.95550 | 1.0000 |
| Parkinsons | 0.9845 | 0.9850 | 0.0250 | 0.0242 | 0.9474 | 0.9500 | 0.9676 | 1.0000 |
| ParkinsonC | 0.7833 | 0.7873 | 0.0158 | 0.0131 | 0.7332 | 0.7632 | 0.7900 | 0.8133 |
| SPECT | 0.8047 | 0.8180 | 0.0479 | 0.0266 | 0.7989 | 0.8889 | 0.8419 | 0.8889 |
| Cleveland | 0.6817 | 0.7041 | 0.0390 | 0.0145 | 0.6097 | 0.7737 | 0.7022 | 0.7241 |
| HeartEW | 0.9148 | 0.9370 | 0.0250 | 0.0250 | 0.6800 | 0.7078 | 0.9411 | 0.9630 |
| Hepatitis | 0.9750 | 0.9875 | 0.0530 | 0.0395 | 0.8362 | 0.9350 | 0.98100 | 1.0000 |
| SAHeart | 0.7478 | 0.7500 | 0.0112 | 0.0185 | 0.7391 | 0.8730 | 0.7509 | 0.7826 |
| Spectfheart | 0.9051 | 0.9084 | 0.0406 | 0.0267 | 0.7091 | 0.8846 | 0.9445 | 0.9615 |
| Thyroid0387 | 0.9894 | 0.9881 | 0.0032 | 0.0032 | 0.9145 | 0.9400 | 0.9831 | 0.9944 |
| Heart | 0.9074 | 0.9259 | 0.0000 | 0.0195 | 0.7082 | 0.7682 | 0.9033 | 0.9259 |
| Pima-diabetes | 0.8182 | 0.8182 | 0.0000 | 0.0000 | 0.8344 | 0.8959 | 0.7982 | 0.8182 |
| Leukemia | 1.0000 | 1.0000 | 0.0000 | 0.0000 | 0.9066 | 0.9321 | 0.9870 | 1.0000 |
| Prostate_GE | 0.9900 | 1.0000 | 0.0316 | 0.0000 | 0.8571 | 0.8571 | 0.9822 | 1.0000 |
| BreastEW | 0.9895 | 0.9912 | 0.0092 | 0.0123 | 0.8093 | 0.9080 | 0.9810 | 1.0000 |
| Colon | 0.9571 | 0.9571 | 0.0690 | 0.0690 | 0.9000 | 0.9049 | 0.9744 | 1.0000 |
On the other hand, based on the number of selected features, Table 5 shows that the BSO-CV achieves a higher reduction rate by adopting fewer features in 14 datasets. In addition, the BSO-CV surpasses the BSO in terms of reduction rate on the COVID-19 dataset, shrinking the dataset’s dimension by 89% as opposed to the BSO’s 79%, as shown in Table 10. As a result, both BSO and BSO-CV show significant reduction rates for solving FS problems.
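The quoted reduction rates follow directly from the average subset sizes in Table 10 (1.7 of the 15 COVID-19 features for BSO-CV versus 3.1 for BSO); a one-line helper makes the arithmetic explicit:

```python
def reduction_rate(n_selected, n_total):
    """Fraction of the original features that were removed."""
    return 1 - n_selected / n_total
```

For example, `reduction_rate(1.7, 15)` is roughly 0.89 and `reduction_rate(3.1, 15)` roughly 0.79, matching the percentages above.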
Table 5.
Results of the proposed BSO vs augmented BSO with crossover in terms of the average, standard deviation, minimum and maximum number of selected features.
| Benchmark | Average | | Standard deviation | | Minimum | | Maximum | |
|---|---|---|---|---|---|---|---|---|
| | BSO | BSO-CV | BSO | BSO-CV | BSO | BSO-CV | BSO | BSO-CV |
| Diagnostic | 14.4000 | 13.0000 | 2.6331 | 1.4907 | 9.0000 | 11.0000 | 18.0000 | 15.0000 |
| Original | 3.3000 | 3.2000 | 0.4830 | 0.4216 | 3.0000 | 3.0000 | 4.0000 | 4.0000 |
| Prognostic | 15.5000 | 14.2000 | 2.6352 | 2.2509 | 12.0000 | 10.0000 | 21.0000 | 18.0000 |
| Coimbra | 6.1000 | 6.0000 | 0.3162 | 0.0000 | 6.0000 | 6.0000 | 7.0000 | 6.0000 |
| Retinopathy | 11.2000 | 10.5000 | 2.2010 | 1.9003 | 8.0000 | 8.0000 | 14.0000 | 14.0000 |
| Dermatology | 17.3000 | 17.3000 | 1.7670 | 1.7670 | 14.0000 | 15.0000 | 19.0000 | 19.0000 |
| ILPD-Liver | 4.0000 | 3.6000 | 0.4714 | 0.5164 | 3.0000 | 3.0000 | 5.0000 | 4.0000 |
| Lymphography | 9.4000 | 9.1000 | 3.3015 | 1.9120 | 16.0000 | 7.0000 | 27.0000 | 12.0000 |
| Parkinsons | 9.1000 | 8.5000 | 1.9692 | 1.3540 | 7.0000 | 4.0000 | 13.0000 | 10.0000 |
| ParkinsonC | 461.4000 | 466.1000 | 23.5523 | 20.1299 | 425.0000 | 437.0000 | 496.0000 | 508.0000 |
| SPECT | 12.5000 | 12.6000 | 1.5811 | 1.8379 | 9.0000 | 10.0000 | 15.0000 | 15.0000 |
| Cleveland | 6.4000 | 6.5000 | 0.9661 | 0.9718 | 5.0000 | 5.0000 | 8.0000 | 8.0000 |
| HeartEW | 7.3000 | 6.8000 | 1.4181 | 0.6325 | 5.0000 | 6.0000 | 9.0000 | 8.0000 |
| Hepatitis | 5.9000 | 6.9000 | 1.8529 | 1.9120 | 2.0000 | 4.0000 | 8.0000 | 9.0000 |
| SAHeart | 3.1000 | 2.6000 | 0.5676 | 0.5164 | 2.0000 | 2.0000 | 4.0000 | 3.0000 |
| Spectfheart | 22.7000 | 25.0000 | 3.3015 | 3.0551 | 16.0000 | 19.0000 | 27.0000 | 30.0000 |
| Thyroid0387 | 10.0000 | 9.5000 | 0.9428 | 1.8409 | 9.0000 | 6.0000 | 12.0000 | 12.0000 |
| Heart | 5.2000 | 4.2000 | 0.9189 | 1.3166 | 4.0000 | 3.0000 | 6.0000 | 6.0000 |
| Pima-diabetes | 4.0000 | 4.0000 | 0.0000 | 0.0000 | 4.0000 | 4.0000 | 4.0000 | 4.0000 |
| Leukemia | 3502.8000 | 3498.1000 | 32.8830 | 22.0880 | 3453.0000 | 3468.0000 | 3541.0000 | 3533.0000 |
| Prostate_GE | 2969.8000 | 2982.5000 | 28.9513 | 48.0954 | 2934.0000 | 2898.0000 | 3028.0000 | 3055.0000 |
| BreastEW | 15.8000 | 16.0000 | 1.6193 | 2.7889 | 13.0000 | 12.0000 | 19.0000 | 20.0000 |
| Colon | 965.3000 | 970.3000 | 15.8749 | 12.6232 | 937.0000 | 945.0000 | 986.0000 | 994.0000 |
Furthermore, since the BSO-CV selects fewer features and achieves higher accuracy than the BSO, it attains the lowest fitness values during the algorithm’s iterations, given that the reduction rate and the classification error rate are the two objectives of the fitness function, as shown in Eq. (23). Table 5 shows the best selected numbers of features, while the best fitness values for both BSO and BSO-CV are illustrated in Table 6.
Table 6.
Results of the proposed BSO vs augmented BSO with crossover in terms of the average, standard deviation, minimum, and maximum fitness values.
| Benchmark | Average | | Standard deviation | | Minimum | | Maximum | |
|---|---|---|---|---|---|---|---|---|
| | BSO | BSO-CV | BSO | BSO-CV | BSO | BSO-CV | BSO | BSO-CV |
| Diagnostic | 0.0230 | 0.0217 | 0.0033 | 0.0005 | 0.0204 | 0.0211 | 0.0321 | 0.0224 |
| Original | 0.0122 | 0.0114 | 0.0029 | 0.0022 | 0.0105 | 0.0105 | 0.0176 | 0.0176 |
| Prognostic | 0.1869 | 0.1813 | 0.0117 | 0.0112 | 0.1625 | 0.1593 | 0.2122 | 0.1877 |
| Coimbra | 0.0929 | 0.0928 | 0.0004 | 0.0000 | 0.0928 | 0.0928 | 0.0939 | 0.0928 |
| Retinopathy | 0.2353 | 0.2337 | 0.0015 | 0.0010 | 0.2329 | 0.2323 | 0.2382 | 0.2355 |
| Dermatology | 0.0051 | 0.0051 | 0.0005 | 0.0004 | 0.0041 | 0.0044 | 0.0056 | 0.0056 |
| ILPD-Liver | 0.2259 | 0.2255 | 0.0005 | 0.0005 | 0.2249 | 0.2249 | 0.2269 | 0.2259 |
| Lymphography | 0.0598 | 0.0528 | 0.0175 | 0.0235 | 0.0386 | 0.0067 | 0.0749 | 0.0738 |
| Parkinsons | 0.0598 | 0.0851 | 0.0168 | 0.0106 | 0.0549 | 0.0789 | 0.1056 | 0.1056 |
| ParkinsonC | 0.2297 | 0.2245 | 0.0094 | 0.0030 | 0.2220 | 0.2224 | 0.2483 | 0.2291 |
| SPECT | 0.1738 | 0.1608 | 0.0179 | 0.0087 | 0.1348 | 0.1540 | 0.1927 | 0.1736 |
| Cleveland | 0.1738 | 0.3390 | 0.0089 | 0.0010 | 0.3346 | 0.3172 | 0.3578 | 0.3520 |
| HeartEW | 0.1670 | 0.1611 | 0.0079 | 0.0098 | 0.1521 | 0.1513 | 0.1719 | 0.1712 |
| Hepatitis | 0.0218 | 0.0100 | 0.0292 | 0.0190 | 0.0028 | 0.0022 | 0.0652 | 0.0641 |
| SAHeart | 0.2940 | 0.2934 | 0.0006 | 0.0006 | 0.2928 | 0.2928 | 0.2950 | 0.2939 |
| Spectfheart | 0.0948 | 0.0841 | 0.0167 | 0.0117 | 0.0613 | 0.0622 | 0.1164 | 0.0995 |
| Thyroid0387 | 0.0208 | 0.0192 | 0.0025 | 0.0026 | 0.0169 | 0.0152 | 0.0254 | 0.0222 |
| Heart | 0.1840 | 0.1813 | 0.0114 | 0.0078 | 0.1700 | 0.1700 | 0.2067 | 0.1867 |
| Pima-diabetes | 0.1862 | 0.1862 | 0.0000 | 0.0000 | 0.1862 | 0.1862 | 0.1862 | 0.1862 |
| Leukemia | 0.1463 | 0.1463 | 0.0000 | 0.0000 | 0.1463 | 0.1463 | 0.1464 | 0.1464 |
| Prostate_GE | 0.0545 | 0.0545 | 0.0000 | 0.0000 | 0.0544 | 0.0544 | 0.0546 | 0.0546 |
| BreastEW | 0.0079 | 0.0062 | 0.0038 | 0.0027 | 0.0053 | 0.0040 | 0.0138 | 0.0134 |
| Colon | 0.2524 | 0.2523 | 0.0001 | 0.0001 | 0.2522 | 0.2522 | 0.2525 | 0.2524 |
Evolutionary algorithms are iterative by nature: they initially begin with random solutions and iteratively update them to generate new solutions. Then, based on the fitness function, the most feasible solution found so far is compared with each newly generated solution’s fitness value, and the better one is kept. As for the proposed BSO and BSO-CV, Table 7 presents the running time for both algorithms. It is clearly shown that the BSO-CV takes less time than the BSO in 17 datasets. Furthermore, the BSO-CV saves computational resources and time by achieving the best fitness value in early iterations, since it uses a crossover operator and covers more positions in the search space. In other words, it balances the exploration and exploitation phases, which gives it the power to avoid falling into local optima.
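The keep-best loop described above can be sketched generically (an illustrative skeleton, not the authors' implementation; the `init`, `perturb`, and `fitness` callables stand in for the SO-specific operators):

```python
import random

def optimize(init, perturb, fitness, iters=100, rng=None):
    """Generic iterative loop: generate a candidate from the incumbent
    and keep whichever solution has the lower (better) fitness."""
    rng = rng or random.Random(0)
    best = init(rng)
    best_f = fitness(best)
    for _ in range(iters):
        cand = perturb(best, rng)
        cand_f = fitness(cand)
        if cand_f < best_f:  # minimization: smaller fitness is better
            best, best_f = cand, cand_f
    return best, best_f
```

As a toy exercise, minimizing the number of set bits in a binary vector via random single-bit flips converges in a few hundred iterations.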
Table 7.
Results of the proposed BSO vs augmented BSO with crossover in terms of the average, standard deviation, minimum, and maximum running times.
| Benchmark | Average | | Standard deviation | | Minimum | | Maximum | |
|---|---|---|---|---|---|---|---|---|
| | BSO | BSO-CV | BSO | BSO-CV | BSO | BSO-CV | BSO | BSO-CV |
| Diagnostic | 24.0234 | 20.1709 | 1.2632 | 0.0919 | 22.3863 | 20.0304 | 25.7645 | 20.2939 |
| Original | 52.4918 | 21.7947 | 1.4929 | 0.6899 | 50.6546 | 19.8469 | 55.1332 | 22.1207 |
| Prognostic | 68.5666 | 19.3182 | 1.7727 | 2.1484 | 65.8117 | 18.1769 | 71.0640 | 24.6123 |
| Coimbra | 21.6436 | 31.3520 | 4.3715 | 1.5690 | 14.0520 | 30.1216 | 27.5464 | 35.1497 |
| Retinopathy | 143.4651 | 25.5549 | 8.6208 | 1.1699 | 122.1240 | 24.5064 | 148.8203 | 28.7080 |
| Dermatology | 19.8110 | 19.6653 | 1.2157 | 0.0583 | 18.0606 | 19.5964 | 21.5541 | 19.7621 |
| ILPD-Liver | 40.2314 | 21.2359 | 1.6953 | 0.14384 | 38.0070 | 21.0012 | 43.2428 | 21.5077 |
| Lymphography | 48.6365 | 18.7374 | 2.4862 | 0.1618 | 47.3621 | 18.5243 | 50.1288 | 19.0404 |
| Parkinsons | 61.0268 | 18.4078 | 0.9944 | 0.1112 | 59.6189 | 18.2514 | 62.7003 | 18.5701 |
| ParkinsonC | 71.5125 | 85.9408 | 2.7348 | 0.7091 | 67.6873 | 85.0221 | 74.2297 | 87.0784 |
| SPECT | 88.4608 | 19.4803 | 1.5437 | 0.2716 | 86.3784 | 19.0815 | 91.1695 | 19.8700 |
| Cleveland | 13.7458 | 34.7516 | 4.1938 | 6.1588 | 7.3716 | 21.4041 | 18.0061 | 41.4505 |
| HeartEW | 31.8354 | 19.6732 | 1.3055 | 0.0925 | 29.7337 | 19.5130 | 33.8312 | 19.8095 |
| Hepatitis | 35.6952 | 17.9791 | 1.5000 | 0.1082 | 33.8597 | 17.8650 | 38.6481 | 18.1687 |
| SAHeart | 86.1272 | 20.5537 | 1.6604 | 0.0840 | 83.2163 | 20.4461 | 88.0913 | 20.7296 |
| Spectfheart | 92.3180 | 19.3628 | 1.2729 | 0.1233 | 90.6496 | 19.2185 | 94.3931 | 19.5442 |
| Thyroid0387 | 128.0856 | 169.6468 | 4.6228 | 7.6117 | 118.4025 | 158.0242 | 131.3219 | 179.4991 |
| Heart | 28.5277 | 19.3467 | 1.3055 | 0.4925 | 26.3888 | 17.9694 | 30.4858 | 19.7040 |
| Pima-diabetes | 64.7067 | 22.0212 | 1.7239 | 0.8515 | 65.8117 | 19.7021 | 66.9316 | 22.7713 |
| Leukemia | 56.6581 | 45.1058 | 2.4862 | 0.3344 | 53.3672 | 44.2620 | 59.1867 | 45.3924 |
| Prostate_GE | 149.3277 | 81.1939 | 7.0456 | 4.4155 | 135.0467 | 70.0391 | 155.6277 | 86.0762 |
| BreastEW | 4.5585 | 21.0010 | 0.0729 | 0.1930 | 4.5031 | 20.7807 | 4.7405 | 21.3079 |
| Colon | 5.8711 | 23.8118 | 0.5273 | 0.1527 | 5.5947 | 23.6718 | 7.0661 | 24.1146 |
Since the BSO-CV achieved the best fitness values in early iterations, the proposed algorithm has a fast convergence speed. Fig. 6 illustrates the convergence curves for both the BSO and BSO-CV algorithms. The convergence curves prove the strength of covering more positions in the search space and the effect of balancing the exploration and exploitation phases. As shown in the SAHeart, Spectfheart, SPECT, Heart, and COVID-19 subfigures, the BSO is trapped in local optima, whereas the BSO-CV can escape and move beyond them without getting stuck in a local optimum solution.
Fig. 6.
Convergence curves for BSO-CV and BSO methods on the medical benchmark data sets and COVID-19 dataset.
Additionally, Fig. 5 shows the box plots for the BSO and BSO-CV methods on the tested datasets regarding classification accuracy. The y-axis represents the classification accuracy, while the x-axis represents the tested methods. It is clearly shown that the distribution of the BSO-CV is better than that of the BSO, since the median of the BSO-CV’s box plots is greater than (or, in some cases, equal to) the median of the BSO’s. This proves the robustness of the proposed method. Fig. 7 shows the average maximum and minimum accuracy of BSO and BSO-CV for all datasets.
Fig. 5.
Boxplots for BSO-CV and BSO methods on the medical benchmark datasets and COVID-19 dataset.
Fig. 7.
Average of maximum and minimum accuracy of BSO and BSO-CV for all datasets.
In classification problems, sensitivity and specificity are essential for determining how well a model can forecast true positives and true negatives for each category. Furthermore, more diversification in the obtained solutions means covering a wider area of the search space, which pushes the algorithm to be more sensitive and specific. Table 8 and Table 9 show the sensitivity and specificity of the proposed BSO and BSO-CV, respectively. As clearly shown, the BSO-CV is more sensitive and specific than the BSO in 22 datasets, and it performed better on the COVID-19 dataset. The reason behind this is the crossover technique, which gives the BSO-CV the strength to discover more positions in the search space, balances the exploration and exploitation phases, and gives it the power to avoid becoming trapped in local optima.
Table 8.
Comparison results of the proposed BSO vs augmented BSO with crossover in terms of average, standard deviation, minimum and maximum sensitivity.
| Benchmark | Average | | Standard deviation | | Minimum | | Maximum | |
|---|---|---|---|---|---|---|---|---|
| | BSO | BSO-CV | BSO | BSO-CV | BSO | BSO-CV | BSO | BSO-CV |
| Diagnostic | 0.9635 | 0.9684 | 0.0113 | 0.0222 | 0.9437 | 0.9221 | 0.9825 | 1.0000 |
| Original | 0.9704 | 0.9718 | 0.0094 | 0.0118 | 0.9577 | 0.9420 | 0.9867 | 0.9853 |
| Prognostic | 0.1933 | 0.2103 | 0.1549 | 0.2268 | 0.0000 | 0.0000 | 0.4286 | 0.6667 |
| Coimbra | 0.6407 | 0.7156 | 0.1371 | 0.1273 | 0.5455 | 0.4444 | 1.0000 | 0.8571 |
| Retinopathy | 0.6443 | 0.6726 | 0.0328 | 0.6239 | 0.5882 | 0.0293 | 0.7025 | 0.7154 |
| Dermatology | 0.9500 | 0.9817 | 0.0304 | 0.0158 | 0.9200 | 0.9500 | 1.0000 | 1.0000 |
| ILPD-Liver | 0.8170 | 0.8352 | 0.0496 | 0.0629 | 0.7284 | 0.6829 | 0.8875 | 0.8941 |
| Lymphography | 0.4473 | 0.4861 | 0.4239 | 0.4749 | 0.0000 | 0.0000 | 0.9286 | 1.0000 |
| Parkinsons | 0.9494 | 0.9531 | 0.0558 | 0.0286 | 0.8519 | 0.9000 | 1.0000 | 1.0000 |
| ParkinsonC | 0.8835 | 0.8861 | 0.0340 | 0.0369 | 0.8182 | 0.8073 | 0.9252 | 0.9292 |
| SPECT | 0.5112 | 0.5405 | 0.1152 | 0.0747 | 0.3529 | 0.4000 | 0.7059 | 0.6667 |
| Cleveland | 0.1792 | 0.5405 | 0.0953 | 0.0948 | 0.0000 | 0.0769 | 0.3000 | 0.3636 |
| HeartEW | 0.8297 | 0.8783 | 0.0581 | 0.0786 | 0.7931 | 0.7097 | 0.9630 | 0.9667 |
| Hepatitis | 0.2000 | 0.3283 | 0.2297 | 0.3234 | 0.0000 | 0.0000 | 0.5000 | 1.0000 |
| SAHeart | 0.3854 | 0.4916 | 0.0531 | 0.0914 | 0.3333 | 0.3333 | 0.5000 | 0.6071 |
| Spectfheart | 0.8520 | 0.8661 | 0.0404 | 0.0705 | 0.7857 | 0.7436 | 0.9024 | 0.9474 |
| Thyroid0387 | 0.8472 | 0.8345 | 0.0816 | 0.0986 | 0.7097 | 0.6923 | 0.9655 | 1.0000 |
| Heart | 0.8244 | 0.8544 | 0.0694 | 0.0847 | 0.7353 | 0.7241 | 0.9355 | 0.9643 |
| Pima-diabetes | 0.5008 | 0.5362 | 0.0473 | 0.0586 | 0.4483 | 0.3750 | 0.6102 | 0.5800 |
| Leukemia | 0.6470 | 0.7895 | 0.2734 | 0.1701 | 0.3333 | 0.5000 | 1.0000 | 1.0000 |
| Prostate_GE | 0.8392 | 0.8769 | 0.0882 | 0.0892 | 0.7273 | 0.7273 | 1.0000 | 1.0000 |
| BreastEW | 0.9411 | 0.9521 | 0.0219 | 0.0159 | 0.9130 | 0.9306 | 0.9857 | 0.9726 |
| Colon | 0.5238 | 0.5850 | 0.1903 | 0.2304 | 0.3333 | 0.2500 | 1.0000 | 1.0000 |
Table 9.
Results of the proposed BSO vs augmented BSO with crossover in terms of average, standard deviation, minimum and maximum specificity.
| Benchmark | Average | | Standard deviation | | Minimum | | Maximum | |
|---|---|---|---|---|---|---|---|---|
| | BSO | BSO-CV | BSO | BSO-CV | BSO | BSO-CV | BSO | BSO-CV |
| Diagnostic | 0.9058 | 0.9133 | 0.0387 | 0.0355 | 0.8571 | 0.8605 | 0.9778 | 0.9722 |
| Original | 0.9735 | 0.9794 | 0.0149 | 0.0111 | 0.9531 | 0.9577 | 1.0000 | 0.9867 |
| Prognostic | 0.8909 | 0.9218 | 0.0419 | 0.0527 | 0.8214 | 0.8125 | 0.9688 | 1.0000 |
| Coimbra | 0.7087 | 0.7636 | 0.1253 | 0.1587 | 0.5714 | 0.5000 | 1.0000 | 1.0000 |
| Retinopathy | 0.7088 | 0.7130 | 0.0450 | 0.6577 | 0.6449 | 0.0383 | 0.7822 | 0.7565 |
| Dermatology | 0.9404 | 0.9415 | 0.0354 | 0.0285 | 0.8824 | 0.8889 | 1.0000 | 0.9800 |
| ILPD-Liver | 0.2442 | 0.2980 | 0.0624 | 0.1057 | 0.1379 | 0.1667 | 0.3611 | 0.4545 |
| Lymphography | 0.7311 | 0.7699 | 0.0839 | 0.1392 | 0.6154 | 0.5357 | 0.8571 | 1.0000 |
| Parkinsons | 0.6498 | 0.6745 | 0.0756 | 0.1030 | 0.5000 | 0.4545 | 0.8889 | 0.8182 |
| ParkinsonC | 0.2856 | 0.2901 | 0.0149 | 0.0573 | 0.1622 | 0.1842 | 0.4048 | 0.3714 |
| SPECT | 0.6075 | 0.7784 | 0.1131 | 0.6389 | 0.3889 | 0.9000 | 0.7778 | 0.0787 |
| Cleveland | 0.6215 | 0.6469 | 0.0417 | 0.0647 | 0.5714 | 0.5490 | 0.7021 | 0.7647 |
| HeartEW | 0.7152 | 0.7208 | 0.1111 | 0.0996 | 0.5714 | 0.6000 | 0.8333 | 0.9130 |
| Hepatitis | 0.9552 | 0.9565 | 0.0514 | 0.0504 | 0.8571 | 0.8571 | 1.0000 | 1.0000 |
| SAHeart | 0.7440 | 0.7901 | 0.0700 | 0.0504 | 0.7049 | 0.8571 | 0.9492 | 1.0000 |
| Spectfheart | 0.4165 | 0.4879 | 0.1507 | 0.1328 | 0.2222 | 0.2000 | 0.7000 | 0.7000 |
| Thyroid0387 | 0.9828 | 0.9815 | 0.0037 | 0.0039 | 0.9759 | 0.9780 | 0.9866 | 0.9908 |
| Heart | 0.6629 | 0.6686 | 0.0890 | 0.0946 | 0.5200 | 0.5357 | 0.8095 | 0.8000 |
| Pima-diabetes | 0.8007 | 0.8211 | 0.0319 | 0.0390 | 0.7573 | 0.7292 | 0.8632 | 0.8614 |
| Leukemia | 0.9657 | 0.9678 | 0.0738 | 0.0564 | 0.7778 | 0.8571 | 1.0000 | 1.0000 |
| Prostate_GE | 0.8852 | 0.8887 | 0.0753 | 0.0998 | 0.8000 | 0.6667 | 1.0000 | 1.0000 |
| BreastEW | 0.8920 | 0.9015 | 0.0697 | 0.0472 | 0.7442 | 0.8298 | 1.0000 | 0.9545 |
| Colon | 0.8276 | 0.8663 | 0.1704 | 0.1635 | 0.4444 | 0.5556 | 1.0000 | 1.0000 |
Finally, the results show the superior performance of the proposed BSO and BSO-CV in solving FS problems. The crossover operators have a significant effect in balancing the exploration and exploitation phases, which plays a vital role in allowing the algorithm to converge more quickly and avoid becoming stuck in local optima, yielding a more reliable ML algorithm.
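The three evolutionary crossover operators incorporated into the BSO-CV can be sketched on binary feature-selection vectors as follows. This is an illustrative implementation of the standard one-point, two-point, and uniform operators, not the authors' exact code:

```python
import random

def one_point(p1, p2):
    """One-point crossover: children swap tails after a random cut point."""
    cut = random.randint(1, len(p1) - 1)
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]

def two_point(p1, p2):
    """Two-point crossover: children swap the segment between two cut points."""
    a, b = sorted(random.sample(range(1, len(p1)), 2))
    return p1[:a] + p2[a:b] + p1[b:], p2[:a] + p1[a:b] + p2[b:]

def uniform(p1, p2):
    """Uniform crossover: each bit is swapped independently with probability 0.5."""
    c1, c2 = list(p1), list(p2)
    for i in range(len(p1)):
        if random.random() < 0.5:
            c1[i], c2[i] = c2[i], c1[i]
    return c1, c2
```

All three operators exchange bits between the parents without creating or destroying any, so each child remains a valid binary feature-selection vector of the same length.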
5.6. Comparison with other meta-heuristic algorithms in the literature
The above results show that the BSO-CV achieved promising results in classification accuracy, running time, sensitivity, specificity, convergence, and boxplot analysis. It also achieves competitive results in terms of fitness value and the size of the selected feature subset. To validate these results and demonstrate their reliability, the proposed BSO-CV is compared against seven methods from the literature: LBMFO-V3 [12] on the 23 medical datasets; HLBDA [36] on the COVID-19 dataset; CHIO-GC [2] on the 23 medical datasets and the COVID-19 dataset; and four filter methods used in previous studies, namely Chi-square, Relief, correlation-based feature selection (CFS), and information gain (IG) [12].
5.6.1. Comparison with CHIO-GC
Table 11 compares the proposed BSO-CV and the CHIO-GC in terms of average accuracy and average feature subset size. The BSO-CV outperforms the CHIO-GC on all datasets except three: ParkinsonC, Prognostic, and Coimbra. Regarding feature subset size, the BSO-CV outperforms the CHIO-GC in 57% of the datasets. In ten datasets, the CHIO-GC achieved a smaller feature subset than the BSO-CV: Diagnostic, Prognostic, Coimbra, Retinopathy, ParkinsonC, SPECT, HeartEW, Spectfheart, Thyroid0387, and BreastEW.
Table 11.
Comparison results of the BSO-CV with LBMFO-V3 and CHIO-GC.
| Benchmark | Accuracy (BSO-CV) | Accuracy (CHIO-GC) | Accuracy (LBMFO-V3) | Selection size (BSO-CV) | Selection size (CHIO-GC) | Selection size (LBMFO-V3) |
|---|---|---|---|---|---|---|
| Diagnostic | 0.9930 | 0.9033 | 0.9100 | 14.4000 | 13.3700 | 13.9991 |
| Original | 0.9886 | 0.9710 | 0.9683 | 3.3000 | 5.1040 | 5.5000 |
| Prognostic | 0.8474 | 0.6716 | 0.9312 | 15.5000 | 14.6202 | 3.5103 |
| Coimbra | 0.8182 | 0.8896 | 0.9312 | 6.1000 | 3.6007 | 3.5103 |
| Retinopathy | 0.7391 | 0.6436 | 0.5380 | 11.2000 | 7.2647 | 6.9002 |
| Dermatology | 1.0000 | 0.8006 | 0.8442 | 17.3000 | 18.4900 | 18.3541 |
| ILPD-Liver | 0.7966 | 0.7716 | 0.7143 | 4.0000 | 4.0000 | 4.0000 |
| Lymphography | 0.9729 | 0.8343 | 0.8002 | 9.4000 | 10.0622 | 9.7520 |
| Parkinsons | 0.9850 | 0.7903 | 0.7689 | 9.1000 | 9.7383 | 10.3584 |
| ParkinsonC | 0.7873 | 0.8400 | 0.8190 | 461.4000 | 365.8322 | 369.1070 |
| SPECT | 0.8180 | 0.6960 | 0.6576 | 12.5000 | 9.6050 | 10.7832 |
| Cleveland | 0.7041 | 0.5966 | 0.5333 | 6.4000 | 6.8097 | 6.6899 |
| HeartEW | 0.9370 | 0.9116 | 0.9388 | 7.3000 | 7.0105 | 6.3100 |
| Hepatitis | 0.9875 | 0.7903 | 0.7500 | 5.9000 | 8.2011 | 8.3569 |
| SAHeart | 0.7500 | 0.7036 | 0.6992 | 3.1000 | 3.1551 | 3.2222 |
| Spectfheart | 0.9084 | 0.7303 | 0.7013 | 22.7000 | 21.0030 | 20.4598 |
| Thyroid0387 | 0.9881 | 0.9603 | 0.9776 | 10.0000 | 8.0116 | 8.4563 |
| Heart | 0.9259 | 0.8126 | 0.7603 | 5.2000 | 6.1505 | 6.2752 |
| Pima-diabetes | 0.8182 | 0.7956 | 0.8065 | 4.0000 | 6.8387 | 6.7612 |
| Leukemia | 1.0000 | 0.9900 | 1.0000 | 3502.8000 | 3560.5107 | 3570.7137 |
| Prostate_GE | 1.0000 | 0.6010 | 0.5056 | 2969.8000 | 2979.4116 | 2984.7153 |
| BreastEW | 0.9912 | 0.9400 | 0.9398 | 16.0000 | 13.7303 | 13.9714 |
| Colon | 0.9571 | 0.7176 | 0.6667 | 970.3000 | 1000.0067 | 991.5551 |
In Fig. 8, the BSO-CV is compared with the CHIO-GC in terms of classification accuracy and number of selected features on the 23 benchmark medical datasets. As the left subfigure shows, the BSO-CV achieved an average classification accuracy of 89.9% across the 23 datasets, outperforming the CHIO-GC, which achieved 78.5%. In terms of feature selection size, the BSO-CV achieved a smaller average of 350.2 features across all datasets, while the CHIO-GC averaged 351.3 features.
Fig. 8.
Average accuracy and average number of selected features using BSO-CV and CHIO-GC methods on 23 benchmark medical datasets.
As mentioned in [2], the CHIO-GC employs a greedy crossover approach that greedily takes the best candidate to generate new solutions and, in turn, eliminates the worst solutions. However, the worst solutions may form better solutions in upcoming generations under different search techniques [43]. In contrast, the BSO-CV employs a roulette-wheel mechanism to choose the crossover operator (one-point, two-point, or uniform), which enhances the diversity of the generated solutions and avoids entrapment in local optima. Moreover, the SO algorithm initializes the population as two groups, males and females, which makes the initial population more exploratory. For these reasons, the BSO-CV shows better performance than the CHIO-GC.
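The roulette-wheel choice among the three crossover operators can be sketched as follows. The equal operator weights below are illustrative assumptions, since the exact switch probabilities are not restated in this section:

```python
import random

def roulette_pick(operators, weights):
    """Roulette-wheel selection: each operator is chosen with probability
    proportional to its weight."""
    r = random.uniform(0, sum(weights))
    acc = 0.0
    for op, w in zip(operators, weights):
        acc += w
        if r <= acc:
            return op
    return operators[-1]  # guard against floating-point round-off

# Hypothetical equal weights for the three operators used by BSO-CV.
ops = ["one_point", "two_point", "uniform"]
chosen = roulette_pick(ops, [1.0, 1.0, 1.0])
```

Because every operator keeps a nonzero chance of being chosen, no single crossover strategy dominates the search, which is the diversity-preserving behavior described above.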
Fig. 9 graphically shows the results of the BSO-CV compared with the CHIO-GC on the COVID-19 dataset. The BSO-CV achieved an accuracy of 95.9%, whereas the CHIO-GC achieved a lower accuracy of 93.2%. The number of selected features is also compared in Fig. 9.
Fig. 9.
Average accuracy and average number of selected features using BSO-CV and CHIO-GC methods on COVID-19 dataset.
5.6.2. Comparison with LBMFO-V3
Table 11 shows that the BSO-CV outperformed the LBMFO-V3 in classification accuracy on all datasets except Prognostic, Coimbra, ParkinsonC, and HeartEW; that is, the proposed BSO-CV performed better on 83% of the datasets. Table 11 also shows that the BSO-CV outperformed the LBMFO-V3 in feature subset size on 57% of the datasets.
Fig. 10 shows the results of this comparison graphically. The left subfigure shows that the BSO-CV achieved an average accuracy of 89.9% across all datasets, whereas the LBMFO-V3 achieved 76.5%; the BSO-CV thus outperforms the LBMFO-V3 by 13.4% on average. The right subfigure shows that the BSO-CV selected an average of 350.2 features across all datasets, while the LBMFO-V3 selected an average of 351.7, which is higher than the BSO-CV.
Fig. 10.
Average accuracy and the average number of selected features using BSO-CV and LBMFO-V3 methods on 23 benchmark medical datasets.
As mentioned earlier, the proposed BSO-CV employs different crossover operators, which gives the algorithm the power to cover more regions of the search space and avoid falling into local optima regions; in other words, it strikes a balance between the exploration and exploitation phases. In addition, the MFO suffers from slow population diversity [44], an issue that the BSO-CV addresses by initializing the population as groups of males and females. For these reasons, the BSO-CV shows better performance than the LBMFO.
5.6.3. Comparison with HLBDA
Fig. 11 compares the BSO-CV with the HLBDA in terms of classification accuracy and number of selected features on the COVID-19 dataset. In terms of accuracy, the BSO-CV achieved 95.9%, while the HLBDA achieved 91.5%. Regarding the number of selected features, the HLBDA achieved an average selection size of 1.7 features, whereas the BSO-CV selected a slightly larger average of 2.3 features.
Fig. 11.
Average accuracy and the average number of selected features using BSO-CV and HLBDA methods on the COVID-19 dataset.
The hyperlearning approach was introduced into the HLBDA algorithm to enhance the search capability of the BDA by considering both individual and group bests, which helps the algorithm circumvent the local optima problem. However, its search strategy remains restricted in how many positions of the search space it can reach. By comparison, the proposed BSO-CV outperforms the HLBDA by using crossover operators to explore more positions in the search space.
5.7. Comparison with filter-based methods
In this subsection, the proposed wrapper-based BSO-CV is compared against four common filter-based approaches: Chi-square, Relief, CFS, and IG. Table 12 shows the average accuracy achieved by these methods over 30 runs on the 23 medical benchmark datasets. The BSO-CV exceeded all the filter methods on every dataset except one case: the Chi-square filter on the SPECT dataset. Fig. 12 shows the accuracy of the BSO-CV and the four filter methods; the red line representing the BSO-CV occupies the largest area of the accuracy radar chart. Regarding the number of selected features, Table 13 shows that the BSO-CV was superior in 81% of the datasets.
Table 12.
Comparison between the proposed BSO-CV and filter methods using the classification accuracy.
| Benchmark | BSO-CV | Chi-square | Relief | CFS | IG |
|---|---|---|---|---|---|
| Diagnostic | 0.9930 | 0.5714 | 0.9585 | 0.9533 | 0.9349 |
| Original | 0.9886 | 0.9091 | 0.6426 | 0.6860 | 0.6759 |
| Prognostic | 0.8447 | 0.5910 | 0.7727 | 0.7576 | 0.7577 |
| Coimbra | 0.8182 | 0.3846 | 0.6672 | 0.5763 | 0.5578 |
| Retinopathy | 0.7391 | 0.6349 | 0.5036 | 0.4783 | 0.5393 |
| Dermatology | 1.0000 | 0.7250 | 0.7248 | 0.4732 | 0.4021 |
| ILPD-Liver | 0.7949 | 0.7106 | 0.5119 | 0.5223 | 0.5264 |
| Lymphography | 0.9729 | 0.8824 | 0.5886 | 0.5533 | 0.5204 |
| Parkinsons | 0.9850 | 0.7581 | 0.7588 | 0.7360 | 0.7150 |
| ParkinsonC | 0.7873 | 0.6593 | 0.6590 | 0.6487 | 0.6376 |
| SPECT | 0.8180 | 0.9667 | 0.5651 | 0.5508 | 0.5460 |
| Cleveland | 0.7041 | 0.3940 | 0.1181 | 0.0398 | 0.0826 |
| HeartEW | 0.9370 | 0.9334 | 0.6153 | 0.5757 | 0.6202 |
| Hepatitis | 0.9875 | 0.7778 | 0.5538 | 0.5857 | 0.6417 |
| SAHeart | 0.7500 | 0.6471 | 0.5024 | 0.5115 | 0.5227 |
| SPECTfheart | 0.9084 | 0.7000 | 0.6079 | 0.6279 | 0.5551 |
| Thyroid0387 | 0.9881 | 1.0000 | 0.6379 | 0.6955 | 0.9773 |
| Heart | 0.9259 | 0.5333 | 0.6317 | 0.5575 | 0.6114 |
| Pima-diabetes | 0.8182 | 0.6905 | 0.5147 | 0.5426 | 0.5264 |
| Leukemia | 1.0000 | 0.7120 | 0.6883 | 0.6759 | 0.6410 |
| Prostate_GE | 1.0000 | 0.5042 | 0.5033 | 0.4786 | 0.4421 |
| BreastEW | 0.9912 | 0.9365 | 0.8160 | 0.8029 | 0.8128 |
| Colon | 0.9571 | 0.5850 | 0.5641 | 0.5116 | 0.5097 |
Fig. 12.
Accuracy-based comparison between the BSO-CV and other common filter methods.
Table 13.
Comparison between the proposed BSO-CV and filter methods based on the number of selected features.
| Benchmark | BSO-CV | Chi-square | Relief | CFS | IG |
|---|---|---|---|---|---|
| Diagnostic | 13.0000 | 18.0000 | 17.5000 | 16.0000 | 15.0000 |
| Original | 3.2000 | 4.0000 | 4.0000 | 4.0000 | 4.0000 |
| Prognostic | 14.2000 | 20.0000 | 18.0000 | 19.0000 | 21.0000 |
| Coimbra | 6.0000 | 7.0000 | 7.0000 | 7.0000 | 7.0000 |
| Retinopathy | 10.5000 | 14.0000 | 13.0000 | 11.0000 | 12.0000 |
| Dermatology | 17.3000 | 19.0000 | 15.0000 | 14.0000 | 16.0000 |
| ILPD-Liver | 3.6000 | 5.0000 | 4.0000 | 4.0000 | 4.0000 |
| Lymphography | 9.1000 | 16.0000 | 27.0000 | 12.0000 | 15.0000 |
| Parkinsons | 8.5000 | 13.0000 | 10.0000 | 9.0000 | 11.0000 |
| ParkinsonC | 466.1000 | 495.0000 | 496.0000 | 460.0000 | 461.0000 |
| SPECT | 12.6000 | 15.0000 | 13.0000 | 14.0000 | 14.0000 |
| Cleveland | 6.5000 | 7.0000 | 8.0000 | 7.0000 | 7.0000 |
| HeartEW | 6.8000 | 9.0000 | 8.0000 | 7.0000 | 9.0000 |
| Hepatitis | 6.9000 | 9.0000 | 8.0000 | 7.0000 | 9.0000 |
| SAHeart | 2.6000 | 4.0000 | 3.0000 | 3.0000 | 4.0000 |
| SPECTfheart | 25.0000 | 30.0000 | 27.0000 | 26.0000 | 29.0000 |
| Thyroid0387 | 9.5000 | 10.0000 | 12.0000 | 11.0000 | 10.0000 |
| Heart | 4.2000 | 6.0000 | 5.0000 | 6.0000 | 6.0000 |
| Pima-diabetes | 4.000 | 4.0000 | 4.0000 | 4.0000 | 4.0000 |
| Leukemia | 3498.1000 | 3500.0000 | 3530.0000 | 3533.0000 | 3540.0000 |
| Prostate_GE | 2982.5000 | 2972.0000 | 2540.5000 | 2533.0000 | 2966.5000 |
| BreastEW | 16.0000 | 19.0000 | 17.0000 | 20.0000 | 18.0000 |
| Colon | 970.3000 | 990.0000 | 985.0000 | 977.0000 | 980.0000 |
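To illustrate how a filter method such as IG scores features independently of any classifier, the sketch below implements information gain for discrete features. The helper names are ours, introduced for illustration, not taken from the compared studies:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy H(Y) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(feature, labels):
    """IG(X; Y) = H(Y) - H(Y | X) for one discrete feature column."""
    n = len(labels)
    cond = 0.0
    for value, count in Counter(feature).items():
        subset = [y for x, y in zip(feature, labels) if x == value]
        cond += (count / n) * entropy(subset)
    return entropy(labels) - cond

def top_k_by_ig(columns, labels, k):
    """Indices of the k feature columns with the highest information gain."""
    scored = sorted(range(len(columns)),
                    key=lambda i: info_gain(columns[i], labels),
                    reverse=True)
    return scored[:k]
```

Unlike the wrapper-based BSO-CV, such a filter never consults a classifier: a feature that perfectly predicts the labels scores H(Y), while a constant feature scores zero, and the top-k features are kept regardless of how they interact.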
6. Computational statistical test analysis
The accuracy of the proposed BSO and its improved variant BSO-CV was evaluated in the previous section using two performance measures, and their results on FS problems were contrasted with those of other meta-heuristic and filter-based techniques previously reported in the literature. The average and standard deviation of the best solutions found over 30 independent runs served as the performance measures: the first captures the average performance of the proposed algorithms, and the second their consistency across the 30 runs. Although these statistical measures suggest that the proposed BSO and BSO-CV are generally reliable and resilient, they cannot compare the 30 independent runs individually. In other words, they show that the proposed BSO and BSO-CV maintain substantial exploitation and exploration capabilities, but they cannot by themselves demonstrate statistical superiority.
This part applies Friedman's and Holm's statistical tests to compare each independent run and to confirm that the results are significant and not produced by chance. Friedman's test is a well-known non-parametric statistical test that is widely used to evaluate the performance levels of algorithms; its goal is to ascertain whether there is a fundamental distinction between the outputs of the compared algorithms [45]. The null hypothesis of this test is that there is no variation in the accuracy of the compared algorithms. The algorithm with the best performance receives the lowest rank, and the algorithm with the worst performance receives the highest rank. The Friedman and Holm procedure requires finding the p-value of Friedman's test for the results of the FS problems under consideration. If Friedman's test produces a p-value equal to or less than the level of significance, set to 0.05 in this case, the null hypothesis is rejected, indicating statistically significant variations in how well the compared algorithms perform. This test is followed by a post-hoc procedure, where Holm's method is used to examine the pairwise comparisons of the algorithms; according to Friedman's test, the algorithm with the lowest rank is typically used as the control technique.
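The ranking step described above can be sketched as follows. This is an illustrative implementation assuming a results matrix with one row per dataset and one column per algorithm; for simplicity it breaks ties by column order, whereas the full test assigns average ranks to tied values:

```python
def friedman_stat(results):
    """results[i][j] = accuracy of algorithm j on dataset i (higher is better).
    Returns the Friedman chi-square statistic and the mean rank per algorithm."""
    n, k = len(results), len(results[0])
    rank_sums = [0.0] * k
    for row in results:
        # rank 1 = best (highest accuracy) on this dataset
        order = sorted(range(k), key=lambda j: -row[j])
        for rank, j in enumerate(order, start=1):
            rank_sums[j] += rank
    mean_ranks = [s / n for s in rank_sums]
    chi2 = (12 * n / (k * (k + 1))) * (sum(r * r for r in mean_ranks)
                                       - k * (k + 1) ** 2 / 4)
    return chi2, mean_ranks
```

The statistic grows with the spread between the algorithms' mean ranks; under the null hypothesis of equal performance it approximately follows a chi-square distribution with k − 1 degrees of freedom, from which the reported p-values are obtained.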
In the following subsections, Friedman's and Holm's tests are applied to show that the average results in Table 11, Table 12 are statistically significant, and that the differences from the other wrapper-based and filter-based FS methods are not due to chance.
6.1. A statistical test of BSO-CV compared to other wrapper FS methods
To quantify the statistical disparity between the BSO-CV and the other wrapper FS methods, Friedman's test [46] was performed with a significance level of α = 0.05 (i.e., 5%). Based on the outcomes shown in Table 11, the BSO-CV method is ranked against the other FS methods. Table 14 summarizes the rankings of the FS approaches determined by Friedman's test for the accuracy results in Table 11.
Table 14.
A summary of the ranking results obtained by applying Friedman’s test to the results given in Table 11.
| Algorithm | Rank |
|---|---|
| BSO-CV | 1.282608 |
| CHIO-GC | 2.260869 |
| LBMFO-V3 | 2.456521 |
The p-value reported by Friedman's statistical test for the accuracy results of the FS problems in Table 11 is 1.119E−4. The null hypothesis of equivalent performance is therefore rejected, confirming a statistically significant difference between the classification rates of the compared algorithms. According to the results in Table 14, the BSO-CV algorithm outperformed all other comparative algorithms with statistical significance on the datasets listed in Table 1, attaining the lowest rank of 1.282608 at a significance level of 5%. The CHIO-GC is the second-best performing algorithm on these datasets, and the LBMFO-V3 the third best, both underperforming the BSO-CV. Overall, Friedman's test on the FS problems in Table 1 produces the ranking BSO-CV, CHIO-GC, and LBMFO-V3, in that order. Holm's test was then used to determine whether the difference between the BSO-CV and each of the other algorithms in Table 14 is statistically significant. The statistical comparison made with Holm's method on the FS benchmark datasets outlined in Table 1 is shown in Table 15.
Table 15.
Results of Holm’s test method based on the results in Table 14 for α = 0.05.
| i | Algorithm | z | p-value | α/i | Hypothesis |
|---|---|---|---|---|---|
| 2 | LBMFO-V3 | 3.980932 | 6.864535E−05 | 0.025000 | Rejected |
| 1 | CHIO-GC | 3.317444 | 9.084512E−04 | 0.050000 | Rejected |
The findings of Holm's method are shown in Table 15; hypotheses whose p-values fall below their Holm-adjusted thresholds are rejected. These findings demonstrate that the BSO-CV statistically outperforms the other competitive algorithms. Furthermore, they indicate that the BSO-CV has avoided local optimal solutions by striking an appropriate balance between its exploration and exploitation capabilities.
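The z-statistics in Table 15 can be reproduced directly from the Friedman mean ranks in Table 14 using the standard formula z = (R_i − R_0)/√(k(k+1)/(6N)), with k = 3 compared algorithms, N = 23 datasets, and the BSO-CV as the control R_0:

```python
from math import sqrt

def holm_z(rank_i, rank_control, k, n):
    """z-statistic comparing an algorithm's Friedman mean rank against the
    control algorithm's rank, over n datasets and k compared algorithms."""
    se = sqrt(k * (k + 1) / (6 * n))
    return (rank_i - rank_control) / se

# Mean ranks from Table 14 (k = 3, N = 23, control = BSO-CV)
z_lbmfo = holm_z(2.456521, 1.282608, 3, 23)  # ≈ 3.9809, as in Table 15
z_chio = holm_z(2.260869, 1.282608, 3, 23)   # ≈ 3.3174, as in Table 15
```

Applying the same formula with k = 5 and N = 23 to the ranks in Table 16 reproduces the z-values of Table 17 (e.g., ≈ 6.3410 for IG); each z is then converted to a p-value and compared against its Holm-adjusted threshold α/i.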
6.2. A statistical test of BSO-CV compared to filter-based FS methods
As shown in Table 12, the results of the proposed BSO-CV are compared with those of well-known filter-based FS approaches to evaluate its robustness further. The acquired accuracy results are then statistically assessed using Friedman’s test, as shown in Table 16.
Table 16.
A summary of the ranking results obtained by Friedman’s test on the results presented in Table 12.
| Method | Rank |
|---|---|
| BSO-CV | 1.130434 |
| Chi-square | 2.478260 |
| Relief | 3.521739 |
| CFS | 3.782608 |
| IG | 4.086956 |
Based on the accuracy results in Table 12, the p-value computed by Friedman's test is 1.144E−10. According to the findings presented in Table 16, the BSO-CV algorithm is the most effective overall, and its advantage is statistically significant: it performed best at a significance level of α = 0.05, with the best rank of 1.130434. With a rank of 2.478260, the Chi-square method comes in second place. The rankings in Table 16 show that the BSO-CV proposed in this work outperforms the filter-based FS approaches evaluated in Table 12. In conclusion, the proposed BSO-CV comes out on top, followed by Chi-square, Relief, CFS, and IG, in that order.
The statistical significance of any differences between the filter-based FS method and the BSO-CV method is then determined using Holm’s test. Based on the datasets mentioned in Table 1, Table 17 presents the statistical findings of Holm’s test.
Table 17.
Results of Holm’s method based on the statistical results in Table 16 (Friedman’s test with α = 0.05).
| i | Method | z | p-value | α/i | Hypothesis |
|---|---|---|---|---|---|
| 4 | IG | 6.341032 | 2.2823009E−10 | 0.012500 | Rejected |
| 3 | CFS | 5.688279 | 1.283257E−08 | 0.016666 | Rejected |
| 2 | Relief | 5.128776 | 2.916314E−07 | 0.025000 | Rejected |
| 1 | Chi-square | 2.890764 | 0.003843 | 0.050000 | Rejected |
For the outcomes in Table 17, Holm's test rejects every hypothesis whose p-value is less than its adjusted significance level α/i. The data in Table 17 show that the BSO-CV outperformed the filter-based approaches when used as an FS method, and that this difference is statistically significant. These results indicate that the BSO-CV successfully avoided local solutions while exploring and exploiting the search space of the FS datasets. In addition, the proposed BSO-CV selects the most critical features with minimum redundancy. Thus, the BSO-CV significantly outperformed the state-of-the-art filter methods on most of the benchmark datasets in Table 1. These findings support the efficiency of the proposed BSO in addressing FS tasks in the medical domain.
7. Conclusion and future works
Due to the importance of caring for people's lives, diseases must be diagnosed accurately and impartially. Recently, swarm-based algorithms have proved their capability to perform disease classification very efficiently based on different data mining techniques such as feature selection. This study converts the recently developed SO algorithm into a binary version and enhances it using evolutionary crossover methods. The enhanced BSO is applied to 23 medical benchmark datasets and a real-world COVID-19 dataset. Various evaluation measures are used to assess the performance of the proposed algorithm, including accuracy, the number of selected features, fitness value, running time, sensitivity, specificity, convergence curves, boxplots, and statistical tests. The average, standard deviation, minimum, and maximum values are reported for each evaluation measure.
The developed binary version of the SO shows promising results for solving medical classification problems, achieving an accuracy of more than 90% on 11 datasets and exceeding 95% on 8 datasets. In addition, since the BSO, like any evolutionary algorithm, can fall into local optima, this paper proposes an augmented BSO with crossover operators to avoid this drawback and balance the exploration and exploitation phases. The BSO-CV performs better than the BSO: it outperforms the BSO on 17 datasets and shrinks the COVID-19 dataset's dimension by 89% as opposed to the BSO's 79%. Since the BSO-CV achieved the best fitness values in early iterations, the proposed algorithm has a fast convergence speed; it also saves resources by consuming less time than the BSO and is more sensitive and specific. Moreover, the proposed BSO-CV was compared with CHIO-GC, LBMFO-V3, and HLBDA, and the results show its superior performance, achieving the best accuracies with greater reduction rates. Finally, comparing the BSO-CV with well-known filter-based FS methods shows that it also outperforms the standard filter-based FS methods in solving FS problems.
In the future, we plan to use the proposed BSO algorithm in cyber security applications such as intrusion detection systems, ransomware detection, and blockchain applications. Furthermore, researchers can direct their efforts to generate a multi-objective version of the SO and use it in the electroencephalography (EEG) field. Other operators and modification methods can be integrated with the SO algorithm to generate a hybrid binary version that can balance the exploration and exploitation phases of the search process.
CRediT authorship contribution statement
Ruba Abu Khurma: Proposed and evolved the mathematical models of the proposed algorithm, Prepared the experiments, tables, diagrams and pseudo-code of the proposed algorithm, Executed the programs and experimental scenarios of the work, Full revision of the entire paper. Dheeb Albashish: Proposed and evolved the mathematical models, Methodology, Formal analysis, Investigation, Validation, Supervision, Full revision of the entire paper. Malik Braik: Implement the proposed BSO and BSO-CV, Discussed the results, Assist in the writing. Abdullah Alzaqebah: Discussed all the computational results of the proposed algorithm, Checked the validation of the results and the references. Ashwaq Qasem: Examined the technical concepts in the paper, the readability of the full paper and English grammar, and computed the t-test. Omar Adwan: Offered feedback on the work and helped shape and analyze the work.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Data availability
Data will be made available on request.
References
- 1.Sahran S., Albashish D., Abdullah A., Abd Shukor N., Pauzi S.H.M. Absolute cosine-based SVM-rfe feature selection method for prostate histopathological grading. Artif. Intell. Med. 2018;87:78–90. doi: 10.1016/j.artmed.2018.04.002. [DOI] [PubMed] [Google Scholar]
- 2.Alweshah M., Alkhalaileh S., Al-Betar M.A., Bakar A.A. Coronavirus herd immunity optimizer with greedy crossover for feature selection in medical diagnosis. Knowl.-Based Syst. 2022;235 doi: 10.1016/j.knosys.2021.107629. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 3.Albashish D., Hammouri A.I., Braik M., Atwan J., Sahran S. Binary biogeography-based optimization based SVM-RFE for feature selection. Appl. Soft Comput. 2021;101 [Google Scholar]
- 4.Braik M. Enhanced ali baba and the forty thieves algorithm for feature selection. Neural Comput. Appl. 2022:1–32. doi: 10.1007/s00521-022-08015-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5.Abu Khurma R., Aljarah I., Sharieh A., Abd Elaziz M., Damaševičius R., Krilavičius T. A review of the modification strategies of the nature inspired algorithms for feature selection problem. Mathematics. 2022;10(3):464. [Google Scholar]
- 6.Xue B., Zhang M., Browne W.N., Yao X. A survey on evolutionary computation approaches to feature selection. IEEE Trans. Evol. Comput. 2015;20(4):606–626. [Google Scholar]
- 7.Deng Z., Chung F.-L., Wang S. Robust relief-feature weighting, margin maximization, and fuzzy optimization. IEEE Trans. Fuzzy Syst. 2010;18(4):726–744. [Google Scholar]
- 8.Ramírez-Gallego S., Lastra I., Martínez-Rego D., Bolón-Canedo V., Benítez J.M., Herrera F., Alonso-Betanzos A. Fast-mRMR: Fast minimum redundancy maximum relevance algorithm for high-dimensional big data. Int. J. Intell. Syst. 2017;32(2):134–152. [Google Scholar]
- 9.Guyon I., Elisseeff A. An introduction to variable and feature selection. J. Mach. Learn. Res. 2003;3(Mar):1157–1182. [Google Scholar]
- 10.Abdel-Basset M., Abdel-Fatah L., Sangaiah A.K. Metaheuristic algorithms: A comprehensive review. Comput. Intell. Multimed. Big Data Cloud Eng. Appl. 2018:185–231. [Google Scholar]
- 11.Saw T., Myint P.H. Feature selection to classify healthcare data using wrapper method with PSO search. Int. J. Inf. Technol. Comput. Sci. 2019;11(9):31–37. [Google Scholar]
- 12.Abu Khurmaa R., Aljarah I., Sharieh A. An intelligent feature selection approach based on moth flame optimization for medical diagnosis. Neural Comput. Appl. 2021;33(12):7165–7204. [Google Scholar]
- 13.Alweshah M. Hybridization of arithmetic optimization with great Deluge algorithms for feature selection problems in medical diagnosis. Jordanian J. Comput. Inf. Technol. 2022;8(2) [Google Scholar]
- 14.Awadallah M.A., Al-Betar M.A., Braik M.S., Hammouri A.I., Doush I.A., Zitar R.A. An enhanced binary Rat Swarm Optimizer based on local-best concepts of PSO and collaborative crossover operators for feature selection. Comput. Biol. Med. 2022 doi: 10.1016/j.compbiomed.2022.105675. [DOI] [PubMed] [Google Scholar]
- 15.Alweshah M., Alkhalaileh S., Albashish D., Mafarja M., Bsoul Q., Dorgham O. A hybrid mine blast algorithm for feature selection problems. Soft Comput. 2021;25(1):517–534. [Google Scholar]
- 16.Hashim F.A., Hussien A.G. Snake Optimizer: A novel meta-heuristic optimization algorithm. Knowl.-Based Syst. 2022 [Google Scholar]
- 17.Rawa M. Towards avoiding cascading failures in transmission expansion planning of modern active power systems using hybrid snake-Sine cosine optimization algorithm. Mathematics. 2022;10(8):1323. [Google Scholar]
- 18.Khurma R.A., Aljarah I., Sharieh A. Rank based moth flame optimisation for feature selection in the medical application. 2020 IEEE Congress on Evolutionary Computation; CEC; IEEE; 2020. pp. 1–8. [Google Scholar]
- 19.Le T.M., Vo T.M., Pham T.N., Dao S.V.T. A novel wrapper–based feature selection for early diabetes prediction enhanced with a metaheuristic. IEEE Access. 2020;9:7869–7884. [Google Scholar]
- 20.Mazaheri V., Khodadadi H. Heart arrhythmia diagnosis based on the combination of morphological, frequency and nonlinear features of ECG signals and metaheuristic feature selection algorithm. Expert Syst. Appl. 2020;161 [Google Scholar]
- 21.Abd Elminaam D.S., Nabil A., Ibraheem S.A., Houssein E.H. An efficient marine predators algorithm for feature selection. IEEE Access. 2021;9:60136–60153. [Google Scholar]
- 22.Mirjalili S. Moth-flame optimization algorithm: A novel nature-inspired heuristic paradigm. Knowl.-Based Syst. 2015;89:228–249. [Google Scholar]
- 23.Khurma R.A., Aljarah I., Sharieh A. A simultaneous moth flame optimizer feature selection approach based on levy flight and selection operators for medical diagnosis. Arab. J. Sci. Eng. 2021;46(9):8415–8440. [Google Scholar]
- 24.Dhanusha C., Senthil Kumar A., Jagadamba G., Musirin I.B. Sustainable Communication Networks and Application. Springer; 2022. Evolving chaotic shuffled frog leaping memetic metaheuristic model-based feature subset selection for alzheimer’s disease detection; pp. 679–692. [Google Scholar]
- 25.Jaddi N.S., Abadeh M.S. Cell separation algorithm with enhanced search behaviour in miRNA feature selection for cancer diagnosis. Inf. Syst. 2022;104 [Google Scholar]
- 26.Abouelmagd L.M., Shams M.Y., El-Attar N.E., Hassanien A.E. Medical Informatics and Bioimaging using Artificial Intelligence. Springer; 2022. Feature selection based coral reefs optimization for breast cancer classification; pp. 53–72. [Google Scholar]
- 27.Kanya Kumari L., Naga Jagadesh B. An adaptive teaching learning based optimization technique for feature selection to classify mammogram medical images in breast cancer detection. Int. J. Syst. Assur. Eng. Manag. 2022:1–14. [Google Scholar]
- 28. Dey A., Chattopadhyay S., Singh P.K., Ahmadian A., Ferrara M., Senu N., Sarkar R. MRFGRO: a hybrid meta-heuristic feature selection method for screening COVID-19 using deep features. Sci. Rep. 2021;11(1):1–15. doi: 10.1038/s41598-021-02731-z.
- 29. Aslan M.F., Sabanci K., Durdu A., Unlersen M.F. COVID-19 diagnosis using state-of-the-art CNN architecture features and Bayesian Optimization. Comput. Biol. Med. 2022. doi: 10.1016/j.compbiomed.2022.105244.
- 30. Davazdahemami B., Zolbanin H.M., Delen D. An explanatory machine learning framework for studying pandemics: The case of COVID-19 emergency department readmissions. Decis. Support Syst. 2022. doi: 10.1016/j.dss.2022.113730.
- 31. Bandyopadhyay R., Basu A., Cuevas E., Sarkar R. Harris hawks optimisation with simulated annealing as a deep feature selection method for screening of COVID-19 CT-scans. Appl. Soft Comput. 2021;111. doi: 10.1016/j.asoc.2021.107698.
- 32. Deniz A., Kiziloz H.E., Sevinc E., Dokeroglu T. Predicting the severity of COVID-19 patients using a multi-threaded evolutionary feature selection algorithm. Expert Syst. 2022.
- 33. Kurnaz S., et al. Feature selection for diagnose coronavirus (COVID-19) disease by neural network and Caledonian crow learning algorithm. Appl. Nanosci. 2022:1–16. doi: 10.1007/s13204-021-02159-x.
- 34. Kukker A., Sharma R. JAYA-optimized fuzzy reinforcement learning classifier for COVID-19. IETE J. Res. 2022:1–12.
- 35. Ragab M., Eljaaly K., Alhakamy N.A., Alhadrami H.A., Bahaddad A.A., Abo-Dahab S.M., Khalil E.M. Deep ensemble model for COVID-19 diagnosis and classification using chest CT images. Biology. 2022;11(1):43. doi: 10.3390/biology11010043.
- 36. Too J., Mirjalili S. A hyper learning binary dragonfly algorithm for feature selection: A COVID-19 case study. Knowl.-Based Syst. 2021;212.
- 37. Irene D.S., Beulah J.R. An efficient COVID-19 detection from CT images using ensemble support vector machine with ludo game-based swarm optimisation. Comput. Methods Biomech. Biomed. Eng. Imaging Vis. 2022:1–12.
- 38. Wang W., Pei Y., Wang S., Gorriz J.M., Zhang Y. PSTCNN: Explainable COVID-19 diagnosis using PSO-guided self-tuning CNN. Biocell. 2022;47(2):373–384. doi: 10.32604/biocell.2021.0xxx.
- 39. Wang W., Zhang X., Wang S.-H., Zhang Y.-D. COVID-19 diagnosis by WE-SAJ. Syst. Sci. Control Eng. 2022;10(1):325–335. doi: 10.1080/21642583.2022.2045645.
- 40. Khurma R.A., Aljarah I., Sharieh A., Mirjalili S. EvoloPy-FS: An open-source nature-inspired optimization framework in Python for feature selection. In: Evolutionary Machine Learning Techniques. Springer; 2020:131–173.
- 41. Jiang X., Coffee M., Bari A., Wang J., Jiang X., Huang J., Shi J., Dai J., Cai J., Zhang T., et al. Towards an artificial intelligence framework for data-driven prediction of coronavirus clinical severity. Comput. Mater. Continua. 2020;63(1):537–551.
- 42. Soomro T.A., Zheng L., Afifi A.J., Ali A., Yin M., Gao J. Artificial intelligence (AI) for medical imaging to combat coronavirus disease (COVID-19): A detailed review with direction for future research. Artif. Intell. Rev. 2021:1–31. doi: 10.1007/s10462-021-09985-z.
- 43. Alzaqebah A., Aljarah I., Al-Kadi O. A hierarchical intrusion detection system based on extreme learning machine and nature-inspired optimization. Comput. Secur. 2023;124.
- 44. Li Y., Zhu X., Liu J. An improved moth-flame optimization algorithm for engineering problems. Symmetry. 2020;12(8):1234.
- 45. Mustafa H.M., Ayob M., Albashish D., Abu-Taleb S. Solving text clustering problem using a memetic differential evolution algorithm. PLoS One. 2020;15(6). doi: 10.1371/journal.pone.0232816.
- 46. Wang Z., Li M., Li J. A multi-objective evolutionary algorithm for feature selection based on mutual information with a new redundancy measure. Inform. Sci. 2015;307:73–88.
Associated Data
Data Availability Statement
Data will be made available on request.