2021 Aug 13;33(22):15091–15118. doi: 10.1007/s00521-021-06406-8

Table 3.

Comparison of related studies on metaheuristic-based feature selection methods for text classification

Reference used for the study | Feature selection methods | Dataset | Classification algorithms | Performance and evaluation methods | Contribution | Shortcomings
A novel community detection-based genetic algorithm for feature selection [116] | First, the similarities between features are calculated; second, the features are grouped into clusters by community detection algorithms; third, features are picked from the clusters by a genetic algorithm | Nine benchmark classification problems were analysed in terms of performance | A genetic algorithm based on community detection was used; the compared feature selection methods are based on the PSO, ACO and ABC algorithms | Compared with three recent feature selection methods based on PSO, ACO and ABC on three classifiers, accuracy was on average 0.52% higher than PSO, 1.20% higher than ACO and 1.57% higher than ABC | The proposed genetic algorithm takes account of the correlation between selected features, preventing the selection of redundant features and significantly improving the predictive model's performance | To optimize the parameters, one must repeatedly set parameter values, generate predictions with distinct combinations of values and evaluate the prediction accuracy to select the best values; choosing the best parameter values is therefore itself an optimization problem
Comparison on feature selection methods for text classification [117] | Twenty typical feature selection methods for text classification were compared through experiments on four benchmark datasets | Four datasets obtained from the UCI repository, named CARR, COMD, IMDB and KDCN, are used in the comparison experiments | MOR and MC-OR are used for text classification, together with the unsupervised term variance (TV), term variance quality (TVQ), term frequency (TF) and document frequency (DF) measures for efficiency and high classification accuracy | Performance of the typical feature selection methods is reported | The results give a guideline for selecting appropriate feature selection methods for academic analysis or real-world text classification applications | MOR and MC-OR are the best choices for text classification; however, the formulas of the two methods are relatively complex

Novel approach with nature-inspired and ensemble techniques for optimal text classification [112] | Biogeography-based optimization (BBO) with ensemble classifiers, genetic algorithm (GA) and particle swarm optimization (PSO) | Ten text datasets from the UCI repository (tr11, tr12, tr21, tr23, tr31, tr41, tr45, oh0, oh10, oh15) and a real-time dataset from MOA comprising scientific documents, news and an airlines dataset of 539,384 records | Naïve Bayes (NB), k-nearest neighbour (KNN), support vector machine (SVM), random forest (RF), decision tree (DT) and an ensemble classifier | The average precision was 83.87 with 70.67 recall; the average accuracy was 85.16 with a 76.71 average F-measure | The proposed hybrid BBO algorithm selects an optimal subset of features; it was tested on a real-time airlines dataset, demonstrating its feasibility for solving real-world problems | The proposed approach used imbalanced data, leading to irregularities in the accuracy and F-measure of some of the datasets during performance analysis
Automatic text classification using machine learning and optimization algorithms [77] | The approach is based on the artificial bee colony algorithm with a sequential forward selection (SFS) algorithm, where the selection technique uses a modest greedy search | Reuters-21578, 20 Newsgroups and a real dataset | Machine learning-based automatic text classification (MLearn-ATC) algorithm based on probabilistic neural networks (PNN) | A precision of 0.847, recall of 0.839, F-measure of 0.843 and accuracy of 0.938 on Reuters; a precision of 0.896, recall of 0.825, F-measure of 0.859 and accuracy of 0.937 on 20 Newsgroups; a precision of 0.897, recall of 0.845, F-measure of 0.870 and accuracy of 0.961 on the real dataset | The proposed algorithm outperformed Naïve Bayes (NB), k-nearest neighbour (KNN), support vector machine (SVM) and probabilistic neural network (PNN) in a comparative performance analysis; the authors claim the algorithm uses minimal time and memory while performing the task | The accuracy of the algorithm was verified against particle swarm optimization (PSO), ant colony optimization (ACO), artificial bee colony (ABC) and the firefly algorithm (FA) only; its performance cannot be generalized as it was not compared with other optimization methods
Optimized deep belief network and entropy-based hybrid bounding model for incremental text categorization [118] | Entropy-based feature selection combined with a feature extraction process using a vector space model (VSM) that extracts TF-IDF and energy features | 20 Newsgroups and Reuters datasets | Grasshopper crow optimization algorithm (GCOA) and deep belief network (DBN) | A precision of 0.959, recall of 0.959 and accuracy of 0.96 were reported | The proposed algorithm provides better performance for incremental text categorization than existing algorithms | The proposed algorithm was not compared with other evolutionary algorithms; hence, its performance may not hold up against other known systems
Optimal feature subset selection using hybrid binary Jaya optimization algorithm for text classification [119] | Combination of the wrapper-based binary Jaya optimization algorithm (BJO) and the filter-based normalized difference measure (NDM) | WebKB, SMS, BBC and 10Newsgroup datasets | Multinomial Naïve Bayes (NB) and linear support vector machine (SVM) | No exact values given; graphs were used to depict the superiority of the proposed NDM-BJO over the existing NB and SVM classifiers on the four categories of datasets used | Proposed a new hybrid feature selection method, the normalized difference measure with binary Jaya optimization (NDM-BJO), to reduce the high-dimensional feature space of text classification problems | The evaluation metrics were limited to accuracy and macro-F1; considerable uncertainty remains as to whether the proposed algorithm would outperform existing systems if other metrics such as precision or recall were used to evaluate its efficacy
Optimization of multi-class document classification with computational search policy [120] | Cuckoo optimization (CO), firefly optimization (FO) and bat optimization (BO) algorithms with a correlation-based feature subset filter | News documents | J48 and support vector machine (SVM) | The accuracy of J48 with CO, FO and BO is 92.03%, 90.55% and 90.23%, respectively; the accuracy of SVM with CO, FO and BO is 87.22%, 89.60% and 87.22%, respectively | The proposed model takes advantage of nature-inspired metaheuristic algorithms, which provide advanced search for nonlinear complex problems | More classifiers need to be set up with computational search policies and their effects measured
An improved sine cosine algorithm to select features for text categorization [113] | Improved sine cosine algorithm (ISCA) | Reuters-21578 (Re0), La1s, La2s, Oh0, Oh5, Oh10, Oh15, FBIS and tr41 | Naïve Bayes (NB) | The average precision, recall and F-measure are 82.32, 82.89 and 82.22, respectively | Proposed the ISCA algorithm, which is statistically significantly better than the Obl-SCA, weighted-SCA and ACO algorithms | The proposed ISCA is statistically weaker than some other algorithms such as GA, LevySCA, SCA and MFO, limiting any general conclusion that it improves categorization performance in a larger setting
Text feature space optimization using artificial bee colony [73] | Artificial bee colony (ABC) | Reuters-21578 | Support vector machine (SVM), Naïve Bayes (NB) and k-nearest neighbours (KNN) | The average accuracy, precision, recall and F-measure on SVM are 95.07%, 84.75, 83.74 and 96.08, respectively; on NB 92.23%, 83.04, 81.96 and 82.48, respectively; on KNN 87.37%, 78.91, 77.25 and 78.04, respectively | Proposed ABC, a metaheuristic-based algorithm for improved performance in text classification | Complexity in determining the control parameters or hyperparameters of the algorithm
New hybrid method for feature selection and classification using metaheuristic algorithm in credit risk assessment [121] | A new advanced hybrid feature selection approach is proposed to deal with these problems | The credit dataset from the UCI machine learning repository is used to estimate performance | Imperialist competitive algorithm metaheuristic with a modified fuzzy min–max classifier (ICA-MFMCN) | Statistical test results show that the available data support the hypothesis at a reliability level of 1% | Fast performance for feature ranking, together with the optimization capabilities of the ICA | Lack of a rapid filtering technique to reduce the search space
Artificial bee colony algorithm for feature selection and improved support vector machine for text classification [74] | Based on the artificial bee colony feature selection (ABCFS) algorithm | Reuters-21578, 20Newsgroup corpus and real datasets | Support vector machine (SVM) and improved SVM (ISVM) | The average precision, recall, F-measure and accuracy on Reuters are 0.675, 0.702, 0.679 and 0.829, respectively; on 20 Newsgroup 0.701, 0.723, 0.710 and 0.822, respectively; on the real dataset 0.840, 0.797, 0.817 and 0.835, respectively | Proposed ABCFS, which enhances the accuracy of text document classification | The proposed algorithm requires high computational time and complexity; it was verified only on SVM and an improved SVM, so it is unclear whether its performance generalizes to other state-of-the-art classifiers

A modified multi-objective heuristic for effective feature selection in text classification [122] | Modified artificial fish swarm algorithm (MAFSA) | OHSUMED | Support vector machine (SVM), AdaBoost and Naïve Bayes classifiers | The average precision of MAFSA is 2.27% better than that of the artificial fish swarm algorithm (AFSA) | Proposed MAFSA, an improvement over AFSA for feature selection and better text classification | The performance metrics reported are not descriptive enough
An ACO–ANN-based feature selection algorithm for big data [76] | Ant colony optimization (ACO) | Reuters-21578 | Artificial neural network (ANN) | The average precision, recall, macro F-measure, micro F-measure and accuracy are 77.34, 80.14, 79.01, 89.87 and 81.35, respectively | The proposed ACO component of the hybrid algorithm converges promptly owing to its effective search ability in the problem state space, allowing efficient determination of a minimal feature subset | The performance of the proposed algorithm cannot be generalized as verification was not done on standard classifiers
Competitive particle swarm optimization for multi-category text feature selection [123] | Continuous particle swarm optimization (PSO) algorithm | RCV1 and Yahoo collections | Multi-label Naïve Bayes (MLNB) and extreme learning machine for multi-label (ML-ELM) | The one-error for MLNB with EGA + CDM, bALO-QR and CSO is 3.75, 2.31 and 2.94, respectively; the multi-label accuracy for MLNB is 3.19, 2.75 and 3.06, respectively | Proposed a process for estimating the relative effectiveness of the PSO based on a fitness-based tournament over the feature subsets in each iteration; the hybridized approach addresses degenerated final feature subsets | The performance of the proposed algorithm cannot be generalized as verification was not done on standard classifiers; the proposed PSO was designed for multi-label text feature selection and was not tested on single-labelled text

A new approach for text documents classification with invasive weed optimization and Naive Bayes classifier [124] | Invasive weed optimization (IWO) with a Naïve Bayes (NB) classifier (IWO-NB) | Reuters-21578, WebKB and Cade 12 | Naïve Bayes (NB) | The precision, recall, F-measure, AUC, accuracy and error rate on Reuters are 0.6632, 0.6925, 0.6775, 0.6894, 0.7012 and 0.2988, respectively; on WebKB 0.6548, 0.7136, 0.6829, 0.6914, 0.7265 and 0.2735, respectively; on Cade 12 0.6984, 0.7214, 0.7097, 0.7058, 0.7045 and 0.2955, respectively | Proposed a hybrid of the IWO algorithm and the NB classifier for improving document classification performance | The performance of the proposed algorithm cannot be generalized as verification was not done on all standard classifiers
Particle swarm optimization-based two-stage feature selection in text mining [125] | Correlation (CO), information gain (IG), gain ratio (GR), symmetrical uncertainty (SU) and particle swarm optimization (PSO) | Reuters-21578 R8 dataset | Naïve Bayes (NB) | Average accuracy with CO is 88.74%, with IG 89.52%, with GR 87.83% and with SU 89.34% | The proposed algorithm eliminates useless features and reduces the search space for enhanced performance during the categorization task | Requires increased computational resources and complexity and a larger number of features; the approach requires further work, such as using a different fitness function or a multi-objective search approach

A text feature selection method based on the small world algorithm [126] | Information gain (IG) and Chi-square statistics (CHI), with optimization of the candidate features by the small world algorithm (SWA) | Reuters-21578 Classic Corpus and Chinese Fudan Corpus | K-nearest neighbours (KNN) and support vector machine (SVM) | Aggregated accuracy on Reuters improved by an average of 2.3% when using IG or CHI with SWA optimization; aggregated accuracy on Fudan improved by an average of 5.3% | The proposed algorithm reduces the dimension of the feature vector and the complexity, ultimately increasing the accuracy rate; a local short-range search algorithm was used to improve text classification performance (a minimal filter-ranking sketch using IG and CHI is given after this table) | There was no optimization of the SWA parameter settings, making some of the results inconclusive; the proposed SWA has no optimal number of iterations owing to the lack of a parameter-setting mechanism

An improved flower pollination algorithm with AdaBoost algorithm for feature selection in text documents classification [127] | Flower pollination algorithm (FPA) | Reuters-21578, WebKB and Cade 12 | AdaBoost | The precision, recall, F-measure and accuracy on Reuters are 77.94, 69.32, 72.77 and 70.35, respectively; on WebKB 76.54, 69.94, 71.95 and 69.48, respectively; on Cade 12 76.94, 71.24, 73.81 and 69.89, respectively | The proposed model shows a significant reduction in the size of the feature set as well as in the similarity between category weights and the distance between words when compared with other models | The proposed model is dependent on its parameter values, making it less efficient when choosing the feature weights
An improved k-nearest neighbour with crow search algorithm for feature selection in text documents classification [128] | Crow search algorithm (CSA) | Reuters-21578, WebKB and Cade 12 | K-nearest neighbour (KNN) | The precision, recall, F-measure and accuracy for KNN on Reuters are 76.34, 69.47, 72.74 and 68.32, respectively; on WebKB 77.35, 68.24, 72.51 and 70.64, respectively; on Cade 12 75.48, 69.58, 72.41 and 72.23, respectively | The proposed model is more accurate in classification than standard KNN, with a greater F-measure, and gave an accuracy 27% higher than KNN | The proposed model has the drawback of optimal feature selection during the classification task
Multi-label text classification using optimized feature sets [129] | Wrapper-based hybrid artificial bee colony and bacterial foraging optimization (HABBFO) | Reuters dataset | Artificial neural network (ANN) | The precision, recall and Hamming loss for KNN are 89.85, 88.89 and 35.45, respectively; for ANN 94.82, 93.79 and 20.45, respectively | The proposed multi-label classifier performs better than the standard KNN algorithm when evaluated in terms of precision, recall and Hamming loss | The proposed feature selection model was verified on KNN and ANN classifiers only; its generalization to other classifiers is undefined
Feature selection for text classification using genetic algorithms [130] | Genetic algorithm (GA) | 20Newsgroups and Reuters-21578 | Naïve Bayes (NB), k-nearest neighbours (KNN) and support vector machines (SVM) | The F-measure on Reuters for KNN, SVM and NB is 0.931, 0.946 and 0.863, respectively; on 20Newsgroups for KNN, SVM and NB it is 0.931, 0.879 and 0.946, respectively | The proposed algorithm searches for the feature subset that yields the best classifier performance, finding a subset of the smallest dimensionality that gives higher classification accuracy (a minimal wrapper-style GA sketch is given after this table) | The algorithm needs to be verified against evolutionary and metaheuristic algorithms or hybrid solutions to improve textual document classification
Metaheuristic algorithms for feature selection in sentiment analysis [131] | The study compares feature selection in text classification based on traditional and sentiment analysis methods | The proposed dimension reduction strategy was to reduce the size of a large training dataset | Metaheuristic methods such as the genetic algorithm, particle swarm optimization (PSO) and rough set theory were applied | The results on traditional text classification found that ACO was able to obtain a more optimal feature subset than GA | The results show that metaheuristic-based algorithms have the potential to perform well in sentiment analysis | The main challenges in sentiment classification are overlapping features, large dimensionality and the elimination of irrelevant features
A new and fast rival genetic algorithm for feature selection [132] | The study puts forward a new rival genetic algorithm (RGA) to improve the performance of GA for feature selection | Twenty-three (23) benchmark datasets drawn from the UCI machine learning repository and the Arizona State University repository | Not stated | Average accuracy was 0.9579 and the average FSR was 0.4386 | A competition strategy and a dynamic mutation rate were used to enhance the performance of the GA; a fast RGA was presented to improve the computational efficiency of the RGA | Future work requires testing the efficiency of the RGA on other unexplored classification tasks such as electromyography signals and the detection and diagnosis of strokes and other diseases
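
To make the wrapper-based evolutionary approaches surveyed above (e.g. the GA-based methods in [116], [130] and [132]) concrete, the following is a minimal illustrative sketch of a binary genetic algorithm that selects TF-IDF features for a Naïve Bayes classifier. It is not the algorithm of any cited study: the toy corpus, population size, mutation rate and generation count are assumptions chosen only to keep the example small and runnable.

```python
# Minimal sketch of a wrapper-based binary GA for text feature selection.
# Illustrative only; corpus, population size and rates are assumed values.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB

# Toy corpus, repeated so that 3-fold cross-validation is possible.
corpus = [
    "stock market rises on strong earnings", "shares fall after weak earnings",
    "central bank cuts interest rates",      "team wins the championship final",
    "striker scores twice in the final",     "coach praises the young player",
] * 5
labels = np.array([0, 0, 0, 1, 1, 1] * 5)   # 0 = finance, 1 = sport

X = TfidfVectorizer().fit_transform(corpus).toarray()
n_features = X.shape[1]
rng = np.random.default_rng(42)

def fitness(mask):
    """Cross-validated NB accuracy on the feature subset encoded by a 0/1 mask."""
    if mask.sum() == 0:
        return 0.0
    subset = X[:, mask.astype(bool)]
    return cross_val_score(MultinomialNB(), subset, labels, cv=3).mean()

pop = rng.integers(0, 2, size=(12, n_features))    # random initial population
for generation in range(10):
    scores = np.array([fitness(ind) for ind in pop])
    parents = pop[np.argsort(scores)[::-1][:6]]    # truncation selection
    children = []
    for _ in range(len(pop)):
        a, b = parents[rng.integers(0, len(parents), size=2)]
        cut = rng.integers(1, n_features)          # one-point crossover
        child = np.concatenate([a[:cut], b[cut:]])
        child = np.where(rng.random(n_features) < 0.02, 1 - child, child)  # bit-flip mutation
        children.append(child)
    pop = np.array(children)

best = max(pop, key=fitness)
print(f"kept {int(best.sum())}/{n_features} features, CV accuracy {fitness(best):.3f}")
```

The same wrapper loop accommodates the other population-based searches in the table (PSO, ABC, BBO, Jaya and so on) by swapping the selection, crossover and mutation steps for the corresponding update rules, while the fitness function remains the classifier's cross-validated score on the candidate subset.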
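Several of the studies combine filter measures such as information gain (IG) and chi-square statistics (CHI) with a subsequent metaheuristic search, e.g. [125] and [126]. The snippet below sketches only that filter-ranking stage, using scikit-learn's chi2 and mutual_info_classif as stand-ins for CHI and IG; the toy corpus and the top-k cut-off are assumptions, and the later optimization stage (SWA, PSO, etc.) is omitted.

```python
# Minimal sketch of the filter-ranking stage that precedes a metaheuristic search.
# chi2 and mutual_info_classif stand in for the CHI and IG measures.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2, mutual_info_classif

corpus = [
    "stock market rises on strong earnings",
    "shares fall after weak earnings report",
    "team wins the championship final",
    "striker scores twice in the final",
]
labels = np.array([0, 0, 1, 1])

vec = TfidfVectorizer()
X = vec.fit_transform(corpus)
terms = np.array(vec.get_feature_names_out())

# Rank terms by the chi-square statistic (CHI) and by mutual information (IG).
chi_scores, _ = chi2(X, labels)
ig_scores = mutual_info_classif(X, labels, discrete_features=True, random_state=0)

k = 10  # keep the top-k candidate features for the later search stage (assumed cut-off)
print("top terms by CHI:", terms[np.argsort(chi_scores)[::-1][:k]])
print("top terms by IG :", terms[np.argsort(ig_scores)[::-1][:k]])

# SelectKBest yields the reduced matrix that a metaheuristic (e.g. SWA or PSO)
# would then search over for the final feature subset.
X_reduced = SelectKBest(chi2, k=k).fit_transform(X, labels)
print("reduced matrix shape:", X_reduced.shape)
```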