2021 Aug 13;33(22):15091–15118. doi: 10.1007/s00521-021-06406-8

Table 3.

Comparison of related studies on metaheuristic-based feature selection methods for text classification

Reference used for the study | Feature selection methods | Dataset | Classification algorithms | Performance and evaluation methods | Contribution | Shortcomings
A novel community detection-based genetic algorithm for feature selection [116] | First, the similarities between features are calculated; second, the features are grouped into clusters by community detection algorithms; third, features are picked from the clusters by a genetic algorithm | Nine benchmark classification problems were analysed in terms of performance | A genetic algorithm based on community detection was used; the compared feature selection methods are based on the PSO, ACO and ABC algorithms | Compared with three recent feature selection methods based on PSO, ACO and ABC on three classifiers, accuracy was on average 0.52% higher than PSO, 1.20% higher than ACO and 1.57% higher than ABC | The proposed genetic algorithm takes account of the correlation between selected features, preventing the selection of redundant features and significantly improving the predictive model's performance | To optimize the parameters, one must repeatedly set parameter values, generate predictions with distinct combinations of values and evaluate the prediction accuracy to select the best values; choosing the best parameter values is therefore itself an optimization problem
Comparison on feature selection methods for text classification [117] | Twenty typical feature selection methods for text classification were compared through experiments on four benchmark datasets | Four datasets obtained from the UCI repository, named CARR, COMD, IMDB and KDCN, are used in the comparison experiments | MOR and MC-OR are used for text classification, together with the unsupervised term variance (TV), term variance quality (TVQ), term frequency (TF) and document frequency (DF) measures for efficiency and high classification accuracy | Performance of the typical feature selection methods is reported | The results give a guideline for selecting appropriate feature selection methods for academic analysis or real-world text classification applications | MOR and MC-OR are the best choices for text classification; however, the formulas of the two methods are relatively complex

Novel approach with nature-inspired and ensemble techniques for optimal text classification [112] | Biogeography-based optimization (BBO) with ensemble classifiers, genetic algorithm (GA) and particle swarm optimization (PSO) | Ten text datasets from the UCI repository (tr11, tr12, tr21, tr23, tr31, tr41, tr45, oh0, oh10, oh15) and a real-time dataset from MOA comprising scientific documents, news and an airlines dataset of 539,384 records | Naïve Bayes (NB), k-nearest neighbour (KNN), support vector machine (SVM), random forest (RF), decision tree (DT) and an ensemble classifier | The average precision was 83.87 with 70.67 recall; the average accuracy was 85.16 with a 76.71 average F-measure | The proposed hybrid BBO algorithm selects an optimal subset of features; it was tested on a real-time airlines dataset, demonstrating its feasibility for solving real-world problems | The proposed approach used imbalanced data, leading to irregularities in the accuracy and F-measure of some of the datasets during performance analysis
Automatic text classification using machine learning and optimization algorithms [77] | The approach is based on the artificial bee colony algorithm with a sequential forward selection (SFS) algorithm, where the selection technique uses a modest greedy search | Reuters-21578, 20 Newsgroups and a real dataset | Machine learning-based automatic text classification (MLearn-ATC) algorithm based on probabilistic neural networks (PNN) | A precision of 0.847, recall of 0.839, F-measure of 0.843 and accuracy of 0.938 on Reuters; a precision of 0.896, recall of 0.825, F-measure of 0.859 and accuracy of 0.937 on 20 Newsgroups; a precision of 0.897, recall of 0.845, F-measure of 0.870 and accuracy of 0.961 on the real dataset | The proposed algorithm outperformed Naïve Bayes (NB), k-nearest neighbour (KNN), support vector machine (SVM) and probabilistic neural network (PNN) in a comparative performance analysis; the authors claim the algorithm uses minimal time and memory while performing the task | The accuracy of the algorithm was verified against particle swarm optimization (PSO), ant colony optimization (ACO), artificial bee colony (ABC) and the firefly algorithm (FA) only; its performance cannot be generalized as it was not compared with other optimization methods
Optimized deep belief network and entropy-based hybrid bounding model for incremental text categorization [118] | Entropy-based feature selection combined with a feature extraction process using a vector space model (VSM) that extracts TF-IDF and energy features | 20 Newsgroups and Reuters datasets | Grasshopper crow optimization algorithm (GCOA) and deep belief network (DBN) | A precision of 0.959, recall of 0.959 and accuracy of 0.96 were reported | The proposed algorithm provides better performance for incremental text categorization than existing algorithms | The proposed algorithm was not compared with other evolutionary algorithms; hence, its performance may not hold up against other known systems
Optimal feature subset selection using hybrid binary Jaya optimization algorithm for text classification [119] | Combination of the wrapper-based binary Jaya optimization algorithm (BJO) and the filter-based normalized difference measure (NDM) | WebKB, SMS, BBC and 10Newsgroup datasets | Multinomial Naïve Bayes (NB) and linear support vector machine (SVM) | No exact values given; graphs were used to depict the superiority of the proposed NDM-BJO over the existing NB and SVM classifiers on the four categories of datasets used | Proposed a new hybrid feature selection method, the normalized difference measure with binary Jaya optimization (NDM-BJO), to reduce the high-dimensional feature space of text classification problems | The evaluation metrics were limited to accuracy and macro-F1; considerable uncertainty remains as to whether the proposed algorithm would outperform existing systems if other metrics such as precision or recall were used to evaluate its efficacy
Optimization of multi-class document classification with computational search policy [120] | Cuckoo optimization (CO), firefly optimization (FO) and bat optimization (BO) algorithms with a correlation-based feature subset filter | News documents | J48 and support vector machine (SVM) | The accuracy of J48 with CO, FO and BO is 92.03%, 90.55% and 90.23%, respectively; the accuracy of SVM with CO, FO and BO is 87.22%, 89.60% and 87.22%, respectively | The proposed model takes advantage of nature-inspired metaheuristic algorithms, which provide advanced search for nonlinear complex problems | More classifiers need to be set up with computational search policies and their effects measured
An improved sine cosine algorithm to select features for text categorization [113] | Improved sine cosine algorithm (ISCA) | Reuters-21578 (Re0), La1s, La2s, Oh0, Oh5, Oh10, Oh15, FBIS and tr41 | Naïve Bayes (NB) | The average precision, recall and F-measure are 82.32, 82.89 and 82.22, respectively | Proposed the ISCA algorithm, which is statistically significantly better than the Obl-SCA, weighted-SCA and ACO algorithms | The proposed ISCA is statistically weaker than some other algorithms such as GA, LevySCA, SCA and MFO, limiting any general conclusion that it improves categorization performance in a larger setting
Text feature space optimization using artificial bee colony [73] | Artificial bee colony (ABC) | Reuters-21578 | Support vector machine (SVM), Naïve Bayes (NB) and k-nearest neighbours (KNN) | The average accuracy, precision, recall and F-measure on SVM are 95.07%, 84.75, 83.74 and 96.08, respectively; on NB 92.23%, 83.04, 81.96 and 82.48, respectively; on KNN 87.37%, 78.91, 77.25 and 78.04, respectively | Proposed ABC, a metaheuristic-based algorithm for improved performance in text classification | Complexity in determining the control parameters or hyperparameters of the algorithm
New hybrid method for feature selection and classification using metaheuristic algorithm in credit risk assessment [121] | A new advanced hybrid feature selection approach is proposed to deal with these problems | The credit dataset from the UCI machine learning repository is used to estimate performance | Imperialist competitive algorithm metaheuristic with a modified fuzzy min–max classifier (ICA-MFMCN) | Statistical test results show that the available data support the hypothesis at a reliability level of 1% | Fast performance for feature ranking, together with the optimization capabilities of the ICA | Lack of a rapid filtering technique to reduce the search space
Artificial bee colony algorithm for feature selection and improved support vector machine for text classification [74] | Based on the artificial bee colony feature selection (ABCFS) algorithm | Reuters-21578, 20Newsgroup corpus and real datasets | Support vector machine (SVM) and improved SVM (ISVM) | The average precision, recall, F-measure and accuracy on Reuters are 0.675, 0.702, 0.679 and 0.829, respectively; on 20 Newsgroup 0.701, 0.723, 0.710 and 0.822, respectively; on the real dataset 0.840, 0.797, 0.817 and 0.835, respectively | Proposed ABCFS, which enhances the accuracy of text document classification | The proposed algorithm requires high computational time and complexity; it was verified only on SVM and an improved SVM, so it is unclear whether its performance generalizes to other state-of-the-art classifiers

A modified multi-objective heuristic for effective feature selection in text classification [122] | Modified artificial fish swarm algorithm (MAFSA) | OHSUMED | Support vector machine (SVM), AdaBoost and Naïve Bayes classifiers | The average precision of MAFSA is 2.27% better than that of the artificial fish swarm algorithm (AFSA) | Proposed MAFSA, an improvement over AFSA for feature selection and better text classification | The performance metrics reported are not descriptive enough
An ACO–ANN-based feature selection algorithm for big data [76] | Ant colony optimization (ACO) | Reuters-21578 | Artificial neural network (ANN) | The average precision, recall, macro F-measure, micro F-measure and accuracy are 77.34, 80.14, 79.01, 89.87 and 81.35, respectively | The proposed ACO component of the hybrid algorithm converges promptly owing to its effective search ability in the problem state space, allowing efficient determination of a minimal feature subset | The performance of the proposed algorithm cannot be generalized as verification was not done on standard classifiers
Competitive particle swarm optimization for multi-category text feature selection [123] | Continuous particle swarm optimization (PSO) algorithm | RCV1 and Yahoo collections | Multi-label Naïve Bayes (MLNB) and extreme learning machine for multi-label (ML-ELM) | The one-error for MLNB with EGA + CDM, bALO-QR and CSO is 3.75, 2.31 and 2.94, respectively; the multi-label accuracy for MLNB is 3.19, 2.75 and 3.06, respectively | Proposed a process for estimating the relative effectiveness of the PSO based on a fitness-based tournament over the feature subsets in each iteration; the hybridized approach addresses degenerated final feature subsets | The performance of the proposed algorithm cannot be generalized as verification was not done on standard classifiers; the proposed PSO was designed for multi-label text feature selection and was not tested on single-labelled text

A new approach for text documents classification with invasive weed optimization and Naive Bayes classifier [124] | Invasive weed optimization (IWO) with a Naïve Bayes (NB) classifier (IWO-NB) | Reuters-21578, WebKB and Cade 12 | Naïve Bayes (NB) | The precision, recall, F-measure, AUC, accuracy and error rate on Reuters are 0.6632, 0.6925, 0.6775, 0.6894, 0.7012 and 0.2988, respectively; on WebKB 0.6548, 0.7136, 0.6829, 0.6914, 0.7265 and 0.2735, respectively; on Cade 12 0.6984, 0.7214, 0.7097, 0.7058, 0.7045 and 0.2955, respectively | Proposed a hybrid of the IWO algorithm and the NB classifier for improving document classification performance | The performance of the proposed algorithm cannot be generalized as verification was not done on all standard classifiers
Particle swarm optimization-based two-stage feature selection in text mining [125] | Correlation (CO), information gain (IG), gain ratio (GR), symmetrical uncertainty (SU) and particle swarm optimization (PSO) | Reuters-21578 R8 dataset | Naïve Bayes (NB) | Average accuracy with CO is 88.74%, with IG 89.52%, with GR 87.83% and with SU 89.34% | The proposed algorithm eliminates useless features and reduces the search space for enhanced performance during the categorization task | Requires increased computational resources and complexity and a larger number of features; the approach requires further work, such as using a different fitness function or a multi-objective search approach

A text feature selection method based on the small world algorithm [126] | Information gain (IG) and Chi-square statistics (CHI), with optimization of the candidate features by the small world algorithm (SWA) | Reuters-21578 Classic Corpus and Chinese Fudan Corpus | K-nearest neighbours (KNN) and support vector machine (SVM) | Aggregated accuracy on Reuters improved by an average of 2.3% when using IG or CHI with SWA optimization; aggregated accuracy on Fudan improved by an average of 5.3% | The proposed algorithm reduces the dimension of the feature vector and the complexity, ultimately increasing the accuracy rate; a local short-range search algorithm was used to improve text classification performance (a minimal filter-ranking sketch using IG and CHI is given after this table) | There was no optimization of the SWA parameter settings, making some of the results inconclusive; the proposed SWA has no optimal number of iterations owing to the lack of a parameter-setting mechanism

An improved flower pollination algorithm with AdaBoost algorithm for feature selection in text documents classification [127] | Flower pollination algorithm (FPA) | Reuters-21578, WebKB and Cade 12 | AdaBoost | The precision, recall, F-measure and accuracy on Reuters are 77.94, 69.32, 72.77 and 70.35, respectively; on WebKB 76.54, 69.94, 71.95 and 69.48, respectively; on Cade 12 76.94, 71.24, 73.81 and 69.89, respectively | The proposed model shows a significant reduction in the size of the feature set as well as in the similarity between category weights and the distance between words when compared with other models | The proposed model is dependent on its parameter values, making it less efficient when choosing the feature weights
An improved k-nearest neighbour with crow search algorithm for feature selection in text documents classification [128] | Crow search algorithm (CSA) | Reuters-21578, WebKB and Cade 12 | K-nearest neighbour (KNN) | The precision, recall, F-measure and accuracy for KNN on Reuters are 76.34, 69.47, 72.74 and 68.32, respectively; on WebKB 77.35, 68.24, 72.51 and 70.64, respectively; on Cade 12 75.48, 69.58, 72.41 and 72.23, respectively | The proposed model is more accurate in classification than standard KNN, with a greater F-measure, and gave an accuracy 27% higher than KNN | The proposed model has the drawback of optimal feature selection during the classification task
Multi-label text classification using optimized feature sets [129] | Wrapper-based hybrid artificial bee colony and bacterial foraging optimization (HABBFO) | Reuters dataset | Artificial neural network (ANN) | The precision, recall and Hamming loss for KNN are 89.85, 88.89 and 35.45, respectively; for ANN 94.82, 93.79 and 20.45, respectively | The proposed multi-label classifier performs better than the standard KNN algorithm when evaluated in terms of precision, recall and Hamming loss | The proposed feature selection model was verified on KNN and ANN classifiers only; its generalization to other classifiers is undefined
Feature selection for text classification using genetic algorithms [130] | Genetic algorithm (GA) | 20Newsgroups and Reuters-21578 | Naïve Bayes (NB), k-nearest neighbours (KNN) and support vector machines (SVM) | The F-measure on Reuters for KNN, SVM and NB is 0.931, 0.946 and 0.863, respectively; on 20Newsgroups for KNN, SVM and NB it is 0.931, 0.879 and 0.946, respectively | The proposed algorithm searches for the feature subset that yields the best classifier performance, finding a subset of the smallest dimensionality that gives higher classification accuracy (a minimal wrapper-style GA sketch is given after this table) | The algorithm needs to be verified against evolutionary and metaheuristic algorithms or hybrid solutions to improve textual document classification
Metaheuristic algorithms for feature selection in sentiment analysis [131] | The study compares feature selection in text classification based on traditional and sentiment analysis methods | The proposed dimension reduction strategy was to reduce the size of a large training dataset | Metaheuristic methods such as the genetic algorithm, particle swarm optimization (PSO) and rough set theory were applied | The results on traditional text classification found that ACO was able to obtain a more optimal feature subset than GA | The results show that metaheuristic-based algorithms have the potential to perform well in sentiment analysis | The main challenges in sentiment classification are overlapping features, large dimensionality and the elimination of irrelevant features
A new and fast rival genetic algorithm for feature selection [132] | The study puts forward a new rival genetic algorithm (RGA) to improve the performance of GA for feature selection | Twenty-three (23) benchmark datasets drawn from the UCI machine learning repository and the Arizona State University repository | Not stated | Average accuracy was 0.9579 and the average FSR was 0.4386 | A competition strategy and a dynamic mutation rate were used to enhance the performance of the GA; a fast RGA was presented to improve the computational efficiency of the RGA | Future work requires testing the efficiency of the RGA on other unexplored classification tasks such as electromyography signals and the detection and diagnosis of strokes and other diseases
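
To make the wrapper-based evolutionary approaches surveyed above (e.g. the GA-based methods in [116], [130] and [132]) concrete, the following is a minimal illustrative sketch of a binary genetic algorithm that selects TF-IDF features for a Naïve Bayes classifier. It is not the algorithm of any cited study: the toy corpus, population size, mutation rate and generation count are assumptions chosen only to keep the example small and runnable.

```python
# Minimal sketch of a wrapper-based binary GA for text feature selection.
# Illustrative only; corpus, population size and rates are assumed values.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB

# Toy corpus, repeated so that 3-fold cross-validation is possible.
corpus = [
    "stock market rises on strong earnings", "shares fall after weak earnings",
    "central bank cuts interest rates",      "team wins the championship final",
    "striker scores twice in the final",     "coach praises the young player",
] * 5
labels = np.array([0, 0, 0, 1, 1, 1] * 5)   # 0 = finance, 1 = sport

X = TfidfVectorizer().fit_transform(corpus).toarray()
n_features = X.shape[1]
rng = np.random.default_rng(42)

def fitness(mask):
    """Cross-validated NB accuracy on the feature subset encoded by a 0/1 mask."""
    if mask.sum() == 0:
        return 0.0
    subset = X[:, mask.astype(bool)]
    return cross_val_score(MultinomialNB(), subset, labels, cv=3).mean()

pop = rng.integers(0, 2, size=(12, n_features))    # random initial population
for generation in range(10):
    scores = np.array([fitness(ind) for ind in pop])
    parents = pop[np.argsort(scores)[::-1][:6]]    # truncation selection
    children = []
    for _ in range(len(pop)):
        a, b = parents[rng.integers(0, len(parents), size=2)]
        cut = rng.integers(1, n_features)          # one-point crossover
        child = np.concatenate([a[:cut], b[cut:]])
        child = np.where(rng.random(n_features) < 0.02, 1 - child, child)  # bit-flip mutation
        children.append(child)
    pop = np.array(children)

best = max(pop, key=fitness)
print(f"kept {int(best.sum())}/{n_features} features, CV accuracy {fitness(best):.3f}")
```

The same wrapper loop accommodates the other population-based searches in the table (PSO, ABC, BBO, Jaya and so on) by swapping the selection, crossover and mutation steps for the corresponding update rules, while the fitness function remains the classifier's cross-validated score on the candidate subset.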
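Several of the studies combine filter measures such as information gain (IG) and chi-square statistics (CHI) with a subsequent metaheuristic search, e.g. [125] and [126]. The snippet below sketches only that filter-ranking stage, using scikit-learn's chi2 and mutual_info_classif as stand-ins for CHI and IG; the toy corpus and the top-k cut-off are assumptions, and the later optimization stage (SWA, PSO, etc.) is omitted.

```python
# Minimal sketch of the filter-ranking stage that precedes a metaheuristic search.
# chi2 and mutual_info_classif stand in for the CHI and IG measures.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2, mutual_info_classif

corpus = [
    "stock market rises on strong earnings",
    "shares fall after weak earnings report",
    "team wins the championship final",
    "striker scores twice in the final",
]
labels = np.array([0, 0, 1, 1])

vec = TfidfVectorizer()
X = vec.fit_transform(corpus)
terms = np.array(vec.get_feature_names_out())

# Rank terms by the chi-square statistic (CHI) and by mutual information (IG).
chi_scores, _ = chi2(X, labels)
ig_scores = mutual_info_classif(X, labels, discrete_features=True, random_state=0)

k = 10  # keep the top-k candidate features for the later search stage (assumed cut-off)
print("top terms by CHI:", terms[np.argsort(chi_scores)[::-1][:k]])
print("top terms by IG :", terms[np.argsort(ig_scores)[::-1][:k]])

# SelectKBest yields the reduced matrix that a metaheuristic (e.g. SWA or PSO)
# would then search over for the final feature subset.
X_reduced = SelectKBest(chi2, k=k).fit_transform(X, labels)
print("reduced matrix shape:", X_reduced.shape)
```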