2022 Jun 9;2022:1883698. doi: 10.1155/2022/1883698

Table 2. Benefits and limitations of feature selection methods.

Univariate filter methods:
- Information gain. Benefits: indicates the relevance of an attribute or feature. Limitations: biased towards multi-valued attributes and prone to overfitting.
- Chi-square. Benefits: reduces training time and helps avoid overfitting. Limitations: highly sensitive to sample size.
- Fisher's score. Benefits: evaluates features individually to reduce the feature set. Limitations: does not handle feature redundancy.
- Pearson's correlation coefficient. Benefits: simple and fast; measures the linear correlation between features. Limitations: sensitive only to linear relationships.
- Variance threshold. Benefits: removes features whose variance falls below a chosen cutoff. Limitations: does not consider the relationship with the target variable.

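For concreteness, a minimal sketch of the univariate filters above using scikit-learn; the breast-cancer toy data, k = 10, and the 0.01 variance cutoff are illustrative assumptions, and f_classif (ANOVA F-score) stands in here for Fisher's score.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import (SelectKBest, VarianceThreshold,
                                        chi2, f_classif, mutual_info_classif)

X, y = load_breast_cancer(return_X_y=True)   # toy data; all features are non-negative

# Variance threshold: drop features whose variance falls below the cutoff.
X_var = VarianceThreshold(threshold=0.01).fit_transform(X)

# Information gain (mutual information) between each feature and the target.
X_mi = SelectKBest(mutual_info_classif, k=10).fit_transform(X, y)

# Chi-square test (requires non-negative feature values).
X_chi2 = SelectKBest(chi2, k=10).fit_transform(X, y)

# ANOVA F-score, used here as a Fisher-style per-feature discriminative criterion.
X_f = SelectKBest(f_classif, k=10).fit_transform(X, y)

# Pearson correlation of each feature with the target (captures linear relevance only).
pearson = np.abs(np.corrcoef(np.column_stack([X, y]), rowvar=False)[-1, :-1])
print(X_var.shape, X_mi.shape, X_chi2.shape, X_f.shape, pearson.argsort()[::-1][:10])
```
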
Multivariate filter methods:
- mRMR (minimum redundancy maximum relevance). Benefits: captures the nonlinear relationship between features and the target variable and achieves low error rates. Limitations: selected features are pushed to be as mutually dissimilar as possible.
- Multivariate relative discriminative criterion. Benefits: best determines the contribution of individual features to the underlying dimensions. Limitations: not suitable for small sample sizes.

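mRMR is not part of scikit-learn, so below is a minimal greedy sketch under stated assumptions: mutual information for relevance, mean absolute Pearson correlation as a redundancy proxy, and an illustrative toy dataset; the paper's exact formulation may differ.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import mutual_info_classif

def mrmr_select(X, y, k=10):
    relevance = mutual_info_classif(X, y, random_state=0)  # relevance of each feature
    corr = np.abs(np.corrcoef(X, rowvar=False))            # pairwise redundancy proxy
    selected = [int(np.argmax(relevance))]                 # start with the most relevant
    while len(selected) < k:
        remaining = [f for f in range(X.shape[1]) if f not in selected]
        # mRMR score: relevance minus mean redundancy with the already-selected set.
        scores = [relevance[f] - corr[f, selected].mean() for f in remaining]
        selected.append(remaining[int(np.argmax(scores))])
    return selected

X, y = load_breast_cancer(return_X_y=True)
print(mrmr_select(X, y, k=5))
```
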
Linear multivariate wrapper methods:
- Recursive feature elimination. Benefits: keeps the high-quality top-N features by repeatedly removing the weakest ones. Limitations: computationally expensive, and correlation between features is not considered.
- Forward/backward stepwise selection. Benefits: computationally efficient greedy optimization. Limitations: it is sometimes impossible to find features with no correlation between them.
- Genetic algorithm. Benefits: accommodates data sets with a large number of features and requires no prior knowledge about the problem. Limitations: stochastic in nature and computationally expensive.

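A minimal sketch of the linear wrapper methods above with scikit-learn's RFE and SequentialFeatureSelector; the logistic-regression estimator, the scaling step, and the choice of 10 features are assumptions, and genetic-algorithm wrappers would require a separate library.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFE, SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)          # scale so coefficients are comparable
estimator = LogisticRegression(max_iter=1000)

# Recursive feature elimination: fit, drop the weakest features, refit, repeat.
rfe = RFE(estimator, n_features_to_select=10).fit(X, y)

# Forward stepwise selection: greedily add the feature that most improves CV score.
sfs = SequentialFeatureSelector(estimator, n_features_to_select=10,
                                direction="forward").fit(X, y)

print(rfe.get_support().nonzero()[0])
print(sfs.get_support().nonzero()[0])
```
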
Nonlinear multivariate wrapper methods:
- Nonlinear kernel multiplicative. Benefits: de-emphasizes the least useful features by multiplying them with a scaling factor. Limitations: complexity of kernel computation and multiplication.
- Relief. Benefits: well suited to binary classification, based on nearest-neighbor instance pairs, and noise-tolerant. Limitations: does not evaluate boundaries between redundant features and is not suitable for small training sets.

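The nonlinear kernel multiplicative method has no standard packaged implementation, so the sketch below illustrates only the Relief entry: a simplified Relief for binary classification with numeric features, where the min-max scaling, Manhattan distance, sampling count, and toy data are all assumptions.

```python
import numpy as np

def relief(X, y, n_iterations=200, random_state=0):
    """Basic Relief weighting for binary classification with numeric features."""
    rng = np.random.default_rng(random_state)
    # Min-max scale so per-feature differences are comparable.
    X = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0) + 1e-12)
    n_samples, n_features = X.shape
    weights = np.zeros(n_features)
    for i in rng.integers(0, n_samples, size=n_iterations):
        dist = np.abs(X - X[i]).sum(axis=1)            # Manhattan distance to all points
        dist[i] = np.inf                               # exclude the instance itself
        same = (y == y[i])
        hit = int(np.argmin(np.where(same, dist, np.inf)))    # nearest same-class point
        miss = int(np.argmin(np.where(~same, dist, np.inf)))  # nearest other-class point
        # Reward features that separate the classes, penalize those that do not.
        weights += (np.abs(X[i] - X[miss]) - np.abs(X[i] - X[hit])) / n_iterations
    return weights

# Toy usage: feature 0 carries the class signal, the rest are noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 8))
y = (X[:, 0] + 0.1 * rng.normal(size=300) > 0).astype(int)
print(np.argsort(relief(X, y))[::-1])   # features ranked by estimated relevance
```
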
Embedded methods:
- LASSO. Benefits: L1 regularization reduces overfitting and can be applied even when there are more features than samples. Limitations: selects features essentially at random when they are highly correlated.
- Ridge regression. Benefits: L2 regularization is preferred over L1 when features are highly correlated. Limitations: reducing the number of features is a challenge, since coefficients are shrunk but rarely set exactly to zero.
- Elastic net. Benefits: handles highly correlated features better than L1 or L2 alone, is flexible, and solves the optimization problem effectively. Limitations: high computational cost.
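A minimal sketch of the embedded methods with scikit-learn's SelectFromModel wrapped around L1, L2, and elastic-net regression models; the diabetes toy data, regularization strengths, and the median threshold for ridge are illustrative assumptions, not tuned choices.

```python
from sklearn.datasets import load_diabetes
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso, Ridge, ElasticNet
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True)
X = StandardScaler().fit_transform(X)

# LASSO (L1): drives some coefficients exactly to zero, selecting features directly.
lasso = SelectFromModel(Lasso(alpha=0.1)).fit(X, y)

# Ridge (L2): shrinks but rarely zeroes coefficients, so select by magnitude instead.
ridge = SelectFromModel(Ridge(alpha=1.0), threshold="median").fit(X, y)

# Elastic net: blends L1 and L2; behaves better when features are highly correlated.
enet = SelectFromModel(ElasticNet(alpha=0.1, l1_ratio=0.5)).fit(X, y)

for name, sel in [("lasso", lasso), ("ridge", ridge), ("elastic net", enet)]:
    print(name, sel.get_support().nonzero()[0])
```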