Table 2. Benefits and limitations of feature selection methods.
Category | Method | Benefits | Limitations
---|---|---|---
Univariate filter method | Information gain | Quantifies the relevance of an attribute or feature | Biased towards multi-valued attributes and prone to overfitting
 | Chi-square | Reduces training time and avoids overfitting | Highly sensitive to sample size
 | Fisher's score | Evaluates features individually to reduce the feature set | Does not handle feature redundancy
 | Pearson's correlation coefficient | Simple and fast; measures the linear correlation between features | Only sensitive to linear relationships
 | Variance threshold | Removes features with variance below a given cutoff | Does not consider the relationship with the target variable
Multi-variate filter method | mRMR (minimal redundancy maximum relevance) | Measures the nonlinear relationship between features and the target variable and achieves low error rates | Selected features may be mutually as dissimilar to each other as possible
 | Multi-variate relative discriminative criterion | Effectively determines the contribution of individual features to the underlying dimensions | Not suited to small sample sizes
Linear multi-variate wrapper method | Recursive feature elimination | Retains the high-quality top-N features and removes the weakest features | Computationally expensive; feature correlation is not considered
 | Forward/backward stepwise selection | Computationally efficient greedy optimization | Sometimes impossible to find features with no correlation between them
 | Genetic algorithm | Accommodates data sets with a large number of features and requires no problem-specific knowledge | Stochastic and computationally expensive
Nonlinear multi-variate wrapper methods | Nonlinear kernel multiplicative | De-emphasizes the least useful features by multiplying them with a scaling factor | Complexity of kernel computation and multiplication
 | Relief | Feasible for binary classification, based on nearest-neighbor instance pairs, and noise-tolerant | Does not discriminate between redundant features; not suitable for small training sets
Embedded methods | LASSO | L1 regularization reduces overfitting and can be applied even when there are more features than samples | Selects features arbitrarily when they are highly correlated
 | Ridge regression | L2 regularization is preferred over L1 when features are highly correlated | Does not shrink coefficients to zero, so feature reduction is a challenge
 | Elastic net | Handles highly correlated features better than L1 or L2 alone and is flexible in combining the two penalties | High computational cost
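
To make the families in Table 2 concrete, the sketch below shows how a filter, a wrapper, and an embedded method can be applied with scikit-learn. This is an illustrative example rather than part of the original study; the data set, variance threshold, regularization strength, and number of retained features are assumed purely for demonstration.

```python
# Illustrative sketch (assumed, not from the original study): filter, wrapper,
# and embedded feature selection on a small public data set with scikit-learn.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import (
    RFE, SelectFromModel, SelectKBest, VarianceThreshold, chi2,
)
from sklearn.linear_model import Lasso, LogisticRegression

X, y = load_breast_cancer(return_X_y=True)  # 30 non-negative features, binary target

# Univariate filter: drop near-constant features, then keep the top 10 by chi-square
# (chi-square requires non-negative inputs, which holds for this data set).
X_var = VarianceThreshold(threshold=1e-3).fit_transform(X)
X_chi2 = SelectKBest(chi2, k=10).fit_transform(X_var, y)

# Wrapper: recursive feature elimination around a linear classifier,
# repeatedly discarding the weakest feature until 10 remain.
rfe = RFE(LogisticRegression(max_iter=5000), n_features_to_select=10).fit(X, y)

# Embedded: L1 (LASSO) regularization drives weak coefficients to exactly zero;
# the surviving features form the selected subset (alpha chosen arbitrarily here).
lasso = SelectFromModel(Lasso(alpha=0.01)).fit(X, y)

print("chi2 kept:", X_chi2.shape[1],
      "| RFE kept:", int(rfe.support_.sum()),
      "| LASSO kept:", int(lasso.get_support().sum()))
```

The three approaches trade off the costs listed in the table: the filter steps are cheapest but ignore feature interactions, the wrapper refits the classifier many times, and the embedded LASSO performs selection as a side effect of fitting a single regularized model.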