Scientific Reports. 2026 Jan 8;16:3426. doi: 10.1038/s41598-025-33384-x

A block matrix incremental feature selection method based on fuzzy rough minimum classification error

Zhanwei Chen 1, Minggang Xing 2, Juan Li 1
PMCID: PMC12835165  PMID: 41507248

Abstract

In fuzzy rough set models, inner-product correlation serves as an effective evaluation function for feature selection, with its key advantage lying in its ability to characterize the minimum classification error inherent in the model. However, existing inner-product-based methods typically rely only on a subset of samples to approximate this error, making it difficult to fully and accurately capture the discriminative structure across the entire sample space. To address this limitation, this paper proposes a global-sample-oriented inner-product correlation criterion. By constructing a continuous and non-vanishing fuzzy membership structure over the entire universe of discourse, the proposed criterion significantly enhances the theoretical soundness and practical consistency of inner-product-based feature evaluation. Building upon this foundation and leveraging efficient matrix computation techniques, we design a static feature selection algorithm based on the Minimum Classification Error-based Feature Selection (MCEFS) criterion. Furthermore, to meet the demand for efficient updates in dynamic data environments, we develop a block-wise updating mechanism for the fuzzy decision and fuzzy relation matrices. We rigorously derive and prove a block-based incremental update strategy for the fuzzy lower approximation matrix, which effectively eliminates redundant recomputation of fuzzy lower approximations and substantially improves computational efficiency. Based on this strategy, we propose an incremental feature selection algorithm—Block Matrix-based MCEFS (BM-MCEFS). Finally, comprehensive comparative experiments on 12 public benchmark datasets validate the effectiveness and feasibility of the static MCEFS algorithm and clearly demonstrate the superior performance of BM-MCEFS in terms of computational efficiency and numerical stability.

Keywords: Fuzzy rough set, Feature selection, Block matrix, Incremental mechanisms, Fuzzy decision

Subject terms: Data mining, Computer science, Computational science

Introduction

In today’s data-driven world, high-dimensional datasets are becoming increasingly common, yet traditional learning algorithms often fail to process them effectively. As a result, feature selection methods, which can eliminate redundant information, have garnered widespread attention. By identifying a representative subset of features that retains the essential characteristics of the original data, feature selection simplifies the subsequent data analysis process. Currently, feature selection has become an important research topic in the fields of pattern recognition1,2 and machine learning3–5, and has been widely used in practice6–14. The fuzzy rough set theory proposed by Dubois and Prade15 is an important tool for feature selection using uncertainty in data16–18. In recent years, it has attracted the attention of many researchers19,20. Alnoor et al.21 proposed an application method based on Linear Diophantine Fuzzy Rough Sets and Multicriteria Decision-Making Methods, which can effectively identify oil transportation activities. Riaz et al.22 proposed linear Diophantine fuzzy sets (LDFS), which introduce reference parameters to constrain the membership and non-membership degrees, thereby demonstrating greater flexibility and robustness in multi-criteria decision-making. Yang et al.23 proposed a noise-aware fuzzy rough set method for feature selection. Ye et al.24 proposed a fuzzy rough set model for multi-attribute decision making for feature selection in multi-label learning. He et al.25 studied feature selection for incomplete decision information systems based on fuzzy rough sets. Zhang et al.26 redefined the fuzzy rough set model of fuzzy covering based on fuzzy rough set theory, providing a new direction of thought for the theory. Deng et al.27 conducted a theoretical analysis of fuzzy rough sets and proposed a feature selection algorithm that incorporates the distribution of labels. With the continuous deepening of this research, some researchers select features by constructing fuzzy rough set feature evaluation functions. Wang et al.28 introduced a distance measure into the fuzzy rough set and studied a calculation model for an iterative evaluation function based on variable distance parameters. Zhang et al.29 used information entropy to measure uncertainty for feature selection. Qian et al.30 proposed a label distribution feature selection algorithm based on mutual information. Qiu et al.31 studied a hierarchical feature selection method based on the Hausdorff distance. Sun et al.32 studied an online streaming feature evaluation function based on fuzzy rough sets for feature selection. In addition, An et al.33 proposed a feature selection method based on a rough relative fuzzy approximation of the maximum positive region by defining a relative fuzzy dependency function to evaluate the importance of features for decision-making. Liang et al.34 proposed a robust feature selection method based on the similarity of the kernel function and a relative classification uncertainty measure, using K nearest neighbors and Bayesian rules to generate the uncertainty measure. Zhang et al.35 enhanced the data-fitting ability of fuzzy rough set theory by adopting an adaptive learning mechanism, and thus proposed a feature selection algorithm based on adaptive relative fuzzy rough sets. Chen et al.36 conducted research on multi-source data and proposed an algorithm for fusion and feature selection that minimizes entropy to eliminate redundant features.

In general, the aforementioned approaches primarily construct dependency functions by extracting the maximum fuzzy membership degree of each sample with respect to the decision classes–i.e., preserving the maximal fuzzy positive region–to evaluate the importance of feature subsets. However, such methods utilize only the maximum membership value during data analysis and neglect the potentially valuable discriminative information embedded in the non-maximal membership degrees. To address this limitation, Wang et al.37 proposed a feature selection method based on the minimum classification error grounded in Bayesian decision theory, which posits that a smaller overlap between class-conditional probability density curves leads to a lower classification error. Their approach computes inner products between fuzzy membership degrees (excluding samples with zero membership values) to characterize this minimal classification error. Nevertheless, when inner-product correlations are computed solely on the basis of a subset of samples (i.e., partial samples), two critical issues arise. First, fuzzy membership functions derived from partial samples are often discontinuous or exhibit abrupt jumps over the global universe of discourse. This violates the continuity assumption that underlies the “minimal overlap implies minimal error” principle in Bayesian analysis, thus compromising its theoretical foundation and significantly degrading the reliability of the classification performance. Second, during feature evaluation, if inner-product correlations are computed from a subset of samples while the fuzzy positive region is estimated using the entire sample set, the two components rely on inconsistent sample bases. Specifically, under a given feature, the number of samples used to compute inner-product correlations may differ from that used to calculate the fuzzy positive region. In such cases, combining the inner-product correlation with the fuzzy positive region to compute the incremental dependency–and subsequently using this metric to assess the importance–lacks both reasonableness and consistency.

To more fully exploit the latent discriminative information inherent in the minimum classification error criterion and enhance the capability of inner-product correlation for feature selection, this paper makes the following contributions: Firstly, a non-zero fuzzy similarity relation function is constructed, and the decision information is fuzzified based on the class center sample strategy; Secondly, a continuous fuzzy membership degree curve is constructed on the universe of discourse based on the fuzzy membership function, and the degree of overlap between the fuzzy membership degree curves of different feature subsets is quantified using the inner product correlation, thereby enhancing the model’s screening ability for features. Finally, drawing on the matrix operation strategy, a matrix generation strategy for fuzzy membership degrees and inner product correlations is proposed to improve the computational efficiency of the algorithm. Based on the above research, this paper designs a feature selection algorithm based on the minimum classification error (Minimum Classification Error-based Feature Selection, MCEFS).

However, as data environments evolve, continuing to use static feature selection methods may result in a large number of redundant computations, thereby reducing the computational efficiency of the algorithm. Incremental feature selection methods, which leverage prior knowledge to select features from dynamically changing data, have thus garnered significant attention from researchers40–43. Sang et al.44 studied the incremental feature selection method for ordered data with dynamic interval values. Wang et al.45 proposed an incremental fuzzy tolerance rough set method for intuitionistic fuzzy information systems by updating the rough approximation of fuzzy tolerance. Zhang et al.46 proposed a novel incremental feature selection method using sample selection and accelerators based on the discriminative score feature selection framework. Yang et al.47 proposed an incremental feature selection method for interval-valued fuzzy decision information systems by studying two related incremental algorithms for sample insertion and deletion. Zhao et al.48 proposed a two-stage uncertainty measurement and designed an incremental feature selection algorithm capable of handling incomplete stream data. Xu et al.38 proposed a matrix-based incremental feature selection method based on weighted multi-granularity rough sets by minimizing the loss function to obtain the optimal weight vector, effectively improving the efficiency of feature selection algorithms. Zhao et al.39, based on fuzzy rough set theory, proposed a consistency principle to evaluate the significance of the feature, and combined with the principle of representative samples, designed three acceleration algorithms for incremental feature selection (Table 1 presents a detailed comparison).

Table 1.

Comparison of incremental feature selection methods.

Item | This study | Xu et al.38 | Zhao et al.39
Tool type | Fuzzy rough set | Multigranulation rough set | Fuzzy rough set
Relation type | Fuzzy relation | Crisp relation | Fuzzy relation
Feature selection mechanism | The minimum classification error | Optimal neighborhood; knowledge granularity weights | Inline graphic-level consistency region; Inline graphic-level membership degree; Inline graphic-level reduct
Evaluation method | Fuzzy dependency; inner product dependency | Conditional entropy; feature weights (positive correlation, negative correlation) | Consistency approximation importance measure; fuzzy dependency
Incremental mechanism | Block updating of the fuzzy relation matrix and fuzzy decision matrix; block updating of the fuzzy lower approximation matrix | Update of the neighborhood relation matrix; update of the decision matrix | Update of the positive and negative region vectors
Acceleration strategy | Check whether the new samples have any impact on the original acceleration method
Computational complexity | Inline graphic | Inline graphic | Inline graphic

In summary, most existing incremental methods primarily focus on updating strategies for sample relationships when samples change, while paying less attention to updating strategies for fuzzy decisions and the generation of fuzzy rough approximations. To reduce redundant calculations of fuzzy rough approximations and enhance the efficiency of incremental algorithms, this paper has carried out the following work: First, we designed fuzzy relation and fuzzy decision matrices based on a block matrix updating strategy. Second, through an in-depth analysis of the calculation method for fuzzy lower approximations, we developed a block-based updating method for fuzzy lower approximation matrices to reduce redundant calculations of fuzzy approximations. Finally, we proposed an incremental feature selection algorithm based on block matrices (BM-MCEFS) to improve the algorithm’s adaptability to dynamic data environments.

The paper is organized as follows. Section “Preliminaries” introduces the fundamental concepts of fuzzy rough sets and fuzzy decision systems. In Section “A fuzzy rough model based on minimum classification error”, the proposed fuzzy similarity relation is incorporated into fuzzy decision information, based on which an inner-product dependency function is constructed and a corresponding static feature-selection algorithm is designed by matrix operations. Section “Incremental method based on block matrix” presents an incremental feature-selection algorithm that realizes dynamic updates using block-matrix techniques. Experimental datasets and an in-depth analysis of the results are provided in Section “Experimental results and analysis”. Finally, Section “Conclusions” summarizes the paper, discusses its limitations, and outlines directions for future research.

Preliminaries

To facilitate understanding of the subsequent content of this paper, this section provides a brief review of concepts related to fuzzy rough sets that are pertinent to this study.

Definition 1

49 Let the triplet (U, A, D) represent a decision table, in which Inline graphic represents a non-empty finite domain of discourse, Inline graphic is a condition set, and Inline graphic is a decision set. If A is a mapping on U, Inline graphic, then A is called a fuzzy set on U, in which A(X) represents the degree of membership of X to A; the family of fuzzy sets on the domain U is denoted F(U), that is, Inline graphic.

Definition 2

50 Let U be the domain Inline graphic. If R represents the fuzzy similarity relation of any sample Inline graphic with respect to the conditional feature a, then it satisfies the following properties:

  1. Reflexivity: Inline graphic,

  2. Symmetry: Inline graphic.

If there exists a feature subset Inline graphic, then the fuzzy similarity relation on B is defined as Inline graphic, that is, Inline graphic. The fuzzy granularity of the sample Inline graphic.

Definition 3

51 Let (U, A, D) be a decision table, Inline graphic, Inline graphic. If Inline graphic is a fuzzy subset, then the fuzzy lower and upper approximations in B are respectively defined as follows:

$\underline{R_B}P(x) = \bigwedge_{u \in U} \big( (1 - R_B(x, u)) \vee P(u) \big)$  (1)
$\overline{R_B}P(x) = \bigvee_{u \in U} \big( R_B(x, u) \wedge P(u) \big)$  (2)

In which “Inline graphic” and “Inline graphic” denote the maximum and minimum operations, respectively. Inline graphic represents the degree of certainty that the sample Inline graphic belongs to P; Inline graphic represents the degree of possibility that the sample Inline graphic belongs to P; Inline graphic is then called a pair of fuzzy approximation operators of P.

Definition 4

52 Let (U, A, D) be a decision table, where Inline graphic, and let the decision D be divided into r crisp equivalence classes in the domain U, that is, Inline graphic. For any sample Inline graphic, if Inline graphic, then the fuzzy lower and upper approximations can be simplified as:

$\underline{R_B}D_j(x) = \bigwedge_{u \notin D_j} \big( 1 - R_B(x, u) \big)$  (3)
$\overline{R_B}D_j(x) = \bigvee_{u \in D_j} R_B(x, u)$  (4)

Definition 5

53 Let (U, A, D) be a decision table, Inline graphic, Inline graphic. For any Inline graphic, its degree of membership in the fuzzy positive region is defined as:

$POS_B(D)(x) = \bigvee_{j \leq r} \underline{R_B}D_j(x)$  (5)

The fuzzy rough dependency of the decision D on the feature subset B is defined as:

$\gamma_B(D) = \frac{|POS_B(D)|}{|U|} = \frac{\sum_{x \in U} POS_B(D)(x)}{|U|}$  (6)

In which Inline graphic represents the cardinality of the domain U. The fuzzy dependency function can be interpreted as the ratio of the cardinality of the fuzzy positive region to the total number of samples. In the theory of fuzzy rough sets, it is commonly used to evaluate the significance of a feature subset.
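As a computational illustration of Definitions 3-5, the following NumPy sketch evaluates the fuzzy lower and upper approximations, the positive-region membership, and the dependency on a small hand-made similarity matrix; the toy relation, the two crisp classes, and all variable names are illustrative assumptions rather than material from the original text.

import numpy as np

# Toy fuzzy similarity matrix R_B over U = {x1, x2, x3, x4} (reflexive and symmetric)
R = np.array([[1.00, 0.80, 0.30, 0.20],
              [0.80, 1.00, 0.40, 0.25],
              [0.30, 0.40, 1.00, 0.70],
              [0.20, 0.25, 0.70, 1.00]])
# Crisp decision classes D1 = {x1, x2}, D2 = {x3, x4} as 0/1 membership rows
D = np.array([[1, 1, 0, 0],
              [0, 0, 1, 1]], dtype=float)

# Definition 3: lower(x, j) = min_u max(1 - R(x, u), D_j(u)); upper(x, j) = max_u min(R(x, u), D_j(u))
lower = np.min(np.maximum(1.0 - R[:, :, None], D.T[None, :, :]), axis=1)
upper = np.max(np.minimum(R[:, :, None], D.T[None, :, :]), axis=1)

# Definition 5: positive-region membership (Equation 5) and fuzzy rough dependency (Equation 6)
pos = lower.max(axis=1)
gamma = pos.sum() / len(pos)

print(lower.round(3))
print(upper.round(3))
print(round(gamma, 3))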

From Equation 6, it is evident that the fuzzy rough dependency retains only the maximum membership of each sample with respect to the decisions; its analysis of the area of overlap between decisions, that is, of the minimum classification error, is insufficient. Figure 1 depicts the basic process of feature selection based on classical fuzzy rough sets.

Fig. 1. The basic concept diagram of fuzzy rough set.

A fuzzy rough model based on minimum classification error

In this section, a method for generating fuzzy lower approximation matrices by incorporating matrix operations is proposed. Furthermore, based on the minimum classification error criterion, a novel inner product relevance function is constructed, and a corresponding static feature selection algorithm is designed accordingly. The flow of the static algorithm is shown in Fig. 2.

Fig. 2. Static algorithm framework diagram.

Definition 6

Let (U, A, D) be a decision table. If Inline graphic, Inline graphic, then the fuzzy similarity relation of sample Inline graphic is defined as:

$R_B(x_i, x_j) = \prod_{a \in B} \frac{1}{1 + |a(x_i) - a(x_j)|}$  (7)

in which Inline graphic represents the value of sample Inline graphic under feature a. From Equation 7, it is apparent that Inline graphic satisfies reflexivity, symmetry, and Inline graphic. Alternatively, letting Inline graphic, the fuzzy similarity relation matrix can be expressed as Inline graphic, in which Inline graphic.

The advantage of Equation 7 lies first in its strictly positive nature, which ensures that the induced fuzzy lower approximation is continuous over the entire universe of discourse. Second, its wide mapping range (approximately (0,1]) enables relatively high sensitivity to differences in feature values. As evidenced by Table 3, which is derived from the decision table in Table 2, Equation 7 covers a wider range within (0, 1) than the similarity relation built with the Gaussian kernel. For the fuzzy similarities between sample Inline graphic and the remaining samples in Table 2, Equation 7 gives a spread of 0.586 (max 0.722 − min 0.136), whereas the Gaussian kernel yields only 0.451. Equation 7 also discriminates samples better than the Euclidean-distance-based similarity: samples Inline graphic and Inline graphic are distinct, yet their Euclidean similarities to Inline graphic are identical, while Equation 7 produces different values and thus distinguishes them effectively.

Table 3.

Comparison table of fuzzy similarity relations constructed based on different distance Functions.

Equation 7 | Gaussian kernel | Euclidean distance (each block is a 9 × 9 similarity matrix over the samples of Table 2; the three blocks appear side by side in each row)
1 0.625 0.722 0.411 0.282 0.21 0.142 0.136 0.111 1 0.968 0.985 0.861 0.799 0.695 0.523 0.517 0.439 1 0.797 0.85 0.646 0.599 0.54 0.467 0.465 0.438
0.625 1 0.508 0.495 0.311 0.226 0.15 0.147 0.118 0.968 1 0.93 0.872 0.794 0.716 0.56 0.532 0.468 0.797 1 0.724 0.656 0.596 0.55 0.481 0.471 0.448
0.722 0.508 1 0.37 0.316 0.236 0.159 0.153 0.124 0.985 0.93 1 0.848 0.828 0.712 0.54 0.532 0.455 0.85 0.724 1 0.636 0.619 0.548 0.474 0.471 0.444
0.411 0.495 0.37 1 0.449 0.419 0.274 0.26 0.213 0.861 0.872 0.848 1 0.884 0.878 0.752 0.758 0.645 0.646 0.656 0.636 1 0.668 0.662 0.57 0.573 0.516
0.282 0.311 0.316 0.449 1 0.612 0.421 0.405 0.311 0.799 0.794 0.828 0.884 1 0.968 0.871 0.857 0.811 0.599 0.596 0.619 0.668 1 0.797 0.655 0.642 0.607
0.21 0.226 0.236 0.419 0.612 1 0.582 0.56 0.426 0.695 0.716 0.712 0.878 0.968 1 0.957 0.945 0.904 0.54 0.55 0.548 0.662 0.797 1 0.771 0.749 0.69
0.142 0.15 0.159 0.274 0.421 0.582 1 0.656 0.674 0.523 0.56 0.54 0.752 0.871 0.957 1 0.969 0.972 0.467 0.481 0.474 0.57 0.655 0.771 1 0.799 0.808
0.136 0.147 0.153 0.26 0.405 0.56 0.656 1 0.574 0.517 0.532 0.532 0.758 0.857 0.945 0.969 1 0.962 0.465 0.471 0.471 0.573 0.642 0.749 0.799 1 0.782
0.111 0.118 0.124 0.213 0.311 0.426 0.674 0.574 1 0.439 0.468 0.455 0.645 0.811 0.904 0.972 0.962 1 0.438 0.448 0.444 0.516 0.607 0.69 0.808 0.782 1

Table 2.

Decision table.

U a1 a2 a3 a4 a5 D
x1 0.09 0.16 0.22 0.29 0.27 1
x2 0.26 0.27 0.16 0.15 0.29 1
x3 0.12 0.06 0.25 0.42 0.32 1
x4 0.34 0.34 0.66 0.24 0.36 2
x5 0.46 0.42 0.48 0.67 0.45 2
x6 0.58 0.56 0.64 0.61 0.49 2
x7 0.81 0.69 0.75 0.67 0.54 3
x8 0.61 0.76 0.88 0.71 0.55 3
x9 0.79 0.86 0.74 0.82 0.61 3
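The similarity values in Table 3 can be reproduced numerically. The sketch below uses three assumed closed forms, none of which is stated explicitly in the extracted text: the per-feature similarity 1/(1 + |a(x_i) − a(x_j)|) combined multiplicatively for the Equation 7 column (this form matches the reported values), a Gaussian kernel with σ = 1, and the reciprocal-distance similarity 1/(1 + Euclidean distance); all variable names are illustrative.

import numpy as np

# Samples x1-x9 of Table 2 (features a1-a5)
X = np.array([[0.09, 0.16, 0.22, 0.29, 0.27],
              [0.26, 0.27, 0.16, 0.15, 0.29],
              [0.12, 0.06, 0.25, 0.42, 0.32],
              [0.34, 0.34, 0.66, 0.24, 0.36],
              [0.46, 0.42, 0.48, 0.67, 0.45],
              [0.58, 0.56, 0.64, 0.61, 0.49],
              [0.81, 0.69, 0.75, 0.67, 0.54],
              [0.61, 0.76, 0.88, 0.71, 0.55],
              [0.79, 0.86, 0.74, 0.82, 0.61]])

diff = np.abs(X[:, None, :] - X[None, :, :])     # per-feature |a(x_i) - a(x_j)|
R_eq7 = np.prod(1.0 / (1.0 + diff), axis=2)      # assumed Equation 7 form (matches Table 3)

dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
R_gauss = np.exp(-dist ** 2 / 2.0)               # Gaussian-kernel similarity, sigma = 1 assumed
R_euclid = 1.0 / (1.0 + dist)                    # Euclidean-distance-based similarity

# The first rows reproduce the first rows of the three blocks of Table 3, e.g.
print(R_eq7[0].round(3))     # 1, 0.625, 0.722, 0.411, 0.282, 0.21, 0.142, 0.136, 0.111
print(R_gauss[0].round(3))
print(R_euclid[0].round(3))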

Definition 7

Let (U, A, D) be a decision table, Inline graphic, Inline graphic, Inline graphic, then the class center sample can be defined as:

[Equation 8]

by incorporating the class center samples, the corresponding fuzzy decision can be derived from U/D as follows:

[Equation 9]

in which Inline graphic represents the degree of fuzzy similarity between the sample x and the center sample of the class Inline graphic. For Inline graphic, Inline graphic, Inline graphic, Inline graphic. Let Inline graphic, Inline graphic, the fuzzy decision matrix can be expressed as Inline graphic, in which Inline graphic.
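Since the closed forms of Equations 8 and 9 are not reproduced here, the following sketch encodes one plausible reading that is consistent with Example 2 (each sample's fuzzy decision memberships sum to one): the class centre is taken as the sample with the largest total intra-class similarity, and the fuzzy decision is the similarity to each class centre normalized per sample. Both choices, and all names used, are assumptions.

import numpy as np

def class_centers(R, labels):
    # Assumed reading of Equation 8: the centre of a class is the sample with the
    # largest total intra-class similarity.
    centers = {}
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        intra = R[np.ix_(idx, idx)].sum(axis=1)
        centers[c] = idx[np.argmax(intra)]
    return centers

def fuzzy_decision(R, labels):
    # Assumed reading of Equation 9: similarity to each class centre, normalized so
    # that every sample's fuzzy decision memberships sum to one (as in Example 2).
    centers = class_centers(R, labels)
    classes = sorted(centers)
    S = np.stack([R[:, centers[c]] for c in classes], axis=1)
    return S / S.sum(axis=1, keepdims=True), centers

# Toy usage with a small reflexive, symmetric similarity matrix
R = np.array([[1.0, 0.7, 0.2, 0.1],
              [0.7, 1.0, 0.3, 0.2],
              [0.2, 0.3, 1.0, 0.8],
              [0.1, 0.2, 0.8, 1.0]])
labels = np.array([1, 1, 2, 2])
FD, centers = fuzzy_decision(R, labels)
print(centers)
print(FD.round(3))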

Definition 8

Let (U, A, D) be a decision table, Inline graphic, Inline graphic. Inline graphic is the fuzzy decision corresponding to Inline graphic; the fuzzy lower and upper approximations are defined as:

$\underline{R_B}FD_{D_j}(x) = \bigwedge_{u \in U} \big( (1 - R_B(x, u)) \vee FD_{D_j}(u) \big)$  (10)
$\overline{R_B}FD_{D_j}(x) = \bigvee_{u \in U} \big( R_B(x, u) \wedge FD_{D_j}(u) \big)$  (11)

From Equation 7, we can see Inline graphic and obtain Inline graphic. Since Inline graphic, Inline graphic, Inline graphic, that is, Inline graphic. It can be seen that in the domain U, Inline graphic generates an uninterrupted fuzzy lower approximation membership curve, as shown in Fig. 3.

Fig. 3. Fuzzy lower approximate membership curves of three decision classes on feature subset B.

According to Inline graphic and Inline graphic, combined with matrix operations, the fuzzy lower and upper approximation matrices can be computed as follows:

[Equation 12]
[Equation 13]

In which Inline graphic denotes the element-wise complement of Inline graphic (each entry replaced by one minus its value). Let “Inline graphic” denote the operation that, for each row of Inline graphic and the corresponding column of the matrix Inline graphic, first takes the element-wise maximum (union) and then takes the minimum (intersection) over all of these maxima; that is, “Inline graphic” is equivalent to “Inline graphic”. By the same argument, “Inline graphic” represents the operation “Inline graphic”. Here we can get Inline graphic.
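A compact NumPy rendering of the matrix construction just described: the lower-approximation matrix combines the element-wise complement of the relation matrix with the fuzzy decision matrix through a max-then-min composition, and the upper-approximation matrix uses the dual min-then-max composition. The function and variable names, and the toy data, are illustrative only.

import numpy as np

def lower_upper_matrices(M_R, M_FD):
    # Lower: for each row of the complement of M_R and each column of M_FD, take the
    # element-wise maximum and then the minimum over all samples (as described for Equation 12).
    # Upper: dual composition with minimum then maximum (as described for Equation 13).
    comp = 1.0 - M_R
    lower = np.min(np.maximum(comp[:, :, None], M_FD[None, :, :]), axis=1)
    upper = np.max(np.minimum(M_R[:, :, None], M_FD[None, :, :]), axis=1)
    return lower, upper

# Toy usage: 3 samples, 2 decision classes
M_R = np.array([[1.0, 0.8, 0.3],
                [0.8, 1.0, 0.4],
                [0.3, 0.4, 1.0]])
M_FD = np.array([[0.7, 0.3],
                 [0.6, 0.4],
                 [0.2, 0.8]])
low, upp = lower_upper_matrices(M_R, M_FD)
print(low.round(3))
print(upp.round(3))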

Property 1

Let Inline graphic, Inline graphic, let U/D denote the crisp partition of the decision over the domain, and let FD denote the fuzzy decision corresponding to U/D; then the following holds:

(1) If Inline graphic, then Inline graphicInline graphic, Inline graphicInline graphic,

(2) If Inline graphic, then Inline graphic, Inline graphic.

Proof

(1) If Inline graphic, then from Equation 3 we can obtain Inline graphic. FD represents the fuzzy decision corresponding to U/D, and from Equation 10 we can obtain Inline graphic. Therefore, we can get Inline graphic, thus Inline graphic. Quod erat demonstrandum.

In the same way, it can be demonstrated that Inline graphic

(2) From Inline graphic we have Inline graphic, that is, Inline graphic, and then we can obtain Inline graphic. From Equation 10 we have Inline graphic, that is, Inline graphic. Quod erat demonstrandum.

In the same way, it can be demonstrated that Inline graphic. Inline graphic

Clearly, from Property 1, Inline graphic satisfies monotonicity, and relative to Inline graphic, the fuzzy lower approximation Inline graphic is expanded.

Definition 9

Let (U, A, D) be a decision table, Inline graphic, Inline graphic; the fuzzy positive region and the fuzzy rough dependency are redefined, respectively, as:

$POS_B(D)(x) = \bigvee_{j \leq r} \underline{R_B}FD_{D_j}(x)$  (14)
$\gamma_B(D) = \frac{\sum_{x \in U} POS_B(D)(x)}{|U|}$  (15)

Clearly, as illustrated in Fig. 3, Inline graphic corresponds to the solid part of the curves, while the dotted part represents the overlapping area of the fuzzy lower approximation membership curves, that is, the minimum classification error.

Definition 10

Let (U, A, D) be a decision table, Inline graphic, Inline graphic, Inline graphic, and let FD be the fuzzy decision corresponding to U/D; the inner product relevance function is then defined as:

[Equation 16]

in which U is the domain and r is the number of decision categories. Since Inline graphic, Inline graphic can be obtained.

Next, a simple and intuitive example demonstrates the key role played by the inner product dependency function.

Example 1

Let Inline graphic, Inline graphic, Inline graphic. FD represents the fuzzy decision corresponding to U/D. If the lower approximation membership degrees of the sample to Inline graphic are Inline graphic = Inline graphic, Inline graphic = Inline graphic; Inline graphic = Inline graphic, Inline graphic = Inline graphic; Inline graphic = Inline graphic, Inline graphic = Inline graphic. Obviously, if only based on the maximal fuzzy positive region Inline graphic = Inline graphic and Inline graphic = Inline graphic, then according to the fuzzy dependency function Inline graphic = Inline graphic, Inline graphic can be obtained. At this point, it can be observed that the fuzzy dependency function only retains the maximum fuzzy approximation degree of each sample, and it is therefore impossible to distinguish the feature sets B and C. If the inner product dependency function is used, then we can obtain Inline graphic, and similarly Inline graphic. At this point, it can be observed that the classification capabilities of the feature sets B and C are obviously different.

At this point, it is obvious that the method based on inner product functions can effectively enhance the feature recognition ability.

Theorem 1

Let (U, A, D) be a decision table, Inline graphic, Inline graphic = Inline graphic. If Inline graphic = Inline graphic, then Inline graphic = Inline graphic, where Inline graphic is an element in the matrix Inline graphic.

Proof

From Definition 8, the transpose matrix of Inline graphic is Inline graphic . Let Inline graphic be an element in the matrix Inline graphic, Inline graphic, it is evident that Inline graphic, and when Inline graphic, Inline graphic can be obtained. Therefore, Inline graphic. Quod erat demonstrandum. Inline graphic

By Theorem 1, the calculation of the inner product dependent function can be transformed into a matrix operation.
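One way to realize this in code is to form the Gram matrix of the columns of the fuzzy lower approximation matrix; its off-diagonal entries are the pairwise inner products between the membership vectors of different decision classes. The aggregation into a single score (sum of off-diagonal entries divided by |U|) is an assumed reading of Equation 16, not a verbatim transcription, and the toy data are hypothetical.

import numpy as np

def inner_product_correlation(lower):
    # Gram matrix of the lower-approximation columns: entry (j, l) is the inner product
    # between the membership vectors of decision classes j and l. Aggregating the
    # off-diagonal entries and dividing by |U| is an assumption.
    G = lower.T @ lower
    overlap = (G.sum() - np.trace(G)) / lower.shape[0]
    return G, overlap

# Toy lower-approximation matrix: 4 samples, 3 decision classes
lower = np.array([[0.70, 0.10, 0.05],
                  [0.60, 0.20, 0.10],
                  [0.05, 0.65, 0.15],
                  [0.10, 0.55, 0.30]])
G, overlap = inner_product_correlation(lower)
print(G.round(3))
print(round(overlap, 3))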

Theorem 2

Let (U, A, D) be a decision table. If Inline graphic, then Inline graphic.

Proof

From Inline graphic, Inline graphic, we can get Inline graphic, Inline graphicInline graphic. Thus, for Inline graphic, Inline graphic, we can get Inline graphicInline graphic, Inline graphicInline graphic. Therefore, Inline graphicInline graphic, combined with Equation 16, Inline graphic can be obtained. Quod erat demonstrandum. Inline graphic

From Theorem 2, the inner product dependence function satisfies the monotonicity requirement on the domain U.

Theorem 3

Let (U, A, D) be a decision table, Inline graphic, Inline graphic. If Inline graphic, then Inline graphic.

Proof

For Inline graphic, since Inline graphic, Inline graphic, we can get Inline graphic, Inline graphic, Inline graphic. Consequently, Inline graphicInline graphic can be obtained, that is, Inline graphicInline graphic, From Equation 16, Inline graphic can be obtained. Therefore, for Inline graphic, Inline graphic, Inline graphic = Inline graphic if and only if Inline graphic = Inline graphic, and since Inline graphic = Inline graphic, Inline graphic = Inline graphic, we can get Inline graphic = Inline graphic, then Inline graphic = Inline graphic can be obtained, therefore, when Inline graphic = Inline graphic, Inline graphic = Inline graphic. Quod erat demonstrandum. Inline graphic

Remark 1

The converse of Theorem 3 does not necessarily hold. Suppose that when Inline graphic, Inline graphic can be obtained from the fuzzy rough dependency, that is,

[equation omitted]

clearly, for Inline graphic, Inline graphic can be obtained provided only that the following is satisfied:

[equation omitted]

however, it follows from Theorem 3 that when Inline graphic, Inline graphic. Therefore, when Inline graphic, Inline graphic does not necessarily hold.

It is evident from the above that the relevance of the inner product encompasses the classification information captured by the fuzzy rough dependency. However, the fuzzy rough dependency cannot fully represent the classification information conveyed by the inner product relevance function.

As shown in Fig. 4, the domain U illustrates the superimposed distributions of the fuzzy lower approximation membership curves for feature subsets H and F. When Inline graphic, the feature subsets H and F exhibit the same classification ability according to fuzzy rough dependency. However, according to Bayesian Decision Theory, a smaller overlap between membership curves corresponds to a lower classification error rate. Therefore, it can be observed from Fig. 4 that in the feature subset F, the overlap area of the membership curve of the fuzzy lower approximation Inline graphic, Inline graphic, Inline graphic is obviously larger than that of the membership curve of Inline graphic, Inline graphic, Inline graphic in the feature subset H. Thus, if Inline graphic, E is Inline graphic, P is Inline graphic, J is Inline graphic, then the inner product is Inline graphic. It follows that in the domain U, the relevance of the inner product can reflect the degree of fuzzy lower approximation overlap of different feature spaces, that is, the effect of minimum classification error on the classification of the data. Therefore, although the feature subsets H and F have the same fuzzy positive region, the feature subset H has a greater classification ability than the feature subset F according to Bayesian Decision Theory.

Fig. 4. Fuzzy lower approximation membership curves of feature subsets H and F. In the feature spaces H and F, if the analysis is based only on the maximum fuzzy positive region, the classification performance of the two spaces appears similar. However, when the minimum classification error is also considered, the distinction in classification capability between these two spaces becomes apparent.

Definition 11

Let (U, A, D) be a decision table, Inline graphic. If B satisfies the following conditions:

  1. Inline graphic,

  2. Inline graphic.

Then B is called a reduct of A, that is, B is a minimal feature subset with the same classification ability as A.

If Inline graphic, then the importance of the feature a to B is defined as:

[Equation 17]

The numerator Inline graphic represents the increase in fuzzy dependency induced by the addition of the feature a, while the denominator is defined as the square root of the function based on the inner-product Inline graphic. Clearly, the stronger the discriminative power of the feature a, the larger the resulting measure Inline graphic. Therefore, the importance measure proposed in this formulation jointly accounts for both the increase in fuzzy dependency and the minimization of the classification error (as reflected by the inner-product term). Consequently, Equation 17 provides a more accurate and comprehensive assessment of the classification capability of a candidate feature.
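The verbal description above translates directly into a small helper; evaluating the inner-product term on B ∪ {a}, and the example arguments shown, are assumptions, since the exact form of Equation 17 is not visible here.

def significance(gamma_B_a, gamma_B, ip_B_a, eps=1e-12):
    # Importance of feature a with respect to subset B, as described for Equation 17:
    # increase in fuzzy dependency divided by the square root of the inner-product term.
    # Evaluating the inner product on B union {a} is an assumption.
    return (gamma_B_a - gamma_B) / (ip_B_a ** 0.5 + eps)

print(round(significance(0.62, 0.55, 0.04), 3))   # illustrative numbers only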

Example 2

Let Table 2 be a decision table, in which Inline graphic, Inline graphic, Inline graphic Inline graphic, Inline graphic. Then the calculation example is as follows.

  1. From Definition 6, the fuzzy similarity relation matrix Inline graphic can be obtained as follows:
    [fuzzy similarity relation matrix omitted]
  2. For Inline graphic, the degree of intra-class similarity is: Inline graphic=1.422, Inline graphic=1.416, Inline graphic=1.265. Therefore, from Equation 8, we can get Inline graphic, and similarly Inline graphic, Inline graphic. From Equation 9, Inline graphic=0.702 can be obtained, and similarly, Inline graphic=0.198, Inline graphic=0.100. The fuzzy decision matrix Inline graphic is shown as follows:
    [fuzzy decision matrix omitted]
  3. From Equation 12 and Equation 13, the fuzzy lower approximation matrix Inline graphic and fuzzy upper approximation matrix Inline graphic can be obtained as:
    [fuzzy lower and upper approximation matrices omitted]
    The fuzzy positive region and the inner product matrix are:

    Inline graphic Inline graphic ,

    Inline graphic Inline graphic. Thus, the inner product Inline graphic=4.384 can be obtained.

Based on the above analysis, this paper designs a feature selection algorithm based on the minimum classification error (MCEFS). The pseudo-code of the algorithm is shown in Algorithm 1.

Algorithm 1. MCEFS

In Algorithm 1, the parameter Inline graphic is used to terminate the main loop. In fact, the optimal value of Inline graphic varies for different datasets. Assuming the sample size, the number of features, and the number of decision classes are n, m, and c respectively, the first step initializes the feature selection conditions; the third step constructs the fuzzy similarity relation, with a computational complexity of Inline graphic; the fourth step processes the fuzzy decision, with a computational complexity of Inline graphic; the fifth step calculates the fuzzy lower approximation, with a computational complexity of Inline graphic; based on this, the sixth step calculates the inner product correlation under the candidate features according to the fuzzy lower approximation values, with a computational complexity of Inline graphic; the eighth step calculates the fuzzy rough dependency degrees and combines the increment of the fuzzy rough dependency with the inner product correlation to evaluate the dependency increment brought by the new feature; the ninth step selects the feature with the highest dependency increment from the candidate features and adds it to the feature subset. The eleventh step judges the dependency increment brought by the new feature: when the difference between the values after and before the addition is greater than the preset parameter Inline graphic, the algorithm continues to loop from step 1 to step 9; otherwise, the algorithm terminates and outputs the final feature selection result. Therefore, in the process of finding the optimal feature subset, the evaluation may need to be performed Inline graphic times. Thus, the total complexity of Algorithm 1 is Inline graphic.
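Putting the pieces together, the sketch below mirrors the greedy forward-selection loop of Algorithm 1. It relies on the assumed closed forms introduced earlier (Equation 7 similarity, centre-based fuzzy decision, Gram-matrix inner product), so it should be read as an illustrative reconstruction rather than the authors' exact implementation.

import numpy as np

def fuzzy_similarity(X, feats):
    # Assumed Equation 7 form: product over the chosen features of 1/(1 + |a(x_i) - a(x_j)|).
    d = np.abs(X[:, None, feats] - X[None, :, feats])
    return np.prod(1.0 / (1.0 + d), axis=2)

def fuzzy_decision(R, y):
    # Assumed Equations 8-9: class centre = sample with the largest intra-class similarity,
    # fuzzy decision = similarity to the centres, normalized per sample.
    classes = np.unique(y)
    centers = [np.where(y == c)[0][np.argmax(R[np.ix_(y == c, y == c)].sum(axis=1))]
               for c in classes]
    S = R[:, centers]
    return S / S.sum(axis=1, keepdims=True)

def lower_approx(R, FD):
    # Definition 8: min over u of max(1 - R(x, u), FD_j(u)).
    return np.min(np.maximum(1.0 - R[:, :, None], FD[None, :, :]), axis=1)

def evaluate(X, y, feats):
    R = fuzzy_similarity(X, feats)
    L = lower_approx(R, fuzzy_decision(R, y))
    gamma = L.max(axis=1).mean()                    # fuzzy rough dependency
    G = L.T @ L
    ip = (G.sum() - np.trace(G)) / len(y)           # assumed inner-product correlation
    return gamma, ip

def mcefs(X, y, delta=0.0):
    # Greedy forward selection mirroring Algorithm 1: pick the feature with the highest
    # importance (assumed Equation 17 form) until the dependency increment falls to delta.
    selected, remaining, gamma_prev = [], list(range(X.shape[1])), 0.0
    while remaining:
        scores = {}
        for a in remaining:
            gamma, ip = evaluate(X, y, selected + [a])
            scores[a] = (gamma - gamma_prev) / (np.sqrt(ip) + 1e-12)
        best = max(scores, key=scores.get)
        gamma_best, _ = evaluate(X, y, selected + [best])
        if gamma_best - gamma_prev <= delta:
            break
        selected.append(best)
        remaining.remove(best)
        gamma_prev = gamma_best
    return selected

rng = np.random.default_rng(0)
X, y = rng.random((60, 8)), rng.integers(1, 4, size=60)
print(mcefs(X, y, delta=0.0))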

Example 3

Let Table 2 be a decision table (U, A, D); an example of the feature selection process is as follows: Let Inline graphic, Inline graphic. After the first traversal, for Inline graphic, Inline graphic is calculated as follows: Inline graphic=1.115, Inline graphic=1.104, Inline graphic=1.079, Inline graphic=1.067, Inline graphic=1.063; we can get Inline graphic. After the second traversal, Inline graphic=0.069, Inline graphic=0.174, Inline graphic=0.168, Inline graphic=0.025; we can get Inline graphic. After the third traversal, Inline graphic=0.155, Inline graphic=0.120, Inline graphic=0.094; we can get Inline graphic. After the fourth traversal, Inline graphic=0.139, Inline graphic=0.106; we can get Inline graphic. After the fifth traversal, Inline graphic, the traversal is terminated. Thus, Inline graphic.

This section first introduced a method for generating fuzzy rough approximation matrices with the aid of matrix operations. Subsequently, an inner product relevance function was constructed on the domain and its properties were analyzed. Finally, a static feature selection algorithm that takes the minimum classification error into account was proposed.

Incremental method based on block matrix

To effectively address the challenges of dynamic data environments, this section proposes an incremental feature selection method based on a block matrix framework, developed through an in-depth analysis of fuzzy rough sets. By constructing a block matrix model, the proposed method enables efficient handling of dynamic data and facilitates rapid updating of the results of feature selection. The framework of the incremental method is shown in Fig. 5.

Fig. 5. Dynamic algorithm framework diagram.

Theorem 4

Let (U, A, D) be a decision table, Inline graphic, Inline graphic, let Inline graphic denote the new samples, and let Inline graphic be the domain after the samples are added. If the fuzzy similarity relation matrix Inline graphic is updated to Inline graphic, then the update proceeds as follows:

[Equation 18]

Proof

According to the idea of a block matrix, the matrix Inline graphic after increasing the sample can be regarded as a matrix composed of four block matrices, that is, Inline graphic. When Inline graphic, the block matrix Inline graphic represents the fuzzy similarity relation matrix before the sample update; the block matrix Inline graphic represents when Inline graphic, Inline graphic, Inline graphic; the block matrix Inline graphic represents when Inline graphic, Inline graphic, Inline graphic; the block matrix Inline graphic represents when Inline graphic, Inline graphic, Inline graphic. Based on the above analysis, quod erat demonstrandum. Inline graphic

Theorem 5

Let (U, A, D) be a decision table, Inline graphic, Inline graphic, Inline graphic, Inline graphic, let Inline graphic be the center sample of class Inline graphic, Inline graphic, let FD be the fuzzy decision corresponding to U/D, and let Inline graphic. If the fuzzy decision matrix Inline graphic is updated to Inline graphic, then the update proceeds as follows:

[Equation 19]

Proof

Let Inline graphic, Inline graphic; when Inline graphic, the class center sample Inline graphic remains unchanged. In this case, the fuzzy decision matrix Inline graphic can be regarded as a matrix composed of two block matrices, namely Inline graphic. When Inline graphic, the block matrix Inline graphic, that is, the original fuzzy decision matrix remains unchanged; when Inline graphic, the block matrix Inline graphic, that is, only the fuzzy decisions of the newly added samples need to be calculated. However, when Inline graphic, the center sample of the class Inline graphic changes, and now Inline graphic; that is, the fuzzy decision matrix must be recalculated. Quod erat demonstrandum.

From the above theorem, it can be observed that when the number of samples increases, the update of the fuzzy decision matrix can be described by the block matrix.

Theorem 6

Let Inline graphic, Inline graphic, FD be the fuzzy decision corresponding to U/D, for Inline graphic, the following equation holds.

[Equation 20]

Proof

[derivation omitted]

If Inline graphic, Inline graphic, Inline graphic, then from Equation 20, the fuzzy approximation of Inline graphic is equivalent to the union of fuzzy approximations of X and Y.

According to the analysis above, the matrix Inline graphic can be obtained from the block matrix Inline graphic of the matrix Inline graphic and the block matrix Inline graphic of the matrix Inline graphic; the matrix Inline graphic is the fuzzy lower approximation obtained from the fuzzy relation of the new samples and the fuzzy decision of the new samples, and it has the same form as Inline graphic.

Theorem 7

Let (U, A, D) be a decision table, Inline graphic, Inline graphic, let Inline graphic be the center sample of class Inline graphic, Inline graphic, Inline graphic. If Inline graphic and the fuzzy lower approximation matrix Inline graphic is updated to Inline graphic, then the update proceeds as follows:

[Equation 21]

Proof

Suppose that the class center samples remain unchanged; then the fuzzy lower approximation matrix Inline graphic after the samples have been added can be regarded as a matrix composed of two block matrices, that is, Inline graphic. Therefore, when Inline graphic, if Inline graphic, then Inline graphic can be obtained by Theorem 6. If Inline graphic, Inline graphic can be obtained, that is, Inline graphic. When Inline graphic, Inline graphic. Quod erat demonstrandum. Inline graphic

Clearly, from Theorem 7, when the number of samples increases, if Inline graphic remains unchanged, the incremental method of the block matrix can effectively improve the update efficiency of the fuzzy lower approximation matrix. Again, it must be noted that the precondition of the incremental method is that Inline graphic remains unchanged.
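The block-update logic of Theorems 4-7 can be exercised with the following sketch. It assumes the Equation 7 similarity form and the centre-based fuzzy decision used in the earlier sketches, and it checks the incremental result against a full recomputation; the check is valid only under the stated precondition that the class-centre samples do not change, and all names and toy data are hypothetical.

import numpy as np

def sim(A, B, feats):
    # Assumed Equation 7 similarity: product over features of 1/(1 + |difference|).
    d = np.abs(A[:, None, feats] - B[None, :, feats])
    return np.prod(1.0 / (1.0 + d), axis=2)

def lower_approx(R, FD):
    return np.min(np.maximum(1.0 - R[:, :, None], FD[None, :, :]), axis=1)

def incremental_lower(R_old, FD_old, L_old, X_old, X_new, centers, feats):
    # Theorem 4: only the two new blocks of the relation matrix are computed.
    R_12, R_22 = sim(X_old, X_new, feats), sim(X_new, X_new, feats)
    R_new = np.block([[R_old, R_12], [R_12.T, R_22]])
    # Theorem 5: only the fuzzy decisions of the new samples are computed
    # (normalized similarity to the unchanged class centres -- an assumed form).
    S = sim(X_new, np.vstack([X_old, X_new])[centers], feats)
    FD_delta = S / S.sum(axis=1, keepdims=True)
    FD_new = np.vstack([FD_old, FD_delta])
    # Theorems 6-7: old rows are only tightened by the new columns; new rows are computed fresh.
    comp, n_old = 1.0 - R_new, len(X_old)
    extra = np.min(np.maximum(comp[:n_old, n_old:, None], FD_delta[None, :, :]), axis=1)
    L_top = np.minimum(L_old, extra)
    L_bottom = np.min(np.maximum(comp[n_old:, :, None], FD_new[None, :, :]), axis=1)
    return R_new, FD_new, np.vstack([L_top, L_bottom])

# Consistency check against a full recomputation (class centres assumed unchanged)
rng = np.random.default_rng(1)
X_old, X_new = rng.random((8, 4)), rng.random((3, 4))
feats, centers = [0, 1, 2, 3], [0, 4]               # hypothetical centre indices
R_old = sim(X_old, X_old, feats)
S_old = sim(X_old, X_old[centers], feats)
FD_old = S_old / S_old.sum(axis=1, keepdims=True)
L_old = lower_approx(R_old, FD_old)
R_new, FD_new, L_new = incremental_lower(R_old, FD_old, L_old, X_old, X_new, centers, feats)
X_all = np.vstack([X_old, X_new])
S_all = sim(X_all, X_all[centers], feats)
print(np.allclose(L_new, lower_approx(sim(X_all, X_all, feats), S_all / S_all.sum(axis=1, keepdims=True))))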

Example 4

Let Table 4 be a decision table with newly added samples, in which Inline graphic, Inline graphic. If Inline graphic remains unchanged, the calculation proceeds as follows:

  1. The fuzzy similarity matrix Inline graphic and the fuzzy decision matrix Inline graphic are updated as follows:
    [updated fuzzy similarity relation matrix and fuzzy decision matrix omitted]
  2. Next, the fuzzy lower approximation matrix Inline graphic is updated. Firstly, Inline graphic is calculated:
    [intermediate matrix omitted]
  3. Combining the matrix Inline graphic with the block matrix Inline graphic, the following block matrix can be obtained from Equation 20:

    Inline graphic .

    Thus, the updated fuzzy lower approximation matrix Inline graphic is:
    [updated fuzzy lower approximation matrix omitted]

Table 4.

Incremental sample decision table.

U′ a1 a2 a3 a4 a5 D
x1 0.09 0.16 0.22 0.29 0.27 1
x2 0.26 0.27 0.16 0.15 0.29 1
x3 0.12 0.06 0.25 0.42 0.32 1
x4 0.34 0.34 0.66 0.24 0.36 2
x5 0.46 0.42 0.48 0.67 0.45 2
x6 0.58 0.56 0.64 0.61 0.49 2
x7 0.81 0.69 0.75 0.67 0.54 3
x8 0.61 0.76 0.88 0.71 0.55 3
x9 0.79 0.86 0.74 0.82 0.61 3
x10 0.04 0.11 0.18 0.21 0.26 1
x11 0.39 0.41 0.55 0.72 0.60 2

In light of the example above, it is clear that the incremental update of the fuzzy lower approximation matrix can be realized by means of block matrices. Accordingly, this paper proposes an incremental feature selection algorithm based on block matrices, Algorithm 2 (BM-MCEFS).

Algorithm 2. BM-MCEFS

In Algorithm 2, the computational complexity of updating the fuzzy relation matrix in Step 4 is Inline graphic, that of updating the fuzzy decision matrix in Step 6 is Inline graphic, and the complexity of computing the fuzzy lower approximation matrix in Step 7 is Inline graphic. The calculation of relevance based on the inner-product in Step 12 has complexity Inline graphic. Since finding the optimal feature subset may require Inline graphic evaluations, the overall complexity of Algorithm 2 is Inline graphic.

It is self-evident that, when the class centers remain stable, it is only necessary to calculate the fuzzy relationships and fuzzy decisions of the newly added samples. This can effectively avoid a large number of redundant calculations and enable the algorithm to achieve optimal performance. Conversely, if the sample centers of the class change frequently, it can be seen from Equation 9 that the fuzzy decisions will also change accordingly, and in this case the algorithm will not be able to achieve the best performance.

Experimental results and analysis

The proposed algorithms are compared against five representative methods: the acceleration algorithm based on fuzzy rough set information entropy46 (AFFS), the heuristic algorithm with variable distance parameters based on distance measures28 (FRDM), the fuzzy rough set feature selection method based on relative distance33 (MPRB), the incremental feature selection method with fuzzy rough sets for dynamic data sets54 (IRS), and the incremental feature selection method based on fuzzy rough sets for hierarchical classification55 (ASIRA).

The experimental environment is configured as follows: an Acer A850 equipped with a 12th-generation Intel Core i7-12700 CPU (2.10 GHz) running Windows 11. All algorithms are implemented in Python. The evaluation criteria include four main aspects: algorithm robustness, computation time, reduct size (i.e., number of selected features), and classification accuracy. To reduce experimental error, each data set is tested 10 times and the final results are reported as average values.

A total of 12 datasets, obtained from the UCI Machine Learning Repository, are used in the experiments and summarized in Table 5. Two widely-used classifiers, namely Support Vector Machine (SVM) and K-Nearest Neighbors (KNN), are employed to evaluate the performance of the feature selection algorithms. Before the experiments, all datasets (see Table 5) were normalized using the following preprocessing formula:

[Equation 22]

In which Inline graphic represents the normalized value of sample x under feature a.

Table 5.

Description of experimental data.

NO Data set Abbreviation Sample Features Classes
1 MAGIC Gamma Telescope MGT 19020 10 2
2 Room Occupancy Estimation ROE 10129 18 4
3 Shill Bidding SHB 6321 9 2
4 Wine Quality WQ 4897 11 3
5 Iranian Churn IC 3150 13 2
6 Hepatitis C Virus for Egyptian patients HCE 1385 28 4
7 Turkish Music Emotion TME 400 50 4
8 TUANDROMD TR 364 241 2
9 Semeion Handwritten Digit SHD 284 256 2
10 Glass GL 214 10 7
11 Connectionist Bench CB 208 60 2
12 Iris IR 150 4 7

Markelle Kelly, Rachel Longjohn, Kolby Nottingham, The UCI Machine Learning Repository, https://archive.ics.uci.edu

Inner product relevance analysis

To analyze the relationship among inner product relevance (IPR), fuzzy rough dependency (FRD), and classification accuracy of the MCEFS algorithm, this paper randomly generates 12 feature subsets from the CB and TME datasets. The IPR, FRD and classification accuracy of each feature subset are calculated using a ten-fold cross-validation on KNN (K=3) and SVM. The results are presented in Tables 6 and 7.

Table 6.

IPR, FRD, and classification accuracy for the CB dataset.

NO Feature Subset FRD IPR SVM KNN
1 21, 17, 34, 43, 60 0.4545 0.1882 0.6431 0.7337
2 26, 9, 45, 7, 30 0.4546 0.1965 0.6672 0.6674
3 28, 33, 24, 56, 58 0.4560 0.1943 0.6044 0.5937
4 36, 39, 35, 25, 10 0.4571 0.1912 0.7201 0.7203
5 40, 38, 21, 14, 54 0.4619 0.1908 0.5943 0.6718
6 14, 7, 55, 20, 32 0.4618 0.1878 0.6382 0.7000
7 18, 41, 35, 16, 2 0.4672 0.2147 0.6189 0.5891
8 55, 19, 14, 22, 13 0.4672 0.1961 0.6675 0.6334
9 44, 11, 30, 21, 18 0.4686 0.1860 0.7932 0.8121
10 58, 42, 38, 42, 30 0.4714 0.2168 0.5704 0.5936
11 31, 23, 18, 41, 35 0.4753 0.1903 0.7046 0.7202
12 20, 31, 36, 44, 16 0.4826 0.1872 0.7487 0.7443

Table 7.

IPR, FRD, and classification accuracy for the TME dataset.

NO Feature Subset FRD IPR SVM KNN
1 10, 18, 11, 13, 50 0.2478 0.0563 0.4401 0.4147
2 36, 28, 1, 32, 16 0.2649 0.0580 0.4983 0.4571
3 29, 4, 44, 38, 11 0.2681 0.0574 0.4668 0.4024
4 2, 5, 1, 49, 4 0.2699 0.0561 0.5234 0.4922
5 33, 17, 9, 24, 50 0.2721 0.0554 0.4884 0.4219
6 8, 37, 30, 35, 44 0.2733 0.0561 0.4672 0.3173
7 17, 6, 44, 39, 23 0.2763 0.0557 0.5184 0.4503
8 18, 5, 24, 38, 43 0.2776 0.0558 0.4900 0.4821
9 49, 2, 45, 39, 26 0.2781 0.0551 0.6613 0.6264
10 25, 19, 48, 47, 33 0.2822 0.0563 0.4751 0.4573
11 36, 13, 43, 33, 14 0.2903 0.0567 0.4572 0.3590
12 44, 43, 24, 35, 46 0.3082 0.0563 0.6364 0.5484

From Tables 6 and 7, it can be observed that the minimum classification error (IPR of the inner product function) is related to the classification ability of the feature subsets. For example, in the CB dataset, the 9th group of feature subsets has the lowest IPR value and achieves the highest classification accuracy on both the KNN and SVM classifiers; conversely, the 10th group has a higher IPR value and correspondingly lower classification accuracy on both classifiers. In addition, the fifth and sixth groups have nearly identical FRD values; however, the sixth group has a smaller IPR and higher classification accuracy, indicating that when FRD remains constant, the minimum classification error can identify features that are more valuable for classification performance. Nevertheless, this conclusion does not hold universally. For example, in the TME dataset, although the fifth group has a relatively low IPR value, its classification accuracy is also low on both classifiers. This discrepancy arises because the minimum classification error captures only the information related to misclassification within the feature subset, which represents just one aspect of the overall classification capability. Therefore, relying solely on the minimum classification error cannot comprehensively reflect the classification performance of a feature subset.

Additionally, in this study, the threshold Inline graphic was set to 0. Using MCEFS, a series of feature subsets were successfully generated on the four datasets of TME, CB, WQ and SHB. In the SVM and KNN classifiers, based on the feature importance determined by the MCEFS algorithm, features were gradually added from high to low, and at the same time, features were also gradually added according to the unsorted feature sequence in the original data set (RAW) as a control group. The classification accuracy curves that changed with the increase in the number of features were plotted. The experimental results are illustrated in Figs. 6 and 7. The experimental results show that the MCEFS algorithm can significantly improve the classification accuracy of the data. For example, in Fig. 6, when the MCEFS algorithm obtains the highest classification accuracy on the CB dataset, its performance exceeds the highest classification accuracy of RAW; in the SHB dataset, the highest classification accuracy obtained by the MCEFS algorithm on both classifiers exceeds the classification accuracy of RAW. Therefore, it can be concluded that the MCEFS algorithm improves the accuracy of data classification.

Fig. 6. Classification accuracy curve with the increase of features (SVM).

Fig. 7. Classification accuracy curve with the increase of features (KNN).

To verify the sensitivity of the MCEFS algorithm to the parameter Inline graphic, this study conducted experiments on four randomly selected datasets (CB, HCE, TME, and GL) and validated the results of the selection of features using two classifiers, SVM and 3NN. During the threshold selection process, each dataset was tested multiple times to determine the fluctuation range of its threshold. Subsequently, the classifiers were employed to test the accuracy of the feature selection results, thereby identifying the threshold sensitivity interval for each dataset. Finally, within this sensitivity interval, a uniform step size was set to adjust the threshold and test the changes in reduction length, classification accuracy, and stability. The experimental results are shown in Figs. 8 and 9.

Fig. 8. Parameter Inline graphic sensitivity analysis under the KNN classifier.

Fig. 9. Parameter Inline graphic sensitivity analysis under the SVM classifier.

As can be seen in Figs. 8 and 9, as the parameter Inline graphic increases, the feature subset selected by the MCEFS algorithm gradually decreases in size, and its classification accuracy on the classifiers also exhibits certain fluctuations. However, within a specific range of values Inline graphic, these fluctuations are effectively controlled at a low level. Taking the Glass data set as an example (see Figs. 8 and 9), when Inline graphic ranges from 0.002 to 0.012, the classification accuracy remains relatively stable despite continuous changes in the parameter. This result indicates that the MCEFS algorithm demonstrates strong robustness during parameter adjustment.

Comparison of static feature selection algorithms

To evaluate the feature selection efficiency of the MCEFS algorithm, this paper uses the KNN and SVM classifiers to assess the optimal classification accuracy achieved by the four compared algorithms on 12 datasets. Based on these results, the optimal feature subsets and their corresponding running times are determined, as summarized in Table 8. Furthermore, the classification accuracy results for the 12 datasets, based on the selected optimal feature subsets, are presented in Fig. 10. In these results, the underlined values indicate the highest classification accuracy achieved for each dataset.

Table 8.

The reduct size and running time of the four algorithms (number/time).

Data set RAW FRDM AFFS MPRB MCEFS
MGT 10 6.5/2441.73 3.7/4148.52 5.6/2704.46 5.2/2862.28
ROE 18 4.8/1512.25 8.8/4545.23 4.1/1155.61 9.3/2408.32
SHB 9 3.4/94.45 5.3/275.62 4.0/188.60 1.0/48.92
WQ 11 5.1/271.88 5.8/938.26 4.7/312.07 5.6/357.77
IC 13 1.0/82.25 8.4/461.91 9.1/192.03 6.2/231.63
HCE 28 14.3/143.75 18.4/358.61 12.5/128.68 6.9/98.03
TME 50 34.2/136.51 29.5/70.67 29.5/70.6765 25.1/149.22
TR 241 10/13.08 124/2362.56 1.9/1.86 64.0/2072.05
SHD 256 6/14.96 28/386.49 7.2/55.85 8.5/21.46
GL 10 6.3/1.29 6.7/3.42 5.0/2.65 6.1/1.468
CB 60 26.5/21.21 37.4/29.58 24.5/18.78 26.1/32.41
IR 4 2.5/0.2329 2.7/0.36 3/2.88 2.3/0.26
Average 21.3 10.05/835.60 23.22/1934.44 9.26/858.51 13.86/1241.76

Fig. 10. Classification-accuracy heatmap from feature-selected data.

The results of the experiment in Table 8 demonstrate that all four algorithms successfully achieved dimensionality reduction. Among them, the MCEFS algorithm generally selected fewer features than the other methods on most datasets. However, in the ROE dataset, the number of features selected by MCEFS exceeded that of the other three algorithms.

By comparing the results in Table 8 and Fig. 10, it is evident that the MCEFS algorithm consistently outperformed the others in terms of classification accuracy with both the KNN and SVM classifiers. Specifically, on the ROE dataset, MCEFS selected only about 9 of the original 18 features, still achieving a substantial reduction in dimensionality.

In terms of computational time, the MCEFS algorithm performs comparably to the other algorithms. However, for datasets with a large number of features (such as the CB dataset), the runtime of the MCEFS algorithm is relatively longer. This is mainly because the comparison algorithms only calculate the lower approximations of the samples, while MCEFS additionally computes the inner product correlation between the samples, thus increasing the computational time. It should be noted that MCEFS achieved higher classification accuracy on most datasets. For example, on the SHB dataset, the MCEFS algorithm achieved higher classification accuracy even when selecting the smallest feature subset; on the IR dataset, the MCEFS algorithm significantly outperformed the other algorithms in terms of accuracy. These results indicate that the method based on the minimum classification error can effectively compensate for some of the shortcomings of fuzzy rough dependency and improve the classification accuracy of the data.

To assess the robustness of the MCEFS algorithm, 10% and 20% label noise was introduced into five selected datasets. Feature selection was then performed and the optimal classification accuracy was recorded. The experimental results are presented in Tables 9 and 10. The results show that, compared to the other algorithms, MCEFS consistently achieves higher classification accuracy under both noise levels. This performance advantage is primarily attributed to the incorporation of the minimum classification error criterion into the MCEFS framework.

Table 9.

Classification accuracy of noise data at 10% noise level.

Data set Classifier RAW Noised data AFFS FRDM MPRB MCEFS
HCE SVM 67.87 45.59 45.59 45.53 47.75 48.35
KNN 63.03 45.16 45.16 44.44 44.45 45.09
TME SVM 75.62 75.10 62.06 67.09 68.18 69.81
KNN 68.33 60.26 49.22 57.78 52.97 59.06
GL SVM 97.19 85.87 85.87 85.87 78.79 87.25
KNN 94.83 85.41 85.41 84.48 92.12 86.32
CB SVM 85.00 77.33 68.64 78.26 83.36 80.67
KNN 84.52 79.64 58.38 83.02 82.36 82.57
IR SVM 95.90 93.33 76.62 92.67 91.61 92.67
KNN 95.24 92.00 70.52 92.00 87.68 94.00
Average 82.75 73.97 64.75 73.11 72.94 74.58

Table 10.

Classification accuracy of noise data at 20% noise level.

Data set Classifier RAW Noised data AFFS FRDM MPRB MCEFS
HCE SVM 67.87 43.87 45.38 44.09 46.06 47.26
KNN 63.03 45.02 45.38 44.44 47.63 45.23
TME SVM 75.62 75.87 71.84 66.10 54.55 75.62
KNN 68.33 59.03 61.03 57.01 47.45 62.56
GL SVM 97.19 88.64 88.64 89.61 64.70 91.00
KNN 94.83 86.82 86.82 85.84 77.05 87.27
CB SVM 85.00 82.55 70.62 79.17 70.45 80.64
KNN 84.52 81.52 66.17 82.57 63.73 81.57
IR SVM 95.24 91.90 74.43 91.24 78.21 91.24
KNN 95.90 89.90 71.05 89.90 67.14 91.29
Average 82.75 74.51 68.14 73.00 61.10 75.37

To further analyze and compare the statistical performance of the four algorithms, the Friedman test56 and the Nemenyi post-hoc test are applied based on the classification accuracy results presented in Fig. 10. These two statistical methods are defined as follows.

$$\chi_F^2=\frac{12n}{k(k+1)}\left[\sum_{i=1}^{k}R_i^2-\frac{k(k+1)^2}{4}\right] \tag{23}$$

$$F_F=\frac{(n-1)\chi_F^2}{n(k-1)-\chi_F^2} \tag{24}$$

where $n$ and $k$ denote the number of datasets and the number of algorithms, respectively, and $R_i$ denotes the average rank of the $i$-th algorithm across all datasets.

$$CD=q_\alpha\sqrt{\frac{k(k+1)}{6n}} \tag{25}$$

where $\alpha$ denotes the significance level and $q_\alpha$ is the corresponding critical value.

If the null hypothesis based on the Friedman test is rejected, the Nemenyi post-hoc test is then employed to assess the significance of pairwise differences between algorithms. Specifically, if the difference in average rankings between two algorithms exceeds the critical distance (CD), their performance is considered significantly different. In the corresponding diagram, significant differences are indicated by the absence of connecting lines between algorithms, whereas non-significant differences are shown by horizontal lines connecting them.

Here, $n=12$, $k=4$, and at $\alpha=0.05$ the critical value56 is $q_\alpha=2.569$. The Friedman statistics computed for SVM and KNN are 7.815 and 3.86, respectively, both greater than the critical value of 2.569. The null hypothesis that the four algorithms perform equivalently is therefore rejected for both classifiers, indicating significant differences among the algorithms. According to Equation 25, CD = 1.35, and the CD diagrams of the four feature selection algorithms under SVM and KNN are shown in Fig. 11. As can be seen from Fig. 11, the Nemenyi test indicates that, under both the SVM and KNN classifiers, the MCEFS algorithm is significantly superior to the other algorithms, highlighting its competitiveness.
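
The calculation can be reproduced with a short NumPy sketch of Eqs. (23)-(25); the average ranks below are illustrative rather than those derived from Fig. 10, but with k = 4, n = 12, and q_0.05 = 2.569 the sketch recovers CD = 1.35.

```python
# Minimal sketch of Eqs. (23)-(25): Friedman statistic, its F-distributed
# variant, and the Nemenyi critical distance (ranks are illustrative only).
import numpy as np

def friedman_cd(avg_ranks, n, q_alpha):
    """avg_ranks: average rank of each of the k algorithms over n datasets."""
    k = len(avg_ranks)
    chi2_f = 12 * n / (k * (k + 1)) * (np.sum(np.square(avg_ranks)) - k * (k + 1) ** 2 / 4)  # Eq. (23)
    f_f = (n - 1) * chi2_f / (n * (k - 1) - chi2_f)                                          # Eq. (24)
    cd = q_alpha * np.sqrt(k * (k + 1) / (6 * n))                                            # Eq. (25)
    return chi2_f, f_f, cd

# k = 4 algorithms over n = 12 datasets; q_0.05 = 2.569 from Demsar's table (ref. 56)
avg_ranks = np.array([1.6, 2.9, 2.7, 2.8])     # illustrative average ranks, not the paper's
chi2_f, f_f, cd = friedman_cd(avg_ranks, n=12, q_alpha=2.569)
print(round(cd, 2))                            # 1.35, matching the CD used in Fig. 11
```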

Fig. 11.

The Nemenyi test results of classification accuracies of the four algorithms.

Comparison of feature selection algorithms under incremental sample addition

To test the efficiency of the incremental feature selection algorithms as samples are added, each dataset is randomly split into two halves: 50% of the samples serve as the original data and the remaining 50% as the pool of samples to be added. In each round, 10% of the pool is randomly added, and this is repeated five times until 50% of the pool has been added cumulatively. For each algorithm, the feature subset corresponding to the optimal classification accuracy is selected and the time consumed in selecting it is recorded, so that the computational cost of the different algorithms can be compared. The running times of the algorithms on the 12 datasets are shown in Fig. 12.
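
The sketch below illustrates this protocol under the stated 50/50 split with five 10% increments; `select_features` is a hypothetical placeholder for any of the compared algorithms, and only the sampling and timing pattern is meant to be representative.

```python
# Sketch of the incremental protocol: 50/50 split, five 10% increments of the
# held-out pool, with the selection time measured in each round.
import time
import numpy as np

def select_features(X, y):
    """Hypothetical stand-in for any of the compared selection algorithms."""
    return list(range(min(5, X.shape[1])))

rng = np.random.default_rng(0)
X = rng.random((200, 20))
y = rng.integers(0, 3, 200)

idx = rng.permutation(len(X))
base, pool = idx[:len(X) // 2], idx[len(X) // 2:]   # 50/50 split of the samples
step = len(pool) // 10                              # each increment is 10% of the pool
current = base
for r in range(1, 6):                               # five increments, 50% of the pool in total
    current = np.concatenate([current, pool[(r - 1) * step: r * step]])
    t0 = time.perf_counter()
    subset = select_features(X[current], y[current])
    elapsed = time.perf_counter() - t0
    print(f"round {r}: {len(current)} samples, {len(subset)} features, {elapsed:.4f} s")
```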

Fig. 12.

Running time of the compared algorithms as samples are incrementally added in given proportions.

In each subplot of Fig. 12, the abscissa denotes the proportion of added samples and the ordinate denotes the cumulative computation time of the six algorithms. Across the 12 datasets, the computation time of every algorithm increases as the proportion of added samples grows, but the BM-MCEFS algorithm consistently requires the least time. In addition, ten-fold cross-validation is used to examine how the classification accuracy under the KNN and SVM classifiers changes after samples are added in different proportions to the 12 datasets; the results are shown in Tables 11 and 12, where 'RAW' denotes the classification accuracy obtained with all features. Since BM-MCEFS is the incremental counterpart of the MCEFS algorithm, the two yield nearly identical classification accuracy when the same samples are added. Therefore, in the incremental experiments, BM-MCEFS is compared with the AFFS, FRDM, MPRB, IRS, and ASIRA algorithms.

Table 11.

Classification accuracy as samples are incrementally added in given proportions (SVM).

Data set Algorithm 10% 20% 30% 40% 50% Average
IR BM-MCEFS Inline graphic 97.09±8.90 Inline graphic 96.26±14.07 Inline graphic Inline graphic
IRS 97.32±13.24 Inline graphic 95.34±12.37 96.25±14.31 96.60±14.23 96.53±12.75
ASIRA 97.77±13.57 97.06±8.78 95.59±10.99 96.21±14.12 96.32±13.89 96.59±12.44
AFFS 96.67±10.20 91.19±8.56 90.68±14.93 95.69±20.59 89.90±17.79 92.83±14.41
FRDM 97.78±8.90 97.09±8.90 95.68±11.87 Inline graphic 91.52±8.02 95.67±10.35
MPRB 97.75±13.33 97.00±8.00 95.68±11.87 96.26±14.05 96.52±13.07 96.64±12.25
RAW 96.67±14.23 91.18±9.37 90.68±11.87 95.26±14.07 86.67±8.90 92.09±11.69
CB BM-MCEFS Inline graphic Inline graphic 66.80±35.53 74.74±40.71 Inline graphic Inline graphic
IRS 64.45±38.12 68.44±41.45 67.78±34.67 74.87±40.37 71.14±30.32 69.34±37.21
ASIRA 64.54±40.10 68.89±41.43 68.43±36.87 74.75±39.43 71.12±30.06 69.55±37.79
AFFS 64.10±6.49 68.95±5.73 66.69±5.00 62.87±31.72 57.64±22.89 64.05±14.37
FRDM 58.01±33.85 61.71±49.62 64.45±43.14 Inline graphic 67.26±31.61 65.32±37.64
MPRB 58.72±44.57 60.43±38.73 Inline graphic 73.01±33.87 66.31±37.07 65.74±37.21
RAW 63.91±40.26 59.67±50.11 60.18±38.16 66.58±34.98 69.71±34.73 64.01±39.65
GL BM-MCEFS Inline graphic Inline graphic 78.14±17.09 Inline graphic 79.89±15.17 Inline graphic
IRS 87.91±16.34 87.15±22.56 82.34±18.11 78.01±17.21 79.76±15.42 83.03±18.10
ASIRA 88.88±19.23 86.87±23.81 79.66±15.21 77.49±18.53 Inline graphic 82.56±18.69
AFFS 88.13±20.99 86.54±23.24 76.41±14.11 77.82±20.28 78.01±12.57 81.38±18.24
FRDM 88.13±15.85 73.38±15.64 73.46±9.99 73.11±18.10 79.44±15.13 77.50±14.94
MPRB 71.43±13.26 73.38±15.64 Inline graphic 72.08±25.71 77.97±13.59 77.02±17.40
RAW 88.13±20.99 86.54±23.24 76.41±14.11 77.82±20.28 78.94±13.49 81.57±18.42
TME BM-MCEFS 73.97±19.74 Inline graphic Inline graphic Inline graphic 70.08±9.78 Inline graphic
IRS 73.84±28.34 76.26±14.87 75.06±6.88 77.16±5.98 70.08±9.78 74.48±15.51
ASIRA Inline graphic 76.44±12.87 74.76±8.59 77.27±8.24 70.12±9.67 74.52±12.69
AFFS 68.44±18.68 75.17±15.85 72.30±13.38 77.06±14.11 Inline graphic 73.71±15.00
FRDM 61.34±16.72 66.16±8.30 67.61±15.56 66.19±11.88 65.84±14.73 65.43±13.44
MPRB 69.73±17.73 67.65±8.13 72.60±11.98 66.46±11.41 73.35±12.27 69.96±12.69
RAW 66.88±11.97 68.40±13.67 74.37±15.23 68.46±12.93 69.10±13.13 69.44±13.39
HCE BM-MCEFS Inline graphic 49.36±7.85 Inline graphic Inline graphic Inline graphic Inline graphic
IRS 46.54 ±4.92 49.32±7.69 47.16±11.72 46.95±4.26 44.53±8.74 46.90±7.94
ASIRA 47.03 ±5.47 Inline graphic 47.22±10.15 46.23±7.34 44.26±8.48 46.83±8.03
AFFS 44.72±7.85 44.46±8.17 43.41±6.61 44.34±6.44 44.66±6.00 44.32±7.01
FRDM 44.42±7.30 44.09±7.68 44.00±5.95 45.44±8.99 43.86±6.22 44.36±7.23
MPRB 46.45±7.74 43.81±10.52 42.90±6.95 44.11±6.68 43.51±7.74 44.16±8.04
RAW 44.72±7.85 42.15±5.79 42.73±5.77 44.34±6.44 44.66±6.01 43.72±6.37
IC BM-MCEFS 88.65±3.54 89.11±3.68 Inline graphic Inline graphic Inline graphic Inline graphic
IRS 87.34±2.51 89.02±3.05 89.19±4.29 89.05±4.22 89.31±2.37 88.78±3.39
ASIRA Inline graphic 88.75±5.39 89.21±8.21 88.42±4.77 88.67±3.72 88.75±5.42
AFFS 88.61±3.42 89.12±3.68 89.40±3.35 89.35±3.15 89.25±2.30 89.16±3.12
FRDM 88.45±3.52 89.10±3.68 89.40±3.35 89.39±3.15 84.28±0.34 88.12±2.81
MPRB 88.65±3.62 Inline graphic 89.40±3.35 89.39±3.15 89.15±2.41 89.14±3.27
RAW 88.35±3.62 89.10±3.68 89.40±3.35 89.33±3.15 79.55±2.32 87.15±3.22
WQ BM-MCEFS 49.14±6.44 Inline graphic 50.54±6.48 Inline graphic 49.41±8.42 Inline graphic
IRS 49.84±5.34 49.23±5.44 51.01±6.51 51.46±7.11 49.41±8.42 50.19±6.66
ASIRA 49.91±8.72 49.02±6.19 50.51±6.32 50.88±6.45 49.87±8.54 50.04±7.33
AFFS 45.20±3.52 44.93±3.31 46.31±2.91 47.63±4.60 45.03±0.21 45.82±2.91
FRDM Inline graphic 49.62±4.83 Inline graphic 51.86±7.26 45.07±1.03 49.32±4.91
MPRB 49.20±6.33 48.70±6.16 50.19±6.52 50.99±7.29 Inline graphic 50.05±6.78
RAW 49.05±6.35 49.32±6.07 50.54±6.48 51.08±7.48 45.03±7.68 49.00±6.81
SHB BM-MCEFS Inline graphic 98.32±1.17 Inline graphic 98.15±1.06 Inline graphic Inline graphic
IRS 98.21±1.44 97.54±1.43 97.65±1.21 Inline graphic 97.69±0.68 97.86±1.23
ASIRA 97.87±2.13 Inline graphic 98.21±0.94 98.11±1.21 97.87±1.87 98.10±1.78
AFFS 98.32±1.39 92.56±2.20 93.48±1.21 93.41±1.77 98.04±0.94 95.16±1.50
FRDM 98.34±1.30 97.65±0.91 97.60±0.80 97.52±0.80 98.04±0.91 97.83±0.94
MPRB 97.68±1.15 97.65±0.91 97.60±0.80 97.52±0.80 98.01±0.85 97.69±0.91
RAW 98.32±1.39 98.32±1.17 98.20±0.99 92.09±0.79 88.04±0.91 94.99±1.05
ROE BM-MCEFS Inline graphic Inline graphic 98.98±3.36 Inline graphic 98.60±2.82 Inline graphic
IRS 98.15±3.48 99.13±2.58 Inline graphic 98.37±3.52 98.33±2.48 98.60±15.39
ASIRA 98.997±2.76 99.05±2.51 98.77±2.71 98.05±2.79 Inline graphic 98.71±2.57
AFFS 97.31±7.00 97.77±5.83 97.53±4.95 95.78±5.46 89.47±8.20 95.57±6.29
FRDM 86.23±7.29 88.16±6.21 97.14±6.92 89.68±9.42 91.19±7.92 90.48±7.55
MPRB 86.23±7.29 88.16±6.21 96.96±7.63 90.20±8.49 91.19±7.89 90.55±7.54
RAW 99.04±3.29 99.17±2.83 98.95±3.35 98.39±3.21 88.58±2.82 96.83±3.10
MGT BM-MCEFS Inline graphic Inline graphic 86.43±1.11 81.56±0.99 79.16±1.00 Inline graphic
IRS 87.66±1.07 89.16±0.75 86.49±1.21 81.56±0.99 79.19±1.02 84.81±1.02
ASIRA 87.54±1.28 88.68±1.22 86.51±2.15 Inline graphic 79.22±1.31 84.71±1.49
AFFS 87.66±1.03 89.23±0.76 85.29±0.78 81.53±0.99 78.59±1.51 84.46±1.01
FRDM 87.61±0.98 89.21±0.85 Inline graphic 81.56±0.99 79.06±1.10 84.78±1.04
MPRB 86.41±0.59 89.21±0.85 85.41±1.17 81.54±1.24 Inline graphic 84.64±1.03
RAW 87.66±1.03 89.21±0.88 86.42±1.10 81.55±1.05 79.16±0.90 84.80±0.99
TR BM-MCEFS Inline graphic Inline graphic 99.58±2.50 Inline graphic 97.86±5.71 Inline graphic
IRS 99.32±3.12 99.38±3.18 Inline graphic 99.01±2.51 97.64±4.87 98.99±3.37
ASIRA 99.16±2.67 99.51±2.54 99.52±2.31 99.10±2.71 Inline graphic 99.03±3.48
AFFS 97.87±3.11 97.12±4.23 98.55±2.61 97.46±4.58 91.03±11.21 96.41±6.02
FRDM 99.00±4.00 99.09±5.45 98.75±3.82 98.46±3.77 91.57±3.50 97.37±4.17
MPRB 99.00±4.00 99.09±5.45 98.75±3.82 98.46±3.77 96.57±3.50 98.37±4.17
RAW 96.50±3.00 96.55±2.73 96.58±2.50 96.23±3.08 96.29±2.86 96.43±2.84
SHD BM-MCEFS Inline graphic 93.68±16.07 Inline graphic 91.34±0.48 Inline graphic Inline graphic
IRS 93.12±11.76 93.54±15.44 91.54±15.42 91.34±0.48 97.33±14.87 93.37±12.93
ASIRA 92.87±13.91 Inline graphic 91.09±17.65 91.34±0.48 97.39±16.21 93.28±14.60
AFFS 90.31±12.03 88.75±23.71 88.00±34.97 90.42±11.25 93.29±16.97 90.15±21.65
FRDM 91.52±23.07 88.00±34.97 88.00±34.97 94.21±19.57 93.34±16.33 91.01±26.94
MPRB 91.93±22.91 88.40±35.24 76.64±29.68 Inline graphic 91.17±15.73 88.47±25.60
RAW 91.08±15.19 93.68±16.07 91.67±19.47 92.31±18.43 93.29±16.97 92.41±17.30

Table 12.

Classification accuracy as samples are incrementally added in given proportions (KNN).

Data set Algorithm 10% 20% 30% 40% 50% Average
IR BM-MCEFS Inline graphic 96.09±13.11 Inline graphic Inline graphic 96.52±13.07 Inline graphic
IRS 97.45±8.32 95.43±12.67 96.14±2.31 94.52±12.11 94.28±9.87 95.56±9.79
ASIRA 97.22±6.73 Inline graphic 96.47±2.31 94.52±14.27 Inline graphic 96.26±10.78
AFFS 97.68±8.90 71.18±19.12 74.85±19.48 75.44±12.81 78.48±13.42 79.53±14.75
FRDM 97.58±8.90 96.09±13.11 94.85±11.71 94.67±15.43 95.86±14.28 95.81±12.69
MPRB 96.67±14.23 95.09±13.29 94.85±11.71 94.67±15.43 94.72±12.13 95.20±13.43
RAW 97.58±8.89 95.09±13.29 94.85±11.71 94.73±13.78 95.19±12.74 95.49±12.08
CB BM-MCEFS 65.58±28.18 Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic
IRS Inline graphic 74.32±19.87 66.45±30.13 67.76±32.54 61.47±25.81 67.22±26.90
ASIRA 65.08±26.41 75.04±22.12 66.41±28.12 68.09±28.91 61.41±26.23 67.21±26.46
AFFS 57.88±29.77 58.43±29.54 60.70±13.19 56.37±22.96 51.74±31.04 57.02±25.30
FRDM 64.87±36.52 70.57±37.17 65.33±31.29 64.33±35.70 61.50±32.39 65.32±34.61
MPRB 63.33±20.26 73.19±33.10 62.35±34.55 64.94±46.51 60.02±32.89 64.77±34.48
RAW 61.67±21.59 70.48±35.34 62.39±29.37 66.46±32.39 60.00±28.85 64.20±29.51
GL BM-MCEFS Inline graphic Inline graphic 88.53±16.24 Inline graphic Inline graphic Inline graphic
IRS 91.09±17.62 90.34±18.23 Inline graphic 86.54±19.67 88.54±20.21 89.11±18.74
ASIRA 90.78±15.76 90.78±17.131 88.67±16.45 86.92±21.72 88.72±20.59 89.17±18.48
AFFS 86.70±22.53 87.08±17.05 85.62±18.65 86.11±14.29 85.50±21.60 86.20±18.82
FRDM 89.62±13.26 90.25±15.73 85.03±17.20 84.55±13.44 85.97±19.24 87.08±15.77
MPRB 90.32±13.30 90.25±15.73 83.14±24.82 Inline graphic 87.58±26.68 88.62±23.82
RAW 86.70±22.53 87.08±17.05 85.62±18.65 86.11±14.29 85.97±19.24 86.30±18.35
TME BM-MCEFS Inline graphic Inline graphic 61.22±14.76 62.52±15.71 Inline graphic Inline graphic
IRS 65.45±21.64 67.23±12.12 63.34±12.14 64.32±15.19 58.11±15.88 63.69±15.78
ASIRA 65.77±22.12 67.55±11.05 62.65±13.81 63.83±11.09 58.07±14.82 63.57±15.13
AFFS 65.42±20.17 61.03±21.19 64.16±17.92 Inline graphic 57.51±11.01 62.58±16.12
FRDM 65.61±21.55 59.66±12.70 Inline graphic 61.23±13.37 55.31±12.80 61.22±14.76
MPRB 59.22±17.27 60.75±10.43 61.02±11.41 59.74±13.62 58.79±12.47 60.30±13.25
RAW 63.84±16.97 64.38±16.27 64.16±16.23 62.11±15.07 52.05±11.62 61.31±15.23
HCE BM-MCEFS 47.56±7.28 Inline graphic Inline graphic 45.04±7.17 43.07±5.20 Inline graphic
IRS Inline graphic 46.23±8.17 48.54±10.11 45.26±7.62 43.59±6.35 46.37±8.23
ASIRA 47.32±7.41 46.37±8.63 48.68±10.43 45.18±7.04 43.73±5.86 46.26±8.03
AFFS 45.74±7.54 45.39±7.19 45.19±6.13 44.81±7.47 Inline graphic 45.13±7.03
FRDM 45.13±7.65 44.19±8.03 44.35±9.04 Inline graphic 44.29±5.94 44.66±7.11
MPRB 45.13±7.84 42.81±9.03 43.09±10.23 45.12±7.99 45.33±8.50 44.30±8.76
RAW 45.74±7.54 43.27±8.16 43.59±8.31 44.81±7.47 41.52±6.78 43.79±7.65
IC BM-MCEFS 94.97±2.76 Inline graphic 95.29±2.67 Inline graphic 95.05±2.92 Inline graphic
IRS Inline graphic 95.22±4.22 95.24±2.87 94.88±4.27 94.37±2.45 95.11±3.53
ASIRA 94.69±2.81 95.31±3.26 95.06±2.45 94.92±2.53 95.01±2.81 95.00±2.79
AFFS Inline graphic 95.29±3.21 95.29±2.84 93.92±3.20 95.01±2.92 94.92±2.99
FRDM 94.81±3.08 95.20±3.10 Inline graphic 94.98±2.72 84.28±0.33 92.94±2.39
MPRB 94.76±3.22 95.13±3.03 95.45±2.73 94.98±2.72 Inline graphic 95.12±2.86
RAW 94.92±2.83 95.33±3.01 95.37±2.70 94.87±3.00 82.24±2.90 92.55±2.89
WQ BM-MCEFS Inline graphic 47.06±7.17 Inline graphic 47.58±8.17 45.77±7.84 Inline graphic
IRS 45.19±5.32 Inline graphic 47.05±5.32 47.64±8.47 45.87±7.56 46.59±7.05
ASIRA 45.29±5.72 47.01±7.57 47.01±4.92 47.97±9.48 45.93±8.63 46.64±7.46
AFFS 44.95±4.85 47.14±9.98 45.77±5.51 47.50±7.15 42.81±6.21 45.63±6.74
FRDM 44.68±5.16 46.92±5.27 47.25±5.79 Inline graphic 44.52±6.67 46.40±6.56
MPRB 45.23±5.29 45.91±5.01 46.09±5.42 48.17±8.92 Inline graphic 46.32±7.36
RAW 45.31±3.08 46.29±4.74 47.10±5.46 47.46±11.59 48.85±10.82 47.00±7.14
SHB BM-MCEFS Inline graphic Inline graphic Inline graphic Inline graphic 98.86±0.91 Inline graphic
IRS 99.08±1.21 99.12±0.38 99.21±1.01 98.87±0.69 Inline graphic 99.04±0.89
ASIRA 99.10±1.11 99.07±1.02 99.17±0.97 99.04±0.59 98.84±0.88 98.52±3.02
AFFS 99.01±1.05 92.54±2.40 92.46±2.31 92.43±2.51 97.04±1.10 94.70±1.87
FRDM 98.34±1.47 97.34±1.44 97.60±0.80 97.52±0.80 97.83±1.10 97.73±1.12
MPRB 97.68±1.15 97.34±1.44 97.60±0.80 97.52±0.80 97.97±0.93 97.62±1.05
RAW 98.11±1.05 99.07±1.01 99.14±0.88 98.13±1.10 88.92±1.03 96.67±1.01
ROE BM-MCEFS Inline graphic Inline graphic 98.46±3.22 Inline graphic Inline graphic Inline graphic
IRS 99.19±2.21 99.38±1.54 98.23±2.76 98.21±3.48 97.65±5.43 98.53±3.36
ASIRA 99.16±2.24 99.31±1.41 Inline graphic 98.09±3.56 97.59±3.98 98.52±3.02
AFFS 90.17±26.88 93.73±17.57 92.48±17.49 94.97±13.75 89.22±11.20 92.11±17.38
FRDM 93.60±6.71 94.50±4.46 96.20±7.87 95.25±9.17 97.47±5.80 95.44±6.21
MPRB 93.60±6.71 94.50±4.46 97.16±5.55 97.19±6.12 97.45±5.75 95.98±5.77
RAW 98.87±3.17 99.05±2.62 98.38±4.04 98.15±3.70 87.99±4.20 96.07±3.51
MGT BM-MCEFS Inline graphic Inline graphic 89.18±1.21 Inline graphic 83.89±1.61 Inline graphic
IRS 89.87±2.56 91.37±1.67 Inline graphic 85.19±1.37 83.89±1.61 87.91±1.84
ASIRA 90.13±2.76 91.29±1.76 89.24±1.65 85.17±1.12 83.77±1.25 87.92±1.80
AFFS 90.16±1.38 91.21±1.23 86.77±1.15 85.22±1.26 80.49±1.62 86.77±1.33
FRDM 90.10±1.33 91.38±1.12 89.14±1.37 85.23±1.23 Inline graphic 87.96±1.45
MPRB 89.24±1.36 91.38±1.12 89.14±1.37 85.42±1.58 82.41±1.27 87.52±1.35
RAW 90.16±1.38 91.21±1.23 88.89±1.46 85.17±1.25 81.77±1.61 87.44±1.39
TR BM-MCEFS Inline graphic Inline graphic 99.89±0.12 Inline graphic 95.34±7.85 Inline graphic
IRS 99.47±3.34 98.84±3.47 99.86±0.17 99.58±2.78 95.19±6.45 98.59±3.81
ASIRA 99.42±2.89 99.02±3.51 Inline graphic 99.45±2.17 95.31±7.34 98.62±3.98
AFFS 96.12±2.74 98.73±3.71 92.64±6.91 92.11±3.27 88.89±5.93 93.70±4.79
FRDM 99.00±4.00 99.09±5.45 90.38±7.48 88.82±2.12 79.62±1.98 91.38±4.69
MPRB 99.00±4.00 99.09±5.45 90.38±7.48 88.82±2.12 86.62±1.98 92.78±4.69
RAW 99.50±3.00 99.09±3.64 99.17±3.33 99.23±3.08 Inline graphic 98.66±3.19
SHD BM-MCEFS Inline graphic 84.18±0.56 Inline graphic 85.34±0.48 Inline graphic Inline graphic
IRS 90.29±11.26 Inline graphic 84.66±0.38 85.32±0.45 88.07±11.76 86.51±7.29
ASIRA 90.35±12.02 84.18±0.56 84.74±0.72 85.31±0.34 88.11±11.26 86.54±7.38
AFFS 84.21±1.82 81.45±3.37 82.78±8.67 Inline graphic 87.62±4.77 84.65±5.03
FRDM 83.53±0.61 83.78±2.58 84.40±2.47 84.97±2.38 85.50±2.28 84.44±2.19
MPRB 83.53±0.61 82.58±9.74 84.79±0.52 84.97±2.38 86.93±9.00 84.56±6.04
RAW 80.13±17.52 81.34±9.77 81.67±8.12 81.20±7.55 81.17±7.28 81.10±10.75

Tables 11 and 12 show that, under the KNN classifier, the average accuracy of the BM-MCEFS algorithm across the 12 datasets is higher than that of the other algorithms, and under the SVM classifier its classification accuracy is significantly higher as well. For example, on the HCE dataset in Table 11, the average accuracy of BM-MCEFS is 47.10, compared with 44.32 for AFFS, 44.36 for FRDM, and 44.16 for MPRB.

To evaluate the robustness of the BM-MCEFS algorithm, 10% label noise was introduced into a portion of the sequentially added samples in four selected datasets, and feature selection was then performed under these noisy conditions. In the experiments, "RAW" denotes the classification accuracy obtained with the original feature set, while "Noised data" refers to the accuracy obtained with the full feature set after injecting 10% label noise. The experimental results are presented in Figs. 13 and 14. As observed, the BM-MCEFS algorithm achieves higher classification accuracy in most cases, which is primarily attributed to the fact that it inherits the minimum classification error criterion from the MCEFS algorithm.

Fig. 13.

Classification accuracy with label noise added to a portion of the incrementally added samples (SVM).

Fig. 14.

Classification accuracy with label noise added to a portion of the incrementally added samples (KNN).

To further assess the statistical performance of the five algorithms, a comparative analysis was conducted on the computation times shown in Fig. 12 at the point where 50% of the samples had been incrementally added, and the resulting critical difference (CD) diagram is provided.

In this analysis, $n=12$ and, at a significance level of $\alpha=0.05$, the corresponding critical value56 $q_\alpha$ is used. The Friedman statistic is 46.64, which exceeds the critical value, leading to the rejection of the null hypothesis of equivalence and indicating significant differences among the algorithms. According to Equation 25, CD = 2.60. As shown in Fig. 15, the Nemenyi test confirms that the BM-MCEFS algorithm performs significantly better than the others, further highlighting its competitiveness.

Fig. 15.

The Nemenyi test results of the computational times for the five algorithms.

Conclusions

Feature selection is an effective approach to high-dimensional data analysis, as it reduces data redundancy while preserving essential discriminative information. Incremental learning further enhances this process by leveraging prior knowledge to efficiently adapt to dynamically evolving data environments. This paper investigates feature selection methods based on fuzzy rough sets and identifies a key limitation of the classical fuzzy positive region: its failure to fully exploit the rich membership information embedded in the fuzzy lower approximation. To address this issue, we propose a novel Minimum Classification Error-based Feature Selection framework (MCEFS). The method constructs continuous membership curves over the universe of discourse and quantifies inter-class separability using inner product correlation, thereby effectively uncovering discriminative information beyond the traditional fuzzy positive region. Moreover, by integrating efficient matrix computation strategies, the generation of the fuzzy lower approximation is significantly accelerated, substantially improving the computational efficiency of static feature selection. Building on this foundation, we further develop an incremental variant, BM-MCEFS, which employs a block matrix mechanism to dynamically update both the fuzzy relation matrix and the fuzzy decision matrix. By reusing and incrementally refining sub-blocks of these matrices, the algorithm avoids full recomputation during data updates, greatly reducing time overhead in dynamic scenarios. Experimental results on 12 benchmark datasets demonstrate that both MCEFS and BM-MCEFS achieve high classification accuracy while offering markedly superior computational efficiency compared to state-of-the-art methods.

The proposed algorithms hold significant practical value in real-world applications involving high-dimensional, streaming, or frequently updated data. For example, in smart urban management57, they can dynamically identify key indicators such as traffic flow, environmental quality, and land use to support resilient city planning; in agricultural cooperation systems58, they enable effective selection of environmental and socio-economic features, facilitating multidimensional sustainability assessments that go beyond yield alone; and in industrial production optimization59, they support real-time monitoring and feature-driven anomaly detection, thereby improving resource utilization and system stability. These capabilities align closely with current societal demands for sustainable development, digital transformation, and intelligent decision-making.

Nevertheless, the proposed method has certain limitations. Its effectiveness relies on the relative stability of the underlying data distribution, particularly the continuity of class-center sample structures. When class centers undergo abrupt shifts due to concept drift, the incremental update mechanism may lag behind. Additionally, although inner product correlation significantly strengthens feature discriminability, it also introduces additional computational overhead. In future work, we will focus on three main directions: designing block-update strategies that respond to feature-level changes to better capture local dynamics; extending the use of inner product correlation to broader supervised learning tasks, such as multi-label and imbalanced learning; integrating incremental and batch processing mechanisms to enhance robustness against concept drift. Our ultimate goal is to develop a more scalable, adaptive, and interpretable feature selection framework that can be effectively deployed across diverse real-world applications.

Author contributions

Z.W.C.: Conceptualization, Problem formulation, Methodology, Writing – original draft, Review and editing, Final draft. M.G.X.: Numerical analysis, Programming, Mathematical modeling. J.L.: Supervision, Problem formulation, Programming, Writing – review and editing, Final drafting.

Funding

This study was funded by the National Natural Science Foundation of China (No.62066044), the 2025 Autonomous Region Graduate Education Innovation Plan Project (No.XJ2025G209) and the Xinjiang Normal University Smart Education Engineering Technology Research Center Project (No.XJNU-ZHJY202410).

Data availability

The datasets supporting this study can be obtained from the corresponding author or downloaded directly from the provided URLs. The basic information of the 12 datasets used in this paper is as follows: Markelle Kelly, Rachel Longjohn, Kolby Nottingham, The UCI Machine Learning Repository (https://archive.ics.uci.edu/). The specific download URLs for each dataset are as follows: MAGIC Gamma Telescope: Available at https://archive.ics.uci.edu/datasets/?search=MAGIC+Gamma+Telescope. Room Occupancy Estimation: Available at https://archive.ics.uci.edu/datasets/?search=Room+Occupancy+Estimation. Shill+Bidding: Available at https://archive.ics.uci.edu/datasets/?search=Shill+Bidding. Wine Quality: Available at https://archive.ics.uci.edu/datasets/?search=Wine+Quality. Iranian Churn: Available at https://archive.ics.uci.edu/datasets/?search=Iranian+Churn. Hepatitis C Virus for Egyptian patients: Available at https://archive.ics.uci.edu//datasets/?search=Hepatitis+C+Virus+for+Egyptian+patients. Turkish Music Emotion: Available at https://archive.ics.uci.edu/datasets/?search=Turkish+Music+Emotion. TUANDROMD: Available at https://archive.ics.uci.edu/datasets/?search=TUANDROMD. Semeion Handwritten Digit: Available at https://archive.ics.uci.edu/dataset//178/semeion+handwritten+digit. Glass: Available at https://archive.ics.uci.edu/dataset/42/glass+identification. Connectionist Bench: Available at https://archive.ics.uci.edu/dataset/151/connectionist+bench+sonar+mines+vs+rocks. Iris: Available at https://archive.ics.uci.edu/dataset/53/iris.

Declarations

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  • 1.Yin, Z., Liu, L., Chen, J., Zhao, B. & Wang, Y. Locally robust eeg feature selection for individual-independent emotion recognition. Expert Syst. Appl.162, 113768 (2020). [Google Scholar]
  • 2.Sheikhpour, R., Saberi-Movahed, F., Jalili, M. & Berahmand, K. Semi-supervised feature selection with concept factorization and robust label learning. Pattern Recognit. 112317 (2025).
  • 3.Kou, G. et al. Evaluation of feature selection methods for text classification with small datasets using multiple criteria decision-making methods. Appl. Soft Comput.86, 105836 (2020). [Google Scholar]
  • 4.Sheikhpour, R., Mohammadi, M., Berahmand, K., Saberi-Movahed, F. & Khosravi, H. Robust semi-supervised multi-label feature selection based on shared subspace and manifold learning. Inf. Sci.699, 121800 (2025). [Google Scholar]
  • 5.Fedorka P, Buchuk R, Klymenko M, Saibert F, Petrushyn A. The use of adaptive artificial intelligence (ai) learning models in decision support systems for smart regions. Journal of Research, Innovation and Technologies4, 99–115 (2025). [Google Scholar]
  • 6.Nejadshamsi, S., Bentahar, J., Eicker, U., Wang, C. & Jamshidi, F. A geographic-semantic context-aware urban commuting flow prediction model using graph neural network. Expert Syst. Appl.261, 125534 (2025). [Google Scholar]
  • 7.Qiu, Y., Bouraima, M., Badi, I., Stević, Ž & Simic, V. A decision-making model for prioritizing low-carbon policies in climate change mitigation. Chall. sustain12, 1–17 (2024). [Google Scholar]
  • 8.Chaoui, G., Yaagoubi, R. & Mastere, M. Integrating geospatial technologies and multi-criteria decision analysis for sustainable and resilient urban planning. Chall. Sustain13, 122–134 (2025). [Google Scholar]
  • 9.Fedorka, P., Buchuk, R., Klymenko, M., Saibert, F. & Petrushyn, A. The use of adaptive artificial intelligence (ai) learning models in decision support systems for smart regions. Journal of Research, Innovation and Technologies7, 99–115 (2025). [Google Scholar]
  • 10.Krause, A. & Köppel, J. A multi-criteria approach for assessing the sustainability of small-scale cooking and sanitation technologies. Challenges in Sustainability6, 1–19 (2018). [Google Scholar]
  • 11.Qiu, Y., Bouraima, M., Badi, I., Stević, Ž & Simic, V. A decision-making model for prioritizing low-carbon policies in climate change mitigation. Chall. sustain12, 1–17 (2024). [Google Scholar]
  • 12.Chaoui, G., Yaagoubi, R. & Mastere, M. Integrating geospatial technologies and multi-criteria decision analysis for sustainable and resilient urban planning. Challenges in Sustainability13, 122–134 (2025). [Google Scholar]
  • 13.Terentieva, K., Karpenko, I., Yarova, T., Shkvyria, O. & Pasko, Y. Technological innovation in digital brand management: Leveraging artificial intelligence and immersive experiences. Journal of Research, Innovation and Technologies4, 201–223 (2025). [Google Scholar]
  • 14.Wolf, B. M., Häring, A.-M. & Heß, J. Strategies towards evaluation beyond scientific impact: Pathways not only for agricultural research. Organic Farming1, 3–18 (2015). [Google Scholar]
  • 15.Dubois, D. & Prade, H. Rough fuzzy sets and fuzzy rough sets. International Journal of General System17, 191–209 (1990). [Google Scholar]
  • 16.Lang, G., Li, Q., Cai, M., Yang, T. & Xiao, Q. Incremental approaches to knowledge reduction based on characteristic matrices. Int. J. Mach. Learn. Cybern.8, 203–222 (2017). [Google Scholar]
  • 17.Liang, J., Wang, F., Dang, C. & Qian, Y. A group incremental approach to feature selection applying rough set technique. IEEE Trans. Knowl. Data Eng.26, 294–308 (2012). [Google Scholar]
  • 18.Sun, L., Zhang, X., Qian, Y., Xu, J. & Zhang, S. Feature selection using neighborhood entropy-based uncertainty measures for gene expression data classification. Inf. Sci.502, 18–41 (2019). [Google Scholar]
  • 19.Feng, Y., Hua, Z. & Liu, G. Partial reduction algorithms for fuzzy relation systems. Knowl.-Based Syst.188, 105047 (2020). [Google Scholar]
  • 20.Theerens, A. & Cornelis, C. Fuzzy rough sets based on fuzzy quantification. Fuzzy Sets Syst.473, 108704 (2023). [Google Scholar]
  • 21.Alnoor, A. et al. Toward a sustainable transportation industry: Oil company benchmarking based on the extension of linear diophantine fuzzy rough sets and multicriteria decision-making methods. IEEE Trans. Fuzzy Syst.31, 449–459 (2022). [Google Scholar]
  • 22.Riaz, M. & Hashmi, M. R. Linear diophantine fuzzy set and its applications towards multi-attribute decision-making problems. Journal of Intelligent & Fuzzy Systems37, 5417–5439 (2019). [Google Scholar]
  • 23.Yang, X., Chen, H., Li, T. & Luo, C. A noise-aware fuzzy rough set approach for feature selection. Knowl.-Based Syst.250, 109092 (2022). [Google Scholar]
  • 24.Ye, J., Zhan, J. & Xu, Z. A novel multi-attribute decision-making method based on fuzzy rough sets. Computers & Industrial Engineering155, 107136 (2021). [Google Scholar]
  • 25.He, J. et al. Attribute reduction in an incomplete categorical decision information system based on fuzzy rough sets. Artif. Intell. Rev.55, 5313–5348 (2022). [Google Scholar]
  • 26.Zhang, K. & Dai, J. Redefined fuzzy rough set models in fuzzy covering group approximation spaces. Fuzzy Sets Syst.442, 109–154 (2022). [Google Scholar]
  • 27.Deng, Z. et al. Feature selection for label distribution learning based on neighborhood fuzzy rough sets. Appl. Soft Comput.169, 112542 (2025). [Google Scholar]
  • 28.Wang, C., Huang, Y., Shao, M. & Fan, X. Fuzzy rough set-based attribute reduction using distance measures. Knowl.-Based Syst.164, 205–212 (2019). [Google Scholar]
  • 29.Zhang, X., Mei, C., Chen, D. & Li, J. Feature selection in mixed data: A method using a novel fuzzy rough set-based information entropy. Pattern Recogn.56, 1–15 (2016). [Google Scholar]
  • 30.Qian, W., Huang, J., Wang, Y. & Shu, W. Mutual information-based label distribution feature selection for multi-label learning. Knowl.-Based Syst.195, 105684 (2020). [Google Scholar]
  • 31.Qiu, Z. & Zhao, H. A fuzzy rough set approach to hierarchical feature selection based on hausdorff distance. Appl. Intell.52, 11089–11102 (2022). [Google Scholar]
  • 32.Sun, Y. & Zhu, P. Online group streaming feature selection based on fuzzy neighborhood granular ball rough sets. Expert Syst. Appl.249, 123778 (2024). [Google Scholar]
  • 33.An, S. et al. Relative fuzzy rough approximations for feature selection and classification. IEEE Transactions on Cybernetics53, 2200–2210 (2023). [DOI] [PubMed] [Google Scholar]
  • 34.Liang, P., Lei, D., Chin, K. & Hu, J. Feature selection based on robust fuzzy rough sets using kernel-based similarity and relative classification uncertainty measures. Knowl.-Based Syst.255, 109795 (2022). [Google Scholar]
  • 35.Zhang, Y., Wang, C., Huang, Y., Ding, W. & Qian, Y. Adaptive relative fuzzy rough learning for classification. IEEE Trans. Fuzzy Syst.32, 6267–6276 (2024). [Google Scholar]
  • 36.Chen, X., Lai, L. & Luo, M. A novel fusion and feature selection framework for multisource time-series data based on information entropy. IEEE Trans. Neural Netw. Learn. Syst. (2025). [DOI] [PubMed]
  • 37.Wang, C., Qian, Y., Ding, W. & Fan, X. Feature selection with fuzzy-rough minimum classification error criterion. IEEE Trans. Fuzzy Syst.30, 2930–2942 (2021). [Google Scholar]
  • 38.Xu, W. & Bu, Q. Matrix-based incremental feature selection method using weight-partitioned multigranulation rough set. Inf. Sci.681, 121219 (2024). [Google Scholar]
  • 39.Zhao, J. et al. Consistency approximation: Incremental feature selection based on fuzzy rough set theory. Pattern Recogn.155, 110652 (2024). [Google Scholar]
  • 40.Wang, T., Sun, B. & Jiang, C. Kernelized multi-granulation fuzzy rough set over hybrid attribute decision system and application to stroke risk prediction. Appl. Intell.53, 24876–24894 (2023). [Google Scholar]
  • 41.Sang, B. et al. Feature selection for dynamic interval-valued ordered data based on fuzzy dominance neighborhood rough set. Knowl.-Based Syst.227, 107223 (2021). [Google Scholar]
  • 42.Yu, J. & Xu, W. Incremental knowledge discovering in interval-valued decision information system with the dynamic data. Int. J. Mach. Learn. Cybern.8, 849–864 (2017). [Google Scholar]
  • 43.Xu, W., Yuan, K. & Li, W. Dynamic updating approximations of local generalized multigranulation neighborhood rough set. Appl. Intell.52, 9148–9173 (2022). [Google Scholar]
  • 44.Sang, B., Chen, H., Yang, L., Li, T. & Xu, W. Incremental feature selection using a conditional entropy based on fuzzy dominance neighborhood rough sets. IEEE Trans. Fuzzy Syst.30, 1683–1697 (2021). [Google Scholar]
  • 45.Wang, L., Pei, Z., Qin, K. & Yang, L. Incremental updating fuzzy tolerance rough set approach in intuitionistic fuzzy information systems with fuzzy decision. Appl. Soft Comput.151, 111119 (2024). [Google Scholar]
  • 46.Zhang, X., Liu, X. & Yang, Y. A fast feature selection algorithm by accelerating computation of fuzzy rough set-based information entropy. Entropy20, 788 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 47.Zhang, X. & Li, J. Incremental feature selection approach to interval-valued fuzzy decision information systems based on -fuzzy similarity self-information. Inf. Sci.625, 593–619 (2023). [Google Scholar]
  • 48.Zhao, J., Ling, Y., Huang, F., Wang, J. & See-To, E. W. Incremental feature selection for dynamic incomplete data using sub-tolerance relations. Pattern Recogn.148, 110125 (2024). [Google Scholar]
  • 49.Qi, Z., Li, H., Liu, F., Chen, T. & Dai, J. Fusion decision strategies for multiple criterion preferences based on three-way decision. Information Fusion108, 102356 (2024). [Google Scholar]
  • 50.Xu, W., Yuan, Z. & Liu, Z. Feature selection for unbalanced distribution hybrid data based on k-nearest neighborhood rough set. IEEE Transactions on Artificial Intelligence5, 229–243 (2023). [Google Scholar]
  • 51.Gao, Y., Chen, D., Wang, H. & Shi, R. Optimization attribute reduction with fuzzy rough sets based on algorithm stability. IEEE Trans. Fuzzy Syst. (2023).
  • 52.Sang, B., Xu, W., Chen, H. & Li, T. Active antinoise fuzzy dominance rough feature selection using adaptive k-nearest neighbors. IEEE Trans. Fuzzy Syst.31, 3944–3958 (2023). [Google Scholar]
  • 53.Chen, J., Lin, Y., Mi, J., Li, S. & Ding, W. A spectral feature selection approach with kernelized fuzzy rough sets. IEEE Trans. Fuzzy Syst.30, 2886–2901 (2021). [Google Scholar]
  • 54.Dong, L., Wang, R. & Chen, D. Incremental feature selection with fuzzy rough sets for dynamic data sets. Fuzzy Sets Syst.467, 108503 (2023). [Google Scholar]
  • 55.Huang, W., She, Y., He, X. & Ding, W. Fuzzy rough sets-based incremental feature selection for hierarchical classification. IEEE Trans. Fuzzy Syst.31, 3721–3733 (2023). [Google Scholar]
  • 56.Demšar, J. Statistical comparisons of classifiers over multiple data sets. J. Mach. Learn. Res.7, 1–30 (2006). [Google Scholar]
  • 57.Ulker Senkulak, B., Kanoglu, A. & Ozcevik, O. Simurg_cities conceptual model: Multi-dimensional and multi-layer performance-based assessment of urban sustainability at the city level. Chall. Sustain13, 425–444 (2025). [Google Scholar]
  • 58.Utomo, B., Soedarto, T., Winarno, S. & Hendrarini, H. Predicting the success of coffee farmer partnerships using factor analysis and multiple linear regression. Org. Farming11, 61–71 (2025). [Google Scholar]
  • 59.Baidalina, S. et al. Enhancing nutritional value and production efficiency of feeds through biochemical composition optimization. Org. Farming10, 80–93 (2024). [Google Scholar]
