2020 Jun 10;12179:78–92. doi: 10.1007/978-3-030-52705-1_6

Rough Sets Meet Statistics - A New View on Rough Set Reasoning About Numerical Data

Marko Palangetić, Chris Cornelis, Salvatore Greco, Roman Słowiński
Editors: Rafael Bello, Duoqian Miao, Rafael Falcon, Michinori Nakata, Alejandro Rosete, Davide Ciucci
PMCID: PMC7338170

Abstract

In this paper, we present a new view on how the concept of rough sets may be interpreted in terms of statistics and used for reasoning about numerical data. We show that under specific assumptions, neighborhood based rough approximations may be seen as statistical estimations of certain and possible events. We propose a way of choosing the optimal neighborhood size inspired by statistical theory. We also discuss possible directions for future research on the integration of rough sets and statistics.

Keywords: Rough sets, Statistical learning, Neighborhood based rough sets

Introduction

Zdzisław Pawlak introduced rough sets in 1982 to deal with inconsistencies within information tables [15]. His approach is applied to the representation of classes of objects in an information table using two new sets called lower and upper approximation. The lower approximation contains objects which certainly belong to the approximated class, while the objects which are possibly in the approximated class are included in the upper approximation. Formulated in another way, the approach identifies the objects which are certainly consistent with the available knowledge and the objects which are possibly consistent with it. The original method is designed to deal with categorical data or data with a finite domain.

The extension of the model to numerical data faces some difficulties. One possibility to deal with numerical data is to discretize the attributes in the information table and make them categorical [7]. However, such an approach may lead to a loss of information, since discretization treats a whole set of values as one single value. Another option is neighborhood based rough sets, where the equivalence class from Pawlak’s approach is replaced with the neighborhood of an object in a high dimensional Euclidean space [9]. They are related to similarity based rough sets [21], and are part of the more general family of covering based rough sets [26]. A third approach is fuzzy rough sets, which use fuzzy generalizations of equivalence relations suitable for application to numerical data [5]. In this paper, we use probability and statistics instead of fuzziness to model uncertainty in data.

From the very beginning, it was acknowledged that Pawlak’s approach runs into limitations when it comes to problems which are more probabilistic than deterministic in nature [27]. In general, data consist of true values affected by some noise. Therefore, the first step in data analysis is to remove that noise in order to use the real values to solve the problem of interest. As a robust version of rough sets, the Variable Precision Rough Set (VPRS) approach was proposed by Ziarko [27]. It was also the first attempt to integrate the probabilistic approach and rough sets. Other probabilistic versions of rough sets were presented later, including decision theoretic rough sets [25] and parameterized rough sets [6]. Later on, Ziarko also introduced into rough sets the assumption that the data are just a sample from an unknown space [28]. That is a widely used assumption in statistics and machine learning: data are a realization of a random variable. With this assumption, we seek a deeper integration of rough sets and statistics. In this paper, we propose a new view on the definition of rough sets, and provide a new definition independent of the type of data. It leads to a natural extension of the initial rough set approach to numerical data. We provide an example of how to calculate rough sets for numerical data, elaborate on some of the issues we are facing and present some ideas about how to direct future research on the integration of rough sets and statistics.

The paper is organized as follows. In the next section we recall basic concepts of rough set theory. In Sect. 3, statistical learning theory for Pawlak’s rough sets is introduced. Section 4 presents rough approximations for numerical data. Section 5 identifies and discusses some potential pitfalls and drawbacks identified in Sect. 4 together with ideas for improvement. Conclusions are provided in Sect. 6.

Preliminaries

Rough Sets

An information table is a 4-tuple $\langle U, Q \cup \{d\}, V, f \rangle$ where $U = \{u_1, \dots, u_n\}$ is a finite set of objects or alternatives, $Q = \{q_1, \dots, q_m\}$ is a finite set of condition attributes, d is a decision attribute; $V = \bigcup_{q \in Q} V_q \cup Y$, where $V_q$ is the domain of attribute $q \in Q$ while Y is the domain of d. The information function $f: U \times (Q \cup \{d\}) \to V$ satisfies that $f(u, q) \in V_q$ and that $f(u, d) \in Y$. Denote by $V_Q = \prod_{q \in Q} V_q$ the joint domain of condition attributes, while $f(u, Q)$ represents the |Q|-tuple of values $f(u, q)$ for $q \in Q$. If $V_q$ is finite, we say that q is categorical, while if $V_q = \mathbb{R}$ we say that q is numerical.

First we assume that all condition attributes are categorical. We define the equivalence relation $I$ on objects u and v as $u \, I \, v \Leftrightarrow f(u, Q) = f(v, Q)$. This means that two objects are related (indiscernible) if they are equally evaluated on all attributes. Let $[u]$ denote the equivalence class of object u, and $A \subseteq U$. We recall Pawlak’s lower and upper approximations on U:

$$\underline{\mathrm{apr}}(A) = \{u \in U : [u] \subseteq A\}, \qquad \overline{\mathrm{apr}}(A) = \{u \in U : [u] \cap A \neq \emptyset\}.$$

In the lower approximation of A, we include objects u for which all identically evaluated objects are also in A. Therefore, we may conclude that u for sure belongs to A based on available knowledge, since all the instances with the same values are also in A. We include object u in the upper approximation of A if there is an instance in A identically evaluated as u. Hence, we may say that u is possibly in A if some instances, identically evaluated as u, are in A. In this way, we distinguish certain and possible knowledge. Below, we list the important properties of inclusion and duality [15]:

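  • (inclusion) $\underline{\mathrm{apr}}(A) \subseteq A \subseteq \overline{\mathrm{apr}}(A)$,

  • (duality) $\underline{\mathrm{apr}}(A) = U \setminus \overline{\mathrm{apr}}(U \setminus A)$, $\overline{\mathrm{apr}}(A) = U \setminus \underline{\mathrm{apr}}(U \setminus A)$.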
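The two definitions and the listed properties can be checked on a small example. The following is a minimal sketch in Python, assuming a made-up categorical table; the object names, attribute values and the function name `pawlak_approximations` are illustrative and do not come from the paper.

```python
from collections import defaultdict


def pawlak_approximations(table, target):
    """Return (lower, upper) approximations of `target` as sets of objects.

    table  : dict mapping each object to its tuple of categorical condition values
    target : set of objects, the approximated class A
    """
    # Group indiscernible objects: same condition values -> same equivalence class.
    classes = defaultdict(set)
    for obj, values in table.items():
        classes[values].add(obj)

    lower, upper = set(), set()
    for block in classes.values():
        if block <= target:   # the whole class lies inside A
            lower |= block
        if block & target:    # the class shares at least one object with A
            upper |= block
    return lower, upper


# Hypothetical toy table with two condition attributes.
table = {
    "u1": ("high", "yes"),
    "u2": ("high", "yes"),
    "u3": ("low", "no"),
    "u4": ("low", "yes"),
    "u5": ("low", "yes"),
}
A = {"u1", "u2", "u4"}
lower, upper = pawlak_approximations(table, A)
```

In this toy table u4 and u5 are indiscernible while only u4 is in A, so both end up in the upper approximation and neither is in the lower one.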

A question arises: how to apply a similar reasoning when we have numerical data? If we apply the reasoning presented above, the equivalence classes will mostly consist of only one object since it is almost impossible that two objects with numerical characteristics will be identically evaluated on all attributes. This means that all objects from A belong to the lower approximations of A, i.e., all objects from A certainly belong to A. However, in this way we ignore the fact that the noise present in data affects the certainty of objects belonging to a set. The noise is related to imprecision of numerical attributes and, even if the measurement of numerical attributes is precise, to human perception of these precise values.

A way to handle this problem is the neighborhood based rough set approach. Assume now that the condition attributes take real values and let d be the Euclidean distance on $\mathbb{R}^m$. Any distance metric can be used here, but the Euclidean distance corresponds to the statistical approach we will use later. For an object $u \in U$ we define its $\epsilon$-neighborhood $N_\epsilon(u) = \{v \in U : d(f(u, Q), f(v, Q)) < \epsilon\}$. We define the approximations in the following way [9]:

$$\underline{\mathrm{apr}}_\epsilon(A) = \{u \in U : N_\epsilon(u) \subseteq A\}, \qquad \overline{\mathrm{apr}}_\epsilon(A) = \{u \in U : N_\epsilon(u) \cap A \neq \emptyset\}.$$

Here, object u certainly belongs to A if its close neighborhood only contains objects from A. Object u possibly belongs to A if its close neighborhood contains at least one object from A. Equivalent properties of inclusion and duality also hold in this case [9].

From the definition we may see that the approximations heavily depend on the parameter $\epsilon$. The question is, what is the optimal neighborhood size which will identify certain and possible knowledge. Later on we will see that statistical techniques may be useful for this purpose.
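The dependence on the neighborhood size can be illustrated with a minimal Python sketch. The data points, the class A and the two radii below are made up for illustration, and the strict inequality in the neighborhood test is an assumption of this sketch.

```python
import math


def neighborhood(u, data, eps):
    """Objects whose condition vectors lie strictly within distance eps of u."""
    return {v for v in data if math.dist(data[u], data[v]) < eps}


def neighborhood_approximations(data, target, eps):
    """Return (lower, upper) neighborhood based approximations of `target`."""
    lower = {u for u in data if neighborhood(u, data, eps) <= target}
    upper = {u for u in data if neighborhood(u, data, eps) & target}
    return lower, upper


# Hypothetical objects described by two numerical attributes.
data = {
    "u1": (0.0, 0.0),
    "u2": (0.5, 0.0),
    "u3": (1.2, 0.0),
    "u4": (3.0, 0.0),
}
A = {"u1", "u2"}

wide_lower, wide_upper = neighborhood_approximations(data, A, eps=1.0)
narrow_lower, narrow_upper = neighborhood_approximations(data, A, eps=0.1)
```

With the wide radius, u2 has the outsider u3 in its neighborhood and drops out of the lower approximation. With the narrow radius every neighborhood is a singleton, so both approximations collapse to A itself.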

Value-Based Definitions and Inconclusive Regions

Pawlak defines the approximations as sets of objects (SO). The main goal of these definitions is to distinguish possible knowledge from certain knowledge and for this we do not need to refer exactly to the set of objects. We can define the approximations as sets of values (SV), i.e., the sets which will only contain values from the domain of condition attributes. Let $x \in V_Q$. Similarly as in [8] we define sets $[x] = \{u \in U : f(u, Q) = x\}$. The SV approximations are

$$\underline{\mathrm{apr}}^{SV}(A) = \{x \in V_Q : [x] \subseteq A,\ [x] \neq \emptyset\}, \qquad \overline{\mathrm{apr}}^{SV}(A) = \{x \in V_Q : [x] \cap A \neq \emptyset\}.$$

We refer to this definition as the SV definition while the original one will be called the SO definition. We note that the SV definition keeps the same knowledge as the SO definition. The SO approximations can be obtained from the SV definition by collecting all objects with condition values belonging to the SV approximations (lower or upper). The SV approximations can be obtained from the SO definition as a set of unique condition values $f(u, Q)$ of the objects from the SO approximations. Therefore, in terms of Pawlak’s environment of categorical data, SO and SV definitions are equivalent.

We notice that there are values from the domain which cannot be assigned to any approximation. In particular, the condition $[x] \neq \emptyset$ is necessary in the definitions. Otherwise a value x for which $[x] = \emptyset$ would belong to the lower approximations of A and $U \setminus A$ at the same time, i.e., it would certainly belong to two opposite classes. Of course, that is not possible and such values from the domain are called inconclusive. We denote the set $IC$ of inconclusive values by

$$IC = \{x \in V_Q : [x] = \emptyset\}.$$

The inclusion property is clearly preserved while duality still holds if the complement operator on $V_Q$ excludes inconclusive values, i.e., if it is defined as $X^c = V_Q \setminus (X \cup IC)$ for $X \subseteq V_Q$.

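On the other hand, for the SV extension of the neighborhood based approximations, a neighborhood may be defined for any value from the domain $V_Q = \mathbb{R}^m$. If $x \in \mathbb{R}^m$ and $\epsilon > 0$ we define $N_\epsilon(x) = \{u \in U : d(x, f(u, Q)) < \epsilon\}$. The SV approximations are:

$$\underline{\mathrm{apr}}^{SV}_\epsilon(A) = \{x \in \mathbb{R}^m : N_\epsilon(x) \subseteq A,\ N_\epsilon(x) \neq \emptyset\},$$
$$\overline{\mathrm{apr}}^{SV}_\epsilon(A) = \{x \in \mathbb{R}^m : N_\epsilon(x) \cap A \neq \emptyset\}.$$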

An arbitrary value $x \in \mathbb{R}^m$ is in the lower approximation of A if its $\epsilon$-neighborhood contains only objects from A, while it is in the upper approximation if its neighborhood contains at least one object from A. Here again we consider the inconclusive areas, i.e., values whose neighborhood contains no objects from U. As for the SV definitions for Pawlak’s rough sets, the inclusion property is preserved while duality holds with exclusion of the inconclusive areas. The SO and SV definitions are not equivalent in this case since SV is more general, and SO can be obtained from it, but not vice versa. For example, there can exist a value $x \in \mathbb{R}^m$ such that its neighborhood contains exactly one object $u \in A$ and no elements from $U \setminus A$, and such that u is not in the SO lower approximation of A. The latter holds in particular if there exists some $v \in U \setminus A$ such that $v \in N_\epsilon(u)$. However, x belongs to the SV lower approximation, and such x cannot be reconstructed from the SO lower approximation.

We will use the SV definition to derive a statistical extension of rough sets to numerical data.
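The counterexample described above can be reproduced with a minimal Python sketch. The two objects, the radius and the region labels returned by the hypothetical helper `sv_region` are assumptions made for illustration only.

```python
import math


def sv_region(x, points, in_A, eps):
    """Classify an arbitrary value x under the SV neighborhood definition.

    Returns 'lower' (non-empty neighborhood, all objects in A), 'boundary'
    (in the upper but not the lower approximation), 'outside' (non-empty
    neighborhood without objects from A) or 'inconclusive' (empty neighborhood).
    """
    near = [i for i, p in enumerate(points) if math.dist(x, p) < eps]
    if not near:
        return "inconclusive"
    hits = sum(1 for i in near if in_A[i])
    if hits == len(near):
        return "lower"
    return "boundary" if hits > 0 else "outside"


# Hypothetical one-dimensional sample: u = 0.0 belongs to A, v = 0.8 does not.
points = [(0.0,), (0.8,)]
in_A = [True, False]
eps = 1.0

region_x = sv_region((-0.5,), points, in_A, eps)    # only u is within eps of x
region_u = sv_region(points[0], points, in_A, eps)  # u sees both u and v
region_far = sv_region((5.0,), points, in_A, eps)   # no object nearby
```

The value x = -0.5 lands in the SV lower approximation although the only object in its neighborhood, u, is not in the SO lower approximation, since v lies within distance eps of u.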

A Statistical View of Pawlak’s Rough Sets

One widely used assumption in statistics and machine learning (ML) is that data are realizations of a joint random variable. Let objects be outcomes of the joint random variable $(\mathcal{X}, \mathcal{Y})$ where $\mathcal{X}$ is a random variable corresponding to the condition attributes, while $\mathcal{Y}$ corresponds to the decision attribute. Since we are dealing with classification problems, we know that $\mathcal{Y}$ is always discrete, while $\mathcal{X}$ is discrete if we work with categorical data, or $\mathcal{X}$ takes values from $\mathbb{R}^m$ if we have numerical data. Those random variables are unknown in practice, so using data as their realizations, we explain the relations between $\mathcal{X}$ and $\mathcal{Y}$.

The idea here is to redefine the approximations in terms of random variables instead of data. The SV approximations were defined on the domain w.r.t. neighborhood operators, while here the approximations are defined on the domain w.r.t. a random variable. In terms of statistics these are the “true” approximations dependent on unknown random variables. The SV approximations on data will play the role of estimators of such approximations.

Since $\mathcal{Y}$ is discrete, assume that its domain is the set $\{1, \dots, K\}$ for some K. Classification tasks in machine learning often refer to calculation of the conditional probabilities of the particular classes. More formally, for class $k \in \{1, \dots, K\}$ we want to model the expression $P(\mathcal{Y} = k \mid \mathcal{X} = x)$ as a function of x for all x from the domain space (either a space of categories or $\mathbb{R}^m$). Assume now that the domain $V_Q$ of $\mathcal{X}$ is finite, i.e., $\mathcal{X}$ is discrete. If certainty is modeled in a probabilistic environment, we say that an event is certain if its probability is 1 while an event is possible if its probability is greater than 0. We want to know if value $x \in V_Q$ certainly belongs to class k, i.e., if $P(\mathcal{Y} = k \mid \mathcal{X} = x) = 1$. In practice, we do not have exact knowledge about the conditional distribution of $\mathcal{Y}$ on $\mathcal{X}$, so we need to estimate it. We recall the set of objects $U = \{u_1, \dots, u_n\}$ which is now a set of realizations of random variable $(\mathcal{X}, \mathcal{Y})$, known as a sample. The empirical estimation of the above mentioned conditional probability is

$$\hat{P}(\mathcal{Y} = k \mid \mathcal{X} = x) = \frac{\sum_{i=1}^{n} \mathbb{1}\big(f(u_i, Q) = x,\ f(u_i, d) = k\big)}{\sum_{i=1}^{n} \mathbb{1}\big(f(u_i, Q) = x\big)}$$

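where $\mathbb{1}$ is the indicator function, the numerator is the number of objects with condition values $f(u_i, Q)$ equal to x whose decision value $f(u_i, d)$ is equal to k, while the denominator is the number of objects with condition values $f(u_i, Q)$ equal to x. To estimate the set of values x for which $P(\mathcal{Y} = k \mid \mathcal{X} = x) = 1$, we use the estimated probability instead of the true one. We have that:

$$\hat{P}(\mathcal{Y} = k \mid \mathcal{X} = x) = 1 \Leftrightarrow \sum_{i=1}^{n} \mathbb{1}\big(f(u_i, Q) = x,\ f(u_i, d) = k\big) = \sum_{i=1}^{n} \mathbb{1}\big(f(u_i, Q) = x\big) > 0$$
$$\Leftrightarrow \{u \in U : f(u, Q) = x\} \subseteq \{u \in U : f(u, d) = k\},\ \{u \in U : f(u, Q) = x\} \neq \emptyset.$$

We obtain

$$\{x \in V_Q : \hat{P}(\mathcal{Y} = k \mid \mathcal{X} = x) = 1\} = \big\{x \in V_Q : \{u \in U : f(u, Q) = x\} \subseteq \{u \in U : f(u, d) = k\},\ \{u \in U : f(u, Q) = x\} \neq \emptyset\big\}.$$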

The right side of the latter equality is identical to the SV definition of Pawlak’s rough sets, where [x] is replaced by $\{u \in U : f(u, Q) = x\}$ while A is replaced with $\{u \in U : f(u, d) = k\}$. Here, it can be noticed that the SV lower approximation may be seen as an estimation of the unknown lower approximation dependent on random variables. A similar procedure may be used for the upper approximation. This leads to the definition of the lower and upper approximations of the class k with respect to random variable $\mathcal{X}$:

$$\underline{\mathrm{apr}}_{\mathcal{X}}(k) = \{x : P(\mathcal{Y} = k \mid \mathcal{X} = x) = 1\}, \qquad \overline{\mathrm{apr}}_{\mathcal{X}}(k) = \{x : P(\mathcal{Y} = k \mid \mathcal{X} = x) > 0\}. \tag{1}$$
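For categorical data, the empirical estimate and the resulting approximations can be sketched in a few lines of Python. The sample, the domain and the function names below are hypothetical; exact fractions are used so that the tests "equal to 1" and "greater than 0" are not affected by rounding.

```python
from fractions import Fraction


def empirical_conditional(samples, x, k):
    """Empirical estimate of P(Y = k | X = x); None if x was never observed."""
    n_x = sum(1 for xi, _ in samples if xi == x)
    if n_x == 0:
        return None
    n_xk = sum(1 for xi, yi in samples if xi == x and yi == k)
    return Fraction(n_xk, n_x)


def estimated_approximations(samples, domain, k):
    """Split the domain into estimated lower, upper and inconclusive sets for class k."""
    lower, upper, inconclusive = set(), set(), set()
    for x in domain:
        p = empirical_conditional(samples, x, k)
        if p is None:
            inconclusive.add(x)  # no observation carries this value
            continue
        if p == 1:
            lower.add(x)
        if p > 0:
            upper.add(x)
    return lower, upper, inconclusive


# Hypothetical sample of (condition value, class) pairs.
samples = [("a", 1), ("a", 1), ("b", 1), ("b", 0), ("c", 0)]
domain = {"a", "b", "c", "d"}
lower, upper, inconclusive = estimated_approximations(samples, domain, k=1)
```

The value "d" never occurs in the sample, so its conditional probability cannot be estimated and it is reported as inconclusive rather than assigned to either approximation.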

We call this the RV definition of rough sets. Such defined “true” approximations do not require any assumptions on $\mathcal{X}$ ($\mathcal{X}$ being discrete or continuous) as long as the conditional probability is defined. This version of the approximations provides a natural extension of rough sets to numerical data (and all other types of data). In practice, approximation estimates for categorical and numerical data are different since the probability estimation is different in the discrete and the continuous case. We have already seen the estimation of the lower approximation for categorical data. Later on it will be shown how to estimate the approximations in the numerical case. The RV rough set definitions can be taken out of the context of classification and they can be extended to arbitrary events. Let A be an event and $\mathcal{X}$ be a random variable. The lower and upper approximations of A w.r.t. $\mathcal{X}$ are defined as:

$$\underline{\mathrm{apr}}_{\mathcal{X}}(A) = \{x : P(A \mid \mathcal{X} = x) = 1\}, \qquad \overline{\mathrm{apr}}_{\mathcal{X}}(A) = \{x : P(A \mid \mathcal{X} = x) > 0\}.$$

Such a general definition will not play an important role for our goal, but it may find other applications in data analysis.

Rough Approximations for Numerical Data

In the previous section we have seen how the approximations may be estimated in practice when we deal with categorical data, and that such estimation coincides with Pawlak’s approach. Since the approximations do not depend on the type of data, the question is how to estimate them for numerical data. To make things simpler, we assume that classification is binary, i.e., $K = 2$, and we only have two values for the variable $\mathcal{Y}$, 0 and 1. Assume also that the domain of $\mathcal{X}$ is $\mathbb{R}^m$, i.e., $\mathcal{X}$ is a continuous random variable. By $f_{\mathcal{X}}$ we denote the probability density function (PDF) of $\mathcal{X}$, while by $f_{\mathcal{Y}}$ we denote the PDF of the binary random variable $\mathcal{Y}$. The joint PDF of $\mathcal{X}$ and $\mathcal{Y}$ is denoted as $f_{\mathcal{X},\mathcal{Y}}$. From probability theory it holds that $f_{\mathcal{X}}(x) = f_{\mathcal{X},\mathcal{Y}}(x, 0) + f_{\mathcal{X},\mathcal{Y}}(x, 1)$, $f_{\mathcal{Y}}(y) = \int_{\mathbb{R}^m} f_{\mathcal{X},\mathcal{Y}}(x, y)\,dx$ for $y \in \{0, 1\}$ and $f_{\mathcal{Y}}(0) + f_{\mathcal{Y}}(1) = 1$. We calculate the approximations of class 1. Probability theory tells us that:

$$P(\mathcal{Y} = 1 \mid \mathcal{X} = x) = \frac{f_{\mathcal{X},\mathcal{Y}}(x, 1)}{f_{\mathcal{X}}(x)} = \frac{f_{\mathcal{X},\mathcal{Y}}(x, 1)}{f_{\mathcal{X},\mathcal{Y}}(x, 0) + f_{\mathcal{X},\mathcal{Y}}(x, 1)}.$$

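For the lower approximation we have that

$$P(\mathcal{Y} = 1 \mid \mathcal{X} = x) = 1 \Leftrightarrow f_{\mathcal{X},\mathcal{Y}}(x, 1) = f_{\mathcal{X},\mathcal{Y}}(x, 0) + f_{\mathcal{X},\mathcal{Y}}(x, 1) \Leftrightarrow f_{\mathcal{X},\mathcal{Y}}(x, 0) = 0.$$

The last equality can be divided by $f_{\mathcal{Y}}(0)$ and we get the condition $f_{\mathcal{X} \mid \mathcal{Y} = 0}(x) = 0$. Here $f_{\mathcal{X} \mid \mathcal{Y} = 0}$ stands for the conditional PDF of $\mathcal{X}$ on event $\mathcal{Y} = 0$. For the upper approximation we have:

$$P(\mathcal{Y} = 1 \mid \mathcal{X} = x) > 0 \Leftrightarrow f_{\mathcal{X},\mathcal{Y}}(x, 1) > 0.$$

The last inequality can be divided by $f_{\mathcal{Y}}(1)$ and we get the condition $f_{\mathcal{X} \mid \mathcal{Y} = 1}(x) > 0$.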

The conclusion we may derive from the calculations is that x certainly belongs to class 1 if the conditional PDF of $\mathcal{X}$ on $\mathcal{Y} = 0$ evaluated in x is 0. We have that x possibly belongs to class 1 if the conditional PDF of $\mathcal{X}$ on $\mathcal{Y} = 1$ evaluated in x is greater than 0. These conditions depend on conditional PDFs which are unknown in practice and have to be estimated. More precisely, we need to estimate the so-called level sets, i.e., areas on which the PDF is smaller or greater than some value [2]. In our case, the thresholds we consider for the PDFs are when they are equal to 0 and greater than 0 (lower and upper approximation).

The estimation of level sets is an emerging field in statistics and ML [2, 3, 20]. Such estimations are essentially different from estimating the PDF itself since we are searching for good estimators for a particular area of the PDF, not for the whole PDF.

Below we present a naive approach of estimating level sets using the estimation of the PDF. Density estimation is a well studied area of statistics [18, 19, 23]. The main methods are histogram density estimation, kernel density estimation (KDE) and nearest neighbour density estimation. Histograms are known for performing badly in high dimensions [18], while the nearest neighbour methods do not assume that there are areas where the PDF is equal to 0 [14]. For these reasons, KDE appears the most appropriate choice to calculate level sets. We refer the reader to [19] for an overview of density estimation methods.

Rough Sets and KDE

A kernel $K: \mathbb{R}^m \to \mathbb{R}$ is a positive and symmetric mapping for which it holds that $\int_{\mathbb{R}^m} K(x)\,dx = 1$ [24]. It may be seen as a measure of similarity between points from $\mathbb{R}^m$. The kernel density estimator is defined as:

$$\hat{f}(x) = \frac{1}{n} \sum_{i=1}^{n} K(x - x_i),$$

where $x_1, \dots, x_n$ is a given sample from the unknown PDF f. The motivation behind this definition is that if x has more points in its proximity, then value $\hat{f}(x)$ will be larger, which indicates an area of higher density.

Similarity measures are usually based on distances between points since, intuitively, the closer points are, the more similar they are to each other. Therefore, we use kernels based on Euclidean distance, called radial kernels [12]:

$$K(x) = \frac{1}{h^m}\, k\!\left(\frac{\|x\|}{h}\right).$$

The notation $\|\cdot\|$ stands for the standard norm on $\mathbb{R}^m$, h is a positive real parameter called bandwidth while k is a univariate positive function. Using radial kernels, the PDF estimator becomes:

$$\hat{f}(x) = \frac{1}{n h^m} \sum_{i=1}^{n} k\!\left(\frac{\|x - x_i\|}{h}\right). \tag{2}$$
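A radial kernel density estimator of this form takes only a few lines of Python. The sketch below is an illustration under stated assumptions: the sample is made up, and the profile `box_profile` is a simple one-dimensional uniform profile chosen because it makes the arithmetic easy to follow, not because the text prescribes it at this point.

```python
import math


def radial_kde(x, sample, h, profile):
    """Radial KDE: average of profile(||x - x_i|| / h), scaled by 1 / h^m."""
    m = len(x)
    total = sum(profile(math.dist(x, xi) / h) for xi in sample)
    return total / (len(sample) * h ** m)


def box_profile(t):
    """Uniform profile for m = 1: the unit 'ball' [-1, 1] has length 2."""
    return 0.5 if t < 1 else 0.0


# Hypothetical one-dimensional sample.
sample = [(0.0,), (1.0,), (4.0,)]
density_mid = radial_kde((0.5,), sample, h=1.0, profile=box_profile)
density_gap = radial_kde((2.5,), sample, h=1.0, profile=box_profile)
```

The point 0.5 has two sample points within the bandwidth and gets a positive density estimate, while 2.5 has none and is estimated as 0. Estimated zero-density regions of this kind are what the level set argument relies on.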

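From before we have that the lower approximation can be formulated as:

$$\underline{\mathrm{apr}}_{\mathcal{X}}(1) = \{x \in \mathbb{R}^m : f_{\mathcal{X} \mid \mathcal{Y} = 0}(x) = 0\}.$$

Therefore, using (2) we get the estimator of the lower approximation:

$$\widehat{\underline{\mathrm{apr}}}_{\mathcal{X}}(1) = \{x \in \mathbb{R}^m : \hat{f}_{\mathcal{X} \mid \mathcal{Y} = 0}(x) = 0\}.$$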

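Although it is not possible that $f_{\mathcal{X} \mid \mathcal{Y} = 0}(x) = 0$ and $f_{\mathcal{X} \mid \mathcal{Y} = 1}(x) = 0$ at the same time for a value x at which the conditional probability is defined, it may happen that $\hat{f}_{\mathcal{X} \mid \mathcal{Y} = 0}(x) = 0$ and $\hat{f}_{\mathcal{X} \mid \mathcal{Y} = 1}(x) = 0$ for some x. We call such values inconclusive and exclude them from the approximations, as before. Following this, we redefine the estimation of the lower approximation:

$$\widehat{\underline{\mathrm{apr}}}_{\mathcal{X}}(1) = \{x \in \mathbb{R}^m : \hat{f}_{\mathcal{X} \mid \mathcal{Y} = 0}(x) = 0,\ \hat{f}_{\mathcal{X} \mid \mathcal{Y} = 1}(x) > 0\}. \tag{3}$$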

Henceforth we will focus on the lower approximation. A very similar procedure can be used to estimate the upper approximation.

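We have to decide which area satisfies the condition from (3). Recall that $U = \{u_1, \dots, u_n\}$ is the set of objects, i.e., the sample. Set U is split into two subsets: the objects which belong to class 0 and the objects which belong to class 1. We denote those sets $U_0$ and $U_1$. To estimate the conditional PDFs $f_{\mathcal{X} \mid \mathcal{Y} = 0}$ and $f_{\mathcal{X} \mid \mathcal{Y} = 1}$ we use the objects from $U_0$ and $U_1$, respectively. To estimate the level set $\{x : f_{\mathcal{X} \mid \mathcal{Y} = 0}(x) = 0\}$ we have to find the values of x for which $\hat{f}_{\mathcal{X} \mid \mathcal{Y} = 0}(x) = 0$, and to estimate $\{x : f_{\mathcal{X} \mid \mathcal{Y} = 1}(x) > 0\}$ we search for x where $\hat{f}_{\mathcal{X} \mid \mathcal{Y} = 1}(x) > 0$. It follows that:

$$\hat{f}_{\mathcal{X} \mid \mathcal{Y} = 0}(x) = 0 \Leftrightarrow \frac{1}{|U_0|\, h^m} \sum_{u \in U_0} k\!\left(\frac{\|x - f(u, Q)\|}{h}\right) = 0 \Leftrightarrow \forall u \in U_0:\ k\!\left(\frac{\|x - f(u, Q)\|}{h}\right) = 0,$$
$$\hat{f}_{\mathcal{X} \mid \mathcal{Y} = 1}(x) > 0 \Leftrightarrow \frac{1}{|U_1|\, h^m} \sum_{u \in U_1} k\!\left(\frac{\|x - f(u, Q)\|}{h}\right) > 0 \Leftrightarrow \exists u \in U_1:\ k\!\left(\frac{\|x - f(u, Q)\|}{h}\right) > 0.$$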

The derivation up to now is general and holds for all functions k and bandwidths h. The question is which kernel best suits the last condition. The most used kernel in practice is the Gaussian kernel, which is also radial: $k(t) = (2\pi)^{-m/2} e^{-t^2/2}$. Its main drawback is that it is nowhere equal to 0. It is used under the assumption that there are no impossible or certain events, which is not the case here. Therefore, a better choice would be a kernel with different assumptions. In particular, we require a kernel for which k is greater than 0 only on a bounded set, i.e., a kernel with bounded support (Fig. 1).

Fig. 1. Kernel examples in univariate case

The theory developed in [13] states that the smallest estimation error under certain conditions is achieved for the Epanechnikov kernel. The Epanechnikov kernel is radial with

$$k(t) = \begin{cases} \dfrac{m + 2}{2 V_m}\,(1 - t^2) & \text{if } 0 \le t < 1, \\ 0 & \text{otherwise,} \end{cases}$$

where $V_m$ is the volume of the m-dimensional unit ball. According to the definition, its support is the unit ball, which implies that it is bounded. Another kernel with bounded support is the spherical uniform kernel, i.e., the constant radial kernel for which

$$k(t) = \begin{cases} \dfrac{1}{V_m} & \text{if } 0 \le t < 1, \\ 0 & \text{otherwise.} \end{cases}$$

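Let $h_e$ and $h_u$ be the bandwidths corresponding to the Epanechnikov kernel and spherical uniform kernel, respectively. For the Epanechnikov kernel, we have that:

$$\forall u \in U_0:\ k\!\left(\frac{\|x - f(u, Q)\|}{h_e}\right) = 0 \Leftrightarrow \forall u \in U_0:\ \|x - f(u, Q)\| \ge h_e,$$
$$\exists u \in U_1:\ k\!\left(\frac{\|x - f(u, Q)\|}{h_e}\right) > 0 \Leftrightarrow \exists u \in U_1:\ \|x - f(u, Q)\| < h_e,$$

while for the spherical uniform kernel it holds that:

$$\hat{f}_{\mathcal{X} \mid \mathcal{Y} = 0}(x) = 0 \Leftrightarrow \forall u \in U_0:\ \|x - f(u, Q)\| \ge h_u, \qquad \hat{f}_{\mathcal{X} \mid \mathcal{Y} = 1}(x) > 0 \Leftrightarrow \exists u \in U_1:\ \|x - f(u, Q)\| < h_u.$$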

In both cases, value x certainly belongs to class 1 if in the neighborhood there are no objects from the opposite class and there are some objects from the same class. Hence, by using kernels with bounded support, we obtain simple conditions for estimating the lower approximations.

Relationship to Neighborhood Based Rough Sets

We summarize the results obtained so far: we defined the lower approximation of class 1 as $\{x \in \mathbb{R}^m : f_{\mathcal{X} \mid \mathcal{Y} = 0}(x) = 0\}$ for continuous random variable $\mathcal{X}$. We estimated the approximation by estimating the PDF from the expression using kernel density estimators as:

$$\{x \in \mathbb{R}^m : \hat{f}_{\mathcal{X} \mid \mathcal{Y} = 0}(x) = 0,\ \hat{f}_{\mathcal{X} \mid \mathcal{Y} = 1}(x) > 0\}.$$

We have shown that the estimators for certain radial kernels with bounded support lead to the expression:

$$\{x \in \mathbb{R}^m : \forall u \in U_0:\ \|x - f(u, Q)\| \ge h,\ \exists u \in U_1:\ \|x - f(u, Q)\| < h\}$$

for some h. Let us write the neighborhood definition replacing $\epsilon$ with h: $N_h(x) = \{u \in U : d(x, f(u, Q)) < h\}$, where d is the Euclidean distance. Condition $\exists u \in U_1:\ \|x - f(u, Q)\| < h$ means that there is at least one object from $U_1$ in $N_h(x)$, i.e., $N_h(x) \cap U_1 \neq \emptyset$, while $\forall u \in U_0:\ \|x - f(u, Q)\| \ge h$ means that there are no objects from $U_0$ in $N_h(x)$, i.e., $N_h(x) \cap U_0 = \emptyset$. It follows that the approximation estimator can be written as:

$$\{x \in \mathbb{R}^m : N_h(x) \subseteq U_1,\ N_h(x) \neq \emptyset\}.$$

The latter expression is exactly the SV (set of values) definition of the neighborhood based rough sets. We can conclude that the estimators of the RV approximations coincide with the SV definition of the neighborhood based rough sets. The advantage of this representation of the neighborhood based rough sets is that we have proper mathematical tools to calculate the neighborhood size in order to get better results. We are now able to use statistical methods to obtain a proper bandwidth which plays the role of the neighborhood size.
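The coincidence between the KDE based estimator and the neighborhood condition can be checked numerically. The Python sketch below does so on a small grid, assuming an Epanechnikov profile normalised by the volume of the unit ball; the two classes, the bandwidth `h` and the grid are made-up values chosen so that no grid point lies exactly at distance h from an object.

```python
import math


def unit_ball_volume(m):
    """Volume of the m-dimensional unit ball."""
    return math.pi ** (m / 2) / math.gamma(m / 2 + 1)


def epanechnikov_profile(t, m):
    """Epanechnikov profile: positive for t < 1, zero otherwise."""
    if t >= 1:
        return 0.0
    return (m + 2) / (2 * unit_ball_volume(m)) * (1 - t * t)


def conditional_kde(x, points, h):
    """KDE of one class evaluated at x, using the objects of that class only."""
    m = len(x)
    total = sum(epanechnikov_profile(math.dist(x, p) / h, m) for p in points)
    return total / (len(points) * h ** m)


def lower_by_kde(x, class0, class1, h):
    """x is in the estimated lower approximation of class 1 (density conditions)."""
    return conditional_kde(x, class0, h) == 0 and conditional_kde(x, class1, h) > 0


def lower_by_neighborhood(x, class0, class1, h):
    """Same decision expressed through the h-neighborhood of x."""
    near0 = [p for p in class0 if math.dist(x, p) < h]
    near1 = [p for p in class1 if math.dist(x, p) < h]
    return not near0 and len(near1) > 0


# Hypothetical two-dimensional sample split by class.
class0 = [(0.0, 0.0), (1.0, 0.0)]
class1 = [(3.0, 0.0), (4.0, 1.0)]
h = 1.25

grid = [(-1 + 0.5 * i, -1 + 0.5 * j) for i in range(13) for j in range(7)]
agree = all(
    lower_by_kde(x, class0, class1, h) == lower_by_neighborhood(x, class0, class1, h)
    for x in grid
)
```

Both functions return the same decision at every grid point, which is the equivalence stated above. The neighborhood version only needs distance comparisons and never evaluates the kernel.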

In the following subsection, we will outline a procedure to select the bandwidths in theory, that is: we provide some insights on how the bandwidths can be calculated independently from data, using only the chosen kernel and the original PDF.

Bandwidth Selection - An Example

This subsection relies on the work presented in [19]. Using the KDE theory, we are able to construct the proper bandwidths for different kernels in order to obtain the best possible estimator of PDFs (or at least close to the best). The bandwidths are chosen to minimize the error of the PDF estimation. A widely used error function is the Mean Integrated Square Error (MISE):

$$\mathrm{MISE}(\hat{f}) = E\left[\int_{\mathbb{R}^m} \big(\hat{f}(x) - f(x)\big)^2\,dx\right],$$

where E stands for the expected value. When n is significantly larger than the number of attributes m, the MISE of radial kernels can be approximated as:

graphic file with name M183.gif

The latter expression is also called AMISE or Asymptotic MISE. By minimizing the expression above, we get the optimal bandwidth:

graphic file with name M184.gif

Constants Inline graphic and Inline graphic are dependent on the kernel and on the actual probability density function f. Assuming that our data are normally distributed (or something close to normal with bounded support), we are able to calculate the optimal bandwidths. Under the normality assumption, the optimal bandwidths for the Epanechnikov and spherical uniform kernels are:

graphic file with name M187.gif

From the AMISE expression, we may see that the rate of convergence is not dependent on constant Inline graphic. Therefore, in order to avoid the assumptions and to achieve better results one can try to tune constant Inline graphic using data. Under Inline graphic for some kernel we also ensure that:

graphic file with name M191.gif

That ensures that for a sufficiently large sample size n, the inconclusive areas will become negligible. That is also intuitive since with more data we acquire more knowledge which leaves less space for uncertainty.
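The behaviour of such a bandwidth can be explored numerically. The Python sketch below assumes the common textbook form of the asymptotic error, AMISE(h) = C1 / (n h^m) + C2 h^4, which may differ in notation from the expression used above; the constants `c1` and `c2` are hypothetical placeholders, not values derived from any kernel or density.

```python
def amise(h, n, m, c1, c2):
    """Assumed asymptotic error: variance term c1 / (n h^m) plus bias term c2 h^4."""
    return c1 / (n * h ** m) + c2 * h ** 4


def optimal_bandwidth(n, m, c1, c2):
    """Minimiser of the assumed AMISE, obtained by setting its derivative in h to 0."""
    return (m * c1 / (4 * c2 * n)) ** (1 / (m + 4))


# Hypothetical constants and dimension.
m, c1, c2 = 2, 1.0, 1.0

rows = []
for n in (100, 10_000, 1_000_000):
    h_opt = optimal_bandwidth(n, m, c1, c2)
    # Record the sample size, the bandwidth and n * h^m, which is proportional
    # to the expected number of sample points falling in a neighborhood.
    rows.append((n, h_opt, n * h_opt ** m))
```

Under this assumed form the bandwidth is of order n^(-1/(m+4)): it shrinks as n grows, while n h^m still increases. Neighborhoods therefore become smaller but contain more and more sample points, which is in line with the remark that inconclusive areas become negligible for large samples.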

Discussion

We have presented a new way to calculate the neighborhood size in neighborhood based rough sets. A question arises: does it provide satisfactory results in practice?

It is well known that rough sets are widely used in attribute selection [4, 10]. The attribute selection in rough sets focuses on preservation of certain knowledge; we delete attributes as long as the lower approximations of all classes remain unchanged.
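The selection criterion can be sketched as a greedy backward elimination. The Python code below is a simplified illustration for categorical attributes; the elimination order, the toy data and the function names are assumptions of this sketch, and it does not reproduce the neighborhood based experiments reported next.

```python
from collections import defaultdict


def lower_approximations(rows, labels, attrs):
    """Lower approximation of every class, using only the attributes in `attrs`."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)

    lower = defaultdict(set)
    for members in blocks.values():
        classes = {labels[i] for i in members}
        if len(classes) == 1:  # indiscernible objects all share one class
            lower[classes.pop()].update(members)
    return {k: frozenset(v) for k, v in lower.items()}


def backward_elimination(rows, labels):
    """Drop attributes one by one while all lower approximations stay unchanged."""
    attrs = list(range(len(rows[0])))
    reference = lower_approximations(rows, labels, attrs)
    for a in list(attrs):
        trial = [b for b in attrs if b != a]
        if lower_approximations(rows, labels, trial) == reference:
            attrs = trial
    return attrs


# Hypothetical data: attribute 0 determines the class, attribute 1 is noise.
rows = [(0, 0), (0, 1), (1, 0), (1, 1), (0, 0)]
labels = [0, 0, 1, 1, 0]
selected = backward_elimination(rows, labels)
```

A greedy pass of this kind depends on the order in which attributes are tried and is not guaranteed to find a smallest preserving subset.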

We have run a series of experiments applying the attribute selection using neighborhood based rough sets together with the calculated bandwidths. Unfortunately, the results were not satisfactory. First, we simulated data with normal distribution to fulfill the assumption from the previous subsection. We have noticed that for lower dimensions, both the $h_e$-neighborhood and the $h_u$-neighborhood are too wide, meaning that they cover a large amount of data. Consequently, the lower approximations obtained with them consist of a low percentage of data, which is unrealistic. With higher dimensions, we observed the opposite problem: the neighborhoods are too narrow, which leads to the lower approximation containing almost all data, which is also unrealistic. We can conclude that the naive approach of estimating the PDF and searching for the optimal bandwidth is not the best idea. The reason for the failure, even under the normality assumption, may lie in the fact that the optimal bandwidths are mainly useful in the following cases.

  • The number of objects in the sample is significantly larger than the number of attributes since the bandwidth optimality is asymptotic.

  • The MISE error is calculated using the $L^2$ norm (the integral of the squared difference). Our interest is to get the optimal bandwidth for the level set where the PDF is equal to 0. The $L^2$ convergence does not guarantee that the estimator also uniformly converges to the actual PDF [17]. Thus, the resulting bandwidth may be suitable for the higher density regions where the PDF is significantly larger than 0, while it may perform poorly in the regions where the PDF is close to 0.

We have also applied the procedure on real data for which the normality assumption does not hold. As soon as the assumption is not fulfilled, the results get worse. For example, we considered binary classification in mammographic data from UCI [1] for which Inline graphic and Inline graphic. In all cases, the lower approximations contained less than 7% of the data, meaning that fewer than 7% of the objects can be classified with certainty. Keeping in mind that the classification accuracy we obtained with SVM on this dataset is around 85%, a certainty level of 7% is unrealistic.

To overcome the limitations of the theoretical bandwidth selection, we identify the following options for future integration of rough sets, KDE and statistics in general.

  • Data driven estimation. The calculation of bandwidths may be data driven. There is also a statistical theory on how to calculate bandwidths based on data (again [19]). Data driven bandwidths will help us to overcome any a priori assumptions on the distribution of data.

  • Robust approaches. Having 0 probability regions is a strong assumption which usually does not coincide with reality. Mostly, numerical data exhibit rare events, which may occur in the training data and/or during the prediction process. Having the assumption that data lie in a bounded region may be misleading in many cases and it can produce bad results. The 0 probability regions can be eliminated by applying robust approaches similar to Variable Precision Rough Sets (VPRS).

  • Direct level set estimation. The bandwidth calculation needs to be more adjusted to the problem of the level set estimation, rather than to the PDF estimation. After we identify the regions of interest, we have to set up the optimization problem to get the best possible (or close to the best) bandwidth for that particular case.

  • Different estimators than KDE. We can try to use other estimators for level sets, besides KDE. The nearest neighbor based estimator can give interesting results [14].

  • Integration with SVM. Do we have to use densities to estimate the approximations defined in (1)? We showed that the estimation of the RV approximations (1) boils down to the estimation of level sets. We may explore the relation between SVM and level set estimation as has been done in [11, 16, 22]. On the other hand, there is a direct correspondence between principles of rough sets and SVM. The applications of rough sets in binary classification divide the domain into three sets, two certain regions for each class and one boundary region. SVM is doing something similar where it trains two margins which divide the space similarly as the rough sets: one boundary region and two regions for two classes. Thus, using the similarities between rough sets and SVM, we can try to integrate them in order to achieve better results.

Conclusion

We presented a new view on the definition of rough sets for the case when data are not necessarily categorical. From the statistical point of view, the calculation of rough set approximations is basically the estimation of the unknown RV (random value) approximations dependent on random variables that generate data. Such estimation under certain conditions (i.e., using radial kernels with bounded support) is equivalent to the definition of neighborhood based rough sets. We also showed a simple way to calculate the neighborhood size using statistics. Moreover, we discussed several options for future research on the integration of rough sets and statistics. Of course, for each of the proposals it should be studied whether it can be tailored to the main applications of rough sets: rule induction and attribute selection.

Acknowledgements

This work was supported by the Odysseus program of the Research Foundation-Flanders.

Contributor Information

Rafael Bello, Email: rbellop@uclv.edu.cu.

Duoqian Miao, Email: dqmiao@tongji.edu.cn.

Rafael Falcon, Email: rfalcon@ieee.org.

Michinori Nakata, Email: nakatam@ieee.org.

Alejandro Rosete, Email: rosete@ceis.cujae.edu.cu.

Davide Ciucci, Email: davide.ciucci@unimib.it.

Marko Palangetić, Email: marko.palangetic@ugent.be.

Chris Cornelis, Email: chris.cornelis@ugent.be.

Salvatore Greco, Email: salgreco@unict.it.

Roman Słowiński, Email: roman.slowinski@cs.put.poznan.pl.

References

  • 1. Asuncion, A., Newman, D.: UCI machine learning repository (2007)
  • 2. Cadre B. Kernel estimation of density level sets. J. Multivar. Anal. 2006;97(4):999–1023. doi: 10.1016/j.jmva.2005.05.004.
  • 3. Chen YC, Genovese CR, Wasserman L. Density level sets: asymptotics, inference, and visualization. J. Am. Stat. Assoc. 2017;112(520):1684–1696. doi: 10.1080/01621459.2016.1228536.
  • 4. Choubey, S.K., Deogun, J.S., Raghavan, V.V., Sever, H.: A comparison of feature selection algorithms in the context of rough classifiers. In: Proceedings of IEEE 5th International Fuzzy Systems, vol. 2, pp. 1122–1128. IEEE (1996)
  • 5. Dubois D, Prade H. Rough fuzzy sets and fuzzy rough sets. Int. J. Gen. Syst. 1990;17(2–3):191–209. doi: 10.1080/03081079008935107.
  • 6. Greco S, Matarazzo B, Słowiński R. Rough membership and Bayesian confirmation measures for parameterized rough sets. In: Ślęzak D, Wang G, Szczuka M, Düntsch I, Yao Y, editors. Rough Sets, Fuzzy Sets, Data Mining, and Granular Computing; Heidelberg: Springer; 2005. pp. 314–324.
  • 7. Grzymala-Busse JW, Stefanowski J. Three discretization methods for rule induction. Int. J. Intell. Syst. 2001;16(1):29–38. doi: 10.1002/1098-111X(200101)16:1<29::AID-INT4>3.0.CO;2-0.
  • 8. Grzymala-Busse JW, Werbrouck P. On the best search method in the LEM1 and LEM2 algorithms. In: Orłowska E, editor. Incomplete Information: Rough Set Analysis. Heidelberg: Springer; 1998. pp. 75–91.
  • 9. Hu Q, Yu D, Liu J, Wu C. Neighborhood rough set based heterogeneous feature subset selection. Inf. Sci. 2008;178(18):3577–3594. doi: 10.1016/j.ins.2008.05.024.
  • 10. Jensen, R.: Rough set-based feature selection: a review. In: Rough Computing: Theories, Technologies and Applications, pp. 70–107. IGI Global (2008)
  • 11. Kloft M, Nakajima S, Brefeld U. Feature selection for density level-sets. In: Buntine W, Grobelnik M, Mladenić D, Shawe-Taylor J, editors. Machine Learning and Knowledge Discovery in Databases; Heidelberg: Springer; 2009. pp. 692–704.
  • 12. Kulczycki P. Kernel estimators in industrial applications. In: Prasad B, editor. Soft Computing Applications in Industry. Heidelberg: Springer; 2008. pp. 69–91.
  • 13. Muller HG, et al. Smooth optimum kernel estimators of densities, regression curves and modes. Ann. Stat. 1984;12(2):766–774. doi: 10.1214/aos/1176346523.
  • 14. Orava J. K-nearest neighbour kernel density estimation, the choice of optimal k. Tatra Mt. Math. Publ. 2011;50(1):39–50.
  • 15. Pawlak Z. Rough sets. Int. J. Comput. Inf. Sci. 1982;11(5):341–356. doi: 10.1007/BF01001956.
  • 16. Rakotomamonjy, A., Davy, M.: One-class SVM regularization path and comparison with alpha seeding. In: ESANN, pp. 271–276. Citeseer (2007)
  • 17. Rudin W. Real and Complex Analysis. New York: Tata McGraw-Hill Education; 2006.
  • 18. Scott DW. Multivariate Density Estimation: Theory, Practice, and Visualization. Hoboken: Wiley; 2015.
  • 19. Silverman BW. Density Estimation for Statistics and Data Analysis. Abingdon: Routledge; 2018.
  • 20. Singh A, Scott C, Nowak R, et al. Adaptive Hausdorff estimation of density level sets. Ann. Stat. 2009;37(5B):2760–2782. doi: 10.1214/08-AOS661.
  • 21. Slowinski R, Vanderpooten D. A generalized definition of rough approximations based on similarity. IEEE Trans. Knowl. Data Eng. 2000;12(2):331–336. doi: 10.1109/69.842271.
  • 22. Steinwart, I., Hush, D., Scovel, C.: Density level detection is classification. In: Advances in Neural Information Processing Systems, pp. 1337–1344 (2005)
  • 23. Wand MP, Jones MC. Kernel Smoothing. Boca Raton: Chapman and Hall/CRC; 1994.
  • 24. Wȩglarczyk, S.: Kernel density estimation and its application. In: ITM Web of Conferences, vol. 23, p. 00037. EDP Sciences (2018)
  • 25. Yao Y. Decision-theoretic rough set models. In: Yao JT, Lingras P, Wu W-Z, Szczuka M, Cercone NJ, Ślȩzak D, editors. Rough Sets and Knowledge Technology; Heidelberg: Springer; 2007. pp. 1–12.
  • 26. Yao Y, Yao B. Covering based rough set approximations. Inf. Sci. 2012;200:91–107. doi: 10.1016/j.ins.2012.02.065.
  • 27. Ziarko W. Variable precision rough set model. J. Comput. Syst. Sci. 1993;46(1):39–59. doi: 10.1016/0022-0000(93)90048-2.
  • 28. Ziarko W. Probabilistic rough sets. In: Ślęzak D, Wang G, Szczuka M, Düntsch I, Yao Y, editors. Rough Sets, Fuzzy Sets, Data Mining, and Granular Computing; Heidelberg: Springer; 2005. pp. 283–293.

Articles from Rough Sets are provided here courtesy of Nature Publishing Group
