Survey on granularity clustering

Shifei Ding; Mingjing Du; Hong Zhu

doi:10.1007/s11571-015-9351-3

2015 Jul 29;9(6):561–572. doi: 10.1007/s11571-015-9351-3

PMCID: PMC4635389 PMID: 26557926

Abstract

With the rapid development of uncertain artificial intelligence and the arrival of the big data era, conventional clustering analysis and granular computing no longer satisfy the requirements of intelligent information processing. There is an essential relationship between granular computing and clustering analysis, so some researchers have tried to combine the two. Guided by the idea of granularity, they extend research on clustering analysis and seek the best clustering results with the help of the basic theories and methods of granular computing. The resulting granularity clustering methods have attracted increasing attention. This paper first summarizes the background of granularity clustering and the intrinsic connection between granular computing and clustering analysis, and then reviews the research status and the main methods of granularity clustering. Finally, we analyze existing problems and propose directions for further research.

Keywords: Granular computing, Clustering analysis, Granularity clustering

Introduction

With the development of computer technology and its applications, all walks of life produce massive high-dimensional data. The arrival of the big data era is both a challenge to human society and a good opportunity for its development. How to process these data and extract useful information from them has become an important topic in data processing. In the age of big data, data are characterized by high dimensionality, massive volume, uncertainty, incompleteness and imprecision. In this case it is difficult to obtain an exact solution, but an approximate solution can be obtained at a comparatively coarse granularity. Granular computing is the appropriate tool for studying granulation. As an important tool for dealing with uncertain information, granular computing is a new method that simulates human thinking and solves problems in computational intelligence. It is a multidisciplinary study that emerged from existing disciplines and fields of study. For example, granular computing can draw on results from cognitive science and cognitive psychology (Yao 2007). Cognitive science is the interdisciplinary scientific study of the mind and its processes (Posner 1989). While cognitive science focuses on information processing based on the object–attribute–relation model and concept algebra, granular computing explores a special type of information structure characterized by multiple levels of granularity. An examination of the scopes, goals and methodologies of cognitive informatics and granular computing suggests that there is a close relationship between the two fields (Yao 2009).

Granular computing covers theories, methodologies and techniques that make use of granules, and is a powerful tool for complex problem solving, massive data mining and information processing under uncertainty (Yao 2000; Zadeh 1996, 1997; Zhang and Zhang 1992; Miao 2011; Wang et al. 2011; Pedrycz 2013; Pedrycz et al. 2008). In recent years, granular computing has gradually developed its own philosophy, theories, methods and tools, and has given rise to many topics, including the idea of granularity, the logic of granularity, reasoning with granularity and problem solving with granularity (Miao et al. 2007; Chen 2006; Yao 2006; Yao 2008; Zhu et al. 2011; Ding et al. 2010). The research of Chinese scholars has laid a foundation for further work on granular computing and guides its development (Zhang and Zhang 2003; Liu and Li 2011; Li et al. 1995; Wang et al. 2009; Liu et al. 2008; Zhu et al. 2012). Granular computing models have unique advantages in resolving uncertainty and exploring the inner relationships among data. However, conventional granular computing models, which include fuzzy set theory, rough set theory and quotient space theory, have rather high space and time complexity (in fuzzy set theory the time complexity is determined by the membership function), so these models are inefficient in processing high-dimensional data.

In this situation, clustering analysis, as a method for granulating data objects, has attracted wide attention. Its time and space complexity is far lower than that of the typical granular computing models. In unsupervised learning, a clustering algorithm, whose goal is the granulation of data objects, automatically partitions data into several clusters based on the similarity of the data. Consequently, people can change the perspective and the level from which they view and solve problems. Clustering has long been an active research area. However, the either-or property of conventional clustering analysis has limited its application. Combining clustering analysis with granular computing is an effective solution. Because granular computing is fundamentally related to clustering analysis, many researchers have sought new solutions by combining the two. Granularity clustering is an active topic of current study and has become an important instrument for observing and solving problems. In many circumstances, people even use the idea and methods of clustering unconsciously.

The remainder of this paper is organized as follows: Sect. 2 analyzes the inevitability of combining clustering analysis with granular computing and discusses the principle of granularity in clustering; Sect. 3 introduces various methods and advances in granular clustering based on single granularity models and fusion granularity models; Sect. 4 introduces the applications of granular computing in subspace clustering; Sect. 5 concludes and proposes future research directions in this field.

Granular computing and clustering analysis

Essences of granular computing

Clustering is the task of grouping a set of objects in such a way that objects in the same group are more similar to each other than to those in other groups (Han and Micheline 2006). Clustering is an important part of research fields such as pattern recognition, data mining and machine learning, and plays a vital role in recognizing the intrinsic structure of data. Its intersection with other fields, as well as its own importance, has made it an active research topic. Conventional clustering has an either-or property: it is a hard partition that strictly assigns each object to exactly one cluster. But with the rapid development of the Internet and information systems, massive, high-dimensional, distributed, dynamic and complex data are produced. Such data are often incomplete, unreliable, inaccurate and inconsistent. It is very difficult to meet these demands with conventional clustering methods.

Granularity is a powerful tool for describing uncertain objects. Granular computing is a new concept and computing paradigm for information processing. It covers all theories, methods, techniques and tools related to granularity, and specializes in the intelligent processing of uncertain, incomplete, fuzzy and massive information.

Roughly speaking, on the one hand, granular computing is a superset of the theory of fuzzy information granulation, rough set theory, quotient space theory and interval computations. On the other hand, it is a subset of granular mathematics. More specifically, all theories and methods that group, classify or cluster data objects by any means when analyzing and solving problems belong to granular computing. Many scholars have identified the essential relation between granularity and clustering. From the granularity perspective, clustering can analyze and solve problems at a unified granularity. Applying the idea of granularity to clustering algorithms in suitable ways produces better results.

Due to the rise of granular computing, clustering analysis has been extended to the field of soft computing, which further raises its practical value and significance. Through transformations of granularity, data are clustered from different perspectives and at different levels. Both-and (soft) clustering therefore has theoretical principles and practical methods, and makes up for the deficiencies of conventional clustering. Research on granular computing in clustering, including fuzzy clustering, clustering based on rough sets, clustering based on quotient space and so on, has boomed. These approaches interpenetrate and complement one another, and are combined with neural networks, evolutionary computation and other soft computing methods. Granular computing is widely applied in the improvement of conventional clustering algorithms (Bai et al. 2009; Gang and Miao 2009), image processing (Liu et al. 2004a), biological evolutionary computation (Hao and Xie 2007), web page text clustering (Zhong 2004; Zheng et al. 2009; Zhang et al. 2009) and other fields (Fukushima et al. 2007). A unified granular clustering model is taking shape.

Principle of granularity in clustering

Granularity is a measure of granules, and granules are regarded as the primitive notion of granular computing. A granule is a set of elements that are drawn together by indistinguishability, similarity, proximity or functionality. From a philosophical point of view, Yager and Filev pointed out that “human beings have been developed a granular view of the world”, and “… objects with which mankind perceives, measures, conceptualizes and reasons are granular”. Granularity is omnipresent in all human activities (Leslie 1984). Granules exist at a particular level, and they are the objects of study at that level. Granular computing, based on a multi-level and multi-view structured idea, is inspired by human cognition and the human problem-solving process. Some problem-solving algorithms are designed around the concept of granularity. Granular computing is both a world view and a methodology for treating the objective world, and it is also a basis of problem solving. Yao provides three views of granular computing: it is a way of structured thinking, a method of structured problem solving, and a paradigm of information processing. Granular computing is viewed as an interdisciplinary study of computation in nature, society and science, characterized by structured thinking, structured problem solving and structured information processing with an underlying notion of multiple levels of granulation. It consists of all the theories, techniques and tools related to granularity.

There is an intrinsic relationship between granular computing and clustering analysis. Many scholars have studied the granular essence of clustering (Bu et al. 2002; Zhang et al. 2001; Chen et al. 2007; Wang 2006). The granules obtained from multi-level and multi-view partitions are measured by granularity. Following the granularity principle, we can divide a large, coarse-grained object into smaller, fine-grained objects. We can also combine small, fine-grained objects into a larger, coarse-grained object. The former is a process of classification, and the latter is a process of clustering. Because clustering is in essence unsupervised classification, the process of clustering is also a process of dividing. All clustering methods can therefore be described as granularity partitions.

From the viewpoint of granularity, all clustering algorithms can be treated in a unified way. Three main factors determine clustering results: the method of selecting cluster centers, the similarity function, and the similarity threshold. By changing the similarity function, the objects can be divided from different perspectives. Once the similarity function has been selected, changing the similarity threshold divides the objects at different levels. By changing the granularity, complex problems can be simplified. For example, at a coarse granularity, minor details are eliminated. To make a problem easier to solve, the size of the granules is changed by changing the similarity function and the similarity threshold.

The size of each cluster is described by granularity. The similarity function measures the similarity between two samples. The samples are clustered into classes by a specific similarity threshold, so that objects in one cluster are similar to each other and objects in different clusters are dissimilar. Clustering is essentially an equivalence relation that divides the objects into several classes. A class is a cluster, and the objects of each cluster are similar under the current threshold. The size of the threshold corresponds to the size of the granularity. If the threshold is bigger, the granules obtained by clustering are coarser: some valuable information is retained, and minor details are obscured. If the threshold is smaller, the granules obtained by clustering are finer, and too much information is retained. Ordered from coarse to fine, these equivalence relations form a partially ordered lattice structure. The size of the granularity keeps changing during clustering. First a proper similarity function is selected to ensure the correct perspective of granulation, then a suitable threshold is selected, and finally the clustering result is obtained.
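The effect of the threshold on granule size can be illustrated with a minimal Python sketch. It assumes, for illustration only, that the threshold is applied to a distance (so a bigger threshold gives coarser granules, as described above) and that the "similar under the threshold" relation is closed transitively so that it becomes an equivalence relation. The function name `threshold_clusters` and the sample data are hypothetical.

```python
from itertools import combinations

def threshold_clusters(points, distance, threshold):
    """Link points whose distance is <= threshold, then take connected
    components (the transitive closure) so the result is a partition."""
    parent = list(range(len(points)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        if distance(points[i], points[j]) <= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(points[i])
    return sorted(groups.values())

points = [1.0, 1.2, 1.5, 4.0, 4.3, 9.0]
dist = lambda a, b: abs(a - b)

fine = threshold_clusters(points, dist, 0.35)   # small threshold: finer granules
coarse = threshold_clusters(points, dist, 3.0)  # large threshold: coarser granules
print(fine)
print(coarse)
```

With the same distance function, only the threshold changes between the two calls, which corresponds to viewing the same data at two levels of granularity.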

After the data of a problem are granulated, the problem is described formally by a triple (X, F, Γ). X is the universe of the problem (the collection of objects under study). F is the attribute function F: X → Y, where Y is an attribute set. Γ denotes the structure of the universe, defined as the relationships among its elements. To deal with a complex and difficult problem, a simple, vague model is first abstracted so that a relatively coarse granularity space is formed. The model then treats sample elements of a similar nature as a whole, so that a new data element is generated. According to the partition into equivalence classes induced by an equivalence relation, a new universe [X] is produced, and the original problem is transformed into a problem at a new level, ([X], [F], [Γ]).
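As a rough illustration of the passage from (X, F) to ([X], [F]), the following Python sketch builds the quotient set from a key function that names the equivalence class of each element. The choice of [F] as the mean attribute value of a class is only one possible assumption made here for illustration, the structure Γ is left out, and the names `quotient`, `key` and `attribute` are hypothetical.

```python
from collections import defaultdict

def quotient(universe, attribute, key):
    """Return ([X], [F]): the equivalence classes of `universe` under
    'x ~ y iff key(x) == key(y)', and an attribute per class.
    Assumption: [F] is taken as the mean of F over the class."""
    classes = defaultdict(list)
    for x in universe:
        classes[key(x)].append(x)
    quotient_set = [tuple(members) for members in classes.values()]
    quotient_attr = {
        block: sum(attribute(x) for x in block) / len(block)
        for block in quotient_set
    }
    return quotient_set, quotient_attr

X = [1, 2, 3, 10, 11, 20]
# Two numbers are equivalent when they fall in the same decade.
classes, attrs = quotient(X, attribute=float, key=lambda x: x // 10)
print(classes)
print(attrs)
```

Each class of the quotient set is then handled as a single new element, which is the sense in which the problem moves to a coarser level.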

Suppose R is the set of all equivalence relations on the universe, and R₁ and R₂ are two of them. If xR₁y ⇒ xR₂y for any two elements x, y of the universe, we say that R₁ is more detailed (finer) than R₂. That is, given two equivalence relations corresponding to two different partitions, if every block of one partition is contained in a block of the other, the latter partition is coarser than the former. The former subdivides the latter, denoted R₂ < R₁. According to this principle, we can get a sequence of equivalence relations: R_n < R_{n-1} < ··· < R₂ < R₁ < R₀. R_n is the “biggest” (fuzziest) relation, whereas R₀ is the “smallest” (most detailed) relation. An n-level tree is thus obtained. All the leaf nodes make up the universe and represent the finest partition. On this basis, each layer from the bottom up is a partition of the universe. The root, which is the coarsest partition, puts all elements into one large collection. Clustering results are often expressed with a dendrogram (genealogy chart). The bigger the similarity threshold, the fuzzier the differences between sample points and the fewer the clusters; the smaller the threshold, the more precise the differences between sample points and the more clusters are obtained. This also corresponds to a tree structure, which is why there is a natural similarity between clustering and granularity.
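The ordering between equivalence relations can be checked directly on their partitions. The sketch below, in Python, assumes that each relation is given as a list of disjoint blocks over the same universe; the helper name `is_finer` and the four-element example are hypothetical.

```python
def is_finer(p_fine, p_coarse):
    """True if every block of p_fine lies inside one block of p_coarse,
    i.e. x R1 y implies x R2 y for the corresponding relations."""
    coarse_sets = [set(block) for block in p_coarse]
    return all(
        any(set(block) <= c for c in coarse_sets)
        for block in p_fine
    )

# A chain from the finest partition R0 (singletons) to the coarsest R3.
R0 = [{"a"}, {"b"}, {"c"}, {"d"}]
R1 = [{"a", "b"}, {"c"}, {"d"}]
R2 = [{"a", "b"}, {"c", "d"}]
R3 = [{"a", "b", "c", "d"}]

chain = [R0, R1, R2, R3]
is_chain = all(is_finer(chain[k], chain[k + 1]) for k in range(len(chain) - 1))
print(is_chain)
```

Read from R3 down to R0, the chain is the tree described above: the root holds the whole universe and the leaves are the single elements.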

To cluster automatically and effectively, we need not only a qualitative description of the granularity principle in clustering but also a way to find the right granularity size for quantitative analysis. We first define two operations on equivalence relations.

Let R₁ and R₂ be two equivalence relations on the universe X:

Define the “AND” of the two relations as R₁ ⊕ R₂. The partition it induces is subdivided by both R₁ and R₂, and any other equivalence relation R′ whose partition is subdivided by both R₁ and R₂ is also subdivided by R₁ ⊕ R₂. So R₁ ⊕ R₂ is the most detailed partition that is subdivided by both R₁ and R₂.
Define the “PRODUCT” of the two relations as R₁ ⊗ R₂. It subdivides both R₁ and R₂, and any other equivalence relation R′ that subdivides both R₁ and R₂ also subdivides R₁ ⊗ R₂. So R₁ ⊗ R₂ is the fuzziest partition that subdivides both R₁ and R₂.
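The two operations can be sketched in Python under the usual lattice-theoretic reading, which is an interpretation of the definitions above rather than code from the cited work: R₁ ⊗ R₂ relates x and y when both relations do (blockwise intersection), and R₁ ⊕ R₂ is the transitive closure of "related under R₁ or under R₂". Each relation is assumed to be given as a dictionary from element to class label; the function names `product` and `join` are hypothetical.

```python
def product(labels1, labels2):
    """R1 (x) R2: x ~ y iff x ~ y under both relations."""
    blocks = {}
    for x in labels1:
        blocks.setdefault((labels1[x], labels2[x]), set()).add(x)
    return sorted(sorted(b) for b in blocks.values())

def join(labels1, labels2):
    """R1 (+) R2: transitive closure of 'x ~ y under R1 or under R2'."""
    parent = {x: x for x in labels1}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for labels in (labels1, labels2):
        first_seen = {}
        for x, lab in labels.items():
            if lab in first_seen:
                parent[find(x)] = find(first_seen[lab])
            else:
                first_seen[lab] = x

    blocks = {}
    for x in parent:
        blocks.setdefault(find(x), set()).add(x)
    return sorted(sorted(b) for b in blocks.values())

R1 = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 2}
R2 = {"a": 0, "b": 0, "c": 1, "d": 1, "e": 2, "f": 3}
print(product(R1, R2))  # finer than both R1 and R2
print(join(R1, R2))     # coarser than both R1 and R2
```

In this example the product splits the universe into more, smaller granules than either input, while the join merges every pair of granules that share an element through a chain of R₁ and R₂ classes.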

The essential idea of clustering based on granular computing is as follows. When solving a specific clustering problem, an equivalence relation R₀ is first initialized according to the needs of the problem; this relation divides the problem space into several clusters (granules). Here Δ₀ denotes the granularity size of these granules, and S₀ denotes the quotient space. If the granularity of the clustering is suitable, analyzing the problem in the quotient space S₀ gives a satisfactory result; otherwise:

If the granularity size is a little too big, we should find a more detailed equivalence relation $R_{0}'$ and let $R_{1} = R_{0} \otimes R_{0}'$. The equivalence relation R₁ serves as a new partition rule, from which we get the new granularity size Δ₁. If the granularity is still too big (fuzzy), we repeat the steps above until a satisfactory result has been achieved.
If the granularity size is a little too small, we should find a fuzzier equivalence relation $R_{0}'$ and let $R_{1} = R_{0} \oplus R_{0}'$. The equivalence relation R₁ serves as a new partition rule, from which we get the new granularity size Δ₁. If the granularity is still too small (detailed), we repeat the steps above until a satisfactory result has been achieved.

Depending on the specific problem, refining and coarsening can be mixed; eventually we reach a proper granularity size and obtain a satisfactory approximate solution.
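The refine-or-coarsen loop above can be sketched in Python. This is only an illustration under stated assumptions: instead of composing equivalence relations, it adjusts a distance threshold on one-dimensional data, and "satisfactory" is taken to mean that the number of granules falls in a hypothetical target range [min_k, max_k]. The names `gap_clusters` and `tune_granularity` are not from the source.

```python
def gap_clusters(values, threshold):
    """1-D granulation: consecutive sorted values stay in one granule
    while the gap to the previous value is <= threshold."""
    ordered = sorted(values)
    clusters = [[ordered[0]]]
    for v in ordered[1:]:
        if v - clusters[-1][-1] <= threshold:
            clusters[-1].append(v)
        else:
            clusters.append([v])
    return clusters

def tune_granularity(values, threshold, min_k, max_k, factor=2.0, max_iter=50):
    """Refine when the granules are too coarse, coarsen when they are
    too fine, until min_k <= number of clusters <= max_k."""
    for _ in range(max_iter):
        clusters = gap_clusters(values, threshold)
        if len(clusters) < min_k:      # too coarse: refine
            threshold /= factor
        elif len(clusters) > max_k:    # too fine: coarsen
            threshold *= factor
        else:
            return threshold, clusters
    raise RuntimeError("no satisfactory granularity found")

values = [1.0, 1.2, 1.5, 4.0, 4.3, 9.0]
threshold, clusters = tune_granularity(values, threshold=8.0, min_k=3, max_k=3)
print(threshold, clusters)
```

A fixed step factor can oscillate around the target range, which is why the loop is bounded by `max_iter`; a bisection on the threshold would be a natural refinement.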

Granular computing, as a new method for solving complex problems, is widely applied in clustering analysis.

In recent years, algorithms that combine clustering with granularity theory have attracted widespread attention, because granular thinking is inherent in clustering. Bargiela and Pedrycz carried out systematic research on granular computing methods and described a granular world in the sense of clustering (Bargiela and Pedrycz 2003; Bargiela and Pedrycz 2003; Pedrycz and Bargiela 2012; Pedrycz and Keun 2006). Xie et al. (2005) proposed a fuzzy clustering algorithm based on granularity analysis theory (the 3M algorithm). Su et al. (2006) introduced granular computing into clustering analysis and studied a knowledge acquisition method based on information granularity. Pu et al. proposed a new classification algorithm based on the theory of information granularity. Zhang et al. (2001) applied granular computing based on visual simulation under the guidance of clustering analysis. An et al. (2003) proposed a clustering algorithm based on information granularity and rough sets. Combining clustering based on granular computing with ant colony algorithms and neural networks is an active research topic.

Guided by granular thinking, granularity clustering uses theories, methodologies and tools such as fuzzy set theory, rough set theory and quotient space theory to extend the study of clustering analysis, in order to find an optimal “granule”, obtain the best clustering result and solve the problem better. Granular computing can also be implemented with neural networks, evolutionary algorithms, particle swarm optimization, immune mechanisms, support vector machines, and so on.

Advantages of granularity clustering (He et al. 2007; Zhang et al. 2005)

In the age of big data, it can be difficult to obtain an exact solution when practical problems are characterized by incomplete, uncertain, imprecise or vague information. An approximate solution can then be obtained by coarse-grained clustering.
When facing massive high-dimensional data, the data can be preprocessed by coarse-grained clustering.
Sometimes an exact solution is not necessary, and an approximate solution obtained by coarse-grained clustering is sufficient.
When the problem is too detailed, it can be abstracted and simplified with granular computing. Clustering analysis is carried out after unnecessary details are removed.
Different clustering methods are easy to integrate on the basis of granularity. Combining them with soft computing algorithms, such as neural networks and evolutionary computation, improves the performance of the algorithm.
Selecting an appropriate initial granularity reduces running time and storage space and improves the correctness of clustering.
The incompatibility between clustering results and prior knowledge is eliminated.

Granularity clustering theories

Fuzzy clustering analysis

Typical fuzzy clustering methods include methods based on partitioning, methods based on the transitive closure of fuzzy relations, methods based on similarity relations and fuzzy relations (including aggregation and splitting methods), convex decomposition methods based on the data set, maximal tree methods based on fuzzy graphs, dynamic programming, and so on.

The fuzzy clustering method based on partitioning, also called the method based on the objective function, is simple and widely used. It can be transformed into an optimization problem, solved with the nonlinear programming theory of classical mathematics, and implemented easily. To obtain a fuzzy partition, this method starts from the hard C-means algorithm and introduces either a weighting exponent on the membership function or information entropy into the objective function.

1. Introducing a weighting exponent of the membership function

Among clustering methods based on the objective function, the fuzzy C-means method (FCM), which is inspired by the hard C-means algorithm, is the most widely used. Dunn extended the sum-of-squared-errors function J₁ to the weighted sum-of-squared-errors function J₂. Bezdek introduced a parameter m, extended J₂ to an infinite family J_m of weighted objective functions, proposed the alternating optimization (AO) algorithm and thereby established the FCM algorithm, from which FGFEM, PFCM and PCM evolved. J_m is closely linked with the Hilbert space structure of R^s, so it can be studied with a wider range of mathematical theories. The objective function for fuzzy clustering with a weighting exponent on the membership function is as follows:

$$\begin{cases} J_{m} = \sum_{i=1}^{c} \sum_{k=1}^{n} (\mu_{ik})^{m} \cdot D(x_{k}, p_{i}) + \zeta \\ \text{s.t. } f(\mu_{ik}) \in C \end{cases}$$

where X = {x₁, x₂, …, x_n}, x_k ∈ R^s, and n denotes the number of data items. The objective function for fuzzy clustering is determined by the parameter set {U, D(•), P, m, X}, where $U = [\mu_{ik}]_{c \times n}$ denotes the membership matrix, P = {p₁, p₂, …, p_c} denotes the set of cluster centers with p_i ∈ R^s, and D(•) measures the dissimilarity between a data item and a cluster center. ζ is a penalty term, f(μ_ik) ∈ C is a constraint, and m is the weighting exponent.

2. Introducing information entropy

Information entropy is introduced into the hard C-means algorithm, which yields fuzzy clustering algorithms in the sense of maximum entropy. Such algorithms take many forms. As one of them, the objective function of maximum-entropy inference (MEI) is defined by:

$$J = \sum_{j=1}^{c} \sum_{i=1}^{n} u_{ij} d_{ij} + \lambda^{-1} \sum_{j=1}^{c} \sum_{i=1}^{n} u_{ij} \log u_{ij}$$
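The two objective functions above lead to different membership updates inside the same alternating optimization loop. The Python sketch below makes several assumptions that the text does not state: the dissimilarity is the squared Euclidean distance, the penalty term ζ is zero, the constraint is that the memberships of each sample sum to 1, and λ > 0. Under these assumptions the minimizers are the standard FCM rule, with memberships proportional to D^(-1/(m-1)), and the maximum-entropy rule, with memberships proportional to exp(-λd). Memberships are stored one row per sample, and all function names are hypothetical.

```python
import math
import random

def fcm_update(dist_row, m):
    """Memberships of one sample for J_m (squared distances to centers)."""
    if min(dist_row) == 0.0:                  # sample sits exactly on a center
        hits = [1.0 if d == 0.0 else 0.0 for d in dist_row]
        total = sum(hits)
        return [h / total for h in hits]
    inv = [d ** (-1.0 / (m - 1.0)) for d in dist_row]
    total = sum(inv)
    return [v / total for v in inv]

def entropy_update(dist_row, lam):
    """Memberships of one sample for the maximum-entropy objective J."""
    shift = min(dist_row)                     # subtract minimum for stability
    w = [math.exp(-lam * (d - shift)) for d in dist_row]
    total = sum(w)
    return [v / total for v in w]

def alternate(data, c, update, power, iters=100, tol=1e-8, seed=0):
    """Alternating optimization: memberships from centers, then centers
    from memberships raised to `power` (m for FCM, 1 for maximum entropy).
    Assumes the samples in `data` are distinct tuples of equal length."""
    rng = random.Random(seed)
    centers = rng.sample(data, c)             # initial centers: c data points
    u = []
    for _ in range(iters):
        u = []
        for x in data:
            dist_row = [sum((a - b) ** 2 for a, b in zip(x, p)) for p in centers]
            u.append(update(dist_row))
        new_centers = []
        for j in range(c):
            w = [row[j] ** power for row in u]
            total = sum(w)
            new_centers.append(tuple(
                sum(wi * x[d] for wi, x in zip(w, data)) / total
                for d in range(len(data[0]))
            ))
        moved = max(abs(a - b)
                    for p, q in zip(centers, new_centers)
                    for a, b in zip(p, q))
        centers = new_centers
        if moved < tol:
            break
    return centers, u

data = [(0.0,), (0.1,), (0.2,), (5.0,), (5.1,), (5.2,)]
fcm_centers, fcm_u = alternate(data, 2, lambda r: fcm_update(r, 2.0), power=2.0)
mei_centers, mei_u = alternate(data, 2, lambda r: entropy_update(r, 1.0), power=1.0)
print(sorted(fcm_centers), sorted(mei_centers))
```

The weighting exponent m and the parameter λ play the same role in the two variants: both control how soft the resulting partition is.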

In recent years, fuzzy clustering has developed further. To address the shortcomings of fuzzy kernel clustering, Zhang et al. (2013) proposed a robust fuzzy kernel clustering algorithm. Pedrycz et al. (2010) introduced a knowledge-guided scheme of fuzzy clustering (fuzzy clustering with viewpoints) in which domain knowledge is represented in the form of so-called viewpoints. The viewpoints are represented either in a plain numeric format (when there is a high level of specificity about the perspective from which the data need to be analyzed) or through information granules (which reflect a more relaxed way of expressing views of the data). The experimental results show how clustering with viewpoints enhances fuzzy models and decision-making mechanisms, in the sense that the resulting constructs reflect the preferences and requirements present in the modeling environment.

Spectral clustering algorithms have been used successfully in pattern recognition and computer vision, and fuzzy spectral clustering is an active topic of current research. Korenblum and Shalloway extended spectral clustering to fuzzy clustering by introducing the principle of uncertainty minimization. However, this posed a challenging non-convex global optimization problem, which they solved by a brute-force technique unlikely to scale to data sets with more than O(10²) items. White and Shalloway developed an efficient uncertainty minimization method for fuzzy spectral clustering (White and Shalloway 2009). To handle larger data sets, they apply multiple geometric representations to uncertainty minimization. Uncertainty minimization can be applied to a wide variety of existing hard spectral clustering approaches, thus transforming them into fuzzy methods.

The performance of unsupervised spectral clustering methods is usually affected by uncertain parameters. Using the underlying structure of a general spectral clustering method, Celikyilmaz proposed a new soft-link spectral clustering algorithm that identifies clusters with a fuzzy k-nearest neighbor approach (Celikyilmaz 2009). He constructs a soft weight matrix of a graph by identifying the upper and lower boundaries of the parameters of the similarity function, specifically the fuzzifier parameter (fuzziness) of the fuzzy k-nearest neighbor algorithm. By varying these parameters, the algorithm allows perturbations of the graph Laplacian during the learning stage.

In spectral clustering it is difficult to choose a suitable scaling parameter for the Gaussian kernel similarity measure. Utilizing the prototypes and the partition matrix obtained by the fuzzy c-means clustering algorithm, Zhao et al. developed a fuzzy similarity measure for spectral clustering (FSSC) (Zhao et al. 2011). Furthermore, they introduced the K-nearest neighbor sparse strategy into FSSC and applied the sparse FSSC to texture image segmentation. Mirkin and Nascimento (2012) proposed an additive spectral method for fuzzy clustering. The method operates on a clustering model that is an extension of the spectral decomposition of a square matrix. The computation proceeds by extracting clusters one by one, which makes the spectral approach quite natural. The iterative extraction of clusters also makes it possible to derive several stopping rules for the procedure. The method applies to several types of relational data under different normalizations. It was compared experimentally with several classic and recent techniques and shown to be competitive. Zhang et al. (2013) proposed a classification of human operator functional state based on fuzzy clustering. The fuzzy c-means (FCM) algorithm was employed to classify Operator Functional State (OFS) time series data, and both the instantaneous OFS class label and the maximum degree of membership in that class were given.

Rough clustering

Rough sets are applied in clustering analysis in two main ways:

1. For data preprocessing

In the age of big data, data are characterized by uncertainty, noise, redundancy, diversity and so on. Rough sets handle these problems well. Normally, before clustering, rough sets can be used to fill in missing data, discretize data, resolve inconsistent data by logical reasoning, and remove redundant attributes and redundant data. These steps ensure that the clustering algorithm runs smoothly and, at the same time, improve its efficiency. Many scholars regard rough sets as a data preprocessing method for clustering analysis. To prepare for clustering analysis, they derive some parameters from rough set concepts; for example, a data reduction algorithm is implemented by using clustering based on rough set theory (Yang and Li 2004).

2. Clustering with the concepts and the properties of rough set

This method uses the lower and upper approximations to handle objects whose cluster membership is ambiguous. Clustering is thereby extended to a soft partition.
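One common way to realize lower and upper approximations in clustering is a distance-ratio rule in the style of rough k-means, which is swapped in here for illustration and is not taken from the methods reviewed in this section. In the Python sketch below, an object goes to the lower approximation of its nearest cluster when no other center is nearly as close, and otherwise only to the upper approximations of all nearby clusters. The one-dimensional data, the `ratio` parameter and the function name `rough_assign` are assumptions.

```python
def rough_assign(points, centers, ratio=1.3):
    """Assign 1-D points to lower/upper approximations of clusters.
    A point is ambiguous when another center is within `ratio` times
    the distance to its nearest center."""
    lower = [[] for _ in centers]
    upper = [[] for _ in centers]
    for x in points:
        d = [abs(x - c) for c in centers]
        nearest = min(range(len(centers)), key=d.__getitem__)
        close = [j for j in range(len(centers))
                 if j != nearest and d[j] <= ratio * d[nearest]]
        if close:
            # Ambiguous object: boundary region of several clusters.
            for j in [nearest] + close:
                upper[j].append(x)
        else:
            # Certain object: lower approximation (and hence also upper).
            lower[nearest].append(x)
            upper[nearest].append(x)
    return lower, upper

points = [0.0, 1.0, 4.8, 9.0, 10.0]
centers = [0.5, 9.5]
lower, upper = rough_assign(points, centers)
print(lower)
print(upper)
```

The point 4.8 lies almost midway between the two centers, so it appears in both upper approximations and in neither lower approximation, which is the soft partition described above.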

There are many new developments in the application of rough set clustering (Pawlak 1982). Herawan et al. proposed the maximum dependency attributes (MDA) technique based on rough sets (Herawan et al. 2010). By taking into account the dependency among the attributes of the database, it is able to handle uncertainty in the clustering process. The MDA technique has higher accuracy and lower computational complexity than the bi-clustering, total roughness (TR) and min–min roughness (MMR) techniques.

Liu et al. (2004b) applied rough set theory to clustering analysis in knowledge discovery. They define the local indiscernibility relation, the local and total indiscernibility degrees between two objects, the indiscernibility degree between two clusters, and the integrated approximation rate of a clustering result. Based on these definitions, they propose a rough-set-based hierarchical clustering algorithm that automatically adjusts its parameter in order to obtain a better result.

Malyszko and Stepaniuk (2010) proposed a new multilevel rough entropy evolutionary threshold algorithm (MRET) that operates on a multilevel domain. Combining entropy-based thresholding with rough sets results in the rough entropy thresholding algorithm, which is applied to image segmentation. The image is divided into distinct, disjoint and homogeneous regions. MRET-based segmentations are of high quality, comparable with and often better than segmentations based on k-means clustering. The MRET algorithm is suitable for specific segmentation tasks that seek solutions incorporating spatial data features with particular characteristics. In 2011, Malyszko and Stepaniuk also proposed rough entropy hierarchical agglomerative clustering for image segmentation (Malyszko and Stepaniuk 2011). Here the algorithmic rough entropy framework is applied in the hierarchical clustering setting. During cluster merging, the quality of the resulting merges is assessed on the basis of rough entropy. Using the rough entropy measure to evaluate cluster quality takes inherent uncertainty, vagueness and imprecision into account.

Yanto et al. (2011) proposed a clustering method that uses a variable precision rough set model. It is applied to group data objects with non-numerical attributes and can process noisy data. In 2011, Chen et al. proposed an interval set clustering based on decision theory (Chen and Miao 2011). The lower and upper approximations in the proposed algorithm are hierarchical and are constructed as outer-level and inner-level approximations. The uncertainty of objects in the outer-level upper approximation is described by the assignment of objects among different clusters. Accordingly, the ambiguity of objects in the inner-level upper approximation is represented by local uniform factors of the objects. In addition, by using a rough cluster quality index to evaluate the clustering, interval set clustering can be improved to obtain a satisfactory result with the optimal number of clusters and optimal parameter values.

Clustering analysis based on quotient space

Zhang et al. proposed quotient space theory and introduced its concepts into clustering. In 2006, they studied clustering within a granular computing framework, namely quotient space theory (Zhang and Zhang 2006). From the granular computing point of view, all these categories of clustering can be represented by a hierarchical structure in quotient spaces. From the hierarchical structures, several new characteristics of clustering can be obtained, which may provide a new direction for further study of clustering.

Clustering based on quotient space is a process of constructing different quotient sets [X] on the universe (X, f, T). Selecting the appropriate granularity is a key problem, but the appropriate granularity usually cannot be found immediately in practice. To determine it, we need to analyze and compare the results repeatedly. For example, to deal with a specific clustering problem, an equivalence relation is assumed first, and a preliminary clustering result is obtained according to this relation. If the result is satisfactory, clustering ends. If the result is too fine, a new partition of the universe, and hence a new quotient set, is obtained by the combination method. If the result is too coarse, a new partition of the universe, and hence a new quotient set, is obtained by the decomposition method. These steps are repeated until a satisfactory result is achieved (Yan et al. 2008).
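The iterative adjustment of granularity described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the algorithm of Yan et al.: the equivalence relation is taken to be the transitive closure of "distance at most d", and the predicates `too_fine` and `too_coarse` stand in for whatever satisfaction criterion the analyst chooses. All names are hypothetical.

```python
import math


def quotient_set(points, d):
    """Partition point indices by the transitive closure of 'distance <= d'."""
    parent = list(range(len(points)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= d:
                parent[find(i)] = find(j)

    blocks = {}
    for i in range(len(points)):
        blocks.setdefault(find(i), []).append(i)
    return sorted(blocks.values())


def adjust_granularity(points, d, too_fine, too_coarse, step=1.5, max_iter=50):
    """Coarsen or refine the relation until neither predicate fires."""
    blocks = quotient_set(points, d)
    for _ in range(max_iter):
        if too_fine(blocks):
            d *= step      # combination: merge classes into a coarser quotient set
        elif too_coarse(blocks):
            d /= step      # decomposition: split classes into a finer quotient set
        else:
            break
        blocks = quotient_set(points, d)
    return blocks, d


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (20, 0)]
blocks, d = adjust_granularity(
    points,
    d=0.5,
    too_fine=lambda b: len(b) > 3,
    too_coarse=lambda b: len(b) < 3,
)
```

Starting from d = 0.5, every point forms its own class, so the relation is coarsened twice before three classes remain. A fixed multiplicative step can oscillate between "too fine" and "too coarse" on other data, which is why the loop is bounded by `max_iter`.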

Zhang et al. (1999) proposed an alternative covering design algorithm for multi-layer neural networks. Building on it, Zhao et al. (2005) proposed a covering clustering algorithm, which handles classification problems on large-scale data well and is also effective for data clustering. Li and Ding (2013) applied granularity clustering based on quotient space to training the parameters of neural networks. Yan et al. (2008) proposed a new covering clustering algorithm based on quotient space granularity, which is reported to be more efficient than the K-means algorithm and the density-based spatial clustering algorithm (DBSCAN).

Clustering based on a hybrid approach

In granular computing, the three main models are fuzzy sets, rough sets and quotient spaces. Each has its advantages and disadvantages, and they are highly complementary. Many researchers have analyzed and compared the three models in depth, identifying their differences and connections, so that a unified granularity clustering model can be constructed from a mixture of them.

Rough-fuzzy sets

Although fuzzy set theory and rough set theory are both typical methods for dealing with uncertainty and imprecision, they emphasize different aspects of a problem. The membership of an object x in a fuzzy set is subjectively specified. Rough set theory, in contrast, classifies the universe through the different attribute values of objects, and can therefore produce different granularities without subjective factors; it reflects the fuzziness of knowledge objectively. Fuzzy set theory and rough set theory are thus two complementary mathematical tools. Some problems that cannot be solved by a single method can often be solved by a rough-fuzzy set model, which is preferable to either method alone in both efficiency and accuracy (Yong et al. 2005). For example, the shadowed set model processes information in a way similar to the rough set model, but it is developed within the framework of fuzzy set theory and has shown its advantages in practical applications. This shows that a combination of the two models can solve problems better.

In recent years, applying rough-fuzzy set methods to clustering analysis has been a hot research topic. Mitra et al. (2010) proposed a new method of partitive clustering in the framework of shadowed sets, shadowed C-means, which combines fuzzy sets with rough sets. The core and exclusion regions of the generated shadowed partition reduce the amount of computation compared with conventional fuzzy clustering. Unlike rough clustering, the choice of the threshold parameter is fully automated. The number of clusters is optimized in terms of various validity indices. Shadowed clustering is observed to handle overlap among clusters efficiently and to model uncertainty in class boundaries, and the algorithm is robust in the presence of outliers. A comparative study with related partitive approaches on synthetic and real data sets indicates the superiority of the proposed approach.
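The automatic choice of threshold mentioned above can be illustrated with the balance criterion commonly used to induce a shadowed set from fuzzy memberships: memberships at or below α are reduced to 0 (exclusion), memberships at or above 1 − α are elevated to 1 (core), and the remaining objects form the shadow, with α chosen so that the membership mass removed and added balances the size of the shadow. The sketch below assumes memberships normalized to [0, 1] and a simple grid search over α; the function names are hypothetical and this is not the exact procedure of Mitra et al.

```python
def shadow_threshold(memberships, steps=49):
    """Pick alpha in (0, 0.5) that best balances the three regions."""

    def imbalance(alpha):
        # Membership mass dropped when low values are reduced to 0.
        reduced = sum(u for u in memberships if u <= alpha)
        # Membership mass added when high values are elevated to 1.
        elevated = sum(1.0 - u for u in memberships if u >= 1.0 - alpha)
        # Number of objects left in the shadow region.
        shadow = sum(1 for u in memberships if alpha < u < 1.0 - alpha)
        return abs(reduced + elevated - shadow)

    # Candidate thresholds 0.01, 0.02, ..., 0.49 for the default steps=49.
    candidates = [k / (2.0 * (steps + 1)) for k in range(1, steps + 1)]
    return min(candidates, key=imbalance)


def shadowed_regions(memberships, alpha):
    """Split object indices into core, shadow and exclusion regions."""
    core = [k for k, u in enumerate(memberships) if u >= 1.0 - alpha]
    exclusion = [k for k, u in enumerate(memberships) if u <= alpha]
    shadow = [k for k, u in enumerate(memberships) if alpha < u < 1.0 - alpha]
    return core, shadow, exclusion


# Hypothetical memberships of nine objects in a single cluster.
memberships = [0.0, 0.05, 0.1, 0.45, 0.5, 0.6, 0.9, 0.95, 1.0]
alpha = shadow_threshold(memberships)
core, shadow, exclusion = shadowed_regions(memberships, alpha)
```

Only the objects in the shadow region keep graded memberships. This is one way to see why a shadowed partition can reduce computation relative to conventional fuzzy clustering.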

Xue et al. (2010) proposed a fuzzy rough semi-supervised outlier detection (FRSSOD) approach that uses some labeled samples together with fuzzy rough C-means clustering. The method introduces an objective function that minimizes the sum squared error of the clustering results, the deviation from the labeled samples, and the number of outliers. Using fuzzy rough C-means clustering, each cluster is represented by a center, a crisp lower approximation and a fuzzy boundary, and only points located in the boundary are considered further for possible reassignment as outliers. As a result, the method can obtain better clustering results for normal points and better accuracy for outlier detection. Experimental results show that the proposed method, on average, maintains or improves detection precision, reduces the false alarm rate, and reduces the number of candidate outliers to be examined.

One of the major tasks with gene expression data is to find groups of coregulated genes whose collective expression is strongly associated with sample categories. Maji (2011) proposed a new clustering algorithm, termed fuzzy-rough supervised attribute clustering (FRSAC), to find such groups of genes. The algorithm is based on the theory of fuzzy-rough sets and directly incorporates the information of sample categories into the gene clustering process. A new quantitative measure based on fuzzy-rough sets is introduced that uses the sample categories to measure the similarity among genes; using this measure, redundancy among the genes is removed. The clusters are refined incrementally based on sample categories. The effectiveness of the FRSAC algorithm is demonstrated through a comparison with existing supervised and unsupervised gene selection and clustering algorithms.

Zhou et al. (2011) exploited the concept of shadowed sets to describe rough-fuzzy clustering. They developed a technique for automatically selecting the threshold parameter that determines the approximation regions in rough set-based clustering. In this way, the lack of knowledge about global relationships among objects, caused by the individual absolute distance in rough C-means clustering or the individual membership in rough-fuzzy C-means clustering, can be circumvented. Subsequently, relative approximation regions of each cluster are detected and described. By integrating several technologies of granular computing, including fuzzy sets, rough sets and shadowed sets, they showed that the resulting characterization leads to an efficient description of the information granules obtained through clustering, including their overlap regions, outliers and boundary regions. Comparative experimental results reported for synthetic and real-world data illustrate the essence of the proposed idea.

Fuzzy quotient space

In 2003, fuzzy quotient space theory was presented (Zhang and Zhang 2003), extending the theory and methods of quotient space from precise granularity to fuzzy granular computing. Researchers have since applied fuzzy quotient space theory to clustering (Feng et al. 2004; Tang et al. 2008). Fuzziness can be introduced into a quotient space in three ways: (1) fuzzy sets are introduced on the universe; (2) a fuzzy topological structure is introduced on the basis of the topological structure; (3) the equivalence relation is extended to a fuzzy equivalence relation.

In the third case, fuzziness is introduced into the equivalence relation by letting the values of the relation R range from 0 to 1. For each threshold λ in [0, 1], a quotient space on the universe X is obtained. Varying λ yields a family of quotient spaces, also called a hierarchical structure, on X. In fact, a fuzzy equivalence relation corresponds to a hierarchical structure on the universe.
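The correspondence between a fuzzy equivalence relation and a hierarchical structure can be illustrated with a small sketch. It assumes the standard max-min notion of transitivity: a reflexive, symmetric fuzzy similarity matrix is first closed under max-min composition, and each λ-cut of the closure is then a crisp equivalence relation whose classes form one level of the hierarchy. The function names and the example matrix are hypothetical.

```python
def maxmin_compose(R, S):
    """Max-min composition of two square fuzzy relations."""
    n = len(R)
    return [
        [max(min(R[i][k], S[k][j]) for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def transitive_closure(R):
    """Smallest max-min transitive relation containing R."""
    while True:
        composed = maxmin_compose(R, R)
        nxt = [[max(a, b) for a, b in zip(r1, r2)] for r1, r2 in zip(R, composed)]
        if nxt == R:
            return R
        R = nxt


def lambda_cut_partition(R, lam):
    """Classes of the crisp equivalence relation {(i, j) : R[i][j] >= lam}.

    R must already be a fuzzy equivalence relation (e.g. a closure), so the
    row of any object lists exactly the members of its class.
    """
    n = len(R)
    blocks, seen = [], set()
    for i in range(n):
        if i in seen:
            continue
        block = [j for j in range(n) if R[i][j] >= lam]
        blocks.append(block)
        seen.update(block)
    return blocks


# Hypothetical similarity matrix for four objects.
similarity = [
    [1.0, 0.8, 0.4, 0.2],
    [0.8, 1.0, 0.5, 0.2],
    [0.4, 0.5, 1.0, 0.3],
    [0.2, 0.2, 0.3, 1.0],
]
closure = transitive_closure(similarity)
hierarchy = {lam: lambda_cut_partition(closure, lam) for lam in (0.9, 0.8, 0.5, 0.3)}
```

As λ decreases, the partitions become coarser and each one refines the next, which is the nested family of quotient spaces described above.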

Granular computing for subspace clustering

Granular computing, as a new method for solving complex problems, is widely applied in subspace clustering.

The ideas of granular computing are well illustrated by subspace clustering. There are two tasks in subspace clustering: one is to find the subspaces of the universe, and the other is to cluster the samples in each subspace. The former views a problem from different perspectives; the latter views a problem at different levels within one perspective. Because the set of attributes of each subspace is a subset of the attributes of the whole space, finding each subspace on the universe is a process of granulating the set of attributes on the universe. Selecting a subspace is a process of clustering attributes that are related to each other and deleting irrelevant and redundant attributes. Clustering the samples in each subspace is a process of granulating samples: clustering yields sample clusters of various sizes, which form sample granules of various sizes. The application of granular computing to subspace clustering is therefore first of all an application of granular computing thinking. The essence of subspace clustering is granulating both the samples and the attributes.

Most soft subspace clustering methods utilize within-cluster information and seldom consider other important information such as between-cluster information. In 2010, Deng et al. proposed a new clustering technique called enhanced soft subspace clustering (ESSC), which employs both within-cluster and between-cluster information (Deng et al. 2010). First, a new optimization objective function is developed by integrating the within-cluster compactness with the between-cluster separation in the subspace. Based on this objective function, the corresponding update rules for clustering are derived, followed by the development of the ESSC algorithm. Experimental studies demonstrate that the accuracy of the proposed ESSC algorithm exceeds that of most existing state-of-the-art soft subspace clustering algorithms.

Almost all subspace clustering algorithms proposed so far are designed for numeric data sets. In 2011, Ahmad and Dey presented a k-means type clustering algorithm that finds clusters in data subspaces in mixed numeric and categorical data sets (Ahmad and Dey 2011). In this method, they compute the degree of contribution of attributes to different clusters and define a new cost function for a k-means type algorithm. One advantage of this algorithm is that its complexity is linear in the number of data points. The algorithm is also useful for describing cluster formation in terms of the degree of contribution of attributes to different clusters: the clustering results are explained by the attribute weights in the clusters.

In high-dimensional data, clusters of objects usually exist in subspaces; moreover, different clusters probably have different shape volumes. Most existing methods for high-dimensional data clustering, however, consider only the former factor and ignore the latter by assuming the same shape volume for all clusters. In 2011, Peng and Zhang proposed a new Gaussian mixture model (GMM) type algorithm for discovering clusters with various shape volumes in subspaces (Peng and Zhang 2011). They extend the GMM clustering method to calculate a local weight vector as well as a local variance within each cluster, and use the weight and variance values to capture the main properties that discriminate different clusters, including subsets of relevant dimensions and shape volumes. Experimental results on both synthetic and real data sets show that the proposed algorithm outperforms its competitors, especially when applied to high-dimensional data sets.

Due to data sparseness and attribute redundancy in high-dimensional data, clusters of objects often exist in subspaces rather than in the entire space. In 2011, Bai et al. proposed a novel attribute weighting algorithm for clustering high-dimensional categorical data (Bai et al. 2011). The algorithm is an extension of the k-modes clustering algorithm. A novel weighting technique for categorical data is developed to calculate two weights for each attribute (or dimension) in each cluster, and the weight values are used to identify the subsets of important attributes that categorize different clusters. The experimental studies show that the proposed algorithm is effective in clustering categorical data sets and also scales to large data sets owing to its linear time complexity with respect to the number of data objects, attributes or clusters.

The measure of data reliability has recently proven useful for a number of data analysis tasks. Boongoen et al. (2011) extended the underlying metric to the problem of soft subspace clustering and proposed a filter approach. Subspace clustering has been increasingly recognized as an effective alternative to conventional algorithms, which search for clusters without differentiating the significance of different data attributes. While a large number of crisp subspace approaches have been proposed, only a handful of soft counterparts have been developed with the common goal of acquiring the optimal cluster-specific dimension weights. Most soft subspace clustering methods are based on k-means and rely heavily on the iteratively disclosed cluster centers to determine local weights. Unlike such wrapper techniques, the filter approach of Boongoen et al. is efficient, is generally applicable to different types of clustering, and outperforms several well-known subspace clustering algorithms.

Chen et al. (2012) proposed a new method to weight subspaces in feature groups and individual features for clustering high-dimensional data. In this method, the features of high-dimensional data are divided into feature groups based on their natural characteristics. Two types of weights are introduced into the clustering process to simultaneously identify the importance of feature groups and of individual features in each cluster. A new optimization model is given to define the optimization process, and a new clustering algorithm, FG-k-means, is proposed to solve it. The new algorithm extends k-means with two additional steps that automatically calculate the two types of subspace weights. A new data generation method is presented to generate high-dimensional data with clusters in subspaces of both feature groups and individual features. Experimental results on synthetic and real-life data show that the FG-k-means algorithm significantly outperformed four k-means type algorithms, i.e., k-means, W-k-means, LAC and EWKM, in almost all experiments. The new algorithm is robust to noise and missing values, which commonly exist in high-dimensional data.
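The cluster-specific attribute weights that recur throughout this section can be illustrated with an entropy-regularized weight update in the style of EWKM, one of the baselines named above. The sketch below is not FG-k-means itself, and the form of the update (weights proportional to exp(−D/γ), where D is the within-cluster dispersion along a dimension) is an assumption based on the usual entropy weighting formulation. The function name and the data are hypothetical.

```python
import math


def entropy_weights(cluster_points, center, gamma=1.0):
    """Per-dimension weights for one cluster; they are positive and sum to 1.

    Dimensions along which the cluster is compact (small dispersion) receive
    large weights. gamma controls how sharply the weights concentrate.
    """
    dims = len(center)
    dispersion = [
        sum((p[j] - center[j]) ** 2 for p in cluster_points) for j in range(dims)
    ]
    # Subtracting the minimum leaves the normalized weights unchanged and
    # avoids underflow of every exponential when all dispersions are large.
    base = min(dispersion)
    scores = [math.exp(-(d - base) / gamma) for d in dispersion]
    total = sum(scores)
    return [s / total for s in scores]


# Hypothetical cluster: compact along dimension 0, spread along dimension 1.
cluster_points = [(1.0, 0.0), (1.1, 5.0), (0.9, -5.0)]
center = (1.0, 0.0)
sharp = entropy_weights(cluster_points, center, gamma=1.0)
flat = entropy_weights(cluster_points, center, gamma=100.0)
```

With a small γ almost all weight goes to the compact dimension, which identifies it as the relevant subspace for this cluster; a large γ flattens the weights toward the unweighted case.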

Conclusions and prospect

Clustering is an important part of research fields such as pattern recognition, data mining and machine learning, and plays a vital role in recognizing the intrinsic structure of data. As an interdisciplinary field, clustering is widely applied in many research areas. Faced with massive, high-dimensional, distributed, dynamic and complex data characterized by incompleteness, unreliability, inaccuracy and inconsistency, conventional clustering methods find it extremely difficult to meet these demands.

Granularity is a tool for describing fuzzy and uncertain objects. Granular computing is the study of structured thinking, problem solving and information processing based on multiple levels of granularity, and may be regarded as a family of theories, methodologies, techniques and tools that make use of granules in the process of problem solving. Granular computing is mainly used for intelligent processing of uncertain, incomplete, fuzzy and massive information. Clustering analysis embodies granular thinking. By using theories, methodologies and tools such as fuzzy set theory, rough set theory and quotient space theory, granularity clustering can extend the study of clustering analysis in order to find an optimal “granule”, obtain the best clustering result and solve the problem better.

Although some methods based on rough-fuzzy set theory and fuzzy quotient space achieve the expected results, many shortcomings remain.

Firstly, there is no clustering technique that can be generally applied to the wide variety of structures presented by all kinds of multidimensional data sets. The three main models of granular computing are fuzzy sets, rough sets and quotient spaces. Each has advantages and disadvantages, and they will increasingly permeate and complement one another. Finding out how to integrate them and how to construct a unified granularity clustering model is a future research trend.

Secondly, from the point of view of the diversity of granular spaces, some measurements of granularity have not been studied deeply, for example for granular spaces based on neighborhoods and granular spaces based on fuzzy neighborhoods. Hence, it is necessary to develop different types of measurements of granularity.

Finally, we need to explore further how the process of granulating the data objects affects clustering results, so that granularity clustering models can be developed for goal-oriented clustering. On purely numerical data sets or purely text data sets, many problems can be solved by existing methods, but practical problems usually involve mixed data of both kinds. Such data can be clustered after preprocessing, but the results obtained in this way may not be accurate enough. Improving accuracy on mixed data sets is an important direction for future research.

Acknowledgments

This work is supported by the National Natural Science Foundation of China (No. 61379101).

References

Ahmad A, Dey L. A k-means type clustering algorithm for subspace clustering of mixed numeric and categorical datasets. Pattern Recogn Lett. 2011;32(7):1062–1069. doi: 10.1016/j.patrec.2011.02.017.
An QS, Shen JY, Wang GY. A clustering method based on information granularity and rough sets. Pattern Recog Artif Intell. 2003;6(4):412–417.
Bai L, Liang JY, Cao FY. Improved K-Modes Clustering Algorithm Based on Rough Sets. Comput Sci. 2009;36(1):162–176.
Bai L, Liang JY, Dang CY, Cao FY. A novel attribute weighting algorithm for clustering high-dimensional categorical data. Pattern Recogn. 2011;44(12):2843–2861. doi: 10.1016/j.patcog.2011.04.024.
Bargiela A, Pedrycz W. Granular computing: an introduction. Boston: Kluwer Academic Publishers; 2003.
Bargiela A, Pedrycz W. Recursive information granulation: aggregation and interpretation issues. IEEE Trans Syst Man Cybern B Cybern. 2003;33(1):96–112. doi: 10.1109/TSMCB.2003.808190.
Boongoen T, Shang CJ, Iam-On N, Shen Q. Extending Data Reliability Measure to a Filter Approach for Soft Subspace Clustering. IEEE Trans Syst Man Cybern B Cybern. 2011;41(6):1705–1714. doi: 10.1109/TSMCB.2011.2160341.
Bu DB, Bai S, Li G. Principle of granularity in clustering and classification. Chin J Comput (Chin Edition). 2002;25(8):810–816.
Celikyilmaz A. Soft-Link Spectral Clustering for Information Extraction. In: 2009 IEEE Third International Conference on Semantic Computing (ICSC 2009). 2009. pp. 434–441.
Chen M, Miao DQ. Interval set clustering. Expert Syst Appl. 2011;38(4):2923–2932. doi: 10.1016/j.eswa.2010.06.052.
Chen YH, Yao YY. Multiview intelligent data analysis based on granular computing. In: Proceedings of 2006 IEEE international conference on granular computing. Shanghai; 2006.
Chen J, Zhang YP, Zhang L. Analysis and Application of Clustering Based on Information Granularity. J Image Graphics. 2007;12(1):87–91.
Chen XJ, Ye YM, Xu XF, Huang JZ. A feature group weighting method for subspace clustering of high-dimensional data. Pattern Recogn. 2012;45(1):434–446. doi: 10.1016/j.patcog.2011.06.004.
Deng ZH, Choi KS, Chung FL, Wang ST. Enhanced soft subspace clustering integrating within-cluster and between-cluster information. Pattern Recogn. 2010;43(3):767–781. doi: 10.1016/j.patcog.2009.09.010.
Ding SF, Xu L, Zhu H, Zhang LW. Research and Progress of Cluster Algorithms Based on Granular Computing. Int J Digital Content Technol Appl. 2010;4(5):96–104. doi: 10.4156/jdcta.vol4.issue5.11.
Feng X, Ling Z, Wang LW. The Approach of the Fuzzy Granular Computing Based on the Theory of Quotient Space. Pattern Recog Artif Intell. 2004;17(4):425–429.
Fukushima Y, Tsukada M, Tsuda I, et al. Spatial clustering property and its self-similarity in membrane potentials of hippocampal CA1 pyramidal neurons for a spatio-temporal input sequence. Cogn Neurodyn. 2007;1(4):305–316. doi: 10.1007/s11571-007-9026-9.
Gang Y, Miao DQ, Duan QG. New rough leader clustering algorithm. Comput Sci. 2009;36(5):203–205.
Han JW, Micheline K. Data Mining: Concepts and Techniques (Second Edition). Massachusetts: Morgan Kaufmann Publishers; 2006.
Hao XL, Xie KM. Parallel artificial immune clustering algorithm based on dynamic granulation. Comput Eng. 2007;33(23):194–196.
He L, Wu L, Cai Y. Survey of Clustering Algorithms in Data Mining. Appl Res Comput. 2007;24(1):10–13.
Herawan T, Deris MM, Abawajy JH. A rough set approach for selecting clustering attribute. Knowl Based Syst. 2010;23(3):220–231. doi: 10.1016/j.knosys.2009.12.003.
Leslie V. A theory of the learnable. Commun ACM. 1984;27(11):1134–1142. doi: 10.1145/1968.1972.
Li H, Ding SF. Research of individual neural network generation and ensemble algorithm based on quotient space granularity clustering. Appl Math Informat Sci. 2013;7(2):701–708. doi: 10.12785/amis/070238.
Li D, Meng H, Shi XS. Membership Clouds and Membership Cloud Generators. J Comput Res Dev. 1995;32(6):16–21.
Liu YC, Li DY. Granular Computing Based on Cloud Model. In: Miao DQ, editor. Uncertainty and Granular Computing. Beijing: Science Press; 2011.
Liu Y, Lue YJ, Li YJ. Application of Rough Set and K-means Clustering in Image Segmentation. Infrared Laser Eng. 2004;33(3):300–302.
Liu SH, Hu F, Jia ZY, Shi ZZ. A Rough Set Based Hierarchical Clustering Algorithm. J Comput Res Dev. 2004;41(4):552–557.
Liu Q, Sun H, Wang H. The present studying state of granular computing and studying of granular computing based on the semantics of rough logic. Chin J Comput (Chin Edition). 2008;31(4):543. doi: 10.3724/SP.J.1016.2008.00543.
Maji P. Fuzzy-Rough Supervised Attribute Clustering Algorithm and Classification of Microarray Data. IEEE Trans Syst Man Cybern B Cybern. 2011;41(1):222–233. doi: 10.1109/TSMCB.2010.2050684.
Malyszko D, Stepaniuk J. Adaptive multilevel rough entropy evolutionary thresholding. Inf Sci. 2010;180(7):1138–1158. doi: 10.1016/j.ins.2009.11.034.
Malyszko D, Stepaniuk J. Rough Entropy Hierarchical Agglomerative Clustering in Image Segmentation. Trans Rough Sets XIII. 2011;6499:89–103. doi: 10.1007/978-3-642-18302-7_6.
Miao DQ. Uncertainty and granular computing. Beijing: Science Press; 2011.
Miao DQ, Wang GY, Liu Q, et al. Granular computing: past, present, future. Beijing: Science Press; 2007.
Mirkin B, Nascimento S. Additive spectral method for fuzzy clustering analysis of similarity data including community structure and affinity matrices. Inf Sci. 2012;183(1):16–34. doi: 10.1016/j.ins.2011.09.009.
Mitra S, Pedrycz W, Barman B. Shadowed c-means: integrating fuzzy and rough clustering. Pattern Recogn. 2010;43(4):1282–1291. doi: 10.1016/j.patcog.2009.09.029.
Pawlak Z. Rough sets. Int J Informat Comput Sci. 1982;11(5):145–172. doi: 10.1007/BF01001956.
Pedrycz W. Granular computing: analysis and design of intelligent systems. Boca Raton: CRC Press; 2013.
Pedrycz W, Bargiela A. An optimization of allocation of information granularity in the interpretation of data structures: toward granular fuzzy clustering. IEEE Trans Syst Man Cybern B Cybern. 2012;42(3):582–590. doi: 10.1109/TSMCB.2011.2170067.
Pedrycz W, Keun KC. Boosting of granular models. Fuzzy Sets Syst. 2006;157(22):2934–2953. doi: 10.1016/j.fss.2006.07.005.
Pedrycz W, Bassis S, Malchiodi D. The puzzle of granular computing. Heidelberg: Springer; 2008.
Pedrycz W, Loia V, Senatore S. Fuzzy Clustering With Viewpoints. IEEE Trans Fuzzy Syst. 2010;18(2):274–284.
Peng LQ, Zhang JY. An entropy weighting mixture model for subspace clustering of high-dimensional data. Pattern Recogn Lett. 2011;32(8):1154–1161. doi: 10.1016/j.patrec.2011.03.003.
Posner MI, editor. Foundations of cognitive science. Cambridge: The MIT Press; 1989.
Su CT, Chen LS, Yih Y. Knowledge acquisition through information granulation for imbalanced data. Expert Syst Appl. 2006;31(3):531–541. doi: 10.1016/j.eswa.2005.09.082.
Tang XQ, Zhu P, Cheng JX. Clustering analysis Based on Fuzzy Quotient Space. J Softw. 2008;19(4):861–868. doi: 10.3724/SP.J.1001.2008.00861.
Wang LW. Study of granular analysis in clustering. Comput Eng Appl. 2006;42(5):29–31.
Wang G, Yao Y, Yu H. A survey on rough set theory and applications. Chin J Comput. 2009;32(7):1229–1246. doi: 10.3724/SP.J.1016.2009.01229.
Wang GY, Zhong QH, Ma XA, et al. Granular computing models for knowledge uncertainty. J Softw. 2011;22(4):679–694.
White BS, Shalloway D. Efficient uncertainty minimization for fuzzy spectral clustering. Phys Rev E. 2009;80(5):056705. doi: 10.1103/PhysRevE.80.056705.
Xie Y, Raghavan VV, Dhatric P, Zhao XQ. A new fuzzy clustering algorithm for optimally finding granular prototypes. Int J Approximate Reasoning. 2005;40(1–2):109–124. doi: 10.1016/j.ijar.2004.11.002.
Xue ZX, Shang YL, Feng AF. Semi-supervised outlier detection based on fuzzy rough C-means clustering. Math Comput Simul. 2010;80(9):1911–1921. doi: 10.1016/j.matcom.2010.02.007.
Yan LL, Zhang YP, Hu BY. Covering Clustering Algorithm Based on Quotient Space Granularity. Appl Res Comput. 2008;25(1):47–49.
Yang T, Li LS. A Data Reduction Algorithm Using Clustering Based on Rough Set Theory. J Syst Simul. 2004;16(10):2195–2197.
Yanto ITR, Herawan T, Deris MM. Data clustering using variable precision rough set. Intell Data Anal. 2011;15(4):465–482.
Yao YY. Three perspectives of granular computing. J Nanchang Inst Technol. 2006;25(2):16–21.
Yao YY. The art of granular computing. Rough sets and intelligent systems paradigms. Berlin: Springer; 2007. pp. 101–112.
Yao YY. Granular computing: past, present and future. In: 2008 IEEE international conference on granular computing. Beijing; 2008.
Yao YY. Interpreting concept learning in cognitive informatics and granular computing. IEEE Trans Syst Man Cybern B Cybern. 2009;39(4):855–866. doi: 10.1109/TSMCB.2009.2013334.
Yao YY. Granular computing: basic issues and possible solutions. In: Proceedings of the 5th joint conference on information sciences. USA: Elsevier Publishing Company; 2000. pp. 186–189.
Yong C, Hong M, Min Z, et al. An Overview of Granular Computing. Comput Sci. 2005;32(9):1–12.
Zadeh LA. Fuzzy logic: computing with words. IEEE Trans Fuzzy Syst. 1996;1(2):103–111. doi: 10.1109/91.493904.
Zadeh LA. Towards a theory of fuzzy information granulation and its centrality in human reasoning and fuzzy logic. Fuzzy Sets Syst. 1997;19:111–127. doi: 10.1016/S0165-0114(97)00077-8.
Zhang L, Zhang B. Quotient space based clustering analysis. In: Proceedings of Foundations and Novel Approaches in Data Mining. 2006. pp. 259–269.
Zhang X, Yin YX, Xu MZ. Research of Text Clustering Based on Fuzzy Granular Computing. In: 2009 Second IEEE International Conference on Computer Science and Informational Technology. 2009. pp. 288–291.
Zhang B, Zhang L. Theory and applications of problem solving. North-Holland: Elsevier; 1992.
Zhang L, Zhang B. Theory of fuzzy quotient space (methods of fuzzy granular computing). J Softw. 2003;14(4):770–776.
Zhang L, Zhang B, Yin H. An alternative covering design algorithm of multi-layer neural networks. J Softw. 1999;10(7):737–742.
Zhang WX, Hao WZ, Liang JY, Li DY. Rough set theory and method. Beijing: Science Press; 2001.
Zhang JS, Leung Y, Xu ZB. Clustering methods by simulating visual systems. Chin J Comput (Chin Edition). 2001;24(5):496–501.
Zhang LJ, Li ZJ, Chen HW. Granular computing and its application in data mining. Comput Sci. 2005;32(12):178–180.
Zhang C, Xia SX, Liu B. A robust fuzzy kernel clustering algorithm. Appl Math Inf Sci. 2013;7(2):1005–1012.
Zhang JH, Peng XD, Liu H, et al. Classifying human operator functional state based on electrophysiological and performance measures and fuzzy clustering method. Cogn Neurodyn. 2013;7(6):477–494. doi: 10.1007/s11571-013-9243-3.
Zhao S, Zhang Y, Zhang L, et al. Covering clustering algorithm. J Anhui Univ (Nat Sci). 2005;29(2):28–32.
Zhao F, Liu HQ, Jiao LC. Spectral clustering with fuzzy similarity measure. Digit Signal Process. 2011;21(6):701–709. doi: 10.1016/j.dsp.2011.07.002.
Zheng SZ, Zhao XL, Zhang BQ. Web document clustering research based on granular computing. In: 2009 2nd international symposium on electronic commerce and security. 2009. pp. 446–450.
Zhong MS. Fuzzy clustering of web page. J East China Jiaotong Univ. 2004;21(5):59–62.
Zhou J, Pedrycz W, Miao DQ. Shadowed sets in the characterization of rough-fuzzy clustering. Pattern Recogn. 2011;44(8):1738–1749. doi: 10.1016/j.patcog.2011.01.014.
Zhu H, Ding SF, Xu L, Zhang LW. Research and development of granularity clustering. Commun Comput Inf Sci. 2011;159(5):253–258. doi: 10.1007/978-3-642-22691-5_44.
Zhu H, Ding SF, Xu XZ. An AP clustering algorithm of fine-grain parallelism based on improved attribute reduction. J Comput Res Dev. 2012;49(12):2638–2644.

