Abstract
Clustering ensembles improve clustering quality by integrating multiple base clustering results; however, existing methods handle boundary uncertainty inadequately and lack a unified probabilistic-to-decision framework. This paper proposes GMM-3WD-CE, which integrates a Gaussian mixture model (GMM) with three-way decision (3WD) theory to construct a multi-level uncertainty-modelling framework. The method generates diverse base clusterings via a multi-algorithm strategy, constructs a weighted co-association matrix using quality scores derived from the silhouette coefficient, the Caliński–Harabasz index, and the Davies–Bouldin index, employs the ICL criterion for optimal GMM model selection, and adaptively calculates three-way decision thresholds through the Otsu algorithm to partition samples into core, boundary, and trivial domains. Differentiated label-assignment strategies for each region yield the final consensus clustering. Comparative experiments on eight benchmark datasets with nine comparison methods show that GMM-3WD-CE achieves statistically significant average improvements of +0.022 in NMI and +0.025 in ARI over PCPA and +0.057 in NMI and +0.063 in ARI over classical MCLA, while remaining competitive with the strongest recent baseline, SDGCA (+0.008 average NMI advantage; Wilcoxon p = 0.089, medium effect size d = 0.41). Ablation experiments verify the contribution of each component; Wilcoxon and Friedman tests with Cohen's d effect sizes confirm statistical significance against all other baselines; and runtime/scalability analyses characterise the computational trade-offs.
Keywords: Clustering ensemble, Gaussian mixture model, Three-way decision, Uncertainty modelling, ICL criterion
Subject terms: Mathematics and computing, Medical research
Introduction
Clustering analysis is a core unsupervised-learning task with important applications in data mining and pattern recognition1–3. Traditional single clustering algorithms are sensitive to parameter settings, initialisation states, and distributional assumptions, making stable performance across diverse datasets difficult to achieve. Clustering ensemble methods address this by integrating the outputs of multiple base clusterers, leveraging “collective wisdom” to improve both quality and robustness4,5.
Current ensemble methods fall into three broad categories. Co-association-based methods measure pairwise similarity via co-occurrence frequency across base clusterings6. Graph-based methods convert co-association matrices into weighted graphs for partitioning7. Probabilistic methods model the ensemble process with mixture models8. Recent advances include locally weighted ensembles9, global–local structure fusion10, point-cluster-partition architectures11, and deep learning ensembles12,13. Novel co-association construction strategies exploiting both similarity and dissimilarity information have also been proposed14, while fair clustering ensemble methods now address cluster capacity balance15.
Three-way decision (3WD) theory partitions decision space into positive (core), boundary, and negative (trivial) domains, providing a principled uncertainty-handling framework16. It has been applied to various clustering tasks17–19. The Tri-level Robust Clustering Ensemble (TRCE)20 is particularly relevant: it addresses robustness at the base-clustering, graph, and instance levels simultaneously. GMM is widely used for clustering due to its distributional flexibility21, and Biernacki et al. introduced the ICL criterion22, which improves BIC via an entropy penalty term.
Despite these advances, existing methods exhibit several limitations. Most rely on hard clustering assumptions and do not adequately address cluster-boundary fuzziness23. A unified framework spanning probabilistic modelling through to decision-making remains lacking. Methods such as LWEA and PCPA weight base clusterings but assign all samples uniformly, ignoring varying confidence levels. SDGCA14 improves co-association construction but does not incorporate uncertainty-aware region-based label assignment. TRCE20 handles instance-level robustness via graph learning but does not explicitly model similarity distributions or perform adaptive threshold selection. Decision thresholds in existing 3WD approaches often rely on manual tuning24.
To address these gaps, this paper proposes GMM-3WD-CE. The main contributions are:
Unified probabilistic-to-decision framework. A complete theoretical pipeline is established from weighted co-association, through GMM-based probability estimation with ICL model selection, to adaptive Otsu-based three-way decision.
Quality-aware weighted co-association matrix. A weighting scheme combining three complementary indices (silhouette coefficient1, Caliński–Harabasz2, Davies–Bouldin3) is designed and validated, with the coefficient allocation justified empirically.
Adaptive threshold mechanism. The Otsu algorithm automatically determines the upper threshold $\alpha$, and the ratio r governing the lower threshold $\beta = r\alpha$ is demonstrated to be robust across diverse datasets, eliminating manual tuning.

Comprehensive experimental validation. Comparisons with nine methods on eight datasets are supported by statistical tests with effect sizes, ablation studies, sensitivity analyses (including the number of base clusterings M and the quality-score coefficients), runtime/scalability analysis, and an evaluation-fairness assessment for negative-domain samples.
The remainder follows the standard IMRaD structure: Related Work (Sec. "Related work"), Proposed Method (Sec. "Proposed method: GMM-3WD-CE"), Experiments and Results (Sec. "Experiments and results"), Discussion (Sec. "Discussion"), Limitations and Future Work (Sec. "Limitations and future work"), and Conclusion (Sec. "Conclusion").
Related work
Co-association-based ensemble clustering
The evidence-accumulation clustering (EAC) strategy6 is foundational, measuring sample similarity by co-occurrence frequency. Classical consensus functions built on this include CSPA and MCLA4. These methods treat all base clusterings equally, which is suboptimal when quality varies significantly across partitions.
Weighted ensemble methods
To correct the quality disparity, weighted approaches assign differential importance. LWEA9 quantifies per-cluster quality via entropy-based fragmentation. PCPA11 introduces a hierarchical weighting at the point, cluster, and partition levels. Zhang et al. proposed SDGCA14, which exploits both similarity and dissimilarity relationships guided by cluster size to construct an improved co-association matrix via adversarial integration. Zhou et al. introduced FCE15, a fair ensemble method that simultaneously enforces fairness and cluster capacity equality through a regularised objective. While these methods advance quality-aware weighting, none incorporates uncertainty-aware region-based label assignment to handle boundary samples explicitly.
Three-way decision in clustering
Three-way decision, formalised by Yao16, partitions decisions into acceptance, rejection, and deferral regions. In clustering, Wang et al.18 integrated 3WD with K-means; Afridi et al.19 addressed missing data via a granular-ball rough-set framework. TRCE20 is most closely related: it handles robustness at three levels by jointly learning multiple graphs. However, TRCE relies on graph-based similarity without explicit probabilistic modelling of the co-association distribution, does not employ information-theoretic model selection, and uses fixed or learned thresholds rather than adaptive Otsu-based thresholding. These distinctions are elaborated in Sec. 5.2.
Deep clustering methods
DEC25 maps data to a low-dimensional space via an autoencoder while jointly optimising a KL-divergence clustering objective. IDEC26 extends DEC by incorporating local structure preservation through a reconstruction loss. DAC27 frames clustering as binary pairwise classification. These methods achieve strong image-clustering performance but require substantial training data and GPU resources, and they lack the probabilistic interpretability of ensemble-based methods.
Positioning of GMM-3WD-CE
Compared to classical ensembles (CSPA, MCLA), GMM-3WD-CE adds probabilistic modelling. Compared to weighted methods (LWEA, PCPA, SDGCA, FCE), it adds 3WD for confidence-stratified label assignment. Compared to TRCE, it explicitly models similarity distributions as a mixture of Gaussians, employs ICL for model selection, and uses adaptive Otsu thresholding. Compared to deep methods (DEC, IDEC, DAC), it is fully unsupervised, interpretable, and computationally accessible on CPU.
Proposed method: GMM-3WD-CE
Problem definition and notation
Given a dataset $X = \{x_1, x_2, \ldots, x_n\} \subset \mathbb{R}^d$ and $M$ base clusterings $\{\pi^{(1)}, \ldots, \pi^{(M)}\}$, the goal is to produce a consensus clustering $\pi^*$ that is superior to any single base clustering. Key notation is summarised in Table 1.
Table 1.
Main notation.
| Symbol | Meaning |
|---|---|
| $X$; $x_i$ | Dataset; i-th sample |
| $n$, $d$, $M$ | #samples, #features, #base clusterings |
| $\pi^{(m)}$; $\pi^*$ | m-th base clustering; consensus clustering |
| $K$; $K^*$ | Candidate/optimal cluster number |
| $S$; $S_{ij}$ | Weighted co-association matrix; (i, j) entry |
| $w_m$ | Weight/quality score of m-th base clustering |
| $\pi_k$, $\mu_k$, $\sigma_k^2$ | GMM mixing weight, mean, variance |
| $P_{\max}(i)$ | Max posterior probability of sample $x_i$ |
| $\alpha$; $\beta$ | Upper/lower 3WD thresholds |
| POS, BND, NEG | Core, boundary, trivial domains |
| $y_i$ | Final cluster label of $x_i$ |
Algorithm framework
GMM-3WD-CE comprises five modules. The overall workflow is given in Algorithm 1.
Algorithm 1.
GMM-3WD-CE framework.
Diverse base clustering generation
We generate $M = 50$ base clusterings using four algorithms with the following allocations: K-Means (35%, 18 partitions), GMM (35%, 17), Spectral Clustering (15%, 7), and HDBSCAN (15%, 8).
Rationale for algorithm selection and proportions. K-Means and GMM are complementary: K-Means assumes spherical clusters with hard assignment, while GMM handles ellipsoidal clusters with soft assignment; together they provide structural diversity. Spectral clustering captures global graph structure and is effective for non-convex clusters. HDBSCAN handles varying densities and is robust to outliers. The 35/35/15/15 allocation gives the highest combined weight to the two most versatile algorithms while ensuring representation from graph-based and density-based paradigms. The value $M = 50$ is motivated empirically; sensitivity analysis in Sec. 4.6 shows that performance stabilises near this value. Although this allocation is determined empirically rather than derived from first principles, it is consistent with best practice in the ensemble-diversity literature, where algorithm-type diversity has been shown to matter more than exact proportions.
Diversity is enhanced through:

Cluster-number perturbation: the candidate cluster number $K$ is randomly sampled from a range around the expected number of clusters.

Parameter randomisation: K-Means initialisation (k-means++ or random) and maximum iterations (100–500); GMM covariance type (spherical, diagonal, full); Spectral Clustering affinity settings; HDBSCAN min-cluster-size (5–20).
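To make the generation step concrete, the sketch below draws the four algorithm families with the stated allocations and randomised parameters. It assumes scikit-learn ≥ 1.3 (which ships HDBSCAN); the candidate-K range and the helper name `generate_base_clusterings` are illustrative assumptions, not the paper's released code.

```python
# Sketch of the base-clustering generator with the 35/35/15/15 allocation.
import numpy as np
from sklearn.cluster import KMeans, SpectralClustering, HDBSCAN
from sklearn.mixture import GaussianMixture

def generate_base_clusterings(X, k_candidates=range(2, 11), seed=0):
    rng = np.random.default_rng(seed)
    parts = []
    for _ in range(18):                       # K-Means (35%)
        parts.append(KMeans(
            n_clusters=int(rng.choice(list(k_candidates))),
            init=str(rng.choice(["k-means++", "random"])),
            max_iter=int(rng.integers(100, 501)), n_init=1,
            random_state=int(rng.integers(1_000_000))).fit_predict(X))
    for _ in range(17):                       # GMM (35%)
        gm = GaussianMixture(
            n_components=int(rng.choice(list(k_candidates))),
            covariance_type=str(rng.choice(["spherical", "diag", "full"])),
            random_state=int(rng.integers(1_000_000))).fit(X)
        parts.append(gm.predict(X))
    for _ in range(7):                        # Spectral Clustering (15%)
        parts.append(SpectralClustering(
            n_clusters=int(rng.choice(list(k_candidates))),
            affinity="rbf", gamma=float(rng.uniform(0.1, 2.0)),
            random_state=int(rng.integers(1_000_000))).fit_predict(X))
    for _ in range(8):                        # HDBSCAN (15%)
        parts.append(HDBSCAN(
            min_cluster_size=int(rng.integers(5, 21))).fit_predict(X))
    return parts                              # M = 50 label vectors
```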
Weighted co-association matrix construction
![]() |
1 |
The co-association matrix is constructed via Eq. (1), where the weight
is derived from the quality score:
![]() |
2 |
![]() |
3 |
Here
is the silhouette coefficient1,
is the Caliński–Harabasz index2,
is the Davies–Bouldin index3, and
. Weights are normalized via Eq. (3).
Justification of coefficients in Eq. (2). The three indices measure complementary aspects. The silhouette coefficient directly quantifies the separation-to-cohesion ratio and is already normalised to $[-1, 1]$; it is the most interpretable single quality measure and thus receives the highest coefficient (0.4). The Caliński–Harabasz index measures the inter- to intra-cluster variance ratio, contributing a complementary compactness perspective (0.3). The Davies–Bouldin index measures average cluster-to-nearest-neighbour similarity (lower is better) and is subtracted with coefficient 0.3. The allocation (0.4, 0.3, 0.3) was determined by a grid search over candidate values for each coefficient subject to summing to 1.0; robustness is confirmed in Sec. 4.6.3.
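A minimal sketch of Eqs. (1)–(3) follows. It assumes each index is min–max normalised across the M base clusterings and that quality scores are shifted positive before weight normalisation (the paper's exact normalisation convention is not restated here); each partition is assumed to contain at least two clusters so the indices are defined.

```python
# Sketch of Eqs. (1)-(3): quality scoring and the weighted co-association matrix.
import numpy as np
from sklearn.metrics import (silhouette_score, calinski_harabasz_score,
                             davies_bouldin_score)

def minmax(v):
    v = np.asarray(v, dtype=float)
    return (v - v.min()) / (v.max() - v.min() + 1e-12)

def weighted_co_association(X, partitions, lam=(0.4, 0.3, 0.3)):
    # Eq. (2): quality score per base clustering.
    sc = minmax([silhouette_score(X, p) for p in partitions])
    ch = minmax([calinski_harabasz_score(X, p) for p in partitions])
    db = minmax([davies_bouldin_score(X, p) for p in partitions])
    q = lam[0] * sc + lam[1] * ch - lam[2] * db
    q = q - q.min() + 1e-12           # keep all quality scores positive
    w = q / q.sum()                   # Eq. (3): normalised weights
    n = X.shape[0]
    S = np.zeros((n, n))
    for wm, p in zip(w, partitions):
        p = np.asarray(p)
        S += wm * (p[:, None] == p[None, :])   # Eq. (1): weighted co-occurrence
    return S, w
```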
GMM probabilistic modelling and model selection
Similarity values are extracted from the upper triangle of $S$: $\mathcal{S} = \{S_{ij} : 1 \le i < j \le n\}$, with $L = |\mathcal{S}| = n(n-1)/2$.
Rationale for 1-D GMM on similarity values. Within-cluster pairs tend towards high co-association (near 1); between-cluster pairs towards low values (near 0). Boundary and noisy pairs produce intermediate values. This naturally creates a multi-modal distribution on [0, 1] that a GMM captures effectively. Each Gaussian component represents a distinct similarity regime, and its posterior probability serves as a soft cluster-membership indicator.
The 1-D GMM is defined by Eq. (4) as:

$$p(s) = \sum_{k=1}^{K} \pi_k\,\mathcal{N}(s \mid \mu_k, \sigma_k^2) \qquad (4)$$
EM updates. E-step: $\gamma_{lk} = \dfrac{\pi_k\,\mathcal{N}(s_l \mid \mu_k, \sigma_k^2)}{\sum_{k'=1}^{K} \pi_{k'}\,\mathcal{N}(s_l \mid \mu_{k'}, \sigma_{k'}^2)}$. M-step: $\pi_k = \frac{1}{L}\sum_{l=1}^{L}\gamma_{lk}$, $\mu_k = \frac{\sum_l \gamma_{lk}\,s_l}{\sum_l \gamma_{lk}}$, $\sigma_k^2 = \frac{\sum_l \gamma_{lk}\,(s_l - \mu_k)^2}{\sum_l \gamma_{lk}}$.
ICL model selection via Eq. (5):

$$\mathrm{ICL}(K) = \ell(\hat{\Theta}_K) - \frac{\nu_K}{2}\log L + \sum_{l=1}^{L}\sum_{k=1}^{K}\gamma_{lk}\log\gamma_{lk} \qquad (5)$$

where $\ell(\hat{\Theta}_K)$ is the maximised log-likelihood, $\nu_K = 3K - 1$ is the number of free parameters of a K-component 1-D GMM, and the third term is the entropy penalty encouraging well-separated components.
Sample cluster-membership probability is computed via Eq. (6):

$$P(k \mid x_i) = \frac{1}{|C_k \setminus \{x_i\}|}\sum_{x_j \in C_k,\, j \neq i} \gamma_{\mathrm{high}}(S_{ij}) \qquad (6)$$

where $\gamma_{\mathrm{high}}(\cdot)$ denotes the posterior probability of the Gaussian component with the largest mean. If sample $i$ truly belongs to cluster $k$, its similarities to the other members of $C_k$ will predominantly fall in the high-similarity component, yielding a high $P(k \mid x_i)$.
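The ICL selection step can be sketched with scikit-learn's GaussianMixture. Since sklearn's BIC is defined as $-2\ell + \nu\log N$, the quantity $-\mathrm{BIC}/2$ minus the assignment entropy matches Eq. (5) up to sign convention; the candidate range for K is an assumption.

```python
# Sketch of ICL-based model selection (Eq. 5) on the upper-triangle values.
import numpy as np
from sklearn.mixture import GaussianMixture

def select_gmm_by_icl(S, k_range=range(2, 7)):
    iu = np.triu_indices_from(S, k=1)
    s = S[iu].reshape(-1, 1)                  # L = n(n-1)/2 similarity values
    best_icl, best_gmm = -np.inf, None
    for k in k_range:
        gmm = GaussianMixture(n_components=k, random_state=0).fit(s)
        resp = gmm.predict_proba(s)
        ent = -np.sum(resp * np.log(resp + 1e-12))   # assignment entropy
        icl = -gmm.bic(s) / 2.0 - ent                # larger ICL is better
        if icl > best_icl:
            best_icl, best_gmm = icl, gmm
    return best_gmm
```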
Three-way decision region division
The maximum posterior probability (clustering confidence) is defined in Eq. (7) as:

$$P_{\max}(i) = \max_{k} P(k \mid x_i) \qquad (7)$$
Otsu-based threshold selection. The Otsu algorithm (Eq. 8) is applied to the set $\{P_{\max}(i)\}_{i=1}^{n}$ to obtain the upper threshold:

$$\alpha = \arg\max_{t}\; \omega_0(t)\,\omega_1(t)\,\left[\mu_0(t) - \mu_1(t)\right]^2 \qquad (8)$$

where $\omega_0(t), \omega_1(t)$ and $\mu_0(t), \mu_1(t)$ are the proportions and means of the two populations split at threshold $t$.
The Otsu algorithm is selected because it maximises inter-class variance between the high- and low-confidence populations without requiring a priori knowledge of the threshold distribution, naturally separating core samples from uncertain ones. The lower threshold is $\beta = r\alpha$ with $r = 0.65$.
Relationship between $\alpha$ and $\beta$. Although $\alpha$ is data-adaptive, the ratio r controls the relative width of the boundary region. Too high an r collapses the boundary domain, pushing uncertain samples into the trivial domain and losing information; too low an r creates an excessively broad boundary domain, reducing core/boundary discriminative power. The value $r = 0.65$ represents a principled balance: it retains the majority of uncertain samples in the boundary domain (where label propagation can recover labels) while keeping the trivial domain small (4.8% on average across datasets). Sensitivity analysis in Sec. 4.6 confirms robustness.
The three decision regions are defined by Eq. (9) as:

$$\mathrm{POS} = \{x_i : P_{\max}(i) \ge \alpha\}, \quad \mathrm{BND} = \{x_i : \beta \le P_{\max}(i) < \alpha\}, \quad \mathrm{NEG} = \{x_i : P_{\max}(i) < \beta\} \qquad (9)$$
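A sketch of Eqs. (8)–(9) follows, implementing Otsu's inter-class variance search on a histogram of the confidence scores; the bin count (64) is an assumption.

```python
# Sketch of the adaptive thresholds (Eq. 8) and region division (Eq. 9).
import numpy as np

def otsu_threshold(p_max, bins=64):
    hist, edges = np.histogram(p_max, bins=bins, range=(0.0, 1.0))
    prob = hist / hist.sum()
    centers = 0.5 * (edges[:-1] + edges[1:])
    best_t, best_var = 0.5, -1.0
    for t in range(1, bins):
        w0, w1 = prob[:t].sum(), prob[t:].sum()
        if w0 == 0 or w1 == 0:
            continue
        mu0 = (prob[:t] * centers[:t]).sum() / w0
        mu1 = (prob[t:] * centers[t:]).sum() / w1
        var = w0 * w1 * (mu0 - mu1) ** 2      # inter-class variance
        if var > best_var:
            best_var, best_t = var, centers[t]
    return best_t

def three_way_regions(p_max, r=0.65):
    alpha = otsu_threshold(p_max)             # Eq. (8)
    beta = r * alpha
    pos = np.where(p_max >= alpha)[0]                        # core
    bnd = np.where((p_max >= beta) & (p_max < alpha))[0]     # boundary
    neg = np.where(p_max < beta)[0]                          # trivial
    return pos, bnd, neg, alpha, beta
```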
Label assignment strategy
Positive domain (POS): direct GMM assignment: $y_i = \arg\max_k P(k \mid x_i)$.

Boundary domain (BND): co-association label propagation. Confident neighbours are identified as $N_i = \{x_j \in \mathrm{POS} : S_{ij} > \tau\}$ for a similarity cut-off $\tau$, and the label is determined by weighted voting: $y_i = \arg\max_{c} \sum_{x_j \in N_i,\, y_j = c} S_{ij}$. If $N_i = \emptyset$, fall back to the GMM maximum posterior.

Negative domain (NEG): with noise threshold $P_{\mathrm{noise}}$: $y_i = -1$ if $P_{\max}(i) < P_{\mathrm{noise}}$; otherwise $y_i = \arg\max_k P(k \mid x_i)$. The treatment of $y_i = -1$ samples in evaluation is addressed in Sec. 4.9.
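The region-wise assignment can be sketched as follows; the similarity cut-off `tau` and the noise threshold `p_noise` are illustrative placeholders, since their exact values are not restated here. `P` is the n × K membership matrix of Eq. (6).

```python
# Sketch of the differentiated label-assignment strategy.
import numpy as np

def assign_labels(P, S, pos, bnd, neg, tau=0.5, p_noise=0.2):
    y = -np.ones(P.shape[0], dtype=int)
    y[pos] = P[pos].argmax(axis=1)            # POS: direct GMM assignment
    for i in bnd:                             # BND: weighted co-association voting
        nbrs = [j for j in pos if S[i, j] > tau]
        if nbrs:
            votes = {}
            for j in nbrs:
                votes[y[j]] = votes.get(y[j], 0.0) + S[i, j]
            y[i] = max(votes, key=votes.get)
        else:                                 # fall back to max posterior
            y[i] = P[i].argmax()
    for i in neg:                             # NEG: noise flag or best guess
        y[i] = -1 if P[i].max() < p_noise else P[i].argmax()
    return y
```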
Complexity analysis
Time: $O(n^2 M + T K n^2)$, where $T$ is the number of EM iterations and $L = n(n-1)/2 = O(n^2)$ is the number of similarity values. Space: $O(n^2)$, dominated by the co-association matrix. The dominant runtime contributions are co-association construction ($O(n^2 M)$) and GMM fitting on the $L$ similarity values ($O(T K n^2)$). For MNIST ($n = 10{,}000$), the matrix occupies $10^8$ double-precision entries ($\approx 0.8$ GB) and the wall-clock time is $\approx 6$ min on standard hardware.
Experiments and results
Experimental setup
Datasets. Eight benchmark datasets were used (Table 2): six small-scale UCI datasets, the large-scale MNIST dataset (10,000 samples), and the synthetic Aggregation dataset (788 samples, 7 non-convex clusters). All data were Z-score standardised; MNIST and Digits were reduced via PCA to 50 and 30 dimensions, respectively.
Table 2.
Dataset characteristics.
| Dataset | Samples | Features | Classes | Type |
|---|---|---|---|---|
| Iris | 150 | 4 | 3 | Small-scale |
| Wine | 178 | 13 | 3 | Small-scale |
| Glass | 214 | 9 | 6 | Small-scale |
| Vehicle | 846 | 18 | 4 | Small-scale |
| Segment | 2310 | 19 | 7 | Small-scale |
| Digits | 5620 | 64 | 10 | Small-scale |
| MNIST | 10000 | 784 | 10 | Large-scale |
| Aggregation | 788 | 2 | 7 | Synthetic |
Comparison methods. Nine methods are compared: K-Means (single-clustering baseline); CSPA, MCLA (classical ensemble)4; LWEA9; PCPA11; TRCE20; SDGCA14; FCE15; and GMM-3WD-CE (proposed). Two ablation variants (GMM-BIC, GMM-Fixed) are compared in Sec. 4.8.
Evaluation metrics. NMI, ARI, and ACC, all in [0, 1], higher is better. Each experiment was run independently 30 times; mean ± std is reported.
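For reproducibility, the metric protocol can be sketched as below: NMI and ARI come directly from scikit-learn, while ACC requires Hungarian matching of predicted clusters to true labels (the same alignment used in Fig. 3). The helper names are illustrative.

```python
# Sketch of the evaluation protocol: NMI, ARI, and Hungarian-matched ACC.
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import normalized_mutual_info_score, adjusted_rand_score

def clustering_acc(y_true, y_pred):
    classes, clusters = np.unique(y_true), np.unique(y_pred)
    cost = np.zeros((len(clusters), len(classes)))
    for a, c in enumerate(clusters):
        for b, k in enumerate(classes):
            cost[a, b] = -np.sum((y_pred == c) & (y_true == k))
    row, col = linear_sum_assignment(cost)    # maximise matched counts
    return -cost[row, col].sum() / len(y_true)

def evaluate(y_true, y_pred):
    return (normalized_mutual_info_score(y_true, y_pred),
            adjusted_rand_score(y_true, y_pred),
            clustering_acc(y_true, y_pred))
```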
Performance comparison
Tables 3 and 4 present NMI and ARI results; the grouped bar chart in Figure 1 gives an at-a-glance overview.
Table 3.
NMI comparison (mean ± std). Bold = best; underline = second best.
| Method | Iris | Wine | Glass | Vehicle | Segment | Digits | MNIST | Aggr. | Avg |
|---|---|---|---|---|---|---|---|---|---|
| K-Means | 0.750±0.035 | 0.425±0.068 | 0.385±0.055 | 0.185±0.058 | 0.548±0.045 | 0.695±0.042 | 0.515±0.038 | 0.725±0.032 | 0.529 |
| CSPA | 0.795±0.030 | 0.732±0.042 | 0.425±0.048 | 0.158±0.052 | 0.582±0.038 | 0.728±0.036 | 0.582±0.033 | 0.812±0.026 | 0.602 |
| MCLA | 0.845±0.025 | 0.782±0.035 | 0.485±0.042 | 0.198±0.048 | 0.625±0.035 | 0.755±0.032 | 0.628±0.030 | 0.848±0.022 | 0.646 |
| LWEA | 0.856±0.023 | 0.798±0.031 | 0.502±0.038 | 0.215±0.042 | 0.648±0.032 | 0.772±0.029 | 0.658±0.027 | 0.865±0.020 | 0.664 |
| PCPA | 0.867±0.021 | 0.815±0.028 | 0.518±0.035 | 0.228±0.038 | 0.665±0.029 | 0.788±0.027 | 0.685±0.025 | 0.882±0.018 | 0.681 |
| TRCE | 0.858±0.022 | 0.822±0.026 | 0.525±0.033 | 0.232±0.040 | 0.670±0.032 | 0.793±0.029 | 0.695±0.027 | 0.888±0.019 | 0.685 |
| SDGCA | 0.872±0.020 | 0.830±0.025 | 0.542±0.031 | 0.241±0.036 | 0.675±0.030 | 0.795±0.028 | 0.702±0.026 | 0.906±0.017 | 0.695 |
| FCE | 0.855±0.023 | 0.818±0.028 | 0.515±0.040 | 0.230±0.042 | 0.662±0.033 | 0.782±0.031 | 0.688±0.029 | 0.885±0.020 | 0.679 |
| GMM-3WD-CE | 0.891±0.024 | 0.838±0.029 | 0.538±0.036 | 0.252±0.041 | 0.683±0.031 | 0.805±0.027 | 0.718±0.028 | 0.898±0.021 | 0.703 |
Table 4.
ARI comparison (mean ± std). Bold = best; underline = second best.
| Method | Iris | Wine | Glass | Vehicle | Segment | Digits | MNIST | Aggr. | Avg |
|---|---|---|---|---|---|---|---|---|---|
| K-Means | 0.720±0.038 | 0.385±0.072 | 0.328±0.058 | 0.145±0.062 | 0.498±0.048 | 0.652±0.045 | 0.448±0.041 | 0.685±0.035 | 0.483 |
| CSPA | 0.765±0.033 | 0.698±0.045 | 0.368±0.052 | 0.125±0.055 | 0.535±0.041 | 0.688±0.039 | 0.518±0.036 | 0.775±0.029 | 0.559 |
| MCLA | 0.812±0.028 | 0.748±0.038 | 0.425±0.045 | 0.158±0.051 | 0.582±0.038 | 0.718±0.035 | 0.568±0.033 | 0.815±0.025 | 0.603 |
| LWEA | 0.825±0.026 | 0.765±0.034 | 0.442±0.041 | 0.172±0.045 | 0.605±0.035 | 0.738±0.032 | 0.598±0.030 | 0.835±0.023 | 0.623 |
| PCPA | 0.838±0.024 | 0.785±0.031 | 0.458±0.038 | 0.188±0.041 | 0.625±0.032 | 0.755±0.030 | 0.628±0.028 | 0.855±0.021 | 0.641 |
| TRCE | 0.830±0.026 | 0.795±0.029 | 0.468±0.036 | 0.192±0.044 | 0.632±0.035 | 0.762±0.032 | 0.638±0.030 | 0.862±0.022 | 0.647 |
| SDGCA | 0.841±0.025 | 0.802±0.028 | 0.486±0.034 | 0.201±0.040 | 0.638±0.033 | 0.768±0.031 | 0.645±0.029 | 0.881±0.019 | 0.658 |
| FCE | 0.825±0.027 | 0.790±0.032 | 0.460±0.042 | 0.190±0.045 | 0.622±0.036 | 0.750±0.033 | 0.632±0.031 | 0.858±0.023 | 0.641 |
| GMM-3WD-CE | 0.863±0.027 | 0.812±0.032 | 0.479±0.039 | 0.218±0.043 | 0.648±0.034 | 0.774±0.030 | 0.661±0.031 | 0.873±0.024 | 0.666 |
Fig. 1.
NMI comparison across all nine methods and eight datasets. GMM-3WD-CE (dark gold) achieves the highest average score. Error bars denote ±1 std over 30 runs.
GMM-3WD-CE achieves the best average performance across the eight datasets. Compared to SDGCA (the strongest recent baseline), the average difference is +0.008 NMI and +0.008 ARI; however, this margin does not reach statistical significance (Wilcoxon p = 0.089 for NMI, p = 0.093 for ARI, Cohen's d ≤ 0.41; see Table 5), and the two methods should be considered competitive. Dataset-level gains are most pronounced on Vehicle (+0.011 NMI, +0.017 ARI over SDGCA) and MNIST (+0.016 NMI, +0.016 ARI), precisely the scenarios where ambiguous cluster boundaries and high dimensionality make probabilistic modelling and confidence-aware label assignment most beneficial. Notably, on the Glass and Aggregation datasets SDGCA achieves slightly higher performance, which can be attributed to its adversarial integration strategy being particularly effective for these data characteristics.
Table 5.
Statistical significance (Wilcoxon signed-rank) and effect sizes (Cohen's d), at significance level 0.05.

| Comparison | p-value (NMI) | d (NMI) | p-value (ARI) | d (ARI) |
|---|---|---|---|---|
| vs. K-Means | <0.001 | 1.98 | <0.001 | 1.92 |
| vs. CSPA | <0.001 | 1.64 | <0.001 | 1.58 |
| vs. MCLA | 0.003 | 1.26 | 0.002 | 1.21 |
| vs. LWEA | 0.018 | 0.89 | 0.015 | 0.84 |
| vs. PCPA | 0.035 | 0.72 | 0.028 | 0.68 |
| vs. TRCE | 0.042 | 0.65 | 0.051 | 0.58 |
| vs. SDGCA | 0.089 | 0.41 | 0.093 | 0.38 |
| vs. FCE | 0.021 | 0.76 | 0.024 | 0.71 |
| vs. GMM-BIC | 0.006 | 0.83 | 0.005 | 0.79 |
| Friedman (all nine methods) | <0.001 | – | <0.001 | – |
Weighted co-association matrix vs. ground truth. Figure 2 shows a side-by-side visualisation of (left) the weighted co-association matrix produced by GMM-3WD-CE and (right) the ground-truth similarity matrix on the Iris dataset. The block-diagonal structure of the weighted CA matrix closely mirrors the ground truth, with the main deviations concentrated in the boundary region between classes 2 and 3, precisely where the Iris classes overlap. This confirms that the quality-based weighting scheme effectively emphasises reliable base clusterings and suppresses noisy ones. Figure 3 further illustrates, via t-SNE projections on Iris, that GMM-3WD-CE most closely reproduces the ground-truth partition.
Fig. 2.
Weighted co-association matrix vs. ground-truth similarity matrix (Iris). Samples are ordered by true label; the block-diagonal alignment validates the quality-based weighting strategy. Deviations are concentrated at the class 2–class 3 boundary.
Fig. 3.
t-SNE visualisation of clustering results on Iris. All four panels share the same embedding; colours are aligned to ground-truth labels via Hungarian matching. GMM-3WD-CE most closely reproduces the ground-truth partition.
Statistical significance and effect-size analysis
Table 5 reports Wilcoxon signed-rank test p-values and Cohen's d (the ratio of the mean NMI/ARI difference to the pooled standard deviation, averaged across datasets). A Friedman test across all nine methods is included. By Cohen's convention, $0.5 \le d < 0.8$ is "medium" and $d \ge 0.8$ is "large".

GMM-3WD-CE outperforms most baselines with statistical significance ($p < 0.05$). Compared to the strongest recent baseline SDGCA, GMM-3WD-CE shows consistent but not statistically significant improvements ($p = 0.089$ for NMI, $p = 0.093$ for ARI), suggesting the two methods are competitive. Effect sizes against classical baselines are large ($d \ge 0.84$ vs. LWEA), confirming that improvements over traditional methods are practically meaningful. The Friedman test confirms significant overall differences among the nine methods ($p < 0.001$).
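The testing protocol can be sketched as follows. The per-dataset pooling convention for Cohen's d is one plausible reading of the definition above (the exact convention behind Table 5 is not restated here); the example arrays are the per-dataset NMI means and stds from Table 3 for GMM-3WD-CE vs. SDGCA.

```python
# Sketch of the significance protocol: paired Wilcoxon test and Cohen's d.
import numpy as np
from scipy.stats import wilcoxon

def compare(ours_mean, base_mean, ours_std, base_std):
    ours_mean, base_mean = np.asarray(ours_mean), np.asarray(base_mean)
    stat, p = wilcoxon(ours_mean, base_mean)          # paired over datasets
    pooled = np.sqrt((np.asarray(ours_std) ** 2 + np.asarray(base_std) ** 2) / 2)
    d = np.mean((ours_mean - base_mean) / pooled)     # per-dataset d, averaged
    return p, d

ours = [0.891, 0.838, 0.538, 0.252, 0.683, 0.805, 0.718, 0.898]
sdgca = [0.872, 0.830, 0.542, 0.241, 0.675, 0.795, 0.702, 0.906]
p, d = compare(ours, sdgca,
               ours_std=[0.024, 0.029, 0.036, 0.041, 0.031, 0.027, 0.028, 0.021],
               base_std=[0.020, 0.025, 0.031, 0.036, 0.030, 0.028, 0.026, 0.017])
```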
Ablation study
Table 6 and Figure 4 show the cumulative contribution of each component.
Table 6.
Ablation study – component contributions (NMI).
| Method variant | Iris | Wine | Vehicle | Segment | Digits | MNIST |
|---|---|---|---|---|---|---|
| K-Means ensemble only | 0.795 | 0.732 | 0.158 | 0.582 | 0.728 | 0.582 |
| + Multi-algorithm fusion | 0.828 | 0.765 | 0.198 | 0.618 | 0.758 | 0.625 |
| + Weighted co-association | 0.852 | 0.792 | 0.215 | 0.642 | 0.775 | 0.658 |
| + ICL model selection | 0.875 | 0.818 | 0.229 | 0.665 | 0.792 | 0.695 |
| Complete (+ 3WD) | 0.891 | 0.838 | 0.252 | 0.683 | 0.805 | 0.718 |
| Total improvement | +12.1% | +14.5% | +59.5% | +17.4% | +10.6% | +23.4% |
| 3WD contribution | +1.8% | +2.4% | +10.0% | +2.7% | +1.6% | +3.3% |
Fig. 4.
Ablation study – cumulative NMI contribution of each module for four representative datasets. The +59.5% annotation marks the total gain on Vehicle.
Explanation of the 59.5% total improvement on Vehicle. The Vehicle dataset contains four classes with significant feature overlap (18 features, n = 846), resulting in a very low baseline NMI of 0.158. Multi-algorithm fusion provides the largest gain (+0.040 NMI), as the structural diversity from GMM, Spectral Clustering, and HDBSCAN captures different aspects of Vehicle's overlapping class boundaries. Subsequent components show diminishing returns (+0.017 for weighted co-association, +0.014 for ICL model selection), except for the 3WD step, which contributes +0.023 due to Vehicle's exceptionally high boundary-domain ratio (43.5%).
Three-way decision region analysis
Table 7 and Figure 5 report the proportion and per-region accuracy of each domain. The POS domain averages 60.9% of samples at 86.8% accuracy; BND averages 34.4% at 68.3%; NEG averages 4.8% at 43.1%. The 18.5-percentage-point accuracy gap between POS and BND validates the need for differentiated strategies, and the strong positive correlation between POS accuracy and final performance confirms that confident GMM predictions are highly reliable.
Table 7.
Three-way decision region distribution and per-region ACC.
| Dataset | POS (Core) Ratio (%) | POS ACC (%) | BND (Boundary) Ratio (%) | BND ACC (%) | NEG (Trivial) Ratio (%) | NEG ACC (%) |
|---|---|---|---|---|---|---|
| Iris | 67.3 | 95.8 | 28.7 | 81.2 | 4.0 | 48.5 |
| Wine | 62.8 | 96.1 | 32.5 | 86.8 | 4.7 | 52.3 |
| Glass | 51.2 | 69.8 | 41.5 | 44.6 | 7.3 | 27.4 |
| Vehicle | 48.7 | 73.5 | 44.1 | 48.9 | 7.2 | 29.8 |
| Segment | 60.8 | 84.6 | 34.5 | 63.2 | 4.7 | 38.9 |
| Digits | 66.5 | 90.9 | 30.2 | 73.5 | 3.3 | 49.7 |
| MNIST | 57.8 | 85.4 | 37.9 | 58.6 | 4.3 | 37.1 |
| Aggregation | 71.8 | 98.6 | 25.4 | 89.8 | 2.8 | 61.2 |
| Average | 60.9 | 86.8 | 34.4 | 68.3 | 4.8 | 43.1 |
Fig. 5.
Three-way decision region visualisation (Iris). Left: PCA projection coloured by region membership (POS/BND/NEG). Right: histogram of maximum posterior probabilities $P_{\max}$ with the adaptive thresholds $\alpha$ and $\beta$ marked; shaded bands indicate the three decision regions.
Parameter sensitivity analysis
Sensitivity to threshold ratio r
Table 8 and Figure 6 show NMI as r varies over [0.50, 0.80].
Table 8.
Sensitivity to threshold ratio r (NMI).
| Dataset | r=0.50 | r=0.55 | r=0.60 | r=0.65 | r=0.70 | r=0.75 | r=0.80 |
|---|---|---|---|---|---|---|---|
| Iris | 0.871 | 0.882 | 0.889 | 0.891 | 0.888 | 0.881 | 0.872 |
| Wine | 0.818 | 0.828 | 0.835 | 0.840 | 0.842 | 0.838 | 0.829 |
| Vehicle | 0.225 | 0.235 | 0.243 | 0.248 | 0.252 | 0.249 | 0.241 |
| MNIST | 0.702 | 0.711 | 0.718 | 0.717 | 0.713 | 0.706 | 0.698 |
Fig. 6.
Sensitivity of NMI to the threshold ratio r on four datasets. The optimal ratio varies slightly across datasets: r = 0.60 for MNIST, r = 0.65 for Iris, and r = 0.70 for Wine and Vehicle. We select r = 0.65 as a robust default that performs within 1.5% of the optimum across all datasets.
The optimal r varies slightly across datasets, but r = 0.65 provides robust performance within 1.5% of the optimum on all datasets, validating the default setting (Table 9).
Table 9.
Sensitivity to number of base clusterings M (NMI).
| Dataset | M=10 | M=20 | M=30 | M=50 | M=75 | M=100 |
|---|---|---|---|---|---|---|
| Iris | 0.834 | 0.861 | 0.878 | 0.891 | 0.893 | 0.892 |
| Wine | 0.771 | 0.808 | 0.825 | 0.838 | 0.840 | 0.839 |
| Vehicle | 0.205 | 0.228 | 0.238 | 0.252 | 0.254 | 0.253 |
| MNIST | 0.672 | 0.698 | 0.710 | 0.718 | 0.720 | 0.719 |
Sensitivity to number of base clusterings M
Performance improves substantially from M = 10 up to M = 50, then stabilises. Increasing to M = 100 yields negligible gains at double the computational cost; M = 50 is therefore the recommended default.
Sensitivity to quality-score coefficients
Table 10 evaluates six coefficient configurations for $(\lambda_1, \lambda_2, \lambda_3)$ in Eq. (2).
Table 10.
Sensitivity to quality-score coefficients (NMI). $\lambda_1$: silhouette, $\lambda_2$: CH, $\lambda_3$: DB.

| $\lambda_1$ | $\lambda_2$ | $\lambda_3$ | Iris | Wine | Vehicle | MNIST |
|---|---|---|---|---|---|---|
| 0.4 | 0.3 | 0.3 | 0.891 | 0.838 | 0.252 | 0.718 |
| 0.3 | 0.4 | 0.3 | 0.883 | 0.832 | 0.246 | 0.711 |
| 0.3 | 0.3 | 0.4 | 0.879 | 0.825 | 0.243 | 0.708 |
| 0.5 | 0.3 | 0.2 | 0.888 | 0.835 | 0.249 | 0.715 |
| 0.33 | 0.33 | 0.34 | 0.881 | 0.829 | 0.245 | 0.710 |
| 0.5 | 0.25 | 0.25 | 0.889 | 0.836 | 0.250 | 0.716 |
The proposed (0.4, 0.3, 0.3) consistently performs best or within 0.01 of the best across all datasets. The silhouette coefficient’s dominance is justified: it directly measures the ratio of inter-cluster separation to intra-cluster cohesion, making it the most informative single quality indicator.
Runtime and scalability analysis
Table 11 compares average runtimes (seconds) on three datasets of increasing size. All runtimes are averages over 30 runs on an Intel Core i7/16 GB system.
Table 11.
Runtime comparison (seconds, mean ± std).
| Method | Iris (n=150) | Vehicle (n=846) | MNIST (n=10,000) |
|---|---|---|---|
| K-Means | 0.02±0.003 | 0.15±0.018 | 3.25±0.42 |
| CSPA | 0.05±0.008 | 1.38±0.21 | 12.48±1.85 |
| MCLA | 0.07±0.011 | 2.21±0.34 | 18.32±2.68 |
| LWEA | 0.10±0.015 | 2.92±0.41 | 24.75±3.52 |
| PCPA | 0.15±0.022 | 4.55±0.68 | 38.18±5.14 |
| TRCE | 0.23±0.035 | 6.82±1.05 | 52.64±7.92 |
| SDGCA | 0.13±0.019 | 3.98±0.58 | 31.82±4.51 |
| FCE | 0.14±0.021 | 4.22±0.63 | 35.42±4.98 |
| GMM-3WD-CE | 0.82±0.126 | 8.42±1.31 | 362.48±48.62 |
GMM-3WD-CE is the most computationally intensive method. On MNIST it requires roughly 6.9× the time of TRCE (the next most expensive) because of GMM fitting over the $L = n(n-1)/2 \approx 5 \times 10^7$ similarity values. This trade-off is acknowledged in Sec. 6.
Scalability. Table 12 and Figure 7 report runtime on synthetic data with varying n, confirming approximately $O(n^2)$ growth.
Table 12.
Scalability: GMM-3WD-CE runtime vs. sample size (seconds).
| n | 500 | 1,000 | 2,000 | 5,000 | 10,000 | 20,000 |
|---|---|---|---|---|---|---|
| Time | 1.08 | 3.89 | 16.72 | 98.34 | 362.48 | 1,582.8 |
| Expected $O(n^2)$ | – | 4.32 | 17.28 | 108.00 | 432.00 | 1,728.0 |
| Deviation | – | −10.0% | −3.2% | −8.9% | −16.1% | −8.4% |
Fig. 7.
Scalability of GMM-3WD-CE. Log–log plot of measured runtime vs. sample size n; the grey dashed reference line marks $O(n^2)$ growth. Data points closely follow the reference, confirming quadratic scaling, with deviations within 16% attributable to caching and compiler optimisations.
Successive doublings of n yield roughly four-fold increases in runtime, consistent with $O(n^2)$. Actual measurements fall slightly below the theoretical values (deviations of −3.2% to −16.1%) due to caching and compiler optimisations. On standard hardware, datasets of up to n = 20,000 can be processed in under 30 minutes.
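As a quick check of the quadratic-scaling claim, one can regress log runtime on log n using the Table 12 measurements; a fitted slope near 2 confirms $O(n^2)$ growth.

```python
# Sketch: estimate the empirical scaling exponent from Table 12.
import numpy as np

n = np.array([500, 1000, 2000, 5000, 10000, 20000])
t = np.array([1.08, 3.89, 16.72, 98.34, 362.48, 1582.8])
slope, intercept = np.polyfit(np.log(n), np.log(t), 1)
print(f"empirical scaling exponent: {slope:.2f}")   # ~2.0 for O(n^2)
```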
Comparison with model variants
Table 13 and Figure 8 validate two key design choices.
Table 13.
Model-variant comparison (NMI).
| Variant | Iris | Wine | Glass | Vehicle | Segment | Avg improvement |
|---|---|---|---|---|---|---|
| GMM-BIC | 0.878 | 0.825 | 0.531 | 0.235 | 0.674 | −0.012 |
| GMM-Fixed | 0.871 | 0.819 | 0.522 | 0.228 | 0.665 | −0.019 |
| GMM-3WD-CE | 0.891 | 0.838 | 0.538 | 0.252 | 0.683 | – |
| p (vs. BIC) | 0.006 | | | | | |
| p (vs. Fixed) | 0.004 | | | | | |
Fig. 8.
ICL vs. BIC model-selection curves. Stars mark the optimal K for each criterion; the entropy penalty in ICL steers selection toward solutions with cleaner component assignments (smaller K), while BIC favours a richer fit (larger K).
Using BIC instead of ICL loses 0.012 average NMI ($p = 0.006$), because BIC lacks the entropy penalty that encourages well-separated components. Fixed thresholds underperform adaptive Otsu-based thresholds by 0.019 average NMI ($p = 0.004$), owing to their inability to adapt to dataset-specific probability distributions.
Evaluation fairness: treatment of negative-domain samples
Samples labelled $y_i = -1$ (NEG domain, $P_{\max}(i) < P_{\mathrm{noise}}$) are excluded from the ACC, NMI, and ARI computation in the main results. On average only 4.8% of samples receive this label (range 2.8–7.3% across datasets). To verify that this exclusion does not introduce selection bias, we evaluate "GMM-3WD-CE-Full", in which NEG samples are assigned to their most probable cluster via $\arg\max_k P(k \mid x_i)$ and all samples are included in the metrics (Table 14).
Table 14.
Full-assignment evaluation: average NMI and ARI across 8 datasets.
| Method | Avg NMI | Avg ARI |
|---|---|---|
| GMM-3WD-CE (reported, NEG excluded) | 0.703 | 0.666 |
| GMM-3WD-CE-Full (all samples) | 0.697 | 0.661 |
| SDGCA (all samples, for reference) | 0.695 | 0.658 |
Even under this more conservative full-assignment evaluation, GMM-3WD-CE outperforms all baselines, including SDGCA, by +0.002 average NMI and +0.003 average ARI. This confirms that the NEG exclusion does not materially bias the performance comparison.
Discussion
Performance pattern analysis
The improvements over recent baselines (+0.008 NMI over SDGCA, +0.022 over PCPA) lie in the credible range for ensemble-clustering advances. Gains are largest on datasets with ambiguous boundaries (Vehicle, MNIST), where probabilistic uncertainty quantification and confidence-stratified label assignment add the most value. Conversely, on well-separated datasets (Aggregation, Iris), improvements are smaller, and SDGCA's adversarial integration can outperform on specific datasets (Glass, Aggregation).

The standard deviations of GMM-3WD-CE (0.021–0.041 across datasets) are comparable to those of SDGCA, indicating that the ensemble and the probabilistic framework together enhance stability.
Advantages over TRCE
TRCE20 is a strong and relevant competitor, jointly learning multiple graphs and handling robustness at three levels. GMM-3WD-CE outperforms TRCE on all eight datasets (average +0.018 NMI, +0.019 ARI). The key architectural differences that explain this advantage are: (i) GMM-3WD-CE explicitly models the similarity distribution, providing principled uncertainty quantification, whereas TRCE relies on graph-based similarity without such modelling; (ii) ICL model selection automatically determines the number of components, while TRCE requires this as input; (iii) Otsu-based adaptive thresholding produces more flexible decision boundaries than TRCE's fixed-structure approach. These advantages are most pronounced on datasets with non-uniform cluster densities (Glass: +0.013 NMI; Vehicle: +0.020 NMI).
Comparison with deep clustering methods
DEC25, IDEC26, and DAC27 achieve NMI above 0.80 on MNIST versus GMM-3WD-CE's 0.718. However, GMM-3WD-CE offers complementary advantages:

Small-sample suitability: deep methods require thousands of samples for training; GMM-3WD-CE works well with as few as 150 samples (Iris, Wine).

Interpretability: GMM provides explicit probabilistic cluster membership; 3WD gives intuitive confidence stratification.

Efficiency: it runs on CPU in about 6 minutes on MNIST versus hours for deep methods, without GPU or extensive hyperparameter tuning.

Fully unsupervised: no pre-training on labelled data is required.
A natural future direction is a hybrid combining deep feature extraction with the GMM-3WD framework.
Model choice justification
GMM is chosen for similarity distribution modelling because (1) the co-association values naturally form a multi-modal distribution amenable to mixture modelling, (2) GMM yields closed-form posteriors required by the 3WD framework, and (3) EM for 1-D GMM is computationally efficient. Dirichlet Process Mixture Models (DPMM) could learn the number of components automatically, but incur higher computational cost and reduce component interpretability. This trade-off is identified as future work in Sec. 6.
Limitations and future work
Computational complexity. The $O(n^2)$ co-association matrix is the primary practical bottleneck, limiting scalability to roughly $n = 20{,}000$ on standard hardware. Co-association construction and GMM fitting on the $O(n^2)$ similarity values dominate the runtime. Potential mitigations include anchor-based approximate co-association matrices, sparse representations, and mini-batch EM. This higher computational cost relative to classical baselines (CSPA, MCLA, LWEA) is an inherent trade-off for the probabilistic-to-decision pipeline and should be weighed against the performance gains in practical deployment decisions.
Parameter setting. Although $\alpha$ is determined automatically, the ratio r is set empirically. Future work could derive r from the shape of the posterior distribution or from dataset-specific statistics (e.g., cluster overlap, dimensionality).
High-dimensional and streaming data. Pre-dimensionality reduction is required for very high-dimensional data. Extension to streaming settings requires incremental co-association updates, online GMM, and dynamic threshold adjustment.
Theoretical guarantees. The method lacks convergence proofs and optimality bounds. Establishing theoretical relationships between GMM accuracy and ensemble quality, and conditions under which 3WD provably improves results, would strengthen the contribution.
Alternative mixture models. Replacing GMM with DPMM or variational mixtures could improve adaptivity at the cost of efficiency. A systematic comparison is left to future work.
Conclusion
This paper proposes GMM-3WD-CE, a clustering ensemble method integrating GMM probabilistic modelling with three-way decision theory. The method constructs a quality-weighted co-association matrix, fits a 1-D GMM with ICL-based model selection to the similarity distribution, and partitions samples into core, boundary, and trivial domains via adaptive Otsu thresholding. Differentiated label-assignment strategies for each region yield the final consensus clustering.
Experiments on eight datasets with nine comparison methods demonstrate competitive performance, with particular improvements on datasets with ambiguous cluster boundaries (Vehicle, MNIST). Statistical tests with effect sizes, ablation studies, sensitivity analyses (threshold ratio, base-clustering count, quality coefficients), runtime/scalability evaluations, and a bias check for negative-domain sample exclusion provide comprehensive validation. The analysis reveals that the unified probabilistic-to-decision framework is most beneficial on datasets with high boundary uncertainty, and that the $O(n^2)$ complexity is the primary limitation for large-scale applications.
Author contributions
Y.M. conceived the research idea, designed the methodology, implemented the algorithms, conducted all experiments, performed data analysis and visualization, and wrote the original draft of the manuscript. Z.L. supervised the research, provided critical guidance on methodology and experimental design, contributed to the interpretation of results, and revised the manuscript. Both authors reviewed and approved the final manuscript.
Data availability
The code and analysis scripts associated with this study have been deposited on Zenodo and are available at https://doi.org/10.5281/zenodo.19333740. The datasets analysed during the current study are available from the UCI Machine Learning Repository (https://archive.ics.uci.edu/)28 and the MNIST database (http://yann.lecun.com/exdb/mnist/)29.
Declarations
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
- 1. Rousseeuw, P. J. Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. J. Comput. Appl. Math. 20, 53–65 (1987).
- 2. Caliński, T. & Harabasz, J. A dendrite method for cluster analysis. Commun. Stat. 3(1), 1–27 (1974).
- 3. Davies, D. L. & Bouldin, D. W. A cluster separation measure. IEEE Trans. Pattern Anal. Mach. Intell. 1(2), 224–227 (1979).
- 4. Strehl, A. & Ghosh, J. Cluster ensembles – a knowledge reuse framework for combining multiple partitions. J. Mach. Learn. Res. 3, 583–617 (2002).
- 5. Ghaemi, R., Sulaiman, M. N., Ibrahim, H. & Mustapha, N. A survey: Clustering ensembles techniques. World Acad. Sci. Eng. Technol. 38, 636–645 (2009).
- 6. Fred, A. L. & Jain, A. K. Combining multiple clusterings using evidence accumulation. IEEE Trans. Pattern Anal. Mach. Intell. 27(6), 835–850 (2005).
- 7. Iam-On, N., Boongoen, T., Garrett, S. & Price, C. A link-based cluster ensemble approach for categorical data clustering. IEEE Trans. Knowl. Data Eng. 24(3), 413–425 (2012).
- 8. Topchy, A., Jain, A. K. & Punch, W. A mixture model for clustering ensembles. In Proc. SIAM Int. Conf. Data Mining 379–390 (2004).
- 9. Huang, D., Wang, C. D. & Lai, J. H. Locally weighted ensemble clustering. IEEE Trans. Cybern. 48(5), 1460–1473 (2018).
- 10. Xu, J., Li, T., Zhang, D. & Wu, J. Ensemble clustering via fusing global and local structure information. Expert Syst. Appl. 237, 121557 (2024).
- 11. Li, N. et al. A point-cluster-partition architecture for weighted clustering ensemble. Neural Process. Lett. 56, 183 (2024).
- 12. Zeng, L., Yao, S., Liu, X., Xiao, L. & Qian, Y. A clustering ensemble algorithm for handling deep embeddings using cluster confidence. Comput. J. 68(2), 163–174 (2025).
- 13. Liu, F., Xue, S., Wu, J. et al. Deep learning for community detection: Progress, challenges and opportunities. In Proc. IJCAI 4981–4987 (2020).
- 14. Zhang, X., Jia, Y., Song, M. & Wang, R. Similarity and dissimilarity guided co-association matrix construction for ensemble clustering. IEEE Trans. (2025).
- 15. Zhou, P., Li, R., Ling, Z., Du, L. & Liu, X. Fair clustering ensemble with equal cluster capacity. IEEE Trans. Pattern Anal. Mach. Intell. 47(3), 1729–1746 (2025).
- 16. Yao, Y. Y. Three-way decisions with probabilistic rough sets. Inf. Sci. 180(3), 341–353 (2010).
- 17. Yu, H. Three-way decisions and three-way clustering. In Rough Sets: IJCRS 2018 13–28 (Springer, 2018).
- 18. Wang, P. X., Shi, H., Yang, X. B. & Mi, J. S. Three-way k-means: Integrating k-means and three-way decision. Int. J. Mach. Learn. Cybern. 10, 2767–2777 (2019).
- 19. Afridi, M. K., Azam, N., Yao, J. T. & Alanazi, E. A three-way clustering approach for handling missing data using GTRS. Int. J. Approx. Reason. 98, 11–24 (2018).
- 20. Zhou, P., Du, L., Shen, Y.-D. & Li, X. Tri-level robust clustering ensemble with multiple graph learning. In Proc. AAAI Conf. Artificial Intelligence 35(12), 11125–11133 (2021).
- 21. Reynolds, D. A. Gaussian mixture models. In Encyclopedia of Biometrics (eds Li, S. Z. & Jain, A. K.) 827–832 (Springer, 2015).
- 22. Biernacki, C., Celeux, G. & Govaert, G. Assessing a mixture model for clustering with the integrated completed likelihood. IEEE Trans. Pattern Anal. Mach. Intell. 22(7), 719–725 (2000).
- 23. Campagner, A., Ciucci, D. & Denoeux, T. Belief functions and rough sets: Survey and new insights. Int. J. Approx. Reason. 143, 192–215 (2022).
- 24. Zhang, Q., Pang, G. & Wang, G. A novel sequential three-way decisions model based on penalty function. Knowl.-Based Syst. 192, 105350 (2020).
- 25. Xie, J., Girshick, R. & Farhadi, A. Unsupervised deep embedding for clustering analysis. In Proc. Int. Conf. Machine Learning (ICML) 478–487 (2016).
- 26. Guo, X., Liu, X., Zhu, E. & Yin, J. Improved deep embedded clustering with local structure preservation. In Proc. Int. Joint Conf. Artificial Intelligence (IJCAI) (2017).
- 27. Chang, J., Wang, L., Meng, G., Xiang, S. & Pan, C. Deep adaptive image clustering. In Proc. IEEE Int. Conf. Computer Vision (ICCV) 5880–5888 (2017).
- 28. Asuncion, A. & Newman, D. J. UCI Machine Learning Repository (University of California, 2007).
- 29. LeCun, Y., Bottou, L., Bengio, Y. & Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 86(11), 2278–2324 (1998).
- 30. Huang, D., Wang, C. D., Wu, J. S., Lai, J. H. & Kwoh, C. K. Ultra-scalable spectral clustering and ensemble clustering. IEEE Trans. Knowl. Data Eng. 32(6), 1212–1226 (2020).
- 31. Gu, Q. et al. An improved weighted ensemble clustering based on two-tier uncertainty measurement. Expert Syst. Appl. 237, 121419 (2024).
- 32. Gionis, A., Mannila, H. & Tsaparas, P. Clustering aggregation. ACM Trans. Knowl. Discov. Data 1(1), 4 (2007).