. 2021 Mar 10;159:107217. doi: 10.1016/j.csda.2021.107217

Table 3.

Percentage of replications in which the number of clusters is identified correctly (IC), based on 1000 simulation replications with data generated according to (27). k-means (K), convex clustering (Convex), and k-means++ (KPP) are applied directly to the regression coefficients, with the number of clusters chosen by the gap statistic; BIC denotes our BIC selector in generalized k-means.

               n0=50                           n0=100
σ     c    k   K       Convex  KPP     BIC     K       Convex  KPP     BIC
0.1   10   2   92.6    98.5    88.2    100.0   96.5    100.0   93.9    100.0
0.1   10   3   72.3    99.2    90.2    100.0   72.4    99.9    93.1    100.0
0.1   20   2   99.7    100.0   98.9    100.0   100.0   100.0   99.9    100.0
0.1   20   3   73.3    100.0   99.2    100.0   72.9    100.0   99.9    100.0
0.2   10   2   94.6    97.8    90.9    100.0   96.0    99.8    93.3    100.0
0.2   10   3   36.8    47.5    26.9    100.0   75.6    99.2    95.0    100.0
0.2   20   2   99.7    100.0   99.0    100.0   100.0   100.0   99.9    100.0
0.2   20   3   24.6    24.4    10.6    100.0   77.2    100.0   99.8    100.0
0.5   10   2   95.7    99.9    93.5    100.0   96.0    99.1    95.4    100.0
0.5   10   3   1.1     1.3     1.2     88.7    1.3     1.4     1.4     100.0
0.5   20   2   99.9    100.0   99.7    100.0   99.9    100.0   99.8    100.0
0.5   20   3   0.0     0.0     0.0     95.2    0.0     0.0     0.0     100.0
1.0   10   2   96.2    100.0   93.0    100.0   97.7    100.0   95.8    100.0
1.0   10   3   0.9     0.8     1.6     16.2    0.3     0.2     0.3     57.5
1.0   20   2   99.6    100.0   99.6    100.0   100.0   100.0   99.9    100.0
1.0   20   3   0.0     0.0     0.0     39.8    0.0     0.0     0.0     84.3