Author manuscript; available in PMC: 2019 May 16.
Published in final edited form as: J Am Stat Assoc. 2018 May 16;113(521):95–110. doi: 10.1080/01621459.2017.1330202

Table 5:

Results from the MSN cluster scenario based on 500 simulations for each sample size (1000 for sparse k-means). The percentage of models selecting each number of clusters is based on only those data sets for which any variables were selected for clustering.

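Columns 1 to 6 give the Number of Clusters (%), i.e. the percentage of models selecting each number of clusters.

| Approach | N | ARI (95% CI) | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|---|---|
| VSCC+MVN | 200 | 0.599 (0.585, 0.612) | 0.2 | 41.8 | 48.2 | 7.8 | 1.4 | 0.6 |
| VSCC+MVN | 500 | 0.589 (0.578, 0.600) | 0.0 | 38.0 | 28.8 | 26.2 | 6.0 | 1.0 |
| VSCC+MVN | 800 | 0.515 (0.504, 0.525) | 0.0 | 24.6 | 11.2 | 36.0 | 23.0 | 5.2 |
| VSCC+MSN | 200 | 0.556 (0.543, 0.568) | 11.6 | 82.6 | 5.8 | 0.0 | 0.0 | 0.0 |
| VSCC+MSN | 500 | 0.569 (0.559, 0.579) | 0.8 | 81.6 | 15.4 | 2.2 | 0.0 | 0.0 |
| VSCC+MSN | 800 | 0.558 (0.546, 0.569) | 0.0 | 71.0 | 21.4 | 6.8 | 0.8 | 0.0 |
| clustvarsel+MVN | 200 | 0.473 (0.460, 0.486) | 0.0 | 9.0 | 49.8 | 34.2 | 6.0 | 1.0 |
| clustvarsel+MVN | 500 | 0.372 (0.362, 0.382) | 0.0 | 0.0 | 1.8 | 35.2 | 37.6 | 25.4 |
| clustvarsel+MVN | 800 | 0.359 (0.351, 0.366) | 0.0 | 0.0 | 0.0 | 7.6 | 39.2 | 53.2 |
| clustvarsel+MSN | 200 | 0.512 (0.497, 0.527) | 0.2 | 68.2 | 29.6 | 1.6 | 0.4 | 0.0 |
| clustvarsel+MSN | 500 | 0.498 (0.484, 0.512) | 0.0 | 23.8 | 55.2 | 18.6 | 2.4 | 0.0 |
| clustvarsel+MSN | 800 | 0.458 (0.445, 0.471) | 0.0 | 5.2 | 39.8 | 39.8 | 11.2 | 4.0 |
| clustvarsel+CFUST | 200 | 0.574 (0.563, 0.585) | 9.4 | 86.0 | 4.0 | 0.60 | 0.0 | 0.0 |
| skewvarsel+MVN | 200 | 0.560 (0.547, 0.574) | 0.0 | 21.93 | 51.51 | 25.35 | 1.01 | 0.20 |
| skewvarsel+MVN | 500 | 0.517 (0.507, 0.527) | 0.0 | 0.6 | 11.4 | 72.2 | 14.6 | 1.2 |
| skewvarsel+MVN | 800 | 0.526 (0.520, 0.533) | 0.0 | 0.0 | 3.4 | 78.4 | 15.4 | 2.8 |
| skewvarsel+MSN | 200 | 0.595 (0.584, 0.605) | 0.20 | 87.12 | 12.27 | 0.40 | 0.0 | 0.0 |
| skewvarsel+MSN | 500 | 0.702 (0.693, 0.711) | 0.0 | 49.6 | 49.6 | 0.8 | 0.0 | 0.0 |
| skewvarsel+MSN | 800 | 0.762 (0.755, 0.768) | 0.0 | 16.0 | 79.0 | 4.8 | 0.2 | 0.0 |
| skewvarsel+CFUST | 200 | 0.615 (0.607, 0.623) | 24.4 | 73.8 | 1.2 | 0.0 | 0.0 | 0.0 |
| PGMM | 200 | 0.608 (0.598, 0.617) | 0.0 | 67.6 | 27.2 | 1.8 | 1.2 | 2.2 |
| PGMM | 500 | 0.650 (0.639, 0.660) | 0.0 | 33.8 | 43.8 | 18.8 | 3.6 | 0.0 |
| PGMM | 800 | 0.584 (0.569, 0.598) | 0.0 | 9.0 | 35.0 | 28.6 | 22.2 | 5.2 |
| sparse k-means | 200 | 0.392 (0.377, 0.408) | 63.4 | 16.0 | 15.0 | 4.9 | 0.7 | 0.0 |
| sparse k-means | 500 | 0.425 (0.410, 0.440) | 50.5 | 11.8 | 27.6 | 7.0 | 2.9 | 0.2 |
| sparse k-means | 800 | 0.420 (0.406, 0.434) | 42.3 | 12.5 | 33.2 | 6.8 | 4.8 | 0.4 |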
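
To make the two reported quantities concrete, the following is a minimal Python sketch, not the authors' code. It assumes the ARI is the standard pair-counting adjusted Rand index (the table does not state the implementation used), and it applies the caption's rule that cluster-count percentages are computed only over data sets for which any variables were selected. The function names and the `(n_selected_vars, n_clusters)` input layout are hypothetical, chosen for illustration.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Pair-counting adjusted Rand index between two labelings (assumed standard form)."""
    n = len(labels_true)
    if n != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # fewer than two points: labelings agree trivially

    # Contingency-table cell counts and its row/column margins.
    cells = Counter(zip(labels_true, labels_pred))
    rows = Counter(labels_true)
    cols = Counter(labels_pred)

    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())

    expected = sum_rows * sum_cols / total_pairs
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings put all points in one cluster
    return (sum_cells - expected) / (max_index - expected)


def cluster_count_percentages(fits, max_clusters=6):
    """Percentage of fits choosing each number of clusters.

    `fits` is a hypothetical list of (n_selected_vars, n_clusters) pairs,
    one per simulated data set. Fits with no selected variables are
    excluded, following the rule stated in the table caption.
    """
    kept = [g for n_vars, g in fits if n_vars > 0]
    if not kept:
        return {g: 0.0 for g in range(1, max_clusters + 1)}
    counts = Counter(kept)
    return {g: 100.0 * counts[g] / len(kept) for g in range(1, max_clusters + 1)}
```

For example, `cluster_count_percentages([(2, 2), (0, 1), (3, 3), (1, 2)])` drops the second fit because it selected no variables, so the percentages are taken over the remaining three fits rather than all four.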