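Table 2. Clustering accuracy of all algorithms on all datasets, measured by adjusted mutual information (AMI).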

| Dataset | k-means++ | GMM | Fuzzy | MS | AC-C | AC-W | N-Cuts | AP | Zell | SEC | LDMGI | GDL | PIC | RCC | RCC-DR |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| MNIST | 0.500 | 0.404 | 0.386 | 0.264 | NA | 0.679 | NA | 0.478 | NA | 0.469 | 0.761 | NA | NA | **0.893** | 0.828 |
| COIL-100 | 0.803 | 0.786 | 0.796 | 0.685 | 0.703 | 0.853 | 0.871 | 0.761 | 0.958 | 0.849 | 0.888 | 0.958 | **0.965** | 0.957 | 0.957 |
| YTF | 0.783 | 0.793 | 0.769 | 0.831 | 0.673 | 0.801 | 0.752 | 0.751 | 0.273 | 0.754 | 0.518 | 0.655 | 0.676 | 0.836 | **0.874** |
| YaleB | 0.615 | 0.591 | 0.066 | 0.091 | 0.445 | 0.767 | 0.928 | 0.700 | 0.905 | 0.849 | 0.945 | 0.924 | 0.941 | **0.975** | 0.974 |
| Reuters | 0.516 | 0.507 | 0.272 | 0.000 | 0.368 | 0.471 | 0.545 | 0.386 | 0.087 | 0.498 | 0.523 | 0.401 | 0.057 | **0.556** | 0.553 |
| RCV1 | 0.355 | 0.344 | 0.205 | 0.000 | 0.108 | 0.364 | 0.140 | 0.313 | 0.023 | 0.069 | 0.382 | 0.020 | 0.015 | 0.138 | **0.442** |
| Pendigits | 0.679 | 0.695 | 0.695 | 0.694 | 0.525 | 0.728 | 0.813 | 0.639 | 0.317 | 0.741 | 0.775 | 0.330 | 0.467 | 0.848 | **0.854** |
| Shuttle | 0.215 | 0.266 | 0.204 | 0.362 | NA | 0.291 | 0.000 | 0.322 | NA | 0.305 | **0.591** | NA | NA | 0.488 | 0.513 |
| Mice Protein | 0.425 | 0.385 | 0.417 | 0.534 | 0.315 | 0.525 | 0.536 | 0.554 | 0.428 | 0.537 | 0.527 | 0.400 | 0.394 | **0.649** | 0.638 |
| Average rank | 7.8 | 8.6 | 9.9 | 9.9 | 12.4 | 6.3 | 6.3 | 8.1 | 10.4 | 7.2 | 4.9 | 9.9 | 10.0 | 2.4 | 1.6 |

For each dataset, the maximum AMI is highlighted in bold. Some prior algorithms did not scale to large datasets such as MNIST (70,000 data points in 784 dimensions). RCC or RCC-DR achieves the highest accuracy on seven of the nine datasets. RCC-DR achieves the highest or second-highest accuracy on eight of the nine datasets. The average rank of RCC-DR across datasets (lower is better) is at least three times lower than the average rank of any prior algorithm. NA, not applicable.
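As a sanity check on the caption's three summary claims, the short Python script below recounts them from the numbers in the table. It is a sketch under stated assumptions: the AMI values and the Rank row are transcribed by hand, "second-highest" is read as "at most one algorithm scores strictly higher", "prior algorithm" is taken to mean every column other than RCC and RCC-DR, and the Rank row is used as reported rather than recomputed, because the table does not say how ties and NA entries were ranked.

```python
"""Recount the summary claims in the Table 2 caption from the table's own numbers."""

NA = None  # entries reported as "NA" (not applicable) in the table

ALGORITHMS = ["k-means++", "GMM", "Fuzzy", "MS", "AC-C", "AC-W", "N-Cuts", "AP",
              "Zell", "SEC", "LDMGI", "GDL", "PIC", "RCC", "RCC-DR"]

# AMI per dataset, columns in the same order as ALGORITHMS.
AMI = {
    "MNIST":        [0.500, 0.404, 0.386, 0.264, NA, 0.679, NA, 0.478, NA, 0.469, 0.761, NA, NA, 0.893, 0.828],
    "COIL-100":     [0.803, 0.786, 0.796, 0.685, 0.703, 0.853, 0.871, 0.761, 0.958, 0.849, 0.888, 0.958, 0.965, 0.957, 0.957],
    "YTF":          [0.783, 0.793, 0.769, 0.831, 0.673, 0.801, 0.752, 0.751, 0.273, 0.754, 0.518, 0.655, 0.676, 0.836, 0.874],
    "YaleB":        [0.615, 0.591, 0.066, 0.091, 0.445, 0.767, 0.928, 0.700, 0.905, 0.849, 0.945, 0.924, 0.941, 0.975, 0.974],
    "Reuters":      [0.516, 0.507, 0.272, 0.000, 0.368, 0.471, 0.545, 0.386, 0.087, 0.498, 0.523, 0.401, 0.057, 0.556, 0.553],
    "RCV1":         [0.355, 0.344, 0.205, 0.000, 0.108, 0.364, 0.140, 0.313, 0.023, 0.069, 0.382, 0.020, 0.015, 0.138, 0.442],
    "Pendigits":    [0.679, 0.695, 0.695, 0.694, 0.525, 0.728, 0.813, 0.639, 0.317, 0.741, 0.775, 0.330, 0.467, 0.848, 0.854],
    "Shuttle":      [0.215, 0.266, 0.204, 0.362, NA, 0.291, 0.000, 0.322, NA, 0.305, 0.591, NA, NA, 0.488, 0.513],
    "Mice Protein": [0.425, 0.385, 0.417, 0.534, 0.315, 0.525, 0.536, 0.554, 0.428, 0.537, 0.527, 0.400, 0.394, 0.649, 0.638],
}

# "Average rank" row, copied as reported (not recomputed here).
AVG_RANK = dict(zip(ALGORITHMS, [7.8, 8.6, 9.9, 9.9, 12.4, 6.3, 6.3, 8.1,
                                 10.4, 7.2, 4.9, 9.9, 10.0, 2.4, 1.6]))


def winners(row):
    """Algorithms that attain the maximum AMI on one dataset (NA entries skipped)."""
    top = max(v for v in row if v is not NA)
    return {a for a, v in zip(ALGORITHMS, row) if v == top}


def n_strictly_better(row, algo):
    """How many algorithms score strictly higher AMI than `algo` on one dataset."""
    mine = row[ALGORITHMS.index(algo)]
    return sum(1 for v in row if v is not NA and v > mine)


ours = {"RCC", "RCC-DR"}
# Claim 1: datasets where RCC or RCC-DR attains the maximum AMI.
rcc_wins = sum(1 for row in AMI.values() if winners(row) & ours)
# Claim 2: datasets where RCC-DR is best or second-best.
rccdr_top2 = sum(1 for row in AMI.values() if n_strictly_better(row, "RCC-DR") <= 1)
# Claim 3: best average rank among prior algorithms divided by RCC-DR's.
best_prior_rank = min(r for a, r in AVG_RANK.items() if a not in ours)
rank_ratio = best_prior_rank / AVG_RANK["RCC-DR"]

print(f"RCC or RCC-DR best on {rcc_wins} of {len(AMI)} datasets")
print(f"RCC-DR best or second-best on {rccdr_top2} of {len(AMI)} datasets")
print(f"best prior average rank / RCC-DR average rank = {rank_ratio:.2f}")
```

Running it reports 7 and 8 datasets and a rank ratio of 3.06 (LDMGI's 4.9 against RCC-DR's 1.6), in line with the caption. The two datasets where neither RCC variant is best are COIL-100 (PIC) and Shuttle (LDMGI).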