Table 1.
Library | Language | Ease of implementation and use: data processing | Ease of implementation and use: data clustering | Ease of implementation and use: internal validation metrics | Ease of implementation and use: cluster description | Robustness: cluster stability assessment | Number of downloads in 2021 |
---|---|---|---|---|---|---|---|
FPC | R | Yes | Yes | Silhouette width, Caliński-Harabasz index, Hubert's gamma coefficient, Dunn index, Tibshirani and Walther's prediction strength, etc. | Yes | Bootstrap, noise, resampling, etc. | 985,853 |
Cluster | R | Yes | Yes | Silhouette width, gap statistic, etc. | Yes | / | 891,577 |
Clue | R | No | Yes | Variance accounted for (VEF), deviance accounted for (DEF), … | Yes | Bootstrap | 467,260 |
clValid | R | No | Yes | Connectivity, Silhouette width, Dunn Index, etc. | Yes | Removing each column, one at a time | 96,676 |
sklearn.cluster | Python | Yes | Yes | Rand index, Normalized Mutual Information (NMI), Adjusted Mutual Information (AMI), Silhouette width, Caliński-Harabasz Index, Davies-Bouldin Index, etc. | No | / | NA |
The number of downloads in 2021 was obtained with the cran_downloads() function from the cranlogs R package. NA: not applicable.
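
Silhouette width is the internal validation metric listed for most of the libraries in Table 1. To make the quantity concrete, here is a minimal pure-Python sketch of the average silhouette width under Euclidean distance. It is an illustration only, not the implementation used by any of the listed libraries. The function name `silhouette_width` and the toy data are hypothetical.

```python
from math import dist
from statistics import mean


def silhouette_width(points, labels):
    """Average silhouette width over all points (Euclidean distance)."""
    # Group point indices by cluster label.
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette width needs at least two clusters")

    widths = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            widths.append(0.0)  # convention: a singleton cluster scores 0
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = mean(dist(points[i], points[j]) for j in own)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            mean(dist(points[i], points[j]) for j in members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        widths.append(0.0 if denom == 0 else (b - a) / denom)
    return mean(widths)


# Toy data (assumed for illustration): two tight, well-separated groups.
points = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
good_labels = [0, 0, 1, 1]

print(f"{silhouette_width(points, good_labels):.3f}")  # 0.900
```

Values close to 1 indicate compact, well-separated clusters, while values near 0 or below suggest overlapping or misassigned points. In practice, the library routines would normally be preferred over a hand-written version, for example `silhouette_score` in scikit-learn's `sklearn.metrics` module.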