Table 5. Selected clustering results with high or moderate clustering partition metrics.
Clustering validity metrics and the number of clusters k for different partitions obtained from different data representations. Repr. denotes a data representation used for clustering. It is either an embedding provided by a manifold learning algorithm (SE - Spectral Embedding, LLE - Locally Linear Embedding) or pairwise distances inferred from the data (L1 - Manhattan distance in the original space of taxonomic abundances). Spectral, Spectral Clustering algorithm. D-B index, Davies-Bouldin index. Silh. score, Silhouette score. DBCV, Density-Based Clustering Validation index. Ent., Entropy. Notation as in Table 2.
Dataset | Tax | Repr. | Cluster method | k | D-B index | Silh. score | DBCV | Prediction Strength | Ent.
---|---|---|---|---|---|---|---|---|---
AGP | O | L1 | Spectral | 2 | 0.60 | 0.60 | −0.63 | 0.98 | 0.06
AGP | O | LLE | Spectral | 2 | 0.49 | 0.74 | −0.86 | 0.94 | 0.06
AGP | O | LLE | Spectral | 3 | 0.60 | 0.60 | −0.91 | 0.91 | 0.18
AGP | O | SE | Spectral | 2 | 0.50 | 0.68 | −0.91 | 0.96 | 0.09
AGP | O | SE | Spectral | 3 | 0.57 | 0.63 | −0.92 | 0.94 | 0.19
AGP | F | t-SNE | HDBSCAN | 2 | 1.38 | 0.14 | 0.15 | 1.00 | 0.09
AGP | F | UMAP | HDBSCAN | 2 | 1.02 | 0.17 | 0.22 | 1.00 | 0.06
AGP | G | UMAP | HDBSCAN | 2 | 1.03 | 0.23 | 0.25 | 1.00 | 0.08
HMP | O | t-SNE | HDBSCAN | 2 | 1.00 | 0.13 | 0.12 | 1.00 | 0.06
HMP | O | UMAP | HDBSCAN | 2 | 0.87 | 0.15 | 0.19 | 1.00 | 0.08
HMP | O | UMAP | HDBSCAN | 3 | 1.02 | 0.06 | 0.19 | 1.00 | 0.16
HMP | F | UMAP | HDBSCAN | 2 | 1.03 | 0.08 | 0.10 | 1.00 | 0.08
HMP | F | SE | HDBSCAN | 2 | 0.53 | 0.64 | −0.63 | 1.00 | 0.09
HMP | F | t-SNE | HDBSCAN | 2 | 1.11 | 0.09 | 0.21 | 0.97 | 0.09
HMP | G | UMAP | HDBSCAN | 2 | 1.24 | −0.02 | 0.16 | 1.00 | 0.06
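
As a reading aid for the validity columns in the table above, the following minimal sketch computes the Silhouette score and the Davies-Bouldin index from their standard definitions, using Euclidean distances on a small made-up data set. It is an illustration only, not the pipeline used to produce the table. The toy points, the function names, and the entropy definition (Shannon entropy of the cluster-size distribution, base 2) are assumptions; the exact entropy definition and logarithm base behind the Ent. column are not stated here. DBCV and Prediction Strength are omitted.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def group_by_label(points, labels):
    """Map each cluster label to the list of points assigned to it."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    return clusters


def centroid(points):
    """Coordinate-wise mean of a list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def silhouette_score(points, labels):
    """Mean silhouette over all samples (range -1 to 1, higher is better)."""
    clusters = group_by_label(points, labels)
    total = 0.0
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # a singleton cluster contributes a silhouette of 0
        # a: mean distance to the other members of the same cluster
        a = sum(euclidean(point, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster
        b = min(
            sum(euclidean(point, q) for q in other) / len(other)
            for other_label, other in clusters.items()
            if other_label != label
        )
        total += (b - a) / max(a, b)
    return total / len(points)


def davies_bouldin_index(points, labels):
    """Davies-Bouldin index (0 or more, lower is better)."""
    clusters = group_by_label(points, labels)
    centers = {label: centroid(pts) for label, pts in clusters.items()}
    # scatter: mean distance of cluster members to their centroid
    scatter = {
        label: sum(euclidean(p, centers[label]) for p in pts) / len(pts)
        for label, pts in clusters.items()
    }
    worst_ratios = [
        max(
            (scatter[i] + scatter[j]) / euclidean(centers[i], centers[j])
            for j in clusters
            if j != i
        )
        for i in clusters
    ]
    return sum(worst_ratios) / len(worst_ratios)


def cluster_size_entropy(labels):
    """Shannon entropy (base 2) of cluster proportions; an assumed definition."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


# Hypothetical toy data: two compact, well-separated groups of four points.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

sil = silhouette_score(points, labels)
dbi = davies_bouldin_index(points, labels)
ent = cluster_size_entropy(labels)
print(f"Silhouette={sil:.2f}  Davies-Bouldin={dbi:.2f}  Entropy={ent:.2f}")
```

The two metrics point in opposite directions: a compact, well-separated partition gives a Silhouette score near 1 and a Davies-Bouldin index near 0. In the table, the Spectral Clustering rows show this pattern more strongly than most HDBSCAN rows.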