2020 Jul 3;36(20):5027–5036. doi: 10.1093/bioinformatics/btaa613

Fig. 1.

Optimal univariate clustering outperforms heuristic clustering in silhouette cluster quality. Heuristic methods include model-based clustering, heuristic k-means and hierarchical clustering in eight configurations. Cluster quality is measured by average silhouette width (ASW) and adjusted Rand index (ARI), where higher is better for both, as a function of the number of clusters k. (a) Optical density of protein DNase (n = 176). (b) Simulated data (n = 251) from a five-component Gaussian mixture model. (c) Locations (n = 1771) of CC dinucleotides on the human mitochondrial genome. The blue vertical lines in (a–c) mark the maximum ASW in each plot. (d–f) Ranked ARI of each method on data from the same Gaussian mixture model as in (b), at sample sizes of (d) 25, (e) 100 and (f) 500, each replicated 51 times to produce the box plots.
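
For readers unfamiliar with the ASW measure named in the caption, the following is a minimal sketch of how an average silhouette width can be computed for univariate data once cluster labels are available. It is an illustration under stated assumptions and not the authors' implementation: the function name `average_silhouette_width` is hypothetical, distance is taken as the absolute difference between values, and a point in a singleton cluster is assigned a silhouette of 0 by convention.

```python
from statistics import mean


def average_silhouette_width(values, labels):
    """Average silhouette width (ASW) of 1-D data under given cluster labels.

    Hypothetical helper for illustration; distance is |x - y|.
    """
    # Group the data points by cluster label.
    clusters = {}
    for x, c in zip(values, labels):
        clusters.setdefault(c, []).append(x)
    if len(clusters) < 2:
        raise ValueError("silhouette width needs at least two clusters")

    widths = []
    for x, c in zip(values, labels):
        own = clusters[c]
        if len(own) == 1:
            # Assumed convention: a singleton cluster contributes s = 0.
            widths.append(0.0)
            continue
        # a: mean distance to the other members of the point's own cluster
        # (the point itself adds 0 to the sum, so divide by len(own) - 1).
        a = sum(abs(x - y) for y in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            mean(abs(x - y) for y in pts)
            for k, pts in clusters.items()
            if k != c
        )
        denom = max(a, b)
        widths.append((b - a) / denom if denom > 0 else 0.0)
    return mean(widths)


# Toy data (made up for this sketch): two well-separated groups.
toy_values = [1.0, 2.0, 10.0, 11.0]
good = average_silhouette_width(toy_values, [0, 0, 1, 1])
bad = average_silhouette_width(toy_values, [0, 1, 0, 1])
```

On this toy data the labeling that follows the two groups scores a higher ASW than the interleaved labeling, which is the sense in which "higher is better" applies when comparing clusterings at a given k.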