Figure 1.
Archetype stability evaluated using the following approaches.
(A) Scree plot of the minimized residual sum of squares (RSS) for k = 1 to 10 archetypes, which was assessed first. For each k, the RSS was taken from the best model out of 100 restarts of the archetype algorithm. The drop in intra-cluster variance (RSS) plateaued at k = 4 or k = 5.
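A minimal sketch of panel (A), assuming NumPy. The caption does not name the archetype algorithm or software, so this uses a generic projected-gradient archetypal analysis (X ≈ A·B·X, with every row of A and B constrained to the probability simplex) as a stand-in; `fit_archetypes`, `scree_rss`, `n_iter` and the random demo data are illustrative assumptions, not the authors' implementation. For each k it keeps the lowest RSS over the restarts, which is the quantity a scree plot displays.

```python
import numpy as np


def project_rows_to_simplex(V):
    """Euclidean projection of each row of V onto the probability simplex."""
    n, d = V.shape
    U = np.sort(V, axis=1)[:, ::-1]                 # rows sorted in descending order
    css = np.cumsum(U, axis=1) - 1.0
    ranks = np.arange(1, d + 1)
    cond = U - css / ranks > 0                      # cond[:, 0] is always True
    rho = d - 1 - np.argmax(cond[:, ::-1], axis=1)  # last index where cond holds
    theta = css[np.arange(n), rho] / (rho + 1)
    return np.maximum(V - theta[:, None], 0.0)


def fit_archetypes(X, k, rng, n_iter=300):
    """One run of archetypal analysis (X ~ A @ B @ X) by projected gradient.

    A (n x k): archetype memberships of each sample (rows sum to 1).
    B (k x n): each archetype as a convex combination of samples.
    Returns (rss, A, archetypes).
    """
    n = X.shape[0]
    A = project_rows_to_simplex(rng.random((n, k)))
    B = project_rows_to_simplex(rng.random((k, n)))
    xxt_norm = np.linalg.norm(X @ X.T, 2)
    for _ in range(n_iter):
        Z = B @ X
        # Step 1/L on A, where L is the Lipschitz constant of the gradient in A.
        L_A = 2.0 * np.linalg.norm(Z @ Z.T, 2) + 1e-12
        A = project_rows_to_simplex(A - 2.0 * (A @ Z - X) @ Z.T / L_A)
        # Step 1/L on B, with the corresponding Lipschitz bound.
        L_B = 2.0 * np.linalg.norm(A.T @ A, 2) * xxt_norm + 1e-12
        B = project_rows_to_simplex(B - 2.0 * A.T @ (A @ B @ X - X) @ X.T / L_B)
    Z = B @ X
    rss = float(np.sum((X - A @ Z) ** 2))
    return rss, A, Z


def scree_rss(X, k_values, n_restarts=100, seed=0):
    """Lowest RSS over n_restarts random starts, for each k."""
    rng = np.random.default_rng(seed)
    return {k: min(fit_archetypes(X, k, rng)[0] for _ in range(n_restarts))
            for k in k_values}


if __name__ == "__main__":
    demo = np.random.default_rng(1).normal(size=(60, 3))  # synthetic data
    for k, rss in scree_rss(demo, range(1, 7), n_restarts=5).items():
        print(f"k={k}: RSS={rss:.2f}")
```

Projected gradient with step 1/L never increases the RSS within a block update, but it can stop at a local optimum, which is why keeping the best of many restarts matters.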
(B) Stability of each k-archetype solution was then assessed by randomly subsampling 90% of the original dataset 100 times and comparing the resulting subgroups with the original subgroups using the adjusted Rand index (ARI). Simultaneously, we evaluated stability at archetype membership cutoffs ranging from 0 to 1 in steps of 0.05. The most stable solution was k = 2, irrespective of the membership threshold, followed by k = 4, which reached a median ARI > 0.75 at a threshold of 0.
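The subsampling procedure in (B) can be sketched as follows, assuming NumPy and a hypothetical `fit_memberships(X, k, rng)` callable that returns an n × k membership matrix (for example, the `A` matrix of an archetype fit). Two details are assumptions because the caption does not specify them: a sample is assigned to its highest-membership archetype only if that membership is at least the cutoff, and the ARI is computed only over samples assigned in both the full-data and the subsample solutions. The ARI uses the standard pair-counting formula on the contingency table.

```python
import numpy as np


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two labelings of the same (>= 2) samples."""
    _, a = np.unique(np.asarray(labels_a), return_inverse=True)
    _, b = np.unique(np.asarray(labels_b), return_inverse=True)
    a, b = a.ravel(), b.ravel()
    table = np.zeros((a.max() + 1, b.max() + 1), dtype=np.int64)
    np.add.at(table, (a, b), 1)  # contingency table of co-assignments

    def pairs(counts):
        return float((counts * (counts - 1) // 2).sum())  # sum of "n choose 2"

    index = pairs(table)
    sum_a = pairs(table.sum(axis=1))
    sum_b = pairs(table.sum(axis=0))
    expected = sum_a * sum_b / pairs(np.array([a.size]))
    max_index = 0.5 * (sum_a + sum_b)
    if max_index == expected:  # degenerate case, e.g. both labelings one cluster
        return 1.0
    return (index - expected) / (max_index - expected)


def threshold_labels(memberships, cutoff):
    """Archetype with the highest membership, or -1 if that membership < cutoff."""
    labels = np.argmax(memberships, axis=1)
    labels[np.max(memberships, axis=1) < cutoff] = -1
    return labels


def subsample_stability(X, k, fit_memberships, cutoffs, n_reps=100, frac=0.9, seed=0):
    """Median ARI between full-data and subsample solutions, per cutoff."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    full = fit_memberships(X, k, rng)
    scores = {float(c): [] for c in cutoffs}
    for _ in range(n_reps):
        idx = rng.choice(n, size=int(round(frac * n)), replace=False)
        sub = fit_memberships(X[idx], k, rng)
        for c in scores:
            ref = threshold_labels(full[idx], c)
            new = threshold_labels(sub, c)
            keep = (ref >= 0) & (new >= 0)  # samples assigned in both solutions
            if keep.sum() >= 2:
                scores[c].append(adjusted_rand_index(ref[keep], new[keep]))
    return {c: (float(np.median(v)) if v else float("nan")) for c, v in scores.items()}


cutoffs = np.linspace(0.0, 1.0, 21)  # 0, 0.05, ..., 1

print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0 (label names do not matter)
```

The ARI is invariant to how archetypes are numbered, which matters here because independent fits on different subsamples can return the same archetypes in a different order.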
(C) Stability of the solutions with two and four archetypes across the full range of tested archetype membership thresholds. Altogether, these analyses showed that the four-archetype solution had a lower RSS than the two-archetype solution while still showing high stability after randomization. Subgroup stability increased with increasing membership threshold and plateaued at 0.6; this threshold was therefore used as the membership cutoff for inclusion in an extreme archetype.
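The caption states that stability plateaued at a cutoff of 0.6 but does not say how the plateau was identified. One simple, purely hypothetical rule is sketched below: choose the smallest cutoff after which the median ARI never rises by more than a tolerance `tol` (an assumed parameter). The numbers in the usage line are made up for illustration and are not the study's results.

```python
import numpy as np


def plateau_threshold(thresholds, median_ari, tol=0.01):
    """Smallest threshold after which median ARI never increases by more than tol.

    Hypothetical plateau rule; thresholds must be sorted in ascending order.
    """
    t = np.asarray(thresholds, dtype=float)
    m = np.asarray(median_ari, dtype=float)
    gains = np.diff(m)  # gains[i] = m[i + 1] - m[i]
    for i in range(len(t)):
        if np.all(gains[i:] <= tol):  # empty slice at the last threshold -> True
            return float(t[i])
    return float(t[-1])


# Illustrative, made-up curve: rises steadily, then flattens after 0.6.
print(plateau_threshold([0.0, 0.2, 0.4, 0.6, 0.8], [0.50, 0.60, 0.70, 0.80, 0.805]))
```

In practice the choice of `tol` drives the result, so a visual check of the curve, as in panel (C), remains the safer basis for the final cutoff.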
Whiskers in (B) and (C) extend to the largest and smallest values no further than 1.5 times the interquartile range (IQR) from the hinges.
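The whisker rule above is the Tukey boxplot convention; its wording appears to follow the ggplot2 `geom_boxplot()` documentation. A minimal NumPy sketch, assuming hinges at the 25th and 75th percentiles with NumPy's default linear interpolation (equivalent to R's type 7 quantiles); `tukey_whiskers` and `coef` are illustrative names:

```python
import numpy as np


def tukey_whiskers(values, coef=1.5):
    """Whisker ends and outliers under the 1.5 x IQR rule."""
    x = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])  # lower and upper hinges
    iqr = q3 - q1
    lower = x[x >= q1 - coef * iqr].min()  # smallest value within the fence
    upper = x[x <= q3 + coef * iqr].max()  # largest value within the fence
    outliers = x[(x < lower) | (x > upper)]
    return float(lower), float(upper), outliers


print(tukey_whiskers([1, 2, 3, 4, 5, 6, 7, 8, 9, 100]))
```

For this input the hinges are 3.25 and 7.75, so the whiskers end at 1 and 9 and the value 100 is drawn as an outlier.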