Author manuscript; available in PMC: 2022 Jun 1.
Published in final edited form as: J Biomed Inform. 2021 Apr 20;118:103788. doi: 10.1016/j.jbi.2021.103788

Table 2.

Features of 3 implemented clustering algorithms.

Algorithm: Agglomerative hierarchical clustering with Ward's method
Class: Connectivity-based
Computational method: Sequential, bottom-up merging of objects into clusters so as to minimize the within-cluster error sum of squares.
Advantages: Does not require a priori designation of the number of clusters. Can be implemented with a variety of distance metrics and linkage methods.
Disadvantages: Geometric interpretation assumes objects lie in Euclidean space. Tends to produce hyperspherical clusters of similar size. Not robust to outliers. High computational cost with high-dimensional data. Requires designation of a level at which to cut the hierarchy to obtain a final cluster solution. Every outlier observation is forced into a cluster.

Algorithm: Partitioning Around Medoids (PAM; k-medoids)
Class: Partitioning
Computational method: Iteratively selects a central observation within each cluster (the medoid) and assigns each object to the nearest medoid.
Advantages: Robust to outliers. Can be implemented with a variety of distance metrics. Low computational cost.
Disadvantages: Requires a priori designation of the number of clusters. Tends to produce hyperspherical clusters of similar size. Every outlier observation is forced into a cluster.

Algorithm: Self-organizing maps (SOM)
Class: Neural-network based
Computational method: High-dimensional data are projected onto a 1-D or 2-D lattice of neurons, preserving the proximity relationships of the original data as a topological map.
Advantages: Low computational intensity; very fast. Can be implemented with a variety of distance metrics.
Disadvantages: Classically considered a visualization method rather than a clustering approach. Every outlier observation is forced into a cluster. Requires a priori designation of the number of clusters.

Compiled from the following references: [1,2,25].
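The three approaches in the table can be illustrated on toy data. The sketch below is not from the article: it uses scipy's standard Ward linkage for the hierarchical method, while `k_medoids` (a simple Voronoi-iteration variant of the k-medoids idea, not the full PAM build/swap procedure) and `som_1d` (a minimal one-dimensional SOM) are hypothetical illustrative implementations written for this example.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
# Toy data: two well-separated 2-D Gaussian blobs, 20 points each.
X = np.vstack([rng.normal(0.0, 0.3, (20, 2)),
               rng.normal(3.0, 0.3, (20, 2))])

# 1) Agglomerative hierarchical clustering with Ward's method: build the
#    full merge tree bottom-up, then cut the hierarchy to get k = 2 clusters.
Z = linkage(X, method="ward")
ward_labels = fcluster(Z, t=2, criterion="maxclust")

# 2) k-medoids via Voronoi iteration (a sketch of the idea behind PAM,
#    not the full build/swap procedure of the PAM algorithm).
def k_medoids(X, k, n_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise distances
    medoids = rng.choice(len(X), size=k, replace=False)
    for _ in range(n_iter):
        labels = np.argmin(D[:, medoids], axis=1)  # assign to nearest medoid
        new_medoids = medoids.copy()
        for j in range(k):
            members = np.where(labels == j)[0]
            # New medoid: the member minimizing total distance to its cluster.
            new_medoids[j] = members[np.argmin(D[np.ix_(members, members)].sum(axis=1))]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    return np.argmin(D[:, medoids], axis=1), medoids

pam_labels, medoids = k_medoids(X, k=2)

# 3) Minimal 1-D SOM: project the 2-D points onto a line of 4 neurons,
#    preserving neighborhood relations; each point is labeled by its
#    best-matching unit (BMU).
def som_1d(X, n_neurons=4, n_epochs=30, lr=0.5, seed=0):
    rng = np.random.default_rng(seed)
    W = X[rng.choice(len(X), size=n_neurons, replace=False)].copy()
    for epoch in range(n_epochs):
        sigma = max(1.0 - epoch / n_epochs, 0.1)  # shrinking neighborhood radius
        for x in X[rng.permutation(len(X))]:
            bmu = np.argmin(np.linalg.norm(W - x, axis=1))
            h = np.exp(-((np.arange(n_neurons) - bmu) ** 2) / (2 * sigma ** 2))
            W += lr * h[:, None] * (x - W)  # pull BMU and its lattice neighbors
    return np.argmin(np.linalg.norm(X[:, None, :] - W[None, :, :], axis=-1), axis=1)

som_labels = som_1d(X)
```

Note the contrasts from the table: Ward's method needs a cut level (`t=2`) only after the hierarchy is built, while both the k-medoids and SOM sketches require the number of clusters or neurons up front, and all three force every observation, outlier or not, into some cluster.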