Author manuscript; available in PMC: 2022 Jun 1.
Published in final edited form as: J Biomed Inform. 2021 Apr 20;118:103788. doi: 10.1016/j.jbi.2021.103788

Table 2.

Features of 3 implemented clustering algorithms.

Algorithm: Agglomerative hierarchical clustering with Ward's method
Class: Connectivity-based
Computational method: Sequential, bottom-up merging of objects into clusters so as to minimize the within-cluster error sum of squares.
Advantages: Does not require a priori designation of the number of clusters. Can be implemented with a variety of distance metrics and linkage methods.
Disadvantages: Geometric interpretation assumes objects lie in Euclidean space. Tends to produce hyperspherical clusters of similar size. Not robust to outliers. High computational cost with high-dimensional data. Requires designation of a level at which to cut the hierarchy to obtain a final cluster solution. Every outlier observation is forced into a cluster.

Algorithm: Partitioning Around Medoids (PAM; k-medoids)
Class: Partitioning
Computational method: Iteratively selects a central observation within each cluster (the medoid) and assigns each object to the nearest medoid.
Advantages: Robust to outliers. Can be implemented with a variety of distance metrics. Low computational cost.
Disadvantages: Requires a priori designation of the number of clusters. Tends to produce hyperspherical clusters of similar size. Every outlier observation is forced into a cluster.

Algorithm: Self-organizing maps (SOM)
Class: Neural-network based
Computational method: High-dimensional data are projected onto a 1-D or 2-D lattice of neurons, preserving the proximity relationships of the original data as a topological map.
Advantages: Low computational intensity; very fast. Can be implemented with a variety of distance metrics.
Disadvantages: Classically considered a visualization method rather than a clustering approach. Every outlier observation is forced into a cluster. Requires a priori designation of the number of clusters.

Compiled from the following references: [1,2,25].
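The three approaches in the table can be illustrated on toy data. The sketch below is not from the article: it uses scipy's standard Ward linkage for the hierarchical method, while `k_medoids` (a simple Voronoi-iteration variant of the k-medoids idea, not the full PAM build/swap procedure) and `som_1d` (a minimal one-dimensional SOM) are hypothetical illustrative implementations written for this example.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
# Toy data: two well-separated 2-D Gaussian blobs, 20 points each.
X = np.vstack([rng.normal(0.0, 0.3, (20, 2)),
               rng.normal(3.0, 0.3, (20, 2))])

# 1) Agglomerative hierarchical clustering with Ward's method: build the
#    full merge tree bottom-up, then cut the hierarchy to get k = 2 clusters.
Z = linkage(X, method="ward")
ward_labels = fcluster(Z, t=2, criterion="maxclust")

# 2) k-medoids via Voronoi iteration (a sketch of the idea behind PAM,
#    not the full build/swap procedure of the PAM algorithm).
def k_medoids(X, k, n_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise distances
    medoids = rng.choice(len(X), size=k, replace=False)
    for _ in range(n_iter):
        labels = np.argmin(D[:, medoids], axis=1)  # assign to nearest medoid
        new_medoids = medoids.copy()
        for j in range(k):
            members = np.where(labels == j)[0]
            # New medoid: the member minimizing total distance to its cluster.
            new_medoids[j] = members[np.argmin(D[np.ix_(members, members)].sum(axis=1))]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    return np.argmin(D[:, medoids], axis=1), medoids

pam_labels, medoids = k_medoids(X, k=2)

# 3) Minimal 1-D SOM: project the 2-D points onto a line of 4 neurons,
#    preserving neighborhood relations; each point is labeled by its
#    best-matching unit (BMU).
def som_1d(X, n_neurons=4, n_epochs=30, lr=0.5, seed=0):
    rng = np.random.default_rng(seed)
    W = X[rng.choice(len(X), size=n_neurons, replace=False)].copy()
    for epoch in range(n_epochs):
        sigma = max(1.0 - epoch / n_epochs, 0.1)  # shrinking neighborhood radius
        for x in X[rng.permutation(len(X))]:
            bmu = np.argmin(np.linalg.norm(W - x, axis=1))
            h = np.exp(-((np.arange(n_neurons) - bmu) ** 2) / (2 * sigma ** 2))
            W += lr * h[:, None] * (x - W)  # pull BMU and its lattice neighbors
    return np.argmin(np.linalg.norm(X[:, None, :] - W[None, :, :], axis=-1), axis=1)

som_labels = som_1d(X)
```

Note the contrasts from the table: Ward's method needs a cut level (`t=2`) only after the hierarchy is built, while both the k-medoids and SOM sketches require the number of clusters or neurons up front, and all three force every observation, outlier or not, into some cluster.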