Table 2. Comparison of clustering algorithms by class, computational method, advantages, and disadvantages.
Algorithm | Class | Computational Method | Advantages | Disadvantages |
---|---|---|---|---|
Agglomerative hierarchical clustering with Ward’s method | Connectivity-based | Sequential, bottom-up merging of objects into clusters to minimize the within-cluster error sum of squares (see sketch below) | Does not require a priori designation of the number of clusters. Can be implemented with a variety of distance metrics and linkage methods. | Geometric interpretation assumes objects lie in Euclidean space. Tends to produce hyperspherical clusters of similar size. Not robust to outliers. High computational cost with high-dimensional data. Requires designation of a level at which to cut the hierarchy to obtain a final cluster solution. Every outlier observation is forced into a cluster. |
Partitioning Around Medoids (PAM; k-medoids) | Partitioning | Iteratively selects a central observation within each cluster (the medoid) and assigns each object to the nearest medoid (see sketch below) | Robust to outliers. Can be implemented with a variety of distance metrics. Low computational cost. | Requires a priori designation of the number of clusters. Tends to produce hyperspherical clusters of similar size. Every outlier observation is forced into a cluster. |
Self-organizing maps (SOM) | Neural-network based | High-dimensional data are projected onto a 1-D or 2-D lattice of neurons, preserving the proximity relationships of the original data as a topological map (see sketch below) | Low computational intensity; very fast. Can be implemented with a variety of distance metrics. | Classically considered a visualization method rather than a clustering approach. Every outlier observation is forced into a cluster. Requires a priori designation of the number of clusters. |
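
To make the computational methods in Table 2 concrete, the sketches below run on synthetic data; the data matrices, cluster counts, and parameter settings are illustrative assumptions rather than values from any study. First, agglomerative clustering with Ward’s method via SciPy: the hierarchy is built bottom-up and must then be cut at a chosen level to yield a flat partition, which is the final-solution step noted in the table.

```python
# Minimal sketch of agglomerative clustering with Ward's method (SciPy).
# The data matrix X and the choice of 4 clusters are illustrative assumptions.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))          # 100 observations, 5 features

# Bottom-up merging: each step joins the pair of clusters whose merge
# minimizes the increase in the within-cluster error sum of squares.
Z = linkage(X, method="ward")          # assumes Euclidean geometry

# The hierarchy must be cut at some level to obtain a final solution;
# here we request a flat partition into 4 clusters.
labels = fcluster(Z, t=4, criterion="maxclust")
print(np.bincount(labels)[1:])         # cluster sizes
```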
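Next, a minimal k-medoids sketch. The full PAM algorithm proceeds through BUILD and SWAP phases; the simplified alternating variant below (assign each object to its nearest medoid, then recompute each cluster's medoid) conveys the same idea. Because it operates on a precomputed dissimilarity matrix, any metric can be substituted, and using actual observations as centers is what confers robustness to outliers.

```python
# Minimal k-medoids sketch in NumPy. This is a simplified alternating
# variant, not the full BUILD/SWAP procedure of PAM; data, k=3, and the
# random seed are illustrative assumptions.
import numpy as np

def k_medoids(D, k, n_iter=100, seed=0):
    """D is a precomputed (n, n) dissimilarity matrix -- any metric works."""
    n = D.shape[0]
    rng = np.random.default_rng(seed)
    medoids = rng.choice(n, size=k, replace=False)
    for _ in range(n_iter):
        # Assignment step: each object joins its nearest medoid.
        labels = np.argmin(D[:, medoids], axis=1)
        # Update step: the new medoid is the cluster member with the
        # smallest total dissimilarity to the rest of its cluster.
        new_medoids = medoids.copy()
        for j in range(k):
            members = np.where(labels == j)[0]
            within = D[np.ix_(members, members)].sum(axis=1)
            new_medoids[j] = members[np.argmin(within)]
        if np.array_equal(new_medoids, medoids):
            break                       # converged: medoids stable
        medoids = new_medoids
    return medoids, labels

rng = np.random.default_rng(1)
X = rng.normal(size=(60, 2))
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # Euclidean here
medoids, labels = k_medoids(D, k=3)
print(medoids, np.bincount(labels))
```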
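Finally, a minimal self-organizing map sketch: samples are presented one at a time, the best-matching unit (BMU) on a small 2-D lattice is found, and the BMU together with its lattice neighbors is pulled toward the sample, so the trained map preserves the proximity structure of the original data. The grid size, learning rate, and decay schedules are arbitrary illustrative choices.

```python
# Minimal self-organizing map sketch in NumPy. Grid size, learning rate,
# and decay schedule are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 5))                 # 200 observations, 5 features

rows, cols, d = 6, 6, X.shape[1]
W = rng.normal(size=(rows, cols, d))          # neuron weight vectors
grid = np.stack(np.meshgrid(np.arange(rows), np.arange(cols),
                            indexing="ij"), axis=-1)  # lattice coordinates

n_steps = 3000
for t in range(n_steps):
    x = X[rng.integers(len(X))]
    # Best-matching unit: the neuron whose weights are closest to the sample.
    dists = np.linalg.norm(W - x, axis=-1)
    bmu = np.unravel_index(np.argmin(dists), dists.shape)
    # Learning rate and neighborhood radius both shrink over time.
    lr = 0.5 * (1 - t / n_steps)
    sigma = 3.0 * (1 - t / n_steps) + 0.5
    # Gaussian neighborhood on the lattice: neurons near the BMU move more.
    lattice_d2 = ((grid - np.array(bmu)) ** 2).sum(axis=-1)
    h = np.exp(-lattice_d2 / (2 * sigma ** 2))
    W += lr * h[..., None] * (x - W)

# Each observation maps to the lattice position of its BMU.
bmus = [np.unravel_index(np.argmin(np.linalg.norm(W - x, axis=-1)),
                         (rows, cols)) for x in X]
print(bmus[:5])
```

Clusters are then read off the trained map, e.g., by grouping observations whose BMUs fall in contiguous lattice regions, which is why SOM is classically viewed as a visualization aid rather than a clustering method per se.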