Table 2. Cluster Distance Measures.

| Linkage Type | Process | Advantage | Disadvantage |
|---|---|---|---|
| Single-linkage or nearest neighbor | Merges the two clusters with the smallest dissimilarity (distance) between their closest pair of data points, one from each cluster. | Less sensitive to outliers | Relies entirely on single links between individual data points, so it tends to form elongated chains |
| Complete-linkage or furthest neighbor | Merges the two clusters with the smallest dissimilarity (distance) between their farthest pair of data points, one from each cluster. | Compact, hyperspherical clusters composed of very similar data points | Vulnerable to outliers; tends to break large clusters apart; clusters tend to have roughly the same diameter, so smaller clusters are merged with larger ones |
| Average-linkage | Calculates the average distance over all pairs of data points, one from each cluster, and merges the two clusters with the smallest average. Clusters are combined only if a predetermined mathematical threshold is met. | Less sensitive to outliers | Tends to split elongated clusters in half, and the tail portions of clusters tend to merge with neighboring clusters |
| Ward's method | Minimizes intracluster variance, measured by the error sum of squares. At each step, merges the two clusters whose combination yields the smallest increase in the error sum of squares. | Clusters are relatively equal in size and shape | Cannot be used with binary variables; tends to form globular clusters |
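
The four criteria in Table 2 can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: it assumes Euclidean distance and the usual agglomerative convention that, at each step, the two clusters with the smallest criterion value are merged. The function names, the `closest_pair` helper, and the toy one-dimensional clusters are hypothetical and do not come from the source.

```python
from itertools import product
from math import dist  # Euclidean distance between two points (Python 3.8+)


def single_linkage(a, b):
    """Distance between the closest pair of points, one from each cluster."""
    return min(dist(p, q) for p, q in product(a, b))


def complete_linkage(a, b):
    """Distance between the farthest pair of points, one from each cluster."""
    return max(dist(p, q) for p, q in product(a, b))


def average_linkage(a, b):
    """Mean distance over all cross-cluster pairs of points."""
    return sum(dist(p, q) for p, q in product(a, b)) / (len(a) * len(b))


def error_sum_of_squares(cluster):
    """Sum of squared distances from each point to the cluster centroid."""
    centroid = tuple(sum(coords) / len(cluster) for coords in zip(*cluster))
    return sum(dist(p, centroid) ** 2 for p in cluster)


def ward_cost(a, b):
    """Increase in the error sum of squares caused by merging a and b."""
    merged = a + b
    return (error_sum_of_squares(merged)
            - error_sum_of_squares(a)
            - error_sum_of_squares(b))


def closest_pair(clusters, criterion):
    """Indices of the two clusters with the smallest criterion value."""
    pairs = [(i, j)
             for i in range(len(clusters))
             for j in range(i + 1, len(clusters))]
    return min(pairs, key=lambda ij: criterion(clusters[ij[0]], clusters[ij[1]]))


# Toy data (assumed for illustration): three clusters of 1-D points.
clusters = [
    [(0.0,), (1.0,)],   # cluster 0
    [(2.5,), (9.0,)],   # cluster 1: spread out, one point close to cluster 0
    [(5.0,), (6.0,)],   # cluster 2
]

for name, criterion in [("single", single_linkage),
                        ("complete", complete_linkage),
                        ("average", average_linkage),
                        ("ward", ward_cost)]:
    print(name, closest_pair(clusters, criterion))
# single (0, 1)
# complete (1, 2)
# average (1, 2)
# ward (1, 2)
```

On this toy data, single linkage chooses clusters 0 and 1 on the strength of one close pair of points (1.0 and 2.5), while the other three criteria choose clusters 1 and 2. That difference is a small instance of the chaining behavior listed as a disadvantage of single linkage.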