Algorithm 1. K-means algorithm
Input: K, the number of clusters to be formed; the training set (data matrix).
Output: the cluster assigned to each point.
1. Randomly choose K points (rows of the data matrix). These points are the initial centroids.
2. Assign a cluster to each point (or observation), randomly.
3. Calculate the centroid of each cluster (i.e., the vector of the means of the different variables).
4. For each point, calculate its Euclidean distance to the centroid of each cluster.
5. Assign each point to the closest cluster.
6. Calculate the sum of the intra-cluster variability.
7. Repeat steps 3 to 5 until convergence, that is, until the clusters no longer change or the sum of the intra-cluster variability stabilizes.
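The procedure above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation. It assumes the data matrix is a list of numeric rows, initializes by picking K rows as centroids (the first of the two initializations listed), and stops when no point changes cluster. The names `k_means`, `centroid`, `squared_distance`, `max_iter`, and `seed` are hypothetical and do not come from the algorithm box. Squared distances are used for the assignment because they give the same nearest centroid as Euclidean distances.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Vector of per-variable means of a non-empty list of points."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def k_means(data, k, max_iter=100, seed=0):
    """Illustrative K-means; returns (labels, centroids, wcss)."""
    rng = random.Random(seed)
    # Initialization: pick k distinct rows as the starting centroids.
    centroids = [list(row) for row in rng.sample(data, k)]
    labels = None
    for _ in range(max_iter):
        # Assign each point to the cluster whose centroid is closest.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(row, centroids[j]))
            for row in data
        ]
        # Convergence: stop when no point changes cluster.
        if new_labels == labels:
            break
        labels = new_labels
        # Recompute each centroid; an empty cluster keeps its old centroid
        # (an assumption of this sketch, not stated in the algorithm box).
        for j in range(k):
            members = [row for row, lab in zip(data, labels) if lab == j]
            if members:
                centroids[j] = centroid(members)
    # Sum of intra-cluster variability (within-cluster sum of squares).
    wcss = sum(
        squared_distance(row, centroids[lab]) for row, lab in zip(data, labels)
    )
    return labels, centroids, wcss
```

For example, `k_means(rows, 2)` on a list `rows` of two-dimensional points returns one label per row, the two final centroids, and the final sum of intra-cluster variability. The result can depend on the random initialization, so in practice the procedure is often run from several seeds and the run with the smallest sum is kept.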