Pseudocode: “K”-value determination based on clustering for data labeling |
Step-1: Define input parameters: data, max_clusters = 10, scaling ∈ {True, False}, visualization ∈ {True, False}, and metric = 'euclidean'
Step-2: Define lists: n_clusters_list, silhouette_list
Step-3: if (scaling == True) then scalar = convert_to_min_max(data) else scalar = data
Step-4: For n_c = 2 to max_clusters + 1 do
    kmeans_model = KMeans(n_clusters = n_c).fit(scalar)
    labels = find_labels(kmeans_model)
    n_clusters_list.append(n_c)
    silhouette_list.append(silhouette_score(scalar, labels, metric = metric))
End
Step-5: Cross-verify the “K” value with the Elbow method.
Step-6: Find the best parameters based on the defined lists.
Step-7: Perform data labeling with the best model.
Step-8: Visualize the best clustering in terms of the number of clusters (n_c) and the silhouette score. |
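The steps above can be sketched in Python with scikit-learn, whose `KMeans` and `silhouette_score` names match the pseudocode. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `select_k`, the `random_state` and `n_init` arguments, and the use of `MinMaxScaler` for `convert_to_min_max` are choices made here. The loop bound is read as a Python-style range, so candidate values run from 2 to `max_clusters` inclusive. The within-cluster sum of squares (`inertia_`) is collected as the quantity one would typically inspect for the Elbow check in Step-5, and plotting (Step-8) is left out.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import MinMaxScaler


def select_k(data, max_clusters=10, scaling=True, metric="euclidean",
             random_state=0):
    """Hypothetical helper: choose K by the highest silhouette score.

    Returns (best_k, labels, n_clusters_list, silhouette_list, inertia_list).
    """
    X = np.asarray(data, dtype=float)

    # Step-3: optional min-max scaling (assumed meaning of convert_to_min_max).
    if scaling:
        X = MinMaxScaler().fit_transform(X)

    # The silhouette score needs 2 <= k <= n_samples - 1.
    upper = min(max_clusters, X.shape[0] - 1)
    if upper < 2:
        raise ValueError("need at least 3 samples and max_clusters >= 2")

    # Step-2: result lists (inertia_list is an addition for the Elbow check).
    n_clusters_list, silhouette_list, inertia_list, models = [], [], [], []

    # Step-4: fit one model per candidate k and record its scores.
    for n_c in range(2, upper + 1):
        model = KMeans(n_clusters=n_c, n_init=10,
                       random_state=random_state).fit(X)
        n_clusters_list.append(n_c)
        silhouette_list.append(
            silhouette_score(X, model.labels_, metric=metric))
        inertia_list.append(model.inertia_)
        models.append(model)

    # Step-6: best parameters = candidate with the highest silhouette score.
    best = int(np.argmax(silhouette_list))

    # Step-7: the labels of the best model are the data labels.
    return (n_clusters_list[best], models[best].labels_,
            n_clusters_list, silhouette_list, inertia_list)
```

Calling `select_k(X, max_clusters=6)` on a 2-D array `X` returns the selected K together with the per-candidate lists, so the silhouette and inertia curves can be plotted against `n_clusters_list` to carry out Steps 5 and 8.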