2021 Nov 23;9(11):e30467. doi: 10.2196/30467

Table 1.

Performance comparisons between HAC, HAC+KNN, LDA, and k-means models for data sets with 10,000 messages.

Model                Runtime (seconds), mean (SD)  Precision, mean (SD)  Recall, mean (SD)  F score, mean (SD)
HAC [a]              6.594 (0.245)                 N/A [b]               N/A                N/A
HAC+KNN [c] (u=0.2)  2.172 (0.097)                 0.993 (0.003)         0.982 (0.005)      0.986 (0.004)
HAC+KNN (u=0.4)      2.502 (0.023)                 0.995 (0.001)         0.996 (0.002)      0.995 (0.001)
HAC+KNN (u=0.6)      3.418 (0.071)                 0.997 (0.001)         0.998 (0.001)      0.997 (0.001)
HAC+KNN (u=0.8)      4.697 (0.146)                 0.998 (0.001)         0.999 (0.001)      0.999 (0.001)
LDA [d]              1788.981 (62.444)             0.624 (0.029)         0.939 (0.006)      0.704 (0.023)
K-means              41.143 (1.334)                0.993 (0.002)         0.734 (0.011)      0.823 (0.010)

[a] HAC: hierarchical agglomerative clustering.

[b] N/A: not applicable, because model does not include the parameter u.

[c] KNN: k-nearest neighbors.

[d] LDA: latent Dirichlet allocation.
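
The runtime column is easier to compare when each mean is expressed relative to plain HAC. The following is a minimal sketch of that arithmetic, using only the mean runtimes reported in Table 1. The function name `relative_runtime` and the choice of HAC as the baseline are illustrative assumptions, not part of the original study, and ratios of means do not reflect the reported standard deviations.

```python
# Mean runtimes in seconds, copied from Table 1 (10,000-message data sets).
mean_runtime_s = {
    "HAC": 6.594,
    "HAC+KNN (u=0.2)": 2.172,
    "HAC+KNN (u=0.4)": 2.502,
    "HAC+KNN (u=0.6)": 3.418,
    "HAC+KNN (u=0.8)": 4.697,
    "LDA": 1788.981,
    "K-means": 41.143,
}


def relative_runtime(runtimes, baseline="HAC"):
    """Return each model's mean runtime divided by the baseline's mean runtime.

    Hypothetical helper for illustration: values below 1 mean the model
    ran faster than the baseline on average, values above 1 mean slower.
    """
    base = runtimes[baseline]
    return {model: seconds / base for model, seconds in runtimes.items()}


ratios = relative_runtime(mean_runtime_s)
for model, ratio in ratios.items():
    print(f"{model:<18} {ratio:8.2f}x")
```

Read this way, every HAC+KNN setting in the table has a ratio below 1, whereas K-means and LDA both have ratios above 1.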