2021 Nov 23;9(11):e30467. doi: 10.2196/30467

Table 1.

Performance comparisons between HAC, HAC+KNN, LDA, and k-means models for data sets with 10,000 messages.

Model	Runtime (seconds), mean (SD)	Precision, mean (SD)	Recall, mean (SD)	F score, mean (SD)
HAC^a	6.594 (0.245)	N/A^b	N/A	N/A
HAC+KNN^c (u=0.2)	2.172 (0.097)	0.993 (0.003)	0.982 (0.005)	0.986 (0.004)
HAC+KNN (u=0.4)	2.502 (0.023)	0.995 (0.001)	0.996 (0.002)	0.995 (0.001)
HAC+KNN (u=0.6)	3.418 (0.071)	0.997 (0.001)	0.998 (0.001)	0.997 (0.001)
HAC+KNN (u=0.8)	4.697 (0.146)	0.998 (0.001)	0.999 (0.001)	0.999 (0.001)
LDA^d	1788.981 (62.444)	0.624 (0.029)	0.939 (0.006)	0.704 (0.023)
K-means	41.143 (1.334)	0.993 (0.002)	0.734 (0.011)	0.823 (0.010)

^aHAC: hierarchical agglomerative clustering.

^bN/A: not applicable, because the model does not include the parameter u.

^cKNN: k-nearest neighbors.

^dLDA: latent Dirichlet allocation.