Table 1.
Performance comparisons between HAC, HAC+KNN, LDA, and k-means models for data sets with 10,000 messages.
| Model | Runtime (seconds), mean (SD) | Precision, mean (SD) | Recall, mean (SD) | F score, mean (SD) |
| HAC<sup>a</sup> | 6.594 (0.245) | N/A<sup>b</sup> | N/A | N/A |
| HAC+KNN<sup>c</sup> (u=0.2) | 2.172 (0.097) | 0.993 (0.003) | 0.982 (0.005) | 0.986 (0.004) |
| HAC+KNN (u=0.4) | 2.502 (0.023) | 0.995 (0.001) | 0.996 (0.002) | 0.995 (0.001) |
| HAC+KNN (u=0.6) | 3.418 (0.071) | 0.997 (0.001) | 0.998 (0.001) | 0.997 (0.001) |
| HAC+KNN (u=0.8) | 4.697 (0.146) | 0.998 (0.001) | 0.999 (0.001) | 0.999 (0.001) |
| LDA<sup>d</sup> | 1788.981 (62.444) | 0.624 (0.029) | 0.939 (0.006) | 0.704 (0.023) |
| K-means | 41.143 (1.334) | 0.993 (0.002) | 0.734 (0.011) | 0.823 (0.010) |
<sup>a</sup>HAC: hierarchical agglomerative clustering.
<sup>b</sup>N/A: not applicable, because the model does not include the parameter u.
<sup>c</sup>KNN: k-nearest neighbors.
<sup>d</sup>LDA: latent Dirichlet allocation.