Table 5:
Model combination | Prediction overlap raw count (out of 90000) | Percentage of test set (%) | Accuracy on prediction overlap (%) |
LR + KNN (Levi) | 86430 | 96.0 | 95.2 |
LR + RF (Levi) | 87580 | 97.3 | 94.9 |
KNN + RF (Levi) | 87211 | 96.9 | 95.1 |
LR + KNN (Tokens) | 78469 | 87.2 | 94.7 |
LR + RF (Tokens) | 79174 | 88.0 | 97.1 |
KNN + RF (Tokens) | 81643 | 90.7 | 96.3 |
LR Levi + Tokens | 80003 | 88.9 | 95.9 |
RF Levi + Tokens | 83526 | 92.8 | 97.9 |
KNN Levi + Tokens | 79103 | 87.9 | 97.0 |
LR + RF + KNN (Levi) | 85774 | 95.3 | 95.6 |
LR + RF + KNN (Tokens) | 75423 | 83.8 | 97.7 |
LR + RF + KNN (Levi + Tokens) | 72477 | 80.5 | 99.0 |
Abbreviations: LR, logistic regression classifier; RF, random forest classifier; KNN, K-nearest neighbors classifier; Levi, Levenshtein distance encoding; Tokens, frequency tokenization encoding.
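The columns of Table 5 can be reproduced with a short computation. The sketch below is illustrative only and rests on an assumption not stated in the table: that "prediction overlap" means the test samples on which every model in a combination predicts the same label, and that "accuracy on prediction overlap" is the share of those samples whose agreed label matches the true label. The function name `overlap_stats` and the toy label vectors are hypothetical.

```python
from typing import Sequence, Tuple


def overlap_stats(y_true: Sequence[int], *predictions: Sequence[int]) -> Tuple[int, float, float]:
    """Return (overlap count, overlap as % of test set, accuracy on overlap in %).

    Assumption: "overlap" = samples where all supplied models agree.
    """
    if not predictions:
        raise ValueError("need at least one prediction vector")
    n = len(y_true)
    if n == 0 or any(len(p) != n for p in predictions):
        raise ValueError("all vectors must be non-empty and of equal length")

    first = predictions[0]
    # Indices where every model predicts the same label as the first model.
    agree = [i for i in range(n) if all(p[i] == first[i] for p in predictions)]

    count = len(agree)
    pct_of_test_set = 100.0 * count / n
    correct = sum(1 for i in agree if first[i] == y_true[i])
    # Accuracy is undefined when the models never agree.
    accuracy = 100.0 * correct / count if count else float("nan")
    return count, pct_of_test_set, accuracy


# Toy data (hypothetical, not from the study).
y_true = [0, 1, 1, 1, 1]
lr = [0, 1, 0, 0, 1]
knn = [0, 1, 1, 0, 0]
rf = [0, 1, 0, 1, 1]

pair = overlap_stats(y_true, lr, knn)        # two-model overlap
triple = overlap_stats(y_true, lr, knn, rf)  # three-model overlap
print(pair, triple)
```

In the toy data, adding a third model shrinks the overlap while raising accuracy on it, which is the same pattern the table shows when moving from two-model to three-model combinations.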