. 2022 Feb 21;2021:611–620.

Table 5:

Ensemble learning performance for various combinations of models and encoding methods.

Model combination (encoding)     Prediction overlap raw count (out of 90000)   Percentage of test set (%)   Accuracy on prediction overlap (%)
LR + KNN (Levi)                  86430                                         96.0                         95.2
LR + RF (Levi)                   87580                                         97.3                         94.9
KNN + RF (Levi)                  87211                                         96.9                         95.1
LR + KNN (Tokens)                78469                                         87.2                         94.7
LR + RF (Tokens)                 79174                                         88.0                         97.1
KNN + RF (Tokens)                81643                                         90.7                         96.3
LR Levi + Tokens                 80003                                         88.9                         95.9
RF Levi + Tokens                 83526                                         92.8                         97.9
KNN Levi + Tokens                79103                                         87.9                         97.0
LR + RF + KNN (Levi)             85774                                         95.3                         95.6
LR + RF + KNN (Tokens)           75423                                         83.8                         97.7
LR + RF + KNN (Levi + Tokens)    72477                                         80.5                         99.0

Abbreviations: LR, logistic regression classifier; RF, random forest classifier; KNN, K-nearest neighbors classifier; Levi, Levenshtein distance encoding; Tokens, frequency tokenization encoding.
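
The three columns of Table 5 can be illustrated with a minimal sketch. It assumes that "prediction overlap" means the test examples on which every model in a combination predicts the same label, and that accuracy is then measured only on those examples; the excerpt does not spell out the exact procedure, so this is an interpretation rather than the authors' implementation. The function name `overlap_metrics` and the toy predictions are hypothetical.

```python
from typing import Sequence, Tuple


def overlap_metrics(
    predictions: Sequence[Sequence[int]],
    labels: Sequence[int],
) -> Tuple[int, float, float]:
    """Return (overlap count, percent of test set, accuracy on overlap in %).

    Assumption: an example belongs to the overlap when all models agree on it.
    """
    n = len(labels)
    if n == 0 or not predictions or any(len(p) != n for p in predictions):
        raise ValueError("need at least one model and equal-length, non-empty inputs")

    first = predictions[0]
    # Indices where every model predicts the same label as the first model.
    agree = [i for i in range(n) if all(p[i] == first[i] for p in predictions)]
    # Among agreed examples, count those where the shared prediction is correct.
    correct = sum(1 for i in agree if first[i] == labels[i])

    percent_of_test_set = 100.0 * len(agree) / n
    accuracy_on_overlap = 100.0 * correct / len(agree) if agree else float("nan")
    return len(agree), percent_of_test_set, accuracy_on_overlap


# Toy data (hypothetical, not from the study): five test examples, two models.
labels = [0, 1, 1, 0, 1]
lr_pred = [0, 1, 0, 0, 1]
rf_pred = [0, 1, 1, 0, 0]

count, pct, acc = overlap_metrics([lr_pred, rf_pred], labels)
print(count, pct, acc)
```

Under this reading, the second column follows directly from the first: for the LR + KNN (Levi) row, 86430 of 90000 test examples is 96.0% of the test set. Requiring more models to agree shrinks the overlap but tends to raise accuracy on it, which matches the trend in the last three rows.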