Table 3.
| Classifiers | FastText P | FastText R | FastText F | FastText A | TF-IDF P | TF-IDF R | TF-IDF F | TF-IDF A | Hybrid P | Hybrid R | Hybrid F | Hybrid A |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| LR | 58.4 | 58.5 | 54.4 | 58.5 | 62.9 | 64.4 | 60.7 | 64.4 | 65.9 | 68.1 | 65.9 | 68.1 |
| K-NN | **65.2** | 65.2 | **65.2** | 60.2 | 58.0 | 61.0 | 58.0 | 60.1 | 64.1 | 66.0 | 63.6 | 66.0 |
| NB | 54.1 | 50.0 | 50.1 | 49.6 | 65.1 | 52.1 | 55.0 | 52.0 | 62.4 | 55.5 | 57.7 | 55.6 |
| DT | 48.4 | 48.0 | 48.2 | 48.0 | 58.1 | 60.5 | 58.6 | 59.5 | 57.0 | 57.9 | 57.4 | 57.9 |
| RF | 63.1 | **65.4** | 62.5 | **64.8** | 63.1 | 65.1 | 62.0 | 64.0 | 68.7 | 67.9 | 64.6 | 67.9 |
| ETC | 63.5 | 61.6 | 58.4 | 61.6 | 63.1 | 64.9 | **62.7** | 64.9 | 69.2 | 67.9 | 64.8 | 67.9 |
| AdaBoost | 54.2 | 56.1 | 52.9 | 56.1 | 63.7 | 62.7 | 58.2 | 59.6 | 60.1 | 62.9 | 60.1 | 62.9 |
| MLP-NN | 61.3 | 60.9 | 56.5 | 60.9 | 58.6 | 60.6 | 59.2 | 60.6 | 66.5 | 66.4 | 66.4 | 66.4 |
| SVM + Linear | 51.2 | 58.3 | 52.2 | 57.8 | 62.1 | 64.3 | 59.2 | 64.2 | 66.0 | 68.0 | 64.6 | 68.0 |
| SVM + RBF | 62.1 | 58.1 | 53.0 | 58.0 | **66.0** | **66.0** | 62.1 | **65.1** | **71.4** | **72.1** | **70.1** | **72.1** |
Note that P, R, F, and A denote overall Precision, Recall, F1-score, and Accuracy, reported for each of the three feature extraction methods (FastText, TF-IDF, and Hybrid). The best hyperparameters of each machine learning algorithm are as follows:

- LR: C: 10, solver: lbfgs, max_iteration: 2000
- K-NN: leaf_size: 35, n_neighbour: 120, p: 1
- DT: criterion: gini, min_sample_leaf: 10, min_sample_split: 2
- RF: min_sample_split: 6, min_sample_leaf: 3
- ETC: min_sample_leaf: 1, min_sample_split: 2, n_estimator: 200
- AdaBoost: learning_rate: 0.8, n_estimator: 100
- MLP-NN: hidden_layer_size: 20, learning_rate_init: 0.01, solver: Adam, max_iteration: 2000
- SVM + Linear: C: 1, Gamma: 0.1
- SVM + RBF: C: 100, Gamma: 0.1

The highest value in each column is highlighted in boldface.
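As a rough illustration of how the four reported metrics (P, R, F, A) are related, the following is a minimal pure-Python sketch for a binary task. The labels below are invented for illustration, not drawn from the paper's dataset, and the paper's "overall" scores may additionally involve averaging across classes, which this toy example does not model.

```python
# Toy illustration of Precision (P), Recall (R), F1-score (F), and
# Accuracy (A) for a binary classification task; labels are invented.
def metrics(y_true, y_pred):
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0        # P
    recall = tp / (tp + fn) if tp + fn else 0.0           # R
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)                 # F
    accuracy = (tp + tn) / len(y_true)                    # A
    return precision, recall, f1, accuracy

p, r, f, a = metrics([1, 0, 1, 1, 0], [1, 0, 0, 1, 1])
print(round(p, 3), round(r, 3), round(f, 3), round(a, 3))
# → 0.667 0.667 0.667 0.6
```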