
Table 3.

Classification performance (%) of nine machine learning algorithms with three feature extraction methods.

| Classifier | FastText P | FastText R | FastText F | FastText A | TF-IDF P | TF-IDF R | TF-IDF F | TF-IDF A | Hybrid P | Hybrid R | Hybrid F | Hybrid A |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| LR | 58.4 | 58.5 | 54.4 | 58.5 | 62.9 | 64.4 | 60.7 | 64.4 | 65.9 | 68.1 | 65.9 | 68.1 |
| K-NN | **65.2** | 65.2 | **65.2** | 60.2 | 58.0 | 61.0 | 58.0 | 60.1 | 64.1 | 66.0 | 63.6 | 66.0 |
| NB | 54.1 | 50.0 | 50.1 | 49.6 | 65.1 | 52.1 | 55.0 | 52.0 | 62.4 | 55.5 | 57.7 | 55.6 |
| DT | 48.4 | 48.0 | 48.2 | 48.0 | 58.1 | 60.5 | 58.6 | 59.5 | 57.0 | 57.9 | 57.4 | 57.9 |
| RF | 63.1 | **65.4** | 62.5 | **64.8** | 63.1 | 65.1 | 62.0 | 64.0 | 68.7 | 67.9 | 64.6 | 67.9 |
| ETC | 63.5 | 61.6 | 58.4 | 61.6 | 63.1 | 64.9 | **62.7** | 64.9 | 69.2 | 67.9 | 64.8 | 67.9 |
| AdaBoost | 54.2 | 56.1 | 52.9 | 56.1 | 63.7 | 62.7 | 58.2 | 59.6 | 60.1 | 62.9 | 60.1 | 62.9 |
| MLP-NN | 61.3 | 60.9 | 56.5 | 60.9 | 58.6 | 60.6 | 59.2 | 60.6 | 66.5 | 66.4 | 66.4 | 66.4 |
| SVM + Linear | 51.2 | 58.3 | 52.2 | 57.8 | 62.1 | 64.3 | 59.2 | 64.2 | 66.0 | 68.0 | 64.6 | 68.0 |
| SVM + RBF | 62.1 | 58.1 | 53.0 | 58.0 | **66.0** | **66.0** | 62.1 | **65.1** | **71.4** | **72.1** | **70.1** | **72.1** |

Note: P, R, F, and A denote the overall precision, recall, F1-score, and accuracy, respectively, reported for each of the three feature extraction methods (FastText, TF-IDF, and Hybrid). The best hyperparameters of each machine learning algorithm were: LR (C: 10, solver: lbfgs, max_iter: 2000); K-NN (leaf_size: 35, n_neighbors: 120, p: 1); DT (criterion: gini, min_samples_leaf: 10, min_samples_split: 2); RF (min_samples_split: 6, min_samples_leaf: 3); ETC (min_samples_leaf: 1, min_samples_split: 2, n_estimators: 200); AdaBoost (learning_rate: 0.8, n_estimators: 100); MLP-NN (hidden_layer_sizes: 20, learning_rate_init: 0.01, solver: adam, max_iter: 2000); SVM + Linear (C: 1, gamma: 0.1); and SVM + RBF (C: 100, gamma: 0.1). The highest value in each column is shown in boldface.
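
For concreteness, the sketch below shows how a classifier grid like this could be reproduced with the hyperparameters listed above. It is not the authors' code: it assumes scikit-learn (the parameter names match its API), substitutes a synthetic `make_classification` dataset for the actual FastText/TF-IDF/hybrid feature matrices, uses `GaussianNB` as a placeholder for the unspecified NB variant, and computes the "overall" P, R, and F as weighted averages; all of these choices are assumptions.

```python
# Illustrative sketch only (not the authors' implementation): rebuild the classifier
# grid with the hyperparameters reported in the table note, assuming scikit-learn.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB                 # NB variant is an assumption
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import (RandomForestClassifier, ExtraTreesClassifier,
                              AdaBoostClassifier)
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.metrics import precision_recall_fscore_support, accuracy_score

# Synthetic stand-in for the FastText / TF-IDF / hybrid feature matrices and labels.
X, y = make_classification(n_samples=1000, n_features=50, n_informative=20,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    stratify=y, random_state=0)

classifiers = {
    "LR": LogisticRegression(C=10, solver="lbfgs", max_iter=2000),
    "K-NN": KNeighborsClassifier(leaf_size=35, n_neighbors=120, p=1),
    "NB": GaussianNB(),                                    # no NB hyperparameters reported
    "DT": DecisionTreeClassifier(criterion="gini", min_samples_leaf=10,
                                 min_samples_split=2),
    "RF": RandomForestClassifier(min_samples_split=6, min_samples_leaf=3),
    "ETC": ExtraTreesClassifier(min_samples_leaf=1, min_samples_split=2,
                                n_estimators=200),
    "AdaBoost": AdaBoostClassifier(learning_rate=0.8, n_estimators=100),
    "MLP-NN": MLPClassifier(hidden_layer_sizes=(20,), learning_rate_init=0.01,
                            solver="adam", max_iter=2000),
    "SVM + Linear": SVC(kernel="linear", C=1, gamma=0.1),
    "SVM + RBF": SVC(kernel="rbf", C=100, gamma=0.1),
}

for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    # "Overall" P/R/F computed as weighted averages here -- an assumption.
    p, r, f, _ = precision_recall_fscore_support(y_test, y_pred, average="weighted",
                                                 zero_division=0)
    a = accuracy_score(y_test, y_pred)
    print(f"{name}: P={p:.3f}  R={r:.3f}  F={f:.3f}  A={a:.3f}")
```

Running the same loop once per feature representation (FastText, TF-IDF, and their concatenation) would yield a grid of P/R/F/A values analogous to Table 3, though the exact figures depend on the real dataset and preprocessing.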