Table 2.
Popular machine learning techniques employed in online extremism detection.
| Technique used for extremism detection | Study | Hyperparameter | Features | Performance metric | Remark |
|---|---|---|---|---|---|
| Naïve Bayes/multinomial Naïve Bayes | [35, 42, 46, 60–63] | Alpha = 0.01 [46] | n-grams, TF-IDF, and word2vec | Accuracy = 0.66 [46] and correctly classified instances = 89% [42] | Naïve Bayes or multinomial Naïve Bayes builds a model using a probabilistic learning approach [46] |
| KNN | [64] | Distance = Euclidean, K = 100 [64] | Term frequency | Precision = 0.48 and accuracy = 0.90 [64] | Distance-based approach that classifies by similarity to labelled extremist text |
| Logistic regression | [36, 62, 65] | NA | Word2vec, fastText, GloVe, and LIWC | F1 score = 99.77 [36] and accuracy = 0.70 [62] | Used for binary classification of extremist text |
| SVM | [36, 42, 44, 46, 49, 61–64, 66, 67] | Penalty = L1, tol = 1e − 3 [46] | n-grams, TF-IDF, word2vec, fastText, GloVe, and PCA | Accuracy = 84 [67] and precision = 84 [49] | SVM separates classes with maximum-margin hyperplanes, improving classification [46] |
| Random forest | [61, 63, 66, 68, 69] | Estimators = 100, Kfold = 5 [68]; estimators = 100, max_depth = 50 [66] | n-grams, TF-IDF, word2vec, and GloVe | Accuracy = 100 [66] and F1-score = 0.93 [69] | Random forest is scalable and robust to outliers in extremist text datasets [66] |
| AdaBoost | [41, 42, 70] | Boosting iterations = 300 [41] | n-grams | Precision = 0.88 [70] and accuracy = 99.5 [42] | AdaBoost improves performance by combining many weak classifiers into a strong one |
| XGBoost | [59, 71] | Regularization = L2 | Betweenness centrality and PageRank | ROC-AUC = 0.95 [59] | XGBoost adds regularization to gradient boosting and trains quickly |
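To make the table concrete, the sketch below shows how several of the listed techniques and hyperparameters map onto scikit-learn. The toy corpus, its labels, and the pipeline structure are illustrative assumptions, not taken from the surveyed studies; only the hyperparameter values annotated with citations come from Table 2 (KNN is shown with K = 3 rather than the cited K = 100 so it fits the toy data).

```python
# Minimal sketch: classical classifiers from Table 2 on TF-IDF features.
# The corpus and labels are illustrative placeholders, not study data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import LinearSVC
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier

texts = [
    "join the movement fight now",
    "violent action against them",
    "weather is nice today",
    "lovely recipe for dinner",
]
labels = [1, 1, 0, 0]  # 1 = extremist, 0 = benign (toy labels)

# TF-IDF over unigrams and bigrams, as used by several cited studies
X = TfidfVectorizer(ngram_range=(1, 2)).fit_transform(texts)

models = {
    # alpha = 0.01 as in [46]
    "MultinomialNB": MultinomialNB(alpha=0.01),
    # Euclidean distance as in [64]; K = 3 instead of 100 for this toy set
    "KNN": KNeighborsClassifier(n_neighbors=3, metric="euclidean"),
    # L1 penalty, tol = 1e-3 as in [46] (dual=False is required with L1)
    "LinearSVC": LinearSVC(penalty="l1", dual=False, tol=1e-3),
    # 100 estimators, max_depth = 50 as in [66]
    "RandomForest": RandomForestClassifier(n_estimators=100, max_depth=50),
    # 300 boosting iterations as in [41]
    "AdaBoost": AdaBoostClassifier(n_estimators=300),
}

for name, model in models.items():
    model.fit(X, labels)
    print(name, model.predict(X).tolist())
```

In practice the surveyed studies evaluate on held-out data (e.g. k-fold cross-validation, as in [68]); fitting and predicting on the same four documents here only demonstrates the API.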