Table 7.
Best accuracy achieved by different machine learning algorithms for each text representation method
Text representation | Input text | Machine learning algorithm | Accuracy | Macro-averaged F1 score | Weighted-average F1 score |
---|---|---|---|---|---|
ScispaCy | Abstract | Random Forest Classifier | 0.74 | 0.74 | 0.74 |
TF-IDF | Abstract | Random Forest Classifier | 0.92 | 0.92 | 0.92 |
BOW | Abstract | Random Forest Classifier | 0.90 | 0.90 | 0.90 |
ScispaCy | Title | Multinomial NB | 0.68 | 0.68 | 0.68 |
TF-IDF | Title | Random Forest Classifier | 0.80 | 0.80 | 0.80 |
BOW | Title | Random Forest Classifier | 0.70 | 0.70 | 0.70 |
ScispaCy | Title and Abstract | Random Forest Classifier | 0.79 | 0.79 | 0.79 |
TF-IDF | Title and Abstract | Random Forest Classifier | 0.92 | 0.92 | 0.92 |
BOW | Title and Abstract | Random Forest Classifier | 0.91 | 0.91 | 0.91 |
BOW | Abstract with Bibliometric Features | Random Forest Classifier | 0.73 | 0.73 | 0.73 |
TF-IDF | Abstract with Bibliometric Features | Random Forest Classifier | 0.92 | 0.92 | 0.92 |
Bidirectional Encoder Representations (BERT) | Title and Abstract | Neural Network (BERT) | 0.87 | 0.87 | 0.87 |
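The best-performing combination in Table 7 (TF-IDF features with a Random Forest classifier, 0.92 accuracy) can be sketched as a standard scikit-learn pipeline. This is a minimal illustration, not the authors' exact setup: the toy corpus, labels, and hyperparameters below are hypothetical stand-ins for the actual titles and abstracts.

```python
# Sketch of the TF-IDF + Random Forest pipeline from Table 7.
# The documents and labels here are illustrative placeholders only.
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline

docs = [
    "deep learning model for image classification",
    "convolutional networks improve vision accuracy",
    "randomized clinical trial of a new drug",
    "patient outcomes after surgical treatment",
]
labels = ["cs", "cs", "med", "med"]

pipeline = Pipeline([
    # Convert raw text (abstract, title, or both) to TF-IDF vectors
    ("tfidf", TfidfVectorizer(stop_words="english")),
    # Classify the vectors with a Random Forest
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
])
pipeline.fit(docs, labels)
print(pipeline.score(docs, labels))  # training-set accuracy on the toy corpus
```

Swapping `TfidfVectorizer` for `CountVectorizer` gives the bag-of-words (BOW) variant in the table with no other changes to the pipeline.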