Table 2.
Method | Accuracy | Precision (Infeasible) | Precision (Feasible) | Recall (Infeasible) | Recall (Feasible) | F1-score (Infeasible) | F1-score (Feasible) | AUC-ROC |
---|---|---|---|---|---|---|---|---|
BERT‡ | **87.82 ± 0.6** | **80.51 ± 2.8** | 89.59 ± 1.8 | 62.70 ± 7.7 | **95.32 ± 1.5** | 70.08 ± 3.6 | **92.34 ± 0.3** | **93.93 ± 0.2** |
BiLSTM† | 87.37 ± 0.4 | 72.36 ± 2.9 | **92.14 ± 1.4** | **73.72 ± 5.5** | 91.44 ± 1.9 | **72.81 ± 1.4** | 91.76 ± 0.4 | 91.71 ± 0.5 |
XGBoost†* | 86.03 ± 0.3 | 73.78 ± 1.7 | 88.93 ± 0.2 | 61.10 ± 0.5 | 93.49 ± 0.6 | 66.83 ± 0.5 | 91.15 ± 0.2 | 92.38 ± 0.3 |
SVM†* | 85.68 ± 0.3 | 74.93 ± 2.2 | 87.99 ± 0.3 | 56.98 ± 1.4 | 94.27 ± 0.8 | 64.69 ± 0.5 | 91.02 ± 0.2 | 92.21 ± 0.5 |
Weighted TF-IDF† | 75.36 ± 0.8 | 47.54 ± 1.3 | 88.68 ± 0.2 | 66.73 ± 0.6 | 77.94 ± 1.1 | 55.51 ± 0.7 | 82.96 ± 0.6 | 72.34 ± 0.4 |
TF-IDF | 75.43 ± 0.6 | 47.39 ± 1.1 | 87.04 ± 0.1 | 60.16 ± 0.7 | 80.00 ± 1.0 | 53.00 ± 0.4 | 83.37 ± 0.5 | 70.08 ± 0.2 |
Frequency | 64.74 ± 0.3 | 23.21 ± 0.9 | 77.03 ± 0.3 | 23.03 ± 1.2 | 77.21 ± 0.6 | 23.11 ± 1.0 | 77.12 ± 0.2 | 49.99 ± 0.7 |
Bold entries mark the best value for each metric.
‡PubMedBERT embedding; †Wikipedia-PubMed embedding; *TF-IDF values as weight