Table 3.
Task | Balanced | Full-balanced | Unbalanced | Wilcoxon |
---|---|---|---|---|
AUPRC | ||||
IE versus IP | 0.627 | 0.787* | 0.791* | 0.251 |
AP versus IP | 0.745 | 0.884* | 0.901* | 0.066 |
AE versus IE | 0.660 | 0.885 | 0.814 | |
AE versus AP | 0.834 | 0.945 | 0.856 | |
AE + AP versus else | 0.671 | 0.882 | 0.824 | |
All tasks | 0.707 | 0.877 | 0.837 | |
AUROC | ||||
IE versus IP | 0.82* | 0.819* | 0.903 | 0.046 |
AP versus IP | 0.919 | 0.931 | 0.960 | |
AE versus IE | 0.893* | 0.921 | 0.9205* | 0.052 |
AE versus AP | – | 0.960* | 0.956* | 0.249 |
AE versus AP | 0.952* | – | 0.956* | 0.035 |
AE + AP versus else | 0.929* | 0.956 | 0.925* | 0.066 |
All tasks | 0.903 | 0.917 | 0.933 | |
An asterisk (*) marks pairs of values that are not statistically different; for these pairs, the last column reports the computed Wilcoxon p value (> 0.01). Bold text highlights the best performance when it is statistically different from all the other values.
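The asterisk convention in the caption can be illustrated with a minimal sketch. It assumes only the 0.01 threshold stated in the caption. The helper name `annotate` is hypothetical and is not taken from the original work; the p value is passed in rather than computed, since the exact Wilcoxon variant used is not specified here.

```python
ALPHA = 0.01  # significance threshold stated in the caption


def annotate(scores, p_value, alpha=ALPHA):
    """Format a pair of scores for the table (hypothetical helper).

    An asterisk is appended to both values when the pair is NOT
    statistically different, i.e. when p_value > alpha.
    """
    mark = "*" if p_value > alpha else ""
    return [f"{s:.3f}{mark}" for s in scores]


# AUPRC, IE versus IP: Full-balanced vs Unbalanced, p = 0.251 (values from Table 3)
print(annotate([0.787, 0.791], 0.251))
```

With the values from the first AUPRC row, both scores receive an asterisk because 0.251 exceeds the 0.01 threshold.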