TABLE 3. The overall performance (main evaluation measures) of the methods on the LitCovid and HoC datasets. Macro-F1, Macro-AP, Micro-F1, and Micro-AP are label-based measures; F1 and Accuracy are instance-based measures.
Method | Macro-F1 (Mean) | Macro-F1 (Max) | Macro-AP (Mean) | Macro-AP (Max) | Micro-F1 (Mean) | Micro-F1 (Max) | Micro-AP (Mean) | Micro-AP (Max) | F1 (Mean) | F1 (Max) | Accuracy (Mean) | Accuracy (Max) |
---|---|---|---|---|---|---|---|---|---|---|---|---|
LitCovid BioCreative | ||||||||||||
ML-Net | 0.7655 | 0.7750 | - | - | 0.8437 | 0.8470 | - | - | 0.8678 | 0.8706 | 0.7019 | 0.7108 |
Binary BERT | 0.8597 | 0.8773 | 0.7825 | 0.8059 | 0.9132 | 0.9186 | 0.8557 | 0.8655 | 0.9278 | 0.9330 | 0.7984 | 0.8120 |
Linear BERT | 0.8569 | 0.8791 | 0.7796 | 0.8066 | 0.9067 | 0.9163 | 0.8461 | 0.8607 | 0.9254 | 0.9341 | 0.7915 | 0.8072 |
LitMC-BERT (ours) | 0.8776 | 0.8921 | 0.8048 | 0.8223 | 0.9129 | 0.9212 | 0.8553 | 0.8663 | 0.9314 | 0.9384 | 0.8022 | 0.8188 |
HoC | ||||||||||||
ML-Net | 0.7618 | 0.7665 | - | - | 0.7449 | 0.7560 | - | - | 0.7931 | 0.8003 | 0.4990 | 0.5429 |
Binary BERT | 0.8530 | 0.8686 | 0.7581 | 0.7811 | 0.8453 | 0.8583 | 0.7368 | 0.7568 | 0.8733 | 0.8850 | 0.6251 | 0.6476 |
Linear BERT | 0.8599 | 0.8711 | 0.7690 | 0.7875 | 0.8554 | 0.8637 | 0.7547 | 0.7670 | 0.8941 | 0.9018 | 0.6695 | 0.6857 |
LitMC-BERT (ours) | 0.8733 | 0.8882 | 0.7894 | 0.8118 | 0.8648 | 0.8787 | 0.7697 | 0.7905 | 0.9036 | 0.9169 | 0.6854 | 0.7270 |
Reported SOTA performance on HoC | ||||||||||||
BlueBERT (base) | - | - | - | - | - | - | - | - | - | 0.8530 | - | - |
BlueBERT (large) | - | - | - | - | - | - | - | - | - | 0.8730 | - | - |
PubMedBERT | - | - | - | - | - | 0.8232* | - | - | - | - | - | - |
*: The reported result was obtained on a slightly different version of the HoC dataset.
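
The measures in Table 3 are standard multi-label classification metrics. As a minimal sketch of how they can be computed with scikit-learn, the snippet below uses small illustrative arrays (not the paper's predictions); treating the instance-based "Accuracy" as exact-match (subset) accuracy is an assumption here.

```python
# Illustrative computation of the Table 3 measures (not the paper's data).
import numpy as np
from sklearn.metrics import f1_score, accuracy_score, average_precision_score

# Hypothetical gold labels and model scores for 4 documents over 3 labels.
y_true = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 1, 0],
                   [0, 0, 1]])
y_score = np.array([[0.9, 0.2, 0.8],
                    [0.1, 0.7, 0.3],
                    [0.8, 0.6, 0.4],
                    [0.2, 0.3, 0.9]])
y_pred = (y_score >= 0.5).astype(int)  # threshold scores into hard labels

# Label-based measures: aggregate per label, then across labels.
macro_f1 = f1_score(y_true, y_pred, average="macro")
micro_f1 = f1_score(y_true, y_pred, average="micro")
macro_ap = average_precision_score(y_true, y_score, average="macro")
micro_ap = average_precision_score(y_true, y_score, average="micro")

# Instance-based measures: score each document's label set, then average.
instance_f1 = f1_score(y_true, y_pred, average="samples")
# Assumption: "Accuracy" read as subset accuracy, where a document counts
# as correct only if its entire predicted label set matches the gold set.
subset_acc = accuracy_score(y_true, y_pred)

print(macro_f1, micro_f1, macro_ap, micro_ap, instance_f1, subset_acc)
```

Subset accuracy is the strictest of these measures, which is consistent with the Accuracy column being noticeably lower than the instance-based F1 column in the table.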