Table 8.
Span detection F1 scores for all algorithms, evaluated against the core evaluation annotation set of the 30 held-out articles.
| Ontology | CRF | BiLSTM | BiLSTM-CRF | Char-Embeddings | BiLSTM-ELMo | BioBERT |
|---|---|---|---|---|---|---|
| ChEBI | 0.7234 | 0.6545 | 0.5000 | 0.5280 | 0.0620 | 0.9091* |
| CL | 0.8333 | 0.5882 | 0.3774 | 0.8000 | 0.0000 | 0.9231* |
| GO_BP | 0.8677* | 0.5498 | 0.3661 | 0.6346 | 0.0685 | 0.8646 |
| GO_CC | 0.9412 | 0.1379 | 0.2689 | 0.2581 | 0.1000 | 0.9444* |
| GO_MF | > 0.9999* | > 0.9999* | > 0.9999* | 0.8421 | 0.0000 | > 0.9999* |
| MOP | > 0.9999* | > 0.9999* | > 0.9999* | > 0.9999* | 0.0000 | > 0.9999* |
| NCBITaxon | 0.9959* | 0.8551 | 0.9440 | 0.9569 | 0.0711 | 0.9453 |
| PR | 0.4351 | 0.2979 | 0.2151 | 0.0995 | 0.0339 | 0.8199* |
| SO | 0.9435* | 0.4935 | 0.4897 | 0.8203 | 0.1059 | 0.9081 |
| UBERON | 0.7913 | 0.7206 | 0.4758 | 0.7440 | 0.0854 | 0.8826* |
The best-performing algorithm(s) for each ontology are marked with an asterisk (*).