2021 Dec 17;22(Suppl 1):598. doi: 10.1186/s12859-021-04141-4

Table 8.

Span detection F1 score results for all algorithms tested against the core evaluation annotation set of the 30 held-out articles

Ontology	CRF	BiLSTM	BiLSTM-CRF	Char-Embeddings	BiLSTM-ELMo	BioBERT
ChEBI	0.7234	0.6545	0.5000	0.5280	0.0620	0.9091*
CL	0.8333	0.5882	0.3774	0.8000	0.0000	0.9231*
GO_BP	0.8677*	0.5498	0.3661	0.6346	0.0685	0.8646
GO_CC	0.9412	0.1379	0.2689	0.2581	0.1000	0.9444*
GO_MF	> 0.9999*	> 0.9999*	> 0.9999*	0.8421	0.0000	> 0.9999*
MOP	> 0.9999*	> 0.9999*	> 0.9999*	> 0.9999*	0.0000	> 0.9999*
NCBITaxon	0.9959*	0.8551	0.9440	0.9569	0.0711	0.9453
PR	0.4351	0.2979	0.2151	0.0995	0.0339	0.8199*
SO	0.9435*	0.4935	0.4897	0.8203	0.1059	0.9081
UBERON	0.7913	0.7206	0.4758	0.7440	0.0854	0.8826*

The best-performing algorithm for each ontology is marked with an asterisk (*); tied best scores are all marked.
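
For readers unfamiliar with the metric, the F1 score reported in the table is the harmonic mean of precision and recall. The following is a minimal sketch, assuming exact-match scoring of spans represented as (start, end) character offsets. The matching criterion actually used to produce these results is not stated in this table, and the names `Span` and `span_f1` are hypothetical, introduced only for illustration.

```python
from typing import Iterable, Set, Tuple

# Hypothetical span representation: (start, end) character offsets.
Span = Tuple[int, int]


def span_f1(gold: Iterable[Span], predicted: Iterable[Span]) -> float:
    """Exact-match span F1: a predicted span counts as correct only if
    both of its offsets match a gold span exactly (an assumption)."""
    gold_set: Set[Span] = set(gold)
    pred_set: Set[Span] = set(predicted)

    true_positives = len(gold_set & pred_set)
    if true_positives == 0:
        # Covers empty inputs and the no-overlap case; avoids division by zero.
        return 0.0

    precision = true_positives / len(pred_set)
    recall = true_positives / len(gold_set)
    return 2 * precision * recall / (precision + recall)


# Illustrative data only, not taken from the evaluation set.
gold_spans = [(0, 5), (10, 15), (20, 25)]
predicted_spans = [(0, 5), (10, 15), (30, 35)]
score = span_f1(gold_spans, predicted_spans)
```

In this illustrative case two of three predictions match two of three gold spans, so precision and recall are both 2/3 and so is F1. A score near 0, as in the BiLSTM-ELMo column, would correspond to almost no predicted spans matching gold spans under whatever criterion was applied.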