BMC Med Inform Decis Mak. 2022 Sep 6;22:234. doi: 10.1186/s12911-022-01977-5

Table 6. Performance comparison of pre-trained language models

| Model | Accuracy | Precision | Recall | F1-score |
|-------|----------|-----------|--------|----------|
| BERT [20]^a | 0.849 (0.003) | 0.817 (0.010) | 0.822 (0.019) | 0.818 (0.011) |
| BioBERT [31]^b | 0.861 (0.008) | 0.835 (0.017) | 0.846 (0.015) | **0.839** (0.011) |
| PubMedBERT [34]^c | **0.865** (0.014) | 0.833 (0.020) | **0.849** (0.009) | **0.839** (0.015) |
| RoBERTa [35]^d | 0.862 (0.009) | 0.835 (0.018) | 0.837 (0.009) | 0.835 (0.010) |
| SciBERT [32]^e | 0.862 (0.010) | **0.836** (0.017) | 0.843 (0.013) | 0.838 (0.013) |

The best score in each column is in bold. Values are reported as mean (standard deviation)

^a bert-base-uncased. Accessed July 20, 2022. Available from: https://huggingface.co/bert-base-uncased

^b biobert-base-cased-v1.2. Accessed July 20, 2022. Available from: https://huggingface.co/dmis-lab/biobert-base-cased-v1.2

^c BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext. Accessed July 20, 2022. Available from: https://huggingface.co/microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext

^d roberta-base. Accessed July 20, 2022. Available from: https://huggingface.co/roberta-base

^e scibert_scivocab_uncased. Accessed July 20, 2022. Available from: https://huggingface.co/allenai/scibert_scivocab_uncased
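
For readers who want to run a comparison like this, the sketch below shows one generic way to load the listed checkpoints from the Hugging Face Hub and score predictions with accuracy, precision, recall, and F1. This is a minimal sketch, not the authors' pipeline: the fine-tuning step is omitted, the two-label setup and macro averaging are assumptions, and the `evaluate_checkpoint` helper is hypothetical.

```python
# Minimal sketch (assumptions: binary labels, macro-averaged metrics,
# checkpoints already fine-tuned on the task of interest).
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Checkpoints listed in Table 6 footnotes a-e.
CHECKPOINTS = [
    "bert-base-uncased",
    "dmis-lab/biobert-base-cased-v1.2",
    "microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext",
    "roberta-base",
    "allenai/scibert_scivocab_uncased",
]


def evaluate_checkpoint(checkpoint, texts, labels, num_labels=2):
    """Score one checkpoint on a small labeled text set.

    Note: loading a base checkpoint attaches a randomly initialized
    classification head, so results are only meaningful after fine-tuning.
    """
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSequenceClassification.from_pretrained(
        checkpoint, num_labels=num_labels
    )
    model.eval()

    with torch.no_grad():
        batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
        preds = model(**batch).logits.argmax(dim=-1).tolist()

    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, preds, average="macro", zero_division=0
    )
    return {
        "accuracy": accuracy_score(labels, preds),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


if __name__ == "__main__":
    # Toy placeholder data; substitute the task's evaluation split.
    texts = ["patient reported severe headache", "no adverse reaction observed"]
    labels = [1, 0]
    for ckpt in CHECKPOINTS:
        print(ckpt, evaluate_checkpoint(ckpt, texts, labels))
```

Reporting each metric as mean (standard deviation), as in the table, would come from repeating this evaluation over several runs or folds and aggregating the per-run scores.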