2023 Apr 14;4(4):100729. doi: 10.1016/j.patter.2023.100729

Table 9.

Comparison of BLURB test performance: standard fine-tuning vs. optimal fine-tuning, which combines advanced stabilization methods with an extensive hyperparameter search

| Task | PubMedBERT-BASE (Standard) | PubMedBERT-BASE (Optimal) | PubMedBERT-LARGE (Standard) | PubMedBERT-LARGE (Optimal) | PubMedELECTRA-LARGE (Standard) | PubMedELECTRA-LARGE (Optimal) |
|---|---|---|---|---|---|---|
| BC5-chem | 93.33 | 93.33 | 93.23 | 93.23 | 92.90 | 93.25 |
| BC5-disease | 85.62 | 85.62 | 85.77 | 85.77 | 84.82 | 85.23 |
| NCBI-disease | 87.82 | 88.21 | 88.25 | 88.25 | 87.93 | 88.19 |
| BC2GM | 84.52 | 84.55 | 84.72 | 84.72 | 83.87 | 84.47 |
| JNLPBA | 79.10 | 79.16 | 79.44 | 79.44 | 78.77 | 78.77 |
| EBM PICO | 73.38 | 73.45 | 73.61 | 73.61 | 73.95 | 74.02 |
| ChemProt | 77.24 | 77.41 | 78.77 | 78.77 | 76.80 | 77.26 |
| DDI | 82.36 | 83.17 | 82.39 | 82.78 | 78.92 | 80.37 |
| GAD | 83.96 | 84.01 | 83.57 | 83.76 | 83.93 | 83.93 |
| BIOSSES | 92.30 | 94.49 | 90.29 | 92.73 | 86.17 | 92.69 |
| HoC | 82.32 | 83.02 | 82.57 | 82.70 | 82.37 | 82.37 |
| PubMedQA | 55.84 | 63.92 | 63.18 | 67.38 | 60.18 | 65.02 |
| BioASQ | 87.56 | 82.75 | 92.36 | 93.36 | 81.71 | 93.14 |
| BLURB score | 81.16 | 82.75 | 82.02 | 82.91 | 79.83 | 82.44 |

Note: because optimal fine-tuning adds an extensive hyperparameter search, the optimal results here may be higher than those reported without that search, as in Table 7.

Highest performance for task (row).
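
As a small aid to reading the table, the sketch below computes how much optimal fine-tuning improves on standard fine-tuning for each model, using only the BLURB score row of Table 9. The names `blurb_scores` and `gains` are hypothetical helpers introduced for this illustration and are not part of the original work.

```python
# BLURB scores copied from the last row of Table 9: (standard, optimal).
# The dictionary and function names are illustrative assumptions.
blurb_scores = {
    "PubMedBERT-BASE": (81.16, 82.75),
    "PubMedBERT-LARGE": (82.02, 82.91),
    "PubMedELECTRA-LARGE": (79.83, 82.44),
}


def gains(scores):
    """Return the optimal-minus-standard difference per model, rounded to 2 decimals."""
    return {
        model: round(optimal - standard, 2)
        for model, (standard, optimal) in scores.items()
    }


for model, gain in gains(blurb_scores).items():
    print(f"{model}: +{gain:.2f}")
```

On these figures the gain is largest for PubMedELECTRA-LARGE (2.61 points) and smallest for PubMedBERT-LARGE (0.89 points).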