Patterns. 2023 Apr 14;4(4):100729. doi: 10.1016/j.patter.2023.100729

Table 7.

Comparison of BLURB test performance using LARGE models (24 layers, 300M+ parameters), with optimal layer-specific stabilization methods

Task              BioBERT-LARGE   BlueBERT-LARGE   PubMedBERT-LARGE   PubMedELECTRA-LARGE
BC5-chem              93.05           90.24             93.23               92.90
BC5-disease           84.97           82.93             85.77               84.82
NCBI-disease          88.76           86.44             88.25               87.93
BC2GM                 84.21           80.86             84.72               83.87
JNLPBA                78.83           77.59             79.44               78.77
EBM PICO              73.81           72.43             73.61               73.95
ChemProt              77.79           71.31             78.77               76.80
DDI                   81.53           78.99             82.39               78.92
GAD                   82.47           75.80             83.57               83.93
BIOSSES               91.53           86.18             92.73               90.33
HoC                   81.57           81.35             82.57               82.37
PubMedQA              55.16           55.24             67.38               65.02
BioASQ                78.93           72.21             93.36               93.14
BLURB score           80.09           77.11             82.86               81.88
Δ vs. BASE model      −0.59           +0.15             +0.58               +0.37

Note: for all models, the optimal stabilization strategy was layer reinitialization for BIOSSES and layerwise learning-rate decay for the QA tasks (PubMedQA and BioASQ).
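Layerwise learning-rate decay assigns geometrically smaller learning rates to layers closer to the embeddings, so lower (more general) layers change less during fine-tuning while the top layers adapt to the task. A minimal sketch, assuming a simple per-layer schedule; the function name, base rate, and decay factor are illustrative, not taken from the paper:

```python
def layerwise_lrs(base_lr: float, decay: float, num_layers: int) -> list[float]:
    """Learning rate per transformer layer, bottom (index 0) to top.

    The top layer trains at base_lr; each layer below it is scaled
    by one more factor of `decay`.
    """
    return [base_lr * decay ** (num_layers - 1 - i) for i in range(num_layers)]

# For a 24-layer LARGE model with an illustrative base_lr=2e-5 and decay=0.9,
# the bottom layer trains at 2e-5 * 0.9**23, roughly 1.8e-6.
lrs = layerwise_lrs(2e-5, 0.9, 24)
```

In practice these per-layer rates would be passed to the optimizer as parameter groups, one group per transformer layer.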

In the original table, the highest performance for each task (row) is highlighted.
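The BLURB score row can be reproduced from the task scores: the benchmark's score is the macro average over six task groups (NER, PICO, relation extraction, sentence similarity, document classification, QA), where each group's score is the mean of its member tasks. A sketch using the PubMedBERT-LARGE column above:

```python
# Task scores for PubMedBERT-LARGE, grouped by BLURB category.
groups = {
    "NER": [93.23, 85.77, 88.25, 84.72, 79.44],  # BC5-chem, BC5-disease, NCBI-disease, BC2GM, JNLPBA
    "PICO": [73.61],                             # EBM PICO
    "RE": [78.77, 82.39, 83.57],                 # ChemProt, DDI, GAD
    "Similarity": [92.73],                       # BIOSSES
    "Classification": [82.57],                   # HoC
    "QA": [67.38, 93.36],                        # PubMedQA, BioASQ
}

def blurb_score(groups: dict[str, list[float]]) -> float:
    """Macro average: mean over groups of each group's per-task mean."""
    group_means = [sum(scores) / len(scores) for scores in groups.values()]
    return sum(group_means) / len(group_means)

print(round(blurb_score(groups), 2))  # 82.86, matching the table
```

Note that this macro average weights each group equally, so single-task groups such as BIOSSES count as much as the five-task NER group.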