2023 Apr 14;4(4):100729. doi: 10.1016/j.patter.2023.100729

Table 7.

Comparison of BLURB test performance using LARGE models (24 layers, 300M+ parameters), with optimal layer-specific stabilization methods

	BioBERT-LARGE	BlueBERT-LARGE	PubMedBERT-LARGE	PubMedELECTRA-LARGE
BC5-chem	93.05	90.24	93.23^∗	92.90
BC5-disease	84.97	82.93	85.77^∗	84.82
NCBI-disease	88.76^∗	86.44	88.25	87.93
BC2GM	84.21	80.86	84.72^∗	83.87
JNLPBA	78.83	77.59	79.44^∗	78.77
EBM PICO	73.81	72.43	73.61	73.95^∗
ChemProt	77.79	71.31	78.77^∗	76.80
DDI	81.53	78.99	82.39^∗	78.92
GAD	82.47	75.80	83.57	83.93^∗
BIOSSES	91.53	86.18	92.73^∗	90.33
HoC	81.57	81.35	82.57^∗	82.37
PubMedQA	55.16	55.24	67.38^∗	65.02
BioASQ	78.93	72.21	93.36^∗	93.14
BLURB score	80.09	77.11	82.86^∗	81.88
Δ BASE model	−0.59	+0.15	+0.58	+0.37

Note: for all models, the optimal stabilization strategy was layer re-initialization for BIOSSES and layerwise decay for the QA tasks (PubMedQA and BioASQ).

^∗ Highest performance for task (row).
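The BLURB score row can be checked against the per-task rows. The sketch below assumes the score is a macro-average: tasks are first averaged within each task type, and the resulting group means are then averaged. The grouping in `TASK_GROUPS` follows the BLURB benchmark's task categories and is not stated in the table itself; the names `TASK_GROUPS` and `blurb_score` are illustrative only. Under that assumption, the task scores for BioBERT-LARGE and PubMedBERT-LARGE reproduce the reported values.

```python
# Per-task test scores copied from the table (two of the four models).
BIOBERT_LARGE = {
    "BC5-chem": 93.05, "BC5-disease": 84.97, "NCBI-disease": 88.76,
    "BC2GM": 84.21, "JNLPBA": 78.83, "EBM PICO": 73.81,
    "ChemProt": 77.79, "DDI": 81.53, "GAD": 82.47,
    "BIOSSES": 91.53, "HoC": 81.57, "PubMedQA": 55.16, "BioASQ": 78.93,
}
PUBMEDBERT_LARGE = {
    "BC5-chem": 93.23, "BC5-disease": 85.77, "NCBI-disease": 88.25,
    "BC2GM": 84.72, "JNLPBA": 79.44, "EBM PICO": 73.61,
    "ChemProt": 78.77, "DDI": 82.39, "GAD": 83.57,
    "BIOSSES": 92.73, "HoC": 82.57, "PubMedQA": 67.38, "BioASQ": 93.36,
}

# Assumed grouping by task type (not given in the table).
TASK_GROUPS = {
    "NER": ["BC5-chem", "BC5-disease", "NCBI-disease", "BC2GM", "JNLPBA"],
    "PICO": ["EBM PICO"],
    "Relation extraction": ["ChemProt", "DDI", "GAD"],
    "Sentence similarity": ["BIOSSES"],
    "Document classification": ["HoC"],
    "Question answering": ["PubMedQA", "BioASQ"],
}


def blurb_score(scores, groups=TASK_GROUPS):
    """Macro-average: mean of the per-group mean scores."""
    group_means = [
        sum(scores[task] for task in tasks) / len(tasks)
        for tasks in groups.values()
    ]
    return sum(group_means) / len(group_means)


print(round(blurb_score(BIOBERT_LARGE), 2))     # 80.09
print(round(blurb_score(PUBMEDBERT_LARGE), 2))  # 82.86
```

Because each task type contributes equally, the two QA tasks together carry the same weight as the five NER tasks, which is why the large QA gap between BioBERT-LARGE and PubMedBERT-LARGE accounts for most of the difference in their overall scores.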