2023 Apr 14;4(4):100729. doi: 10.1016/j.patter.2023.100729

Table 4.

Comparison of test performance (and standard deviation) on the BIOSSES and BioASQ tasks with major layer-specific adaptation methods, all with BASE models

Pretraining setting	Baseline	Layer freeze	Layerwise decay	Layer reinit
BIOSSES

BERT	93.46 (0.96)	92.86 (0.88)	93.35 (0.78)	94.49^∗ (0.88)
BERT (no NSP)	93.12 (1.04)	94.01^∗ (0.99)	93.01 (1.07)	92.89 (0.91)
BERT (no NSP, single seq)	75.50 (3.00)	72.09 (2.86)	74.11 (3.51)	85.04^∗ (2.69)
ELECTRA	80.24 (5.92)	83.06 (3.68)	83.55 (3.25)	88.74^∗ (2.29)

BioASQ

BERT	87.56 (2.43)	90.50^∗ (1.51)	88.29 (2.76)	81.28 (3.72)
BERT (no NSP)	83.57 (3.60)	87.07^∗ (3.15)	86.07 (2.29)	83.64 (3.70)
BERT (no NSP, single seq)	85.64 (2.48)	88.64^∗ (1.76)	88.50 (1.63)	80.79 (2.51)
ELECTRA	88.93 (3.87)	90.00 (4.16)	90.64^∗ (2.60)	88.14 (2.21)

^∗ Highest performance for each model (row).