. 2023 Apr 14;4(4):100729. doi: 10.1016/j.patter.2023.100729

Table 9.

Comparison of BLURB test performance: standard fine-tuning vs. optimal fine-tuning with advanced stabilization methods plus extensive hyperparameter search

Model	PubMedBERT-BASE		PubMedBERT-LARGE		PubMedELECTRA-LARGE
Fine-tuning	Standard	Optimal	Standard	Optimal	Standard	Optimal
BC5-chem	93.33	93.33	93.23^∗	93.23^∗	92.90	93.25
BC5-disease	85.62	85.62	85.77^∗	85.77^∗	84.82	85.23
NCBI-disease	87.82	88.21	88.25^∗	88.25^∗	87.93	88.19
BC2GM	84.52	84.55	84.72^∗	84.72^∗	83.87	84.47
JNLPBA	79.10	79.16	79.44^∗	79.44^∗	78.77	78.77
EBM PICO	73.38	73.45	73.61	73.61	73.95	74.02^∗
ChemProt	77.24	77.41	78.77^∗	78.77^∗	76.80	77.26
DDI	82.36	83.17	82.39	82.78^∗	78.92	80.37
GAD	83.96	84.01^∗	83.57	83.76	83.93	83.93
BIOSSES	92.30	94.49^∗	90.29	92.73	86.17	92.69
HoC	82.32	83.02^∗	82.57	82.70	82.37	82.37
PubMedQA	55.84	63.92	63.18	67.38^∗	60.18	65.02
BioASQ	87.56	82.75	92.36	93.36^∗	81.71	93.14
BLURB score	81.16	82.75	82.02	82.91^∗	79.83	82.44

Note: because the optimal setting adds extensive hyperparameter tuning, its results may be higher than those reported without such tuning in Table 7.

^∗ Highest performance for task (row).
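
The BLURB score row can be checked from the per-task rows. The sketch below assumes the macro-average used by the BLURB benchmark: scores are first averaged within each task type, then across task types. The task-type grouping in `TASK_TYPES` and the helper name `blurb_score` are not given in this table; they are assumptions taken from the benchmark's definition and chosen for illustration. Under that assumption, the PubMedBERT-BASE standard column reproduces its reported score.

```python
# Task-type grouping as defined by the BLURB benchmark (an assumption here:
# the table itself does not list the groups).
TASK_TYPES = {
    "NER": ["BC5-chem", "BC5-disease", "NCBI-disease", "BC2GM", "JNLPBA"],
    "PICO": ["EBM PICO"],
    "Relation extraction": ["ChemProt", "DDI", "GAD"],
    "Sentence similarity": ["BIOSSES"],
    "Document classification": ["HoC"],
    "Question answering": ["PubMedQA", "BioASQ"],
}

# PubMedBERT-BASE, standard fine-tuning column of Table 9.
base_standard = {
    "BC5-chem": 93.33, "BC5-disease": 85.62, "NCBI-disease": 87.82,
    "BC2GM": 84.52, "JNLPBA": 79.10,
    "EBM PICO": 73.38,
    "ChemProt": 77.24, "DDI": 82.36, "GAD": 83.96,
    "BIOSSES": 92.30,
    "HoC": 82.32,
    "PubMedQA": 55.84, "BioASQ": 87.56,
}


def blurb_score(scores):
    """Macro-average: mean within each task type, then mean across types."""
    type_means = [
        sum(scores[task] for task in tasks) / len(tasks)
        for tasks in TASK_TYPES.values()
    ]
    return sum(type_means) / len(type_means)


print(round(blurb_score(base_standard), 2))  # 81.16, matching the table
```

Because each task type carries equal weight, the single-task types (EBM PICO, BIOSSES, HoC) move the overall score more per point than any one NER dataset does.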