Table 7.
Comparison of BLURB test performance using LARGE models (24 layers, 300M+ parameters), with optimal layer-specific stabilization methods.
| | BioBERT-LARGE | BlueBERT-LARGE | PubMedBERT-LARGE | PubMedELECTRA-LARGE |
|---|---|---|---|---|
| BC5-chem | 93.05 | 90.24 | 93.23∗ | 92.90 |
| BC5-disease | 84.97 | 82.93 | 85.77∗ | 84.82 |
| NCBI-disease | 88.76∗ | 86.44 | 88.25 | 87.93 |
| BC2GM | 84.21 | 80.86 | 84.72∗ | 83.87 |
| JNLPBA | 78.83 | 77.59 | 79.44∗ | 78.77 |
| EBM PICO | 73.81 | 72.43 | 73.61 | 73.95∗ |
| ChemProt | 77.79 | 71.31 | 78.77∗ | 76.80 |
| DDI | 81.53 | 78.99 | 82.39∗ | 78.92 |
| GAD | 82.47 | 75.80 | 83.57 | 83.93∗ |
| BIOSSES | 91.53 | 86.18 | 92.73∗ | 90.33 |
| HoC | 81.57 | 81.35 | 82.57∗ | 82.37 |
| PubMedQA | 55.16 | 55.24 | 67.38∗ | 65.02 |
| BioASQ | 78.93 | 72.21 | 93.36∗ | 93.14 |
| BLURB score | 80.09 | 77.11 | 82.86∗ | 81.88 |
| model | −0.59 | +0.15 | +0.58 | +0.37 |
Note: For all models, the optimal strategy was layer-reinit for BIOSSES and layerwise decay for the QA tasks (PubMedQA and BioASQ).
∗ Highest performance for task (row).
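
As a rough check on how the BLURB score row relates to the per-task rows, the sketch below recomputes it as a macro-average: scores are first averaged within each task type, then across task types. The task-type grouping (`TASK_GROUPS`) is an assumption taken from the BLURB benchmark definition, not from this table, and the names `SCORES`, `MODELS`, and `blurb_score` are hypothetical helpers introduced only for illustration.

```python
from statistics import mean

# Assumed grouping of BLURB tasks by task type (not stated in the table itself).
TASK_GROUPS = {
    "NER": ["BC5-chem", "BC5-disease", "NCBI-disease", "BC2GM", "JNLPBA"],
    "PICO": ["EBM PICO"],
    "relation extraction": ["ChemProt", "DDI", "GAD"],
    "sentence similarity": ["BIOSSES"],
    "document classification": ["HoC"],
    "question answering": ["PubMedQA", "BioASQ"],
}

# Column order follows the table above.
MODELS = ["BioBERT", "BlueBERT", "PubMedBERT", "PubMedELECTRA"]

# Per-task test scores copied from the table rows (asterisks dropped).
SCORES = {
    "BC5-chem": [93.05, 90.24, 93.23, 92.90],
    "BC5-disease": [84.97, 82.93, 85.77, 84.82],
    "NCBI-disease": [88.76, 86.44, 88.25, 87.93],
    "BC2GM": [84.21, 80.86, 84.72, 83.87],
    "JNLPBA": [78.83, 77.59, 79.44, 78.77],
    "EBM PICO": [73.81, 72.43, 73.61, 73.95],
    "ChemProt": [77.79, 71.31, 78.77, 76.80],
    "DDI": [81.53, 78.99, 82.39, 78.92],
    "GAD": [82.47, 75.80, 83.57, 83.93],
    "BIOSSES": [91.53, 86.18, 92.73, 90.33],
    "HoC": [81.57, 81.35, 82.57, 82.37],
    "PubMedQA": [55.16, 55.24, 67.38, 65.02],
    "BioASQ": [78.93, 72.21, 93.36, 93.14],
}


def blurb_score(model_index: int) -> float:
    """Average within each task type, then average the task-type means."""
    group_means = [
        mean(SCORES[task][model_index] for task in tasks)
        for tasks in TASK_GROUPS.values()
    ]
    return round(mean(group_means), 2)


blurb = {name: blurb_score(i) for i, name in enumerate(MODELS)}
print(blurb)
```

Under this grouping, the recomputed values agree with the BLURB score row (80.09, 77.11, 82.86, 81.88), which suggests the row is a macro-average over task types rather than a plain mean over the 13 tasks.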