Table 9.
Comparison of BLURB test performance: standard fine-tuning vs. optimal fine-tuning with advanced stabilization methods plus extensive hyperparameter search.
| Task | PubMedBERT- (Standard) | PubMedBERT- (Optimal) | PubMedBERT- (Standard) | PubMedBERT- (Optimal) | PubMedELECTRA- (Standard) | PubMedELECTRA- (Optimal) |
|---|---|---|---|---|---|---|
| BC5-chem | 93.33 | 93.33 | 93.23∗ | 93.23∗ | 92.90 | 93.25 |
| BC5-disease | 85.62 | 85.62 | 85.77∗ | 85.77∗ | 84.82 | 85.23 |
| NCBI-disease | 87.82 | 88.21 | 88.25∗ | 88.25∗ | 87.93 | 88.19 |
| BC2GM | 84.52 | 84.55 | 84.72∗ | 84.72∗ | 83.87 | 84.47 |
| JNLPBA | 79.10 | 79.16 | 79.44∗ | 79.44∗ | 78.77 | 78.77 |
| EBM PICO | 73.38 | 73.45 | 73.61 | 73.61 | 73.95 | 74.02∗ |
| ChemProt | 77.24 | 77.41 | 78.77∗ | 78.77∗ | 76.80 | 77.26 |
| DDI | 82.36 | 83.17 | 82.39 | 82.78∗ | 78.92 | 80.37 |
| GAD | 83.96 | 84.01∗ | 83.57 | 83.76 | 83.93 | 83.93 |
| BIOSSES | 92.30 | 94.49∗ | 90.29 | 92.73 | 86.17 | 92.69 |
| HoC | 82.32 | 83.02∗ | 82.57 | 82.70 | 82.37 | 82.37 |
| PubMedQA | 55.84 | 63.92 | 63.18 | 67.38∗ | 60.18 | 65.02 |
| BioASQ | 87.56 | 82.75 | 92.36 | 93.36∗ | 81.71 | 93.14 |
| BLURB score | 81.16 | 82.75 | 82.02 | 82.91∗ | 79.83 | 82.44 |
Note: because extensive hyperparameter search is added here, the optimal results may be higher than those reported without it in Table 7.
∗ Highest performance for the task (row).
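The BLURB score row aggregates the per-task results. Below is a minimal Python sketch of that aggregation, assuming BLURB's published definition: scores are averaged within each of the six task types, and the six type averages are then averaged. The `TASK_TYPES` grouping and the `blurb_score` function are hypothetical names introduced for illustration, not code from the benchmark or the authors.

```python
from statistics import mean

# Assumed grouping of the BLURB datasets into six task types.
TASK_TYPES = {
    "NER": ["BC5-chem", "BC5-disease", "NCBI-disease", "BC2GM", "JNLPBA"],
    "PICO": ["EBM PICO"],
    "Relation extraction": ["ChemProt", "DDI", "GAD"],
    "Sentence similarity": ["BIOSSES"],
    "Document classification": ["HoC"],
    "Question answering": ["PubMedQA", "BioASQ"],
}


def blurb_score(results: dict[str, float]) -> float:
    """Hypothetical helper: mean within each task type, then mean across types."""
    type_means = [mean(results[name] for name in names) for names in TASK_TYPES.values()]
    return mean(type_means)


# First PubMedBERT- column, standard fine-tuning (values from Table 9).
standard_col1 = {
    "BC5-chem": 93.33, "BC5-disease": 85.62, "NCBI-disease": 87.82,
    "BC2GM": 84.52, "JNLPBA": 79.10, "EBM PICO": 73.38,
    "ChemProt": 77.24, "DDI": 82.36, "GAD": 83.96,
    "BIOSSES": 92.30, "HoC": 82.32, "PubMedQA": 55.84, "BioASQ": 87.56,
}

print(round(blurb_score(standard_col1), 2))  # → 81.16
```

Applied to the first PubMedBERT- Standard column, the sketch reproduces the reported BLURB score of 81.16. Averaging by task type first means the five NER datasets contribute as one task type rather than as five separate entries.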