Table 4.
Comparison of test performance (and standard deviation) on the BIOSSES and BioASQ tasks with major layer-specific adaptation methods, all with models
| Pretraining setting | Baseline | Layer freeze | Layerwise decay | Layer reinit |
|---|---|---|---|---|
| BIOSSES | | | | |
| BERT | 93.46 (0.96) | 92.86 (0.88) | 93.35 (0.78) | 94.49∗ (0.88) |
| BERT (no NSP) | 93.12 (1.04) | 94.01∗ (0.99) | 93.01 (1.07) | 92.89 (0.91) |
| BERT (no NSP, single seq) | 75.50 (3.00) | 72.09 (2.86) | 74.11 (3.51) | 85.04∗ (2.69) |
| ELECTRA | 80.24 (5.92) | 83.06 (3.68) | 83.55 (3.25) | 88.74∗ (2.29) |
| BioASQ | | | | |
| BERT | 87.56 (2.43) | 90.50∗ (1.51) | 88.29 (2.76) | 81.28 (3.72) |
| BERT (no NSP) | 83.57 (3.60) | 87.07∗ (3.15) | 86.07 (2.29) | 83.64 (3.70) |
| BERT (no NSP, single seq) | 85.64 (2.48) | 88.64∗ (1.76) | 88.50 (1.63) | 80.79 (2.51) |
| ELECTRA | 88.93 (3.87) | 90.00 (4.16) | 90.64∗ (2.60) | 88.14 (2.21) |
∗ Highest performance for each model (row).