Table 5.
Comparison of test performance (and standard deviation) on the ChemProt and DDI relation extraction tasks with major layer-specific adaptation methods, for reduced numbers of training instances
| No. Training instances | Baseline | Layer freeze | Layerwise decay | Layer reinit |
|---|---|---|---|---|
| ChemProt | | | | |
| 100 | 22.45∗ (3.22) | 20.39 (2.53) | 20.50 (2.44∗) | 19.33 (3.18) |
| 500 | 44.77 (3.71) | 48.40 (2.85) | 48.55∗ (1.78∗) | 43.79 (3.29) |
| 1000 | 56.62 (1.93) | 59.91∗ (1.26∗) | 59.67 (1.39) | 55.31 (2.65) |
| DDI | | | | |
| 100 | 10.72 (2.93) | 11.13∗ (3.73) | 10.34 (2.50∗) | 9.83 (2.64) |
| 500 | 34.36 (5.46) | 39.78 (4.39) | 40.15∗ (3.34∗) | 36.67 (5.50) |
| 1000 | 58.71 (2.87) | 61.40 (2.53) | 61.54∗ (1.46∗) | 58.67 (3.54) |
∗ Highest performance and lowest standard deviation in each row.