Table 3.
BERTᵃ hyperparameter tuning in the internal training and validation cohorts using 5-fold experiments.
| Batch size | Max length | Learning rate | Epoch | Value, mean (SD) |
|---|---|---|---|---|
| 64 | —ᵇ | 2×10⁻⁵ | — | 0.78 (0.01) |
| 128 | — | 2×10⁻⁵ | — | 0.80 (0.01) |
| 128 | 128 | 2×10⁻⁵ | 3 | 0.80 (0.01) |
| 256 | 64 | 2×10⁻⁵ | — | 0.79 (0.01) |
| 64 | — | 3×10⁻⁵ | — | 0.79 (0.01) |
| 128 | — | 3×10⁻⁵ | — | 0.79 (0.01) |
| 128 | 128 | 3×10⁻⁵ | 3 | 0.78 (0.01) |
| 256 | 64 | 3×10⁻⁵ | — | 0.78 (0.01) |
| 64 | — | 5×10⁻⁵ | — | 0.79 (0.01) |
| 128 | — | 5×10⁻⁵ | — | 0.80 (0.01) |
| 128 | 128 | 5×10⁻⁵ | 3 | 0.79 (0.01) |
| 256 | 64 | 5×10⁻⁵ | — | 0.79 (0.01) |
ᵃBERT: Bidirectional Encoder Representations from Transformers.
ᵇNot applicable.
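Each configuration in the table is summarized as mean (SD) over the 5 folds. The Python sketch below illustrates that bookkeeping and is not the authors' code. It assumes the SD is the sample SD (n − 1 denominator), which the table does not state, and the helpers `summarize_folds`, `format_mean_sd`, and `best_configs` are hypothetical names. The `REPORTED` list copies the reported means from the table (None marks "not applicable"), so the top-scoring configurations can be listed.

```python
import statistics
from typing import List, Optional, Sequence, Tuple

# (batch size, max length, learning rate, epoch); None = not applicable
Config = Tuple[int, Optional[int], float, Optional[int]]

def summarize_folds(scores: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample SD of per-fold scores (assumption: SD uses n - 1)."""
    if len(scores) < 2:
        raise ValueError("need at least two folds to compute an SD")
    return statistics.mean(scores), statistics.stdev(scores)

def format_mean_sd(scores: Sequence[float]) -> str:
    """Render per-fold scores in the table's 'mean (SD)' style."""
    mean, sd = summarize_folds(scores)
    return f"{mean:.2f} ({sd:.2f})"

# Reported means transcribed from Table 3, in table order.
REPORTED: List[Tuple[Config, float]] = [
    ((64, None, 2e-5, None), 0.78),
    ((128, None, 2e-5, None), 0.80),
    ((128, 128, 2e-5, 3), 0.80),
    ((256, 64, 2e-5, None), 0.79),
    ((64, None, 3e-5, None), 0.79),
    ((128, None, 3e-5, None), 0.79),
    ((128, 128, 3e-5, 3), 0.78),
    ((256, 64, 3e-5, None), 0.78),
    ((64, None, 5e-5, None), 0.79),
    ((128, None, 5e-5, None), 0.80),
    ((128, 128, 5e-5, 3), 0.79),
    ((256, 64, 5e-5, None), 0.79),
]

def best_configs(rows: List[Tuple[Config, float]], tol: float = 0.0) -> List[Config]:
    """Return every configuration whose mean is within `tol` of the best mean."""
    top = max(mean for _, mean in rows)
    return [cfg for cfg, mean in rows if mean >= top - tol]

if __name__ == "__main__":
    # Hypothetical per-fold scores, for illustration only.
    print(format_mean_sd([0.79, 0.80, 0.81, 0.80, 0.80]))
    for cfg in best_configs(REPORTED):
        print(cfg)
```

Three configurations tie at the highest reported mean (0.80), all with a batch size of 128. Because every SD is 0.01, differences of 0.01 to 0.02 between rows are unlikely to be meaningful.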