Table 3. Fine-tuning hyperparameters for BERT, RoBERTa, and XLNet.
| Parameter | BERTᵃ | RoBERTaᵇ | XLNetᶜ |
|---|---|---|---|
| Learning rate | 1e-5 | 1e-5 | 2e-5 |
| Training steps | 7000 | 7000 | 7000 |
| Maximum length | 128 | 128 | 128 |
| Batch size | 16 | 16 | 16 |
| Warm-up steps | 700 | 700 | 700 |
| Dropout rate | 0.3 | 0.3 | 0.3 |
ᵃBERT: bidirectional encoder representations from transformers.
ᵇRoBERTa: robustly optimized bidirectional encoder representations from transformers pretraining approach.
ᶜXLNet: generalized autoregressive pretraining for language understanding.
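As a minimal sketch (not taken from the paper), the hyperparameters in Table 3 could be applied to a fine-tuning run with the Hugging Face Transformers library as shown below; the checkpoint name, the `text` column, and the `num_labels` value are assumptions for illustration only.

```python
# Sketch of a fine-tuning configuration using the Table 3 hyperparameters.
# Assumes the Hugging Face Transformers library; checkpoint and data column
# names are hypothetical and not specified in the paper.
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    TrainingArguments,
)

MODEL_NAME = "bert-base-uncased"  # assumed checkpoint; e.g., "roberta-base" or "xlnet-base-cased"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME,
    num_labels=2,              # assumed task setup, not given in Table 3
    hidden_dropout_prob=0.3,   # dropout rate (BERT/RoBERTa config field; XLNet uses "dropout")
)

def tokenize(batch):
    # Maximum sequence length of 128 tokens, per Table 3
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

training_args = TrainingArguments(
    output_dir="checkpoints",
    learning_rate=1e-5,              # 2e-5 for XLNet, per Table 3
    max_steps=7000,                  # training steps
    warmup_steps=700,                # warm-up steps
    per_device_train_batch_size=16,  # batch size
)
```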