Table 4. Training hyperparameters used for fine-tuning the transformer models.
| Parameter | BERT^a | RoBERTa^b | XLNET^c |
|---|---|---|---|
| Learning rate | 2e-5 | 2e-5 | 2e-5 |
| Training steps | 100,000 | 100,000 | 100,000 |
| Maximum length | 256 | 256 | 256 |
| Batch size | 16 | 16 | 16 |
| Warm-up steps | 10,000 | 10,000 | 10,000 |
^a BERT: bidirectional encoder representations from transformers.
^b RoBERTa: robustly optimized bidirectional encoder representations from transformers pretraining approach.
^c XLNET: generalized autoregressive pretraining for language understanding.
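
As a minimal sketch, the hyperparameters in Table 4 can be expressed with the Hugging Face `transformers` `TrainingArguments` API; the library choice, output directory, and checkpoint name (`bert-base-uncased`) are illustrative assumptions and are not stated in the table.

```python
from transformers import AutoTokenizer, TrainingArguments

# Hyperparameters from Table 4 (identical for BERT, RoBERTa, and XLNET).
MAX_LENGTH = 256  # maximum sequence length

training_args = TrainingArguments(
    output_dir="./checkpoints",      # assumed output path
    learning_rate=2e-5,              # learning rate
    max_steps=100_000,               # training steps
    per_device_train_batch_size=16,  # batch size
    warmup_steps=10_000,             # warm-up steps
)

# Tokenization using the maximum length from Table 4; the checkpoint name
# is a placeholder for whichever of the three models is being fine-tuned.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoded = tokenizer(
    "example input text",
    truncation=True,
    padding="max_length",
    max_length=MAX_LENGTH,
)
```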