Table 5: Hyperparameters for T5 DAPT
Hyperparameter | Setting |
---|---|
Optimizer | AdamW |
Epochs | 10 (with early stopping) |
Learning rate | 1e-3, 1e-4 |
Batch size | 256 |
Gradient accumulation | True |
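The settings in Table 5 can be expressed as a small training configuration. The sketch below is illustrative only. It rests on assumptions the table does not state: it treats 1e-3 and 1e-4 as alternative learning rates, each used in a separate run; it reads 256 as the effective batch size reached through gradient accumulation over a hypothetical micro-batch of 32; and it uses a hypothetical early-stopping patience of 2 epochs. The names `DAPTConfig`, `expand_runs`, and `EarlyStopper` are hypothetical and do not come from the original training code.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DAPTConfig:
    optimizer: str = "AdamW"
    max_epochs: int = 10              # upper bound; early stopping may end training sooner
    learning_rate: float = 1e-3
    effective_batch_size: int = 256   # assumed to be the batch size after accumulation
    micro_batch_size: int = 32        # hypothetical per-step batch; not given in Table 5

    @property
    def accumulation_steps(self) -> int:
        # Number of micro-batches whose gradients are summed before one optimizer step.
        if self.effective_batch_size % self.micro_batch_size != 0:
            raise ValueError("effective batch size must be a multiple of the micro-batch size")
        return self.effective_batch_size // self.micro_batch_size


def expand_runs(learning_rates: List[float]) -> List[DAPTConfig]:
    # One configuration per learning rate (assumes the listed values are alternatives).
    return [DAPTConfig(learning_rate=lr) for lr in learning_rates]


class EarlyStopper:
    """Signal a stop when validation loss fails to improve for `patience` epochs.

    The patience value is a hypothetical choice; Table 5 does not specify it.
    """

    def __init__(self, patience: int = 2, min_delta: float = 0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss: float) -> bool:
        # Returns True when training should stop.
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


if __name__ == "__main__":
    runs = expand_runs([1e-3, 1e-4])
    for cfg in runs:
        print(cfg.learning_rate, cfg.accumulation_steps)

    # Made-up validation losses, used only to demonstrate the stopping rule.
    stopper = EarlyStopper(patience=2)
    losses = [2.1, 1.8, 1.9, 1.85, 1.7]
    for epoch, loss in enumerate(losses[: runs[0].max_epochs], start=1):
        if stopper.step(loss):
            print(f"early stop at epoch {epoch}")
            break
```

With these assumed values, each run takes 8 accumulation steps per optimizer update (256 / 32). Because the demo losses fail to improve in epochs 3 and 4, the stopper halts at epoch 4, well before the 10-epoch cap.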