Table 5: Hyperparameters for T5 DAPT
| Hyperparameter | Setting |
|---|---|
| Optimizer | AdamW |
| Epochs | 10 (with early stopping) |
| Learning rate | 1e-3, 1e-4 |
| Batch size | 256 |
| Gradient accumulation | True |
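The sketch below shows one way the Table 5 settings could be written as a training config. It assumes that the batch size of 256 is the effective batch size reached through gradient accumulation. It also assumes that the two learning rates are alternatives to try rather than a schedule. The per-step micro-batch size and the early-stopping patience are not reported in the table. The names `DaptConfig`, `accumulation_steps`, `should_stop`, `micro_batch_size`, and `patience` are hypothetical and are given only for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DaptConfig:
    """Settings transcribed from Table 5 (a hypothetical container, not the authors' code)."""
    optimizer: str = "AdamW"
    max_epochs: int = 10  # upper bound; early stopping may end training sooner
    # Assumption: two candidate learning rates, each used in a separate run.
    learning_rates: List[float] = field(default_factory=lambda: [1e-3, 1e-4])
    effective_batch_size: int = 256  # assumption: batch size after gradient accumulation
    gradient_accumulation: bool = True


def accumulation_steps(cfg: DaptConfig, micro_batch_size: int) -> int:
    """Number of micro-batches whose gradients are accumulated before one optimizer step.

    `micro_batch_size` is a hypothetical per-step batch that fits in memory;
    Table 5 does not report it.
    """
    if cfg.effective_batch_size % micro_batch_size != 0:
        raise ValueError("micro_batch_size must divide the effective batch size")
    return cfg.effective_batch_size // micro_batch_size


def should_stop(val_losses: List[float], patience: int) -> bool:
    """Hypothetical early-stopping rule: stop once validation loss has not
    improved on its earlier best for `patience` consecutive epochs."""
    if len(val_losses) <= patience:
        return False
    best_before = min(val_losses[:-patience])
    return min(val_losses[-patience:]) >= best_before
```

Take an assumed micro-batch of 32 sequences. Then `accumulation_steps(DaptConfig(), 32)` gives 8 micro-batches per AdamW update, which reaches the 256-example batch in the table.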