Table 3.
Parameter | Value used for training set 1 | Value used for training set 2 |
---|---|---|
Batch size | 128 | 32 |
Learning rate | 0.1 | 0.001 |
Epochs | 15 | 15 |
Adaptive gradient clipping | 0.16 | 0.16 |
Weight decay | 1.00E-05 | 1.00E-05 |
Optimizer | Stochastic gradient descent | Stochastic gradient descent |
StepLR scheduler | Every 2.4 steps | Every 2.4 steps |
Gamma | 0.97 | 0.97 |
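
The step-decay schedule implied by the last two rows of Table 3 can be sketched in plain Python. This is an illustrative sketch, not the authors' training code: the names `TABLE_3` and `step_decay_lr` are hypothetical, and the sketch assumes that the "2.4 steps" interval is measured in epochs and that the learning rate is multiplied by gamma once per completed interval. Adaptive gradient clipping is recorded as a configuration value only, since the table does not specify how it was implemented.

```python
import math

# Hyperparameters copied from Table 3. The dictionary layout and key names
# are illustrative assumptions, not part of the original work.
TABLE_3 = {
    "training_set_1": {"batch_size": 128, "learning_rate": 0.1},
    "training_set_2": {"batch_size": 32, "learning_rate": 0.001},
}
SHARED = {
    "epochs": 15,
    "adaptive_gradient_clipping": 0.16,
    "weight_decay": 1e-5,
    "optimizer": "SGD",
    "step_size": 2.4,  # assumed to be in epochs
    "gamma": 0.97,
}


def step_decay_lr(base_lr, epoch, step_size=SHARED["step_size"], gamma=SHARED["gamma"]):
    """Return the learning rate at `epoch` under step decay.

    The rate is multiplied by `gamma` once for every completed interval
    of `step_size` epochs (assumed interpretation of the StepLR row).
    """
    completed_intervals = math.floor(epoch / step_size)
    return base_lr * gamma ** completed_intervals


if __name__ == "__main__":
    for name, cfg in TABLE_3.items():
        final_lr = step_decay_lr(cfg["learning_rate"], SHARED["epochs"])
        print(f"{name}: initial lr={cfg['learning_rate']}, lr at epoch 15={final_lr:.6f}")
```

Under these assumptions, six decay intervals complete within the 15 epochs, so the learning rate at the end of training is the initial value multiplied by 0.97 raised to the sixth power for both training sets.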