Table 4.
Learning-related hyperparameters determined by Bayesian optimization for each test dataset
Hyperparameter | Test dataset A | Test dataset B | Test dataset C |
---|---|---|---|
Minibatch size | 20 | 10 | 19 |
Initial learning rate | 9.461e−05 | 5.631e−05 | 7.550e−05 |
Momentum | 0.945 | 0.919 | 0.949 |
L2 regularization | 1.529e−09 | 1.953e−10 | 1.087e−06 |
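
As a rough illustration of how the values in Table 4 might enter training, the sketch below stores them in a small configuration object and applies one update step. The table does not name the optimizer; the presence of a momentum term suggests stochastic gradient descent with momentum, and that is an assumption here. The names `TrainingHyperparameters`, `TABLE_4`, and `sgdm_step` are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingHyperparameters:
    """One column of Table 4 (field names are illustrative)."""
    minibatch_size: int
    initial_learning_rate: float
    momentum: float
    l2_regularization: float


# Values copied from Table 4, keyed by test dataset.
TABLE_4 = {
    "A": TrainingHyperparameters(20, 9.461e-05, 0.945, 1.529e-09),
    "B": TrainingHyperparameters(10, 5.631e-05, 0.919, 1.953e-10),
    "C": TrainingHyperparameters(19, 7.550e-05, 0.949, 1.087e-06),
}


def sgdm_step(weight, gradient, velocity, hp):
    """One scalar SGD-with-momentum update with an L2 penalty.

    Assumption: the optimizer is SGD with momentum; the source table
    lists the hyperparameters but does not state the update rule.
    """
    # The L2 penalty adds a term proportional to the weight to the gradient.
    regularized_gradient = gradient + hp.l2_regularization * weight
    # The velocity keeps a decayed history of previous steps.
    new_velocity = (hp.momentum * velocity
                    - hp.initial_learning_rate * regularized_gradient)
    return weight + new_velocity, new_velocity
```

For example, `sgdm_step(1.0, 0.5, 0.0, TABLE_4["A"])` moves the weight slightly below 1.0, because the gradient is positive and the initial velocity is zero. With L2 coefficients this small, the regularization term changes the step only marginally at this weight scale.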