Table 5.
Hyperparameter | Parameter Value |
---|---|
Initial Learn Rate | |
Gradient Decay Factor | 0.9000 |
Squared Gradient Decay Factor | 0.9990 |
Epsilon (ε) | |
Learn Rate Schedule | piecewise |
Learn Rate Drop Factor | 0.0100 |
Learn Rate Drop Period | 125,000 |
L2 Regularization | |
Gradient Threshold Method | L2 norm |
Gradient Threshold | 1 |
Maximum Epochs | 7000 |
Mini Batch Size | 2 |
Input and Label Shuffle | once |
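
Two of the settings in Table 5, the piecewise learn rate schedule and the L2-norm gradient threshold, can be read as simple rules. The following is a minimal Python sketch of those rules, not the training code used here. The function names `piecewise_learn_rate` and `clip_by_l2_norm` are hypothetical. The initial learn rate is not given in the table, so it is left as an argument. The table also does not state whether the drop period is counted in epochs or iterations, so `step` stands for whichever unit applies.

```python
import math


def piecewise_learn_rate(initial_rate, step, drop_factor=0.01, drop_period=125_000):
    """Learn rate under a piecewise schedule.

    The rate is multiplied by drop_factor once for every completed
    drop_period. `step` is in the same unit as drop_period (assumed;
    the table does not say whether that is epochs or iterations).
    """
    completed_periods = step // drop_period
    return initial_rate * drop_factor ** completed_periods


def clip_by_l2_norm(gradient, threshold=1.0):
    """Rescale a gradient so that its L2 norm does not exceed threshold.

    Gradients already within the threshold are returned unchanged.
    """
    norm = math.sqrt(sum(g * g for g in gradient))
    if norm <= threshold:
        return list(gradient)
    scale = threshold / norm
    return [g * scale for g in gradient]
```

With the tabulated drop factor of 0.0100 and drop period of 125,000, the rate stays at its initial value until step 125,000 and is then scaled by 0.01. With a gradient threshold of 1, a gradient such as `[3.0, 4.0]` (norm 5) is rescaled to unit norm, while smaller gradients pass through unchanged.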