Table 1. Optimizer specifications.
| Optimizers | Specifications |
|---|---|
| SGD | learning rate = 0.001, weight decay = 0.0005, momentum = 0.9, nesterov = False |
| Adagrad | learning rate = 0.001, epsilon = 1 × 10⁻⁷ |
| RMSProp | learning rate = 0.001, rho = 0.9, epsilon = 1 × 10⁻⁷ |
| Adadelta | learning rate = 1.0, rho = 0.95, epsilon = 1 × 10⁻⁶ |
| Adam | learning rate = 0.001, beta1 = 0.9, beta2 = 0.999, epsilon = 1 × 10⁻⁸, amsgrad = False |
| Adamax | learning rate = 0.002, beta1 = 0.9, beta2 = 0.999, epsilon = 1 × 10⁻⁸ |
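The parameter names in Table 1 (rho, beta1/beta2, epsilon, amsgrad) mirror the Keras optimizer API, so the configurations can be expressed directly as optimizer objects. The sketch below assumes TensorFlow/Keras (version ≥ 2.11 for the `weight_decay` argument on SGD); the original text does not state which framework was used, so treat the mapping as illustrative.

```python
import tensorflow as tf

# Table 1 configurations expressed as tf.keras optimizer instances.
# Assumption: TensorFlow >= 2.11, where optimizers accept weight_decay.
optimizers = {
    "SGD": tf.keras.optimizers.SGD(
        learning_rate=0.001, weight_decay=0.0005, momentum=0.9, nesterov=False
    ),
    "Adagrad": tf.keras.optimizers.Adagrad(learning_rate=0.001, epsilon=1e-7),
    "RMSProp": tf.keras.optimizers.RMSprop(
        learning_rate=0.001, rho=0.9, epsilon=1e-7
    ),
    "Adadelta": tf.keras.optimizers.Adadelta(
        learning_rate=1.0, rho=0.95, epsilon=1e-6
    ),
    "Adam": tf.keras.optimizers.Adam(
        learning_rate=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-8, amsgrad=False
    ),
    "Adamax": tf.keras.optimizers.Adamax(
        learning_rate=0.002, beta_1=0.9, beta_2=0.999, epsilon=1e-8
    ),
}

# Each optimizer can then be passed to model.compile(optimizer=...) in turn,
# keeping the model architecture and data pipeline fixed across runs so that
# only the optimizer configuration varies.
```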