Table 12.
Model | Adam | RMSprop | SGD | AdaMax | Adadelta | Nadam |
---|---|---|---|---|---|---|
ShuffleNet | 2.9 m | 3.9 m | 2.9 m | 3.2 m | 3.2 m | 3.0 m |
ShuffleNet-Light+Inception-v3 | 2.0 m | 2.1 m | 1.9 m | 2.1 m | 2.1 m | 2.0 m |
ShuffleNet-Light+AlexNet | 2.4 m | 3.4 m | 2.4 m | 3.4 m | 3.4 m | 2.5 m |
ShuffleNet-Light+MobileNet | 2.5 m | 3.5 m | 2.5 m | 4.5 m | 3.5 m | 2.7 m |
m: millions; Adam: adaptive moment estimation; RMSprop: root mean square propagation; SGD: stochastic gradient descent; AdaMax: a variant of Adam based on the infinity norm; Adadelta: an extension of Adagrad (adaptive gradient algorithm); Nadam: Nesterov-accelerated adaptive moment estimation.