Table 3.
Hyperparameter | Range | Recommended |
---|---|---|
Learning rate | 1, 0.1, 0.001, 0.002, 0.003, 0.0001 | 0.002 |
Batch size | 32, 64, 128, 256, 512, 1024, 2056 | 1024, 2056 |
Weight initialization | uniform, normal, lecun_uniform, glorot_normal, glorot_uniform | glorot_normal |
Optimizer (per-parameter adaptive learning rate) | SGD, RMSprop, Adagrad, Adadelta, Adam, Adamax, Nadam | Adam |
Activation function | relu, tanh, sigmoid, softmax, softplus | relu, sigmoid |
Dropout rate | 0.5, 0.6, 0.7 | 0.6 |
Depth | 2, 3, 4, 5, 6, 7, 8, 9 | 3 |
Width | 16, 32, 64, 128, 256, 1024, 2048, 4096 | 128, 64, 32 |
GPU | Yes, No | Yes |
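
The search space in Table 3 can be written down as a plain Python dictionary, which makes its size explicit and lets one draw random configurations from it. The sketch below is illustrative only: the dictionary keys and the helper names (`grid_size`, `sample_config`, `recommended_in_space`) are assumptions introduced here, not names from the original work. The GPU row is left out because it is a hardware choice rather than a model hyperparameter, and width is treated as a single value per configuration, although the recommended setting (128, 64, 32) presumably assigns one width per layer.

```python
import math
import random

# Candidate values copied verbatim from Table 3 (GPU row omitted).
SEARCH_SPACE = {
    "learning_rate": [1, 0.1, 0.001, 0.002, 0.003, 0.0001],
    "batch_size": [32, 64, 128, 256, 512, 1024, 2056],
    "weight_init": ["uniform", "normal", "lecun_uniform",
                    "glorot_normal", "glorot_uniform"],
    "optimizer": ["SGD", "RMSprop", "Adagrad", "Adadelta",
                  "Adam", "Adamax", "Nadam"],
    "activation": ["relu", "tanh", "sigmoid", "softmax", "softplus"],
    "dropout_rate": [0.5, 0.6, 0.7],
    "depth": [2, 3, 4, 5, 6, 7, 8, 9],
    "width": [16, 32, 64, 128, 256, 1024, 2048, 4096],
}

# Recommended values from Table 3; rows with several values keep all of them.
RECOMMENDED = {
    "learning_rate": [0.002],
    "batch_size": [1024, 2056],
    "weight_init": ["glorot_normal"],
    "optimizer": ["Adam"],
    "activation": ["relu", "sigmoid"],
    "dropout_rate": [0.6],
    "depth": [3],
    "width": [128, 64, 32],
}


def grid_size(space):
    """Number of configurations in an exhaustive grid over `space`."""
    return math.prod(len(values) for values in space.values())


def sample_config(space, rng):
    """Draw one configuration, picking each value uniformly at random."""
    return {name: rng.choice(values) for name, values in space.items()}


def recommended_in_space(recommended, space):
    """Check that every recommended value is one of the listed candidates."""
    return all(
        value in space[name]
        for name, values in recommended.items()
        for value in values
    )


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the draw is repeatable
    print("grid size:", grid_size(SEARCH_SPACE))
    print("random draw:", sample_config(SEARCH_SPACE, rng))
    print("recommended values listed in range:",
          recommended_in_space(RECOMMENDED, SEARCH_SPACE))
```

Counting the grid this way shows why an exhaustive sweep over all eight rows is impractical, and why sampling or tuning one hyperparameter at a time is the more realistic way to use such a table.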