| Hyper-parameter | Values tested | Description |
| --- | --- | --- |
| Batch size | 10, 20, 30, 40, 50, 60, 70, 80, 90, 100 | Number of training examples utilized in one iteration |
| Epochs | 10, 50, 100, 200 | Number of times the learning algorithm works through the entire training dataset |
| Training optimization algorithm | SGD, RMSprop, Adagrad, Adadelta, Adam, Adamax, Nadam | Algorithm that updates the model parameters to minimize the loss function, as evaluated on the training set |
| Learning rate | 0.001, 0.01, 0.1, 0.2, 0.3 | Hyper-parameter that controls how much the weights are adjusted with respect to the loss gradient |
| Momentum | 0.0, 0.2, 0.4, 0.6, 0.8, 0.9 | Value between 0 and 1 that increases the size of the steps taken towards the minimum, helping the search escape local minima |
| Network weight initialization | uniform, lecun_uniform, normal, zero, glorot_normal, glorot_uniform, he_normal, he_uniform | Scheme used to initialize the weights of the network's hidden layers |
| Neuron activation function | softmax, softplus, softsign, relu, tanh, sigmoid, hard_sigmoid, linear | Function that determines a neuron's output from its inputs |
| Dropout regularization | 0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9 | Rate at which nodes are randomly dropped during training |
| Weight constraint | 1, 2, 3, 4, 5 | Maximum norm imposed on the weights during training to encourage the network to use small weights |
| Number of neurons in the hidden layers | 1, 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100 | Number of neurons composing each hidden layer of the network |
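The table lists only the candidate values; as an illustration of how they could be searched, the sketch below runs a plain exhaustive grid search over a reduced slice of the grid with TensorFlow/Keras. The synthetic data, the `build_model` architecture, and the trimmed value lists are assumptions made here for brevity, not part of the original experiment; wrapping the model in scikit-learn's `GridSearchCV` would be an equivalent alternative.

```python
import itertools
import numpy as np
from tensorflow import keras

# Synthetic stand-in for the real training set (hypothetical shapes).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8)).astype("float32")
y = (X.sum(axis=1) > 0).astype("float32")

def build_model(neurons, activation, init, dropout, weight_constraint,
                learning_rate, momentum):
    """Small binary classifier exposing the hyper-parameters from the table."""
    model = keras.Sequential([
        keras.layers.Input(shape=(8,)),
        keras.layers.Dense(
            neurons,
            activation=activation,
            kernel_initializer=init,
            kernel_constraint=keras.constraints.MaxNorm(weight_constraint),
        ),
        keras.layers.Dropout(dropout),
        keras.layers.Dense(1, activation="sigmoid"),
    ])
    optimizer = keras.optimizers.SGD(learning_rate=learning_rate,
                                     momentum=momentum)
    model.compile(optimizer=optimizer, loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model

# A deliberately small slice of the grid above; the full search is the
# Cartesian product of every value listed in the table.
grid = {
    "batch_size": [10, 50],
    "epochs": [10],
    "neurons": [10, 50],
    "activation": ["relu", "tanh"],
    "init": ["glorot_uniform", "he_uniform"],
    "dropout": [0.0, 0.2],
    "weight_constraint": [3],
    "learning_rate": [0.01, 0.1],
    "momentum": [0.0, 0.9],
}

best_score, best_params = -np.inf, None
keys = list(grid)
for values in itertools.product(*(grid[k] for k in keys)):
    params = dict(zip(keys, values))
    # Training-loop parameters go to fit(); the rest configure the model.
    model = build_model(**{k: v for k, v in params.items()
                           if k not in ("batch_size", "epochs")})
    history = model.fit(X, y, batch_size=params["batch_size"],
                        epochs=params["epochs"], validation_split=0.2,
                        verbose=0)
    score = history.history["val_accuracy"][-1]
    if score > best_score:
        best_score, best_params = score, params

print(f"Best validation accuracy {best_score:.3f} with {best_params}")
```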