Table 1.
DeepSurv hyper-parameter search space
| Hyper-parameter | Search space |
| --- | --- |
| Activation | LeakyReLU^21, ReLU^22, and SELU^23 |
| Hidden layers topology | 8; 32; 256; 32 × 32; 64 × 64; 128 × 128; 64 × 16; 256 × 32; 32 × 32 × 32; 64 × 64 × 64 |
| Drop-out^a,24 | [0, 0.9] |
| Weight decay^a,25 | [0, 20] |
| Batch normalization^26 | Yes/No |
| Optimizer | Stochastic gradient descent (SGD), Adam^27 |
| Momentum^a,28 | [0, 1] |
| Learning rate | Log-uniform distribution on [1e−5, 1] |
The search space comprised 10 different neural network topologies, up to three layers deep, and a choice of three activation functions for those layers. Regularization techniques included drop-out (randomly ignoring selected neurons during training) and weight decay (L2 regularization, which shrinks the weights). Batch normalization was offered as an option to accelerate training by standardizing the changing distribution of each layer's inputs. The optimizer for gradient descent was either standard stochastic gradient descent (SGD), with or without momentum, or Adam (adaptive moment estimation).
^a Sampled from uniform distributions over the stated ranges.
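For concreteness, the search space above could be expressed as a random-search sampler. The following Python sketch is illustrative only: the dictionary keys, the function name `sample_config`, and the use of plain random search are assumptions, not the authors' implementation.

```python
import math
import random

# Hypothetical encoding of the Table 1 search space (names are illustrative).
SEARCH_SPACE = {
    "activation": ["LeakyReLU", "ReLU", "SELU"],
    "hidden_layers": [
        [8], [32], [256],
        [32, 32], [64, 64], [128, 128], [64, 16], [256, 32],
        [32, 32, 32], [64, 64, 64],
    ],
    "dropout": (0.0, 0.9),         # uniform
    "weight_decay": (0.0, 20.0),   # uniform (L2 regularization strength)
    "batch_norm": [True, False],
    "optimizer": ["SGD", "Adam"],
    "momentum": (0.0, 1.0),        # uniform; relevant for SGD
    "learning_rate": (1e-5, 1.0),  # log-uniform
}

def sample_config(space=SEARCH_SPACE, rng=random):
    """Draw one hyper-parameter configuration from the search space."""
    return {
        "activation": rng.choice(space["activation"]),
        "hidden_layers": rng.choice(space["hidden_layers"]),
        "dropout": rng.uniform(*space["dropout"]),
        "weight_decay": rng.uniform(*space["weight_decay"]),
        "batch_norm": rng.choice(space["batch_norm"]),
        "optimizer": rng.choice(space["optimizer"]),
        "momentum": rng.uniform(*space["momentum"]),
        # Log-uniform draw: sample the exponent uniformly, then exponentiate.
        "learning_rate": math.exp(
            rng.uniform(math.log(space["learning_rate"][0]),
                        math.log(space["learning_rate"][1]))
        ),
    }

if __name__ == "__main__":
    print(sample_config())
```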