
Table 1. DeepSurv hyper-parameter search space

Hyper-parameter          | Search space
Activation               | LeakyReLU [21], ReLU [22], and SELU [23]
Hidden layers topology   | 8, 32, 256, 32 × 32, 64 × 64, 128 × 128, 64 × 16, 256 × 32, 32 × 32 × 32, 64 × 64 × 64
Drop-out^a [24]          | [0, 0.9]
Weight decay^a [25]      | [0, 20]
Batch normalization [26] | Yes/No
Optimizer                | Stochastic Gradient Descent, Adam [27]
Momentum^a [28]          | [0, 1]
Learning rate            | Log distribution on [1e−5, 1]

The search space comprised 10 different neural network topologies, up to three layers deep, and a choice of three activation functions for these layers. Regularization techniques included drop-out (randomly ignoring selected neurons during training) and weight decay (L2 regularization, which shrinks weights). Batch normalization was offered as an option to accelerate training by standardizing the changing distribution of layer inputs. The optimizer for gradient descent was either standard stochastic gradient descent (SGD), with or without momentum, or Adam (adaptive moment estimation).

^a Uniform distributions.
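As an illustration only, and not the authors' published tuning code, the sketch below samples one candidate configuration from the Table 1 search space using plain Python. The function and variable names are hypothetical, the random-search strategy is an assumption (the paper's actual search procedure is not specified in this excerpt), and the learning rate is drawn log-uniformly as one reasonable reading of "log distribution".

```python
# Illustrative sketch (not the authors' code): randomly sample one
# hyper-parameter configuration from the Table 1 search space.
import math
import random

SEARCH_SPACE = {
    "activation": ["LeakyReLU", "ReLU", "SELU"],
    "hidden_layers": [
        (8,), (32,), (256,),
        (32, 32), (64, 64), (128, 128), (64, 16), (256, 32),
        (32, 32, 32), (64, 64, 64),
    ],
    "dropout": (0.0, 0.9),         # uniform
    "weight_decay": (0.0, 20.0),   # uniform (L2 regularization)
    "batch_norm": [True, False],
    "optimizer": ["SGD", "Adam"],
    "momentum": (0.0, 1.0),        # uniform, only relevant for SGD
    "learning_rate": (1e-5, 1.0),  # sampled on a log scale (assumption)
}

def sample_configuration(rng: random.Random) -> dict:
    """Draw one hyper-parameter configuration from the search space."""
    lr_low, lr_high = SEARCH_SPACE["learning_rate"]
    return {
        "activation": rng.choice(SEARCH_SPACE["activation"]),
        "hidden_layers": rng.choice(SEARCH_SPACE["hidden_layers"]),
        "dropout": rng.uniform(*SEARCH_SPACE["dropout"]),
        "weight_decay": rng.uniform(*SEARCH_SPACE["weight_decay"]),
        "batch_norm": rng.choice(SEARCH_SPACE["batch_norm"]),
        "optimizer": rng.choice(SEARCH_SPACE["optimizer"]),
        "momentum": rng.uniform(*SEARCH_SPACE["momentum"]),
        # Log-scale sampling: uniform in log space, then exponentiate.
        "learning_rate": math.exp(
            rng.uniform(math.log(lr_low), math.log(lr_high))
        ),
    }

if __name__ == "__main__":
    rng = random.Random(0)
    print(sample_configuration(rng))
```

Each sampled configuration would then be used to train and evaluate one DeepSurv model, with the best configuration selected on a validation criterion such as the concordance index.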