Table 1.
DeepSurv hyper-parameter search space
| Hyper-parameter | Search space |
| --- | --- |
| Activation | LeakyReLU^21, ReLU^22, and SELU^23 |
| Hidden layers topology | 8; 32; 256; 32 × 32; 64 × 64; 128 × 128; 64 × 16; 256 × 32; 32 × 32 × 32; 64 × 64 × 64 |
| Drop-out^a,24 | [0, 0.9] |
| Weight decay^a,25 | [0, 20] |
| Batch normalization^26 | Yes/No |
| Optimizer | Stochastic gradient descent (SGD), Adam^27 |
| Momentum^a,28 | [0, 1] |
| Learning rate | Log-uniform distribution on [1e−5, 1] |
The search space comprised 10 different neural network topologies, up to three layers deep, and a choice of three activation functions for those layers. Regularization techniques included drop-out (randomly ignoring selected neurons during training) and weight decay (L2 regularization, which shrinks the weights). Batch normalization was offered as an option to accelerate training by standardizing the changing distribution of each layer's inputs. The optimizer for gradient descent was either standard stochastic gradient descent (SGD), with or without momentum, or Adam (adaptive moment estimation).
^a Sampled from uniform distributions over the stated ranges.
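For concreteness, the search space above could be expressed as a random-search sampler. The following Python sketch is illustrative only: the dictionary keys, the function name `sample_config`, and the use of plain random search are assumptions, not the authors' implementation.

```python
import math
import random

# Hypothetical encoding of the Table 1 search space (names are illustrative).
SEARCH_SPACE = {
    "activation": ["LeakyReLU", "ReLU", "SELU"],
    "hidden_layers": [
        [8], [32], [256],
        [32, 32], [64, 64], [128, 128], [64, 16], [256, 32],
        [32, 32, 32], [64, 64, 64],
    ],
    "dropout": (0.0, 0.9),         # uniform
    "weight_decay": (0.0, 20.0),   # uniform (L2 regularization strength)
    "batch_norm": [True, False],
    "optimizer": ["SGD", "Adam"],
    "momentum": (0.0, 1.0),        # uniform; relevant for SGD
    "learning_rate": (1e-5, 1.0),  # log-uniform
}

def sample_config(space=SEARCH_SPACE, rng=random):
    """Draw one hyper-parameter configuration from the search space."""
    return {
        "activation": rng.choice(space["activation"]),
        "hidden_layers": rng.choice(space["hidden_layers"]),
        "dropout": rng.uniform(*space["dropout"]),
        "weight_decay": rng.uniform(*space["weight_decay"]),
        "batch_norm": rng.choice(space["batch_norm"]),
        "optimizer": rng.choice(space["optimizer"]),
        "momentum": rng.uniform(*space["momentum"]),
        # Log-uniform draw: sample the exponent uniformly, then exponentiate.
        "learning_rate": math.exp(
            rng.uniform(math.log(space["learning_rate"][0]),
                        math.log(space["learning_rate"][1]))
        ),
    }

if __name__ == "__main__":
    print(sample_config())
```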