Table 2.

| | Models N1–N3 and Q2 | Models Q1 and Q3 |
|---|---|---|
| Hidden layers | 2 | 3 |
| Units per hidden layer | 256 | 64 |
| Activation function per layer | ReLU → ReLU → linear | tanh → leaky ReLU (α = 0.2) → tanh → linear |
| L1, L2 regularization coefficients per layer | None | L1: 4.7 ⋅ 10⁻³, L2: 8.7 ⋅ 10⁻³ |
| Batch normalization | None | After the second hidden layer |
| Optimizer | N1–N3: Nadam; Q2: Adam | Q1: Adam; Q3: Adadelta |
| ↪ Initial learning rate | 10⁻³ | 4.3 ⋅ 10⁻⁴ |
| ↪ Batch size | N1–N3: 32; Q2: 128 | 1,028 |
| ↪ Maximum number of epochs | N1–N3: 70; Q2: 40 | Q1: 30; Q3: 50 |
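To make the Q1/Q3 column concrete, the architecture can be sketched as a plain NumPy forward pass: three hidden layers of 64 units with tanh, leaky ReLU (α = 0.2), and tanh activations, batch normalization after the second hidden layer, and a linear output. The input/output dimensions below are placeholders (the table does not specify them), and the regularization terms and optimizer settings apply only to training, so they do not appear in this inference-only sketch.

```python
import numpy as np

def leaky_relu(x, alpha=0.2):
    # Leaky ReLU with the slope alpha = 0.2 from Table 2
    return np.where(x > 0, x, alpha * x)

def batch_norm(x, eps=1e-5):
    # Normalization over the batch dimension; learned scale/shift omitted
    return (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)

def forward_q(x, params):
    # Hidden layer 1: 64 units, tanh
    h = np.tanh(x @ params["W1"] + params["b1"])
    # Hidden layer 2: 64 units, leaky ReLU, then batch normalization
    h = batch_norm(leaky_relu(h @ params["W2"] + params["b2"]))
    # Hidden layer 3: 64 units, tanh
    h = np.tanh(h @ params["W3"] + params["b3"])
    # Linear output layer
    return h @ params["W4"] + params["b4"]

rng = np.random.default_rng(0)
d_in, d_out = 8, 1  # placeholder input/output sizes, not from the table
sizes = [d_in, 64, 64, 64, d_out]
params = {}
for i in range(4):
    params[f"W{i + 1}"] = rng.normal(scale=0.1, size=(sizes[i], sizes[i + 1]))
    params[f"b{i + 1}"] = np.zeros(sizes[i + 1])

y = forward_q(rng.normal(size=(32, d_in)), params)
print(y.shape)  # (32, 1)
```

The N1–N3/Q2 column would reduce the sketch to two 256-unit ReLU hidden layers with no batch normalization.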