Table 2.

| | Models N1–N3 and Q2 | Models Q1 and Q3 |
|---|---|---|
| Hidden layers | 2 | 3 |
| Units per hidden layer | 256 | 64 |
| Activation function per layer | ReLU → ReLU → linear | tanh → leaky ReLU (α = 0.2) → tanh → linear |
| L1, L2 regularization coefficients per layer | None | L1: 4.7 ⋅ 10⁻³, L2: 8.7 ⋅ 10⁻³ |
| Batch normalization | None | After the second hidden layer |
| Optimizer | N1–N3: Nadam; Q2: Adam | Q1: Adam; Q3: Adadelta |
| ↪ Initial learning rate | 10⁻³ | 4.3 ⋅ 10⁻⁴ |
| ↪ Batch size | N1–N3: 32; Q2: 128 | 1,028 |
| ↪ Maximum number of epochs | N1–N3: 70; Q2: 40 | Q1: 30; Q3: 50 |
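To make the Q1/Q3 column concrete, the architecture can be sketched as a plain NumPy forward pass: three hidden layers of 64 units with tanh, leaky ReLU (α = 0.2), and tanh activations, batch normalization after the second hidden layer, and a linear output. The input/output dimensions below are placeholders (the table does not specify them), and the regularization terms and optimizer settings apply only to training, so they do not appear in this inference-only sketch.

```python
import numpy as np

def leaky_relu(x, alpha=0.2):
    # Leaky ReLU with the slope alpha = 0.2 from Table 2
    return np.where(x > 0, x, alpha * x)

def batch_norm(x, eps=1e-5):
    # Normalization over the batch dimension; learned scale/shift omitted
    return (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)

def forward_q(x, params):
    # Hidden layer 1: 64 units, tanh
    h = np.tanh(x @ params["W1"] + params["b1"])
    # Hidden layer 2: 64 units, leaky ReLU, then batch normalization
    h = batch_norm(leaky_relu(h @ params["W2"] + params["b2"]))
    # Hidden layer 3: 64 units, tanh
    h = np.tanh(h @ params["W3"] + params["b3"])
    # Linear output layer
    return h @ params["W4"] + params["b4"]

rng = np.random.default_rng(0)
d_in, d_out = 8, 1  # placeholder input/output sizes, not from the table
sizes = [d_in, 64, 64, 64, d_out]
params = {}
for i in range(4):
    params[f"W{i + 1}"] = rng.normal(scale=0.1, size=(sizes[i], sizes[i + 1]))
    params[f"b{i + 1}"] = np.zeros(sizes[i + 1])

y = forward_q(rng.normal(size=(32, d_in)), params)
print(y.shape)  # (32, 1)
```

The N1–N3/Q2 column would reduce the sketch to two 256-unit ReLU hidden layers with no batch normalization.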