Table A1. Training hyperparameters and architecture settings for WDN 1, WDN 2, and WDN 3.
| | WDN 1 | WDN 2 | WDN 3 |
|---|---|---|---|
| **Training hyperparameters** | | | |
| Batch size | 256 | 256 | 256 |
| Learning rate | | | |
| Regularization | | | |
| MMD regularization | | | |
| Scheduler warmup | 50 | 50 | 50 |
| Early stopping patience | 50 | 50 | 50 |
| **ResNet architecture** | | | |
| Latent dimension | 8 | 16 | 16 |
| # layers | 5 | 5 | 7 |
| # neurons | 64, 48, 32, 24, 16 | 192, 160, 128, 96, 64, 32 | 512, 384, 320, 256, 192, 128, 64 |
| Act. func. | Leaky ReLU | Leaky ReLU | Leaky ReLU |
| **Transformer architecture** | | | |
| Latent dimension | 8 | 16 | 16 |
| Num. dense neurons | 32 | 128 | 256 |
| Embedding dimension | 4 | 4 | 4 |
| # attn. heads | 2 | 2 | 2 |
| # latent transformer blocks | 2 | 2 | 2 |
| # full-order transformer blocks | 1 | 1 | 1 |
| Act. func., transformer layers | GeLU | GeLU | GeLU |
| Act. func., dense layers | Leaky ReLU | Leaky ReLU | Leaky ReLU |
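For concreteness, the WDN 1 column above can be collected into a configuration dictionary as in the Python sketch below. The names `WDN1_CONFIG` and `build_encoder`, the input dimension, and the use of plain `torch.nn` layers are illustrative assumptions; in particular, the sketch stacks the tabulated widths as a plain feed-forward encoder and omits the residual (skip) connections of the actual ResNet, whose exact wiring is not given in the table.

```python
import torch
import torch.nn as nn

# WDN 1 settings from Table A1; the other networks differ only in
# width/depth. Learning-rate and regularization values are left out
# here because they are not listed in this copy of the table.
WDN1_CONFIG = {
    "batch_size": 256,
    "scheduler_warmup": 50,
    "early_stopping_patience": 50,
    "resnet": {
        "latent_dim": 8,
        "num_layers": 5,
        "neurons": [64, 48, 32, 24, 16],
        "activation": nn.LeakyReLU,
    },
    "transformer": {
        "latent_dim": 8,
        "num_dense_neurons": 32,
        "embedding_dim": 4,
        "num_attn_heads": 2,
        "num_latent_blocks": 2,
        "num_full_order_blocks": 1,
        "attn_activation": nn.GELU,
        "dense_activation": nn.LeakyReLU,
    },
}

def build_encoder(input_dim: int, cfg: dict) -> nn.Sequential:
    """Assemble a dense encoder with the tabulated layer widths.

    Hypothetical stand-in: a plain feed-forward stack, without the
    residual connections of the paper's ResNet blocks.
    """
    widths = [input_dim, *cfg["neurons"]]
    layers: list[nn.Module] = []
    for d_in, d_out in zip(widths[:-1], widths[1:]):
        layers += [nn.Linear(d_in, d_out), cfg["activation"]()]
    layers.append(nn.Linear(widths[-1], cfg["latent_dim"]))
    return nn.Sequential(*layers)

# input_dim=128 is purely illustrative; the batch size matches Table A1.
encoder = build_encoder(input_dim=128, cfg=WDN1_CONFIG["resnet"])
z = encoder(torch.randn(256, 128))  # -> shape (256, 8)
```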