Table 4. Hyper-parameters used for Model 1 and Model 2.
| Component | Hyper-parameter | Model 1 | Model 2 |
|---|---|---|---|
| Autoencoder | Code size | 64 | 128 |
| Autoencoder | Encoder hidden sizes | 256, 128 | 512, 256, 128 |
| Autoencoder | Decoder hidden sizes | 256, 128 | 512, 256, 128 |
| GAN | Generator hidden sizes | 64, 64 | 128, 128, 128, 128 |
| GAN | Discriminator hidden sizes | 256, 128 | 512, 256, 128 |
| GAN | # of generator/discriminator steps | 2/1 | 3/1 |
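As a concrete reading of these sizes, the following is a minimal PyTorch sketch of the Model 1 autoencoder. Only the layer widths come from the table; the input dimensionality, activation functions, and the mirrored ordering of the decoder hidden sizes are assumptions.

```python
import torch.nn as nn

INPUT_DIM = 784  # hypothetical data dimensionality; not specified in this section

# Model 1 encoder: hidden sizes 256, 128 from the table, down to code size 64
encoder = nn.Sequential(
    nn.Linear(INPUT_DIM, 256), nn.ReLU(),
    nn.Linear(256, 128), nn.ReLU(),
    nn.Linear(128, 64),  # code size 64
)
# Model 1 decoder: hidden sizes 256, 128 from the table, applied mirrored (assumed)
decoder = nn.Sequential(
    nn.Linear(64, 128), nn.ReLU(),
    nn.Linear(128, 256), nn.ReLU(),
    nn.Linear(256, INPUT_DIM),
)
autoencoder = nn.Sequential(encoder, decoder)
```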
For both models we used a batch size of 100 samples and trained the autoencoder for 100 epochs and the GAN for 500 epochs. We applied L2 regularization to the neural network weights (weight decay) with λ = 1e-3 and set the temperature parameter of the Gumbel-Softmax trick to τ = 0.66. We tested learning rates of 1e-2, 1e-3, and 1e-4.
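A sketch of how these training settings map onto PyTorch, reusing the `autoencoder` from the block above. The optimizer choice (Adam) is an assumption, as this section does not name one; the weight decay, temperature, and batch size are taken from the text.

```python
import torch
import torch.nn.functional as F

# weight_decay implements the L2 regularization with λ = 1e-3;
# lr is one of the tested values [1e-2, 1e-3, 1e-4].
# Adam itself is an assumption; the optimizer is not named in this section.
optimizer = torch.optim.Adam(autoencoder.parameters(), lr=1e-3, weight_decay=1e-3)

TAU = 0.66  # Gumbel-Softmax temperature from the text

# Relax a batch of categorical logits with the Gumbel-Softmax trick
logits = torch.randn(100, 10)  # batch size 100 (from the text); 10 categories assumed
samples = F.gumbel_softmax(logits, tau=TAU, hard=False)
```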