2021 Jan 12;21(2):492. doi: 10.3390/s21020492

Table 8.

Hyperparameter configuration for the AI trained by reinforcement learning with the PPO algorithm.

Hyperparameter    Value
Trainer           PPO
Batch size        16
Beta              0.01
Buffer size       256
Epsilon           0.15
Gamma             0.9
Hidden units      64
Lambda            0.9
Learning rate     5 × 10⁻⁴
Max steps         10 × 10⁴
Num epoch         10
Num layers        3
Time horizon      4
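
To make the relationships between these values easier to see, the following is a minimal Python sketch that stores the table as a configuration object and derives a few quantities from it. The class name `PPOConfig` and the helper functions are hypothetical and do not come from the article. The sketch also assumes the usual PPO meanings of the parameters, which the table itself does not define: Epsilon as the clipping range of the policy ratio, and the buffer being split into batches of size Batch size on each of the Num epoch passes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PPOConfig:
    """Values copied from Table 8; field names are illustrative."""
    trainer: str = "PPO"
    batch_size: int = 16
    beta: float = 0.01           # assumed: entropy regularization strength
    buffer_size: int = 256
    epsilon: float = 0.15        # assumed: PPO clipping range
    gamma: float = 0.9           # discount factor
    hidden_units: int = 64
    lambd: float = 0.9           # assumed: GAE lambda
    learning_rate: float = 5e-4
    max_steps: int = 100_000     # 10 x 10^4
    num_epoch: int = 10
    num_layers: int = 3
    time_horizon: int = 4


def minibatches_per_epoch(cfg: PPOConfig) -> int:
    """Number of batches obtained by splitting the buffer once."""
    return cfg.buffer_size // cfg.batch_size


def gradient_steps_per_update(cfg: PPOConfig) -> int:
    """Gradient steps per filled buffer, assuming one step per batch."""
    return minibatches_per_epoch(cfg) * cfg.num_epoch


def clipped_surrogate(ratio: float, advantage: float, cfg: PPOConfig) -> float:
    """Standard PPO clipped objective for a single sample."""
    clipped = max(1.0 - cfg.epsilon, min(1.0 + cfg.epsilon, ratio))
    return min(ratio * advantage, clipped * advantage)


if __name__ == "__main__":
    cfg = PPOConfig()
    print("batches per epoch:", minibatches_per_epoch(cfg))
    print("gradient steps per buffer:", gradient_steps_per_update(cfg))
    print("clipped objective (ratio 1.5, advantage 1):",
          clipped_surrogate(1.5, 1.0, cfg))
```

Under these assumptions, a buffer of 256 experiences yields 16 batches of 16, so each filled buffer produces 160 gradient steps over 10 epochs, and a policy ratio outside the interval [0.85, 1.15] stops contributing additional gain to the objective.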