2021 Jan 12;21(2):492. doi: 10.3390/s21020492

Table 8.

Hyperparameter configuration for the AI trained by reinforcement learning with the PPO algorithm.

Trainer	PPO
Batch size	16
Beta	0.01
Buffer size	256
Epsilon	0.15
Gamma	0.9
Hidden units	64
Lambda	0.9
Learning rate	5 × 10⁻⁴
Max steps	10 × 10⁴
Num epoch	10
Num layers	3
Time horizon	4
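
To make the relationships between these values easier to see, the following is a minimal sketch that stores the table in a Python dataclass and derives a few quantities from it. The class name `PPOConfig`, the field names, and the helper functions are hypothetical and do not come from the article. The comments assume the usual PPO meanings of the parameters (beta as entropy regularization strength, epsilon as the clipping range, lambda as the GAE parameter), and the derived update counts assume the buffer is split into minibatches of size "Batch size" for each of "Num epoch" passes, as is common in PPO implementations; the article's table does not confirm either assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PPOConfig:
    """Values copied from Table 8; field names are illustrative only."""
    batch_size: int = 16
    beta: float = 0.01           # assumed: entropy regularization strength
    buffer_size: int = 256
    epsilon: float = 0.15        # assumed: PPO clipping range
    gamma: float = 0.9           # discount factor
    hidden_units: int = 64
    lambd: float = 0.9           # assumed: GAE lambda
    learning_rate: float = 5e-4  # 5 x 10^-4
    max_steps: int = 100_000     # 10 x 10^4
    num_epoch: int = 10
    num_layers: int = 3
    time_horizon: int = 4


def minibatches_per_epoch(cfg: PPOConfig) -> int:
    # Assumes the buffer is split into equal minibatches of batch_size.
    return cfg.buffer_size // cfg.batch_size


def gradient_steps_per_update(cfg: PPOConfig) -> int:
    # Assumes one gradient step per minibatch, repeated num_epoch times.
    return minibatches_per_epoch(cfg) * cfg.num_epoch


def clip_range(cfg: PPOConfig) -> tuple[float, float]:
    # Interval to which the clipped PPO objective restricts the policy ratio.
    return (1.0 - cfg.epsilon, 1.0 + cfg.epsilon)


def effective_horizon(cfg: PPOConfig) -> float:
    # Rough number of steps over which discounted rewards remain significant.
    return 1.0 / (1.0 - cfg.gamma)


if __name__ == "__main__":
    cfg = PPOConfig()
    print("minibatches per epoch:", minibatches_per_epoch(cfg))
    print("gradient steps per policy update:", gradient_steps_per_update(cfg))
    print("policy ratio clip range:", clip_range(cfg))
    print("effective discount horizon:", effective_horizon(cfg))
```

Under these assumptions, a 256-sample buffer with a batch size of 16 gives 16 minibatches per epoch, so 10 epochs correspond to 160 gradient steps per policy update. A discount factor of 0.9 gives an effective horizon of about 10 steps, which suggests a task with fairly short-term rewards.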