Table 8.
Trainer | PPO |
---|---|
Batch size | 16 |
Beta | 0.01 |
Buffer size | 256 |
Epsilon | 0.15 |
Gamma | 0.9 |
Hidden units | 64 |
Lambda | 0.9 |
Learning rate | 5 × 10⁻⁴ |
Max steps | 10 × 10⁴ |
Num epoch | 10 |
Num layers | 3 |
Time horizon | 4 |
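
As a rough illustration of how these settings relate to one another, the values in Table 8 can be collected into a plain Python dictionary and used to derive the number of updates per filled buffer. This is only a sketch: the key names and the helper functions `minibatches_per_epoch` and `gradient_steps_per_buffer` are hypothetical and not taken from any library's configuration schema. It takes the learning rate as 5 × 10⁻⁴ and max steps as 10 × 10⁴ (100,000), and it assumes the common PPO scheme in which the buffer is split into minibatches of the given batch size and every epoch passes over the whole buffer once.

```python
# Hypothetical container for the Table 8 values. The key names are
# illustrative only and do not follow any specific library's schema.
ppo_hyperparameters = {
    "trainer": "PPO",
    "batch_size": 16,
    "beta": 0.01,             # entropy regularization strength
    "buffer_size": 256,
    "epsilon": 0.15,          # clipping range for the policy ratio
    "gamma": 0.9,             # reward discount factor
    "hidden_units": 64,
    "lambda": 0.9,            # GAE parameter
    "learning_rate": 5e-4,
    "max_steps": 10 * 10**4,  # 100,000 steps
    "num_epoch": 10,
    "num_layers": 3,
    "time_horizon": 4,
}


def minibatches_per_epoch(cfg):
    """Minibatches in one pass over the buffer (assumes an even split)."""
    return cfg["buffer_size"] // cfg["batch_size"]


def gradient_steps_per_buffer(cfg):
    """Gradient updates per filled buffer, assuming each epoch covers the whole buffer."""
    return minibatches_per_epoch(cfg) * cfg["num_epoch"]


if __name__ == "__main__":
    print(minibatches_per_epoch(ppo_hyperparameters))
    print(gradient_steps_per_buffer(ppo_hyperparameters))
```

Under these assumptions, each filled buffer of 256 experiences yields 16 minibatches per epoch and 160 gradient updates over the 10 epochs.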