Table A3.
Parameter | Value |
---|---|
Parallel Environments | 150 |
Maximum time steps per epoch (Stage One) | 2000 |
Learning Rate | 1 × 10 |
Discount Factor | 0.996 |
Curriculum Reward Threshold | 2300 |