| α | Learning rate | 2.5e-4 |
| γ | Discount rate | 0.99 |
| λ | GAE parameter | 0.95 |
| τ | Entropy bonus coefficient | 0.001 |
| T | Number of steps per policy update | 1,024 |
| K | Number of epochs | 4 |
| M | Batch size | 64 |
| N | Number of parallel actors | 4 |
| Environment | | |
| | Look-ahead distance | 3 |
| | Number of training path waypoints | 7 |
| | Sonar span apex angle | 140 |
| | Sonar range | 25 |
| | Sensor suite | (15, 15) |
| | Sensor min. pool output | (8, 8) |
| | Sensor update frequency | 1 |
| | Ocean current intensity limits | |
| | End-goal acceptance radius | 1 |
| | Control fins time constant | 0.2 |
| Reward function | | |
| | Course error penalty coefficient | |
| | Elevation error penalty coefficient | |
| | Obstacle closeness penalty scaling | |
| | Minimum obstacle penalty closeness | |
| | Minimum vessel-relative scaling | |
| | Roll penalty coefficient | |
| | Roll rate penalty coefficient | |
| | Rudder action penalty coefficient | |
| | Elevator action penalty coefficient | |
| | Path-following/COLAV trade-off | |
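
As a rough illustration of how the training hyperparameters listed above interact, the following sketch collects them in a small configuration object and derives a few quantities from them. The class name `PPOConfig` and its field names are hypothetical and not taken from any particular library. The sketch assumes that T counts environment steps per actor (so one rollout holds T × N transitions) and that M is the minibatch size used within each epoch; neither convention is stated in the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PPOConfig:
    """Hypothetical container for the tabulated training hyperparameters."""
    learning_rate: float = 2.5e-4   # alpha
    discount: float = 0.99          # gamma
    gae_lambda: float = 0.95        # lambda
    entropy_coef: float = 0.001     # tau
    steps_per_update: int = 1024    # T (assumed to be per actor)
    epochs: int = 4                 # K
    batch_size: int = 64            # M (assumed to be the minibatch size)
    num_actors: int = 4             # N

    @property
    def rollout_size(self) -> int:
        # Transitions collected before each policy update, assuming T is per actor.
        return self.steps_per_update * self.num_actors

    @property
    def minibatches_per_epoch(self) -> int:
        # Number of minibatches needed to pass over the rollout once.
        return self.rollout_size // self.batch_size

    @property
    def gradient_steps_per_update(self) -> int:
        # K passes over the rollout, one gradient step per minibatch.
        return self.epochs * self.minibatches_per_epoch

    @property
    def effective_horizon(self) -> float:
        # Common rule of thumb: rewards beyond roughly 1 / (1 - gamma) steps
        # contribute little to the discounted return.
        return 1.0 / (1.0 - self.discount)


cfg = PPOConfig()
print(cfg.rollout_size)               # 4096
print(cfg.minibatches_per_epoch)      # 64
print(cfg.gradient_steps_per_update)  # 256
```

Under these assumptions, each policy update uses 4,096 transitions and performs 256 gradient steps. If T instead denotes the total number of steps across all actors, the rollout size would be 1,024 and the derived counts shrink by a factor of four.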