2021 Jan 25;7:566037. doi: 10.3389/frobt.2020.566037


Parameter table for training and simulation setup.

Symbol            Description                                 Value

PPO
α                 Learning rate                               2.5e-4
γ                 Discount rate                               0.99
λ                 GAE parameter                               0.95
τ                 Entropy bonus coefficient                   0.001
T                 Number of steps per policy update           1,024
K                 Number of epochs                            4
M                 Batch size                                  64
N                 Number of parallel actors                   4
Δ                 Look-ahead distance                         3
n_w               Number of training path waypoints           7
γ_a               Sonar span apex angle                       140
s_r               Sonar range                                 25
                  Sensor suite                                (15, 15)
                  Sensor min-pool output                      (8, 8)
                  Sensor update frequency                     1
[V_min, V_max]    Ocean current intensity limits              [0.5, 1]
d_a               End-goal acceptance radius                  1
T_f               Control fins time constant                  0.2

Reward function
c_χ               Course error penalty coefficient            1
c_υ               Elevation error penalty coefficient         1
γ_c               Obstacle closeness penalty scaling          12.5
ϵ_c               Minimum obstacle penalty closeness          5e3
ϵ_oa              Minimum vessel-relative scaling             0.05
c_ϕ               Roll penalty coefficient                    1
c_r               Roll rate penalty coefficient               1
c_δr              Rudder action penalty coefficient           0.1
c_δs              Elevator action penalty coefficient         0.1
λ_r               Path following/COLAV trade-off              [0.9, 0.5, 0.1]
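
To make the PPO rows of the table easier to check, the sketch below collects them in a small Python configuration object and works out the update arithmetic they imply. The class name `PPOConfig` and its field names are hypothetical and are not taken from the paper or from any particular library. The sketch also assumes that each policy update pools the T steps collected by each of the N parallel actors, and that M is the size of a minibatch drawn from that pool; the table itself does not state either point.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PPOConfig:
    """Hypothetical container for the PPO rows of the parameter table."""

    learning_rate: float = 2.5e-4  # α
    gamma: float = 0.99            # γ, discount rate
    gae_lambda: float = 0.95       # λ, GAE parameter
    ent_coef: float = 0.001        # τ, entropy bonus coefficient
    n_steps: int = 1024            # T, steps per policy update
    n_epochs: int = 4              # K
    batch_size: int = 64           # M
    n_actors: int = 4              # N, parallel actors

    @property
    def rollout_size(self) -> int:
        # Assumption: one update uses T steps from each of the N actors.
        return self.n_steps * self.n_actors

    @property
    def minibatches_per_epoch(self) -> int:
        # Assumption: M is a minibatch size, not a minibatch count.
        return self.rollout_size // self.batch_size

    @property
    def gradient_steps_per_update(self) -> int:
        # K passes over the pooled rollout, one gradient step per minibatch.
        return self.n_epochs * self.minibatches_per_epoch

    @property
    def effective_horizon(self) -> float:
        # Common rule of thumb: rewards beyond about 1 / (1 - γ) steps
        # contribute little to the discounted return.
        return 1.0 / (1.0 - self.gamma)


cfg = PPOConfig()
print(cfg.rollout_size)               # 4096
print(cfg.minibatches_per_epoch)      # 64
print(cfg.gradient_steps_per_update)  # 256
```

Under these assumptions, one policy update consumes 4,096 transitions and performs 256 gradient steps, and the discount rate of 0.99 corresponds to an effective horizon of roughly 100 steps. If M instead denotes a number of minibatches, the last two figures change accordingly.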