2019 Sep 19;19(18):4055. doi: 10.3390/s19184055
Algorithm 1. DRL for MASS Autonomous Navigation Decision-making

Input:

Start sampling from a random state s_0 and select actions at random. Sampling terminates after T cycles or when the MASS collides. The resulting sample set is S.

Each sample in S must include:

(1) the current state s_t, (2) the action a, (3) the return r, (4) the next state s_{t+1} after the action, and (5) the termination condition.
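A sampling loop producing these five-field tuples can be sketched as follows; the environment interface (`step_fn`, `toy_step`) is an illustrative assumption, not something specified in the paper:

```python
import random

def sample_episode(step_fn, s0, num_actions, T):
    """Collect samples (s_t, a, r, s_{t+1}, done) using random actions.

    step_fn(s, a) -> (r, s_next, done) is an assumed environment
    interface; sampling stops after T cycles or on termination
    (e.g., a collision).
    """
    samples, s = [], s0
    for _ in range(T):
        a = random.randrange(num_actions)        # random action selection
        r, s_next, done = step_fn(s, a)
        samples.append((s, a, r, s_next, done))  # the five required fields
        if done:
            break
        s = s_next
    return samples

# Toy deterministic environment for illustration only: states 0..4,
# action 1 moves right; reaching state 4 yields reward 1 and terminates.
def toy_step(s, a):
    s_next = min(s + a, 4)
    done = s_next == 4
    return (1.0 if done else 0.0), s_next, done

S = sample_episode(toy_step, s0=0, num_actions=2, T=20)
```

Each element of `S` then carries exactly the five fields listed above.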

Output: the weight parameters ω for DRL

Require: ω: a small positive number, the smallest allowed convergence tolerance; S: the state set; P(s′,r|s,a): the probability of transitioning from the current state s under action a to the next state s′ with reward r; γ: the discount factor.

  • 1: Initialize the optimal value function Q(s), s ∈ S, arbitrarily
  • 2: for episode = 1, M do
  • 3:   for t = 1, T do
  • 4:     repeat
  • 5:       Δ ← 0
  • 6:       for each s ∈ S do
  • 7:         target q ← Q(s)
  • 8:         Q(s) ← max_a [r + γ Σ_{s′,r} P(s′,r|s,a) Q(s′)]
  • 9:         Δ ← max(Δ, |q − Q(s)|)
  • 10:      until Δ < ω
  • 11:    end for
  • 12:  end for
  • 13: π(s) ← argmax_a [r + γ Σ_{s′,r} P(s′,r|s,a) Q(s′)]
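The inner loop of the algorithm (sweep over states, Bellman backup, track the largest change Δ, stop when Δ falls below the tolerance ω, then extract the greedy policy) is standard tabular value iteration. A minimal sketch over a toy MDP follows; the transition table `P` here is an illustrative assumption, since the paper's MASS state space and transition model are not given in this excerpt:

```python
GAMMA = 0.9   # discount factor γ
OMEGA = 1e-6  # convergence tolerance ω

# Toy MDP for illustration: P[s][a] = list of (prob, reward, s_next).
# A three-state chain stands in for the (unspecified) MASS model.
P = {
    0: {0: [(1.0, 0.0, 0)], 1: [(1.0, 0.0, 1)]},
    1: {0: [(1.0, 0.0, 0)], 1: [(1.0, 1.0, 2)]},
    2: {0: [(1.0, 0.0, 2)], 1: [(1.0, 0.0, 2)]},  # absorbing state
}

Q = {s: 0.0 for s in P}   # arbitrary initialization of Q(s)
while True:               # repeat ... until Δ < ω
    delta = 0.0
    for s in P:                                    # for each s ∈ S
        q = Q[s]                                   # target q ← Q(s)
        Q[s] = max(                                # Bellman backup
            sum(p * (r + GAMMA * Q[s2]) for p, r, s2 in P[s][a])
            for a in P[s]
        )
        delta = max(delta, abs(q - Q[s]))          # track largest change
    if delta < OMEGA:
        break

# Greedy policy extraction, π(s) ← argmax_a of the backed-up value
pi = {
    s: max(P[s], key=lambda a, s=s: sum(p * (r + GAMMA * Q[s2])
                                        for p, r, s2 in P[s][a]))
    for s in P
}
```

On this toy chain the loop converges to Q(0) = 0.9 and Q(1) = 1.0, and the extracted policy moves right (action 1) from states 0 and 1.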