2019 Sep 19;19(18):4055. doi: 10.3390/s19184055
Algorithm 1. DRL for MASS Autonomous Navigation Decision-making

Input:

Start sampling from a random state s_0 and select actions at random. Sampling terminates after T cycles or when the MASS collides. The resulting sample set is S.

Each sample in S must include:

(1) the current state s_t, (2) the action a, (3) the return r, (4) the next state s_{t+1} after the action, and (5) the termination condition.
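A sampling loop producing these five-field tuples can be sketched as follows; the environment interface (`step_fn`, `toy_step`) is an illustrative assumption, not something specified in the paper:

```python
import random

def sample_episode(step_fn, s0, num_actions, T):
    """Collect samples (s_t, a, r, s_{t+1}, done) using random actions.

    step_fn(s, a) -> (r, s_next, done) is an assumed environment
    interface; sampling stops after T cycles or on termination
    (e.g., a collision).
    """
    samples, s = [], s0
    for _ in range(T):
        a = random.randrange(num_actions)        # random action selection
        r, s_next, done = step_fn(s, a)
        samples.append((s, a, r, s_next, done))  # the five required fields
        if done:
            break
        s = s_next
    return samples

# Toy deterministic environment for illustration only: states 0..4,
# action 1 moves right; reaching state 4 yields reward 1 and terminates.
def toy_step(s, a):
    s_next = min(s + a, 4)
    done = s_next == 4
    return (1.0 if done else 0.0), s_next, done

S = sample_episode(toy_step, s0=0, num_actions=2, T=20)
```

Each element of `S` then carries exactly the five fields listed above.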

Output: the weight parameters ω for DRL

Require: ω: a small positive number, the smallest allowed convergence tolerance; S: the state set; P(s′,r|s,a): the probability of transitioning from the current state s under action a to the next state s′ with reward r; γ: the discount factor.

  • 1: Initialize the optimal value function Q(s), s ∈ S, arbitrarily
  • 2: for episode = 1, M do
  • 3:   for t = 1, T do
  • 4:     repeat
  • 5:       Δ ← 0
  • 6:       for each s ∈ S do
  • 7:         target q ← Q(s)
  • 8:         Q(s) ← max_a [r + γ Σ_{s′,r} P(s′,r|s,a) Q(s′)]
  • 9:         Δ ← max(Δ, |q − Q(s)|)
  • 10:      until Δ < ω
  • 11:    end for
  • 12:  end for
  • 13: π(s) ← argmax_a [r + γ Σ_{s′,r} P(s′,r|s,a) Q(s′)]
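The inner loop of the algorithm (sweep over states, Bellman backup, track the largest change Δ, stop when Δ falls below the tolerance ω, then extract the greedy policy) is standard tabular value iteration. A minimal sketch over a toy MDP follows; the transition table `P` here is an illustrative assumption, since the paper's MASS state space and transition model are not given in this excerpt:

```python
GAMMA = 0.9   # discount factor γ
OMEGA = 1e-6  # convergence tolerance ω

# Toy MDP for illustration: P[s][a] = list of (prob, reward, s_next).
# A three-state chain stands in for the (unspecified) MASS model.
P = {
    0: {0: [(1.0, 0.0, 0)], 1: [(1.0, 0.0, 1)]},
    1: {0: [(1.0, 0.0, 0)], 1: [(1.0, 1.0, 2)]},
    2: {0: [(1.0, 0.0, 2)], 1: [(1.0, 0.0, 2)]},  # absorbing state
}

Q = {s: 0.0 for s in P}   # arbitrary initialization of Q(s)
while True:               # repeat ... until Δ < ω
    delta = 0.0
    for s in P:                                    # for each s ∈ S
        q = Q[s]                                   # target q ← Q(s)
        Q[s] = max(                                # Bellman backup
            sum(p * (r + GAMMA * Q[s2]) for p, r, s2 in P[s][a])
            for a in P[s]
        )
        delta = max(delta, abs(q - Q[s]))          # track largest change
    if delta < OMEGA:
        break

# Greedy policy extraction, π(s) ← argmax_a of the backed-up value
pi = {
    s: max(P[s], key=lambda a, s=s: sum(p * (r + GAMMA * Q[s2])
                                        for p, r, s2 in P[s][a]))
    for s in P
}
```

On this toy chain the loop converges to Q(0) = 0.9 and Q(1) = 1.0, and the extracted policy moves right (action 1) from states 0 and 1.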