. 2020 Oct 19;20(20):5911. doi: 10.3390/s20205911
Algorithm 1: Proposed SAC-based path planning algorithm for multi-arm manipulator.
  • 1:

    Define the MAMMDP, the augmented state $q_t$, and the initial and goal states $q_{init}$ and $q_{goal}$

  • 2:

    Initialize network parameters $\psi, \theta_{1,2}, \phi$

  • 3:

    Initialize the parameter values of the target network $\bar{\psi} \leftarrow \psi$

  • 4:

    Initialize global replay memory D

  • 5:

      

  • 6:

    for $e = 1$ to $M$ do

  • 7:

        Initialize local buffer L                ▹ Memory for an episode     

  • 8:

        for $t = 0$ to $T - 1$ do

  • 9:

            Randomly choose the goal and initial positions $q_{goal}, q_{init} \in Q^{a}_{free}$

  • 10:

            $a_t = f_{\phi}(\epsilon_t, q_t \| q_{goal}), \quad \epsilon_t \sim \mathcal{N}(0, \sigma_t)$

  • 11:

            $\hat{q}_{t+1} = q_t + \alpha \cdot a_t + \epsilon_e, \quad \epsilon_e \sim \mathcal{N}(0, \sigma_e)$

  • 12:

      

  • 13:

            if $\hat{q}_{t+1} \in Q^{a}_{free}$ then                ▹ Get next state and reward

  • 14:

               $q_{t+1} \leftarrow \hat{q}_{t+1}$

  • 15:

               $r_{t+1} = -1$

  • 16:

            else if $\hat{q}_{t+1} \in Q^{a}_{collide}$ then

  • 17:

               $q_{t+1} \leftarrow q_t$

  • 18:

               $r_{t+1} = -1$

  • 19:

            else if $|q_{t+1} - q_{goal}| \le \eta \cdot \alpha$ then

  • 20:

               $r_{t+1} = 0$

  • 21:

               Terminate due to goal arrival

  • 22:

            end if

  • 23:

      

  • 24:

            Store the transition $(q_t \| q_{goal}, a_t, r_{t+1}, q_{t+1} \| q_{goal})$ in D and L

  • 25:

                                  ▹ Parameters update

  • 26:

            Sample a mini-batch of $m$ transitions $(q_l \| q_{goal}, a_l, r_{l+1}, q_{l+1} \| q_{goal})$ from D

  • 27:

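            $J_V(\psi) = \mathbb{E}_{q_l}\left[\tfrac{1}{2}\left(V_{\psi}(q_l \| q_{goal}) - \mathbb{E}_{a_l}\left[\min_{k=1,2} Q_{\theta_k}(q_l \| q_{goal}, a_l) - \beta \log \pi_{\phi}(a_l \mid q_l \| q_{goal})\right]\right)^2\right]$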

  • 28:

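            $J_Q(\theta_{k=1,2}) = \mathbb{E}_{q_l, a_l}\left[\tfrac{1}{2}\left(Q_{\theta_{k=1,2}}(q_l \| q_{goal}, a_l) - \left(r_{l+1} + V_{\bar{\psi}}(q_{l+1} \| q_{goal})\right)\right)^2\right]$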

  • 29:

            $J_{\pi}(\phi) = \mathbb{E}_{q_l, a_l}\left[\beta \log \pi_{\phi}(a_l \mid q_l \| q_{goal}) - \min_{k=1,2} Q_{\theta_k}(q_l \| q_{goal}, a_l)\right]$

  • 30:

      

  • 31:

            The network parameters $\psi, \theta_{1,2}, \phi$ are updated by gradient descent

  • 32:

            using $\nabla_{\psi} J_V(\psi), \nabla_{\theta_1} J_Q(\theta_1), \nabla_{\theta_2} J_Q(\theta_2), \nabla_{\phi} J_{\pi}(\phi)$

  • 33:

      

  • 34:

            Update the state-value target network: $\bar{\psi} \leftarrow \tau \psi + (1 - \tau) \bar{\psi}$

  • 35:

        end for

  • 36:

      

  • 37:

        if $q_T \neq q_{goal}$ then                      ▹ HER

  • 38:

            Set additional goal $q_{goal} \in \{q_1, q_2, \ldots, q_T\}$

  • 39:

            for $t = 0$ to $T - 1$ do

  • 40:

               Sample a transition $(q_t \| q_{goal}, a_t, r_t, q_{t+1} \| q_{goal})$ from L

  • 41:

               if $|q_{t+1} - q_{goal}| \le \eta \cdot \alpha$ then

  • 42:

                   $r_{t+1} = 0$

  • 43:

               else $r_{t+1} = -1$

  • 44:

               end if

  • 45:

               Store the transition $(q_t \| q_{goal}, a_t, r_{t+1}, q_{t+1} \| q_{goal})$ in D

  • 46:

            end for

  • 47:

        end if

  • 48:

    end for
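
The non-learning parts of Algorithm 1 (the state transition and reward in steps 11 to 22, the target update in step 34, and the HER relabelling in steps 37 to 47) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names `step`, `her_relabel` and `soft_update`, the collision predicate `is_free`, the default parameter values, the use of the Euclidean norm for the goal test, and the uniform choice of the additional goal are all assumptions made for illustration. The sketch also evaluates the goal test on the resulting state after the free/collision check, which is one possible reading of the branch order in the listing.

```python
import math
import random


def step(q, a, q_goal, is_free, alpha=0.1, eta=1.0, sigma_e=0.0, rng=random):
    """One transition (steps 11-22). Returns (q_next, reward, done)."""
    # Step 11: proposed configuration, with Gaussian noise of std sigma_e.
    q_hat = tuple(qi + alpha * ai + rng.gauss(0.0, sigma_e)
                  for qi, ai in zip(q, a))
    # Steps 13-18: accept the move only if it is collision-free,
    # otherwise the arms stay at the current configuration.
    q_next = q_hat if is_free(q_hat) else tuple(q)
    # Steps 19-21: reward 0 and termination within eta * alpha of the goal
    # (Euclidean distance is an assumption), reward -1 otherwise.
    if math.dist(q_next, q_goal) <= eta * alpha:
        return q_next, 0.0, True
    return q_next, -1.0, False


def her_relabel(episode, new_goal=None, alpha=0.1, eta=1.0, rng=random):
    """Relabel an episode (steps 38-46).

    episode is a list of (q_t, a_t, q_next) tuples taken from the local
    buffer. If new_goal is None, it is drawn uniformly from the visited
    states (an assumption; the listing only says it comes from q_1..q_T).
    """
    if new_goal is None:
        new_goal = rng.choice([q_next for _, _, q_next in episode])
    relabelled = []
    for q, a, q_next in episode:
        # Recompute the sparse reward against the substituted goal.
        reached = math.dist(q_next, new_goal) <= eta * alpha
        r = 0.0 if reached else -1.0
        # Augmented states are the concatenation q || q_goal.
        relabelled.append((tuple(q) + tuple(new_goal), a, r,
                           tuple(q_next) + tuple(new_goal)))
    return relabelled


def soft_update(psi_bar, psi, tau=0.005):
    """Step 34: psi_bar <- tau * psi + (1 - tau) * psi_bar, elementwise."""
    return [tau * p + (1.0 - tau) * pb for p, pb in zip(psi, psi_bar)]
```

In a full training loop, the transitions returned by `her_relabel` would be appended to the global replay memory D alongside the original ones, so that an episode that never reached its goal still contributes zero-reward examples.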