2023 Jun 15;23(12):5615. doi: 10.3390/s23125615
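Algorithm 1: MADDPG with the ϵ-greedy policy for AGVs.
   for j=1 to max-episode do
      Initialization of the parameters
      for t=1 to M do
         for i=1 to N do
            n = random number
            if $n < \epsilon$ then
               execute any action $a$
            else
               execute the action that maximizes $Q_t(a)$, with probability $1-\epsilon$
            end if
            $a_i = \mu_{\theta_i}(o_i) + \mathcal{N}_t$
            $a = (a_1, \ldots, a_N)$
            $r_i = k_{i1} \times D_{position} + k_{i2} \times C_v \times v \times \cos(\omega) + k_{i3} \times C_e \times (E_{target} - E_i) + k_{i4} \times C_{AGV} + k_{i5} \times C_{obstacles}$
         end for
         for agent i=1 to N do
            $y^j = r_i^j + \gamma Q_i^{\mu'}(x'^j, a'_1, \ldots, a'_N)$
            $L(\theta_i) = \frac{1}{s} \sum_j \left( y^j - Q_i^{\mu}(x^j, a_1^j, \ldots, a_N^j) \right)^2$
            $\nabla_{\theta_i} J \approx \frac{1}{s} \sum_j \nabla_{\theta_i} \mu_i(o_i^j) \, \nabla_{a_i} Q_i^{\mu}(x^j, a_1^j, \ldots, a_i, \ldots, a_N^j) \big|_{a_i = \mu_i(o_i^j)}$
         end for
         for i=1 to N do
            $\theta'_i \leftarrow \tau \theta_i + (1-\tau) \theta'_i$
         end for
      end for
   end for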
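
The scalar building blocks of Algorithm 1 can be illustrated with a minimal Python sketch. It is not the authors' implementation: all function and argument names are hypothetical, the ϵ-greedy step assumes a small discrete set of candidate actions with known Q-values, and network parameters are stood in for by plain lists of floats. The actor and critic networks themselves, the replay buffer, and the policy-gradient step are omitted.

```python
import math
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """With probability epsilon pick a random action index, else the argmax of q_values.

    Assumes a discrete list of candidate actions (an illustrative simplification).
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def reward(k, d_position, c_v, v, omega, c_e, e_target, e_i, c_agv, c_obstacles):
    """Weighted reward r_i; k holds the five weights k_i1..k_i5."""
    return (
        k[0] * d_position
        + k[1] * c_v * v * math.cos(omega)
        + k[2] * c_e * (e_target - e_i)
        + k[3] * c_agv
        + k[4] * c_obstacles
    )


def td_target(r, gamma, target_q):
    """y = r + gamma * Q'(x', a'_1, ..., a'_N), with target_q supplied by the target critic."""
    return r + gamma * target_q


def critic_loss(targets, q_values):
    """Mean squared TD error over a minibatch of size s."""
    s = len(targets)
    return sum((y - q) ** 2 for y, q in zip(targets, q_values)) / s


def soft_update(target_params, online_params, tau):
    """theta' <- tau * theta + (1 - tau) * theta', applied element-wise."""
    return [tau * p + (1.0 - tau) * tp for tp, p in zip(target_params, online_params)]


if __name__ == "__main__":
    # Greedy choice when epsilon is 0: index of the largest Q-value.
    print(epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.0))
    # One TD target and the loss over a two-sample minibatch.
    y = td_target(r=1.0, gamma=0.5, target_q=4.0)
    print(y, critic_loss([y, 2.0], [2.0, 2.0]))
    # Target parameters move a fraction tau toward the online parameters.
    print(soft_update([0.0, 1.0], [1.0, 1.0], tau=0.1))
```

A small `tau` makes the target parameters track the online parameters slowly, which is the usual reason for the soft update in the last loop of the algorithm.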