Sensors. 2023 Jul 6;23(13):6190. doi: 10.3390/s23136190
Algorithm 1 DIDDPG-based UAV formation algorithm.
 1: Initialize the system parameters P̄, F_j, H_j, D_{j1}, D_{j2} and the replay memory buffer R.
 2: Randomly initialize the weights θ^μ and θ^Q of the actor μ and critic Q.
 3: Initialize the online actor and critic networks μ(s|θ^μ) and Q(s,a|θ^Q), respectively.
 4: for episode = 0:1:N−1 do
 5:     Initialize the random noise ω and the state s_0.
 6:     for j = 0:1:M−1 do
 7:         Update the action a_j = μ(s_j|θ^μ) + ω.
 8:         Update the next state based on (11): s_{j+1} = s_j F_j + a_j H_j + [G_j^T, 0_{3×1}].
 9:         Derive the reward by (12): r_j = −s_j P̄ s_j^T − a_{j+1} Q a_{j+1}^T.
10:         Store the transition (s_j, a_j, r_j, s_{j+1}) in R.
11:         Randomly select a mini-batch of K experience samples (s_j, a_j, r_j, s_{j+1}) from R.
12:         Update the target Q value based on (15): y_j = r_j + γ Q′(s_{j+1}, μ′(s_{j+1}|θ^{μ′}) | θ^{Q′}).
13:         Update θ^Q by minimizing the mean squared error function based on (14).
14:         Update θ^μ by the sampled policy gradient ∇_{θ^μ} J given by (16).
15:         Update the target networks:
16:             θ^{Q′} ← δθ^Q + (1−δ)θ^{Q′},  θ^{μ′} ← δθ^μ + (1−δ)θ^{μ′}.
17:     end for
18: end for
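The control flow of Algorithm 1 can be sketched in plain NumPy. This is a minimal illustration, not the paper's implementation: the dimensions, the constant stand-ins for F_j, H_j and the G_j term, and the linear actor/critic are all hypothetical, the reward uses a_j rather than a_{j+1}, and the gradient updates of lines 13–14 are left as comments to keep the sketch dependency-free. What it does show concretely is the replay buffer, the noisy action, the state transition and quadratic reward of (11)–(12), the target value of (15), and the soft target update of line 16.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (hypothetical; the paper's state/action sizes differ).
S, A = 4, 2
F = 0.9 * np.eye(S)           # stand-in for F_j in (11)
H = rng.normal(size=(A, S))   # stand-in for H_j in (11)
g = np.zeros(S)               # stand-in for the [G_j^T, 0_{3x1}] term
P_bar = np.eye(S)             # state-cost weight P-bar in (12)
Q_w = 0.1 * np.eye(A)         # action-cost weight Q in (12)

# Linear actor/critic parameters and their target copies (primed in the paper).
theta_mu = rng.normal(scale=0.1, size=(S, A))     # actor: mu(s) = s @ theta_mu
theta_mu_t = theta_mu.copy()                      # target actor theta^{mu'}
theta_Q = rng.normal(scale=0.1, size=(S + A,))    # critic: Q(s,a) = [s,a] @ theta_Q
theta_Q_t = theta_Q.copy()                        # target critic theta^{Q'}

replay = []                   # replay memory buffer R (line 1)
gamma, delta = 0.99, 0.01     # discount and soft-update rate
K = 8                         # mini-batch size

def mu(s, theta):             # actor network (linear placeholder)
    return s @ theta

def q_val(s, a, theta):       # critic network (linear placeholder)
    return np.concatenate([s, a]) @ theta

s = rng.normal(size=S)        # state s_0 (line 5)
for j in range(64):
    omega = 0.1 * rng.normal(size=A)              # exploration noise omega
    a = mu(s, theta_mu) + omega                   # line 7
    s_next = s @ F + a @ H + g                    # line 8, Eq. (11)
    r = -(s @ P_bar @ s) - (a @ Q_w @ a)          # line 9, Eq. (12)
    replay.append((s, a, r, s_next))              # line 10
    if len(replay) >= K:
        idx = rng.choice(len(replay), K, replace=False)   # line 11
        batch = [replay[i] for i in idx]
        # Line 12, Eq. (15): target values from the *target* networks.
        y = [rj + gamma * q_val(sn, mu(sn, theta_mu_t), theta_Q_t)
             for (_, _, rj, sn) in batch]
        # Lines 13-14: gradient steps on theta_Q (MSE vs. y) and on
        # theta_mu (sampled policy gradient) would go here; omitted.
        # Lines 15-16: soft (Polyak) update of the target networks.
        theta_Q_t = delta * theta_Q + (1 - delta) * theta_Q_t
        theta_mu_t = delta * theta_mu + (1 - delta) * theta_mu_t
    s = s_next
```

The soft update with a small δ makes the target networks trail the online ones slowly, which is what stabilizes the bootstrapped target y_j in line 12.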