2022 Mar 1;22(5):1919. doi: 10.3390/s22051919
Algorithm 2: Pseudocode for the DRL approach.

  • 1:

    Initialize the experience replay buffer B.

  • 2:

    for each UAV i in N do

  • 3:

        Initialize the actor network $\pi(s_t \mid \theta^{\pi})$ with weights $\theta^{\pi}$.

  • 4:

        Initialize the critic network $Q(s, a \mid \theta^{Q})$.

  • 5:

        Initialize the target actor network $\pi'(s \mid \theta^{\pi'})$ with weights $\theta^{\pi'}$.

  • 6:

        Initialize the target critic network $Q'(s, a \mid \theta^{Q'})$ with weights $\theta^{Q'}$.

  • 7:

    end for

  • 8:

    for each episode in H do

  • 9:

        Initialize the locations of the UAVs.

  • 10:

        Set the initial speed of each UAV to zero and its battery energy to $E_{\max}$.

  • 11:

        Initialize the environment.

  • 12:

        Receive the initial state $s_1$.

  • 13:

        for each time t in T do

  • 14:

            for each UAV i in N do

  • 15:

               Select action $a_t^i = \pi_i(s_t \mid \theta^{\pi}) + \mathcal{N}$, where $\mathcal{N}$ is the noise term.

  • 16:

            end for

  • 17:

            UAVs execute their actions $a_t = (a_t^1, \ldots, a_t^N)$.

  • 18:

            Update the next state $s_{t+1}$ and obtain the reward $r_t = (r_t^1, \ldots, r_t^N)$.

  • 19:

            for each UAV i in N do

  • 20:

               if UAV i moves outside the region or close to other UAVs then

  • 21:

                   Find $r_t^i = U_t - f_t^i$.

  • 22:

                   Discard the new location and update $O_t^i$.

  • 23:

               end if

  • 24:

            end for

  • 25:

            Update $s_t \leftarrow s_{t+1}$.

  • 26:

            Store $(s_t, a_t, r_t, s_{t+1})$ in the buffer $B$.

  • 27:

            for each UAV i in N do

  • 28:

               Sample a random mini-batch of $L$ transitions $(s_t, a_t, r_t, s_{t+1}) \in B$.

  • 29:

               Find $y_t^{(b)} = r_t^{(b)} + \gamma\, Q'\big(s^{(b+1)}, \pi'(s^{(b+1)} \mid \theta^{\pi'}) \mid \theta^{Q'}\big)$, where $b = 1, \ldots, L$.

  • 30:

               Update weights $\theta^{Q}$ by minimizing: $\mathcal{L}(\theta^{Q}) = \frac{1}{L} \sum_{b=1}^{L} \big( y_t^{(b)} - Q(s^{(b)}, a^{(b)} \mid \theta^{Q}) \big)^2$.

  • 31:

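               Update weights $\theta^{\pi}$ by minimizing: $\mathcal{L}(\theta^{\pi}) = -\frac{1}{L} \sum_{b=1}^{L} Q\big(s^{(b)}, \pi(s^{(b)} \mid \theta^{\pi}) \mid \theta^{Q}\big)$.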

  • 32:

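               Update the target networks' weights: $\theta^{Q'} = \varepsilon\,\theta^{Q} + (1-\varepsilon)\,\theta^{Q'}$ and $\theta^{\pi'} = \varepsilon\,\theta^{\pi} + (1-\varepsilon)\,\theta^{\pi'}$.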

  • 33:

            end for

  • 34:

        end for

  • 35:

    end for
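
Step 15 of Algorithm 2 adds a noise term to the actor's output before the action is executed. A minimal sketch of that step follows. The algorithm does not say how the noise is distributed or whether actions are bounded, so the Gaussian noise, the clipping range, and the names `select_action` and `toy_policy` are assumptions made for illustration only.

```python
import numpy as np

def select_action(policy, state, noise_std, rng, low=-1.0, high=1.0):
    """Return policy(state) plus exploration noise, clipped to [low, high].

    Assumption: zero-mean Gaussian noise and a bounded action range.
    """
    action = np.asarray(policy(state), dtype=float)
    noise = rng.normal(0.0, noise_std, size=action.shape)
    return np.clip(action + noise, low, high)

def toy_policy(state):
    """Hypothetical stand-in for an actor network: a fixed squashing map."""
    return np.tanh(0.5 * np.asarray(state, dtype=float))

rng = np.random.default_rng(0)
action = select_action(toy_policy, [0.2, -0.4], noise_std=0.1, rng=rng)
```

Setting `noise_std=0.0` recovers the deterministic policy output, which is the usual choice at evaluation time.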
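
Steps 1, 26, and 28 of Algorithm 2 rely on an experience replay buffer that stores transitions and returns random mini-batches. One possible sketch is shown below. The fixed capacity with oldest-first eviction and the class name `ReplayBuffer` are assumptions, since the algorithm only states that a buffer B exists.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity store of (state, action, reward, next_state) tuples."""

    def __init__(self, capacity, seed=None):
        # Assumption: once full, the oldest transition is dropped first.
        self._storage = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def store(self, state, action, reward, next_state):
        self._storage.append((state, action, reward, next_state))

    def sample(self, batch_size):
        # Sample without replacement; shrink the batch if the buffer is small.
        batch_size = min(batch_size, len(self._storage))
        return self._rng.sample(list(self._storage), batch_size)

    def __len__(self):
        return len(self._storage)

buffer = ReplayBuffer(capacity=3, seed=0)
for t in range(5):
    buffer.store(state=t, action=0.0, reward=1.0, next_state=t + 1)
batch = buffer.sample(batch_size=2)
```

With a capacity of 3 and five stored transitions, only the three most recent ones remain available for sampling.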
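
Steps 29 to 31 of Algorithm 2 compute a bootstrapped target, a mean-squared critic loss, and an actor loss. The sketch below evaluates those three quantities on a tiny hand-made batch. It assumes the network outputs are already available as arrays, so no gradient step is taken, and the function names and numbers are illustrative rather than taken from the paper.

```python
import numpy as np

def td_targets(rewards, next_q_target, gamma):
    """Step 29: reward plus discounted target-critic value of the next state."""
    return np.asarray(rewards, dtype=float) + gamma * np.asarray(next_q_target, dtype=float)

def critic_loss(targets, q_values):
    """Step 30: mean squared difference between targets and critic outputs."""
    diff = np.asarray(targets, dtype=float) - np.asarray(q_values, dtype=float)
    return float(np.mean(diff ** 2))

def actor_loss(q_of_policy_actions):
    """Step 31: negative mean critic value of the actor's own actions."""
    return float(-np.mean(np.asarray(q_of_policy_actions, dtype=float)))

# Illustrative batch of L = 3 transitions (made-up values).
rewards = [1.0, 0.0, 2.0]
next_q_target = [10.0, 20.0, 0.0]   # target critic at (s', target actor(s'))
q_values = [9.0, 18.0, 2.0]         # critic at the stored (s, a)

y = td_targets(rewards, next_q_target, gamma=0.9)   # [10.0, 18.0, 2.0]
loss_q = critic_loss(y, q_values)                   # (1^2 + 0 + 0) / 3
loss_pi = actor_loss(q_values)                      # -(9 + 18 + 2) / 3
```

Minimizing the negative mean in `actor_loss` is equivalent to maximizing the critic's estimate of the actor's actions.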
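
Step 32 of Algorithm 2 blends the online weights into the target weights with a factor ε. A minimal sketch of that soft update follows, assuming the parameters are held as a dictionary of NumPy arrays. The helper name `soft_update` and the example values are hypothetical.

```python
import numpy as np

def soft_update(target_params, online_params, eps):
    """Return eps * online + (1 - eps) * target for every named parameter."""
    return {
        name: eps * online_params[name] + (1.0 - eps) * target_params[name]
        for name in target_params
    }

online = {"w": np.array([1.0, 2.0]), "b": np.array([0.5])}
target = {"w": np.array([0.0, 0.0]), "b": np.array([0.0])}
target = soft_update(target, online, eps=0.1)
# target["w"] is now [0.1, 0.2] and target["b"] is [0.05]
```

A small ε makes the target networks track the online networks slowly, which is the usual reason for keeping separate target copies.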