Sensors. 2022 Mar 17;22(6):2328. doi: 10.3390/s22062328
Algorithm 2 The single-layer DDPG-based MRA training algorithm.

1:  (Input) λi, μi, νi, ∀i, batch size η, actor learning rate αa, critic learning rate αc, decay rate d, discount factor ζ, and soft update parameter τ;
2:  (Output) Learned actor/critic to decide Pi, θi, fi, ∀i, for (7);
3:  Initialize actor Qa(s;ωa), critic Q(s,a;ωc), action a(0), replay buffer D, and set initial decay rate d(0) = 1;
4:  for episode = 1 to M do
5:      Initialize state s(0) and ρ(0);
6:      for time t = 1 to N do
7:          Normalize state s(t) with (32);
8:          Execute action a(t) in (30), obtain reward r(t) = U(t) with (23), and observe new state s′;
9:          if replay buffer D is not full then
10:             Store transition (s(t), a(t), r(t), s′) in D;
11:         else
12:             Replace the oldest transition in buffer D with (s(t), a(t), r(t), s′);
13:             Set d(t) = d(t−1)·d;
14:             Randomly choose η stored transitions from D;
15:             Update the critic online network by minimizing the loss function in (36);
16:             Update the actor online network with the gradient obtained by (37);
17:             Soft update the target networks with their parameters updated by (29);
18:             s(t) = s′;
19:         end if
20:     end for
21: end for
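The buffer-handling portion of the loop (steps 9–14) can be sketched in plain Python. This is a minimal illustration, not the paper's implementation: the class name `ReplayBuffer`, the method names, and the tuple layout are assumptions; the critic/actor updates of (36)–(37) and the soft update of (29) are omitted and would consume the sampled minibatch.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity transition store, as in steps 9-12 of Algorithm 2."""

    def __init__(self, capacity):
        # A deque with maxlen evicts the oldest transition automatically
        # once the buffer is full (step 12).
        self.capacity = capacity
        self.buffer = deque(maxlen=capacity)

    def is_full(self):
        return len(self.buffer) == self.capacity

    def store(self, transition):
        # transition = (s_t, a_t, r_t, s_next); stored until full,
        # then each append replaces the oldest entry.
        self.buffer.append(transition)

    def sample(self, eta):
        # Uniformly choose eta stored transitions (step 14).
        return random.sample(self.buffer, eta)


def decayed_rate(d_prev, d):
    # Step 13: d(t) = d(t-1) * d, starting from d(0) = 1.
    return d_prev * d
```

Once the buffer is full, each time step would call `sample(η)` and feed the minibatch to the critic and actor updates of (36) and (37) before soft-updating the target networks.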