Sensors. 2022 Mar 17;22(6):2328. doi: 10.3390/s22062328
Algorithm 1 The single-layer DQN-based MRA training algorithm.
1:  (Input) λᵢ, μᵢ, νᵢ, ∀i; batch size η; learning rate α; minimum exploration rate ϵmin; discount factor ζ; and exploration decay rate d;
2:  (Output) Learned DQN to decide Pᵢ, θᵢ, fᵢ, ∀i, for (7);
3:  Initialize action a(0) and replay buffer D = ∅;
4:  for episode = 1 to M do
5:      Initialize state s(0);
6:      for time t = 1 to N do
7:          Observe current state s(t);
8:          ϵ = max(ϵ · d, ϵmin);
9:          if random number r < ϵ then
10:             Select a(t) ∈ Â at random;
11:         else
12:             Select a(t) = argmaxₐ Q*(s(t), a, ω);
13:         end if
14:         Observe next state s′;
15:         Store transition (s(t), a(t), r(t), s′) in D, where the reward r(t) is U(t) obtained with (23);
16:         Randomly select η stored samples (s(j), a(j), r(j), s(j+1)) from D for experience replay;
17:         Obtain Q̂(s(j), a(j), ω) for each sample j with (13);
18:         Perform SGD to minimize the loss in (14) and find the optimal DNN weights ω*;
19:         Update ω = ω* in the DQN;
20:         s(t) = s′;
21:     end for
22: end for
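For concreteness, the following is a minimal PyTorch sketch of this training loop, written under stated assumptions: the state/action dimensions, hyperparameter values, hidden-layer width, and the env_step stub are hypothetical, and a plain MSE temporal-difference loss stands in for the paper's utility (23), Q-value computation (13), and loss (14), which are defined elsewhere in the paper. As in the listing, a single network is updated directly, with no separate target network.

```python
import random
from collections import deque

import torch
import torch.nn as nn

# --- Hypothetical sizes and hyperparameters (the paper's values are not in this excerpt) ---
STATE_DIM, N_ACTIONS = 8, 16            # dimensions of s(t) and of the action set Â
M_EPISODES, N_STEPS = 200, 100          # M episodes of N time steps each
ETA, ALPHA, ZETA = 32, 1e-3, 0.95       # batch size η, learning rate α, discount factor ζ
eps, EPS_MIN, DECAY = 1.0, 0.01, 0.995  # exploration rate ϵ, ϵmin, decay rate d

# Q-network with a single hidden layer, matching the "single-layer" DQN of Algorithm 1.
q_net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))
optimizer = torch.optim.SGD(q_net.parameters(), lr=ALPHA)
replay = deque(maxlen=10_000)           # replay buffer D

def env_step(state, action):
    """Stand-in for the MRA environment: returns the reward U(t) of (23) and s'."""
    return random.random(), torch.randn(STATE_DIM)

s = torch.randn(STATE_DIM)              # steps 3/5: initialize state
for episode in range(M_EPISODES):
    for t in range(N_STEPS):
        eps = max(eps * DECAY, EPS_MIN)                 # step 8: decay ϵ
        if random.random() < eps:                       # steps 9-10: explore
            a = random.randrange(N_ACTIONS)
        else:                                           # step 12: exploit
            with torch.no_grad():
                a = int(q_net(s).argmax())
        r, s_next = env_step(s, a)                      # step 14: observe s'
        replay.append((s, a, r, s_next))                # step 15: store transition in D
        if len(replay) >= ETA:
            batch = random.sample(list(replay), ETA)    # step 16: sample η transitions
            sb = torch.stack([b[0] for b in batch])
            ab = torch.tensor([b[1] for b in batch]).unsqueeze(1)
            rb = torch.tensor([b[2] for b in batch], dtype=torch.float32)
            sn = torch.stack([b[3] for b in batch])
            with torch.no_grad():                       # step 17: TD target for Q̂
                target = rb + ZETA * q_net(sn).max(dim=1).values
            q_sa = q_net(sb).gather(1, ab).squeeze(1)
            loss = nn.functional.mse_loss(q_sa, target) # step 18: loss standing in for (14)
            optimizer.zero_grad()
            loss.backward()                             # steps 18-19: SGD update of ω
            optimizer.step()
        s = s_next                                      # step 20: s(t) = s'
```

Replacing env_step with the actual MRA environment and the MSE objective with the paper's loss (14) would recover the listed procedure.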