Sensors. 2022 Apr 22;22(9):3217. doi: 10.3390/s22093217
Algorithm 1 DDPG-based Optimization Framework for MEC-enabled Blockchain IoT Systems.
1:  for each GN m ∈ Φ_G do
2:      Initialization: replay memory B_m, critic network Q(s, a | θ_m^Q), actor network μ(s | θ_m^μ), and corresponding target networks Q′ and μ′ with weights θ_m^{μ′} ← θ_m^μ and θ_m^{Q′} ← θ_m^Q;
3:  end for
4:  for each episode in {1, 2, …, K_max} do
5:      Initialization: state s_{m,1} for each GN m ∈ Φ_G;
6:      for each decision epoch n = 1, 2, …, T_max do
7:          for each GN m ∈ Φ_G do
8:              Select action a_{m,n} = μ(s_{m,n} | θ_m^μ) + Δμ based on the exploration noise Δμ to decide the block interval and power allocation;
9:              Observe reward r_{m,n} and next state s_{m,n+1};
10:             Store the transition (s_{m,n}, a_{m,n}, r_{m,n}, s_{m,n+1}) into replay memory B_m;
11:             Sample a mini-batch of Z transition tuples {(s_z, a_z, r_z, s′_z)}_{z=1}^{Z} from memory B_m at random;
12:             Update the critic network by minimizing the loss
                    L = (1/Z) Σ_{z=1}^{Z} (Q(s_z, a_z | θ_m^Q) − ε_z)²;
13:             Update the actor policy based on the sampled policy gradient
                    ∇_{θ_m^μ} J ≈ (1/Z) Σ_{z=1}^{Z} ∇_a Q(s_z, a | θ_m^Q)|_{a=μ(s_z)} ∇_{θ_m^μ} μ(s_z | θ_m^μ);
14:             Update the target networks:
                    θ_m^{μ′} ← ζ θ_m^μ + (1 − ζ) θ_m^{μ′},
                    θ_m^{Q′} ← ζ θ_m^Q + (1 − ζ) θ_m^{Q′};
15:         end for
16:     end for
17: end for
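To make the per-GN training loop concrete, the sketch below implements the core DDPG mechanics of Algorithm 1 (replay memory, noisy action selection, critic regression toward the target value ε_z, the deterministic policy gradient, and the soft target update with rate ζ) for a single GN. It is a minimal illustration, not the paper's system model: the actor and critic are reduced to single linear layers, the environment, reward, state/action dimensions, and all hyperparameters are illustrative assumptions.

```python
import random
from collections import deque

import numpy as np

# Assumed toy dimensions: a 3-dim state and a 2-dim action
# (e.g. block interval and power allocation), not the paper's values.
STATE_DIM, ACTION_DIM = 3, 2
GAMMA, ZETA, NOISE_STD, LR = 0.99, 0.01, 0.1, 1e-3

rng = np.random.default_rng(0)

class LinearNet:
    """Stand-in for the actor/critic neural networks: one linear layer."""
    def __init__(self, in_dim, out_dim):
        self.W = rng.normal(scale=0.1, size=(out_dim, in_dim))
    def __call__(self, x):
        return self.W @ x
    def copy_from(self, other, zeta=1.0):
        # Soft update (step 14): theta' <- zeta*theta + (1 - zeta)*theta'
        self.W = zeta * other.W + (1.0 - zeta) * self.W

# Step 2: critic Q, actor mu, matching target networks, replay memory B_m.
actor, critic = LinearNet(STATE_DIM, ACTION_DIM), LinearNet(STATE_DIM + ACTION_DIM, 1)
target_actor, target_critic = LinearNet(STATE_DIM, ACTION_DIM), LinearNet(STATE_DIM + ACTION_DIM, 1)
target_actor.copy_from(actor)       # theta_mu' <- theta_mu
target_critic.copy_from(critic)     # theta_Q'  <- theta_Q
replay = deque(maxlen=10_000)

def select_action(s):
    # Step 8: a = mu(s | theta_mu) + exploration noise.
    return actor(s) + rng.normal(scale=NOISE_STD, size=ACTION_DIM)

def step_env(s, a):
    # Placeholder environment: synthetic reward and next state.
    return -float(np.sum(a ** 2)), rng.normal(size=STATE_DIM)

def train_step(batch_size=32):
    # Step 11: sample a random mini-batch of transitions.
    for s, a, r, s2 in random.sample(replay, batch_size):
        # Target value eps_z = r + gamma * Q'(s', mu'(s')).
        a2 = target_actor(s2)
        eps = r + GAMMA * target_critic(np.concatenate([s2, a2]))[0]
        # Step 12: SGD on the critic loss (Q(s, a) - eps)^2.
        q_in = np.concatenate([s, a])
        err = critic(q_in)[0] - eps
        critic.W -= LR * 2.0 * err * q_in[None, :]
        # Step 13: policy gradient grad_a Q * grad_theta mu; for a linear
        # actor mu(s) = W s this is the outer product of grad_a Q and s.
        grad_a = critic.W[0, STATE_DIM:]
        actor.W += LR * grad_a[:, None] * s[None, :]
    # Step 14: soft-update both target networks.
    target_actor.copy_from(actor, ZETA)
    target_critic.copy_from(critic, ZETA)

# Minimal run of the inner loop (steps 6-15) for one GN.
s = rng.normal(size=STATE_DIM)
for n in range(200):
    a = select_action(s)
    r, s2 = step_env(s, a)
    replay.append((s, a, r, s2))    # step 10
    if len(replay) >= 32:
        train_step()
    s = s2
```

Because the episode/epoch structure and the multi-GN loop are independent of the update rules, the same `train_step` would simply run once per GN per decision epoch in the full algorithm.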