2024 Mar 14;24(6):1863. doi: 10.3390/s24061863
Algorithm 1 DQN algorithm.
  •   1: Initialize replay memory pool $M$
  •   2: Initialize neural network weight $\theta$ and target weight $\theta^{-} = \theta$
  •   3: for episode $i = 1, \dots, M$ do
  •   4:     for $t = 1, \dots, T$ do
  •   5:         Obtain state $s_t$ from the environment
  •   6:         Randomly select an action $a_t$, or determine $a_t = \arg\max_{a} Q(s_t, a; \theta)$
  •   7:         Observe the reward $R_t$ for action $a_t$ and obtain the next state $s_{t+1}$
  •   8:         Store transition $(s_t, a_t, R_t, s_{t+1})$ in memory $M$
  •   9:         Randomly sample a mini-batch $\tilde{M}$ of $M$
  • 10:         Update the evaluation network by a gradient descent step on $\nabla_{\theta_i} LF(\theta_i)$
  • 11:         Update the target network after $C$ steps
  • 12:     end for
  • 13: end for
  • 14: Output: offloading decision strategy $a_t$.
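
The steps of Algorithm 1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the neural network is replaced by a linear Q-function over one-hot states, step 6 is read as an epsilon-greedy rule, the loss LF is assumed to be the squared temporal-difference error, and the environment `toy_env_step` is a hypothetical two-state toy rather than the offloading scenario. All function names and hyperparameters (`epsilon`, `gamma`, `lr`, `capacity`, `batch_size`) are illustrative choices that do not appear in the source.

```python
import random

def q_values(theta, state):
    """Linear stand-in for the Q-network: Q(s, a) = theta[a] . s."""
    return [sum(w * x for w, x in zip(row, state)) for row in theta]

def one_hot(i, n=2):
    return [1.0 if j == i else 0.0 for j in range(n)]

def toy_env_step(state_id, action):
    """Hypothetical environment (an assumption, not from the paper):
    reward 1 if the action index matches the state index, next state random."""
    reward = 1.0 if action == state_id else 0.0
    return reward, random.randrange(2)

def greedy_action(theta, state_id):
    q = q_values(theta, one_hot(state_id))
    return q.index(max(q))

def train_dqn(episodes=100, T=20, C=10, batch_size=8, gamma=0.5,
              lr=0.1, epsilon=0.2, capacity=500, seed=0):
    random.seed(seed)
    memory = []                                    # step 1: replay memory pool
    theta = [[0.0, 0.0], [0.0, 0.0]]               # step 2: evaluation weights
    theta_target = [row[:] for row in theta]       #         target weights (copy)
    step = 0
    for _ in range(episodes):                      # step 3
        s_id = random.randrange(2)
        for _ in range(T):                         # step 4
            s = one_hot(s_id)                      # step 5: observe state
            if random.random() < epsilon:          # step 6: explore ...
                a = random.randrange(2)
            else:                                  # ... or act greedily
                a = greedy_action(theta, s_id)
            r, next_id = toy_env_step(s_id, a)     # step 7: reward, next state
            memory.append((s, a, r, one_hot(next_id)))   # step 8: store
            if len(memory) > capacity:
                memory.pop(0)                      # drop the oldest transition
            batch = random.sample(memory, min(batch_size, len(memory)))  # step 9
            for bs, ba, br, bs2 in batch:          # step 10: gradient step on
                target = br + gamma * max(q_values(theta_target, bs2))
                td_error = target - q_values(theta, bs)[ba]   # squared TD error
                for j, x in enumerate(bs):
                    theta[ba][j] += lr * td_error * x
            step += 1
            if step % C == 0:                      # step 11: sync target weights
                theta_target = [row[:] for row in theta]
            s_id = next_id
    return theta                                   # step 14: greedy policy from theta

if __name__ == "__main__":
    weights = train_dqn()
    print([greedy_action(weights, s) for s in range(2)])
```

The target weights are only copied from the evaluation weights every `C` steps, so the bootstrapped target in step 10 stays fixed between synchronizations. The learned strategy is read out with `greedy_action`, which corresponds to the output in step 14.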