2024 Mar 14;24(6):1863. doi: 10.3390/s24061863
Algorithm 2. Duelling DQN-based algorithm for the POSG.

Input: replay memory pool $M$

 1: Initialize the neural network weights $\theta_q$ and the target network weights $\theta_q^-$;
 2: for episode $ep = 1, 2, \ldots$ do
 3:     Obtain the initial state $s_q(0)$ from the environment;
 4:     for step $t = 1, \ldots, T$ do
 5:         for each PN $q$ do
 6:             Obtain the newly assigned tasks $J_q(t)$;
 7:             Obtain the previous computing capacity allocation state $\Gamma_q(t)$;
 8:             Obtain the current local observation $s_q(t) = \{J_q, D_q, \Gamma_q\}$;
 9:             Select action $a_q$ with $a_q = \epsilon\text{-greedy}(Q_q(s_q, a_q; \theta_q))$;
10:             Execute the computing capacity allocation and obtain the local utility $u_q(s_q, a_q)$;
11:             Obtain the utilities of the other PNs, $u_{q'}(s_{q'}, a_{q'})$, $q' \neq q$, and the subsequent local observation $o_q$;
12:             Store the transition tuple $\langle s_q, a_q, u(s, a), s'_q \rangle$ in $M$;
13:             Randomly sample a mini-batch $M_n$ from $M$;
14:             Update the network weights $\theta_q$ by performing gradient descent;
15:             Update the target network parameters $\theta_q^- = \theta_q$ after $C$ steps;
16:         end for
17:     end for
18: end for
19: Output: computing capacity allocation strategy $a = \{a_1, \ldots, a_q, a_{Q+1}\}$.
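
The loop in Algorithm 2 can be illustrated with a minimal Python sketch for a single PN. This is not the authors' implementation, and several parts are assumptions made only for illustration: a linear duelling Q-function (`LinearDuelingQ`) stands in for the neural network, `toy_step` is a hypothetical environment with a made-up utility, and all hyperparameters (`gamma`, `lr`, `epsilon`, `C`, batch size) are arbitrary. The exchange of utilities between PNs in step 11 is not modelled. The sketch shows the duelling decomposition $Q(s,a) = V(s) + A(s,a) - \frac{1}{|A|}\sum_b A(s,b)$, ϵ-greedy action selection, replay sampling, a gradient-descent update towards the target-network estimate, and the target synchronization every `C` steps.

```python
import copy
import random
from collections import deque


class LinearDuelingQ:
    """Toy duelling Q-function: Q(s, a) = V(s) + A(s, a) - mean_b A(s, b)."""

    def __init__(self, n_features, n_actions):
        self.w_v = [0.0] * n_features                              # value stream
        self.w_a = [[0.0] * n_features for _ in range(n_actions)]  # advantage stream

    def q_values(self, s):
        v = sum(w * x for w, x in zip(self.w_v, s))
        adv = [sum(w * x for w, x in zip(row, s)) for row in self.w_a]
        mean_adv = sum(adv) / len(adv)
        return [v + a - mean_adv for a in adv]

    def sgd_step(self, s, a, target, lr):
        """One gradient-descent step on 0.5 * (Q(s, a) - target)^2."""
        err = self.q_values(s)[a] - target
        n = len(self.w_a)
        for i, x in enumerate(s):
            self.w_v[i] -= lr * err * x
            for b in range(n):
                indicator = 1.0 if b == a else 0.0
                self.w_a[b][i] -= lr * err * (indicator - 1.0 / n) * x


def epsilon_greedy(q, epsilon, rng):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q))
    return max(range(len(q)), key=q.__getitem__)


def toy_step(s, a, rng):
    """Hypothetical environment: utility is highest when action a equals s[0]."""
    utility = -abs(a - s[0])
    next_s = [float(rng.randrange(3)), s[1], float(a)]  # stand-in for {J, D, Gamma}
    return utility, next_s


def train(episodes=20, T=25, C=10, gamma=0.9, lr=0.005, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    online = LinearDuelingQ(n_features=3, n_actions=3)   # weights theta
    target = copy.deepcopy(online)                       # target weights theta^-
    memory = deque(maxlen=500)                           # replay memory pool M
    steps = 0
    for _ in range(episodes):
        s = [float(rng.randrange(3)), 1.0, 0.0]          # initial observation
        for _ in range(T):
            a = epsilon_greedy(online.q_values(s), epsilon, rng)
            u, s_next = toy_step(s, a, rng)
            memory.append((s, a, u, s_next))             # store transition
            batch = rng.sample(list(memory), min(8, len(memory)))
            for bs, ba, bu, bn in batch:
                y = bu + gamma * max(target.q_values(bn))  # target-network estimate
                online.sgd_step(bs, ba, y, lr)
            steps += 1
            if steps % C == 0:                           # sync target every C steps
                target = copy.deepcopy(online)
            s = s_next
    return online, target
```

Calling `train()` returns the online and target models. With the default settings the total number of steps (500) is a multiple of `C`, so the two models hold identical weights on return. Subtracting the mean advantage keeps the value and advantage streams identifiable, since without it a constant could be shifted freely between them.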