2025 Feb 18;25(4):1232. doi: 10.3390/s25041232
Algorithm 1 GDRL-SFC training algorithm
Input: GNN, policy (actor) network, and critic network
Output: Trained network parameters
 1: Initialize the actor network and the critic network.
 2: for step = 1, 2, …, max_step do
 3:     Initialize the environment and the memory buffer.
 4:     while not Done do
 5:         Extract node embeddings using the GNN.
 6:         Obtain the valid state s_t using the MLP.
 7:         Sample an action a_t from the actor network.
 8:         Calculate the reward r_t and receive the next state s_{t+1}.
 9:         Store the current experience in the memory buffer.
10:         Update s_t ← s_{t+1}.
11:     end while
12:     Sample a batch from the memory buffer.
13:     Compute the loss L and optimize the parameters.
14:     Update the network parameters.
15:     if Done then
16:         break
17:     end if
18:     if t mod 5 = 0 then
19:         Test the policy.
20:     end if
21: end for
    return
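
The control flow of Algorithm 1 can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' implementation: `ToyPlacementEnv`, `embed_nodes`, `action_probs`, `evaluate`, `train`, and every hyperparameter value are hypothetical stand-ins chosen for illustration. The GNN is replaced by one round of mean neighbour aggregation, the actor and critic are small tables rather than neural networks, the MLP state step is folded into the action computation, and the loss is assumed to be a one-step advantage actor-critic loss, which the listing does not specify. The sketch also leaves out the `if Done then break` check (lines 15 to 17), since every episode here already runs until Done.

```python
import math
import random

class ToyPlacementEnv:
    """Hypothetical stand-in: place chain_len VNFs on capacity-limited nodes."""
    def __init__(self, capacities, adjacency, chain_len):
        self.init_caps, self.adjacency, self.chain_len = list(capacities), adjacency, chain_len

    def reset(self):
        self.caps, self.placed = list(self.init_caps), 0
        return self.placed                      # state = index of the next VNF

    def step(self, action):
        ok = self.caps[action] > 0              # placement succeeds if capacity is left
        if ok:
            self.caps[action] -= 1
        self.placed += 1
        return self.placed, (1.0 if ok else -1.0), self.placed >= self.chain_len

def embed_nodes(caps, adjacency):
    """Stand-in for the GNN: one round of mean aggregation over neighbours."""
    return [sum([caps[i]] + [caps[j] for j in nbrs]) / (1 + len(nbrs))
            for i, nbrs in enumerate(adjacency)]

def action_probs(theta_row, emb):
    """Softmax over learned preferences, with the node embedding as a fixed bias."""
    logits = [t + e for t, e in zip(theta_row, emb)]
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [x / z for x in exps]

def evaluate(env, theta):
    """Greedy rollout used for the periodic policy test."""
    state, done, total = env.reset(), False, 0.0
    while not done:
        probs = action_probs(theta[state], embed_nodes(env.caps, env.adjacency))
        state, reward, done = env.step(probs.index(max(probs)))
        total += reward
    return total

def train(env, max_step=50, batch_size=4, gamma=0.9, lr_pi=0.5, lr_v=0.2, seed=0):
    rng = random.Random(seed)
    n = len(env.init_caps)
    theta = [[0.0] * n for _ in range(env.chain_len)]   # actor table   (Alg. 1, line 1)
    value = [0.0] * (env.chain_len + 1)                 # critic table  (Alg. 1, line 1)
    history = []
    for step in range(1, max_step + 1):                 # Alg. 1, line 2
        state, buffer, done = env.reset(), [], False    # line 3
        while not done:                                 # line 4
            emb = embed_nodes(env.caps, env.adjacency)  # line 5
            probs = action_probs(theta[state], emb)     # lines 6-7 (no separate MLP here)
            action = rng.choices(range(n), weights=probs)[0]
            nxt, reward, done = env.step(action)        # line 8
            buffer.append((state, action, reward, nxt, done, probs))  # line 9
            state = nxt                                 # line 10
        batch = rng.sample(buffer, min(batch_size, len(buffer)))      # line 12
        for s, a, r, s2, d, p in batch:                 # lines 13-14
            adv = r + (0.0 if d else gamma * value[s2]) - value[s]    # TD advantage
            value[s] += lr_v * adv                      # critic update
            for k in range(n):                          # policy-gradient actor update
                theta[s][k] += lr_pi * adv * ((1.0 if k == a else 0.0) - p[k])
        if step % 5 == 0:                               # lines 18-20
            history.append(evaluate(env, theta))
    return theta, value, history

env = ToyPlacementEnv(capacities=[2, 0, 1], adjacency=[[1], [0, 2], [1]], chain_len=3)
theta, value, history = train(env)
```

Calling `train(env)` returns the actor table, the critic table, and the list of greedy evaluation returns collected every fifth episode. A faithful implementation would replace the tables with neural networks and the aggregation stub with a trained GNN.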