. 2025 Feb 18;25(4):1232. doi: 10.3390/s25041232

Algorithm 1 GDRL-SFC training algorithm

Input: GNN, policy (actor) network, and critic network
Output: Trained network parameters

1: Initialize the actor network and the critic network.
2: for $\mathit{step} = 1, 2, \dots, \mathit{max\_step}$ do
3:     Initialize the environment and the memory buffer;
4:     while not Done do
5:         Extract node embeddings using the GNN;
6:         Get the valid state $s_{t}$ using the MLP;
7:         The actor network samples action $a_{t}$;
8:         Calculate the reward $r_{t}$ and receive the next state $s_{t+1}$;
9:         Store the current experience in the memory buffer;
10:        Update $s_{t} = s_{t+1}$;
11:    end while
12:    Sample a batch from the memory buffer;
13:    Compute the loss $L$ and optimize the parameters;
14:    Update the network parameters;
15:    if Done then
16:        Break;
17:    end if
18:    if $t \bmod 5 == 0$ then
19:        Test the policy;
20:    end if
21: end for
return;
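
The control flow of Algorithm 1 can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the four-node ring topology, the capacities, the reward rule, the scalar actor and critic weights, and the one-step TD actor-critic update are all hypothetical stand-ins, since the listing does not specify the loss $L$ or the network architectures. A single round of mean-neighbor aggregation stands in for the GNN, the MLP of step 6 is replaced by the identity, and the periodic test is keyed to the outer step counter. The outer Done check in steps 15-17 is omitted because the listing does not say which condition it refers to.

```python
import math
import random
from collections import namedtuple

# Hypothetical toy substrate: 4 nodes in a ring, each with a CPU capacity.
ADJ = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
CAPACITY = [3.0, 1.0, 2.0, 1.0]
SFC_LEN = 3            # VNFs to place per episode (assumption)
GAMMA, LR = 0.9, 0.05  # discount factor and learning rate (assumptions)

Transition = namedtuple("Transition", "state action probs reward next_state done")


def embed(cpu):
    """One round of mean-neighbor aggregation, a stand-in for the GNN."""
    return [0.5 * cpu[n] + 0.5 * sum(cpu[m] for m in ADJ[n]) / len(ADJ[n])
            for n in ADJ]


def softmax(xs):
    mx = max(xs)
    exps = [math.exp(x - mx) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def run_episode(w_actor, rng, greedy=False):
    """Steps 3-11: roll out one episode and return its memory buffer."""
    cpu, memory = list(CAPACITY), []           # step 3
    state = embed(cpu)                         # steps 5-6 (MLP omitted)
    for i in range(SFC_LEN):                   # "while not Done"
        probs = softmax([w_actor * h for h in state])
        if greedy:
            action = max(range(len(probs)), key=probs.__getitem__)
        else:
            action = rng.choices(range(len(probs)), weights=probs)[0]  # step 7
        # Step 8: +1 if the chosen node can host the VNF, otherwise -1.
        reward = 1.0 if cpu[action] >= 1.0 else -1.0
        if reward > 0:
            cpu[action] -= 1.0
        next_state = embed(cpu)
        done = i == SFC_LEN - 1
        memory.append(Transition(state, action, probs, reward, next_state, done))  # step 9
        state = next_state                     # step 10
    return memory


def train(max_step=200, batch_size=2, seed=0):
    rng = random.Random(seed)
    w_actor, w_critic = 0.0, 0.0               # step 1
    history = []
    for step in range(1, max_step + 1):        # step 2
        memory = run_episode(w_actor, rng)
        batch = rng.sample(memory, min(batch_size, len(memory)))  # step 12
        for tr in batch:                       # steps 13-14
            feat = sum(tr.state) / len(tr.state)
            v = w_critic * feat
            v_next = 0.0 if tr.done else w_critic * sum(tr.next_state) / len(tr.next_state)
            delta = tr.reward + GAMMA * v_next - v       # one-step TD error
            w_critic += LR * delta * feat                # critic update
            # Gradient of log softmax policy w.r.t. the scalar actor weight.
            grad_log_pi = tr.state[tr.action] - sum(
                p * h for p, h in zip(tr.probs, tr.state))
            w_actor += LR * delta * grad_log_pi          # actor update
        if step % 5 == 0:                      # steps 18-20
            test = run_episode(w_actor, rng, greedy=True)
            history.append(sum(tr.reward for tr in test))
    return w_actor, w_critic, history
```

Calling `train(max_step=20)` returns the two learned weights together with the greedy test returns collected every fifth step. In a fuller implementation the scalar weights would be replaced by the GNN, MLP, actor, and critic networks named in the listing.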