Algorithm 1 GDRL-SFC training algorithm |
Input: GNN, policy (actor) network, and critic network
Output: Trained network parameters
1: Initialize actor network and critic network.
2: for each episode do
3:   Initialize env and memory buffer;
4:   while not Done do
5:     Extract node embeddings using GNN;
6:     Get the valid state using ;
7:     The actor network samples an action;
8:     Calculate the reward and receive the next state;
9:     Store the current experience in the memory buffer;
10:     Update the current state to the next state;
11:   end while
12:   Sample a batch from the memory buffer;
13:   Compute the loss and optimize the parameters;
14:   Update network parameters;
15:   if Done then
16:     Break;
17:   end if
18:   if then
19:     Test the policy;
20:   end if
21: end for
return;
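The training loop can be sketched in plain Python using only the standard library. This is a minimal illustration under stated assumptions, not the authors' implementation. Everything below is hypothetical: the toy substrate graph, node capacities, and SFC demands; the helper names `ToyEnv`, `gnn_embed`, `policy`, `value`, `run_greedy`, and `train`; the frozen random GNN weights; reading "valid state" (step 6) as a mask of nodes that can host the next VNF; and the one-step actor-critic (TD-error) update that stands in for the unspecified loss in steps 12 to 14. The test condition of step 18 is likewise assumed to be "every `test_every` episodes", and the break in steps 15 to 17 is not modeled.

```python
import math
import random

# Hypothetical toy setting: substrate graph (adjacency lists), per-node CPU
# capacity, and the CPU demand of each VNF in a single SFC.
GRAPH = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1, 3], 3: [1, 2]}
CAPACITY = [2.0, 2.0, 1.0, 1.0]
SFC_DEMANDS = [2.0, 1.0, 2.0]
DIM, HID = 3, 4  # input feature size, embedding size

class ToyEnv:
    """Places the VNFs of one SFC onto substrate nodes, one VNF per step."""

    def reset(self):
        self.cap, self.k = CAPACITY[:], 0  # k = index of the next VNF to place
        return self.features()

    def features(self):
        demand = SFC_DEMANDS[self.k] if self.k < len(SFC_DEMANDS) else 0.0
        return [[c / 2.0, len(GRAPH[n]) / 3.0, demand / 2.0] for n, c in enumerate(self.cap)]

    def valid_mask(self):
        # Assumed meaning of "valid state": nodes with enough capacity left.
        return [c >= SFC_DEMANDS[self.k] for c in self.cap]

    def step(self, node):
        self.cap[node] -= SFC_DEMANDS[self.k]
        self.k += 1
        done = self.k == len(SFC_DEMANDS)
        if not done and not any(self.valid_mask()):
            return self.features(), -1.0, True  # dead end: the chain cannot be completed
        return self.features(), 1.0, done

def gnn_embed(x, W_self, W_nbr):
    """One mean-aggregation message-passing layer with ReLU."""
    H = []
    for n, nbrs in GRAPH.items():
        agg = [sum(x[m][d] for m in nbrs) / len(nbrs) for d in range(DIM)]
        H.append([max(0.0, sum(W_self[o][d] * x[n][d] + W_nbr[o][d] * agg[d]
                               for d in range(DIM))) for o in range(HID)])
    return H

def policy(H, mask, w):
    """Softmax over linear node scores; masked (invalid) nodes get probability 0."""
    scores = [sum(a * b for a, b in zip(w, h)) if ok else -math.inf for h, ok in zip(H, mask)]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # math.exp(-inf) == 0.0
    z = sum(exps)
    return [e / z for e in exps]

def value(H, v):
    """Linear critic on the mean-pooled embedding; also returns the pooled vector."""
    g = [sum(h[d] for h in H) / len(H) for d in range(HID)]
    return sum(a * b for a, b in zip(v, g)), g

def run_greedy(w, W_self, W_nbr):
    """Greedy rollout (argmax action) used as the 'test the policy' step."""
    env, total, done = ToyEnv(), 0.0, False
    x = env.reset()
    while not done:
        pi = policy(gnn_embed(x, W_self, W_nbr), env.valid_mask(), w)
        x, r, done = env.step(max(range(len(pi)), key=pi.__getitem__))
        total += r
    return total

def train(episodes=200, batch_size=8, gamma=0.9, lr=0.01, test_every=50, seed=0):
    rng = random.Random(seed)
    # Assumption: GNN weights are random and frozen to keep the sketch short.
    W_self = [[rng.uniform(-0.5, 0.5) for _ in range(DIM)] for _ in range(HID)]
    W_nbr = [[rng.uniform(-0.5, 0.5) for _ in range(DIM)] for _ in range(HID)]
    w, v = [0.0] * HID, [0.0] * HID  # step 1: actor and critic parameters
    test_returns = []
    for ep in range(1, episodes + 1):  # step 2
        env, memory = ToyEnv(), []  # step 3
        x, done = env.reset(), False
        while not done:  # step 4
            H = gnn_embed(x, W_self, W_nbr)  # step 5: node embeddings
            mask = env.valid_mask()  # step 6: valid nodes for the next VNF
            a = rng.choices(range(len(H)), weights=policy(H, mask, w))[0]  # step 7
            x_next, r, done = env.step(a)  # step 8
            memory.append((H, mask, a, r, gnn_embed(x_next, W_self, W_nbr), done))  # step 9
            x = x_next  # step 10
        # Steps 12-14; assumption: a one-step actor-critic (TD-error) update.
        for H, mask, a, r, H_next, d in rng.sample(memory, min(batch_size, len(memory))):
            V, g = value(H, v)
            delta = r + (0.0 if d else gamma * value(H_next, v)[0]) - V  # TD error
            pi = policy(H, mask, w)
            grad_logp = [H[a][k] - sum(p * h[k] for p, h in zip(pi, H)) for k in range(HID)]
            w = [wk + lr * delta * gk for wk, gk in zip(w, grad_logp)]  # ascend delta * log pi
            v = [vk + lr * delta * gk for vk, gk in zip(v, g)]  # descend 0.5 * delta**2
        # Steps 15-17 are not modeled; assumption: test every `test_every` episodes.
        if ep % test_every == 0:
            test_returns.append(run_greedy(w, W_self, W_nbr))
    return w, v, test_returns
```

Calling `train()` returns the actor weights, the critic weights, and one greedy test return per `test_every` episodes. In this toy setting a test return is 3.0 when all three VNFs are placed and 0.0 when a placement leaves no node able to host the next VNF. Setting masked nodes to -inf before the softmax guarantees that the action sampled in step 7 is always feasible.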