Algorithm 2. Distributed Routing Decision Algorithm Based on GraphSAGE-MAPPO.

Input: the service flow arriving at the agent.
Output: routing and resource allocation actions.

1: Initialize the network G, the Actor network parameters, the Critic network parameters, and the experience replay buffer D.
2: while the termination condition is not met do
3:   for step = 1 to T do
4:     if the current node is not the destination of the traffic flow then
5:       Obtain the initial feature vector of the node.
6:       Compute the hidden feature vector of the node using Algorithm 1, and form the state space by incorporating the traffic flow.
7:       Generate the next-hop probability distribution according to the policy function, select the optimal next hop, allocate resources, and compute the reward using Equation (26).
8:       Obtain the next state and store the experience in the experience replay buffer D.
9:       Sample experiences from the replay buffer and compute the advantage function according to Equation (30).
10:      Compute the loss function of the Actor network and update the Actor network parameters.
11:      Compute the loss function of the Critic network and update the Critic network parameters.
12:    end if
13:  end for
14: end while
15: Save the model.
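The per-hop part of Algorithm 2 (steps 4 to 8) and the Actor objective of step 10 can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration rather than the paper's implementation: `hidden_vector` is a single mean-aggregation step standing in for Algorithm 1, the linear scoring with `weights` is a hypothetical Actor, the reward of -1 per hop is a placeholder for Equation (26), and `clipped_surrogate` is the standard PPO clipped objective that MAPPO-style methods commonly use, not necessarily the exact loss of this paper. Resource allocation, the traffic-flow component of the state, and the Critic update are omitted.

```python
import math


def hidden_vector(features, neighbors, node):
    """GraphSAGE-style mean aggregation (assumed stand-in for Algorithm 1):
    concatenate the node's own features with the mean of its neighbors'."""
    own = features[node]
    nbr_feats = [features[n] for n in neighbors[node]]
    if nbr_feats:
        mean = [sum(col) / len(nbr_feats) for col in zip(*nbr_feats)]
    else:
        mean = [0.0] * len(own)
    return own + mean


def next_hop_distribution(scores):
    """Numerically stable softmax over candidate next hops (step 7)."""
    top = max(scores.values())
    exps = {n: math.exp(s - top) for n, s in scores.items()}
    total = sum(exps.values())
    return {n: e / total for n, e in exps.items()}


def clipped_surrogate(ratio, advantage, eps=0.2):
    """Standard PPO clipped objective for one sample (assumed Actor loss term)."""
    clipped = max(1.0 - eps, min(1.0 + eps, ratio))
    return min(ratio * advantage, clipped * advantage)


def route_flow(source, destination, features, neighbors, weights, max_steps):
    """Forward a flow hop by hop and collect experiences (steps 3 to 8)."""
    buffer, path, node = [], [source], source
    for _ in range(max_steps):
        if node == destination:  # step 4: stop once the flow has arrived
            break
        state = hidden_vector(features, neighbors, node)  # steps 5 and 6
        # Hypothetical linear Actor: score each neighbor by its hidden vector.
        scores = {
            n: sum(w * h for w, h in zip(weights, hidden_vector(features, neighbors, n)))
            for n in neighbors[node]
        }
        probs = next_hop_distribution(scores)
        action = max(probs, key=probs.get)  # greedy choice of the next hop
        reward = -1.0  # placeholder; the paper uses Equation (26)
        next_state = hidden_vector(features, neighbors, action)
        buffer.append((state, action, reward, next_state))  # step 8
        node = action
        path.append(node)
    return path, buffer


# Toy four-node topology with one scalar feature per node (illustrative values).
neighbors = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
features = {0: [0.0], 1: [1.0], 2: [0.5], 3: [2.0]}
path, buffer = route_flow(0, 3, features, neighbors, weights=[1.0, 0.5], max_steps=10)
print(path, len(buffer))
```

The greedy `max` mirrors the "select the optimal next hop" wording of step 7. During training one would more typically sample from the distribution so that the policy explores, and the experiences in `buffer` would then feed the advantage computation and the Actor and Critic updates of steps 9 to 11.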