GNN-DRL-Based Intelligent Routing and Resource Allocation Algorithms for Multi-Layer Wireless Mesh Network

2026 Feb 11;26(4):1170. doi: 10.3390/s26041170

Algorithm 2. Distributed Routing Decision Algorithm Based on GraphSAGE-MAPPO.

Input: the service flow arriving at agent $i$, $f_{ij} = \{\mathrm{type}_{f_{ij}}, \mathrm{data}_{f_{ij}}, C_{f_{ij}}^{\mathrm{demand}}, \mathrm{req}_{f_{ij}}\}$.

Output: $(a_{\mathrm{route}}, a_{\mathrm{resource}})$, the routing and resource allocation actions.

1: Initialize the network $G$, Actor network parameters $θ$, Critic network parameters $ω$, and experience replay buffer $D$.
2: while $\mathrm{episode} < N_{\mathrm{episodes}}$ do
3: for $\mathrm{step} = 1$ to $T$ do
4: if the current node is not the destination of the traffic flow then
5: Obtain the initial feature vector of node $n_i$, $x_{n_i} = \{\mathrm{pos}_{n_i}, B_{n_i}, C_{n_i}, S_{n_i}\}$.
6: Compute the hidden feature vector $h_i$ of the node using Algorithm 1, and form the state $s_i$ by combining $h_i$ with the traffic flow $f_{ij}$.
7: Generate the next-hop probability distribution according to the policy function $π_{θ_i}^{i}(a_i, s_i)$, select the optimal next hop, allocate resources, and compute the reward $R_i$ using Equation (26).
8: Obtain the next state $s'$, and store $(s, a, r, s')$ in the experience replay buffer $D$.
9: Sample experiences from the replay buffer and compute the advantage function $A_i^t$ according to Equation (30).
10: Compute the loss function of the Actor network and update $θ$.
11: Compute the loss function of the Critic network and update $ω$.
12: end if
13: end for
14: end while
15: Save the model.
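The control flow of Algorithm 2 (steps 1 to 4, 8, and 12 to 15) can be sketched as a loop over episodes and time steps that appends every transition to the replay buffer $D$. This is a minimal illustration, not the authors' implementation: the toy environment `LineEnv`, its constant per-hop reward of $-1$ (a placeholder for Equation (26)), the buffer capacity, and the `policy` and `update` callables are all hypothetical stand-ins for the GraphSAGE-MAPPO components.

```python
from collections import deque


class LineEnv:
    # Hypothetical toy environment: a flow hops along nodes 0..n-1 toward the
    # destination n-1. The reward of -1 per hop is a placeholder for Eq. (26).
    def __init__(self, n_nodes=4):
        self.n_nodes = n_nodes
        self.node = 0

    def reset(self):
        self.node = 0
        return self.node

    def at_destination(self):
        return self.node == self.n_nodes - 1

    def step(self, next_hop):
        self.node = next_hop
        return self.node, -1.0


def train(env, policy, update, n_episodes, T, capacity=10_000):
    D = deque(maxlen=capacity)          # step 1: replay buffer D (capacity assumed)
    episode = 0
    while episode < n_episodes:         # step 2
        s = env.reset()
        for _ in range(T):              # step 3
            if env.at_destination():    # steps 4/12: nothing to route once arrived
                continue
            a = policy(s)               # steps 5-7: state building and policy
            s_next, r = env.step(a)
            D.append((s, a, r, s_next)) # step 8: store (s, a, r, s')
            update(D)                   # steps 9-11: advantage, Actor/Critic updates
            s = s_next
        episode += 1
    return D                            # step 15 would save the trained model here


updates = []
buffer = train(LineEnv(4), policy=lambda s: s + 1,
               update=lambda buf: updates.append(len(buf)),
               n_episodes=2, T=10)
```

With a four-node line and an always-forward policy, each episode contributes three transitions, so after two episodes the buffer holds six entries and `update` has been called six times.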
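Steps 5 to 7 of Algorithm 2 (node features, hidden embedding, state, next-hop distribution) can be illustrated with a small sketch. It rests on several assumptions. The GraphSAGE layer of Algorithm 1 is replaced by a single weightless mean aggregation that concatenates a node's own features with the mean of its neighbors' features. The Actor network is replaced by a hypothetical linear score per candidate next hop. The flow $f_{ij}$ is encoded only by its numeric fields. The function names are illustrative only.

```python
import math
import random


def graphsage_mean(x_self, x_neighbors):
    # Hypothetical stand-in for Algorithm 1: [x_self || mean(x_neighbors)],
    # i.e. a mean aggregation without the learned weight matrix or nonlinearity.
    mean = [sum(col) / len(x_neighbors) for col in zip(*x_neighbors)]
    return x_self + mean


def build_state(h_i, flow):
    # Step 6: s_i = [h_i || flow attributes]; encoding f_ij as (data, C_demand, req)
    # is an assumption (the categorical flow type is omitted here).
    return h_i + [flow["data"], flow["c_demand"], flow["req"]]


def next_hop_distribution(s_i, hop_weights):
    # Step 7: softmax over candidate next hops, each scored by a hypothetical
    # linear Actor w_k . s_i.
    scores = {k: sum(w * s for w, s in zip(w_k, s_i)) for k, w_k in hop_weights.items()}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {k: math.exp(v - top) for k, v in scores.items()}
    z = sum(exps.values())
    return {k: e / z for k, e in exps.items()}


def select_next_hop(probs, greedy=True, rng=random):
    # Greedy choice matches "select the optimal next hop"; sampling is the
    # usual alternative while training a stochastic policy.
    if greedy:
        return max(probs, key=probs.get)
    hops, weights = zip(*probs.items())
    return rng.choices(hops, weights=weights, k=1)[0]


# Toy node features (pos, B, C, S), flattened to four numbers.
x_i = [0.0, 1.0, 0.5, 0.2]
neighbors = [[1.0, 0.8, 0.4, 0.1], [0.0, 0.2, 0.9, 0.3]]
h_i = graphsage_mean(x_i, neighbors)
s_i = build_state(h_i, {"data": 2.0, "c_demand": 1.0, "req": 0.5})
probs = next_hop_distribution(s_i, {"n2": [0.1] * len(s_i), "n3": [0.0] * len(s_i)})
hop = select_next_hop(probs)
```

Here the candidate `n2` receives a positive score and `n3` a score of zero, so the greedy rule selects `n2`. Resource allocation and the reward of Equation (26) are outside this sketch.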
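Steps 9 to 11 rely on Equation (30) and on the Actor and Critic losses, which are not reproduced in this section. The sketch below assumes choices that are common in PPO-based methods such as MAPPO: generalized advantage estimation (GAE) for $A_i^t$, the clipped surrogate objective for the Actor, and a mean-squared-error loss for the Critic. All three forms, the parameter names `gamma`, `lam`, and `eps`, and their defaults are assumptions, written with plain lists for readability.

```python
def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    # Assumed form of Eq. (30): GAE, A_t = sum_l (gamma*lam)^l * delta_{t+l},
    # with delta_t = r_t + gamma * V(s_{t+1}) - V(s_t).
    # `values` holds len(rewards) + 1 entries; the last bootstraps V(s_T).
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


def actor_loss(ratios, advantages, eps=0.2):
    # Clipped surrogate, negated so that minimizing the loss improves the policy;
    # ratios[t] = pi_theta(a_t|s_t) / pi_theta_old(a_t|s_t).
    terms = [
        min(r * a, max(min(r, 1.0 + eps), 1.0 - eps) * a)  # clip r to [1-eps, 1+eps]
        for r, a in zip(ratios, advantages)
    ]
    return -sum(terms) / len(terms)


def critic_loss(predicted_values, returns):
    # Mean squared error between V_omega(s_t) and the return targets.
    return sum((v - g) ** 2 for v, g in zip(predicted_values, returns)) / len(returns)


rewards = [1.0, 0.0]
values = [0.5, 0.5, 0.0]  # V(s_0), V(s_1), bootstrap V(s_2)
adv = gae_advantages(rewards, values, gamma=1.0, lam=1.0)
returns = [a + v for a, v in zip(adv, values[:-1])]
l_actor = actor_loss([1.5, 0.5], adv)
l_critic = critic_loss(values[:-1], returns)
```

With $\gamma = \lambda = 1$ the GAE advantages reduce to return minus value, which keeps the toy numbers easy to check by hand: $A = [0.5, -0.5]$, returns $[1.0, 0.0]$, and a Critic loss of $0.25$. Clipping caps the first ratio at $1.2$ and lowers the second to $0.8$, giving an Actor loss of about $-0.1$.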