GAPO: A Graph Attention-Based Reinforcement Learning Algorithm for Congestion-Aware Task Offloading in Multi-Hop Vehicular Edge Computing

2025 Aug 6;25(15):4838. doi: 10.3390/s25154838

Algorithm 2: Actor Network Forward Pass for Joint Action Generation

Input: GNN node embeddings $H$, vehicle embedding $h_{veh}$, task features $X_{task}$, global stats $g_{stats}$

Output: Joint action $(K_i, \lambda_i, \alpha_i)$, log-probability $\log \pi(K_i, \lambda_i, \alpha_i \mid S_t)$

1: Phase A: Select Offloading Target via Attention

2: $q_{att} \leftarrow \mathrm{MLP}_{query}([h_{veh} \oplus X_{task} \oplus g_{stats}])$

3: $C_{target} \leftarrow [h_{veh}, h_{rsu_1}, \dots, h_{rsu_M}]$

4: for each candidate $c_j$ in $C_{target}$ do

5: $e_j \leftarrow (q_{att} \cdot c_j) / \sqrt{D_{emb}}$

6: end for

7: Mask scores for invalid/unavailable targets

8: $P_{target} \leftarrow \mathrm{Softmax}([e_0, e_1, \dots, e_M])$

9: Sample target index $K_i \sim \mathrm{Categorical}(P_{target})$

10: $h_{target} \leftarrow C_{target}[K_i]$

11: Phase B: Determine Continuous Ratios

12: if $K_i = 0$ (the target is the vehicle itself) then

13: $\lambda_i \leftarrow 0$, $\alpha_i \leftarrow 1$

14: else

15: $a_{\lambda}, b_{\lambda} \leftarrow \mathrm{Softplus}(\mathrm{MLP}_{\lambda}([h_{veh} \oplus h_{target} \oplus X_{task} \oplus g_{stats}])) + 1$

16: Sample offload ratio $\lambda_i \sim \mathrm{Beta}(a_{\lambda}, b_{\lambda})$

17: $a_{\alpha}, b_{\alpha} \leftarrow \mathrm{Softplus}(\mathrm{MLP}_{\alpha}([h_{veh} \oplus h_{target} \oplus X_{task} \oplus g_{stats} \oplus \lambda_i])) + 1$

18: Sample resource ratio $\alpha_i \sim \mathrm{Beta}(a_{\alpha}, b_{\alpha})$

19: end if

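20: Compute the total log-probability $\log \pi(K_i, \lambda_i, \alpha_i \mid S_t)$ by summing the log-probabilities of the sampled components under their categorical and Beta distributions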

21: return $(K_i, \lambda_i, \alpha_i)$, $\log \pi(K_i, \lambda_i, \alpha_i \mid S_t)$
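A minimal pure-Python sketch of the forward pass above, for illustration only. It assumes untrained, randomly initialised single linear layers (the hypothetical `Linear` class) in place of $\mathrm{MLP}_{query}$, $\mathrm{MLP}_{\lambda}$ and $\mathrm{MLP}_{\alpha}$, and it represents the step 7 availability mask as a hypothetical boolean list `valid` with one flag per candidate. The names `ActorSketch`, `concat`, `beta_log_pdf` and `EPS` are illustrative and do not come from the paper; a trainable version would use an autodiff framework instead. The log-probability is accumulated as $\log P_{target}[K_i]$ plus the Beta log-densities of $\lambda_i$ and $\alpha_i$, which follows from the sequential sampling order; when $K_i = 0$ only the categorical term remains.

```python
import math
import random
from typing import List, Sequence, Tuple

EPS = 1e-6  # assumption: keeps Beta samples strictly inside (0, 1) so log-densities stay finite


def concat(*parts) -> List[float]:
    """The ⊕ operator: join vectors (and scalars such as lambda_i) into one flat list."""
    out: List[float] = []
    for p in parts:
        if isinstance(p, (int, float)):
            out.append(float(p))
        else:
            out.extend(float(v) for v in p)
    return out


class Linear:
    """Hypothetical stand-in for each MLP: a single, randomly initialised linear layer."""

    def __init__(self, in_dim: int, out_dim: int, rng: random.Random):
        scale = 1.0 / math.sqrt(in_dim)
        self.w = [[rng.gauss(0.0, scale) for _ in range(in_dim)] for _ in range(out_dim)]
        self.b = [0.0] * out_dim

    def __call__(self, x: Sequence[float]) -> List[float]:
        assert len(x) == len(self.w[0]), "input size mismatch"
        return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(self.w, self.b)]


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def beta_log_pdf(x: float, a: float, b: float) -> float:
    """Log-density of Beta(a, b) at x in (0, 1)."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return log_norm + (a - 1.0) * math.log(x) + (b - 1.0) * math.log1p(-x)


class ActorSketch:
    """Untrained, illustrative actor that follows the steps of Algorithm 2."""

    def __init__(self, d_emb: int, d_task: int, d_stats: int, seed: int = 0):
        self.rng = random.Random(seed)
        self.d_emb = d_emb
        ctx = d_emb + d_task + d_stats                         # h_veh ⊕ X_task ⊕ g_stats
        self.mlp_query = Linear(ctx, d_emb, self.rng)
        self.mlp_lambda = Linear(ctx + d_emb, 2, self.rng)     # ... ⊕ h_target
        self.mlp_alpha = Linear(ctx + d_emb + 1, 2, self.rng)  # ... ⊕ h_target ⊕ lambda_i

    def _sample_ratio(self, mlp: Linear, x: List[float]) -> Tuple[float, float]:
        """Steps 15-16 / 17-18: sample from Beta(Softplus(MLP(x)) + 1), return sample and log-density."""
        a, b = (softplus(v) + 1.0 for v in mlp(x))
        ratio = min(max(self.rng.betavariate(a, b), EPS), 1.0 - EPS)
        return ratio, beta_log_pdf(ratio, a, b)

    def forward(self, h_veh, h_rsus, x_task, g_stats, valid):
        """Return ((K_i, lambda_i, alpha_i), log_pi); valid[j] marks whether target j is available."""
        candidates = [list(h_veh)] + [list(h) for h in h_rsus]  # C_target
        if len(valid) != len(candidates) or not any(valid):
            raise ValueError("valid needs one flag per candidate, with at least one True")
        # Phase A (steps 2-10): scaled dot-product attention over the candidate targets.
        q_att = self.mlp_query(concat(h_veh, x_task, g_stats))
        scores = [sum(q * c for q, c in zip(q_att, cand)) / math.sqrt(self.d_emb)
                  for cand in candidates]
        scores = [s if ok else float("-inf") for s, ok in zip(scores, valid)]  # step 7
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]  # masked entries become exactly 0.0
        z = sum(exps)
        p_target = [e / z for e in exps]
        k = self.rng.choices(range(len(candidates)), weights=p_target)[0]
        log_pi = math.log(p_target[k])
        h_target = candidates[k]
        # Phase B (steps 12-19): continuous ratios are sampled only when K_i != 0.
        if k == 0:
            return (k, 0.0, 1.0), log_pi  # deterministic, so no extra log-prob terms
        ctx = concat(h_veh, h_target, x_task, g_stats)
        lam, lp_lam = self._sample_ratio(self.mlp_lambda, ctx)
        alpha, lp_alpha = self._sample_ratio(self.mlp_alpha, concat(ctx, lam))
        return (k, lam, alpha), log_pi + lp_lam + lp_alpha


# Usage with made-up inputs: D_emb = 4, M = 3 RSUs, the second RSU masked out.
rng = random.Random(1)
actor = ActorSketch(d_emb=4, d_task=3, d_stats=2, seed=42)
h_veh = [rng.uniform(-1, 1) for _ in range(4)]
h_rsus = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(3)]
action, log_pi = actor.forward(h_veh, h_rsus, [0.5, 1.2, 0.3], [0.1, 0.9],
                               valid=[True, True, False, True])
print(action, log_pi)
```

Adding 1 after the Softplus keeps both Beta parameters above 1, so each Beta density is unimodal and bounded on $[0, 1]$; the clamping to `[EPS, 1 - EPS]` is an extra safeguard in this sketch, not part of the algorithm. Feeding the sampled $\lambda_i$ into $\mathrm{MLP}_{\alpha}$ is what lets the joint log-probability factor into the three terms summed in `forward`.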