2025 Jun 3;11:e2922. doi: 10.7717/peerj-cs.2922

Algorithm 1. Algorithm of the proposed process.

1: Input: batch size, θ^{μ^{*}}

2: Load the weights θ^{μ^{*}} for the target model

3: repeat

4: Generate data packet p, calculate its next hop a using θ^{μ^{*}} and a stochastic process, then record it with the current network status: p := {destination, location, state, a, data}

5: while length(experience replay) < batch size do

6: if packet p received then

7: if p has arrived at its destination then

8: Done ← true

9: else

10: Done ← false

11: end if

12: Add {p→state, p→a, r, new state, done} to the experience replay list

13: Generate the new state and compute the reward r for p→a

14: p→state := new state; p→location := location

15: end if

16: if not done then

17: Use θ^{μ^{*}} to find the next hop a

18: p→a := a

19: end if

20: Transmit data packet p

21: end while

22: until True
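
The experience-collection loop of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the four-node `TOPOLOGY`, the `policy` function (a greedy rule with epsilon-random exploration standing in for the actor θ^{μ^{*}} and its stochastic process), the `reward` values, and the `network_state` encoding are all hypothetical. The sketch also computes the reward and new state before storing the transition, since the stored tuple needs both, and it generates a fresh packet once the current one is delivered.

```python
import random
from dataclasses import dataclass

# Hypothetical toy topology (node -> neighbours); not taken from the paper.
TOPOLOGY = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}


@dataclass
class Packet:
    destination: int
    location: int
    state: tuple
    a: int  # chosen next hop
    data: bytes = b""


def network_state(location, destination):
    # Placeholder for the "current network status" recorded with the packet.
    return (location, destination)


def policy(state, rng, epsilon=0.1):
    # Stand-in for the actor plus exploration noise (assumption: a greedy
    # rule with epsilon-random exploration, not a trained network).
    location, destination = state
    neighbours = TOPOLOGY[location]
    if rng.random() < epsilon:
        return rng.choice(neighbours)
    return destination if destination in neighbours else neighbours[0]


def reward(location, destination):
    # Hypothetical reward: bonus on delivery, small cost per extra hop.
    return 1.0 if location == destination else -0.1


def new_packet(rng):
    # Step 4: generate a packet and pick its first hop.
    src, dst = rng.sample(sorted(TOPOLOGY), 2)
    state = network_state(src, dst)
    return Packet(dst, src, state, policy(state, rng))


def collect_batch(batch_size, seed=0):
    rng = random.Random(seed)
    replay = []
    p = new_packet(rng)
    while len(replay) < batch_size:          # step 5
        location = p.a                       # step 6: packet received at next hop
        done = location == p.destination     # steps 7-11
        new_state = network_state(location, p.destination)  # step 13
        r = reward(location, p.destination)
        replay.append((p.state, p.a, r, new_state, done))   # step 12
        p.state, p.location = new_state, location           # step 14
        if not done:
            p.a = policy(p.state, rng)       # steps 16-18
        else:
            p = new_packet(rng)              # assumption: start a new packet
    return replay
```

Calling `collect_batch(8)` returns a list of eight `(state, action, reward, new_state, done)` tuples, which is the shape a replay buffer for an actor-critic learner would typically consume.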