2025 Jun 3;11:e2922. doi: 10.7717/peerj-cs.2922

Algorithm 1. Algorithm of the proposed process.

Input: batch size, θμ
Load the weights θμ for the target model
repeat
   Generate data packet p, calculate its next hop a using θμ and a stochastic process, then record it with the current network status: p := {destination, location, state, a, data}
   while length(experience replay) < batch size do
      if packet p is received then
         if p has arrived at its destination then
            done ← true
         else
            done ← false
         end if
         Generate the new state and compute the reward r for p_a
         Add {p_state, p_a, r, new state, done} to the experience replay list
         p_state := new state; p_location := location
      end if
      if not done then
         Use θμ to find the next hop a
         p_a := a
      end if
      Transmit data packet p
   end while
until True
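
The experience-collection loop of Algorithm 1 can be sketched in Python as follows. This is a minimal single-process illustration, not the authors' implementation: the four-node `TOPOLOGY`, the `observe` state encoding, the reward values, and the rule-based `policy` that stands in for the actor with weights θμ are all assumptions made for the example. Transmission and reception are collapsed into one step, and a new packet is generated as soon as the previous one reaches its destination.

```python
import random
from dataclasses import dataclass

# Hypothetical toy topology: node -> list of neighbours (not from the paper).
TOPOLOGY = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}


@dataclass
class Packet:
    destination: int
    location: int
    state: tuple
    a: int            # next hop chosen at the current location
    data: bytes = b""


def observe(location, destination):
    """Assumed network status: just the pair (location, destination)."""
    return (location, destination)


def policy(state, rng, epsilon=0.1):
    """Placeholder for the actor (weights θμ) plus a stochastic process."""
    location, destination = state
    neighbours = TOPOLOGY[location]
    if rng.random() < epsilon:
        return rng.choice(neighbours)      # random exploration
    if destination in neighbours:
        return destination                 # greedy stand-in for the learned actor
    return neighbours[0]


def new_packet(rng):
    """Generate a packet with distinct source and destination nodes."""
    src, dst = rng.sample(sorted(TOPOLOGY), 2)
    state = observe(src, dst)
    return Packet(dst, src, state, policy(state, rng))


def collect_batch(batch_size, seed=0):
    """Fill the experience replay list until it holds batch_size entries."""
    rng = random.Random(seed)
    replay = []
    p = new_packet(rng)
    while len(replay) < batch_size:
        location = p.a                     # packet is received at its next hop
        done = location == p.destination
        new_state = observe(location, p.destination)
        reward = 1.0 if done else -0.1     # assumed reward shape
        replay.append((p.state, p.a, reward, new_state, done))
        p.state, p.location = new_state, location
        if done:
            p = new_packet(rng)            # start routing a fresh packet
        else:
            p.a = policy(p.state, rng)     # choose the next hop
    return replay
```

Calling `collect_batch(32)` returns a list of 32 `(state, action, reward, new_state, done)` tuples. In a full training setup such a batch would be handed to the learner that updates the weights, a step that lies outside the listing above.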