| 1: Input: batch size, |
| 2: Load the weights for target model |
| 3: repeat |
| 4: Generate data packet p, calculate its next hop a using and stochastic process, then record it with the current network status: |
| 5: while length(experience replay) do |
| 6: if packet p received then |
| 7: if p has arrived at its destination then |
| 8: Done ← true |
| 9: else |
| 10: Done ← false |
| 11: end if |
| 12: Add experience replay list with |
| 13: generate new state and compute reward r for |
| 14: |
| 15: end if |
| 16: if not Done then |
| 17: Use to find the next hop a |
| 18: |
| 19: end if |
| 20: Transmit data packet p |
| 21: end while |
| 22: until True |
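
The collection loop in the listing above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the listing's symbols were lost, so the loop bound (`len(replay) < batch_size`), the epsilon-greedy choice standing in for the "stochastic process", the dictionary of Q-values standing in for the target model's weights, the reward of -1 per hop, and the 4-node `TOPOLOGY` are all hypothetical. The contents of steps 14 and 18 are not recoverable and are not modelled.

```python
import random
from collections import namedtuple

# One replay entry; the field layout is an assumption (the listing's tuple was lost).
Experience = namedtuple("Experience", "state action reward next_state done")

# Hypothetical 4-node topology: node -> list of neighbours.
TOPOLOGY = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}


def q_value(weights, node, dest, hop):
    """Look up the stand-in Q-value; unseen entries default to 0."""
    return weights.get((node, dest, hop), 0.0)


def choose_next_hop(weights, node, dest, epsilon, rng):
    """Epsilon-greedy choice among neighbours (assumed 'stochastic process')."""
    neighbours = TOPOLOGY[node]
    if rng.random() < epsilon:
        return rng.choice(neighbours)
    return max(neighbours, key=lambda hop: q_value(weights, node, dest, hop))


def new_packet(rng):
    """Create a packet with distinct random source and destination nodes."""
    src, dest = rng.sample(sorted(TOPOLOGY), 2)
    return src, dest


def collect_experiences(weights, batch_size, epsilon=0.1, seed=0):
    """Route packets until the replay list holds batch_size transitions."""
    rng = random.Random(seed)
    replay = []
    # Step 4: generate a packet and pick its first hop.
    node, dest = new_packet(rng)
    hop = choose_next_hop(weights, node, dest, epsilon, rng)
    # Step 5: the loop bound is an assumption; the original condition was lost.
    while len(replay) < batch_size:
        # Steps 6-11: the packet is received at `hop`; check for arrival.
        done = hop == dest
        # Steps 12-13: store the transition with a hypothetical reward.
        reward = 0.0 if done else -1.0
        replay.append(Experience((node, dest), hop, reward, (hop, dest), done))
        if done:
            # Assumption: a delivered packet is replaced by a fresh one.
            node, dest = new_packet(rng)
        else:
            node = hop
        # Steps 16-20: choose the next hop and transmit.
        hop = choose_next_hop(weights, node, dest, epsilon, rng)
    return replay
```

Calling `collect_experiences({}, batch_size=8)` returns eight transitions gathered with all Q-values at zero. In a fuller version the dictionary would be replaced by a learned model and the collected batch would feed a training step.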