Two-Layer Edge Intelligence for Task Offloading and Computing Capacity Allocation with UAV Assistance in Vehicular Networks

2024 Mar 14;24(6):1863. doi: 10.3390/s24061863

Algorithm 2 Duelling DQN-based algorithm for the POSG.

Input: replay memory pool $M$

1: Initialize the network weights $\theta_{q}$ and the target network weights $\theta_{q}^{-}$;
2: for episode $ep = 1, 2, \dots$ do
3: Obtain the initial state $s_{q}(0)$ from the environment;
4: for step $t = 1, \dots, T$ do
5: for each PN $q$ do
6: Obtain the newly assigned tasks $J_{q}(t)$;
7: Obtain the previous computing capacity allocation state $\Gamma_{q}(t)$;
8: Obtain the current local observation $s_{q}(t) = \{J_{q}, D_{q}, \Gamma_{q}\}$;
9: Select action $a_{q}$ with $a_{q} = \epsilon\text{-greedy}(Q_{q}(s_{q}, a_{q}; \theta_{q}))$;
10: Execute the computing capacity allocation and obtain the local utility $u_{q}(s_{q}, a_{q})$;
11: Obtain the utilities of the other PNs $u_{q'}(s_{q'}, a_{q'})$, $q' \neq q$, and the subsequent local observation $s_{q}'$;
12: Store the transition tuple $\langle s_{q}, a_{q}, u(s, a), s_{q}' \rangle$ into $M$;
13: Randomly sample a mini-batch $M_{n}$ from $M$;
14: Update the network weights $\theta_{q}$ by performing gradient descent;
15: Update the target network parameters $\theta_{q}^{-} = \theta_{q}$ every $C$ steps;
16: end for
17: end for
18: end for
Output: computing capacity allocation strategy $a = \{a_{1}, \dots, a_{q}, \dots, a_{Q+1}\}$.
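
The training loop of Algorithm 2 can be illustrated with a minimal Python sketch for a single PN. It is not the authors' implementation and rests on several assumptions: a linear dueling Q-function stands in for the neural network, the environment `toy_env_step` and its utility are hypothetical, the utilities of the other PNs (step 11) are omitted, and all names and hyperparameter values (`LinearDuelingQ`, `train`, `sync_every`, and so on) are chosen only for illustration. The sketch shows the dueling aggregation $Q(s, a) = V(s) + A(s, a) - \frac{1}{|A|}\sum_{b} A(s, b)$, $\epsilon$-greedy selection, the replay memory $M$, and the periodic target update.

```python
import random
from collections import deque


def one_hot(index, size):
    return [1.0 if i == index else 0.0 for i in range(size)]


class LinearDuelingQ:
    """Linear stand-in for the network: Q(s,a) = V(s) + A(s,a) - mean_b A(s,b)."""

    def __init__(self, state_dim, n_actions, rng):
        self.n_actions = n_actions
        self.w_v = [rng.uniform(-0.1, 0.1) for _ in range(state_dim)]
        self.w_a = [[rng.uniform(-0.1, 0.1) for _ in range(state_dim)]
                    for _ in range(n_actions)]

    def q_values(self, s):
        v = sum(w * x for w, x in zip(self.w_v, s))
        adv = [sum(w * x for w, x in zip(row, s)) for row in self.w_a]
        mean_adv = sum(adv) / self.n_actions
        return [v + a - mean_adv for a in adv]

    def copy_from(self, other):
        self.w_v = list(other.w_v)
        self.w_a = [list(row) for row in other.w_a]

    def sgd_step(self, s, a, target, lr):
        """One gradient-descent step on 0.5 * (Q(s,a) - target)^2."""
        td_error = self.q_values(s)[a] - target
        for i, x in enumerate(s):
            self.w_v[i] -= lr * td_error * x
            for b in range(self.n_actions):
                # dQ(s,a)/dA(s,b) is (1 - 1/n) for b == a and -1/n otherwise.
                coeff = (1.0 if b == a else 0.0) - 1.0 / self.n_actions
                self.w_a[b][i] -= lr * td_error * coeff * x


def epsilon_greedy(q, epsilon, rng):
    if rng.random() < epsilon:
        return rng.randrange(len(q))
    return max(range(len(q)), key=q.__getitem__)


def toy_env_step(s, a, rng):
    """Hypothetical environment: utility is 1 when the allocation level
    (action index) matches the one-hot load level, else 0."""
    load = s.index(1.0)
    utility = 1.0 if a == load else 0.0
    return utility, one_hot(rng.randrange(len(s)), len(s))


def train(steps=2000, n_levels=3, gamma=0.5, epsilon=0.3, lr=0.05,
          batch_size=16, sync_every=50, capacity=500, seed=0):
    rng = random.Random(seed)
    online = LinearDuelingQ(n_levels, n_levels, rng)
    target = LinearDuelingQ(n_levels, n_levels, rng)
    target.copy_from(online)                 # theta^- = theta
    memory = deque(maxlen=capacity)          # replay memory pool M
    s = one_hot(0, n_levels)
    for t in range(1, steps + 1):
        a = epsilon_greedy(online.q_values(s), epsilon, rng)
        u, s_next = toy_env_step(s, a, rng)
        memory.append((s, a, u, s_next))     # store the transition tuple
        batch = rng.sample(list(memory), min(batch_size, len(memory)))
        for bs, ba, bu, bs_next in batch:
            y = bu + gamma * max(target.q_values(bs_next))
            online.sgd_step(bs, ba, y, lr)
        if t % sync_every == 0:              # target update every C steps
            target.copy_from(online)
        s = s_next
    return online, memory
```

Calling `train()` returns the trained Q-function and the replay memory; the greedy allocation for an observation `s` is then `epsilon_greedy(online.q_values(s), 0.0, rng)`. Subtracting the mean advantage keeps $V$ and $A$ identifiable, which is the usual motivation for the dueling aggregation.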