D2D-Assisted Multi-User Cooperative Partial Offloading in MEC Based on Deep Reinforcement Learning

Sensors. 2022 Sep 15;22(18):7004. doi: 10.3390/s22187004

Algorithm 1: DQN-PTR to solve $P3$

01: Initialize the Q-Network Q with random weights $θ$

02: Initialize the Target Q-Network $\hat{Q}$ with weights $θ' = θ$

03: Initialize replay memory D to capacity N

04: Initialize $ϵ = 0$, $ϵ_{\mathrm{increment}} = 0.0001$, $ϵ_{\mathrm{max}} = 0.9999$

05: Initialize optimal trajectory $O$ to empty and maximum total return $R = 0$

06: For $\mathrm{episode} = 1, \dots, M$ do

07: Initialize sum reward $R' = 0$

08: Initialize state $s_{0}$ and draw $μ = \mathrm{rand} \in [0, 1]$

09: For each step t do

10: If $\mathrm{episode} \leq 100$ or $μ > 0.9$ then

11: If $\mathrm{rand} \in [0, 1] \geq ϵ$ then

12: Select a random action $a_{t}$

13: else

14: Set $a_{t} = \arg\max_{a} Q(s_{t}, a; θ)$

15: end if

16: Execute action $a_{t}$, observe next state $s_{t + 1}$ and reward $r_{t}$ according to environment 1

17: else

18: Set $a_{t}$ according to $O[t]$

19: Execute action $a_{t}$, observe next state $s_{t + 1}$ and reward $r_{t}$ according to environment 2

20: end if

21: $R' = R' + r_{t}$

22: Store transition $(s_{t}, a_{t}, r_{t}, s_{t + 1})$ in D

23: If episode terminates at step $t + 1$ then

24: If $R' > R$ then

25: Replace the trajectory in $O$ with $\{a_{1}, \dots, a_{t}\}$

26: $R = R'$

27: else if $R' = R$ and $t < \mathrm{len}(O)$ then

28: Replace the trajectory in $O$ with $\{a_{1}, \dots, a_{t}\}$

29: end if

30: break

31: end if

32: Sample a random mini-batch of transitions $(s_{j}, a_{j}, r_{j}, s_{j + 1})$ from D

33: Set $y_{j} = \begin{cases} r_{j}, & \text{if the episode terminates at step } j + 1, \\ r_{j} + γ \max_{a'} \hat{Q}(s_{j + 1}, a'; θ'), & \text{otherwise} \end{cases}$

34: Perform a gradient descent step on $(y_{j} - Q(s_{j}, a_{j}; θ))^{2}$ with respect to the network parameters $θ$

35: If $ϵ < ϵ_{\mathrm{max}}$ then

36: $ϵ = ϵ + ϵ_{\mathrm{increment}}$

37: end if

38: Every C steps reset $\hat{Q} = Q$

39: end for

40: end for
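
The parts of Algorithm 1 that differ from a standard DQN loop are the explore-or-replay decision (line 10), the bookkeeping of the best trajectory $O$ (lines 24-29), and the increasing $ϵ$ schedule (lines 35-37). The following is a minimal Python sketch of those pieces together with the target of line 33. It is not the authors' implementation: the function names, the constants `WARMUP_EPISODES` and `REPLAY_THRESHOLD`, and the use of plain lists for Q-values are assumptions made for illustration, and the networks, replay memory, and both environments are omitted.

```python
import random

# Constants taken from lines 04 and 10 of Algorithm 1; the names are hypothetical.
EPS_INCREMENT = 0.0001
EPS_MAX = 0.9999
WARMUP_EPISODES = 100    # episodes that always use epsilon-greedy selection
REPLAY_THRESHOLD = 0.9   # mu above this value also uses epsilon-greedy selection


def use_exploration(episode, mu):
    """Line 10: epsilon-greedy branch during warm-up or when mu > 0.9;
    otherwise the stored trajectory O is replayed (lines 17-19)."""
    return episode <= WARMUP_EPISODES or mu > REPLAY_THRESHOLD


def select_action(q_values, epsilon, rng):
    """Lines 11-15: random action when rand >= epsilon, greedy action otherwise.
    Because epsilon starts at 0 and grows, early steps are almost always random."""
    if rng.random() >= epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def update_best(best_actions, best_return, actions, episode_return):
    """Lines 24-29: keep the trajectory with the higher return;
    on equal return, keep the shorter one."""
    if episode_return > best_return:
        return list(actions), episode_return
    if episode_return == best_return and len(actions) < len(best_actions):
        return list(actions), best_return
    return best_actions, best_return


def td_target(reward, next_q_values, gamma, terminal):
    """Line 33: y_j = r_j at a terminal step,
    else r_j + gamma * max over a' of the target network's Q-values."""
    if terminal:
        return reward
    return reward + gamma * max(next_q_values)


def step_epsilon(epsilon):
    """Lines 35-37: increase epsilon by a fixed increment while below EPS_MAX."""
    return epsilon + EPS_INCREMENT if epsilon < EPS_MAX else epsilon
```

In a full training loop, `use_exploration` would be evaluated with the per-episode draw $μ$ from line 08, and `update_best` would be called once when the episode terminates, with the list of actions taken and the accumulated return $R'$.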