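Algorithm 4 Dynamic Sampling with D4PG

Input: Environment, replay buffer $D$, Q-network parameters $\theta$ and $\theta^{-}$, exploration rate $\epsilon$, learning rate $\alpha$, discount factor $\gamma$, target network update frequency $C$, target network update rate $\tau$
Output: Optimal action sequence for task offloading and resource allocation

1: Initialize $\theta$ and $\theta^{-}$
2: for each global iteration $g$ do
3:     Observe current state $s_g$
4:     Select action $a_g$:
5:     if $\mathrm{rand}() < \epsilon$ then
6:         Choose a random subset of UEs as $a_g$
7:     else
8:         Choose $a_g = \arg\max_{a} Q(s_g, a; \theta)$
9:     end if
10:    Sample the subset of UEs for this iteration
11:    Record the transition in $D$
12:    Conduct local training and obtain reward $R$
13:    Sample mini-batch $(s_j, a_j, r_j, s_{j+1})$ from $D$
14:    Set target:
15:    if $s_{j+1}$ is terminal then
16:        $y_j = r_j$
17:    else
18:        $y_j = r_j + \gamma \max_{a'} Q(s_{j+1}, a'; \theta^{-})$
19:    end if
20:    Update $\theta$ by minimizing the loss: $L(\theta) = \left(y_j - Q(s_j, a_j; \theta)\right)^2$
21:    Every $C$ steps, update $\theta^{-}$ by: $\theta^{-} \leftarrow \tau\,\theta + (1 - \tau)\,\theta^{-}$
22: end for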
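The core numerical steps of the listing (exploration-based action selection, the terminal/non-terminal target, the loss, and the periodic target-network update) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation: actions are treated as indices into a hypothetical finite list of candidate UE subsets, Q-values are passed in as plain NumPy arrays, the target is assumed to be the standard one-step bootstrapped form, and the target-network update is assumed to be a soft (Polyak) update with the target network update rate. The function names `select_action`, `td_targets`, `squared_td_loss`, and `soft_update` are illustrative and do not come from the source.

```python
import numpy as np


def select_action(q_values, epsilon, rng):
    """Steps 5-9: with probability epsilon pick a random action, else the greedy one.

    Assumption: each action index stands for one candidate subset of UEs.
    """
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))


def td_targets(rewards, next_q_target, terminal, gamma):
    """Steps 14-19: y = r if the next state is terminal, else r + gamma * max_a' Q_target.

    next_q_target has shape (batch, n_actions) and comes from the target network.
    """
    rewards = np.asarray(rewards, dtype=float)
    terminal = np.asarray(terminal, dtype=bool)
    bootstrap = gamma * np.max(np.asarray(next_q_target, dtype=float), axis=1)
    return rewards + np.where(terminal, 0.0, bootstrap)


def squared_td_loss(targets, q_taken):
    """Step 20: mean squared difference between targets and Q(s, a) of the taken actions."""
    diff = np.asarray(targets, dtype=float) - np.asarray(q_taken, dtype=float)
    return float(np.mean(diff ** 2))


def soft_update(target_params, online_params, tau):
    """Step 21 (assumed form): move the target parameters a fraction tau toward the online ones."""
    return tau * np.asarray(online_params) + (1.0 - tau) * np.asarray(target_params)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Greedy choice when epsilon is 0: index of the largest Q-value.
    print(select_action(np.array([0.1, 0.7, 0.2]), epsilon=0.0, rng=rng))
    # One non-terminal and one terminal transition.
    y = td_targets([1.0, 2.0], [[0.5, 1.5], [3.0, 2.0]], [False, True], gamma=0.9)
    print(y, squared_td_loss(y, [2.0, 2.0]))
    print(soft_update(np.zeros(3), np.ones(3), tau=0.1))
```

In a full training loop, `soft_update` would be called only every `C` iterations, and the gradient step on the loss would use the learning rate; both are omitted here to keep the sketch short.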