Federated Reinforcement Learning-Based Dynamic Resource Allocation and Task Scheduling in Edge for IoT Applications

2025 Mar 30;25(7):2197. doi: 10.3390/s25072197

Algorithm 4 Dynamic Sampling with D4PG

Input: Environment, replay buffer $D$, Q-network parameters $θ_{online}$ and $θ_{target}$, exploration rate $ϵ$, learning rate $α$, discount factor $γ$, target network update frequency $C$, target network update rate $τ$

Output: Optimal action sequence for task offloading and resource allocation

1:  Initialize $θ_{online}$ and $θ_{target}$
2:  for each global iteration $g$ do
3:    Observe current state $s_{g}$
4:    Select action $a_{g}$:
5:      if $\text{random}(0, 1) < ϵ$ then
6:        Choose a random subset of UEs as $a_{g}$
7:      else
8:        Choose $a_{g} = \arg\max_{a} Q(s_{g}, a; θ_{online})$
9:      end if
10:   Sample subset $K_{sampled}$ from $a_{g}$
11:   Record $(s_{g}, a_{g})$ in $D$
12:   Conduct local training and obtain reward $R$
13:   Sample mini-batch $(s, a, r, s')$ from $D$
14:   Set target:
15:     if $s'$ is terminal then
16:       $\text{target} = r$
17:     else
18:       $\text{target} = r + γ\,\mathbb{E}[Q(s', a'; θ_{target})]$
19:     end if
20:   Update $θ_{online}$ by minimizing the loss $L = (Q(s, a; θ_{online}) - \text{target})^{2}$
21:   Every $C$ steps, update $θ_{target}$ by $θ_{target} \leftarrow τ θ_{online} + (1 - τ) θ_{target}$
22: end for
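Steps 4 to 10 of Algorithm 4 (ϵ-greedy action selection followed by sampling $K_{sampled}$) can be illustrated with a minimal sketch. Beyond what the listing states, it assumes that an action is a subset of exactly `subset_size` UEs and that `q_online` is a caller-supplied function scoring a (state, subset) pair; both names are hypothetical. The greedy branch enumerates every candidate subset, which is only practical for a small number of UEs.

```python
import itertools
import random
from typing import Callable, Sequence, Tuple

State = Tuple[float, ...]
Action = Tuple[int, ...]  # a subset of UE indices, kept sorted


def select_action(
    state: State,
    ue_ids: Sequence[int],
    subset_size: int,
    q_online: Callable[[State, Action], float],
    epsilon: float,
    rng: random.Random,
) -> Action:
    """Epsilon-greedy choice of a UE subset (steps 4-9).

    Assumption: every action is a subset of exactly `subset_size` UEs, so the
    greedy branch can enumerate all candidates exhaustively.
    """
    if rng.random() < epsilon:
        # Exploration: a uniformly random subset of UEs.
        return tuple(sorted(rng.sample(list(ue_ids), subset_size)))
    # Exploitation: the subset with the highest online Q-value.
    candidates = itertools.combinations(sorted(ue_ids), subset_size)
    return max(candidates, key=lambda a: q_online(state, a))


def sample_participants(action: Action, k: int, rng: random.Random) -> Action:
    """Step 10: draw up to k UEs (K_sampled) uniformly from the chosen subset a_g."""
    return tuple(sorted(rng.sample(list(action), min(k, len(action)))))
```

With `epsilon=0.0` the choice is fully greedy and with `epsilon=1.0` it is fully random. Because the number of subsets grows combinatorially with the number of UEs, a larger deployment would need some other way to approximate the $\arg\max$.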
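The target and loss of steps 13 to 20 can be sketched for a scalar Q-function. This is a simplification under stated assumptions: D4PG proper uses a distributional critic, whereas here the hypothetical callables `q_online` and `q_target` each return a single value, and $\mathbb{E}[Q(s', a'; θ_{target})]$ is approximated by a uniform average over a caller-supplied list `next_actions`. Each transition is also assumed to carry the full tuple $(s, a, r, s')$ plus a `done` flag for the terminal check (step 11 of the listing records only $(s_{g}, a_{g})$), and the per-sample squared error is assumed to be averaged over the mini-batch.

```python
from collections.abc import Hashable
from typing import Callable, List, Sequence, Tuple

# (s, a, r, s', done): a replay-buffer transition; `done` marks a terminal s'.
Transition = Tuple[Hashable, Hashable, float, Hashable, bool]
QFunction = Callable[[Hashable, Hashable], float]


def expected_next_q(q_target: QFunction, s_next: Hashable,
                    next_actions: Sequence[Hashable]) -> float:
    """Approximate E[Q(s', a'; θ_target)] by a uniform average (an assumption)."""
    return sum(q_target(s_next, a2) for a2 in next_actions) / len(next_actions)


def td_target(r: float, s_next: Hashable, done: bool, gamma: float,
              q_target: QFunction, next_actions: Sequence[Hashable]) -> float:
    """Steps 14-19: r if s' is terminal, else r + γ E[Q(s', a'; θ_target)]."""
    if done:
        return r
    return r + gamma * expected_next_q(q_target, s_next, next_actions)


def batch_loss(batch: List[Transition], q_online: QFunction, q_target: QFunction,
               gamma: float, next_actions: Sequence[Hashable]) -> float:
    """Step 20: squared error (Q(s, a; θ_online) - target)^2, averaged over the batch."""
    errors = [
        (q_online(s, a) - td_target(r, s_next, done, gamma, q_target, next_actions)) ** 2
        for (s, a, r, s_next, done) in batch
    ]
    return sum(errors) / len(errors)
```

An optimizer using learning rate $α$ would then adjust $θ_{online}$ to reduce this value; that step depends on how $Q$ is parameterized and is not shown.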
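The soft target update of step 21 can be sketched with parameters held in a plain dictionary, a hypothetical stand-in for real network weights. The sketch counts global iterations from 1, so the update fires at iterations $C$, $2C$, and so on; that counting convention is an assumption.

```python
from typing import Dict

# Hypothetical parameter store: parameter name -> scalar value.
Params = Dict[str, float]


def soft_update(target: Params, online: Params, tau: float) -> Params:
    """Return θ_target <- τ θ_online + (1 - τ) θ_target, applied per parameter."""
    return {name: tau * online[name] + (1.0 - tau) * value
            for name, value in target.items()}


def maybe_update_target(step: int, target: Params, online: Params,
                        tau: float, update_every: int) -> Params:
    """Apply the soft update only every `update_every` (C) global iterations."""
    if step % update_every == 0:
        return soft_update(target, online, tau)
    return target
```

With $τ = 1$ the rule reduces to copying $θ_{online}$ into $θ_{target}$ outright; a smaller $τ$ makes the target network track the online network more slowly.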