Federated Reinforcement Learning-Based Dynamic Resource Allocation and Task Scheduling in Edge for IoT Applications

2025 Mar 30;25(7):2197. doi: 10.3390/s25072197

Algorithm 3 Federated Learning Algorithm with Dynamic UE Selection using D4PG

Input: Global model parameters $w_{global}$, UE dataset $D_{k}$ for each UE $k$, hyperparameter $λ$ for all UEs, maximum number of global iterations $G$, local training iterations $L$.

Output: Final global model parameters $w_{global}^{(G)}$, trained D4PG agent, performance metrics.

1: Initialization: Initialize global model parameters $w_{global}$. Initialize the hyperparameter $λ$ for all UEs.
2: D4PG Agent Initialization: Initialize the D4PG agent with appropriate observation and action spaces, neural networks for policy and value functions, and experience replay buffer.
3: Dynamic Sampling: At each global iteration $g$:
4:     D4PG agent dynamically selects a subset $K_{sampled}$ of $K_{g}$ UEs based on learned policies.
5: Global Model Broadcast: Broadcast the global model parameters $w_{global}$ to all UEs in $K_{sampled}$.
6: Local Training: Each UE $k \in K_{sampled}$ performs local training:
7:     Initialize local model $w_{local}$ with $w_{global}$.
8:     for $l = 0, 1, \dots, L - 1$ do
9:         Randomly select a data point $ξ_{k, g, l}$ from local dataset $D_{k}$.
10:         Update local model: As shown in Equation (45).
11:     end for
12: Send Local Models: Each UE sends its updated local model $w_{local}^{(g, L)}$ to the base station.
13: Global Model Aggregation: The base station aggregates the local models to update the global model: As shown in Equation (46).
14: D4PG Agent Update: Update the D4PG agent using the reward based on the performance and fairness of the aggregated model, storing experiences in the replay buffer, and training the neural networks.
15: Repeat: Repeat the process for $G$ global iterations.
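
The control flow of Algorithm 3 can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several assumptions: Equations (45) and (46) are not reproduced in this section, so the local update is taken to be a single-sample SGD step on a least-squares loss and the aggregation a dataset-size-weighted average (FedAvg-style). The D4PG agent is replaced by uniform random sampling, so the agent update in step 14 is omitted. The role of $λ$ is defined by Equation (45), so the sketch uses its own step size instead. All function and parameter names (`local_sgd`, `aggregate`, `select_ues`, `federated_train`, `lr`, `k`) are hypothetical.

```python
import random


def local_sgd(w_global, data, lr, local_iters, rng):
    """Steps 7-11: copy the global model, then take L single-sample SGD steps.

    Assumption: a least-squares loss on a linear model stands in for
    Equation (45), which is not shown in this section.
    """
    w = list(w_global)                      # step 7: w_local <- w_global
    for _ in range(local_iters):            # step 8: l = 0, ..., L - 1
        x, y = rng.choice(data)             # step 9: one random data point
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        # step 10: gradient step on 0.5 * err**2
        w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w


def aggregate(local_models, sizes):
    """Step 13: average local models weighted by dataset size.

    Assumption: FedAvg-style weighting stands in for Equation (46).
    """
    total = sum(sizes)
    dim = len(local_models[0])
    return [
        sum(n * w[j] for w, n in zip(local_models, sizes)) / total
        for j in range(dim)
    ]


def select_ues(num_ues, k, rng):
    """Step 4 placeholder: uniform random sampling, NOT a trained D4PG policy."""
    return sorted(rng.sample(range(num_ues), k))


def federated_train(datasets, dim, G, L, k, lr=0.1, seed=0):
    """Run G global iterations with k sampled UEs and L local steps each."""
    rng = random.Random(seed)
    w_global = [0.0] * dim                  # step 1: initialize global model
    for _ in range(G):                      # steps 3 and 15: global iterations
        sampled = select_ues(len(datasets), k, rng)
        # steps 5-12: broadcast, local training, and upload are collapsed
        # into one call per sampled UE
        local_models = [
            local_sgd(w_global, datasets[i], lr, L, rng) for i in sampled
        ]
        w_global = aggregate(local_models, [len(datasets[i]) for i in sampled])
        # step 14 (reward computation and D4PG update) is omitted here
    return w_global


if __name__ == "__main__":
    # Hypothetical toy data: 5 UEs, each holding noise-free samples of
    # y = 2 * x0 - x1.
    data_rng = random.Random(1)
    datasets = []
    for _ in range(5):
        points = []
        for _ in range(30):
            x = (data_rng.uniform(-1, 1), data_rng.uniform(-1, 1))
            points.append((x, 2.0 * x[0] - 1.0 * x[1]))
        datasets.append(points)
    print(federated_train(datasets, dim=2, G=50, L=20, k=3))
```

In this sketch, replacing `select_ues` with a learned policy is the only change needed to move from random sampling toward the dynamic selection described in step 4. The reward signal and replay buffer of step 14 would then be added inside the global loop.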