Algorithm 1 FDRT’s Mobile Device Agent Training.
INPUT: maximum number of training episodes MAX_EPISODE, maximum number of training steps per episode MAX_STEP, learning rate α, discount factor γ, exploration rate ε, capacity CAPACITY of the experience replay pool M, batch size BATCH_SIZE, current network Q, target network Q^, target network update frequency C, number of training steps per round of federated learning F
OUTPUT: parameter θ_t of the current network
// random(a, b) is a function that generates random numbers in the range [a, b]
// randint(a, b) is a function that generates random integers in the range [a, b]
// memory.isfull() indicates whether the experience replay pool M is full
// θ_e denotes the global model parameters of the MEC server
Initialization: wireless communication model between MDs and BSs; task model and queue model of the MD; parameter θ of the current network Q; parameter θ^ = θ of the target network Q^; experience replay pool M.
1:  for episode=0:MAX_EPISODE by 1 do
2:    Get initial state s_0;
3:    for t=0:MAX_STEP by 1 do
4:      x ← random(0, 1);
5:      if (x > ε) then
6:        a_t ← randint(0, 2);
7:      else
8:        a_t ← argmax_a Q(s_t, a; θ_t);
9:      end if
10:      Perform action a_t in the system model, get reward r(s_t, a_t) and next state s_{t+1};
11:      Put I_t = (s_t, a_t, r(s_t, a_t), s_{t+1}) into M;
12:      s_t ← s_{t+1};
13:      if (not memory.isfull()) then
14:        continue;
15:      end if
16:      if ((episode * MAX_STEP + t) mod F == 0) then
17:        Upload θ_t to the connected MEC server;
18:        θ_t ← θ_e;
19:      else
20:        Randomly sample a mini-batch of BATCH_SIZE experiences from M to update the parameters, θ_t ← θ_t − α∇L(θ_t);
21:      end if
22:      if (t mod C == 0) then
23:        θ^_t ← θ_t;
24:      end if
25:    end for
26:  end for
27:  return θ_t;
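The following Python sketch mirrors Algorithm 1 for a single mobile-device agent, assuming PyTorch. Everything not stated in the pseudocode is an assumption: DummyEnv stands in for the wireless/task/queue system model, upload_to_mec and download_from_mec are hypothetical placeholders for the exchange of θ_t and θ_e with the MEC server, the state dimension, network architecture, and hyperparameter values are illustrative, the ε-greedy step uses the usual explore-with-probability-ε convention, and training starts once BATCH_SIZE samples are stored rather than waiting for M to fill.

# A sketch of Algorithm 1 for one mobile-device agent (assumptions noted above).
import random
from collections import deque

import torch
import torch.nn as nn

# Inputs of Algorithm 1; the numeric values here are illustrative, not the paper's.
MAX_EPISODE, MAX_STEP = 50, 200
ALPHA, GAMMA, EPSILON = 1e-3, 0.99, 0.1
CAPACITY, BATCH_SIZE = 10_000, 64
C, F = 50, 500                      # target-network update period, federated period
STATE_DIM, N_ACTIONS = 8, 3         # assumed state size and number of actions


class DummyEnv:
    """Stand-in for the wireless/task/queue system model of the MD."""
    def reset(self):
        return [0.0] * STATE_DIM
    def step(self, action):
        next_state = [random.random() for _ in range(STATE_DIM)]
        reward = -random.random()   # placeholder for r(s_t, a_t)
        return next_state, reward


_server_params = None
def upload_to_mec(params):          # placeholder for uploading θ_t to the MEC server
    global _server_params
    _server_params = params
def download_from_mec():            # placeholder returning the global parameters θ_e
    return _server_params


def make_net():
    return nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))

q_net, target_net = make_net(), make_net()
target_net.load_state_dict(q_net.state_dict())           # θ^ = θ
optimizer = torch.optim.SGD(q_net.parameters(), lr=ALPHA)
memory = deque(maxlen=CAPACITY)                           # experience replay pool M
env = DummyEnv()


def select_action(state):
    """ε-greedy: explore with probability ε, otherwise argmax_a Q(s, a; θ)."""
    if random.random() < EPSILON:
        return random.randint(0, N_ACTIONS - 1)
    with torch.no_grad():
        return int(q_net(torch.as_tensor(state, dtype=torch.float32)).argmax())


def train_step():
    """One gradient step θ_t ← θ_t − α∇L(θ_t) on a random mini-batch from M."""
    batch = random.sample(memory, BATCH_SIZE)
    s, a, r, s_next = (torch.as_tensor(x, dtype=torch.float32) for x in zip(*batch))
    q = q_net(s).gather(1, a.long().unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = r + GAMMA * target_net(s_next).max(1).values
    loss = nn.functional.mse_loss(q, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()


for episode in range(MAX_EPISODE):
    state = env.reset()
    for t in range(MAX_STEP):
        action = select_action(state)
        next_state, reward = env.step(action)
        memory.append((state, action, reward, next_state))    # store I_t in M
        state = next_state
        if len(memory) < BATCH_SIZE:
            continue                                          # too few samples to train
        if (episode * MAX_STEP + t) % F == 0:
            upload_to_mec(q_net.state_dict())                 # send θ_t to the MEC server
            q_net.load_state_dict(download_from_mec())        # θ_t ← θ_e
        else:
            train_step()
        if t % C == 0:
            target_net.load_state_dict(q_net.state_dict())    # θ^_t ← θ_t

In this sketch the federated exchange is a local no-op; in the paper's setting download_from_mec would return the global parameters that the MEC server aggregates from all connected MDs.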