Algorithm 1 FDRT's Mobile Device Agent Training.
INPUT: maximum number of training episodes E, maximum number of training steps per episode T, learning rate α, discount factor γ, exploration rate ε, experience replay pool capacity C, number of batch samples B, current network Q(s, a; θ), target network Q̂(s, a; θ⁻), target network update frequency F, number of training steps in a round of federated learning τ
OUTPUT: parameter θ of the current network
// rand(0, 1) is a function that generates random numbers in the range (0, 1)
// randint(0, n) is a function that generates random integers in the range [0, n)
// full(D) indicates whether the experience replay pool D is full
// θ_g denotes the global model parameters of the MEC server
Initialization: wireless communication model between MDs and BSs; task model and queue model of the current MD.
1: for episode = 1 to E by 1 do
2:   Get initial state s_1;
3:   for t = 1 to T by 1 do
4:     x ← rand(0, 1);
5:     if (x < ε) then
6:       a_t ← randint(0, |A|);  // explore: choose a random action
7:     else
8:       a_t ← argmax_a Q(s_t, a; θ);  // exploit: choose the greedy action
9:     end if
10:    Perform action a_t in the system model, get reward r_t and next state s_{t+1};
11:    e_t ← (s_t, a_t, r_t, s_{t+1});
12:    Put e_t in D;
13:    if not full(D) then
14:      continue;
15:    end if
16:    if t mod τ = 0 then
17:      Upload θ to the connected MEC server;
18:      θ ← θ_g;  // download the aggregated global parameters
19:    else
20:      Randomly choose a batch of B samples from D to update the parameters, θ ← θ − α∇_θ L(θ);
21:    end if
22:    if t mod F = 0 then
23:      θ⁻ ← θ;  // update the target network
24:    end if
25:  end for
26: end for
27: return θ;
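The training loop above can be sketched in code. The following is a minimal, hypothetical Python illustration, not the paper's implementation: a linear Q-function stands in for the DQN, the toy environment and all class/function names are assumptions, and the MEC server's aggregation rule is assumed to be a FedAvg-style parameter mean, which Algorithm 1 itself does not specify.

```python
import random
import numpy as np

class MDAgent:
    """Sketch of one mobile device (MD) agent following the ε-greedy
    DQN structure of Algorithm 1 (network shapes are assumptions)."""

    def __init__(self, n_states=4, n_actions=3, lr=0.01, gamma=0.9,
                 eps=0.1, capacity=100, batch=16, sync_every=20):
        self.lr, self.gamma, self.eps = lr, gamma, eps
        self.capacity, self.batch, self.sync_every = capacity, batch, sync_every
        self.replay = []                          # experience replay pool D
        # linear Q-function stands in for the DQN: Q(s, a; θ) = θ[a] · s
        self.theta = np.zeros((n_actions, n_states))   # current network θ
        self.theta_target = self.theta.copy()          # target network θ⁻
        self.n_actions = n_actions
        self.steps = 0

    def act(self, s):
        if random.random() < self.eps:                 # explore (lines 5-6)
            return random.randrange(self.n_actions)
        return int(np.argmax(self.theta @ s))          # exploit (line 8)

    def store(self, exp):
        self.replay.append(exp)                        # put e_t in D (line 12)
        if len(self.replay) > self.capacity:
            self.replay.pop(0)

    def learn(self):
        if len(self.replay) < self.batch:              # pool not full yet (lines 13-14)
            return
        # sample a batch of B experiences and take an SGD step (line 20)
        for s, a, r, s2 in random.sample(self.replay, self.batch):
            target = r + self.gamma * np.max(self.theta_target @ s2)
            td_err = target - self.theta[a] @ s
            self.theta[a] += self.lr * td_err * s
        self.steps += 1
        if self.steps % self.sync_every == 0:          # θ⁻ ← θ (lines 22-23)
            self.theta_target = self.theta.copy()

def federated_round(agents):
    """MEC-server step (lines 16-18): average the uploaded parameters
    and push the global model back to every agent (assumed FedAvg)."""
    theta_g = np.mean([ag.theta for ag in agents], axis=0)
    for ag in agents:
        ag.theta = theta_g.copy()                      # θ ← θ_g
```

In this sketch each agent trains locally between federated rounds, so only the model parameters, never the raw experiences, leave the device, which is the point of combining DQN with federated learning in FDRT.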