Algorithm 1 FDRT’s Mobile Device Agent Training.
INPUT: maximum number of training episodes MAX_EPISODE, maximum number of training steps per episode MAX_STEP, learning rate α, discount factor γ, exploration rate ε, capacity CAPACITY of the experience replay pool M, batch size BATCH_SIZE, current network Q, target network Q^, target network update frequency C, number of training steps per round of federated learning F
OUTPUT: parameter θ_t of the current network
// random(a, b) is a function that generates random numbers in the range [a, b]
// randint(a, b) is a function that generates random integers in the range [a, b]
// memory.isfull() indicates whether the experience replay pool M is full
// θ_e denotes the global model parameters of the MEC server
Initialization: wireless communication model between MDs and BSs; task model and queue model of the MD; parameter θ of the current network Q; parameter θ^ = θ of the target network Q^; experience replay pool M.
1:  for episode=0:MAX_EPISODE by 1 do
2:    Get initial state s_0;
3:    for t=0:MAX_STEP by 1 do
4:      x ← random(0, 1);
5:      if (x > ε) then
6:        a_t ← randint(0, 2);
7:      else
8:        a_t ← argmax_a Q(s_t, a; θ_t);
9:      end if
10:      Perform action a_t in the system model, get reward r(s_t, a_t) and next state s_{t+1};
11:      Put I_t = (s_t, a_t, r(s_t, a_t), s_{t+1}) into M;
12:      s_t ← s_{t+1};
13:      if (not memory.isfull()) then
14:        continue;
15:      end if
16:      if ((episode * MAX_STEP + t) mod F == 0) then
17:        Upload θ_t to the connected MEC server;
18:        θ_t ← θ_e;
19:      else
20:        Randomly sample a mini-batch of BATCH_SIZE experiences from M to update the parameters, θ_t ← θ_t − α∇L(θ_t);
21:      end if
22:      if (t mod C == 0) then
23:        θ^_t ← θ_t;
24:      end if
25:    end for
26:  end for
27:  return θ_t;
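The following Python sketch mirrors Algorithm 1 for a single mobile-device agent, assuming PyTorch. Everything not stated in the pseudocode is an assumption: DummyEnv stands in for the wireless/task/queue system model, upload_to_mec and download_from_mec are hypothetical placeholders for the exchange of θ_t and θ_e with the MEC server, the state dimension, network architecture, and hyperparameter values are illustrative, the ε-greedy step uses the usual explore-with-probability-ε convention, and training starts once BATCH_SIZE samples are stored rather than waiting for M to fill.

# A sketch of Algorithm 1 for one mobile-device agent (assumptions noted above).
import random
from collections import deque

import torch
import torch.nn as nn

# Inputs of Algorithm 1; the numeric values here are illustrative, not the paper's.
MAX_EPISODE, MAX_STEP = 50, 200
ALPHA, GAMMA, EPSILON = 1e-3, 0.99, 0.1
CAPACITY, BATCH_SIZE = 10_000, 64
C, F = 50, 500                      # target-network update period, federated period
STATE_DIM, N_ACTIONS = 8, 3         # assumed state size and number of actions


class DummyEnv:
    """Stand-in for the wireless/task/queue system model of the MD."""
    def reset(self):
        return [0.0] * STATE_DIM
    def step(self, action):
        next_state = [random.random() for _ in range(STATE_DIM)]
        reward = -random.random()   # placeholder for r(s_t, a_t)
        return next_state, reward


_server_params = None
def upload_to_mec(params):          # placeholder for uploading θ_t to the MEC server
    global _server_params
    _server_params = params
def download_from_mec():            # placeholder returning the global parameters θ_e
    return _server_params


def make_net():
    return nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))

q_net, target_net = make_net(), make_net()
target_net.load_state_dict(q_net.state_dict())           # θ^ = θ
optimizer = torch.optim.SGD(q_net.parameters(), lr=ALPHA)
memory = deque(maxlen=CAPACITY)                           # experience replay pool M
env = DummyEnv()


def select_action(state):
    """ε-greedy: explore with probability ε, otherwise argmax_a Q(s, a; θ)."""
    if random.random() < EPSILON:
        return random.randint(0, N_ACTIONS - 1)
    with torch.no_grad():
        return int(q_net(torch.as_tensor(state, dtype=torch.float32)).argmax())


def train_step():
    """One gradient step θ_t ← θ_t − α∇L(θ_t) on a random mini-batch from M."""
    batch = random.sample(memory, BATCH_SIZE)
    s, a, r, s_next = (torch.as_tensor(x, dtype=torch.float32) for x in zip(*batch))
    q = q_net(s).gather(1, a.long().unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = r + GAMMA * target_net(s_next).max(1).values
    loss = nn.functional.mse_loss(q, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()


for episode in range(MAX_EPISODE):
    state = env.reset()
    for t in range(MAX_STEP):
        action = select_action(state)
        next_state, reward = env.step(action)
        memory.append((state, action, reward, next_state))    # store I_t in M
        state = next_state
        if len(memory) < BATCH_SIZE:
            continue                                          # too few samples to train
        if (episode * MAX_STEP + t) % F == 0:
            upload_to_mec(q_net.state_dict())                 # send θ_t to the MEC server
            q_net.load_state_dict(download_from_mec())        # θ_t ← θ_e
        else:
            train_step()
        if t % C == 0:
            target_net.load_state_dict(q_net.state_dict())    # θ^_t ← θ_t

In this sketch the federated exchange is a local no-op; in the paper's setting download_from_mec would return the global parameters that the MEC server aggregates from all connected MDs.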