Algorithm 3. The two-layer hybrid MRA training algorithm.
 1: (Input) , batch size , learning rate , minimum exploration rate , discount factor , exploration decay rate d, and convergence threshold ;
 2: (Output) Learned DQN to decide , for (7);
 3: (Upper-layer DQN-based learning:)
 4: Initialize action and replay buffer D;
 5: for episode = 1 to do
 6:   Initialize state ;
 7:   for time to do
 8:     Observe current state ;
 9:     ;
10:     if random number then
11:       Select from at random;
12:     else
13:       Select ;
14:     end if
15:     Observe next state ;
16:     (Lower-layer game-theory-based iteration:)
17:     for each link i do
18:       for iteration to do
19:         Update with (47);
20:         Update with (48);
21:         if | then
22:           ; break;
23:         end if
24:       end for
25:       ;
26:       ; ;
27:     end for
28:     Determine based on and in the lower layer, and in the upper layer, ;
29:     Store transition in D;
30:     Select random samples from D;
31:     Calculate and perform SGD to find the optimal weight of DNN, ;
32:     Update for DQN in the upper layer;
33:     ;
34:   end for
35: end for
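The building blocks that Algorithm 3 combines can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The names `epsilon_greedy`, `iterate_until_converged`, `decay_epsilon`, and `ReplayBuffer` are hypothetical. The lower-layer updates (47) and (48) are not reproduced here, so a generic `update` callable stands in for them. The stopping test is assumed to compare successive iterates against the convergence threshold, and the exploration schedule is assumed to be a multiplicative decay with rate d, floored at the minimum exploration rate.

```python
import random
from collections import deque


def epsilon_greedy(q_values, epsilon, rng):
    """Steps 10-14: explore with probability epsilon, otherwise act greedily."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # random action index
    # Greedy action: index of the largest estimated Q-value.
    return max(range(len(q_values)), key=q_values.__getitem__)


def iterate_until_converged(update, x0, threshold, max_iters):
    """Steps 18-24: repeat an update until successive iterates are close.

    `update` is a placeholder for the paper's rules (47) and (48);
    the stopping test |x_next - x| < threshold is an assumption.
    Returns the final iterate and the number of iterations used.
    """
    x = x0
    for k in range(1, max_iters + 1):
        x_next = update(x)
        if abs(x_next - x) < threshold:
            return x_next, k
        x = x_next
    return x, max_iters


def decay_epsilon(epsilon, eps_min, d):
    """Assumed schedule: multiply by decay rate d, never go below eps_min."""
    return max(eps_min, epsilon * d)


class ReplayBuffer:
    """Steps 29-30: store transitions and draw a random mini-batch."""

    def __init__(self, capacity):
        # A bounded deque drops the oldest transition when full.
        self.data = deque(maxlen=capacity)

    def store(self, transition):
        self.data.append(transition)

    def sample(self, batch_size, rng):
        # Never request more samples than are currently stored.
        k = min(batch_size, len(self.data))
        return rng.sample(list(self.data), k)
```

In a full training loop, each time step would call `epsilon_greedy` for the upper-layer action, run `iterate_until_converged` once per link for the lower layer, store the resulting transition, sample a mini-batch for the SGD update of the DQN weights, and finally apply `decay_epsilon`.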