Algorithm 1 The single-layer DQN-based MRA training algorithm.

1: (Input) , batch size , learning rate , minimum exploration rate , discount factor , and exploration decay rate d;
2: (Output) Learned DQN to decide , for (7);
3: Initialize action and replay buffer D;
4: for episode = 1 to do
5:     Initialize state ;
6:     for time to do
7:         Observe current state ;
8:         ;
9:         if random number is below the exploration rate then
10:            Select an action at random;
11:        else
12:            Select the action with the highest Q-value;
13:        end if
14:        Observe next state ;
15:        Store transition in D, where the reward is obtained with (23);
16:        Randomly select stored samples from D for experience replay;
17:        Obtain the target for all j samples with (13);
18:        Perform SGD to minimize the loss in (14) for finding the optimal weight of the DNN, ;
19:        Update the weight in the DQN;
20:        ;
21:    end for
22: end for
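The control flow of Algorithm 1 can be illustrated with a minimal Python sketch. It is not the authors' implementation. Equations (13), (14), and (23) are not reproduced in this section, so the sketch assumes the standard DQN target r + γ max Q(s', a'), a squared TD-error loss, and a made-up reward. A tabular Q array stands in for the DNN, the exploration rate is assumed to decay multiplicatively down to its minimum, and every name (`select_action`, `td_target`, `decay_epsilon`, `train`, `toy_env`) is hypothetical.

```python
import random
from collections import deque


def select_action(q_row, epsilon, rng):
    """Epsilon-greedy: explore with probability epsilon, else act greedily."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return max(range(len(q_row)), key=lambda a: q_row[a])


def td_target(reward, next_q_row, gamma):
    """Assumed standard DQN target: r + gamma * max_a' Q(s', a')."""
    return reward + gamma * max(next_q_row)


def decay_epsilon(epsilon, eps_min, d):
    """Assumed schedule: multiplicative decay, floored at eps_min."""
    return max(eps_min, epsilon * d)


def train(env_step, n_states, n_actions, episodes=200, steps=20,
          batch_size=8, lr=0.1, eps_min=0.05, gamma=0.9, d=0.99, seed=0):
    rng = random.Random(seed)
    # Tabular stand-in for the DNN weights (assumption, not the paper's model).
    q = [[0.0] * n_actions for _ in range(n_states)]
    buffer = deque(maxlen=1000)  # replay buffer D
    epsilon = 1.0
    for _ in range(episodes):
        state = 0  # initialize state
        for _ in range(steps):
            epsilon = decay_epsilon(epsilon, eps_min, d)
            action = select_action(q[state], epsilon, rng)
            next_state, reward = env_step(state, action)
            buffer.append((state, action, reward, next_state))
            # Sample a mini-batch of stored transitions for experience replay.
            batch = rng.sample(list(buffer), min(batch_size, len(buffer)))
            for s, a, r, s2 in batch:
                y = td_target(r, q[s2], gamma)
                # One SGD step on the squared TD error 0.5 * (y - Q(s, a))^2.
                q[s][a] += lr * (y - q[s][a])
            state = next_state
    return q, epsilon


def toy_env(state, action):
    """Hypothetical 2-state environment: action 1 earns reward 1, action 0 earns 0."""
    return (state + 1) % 2, float(action == 1)


if __name__ == "__main__":
    q_table, final_eps = train(toy_env, n_states=2, n_actions=2)
    print(q_table, final_eps)
```

In a real DQN the inner update would be replaced by a gradient step on the network weights. The tabular update is used here only so that the sketch runs with the standard library alone.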