Algorithm 1 Proposed deep Q-learning algorithm
1:  Initialize the policy and target DQNs with random weights w
2:  Initialize the experience replay memory (ERM)
3:  for each episode do
4:      for each instance (time step t) do
5:          Select a channel matrix (action a_t) and add it to the action space for the present state s_t
6:          Observe the immediate reward r_t and the next state s_{t+1}
7:          Put the tuple (s_t, a_t, r_t, s_{t+1}) into the ERM
8:          Form a random mini-batch of tuples sampled from the ERM
9:          for each tuple in the mini-batch do
10:             Calculate the Q-values using the policy DQN
11:             Approximate the target Q-values using the target DQN
12:         Compute the loss from the Q-values and the target Q-values
13:         Optimize the weights w of the policy DQN with the Adam optimizer
14: Update the target DQN weights after all time steps
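The loop above can be sketched as a minimal NumPy implementation. Everything beyond the algorithm's structure is an illustrative assumption: the linear single-layer Q-network standing in for the DQNs, the toy environment and reward, the epsilon-greedy exploration rule, and all hyperparameter values.

```python
import random
from collections import deque

import numpy as np

rng = np.random.default_rng(0)

STATE_DIM, N_ACTIONS = 4, 3          # assumed toy dimensions
GAMMA, LR, BATCH = 0.9, 1e-2, 8      # assumed hyperparameters

# Step 1: policy and target networks with random weights w.
# A single linear layer is a minimal stand-in for the DQNs.
w_policy = rng.normal(scale=0.1, size=(STATE_DIM, N_ACTIONS))
w_target = w_policy.copy()

# Step 2: experience replay memory (ERM).
erm = deque(maxlen=1000)

# Adam optimizer state (step 13).
m, v, t_adam = np.zeros_like(w_policy), np.zeros_like(w_policy), 0

def q_values(w, s):
    return s @ w                     # Q(s, .) for a batch or single state

def adam_step(w, grad):
    global m, v, t_adam
    b1, b2, eps = 0.9, 0.999, 1e-8
    t_adam += 1
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t_adam)
    v_hat = v / (1 - b2 ** t_adam)
    return w - LR * m_hat / (np.sqrt(v_hat) + eps)

def toy_env_step(s, a):
    # Hypothetical stand-in for the channel-selection environment:
    # action 0 yields reward 1, the next state is random.
    return (1.0 if a == 0 else 0.0), rng.normal(size=STATE_DIM)

for episode in range(20):                          # step 3
    s = rng.normal(size=STATE_DIM)
    for t in range(10):                            # step 4
        # Step 5: choose an action (channel) epsilon-greedily.
        if rng.random() < 0.1:
            a = int(rng.integers(N_ACTIONS))
        else:
            a = int(np.argmax(q_values(w_policy, s)))
        r, s_next = toy_env_step(s, a)             # step 6
        erm.append((s, a, r, s_next))              # step 7
        s = s_next

        if len(erm) < BATCH:
            continue
        batch = random.sample(list(erm), BATCH)    # step 8
        S = np.stack([b[0] for b in batch])
        A = np.array([b[1] for b in batch])
        R = np.array([b[2] for b in batch])
        S2 = np.stack([b[3] for b in batch])

        q = q_values(w_policy, S)                  # step 10
        y = R + GAMMA * q_values(w_target, S2).max(axis=1)  # step 11
        # Step 12: MSE loss on the taken actions; gradient w.r.t. w.
        err = q[np.arange(BATCH), A] - y
        grad = np.zeros_like(w_policy)
        for i in range(BATCH):
            grad[:, A[i]] += 2.0 * err[i] * S[i] / BATCH
        w_policy = adam_step(w_policy, grad)       # step 13
    w_target = w_policy.copy()                     # step 14
```

The target network is refreshed only after all time steps of an episode (step 14), which keeps the regression targets in step 11 fixed within an episode and stabilizes training.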