Algorithm 1 DQN algorithm.
1: Initialize the replay memory pool
2: Initialize the neural network weights and the target network weights
3: for episode i = 1, M do
4:   for each time step do
5:     Obtain the current state from the environment
6:     Randomly select an action, or determine the action with the highest estimated Q-value
7:     Observe the reward for the action and obtain the next state
8:     Store the transition in memory
9:     Randomly sample a mini-batch of transitions from memory
10:     Update the evaluation network by performing a gradient descent step
11:     Update the target network after C steps
12:   end for
13: end for
14: Output: offloading decision strategy.
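The loop in Algorithm 1 can be illustrated with a minimal sketch in plain Python. Everything specific here is an assumption made for illustration and is not taken from the paper: the `ToyEnv` environment, its reward rule, the problem sizes, and all hyperparameter values are hypothetical. A weight table (equivalent to a linear layer over one-hot states) stands in for the neural network, and the offloading model itself is not reproduced. Comments refer to the step numbers of Algorithm 1.

```python
import random
from collections import deque

N_STATES, N_ACTIONS = 4, 2  # hypothetical toy problem sizes


class ToyEnv:
    """Hypothetical environment: reward 1 when the action equals state % 2."""

    def __init__(self, rng):
        self.rng = rng
        self.state = 0

    def reset(self):
        self.state = self.rng.randrange(N_STATES)
        return self.state

    def step(self, action):
        reward = 1.0 if action == self.state % 2 else 0.0
        self.state = self.rng.randrange(N_STATES)  # next state is drawn at random
        return self.state, reward


def train_dqn(episodes=200, steps=20, gamma=0.5, lr=0.5, eps=0.2,
              batch_size=16, target_every=10, seed=0):
    rng = random.Random(seed)
    env = ToyEnv(rng)
    memory = deque(maxlen=1000)                            # step 1: replay memory pool
    theta = [[0.0] * N_ACTIONS for _ in range(N_STATES)]   # step 2: evaluation weights
    theta_target = [row[:] for row in theta]               # step 2: target weights
    step_count = 0
    for _ in range(episodes):                              # step 3
        s = env.reset()
        for _ in range(steps):                             # step 4; s is the state of step 5
            if rng.random() < eps:                         # step 6: random action ...
                a = rng.randrange(N_ACTIONS)
            else:                                          # ... or greedy action
                a = max(range(N_ACTIONS), key=lambda k: theta[s][k])
            s_next, r = env.step(a)                        # step 7
            memory.append((s, a, r, s_next))               # step 8
            batch = rng.sample(list(memory), min(batch_size, len(memory)))  # step 9
            # Step 10: gradient of the mean squared TD error w.r.t. the weights.
            grad = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
            for bs, ba, br, bs_next in batch:
                target = br + gamma * max(theta_target[bs_next])
                grad[bs][ba] -= (target - theta[bs][ba]) / len(batch)
            for i in range(N_STATES):
                for j in range(N_ACTIONS):
                    theta[i][j] -= lr * grad[i][j]         # gradient descent step
            step_count += 1
            if step_count % target_every == 0:             # step 11 (C = target_every)
                theta_target = [row[:] for row in theta]
            s = s_next
    return theta


theta = train_dqn()
# Step 14: the learned strategy is the greedy action in each state.
policy = [max(range(N_ACTIONS), key=lambda k: theta[s][k]) for s in range(N_STATES)]
print(policy)
```

Keeping a separate copy of the weights for the target and refreshing it only every `target_every` steps holds the bootstrap target fixed between refreshes, which is the role of the C-step update in Algorithm 1.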