|
Algorithm 2 Duelling DQN-based algorithm for the POSG. |
|
Input: replay memory pool
1: Initialize the online network weights and the target network weights;
2: for each episode do
3:   Obtain the initial state from the environment;
4:   for each step do
5:     for each PN q do
6:       Obtain the newly assigned tasks;
7:       Obtain the previous computing capacity allocation state;
8:       Obtain the current local observation;
9:       Select an action based on the current local observation;
10:      Execute the computing capacity allocation and obtain the local utility;
11:      Obtain the utility from the other PNs and the subsequent local observation;
12:      Store the transition tuple in the replay memory pool;
13:      Randomly sample a mini-batch from the replay memory pool;
14:      Update the network weights by performing gradient descent;
15:      Update the target network parameters after C steps;
16:    end for
17:  end for
18: end for
19: Output: computing capacity allocation strategy.
|
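The building blocks of steps 9 and 12 to 15 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the listing does not show its action-selection rule, so epsilon-greedy is assumed; the value and advantage streams are combined with the mean-advantage baseline that is standard for dueling networks; and the names `dueling_q_values`, `select_action`, `ReplayMemory`, `td_target`, `should_sync_target`, as well as the discount factor `gamma`, are hypothetical.

```python
import random
from collections import deque


def dueling_q_values(state_value, advantages):
    """Combine V(s) and A(s, a) into Q(s, a) = V + A - mean(A)."""
    mean_adv = sum(advantages) / len(advantages)
    return [state_value + a - mean_adv for a in advantages]


def select_action(q_values, epsilon, rng):
    """Step 9 (assumed rule): epsilon-greedy over the Q-values."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore
    return max(range(len(q_values)), key=q_values.__getitem__)  # exploit


class ReplayMemory:
    """Steps 12-13: fixed-capacity pool of transition tuples."""

    def __init__(self, capacity):
        # A bounded deque drops the oldest transition when full.
        self.buffer = deque(maxlen=capacity)

    def store(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size, rng):
        # Uniform sampling without replacement, capped at the pool size.
        k = min(batch_size, len(self.buffer))
        return rng.sample(list(self.buffer), k)

    def __len__(self):
        return len(self.buffer)


def td_target(reward, next_q_target, gamma, done):
    """Step 14: regression target computed from the target network's Q-values."""
    if done:
        return reward
    return reward + gamma * max(next_q_target)


def should_sync_target(step, sync_every):
    """Step 15: copy online weights to the target network every C steps."""
    return step % sync_every == 0


if __name__ == "__main__":
    rng = random.Random(0)
    memory = ReplayMemory(capacity=100)
    q = dueling_q_values(0.5, [0.1, 0.4, -0.2])
    action = select_action(q, epsilon=0.1, rng=rng)
    # Transition layout (observation, action, utility, next observation)
    # is an assumption made for this sketch.
    memory.store(("obs", action, 1.0, "next_obs"))
    print(action, memory.sample(1, rng))
```

In a full agent, the gradient step of line 14 would minimize the squared difference between the online network's Q-value for the stored action and `td_target`, with the reward taken as the local utility combined with the utility received from the other PNs.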