Algorithm 1 DDPG-based Optimization Framework for MEC-enabled Blockchain IoT Systems.
1: for each GN do
2:     Initialization: replay memory, critic network, actor network, and the corresponding target networks with their weights;
3: end for
4: for each episode do
5:     Initialization: state for each GN;
6:     for each decision epoch do
7:         for each GN do
8:             Select action based on the exploration noise to decide block interval and power allocation;
9:             Observe reward and next state;
10:             Store transition data into replay memory;
11:             Sample a mini-batch of Z transition tuples from memory at random;
12:             Update critic network by minimizing the loss L:
13:             Update actor policy based on the sampled policy gradient:
14:
15:         end for
16:     end for
17: end for
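The per-GN bookkeeping in steps 8, 10, 11, and 12 can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `ReplayMemory`, `select_action`, and `critic_loss` are hypothetical, Gaussian exploration noise and box-shaped action bounds are assumptions (the listing only says "exploration noise"), and the loss uses the standard DDPG target y = r + γ·Q'(s', μ'(s')), which the listing does not spell out. The actor update in step 13 is omitted because it requires gradients through the critic, which is normally delegated to an automatic-differentiation library.

```python
import random
from collections import deque


class ReplayMemory:
    """Fixed-capacity buffer of (state, action, reward, next_state) tuples."""

    def __init__(self, capacity):
        # A bounded deque drops the oldest transition once capacity is reached.
        self.buffer = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state):
        # Step 10: store one transition.
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        # Step 11: uniform random mini-batch, without replacement.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def select_action(actor, state, noise_std, low, high, rng=random):
    """Step 8: actor output plus Gaussian noise (assumed), clipped to bounds.

    In the listing the action has two parts, block interval and power
    allocation; here it is simply a list of floats.
    """
    raw = actor(state)
    return [min(max(a + rng.gauss(0.0, noise_std), lo), hi)
            for a, lo, hi in zip(raw, low, high)]


def critic_loss(batch, critic, target_critic, target_actor, gamma):
    """Step 12: mean squared TD error over a mini-batch (standard DDPG form)."""
    total = 0.0
    for state, action, reward, next_state in batch:
        # Target value computed from the target networks.
        y = reward + gamma * target_critic(next_state, target_actor(next_state))
        total += (y - critic(state, action)) ** 2
    return total / len(batch)


# Usage with placeholder callables standing in for trained networks.
memory = ReplayMemory(capacity=1000)
actor = lambda s: [0.5, 0.2]        # hypothetical: [block interval, power]
critic = lambda s, a: 0.0           # hypothetical Q(s, a)
state = [0.0]
action = select_action(actor, state, noise_std=0.1, low=[0.0, 0.0], high=[1.0, 1.0])
memory.store(state, action, reward=1.0, next_state=[1.0])
loss = critic_loss(memory.sample(1), critic, critic, actor, gamma=0.99)
```

Keeping one `ReplayMemory` per GN mirrors the outer "for each GN" loops in the listing; whether GNs share experience is not stated there, so this sketch keeps them independent.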