Algorithm 2: MODDPG
Input: State space
Output: The action of the UAV
1: Initialize the Actor network parameters and the Target Actor network parameters.
2: Initialize the Critic network parameters and the Target Critic network parameters.
3: Initialize the experience replay buffer.
4: for each of the M episodes do
5:   Initialize .
6:   for each time step do
7:     repeat
8:       Choose an action with the specified probability.
9:       Perform the action and observe the reward and the next state.
10:      Store the transition in the replay buffer P.
11:      if the number of transitions in P ≥ batch size then
12:        Randomly sample a mini-batch of transitions from P.
13:        Evaluate Eq. (19).
14:        Update the Critic network by minimizing the critic loss (20).
15:        Update the Actor network by maximizing the actor loss (21).
16:        Soft-update the target network parameters.
17:
18:      end if
19:    until
20:  end for
21: end for
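The bookkeeping in steps 10 to 12 and 16 can be illustrated with a minimal sketch. Equations (19) to (21) are not reproduced in this listing, so the target below assumes the standard DDPG form y = r + γ·Q'(s', μ'(s')), which may differ from the paper's Eq. (19). The names `ReplayBuffer`, `td_target`, and `soft_update`, the discount `gamma`, and the rate `tau` are hypothetical, and parameters are plain lists of floats purely for illustration.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity store of (state, action, reward, next_state) tuples."""

    def __init__(self, capacity):
        # Oldest transitions are dropped once capacity is reached.
        self.storage = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state):
        # Step 10: store the transition in the buffer.
        self.storage.append((state, action, reward, next_state))

    def __len__(self):
        return len(self.storage)

    def sample(self, batch_size):
        # Step 12: uniformly sample a mini-batch without replacement.
        batch = random.sample(list(self.storage), batch_size)
        states, actions, rewards, next_states = zip(*batch)
        return list(states), list(actions), list(rewards), list(next_states)


def td_target(rewards, next_q_values, gamma=0.99):
    """Assumed standard DDPG target (not taken from the paper's Eq. (19)).

    next_q_values stands for the target critic evaluated at the next state
    and the target actor's action.
    """
    return [r + gamma * q for r, q in zip(rewards, next_q_values)]


def soft_update(target_params, online_params, tau=0.005):
    """Step 16 as Polyak averaging: target <- tau*online + (1 - tau)*target."""
    return [tau * w + (1.0 - tau) * w_t
            for w, w_t in zip(online_params, target_params)]
```

In a training loop, `sample` would only be called once `len(buffer)` reaches the batch size, mirroring the check in step 11, and `soft_update` would be applied to both the target actor and the target critic after each gradient step.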