Algorithm 1: DIDDPG-based UAV formation algorithm.
1: Initialize system parameters and the replay memory buffer R.
2: Randomly initialize , , and .
3: Initialize the online actor and critic networks.
4: for each episode do
5:   Initialize the random noise and the initial state.
6:   for each time step do
7:     Update the action.
8:     Update the next state based on (11).
9:     Derive the reward by (12).
10:     Store the transition in R.
11:     Randomly select a mini-batch of K experience samples from R.
12:     Update the target Q value based on (15).
13:     Update the online critic network by minimizing the mean quadratic error function based on (14).
14:     Update the online actor network by the sampled policy gradient given by (16).
15:     Update the target networks:
16:
17:   end for
18: end for
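The bookkeeping steps of Algorithm 1 (storing transitions, sampling a mini-batch of K samples, forming the target Q value, and updating the target networks) can be illustrated with a minimal sketch. It assumes the conventional DDPG forms, namely a target of the form y = r + γQ'(s', μ'(s')) and a soft target update θ' ← τθ + (1 − τ)θ'. Equations (14)-(16) are not reproduced in this section, so the DIDDPG variants may differ. The names `ReplayBuffer`, `td_targets`, `soft_update`, `gamma`, and `tau` are hypothetical, and network parameters are represented as flat lists of floats purely for illustration.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity replay memory R (hypothetical helper, not from the paper)."""

    def __init__(self, capacity, seed=None):
        # A bounded deque discards the oldest transition once capacity is reached.
        self.memory = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def store(self, state, action, reward, next_state):
        """Store one transition (step 10)."""
        self.memory.append((state, action, reward, next_state))

    def sample(self, k):
        """Uniformly sample k distinct stored transitions (step 11)."""
        return self.rng.sample(list(self.memory), k)

    def __len__(self):
        return len(self.memory)


def td_targets(rewards, next_q_values, gamma=0.99):
    """Assumed standard DDPG target: y = r + gamma * Q'(s', mu'(s')) (step 12)."""
    return [r + gamma * q for r, q in zip(rewards, next_q_values)]


def soft_update(target_params, online_params, tau=0.005):
    """Assumed standard soft update: theta' <- tau*theta + (1 - tau)*theta' (steps 15-16)."""
    return [tau * w + (1.0 - tau) * w_t
            for w, w_t in zip(online_params, target_params)]
```

In a full training loop, `next_q_values` would come from evaluating the target critic on the sampled next states and the target actor's actions. The critic and actor gradient steps (steps 13 and 14) depend on the network framework and are omitted here.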