Algorithm 1 The UAV Maneuver Decision-Making Algorithm for Airdrop Task.
  • Input:
    • The hyperparameters of the training networks: minibatch size k, network learning rate η;
    • The hyperparameters of the policy update: policy learning rate σ, learning period K, memory capacity N, soft-update coefficient τ;
    • The hyperparameters of sampling: the priority exponent of PER α, the importance sampling (IS) exponent β;
    • The control parameters of the simulation: maximum number of periods M, maximum number of steps per period T.
  • Output:
    • Critic network Q(s, a; θ^Q) and its target network Q′(s, a; θ^{Q′});
    • Actor network μ(s; θ^μ) and its target network μ′(s; θ^{μ′}).
  • 1: Initialize Q(s, a; θ^Q), μ(s; θ^μ) and their target networks Q′(s, a; θ^{Q′}), μ′(s; θ^{μ′}).
  • 2: for m = 1 to M do
  • 3:   Reset the environment and read the initial state s_0.
  • 4:   Output a_0 according to Equation (18).
  • 5:   for t = 1 to T do
  • 6:     Observe the current state s_t and the reward r_t of the environment and calculate the current action a_t according to Equation (18).
  • 7:     Save the current transition (s_t, a_t, r_t, s_{t+1}) into the experience memory D.
  • 8:     if t mod K = 0 then
  • 9:       Reset the accumulated IS-weighted gradient Δ = 0 of Q(s_j, a_j; θ^Q).
  • 10:      for j = 1 to k do
  • 11:        Sample training data j ~ P(j) according to Equation (27).
  • 12:        Calculate the IS weight ω_j according to Equation (31).
  • 13:        Calculate the TD-error δ_j of the training data according to Equation (22) and update its priority according to Equation (28).
  • 14:        Accumulate Δ according to Equation (30).
  • 15:      end for
  • 16:      Update the parameters of Q(s_j, a_j; θ^Q) with Δ and learning rate η.
  • 17:      Update the parameters of μ(s; θ^μ) according to Equation (26).
  • 18:      Update the parameters of the target networks Q′(s, a; θ^{Q′}) and μ′(s; θ^{μ′}) according to Equation (32).
  • 19:     end if
  • 20:   end for
  • 21: end for
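
As a concrete reading of steps 10–14, the Python sketch below implements a minimal proportional prioritized-replay buffer. It is an illustration under the standard PER conventions (Schaul et al.), not the implementation behind Algorithm 1: the sampling probability P(j) ∝ p_j^α stands in for Equation (27), the normalized IS weight ω_j = (N·P(j))^(−β) for Equation (31) (with N taken here as the current number of stored transitions), and the priority refresh p_j = |δ_j| + ε for Equation (28); the class name, the constant ε, and the default exponents are illustrative choices.

    import numpy as np

    class PrioritizedReplayBuffer:
        """Minimal proportional PER buffer (illustrative sketch, not the paper's code)."""

        def __init__(self, capacity, alpha=0.6, eps=1e-5):
            self.capacity = capacity          # memory capacity N
            self.alpha = alpha                # priority exponent of PER
            self.eps = eps                    # keeps priorities strictly positive
            self.data = [None] * capacity
            self.priorities = np.zeros(capacity, dtype=np.float64)
            self.pos = 0
            self.size = 0

        def store(self, transition):
            """Save (s_t, a_t, r_t, s_{t+1}) with maximal priority so it is replayed at least once."""
            max_p = self.priorities[: self.size].max() if self.size > 0 else 1.0
            self.data[self.pos] = transition
            self.priorities[self.pos] = max_p
            self.pos = (self.pos + 1) % self.capacity
            self.size = min(self.size + 1, self.capacity)

        def sample(self, k, beta=0.4):
            """Draw a minibatch of size k with P(j) ∝ p_j^α and return the IS weights ω_j."""
            p = self.priorities[: self.size] ** self.alpha
            P = p / p.sum()                                   # Eq. (27): sampling probability
            idx = np.random.choice(self.size, size=k, p=P)
            weights = (self.size * P[idx]) ** (-beta)         # Eq. (31): IS correction
            weights /= weights.max()                          # normalize for stability
            batch = [self.data[j] for j in idx]
            return batch, idx, weights

        def update_priorities(self, idx, td_errors):
            """Eq. (28): refresh priorities from the TD-errors δ_j of the sampled data."""
            self.priorities[idx] = np.abs(td_errors) + self.eps

A minibatch drawn with sample(k, beta) returns both the transitions and the weights ω_j needed for the accumulation in step 14; once the TD-errors δ_j are computed, update_priorities closes the loop of step 13.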
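
Steps 16–18 combine the IS-weighted critic update with the "soft" target update of Equation (32), θ′ ← τθ + (1 − τ)θ′. The short PyTorch fragment below sketches both operations under that reading; the function names and the mean reduction of the weighted squared TD-error are assumptions for illustration, not the authors' code.

    import torch

    def soft_update(target_net, online_net, tau):
        # Eq. (32): θ' ← τ·θ + (1 − τ)·θ', applied parameter-wise to the target network.
        with torch.no_grad():
            for p_target, p_online in zip(target_net.parameters(), online_net.parameters()):
                p_target.mul_(1.0 - tau).add_(tau * p_online)

    def weighted_critic_loss(td_errors, is_weights):
        # IS-weighted squared TD-error whose gradient plays the role of the accumulated Δ
        # (Eqs. (22) and (30)); minimizing it with learning rate η corresponds to step 16.
        return (is_weights * td_errors.pow(2)).mean()

In a training loop, soft_update would be called once per learning period K for both the critic and the actor target networks, matching step 18 of Algorithm 1.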