Algorithm 2: Pseudocode for the DRL approach.
1:  Initialize the experience replay buffer $\mathcal{B}$.
2:  for each UAV $i$ in $N$ do
3:      Initialize the actor network $\mu_i$ with weights $\theta_i^{\mu}$.
4:      Initialize the critic network $Q_i$ with weights $\theta_i^{Q}$.
5:      Initialize the target actor network $\mu_i'$ with weights $\theta_i^{\mu'} \leftarrow \theta_i^{\mu}$.
6:      Initialize the target critic network $Q_i'$ with weights $\theta_i^{Q'} \leftarrow \theta_i^{Q}$.
7:  end for
8:  for each episode do
9:      Initialize the locations of the UAVs.
10:     The initial speed is zero for the UAVs, and their battery energy is set to full capacity.
11:     Initialize the environment.
12:     Receive the initial state $s_1$.
13:     for each time step $t$ in $T$ do
14:         for each UAV $i$ in $N$ do
15:             Select action $a_t^i = \mu_i(s_t \mid \theta_i^{\mu}) + \mathcal{N}_t$, where $\mathcal{N}_t$ is the noise term.
16:         end for
17:         UAVs execute their actions $a_t = (a_t^1, \ldots, a_t^N)$.
18:         Update next state $s_{t+1}$, and obtain reward $r_t$.
19:         for each UAV $i$ in $N$ do
20:             if UAV $i$ moves outside the region or close to other UAVs then
21:                 Find the corresponding penalty.
22:                 Neglect the new location and update the reward $r_t$.
23:             end if
24:         end for
25:         Update the next state $s_{t+1}$ accordingly.
26:         Store the transition $(s_t, a_t, r_t, s_{t+1})$ in the buffer.
27:         for each UAV $i$ in $N$ do
28:             Sample a random mini-batch of $L$ transitions $(s_j, a_j, r_j, s_{j+1})$ from the buffer.
29:             Find $y_j = r_j + \gamma\, Q_i'\big(s_{j+1}, \mu_i'(s_{j+1} \mid \theta_i^{\mu'}) \mid \theta_i^{Q'}\big)$, where $\gamma$ is the discount factor.
30:             Update the critic weights $\theta_i^{Q}$ by minimizing: $\frac{1}{L}\sum_j \big(y_j - Q_i(s_j, a_j \mid \theta_i^{Q})\big)^2$.
31:             Update the actor weights $\theta_i^{\mu}$ by minimizing: $-\frac{1}{L}\sum_j Q_i\big(s_j, \mu_i(s_j \mid \theta_i^{\mu}) \mid \theta_i^{Q}\big)$.
32:             Update the target networks' weights: $\theta_i^{\mu'} \leftarrow \tau\theta_i^{\mu} + (1-\tau)\theta_i^{\mu'}$ and $\theta_i^{Q'} \leftarrow \tau\theta_i^{Q} + (1-\tau)\theta_i^{Q'}$.
33:         end for
34:     end for
35: end for
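For concreteness, the following is a minimal PyTorch sketch of the per-UAV learning step described by Algorithm 2: action selection with exploration noise (line 15) and the critic, actor, and target-network updates (lines 28-32). It is not the authors' implementation; the network architecture, state/action dimensions, and hyperparameters (GAMMA, TAU, LR) are illustrative assumptions.

import numpy as np
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM = 8, 2        # assumed dimensions, for illustration only
GAMMA, TAU, LR = 0.99, 0.005, 1e-3  # assumed hyperparameters

def mlp(in_dim, out_dim):
    # Small fully connected network; the paper's architecture may differ.
    return nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(),
                         nn.Linear(128, 128), nn.ReLU(),
                         nn.Linear(128, out_dim))

class UAVAgent:
    """One UAV i with actor mu_i, critic Q_i, and their target copies (lines 3-6)."""
    def __init__(self):
        self.actor = mlp(STATE_DIM, ACTION_DIM)
        self.critic = mlp(STATE_DIM + ACTION_DIM, 1)
        self.target_actor = mlp(STATE_DIM, ACTION_DIM)
        self.target_critic = mlp(STATE_DIM + ACTION_DIM, 1)
        self.target_actor.load_state_dict(self.actor.state_dict())
        self.target_critic.load_state_dict(self.critic.state_dict())
        self.actor_opt = torch.optim.Adam(self.actor.parameters(), lr=LR)
        self.critic_opt = torch.optim.Adam(self.critic.parameters(), lr=LR)

    def act(self, state, noise_std=0.1):
        # Line 15: a_t^i = mu_i(s_t) + exploration noise.
        with torch.no_grad():
            a = self.actor(torch.as_tensor(state, dtype=torch.float32))
        return (a + noise_std * torch.randn_like(a)).numpy()

    def update(self, batch):
        # batch: list of (s, a, r, s_next) transitions sampled from the buffer (line 28).
        s, a, r, s_next = (torch.as_tensor(np.array(x), dtype=torch.float32)
                           for x in zip(*batch))
        # Line 29: y_j = r_j + gamma * Q_i'(s_{j+1}, mu_i'(s_{j+1})).
        with torch.no_grad():
            next_a = self.target_actor(s_next)
            y = r.unsqueeze(1) + GAMMA * self.target_critic(torch.cat([s_next, next_a], dim=1))
        # Line 30: update the critic by minimizing the mean squared error to y_j.
        critic_loss = nn.functional.mse_loss(self.critic(torch.cat([s, a], dim=1)), y)
        self.critic_opt.zero_grad(); critic_loss.backward(); self.critic_opt.step()
        # Line 31: update the actor by minimizing -Q_i(s_j, mu_i(s_j)).
        actor_loss = -self.critic(torch.cat([s, self.actor(s)], dim=1)).mean()
        self.actor_opt.zero_grad(); actor_loss.backward(); self.actor_opt.step()
        # Line 32: soft update of the target networks with rate tau.
        for net, target in ((self.actor, self.target_actor), (self.critic, self.target_critic)):
            for p, tp in zip(net.parameters(), target.parameters()):
                tp.data.mul_(1 - TAU).add_(TAU * p.data)

The outer loops of Algorithm 2 (lines 8-35) would then call act for every UAV, apply the out-of-region/proximity check before committing the new locations, store the joint transition in the shared buffer, and call update once per agent at each time step.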