Algorithm 2: Pseudocode for the DRL approach.
1:  Initialize the experience replay buffer $\mathcal{B}$.
2:  for each UAV $i$ in $N$ do
3:      Initialize the actor network $\mu_i$ with weights $\theta_i^{\mu}$.
4:      Initialize the critic network $Q_i$ with weights $\theta_i^{Q}$.
5:      Initialize the target actor network $\mu_i'$ with weights $\theta_i^{\mu'} \leftarrow \theta_i^{\mu}$.
6:      Initialize the target critic network $Q_i'$ with weights $\theta_i^{Q'} \leftarrow \theta_i^{Q}$.
7:  end for
8:  for each episode do
9:      Initialize the locations of the UAVs.
10:     The initial speed is zero for the UAVs, and their battery energy is set to full capacity.
11:     Initialize the environment.
12:     Receive the initial state $s_1$.
13:     for each time step $t$ in $T$ do
14:         for each UAV $i$ in $N$ do
15:             Select action $a_t^i = \mu_i(s_t \mid \theta_i^{\mu}) + \mathcal{N}_t$, where $\mathcal{N}_t$ is the noise term.
16:         end for
17:         UAVs execute their actions $a_t = (a_t^1, \ldots, a_t^N)$.
18:         Update next state $s_{t+1}$, and obtain reward $r_t$.
19:         for each UAV $i$ in $N$ do
20:             if UAV $i$ moves outside the region or close to other UAVs then
21:                 Find the corresponding penalty.
22:                 Neglect the new location and update the reward $r_t$.
23:             end if
24:         end for
25:         Update the next state $s_{t+1}$ accordingly.
26:         Store the transition $(s_t, a_t, r_t, s_{t+1})$ in the buffer.
27:         for each UAV $i$ in $N$ do
28:             Sample a random mini-batch of $L$ transitions $(s_j, a_j, r_j, s_{j+1})$ from the buffer.
29:             Find $y_j = r_j + \gamma\, Q_i'\big(s_{j+1}, \mu_i'(s_{j+1} \mid \theta_i^{\mu'}) \mid \theta_i^{Q'}\big)$, where $\gamma$ is the discount factor.
30:             Update the critic weights $\theta_i^{Q}$ by minimizing: $\frac{1}{L}\sum_j \big(y_j - Q_i(s_j, a_j \mid \theta_i^{Q})\big)^2$.
31:             Update the actor weights $\theta_i^{\mu}$ by minimizing: $-\frac{1}{L}\sum_j Q_i\big(s_j, \mu_i(s_j \mid \theta_i^{\mu}) \mid \theta_i^{Q}\big)$.
32:             Update the target networks' weights: $\theta_i^{\mu'} \leftarrow \tau\theta_i^{\mu} + (1-\tau)\theta_i^{\mu'}$ and $\theta_i^{Q'} \leftarrow \tau\theta_i^{Q} + (1-\tau)\theta_i^{Q'}$.
33:         end for
34:     end for
35: end for
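For concreteness, the following is a minimal PyTorch sketch of the per-UAV learning step described by Algorithm 2: action selection with exploration noise (line 15) and the critic, actor, and target-network updates (lines 28-32). It is not the authors' implementation; the network architecture, state/action dimensions, and hyperparameters (GAMMA, TAU, LR) are illustrative assumptions.

import numpy as np
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM = 8, 2        # assumed dimensions, for illustration only
GAMMA, TAU, LR = 0.99, 0.005, 1e-3  # assumed hyperparameters

def mlp(in_dim, out_dim):
    # Small fully connected network; the paper's architecture may differ.
    return nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(),
                         nn.Linear(128, 128), nn.ReLU(),
                         nn.Linear(128, out_dim))

class UAVAgent:
    """One UAV i with actor mu_i, critic Q_i, and their target copies (lines 3-6)."""
    def __init__(self):
        self.actor = mlp(STATE_DIM, ACTION_DIM)
        self.critic = mlp(STATE_DIM + ACTION_DIM, 1)
        self.target_actor = mlp(STATE_DIM, ACTION_DIM)
        self.target_critic = mlp(STATE_DIM + ACTION_DIM, 1)
        self.target_actor.load_state_dict(self.actor.state_dict())
        self.target_critic.load_state_dict(self.critic.state_dict())
        self.actor_opt = torch.optim.Adam(self.actor.parameters(), lr=LR)
        self.critic_opt = torch.optim.Adam(self.critic.parameters(), lr=LR)

    def act(self, state, noise_std=0.1):
        # Line 15: a_t^i = mu_i(s_t) + exploration noise.
        with torch.no_grad():
            a = self.actor(torch.as_tensor(state, dtype=torch.float32))
        return (a + noise_std * torch.randn_like(a)).numpy()

    def update(self, batch):
        # batch: list of (s, a, r, s_next) transitions sampled from the buffer (line 28).
        s, a, r, s_next = (torch.as_tensor(np.array(x), dtype=torch.float32)
                           for x in zip(*batch))
        # Line 29: y_j = r_j + gamma * Q_i'(s_{j+1}, mu_i'(s_{j+1})).
        with torch.no_grad():
            next_a = self.target_actor(s_next)
            y = r.unsqueeze(1) + GAMMA * self.target_critic(torch.cat([s_next, next_a], dim=1))
        # Line 30: update the critic by minimizing the mean squared error to y_j.
        critic_loss = nn.functional.mse_loss(self.critic(torch.cat([s, a], dim=1)), y)
        self.critic_opt.zero_grad(); critic_loss.backward(); self.critic_opt.step()
        # Line 31: update the actor by minimizing -Q_i(s_j, mu_i(s_j)).
        actor_loss = -self.critic(torch.cat([s, self.actor(s)], dim=1)).mean()
        self.actor_opt.zero_grad(); actor_loss.backward(); self.actor_opt.step()
        # Line 32: soft update of the target networks with rate tau.
        for net, target in ((self.actor, self.target_actor), (self.critic, self.target_critic)):
            for p, tp in zip(net.parameters(), target.parameters()):
                tp.data.mul_(1 - TAU).add_(TAU * p.data)

The outer loops of Algorithm 2 (lines 8-35) would then call act for every UAV, apply the out-of-region/proximity check before committing the new locations, store the joint transition in the shared buffer, and call update once per agent at each time step.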