Algorithm 2: DRL-Based Multi-Modal Data Collection Algorithm

1: Input: Initialize the constants, the maximum number of training episodes E, the reward discount factor, the learning rate, the experience replay buffer, the mini-batch size, the exploration probability, and the target-network update step.
2: Initialize the current network and the target network with their respective weights.
3: for … do
4:   for … do
5:     Initialize the data collection network environment and observe the initial state.
6:     Select an action according to the ε-greedy algorithm.
7:     Determine the AUV steering angle with Algorithm 1.
8:     Execute the action and observe the reward and the next state.
9:     Store the experience in the experience replay buffer.
10:    Sample a random mini-batch of experiences from the experience replay buffer.
11:    Calculate the target value by (26).
12:    Update the current network weights by (27).
13:    Update the weights of the target network once every update-step interval.
14:    if … is the collection stop then
15:      Remove the CH from …
16:    end if
17:    Terminate the episode if … holds.
18:  end for
19: end for
20: Output: The AUV trajectory and the AoI.
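The core training steps of Algorithm 2 (ε-greedy selection, experience replay, target computation, and periodic target-network updates) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: a tabular Q-function stands in for the deep networks, a five-state toy chain stands in for the data collection environment, and the names `epsilon_greedy`, `td_target`, and `train` are hypothetical. The target and update rules are assumed to take the standard DQN form, which may differ from (26) and (27). The steering-angle step (Algorithm 1) and the CH removal step are not modeled.

```python
import random
from collections import deque


def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon pick a random action, otherwise the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def td_target(reward, next_q_values, gamma, done):
    """Assumed standard DQN target: r + gamma * max_a' Q_target(s', a')."""
    return reward if done else reward + gamma * max(next_q_values)


def train(num_episodes=200, max_steps=20, gamma=0.9, lr=0.1, epsilon=0.2,
          batch_size=8, sync_every=10, seed=0):
    rng = random.Random(seed)
    # Toy chain (assumption): action 1 moves right, action 0 moves left,
    # and reaching the last state ends the episode with reward 1.
    n_states, n_actions = 5, 2
    q = [[0.0] * n_actions for _ in range(n_states)]   # stands in for the current network
    q_target = [row[:] for row in q]                   # stands in for the target network
    replay = deque(maxlen=500)                         # experience replay buffer
    step = 0
    for _ in range(num_episodes):
        s = 0                                          # initial state
        for _ in range(max_steps):
            a = epsilon_greedy(q[s], epsilon, rng)
            s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
            done = s2 == n_states - 1
            r = 1.0 if done else -0.01
            replay.append((s, a, r, s2, done))         # store the experience
            batch = rng.sample(list(replay), min(batch_size, len(replay)))
            for bs, ba, br, bs2, bdone in batch:
                y = td_target(br, q_target[bs2], gamma, bdone)
                q[bs][ba] += lr * (y - q[bs][ba])      # step toward the target value
            step += 1
            if step % sync_every == 0:                 # periodic target update
                q_target = [row[:] for row in q]
            s = s2
            if done:                                   # terminate the episode
                break
    return q
```

Calling `train()` returns the learned table; in this toy setting the entry for moving right in the state next to the goal should end up larger than the entry for moving left. Keeping a separate `q_target` copy means the targets change only at each synchronization, which is the usual motivation for a target network.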