AUV-Aided Optical—Acoustic Hybrid Data Collection Based on Deep Reinforcement Learning

2023 Jan 4;23(2):578. doi: 10.3390/s23020578

Algorithm 2 DRL-Based Multi-Modal Data Collection Algorithm

1: Input: Initialize the constants $k_{1}$, $k_{2}$, and $k_{3}$, the maximum number of training episodes $E$, the reward discount factor $γ$, the learning rate $l_{r}$, the experience replay buffer $B$, the mini-batch size $Φ_{b}$, the exploration probability $ϵ$, and the target-network update step $χ$.
2: Initialize the current network $Q(s_{t}, a_{t}, θ)$ with weights $θ$ and the target network $Q(s_{t}, a_{t}, θ^{-})$ with weights $θ^{-}$.
3: for $episode = 1, \dots, E$ do
4:     for $t = 1, \dots, T$ do
5:         Initialize the data collection network environment and observe the initial state $s_{t}$.
6:         Select an action $a_{t}$ according to the $ϵ$-greedy policy (a random action with probability $ϵ$, otherwise the action with the largest Q-value).
7:         Determine the AUV steering angle with Algorithm 1.
8:         Execute action $a_{t}$ and observe the reward $r_{t}$ and the next state $s_{t + 1}$.
9:         Store the experience $(s_{t}, a_{t}, r_{t}, s_{t + 1})$ in the experience replay buffer $B$.
10:         Sample a random mini-batch of $Φ_{b}$ experiences from $B$.
11:         Calculate the target value $y_{t}$ by (26).
12:         Update the current network weights $θ$ by (27).
13:         Update the target network weights $θ^{-} = θ$ every $χ$ steps.
14:         if $s_{t + 1}$ is the collection stop $n_{i}$ then
15:             Remove the CH $c_{i}$ from $N_{r}$.
16:         end if
17:         Terminate the episode if $N_{r} = \emptyset$ holds.
18:     end for
19: end for
20: Output: The AUV trajectory $p_{a, t}$ and the AoI $A_{i}$.
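To make the control flow of Algorithm 2 concrete, the following minimal Python sketch walks through the same steps on a toy problem. It is an illustration under several assumptions, not the implementation used in this work: the two Q-networks are replaced by lookup tables (`theta`, `theta_target`), the steering step of Algorithm 1 is replaced by a left/right move on a hypothetical one-dimensional line, the reward values (-1 per move, +10 per collected stop) are invented stand-ins for the AoI-based reward, and the environment is reset once per episode. Equations (26) and (27) are not reproduced in this section, so the sketch assumes the standard DQN target $y_{t} = r_{t} + γ \max_{a'} Q(s_{t+1}, a', θ^{-})$ and a plain temporal-difference step toward it. The names `step`, `train`, and `greedy_rollout` are hypothetical.

```python
import random
from collections import deque

ACTIONS = (-1, +1)  # hypothetical 1-D moves standing in for the AUV steering angle


def epsilon_greedy(q_values, epsilon, rng):
    """Step 6: random action with probability epsilon, otherwise the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)


def td_target(reward, next_q, gamma, done):
    """Step 11, assuming the standard DQN target (the paper's Eq. (26) is not shown here)."""
    return reward if done else reward + gamma * max(next_q)


def step(pos, remaining, action, length):
    """Toy environment (assumption): move on a line and collect a stop on arrival."""
    pos = min(max(pos + ACTIONS[action], 0), length - 1)
    reward = -1.0                      # invented per-move cost, a proxy for growing AoI
    if pos in remaining:
        remaining = remaining - {pos}  # steps 14-16: remove the visited stop from N_r
        reward = 10.0                  # invented collection bonus
    return pos, remaining, reward


def train(stops, length=6, episodes=200, T=50, gamma=0.9, lr=0.5,
          epsilon=0.2, batch=8, sync_every=20, seed=0):
    rng = random.Random(seed)
    theta, theta_target = {}, {}       # step 2: tables standing in for the two networks
    q = lambda table, s: table.setdefault(s, [0.0] * len(ACTIONS))
    replay = deque(maxlen=2000)        # experience replay B
    n_steps = 0
    for _ in range(episodes):          # step 3
        pos, remaining = 0, frozenset(stops)   # step 5 (reset once per episode here)
        for t in range(T):             # step 4
            s = (pos, remaining)
            a = epsilon_greedy(q(theta, s), epsilon, rng)        # step 6
            pos, remaining, r = step(pos, remaining, a, length)  # steps 7-8
            done = not remaining
            replay.append((s, a, r, (pos, remaining), done))     # step 9
            for bs, ba, br, bs2, bdone in rng.sample(replay, min(batch, len(replay))):  # step 10
                y = td_target(br, q(theta_target, bs2), gamma, bdone)  # step 11
                q(theta, bs)[ba] += lr * (y - q(theta, bs)[ba])        # step 12 (tabular TD step)
            n_steps += 1
            if n_steps % sync_every == 0:                        # step 13
                theta_target = {k: v[:] for k, v in theta.items()}
            if done:                                             # step 17
                break
    return theta


def greedy_rollout(theta, stops, length=6, max_steps=50):
    """Step 20 analogue: follow the learned greedy policy and record the trajectory."""
    pos, remaining, path = 0, frozenset(stops), [0]
    for _ in range(max_steps):
        if not remaining:
            break
        qs = theta.get((pos, remaining), [0.0] * len(ACTIONS))
        pos, remaining, _ = step(pos, remaining, epsilon_greedy(qs, 0.0, random), length)
        path.append(pos)
    return path, remaining
```

A call such as `greedy_rollout(train({2, 4}), {2, 4})` returns the visited positions and the set of stops still uncollected. Keeping `theta_target` fixed between synchronizations is what separates this loop from plain Q-learning: the targets in step 11 do not move with every update of `theta`.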