Sensors. 2023 Jan 4;23(2):578. doi: 10.3390/s23020578
Algorithm 2 DRL-Based Multi-Modal Data Collection Algorithm

1:  Input: the constants k1, k2, and k3, the maximum number of training episodes E, the reward discount factor γ, the learning rate lr, the experience replay buffer B, the mini-batch size Φb, the exploration probability ϵ, and the target-update step χ;
2:  Initialize the current network Q(st, at; θ) with weights θ and the target network Q(st, at; θ′) with weights θ′ = θ.
3:  for episode = 1, …, E do
4:      Initialize the data collection network environment and observe the initial state st.
5:      for t = 1, …, T do
6:          Select an action at according to the ϵ-greedy policy (a random action with probability ϵ, otherwise the greedy action).
7:          Determine the AUV steering angle with Algorithm 1.
8:          Execute action at and observe the reward rt and the next state st+1.
9:          Store the experience (st, at, rt, st+1) in the experience replay buffer B.
10:         Sample a random mini-batch of Φb experiences from B.
11:         Calculate the target value yt by (26).
12:         Update the current network weights θ by (27).
13:         Update the target network weights θ′ ← θ every χ steps.
14:         if st+1 is the collection stop ni then
15:             Remove the CH ci from Nr.
16:         end if
17:         Terminate the episode if Nr = ∅ holds.
18:     end for
19: end for
20: Output: the AUV trajectory pa,t and the AoI Ai.
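
Algorithm 2 follows the standard deep Q-network (DQN) training pattern. The Python sketch below mirrors its steps under two assumptions, since equations (26) and (27) are not reproduced here: the target value is taken to be the usual Bellman target yt = rt + γ·max_a Q(st+1, a; θ′), and the weight update is a gradient step on the squared TD error. The environment, network sizes, and reward are illustrative placeholders rather than the paper's underwater model, and the action-to-steering-angle mapping of Algorithm 1 is stubbed out.

```python
# Minimal DQN sketch of Algorithm 2 (illustrative, not the paper's code).
import random
from collections import deque

import torch
import torch.nn as nn

STATE_DIM, N_ACTIONS = 8, 5          # placeholder sizes
E, T = 500, 200                      # episodes E and steps per episode T
GAMMA, LR = 0.99, 1e-3               # discount factor and learning rate
PHI_B, EPSILON, CHI = 64, 0.1, 100   # mini-batch size, exploration prob., update step


class StubEnv:
    """Stand-in for the data-collection environment (not the paper's model)."""

    def reset(self):
        return [0.0] * STATE_DIM

    def step(self, action):
        s_next = [random.random() for _ in range(STATE_DIM)]
        reward = -1.0                    # e.g., a per-step AoI penalty
        done = random.random() < 0.01    # stands in for N_r becoming empty
        return s_next, reward, done


env = StubEnv()
q_net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))
target_net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))
target_net.load_state_dict(q_net.state_dict())     # step 2: theta' = theta
optimizer = torch.optim.Adam(q_net.parameters(), lr=LR)
replay = deque(maxlen=100_000)                     # experience replay buffer B
step_count = 0

for episode in range(E):                           # step 3
    s = env.reset()                                # step 4: observe initial state
    for t in range(T):                             # step 5
        # Step 6: epsilon-greedy action selection.
        if random.random() < EPSILON:
            a = random.randrange(N_ACTIONS)
        else:
            with torch.no_grad():
                a = q_net(torch.as_tensor(s, dtype=torch.float32)).argmax().item()
        # Step 7 would translate the discrete action into an AUV steering angle.
        s_next, r, done = env.step(a)              # step 8
        replay.append((s, a, r, s_next, done))     # step 9
        s = s_next

        if len(replay) >= PHI_B:                   # step 10: sample a mini-batch
            batch = random.sample(list(replay), PHI_B)
            ss, aa, rr, ss2, dd = map(
                lambda x: torch.as_tensor(x, dtype=torch.float32), zip(*batch))
            with torch.no_grad():                  # step 11: assumed Bellman target (26)
                y = rr + GAMMA * target_net(ss2).max(dim=1).values * (1.0 - dd)
            q = q_net(ss).gather(1, aa.long().unsqueeze(1)).squeeze(1)
            loss = nn.functional.mse_loss(q, y)    # step 12: assumed TD-error update (27)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

        step_count += 1
        if step_count % CHI == 0:                  # step 13: theta' <- theta
            target_net.load_state_dict(q_net.state_dict())
        if done:                                   # steps 14-17: all CHs collected
            break
```

The periodic copy θ′ ← θ every χ steps is what keeps the bootstrapped target in step 11 stable; without a separate target network, the current network would chase its own moving estimates.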