Sensors. 2023 Jul 29;23(15):6796. doi: 10.3390/s23156796
Algorithm 1: MDQN training phase for V2V resource allocation
1:  Input: V2V link settings $B_k = B$ and $T_k = T$ for all $k \in \mathcal{K}$
2:  Output: trained Q-value function $Q(s, a; \theta)$ and target Q-value network
3:  Activate the environment simulator and generate vehicles
4:  For each V2V link $k \in \mathcal{K}$:
5:      For each step:
6:          Select a subchannel and transmit power according to the current policy
7:          Receive the next state and the reward for the chosen action from the environment simulator
8:          Store the experience quadruplet (previous state, action, reward, state) in the replay memory
9:          Sample a mini-batch from the experience pool to train the neural network
10:         Perform gradient descent according to Equation (20)
11:         If the step count is a multiple of C:
12:             Copy the Q-value network weights to the target Q-value network
13:     End For
14: End For
15: Return: trained Q-value function $Q(s, a; \theta)$ and target Q-value network