Intelligent Resource Allocation for V2V Communication with Spectrum–Energy Efficiency Maximization

2023 Jul 29;23(15):6796. doi: 10.3390/s23156796

Algorithm 1: MDQN training phase for V2V resource allocation

1:  Input: V2V link settings $B_{k} = B$ and $T_{k} = T$ for all $k \in K$
2:  Output: Trained Q-value function $Q(s, a, \theta)$ and target Q-value network
3:  Start the environment simulator and generate vehicles
4:  For each V2V link $k \in K$:
5:      For each step:
6:          Select a subchannel and transmit power according to the current policy
7:          Receive the next state and the reward for the chosen action from the environment simulator
8:          Store the quadruplet (previous state, action, reward, current state) in the memory bank
9:          Sample a mini-batch from the experience pool to train the Q-value network
10:         Perform gradient descent according to Equation (20)
11:         If the step index is a multiple of $C$:
12:             Copy the Q-value network weights to the target Q-value network
13:     End For
14: End For
15: Return: Trained Q-value function $Q(s, a, \theta)$ and target Q-value network
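Algorithm 1 follows the familiar deep Q-network pattern of an experience pool plus a target network that is synchronized every $C$ steps. The sketch below illustrates that loop in plain Python under several assumptions that are not taken from the paper: a linear Q-function stands in for the neural network, a hypothetical toy simulator `toy_env_step` replaces the V2V environment, each action is a hypothetical (subchannel, power level) pair, the policy is ε-greedy, and the loss is assumed to be the standard squared temporal-difference error because Equation (20) is not reproduced here. The link settings $B$ and $T$ are not modeled. All names and constants are illustrative, not the authors' implementation.

```python
import random
from collections import deque

# Hypothetical action space: each (subchannel, power level) pair is one discrete action.
NUM_SUBCHANNELS = 4
POWER_LEVELS_DBM = [23.0, 10.0, 5.0]
ACTIONS = [(m, p) for m in range(NUM_SUBCHANNELS) for p in POWER_LEVELS_DBM]
STATE_DIM = 3  # hypothetical number of state features


class LinearQ:
    """Linear stand-in for Q(s, a, theta): Q = theta[a] . s (assumption, not a neural net)."""

    def __init__(self, state_dim, num_actions):
        self.theta = [[0.0] * state_dim for _ in range(num_actions)]

    def q(self, s, a):
        return sum(w * x for w, x in zip(self.theta[a], s))

    def best(self, s):
        # Greedy action index; ties resolve to the lowest index.
        return max(range(len(self.theta)), key=lambda a: self.q(s, a))

    def copy_from(self, other):
        # Deep-copy weights (the "copy to target network" step).
        self.theta = [row[:] for row in other.theta]


def toy_env_step(state, action, rng):
    """Hypothetical simulator returning (next_state, reward); purely illustrative."""
    channel, power = ACTIONS[action]
    reward = 1.0 / (1 + channel) - 0.01 * power  # toy trade-off, not the paper's reward
    next_state = [rng.random() for _ in range(STATE_DIM)]
    return next_state, reward


def train(num_links=2, steps=200, batch_size=8, C=20, gamma=0.9, lr=0.01, eps=0.1, seed=0):
    rng = random.Random(seed)
    q_net = LinearQ(STATE_DIM, len(ACTIONS))
    target_net = LinearQ(STATE_DIM, len(ACTIONS))
    target_net.copy_from(q_net)
    memory = deque(maxlen=1000)  # experience pool (memory bank)
    for _link in range(num_links):
        state = [rng.random() for _ in range(STATE_DIM)]
        for step in range(1, steps + 1):
            # Select subchannel and power with an epsilon-greedy policy (assumption).
            if rng.random() < eps:
                action = rng.randrange(len(ACTIONS))
            else:
                action = q_net.best(state)
            next_state, reward = toy_env_step(state, action, rng)
            memory.append((state, action, reward, next_state))
            state = next_state
            if len(memory) >= batch_size:
                # Per-sample semi-gradient descent on 0.5 * td_error**2 over a mini-batch.
                for s, a, r, s2 in rng.sample(list(memory), batch_size):
                    target = r + gamma * max(target_net.q(s2, b) for b in range(len(ACTIONS)))
                    td_error = target - q_net.q(s, a)
                    q_net.theta[a] = [w + lr * td_error * x for w, x in zip(q_net.theta[a], s)]
            if step % C == 0:
                target_net.copy_from(q_net)  # periodic target-network synchronization
    return q_net, target_net
```

Calling `train()` returns the trained Q-function and its target copy; with the defaults (200 steps per link, $C = 20$) the last step triggers a synchronization, so both hold identical weights, and `ACTIONS[q_net.best(s)]` maps a state to the chosen (subchannel, power) pair.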