Algorithm 1: MDQN training phase for V2V resource allocation

1: Input: V2V link settings and for all
2: Output: Trained Q-value function and target Q-value network
3: Start the environment simulator and generate vehicles
4: For each V2V link:
5:     For each step:
6:         Select subchannels and transmit power according to the policy
7:         Receive the resulting state and reward for the action from the environment simulator
8:         Store the transition (previous state, action, reward, state) in the experience pool
9:         Sample a mini-batch from the experience pool to train the neural network
10:        Perform gradient descent according to Equation (20)
11:        If step is a multiple of C:
12:            Copy the Q-value network weights to the target Q-value network
13:    End For
14: End For
15: Return: Trained Q-value function and target Q-value network
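
The control flow of Algorithm 1 can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several simplifications are assumed: a hypothetical toy environment (`ToyEnv`) replaces the vehicular simulator, a lookup table stands in for the Q-value neural network, and the update uses the standard DQN squared temporal-difference error, because Equation (20) is not reproduced in this section. One Q-table shared across links is also an assumption, as are all constants and hyperparameters (`N_SUBCHANNELS`, `POWER_LEVELS`, `gamma`, `lr`, `epsilon`, `copy_every`).

```python
import copy
import random
from collections import deque

# All constants below are hypothetical and chosen only for illustration.
N_SUBCHANNELS = 4
POWER_LEVELS = [5, 10, 23]          # candidate transmit powers (dBm)
N_STATES = N_SUBCHANNELS            # toy state: index of the interfered subchannel
ACTIONS = [(c, p) for c in range(N_SUBCHANNELS) for p in range(len(POWER_LEVELS))]


class ToyEnv:
    """Hypothetical stand-in for the environment simulator (step 3)."""

    def __init__(self, rng):
        self.rng = rng
        self.state = rng.randrange(N_STATES)

    def step(self, action_idx):
        channel, power_idx = ACTIONS[action_idx]
        # Reward 1 for avoiding the interfered subchannel, minus a small power cost.
        reward = (1.0 if channel != self.state else 0.0) - 0.05 * power_idx
        self.state = self.rng.randrange(N_STATES)   # interference moves randomly
        return self.state, reward


def greedy(q, state):
    """Index of the highest-valued action in `state` (first one on ties)."""
    row = q[state]
    return max(range(len(row)), key=row.__getitem__)


def train(n_links=2, n_steps=2000, batch_size=16, gamma=0.5, lr=0.1,
          epsilon=0.1, copy_every=50, seed=0):
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in range(N_STATES)]  # table in place of a network
    q_target = copy.deepcopy(q)                          # target Q-values
    memory = deque(maxlen=5000)                          # experience pool
    env = ToyEnv(rng)                                    # step 3
    for _link in range(n_links):                         # step 4
        state = env.state
        for step in range(1, n_steps + 1):               # step 5
            # Step 6: epsilon-greedy choice of (subchannel, power level).
            if rng.random() < epsilon:
                action = rng.randrange(len(ACTIONS))
            else:
                action = greedy(q, state)
            next_state, reward = env.step(action)        # step 7
            memory.append((state, action, reward, next_state))  # step 8
            state = next_state
            if len(memory) >= batch_size:
                batch = rng.sample(memory, batch_size)   # step 9
                for s, a, r, s2 in batch:                # step 10
                    # Gradient step on the squared TD error for a table entry.
                    target = r + gamma * max(q_target[s2])
                    q[s][a] += lr * (target - q[s][a])
            if step % copy_every == 0:                   # steps 11-12
                q_target = copy.deepcopy(q)
    return q, q_target                                   # step 15


if __name__ == "__main__":
    q, q_target = train()
    for s in range(N_STATES):
        print("interfered subchannel", s, "-> greedy (subchannel, power index)",
              ACTIONS[greedy(q, s)])
```

The parameter `copy_every` plays the role of C in steps 11 and 12: the target values are held fixed between copies, so the regression target in step 10 does not shift with every update. In a full implementation the table would be replaced by a neural network and the inner update by an optimizer step on a mini-batch loss.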