Algorithm 1 Proposed deep Q-learning algorithm
1:  Initialize the policy and target DQNs with random weights w
2:  Initialize the experience replay memory (ERM)
3:  for each episode do
4:      for each instance (time step t) do
5:          Select a channel matrix (action a_t) and add it to the action space for the present state s_t
6:          Observe the immediate reward r_t and the next state s_{t+1}
7:          Put the tuple (s_t, a_t, r_t, s_{t+1}) into the ERM
8:          Form a random mini-batch of tuples sampled from the ERM
9:          for each tuple in the mini-batch do
10:             Calculate the Q-values using the policy DQN
11:             Approximate the target Q-values using the target DQN
12:         Compute the loss from the Q-values and the target Q-values
13:         Optimize the weights w of the policy DQN with the Adam optimizer
14: Update the target DQN weights after all time steps
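The loop above can be sketched as a minimal NumPy implementation. Everything beyond the algorithm's structure is an illustrative assumption: the linear single-layer Q-network standing in for the DQNs, the toy environment and reward, the epsilon-greedy exploration rule, and all hyperparameter values.

```python
import random
from collections import deque

import numpy as np

rng = np.random.default_rng(0)

STATE_DIM, N_ACTIONS = 4, 3          # assumed toy dimensions
GAMMA, LR, BATCH = 0.9, 1e-2, 8      # assumed hyperparameters

# Step 1: policy and target networks with random weights w.
# A single linear layer is a minimal stand-in for the DQNs.
w_policy = rng.normal(scale=0.1, size=(STATE_DIM, N_ACTIONS))
w_target = w_policy.copy()

# Step 2: experience replay memory (ERM).
erm = deque(maxlen=1000)

# Adam optimizer state (step 13).
m, v, t_adam = np.zeros_like(w_policy), np.zeros_like(w_policy), 0

def q_values(w, s):
    return s @ w                     # Q(s, .) for a batch or single state

def adam_step(w, grad):
    global m, v, t_adam
    b1, b2, eps = 0.9, 0.999, 1e-8
    t_adam += 1
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t_adam)
    v_hat = v / (1 - b2 ** t_adam)
    return w - LR * m_hat / (np.sqrt(v_hat) + eps)

def toy_env_step(s, a):
    # Hypothetical stand-in for the channel-selection environment:
    # action 0 yields reward 1, the next state is random.
    return (1.0 if a == 0 else 0.0), rng.normal(size=STATE_DIM)

for episode in range(20):                          # step 3
    s = rng.normal(size=STATE_DIM)
    for t in range(10):                            # step 4
        # Step 5: choose an action (channel) epsilon-greedily.
        if rng.random() < 0.1:
            a = int(rng.integers(N_ACTIONS))
        else:
            a = int(np.argmax(q_values(w_policy, s)))
        r, s_next = toy_env_step(s, a)             # step 6
        erm.append((s, a, r, s_next))              # step 7
        s = s_next

        if len(erm) < BATCH:
            continue
        batch = random.sample(list(erm), BATCH)    # step 8
        S = np.stack([b[0] for b in batch])
        A = np.array([b[1] for b in batch])
        R = np.array([b[2] for b in batch])
        S2 = np.stack([b[3] for b in batch])

        q = q_values(w_policy, S)                  # step 10
        y = R + GAMMA * q_values(w_target, S2).max(axis=1)  # step 11
        # Step 12: MSE loss on the taken actions; gradient w.r.t. w.
        err = q[np.arange(BATCH), A] - y
        grad = np.zeros_like(w_policy)
        for i in range(BATCH):
            grad[:, A[i]] += 2.0 * err[i] * S[i] / BATCH
        w_policy = adam_step(w_policy, grad)       # step 13
    w_target = w_policy.copy()                     # step 14
```

The target network is refreshed only after all time steps of an episode (step 14), which keeps the regression targets in step 11 fixed within an episode and stabilizes training.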