Optimal User Scheduling in Multi Antenna System Using Multi Agent Reinforcement Learning

2022 Oct 28;22(21):8278. doi: 10.3390/s22218278

Algorithm 1: Q-learning algorithm.

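1. Initialize $Q(s, a)$ arbitrarily, with $Q(\text{terminal}, \cdot) = 0$

2. Repeat (for each episode):

3. Initialize $s$

4. Repeat (for each step of the episode):

5. Choose action $a$ in state $s$ $\epsilon$-greedily with respect to $Q$

6. Take action $a$, observe $r, s^{'}$

7. $Q (s_{t}, a_{t}) \leftarrow Q (s_{t}, a_{t}) + α [r_{t + 1} + γ Q (s_{t + 1}, a_{t + 1}) - Q (s_{t}, a_{t})]$, where $s_{t} = s$, $a_{t} = a$, $r_{t + 1} = r$, and $s_{t + 1} = s^{'}$

8. $s \leftarrow s^{'}$

9. Until $s$ is terminal

10. Until convergence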
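The loop in Algorithm 1 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. The chain environment `chain_step`, the helper `epsilon_greedy`, and all hyperparameter values are hypothetical. The sketch also makes two assumptions that differ from the listing: it uses the standard Q-learning target $\max_{a} Q (s_{t + 1}, a)$, which matches the listing only when $a_{t + 1}$ is the greedy action in $s_{t + 1}$, and it runs a fixed number of episodes instead of testing for convergence.

```python
import random


def epsilon_greedy(Q, s, n_actions, eps, rng):
    """Pick a random action with probability eps, else a greedy one (ties broken at random)."""
    if rng.random() < eps:
        return rng.randrange(n_actions)
    best = max(Q[s])
    return rng.choice([a for a in range(n_actions) if Q[s][a] == best])


def chain_step(s, a, n_states):
    """Hypothetical toy environment: action 1 moves right, action 0 moves left.

    The right-most state is terminal and entering it yields reward 1.
    """
    s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    done = s_next == n_states - 1
    reward = 1.0 if done else 0.0
    return reward, s_next, done


def q_learning(n_states=5, n_actions=2, episodes=500,
               alpha=0.1, gamma=0.9, eps=0.1, seed=0):
    rng = random.Random(seed)
    # Initialize Q; the terminal row is never updated, so Q(terminal, .) stays 0.
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):          # outer "Repeat" (fixed budget, an assumption)
        s = 0                          # initialize s
        done = False
        while not done:                # inner "Repeat ... until s is terminal"
            a = epsilon_greedy(Q, s, n_actions, eps, rng)
            r, s_next, done = chain_step(s, a, n_states)
            # Standard Q-learning target: bootstrap from the best next action.
            target = r + gamma * max(Q[s_next])
            Q[s][a] += alpha * (target - Q[s][a])
            s = s_next                 # s <- s'
    return Q
```

Calling `q_learning()` returns the learned table as a list of rows, one per state. On this toy chain the greedy policy read from the table should move right in every non-terminal state.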