2019 Dec 19;20(1):44. doi: 10.3390/s20010044
Algorithm 1. The Q-learning-based resource allocation algorithm
  1. initialize the table entry Q(s, a) arbitrarily for each state-action pair (s, a)

  2. observe the current state s, initialize the value of α and γ

  3. for episode = 1 to M do

  4.   from the current state-action pair (s, a), execute action a and obtain
      the immediate reward r and a new state s'

  5.   select an action a' based on the state s' and update the table entry for
      Q(s, a) as expressed in Equation (18)

  6.   replace s ← s'

  7. end for

  8. Output: π(s) = argmax_a Q(s, a)
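
The steps of Algorithm 1 can be sketched in Python as follows. This is an illustrative sketch only, not the authors' implementation. Equation (18) is not reproduced in this excerpt, so the code assumes the standard tabular Q-learning update Q(s, a) ← Q(s, a) + α[r + γ max_a' Q(s', a') − Q(s, a)]. The ε-greedy action selection, the parameter values, and the two-state environment `toy_step` are likewise hypothetical stand-ins for the paper's resource allocation model.

```python
import random


def q_learning(n_states, n_actions, step, s0, episodes,
               alpha=0.1, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning loop following the structure of Algorithm 1.

    `step(s, a)` is a caller-supplied (hypothetical) environment function
    that returns the pair (reward, next_state).
    """
    rng = random.Random(seed)
    # Step 1: the algorithm allows arbitrary initial values; zeros are used here.
    Q = [[0.0] * n_actions for _ in range(n_states)]
    # Step 2: observe the current state and pick a first action.
    s = s0
    a = rng.randrange(n_actions)
    # Step 3: main loop over M episodes.
    for _ in range(episodes):
        # Step 4: execute a, obtain reward r and new state s'.
        r, s_next = step(s, a)
        # Step 5 (update): assumed standard Q-learning rule, standing in for Equation (18).
        Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])
        # Step 5 (selection): epsilon-greedy choice of a' in s' (an assumption).
        if rng.random() < epsilon:
            a = rng.randrange(n_actions)
        else:
            a = max(range(n_actions), key=lambda j: Q[s_next][j])
        # Step 6: replace s with s'.
        s = s_next
    # Step 8: greedy policy pi(s) = argmax_a Q(s, a).
    policy = [max(range(n_actions), key=lambda j: Q[i][j])
              for i in range(n_states)]
    return Q, policy


def toy_step(s, a):
    """Hypothetical two-state environment: reward 1 when the action index
    matches the state index; the state alternates regardless of the action."""
    reward = 1.0 if a == s else 0.0
    return reward, (s + 1) % 2


Q, policy = q_learning(n_states=2, n_actions=2, step=toy_step,
                       s0=0, episodes=5000)
print(policy)
```

In this toy setting the greedy policy should pick the action whose index matches the state. A real resource allocation problem would replace `toy_step` with the system model and reward defined in the paper, and would likely need a much larger state-action table.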