Sensors. 2020 Feb 12;20(4):973. doi: 10.3390/s20040973
Algorithm 2 Distributed Q-learning for device m with Boltzmann learning policy.
  • for i = 0 to N do

  •     Q_m^i(0) = 0

  •     p_m^i(0) = 1/(N+1)

  • end for

  • for k = 1 to k* do

  •     if max_i p_m^i(k) ≥ 0.99 then

  •         exit

  •     end if

  •     Choose the action a_m(k) according to the current action-selection probability vector p_m(k).

  •     Based on a_m(k), change the computing state and sub-channel.

  •     Observe the sub-channel interference and calculate the reward r_m^{a_m(k)}(k) according to (26).

  •     for i = 0 to N do

  •         Calculate Q_m^i(k) according to (25), with the learning rate formulated in (33).

  •         Update the action-selection probability vector according to (31).

  •     end for

  • end for
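The loop above can be sketched in Python for a single device m. Equations (25), (26), (31), and (33) are defined elsewhere in the paper, so the reward function, the 1/k learning-rate schedule, and the stateless Q-update below are illustrative stand-ins, not the paper's exact formulas; only the Boltzmann (softmax) action selection and the 0.99 stopping rule follow the pseudocode directly.

```python
import math
import random

def boltzmann_probs(q, temperature):
    """Boltzmann (softmax) probabilities over Q-values; stands in for eq. (31)."""
    m = max(q)  # shift by the max for numerical stability
    exps = [math.exp((v - m) / temperature) for v in q]
    s = sum(exps)
    return [e / s for e in exps]

def distributed_q_learning(reward_fn, n_actions, k_star=1000,
                           temperature=0.1, stop_threshold=0.99, seed=0):
    """Sketch of Algorithm 2 for one device m.

    reward_fn(action, k) is a placeholder for the reward of eq. (26),
    and the 1/k learning rate stands in for eq. (33); both are assumptions.
    n_actions corresponds to N + 1 actions (i = 0..N).
    """
    rng = random.Random(seed)
    q = [0.0] * n_actions              # Q_m^i(0) = 0 for every action i
    p = [1.0 / n_actions] * n_actions  # p_m^i(0) = 1/(N+1)
    for k in range(1, k_star + 1):
        if max(p) >= stop_threshold:   # policy has (nearly) converged: exit
            break
        # Draw a_m(k) from the current selection probability vector p_m(k).
        a = rng.choices(range(n_actions), weights=p)[0]
        r = reward_fn(a, k)            # observe interference, compute reward (26)
        lr = 1.0 / k                   # placeholder learning rate for eq. (33)
        q[a] += lr * (r - q[a])        # stateless Q-update, stands in for (25)
        p = boltzmann_probs(q, temperature)  # refresh p_m(k) as in eq. (31)
    return q, p
```

With a reward that consistently favors one action (e.g. one interference-free sub-channel), the Boltzmann probabilities concentrate on that action and the loop exits early once its probability crosses 0.99.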