2020 Feb 12;20(4):973. doi: 10.3390/s20040973
Algorithm 1 Distributed Q-learning for device m with ϵ-greedy learning policy.
  • for $i = 0$ to $N$ do

  •     $Q_m^i(0) = 0$

  • end for

  • for $k = 1$ to $k^*$ do

  •     $\epsilon_k = \epsilon_0 / k^{1/N}$

  •     Follow the ϵ-greedy policy (28) to select the next action $a_m(k)$.

  •     Based on $a_m(k)$, change the computing state and sub-channel.

  •     Observe the sub-channel interference and calculate the reward $r_m^{a_m(k)}(k)$ according to (26).

  •     for $i = 0$ to $N$ do

  •          Calculate $Q_m^i(k)$ according to (25), using the learning rate given by (30).

  •     end for

  • end for
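
The loop structure of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: equations (25), (26), (28), and (30) are not reproduced in this excerpt, so the sketch substitutes standard forms for them. It assumes that action 0 means local computing and actions 1 to N mean offloading on sub-channel i, that the exploration rate is ϵ_k = ϵ_0 / k^(1/N), that the Q-update is the usual stateless rule Q ← Q + α(r − Q) applied only to the chosen action, and that the learning rate is the reciprocal of the visit count. The function name `distributed_q_learning` and the callback `reward_fn` are hypothetical.

```python
import random


def distributed_q_learning(n_subchannels, k_max, reward_fn, eps0=1.0, seed=0):
    """Stateless epsilon-greedy Q-learning for a single device (sketch).

    reward_fn(action, k) is a hypothetical stand-in for the reward of
    equation (26), which is not reproduced in this excerpt.
    """
    rng = random.Random(seed)
    n_actions = n_subchannels + 1      # actions i = 0..N
    q = [0.0] * n_actions              # initialisation: Q_m^i(0) = 0
    visits = [0] * n_actions           # per-action counts for the learning rate

    for k in range(1, k_max + 1):
        # Assumed reading of the decay schedule: eps_k = eps_0 / k^(1/N).
        eps = eps0 / k ** (1.0 / n_subchannels)

        # Epsilon-greedy selection (stand-in for policy (28)).
        if rng.random() < eps:
            action = rng.randrange(n_actions)                    # explore
        else:
            action = max(range(n_actions), key=q.__getitem__)    # exploit

        # Apply the action and observe the reward.
        reward = reward_fn(action, k)
        visits[action] += 1

        # Sweep over all actions as in the inner loop of the algorithm;
        # only the chosen action's entry changes (assumed form of (25)).
        for i in range(n_actions):
            if i == action:
                alpha = 1.0 / visits[i]   # assumed learning rate, stand-in for (30)
                q[i] += alpha * (reward - q[i])

    return q
```

As a usage example, `distributed_q_learning(2, 2000, lambda a, k: [0.1, 0.9, 0.3][a])` runs one device with two sub-channels against a fixed, interference-free reward table. In a multi-device setting, `reward_fn` would instead depend on the sub-channels the other devices currently occupy.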