2020 Nov 23;20(22):6692. doi: 10.3390/s20226692
Algorithm 3 Proposed reinforcement learning algorithm
  • Initialize $Q(s,a) = 0$ for all $s \in S$ and $a \in A$, where $S$ is the set of states and $A$ is the set of actions

  • while the battery lifetime is not zero do

  •     Determine the current state $s_t$

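  •     Select an action $a$ according to the softmax (Boltzmann) policy with temperature $\omega$, i.e., with probability

  • $P(a \mid s) = \frac{e^{Q(s,a)/\omega}}{\sum_{a' \in A} e^{Q(s,a')/\omega}}$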
  •     Execute the selected action

  •     Calculate the reward

  • $R_f = \begin{cases} \frac{LQM_{l-1}}{LQM_l} + LQM_l \times A, & n \le N \\ \frac{LQM_{l-1}}{LQM_l}, & n > N \end{cases}$
  •     Calculate the learning rate

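  • $\varphi = \frac{Z}{\mathrm{visited}(s,a)}$, where $\mathrm{visited}(s,a)$ is the number of times the pair $(s,a)$ has been visited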
  •     Calculate the Q value of the executed action

  • $Q_{t+1}(s_t,a_t) = (1-\varphi)\,Q_t(s_t,a_t) + \varphi\left(R_f(s_{t+1}) + \Gamma V_t(s_{t+1})\right)$
  •     Update the value function of the current state

  • $V_{t+1}(s_t) = \max_{a \in A} Q_{t+1}(s_t,a)$
  •     Update the utility table of the scheduler agent

  • $U(q) = (1-\Upsilon)\,U(q) + \Upsilon \sum_i R_f^{i}$
  •     Move to the next state based on the executed action

  • end while
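
The action-selection step of Algorithm 3 uses the softmax (Boltzmann) rule $e^{Q(s,a)/\omega} / \sum_{a'} e^{Q(s,a')/\omega}$. Below is a minimal Python sketch, not the authors' code, assuming the Q values of the current state are given as a list indexed by action. The names `boltzmann_probabilities` and `select_action` are hypothetical. Subtracting the largest Q value before exponentiating is a standard numerical-stability measure: it cancels in the ratio and leaves the probabilities unchanged.

```python
import math
import random


def boltzmann_probabilities(q_row, omega):
    """Return P(a|s) = exp(Q(s,a)/omega) / sum_a' exp(Q(s,a')/omega) for each action."""
    # Shift by the maximum Q value to avoid overflow; the shift cancels in the ratio.
    m = max(q_row)
    weights = [math.exp((q - m) / omega) for q in q_row]
    total = sum(weights)
    return [w / total for w in weights]


def select_action(q_row, omega, rng=random):
    """Sample an action index according to the softmax probabilities."""
    probs = boltzmann_probabilities(q_row, omega)
    return rng.choices(range(len(q_row)), weights=probs, k=1)[0]
```

With equal Q values every action is equally likely. As $\omega$ shrinks, selection concentrates on the highest-valued action, and as $\omega$ grows, it approaches uniform random exploration.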
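
The learning-rate, Q-value, and value-function steps of Algorithm 3 fit naturally into a single update routine. This minimal sketch makes several assumptions. States and actions are hashable, tables are dictionaries initialised to zero, and the visit counter is incremented before $\varphi$ is computed, so the first visit uses $\varphi = Z$. The class name `QLearner` and the default values of `z` and `gamma` are hypothetical. The reward $R_f$ is passed in as a number computed elsewhere.

```python
from collections import defaultdict


class QLearner:
    """Tabular Q-learning step with a visit-count learning rate (a sketch)."""

    def __init__(self, actions, z=1.0, gamma=0.9):
        self.actions = list(actions)       # action set A
        self.z = z                         # Z in phi = Z / visited(s, a); value assumed
        self.gamma = gamma                 # discount factor Gamma; value assumed
        self.q = defaultdict(float)        # Q(s, a), initialised to 0
        self.v = defaultdict(float)        # V(s), initialised to 0
        self.visited = defaultdict(int)    # visited(s, a) counters

    def update(self, s, a, reward, s_next):
        """Apply one update for the executed action a taken in state s."""
        # Learning rate: count this visit, then phi = Z / visited(s, a).
        self.visited[(s, a)] += 1
        phi = self.z / self.visited[(s, a)]
        # Q_{t+1}(s_t, a_t) = (1 - phi) Q_t(s_t, a_t) + phi (R_f + Gamma V_t(s_{t+1})).
        # V_t(s_{t+1}) is read here, before V(s_t) is overwritten below.
        target = reward + self.gamma * self.v[s_next]
        self.q[(s, a)] = (1.0 - phi) * self.q[(s, a)] + phi * target
        # V_{t+1}(s_t) = max over a in A of Q_{t+1}(s_t, a).
        self.v[s] = max(self.q[(s, b)] for b in self.actions)
        return self.q[(s, a)]
```

With `z=1.0`, the first visit to a pair simply replaces $Q(s,a)$ by the target. Because $\varphi$ decays as $1/\mathrm{visited}(s,a)$, later visits move $Q$ progressively less, and choosing $Z \le 1$ keeps $\varphi$ in $(0, 1]$.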
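
The utility-table update in Algorithm 3 blends the previous utility $U(q)$ with the sum of the collected rewards, using $\Upsilon$ as the weight. Below is a minimal sketch that assumes the table is a `defaultdict(float)` keyed by $q$ and that the $R_f^i$ values to be summed are passed in as a list. The function name `update_utility` is hypothetical.

```python
from collections import defaultdict


def update_utility(utility, q, rewards, upsilon):
    """Apply U(q) = (1 - upsilon) * U(q) + upsilon * sum_i R_f^i in place."""
    # utility is assumed to be a defaultdict(float), so unseen keys start at 0.
    utility[q] = (1.0 - upsilon) * utility[q] + upsilon * sum(rewards)
    return utility[q]


# Hypothetical usage: two updates of the same entry with upsilon = 0.5.
table = defaultdict(float)
update_utility(table, "q1", [1.0, 2.0, 1.0], 0.5)  # 0.5 * 0 + 0.5 * 4 = 2.0
update_utility(table, "q1", [0.0], 0.5)            # 0.5 * 2 + 0.5 * 0 = 1.0
```

A larger $\Upsilon$ makes the utility track recent rewards more closely, while a smaller one averages over a longer history.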