Reinforcement Learning (RL)-Based Energy Efficient Resource Allocation for Energy Harvesting-Powered Wireless Body Area Network

2019 Dec 19;20(1):44. doi: 10.3390/s20010044

Algorithm 1 The Q-learning based resource allocation algorithm

initialize the table entry $Q(s, a)$ arbitrarily for each state-action pair $(s, a)$
observe the current state $s$, initialize the values of $α$ and $γ$
for episode = 1 to M do
  from the current state-action pair $(s, a)$, execute action $a$ and obtain the immediate reward $r$ and a new state $s'$
  select an action $a'$ based on the state $s'$ and update the table entry for $Q(s, a)$ as expressed in Equation (18)
  replace $s \leftarrow s'$
end for
Output: $π^{*}(s) = \arg\max_{a} Q^{*}(s, a)$
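
The loop in Algorithm 1 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. Equation (18) is not reproduced in this section, so the update is assumed to be the standard Q-learning rule $Q(s, a) \leftarrow Q(s, a) + α [r + γ \max_{a'} Q(s', a') - Q(s, a)]$. The next action is assumed to be chosen ε-greedily, with an exploration rate `epsilon` that does not appear in Algorithm 1. The function `toy_step` is a hypothetical four-state chain environment that stands in for the WBAN resource allocation model, whose states, actions, and reward are not modeled here.

```python
import random


def toy_step(s, a):
    """Hypothetical 4-state chain: a=1 moves right, a=0 moves left.

    Entering (or staying in) state 3 yields reward 1, anything else 0.
    This is a stand-in environment, not the paper's WBAN model.
    """
    s_next = min(s + 1, 3) if a == 1 else max(s - 1, 0)
    r = 1.0 if s_next == 3 else 0.0
    return r, s_next


def greedy(q_row):
    """Return the index of the largest entry in one row of the Q-table."""
    return max(range(len(q_row)), key=lambda a: q_row[a])


def q_learning(step, n_states, n_actions, M=20000,
               alpha=0.1, gamma=0.9, epsilon=0.3, seed=0):
    """Tabular Q-learning following the structure of Algorithm 1."""
    rng = random.Random(seed)
    # "Arbitrary" initialization: all zeros is one valid choice.
    Q = [[0.0] * n_actions for _ in range(n_states)]
    s = 0                          # observed initial state (assumed)
    a = rng.randrange(n_actions)   # initial action (assumed random)
    for _ in range(M):
        # Execute a, obtain immediate reward r and new state s'.
        r, s_next = step(s, a)
        # Select a' based on s' (assumed epsilon-greedy).
        if rng.random() < epsilon:
            a_next = rng.randrange(n_actions)
        else:
            a_next = greedy(Q[s_next])
        # Assumed form of Equation (18): standard Q-learning update.
        Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])
        # Replace s <- s' (and carry the selected action forward).
        s, a = s_next, a_next
    # Output: pi*(s) = argmax_a Q(s, a) for every state.
    policy = [greedy(Q[st]) for st in range(n_states)]
    return Q, policy


if __name__ == "__main__":
    Q, policy = q_learning(toy_step, n_states=4, n_actions=2)
    print(policy)
```

As in Algorithm 1, each loop iteration here performs a single transition. On this toy chain the greedy policy read from the learned table should move right in every state, since only state 3 pays a reward.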