Algorithm 1 The Q-learning based resource allocation algorithm
initialize the table entry $Q(s, a)$ arbitrarily for each state-action pair $(s, a)$
observe the current state $s$, initialize the value of the learning rate $\alpha$ and the discount factor $\gamma$
for episode = 1 to M do
- from the current state-action pair $(s, a)$, execute action $a$ and obtain the immediate reward $r$ and a new state $s'$
- select an action $a'$ based on the state $s'$ and update the table entry for $Q(s, a)$ as expressed in Equation (18)
replace $s \leftarrow s'$
end for
Output:
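The loop structure of Algorithm 1 can be illustrated with a minimal tabular sketch. It rests on several assumptions that are not taken from the source: Equation (18) is assumed to be the standard Q-learning update, action selection is assumed to be epsilon-greedy, the table is initialized to zeros as one arbitrary choice, and the two-state environment `toy_step` is a hypothetical stand-in for the resource allocation model. The names `q_learning`, `select_action`, `alpha`, `gamma`, and `epsilon` are likewise illustrative.

```python
import random


def q_learning(step, n_states, n_actions, episodes,
               alpha=0.1, gamma=0.9, epsilon=0.2, start_state=0, seed=0):
    """Tabular Q-learning following the loop structure of Algorithm 1."""
    rng = random.Random(seed)
    # Initialize the table entries; all zeros is one arbitrary choice.
    q = [[0.0] * n_actions for _ in range(n_states)]

    def select_action(state):
        # Assumed epsilon-greedy rule: explore with probability epsilon,
        # otherwise take the action with the largest table entry.
        if rng.random() < epsilon:
            return rng.randrange(n_actions)
        return q[state].index(max(q[state]))

    s = start_state
    a = select_action(s)
    for _ in range(episodes):
        # Execute the action, obtain the immediate reward and the new state.
        r, s_next = step(s, a)
        # Select the action for the new state.
        a_next = select_action(s_next)
        # Assumed form of the update: standard Q-learning rule.
        q[s][a] += alpha * (r + gamma * max(q[s_next]) - q[s][a])
        # Replace the current state; the selected action is executed next.
        s, a = s_next, a_next

    # Output: the greedy policy with respect to the learned table.
    policy = [row.index(max(row)) for row in q]
    return q, policy


def toy_step(state, action):
    """Hypothetical environment: reward 1 when the action matches the state."""
    reward = 1.0 if action == state else 0.0
    return reward, (state + 1) % 2


q_table, policy = q_learning(toy_step, n_states=2, n_actions=2, episodes=5000)
print(policy)
```

In this toy setting the greedy policy should pick the action that matches each state. Returning the greedy policy is one plausible reading of the Output step, not something the algorithm listing confirms.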