|
Algorithm 3 Proposed reinforcement learning algorithm |
Initialize Q(s, a) for all s in S and a in A, where S is the set of states and A is the set of actions
while battery lifetime is not equal to zero do
    Determine current state
    Select action a based on policy
    Execute the selected action
    Calculate the reward
    Calculate the learning rate
    Calculate Q value for the executed action
    Calculate the value function for the executed action
    Update the utility table of the scheduler agent
    Move to the next state based on the executed action
end while
|
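The loop in Algorithm 3 can be illustrated with a minimal tabular sketch. The algorithm box does not fix the policy, the learning-rate schedule, or the update rule, so everything specific here is an assumption made for illustration: an epsilon-greedy policy, a learning rate of 1/n for the n-th visit to a state-action pair, the standard Q-learning update with discount factor `gamma`, and a value function taken as the maximum Q value in a state. The names `run_scheduler_agent`, `toy_step`, and the two-state environment are hypothetical and do not come from the source.

```python
import random
from collections import defaultdict


def run_scheduler_agent(states, actions, step, battery=100.0,
                        gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning loop that runs until the battery is depleted.

    `step(state, action)` is a hypothetical environment callback that
    returns (reward, next_state, energy_cost) with energy_cost > 0.
    """
    rng = random.Random(seed)
    # Initialize the utility table Q(s, a) and the value function V(s).
    q = {(s, a): 0.0 for s in states for a in actions}
    value = {s: 0.0 for s in states}
    visits = defaultdict(int)

    state = states[0]                      # determine current state
    while battery > 0:
        # Select an action (assumed epsilon-greedy policy).
        if rng.random() < epsilon:
            action = rng.choice(actions)
        else:
            action = max(actions, key=lambda a: q[(state, a)])

        # Execute the action and observe the reward.
        reward, next_state, energy_cost = step(state, action)

        # Learning rate (assumed schedule: 1 / visit count).
        visits[(state, action)] += 1
        alpha = 1.0 / visits[(state, action)]

        # Q value for the executed action (standard Q-learning target).
        target = reward + gamma * max(q[(next_state, a)] for a in actions)
        q[(state, action)] += alpha * (target - q[(state, action)])

        # Value function of the state in which the action was executed.
        value[state] = max(q[(state, a)] for a in actions)

        # Drain the battery and move to the next state.
        battery -= energy_cost
        state = next_state
    return q, value


def toy_step(state, action):
    """Hypothetical two-state environment used only to exercise the loop."""
    if action == "run":
        reward = 1.0 if state == "busy" else -1.0
        energy_cost = 2.0
    else:  # "sleep"
        reward = 0.0
        energy_cost = 0.5
    next_state = "idle" if state == "busy" else "busy"
    return reward, next_state, energy_cost


q_table, value_fn = run_scheduler_agent(
    ["idle", "busy"], ["sleep", "run"], toy_step, battery=50.0)
```

The termination condition mirrors the `while` guard of the algorithm: every action consumes a positive amount of energy, so the loop ends once the battery budget is spent. Swapping in a different policy or learning-rate schedule only changes the two lines marked as assumed.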