|
Algorithm 1 Distributed Q-learning for device m with ε-greedy learning policy. |
for to N
do
end for
for to
do
Follow the ε-greedy policy (28) to get the next action.
Based on the chosen action, change the computing state and sub-channel.
Observe sub-channel interference and calculate the reward according to (26).
for to N
do
Calculate the Q-value according to (25) with the learning rate formulated in (30).
end for
end for
|
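The per-device loop of Algorithm 1 can be illustrated with a minimal Python sketch. It is not the paper's exact procedure: equations (25), (26), (28) and (30) are not reproduced here, so the sketch assumes the standard tabular Q-learning update, a visit-count learning rate of 1/n, and a reward supplied by a caller-provided environment function. The names `epsilon_greedy`, `q_update`, `run_device` and `step_env` are hypothetical, and the sketch updates a single table entry per step, which is a simplification.

```python
import random


def epsilon_greedy(q_row, epsilon, rng):
    """Pick a random action with probability epsilon, otherwise a greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    best = max(q_row)
    # Break ties between equally good actions at random.
    return rng.choice([a for a, q in enumerate(q_row) if q == best])


def q_update(q, visits, state, action, reward, next_state, gamma):
    """One tabular Q-learning step (assumed standard form, not the paper's (25))."""
    visits[state][action] += 1
    # Assumption: learning rate decays as 1 / visit count; the paper's (30) may differ.
    alpha = 1.0 / visits[state][action]
    target = reward + gamma * max(q[next_state])
    q[state][action] += alpha * (target - q[state][action])
    return q[state][action]


def run_device(n_states, n_actions, step_env, n_steps,
               epsilon=0.1, gamma=0.9, seed=0):
    """Run the learning loop for one device and return its Q-table."""
    rng = random.Random(seed)
    # Initialisation: all Q-values and visit counts start at zero.
    q = [[0.0] * n_actions for _ in range(n_states)]
    visits = [[0] * n_actions for _ in range(n_states)]
    state = 0
    for _ in range(n_steps):
        action = epsilon_greedy(q[state], epsilon, rng)
        # step_env stands in for changing the computing state and sub-channel,
        # observing interference, and computing the reward.
        next_state, reward = step_env(state, action)
        q_update(q, visits, state, action, reward, next_state, gamma)
        state = next_state
    return q
```

As a usage example, `run_device(2, 2, lambda s, a: ((s + 1) % 2, 1.0 if a == 1 else 0.0), n_steps=200)` trains on a toy two-state environment in which action 1 is always rewarded. In a multi-device setting, each device would run its own copy of this loop with its own Q-table.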