Distributed Learning Based Joint Communication and Computation Strategy of IoT Devices in Smart Cities

2020 Feb 12;20(4):973. doi: 10.3390/s20040973

Algorithm 1 Distributed Q-learning for device $m$ with $ϵ$-greedy learning policy.

for $i = 0$ to $N$ do
  $Q_{m}^{i}(0) = 0$
end for
for $k = 1$ to $k^{*}$ do
  $ϵ_{k} = ϵ_{0} k^{-1/N}$
  Follow the $ϵ$-greedy policy (28) to get the next action $a_{m}(k)$.
  Based on $a_{m}(k)$, change the computing state and sub-channel.
  Observe the sub-channel interference and calculate the reward $r_{m}^{a_{m}(k)}(k)$ according to (26).
  for $i = 0$ to $N$ do
    Calculate $Q_{m}^{i}(k)$ according to (25) with the learning rate given in (30).
  end for
end for
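
The control flow of Algorithm 1 can be illustrated with a minimal Python sketch for a single device. Only the loop structure and the exploration schedule $ϵ_{k} = ϵ_{0} k^{-1/N}$ are taken from the listing. Equations (25), (26), (28) and (30) are not reproduced in this section, so the sketch substitutes assumptions for them: a caller-supplied `reward_fn` stands in for (26), a standard stateless Q-update applied to the selected action stands in for (25), and a per-action visit-count step size `1 / visits` stands in for (30). The names `epsilon_schedule`, `run_device` and `reward_fn` are hypothetical.

```python
import random
from typing import Callable, List


def epsilon_schedule(eps0: float, k: int, n: int) -> float:
    """Exploration rate from the listing: eps_k = eps0 * k^(-1/N), with N >= 1."""
    return eps0 * k ** (-1.0 / n)


def run_device(
    n: int,
    k_star: int,
    eps0: float,
    reward_fn: Callable[[int, int], float],
    rng: random.Random,
) -> List[float]:
    """Run the learning loop for one device and return its Q-values.

    Actions are indexed 0..N, as in the listing. reward_fn(action, k) is a
    hypothetical stand-in for the reward of Eq. (26).
    """
    q = [0.0] * (n + 1)       # Q_m^i(0) = 0 for i = 0..N
    visits = [0] * (n + 1)    # per-action counters (assumption, see below)

    for k in range(1, k_star + 1):
        eps = epsilon_schedule(eps0, k, n)

        # epsilon-greedy choice: explore with probability eps, else exploit.
        if rng.random() < eps:
            action = rng.randrange(n + 1)
        else:
            action = max(range(n + 1), key=lambda i: q[i])

        # Apply the action and observe the reward (placeholder for Eq. (26)).
        reward = reward_fn(action, k)

        # ASSUMPTION: visit-count learning rate in place of Eq. (30).
        visits[action] += 1
        rate = 1.0 / visits[action]

        # ASSUMPTION: standard stateless update in place of Eq. (25);
        # only the selected action changes, all other Q-values are kept.
        for i in range(n + 1):
            if i == action:
                q[i] += rate * (reward - q[i])

    return q
```

With a fixed toy reward per action, for example `run_device(2, 2000, 1.0, lambda a, k: [0.1, 0.9, 0.3][a], random.Random(0))`, the exploration probability decays as $k^{-1/N}$ and the greedy choice settles on the action with the largest reward. In the multi-device setting of the paper the reward additionally depends on the interference caused by the other devices, which this single-device sketch does not model.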