Algorithm 2 Distributed Q-learning for device m with Boltzmann learning policy.
for each of the N actions do
    Initialize the Q-value and the action selection probability for this action.
end for
for each learning iteration do
    if the stopping criterion is met then
        exit.
    end if
    Choose the action according to the current action selection probability vector.
    Based on the chosen action, change the computing state and sub-channel.
    Observe the sub-channel interference and calculate the reward according to (26).
    for each of the N actions do
        Calculate the Q-value according to (25), with the learning rate formulated in (33).
        Update the action selection probability vector according to (31).
    end for
end for
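The per-device loop above can be sketched in Python. The Boltzmann (softmax) action selection is standard; the reward model, the 1/t learning-rate decay, and all parameter values below are illustrative placeholders for the paper's equations (25), (26), (31), and (33), not the paper's actual formulas.

```python
import math
import random

def boltzmann_probs(q_values, temperature):
    """Softmax (Boltzmann) action-selection probabilities over Q-values."""
    # Subtract the max before exponentiating for numerical stability.
    m = max(q_values)
    exps = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]

def run_device(num_actions=4, num_iters=500, temperature=0.5, seed=0):
    """One device's Q-learning loop with Boltzmann exploration.

    The noisy reward (best action = 2) and the 1/t learning rate are
    illustrative stand-ins for the paper's equations (26) and (33).
    """
    rng = random.Random(seed)
    q = [0.0] * num_actions
    for t in range(1, num_iters + 1):
        # Sample an action from the current probability vector (cf. (31)).
        probs = boltzmann_probs(q, temperature)
        action = rng.choices(range(num_actions), weights=probs)[0]
        # Placeholder reward: only action 2 pays off on average.
        reward = rng.gauss(1.0 if action == 2 else 0.0, 0.1)
        # Decaying learning rate, a stand-in for the schedule in (33).
        alpha = 1.0 / t
        # Stateless Q-update for the chosen action, a stand-in for (25).
        q[action] += alpha * (reward - q[action])
    return q
```

Rather than storing the probability vector explicitly, this sketch recomputes the softmax from the Q-values at every iteration, which has the same effect as the per-action probability update in the inner loop of Algorithm 2.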