Application of Deep Reinforcement Learning to UAV Swarming for Ground Surveillance

2023 Oct 27;23(21):8766. doi: 10.3390/s23218766

Algorithm 1 Q-learning

initialize $Q(s, a)$ arbitrarily, where s denotes the state of the agent and a denotes the action
for each episode do
    initialize s
    while s is not a terminal state and the step count < the maximum number of steps do
        choose a from s using a policy derived from Q
        take action a, observe reward r and next state $s'$
        $Q(s, a) \leftarrow Q(s, a) + \alpha [r + \gamma \max_{a'} Q(s', a') - Q(s, a)]$
        $s \leftarrow s'$
    end while
end for
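
The loop in Algorithm 1 can be sketched as a small tabular implementation. This is a minimal illustration under stated assumptions, not the implementation used in the paper. The function `q_learning`, the toy environment `chain_step`, the ε-greedy action selection, and all hyperparameter values are hypothetical choices made for the example. Q is initialized to zero, which is one valid "arbitrary" initialization.

```python
import random


def q_learning(n_states, n_actions, step, start_state, is_terminal,
               episodes=500, max_steps=100,
               alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning following the structure of Algorithm 1.

    `step(s, a)` is an assumed environment callback returning (next_state, reward).
    """
    rng = random.Random(seed)
    # Initialize Q(s, a); zeros are used here as the arbitrary starting values.
    Q = [[0.0] * n_actions for _ in range(n_states)]

    for _ in range(episodes):
        s = start_state
        steps = 0
        while not is_terminal(s) and steps < max_steps:
            # Policy derived from Q: epsilon-greedy (an assumption of this sketch).
            if rng.random() < epsilon:
                a = rng.randrange(n_actions)
            else:
                best = max(Q[s])
                # Break ties between equally valued actions at random.
                a = rng.choice([i for i, q in enumerate(Q[s]) if q == best])

            s_next, r = step(s, a)

            # Q(s,a) <- Q(s,a) + alpha * [r + gamma * max_a' Q(s',a') - Q(s,a)]
            Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])

            s = s_next
            steps += 1
    return Q


def chain_step(s, a):
    """Toy 5-state chain: action 0 moves left, action 1 moves right.

    Reaching state 4 yields reward 1; every other transition yields 0.
    """
    s_next = max(0, s - 1) if a == 0 else min(4, s + 1)
    reward = 1.0 if s_next == 4 else 0.0
    return s_next, reward


Q = q_learning(n_states=5, n_actions=2, step=chain_step,
               start_state=0, is_terminal=lambda s: s == 4)
greedy_policy = [max(range(2), key=lambda a: Q[s][a]) for s in range(4)]
```

On this toy chain the greedy policy read from the learned table moves right in every non-terminal state. A table-based Q is only practical for small discrete state spaces, which is why larger problems replace it with a function approximator.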