2020 Nov 18;20(22):6595. doi: 10.3390/s20226595
Algorithm 1. The Q-learning method pseudocode.
Initialize
  Set the state s and the action a
  For each state s_i and action a_i
    Set Q(s_i, a_i) = 0
  End For
  Randomly choose an initial state s_t
While the terminal condition is not reached do
  Choose the best action a_t for the current state s_t from the Q-table
  Execute action a_t, then get the immediate reward
  Find the new state s_{t+1}
  Acquire the corresponding maximum Q-value of s_{t+1}
  Update the Q-table by (19)
  Update the state: s_t ← s_{t+1}
End While
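
The loop in Algorithm 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Update (19) is not reproduced in this section, so the sketch assumes the standard tabular Q-learning rule Q(s_t, a_t) ← Q(s_t, a_t) + α[r + γ max_a Q(s_{t+1}, a) − Q(s_t, a_t)]. The five-state chain environment, the function names (`q_learning`, `chain_step`), the parameter values for `alpha`, `gamma`, and `epsilon`, the random tie-breaking, and the small exploration probability are all hypothetical additions. The exploration step is included because a purely greedy choice over an all-zero table would always pick the same action.

```python
import random


def q_learning(n_states, n_actions, step, is_terminal, episodes=200,
               alpha=0.5, gamma=0.9, epsilon=0.1, max_steps=100, seed=0):
    """Tabular Q-learning following the structure of Algorithm 1.

    `step(s, a)` must return (next_state, reward). The update rule below is
    the standard Q-learning rule, assumed here to stand in for Eq. (19).
    """
    rng = random.Random(seed)
    # Initialize: Q(s_i, a_i) = 0 for every state-action pair.
    q = [[0.0] * n_actions for _ in range(n_states)]
    start_states = [s for s in range(n_states) if not is_terminal(s)]

    for _ in range(episodes):
        # Randomly choose an initial state s_t.
        s = rng.choice(start_states)
        for _ in range(max_steps):
            if is_terminal(s):
                break
            # Choose the best action from the Q-table (ties broken at random);
            # with probability epsilon explore instead (an assumption, see above).
            if rng.random() < epsilon:
                a = rng.randrange(n_actions)
            else:
                best = max(q[s])
                a = rng.choice([i for i, v in enumerate(q[s]) if v == best])
            # Execute the action, observe the reward and the new state s_{t+1}.
            s_next, r = step(s, a)
            # Maximum Q-value of s_{t+1}; a terminal state has no future value.
            max_next = 0.0 if is_terminal(s_next) else max(q[s_next])
            # Update the Q-table (assumed standard form of Eq. (19)).
            q[s][a] += alpha * (r + gamma * max_next - q[s][a])
            # Update the state: s_t <- s_{t+1}.
            s = s_next
    return q


def chain_step(s, a):
    """Hypothetical 5-state chain: action 0 moves left, action 1 moves right.

    Reaching state 4 (the terminal state) yields reward 1, otherwise 0.
    """
    s_next = max(0, s - 1) if a == 0 else min(4, s + 1)
    return s_next, (1.0 if s_next == 4 else 0.0)


q = q_learning(5, 2, chain_step, lambda s: s == 4)
# Greedy policy read off the learned table for the non-terminal states.
policy = [max(range(2), key=lambda a: q[s][a]) for s in range(4)]
print(policy)
```

In this toy setting the learned greedy policy should move right from every non-terminal state, since that is the only way to reach the rewarded terminal state.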