2023 Oct 27;23(21):8766. doi: 10.3390/s23218766
Algorithm 1 Q-learning
  • initialize Q(s,a) arbitrarily, where s denotes the state of the agent and a denotes the action

  • for each episode do

  •     initialize s

  •     while s is not a terminal state and the number of steps < the maximum number of steps do

  •         choose a from s using policy derived from Q

  •         take action a, observe reward r and next state $s'$

  •         $Q(s,a) \leftarrow Q(s,a) + \alpha \left[ r + \gamma \max_{a'} Q(s',a') - Q(s,a) \right]$

  •         $s \leftarrow s'$

  •     end while

  • end for
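
The loop in Algorithm 1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `q_learning` and `corridor_step`, the five-state corridor environment, the hyperparameter values, and the epsilon-greedy rule used as the "policy derived from Q" are all hypothetical choices made for the example.

```python
import random

def q_learning(step, n_states, n_actions, start_state, is_terminal,
               episodes=500, max_steps=100, alpha=0.1, gamma=0.9,
               epsilon=0.1, seed=0):
    """Tabular Q-learning following the structure of Algorithm 1."""
    rng = random.Random(seed)
    # Initialize Q(s, a) arbitrarily; all zeros is one common choice.
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s = start_state
        steps = 0
        while not is_terminal(s) and steps < max_steps:
            # Policy derived from Q: epsilon-greedy (an assumption here),
            # with random tie-breaking among equally good actions.
            if rng.random() < epsilon:
                a = rng.randrange(n_actions)
            else:
                best = max(Q[s])
                a = rng.choice([i for i, q in enumerate(Q[s]) if q == best])
            # Take action a, observe reward r and next state s'.
            r, s_next = step(s, a)
            # Q(s,a) <- Q(s,a) + alpha * [r + gamma * max_a' Q(s',a') - Q(s,a)]
            Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])
            s = s_next
            steps += 1
    return Q

# Hypothetical environment: a corridor of states 0..4.
# Action 0 moves left, action 1 moves right; reaching state 4 gives reward 1.
GOAL = 4

def corridor_step(s, a):
    s_next = max(0, s - 1) if a == 0 else min(GOAL, s + 1)
    reward = 1.0 if s_next == GOAL else 0.0
    return reward, s_next

Q = q_learning(corridor_step, n_states=5, n_actions=2, start_state=0,
               is_terminal=lambda s: s == GOAL)
# Greedy action in each non-terminal state after training.
greedy = [max(range(2), key=lambda a: Q[s][a]) for s in range(GOAL)]
print(greedy)
```

In this toy setting the greedy policy read off the learned table should choose "right" (action 1) in every non-terminal state, since that is the only way to reach the rewarded goal state.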