2021 Jun 11;23(6):737. doi: 10.3390/e23060737
Algorithm 1 Q-Learning Algorithm
Input: State set S, Action set A, Reward function R
Output: Q table
Initialize Q table
while i ≤ the number of iterations do
    Reset s to the initial state
    while s is not terminal do
        Choose a from A using a policy derived from Q(s) (e.g., ϵ-greedy)
        Take action a, observe r, s′
        $Q(s,a) \leftarrow Q(s,a) + \alpha\,[\,r + \gamma \max_{a'} Q(s',a') - Q(s,a)\,]$
        $s \leftarrow s'$
    end while
end while
Return Q
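A minimal Python sketch of the tabular procedure above may help make the loop structure concrete. Several details are assumptions, not taken from the paper: the Q table is initialized with zeros; the environment is a hypothetical `step(s, a)` function returning `(reward, next_state, done)` (standing in for the reward function R); the toy 5-state corridor, the `max_steps` cap, and the values of `alpha`, `gamma`, and `epsilon` are illustrative choices only.

```python
import random
from typing import Callable, List, Tuple

# Hypothetical environment interface (assumption): step(s, a) -> (reward, next_state, done)
StepFn = Callable[[int, int], Tuple[float, int, bool]]


def q_learning(n_states: int, n_actions: int, step: StepFn, start_state: int = 0,
               n_iterations: int = 500, alpha: float = 0.1, gamma: float = 0.9,
               epsilon: float = 0.1, max_steps: int = 1000,
               seed: int = 0) -> List[List[float]]:
    """Tabular Q-learning mirroring the loop structure of Algorithm 1 (sketch)."""
    rng = random.Random(seed)
    # Initialize the Q table (zeros are an assumption; the paper does not say).
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(n_iterations):        # outer loop over iterations (episodes)
        s = start_state                  # reset s to the initial state
        for _ in range(max_steps):       # hypothetical cap on episode length
            # epsilon-greedy action derived from Q(s); greedy ties broken at random
            if rng.random() < epsilon:
                a = rng.randrange(n_actions)
            else:
                best = max(q[s])
                a = rng.choice([i for i, v in enumerate(q[s]) if v == best])
            r, s_next, done = step(s, a)
            # TD target r + gamma * max_a' Q(s', a'); terminal states add no future value
            target = r if done else r + gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])
            s = s_next                   # s <- s'
            if done:
                break
    return q


# Hypothetical toy problem: states 0..4 in a corridor, state 4 is terminal.
N_STATES = 5


def corridor_step(s: int, a: int) -> Tuple[float, int, bool]:
    # action 0 = move left (blocked at state 0), action 1 = move right
    s_next = max(0, s - 1) if a == 0 else s + 1
    done = s_next == N_STATES - 1
    return (1.0 if done else 0.0), s_next, done


Q = q_learning(N_STATES, 2, corridor_step)
# Greedy action per non-terminal state after training
greedy = [max(range(2), key=lambda a: Q[s][a]) for s in range(N_STATES - 1)]
print(greedy)
```

Because reward arrives only at the right end, the learned values should decay by roughly a factor of γ per step away from the goal, so the greedy policy read off the table should move right in every non-terminal state.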