Entropy. 2021 Jun 11;23(6):737. doi: 10.3390/e23060737
Algorithm 4 Approximate State Matching Q-learning Algorithm
Input: Environment E: S = S_u; A = A_u; T = T_u; R = R_u
Output: Learned Q table
  1: Initialize Q table
  2: while i < the number of iterations do
  3:    Return s to the initial state
  4:    while s is not terminal state do
  5:        if s is not in Q table then
  6:           Add s to the Q table and initialize Q(s, ·)
  7:        end if
  8:        if s is a Guided State then
  9:           Choose a from A using policy derived from Q(s)
10:        else
11:           Select the similar set S_ene of s using Algorithm 2
12:           Q(s, a, temporary) = Σ_{ε ∈ S_ene} Q(ε, a) / |S_ene| for all a ∈ A
13:           Choose a from A using policy derived from Q(s,temporary)
14:           if a = e then
15:               Select the similar sets S_vp and S_hp of s using Algorithm 3
16:               Q(s, a, temporary) = Σ_{ε ∈ S_vp∪S_hp, Q(ε,a) ≠ 0} Q(ε, a) / |{ε ∈ S_vp∪S_hp : Q(ε, a) ≠ 0}| for all a ∈ A
17:               Choose a from A using policy derived from Q(s,temporary)
18:           end if
19:        end if
20:        Take action a, observe s′ = T_u(s, a) and r = R_u(s, a, s′)
21:        Q(s, a) ← Q(s, a) + α[r + γ max_{a′} Q(s′, a′) − Q(s, a)]
22:        s ← s′
23:    end while
24: end while
25: Return Q table
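The control flow of Algorithm 4 can be sketched in Python. This is a minimal, hedged sketch: the similarity routines of Algorithms 2 and 3 are not shown in this section, so `similar_set_ene` and the guided-state test `is_guided` are placeholder parameters, and the fallback branch of lines 14–17 is only indicated in a comment. The environment interface (`reset`, `step`, `is_terminal`) is likewise an assumption for illustration.

```python
import random
from collections import defaultdict

def asm_q_learning(env, actions, episodes=500, alpha=0.1, gamma=0.9,
                   epsilon=0.1,
                   is_guided=lambda s: True,          # placeholder guided-state test
                   similar_set_ene=lambda s, Q: []):  # placeholder for Algorithm 2
    # Line 1: initialize the Q table; unseen states get a zero row on first access
    Q = defaultdict(lambda: {a: 0.0 for a in actions})

    def eps_greedy(row):
        # Epsilon-greedy policy derived from one Q row (lines 9, 13, 17)
        if random.random() < epsilon:
            return random.choice(actions)
        return max(row, key=row.get)

    def averaged_row(similar):
        # Temporary Q(s, a): average Q over the similar states (line 12),
        # skipping zero (unvisited) entries
        row = {}
        for a in actions:
            vals = [Q[e][a] for e in similar if Q[e][a] != 0.0]
            row[a] = sum(vals) / len(vals) if vals else 0.0
        return row

    for _ in range(episodes):                    # line 2: iteration loop
        s = env.reset()                          # line 3: return to initial state
        while not env.is_terminal(s):            # line 4: episode loop
            _ = Q[s]                             # lines 5-7: add s if absent
            if is_guided(s):                     # line 8
                a = eps_greedy(Q[s])             # line 9
            else:
                similar = similar_set_ene(s, Q)  # line 11 (Algorithm 2)
                a = eps_greedy(averaged_row(similar))  # lines 12-13
                # Lines 14-17 would re-derive a from Algorithm 3's similar
                # sets S_vp ∪ S_hp when the fallback condition holds; that
                # routine is outside this sketch.
            s2, r = env.step(s, a)               # line 20: observe s′ and r
            best_next = max(Q[s2].values())      # line 21: Q-learning update
            Q[s][a] += alpha * (r + gamma * best_next - Q[s][a])
            s = s2                               # line 22
    return dict(Q)
```

With the default `is_guided` (every state guided), the sketch reduces to standard tabular Q-learning, which makes the approximate-matching branch easy to test in isolation by swapping in real implementations of Algorithms 2 and 3.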