| Algorithm 4 Approximate State Matching Q-learning Algorithm |
|---|
| Input: Environment |
| Output: Learned Q table |
| 1: Initialize Q table |
| 2: while the number of iterations … do |
| 3: Return s to the initial state |
| 4: while s is not a terminal state do |
| 5: if s is not in the Q table then |
| 6: Add s to the Q table and initialize s |
| 7: end if |
| 8: if s is a Guided State then |
| 9: Choose a from A using policy derived from … |
| 10: else |
| 11: Select the similar set of s using Algorithm 2 |
| 12: for all … |
| 13: Choose a from A using policy derived from … |
| 14: if … then |
| 15: Select the similar set of s using Algorithm 3 |
| 16: for all … |
| 17: Choose a from A using policy derived from … |
| 18: end if |
| 19: end if |
| 20: Take action a, observe … and … |
| 21: … |
| 22: … |
| 23: end while |
| 24: end while |
| 25: Return Q table |
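A minimal Python sketch of the loop above, written under assumptions the listing does not state. The names `similar_set`, `choose_action`, `asm_q_learning` and the toy `Corridor` environment are hypothetical. `similar_set` (known states within a Chebyshev radius `near` or `far`) is only a stand-in for Algorithms 2 and 3, which are not reproduced here. The condition on step 14 is assumed to be "the first similar set is empty", steps 12-13 and 16-17 are merged into a single epsilon-greedy choice over Q-values averaged across s and its similar set, and steps 21-22 are assumed to be the standard Q-learning update followed by moving to the next state. The `max_steps` cap is an added safeguard, not part of the listing.

```python
import random


def similar_set(state, known_states, radius):
    """Hypothetical stand-in for Algorithms 2/3: known states within `radius`
    (Chebyshev distance) of `state`, excluding `state` itself."""
    return [k for k in known_states
            if k != state and max(abs(x - y) for x, y in zip(k, state)) <= radius]


def choose_action(q, states, n_actions, epsilon, rng):
    """Epsilon-greedy over Q-values averaged across `states` (assumed aggregation)."""
    if rng.random() < epsilon:
        return rng.randrange(n_actions)
    means = [sum(q[s][a] for s in states) / len(states) for a in range(n_actions)]
    best = max(means)
    return rng.choice([a for a, v in enumerate(means) if v == best])


def asm_q_learning(env, n_actions, episodes, is_guided, alpha=0.1, gamma=0.9,
                   epsilon=0.1, near=1, far=2, max_steps=1000, seed=0):
    rng = random.Random(seed)
    q = {}                                           # step 1: empty Q table
    for _ in range(episodes):                        # step 2
        s = env.reset()                              # step 3
        done, steps = False, 0
        while not done and steps < max_steps:        # step 4 (plus a safety cap)
            if s not in q:                           # steps 5-7: add unseen state
                q[s] = [0.0] * n_actions
            if is_guided(s):                         # steps 8-9: act on s alone
                a = choose_action(q, [s], n_actions, epsilon, rng)
            else:
                sim = similar_set(s, q, near)        # step 11 (stand-in for Alg. 2)
                if not sim:                          # step 14 (assumed condition)
                    sim = similar_set(s, q, far)     # step 15 (stand-in for Alg. 3)
                a = choose_action(q, [s] + sim, n_actions, epsilon, rng)
            s_next, r, done = env.step(a)            # step 20
            q.setdefault(s_next, [0.0] * n_actions)
            target = r if done else r + gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])    # assumed standard Q-learning update
            s, steps = s_next, steps + 1             # move to the next state
    return q


class Corridor:
    """Toy 1-D corridor: start at (0,); action 1 moves right, 0 moves left;
    reaching (length,) ends the episode with reward 1."""

    def __init__(self, length=5):
        self.length = length
        self.pos = 0

    def reset(self):
        self.pos = 0
        return (self.pos,)

    def step(self, action):
        self.pos = max(0, self.pos + (1 if action == 1 else -1))
        done = self.pos == self.length
        return (self.pos,), (1.0 if done else 0.0), done


q = asm_q_learning(Corridor(5), n_actions=2, episodes=200,
                   is_guided=lambda s: s == (0,))
print(q[(0,)])
```

In this sketch the similarity matching only influences which action is tried; the table itself is updated with the ordinary Q-learning rule on s alone. Whether Algorithm 4 also updates the entries of matched states cannot be recovered from the listing.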