Improved Q-Learning Algorithm Based on Approximate State Matching in Agricultural Plant Protection Environment

. 2021 Jun 11;23(6):737. doi: 10.3390/e23060737

Algorithm 4 Approximate State Matching Q-learning Algorithm

Input: Environment

E : S = S_{u}; A = A_{u}; T = T_{u}; R = R_{u}

Output: Learned Q table
1: Initialize Q table
2: while

i \leq

the number of iterations do
  3:    Return s to the initial state
  4:    while s is not terminal state do
  5:        if s is not in Q table then
  6:           Add s to the Q table and initialize s
  7:        end if
  8:        if s is a Guided State then
  9:           Choose a from A using policy derived from

Q (s)

10: else
11: Select the similar set

S_{e n e}

of s using Algorithm 2
12:

Q (s, a, t e m p o r a r y) = \frac{\sum_{ϵ \in S_{e n e}} Q (ϵ, a)}{∥ S_{e n e} ∥}

for all

a \in A

13: Choose a from A using policy derived from

Q (s, t e m p o r a r y)

14: if

a \neq e

then
15: Select the similar set

S_{v p} \cap S_{h p}

of s using Algorithm 3
16:

Q (s, a, t e m p o r a r y) = \frac{\sum_{(ϵ \in S_{v p} \cap S_{h p}) \land Q (ϵ, a) \geq 0} Q (ϵ, a)}{∥ S_{v p} \cap S_{h p} ∥}

for all

a \in A

17: Choose a from A using policy derived from

Q (s, t e m p o r a r y)

18:           end if
19:        end if
20:        Take action a, observe

s^{^{'}} = T_{u} (s, a)

and

r = R_{u} (s, a, s^{'})

21:

Q (s, a) \leftarrow Q (s, a) + α [r + γ m a x_{a^{^{'}}} Q (s^{^{'}}, a^{^{'}}) - Q (s, a)]

22:

s \leftarrow s^{^{'}}

23: end while
24: end while
25: Return Q table