Improved Q-Learning Algorithm Based on Approximate State Matching in Agricultural Plant Protection Environment

2021 Jun 11;23(6):737. doi: 10.3390/e23060737

Algorithm 1 Q-Learning Algorithm

Input: State set $S$, action set $A$, reward function $R$
Output: Q table
1: Initialize Q table
2: while $i \leq$ the number of iterations do
3:     Reset $s$ to the initial state
4:     while $s$ is not terminal do
5:         Choose $a$ from $A$ using a policy derived from $Q(s)$ (e.g., $\epsilon$-greedy)
6:         Take action $a$, observe $r$ and $s'$
7:         $Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]$
8:         $s \leftarrow s'$
9:     end while
10: end while
11: Return Q
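The loop in Algorithm 1 can be sketched as a short, runnable tabular Q-learning routine. This is a minimal sketch under stated assumptions, not the authors' implementation: the 1-D corridor environment (reward 1 for reaching the right-most cell, 0 otherwise), the names `step`, `epsilon_greedy`, and `q_learning`, and the hyperparameter defaults ($\alpha = 0.1$, $\gamma = 0.9$, $\epsilon = 0.1$) are all hypothetical choices for illustration. It covers only the standard Q-learning baseline of Algorithm 1, not the approximate state matching improvement or the agricultural plant protection environment.

```python
import random

# Hypothetical toy environment (NOT the paper's plant-protection environment):
# a 1-D corridor of N_STATES cells; the episode ends when the agent reaches
# the right-most cell, which pays reward 1. All other transitions pay 0.
N_STATES = 6
ACTIONS = (0, 1)  # 0 = move left, 1 = move right
TERMINAL = N_STATES - 1


def step(s, a):
    """Apply action a in state s; return (reward, next_state)."""
    s_next = max(0, s - 1) if a == 0 else min(TERMINAL, s + 1)
    r = 1.0 if s_next == TERMINAL else 0.0
    return r, s_next


def epsilon_greedy(q, s, epsilon, rng):
    """Step 5: explore with probability epsilon, otherwise act greedily on Q(s)."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q[s])
    # Break ties randomly so an all-zero Q row does not always pick action 0.
    return rng.choice([a for a in ACTIONS if q[s][a] == best])


def q_learning(n_iterations=200, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    # Step 1: initialize the Q table with zeros, one row per state.
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(n_iterations):               # step 2: loop over episodes
        s = 0                                    # step 3: reset to the initial state
        while s != TERMINAL:                     # step 4: run until terminal
            a = epsilon_greedy(q, s, epsilon, rng)   # step 5
            r, s_next = step(s, a)                   # step 6
            # Step 7: temporal-difference update. The terminal row is never
            # updated, so max_a' Q(s', a') is 0 when s' is terminal.
            td_target = r + gamma * max(q[s_next])
            q[s][a] += alpha * (td_target - q[s][a])
            s = s_next                               # step 8
    return q                                         # step 11


if __name__ == "__main__":
    q = q_learning()
    # Greedy action per non-terminal cell (1 = move right).
    greedy_policy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(TERMINAL)]
    print(greedy_policy)
```

Because every episode ends with the single move from the cell next to the goal into the terminal cell, that state-action pair receives exactly one update per episode, so after 200 episodes its value is $1 - 0.9^{200}$. Keeping the terminal row fixed at zero is what makes the bootstrapped term $\gamma \max_{a'} Q(s', a')$ vanish at the end of an episode.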