| Algorithm 1. The Q-learning method pseudocode. |
| --- |
| Initialize: |
| Set the state and the action |
| For each state and action |
| Set the initial Q-value |
| End For |
| Randomly choose an initial state |
| While the terminal condition is not reached do |
| Choose the best action for the current state from the Q-table |
| Execute the action, then get the immediate reward |
| Find the new state |
| Acquire the maximum Q-value of the new state |
| Update the Q-table by (19) |
| Update the state |
| End While |
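
The loop in Algorithm 1 can be illustrated with a minimal tabular sketch in Python. Several parts of it are assumptions rather than details taken from the algorithm box: the update line uses the standard Q-learning rule as a stand-in for Eq. (19), the Q-table is initialized to zero, a small exploration rate `epsilon` is added on top of the greedy choice, and the five-state corridor environment (`chain_step`) and all parameter values are hypothetical.

```python
import random


def q_learning(n_states, n_actions, step, is_terminal, episodes=200,
               alpha=0.5, gamma=0.9, epsilon=0.1, max_steps=100, seed=0):
    """Tabular Q-learning sketch; all hyperparameter values are assumptions."""
    rng = random.Random(seed)
    # For each state and action, set the initial Q-value (zero is an assumption).
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        # Randomly choose an initial (non-terminal) state.
        state = rng.choice([s for s in range(n_states) if not is_terminal(s)])
        for _ in range(max_steps):
            if is_terminal(state):
                break
            # Choose the best action from the Q-table. The occasional random
            # action (epsilon) is an added assumption so the agent explores.
            if rng.random() < epsilon:
                action = rng.randrange(n_actions)
            else:
                best = max(q[state])
                action = rng.choice(
                    [a for a in range(n_actions) if q[state][a] == best])
            # Execute the action, get the immediate reward and the new state.
            reward, new_state = step(state, action)
            # Maximum Q-value of the new state (zero once the episode has ended).
            max_next = 0.0 if is_terminal(new_state) else max(q[new_state])
            # Assumed standard update rule, standing in for Eq. (19).
            q[state][action] += alpha * (
                reward + gamma * max_next - q[state][action])
            # Update the state.
            state = new_state
    return q


def chain_step(state, action):
    """Hypothetical 5-state corridor: action 1 moves right, action 0 moves left."""
    new_state = min(state + 1, 4) if action == 1 else max(state - 1, 0)
    reward = 1.0 if new_state == 4 else 0.0
    return reward, new_state


q_table = q_learning(5, 2, chain_step, is_terminal=lambda s: s == 4)
# Greedy action for each non-terminal state, read off the learned Q-table.
greedy_policy = [max(range(2), key=lambda a: q_table[s][a]) for s in range(4)]
```

In this toy corridor the only reward sits at the right end, so the learned greedy policy should prefer moving right in every state. Setting `epsilon=0` gives the purely greedy selection described in the algorithm box, with ties broken at random.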