| Algorithm 1: Q-Learning Algorithm | |
|---|---|
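| 1: | Initialize a Q-table of size (number of states × number of actions) |
| 2: | Choose the reward discount factor, learning rate, and exploration rate |
| 3: | Do |
| 4: | Draw a random number between 0 and 1 |
| 5: | If the random number is less than the exploration rate |
| 6: | Choose a random action |
| 7: | Else |
| 8: | Choose the action with the maximum Q-value for the current state |
| 9: | Perform the chosen action |
| 10: | Observe the reward and the next state |
| 11: | Update the Q entry using the previously defined Q-update |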
| 11: | Update Q entry based on previously defined Q-update |
| 12: | Until |
| 13: | Reward threshold is achieved |
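The loop above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the corridor environment, its rewards (-0.1 per move, +1 at the goal), the parameter values, the moving-average form of the reward threshold, the `max_episodes`/`max_steps` safety caps, and random tie-breaking among equal Q-values are all hypothetical choices. The update uses the standard Q-learning rule Q(s, a) ← Q(s, a) + α [r + γ max_a' Q(s', a') − Q(s, a)], which is assumed to match the Q-update defined earlier.

```python
import random

# Hypothetical toy environment (an assumption, not part of Algorithm 1):
# a corridor of N_STATES cells. Action 0 moves left, action 1 moves right.
# Each move costs -0.1; reaching the rightmost cell pays +1 and ends the episode.
N_STATES = 6
N_ACTIONS = 2


def step(state, action):
    """Return (next_state, reward, done) for the toy corridor."""
    if action == 0:
        next_state = max(0, state - 1)
    else:
        next_state = min(N_STATES - 1, state + 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else -0.1
    return next_state, reward, done


def q_learning(gamma=0.9, alpha=0.1, epsilon=0.1, reward_threshold=0.4,
               window=20, max_episodes=1000, max_steps=100, seed=0):
    """Tabular Q-learning; returns (Q-table, number of episodes run)."""
    rng = random.Random(seed)
    # Step 1: Q-table of size (states, actions), initialized to zero.
    q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    returns = []
    for episode in range(1, max_episodes + 1):
        state, total = 0, 0.0
        for _ in range(max_steps):
            # Steps 4-8: epsilon-greedy choice (ties between maximal Q-values broken at random).
            if rng.random() < epsilon:
                action = rng.randrange(N_ACTIONS)
            else:
                best = max(q[state])
                action = rng.choice([a for a in range(N_ACTIONS) if q[state][a] == best])
            # Steps 9-10: perform the action, observe the reward and the next state.
            next_state, reward, done = step(state, action)
            # Step 11: standard Q-learning update (no bootstrapping from the terminal state).
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state, total = next_state, total + reward
            if done:
                break
        returns.append(total)
        # "Until" condition (assumed form): mean return of the last `window` episodes
        # reaches the reward threshold.
        recent = returns[-window:]
        if len(recent) == window and sum(recent) / window >= reward_threshold:
            return q, episode
    return q, max_episodes


if __name__ == "__main__":
    q_table, episodes = q_learning()
    greedy = [max(range(N_ACTIONS), key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
    print(f"stopped after {episodes} episodes; greedy actions: {greedy}")
```

Running the script prints the episode at which the stopping rule fired and the greedy action in each non-terminal state (1 means "move right"). The `max_episodes` cap is an added safeguard: Algorithm 1 as written loops until the threshold is met, which never terminates if the threshold is unreachable.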