| Algorithm 2: Pseudocode for distributed Q-learning |
| Initialization: |
| for each agent do |
| initialize Q-table and policy |
| end for |
| Learning: |
| loop |
| estimate state |
| generate a random real number x |
| if x < ε // for exploration |
| select action randomly |
| else |
| select action according to the current policy |
| receive action from Algorithm 1 |
| determine action by comparing the selected action and the received action |
| execute action |
| calculate reward |
| update Q-value and policy |
| end loop |
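
The loop above can be sketched in Python as a set of independent tabular Q-learning agents. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the class `QAgent`, the functions `toy_step` and `run`, the parameter values (`alpha`, `gamma`, `epsilon`), and the one-state toy environment are all hypothetical. In particular, the listing does not say how the action received from Algorithm 1 is compared with the agent's own choice, so the `reconcile` rule here (keep whichever action has the higher current Q-value) is a placeholder assumption.

```python
import random
from collections import defaultdict


class QAgent:
    """One independent tabular Q-learning agent (hypothetical sketch)."""

    def __init__(self, n_actions, alpha=0.1, gamma=0.9, epsilon=0.1, seed=None):
        self.n_actions = n_actions
        self.alpha = alpha      # learning rate (assumed value)
        self.gamma = gamma      # discount factor (assumed value)
        self.epsilon = epsilon  # exploration probability (assumed value)
        self.rng = random.Random(seed)
        # Initialization: Q-table with every entry set to zero.
        self.q = defaultdict(lambda: [0.0] * n_actions)

    def greedy_action(self, state):
        """Policy derived from the Q-table: the action with the largest Q-value."""
        values = self.q[state]
        return max(range(self.n_actions), key=lambda a: values[a])

    def select_action(self, state):
        """Epsilon-greedy selection."""
        x = self.rng.random()  # random real number in [0, 1)
        if x < self.epsilon:   # exploration
            return self.rng.randrange(self.n_actions)
        return self.greedy_action(state)  # exploitation

    def reconcile(self, state, own_action, suggested_action):
        """ASSUMPTION: prefer the externally suggested action only if its
        current Q-value is strictly higher than that of the agent's choice."""
        if suggested_action is None:
            return own_action
        values = self.q[state]
        if values[suggested_action] > values[own_action]:
            return suggested_action
        return own_action

    def update(self, state, action, reward, next_state):
        """Standard one-step Q-learning update."""
        target = reward + self.gamma * max(self.q[next_state])
        self.q[state][action] += self.alpha * (target - self.q[state][action])


def toy_step(state, action):
    """Hypothetical environment: a single state where action 1 pays 1.0."""
    reward = 1.0 if action == 1 else 0.0
    return reward, state


def run(n_agents=3, n_steps=500, seed=0):
    """Run every agent through the learning loop; each keeps its own Q-table."""
    agents = [QAgent(n_actions=2, seed=seed + i) for i in range(n_agents)]
    states = [0] * n_agents
    for _ in range(n_steps):
        for i, agent in enumerate(agents):
            s = states[i]                                     # estimate state
            a = agent.select_action(s)                        # explore or exploit
            a = agent.reconcile(s, a, suggested_action=None)  # no external input here
            r, s_next = toy_step(s, a)                        # execute, get reward
            agent.update(s, a, r, s_next)                     # update Q-value
            states[i] = s_next
    return agents
```

Calling `run()` returns the trained agents. In this toy setting the greedy policy is read directly from the Q-table, so updating a Q-value also updates the policy implicitly. A real deployment would replace `toy_step` with the actual environment and pass the action produced by Algorithm 1 as `suggested_action`.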