| Algorithm 1 Procedure of DQN Optimizer and Classifier. |
| 1 : Initialize replay memory |
| 2 : Initialize action value function Q with random weights |
| 3 : for episode = 1, M do |
| 4 : for t = 1, T do |
| 5 : With probability epsilon, select a random action |
| 6 : if random action is feature: |
| 7 : Execute action in the emulator, and observe reward |
| 8 : Set state and preprocess policy |
| 9 : Store transition in replay memory |
| 10 : Perform a gradient descent step |
| 11 : if random action is subject number: |
| 12 : Execute action in the emulator, and observe reward |
| 13 : Set state and preprocess policy |
| 14 : Store transition in replay memory |
| 15 : end for |
| 16 : end for |
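
The control flow of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the emulator (`toy_emulator`), the linear action-value function, the action encoding (feature actions first, then subject-number actions), and every constant are hypothetical stand-ins. The greedy choice taken with probability 1 - epsilon is also an assumption borrowed from standard DQN, since the listing only states the random branch. Following the listing, the gradient step runs only after feature actions.

```python
import random
from collections import deque

# Hypothetical sizes, chosen only for illustration.
N_FEATURES = 4                      # actions 0..3 select a feature
N_SUBJECTS = 3                      # actions 4..6 select a subject number
N_ACTIONS = N_FEATURES + N_SUBJECTS
STATE_DIM = 5


def toy_emulator(state, action, rng):
    """Hypothetical emulator stub: returns (reward, next_state)."""
    reward = 1.0 if action % 2 == 0 else 0.0
    next_state = [rng.random() for _ in range(STATE_DIM)]
    return reward, next_state


def q_values(weights, state):
    """Linear action-value function, Q(s, a) = w_a . s (an assumption)."""
    return [sum(w * s for w, s in zip(row, state)) for row in weights]


def gradient_step(weights, batch, gamma=0.9, lr=0.01):
    """One semi-gradient step on the squared TD error for each sample."""
    for state, action, reward, next_state in batch:
        target = reward + gamma * max(q_values(weights, next_state))
        error = q_values(weights, state)[action] - target
        for i, s in enumerate(state):
            weights[action][i] -= lr * error * s


def train(episodes=3, steps=10, epsilon=0.1, batch_size=4, seed=0):
    rng = random.Random(seed)
    memory = deque(maxlen=100)                          # step 1
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(STATE_DIM)]
               for _ in range(N_ACTIONS)]               # step 2
    for _ in range(episodes):                           # step 3
        state = [rng.random() for _ in range(STATE_DIM)]
        for _ in range(steps):                          # step 4
            if rng.random() < epsilon:                  # step 5
                action = rng.randrange(N_ACTIONS)
            else:
                # Assumption: act greedily otherwise, as in standard DQN.
                q = q_values(weights, state)
                action = q.index(max(q))
            # Steps 7 / 12: execute the action and observe the reward.
            reward, next_state = toy_emulator(state, action, rng)
            # Steps 9 / 14: store the transition in replay memory.
            memory.append((state, action, reward, next_state))
            # Step 10: gradient step, feature branch only (per the listing).
            if action < N_FEATURES and len(memory) >= batch_size:
                gradient_step(weights, rng.sample(list(memory), batch_size))
            state = next_state                          # steps 8 / 13
    return weights, memory
```

Calling `train()` returns the learned weight rows and the replay memory. With the default arguments the memory holds one transition per time step, that is `episodes * steps` entries, because the buffer capacity is not reached.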