| Algorithm 3 Deep Q-Learning with Experience Replay. |
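|---|
| 1: Initialize replay memory $\mathcal{D}$ to capacity $N$ |
| 2: Initialize action-value function $Q$ with random weights $\theta$ |
| 3: for episode $= 1, M$ do |
| 4: Initialize sequence $s_1 = \{x_1\}$ and preprocessed sequence $\phi_1 = \phi(s_1)$ |
| 5: for $t = 1, T$ do |
| 6: With probability $\epsilon$ select a random action $a_t$ |
| 7: otherwise select $a_t = \arg\max_a Q(\phi(s_t), a; \theta)$ |
| 8: Execute action $a_t$ in the emulator and observe reward $r_t$ and image $x_{t+1}$ |
| 9: Set $s_{t+1} = s_t, a_t, x_{t+1}$ and preprocess $\phi_{t+1} = \phi(s_{t+1})$ |
| 10: Store transition $(\phi_t, a_t, r_t, \phi_{t+1})$ in $\mathcal{D}$ |
| 11: Sample a random minibatch of transitions $(\phi_j, a_j, r_j, \phi_{j+1})$ from $\mathcal{D}$ |
| 12: Set $y_j = r_j$ if $\phi_{j+1}$ is terminal, otherwise $y_j = r_j + \gamma \max_{a'} Q(\phi_{j+1}, a'; \theta)$ |
| 13: Perform a gradient descent step on $(y_j - Q(\phi_j, a_j; \theta))^2$ using the RMSprop update rule |
| 14: end for |
| 15: end for |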
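The loop in Algorithm 3 can be sketched in Python as follows. This is a minimal, hedged illustration under stated assumptions, not the original implementation. A hypothetical five-state chain environment (`ChainEnv`) stands in for the emulator. A one-hot vector (`phi`) stands in for the preprocessing $\phi$, and no frame history is kept. A linear Q-function (`LinearQ`) replaces the deep network, so this is Q-learning with experience replay and a linear approximator rather than a convolutional network. All hyperparameter values (`epsilon`, `gamma`, learning rate, RMSprop decay, capacity, batch size) are illustrative choices, not values from the text. Ending an episode early at the terminal state is also an assumption.

```python
import random
from collections import deque

class ChainEnv:
    """Hypothetical stand-in for the emulator: states 0..n-1 on a line.
    Action 1 moves right, action 0 moves left; reaching state n-1 yields
    reward 1 and ends the episode."""

    def __init__(self, n=5):
        self.n, self.state = n, 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        move = 1 if action == 1 else -1
        self.state = max(0, min(self.n - 1, self.state + move))
        done = self.state == self.n - 1
        return self.state, (1.0 if done else 0.0), done

def phi(state, n):
    """Stand-in for preprocessing phi: one-hot encoding, no frame history."""
    x = [0.0] * n
    x[state] = 1.0
    return x

class ReplayMemory:
    """Replay memory D with capacity N; oldest transitions are evicted first."""

    def __init__(self, capacity, rng):
        self.buffer, self.rng = deque(maxlen=capacity), rng

    def store(self, transition):
        self.buffer.append(transition)

    def sample(self, k):
        return self.rng.sample(list(self.buffer), min(k, len(self.buffer)))

    def __len__(self):
        return len(self.buffer)

def select_action(q_values, epsilon, rng):
    """Epsilon-greedy: random action with probability epsilon, else argmax_a Q."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

def td_target(r, q_next, done, gamma):
    """y_j = r_j if phi_{j+1} is terminal, else r_j + gamma * max_a' Q(phi_{j+1}, a')."""
    return r if done else r + gamma * max(q_next)

class LinearQ:
    """Linear stand-in for the deep network: Q(phi, a; theta) = theta[a] . phi."""

    def __init__(self, n_features, n_actions, rng, lr=0.01, decay=0.9, eps=1e-8):
        self.theta = [[rng.uniform(-0.01, 0.01) for _ in range(n_features)]
                      for _ in range(n_actions)]
        self.cache = [[0.0] * n_features for _ in range(n_actions)]  # RMSprop mean square
        self.lr, self.decay, self.eps = lr, decay, eps

    def values(self, x):
        return [sum(w * xi for w, xi in zip(row, x)) for row in self.theta]

    def update(self, batch, gamma):
        """One RMSprop step on the minibatch mean of (y_j - Q(phi_j, a_j))^2."""
        grads = [[0.0] * len(row) for row in self.theta]
        for x, a, r, x_next, done in batch:
            y = td_target(r, self.values(x_next), done, gamma)  # treated as a constant
            err = y - self.values(x)[a]
            for i, xi in enumerate(x):
                grads[a][i] += -2.0 * err * xi / len(batch)
        for a, row in enumerate(self.theta):
            for i, g in enumerate(grads[a]):
                self.cache[a][i] = self.decay * self.cache[a][i] + (1 - self.decay) * g * g
                row[i] -= self.lr * g / (self.cache[a][i] ** 0.5 + self.eps)

def train(episodes=200, T=50, epsilon=0.5, gamma=0.9, capacity=1000,
          batch_size=16, n=5, seed=0):
    rng = random.Random(seed)
    env, memory, q = ChainEnv(n), ReplayMemory(capacity, rng), LinearQ(n, 2, rng)
    for _ in range(episodes):
        x = phi(env.reset(), n)  # initialize and preprocess the sequence
        for _ in range(T):
            a = select_action(q.values(x), epsilon, rng)
            s_next, r, done = env.step(a)  # execute a_t, observe r_t and next state
            x_next = phi(s_next, n)
            memory.store((x, a, r, x_next, done))  # store transition in D
            q.update(memory.sample(batch_size), gamma)  # minibatch + gradient step
            x = x_next
            if done:  # assumption: end the episode early at the terminal state
                break
    return q, memory
```

For instance, `q, memory = train()` runs 200 short episodes. With these settings one would expect `q.values(phi(3, 5))` to rank action 1 (moving right into the rewarding terminal state) above action 0. The exact numbers depend on the seed and hyperparameters. In this sketch the target $y_j$ is computed with the same weights $\theta$ and held fixed when differentiating. There is no separate target network, which matches the algorithm as listed.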