| Algorithm 1: B-PER DQN algorithm |
| --- |
| 1. Initialize the prioritized experience replay buffer and set its capacity. |
| 2. Initialize the environment parameters and training parameters. |
| 3. For each episode, from 1 to the maximum number of episodes, do: |
| 4. Reset the environment, get the initial state, and set the episode reward to 0. |
| 5. While not done: |
| 6. Select an action according to Equation (8). |
| 7. Execute the action and store the resulting transition in the replay buffer. |
| 8. If the number of stored transitions > 256: |
| 9. Sample a batch of batch-size transitions from the replay buffer according to Equation (5). |
| 10. Compute the update and adjust the network parameters according to Equation (11). |
| 11. End if |
| 12. Dynamically adjust the corresponding parameter according to Equation (10). |
| 13. Apply the attenuation (decay) according to Equation (9). |
| 14. End for |
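
The control flow of Algorithm 1 can be illustrated with a minimal Python sketch. It is not the authors' implementation: Equations (5) and (8) to (11) are not reproduced in this section, so the sketch substitutes standard stand-ins and labels each as an assumption. These are proportional prioritized sampling with importance weights for Equation (5), an epsilon-greedy rule for Equation (8), multiplicative epsilon decay for Equation (9), a linear increase of the importance-sampling exponent `beta` for Equation (10), and a weighted tabular Q-learning update in place of the network update of Equation (11). `ToyChainEnv`, `PrioritizedBuffer`, `train`, and every hyperparameter except the 256-transition threshold are hypothetical. Running steps 12 and 13 once per episode is also an assumption.

```python
import random


class ToyChainEnv:
    """Hypothetical 5-state chain: action 1 moves right, action 0 moves left."""
    n_states, n_actions = 5, 2

    def reset(self):
        self.s, self.t = 0, 0
        return self.s

    def step(self, a):
        self.t += 1
        self.s = min(self.s + 1, self.n_states - 1) if a == 1 else max(self.s - 1, 0)
        at_goal = self.s == self.n_states - 1
        # The episode ends at the goal or after a 50-step time limit.
        return self.s, (1.0 if at_goal else 0.0), at_goal or self.t >= 50


class PrioritizedBuffer:
    """Assumed stand-in for Equation (5): proportional prioritized sampling."""

    def __init__(self, capacity, alpha=0.6):
        self.capacity, self.alpha = capacity, alpha
        self.data, self.prios, self.pos = [], [], 0

    def __len__(self):
        return len(self.data)

    def add(self, transition):
        p = max(self.prios, default=1.0)  # new samples get the current max priority
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.prios.append(p)
        else:  # overwrite the oldest slot once the buffer is full
            self.data[self.pos] = transition
            self.prios[self.pos] = p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta):
        scaled = [p ** self.alpha for p in self.prios]
        total, n = sum(scaled), len(self.data)
        idxs = random.choices(range(n), weights=scaled, k=batch_size)
        w = [(n * scaled[i] / total) ** (-beta) for i in idxs]
        w_max = max(w)
        return idxs, [self.data[i] for i in idxs], [x / w_max for x in w]

    def update(self, idxs, td_errors):
        for i, d in zip(idxs, td_errors):
            self.prios[i] = abs(d) + 1e-6


def train(episodes=30, batch_size=32, min_size=256, gamma=0.9, lr=0.1, seed=0):
    random.seed(seed)
    env = ToyChainEnv()
    buffer = PrioritizedBuffer(capacity=2000)                  # step 1
    q = [[0.0] * env.n_actions for _ in range(env.n_states)]   # step 2: table replaces the network
    eps, beta0 = 1.0, 0.4
    beta, rewards = beta0, []
    for _ in range(episodes):                                  # step 3
        s, done, ep_reward = env.reset(), False, 0.0           # step 4
        while not done:                                        # step 5
            if random.random() < eps:                          # step 6: assumed epsilon-greedy
                a = random.randrange(env.n_actions)
            else:
                a = max(range(env.n_actions), key=lambda k: q[s][k])
            s2, r, done = env.step(a)                          # step 7
            buffer.add((s, a, r, s2, done))
            ep_reward += r
            s = s2
            if len(buffer) > min_size:                         # step 8
                idxs, batch, ws = buffer.sample(batch_size, beta)  # step 9
                errs = []
                for (bs, ba, br, bs2, bd), w in zip(batch, ws):    # step 10
                    # Simplification: a time-limit cutoff is treated as terminal.
                    target = br + (0.0 if bd else gamma * max(q[bs2]))
                    delta = target - q[bs][ba]
                    q[bs][ba] += lr * w * delta
                    errs.append(delta)
                buffer.update(idxs, errs)
        beta = min(1.0, beta + (1.0 - beta0) / episodes)       # step 12: assumed linear schedule
        eps = max(0.05, eps * 0.9)                             # step 13: assumed decay
        rewards.append(ep_reward)
    return q, rewards, eps, beta
```

Calling `q, rewards, eps, beta = train()` runs the loop on the toy environment. The table `q` plays the role of the network parameters, so replacing it with a neural network and the update in step 10 with a gradient step would bring the sketch closer to a DQN setting.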