| Algorithm 1 DQN training procedure. |
| Input: Total training episodes , experience pool capacity D, target network update frequency F, training batch size B, attenuation coefficient , greedy value and maximum total penalty . |
| Output: Target neural network with . |
| 1: Initialization: set initial episode , initial state , total reward of each episode , value neural network Q with random weight , target network with . |
| 2: while do |
| 3: while do |
| 4: Select an action through . |
| 5: Execute and obtain and . |
| 6: Store into D. |
| 7: Randomly sample a batch of B transitions from D, and perform |
| 8: Compute the mean squared error loss through |
| 9: Use gradient descent to update . |
| 10: Every F steps, perform . |
| 11: Perform . |
| 12: end while |
| 13: Perform and . |
| 14: end while |
| 15: return target neural network with . |
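
The control flow of Algorithm 1 can be illustrated with a short, self-contained sketch. It is not the authors' implementation, and several parts are assumptions made only for illustration. The 5-state chain environment, the function name `train_dqn`, the learning rate `lr`, and all default values are hypothetical. A table of weights (a linear Q-function over one-hot states) stands in for the neural network, so that only the standard library is needed. The inner loop is assumed to run until the goal is reached or the episode's total reward falls to the maximum total penalty. The greedy value is read as an exploration probability. The target takes the standard DQN form, $y = r + \gamma \max_{a'} Q'(s', a')$ for non-terminal transitions, which may differ from the exact expression used in steps 7 and 8.

```python
import random
from collections import deque


def train_dqn(episodes=300, capacity=500, update_freq=20, batch_size=16,
              gamma=0.9, epsilon=0.1, max_penalty=-30.0, lr=0.5, seed=0):
    """Illustrative version of Algorithm 1 on a hypothetical 5-state chain."""
    rng = random.Random(seed)
    n_states, n_actions = 5, 2  # toy environment sizes (assumption)

    def env_step(state, action):
        # Hypothetical dynamics: action 1 moves right, action 0 moves left.
        nxt = min(state + 1, n_states - 1) if action == 1 else max(state - 1, 0)
        done = nxt == n_states - 1
        return nxt, (10.0 if done else -1.0), done

    # Step 1: value "network" Q with small random weights, target as a copy.
    q = [[rng.uniform(-0.01, 0.01) for _ in range(n_actions)]
         for _ in range(n_states)]
    q_target = [row[:] for row in q]
    pool = deque(maxlen=capacity)  # experience pool D
    steps = 0

    for _ in range(episodes):  # step 2: loop over episodes
        state, total_reward, done = 0, 0.0, False
        # Step 3: assumed stopping rule (goal reached or penalty budget spent).
        while not done and total_reward > max_penalty:
            # Step 4: epsilon-greedy action selection (epsilon = exploration rate).
            if rng.random() < epsilon:
                action = rng.randrange(n_actions)
            else:
                action = max(range(n_actions), key=lambda a: q[state][a])
            # Step 5: execute the action, observe next state and reward.
            nxt, reward, done = env_step(state, action)
            # Step 6: store the transition in D.
            pool.append((state, action, reward, nxt, done))
            # Step 7: sample up to B transitions and build the targets.
            batch = rng.sample(list(pool), min(batch_size, len(pool)))
            grads = {}
            for s, a, r, s2, d in batch:
                y = r if d else r + gamma * max(q_target[s2])
                # Step 8: gradient of 0.5 * mean((Q(s, a) - y)^2) w.r.t. Q(s, a).
                grads[(s, a)] = grads.get((s, a), 0.0) + (q[s][a] - y) / len(batch)
            # Step 9: one gradient descent step on the value weights.
            for (s, a), g in grads.items():
                q[s][a] -= lr * g
            # Step 10: copy the value weights to the target every F steps.
            steps += 1
            if steps % update_freq == 0:
                q_target = [row[:] for row in q]
            # Step 11 (assumed): advance the state and accumulate the reward.
            state, total_reward = nxt, total_reward + reward
        # Step 13 (assumed): the for loop advances the episode counter;
        # state and total reward are reset at the top of the loop.
    return q_target  # step 15: return the target weights
```

Calling `train_dqn()` returns the target table as a list of per-state action values. On this toy chain the greedy action in each non-terminal state should be action 1 (move right), since that is the shortest path to the rewarded state.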