| Algorithm 1: DDPG | |
| 1. | Initialize the Critic network $Q(s,a|\theta^Q)$ and the Actor network $\mu(s|\theta^\mu)$ with weights $\theta^Q$ and $\theta^\mu$. Initialize the Critic target network $Q'$ and the Actor target network $\mu'$ with weights $\theta^{Q'} \leftarrow \theta^Q$ and $\theta^{\mu'} \leftarrow \theta^\mu$. Initialize the experience replay buffer $R$ with size $n$ and empty it. |
| 2. | for episode = 1, 2, …, T do |
| 3. | Reset the simulation parameters of the energy dispatch system and obtain the initial observation state $s_1$. |
| 4. | for i = 1, 2, …, I do |
| 5. | Normalize the state $s_i$ to $\bar{s}_i$. |
| 6. | Obtain the action from the Actor network with exploration noise $\mathcal{N}_i$: $a_i = \mu(\bar{s}_i|\theta^\mu) + \mathcal{N}_i$. |
| 7. | Execute action $a_i$, obtain the reward $r_i$, and observe the new state $s_{i+1}$. |
| 8. | Store the transition $(s_i, a_i, r_i, s_{i+1})$ in the replay buffer $R$. |
| 9. | Randomly sample a minibatch of $N$ transitions $(s_j, a_j, r_j, s_{j+1})$ from $R$. |
| 10. | Calculate the target value $y_j = r_j + \gamma\, Q'\!\big(s_{j+1}, \mu'(s_{j+1}|\theta^{\mu'})\,\big|\,\theta^{Q'}\big)$. |
| 11. | Update the Critic network parameters by minimizing the mean-square loss: $L = \frac{1}{N}\sum_j \big(y_j - Q(s_j, a_j|\theta^Q)\big)^2$. |
| 12. | Update the Actor network using the sampled policy gradient: $\nabla_{\theta^\mu} J \approx \frac{1}{N}\sum_j \nabla_a Q(s,a|\theta^Q)\big|_{s=s_j,\,a=\mu(s_j)}\, \nabla_{\theta^\mu}\mu(s|\theta^\mu)\big|_{s=s_j}$. |
| 13. | Soft update the target network parameters: $\theta^{Q'} \leftarrow \tau\theta^Q + (1-\tau)\theta^{Q'}$, $\theta^{\mu'} \leftarrow \tau\theta^\mu + (1-\tau)\theta^{\mu'}$. |
| 14. | end for |
| 15. | end for |
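For illustration, the minibatch update in lines 9–13 of Algorithm 1 can be sketched in PyTorch as follows. The network architectures, learning rates, and the values of $\gamma$, $\tau$, and the batch size are illustrative assumptions, not the settings used in this work.

```python
# Minimal PyTorch sketch of the DDPG update (Algorithm 1, lines 9-13).
# Network sizes, learning rates, gamma, and tau are illustrative assumptions.
import torch
import torch.nn as nn

class Actor(nn.Module):
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, action_dim), nn.Tanh())  # actions scaled to [-1, 1]

    def forward(self, s):
        return self.net(s)

class Critic(nn.Module):
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, 64), nn.ReLU(),
            nn.Linear(64, 1))

    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1))

state_dim, action_dim, gamma, tau = 4, 2, 0.99, 0.005  # assumed values

actor, critic = Actor(state_dim, action_dim), Critic(state_dim, action_dim)
actor_t, critic_t = Actor(state_dim, action_dim), Critic(state_dim, action_dim)
actor_t.load_state_dict(actor.state_dict())    # theta^mu' <- theta^mu
critic_t.load_state_dict(critic.state_dict())  # theta^Q'  <- theta^Q
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)

def ddpg_update(batch):
    """One gradient step on a sampled minibatch (s, a, r, s_next)."""
    s, a, r, s_next = batch

    # Line 10: target value y = r + gamma * Q'(s', mu'(s'))
    with torch.no_grad():
        y = r + gamma * critic_t(s_next, actor_t(s_next))

    # Line 11: Critic update with the mean-square loss
    critic_loss = nn.functional.mse_loss(critic(s, a), y)
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # Line 12: Actor update with the sampled policy gradient
    actor_loss = -critic(s, actor(s)).mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()

    # Line 13: soft update of the target networks
    for net, target in ((critic, critic_t), (actor, actor_t)):
        for p, p_t in zip(net.parameters(), target.parameters()):
            p_t.data.copy_(tau * p.data + (1.0 - tau) * p_t.data)

# Example call with a random minibatch of 32 transitions.
batch = (torch.randn(32, state_dim), torch.randn(32, action_dim),
         torch.randn(32, 1), torch.randn(32, state_dim))
ddpg_update(batch)
```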