Improved Deep Q-Network for User-Side Battery Energy Storage Charging and Discharging Strategy in Industrial Parks

Entropy. 2021 Oct 6;23(10):1311. doi: 10.3390/e23101311

Algorithm 1 Battery charging and discharging optimization algorithm

Require: electricity price V_pr

Ensure: earned money, money

1: money, step, changedStep = 0

2: repeat

3: r = getRandom();

4: if r < ϵ then

5: select a_t randomly;

6: else

7: select a_t = argmax_a Q(s_t, a | θ);

8: end if

9: ϵ = ϵ − Δϵ

10: execute charging/discharging action a_t, and get reward r_t and new state s_{t+1};

11: store (s_t, a_t, V_pr, s_{t+1}) in replay memory D;

12: sample a random minibatch of transitions from D;

13: calculate the cumulative reward using the target Q-network with parameters θ⁻;

14: perform a gradient descent step on the Q-network with parameters θ;

15: if step mod N == 0 then

16: update the target Q-network parameters with the Q-network parameters;

17: end if

18: if changedStep++ > M and isFull(E_st) then

19: switch to the next battery group;

20: changedStep = 0;

21: end if

22: calculate earned money;

23: until (step == MaxStep)

24: return money;
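
The control flow of Algorithm 1 can be illustrated with a minimal Python sketch. It is not the authors' implementation: a tabular Q-table stands in for the deep Q-network (so the gradient step of line 14 becomes a simple temporal-difference update), and the toy battery model, the one-unit charge/discharge actions, the reward definition, and all parameter names and defaults (`capacity`, `n_groups`, `sync_every`, `min_hold`, `alpha`, `gamma`, `eps_min`) are assumptions made for illustration. The sketch stores the reward r_t in each transition, as is common in DQN, whereas the listing stores V_pr, and it increments the step counter explicitly.

```python
import random

# Assumed action set: energy added to the battery per step
# (+1 = charge, 0 = idle, -1 = discharge).
ACTIONS = (1, 0, -1)


def epsilon_greedy(q_row, epsilon, rng):
    """Lines 3-8: explore with probability epsilon, otherwise pick the argmax."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return max(range(len(q_row)), key=q_row.__getitem__)


def copy_table(table):
    """Deep-copy a Q-table indexed as table[hour][soc][action]."""
    return [[row[:] for row in per_hour] for per_hour in table]


def run(prices, max_step=2000, capacity=4, n_groups=2, sync_every=50,
        min_hold=24, eps=1.0, d_eps=0.001, eps_min=0.05,
        alpha=0.1, gamma=0.95, batch=16, seed=0):
    rng = random.Random(seed)
    hours = len(prices)
    # Tabular stand-in for Q(s, a | theta); the state is (hour, state of charge).
    q = [[[0.0] * len(ACTIONS) for _ in range(capacity + 1)]
         for _ in range(hours)]
    q_target = copy_table(q)           # stand-in for theta^-
    memory = []                        # replay memory D
    soc = [0] * n_groups               # state of charge of each battery group
    group, changed_step, money = 0, 0, 0.0   # line 1

    for step in range(1, max_step + 1):      # lines 2 and 23
        hour = (step - 1) % hours
        state = (hour, soc[group])
        a = epsilon_greedy(q[hour][soc[group]], eps, rng)
        eps = max(eps_min, eps - d_eps)      # line 9, floored at eps_min (assumption)

        # Line 10: execute the action, clipped to the feasible charge range.
        new_soc = min(capacity, max(0, soc[group] + ACTIONS[a]))
        moved = new_soc - soc[group]
        reward = -moved * prices[hour]       # charging costs, discharging earns
        soc[group] = new_soc
        next_state = ((hour + 1) % hours, new_soc)

        memory.append((state, a, reward, next_state))        # line 11

        # Lines 12-14: sample a minibatch and update Q towards the target.
        for (h, c), a_j, r_j, (h2, c2) in rng.sample(memory,
                                                     min(batch, len(memory))):
            target = r_j + gamma * max(q_target[h2][c2])
            q[h][c][a_j] += alpha * (target - q[h][c][a_j])

        if step % sync_every == 0:           # lines 15-17
            q_target = copy_table(q)

        # Lines 18-21: post-increment semantics of changedStep++ > M.
        held_long_enough = changed_step > min_hold
        changed_step += 1
        if held_long_enough and soc[group] == capacity:
            group = (group + 1) % n_groups
            changed_step = 0

        money += reward                      # line 22
    return money, eps                        # line 24 (eps returned for inspection)
```

A call such as `run([0.3] * 8 + [1.0] * 8 + [0.6] * 8)` runs the loop on a hypothetical 24-hour, three-tier tariff. Replacing the table with a neural network and the temporal-difference update with a gradient descent step on the squared target error would bring the sketch closer to the deep Q-network setting described in the algorithm.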