| Algorithm 1 Battery charging and discharging optimization algorithm |
| Require: electricity price Vpr |
| Ensure: earned money money |
| 1: money, step, changedStep = 0 |
| 2: repeat |
| 3: r = getRandom(); |
| 4: if r < ϵ then |
| 5: select $a_t$ at random; |
| 6: else |
| 7: select $a_t = \arg\max_a Q(s_t, a \mid \theta)$; |
| 8: end if |
| 9: ϵ = ϵ − Δϵ |
| 10: execute charging/discharging action $a_t$, and get reward $r_t$ and new state $s_{t+1}$; |
| 11: store ($s_t$, $a_t$, Vpr, $s_{t+1}$) in replay memory D; |
| 12: sample random minibatch of transitions from D; |
| 13: calculate the cumulative reward using the target Q-network with parameters $\theta^-$; |
| 14: perform a gradient descent step on the Q-network with parameters $\theta$; |
| 15: if step mod N == 0 then |
| 16: update target Q-network parameters with Q-network parameters; |
| 17: end if |
| 18: if changedStep++ > M and isFull(Est) then |
| 19: switch to next battery group; |
| 20: changedStep = 0; |
| 21: end if |
| 22: calculate earned money; |
| 23: until (step == MaxStep) |
| 24: return money; |
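
The control flow that surrounds the network updates in Algorithm 1 (ϵ-greedy action selection, ϵ decay, the periodic target-network sync, and the battery-group switch) can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the `floor` parameter, the `num_groups` argument, and the wrap-around to the first group are all hypothetical, and the sync test in step 15 is read here as "every N steps". The Q-network itself is left out; `q_values` stands in for its output on the current state.

```python
import random
from typing import Sequence, Tuple


def select_action(q_values: Sequence[float], epsilon: float, rng: random.Random) -> int:
    """Epsilon-greedy choice (steps 3-8): explore with probability epsilon, else argmax."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def decay_epsilon(epsilon: float, delta: float, floor: float = 0.0) -> float:
    """Linear decay (step 9). The floor is an assumption; the listing gives no lower bound."""
    return max(floor, epsilon - delta)


def should_sync_target(step: int, n: int) -> bool:
    """Target-network sync test (step 15), assumed to mean 'every n steps'."""
    return step % n == 0


def maybe_switch_group(
    group: int, changed_step: int, m: int, is_full: bool, num_groups: int
) -> Tuple[int, int]:
    """Battery-group switch (steps 18-21), mirroring the post-increment in changedStep++ > M.

    Returns the (possibly new) group index and the updated counter.
    Wrapping back to group 0 after the last group is an assumption.
    """
    exceeded = changed_step > m  # compare the old value first ...
    changed_step += 1            # ... then increment, as x++ does
    if exceeded and is_full:
        return (group + 1) % num_groups, 0
    return group, changed_step
```

With `epsilon=0.0` the selection is purely greedy, so `select_action([0.1, 0.7, 0.2], 0.0, random.Random(0))` picks index 1. Keeping the four decisions as small pure functions makes each one checkable in isolation before it is wired into a training loop.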