Entropy. 2021 Oct 6;23(10):1311. doi: 10.3390/e23101311
Algorithm 1 Battery charging and discharging optimization algorithm
Require: electricity price Vpr
Ensure: earned money money
1: money, step, changedStep = 0
2: repeat
3:  r = getRandom();
4:  if r < ϵ then
5:     select a_t at random;
6:  else
7:     select a_t = argmax_a Q(s_t, a | θ);
8:  end if
9:  ϵ = ϵ − Δϵ
10:  execute charging/discharging action a_t, and get reward r_t and new state s_{t+1};
11:  store (s_t, a_t, Vpr, s_{t+1}) in replay memory D;
12:  sample a random minibatch of transitions from D;
13:  calculate the cumulative reward with the target Q-network with parameters θ;
14:  perform a gradient descent step on the Q-network with parameters θ;
15:  if step mod N == 0 then
16:     update target Q-network parameters with Q-network parameters;
17:  end if
18:  if changedStep++ > M and isFull(Est) then
19:     switch to next battery group;
20:     changedStep = 0;
21:  end if
22:  calculate earned money;
23: until (step == MaxStep)
24: return money;
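
The control flow of Algorithm 1 can be illustrated with a minimal Python sketch. It is not the authors' implementation: the Q-network and target Q-network are replaced by plain lookup tables (tabular Q-learning) so that the example runs with the standard library only. The action encoding, the reward definition (pay the price when charging, earn it when discharging), the state (time slot, stored energy), the per-step increment of step, and all names and default values are assumptions made for illustration. The listing stores Vpr in the replay tuple; the sketch stores the reward instead, as in standard DQN.

```python
import random
from collections import deque

ACTIONS = (-1, 0, 1)  # hypothetical encoding: discharge, idle, charge one unit


def epsilon_greedy(q, state, epsilon, rng):
    """Lines 3-8: explore with probability epsilon, otherwise take argmax_a Q(s, a)."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q.get((state, a), 0.0))


def run_charging_loop(prices, max_step=2000, capacity=4, n_groups=2,
                      eps=1.0, d_eps=0.001, gamma=0.9, lr=0.1, batch=16,
                      sync_every=50, min_steps=20, seed=0):
    rng = random.Random(seed)
    q, q_target = {}, {}         # tabular stand-ins for the Q-network and target network
    memory = deque(maxlen=1000)  # replay memory D
    levels = [0] * n_groups      # stored energy of each battery group
    group = money = changed_step = 0
    for step in range(1, max_step + 1):  # step is incremented once per iteration (assumed)
        t = step % len(prices)
        state = (t, levels[group])
        action = epsilon_greedy(q, state, eps, rng)
        eps = max(0.0, eps - d_eps)      # line 9: linear decay of epsilon
        # Line 10: execute the action, clipped to the feasible storage range.
        new_level = min(capacity, max(0, levels[group] + action))
        reward = -(new_level - levels[group]) * prices[t]
        levels[group] = new_level
        next_state = ((t + 1) % len(prices), new_level)
        memory.append((state, action, reward, next_state))  # line 11
        # Lines 12-14: sample a minibatch and move Q toward the target-table estimate.
        for s, a, r, s2 in rng.sample(list(memory), min(batch, len(memory))):
            target = r + gamma * max(q_target.get((s2, b), 0.0) for b in ACTIONS)
            old = q.get((s, a), 0.0)
            q[(s, a)] = old + lr * (target - old)
        if step % sync_every == 0:       # lines 15-17: periodic target update
            q_target = dict(q)
        # Lines 18-21: post-increment comparison, then switch group if full.
        exceeded = changed_step > min_steps
        changed_step += 1
        if exceeded and levels[group] == capacity:
            group = (group + 1) % n_groups
            changed_step = 0
        money += reward                  # line 22
    return money, eps, levels


if __name__ == "__main__":
    # Hypothetical four-slot price cycle: cheap in the middle, expensive at the end.
    print(run_charging_loop([3, 1, 1, 5]))
```

The periodic update uses a modulo test (step mod N == 0), since an integer division step/N would equal zero only while step < N. With a neural Q-network, the table update inside the minibatch loop would be replaced by one gradient descent step on the squared difference between the target and Q(s, a | θ).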