Algorithm 1: CR-DRQN pseudocode
  1. Initialize replay memory D with capacity N

  2. Initialize online Q network with parameters ω randomly

  3. Initialize target Q network with parameters $\omega^- = \omega$

  4. For episode = 1:M do

  5.  Initialize observed state $o_1 = O(s_1)$

  6.  For t = 1:T do

  7.   With probability $\varepsilon$ select a random action $a_t$, otherwise select $a_t = \arg\max_a Q(\phi(o_t), a; \omega)$

  8.   Execute action $a_t$ in the emulator and observe reward $r_{t+1}$ and next observed state $o_{t+1}$

  9.   Store transition $(o_t, a_t, r_{t+1}, o_{t+1})$ in D

  10.   Sample a random minibatch of transitions $(o_j, a_j, r_{j+1}, o_{j+1})$ from D and set $y_j = \begin{cases} r_{j+1} & \text{if the episode terminates at step } j+1 \\ r_{j+1} + \gamma \max_a Q(\phi_{j+1}, a; \omega^-) & \text{otherwise} \end{cases}$

  11.   Update the network parameters $\omega$ by gradient descent on $(y_j - Q(\phi_j, a_j; \omega))^2$

  12.   Every C steps reset $\hat{Q} = Q$ (i.e., set the target network parameters $\omega^- = \omega$)

  13.  End for

  14. End for
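
To make the control flow of Algorithm 1 concrete, here is a minimal Python/PyTorch sketch of the same loop. It is an illustration, not the paper's implementation: the environment (DummyEnv), the network sizes, the hyperparameters (GAMMA, EPSILON, BATCH, SYNC_C, M_EPISODES, T_STEPS), and the choice to run a training step after every transition are all assumptions, and the CR-DRQN's convolutional recurrent Q-network is replaced by a small feed-forward network, so only the DQN-style mechanics of steps 1-14 (replay memory, ε-greedy selection, target-network updates) are shown.

```python
import random
from collections import deque

import torch
import torch.nn as nn
import torch.nn.functional as F

# Assumed hyperparameters and sizes; the paper's actual values are not shown here.
GAMMA, EPSILON, BATCH, SYNC_C = 0.99, 0.1, 32, 100
OBS_DIM, N_ACTIONS, CAPACITY_N = 8, 4, 10_000
M_EPISODES, T_STEPS = 50, 200

class DummyEnv:
    """Stand-in for the emulator of steps 5 and 8; returns random observations."""
    def reset(self):
        return [random.random() for _ in range(OBS_DIM)]
    def step(self, action):
        next_obs = [random.random() for _ in range(OBS_DIM)]
        return next_obs, random.random(), random.random() < 0.05  # obs, reward, done

# Steps 1-3: replay memory D, online network (omega), target network (omega^- = omega).
# The CR-DRQN's convolutional/recurrent layers are reduced to a small MLP here.
replay = deque(maxlen=CAPACITY_N)
q_net = nn.Sequential(nn.Linear(OBS_DIM, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))
target_net = nn.Sequential(nn.Linear(OBS_DIM, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))
target_net.load_state_dict(q_net.state_dict())
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

def select_action(obs):
    # Step 7: epsilon-greedy over Q(phi(o_t), a; omega).
    if random.random() < EPSILON:
        return random.randrange(N_ACTIONS)
    with torch.no_grad():
        return q_net(torch.as_tensor(obs, dtype=torch.float32)).argmax().item()

def train_step(step_count):
    if len(replay) < BATCH:
        return
    # Step 10 (first half): sample a random minibatch of transitions from D.
    batch = random.sample(replay, BATCH)
    obs, act, rew, nxt, done = zip(*batch)
    obs = torch.tensor(obs, dtype=torch.float32)
    nxt = torch.tensor(nxt, dtype=torch.float32)
    act = torch.tensor(act, dtype=torch.int64)
    rew = torch.tensor(rew, dtype=torch.float32)
    done = torch.tensor(done, dtype=torch.float32)
    # Step 10 (second half): y_j = r_{j+1} + gamma * max_a Q(phi_{j+1}, a; omega^-),
    # with the discounted term zeroed out on terminal transitions.
    with torch.no_grad():
        y = rew + GAMMA * target_net(nxt).max(dim=1).values * (1.0 - done)
    # Step 11: gradient descent on (y_j - Q(phi_j, a_j; omega))^2.
    q = q_net(obs).gather(1, act.unsqueeze(1)).squeeze(1)
    loss = F.mse_loss(q, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # Step 12: every C steps copy the online weights into the target network.
    if step_count % SYNC_C == 0:
        target_net.load_state_dict(q_net.state_dict())

# Steps 4-14: outer episode loop and inner time-step loop.
env, step_count = DummyEnv(), 0
for episode in range(M_EPISODES):
    obs = env.reset()                                    # step 5: o_1 = O(s_1)
    for t in range(T_STEPS):
        a = select_action(obs)                           # step 7
        next_obs, r, done = env.step(a)                  # step 8
        replay.append((obs, a, r, next_obs, float(done)))  # step 9
        step_count += 1
        train_step(step_count)                           # steps 10-12
        obs = next_obs
        if done:
            break
```

The `(1.0 - done)` mask folds the two cases of step 10 into one expression: terminal transitions keep only $r_{j+1}$, while all others add the discounted target-network estimate.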