Algorithm 1: CR-DRQN pseudocode
Initialize replay memory $D$ with capacity $N$
Initialize online Q network with random parameters $\theta$
Initialize target Q network with parameters $\theta^- = \theta$
For episode = 1:M do
    Initialize observed state $o_1$
    For t = 1:T do
        With probability $\epsilon$ select a random action $a_t$, otherwise select $a_t = \arg\max_a Q(o_t, a; \theta)$
        Execute action $a_t$ in the emulator and get reward $r_t$ and next observed state $o_{t+1}$
        Store transition $(o_t, a_t, r_t, o_{t+1})$ in $D$
        Set the target value $y_t$
        Update network parameters $\theta$ by gradient descent on the loss
        Every C steps reset $\theta^- = \theta$
    End for
End for
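The control flow of Algorithm 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the CR-DRQN implementation. The recurrent Q network is replaced by a plain lookup table. `ToyEnv` is a hypothetical corridor environment. Minibatch sampling, the one-step target `r + gamma * max Q_target`, and the squared-error loss are the standard DQN choices, assumed here because the listing does not spell out its target or loss. All hyperparameter values are arbitrary.

```python
import random
from collections import deque


class ToyEnv:
    """Hypothetical 1-D corridor: positions 0..4, reward 1 on reaching position 4."""
    n_states, n_actions = 5, 2  # actions: 0 = left, 1 = right

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):
        move = 1 if action == 1 else -1
        self.pos = max(0, min(self.n_states - 1, self.pos + move))
        done = self.pos == self.n_states - 1
        return self.pos, (1.0 if done else 0.0), done


def train(M=200, T=20, N=1000, C=10, eps=0.2, gamma=0.9, lr=0.1, batch=8, seed=0):
    rng = random.Random(seed)
    env = ToyEnv()
    D = deque(maxlen=N)  # replay memory with capacity N
    # Online parameters, randomly initialised (a table stands in for the network).
    theta = [[rng.uniform(-0.01, 0.01) for _ in range(env.n_actions)]
             for _ in range(env.n_states)]
    theta_target = [row[:] for row in theta]  # target parameters start equal to online
    step = 0
    for _ in range(M):
        o = env.reset()
        for _ in range(T):
            # Epsilon-greedy action selection on the online Q values.
            if rng.random() < eps:
                a = rng.randrange(env.n_actions)
            else:
                a = max(range(env.n_actions), key=lambda k: theta[o][k])
            o_next, r, done = env.step(a)
            D.append((o, a, r, o_next, done))  # store transition in D
            # Assumed: sample a minibatch and use the one-step DQN target.
            for oj, aj, rj, oj1, dj in rng.sample(list(D), min(batch, len(D))):
                y = rj if dj else rj + gamma * max(theta_target[oj1])
                # Gradient descent step on 0.5 * (y - Q)^2 for a table entry.
                theta[oj][aj] += lr * (y - theta[oj][aj])
            step += 1
            if step % C == 0:  # every C steps, copy online parameters to the target
                theta_target = [row[:] for row in theta]
            o = o_next
            if done:
                break
    return theta
```

Calling `train()` returns the learned table of Q values, one row per corridor position. Swapping the table for a recurrent network would change how `theta` is stored and updated but not the surrounding loop.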