Algorithm 1: CR-DRQN pseudocode
  1. Initialize replay memory D with capacity N

  2. Initialize online Q network with parameters ω randomly

  3. Initialize target Q network with parameters $\omega^- = \omega$

  4. For episode = 1:M do

  5.  Initialize observed state $o_1 = O(s_1)$

  6.  For t = 1:T do

  7.   With probability $\varepsilon$ select a random action $a_t$, otherwise select $a_t = \arg\max_a Q(\phi(o_t), a; \omega)$

  8.   Execute action $a_t$ in the emulator and observe reward $r_{t+1}$ and next observed state $o_{t+1}$

  9.   Store transition $(o_t, a_t, r_{t+1}, o_{t+1})$ in D

  10.   Sample a random minibatch of transitions $(o_j, a_j, r_{j+1}, o_{j+1})$ from D and set $y_j = \begin{cases} r_{j+1} & \text{if the episode terminates at step } j+1 \\ r_{j+1} + \gamma \max_a Q(\phi_{j+1}, a; \omega^-) & \text{otherwise} \end{cases}$

  11.   Update the network parameters $\omega$ by gradient descent on $(y_j - Q(\phi_j, a_j; \omega))^2$

  12.   Every C steps reset $\hat{Q} = Q$ (i.e., set the target network parameters $\omega^- = \omega$)

  13.  End for

  14. End for
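
To make the control flow of Algorithm 1 concrete, here is a minimal Python/PyTorch sketch of the same loop. It is an illustration, not the paper's implementation: the environment (DummyEnv), the network sizes, the hyperparameters (GAMMA, EPSILON, BATCH, SYNC_C, M_EPISODES, T_STEPS), and the choice to run a training step after every transition are all assumptions, and the CR-DRQN's convolutional recurrent Q-network is replaced by a small feed-forward network, so only the DQN-style mechanics of steps 1-14 (replay memory, ε-greedy selection, target-network updates) are shown.

```python
import random
from collections import deque

import torch
import torch.nn as nn
import torch.nn.functional as F

# Assumed hyperparameters and sizes; the paper's actual values are not shown here.
GAMMA, EPSILON, BATCH, SYNC_C = 0.99, 0.1, 32, 100
OBS_DIM, N_ACTIONS, CAPACITY_N = 8, 4, 10_000
M_EPISODES, T_STEPS = 50, 200

class DummyEnv:
    """Stand-in for the emulator of steps 5 and 8; returns random observations."""
    def reset(self):
        return [random.random() for _ in range(OBS_DIM)]
    def step(self, action):
        next_obs = [random.random() for _ in range(OBS_DIM)]
        return next_obs, random.random(), random.random() < 0.05  # obs, reward, done

# Steps 1-3: replay memory D, online network (omega), target network (omega^- = omega).
# The CR-DRQN's convolutional/recurrent layers are reduced to a small MLP here.
replay = deque(maxlen=CAPACITY_N)
q_net = nn.Sequential(nn.Linear(OBS_DIM, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))
target_net = nn.Sequential(nn.Linear(OBS_DIM, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))
target_net.load_state_dict(q_net.state_dict())
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

def select_action(obs):
    # Step 7: epsilon-greedy over Q(phi(o_t), a; omega).
    if random.random() < EPSILON:
        return random.randrange(N_ACTIONS)
    with torch.no_grad():
        return q_net(torch.as_tensor(obs, dtype=torch.float32)).argmax().item()

def train_step(step_count):
    if len(replay) < BATCH:
        return
    # Step 10 (first half): sample a random minibatch of transitions from D.
    batch = random.sample(replay, BATCH)
    obs, act, rew, nxt, done = zip(*batch)
    obs = torch.tensor(obs, dtype=torch.float32)
    nxt = torch.tensor(nxt, dtype=torch.float32)
    act = torch.tensor(act, dtype=torch.int64)
    rew = torch.tensor(rew, dtype=torch.float32)
    done = torch.tensor(done, dtype=torch.float32)
    # Step 10 (second half): y_j = r_{j+1} + gamma * max_a Q(phi_{j+1}, a; omega^-),
    # with the discounted term zeroed out on terminal transitions.
    with torch.no_grad():
        y = rew + GAMMA * target_net(nxt).max(dim=1).values * (1.0 - done)
    # Step 11: gradient descent on (y_j - Q(phi_j, a_j; omega))^2.
    q = q_net(obs).gather(1, act.unsqueeze(1)).squeeze(1)
    loss = F.mse_loss(q, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # Step 12: every C steps copy the online weights into the target network.
    if step_count % SYNC_C == 0:
        target_net.load_state_dict(q_net.state_dict())

# Steps 4-14: outer episode loop and inner time-step loop.
env, step_count = DummyEnv(), 0
for episode in range(M_EPISODES):
    obs = env.reset()                                    # step 5: o_1 = O(s_1)
    for t in range(T_STEPS):
        a = select_action(obs)                           # step 7
        next_obs, r, done = env.step(a)                  # step 8
        replay.append((obs, a, r, next_obs, float(done)))  # step 9
        step_count += 1
        train_step(step_count)                           # steps 10-12
        obs = next_obs
        if done:
            break
```

The `(1.0 - done)` mask folds the two cases of step 10 into one expression: terminal transitions keep only $r_{j+1}$, while all others add the discounted target-network estimate.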