Safe Decision Controller for Autonomous Driving Based on Deep Reinforcement Learning in Nondeterministic Environment

2023 Jan 20;23(3):1198. doi: 10.3390/s23031198

Algorithm 2 Generation algorithm for the safe decision controller

Input: $MDP = (S, A, T, R)$, $\tau_{r} = \langle r, \varphi, \varphi_{r}, \mathit{risk}, \mathit{risk}_{r} \rangle$, $\mathit{done}$, $\mathit{update}$

Output: Optimal safety policy $\pi$

1: Init($\pi$)
2: $s_{t} \leftarrow s_{0}$
3: $a \leftarrow \mathit{NOP}$
4: while $s_{t} \neq \mathit{done}$ and $\mathit{episode} < \mathit{episodes}$ do
5:    // Iterative selection of optimal value
6:    $A' \leftarrow A$
7:    $a \leftarrow \mathit{choose}(A')$ // Select action
8:    $\mathit{env} \leftarrow$ Environmental sampling
9:    while $\mathit{monitor}(s_{t}, a, \mathit{env})$ do // monitor
10:       $A' \leftarrow A' - a$
11:       if $A' \neq \emptyset$ then
12:          $a \leftarrow \mathit{choose}(A')$
13:       else
14:          $a \leftarrow \mathit{choose}(a_{conservative} \in A)$
15:          break
16:       end if
17:    end while
18:    $\mathit{update}(s_{t}, a, \pi)$ // Update parameters
19:    Go to the next state $s_{t+1}$
20: end while
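The control flow of Algorithm 2 can be illustrated with a minimal Python sketch. It rests on several assumptions that are not taken from the paper: a tabular Q-function with epsilon-greedy selection stands in for the deep reinforcement learning policy, `monitor` is assumed to return `True` when the chosen action is judged unsafe (as the removal of $a$ from $A'$ in step 10 suggests), and the toy one-lane world (`ACTIONS`, `MOVE`, `GOAL`, `step`, `sample_env`, `toy_monitor`) is purely hypothetical. The single loop of the listing is also split into an episode loop and a step loop for readability.

```python
import random

# Hypothetical toy world: a car on cells 0..GOAL; an obstacle cell is sampled each step.
ACTIONS = ["accelerate", "keep", "brake"]
MOVE = {"accelerate": 2, "keep": 1, "brake": 0}
GOAL = 6


def step(state, action):
    """Hypothetical transition: move forward, reward reaching the goal."""
    nxt = min(state + MOVE[action], GOAL)
    done = nxt == GOAL
    return nxt, (10.0 if done else -1.0), done


def sample_env(rng):
    """Hypothetical environmental sampling: position of an obstacle."""
    return rng.randrange(1, GOAL)


def toy_monitor(state, action, obstacle):
    """Assumed convention: True means the action is unsafe in this sample."""
    return MOVE[action] > 0 and state < obstacle <= state + MOVE[action]


def select_safe_action(state, actions, choose, monitor, env, conservative):
    """Steps 6-17: replace the chosen action while the monitor rejects it."""
    candidates = list(actions)            # A' <- A
    action = choose(state, candidates)    # a <- choose(A')
    while monitor(state, action, env):
        candidates.remove(action)         # A' <- A' - a
        if candidates:                    # A' is not empty
            action = choose(state, candidates)
        else:
            action = conservative         # fall back to a_conservative
            break
    return action


def train(episodes=200, max_steps=50, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Steps 1-5 and 18-20, with tabular Q-learning as a stand-in for update()."""
    rng = random.Random(seed)
    q = {}                                # Init(pi): empty Q-table

    def choose(state, candidates):
        if rng.random() < epsilon:        # explore
            return rng.choice(candidates)
        return max(candidates, key=lambda a: q.get((state, a), 0.0))

    for _ in range(episodes):
        state = 0                         # s_t <- s_0
        for _ in range(max_steps):
            env = sample_env(rng)
            action = select_safe_action(
                state, ACTIONS, choose, toy_monitor, env, "brake")
            nxt, reward, done = step(state, action)
            best_next = 0.0 if done else max(
                q.get((nxt, a), 0.0) for a in ACTIONS)
            old = q.get((state, action), 0.0)
            q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
            state = nxt                   # go to the next state s_{t+1}
            if done:
                break
    return q
```

In this sketch the monitor acts as a filter in front of the learner: only the action that survives the inner loop is executed and used in the update, so rejected actions never contribute to the learned values. Whether the paper's controller treats rejected actions the same way is not stated in the listing.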