. 2022 Oct 28;22(21):8278. doi: 10.3390/s22218278
Algorithm 2: Value iteration algorithm.

Initialize V arbitrarily

Repeat

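      $\Delta \leftarrow 0$

      For each $s \in S$:

            $v \leftarrow V(s)$

            $V(s) \leftarrow \max_a \sum_{s', r} p(s', r \mid s, a)\,[r + \gamma V(s')]$

            $\Delta \leftarrow \max(\Delta, |v - V(s)|)$

until $\Delta < \theta$ (a small positive number)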

Output a deterministic policy, $\pi$, such that

$\pi(s) = \arg\max_a \sum_{s', r} p(s', r \mid s, a)\,[r + \gamma V(s')]$
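
As an illustration only, the steps of Algorithm 2 can be sketched in plain Python. The data layout is an assumption made for this sketch and is not taken from the article: `p[(s, a)]` is a hypothetical dictionary entry holding a list of `(probability, next_state, reward)` triples that stands in for p(s', r | s, a). The function name `value_iteration`, the helper `backup`, the zero initialization of V, and the two-state toy problem are likewise illustrative choices.

```python
def value_iteration(states, actions, p, gamma=0.9, theta=1e-8):
    """Sketch of Algorithm 2.

    Assumed (hypothetical) layout: p[(s, a)] is a list of
    (probability, next_state, reward) triples.
    """
    # "Initialize V arbitrarily": zeros are one possible choice.
    V = {s: 0.0 for s in states}

    def backup(s, a):
        # Sum over (s', r) of p(s', r | s, a) * [r + gamma * V(s')]
        return sum(prob * (r + gamma * V[s2]) for prob, s2, r in p[(s, a)])

    while True:
        delta = 0.0
        for s in states:
            v = V[s]                                   # keep the old value
            V[s] = max(backup(s, a) for a in actions)  # in-place update
            delta = max(delta, abs(v - V[s]))
        if delta < theta:                              # stopping test
            break

    # Deterministic policy: the greedy action under the final V.
    policy = {s: max(actions, key=lambda a: backup(s, a)) for s in states}
    return V, policy


# Toy two-state problem, invented for this example.
states = [0, 1]
actions = ["stay", "move"]
p = {
    (0, "stay"): [(1.0, 0, 0.0)],
    (0, "move"): [(1.0, 1, 1.0)],
    (1, "stay"): [(1.0, 1, 2.0)],
    (1, "move"): [(1.0, 0, 0.0)],
}
V, policy = value_iteration(states, actions, p, gamma=0.5)
```

For this toy problem the values settle near V(0) = 3 and V(1) = 4, and the greedy policy moves from state 0 to state 1 and then stays there. Updating `V[s]` in place, so that later states in the same sweep already see the new values, follows the loop as written in the algorithm box; keeping two separate value tables per sweep would be an alternative design.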