Author manuscript; available in PMC: 2017 Apr 1.
Published in final edited form as: IEEE Trans Autom Sci Eng. 2016 Jan 27;13(2):437–447. doi: 10.1109/TASE.2016.2517124

Algorithm 1.

SELQR

Input: stochastic continuous-time dynamics (Eq. (1)); $c_t$: local cost functions for $0 \le t \le l$; $\Delta$: time step duration; $l$: number of time steps
Data: $\hat{\mathbf{x}}$: smoothed states; $\pi$: control policy; $\bar{\pi}$: inverse control policy; $v_t$: cost-to-go function; $\bar{v}_t$: cost-to-come function
1 $\pi_t = 0$, $S_t = 0$, $\mathbf{s}_t = 0$, $s_t = 0$
2 repeat
3     $\bar{S}_0 := 0$, $\bar{\mathbf{s}}_0 := 0$, $\bar{s}_0 := 0$
4     for t := 0; t < l; t := t + 1 do
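5         $\hat{\mathbf{x}}_t = -(S_t + \bar{S}_t)^{-1}(\mathbf{s}_t + \bar{\mathbf{s}}_t)$ (smoothed states)
6         $\hat{\mathbf{u}}_t = \pi_t(\hat{\mathbf{x}}_t)$, $\hat{\mathbf{x}}_{t+1} = g(\hat{\mathbf{x}}_t, \hat{\mathbf{u}}_t)$
7         Linearize inverse discrete dynamics around $(\hat{\mathbf{x}}_{t+1}, \hat{\mathbf{u}}_t)$ (Eq. (16))
8         Quadratize $c_t$ around $(\hat{\mathbf{x}}_t, \hat{\mathbf{u}}_t)$ (Eq. (12))
9         Compute $\bar{S}_{t+1}$, $\bar{\mathbf{s}}_{t+1}$, $\bar{s}_{t+1}$, $\bar{v}_{t+1}$, $\bar{\pi}_t$ (forward value iteration in Sec. IV-C)
10     end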
11     Quadratize $c_l$ around $\hat{\mathbf{x}}_l$ in the form of Eq. (12) to compute $Q_l$, $\mathbf{q}_l$, and $q_l$
12     $S_l := Q_l$, $\mathbf{s}_l := \mathbf{q}_l$, and $s_l := q_l$.
13     for t := l – 1; t ≥ 0; t := t – 1 do
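14         $\hat{\mathbf{x}}_{t+1} = -(S_{t+1} + \bar{S}_{t+1})^{-1}(\mathbf{s}_{t+1} + \bar{\mathbf{s}}_{t+1})$ (smoothed states)
15         $\hat{\mathbf{u}}_t = \bar{\pi}_t(\hat{\mathbf{x}}_{t+1})$, $\hat{\mathbf{x}}_t = \bar{g}(\hat{\mathbf{x}}_{t+1}, \hat{\mathbf{u}}_t)$
16         Linearize stochastic discrete dynamics around $(\hat{\mathbf{x}}_t, \hat{\mathbf{u}}_t)$ (Eq. (11))
17         Quadratize $c_t$ around $(\hat{\mathbf{x}}_t, \hat{\mathbf{u}}_t)$ (Eq. (12))
18         Compute $S_t$, $\mathbf{s}_t$, $s_t$, $v_t$, $\pi_t$ (backward value iteration in Sec. IV-B)
19     end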
20 until Converged (e.g., $v_0$ stops changing significantly);
21 return $\pi_t$ for $0 \le t < l$
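
The smoothed-state step (lines 5 and 14 of the algorithm) can be illustrated in isolation. The sketch below assumes the cost-to-go and cost-to-come are quadratics of the form v(x) = 1/2 x^T S x + x^T s + const and v_bar(x) = 1/2 x^T S_bar x + x^T s_bar + const, so that the minimizer of their sum is x_hat = -(S + S_bar)^{-1}(s + s_bar). The function names (`solve_linear`, `smoothed_state`, `total_cost`) and the numerical values are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the caller's data is left untouched.
    M = [[float(v) for v in A[i]] + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("S + S_bar is singular; smoothed state undefined")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        acc = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = acc / M[i][i]
    return x


def smoothed_state(S, s, S_bar, s_bar):
    """Return x_hat = -(S + S_bar)^{-1} (s + s_bar)."""
    n = len(s)
    H = [[S[i][j] + S_bar[i][j] for j in range(n)] for i in range(n)]
    rhs = [-(s[i] + s_bar[i]) for i in range(n)]
    return solve_linear(H, rhs)


def total_cost(x, S, s, S_bar, s_bar):
    """Evaluate v(x) + v_bar(x), dropping the constant scalar terms."""
    n = len(x)
    quad = sum(x[i] * (S[i][j] + S_bar[i][j]) * x[j]
               for i in range(n) for j in range(n))
    lin = sum(x[i] * (s[i] + s_bar[i]) for i in range(n))
    return 0.5 * quad + lin


# Hypothetical 2-D example values (assumed, not from the paper).
S = [[2.0, 0.5], [0.5, 1.0]]      # cost-to-go Hessian
s = [-1.0, 0.0]                   # cost-to-go linear term
S_bar = [[1.0, 0.0], [0.0, 3.0]]  # cost-to-come Hessian
s_bar = [0.5, -2.0]               # cost-to-come linear term

x_hat = smoothed_state(S, s, S_bar, s_bar)
print("smoothed state:", x_hat)
```

Solving the linear system instead of forming the inverse explicitly is a design choice of this sketch. The step is only well defined when S + S_bar is invertible, which is why the solver raises an error on a near-zero pivot.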