Author manuscript; available in PMC: 2017 Apr 1.

Published in final edited form as: IEEE Trans Autom Sci Eng. 2016 Jan 27;13(2):437–447. doi: 10.1109/TASE.2016.2517124

Algorithm 1.

SELQR

	Input: stochastic continuous-time dynamics (Eq. (1)); c_t: local cost functions for 0 ≤ t ≤ l; Δ: time step duration; l: number of time steps
	Data: x̂: smoothed states; π: control policy; π̄: inverse control policy; v_t: cost-to-go function; v̄_t: cost-to-come function
1	π_t = 0, S_t = 0, 𝐬_t = 0, s_t = 0
2	repeat
3	S̄₀ := 0, 𝐬̄₀ := 0, s̄₀ := 0
4	for t := 0; t < l; t := t + 1 do
5	x̂_t = –(S_t + S̄_t)^–1 (𝐬_t + 𝐬̄_t) (smoothed states)
6	û_t = π_t(x̂_t), x̂_t+1 = g(x̂_t, û_t)
7	Linearize inverse discrete dynamics around (x̂_t+1, û_t) (Eq. (16))
8	Quadratize c_t around (x̂_t, û_t) (Eq. (12))
9	Compute S̄_t+1, 𝐬̄_t+1, s̄_t+1, v̄_t+1, π̄_t (forward value iteration in Sec. IV-C)
10	end
11	Quadratize c_l around x̂_l in the form of Eq. (12) to compute Q_l, 𝐪_l, and q_l
12	S_l := Q_l, 𝐬_l := 𝐪_l, and s_l := q_l
13	for t := l – 1; t ≥ 0; t := t – 1 do
14	x̂_t+1 = –(S_t+1 + S̄_t+1)^–1 (𝐬_t+1 + 𝐬̄_t+1) (smoothed states)
15	û_t = π̄_t(x̂_t+1), x̂_t = ḡ(x̂_t+1, û_t)
16	Linearize stochastic discrete dynamics around (x̂_t, û_t) (Eq. (11))
17	Quadratize c_t around (x̂_t, û_t) (Eq. (12))
18	Compute S_t, 𝐬_t, s_t, v_t, π_t (backward value iteration in Sec. IV-B)
19	end
20	until Converged (e.g., v₀ stops changing significantly);
21	return π_t for 0 ≤ t ≤ l
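
The smoothed-state update in steps 5 and 14 can be illustrated with a small numerical sketch. It assumes, as is not spelled out in the listing itself, that the cost-to-go and cost-to-come are quadratics of the form (1/2) xᵀ S x + xᵀ s + const and (1/2) xᵀ S̄ x + xᵀ s̄ + const, so that their sum is minimized at x̂ = –(S + S̄)^–1 (s + s̄). The function names (solve_linear, smoothed_state, total_cost) and the example matrices are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left unmodified.
    M = [[float(v) for v in A[i]] + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("S + S_bar is singular; no unique smoothed state")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        acc = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = acc / M[i][i]
    return x


def smoothed_state(S, s, S_bar, s_bar):
    """Return x_hat = -(S + S_bar)^{-1} (s + s_bar)."""
    n = len(s)
    H = [[S[i][j] + S_bar[i][j] for j in range(n)] for i in range(n)]
    g = [-(s[i] + s_bar[i]) for i in range(n)]
    return solve_linear(H, g)


def total_cost(x, S, s, S_bar, s_bar):
    """Sum of the two assumed quadratics, with constant terms omitted."""
    n = len(x)
    quad = sum(0.5 * x[i] * (S[i][j] + S_bar[i][j]) * x[j]
               for i in range(n) for j in range(n))
    lin = sum(x[i] * (s[i] + s_bar[i]) for i in range(n))
    return quad + lin


# Hypothetical example data (not from the paper).
S = [[2.0, 0.0], [0.0, 1.0]]      # cost-to-go Hessian
s = [-2.0, 0.0]                   # cost-to-go linear term
S_bar = [[1.0, 0.0], [0.0, 3.0]]  # cost-to-come Hessian
s_bar = [0.0, -4.0]               # cost-to-come linear term

x_hat = smoothed_state(S, s, S_bar, s_bar)
print(x_hat)
```

Solving the linear system directly, rather than forming the inverse explicitly, is a common numerical choice; whether the original implementation does so is not stated in the listing.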