
Algorithm 3.7. RTDP-Bel Algorithm.

1: Function OnlinePOMDPSolver()
Static: bc: The current belief state of the agent.
  V0: Initial approximate value function (computed offline).
  V: A hashtable of beliefs and their approximate value.
  k: Discretization resolution.
2: Initialize bc to the initial belief state and V to an empty hashtable.
3: while not ExecutionTerminated() do
4:  For all a ∈ A: Evaluate Q(bc, a) = RB(bc, a) + γ Σz∈Z Pr(z|bc, a) V(Discretize(τ(bc, a, z), k))
5:  â ← argmaxa∈A Q(bc, a)
6:  Execute best action â for bc
7:  V(Discretize(bc, k)) ← Q(bc, â)
8:  Perceive a new observation z
9:  bc ← τ(bc, â, z)
10: end while
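
To make the control loop concrete, the following is a minimal Python sketch of RTDP-Bel. It assumes the POMDP model is supplied by the caller as functions R_B(b, a), tau(b, a, z), and pr_obs(z, b, a), together with the offline approximation V0 and environment callbacks execute, perceive, and terminated; these names, and the rounding-based discretize helper, are illustrative assumptions rather than an interface prescribed by the paper.

def discretize(belief, k):
    """Hashable key for a belief (dict state -> probability): round each
    probability to the nearest multiple of 1/k and drop zero entries.
    This is one simple realization of Discretize(., k); the listing above
    leaves the discretization scheme abstract."""
    return tuple(sorted((s, round(p * k)) for s, p in belief.items()
                        if round(p * k) > 0))

def rtdp_bel(b0, actions, observations, R_B, tau, pr_obs, V0, k, gamma,
             execute, perceive, terminated):
    """Online RTDP-Bel loop following Algorithm 3.7.

    b0         : initial belief, dict mapping state -> probability
    R_B        : R_B(b, a), expected immediate reward of a in belief b
    tau        : tau(b, a, z), belief update operator
    pr_obs     : Pr(z | b, a), observation probability in belief b
    V0         : offline approximate value function, used as a fallback
    execute    : callback that performs an action in the environment
    perceive   : callback that returns the next observation
    terminated : callback signalling ExecutionTerminated()
    """
    V = {}          # hashtable mapping discretized beliefs to values
    b = b0          # current belief bc
    while not terminated():
        def Q(a):
            # Q(bc, a) = R_B(bc, a) + gamma * sum_z Pr(z|bc, a) V(tau(bc, a, z)),
            # falling back to the offline estimate V0 for unvisited beliefs.
            total = R_B(b, a)
            for z in observations:
                p = pr_obs(z, b, a)
                if p > 0.0:
                    bz = tau(b, a, z)
                    total += gamma * p * V.get(discretize(bz, k), V0(bz))
            return total
        a_hat = max(actions, key=Q)        # line 5: greedy action
        execute(a_hat)                     # line 6: act in the environment
        V[discretize(b, k)] = Q(a_hat)     # line 7: Bellman backup at bc
        z = perceive()                     # line 8: receive observation
        b = tau(b, a_hat, z)               # line 9: belief update
    return V

The only state carried across decisions is the hashtable V; each belief is keyed by its discretization, so backups made at one belief are reused whenever a later belief rounds to the same key.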