
Algorithm 3.7. RTDP-Bel Algorithm.

1: Function OnlinePOMDPSolver()
Static: bc: The current belief state of the agent.
  V0: Initial approximate value function (computed offline).
  V: A hashtable of beliefs and their approximate value.
  k: Discretization resolution.
2: Initialize bc to the initial belief state and V to an empty hashtable.
3: while not ExecutionTerminated() do
4:  For all a ∈ A: Evaluate Q(bc, a) = RB(bc, a) + γ Σz∈Z Pr(z|bc, a) V(Discretize(τ(bc, a, z), k))
5:  â ← argmaxa∈A Q(bc, a)
6:  Execute best action â for bc
7:  V(Discretize(bc, k)) ← Q(bc, â)
8:  Perceive a new observation z
9:  bc ← τ(bc, â, z)
10: end while
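
To make the control loop concrete, the following is a minimal Python sketch of RTDP-Bel. It assumes the POMDP model is supplied by the caller as functions R_B(b, a), tau(b, a, z), and pr_obs(z, b, a), together with the offline approximation V0 and environment callbacks execute, perceive, and terminated; these names, and the rounding-based discretize helper, are illustrative assumptions rather than an interface prescribed by the paper.

def discretize(belief, k):
    """Hashable key for a belief (dict state -> probability): round each
    probability to the nearest multiple of 1/k and drop zero entries.
    This is one simple realization of Discretize(., k); the listing above
    leaves the discretization scheme abstract."""
    return tuple(sorted((s, round(p * k)) for s, p in belief.items()
                        if round(p * k) > 0))

def rtdp_bel(b0, actions, observations, R_B, tau, pr_obs, V0, k, gamma,
             execute, perceive, terminated):
    """Online RTDP-Bel loop following Algorithm 3.7.

    b0         : initial belief, dict mapping state -> probability
    R_B        : R_B(b, a), expected immediate reward of a in belief b
    tau        : tau(b, a, z), belief update operator
    pr_obs     : Pr(z | b, a), observation probability in belief b
    V0         : offline approximate value function, used as a fallback
    execute    : callback that performs an action in the environment
    perceive   : callback that returns the next observation
    terminated : callback signalling ExecutionTerminated()
    """
    V = {}          # hashtable mapping discretized beliefs to values
    b = b0          # current belief bc
    while not terminated():
        def Q(a):
            # Q(bc, a) = R_B(bc, a) + gamma * sum_z Pr(z|bc, a) V(tau(bc, a, z)),
            # falling back to the offline estimate V0 for unvisited beliefs.
            total = R_B(b, a)
            for z in observations:
                p = pr_obs(z, b, a)
                if p > 0.0:
                    bz = tau(b, a, z)
                    total += gamma * p * V.get(discretize(bz, k), V0(bz))
            return total
        a_hat = max(actions, key=Q)        # line 5: greedy action
        execute(a_hat)                     # line 6: act in the environment
        V[discretize(b, k)] = Q(a_hat)     # line 7: Bellman backup at bc
        z = perceive()                     # line 8: receive observation
        b = tau(b, a_hat, z)               # line 9: belief update
    return V

The only state carried across decisions is the hashtable V; each belief is keyed by its discretization, so backups made at one belief are reused whenever a later belief rounds to the same key.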