1: Function OnlinePOMDPSolver()
   Static: $b_c$: the current belief state of the agent.
           $V_0$: initial approximate value function (computed offline).
           $V$: a hashtable mapping beliefs to their approximate values.
           $k$: discretization resolution.
2: Initialize $b_c$ to the initial belief state and $V$ to an empty hashtable.
3: while not ExecutionTerminated() do
4:    For all $a \in A$: evaluate $Q(b_c, a) = R_B(b_c, a) + \gamma \sum_{z \in Z} \Pr(z \mid b_c, a)\, V(\mathrm{Discretize}(\tau(b_c, a, z), k))$
5:    $\hat{a} \leftarrow \arg\max_{a \in A} Q(b_c, a)$
6:    Execute the best action $\hat{a}$ for $b_c$
7:    $V(\mathrm{Discretize}(b_c, k)) \leftarrow Q(b_c, \hat{a})$
8:    Perceive a new observation $z$
9:    $b_c \leftarrow \tau(b_c, \hat{a}, z)$
10: end while
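The loop above can be made concrete with a small sketch. Everything model-specific here is an assumption for illustration, not part of the original algorithm. The model is the classic two-state "tiger" POMDP. $V_0$ is a hypothetical constant-zero placeholder. The helper `lookup` assumes that $V$ falls back to $V_0$ for grid points not yet stored. `Discretize` is assumed to round each $b(s)$ to the nearest multiple of $1/k$. `ExecutionTerminated()` is replaced by a fixed number of steps, and actions are executed in a seeded simulator. The comments map each part back to the numbered steps of the listing.

```python
import random

# Hypothetical example model: the classic two-state "tiger" POMDP.
STATES = ("tiger-left", "tiger-right")
ACTIONS = ("listen", "open-left", "open-right")
OBSERVATIONS = ("hear-left", "hear-right")
GAMMA = 0.95
LISTEN_ACCURACY = 0.85


def transition(s, a, s2):
    """T(s, a, s'): listening keeps the state; opening a door resets it uniformly."""
    if a == "listen":
        return 1.0 if s == s2 else 0.0
    return 0.5


def observation(s2, a, z):
    """O(s', a, z): listening is 85% accurate; after opening, observations carry no information."""
    if a != "listen":
        return 0.5
    correct = (s2 == "tiger-left") == (z == "hear-left")
    return LISTEN_ACCURACY if correct else 1.0 - LISTEN_ACCURACY


def reward(s, a):
    """R(s, a): -1 to listen, -100 for opening the tiger's door, +10 otherwise."""
    if a == "listen":
        return -1.0
    opened_tiger = (a == "open-left") == (s == "tiger-left")
    return -100.0 if opened_tiger else 10.0


def belief_reward(b, a):
    """R_B(b, a) = sum_s b(s) R(s, a)."""
    return sum(b[i] * reward(s, a) for i, s in enumerate(STATES))


def predicted(b, a, s2, z):
    """O(s', a, z) * sum_s T(s, a, s') b(s): unnormalized weight of s' after (a, z)."""
    return observation(s2, a, z) * sum(transition(s, a, s2) * b[i] for i, s in enumerate(STATES))


def obs_prob(z, b, a):
    """Pr(z | b, a): the normalizing constant of the belief update."""
    return sum(predicted(b, a, s2, z) for s2 in STATES)


def tau(b, a, z):
    """Bayesian belief update tau(b, a, z); assumes Pr(z | b, a) > 0."""
    total = obs_prob(z, b, a)
    return tuple(predicted(b, a, s2, z) / total for s2 in STATES)


def discretize(b, k):
    """Assumed grid: each b(s) rounded to the nearest multiple of 1/k, stored as integers."""
    return tuple(round(p * k) for p in b)


def v0(b):
    """Hypothetical offline value function V0: a constant placeholder."""
    return 0.0


def lookup(V, b, k):
    """V(Discretize(b, k)), assumed to fall back to V0 for unseen grid points."""
    return V.get(discretize(b, k), v0(b))


def q_value(b, a, V, k):
    """Step 4: Q(b, a) = R_B(b, a) + gamma * sum_z Pr(z|b, a) V(Discretize(tau(b, a, z), k))."""
    q = belief_reward(b, a)
    for z in OBSERVATIONS:
        pz = obs_prob(z, b, a)
        if pz > 0.0:  # skip impossible observations, whose updated belief is undefined
            q += GAMMA * pz * lookup(V, tau(b, a, z), k)
    return q


def online_pomdp_solver(b0, k=20, steps=30, seed=0):
    rng = random.Random(seed)
    state = rng.choice(STATES)                             # hidden state of the simulator
    bc, V = tuple(b0), {}                                  # step 2
    total_reward = 0.0
    for _ in range(steps):                                 # step 3: fixed horizon (assumption)
        q = {a: q_value(bc, a, V, k) for a in ACTIONS}     # step 4
        best = max(ACTIONS, key=q.get)                     # step 5
        total_reward += reward(state, best)                # step 6: execute in the simulator
        V[discretize(bc, k)] = q[best]                     # step 7
        state = rng.choices(STATES, weights=[transition(state, best, s2) for s2 in STATES])[0]
        z = rng.choices(OBSERVATIONS, weights=[observation(state, best, o) for o in OBSERVATIONS])[0]  # step 8
        bc = tau(bc, best, z)                              # step 9
    return total_reward, V
```

For example, from the uniform belief with an empty hashtable, listening scores $Q = -1$ while opening a door scores $Q = -45$. After two consecutive "hear-left" observations, the belief in tiger-left rises to about 0.97 and opening the right door becomes the greedy choice. Because step 7 writes $Q(b_c, \hat{a})$ back into $V$, revisited grid points gradually replace the $V_0$ estimate with backed-up values.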