Function OnlinePOMDPSolver()
Static: $b_c$: The current belief state of the agent.
        $V_0$: Initial approximate value function (computed offline).
        $V$: A hashtable of beliefs and their approximate values.
        $k$: Discretization resolution.

Initialize $b_c$ to the initial belief state and $V$ to an empty hashtable.
while not ExecutionTerminated() do
    For all $a \in A$: Evaluate $Q(b_c, a) = R_B(b_c, a) + \gamma \sum_{z \in Z} \Pr(z \mid b_c, a)\, V(\mathrm{Discretize}(\tau(b_c, a, z), k))$
    $\hat{a} \leftarrow \arg\max_{a \in A} Q(b_c, a)$
    Execute best action $\hat{a}$ for $b_c$
    $V(\mathrm{Discretize}(b_c, k)) \leftarrow Q(b_c, \hat{a})$
    Perceive a new observation $z$
    $b_c \leftarrow \tau(b_c, \hat{a}, z)$
end while
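
The loop above can be sketched in Python for a small tabular POMDP. This is a minimal illustration under stated assumptions, not the authors' implementation. The array layout (`T[a][s][s2]`, `O[a][s2][z]`, `R[a][s]`), the rounding scheme in `discretize`, the fallback to `V0` for beliefs not yet stored in `V`, and the simulated environment that stands in for executing actions and perceiving observations are all hypothetical choices made for the example. A fixed step count replaces ExecutionTerminated().

```python
import random

def belief_update(b, a, z, T, O):
    """tau(b, a, z) via Bayes' rule; returns (next belief, Pr(z | b, a))."""
    n = len(b)
    unnorm = [O[a][s2][z] * sum(b[s] * T[a][s][s2] for s in range(n))
              for s2 in range(n)]
    pz = sum(unnorm)
    if pz == 0.0:
        return None, 0.0  # observation z cannot occur after (b, a)
    return [u / pz for u in unnorm], pz

def discretize(b, k):
    """Assumed scheme: snap each entry to the nearest multiple of 1/k."""
    return tuple(round(p * k) for p in b)

def greedy_backup(b, V, V0, k, gamma, T, O, R):
    """Evaluate Q(b, a) for every action, store the best value in V,
    and return (a_hat, Q(b, a_hat))."""
    n_actions, n_obs = len(T), len(O[0][0])
    qs = []
    for a in range(n_actions):
        q = sum(b[s] * R[a][s] for s in range(len(b)))  # R_B(b, a)
        for z in range(n_obs):
            nb, pz = belief_update(b, a, z, T, O)
            if pz > 0.0:
                # Assumption: beliefs missing from V fall back to V0.
                q += gamma * pz * V.get(discretize(nb, k), V0(nb))
        qs.append(q)
    a_hat = max(range(n_actions), key=qs.__getitem__)
    V[discretize(b, k)] = qs[a_hat]
    return a_hat, qs[a_hat]

def online_pomdp_solver(b0, s0, V0, k, gamma, T, O, R, n_steps, rng=None):
    """Run the online loop for n_steps against a simulated environment
    whose hidden state starts at s0; returns the final belief and V."""
    rng = rng or random.Random()
    n_obs = len(O[0][0])
    b, s, V = list(b0), s0, {}
    for _ in range(n_steps):
        a_hat, _ = greedy_backup(b, V, V0, k, gamma, T, O, R)
        # Simulated execution: sample the next hidden state, then an observation.
        s = rng.choices(range(len(b)), weights=T[a_hat][s])[0]
        z = rng.choices(range(n_obs), weights=O[a_hat][s])[0]
        b, _ = belief_update(b, a_hat, z, T, O)
    return b, V
```

The sketch writes the backed-up value into `V` before the action is executed, whereas the listing does so afterwards. The two orders give the same result because executing the action does not read or modify `V`. A call such as `online_pomdp_solver(b0, s0, lambda b: 0.0, 10, 0.9, T, O, R, 100, random.Random(0))` would run 100 steps with a zero initial value function and a resolution of k = 10.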