 1: Function Expand(b, d)
    Inputs: b: The belief node we want to expand.
            d: The depth of expansion under b.
    Static: T: An AND-OR tree representing the current search tree.
            Π: A set of initial policies.
            M: The number of trajectories of depth d to sample.
 2: L_T(b) ← −∞
 3: for all a ∈ A do
 4:    for all π ∈ Π do
 5:       Q̂_π(b, a) ← 0
 6:       for i = 1 to M do
 7:          b̃ ← b
 8:          ã ← a
 9:          for j = 0 to d do
10:             Q̂_π(b, a) ← Q̂_π(b, a) + (1/M) γ^j R_B(b̃, ã)
11:             z ← SampleObservation(b̃, ã)
12:             b̃ ← τ(b̃, ã, z)
13:             ã ← π(b̃)
14:          end for
15:       end for
16:    end for
17:    L_T(b, a) ← max_{π ∈ Π} Q̂_π(b, a)
18: end for
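The Expand routine above can be sketched in Python to make the nesting of the loops concrete. This is a minimal illustration under stated assumptions, not the authors' implementation. The `TinyPOMDP` class, its tabular `T`, `O`, `R` arrays, and the `demo_model` helper are hypothetical names introduced only for this example. The sketch also assumes that each step of the inner loop adds the discounted belief reward, averaged over the M trajectories, to the running estimate, and that the rollout policy π selects every action after the first.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

Belief = Tuple[float, ...]
Policy = Callable[[Belief], int]


class TinyPOMDP:
    """Hypothetical tabular POMDP, used only for illustration."""

    def __init__(self, T, O, R, gamma: float):
        self.T = T          # T[a][s][s2] = Pr(s2 | s, a)
        self.O = O          # O[a][s2][z] = Pr(z | s2, a)
        self.R = R          # R[s][a]     = immediate reward
        self.gamma = gamma  # discount factor (assumed, not named in the listing)
        self.n_states = len(R)
        self.n_actions = len(R[0])
        self.n_obs = len(O[0][0])

    def belief_reward(self, b: Belief, a: int) -> float:
        # Expected immediate reward of action a under belief b.
        return sum(b[s] * self.R[s][a] for s in range(self.n_states))

    def _unnormalized(self, b: Belief, a: int, z: int) -> List[float]:
        # O(z | s2, a) * sum_s T(s2 | s, a) b(s), for every next state s2.
        return [
            self.O[a][s2][z]
            * sum(b[s] * self.T[a][s][s2] for s in range(self.n_states))
            for s2 in range(self.n_states)
        ]

    def sample_observation(self, b: Belief, a: int, rng: random.Random) -> int:
        # Draw z with probability Pr(z | b, a).
        weights = [sum(self._unnormalized(b, a, z)) for z in range(self.n_obs)]
        return rng.choices(range(self.n_obs), weights=weights)[0]

    def tau(self, b: Belief, a: int, z: int) -> Belief:
        # Bayesian belief update after taking a and observing z.
        unnorm = self._unnormalized(b, a, z)
        total = sum(unnorm)
        return tuple(u / total for u in unnorm)


def expand(model: TinyPOMDP, b: Belief, d: int, policies: Sequence[Policy],
           M: int, rng: random.Random) -> Dict[int, float]:
    """Return a rollout-based lower-bound estimate for each action at b."""
    lower: Dict[int, float] = {}
    for a in range(model.n_actions):
        q_hat = []
        for pi in policies:
            q = 0.0
            for _ in range(M):
                b_t, a_t = b, a  # every trajectory restarts from (b, a)
                for j in range(d + 1):
                    # Assumed accumulation: discounted reward averaged over M.
                    q += (model.gamma ** j) * model.belief_reward(b_t, a_t) / M
                    z = model.sample_observation(b_t, a_t, rng)
                    b_t = model.tau(b_t, a_t, z)
                    a_t = pi(b_t)  # the rollout policy picks later actions
            q_hat.append(q)
        lower[a] = max(q_hat)  # best initial policy for this first action
    return lower


def demo_model() -> TinyPOMDP:
    # Two states, two actions, two observations; the reward depends only on
    # the action, so the estimates do not depend on the sampled observations.
    T = [[[0.5, 0.5], [0.5, 0.5]] for _ in range(2)]
    O = [[[0.8, 0.2], [0.2, 0.8]] for _ in range(2)]
    R = [[0.0, 1.0], [0.0, 1.0]]
    return TinyPOMDP(T, O, R, gamma=0.5)
```

A call such as `expand(demo_model(), (0.5, 0.5), 2, [lambda b: 0, lambda b: 1], 4, random.Random(0))` returns one estimate per first action. Taking the maximum over the policy set means that a single good initial policy is enough to give a useful value for that action, at the cost of |A| · |Π| · M sampled trajectories per expansion.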