Algorithm 3.4.
Expand subroutine of the Parallel Rollout Algorithm.
1:  Function Expand(b, d)
    Inputs: b: The belief node we want to expand.
            d: The depth of expansion under b.
    Static: T: An AND-OR tree representing the current search tree.
            Π: A set of initial policies.
            M: The number of trajectories of depth d to sample.
2:  L_T(b) ← −∞
3:  for all a ∈ A do
4:      for all π ∈ Π do
5:          Q̂^π(b, a) ← 0
6:          for i = 1 to M do
7:              b̃ ← b
8:              ã ← a
9:              for j = 0 to d do
10:                 Q̂^π(b, a) ← Q̂^π(b, a) + (1/M) γ^j R_B(b̃, ã)
11:                 z ← SampleObservation(b̃, ã)
12:                 b̃ ← τ(b̃, ã, z)
13:                 ã ← π(b̃)
14:             end for
15:         end for
16:     end for
17:     L_T(b, a) ← max_{π ∈ Π} Q̂^π(b, a)
18: end for
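
The rollout loop can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the names `expand`, `reward`, `sample_observation`, `tau`, and `gamma` are hypothetical, and the sketch assumes that the inner loop accumulates the usual discounted reward, Q̂^π(b, a) ← Q̂^π(b, a) + (1/M) γ^j R(b̃, ã), which is the standard Monte Carlo rollout estimate. The toy model at the bottom is invented purely so the sketch can run.

```python
from typing import Callable, Dict, Hashable, Sequence


def expand(
    b,
    d: int,
    actions: Sequence[Hashable],
    policies: Sequence[Callable],
    M: int,
    gamma: float,
    reward: Callable,
    sample_observation: Callable,
    tau: Callable,
) -> Dict[Hashable, float]:
    """Return a rollout-based lower bound L_T(b, a) for every action a.

    Assumed interfaces (all hypothetical):
      reward(b, a)             -> expected immediate reward in belief b
      sample_observation(b, a) -> an observation z drawn given (b, a)
      tau(b, a, z)             -> the updated belief
      pi(b)                    -> the action chosen by policy pi in belief b
    """
    lower_bounds = {}
    for a in actions:
        best = float("-inf")
        for pi in policies:
            q_hat = 0.0
            for _ in range(M):              # M sampled trajectories
                b_sim, a_sim = b, a         # first action is fixed to a
                for j in range(d + 1):      # j = 0, ..., d inclusive
                    q_hat += (gamma ** j) * reward(b_sim, a_sim) / M
                    z = sample_observation(b_sim, a_sim)
                    b_sim = tau(b_sim, a_sim, z)
                    a_sim = pi(b_sim)       # later actions follow pi
            best = max(best, q_hat)         # keep the best policy's estimate
        lower_bounds[a] = best
    return lower_bounds


# Toy stand-in model: the belief is a single number that never changes,
# and the observation is a constant, so every trajectory is identical.
def toy_reward(b, a):
    return b if a == "go" else 0.5


bounds = expand(
    b=0.8,
    d=2,
    actions=["go", "stay"],
    policies=[lambda b: "go", lambda b: "stay"],
    M=4,
    gamma=0.5,
    reward=toy_reward,
    sample_observation=lambda b, a: "z",
    tau=lambda b, a, z: b,
)
```

The sketch returns the per-action bounds L_T(b, a) as a dictionary and leaves it to the caller to combine them, since the listing stops after the loop over actions. In a real model `sample_observation` would be stochastic, so the estimates would vary between runs and tighten as M grows.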