Author manuscript; available in PMC: 2009 Sep 22.
Published in final edited form as: J Artif Intell Res. 2008 Jul 1;32(2):663–704.

Algorithm 3.4.

Expand subroutine of the Parallel Rollout Algorithm.

1: Function Expand(b, d)
Inputs: b: The belief node we want to expand.
   d: The depth of expansion under b.
Static: T: An AND-OR tree representing the current search tree.
   Π: A set of initial policies.
   M: The number of trajectories of depth d to sample.
2: L_T(b) ← −∞
3: for all a ∈ A do
4: for all π ∈ Π do
5:   Q̂^π(b, a) ← 0
6:   for i = 1 to M do
7:    b̃ ← b
8:    ã ← a
9:    for j = 0 to d do
10:     Q̂^π(b, a) ← Q̂^π(b, a) + (1/M) γ^j R_B(b̃, ã)
11:     z ← SampleObservation(b̃, ã)
12:     b̃ ← τ(b̃, ã, z)
13:     ã ← π(b̃)
14:    end for
15:   end for
16: end for
17: L_T(b, a) = max_{π ∈ Π} Q̂^π(b, a)
18: end for
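
The rollout loop above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function name `expand`, its argument names, and the toy model (a single fixed belief, two actions, two constant policies) are assumptions made for the example. The belief reward R_B, the observation sampler, and the belief update τ are passed in as plain callables, and the search tree T is replaced by a returned dictionary that maps each action a to L_T(b, a).

```python
from typing import Callable, Dict, Sequence


def expand(b, d: int, actions: Sequence, policies: Sequence[Callable],
           M: int, gamma: float, reward_b: Callable,
           sample_observation: Callable, tau: Callable) -> Dict:
    """Return {a: L_T(b, a)}: the best Monte Carlo rollout value over policies."""
    lower = {}
    for a in actions:
        best = float("-inf")
        for pi in policies:
            q_hat = 0.0                      # estimate of Q^pi(b, a)
            for _ in range(M):               # M sampled trajectories
                b_sim, a_sim = b, a          # start from b, forced first action a
                for j in range(d + 1):       # j = 0 .. d inclusive, as in the listing
                    q_hat += (gamma ** j) * reward_b(b_sim, a_sim) / M
                    z = sample_observation(b_sim, a_sim)
                    b_sim = tau(b_sim, a_sim, z)   # belief update
                    a_sim = pi(b_sim)              # policy picks the next action
            best = max(best, q_hat)
        lower[a] = best
    return lower


# Hypothetical toy model, chosen to be deterministic so the result is easy to check:
# the belief never changes, and only the action "good" yields reward.
def toy_reward(belief, action):
    return 1.0 if action == "good" else 0.0


def toy_sample_observation(belief, action):
    return 0  # a single dummy observation


def toy_tau(belief, action, observation):
    return belief  # belief stays fixed


def always_good(belief):
    return "good"


def always_bad(belief):
    return "bad"


bounds = expand(b=0.5, d=2, actions=["good", "bad"],
                policies=[always_good, always_bad], M=4, gamma=0.5,
                reward_b=toy_reward,
                sample_observation=toy_sample_observation, tau=toy_tau)
print(bounds)
```

In this toy setting the best policy is `always_good`, so the bound for the action "good" is 1 + 0.5 + 0.25 = 1.75 and the bound for "bad" is 0 + 0.5 + 0.25 = 0.75. With a stochastic observation model the estimates would vary from run to run, and a larger M would reduce that variance.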