Author manuscript; available in PMC: 2009 Sep 22.

Published in final edited form as: J Artif Intell Res. 2008 Jul 1;32(2):663–704.

Algorithm 3.4.

Expand subroutine of the Parallel Rollout Algorithm.

1:	Function Expand(b, d)
	Inputs: b: The belief node we want to expand.
	d: The depth of expansion under b.
	Static: T: An AND-OR tree representing the current search tree.
	Π: A set of initial policies.
	M: The number of trajectories of depth d to sample.
2:	L_T (b) ← −∞
3:	for all a ∈ A do
4:	for all π ∈ Π do
5:	Q̂^π(b, a) ← 0
6:	for i = 1 to M do
7:	b̃ ← b
8:	ã ← a
9:	for j = 0 to d do
10:	Q̂^π(b, a) ← Q̂^π(b, a) + (1/M) γ^j R_B(b̃, ã)
11:	z ← SampleObservation(b̃, ã)
12:	b̃ ← τ(b̃, ã, z)
13:	ã ← π(b̃)
14:	end for
15:	end for
16:	end for
17:	L_T (b, a) ← max_{π ∈ Π} Q̂^π (b, a)
18:	end for
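
The listing above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The function name `expand` and the callables `reward` (standing in for R_B), `sample_observation` (SampleObservation) and `update_belief` (τ) are hypothetical and supplied by the caller. The discount γ is passed explicitly as `gamma`, and the bounds L_T(b, a) are returned as a dictionary rather than written into the search tree T. The loop `for j = 0 to d` is read as inclusive, so each trajectory accumulates d + 1 discounted rewards.

```python
from typing import Any, Callable, Dict, Hashable, Sequence


def expand(
    b: Any,
    d: int,
    actions: Sequence[Hashable],
    policies: Sequence[Callable[[Any], Hashable]],
    M: int,
    gamma: float,
    reward: Callable[[Any, Hashable], float],              # assumed R_B(b, a)
    sample_observation: Callable[[Any, Hashable], Any],    # assumed sampler for z
    update_belief: Callable[[Any, Hashable, Any], Any],    # assumed tau(b, a, z)
) -> Dict[Hashable, float]:
    """Return Monte Carlo estimates of L_T(b, a) for every action a."""
    lower_bounds: Dict[Hashable, float] = {}
    for a in actions:
        best = float("-inf")
        for pi in policies:
            q_hat = 0.0
            for _ in range(M):
                b_t, a_t = b, a  # every trajectory starts with action a in belief b
                for j in range(d + 1):  # j = 0, ..., d inclusive
                    q_hat += (gamma ** j) * reward(b_t, a_t) / M
                    z = sample_observation(b_t, a_t)
                    b_t = update_belief(b_t, a_t, z)
                    a_t = pi(b_t)  # later actions follow the rollout policy
            best = max(best, q_hat)  # max over the policies in the set
        lower_bounds[a] = best
    return lower_bounds


# Toy deterministic problem (hypothetical, for illustration only):
# the belief never changes and only the action "go" is rewarded.
def reward(b, a):
    return 1.0 if a == "go" else 0.0


def sample_observation(b, a):
    return 0


def update_belief(b, a, z):
    return b


def always_go(b):
    return "go"


def always_stay(b):
    return "stay"


bounds = expand(0, 2, ["go", "stay"], [always_go, always_stay], 4, 0.5,
                reward, sample_observation, update_belief)
print(bounds)  # {'go': 1.75, 'stay': 0.75}
```

In the toy problem, starting with "go" and then following the always-go policy collects 1 + 0.5 + 0.25 = 1.75, which exceeds the value 1 obtained under the always-stay policy, so the maximum over policies yields 1.75 for that action.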