Author manuscript; available in PMC: 2016 Dec 22.
Published in final edited form as: J Mach Learn Res. 2016 Dec 1;17:211.

Algorithm 1.

Non-deterministic fitted-Q

Learn Q̂_T = (Q̂_T[1], …, Q̂_T[D]), set 𝒬_T = {Q̂_T}
for t = T − 1, T − 2, …, 1 do
  for all s_t^i in the data do
    Generate Π(s_t^i) using 𝒬_{t+1}
  𝒬_t ← ∅
  for all π_t ∈ 𝒞_ϕ(Π) do
    for all Q̂_{t+1} ∈ 𝒬_{t+1} do
      Learn (Q̂_t[1](·, ·, π_t, …), …, Q̂_t[D](·, ·, π_t, …)) using Q̂_{t+1}, add to 𝒬_t
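
The backward recursion of Algorithm 1 can be illustrated with a minimal tabular sketch in Python. It is one possible reading of the pseudocode under several assumptions, not the authors' implementation: states and actions are finite, "Learn" is replaced by per-(state, action) averaging, the action set Π(s) keeps every action that is not Pareto-dominated under at least one fitted Q-function, the feature-map restriction ϕ is ignored so that every deterministic choice from Π counts as consistent, and the action sets are evaluated at the next-stage states used in the backup. All function names (`fit_tabular`, `action_set`, `consistent_policies`, `nondeterministic_fitted_q`) are hypothetical, and stages are indexed from 0.

```python
from itertools import product


def fit_tabular(samples, D):
    """Assumed stand-in for 'Learn': average D-dimensional targets per (s, a)."""
    sums, counts = {}, {}
    for s, a, y in samples:
        acc = sums.setdefault((s, a), [0.0] * D)
        for d in range(D):
            acc[d] += y[d]
        counts[(s, a)] = counts.get((s, a), 0) + 1
    return {k: tuple(v / counts[k] for v in acc) for k, acc in sums.items()}


def dominates(u, v):
    """True if u is at least as good as v everywhere and better somewhere."""
    return all(x >= y for x, y in zip(u, v)) and any(x > y for x, y in zip(u, v))


def action_set(s, q_set, actions):
    """Assumed rule for Pi(s): keep actions non-dominated under some Q in q_set."""
    keep = set()
    for q in q_set:
        vals = {a: q[(s, a)] for a in actions if (s, a) in q}
        for a, u in vals.items():
            if not any(dominates(v, u) for b, v in vals.items() if b != a):
                keep.add(a)
    return sorted(keep) if keep else sorted(actions)


def consistent_policies(pi_sets):
    """Enumerate every deterministic policy choosing one action from Pi(s)."""
    states = sorted(pi_sets)
    for choice in product(*(pi_sets[s] for s in states)):
        yield dict(zip(states, choice))


def nondeterministic_fitted_q(trajs, actions, D):
    """trajs: list of trajectories, each a list of (state, action, reward_tuple)."""
    T = len(trajs[0])
    # Final stage: regress the observed D-dimensional rewards directly.
    q_sets = {T - 1: [fit_tabular([tr[T - 1] for tr in trajs], D)]}
    for t in range(T - 2, -1, -1):
        next_states = {tr[t + 1][0] for tr in trajs}
        pi_sets = {s: action_set(s, q_sets[t + 1], actions) for s in next_states}
        q_sets[t] = []
        for pi in consistent_policies(pi_sets):
            for q_next in q_sets[t + 1]:
                samples = []
                for tr in trajs:
                    s, a, r = tr[t]
                    s2 = tr[t + 1][0]
                    future = q_next.get((s2, pi[s2]), (0.0,) * D)
                    target = tuple(r[d] + future[d] for d in range(D))
                    samples.append((s, a, target))
                q_sets[t].append(fit_tabular(samples, D))
    return q_sets
```

The returned dictionary maps each stage to its list of fitted Q-functions. Because one Q-function is learned for every pair of consistent policy and next-stage Q-function, the size of each set can grow multiplicatively as the recursion moves backward, which is why the sketch is only practical for very small state and action spaces.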