Algorithm 1.
Learn Q̂T = (Q̂T[1], …, Q̂T[D]), set 𝒬T = {Q̂T}
for t = T − 1, T − 2, …, 1 do
    for all in the data do
        Generate using 𝒬t+1
    𝒬t ← ∅
    for all do
        for all Q̂t+1 ∈ 𝒬t+1 do
            Learn (Q̂t[1](·, ·, πt, …), …, Q̂t[D](·, ·, πt, …)) using Q̂t+1, add to 𝒬t
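The control flow of Algorithm 1 can be sketched in Python as below. This is only a structural sketch under stated assumptions, not the authors' implementation. Several symbols are missing from the listing as extracted (what the data loop iterates over, what is generated from 𝒬t+1, and what the second loop ranges over), so the sketch leaves them to caller-supplied callables. All names (`backward_q_sets`, `data_items`, `learn_final`, `generate`, `candidate_policies`, `learn_step`) are hypothetical, and the nesting of the loops is inferred from the listing.

```python
from typing import Any, Callable, Dict, Hashable, Iterable, List


def backward_q_sets(
    T: int,
    data_items: Callable[[int], Iterable[Hashable]],
    learn_final: Callable[[], Any],
    generate: Callable[[Hashable, List[Any]], Any],
    candidate_policies: Callable[[int, Dict[Hashable, Any]], Iterable[Any]],
    learn_step: Callable[[int, Any, Any], Any],
) -> Dict[int, List[Any]]:
    """Backward induction that keeps a set of Q estimates per time step.

    Every callable is a hypothetical placeholder for a step whose details
    are not recoverable from the listing.
    """
    # Learn the final-stage estimate and start with a singleton set.
    q_sets: Dict[int, List[Any]] = {T: [learn_final()]}

    for t in range(T - 1, 0, -1):
        # "for all ... in the data: generate ... using Q_{t+1}"
        # (assumption: one generated object per data item).
        generated = {
            item: generate(item, q_sets[t + 1]) for item in data_items(t)
        }

        # Reset the set for step t.
        q_sets[t] = []

        # Assumption: the second loop ranges over candidates built from
        # the generated objects, here called candidate policies.
        for policy in candidate_policies(t, generated):
            for q_next in q_sets[t + 1]:
                # Learn a new vector-valued estimate from q_next, add it.
                q_sets[t].append(learn_step(t, policy, q_next))

    return q_sets


# Toy usage with string labels standing in for learned estimates.
sets = backward_q_sets(
    T=3,
    data_items=lambda t: ["a", "b"],
    learn_final=lambda: "Q3",
    generate=lambda item, q_next_set: len(q_next_set),
    candidate_policies=lambda t, generated: ["p0", "p1"],
    learn_step=lambda t, policy, q_next: f"Q{t}({policy},{q_next})",
)
print({t: len(qs) for t, qs in sets.items()})
```

In the toy run each step offers two candidates, so the set size doubles at every backward step. This shows why the size of 𝒬t can grow quickly unless the candidates or the stored estimates are pruned.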