Author manuscript; available in PMC: 2017 Dec 10.

Published in final edited form as: Stat Med. 2016 Jul 24;35(28):5189–5209. doi: 10.1002/sim.7047

Table 3.

Decision making and policy improvement in H-Approximation method

To make a decision when the feature values f(h) are observed:

Set A ← {};
For each action a ∈ 𝒜:
- If H̃_a₁(f(h); θ̃_a₁) > H̃_a₀(f(h); θ̃_a₀) then A ← A ∪ {a};
Return A.
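
The decision procedure above can be sketched in Python as follows. This is only an illustration under stated assumptions: the table does not specify the form of the regression models, so the linear `predict` function is a hypothetical stand-in for H̃, and the names `decide`, `models`, `theta_on`, and `theta_off` are introduced here for illustration. The comparison direction is copied from the table as written.

```python
from typing import Dict, Hashable, Sequence, Set, Tuple

Theta = Sequence[float]


def predict(features: Sequence[float], theta: Theta) -> float:
    """Hypothetical regression model (assumption): a linear function of the features."""
    return sum(t * x for t, x in zip(theta, features))


def decide(
    features: Sequence[float],
    models: Dict[Hashable, Tuple[Theta, Theta]],
) -> Set[Hashable]:
    """Return the set A of selected actions.

    models maps each action a to (theta_on, theta_off), the parameters
    of the "a in effect" and "a not in effect" regression models.
    """
    selected: Set[Hashable] = set()          # Set A <- {}
    for action, (theta_on, theta_off) in models.items():
        # Same inequality as in the table: H_a1(f(h)) > H_a0(f(h)).
        if predict(features, theta_on) > predict(features, theta_off):
            selected.add(action)             # A <- A U {a}
    return selected


# Usage with made-up parameters and placeholder action names.
example_models = {
    "a1": ([1.0, 2.0], [0.5, 0.5]),
    "a2": ([0.0, 0.1], [1.0, 0.0]),
}
chosen = decide([1.0, 1.0], example_models)
```

Each action is evaluated independently, so the returned set can be empty, contain a single action, or contain several actions at once.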

To update the regression models H̃_a₁(·; θ̃_a₁) and H̃_a₀(·; θ̃_a₀) after the loss-to-go q̂ is incurred from taking action Â upon observing history ĥ:

For each action a ∈ 𝒜:
- If a ∈ Â then θ̃_a₁ ← 𝒰(f(ĥ), Â, q̂; λ, θ̃_a₁),
- Else θ̃_a₀ ← 𝒰(f(ĥ), Â, q̂; λ, θ̃_a₀).
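
The update procedure above can be sketched in Python as follows. The table does not define the update operator 𝒰 or the parameter λ, so this sketch swaps in a plain stochastic-gradient step on squared error for a linear model and treats λ as a step size; both are assumptions, and the stand-in also ignores the Â argument that 𝒰 receives in the table. The names `sgd_update` and `update_models` are hypothetical.

```python
from typing import Dict, Hashable, List, Sequence, Set


def sgd_update(
    features: Sequence[float],
    target: float,
    step: float,
    theta: Sequence[float],
) -> List[float]:
    """Stand-in for the update operator (assumption): one gradient step
    on squared error for a linear model, with `step` playing the role of lambda."""
    prediction = sum(t * x for t, x in zip(theta, features))
    error = target - prediction
    return [t + step * error * x for t, x in zip(theta, features)]


def update_models(
    features: Sequence[float],
    taken: Set[Hashable],
    loss_to_go: float,
    step: float,
    theta_on: Dict[Hashable, List[float]],
    theta_off: Dict[Hashable, List[float]],
) -> None:
    """Update, in place, exactly one model per action.

    theta_on[a] is updated if a was part of the action taken;
    otherwise theta_off[a] is updated.
    """
    for action in theta_on:
        if action in taken:
            theta_on[action] = sgd_update(features, loss_to_go, step, theta_on[action])
        else:
            theta_off[action] = sgd_update(features, loss_to_go, step, theta_off[action])


# Usage with made-up numbers and placeholder action names.
theta_on = {"a1": [0.0, 0.0], "a2": [0.0, 0.0]}
theta_off = {"a1": [0.0, 0.0], "a2": [0.0, 0.0]}
update_models([1.0, 2.0], {"a1"}, 10.0, 0.1, theta_on, theta_off)
```

As in the table, every observation updates one of the two models for each action, so both the "in effect" and "not in effect" estimates accumulate data over time.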