Author manuscript; available in PMC: 2017 Dec 10.
Published in final edited form as: Stat Med. 2016 Jul 24;35(28):5189–5209. doi: 10.1002/sim.7047

Table 3.

Decision making and policy improvement in the H-Approximation method

To make a decision when feature values f(h) are observed:
  • Set A ← {};

  • For each action a ∈ 𝒜:

    • If H̃_{a1}(f(h); θ̃_{a1}) > H̃_{a0}(f(h); θ̃_{a0}) then A ← A ∪ {a};

  • Return A.
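
The decision step above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function name `choose_actions`, the `models` dictionary, and the representation of each fitted regression model as a plain callable on the feature vector are all assumptions made for the example. The comparison direction follows the table as printed.

```python
from typing import Callable, Dict, Hashable, Sequence, Set, Tuple

# Hypothetical representation: a fitted regression model is any callable
# that maps a feature vector f(h) to a real-valued prediction.
Model = Callable[[Sequence[float]], float]


def choose_actions(
    features: Sequence[float],
    models: Dict[Hashable, Tuple[Model, Model]],
) -> Set[Hashable]:
    """Build the action set A from the per-action model pairs.

    `models` maps each action a to a pair (model_a1, model_a0), standing in
    for the two regression models the table associates with that action.
    """
    chosen: Set[Hashable] = set()          # Set A <- {}
    for action, (model_a1, model_a0) in models.items():
        # Include a when the a1 model's prediction exceeds the a0 model's.
        if model_a1(features) > model_a0(features):
            chosen.add(action)             # A <- A U {a}
    return chosen                          # Return A


# Toy usage with made-up constant and linear models (illustrative only).
toy_models = {
    "a": (lambda f: 2.0 * f[0], lambda f: 1.0),
    "b": (lambda f: 0.0, lambda f: 1.0),
}
selected = choose_actions([1.0], toy_models)
```

Each action is evaluated independently, so the loop is linear in the number of actions and never enumerates subsets of 𝒜.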

To update regression models H̃_{a1}(·; θ̃_{a1}) and H̃_{a0}(·; θ̃_{a0}) when the loss-to-go q̂ is incurred after action set Â is taken upon observing history ĥ:
  • For each action a ∈ 𝒜:

    • If a ∈ Â then θ̃_{a1} ← 𝒰(f(ĥ), Â, q̂; λ, θ̃_{a1}),

    • Else θ̃_{a0} ← 𝒰(f(ĥ), Â, q̂; λ, θ̃_{a0}).
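
The update step above can be sketched as follows. The table does not define the operator 𝒰 or the role of λ, so this example makes loud assumptions: each model is linear in f(ĥ), 𝒰 is replaced by a single stochastic-gradient step on squared error, and λ is treated as a step size. The names `sgd_step` and `update_models` are hypothetical, and the dependence of 𝒰 on Â beyond the membership test is ignored.

```python
from typing import Dict, Hashable, List, Sequence, Set


def linear_predict(theta: Sequence[float], x: Sequence[float]) -> float:
    """Prediction of a linear model (assumed model form)."""
    return sum(t * xi for t, xi in zip(theta, x))


def sgd_step(
    theta: Sequence[float], x: Sequence[float], target: float, step: float
) -> List[float]:
    """Assumed stand-in for U: one gradient step on squared error."""
    error = target - linear_predict(theta, x)
    return [t + step * error * xi for t, xi in zip(theta, x)]


def update_models(
    theta_a1: Dict[Hashable, List[float]],
    theta_a0: Dict[Hashable, List[float]],
    features: Sequence[float],
    taken: Set[Hashable],
    loss_to_go: float,
    step: float,
) -> None:
    """Update exactly one parameter vector per action, in place."""
    for action in theta_a1:
        if action in taken:
            # a was in the taken set: refresh the a1 model.
            theta_a1[action] = sgd_step(
                theta_a1[action], features, loss_to_go, step
            )
        else:
            # a was not in the taken set: refresh the a0 model.
            theta_a0[action] = sgd_step(
                theta_a0[action], features, loss_to_go, step
            )


# Toy usage with made-up numbers (illustrative only).
params_a1 = {"a": [0.0, 0.0], "b": [0.0, 0.0]}
params_a0 = {"a": [0.0, 0.0], "b": [0.0, 0.0]}
update_models(params_a1, params_a0, [1.0, 2.0], {"a"}, 1.0, 0.1)
```

Every observed pair of history and loss-to-go updates one model for each action, so both the a1 and a0 models accumulate data without any action being skipped.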