. 2018 Dec 5;2018:887–896.

Table 2:

Estimates of the discounted expected return of each policy on the test set (γ = 0.99). Vd denotes approximating the MoE V function by the DQN V function; Vb denotes approximating it by the V function of the behavioral policy, i.e., the physician's V function.

Encoding               Physician  Kernel  DQN   MoE (Vd, Qd)  MoE (Vb, Qb)
non-recurrent encoded  3.76       3.73    4.06  3.93          4.31
recurrent encoded      3.76       4.46    4.23  5.03          5.72