. 2018 Dec 5;2018:887–896.

Table 2:

Estimated discounted expected return of each policy on the test set (γ = 0.99). V_d denotes approximating the MoE value function V with the DQN value function; V_b denotes approximating it with the value function of the behavior policy, i.e., the physician policy.

Encoding               Physician  Kernel  DQN   MoE_{V_d,Q_d}  MoE_{V_b,Q_b}
Non-recurrent encoded  3.76       3.73    4.06  3.93           4.31
Recurrent encoded      3.76       4.46    4.23  5.03           5.72
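Table 2 reports the discounted return of a trajectory, G = Σ_t γ^t r_t, averaged over trajectories. The sketch below computes that quantity with γ = 0.99. It is a minimal illustration and assumes the per-step rewards of each trajectory are directly observed, so it gives a plain Monte Carlo average. Evaluating learned policies against logged physician data would also require an off-policy estimator, and this sketch does not implement one. The function names and the toy rewards are hypothetical and do not come from the source.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float = 0.99) -> float:
    """Return sum_t gamma**t * r_t for one trajectory, with t starting at 0."""
    total = 0.0
    # Accumulate from the last step backwards: G_t = r_t + gamma * G_{t+1}
    for r in reversed(rewards):
        total = r + gamma * total
    return total


def mean_discounted_return(
    trajectories: Sequence[Sequence[float]], gamma: float = 0.99
) -> float:
    """Average the discounted return over a set of trajectories (e.g. a test set)."""
    if not trajectories:
        raise ValueError("need at least one trajectory")
    return sum(discounted_return(t, gamma) for t in trajectories) / len(trajectories)


if __name__ == "__main__":
    # Hypothetical toy reward sequences, not data from Table 2.
    episodes = [[0.0, 0.0, 1.0], [0.0, -1.0]]
    print(mean_discounted_return(episodes))
```

The backward accumulation avoids computing powers of γ explicitly. With γ = 0.99, a terminal reward of 1 received two steps in is worth 0.99² = 0.9801.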