Author manuscript; available in PMC: 2021 Dec 1.
Published in final edited form as: Trends Neurosci. 2020 Oct 19;43(12):980–997. doi: 10.1016/j.tins.2020.09.004

Figure 3. Distributional RL as minimizing a loss function.

(a) Reward probabilities of an example reward distribution. The mean V_{mean}, median V_{median}, 0.25-quantile V_{0.25-quantile}, and 0.97-expectile V_{0.97-expectile} of this distribution are indicated in different colors.

(b-e) Left panels: loss as a function of the value estimate V when the rewards follow the distribution shown in (a). V = V_{mean} minimizes the mean squared error (b), V = V_{median} minimizes the mean absolute error (c), V = V_{0.25-quantile} minimizes the quantile regression loss for τ = 0.25 (d), and V = V_{0.97-expectile} minimizes the expectile regression loss for τ = 0.97 (e). Right panels: loss as a function of the RPE δ.
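
The correspondence between statistics and loss functions described in this caption can be checked numerically. The following is a minimal sketch, not the authors' code. It assumes a small hypothetical discrete reward distribution (`REWARDS`, `PROBS`), defines the RPE as δ = r − V, and uses the standard asymmetric weighting for the quantile and expectile losses (weight τ on positive RPEs, 1 − τ on negative RPEs). All function names are illustrative.

```python
# Hypothetical discrete reward distribution (NOT the one plotted in the figure).
REWARDS = [1.0, 2.0, 5.0, 10.0]
PROBS = [0.3, 0.3, 0.3, 0.1]


def expected_loss(v, per_sample_loss):
    """Expected loss of value estimate v; delta = r - v is the RPE."""
    return sum(p * per_sample_loss(r - v) for r, p in zip(REWARDS, PROBS))


def squared(delta):
    # Mean squared error: minimized by the mean.
    return delta ** 2


def absolute(delta):
    # Mean absolute error: minimized by the median.
    return abs(delta)


def quantile_loss(tau):
    # Assumed standard form: weight tau on positive RPEs, (1 - tau) on
    # negative RPEs, linear in |delta|.
    return lambda delta: (tau if delta > 0 else 1.0 - tau) * abs(delta)


def expectile_loss(tau):
    # Same asymmetric weighting, but quadratic in delta.
    return lambda delta: (tau if delta > 0 else 1.0 - tau) * delta ** 2


def grid_argmin(per_sample_loss, lo=0.0, hi=10.0, steps=1000):
    """Brute-force minimizer of the expected loss on an evenly spaced grid."""
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return min(grid, key=lambda v: expected_loss(v, per_sample_loss))


v_mean = grid_argmin(squared)
v_median = grid_argmin(absolute)
v_q25 = grid_argmin(quantile_loss(0.25))
v_e97 = grid_argmin(expectile_loss(0.97))
print(v_mean, v_median, v_q25, v_e97)
```

For this assumed distribution, the first three minimizers coincide with the mean (3.4), the median (2), and the 0.25-quantile (1) computed directly from the probabilities, up to the grid resolution. The expectile minimizer lies well above the mean because τ = 0.97 weights positive RPEs far more heavily than negative ones. A grid search is used only for transparency; it is not meant as a learning rule.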