Author manuscript; available in PMC: 2016 Dec 22.
Published in final edited form as: J Mach Learn Res. 2016 Dec 1;17:211.

Figure 2.


Partial visualization of the members of an example $\mathcal{Q}_{T-1}$. We fix a state $s_{T-1} = (50.1, 48.6)$ in this example, and we plot $Q_{T-1}(s_{T-1}, a_{T-1})$ for each $Q_{T-1} \in \mathcal{Q}_{T-1}$ and for each $a_{T-1}$ in {[inline graphic], [inline graphic], [inline graphic], [inline graphic], [inline graphic]}. For example, the [inline graphic] markers near the top of the plot correspond to the expected returns, one for each $Q_{T-1} \in \mathcal{Q}_{T-1}$, that are achievable by taking the [inline graphic] action at the current time point and then following a particular future policy. This example $\mathcal{Q}_{T-1}$ contains 20 $Q_{T-1}$ functions, each assuming a different $\pi_T$.
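The idea in the caption, one $Q_{T-1}$ function per assumed future policy $\pi_T$, each evaluated at a fixed state for every current action, can be illustrated with a minimal Python sketch. Everything in it is a hypothetical toy: the state and action names, the transition probabilities, the two-objective reward vectors, and the choice to enumerate all deterministic policies are assumptions made for illustration, not the paper's data or its construction of the set.

```python
from itertools import product

# Hypothetical toy problem (not the paper's data): two possible next states,
# two actions, and two-objective (vector-valued) rewards.
STATES_T = ["low", "high"]
ACTIONS = ["a0", "a1"]

# Assumed final-stage values Q_T(s_T, a_T), one entry per objective.
Q_T = {
    ("low", "a0"): (1.0, 0.0),
    ("low", "a1"): (0.0, 2.0),
    ("high", "a0"): (3.0, 1.0),
    ("high", "a1"): (2.0, 4.0),
}

# Assumed P(s_T | s_{T-1}, a_{T-1}) for the single fixed state s_{T-1}.
P = {
    "a0": {"low": 0.5, "high": 0.5},
    "a1": {"low": 0.0, "high": 1.0},
}

# Assumed immediate reward vector at time T-1 for each action.
R = {"a0": (0.0, 0.0), "a1": (1.0, 0.0)}


def q_prev(action, policy):
    """Q_{T-1}(s_{T-1}, action), assuming `policy` (s_T -> a_T) is followed at time T."""
    total = list(R[action])
    for s_next, prob in P[action].items():
        future = Q_T[(s_next, policy[s_next])]
        total[0] += prob * future[0]
        total[1] += prob * future[1]
    return tuple(total)


# Enumerate every deterministic future policy pi_T (an assumption of this sketch).
policies = [dict(zip(STATES_T, choice))
            for choice in product(ACTIONS, repeat=len(STATES_T))]

# One Q_{T-1} function per policy, tabulated at the fixed state for each action.
Q_set = [{a: q_prev(a, pi) for a in ACTIONS} for pi in policies]

for pi, q in zip(policies, Q_set):
    print(pi, q)
```

In a plot like the one described, each action would contribute one point per member of `Q_set`, so the points for a given action show the range of expected return vectors reachable under different future policies.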