. 2021 Sep 24;2021:7588221. doi: 10.1155/2021/7588221

Figure 10.

Cumulative reward (mean ± standard deviation over 500 rollouts) of the RLBNK-switch and RLBNK-concat trained policies versus the trained PPO baseline policy, tested on the disturbed CartPole task. The plots show each policy's performance as a function of the disturbance strength Φ.
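The statistic plotted at each disturbance strength can be computed roughly as below. This is a minimal sketch, not the authors' evaluation code. It assumes a hypothetical `run_episode(phi, seed)` callable that rolls out one trained policy in the disturbed CartPole task at disturbance strength Φ and returns the episode's cumulative reward. The function names, the seed-per-rollout scheme, and the use of the population standard deviation are all assumptions.

```python
import statistics
from typing import Callable, Dict, Iterable, Tuple


def evaluate_over_disturbance(
    run_episode: Callable[[float, int], float],  # hypothetical: (phi, seed) -> cumulative reward
    phis: Iterable[float],
    n_rollouts: int = 500,
) -> Dict[float, Tuple[float, float]]:
    """Return {phi: (mean, std)} of cumulative reward over n_rollouts episodes per phi."""
    results: Dict[float, Tuple[float, float]] = {}
    for phi in phis:
        # One rollout per seed at this disturbance strength (assumed seeding scheme).
        returns = [run_episode(phi, seed) for seed in range(n_rollouts)]
        mean = statistics.fmean(returns)
        # Population standard deviation (assumption; the caption does not specify).
        std = statistics.pstdev(returns, mean)
        results[phi] = (mean, std)
    return results
```

Calling the function once per policy (for example, RLBNK-switch, RLBNK-concat, and PPO) over the same grid of Φ values gives one mean ± standard deviation curve per policy.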