2021 Sep 24;2021:7588221. doi: 10.1155/2021/7588221

Figure 7.

Comparison of RLBNK-switch and RLBNK-concat to the baseline PPO, DQfD, expert policy, and pure imitation learning under the normal reward setting. Plots show the training performance over the number of episodes. (a) CartPole. (b) Catcher. (c) FlappyBird.