. 2021 Sep 24;2021:7588221. doi: 10.1155/2021/7588221

Figure 9.

Comparison of RLBNK-switch and RLBNK-concat with PPO-finetune, baseline PPO, DQfD, and imitation learning in two generalization settings. The plots show training performance as a function of the number of episodes. (a) Pole length generalization. (b) Cart mass generalization.