Figure 8.
Experimental results for the CartPole task under different sparse-reward settings, where T denotes the interval at which the agent receives rewards. Plots show training performance against the number of episodes. (a) T = 25. (b) T = 50. (c) T = 100.