Skip to main content
. 2020 Feb;122:218–230. doi: 10.1016/j.neunet.2019.10.011

Fig. 6.

Fig. 6

Episode reward achieved by CTDL and DQN on the Cart–Pole environment over the course of learning. Both CTDL and DQN were run 100 times on the Cart–Pole environment. The solid line represents the mean and the shaded region represents the standard deviation.