2019 Jun 28;13:40. doi: 10.3389/fnbot.2019.00040

Figure 4.


Reward trends on two different maps: (A) Farm and (B) Raceway. The curves show the reward over 1,750 training episodes. At about 400 episodes, the stability of the driving agent began to increase; after 1,400 episodes, the reward stabilized at a high level.