Skip to main content
. 2014 Oct 28;8:126. doi: 10.3389/fncir.2014.00126

Figure 10.

Figure 10

Temporal development of the reward and exploration noise for the dynamic foraging task. (A) Change in the reward signal (r) over time. Between 3 × 104 time steps and 5 × 104 time steps the robot learns the initial task of reaching the green goal, receiving positive rewards (+1), successively. However, after 50 trials (approximately 5 × 104 to 5.5 × 104 time steps) the reward signals were changed, causing the robot to receive negative rewards (−1) as it drives to the green goal. After around 10 × 104 time steps as the robot learns to steer correctly toward the new desired location (blue goal), it successively receives positive rewards. (B) Change in the exploration noise (ϵ) over time. There is random exploration in the beginning of the task and after switching the reward signals (pink shaded regions), followed by stabilization and decrease in exploratory noise once the robot learns the correct behavior (gray shaded region). In both plots the thick dashed line (black) marks the point of reward switch.