2021 Jul 6;8:692811. doi: 10.3389/frobt.2021.692811

FIGURE 6.

An overview of the learning problem embedded in the experiment. The figure shows the runs that each participant went through (five runs for level 1, three runs for level 2) and how each run was divided into four phases defined by the Phase Variables. The colors show that in R1.1, R1.2, and R1.3 the robot usually used O1 (picking up all), O2 (passive large rocks), and O3 (breaking), respectively, in each phase. From R1.4 onwards, the robot chose a Macro-action based on the learned Q-values. The Future Run portrays the behavior that the robot would engage in if there were another run, based on the Q-values after R2.3.