2013 Jul 26;7:11. doi: 10.3389/fnbot.2013.00011

Figure 2.

Left plot: Agent architecture employed during the developmental period. No external reward is provided, but the motivational system IM creates an intrinsic reward r_i. In parallel, new skills o are identified using the skill discovery module SD and added to the skill pool O. The policy π_i selects skills according to their intrinsic reward; both π_i and the policy π_o of the active skill are learned. Right plot: Agent architecture for learning to solve external tasks. A hierarchical policy π_e is learned based on the external reward r_e using the fixed set of skills O. The policy π_o of the active skill is also improved.
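
The two architectures in the caption can be illustrated with a small Python sketch. This is not the authors' implementation: the class `SkillSelector`, the functions `developmental_phase` and `task_phase`, and the callbacks `discover_skill`, `intrinsic_reward`, and `external_reward` are hypothetical names introduced only for illustration. An epsilon-greedy value table stands in for the skill-selecting policies π_i and π_e, and the learning of each skill's own policy π_o is omitted for brevity.

```python
import random


class SkillSelector:
    """Epsilon-greedy value table over skills (a stand-in for pi_i or pi_e)."""

    def __init__(self, epsilon=0.1, lr=0.2, seed=0):
        self.values = {}          # skill -> running estimate of its reward
        self.epsilon = epsilon
        self.lr = lr
        self.rng = random.Random(seed)

    def select(self, skill_pool):
        for skill in skill_pool:
            self.values.setdefault(skill, 0.0)
        if self.rng.random() < self.epsilon:
            return self.rng.choice(skill_pool)                 # explore
        return max(skill_pool, key=lambda s: self.values[s])   # exploit

    def update(self, skill, reward):
        self.values[skill] += self.lr * (reward - self.values[skill])


def developmental_phase(steps, discover_skill, intrinsic_reward, seed=0):
    """Left plot: no external reward; IM supplies r_i and SD grows the pool O."""
    skill_pool = []
    pi_i = SkillSelector(seed=seed)
    for t in range(steps):
        new_skill = discover_skill(t, skill_pool)   # SD module (hypothetical callback)
        if new_skill is not None and new_skill not in skill_pool:
            skill_pool.append(new_skill)
        if not skill_pool:
            continue
        skill = pi_i.select(skill_pool)
        r_i = intrinsic_reward(skill, t)            # IM module (hypothetical callback)
        pi_i.update(skill, r_i)
    return skill_pool, pi_i


def task_phase(steps, skill_pool, external_reward, seed=0):
    """Right plot: the pool O is fixed; pi_e is learned from r_e."""
    fixed_pool = tuple(skill_pool)                  # no skills are added here
    pi_e = SkillSelector(seed=seed)
    for t in range(steps):
        skill = pi_e.select(fixed_pool)
        pi_e.update(skill, external_reward(skill, t))
    return pi_e


# Toy usage: a new skill appears every 10 steps; the task rewards only "skill_2".
pool, pi_i = developmental_phase(
    steps=50,
    discover_skill=lambda t, pool: f"skill_{t // 10}" if t % 10 == 0 else None,
    intrinsic_reward=lambda skill, t: 1.0 / (1 + t),
)
pi_e = task_phase(200, pool, lambda skill, t: 1.0 if skill == "skill_2" else 0.0)
```

The point of the sketch is the separation of phases: the skill pool only grows while the intrinsic reward drives selection, and the task phase reuses that pool unchanged while a separate selector is trained on the external reward.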