PLoS One. 2009 Oct 20;4(10):e7362. doi: 10.1371/journal.pone.0007362

Figure 1. Model overview.

The world communicates with the agent by sending observations and rewards and receiving actions. The world maintains its own “true” state and dwell time in that state. The agent is composed of independent µAgents that each maintain a belief of the world's state and dwell time. Each µAgent has its own value estimate for each state and its own discounting factor, and generates an independent δ signal. The µAgents' beliefs are integrated for action selection by a voting process.
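
The architecture in the caption can be illustrated with a minimal Python sketch. It is not the authors' implementation. The names `MicroAgent`, `observe`, `vote_on_state`, and the learning rate `alpha` are hypothetical, the δ signal is assumed to be a standard one-step temporal-difference error, and the voting process is assumed to be a simple plurality vote over the µAgents' believed states. Dwell time is tracked but, for simplicity, does not enter the discounting here.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class MicroAgent:
    """One µAgent: its own belief, value estimates, and discounting factor."""
    gamma: float                 # this µAgent's own discounting factor
    belief_state: int = 0        # believed world state
    dwell_time: int = 0          # believed time spent in that state
    values: dict = field(default_factory=dict)  # own value estimate per state

    def delta(self, reward, next_state):
        # Assumption: a standard one-step TD error, computed independently
        # by each µAgent from its own values and its own gamma.
        v_now = self.values.get(self.belief_state, 0.0)
        v_next = self.values.get(next_state, 0.0)
        return reward + self.gamma * v_next - v_now

    def observe(self, reward, next_state, alpha=0.1):
        # Update this µAgent's value estimate from its own δ signal,
        # then update its belief about state and dwell time.
        d = self.delta(reward, next_state)
        old = self.values.get(self.belief_state, 0.0)
        self.values[self.belief_state] = old + alpha * d
        if next_state == self.belief_state:
            self.dwell_time += 1
        else:
            self.belief_state = next_state
            self.dwell_time = 0
        return d


def vote_on_state(agents):
    # Assumption: beliefs are integrated by plurality vote; the winning
    # state would then be handed to whatever action policy is in use.
    counts = Counter(a.belief_state for a in agents)
    return counts.most_common(1)[0][0]


# Three µAgents that differ only in their discounting factor.
agents = [MicroAgent(gamma=g) for g in (0.5, 0.9, 0.99)]
first = [a.observe(reward=1.0, next_state=1) for a in agents]
second = [a.observe(reward=0.0, next_state=0) for a in agents]
consensus = vote_on_state(agents)
```

With identical observations, the µAgents produce the same δ on the first step (all values start at zero) but different δ signals on the second, because each one discounts the learned value of state 0 by its own factor.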