Skip to main content
. 2014 Jun 5;10(6):e1003640. doi: 10.1371/journal.pcbi.1003640

Figure 1. Agent acting in a changing environment.

Figure 1

A The environmental state changes stochastically with rates Inline graphic between being rewarding, neutral or punishing. Unless mentioned otherwise, we choose Inline graphic and Inline graphic. B Based on a policy (with forgetting, without forgetting) which may depend on past observations of the environmental state and current costs of responding, an agent shows the appetitive reaction (upward arrow) or the aversive reaction (downward arrow). The stochastic costs (i.i.d. with an exponential distribution with scale parameter Inline graphic) for the appetitive/aversive reaction are shown above/below the white line. An agent with a policy that involves forgetting accumulates more reward than an agent without forgetting or immediate forgetting. C In an emulation of a classical conditioning experiment, the agent experiences a defined environmental state, and after a waiting period of length Inline graphic the agent has to react according to the internal policy. D Different policies lead to different outcomes in classical conditioning experiments. Shown is the fraction of agents choosing the conditioned response (conditioned resp.) at time Inline graphic after conditioning for agents subject to individual costs of responding.