2014 Jun 5;10(6):e1003640. doi: 10.1371/journal.pcbi.1003640

Figure 4. Asymmetry of behavior after aversive and appetitive conditioning.

(A) An agent with a provident policy forgets faster after aversive conditioning (red curve) than after appetitive conditioning (green curve). The boxes mark the behavior of the approximative model in C. (B) The total reward collected in free runs of a fixed number of time bins (compare to Fig. 1B) is larger for the provident policy than for the greedy policy. Plotted are mean and s.e.m. for 40 trials. (C) Similar performance is obtained with a simple, approximative implementation of the optimal strategy, in which two synaptic strengths connect an odor-detecting neuron (o) to the action neurons “approach” (ap) and “avoid” (av). In the absence of any stimulus (odor), the two synaptic strengths decay with different time constants for the approximative provident policy and with the same time constant for the approximative greedy policy. When an odor is present, the synaptic strengths change in a Hebbian way in the case of reward and in an anti-Hebbian way in the case of punishment, i.e. the approach/avoid strengths increase/decrease for reward and decrease/increase for punishment.
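
The mechanism described for panel C can be illustrated with a minimal sketch. It is not the authors' implementation: the time constants, the learning rate, the additive form of the update, and the clipping of strengths at zero are all assumptions chosen for illustration, and the names `update` and `condition_then_forget` are hypothetical. The shorter time constant for the "avoid" synapse under the provident policy is likewise an assumption, picked so that the sketch reproduces the faster forgetting after aversive conditioning described for panel A.

```python
import math

# All numeric values below are illustrative assumptions, not values from the paper.
TAU_PROVIDENT = {"ap": 20.0, "av": 5.0}   # assumed: "avoid" synapse decays faster
TAU_GREEDY = {"ap": 20.0, "av": 20.0}     # same time constant for both synapses


def update(w, odor, outcome, tau, lr=0.5, dt=1.0):
    """Advance both synaptic strengths by one time bin.

    w       -- dict with keys "ap" and "av" (odor -> approach / avoid)
    odor    -- True if the odor is present in this time bin
    outcome -- +1 for reward, -1 for punishment, 0 for neither
    tau     -- dict of decay time constants, one per synapse
    """
    if not odor:
        # No stimulus: each strength decays exponentially with its own tau.
        return {k: w[k] * math.exp(-dt / tau[k]) for k in w}
    # Odor present: reward raises "ap" and lowers "av"; punishment does the
    # opposite. Clipping at zero is an assumption of this sketch.
    return {
        "ap": max(0.0, w["ap"] + lr * outcome),
        "av": max(0.0, w["av"] - lr * outcome),
    }


def condition_then_forget(outcome, tau, n_pair=4, n_rest=10):
    """Pair the odor with an outcome n_pair times, then wait n_rest odor-free bins."""
    w = {"ap": 0.0, "av": 0.0}
    for _ in range(n_pair):
        w = update(w, True, outcome, tau)
    for _ in range(n_rest):
        w = update(w, False, 0, tau)
    return w


if __name__ == "__main__":
    for name, tau in (("provident", TAU_PROVIDENT), ("greedy", TAU_GREEDY)):
        appetitive = condition_then_forget(+1, tau)
        aversive = condition_then_forget(-1, tau)
        print(name, "approach after reward:", round(appetitive["ap"], 3),
              "avoid after punishment:", round(aversive["av"], 3))
```

Under these assumed parameters, the provident variant retains more of the learned "approach" strength than of the learned "avoid" strength after the same odor-free interval, while the greedy variant retains both equally.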