Skip to main content
. 2017 Feb 22;8:244. doi: 10.3389/fpsyg.2017.00244

FIGURE 1.

FIGURE 1

Dopamine neural correlates follow the laws of prediction error as formalized in Sutton and Barto’s (1981) model-free reinforcement learning algorithm. According to Sutton and Barto (1981) prediction errors are generated by the quantitative difference between the value of the reward delivered and the value attributed to the reward-predictive cue. Phasic activity of dopaminergic neurons in the VTA can be interpreted as conforming with these predictions. This schematic diagram represents dopamine activity during different Pavlovian conditioning paradigms. Activity is represented by the black lines aligned to a cue (e.g., light or tone, on the left) and reward (e.g., a juice drop, on the right). The dotted lines refer to changes in associative strength. (A) Pavlovian conditioning: A reward elicits a positive prediction error when the reward is unexpected, thus an error is made in prediction (Stage I). Dopamine neurons exhibit firing upon the reward delivery. However, with repeated cue-reward pairings this dopamine signal transfers to the reward-predictive cue and diminishes to the reward (Stage II) (Hollerman and Schultz, 1998). As this cue is now predictive of reward, there is a reduction in prediction error at the time of reward and motivated behavior directed toward the predictive cue increases. (B) Blocking: a critical aspect of the Sutton and Barto (1981) model is that the learning (or value) about the reward must be shared amongst all present cues. This is referred to in learning theory as a summed-error term (Rescorla and Wagner, 1972). This concept is well illustrated by the blocking phenomenon. For example, during Stage I a light cue is trained to predict reward and with training comes to elicit a dopamine signal (Waelti et al., 2001). When a second auditory cue (tone) is presented simultaneously with the light cue and the same quantity of reward is delivered during Stage II, no prediction error is elicited as the reward is already expected and no dopamine signal is exhibited. Behaviorally, learning about the novel tone cue is said to be blocked, and when the cues are presented alone at Test the light cue maintains associative strength but the blocked tone cue does not gain any associative strength. (C) Over-expectation: Two different cues (light and tone) that have been separately trained to predict a particular quantity of reward come to each elicit a dopamine prediction-error signal after multiple cue-reward pairings in Stage I. During Stage II, the two cues are then presented as a simultaneous compound, followed by reward given to each trial type during Stage I. This generates a negative prediction error, as the reward is less than the summed expectation of each cue. In this example dopamine signaling is suppressed in response to the over-expected reward not being delivered. This negative prediction error drives a reduction in associative strength so that both cues lose half their associative value when presented alone at Test, assuming these cues are matched for salience (e.g., Chang et al., 2016).