(a) Model circuit for the control of dopaminergic Now Print signals in response to unexpected rewards. Cortical inputs (Ii), activated by conditioned stimuli, learn to excite the SNc via a multi-stage pathway from the ventral striatum (S) to the ventral pallidum and then on to the PPTN (P) and the SNc (D). The inputs Ii excite the ventral striatum via adaptive weights WiS, and the ventral striatum excites the PPTN via double inhibition through the ventral pallidum, with strength WSP. When the PPTN activity exceeds a threshold GP, it excites the SNc with strength WPD. The striosomes, which contain an adaptive spectral timing mechanism (xij, Gij, Yij, Zij), learn to generate adaptively timed signals that inhibit reward-related activation of the SNc. Primary reward signals (IR) from the lateral hypothalamus both excite the PPTN directly (with strength WRP) and act as training signals to the ventral striatum S (with strength WRS) that train the weights WiS. Arrowheads denote excitatory pathways, circles denote inhibitory pathways, and hemidiscs denote synapses at which learning occurs. Thick pathways denote dopaminergic signals. Reprinted with permission from Brown et al. (1999). (b) Dopamine cell firing patterns: Left: data. Right: model simulation, showing model spikes and underlying membrane potential. (A) In naive monkeys, the dopamine cells fire a phasic burst when an unpredicted primary reward R occurs, for example when the monkey unexpectedly receives a burst of apple juice. (B) As the animal learns to expect the apple juice that reliably follows a sensory cue (conditioned stimulus, CS) at a fixed time interval, the phasic dopamine burst disappears at the expected time of reward, and a new burst appears at the time of the reward-predicting CS. (C) After learning, if the animal fails to receive reward at the expected time, a phasic depression, or dip, in dopamine cell firing occurs. 
Thus, these cells reflect an adaptively timed expectation of reward that cancels the expected reward at the expected time. The data are reprinted with permission from Schultz et al. (1997). The model simulations are reprinted with permission from Brown et al. (1999). (c) Dopamine cell firing patterns: Left: data. Right: model simulation, showing model spikes and underlying membrane potential. (A) The dopamine cells learn to fire in response to the earliest consistent predictor of reward. When CS2 (instruction) consistently precedes the original CS (trigger) by a fixed interval, the dopamine cells learn to fire only in response to CS2. Data reprinted with permission from Schultz et al. (1993). (B) During training, the cell fires weakly in response to both the CS and reward. Data reprinted with permission from Ljungberg et al. (1992). (C) Temporal variability in reward occurrence: When reward is received later than predicted, a depression occurs at the time of predicted reward, followed by a phasic burst at the time of actual reward. (D) If reward occurs earlier than predicted, a phasic burst occurs at the time of actual reward. No depression follows because the CS is released from working memory. Data in C and D reprinted with permission from Hollerman and Schultz (1998). (E) When there is random variability in the timing of primary reward across trials (e.g., when the reward depends on an operant response to the CS), the striosomal cells produce a Mexican Hat depression on either side of the dopamine spike. Data reprinted with permission from Schultz et al. (1993). Model simulation reprinted with permission from Brown et al. (1999).
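
The qualitative firing patterns in panels (b) and (c) can be illustrated with a toy calculation in which the drive to the dopamine cells is excitation (from the learned CS pathway and from primary reward) minus an adaptively timed inhibition at the expected reward time. This is only a minimal sketch under simplifying assumptions, not the equations of Brown et al. (1999): the function name `dopamine_drive`, the discrete time steps, and the parameters `w_cs` and `w_timed` are hypothetical stand-ins for the learned excitatory pathway and the striosomal timed inhibition, and the cancellation of inhibition after an early reward is hard-coded rather than derived from a working-memory mechanism.

```python
def dopamine_drive(n_steps, cs_time, expected_reward_time, reward_time=None,
                   w_cs=0.0, w_timed=0.0):
    """Toy net drive to dopamine cells, relative to baseline, per time step.

    w_cs    : assumed learned strength of the CS -> PPTN -> SNc excitation.
    w_timed : assumed learned strength of the timed striosomal inhibition.
    reward_time=None means the reward is omitted on this trial.
    """
    # Assumption: an early reward releases the CS from working memory,
    # so the timed inhibition never arrives on that trial.
    early = reward_time is not None and reward_time < expected_reward_time
    drive = []
    for t in range(n_steps):
        excitation = 0.0
        if t == cs_time:
            excitation += w_cs            # learned response to the CS
        if reward_time is not None and t == reward_time:
            excitation += 1.0             # primary reward input
        inhibition = 0.0
        if t == expected_reward_time and not early:
            inhibition = w_timed          # adaptively timed inhibition
        drive.append(excitation - inhibition)
    return drive


# Naive animal: burst at the unpredicted reward only.
naive = dopamine_drive(10, cs_time=2, expected_reward_time=6, reward_time=6)
# After learning: burst moves to the CS, reward response is cancelled.
trained = dopamine_drive(10, 2, 6, reward_time=6, w_cs=1.0, w_timed=1.0)
# Omitted reward: dip (negative drive) at the expected reward time.
omitted = dopamine_drive(10, 2, 6, reward_time=None, w_cs=1.0, w_timed=1.0)
# Late reward: dip at the expected time, then a burst at the actual time.
late = dopamine_drive(10, 2, 6, reward_time=8, w_cs=1.0, w_timed=1.0)
# Early reward: burst at the actual time, no later dip.
early = dopamine_drive(10, 2, 6, reward_time=4, w_cs=1.0, w_timed=1.0)
```

In this sketch a positive entry stands for a phasic burst and a negative entry for a dip below baseline; the graded responses during training in panel (c)(B) would correspond to intermediate values of `w_cs` and `w_timed` between 0 and 1.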