How the basal ganglia generate a dopaminergic Now Print learning signal to multiple brain regions in response to rewards whose timing or amplitude is unexpected.
(A) Model circuit for triggering dopaminergic Now Print signals at the substantia nigra pars compacta (SNc) to multiple brain regions in response to unexpected rewards. Cortical inputs (Ii) that are activated by conditioned stimuli learn to excite the SNc (D) via the (ventral striatal, S)-to-(ventral pallidal, VP)-to-(PPTN, P)-to-SNc path. The inputs Ii excite the ventral striatum via adaptive weights WiS, and the ventral striatum excites the PPTN via double inhibition through the ventral pallidum, with weights WSP. When the PPTN activity exceeds a threshold Γp, it excites the dopamine cell with weighted strength WPD. The striosomes, which contain an adaptive spectral timing mechanism (xij, Gij, Yij, Zij), learn to generate lagged, adaptively timed signals that inhibit reward-related activation of the SNc. Primary reward signals (IR) from the lateral hypothalamus both excite the PPTN directly (with weighted strength WRP) and act as training signals to the ventral striatum S (with weighted strength WRS). Arrowheads denote excitatory pathways, circles denote inhibitory pathways, and hemidisks denote synapses at which learning occurs. Thick pathways denote dopaminergic signals.
(B) Dopamine cell firing patterns under three conditions, labeled (A) to (C) below. Left: Data. Right: Model simulation, showing model spikes and underlying membrane potential.
(A) In naive monkeys, the dopamine cells fire a phasic burst when an unpredicted primary reward R occurs; for example, when the monkey unexpectedly receives a burst of apple juice.
(B) As the animal learns to expect the apple juice that reliably follows a conditioned stimulus (CS) preceding it by a fixed time interval, the phasic dopamine burst disappears at the expected time of reward, and a new burst appears at the time of the reward-predicting CS.
(C) After learning, if the animal fails to receive reward at the expected time, a phasic depression in dopamine cell firing occurs. Thus, these cells reflect an adaptively timed expectation of reward that cancels the expected reward at the expected time.
[The data in (B) (column 1) are reprinted with permission from Schultz et al. (1997).] [The model diagram in (A) and data simulation in (B) (column 2) are reprinted with permission from Brown et al. (1999).]
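The three firing patterns described above can be illustrated with a deliberately simplified sketch. It is not the Brown et al. (1999) model: it ignores spiking, membrane dynamics, the PPTN threshold, and the spectral timing mechanism, and reduces each pathway to a single number. The function name `dopamine_activity`, the parameters `cs_weight` and `timed_inhibition`, and the unit-sized reward and baseline are all assumptions introduced only for illustration.

```python
def dopamine_activity(cs_weight, timed_inhibition, reward_delivered, baseline=1.0):
    """Toy dopamine activity at CS onset and at the expected reward time.

    cs_weight        -- hypothetical learned excitation from the CS (S -> VP -> PPTN -> SNc path)
    timed_inhibition -- hypothetical learned striosomal inhibition arriving at the expected reward time
    reward_delivered -- whether the primary reward actually occurs
    baseline         -- assumed tonic activity level (arbitrary units)
    """
    # At CS onset, only the learned excitatory path contributes.
    at_cs = baseline + cs_weight
    # At the expected reward time, reward excitation competes with timed inhibition.
    reward_drive = 1.0 if reward_delivered else 0.0
    at_reward = max(0.0, baseline + reward_drive - timed_inhibition)
    return at_cs, at_reward


# Three conditions corresponding to the three firing patterns in the caption.
naive = dopamine_activity(cs_weight=0.0, timed_inhibition=0.0, reward_delivered=True)
learned = dopamine_activity(cs_weight=1.0, timed_inhibition=1.0, reward_delivered=True)
omitted = dopamine_activity(cs_weight=1.0, timed_inhibition=1.0, reward_delivered=False)

for name, (at_cs, at_reward) in [("naive", naive), ("learned", learned), ("omitted", omitted)]:
    print(f"{name}: at CS = {at_cs}, at expected reward = {at_reward}")
```

Under these assumed values, activity above baseline stands in for a phasic burst and activity below baseline for a phasic depression: the naive case rises only at the reward, the learned case rises only at the CS, and the omitted-reward case falls below baseline at the expected reward time.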