Abstract
Although rewards are physical stimuli and objects, their value for survival and reproduction is subjective. The phasic, neurophysiological and voltammetric dopamine reward prediction error response signals subjective reward value. The signal incorporates crucial reward aspects such as amount, probability, type, risk, delay and effort. Differences in dopamine release dynamics with temporal delay and effort in rodents may derive from methodological issues and require further study. Recent designs using concepts and behavioral tools from experimental economics allow formal characterization of the subjective value signal as economic utility and thus the establishment of a neuronal value function. With these properties, the dopamine response constitutes a utility prediction error signal.
Introduction
The function of reward derives from the biological need for nutrients and other substances and for reproduction. Thus, rewards have specific value for individual survival and gene propagation. Although rewards have physical aspects that are detected by sensory receptors, there are no specific receptors for the typically polysensory rewards, and their value needs to be inferred from the preferences they elicit in behavioral choices. Furthermore, reward value depends on the organism’s momentary requirements. Satiation induced by a meal reduces the value of foods but may render liquids such as digestive drinks more attractive. Thus, value is subjective and constructed by the brain; it cannot be estimated entirely from the physical parameters and sensory properties of the rewards. The usual way to estimate subjective value in animals involves behavioral measures, including break points in fixed ratio schedules, preferences in binary choices and psychophysical indifference points against a common reference reward (subjective equivalents). Subjective value estimated in these ways is expressed in physical measures of break points, choice frequency or reference reward amount (e.g. ml of juice or number of pellets). By contrast, a more general, and theoretically well defined, measure of subjective value is formal economic utility, which constitutes a mathematical characterization of reward preferences and provides an internal metric of subjective value (sometimes called the util) [1]. Individuals have the best chance to survive by preferring rewards with the highest subjective value. Economic theory formalizes this idea with axioms defining the conditions for utility maximization [2].
Maximization of subjective value and utility requires decision mechanisms in which inputs from neuronal value signals compete with each other, and only the option with the highest value gets selected. Neuronal reward signals that serve as appropriate inputs to competitive decision mechanisms should process subjective value or, in their best defined form, economic utility, in a monotonic but usually nonlinear relationship to objective value.
This review describes the neuronal coding of subjective value and formal economic utility in one of the brain’s prominent reward systems, the dopamine neurons. We review both the electrophysiological responses of midbrain dopamine neurons and the voltammetrically assessed dopamine concentration changes in axonal terminal areas of the nucleus accumbens. We also address recent issues concerning voltammetric changes reflecting subjective value in rats.
Different rewards
Concepts and behaviour
How can we maximize subjective value when choosing between apples and oranges? These objects contain important substances for bodily functions, like glucose and water, but their precise contents are difficult to quantify. As different rewards often have no common physical unit, one can assign a ‘common currency’ value to one particular object, called the ‘numeraire’ in economic theory. Behavioral preferences serve to estimate the subjective values of all other objects relative to this common reference, which allows comparison of subjective values between rewards. The value of the numeraire is usually set to 1, and the values of all other rewards are expressed as real number multiples of this value. Choice preferences provide numeric measures of subjective value along objective physical scales but not along subjective value scales; the subjective measure they allow is a rank order of rewards relative to the numeraire (and each other). Money is the key numeraire for modern humans, and any sufficiently familiar reward can serve as numeraire for animals. Monkeys show stable, but individually different, rankings of subjective value, as estimated from direct choices and from psychophysical variations against a numeraire [3,4••]. Thus, monkeys estimate subjective reward value from different rewards in a common currency.
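The common-currency logic above can be sketched numerically. The indifference amounts below are hypothetical illustrations, not data from the cited studies:

```python
# Illustrative sketch of common-currency valuation. The numeraire
# (here, juice) is assigned a value of 1; each other reward's value is
# the number of numeraire units traded for one unit of that reward at
# psychophysical choice indifference. All numbers are hypothetical.
indifference_amounts = {
    "juice": 1.0,    # numeraire by definition
    "banana": 1.6,   # 1 banana piece ~ 1.6 units of juice (assumed)
    "cereal": 0.7,   # 1 cereal pellet ~ 0.7 units of juice (assumed)
}

# Subjective values as real-number multiples of the numeraire's value
values = {reward: amt / indifference_amounts["juice"]
          for reward, amt in indifference_amounts.items()}

# Rank order of rewards relative to the numeraire (and each other)
ranking = sorted(values, key=values.get, reverse=True)
print(ranking)  # ['banana', 'juice', 'cereal']
```

The ranking, not the physical amounts themselves, is what the choice preferences license as a subjective measure.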
Neurobiology
The phasic, neurophysiological dopamine reward prediction error response is a brief value signal that increases monotonically with increasing reward amount and probability [5–7]. It is associated with corresponding dopamine concentration changes in rat nucleus accumbens [8,9••,10]. Aversive stimuli occasionally induce dopamine activations and often induce depressions [11,12]; the phasic activations consist of briefly increased impulse activity that reflects physical stimulus impact but does not vary positively with aversiveness [13••] and thus does not represent an indiscriminate response to reward and punishment; the depressions consist of briefly reduced impulse activity that codes either negative aversive value [13••] or absence of reward (negative reward prediction error) [14•]. Dopamine reward responses closely follow the rank-ordered preferences among different liquid and food rewards (Figure 1a) [4••]. The responses also reflect the sum of positive and negative reward values, as shown when rewards are delivered together with punishers (Figure 1b) [14•]. Thus, dopamine neurons integrate different positive and negative outcomes into a subjective value signal in a common currency.
Figure 1.
Subjective value coding. (a) Graded impulse responses of dopamine neurons to different reward-predicting stimuli (blackcurrant juice and mashed banana mix). Arrows indicate subjective behavioral preferences; ~ indifferent. imp/s: firing rate. From Lak et al. [4••]. (b) Incorporation of negative aversive value into the common currency dopamine signal. The impulse response to reward juice alone (black) is reduced when an aversive salt or bitter solution is delivered together with the juice. +imp/s: firing rate relative to baseline. From Fiorillo [14•]. (c) Influence of risk on dopamine impulse responses to stimuli differentially predicting two liquid rewards (blue, blackcurrant juice; green, orange juice). Top: stimuli predicting binary, equiprobable, risky gambles (double horizontal bars) and safe rewards (single bars) with identical mean amounts. Vertical bar height indicates reward amount. Bottom: corresponding neuronal responses, closely following subjective values inferred from the behavioral preferences shown above. Arrows indicate response increases with risky over safe rewards in a risk-seeking animal. From Lak et al. [4••]. (d) Differential, risk-attitude dependent influences of risk on voltammetric dopamine responses in rat nucleus accumbens to cue lights. Equiprobable risk reduces the response in risk avoiders (top) but increases it in risk seekers (bottom). Mean reward is identical for gamble and safe reward (1 pellet). From Sugam et al. [9••].
Risk
Concepts
Rewards are inherently risky. The terms risk avoidance and risk seeking refer to individual psychological tendencies to hate or love risk and characterize the influence of risk on subjective reward value. Risk avoiders value risky outcomes lower than safe outcomes with the same objective value, whereas risk seekers do the opposite. Risk is distinct from probability, as it increases up to probabilities of p = 0.5 and then declines again. The simplest and best controlled behavioral risk test employs binary, equiprobable gambles (p = 0.5 for each outcome) in which risk is defined by statistical variance (Figure 1c top). A typical gamble involves two rewards whose amounts lie above and below a mean by equal amounts, and higher risk consists of a larger spread around the same mean (‘mean-preserving spread’) [15]. This review considers only risk defined by variance, without skewness.
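The variance definition of risk and the mean-preserving spread can be illustrated with a short sketch (the reward amounts are hypothetical):

```python
from statistics import mean, pvariance

# Binary, equiprobable gambles (p = 0.5 each outcome); risk is the
# statistical variance of the two outcomes. Amounts (ml) are hypothetical.
safe = [0.30, 0.30]    # no spread: riskless
narrow = [0.20, 0.40]  # mean 0.30, small spread
wide = [0.10, 0.50]    # mean 0.30, larger spread ('mean-preserving spread')

means = [mean(g) for g in (safe, narrow, wide)]
variances = [pvariance(g) for g in (safe, narrow, wide)]
print(means)      # all ~0.30: the spread preserves the mean
print(variances)  # variance grows with spread: 0.0 < 0.01 < 0.04
```

Widening the spread changes only the variance, never the mean, which is what makes such gambles a clean behavioral test of risk sensitivity.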
Behavior
The subjective value of risky options can be assessed psychophysically in choices between certain and risky outcomes. The amount of the certain outcome at choice indifference (‘certainty equivalent’) indicates the subjective value of the gamble. A certainty equivalent below the mean value of the gamble outcomes indicates risk avoidance, whereas the opposite indicates risk seeking. In these tests, rhesus monkeys are often risk seeking with liquid amounts below 0.4–0.6 ml and risk avoiding with larger amounts [16,17,18••]. This reward amount-dependent pattern of risk attitude resembles the peanuts effect in humans [19]. In choices between a certain reward (1 sugar pellet) and a risky outcome (0 or 2 sucrose pellets at p = 0.5 each), rats show individually specific, risk-avoiding or risk-seeking attitudes [9••]. Thus, risk solely involving variance affects subjective value in monkeys and rats.
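The certainty-equivalent classification described above can be sketched directly; the gamble and the indifference amounts below are hypothetical:

```python
def expected_value(gamble):
    """Mean of a binary, equiprobable gamble (p = 0.5 each outcome)."""
    low, high = gamble
    return 0.5 * low + 0.5 * high

def risk_attitude(certainty_equivalent, gamble):
    """Classify risk attitude from a psychophysical certainty equivalent:
    CE below the gamble's mean -> risk avoiding; above -> risk seeking."""
    ev = expected_value(gamble)
    if certainty_equivalent < ev:
        return "risk avoiding"
    if certainty_equivalent > ev:
        return "risk seeking"
    return "risk neutral"

# Hypothetical data: CEs for a 0.1 / 0.5 ml gamble (mean 0.3 ml)
print(risk_attitude(0.35, (0.1, 0.5)))  # risk seeking (CE above the mean)
print(risk_attitude(0.25, (0.1, 0.5)))  # risk avoiding (CE below the mean)
```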
Neurobiology
Phasic dopamine responses to variance risk-predicting cues in monkeys are enhanced in the low value, risk-seeking range and reduced in the high value, risk-avoiding range (Figure 1c) [4••,18••,20]. In formal economic tests for the influence of risk on subjective value [15,21], dopamine responses follow second order stochastic dominance for risk seeking [18••]. Striatal voltammetric dopamine concentration changes following risky cues are reduced in risk-avoiding rats and enhanced in risk seekers (Figure 1d) [9••]. These data suggest meaningful incorporation of risk into the dopamine signal of subjective value.
Temporal discounting
Concepts and behaviour
The subjective value of rewards decays with the delay between a stimulus or action and the reward delivery. This temporal discounting may have its biological origin in the physical decay of many nutrient rewards. However, temporal discounting applies to rewards in general, even when they remain physically unchanged and do not decay. Inter-temporal choices between a variable early reward and a fixed late reward serve to assess temporal discounting psychophysically. The subjective value of the late reward is inferred from the amount of early reward at choice indifference. The decay is captured quantitatively by hyperbolic, exponential or combined discounting functions [22–24]. Monkeys’ discounting across several seconds is fit slightly better by hyperbolic than by exponential functions [25]. In direct choices between early and late rewards, rats prefer earlier over later rewards of equal amounts [26] and exhibit delay discounting behavior [10]. Thus, temporal discounting dissociates subjective from objective reward value.
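The hyperbolic and exponential discounting functions mentioned above can be compared in a brief sketch (the discount parameter k and the amounts are assumed for illustration only):

```python
import math

def hyperbolic(amount, delay, k=0.2):
    """Hyperbolic discounting: v = A / (1 + k*D), delay D in seconds."""
    return amount / (1.0 + k * delay)

def exponential(amount, delay, k=0.2):
    """Exponential discounting: v = A * exp(-k*D)."""
    return amount * math.exp(-k * delay)

# Subjective value of a 0.56 ml reward across delays of several seconds
for delay in (2, 4, 8, 16):
    print(delay, round(hyperbolic(0.56, delay), 3),
          round(exponential(0.56, delay), 3))
```

With equal k, the hyperbolic function declines more slowly at long delays, which is the feature that distinguishes the two models in intertemporal choice data.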
Neurobiology
Monkey dopamine neurons show hyperbolically decreasing responses to reward-predicting stimuli across reward delays of several seconds, despite constant physical reward amount, thus matching closely the behavioral discounting (Figure 2a) [25,27]. Lower reward amounts are associated with steeper neuronal discounting [25]. The close similarity between behavioral and neuronal discounting suggests that dopamine responses are sensitive to the reduced subjective reward value of delayed rewards. Voltammetric dopamine responses in rat nucleus accumbens show comparable temporal discounting, despite constant amount and effort (Figure 2b) [10,26], although another study found no appreciable temporal discounting (combined with effort cost) using the same method [28,29•]. Further, optogenetic dopamine axon stimulation in nucleus accumbens enhances the choice of delayed reward [10], presumably by enhancing the subjective value of the late reward in line with the general, causal influence of dopamine stimulation on behavioral learning and approach [30,31,32•]. Thus, the dopamine signal appears to reflect and influence the subjective reward value derived from reward delay.
Figure 2.
Temporal discounting. (a) Decreasing impulse responses of dopamine neurons to stimuli differentially predicting increasing reward delays (red), corresponding to subjective value decrements assessed in intertemporal choices (blue). Y-axis shows behavioral value and neuronal responses in % of reward amount at 2 s delay (0.56 ml). From Kobayashi and Schultz [25]. (b) Lower voltammetric dopamine responses in rat nucleus accumbens to visual cues specifying longer reward delay with identical effort. From Day et al. [26].
Effort cost
Concepts and behaviour
Caloric rewards provide energy for body functions. However, reward acquisition often involves effort, which amounts to energy expenditure. The gain from reward is thus reduced by the energy expended to obtain it. This notion can be extended to all rewards; effort is considered an economic cost that should be subtracted from income value (note, however, that subtracting cost from income does not define formal economic utility as viewed by economists [2,33,34]). Monkeys show longer reaction times, more task errors and lower task engagement when performing more effortful hand or arm movements against mechanical resistance [35,36]. Rats prefer low over high ratios of lever pressing [26,29•,37•]. Thus, effort cost affects subjective reward value.
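The effort-as-cost notion can be sketched minimally (the cost parameter is hypothetical, and, as noted above, such a subtraction is not formal economic utility):

```python
# Sketch: subjective net value as reward value minus an effort cost.
# All numbers are hypothetical, in arbitrary value units.

def net_value(reward_value, presses, cost_per_press=0.04):
    """Reward value minus a linear cost of lever pressing."""
    return reward_value - cost_per_press * presses

# Low fixed ratio (FR1) vs high fixed ratio (FR16) for the same reward
print(net_value(1.0, 1))   # 0.96
print(net_value(1.0, 16))  # 0.36
```

The same reward has lower net value at the higher fixed ratio, matching the rats' preference for low press requirements.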
Neurobiology
Movement effort reduces the phasic neurophysiological value responses of some substantia nigra dopamine neurons in monkeys [35,36]. Increasing ratios of lever pressing substantially reduce the voltammetric dopamine value responses in rat nucleus accumbens beyond the effects of the associated temporal discounting (Figure 3a) [26], whereas another study finds no influence of effort on voltammetric dopamine responses (Figure 3b) [28,29•,37•]. These differences may be addressed by considering that higher effort is positively correlated with temporal discounting in these studies [26,28,29•,37•]. Temporal discounting on its own is known to reduce electrophysiological and voltammetric dopamine responses when effort stays constant [25–27] (Figure 2a,b). Thus, effort should have decreased the voltammetric dopamine responses through its correlation with temporal discounting. An explanation for these differences might lie in the slower (lasting >8 s) and smaller (0.5–15 nM) effort-insensitive voltammetric responses in one study [28,29•,37•], as compared with the typical dopamine changes of 30–70 nM associated with neurophysiological dopamine responses in the other studies [10,26]. The effort-insensitive voltammetric dopamine changes might not involve dopamine impulse responses or might derive from methodological differences.
Figure 3.
Voltammetric dopamine responses under different effort loads. (a) High effort (lever press FR16) compared to low effort (FR1) decreases dopamine response in rat nucleus accumbens (from 70 nM to 50 nM). From Day et al. [26]. (b) Behavioral preference for low effort–low reward (LL) is not associated with higher voltammetric dopamine response in 3 nM range. From Hollon et al. [29•].
Methodology of dopamine voltammetry
The reports by Gan et al. [28], Hollon et al. [29•] and Wanat et al. [37•] used background-subtracted fast-scan cyclic voltammetry (FSCV) with chronically implanted carbon-fiber microelectrodes to monitor dopamine [38]. While this approach has been characterized [38], it is controversial in that it differs from the classically used acute electrodes, which are positioned with micromanipulators at specific locations in the brain [39]. Given the well-established regional heterogeneity of dopamine release in the brain [40], acute electrodes allow localization of dopamine ‘hotspots’, which is not possible with chronically implanted electrodes; this may explain why dopamine responses measured with acute electrodes are often an order of magnitude greater than those measured with chronic electrodes [41]. Further, after each experiment, acute electrodes can be removed from the brain for calibration and temporal characterization. By contrast, chronic electrodes, constructed with polyimide-coated fused silica [42,43], are glued into the brain, preventing optimization of recording positions for best responses, post hoc calibration and temporal response assessment following each experiment.
Chronically implanted microelectrodes are also controversial because they have been only partially successful in other applications. They have been investigated for electrical brain interfaces and for chemical monitoring. Their insertion into tissue activates the immune response [44,45], which, over a month, can lead to encapsulation in glial scars that yield distorted but still useful electrical signals [46]. However, chronically implanted chemical sensors, typified by the glucose electrodes needed by diabetic patients (for review see [47]), are recommended only for relatively short durations (3–7 days). While carbon-fiber electrodes implanted for a month seem to avoid glial encapsulation [38], other problems, including biofouling, passivation and degeneration, can occur that necessitate frequent calibration. Methods for in situ calibration have yet to be developed.
The time course of dopamine fluctuations in the brain is a function of release and uptake [48–50], and temporal distortion could obfuscate these processes. For example, irrespective of whether an animal prefers low effort/low reward (LL) trials or high effort/high reward (HH) trials, cue-evoked dopamine during the HH trials has a prolonged elevation at chronic electrodes (see lower panel of Fig. 3b) [29•]. The authors claim that this reflects sustained release. Alternatively, the delayed return to baseline could indicate diminished uptake rates in HH trials because electrochemical gradients across neuronal membranes, necessary for uptake, are diminished by prolonged firing [51]. Temporal distortion makes it impossible to resolve these possibilities. Note, however, that while dopamine shows prolonged elevations when measured with acute electrodes [10], the responses are regionally specific and temporally calibrated.
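The release-and-uptake account of the dopamine time course can be sketched with the standard kinetic model (Michaelis-Menten uptake; all parameters are illustrative, not fitted to the cited data):

```python
# dC/dt = release(t) - Vmax * C / (Km + C): extracellular dopamine C
# rises with impulse-dependent release and falls with Michaelis-Menten
# uptake. Units (uM, per second) and parameters are illustrative only.

def simulate(release_rate, release_duration, vmax=4.0, km=0.2,
             dt=0.01, t_end=8.0):
    """Euler integration of extracellular dopamine concentration."""
    c, trace = 0.0, []
    for i in range(int(t_end / dt)):
        release = release_rate if i * dt < release_duration else 0.0
        c = max(c + dt * (release - vmax * c / (km + c)), 0.0)
        trace.append(c)
    return trace

brief = simulate(2.0, 1.0)                   # short burst, normal uptake
prolonged = simulate(2.0, 3.0)               # sustained release
slow_uptake = simulate(2.0, 1.0, vmax=2.0)   # diminished uptake
# Both sustained release and diminished uptake prolong the elevation,
# which is why a temporally distorted recording cannot tell them apart.
```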
Recordings from chronic sensors differ in other ways from results obtained with acute sensors. For example, dopamine responses measured with the chronic probe often dip far below baseline (Figure 3b). In some instances the dip below baseline obtained with chronic electrodes is larger than the cue-evoked increase (Figure 1e and g in Ref. [29•]). Does this mean uptake is enhanced around chronically implanted electrodes? In most cases with acutely implanted electrodes, dopamine returns near to baseline, within the standard error of measurements, after its behavioral activation [9••,10,26], although it may remain elevated when reward-related behavior is ongoing [9••,10]. Because all electrode responses exhibit some drift, interpretations of failure to return to baseline have not yet been attempted in the literature.
A failure to return to baseline could also arise from chemical interferences. Baseline dips occur when dopamine and basic pH changes overlap, a common finding during behavior [52]. Display of color plots of the voltammetric data allows the possibility of chemical interferences to be evaluated. Unfortunately, we cannot evaluate this possibility because Hollon et al. [29•] do not show the fast-scan cyclic voltammograms from which the chemical changes are extracted. Further, chronic electrodes are frequently used with principal component analysis to extract dopamine contributions, employing a ‘standard training set’. A major problem with standard training sets is that they eliminate an important method of validating the principal component analysis [53]. This approach is also problematic because the cyclic voltammetric responses of each carbon fiber differ, requiring the building of training sets at each measurement site [53,54].
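The training-set issue can be made concrete with a toy principal component regression on synthetic voltammograms (all shapes and concentrations here are invented; a real analysis would build the training set from standards recorded at the same electrode, which is the point at issue above):

```python
import numpy as np

rng = np.random.default_rng(0)
v = np.linspace(0.0, 1.0, 50)                  # mock voltage sweep
da_shape = np.exp(-((v - 0.4) / 0.08) ** 2)    # mock dopamine CV shape
ph_shape = np.sin(2 * np.pi * v)               # mock basic pH shift shape

# Training set: voltammograms at known dopamine / pH-shift levels
train_conc = np.array([[0.2, 0.0], [0.5, 0.0], [1.0, 0.0],
                       [0.0, 0.1], [0.0, 0.2], [0.5, 0.1]])
train_cv = train_conc @ np.vstack([da_shape, ph_shape])
train_cv = train_cv + 0.01 * rng.standard_normal(train_cv.shape)  # noise

# PCA of the mean-centered training voltammograms (keep 2 components)
mean_cv = train_cv.mean(axis=0)
_, _, Vt = np.linalg.svd(train_cv - mean_cv, full_matrices=False)
pcs = Vt[:2]

# Regress known concentrations on PC scores (with intercept)
scores = (train_cv - mean_cv) @ pcs.T
design = np.column_stack([scores, np.ones(len(scores))])
coef, *_ = np.linalg.lstsq(design, train_conc, rcond=None)

# Predict dopamine and pH contributions in a mixed 'unknown' signal
unknown = 0.8 * da_shape + 0.15 * ph_shape
u_score = (unknown - mean_cv) @ pcs.T
da_est, ph_est = np.concatenate([u_score, [1.0]]) @ coef
print(round(float(da_est), 2), round(float(ph_est), 2))
```

The separation works here only because the training voltammograms come from the same (simulated) sensor as the unknown; a ‘standard’ training set from a different fiber would violate that assumption.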
Taken together, the reliability of recordings with chronic voltammetric electrodes needs further confirmation. Although multiple interpretations have been suggested here, in our view the correct interpretation requires additional data and validation.
Utility
Concepts
The quantification of subjective reward value relative to a numeraire employs an objective scale. The £20,000 price of a car is an objective money amount, even though my preference for that car over another car reveals my subjective value. By contrast, formal economic utility advances by one crucial step in providing a mathematical function of objective value, u(x). Knowing such a function allows one to determine the subjective value of goods solely from their objective amounts, without requiring behavioral choices each time. Thus, by contrast to simple subjective value, utility employs an internal, subjective scale. This property makes utility a fundamental variable for economic decision theory.
Being subjective, utility lacks a physical measure that could serve as an anchor for quantification. Therefore, in principle, utility can only be estimated from choice preferences among rewards, which involves the ranking of rewards relative to each other and results in ordinal utility. However, choices under risk allow estimation of certainty equivalents that can serve to construct numeric, cardinal utility functions with unique shapes that are meaningful up to positive affine transformation (y = a + bx) [1,2,33,34]. The construction of formal, mathematically defined utility does not require other factors such as reward type, delay or effort, although these factors affect net benefit utility according to various economic models. Thus, formal economic utility is a well-defined, highly constrained form of subjective value that is expressed in internal units (utils).
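The affine-transformation property can be verified in a short sketch: a certainty equivalent computed under a utility function is unchanged by any positive affine rescaling y = a + b*u (the utility function and amounts are assumed for illustration):

```python
def u(x):
    """Illustrative concave (risk-averse) utility; ml -> utils."""
    return x ** 0.5

def affine(x, a=3.0, b=2.0):
    """Positive affine transform of u (b > 0)."""
    return a + b * u(x)

def certainty_equivalent(util, low, high, tol=1e-9):
    """Safe amount whose utility equals the expected utility of the
    binary, equiprobable gamble (low, high); util must be increasing.
    Found by bisection."""
    target = 0.5 * util(low) + 0.5 * util(high)
    lo, hi = low, high
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if util(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

ce_u = certainty_equivalent(u, 0.1, 0.5)
ce_v = certainty_equivalent(affine, 0.1, 0.5)
print(abs(ce_u - ce_v) < 1e-6)  # True: same CE under both scalings
```

Because the concave u puts the certainty equivalent below the gamble mean of 0.3, both scalings also classify the chooser as risk avoiding.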
Behavior
The fractile procedure is a useful behavioral tool for estimating the utility of reward income while keeping reward type, delay and effort constant. It estimates certainty equivalents in choices between specifically set, binary, equiprobable gambles and psychophysically adjustable safe outcomes [55,56]. Monkeys show convex utility functions with small rewards (risk-seeking, below 0.4–0.6 ml of blackcurrant juice) that turn concave with larger amounts (risk-avoiding) [18••]. The risk attitudes apparent in the curvatures of the utility function correspond to out-of-sample risky choices specifically tested at low and high outcomes [18••]. Thus, formal economic utility provides an excellent means for establishing meaningful neuronal value functions, as long as stable environments prevent adaptive value rescaling.
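The fractile chaining of certainty equivalents can be sketched as follows (all certainty equivalents here are hypothetical indifference points, not the measured ones):

```python
# Fractile sketch: build a cardinal utility function from certainty
# equivalents (CEs) of binary, equiprobable gambles. Anchors are
# arbitrary, as utility is unique only up to positive affine transform.
utility = {0.0: 0.0, 1.2: 1.0}   # u(0 ml) = 0, u(1.2 ml) = 1

# Step 1: the CE of the gamble (0, 1.2) has expected utility 0.5
utility[0.65] = 0.5              # hypothetical measured CE
# Step 2: CE of (0, 0.65) has utility 0.25; CE of (0.65, 1.2) has 0.75
utility[0.35] = 0.25             # hypothetical
utility[0.95] = 0.75             # hypothetical

for amount in sorted(utility):
    print(amount, utility[amount])
```

Repeating the bisection on ever smaller intervals fills in the function at arbitrary resolution while reward type, delay and effort stay constant.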
Neurophysiology
The identification of neuronal utility coding would involve the establishment of a neuronal utility function n(x) based on a behavioral utility function u(x). Such a signal would go well beyond the neuronal coding of subjective value shown above, as it would document that neuronal processing can derive utility u(x) from objective reward measures x that are detected by neuronal sensory signals. As neuronal activity has numeric properties, it could be necessary to relate the neuronal reward responses to numeric, rather than ordinal, utility functions. Thus, as a minimal condition, numeric neuronal value functions n(x) should have similar, usually nonlinear, forms as numeric behavioral utility functions u(x), which can be estimated in choices under risk.
The positive reward prediction error response of dopamine neurons increases nonlinearly with linear increases of unpredicted juice reward (Figure 4a) [18••]. The responses (black bars) closely follow the curvature of the utility function (red), increasing mildly in the low and high ranges and supralinearly in between. Similar nonlinear changes are also seen with three identical, well-defined, binary gambles that allow estimation of numeric utility; the dopamine response to the larger of the two gamble outcomes (positive prediction error) is small in the lower and higher ranges, where the slope of the utility function is relatively flat, and large in the center, where the utility slope is steeper (Figure 4b). In following the curvature of the utility function, the dopamine responses reflect the changes in marginal utility (first derivative of the utility function) that underlie nonlinear utility. However, dopamine neurons do not code marginal utility explicitly, as the responses to unpredicted reward (Figure 4a) follow the monotonically increasing utility prediction error (blue lines) but not the nonmonotonic marginal utility. Thus, the dopamine reward prediction error response represents a numeric neuronal value function n(x) that follows closely the behavioral utility function u(x) and appears to constitute a utility prediction error signal. This neuronal value function is valid in the stable environment tested in the specific experiment [18••]; it might be subject to adaptive rescaling in more variable contexts, which remains to be tested. In sum, the neuronal signal coding a well-constrained, mathematical utility function goes well beyond the coding of subjective value derived from simple behavioral preferences. With these characteristics, the dopamine utility prediction error signal constitutes a measurable, physical signal that implements the elusive utility in the brain.
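The utility-prediction-error account can be sketched numerically: with an S-shaped utility function (illustrative, not the fitted one from [18••]), the positive prediction error for the better gamble outcome is largest where the utility slope is steepest:

```python
import math

def u(x):
    """Illustrative S-shaped utility (convex then concave) on ~0-1.2 ml;
    not the behaviorally fitted function."""
    return 1.0 / (1.0 + math.exp(-8.0 * (x - 0.6)))

def positive_upe(low, high):
    """Utility prediction error for the better outcome of a binary,
    equiprobable gamble: u(high) - E[u] = 0.5 * (u(high) - u(low))."""
    return u(high) - (0.5 * u(low) + 0.5 * u(high))

# Three constant-spread gambles in the low, middle and high range (ml)
upes = {g: positive_upe(*g) for g in [(0.1, 0.4), (0.45, 0.75), (0.8, 1.1)]}
for gamble, upe in upes.items():
    print(gamble, round(upe, 3))
```

The middle gamble yields the largest positive prediction error because the utility function is steepest there, mirroring the pattern in Figure 4b.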
Figure 4.
Utility prediction error coding by dopamine neurons. (a) Nonlinear impulse response to unpredicted juice reward generating positive reward prediction errors. Red: nonlinear utility function. Black: dopamine responses in same monkey. Blue: linear increases in unpredicted reward, generating linearly increasing positive reward prediction errors. The linearly increasing reward prediction errors are coded by dopamine neurons as nonlinearly increasing utility prediction errors, thus showing a neuronal utility signal. (b) Nonmonotonic utility prediction error signal with constant-risk gambles. The higher gamble outcomes constitute positive utility prediction errors (Δu) whose magnitudes depend on the local slope of the utility function (top). Dopamine responses follow the nonmonotonic changes of the positive utility prediction error (Δu). a and b from Stauffer et al. [18••].
Acknowledgements
Our work has been supported by Wellcome Trust (WS: 058365, 093270, 095495), European Research Council (WS: ERC, 293549), NIH Conte Center at Caltech (WS: P50MH094258) and NIH (RMC: DA034021; RMW: DA010900).
Footnotes
Conflict of interest statement
Nothing declared.
References and recommended reading
Papers of particular interest, published within the period of review, have been highlighted as:
• of special interest
•• of outstanding interest
- 1.Kagel JH, Battalio RC, Green L. Economic Choice Theory: An Experimental Analysis of Animal Behavior. Cambridge University Press; 1995. [Google Scholar]
- 2.von Neumann J, Morgenstern O. The Theory of Games and Economic Behavior. Princeton University Press; 1944. [Google Scholar]
- 3.Padoa-Schioppa C, Assad JA. Neurons in the orbitofrontal cortex encode economic value. Nature. 2006;441:223–226. doi: 10.1038/nature04676. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4. Lak A, Stauffer WR, Schultz W. Dopamine prediction error responses integrate subjective value from different reward dimensions. Proc Natl Acad Sci U S A. 2014;111:2343–2348. doi: 10.1073/pnas.1321596111. Our recent study that forms partly the basis for this review, showing dopamine neuron coding of subjective value derived from different liquid and food rewards, closely related to behavioral choice preferences.
- 5.Fiorillo CD, Tobler PN, Schultz W. Discrete coding of reward probability and uncertainty by dopamine neurons. Science. 2003;299:1898–1902. doi: 10.1126/science.1077349. [DOI] [PubMed] [Google Scholar]
- 6.Morris G, Arkadir D, Nevet A, Vaadia E, Bergman H. Coincident but distinct messages of midbrain dopamine and striatal tonically active neurons. Neuron. 2004;43:133–143. doi: 10.1016/j.neuron.2004.06.012. [DOI] [PubMed] [Google Scholar]
- 7.Tobler PN, Fiorillo CD, Schultz W. Adaptive coding of reward value by dopamine neurons. Science. 2005;307:1642–1645. doi: 10.1126/science.1105370. [DOI] [PubMed] [Google Scholar]
- 8.Day JJ, Roitman MF, Wightman RM, Carelli RM. Associative learning mediates dynamic shifts in dopamine signaling in the nucleus accumbens. Nat Neurosci. 2007;10:1020–1028. doi: 10.1038/nn1923. [DOI] [PubMed] [Google Scholar]
- 9. Sugam JA, Day JJ, Wightman RM, Carelli RM. Phasic nucleus accumbens dopamine encodes risk-based decision-making behavior. Biol Psychiatry. 2012;71:199–205. doi: 10.1016/j.biopsych.2011.09.029. Our recent study that forms partly the basis for this review, showing voltammetric dopamine responses of subjective value affected by risk according to individual risk attitudes. Further, consistent with bidirectional reward prediction error coding, reward delivery in risky trials evoked significant dopamine increases compared to perfectly predicted safe reward, whereas reward omission in risky trials caused significant dopamine decreases.
- 10.Saddoris MP, Sugam JA, Stuber GD, Witten IB, Deisseroth K, Carelli RM. Mesolimbic dopamine dynamically tracks, and is causally linked to, discrete aspects of value-based decision making. Biol Psychiatry. 2015;77:903–911. doi: 10.1016/j.biopsych.2014.10.024. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 11.Mirenowicz J, Schultz W. Preferential activation of midbrain dopamine neurons by appetitive rather than aversive stimuli. Nature. 1996;379:449–451. doi: 10.1038/379449a0. [DOI] [PubMed] [Google Scholar]
- 12.Matsumoto M, Hikosaka O. Two types of dopamine neuron distinctively convey positive and negative motivational signals. Nature. 2009;459:837–841. doi: 10.1038/nature08028. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13. Fiorillo CD, Song MR, Yun SR. Multiphasic temporal dynamics in responses of midbrain dopamine neurons to appetitive and aversive stimuli. J Neurosci. 2013;33:4710–4725. doi: 10.1523/JNEUROSCI.3883-12.2013. This study assessed the subjective (negative) value of punishers in behavioral choices and also distinguished between physical stimulus components and aversive motivational components of punishers. Dopamine neurons were activated by punishers but, importantly, only by their physical intensity, not their aversiveness.
- 14. Fiorillo CD. Two dimensions of value: dopamine neurons represent reward but not aversiveness. Science. 2013;341:546–549. doi: 10.1126/science.1238699. This study shows that dopamine responses incorporate the negative value of punishers into a common currency subjective value signal, without reacting to punishers alone.
- 15.Rothschild M, Stiglitz JE. Increasing risk: I. A definition. J Econ Theory. 1970;2:225–243. [Google Scholar]
- 16.McCoy AN, Platt ML. Risk-sensitive neurons in macaque posterior cingulate cortex. Nat Neurosci. 2005;8:1220–1227. doi: 10.1038/nn1523. [DOI] [PubMed] [Google Scholar]
- 17.O’Neill M, Schultz W. Coding of reward risk distinct from reward value by orbitofrontal neurons. Neuron. 2010;68:789–800. doi: 10.1016/j.neuron.2010.09.031. [DOI] [PubMed] [Google Scholar]
- 18. Stauffer WR, Lak A, Schultz W. Dopamine reward prediction error responses reflect marginal utility. Curr Biol. 2014;24:2491–2500. doi: 10.1016/j.cub.2014.08.064. Our recent study that forms partly the basis for this review, showing dopamine neuron coding of formal economic utility, closely associated with behavioral utility function.
- 19. Weber BJ, Chapman GB. Playing for peanuts: why is risk seeking more common for low-stakes gambles? Organ Behav Hum Decis Process. 2005;97:31–46.
- 20. Fiorillo CD. Transient activation of midbrain dopamine neurons by reward risk. Neuroscience. 2011;197:162–171. doi: 10.1016/j.neuroscience.2011.09.037.
- 21. Mas-Colell A, Whinston M, Green J. Microeconomic Theory. Oxford University Press; 1995.
- 22. Ainslie GW. Impulse control in pigeons. J Exp Anal Behav. 1974;21:485–489. doi: 10.1901/jeab.1974.21-485.
- 23. Rodriguez ML, Logue AW. Adjusting delay to reinforcement: comparing choice in pigeons and humans. J Exp Psychol Anim Behav Proc. 1988;14:105–117.
- 24. Richards JB, Mitchell SH, de Wit H, Seiden LS. Determination of discount functions in rats with an adjusting-amount procedure. J Exp Anal Behav. 1997;67:353–366. doi: 10.1901/jeab.1997.67-353.
- 25. Kobayashi S, Schultz W. Influence of reward delays on responses of dopamine neurons. J Neurosci. 2008;28:7837–7846. doi: 10.1523/JNEUROSCI.1600-08.2008.
- 26. Day JJ, Jones JL, Wightman RM, Carelli RM. Phasic nucleus accumbens dopamine release encodes effort- and delay-related costs. Biol Psychiatry. 2010;68:306–309. doi: 10.1016/j.biopsych.2010.03.026.
- 27. Fiorillo CD, Newsome WT, Schultz W. The temporal precision of reward prediction in dopamine neurons. Nat Neurosci. 2008;11:966–973. doi: 10.1038/nn.2159.
- 28. Gan JO, Walton ME, Phillips PEM. Dissociable cost and benefit encoding of future rewards by mesolimbic dopamine. Nat Neurosci. 2010;13:25–27. doi: 10.1038/nn.2460.
- 29. Hollon NG, Arnold MM, Gan JO, Walton ME, Phillips PE. Dopamine-associated cached values are not sufficient as the basis for action selection. Proc Natl Acad Sci U S A. 2014;111:18357–18362. doi: 10.1073/pnas.1419770111. Similar to a previous report [28], the study shows slow and very small voltammetric dopamine signals in nucleus accumbens which are hardly influenced by effort costs that vary together with temporal discounting. The reported dopamine signals are not sensitive to all aspects of subjective value but, because of their insensitivity to temporal discounting, are also unlikely to reflect dopamine impulse activity.
- 30. Corbett D, Wise RA. Intracranial self-stimulation in relation to the ascending dopaminergic systems of the midbrain: a moveable microelectrode study. Brain Res. 1980;185:1–15. doi: 10.1016/0006-8993(80)90666-6.
- 31. Tsai H-C, Zhang F, Adamantidis A, Stuber GD, Bonci A, de Lecea L, Deisseroth K. Phasic firing in dopaminergic neurons is sufficient for behavioral conditioning. Science. 2009;324:1080–1084. doi: 10.1126/science.1168878.
- 32. Steinberg EE, Keiflin R, Boivin JR, Witten IB, Deisseroth K, Janak PH. A causal link between prediction errors, dopamine neurons and learning. Nat Neurosci. 2013;16:966–973. doi: 10.1038/nn.3413. In the strictest learning test used so far on dopamine neurons, inspired by formal animal learning theory (blocking), optogenetic stimulation of rat dopamine neurons renders a blocked stimulus effective for behavioral learning.
- 33. Savage LJ. The Foundations of Statistics. Wiley; 1954.
- 34. Debreu G. Cardinal utility for even-chance mixtures of pairs of sure prospects. Rev Econ Stud. 1959;26:174–177.
- 35. Pasquereau B, Turner RS. Limited encoding of effort by dopamine neurons in a cost–benefit trade-off task. J Neurosci. 2013;33:8288–8300. doi: 10.1523/JNEUROSCI.4619-12.2013.
- 36. Varazzani C, San-Galli A, Gilardeau S, Bouret S. Noradrenaline and dopamine neurons in the reward/effort trade-off: a direct electrophysiological comparison in behaving monkeys. J Neurosci. 2015;35:7866–7877. doi: 10.1523/JNEUROSCI.0454-15.2015.
- 37. Wanat MJ, Kuhnen CM, Phillips PEM. Delays conferred by escalating costs modulate dopamine release to rewards but not their predictors. J Neurosci. 2010;30:12020–12027. doi: 10.1523/JNEUROSCI.2691-10.2010.
- 38. Clark JJ, Sandberg SG, Wanat MJ, Gan JO, Horne EA, Hart AS, Akers CA, Parker JG, Willuhn I, Martinez V, Evans SB, Stella N, Phillips PEM. Chronic microsensors for longitudinal, subsecond dopamine detection in behaving animals. Nat Methods. 2010;7:126–129. doi: 10.1038/nmeth.1412.
- 39. Garris PA, Christensen JR, Rebec GV, Wightman RM. Real-time measurement of electrically evoked extracellular dopamine in the striatum of freely moving rats. J Neurochem. 1997;68:152–161. doi: 10.1046/j.1471-4159.1997.68010152.x.
- 40. Wightman RM, Heien ML, Wassum KM, Sombers LA, Aragona BJ, Khan AS, Ariansen JL, Cheer JF, Phillips PE, Carelli RM. Dopamine release is heterogeneous within microenvironments of the rat nucleus accumbens. Eur J Neurosci. 2007;26:2046–2054. doi: 10.1111/j.1460-9568.2007.05772.x.
- 41. Willuhn I, Burgeno LM, Everitt BJ, Phillips PE. Hierarchical recruitment of phasic dopamine signaling in the striatum during the progression of cocaine use. Proc Natl Acad Sci U S A. 2012;109:20703–20708. doi: 10.1073/pnas.1213460109.
- 42. Swiergiel AH, Palamarchouk VS, Dunn AJ. A new design of carbon fiber microelectrode for in vivo voltammetry using fused silica. J Neurosci Methods. 1997;73:29–33. doi: 10.1016/s0165-0270(96)02207-8.
- 43. Gerhardt GA, Ksir C, Pivik C, Dickinson SD, Sabeti J, Zahniser NR. Methodology for coupling local application of dopamine and other chemicals with rapid in vivo electrochemical recordings in freely-moving rats. J Neurosci Methods. 1999;87:67–76. doi: 10.1016/s0165-0270(98)00158-7.
- 44. Polikov VS, Tresco PA, Reichert WM. Response of brain tissue to chronically implanted neural electrodes. J Neurosci Methods. 2005;148:1–18. doi: 10.1016/j.jneumeth.2005.08.015.
- 45. Kozai TD, Jaquins-Gerstl A, Vazquez A, Michael AC, Cui X. Brain tissue responses to neural implants impact signal sensitivity and intervention strategies. ACS Chem Neurosci. 2015;6:48–67. doi: 10.1021/cn500256e.
- 46. Winslow BD, Tresco PA. Quantitative analysis of the tissue response to chronically implanted microwire electrodes in rat cortex. Biomaterials. 2010;31:1558–1567. doi: 10.1016/j.biomaterials.2009.11.049.
- 47. Cengiz E, Sherr JL, Weinzimer SA, Tamborlane WV. New-generation diabetes management: glucose sensor-augmented insulin pump therapy. Expert Rev Med Devices. 2011;8:449–458. doi: 10.1586/erd.11.22.
- 48. Giros B, Jaber M, Jones SR, Wightman RM, Caron MG. Hyperlocomotion and indifference to cocaine and amphetamine in mice lacking the dopamine transporter. Nature. 1996;379:606–612. doi: 10.1038/379606a0.
- 49. Montague PR, McClure SM, Baldwin PR, Phillips PE, Budygin EA, Stuber GD, Kilpatrick MR, Wightman RM. Dynamic gain control of dopamine delivery in freely moving animals. J Neurosci. 2004;24:1754–1759. doi: 10.1523/JNEUROSCI.4279-03.2004.
- 50. Robinson DL, Hermans A, Seipel AT, Wightman RM. Monitoring rapid chemical communication in the brain. Chem Rev. 2008;108:2554–2584. doi: 10.1021/cr068081q.
- 51. Jones SR, Joseph JD, Barak LS, Caron MG, Wightman RM. Dopamine neuronal transport kinetics and effects of amphetamine. J Neurochem. 1999;73:2406–2414. doi: 10.1046/j.1471-4159.1999.0732406.x.
- 52. Ariansen JL, Heien ML, Hermans A, Phillips PE, Hernadi I, Bermudez MA, Schultz W, Wightman RM. Monitoring extracellular pH, oxygen, and dopamine during reward delivery in the striatum of primates. Front Behav Neurosci. 2012;6:1–10. doi: 10.3389/fnbeh.2012.00036.
- 53. Rodeberg NT, Johnson JA, Cameron CM, Saddoris MP, Carelli RM, Wightman RM. Construction of training sets for valid calibration of in vivo cyclic voltammetric data by principal component analysis. Anal Chem. 2015;87:11484–11491. doi: 10.1021/acs.analchem.5b03222.
- 54. Heien ML, Khan AS, Ariansen JL, Cheer JF, Phillips PE, Wassum KM, Wightman RM. Real-time measurement of dopamine fluctuations after cocaine in the brain of behaving rats. Proc Natl Acad Sci U S A. 2005;102:10023–10028. doi: 10.1073/pnas.0504657102.
- 55. Caraco T, Martindale S, Whitham TS. An empirical demonstration of risk-sensitive foraging preferences. Anim Behav. 1980;28:820–830.
- 56. Machina MJ. Choice under uncertainty: problems solved and unsolved. J Econ Perspect. 1987;1:121–154.




