Author manuscript; available in PMC: 2020 Jun 17.
Published in final edited form as: Curr Biol. 2019 May 30;29(12):2066–2074.e5. doi: 10.1016/j.cub.2019.05.013

An Analysis of Decision Under Risk in Rats

Christine M Constantinople 1,5,*, Alex T Piet 1,4, Carlos D Brody 1,2,3
PMCID: PMC6863753  NIHMSID: NIHMS1531308  PMID: 31155352

Summary:

In 1979, Daniel Kahneman and Amos Tversky published a ground-breaking paper titled "Prospect Theory: An Analysis of Decision Under Risk," which presented a behavioral economic theory that accounted for the ways in which humans deviate from economists' normative workhorse model, Expected Utility Theory [1,2]. For example, people exhibit probability distortion (they overweight low probabilities), loss aversion (losses loom larger than gains), and reference dependence (outcomes are evaluated as gains or losses relative to an internal reference point). We found that rats exhibited many of these same biases, using a task in which rats chose between guaranteed and probabilistic rewards. However, Prospect Theory assumes stable preferences in the absence of learning, an assumption at odds with alternative frameworks such as animal learning theory and reinforcement learning [3–7]. Rats also exhibited trial history effects, consistent with ongoing learning. A reinforcement learning model in which state-action values were updated by the subjective value of outcomes according to Prospect Theory reproduced rats' nonlinear utility and probability weighting functions, and also captured trial-by-trial learning dynamics.

Blurb

Constantinople et al. apply Prospect Theory, the predominant economic theory of decision-making under risk, to rats. Rats exhibit signatures of both Prospect Theory and reinforcement learning. The authors present a model that integrates these frameworks, accounting for rats’ nonlinear econometric functions and also trial-by-trial learning.

Results:

Two key components of Prospect Theory are utility (rewards are evaluated by the subjective satisfaction or “utility” they provide) and probability distortion (people often overweight low and underweight high probabilities; Figure 1D). In this theory, subjective value is determined by the shapes of subjects’ utility and probability weighting functions.

Figure 1. Rats choose between guaranteed and probabilistic rewards.

Figure 1.

(A) Behavioral task and timing of task events: flashes cue reward probability (p) and click rates convey water volume (x) on each side. Safe and risky sides are not fixed.

(B) Relationship between cues and reward probability/volume in one task version. Alternative versions produced similar results (Figure S2). There were four possible volumes (6, 12, 24, or 48μL), and the risky side offered reward probabilities between 0 and 1 in increments of 0.1.

(C) One rat’s performance for each of the safe side volumes. Axes are probability and volume of risky options.

(D) A behavioral model inferred the utility and probability weighting functions that best explained rats' choices. We modeled the probability that the rat chose the right side by a logistic function whose argument was the difference between the subjective value of each option (V_R − V_L) plus a trial history-dependent term. Subjective utility was parameterized as:
u(x) = \begin{cases} (x - r)^{\alpha} & \text{if } x \geq r \\ -\kappa (r - x)^{\alpha} & \text{if } x < r, \end{cases} \qquad (1)
where α is a free parameter, and x is reward volume. r is the reference point, which determines whether rewards are perceived as gains or losses. We first consider the case where r = 0, so
u(x) = x^{\alpha}. \qquad (2)
The subjective probability of each option is computed by:
w(p) = e^{-\beta(-\ln p)^{\delta}}, \qquad (3)
where β and δ are free parameters and p is the objective probability offered. Combining utility and probability yields the subjective value for each option:
V_R = u(x_R)\, w(p_R) \qquad (4)
V_L = u(x_L)\, w(p_L). \qquad (5)
These were normalized by the max, and transformed into choice probabilities via a logistic function:
P(\text{Choose R}) = \iota + \frac{1 - 2\iota}{1 + e^{-\lambda (V_R - V_L) + \text{bias}}}, \qquad (6)
where ι captures stimulus-independent variability (lapse rate) and λ determines the sensitivity of choices to the difference in subjective value (V_R − V_L). The bias term comprised three possible parameters, depending on trial history:
\text{bias} = \begin{cases} \pm h_1 & \text{if trial } t-1 \text{ was a safe L/R choice} \\ \pm h_2 & \text{if trial } t-1 \text{ was a rewarded risky L/R choice} \\ \pm h_3 & \text{if trial } t-1 \text{ was an unrewarded (miss) risky L/R choice.} \end{cases} \qquad (7)
(E) Model prediction for held-out data from one rat, averaged over 5 test sets.

See also Figures S1, S2.
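To make the model in the Figure 1 legend concrete, the sketch below implements Equations 2–6 in Python. The actual fits used Matlab's fmincon; the parameter values, and the normalization by the utility of the largest volume, are illustrative assumptions rather than fitted values.

```python
import numpy as np

def utility(x, alpha):
    """Subjective utility u(x) = x^alpha (Equation 2, reference point r = 0)."""
    return x ** alpha

def prob_weight(p, beta, delta):
    """Two-parameter Prelec weighting w(p) = exp(-beta * (-ln p)^delta) (Equation 3)."""
    p = np.clip(p, 1e-12, 1.0)              # avoid log(0) when p = 0
    return np.exp(-beta * (-np.log(p)) ** delta)

def p_choose_right(xR, pR, xL, pL, alpha, beta, delta, lapse, lam, bias):
    """Choice probability (Equations 4-6): logistic in the subjective-value difference."""
    VR = utility(xR, alpha) * prob_weight(pR, beta, delta)
    VL = utility(xL, alpha) * prob_weight(pL, beta, delta)
    # Normalize by the subjective value of the largest certain reward (an assumption here).
    VR, VL = VR / utility(48.0, alpha), VL / utility(48.0, alpha)
    return lapse + (1 - 2 * lapse) / (1 + np.exp(-lam * (VR - VL) + bias))

# Example: 48 uL at p = 0.5 on the right vs. a guaranteed 24 uL on the left,
# with illustrative (not fitted) parameter values.
print(p_choose_right(48, 0.5, 24, 1.0, alpha=0.54, beta=1.0, delta=0.65,
                     lapse=0.05, lam=5.0, bias=0.0))
```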

Learning theories provide an alternative account of subjective value. In animal learning theory, Thorndike's "Law of Effect" described the effect of reinforcers on action selection [8], and Pavlov's subsequent experiments demonstrated how animals learn to associate stimuli with rewards [9]. The Rescorla-Wagner model of classical conditioning formalized how such learning might occur [3–5]. Although these models described how animals might learn associations between stimuli, they were naturally extended to account for learning values from experience [7,10]. Models of trial-and-error learning from animal learning theory form the basis for reinforcement learning algorithms, including temporal difference learning, which captures temporal relationships between predictors and outcomes [7,11]. Reinforcement learning provides a powerful framework for value-based decision-making in psychology and neuroscience, in which value estimates are learned from experience and updated trial-to-trial based on prediction errors [7,12,13].
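For reference, here is a minimal delta-rule (Rescorla-Wagner-style) update of the kind this lineage formalizes. It is a generic illustration, not the model fit in this paper, and all values are made up.

```python
import numpy as np

def delta_rule(rewards, learning_rate=0.1, v0=0.0):
    """Delta-rule value learning: V <- V + learning_rate * (reward - V).
    `rewards` is a sequence of outcomes for one cue; returns the value estimate over trials."""
    V, trace = v0, []
    for r in rewards:
        V = V + learning_rate * (r - V)   # prediction error drives the update
        trace.append(V)
    return np.array(trace)

# A cue rewarded on 50% of trials converges toward an expected value of ~0.5.
rng = np.random.default_rng(0)
print(delta_rule(rng.random(1000) < 0.5, learning_rate=0.1)[-5:])
```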

Reinforcement learning has had a profound impact in part because many of its components have been related to neural substrates [12,14–17]. However, standard reinforcement learning algorithms dictate that agents learn the expected value (volume × probability) of actions or outcomes with experience [6], meaning that they will exhibit linear utility and probability weighting functions. This is incompatible with Prospect Theory. We found that rats exhibited signatures of both Prospect Theory and reinforcement learning, and we present an initial attempt to integrate these frameworks. First, we focus on Prospect Theory.

Most economic studies examine decisions between clearly described lotteries (i.e., "decisions from description"). Studies of risky choice in rodents, however, typically examine decisions between prospects that are learned over time (i.e., "decisions from experience"), which are difficult to reconcile with Prospect Theory [18–20]. We designed a task in which reward probability and amount are communicated by sensory evidence, eliciting decisions from description rather than experience. This enabled behavioral economic approaches, such as estimating utility functions.

Rats initiated a trial by nose-poking in the center port of a three-port wall. Light flashes were presented from left and right side ports, and the number of flashes conveyed the probability of water reward at each port. Simultaneously, auditory clicks were presented from left/right speakers, and click rate conveyed the volume of water reward baited at each port (Figure 1A,B). One port offered a guaranteed/safe reward, and the other offered a risky reward with an explicitly cued probability. The safe and risky ports (left/right) varied randomly. One of four water volumes could be the guaranteed or risky reward (6, 12, 24, 48μL); risky reward probabilities ranged from 0 to 1, in increments of 0.1 (Figure 1A,B).

High-throughput training generated 36 trained rats and many tens of thousands of choices per rat, enabling detailed behavioral quantification. Rats demonstrated that they learned the meaning of the cues by frequently "opting out" of trials offering smaller rewards, leaving the center poke despite incurring a time-out penalty and white-noise sound (Figure S1A–C). This indicated that they associated the click rates with water volumes, instead of relying on a purely perceptual strategy. It is possible that opting out, which persisted despite longer time-out penalties for low-volume trials (Methods), reflected reward-rate maximizing strategies [21–23]. Rats favored prospects with higher expected value (Figure 1C, S1).

We used a standard choice model [24,25] to estimate each rat's utility and probability weighting functions according to Prospect Theory (Figure 1D; see legend; Figure S1). The model predicted rats' choices on held-out data (Figure 1E). It outperformed alternative models, including one imposing linear probability weighting (according to Expected Utility Theory [26]), one that fit linear weights for probabilities and volumes, and several models implementing purely perceptual strategies with sensory noise (Figure S2A–F).

Concave utility (the utility function exponent α<1) produces diminishing marginal sensitivity, in which subjects are less sensitive to differences between larger rewards. Rats' median α was 0.54, indicating concave utility, like humans [2] (Figure 2A; Figure S1F). To test for diminishing marginal sensitivity, we compared performance on trials offering guaranteed outcomes of 0 or 24μL, and 24 or 48μL (Figure 2B,C). Concave utility implies that 24 and 48μL are less discriminable than 0 and 24μL (Figure 2C). Indeed, the concavity of the utility function was correlated with reduced discriminability on trials offering 24 and 48μL (Figure 2D,E; p = 1.03e-7, Pearson's correlation). This was true whether the guaranteed outcome of 0 comprised trials offering 24μL with p=0 (Figure 2D,E) or all volumes offered with p=0 (Figure S2G,H). Our choice set did not permit an analogous comparison restricted to non-zero rewards, because these pairs (24 vs. 0 and 48 vs. 24) were the only ones with equal differences in reward. This suggests that rats, like humans, exhibit diminishing marginal sensitivity.
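For intuition, a quick calculation using the reported median exponent (α = 0.54) shows why the same 24μL increment is worth less at larger volumes:

```python
alpha = 0.54                      # median utility exponent reported above
u = lambda x: x ** alpha
print(u(24) - u(0))               # ~5.6 utils gained going from 0 to 24 uL
print(u(48) - u(24))              # ~2.5 utils gained going from 24 to 48 uL
```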

Figure 2. Non-parametric analyses confirm nonlinear utility and probability weighting and reveal diverse risk attitudes.

Figure 2.

(A) Model fits of subjective utility functions for each rat, normalized by the maximum volume (48μL).

(B) Schematic linear utility function: the perceptual distance (or discriminability, d’) between 0μL and 24μL is the same as 24μL and 48μL.

(C) Schematic concave utility function: 24μL and 48μL are less discriminable than 0μL and 24μL.

(D) One rat’s performance on trials with guaranteed outcomes of 0μL vs. 24μL (green), or 24μL vs. 48μL (purple). Performance ratio on these trials (“d’ ratio”) less than 1 indicates diminishing sensitivity.

(E) The concavity of the utility function (α) is significantly correlated with reduced discriminability of larger rewards. Pink circle is rat from D.

(F) Model fits of probability weighting functions.

(G) Weights from logistic regression parameterizing each probability match probability weighting function for one rat. Error bars are s.e.m. for each regression coefficient.

(H) Mean squared error between regression weights and parametric fits for each rat (mean mse=0.006, in units of probability).

(I,J) To obtain “certainty equivalents,” we measured psychometric functions for each probability of receiving 48μL, and estimated the certain volume at which performance = 50%.

(K) Measured (blue) and model-predicted (red) certainty equivalents from one rat indicates systematic undervaluing of the gamble, or risk aversion. Error bars for model-prediction are 95% confidence intervals of parameters from 5-fold cross validation. Data are mean +/− s.e.m. for left-out test sets.

(L) Distribution of CE areas computed using analytic expression from model fits. Measured CEs were similar (Figure S3C). See also Figure S2, S3.

Rats' probability weighting functions revealed overweighting of probabilities (Figure 2F). A logistic regression model that parameterized each probability to predict choice yielded regression weights mirroring the probability weighting functions (Figure 2G,H). Control experiments indicated that nonlinear utility and probability weighting were not due to perceptual errors in estimating flashes and clicks [27] (Figure S2I–K).

To evaluate rats' risk attitudes, we measured the certainty equivalents for all gambles of 48μL [2,28,29]. The certainty equivalent is the guaranteed reward the rat deems equal to the gamble (Figure 2I,J). If it is less than the gamble's expected value, that indicates risk aversion: the subject effectively "undervalues" the gamble, and will accept a smaller reward to avoid risk (Figure 2K). Conversely, if the certainty equivalent is greater than the gamble's expected value, the subject is risk seeking; if the two are equal, the subject is risk neutral. Measured certainty equivalents closely matched those predicted from the model, using an analytic expression incorporating the utility and probability weighting functions (CE = x·w(p)^{1/α}, where x is the gamble's volume; Methods; Pearson's correlation 0.96, p=1.58e-11; Figure 2K). This non-parametric assay further validated the model fits and revealed heterogeneous risk preferences across rats (Figure 2L; Figure S3A–C).
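A minimal sketch of the analytic certainty-equivalent computation implied by the model (CE = x·w(p)^{1/α}; see Methods); the weighting-function parameters below are illustrative, not fitted values.

```python
import numpy as np

def certainty_equivalent(x, p, alpha, beta, delta):
    """Analytic CE for a gamble of volume x at probability p:
    solve u(CE) = w(p) * u(x)  =>  CE = x * w(p)**(1/alpha)."""
    w = np.exp(-beta * (-np.log(p)) ** delta)     # two-parameter Prelec weighting
    return x * w ** (1.0 / alpha)

# CEs for gambles of 48 uL; a CE below the expected value (p * 48) indicates risk aversion.
for p in [0.3, 0.5, 0.7]:
    ce = certainty_equivalent(48, p, alpha=0.54, beta=1.0, delta=0.65)
    print(p, round(ce, 1), "EV =", 48 * p)
```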

Although rats exhibited nonlinear utility and probability weighting, consistent with Prospect Theory, they also exhibited trial-by-trial learning, consistent with reinforcement learning. Fitting the model to trials following rewarded and unrewarded choices revealed systematic shifts in utility and probability weighting functions: utility functions became less concave and probability weighting functions became more elevated to reflect increased likelihood of risky choice following rewards (Figure 3A). This was consistent across rats, as observed in certainty equivalents from the data (p=6.49e-5, paired t-test) and model (Figure 3B; p=2.46e-7).

Figure 3. Rats exhibit evidence of trial-by-trial learning.

Figure 3.

(A) Probability weighting function (left) and utility function (right) for one rat from model fit to trials following reward (turquoise) or no reward (black).

(B) CE areas predicted from model fits for all rats following rewarded and unrewarded trials.

(C) ΔProbability of repeating left/right choices (relative to mean probability of repeating), following each reward. Points above the dashed line indicate an increased probability of repeating (“stay”); those below indicate a decreased probability (“switch”). Black curve is average +/− s.e.m. across rats.

(D) A separate cohort of 3 rats was trained with doubled water volumes. They exhibited lose-switch biases following 12 and 24μL.

(E) Win-stay/lose-switch biases for one rat separated by reward history two trials back.

(F) Schematic illustrating that with concave utility, rewards should be more (less) discriminable when the reference point is high (low).

(G) Psychometric performance from one rat when the inferred reference point was low (black) or high (blue). Red curve is ideal performance.

(H) Value function with the median parameters across rats indicates loss aversion (median α=0.6, κ=1.7).

See also Figure S3.

Another feature of human behavior is “reference dependence:” people evaluate rewards as gains or losses relative to an internal reference point. It is unclear what determines the reference point [30]; proposals include status quo wealth [1], reward expectation [31,32], heuristics based on the prospects [33,34], or recent experience [35,36].

Rats demonstrated reference dependence by treating smaller rewards as losses. They exhibited win-stay/lose-switch biases: following unrewarded trials, rats were more likely to switch ports (Figure 3C). Surprisingly, most rats exhibited “switch” biases after receiving 6 or 12μL, consistent with treating these outcomes as losses. The “win/lose” threshold (i.e., reference point) was experience-dependent: a separate cohort of rats (n=3) trained with doubled rewards (12–96μL) exhibited lose-switch biases after receiving 12 or 24μL (Figure 3D).

The win/lose threshold was often reward-history dependent (Figure 3E). Therefore, we parameterized a reference point, r, as taking one of two values depending on whether the previous trial was rewarded (see Methods). Rewards less than r were negative (losses). The relative amplitude of losses versus gains was controlled by the parameter κ (Equation 1; Figure 1D). Subjective value was reparameterized to include the zero outcome of the gamble, which is a loss when r > 0:

V_R = u(x_R)\, w(p_R) + u(0)\, w(1 - p_R) \qquad (8)
V_L = u(x_L)\, w(p_L) + u(0)\, w(1 - p_L) \qquad (9)

Model comparison (Akaike Information Criterion, AIC) favored the reference point model for all rats (Figure S3D,E). We also parameterized the reference point as reflecting several trials back with an exponential decay, where the time constant was a free parameter (see Methods). For most rats (20/36, 77%), this did not significantly improve model performance compared to the reference point reflecting one trial back, although it was a better fit for a minority of rats with longer integration time constants over trials (Figure S3F–H). For the sake of simplicity and because it generally provided a better fit, we focused on the "one-trial back" reference point model. Interestingly, the reference point from the model was not significantly correlated with the average reward rate for each rat [37]; this was true regardless of whether opt-out trials were included in estimates of average reward per trial (Pearson's correlation, p>0.05).

With concave utility, rats should exhibit sharper psychometric performance when the reference point is high (and rewards are more discriminable; Figure 3F). Indeed, performance was closer to ideal when the reference point was high (Figure 3G; mean mse between psychometric and ideal performance was 0.143 low ref vs. 0.122 high ref, p = 3.4e-5, paired t-test across rats).

A loss parameter κ>1 indicates “loss aversion,” or a greater sensitivity to losses than gains (Equation 1). We observed a median κ of 1.66 (Figure 3H). There was variability across rats: 16/36 rats (44%) were not loss averse but were more sensitive to gains (κ<1). Still, the median κ across rats suggests similarity to humans (Figure 3H).

Prospect Theory does not account for how agents learn subjective values from experience, and we explicitly incorporated trial history parameters to account for trial-by-trial learning (Figure 4A). To examine learning dynamics, we fit the model to the first and second half of all trials, once rats achieved criterion performance (Figure S4A). There was no significant change in the parameters for the utility or probability weighting functions (Figure S4B). Rats showed a significant increase in the softmax parameter with training, indicating increased sensitivity to value differences, and a decrease in one of the trial history parameters, h1, indicating reduced win-stay biases (Figure S4).

Figure 4. Integrating Prospect Theory and reinforcement learning captures nonlinear subjective functions and learning.

Figure 4.

(A) Prospect Theory model predictions for each rat without the trial history parameters (h1–h3, see Figure 1 legend) do not account for win-stay/lose-switch trial history effects. Inclusion of these parameters accounts for these effects.

(B) Prospect Theory model fit to simulated choices from a basic reinforcement learning agent yields linear utility and probability weighting functions over a range of generative learning rates (0.2, 0.4, 0.6, 0.8, 1.0, overlaid).

(C) Schematic of model incorporating Prospect Theory and reinforcement learning.

(D) The hybrid model described in panel C accounts for win-stay/lose-switch effects.

(E) The model recovers nonlinear utility and probability weighting functions.

(F) Model comparison when the error term used in the model was the subjective value (as shown in panel C), or the expected value (probability × reward). Red arrow is mean ΔAIC.

(G) Binned values of rats’ lose-switch biases (measured from the data) plotted against the best-fit learning rate, αlearn. Pearson’s correlation coefficient is −0.37 across rats.

See also Figure S4.

Reinforcement learning describes an adaptive process in which animals learn the value of states and actions. This framework, however, implies linear utility and probability weighting. We simulated choices of a Q-learning agent, which learned the state-action values of each unique trial type. Fitting the Prospect Theory model to these simulated choices recovered linear utility and probability weighting functions (regardless of learning rate; Figure 4B). This is expected: because trial sequences were randomized (i.e., each trial was independent), basic reinforcement learning will learn the expected value of each option [6] and will therefore converge to linear utility and probability weighting functions. We therefore implemented a reinforcement learning model that could accommodate nonlinear subjective utility and probability weighting, but also learning over trials (Figure 4C; [10,38]). The model assumed that the rats learned the value of left and right choices for each unique combination of probability (p) and reward (μL) according to the following equation:

V_{p,\mu L}(t+1) = V_{p,\mu L}(t) + \alpha_{\text{learn}} \left( w(p)\, u(x) - V_{p,\mu L}(t) \right), \qquad (10)

where α_learn is the learning rate parameter, and w(p) and u(x) are parameterized as in Equations 2 and 3. The learned values of the right and left prospects on each trial were transformed into choice probabilities via a logistic function (see Equation 6). We also implemented a global update to the value matrices for left and right choices depending on reward history (Figure 4C,D). In this model, utility and probability weighting functions were exclusively used for learning/updating values, whereas choice depended on the learned values on each trial. Although the parameters of the utility and probability weighting functions were free parameters in this model, we recovered parameter values identical to the Prospect Theory model (Figure 4E, Figure S4C). Importantly, a reinforcement learning model in which the expected value (EV) was the error signal driving learning underperformed compared to the model incorporating subjective value according to Prospect Theory (Figure 4F; p = 1.38e-8, paired t-test of AIC). Finally, each rat's learning rate (α_learn) was negatively correlated with the magnitude of their lose-switch biases, suggesting an inverse relationship between learning dynamics governing gradual improvements in task performance, and trial-by-trial learning, which is deleterious to performance when trials are independent [27,39,40]. Rats with slower dynamics (lower learning rate) showed more prominent trial history effects, whereas rats with rapid learning showed reduced trial history biases (Figure 4G).
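The following is a minimal Python simulation sketch of this hybrid model (Equation 10 plus a global post-outcome update). The parameter values, the normalization of utility by the largest volume, and the sign convention of the global update are assumptions for illustration; the paper's fits were performed in Matlab.

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, beta, delta = 0.54, 1.0, 0.65      # subjective-value parameters (illustrative, not fitted)
alpha_learn, lam = 0.3, 5.0               # learning rate and softmax sensitivity
beta_win, beta_loss = 0.02, 0.02          # global post-outcome updates (sign convention is an assumption)

u = lambda x: (x / 48.0) ** alpha                                        # utility, normalized by the max volume
w = lambda p: np.exp(-beta * (-np.log(np.clip(p, 1e-12, 1))) ** delta)   # two-parameter Prelec weighting

probs = np.round(np.arange(0, 1.01, 0.1), 1)
vols = np.array([6, 12, 24, 48])
# One learned value per unique (probability, volume) offer, separately for left and right choices.
V = {side: np.zeros((len(probs), len(vols))) for side in ("L", "R")}

for t in range(5000):
    # Random offers: one side is safe (p = 1), the other risky with a cued probability.
    risky = "L" if rng.random() < 0.5 else "R"
    ip = {s: rng.integers(len(probs)) if s == risky else len(probs) - 1 for s in ("L", "R")}
    iv = {s: rng.integers(len(vols)) for s in ("L", "R")}

    v_diff = V["R"][ip["R"], iv["R"]] - V["L"][ip["L"], iv["L"]]
    choice = "R" if rng.random() < 1 / (1 + np.exp(-lam * v_diff)) else "L"
    rewarded = rng.random() < probs[ip[choice]]

    # Equation 10: the prediction error is in units of subjective value, w(p)u(x), not expected value.
    target = w(probs[ip[choice]]) * u(vols[iv[choice]])
    V[choice][ip[choice], iv[choice]] += alpha_learn * (target - V[choice][ip[choice], iv[choice]])

    # Global update of the chosen side's value matrix after the outcome (our reading of Equation 33).
    V[choice] += (beta_win if rewarded else -beta_loss) * u(vols[iv[choice]])
```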

Discussion:

There is a strong foundation for using animal models to study the cost-benefit calculations underlying economic theory [41]. In foraging studies, animals either exploit a current option for reward or explore a new one, and often maximize their rate of reward [42–45]. Rodents exhibit complex aspects of economic decision-making, including regret [46,47], and sensitivity to sunk costs [48,49]. Here, we applied Prospect Theory to rats. Like humans, rats exhibit nonlinear concave utility for gains, probability distortion, reference dependence, and, frequently, loss aversion. Nearly all rats exhibited concave utility, which produced diminishing marginal sensitivity. In contrast, most studies in monkeys have reported convex utility [24,50–53] (but see [25]). In Expected Utility Theory, concave utility indicates risk aversion [26]. However, in Prospect Theory, concave utility can coincide with risk-seeking behavior due to the elevation of the probability weighting function [28,54].

Our rats differ from primates in that they do not appear to underweight moderate and high probabilities [1,24,28]. The “inverted-S” shape of the probability weighting function may reflect diminishing sensitivity relative to two reference points which, in the probability domain, correspond to 0 and 1 [2,28]. The rats, either due to the task or species differences, may not treat certainty as a reference point.

Reward history modified rats’ risk preferences, producing shifts in utility and probability weighting functions. The extent to which risk preferences are stable traits is an area of active research [55,56]. Recent work suggests a general/stable component of risk attitudes, but variability across domains (e.g., finance, recreation [56]). In foraging tasks, risk preferences reflect food availability and/or energy budget in a variety of species [44,57]. Here, we document dynamic risk preferences; these trial-by-trial dynamics are not likely driven by physiological factors (e.g. energy budget) but may reflect dynamic internal/cognitive states mediated by reinforcement learning.

We found evidence for reference dependence, in which rats' treatment of outcomes as gains/losses reflected their reward history. Studies in several species, including capuchin monkeys [58], have also suggested reference dependence. Starlings and locusts prefer options that previously were encountered under greater hunger, presumably because those rewards were perceived to have greater reference-dependent value [59–61]. Rats modulate their approach speed for rewards depending on previously experienced reward amounts [62,63]. Regret, which reflects a post-decision valuation of a choice relative to an unchosen alternative, may be a reference-dependent computation; for regret, the reference point would be the counterfactual prospect [64].

While variable, the median loss parameter (κ) across rats indicated loss aversion, which has been documented in capuchin monkeys [65]. We note that we did not examine losses by taking reinforcers away from the animal. However, several decision theories [32,66] posit that rewards less than the reference point are losses. Loss aversion in humans is remarkably variable [67] and possibly domain specific [68,69]. The nature of loss aversion is intriguing: is it a constant, a psychological trait similar to risk preferences [56], or an emergent property of constructing preferences [70]?

Prospect Theory, animal learning theory, and reinforcement learning are complementary frameworks for studying decision-making (but see [71]). Reinforcement learning and animal learning theory are principally concerned with how subjects learn values from experience and use those learned values to make decisions. Prospect Theory, in contrast, does not address learning but describes the nonlinear distortions of utility and probability that shape decisions. We propose a simple approach for integrating these frameworks, in which animals learn the values of actions associated with task states, but the reward prediction error driving learning is in units of subjective value according to Prospect Theory [10,38]. This hypothesis is consistent with studies of dopamine neurons, which are thought to instantiate reward prediction errors in the brain in a temporal-difference learning algorithm [6,7,13,14]. Conditioned stimuli predicting rewards with different probabilities or magnitudes have been shown to elicit phasic dopamine responses reflecting the value of the expected reward [72–74]. In delay discounting tasks, the phasic dopamine response reflects the discounted value of delayed rewards in monkeys and rats [75,76]. Finally, recent work has shown that dopamine reward prediction errors reflect the shape of monkeys' measured utility functions [77]. The hypothesis of a reward prediction error in units of subjective value (perhaps according to Prospect Theory in the case of explicitly described lotteries) is also conceptually related to studies of homeostatic reinforcement learning, in which internal state influences subjective valuation [78]. This hypothesis bridges animal learning theory, reinforcement learning, and economic concepts of subjective value. A key topic for future research is how subjective estimates of value arise: are they innate, learned early in life, or constantly evolving over the lifespan?

STAR Methods

Contact for Reagent and Resource Sharing

Further information and requests for resources should be directed to and will be fulfilled by the Lead Contact, Christine Constantinople (constantinople@nyu.edu). Transgenic (Pvalb-iCre)2Ottc rats (n=5) were obtained via an MTA from the University of Missouri RRRC.

Experimental Model and Subject Details

Subjects

A total of 39 male rats between the ages of 6 and 24 months were used for this study, including 35 Long-Evans and 4 Sprague-Dawley rats (Rattus norvegicus). The Long-Evans cohort also included LE-Tg (Pvalb-iCre)2Ottc rats (n=5) made at NIDA/NIMH and obtained from the University of Missouri RRRC (transgenic line 0773). These are BAC transgenic rats expressing Cre recombinase in parvalbumin-expressing neurons. Animal use procedures were approved by the Princeton University Institutional Animal Care and Use Committee and carried out in accordance with National Institutes of Health standards.

Rats were typically housed in pairs or singly; rats that trained during the day were housed in a reverse light cycle room. Some rats trained overnight, and were not housed with a reverse light cycle. Access to water was scheduled to within-box training (2–4 hours per day, usually 7 days a week), plus between 0 and 1 hour of ad lib access following training.

Method Details

Behavioral training

Rats were trained in a high-throughput facility using a computerized training protocol. Rats were trained in operant training boxes with three nose ports. When an LED from the center port was illuminated, the animal could initiate a trial by poking his nose in that port; upon trial initiation the center LED turned off. While in the center port, rats were continuously presented with a train of randomly timed clicks from a left speaker and, simultaneously, a different train of clicks from a right speaker. The click trains were generated by Poisson processes with different underlying rates [79,80]; the rates conveyed the water volume baited at each side port. After a variable pre-flash interval ranging from 0 to 350ms, rats were also presented with light flashes from the left and right side ports; the number of flashes conveyed reward probability at each port. Each flash was 20ms in duration; flashes were presented in fixed bins, spaced every 250ms, to avoid perceptual fusion of consecutive flashes [81]. After a variable post-flash delay period from 0 to 500ms, the end of the trial was cued by a go sound and the center LED turning back on. The animal was then free to choose the left or right side port, and potentially collect reward.

The trials were self-paced: on trials when rats did not receive reward, they were able to initiate another trial immediately. However, if rats terminated center fixation prematurely, they were penalized with a white noise sound and a time out penalty. Since rats disproportionately terminated trials offering low volumes, we scaled the time out penalty based on the minimum reward offered. The time out penalties were adjusted independently for each rat to minimize terminated trials (as an example, several rats were penalized with 6 second time-outs for terminating trials offering a minimum of 6μL, 4.5 seconds for terminating trials offering a minimum of 12μL, 3 seconds for terminating trials offering a minimum of 24μL, and 1.5 seconds for terminating trials offering a minimum of 48μL).

In this task, the rats were required to reveal their preference between safe and risky rewards. To determine when rats were sufficiently trained to understand the meaning of the cues in the task, we evaluated the “efficiency” of their choices as follows. For each training session, we computed the average expected value per trial of an agent that chose randomly, and a perfect expected value maximizer, or an agent that always chose the side with the greater expected value. We compared the expected value per trial from the rat’s choices relative to these lower and upper bounds. Specifically, the efficiency was calculated as follows:

\text{efficiency} = 0.5 \cdot \frac{\text{rat EV/trial} - \text{random EV/trial}}{\text{EV-maximizer EV/trial} - \text{random EV/trial}} + 0.5 \qquad (11)

The threshold for analysis was the median performance of all sessions minus 1.5 times the interquartile range of performance across the second half of all sessions. Once performance surpassed this threshold, it was typically stable across months. Occasional days with poor performance were usually due to hardware malfunctions in the rig. Days in which performance was below threshold were excluded from analysis.
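A small sketch of the efficiency computation in Equation 11 (the session data here are made up for illustration):

```python
import numpy as np

def efficiency(chosen_ev, left_ev, right_ev):
    """Scale a session's mean expected value per trial between a random agent (0.5)
    and a perfect expected-value maximizer (1.0), per Equation 11."""
    rand_ev = np.mean((left_ev + right_ev) / 2.0)      # agent choosing randomly
    max_ev = np.mean(np.maximum(left_ev, right_ev))    # agent always taking the larger EV
    return 0.5 * (np.mean(chosen_ev) - rand_ev) / (max_ev - rand_ev) + 0.5

# Example session: left/right expected values (probability x volume) and the rat's picks.
left = np.array([24.0, 4.8, 12.0, 48.0])
right = np.array([6.0, 24.0, 9.6, 12.0])
chosen = np.array([24.0, 24.0, 12.0, 48.0])            # EV of the option the rat chose
print(efficiency(chosen, left, right))                  # 1.0 = perfect maximizer
```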

Quantification and Statistical Analysis

Behavioral model

We fit a behavioral model separately for each rat (see Figure 1 legend for description of the model). We used Matlab's constrained minimization function fmincon to minimize the sum of the negative log likelihoods with respect to the model parameters. Twenty random seeds were used in the maximum likelihood search for each rat; the parameter values with the maximum likelihood across these seeds were deemed the best-fit parameters. When evaluating model performance (e.g., Figure 1E), we performed 5-fold cross-validation and evaluated the predictive power of the model on the held-out test sets.
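As an illustration of this fitting procedure, the sketch below runs a multi-start maximum-likelihood fit in Python (scipy) on synthetic data for a simplified logistic choice model; the paper itself used Matlab's fmincon, and the model, bounds, and data here are stand-ins. Cross-validation would then evaluate the fitted model's negative log likelihood on held-out folds.

```python
import numpy as np
from scipy.optimize import minimize

# Synthetic stand-in for one rat's data: value differences and binary right-choices.
rng = np.random.default_rng(0)
dv = rng.normal(0, 1, size=2000)                         # V_R - V_L on each trial
went_right = rng.random(2000) < 1 / (1 + np.exp(-(3 * dv + 0.2)))

def nll(params, dv, went_right):
    """Negative log likelihood of a logistic choice model with sensitivity and bias."""
    lam, bias = params
    pr = np.clip(1 / (1 + np.exp(-(lam * dv + bias))), 1e-9, 1 - 1e-9)
    return -np.sum(went_right * np.log(pr) + (~went_right) * np.log(1 - pr))

# Multi-start maximum likelihood (the paper used 20 random seeds with Matlab's fmincon).
best = None
for _ in range(20):
    x0 = rng.uniform([-1, -5], [10, 5])                  # random starting point within the bounds
    res = minimize(nll, x0, args=(dv, went_right), bounds=[(-1, 10), (-5, 5)])
    best = res if best is None or res.fun < best.fun else best
print(best.x)                                             # should recover roughly [3, 0.2]
```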

We initially evaluated three different parametric forms of the probability weighting function: the one- and two-parameter Prelec models and the linear in log-odds model (see below; [24,28]). We compared the different parametric forms using the Akaike Information Criterion (AIC), AIC = 2k + 2nLL, where k is the number of parameters, and nLL is the negative log likelihood of the model. AIC favored the two-parameter Prelec model for nearly all rats, although some rats were equally well-fit by the linear in log-odds model (data not shown). Therefore, we implemented the two-parameter Prelec model.

\text{One-parameter Prelec:} \quad w(p) = e^{-(-\ln p)^{\delta}}, \qquad (12)

where p is the true probability, and δ is a free parameter. δ controls the curvature of the weighting function; its crossover point is fixed at 1/e.

\text{Two-parameter Prelec:} \quad w(p) = e^{-\beta(-\ln p)^{\delta}}, \qquad (13)

where p is the true probability, and β and δ are free parameters. δ primarily controls the curvature and β primarily controls the elevation of the weighting function.

\text{Linear in log-odds:} \quad w(p) = \frac{\delta p^{\gamma}}{\delta p^{\gamma} + (1 - p)^{\gamma}}, \qquad (14)

where p is the true probability and γ and δ are free parameters. γ primarily controls the curvature of the weighting function and δ controls the elevation.
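For reference, the three candidate weighting functions and the AIC formula in code (parameter values are illustrative; the nLL values entering the AIC comparison would come from the fits described above):

```python
import numpy as np

def prelec1(p, delta):
    """One-parameter Prelec (Equation 12); crossover fixed at 1/e."""
    return np.exp(-(-np.log(p)) ** delta)

def prelec2(p, beta, delta):
    """Two-parameter Prelec (Equation 13); beta sets elevation, delta sets curvature."""
    return np.exp(-beta * (-np.log(p)) ** delta)

def linear_log_odds(p, gamma, delta):
    """Linear-in-log-odds form (Equation 14); gamma sets curvature, delta sets elevation."""
    return delta * p ** gamma / (delta * p ** gamma + (1 - p) ** gamma)

def aic(n_params, nll):
    """AIC = 2k + 2*nLL; lower values indicate a better model."""
    return 2 * n_params + 2 * nll

p = np.linspace(0.01, 1.0, 5)
print(prelec1(p, 0.65), prelec2(p, 1.0, 0.65), linear_log_odds(p, 0.65, 1.0))
```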

Alternative models

We compared the Prospect Theory model to a number of alternative models. The Expected Utility Theory model (EUT) has the same form as the Prospect Theory model, except that the subjective value on each side is the product of objective probability and subjective utility (see Figure S2A):

V_R = u(x_R)\, p_R \qquad (15)
V_L = u(x_L)\, p_L. \qquad (16)

The linear weighting model fit different weights to flashes and clicks, before combining them (see Figure S1G):

u(x) = c\,x \qquad (17)
w(p) = d\,p, \qquad (18)

where c and d are constants, x is the click rate and p is the number of flashes presented to the animal on each side. The value on each side is the product of linearly weighted flashes and clicks:

V=u(x)w(p). (19)

We next included sensory noise as part of a perceptual strategy. Previously, we have used a signal-detection theory (SDT) model to estimate rats’ perceptual variability (noise) in estimating numbers of flashes and clicks; we found they exhibit a property called scalar variability, meaning that the standard deviation in their estimate grows linearly with the mean [27]. We implemented four different signal-detection theory models that instantiated scalar noise, according to this work. The models differ in the decision rules they apply. The models assume that on each trial, the rats’ estimate of the number of flashes (clicks) on each side is a random variable drawn from a normal distribution, the mean of which corresponds to the actual number of flashes (clicks) presented to the animal. According to scalar variability, the standard deviation is linearly related to the number of flashes (clicks). There are two free parameters that define this linear relationship; we fit separate linear scaling relationships to the estimation of clicks and flashes:

\sigma_F = m_F x + b_F \qquad (20)
\sigma_C = m_C x + b_C \qquad (21)

where m_F, m_C, b_F, and b_C are free parameters, and x is the number of flashes (clicks) presented to the rat on each side. For the first two SDT models, we compute choice probabilities based on the flash difference (ΔF) and click difference (ΔC) separately, where these choice probabilities are calculated as follows, according to [27]:

P(\text{went right} \mid \Delta F) = \int_{0}^{\infty} \mathcal{N}\!\left(R_F - L_F,\ \sqrt{\sigma_{R_F}^{2} + \sigma_{L_F}^{2}}\right) \, d(R_F - L_F) \qquad (22)
P(\text{went right} \mid \Delta C) = \int_{0}^{\infty} \mathcal{N}\!\left(R_C - L_C,\ \sqrt{\sigma_{R_C}^{2} + \sigma_{L_C}^{2}}\right) \, d(R_C - L_C) \qquad (23)

RF (RC) and LF (LC) are the number of right and left flashes (clicks) presented to the rat on each trial, and the σ terms are the noise terms defined in Equations 20 and 21. One model (SDT1) assumes that the rat’s choice is given by the average of these probabilities (see Figure S2C). Another model (SDT2) assumes that the rat’s choice is given by the most informative cue on each trial (the choice probability most different from 0.5; see Figure S2D).

Alternatively, it’s possible that the rats combine the noisy estimates of flashes and clicks on each side. Therefore, we evaluated two additional models parameterized as follows:

P(\text{went right} \mid \text{right evidence}) = \int_{0}^{\infty} \mathcal{N}\!\left(R_F + R_C,\ \sqrt{\sigma_{R_F}^{2} + \sigma_{R_C}^{2}}\right) \, d(R_F + R_C) \qquad (24)
P(\text{went right} \mid \text{left evidence}) = \int_{0}^{\infty} \mathcal{N}\!\left(L_F + L_C,\ \sqrt{\sigma_{L_F}^{2} + \sigma_{L_C}^{2}}\right) \, d(L_F + L_C) \qquad (25)

One model (SDT3) assumes that the rat’s choice is given by the average of these probabilities (Figure S2E), and the other (SDT4) assumes that the rat’s choice is given by the most informative side on each trial (the choice probability most different from 0.5; Figure S2F).
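A sketch of the cue-wise choice probabilities under scalar variability (Equations 20–23). The integral of a Gaussian over the positive half-line equals a normal CDF evaluated at the mean divided by the pooled standard deviation, which is what the code uses; the noise parameters and counts are made up, and the combined-evidence variants (Equations 24–25) follow analogously.

```python
import numpy as np
from scipy.stats import norm

def scalar_sigma(n, m, b):
    """Scalar variability: the s.d. of the estimate grows linearly with the count (Eqs 20-21)."""
    return m * n + b

def p_right_given_cue(n_right, n_left, m, b):
    """P(went right | cue difference), Equations 22-23: the probability mass of a Gaussian
    centered on the right-minus-left difference that falls above zero."""
    sigma = np.sqrt(scalar_sigma(n_right, m, b) ** 2 + scalar_sigma(n_left, m, b) ** 2)
    return norm.cdf((n_right - n_left) / sigma)

# Example trial: 7 right vs. 4 left flashes, 30 right vs. 45 left clicks (illustrative noise parameters).
p_flash = p_right_given_cue(7, 4, m=0.2, b=0.5)
p_click = p_right_given_cue(30, 45, m=0.1, b=1.0)
print(p_flash, p_click, (p_flash + p_click) / 2)       # SDT1 averages the two cue-based probabilities
```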

Psychometric curves

We measured rats’ psychometric performance when choosing between the safe and risky options. For these analyses, we excluded trials where both the left and right side ports offered certain rewards. We binned the data into 11 bins of the difference in the subjective value (inferred from the behavioral model) of the safe minus the risky option. Psychometric plots show the probability that the subjects chose the safe option as a function of this difference (see Figure S1D). We fit a 4-parameter sigmoid of the form:

p(\text{Choose S}) = y_0 + \frac{1 - 2a}{1 + e^{-b(V_S - V_R - x_0)}}, \qquad (26)

where y0, a, b, and x0 were free parameters. Parameters were fit using a gradient-descent algorithm to minimize the mean square error between the data and the sigmoid, using the sqp algorithm in Matlab’s constrained optimization function fmincon.
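A least-squares sketch of this sigmoid fit on synthetic binned data (scipy's curve_fit minimizes squared error, analogous to the MSE objective described above; the numerator form 1 − 2a follows the reconstruction of Equation 26 and the data are made up):

```python
import numpy as np
from scipy.optimize import curve_fit

def sigmoid(dv, y0, a, b, x0):
    """4-parameter sigmoid of the subjective-value difference (safe minus risky), Equation 26."""
    return y0 + (1 - 2 * a) / (1 + np.exp(-b * (dv - x0)))

# Synthetic binned psychometric data: P(chose safe) vs. value difference (illustrative).
dv = np.linspace(-1, 1, 11)
p_safe = np.array([0.05, 0.08, 0.1, 0.2, 0.35, 0.5, 0.68, 0.8, 0.88, 0.93, 0.95])
params, _ = curve_fit(sigmoid, dv, p_safe, p0=[0.05, 0.05, 3.0, 0.0])
print(dict(zip(["y0", "a", "b", "x0"], np.round(params, 2))))
```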

Logistic regression to compare regressors to probability weighting functions

We fit a logistic regression model with a separate regressor for each probability the rat may have been offered (0 to 1 in 0.1 increments), plus a constant term. To compare the regressors to the parametric fits, we normalized the regressors for each probability by subtracting the minimum and dividing by the maximum regressor value, so they ranged from 0 to 1 (Figure 2G). We computed the mean square error between these normalized regressor values and the probability weighting functions (Figure 2H). The model was fit using Matlab’s function glmfit.
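A sketch of this regression on synthetic choices (the paper used Matlab's glmfit; here scikit-learn's logistic regression, with a single-side simplification and made-up data):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

probs = np.round(np.arange(0, 1.01, 0.1), 1)
rng = np.random.default_rng(0)

# Synthetic trials: the risky probability offered, and whether the rat chose the risky side.
p_risky = rng.choice(probs, size=5000)
went_risky = rng.random(5000) < np.exp(-(-np.log(np.clip(p_risky, 1e-12, 1))) ** 0.65)

X = (p_risky[:, None] == probs[None, :]).astype(float)   # one indicator column per probability
coef = LogisticRegression(max_iter=1000).fit(X, went_risky).coef_.ravel()

# Normalize the 11 weights to [0, 1] for comparison with the fitted weighting function.
coef_norm = (coef - coef.min()) / (coef - coef.min()).max()
print(dict(zip(probs, np.round(coef_norm, 2))))
```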

Certainty equivalents

Non-parametric estimate

We estimated rats' certainty equivalents by evaluating their psychometric performance (%Chose risky) for each gamble of 48 μL, and estimating the value of the psychometric curve at which performance was at 50% (Figure 2I). To do this, we fit a line to the two points of the psychometric curve above and below chance level using Matlab's regress.m function, and interpolated the value of that line that would correspond to 50%.
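A sketch of the interpolation step on made-up psychometric values (np.interp linearly interpolates between the two points bracketing 50%, equivalent to the line fit described above):

```python
import numpy as np

# Percent chose-risky as a function of the guaranteed alternative's volume, for one gamble.
safe_volumes = np.array([6, 12, 24, 48])
p_chose_risky = np.array([0.85, 0.70, 0.40, 0.10])

# The certainty equivalent is the guaranteed volume at which the rat is indifferent (50%).
ce = np.interp(0.5, p_chose_risky[::-1], safe_volumes[::-1])   # np.interp needs ascending x
print(ce)
```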

Analytic expression for CE from the model fits

We compared our estimates of rats' certainty equivalents from their behavioral data to an analytic expression from the subjective probability and utility functions we obtained from the model. We define the certainty equivalent, x̃, as the guaranteed reward equal in value to a gamble offering x with probability p. In the case of linear probability weighting, we express this as follows:

p\, x^{\alpha} = \tilde{x}^{\alpha} \;\Rightarrow\; \ln(p) + \alpha \ln(x) = \alpha \ln(\tilde{x}) \;\Rightarrow\; \ln(p) = \alpha \ln\!\left(\frac{\tilde{x}}{x}\right) \;\Rightarrow\; \frac{1}{\alpha}\ln(p) = \ln\!\left(\frac{\tilde{x}}{x}\right) \;\Rightarrow\; p^{1/\alpha} = \frac{\tilde{x}}{x} \qquad (27)

For nonlinear probability weighting, substituting w(p) for p yields an analytic expression for the certainty equivalent from the exponent of the utility function (α) and the probability weighting function: x̃ = x · w(p)^{1/α} (also see [29]).

Behavioral model with reference point

The behavioral model with the reference point (see Figure 3) was similar to the behavioral model described above, except for elaborations of the subjective utility function u(x) and subjective value (VR, VL). We modified the subjective utility function to include a dynamic reference point, r, below which value was treated negatively (as a loss). The relative amplitude of losses versus gains was controlled by the scale parameter κ.

u(x) = \begin{cases} (x - r)^{\alpha} & \text{if } x \geq r \\ -\kappa (r - x)^{\alpha} & \text{if } x < r, \end{cases} \qquad (28)

where, as before, α is the exponent of the utility function, and x is the offered reward. We also reparameterized subjective value. The risky prospect offers two possible outcomes: x with probability p, and 0 with probability 1 − p. In the absence of a reference point, the zero reward outcome (0, 1 − p) does not influence choice (0^α = 0). However, if r > 0, the zero reward outcome can be perceived as a loss. Therefore, in the reference point model, subjective value was reparameterized to incorporate this possible outcome of the gamble:

V_R = u(x_R)\, w(p_R) + u(0)\, w(1 - p_R) \qquad (29)
V_L = u(x_L)\, w(p_L) + u(0)\, w(1 - p_L) \qquad (30)

We parameterized the reference point, r, to take on two discrete values depending on whether the previous trial was rewarded or not. There were two additional free parameters, y and m, that could account for asymmetric effects of rewarded and unrewarded trials:

r(t) = \begin{cases} m & \text{if trial } t-1 \text{ was rewarded} \\ y & \text{if trial } t-1 \text{ was not rewarded.} \end{cases} \qquad (31)

We constrained r ≥ 0.
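A sketch of the reference-dependent utility and subjective value computations (Equations 28–31); the negative sign on the loss branch follows the standard Prospect Theory value function, and all parameter values are illustrative:

```python
import numpy as np

def utility_ref(x, r, alpha, kappa):
    """Reference-dependent utility (Equation 28): gains above r, kappa-scaled losses below r."""
    x = np.asarray(x, dtype=float)
    return np.where(x >= r, np.abs(x - r) ** alpha, -kappa * np.abs(r - x) ** alpha)

def subjective_value(x, p, r, alpha, kappa, beta, delta):
    """Equations 29-30: the gamble's zero outcome contributes a loss whenever r > 0."""
    w = lambda q: np.exp(-beta * (-np.log(np.clip(q, 1e-12, 1))) ** delta)
    return utility_ref(x, r, alpha, kappa) * w(p) + utility_ref(0.0, r, alpha, kappa) * w(1 - p)

def reference_point(prev_rewarded, m, y):
    """Equation 31: the reference point takes one of two values set by the previous outcome."""
    return m if prev_rewarded else y

# Same gamble, evaluated after a rewarded vs. an unrewarded trial (illustrative parameters).
for prev in (True, False):
    r = reference_point(prev, m=12.0, y=0.0)
    print(prev, subjective_value(48.0, 0.5, r, alpha=0.6, kappa=1.7, beta=1.0, delta=0.65))
```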

Behavioral model integrating Prospect Theory and Reinforcement Learning

This behavioral model was similar to the Prospect Theory model, except that w(p) and u(x) were used to update the subject's value of each unique trial type based on experience. There were separate state-action value matrices for left and right choices. Each entry of these matrices corresponded to a unique trial type, i.e., a unique combination of probability p and reward volume (μL):

V_{p,\mu L}(t+1) = V_{p,\mu L}(t) + \alpha_{\text{learn}} \left( w(p)\, u(x) - V_{p,\mu L}(t) \right), \qquad (32)

where αlearn is an additional free parameter fit by the model. w(p) and u(x) are parameterized as they were in the Prospect Theory model, according to Equations 2 and 3 in the main text. The difference in value of the right and left prospect is the argument to a logistic function, as parameterized in Equation 6 of the main text.

We also included a global update of the entire left or right value matrix that reflected reward history as follows:

V(t+1) = \begin{cases} V(t) + \beta_{\text{win}}\, u(x) & \text{if trial } t \text{ was rewarded} \\ V(t) - \beta_{\text{loss}}\, u(x) & \text{if trial } t \text{ was not rewarded,} \end{cases} \qquad (33)

where u(x) corresponds to the subjective utility of the chosen reward volume.

Data and Software Availability

Behavioral data are available upon request by contacting the Lead Contact, Christine Constantinople (constantinople@nyu.edu).

Supplementary Material

Supplemental Information

Key Resources Table

REAGENT or RESOURCE | SOURCE | IDENTIFIER
Experimental Models: Organisms/Strains
Rat: Long Evans | Taconic | RRID: RGD_1566430
Rat: Sprague Dawley | Taconic | RRID: RGD_1566440
Rat: Long Evans | Hilltop | www.hilltoplabs.com
Rat: Long Evans | Harlan | RRID: RGD_5508398
Rat: Pvalb-iCre | University of Missouri RRRC | RRID: RGD_10412329
Software and Algorithms
Matlab | MathWorks | RRID: SCR_001622
Behavioral control software | Bcontrol | http://brodywiki.princeton.edu/bcontrol/index.php/Main_Page

Highlights.

  • A novel task enables application of core behavioral economic approaches in rodents.

  • Like humans, rats exhibit nonlinear utility and probability weighting.

  • Rats also exhibit trial history effects, consistent with ongoing learning.

  • A reinforcement learning model incorporating subjective value accounts for the data.

Acknowledgements:

The authors thank Paul Glimcher, Kenway Louie, Mike Long, Cristina Savin, David Schneider, Kevin Miller, Ben Scott, Mikio Aoi, Matthew Lovett-Barron, Cristina Domnisoru, Alejandro Ramirez, and members of the Brody lab for helpful discussions and comments on the manuscript. We thank J. Teran, K. Osorio, L. Teachen, and A. Sirko for animal training. This work was funded in part by a K99/R00 award from NIMH (MH111926, to C.M.C.).

Footnotes

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Declaration of Interests: The authors declare no competing interests.

References

  • 1. Kahneman D, and Tversky A (1979). Prospect Theory: An Analysis of Decision under Risk. Econometrica 47, 263.
  • 2. Tversky A, and Kahneman D (1992). Advances in prospect theory: Cumulative representation of uncertainty. J. Risk Uncertain 5, 297–323.
  • 3. Bush RR, and Mosteller F. A Mathematical Model for Simple Learning. Springer Series in Statistics, 221–234. Available at: 10.1007/978-0-387-44956-2_12.
  • 4. Bush RR, and Mosteller F. A Model for Stimulus Generalization and Discrimination. Springer Series in Statistics, 235–250. Available at: 10.1007/978-0-387-44956-2_13.
  • 5. Rescorla RA, and Wagner AR (1972). A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement. In Classical Conditioning II: Current Theory and Research (New York: Appleton-Century-Crofts).
  • 6. Glimcher PW (2011). Understanding dopamine and reinforcement learning: the dopamine reward prediction error hypothesis. Proc. Natl. Acad. Sci. U. S. A 108 Suppl 3, 15647–15654.
  • 7. Sutton RS, and Barto AG (2018). Reinforcement Learning: An Introduction (A Bradford Book).
  • 8. Thorndike EL (1911). Animal intelligence; experimental studies. Available at: 10.5962/bhl.title.55072.
  • 9. Pavlov PI (2010). Conditioned reflexes: An investigation of the physiological activity of the cerebral cortex. Ann Neurosci 17, 136–141.
  • 10. Glimcher PW (2010). Foundations of Neuroeconomic Analysis.
  • 11. Schultz W, Dayan P, and Montague PR (1997). A neural substrate of prediction and reward. Science 275, 1593–1599.
  • 12. Lee D, Seo H, and Jung MW (2012). Neural Basis of Reinforcement Learning and Decision Making. Annu. Rev. Neurosci 35, 287–308.
  • 13. Niv Y (2009). Reinforcement learning in the brain. Journal of Mathematical Psychology 53, 139–154. Available at: 10.1016/j.jmp.2008.12.005.
  • 14. Bornstein AM, and Daw ND (2011). Multiplicity of control in the basal ganglia: computational roles of striatal subregions. Curr. Opin. Neurobiol 21, 374–380.
  • 15. van der Meer MAA, and Redish AD (2011). Ventral striatum: a critical look at models of learning and evaluation. Curr. Opin. Neurobiol 21, 387–392.
  • 16. Averbeck BB, and Costa VD (2017). Motivational neural circuits underlying reinforcement learning. Nat. Neurosci 20, 505–512.
  • 17. Ito M, and Doya K (2011). Multiple representations and algorithms for reinforcement learning in the cortico-basal ganglia circuit. Curr. Opin. Neurobiol 21, 368–373.
  • 18. Hertwig R, and Erev I (2009). The description–experience gap in risky choice. Trends Cogn. Sci 13, 517–523.
  • 19. Barron G, and Erev I (2003). Small feedback-based decisions and their limited correspondence to description-based decisions. J. Behav. Decis. Mak 16, 215–233.
  • 20. Erev I, and Roth AE (2014). Maximization, learning, and economic behavior. Proc. Natl. Acad. Sci. U. S. A 111 Suppl 3, 10818–10825.
  • 21. Herrnstein RJ (1961). Relative and absolute strength of response as a function of frequency of reinforcement. J. Exp. Anal. Behav 4, 267–272.
  • 22. Heyman GM, and Duncan Luce R (1979). Operant matching is not a logical consequence of maximizing reinforcement rate. Anim. Learn. Behav 7, 133–140.
  • 23. Gallistel CR, Mark TA, King AP, and Latham PE (2001). The rat approximates an ideal detector of changes in rates of reward: implications for the law of effect. J. Exp. Psychol. Anim. Behav. Process 27, 354–372.
  • 24. Stauffer WR, Lak A, Bossaerts P, and Schultz W (2015). Economic choices reveal probability distortion in macaque monkeys. J. Neurosci 35, 3146–3154.
  • 25. Yamada H, Tymula A, Louie K, and Glimcher PW (2013). Thirst-dependent risk preferences in monkeys identify a primitive form of wealth. Proc. Natl. Acad. Sci. U. S. A 110, 15788–15793.
  • 26. von Neumann J, and Morgenstern O (2007). Theory of Games and Economic Behavior (Princeton University Press).
  • 27. Scott BB, Constantinople CM, Erlich JC, Tank DW, and Brody CD (2015). Sources of noise during accumulation of evidence in unrestrained and voluntarily head-restrained rats. Elife 4, e11308.
  • 28. Gonzalez R, and Wu G (1999). On the shape of the probability weighting function. Cogn. Psychol 38, 129–166.
  • 29. Abdellaoui M, Bleichrodt H, and l'Haridon O. A Tractable Method to Measure Utility and Loss Aversion under Prospect Theory. PsycEXTRA Dataset. Available at: 10.1037/e722852011-018.
  • 30. Barberis N (2012). Thirty Years of Prospect Theory in Economics: A Review and Assessment. Available at: 10.3386/w18621.
  • 31. Kőszegi B, and Rabin M (2007). Reference-Dependent Risk Attitudes. Am. Econ. Rev 97, 1047–1073.
  • 32. Köszegi B, and Rabin M (2006). A Model of Reference-Dependent Preferences. Q. J. Econ 121, 1133–1165.
  • 33. Bleichrodt H, Pinto JL, and Wakker PP (2001). Making Descriptive Use of Prospect Theory to Improve the Prescriptive Use of Expected Utility. Manage. Sci 47, 1498–1514.
  • 34. van Osch SMC, van den Hout WB, and Stiggelbout AM (2006). Exploring the reference point in prospect theory: gambles for length of life. Med. Decis. Making 26, 338–346.
  • 35. Khaw MW, Glimcher PW, and Louie K (2017). Normalized value coding explains dynamic adaptation in the human valuation process. Proc. Natl. Acad. Sci. U. S. A 114, 12696–12701.
  • 36. Hunter LE, and Gershman SJ (2018). Reference-dependent preferences arise from structure learning. Available at: 10.1101/252692.
  • 37. Constantino SM, and Daw ND (2015). Learning the opportunity cost of time in a patch-foraging task. Cogn. Affect. Behav. Neurosci 15, 837–853.
  • 38. Niv Y, Edlund JA, Dayan P, and O'Doherty JP (2012). Neural Prediction Errors Reveal a Risk-Sensitive Reinforcement-Learning Process in the Human Brain. Journal of Neuroscience 32, 551–562.
  • 39. Akrami A, Kopec CD, Diamond ME, and Brody CD (2018). Posterior parietal cortex represents sensory history and mediates its effects on behaviour. Nature 554, 368–372.
  • 40. Busse L, Ayaz A, Dhruv NT, Katzner S, Saleem AB, Schölvinck ML, Zaharia AD, and Carandini M (2011). The detection of visual contrast in the behaving mouse. J. Neurosci 31, 11351–11361.
  • 41. Kagel JH, Battalio RC, and Green L (1995). Economic Choice Theory: An Experimental Analysis of Animal Behavior (Cambridge University Press).
  • 42. Charnov EL (1976). Optimal foraging, the marginal value theorem. Theor. Popul. Biol 9, 129–136.
  • 43. Kacelnik A (1984). Central Place Foraging in Starlings (Sturnus vulgaris). I. Patch Residence Time. J. Anim. Ecol 53, 283.
  • 44. Stephens DW, and Krebs JR (1986). Foraging Theory (Princeton University Press).
  • 45. Ollason JG (1980). Learning to forage--optimally? Theor. Popul. Biol 18, 44–56.
  • 46. Steiner AP, and Redish AD (2014). Behavioral and neurophysiological correlates of regret in rat decision-making on a neuroeconomic task. Nat. Neurosci 17, 995–1002.
  • 47. Sweis BM, Thomas MJ, and Redish AD (2018). Mice learn to avoid regret. PLoS Biol 16, e2005853.
  • 48. Sweis BM, Abram SV, Schmidt BJ, Seeland KD, MacDonald AW 3rd, Thomas MJ, and Redish AD (2018). Sensitivity to "sunk costs" in mice, rats, and humans. Science 361, 178–181.
  • 49. Wikenheiser AM, and Redish AD (2012). Sunk costs account for rats' decisions on an intertemporal foraging task. BMC Neurosci 13, P63.
  • 50. McCoy AN, and Platt ML (2005). Risk-sensitive neurons in macaque posterior cingulate cortex. Nat. Neurosci 8, 1220–1227.
  • 51. Hayden BY, and Platt ML (2007). Temporal Discounting Predicts Risk Sensitivity in Rhesus Macaques. Curr. Biol 17, 49–53.
  • 52. So N-Y, and Stuphorn V (2010). Supplementary eye field encodes option and action value for saccades with variable reward. J. Neurophysiol 104, 2634–2653.
  • 53. Chen X, and Stuphorn V (2018). Inactivation of Medial Frontal Cortex Changes Risk Preference. Curr. Biol 28, 3709.
  • 54. Abdellaoui M, Bleichrodt H, and l'Haridon O (2007). A Tractable Method to Measure Utility and Loss Aversion under Prospect Theory. PsycEXTRA Dataset. Available at: 10.1037/e722852011-018.
  • 55. Schildberg-Hörisch H (2018). Are Risk Preferences Stable? J. Econ. Perspect 32, 135–154.
  • 56. Frey R, Pedroni A, Mata R, Rieskamp J, and Hertwig R (2017). Risk preference shares the psychometric structure of major psychological traits. Sci Adv 3, e1701381.
  • 57. Kacelnik A, and Bateson M (1996). Risky Theories—The Effects of Variance on Foraging Decisions. Am. Zool 36, 402–434.
  • 58. Lakshminarayanan VR, Chen MK, and Santos LR (2011). The evolution of decision-making under risk: Framing effects in monkey risk preferences. J. Exp. Soc. Psychol 47, 689–693.
  • 59. Pompilio L, and Kacelnik A (2005). State-dependent learning and suboptimal choice: when starlings prefer long over short delays to food. Anim. Behav 70, 571–578.
  • 60. Pompilio L, Kacelnik A, and Behmer ST (2006). State-dependent learned valuation drives choice in an invertebrate. Science 311, 1613–1615.
  • 61. Marsh B (2004). Energetic state during learning affects foraging choices in starlings. Behav. Ecol 15, 396–399.
  • 62. Crespi LP (1942). Quantitative Variation of Incentive and Performance in the White Rat. Am. J. Psychol 55, 467.
  • 63. Zeaman D (1949). Response latency as a function of the amount of reinforcement. J. Exp. Psychol 39, 466–483.
  • 64. Krähmer D, and Stone R (2011). Anticipated regret as an explanation of uncertainty aversion. Econom. Theory 52, 709–728.
  • 65. Chen MK, Lakshminarayanan V, and Santos LR (2006). How Basic Are Behavioral Biases? Evidence from Capuchin Monkey Trading Behavior. J. Polit. Econ 114, 517–537.
  • 66. Gul F (1991). A Theory of Disappointment Aversion. Econometrica 59, 667. Available at: 10.2307/2938223.
  • 67. Sayman S, and Öncüler A (2005). Effects of study design characteristics on the WTA–WTP disparity: A meta analytical framework. J. Econ. Psychol 26, 289–312.
  • 68. Dhar R, and Wertenbroch K (1999). Consumer Choice Between Hedonic and Utilitarian Goods.
  • 69. Heath TB, Ryu G, Chatterjee S, McCarthy MS, Mothersbaugh DL, Milberg S, and Gaeth GJ (2000). Asymmetric Competition in Choice and the Leveraging of Competitive Disadvantages. J. Consum. Res 27, 291–308.
  • 70. Johnson EJ, Gächter S, and Herrmann A (2006). Exploring the Nature of Loss Aversion.
  • 71. Plonsky O, and Erev I (2017). Learning in settings with partial feedback and the wavy recency effect of rare events. Cogn. Psychol 93, 18–43.
  • 72. Fiorillo CD, Tobler PN, and Schultz W (2003). Discrete coding of reward probability and uncertainty by dopamine neurons. Science 299, 1898–1902.
  • 73. Morris G, Arkadir D, Nevet A, Vaadia E, and Bergman H (2004). Coincident but distinct messages of midbrain dopamine and striatal tonically active neurons. Neuron 43, 133–143.
  • 74. Tobler PN, Fiorillo CD, and Schultz W (2005). Adaptive coding of reward value by dopamine neurons. Science 307, 1642–1645.
  • 75. Kobayashi S, and Schultz W (2008). Influence of reward delays on responses of dopamine neurons. J. Neurosci 28, 7837–7846.
  • 76. Day JJ, Jones JL, Wightman RM, and Carelli RM (2010). Phasic nucleus accumbens dopamine release encodes effort- and delay-related costs. Biol. Psychiatry 68, 306–309.
  • 77. Stauffer WR, Lak A, and Schultz W (2014). Dopamine reward prediction error responses reflect marginal utility. Curr. Biol 24, 2491–2500.
  • 78. Keramati M, and Gutkin B (2014). Homeostatic reinforcement learning for integrating reward collection and physiological stability. Elife 3. Available at: 10.7554/eLife.04811.
  • 79. Brunton BW, Botvinick MM, and Brody CD (2013). Rats and humans can optimally accumulate evidence for decision-making. Science 340, 95–98.
  • 80. Hanks TD, Kopec CD, Brunton BW, Duan CA, Erlich JC, and Brody CD (2015). Distinct relationships of parietal and prefrontal cortices to evidence accumulation. Nature 520, 220–223.
  • 81. Scott BB, Constantinople CM, Erlich JC, Tank DW, and Brody CD (2015). Sources of noise during accumulation of evidence in unrestrained and voluntarily head-restrained rats. Elife 4, e11308.
