Author manuscript; available in PMC: 2025 May 15.
Published in final edited form as: Biol Psychiatry. 2023 Dec 13;95(10):974–984. doi: 10.1016/j.biopsych.2023.12.005

Recent Opioid Use Impedes Range Adaptation in Reinforcement Learning in Human Addiction

Maëlle CM Gueguen 1,2, Hernan Anlló 2,3,4, Darla Bonagura 1,2, Julia Kong 1, Sahar Hafezi 1, Stefano Palminteri 2,4, Anna B Konova 1,2
PMCID: PMC11065633  NIHMSID: NIHMS1953575  PMID: 38101503

Abstract

Background:

Drugs like opioids are potent reinforcers thought to co-opt value-based decisions by overshadowing other rewarding outcomes, but how this happens at a neurocomputational level remains elusive. Range adaptation is a canonical process of fine-tuning representations of value based on reward context. Here, we tested whether recent opioid exposure impacts range adaptation in opioid use disorder (OUD), potentially explaining why shifting decisions away from drugs in this vulnerable period is so difficult.

Methods:

Participants who had recently (<90 days) used opioids (n=34) or who had abstained from opioid use ≥90 days (n=20) and comparison controls (n=44) completed a reinforcement learning task designed to induce robust contextual modulation of value. Two models were used to assess the latent process participants engaged in making their decisions: a RANGE model that dynamically tracks context and a standard model that assumes stationary, objective encoding of value (ABSOLUTE model).

Results:

Controls and ≥90-day abstinent OUD participants exhibited choice patterns consistent with range-adapted valuation. By contrast, those with recent opioid use were more prone to learn and encode value on an absolute scale. Computational modeling confirmed that the behavior of most controls and ≥90-day abstinent OUD participants (75%), but only a minority of the recent use group (38%), was better fit by the RANGE (vs. ABSOLUTE) model. Further, the degree to which participants relied on range adaptation correlated with continuous abstinence duration and subjective craving/withdrawal.

Conclusion:

Reduced context adaptation to available rewards could explain difficulty deciding about smaller (typically nondrug) rewards in the aftermath of drug exposure.

Keywords: reward, value, range adaptation, reinforcement learning, addiction, opioids

Introduction

Drugs of abuse are potent reinforcers thought to co-opt value-based decisions by overshadowing other rewarding outcomes, leading to continued drug pursuit (1-3). This “hijacking” can be explained by normative accounts of reinforcement learning and valuation: given the supra-physiologically reinforcing effects of drugs, their value outweighs that of many alternatives, causing drugs to be preferred over time over these alternatives (4-7).

However, like all recurring-remitting disorders, drug addiction follows a cyclical pattern, where the symptoms of addiction and drug use itself can fluctuate (3, 8). It remains unclear which neurocomputational mechanisms underlie these more time-limited clinical vulnerability states. Standard (static) reinforcement learning processes can reasonably explain long-term changes associated with the development of addiction, but fall short in accounting for the behavioral changes that accompany these dynamic addiction vulnerability states, which can evolve over short timescales like months, weeks, or even days (9-12). Improved understanding of the decision-relevant mechanisms of vulnerable addiction states, separate from or in conjunction with those associated with cumulative addiction burden, is crucial for identifying effective intervention targets and preventing unwanted transitions in the addiction cycle.

This question is especially pertinent in the context of opioid misuse (encompassing heroin and opioid analgesics). Opioid misuse and overdose remain pressing public health challenges in the U.S. (13, 14). Despite the availability of evidence-based treatments for opioid use disorder (OUD), reuse and relapse are common (15, 16). Further, once opioid reuse has occurred, continued use is the modal outcome (17). These observations suggest that opioid exposure creates a time-limited vulnerability state that is difficult to overcome, even in people receiving gold-standard treatment (18, 19).

To evaluate what neurocomputational process might account for the over-selection of drug-related actions following recent opioid exposure, we turned to models of context-dependent choice. A robust cross-species literature suggests the perceived value of an option is not estimated in isolation but is highly dependent on its local context (20-26). For example, an option objectively worth $0.75 is perceived as more valuable if it is available in a context where the range of reward is relatively narrow ($0-$1) than relatively wide ($0-$10). Such context effects are tied to a canonical property of the brain’s valuation systems in which regions like the orbitofrontal cortex dynamically rescale their responses to the available reward distribution. Range adaptation facilitates sensitivity to detect value differences in very different contexts despite finite firing rates (20, 27, 28). While range adaptation has been extensively observed behaviorally (29, 30) and neurally (22, 31-34) in humans, its functional role in addiction or vulnerable addiction states has not been examined. We reasoned that reduced range adaptation in OUD could explain shifts in drug-seeking behavior, especially following recent use. Existing models of this vulnerable period have generally invoked specialized processes [e.g., drug-related set point shift/homeostasis (35, 36), drug cue-triggered incentive salience (11)]. Here, we asked if, instead, a general mechanism (a known property of value systems) could explain how both the value of the drug, and nondrug value, might shift in the aftermath of opioid exposure. In this view, a dysfunctional valuation system due to difficulties in range adaptation could alter drug-seeking directly, because of the absolute and high value assigned to the drug, or indirectly through the experience of apathy and suboptimal social, health-related, or economic decisions, because of diminished sensitivity to differences between smaller (typically nondrug) rewards.
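
To make the dollar example above concrete, consider a minimal worked illustration that assumes value is rescaled linearly to the range of the current context. This is an idealization used only for exposition, not the model fitted in this study, and the symbols $v$, $v_{min}$, and $v_{max}$ are introduced solely for this illustration:

$$v_{norm} = \frac{v - v_{min}}{v_{max} - v_{min}}, \qquad \frac{0.75 - 0}{1 - 0} = 0.75 \ \text{(narrow context)}, \qquad \frac{0.75 - 0}{10 - 0} = 0.075 \ \text{(wide context)}$$

Under this assumption, the same $0.75 option sits at 75% of the narrow range but at only 7.5% of the wide range.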

To quantify range adaptation, we leveraged a reinforcement learning task and model previously shown to robustly capture context-dependent behavior in healthy human choosers (30). We focused on treatment-engaged individuals as this group reflects the more severe end of the spectrum, and because current treatments for OUD do not fully mitigate craving and reuse (15, 16). Thus, identifying a potentially novel vulnerability mechanism in this group would be highly significant. Further, because people abstinent for at least 90 days are considered “stabilized” given significant reduction in their symptoms and risk for reuse (37-41), and can thus qualify for early remission status, we used 90 days as a clinically-relevant cut-off for subgrouping participants. Lastly, as abstinence effects in range adaptation could arise from changes that signal a move away/toward a premorbid, normative neurocognitive pattern, or from a shift away from such a reference point to support active maintenance of current clinical state (42), we included a third (comparison) group of control individuals.

We tested whether participants in each of the three groups relied on a range-adapted value scale (RANGE model) or a context-independent scale (ABSOLUTE model) in forming their estimates of value during reinforcement learning, and how recent drug use affected the strategy used. Specifically, we asked if people with chronic opioid addiction are more or less sensitive to the range of available reward, and, crucially, if (and how) range adaptation is modulated by recent opioid use. Given prior literature showing altered reinforcement learning and decision processes in OUD in tasks that did not explicitly manipulate context (43-46), we also assessed for differences in lower-level, context-independent reward and probability sensitivity to determine specificity to context-dependent choice.

Methods and Materials

Participants.

We enrolled n=56 individuals with OUD undergoing outpatient treatment (primarily with suboxone/buprenorphine) and n=44 socio-demographically similar controls (N=100). All OUD participants met DSM-defined criteria for OUD, as determined by clinic staff and obtained from patient charts, that was in the moderate to severe range on average at study entry (Table 1). Other substance use was not exclusionary, but opioids were identified as primary. See Supplement for full inclusion/exclusion criteria and methods for ascertaining recent use status. Due to technical issues, two datasets were corrupted, leaving n=54 OUD participants (age=47.50 [SE=1.59] years, 20 females) and n=44 controls (age=49.27 [SE=2.19] years, 22 females) for analysis. All participants provided written informed consent following Rutgers University IRB-approved procedures.

Table 1.

Demographic and clinical characteristics of the study sample.

| Characteristic | OUD, <90-day abstinent (N=34) | OUD, ≥90-day abstinent (N=20) | Controls (N=44) | Test | P |
|---|---|---|---|---|---|
| Demographic | | | | | |
| Age (years) | M=49.9 SE=1.8 | M=43.4 SE=2.9 | M=49.3 SE=2.2 | F2,95=1.84 | 0.21 |
| Sex: | | | | χ2=3.84 | 0.15 |
| Male | 24 | 10 | 22 | | |
| Female | 10 | 10 | 22 | | |
| Race: | | | | χ2=13.52 (CTR<90 OUD) | 0.04 |
| Asian | 0 | 0 | 5 | | |
| Black or African American | 20 | 7 | 15 | | |
| White | 9 | 11 | 21 | | |
| Other or More Than One Race | 5 | 2 | 3 | | |
| Ethnicity | | | | χ2=0.99 | 0.61 |
| Hispanic or Latino | 3 | 4 | 5 | | |
| Non-Hispanic or Latino | 30 | 16 | 39 | | |
| Education: | | | | χ2=46.97 (CTR>all OUD) | 9.56×10−7 |
| Primary complete | 3 | 3 | 0 | | |
| High school complete/GED | 18 | 10 | 3 | | |
| Some college | 10 | 5 | 13 | | |
| College complete | 1 | 1 | 22 | | |
| Some graduate | 0 | 0 | 3 | | |
| Graduate complete | 1 | 1 | 3 | | |
| Nonverbal IQ (K-BIT) A | M=88.8 SE=2.8 | M=94.5 SE=2.4 | M=103.5 SE=2.4 | F2,91=8.99 (CTR>all OUD) | 0.0003 |
| Numeracy module of HHS (0–6) B | M=3.0 SE=0.2 | M=3.2 SE=0.3 | M=4.0 SE=0.2 | F2,93=6.68 (CTR>all OUD) | 0.0019 |
| Working memory (Digit Span-Rev.) C | M=5.43 SE=0.37 | M=5.88 SE=0.59 | M=6.28 SE=0.27 | F2,83=1.38 | 0.26 |
| Income (monthly) | $0–$11200 (Median=$680) | $0–$9000 (Median=$477) | $0–$10000 (Median=$1988) | Kruskal-Wallis χ2=9.11 (CTR>all OUD) | 0.01 |
| Psychiatric & Substance Use | | | | | |
| Depression (BDI-II, 0–63) D | M=16.8 SE=2.0 | M=19.6 SE=2.7 | M=6.7 SE=1.3 | F2,91=14.06 (CTR<all OUD) | 4.8×10−6 |
| Anxiety (STAI-T, 20–80) E | M=44.8 SE=1.9 | M=45.9 SE=2.3 | M=33.9 SE=1.7 | F2,92=12.69 (CTR<all OUD) | 1.4×10−5 |
| Nicotine use (FTND score, 0–10) F | n=25 current smokers; M=6.4 SE=0.3 | n=11 current smokers; M=6.8 SE=0.4 | n=3 current smokers; M=5.3 SE=0.1 | F2,37=0.92 | 0.41 |
| Alcohol use (Lifetime years) G | M=22.5 SE=2.6 | M=16.4 SE=3.0 | M=20.8 SE=3.0 | F2,93=0.85 | 0.43 |
| Cocaine use (Lifetime years) H | n=29; M=12.4 SE=1.9 | n=18; M=8.4 SE=2.1 | | t51=1.34 | 0.19 |
| Any opioid use (Lifetime years) H | M=22.1 SE=2.2 | M=13.8 SE=2.3 | -- | t51=2.50 | 0.016 |
| Intravenous use (%) I | 30.3% | 23.8% | -- | χ2=0.35 | 0.56 |
| History of overdose (%) J | 54.6% | 42.9% | -- | χ2=1.46 | 0.23 |
| HCV positive status (%) | 24.2% | 19.1% | -- | χ2=0.20 | 0.65 |
| HIV positive status (%) | 3.0% | 4.8% | -- | χ2=0.09 | 0.76 |
| Duration of current treatment (months) | 0.4–359.3 (Median=13.2 months) | 0.7–67.8 (Median=10.6 months) | -- | Wilcoxon Test z=0.97 | 0.33 |
| Treatment medication: | | | -- | χ2=3.56 | 0.17 |
| Suboxone/Buprenorphine | 28 | 15 | | | |
| Methadone | 6 | 3 | | | |
| Vivitrol/Naltrexone | 0 | 2 | | | |
| Suboxone/Buprenorphine dose | M=20.7 SE=1.1 | M=13.1 SE=1.8 | -- | t41=3.86 | 3.94×10−4 |
| Methadone dose | M=95.8 SE=8.8 | M=85.3 SE=40.6 | | t7=0.36 | 0.73 |
| Morphine equivalents (mg) K | M=563.0 SE=35.1 | M=369.3 SE=49.4 | | t50=3.22 | 0.0023 |
| Duration of opioid abstinence (days) L | 0–78 (Median=2 days) | 90–4421 days (Median=393 days) | -- | Wilcoxon Test z=−6.16 | 7.24×10−10 |
| OUD severity (DSM5 checklist) M | M=7.87 SE=0.63 | M=5.00 SE=1.04 | -- | t50=2.45 | 0.02 |
| Current craving (HCQ-NOW, 1–7) N | M=4.15 SE=1.3 | M=0.43 SE=0.3 | -- | t51=4.66 | 2.28×10−5 |
| Current withdrawal (SOWS, 0–64) | M=8.24 SE=1.3 | M=7.5 SE=1.7 | | t52=0.35 | 0.73 |
| Current anxiety (STAI-S, 20–80) O | M=37.41 SE=1.8 | M=35.50 SE=3.1 | M=31.91 SE=1.8 | F2,93=2.10 | 0.13 |
A. Kaufman Brief Intelligence Test (K-BIT). Normative data suggest scaled scores between 85–115 constitute the average nonverbal IQ of the population. K-BIT scores are missing for n=2 <90-day abstinent OUD participants and n=3 controls.

B. Numeracy module of the Health and Human Services (HHS) survey. Numeracy scores are missing for n=1 <90-day abstinent OUD participant and n=1 control.

C. Working memory test - Digit Span reverse (DSMT). DSMT-reverse score missing for n=4 <90-day abstinent OUD participants, n=3 ≥90-day abstinent OUD participants, and n=5 controls.

D. Beck Depression Inventory (BDI-II). Depression severity cut-offs for the BDI-II are as follows: 0–13 minimal, 14–19 mild, 20–28 moderate, and 29–63 severe. BDI-II score is missing for n=2 <90-day abstinent OUD participants and n=1 control.

E. State-Trait Anxiety Inventory-Trait (STAI-T). Anxiety severity cut-offs for the STAI-T are as follows: 20–37 no or low anxiety, 38–44 moderate anxiety, and 45–80 high anxiety. STAI-T scores are missing for n=3 <90-day abstinent OUD participants and n=1 control.

F. Fagerström Test for Nicotine Dependence (FTND) in current smokers: n=25 <90-day abstinent OUD participants, n=12 ≥90-day abstinent OUD participants, and n=3 controls.

G. Years of use of alcohol from the ASI. Data missing for n=1 <90-day abstinent OUD participant and n=1 control.

H. Years of use of cocaine and any opioid use (heroin, painkillers) from the ASI. Data missing for n=1 <90-day abstinent OUD participant.

I. Intravenous use is missing for n=1 <90-day abstinent OUD participant.

J. History of overdose is missing for n=3 <90-day abstinent OUD participants.

K. Morphine milligram equivalent (MME) for treatment medications. 1 mg of buprenorphine is equivalent to 30 mg of morphine, and 1 mg of methadone is equivalent to 3 mg of morphine. Average MME estimates omit n=2 ≥90-day abstinent OUD participants on naltrexone.

L. Duration of current treatment (months). Treatment duration missing for n=3 <90-day abstinent OUD participants.

M. DSM5 checklist total score missing for n=2 <90-day abstinent OUD participants.

N. Current craving defined as the average of the “craving now” and “urge now” measures from HCQ-NOW. Craving measure missing for n=1 <90-day abstinent OUD participant.

O. State-Trait Anxiety Inventory-State (STAI-S). Current anxiety measure missing for n=2 <90-day abstinent OUD participants.
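
The morphine-equivalent conversion described in footnote K can be sketched in a few lines of Python. This is an illustration only: the function and dictionary names are hypothetical, doses are assumed to be in mg (Table 1 does not state units), and applying the conversion factors to group-mean doses only approximates a group-mean MME that was presumably computed per participant.

```python
# Conversion factors quoted in footnote K (mg of morphine per mg of medication).
MME_PER_MG = {"buprenorphine": 30.0, "methadone": 3.0}


def morphine_equivalent(medication: str, dose_mg: float) -> float:
    """Return the morphine milligram equivalent (MME) of a dose (assumed mg)."""
    return dose_mg * MME_PER_MG[medication]


# Group-mean doses and medication counts for the <90-day abstinent group (Table 1).
bup_mme = morphine_equivalent("buprenorphine", 20.7)
mth_mme = morphine_equivalent("methadone", 95.8)

# Approximate group mean, weighting by the number of participants on each medication.
weighted_mme = (28 * bup_mme + 6 * mth_mme) / (28 + 6)
print(round(weighted_mme, 1))
```

Under these assumptions the weighted estimate lands close to the M=563.0 reported for this group in Table 1.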

We recruited OUD individuals who, despite being treatment-engaged, had consumed opioids (heroin, nonprescription opioid analgesics), either recently or in the more distant past. While different abstinence durations are used to define clinically vulnerable periods, a minimum of 60-90 days is typically considered clinically stabilized (38, 41), with 90 days required for early remission status. Further, the path to stabilization within this period can be nonlinear (37, 47). As we did not recruit sufficient numbers of participants for intermediate abstinence bins, we divided participants into two groups for our primary analyses based on self-reported days since last use (which had good concordance in our sample with urinalysis results, ~88%): a <90-day group (n=34, range: 0–78 days, median=2 days) and a ≥90-day group (n=20; range: 90–4421 days, median=418 days).

Control participants were drawn from the same geographic area and matched on age, sex, and ethnicity to patients, but differed on other sociodemographic variables, nonverbal IQ and numeracy (although notably, not working memory), and depression/anxiety (Table 1). Also, in addition to recent use, the OUD subgroups differed from each other on years of use and current OUD severity and medication dose (which index, in part, current use severity).

Behavioral task.

Participants completed a task, validated for the remote and in-person settings (30), involving choices aimed at maximizing an overall point total across three phases (learning, transfer, and explicit; Fig. 1A). The task was administered through an online portal with an experimenter present (see Supplement for details). Participants were instructed to choose options likely to yield more points, and that one of the two options in each pair would, on average, yield more points than the other. In each trial of the learning phase, the points associated with both the chosen and unchosen options were shown. In all other phases, no point outcomes were shown. Compensation was $10-15/hour plus a bonus based on the cumulative points earned across the three phases at an exchange rate of 10 pts.=$0.05.

Fig. 1. Experimental design.

(A) Following clinical state questionnaires and a short instructions and training phase with the experimenter, participants completed three task phases: learning, transfer, and explicit phase. (B) During the learning phase, abstract cues were paired to form two expected value (EV) contexts: WIDE (ΔEV=5) and NARROW (ΔEV=0.5) that were presented in four blocks (two of each context in fully randomized order). Participants learned by trial-and-error which was the better (more valuable) option in each pair in each context while receiving complete feedback on the chosen and unchosen options in each trial. (C) During the transfer phase, cues were rearranged to form four new pairs (presented in four blocks in randomized order), with the ΔEV=1.75 pair being key in testing model predictions. As there was no additional feedback in this phase, participants had to extrapolate cue values acquired during the learning phase. If these values are learned as range-adapted values, performance for this pair is predicted to be below-chance level (Fig. 2). Finally, during the explicit phase, we probed choice for explicitly stated probabilistic outcomes in the EV range available in the transfer phase to assess lower-level decision processes related to probability and magnitude sensitivity. All trials were self-paced and separated by a 500 ms delay. Outcomes were presented as points in the learning phase and “?” in the transfer phase for 1000 ms.

The task was adapted from Exp. 7 in Bavard et al. (30), who systematically compared different designs to demonstrate that healthy human participants acquire context-dependent values during reinforcement learning, best explained by a range adaptation process. The design in Exp. 7 was chosen here because it robustly induced/captured this process while minimizing task difficulty (of relevance for studies of chronic OUD). In the learning phase, participants learned the expected values (EV) of 4 cue pairs in one of two contexts: NARROW (ΔEV=0.5 [75% 1 pt. vs. 25% 1 pt.]) and WIDE (ΔEV=5.0 [75% 10 pts. vs. 25% 10 pts.]; Fig. 1B), with 30 trials per pair presented in blocks (order randomized). If participants encoded range-adapted values rescaled to the local context (block), they should show comparable choice accuracies in both contexts (Fig. 2A). In contrast, encoding absolute (context-independent) values would result in lower accuracy in the NARROW context, where smaller EVs are harder to distinguish on an absolute scale compared to the larger EVs.
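
The logic of the learning-phase prediction can be illustrated with a short sketch. It assumes that each cue pays either the context's magnitude or 0 points, and it uses an idealized full range adaptation (division by the context's maximum outcome) rather than the fitted RANGE model; all names are hypothetical.

```python
# Learning-phase contexts as described above: the better cue pays the context's
# magnitude with probability 0.75, the worse cue with probability 0.25.
# Assumption: the alternative outcome is 0 points.
CONTEXTS = {"NARROW": 1.0, "WIDE": 10.0}  # points per rewarded trial
P_BETTER, P_WORSE = 0.75, 0.25


def expected_value(p: float, magnitude: float) -> float:
    return p * magnitude


def delta_ev(context: str) -> float:
    """Objective EV difference between the better and worse cue."""
    m = CONTEXTS[context]
    return expected_value(P_BETTER, m) - expected_value(P_WORSE, m)


def range_scaled_delta_ev(context: str) -> float:
    """EV difference after idealized rescaling by the context maximum."""
    return delta_ev(context) / CONTEXTS[context]


for name in CONTEXTS:
    print(name, delta_ev(name), range_scaled_delta_ev(name))
```

On an absolute scale the two contexts differ tenfold in ΔEV (0.5 vs. 5), whereas after rescaling the difference is the same in both, which is why range adaptation predicts comparable accuracy across contexts.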

Fig. 2. Model predictions.

(A) In the learning phase, range adaptation (filled circles) is expected to account for similar performance in the WIDE (gray) and NARROW contexts (green) by permitting increased sensitivity to discriminate the smaller value differences in the NARROW context relative to an absolute value encoding strategy (open circles). (B) In the transfer phase (see Fig. 1C), range adaptation is expected to account for choice errors when the EV=2.5 cue from the WIDE context (gray) is paired with the EV=0.75 cue from the NARROW context (green) to form a new ΔEV=1.75 pair. A schematic of how range adaptation relates objective to subjective value in a finite system is shown in (B), and aggregate-level choice based on model predictions is shown in (A and C).

Although generally adaptive for value coding, range adaptation can, under some circumstances, induce systematic choice errors (29, 30, 48-50) (Fig. 2B). We exploited this paradoxical ‘downside’ as a converging test of our hypotheses in the transfer phase. Here, cues were rearranged to create 4 new pairs (Fig. 1C; 30 trials per pair presented in blocks without feedback). Participants were instructed that the cues would maintain their values from the first phase but would be presented in new combinations without feedback. Among the new pairs, one is especially revealing of range adaptation. This “diagnostic” pair combines the lower EV cue from the WIDE context (EV=2.5) with the higher EV cue from the NARROW context (EV=0.75), yielding ΔEV=1.75. In this pair, having learned range-adapted values predicts participants will (erroneously) choose the objectively lower-value cue (Fig. 2C).
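
The prediction for the diagnostic pair can be sketched as follows. The sketch again assumes idealized full range adaptation (each cue's EV divided by the maximum outcome of its learning context). The cue labels are hypothetical, and the composition of the ΔEV=7.25 reference pair is inferred from the EVs (7.5 vs. 0.25) rather than stated in the text.

```python
# (expected value, maximum outcome of the learning context) for each cue.
CUES = {
    "WIDE_high": (7.5, 10.0),
    "WIDE_low": (2.5, 10.0),
    "NARROW_high": (0.75, 1.0),
    "NARROW_low": (0.25, 1.0),
}


def absolute_value(cue: str) -> float:
    return CUES[cue][0]


def range_adapted_value(cue: str) -> float:
    ev, context_max = CUES[cue]
    return ev / context_max  # idealized rescaling to the 0-1 interval


def preferred(pair, value_fn):
    """Return the cue with the higher subjective value under value_fn."""
    return max(pair, key=value_fn)


diagnostic = ("WIDE_low", "NARROW_high")  # delta EV = 1.75
reference = ("WIDE_high", "NARROW_low")   # delta EV = 7.25 (inferred composition)

for pair in (diagnostic, reference):
    print(pair, preferred(pair, absolute_value), preferred(pair, range_adapted_value))
```

The two coding schemes agree on the reference pair but disagree on the diagnostic pair, where range-adapted values favor the objectively worse NARROW cue.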

Finally, in the explicit phase, as a control task, participants made binary choices based on explicit (fully-described) reward magnitudes and probabilities that matched those encountered in the transfer phase, including the diagnostic ΔEV=1.75 pair (see Supplement, where we also considered a related static ‘UTILITY’ model).

RL models.

To formally model range adaptation, we used variations on a simple Q-learning algorithm to estimate learning in each context (or state) of the learning phase to maximize the expected reward $Q$. At trial $t$, option values in the current state $s$ are updated with a delta rule:

$$Q_{t+1}(s,c) = Q_t(s,c) + \alpha_c \delta_t$$

and

$$Q_{t+1}(s,u) = Q_t(s,u) + \alpha_u \delta_t$$

where $\alpha_c$ is the factual learning rate for the chosen option and $\alpha_u$ is the counterfactual learning rate for the unchosen option. $\delta_t$ is the prediction error calculated as:

$$\delta_t = R_t - Q_t(s)$$

where $R_t$ is the reward term, which differs based on the model. We modeled participants’ choices using a Softmax decision rule to determine, on a trial-by-trial basis, the probability of choosing one option $a$ over the other option $b$ and where $\beta$ is the temperature parameter which controls choice stochasticity:

$$P_t(s,a) = \frac{1}{1 + e^{[Q_t(s,b) - Q_t(s,a)]\beta}}$$

We then compared two models that make theoretically distinct predictions about $R_t$: the ABSOLUTE model, which assumes rewards are encoded on an absolute scale independent of context (i.e., their objective value):

$$R_t \in \{0, 1, 10\}$$

and the RANGE model in which rewards are normalized using the maximum reward $R_{MAX}$ available in the local context (NARROW or WIDE; note $R_{MIN}$ is 0 for both), rescaling outcomes between 0 and 1:

$$R_{RAN,t} = \frac{R_t}{R_{MAX,t}(s) - R_{MIN,t}(s) + 1}$$

with the update of $R_{MAX}$ being progressive based on a contextual-updating learning rate $\alpha_r$ and the maximal reward obtained in that context so far:

$$R_{MAX,t+1} = R_{MAX,t} + \alpha_r \left( \max(R) - R_{MAX,t} \right)$$

For both models, we used a constrained method to optimize parameters at the individual level, with $0 \le \alpha_c \le 1$; $0 \le \alpha_u \le 1$; $0 \le \alpha_r \le 1$; and $0 \le \beta < +\infty$. Models were fit using maximum-likelihood estimation in MATLAB.
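
The two models can be sketched as a single likelihood function. This is a minimal illustration in Python, not the MATLAB code used in the study. Several details are assumptions rather than facts from the text: Q-values and $R_{MAX}$ start at 0, $R_{MAX}$ is updated before the current outcomes are normalized, each option's prediction error uses that option's own outcome, and the function name and trial format are hypothetical.

```python
import math


def negative_log_likelihood(params, trials, model="RANGE"):
    """Negative log-likelihood of one participant's learning-phase choices.

    params: (alpha_c, alpha_u, beta) for ABSOLUTE; RANGE adds alpha_r.
    trials: iterable of (state, chosen, unchosen, r_chosen, r_unchosen),
            with outcomes in points (0, 1, or 10).
    """
    alpha_c, alpha_u, beta = params[:3]
    alpha_r = params[3] if model == "RANGE" else 0.0
    q = {}         # Q-values keyed by (state, option); assumed to start at 0
    r_max = {}     # running R_MAX estimate per state; assumed to start at 0
    max_seen = {}  # largest outcome observed so far in each state
    nll = 0.0
    for state, chosen, unchosen, r_c, r_u in trials:
        q_c = q.get((state, chosen), 0.0)
        q_u = q.get((state, unchosen), 0.0)
        # Softmax (logistic) probability of the option that was chosen;
        # the exponent is capped to avoid floating-point overflow.
        exponent = min((q_u - q_c) * beta, 700.0)
        p_chosen = 1.0 / (1.0 + math.exp(exponent))
        nll -= math.log(max(p_chosen, 1e-300))
        if model == "RANGE":
            # Progressive update of R_MAX toward the maximal reward seen so far.
            max_seen[state] = max(max_seen.get(state, 0.0), r_c, r_u)
            m = r_max.get(state, 0.0)
            m += alpha_r * (max_seen[state] - m)
            r_max[state] = m
            # Normalization with R_MIN = 0: R / (R_MAX - R_MIN + 1).
            r_c, r_u = r_c / (m + 1.0), r_u / (m + 1.0)
        # Delta-rule updates for the chosen and unchosen options.
        q[(state, chosen)] = q_c + alpha_c * (r_c - q_c)
        q[(state, unchosen)] = q_u + alpha_u * (r_u - q_u)
    return nll


# Hypothetical data: 20 trials in which option "A" is chosen and pays 10 points.
example_trials = [("WIDE", "A", "B", 10, 0)] * 20
nll_abs = negative_log_likelihood((0.5, 0.5, 1.0), example_trials, model="ABSOLUTE")
nll_rng = negative_log_likelihood((0.5, 0.5, 1.0, 0.5), example_trials, model="RANGE")
print(nll_abs, nll_rng)
```

In practice such a function would be minimized per participant with a bounded optimizer that respects the parameter constraints listed above.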

Statistical analysis.

For model-free analyses, we compared choice behavior across the three groups in each phase (learning, transfer, explicit) using generalized linear mixed-effects models (fitglme in MATLAB). These models included fixed-effects predictors for group, trial number, and ΔEV (learning only), and random intercepts and slopes per participant. We employed F-tests (marginal tests) to assess the significance of fixed-effects coefficients, and followed up with pairwise t-tests. Model-based analyses involved model comparison using Akaike Information Criterion (AIC) to evaluate the evidence supporting RANGE-adapted vs. ABSOLUTE value representations (with participants classified as better fit by one or the other model based on ΔAIC). We also report results using Bayesian Information Criterion (BIC), which penalizes models with more parameters more strictly. Complementary analyses examined continuous associations with days since last use and clinical indicators of opioid abstinence (i.e., craving and withdrawal). Lastly, control analyses assessed the influence of clinical/sociodemographic variables that differed between groups (Table 1), use of other substances, and task setting (remote/in-person), as well as alternative explanations of the data by considering other models and behavior in the control (explicit phase) task.
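
The model comparison step can be sketched with the standard AIC and BIC definitions. The parameter counts (3 for ABSOLUTE, 4 for RANGE) follow the parameters listed in the RL models section, and 120 learning trials follows from 4 pairs of 30 trials. The sign convention for ΔAIC (positive favors RANGE) and the example likelihood values are assumptions made for illustration.

```python
import math


def aic(nll: float, k: int) -> float:
    """Akaike Information Criterion from a negative log-likelihood."""
    return 2 * k + 2 * nll


def bic(nll: float, k: int, n_trials: int) -> float:
    """Bayesian Information Criterion; the penalty grows with log(n_trials)."""
    return k * math.log(n_trials) + 2 * nll


def delta_aic(nll_abs: float, nll_range: float, k_abs: int = 3, k_range: int = 4) -> float:
    """AIC(ABSOLUTE) - AIC(RANGE); positive values favor RANGE (assumed convention)."""
    return aic(nll_abs, k_abs) - aic(nll_range, k_range)


def delta_bic(nll_abs: float, nll_range: float, n_trials: int = 120,
              k_abs: int = 3, k_range: int = 4) -> float:
    """BIC(ABSOLUTE) - BIC(RANGE); positive values favor RANGE (assumed convention)."""
    return bic(nll_abs, k_abs, n_trials) - bic(nll_range, k_range, n_trials)


# Hypothetical negative log-likelihoods for one participant.
d_aic = delta_aic(70.0, 65.0)
d_bic = delta_bic(70.0, 65.0)
label = "RANGE" if d_aic > 0 else "ABSOLUTE"
print(d_aic, d_bic, label)
```

Because the RANGE model carries one extra parameter, the same likelihood advantage yields a smaller margin under BIC than under AIC.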

Results

Choice behavior.

During the learning phase (Fig. 3), all three groups successfully acquired the cue-reward associations. Controls and ≥90-day abstinent OUD participants had similar performance in the NARROW (ΔEV=0.5) and WIDE (ΔEV=5) contexts, supporting range adaptation (Fig. 2 and Figs. S1-2 for direct comparison to Bavard et al. Exp. 7). However, those with recent use (<90-day abstinent group) demonstrated reduced performance in the NARROW context, indicative of absolute-value encoding instead. The generalized mixed-effects model examining trial-by-trial correct choices showed accuracy increased with trial number (F1,11628=14.17, P=0.0002) and differed overall across groups (F2,11628=4.25, P=0.01). Crucially, there were significant interaction effects for group × context (F2,11628=6.09, P=0.002) and trial number × group × context (F2,11628=3.32, P=0.04; all other main or interaction effects, P>0.17, one influential outlier censored, >4 SDs from group mean). Follow-up t-tests showed that accuracy was selectively lower in the NARROW context compared to the WIDE context in those with recent use (t33=−2.53, P=0.02, d=0.43), did not differ in controls (t43=0.75, P=0.46, d=0.11), and favored the reverse (even better learning in the NARROW context) in ≥90-day abstinent participants (t18=−2.42, P=0.03, d=0.56; Fig. 3). Accuracy was also lower in the NARROW context in those with recent use compared to ≥90-day abstinent participants (t51=−2.36, P=0.02, d=0.70; WIDE: P=0.93). Neither OUD subgroup differed from controls in either the NARROW or WIDE context (|t|<1.35, P>0.18).

Fig. 3. Choice behavior during the learning phase.

(A-C) Trial-by-trial and (D-F, left) average performance (% choice accuracy, possible range: 0 to 100) in the learning phase for each reward context, NARROW and WIDE. (D-F, right) Difference in average performance between the WIDE and NARROW contexts (ΔPerformance; possible range: −100 to 100). (A, D) Data for n=34 OUD participants abstinent from opioids for <90 days, (B, E) n=20 OUD participants abstinent from opioids ≥90 days, and (C, F) n=44 healthy community controls. Dots represent individual participant data and shaded areas represent group density with box plots for 95% CI and error bars for ± SE. (D-F, left) The dotted line is chance level (50%). (D-F, right) The solid line is null performance difference between the WIDE and NARROW contexts. One influential outlier (>4 SDs) out of range in panel E, right (applying a >3 SDs outlier threshold led to similar overall results). *P<0.05.

Also supporting range adaptation, controls and ≥90-day abstinent OUD participants, but not those with recent use, made systematic errors during the transfer phase when ΔEV=1.75 (Fig. 4). As expected, given no feedback was provided to facilitate new learning, choice accuracies remained relatively constant with trial number (F1,2934=2.52, P=0.11) and differed overall across groups, albeit only at trend-level (F2,2934=2.70, P=0.067; trial number × group, P=0.13). Planned t-tests showed that accuracy was below chance-level in controls (t43=−2.31, P=0.03, d=0.35) and in ≥90-day abstinent participants (t19=−2.97, P=0.008, d=0.66) but did not differ from chance in those with recent use (t33=−0.65, P=0.52, d=0.11). Accuracy also tended to be lower in ≥90-day abstinent participants compared to those with recent use (t52=1.75, P=0.087, d=0.50). Neither OUD subgroup differed significantly from controls (|t|<1.06, P>0.29). For comparison, we also plot the data in Fig. 4 when ΔEV=7.25 where systematic choice errors are not expected or observed (see Fig. S3 for data across all transfer pairs). Planned t-tests indicated that all three groups performed at or above chance level when ΔEV=7.25 (t97=3.48, P=7.63×10−4; group effect: F1,95=1.92, P=0.15).
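
The effect sizes reported for the tests against chance are consistent with the standard one-sample conversion d = |t| / sqrt(n). The text does not state how d was computed, so the sketch below is a consistency check under that assumption, with a hypothetical function name.

```python
import math


def cohens_d_from_t(t: float, n: int) -> float:
    """One-sample (or paired) Cohen's d recovered from a t statistic and sample size."""
    return abs(t) / math.sqrt(n)


# t statistics and group sizes reported above for the delta EV = 1.75 pair.
d_controls = cohens_d_from_t(-2.31, 44)
d_abstinent = cohens_d_from_t(-2.97, 20)
d_recent = cohens_d_from_t(-0.65, 34)
print(round(d_controls, 2), round(d_abstinent, 2), round(d_recent, 2))
```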

Fig. 4. Choice behavior during the transfer phase for ΔEV=1.75 cue pair “diagnostic” of range adaptation and reference ΔEV=7.25 cue pair.

(A-C) Trial-by-trial and (D-F) average performance (% choice accuracy, possible range: 0 to 100) in the transfer phase. (A, D) Data for n=34 OUD participants abstinent from opioids for <90 days, (B, E) n=20 OUD participants abstinent from opioids ≥90 days, and (C, F) n=44 healthy community controls. Dots represent individual participant data and shaded areas represent group density with box plots for 95% CI and error bars for ± SE. The dotted line is chance level (50%). *P<0.05, **P<0.01, ***P<0.001.

RL model comparison.

We modeled participants’ choices during learning with two models that differed in their assumption about how value is encoded: on an absolute scale (ABSOLUTE model) or range-adapted scale based on the local context (RANGE model). Comparison of model fits based on AIC scores confirmed that the RANGE model outperformed the ABSOLUTE model (average ΔAIC=7.97 in favor of RANGE; t97=−6.22, P=1.25×10−8, d=0.63). However, while the behavior of most controls (75%) and ≥90-day abstinent participants (75%) was better fit by the RANGE model, this was the case for only 38% of those with recent use (χ2=12.77, P=0.002, Cramer's V=0.36; Fig. 5A), consistent with the observed choice patterns in the raw data (Figs. 3-4; see also Fig. S4 for out-of-phase predictions where models fit to learning are used to predict transfer behavior). Model comparison using BIC instead of AIC, or via Bayesian model selection, led to similar conclusions (Fig. S5). Model recovery analysis further confirmed reasonably good recovery of both models (Fig. S6). Figs. S7-S8 show the best-fitting parameters from each model. Group differences were observed only in the contextual learning rate (αr), which influences the normalization speed for partial or full range adaptation (F2,95=3.23, P=0.04). αr was higher in ≥90-day abstinent participants (t62=2.45, P=0.017) and tended to be higher in those with recent use (t76=1.77, P=0.07) compared to controls, with no significant differences between the two OUD subgroups (t52=0.76, P=0.45).
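
The group comparison of model classifications can be reproduced approximately from the reported percentages. The counts below are back-calculated from those rounded percentages (38% of 34, 75% of 20, 75% of 44) and are therefore an assumption, not data taken from the study. The test is a standard Pearson chi-square on the resulting 3 × 2 table, and the function names are hypothetical.

```python
import math

# (better fit by RANGE, better fit by ABSOLUTE) per group.
# Assumption: counts reconstructed from the rounded percentages in the text.
counts = {
    "<90-day abstinent OUD": (13, 21),
    ">=90-day abstinent OUD": (15, 5),
    "controls": (33, 11),
}


def chi_square(table):
    """Pearson chi-square statistic and total N for a dict of row tuples."""
    rows = list(table.values())
    n = sum(sum(row) for row in rows)
    col_totals = [sum(row[j] for row in rows) for j in range(len(rows[0]))]
    stat = 0.0
    for row in rows:
        row_total = sum(row)
        for j, observed in enumerate(row):
            expected = row_total * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat, n


def cramers_v(stat: float, n: int, n_rows: int, n_cols: int) -> float:
    """Cramér's V effect size for an n_rows x n_cols contingency table."""
    return math.sqrt(stat / (n * min(n_rows - 1, n_cols - 1)))


stat, n = chi_square(counts)
v = cramers_v(stat, n, n_rows=3, n_cols=2)
print(round(stat, 2), round(v, 2))
```

With these reconstructed counts, the statistic and effect size agree with the reported χ2=12.77 and V=0.36 to two decimal places.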

Fig. 5. Influence of abstinence duration on best model.

(A) Proportion of participants better fit by RANGE (gray) vs. ABSOLUTE (white) model based on ΔAIC in n=34 OUD participants abstinent from opioids for <90 days, n=20 OUD participants abstinent from opioids ≥90 days, and n=44 healthy community controls. (B) Continuous relationship between abstinence duration and reliance on RANGE vs. ABSOLUTE model, as captured by ΔAIC, across all participants with OUD (n=54). The shade of each point represents the abstinence duration bin for that participant. Coefficients for the same analysis with craving and withdrawal are provided as inset text. **P<0.01.

Continuous association with clinical state.

The propensity of OUD participants to learn and represent range-adapted values (captured by ΔAIC scores favoring the RANGE model) increased with number of abstinence days (Rs=0.39, P=0.004; Fig. 5B), such that it was highest among those abstinent for ≥90 days to ≥1 year and lowest among those with very recent opioid exposure (≤72 hours). A similar relationship was separately observed with craving (Rs=−0.29; P=0.04) and withdrawal (Rs=−0.29, P=0.03) suggesting range adaptation was also more common among less-symptomatic individuals.

Potential confounds and alternative mechanisms.

Control analyses assessed how group differences in socioeconomic and clinical variables, apart from recent opioid use (Table 1), and task setting, affected our main behavioral indices of range adaptation. None of these variables contributed significantly, and statistically controlling for them did not alter any of our results or interpretations (see Supplement).

To confirm the observed group differences in the learning and transfer phases were also not due to lower-level differences in reward magnitude or probability sensitivity, we (1) evaluated an alternative model applied to the learning phase (UTILITY; Supplement) and (2) analyzed choices in the explicit phase. The UTILITY model assumes that participants learn static ‘utilities’ (context-independent nonlinear representations of reward magnitude). Bavard et al. (30) found the RANGE and UTILITY models were confusable in Exp. 7 (the design used here), but could be distinguished in experiments that provided feedback during transfer, allowing the dynamic vs. static nature of the models to tease apart their influence. As in that study, the UTILITY model explained the learning phase behavior well in our data, even outperforming the RANGE model, but captured the transfer behavior less accurately. Further, there were no differences in the proportion of participants better fitted by RANGE vs. UTILITY across groups (Supplement and Fig. S9), suggesting differences in static utility representations did not fully account for the behavioral pattern observed with opioid exposure. Moreover, the groups did not differ in their reward magnitude and probability sensitivity in the explicit task (Fig. S10), making an explanation in terms of utility differences even less tenable. All participants were sensitive to ΔEV, performing at or above chance level when ΔEV=1.75 (key-transfer pair equivalent), and to magnitude and probability level when each was separately varied, with no significant group differences. Taken together with the overall superiority of the RANGE model in this class of tasks (30), these results favor an explanation of opioid exposure effects in terms of differences in range adaptation.

Discussion

We found that recent opioid use and associated craving/withdrawal impede range adaptation when representing value in contexts that differ in the available reward range. This reduced tendency to range-adapt in recent users signals difficulty in flexibly fine-tuning representations of a current reward’s value to better place it within the range of available outcomes, and was more pronounced the less time had elapsed since opioid exposure. Our results offer novel insights into the role of range adaptation in human drug addiction, suggesting a time-limited alteration of this fundamental process linked to opioid exposure in people with OUD.

To quantify range adaptation, we employed a model-based approach and a task designed to induce robust contextual modulation of value. Replicating prior findings in healthy young adults (30), we found healthy participants in the present study, as well as people with OUD who had abstained from opioids ≥90 days, reliably range-adapted. This was captured by a pattern of choice behavior during the learning and transfer phases of the task that qualitatively matched theoretical predictions. Quantitatively, participants’ behavior was better fitted by a RANGE variant of the standard Q-learning model, and was distinct from choice under explicit conditions that did not probe learning from experience or context-dependence. By contrast, people with OUD with more recent use had limited behavioral adjustment to context, and a much lower proportion (38% vs. 75% in the other two groups) were better fitted by the RANGE model, relying more on absolute-value coding instead.
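The contrast between absolute and range-adapted value learning can be sketched as follows. This is a simplified illustration under stated assumptions, not the fitted models from this study: the learning rates, the deterministic outcome schedules standing in for probabilistic feedback, the sampling of both options on every trial, and all names (`simulate`, `SCHEDULE`, `MAGNITUDE`) are hypothetical. The range-adapted learner tracks a per-context minimum and maximum with a delta rule and learns from the outcome rescaled to that range.

```python
ALPHA = 0.1    # Q-value learning rate (hypothetical)
ALPHA_R = 0.5  # range learning rate (hypothetical)

# Hypothetical outcome schedules: "hi" pays on 3 of 4 trials, "lo" on 1 of 4.
SCHEDULE = {"hi": [1, 1, 1, 0], "lo": [1, 0, 0, 0]}
# Two learning contexts that differ only in reward magnitude.
MAGNITUDE = {"wide": 10.0, "narrow": 1.0}


def simulate(range_adapt, n_cycles=200):
    """Learn option values with absolute or range-adapted outcome encoding."""
    q = {(c, o): 0.0 for c in MAGNITUDE for o in SCHEDULE}
    r_min = {c: 0.0 for c in MAGNITUDE}
    r_max = {c: 0.0 for c in MAGNITUDE}
    for _ in range(n_cycles):
        for t in range(4):
            for c, mag in MAGNITUDE.items():
                for o, pattern in SCHEDULE.items():
                    reward = mag * pattern[t]
                    if range_adapt:
                        # Update the context's range estimates, then rescale.
                        if reward > r_max[c]:
                            r_max[c] += ALPHA_R * (reward - r_max[c])
                        if reward < r_min[c]:
                            r_min[c] += ALPHA_R * (reward - r_min[c])
                        span = r_max[c] - r_min[c]
                        target = (reward - r_min[c]) / span if span > 0 else 0.0
                    else:
                        target = reward  # objective (absolute) outcome
                    q[(c, o)] += ALPHA * (target - q[(c, o)])
    return q


absolute_q = simulate(range_adapt=False)
range_q = simulate(range_adapt=True)

# Transfer comparison: worse wide-context option vs. better narrow-context option.
for label, q in (("ABSOLUTE", absolute_q), ("RANGE", range_q)):
    pick = "wide/lo" if q[("wide", "lo")] > q[("narrow", "hi")] else "narrow/hi"
    print(f"{label}: prefers {pick}")
```

In this sketch the absolute learner carries the objectively larger value of the wide-context option into the transfer comparison, while the range-adapted learner values each option relative to its original context and therefore prefers the option that was best in the narrow context.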

What explains the reduced tendency of recent users to range-adapt? Given overlap in the neural systems that represent range-adapted value and choice and those affected by drugs of abuse (7, 10, 20, 21, 24, 31), one possibility is that recent drug exposure inhibits the neurophysiological process of range adaptation. Although this process is not yet well-described, it is assumed to parallel sensory adaptation (51), and could involve drug-induced changes in local cellular function and dynamics in these systems.

Alternatively, those with more recent use/craving might actually be adapted to the current high value of the drug (9), rendering the task rewards insufficient to further influence behavior (i.e., there is longer-term temporal context adaptation that, given the supra-physiological effects of opioids, requires prolonged abstinence to resolve). By this account, the drug becomes a value of reference outside the local context for all choices that must be made. Regardless of the underlying mechanism, however, the implications are the same: a reduced tendency to range-adapt would make it harder to distinguish among smaller rewards, leading to their suboptimal choice and/or apathy across multiple social and personal domains. Moreover, differences between high-value drug options and lower-value alternatives would be amplified under these scenarios, collectively promoting drug-seeking in high-vulnerability states.

Interestingly, ≥90-day abstinent OUD participants demonstrated normal or even upregulated range adaptation, suggesting recovery of this adaptive function with abstinence or a role in supporting its active maintenance (42). Across different indices of range adaptation, ≥90-day abstinent participants consistently outperformed those with recent use, with controls typically falling intermediate. However, as not all pairwise comparisons reached significance, these findings remain to be replicated with larger samples. Further, by using a 90-day abstinence cut-off, our groups were defined by abstinence durations that spanned a wide range. Those in the ≥90-day group had a median abstinence of over 1 year, while those in the <90-day group had a median abstinence of just 2 days. This raises additional questions about the duration of abstinence needed for recovery, warranting further longitudinal studies.

By targeting within-person effects, such studies could also help to attribute reduced range adaptation more conclusively to recent opioid use. Using a cross-sectional design, we found suggestive evidence that range adaptation selectively related to abstinence and associated states, but not other relevant socioeconomic or clinical factors, including opioid use chronicity and medication exposure or polysubstance use. Nevertheless, the complex clinical profile of our OUD population limited our ability to completely rule out additional contributing factors and mechanisms.

Lastly, we attribute our findings to range adaptation given work by Bavard et al. that established this process as the most parsimonious and general account of context-dependent reinforcement learning (30). However, alternative mechanisms like context-independent reward magnitude and probability sensitivity may also be involved, and our particular task variant does not optimally dissociate these from range adaptation. The static UTILITY model, for instance, fit the learning phase data well and could in theory account for group differences in the transfer phase in terms of differences in perceived utilities of the small-stakes rewards used. Our results showing that this model did worse at predicting transfer behavior, together with data from the explicit (control) task, speak against this explanation and favor one based on differential range adaptation. Still, future research using different experimental designs, like those in Bavard et al., could further tease these alternative mechanisms apart in OUD. Additionally, individual magnitude and probability sensitivity are increasingly thought to index certain adaptation processes (52, 53), questioning a clear dissociation and arguing for jointly studying these processes in OUD.

Together, our findings reveal a novel mechanism of vulnerable addiction states tied to range adaptation, a fundamental decision process known to support adaptive choice in humans (22, 29, 30, 32-34) and across species (20, 31, 32). Recent use and associated craving/withdrawal biased the value encoding strategy participants used to make decisions, impeding contextual adaptation to the currently available reward range in favor of a less efficient, absolute-value encoding strategy. These findings extend existing addiction models that emphasize static reinforcement mechanisms, deepening our understanding of dynamic valuation processes that could contribute to relapse or promote recovery.

Supplementary Material

1

KEY RESOURCES TABLE

Resource Type | Specific Reagent or Resource | Source or Reference | Identifiers | Additional Information
Software; Algorithm | MATLAB R2018a, R2018b | Mathworks | RRID:SCR_001622 |

Acknowledgements

We would like to thank the members of the Intercultural Cognitive Network for resource support, members of the Rutgers-Princeton Center for Computational Cognitive Neuropsychiatry (CCNP) for helpful discussions, and all the patients and staff at our collaborating treatment programs.

Funding

This work was supported by grants from the National Institute on Drug Abuse (R01DA053282, R01DA054201). SP is supported by the Agence Nationale de la Recherche (CogFinAgent: ANR-21-CE23-0002-02; RELATIVE: ANR-21-CE37-0008-01; RANGE: ANR-21-CE28-0024-01) and the Département d’Études Cognitives (FrontCog: ANR-17-EURE-0017).

Footnotes

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Competing interests

The authors report no biomedical financial interests or potential conflicts of interest.

References

  • 1. Bickel WK, Mellis AM, Snider SE, Athamneh LN, Stein JS, Pope DA (2018): 21st century neurobehavioral theories of decision making in addiction: Review and evaluation. Pharmacology Biochemistry and Behavior: Elsevier Inc., pp 4–21.
  • 2. Konova AB, Goldstein RZ (2015): Role of the Value Circuit in Addiction and Addiction Treatment. The Wiley Handbook on the Cognitive Neuroscience of Addiction, pp 109–127.
  • 3. Volkow ND, Koob GF, McLellan AT (2016): Neurobiologic Advances from the Brain Disease Model of Addiction. New England Journal of Medicine. 374:363–371.
  • 4. Redish A (2004): Addiction as a Computational Process Gone Awry.
  • 5. Redish AD, Jensen S, Johnson A (2008): A unified framework for addiction: Vulnerabilities in the decision process. Behavioral and Brain Sciences, pp 415–487.
  • 6. Huys QJ, Tobler PN, Hasler G, Flagel SB (2014): The role of learning-related dopamine signals in addiction vulnerability. Prog Brain Res. 211:31–77.
  • 7. Schoenbaum G, Roesch MR, Stalnaker TA (2006): Orbitofrontal cortex, decision-making and drug addiction. Trends in neurosciences. 29:116–124.
  • 8. Koob GF, Volkow ND (2010): Neurocircuitry of Addiction. Neuropsychopharmacology.
  • 9. Biernacki K, Lopez-Guzman S, Messinger JC, Banavar NV, Rotrosen J, Glimcher PW, et al. (2022): A neuroeconomic signature of opioid craving: How fluctuations in craving bias drug-related and nondrug-related value. Neuropsychopharmacology. 47:1440–1448.
  • 10. Berridge KC (2012): From prediction error to incentive salience: mesolimbic computation of reward motivation. Eur J Neurosci. 35:1124–1143.
  • 11. Zhang J, Berridge KC, Tindell AJ, Smith KS, Aldridge JW (2009): A neural computational model of incentive salience. PLoS Comput Biol. 5:e1000437.
  • 12. Berridge KC, Zhang J, Aldridge JW (2008): Computing motivation: Incentive salience boosts of drug or appetite states. Behav Brain Sci. 31:440–441.
  • 13. SAMHSA (2021): Key Substance Use and Mental Health Indicators in the United States: Results from the 2020 National Survey on Drug Use and Health. SAMHSA.
  • 14. CDC (2021): Drug Overdose Deaths in the U.S. Top 100,000 Annually. National Center for Health Statistics: CDC.
  • 15. Hser YI, Saxon AJ, Huang D, Hasson A, Thomas C, Hillhouse M, et al. (2014): Treatment retention among patients randomized to buprenorphine/naloxone compared to methadone in a multi-site trial. Addiction. 109:79–87.
  • 16. Connery HS (2015): Medication-assisted treatment of opioid use disorder: review of the evidence and future directions. Harv Rev Psychiatry. 23:63–75.
  • 17. Soyka M, Zingg C, Koller G, Kuefner H (2008): Retention rate and substance use in methadone and buprenorphine maintenance therapy and predictors of outcome: results from a randomized study. Int J Neuropsychopharmacol. 11:641–653.
  • 18. Garland EL (2016): Restructuring reward processing with Mindfulness-Oriented Recovery Enhancement: novel therapeutic mechanisms to remediate hedonic dysregulation in addiction, stress, and pain. Ann N Y Acad Sci. 1373:25–37.
  • 19. Leventhal A, Trujillo M, Ameringer K, Tidey J, Sussman S, Kahler C (2014): Anhedonia and the relative reward value of drug and nondrug reinforcers in cigarette smokers. Journal of abnormal psychology. 123.
  • 20. Padoa-Schioppa C (2009): Range-adapting representation of economic value in the orbitofrontal cortex. The Journal of neuroscience : the official journal of the Society for Neuroscience, pp 14004–14014.
  • 21. Pischedda D, Palminteri S, Coricelli G (2020): The Effect of Counterfactual Information on Outcome Value Coding in Medial Prefrontal and Cingulate Cortex: From an Absolute to a Relative Neural Code.
  • 22. Rustichini A, Conen KE, Cai X, Padoa-Schioppa C (2017): Optimal coding and neuronal adaptation in economic decisions. Nature communications. 8.
  • 23. Louie K, Glimcher PW (2012): Efficient coding and the neural representation of value. Annals of the New York Academy of Sciences.
  • 24. Padoa-Schioppa C, Conen KE (2017): Orbitofrontal Cortex: A Neural Circuit for Economic Decisions. Neuron. 96:736–754.
  • 25. Hunter LE, Daw ND (2021): Context-sensitive valuation and learning. Curr Opin Behav Sci. 41:122–127.
  • 26. Louie K, Glimcher PW, Webb R (2015): Adaptive neural coding: from biological to behavioral decision-making. Current Opinion in Behavioral Sciences. 5:91–99.
  • 27. Padoa-Schioppa C, Assad JA (2006): Neurons in the orbitofrontal cortex encode economic value. Nature, pp 223–226.
  • 28. Padoa-Schioppa C, Assad JA (2008): The representation of economic value in the orbitofrontal cortex is invariant for changes of menu. Nature neuroscience, pp 95–102.
  • 29. Bavard S, Lebreton M, Khamassi M, Coricelli G, Palminteri S (2018): Reference-point centering and range-adaptation enhance human reinforcement learning at the cost of irrational preferences. Nat Commun. 9:4503.
  • 30. Bavard S, Rustichini A, Palminteri S (2021): Two sides of the same coin: Beneficial and detrimental consequences of range adaptation in human reinforcement learning. Sci Adv. 7.
  • 31. Conen KE, Padoa-Schioppa C (2019): Partial Adaptation to the Value Range in the Macaque Orbitofrontal Cortex. The Journal of neuroscience : the official journal of the Society for Neuroscience. 39.
  • 32. Nieuwenhuis S, Heslenfeld DJ, Alting Von Geusau NJ, Mars RB, Holroyd CB, Yeung N (2005): Activity in human reward-sensitive brain areas is strongly context dependent. NeuroImage. 25:1302–1309.
  • 33. Elliott R, Agnew Z, Deakin JF (2008): Medial orbitofrontal cortex codes relative rather than absolute value of financial rewards in humans. Eur J Neurosci. 27:2213–2218.
  • 34. Palminteri S, Khamassi M, Joffily M, Coricelli G (2015): Contextual modulation of value signals in reward and punishment learning. Nature Communications: Nature Publishing Group, pp 8096.
  • 35. Keramati M, Durand A, Girardeau P, Gutkin B, Ahmed SH (2017): Cocaine addiction as a homeostatic reinforcement learning disorder. Psychological Review. 124:130–153.
  • 36. Koob GF, Le Moal M (1997): Drug abuse: hedonic homeostatic dysregulation. Science (New York, NY). 278.
  • 37. Maarefvand M, Eghlima M, Rafiey H, Rahgozar M, Tadayyon N, Deilamizadeh A, et al. (2015): Community-based relapse prevention for opiate dependents: a randomized community controlled trial. Community Ment Health J. 51:21–29.
  • 38. Damian AJ, Mendelson T, Agus D (2017): Predictors of buprenorphine treatment success of opioid dependence in two Baltimore City grassroots recovery programs. Addict Behav. 73:129–132.
  • 39. Dreifuss JA, Griffin ML, Frost K, Fitzmaurice GM, Potter JS, Fiellin DA, et al. (2013): Patient characteristics associated with buprenorphine/naloxone treatment outcome for prescription opioid dependence: Results from a multisite study. Drug Alcohol Depend. 131:112–118.
  • 40. Daniels AM, Salisbury-Afshar E, Hoffberg A, Agus D, Fingerhood MI (2014): A novel community-based buprenorphine program: client description and initial outcomes. J Addict Med. 8:40–46.
  • 41. Peterson C, Liu Y, Xu L, Nataraj N, Zhang K, Mikosz CA (2019): U.S. National 90-Day Readmissions After Opioid Overdose Discharge. Am J Prev Med. 56:875–881.
  • 42. Garavan H, Brennan KL, Hester R, Whelan R (2013): The neurobiology of successful abstinence. Curr Opin Neurobiol. 23:668–674.
  • 43. Alvarez EE, Hafezi S, Bonagura D, Kleiman EM, Konova AB (2022): A Proof-of-Concept Ecological Momentary Assessment Study of Day-Level Dynamics in Value-Based Decision-Making in Opioid Addiction. Front Psychiatry. 13:817979.
  • 44. Gueguen CM, Schweitzer EM, Konova AB (2021): Computational theory-driven studies of reinforcement-learning and decision-making in addiction: what have we learned? Current Opinion in Behavioral Sciences: Elsevier Ltd, pp 40–48.
  • 45. Biernacki K, McLennan SN, Terrett G, Labuschagne I, Rendell PG (2016): Decision-making ability in current and past users of opiates: A meta-analysis. Neurosci Biobehav Rev. 71:342–351.
  • 46. Konova AB, Lopez-Guzman S, Urmanche A, Ross S, Louie K, Rotrosen J, et al. (2020): Computational Markers of Risky Decision-making for Identification of Temporal Windows of Vulnerability to Opioid Use in a Real-world Clinical Setting. JAMA Psychiatry. 77:368–377.
  • 47. Greiner MG, Shulman M, Choo TH, Scodes J, Pavlicova M, Campbell ANC, et al. (2021): Naturalistic follow-up after a trial of medications for opioid use disorder: Medication status, opioid use, and relapse. J Subst Abuse Treat. 131:108447.
  • 48. Hayes WM, Wedell DH (2023): Testing models of context-dependent outcome encoding in reinforcement learning. Cognition. 230:105280.
  • 49. Pompilio L, Kacelnik A (2010): Context-dependent utility overrides absolute memory as a determinant of choice. Proceedings of the National Academy of Sciences. 107:508–512.
  • 50. Klein TA, Ullsperger M, Jocham G (2017): Learning relative values in the striatum induces violations of normative decision making. Nature Communications. 8.
  • 51. Rustichini A, Conen KE, Cai X, Padoa-Schioppa C (2017): Optimal coding and neuronal adaptation in economic decisions. Nat Commun. 8:1208.
  • 52. Barretto-Garcia M, de Hollander G, Grueschow M, Polania R, Woodford M, Ruff CC (2023): Individual risk attitudes arise from noise in neurocognitive magnitude representations. Nat Hum Behav. 7:1551–1567.
  • 53. Tymula A, Wang X, Imaizumi Y, Kawai T, Kunimatsu J, Matsumoto M, et al. (2023): Dynamic prospect theory: Two core decision theories coexist in the gambling behavior of monkeys and humans. Sci Adv. 9:eade7972.
