Author manuscript; available in PMC: 2020 Mar 1.
Published in final edited form as: Biol Psychiatry Cogn Neurosci Neuroimaging. 2018 Aug 3;4(3):291–299. doi: 10.1016/j.bpsc.2018.07.009

In cocaine dependence, neural prediction errors during loss avoidance are increased with cocaine deprivation and predict drug use

John M Wang 1,2, Lusha Zhu 1, Vanessa M Brown 1,2, Richard De La Garza II 3,#, Thomas Newton 3,#, Brooks King-Casas 1,2,4, Pearl H Chiu 1,2
PMCID: PMC6857782  NIHMSID: NIHMS1509035  PMID: 30297162

Abstract

Background:

In substance dependent individuals, drug-deprivation and drug-use trigger divergent behavioral responses to environmental cues. These divergent responses are consonant with data showing that short- and long-term adaptations in dopamine signaling are similarly sensitive to state of drug-use. These literatures suggest a drug-state dependent role of learning in maintaining substance use; evidence linking dopamine to both reinforcement learning and addiction provides a framework to test this possibility.

Methods:

In a randomized crossover design, 22 participants with current cocaine use disorder completed a probabilistic loss-learning task during functional MRI while on- and off-cocaine (44 sessions). 54 participants without Axis-I psychopathology served as a secondary reference group. Within-drug state and paired-subjects’ learning effects were assessed with computational-model derived individual learning parameters. Model-based neuroimaging analyses evaluated effects of drug-use state on neural learning signals. Relationships among model-derived behavioral learning rates (α+, α-), neural prediction error signals (δ+, δ-), cocaine use, and desire to use were assessed.

Results:

During cocaine-deprivation, cocaine dependent individuals exhibited heightened positive learning rates (α+), heightened neural positive prediction error (δ+) responses, and heightened association of α+ with neural δ+ responses. The deprivation-enhanced neural learning signals were specific to successful loss avoidance, comparable to non-psychiatric participants, and mediated a relationship between chronicity of drug use and desire to use cocaine.

Conclusions:

Neurocomputational learning signals are sensitive to drug-use status and suggest that heightened reinforcement by successful avoidance of negative outcomes may contribute to drug seeking during deprivation. More generally, attention to drug-use state is important for delineating substrates of addiction.

Keywords: addiction, reinforcement learning, cocaine, prediction error, dopamine, fMRI, computational psychiatry

Introduction

In substance dependent individuals, responses to negative environmental cues appear to both vary with state of drug use and contribute to continued drug seeking. Specifically, when drug deprived, substance dependent individuals are adept at avoiding negative states, such as withdrawal and isolation, through drug use (1, 2). At the same time, when drug using, dependent individuals ignore negative outcomes, including serious social, health, and economic costs (1, 3, 4). The divergence between these behavioral responses to negative consequences suggests a drug-state dependent role for loss-learning (i.e., learning about negative outcomes) in maintaining substance use. In particular, these clinical and behavioral data suggest the hypothesis that heightened reinforcement from avoiding negative states may facilitate drug seeking during deprivation, relative to during substance use. A largely parallel but related literature linking neural dopamine (DA) systems to reinforcement learning (5–7) and cocaine addiction (8–10) provides a framework within which to examine this possibility.

Extant data show that while acute cocaine use increases striatal DA (9), chronic cocaine use decreases postsynaptic DA receptor availability (9). These short- and long-term neurophysiological adaptations together contribute to changes in DA signaling and detection that are sensitive to state of drug use (8–11); for reviews, see (12, 13). In addition, related evidence indicates that during healthy reinforcement learning, DA release encodes prediction errors (δ; signaling ‘better’ or ‘worse’ than expected) which have detectable correlates in human striatum (14, 15). Together, these literatures suggest that in the case of substance dependence, DA-related learning signals are likely to be enhanced with drug deprivation, as DA receptors are relatively freed, allowing the detection of prediction errors. The few previous studies that have examined neural substrates of contingency learning in cocaine dependence have primarily focused on reward-learning and found decreased prediction error signaling in cocaine dependent individuals compared with controls (16–18; see 19, who report greater baseline neural win responses in successful future abstainers).

To evaluate cocaine-state modulation of learning signals and assess the potential drug-state dependent role of loss-learning mechanisms in maintaining substance dependence, we tested cocaine-dependent individuals in a loss-learning task both during cocaine deprivation and when using cocaine as usual. Using a computational psychiatry approach, we assessed behavioral and neural learning substrates, in the form of model-derived learning rate parameters and striatal encoding of prediction errors, respectively, and tested the relationship of neural learning signals to measures of drug use and dependence.

Methods and Materials

Participants and Experimental Design

Twenty-two right handed, non-treatment seeking male individuals who met criteria only for current cocaine use disorder without other substance dependencies or comorbid Axis-I psychopathology were enrolled from a larger study on biomarkers of substance use (see Table 1 for demographic information and Supplemental Methods for inclusion/exclusion criteria). Following an initial lab visit to assess cocaine use and entrance criteria, eligible individuals participated in two subsequent scanning sessions in a within-subject design: In one session, participants were instructed to use cocaine as usual (C+), and in a second session, participants were instructed to abstain from cocaine use for at least 72 hours (C-). Cocaine-use state was verified at each lab visit with urine testing for cocaine metabolites (NIDA 5-panel drug test; Alere, Inc.); C+ and C- sessions were counterbalanced for order. All participants provided informed consent and all procedures were approved by the Institutional Review Boards of Baylor College of Medicine and Virginia Tech.

Table 1.

A) Participant Characteristics

Variable                  Cocaine Dependent Individuals (N = 22), Mean (SD)
Age                       45.7 (7.0)
Education - Years         12.9 (1.0)
WTAR*                     98.4 (10.8)
Years of Cocaine Use      17.6 (7.9)

B) Participant self-reported craving

Drug Use Information, Mean (SE)           Deprived (C-)    Using As Usual (C+)
Estimated Cocaine Intake Last 48 hours    0g (0)           15g (0.5)
CCQ** Grand total                         137.9 (8.7)      145.3 (8.7)
CCQ Anticipated Positive Outcome          2.4 (0.3)        2.7 (0.3)
CCQ Desire to Use                         3.2 (0.2)        3.4 (0.2)
CCQ Intention to Use                      3.0 (0.3)        3.0 (0.3)
CCQ Anticipated Withdrawal Relief         3.9 (0.4)        3.5 (0.4)
CCQ No Control                            3.8 (0.4)        3.5 (0.4)

* Wechsler Test of Adult Reading (Standardized Score Representing Verbal IQ)

** Cocaine Craving Questionnaire (grand total composed of total raw score from each individual item; subscales are averaged across items)

Participants completed a probabilistic loss-learning task (Figure 1A) during fMRI scanning in two separate lab sessions (N = 22, each scanned in both states of cocaine use; see Supplemental Methods for scanning parameters and preprocessing procedures). The task entailed learning from repeated choices between two losing options (two-arm bandit in the loss domain; see details in Supplemental Methods; adapted from 7, 20), with one having a higher probability of producing the better outcome (i.e., smaller loss). On each trial, subjects chose between two abstract stimuli and subsequently observed the outcome (Figure 1A). Participants were instructed that one option was better than the other and that payment was related to their choices but were not explicitly informed of the outcome probabilities or loss framework. Each block continued until sufficient learning occurred or for a maximum of 36 trials (see Supplemental Methods for learning criteria). Each block consisted of novel stimuli, which required participants to learn the contingencies between stimuli and outcomes within each block.

Figure 1. Experimental design and cocaine-modulation of learning.


A) Participants performed a probabilistic loss-learning task wherein they made a series of choices between two options and were shown the outcome of each choice. In this example, the selected option has a 75% chance of losing $0.25 and a 25% chance of losing $0.75 and is the ‘better’ of the two options (less loss). Participants completed trials until learning occurred or up to 36 trials per block. B) The reinforcement learning model incorporated positive and negative prediction errors (δ+ and δ-, respectively) which updated the subsequent expected value (Q) with separately estimated learning rates (α+ and α-, respectively). Trial-by-trial prediction errors are computed as the difference between the outcome and expected value (R - Q). C) The model predicted probability of selecting the better option was a good fit with both C- and C+ participants’ actual behavior (% better option selected across subjects). Model prediction and actual selection for participants in the C- (blue) and C+ (red) drug-use states were similar (no difference of average log likelihood per trial in a paired comparison across states; t(17) = 1.47, p = 0.14). D) In both states of drug use, individuals showed learning, improving in choosing the better option as trials progressed. C- relative to C+ individuals showed diminished overall accuracy (t(17) = 2.62, p = 0.01). E) Bootstrapped group estimates for positive learning rate (α+), negative learning rate (α-), and inverse temperature (β) suggested higher α+ and lower β during cocaine deprivation. F) To clarify whether the drug-state modulation of α+ and/or β was associated with the cocaine-state differences in model-free behavioral performance (panel D), we simulated behavioral choices iterating through the observed ranges of α+ and β parameter values and show that increasing α+ was associated with diminished total earnings, while performance did not vary with changes in β.

An additional group of fifty-four males (see Table S1A for demographics) with no history of Axis-I psychopathology was used as an independent non-psychiatric control sample to identify learning-related striatum activation that could be used as a reference against which to interpret any neural effects observed in the CUD individuals, and to compute non-psychiatric individual parameter estimates for model validation and parameter recovery.

Behavioral Analyses

Model-agnostic behavioral analyses.

To verify that participants learned during the task, the behavioral choices of each individual in each drug-use state were examined over time and quantified as the percentage of trials on which the objectively better choice was selected. Within-sample and paired t-tests on performance were implemented in MATLAB.
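The two tests named above can be sketched in a few lines. The study ran them in MATLAB; the Python version below is only an illustration using the standard library. The function names and the accuracy values are hypothetical and are not the study's data.

```python
import math
from statistics import mean, stdev

def one_sample_t(values, mu0):
    """Return (t, df, cohen_d) for a one-sample t-test of `values` against mu0."""
    n = len(values)
    m, s = mean(values), stdev(values)      # stdev uses the n - 1 denominator
    t = (m - mu0) / (s / math.sqrt(n))
    d = (m - mu0) / s                       # standardized mean difference
    return t, n - 1, d

def paired_t(x, y):
    """Paired t-test, computed as a one-sample test on the pairwise differences."""
    diffs = [a - b for a, b in zip(x, y)]
    return one_sample_t(diffs, 0.0)

# Hypothetical per-participant accuracies (proportion of 'better' choices).
acc_deprived = [0.55, 0.70, 0.62, 0.48, 0.66, 0.74]   # C- sessions
acc_using = [0.68, 0.75, 0.71, 0.60, 0.80, 0.79]      # C+ sessions

t_chance, df_chance, d_chance = one_sample_t(acc_deprived, 0.5)  # vs 50% chance
t_pair, df_pair, d_pair = paired_t(acc_deprived, acc_using)      # C- vs C+
```

A p value would additionally require the t distribution's cumulative density, which the standard library does not provide.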

Computational model-based behavioral analyses.

To assess model-derived behavioral learning effects for each participant in each drug-use state, participants’ behavioral choices were fit to a basic reinforcement Q-learning model which included two learning rates (α) that provided separate update rules for positive (δ+) and negative (δ-) prediction errors (better or worse than expectations, mapping on to successful and unsuccessful loss avoidance, respectively), adapted from previous studies (21–24), in the form of positive (α+) and negative (α-) learning rates (Figure 1B). The separate learning rates allowed us to evaluate asymmetries in learning from δ+ and δ- (e.g., 16–20) and assess the possibility that cocaine’s effects on DA systems differentially impact these components. The model was a good fit of participants’ choices during the loss-learning task (Figure 1C) and was a better fit than a single learning rate model; model and parameter recovery using simulated data further verified the fit of the model to the observed behavior (see Behavioral Results). A third estimated parameter, inverse temperature (β), provided a measure of exploration and indicated the sensitivity of choice probabilities to differences in values. See Supplemental Methods for additional descriptions of model selection, model validation, and parameter recovery procedures (Supplemental Methods, Model Fitting and Selection).

For the two learning rate model, the initial expected values Q(0) for the possible choices a and b were set to 0 as participants were not instructed a priori about the range of possible outcomes. For trial number t, the outcome for the chosen option a was represented by Ra(t) with the expected value represented by Qa(t). The prediction error δ(t), which measures the difference in outcome Ra(t) and expectation Qa(t), for a trial was defined as follows:

$$\delta(t) = R_a(t) - Q_a(t)$$

The parameter estimation procedures included separate update rules for positive and negative prediction errors δ(t) in the form of positive (α+) and negative (α-) learning rates, respectively (Figure 1B). The learning rate parameters quantify how much weight the prediction error δ(t) from current trials is given in updating the following trials’ expected value Qa(t+1):

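$$Q_a(t+1) = \begin{cases} Q_a(t) + \alpha^{+}\,\delta(t) & \text{if } \delta(t) > 0 \\ Q_a(t) + \alpha^{-}\,\delta(t) & \text{if } \delta(t) \le 0 \end{cases}$$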

A standard softmax action selection function was used to calculate the probability of selecting choice a at time t and was implemented as follows:

$$P_a(t) = \frac{e^{Q_a(t)\beta}}{e^{Q_a(t)\beta} + e^{Q_b(t)\beta}}$$
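As an illustration of the update and choice rules described above (prediction error as outcome minus expected value, a learning rate chosen by the sign of the prediction error, and a two-option softmax), a minimal Python sketch follows. The function names and numeric values are hypothetical; the study's own implementation was in MATLAB.

```python
import math

def update_q(q, reward, alpha_pos, alpha_neg):
    """One value update for the chosen option; returns (new_q, delta)."""
    delta = reward - q                               # prediction error: R - Q
    alpha = alpha_pos if delta > 0 else alpha_neg    # sign-specific learning rate
    return q + alpha * delta, delta

def softmax_prob(q_a, q_b, beta):
    """Probability of choosing option a under a two-option softmax."""
    # Algebraically equal to exp(beta*q_a) / (exp(beta*q_a) + exp(beta*q_b)),
    # written in logistic form so that large values cannot overflow.
    x = beta * (q_a - q_b)
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

# With Q(0) = 0, a first loss of $0.25 is worse than expected (delta < 0).
q1, d1 = update_q(0.0, -0.25, alpha_pos=0.5, alpha_neg=0.2)
# Once a larger loss is expected, the same $0.25 loss is better than expected.
q2, d2 = update_q(-0.5, -0.25, alpha_pos=0.5, alpha_neg=0.2)
p_equal = softmax_prob(0.0, 0.0, beta=3.0)       # equal values: indifference
p_better = softmax_prob(-0.25, -0.75, beta=2.0)  # option a has the smaller expected loss
```

Because all outcomes in the task are losses, a positive prediction error here corresponds to a smaller loss than expected, which is the sense in which δ+ indexes successful loss avoidance.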

Positive and negative learning rates (α+ and α-) and inverse temperature β were free parameters, iteratively estimated in MATLAB using the function fminsearch, that were evaluated to have the maximum log likelihood (25). Learning rates were bounded between 0 and 1 and inverse temperature was bounded between 0 and ∞. For the unchosen option b, the expected value of the subsequent trial Qb(t+1) was set to the current trial’s expected value Qb(t) multiplied by an additional decay parameter (ϕ, bounded between 0 and ∞) similar to previous studies (26–29).
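To make the estimation step concrete, the sketch below computes the negative log likelihood of a choice sequence under the model, including the decay of the unchosen option's value. It then selects parameters by a coarse grid search. The grid search is a stand-in chosen only to keep the example dependency-free; the study used MATLAB's fminsearch, not a grid. The simulator, the mirrored outcome probabilities for the worse option, the fixed ϕ = 1, and all names are assumptions made for illustration.

```python
import math
import random

def neg_log_likelihood(params, choices, rewards):
    """Negative log likelihood of observed choices.

    params  = (alpha_pos, alpha_neg, beta, phi)
    choices = chosen option on each trial (0 or 1)
    rewards = outcome of the chosen option (losses are negative numbers)
    """
    alpha_pos, alpha_neg, beta, phi = params
    q = [0.0, 0.0]                        # Q(0) = 0 for both options
    nll = 0.0
    for c, r in zip(choices, rewards):
        u = 1 - c                         # index of the unchosen option
        x = beta * (q[c] - q[u])
        # -log P(chosen) = log(1 + exp(-x)), evaluated in a stable form
        nll += math.log1p(math.exp(-x)) if x >= 0 else -x + math.log1p(math.exp(x))
        delta = r - q[c]
        q[c] += (alpha_pos if delta > 0 else alpha_neg) * delta
        q[u] *= phi                       # decay applied to the unchosen value
    return nll

def simulate(params, n_trials, rng):
    """Simulate choices; option 0 is 'better' (assumed mirrored probabilities)."""
    alpha_pos, alpha_neg, beta, phi = params
    q, choices, rewards = [0.0, 0.0], [], []
    for _ in range(n_trials):
        p0 = 1.0 / (1.0 + math.exp(-beta * (q[0] - q[1])))
        c = 0 if rng.random() < p0 else 1
        p_small_loss = 0.75 if c == 0 else 0.25
        r = -0.25 if rng.random() < p_small_loss else -0.75
        delta = r - q[c]
        q[c] += (alpha_pos if delta > 0 else alpha_neg) * delta
        q[1 - c] *= phi
        choices.append(c)
        rewards.append(r)
    return choices, rewards

def grid_search(choices, rewards, phi=1.0):
    """Coarse grid over (alpha_pos, alpha_neg, beta); returns (nll, ap, an, beta)."""
    alphas = [i / 10 for i in range(11)]
    betas = [0.5, 1.0, 2.0, 4.0, 8.0]
    best = None
    for ap in alphas:
        for an in alphas:
            for b in betas:
                nll = neg_log_likelihood((ap, an, b, phi), choices, rewards)
                if best is None or nll < best[0]:
                    best = (nll, ap, an, b)
    return best

rng = random.Random(0)
sim_choices, sim_rewards = simulate((0.4, 0.2, 4.0, 1.0), 200, rng)
best_fit = grid_search(sim_choices, sim_rewards)
```

A model that ignores values entirely (β = 0, or both learning rates at 0) assigns probability 0.5 to every choice, so its negative log likelihood is the number of trials times log 2. That value is a useful baseline for judging any fitted model.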

Individual variances in learning rates (α- and α+) as an effect of drug-state (C- or C+) were estimated for second level fMRI analysis (see Imaging Analysis below). First, the prior mean and distribution of learning rates for participants in each drug state (C- and C+) were estimated using bootstrapped maximum likelihood created via sampling, approximating integration, around a bootstrapped maximum likelihood estimation across subjects (30, 31). Individual learning rates were subsequently estimated by conditioning individuals’ behavioral data on the respective drug state (C- or C+) group’s prior distribution to account for drug-use status differences. For each participant, individual C+ learning rates were then subtracted from the individual C- learning rates to compute a cocaine ‘deprivation enhanced’ learning rate for each participant. Group specific bootstrapped estimates were used for the inverse temperature and decay parameters during individual estimation of learning rates.

Imaging Analyses

To examine neural substrates of loss learning associated with cocaine use-state in dependent individuals, model-derived learning variables fit across all participants (as described above) were first correlated with functional MRI data collected during the loss-learning task. Next, the model-based neural prediction error signals were related to participants’ self-reported cocaine use measures.

First-level fMRI processing.

The general linear model (GLM) implemented in SPM8 (32) was used to perform neuroimaging analyses at the individual and group levels. For the first level analyses, onset times for stimuli, outcome events for δ+ outcomes, and outcome events for δ- outcomes for each trial were modeled as separate punctate events. The outcomes were categorized based on the sign of the prediction error (δ > 0 or δ < 0, indicating δ+ and δ-), using the fitted estimates of the 2 Learning Rate model, in which trial-by-trial δs were generated (see procedure in Supplemental Methods). To examine the first level effects of drug-use status on neural representation of learning and valuation, cocaine positive (C+, urine positive for cocaine metabolites) and cocaine negative (C-, urine negative for cocaine metabolites) drug use states for each individual were modeled as separate first level GLMs. Trial-by-trial expected values (Q) were modeled as parametric regressors onto the response events. Trial-by-trial δ+ and δ-, and the actual outcomes were modeled as parametric regressors onto separate δ+ and δ- outcome events, respectively. Effects due to run number, time in scanner, and head movement parameters were modeled out as nuisance covariates for each time point.

Within and paired drug-state analyses.

The within-drug state (C- and C+, respectively) and paired-subjects (C- > C+) effects of cocaine use were compared using one-sample and paired subjects’ second level contrasts in SPM8. The effects of interest were neural responses to δ+ and δ−. In line with previous data demonstrating the role of striatum and DA in learning (7, 14, 15), the imaging analyses were masked for the striatum. Anatomical masks were constructed using WFU-pickatlas (33) including the structures of the caudate, putamen, and globus pallidus. Also included in the striatum mask was the nucleus accumbens (per Garrison et al., 34). Results were thresholded with a voxel level uncorrected p < 0.001, unless noted, and significant clusters were defined using family-wise-error correction.

Correlation analyses between cocaine-state modulated learning rate and neural prediction error signals.

To relate drug-state effects on behavioral learning rates (α+ and α-) with the corresponding neural δ signals, separate first level and second level GLMs were created to correlate within-subject drug-modulated α+ and α- differences (C- > C+ for α+ and C- > C+ for α-, respectively) with the corresponding neural differences for positive and negative δ (C- > C+ for δ+ and C- > C+ for δ-, respectively). Results were again thresholded with a voxel level uncorrected p < 0.001, and significant clusters were defined using family-wise-error correction. In addition, leave-one-out cross-validation analyses were performed in regions of interest (see Supplemental Methods) to reduce bias due to non-independence (35).

Relationships between neural prediction error responses and behavioral cocaine use measures.

To test relationships between the observed neural learning signals and cocaine use measures, questionnaire data characterizing individual drug use history and current cocaine craving were tested against subjects’ C- neural prediction error responses (given the primary results of interest involving enhanced δ+ from drug deprivation). Again, using leave-one-out cross-validation analysis, neural signals from trials with δ+ were correlated with years of drug use and subscales of the Cocaine Craving Questionnaire (36). The analyses identified relationships among years of drug use, δ+ neural signal, and the desire to use cocaine. Based on the results of the correlation analysis, a mediation analysis was performed testing whether neural learning signals mediated the relationship between duration of drug use and individuals’ desire to use cocaine or expected positive outcome from cocaine use (C- measures). A bootstrap approach to mediation (37) was implemented in R to calculate a 95% confidence interval with 10,000 bootstrapped resamples.
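The logic of a bootstrapped mediation test can be sketched as follows. The study ran this analysis in R following reference 37. The Python version below is an illustration only, and its use of a percentile confidence interval is an assumption, since the source does not state which interval type was used. The data are synthetic, and every name is hypothetical.

```python
import random

def mediation_paths(x, m, y):
    """Least-squares paths: a (m ~ x), b and c' (y ~ x + m), c (y ~ x)."""
    n = len(x)
    mx, mm, my = sum(x) / n, sum(m) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    smm = sum((mi - mm) ** 2 for mi in m)
    sxm = sum((xi - mx) * (mi - mm) for xi, mi in zip(x, m))
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    smy = sum((mi - mm) * (yi - my) for mi, yi in zip(m, y))
    det = sxx * smm - sxm ** 2
    if sxx == 0 or det == 0:
        return None                        # degenerate sample, no unique fit
    a = sxm / sxx
    b = (smy * sxx - sxy * sxm) / det      # mediator effect, controlling for x
    c_prime = (sxy * smm - smy * sxm) / det  # direct effect, controlling for m
    c = sxy / sxx                          # total effect
    return a, b, c_prime, c

def bootstrap_indirect(x, m, y, n_boot=10000, seed=0):
    """Percentile 95% CI for the indirect effect a*b (resampling participants)."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        paths = mediation_paths([x[i] for i in idx],
                                [m[i] for i in idx],
                                [y[i] for i in idx])
        if paths is not None:
            estimates.append(paths[0] * paths[1])
    estimates.sort()
    k = len(estimates)
    return estimates[int(0.025 * k)], estimates[max(int(0.975 * k) - 1, 0)]

# Synthetic example (not the study's data): years of use -> neural signal -> desire.
gen = random.Random(1)
years = [gen.uniform(5, 30) for _ in range(22)]
neural = [0.05 * v + gen.gauss(0, 0.3) for v in years]
desire = [2.0 + 0.7 * v + gen.gauss(0, 0.3) for v in neural]

a, b, c_prime, c = mediation_paths(years, neural, desire)
ci_low, ci_high = bootstrap_indirect(years, neural, desire, n_boot=2000)
```

In ordinary least squares the total effect decomposes exactly as c = c' + a·b. Mediation is inferred when the bootstrap interval for a·b excludes zero.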

Results

Behavioral Results

Model-agnostic behavioral results.

In both drug-use states, participants demonstrated learning and performed significantly above chance (percentage of trials on which the ‘better’ option was chosen; C-: 62.76%, s.e. 3.22%, t(17) = 3.91, d = 0.96, p < 0.01; C+: 73.61%, s.e. 2.69%, t(17) = 8.70, d = 2.12, p < 0.01; chance: 50%; Figure 1D). In addition, participants in the C- state showed diminished accuracy, relative to C+ participants (t(17) = 2.62, d = 0.30, p = 0.01).

Computational model-derived behavioral results.

Computational model-based analyses, using bootstrapped group parameters (per 38; 200 iterations of draws of subjects within drug-use state) for positive and negative learning rate (α+ and α-, respectively) and inverse temperature (β), suggested increased positive learning rates (α+) and decreased inverse temperature (β) in C- relative to C+ participants; negative learning rates (α-) did not differ between participants in the C- and C+ states (Figure 1E). To clarify whether cocaine-state modulation of positive learning rate (α+) or inverse temperature (β) was associated with the diminished behavioral accuracy in C- participants, we simulated behavioral choices holding α+ constant (iterating through the ranges of the observed parameter values) while allowing β to vary, and similarly holding β constant and allowing α+ to vary. As shown in Figure 1F, these simulations revealed increased α+ to be associated with decreased performance and no relationship between β and performance (see simulation details in Supplemental Methods). Together, these data provided initial evidence of drug-state modulation of learning, wherein cocaine deprivation-related increases in positive learning rates are associated with diminished behavioral performance.

Imaging Results

Effects of cocaine-use state on neural prediction error signals.

Significant neural correlates of positive prediction error were observed in striatum for C- participants (δ+ for C-; Figure 2A; Table S2A), but not for C+ participants. In addition, no significant neural correlates of negative prediction error (δ-) were found during either drug-use state. Positive prediction error (δ+) responses were verified in the non-psychiatric participants (Figure S1, Table S2B), and post hoc analyses using an independent striatum ROI indicated that δ+ in C- participants was comparable to this control cohort whereas C+ participants showed significantly diminished δ+ responses (C- vs non-psychiatric control, t(70) = 0.15, d = 0.004, p = 0.87; C+ vs non-psychiatric control, t(70) = 3.22, d = 0.09, p = 0.001; Figure S1; see analytic details in Supplemental Methods).

Figure 2. Cocaine-use status modulates neural learning signals and reveals increased positive prediction error signaling during deprivation.


A) Positive prediction error signal (δ+) was found in the right striatum for cocaine-deprived participants (C-; peak voxel at T = 5.33; cluster FWE corrected p = 0.005; thresholded at T = 2.9 for visualization, see Table S2A). No significant neural δ+ signals were found for participants using cocaine as usual (C+). Neither C- nor C+ participants showed significant neural responses to negative prediction errors (δ-). See also Figure S2A and Table S2D for C- > C+ contrasts that further show ‘deprivation-enhancement’ of learning signals. B) Deprivation-enhancement of positive learning rate (α+) was correlated with deprivation-enhancement of positive prediction error (δ+) in the striatum (C- > C+ for α+ and neural C- > C+ for δ+; left striatum peak at T = 7.20, right striatum peak at T = 5.61; both p < 0.0001; thresholded at T = 2.9 with striatum mask for visualization, see Table S2C). C) Drug-state dependent (i.e., C- > C+) neural δ beta values were extracted for both δ+ and δ- from bilateral striatum and correlated with their corresponding deprivation-enhanced learning rates. For positive learning rate, the degree of participants’ deprivation-enhancement was significantly associated with the degree of deprivation-enhancement of positive prediction error responses in the striatum (C- > C+ for α+ and neural C- > C+ for δ+; Table S2C; r = 0.79, p < 0.01; 27). No relationship between drug-state modulation of negative learning rates and their associated neural prediction error signals was observed (C- > C+ for α- and neural C- > C+ for δ-; r = −0.08, p = 0.72). Beta value differences (C- > C+, whole brain normalized) are plotted.

The specificity of the neural encoding of positive prediction errors in the C- participants (Figure 2A) was striking in its parallel with the increased positive learning rate in these participants. Thus, to test for a neural instantiation of the deprivation-increased positive learning rate, we first computed individual behavioral learning rate estimates for each participant in the C- and C+ states, respectively (see Methods; 30), and generated for each participant ‘deprivation-enhanced’ positive and negative learning rate metrics (i.e., C- > C+ for α+ and α-, respectively, for each individual). For positive learning rate, the degree of participants’ deprivation-enhancement was significantly associated with the degree of deprivation-enhancement of positive prediction error responses in the striatum (C- > C+ for α+ and neural C- > C+ for δ+; Figure 2B; Table S2C; r = 0.79, p < 0.01; using leave-one-out cross validation to avoid potential bias due to non-independence; 35). No relationship between drug-state modulation of negative learning rates and their associated neural prediction error signals was observed (C- > C+ for α- and neural C- > C+ for δ-; r = −0.08, p = 0.72; Figure 2C). For C- > C+ contrasts that further show ‘deprivation-enhancement’ see Figure S2A and Table S2D. Figure S3 shows similar imaging results when using group estimates from within-status behavioral estimates. In addition, no effects of cocaine deprivation on neural expected value signals were detected (Figure S2B), indicating generally intact outcome valuation unaffected by drug use status.

Results relating neural prediction error signals and behavioral cocaine use measures.

As described above, the specificity of drug-state modulation and deprivation-enhancement to positive (i.e., successful loss avoidance) prediction errors (δ+) was consistent with the hypothesis that reinforcement from successfully avoiding negative states contributes to continued drug seeking in addiction. In this case, successful loss avoidance in cocaine-deprived participants should be further related to aspects of real-world cocaine use. To test this possibility, we regressed C- individuals’ neural δ+ responses (beta values from outcomes with δ+) against self-reported drug craving (subscales of Cocaine Craving Questionnaire, CCQ; 36; Figure S4A; Table S3) and observed that neural δ+ responses were related specifically to the desire to use cocaine (Figure 3A; Table S3; r = 0.70, p < 0.01; correlations again performed using neural signals obtained from leave-one-out cross-validation analyses and Bonferroni corrected for multiple comparisons as described in Methods; 35). These relationships were also present using the deprivation-enhanced neural δ+ signal (i.e., C- > C+, extracted from outcomes with δ+; r = 0.67, p < 0.01) and not observed in the C+ state (i.e., signal from outcomes with δ+ while C+; Figure S4B; r = −0.20, p = 0.42). Greater neural δ+ responses during cocaine deprivation were also associated with greater years of cocaine use (C-; Figure 3A; r = 0.64, p < 0.01; no relationship between neural δ+ and chronicity of use was observed for participants in the C+ state, Figure S4C, r = −0.09, p = 0.71). No other subscales of the CCQ were correlated with striatal δ+ signals (Table S3). Lastly, desire to use cocaine (Figure 3A; r = 0.61, p < 0.01) was also positively correlated with participants’ years of cocaine use.
Following these observed relationships, a mediation analysis (37; see Methods) revealed that the deprivation-enhanced neural δ+ signal fully mediated the relationship between years of cocaine use and desire to use cocaine while deprived (Figure 3B; path c: β = 0.09, p < 0.01; path a: β = 0.06, p = 0.04; path b: β = 0.66, p = 0.01; path c’: β = 0.05, p = 0.07; mediation effect a*b: 95% CI 0.0008–0.0953).

Figure 3. During cocaine deprivation, neural positive prediction error signals mediate relationship between chronicity of drug use and desire to use cocaine.


In deprived cocaine dependent individuals (C-): A) Positive prediction error signal (normalized δ+) in striatum is associated with greater desire to use cocaine (r = 0.70, p < 0.01), longer history of cocaine use predicted increased neural prediction error signal (normalized δ+; r = 0.64, p < 0.01), and longer history of cocaine use predicted higher desire to use cocaine (r = 0.61, p < 0.01). B) Following these correlational results, a mediation analysis found that the relationship between individuals’ years of cocaine use and their desire to use while deprived (path c: β = 0.09, p < 0.01) was fully mediated by individuals’ deprivation-enhanced neural δ+ signal (path a: β = 0.06, p = 0.04; path b: β = 0.66, p = 0.01; path c’: β = 0.05, p = 0.07; mediation effect a*b: 95% CI 0.0008–0.0953).

Discussion

Using a computational psychiatry approach (39–41), we show drug-state modulation of learning signals in cocaine dependent participants, such that successful loss avoidance signals are greater during deprivation, and the neural responses are associated with both longer history of drug use and greater desire for cocaine. The specificity of the deprivation-enhancement to positive neural prediction error signals during loss-avoidance appears to parallel clinical descriptions of addiction as a cycle maintained by negative reinforcement wherein drug-deprived dependent individuals seek drugs and thus successfully avoid negative states (e.g., withdrawal, isolation, etc.); such successful loss-avoidance has been posited to reinforce continued drug seeking (for relevant discussions see 1, 2).

These data are consonant with prior studies showing that with greater chronicity of cocaine use, physiological adaptations occur in DA systems (9, 11). In particular, the enhanced neural positive prediction error (δ+) encoding in C- relative to C+ cocaine dependent participants is consistent with studies showing that long term cocaine dependent humans have decreased density of striatal DA receptors and lower tonic DA levels (9) and that acute cocaine intake in chronically cocaine-treated mice reduces DA signaling (11). Following from these studies, δ+ signals ought to be more evident during drug deprivation (as observed here) than during drug use, as DA receptors, though diminished in density, are free in the deprived state to detect δ+ fluctuations. We note that in the present cocaine dependent participants, neural δ+ responses in the drug-deprived state are comparable with the δ+ observed in non-psychiatric control participants, whereas δ+ signaling in the drug-using state was diminished relative to the control participants. Together these data suggest that although learning signal impairments appear restored by cocaine deprivation in dependent participants, such intact learning can have increasingly detrimental consequences in the context of unhealthy reinforcers, negative environmental states, and adverse outcomes (e.g., when dependent individuals are faced with withdrawal avoidance, drug-available environments, and drug-use).

The present data are also relevant for closely related reports of significant increase in prediction error correlates following DA agonist administration (7) and computational model based theories that drug use exacerbates prediction errors (42) or triggers ‘false’ phasic activation of DA neurons (43). A key difference between these previous reports and the present findings is the incorporation of the consequences of long-term drug dependence (i.e., diminished DA functioning) into an understanding of learning in addiction (42, 44). In addition, the present diminished δ+ signaling in C+ and enhancement in C- individuals is consistent with related work showing DA drug-state modulation of learning signals in participants with Parkinson’s disease (who are known to have impaired DA function); these participants similarly show reduced prediction error-related BOLD responses when on DA enhancing medication (levodopa) and greater prediction error responses while off medication, specifically to positive prediction errors (δ+; 45, 46).

Finally, we show that greater neural loss-learning δ+ (signaling successful loss-avoidance) during deprivation mediates a relationship between chronicity of drug use and desire for cocaine. This relationship supports the hypothesis that drug-state dependent learning signals play a role in maintaining drug use. The present data thus emphasize that both drug-use chronicity and the context in which learning is assessed (e.g., loss or gain) may be critical for identifying neurobehavioral mechanisms that maintain drug use (for related data indicating differences in neural substrates of loss and gain learning, see 47–50).
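A mediation relationship of this kind is commonly summarized as an indirect effect a × b, where a is the slope of the mediator on the predictor and b is the coefficient of the mediator when the outcome is regressed on both. The sketch below computes that product and a percentile bootstrap interval. It is an illustration under stated assumptions, not the analysis pipeline used in this study: the variable names (`chronicity`, `delta_pos`, `craving`) are hypothetical, and the numbers are synthetic values generated in the code, not study data.

```python
import random


def _cov(u, v):
    """Sample covariance of two equal-length sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)


def indirect_effect(x, m, y):
    """Return a*b: a from m ~ x, b = coefficient of m in y ~ x + m (OLS)."""
    sxx, smm, sxm = _cov(x, x), _cov(m, m), _cov(x, m)
    sxy, smy = _cov(x, y), _cov(m, y)
    a = sxm / sxx
    b = (smy * sxx - sxy * sxm) / (sxx * smm - sxm ** 2)
    return a * b


def bootstrap_ci(x, m, y, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap interval for the indirect effect."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample cases with replacement
        try:
            estimates.append(indirect_effect([x[i] for i in idx],
                                             [m[i] for i in idx],
                                             [y[i] for i in idx]))
        except ZeroDivisionError:
            continue  # skip degenerate resamples (no variance or collinear)
    estimates.sort()
    k = len(estimates)
    lo = estimates[int((1 - level) / 2 * k)]
    hi = estimates[min(k - 1, int((1 + level) / 2 * k))]
    return lo, hi


# Synthetic illustration only: these values are generated here, not measured.
rng = random.Random(1)
chronicity = [rng.gauss(0, 1) for _ in range(40)]
delta_pos = [0.5 * c + rng.gauss(0, 1) for c in chronicity]
craving = [0.8 * d + rng.gauss(0, 1) for d in delta_pos]

print("indirect effect:", indirect_effect(chronicity, delta_pos, craving))
print("95% bootstrap interval:", bootstrap_ci(chronicity, delta_pos, craving))
```

An interval that excludes zero is the usual criterion for concluding that the indirect path is reliable. With a sample of 22 participants, such intervals can be wide, which is one reason the replication noted among the limitations matters.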

The limitations of the current work provide avenues for further study. First, a relatively small number of male participants were included in this study (N = 22). While the within-subjects design and advantages of sample homogeneity partially mitigate the sample size, replication in a larger, more diverse sample would address questions regarding generalizability. In addition, the present study identified drug-state modulation of responses to negative outcomes but does not evaluate the degree to which the physical consequences per se (i.e., small or large monetary loss), affect associated with the consequences, or other aspects of the outcomes contribute to the reinforcement provided by successful loss avoidance. Clarifying the role of components of negative outcomes in maintaining substance use in dependence ought to be a focus of future studies. Finally, we focused our neural analyses primarily on regions of striatum, given previous work linking learning mechanisms and cocaine pharmacodynamics to these regions (5–7, 9). Supplemental analyses found no effects of cocaine-state on encoding of expected value (see Figure S2B), indicating generally intact outcome valuation unaffected by drug use status; nonetheless, other neural regions implicated in learning may be of interest in future investigations.

In summary, in cocaine dependent participants, we show that drug deprivation enhances neural signaling of successful loss avoidance, which in turn predicts increased desire to use cocaine. The deprivation-enhanced neural prediction error is in line with prior reports of DA adaptations associated with chronic substance use (2, 9, 11) and also points to a potential mechanism via which drug seeking is maintained. That is, when dependent individuals are at their most vulnerable (i.e., during drug deprivation), reward signals associated with successful avoidance of negative states are at their greatest and may contribute to a pernicious cycle of drug seeking in the face of quit attempts. Of note, DA dysregulation has been associated with poor response to behavioral treatments in addiction (10), and innovative behavioral training protocols have identified learning systems as potential new mechanistic treatment targets for cocaine dependence (51). More generally, the current results support targeting learning-based therapies to identify goal-driven behaviors that provide relief from the negative outcomes of drug deprivation and indicate that attention to drug state may be critical for understanding neural mechanisms of addiction and refining learning-based therapies.

Supplementary Material

1

Acknowledgments

We acknowledge the technical assistance of George Christopoulos, Dongil Chung, Jacob Lee, James Mahoney, Dharol Tankersley, Katherine McCurry, Nina Lauharatanahirun, and members of the Chiu, De La Garza, King-Casas, and Newton Labs. This work was supported in part by the National Institutes of Health (R01MH091872 to PC, R01DA036017 to BKC, RC1DA028387 and R01DA023624 to RDLG).

Footnotes

Financial Disclosures

The authors report no biomedical financial interests or potential conflicts of interest.

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

References

  • 1. West R, Hardy A (2005): Theory of Addiction. Malden, MA: Blackwell Publishing Ltd.
  • 2. Potenza MN, Sofuoglu M, Carroll KM, Rounsaville BJ (2011): Neuroscience of Behavioral and Pharmacological Treatments for Addictions. Neuron. 69: 695–712.
  • 3. Lucantonio F, Stalnaker TA, Shaham Y, Niv Y, Schoenbaum G (2012): The Impact of Orbitofrontal Dysfunction on Cocaine Addiction. Nat Neurosci. 15: 358–366.
  • 4. Hyman SE, Malenka RC, Nestler EJ (2006): Neural Mechanisms of Addiction: The Role of Reward-Related Learning and Memory. Annu Rev Neurosci. 29: 565–598.
  • 5. Montague PR, Hyman SE, Cohen JD (2004): Computational Roles for Dopamine in Behavioural Control. Nature. 431: 760–767.
  • 6. Schultz W, Dayan P, Montague PR (1997): A Neural Substrate of Prediction and Reward. Science. 275: 1593–1599.
  • 7. Pessiglione M, Seymour B, Flandin G, Dolan RJ, Frith CD (2006): Dopamine-Dependent Prediction Errors Underpin Reward-Seeking Behaviour in Humans. Nature. 442: 1042–1045.
  • 8. Acevedo-Rodriguez A, Zhang L, Zhou F, Gong S, Gu H, De Biasi M, et al. (2014): Cocaine Inhibition of Nicotinic Acetylcholine Receptors Influences Dopamine Release. Front Synaptic Neurosci. 6: 1–12.
  • 9. Volkow ND, Fowler JS, Wang G, Swanson JM (2004): Dopamine in Drug Abuse and Addiction: Results from Imaging Studies and Treatment Implications. Mol Psychiatry. 9: 557–569.
  • 10. Martinez D, Carpenter KM, Liu F, Slifstein M, Broft A, Friedman AC, et al. (2011): Imaging Dopamine Transmission in Cocaine Dependence: Link Between Neurochemistry and Response to Treatment. Am J Psychiatry. 168: 634–641.
  • 11. Park K, Volkow ND, Pan Y, Du C (2013): Chronic Cocaine Dampens Dopamine Signaling during Cocaine Intoxication and Unbalances D1 over D2 Receptor Signaling. J Neurosci. 33: 15827–15836.
  • 12. Keramati M, Durand A, Girardeau P, Gutkin B, Ahmed SH (2017): Cocaine addiction as a homeostatic reinforcement learning disorder. Psychol Rev. 124: 130–153.
  • 13. Willuhn I, Wanat MJ, Clark JJ, Phillips PEM (2010): Dopamine Signaling in the Nucleus Accumbens of Animals Self-Administering Drugs of Abuse. Curr Top Behav Neurosci. (Vol. 11), Springer, pp 29–71.
  • 14. Montague PR, King-Casas B, Cohen JD (2006): Imaging Valuation Models in Human Choice. Annu Rev Neurosci. 29: 417–448.
  • 15. Jocham G, Klein TA, Ullsperger M (2014): Differential Modulation of Reinforcement Learning by D2 Dopamine and NMDA Glutamate Receptor Antagonism. J Neurosci. 34: 13151–13162.
  • 16. Parvaz MA, Konova AB, Proudfit GH, Dunning JP, Malaker P, Moeller SJ, et al. (2015): Impaired Neural Response to Negative Prediction Errors in Cocaine Addiction. J Neurosci. 35: 1872–1879.
  • 17. Tanabe J, Reynolds J, Krmpotich T, Claus E, Thompson LL, Du YP, Banich MT (2013): Reduced Neural Tracking of Prediction Error in Substance-Dependent Individuals. Am J Psychiatry. 170: 1356–1363.
  • 18. Rose EJ, Salmeron BJ, Ross TJ, Waltz J, Schweitzer JB, McClure SM, Stein EA (2014): Temporal Difference Error Prediction Signal Dysregulation in Cocaine Dependence. Neuropsychopharmacology. 39: 1732–1742.
  • 19. Stewart JL, Connolly CG, May AC, Tapert SF, Wittmann M, Paulus MP (2014): Cocaine Dependent Individuals with Attenuated Striatal Activation during Reinforcement Learning are More Susceptible to Relapse. Psychiatry Res Neuroimaging. 223: 129–139.
  • 20. Brown VM, Zhu L, Wang JM, Frueh BC, King-Casas B, Chiu PH (2018): Associability-modulated loss learning is increased in posttraumatic stress disorder. Elife. 7: 1–27.
  • 21. Niv Y, Edlund JA, Dayan P, O’Doherty JP (2012): Neural Prediction Errors Reveal a Risk-Sensitive Reinforcement-Learning Process in the Human Brain. J Neurosci. 32: 551–562.
  • 22. Christakou A, Gershman SJ, Niv Y, Simmons A, Brammer M, Rubia K (2013): Neural and Psychological Maturation of Decision-making in Adolescence and Young Adulthood. J Cogn Neurosci. 25: 1807–1823.
  • 23. Li J, Delgado MR, Phelps EA (2011): How Instructed Knowledge Modulates the Neural Systems of Reward Learning. Proc Natl Acad Sci. 108: 55–60.
  • 24. Palminteri S, Lefebvre G, Kilford EJ, Blakemore S (2016): Confirmation Bias in Human Reinforcement Learning. BioRXiv. 1–21.
  • 25. Sutton RS, Barto AG (1998): Reinforcement learning: An introduction. (Vol. 1), Cambridge Univ Press.
  • 26. Niv Y, Daniel R, Geana A, Gershman SJ, Leong YC, Radulescu A, Wilson RC (2015): Reinforcement Learning in Multidimensional Environments Relies on Attention Mechanisms. J Neurosci. 35: 8145–8157.
  • 27. Boorman ED, Behrens TEJ, Woolrich MW, Rushworth MFS (2009): How Green Is the Grass on the Other Side? Frontopolar Cortex and the Evidence in Favor of Alternative Courses of Action. Neuron. 62: 733–743.
  • 28. Cavanagh JF (2015): Cortical delta activity reflects reward prediction error and related behavioral adjustments, but at different times. Neuroimage. 110: 205–216.
  • 29. Collins AGE, Frank MJ (2016): Neural signature of hierarchically structured expectations predicts clustering and transfer of rule sets in reinforcement learning. Cognition. 152: 160–169.
  • 30. Daw ND (2011): Trial-by-trial Data Analysis Using Computational Models. Decis Making, Affect Learn. Oxford, England: Oxford University Press, pp 3–38.
  • 31. Ahn W-Y, Krawitz A, Kim W, Busemeyer JR, Brown JW (2011): A Model-Based fMRI Analysis with Hierarchical Bayesian Parameter Estimation. J Neurosci Psychol Econ. 4: 95–110.
  • 32. Friston KJ, Holmes AP, Worsley KJ, Poline J-P, Frith CD, Frackowiak RSJ (1994): Statistical Parametric Maps in Functional Imaging: A General Linear Approach. Hum Brain Mapp. 2: 189–210.
  • 33. Maldjian JA, Laurienti PJ, Kraft RA, Burdette JH (2003): An Automated Method for Neuroanatomic and Cytoarchitectonic Atlas-Based Interrogation of fMRI Data Sets. Neuroimage. 19: 1233–1239.
  • 34. Garrison J, Erdeniz B, Done J (2013): Prediction Error in Reinforcement Learning: A Meta-Analysis of Neuroimaging Studies. Neurosci Biobehav Rev. 37: 1297–1310.
  • 35. Esterman M, Tamber-Rosenau BJ, Chiu Y, Yantis S (2010): Avoiding Non-Independence in fMRI Data Analysis: Leave One Subject Out. Neuroimage. 50: 572–576.
  • 36. Tiffany ST, Singleton E, Haertzen CA, Henningfield JE (1993): The Development of a Cocaine Craving Questionnaire. Drug Alcohol Depend. 34: 19–28.
  • 37. Efron B, Tibshirani R (1986): Bootstrap Methods for Standard Errors, Confidence Intervals, and Other Measures of Statistical Accuracy. Stat Sci. 1: 54–75.
  • 38. Martinez WL, Martinez AR (2008): Computational Statistics Handbook with MATLAB: Second Edition. London, United Kingdom: Chapman and Hall/CRC.
  • 39. Maia TV, Frank MJ (2011): From Reinforcement Learning Models to Psychiatric and Neurological Disorders. Nat Neurosci. 14: 154–162.
  • 40. Friston KJ, Stephan KE, Montague R, Dolan RJ (2014): Computational Psychiatry: the Brain as a Phantastic organ. The Lancet Psychiatry. 1: 148–158.
  • 41. Friston KJ, Redish AD, Gordon JA (2017): Computational Nosology and Precision Psychiatry. Comput Psychiatry. 1–22.
  • 42. Redish AD (2004): Addiction as a Computational Process Gone Awry. Science. 306: 1944–1947.
  • 43. Schultz W (2011): Potential Vulnerabilities of Neuronal Reward, Risk, and Decision Mechanisms to Addictive Drugs. Neuron. 69: 603–617.
  • 44. Keiflin R, Janak PH (2015): Dopamine Prediction Errors in Reward Learning and Addiction: From Theory to Neural Circuitry. Neuron. 88: 247–263.
  • 45. Voon V, Pessiglione M, Brezing C, Gallea C, Fernandez HH, Dolan RJ, Hallett M (2010): Mechanisms Underlying Dopamine-Mediated Reward Bias in Compulsive Behaviors. Neuron. 65: 135–142.
  • 46. Schmidt L, Braun EK, Wager TD, Shohamy D (2014): Mind Matters: Placebo Enhances Reward Learning in Parkinson’s Disease. Nat Neurosci. 17: 1793–1797.
  • 47. Palminteri S, Justo D, Jauffret C, Pavlicek B, Dauta A, Delmaire C, et al. (2012): Critical Roles for Anterior Insula and Dorsal Striatum in Punishment-Based Avoidance Learning. Neuron. 76: 998–1009.
  • 48. Seymour B, Daw N, Dayan P, Singer T, Dolan R (2007): Differential encoding of losses and gains in the human striatum. J Neurosci. 27: 4826–4831.
  • 49. Cox SML, Frank MJ, Larcher K, Fellows LK, Clark CA, Leyton M, Dagher A (2015): Striatal D1 and D2 Signaling Differentially Predict Learning from Positive and Negative Outcomes. Neuroimage. 109: 95–101.
  • 50. Pessiglione M, Delgado MR (2015): The Good, the Bad and the Brain: Neural Correlates of Appetitive and Aversive Values Underlying Decision Making. Curr Opin Behav Sci. 5: 78–84.
  • 51. Ersche KD, Gillan CM, Jones PS, Williams GB, Ward LHE, Luijten M, et al. (2016): Carrots and sticks fail to change behavior in cocaine addiction. Science. 352: 1468–71.
