Substance Use is Associated with Reduced Devaluation Sensitivity

Kaileigh A Byrne; A Ross Otto; Bo Pang; Christopher J Patrick; Darrell A Worthy

doi:10.3758/s13415-018-0638-9

Author manuscript; available in PMC: 2021 Apr 15.

Published in final edited form as: Cogn Affect Behav Neurosci. 2019 Feb;19(1):40–55. doi: 10.3758/s13415-018-0638-9

PMCID: PMC8049538 NIHMSID: NIHMS1511193 PMID: 30377929

Abstract

Substance use has been linked to impairments in reward processing and decision-making, yet empirical research on the relationship between substance use and devaluation of reward in humans is limited. Here, we report findings from two studies that tested whether individual differences in substance use behavior predicted reward learning strategies and devaluation sensitivity in a non-clinical sample. Participants in Experiment 1 (N = 66) and Experiment 2 (N = 91) completed subscales of the Externalizing Spectrum Inventory and then performed a two-stage reinforcement learning task that included a devaluation procedure. Spontaneous eye blink rate was used as an indirect proxy for dopamine functioning. In Experiment 1, correlational analysis revealed a negative relationship between substance use and devaluation sensitivity. In Experiment 2, regression modeling revealed that while spontaneous eye blink rate moderated the relationship between substance use and reward learning strategies, substance use alone was related to devaluation sensitivity. These results suggest that once reward-action associations are established during reinforcement learning, substance use predicts reduced sensitivity to devaluation independently of variation in eye blink rate. Thus, substance use is not only related to increased habit formation but also to difficulty disengaging from learned habits. Implications for the role of the dopaminergic system in habitual responding in individuals with substance use problems are discussed.

Keywords: substance use, decision-making, reward, devaluation, habit formation

Substance Use is Associated with Reduced Devaluation Sensitivity

The dual systems theory of reinforcement learning proposes that learning is mediated through two different neural systems: the model-free system and the model-based system (Daw et al., 2005). The “model-free,” or habit-based, system learns solely from reward prediction errors. In contrast, learning that entails developing a model of the environment with different action-outcome sequence paths in order to maximize long-term future actions reflects the “model-based,” or goal-directed, system (Daw et al., 2005; Doya et al., 2002; Friedel et al., 2014). The model-free system is less flexible and entails more automatic processing, while the model-based reinforcement learning system requires more cognitive resources but underlies more flexible, controlled processing (Otto et al., 2013).

The distinction between these two systems has recently received attention from researchers studying counterproductive habitual behaviors such as substance use problems. One theoretical framework for substance abuse proposes that vulnerabilities in decision-making contribute to maladaptive behaviors, such as drug use (Redish, Jensen, & Johnson, 2008). In particular, selective inhibition of the model-based system represents one such vulnerability. Repeated drug use may shift the balance from the model-based to the model-free system, which may, in turn, lead to difficulties in breaking habits (Redish et al., 2008). Dysfunction in the prefrontal cortex may disrupt an individual’s ability to update reward values of non-drug reinforcers. This process may lead to reduced ability to devalue rewards and contribute to compulsive drug use (Goldstein & Volkow, 2011). Indeed, there is evidence that the transition from recreational to addictive substance abuse corresponds to a shift from the use of model-based learning approaches to reliance on model-free strategies (Everitt & Robbins, 2005; Loewenstein & O’Donoghue, 2004; Gillan et al., 2015). For example, alcohol and stimulant addiction has been associated with selective impairment in model-based learning within a two-stage reinforcement learning task (Gillan et al., 2016; Sebold et al., 2014; Voon et al., 2015). Other work using instrumental learning tasks has shown that alcohol-dependent individuals engage in more habit-based learning than controls (e.g., Sjoerds et al., 2013). Similarly, a recent review has proposed that cocaine addiction may impair model-based learning and promote model-free behavior (Lucantonio et al., 2014). However, this relationship has yet to be empirically tested in humans. In addition to substance use problems, impulsive behavior may also contribute to dysregulation in goal-directed and habit-based control (Hogarth et al., 2012). Thus, prior research provides some evidence linking externalizing behaviors with diminished goal-directed learning and reliance on habit-based responding.

While substance abuse is associated with distinct patterns of reward learning, another key factor that may play a role in substance use problems is difficulty in reward disengagement. Prior research indicates that substance abuse is associated with increased perseveration during reversal learning in humans (Ersche et al., 2008; Patzelt et al., 2014), yet it is important to note that animal lesion studies have found that distinct neural regions mediate reversal learning and devaluation (Izquierdo & Murray, 2007; Rudebeck & Murray, 2008). Reversal learning entails flexibly updating stimulus-reward associations, but devaluation involves updating reward value information and inhibiting prior learned responses (Rudebeck & Murray, 2008). Beyond variation in learning to respond to action-reward outcomes, substance use problems may correspond to marked declines in the ability to devalue or disengage from those reward-motivated responses. With these issues in mind, the present investigation utilized data from a large representative sample of males and females in the service of two major aims. The first was to examine relations of model-based and model-free components of reinforcement learning with a broad range of substance use behaviors and ascertain whether individual differences in eyeblink rate, an indirect proxy for dopamine functioning, might moderate this relationship. The second was to evaluate whether substance use predicts reward disengagement after reinforcement learning has occurred.

One paradigm that has frequently been used to assess reward disengagement is the devaluation procedure, in which reductions in the value attached to a previously rewarded outcome are assessed. Recent work that has employed devaluation procedures in humans demonstrates that goal-directed associative learning predicts heightened devaluation sensitivity; on the other hand, habit-based learning is associated with insensitivity to reward devaluation (Friedel et al., 2014; Gillan et al., 2015). Reward devaluation has been shown to occur in dopamine-free rats (Berridge & Robinson, 1998), indicating that devaluation may not depend on variation in dopaminergic availability. Furthermore, animal research suggests that repeated substance use may lower one’s devaluation sensitivity threshold. For example, extended exposure to cocaine in rats has been associated with insensitivity to outcome values and, subsequently, reduced devaluation sensitivity (Leong, Berini, Ghee, & Reichel, 2016; Schoenbaum & Setlow, 2005). It is plausible, therefore, that substance use may predict diminished devaluation sensitivity in humans as well. However, it is unclear whether striatal dopamine influences devaluation in individuals with substance use tendencies.

Previous research suggests that variation in spontaneous eye blink rate (EBR), a hypothesized index of striatal tonic dopamine functioning (see Jongkees & Colzato, 2016 for a review of EBR methodology and findings), moderates the effects of trait disinhibition and substance abuse on reward wanting and learning, respectively (Byrne, Patrick, & Worthy, 2016). Specifically, low EBR levels predicted enhanced reward wanting in individuals with high disinhibitory tendencies; in contrast, high EBR predicted better associative learning in individuals who reported high amounts of substance abuse behaviors (Byrne, Patrick, & Worthy, 2016). Given the specific effects of substance use on associative reward learning, it is reasonable to predict that substance abuse tendencies, but not trait disinhibition, may also be associated with impaired reward disengagement.

To pursue this investigation, we utilized spontaneous EBR, which consistent evidence from several lines of research suggests may provide an indirect physiological index of striatal tonic dopamine functioning (Cavanagh et al., 2014; Elsworth et al., 1991; Groman et al., 2014; Jutkiewicz & Bergman, 2004; Kaminer et al., 2011; Karson, 1983; Taylor et al., 1999), although two recent PET imaging studies failed to observe a positive relationship between dopamine and blink rate (Dang et al., 2017; Sescousse et al., 2017). Prior research, including pharmacological and PET imaging work, demonstrates that faster spontaneous EBR is correlated with elevated dopamine levels in the striatum (Colzato et al., 2008; Karson, 1983; Taylor et al., 1999), and also with dopamine D₂ receptor density in the ventral striatum and caudate nucleus (Dreyer et al., 2010; Groman et al., 2014; Slagter et al., 2015). Moreover, a recent review concluded that in addition to the relationship between EBR and dopaminergic activity, this measure is also a reliable predictor of individual differences in performance on tasks that are dopamine-dependent and of aberrations in dopaminergic activity in psychopathology (Jongkees & Colzato, 2016). The Externalizing Spectrum Inventory-Brief Form (Patrick, Kramer, Krueger, & Markon, 2013) was used to assess trait disinhibition and recreational substance abuse tendencies. In order to test whether substance use tendencies predict both reward learning and disengagement, we used a two-stage reinforcement learning task in combination with a devaluation procedure that has been used in previous research (Gillan et al., 2015).

Study Hypotheses

Given previous work with this task and findings showing that recreational substance-using individuals with high spontaneous EBRs show enhanced reward learning, we tested the following hypotheses:

H1: Substance use, independent of the EBR index of dopaminergic functioning, will predict habit-based responding on the devaluation procedure, suggesting that the effects of tonic dopamine may not influence goal-directed responding once learning has occurred.

H2: Given previous work reporting that substance-dependent individuals show deficits in model-based learning (Gillan et al., 2016; Sebold et al., 2014; Voon et al., 2015), we hypothesized that substance use would negatively predict model-based strategies during the reinforcement learning phase of the task.

H3: High EBR will positively predict model-based control in the reinforcement learning phase of the task.

H4a: Individuals reporting high levels of recreational substance use with high EBR will show heightened goal-directed responding during the devaluation procedure.

H4b: Individuals reporting high levels of substance use in conjunction with low EBR will show heightened habit-responding during the devaluation procedure, reflecting a facilitative effect of dopamine on reward learning, but a hindering effect on goal-directed responding.

EXPERIMENT 1

Method

Participants

Sixty-six undergraduate students (47 females, 19 males; M_age = 18.42, SD_age = 0.71) completed the experiment for partial fulfillment of an introductory psychology course requirement. The study was approved by the Institutional Review Board at Texas A&M University before procedures were implemented.

Individual Difference Measures and Experimental Task

Spontaneous EBR (Tonic Dopamine Marker)

Consistent with previous research (e.g., Byrne, Norris, & Worthy, 2016; Byrne, Patrick, & Worthy, 2016; Chermahini & Hommel, 2010; Colzato, Slagter, van den Wildenberg, & Hommel, 2009; De Jong & Merckelbach, 1990; Ladas, Frantzidis, Bamidis, & Vivas, 2014), we measured spontaneous eye blink rate using electrooculographic (EOG) recording. We followed the procedure described by Fairclough and Venables (2006) to record EBR: Vertical eye blink activity was collected by attaching Ag–AgCl electrodes above and below the left eye, with a ground electrode placed at the center of the forehead. All EOG signals were filtered at 0.01–10 Hz and amplified by a Biopac EOG100C differential corneal–retinal potential amplifier. In line with previous research, eye blinks were defined as phasic increases in EOG activity of >100 µV and <500 ms in duration (Byrne, Norris, & Worthy, 2016; Byrne, Patrick, & Worthy, 2016; Barbato et al., 2000; Colzato et al., 2009; Colzato et al., 2007). To ensure valid results, eye blink frequency was both manually counted and derived using Biopac AcqKnowledge software functions, which computed the frequency of amplitude changes of greater than 100 µV but did not take duration into account. The manual and automated EBRs were strongly positively correlated, r = .97, p < .001. Manual EBR was used for all subsequent statistical analyses.

All recordings were collected during daytime hours of 10 a.m. to 5 p.m. because previous work has shown that diurnal fluctuations in spontaneous EBR can occur in the evening hours (Barbato et al., 2000). A black fixation cross (“X”) was displayed at eye level on a wall situated one meter from where the participant was seated. Participants were instructed to look in the direction of the fixation cross for the duration of the recording and avoid moving or turning their head. Eye blinks were recorded for six minutes under this basic resting condition. Each participant’s EBR was quantified by computing the average number of blinks per minute across the 6-min recording interval.
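The blink criterion and the rate computation described above can be illustrated with a minimal sketch. It is not the authors' scoring procedure (blinks were counted manually and with AcqKnowledge); it assumes a baseline-corrected EOG trace in microvolts sampled at a known rate, and the function names and the 100 Hz sampling rate in the usage note are hypothetical.

```python
def count_blinks(eog_uv, fs_hz, amp_uv=100.0, max_dur_s=0.5):
    """Count runs of samples above amp_uv that last less than max_dur_s.

    Assumption: eog_uv is baseline-corrected, so a 'phasic increase of
    >100 uV' is simply a run of samples above the amplitude threshold.
    """
    blinks = 0
    run_len = 0
    # A sentinel below any threshold closes a run that reaches the end.
    for sample in list(eog_uv) + [float("-inf")]:
        if sample > amp_uv:
            run_len += 1
        else:
            if run_len and run_len / fs_hz < max_dur_s:
                blinks += 1
            run_len = 0
    return blinks


def blinks_per_minute(n_blinks, recording_s=360.0):
    """Average blink rate across the recording (6 min = 360 s in the text)."""
    return n_blinks / (recording_s / 60.0)
```

For example, with a hypothetical 100 Hz trace, a 0.2 s deflection of 150 µV counts as a blink, whereas a 1 s deflection is rejected by the duration criterion; 90 blinks in the 6-min recording correspond to an EBR of 15 blinks per minute.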

Externalizing Spectrum Inventory–Brief Form

To assess substance use behavior and general externalizing proneness, we administered the Substance Abuse and Disinhibition subscales from the Externalizing Spectrum Inventory – Brief Form (ESI-BF; Patrick et al., 2013). The 18-item Substance Abuse subscale assesses use of and problems with alcohol and drugs, while the 20-item Disinhibition subscale indexes an individual’s general proclivity for externalizing problems through questions pertaining to impulsivity, impatient urgency, alienation, irresponsibility, theft, and fraud. Participants responded to the items of each subscale using a 4-point Likert scale (true, somewhat true, somewhat false, and false), and responses were coded such that higher scores indicate a greater degree of substance abuse problems and externalizing tendencies, respectively. Prior research has shown strong validity for both subscales in relation to relevant criterion measures (Patrick & Drislane, 2015; Venables & Patrick, 2012). Additionally, within the current study sample, each subscale exhibited high internal consistency (Cronbach’s αs = .91 and .83 for Substance Abuse and Disinhibition, respectively). Scores for substance use ranged from 0 – 49 out of a maximum possible score of 72. Figure S1 in the Supplementary Materials displays a histogram showing the frequency distribution of substance use in this sample. Among participants in the sample, 36.92% endorsed a “True” or “Somewhat True” response on at least one marijuana use/problems question, and 40.00% endorsed a “True” or “Somewhat True” response on at least one item indicative of alcohol problems. Table 1 displays the rate of recreational substance use reported for each type of drug assessed.

Table 1.

Recreational Drug Use in Experiments 1 and 2

	Alcohol	Marijuana	Hallucinogens	Depressants
Experiment 1	40%	37%	6%	5%
Experiment 2	39%	37%	7%	9%


Note. Percentages indicate the proportion of participants reporting recreational use of each substance. Specific hallucinogens included LSD and magic mushrooms, and depressants included drugs such as Valium and Xanax used for non-medical reasons.

Barratt Impulsiveness Scale

In view of the close relationship between disinhibition and impulsivity, we also administered the 30-item Barratt Impulsiveness Scale (11th version; BIS-11). The BIS-11 measures three domains of impulsiveness: motor, cognitive, and nonplanning (Patton et al., 1995). Participants used a 4-point Likert scale from 0 – 3 (rarely/never, occasionally, often, and almost always/always) to indicate the frequency with which they engaged in the behavior described by each questionnaire item. This scale has been shown to have a high degree of internal consistency among college students (α = .82; Patton et al., 1995).

Two-Stage Reinforcement Learning Devaluation Task

Reinforcement Learning Phase.

A two-stage reinforcement-learning task with a devaluation procedure was used to assess individual differences in devaluation (unlearning) of previously valued choices (Gillan et al., 2015; Figure 1). To compare devalued and non-devalued rewards in the devaluation phase, the task included two conditions or “states” (gold and silver) of a two-stage Markov decision process. Each state, which we refer to as a “trial type”, was independent of the other and was denoted using a gold or silver border that clearly outlined choice options. The reward structure and procedure were the same for both trial types, but rewards earned for each trial type were recorded separately. Participants were informed that the points they earned would be stored in a container of the corresponding color.

Figure 1. Overview of the reinforcement learning task. Each trial began with one of two trial types (gold or silver). First-stage selections, which cost 5 points, could lead to a common (70%) or rare (30%) second-stage state, which had a distinct probability of reward that slowly changed based on Gaussian random walks across trials. Selections were not made in the second stage. Figure adapted from Gillan et al., 2015.

In the reinforcement learning phase of the task, participants completed two concurrent two-stage reinforcement learning tasks that were structurally equivalent, but had unique stimuli and rewards. Through this procedure, individuals gained experience with two situations (“states”) and reward outcomes on which to base future decisions (Gläscher et al., 2010). Two options were presented, and participants were given 2.5 s to choose one or the other. If a response was not made within this response window, the words “No Response” were displayed on the screen, and the next trial began. For each trial, choosing an option in the first stage cost 5 points, which was displayed on the screen after participants selected an option. Each first-stage option had a 70% probability of leading to a particular state (point box) in the second stage (common state transition) and a 30% probability of leading to a different state in the second stage (rare state transition). The probability that the states in the second stage contained a reward varied across the task based on independent Gaussian random walks (SD = .025) with a minimum probability of 25% and maximum probability of 75%. Rewards were portrayed in points such that the state-transition point boxes contained either 0 (unrewarded) or 100 (rewarded) points. After the second stage, the points earned were stored in a gold (gold trial type) or silver (silver trial type) container. The cumulative amount earned was displayed throughout the reinforcement learning phase.
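The trial structure described above can be sketched in a few lines. This is an illustration only, not the task code used in the study. Several details are assumptions that the text does not specify: the walk starts at .50, the 25% and 75% limits are enforced by reflection, and first-stage option i commonly leads to second-stage state i. All function names are hypothetical.

```python
import random


def reflect(p, lo=0.25, hi=0.75):
    """Keep p inside [lo, hi] by reflecting at the limits (an assumption)."""
    while p < lo or p > hi:
        if p < lo:
            p = 2 * lo - p
        if p > hi:
            p = 2 * hi - p
    return p


def reward_walk(n_trials, start=0.5, sd=0.025, rng=None):
    """Reward probability of one second-stage state across trials."""
    rng = rng or random.Random()
    p = start
    probs = []
    for _ in range(n_trials):
        probs.append(p)
        p = reflect(p + rng.gauss(0.0, sd))  # Gaussian step, SD = .025
    return probs


def simulate_trial(choice, reward_probs, rng, p_common=0.7, cost=5, payoff=100):
    """One trial: choice is 0 or 1; reward_probs holds the current reward
    probability of second-stage states 0 and 1."""
    common = rng.random() < p_common            # 70% common, 30% rare
    state = choice if common else 1 - choice    # assumed option-state mapping
    rewarded = rng.random() < reward_probs[state]
    net = (payoff if rewarded else 0) - cost    # every choice costs 5 points
    return state, common, rewarded, net
```

A 100-trial session for one trial type would draw one walk per second-stage state, for example `walks = [reward_walk(100, rng=rng) for _ in range(2)]`, and pass `[walks[0][t], walks[1][t]]` to `simulate_trial` on trial `t`.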

Because each trial cost 5 points, if the state in the second stage led to 0 points, then there was a net loss of 5 points for that trial, and the 5 points were deducted from the cumulative total. If the second state yielded 100 points, then there was a net gain of 95 points that was added to the cumulative total. A model-based strategy entails learning which first-stage choice has the common second-stage state transition that is most likely to lead to reward. A model-based learner learns about the chosen first-stage option on common trials, and about the unchosen first-stage option on rare trials. In contrast, a model-free strategy entails staying with a first-stage option if it yielded a reward on the previous trial or switching if the previously selected option was not rewarded; thus, model-free strategies essentially ignore whether the second stage state transition was common or rare.
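The contrast between the two strategies can be made concrete with an idealized sketch. It describes pure, deterministic learners of the kind depicted schematically in such tasks, not a fitted model of participants' behavior, and the function names are hypothetical.

```python
def net_points(rewarded, cost=5, payoff=100):
    """Net outcome of a trial: +95 if rewarded, -5 if not."""
    return (payoff if rewarded else 0) - cost


def model_free_stays(prev_rewarded, prev_common):
    """A pure model-free learner repeats a rewarded choice and switches
    after an unrewarded one; the transition type is ignored on purpose."""
    return prev_rewarded


def model_based_stays(prev_rewarded, prev_common):
    """A pure model-based learner credits the outcome to the option that
    commonly leads to the visited state: the chosen option after a common
    transition, the unchosen option after a rare one."""
    return prev_rewarded == prev_common
```

The two learners disagree only after rare transitions: following a rewarded rare transition the model-free learner stays, whereas the model-based learner switches to the option that commonly reaches the rewarded state.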

Devaluation Phase.

In the devaluation phase of the task, participants were informed that one of their point containers (i.e., gold) was full, and that they would no longer be able to store points in that container. Even if the second stage point box contained 100 points, participants were informed that the points would not be deposited in the container, and they would only be charged the 5 points for that trial. The other point container (i.e., silver) still had room, and participants could still keep points for those trial types. Thus, actions made to earn gold points became devalued.

The outcome of the second stage (either 0 or 100 points) was not displayed in the devaluation phase in order to ensure that new learning did not contribute to choices in this phase. Participants were informed about this procedural change and advised that the results of their choices (whether or not they received points on each trial) would no longer be shown, but apart from this, nothing about the game had changed and they should continue playing as before. Four trials with no feedback were presented prior to the devaluation phase so participants could learn and adjust to this change in feedback presentation before the devaluation procedure began. After these four trials, participants viewed instructions that one of their point containers was completely full. The optimal strategy in this phase was to respond the same way as in the reinforcement learning phase for the valued trial type that still had room (i.e., silver), and choose not to respond for the devalued trial type where the container was full (i.e., gold).

The two-stage reinforcement learning devaluation task used in this study was the same as that used in previous research (Gillan et al., 2015), with three modifications: (1) we used points rather than monetary rewards (coins) for the second state reward outcome because participants were recruited from university classes and completed testing for partial course credit rather than being recruited through MTurk and paid for participation as in Gillan et al. (2015); (2) we included 100 trials of the reinforcement learning phase, as opposed to the 200 trials used in some previous studies (e.g., Daw et al., 2011; Gillan et al., 2015), to align with the number of trials used in our own prior work (Byrne, Patrick, & Worthy, 2016); and (3) we included 50 trials of the devaluation phase rather than 20 trials. The reward structure, stimuli, and design were otherwise identical.

Procedure

Participants completed the questionnaires and two-stage reinforcement learning devaluation task on PCs using Psychtoolbox for Matlab (version 2.5). Participants first completed the ESI-BF Substance Abuse and Disinhibition scales along with the BIS-11. Next, participants completed 100 trials of the reinforcement learning phase of the two-stage reinforcement learning devaluation task, followed by 50 trials of the devaluation phase. The instructions for the two-stage reinforcement learning phase and the devaluation phase are presented in the Supplementary Material. The session ended with the 6-min EBR assessment.

Data Analysis

Correlations and multiple hierarchical linear regressions were computed using JASP software (jasp-stats.org). JASP allows for both frequentist and Bayesian hypothesis tests. For all Bayesian tests we used the default priors from JASP. When testing whether a coefficient from a regression model is significant, we report both the p-value from a frequentist test and the Bayes factor (BF) value for the predictor being tested when entered into a model following the inclusion of all other predictor variables. Bayesian regression in JASP also follows a model comparison approach when evaluating which set of predictors best accounts for a data set. Bayes factors are interpreted as the odds supporting one hypothesis over another. BF values between 3–10 indicate substantial support for the alternative hypothesis over the null hypothesis that a coefficient’s true value is zero, BFs between 10–30 indicate strong support, BFs between 30–100 indicate very strong support, and a value above 100 indicates extreme evidence that the alternative hypothesis should be supported over the null (Jeffreys, 1961). In comparative analyses, Wetzels and colleagues generally found that a BF of 3 corresponds to a p-value of around .01; many tests that yielded p-values between .01 and .05 did not have BFs larger than 3 (Wetzels et al., 2011). Thus, an advantage of including results from Bayesian hypothesis tests is that the conclusions are typically more conservative than those from frequentist tests using an alpha level of .05. Additionally, Bayes factors can be interpreted on a continuous scale, unlike p-values.
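The evidence categories above can be written as a small lookup, shown here as a hedged sketch. The function name is hypothetical, the label for values below 3 is our wording rather than a category given in the text, and assigning a value that falls exactly on a boundary (10, 30) to the higher category is an assumption.

```python
def bf_label(bf10):
    """Map a Bayes factor (alternative over null) to the verbal categories
    used in the text (Jeffreys, 1961)."""
    if bf10 > 100:
        return "extreme"
    if bf10 >= 30:
        return "very strong"
    if bf10 >= 10:
        return "strong"
    if bf10 >= 3:
        return "substantial"
    # The text defines no category below 3; this label is an assumption.
    return "less than substantial"
```

For instance, a BF of 5 would be read as substantial support for the alternative hypothesis, whereas a BF of 2.65 falls short of that threshold even if the corresponding p-value is below .05.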

Mixed-effects logistic regression analyses were performed using the lme4 and brms packages for R, version 3.0.1. The lme4 package uses frequentist methods, while the brms package uses Bayesian methods to estimate the model parameters. One advantage of Bayesian methods is that they can estimate models that often fail to converge under frequentist methods (Bürkner, 2017). When reporting results from lme4 we use p-values to denote whether regression coefficients differed significantly from zero, and when reporting results from brms, we indicate whether the 95% highest density interval from a given coefficient’s posterior distribution includes zero. If it does not, this indicates substantial evidence that the coefficient’s true value is not zero.
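The decision rule for the Bayesian models can be illustrated with a generic sketch that computes a highest density interval from posterior draws and checks whether it contains zero. This is not brms code: it assumes a list of posterior samples for a single coefficient is already available, takes the narrowest window containing 95% of the sorted draws as the interval, and uses hypothetical function names.

```python
import math


def hdi(samples, mass=0.95):
    """Narrowest interval containing `mass` of the posterior draws."""
    xs = sorted(samples)
    n = len(xs)
    k = max(1, math.ceil(mass * n))  # number of draws inside the interval
    # Slide a window of k draws and keep the narrowest one.
    best = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[best], xs[best + k - 1]


def excludes_zero(interval):
    """True when the whole interval lies on one side of zero."""
    lo, hi = interval
    return lo > 0 or hi < 0
```

Under this rule, a coefficient whose draws cluster around 1.0 with a small spread would be treated as credibly different from zero, whereas a posterior centered on zero would not.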

Trial types (silver coded as 0, gold coded as 1) were computed independently in the analysis such that reward and common/rare states pertained to the previous outcomes for that trial type. For instance, for a trial in which participants selected an option on a gold trial type, the reward and common/rare outcome variables were computed based on the trial preceding that trial type. Reward, second state outcome (common or rare), their interaction, and participants were included as random effects. In line with the analyses of Gillan et al. (2015), we first tested the three-way interaction between Reward, Transition, and Devaluation sensitivity to predict Stay behavior (coded as stay 1, switch 0). The Reward from the previous trial was coded as rewarded 1 and unrewarded −1. Transition type on the previous trial was coded as common 1 and rare 0, and devaluation sensitivity scores were z-scored. Devaluation sensitivity was computed as the number of valued trials that participants responded on minus the number of devalued trials that participants responded on. Higher values indicate enhanced devaluation sensitivity.
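The coding scheme described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' analysis script: trials are assumed to be stored in presentation order as dictionaries with the hypothetical keys shown, missed-response trials are assumed to have been removed beforehand, and the z-scores use the sample standard deviation.

```python
import statistics


def code_trials(trials):
    """Build Stay/Reward/Transition rows, looking back to the previous
    trial of the SAME trial type (gold or silver).

    Each trial is a dict with keys 'type', 'choice', 'rewarded', 'common'.
    """
    last_of_type = {}
    rows = []
    for trial in trials:
        prev = last_of_type.get(trial["type"])
        if prev is not None:
            rows.append({
                "TrialType": 1 if trial["type"] == "gold" else 0,
                "Stay": 1 if trial["choice"] == prev["choice"] else 0,
                "Reward": 1 if prev["rewarded"] else -1,
                "Transition": 1 if prev["common"] else 0,
            })
        last_of_type[trial["type"]] = trial
    return rows


def devaluation_sensitivity(valued_responses, devalued_responses):
    """Responses on valued trials minus responses on devalued trials."""
    return valued_responses - devalued_responses


def zscore(values):
    """Standardize scores across participants (sample SD assumed)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]
```

As a hypothetical example, a participant who responded on 25 valued trials and 5 devalued trials would receive a devaluation sensitivity score of 20, which would then be standardized across the sample before entering the regression.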

To compute each participant’s model-free and model-based indices, representing the degree to which they engaged in model-free or model-based behavior, we first ran a model predicting Stay from Reward and Transition. The specific syntax for this mixed-effects logistic regression model was: Stay ~ Reward + Transition + Reward:Transition + (1 + Reward + Transition + Reward:Transition | Participant)¹. From this model, individual beta weights were derived. The betas from the Reward variable were used as the model-free metric, and the betas from the Reward X Transition interaction were designated as the model-based metric. Following Gillan et al. (2015), we also regressed Stay behavior on the interaction between Reward, Transition, and Devaluation Sensitivity. The specific syntax for this model was: Stay ~ Reward + Transition + Devaluation + Reward:Transition + Reward:Devaluation + Transition:Devaluation + Reward:Transition:Devaluation + (1 + Reward + Transition + Reward:Transition | Participant).

To evaluate our hypotheses regarding whether the individual difference variables (substance use, EBR, and disinhibition) would predict performance measures from the task (devaluation sensitivity, model-based index, and model-free index), correlations were first computed among these variables. We predicted that a negative correlation would be observed between substance abuse and the devaluation sensitivity measure. Furthermore, we expected to find a positive correlation between EBR and the model-based index.

Next, to assess relationships between substance use and model-based behavior, we performed regression analyses that (1) tested the prediction that substance use would negatively predict model-based learning, and (2) investigated the possibility that substance use would interact with EBR to influence model-based strategies. These analyses allowed for testing the separate and interactive effects of continuous variations in externalizing tendencies and dopamine levels on reinforcement learning behavior. We employed a hierarchical regression procedure in order to test a priori predictions for substance use and its interaction with the EBR proxy of dopaminergic functioning. This approach aligns with analyses in previous work (Byrne, Patrick, & Worthy, 2016) showing that unique manifestations of externalizing proneness (substance use and disinhibition) exert differential effects on reward processing. The predictors in the model included EBR (striatal dopamine proxy), Substance use, Disinhibition, the EBR X Substance use interaction term, and the EBR X Disinhibition interaction term. In addition, two separate regression analyses were conducted to test the effect of these predictors on devaluation sensitivity and the model-free metric.

Finally, in addition to these hierarchical regression analyses, we also performed mixed effects logistic regression analyses where we included the individual difference variables in the model predicting stay probability from Reward, Transition, and their interaction. This was done to investigate whether devaluation, substance use, disinhibition, or devaluation sensitivity interacted with reward or the Reward X Transition interaction to predict stay behavior. In other words, this model tested whether any of the individual difference variables interacted with the model-free (Reward) or model-based (Reward X Transition) metric. These analyses were exploratory in nature, as we did not have specific predictions regarding individual differences interacting with the model-free or model-based metrics. The specific syntax and results are listed in the supplementary material. All data for both Experiments 1 and 2 are available on the Open Science Framework (https://osf.io/unk79/).

Results

Correlational Analyses

Correlations were used to quantify bivariate relations between the predictor variables (EBR index of striatal dopamine, ESI-BF Substance Abuse, ESI-BF Disinhibition, and BIS-11 impulsiveness) and the criterion measures (devaluation sensitivity index, model-free index, and the model-based index). Table 2 shows correlations among all variables. Substance Use was significantly negatively correlated with devaluation sensitivity (r = −.26, p < .05; Figure 2) and the model-based metric (r = −.30, p < .05; Figure 3). However, the BF for the substance use–devaluation sensitivity correlation was only 1.26, and the BF for the substance use–model-based metric correlation was 2.65. Thus, these effects are suggestive, but weak. There were no significant correlations for disinhibition, impulsiveness, or the EBR index of striatal dopamine with any of the outcome measures.

Table 2.

Correlational Analyses for Experiment 1

	Substance Use	Disinhibition	Impulsiveness	EBR	Devaluation	MF Index
Disinhibition	0.20
Impulsiveness	0.09	0.48^**
EBR	−0.02	0.12	−0.13
Devaluation	−0.26^*	−0.15	−0.22	−0.09
MF Index	−0.20	0.02	0.19	−0.19	0.07
MB Index	−0.30^*	−0.09	−0.03	−0.10	0.07	0.72^**

Note. Impulsiveness refers to scores on the BIS-11 Impulsiveness Scale.

^** indicates significance at the p < .01 level.

^* indicates significance at the p < .05 level.

Figure 2. — Correlation between substance use and devaluation sensitivity (valued – devalued trials) in Experiment 1.

Figure 3. — Correlation between substance use and the model-based index in the reinforcement learning task in Experiment 1.

Mixed Effects Logistic Regression Analyses

As described above, we assessed participants’ trial-by-trial changes in stay behavior during the reinforcement learning phase of the task (Figure 4) using a model utilized in previous work (Gillan et al., 2015) that included Devaluation sensitivity along with Reward and Transition to predict stay probability on each trial. In line with previous studies (Daw et al., 2011; Gillan et al., 2015), the results demonstrated a significant main effect of Reward (β = .20, p < .001), suggesting that participants as a whole engaged in a simple model-free reinforcement strategy during this phase of the experiment. However, the Reward X Transition interaction, indicative of a model-based strategy, was nonsignificant, β = .01, p = .82. Further results of this analysis are presented in Table S1 of the Supplementary Material.

Figure 4. — Analysis of stay behavior in the two-choice reinforcement learning phase of the task in Experiment 1. (a) Depiction of a purely model-free learner, whose choices would be predicted only by whether the previous trial was rewarded or unrewarded. (b) Depiction of a purely model-based learner, whose choices would depend on both whether the previous trial was rewarded or unrewarded and whether the transition was common or rare. (c) Stay proportion results from the mixed effects logistic regression model averaged across all participants. As reported in the main text, evidence for model-free (main effect of Reward, p < .001), but not model-based (Reward X Transition interaction), behavior was observed in the 100-trial version of the task in Experiment 1.

To examine the individual and interactive effects of externalizing proneness and striatal dopamine on stay behavior, the individual difference variables (EBR index of striatal dopamine, substance use, and disinhibition) were incorporated into an expanded mixed effects logistic regression model. Although the same effect of Reward emerged as in the simpler model, we found no significant effects of the individual difference factors in predicting stay behavior, ps > .10. There were also no significant interactions of these individual difference variables with Reward (ps > .10) or Reward X Transition (ps > .10). The results are reported in Table S2 along with the specific syntax for the model.

Hierarchical Regression Analyses

Devaluation Sensitivity.

A hierarchical regression analysis was conducted to examine the effect of substance use, disinhibition, and EBR on devaluation sensitivity². In the first step, Gender was added as a covariate, but did not significantly predict devaluation sensitivity (p = .91). In the second step, scores for the three individual difference variables were entered into the model, R² = .11, F(4, 60) = 1.43, p = .23. Substance use emerged as a marginally significant predictor (β = −.23, p = .07, BF = 1.64), whereas effects for disinhibition and EBR were nonsignificant, ps > .30. In the last step of the model, the Substance use X EBR and Disinhibition X EBR interaction terms were added. At this step, none of the predictors were significant (ps > .40), nor did the addition of these terms account for a significant proportion of the variance in devaluation sensitivity, ∆R² = .001, F(2, 58) = 0.02, p = .98.
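The test of R² change at the final step follows the standard formula for comparing nested regression models, F = (∆R² / k) / ((1 − R² of the full model) / residual df), where k is the number of added predictors. The sketch below applies it to the rounded values reported above; the function name `f_change` is hypothetical, and because the inputs are rounded to two or three decimals the result only approximates the reported statistic.

```python
def f_change(r2_reduced: float, r2_full: float, n_added: int, df_resid: int) -> float:
    """F statistic for the increase in R^2 when n_added predictors enter a model."""
    if not 0.0 <= r2_reduced <= r2_full < 1.0:
        raise ValueError("expected 0 <= r2_reduced <= r2_full < 1")
    delta_r2 = r2_full - r2_reduced
    # Variance explained per added predictor, relative to residual variance per df.
    return (delta_r2 / n_added) / ((1.0 - r2_full) / df_resid)


# Rounded values from the text: step-2 R^2 = .11, delta R^2 = .001, df = (2, 58).
f_step3 = f_change(r2_reduced=0.11, r2_full=0.111, n_added=2, df_resid=58)
print(f"F(2, 58) is roughly {f_step3:.2f}")
```

With these inputs the statistic is far below 1, which agrees with the conclusion that the interaction terms added essentially no explanatory power.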

Model-Free Index.

The hierarchical regression analysis with each participant’s model-free index as the outcome variable used the same predictors as described for the devaluation sensitivity outcome measure. This metric, like the model-based index, depended on whether participants stayed with or switched options following a rewarded or unrewarded trial and on whether the option chosen on the previous trial led to a common or rare box in the second stage. In this model, neither the individual predictors nor the omnibus tests were significant in the first and second steps, ps > .10.
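One common way to derive such indices from raw choices is to compute stay proportions in the four Reward by Transition cells and then take the reward main effect (model-free) and the Reward X Transition interaction contrast (model-based). The sketch below illustrates this; it assumes that convention rather than reproducing the authors' exact computation, and the function name `stay_indices` and the trial tuple layout are hypothetical.

```python
from collections import defaultdict


def stay_indices(trials):
    """Return (stay proportion per cell, model-free index, model-based index).

    Each trial is a tuple (rewarded, common, stayed): the outcome and transition
    type of the previous trial, and whether the same first-stage option was
    chosen again on the current trial.
    """
    counts = defaultdict(lambda: [0, 0])  # (rewarded, common) -> [stays, total]
    for rewarded, common, stayed in trials:
        cell = counts[(bool(rewarded), bool(common))]
        cell[0] += int(stayed)
        cell[1] += 1

    cells = [(r, c) for r in (True, False) for c in (True, False)]
    missing = [cell for cell in cells if counts[cell][1] == 0]
    if missing:
        raise ValueError(f"no trials in cells: {missing}")

    p = {cell: counts[cell][0] / counts[cell][1] for cell in cells}
    # Model-free index: main effect of reward, ignoring transition type.
    mf = (p[True, True] + p[True, False]) - (p[False, True] + p[False, False])
    # Model-based index: reward x transition interaction contrast.
    mb = (p[True, True] - p[True, False]) - (p[False, True] - p[False, False])
    return p, mf, mb


# Synthetic learners (not real data): one repeats any rewarded choice, the other
# stays only after rewarded-common or unrewarded-rare trials.
model_free = [(1, 1, 1), (1, 0, 1), (0, 1, 0), (0, 0, 0)]
model_based = [(1, 1, 1), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
_, mf_a, mb_a = stay_indices(model_free)
_, mf_b, mb_b = stay_indices(model_based)
print(mf_a, mb_a, mf_b, mb_b)
```

The two synthetic learners mirror panels (a) and (b) of Figure 4: the first produces a reward effect with no interaction, and the second produces an interaction with no reward main effect.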

Model-Based Index.

The final set of hierarchical regression models was performed with the model-based index as the outcome variable. In the first step of the model, Gender was added as a covariate, but was nonsignificant, p = .40. In the second step, Substance use (β = −.29, p < .05, BF = 3.31) negatively predicted model-based behavior, whereas Disinhibition (p = .91) and EBR (p = .34) were not significantly associated with the model-based index. However, the omnibus test was not significant at this step, R² = .10, F(4, 59) = 1.88, p = .13. In the final step with the interaction terms added, the test of R² change was nonsignificant, ∆R² = .02, F(2, 57) = 0.56, p = .58, and none of the other predictors were significant at this step, ps > .05.

Discussion

Experiment 1 supports our first hypothesis (H1) and provides preliminary evidence that substance use alone, independent of our indirect proxy for striatal dopaminergic functioning (EBR) and trait disinhibition, is associated with diminished devaluation sensitivity. This result suggests that substance use is associated with reduced reward disengagement. However, it should be noted that the Bayes factors for this effect indicate only weak support for the substance use – devaluation sensitivity association, possibly due to insufficient power. Moreover, consistent with our second hypothesis (H2), substance use was negatively associated with model-based behavior during the reinforcement learning phase of the task. Because substance use was related to both model-based behavior and devaluation sensitivity, we also considered the prospect that an interaction between substance use and model-based behavior may predict devaluation sensitivity. However, as discussed in the Supplementary Materials, this analysis yielded nonsignificant results and thus provides no support for this possibility.

Given the absence of a significant Reward X Transition interaction, and, consequently, the apparent lack of model-based strategy use in overall choice behavior, this finding should be interpreted with caution. One potential limitation to this study that may have contributed to this null finding as well as the observed high correlation between the model-based and model-free indices is that participants performed 100 trials of the task, rather than 200 task trials as employed in previous research by Gillan et al. (2015). Whereas Gillan et al. reported a significant positive relationship between individuals’ model-based indices and devaluation sensitivity, our results for Experiment 1 (see Table 2) showed a nonsignificant effect (r = .07, p = .58). Thus, 100 trials of the reinforcement learning phase may not have been sufficient for participants to learn which first-stage options had a higher probability of yielding a reward in the second stage. One hundred trials may have also been insufficient to fully capture habit-based responding.

Additionally, a low correlation between substance use and disinhibition was observed. This result was unexpected given the strong association between these measures that has been reported in previous studies (e.g., Byrne, Patrick, & Worthy, 2016; Patrick & Drislane, 2015; Venables & Patrick, 2012). This low correlation can be attributed to a higher proportion of subjects who scored quite low on the disinhibition subscale but quite high on the substance abuse subscale. It is conceivable that this experimental sample may have included more students with low disinhibitory tendencies whose higher substance use scores reflected college-age experimentation as opposed to trait-related substance use. A larger sample size may mitigate this incidental issue.

Thus, in Experiment 2 we increased the duration of the reinforcement learning phase to 200 trials in order to more effectively test the relationship between model-based learning and devaluation sensitivity that was observed in Gillan et al.’s (2015) study. Furthermore, we increased the sample size to provide higher power to test for effects and used a sample with representative disinhibition and substance use tendencies. The devaluation phase remained unchanged in Experiment 2.