Author manuscript; available in PMC: 2023 Jul 10.
Published in final edited form as: Psychopharmacology (Berl). 2022 Jun 16;239(9):2885–2901. doi: 10.1007/s00213-022-06174-w

Adolescent reinforcement-learning trajectories predict cocaine-taking behaviors in adult male and female rats

Peroushini Villiamma 1,2, Jordan Casby 3, Stephanie M Groman 1,2,3,4
PMCID: PMC10332493  NIHMSID: NIHMS1911181  PMID: 35705734

Abstract

The anatomical, structural, and functional adaptations that occur in the brain during adolescence are thought to facilitate improvements in decision-making functions that are known to occur during this stage of development. The mechanisms that underlie these neural adaptations are not known, but deviations in developmental trajectories have been proposed to contribute to the emergence of mental illness, including addiction. Direct evidence supporting this hypothesis, however, has been limited. Here, we used a recently developed reversal-learning protocol to investigate the predictive relationship between adolescent decision-making trajectories and cocaine-taking behaviors in adulthood. Decision-making functions in the reversal-learning task were assessed throughout adolescence and into adulthood in male and female Long–Evans rats. Trial-by-trial choice data was fitted with a reinforcement-learning model to quantify the degree to which choice behavior of individual rats was influenced by rewarded (e.g., Δ+ parameter) and unrewarded (e.g., Δ0 parameter) outcomes. We report that reversal-learning performance improved during adolescence and that this was due to an increase in value updating for rewarded outcomes (e.g., Δ+ parameter). Furthermore, the rate of change in the Δ+ parameter predicted individual differences in the Δ+ parameter and, notably, cocaine-taking behaviors in adulthood: Rats that had a shallower adolescent trajectory were found to have a lower Δ+ parameter and greater cocaine self-administration in adulthood. These data indicate that adolescent development plays a critical role in drug use susceptibility. Future studies aimed at understanding the neurobiological mechanisms that underlie these age-related changes in decision-making could provide new insights into the biobehavioral mechanisms mediating addiction susceptibility.

Keywords: Neurodevelopment, Addiction susceptibility, Computational psychiatry, Decision making, Reward learning

Introduction

Adaptive decision-making in dynamic environments is impaired in individuals with mental illness, including addiction. Substance-dependent individuals and animals persistently exposed to drugs of abuse have difficulties adapting their choices in response to changes in the environment (Jentsch et al. 2002; Fillmore and Rush 2006; Ersche et al. 2008; Groman et al. 2012, 2018a, 2020b; Zhukovsky et al. 2019) which are believed to be, in part, a consequence of persistent exposure to drugs of abuse. Studies in animals, however, have found that decision-making impairments that are present prior to any drug use are associated with greater drug-taking behaviors (Dalley et al. 2007; Perry et al. 2008; Cervantes et al., 2013; Groman et al. 2020a). These data suggest that poor decision-making may be a risk factor for developing an addiction and, importantly, that decision-making could serve as a powerful phenotype for identifying the neurobiological mechanisms mediating addiction susceptibility (Groman and Jentsch, 2012; Groman et al., 2022).

Decisions are guided by action values that are computed by multiple learning systems in the brain (Sutton and Barto, 1998; Niv, 2009; Lee, 2013). These systems compute the degree to which action values change on an individual trial, which can vary based on whether the action was performed or not and whether the outcome of a chosen action was appetitive or aversive (Ito and Doya, 2009; Lee, 2013). The degree to which these computations influence choice behavior can be quantified with reinforcement-learning algorithms. We, and others, have been using these reinforcement-learning algorithms to understand the role of decision-making in rodent models of addiction (Zhukovsky et al., 2019; Groman et al. 2020b, 2020a). Our group has found that the ability of rats to adjust their choices in response to changes in reinforcement contingencies (e.g., reversal-learning tasks) is both predictive of and affected by drug self-administration. These relationships, however, are mediated by distinct computations. Specifically, we have found that lower value updating following an appetitive, but not aversive, outcome is predictive of greater escalation in cocaine use (Groman et al. 2020a), whereas value updating following an aversive, but not an appetitive, outcome is disrupted following cocaine self-administration (Groman et al. 2020b). We hypothesized that understanding the mechanisms leading to differences in value updating following an appetitive outcome could provide new insights into addiction susceptibility.

Decision-making functions that are predictive of problematic drug-taking behaviors may be the result of abnormalities in the maturation and/or development of specific neural circuits (Chambers et al. 2003; Blakemore and Robbins 2012; Mitchell et al. 2014; Squeglia and Cservenka 2017). Experimental manipulations that significantly alter neurodevelopment (e.g., neonatal ventral hippocampal lesions and prenatal exposure to methylazoxymethanol acetate [MAM]) have been found to disrupt decision-making and enhance addiction-like behaviors in adult rats (Featherstone et al. 2006; Rao et al. 2016). Adolescence is a critical developmental period associated with robust changes in brain morphology, structure, and function (Casey et al. 2008; Spear 2013) which have been proposed to underlie the improvements in decision-making functions that occur during adolescence (van der Schaaf et al. 2011). Moreover, it is hypothesized that the biobehavioral changes that occur during adolescence are critical in generating the foundation for decision-making and cognitive functions in adulthood (Blakemore and Robbins 2012). Indeed, our recent studies in rats have found that the rate of improvement in value updating following an appetitive outcome during adolescence is predictive of decision-making functions in adulthood (Moin Afshar et al. 2020). Individual differences in value updating that we have found to be associated with drug-taking behaviors in adulthood may be the result of neurodevelopmental disruptions that occur during adolescence. Elucidating the biobehavioral mechanisms underlying age-related changes in decision-making could, therefore, provide key insights into the neurobiology of addiction vulnerability (Chambers et al. 2003; Groman et al. 2022).

In the current study, we investigated the relationship between adolescent-related changes in reinforcement-learning mechanisms and drug-taking behaviors in adulthood. We trained rats to acquire and reverse three-choice spatial discrimination problems at three adolescent ages (postnatal day 35 (P35), P55, and P75) and retested these same rats in adulthood (P120). Following these reversal-learning assessments, rats were implanted with intrajugular catheters and trained to self-administer cocaine in 6-h daily sessions for 14 days. Extinction and reinstatement tests were conducted to assess addiction-relevant behaviors. We hypothesized, based on previous studies (Groman et al. 2019b, 2020a; Moin Afshar et al. 2020), that adolescent trajectories in value updating following an appetitive outcome would predict cocaine-taking behaviors in adulthood.

Methods and materials

Animals

Male and female Long–Evans rats (N=56) were bred in house using four breeding pairs. Rats were weaned at P21 and housed in standard laboratory cages on a 14-h/10-h light/dark cycle in a climate-controlled vivarium (lights on at 6:30 am; lights off at 8:30 pm). Animals had ad libitum access to food and water until behavioral testing began. All experiments were performed as approved by the Institutional Animal Care and Use Committee at the University of Minnesota – Twin Cities and according to the National Institutes of Health and institutional guidelines and the Public Health Service Policy on the Humane Care and Use of Laboratory Animals.

Rats were assigned to participate in either a cross-sectional adolescent study (N=40; 16 F, 24 M) or a longitudinal adolescent study (N=16; 10 F, 6 M). Rats in the cross-sectional study underwent a single round of adolescent testing on the reversal-learning task (described below) at different ages (P35, P55, or P75). Rats in the longitudinal adolescent study were repeatedly tested on the reversal-learning task at the same adolescent ages as rats in the cross-sectional study (P35, P55, and P75). P75 was selected as the last adolescent age based on our previous work demonstrating that the performance of rats in the reversal task between ~P30 and P70 was highly correlated and largely independent of performance in adulthood (P90–P170). Rats in both studies then underwent a single round of testing on the reversal-learning task in adulthood (P120), and a subset of these individuals was then trained to self-administer cocaine in 6-h daily sessions for 14 days (see Fig. 1D below). We hypothesized that the performance of rats on the reversal-learning task in adolescence would not predict cocaine-taking behaviors, but that the reversal-learning trajectories during adolescence would. This combination of study designs allowed us to ensure that any correlations we observed could not be fully explained by repeated experiences.

Fig. 1.

Assessing adaptive choice behavior with the reversal-learning task. (A, top) Experimental procedures for training and testing rats in the reversal-learning task. (A, bottom) Schematic of a single trial in the reversal-learning task. (B) The deterministic schedule of reinforcement for each of the three noseport (NP) options. (C) The probabilistic schedule of reinforcement for each of the three noseport options. (D) Diagram of the experimental procedures. Rats in the cross-sectional group underwent a single round of testing in the reversal-learning task at different adolescent ages (P30, P50, or P70). Reversal-learning performance was reassessed in adulthood (P120) and a subset of these trained to self-administer cocaine. Rats in the longitudinal group underwent repeated testing in the reversal-learning task across adolescence (P30, P50, and P70). Reversal-learning performance was reassessed in adulthood (P120) and a subset of these trained to self-administer cocaine

Apparatus

All behavioral testing was performed in standard aluminum and Plexiglass operant conditioning chambers. These chambers were equipped with a photocell reward (liquid or solid) delivery magazine, two retractable levers on either side of the magazine, and a curved panel with five photocell-equipped noseports on the opposite side (Med Associates Inc.). A custom-made lixit was mounted to the magazine to deliver liquid rewards. Chambers were housed inside of sound-attenuating cubicles, with background white noise being broadcast.

Operant training

Cross-sectional study

Rats in the cross-sectional study were exposed to 10% sweetened condensed milk (SCM; % v/v, water) at P35, P55, or P75 in a single 2-h session within their home cage. Food was removed 24 h before rats were trained to make an operant response to receive a reward (60 µl of 10% SCM solution) in 12-h overnight sessions using previously described procedures (Moin Afshar et al. 2020). Briefly, rats could initiate trials by making a response into an illuminated magazine. A single noseport aperture located on the opposite panel was illuminated, and responses into the illuminated noseport resulted in the delivery of reward into the magazine. Sessions terminated when rats had earned 151 rewards or 720 min (i.e., 12 h) had elapsed, whichever occurred first. If rats did not obtain 151 rewards in a single overnight session, the operant training session was repeated the following day(s) until the reward criterion was met. Chow was provided the morning after each overnight session with the amount titrated to their performance and body weight. If rats lost weight, additional chow was provided the following morning to ensure that rats continued to grow during adolescence (see Table 1).

Table 1.

Performance measures of rats in the reversal-learning task under deterministic and probabilistic schedules of reinforcement

Det. = deterministic schedule; Prob. = probabilistic schedule.

| Study | Age (days) | Developmental age group | Weight (grams) | Det.: Total number of sessions | Det.: Total trials completed | Det.: Number of sessions the criterion was met | Det.: Total trials included in analysis | Prob.: Total number of sessions | Prob.: Total trials completed | Prob.: Number of sessions the criterion was met | Prob.: Total trials included in analysis |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Cross-sectional (N = 40; 24 M/16 F) | 38 ± 1.15 | Adolescence | 69 ± 3 | 10.07 ± 0.83 | 2316 ± 160 | 2.79 ± 0.11 | 889 ± 52 | 4.21 ± 0.24 | 1405 ± 63 | 2.71 ± 0.19 | 983 ± 71 |
| | 59 ± 0.86 | Adolescence | 216 ± 11 | 6.00 ± 0.52 | 1486 ± 79 | 2.71 ± 0.16 | 890 ± 71 | 3.64 ± 0.50 | 1214 ± 115 | 2.57 ± 0.25 | 1035 ± 67 |
| | 79 ± 0.64 | Adolescence | 330 ± 23 | 6.31 ± 0.58 | 1686 ± 87 | 2.91 ± 0.25 | 979 ± 93 | 3.53 ± 0.18 | 922 ± 65 | 2.18 ± 0.29 | 896 ± 89 |
| | 123 ± 0.54 | Adult | 396 ± 17 | 7.13 ± 0.50 | 1566 ± 82 | 2.86 ± 0.08 | 849 ± 46 | 3.55 ± 0.21 | 1174 ± 51 | 2.69 ± 0.11 | 957 ± 52 |
| Longitudinal (N = 16; 8 M/8 F) | 38 ± 0.76 | Adolescence | 71 ± 2 | 9.94 ± 0.85 | 2259 ± 136 | 3.00 ± 0 | 956 ± 27 | 4.31 ± 0.28 | 1393 ± 83 | 2.5 ± 0.18 | 883 ± 68 |
| | 55 ± 0.40 | Adolescence | 172 ± 6 | 3.88 ± 0.27 | 1109 ± 67 | 3.00 ± 0 | 908 ± 23 | 3.19 ± 0.21 | 1181 ± 58 | 2.94 ± 0.06 | 1109 ± 40 |
| | 74 ± 0.27 | Adolescence | 257 ± 15 | 3.25 ± 0.14 | 935 ± 39 | 3.00 ± 0 | 968 ± 17 | 3.06 ± 0.06 | 1059 ± 25 | 2.94 ± 0.06 | 1024 ± 34 |
| | 121 ± 0.59 | Adult | 362 ± 26 | 5.06 ± 0.59 | 1132 ± 91 | 2.94 ± 0.06 | 783 ± 31 | 3.94 ± 0.37 | 1245 ± 88 | 3.00 ± 0 | 1013 ± 15 |

Total number of sessions is the total number of sessions rats required to reach the reward criterion. Total trials are the total number of trials rats completed under each schedule. Values presented are mean ± SEM

Longitudinal study

Rats in the longitudinal study were exposed to 10% sweetened condensed milk at P35 in a single 2-h session within their home cage. Food was removed at the end of the sweetened condensed milk exposure, ~24 h before rats were trained to make an operant response to receive a reward (60 µl of 10% SCM solution) in 6-h sessions. Rats initiated trials by making a response into an illuminated magazine. A single noseport aperture located on the opposite panel was illuminated, and responses into the illuminated noseport resulted in the delivery of reward into the magazine. Sessions terminated when rats had earned 151 rewards or 720 min had elapsed, whichever occurred first. If rats did not obtain 151 rewards in a single session, the operant training session was repeated in the following days until the performance criterion was met. Chow was provided the morning after each overnight session with the amount titrated to their performance and body weight. If rats lost weight, additional chow was provided the following morning to ensure that rats continued to grow during adolescence (see Table 1).

Deterministic and probabilistic reversal learning

Once operant responding had been established, the ability of rats to acquire and reverse deterministically reinforced three-choice spatial discrimination problems was assessed in 12-h overnight sessions. A response into the magazine resulted in the illumination of three noseport apertures, and rats could respond to any of the illuminated noseports to earn a deterministically delivered reward (Fig. 1A). At the start of each session, a computer program randomly assigned one noseport to deliver a reward and the other two to not deliver a reward. When rats met a performance criterion (21 choices on the highest reinforced noseport in the last 30 trials), the assignments reversed: The reinforced noseport (100%) was now assigned to not deliver reward (0%), while one of the unreinforced noseports was now assigned to deliver reward (100%). These reinforcement probabilities remained unchanged until the performance criterion was once again met, after which the assignments reversed again between the reinforced noseport and one of the two unreinforced noseports. The occurrence of each reversal was, therefore, contingent on the performance of the rat (Fig. 1B). Sessions terminated when rats earned 151 rewards or 720 min had elapsed, whichever occurred first. Rats completed three sessions using this deterministic schedule of reinforcement at each age. If rats failed to earn 151 rewards in a single session, that session was repeated the following day.
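
The performance criterion described above can be illustrated with a short sketch. This is not the authors' task code; the class name, the use of a sliding window, and the choice to restart the window after each reversal are assumptions made only for illustration.

```python
from collections import deque


class ReversalCriterion:
    """Track whether 21 of the last 30 choices were on the best noseport."""

    def __init__(self, window=30, required=21):
        self.required = required
        # 1 = chose the highest reinforced port, 0 = chose another port
        self.recent = deque(maxlen=window)

    def record(self, chose_best):
        """Log one trial; return True when a reversal should be triggered."""
        self.recent.append(1 if chose_best else 0)
        if sum(self.recent) >= self.required:
            # Assumption: the trial window restarts after each reversal.
            self.recent.clear()
            return True
        return False
```

Under these assumptions, a rat that chooses the reinforced noseport on 21 consecutive trials triggers a reversal on the 21st trial, after which the count starts over.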

The ability of rats to acquire and reverse probabilistically reinforced three-choice spatial discrimination problems was then assessed in 12 h, overnight sessions. Each noseport aperture was randomly assigned to deliver a reward with a probability of 70%, 30%, or 10% by the program at the start of each session. When rats met a performance criterion (21 choices on the highest reinforced noseport in the last 30 trials), the probabilities reversed: The most frequently reinforced noseport (70%) was now assigned to deliver a reward with a lower probability (30% or 10%), and one of the less frequently reinforced noseports (30% or 10%) was now assigned to deliver reward with the highest probability (70%). Sessions terminated when rats received 151 rewards or 720 min had elapsed, whichever occurred first. If rats failed to earn 151 rewards in a single session, that session was repeated the following day.
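
As a rough illustration of the probabilistic reversal rule, the sketch below swaps the 70% assignment with one of the two other noseports. The function name and the decision to leave the third noseport unchanged are assumptions; the text does not specify how the remaining probability was reassigned.

```python
import random


def reverse_probabilities(probs, rng=random):
    """Return a new port -> probability map after a reversal.

    The port currently holding the highest probability swaps its
    probability with one randomly chosen other port. The third port
    keeps its probability (an assumption of this sketch).
    """
    best = max(probs, key=probs.get)
    other = rng.choice([port for port in probs if port != best])
    new_probs = dict(probs)
    new_probs[best], new_probs[other] = probs[other], probs[best]
    return new_probs
```

For example, starting from `{"NP1": 0.7, "NP2": 0.3, "NP3": 0.1}`, a reversal moves the 70% probability to NP2 or NP3 and gives NP1 the probability that port previously held.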

Once rats completed the third session under the probabilistic schedule of reinforcement, rats in the cross-sectional study were returned to the vivarium and given ad libitum access to food until they had reached the adult testing age (e.g., P120). Rats in the longitudinal study were returned to the vivarium and given ad libitum access to food until they reached the next testing age (e.g., P55, P75, and P120), at which point food was removed ~ 24 h prior to testing on the deterministic schedule.

Due to technical difficulties with the alignment of the lixit in the magazine, which inadvertently reduced reward delivery and slowed initial training on the reversal-learning task, rats that began operant testing at P30 received more training on the deterministic schedule of reinforcement than in our previous study (Moin Afshar et al. 2020; Table 1). It is likely that this greater experience with the deterministic schedule of reinforcement explains why rats in the current study were able to complete more reversals in a single session compared to our previous observations at ~ P30. Nevertheless, we were still able to see age-related changes in reversal learning during adolescence that were similar to our previous study.

Cocaine self-administration and addiction-relevant behaviors

After rats completed the P120 reversal-learning assessments, a subset of rats (N=47) was implanted with intrajugular catheters as previously described (Groman et al. 2019b). This number of rats was smaller than our original sample size (N=56) because we did not implant catheters in those rats that failed to achieve the reward criterion during the P120 assessment (N=9). Surgeries were performed under anesthesia (2–3% isoflurane). Two rats did not recover from surgery (1 M from the longitudinal group, 1 F from the P35 group), resulting in N=45. Rats were given the analgesic Rimadyl (5 mg/kg; Henry Schein, Dublin, OH) once per day for three days and allowed to recover for three days before beginning the self-administration paradigm. Catheters were flushed daily with 0.1 ml of gentamicin/heparin solution (gentamicin: 0.8 mg/kg; heparin: 30 USP U/ml). Catheter patency was assessed the day before beginning cocaine self-administration and once weekly during the self-administration procedure by infusing rats with 0.2 ml of Brevital (10 mg/ml in saline). If the catheter was defective, rats were excluded from the self-administration analyses (N=11), resulting in a final sample size of N=34.

On each self-administration day, catheters were flushed with 0.1 ml of the gentamicin/heparin solution before rats were attached to a tether in a different operant box than where the behavioral assessments occurred. The session began with the extension of two levers located on the sides of the magazine into the operant box. Responses on the active lever resulted in a single infusion of cocaine (0.5 mg/kg/infusion) and presentation of a compound cue (10 s, 10 kHz auditory tone and steady light cue). A 20-s timeout period followed each infusion to reduce the likelihood of a drug overdose. Responses on the inactive lever were recorded but had no programmed consequence. There was no limit on the number of infusions rats could earn in the 6-h session. Self-administration sessions were conducted daily between 17:00 h and 06:00 h for 14 consecutive days.

Rats then underwent five days of 1-h extinction sessions in which levers were extended into the operant boxes and responses recorded but had no programmed consequence. The day after the last extinction session, the ability of drug-paired cues to reinstate drug-seeking behaviors in a single 1-h session was assessed. During the reinstatement sessions, responses on the active lever resulted in the delivery of a 10-s compound cue previously associated with a cocaine infusion and responses to the active and inactive lever were recorded. Rats were euthanized the following day, and tissue was collected for future experiments not reported here.

Data analyses

Logistic regression

We have previously found that age-related changes in reversal learning are driven by changes in how rats use rewarded outcomes to guide their choice behavior (Moin Afshar et al., 2020). To determine if the same pattern of results was present in the current study, the choice behavior of rats was analyzed by fitting the following logistic regression model that estimated the likelihood of repeating the same choice as in each of the four previous trials according to whether the previous trial was rewarded or not, as follows:

$$\ln\frac{P_x(t)}{1 - P_x(t)} = \beta_0 + \sum_{\tau=1}^{4} \beta_\tau^{+} I^{+}(t-\tau) + \sum_{\tau=1}^{4} \beta_\tau^{0} I^{0}(t-\tau)$$

where Px(t) denotes the probability that in trial t, the rat would make the same noseport choice, x, that could have been made in each of the last 4 trials (τ=1 to 4). I+(t) and I0(t) indicate whether the rat’s choice of the target x in trial t was rewarded or not according to the following convention: I+(t)=1 if the choice of x in trial t was rewarded, 0 if the choice in trial t was unrewarded, and −1 if the animal chose a target other than x in trial t and was rewarded; I0(t)=1 if the choice of x in trial t was unrewarded, 0 if the choice in trial t was rewarded, and −1 if the animal chose a target other than x in trial t and was unrewarded. For example, if the animal’s choices in the last 4 trials and their outcomes were NP1 rewarded (t−1), NP2 unrewarded (t−2), NP3 rewarded (t−3), and NP2 rewarded (t−4), then the values of the regressors included in the above logistic regression model for NP1 would be I+=1, 0, −1, −1 and I0=0, −1, 0, 0 for τ=1, 2, 3, 4, respectively. Three separate logistic regressions were performed, one for each of the three noseport choices, and the regression coefficients were averaged across the three choices separately for regressors corresponding to rewarded and unrewarded choices. Positive coefficients for the rewarded and unrewarded predictors indicate that rats are more likely to persist with the same choice, whereas negative regression coefficients indicate that rats are more likely to switch their choice.
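
The regressor coding can be checked against the worked example in the preceding paragraph with a small sketch. The function name and the input layout (a list of choice/outcome pairs ordered from trial t−1 back to t−4) are assumptions for illustration, not the authors' analysis code.

```python
def history_regressors(target, history):
    """Build (I_plus, I_zero) for one target noseport.

    history: list of (choice, rewarded) pairs ordered from trial t-1
    back to trial t-4.
    """
    i_plus, i_zero = [], []
    for choice, rewarded in history:
        # +1 when the target itself was chosen, -1 when another port was chosen
        sign = 1 if choice == target else -1
        i_plus.append(sign if rewarded else 0)
        i_zero.append(0 if rewarded else sign)
    return i_plus, i_zero


history = [("NP1", True), ("NP2", False), ("NP3", True), ("NP2", True)]
print(history_regressors("NP1", history))
# prints ([1, 0, -1, -1], [0, -1, 0, 0])
```

The printed values match the I+ and I0 sequences given for NP1 in the example.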

Reinforcement-learning model

Reinforcement-learning models predict that choices are based on outcomes from different actions that incrementally accrue over many trials compared to the limited trial history used in the logistic regression analysis. To investigate age-related changes in specific reinforcement-learning processes, choice data were fit with a forgetting reinforcement-learning model (Barraclough et al., 2004; Ito and Doya, 2009; Groman et al., 2016, 2018b). This model was fit using 100 different initial parameter values with starting action values Qx(1)=0 for all actions (x= noseport 1 (NP1), NP2, NP3)). The value updating for this model is as follows:

$$\text{if } a(t) = x,\quad Q_x(t+1) = \gamma Q_x(t) + \Delta(t)$$
$$\text{if } a(t) \neq x,\quad Q_x(t+1) = \gamma Q_x(t)$$

where the decay rate γ determines how quickly the action value decays and Δ(t) indicates the change in the action value depending on the outcome in trial t. If the outcome of the trial was rewarded, then the value function of the chosen port was updated by Δ(t)=Δ+, the reinforcing strength of reward. Rats with a large, positive Δ+ parameter are more likely to repeat a previously rewarded choice compared to rats with a small Δ+ parameter. If the outcome of the trial was not rewarded, then the value function of the chosen port was updated by Δ(t)=Δ0, the aversive strength of no reward. Rats with a large, positive Δ0 parameter are more likely to repeat a previously unrewarded choice compared to rats with a small and/or negative Δ0 parameter. Choice probability was calculated according to a softmax function and trial-by-trial choice data fit with these three parameters (e.g., γ,Δ+, and Δ0) selected to maximize the likelihood of each rat’s sequence of choices using the fminsearch function in MATLAB (2020a).
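
A minimal sketch of the likelihood computation for the forgetting model is given below. It is not the authors' MATLAB code; the function name, the 0-based action indexing, and the use of a softmax without a separate inverse temperature parameter are assumptions consistent with the three-parameter description above.

```python
import math


def forgetting_rl_nll(params, choices, rewards, n_actions=3):
    """Negative log-likelihood of a choice sequence under the forgetting model.

    params: (gamma, delta_plus, delta_zero)
    choices: action indices (0..n_actions-1); rewards: 0/1 outcome per trial.
    """
    gamma, delta_plus, delta_zero = params
    q = [0.0] * n_actions  # Q_x(1) = 0 for all actions
    nll = 0.0
    for choice, reward in zip(choices, rewards):
        # Softmax over current action values (assumption: no separate
        # inverse temperature, since the Delta parameters set the scale).
        q_max = max(q)
        exp_q = [math.exp(v - q_max) for v in q]
        nll -= math.log(exp_q[choice] / sum(exp_q))
        # Every action value decays by gamma; only the chosen one gets Delta(t).
        delta = delta_plus if reward else delta_zero
        q = [gamma * v for v in q]
        q[choice] += delta
    return nll
```

In Python, a function like this could be minimized with a Nelder-Mead routine such as `scipy.optimize.minimize(..., method="Nelder-Mead")`, started from many initial parameter values, which is analogous in spirit to the `fminsearch` procedure described in the text.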

We also compared the fit of the forgetting reinforcement-learning model to two other reinforcement-learning models: (1) a Q-learning algorithm that contained a single learning rate parameter (α) and the inverse temperature parameter (β) (Ito and Doya 2009) and (2) a Q-learning algorithm that contained two learning rate parameters (one for positive outcomes, αg, and one for negative outcomes, αl) and the inverse temperature parameter (β) (Frank et al. 2004). The value updating for these models is as follows.

Q learning with a single learning rate:

$$\text{if } a(t) = x,\quad Q_x(t+1) = Q_x(t) + \alpha\,(r(t) - Q_x(t))$$
$$\text{if } a(t) \neq x,\quad Q_x(t+1) = Q_x(t)$$

Q learning with two learning rates:

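$$\text{if } a(t) = x \text{ and } r(t) = 1,\quad Q_x(t+1) = Q_x(t) + \alpha_g\,(1 - Q_x(t))$$
$$\text{if } a(t) = x \text{ and } r(t) = 0,\quad Q_x(t+1) = Q_x(t) + \alpha_l\,(0 - Q_x(t))$$
$$\text{if } a(t) \neq x,\quad Q_x(t+1) = Q_x(t)$$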
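
Both Q-learning variants share one update rule and differ only in whether the learning rate depends on the outcome. The sketch below shows this; the function and argument names are hypothetical and chosen only for illustration.

```python
def q_update(q, choice, reward, alpha_gain, alpha_loss=None):
    """Return updated action values after one trial.

    With alpha_loss=None a single learning rate is used for both outcomes;
    otherwise alpha_gain applies to rewarded trials and alpha_loss to
    unrewarded trials. Unchosen action values are left unchanged.
    """
    if alpha_loss is None:
        alpha_loss = alpha_gain
    alpha = alpha_gain if reward == 1 else alpha_loss
    new_q = list(q)
    new_q[choice] = q[choice] + alpha * (reward - q[choice])
    return new_q
```

Unlike the forgetting model, the values of unchosen actions do not decay here, which is the main structural difference between the compared models.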

The Bayesian Information Criterion (BIC) for each model was calculated, and the BIC for each model was summed across rats. These results are presented in Table 2. The BIC for the forgetting reinforcement-learning model was lower compared to all other models, indicating that this model best fit the rat choice data.
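
For reference, the model comparison can be sketched with the standard BIC definition, k·ln(n) + 2·NLL, where k is the number of free parameters, n the number of trials, and NLL the negative log-likelihood at the best-fitting parameters. The function names and the per-rat input layout are assumptions; the text does not state the exact form used.

```python
import math


def bic(nll, n_params, n_trials):
    """Bayesian Information Criterion from a negative log-likelihood."""
    return n_params * math.log(n_trials) + 2.0 * nll


def summed_bic(fits, n_params):
    """Sum per-rat BIC values; fits is a list of (nll, n_trials) pairs."""
    return sum(bic(nll, n_params, n_trials) for nll, n_trials in fits)
```

Based on the parameters listed in the text, the forgetting model (γ, Δ+, Δ0) and the two-learning-rate model (αg, αl, β) each have three free parameters, and the single-learning-rate model (α, β) has two.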

Table 2.

Comparison of reinforcement-learning model fits for choice behavior under the deterministic and probabilistic schedules of reinforcement

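| Schedule | Age | Forgetting reinforcement learning model | Q-learning model with one learning rate | Q-learning model with two learning rates |
|---|---|---|---|---|
| Deterministic | 35 | 48,893 | 53,096 | 52,131 |
| Deterministic | 55 | 46,501 | 50,553 | 49,745 |
| Deterministic | 75 | 43,048 | 46,585 | 45,964 |
| Deterministic | 120 | 65,197 | 70,055 | 69,159 |
| Probabilistic | 35 | 51,216 | 55,615 | 53,573 |
| Probabilistic | 55 | 55,402 | 62,625 | 58,380 |
| Probabilistic | 75 | 46,257 | 49,098 | 47,983 |
| Probabilistic | 120 | 86,136 | 91,991 | 89,169 |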

Values presented are the sum of the BIC. Bold values are those with the lowest BIC

Statistical analyses

Data are expressed as mean ± SEM. All analyses were conducted in SPSS (IBM, v 26) using generalized linear models (GLM) or generalized estimating equations (GEE) for repeated measures. GEE is a population-level approach based on the quasi-likelihood function that provides a population average estimate of parameters. GEE permits the specification of a working correlation matrix to account for within-subject correlation of responses on dependent variables of different distributions, including normal, binomial, and Poisson, that yields unbiased regression parameters relative to ordinary least-squares regression (Ballinger, 2004). Data were entered into a GEE model as repeated measures using a probability distribution based on the known properties of these data. The working correlation matrix for each model was determined by comparing the quasi-likelihood criterion (Pan, 2001). Factors in the model included sex (male vs. female), group (cross-sectional vs. longitudinal), and age. Statistical significance of explanatory factors included in the model was assessed with the Wald χ2 test with an alpha threshold of p<0.05. Post-hoc tests of significant interactions consisted of computing lower-order comparisons (e.g., 1-way analysis) between sex or experimental group. Regression and multiple linear regression models were used to examine relationships between dependent variables. To determine how reversal-learning trajectories during adolescence may be related to reversal-learning performance in adulthood, the slope between the dependent variables (e.g., number of reversals completed and the Δ+ parameter estimate) and adolescent age (i.e., P35, P55, and P75) was calculated. The mediation analysis was performed in SPSS using multiple linear regression, and direct and indirect effects were calculated using the PROCESS macro for SPSS by Andrew Hayes (Hayes 2018).
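
The adolescent trajectory measure can be illustrated as a per-rat slope of a dependent variable across the three adolescent ages. The sketch below assumes an ordinary least-squares slope, which the text does not state explicitly; the function name and any input values are hypothetical.

```python
def trajectory_slope(ages, values):
    """Ordinary least-squares slope of values regressed on ages."""
    n = len(ages)
    mean_age = sum(ages) / n
    mean_val = sum(values) / n
    sxy = sum((a - mean_age) * (v - mean_val) for a, v in zip(ages, values))
    sxx = sum((a - mean_age) ** 2 for a in ages)
    return sxy / sxx
```

With ages of 35, 55, and 75 days, a rat whose parameter estimate rose by one unit between each assessment would have a slope of 0.05 units per day, and a rat with no change would have a slope of 0.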

Results

Operant training

The number of sessions required to achieve the reward criterion in the operant training sessions did not change across the adolescent ages: Rats that began the operant training on P35, P55, or P75 required 2.4 ± 0.28, 3.3 ± 0.79, and 2.1 ± 0.51 sessions, respectively, to reach the reward criterion. Although there were significant differences in body weight across these ages (see Table 1), these results suggest that rats were equally motivated by the sweetened condensed milk solution.

Reversal learning during adolescence

The performance of rats under the deterministic schedule of reinforcement was then examined (see Table 1 for basic behavioral measures). The relationship between adolescent age, sex, and the dependent measures collected was examined with GLMs. Age, but not sex or the sex × age interaction, explained a significant amount of variance in the number of reversals rats were able to complete in the reversal-learning task (χ2 = 7.67; p = 0.006): As age increased across and within rats, the number of reversals that rats were able to complete in a single session increased (β = 0.015 ± 0.008; Fig. 2A). This was not due to differences in the number of trials that rats completed within these sessions across the adolescent ages (age: χ2 = 2.10; p = 0.15; Fig. 2B) because age still explained a significant amount of variance in the number of reversals when the number of trials completed was included in the model (β = 0.004; χ2 = 4.73; p = 0.03; Fig. 2C). Moreover, the age-related increase in the number of reversals that rats were able to complete was not due to differences in the ability of rats to acquire the initial discrimination, as the number of trials required to achieve the first reversal was not explained by age (χ2 = 1.72; p = 0.19; Fig. 2D). These results are similar to our previous observations (Moin Afshar et al. 2020) and demonstrate that the ability of rats to adjust their choices in response to changes in the reinforcement contingencies improved across the adolescent ages.

We then sought to determine if the increase in performance of rats in the reversal-learning task was due to changes in the ability of rats to use positive and/or negative outcomes to guide their subsequent choice. Regression coefficients from the logistic regression analysis for rewarded and unrewarded outcomes were compared across adolescent ages. The three-way interaction between outcome type, sex, and age was not significant (χ2 = 0.35; p = 0.55), but the two-way interaction between outcome type and age was (χ2 = 4.60; p = 0.03). Post-hoc analyses within each outcome type revealed that as age increased, the likelihood of repeating a recently rewarded outcome increased (χ2 = 5.43; p = 0.02; Fig. 2E). Age, however, did not explain a significant amount of variance in the likelihood of rats repeating an unrewarded choice (χ2 = 0.48; p = 0.49; Fig. 2F). These results indicate that the age-related increase in performance of rats in the reversal-learning task was due to an increase in value updating following positive outcomes.

Fig. 2.

Performance of rats in the reversal-learning task under the deterministic schedule of reinforcement. (A) The relationship between age and the average number of reversals completed in a single reversal-learning session by rats in the longitudinal group (closed circles) and the cross-sectional group (open circles). (B) The relationship between age and the number of trials required to reach the reward criterion. (C) The relationship between age and the number of reversals completed normalized to the number of trials that rats completed in a single session. (D) The relationship between age and the number of trials required to reach the first reversal. (E) The relationship between age and the sum of the regression coefficients for the rewarded predictor in the logistic regression model. (F) The relationship between age and the sum of the regression coefficients for the unrewarded predictor in the logistic regression model

Similar age-related improvements in performance were observed under the probabilistic schedule of reinforcement. Age, but not sex or the sex × age interaction, explained a significant amount of variance in the number of reversals rats were able to complete in the reversal-learning task (χ2=4.94;p=0.03): As age increased across and within rats, the number of reversals that rats were able to complete in a single session increased (β=0.007±0.007; Fig. 3A). Similar to the performance of rats under the deterministic schedule of reinforcement, this was not due to differences in the number of trials that rats completed within these sessions across the adolescent ages (age: χ2=0.002;p=0.97; Fig. 3B) because age still explained a significant amount of variance in the number of reversals when the number of trials completed was included in the model (β=0.004;χ2=4.12;p=0.04; Fig. 3C). Moreover, the age-related increase in the number of reversals that rats were able to complete was not due to differences in the ability of rats to acquire the initial discrimination, as the number of trials required to achieve the first reversal was not explained by age (χ2=0.55;p=0.46; Fig. 3D).

Fig. 3.

Performance of rats in the reversal-learning task under the probabilistic schedule of reinforcement. (A) The relationship between age and the average number of reversals completed in a single reversal-learning session by rats in the longitudinal group (closed circles) and the cross-sectional group (open circles). (B) The relationship between age and the number of trials required to reach the reward criterion. (C) The relationship between age and the number of reversals completed normalized to the number of trials that rats completed in a single session. (D) The relationship between age and the number of trials required to reach the first reversal. (E) The relationship between age and the sum of the regression coefficients for the rewarded predictor in the logistic regression model. (F) The relationship between age and the sum of the regression coefficients for the unrewarded predictor in the logistic regression model

The regression coefficients from the logistic regression analyses were then examined. The three-way interaction between outcome type, sex, and age was not significant (χ2=0.02;p=0.90), but the two-way interaction between outcome type and age was (χ2=15.43;p<0.001). Post-hoc analyses within each outcome type revealed that as age increased, the likelihood of repeating a recently rewarded outcome increased (χ2=13.91;p<0.001; Fig. 3E). Age, however, did not explain a significant amount of variance in the likelihood of rats repeating an unrewarded choice (χ2=2.60;p=0.11; Fig. 3F). These results suggest that the age-related increase in reversal-learning performance is due to improvements in reward-guided behavior.

Reinforcement-learning processes in adolescence

Age-related improvements in reversal learning may be mediated by select reinforcement-learning processes. Trial-by-trial choice data in the probabilistic and deterministic reversal-learning task were fitted with three different reinforcement-learning models (see “Materials and methods”). The BIC for each of these models was calculated for individual rats, and the sum of these BIC values is presented in Table 2. The BIC for the forgetting reinforcement-learning model was consistently lower compared to all other models at every age examined, indicating that this model best fit the rat choice data, as we have previously observed (Moin Afshar et al. 2020). The parameter estimates obtained for choice behavior under the deterministic schedule of reinforcement were strongly correlated with those from the probabilistic schedule at each age examined, so the parameter estimates from this model (e.g., γ, Δ+, and Δ0) were averaged across the reinforcement schedules and compared across the adolescent ages.
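The model specification itself is given in the Materials and methods and is not reproduced in this section. As a minimal sketch, the code below assumes one common form of a forgetting model (in the style of Barraclough et al. 2004): the chosen option's value decays by γ and is incremented by Δ+ after a reward or by Δ0 after no reward, the unchosen option's value only decays, and choice follows a logistic function of the value difference. The function names, the update rule, and the choice rule are all assumptions for illustration. BIC is computed as k·ln(n) − 2·ln(L).

```python
import math


def forgetting_rl_loglik(choices, rewards, gamma, delta_pos, delta_zero):
    """Log-likelihood of 0/1 choices under an ASSUMED forgetting model.

    gamma      : decay applied to both action values on every trial
    delta_pos  : value increment for the chosen action after a reward
    delta_zero : value increment for the chosen action after no reward
    """
    q = [0.0, 0.0]  # action values for options 0 and 1
    loglik = 0.0
    for choice, reward in zip(choices, rewards):
        # Logistic choice rule on the value difference (assumption).
        p_option_one = 1.0 / (1.0 + math.exp(-(q[1] - q[0])))
        p_choice = p_option_one if choice == 1 else 1.0 - p_option_one
        loglik += math.log(p_choice)

        # Chosen value decays and is updated; unchosen value only decays.
        delta = delta_pos if reward else delta_zero
        unchosen = 1 - choice
        q[choice] = gamma * q[choice] + delta
        q[unchosen] = gamma * q[unchosen]
    return loglik


def bic(loglik, n_params, n_trials):
    """Bayesian information criterion; lower values indicate a better fit."""
    return n_params * math.log(n_trials) - 2.0 * loglik
```

In practice the three parameters would be estimated per rat by maximizing this log-likelihood with a numerical optimizer, and the per-rat BIC values would then be summed for each candidate model, as in Table 2.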

The three-way interaction among sex × age × parameter was not significant (χ2=0.15;p=0.93), but the two-way age × parameter interaction was (χ2=9.61;p=0.008). Post-hoc analyses revealed that as age increased across and between rats, the γ parameter decreased (β=-0.002±0.0007;χ2=6.84;p=0.009; Fig. 4A) and the Δ+ parameter increased (β=0.005±0.0017;χ2=8.013;p=0.005; Fig. 4B). The Δ0 parameter, however, did not significantly change with age (β=-0.001±0.0006;χ2=2.46;p=0.12; Fig. 4C). We then examined whether the γ and Δ+ parameters explained unique portions of variance in reversal performance using a GEE. Both the γ and Δ+ parameters were significant predictors in the model, explaining 45% of the variance in the number of reversals that rats completed (γ parameter: χ2=4.12;p=0.04;Δ+ parameter: χ2=8.99;p=0.003; Fig. 4D,E). These reinforcement-learning results and those from the logistic regression collectively indicate that the increase in reversal-learning performance during adolescence is due specifically to improvements in reward-guided choice behavior.

Fig. 4.

Age-related changes in the reinforcement-learning parameters assessed in the longitudinal group (closed circles) and cross-sectional group (open circles). (A) The relationship between age and the γ parameter. (B) The relationship between age and the Δ+ parameter. (C) The relationship between age and the Δ0 parameter. (D) The relationship between the average number of reversals achieved in the reversal-learning task and individual differences in the γ parameter. (E) The relationship between the average number of reversals achieved in the reversal-learning task and individual differences in the Δ+ parameter. (F) The relationship between the average number of reversals achieved in the reversal-learning task and individual differences in the Δ0 parameter

Reversal learning and cocaine-taking behaviors in adult rats

We have previously reported that the rate of change in the Δ+ parameter during adolescence is predictive of individual differences in the Δ+ parameter in adulthood (Moin Afshar et al. 2020). To determine if that relationship was present in the current experiment, we reassessed reversal-learning performance in the same rats once they had reached adulthood (~ P120). A subset of rats in the cross-sectional adolescent group (N=9) failed to reach the reward criterion during the P120 assessments even after extended sessions in the reversal-learning task. This was likely because these rats had gained a significant amount of weight between the adolescent and adult assessments, which may have reduced motivation for the reward in the reversal-learning task: Rats that failed to achieve the reward criterion weighed more than rats that were able to achieve it (rats who failed: 453 ± 32; rats who achieved reward criterion: 379 ± 20; χ2=4.16;p=0.04). Due to the time restrictions of the current experimental design, these rats were not implanted with catheters and were excluded from subsequent analyses.

The number of reversals that rats completed at P120 was significantly greater than the number of reversals rats completed at each of the adolescent assessments (age χ2=25.48;p<0.001; P35 vs P120: χ2=26.36;p<0.001; P55 vs P120: χ2=15.67;p<0.001; P75 vs P120: χ2=5.88;p=0.02; Fig. 5A). Reversal performance was slightly higher in the longitudinal group compared to the cross-sectional group at P120 (χ2=5.20;p=0.02; Fig. 5B), which is most likely because rats in the longitudinal group had repeated experience with the reversal-learning task. We then conducted a stepwise multiple regression analysis to determine which reinforcement-learning parameters explained unique portions of variance in reversal performance. Only the Δ+ parameter was a significant predictor in the model, explaining 57% of the variance in reversal performance (F(1,44)=57.33;p<0.001): Rats with a larger Δ+ parameter were able to complete more reversals than rats that had a smaller Δ+ parameter (Fig. 5C).
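The reported variance explained can be cross-checked against the reported F statistic. For an ordinary least-squares model with an intercept, R² = (F·df_model) / (F·df_model + df_residual). The short sketch below applies this standard identity to the values reported above; the function name is hypothetical.

```python
def r_squared_from_f(f_stat, df_model, df_resid):
    """Recover R^2 from an overall F test of an OLS model with an intercept."""
    return (f_stat * df_model) / (f_stat * df_model + df_resid)


# Values reported in the text: F(1,44) = 57.33
r2 = r_squared_from_f(57.33, 1, 44)
print(round(r2, 2))  # 0.57, matching the reported 57% of variance
```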

Fig. 5.

Reversal learning and cocaine self-administration in adult rats. (A) Relationship between age and the average number of reversals rats achieved in the reversal-learning task. Thin lines are individual rats; bold line is the average value for all rats. (B) Histogram of the number of reversals rats completed normalized to the number of trials completed in the cross-sectional group (open bar) and longitudinal group (closed bars). (C) The relationship between the number of reversals completed normalized to the number of trials and the Δ+ parameter at P120. (D) The number of cocaine infusions rats earned in each 6-h long-access session across the 14 self-administration days. (E) The number of active lever responses (closed circles) and inactive lever responses (open circles) rats made in each of five 1-h extinction sessions. (F) The number of active (closed circles) and inactive (open circles) lever responses rats made in the last extinction session and in the 1-h cue-induced reinstatement test

After completing the P120 assessments on the reversal-learning task, rats were trained to self-administer cocaine (0.5 mg/kg/infusion) in 6-h daily sessions for 14 days. The number of cocaine infusions rats earned in each 6-h session increased across the 14 days of self-administration (χ2=83.13;p<0.001; Fig. 5D), as we have previously observed (Groman et al. 2020a). Cocaine-taking behavior for individual rats, however, was more variable than in our previous studies, which prevented us from fitting individual self-administration curves with the power function to quantify latent drug-taking phenotypes, as we have previously described (Groman et al. 2019b, 2020a). Following the cocaine self-administration sessions, rats underwent extinction training in 1-h daily sessions for five days, followed by a cue reinstatement test. Six rats developed a seroma at the site of catheter implantation during the self-administration procedure and were required to be euthanized. These rats were, therefore, excluded from subsequent analyses. The number of lever responses rats (N=28) made during the 1-h extinction sessions decreased across the five days of extinction training for both the active (χ2=25.15;p<0.001) and inactive levers (χ2=6.31;p=0.01; Fig. 5E). The increase in responding of rats in the cue-induced reinstatement test, however, was specific to the active lever (χ2=36.87;p<0.001), as responding on the inactive lever did not differ from that during the last extinction session (χ2=1.05;p=0.31; Fig. 5F).

Value updating following a reward is predictive of cocaine-taking behaviors

We have previously reported that individual differences in the Δ+ parameter are predictive of drug-taking behaviors: Rats with a lower Δ+ parameter prior to drug self-administration take more psychostimulants than rats with a higher Δ+ parameter (Groman et al., 2019b, 2020a). To determine if this relationship was also present in the current study, the parameter estimates from the P120 assessments were entered into a multiple linear regression predicting the total number of cocaine infusions rats earned across all 14 self-administration sessions. The Δ+ parameter, but not the γ or Δ0 parameter, explained a significant amount of variance in the total number of cocaine infusions rats earned (R2=0.19;p=0.01): Rats with a smaller Δ+ parameter self-administered a greater number of cocaine infusions compared to rats with a larger Δ+ parameter (β=-0.44;p=0.01; Fig. 6A). When the Δ+ parameter was entered into a GEE predicting the number of cocaine infusions rats earned across the 14 self-administration sessions, the main effect of day and the Δ+ parameter were significant (day: χ2=12.52;p<0.001;Δ+ parameter: χ2=7.87;p<0.005), but the day × Δ+ parameter interaction was not (χ2=0.29;p=0.59; Fig. 6B).

Fig. 6.

Reward-guided choice behavior is predictive of cocaine self-administration behaviors. (A) The relationship between the Δ+ parameter at P120 and the total number of cocaine infusions rats self-administered for the longitudinal group (closed circles) and the cross-sectional group (open circles). (B) The number of cocaine infusions rats self-administered on each of the self-administration days in rats that had a low Δ+ parameter (red line) and in rats that had a high Δ+ parameter (black line). (C) The relationship between the Δ+ parameter at P120 and the total number of active lever responses rats made across the extinction sessions

We then examined whether variation in the Δ+ parameter prior to cocaine use was associated with differences in responding during extinction training or the reinstatement test. The two-way interaction between extinction session and the Δ+ parameter was not significant (χ2=0.34;p=0.56), but the Δ+ parameter main effect was (χ2=4.41;p=0.04). Rats with a lower Δ+ parameter prior to cocaine self-administration had a greater number of active lever responses during the extinction sessions compared to rats with a higher Δ+ parameter (β=-0.82;p=0.04; Fig. 6C). The Δ+ parameter was not, however, a significant predictor of the number of active lever responses rats made during the cue reinstatement test (χ2=0.005;p=0.94). These data, collectively, indicate that poor value updating prior to cocaine self-administration is associated with greater escalation in drug use and greater drug-seeking responses in the absence of cocaine reinforcement.

Adolescent decision-making trajectories are predictive of cocaine-taking behaviors

The results presented here indicate that age-related changes in reversal learning are associated with improvements in reward-mediated value updating (Figs. 2, 3, and 4) and that low value updating following a positive outcome in adulthood is predictive of addiction-relevant behaviors (Fig. 6). Adolescent-related changes in reward-mediated value updating may, therefore, be predictive of cocaine-taking behaviors in adulthood. To test this hypothesis, we first examined the relationship between the Δ+ parameter at each age and the total number of cocaine infusions that the same rats self-administered in the longitudinal group. No relationship between the adolescent Δ+ parameter estimate (e.g., P35, P55, and P75) and cocaine-taking behaviors was detected (all R2<0.12; all p>0.25). The Δ+ parameter estimate at P120, however, did predict cocaine use (R2=0.30;p=0.05; Fig. 7B), consistent with our previous observation.

Fig. 7.

Adolescent Δ+ parameter trajectories predict cocaine-taking behaviors. (A) Relationship between the rate of change in the Δ+ parameter during adolescence (~ P38–P79) and the Δ+ parameter at P120. (B) Relationship between the Δ+ parameter at P120 and the total number of cocaine infusions rats self-administered in adulthood at P120. (C) Relationship between the rate of change in the Δ+ parameter during adolescence and the total number of cocaine infusions rats self-administered in adulthood at P120. (D) Mediation analysis examining the relationships between the rate of change in the Δ+ parameter during adolescence, the Δ+ parameter during adulthood, and cocaine self-administration behaviors. (E) Relationship between the rate of change in the Δ+ parameter during adolescence and the total number of active lever responses rats made across the five extinction sessions. (F) Relationship between the rate of change in the Δ+ parameter during adolescence and the number of active lever responses rats made during the reinstatement test

Variation in the Δ+ parameter observed in adult rats (e.g., P120) may be linked to biobehavioral changes that occur during adolescence. We have previously reported that the rate of change of the Δ+ parameter during adolescence is predictive of the Δ+ parameter in adulthood (Moin Afshar et al. 2020). To determine if this relationship was also present here, we calculated the slope between the Δ+ parameter and adolescent age (from P35 to P75) for the longitudinal group and examined the relationship between the slope (e.g., degree of change in the Δ+ parameter) and the P120 Δ+ parameter estimate. Consistent with our previous results, the degree of change in the Δ+ parameter during adolescence predicted the P120 Δ+ parameter estimate (R2=0.29;p=0.03; Fig. 7A): Rats who had a greater increase in the Δ+ parameter in adolescence had a larger Δ+ parameter in adulthood. These data suggested to us that age-related changes in value updating following a positive outcome during adolescence would be predictive of cocaine use in adulthood.
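The per-rat trajectory described above can be sketched as an ordinary least-squares slope of the Δ+ estimate on age. The code below is a minimal illustration under assumed inputs: the nominal ages (35, 55, 75), the rat identifiers, and the Δ+ values are hypothetical and are not data from the study.

```python
def ols_slope(x, y):
    """Least-squares slope of y on x (change in y per unit of x)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def delta_pos_slopes(estimates_by_rat, ages):
    """Map each rat to the slope of its delta-plus estimates across ages."""
    return {rat: ols_slope(ages, values) for rat, values in estimates_by_rat.items()}


# Hypothetical delta-plus estimates at P35, P55, and P75 for two rats.
ages = [35, 55, 75]
slopes = delta_pos_slopes({"rat_a": [0.2, 0.3, 0.5], "rat_b": [0.3, 0.3, 0.3]}, ages)
```

Here a steeper positive slope corresponds to a faster adolescent increase in the Δ+ parameter, and a slope near zero to a shallow trajectory.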

Indeed, consistent with our hypothesis, the degree of change in the Δ+ parameter during adolescence (e.g., P35–P75) was predictive of cocaine-taking behaviors (R2=0.38;p=0.03): Rats who had a slower rate of increase in the Δ+ parameter during adolescence self-administered greater amounts of cocaine compared to rats who had a greater increase in the Δ+ parameter (Fig. 7C). This relationship was not driven by the rats who self-administered the least amount of cocaine (Fig. 7C) because a similar, albeit non-significant, negative relationship was observed when these subjects were excluded (R2=0.21;p=0.18). The rate of change in the Δ+ parameter during adolescence and the Δ+ parameter estimate in adulthood may explain non-overlapping portions of variance in cocaine self-administration behaviors. To explore this possibility, we conducted a multiple regression analysis with the rate of change in the Δ+ parameter and the Δ+ parameter in adulthood entered as independent variables in sequential regression models. The inclusion of both independent variables in a single model did not explain significantly more variance in cocaine-taking behaviors (change in R2=0.06;p=0.33), suggesting that the rate of change in the Δ+ parameter and the Δ+ parameter in adulthood were explaining overlapping portions of variance in cocaine self-administration behaviors. However, when both were included in a stepwise regression model, only the rate of change in the Δ+ parameter during adolescence remained a significant explanatory factor of variance in cocaine-taking behaviors. These results suggest that the rate of change in the Δ+ parameter during adolescence may be a more robust predictor of cocaine-taking behaviors than the P120 Δ+ parameter estimate.

It is also possible that the P120 Δ+ parameter estimate mediates the relationship between the rate of change in the Δ+ parameter during adolescence and cocaine self-administration. To test this, we conducted a mediation analysis using multiple linear regression analyses (Hayes, 2018). The negative relationship between the rate of change in the Δ+ parameter estimate and cocaine use (β=-0.61;p=0.03) was attenuated when the P120 Δ+ parameter estimate was included in the model (β=-0.44;p=0.16). The indirect effect, however, was not significant (95% confidence interval: −0.54 to 0.13), indicating that the P120 Δ+ parameter estimate only partially mediated the relationship between the rate of change in the Δ+ parameter estimate and cocaine use (Fig. 7D).
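The logic of a regression-based mediation analysis can be sketched as follows, with X the adolescent rate of change, M the P120 Δ+ estimate, and Y the cocaine infusions. The total effect of X on Y decomposes into a direct effect (X on Y controlling for M) plus an indirect effect a·b, where a is the slope of M on X and b is the slope of Y on M controlling for X. The code below is an illustrative stdlib-only implementation with a percentile bootstrap interval for the indirect effect. It is not the authors' software, it uses unstandardized coefficients, and the function names, bootstrap settings, and degeneracy tolerance are assumptions.

```python
import random


def _cross_sum(u, v):
    """Sum of cross-products of deviations from the means."""
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    return sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))


def mediation_paths(x, m, y):
    """Unstandardized OLS paths for a single-mediator model X -> M -> Y."""
    sxx, smm = _cross_sum(x, x), _cross_sum(m, m)
    sxm, sxy, smy = _cross_sum(x, m), _cross_sum(x, y), _cross_sum(m, y)
    det = sxx * smm - sxm ** 2
    # Reject samples where X has no variance or X and M are collinear.
    if sxx <= 0.0 or det <= 1e-12 * sxx * smm:
        raise ValueError("degenerate sample")
    a = sxm / sxx                          # M regressed on X
    b = (sxx * smy - sxm * sxy) / det      # Y on M, controlling for X
    direct = (smm * sxy - sxm * smy) / det  # Y on X, controlling for M
    total = sxy / sxx                      # Y regressed on X alone
    return {"a": a, "b": b, "direct": direct, "total": total, "indirect": a * b}


def bootstrap_indirect_ci(x, m, y, n_boot=2000, seed=0):
    """Approximate 95% percentile bootstrap interval for the indirect effect."""
    mediation_paths(x, m, y)  # fail early if the full sample is degenerate
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    while len(estimates) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        try:
            paths = mediation_paths([x[i] for i in idx],
                                    [m[i] for i in idx],
                                    [y[i] for i in idx])
        except ValueError:
            continue  # skip degenerate resamples
        estimates.append(paths["indirect"])
    estimates.sort()
    return estimates[int(0.025 * n_boot)], estimates[int(0.975 * n_boot) - 1]
```

An interval for the indirect effect that spans zero, as reported above, means the data do not reliably distinguish the mediated path from no mediation.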

We then examined whether the rate of change in the Δ+ parameter during adolescence predicted other addiction-relevant behaviors. The rate of change in the Δ+ parameter during adolescence was not predictive of the number of active lever responses rats made across the five extinction sessions (R2=0.03;p=0.60; Fig. 7E). There was, however, some evidence that the rate of change in the Δ+ parameter during adolescence was negatively related to the number of active lever responses rats made during the reinstatement test (R2=0.12;p=0.29; Fig. 7F). Although this relationship was not statistically significant, an inspection of the data suggested that this might be because one rat had an extreme number of active responses during the reinstatement test (Z score = 3.02). Indeed, when this rat was excluded from the analysis, the strength of the correlation increased (R2=0.45;p=0.03): Rats who had a slower rate of increase in the Δ+ parameter during adolescence had a greater cue-induced reinstatement of the drug-seeking response compared to rats who had a greater rate of increase in the Δ+ parameter. These data, collectively, indicate that adolescent decision-making trajectories are an important predictor of addiction-relevant behaviors.
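The outlier screen mentioned above can be illustrated with a simple Z-score computation. The sketch below standardizes a set of response counts and drops values beyond a cutoff. The cutoff of 3.0 and the response counts are assumptions for illustration; the text reports only that the excluded rat had a Z score of 3.02.

```python
import statistics


def z_scores(values):
    """Standardize values using the sample mean and sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def exclude_extreme(values, threshold=3.0):
    """Keep values whose absolute Z score does not exceed the threshold."""
    return [v for v, z in zip(values, z_scores(values)) if abs(z) <= threshold]


# Hypothetical active-lever response counts, for illustration only.
responses = [10, 12, 11, 9, 10, 11, 12, 10, 9, 11, 10, 12, 60]
kept = exclude_extreme(responses)
```

Note that with a sample standard deviation, the largest possible Z score in a sample of size n is (n − 1)/√n, so a cutoff of 3 can only be exceeded when n is at least 11.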

Discussion

The results of the current study provide new evidence that adolescent-related changes in select reinforcement-learning computations are predictive of cocaine-taking behaviors in adulthood. We found that the number of reversals that rats completed under deterministic and probabilistic schedules of reinforcement increased during adolescence and that this was due to improvements in value updating after positive outcomes (e.g., Δ+ parameter). The rate of increase in the Δ+ parameter during adolescence – which was related to individual differences in the Δ+ parameter estimate in adulthood – was predictive of cocaine self-administration in adulthood, and our preliminary evidence suggested that it may be a better predictor of cocaine-taking behaviors than the P120 Δ+ parameter estimate. These data, collectively, indicate that the neurodevelopmental changes in specific decision-making circuits are critically involved in regulating drug use susceptibility and highlight adolescence as an important developmental period in the pathology of addiction.

Adolescent trajectories predict reinforcement-learning and addiction-relevant behaviors in adulthood

The age-related improvement in reversal-learning performance observed in the current study is consistent with previous observations in rats and those made in humans (van der Schaaf et al., 2011; Moin Afshar et al., 2020). We found that the number of reversals that male and female rats were able to complete in a single session increased across the three adolescent ages. This improvement in reversal performance was not explained by differences in the number of trials that rats completed, the number of rewards that rats earned, or in the ability of rats to acquire the initial discrimination. The increase in the number of reversals during adolescence is likely due to improvements in adaptive choice behavior. Our computational analyses revealed that the improvement in reversal-learning performance was due to age-related increases in value updating following a positive outcome: As the age of rats increased, the likelihood that they would repeat a rewarded action increased. This was specific to positive-outcome updating because value updating following a negative outcome did not change across adolescence.

Our data support adolescence as a critical period in the development of reward-based mechanisms and addiction susceptibility. The rate of change in the Δ+ parameter during adolescence was predictive of the Δ+ parameter in adulthood and the amount of cocaine the same rats self-administered as adults. Rats who had a lower rate of improvement in the Δ+ parameter during adolescence had a smaller Δ+ parameter at the P120 assessment and self-administered more cocaine than rats who had a larger rate of improvement in the Δ+ parameter. Notably, the rate of improvement in the Δ+ parameter was a better predictor of cocaine-taking behaviors than the Δ+ parameter obtained during the adolescent or adult assessments, suggesting that the biobehavioral changes that occur during adolescence, rather than absolute values, could be important mediators of addiction susceptibility. Repeated assessments of decision-making functions during adolescence in humans – like those being done in the Adolescent Brain Cognitive Development (ABCD) study – could be a powerful approach for assessing addiction susceptibility in drug naïve youths.

The Δ+ parameter quantifies the degree to which action values are updated following a rewarded outcome, and differences in the Δ+ parameter might reflect variations in how individual rats value the outcome (e.g., how valuable 60 μl of SCM solution is during that session), variation in reward-prediction errors, attention to actions and/or outcomes, and/or variation in their decision strategy. The age-related improvements in the Δ+ parameter that predict cocaine use may, therefore, reflect changes in one – or possibly several – of these reinforcement-learning computations. We hypothesize that rats with slower Δ+ parameter trajectories are less sensitive to the reinforcing effects of rewards and, consequently, consume greater amounts of cocaine; however, it is also possible that rats with a slower Δ+ parameter trajectory are less sensitive to the aversive effects of cocaine. Our ongoing studies are integrating in vivo calcium recordings with dynamic reinforcement-learning measures to identify the specific reinforcement-learning computations that change during adolescence, which could provide insights into these predictive relationships.

Neurodevelopmental mechanisms underlying adolescent trajectories that predict drug use

The neurodevelopmental mechanisms regulating the age-related change in the Δ+ parameter are not known, but there is evidence that the dopamine system undergoes a significant reorganization during adolescence (Sturman and Moghaddam 2011). For example, midbrain dopamine neurons fire at a faster rate in adolescent rats compared to adults, which has been proposed to elevate dopamine tone and receptor density in the striatum and prefrontal cortex (McCutcheon et al. 2012). Dopamine D3 receptors regulate the activity of midbrain dopamine neurons (Piercey et al. 1996), and our recent work has found that midbrain D3 receptor availability is correlated with individual differences in the Δ+ parameter and predictive of cocaine self-administration in adult rats (Groman et al. 2020a). Individual differences in age-related changes in the expression of dopamine D3 receptors may, therefore, play a key role in regulating the Δ+ parameter trajectories that are predictive of drug-taking behaviors. Moreover, recent evidence has indicated that dopamine neurons encode rewards differently in adolescents compared to adults (McCane et al. 2021), suggesting that age-related changes in midbrain D3 expression may influence reward-prediction error encoding that we hypothesize to mediate the relationship between the Δ+ parameter trajectories and cocaine self-administration.

In addition to these dopaminergic adaptations, adolescence is also a critical developmental period in the maturation of the prefrontal cortex. Neuroimaging studies have observed changes in structure, function, and connectivity of the orbitofrontal cortex (OFC) during adolescence (Sturman and Moghaddam 2011; Karlsgodt et al. 2015; Ikuta et al. 2018), a region known to be critically involved in reversal learning and reinforcement learning. We, and others, have reported that select reinforcement-learning computations are controlled by distinct orbitofrontal circuits (Groman et al. 2019a; Hirokawa et al. 2019). Specifically, we found that ablation of amygdala projections to the OFC impaired performance of rats in a reversal-learning task and disrupted value updating for positive outcomes (e.g., Δ+ parameter; Groman et al. 2019a). It is possible, therefore, that the increase in the Δ+ parameter during adolescence is due to changes in amygdala projections to the OFC. Indeed, connectivity between the amygdala and OFC changes during adolescence (Gee et al., 2013) and amygdala-OFC connectivity is correlated with problematic drinking behaviors in adults (Crane et al. 2018). Our ongoing studies are examining how age-related changes in amygdala-OFC connectivity in the rat co-vary with adolescent decision-making trajectories and how amygdala-OFC connectivity could serve as a noninvasive predictor of addiction susceptibility.

Summary

The current study demonstrates that adolescent reward-guided decision-making trajectories are predictive of cocaine-taking behaviors. These data support adolescence as an important developmental period in addiction susceptibility and highlight the need for further investigations into the neurobiological mechanisms mediating developmental phenomena in normal and abnormal states.

Acknowledgements

This work was supported by Public Health Service grants from the National Institute on Drug Abuse (DA051598 and DA051977 to SMG) and the State of Minnesota through its support of the Medical Discovery Team on Addiction. We acknowledge and thank the Drug Supply Program at the National Institute on Drug Abuse for providing cocaine HCl. The authors thank Dr. Summer Thompson for her insightful comments and critiques of an earlier draft of the manuscript.

Footnotes

Declarations

Conflict of interest The authors declare no competing interests.

References

  1. Ballinger GA (2004) Using generalized estimating equations for longitudinal data analysis. Available at: 10.1177/1094428104263672. Accessed 24 Nov 2018 [DOI]
  2. Barraclough DJ, Conroy ML, Lee D (2004) Prefrontal cortex and decision making in a mixed-strategy game. Nat Neurosci 7:404–410 Available at: http://www.ncbi.nlm.nih.gov/pubmed/15004564 [DOI] [PubMed] [Google Scholar]
  3. Blakemore SJ, Robbins TW (2012) Decision-making in the adolescent brain. Nat Neurosci 15:1184–1191 [DOI] [PubMed] [Google Scholar]
  4. Casey BJ, Jones RM, Hare TA (2008) The adolescent brain. Ann N Y Acad Sci 1124:111–126 Available at: 10.1196/annals.1440.010 Accessed 6 May 2019 [DOI] [PMC free article] [PubMed] [Google Scholar]
  5. Cervantes MC, Laughlin RE, Jentsch JD (2013) Cocaine self-administration behavior in inbred mouse lines segregating different capacities for inhibitory control. Psychopharmacol Available at: http://www.ncbi.nlm.nih.gov/pubmed/23681162 [DOI] [PMC free article] [PubMed] [Google Scholar]
  6. Chambers RA, Taylor JR, Potenza MN (2003) Developmental neurocircuitry of motivation in adolescence: a critical period of addiction vulnerability. Am J Psychiatry 160:1041–1052 Available at: 10.1176/appi.ajp.160.6.1041 Accessed 17 Sept 2018 [DOI] [PMC free article] [PubMed] [Google Scholar]
  7. Crane NA, Gorka SM, Phan KL, Childs E (2018) Amygdala-orbitofrontal functional connectivity mediates the relationship between sensation seeking and alcohol use among binge-drinking adults. Drug Alcohol Depend 192:208–214 [DOI] [PMC free article] [PubMed] [Google Scholar]
  8. Dalley JW, Fryer TD, Brichard L, Robinson ES, Theobald DE, Laane K, Pena Y, Murphy ER, Shah Y, Probst K, Abakumova I, Aigbirhio FI, Richards HK, Hong Y, Baron JC, Everitt BJ, Robbins TW (2007) Nucleus accumbens D2/3 receptors predict trait impulsivity and cocaine reinforcement. Science (80- ) 315:1267–1270 Available at: http://www.ncbi.nlm.nih.gov/pubmed/17332411 [DOI] [PMC free article] [PubMed] [Google Scholar]
  9. Ersche KD, Roiser JP, Robbins TW, Sahakian BJ (2008) Chronic cocaine but not chronic amphetamine use is associated with perseverative responding in humans. Psychopharmacol 197:421–431 Available at: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=18214445 [DOI] [PMC free article] [PubMed] [Google Scholar]
  10. Featherstone RE, Rizos Z, Nobrega JN, Kapur S, Fletcher PJ (2006) Gestational methylazoxymethanol acetate treatment impairs select cognitive functions: parallels to schizophrenia. Neuropsychopharmacology 32:483–492 Available at: https://www.nature.com/articles/1301223 Accessed 30 March 2022
  11. Fillmore MT, Rush CR (2006) Polydrug abusers display impaired discrimination-reversal learning in a model of behavioural control. J Psychopharmacol 20:24–32 Available at: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=16174667.
  12. Frank MJ, Seeberger LC, O’Reilly RC (2004) By carrot or by stick: cognitive reinforcement learning in parkinsonism. Science 306:1940–1943 Available at: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=15528409.
  13. Gee DG, Gabard-Durnam LJ, Flannery J, Goff B, Humphreys KL, Telzer EH, Hare TA, Bookheimer SY, Tottenham N (2013) Early developmental emergence of human amygdala-prefrontal connectivity after maternal deprivation. Proc Natl Acad Sci U S A 110:15638–15643 Available at: 10.1073/pnas.1307893110 Accessed 5 April 2021
  14. Groman SM, Thompson SL, Lee D, Taylor JR (2022) Reinforcement learning detuned in addiction: integrative and translational approaches. Trends Neurosci 45:96–105
  15. Groman SM, Jentsch JD (2012) Cognitive control and the dopamine D(2)-like receptor: a dimensional understanding of addiction. Depress Anxiety 29:295–306 Available at: http://www.ncbi.nlm.nih.gov/pubmed/22147558
  16. Groman SM, Lee B, Seu E, James AS, Feiler K, Mandelkern MA, London ED, Jentsch JD (2012) Dysregulation of D2-mediated dopamine transmission in monkeys after chronic escalating methamphetamine exposure. J Neurosci 32:5843–5852 Available at: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=22539846.
  17. Groman SM, Smith NJ, Petrulli JR, Massi B, Chen L, Ropchan J, Huang Y, Lee D, Morris ED, Taylor JR (2016) Dopamine D3 receptor availability is associated with inflexible decision making. J Neurosci 36:6732–6741 Available at: http://www.ncbi.nlm.nih.gov/pubmed/27335404 Accessed December 2019
  18. Groman SM, Rich KM, Smith NJ, Lee D, Taylor JR (2018a) Chronic exposure to methamphetamine disrupts reinforcement-based decision making in rats. Neuropsychopharmacology 43
  19. Groman SM, Rich KM, Smith NJ, Lee D, Taylor JR (2018b) Chronic exposure to methamphetamine disrupts reinforcement-based decision making in rats. Neuropsychopharmacology 43:770–780 Available at: http://www.ncbi.nlm.nih.gov/pubmed/28741627 Accessed 11 May 2019
  20. Groman SM, Keistler C, Keip AJ, Hammarlund E, DiLeone RJ, Pittenger C, Lee D, Taylor JR (2019a) Orbitofrontal circuits control multiple reinforcement-learning processes. Neuron 103:734–746.e3 Available at: http://www.ncbi.nlm.nih.gov/pubmed/31253468 Accessed 23 Aug 2019
  21. Groman S, Hillmer A, Liu H, Fowles K, Holden D, Morris E, Lee D, Taylor J (2020) Midbrain D3 receptor availability predicts escalation in cocaine self-administration. Biol Psychiatry 88:767–776 Available at: https://pubmed.ncbi.nlm.nih.gov/32312578/ Accessed 4 November 2021
  22. Groman SM, Hillmer AT, Liu H, Fowles K, Holden D, Morris ED, Lee D, Taylor JR (2020b) Dysregulation of decision making related to metabotropic glutamate 5, but not midbrain D3, receptor availability following cocaine self-administration in rats. Biol Psychiatry 88:777–787 Available at: http://www.ncbi.nlm.nih.gov/pubmed/32826065 Accessed 5 November 2020
  23. Groman SM, Massi B, Mathias SR, Lee D, Taylor JR (2019b) Model-free and model-based influences in addiction-related behaviors. Biol Psychiatry 85:936–945 Available at: https://linkinghub.elsevier.com/retrieve/pii/S0006322318321218 Accessed 21 Sept 2019
  24. Hayes AF (2018) Introduction to mediation, moderation, and conditional process analysis, second edition: a regression-based approach. Guilford Press; 46:1–692 Available at: https://www.guilford.com/books/Introduction-to-Mediation-Moderation-and-Conditional-Process-Analysis/Andrew-Hayes/9781462534654 Accessed 22 November 2021
  25. Hirokawa J, Vaughan A, Masset P, Ott T, Kepecs A (2019) Frontal cortex neuron types categorically encode single decision variables. Nature 576:446–451 Available at: https://pubmed.ncbi.nlm.nih.gov/31801999/ [Accessed November 5, 2020].
  26. Ikuta T, del Arco A, Karlsgodt KH (2018) White matter integrity in the fronto-striatal accumbofrontal tract predicts impulsivity. Brain Imaging Behav 12:1524–1528 Available at: 10.1007/s11682-017-9820-x [Accessed January 8, 2020].
  27. Ito M, Doya K (2009) Validation of decision-making models and analysis of decision variables in the rat basal ganglia. J Neurosci 29:9861–9874 Available at: http://www.ncbi.nlm.nih.gov/pubmed/19657038.
  28. Jentsch JD, Olausson P, De La Garza R 2nd, Taylor JR (2002) Impairments of reversal learning and response perseveration after repeated, intermittent cocaine administrations to monkeys. Neuropsychopharmacology 26:183–190 Available at: http://www.ncbi.nlm.nih.gov/pubmed/11790514.
  29. Karlsgodt KH, John M, Ikuta T, Rigoard P, Peters BD, Derosse P, Malhotra AK, Szeszko PR (2015) The accumbofrontal tract: diffusion tensor imaging characterization and developmental change from childhood to adulthood. Hum Brain Mapp 36:4954–4963 Available at: http://www.ncbi.nlm.nih.gov/pubmed/26366528 Accessed 30 March 2020
  30. Lee D (2013) Decision making: from neuroscience to psychiatry. Neuron 78:233–248 Available at: http://www.ncbi.nlm.nih.gov/pubmed/23622061
  31. McCane AM, Wegener MA, Faraji M, Garcia MTR, Wallin-Miller K, Costa VD, Moghaddam B (2021) Adolescent dopamine neurons represent reward differently during action and state guided learning. J Neurosci:JN-RM-1321–21 Available at: https://www.jneurosci.org/content/early/2021/10/04/JNEUROSCI.1321-21.2021. Accessed 11 Nov 2021
  32. McCutcheon JE, Conrad KL, Carr SB, Ford KA, McGehee DS, Marinelli M (2012) Dopamine neurons in the ventral tegmental area fire faster in adolescent rats than in adults. J Neurophysiol 108:1620–1630 Available at: https://pubmed.ncbi.nlm.nih.gov/22723669/ [Accessed November 9, 2021].
  33. Mitchell MR, Weiss VG, Beas BS, Morgan D, Bizon JL, Setlow B (2014) Adolescent risk taking, cocaine self-administration, and striatal dopamine signaling. Neuropsychopharmacology 39:955–962 Available at: http://www.ncbi.nlm.nih.gov/pubmed/24145852 [Accessed October 10, 2017].
  34. Moin Afshar N, Keip AJ, Taylor JR, Lee D, Groman SM (2020) Reinforcement learning during adolescence in rats. J Neurosci 40:5857–5870. Available at: 10.1523/JNEUROSCI.0910-20.2020 Accessed 27 July 2020
  35. Niv Y (2009) Reinforcement learning in the brain. J Math Psychol 53:139–154 Available at: http://linkinghub.elsevier.com/retrieve/pii/S0022249608001181 [Accessed August 25, 2018].
  36. Pan W (2001) Akaike’s information criterion in generalized estimating equations. Biometrics 57:120–125 Available at: http://www.ncbi.nlm.nih.gov/pubmed/11252586 Accessed 24 Nov 2018
  37. Perry JL, Nelson SE, Carroll ME (2008) Impulsive choice as a predictor of acquisition of IV cocaine self-administration and reinstatement of cocaine-seeking behavior in male and female rats. Exp Clin Psychopharmacol 16:165–177 Available at: http://www.ncbi.nlm.nih.gov/pubmed/18489020.
  38. Piercey MF, Hoffmann WE, Smith MW, Hyslop DK (1996) Inhibition of dopamine neuron firing by pramipexole, a dopamine D3 receptor-preferring agonist: comparison to other dopamine receptor agonists. Eur J Pharmacol 312:35–44
  39. Rao KN, Sentir AM, Engleman EA, Bell RL, Hulvershorn LA, Breier A, Chambers RA (2016) Toward early estimation and treatment of addiction vulnerability: radial arm maze and N-acetyl cysteine before cocaine sensitization or nicotine self-administration in neonatal ventral hippocampal lesion rats. Psychopharmacology (Berl) 233:3933–3945 Available at: 10.1007/s00213-016-4421-8 Accessed 30 March 2022
  40. van der Schaaf ME, Warmerdam E, Crone EA, Cools R (2011) Distinct linear and non-linear trajectories of reward and punishment reversal learning during development: relevance for dopamine’s role in adolescent decision making. Dev Cogn Neurosci 1:578–590 Available at: https://www.sciencedirect.com/science/article/pii/S1878929311000648#fig0015 Accessed 9 May 2019
  41. Spear LP (2013) Adolescent neurodevelopment. J Adolesc Heal 52:S7.
  42. Squeglia LM, Cservenka A (2017) Adolescence and drug use vulnerability: findings from neuroimaging. Curr Opin Behav Sci 13:164–170
  43. Sturman DA, Moghaddam B (2011) Reduced neuronal inhibition and coordination of adolescent prefrontal cortex during motivated behavior. J Neurosci 31:1471–1478 Available at: https://pubmed.ncbi.nlm.nih.gov/21273431/ Accessed 31 May 2021
  44. Sutton RS, Barto AG (1998) Reinforcement learning: an introduction. MIT Press. Available at: https://books.google.com/books?hl=en&lr=&id=CAFR6IBF4xYC&pgis=1 Accessed 6 June 2015
  45. Zhukovsky P, Puaud M, Jupp B, Sala-Bayo J, Alsiö J, Xia J, Searle L, Morris Z, Sabir A, Giuliano C, Everitt BJ, Belin D, Robbins TW, Dalley JW (2019) Withdrawal from escalated cocaine self-administration impairs reversal learning by disrupting the effects of negative feedback on reward exploitation: a behavioral and computational analysis. Neuropsychopharmacology Available at: http://www.ncbi.nlm.nih.gov/pubmed/30952156 Accessed April 2019
