Adolescents exhibit reduced Pavlovian biases on instrumental learning

Hillary A Raab; Catherine A Hartley

doi:10.1038/s41598-020-72628-w

2020 Sep 25;10:15770.

PMCID: PMC7519144 PMID: 32978451

Abstract

Multiple learning systems allow individuals to flexibly respond to opportunities and challenges present in the environment. An evolutionarily conserved “Pavlovian” learning mechanism couples valence and action, promoting a tendency to approach cues associated with reward and to inhibit action in the face of anticipated punishment. Although this default response system may be adaptive, these hard-wired reactions can hinder the ability to learn flexible “instrumental” actions in pursuit of a goal. Such constraints on behavioral flexibility have been studied extensively in adults. However, the extent to which these valence-specific response tendencies bias instrumental learning across development remains poorly characterized. Here, we show that while Pavlovian response biases constrain flexible action learning in children and adults, these biases are attenuated in adolescents. This adolescent-specific reduction in Pavlovian bias may promote unbiased exploration of approach and avoidance responses, facilitating the discovery of rewarding behavior in the many novel contexts that adolescents encounter.

Subject terms: Human behaviour, Motivation, Learning and memory, Learning algorithms

Introduction

From an early age, individuals can rely on distinct forms of learning to maximize rewards and avoid punishments in their environments. Through Pavlovian learning, a neutral cue that predicts reward or punishment acquires the positive or negative value associated with its outcome¹. This learned association between a cue and its predicted outcome can then drive reflexive behavioral responses that are elicited in a valence-dependent manner. Expectations of reward typically lead to approach behaviors^2,3, whereas expectations of punishment tend to foster the inhibition of action⁴. For instance, by learning that a sweet smell often precedes enjoying a scrumptious dessert, you might be drawn toward the fragrant scent in anticipation of an afternoon treat. Or by learning that the sound of a horn often is heard before a car accident, you might freeze upon hearing the noise in anticipation of a negative event. This evolutionarily conserved learning mechanism allows approach or action inhibition to be readily deployed as “default” reactions to anticipated rewards or threats, respectively, without any need to evaluate the efficacy of these responses through experience⁵. Whereas Pavlovian learning couples expectations of reward or punishment with reflexive reactions that have no causal influence on the outcomes that actually occur, instrumental learning enables the discovery and deployment of actions that can directly influence the likelihood of obtaining reward or avoiding punishment. By learning which actions bring about beneficial outcomes, an individual can flexibly engage active or inactive responses based on their causal efficacy, rather than emitting reactive responses that are controlled by cues in their environment.

Pavlovian and instrumental learning systems do not operate in isolation⁶. When active approach responses are required to secure reward, or one must inhibit action to avoid punishment, the Pavlovian reactions aligned with these instrumental contingencies can facilitate learning. However, when Pavlovian tendencies conflict with instrumental contingencies, these default reactions can impede the learning of adaptive instrumental actions. For example, freezing at the sound of a car horn might be beneficial if you are about to step into oncoming traffic, but might be harmful if it prevents you from running out of the way if you are already in the road. Studies across species highlight the difficulty of learning to perform instrumental actions that conflict with Pavlovian expectancies, as the evolutionarily conserved tendencies to approach cues predictive of reward or to inhibit action to cues predictive of punishment can be too robust to overcome^7–12. In active avoidance paradigms, rodents have difficulty learning to cross to the opposite side of a conditioning chamber in order to avoid being shocked, due to their tendency to freeze following a threat-predictive cue^13–15. Similarly, in the appetitive domain, rats have difficulty learning to earn a food reward by not approaching a cue that predicts the reward¹⁶. These studies demonstrate that Pavlovian reactions to anticipated reward or punishment can pose constraints on instrumental action, disrupting flexible goal-directed learning.

To date, the biasing effects of Pavlovian learning on behavior have been studied primarily in adult rodents and humans^10,17–23. While both Pavlovian and instrumental learning are evident from early childhood^24–27, relatively few studies have investigated the interaction between Pavlovian and instrumental learning across development in humans²⁸. Thus, it is unclear when Pavlovian constraints on action learning emerge and how they change from childhood to adulthood.

In the present study, we tested the extent to which Pavlovian reactions differentially biased instrumental learning across development. We hypothesized that these Pavlovian learning constraints would decrease from childhood to adulthood, as the cognitive capabilities and neural circuits that support goal-directed instrumental learning are refined^25,29–32. To test this hypothesis, we had 61 participants, 8–25 years of age (20 children aged 8–12, 20 adolescents aged 13–17, and 21 adults aged 18–25; see methods for detailed demographics of participants), complete a probabilistic Go/No-Go task in which valence and action were orthogonalized. We adapted a well-validated paradigm^20,21,33 for use in a developmental cohort by using a child-friendly narrative to frame the task. The goal of the task was to earn as many “tickets” as possible by interacting with four different colored robots. Valence (i.e., the potential to either win or lose a ticket) and action (i.e., the need to either press or not press a button) were orthogonalized across the four robots. Two of the robots were associated with the potential to win a ticket (“ticket givers”), and two robots were associated with the potential loss of a ticket (“ticket takers;” Fig. 1). For both “ticket givers” and “ticket takers,” the correct response for one of the robots was to press a button, whereas the correct response for the other robot was to withhold a button press. Participants were instructed that robots could be either “ticket givers” or “ticket takers” and that the correct action for each robot could be learned through feedback. For the “ticket givers,” a correct response resulted in winning a ticket 80% of the time but no ticket (the “null” outcome) 20% of the time; whereas for the “ticket takers,” a correct response avoided the loss of a ticket (the “null” outcome) 80% of the time but resulted in the loss of a ticket 20% of the time. Incorrect responses yielded feedback with reversed outcome probabilities.
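The feedback rule described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' task code: the trial-type labels, the `ROBOTS` dictionary, and the `feedback` function are hypothetical names, and the only quantities taken from the text are the four trial types and the 80%/20% outcome probabilities.

```python
import random

# Hypothetical labels for the four robots: (valence, correct action).
ROBOTS = {
    "GW": ("win", "go"),            # Go to Win
    "GAL": ("avoid_loss", "go"),    # Go to Avoid Losing
    "NGW": ("win", "nogo"),         # No-Go to Win
    "NGAL": ("avoid_loss", "nogo"), # No-Go to Avoid Losing
}


def feedback(robot, action, rng=random):
    """Return +1 (win a ticket), 0 (null outcome), or -1 (lose a ticket)."""
    valence, correct_action = ROBOTS[robot]
    # The better of the two possible outcomes occurs with p = 0.8 after a
    # correct response and with p = 0.2 after an incorrect response.
    p_good = 0.8 if action == correct_action else 0.2
    good = rng.random() < p_good
    if valence == "win":
        return 1 if good else 0   # "ticket giver": win or null
    return 0 if good else -1      # "ticket taker": null or loss
```

Under these assumptions, a correct "Go" to a ticket giver wins a ticket on about 80% of trials, whereas an incorrect "No-Go" to a "Go to Avoid Losing" robot loses a ticket on about 80% of trials.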

Figure 1. Task assessing Pavlovian influences on instrumental learning. (a) On each trial, participants saw one of four distinctly colored robots (cue). Participants could then either press (“Go”) or not press (“No-Go”) the robot’s “button” (the target) when it appeared. (b) Following their choice, participants received probabilistic feedback (outcomes for “Win” trials: win a ticket or neither win nor lose a ticket; outcomes for “Avoid Losing” trials: neither win nor lose a ticket or lose a ticket). (c) Each uniquely colored robot, which corresponded to one of the four trial types, was associated with a correct response (“Go” or “No-Go”) and an outcome (rewards or punishments). Pavlovian reactions and instrumental contingencies were aligned for the trial types on the bolded diagonal, whereas for the other two trial types they were in opposition.

The Pavlovian default responses to approach expected reward or to inhibit action in the face of potential punishment were aligned with the correct instrumental response on “Go to Win” and “No-Go to Avoid Losing” trials, whereas the default responses were in conflict with the optimal instrumental actions on “Go to Avoid Losing” and “No-Go to Win” trials. Critically, a Pavlovian bias was evident if performance was better for the robots for which Pavlovian tendencies and correct instrumental responses were congruent than those for which they were incongruent. If Pavlovian learning did not bias instrumental action learning, then performance would be comparable across all four trial types.

Results

To characterize patterns of age-related change in task performance, we assessed through model comparison whether each measure of choice behavior was best captured by a statistical model that included age alone (i.e., a linear model) or a model that included an additional nonlinear age-squared term, as in previous studies^34–36. The best-fitting model thus indicates whether the developmental trajectory of choice behavior exhibits a continuous linear progression from childhood to adulthood, or whether behavior shows either adolescent-specific effects (i.e., children’s and adults’ behavior are more similar to each other than to adolescents’ behavior) or adolescent-emergent effects (i.e., behavior of adolescents is more similar to adults’ behavior than to children’s). Age and age-squared were included as continuous variables in all analyses. However, age is represented categorically (i.e., grouping children, adolescents, and adults) in some figures for the purpose of depicting results, and we use categorical age terminology in our interpretation and discussion of the findings.
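The comparison between the age-only model and the model with an added age-squared term can be illustrated with a nested F-test. The sketch below is a minimal example under stated assumptions, not the authors' analysis code: it assumes ordinary least squares, assumes age is z-scored before squaring, and uses hypothetical function names (`rss`, `nested_f`). With 61 participants and three coefficients in the larger model, the residual degrees of freedom are 58, matching the F(1,58) statistics reported in the text.

```python
import numpy as np


def rss(X, y):
    """Residual sum of squares of an ordinary least-squares fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


def nested_f(age, y):
    """F statistic for adding age-squared to a model with age alone."""
    age = np.asarray(age, dtype=float)
    y = np.asarray(y, dtype=float)
    z = (age - age.mean()) / age.std()  # assumption: age is z-scored
    ones = np.ones_like(z)
    X_lin = np.column_stack([ones, z])           # intercept + age
    X_quad = np.column_stack([ones, z, z ** 2])  # + age-squared
    rss_lin, rss_quad = rss(X_lin, y), rss(X_quad, y)
    df_resid = len(y) - X_quad.shape[1]
    # One added parameter, so the numerator has 1 degree of freedom.
    f_stat = (rss_lin - rss_quad) / (rss_quad / df_resid)
    return f_stat, (1, df_resid)
```

A large F statistic favors the model with the age-squared term. Converting it to a p-value requires the F distribution, which is omitted here to keep the sketch dependent on NumPy alone.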

Behavioral analyses

First, we tested whether there was a relationship between the number of tickets won and participant age. A linear model that included both age and age-squared as predictors of number of tickets won significantly improved the fit compared to a model that included age alone (F(1,58) = 15.959, p = 0.0002). We found significant effects of age (β value = 7.523, s.e. = 1.804, t(58) = 4.170, p < 0.001, Cohen’s ƒ² = 0.300) and age-squared (β value = −7.334, s.e. = 1.836, t(58) = -3.995, p < 0.001, Cohen’s ƒ² = 0.275) on the number of tickets won, indicating a peak in overall task performance during late adolescence (Fig. 2a).

Figure 2. Behavioral performance by age. (a) Number of tickets won is plotted as a function of age. A quadratic line of best fit is shown. The error bars represent a .95 confidence interval. (b) Mean accuracy across all trials is plotted for each trial type, separately by age group. The darker shaded bars depict the trials for which Pavlovian tendencies are aligned with the optimal instrumental response, and the lighter shaded bars depict the trials for which Pavlovian tendencies are in conflict with the optimal instrumental response. Yellow points represent mean accuracy that was calculated from simulating data using the parameter estimates for each participant from the best-fitting model. The following abbreviations are used: GW: Go to Win; GAL: Go to Avoid Losing; NGW: No-Go to Win; NGAL: No-Go to Avoid Losing. Error bars represent ± 1 SEM.

We next examined how participants’ choice behavior gave rise to this nonlinear age-related change in task performance. The qualitative pattern of performance on each trial type is depicted for children, adolescents, and adults in Fig. 2b (and Supplementary Fig. S1). To test how performance for each trial type differs as a function of age, we performed four separate linear regressions in which age and age-squared predict accuracy. The model that included age alone provided the best fit for accuracy on “Go to Win” (GW) trials, although age was not a significant predictor (F(1,58) = 0.557, p = 0.458; age: β value = 0.013, s.e. = 0.020, t(58) = 0.647, p = 0.52, Cohen’s ƒ² = 0.007). For all other trial types (i.e., Go to Avoid Losing (GAL), No-Go to Win (NGW), No-Go to Avoid Losing (NGAL)), the model that included an age-squared term provided a better fit than the model with age alone (GAL: F(1,58) = 9.937, p = 0.003; NGW: F(1,58) = 8.828, p = 0.004; NGAL: F(1,58) = 10.004, p = 0.002). For these three trial types, age and age-squared were significant predictors of accuracy (GAL age: β value = 0.089, s.e. = 0.020, t(58) = 4.447, p < 0.001, Cohen’s ƒ² = 0.341; GAL age-squared: β value = − 0.064, s.e. = 0.020, t(58) = − 3.152, p = 0.003, Cohen’s ƒ² = 0.171; NGW age: β value = 0.084, s.e. = 0.038, t(58) = 2.204, p = 0.032, Cohen’s ƒ² = 0.084; NGW age-squared: β value = − 0.116, s.e. = 0.039, t(58) = − 2.971, p = 0.004, Cohen’s ƒ² = 0.152; NGAL age: β value = 0.080, s.e. = 0.018, t(58) = 4.402, p < 0.001, Cohen’s ƒ² = 0.334; NGAL age-squared: β value = − 0.058, s.e. = 0.018, t(58) = − 3.163, p = 0.002, Cohen’s ƒ² = 0.172). Thus, apart from GW, for which accuracy was the highest and was comparable across age, performance on the other three trial types exhibited significant nonlinear improvements from childhood into adulthood.

To quantify the Pavlovian influence on instrumental learning for each individual, we first calculated a Pavlovian performance bias score by averaging how often reward-related cues invigorated action (number of Go responses to Win cues/total number of Go responses) and how often punishment-related cues suppressed action (number of No-Go responses to Avoid Losing cues/total number of No-Go responses). A bias score of 0.5 indicates the absence of a Pavlovian bias; whereas higher scores reflect a greater Pavlovian bias on action, with 1 being the maximum bias score. The best-fitting linear model predicting age-related changes in this bias score included both age and age-squared as regressors, of which age-squared was a significant predictor of Pavlovian performance bias (F(1,58) = 4.136, p = 0.047; age: β value = −0.018, s.e. = 0.014, t(58) = − 1.271, p = 0.209, Cohen’s ƒ² = 0.028; age-squared: β value = 0.03, s.e. = 0.015, t(58) = 2.034, p = 0.047, Cohen’s ƒ² = 0.071). This analysis revealed that children and adults exhibited a greater Pavlovian bias than adolescents (Fig. 3). This bias appears to be driven comparably by both a reward-driven invigoration of action and a punishment-driven suppression of action (Supplementary Figure S2). A mixed-effects logistic regression corroborated this adolescent-specific attenuation of Pavlovian bias on learning (Supplementary Table S1).
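The bias score defined above can be computed directly from a participant's responses. The following is a small sketch that follows the verbal definition in the text. The function name and the `(valence, action)` trial encoding are assumptions made for illustration.

```python
def pavlovian_bias_score(trials):
    """Average of reward-driven invigoration and punishment-driven suppression.

    trials: iterable of (valence, action) pairs, where valence is "win" or
    "avoid_loss" and action is "go" or "nogo" (hypothetical encoding).
    """
    trials = list(trials)
    go_valences = [v for v, a in trials if a == "go"]
    nogo_valences = [v for v, a in trials if a == "nogo"]
    if not go_valences or not nogo_valences:
        raise ValueError("score is undefined without both Go and No-Go responses")
    # Go responses to Win cues / total Go responses
    reward_invigoration = go_valences.count("win") / len(go_valences)
    # No-Go responses to Avoid Losing cues / total No-Go responses
    punishment_suppression = nogo_valences.count("avoid_loss") / len(nogo_valences)
    return (reward_invigoration + punishment_suppression) / 2
```

As a worked example, consider a participant with 40 Go responses, 30 of them to Win cues, and 40 No-Go responses, 30 of them to Avoid Losing cues. The score is (30/40 + 30/40)/2 = 0.75, midway between no bias (0.5) and the maximum bias (1).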

Figure 3. Pavlovian performance bias score by age. A performance bias of .5 indicates no Pavlovian bias, whereas larger scores represent greater Pavlovian interference with instrumental learning. The relationship between age and Pavlovian bias score is best fit by a quadratic function. The error bars represent a .95 confidence interval.

Computational modeling

By formalizing the value computations involved in learning from valenced outcomes, computational models can disentangle the contribution of a Pavlovian bias on choice behavior from differences in value updating, choice stochasticity, a value-independent bias toward action, or sensitivity to reward. We used reinforcement-learning models to dissociate the component processes of learning in the task and test whether the age-related changes observed in choice behavior could be attributed specifically to Pavlovian biases on action-value computation over the course of the learning task. We compared a set of nested reinforcement-learning models that were fit to participants’ data to determine which model best explained choice behavior across all participants. The models were chosen based on prior work in adults using a variant of this task (see Supplementary Table S2 for full list of models tested)^21,33. Median Akaike Information Criterion (AIC) values, for which lower values reflect a better fit, were used to determine which model provided the best explanation of behavior³⁷.
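Model selection by median AIC can be sketched as follows, using the standard definition AIC = 2k − 2 ln L, where k is the number of free parameters and L is the maximized likelihood. The function names and the layout of the `fits` dictionary are hypothetical, and any log-likelihood values passed in are placeholders rather than values from the study.

```python
import statistics


def aic(log_likelihood, n_params):
    """Akaike Information Criterion; lower values indicate a better fit."""
    return 2 * n_params - 2 * log_likelihood


def best_model_by_median_aic(fits):
    """Pick the model with the lowest median AIC across participants.

    fits: {model_name: (n_params, [per-participant log-likelihoods])}
    """
    medians = {
        name: statistics.median(aic(ll, k) for ll in lls)
        for name, (k, lls) in fits.items()
    }
    return min(medians, key=medians.get), medians
```

Because AIC adds a penalty of 2 per free parameter, a model with more parameters is preferred only if it improves the log-likelihood enough to offset that penalty.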

The model that best fit choice behavior included a learning rate, lapse rate, Go bias, Pavlovian bias, and a single reinforcement sensitivity term (Supplementary Table S2; see Methods for details on the other models tested). This model estimated an action value (Q value, or Q(a,s)) for each potential action (a; i.e., Go or No-Go) for each of the four robot stimuli (s). These Q value estimates were updated on every trial (t) using an error-driven learning function (Eq. 1). Outcomes (1, 0, or −1) were included in the model through the reward term (r) and multiplied by a reinforcement sensitivity parameter (ρ) that scaled the effective size of rewards and punishments, with higher values magnifying differences between Q values. The difference between the effective reward on that trial (ρr_t) and the previous Q value estimate (Q_{t-1}(a_t,s_t)) indexes the reward prediction error, indicating whether the effective outcome was better or worse than expected. The Q value estimate was incrementally updated following each outcome by adding the reward prediction error scaled by a learning rate (α).

Q_{t} (a_{t}, s_{t}) = Q_{t - 1} (a_{t}, s_{t}) + α (ρ r_{t} - Q_{t - 1} (a_{t}, s_{t}))
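The update in Eq. 1 translates into a one-line function. The sketch below is illustrative only. The function name is hypothetical, and the parameter values in the worked example are arbitrary rather than fitted estimates.

```python
def update_q(q_prev, reward, alpha, rho):
    """Error-driven update of an action value (Eq. 1).

    q_prev: previous estimate Q_{t-1}(a_t, s_t)
    reward: outcome r_t, one of 1, 0, or -1
    alpha:  learning rate
    rho:    reinforcement sensitivity
    """
    prediction_error = rho * reward - q_prev  # effective reward minus expectation
    return q_prev + alpha * prediction_error
```

For example, with α = 0.5 and ρ = 4, a win from a starting value of 0 moves the estimate to 2, and a second win moves it to 3. With repeated wins the estimate approaches the effective reward ρr = 4, which is why a larger ρ magnifies differences between Q values.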

Two weights were introduced that altered this action value estimate, yielding a weighted action estimate (W_t(a_t,s_t)). The first was a Go bias (b), which captured an increased tendency to press the button (Eq. 2).

W_{t} (a, s) = \begin{cases} Q_{t} (a, s) + b & \text{if } a = \text{go} \\ Q_{t} (a, s) & \text{else} \end{cases}

The second was a Pavlovian bias term (π) that was multiplied by the stimulus value estimate (V_t(s)) for each robot (Eq. 3). The Pavlovian bias parameter indexed the degree to which action was facilitated for cues associated with reward and inhibited for those associated with punishment. The stimulus value estimate was updated on each trial through reinforcement, in a similar manner to the Q value estimate (Eq. 4).

W_{t} (a, s) = \begin{cases} Q_{t} (a, s) + b + π V_{t} (s) & \text{if } a = \text{go} \\ Q_{t} (a, s) & \text{else} \end{cases}

V_{t} (s_{t}) = V_{t - 1} (s_{t}) + α (ρ r_{t} - V_{t - 1} (s_{t}))
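The effect of the two weights in Eq. 3 can be made concrete with a short sketch. The function name and the numerical values are hypothetical and chosen only to show the direction of each effect.

```python
def weighted_values(q_go, q_nogo, v, go_bias, pav_bias):
    """Weighted action estimates W_t(go, s) and W_t(nogo, s) from Eq. 3.

    v is the stimulus value V_t(s): positive for reward-associated cues,
    negative for punishment-associated cues.
    """
    w_go = q_go + go_bias + pav_bias * v  # only "Go" receives b and pi * V
    w_nogo = q_nogo                        # "No-Go" is left unchanged
    return w_go, w_nogo
```

With equal Q values of 0, b = 0.2, and π = 0.5, a reward-associated cue with V = 2 gives W(go) = 1.2, favoring action. A punishment-associated cue with V = −2 gives W(go) = −0.8, favoring inaction. Setting π = 0 removes the valence dependence and leaves only the constant Go bias.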

Action values from the model were transformed into choice probabilities using a squashed softmax choice function³⁸ that included a lapse rate (ξ), which captures the effects of inattention (Eq. 5).

p (a_{t} | s_{t}) = \left[ \frac{\exp (W_{t} (a_{t}, s_{t}))}{\sum_{a^{'}} \exp (W_{t} (a^{'}, s_{t}))} \right] (1 - ξ) + \frac{ξ}{2}
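For the two actions in this task, the choice rule in Eq. 5 can be sketched as below. The function name is hypothetical, and subtracting the maximum before exponentiating is a standard numerical-stability step added here; it does not change the resulting probabilities.

```python
import math


def p_go(w_go, w_nogo, lapse):
    """Probability of a Go response under the squashed softmax (Eq. 5)."""
    m = max(w_go, w_nogo)  # subtract the maximum to avoid overflow in exp
    e_go = math.exp(w_go - m)
    e_nogo = math.exp(w_nogo - m)
    softmax_go = e_go / (e_go + e_nogo)
    # The lapse rate mixes the softmax with a uniform choice over two actions.
    return softmax_go * (1 - lapse) + lapse / 2
```

The lapse rate bounds choice probabilities between ξ/2 and 1 − ξ/2. With ξ = 0.1, even a very large advantage for Go yields a Go probability of 0.95 rather than 1, which allows the model to account for occasional responses that are unrelated to the learned values.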

We then investigated patterns of age-related change for each free parameter in the model by performing linear regressions including age alone or both age and age-squared as predictors. Lapse rate, learning rate, and reinforcement sensitivity were best fit by linear regression models that included age alone (Table 1; Supplementary Fig. S3; lapse rate: F = 3.192, p = 0.079; learning rate: F = 0.121, p = 0.729; reinforcement sensitivity: F = 1.057, p = 0.308). However, age was not a significant predictor of lapse rate or learning rate, indicating no evidence of age-related changes in inattention or updating the value of an action (lapse rate: β value = − 0.027, s.e. = 0.026, t(58) = − 1.02, p = 0.312, Cohen’s ƒ² = 0.018; learning rate: β value = 0.056, s.e. = 0.033, t(58) = 1.672, p = 0.1, Cohen’s ƒ² = 0.047). There was a significant effect of age on reinforcement sensitivity, revealing greater effective reinforcement with age (β value = 0.935, s.e. = 0.396, t(58) = 2.362, p = 0.022, Cohen’s ƒ² = 0.095). The inclusion of age-squared in the model significantly improved the fit for the Go and Pavlovian bias parameters (Go bias: F = 4.447, p = 0.039; Pavlovian bias: F = 11.916, p = 0.001). Age-squared, but not linear age, was a significant predictor of the Go bias (age: β value = − 0.039, s.e. = 0.112, t = − 0.350, p = 0.728, Cohen’s ƒ² = 0.002; age-squared: β value = 0.24, s.e. = 0.114, t = 2.109, p = 0.039, Cohen’s ƒ² = 0.077). The Pavlovian bias significantly decreased linearly with age (β value = − 0.903, s.e. = 0.223, t = − 4.044, p = 0.001, Cohen’s ƒ² = 0.282) and also exhibited a significant effect of age-squared (β value = 0.784, s.e. = 0.227, t = 3.452, p = 0.001, Cohen’s ƒ² = 0.205). These nonlinear effects revealed that the Go bias and the Pavlovian bias parameter estimates were both attenuated in adolescents, relative to younger and older individuals. 
The significant linear relationship between Pavlovian bias and age suggests that, in addition to an adolescent-specific effect, this bias is highest in the youngest individuals and decreases with age. When the parameter estimates from each compared model were used to simulate choice behavior for each participant, the best-fitting model qualitatively reproduced the pattern of behavior evident in Fig. 2b (see Supplementary Fig. S4 for predictions from all models). Although parameters from the best-fitting model exhibited some degree of correlation (Supplementary Table S3), parameter recovery revealed that all parameters were highly recoverable (see Parameter Recovery section of the supplement).

Table 1.

Reinforcement learning parameter estimates.

Parameter	Child, median (Q1, Q3)	Adolescent, median (Q1, Q3)	Adult, median (Q1, Q3)	Regression fit: age vs. age²
Lapse rate (ξ)	0.075 (0.016, 0.250)	0.067 (0.008, 0.229)	0.026 (0.007, 0.139)	Age
Learning rate (α)	0.223 (0.055, 0.467)	0.364 (0.173, 0.517)	0.482 (0.164, 0.573)	Age
Go bias (b)	0.537 (0.143, 0.942)	0.215 (− 0.007, 0.491)	0.310 (0.006, 1.04)	Age²
Pav bias (π)	1.388 (0.502, 2.694)	0.229 (0.167, 0.411)	0.398 (0.113, 0.862)	Age²
Reinforcement Sensitivity (ρ)	3.159 (1.750, 4.247)	4.846 (3.835, 8.383)	4.512 (2.329, 8.758)	Age


Median parameter estimates, as well as the first (Q1) and third (Q3) quartiles, are shown separately for each categorical age group. Linear regressions were performed to test the relationship of each parameter estimate with age, which was included as a continuous variable. A model that added age-squared was compared against a model including age alone to identify the best-fitting model, which is listed in the column on the right.

Finally, analysis of response times revealed that older participants made faster responses, and responses to reward-associated stimuli were faster when the correct response was “Go” (Supplementary Table S4).

Discussion

In this study, participants were tasked with learning the optimal response to cues that were associated with either positive or negative outcomes. We investigated how hard-wired tendencies to approach cues associated with reward and to withhold action to cues associated with punishment might differentially constrain participants’ ability to learn optimal actions over the course of development. Unexpectedly, we found that adolescents’ learning was less biased, relative to both younger and older individuals, in two distinct ways. Whereas children and, to a lesser extent, adults exhibited a robust Pavlovian bias on their action value learning, this interference was reduced in adolescents. Adolescents also displayed a diminished bias toward action over inaction (the “Go” bias).

Adolescents in our study showed better performance in the “No-Go to Win” condition, relative to both adults and children. Consistent with our findings, past work in adults using variants of this task has observed the greatest evidence of Pavlovian interference with instrumental learning in this condition, with performance typically approaching chance^18,21,39. Better performance on “Go to Win” than “No-Go to Win” trials can arise not only through reward invigoration of action, but also from an enhanced ability to “assign credit” for rewarding outcomes to past actions, rather than to inaction. Previous studies using task designs capable of dissociating these features of learning find that adults exhibit both a Pavlovian bias on action learning and difficulty linking reward with inaction^39,40. Thus, it is possible that in addition to a reduced Pavlovian bias, adolescents might also exhibit a heightened ability to learn associations between rewards and past inaction. Future studies should clarify whether developmental differences in the ability to assign credit to passive responses might contribute to adolescents’ superior performance in this task.

Children and adults in our sample, but not adolescents, exhibited a robust Pavlovian bias that was, in part, driven by a suppression of action to punishment-associated cues. In environments in which outcomes are stochastic, actions that are beneficial, on average, may yield occasional negative outcomes. Strong Pavlovian biases may prevent the continued engagement with previously punished contexts that is necessary to discover the opportunities for reward that they present^41–43. Such tendencies to reflexively withdraw from or avoid situations associated with negative outcomes represent a type of “learning trap” that hinders an individual from sampling sufficiently to discern the true statistics of an environment, allowing potential rewards to go undiscovered^44,45. Given the many novel contexts that adolescents encounter during their transition toward parental independence^46,47, a reduced Pavlovian bias might facilitate exploration and unbiased evaluation of action values, allowing adolescents to approach uncertain or ambiguous situations to discover their true value⁴⁸. However, such willingness to continue sampling in stochastic environments that may yield negative outcomes may contribute to a seemingly heightened willingness to take risks during this developmental stage^49,50.

Pavlovian responses serve critical survival functions^51,52. However, their expression can also constrain the flexibility of learning. Theoretical accounts propose that the degree of control afforded in learning environments may be used to optimally calibrate the expression of Pavlovian versus instrumental responses^53–56. Instrumental learning is only adaptive in controllable environments, in which actions can leverage the true contingent causal structure of the environment to bring about beneficial outcomes. In contexts where outcomes are not contingent, the latent assumptions of causality inherent in instrumental learning are overly complex and inaccurate⁵⁷, and reliance on Pavlovian “default” response tendencies may instead be optimal. In active avoidance paradigms, which afford the opportunity for instrumental control of outcomes, adolescent rodents learn to proactively prevent punishment (e.g., by shuttling) better than younger and older animals, whose performance is impaired to a greater degree by the competing Pavlovian tendency to freeze in the face of threat^15,58. However, in uncontrollable environments, in which there is no effective instrumental response to an anticipated threat, adolescents readily learn and express Pavlovian responses that are particularly resistant to subsequent extinction^59,60. A parsimonious account of these seemingly inconsistent findings may be that adolescents are particularly effective at detecting the controllability of a given situation and calibrating their reliance on Pavlovian versus instrumental action accordingly. Our finding of an adolescent-specific reduction in Pavlovian bias accords with this explanation, as adolescents might infer the highly controllable nature of our task and adaptively deploy instrumental action. However, further studies are needed to clarify whether the degree of environmental controllability modulates the expression of motivated behaviors differentially across development.

Studies examining the neural substrates of Pavlovian bias in adults suggest how age-related changes in the brain might give rise to the developmental pattern of behavior observed here. Consistent with computational models proposing that the value of both stimuli and actions are evaluated during learning^61–63, the ventral and dorsal striatum, respectively, encode signals consistent with these computations⁶⁴, and facilitate Pavlovian and instrumental behaviors^65,66. The prefrontal cortex, through its projections to and from the striatum, is proposed to exhibit sensitivity to contexts in which stimulus and action values conflict and to enable attenuation of Pavlovian reactive biases in such contexts^18,21,39. Corticostriatal circuitry undergoes marked structural and functional age-related changes from childhood into adulthood^25,32,67–74. Reduced corticostriatal connectivity in children might constrain their ability to modulate Pavlovian biases on action learning, leading to the heightened expression of valence-action coupling in our youngest participants. Developmental increases in the integration between the prefrontal cortex and striatum may contribute to the linear reductions in these reactive responses with age. Dopamine, a neurotransmitter that innervates both the prefrontal cortex and the striatum, appears to modulate the expression of Pavlovian learning biases^33,40. The dopaminergic system exhibits substantial reorganization across adolescence^75,76, including nonlinear changes in dopamine receptor and transporter density^77–81. Moreover, dopaminergic projections from the striatum to the prefrontal cortex continue to develop in adolescence^82,83. Such nonlinear changes in the dopaminergic system and corticostriatal connectivity might contribute to the adolescent-specific attenuation of the Pavlovian bias.
While the combined influence of these linear and nonlinear changes in corticostriatal circuits and dopaminergic signaling likely contributes to the developmental trajectory of Pavlovian bias expression, future studies are needed to understand the neural mechanisms underlying the age-related pattern observed here.

Although our study focused on interactions between Pavlovian and instrumental learning, an extensive literature further distinguishes two forms of instrumental action that are proposed to reflect distinct underlying computations^84,85. “Goal-directed” actions are proposed to arise from a “model-based” learning process that computes the value of an action by prospectively searching a mental model of task contingencies and outcomes. In contrast, “habitual” action learning forgoes the use of a model, instead allowing rewarding outcomes to directly reinforce associations between stimuli and responses. Habitual behavior is well approximated by a “model-free” reinforcement-learning algorithm that incrementally updates a stored estimate of the value of an action. Importantly, these two forms of instrumental action differ in their sensitivity to Pavlovian influence. Pavlovian reactions interfere with habitual behavior to a greater extent than goal-directed action^22,86. Across tasks that assess these distinct forms of learning, adults who rely more on model-based strategies also exhibit reduced Pavlovian bias⁸⁷. Moreover, pharmacological manipulation of dopamine alters both Pavlovian bias^33,40 and the use of model-based strategies⁸⁸, suggesting that a common neural mechanism contributes to the expression of these distinct forms of value-based learning. While our task cannot directly differentiate goal-directed and habitual forms of instrumental learning, both goal-directed behavior and the model-based evaluations proposed to support it have been found to increase with age across development^25,89–92. Thus, children’s greater reliance on model-free instrumental learning may confer heightened vulnerability to Pavlovian interference, with a shift toward model-based computation conferring greater resistance to Pavlovian bias with age.

The present study examined the extent to which Pavlovian responses constrain flexible action learning in healthy development. However, strong Pavlovian influences on instrumental action are characteristic of many forms of psychopathology that typically emerge during adolescence^93,94, including substance abuse and anxiety disorders^95–99. Adolescence is a period of heightened plasticity in the neural circuits that govern motivated behavior^24,100–103, which may render adolescents’ expression of Pavlovian action biases particularly sensitive to experiential variation, such as stress exposure^103,104. Consistent with this notion, past studies in rodents have observed that adolescents’ tendency to imbue cues that predict reward or punishment with inherent motivational value (i.e., “sign-tracking”), is reduced relative to adult animals^105,106, but that this age-related pattern is reversed when adolescent animals are exposed to adverse rearing environments^107,108. This experiential sensitivity aligns with evidence in humans that Pavlovian biases on action learning are exacerbated following adolescent exposure to trauma¹⁰⁹. Further studies examining how life experience interacts with neural plasticity during adolescence to shape learning may shed light on mechanisms that promote resilience or susceptibility to psychopathology.

In this study, we characterized the developmental trajectory of reflexive Pavlovian constraints on flexible action learning. We found that the extent to which these valence-specific response tendencies interfered with instrumental action changed nonlinearly with age, showing a selective attenuation during adolescence. This decreased expression of hard-wired behavioral responses enables greater behavioral flexibility as adolescents learn to respond adaptively to opportunities and challenges in their environment. The influence of Pavlovian learning biases on fundamental behaviors that change over development such as exploration and planning, as well as their mechanistic role in multiple forms of psychopathology that typically emerge prior to adulthood, underscores the importance of a deeper understanding of how interactions between Pavlovian and instrumental learning systems contribute to adaptive and maladaptive behavior over the course of development.

Methods

Participants

Previous studies in adults using variants of this task found robust valence-action coupling in sample sizes of 20 participants total^20,21 or group differences with 20–30 participants per group^110,111. Thus, we targeted a sample size of 20 participants in each age bin, for a total of 60 participants. This is in line with sample sizes for previous studies that have examined developmental changes in evaluative processes^36,89,92. Participants were recruited from the New York City metropolitan area through flyers and outreach events. Participants were screened to exclude individuals with a diagnosis of a mood or anxiety disorder, a learning disability, current use of beta-blockers or psychoactive medications, or colorblindness. Sixty-two individuals participated in the study. Data from one child were excluded due to a technical error. Our final sample size was 61 participants, consisting of 20 children (8–12 years old, n = 10 female, mean age: 10.55, SD: 1.34), 20 adolescents (13–17 years old, n = 11 female, mean age: 15.38, SD: 1.45), and 21 adults (18–25 years old, n = 10 female, mean age: 21.28, SD: 2.19). The sample included 31 Caucasians (50.82%), 14 individuals of mixed race (22.95%), 12 Asians (19.67%), 3 African Americans (4.92%), and 1 Pacific Islander (1.64%). Eleven participants identified as Hispanic (18.03%). The study protocol was approved by New York University’s Institutional Review Board, the University Committee on Activities Involving Human Subjects. All research was performed in accordance with the relevant guidelines and regulations. All adult participants and parents of minors provided written informed consent, and minors provided assent prior to the study.

Task details

On each trial, participants saw the cue, which was one of the four robots (1,000 ms), followed by a fixation cross (250–3,500 ms), and then the target (described to participants as the robot’s “button”). When participants saw the robot’s “button”, they could decide whether to press the button via a keyboard press (“Go” response) or not press the button (“No-Go” response). Participants had 1,500 ms to respond. If they made a “Go” response, the border of the target would enlarge for the remainder of the 1,500 ms. Following the target, another fixation cross appeared on the screen (1,000 ms) prior to receiving probabilistic feedback (2,000 ms). During the inter-trial interval, a fixation cross was presented on the screen (750–1,500 ms).
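As a rough check on the timing described above, the following Python sketch adds up the fixed and jittered event durations to bound the length of a single trial. The event names are illustrative labels, not identifiers from the original Cogent 2000 task code, and the target is assumed to stay on screen for the full 1,500 ms response window.

```python
# Event durations in ms as (label, shortest, longest), taken from the task
# description. The labels are illustrative and not from the original code.
TRIAL_EVENTS = [
    ("cue", 1000, 1000),
    ("fixation_before_target", 250, 3500),
    ("target", 1500, 1500),  # assumed to last the full response window
    ("fixation_before_feedback", 1000, 1000),
    ("feedback", 2000, 2000),
    ("inter_trial_interval", 750, 1500),
]


def trial_duration_bounds(events=TRIAL_EVENTS):
    """Return the (shortest, longest) possible trial duration in ms."""
    shortest = sum(low for _, low, _ in events)
    longest = sum(high for _, _, high in events)
    return shortest, longest


shortest_ms, longest_ms = trial_duration_bounds()
print(f"Single trial: {shortest_ms} to {longest_ms} ms")
```

Under these assumptions a trial lasts between 6,500 and 10,500 ms, excluding any time spent on breaks.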

There were 45 trials for each of the four trial types, resulting in 180 trials total. The colors of the robots were randomized across participants. Stimuli were presented in pseudo-random order, ensuring that each robot appeared fifteen times in each of three blocks. After every block of 60 trials, participants were given a break. Prior to the task, participants practiced making “Go” responses by pressing the button and withholding their button press to make “No-Go” responses. Participants also practiced pressing a button and not pressing a button for each type of robot, experiencing that a given robot would give (or take) a ticket for one action and not give (or take) a ticket for the other action. The probabilistic nature of the reinforcements was learned through instruction and experience during the actual game, although no information about the reinforcement probabilities was provided. To encourage learning, participants were instructed that winning more tickets would result in more bonus money at the end of the study, though they were not informed of the specific relationship between tickets and money. In reality, all participants received a $5 bonus regardless of their performance on the task. This experiment was programmed using Cogent 2000, a MATLAB toolbox.
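The block structure described above can be sketched in Python as follows. This is a minimal illustration, not the original Cogent 2000 implementation: the robot labels and the function name are hypothetical, and a plain shuffle within each block is assumed because the text does not state any further ordering constraints.

```python
import random


def make_trial_order(n_blocks=3, reps_per_block=15,
                     robots=("robot_1", "robot_2", "robot_3", "robot_4"),
                     seed=None):
    """Build a blocked trial order in which every robot appears
    reps_per_block times per block (hypothetical helper)."""
    rng = random.Random(seed)
    blocks = []
    for _block in range(n_blocks):
        # 4 robots x 15 repetitions = 60 trials per block
        block = [robot for robot in robots for _rep in range(reps_per_block)]
        rng.shuffle(block)  # assumption: unconstrained shuffle within a block
        blocks.append(block)
    return blocks


blocks = make_trial_order(seed=0)
print(len(blocks), "blocks of", len(blocks[0]), "trials")
```

Shuffling within blocks rather than across all 180 trials is what guarantees fifteen presentations of each robot before every break.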

Statistical analysis

Age and age-squared were z-scored prior to inclusion in any linear regression. To calculate age-squared, age was first z-scored and then squared. All p-values reported within the manuscript reflect a two-tailed significance test unless otherwise indicated. Cohen’s ƒ² is a measure of effect size for regression analyses and can be used to calculate the local effect size of predictors in multiple regression. Small, medium, and large effect sizes are represented by ƒ² ≥ 0.02, ƒ² ≥ 0.15, and ƒ² ≥ 0.35, respectively¹¹².
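The predictor construction and effect-size measure described above can be illustrated with a short Python sketch. The ages below are made-up values, the function names are hypothetical, and the sample standard deviation (n − 1) is an assumption, since the text does not say which was used. The ƒ² function uses the standard local effect-size formula, ƒ² = (R²_full − R²_reduced) / (1 − R²_full).

```python
from statistics import mean, stdev


def zscore(values):
    """Standardize values using the sample SD (n - 1); an assumption."""
    m = mean(values)
    sd = stdev(values)
    return [(v - m) / sd for v in values]


def age_predictors(ages):
    """Return z-scored age and z-scored age-squared.

    Age is z-scored first, then squared, and the squared term is
    z-scored again before entering the regression.
    """
    age_z = zscore(ages)
    age_sq_z = zscore([z * z for z in age_z])
    return age_z, age_sq_z


def cohens_f2(r2_full, r2_reduced):
    """Local Cohen's f2 for the predictor(s) omitted from the reduced model."""
    return (r2_full - r2_reduced) / (1.0 - r2_full)


def effect_size_label(f2):
    """Map f2 onto the small / medium / large thresholds given in the text."""
    if f2 >= 0.35:
        return "large"
    if f2 >= 0.15:
        return "medium"
    if f2 >= 0.02:
        return "small"
    return "below small"


ages = [8.5, 10.0, 12.4, 14.1, 16.8, 19.3, 22.7, 25.0]  # made-up ages
age_z, age_sq_z = age_predictors(ages)
print(effect_size_label(cohens_f2(r2_full=0.30, r2_reduced=0.20)))
```

Squaring the z-scored age, rather than raw age, centers the quadratic term and reduces its correlation with the linear term.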

Computational models

Model fitting and parameter estimation were performed by maximum a posteriori estimation using the fmincon function in MATLAB 9.1.0. In addition to the best-fitting model, we also compared models that divided the reinforcement sensitivity term into separate reward sensitivity (ρ_rew) and punishment sensitivity (ρ_pun) parameters, capturing asymmetry in the effective size of positive and negative reinforcement. Whereas the best-fitting model included weights on the action estimate, in the simplest models the weighted action estimate (W_t(a_t,s_t)) was equivalent to the Q value estimate.
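To make the relationship between the weighted action estimate and the Q value concrete, the sketch below follows the formulation commonly used with this task in the adult literature^20,21. It is an assumption that the models compared here take exactly this form. All function and argument names are hypothetical, and the code is written in Python rather than the MATLAB used for the actual analyses.

```python
import math


def update_value(value, outcome, learning_rate, sensitivity):
    """Incremental update toward the sensitivity-scaled outcome."""
    return value + learning_rate * (sensitivity * outcome - value)


def choice_probabilities(q_go, q_nogo, state_value,
                         go_bias=0.0, pavlovian_bias=0.0, lapse=0.0):
    """Return (p_go, p_nogo) from weighted action estimates.

    Assumed form: W(go) = Q(go) + go_bias + pavlovian_bias * V(s) and
    W(no-go) = Q(no-go). With both biases at zero, W equals Q.
    """
    w_go = q_go + go_bias + pavlovian_bias * state_value
    w_nogo = q_nogo
    # Two-option softmax; subtract the maximum for numerical stability.
    m = max(w_go, w_nogo)
    exp_go = math.exp(w_go - m)
    exp_nogo = math.exp(w_nogo - m)
    p_go = exp_go / (exp_go + exp_nogo)
    # The lapse rate mixes in uniformly random responding.
    p_go = (1.0 - lapse) * p_go + lapse / 2.0
    return p_go, 1.0 - p_go


# A positive state value pushes responding toward "Go" even when the
# instrumental estimates for the two actions are equal.
print(choice_probabilities(q_go=0.0, q_nogo=0.0, state_value=1.0,
                           pavlovian_bias=0.5))
```

In this form, a positive Pavlovian bias raises the probability of “Go” for cues with positive expected value and lowers it for cues with negative expected value, which is the valence-action coupling the task is designed to measure.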

The parameters were constrained as follows: the lapse rate and learning rate between 0 and 1; the reinforcement sensitivity parameters (ρ, ρ_rew, ρ_pun) and the Pavlovian bias between 0 and ∞; the Go bias parameter was not constrained (−∞ to ∞). Priors were chosen to be minimally informative and were based on previous reinforcement learning studies¹¹³. A prior of Beta(1.1, 1.1) was employed for parameters constrained between 0 and 1. A prior of Normal(0, 1) was employed for parameters constrained between negative infinity and infinity. A prior of Gamma(2, 3) was employed for parameters constrained between 0 and infinity.
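The priors above enter a maximum a posteriori fit as an additive log-prior term. The Python sketch below shows one way to write that term using only the standard library. It assumes Gamma(2, 3) denotes shape 2 and scale 3, which the text does not specify, and the parameter names in the dictionary are hypothetical.

```python
import math


def beta_logpdf(x, a=1.1, b=1.1):
    """Log density of Beta(a, b) for 0 < x < 1."""
    log_norm = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return (a - 1) * math.log(x) + (b - 1) * math.log(1 - x) - log_norm


def normal_logpdf(x, mu=0.0, sigma=1.0):
    """Log density of Normal(mu, sigma)."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2 * math.pi)


def gamma_logpdf(x, shape=2.0, scale=3.0):
    """Log density of Gamma(shape, scale) for x > 0.
    The shape/scale reading of Gamma(2, 3) is an assumption."""
    return ((shape - 1) * math.log(x) - x / scale
            - math.lgamma(shape) - shape * math.log(scale))


def log_prior(params):
    """Sum the log priors over one participant's parameters."""
    return (beta_logpdf(params["lapse"])
            + beta_logpdf(params["learning_rate"])
            + gamma_logpdf(params["sensitivity"])
            + gamma_logpdf(params["pavlovian_bias"])
            + normal_logpdf(params["go_bias"]))


def negative_log_posterior(negative_log_likelihood, params):
    """Objective to minimize: negative log likelihood minus log prior."""
    return negative_log_likelihood - log_prior(params)


example = {"lapse": 0.05, "learning_rate": 0.3, "sensitivity": 4.0,
           "pavlovian_bias": 0.5, "go_bias": 0.2}
print(negative_log_posterior(100.0, example))
```

A bounded minimizer such as fmincon would then search for the parameter values that minimize this objective within the stated constraints.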

Supplementary information

Supplementary file 1 (236.9 KB, PDF)

Acknowledgements

This work was supported by a Jacobs Foundation Early Career Fellowship, a NARSAD Young Investigator Award, a Klingenstein-Simons Fellowship in Neuroscience, the National Science Foundation (CAREER Grant 1654393 to C.A.H. and Graduate Research Fellowship to H.A.R.), the NYU Vulnerable Brain Project, and the NYU High Performance Computing resources, service, and staff expertise. We are grateful to Shivani Hiralall for her assistance with participant recruitment and testing, Marc Guitart-Masip for sharing the original version of the task, and the families who participated in this study.

Author contributions

H.A.R. and C.A.H. designed the study. H.A.R. implemented the task, collected the data, and performed analyses under the supervision of C.A.H. H.A.R. and C.A.H. wrote the manuscript.

Data availability

Data are available on Open Science Framework: https://osf.io/4h6ne/.

Code availability

Code to reproduce all analyses in the manuscript can be found on Open Science Framework: https://osf.io/4h6ne/.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information is available for this paper at 10.1038/s41598-020-72628-w.

References

1. Pavlov IP. Conditional reflexes: an investigation of the physiological activity of the cerebral cortex. Oxford: Oxford University Press; 1927.
2. Hershberger WA. An approach through the looking-glass. Anim. Learn. Behav. 1986;14:443–451. doi: 10.3758/BF03200092.
3. Williams DR, Williams H. Auto-maintenance in the pigeon: sustained pecking despite contingent non-reinforcement. J. Exp. Anal. Behav. 1969;12:511–520. doi: 10.1901/jeab.1969.12-511.
4. Gray JA, McNaughton N. The neuropsychology of anxiety: an enquiry into the function of the septo-hippocampal system. Oxford: Oxford University Press; 2000.
5. Bolles RC. Species-specific defense reactions and avoidance learning. Psychol. Rev. 1970;77:32–48. doi: 10.1037/h0028589.
6. O’Doherty, J. P. Multiple systems for the motivational control of behavior and associated neural substrates in humans. In: Behavioral neuroscience of motivation (eds. Simpson, E. H. & Balsam, P. D.) 291–312 (Springer International Publishing, Berlin, 2016). doi: 10.1007/7854_2015_386.
7. Boureau Y-L, Dayan P. Opponency revisited: competition and cooperation between dopamine and serotonin. Neuropsychopharmacology. 2011;36:74–97. doi: 10.1038/npp.2010.151.
8. Breland K, Breland M. The misbehavior of organisms. Am. Psychol. 1961;16:681–684. doi: 10.1037/h0040090.
9. Dickinson A, Balleine B. Motivational control of goal-directed action. Anim. Learn. Behav. 1994;22:1–18. doi: 10.3758/BF03199951.
10. Estes WK. Discriminative conditioning. I. A discriminative property of conditioned anticipation. J. Exp. Psychol. 1943;32:150–155. doi: 10.1037/h0058316.
11. Lovibond PF. Facilitation of instrumental behavior by a Pavlovian appetitive conditioned stimulus. J. Exp. Psychol. Anim. Behav. Process. 1983;9:225. doi: 10.1037/0097-7403.9.3.225.
12. Rescorla R, Solomon R. Two-process learning theory: relationships between Pavlovian conditioning and instrumental learning. Psychol. Rev. 1967;74:151–182. doi: 10.1037/h0024475.
13. Choi J-S, Cain CK, LeDoux JE. The role of amygdala nuclei in the expression of auditory signaled two-way active avoidance in rats. Learn. Mem. 2010;17:139–147. doi: 10.1101/lm.1676610.
14. Galatzer-Levy IR, et al. Heterogeneity in signaled active avoidance learning: substantive and methodological relevance of diversity in instrumental defensive responses to threat cues. Front. Syst. Neurosci. 2014;8:179. doi: 10.3389/fnsys.2014.00179.
15. Stavnes K, Sprott RL. Effects of age and genotype on acquisition of an active avoidance response in mice. Dev. Psychobiol. 1975;8:437–445. doi: 10.1002/dev.420080508.
16. Holland PC. Differential effects of omission contingencies on various components of Pavlovian appetitive conditioned responding in rats. J. Exp. Psychol. Anim. Behav. Process. 1979;5:178–193. doi: 10.1037/0097-7403.5.2.178.
17. Bray S, Rangel A, Shimojo S, Balleine B, O’Doherty JP. The neural mechanisms underlying the influence of Pavlovian cues on human decision making. J. Neurosci. 2008;28:5861–5866. doi: 10.1523/JNEUROSCI.0897-08.2008.
18. Cavanagh JF, Eisenberg I, Guitart-Masip M, Huys Q, Frank MJ. Frontal theta overrides Pavlovian learning biases. J. Neurosci. 2013;33:8541–8548. doi: 10.1523/JNEUROSCI.5754-12.2013.
19. Crockett MJ, Clark L, Robbins TW. Reconciling the role of serotonin in behavioral inhibition and aversion: acute tryptophan depletion abolishes punishment-induced inhibition in humans. J. Neurosci. 2009;29:11993–11999. doi: 10.1523/JNEUROSCI.2513-09.2009.
20. Guitart-Masip M, et al. Action dominates valence in anticipatory representations in the human striatum and dopaminergic midbrain. J. Neurosci. 2011;31:7867–7875. doi: 10.1523/JNEUROSCI.6376-10.2011.
21. Guitart-Masip M, et al. Go and no-go learning in reward and punishment: interactions between affect and effect. Neuroimage. 2012;62:154–166. doi: 10.1016/j.neuroimage.2012.04.024.
22. Holland PC. Relations between Pavlovian-instrumental transfer and reinforcer devaluation. J. Exp. Psychol. Anim. Behav. Process. 2004;30:104–117. doi: 10.1037/0097-7403.30.2.104.
23. Talmi D, Seymour B, Dayan P, Dolan RJ. Human Pavlovian-instrumental transfer. J. Neurosci. 2008;28:360–368. doi: 10.1523/JNEUROSCI.4028-07.2008.
24. Hartley CA, Lee FS. Sensitive periods in affective development: nonlinear maturation of fear learning. Neuropsychopharmacology. 2015;40:50–60. doi: 10.1038/npp.2014.179.
25. Raab, H. A. & Hartley, C. A. The development of goal-directed decision-making. In: Goal-Directed Decision Making (eds. Morris, R., Bornstein, A. & Shenhav, A.) 279–308 (Academic Press, Cambridge, 2018). doi: 10.1016/B978-0-12-812098-9.00013-9.
26. Rovee-Collier CK, Gekoski MJ. The economics of infancy: a review of conjugate reinforcement. Adv. Child Dev. Behav. 1979;13:195–255. doi: 10.1016/S0065-2407(08)60348-1.
27. Shechner T, Hong M, Britton JC, Pine DS, Fox NA. Fear conditioning and extinction across development: evidence from human studies and animal models. Biol. Psychol. 2014;100:1–12. doi: 10.1016/j.biopsycho.2014.04.001.
28. Moutoussis M, et al. Change, stability, and instability in the Pavlovian guidance of behaviour from adolescence to young adulthood. PLOS Comput. Biol. 2018;14:e1006679. doi: 10.1371/journal.pcbi.1006679.
29. Bunge SA, Wright SB. Neurodevelopmental changes in working memory and cognitive control. Curr. Opin. Neurobiol. 2007;17:243–250. doi: 10.1016/j.conb.2007.02.005.
30. Diamond, A. The early development of executive functions. In: Lifespan Cognition: Mechanisms of Change (eds. Bialystok, E. & Craik, F.) 70–95 (Oxford University Press, Oxford, 2006).
31. Luna, B. Developmental changes in cognitive control through adolescence. In: Advances in Child Development and Behavior (ed. Bauer, P.) vol. 37 233–278 (JAI, Amsterdam, 2009).
32. Somerville LH, Casey B. Developmental neurobiology of cognitive control and motivational systems. Curr. Opin. Neurobiol. 2010;20:236–241. doi: 10.1016/j.conb.2010.01.006.
33. Guitart-Masip M, et al. Differential, but not opponent, effects of L-DOPA and citalopram on action learning with reward and punishment. Psychopharmacology. 2014;231:955–966. doi: 10.1007/s00213-013-3313-4.
34. Nook EC, Sasse SF, Lambert HK, McLaughlin KA, Somerville LH. The nonlinear development of emotion differentiation: granular emotional experience is low in adolescence. Psychol. Sci. 2018;29:1346–1357. doi: 10.1177/0956797618773357.
35. Rodman AM, Powers KE, Somerville LH. Development of self-protective biases in response to social evaluative feedback. Proc. Natl. Acad. Sci. 2017;114:13158–13163. doi: 10.1073/pnas.1712398114.
36. Somerville LH, et al. The medial prefrontal cortex and the emergence of self-conscious emotion in adolescence. Psychol. Sci. 2013;24:1554–1562. doi: 10.1177/0956797613475633.
37. Akaike H. A new look at the statistical model identification. IEEE Trans. Autom. Control. 1974;19:716–723. doi: 10.1109/TAC.1974.1100705.
38. Sutton, R. S. & Barto, A. G. Reinforcement Learning. (MIT Press, 1998).
39. Swart JC, et al. Frontal network dynamics reflect neurocomputational mechanisms for reducing maladaptive biases in motivated action. PLOS Biol. 2018;16:e2005979. doi: 10.1371/journal.pbio.2005979.
40. Swart JC, et al. Catecholaminergic challenge uncovers distinct Pavlovian and instrumental mechanisms of motivated (in)action. eLife. 2017;6:e22169. doi: 10.7554/eLife.22169.
41. Huys QJM, et al. Bonsai trees in your head: how the Pavlovian system sculpts goal-directed choices by pruning decision trees. PLOS Comput. Biol. 2012;8:e1002410. doi: 10.1371/journal.pcbi.1002410.
42. Huys QJM, et al. Interplay of approximate planning strategies. Proc. Natl. Acad. Sci. 2015;112:3098–3103. doi: 10.1073/pnas.1414219112.
43. Lally N, et al. The neural basis of aversive Pavlovian guidance during planning. J. Neurosci. 2017;37:10215–10229. doi: 10.1523/JNEUROSCI.0085-17.2017.
44. Denrell J, March JG. Adaptation as information restriction: the hot stove effect. Organ. Sci. 2001;12:523–538. doi: 10.1287/orsc.12.5.523.10092.
45. Rich AS, Gureckis TM. The limits of learning: exploration, generalization, and the development of learning traps. J. Exp. Psychol. Gen. 2018;147:1553. doi: 10.1037/xge0000466.
46. Casey B, Duhoux S, Cohen MM. Adolescence: what do transmission, transition, and translation have to do with it? Neuron. 2010;67:749–760. doi: 10.1016/j.neuron.2010.08.033.
47. Spear LP. The adolescent brain and age-related behavioral manifestations. Neurosci. Biobehav. Rev. 2000;24:417–463. doi: 10.1016/S0149-7634(00)00014-2.
48. Tymula A, et al. Adolescents’ risk-taking behavior is driven by tolerance to ambiguity. Proc. Natl. Acad. Sci. 2012;109:17135–17140. doi: 10.1073/pnas.1207144109.
49. Rosenbaum, G. M. & Hartley, C. A. Developmental perspectives on risky and impulsive choice. Philos. Trans. R. Soc. B Biol. Sci. 374, 20180133 (2019).
50. Steinberg L. A social neuroscience perspective on adolescent risk-taking. Dev. Rev. 2008;28:78–106. doi: 10.1016/j.dr.2007.08.002.
51. Bach DR, Dayan P. Algorithms for survival: a comparative perspective on emotions. Nat. Rev. Neurosci. 2017;18:311–319. doi: 10.1038/nrn.2017.35.
52. LeDoux J, Daw ND. Surviving threats: neural circuit and computational implications of a new taxonomy of defensive behaviour. Nat. Rev. Neurosci. 2018;19:269–282. doi: 10.1038/nrn.2018.22.
53. Huys QJM, Dayan P. A Bayesian formulation of behavioral control. Cognition. 2009;113:314–328. doi: 10.1016/j.cognition.2009.01.008.
54. Lloyd, K. & Dayan, P. Safety out of control: dopamine and defence. Behav. Brain Funct. 12, (2016).
55. Moscarello JM, Hartley CA. Agency and the calibration of motivated behavior. Trends Cogn. Sci. 2017;21:725–735. doi: 10.1016/j.tics.2017.06.008.
56. Rigoli F, Pezzulo G, Dolan RJ. Prospective and Pavlovian mechanisms in aversive behaviour. Cognition. 2016;146:415–425. doi: 10.1016/j.cognition.2015.10.017.
57. Dorfman HM, Gershman SJ. Controllability governs the balance between Pavlovian and instrumental action selection. Nat. Commun. 2019;10:1–8. doi: 10.1038/s41467-019-13737-7.
58. Bauer RH. Ontogeny of two-way avoidance in male and female rats. Dev. Psychobiol. 1978;11:103–116. doi: 10.1002/dev.420110203.
59. McCallum J, Kim JH, Richardson R. Impaired extinction retention in adolescent rats: effects of D-Cycloserine. Neuropsychopharmacology. 2010;35:2134–2142. doi: 10.1038/npp.2010.92.
60. Pattwell SS, et al. Altered fear learning across development in both mouse and human. Proc. Natl. Acad. Sci. 2012;109:16318–16323. doi: 10.1073/pnas.1206834109.
61. Barto, A. G. Adaptive critics and the basal ganglia. In: Models of information processing in the basal ganglia 215–232 (The MIT Press, Cambridge, 1995).
62. Barto AG, Sutton RS, Anderson CW. Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Trans. Syst. Man Cybern. 1983;13:834–846. doi: 10.1109/TSMC.1983.6313077.
63. Maia TV. Two-factor theory, the actor-critic model, and conditioned avoidance. Learn. Behav. 2010;38:50–67. doi: 10.3758/LB.38.1.50.
64. O’Doherty J, et al. Dissociable roles of ventral and dorsal striatum in instrumental conditioning. Science. 2004;304:452–454. doi: 10.1126/science.1094285.
65. Cardinal RN, Parkinson JA, Hall J, Everitt BJ. Emotion and motivation: the role of the amygdala, ventral striatum, and prefrontal cortex. Neurosci. Biobehav. Rev. 2002;26:321–352. doi: 10.1016/S0149-7634(02)00007-6.
66. Dayan P, Balleine BW. Reward, motivation, and reinforcement learning. Neuron. 2002;36:285–298. doi: 10.1016/S0896-6273(02)00963-7.
67. van den Bos W, Cohen MX, Kahnt T, Crone EA. Striatum-medial prefrontal cortex connectivity predicts developmental changes in reinforcement learning. Cereb. Cortex. 2012;22:1247–1255. doi: 10.1093/cercor/bhr198.
68. van Duijvenvoorde ACK, Achterberg M, Braams BR, Peters S, Crone EA. Testing a dual-systems model of adolescent brain development using resting-state connectivity analyses. NeuroImage. 2016;124:409–420. doi: 10.1016/j.neuroimage.2015.04.069.
69. Gogtay N, et al. Dynamic mapping of human cortical development during childhood through early adulthood. Proc. Natl. Acad. Sci. U. S. A. 2004;101:8174–8179. doi: 10.1073/pnas.0402680101.
70. Larsen B, Verstynen TD, Yeh F-C, Luna B. Developmental changes in the integration of affective and cognitive corticostriatal pathways are associated with reward-driven behavior. Cereb. Cortex. 2018;28:2834–2845. doi: 10.1093/cercor/bhx162.
71. Liston C, et al. Frontostriatal microstructure modulates efficient recruitment of cognitive control. Cereb. Cortex. 2006;16:553–560. doi: 10.1093/cercor/bhj003.
72. Mills KL, et al. Structural brain development between childhood and adulthood: convergence across four longitudinal samples. NeuroImage. 2016;141:273–281. doi: 10.1016/j.neuroimage.2016.07.044.
73. Raznahan A, et al. Longitudinal four-dimensional mapping of subcortical anatomy in human development. Proc. Natl. Acad. Sci. 2014;111:1592–1597. doi: 10.1073/pnas.1316911111.
74. Sowell ER, Thompson PM, Holmes CJ, Jernigan TL, Toga AW. In vivo evidence for post-adolescent brain maturation in frontal and striatal regions. Nat. Neurosci. 1999;2:859–861. doi: 10.1038/13154.
75. Hoops D, Flores C. Making dopamine connections in adolescence. Trends Neurosci. 2017;40:709–719. doi: 10.1016/j.tins.2017.09.004.
76. Wahlstrom D, White T, Luciana M. Neurobehavioral evidence for changes in dopamine system activity during adolescence. Neurosci. Biobehav. Rev. 2010;34:631–648. doi: 10.1016/j.neubiorev.2009.12.007.
77. Andersen SL, Rutstein M, Benzo JM, Hostetter JC, Teicher MH. Sex differences in dopamine receptor overproduction and elimination. NeuroReport. 1997;8:1495. doi: 10.1097/00001756-199704140-00034.
78. Andersen SL, Thompson AT, Rutstein M, Hostetter JC, Teicher MH. Dopamine receptor pruning in prefrontal cortex during the periadolescent period in rats. Synapse. 2000;37:167–169. doi: 10.1002/1098-2396(200008)37:2<167::AID-SYN11>3.0.CO;2-B.
79. Meng SZ, Ozawa Y, Itoh M, Takashima S. Developmental and age-related changes of dopamine transporter, and dopamine D1 and D2 receptors in human basal ganglia. Brain Res. 1999;843:136–144. doi: 10.1016/S0006-8993(99)01933-2.
80. Montague DM, Lawler CP, Mailman RB, Gilmore JH. Developmental regulation of the dopamine D1 receptor in human caudate and putamen. Neuropsychopharmacology. 1999;21:641–649. doi: 10.1016/S0893-133X(99)00062-7.
81. Teicher MH, Andersen SL, Hostetter JC. Evidence for dopamine receptor pruning between adolescence and adulthood in striatum but not nucleus accumbens. Dev. Brain Res. 1995;89:167–172. doi: 10.1016/0165-3806(95)00109-Q.
82. Hoops, D., Reynolds, L. M., Restrepo-Lozano, J.-M. & Flores, C. Dopamine development in the mouse orbital prefrontal cortex is protracted and sensitive to amphetamine in adolescence. eNeuro 5, (2018).
83. Reynolds LM, et al. DCC receptors drive prefrontal cortex maturation by determining dopamine axon targeting in adolescence. Biol. Psychiatry. 2018;83:181–192. doi: 10.1016/j.biopsych.2017.06.009.
84. Balleine BW, O’Doherty JP. Human and rodent homologies in action control: corticostriatal determinants of goal-directed and habitual action. Neuropsychopharmacology. 2010;35:48–69. doi: 10.1038/npp.2009.131.
85. Dickinson A. Actions and habits: the development of behavioural autonomy. Phil. Trans. R. Soc. Lond. B. 1985;308:67–78. doi: 10.1098/rstb.1985.0010.
86. Yin HH, Knowlton BJ. The role of the basal ganglia in habit formation. Nat. Rev. Neurosci. 2006;7:464–476. doi: 10.1038/nrn1919.
87. Sebold M, et al. Don’t think, just feel the music: individuals with strong Pavlovian-to-instrumental transfer effects rely less on model-based reinforcement learning. J. Cogn. Neurosci. 2016;28:985–995. doi: 10.1162/jocn_a_00945.
88. Wunderlich K, Smittenaar P, Dolan RJ. Dopamine enhances model-based over model-free choice behavior. Neuron. 2012;75:418–424. doi: 10.1016/j.neuron.2012.03.042.
89. Decker JH, Otto AR, Daw ND, Hartley CA. From creatures of habit to goal-directed learners: tracking the developmental emergence of model-based reinforcement learning. Psychol. Sci. 2016;27:848–858. doi: 10.1177/0956797616639301.
90. Kenward B, Folke S, Holmberg J, Johansson A, Gredebäck G. Goal directedness and decision making in infants. Dev. Psychol. 2009;45:809–819. doi: 10.1037/a0014076.
91. Klossek UMH, Russell J, Dickinson A. The control of instrumental action following outcome devaluation in young children aged between 1 and 4 years. J. Exp. Psychol. Gen. 2008;137:39–51. doi: 10.1037/0096-3445.137.1.39.
92. Potter TCS, Bryce NV, Hartley CA. Cognitive components underpinning the development of model-based learning. Dev. Cogn. Neurosci. 2017;25:272–280. doi: 10.1016/j.dcn.2016.10.005.
93. Kessler RC, et al. Lifetime prevalence and age-of-onset distributions of DSM-IV disorders in the National Comorbidity Survey Replication. Arch. Gen. Psychiatry. 2005;62:593–602. doi: 10.1001/archpsyc.62.6.593.
94. Paus T, Keshavan M, Giedd JN. Why do many psychiatric disorders emerge during adolescence? Nat. Rev. Neurosci. 2008;9:947–957. doi: 10.1038/nrn2513.
95. Carter BL, Tiffany ST. Meta-analysis of cue-reactivity in addiction research. Addiction. 1999;94:327–340. doi: 10.1046/j.1360-0443.1999.9433273.x.
96. Everitt BJ, Dickinson A, Robbins TW. The neuropsychological basis of addictive behaviour. Brain Res. Rev. 2001;36:129–138. doi: 10.1016/S0165-0173(01)00088-1.
97. Garbusow M, et al. Pavlovian-to-instrumental transfer effects in the nucleus accumbens relate to relapse in alcohol dependence. Addict. Biol. 2016;21:719–731. doi: 10.1111/adb.12243.
98. Mineka S, Oehlberg K. The relevance of recent developments in classical conditioning to understanding the etiology and maintenance of anxiety disorders. Acta Psychol. (Amst.) 2008;127:567–580. doi: 10.1016/j.actpsy.2007.11.007.
99. Mkrtchian A, Aylward J, Dayan P, Roiser JP, Robinson OJ. Modeling avoidance in mood and anxiety disorders using reinforcement learning. Biol. Psychiatry. 2017;82:532–539. doi: 10.1016/j.biopsych.2017.01.017.
100. Dahl, R. E. Adolescent brain development: a period of vulnerabilities and opportunities. Keynote Address. Ann. N. Y. Acad. Sci. 1021, 1–22 (2004).
101. Fuhrmann D, Knoll LJ, Blakemore S-J. Adolescence as a sensitive period of brain development. Trends Cogn. Sci. 2015;19:558–566. doi: 10.1016/j.tics.2015.07.008.
102. Meyer HC, Lee FS. Translating developmental neuroscience to understand risk for psychiatric disorders. Am. J. Psychiatry. 2019;176:179–185. doi: 10.1176/appi.ajp.2019.19010091.
103. Romeo RD, McEwen BS. Stress and the adolescent brain. Ann. N. Y. Acad. Sci. 2006;1094:202–214. doi: 10.1196/annals.1376.022.
104. Lupien SJ, McEwen BS, Gunnar MR, Heim C. Effects of stress throughout the lifespan on the brain, behaviour and cognition. Nat. Rev. Neurosci. 2009;10:434–445. doi: 10.1038/nrn2639.
105. Anderson RI, Spear LP. Autoshaping in adolescence enhances sign-tracking behavior in adulthood: impact on ethanol consumption. Pharmacol. Biochem. Behav. 2011;98:250–260. doi: 10.1016/j.pbb.2011.01.004.
106. Doremus-Fitzwater TL, Spear LP. Amphetamine-induced incentive sensitization of sign-tracking behavior in adolescent and adult female rats. Behav. Neurosci. 2011;125:661–667. doi: 10.1037/a0023763.
107. Anderson RI, Bush PC, Spear LP. Environmental manipulations alter age differences in attribution of incentive salience to reward-paired cues. Behav. Brain Res. 2013;257:83–89. doi: 10.1016/j.bbr.2013.09.021.
108. DeAngeli NE, Miller SB, Meyer HC, Bucci DJ. Increased sign-tracking behavior in adolescent rats. Dev. Psychobiol. 2017;59:840–847. doi: 10.1002/dev.21548.
109. Ousdal OT, et al. The impact of traumatic stress on Pavlovian biases. Psychol. Med. 2018;48:327–336. doi: 10.1017/S003329171700174X.
110. de Boer L, et al. Dorsal striatal dopamine D1 receptor availability predicts an instrumental bias in action learning. Proc. Natl. Acad. Sci. 2019;116:261–270. doi: 10.1073/pnas.1816704116.
111. Guitart-Masip M, et al. Action controls dopaminergic enhancement of reward representations. Proc. Natl. Acad. Sci. 2012;109:7511–7516. doi: 10.1073/pnas.1202229109.
112. Cohen J. A power primer. Psychol. Bull. 1992;112:155–159. doi: 10.1037/0033-2909.112.1.155.
113. Daw ND, Gershman SJ, Seymour B, Dayan P, Dolan RJ. Model-based influences on humans’ choices and striatal prediction errors. Neuron. 2011;69:1204–1215. doi: 10.1016/j.neuron.2011.02.027.

Associated Data

Supplementary Materials

Supplementary file 1 (PDF, 236.9 KB)

Data Availability Statement

Data are available on Open Science Framework: https://osf.io/4h6ne/.

Code to reproduce all analyses in the manuscript can be found on Open Science Framework: https://osf.io/4h6ne/.
