Significance
The physiological response evoked by short-lived stressful events, referred to as acute stress, impacts human decision-making. Past studies assume that stress causes people to fall back from more cognitive or deliberative modes of choice to more primitive or automatic modes of choice because stress impairs people’s capacity to process information (working memory). We directly examined how acute stress affects choice in a laboratory decision-making task for which the working memory demands of the two forms of decision-making are well understood, finding that stress impaired use of sophisticated choice strategies that require working memory but did not affect use of simpler, more primitive strategies. Further, this impairment was exacerbated in individuals with smaller working memory capacity, which is related to general intelligence.
Abstract
Accounts of decision-making have long posited the operation of separate, competing valuation systems in the control of choice behavior. Recent theoretical and experimental advances suggest that this classic distinction between habitual and goal-directed (or more generally, automatic and controlled) choice may arise from two computational strategies for reinforcement learning, called model-free and model-based learning. Popular neurocomputational accounts of reward processing emphasize the involvement of the dopaminergic system in model-free learning and prefrontal, central executive–dependent control systems in model-based choice. Here we hypothesized that the hypothalamic-pituitary-adrenal (HPA) axis stress response, believed to have detrimental effects on prefrontal cortex function, should selectively attenuate model-based contributions to behavior. To test this, we paired an acute stressor with a sequential decision-making task that affords distinguishing the relative contributions of the two learning strategies. We assessed baseline working-memory (WM) capacity and used salivary cortisol levels to measure HPA axis stress response. We found that stress response attenuates model-based, but not model-free, contributions to behavior. Moreover, stress-induced behavioral changes were modulated by individual WM capacity, such that low-WM-capacity individuals were more susceptible to detrimental stress effects than high-WM-capacity individuals. These results enrich existing accounts of the interplay between acute stress, working memory, and prefrontal function and suggest that executive function may be protective against the deleterious effects of acute stress.
A number of accounts of human and animal decision-making posit the coexistence of separate valuation systems that control choice (1–4), which, broadly speaking, represent automatic or habitual vs. deliberative or controlled modes. The circumstances under which one system may dominate over the other and thereby exert control over behavior have been a question of interest in both neuroscience and psychology, in part because of the implications of such differential control for disorders of compulsion such as drug abuse (5, 6). Acute stress may afford unique leverage in isolating the properties of these systems, because it is believed to prompt a shift from more cognitive or deliberative processes to more automatic processes presumed to be underpinned by phylogenetically older brain structures (7).
Accordingly, a spate of recent work suggests that acute stress—indexed by changes in levels of cortisol, a neuroendocrine marker of stress response—engenders reliance on putative habitual and/or automatic processes in human decision-making (8–13), consistent with the assumption that the physiological stress response impairs central executive functions subserving more deliberative choice. However, distinguishing such processes is both experimentally and theoretically fraught, because in dual process theories, which system controls a particular behavior is typically ambiguous, and can only be recognized by characteristics (such as reaction times or conscious access) associated in different theories with either sort of control, and often only in the comparison between different tasks that promote either mode. Here we leverage a more operational version of this distinction based on reinforcement learning (RL) theory (1), which proposes that deliberative and automatic modes of decision-making arise from two distinct computationally precise and neurobiologically grounded learning strategies for evaluating actions from previous experiences. This approach allows us to characterize more precisely and within a single task the impact of physiological stress response upon trial-by-trial learning dynamics of either sort.
This RL framework (1) posits that choice behavior arises from a combination of two value learning systems that operate in parallel and whose fundamental difference is whether they rely on an “internal model” of task contingencies for evaluating choices. The model-based system is computationally sophisticated and learns a model of the environment to plan candidate courses of action prospectively. In contrast, the model-free system eschews this model and merely prescribes that previously rewarded actions are repeated, akin to the Law of Effect and to prominent theories in which dopaminergic prediction error responses drive learning about action preferences at target areas such as the striatum (14, 15). Because these hypothesized modes of choice are defined quantitatively as arising from different trial-by-trial learning rules, they make clear and divergent predictions about subjects’ trial-by-trial adjustment of decision preferences in response to feedback, enabling the contributions of both approaches to be dissociated experimentally. In fact, many laboratory choice tasks cannot differentiate between the contributions of the learning strategies, because when each action is paired with a single reward, the two sorts of value learning reduce to the same learning rule. However, the strategies differ appreciably in sequentially structured choice tasks. Recent work, informed by this approach, reveals that under normal circumstances, human learning in such tasks exhibits contributions of both putative systems (16–18). The grounding of these theories in neurocomputational models (19) and work on animal learning (4) also provides a unique perspective on dual process architectures, complementary to a set of views whose roots lie more in human cognitive neuroscience.
In line with the considerable computational requirements of model-based evaluation (1, 20), and with evidence that this process relies on the prefrontal cortex (PFC, 4), recent work suggests that the model-based system imposes considerable demands on central executive resources. In particular, depletion of working-memory (WM) resources abolishes model-based contributions to learning behavior but spares model-free contributions (21). At the same time, a different line of work examining central executive function under acute stress reveals how neurophysiological stress response engenders WM capacity impairment (22, 23) and reduction of WM-related activity in the PFC as assessed by neuroimaging (24).
On the basis of these two lines of work, an intuitive prediction emerges: stress response—as it deleteriously impacts the PFC-dependent executive resources—should selectively reduce model-based learning, but simultaneously spare model-free learning. Closely supporting this prediction, previous investigations reveal that acute stress engenders reliance on habitual behaviors, at the expense of flexible, goal-directed responding. However, because the two forms of choice were differentiated by posttraining probe trials—testing flexible sensitivity to reinforcer devaluation (25) or to a conjunction of spatial cues (26)—it remains to be investigated how and whether stress affects either of the two sorts of trial-by-trial learning dynamics that have been hypothesized to give rise to the endpoint behaviors probed there (1).
A complementary possibility is suggested by findings that acute stress can increase firing rates of dopaminergic neurons (27) and extracellular dopamine levels in the neural structures putatively underpinning model-free RL (28). We might thus expect, alternatively or additionally, that stress would modulate or even strengthen model-free learning. There is indeed recent evidence for effects of stress on probabilistic reward learning (29, 30). However, the task used does not permit differentiating model-based from model-free contributions to learning.
Here we elucidate the impact of hypothalamic-pituitary-adrenal (HPA) axis stress response on the expression of model-based and model-free contributions to sequential choice behavior. In the RL task we use (16), model-based and model-free learning strategies (distinguished, respectively, by their utilization and ignorance of the full environment structure) give rise to distinct and quantifiable behavioral signatures. Our results reveal how the physiological stress response attenuates the influence of model-based (but not model-free) learning, underlining the distinct and separable contributions of these theorized valuation systems.
Further, in line with the central-executive–dependent nature of the model-based system, we shed light on how individual differences in WM capacity (often taken as a general measure of executive function and fluid intelligence, 31), modulate the effect of physiological stress response on model-based choice. Specifically, we demonstrate that subjects with more executive resources to spare find themselves less susceptible to the behavioral changes brought about by stress response, elucidating the interplay between acute stress, executive function, and dual-system accounts of decision-making.
Results
Subjects performed 200 trials of a two-step RL task (Fig. 1) (22, 29), designed to dissociate model-free and model-based learning strategies. In each two-stage trial, subjects made an initial first-stage choice between two options (depicted as fractals), which probabilistically leads to one of two second-stage states (colored green or blue). In each of these subsequent states, subjects made another choice between two options, which were associated with different probabilities of monetary reward. Choosing the left action at the first stage usually leads to the green state (70% of the time, a common transition) but sometimes leads to the blue state (30% of the time, a rare transition). Because the reward probabilities associated with second-stage choices drift over time according to independent random walks, subjects need to make trial-by-trial adjustments to their choices at both stages to effectively maximize payoffs.
Fig. 1.
State transition and reward structure in the two-step RL task. Each first-stage choice (black background) is predominantly associated with one or the other of the second-stage states (green and blue backgrounds) and leads there 70% of the time. These second-stage choices are probabilistically reinforced with money, whose reward probabilities change over the course of the experiment (see Results for a detailed explanation).
Model-based and model-free strategies make different predictions about how the history of rewards received at the second stage should influence first-stage choices, owing to the fact that the model-free approach evaluates actions retrospectively, by learning to repeat actions that tend to be rewarded, whereas model-based learning evaluates actions prospectively, in terms of a learned model of their likely consequences. For example, consider a first-stage choice that results in a rare transition to a second-stage state, where the subsequent second-stage choice is rewarded. Under a pure model-free strategy, by virtue of the reinforcement principle [or the temporal difference (TD) algorithm], one would have an increased chance of repeating the same first-stage response because it ultimately resulted in reward. In contrast, a model-based strategy, using a model of the task’s transition structure, predicts a decreased tendency to repeat the same first-stage action because the other first-stage action is the one that is more likely to lead to that rewarded second-stage state.
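The divergent predictions in this example can be made concrete with a small numerical sketch. Only the 70/30 transition structure comes from the task; the learning rate, the zero starting values, the use of a single value per second-stage state, and the direct crediting of reward to the first-stage action are simplifying assumptions made here for illustration, not the fitted model.

```python
# Illustrative sketch: how one rewarded rare transition shifts first-stage
# preferences under each strategy. alpha and the starting values are assumed.

P_COMMON = 0.7  # from the task: each first-stage action reaches "its" state 70% of the time

# Transition model: T[action][state] = probability of reaching that second-stage state
T = {
    "left": {"green": P_COMMON, "blue": 1 - P_COMMON},
    "right": {"green": 1 - P_COMMON, "blue": P_COMMON},
}

alpha = 0.5  # assumed learning rate

# One trial: choose "left", rare transition to "blue", second-stage choice rewarded
chosen, state, reward = "left", "blue", 1.0

# Model-free: credit the chosen first-stage action directly with the reward
q_mf = {"left": 0.0, "right": 0.0}
q_mf[chosen] += alpha * (reward - q_mf[chosen])

# Model-based: update the value of the visited state, then plan through T
v_state = {"green": 0.0, "blue": 0.0}
v_state[state] += alpha * (reward - v_state[state])
q_mb = {a: sum(T[a][s] * v_state[s] for s in v_state) for a in T}

print("model-free :", q_mf)   # "left" gains value, so repeating is favored
print("model-based:", q_mb)   # "right" gains more value, so switching is favored
```

Under these assumptions the model-free values favor repeating the left action, whereas the model-based values favor the right action, because the right action is the one that usually leads to the blue state where the reward was obtained.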
Accordingly, below we examine how stress alters the learning systems’ contributions by examining trial-by-trial adjustments in choices as subjects receive feedback. First, by formalizing each system’s learning (1) with a trial-by-trial mathematical model and fitting it to subjects’ choices, we measure how stress response affects the relative expression of the two learning systems. Next, we probe how stress impacts more qualitative signatures of each system, by examining trial-by-trial staying or switching in response to choice outcomes, for which the two accounts of learning predict different patterns.
Physiological and Subjective Response to Stress.
We manipulated stress levels by having subjects undergo either the cold pressor test (CPT) (32), an acute stress induction in which subjects submerged their arms in ice water for 3 min, or a control task using room temperature water. Baseline-subtracted cortisol concentrations over the four samples are plotted in Fig. 2 (raw data provided in Table S1). Critically, we found a significant interaction between condition (stress/control) and time of cortisol measurement (F = 19.99, P < 0.0001), indicating that the acute stressor induced a marked cortisol response. Moreover, within groups, cortisol concentrations did not change significantly between s3 and s4 (P > 0.54), suggesting cortisol concentrations remained steady throughout the RL task. Subjects in the stress condition rated the CPT as significantly more unpleasant [mean (M) = 6.68, SD = 0.54] than control subjects rated the control task (M = 2.19, SD = 0.38, t = 6.95, P < 0.001), indicating that the manipulation evoked a subjective stress response.
Fig. 2.
Cortisol was significantly elevated among stress subjects, relative to controls, at the two time points following administration of the CPT (s3 and s4). Error bars denote SEM.
Stress Response and Model-Based Behavioral Contributions.
We fit a dual-system RL model—a computational instantiation of the principles governing two hypothesized choice systems (16, 18)—to subjects’ trial-by-trial choices (SI Text). This model consists of a model-free system that updates estimates of choice values using TD learning and a model-based system that learns a transition and reward model of the task and uses these to compute choice values on the fly. The model includes parameters controlling the influence of each system in determining choice and a learning rate controlling the decay timescale over which past rewards are considered in the systems’ learning. We used hierarchical Bayesian model-fitting techniques (33) to estimate these parameters (Table S2).
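The structure of such a dual-system model can be sketched as follows. This is a minimal illustration of the general idea (a TD-style value update, model-based values computed from a transition model, and a softmax in which each system's values carry their own weight), not the authors' model, which is specified in SI Text. The names `beta_mf`, `beta_mb`, and `alpha`, and all numerical values, are hypothetical.

```python
import math

def td_update(q, reward, alpha):
    """Move a value estimate toward the observed reward (delta rule)."""
    return q + alpha * (reward - q)

def model_based_values(transition, q_stage2):
    """First-stage values computed from a transition model.

    transition[a][s]: probability that first-stage action a leads to state s
    q_stage2[s]: list of second-stage action values in state s
    """
    return [sum(p * max(q_stage2[s]) for s, p in enumerate(row))
            for row in transition]

def first_stage_choice_probs(q_mf, q_mb, beta_mf, beta_mb):
    """Softmax over separately weighted model-free and model-based values."""
    net = [beta_mf * f + beta_mb * b for f, b in zip(q_mf, q_mb)]
    m = max(net)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in net]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical values for illustration
transition = [[0.7, 0.3], [0.3, 0.7]]
q_stage2 = [[0.2, 0.4], [0.6, 0.1]]
q_mf = [0.5, 0.3]
q_mb = model_based_values(transition, q_stage2)

print(first_stage_choice_probs(q_mf, q_mb, beta_mf=3.0, beta_mb=0.0))
print(first_stage_choice_probs(q_mf, q_mb, beta_mf=0.0, beta_mb=3.0))
```

With these numbers a purely model-free chooser prefers the first action and a purely model-based chooser prefers the second, which is why the relative size of the two weights is informative about which system drives choice.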
Critically, how subjects adjust their trial-by-trial choice preferences in response to feedback reveals the extent to which they rely on either system. Table 1 reports the estimated parameters. Mirroring findings from previous work (16, 21, 34, 35) that both strategies influenced behavior, the model-free and model-based weight parameters, which quantify the weight given to model-free and model-based values in determining choice, respectively, were both significantly positive. We additionally estimated the extent to which each of these parameters changed with the stress response (quantified by cortisol delta; Materials and Methods). The parameter for the cortisol effect on model-based weight, which quantifies change in model-based contribution as a function of an individual subject’s stress response, was significantly negative, indicating that model-based contributions decreased as stress response increased (Fig. 3A). We further hypothesized that, as model-free choice does not impose the same requirements on central executive resources as model-based choice, stress should not impact the contribution of a model-free strategy. Indeed, the parameter for the cortisol effect on model-free weight was not significantly different from zero, indicating that cortisol delta (i.e., stress response) did not alter model-free contributions (Fig. 3B).
Table 1.
Medians and 95% CI boundaries for the four parameters of interest, relating stress and OSPAN to model-based and model-free contributions
Parameter | Median | Lower 95% | Upper 95% |
Model-based weight | 0.313 | 0.149 | 0.492 |
Model-free weight | 0.693 | 0.533 | 0.88 |
Cortisol effect on model-based weight | −0.2261 | −0.4011 | −0.0552 |
Cortisol effect on model-free weight | 0.0069 | −0.1693 | 0.1870 |
Cortisol × OSPAN effect on model-based weight | 0.4201 | 0.1341 | 0.7508 |
Cortisol × OSPAN effect on model-free weight | 0.0211 | −0.2291 | 0.2723 |
Fig. 3.
Effect of stress on model-based vs. model-free value weights, as determined by the computational model. (A) Individual subjects’ model-based value weights, plotted separately for subjects in the control and stress conditions. There was a significant negative effect of cortisol delta on expression of model-based learning, indicating cortisol change diminished its behavioral expression. (B) Model-free contribution to behavior. Note that there was no significant effect of cortisol change on expression of model-free choice, indicating that expression of model-free contribution is spared. Regression lines are computed from the population-level estimate of the log-linear effect of stress on model-based weight. Dashed gray lines indicate 2 SEs.
Individual Differences in WM Capacity.
We examined how individual WM capacity, operationalized by Operation Span (OSPAN; Materials and Methods), modulates the effect of cortisol response on model-based choice. The parameter for the cortisol × OSPAN effect on model-based weight (Table 1), which quantifies the change in model-based contribution as a function of the interaction between OSPAN and stress response, was significantly positive. The positive relationship indicates that subjects with lower WM capacities were more susceptible to the effect of cortisol delta on model-based choice contribution, whereas subjects with higher WM capacities were effectively shielded from this effect.
This relationship is visualized in Fig. 4, by dividing subjects into low and high WM capacities according to a median split: among subjects low in WM capacity, cortisol delta reduced the expression of model-based choice (Fig. 4A), but among subjects high in WM capacity, cortisol response did not produce an appreciable impact on model-based contributions to behavior (Fig. 4B). Furthermore, as the predicted locus of the OSPAN shielding effect is model-based (but not model-free) contributions, we found that OSPAN did not interact significantly with the relationship between cortisol response and previous reward, the marker for model-free learning (i.e., the cortisol × OSPAN effect on model-free weight was not significantly different from zero; Table 1).
Fig. 4.
Effect of stress on model-based learning as a function of individual WM capacity, as measured by OSPAN. Individual subjects’ model-based value weights are plotted for low OSPAN subjects (A) and high OSPAN subjects (B). Cortisol response markedly dampened expression of model-based choice in the low OSPAN subgroup but not in the high OSPAN subgroup. Regression lines are computed from the population-level estimate of the log-linear effect of stress on model-based weight. Dashed gray lines indicate 2 SEs.
Logistic Regression Analysis.
To more directly characterize the effect of stress and OSPAN on learning, we examined how the outcomes of each trial impact the next trial’s choice, an approach taken in previous work (16, 21, 32). This restricted analysis permits a more direct and qualitative examination of model-based and model-free contributions to trial-by-trial learning, because the two strategies make qualitatively distinct predictions about how the reward (rewarded vs. unrewarded) and transition type (common vs. rare) on the immediately preceding trial should influence first-stage choices (Fig. 5). A pure model-free strategy prescribes that the previous reward should influence whether a first-stage action is repeated, independent of the type of transition (common or rare) that preceded it. Thus, this algorithm predicts only a main effect of reward. In contrast, a model-based strategy predicts an interaction between the two factors because the effect of the previous reward on the first-stage choice depends on whether it followed a common or a rare transition. Note that, although both systems (in principle and empirically as estimated above) learn incrementally so that multiple preceding trials’ outcomes influence each choice, these qualitative effects of the single most recent trial still hold.
Fig. 5.
(A) A model-based choice strategy predicts that rewards after rare transitions should affect the value of the unchosen first-stage option, leading to a predicted interaction between the factors of reward and transition probability. (B) In contrast, a model-free strategy predicts that a first-stage choice resulting in reward is more likely to be repeated on the subsequent trial regardless of whether that reward occurred after a common or rare transition.
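The qualitative signatures described above can be summarized from raw choice data by tabulating how often the first-stage choice is repeated, split by the previous trial's reward and transition type. The sketch below is a descriptive illustration only, not the logistic regression reported in the paper; the function names and the toy data are hypothetical.

```python
from collections import defaultdict

def stay_probabilities(choices, rewards, transitions):
    """P(repeat first-stage choice) by previous reward and transition type.

    choices: first-stage choice on each trial (any hashable labels)
    rewards: 1 (rewarded) or 0 (unrewarded) on each trial
    transitions: "common" or "rare" on each trial
    """
    counts = defaultdict(lambda: [0, 0])  # (reward, transition) -> [stays, total]
    for t in range(1, len(choices)):
        key = (rewards[t - 1], transitions[t - 1])
        counts[key][0] += int(choices[t] == choices[t - 1])
        counts[key][1] += 1
    return {k: stays / total for k, (stays, total) in counts.items()}

def signature_indices(p):
    """Main effect of reward (model-free) and reward x transition interaction (model-based)."""
    main_effect = (p[(1, "common")] + p[(1, "rare")]
                   - p[(0, "common")] - p[(0, "rare")]) / 2
    interaction = ((p[(1, "common")] - p[(1, "rare")])
                   - (p[(0, "common")] - p[(0, "rare")]))
    return main_effect, interaction

# Toy sequence in which the chooser repeats after every reward, whatever the transition
choices = ["L", "L", "R", "R", "L"]
rewards = [1, 0, 1, 0, 1]
transitions = ["common", "common", "rare", "rare", "common"]

p_stay = stay_probabilities(choices, rewards, transitions)
print(signature_indices(p_stay))
```

In this toy sequence the main effect of reward is maximal and the interaction is zero, the pattern expected of a purely model-free chooser. A model-based chooser would instead show a positive interaction, staying after rewarded common transitions and switching after rewarded rare ones.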
The regression analysis confirmed the basic signatures of model-free and model-based strategies as described by the computational model, expressed as significant effects of both previous reward and the interaction between previous reward and transition type (P < 0.001; Table 2, Table S3, and Fig. S1). Moreover, the regression revealed that stress effectively attenuated model-based learning, expressed as the negative interaction between cortisol response, previous reward, and transition type (P < 0.05; Table 2), but not model-free learning, expressed as the interaction between cortisol response and previous reward (P > 0.5).
Table 2.
Logistic regression coefficients indicating the influence of cortisol response, outcome of previous trial, and transition type of previous trial, on response repetition
Coefficient | Estimate (SE) | P value |
(Intercept) | 1.84 (0.16) | <0.0001* |
Reward | 0.80 (0.09) | <0.0001* |
Transition | 0.10 (0.05) | 0.060 |
Cortisol delta | 0.06 (0.16) | 0.712 |
Reward × transition | 0.25 (0.06) | <0.0001* |
Cortisol delta × reward | 0.05 (0.09) | 0.554 |
Cortisol delta × transition | −0.07 (0.06) | 0.180 |
Cortisol delta × reward × transition | −0.14 (0.06) | 0.018* |
*Significance at the 0.05 level.
We also specified a model examining how cortisol delta and OSPAN interacted with the same trial-by-trial variables in the above analysis (Table S4). Critically, OSPAN significantly interacted with the three-way interaction between cortisol response, previous reward, and previous transition type (the interaction signifying cortisol response’s effect on model-based choice, P < 0.01). The positive coefficient indicates that subjects with lower WM capacities were more susceptible to stress’ effects on model-based learning, corroborating the computational model fits (Fig. S2).
Discussion
Although a recent body of work has sought to understand the impact of stress on decision-making through a dual-systems framework (10, 36), in the absence of a clear computational framework for valuation, it is difficult to determine the locus of the stress-induced breakdown. Recent work (16, 18) suggests that sequential choice results from two distinct learning strategies for determining choice value from previous experience. Moreover, although dual-process accounts in psychology emphasize the role of WM capacity in determining reliance on and behavioral expression of the two systems (37, 38), the dependence of the two hypothesized modes of choice on central executive resources is not well understood (39). Leveraging a contemporary RL-based framework (1) in which the behavioral contributions of model-based and model-free strategies are separately identifiable and their differential demands on the central executive resources have been characterized (21), we reveal how neurophysiological stress response diminishes the contribution of a computationally expensive, model-based choice strategy but leaves intact the contribution of the more parsimonious model-free valuation system. This approach yields a rich picture of acute stress’ impact on decision-making as it ties together lines of work examining stress response and PFC-dependent executive functions and dual-system theories of choice.
Perhaps more striking is that individual WM capacity, closely related to fluid intelligence and general cognitive ability (31), appears to protect decision-makers from the deleterious effects of stress response. That is, we found that cortisol reactivity hampers the expression of model-based choice in low, but not high, WM capacity individuals. This result dovetails with the notion of “cognitive reserve” in neuropsychology (40). On this view, individual differences in cognitive ability (often operationalized as IQ) allow some people to cope better than others with brain insult. It is conceivable, in the present study, that individuals with greater processing capacity (indexed here by OSPAN) were less burdened by the computational expense of model-based choice (20) and thus found their choices less severely impacted by HPA axis response. Indeed, such individual differences could elucidate the considerable heterogeneity found in stress-induced changes to decision-making (36): individuals with larger executive capacities could find their behavior less compromised by the HPA axis response.
The effects of acute stress on dopamine (28, 41) are the most obvious candidate for a mechanism by which stress might affect either model-based [via PFC (4)] or model-free [via striatum (19)] learning. Although the early sympathetic nervous system component of the stress response is known to result in rapid release of catecholamines in the PFC and other areas, and the resulting increase of dopamine (DA) levels is deleterious to PFC-dependent functions such as WM maintenance (7), our study focuses on the HPA axis stress response, for the simple, practical reason that the RL task takes time to administer. The release of glucocorticoids, indexed here by changes in cortisol levels, is observed to prolong this typically short-lived DA release in the PFC, among other regions (7, 41, 42). It is conceivable then that supraoptimal levels of DA (43) induced after the stressor and perpetuated by increases in cortisol release underlie the stress-induced deficits in central-executive–dependent, model-based behavior observed here.
In principle, a synergistic effect to weakening model-based learning might be strengthening model-free learning. Dopaminergic effects of stress might have been expected to produce such an effect as well, because increased striatal DA levels brought about by stress are hypothesized to increase overall sensitivity to reward (44). Although we found no such effect in our data, recent human probabilistic learning results may support this hypothesis (30). The probabilistic selection task used there does not formally dissociate model-free from model-based learning, but unlike our task, it does dissociate learning to choose vs. avoid—the locus of the reported stress effect.
Characterizing more precisely how neurophysiological stress response alters the expression of the two hypothesized valuation systems is of practical importance because acute stress is believed to facilitate drug-seeking and relapse (45). At the same time, prominent accounts of addiction (6) ascribe these compulsive behaviors to aberrant expression of the habitual system (instantiated here as the model-free system) at the expense of goal-directed action (instantiated here as the model-based system). The finding that HPA axis response selectively reduces model-based contributions to behavior dovetails neatly with these accounts: perhaps the drug-seeking and/or relapse engendered by acute stress can be explained in part by a breakdown of the prospective, model-based valuation system.
Although the breakdown of top-down and prefrontal-dependent functions (7) is assumed to underlie the deleterious effects of neurophysiological stress response on model-based choice, a resource-allocation explanation of these results merits speculation here. One influential proposal suggests that people adapt to stress by falling back on strategies with fewer cognitive demands and, in doing so, prevent the unreliable performance that would ensue from failure to carry out more resource-demanding strategies (46). A recent, more computational proposal (20) frames the arbitration between model-based vs. model-free RL as a tradeoff between time cost and behavioral flexibility, both of which are high in model-based but low in model-free RL. Were the neurophysiological stress response to promote internal time pressure, we would expect the effects observed here. However, whether people register, implicitly or explicitly, the temporal and cognitive costs of model-based choice warrants future research.
Materials and Methods
Participants.
Fifty-six healthy individuals from the New York University community participated in this experiment (30 women, age: M = 25.67 y; SD = 7.27 y) and were paid 5 cents per rewarded trial to incentivize performance. The proportions of women in control and stress conditions were 0.50 and 0.58, respectively. All participants provided written informed consent in accordance with procedures approved by the New York University Committee on Activities Involving Human Subjects. Following previous work (21), we identified and excluded participants who failed to demonstrate engagement with the choice task. Specifically, we excluded data of four participants who failed to meet a response deadline more than 15 times. We then removed participants who failed to demonstrate sensitivity to rewards in the decision task, as assessed from second-stage choices, excluding the data of four participants who repeated previously rewarded second-stage responses [i.e., P(stay|win)] at a rate less than 50%.
Cortisol Measurement.
To assess stress responses, saliva samples were collected throughout the task and used to measure cortisol concentrations. Samples were collected using an absorbent oral swab that participants placed under their tongues for 2 min. To control for diurnal rhythms in cortisol levels, all participants were run between the hours of 1:00 and 6:00 PM. Sample collection occurred at baseline after a 10-min acclimation period (s1), immediately after OSPAN measurement and task instructions (s2, ∼25 min after s1), 10 min after CPT administration (s3, ∼43 min after s1), and immediately following the RL task (s4, ∼64 min after s1). Cortisol responses to stress were expected to peak during the RL task (10 min after the stress manipulation) (32). Samples were frozen and preserved immediately after testing at −30 °C and were transported frozen to a Clinical Laboratory Improvement Amendments-certified analytical laboratory where cortisol concentrations were determined with high-sensitivity enzyme immunoassay kits (Salimetrics). Duplicate assays were conducted for each sample interval, and the average of the two values was used in our analyses. Because of the skewed nature of cortisol concentration distributions, these values were log-transformed in all statistical tests (29). For each subject, cortisol delta was calculated by subtracting the average of s1 and s2 (pre-CPT) from the average of s3 and s4 (post-CPT).
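The cortisol delta described above amounts to simple arithmetic, sketched here for clarity. The function names and example concentrations are hypothetical and are not data from the study. The sketch also leaves open on which scale (raw or log-transformed) the subtraction was performed and simply operates on whatever values are passed in.

```python
from statistics import mean

def sample_concentration(duplicate_assays):
    """Average the duplicate assay values for one sample interval."""
    return mean(duplicate_assays)

def cortisol_delta(samples):
    """Post-CPT mean (s3, s4) minus pre-CPT mean (s1, s2).

    samples: dict with keys "s1".."s4", each holding a pair of duplicate assay values.
    """
    conc = {name: sample_concentration(pair) for name, pair in samples.items()}
    pre = mean([conc["s1"], conc["s2"]])
    post = mean([conc["s3"], conc["s4"]])
    return post - pre

# Hypothetical values, for illustration only
example = {
    "s1": (0.20, 0.22),
    "s2": (0.18, 0.20),
    "s3": (0.40, 0.44),
    "s4": (0.38, 0.42),
}
delta = cortisol_delta(example)
print(round(delta, 3))
```

A positive delta indicates higher cortisol after the manipulation than before it.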
OSPAN Measurement.
To assess working memory capacity, we administered an automated version of the OSPAN procedure (47), which required participants to remember a series of letters while performing a series of arithmetic problems and which lasted ∼15 min. OSPAN scores were calculated by summing the number of letters selected for all correctly selected sets and ranged from 11 to 75 (M = 48.08, SD = 17.61).
Stress Induction.
In the stress condition, subjects were administered the CPT, described previously (33). Briefly, subjects in the stress condition were asked to immerse their right hand up to and including the wrist for 3 min in ice water (0–5 °C). Subjects in the control condition submerged their right hand up to and including the wrist for 3 min into room temperature water (21–30 °C). Immediately after, subjects indicated on a scale ranging from 0 (not at all) to 10 (very much) how unpleasant they found the immersion procedure.
RL Task.
Immediately after the OSPAN procedure, participants were given the task instructions and completed 10 practice trials to familiarize themselves with the task structure and response procedure. Note that up to this point, the control and stress groups had undergone identical procedures, and thus differences in choice behavior could not be attributed to the conditions under which task instructions were given. Following administration of the cold pressor test, and immediately after cortisol sample s3 was taken, participants completed 200 trials of the two-step RL task (Fig. 1A). In the first step, two fractal images appeared on a black background (indicating the initial state), and there was a 1.5-s response window in which participants could choose the left- or right-hand response using the Z or ? key, respectively. After a choice was made, the selected action was highlighted for the remainder of the response window. The background color then changed to reflect the second-stage state the participant had transitioned to, and the selected first-stage action moved to the top of the screen. Two fractal images, corresponding to the actions available in the second stage, were displayed, and participants again had 1.5 s to make a response. The selected action was highlighted for the remainder of the response window. Then, either a picture of a quarter (indicating that they had been rewarded on that trial) or the number zero (indicating that they had not been rewarded on that trial) was shown. The reward probabilities associated with second-stage actions were governed by independently drifting Gaussian random walks (SD = 0.025) with reflecting boundaries at 0.25 and 0.75. The mapping of actions to stimuli and transition probabilities was randomized across participants.
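The drifting reward probabilities can be sketched as follows. This is a minimal illustration, not the task code used in the study: the step SD (0.025), the boundaries (0.25 and 0.75), and the 200 trials come from the description above, whereas the function names, the uniform starting values, the seed, and the assumption of four second-stage actions (two actions in each of two second-stage states) are choices made for this sketch.

```python
import random

def reflect(p, lo=0.25, hi=0.75):
    """Fold a value that stepped outside [lo, hi] back into the interval."""
    while p < lo or p > hi:
        if p < lo:
            p = 2 * lo - p  # mirror about the lower boundary
        else:
            p = 2 * hi - p  # mirror about the upper boundary
    return p

def drift_rewards(n_trials=200, n_actions=4, sd=0.025, lo=0.25, hi=0.75, seed=0):
    """Return one list of reward probabilities per trial."""
    rng = random.Random(seed)
    # Assumption: starting probabilities are drawn uniformly within bounds.
    probs = [rng.uniform(lo, hi) for _ in range(n_actions)]
    history = []
    for _ in range(n_trials):
        history.append(list(probs))
        # Each action takes an independent Gaussian step, then is reflected.
        probs = [reflect(p + rng.gauss(0.0, sd), lo, hi) for p in probs]
    return history

history = drift_rewards()
print(len(history), [round(p, 3) for p in history[-1]])
```

Reflection, rather than clipping, prevents the probabilities from sticking at a boundary, so every action's payoff keeps changing and continued learning remains worthwhile throughout the task.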
Data Analysis.
Cortisol deltas were log(+1) transformed to remove positive skew and were, along with OSPAN scores, entered into the RL model and regressions as z-scores. Details of the model fitting procedure and the regression specification are provided in SI Text. We fit subjects’ choices using a full RL model that allows choices to be influenced by the entire preceding history of rewards. The model closely follows the hybrid model described in ref. 16. For each parameter estimate, we computed a 95% CI; if 0 falls outside this interval, we can reject, with 95% confidence, the null hypothesis that the true value is zero or more extreme.
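The covariate preprocessing and the interval check can be sketched as below. The sketch makes several assumptions that the text does not settle: it applies log(1 + x) directly to each delta (which requires every delta to exceed −1), it uses the sample SD for z-scoring, and all names and values are hypothetical. The RL model itself is specified in SI Text and is not reproduced here.

```python
import math
from statistics import mean, stdev

def zscore(values):
    """Standardize values to mean 0 and (sample) SD 1."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def log1p_zscore(deltas):
    """Apply log(1 + x), then z-score.

    Assumption: every delta is greater than -1, otherwise log1p is undefined.
    """
    return zscore([math.log1p(d) for d in deltas])

def excludes_zero(ci_low, ci_high):
    """True if a confidence interval lies entirely above or below zero."""
    return ci_low > 0 or ci_high < 0

# Made-up cortisol deltas, for illustration only.
z = log1p_zscore([0.0, 0.5, 1.2, 2.55, 3.1])
print([round(v, 2) for v in z])
print(excludes_zero(0.1, 0.5), excludes_zero(-0.1, 0.5))
```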
Acknowledgments
This work was supported by a National Institute of Mental Health Grant (1R01MH087882-01 to N.D.D.), a National Institutes of Health Grant R01 AG039283 (to E.A.P.), and an Award in Understanding Human Cognition from the McDonnell Foundation (to N.D.D.).
Footnotes
The authors declare no conflict of interest.
This article is a PNAS Direct Submission.
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1312011110/-/DCSupplemental.
References
- 1. Daw ND, Niv Y, Dayan P. Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control. Nat Neurosci. 2005;8(12):1704–1711. doi: 10.1038/nn1560.
- 2. Kahneman D, Frederick S. In: Heuristics and Biases: The Psychology of Intuitive Judgment. Gilovich T, Griffin D, Kahneman D, editors. Cambridge, UK: Cambridge Univ Press; 2002. pp. 49–81.
- 3. Sloman SA. The empirical case for two systems of reasoning. Psychol Bull. 1996;119(1):3–22.
- 4. Balleine BW, O’Doherty JP. Human and rodent homologies in action control: Corticostriatal determinants of goal-directed and habitual action. Neuropsychopharmacology. 2010;35(1):48–69. doi: 10.1038/npp.2009.131.
- 5. Redish AD, Jensen S, Johnson A. A unified framework for addiction: Vulnerabilities in the decision process. Behav Brain Sci. 2008;31(4):415–437. doi: 10.1017/S0140525X0800472X.
- 6. Everitt BJ, Robbins TW. Neural systems of reinforcement for drug addiction: From actions to habits to compulsion. Nat Neurosci. 2005;8(11):1481–1489. doi: 10.1038/nn1579.
- 7. Arnsten AFT. Stress signalling pathways that impair prefrontal cortex structure and function. Nat Rev Neurosci. 2009;10(6):410–422. doi: 10.1038/nrn2648.
- 8. Leder J, Häusser JA, Mojzisch A. Stress and strategic decision-making in the beauty contest game. Psychoneuroendocrinology. 2013;38(9):1503–1511. doi: 10.1016/j.psyneuen.2012.12.016.
- 9. Pabst S, Brand M, Wolf OT. Stress and decision making: A few minutes make all the difference. Behav Brain Res. 2013;250:39–45. doi: 10.1016/j.bbr.2013.04.046.
- 10. Porcelli AJ, Delgado MR. Acute stress modulates risk taking in financial decision making. Psychol Sci. 2009;20(3):278–283. doi: 10.1111/j.1467-9280.2009.02288.x.
- 11. Putman P, Antypa N, Crysovergi P, van der Does WAJ. Exogenous cortisol acutely influences motivated decision making in healthy young men. Psychopharmacology (Berl) 2010;208(2):257–263. doi: 10.1007/s00213-009-1725-y.
- 12. Schwabe L, Wolf OT. Stress-induced modulation of instrumental behavior: from goal-directed to habitual control of action. Behav Brain Res. 2011;219(2):321–328. doi: 10.1016/j.bbr.2010.12.038.
- 13. Starcke K, Wolf OT, Markowitsch HJ, Brand M. Anticipatory stress influences decision making under explicit risk conditions. Behav Neurosci. 2008;122(6):1352–1360. doi: 10.1037/a0013281.
- 14. Frank MJ, Seeberger LC, O’Reilly RC. By carrot or by stick: Cognitive reinforcement learning in parkinsonism. Science. 2004;306(5703):1940–1943. doi: 10.1126/science.1102941.
- 15. Montague PR, Dayan P, Sejnowski TJ. A framework for mesencephalic dopamine systems based on predictive Hebbian learning. J Neurosci. 1996;16(5):1936–1947. doi: 10.1523/JNEUROSCI.16-05-01936.1996.
- 16. Daw ND, Gershman SJ, Seymour B, Dayan P, Dolan RJ. Model-based influences on humans’ choices and striatal prediction errors. Neuron. 2011;69(6):1204–1215. doi: 10.1016/j.neuron.2011.02.027.
- 17. Gershman SJ, Markman AB, Otto AR. Retrospective revaluation in sequential decision making: A tale of two systems. J Exp Psychol Gen. 2012 doi: 10.1037/a0030844. in press.
- 18. Gläscher J, Daw N, Dayan P, O’Doherty JP. States versus rewards: Dissociable neural prediction error signals underlying model-based and model-free reinforcement learning. Neuron. 2010;66(4):585–595. doi: 10.1016/j.neuron.2010.04.016.
- 19. Schultz W, Dayan P, Montague PR. A neural substrate of prediction and reward. Science. 1997;275(5306):1593–1599. doi: 10.1126/science.275.5306.1593.
- 20. Keramati M, Dezfouli A, Piray P. Speed/accuracy trade-off between the habitual and the goal-directed processes. PLOS Comput Biol. 2011;7(5):e1002055. doi: 10.1371/journal.pcbi.1002055.
- 21. Otto AR, Gershman SJ, Markman AB, Daw ND. The curse of planning: Dissecting multiple reinforcement-learning systems by taxing the central executive. Psychol Sci. 2013;24(5):751–761. doi: 10.1177/0956797612463080.
- 22. Lupien SJ, Gillin CJ, Hauger RL. Working memory is more sensitive than declarative memory to the acute effects of corticosteroids: A dose-response study in humans. Behav Neurosci. 1999;113(3):420–430. doi: 10.1037//0735-7044.113.3.420.
- 23. Schoofs D, Wolf OT, Smeets T. Cold pressor stress impairs performance on working memory tasks requiring executive functions in healthy young men. Behav Neurosci. 2009;123(5):1066–1075. doi: 10.1037/a0016980.
- 24. Qin S, Hermans EJ, van Marle HJF, Luo J, Fernández G. Acute psychological stress reduces working memory-related activity in the dorsolateral prefrontal cortex. Biol Psychiatry. 2009;66(1):25–32. doi: 10.1016/j.biopsych.2009.03.006.
- 25. Schwabe L, Wolf OT. Stress prompts habit behavior in humans. J Neurosci. 2009;29(22):7191–7198. doi: 10.1523/JNEUROSCI.0979-09.2009.
- 26. Schwabe L, et al. Stress modulates the use of spatial versus stimulus-response learning strategies in humans. Learn Mem. 2007;14(1):109–116. doi: 10.1101/lm.435807.
- 27. Anstrom KK, Woodward DJ. Restraint increases dopaminergic burst firing in awake rats. Neuropsychopharmacology. 2005;30(10):1832–1840. doi: 10.1038/sj.npp.1300730.
- 28. Abercrombie ED, Keefe KA, DiFrischia DS, Zigmond MJ. Differential effect of stress on in vivo dopamine release in striatum, nucleus accumbens, and medial frontal cortex. J Neurochem. 1989;52(5):1655–1658. doi: 10.1111/j.1471-4159.1989.tb09224.x.
- 29. Petzold A, Plessow F, Goschke T, Kirschbaum C. Stress reduces use of negative feedback in a feedback-based learning task. Behav Neurosci. 2010;124(2):248–255. doi: 10.1037/a0018930.
- 30. Lighthall NR, Gorlick MA, Schoeke A, Frank MJ, Mather M. Stress modulates reinforcement learning in younger and older adults. Psychol Aging. 2013;28(1):35–46. doi: 10.1037/a0029823.
- 31. Conway ARA, Kane MJ, Engle RW. Working memory capacity and its relation to general intelligence. Trends Cogn Sci. 2003;7(12):547–552. doi: 10.1016/j.tics.2003.10.005.
- 32. McRae AL, et al. Stress reactivity: Biological and subjective responses to the cold pressor and Trier Social stressors. Hum Psychopharmacol. 2006;21(6):377–385. doi: 10.1002/hup.778.
- 33. Lee MD. How cognitive modeling can benefit from hierarchical Bayesian models. J Math Psychol. 2011;55(1):1–7.
- 34. Smittenaar P, Fitzgerald THB, Romei V, Wright ND, Dolan RJ. Disruption of dorsolateral prefrontal cortex decreases model-based in favor of model-free control in humans [published online ahead of print on October 22, 2013] Neuron. 2013 doi: 10.1016/j.neuron.2013.08.009.
- 35. Wunderlich K, Smittenaar P, Dolan RJ. Dopamine enhances model-based over model-free choice behavior. Neuron. 2012;75(3):418–424. doi: 10.1016/j.neuron.2012.03.042.
- 36. Starcke K, Brand M. Decision making under stress: A selective review. Neurosci Biobehav Rev. 2012;36(4):1228–1248. doi: 10.1016/j.neubiorev.2012.02.003.
- 37. Shamosh NA, et al. Individual differences in delay discounting: Relation to intelligence, working memory, and anterior prefrontal cortex. Psychol Sci. 2008;19(9):904–911. doi: 10.1111/j.1467-9280.2008.02175.x.
- 38. Bickel WK, Jarmolowicz DP, Mueller ET, Gatchalian KM, McClure SM. Are executive function and impulsivity antipodes? A conceptual reconstruction with special reference to addiction. Psychopharmacology (Berl) 2012;221(3):361–387. doi: 10.1007/s00213-012-2689-x.
- 39. Peters J, Büchel C. The neural mechanisms of inter-temporal decision-making: understanding variability. Trends Cogn Sci. 2011;15(5):227–239. doi: 10.1016/j.tics.2011.03.002.
- 40. Stern Y. Cognitive reserve. Neuropsychologia. 2009;47(10):2015–2028. doi: 10.1016/j.neuropsychologia.2009.03.004.
- 41. Butts KA, Weinberg J, Young AH, Phillips AG. Glucocorticoid receptors in the prefrontal cortex regulate stress-evoked dopamine efflux and aspects of executive function. Proc Natl Acad Sci USA. 2011;108(45):18459–18464. doi: 10.1073/pnas.1111746108.
- 42. Nagano-Saito A, et al. Stress-induced dopamine release in human medial prefrontal cortex-(18) F-Fallypride/PET study in healthy volunteers. Synapse. 2013;67(12):821–830. doi: 10.1002/syn.21700.
- 43. Cools R, D’Esposito M. Inverted-U-shaped dopamine actions on human working memory and cognitive control. Biol Psychiatry. 2011;69(12):e113–e125. doi: 10.1016/j.biopsych.2011.03.028.
- 44. Mather M, Lighthall NR. Both Risk and Reward are Processed Differently in Decisions Made Under Stress. Curr Dir Psychol Sci. 2012;21(2):36–41. doi: 10.1177/0963721411429452.
- 45. Sinha R. How does stress increase risk of drug abuse and relapse? Psychopharmacology (Berl) 2001;158(4):343–359. doi: 10.1007/s002130100917.
- 46. Steinhauser M, Maier M, Hübner R. Cognitive control under stress: How stress affects strategies of task-set reconfiguration. Psychol Sci. 2007;18(6):540–545. doi: 10.1111/j.1467-9280.2007.01935.x.
- 47. Unsworth N, Heitz RP, Schrock JC, Engle RW. An automated version of the operation span task. Behav Res Methods. 2005;37(3):498–505. doi: 10.3758/bf03192720.