Significance
Strategic behavior requires learning and decision processes to interact. The role of subcortical regions in these processes has been demonstrated in animal research but is less well known in humans. Here, we use advanced methods to study human subcortical contributions to choice and error-driven learning. We used joint brain–behavior models, simultaneously fit to neural and behavioral data, which improve interpretation and statistical power. Results demonstrate wide-ranging subcortical involvement in value processing, reward prediction errors, and urgency settings. This study paves the way for applying joint models in studying brain–behavior relations and for further refining our understanding of the human subcortex.
Keywords: linking propositions, error-driven learning, reinforcement learning evidence accumulation models (RL-EAMs), Bayesian hierarchical estimation
Abstract
Decision making and learning processes together enable adaptive strategic behavior. Animal studies demonstrated the importance of subcortical regions in these cognitive processes, but the human subcortical contributions remain poorly characterized. Here, we study choice and learning processes in the human subcortex, using a tailored ultra-high field 7T functional MRI protocol combined with joint models. Joint models provide unbiased estimates of brain–behavior relations by simultaneously including behavioral and neural data at the participant and group level. Results demonstrate relations between subcortical regions and the adjustment of decision urgency. Value-related blood-oxygenation level dependent (BOLD) differences were found with opposite BOLD polarity in different parts of the striatum. Multiple subcortical regions showed BOLD signatures of reward prediction error processing, but contrary to expectations, these did not include the dopaminergic midbrain. Combined, this study characterizes the human subcortical contributions to choice and learning, and demonstrates the feasibility and value of joint modeling in facilitating our understanding of brain–behavior relationships.
Decision making and instrumental learning continuously interact (1): error-driven learning processes refine and update the information on which value-based choices are made. In behavioral studies, recent advances have integrated insights from the traditionally separate fields of perceptual decision-making on the one hand, and error-driven learning on the other, into a singular framework (2–13). The combination of evidence accumulation to threshold (a core principle from decision-making research) and simple delta rules (a core principle in reinforcement learning) was shown to provide a precise characterization of behavior in instrumental learning tasks: it can explain response time distributions, choice accuracy, and the learning-related changes in response time distributions and choice accuracy.
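As a concrete illustration of the learning component, the simple delta rule referenced above can be sketched in a few lines (the 0/100-point reward magnitudes match the task described later, but the learning rate and trial count here are illustrative assumptions, not fitted values):

```python
import random

def delta_rule_update(q, reward, alpha=0.1):
    """Standard delta-rule update: move the value estimate q toward the
    obtained reward by a fraction alpha of the reward prediction error."""
    rpe = reward - q          # reward prediction error
    return q + alpha * rpe, rpe

# Illustrative: learn the value of a symbol that pays 100 points with p = 0.8.
random.seed(1)
q = 0.0
for _ in range(200):
    reward = 100.0 if random.random() < 0.8 else 0.0
    q, rpe = delta_rule_update(q, reward)
# q now fluctuates around the expected reward of 80 points
```

In the unified framework, these trial-by-trial Q-values then feed the accumulation rates of the decision process, which is what ties the two fields together.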
While providing a rich account of the algorithmic processes underlying choice and learning, cognitive models are agnostic about the neural implementation, which is our focus here. Both fields can lean on rich literatures on the relation between neural and behavioral data, albeit based largely on animal recordings. In decision making, the basal ganglia have long been implicated in action selection (14–18). Furthermore, key insights were obtained from recordings that demonstrated processes resembling evidence accumulation in a variety of brain regions including the basal ganglia (19–21), the superior colliculus (22–24), and cortical regions including parietal cortex (25–29), the frontal eye fields (30–34), and premotor and motor cortex (35–38). In parallel, studies in reinforcement learning have long focused on the role of the dopaminergic midbrain in calculating reward prediction errors, and on dopamine as a signal conveying reward prediction errors (e.g., refs. 39–43).
Thus, both fields suggest prominent involvement of subcortical regions. Unfortunately, in humans, the role of subcortical regions in decision making and learning is less well characterized (44). This is due to various factors that make imaging the subcortex particularly difficult. Many subcortical regions suffer from signal losses when conventional functional MRI (fMRI) methods are used. The underlying causes include the deep location of the subcortex, high iron concentrations, and the small sizes of individual regions (for an overview, see ref. 45). Because of these factors, the majority of human neuroimaging studies have focused on the neocortical sheet, combined with the larger subcortical regions including the striatum and thalamus (for a meta-analysis, see ref. 46). To achieve the signal quality necessary for investigating the typically small blood-oxygenation level dependent (BOLD) responses associated with cognitive functions in smaller regions, specialized MRI protocols designed at ultra-high field strengths of 7 T have been developed (47–49).
Signal quality is not the only factor to consider when discussing the challenges of studying the human subcortex. Statistical considerations form a second factor hampering the characterization of the role of the human subcortex in cognitive processes. Model-based analysis methods offer a principled advantage in terms of bridging the algorithmic and neural levels of analysis (50, 51). Traditional model-based MRI studies, however, rely on two-stage approaches, in which a cognitive model is first fit to behavioral data, and the resulting parameters are used as regressors in the analysis of the neural data. While straightforward to implement, two-stage approaches do not fully take into account the reciprocity in the relation between behavior and the brain. Using this approach, the neural model is informed by the cognitive data, but the cognitive model is not informed by the neural data. Furthermore, the measurement uncertainty in the parameters of the cognitive model is ignored. When unaccounted for, this source of noise causes negatively biased effect sizes, a phenomenon known as attenuation (52, 53). It also comes at the risk of overconfidence in the effects of covariates, since the uncertainty in the estimation of the covariate is ignored (52). This is especially detrimental when studying noisy data such as fMRI timeseries obtained from the human subcortex. Joint models, which simultaneously model both the neural and behavioral modalities of data, at all levels of the hierarchy (participant and group level), are required to remedy this issue and achieve full statistical power (50, 54–57).
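The attenuation phenomenon is easy to demonstrate with a small simulation (all numbers below are illustrative assumptions: a true brain–behavior correlation of 0.5 and unit-variance measurement noise on the two-stage point estimates):

```python
import random
import statistics

def corr(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sx, sy = statistics.stdev(x), statistics.stdev(y)
    n = len(x)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / ((n - 1) * sx * sy)

random.seed(0)
n = 5000
# True per-participant cognitive parameter, and a neural measure correlated r = 0.5 with it.
true_param = [random.gauss(0, 1) for _ in range(n)]
neural = [0.5 * t + random.gauss(0, (1 - 0.5 ** 2) ** 0.5) for t in true_param]
# Two-stage point estimates add measurement noise (here with unit variance),
# which shrinks the observed correlation by a factor sqrt(1/2) ~ 0.71.
noisy_est = [t + random.gauss(0, 1) for t in true_param]

r_true = corr(true_param, neural)        # close to 0.5
r_attenuated = corr(noisy_est, neural)   # close to 0.5 / sqrt(2) ~ 0.35
```

Joint models avoid this shrinkage because the parameter uncertainty is propagated into the correlation estimate rather than collapsed into a noisy point estimate.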
This study takes a joint modeling approach to studying decision-making processes and instrumental learning in the human subcortex. We bring together three contributions. First, we use a single task paradigm combined with a single cognitive model that unifies the study of decision making and reinforcement learning processes (6, 7), and allows for disentangling potential interactions between decision making and learning. In this task, participants are required to repeatedly make value-based choices between abstract symbols, and learn from the probabilistic reward associated with each symbol. Prior to each choice, participants are informed to emphasize either response speed or choice accuracy, thereby enforcing a change in choice strategy.
Second, we used an fMRI protocol tailored to meet the specific requirements for studying small subcortical nuclei at an ultra-high field of 7 T (47–49, 58–62). This protocol includes a short echo time to match the low T2* of iron-rich nuclei, small voxels to mitigate partial voluming effects, and a relatively high repetition time. Furthermore, we acquired multimodal quantitative anatomical MRI data, which enabled us to delineate individual subcortical nuclei with automated algorithms (63).
Finally, we analyzed brain–behavior relations in the resulting data using high-powered Bayesian joint modeling techniques, in which two reciprocal links between neural and behavioral data are included: reward prediction errors and value estimates of the reinforcement learning model are fed forward to the neural models within subjects, and simultaneously, across participants, interindividual correlations between neural and behavioral model parameters are estimated. The simultaneous estimation of the cognitive and neural models allows for all sources of uncertainty to be modeled accurately, which leads to unbiased estimates of the brain–behavior relations.
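The group-level structure that links the two modalities can be sketched, in the generative direction only, as a bivariate normal over participant-level parameters (the means, SDs, and correlation below are hypothetical; in the actual joint model this distribution is estimated from data, not sampled from):

```python
import math
import random

def sample_participant(mu_u, mu_bold, r, sd_u=1.0, sd_bold=1.0):
    """Draw one participant's (urgency effect, BOLD contrast) pair from a
    group-level bivariate normal with correlation r, via a hand-rolled
    2 x 2 Cholesky factorization."""
    z1, z2 = random.gauss(0, 1), random.gauss(0, 1)
    u = mu_u + sd_u * z1
    bold = mu_bold + sd_bold * (r * z1 + math.sqrt(1 - r * r) * z2)
    return u, bold

random.seed(3)
pairs = [sample_participant(0.5, 0.2, r=0.6) for _ in range(4000)]
us, bolds = zip(*pairs)

# Check that the generated sample recovers the specified correlation.
mean_u = sum(us) / len(us)
mean_b = sum(bolds) / len(bolds)
cov = sum((u - mean_u) * (b - mean_b) for u, b in pairs) / (len(pairs) - 1)
var_u = sum((u - mean_u) ** 2 for u in us) / (len(us) - 1)
var_b = sum((b - mean_b) ** 2 for b in bolds) / (len(bolds) - 1)
r_hat = cov / math.sqrt(var_u * var_b)   # recovers ~0.6
```

Estimating r jointly with the participant-level models is what allows the neural data to constrain the behavioral parameters and vice versa.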
Results
Thirty-seven participants performed an instrumental learning choice task (Fig. 1A) while undergoing 7 T BOLD-fMRI. They made repeated decisions between two abstract choice symbols, each followed by choice-dependent probabilistic rewards, which they used to inform subsequent choices. In total, each participant made 342 decisions. Prior to each decision, participants were instructed to emphasize either response speed or response accuracy. The behavioral data, consisting of response times and choices, were modeled with the reinforcement learning-advantage racing diffusion (RL-ARD) model (7). This model proposes that decisions are formed through an evidence-accumulation process, where the rate of accumulation depends on the sum of an urgency signal and the internal representations of the value of each choice option (Fig. 2A). The values of choice options are learned via a standard delta rule (64). The effects of the speed and accuracy instructions were modeled by allowing both the urgency and threshold parameters to vary with instructions, in line with previous work (7). Threshold refers to the overall amount of evidence that participants require to inform their decisions, whereas urgency refers to how participants become less patient as time within a trial passes. In previous work (7), we demonstrated favorable parameter recovery properties with this exact paradigm (see their figure 7 and figure supplement 3).
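A minimal generative sketch of the racing-diffusion component may clarify how urgency and learned Q-values jointly set the accumulation rates. All parameter values below are illustrative, the drift equation is simplified (the fitted RL-ARD includes additional terms), and Q-values are placed on a 0 to 1 scale for convenience:

```python
import random

def rl_ard_trial(q_left, q_right, v0=2.0, w=1.0, b=1.5, t0=0.2, dt=1e-3):
    """One trial of a simplified advantage racing diffusion: two accumulators
    race toward threshold b, and each drift combines an urgency term v0 with
    a weighted Q-value advantage. Returns (choice, response time)."""
    d_left = v0 + w * (q_left - q_right)
    d_right = v0 + w * (q_right - q_left)
    x_left = x_right = 0.0
    t = 0.0
    sqrt_dt = dt ** 0.5
    while x_left < b and x_right < b and t < 10.0:   # time cap as a safety net
        x_left += d_left * dt + sqrt_dt * random.gauss(0, 1)
        x_right += d_right * dt + sqrt_dt * random.gauss(0, 1)
        t += dt
    return ("left" if x_left >= x_right else "right"), t0 + t

# Illustrative: Q-values favoring the left option.
random.seed(7)
trials = [rl_ard_trial(0.8, 0.2) for _ in range(300)]
p_left = sum(c == "left" for c, _ in trials) / len(trials)   # well above chance
mean_rt = sum(rt for _, rt in trials) / len(trials)
```

Within this sketch, speed emphasis can be expressed as raising v0 (urgency) and/or lowering b (threshold), which is exactly the dissociation the RL-ARD estimates from the data.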
Fig. 1.
(A) Experimental paradigm. Each trial started with a fixation cross, followed by the SAT cue (“SPD” or “ACC”), another fixation cross, the stimuli representing choice options, another fixation cross, and feedback. Feedback depended both on the response time (in time or too slow) and on the outcome of the probabilistic gamble (0 or 100 points). Rewards were only given if the response was in time. Durations of the fixation crosses were jittered to decorrelate event timing. (B) Data (black) and model fit (green) of the RL-ARD model in the accuracy (Top) and speed (Bottom) condition. Left column depicts accuracy over trials across the run. To visualize the learning effects, all trials were binned into 10 bins (approximately 17 trials per bin), and summary statistics were calculated per bin. Middle and Right panel show 10th, 50th, and 90th RT percentiles for the correct (Middle) and error (Right) response over trial bins. Shaded areas correspond to the 95% credible interval of the model fit. (C) Effects of the difficulty (x-axis) and SAT manipulations (orange ACC, blue SPD) on mean RT (Top) and accuracy (Bottom). Points are data, error bars the 95% credible interval of the model. Difficulty is defined as the difference in pay-off probability between the two choice options (smaller is harder). (D) Estimated (posterior) learning rates for an RL-EAM with a single learning rate (black), and an RL-EAM with separate learning rates for SPD and ACC trials. (E) Effect of reward prediction error (RPE) size on subsequent RT. Individual shaded crosses are trials, lines indicate linear MEM predictions of the fixed effects of RPE size per previous trial’s cue type.
Fig. 2.
Overview of the joint modeling approach, visualizing the connections between the behavior model (A) and data (B), the joint model group-level mean (C) and correlation matrix (D), and the neural model (E) and data (F). The behavioral model (A) is informed by the RT and choice data (B; see Fig. 1B for detail on the visualization of the behavioral data). The trial-by-trial differences in Q-values and prediction errors are fed forward to the design matrix of the GLM (E). The GLM is informed by the neural data (F). Mutual constraint between the two modalities of data is enabled by the joint structure that uses a multivariate normal distribution at the group level. This is described by a group-level mean (C) and correlation matrix (D). All behavioral and neural parameters are estimated simultaneously on the group level and participant level. Brain–behavior associations of reward prediction errors and value differences are characterized with group-level means, while brain–behavior associations between SAT behavior and neural responses are estimated as interindividual correlations. The correlation matrix is divided into behavior–behavior correlations (blue rectangle), brain–brain correlations (green), and brain-behavior correlations (red). For visualization purposes, only a subset of the parameters are shown.
We used mixed effects models (MEMs) to confirm that the difficulty (defined as the difference in pay-off between the choice options) and speed–accuracy trade-off (SAT) manipulations had the intended effects on behavior. In the MEMs, fixed effects of difficulty, SAT, and their interaction were estimated, as well as random effects of difficulty and SAT. A linear MEM indicated a significant fixed effect of the SAT cues on RT (), but not of difficulty, nor an interaction. A generalized MEM demonstrated an interaction between SAT cue and difficulty on choice accuracy (), as well as a main effect of SAT cue (), with larger SAT effects on accuracy in the easy trials compared to the hard trials (Fig. 1C). Moreover, the RL-ARD provided a generally adequate account of the behavioral data, capturing the learning-dependent increase in accuracy, decrease in response time, and the differences in RT and choice accuracy between the speed-emphasized and accuracy-emphasized trials (Fig. 1 B and C). Note that there was some misfit in the RTs of the early trials, which replicates an earlier finding with the same paradigm and model (7).
To ensure that the SAT manipulation did not affect reward prediction error processing, we fit a second RL-ARD specification that allowed learning rates to differ between SPD and ACC trials. The estimated learning rates (Fig. 1D) show large overlap, and formal model comparisons suggested that an RL-ARD with a single learning rate provided a better trade-off between fit and model complexity (BPIC difference 66 in favor of the simpler model; see SI Appendix, Table S4 for participant-wise BPIC values of both models). In SI Appendix, we report a simulation study demonstrating that our sample size, trial numbers, and fitting methods would favor a two-learning-rate model if the true learning rate difference were 0.05 or larger. We also tested whether there was any between-cue difference in the effect of reward prediction error on the subsequent trial’s RT (i.e., post reward prediction error slowing). A linear MEM showed evidence for a main effect of RPE on subsequent RT (), as well as a main effect of the previous trial’s cue (), but no interaction between RPEs and SAT condition (; Fig. 1E). Combined, the behavioral data and RL-ARDs suggest that the manipulations had the intended effects and that the RL-ARD with a single learning rate provided a sufficient account of the behavioral data.
In a separate session, participants underwent high-resolution quantitative MRI scans that allowed us to derive multimodal anatomical data (T1 maps, T2* maps, and quantitative susceptibility maps), which were used to delineate 17 subcortical regions of interest using the multicontrast anatomical subcortical structure parcellation (MASSP) algorithm at the individual level (63). The masks of the gray matter structures—the amygdala (Amg), claustrum (Cl), globus pallidus interna (GPi) and externa (GPe), periaqueductal gray (PAG), pedunculopontine nucleus (PPN), red nucleus (RN), substantia nigra (SN), subthalamic nucleus (STN), striatum (Str), thalamus (Tha), and ventral tegmental area (VTA)—were subsequently used to extract timecourses of the signal from the fMRI data. Fig. 4A provides an overview of these ROIs.
Fig. 4.

Joint model fit to the MASSP ROIs. (A) Illustration of the ROIs, viewed from the front-left (Top) and bottom (Bottom). (B) Group-level correlation matrix, which is split into behavior–behavior relations (outlined by a blue rectangle), brain–brain relations (red), and brain–behavior relations (green). Only credible correlations are shown; noncredible correlations are displayed as empty squares. Relations are considered credible when the 95% credible interval of the correlation coefficient does not cover 0. All parameters are related to the SAT contrast: its effect on urgency (u), threshold (B), and the BOLD contrast in the ROIs. (C and D) Group-level estimates of within-participant brain–behavior relations of value learning and reward prediction errors. Barplots show the percentage signal change per unit change in value difference (C) and reward prediction errors (D), for each region of interest. Green and orange bars depict the left and right hemisphere, respectively. Error bars indicate 95% credible intervals.
These neural fMRI timecourses were modeled with a general linear model (GLM; Fig. 2E) which, next to a set of nuisance regressors (Materials and Methods), included cues (speed and accuracy), stimulus value differences, and reward prediction errors as regressors of interest. The latter two regressors were derived from the RL-ARD model and vary across trials within participants. We estimated their mean effect on the group level (Fig. 2C). We also estimated the correlations between the speed–accuracy contrasts in the neural models (one per region of interest) and the speed–accuracy differences in the urgency and threshold parameters as derived from the RL-ARD (Fig. 2D). Combined, this resulted in three brain–behavior relations per region of interest that were jointly informed and reciprocally constrained by the two modalities of data.
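The construction of such a parametrically modulated regressor can be sketched as follows. The TR, event onsets, and modulator values below are hypothetical, and the double-gamma HRF uses conventional default parameters rather than anything fit to these data:

```python
import math

def hrf(t):
    """Double-gamma HRF (conventional default shape, not fit to these data)."""
    if t < 0:
        return 0.0
    g = lambda x, a: x ** (a - 1) * math.exp(-x) / math.gamma(a)
    return g(t, 6) - g(t, 16) / 6

def modulated_regressor(onsets, modulators, n_scans, tr=2.0):
    """One GLM column: impulses at event onsets, scaled by the mean-centered
    trial-wise modulator (e.g., the RPE on that trial), convolved with the HRF."""
    centered = [m - sum(modulators) / len(modulators) for m in modulators]
    column = [0.0] * n_scans
    for onset, mod in zip(onsets, centered):
        for scan in range(n_scans):
            t = scan * tr - onset
            if 0.0 <= t <= 32.0:          # HRF support is ~32 s
                column[scan] += mod * hrf(t)
    return column

# Hypothetical example: three feedback events with different prediction errors.
col = modulated_regressor(onsets=[10.0, 24.0, 40.0],
                          modulators=[30.0, -10.0, 5.0], n_scans=60)
```

In the joint model, the modulator values in this design matrix are not fixed in advance but are regenerated from the current RL-ARD parameters on every sampling step, which is how the trial-by-trial link between the behavioral and neural models is maintained.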
The resulting joint model is visualized in Fig. 4. Fig. 4B shows the interindividual correlations between strategic adjustments in choice behavior (urgency and threshold) and the BOLD responses in the subcortical regions (SI Appendix, Table S1). Although the thresholds were overall higher in the ACC condition than in the SPD condition, the joint models revealed across-participant correlations between urgency and neural responses bilaterally in the Str and VTA, in the left Cl, and in the right RN and Tha. Next, we turned to brain–behavior relations of value learning. The PPN and SN, as well as the left PAG, showed relations with value differences (Fig. 4C). The joint model further indicated reward prediction error processing in the Amg, Cl, GPe, and Str (Fig. 4D). Interestingly, we found no evidence for involvement of the VTA or SN in reward prediction error coding; if anything, results indicated a negative association between reward prediction errors and neural activity in the right SN.
To investigate this further, we fit another joint model that used an RL-ARD with two separate learning rates for SPD and ACC trials. We reasoned that, although the behavioral evidence indicated no evidence for separate learning rates, the neural data might be more sensitive to such a difference. Hence, in the GLMs, we estimated separate parameters for the modulatory effect of RPEs on BOLD responses for the SPD and ACC trials. In SI Appendix, Fig. S3, we show the effect of RPE on BOLD responses in the MASSP ROIs, which lead to the same overall conclusions as the joint model that assumed no difference between SPD and ACC trials in learning rates or RPE processing.
The results so far indicated involvement of the Tha (as a single region covering all nuclei) in the SAT. In a second joint model, we zoomed in on the individual thalamic nuclei using a thalamus atlas (65). Here, we focused only on regions larger than 150 mm³ in both hemispheres: the anteroventral (AV), centromedian (CM), lateral posterior (LP), mediodorsal (MD), pulvinar, ventral anterior (VA), ventral lateral (VL), and the ventral posterolateral (VPL) nucleus. In the atlas, the MD is split into a lateral and a medial part (MDl, MDm), the pulvinar into an anterior, inferior, lateral, and medial part (PuA, PuI, PuL, PuM), and the VL into an anterior and posterior part (VLa, VLp). Fig. 5A illustrates the ROIs that were included. The joint model based on thalamic nuclei highlighted that brain–behavior correlations with SAT settings were found bilaterally within the AV, CM, MDm, PuA, and PuM, as well as in the right LP, VLa, and VLp (Fig. 5B; see SI Appendix, Table S2). Again, these correlations are with urgency, and appear to dominate in the right hemisphere. In the thalamic regions, we found evidence for a relation with value difference only in the right VPL (Fig. 5C). Evidence for reward prediction error processing was found in the CM, PuI, and VPL (Fig. 5D).
Fig. 5.

Joint model fit to the thalamus ROIs. (A) Illustration of the ROIs, viewed from the front-left (Top) and bottom (Bottom). Meshes were generated by first warping all individual-level delineations to MNI-space, and subsequently running the marching cube algorithm on the across-participant mean in MNI-space. For comparison, the MASSP delineation of the thalamus is illustrated in transparent white. (B) Group-level correlation matrix, which is split into behavior–behavior relations (outlined by a blue rectangle), brain–brain relations (red), and brain–behavior relations (green). Subregions belonging to the same nuclei are clustered along the diagonal with black squares. Only credible correlations are shown; noncredible correlations are displayed as empty squares. Relations are considered credible when the 95% credible interval of the correlation coefficient does not cover 0. All parameters are related to the SAT contrast: its effect on urgency (u), threshold (B), and the BOLD contrast in the ROIs. (C and D) Group-level estimates of within-participant brain–behavior relations of value learning and reward prediction errors. Barplots show the percentage signal change per unit change in value difference (C) and reward prediction errors (D), for each region of interest. Green and orange bars depict the left and right hemisphere, respectively. Error bars indicate 95% credible intervals.
In a third and final joint model, we zoomed in on the striatum. Unlike the thalamus, the human striatum is a relatively homogeneous structure, without clear internal cytoarchitectural or immunohistochemical boundaries between the dorsal and ventral striatum (e.g., ref. 66). However, it has long been argued to be functionally specialized in multiple zones (e.g., ref. 67), with distinct afferent projections (66, 68). Here, we used the recently developed second iteration of MASSP (69) to delineate the striatum into three separate parts: the nucleus accumbens (nAcc), putamen (Pu), and caudate (Cau) (Fig. 6A). Note that the nAcc in MASSP was delineated using a perpendicular line at the base of the internal capsule, which may result in the inclusion of an area that is not fully restricted to the nAcc. This approximation is required, since the true border of the nAcc can only be visualized using post mortem histology. The joint model fit to the timeseries of these subregions is shown in Fig. 6 B–D (see SI Appendix, Table S3 for numerical estimates). The brain–behavior association relating to SAT settings was strongest in the dorsal striatum (Pu and Cau), and only credible in the right (but not left) nAcc. As expected, reward prediction error processing was clearest in the nAcc, but also detectable in both the Pu and the Cau (Fig. 6D). A positive association between the size of the BOLD responses and the size of value differences was found in the Pu, and interestingly, a negative association in the Cau (and no association in the nAcc) (Fig. 6C).
Fig. 6.
Joint model fit to the striatum ROIs. (A) Illustration of the ROIs, viewed from the front-left (Top) and bottom (Bottom). (B) Group-level correlation matrix, which is split into behavior–behavior relations (outlined by a blue rectangle), brain–brain relations (red), and brain–behavior relations (green). Only credible correlations are shown; noncredible correlations are displayed as empty squares. Relations are considered credible when the 95% credible interval of the correlation coefficient does not cover 0. All parameters are related to the SAT contrast: its effect on urgency (u), threshold (B), and the BOLD contrast in the ROIs. (C and D) Group-level estimates of within-participant brain–behavior relations of value learning and reward prediction errors. Barplots show the percentage signal change per unit change in value difference (C) and reward prediction errors (D), for each region of interest. Green and orange bars depict the left and right hemisphere, respectively. Error bars indicate 95% credible intervals.
Finally, we confirmed empirically that joint models provide more statistical power than a two-stage approach. To implement the two-stage approach, we first estimated the behavioral model. Based on the median of the posterior parameters and the experimental paradigm, we generated trial-by-trial stimulus and reward prediction error values, which were used to generate design matrices for the neural GLMs. We then estimated the neural GLMs as well. In a second stage, we fit a multivariate Gaussian distribution on the subject-level median behavioral and neural parameters, using a Bayesian estimation routine. This way, we still estimate a distribution of correlation coefficients, but not jointly with the neural and behavioral models. Fig. 3 compares the two-stage brain–behavior correlation distributions with the joint model correlation distributions for the five ROIs with the largest correlations in Fig. 4. This demonstrates clearly attenuated effect sizes in the two-stage approach, with median correlation coefficients approximately 20 to 40% smaller than those of the joint model.
Fig. 3.
Comparison of two-stage (TS, red) posterior correlation coefficients (between the behavioral and neural SAT effects) with joint (black) posterior correlation coefficients for five MASSP ROIs. Vertical dotted lines indicate correlations of 0. Correlation coefficients in the legend indicate the median of the distributions.
Discussion
In this study, we used joint models to characterize the brain–behavior relations between subcortical regions and decision-making and learning processes. With tailored methods, including ultra-high field 7 T fMRI, decision making and instrumental learning were jointly studied in a single paradigm and corresponding cognitive model, in a Bayesian hierarchical joint modeling framework in which brain–behavior relationships were reciprocally informed by all modalities of data. The resulting joint models revealed that the Str (and particularly the dorsal Str) was involved in choice strategy settings; however, contrary to previous reports, they demonstrated a relation with urgency, rather than response caution. Next, they revealed value-related processing, but not reward prediction error processing, in the substantia nigra. Finally, within the Str, value-related processing was demonstrated to show BOLD responses with opposite polarities in the caudate and putamen.
Our results indicate that subcortical regions may contribute to strategic control of choice behavior through urgency settings, rather than through the response caution settings that have been argued previously (70, 71). At the group level, thresholds were higher in ACC trials compared to SPD trials, as is commonly found. However, the effect of the manipulation on urgency, not threshold, covaried with neural signals. In part, this may arise from the use of the RL-ARD, which is able to dissociate urgency from response caution adjustments, which themselves correlate (e.g., Fig. 4). The implication of urgency adjustments corroborates earlier studies based on neural recordings in the basal ganglia of monkeys (19, 72), as well as fMRI evidence using an expanded judgment task (73). The dominance of the right hemisphere in these relations is consistent with previous studies (70, 71, 73), and may be related to the right-lateralized response inhibition networks (74–76).
While our model-based approach is able to dissociate between urgency and threshold, the concept of urgency itself is not singular, as multiple cognitive processes may contribute to or correlate with urgency signals. Understanding these processes may help explain why we found urgency-related signals in so many different regions. For one, urgency is known to be related to arousal (77). In SI Appendix, Fig. S4, we tested whether the SPD cues had a different effect compared to ACC cues on heart rate variability and respiratory volume per time (as potential correlates of arousal), but found no evidence for any difference. However, subtle arousal-related differences could have remained undetected. Future studies could include pupillometry (78) to test whether the identified urgency signals reflect pupil-linked arousal in relevant subcortical areas. For example, the CM plays an important role in modulating arousal (79) (and covaries with reward prediction errors; refs. 80 and 81). Second, the MD has been implicated in various types of memory processing, including object–reward association memory (e.g., refs. 82–86). The role of the MD may be to prepare the memory processes required for the subsequent value-based decision, and such preparations could start earlier under speed stress. Some evidence also suggests a role for the AV in modulating cortical plasticity and memory formation (87). The involvement of the RN and VTA in urgency has, to our knowledge, not been demonstrated before, but may be related to earlier studies that demonstrated these regions’ involvement in conflict resolution, potentially elicited here by the conflicting instructions of the speed and accuracy requirements (60, 88). Third, urgency may also engage attentional processes. In earlier behavioral work, we tested for effects of SAT cues on attention in this paradigm (7), but model comparisons preferred models without attention effects.
It might be that the effects of attentional processes on behavior were too subtle to be picked up, but that their effects on the present neural data are more marked. Fourth, it has recently been proposed that people’s decision processes in accuracy-emphasized trials contain one additional phase of cognitive processing compared to speed-emphasized trials, suggesting that there may be qualitatively different decision processes in speed and accuracy trials (89). Additionally, as noted in the introduction, evidence accumulation signals have previously been found in a wide variety of cortical and subcortical regions (19–38). Identifying the brain regions whose activity correlates with urgency settings, together with their known functions, can help us theorize about potential confounds of urgency that are difficult to derive from behavioral studies alone. Model-based analyses should be combined with clever experimental designs and manipulations to disentangle the influences of various confounding factors on estimated brain–behavior relations.
Our results further indicated value-related processing in the Str, but with opposite polarities in the Cau compared to the Pu. This striking result might reflect a gradient of functional specialization related to value differences. Alternatively, recent research has shown that neural activity in the dorsal Str can elicit vasoconstriction and negative BOLD responses, implying that our finding of negative BOLD responses could nonetheless indicate increased neural activity (90). Note that value differences, in the present design, are confounded by other factors, which importantly includes difficulty: a choice based on two stimuli which differ in their value is easier compared to stimuli with similar values. Additional confounding factors include salience and arousal effects (see also ref. 91). Disentangling the influence of these factors requires specific experimental designs in future studies.
We further found various subcortical regions in which BOLD responses covaried with reward prediction error sizes. While amygdalar and striatal involvement in reward prediction error coding is well documented (e.g., refs. 80 and 81), the Cl and GPe have received less attention in the literature. Recently, electrophysiological recordings in rodents identified a neural subpopulation encoding reward prediction errors in the GPe (92). To some extent, these signals may also arise from covariates, such as perceived saliency (93, 94). The Cl involvement might indicate a functional role similar to the Amg in terms of arousal and salience detection (95). It is becoming increasingly clear that dopamine signals can be detected in a wider range of behaviors than classical reward prediction errors, and can also signal sensory and motor features (for review, see ref. 43). Under the generalized prediction error framework (96), they are argued to also indicate errors in the sensory world model, and are used to improve that model. Consequently, a wide set of brain regions is likely involved in the processing of these prediction errors.
Contrary to some previous reports, we did not find evidence for dopaminergic midbrain involvement in reward prediction error encoding. A long history of animal recordings has implicated especially the VTA in reward prediction error processing (e.g., refs. 39–42 and 97), which has been partially supported in humans using fMRI (98–102), but not consistently (see ref. 103 for a meta-analysis). A variety of factors have been argued to contribute to this discrepancy, including variability in the anatomical masks (102) and limited statistical power, as detailed in the opening section. In contrast, in the present study, the joint models were sufficiently powerful to identify value-related processing in the SN. Perhaps the discrepancy between the electrophysiology and BOLD findings results from a more fundamental methodological difference: while electrophysiology suggests that reward prediction errors in the dopaminergic midbrain are encoded in spiking activity, BOLD responses have long been argued to correlate more strongly with synaptic activity (104–106), which can reflect local processing as well as input to a region. It has often been argued, for example, that striatal BOLD responses result from dopamine release caused by dopaminergic midbrain spiking (107, 108). Intriguingly, since reward prediction errors are defined as the difference between obtained and expected reward, a region that calculates prediction errors needs expected reward (or value) as an input. This may explain why the SN BOLD responses were sensitive to value processing, but not reward prediction errors.
Subcortical regions play a prominent role in neurological disorders including Parkinson's disease (PD; 109), as well as psychiatric disorders such as drug addiction (110) and social anxiety disorder (111–113). PD, for example, is associated with a specific loss of dopaminergic cells in the substantia nigra, and our results indicate a role for the substantia nigra in value processing. Earlier work suggests that the loss of dopaminergic cells in PD can lead to an increased propensity to learn from positive compared to negative outcomes, which can be reversed with dopaminergic medication (114). Learning biases are also crucial in addiction (115) and anxiety (e.g., refs. 116 and 117). Abnormal value computation may lead to an overreliance on positive or negative outcomes. Task paradigms that disentangle reward and punishment learning can be used in future applications to test whether maladaptive value computation in disorders is associated with BOLD responses in the substantia nigra. Additionally, many of the subcortical regions we studied are (potential) targets for deep brain stimulation (DBS) in a variety of neurological and psychiatric disorders (e.g., ref. 118). Other regions are also of potential interest, including the bed nucleus of the stria terminalis as a potential target for obsessive-compulsive disorder and the lateral habenula for major depression. Joint modeling approaches with specialized task designs can also be used to further understand these regions' functions in health and disease, especially in light of their ability to capture interindividual differences.
Especially in the context of translation to the clinic, it is important to consider the emotional and social components in tasks and models. In our current approach, we only relied on cognitive processes such as evidence accumulation and reward learning, but disorders such as social anxiety and autism include social and affective components, which can manifest as altered processing of social rewards (119). Combined with more complex paradigms, RL-EAMs and joint models can be further extended to better understand the brain–behavior relations in such disorders.
Despite a generally good fit of the RL-ARD, some misfit remains in the first trials. Factors such as increased uncertainty (120) could cause the relatively slow responses in the initial trials of each block. Additionally, the additional time participants take in the initial trials (relative to model predictions) may reflect extra cognitive processes involved in interpreting the abstract stimuli and forming memory traces. These memory traces are likely necessary for stimulus identification in later trials, where RTs are primarily governed by evidence accumulation based on Q-values. This hypothesis could be tested in a future experiment in which the same stimulus sets are used across multiple blocks with new reward contingencies; the additional time should then be observed only in the first block in which a stimulus appears. It would then also be possible to assess whether the observed RT increase is better explained by heightened response caution or by an increase in nondecision time.
In conclusion, this study revealed various human subcortical underpinnings of decision making and learning. It uncovered new brain–behavior relations (e.g., thalamic nuclei in urgency settings, GPe in reward prediction error processing) and refined previous work (e.g., functionally specialized zones along the anterior–posterior axis of the Str in value processing). It also demonstrates the feasibility and value of combining joint modeling with tailored fMRI methods to advance our understanding of the human subcortex in cognition.
Materials and Methods
Participants.
Thirty-seven healthy volunteers [mean age 27 y old (SD 6 y, range 19–39 y old), 20 females] were recruited via local advertisement. The study was approved by the Ethics Review Board of the Faculty of Social and Behavioral Sciences of the University of Amsterdam (reference: 2021-BC-13146) and the Regional Committees for Medical and Health Related Research Ethics of Central Norway (reference: 116630). All participants gave written informed consent prior to the onset of the study. All participants were screened for MRI safety, had normal or corrected-to-normal vision, and had no history of psychiatric or neurological illness. All participants completed five scanning sessions as part of a larger project; here, we report and analyze two of these sessions.
Paradigm.
The experimental paradigm was an instrumental learning task (114) with a cue-based SAT manipulation (7; see Fig. 1). In every trial, participants chose between two abstract symbols, each associated with a fixed reward probability that was unknown to them. One choice option always had a higher probability of being rewarded than the alternative. After each choice, participants received feedback in the form of points, which they could use to learn which symbols had the highest reward probabilities.
Prior to each trial, participants were presented with a cue instructing them to emphasize either response speed ("SPD") or accuracy ("ACC") on the upcoming trial. Speed and accuracy cues were randomly interleaved. On speed trials, participants had to respond within 700 ms to be eligible for a reward; on accuracy trials, they had to respond within 1.5 s. After each choice, participants received two types of feedback: first, the outcome of the choice (+0 or +100 points), and second, the actually obtained reward. If participants responded in time (1.5 s in ACC trials, 0.7 s in SPD trials), their reward was equal to the outcome of the choice. If they responded too late, they were penalized with 100 points, irrespective of the outcome of the choice. Presenting both the outcome of the choice and the actual reward allowed participants to learn from the outcome of their choice as well as from their response timing.
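As a sketch, the reward rule above can be captured in a few lines of Python; the function name is ours, and the late penalty is encoded as -100 points:

```python
def obtained_reward(outcome, rt, cue):
    """Reward rule of the task: in-time responses earn the choice outcome
    (+0 or +100 points); late responses cost 100 points regardless of outcome.
    Deadlines: 0.7 s on speed (SPD) trials, 1.5 s on accuracy (ACC) trials."""
    deadline = 0.7 if cue == "SPD" else 1.5
    return outcome if rt <= deadline else -100
```

Because the penalty applies irrespective of the outcome, response timing carries reward-relevant information of its own, which is what lets participants learn from their timing as well as from their choices.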
In total, participants performed 342 trials divided over three runs. Each trial took 8.28 s (corresponding to 6 volumes; see below). Each run introduced three new stimulus sets that differed in their reward probabilities (80%/20%, 70%/30%, and 60%/40%, respectively) and therefore in difficulty. Event timing was jittered to decorrelate the BOLD response design matrix by pseudorandomly sampling the duration of each fixation cross from 0.5, 1, 1.5, and 2 s. Additionally, 10% null trials were included, during which the screen remained empty for 8.28 s.
Mixed-Effects Models (MEMs).
We first tested for the effects of the SAT and difficulty manipulations on RT and accuracy using mixed-effects models (MEMs; e.g., ref. 121). Linear models were used for RT, and generalized models with a binomial distribution for accuracy. In both models, difficulty (continuous) and cue (SAT, two levels) were included as both random and fixed effects. Their interaction was included as a fixed, but not random, effect, since the maximal model did not converge. Degrees of freedom for the linear MEM were estimated using Satterthwaite's method. We used the implementations in the R packages "lme4" and "lmerTest" (122, 123).
Cognitive Model Specification.
The behavioral data were modeled with the reinforcement learning advantage racing diffusion (RL-ARD) model (7), an instance of the broader class of combined reinforcement learning evidence accumulation models (RL-EAMs; 6). The RL-ARD conceptualizes decision making as a race between accumulators, each accumulating evidence for one choice option. The first accumulator to reach a common threshold level of evidence triggers the motor processes that execute the decision. The time to respond equals the time to reach the threshold, plus an intercept that corresponds to the time required for early perceptual encoding and response execution.
In the RL-ARD, each accumulator accumulates the advantage of one choice option over the other. Specifically, the rate of evidence accumulation (the drift rate $v$) of each accumulator depends on three terms: an evidence-independent base rate $V_0$ (urgency); the advantage of one choice option over the other, weighted by free parameter $w_d$; and the total amount of evidence, weighted by free parameter $w_s$. "Evidence" in this model is based on Q-values, which represent the participant's internal belief about how rewarding each choice option is. For two-choice tasks such as in the present study, the drift rates for the two accumulators are

$$v_1 = V_0 + w_d (Q_1 - Q_2) + w_s (Q_1 + Q_2), \qquad v_2 = V_0 + w_d (Q_2 - Q_1) + w_s (Q_1 + Q_2), \tag{1}$$

where $Q_i$ is the Q-value for choice alternative $i$. The Q-values are updated after every trial according to a simple delta rule:

$$Q_{i,t+1} = Q_{i,t} + \alpha (r_t - Q_{i,t}), \tag{2}$$

where $t$ is the trial number, $r_t$ is the obtained reward (in this specific experimental paradigm, the "outcome"), and $\alpha$ is a free parameter known as the learning rate.
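The delta rule and the advantage-based drift rates described above can be sketched in a few lines of Python; the parameter values in the example are illustrative, not estimates from this study:

```python
def delta_rule(q, reward, alpha):
    """Eq. 2: nudge the Q-value of the chosen option toward the obtained reward."""
    return q + alpha * (reward - q)

def ard_drift_rates(q1, q2, v0, wd, ws):
    """Eq. 1: drift rates of the two RL-ARD accumulators, built from urgency (v0),
    the weighted Q-value difference (wd), and the weighted Q-value sum (ws)."""
    v1 = v0 + wd * (q1 - q2) + ws * (q1 + q2)
    v2 = v0 + wd * (q2 - q1) + ws * (q1 + q2)
    return v1, v2

# Worked example: with equal Q-values the advantage terms cancel, so both
# accumulators run at the urgency plus the (shared) sum term.
q1, q2 = 0.5, 0.5
v1, v2 = ard_drift_rates(q1, q2, v0=2.0, wd=2.0, ws=0.5)
# After a rewarded choice of option 1, its Q-value moves toward the reward.
q1 = delta_rule(q1, reward=1.0, alpha=0.1)
```

Note how the SAT manipulation enters through $V_0$: raising urgency speeds both accumulators equally, making responses faster without changing which option is favored.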
To model the effect of the SAT manipulation, we allowed both the urgency ($V_0$) and threshold ($a$) parameters to vary freely between the speed and accuracy conditions, based on our earlier work (7).

In total, the RL-ARD has eight free parameters: two evidence-independent base rates $V_{0,\mathrm{acc}}$ and $V_{0,\mathrm{spd}}$, weights on the difference and sum of the evidence $w_d$ and $w_s$, nondecision time $t_0$, learning rate $\alpha$, and two thresholds $a_{\mathrm{acc}}$ and $a_{\mathrm{spd}}$. Instead of estimating the urgency parameter for each condition separately, we estimated across-condition mean and difference parameters (hence, $V_0$ and $\Delta V_0$), and similarly, we estimated an across-condition mean and difference parameter for the threshold ($a$ and $\Delta a$). Directly estimating the between-condition differences in these parameters facilitates estimation of their covariance with the neural model parameters, detailed below.
MRI Data Acquisition.
In multiple sessions, participants were scanned in a MAGNETOM "Terra" 7 T MRI system (Siemens Healthineers, Germany) with a 32-channel phased array head coil (Nova Medical Inc., USA). The first session contained two anatomical scans: a multiecho gradient recalled echo (GRE) and an MP2RAGE, both at 0.75 mm isotropic resolution. For the MP2RAGE, we used the following parameters: repetition time (TR) 4.3 s, inversion times (TI1,2) 840 ms and 2,370 ms, flip angles (FA1,2) 5° and 6°, echo time (TE) 1.99 ms, field of view (FOV) 240 × 240 × 168 mm, bandwidth 250 Hz/px. For the GRE, the following parameters were used: TR 31.0 ms, TE1–4 2.51, 7.22, 14.44, and 23.23 ms, FA 12°, FOV 240 × 240 × 168 mm. In the remainder of this anatomical session, resting-state data were collected that are not of interest for the current study.
The second session contained three functional runs with the task paradigm. A single-echo echo planar imaging (EPI) sequence designed by the CMRR (https://www.cmrr.umn.edu/multiband/) was used, with parameters based on our previous studies (46, 47) to tailor the sequence to the subcortex: 1.5 mm isotropic resolution, TE 14 ms, TR 1.38 s, partial Fourier, in-plane acceleration (GRAPPA) 3, multiband 2, bandwidth 1,446 Hz/px, phase encoding direction A → P, FOV 192 × 192 × 132 mm. In contrast to our previous work, we included a multiband factor of 2 in the protocol. Pilot testing indicated that, on this MRI system, the increase in statistical power obtained through the larger number of volumes (due to the lower TR with multiband acquisition) outweighed the loss in SNR (even in subcortical areas) for statistical testing purposes.
Each run consisted of 754 volumes (17 min 56 s). Immediately after each run, we collected 5 volumes of the same protocol with the opposite phase encoding direction (P → A), which were used for susceptibility distortion correction. Finally, at the end of the functional session, a low-resolution 1 mm MP2RAGE scan was acquired for coregistration purposes, using the same parameters as in the anatomical session.
During functional runs, physiological data on the participant’s heart rate and respiration were acquired using a photoplethysmograph (with sampling frequency 200 Hz) and respiratory belt (with sampling frequency 50 Hz), respectively. In six runs (two in one participant, one in another participant, and three in a third participant), recording of physiological data failed due to technical reasons.
Anatomical Masks.
We used the multicontrast anatomical subcortical structure parcellation (MASSP) algorithm (63) to obtain participant-specific anatomical masks of 17 subcortical structures. MASSP relies on multiple contrasts; here, we used quantitative susceptibility mapping (QSM) values, longitudinal relaxation rates (R1), and effective transverse relaxation rates (R2*). R1 values were computed from the MP2RAGE data using a look-up table (124). R2* values were computed by least squares fitting of a monoexponential decay function to the four echoes of the GRE data. QSM values were obtained from the phase maps of the last three echoes of the GRE data (125) with TGV-QSM (126). In both cases, LCPCA denoising (127) was performed beforehand on the eight images of the GRE (four magnitude and four phase). Prior to estimating R2* and QSM, the GRE data were brought into MP2RAGE space by coregistering the first GRE echo (magnitude image) to the second inversion of the MP2RAGE, using a rigid transformation in ANTs.
The MASSP algorithm combines shape, location, and R1, R2*, QSM value priors to delineate the following 17 subcortical structures in an individual’s data: Amygdala (Amg), claustrum (Cl), fornix (fx), the external and internal segments of the globus pallidus (GPe, GPi), internal capsule (ic), periaqueductal gray (PAG), pedunculopontine nucleus (PPN), red nucleus (RN), substantia nigra (SN), subthalamic nucleus (STN), striatum (Str), thalamus (Tha), ventral tegmental area (VTA), and the lateral, third, and fourth ventricles (LV, 3V, 4V). For all regions except fx, 3V and 4V, separate masks were obtained for both hemispheres. Here, we only focus on the gray matter structures, and thus excluded the internal capsule, fornix, and ventricles from the ROI analyses below; totaling 12 ROIs bilaterally.
As in ref. 128, we trained the MASSP algorithm on renormalized versions of the quantitative contrasts, using fuzzy C-means clustering of intensities and linear interpolation between cluster centroids. We also registered the data to the MASSP atlas in two successive steps. These alterations relative to the original MASSP implementation (63) led to small parcellation improvements for some structures.
To segment the thalamus into individual nuclei, we used the thalamic segmentation tool segmentThalamicNuclei.sh, part of FreeSurfer 7.2.0. The segmentation applies a probabilistic atlas built from a combination of in vivo and ex vivo data (65). The segmentation is performed in subject space on the T1w contrast after running the FreeSurfer pipeline (recon-all) as part of fMRIPrep (see below). The tool outputs discrete segmentations at a resolution of 0.5 mm, which were resampled to 1.5 mm resolution with linear interpolation.
fMRI Preprocessing.
Results included in this manuscript come from preprocessing performed using fMRIPrep 20.2.0 (129, 130; RRID:SCR_016216), which is based on Nipype 1.5.1 (131, 132; RRID:SCR_002502). For brevity, full details are provided in SI Appendix.
Neural Model Specification: Whole-Brain Generalized Linear Models (GLMs).
The timeseries of the neural data were modeled using GLMs. In these GLMs, we modeled each voxel's timeseries $y$ as

$$y = \beta_{\mathrm{task}} X_{\mathrm{task}} + \beta_{\mathrm{cue}} X_{\mathrm{cue}} + \beta_{\mathrm{resp}} X_{\mathrm{resp}} + \beta_{\mathrm{stim}} X_{\mathrm{stim}} + \beta_{\mathrm{value}} X_{\mathrm{value}} + \beta_{\mathrm{fb}} X_{\mathrm{fb}} + \beta_{\mathrm{RPE}} X_{\mathrm{RPE}} + \epsilon, \tag{3}$$

where every $\beta$ is a parameter to be estimated, every $X$ is the timeseries of an experimental event convolved with the canonical double-gamma hemodynamic response function (HRF; 133), and $\epsilon$ is the residual. Note that we estimated a single parameter $\beta_{\mathrm{task}}$ to account for the shared effects of the presentations of cues, stimuli, and feedback, as well as the effects of motor responses (e.g., the effects of visual processing and overall motor preparation). In this experimental paradigm, the effects of these event types cannot be disentangled from one another due to their rapid succession within a trial. Note, however, that the contrasts of interest are orthogonal to these events and can be estimated well.

Mirroring the cognitive model, we estimated a between-cue difference for the BOLD responses relating to the cue. Specifically, the regressor $X_{\mathrm{cue}}$ was also modeled on the onset of the cue but shows a negative deflection for ACC cues and a positive deflection for SPD cues. As such, the corresponding $\beta_{\mathrm{cue}}$ reflects the difference of SPD over ACC cues. Similarly, the parameter $\beta_{\mathrm{resp}}$ reflects the BOLD contrast of left compared to right motor responses; the corresponding regressor was modeled on the onsets of the button presses.

The regressors $X_{\mathrm{stim}}$ and $X_{\mathrm{value}}$ relate to the stimulus and the stimulus value differences, respectively. The amplitude of the stimulus value regressor varied parametrically across trials, with the trial-by-trial amplitude determined by the difference in Q-values (internal value representations) as estimated by the RL-ARD model. Similarly, the regressors $X_{\mathrm{fb}}$ and $X_{\mathrm{RPE}}$ relate to the effects of the feedback and the reward prediction error, respectively, which were obtained by simulating from the RL-ARD. Both the value difference and reward prediction error regressors were demeaned per run, to orthogonalize them with respect to the stimulus and feedback regressors. We included the temporal derivatives of all task regressors (not shown in Eq. 3, but included in SI Appendix, Eq. S1).
As a control analysis, we first fit the GLM using a traditional two-stage mass-univariate approach, in which a GLM is fit per voxel. In this approach, we first fit the RL-ARD to the behavioral data and extracted trial-by-trial regressors per subject by simulating from the RL-ARD model. Specifically, the model was used to simulate the task paradigm 100 times, each time with a different set of RL-ARD parameters (randomly sampled from the posterior distributions). On each trial of the simulation, the difference in values of the two stimuli was calculated, and the mean of these stimulus value differences across the 100 simulations determined the regressor's amplitude on that trial. These stimulus value differences were then demeaned per run. The trial-by-trial height of the parametrically varying reward prediction error regressor was determined from the same simulations of the RL-ARD (using the reward prediction error instead of the value differences), and this regressor was also demeaned per run.
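A minimal sketch of this regressor-construction step, assuming a greedy choice rule and uniform random draws as a stand-in for the posterior samples of the learning rate (the actual analysis simulated the full RL-ARD from its joint posterior):

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_value_differences(alpha, n_trials, reward_p=(0.8, 0.2)):
    """One simulation of the task: track Q-values under a delta rule and
    record the trial-by-trial Q-value difference (the regressor amplitude)."""
    q = np.zeros(2)
    diffs = np.empty(n_trials)
    for t in range(n_trials):
        diffs[t] = q[0] - q[1]            # value difference before this choice
        choice = int(q[1] > q[0])          # greedy choice (simplification)
        reward = float(rng.random() < reward_p[choice])
        q[choice] += alpha * (reward - q[choice])
    return diffs

# 100 simulations, each with a different draw of the learning rate
posterior_alphas = rng.uniform(0.05, 0.2, size=100)  # stand-in for posterior samples
sims = np.stack([simulate_value_differences(a, n_trials=114)
                 for a in posterior_alphas])

amplitudes = sims.mean(axis=0)   # mean across the 100 simulations, per trial
amplitudes -= amplitudes.mean()  # demean per run
```

Averaging across posterior draws propagates parameter uncertainty into the regressor; demeaning per run keeps the parametric regressor orthogonal to the constant stimulus regressor.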
To model physiological noise, we included a set of 18 regressors obtained using RETROICOR (134): third-order phase Fourier expansion of the heart rate signal, fourth-order phase expansion of the respiration signal, and a second-order phase Fourier expansion of the interaction between heart rate and respiration (135). Two additional regressors were used to model heart rate variability (HRV; 136), and respiratory volume per time unit (RVT; 137, 138). These physiological regressors were estimated using the PhysIO toolbox (139) implemented in the TAPAS software package (140). For six runs (one in a single participant, two in another participant, and three in a third participant), collection of the physiological data failed due to technical reasons. For these runs, the first 20 aCompCor components (141) were instead included in the design matrix. Additionally, for all participants seven motion-related regressors were included (translation and rotation in three dimensions, plus the framewise displacement), and a set of discrete cosines to model low-frequency drifts. To model residual physiological noise, we also included a regressor with the mean signal within CSF, estimated by fMRIprep. Finally, we included a nuisance regressor to model the effect of response times using the RTDur approach (142). This regressor is generated by convolving a boxcar function (starting at the onset of each stimulus, with the response time on that trial as duration) with the same hemodynamic response function as was used for the task-related regressors.
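The RTDur nuisance regressor can be sketched as a unit boxcar lasting each trial's RT, convolved with a canonical double-gamma HRF and sampled at each TR; the HRF shape parameters below follow a common convention and are an assumption, not taken from this study's implementation:

```python
import math
import numpy as np

def gamma_pdf(t, shape):
    """Gamma(shape, scale=1) density, zero for t <= 0."""
    out = np.zeros_like(t, dtype=float)
    pos = t > 0
    out[pos] = t[pos] ** (shape - 1) * np.exp(-t[pos]) / math.gamma(shape)
    return out

def double_gamma_hrf(dt, length=32.0):
    """Canonical double-gamma HRF (peak ~5 s, undershoot ~15 s), unit-normalized."""
    t = np.arange(0.0, length, dt)
    h = gamma_pdf(t, 6.0) - gamma_pdf(t, 16.0) / 6.0
    return h / h.sum()

def rtdur_regressor(onsets, rts, n_scans, tr, oversample=20):
    """RTDur approach: a height-1 boxcar from each stimulus onset lasting that
    trial's RT, convolved with the HRF and downsampled to the scan grid."""
    dt = tr / oversample
    box = np.zeros(n_scans * oversample)
    for onset, rt in zip(onsets, rts):
        i0, i1 = int(onset / dt), int((onset + rt) / dt)
        box[i0:i1] = 1.0
    reg = np.convolve(box, double_gamma_hrf(dt), mode="full")[: len(box)]
    return reg[::oversample]  # one value per TR
```

Because the boxcar's duration varies with RT, this regressor absorbs BOLD amplitude differences that merely track time-on-task, preventing them from contaminating the task contrasts.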
Prior to fitting the whole-brain GLM, the data were minimally smoothed using SUSAN (143; kernel size FWHM 1.5 mm). Run-level GLMs were estimated using FSL FEAT (144), after which the three run-level GLMs per participant were combined with a fixed-effects analysis. Group-level models were estimated using FSL FLAME1+2 (145). For the speed–accuracy cue contrast, the design matrix included both an intercept and two model-based parametrically varying covariates: the between-condition differences in the threshold parameter (speed–accuracy) and in the urgency parameter, both z-scored. All group-level statistical parametric maps (SPMs) were corrected for the false discovery rate using the Benjamini–Hochberg procedure (FDR). SPMs of the whole-brain results can be found in SI Appendix.
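The Benjamini–Hochberg step-up procedure used for the FDR correction can be sketched as follows; the threshold q = 0.05 in the example is illustrative:

```python
import numpy as np

def bh_fdr(pvals, q=0.05):
    """Benjamini-Hochberg step-up procedure: returns a boolean mask of
    rejected hypotheses, controlling the false discovery rate at level q."""
    p = np.asarray(pvals, dtype=float)
    m = len(p)
    order = np.argsort(p)
    thresh = q * np.arange(1, m + 1) / m       # q * i / m for sorted p-values
    below = p[order] <= thresh
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])       # largest i with p_(i) <= q*i/m
        reject[order[: k + 1]] = True          # reject all smaller p-values too
    return reject
```

Applied voxelwise to an SPM, this controls the expected proportion of false positives among the suprathreshold voxels rather than the familywise error rate.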
Joint Models.
The main analysis used joint models, in which the cognitive model (RL-ARD) and the neural model (GLM) are estimated simultaneously (50, 54–57). Furthermore, the joint models we employ assume that the cognitive and neural parameters are multivariate normally distributed across subjects: $\theta_s \sim \mathcal{N}(\mu, \Sigma)$. This assumption allows for estimation of the group-level mean parameters $\mu$ as well as of correlations between parameters through the variance–covariance matrix $\Sigma$, and thereby allows for estimating which cognitive processes correlate with BOLD responses in which regions of interest (ROIs).
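As a toy illustration of the role of the variance–covariance matrix, the brain–behavior correlations of interest follow directly from an estimated covariance matrix; the numbers below are made up:

```python
import numpy as np

# Illustrative 3x3 group-level covariance over one cognitive parameter
# (say, the urgency difference) and two neural parameters (betas in two ROIs).
sigma = np.array([[0.40, 0.12, -0.06],
                  [0.12, 0.25,  0.05],
                  [-0.06, 0.05, 0.30]])

sd = np.sqrt(np.diag(sigma))
corr = sigma / np.outer(sd, sd)  # correlation matrix implied by the covariance
# corr[0, 1] is then the estimated correlation between the cognitive
# parameter and the first ROI's beta, across participants.
```

Because the correlations are estimated inside the hierarchical model rather than computed post hoc from point estimates, they are not attenuated by participant-level estimation noise.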
The variance–covariance matrix of a multivariate normal grows quadratically with the number of cognitive and neural parameters. We therefore applied multiple restrictions to the participant-level models to keep parameter estimation feasible (146). Specifically, we distinguished between estimating parameters jointly (i.e., estimating both the group-level mean and the correlations between neural and cognitive model parameters across individuals) and nonjointly (i.e., estimating the group-level mean but no correlations).
Of the cognitive model, we estimated all parameters jointly. Figs. 4–6 focus only on the parameters related to the SAT manipulation. Of the neural model, we estimated the cue, stimulus value, and reward prediction error parameters of interest jointly, as well as the CSF and RT nuisance parameters. We estimated these latter nuisance parameters jointly because we hypothesized that they could correlate most strongly with the parameters of interest. All other neural parameters (including the temporal derivatives and the SD of the errors) were estimated nonjointly.
Joint models were fit to neural data from the ROIs defined by MASSP and by the thalamus atlas. To obtain the signal per ROI, the mean timeseries within each ROI was extracted from the unsmoothed functional data. The mean timeseries were rescaled to percent signal change by dividing by the mean signal, multiplying by 100, and subtracting 100. To reduce the total number of parameters in the joint models, we first filtered the timeseries and design matrix by least squares regression of the same set of confounds as used in the whole-brain GLMs (except for the CSF and RT regressors, which were estimated in the joint model), to reduce physiological noise and remove low-frequency drifts from the signal.
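The rescaling and confound-filtering steps can be sketched as follows (function names are ours):

```python
import numpy as np

def to_percent_signal_change(ts):
    """Rescale a timeseries to percent signal change around its mean:
    divide by the mean, multiply by 100, subtract 100."""
    return ts / ts.mean() * 100.0 - 100.0

def residualize(y, confounds):
    """Filter confounds out of y by ordinary least squares regression,
    returning the residuals (an intercept column is added automatically)."""
    X = np.column_stack([np.ones(len(y)), confounds])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta
```

Regressing the confounds out of both the ROI timeseries and the design matrix beforehand is equivalent to including them in the model, but keeps the joint model's parameter count small.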
Bayesian Estimation.
To allow for estimation of the whole-brain GLMs of the neural data, we first fit the cognitive model to the behavioral data only. All model estimations were performed using a Bayesian particle Metropolis-within-Gibbs (PMwG) sampler (147, 148). The PMwG sampler strictly adheres to a hierarchical model in which group-level and participant-level parameters are estimated simultaneously. The group level is modeled with a multivariate Gaussian distribution, which is updated using Gibbs sampling. At the participant level, chains are updated using a combination of particle sampling and Metropolis–Hastings. We followed earlier work (148) in using four sampling stages. The first, preburn stage was used to approximate the participant-level likelihood landscape for proposal distributions. The burn stage was run until the mean Gelman–Rubin diagnostic (149) was below 1.1. The adaptation stage was used to collect samples for generating a distribution that allows for efficient proposals in the last, sampling stage. The sampling stage was run until convergence (assessed using Gelman–Rubin diagnostics and visual inspection of the chains). Samplers were run with three chains.
The priors on the group-level means were Gaussian distributions, with means and SDs for the cognitive model parameters based on the posterior distributions described in ref. 7, which used the same task and model (experiment 2). The threshold, nondecision time, and urgency parameters were estimated on the log scale, and the learning rate on the probit scale. The priors for the contrasts of interest, $\Delta a$ and $\Delta V_0$, had means of opposite sign, since faster responding under speed stress requires thresholds to decrease but urgency to increase. Visual comparisons confirmed that the posteriors were not strongly influenced by the priors for the parameters of interest.
For the group-level (co)variance matrix, we used a mixture of inverse-Gamma and inverse-Wishart priors with 2 degrees of freedom and a scale parameter of 0.3. These settings give rise to uniform priors on the correlations (150) for the parts of the group-level covariance matrix that were allowed to covary.
To visualize the quality of the model fit, we randomly sampled 100 parameter sets from the posterior distributions and used these to simulate the experimental design. These posterior predictive distributions were then used to calculate credible intervals, by taking the range between the 2.5% and 97.5% quantiles of the across-participant averages for each behavioral measure (RT quantiles and accuracy).
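The credible-interval computation can be sketched as follows, with random numbers standing in for the 100 posterior predictive simulations:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in posterior predictive: 100 simulated datasets x 37 participants,
# each summarized by one behavioral measure (e.g., mean accuracy per participant).
pred = rng.normal(loc=0.75, scale=0.05, size=(100, 37))

# Average across participants within each simulated dataset, then take the
# 2.5% and 97.5% quantiles across the 100 simulations.
group_means = pred.mean(axis=1)
ci_low, ci_high = np.quantile(group_means, [0.025, 0.975])
```

The resulting interval describes where the group-average measure should fall if the fitted model were the true data-generating process; observed averages outside it flag misfit, such as the slow initial trials discussed above.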
Next, we fit the joint models, using the same priors for the cognitive model, except that we decreased the variance of the group-level means (to 0.7 for a subset of the parameters and to 0.5 for the others). For the RT nuisance parameter, we used a prior with a smaller variance than for the other neural parameters. The amplitude of the RT nuisance regressor is much larger than the amplitudes of the other neural regressors, because its duration is modeled (as opposed to using a stick function of 0.001 s); consequently, its absolute parameter estimates are much smaller, and we used a smaller prior variance to stabilize estimation.
Joint models were implemented in a customized version of the EMC2 software package for R (151). The analysis scripts and data underlying this manuscript can be found at https://osf.io/pc5bm. A practical tutorial on joint modeling in this framework can be found in ref. 152.
Supplementary Material
Appendix 01 (PDF)
Acknowledgments
We thank Pål Erik Goa for supporting this study by facilitating data acquisition. We thank Sarah Habli, Lisbeth Røe, and Daniel R. Sokołowski for their help collecting the data.
Author contributions
S.M., P.-L.B., S.J.S.I., A.C.T., A.K.H., and B.U.F. designed research; S.M., N.S., D.H.Y.T., and A.K.H. performed research; S.M., N.S., P.-L.B., A.A., and D.H.Y.T. contributed new reagents/analytic tools; S.M., N.S., and P.-L.B. analyzed data; and S.M., N.S., P.-L.B., A.A., S.J.S.I., A.C.T., A.K.H., and B.U.F. wrote the paper.
Competing interests
P.-L.B. is the owner of Full Brain Picture Analytics. The other authors do not have any competing interests to declare.
Footnotes
This article is a PNAS Direct Submission. R.M. is a guest editor invited by the Editorial Board.
*Note that both the thalamic atlas and the second iteration of MASSP include “Pu” as an abbreviation; the former referring to the Pulvinar, the latter to the Putamen. In this manuscript, Pu refers to the Putamen, and PuA, PuI, PuL, and PuM to the various Pulvinar regions.
Data, Materials, and Software Availability
MRI timeseries and behavioral data have been deposited in OSF (https://osf.io/pc5bm) (153).
Supporting Information
References
- 1. Sutton R. S., Barto A. G., Reinforcement Learning: An Introduction (MIT Press, Cambridge, MA, ed. 2, 2018).
- 2. Fontanesi L., Palminteri S., Lebreton M., Decomposing the effects of context valence and feedback information on speed and accuracy during reinforcement learning: A meta-analytical approach using diffusion decision modeling. Cogn. Affect. Behav. Neurosci. 19, 490–502 (2019).
- 3. Fontanesi L., Gluth S., Spektor M. S., Rieskamp J., A reinforcement learning diffusion decision model for value-based decisions. Psychon. Bull. Rev. 26, 1099–1121 (2019).
- 4. Pedersen M. L., Frank M. J., Biele G., The drift diffusion model as the choice rule in reinforcement learning. Psychon. Bull. Rev. 24, 1234–1251 (2017).
- 5. Pedersen M. L., Frank M. J., Simultaneous hierarchical Bayesian parameter estimation for reinforcement learning and drift diffusion models: A tutorial and links to neural data. Comput. Brain Behav. 3, 458–471 (2020).
- 6. Miletić S., Boag R. J., Forstmann B. U., Mutual benefits: Combining reinforcement learning with sequential sampling models. Neuropsychologia 136, 107261 (2020).
- 7. Miletić S., et al., A new model of decision processing in instrumental learning tasks. eLife 10, 1–33 (2021).
- 8. McDougle S. D., Collins A. G., Modeling the influence of working memory, reinforcement, and action uncertainty on reaction time and choice during instrumental learning. Psychon. Bull. Rev. 28, 20–39 (2020).
- 9. Turner B. M., Toward a common representational framework for adaptation. Psychol. Rev. 126, 660–692 (2019).
- 10. Sewell D. K., Jach H. K., Boag R. J., Van Heer C. A., Combining error-driven models of associative learning with evidence accumulation models of decision-making. Psychon. Bull. Rev. 26, 868–893 (2019).
- 11. Sewell D. K., Stallman A., Modeling the effect of speed emphasis in probabilistic category learning. Comput. Brain Behav. 3, 129–152 (2020).
- 12. Wagner B., Mathar D., Peters J., Gambling environment exposure increases temporal discounting but improves model-based control in regular slot-machine gamblers. Comput. Psychiatry 6, 142–165 (2022).
- 13. Shahar N., et al., Improving the reliability of model-based decision-making estimates in the two-stage decision task with reaction-times and drift-diffusion modeling. PLoS Comput. Biol. 15, 1–25 (2019).
- 14. Redgrave P., Prescott T., Gurney K., The basal ganglia: A vertebrate solution to the selection problem? Neuroscience 89, 1009–1023 (1999).
- 15. Deniau J., Chevalier G., Disinhibition as a basic process in the expression of striatal functions. II. The striato-nigral influence on thalamocortical cells of the ventromedial thalamic nucleus. Brain Res. 334, 227–233 (1985).
- 16. Chevalier G., Vacher S., Deniau J., Desban M., Disinhibition as a basic process in the expression of striatal functions. I. The striato-nigral influence on tecto-spinal/tecto-diencephalic neurons. Brain Res. 334, 215–226 (1985).
- 17. Nambu A., et al., Excitatory cortical inputs to pallidal neurons via the subthalamic nucleus in the monkey. J. Neurophysiol. 84, 289–300 (2000).
- 18. Mink J. W., Thach W. T., Basal ganglia intrinsic circuits and their role in behavior. Curr. Opin. Neurobiol. 3, 950–957 (1993).
- 19. Thura D., Cisek P., The basal ganglia do not select reach targets but control the urgency of commitment. Neuron 95, 1160–1170 (2017).
- 20. Lauwereyns J., Watanabe K., Coe B., Hikosaka O., A neural correlate of response bias in monkey caudate nucleus. Nature 418, 413–417 (2002).
- 21. Ding L., Gold J. I., Caudate encodes multiple computations for perceptual decisions. J. Neurosci. 30, 15747–15759 (2010).
- 22. Munoz D. P., Wurtz R. H., Saccade-related activity in monkey superior colliculus. I. Characteristics of burst and buildup cells. J. Neurophysiol. 73, 2313–2333 (1995).
- 23. Grimaldi P., Cho S. H., Lau H., Basso M. A., Superior colliculus signals decisions rather than confidence: Analysis of single neurons. J. Neurophysiol. 120, 2614–2629 (2018).
- 24. Jun E. J., et al., Causal role for the primate superior colliculus in the computation of evidence for perceptual decisions. Nat. Neurosci. 24, 1121–1131 (2021).
- 25. Shadlen M. N., Newsome W. T., Neural basis of a perceptual decision in the parietal cortex (area LIP) of the rhesus monkey. J. Neurophysiol. 86, 1916–1936 (2001).
- 26. Mazurek M. E., Roitman J. D., Ditterich J., Shadlen M. N., A role for neural integrators in perceptual decision making. Cereb. Cortex 13, 1257–1269 (2003).
- 27. Huk A. C., Shadlen M. N., Neural activity in macaque parietal cortex reflects temporal integration of visual motion signals during perceptual decision making. J. Neurosci. 25, 10420–10436 (2005).
- 28. Kiani R., Hanks T. D., Shadlen M. N., Bounded integration in parietal cortex underlies decisions even when viewing duration is dictated by the environment. J. Neurosci. 28, 3017–3029 (2008).
- 29. Steinemann N. A., et al., Direct observation of the neural computations underlying a single decision. eLife 12, RP90859 (2024).
- 30. Hanes D. P., Schall J. D., Neural control of voluntary movement initiation. Science 274, 427–430 (1996).
- 31. Schall J. D., The neural selection and control of saccades by the frontal eye field. Philos. Trans. R. Soc. B, Biol. Sci. 357, 1073–1082 (2002).
- 32. Purcell B. A., et al., Neurally constrained modeling of perceptual decision making. Psychol. Rev. 117, 1113–1143 (2010).
- 33. Kim J. N., Shadlen M. N., Neural correlates of a decision in the dorsolateral prefrontal cortex of the macaque. Nat. Neurosci. 2, 176–185 (1999).
- 34. Ding L., Gold J. I., Neural correlates of perceptual decision making before, during, and after decision commitment in monkey frontal eye field. Cereb. Cortex 22, 1052–1067 (2012).
- 35. Cisek P., Kalaska J. F., Neural correlates of reaching decisions in dorsal premotor cortex: Specification of multiple direction choices and final selection of action. Neuron 45, 801–814 (2005).
- 36. Thura D., Cisek P., Modulation of premotor and primary motor cortical activity during volitional adjustments of speed-accuracy trade-offs. J. Neurosci. 36, 938–956 (2016).
- 37. Romo R., Hernández A., Zainos A., Neuronal correlates of a perceptual decision in ventral premotor cortex. Neuron 41, 165–173 (2004).
- 38. Peixoto D., et al., Decoding and perturbing decision states in real time. Nature 591, 604–609 (2021).
- 39. Schultz W., Responses of midbrain dopamine neurons to behavioral trigger stimuli in the monkey. J. Neurophysiol. 56, 1439–1461 (1986).
- 40. Schultz W., Dayan P., Montague P. R., A neural substrate of prediction and reward. Science 275, 1593–1599 (1997).
- 41. Montague P. R., Dayan P., Sejnowski T. J., A framework for mesencephalic dopamine systems based on predictive Hebbian learning. J. Neurosci. 16, 1936–1947 (1996).
- 42. Schultz W., Apicella P., Ljungberg T., Responses of monkey dopamine neurons to reward and conditioned stimuli during successive steps of learning a delayed response task. J. Neurosci. 13, 900–913 (1993).
- 43. Gershman S. J., et al., Explaining dopamine through prediction errors and beyond. Nat. Neurosci. 27, 1645–1655 (2024).
- 44. Miletić S., “Modelling structure and function of the human subcortex,” PhD thesis, University of Amsterdam, Amsterdam, The Netherlands (2023).
- 45. Forstmann B. U., De Hollander G., Van Maanen L., Alkemade A., Keuken M. C., Towards a mechanistic understanding of the human subcortex. Nat. Rev. Neurosci. 18, 57–65 (2017).
- 46. Keuken M. C., Van Maanen L., Boswijk M., Forstmann B. U., Steyvers M., Large scale structure-function mappings of the human subcortex. Sci. Rep. 8, 15854 (2018).
- 47. De Hollander G., Keuken M. C., van der Zwaag W., Forstmann B. U., Trampel R., Comparing functional MRI protocols for small, iron-rich basal ganglia nuclei such as the subthalamic nucleus at 7 T and 3 T. Hum. Brain Mapp. 38, 3226–3248 (2017).
- 48. Miletić S., et al., fMRI protocol optimization for simultaneously studying small subcortical and cortical areas at 7 T. NeuroImage 219, 116992 (2020).
- 49. Miletić S., et al., 7T functional MRI finds no evidence for distinct functional subregions in the subthalamic nucleus during a speeded decision-making task. Cortex 155, 162–188 (2022).
- 50. Turner B. M., Palestro J. J., Miletić S., Forstmann B. U., Advances in techniques for imposing reciprocity in brain-behavior relations. Neurosci. Biobehav. Rev. 102, 327–336 (2019).
- 51. Teller D. Y., Linking propositions. Vis. Res. 24, 1233–1246 (1984).
- 52. Gelman A., Hill J., Data Analysis Using Regression and Multilevel/Hierarchical Models (Cambridge University Press, 2006).
- 53. Spearman C., The proof and measurement of association between two things. Am. J. Psychol. 15, 72–101 (1904).
- 54. Turner B. M., Forstmann B. U., Steyvers M., Joint Models of Neural and Behavioral Data, Computational Approaches to Cognition and Perception (Springer International Publishing, 2019).
- 55. Turner B. M., Wang T., Merkle E. C., Factor analysis linking functions for simultaneously modeling neural and behavioral data. NeuroImage 153, 28–48 (2017).
- 56. Turner B. M., Rodriguez C. A., Norcia T. M., McClure S. M., Steyvers M., Why more is better: Simultaneous modeling of EEG, fMRI, and behavioral data. NeuroImage 128, 96–115 (2016).
- 57. Turner B. M., Forstmann B. U., Love B. C., Palmeri T. J., Van Maanen L., Approaches to analysis in model-based cognitive neuroscience. J. Math. Psychol. 76, 65–79 (2017).
- 58. Groot J. M., et al., A high-resolution 7 Tesla resting-state fMRI dataset optimized for studying the subcortex. Data Brief 55, 110668 (2024).
- 59. Groot J. M., et al., Echoes from intrinsic connectivity networks in the subcortex. J. Neurosci. 43, 6609–6618 (2023).
- 60. Isherwood S. J. S., et al., Investigating intra-individual networks of response inhibition and interference resolution using 7T MRI. NeuroImage 271, 119988 (2023).
- 61. Lloyd B., et al., Subcortical nuclei of the human ascending arousal system encode anticipated reward but do not predict subsequent memory. Cereb. Cortex 35, bhaf101 (2025).
- 62. Trutti A. C., et al., Investigating working memory updating processes of the human subcortex using 7 Tesla fMRI. eLife 13, RP97874 (2024).
- 63. Bazin P. L., Alkemade A., Mulder M. J., Henry A. G., Forstmann B. U., Multi-contrast anatomical subcortical structures parcellation. eLife 9, 1–23 (2020).
- 64. Rescorla R. A., Wagner A. R., A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement. Classical Conditioning II: Current Research and Theory, 64–99 (1972).
- 65. Iglesias J. E., et al., A probabilistic atlas of the human thalamic nuclei combining ex vivo MRI and histology. NeuroImage 183, 314–326 (2018).
- 66. Haber S. N., Knutson B., The reward circuit: Linking primate anatomy and human imaging. Neuropsychopharmacology 35, 4–26 (2010).
- 67. Pauli W. M., O’Reilly R. C., Yarkoni T., Wager T. D., Regional specialization within the human striatum for diverse psychological functions. Proc. Natl. Acad. Sci. U.S.A. 113, 1907–1912 (2016).
- 68. del Rey N. L. G., García-Cabezas M. Á., Cytology, architecture, development, and connections of the primate striatum: Hints for human pathology. Neurobiol. Dis. 176, 105945 (2023).
- 69. Bazin P. L., et al., Automated parcellation and atlasing of the human subcortex with ultra high-resolution quantitative MRI. Imaging Neurosci. 3, imag_a_00560 (2025).
- 70. Van Maanen L., et al., Neural correlates of trial-to-trial fluctuations in response caution. J. Neurosci. 31, 17488–17495 (2011).
- 71. Forstmann B. U., et al., Striatum and pre-SMA facilitate decision-making under time pressure. Proc. Natl. Acad. Sci. U.S.A. 105, 17538–17542 (2008).
- 72. Thura D., Cabana J. F., Feghaly A., Cisek P., Integrated neural dynamics of sensorimotor decisions and actions. PLoS Biol. 20, e3001861 (2022).
- 73. Van Maanen L., Fontanesi L., Hawkins G. E., Forstmann B. U., Striatal activation reflects urgency in perceptual decision making. NeuroImage 139, 294–303 (2016).
- 74. Hung Y., Gaillard S. L., Yarmak P., Arsalidou M., Dissociations of cognitive inhibition, response inhibition, and emotional interference: Voxelwise ALE meta-analyses of fMRI studies. Hum. Brain Mapp. 39, 4065–4082 (2018).
- 75. Cieslik E. C., Mueller V. I., Eickhoff C. R., Langner R., Eickhoff S. B., Three key regions for supervisory attentional control: Evidence from neuroimaging meta-analyses. Neurosci. Biobehav. Rev. 48, 22–34 (2015).
- 76. Isherwood S. J. S., Keuken M. C., Bazin P. L., Forstmann B. U., Cortical and subcortical contributions to interference resolution and inhibition—An fMRI ALE meta-analysis. Neurosci. Biobehav. Rev. 129, 245–260 (2021).
- 77. Murphy P. R., Boonstra E., Nieuwenhuis S., Global gain modulation generates time-dependent urgency during perceptual choice in humans. Nat. Commun. 7, 13526 (2016).
- 78. Murphy P. R., Vandekerckhove J., Nieuwenhuis S., Pupil-linked arousal determines variability in perceptual decision making. PLoS Comput. Biol. 10, e1003854 (2014).
- 79. Motelow J., Blumenfeld H., Consciousness and Subcortical Arousal Systems (Elsevier Inc., 2014), pp. 277–298.
- 80. Anderson A. K., et al., Dissociated neural representations of intensity and valence in human olfaction. Nat. Neurosci. 6, 196–202 (2003).
- 81. Small D. M., et al., Dissociation of neural representation of intensity and affective valuation in human gustation. Neuron 39, 701–711 (2003).
- 82. Li X. B., Inoue T., Nakagawa S., Koyama T., Effect of mediodorsal thalamic nucleus lesion on contextual fear conditioning in rats. Brain Res. 1008, 261–272 (2004).
- 83. Zola-Morgan S., Squire L. R., Amnesia in monkeys after lesions of the mediodorsal nucleus of the thalamus. Ann. Neurol. 17, 558–564 (1985).
- 84. Aggleton J. P., Mishkin M., Visual recognition impairment following medial thalamic lesions in monkeys. Neuropsychologia 21, 189–197 (1983).
- 85. Aggleton J. P., Mishkin M., Memory impairments following restricted medial thalamic lesions in monkeys. Exp. Brain Res. 52, 199–209 (1983).
- 86. Gaffan D., Parker A., Mediodorsal thalamic function in scene memory in rhesus monkeys. Brain 123, 816–827 (2000).
- 87. Child N. D., Benarroch E. E., Anterior nucleus of the thalamus. Neurology 81, 1869–1876 (2013).
- 88. Isherwood S. J. S., et al., Multi-study fMRI outlooks on subcortical BOLD responses in the stop-signal paradigm. eLife 12, RP88652 (2024).
- 89. Weindel G., van Maanen L., Borst J. P., Trial-by-trial detection of cognitive events in neural time-series. Imaging Neurosci. 2, 1–28 (2024).
- 90. Cerri D. H., et al., Distinct neurochemical influences on fMRI response polarity in the striatum. Nat. Commun. 15, 1916 (2024).
- 91. O’Doherty J. P., The problem with value. Neurosci. Biobehav. Rev. 43, 259–268 (2014).
- 92. Farries M. A., Faust T. W., Mohebi A., Berke J. D., Selective encoding of reward predictions and prediction errors by globus pallidus subpopulations. Curr. Biol. 33, 4124–4135.e5 (2023).
- 93. Kutlu M. G., et al., Dopamine release in the nucleus accumbens core signals perceived saliency. Curr. Biol. 31, 4748–4761.e8 (2021).
- 94. Kutlu M. G., et al., Dopamine signaling in the nucleus accumbens core mediates latent inhibition. Nat. Neurosci. 25, 1071–1081 (2022).
- 95. Madden M. B., et al., A role for the claustrum in cognitive control. Trends Cogn. Sci. 26, 1133–1152 (2022).
- 96. Gardner M. P. H., Schoenbaum G., Gershman S. J., Rethinking dopamine as generalized prediction error. Proc. R. Soc. Lond. B, Biol. Sci. 285, 20181645 (2018).
- 97. Watabe-Uchida M., Eshel N., Uchida N., Neural circuitry of reward prediction error. Annu. Rev. Neurosci. 40, 373–394 (2017).
- 98. D’Ardenne K., McClure S. M., Nystrom L. E., Cohen J. D., BOLD responses reflecting dopaminergic signals in the human ventral tegmental area. Science 319, 1264–1267 (2008).
- 99. Pauli W. M., et al., Distinct contributions of ventromedial and dorsolateral subregions of the human substantia nigra to appetitive and aversive learning. J. Neurosci. 35, 14220–14233 (2015).
- 100. Zhang Y., Larcher K. M. H., Misic B., Dagher A., Anatomical and functional organization of the human substantia nigra and its connections. eLife 6, 1–6 (2017).
- 101. Hauser T. U., Eldar E., Dolan R. J., Separate mesocortical and mesolimbic pathways encode effort and reward learning signals. Proc. Natl. Acad. Sci. U.S.A. 114, E7395–E7404 (2017).
- 102. Fontanesi L., Gluth S., Rieskamp J., Forstmann B. U., The role of dopaminergic nuclei in predicting and experiencing gains and losses: A 7T human fMRI study. bioRxiv [Preprint] (2019). 10.1101/732560 (Accessed 3 May 2023).
- 103. Garrison J., Erdeniz B., Done J., Prediction error in reinforcement learning: A meta-analysis of neuroimaging studies. Neurosci. Biobehav. Rev. 37, 1297–1310 (2013).
- 104. Goense J. B., Logothetis N. K., Neurophysiology of the BOLD fMRI signal in awake monkeys. Curr. Biol. 18, 631–640 (2008).
- 105. Logothetis N. K., Pauls J., Augath M., Trinath T., Oeltermann A., Neurophysiological investigation of the basis of the fMRI signal. Nature 412, 150–157 (2001).
- 106. Hall C. N., Howarth C., Kurth-Nelson Z., Mishra A., Interpreting BOLD: Towards a dialogue between cognitive and cellular neuroscience. Philos. Trans. R. Soc. Lond. B, Biol. Sci. 371, 20150348 (2016).
- 107. Lohrenz T., Kishida K. T., Montague P. R., BOLD and its connection to dopamine release in human striatum: A cross-cohort comparison. Philos. Trans. R. Soc. Lond. B, Biol. Sci. 371, 20150352 (2016).
- 108. Ferenczi E. A., et al., Prefrontal cortical regulation of brainwide circuit dynamics and reward-related behavior. Science 351, aac9698 (2016).
- 109. Hirsch E. C., Graybiel A. M., Agid Y., Melanized dopaminergic neurons are differentially affected in Parkinson’s disease. Nature 334, 345–348 (1988).
- 110. Koob G. F., Volkow N. D., Neurobiology of addiction: A neurocircuitry analysis. Lancet Psychiatry 3, 760–773 (2016).
- 111. Brühl A. B., Delsignore A., Komossa K., Weidt S., Neuroimaging in social anxiety disorder—A meta-analytic review resulting in a new neurofunctional model. Neurosci. Biobehav. Rev. 47, 260–280 (2014).
- 112. Groenewold N. A., et al., Volume of subcortical brain regions in social anxiety disorder: Mega-analytic results from 37 samples in the ENIGMA-Anxiety Working Group. Mol. Psychiatry 28, 1079–1089 (2023).
- 113. LeDoux J. E., Pine D. S., Using neuroscience to help understand fear and anxiety: A two-system framework. Am. J. Psychiatry 173, 1083–1093 (2016).
- 114. Frank M. J., Seeberger L. C., O’Reilly R. C., By carrot or by stick: Cognitive reinforcement learning in Parkinsonism. Science 306, 1940–1943 (2004).
- 115. Redish A. D., Addiction as a computational process gone awry. Science 306, 1944–1947 (2004).
- 116. Dillon D. G., et al., Peril and pleasure: An RDoC-inspired examination of threat responses and reward processing in anxiety and depression. Depress. Anxiety 31, 233–249 (2014).
- 117. Luckhardt C., et al., Reward processing in adolescents with social phobia and depression. Clin. Neurophysiol. 150, 205–215 (2023).
- 118. Lozano A. M., et al., Deep brain stimulation: Current challenges and future directions. Nat. Rev. Neurol. 15, 148–160 (2019).
- 119. Richey J. A., et al., Common and distinct neural features of social and non-social reward processing in autism and social anxiety disorder. Soc. Cogn. Affect. Neurosci. 9, 367–377 (2014).
- 120. Ez-zizi A., Farrell S., Leslie D., Malhotra G., Ludwig C. J., Reinforcement learning under uncertainty: Expected versus unexpected uncertainty and state versus reward uncertainty. Comput. Brain Behav. 6, 626–650 (2023).
- 121. Barr D. J., Levy R., Scheepers C., Tily H. J., Random effects structure for confirmatory hypothesis testing: Keep it maximal. J. Mem. Lang. 68, 255–278 (2013).
- 122. Kuznetsova A., Brockhoff P. B., Christensen R. H. B., lmerTest package: Tests in linear mixed effects models. J. Stat. Softw. 82, 1–26 (2017).
- 123. Bates D., Mächler M., Bolker B., Walker S., Fitting linear mixed-effects models using lme4. J. Stat. Softw. 67, 1–48 (2015).
- 124. Marques J. P., et al., MP2RAGE, a self bias-field corrected sequence for improved segmentation and T1-mapping at high field. NeuroImage 49, 1271–1281 (2010).
- 125. Caan M. W. A., et al., MP2RAGEME: T1, T2*, and QSM mapping in one sequence at 7 tesla. Hum. Brain Mapp. 40, 1786–1798 (2019).
- 126. Langkammer C., et al., Fast quantitative susceptibility mapping using 3D EPI and total generalized variation. NeuroImage 111, 622–630 (2015).
- 127. Bazin P. L., et al., Denoising high-field multi-dimensional MRI with local complex PCA. Front. Neurosci. 13, 1–10 (2019).
- 128. Miletić S., et al., Charting human subcortical maturation across the adult lifespan with in vivo 7 T MRI. NeuroImage 249, 118872 (2022).
- 129. Esteban O., et al., fMRIPrep. Zenodo. 10.5281/zenodo.852659. Accessed 5 August 2024.
- 130. Esteban O., et al., fMRIPrep: A robust preprocessing pipeline for functional MRI. Nat. Methods 16, 111–116 (2019).
- 131. Gorgolewski K. J., et al., Nipype: A flexible, lightweight and extensible neuroimaging data processing framework in Python. Front. Neuroinf. 5 (2011).
- 132. Gorgolewski K. J., et al., Nipype. Zenodo. 10.5281/zenodo.596855. Accessed 5 August 2024.
- 133. Glover G. H., Deconvolution of impulse response in event-related BOLD fMRI. NeuroImage 9, 416–429 (1999).
- 134. Glover G. H., Li T. Q., Ress D., Image-based method for retrospective correction of physiological motion effects in fMRI: RETROICOR. Magn. Reson. Med. 44, 162–167 (2000).
- 135. Harvey A. K., et al., Brainstem functional magnetic resonance imaging: Disentangling signal from physiological noise. J. Magn. Reson. Imaging 28, 1337–1344 (2008).
- 136. Chang C., Cunningham J. P., Glover G. H., Influence of heart rate on the BOLD signal: The cardiac response function. NeuroImage 44, 857–869 (2009).
- 137. Harrison S. J., et al., A Hilbert-based method for processing respiratory timeseries. NeuroImage 230, 117787 (2021).
- 138. Birn R. M., Smith M. A., Jones T. B., Bandettini P. A., The respiration response function: The temporal dynamics of fMRI signal fluctuations related to changes in respiration. NeuroImage 40, 644–654 (2008).
- 139. Kasper L., et al., The PhysIO toolbox for modeling physiological noise in fMRI data. J. Neurosci. Methods 276, 56–72 (2017).
- 140. Frässle S., et al., TAPAS: An open-source software package for translational neuromodeling and computational psychiatry. Front. Psychiatry 12, 1–25 (2021).
- 141. Behzadi Y., Restom K., Liau J., Liu T. T., A component based noise correction method (CompCor) for BOLD and perfusion based fMRI. NeuroImage 37, 90–101 (2007).
- 142. Mumford J. A., et al., The response time paradox in functional magnetic resonance imaging analyses. Nat. Hum. Behav. 8, 349–360 (2024).
- 143. Smith S., Brady J., SUSAN—A new approach to low level image processing. Int. J. Comput. Vis. 23, 45–78 (1997).
- 144. Woolrich M. W., Ripley B. D., Brady M., Smith S. M., Temporal autocorrelation in univariate linear modeling of FMRI data. NeuroImage 14, 1370–1386 (2001).
- 145. Woolrich M. W., Behrens T. E., Beckmann C. F., Jenkinson M., Smith S. M., Multilevel linear modelling for FMRI group analysis using Bayesian inference. NeuroImage 21, 1732–1747 (2004).
- 146. Stevenson N., et al., Using group level factor models to resolve high dimensionality in model-based sampling. Psychol. Methods, 10.1037/met0000618 (2024).
- 147. Gunawan D., Hawkins G. E., Tran M. N., Kohn R., Brown S. D., New estimation approaches for the hierarchical linear ballistic accumulator model. J. Math. Psychol. 96, 102368 (2020).
- 148. Stevenson N., et al., Joint modelling of latent cognitive mechanisms shared across decision-making domains. Comput. Brain Behav. 7, 1–22 (2024).
- 149. Gelman A., Rubin D. B., Inference from iterative simulation using multiple sequences. Stat. Sci. 7, 457–472 (1992).
- 150. Huang A., Wand M. P., Simple marginally noninformative prior distributions for covariance matrices. Bayesian Anal. 8, 439–452 (2013).
- 151. Stevenson N., et al., EMC2: An R package for cognitive models of choice. PsyArXiv [Preprint] (2025). 10.31234/osf.io/2e4dq_v4 (Accessed 8 July 2025).
- 152. Stevenson N., Miletić S., Forstmann B., Bridging brain and behavior: A step-by-step tutorial to joint modeling with fMRI. PsyArXiv [Preprint] (2025). 10.31234/osf.io/rhfk3_v2 (Accessed 8 July 2025).
- 153. Miletić S., et al., Data from “Joint models reveal human subcortical underpinnings of choice and learning behavior.” OSF. https://osf.io/pc5bm. Deposited 27 June 2025.
Supplementary Materials
Appendix 01 (PDF)