eLife. 2025 Apr 15;13:RP101841. doi: 10.7554/eLife.101841

Neural mechanisms of credit assignment for delayed outcomes during contingent learning

Phillip P Witkowski 1,2,3, Lindsay JH Rondot 1,2,†, Zeb Kurth-Nelson 4, Mona M Garvert 5, Raymond J Dolan 4,6, Timothy EJ Behrens 6,7,8, Erie Boorman 1,2,†
Editors: Michael J Frank9, Michael J Frank10
PMCID: PMC11999693  PMID: 40231604

Abstract

Adaptive behavior in complex environments critically relies on the ability to appropriately link specific choices or actions to their outcomes. However, the neural mechanisms that support the ability to credit only those past choices believed to have caused the observed outcomes remain unclear. Here, we leverage multivariate pattern analyses of functional magnetic resonance imaging (fMRI) data and an adaptive learning task to shed light on the underlying neural mechanisms of such specific credit assignment. We find that the lateral orbitofrontal cortex (lOFC) and hippocampus (HC) code for the causal choice identity when credit needs to be assigned for choices that are separated from outcomes by a long delay, even when this delayed transition is punctuated by interim decisions. Further, we show that when interim decisions must be made, learning is additionally supported by lateral frontopolar cortex (lFPC). Our results indicate that lFPC holds previous causal choices in a ‘pending’ state until a relevant outcome is observed, and the fidelity of these representations predicts the fidelity of subsequent causal choice representations in lOFC and HC during credit assignment. Together, these results highlight the importance of the timely reinstatement of specific causes in lOFC and HC in learning choice-outcome relationships when delays and choices intervene, a critical component of real-world learning and decision making.

Research organism: Human

Introduction

Humans and animals have a remarkable ability to navigate complex environments and infer the likely state of the world from observed phenomena. Such adaptive behavior requires the ability to learn about causal relationships between one’s choices and subsequent outcomes. A key challenge for learning systems in the brain arises when a task involves temporal delays between choices and their outcomes. Cooking is one such task in which many decisions may be made about how to adjust the flavor profile of a dish, but the resultant outcomes of these choices typically will not be evaluated until sitting down to eat. Moreover, cooking often requires juggling multiple sub-tasks simultaneously, meaning that interim decisions need to be performed in between adding an ingredient and observing its effect on the dish’s flavor. In such cases, discerning the causal relationship between a particular choice and possible outcomes is nontrivial. While this ability to link choices and outcomes is critical to success in real-world tasks, little is known about how these links are forged at the neural level.

A large body of pioneering work focusing on the role of the lateral orbitofrontal cortex (lOFC) has highlighted the importance of this region in contingent learning (Gardner and Schoenbaum, 2021; Murray and Rudebeck, 2018; Rushworth et al., 2011). Recent studies in multiple species have emphasized a special role for lOFC in leveraging task knowledge for credit assignment, linking specific reinforcement outcomes to specific past choices (Boorman et al., 2013; Jocham et al., 2016; Lamba et al., 2023; Stalnaker et al., 2015; Sutton and Barto, 2014; Walton et al., 2010). In one key study, lesions to the macaque lOFC impaired the ability of animals to use a model of the task structure in order to track the contingency between specific choices and outcomes they caused, with credit erroneously spreading to non-causal choices (Walton et al., 2010). These results suggest that lOFC is required for using a model of the task structure to form, or update, an association between specific choices and outcomes. Such findings were subsequently replicated and extended in both rats and humans (Costa et al., 2023; Noonan et al., 2017). Other studies in humans have shown that outcome-related blood oxygen-level-dependent (BOLD) activity in lOFC is specific to contingent, but not non-contingent, reward observations (Jocham et al., 2016), and the magnitude of activity reflects the degree to which credit for an outcome is assigned (Boorman et al., 2013; Boorman et al., 2016). Collectively, these findings suggest that computations within the lOFC are critical to credit assignment; however, little is known about the mechanisms by which the lOFC supports assigning credit for outcomes to specific causes.

One possible mechanism by which the brain assigns credit when reinforcement is delayed is by reinstating a representation of the causal choice at the time of feedback. In principle, this could enable the choice representation to be associated with the online encoding of the outcome, potentially via changes in synaptic plasticity between co-active neuronal ensembles. Such coding of past choices specifically at the time of feedback has been identified in macaque lOFC neuronal ensembles, albeit in the absence of any task requirement for contingent learning (Tsujimoto et al., 2009). Likewise, altered dopaminergic prediction error responses in lOFC-lesioned rats were elegantly accounted for by a computational model that incorporates a loss of internal representations of an outcome-linked choice, leading to misattributing value across states (Takahashi et al., 2011). Information about previous choices is also found in regions with which the lOFC shares reciprocal connectivity, particularly the hippocampus (HC) (Barbas and Blatt, 1995; Wikenheiser and Schoenbaum, 2016). A largely separate literature focusing on HC has shown reinstatement of neural activity patterns previously elicited by a stimulus both at the time of choice and reward in sensory pre-conditioning paradigms (Barron et al., 2020; Kurth-Nelson et al., 2015; Wimmer and Shohamy, 2012), and likewise during associative inference and integration (Koster et al., 2018; Park et al., 2020; Zeithamova et al., 2012). Such hippocampal reinstatement of stimulus identity representations might be expected to support lOFC coding of relevant past choices for credit assignment, particularly following lengthier delays (Foerde and Shohamy, 2011; Shohamy et al., 2009; Wang et al., 2020).

In complex tasks where subsequent decisions intervene on the transitions between choices and resultant outcomes, the neural regions supporting credit assignment may extend to encompass regions that also support maintaining information about causal choices pending their resultant outcome. This would allow learning systems to precisely assign credit to causal choices by bridging over interim decisions that may otherwise be inappropriately linked to the observed outcome. A key region for maintaining such ‘pending’ information is the lateral frontal pole (lFPC), which has been implicated in maintaining information about prospective actions or cognitive processes that must be delayed and performed in the future (Burgess et al., 2007; Burgess et al., 2011; Burgess et al., 2022). Other research has shown that lFPC activity reflects the reliability of pending alternative task sets (Donoso et al., 2014; Koechlin et al., 2003; Koechlin and Hyafil, 2007), and that it tracks evidence favoring adapting behavior to specific counterfactual alternatives, and directed exploratory choices, in the future (Badre et al., 2012; Boorman et al., 2009; Boorman et al., 2011; Zajkowski et al., 2017). On this basis, we hypothesized that the lFPC would play a critical role in maintaining information about previous choices that will be needed for future credit assignment during interim decisions.

In the current study, we test these hypotheses using a learning task in which participants must track contingencies between specific choices and outcomes under conditions where choice-outcome transitions are direct following a delay, or indirect and involve an intervening decision. We show that in both conditions, the lOFC and HC reinstate representations of causal choices at the time of feedback. In the indirect condition, this information is critically dependent on representations of the causal choice maintained in a ‘pending state’ in lFPC, which predict subsequent reinstatement in lOFC and HC. Finally, we show that lOFC and HC code task-independent stimulus identity representations during feedback, suggesting a link between coding of a state’s identity and precise credit assignment.

Results

Learning task with direct and indirect choice-outcome transitions

Participants completed a learning task in which they chose between two abstract shapes to obtain one of two distinct outcomes (gift cards to locally available stores rated to be approximately equally desirable). Each shape had a certain probability of leading to one gift card and the inverse probability of leading to the other. These probabilities drifted over time but could be tracked based on the recent choice-outcome observations made in each trial (see Figure 1—figure supplement 1 for probability trajectories and Bayesian model fitting). Participants were informed of how many points each gift card would yield on each trial by colored numbers on the top of the screen, and that these points changed randomly from one trial to the next (Figure 1A). They were further told that at the end of the experiment one trial would be selected at random to count ‘for real’. That is, they would receive the gift card obtained on that trial with a value proportional to the number of points won. Thus, participants were incentivized to maximize their potential winnings on every trial by accurately tracking the probability that each shape would lead to each outcome, but not the history of reward amounts.

Figure 1. Learning Task Design and Behavioral Results.

(A) Two abstract shapes were probabilistically related to each of two outcome identities by independent transition probabilities p1 and p2. (B) Schematic of the direct transition condition. Participants chose one of the two shapes on each trial based on two pieces of information: their estimates of the probability that each would lead to either outcome identity (gift cards) and the randomly generated number of points they could potentially win if that outcome was obtained. The color of each number indicated the identity of the outcome on which that number of points could be won. In the example, green indicates the number of points for the Starbucks gift card, while pink indicates the number of points for iTunes. Next, participants observed the outcome of their choice (the gift card and amount) after a delay. (C) Schematic of the indirect transition condition. Same as (B) except that after participants made their choice they transitioned into another independent decision. After this second decision was made, participants observed the outcome of their first decision. (D) Results of logistic regression analysis predicting the current choice based on previously observed choice-outcome relationships. Each cell represents the combination of a previously observed choice with an observed outcome. The color of each cell shows the value of beta estimates for each combination of previous choice and observed outcome, averaged across participants. Positive values indicate that the choice-outcome pair predicted choosing the same shape again when that shape previously led to the currently desired outcome. (E) Theoretical decomposition of the matrix in (D) into groups of cells which reflect ‘appropriate credit assignment’ given the task structure (orange) and ‘credit spreading’ (pink). (F) Mean (± SEM) of beta coefficients for specific choice-outcome combinations averaged across the groupings of cells shown in E for each condition. 
See Figure 1—figure supplement 1 for model outputs and Bayesian model fitting.


Figure 1—figure supplement 1. Follow-up behavioral analyses.


(A) Example trajectory across the experiment of the belief estimates generated from the Bayesian learner. Top is the trajectory of S1, and the bottom is the trajectory of S2. The true probability trajectory is shown in white and the estimated belief is shown in pink. Color heatmap shows the probability mass for each possible belief in Sx → O1. (B) Comparison of model fits between our Bayesian model and a value-based RL model (vRL) which used an iterative updating procedure to track the value of each shape based on the history of received rewards. The exceedance probability for the Bayesian model was 1, and 0 for the vRL model, suggesting that the Bayesian model, which tracked transition probabilities between choices and outcomes, better fit participants’ actual choices compared to a value-tracking model. (C) Logistic regression curves estimating the change in choice probabilities given the expected value difference between choices. Gray lines show participant-specific curves, and the black line shows the group-level effect (associated t-statistics are calculated across participants). The left side shows the effect in the direct transition condition and the right side shows the indirect transition condition.

The task had two conditions which proceeded in a blocked fashion. In the ‘direct transition’ condition, participants saw the outcome of a choice after a delay period (Figure 1B). In the ‘indirect transition’ condition, participants did not see the outcome of their choice until after another choice had been made, requiring them to delay assigning credit to the initial choice until the appropriate outcome was observed (Figure 1C). Participants were instructed about which condition they were in with a screen displaying ‘Your latest choice’ in the direct transition condition, and ‘Your previous choice’ in the indirect condition. Finally, at the beginning of each block participants viewed each of the two abstract shapes and two outcome stimuli in a random order, without making decisions or observing outcomes. This ‘template’ block allowed us to measure neural responses to stimuli independently of the learning task.

Predicting current choice based on previous choice-outcome relationships

To test whether participants were using the structure of each condition to appropriately assign credit to causal choices, we performed a multiple logistic regression analysis testing the influence of previous choice-outcome combinations on the current choice. For each participant, independently in each condition, we constructed a GLM that predicted the current choice as a function of nine different combinations of previous choices and outcomes (Equation 1). For example, the first regressor predicted the current choice based on the previous choice and the previous outcome (trial t-1). These values were coded as 1 if the past choice led to the currently desired outcome, assumed to be the outcome with the largest monetary point value on the current trial, and –1 if it did not (results were virtually identical if we used the participant-specific indifference point (a) to define the desired outcome instead (see Equation 9)). The second regressor predicted the current choice based on the previous choice (t-1) and the outcome received two trials in the past (t-2), and so on for all nine combinations of previous choices and outcomes covering the previous three trials.
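The regressor construction described above can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name, the ±1 coding of which shape was chosen, and the product form of each regressor are assumptions introduced here so that a positive weight corresponds to repeating a choice that previously led to the currently desired outcome. Equation 1 remains the authoritative definition.

```python
def lagged_regressors(choices, outcomes, desired, max_lag=3):
    """Build the 3 x 3 = 9 choice-outcome regressors for each trial.

    choices[t]  : 0 or 1, shape chosen on trial t
    outcomes[t] : 0 or 1, gift card whose outcome was observed on trial t
    desired[t]  : 0 or 1, gift card with the larger point value on trial t
    Returns (rows, targets). Each row holds nine values ordered as
    (choice lag 1, outcome lag 1), (choice lag 1, outcome lag 2), and so on.
    """
    rows, targets = [], []
    for t in range(max_lag, len(choices)):
        row = []
        for i in range(1, max_lag + 1):            # lag of the past choice
            # Assumed coding: +1 if shape 1 was chosen, -1 if shape 0.
            shape_sign = 1 if choices[t - i] == 1 else -1
            for j in range(1, max_lag + 1):        # lag of the past outcome
                # +1 if that outcome is the one desired now, -1 otherwise.
                match = 1 if outcomes[t - j] == desired[t] else -1
                row.append(shape_sign * match)
        rows.append(row)
        targets.append(choices[t])                 # current choice to predict
    return rows, targets

# Toy sequence of six trials (hypothetical data, for illustration only).
choices = [1, 0, 1, 1, 0, 1]
outcomes = [0, 1, 1, 0, 0, 1]
desired = [1, 1, 0, 1, 0, 0]
X, y = lagged_regressors(choices, outcomes, desired)
```

The resulting rows and targets could be passed to any logistic regression routine, fit separately for each participant and condition.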

In the direct transition condition, we observed significant positive effects along the diagonal of the matrix (choice(t-1) × outcome(t-1): β=6.09, t(19) = 4.81, p<0.001; choice(t-2) × outcome(t-2): β=8.78, t(19) = 5.41, p<0.001; choice(t-3) × outcome(t-3): β=6.76, t(19) = 4.16, p<0.001; Figure 1D), indicating that participants assigned credit for each outcome to the choice made in the same trial. In the indirect transition condition, current choices were significantly predicted by the most recently observed outcomes combined with choices made in the trial previous to those outcomes (choice(t-2) × outcome(t-1): β=4.20, t(19) = 2.92, p<0.01; choice(t-3) × outcome(t-2): β=5.07, t(19) = 4.75, p<0.001). Furthermore, the mean of the β-values which reflect appropriate credit assignment in each condition was significantly higher than the mean of the β-values which represented credit spreading (direct transition condition: t(19) = 5.39, p<0.001; indirect transition condition: t(19) = 4.34, p<0.001; Figure 1E and F). Follow-up analysis showed that participants’ choices in each trial integrated expectations about the probability of receiving a particular outcome and its magnitude and did not rely on estimates of a cached option value (Figure 1—figure supplement 1). These results show that participants used the appropriate task structure when assigning credit for observed outcomes in each condition.

Next, we compared the relative precision of credit assignment between our behavioral conditions, where we predicted credit assignment would be less precise in the indirect transition condition compared to the direct transition condition, owing to additional task complexity. We found that β-values representing appropriate credit assignment in the direct transition condition were higher than those in the indirect transition condition (t(19) = 1.81, p<0.05). However, β-values in cells that represent credit spreading in the direct transition condition were not significantly lower than those in the indirect transition condition (t(19) = 1.11, p=0.14). These results indicate that credit assignment was less precise in the indirect transition condition compared to the direct transition condition, despite each being appropriate for the respective task structure overall.
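As a rough sketch of the grouping used in these comparisons, the snippet below averages one participant's β-values over 'appropriate' and 'credit spreading' cells. The cell sets follow the effects reported in the text; treating every remaining cell as credit spreading is an assumption made here (Figure 1E defines the actual grouping), and the function name and β-values are hypothetical.

```python
# Cells are indexed as (choice lag, outcome lag), each lag in 1..3.
APPROPRIATE = {
    "direct": {(1, 1), (2, 2), (3, 3)},   # outcome credited to same-trial choice
    "indirect": {(2, 1), (3, 2)},         # outcome credited to the choice one trial earlier
}

def group_means(betas, condition):
    """Average betas over appropriate and credit-spreading cells.

    betas: dict mapping (choice_lag, outcome_lag) -> beta estimate.
    Every cell not listed as appropriate is treated as credit spreading
    (an assumption made for this sketch).
    """
    good = [b for cell, b in betas.items() if cell in APPROPRIATE[condition]]
    spread = [b for cell, b in betas.items() if cell not in APPROPRIATE[condition]]
    return sum(good) / len(good), sum(spread) / len(spread)

# Hypothetical betas for one participant: large on the diagonal, small elsewhere.
betas = {(i, j): (2.0 if i == j else 0.5) for i in range(1, 4) for j in range(1, 4)}
appropriate_mean, spreading_mean = group_means(betas, "direct")
```

Per-participant means computed this way could then be compared across participants or conditions with a t-test.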

Causal choice codes are reinstated in lOFC and HC when viewing the outcome of choices

For the direct transition condition, our main hypothesis was that lOFC codes for the specific causal choice when participants view the outcome of their choice. We also reasoned that, due to the delay between choice and feedback, this lOFC choice code would be supported by choice reinstatement in the interconnected HC (Barbas and Blatt, 1995; Wimmer and Shohamy, 2012). We tested this hypothesis by training a linear support vector machine (SVM) to distinguish BOLD activity patterns at the time of feedback based on the previously chosen shape, cross-validated across scanning runs (see Methods for details on decoding procedure). We used a searchlight analysis within a priori defined ROIs (see Figure 2—figure supplement 1) for lOFC and HC to estimate decoding accuracy for each voxel within the ROI (Kriegeskorte et al., 2008).
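The cross-validation scheme can be illustrated with a small sketch. To stay self-contained it swaps the linear SVM for a nearest-centroid rule, so it shows only the leave-one-run-out logic and not the classifier used in the study. All names and the toy data are hypothetical.

```python
def centroid(patterns):
    """Mean pattern across trials (each pattern is a list of voxel values)."""
    n = len(patterns)
    return [sum(p[v] for p in patterns) / n for v in range(len(patterns[0]))]

def sq_dist(a, b):
    """Squared Euclidean distance between two patterns."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def leave_one_run_out_accuracy(patterns, labels, runs):
    """Cross-validated accuracy for decoding a binary label.

    patterns[k]: voxel pattern at feedback on trial k
    labels[k]  : 0 or 1, shape chosen earlier in that trial
    runs[k]    : scanning run that trial k belongs to
    A nearest-centroid rule stands in for the linear SVM (an assumption).
    """
    correct = total = 0
    for held_out in sorted(set(runs)):
        train = [k for k in range(len(runs)) if runs[k] != held_out]
        test = [k for k in range(len(runs)) if runs[k] == held_out]
        # Fit class centroids on the training runs only.
        cents = {
            c: centroid([patterns[k] for k in train if labels[k] == c])
            for c in (0, 1)
        }
        for k in test:
            pred = min((0, 1), key=lambda c: sq_dist(patterns[k], cents[c]))
            correct += int(pred == labels[k])
            total += 1
    return correct / total

# Toy data: two runs, four trials each, two voxels (hypothetical values).
patterns = [[1.0, 0.1], [0.9, 0.0], [0.1, 1.0], [0.0, 0.8],
            [0.8, 0.2], [1.1, 0.1], [0.2, 0.9], [0.1, 1.1]]
labels = [0, 0, 1, 1, 0, 0, 1, 1]
runs = [1, 1, 1, 1, 2, 2, 2, 2]
accuracy = leave_one_run_out_accuracy(patterns, labels, runs)
```

In a searchlight analysis, a function of this kind would be evaluated on the voxels inside each sphere, with the resulting accuracy assigned to the sphere's center voxel.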

We found evidence for choice decoding in the predicted network of regions. Specifically, we found significant and marginally significant decoding of the causal choice in left ([x,y,z] = [-26, 42, -8], t(19) = 4.22, pTFCE <0.05 ROI-corrected using threshold-free cluster enhancement (TFCE) correction; Smith and Nichols, 2009) and right ([x,y,z] = [24, 46, -8], t(19) = 3.45, pTFCE = 0.081 ROI-corrected) lOFC, respectively (Figure 2A). A similar pattern was also apparent in the HC, where right HC showed significant decoding ([x,y,z] = [36, -20, -16], t(19) = 4.02, pTFCE <0.05 ROI-corrected), while left HC showed a marginal effect ([x,y,z] = [-22, -10, -24], t(19) = 2.86, pTFCE = 0.080 ROI-corrected). Together, these results show that the lOFC and HC represent the causal choice at the time when credit is assigned in the direct condition of our task.

Figure 2. lOFC and HC carry representations of the causal choice when viewing outcomes.

Left side shows the analysis scheme for decoding representations of the causal choice at feedback in the direct transition condition. An SVM decoder was used to differentiate trials at the time of the outcome (purple) based on the causal choice selected during the ‘choice period’ (cyan). The right side shows axial and coronal slices through a t-statistic map showing significant decoding in OFC and HC during feedback. For illustration, all maps are displayed at a threshold of t(19) = 2.54, p<0.01 uncorrected. All effects survive small volume correction in a priori defined anatomical ROIs. See Figure 2—figure supplements 1–3 for ROI definition and Figure 2—figure supplement 4 for power analysis.


Figure 2—figure supplement 1. Pre-selected anatomical ROIs.


Illustrations of pre-selected anatomical ROIs taken from Neubert et al., 2015. The lOFC ROI corresponds to indices 9 and 30; the lFPC ROI corresponds to indices 14 and 35. The HC ROI was defined in Yushkevich et al., 2015.
Figure 2—figure supplement 2. Functionally defined ROIs in the direct transition condition.


(A) Despite having a priori defined anatomical ROIs for our decoding analysis of the causal choice, we wanted to test whether our results depended on these ROI definitions by using a data-driven approach. Here, we trained an SVM classifier to decode representations of the causal choice in run 1 of the direct transition condition, then tested the decoder on run 2 to find regions of the orbitofrontal cortex (OFC) and hippocampus (HC) that significantly decoded causal choice representations at a significance level of t(19) > 2.54, p<0.01, uncorrected. We then used these regions as ROIs for a separate analysis which trained the classifier in run 1 and tested the classifier in run 2. (B) ROIs generated from the same procedure as described in A, but with the roles of the two runs in training and testing switched.
Figure 2—figure supplement 3. Main effect of choice decoding accuracy at the time of feedback TFCE corrected in each run of the direct transition condition.


(A) Regions of the OFC showing significant decoding of the causal choice in run 1 of the direct transition condition. Significance was tested using TFCE correction over voxels within the ROI generated from run 2, using the procedure described above (Figure 2—figure supplement 2). For illustration, we show voxels that survive at a threshold of t(19) = 1.73, p<0.05 uncorrected. (B) Same as A but for voxels in run 2, using the ROI generated from run 1.
Figure 2—figure supplement 4. Power analysis for Reinstatement Effect in the lOFC.


Power analysis using an independent data set. Twenty-eight participants completed an associative learning task, in which they learned the causal associations between four different choices and two food rewards. We estimated voxel activity at the time of the outcome for each trial and tested for multivariate patterns of the causal choice in the lOFC, using the same procedures described in the main text (see Methods). We began by drawing 1000 samples of participants of size N, with replacement, for values of N ranging from 15 to 25. We then tested for significant decoding of the causal choice within each subset using small-volume TFCE correction. Finally, we calculated the proportion of these samples that were at or below a significance level of pTFCE <0.05.
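The resampling logic of such a power analysis can be sketched as below. The significance test is passed in as a function because the small-volume TFCE procedure is not reproduced here. The example uses a fixed t cutoff on hypothetical per-participant accuracies purely as a stand-in, and all names are illustrative.

```python
import math
import random

def one_sample_t(values, null_mean):
    """t statistic for the mean of `values` against `null_mean`."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    if var == 0:
        # Degenerate sample: no variance, so decide on the mean alone.
        return math.inf if mean > null_mean else 0.0
    return (mean - null_mean) / math.sqrt(var / n)

def bootstrap_power(accuracies, n, is_significant, n_samples=1000, seed=0):
    """Proportion of resampled groups of size n that reach significance."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        # Draw n participants with replacement.
        sample = [rng.choice(accuracies) for _ in range(n)]
        hits += int(bool(is_significant(sample)))
    return hits / n_samples

def t_above_cutoff(sample):
    """Stand-in test: fixed t cutoff instead of small-volume TFCE (assumption)."""
    return one_sample_t(sample, 0.5) > 2.0

# Hypothetical decoding accuracies for 28 participants (chance = 0.5).
accuracies = [0.50 + 0.01 * (k % 7) for k in range(28)]
power_by_n = {n: bootstrap_power(accuracies, n, t_above_cutoff) for n in range(15, 26)}
```

Plotting `power_by_n` against N would give a power curve of the kind summarized in this supplement.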

Pending item representations in lFPC during indirect transitions predict credit assignment in lOFC

The indirect transition condition allowed us to test whether similar reinstatement mechanisms, as described above, support credit assignment when choice-outcome transitions are punctuated by interim decisions. We anticipated that the structure of the indirect transition condition would render credit assignment more difficult compared to the direct transition condition; a prediction borne out by our behavioral analysis of learning (Figure 1F). Repeating the causal choice decoding analysis on this condition did not reveal a significant effect in any a priori defined ROI (all pTFCE >0.05 ROI corrected), nor did we find significant decoding elsewhere in the brain (all pTFCE >0.05 whole brain corrected). However, a key attribute of this condition is that causal choices must be held in a pending state during interim choices until a prospective outcome is observed. Thus, we reasoned that the fidelity of credit assignment at the time of feedback would be intimately related to the fidelity with which representations were maintained during the interim decision.

Following previous work suggesting that prospective representations of to-be-completed tasks are supported by lFPC (Burgess et al., 2011; Koechlin and Hyafil, 2007), we predicted that lFPC would hold causal choices in a ‘pending state’ when credit assignment needs to be deferred until the resulting outcome is observed. To test this hypothesis, we used a linear SVM to classify neural activity at the time of feedback based on the immediately preceding choice. Note that in this condition the immediately preceding choice is not the cause of the currently observed outcome, but is the cause of the outcome for which credit will be assigned in the next trial. We call this the ‘pending causal choice’. Our analysis revealed a cluster of voxels specifically within the right lFPC ([x,y,z] = [28, 54, 8], t(19) = 3.74, pTFCE <0.05 ROI-corrected; left hemisphere all pTFCE >0.1, Figure 3A), consistent with right lFPC coding for the pending causal choice at feedback time, precisely when the outcome of the prior causal choice needed to be evaluated.

Figure 3. lFPC carries representations of the pending causal choice during indirect transitions and predicts credit assignment in the lOFC and HC.

(A) Left side shows the analysis scheme for decoding information about the causal choice in ‘pending state’ (pink) in the indirect transition condition. We decoded information about the previous choice during the feedback period, during which the causal choice should be ‘pending’ credit assignment in the next trial. The image on the right shows a coronal slice through a t-statistic map, showing significant decoding in lFPC. (B) The analysis scheme for the information connectivity analysis which uses the trial-by-trial fidelity of causal choice representations in the ‘pending state’ (pink) to predict the fidelity of these same choices when the outcome is observed (purple). The right side shows axial and coronal slices of a t-statistic map showing effects in lOFC and HC. All maps are displayed using the same conventions as Figure 2 and all effects survive small volume correction in a priori defined anatomical ROIs (for whole brain analysis, see Figure 3—figure supplement 2). (C) Axial (left) and coronal (right) slices through a t-statistic map showing the results of a control analysis in which we test the proportion of correct classifications of causal choice information in OFC and HC at the time of the outcome for trials in which the lFPC showed correct classification for the causal choice during pending trials. The proportion of correct trials was compared to a permuted baseline of randomly drawn trials for each participant, then combined over participants to create a t-statistic. (D) Secondary control analysis in which we reran the classification analysis for causal choice information at the time of outcome, but only on trials where lFPC was found to correctly decode pending causal choice information. Note that this test is different from A because we allowed the classifier to create a new hyperplane separating categories for only those trials in which the lFPC decoding was ‘correct’.
For illustration, all maps are displayed at a threshold of t(19) = 2.54, p<0.01 uncorrected. All effects survive small volume correction in a priori and functionally defined anatomical ROIs. See Figure 3—figure supplements 1 and 2 for ROI definition and whole brain searchlight.


Figure 3—figure supplement 1. Significant information connectivity between lFPC and OFC in functionally defined ROI from direct transition condition.


(A) We did not observe significant decoding of the causal choice in a bilateral OFC ROI defined by a significant cluster in the indirect transition condition. Thus, we used the accuracy map for decoding choices at feedback during the direct transition condition (t(19) > 1.73; p<0.05) in the OFC, averaged across runs. (B) We then used those clusters as ROIs for TFCE correction for regions of the lOFC that showed significant information connectivity with lFPC. We did this by testing for significant correlations between the trial-by-trial fidelity of pending representations in the lFPC and causal choice representation during feedback in lOFC (see Methods).
Figure 3—figure supplement 2. Exploratory information connectivity analysis for ‘Indirect transition condition’.


To ascertain whether additional regions may be involved in credit assignment beyond those that formed the focus of our study, we repeated the analysis described in Figure 3 but using a whole brain searchlight procedure. All aspects of the analyses were the same as those previously conducted except that we corrected for multiple comparisons at the whole brain level using TFCE. For the ‘direct transition condition’, we found no additional regions that showed high decoding for the causal choice at the time of outcome. However, for the ‘indirect transition condition’ we identified a region of medial OFC (mOFC) which showed information about the causal choice that was predicted by pending representation in lFPC (pTFCE <0.05). The left panel shows a coronal slice through a t-statistic map, thresholded using the same conventions as Figure 3; the right panel shows a sagittal slice through the same map. These results suggest a potential role for mOFC in credit assignment uniquely during the ‘indirect transition condition’.

To test whether pending choice information held in lFPC was directly related to the causal choice information coded during subsequent credit assignment we used an ‘information connectivity’ (IC) analysis, which seeks to identify how information is shared between brain regions (Coutanche and Thompson-Schill, 2013). Specifically, we tested the correlation between the fidelity of the previous choice representation when in a pending state, and the same causal choice representation during subsequent credit assignment. We began using an SVM to classify representations of the causal choice during the interim feedback period in voxels in the lFPC that were shown to code this information in our previous analysis (thresholded at t(19) = 2.54, p<0.01).

Note that this relatively liberal threshold simply allows for the inclusion of more voxels for a statistically independent test in a left-out set of trials, thereby obviating selection bias. In a left-out set of trials, we calculated the distances between the estimated hyperplane and trial-level voxel activation patterns, and then signed these distances such that positive distances reflected ‘correct’ classifications and negative distances reflected ‘incorrect’ classifications. These signed distances allow us to relate both successes and failures in decoding information between regions. Next, we applied the same method to quantify and sign the distances when decoding the same causal choices at the time of credit assignment – that is, when viewing the relevant outcome in the next trial. Finally, we correlated the decoding distances of causal choices in a pending state in lFPC with decoding distances of these choices during credit assignment in our lOFC and HC ROIs. This allowed us to assess whether the fidelity of pending causal choice representations in lFPC predicts the fidelity of representations during credit assignment in the lOFC and HC.
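The signed-distance and correlation steps described above can be sketched as follows, assuming a linear classifier has already been fit in each region so that its weight vector and bias are available. The weights, patterns, and function names below are hypothetical, and the sketch omits the train/test split over left-out trials.

```python
import math

def signed_distance(pattern, weights, bias, label):
    """Distance of a trial pattern from the hyperplane w.x + b = 0.

    label is +1 or -1 (the true causal choice). The result is positive
    when the pattern lies on the correct side and negative otherwise.
    """
    norm = math.sqrt(sum(w * w for w in weights))
    score = sum(w * x for w, x in zip(weights, pattern)) + bias
    return label * score / norm

def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

# Hypothetical classifiers (weights, bias) for a seed and a target region.
lfpc_w, lfpc_b = [1.0, -1.0], 0.0
lofc_w, lofc_b = [3.0, 4.0], 0.0

# Same causal choices, decoded while pending (trial t) and at credit assignment (t+1).
labels = [1, -1, 1, -1]
lfpc_patterns = [[2.0, 0.0], [0.0, 1.0], [0.5, 1.0], [1.0, 3.0]]
lofc_patterns = [[1.0, 1.0], [-1.0, 0.0], [-1.0, 0.5], [0.0, -2.0]]

lfpc_dist = [signed_distance(p, lfpc_w, lfpc_b, y) for p, y in zip(lfpc_patterns, labels)]
lofc_dist = [signed_distance(p, lofc_w, lofc_b, y) for p, y in zip(lofc_patterns, labels)]
ic = pearson(lfpc_dist, lofc_dist)   # trial-by-trial information connectivity
```

A positive `ic` indicates that trials decoded with higher fidelity in the seed region also tend to be decoded with higher fidelity in the target region.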

This analysis revealed strong IC between representations in lFPC at feedback on trial t and the representations in lOFC and HC during feedback on trial t+1. Specifically, we found significant correlations in decoding distance between lFPC and bilateral lOFC ([x,y,z] = [-32, 24, -22], t(19) = 3.81; [x,y,z] = [20, 38, -14], t(19) = 3.87; pTFCE <0.05 ROI-corrected) and bilateral HC ([x,y,z] = [-28, -10, -24], t(19) = 3.41; [x,y,z] = [22, -10, -24], t(19) = 4.21; pTFCE <0.05 ROI-corrected), Figure 3C. Subsequent analyses confirmed that this effect was due to these regions showing a significant increase in positive (correct) decoding in trials where pending information could be positively (correctly) decoded in lFPC, and not simply due to a reduction in incorrect information fidelity (see Figure 3C and D). This finding is consistent with the coding of the causal choice during feedback in lOFC and HC being dependent on that causal choice being faithfully maintained in a pending state in the lFPC.

HC represents task-independent stimulus identity at feedback

Next, we tested whether the content of past choice coding at feedback includes a stimulus identity code that is reinstated during credit assignment. To test for task-independent representations of the causal stimuli, we trained a linear SVM to distinguish neural patterns evoked when participants passively viewed each shape in ‘template trials’ (see Methods). Importantly, these were presented outside the context of the learning task and were not connected to a specific action or outcome. We then tested the classifier on neural patterns evoked at the time of feedback during the learning task. This revealed significant decoding of the causal stimulus identity at the time of feedback when averaged across direct and indirect conditions, in the left HC (Figure 4A; [x,y,z] = [-26,–16, –16], t(19) = 5.20, pTFCE <0.001 ROI-corrected; right hemisphere all pTFCE >0.1). Follow-up analyses showed a marginally significant effect in the direct transition condition alone ([x,y,z] = [-24,–16, –14], t(19) = 3.41, pTFCE = 0.08 ROI-corrected), and a significant effect in the indirect transition condition alone ([x,y,z] = [-28,–16, –18], t(19) = 3.65 pTFCE <0.05). These results show that when observing an outcome, the HC reinstates task-independent representations of causal stimuli, suggesting a role for the HC in retrieving the causal stimulus identity during credit assignment.

Figure 4. Task-independent representations of causal stimuli in HC at feedback.

(A - left) Schematic of the decoding procedure. In task-independent ‘template trials’, participants passively viewed images corresponding to the two choice stimuli and two outcome stimuli in the main task (for more information see Figure 4—figure supplement 1). We used these trials to train an SVM to differentiate stimuli outside the task context and then tested for representations of the causal choice stimulus at the time of feedback during the learning task. (A - right) A coronal slice through a t-statistic map showing regions of the HC with significantly above chance decoding for the causal choice stimulus identity at the time of feedback, across conditions. In this figure, ‘CA’ refers to ‘credit-assignment’. (B) Analysis scheme for generating each participant’s overall credit assignment precision. β-values for each participant were taken from the behavioral model predicting current choices given all combinations of the previous three choices and outcomes (Equation 1). Each participant’s pattern of β-values (left side matrices) was correlated with a matrix representing an optimal pattern of regression betas given the task structure (right side matrices). The optimal matrix was a binary matrix with ones where credit should be assigned for a given outcome and zeros everywhere else. (C) Axial slice through a t-statistic map showing regions where decoding of the stimulus identity was significantly correlated with estimates of credit assignment precision. All maps are displayed using the same conventions as Figure 2 and all effects survive small volume correction in a priori defined anatomical ROIs. See Figure 4—figure supplements 1 and 2 for catch and bonus trials definition.


Figure 4—figure supplement 1. Depiction of catch trials.


To ensure that participants were paying attention, we included valuable catch trials in the passive viewing ‘template task’. Participants were asked to report which image out of the four (2 gift cards and 2 stimuli) was the last one presented on the screen. They were endowed with an extra £10, from which we removed £1 for every incorrect response. There were four catch trials per template run.
Figure 4—figure supplement 2. Depiction of bonus trials.


To ensure that participants were paying attention, we included valuable catch trials in the passive viewing ‘template task’. Participants were asked to report which image out of the four (2 gift cards and 2 stimuli) was the last one presented on the screen. They were endowed with an extra £10, from which we removed £1 for every incorrect response. There were four catch trials per template run. The decision task included ‘bonus trials’ in which participants could predict which gift card they expected to see on the subsequent feedback screen given their choice. They were given £3 extra on the final gift card that was given to them for every correct answer. The first run of the direct transition condition had two catch trials; the second run had one. Both runs of the indirect transition condition had one catch trial each.

We reasoned further that if the HC supports credit assignment by evoking task-independent identity representations, then the extent to which this information is coded in the HC should be intimately tied to behavioral estimates of credit assignment precision. Alternatively, identity representations in the HC might support credit assignment processes in lOFC, such that the extent to which this information is represented in lOFC is predictive of precise credit assignment. To test these predictions, we estimated each participant’s overall credit assignment precision by correlating their pattern of β-values from the logistic regression models predicting choice with those of an ‘ideal learner’ (Figure 4B). The pattern for an ideal learner was taken to be 1 for any choice-outcome combination that reflected the true task structure, and 0 everywhere else. Higher correlations between these patterns meant that participants appropriately assigned credit to causal choices without attribution spreading to non-causal choices. We then correlated each participant’s estimated credit assignment precision with the average decoding accuracy in HC and lOFC. We found that there was a significant correlation between credit assignment precision and decoding accuracy of the causal stimulus identity reinstatement in lOFC ([x,y,z] = [–24, 34,–16], t(19) = 3.24, pTFCE <0.05 ROI-corrected), but not HC (all pTFCE >0.09 ROI-corrected; Figure 4C). These results suggest that the extent to which identity information is reinstated in lOFC is directly related to the precision with which participants link appropriate choices and outcomes together.
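The credit assignment precision measure described above can be sketched as a correlation between a participant's β pattern and a binary ideal pattern. This is a minimal illustration only: the 3 × 3 layout, the identity matrix used as the ideal pattern for the direct transition condition, and the example β-values are assumptions made for the sketch, not values from the study.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def credit_assignment_precision(betas, ideal):
    """Correlate a flattened matrix of regression betas with the flattened
    binary matrix of an ideal learner."""
    flat_betas = [b for row in betas for b in row]
    flat_ideal = [v for row in ideal for v in row]
    return pearson_r(flat_betas, flat_ideal)

# Assumed ideal pattern for the direct condition: each outcome is credited
# only to the choice made on the same trial (ones on the diagonal).
ideal_direct = [[1, 0, 0],
                [0, 1, 0],
                [0, 0, 1]]

# Hypothetical participants: one with focused credit, one with spread credit.
precise_betas = [[0.90, 0.05, 0.00],
                 [0.10, 0.50, 0.00],
                 [0.00, 0.05, 0.30]]
spread_betas = [[0.5, 0.4, 0.3],
                [0.4, 0.4, 0.3],
                [0.3, 0.3, 0.3]]

precise_score = credit_assignment_precision(precise_betas, ideal_direct)
spread_score = credit_assignment_precision(spread_betas, ideal_direct)
```

A participant whose β-values concentrate on the causal choice-outcome pairs obtains a higher score than one whose credit spreads to non-causal pairs.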

Discussion

Flexible decision making in dynamic environments requires an ability to learn choice-outcome relationships across prolonged delays, which may often be punctuated by interim decisions. Understanding how the brain assigns credit for specific outcomes, and forges connections with their causal choices, is essential for models of learning and decision-making that seek to explain how organisms implement such goal-directed behaviors. The current study reveals critical roles of the lOFC and HC in such credit assignment by showing that these regions specifically represent the causal choice at the time the outcome is observed. Importantly, we show that when credit assignment must be delayed due to an intervening choice, representations of the causal stimulus are maintained in a ‘pending state’ in lFPC. The fidelity of these representations determines the strength of causal choice representations in lOFC and HC when the outcome is subsequently observed. Finally, we show that the content of representations in HC includes the task-independent stimulus identities of the causal choice at the time of feedback, and the extent to which these are also represented in lOFC predicts precise credit assignment. Together, these results show that lOFC and HC adaptively use the task structure to associate identity-specific representations of causal choices to their resultant outcomes during learning and provide novel evidence for interactions between learning systems and lFPC in elaborated task structures which emulate real-world complexity.

Our finding that the lOFC instantiates a representation of the causal stimulus at the time of feedback contributes to a broader literature concerning the role of the lOFC in credit assignment. Previous research has shown that monkeys with lOFC lesions exhibit deficits in appropriately assigning credit to causal choices (Walton et al., 2010). Similarly, activity in human lOFC has been consistently associated with learning about contingencies between choices and rewards (Boorman et al., 2016; Jocham et al., 2016; Lamba et al., 2023; Noonan et al., 2017; Witkowski et al., 2022). We add to this literature by showing that the lOFC and HC contain specific multivariate patterns for inferred causal choices when an outcome is observed, suggesting that these regions are involved in updating links between choices and outcomes. Our results from the ‘indirect transition’ condition show that these patterns are not merely representations of the most recent choice but are representations of the causal choice given the current task structure, and may exist alongside representations of the task structure, in the lOFC and elsewhere (Boorman et al., 2021; Park et al., 2020; Schuck et al., 2016; Seo and Lee, 2010). These findings highlight a key role for the lOFC and HC in creating links between causal states and goal-states (Boorman et al., 2021; Gardner and Schoenbaum, 2021; Howard and Kahnt, 2021; Wang and Kahnt, 2021), and suggest that these regions use the specific task structure to construct causal associations between states.

While our study was designed to focus on the complexity of assigning credit in tasks with different known causal structures, another important component of real-world credit assignment is temporal ambiguity. To isolate the mechanisms which create associations between specific choices and specific outcomes, we instructed participants on the causal structure of each task, removing temporal ambiguity about the causal choice. However, our results are largely congruent with previously reported results in tasks that dissolved the typical experimental trial structure, producing temporal ambiguity, and which observed more pronounced spreading of effect, in addition to appropriate credit assignment (Jocham et al., 2016). Namely, this study found that activation in the lOFC increased only when participants received rewards contingent on a previous action, an effect that was more pronounced in subjects whose behavior reflected more accurate credit assignment. This suggests a shared lOFC mechanism for credit assignment in different types of complex environments. Whether these mechanisms extend to situations where the temporal causal structure is completely unknown remains an important question.

Importantly, we present novel evidence that representations of ‘pending’ causal choices are stored online in the lFPC and predict the strength of causal choice representations at the time of the outcome. Our results fit precisely with theoretical proposals of lFPC functions, which propose that this region is involved in ‘prospective memory’ and tracking alternative behaviors or task sets during ongoing behaviors which may be returned to in the future (Boorman et al., 2009; Burgess et al., 2011; Koechlin and Hyafil, 2007; Tsujimoto et al., 2011). In the ‘indirect transition’ condition, participants needed to delay assigning credit when the first outcome was presented but return to this process when a prospective outcome was observed in the future. We show that when participants viewed outcomes for an unrelated choice, the lFPC held the content of the pending causal choice. These ‘pending’ representations predicted the strength of subsequent causal choice representations in lOFC and HC during the next feedback period, replicating the same network we observed in the ‘direct transition’ condition. The results extend prior work by showing that lFPC activity not only reflects statistics related to the evidence favoring pending options (Badre et al., 2009; Boorman et al., 2009; Boorman et al., 2011; Donoso et al., 2014), but the content of information held in a pending state. One interpretation of these results is that the lFPC actively protects information about causal choices when potentially interfering information must be processed. Future studies will be needed to determine if the lFPC’s contributions are specific to these instances of potential interference, and whether this is a passive or active process. Nonetheless, the findings provide new evidence for the involvement of the lFPC in learning within complex task structures where the transitions between choices and outcomes are indirect - structures which abound in the real world.

Although we show evidence that lFPC is involved in maintaining specific content about causal choices during interim choices, the limited temporal resolution of fMRI makes it difficult to tell if other regions may be supporting the learning processes at timescales not detectable in the BOLD response. Thus, it is possible that the network of regions supporting credit assignment in complex tasks may be much larger. Our results provide a critical first step in discerning the nature of interactions between cognitive subsystems that make different contributions to the learning process in these complex tasks.

A revealing aspect of our study was the inclusion of ‘template’ trials, which allowed us to measure task-independent neural responses to the stimuli used during the learning task. By training a classifier to decode stimulus representation during passive viewing, we were able to test which regions of the brain coded the specific stimulus identity of the causal choices during credit assignment. Consistent with previous accounts of hippocampal involvement in associative learning and inference (Barron et al., 2020; Kurth-Nelson et al., 2015; Luettgau et al., 2020; Mack and Preston, 2016; Ranganath and Ritchey, 2012; Schuck and Niv, 2019; Wimmer and Shohamy, 2012), we found significant decoding of task-independent choice identities in HC across participants in both direct and indirect conditions. This suggests that the HC retrieves a representation of the stimulus identity to bind together outcomes with causal choice information at the time of credit assignment, supporting the idea that the HC is involved in linking together previous experiences of sensory information (McClelland et al., 1995). Interestingly, recent work has shown that HC neuronal ensembles code a veridical representation of stimulus identities and predicted outcomes, which are critical to inference-guided choices (Barron et al., 2020). Together, these findings imply that a state’s identity relationships constructed during credit assignment in the HC may be critical for future simulation of state-to-state transitions during outcome-guided inferences.

Interestingly, we found that the strength with which a stimulus identity can be decoded in the lOFC was correlated with behavioral measures of credit assignment, but not in HC. Recent work has shown that synchronized theta oscillations in macaques support information transfer from HC to the lOFC during value learning (Knudsen and Wallis, 2020). Disrupting these signals leads to learning deficits, suggesting that these regions work in concert to support value learning based on a relational cognitive map of the task. This synchrony between regions also finds support in human work showing strong functional connectivity and shared information between the anterior medial temporal cortex and OFC (Barnett et al., 2021; Bouffard et al., 2021; Ranganath and Ritchey, 2012). In our task, it is possible that while the HC coded task-independent identities of causal stimuli, the extent to which this information was transferred to, and represented, in the lOFC determined the efficacy of credit assignment. Future studies using methods with higher temporal resolution can elaborate on this idea by testing whether the HC and lOFC also share coherent stimulus identity information that is likewise channeled via theta phase coupling at the time of outcome, and how this information influences the credit assignment process.

In conclusion, we find that the lOFC and HC are critical to using model-based knowledge for efficiently forging links between outcomes and causal choices. Further, we show that in complex tasks where choice-outcome transitions may be interrupted, this credit assignment network relies on interactions with the lFPC, which maintains ‘pending’ representations of causal stimuli during the interim decision. Collectively, these findings make a novel contribution to our understanding of credit assignment in the brain by illuminating the neural mechanisms which underlie linking causal choices to outcomes in complex, real-world tasks.

Methods

Key resources table.

Reagent type (species) or resource | Designation | Source or reference | Identifiers | Additional information
Software | MATLAB | MathWorks | https://www.mathworks.com; RRID:SCR_001622 | Matlab2018a
Software | Presentation | Neurobehavioral Systems | http://neurobs.com; RRID:SCR_002521 | Version 18.1
Software | LIBSVM | Chang and Lin, 2011 | http://www.csie.ntu.edu.tw/~cjlin/libsvm; RRID:SCR_010243 |
Software | MarsBaR | Brett et al., 2002 | http://marsbar.sourceforge.net/; RRID:SCR_009605 | Ver. 0.44

Participants

Twenty participants (11 females; 9 males; mean age = 23.5) were recruited from the general population around University College London to participate in the study. This sample size was commensurate with previous studies similar in design (Boorman et al., 2016; Howard et al., 2015; Jocham et al., 2016). Using an independent, unpublished data set, we conducted a power analysis for the desired neural effect in lOFC. We found that this number of participants had 84% power to detect this effect (see Figure 2—figure supplement 4). Participants were paid £10 and obtained a gift card of various amounts depending on their performance in the task. None of the participants reported a history of neurological or psychiatric disorder. All participants spoke fluent English and had normal or corrected-to-normal vision. The study was approved by the UCL Research Ethics Committee (Project ID Number: 3450/002), and all participants gave written informed consent.

Task design

Learning task

Participants completed a learning task in which they tracked associations between abstract shapes and specific reward identities (gift cards to two different stores), which were rated for approximately equal desirability. In each trial, participants selected one of two abstract shapes, which were randomly presented on either the left or right side of the screen. Decisions were based on two pieces of information: (1) inferred estimates of the probability that a particular shape would lead to each gift card based on the history of previous trials, and (2) the point value of each gift card on the current trial (Figure 1A–C). Participants were informed prior to starting the task that one of the trials would be chosen at random to count ‘for real’ at the end of the experiment. For this trial, they would receive money on the awarded gift card that was commensurate with the number of associated points (number of points divided by four). Point values for each outcome were presented as two numbers at the top of the screen, with the color of each number indicating the associated gift card identity. Their position relative to each other (top or bottom) was determined randomly on each trial.

Each shape had a specific probability of leading to each outcome and an inverse probability of leading to the other outcome. For example, shape 1 (S1) might lead to a Starbucks gift card with probability p1 and to an iTunes gift card with probability 1 − p1. Shape 2 (S2) would lead to the same outcomes but with independent probabilities p2 and 1 − p2, respectively. These true probabilities would drift independently over the course of the experiment, meaning that information about outcome probabilities could not be shared across shapes. On any given trial, the number of points that could be won for each gift card ranged from 20 to 100, with a minimum difference of at least 15 points. Although these magnitudes were predetermined, participants were told they were randomly generated at the beginning of each trial and that it was not useful to track them (Pearson correlation between magnitudes in trial n and n+1 was less than 0.2). Instead, to maximize rewards, participants had to track the probability that a shape led to each outcome and combine this with the reward magnitudes associated with each outcome on the current trial.

Each trial began with viewing the two possible choices for 0.5 s, during which selection was not possible. Participants then had 3.5 s to make their selection between the two options. The selected shape was highlighted for 0.5 s, before proceeding to the interstimulus interval (ISI), which lasted for a randomly selected duration between 4 s and 8 s. The outcome was then presented for 2000 ms before a jittered inter-trial-interval (ITI) of 4 s to 8 s.

Participants did not have any prior knowledge about choice-outcome associations or how quickly these associations might change, but they knew that they could change throughout the task. Therefore, participants needed to infer both the current associative contingency for each shape and when these contingencies changed from their history of choices and observed outcomes.

Template task

Each run of the scanning session began with a ‘template task’. In this task, participants passively viewed a sequence of all four stimuli (two shapes and two gift cards), individually presented in random order. To ensure that participants were paying attention during passive viewing, they were presented with four ‘catch trials’ which occurred at random between images (see Figure 4—figure supplement 1). In catch trials, all four stimuli were presented simultaneously, and participants were asked to indicate which stimulus had just been presented (see Figure 4—figure supplement 2). Participants were told they could earn an additional £10 on the selected gift card if they responded correctly. However, they would be deducted £1 for each incorrect response or for not making responses in time (max response time = 3 s). Average accuracy for these catch trials was generally high (mean = 0.75, std = 0.15). Participants viewed each item for 1 s followed by a 2.5 s ISI.

Stimuli

Two visually distinct abstract shapes were used as choice objects. These shapes were randomly assigned to serve as S1 or S2 for each participant. The two gift cards were chosen to serve as reward identities during the experiment from six different possible gift cards (iTunes, Argos, Blackwells, Marks & Spencers, Boots, and Starbucks). Each participant rated the six gift cards on a scale from 0 (not preferable) to 100 (extremely preferable). The two gift cards were selected to have the minimal difference in ratings among the highest rated gift cards. This was done to prevent a strong preference for one outcome over the other. All stimuli were presented on a computer running Presentation software (Version 18.1, https://www.neurobs.com/).

Task-schedule and procedure

We generated a reward schedule that predetermined the outcome obtained for each choice on each trial, but this schedule was unknown to the participants. We optimized the schedule such that an ideal Bayesian learner (see Bayesian Computational model) would choose each shape and receive each outcome approximately an equal number of times (percent of overall trials where S1 was chosen was between 42% and 57%). This was done to reduce the potential for sampling bias in planned multivariate analyses. The schedule of outcomes for each shape was generated with independently drifting probabilities so participants could not learn anything about one shape from observing the outcome of the other shape (see Figure 1—figure supplement 1).

Participants completed three scanning runs in one session. The first two runs began with the template task, which was followed by the learning task (37 trials of the direct transition condition, then 37 trials of the indirect transition condition). The third run consisted of only the template task. The learning task began with instructions stating, ‘Your latest choice’, indicating that participants were in the direct transition condition. After 37 trials, a second instruction screen showed ‘Your previous choice’, indicating that participants were about to start the indirect transition condition. Participants knew that in the indirect transition condition, the first outcome observed was not linked to any choice.

In each run, we included three ‘bonus trials’ (two in the direct transition condition and one in the indirect transition condition), distributed throughout choice trials, which occurred between a choice and the outcome. Participants were shown the two gift cards on either side of a question mark and were given the chance to predict which outcome they would receive in the upcoming feedback period. For each correct gift card prediction, they received an additional £3 on the gift card they would receive at the end.

Behavioral training

Prior to each scanning session, participants completed a shortened (76 trials) behavioral training session. In the training session, participants completed a practice version of the choice task, which had a unique reward schedule. Prior to the practice trials, participants were verbally given a ‘comprehension quiz’ to verify they understood key elements of the task, such as the difference between choice-outcome transitions in each condition. Finally, the distribution of ISI and ITI durations for this session was constrained to 2 s to 4 s.

MRI data acquisition and preprocessing

The brain images were acquired using a 32-channel head coil from a 3 Tesla Siemens Trio scanner. We used a T2*-weighted echo-planar imaging (EPI) sequence to collect 43 2 mm slices in ascending order, with 1 mm gaps. The in-plane resolution was 3 × 3 mm, with a repetition time (TR) of 3.01 s and echo-time (TE) of 70 ms. We set the slice angle to a 30 degree tilt relative to the rostro-caudal axis to minimize signal loss from the lOFC (Weiskopf et al., 2006) and applied a local z-shim with a moment of –0.4 mT/m to the OFC. The first five volumes of each block were discarded to allow for T1 equilibration effects. For accurate registration of the EPI to a standard space, we acquired a T1-weighted anatomical scan with a magnetization-prepared rapid gradient echo sequence (MPRAGE) with a 1 × 1 × 1 mm resolution. Finally, to measure and correct for geometric distortions due to susceptibility-induced field inhomogeneities, a whole-brain field map with dual echo-time images (TE1 = 10 ms, TE2 = 14.76 ms, resolution 3 × 3 × 3 mm) was also acquired.

We performed slice time correction, corrected for signal bias, and realigned functional scans to the first volume in the sequence using a six-parameter rigid body transformation to correct for motion. Images were then spatially normalized by warping participant-specific images to the MNI (Montreal Neurological Institute) reference brain and smoothed using an 8 mm full-width at half maximum Gaussian kernel. Pre-processing was done in SPM12 (Wellcome Trust Centre for Neuroimaging, http://www.fil.ion.ucl.ac.uk/spm) using Matlab 2018a.

Quantification and statistical analyses

Regression analysis

To test whether participants showed a behavioral effect of learning on choice, we fit logistic regression models estimating the influence of past choice-outcome observations on choices in the current trial t. The regression model included the effect of the past three choices (Ct-n) in combination with the past three observed outcomes (Ot-n). For example, Ct-1Ot-1 represents the influence of the most recent choice and the most recent outcome on the current choice. The model estimates the probability of making choice C on trial t given all nine combinations of previous choices and outcomes:

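$$\begin{aligned} p(\text{choice}=C)_t = {} & \beta_0 + \beta_1 C_{t-1}O_{t-1} + \beta_2 C_{t-2}O_{t-1} + \beta_3 C_{t-3}O_{t-1} + \beta_4 C_{t-1}O_{t-2} + \beta_5 C_{t-2}O_{t-2} \\ & + \beta_6 C_{t-3}O_{t-2} + \beta_7 C_{t-1}O_{t-3} + \beta_8 C_{t-2}O_{t-3} + \beta_9 C_{t-3}O_{t-3} + \epsilon \end{aligned} \tag{1}$$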

The value of Ct-n was taken to be 1 if they chose shape S1 on trial t-n and –1 if they chose S2. The value of Ot-n was taken to be 1 if the outcome on trial t-n matched the currently desired outcome, on trial t, and –1 if it did not. The currently desired outcome was assumed to be the outcome with the largest point value in each trial. Thus, the value of Ct-nOt-n for each trial was 1 if choice C led to the currently desired outcome n-trials back and –1 if it did not:

$$C_{t-n}O_{t-n} = \begin{cases} 1 & \text{if } C_{t-n} \text{ led to the currently desired outcome} \\ -1 & \text{if } C_{t-n} \text{ led to the currently undesired outcome} \end{cases} \tag{2}$$

We fit separate regression models for each condition in each run for every participant. We then averaged the resulting regression coefficients (β) across runs, resulting in the participant-specific influence of previous decisions on the current choice.
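As an illustration of how the nine regressors in Equation 1 could be assembled from a trial history, the following sketch builds one row of predictors per trial. The function name, the 'O1'/'O2' outcome labels, and the toy trial sequence are hypothetical; the sketch covers only regressor construction, not the logistic fit itself.

```python
def lagged_regressors(choices, outcomes, desired, max_lag=3):
    """Build the choice-by-outcome interaction regressors for each trial t.

    choices:  +1 if S1 was chosen on a trial, -1 if S2 was chosen
    outcomes: outcome identity observed on each trial (e.g. 'O1' or 'O2')
    desired:  outcome identity with the larger point value on each trial

    Returns one row per trial t >= max_lag, ordered as in Equation 1:
    outcome lag n = 1..3 in the outer loop, choice lag m = 1..3 inside.
    """
    rows = []
    for t in range(max_lag, len(choices)):
        row = []
        for n in range(1, max_lag + 1):
            # O_{t-n} is +1 if that past outcome matches what is desired now.
            o = 1 if outcomes[t - n] == desired[t] else -1
            for m in range(1, max_lag + 1):
                row.append(choices[t - m] * o)
        rows.append(row)
    return rows

# Toy history of four trials (made up for illustration).
choices = [1, -1, 1, 1]
outcomes = ["O1", "O2", "O2", "O1"]
desired = ["O1", "O1", "O2", "O2"]

design = lagged_regressors(choices, outcomes, desired)
```

Only trials with a full three-trial history yield a row, so the toy sequence above produces a single row for the fourth trial.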

Bayesian computational model

We used a Bayesian computational model to predict choices in each trial t based on each participant’s previously observed shape-outcome relationships (i.e. the estimated associative probability), and reward magnitudes in the current trial. We briefly describe the model here, but a full description can be found in Behrens et al., 2007; see also Arulampalam et al., 2002 for a related model.

Since the true probability of the associative contingencies cannot be observed, the model estimated, in a Markovian fashion, the subjective belief that choosing a given shape (S) would lead to outcome 1 (O1), and to outcome 2 (O2) with the inverse probability:

$$\begin{aligned} p(S \rightarrow O1) &= p_S \\ p(S \rightarrow O2) &= 1 - p_S \end{aligned} \tag{3}$$

where $p_S$ denotes the associative probability of a given shape S leading to O1. On each trial (t) the model estimated the current value of $p_{S,t}$, based on the previous observations of outcomes $y_{1:t}$. We modeled beliefs about the likelihood of each contingency as a beta distribution over possible values of $p_{S,t}$:

$$\beta(p_{S,t} \mid V) \tag{4}$$

where $p_{S,t}$ is the mean of the beta distribution and V = exp(v) describes the variance. A large value of v means that the value of $p_{S,t}$ is likely to change in the next trial whereas low values of v mean that it is unlikely to change. Here, v is referred to as the ‘volatility’ because it controls the learning rate for shape-outcome associations. The change in the estimated volatility from the previous trial to the current trial is controlled by k. This describes the model’s belief that some level of change in the volatility is going to occur in the next trial. Because there are no constraints on values for $v_t$, this distribution can be modeled as a Gaussian:

$$p(v_t \mid v_{t-1}, k) = \mathcal{N}(v_{t-1}, k) \tag{5}$$

After observing each piece of evidence about the contingency between shape S and the outcome, the estimate of each parameter could then be updated following Bayes’ rule:

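$$p(p_{S,t}, v_t, k) = p(y_t \mid p_{S,t}) \int \left[ \int p(p_{S,t-1}, v_{t-1}, k \mid y_{1:t-1})\, p(v_t \mid v_{t-1}, k)\, dv_{t-1} \right] p(p_{S,t} \mid p_{S,t-1}, v_t)\, dp_{S,t-1} \tag{6}$$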

This gives us the three-dimensional joint probability of the parameters. On each trial, the learner only needs to know the estimated contingency between a shape and outcome, which is obtained by first marginalizing over v and k:

$$p(p_{S,t}) = \iint p(p_{S,t}, v_t, k)\, dv_t\, dk \tag{7}$$

and then taking the mean of the resulting distribution:

$$\hat{p}_{S,t} = \int p_{S,t}\, p(p_{S,t})\, dp_{S,t} \tag{8}$$
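The marginalization and mean in Equations 7 and 8 can be approximated on a discrete grid. The sketch below assumes uniformly spaced grids over p, v, and k, so that sums stand in for integrals and the grid spacing cancels on normalization; the function name and the tiny 3 × 2 × 2 grid are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
def posterior_mean_p(joint, p_grid):
    """Marginalize a gridded joint distribution over v and k, then take the mean.

    joint[i][j][l] is the (possibly unnormalized) posterior mass at
    p_grid[i], the j-th volatility grid point, and the l-th k grid point.
    Uniform grid spacing is assumed.
    """
    # Equation 7: sum over the v and k dimensions for each value of p.
    marginal = [sum(sum(row) for row in plane) for plane in joint]
    total = sum(marginal)
    marginal = [m / total for m in marginal]
    # Equation 8: expected value of p under the marginal distribution.
    mean = sum(p * m for p, m in zip(p_grid, marginal))
    return marginal, mean

# Toy joint distribution (made-up masses) on a coarse grid.
p_grid = [0.25, 0.50, 0.75]
joint = [
    [[1.0, 1.0], [1.0, 1.0]],  # mass at p = 0.25
    [[2.0, 2.0], [2.0, 2.0]],  # mass at p = 0.50
    [[1.0, 1.0], [1.0, 1.0]],  # mass at p = 0.75
]

marginal_p, p_hat = posterior_mean_p(joint, p_grid)
```

A finer grid gives a closer approximation to the integrals at the cost of more computation per trial.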

For each participant, we initialized the model with a uniform prior over the entire parameter space. All integral computations were performed using numerical grid integration. We then used the prior belief in the associative contingencies $\hat{p}_{S,t}$ to compute the expected value of each shape on each trial according to the following formula:

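$$Ev_{S,t} = \left[\hat{p}_{S,t} \cdot m_{O1,t} \cdot \alpha\right] + \left[\left[1 - \hat{p}_{S,t}\right] \cdot m_{O2,t} \cdot \left[1/\alpha\right]\right] \tag{9}$$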

where α was a free parameter and reflected a participant’s preference for O1 over O2 (0< α <2), and $m_{O1,t}$ and $m_{O2,t}$ indicated the reward magnitudes of the outcomes available in the current trial, t. We then measured the likelihood of each participant’s choice on each trial according to a SoftMax function:

$$p(\text{choice} = S1) = e^{b\,Ev_{S1,t}} \left(e^{b\,Ev_{S1,t}} + e^{b\,Ev_{S2,t}}\right)^{-1} \tag{10}$$

where the free parameter b captured the sensitivity of choices to expected values (inverse temperature; 0<b<1). Free parameters were fitted using Markov Chain Monte Carlo (see below).
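A worked example may help make the expected value and choice rule concrete. The sketch below assumes that the terms inside each bracket of Equation 9 are multiplied together, which is one reading of the formula; the function names and all numerical values are hypothetical.

```python
import math

def expected_value(p_hat, m_o1, m_o2, alpha):
    """Expected value of a shape (Equation 9, assuming multiplicative terms).

    p_hat: estimated probability that the shape leads to O1
    m_o1, m_o2: point values of O1 and O2 on the current trial
    alpha: preference for O1 over O2 (alpha = 1 means no preference)
    """
    return p_hat * m_o1 * alpha + (1.0 - p_hat) * m_o2 * (1.0 / alpha)

def p_choose_s1(ev_s1, ev_s2, b):
    """Softmax probability of choosing S1 (Equation 10); b is the inverse temperature."""
    return math.exp(b * ev_s1) / (math.exp(b * ev_s1) + math.exp(b * ev_s2))

# Hypothetical trial: S1 probably leads to O1, and O1 is worth more points.
ev_s1 = expected_value(p_hat=0.8, m_o1=60, m_o2=40, alpha=1.0)
ev_s2 = expected_value(p_hat=0.3, m_o1=60, m_o2=40, alpha=1.0)
prob_s1 = p_choose_s1(ev_s1, ev_s2, b=0.1)
```

With these numbers the expected values are 56 and 46 points, and the model chooses S1 with a probability of about 0.73.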

Value-based RL model

This model estimated the value of each shape given the history of rewards received from choosing the shape. The value of each shape was initialized at 0, then updated using the following equation:

$$V_{S_x,t} = V_{S_x,t-1} + \delta \left( \alpha R_t - V_{S_x,t-1} \right) \tag{11}$$

where $R_t$ is the magnitude of the reward on trial t and α is an individual difference term estimating a participant’s preference for one outcome over the other (0 < α < 2). The learning rate (δ) was estimated for each participant to capture the magnitude of the update (0 < δ < 1). We entered these values into a softmax function to generate choice probabilities:

$$p(\text{choice} = S1) = e^{b \, V_{S1,t}} \left( e^{b \, V_{S1,t}} + e^{b \, V_{S2,t}} \right)^{-1} \tag{12}$$

where the free parameter b captured the sensitivity of choices to expected values (inverse temperature; 0 < b < 1). Free parameters were fitted using Markov Chain Monte Carlo (see below). Note that learning failures are not trivial to identify in our paradigm and model, because every choice depends both on a participant’s preference between gift card outcomes and on the ability of the computational model to accurately estimate the participant’s beliefs in the stimulus-outcome transition probabilities.
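A minimal Python sketch of the value update and choice rule (Equations 11 and 12) is given below, together with the log-likelihood of a short choice sequence. The function names and the trial data are hypothetical, and updating only the chosen shape on each trial is an assumption based on the phrase "rewards received from choosing the shape".

```python
import math


def update_value(v_prev, reward, alpha, delta):
    """Equation 11: move the value toward the preference-weighted reward."""
    return v_prev + delta * (alpha * reward - v_prev)


def softmax_s1(v_s1, v_s2, b):
    """Equation 12, written in the algebraically equivalent logistic form."""
    return 1.0 / (1.0 + math.exp(-b * (v_s1 - v_s2)))


def choice_log_likelihood(trials, alpha, delta, b):
    """Sum of log choice probabilities over (choice, reward) trials."""
    values = {"S1": 0.0, "S2": 0.0}  # both shapes start at 0
    total = 0.0
    for choice, reward in trials:
        p_s1 = softmax_s1(values["S1"], values["S2"], b)
        total += math.log(p_s1 if choice == "S1" else 1.0 - p_s1)
        # Assumption: only the chosen shape is updated on each trial.
        values[choice] = update_value(values[choice], reward, alpha, delta)
    return total


# Hypothetical two-trial sequence in which S1 is chosen and rewarded twice.
ll = choice_log_likelihood([("S1", 10.0), ("S1", 10.0)], alpha=1.0, delta=0.5, b=0.5)
```

A log-likelihood of this kind is the quantity that the parameter fitting would evaluate for each candidate set of α, δ, and b.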

Parameter estimates

The Bayesian learning model had two free parameters, α and b. The value-based RL model had an additional parameter, δ. We fit these parameters independently for each participant using custom Markov Chain Monte Carlo (MCMC) code in MATLAB R2018a. Model parameters were bounded as follows: 0 < α < 2, 0 < b < 1, and 0 < δ < 1, and were initialized at α = 1, b = 0.5, and δ = 0.5. Each model was fit to maximize the likelihood of a participant’s choices given model estimates of the expected value of each choice on each trial (Equation 10; Equation 12).
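The text does not specify which MCMC variant was used, so the following is only a sketch of one common choice: a random-walk Metropolis sampler with a flat prior inside the stated bounds. The function name, step size, sample count, and the toy log-likelihood in the usage example are all assumptions, and how the chain was summarized is not stated, so the sketch simply returns the samples.

```python
import math
import random


def metropolis(log_lik, init, bounds, n_samples=5000, step=0.1, seed=0):
    """Random-walk Metropolis with a uniform prior on the box `bounds`."""
    rng = random.Random(seed)
    current = list(init)
    current_ll = log_lik(current)
    samples = []
    for _ in range(n_samples):
        proposal = [x + rng.gauss(0.0, step) for x in current]
        # Proposals outside the bounds have zero prior mass and are rejected.
        inside = all(lo < x < hi for x, (lo, hi) in zip(proposal, bounds))
        if inside:
            proposal_ll = log_lik(proposal)
            diff = proposal_ll - current_ll
            if diff >= 0 or rng.random() < math.exp(diff):
                current, current_ll = proposal, proposal_ll
        samples.append(list(current))
    return samples


def toy_log_lik(theta):
    """Hypothetical stand-in for a model's choice log-likelihood."""
    alpha, b = theta
    return -((alpha - 1.2) ** 2 + (b - 0.3) ** 2) / (2 * 0.1 ** 2)


# Bounds and starting values follow the text: 0 < alpha < 2, 0 < b < 1.
chain = metropolis(toy_log_lik, init=[1.0, 0.5], bounds=[(0.0, 2.0), (0.0, 1.0)])
```

In practice `toy_log_lik` would be replaced by the summed log choice probabilities from Equation 10 or Equation 12, and an early portion of the chain would usually be discarded before summarizing.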

Multivariate decoding of causal choice and pending causal choice representations

Using multivariate pattern analysis (MVPA), we aimed to identify regions of the brain that coded knowledge of causal choices during the feedback period. To test this, we estimated the BOLD activity patterns during the feedback phase for each trial using unsmoothed preprocessed images. The feedback periods were modeled as boxcars with a constant duration of 2000 ms from the onset of the outcome presentation in each trial. The GLM also included regressors for the decision period (modeled as boxcars with a duration equal to RT) and template presentations (modeled as boxcars with a 1000 ms duration). No parametric modulators were added. Each trial was labeled according to which shape was chosen during the choice period (either S1 or S2). For our analysis of ‘pending’ representations in the indirect transition condition, we linked these labels to the immediately following interim feedback phase, a time when participants should be delaying credit assignment in anticipation of assigning credit in the next trial.

We used a searchlight procedure to identify regions of the brain that contained representations of the causal choice. Each searchlight consisted of a 5 × 5 × 5 voxel cube placed around a centroid voxel in the brain. Each centroid was required to have values in at least 10 of the surrounding voxels to be considered for further processing. The activity in each trial was standardized by z-scoring the β-values across voxels within each searchlight. The data were then split by run into training and test sets. We used LIBSVM (Chang and Lin, 2011) to fit linear classifiers to the training data, which were subsequently used to classify data points from the test set. We iterated through this process for each of the two runs, then computed the mean decoding accuracy (average proportion of correct classifications) across both classifiers. The mean decoding accuracy for each voxel was compared to a voxel-specific null distribution, which was estimated by repeating this procedure while randomly assigning the labels for 100 permutations at each searchlight. The mean classification accuracy of this null distribution was subtracted from the classification accuracy of each searchlight to give us a measure of how reliably information about the causal choices could be decoded above chance. The resulting maps were then spatially smoothed using a Gaussian kernel with a full width at half maximum of 8 mm.
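The logic of a single searchlight (z-scoring across voxels, leave-one-run-out classification, and subtraction of a permutation-based chance level) can be sketched as follows. To keep the example dependency-free, a nearest-centroid classifier stands in for the linear SVM fitted with LIBSVM, so this is not the classifier used in the study. The function names and the tiny three-voxel data set are hypothetical, and permuting the training labels is one reading of "randomly assigning the labels".

```python
import random
import statistics


def zscore(pattern):
    """Standardize one trial's pattern across the voxels of a searchlight."""
    mu, sd = statistics.fmean(pattern), statistics.pstdev(pattern)
    return [(x - mu) / sd for x in pattern] if sd > 0 else [0.0] * len(pattern)


def nearest_centroid_accuracy(train_x, train_y, test_x, test_y):
    """Stand-in classifier: assign each test pattern to the closest class mean."""
    classes = sorted(set(train_y))
    cents = {
        c: [statistics.fmean(col)
            for col in zip(*[x for x, y in zip(train_x, train_y) if y == c])]
        for c in classes
    }

    def predict(x):
        return min(classes, key=lambda c: sum((a - m) ** 2 for a, m in zip(x, cents[c])))

    return sum(predict(x) == y for x, y in zip(test_x, test_y)) / len(test_y)


def cv_accuracy(runs, rng=None):
    """Leave-one-run-out accuracy; shuffle training labels when rng is given."""
    accs = []
    for i, (test_x, test_y) in enumerate(runs):
        train_x = [x for j, (xs, _) in enumerate(runs) if j != i for x in xs]
        train_y = [y for j, (_, ys) in enumerate(runs) if j != i for y in ys]
        if rng is not None:
            rng.shuffle(train_y)
        accs.append(nearest_centroid_accuracy(train_x, train_y, test_x, test_y))
    return statistics.fmean(accs)


def chance_corrected_accuracy(runs, n_perm=100, seed=0):
    """True accuracy minus the mean accuracy of a label-permutation null."""
    rng = random.Random(seed)
    runs = [([zscore(x) for x in xs], ys) for xs, ys in runs]
    null = [cv_accuracy(runs, rng) for _ in range(n_perm)]
    return cv_accuracy(runs) - statistics.fmean(null)


# Hypothetical searchlight with three voxels, two runs, four trials per run.
labels = ["S1", "S1", "S2", "S2"]
run_a = ([[2.0, 0.0, 0.1], [2.2, 0.1, 0.0], [0.0, 0.1, 2.0], [0.1, 0.0, 2.3]], labels)
run_b = ([[1.9, 0.2, 0.0], [2.1, 0.0, 0.2], [0.2, 0.0, 1.8], [0.0, 0.2, 2.1]], labels)
score = chance_corrected_accuracy([run_a, run_b])
```

Subtracting an empirical null rather than a fixed 0.5 guards against chance levels that drift from 50% when the two classes are unbalanced within a run.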

Group-level analyses were performed using a one-sample t-test on accuracy maps across participants (see Group-level statistical inference). We corrected for multiple comparisons over a priori defined ROIs in lOFC, HPC, and lFPC, and used functionally defined ROIs for lOFC in a data driven ROI analysis (see Figure 2—figure supplements 13).

To ensure that participants attended to the stimuli, we included incentivized catch trials in the passive observing ‘template task’. Participants were asked to report which image out of the four (2 gift cards and 2 stimuli) was the last one presented on the screen. They were endowed with an extra £10, from which we deducted £1 for every incorrect response. There were four catch trials per template run. We corrected for multiple comparisons using small-volume correction with TFCE. The threshold for significance remained the same in all analyses (pTFCE < 0.05).

Multivariate analyses of information connectivity between regions

To test whether decoding of the causal choice at feedback in the indirect transition condition depended on the strength of ‘pending’ representations held during the interim trial, we tested whether the fidelity of representations of the pending causal choice in lFPC was associated with the fidelity of those same choices at the time of credit assignment (i.e., in the feedback phase of the next trial). We used the same decoding procedure mentioned above to classify voxel patterns at feedback in each trial, but additionally calculated the distance of each pattern from the hyperplane that divides categories. Distances were obtained using the equation specified on the LIBSVM webpage (https://www.csie.ntu.edu.tw/~cjlin/libsvm/faq.html). Patterns that are more distant from the hyperplane can be thought of as having higher fidelity, and those that are closer to the hyperplane as having less (Schuck and Niv, 2019). We then signed the distance of each point according to whether the predicted category label was correct (+ for correct, − for incorrect).

First, we calculated trial-by-trial distance from the hyperplane when causal choice information was believed to be held in a ‘pending’ state, focusing on lFPC as our ‘seed-region’. For this, we calculated the average distances for voxels within the lFPC that showed significant decoding of the pending choice during the interim feedback period (t(19)=2.54, p<0.01 uncorrected). This gave us a measure of the information about the pending item on each trial. We calculated the decoding strength of these same choices when the true outcome was shown, as a measure of the information about the causal choice during credit assignment. Here, we calculated distances for every 5 × 5 × 5 voxel cube using the same searchlight procedure described above. Note that the decoding fidelity metric at each time point represents the decodability of the same choice at different phases of the task. These phases were separated by at least 10 s, and by 15 s on average, which can be sufficient for disentangling unique activity (Mumford et al., 2012; Mumford et al., 2014). We then correlated the decoding distance for representations in lFPC during the ‘pending’ state with the decoding distance of those same choices at credit assignment. Thus, the correlation value between them gives us a measure of whether strong representations of pending causal choices in lFPC predict stronger representations at credit assignment.
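The two ingredients of this analysis, a signed distance from a linear decision boundary and a trial-by-trial correlation between regions, can be illustrated with a short Python sketch. It assumes a linear classifier described by a weight vector and a bias, with labels coded as +1 and −1; the function names and the example values are hypothetical and are not taken from the study.

```python
import math


def signed_distances(weights, bias, patterns, labels):
    """Signed distance of each pattern from a linear decision boundary.

    Labels are +1 or -1. The magnitude is |w.x + b| / ||w||, and the sign
    is positive when the predicted label matches the true label.
    """
    w_norm = math.sqrt(sum(w * w for w in weights))
    out = []
    for x, y in zip(patterns, labels):
        decision = sum(w * xi for w, xi in zip(weights, x)) + bias
        predicted = 1 if decision >= 0 else -1
        distance = abs(decision) / w_norm
        out.append(distance if predicted == y else -distance)
    return out


def pearson_r(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical seed-region classifier and three trials.
seed = signed_distances([3.0, 4.0], 0.0,
                        [[3.0, 4.0], [3.0, 4.0], [-3.0, -4.0]], [1, -1, -1])
# Hypothetical signed distances for the same trials in one test searchlight.
test_region = [2.0, -1.0, 1.5]
coupling = pearson_r(seed, test_region)
```

A positive `coupling` would indicate that trials with higher-fidelity pending representations in the seed region also tend to show higher-fidelity representations in the test searchlight.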

To confirm that this correlation did not simply arise because the classifier in each region is ‘less wrong’ when the decoder in lFPC makes correct classifications (i.e., all classifications were wrong, but the test region was less wrong), we performed two control analyses. First, we calculated the frequency of correct classifications for the subset of trials in which lFPC also showed correct classifications. We then compared the frequency of correct classifications to a permuted baseline frequency, obtained by randomizing trial distances in the searchlight and recomputing the frequency of correct classifications. We subtracted the mean of the randomized baseline from the true frequency of correct classifications. This gave us a measure of decoding accuracy in each searchlight when lFPC showed correct decoding accuracy. Our second control analysis involved rerunning the classification procedure (see ‘Multivariate decoding of causal choice and pending causal choice representations’), but only for trials in which the lFPC had already shown correct decoding of the causal choice in a pending state. Again, we compared the accuracy of the classifier in each searchlight to a randomized baseline frequency by randomizing trial labels and recomputing the accuracy of the classifier. The mean of the randomized distribution was then subtracted from the classification accuracy using the true labels.

Group-level analyses were performed by Fisher-z transforming the correlation values then using a one-sample t-test on each voxel. We corrected for multiple comparisons using TFCE correction on the resulting volumes within a priori defined ROIs. The same thresholds were applied for group level statistical correction (pTFCE <0.05).
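For a single voxel, the group-level step amounts to a Fisher z-transform of each participant's correlation followed by a one-sample t-test against zero. The sketch below shows that arithmetic in Python; the function names and the five correlation values are hypothetical, and the TFCE correction is not included.

```python
import math


def fisher_z(r):
    """Fisher z-transform of a correlation coefficient (inverse hyperbolic tangent)."""
    return math.atanh(r)


def one_sample_t(values, popmean=0.0):
    """One-sample t statistic and its degrees of freedom."""
    n = len(values)
    mean = sum(values) / n
    var = sum((x - mean) ** 2 for x in values) / (n - 1)  # unbiased variance
    return (mean - popmean) / math.sqrt(var / n), n - 1


# Hypothetical per-participant correlations at one voxel.
correlations = [0.10, 0.25, -0.05, 0.30, 0.15]
t_stat, df = one_sample_t([fisher_z(r) for r in correlations])
```

With 20 participants, as in the group-level tests reported here, the same function would return 19 degrees of freedom.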

Multivariate analyses of identity codes during credit assignment

To test whether the task-independent identity of the causal choice was reinstated during feedback, we trained a linear SVM to decode representations of causal choice stimuli but trained the classifier during periods when participants passively viewed the stimuli outside of the task context (see ‘Template trials’). In each condition the SVM was trained on all the trials of the three template runs and tested during the feedback period of the learning task. For each participant and in each trial, we estimated the BOLD activity patterns using the same GLM as described above (see ‘Multivariate decoding of causal choice and pending causal choice representations’). Further, we used the same procedure in which we randomly permuted the training labels 100 times to create a null distribution of decoding accuracy. We then averaged decoding accuracy over runs and subtracted the mean of the null distribution from the true decoding accuracy of the classifier.

To test for associations between credit assignment precision and causal choice identity decoding accuracy, we first generated estimates of credit assignment precision based on each participant’s behavior during the task. For each participant we created a behavioral matrix, which included β-values from nine combinations of possible choice-outcome relationships used to assign credit when an outcome is observed (see ‘Regression model’). For the direct transition condition, values along the diagonal of this matrix represent appropriate credit assignment given the task structure and should have high positive values if the participant is assigning credit precisely. All other values should be near 0. A similar matrix can be generated for the indirect transition condition, but appropriate for the causal structure of this condition (see Figure 1E). Next, we created a comparison matrix based on an idealized learner, with values of 1 in each cell that represented appropriate credit assignment for the condition, and values of 0 for non-causal relationships. We then correlated each participant specific behavioral matrix with the comparison matrix. High correlation values represent more precise credit assignment, and the average across conditions was taken to be a measure of the overall credit precision in the learning task. We then regressed each participant’s overall credit precision estimate against voxel-level decoding accuracy across participants. We corrected for multiple comparisons using TFCE correction to volumes within pre-defined ROIs. The same thresholds were applied for group-level statistical correction (pTFCE <0.05).
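The credit precision measure can be illustrated as a correlation between a participant's matrix of regression weights and an idealized matrix. In the sketch below the 3 × 3 layout, the β-values, and the function name are hypothetical; the identity matrix is used as the idealized learner for the direct transition condition, and the corresponding matrix for the indirect condition would follow the causal structure in Figure 1E.

```python
import math


def credit_precision(beta_matrix, ideal_matrix):
    """Pearson correlation between the cells of two equally shaped matrices."""
    x = [v for row in beta_matrix for v in row]
    y = [v for row in ideal_matrix for v in row]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Idealized learner for the direct condition: credit only on the diagonal.
ideal_direct = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

# Hypothetical participants: one precise, one who spreads credit widely.
precise = [[0.9, 0.1, 0.0], [0.0, 0.8, 0.1], [0.1, 0.0, 0.9]]
diffuse = [[0.4, 0.3, 0.3], [0.3, 0.4, 0.3], [0.3, 0.3, 0.4]]

precision_precise = credit_precision(precise, ideal_direct)
precision_scaled = credit_precision([[0.5, 0, 0], [0, 0.5, 0], [0, 0, 0.5]], ideal_direct)
```

Because the measure is a correlation, it is insensitive to the overall scale of the β-values: a participant whose diagonal weights are uniformly half as large still scores 1.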

Group-level statistical inference

Group-level testing was done using a one-sample t-test (df = 19) on the cumulative functional maps generated by the first-level analysis. All first-level maps were smoothed prior to being combined and tested at the group level. To correct for multiple comparisons, we first extracted voxels from each ROI in each participant’s first-level activation map, then applied Threshold-Free Cluster Enhancement (TFCE) which uses permutation testing and accounts for both the height and extent of the cluster (Smith and Nichols, 2009). All parameters were set to default parameters (H=2, E=0.5) and used 5000 permutations for the analysis. We report effects that surpassed a pTFCE <0.05 threshold in each ROI.
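To make the roles of H and E concrete, the following sketch computes TFCE scores for a one-dimensional row of positive statistics, where a cluster is simply a contiguous run of values at or above the current height. The one-dimensional setting, the step size `dh`, and the function name are simplifying assumptions; the analysis described here operates on three-dimensional maps and adds permutation testing, neither of which is shown.

```python
def tfce_1d(stat, dh=0.1, E=0.5, H=2.0):
    """TFCE scores for a 1-D list of statistics (positive values only).

    Each element accumulates extent**E * height**H * dh over all heights
    at which it belongs to a supra-threshold run.
    """
    scores = [0.0] * len(stat)
    peak = max(stat, default=0.0)
    n_steps = int(peak / dh) if peak > 0 else 0
    for step in range(1, n_steps + 1):
        h = step * dh
        i = 0
        while i < len(stat):
            if stat[i] >= h:
                j = i
                while j < len(stat) and stat[j] >= h:
                    j += 1
                extent = j - i  # size of the contiguous run containing i
                for idx in range(i, j):
                    scores[idx] += (extent ** E) * (h ** H) * dh
                i = j
            else:
                i += 1
    return scores


# Hypothetical row: a two-element cluster of height 2 and an isolated 1.
scores = tfce_1d([0.0, 2.0, 2.0, 0.0, 1.0], dh=1.0)
```

With H = 2 and E = 0.5, height contributes more strongly than extent, so the tall two-element cluster scores well above the isolated element.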

Region of interest selection

Regions of interest in the prefrontal cortex were generated from anatomically defined regions with unique functional connectivity fingerprints (Neubert et al., 2015). The lOFC ROIs corresponded to bilateral area BA11 (indexes 9 and 30). We included these regions because they have been previously implicated in credit assignment for causal choices, particularly in similar contingency learning tasks (Boorman et al., 2016; Jocham et al., 2016). For the lFPC, we used indexes 14 and 35. All of these ROIs were thresholded at a 60% inclusion criterion, although our results did not qualitatively change at different thresholds. Finally, we used a priori anatomically defined bilateral HC ROIs to test for effects in hippocampus (Yushkevich et al., 2015). These ROIs are illustrated in Figure 2—figure supplement 1.

Acknowledgements

Funding was provided by a Sir Henry Wellcome Postdoctoral Fellowship and NSF CAREER Award (1846578) to EDB, a Senior Research Fellowship from the Wellcome Trust and an award from the James S McDonnell Foundation to TEB, and a Principal Research Fellowship from the Wellcome Trust to RJD. This work was also in part supported by the Intramural Research Program at the National Institute on Drug Abuse (ZIA DA000642). The opinions expressed in this work are the authors' own and do not reflect the view of the NIH/DHHS.

Funding Statement

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication. For the purpose of Open Access, the authors have applied a CC BY public copyright license to any Author Accepted Manuscript version arising from this submission.

Contributor Information

Lindsay JH Rondot, Email: ljrondot@ucdavis.edu.

Erie Boorman, Email: edboorman@ucdavis.edu.

Michael J Frank, Brown University, United States.

Funding Information

This paper was supported by the following grants:

  • National Science Foundation 1846578 to Erie Boorman.

  • Intramural Research Program at the National Institute ZIA DA000642 to Phillip P Witkowski.

  • Wellcome Trust Principal Research Fellowship to Raymond J Dolan.

  • Wellcome Trust Senior Research Fellowship to Timothy EJ Behrens.

  • James S McDonnell Foundation to Timothy EJ Behrens.

  • Sir Henry Wellcome Postdoctoral Fellowship to Erie Boorman.

Additional information

Competing interests

No competing interests declared.

Editor-in-Chief, eLife.

Author contributions

Formal analysis, Investigation, Visualization, Writing – original draft.

Data curation, Formal analysis, Investigation, Writing – original draft.

Formal analysis, Investigation.

Investigation, Writing – review and editing.

Supervision, Funding acquisition, Investigation, Project administration, Writing – review and editing.

Supervision, Funding acquisition, Project administration, Writing – review and editing.

Conceptualization, Data curation, Formal analysis, Supervision, Funding acquisition, Investigation, Writing – original draft, Project administration.

Additional files

MDAR checklist

Data availability

Unthresholded group-level statistical maps have been deposited at NeuroVault (https://neurovault.org/collections/17702/) and are publicly available as of the date of publication. Links are listed in the key resources table. All original code has been deposited at Open Science Framework (https://osf.io/b9m6q/) and is publicly available as of the date of publication.

The following datasets were generated:

Witkowski PP, Rondot LJH. 2024. Neural mechanisms of credit assignment for delayed outcomes during associative learning. NeuroVault. 17702

Witkowski PP, Rondot LJH. 2024. Neural mechanisms of credit assignment for delayed outcomes during associative learning. Open Science Framework. b9m6q

References

  1. Arulampalam MS, Maskell S, Gordon N, Clapp T. A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking. IEEE Transactions on Signal Processing. 2002;50:174–188. doi: 10.1109/78.978374.
  2. Badre D, Hoffman J, Cooney JW, D’Esposito M. Hierarchical cognitive control deficits following damage to the human frontal lobe. Nature Neuroscience. 2009;12:515–522. doi: 10.1038/nn.2277.
  3. Badre D, Doll BB, Long NM, Frank MJ. Rostrolateral prefrontal cortex and individual differences in uncertainty-driven exploration. Neuron. 2012;73:595–607. doi: 10.1016/j.neuron.2011.12.025.
  4. Barbas H, Blatt GJ. Topographically specific hippocampal projections target functionally distinct prefrontal areas in the rhesus monkey. Hippocampus. 1995;5:511–533. doi: 10.1002/hipo.450050604.
  5. Barnett AJ, Reilly W, Dimsdale-Zucker HR, Mizrak E, Reagh Z, Ranganath C. Intrinsic connectivity reveals functionally distinct cortico-hippocampal networks in the human brain. PLOS Biology. 2021;19:e3001275. doi: 10.1371/journal.pbio.3001275.
  6. Barron HC, Reeve HM, Koolschijn RS, Perestenko PV, Shpektor A, Nili H, Rothaermel R, Campo-Urriza N, O’Reilly JX, Bannerman DM, Behrens TEJ, Dupret D. Neuronal computation underlying inferential reasoning in humans and mice. Cell. 2020;183:228–243. doi: 10.1016/j.cell.2020.08.035.
  7. Behrens TEJ, Woolrich MW, Walton ME, Rushworth MFS. Learning the value of information in an uncertain world. Nature Neuroscience. 2007;10:1214–1221. doi: 10.1038/nn1954.
  8. Boorman ED, Behrens TEJ, Woolrich MW, Rushworth MFS. How green is the grass on the other side? Frontopolar cortex and the evidence in favor of alternative courses of action. Neuron. 2009;62:733–743. doi: 10.1016/j.neuron.2009.05.014.
  9. Boorman ED, Behrens TE, Rushworth MF. Counterfactual choice and learning in a neural network centered on human lateral frontopolar cortex. PLOS Biology. 2011;9:e1001093. doi: 10.1371/journal.pbio.1001093.
  10. Boorman ED, O’Doherty JP, Adolphs R, Rangel A. The behavioral and neural mechanisms underlying the tracking of expertise. Neuron. 2013;80:1558–1571. doi: 10.1016/j.neuron.2013.10.024.
  11. Boorman ED, Rajendran VG, O’Reilly JX, Behrens TE. Two anatomically and computationally distinct learning signals predict changes to stimulus-outcome associations in hippocampus. Neuron. 2016;89:1343–1354. doi: 10.1016/j.neuron.2016.02.014.
  12. Boorman ED, Witkowski PP, Zhang Y, Park SA. The orbital frontal cortex, task structure, and inference. Behavioral Neuroscience. 2021;135:291–300. doi: 10.1037/bne0000465.
  13. Bouffard NR, Boorman ED, Libby LA, Mızrak E, Ranganath C. The hippocampus and orbitofrontal cortex jointly represent task structure during memory-guided decision making. Cell Reports. 2021;37:110065. doi: 10.1016/j.celrep.2021.110065.
  14. Brett M, Anton J-L, Valabregue R, Poline J-B. Region of interest analysis using the MarsBar toolbox for SPM 99. Neuroimage. 2002;16:S497.
  15. Burgess PW, Dumontheil I, Gilbert SJ. The gateway hypothesis of rostral prefrontal cortex (area 10) function. Trends in Cognitive Sciences. 2007;11:290–298. doi: 10.1016/j.tics.2007.05.004.
  16. Burgess PW, Gonen-Yaacovi G, Volle E. Functional neuroimaging studies of prospective memory: what have we learnt so far? Neuropsychologia. 2011;49:2246–2257. doi: 10.1016/j.neuropsychologia.2011.02.014.
  17. Burgess PW, Crum J, Pinti P, Aichelburg C, Oliver D, Lind F, Power S, Swingler E, Hakim U, Merla A, Gilbert S, Tachtsidis I, Hamilton A. Prefrontal cortical activation associated with prospective memory while walking around a real-world street environment. NeuroImage. 2022;258:119392. doi: 10.1016/j.neuroimage.2022.119392.
  18. Chang CC, Lin CJ. LIBSVM: A Library for support vector machines. ACM Transactions on Intelligent Systems and Technology. 2011;2:1961199. doi: 10.1145/1961189.1961199.
  19. Costa KM, Scholz R, Lloyd K, Moreno-Castilla P, Gardner MPH, Dayan P, Schoenbaum G. The role of the lateral orbitofrontal cortex in creating cognitive maps. Nature Neuroscience. 2023;26:107–115. doi: 10.1038/s41593-022-01216-0.
  20. Coutanche MN, Thompson-Schill SL. Informational connectivity: identifying synchronized discriminability of multi-voxel patterns across the brain. Frontiers in Human Neuroscience. 2013;7:15. doi: 10.3389/fnhum.2013.00015.
  21. Donoso M, Collins AGE, Koechlin E. Human cognition: Foundations of human reasoning in the prefrontal cortex. Science. 2014;344:1481–1486. doi: 10.1126/science.1252254.
  22. Foerde K, Shohamy D. Feedback timing modulates brain systems for learning in humans. The Journal of Neuroscience. 2011;31:13157–13167. doi: 10.1523/JNEUROSCI.2701-11.2011.
  23. Gardner MPH, Schoenbaum G. The orbitofrontal cartographer. Behavioral Neuroscience. 2021;135:267–276. doi: 10.1037/bne0000463.
  24. Howard JD, Gottfried JA, Tobler PN, Kahnt T. Identity-specific coding of future rewards in the human orbitofrontal cortex. PNAS. 2015;112:5195–5200. doi: 10.1073/pnas.1503550112.
  25. Howard JD, Kahnt T. To be specific: The role of orbitofrontal cortex in signaling reward identity. Behavioral Neuroscience. 2021;135:210–217. doi: 10.1037/bne0000455.
  26. Jocham G, Brodersen KH, Constantinescu AO, Kahn MC, Ianni AM, Walton ME, Rushworth MFS, Behrens TEJ. Reward-guided learning with and without causal attribution. Neuron. 2016;90:177–190. doi: 10.1016/j.neuron.2016.02.018.
  27. Knudsen EB, Wallis JD. Closed-loop theta stimulation in the orbitofrontal cortex prevents reward-based learning. Neuron. 2020;106:537–547. doi: 10.1016/j.neuron.2020.02.003.
  28. Koechlin E, Ody C, Kouneiher F. The architecture of cognitive control in the human prefrontal cortex. Science. 2003;302:1181–1185. doi: 10.1126/science.1088545.
  29. Koechlin E, Hyafil A. Anterior prefrontal function and the limits of human decision-making. Science. 2007;318:594–598. doi: 10.1126/science.1142995.
  30. Koster R, Chadwick MJ, Chen Y, Berron D, Banino A, Düzel E, Hassabis D, Kumaran D. Big-loop recurrence within the hippocampal system supports integration of information across episodes. Neuron. 2018;99:1342–1354. doi: 10.1016/j.neuron.2018.08.009.
  31. Kriegeskorte N, Mur M, Bandettini P. Representational similarity analysis - connecting the branches of systems neuroscience. Frontiers in Systems Neuroscience. 2008;2:4. doi: 10.3389/neuro.06.004.2008.
  32. Kurth-Nelson Z, Barnes G, Sejdinovic D, Dolan R, Dayan P. Temporal structure in associative retrieval. eLife. 2015;4:e04919. doi: 10.7554/eLife.04919.
  33. Lamba A, Nassar MR, FeldmanHall O. Prefrontal cortex state representations shape human credit assignment. eLife. 2023;12:e84888. doi: 10.7554/eLife.84888.
  34. Luettgau L, Tempelmann C, Kaiser LF, Jocham G. Decisions bias future choices by modifying hippocampal associative memories. Nature Communications. 2020;11:3318. doi: 10.1038/s41467-020-17192-7.
  35. Mack ML, Preston AR. Decisions about the past are guided by reinstatement of specific memories in the hippocampus and perirhinal cortex. NeuroImage. 2016;127:144–157. doi: 10.1016/j.neuroimage.2015.12.015.
  36. McClelland JL, McNaughton BL, O’Reilly RC. Why there are complementary learning systems in the hippocampus and neocortex: insights from the successes and failures of connectionist models of learning and memory. Psychological Review. 1995;102:419–457. doi: 10.1037/0033-295X.102.3.419.
  37. Mumford JA, Turner BO, Ashby FG, Poldrack RA. Deconvolving BOLD activation in event-related designs for multivoxel pattern classification analyses. NeuroImage. 2012;59:2636–2643. doi: 10.1016/j.neuroimage.2011.08.076.
  38. Mumford JA, Davis T, Poldrack RA. The impact of study design on pattern estimation for single-trial multivariate pattern analysis. NeuroImage. 2014;103:130–138. doi: 10.1016/j.neuroimage.2014.09.026.
  39. Murray EA, Rudebeck PH. Specializations for reward-guided decision-making in the primate ventral prefrontal cortex. Nature Reviews Neuroscience. 2018;19:404–417. doi: 10.1038/s41583-018-0013-4.
  40. Neubert FX, Mars RB, Sallet J, Rushworth MFS. Connectivity reveals relationship of brain areas for reward-guided learning and decision making in human and monkey frontal cortex. PNAS. 2015;112:E2695–E2704. doi: 10.1073/pnas.1410767112.
  41. Noonan MP, Chau BKH, Rushworth MFS, Fellows LK. Contrasting effects of medial and lateral orbitofrontal cortex lesions on credit assignment and decision-making in humans. The Journal of Neuroscience. 2017;37:7023–7035. doi: 10.1523/JNEUROSCI.0692-17.2017.
  42. Park SA, Miller DS, Nili H, Ranganath C, Boorman ED. Map making: Constructing, combining, and inferring on abstract cognitive maps. Neuron. 2020;107:1226–1238. doi: 10.1016/j.neuron.2020.06.030.
  43. Ranganath C, Ritchey M. Two cortical systems for memory-guided behaviour. Nature Reviews. Neuroscience. 2012;13:713–726. doi: 10.1038/nrn3338.
  44. Rushworth MFS, Noonan MP, Boorman ED, Walton ME, Behrens TE. Frontal cortex and reward-guided learning and decision-making. Neuron. 2011;70:1054–1069. doi: 10.1016/j.neuron.2011.05.014.
  45. Schuck NW, Cai MB, Wilson RC, Niv Y. Human orbitofrontal cortex represents a cognitive map of state space. Neuron. 2016;91:1402–1412. doi: 10.1016/j.neuron.2016.08.019.
  46. Schuck NW, Niv Y. Sequential replay of nonspatial task states in the human hippocampus. Science. 2019;364:eaaw5181. doi: 10.1126/science.aaw5181.
  47. Seo H, Lee D. Orbitofrontal cortex assigns credit wisely. Neuron. 2010;65:736–738. doi: 10.1016/j.neuron.2010.03.016.
  48. Shohamy D, Myers CE, Hopkins RO, Sage J, Gluck MA. Distinct hippocampal and basal ganglia contributions to probabilistic learning and reversal. Journal of Cognitive Neuroscience. 2009;21:1821–1833. doi: 10.1162/jocn.2009.21138.
  49. Smith SM, Nichols TE. Threshold-free cluster enhancement: addressing problems of smoothing, threshold dependence and localisation in cluster inference. NeuroImage. 2009;44:83–98. doi: 10.1016/j.neuroimage.2008.03.061.
  50. Stalnaker TA, Cooch NK, Schoenbaum G. What the orbitofrontal cortex does not do. Nature Neuroscience. 2015;18:620–627. doi: 10.1038/nn.3982.
  51. Sutton RS, Barto AG. Reinforcement Learning: An Introduction. MTM; 2014.
  52. Takahashi YK, Roesch MR, Wilson RC, Toreson K, O’Donnell P, Niv Y, Schoenbaum G. Expectancy-related changes in firing of dopamine neurons depend on orbitofrontal cortex. Nature Neuroscience. 2011;14:1590–1597. doi: 10.1038/nn.2957.
  53. Tsujimoto S, Genovesio A, Wise SP. Monkey orbitofrontal cortex encodes response choices near feedback time. The Journal of Neuroscience. 2009;29:2569–2574. doi: 10.1523/JNEUROSCI.5777-08.2009.
  54. Tsujimoto S, Genovesio A, Wise SP. Frontal pole cortex: encoding ends at the end of the endbrain. Trends in Cognitive Sciences. 2011;15:169–176. doi: 10.1016/j.tics.2011.02.001.
  55. Walton ME, Behrens TEJ, Buckley MJ, Rudebeck PH, Rushworth MFS. Separable learning systems in the macaque brain and the role of orbitofrontal cortex in contingent learning. Neuron. 2010;65:927–939. doi: 10.1016/j.neuron.2010.02.027.
  56. Wang F, Schoenbaum G, Kahnt T. Interactions between human orbitofrontal cortex and hippocampus support model-based inference. PLOS Biology. 2020;18:e3000578. doi: 10.1371/journal.pbio.3000578.
  57. Wang F, Kahnt T. Neural circuits for inference-based decision-making. Current Opinion in Behavioral Sciences. 2021;41:10–14. doi: 10.1016/j.cobeha.2021.02.004.
  58. Weiskopf N, Hutton C, Josephs O, Deichmann R. Optimal EPI parameters for reduction of susceptibility-induced BOLD sensitivity losses: A whole-brain analysis at 3 T and 1.5 T. NeuroImage. 2006;33:493–504. doi: 10.1016/j.neuroimage.2006.07.029.
  59. Wikenheiser AM, Schoenbaum G. Over the river, through the woods: cognitive maps in the hippocampus and orbitofrontal cortex. Nature Reviews Neuroscience. 2016;17:513–523. doi: 10.1038/nrn.2016.56.
  60. Wimmer GE, Shohamy D. Preference by association: how memory mechanisms in the hippocampus bias decisions. Science. 2012;338:270–273. doi: 10.1126/science.1223252.
  61. Witkowski PP, Park SA, Boorman ED. Neural mechanisms of credit assignment for inferred relationships in a structured world. Neuron. 2022;110:2680–2690. doi: 10.1016/j.neuron.2022.05.021.
  62. Yushkevich PA, Amaral RSC, Augustinack JC, Bender AR, Bernstein JD, Boccardi M, Bocchetta M, Burggren AC, Carr VA, Chakravarty MM, Chételat G, Daugherty AM, Davachi L, Ding SL, Ekstrom A, Geerlings MI, Hassan A, Huang Y, Iglesias JE, La Joie R, Kerchner GA, LaRocque KF, Libby LA, Malykhin N, Mueller SG, Olsen RK, Palombo DJ, Parekh MB, Pluta JB, Preston AR, Pruessner JC, Ranganath C, Raz N, Schlichting ML, Schoemaker D, Singh S, Stark CEL, Suthana N, Tompary A, Turowski MM, Van Leemput K, Wagner AD, Wang L, Winterburn JL, Wisse LEM, Yassa MA, Zeineh MM, Hippocampal Subfields Group (HSG). Quantitative comparison of 21 protocols for labeling hippocampal subfields and parahippocampal subregions in in vivo MRI: towards a harmonized segmentation protocol. NeuroImage. 2015;111:526–541. doi: 10.1016/j.neuroimage.2015.01.004.
  63. Zajkowski WK, Kossut M, Wilson RC. A causal role for right frontopolar cortex in directed, but not random, exploration. eLife. 2017;6:e27430. doi: 10.7554/eLife.27430.
  64. Zeithamova D, Dominick AL, Preston AR. Hippocampal and ventral medial prefrontal activation during retrieval-mediated learning supports novel inference. Neuron. 2012;75:168–179. doi: 10.1016/j.neuron.2012.05.010.

eLife Assessment

Michael J Frank 1

This study provides important findings that during credit assignment, the lateral orbitofrontal cortex (lOFC) and hippocampus (HC) encode causal choice representations, while the frontopolar cortex (FPl) mediates HC-lOFC interactions when the causality needs to be maintained over longer distractions. This research offers compelling evidence and employs sophisticated multivariate pattern analysis. However, while the task design captures the delayed component, it lacks the full complexity and ambiguity of the credit assignment process observed in real-world scenarios. Moreover, the data indicated that other frontal regions beyond just lOFC were involved in delayed credit assignment. This work will be of interest to cognitive and computational neuroscientists who work on value-based decision-making and fronto-hippocampal circuits.

Reviewer #1 (Public review):

Anonymous

Summary

The authors conducted a study on one of the fundamental research topics in neuroscience: neural mechanisms of credit assignment. Building on the original studies of Walton and his colleagues and subsequent studies on the same topic, the authors extended the research into the delayed credit assignment problem with clever task design, which compared the non-delayed (direct) and delayed (indirect) credit assignment processes. Their primary goal was to elucidate the neural basis of these processes in humans, advancing our understanding beyond previous studies.

Major Strengths and Considerations

Strengths:

(1) Innovative task design distinguishing between direct and indirect credit assignment.

(2) Use of sophisticated multivariate pattern analysis to identify neural correlates of pending representations.

(3) Well-executed study with clear presentation of results.

(4) Extension of previous research to human subjects, providing valuable comparative insights.

Considerations for Future Research:

(1) The task design, while clear and effective, might be further developed to capture more real-world complexity in credit assignment.

(2) There's potential for deeper exploration of the role of task structure understanding in credit assignment processes.

(3) The interpretation of lateral orbitofrontal cortex (lOFC) involvement could be expanded to consider its role in both credit assignment and task structure representation.

Achievement of Aims and Support of Conclusions

The authors successfully achieved their aim of investigating direct and indirect credit assignment processes in humans. Their results provide valuable insights into the neural representations involved in these processes. The study's conclusions are generally well-supported by the data, particularly in identifying neural correlates of pending representations crucial for delayed credit assignment.

Impact on the Field and Utility of Methods

This study makes a significant contribution to the field of credit assignment research by bridging animal and human studies. The methods, particularly the multivariate pattern analysis approach, provide a robust template for future investigations in this area. The data generated offers valuable insights for researchers comparing human and animal models of credit assignment, as well as those studying the neural basis of decision-making and learning.

The study's focus on the lOFC and its role in credit assignment adds to our understanding of this brain region's function.

Additional Context and Future Directions

(1) Temporal ambiguity in credit assignment: While the current design provides clear task conditions, future studies could explore more ambiguous scenarios to further reflect real-world complexity.

(2) Role of task structure understanding: The difference in task comprehension between human subjects in this study and animal subjects in previous studies offers an interesting point of comparison.

(3) The authors used a sophisticated method of multivariate pattern analysis to find the neural correlate of the pending representation of the previous choice, which will be used for the credit assignment process in later trials. The authors tend to describe these representations as being maintained throughout this intervening period. However, the analysis period is specifically the feedback period, which is irrelevant for the credit assignment of the immediately preceding choice. This task period can interfere with the ongoing credit assignment process. Thus, rather than reflecting the passive process of maintaining the information about the previous choice, the activity in this specific period may reflect the active process of protecting that information from interfering and irrelevant information. It would be great if the authors could comment on this important interpretational issue.

(4) Broader neural involvement: While the focus on specific regions of interest (ROIs) provided clear results, future studies could benefit from a whole-brain analysis approach to provide a more comprehensive understanding of the neural networks involved in credit assignment.

Comments after the revision:

The authors have adequately addressed the majority of concerns raised in my previous review. The manuscript has demonstrably improved as a result of these revisions and represents a valuable contribution to the literature on credit assignment.

However, some limitations persist that, while not readily resolvable within the scope of the current study, warrant attention. Specifically, the investigation focuses primarily on the temporal dimension of credit assignment. In real-world scenarios, the complexity of credit assignment extends beyond temporal distance to encompass the inherent ambiguity of causal attribution arising from the presence of multiple potential causal events. Resolving this ambiguity necessitates a form of structural understanding of the environment, a capacity presumably possessed by humans and animals. While the experimental design of this study provides explicit cues regarding the structure of the environment, deciphering such structure in natural settings is a crucial component of the credit assignment process.

Future research should prioritize the investigation of credit assignment within more ecologically valid contexts, focusing on the role of structural understanding in navigating the causal ambiguity inherent in real-world environments. Addressing this aspect will be crucial for developing a more complete and nuanced understanding of credit assignment mechanisms.

In addition, the newly added whole-brain searchlight decoding analysis provides an important nuance regarding the neural substrates of credit assignment (Figure S7). The results reveal not only activity in the lateral orbitofrontal cortex (lOFC), but also, and more robustly, in the medial orbitofrontal cortex/ventromedial prefrontal cortex (mOFC/vmPFC) specifically during the "indirect transition condition" and not the "direct transition condition." This finding suggests a potentially more significant role for mOFC/vmPFC in processing complex, non-immediate credit assignment scenarios. This nuance should be explicitly noted to appreciate the complexity of the neural mechanisms at play.

Reviewer #2 (Public review):

Anonymous

Summary:

The present manuscript addresses a longstanding challenge in neuroscience: how the brain assigns credit for delayed outcomes, especially in real-world learning scenarios where decisions and outcomes are separated by time. The authors focus on the lateral orbitofrontal cortex and hippocampus, key regions involved in contingent learning. By integrating fMRI data and behavioral tasks, the authors examined how neural circuits maintain a causal link between past decisions and delayed outcomes. Their findings offer insights into mechanisms that could have critical implications for understanding human decision-making.

Strengths:

- The experimental designs were extremely well thought-out. The authors successfully coupled behavioral data and neural measures (through fMRI) to explore the neural mechanisms of contingent learning. This integration adds robustness to the findings and strengthens their relevance.

- The emphasis on the interaction between the lateral orbitofrontal cortex (lOFC) and hippocampus (HC) in this study is very well-targeted. The reported findings regarding their dynamic interactions provide valuable insights into contingent learning in humans.

- The use of an advanced modeling framework and analytical techniques allowed the authors to uncover new mechanistic insights regarding a complex decision-making process. The methods developed will also benefit analyses of future neuroimaging data on a range of decision-making tasks.

Weaknesses:

- Given the limited temporal resolution of fMRI and that the measured signal is an indirect measure of neural activity, it is unclear the extent to which the reported causality reflects the true relationship/interactions between neurons in different regions. That said, I believe this concern is minimized by a series of well-thought-out and robust analyses which consistently point to compelling results.

Comments on revisions:

Thank you for your thorough point-by-point responses to my comments and questions. After carefully reviewing the responses and additional analyses/results provided, I do not have further comments. Importantly, I believe the authors have done a great job addressing inevitable limitations that are inherent to fMRI signals. The thoughtful analyses used in the study combined with the timely questions the manuscript is able to address make the study an important contribution to the field.

Reviewer #3 (Public review):

Anonymous

The authors apply multivoxel decoding analyses from fMRI during reward feedback about the cues previously chosen that led to that feedback. They compare two versions of the task - one in which the feedback is provided about the current trial, and one in which the feedback is provided about the previous trial. Reward probability changes slowly over time, so subjects need to identify which cues are leading to reward at a given time. They find evidence for recall of the cue in lateral orbitofrontal cortex (lOFC) and hippocampus (HC). They also find that in the second condition, where feedback is for the one-back trial, this representation is mediated by the lateral frontal pole (FPl).

Overall, the analyses are clean and elegant and seem to be complete. I have only a few comments, all of which can be public.

(1) They do find (not surprisingly) that the one-back task is harder. It would be good to ensure that the reason that they had more trouble detecting direct HC & lOFC effects on the harder task was not because the task is harder and thus that there are more learning failures on the harder one-back task. (I suspect their explanation that it is mediated by FPl is likely to be correct. But it would be nice to do some subsampling of the zero-back task [matched to the success rate of the one-back task] to ensure that they still see the direct HC and lOFC there.)

(2) The evidence that they present in the main text (Figure 3) that the HC and lOFC are mediated by FPl is a correlation. I found the evidence presented in Supplemental Figure 7 to be much more convincing. As I understand it, what they are showing in SF7 is that when FPl decodes the cue, then (and only then) HC and lOFC decode the cue. If my understanding is correct, then this is a much cleaner explanation for what is going on than the secondary correlation analysis. If my understanding here is incorrect, then they should provide a better explanation of what is going on so as to not confuse the reader.

(3) I like the idea of "credit spreading" across trials (Figure 1E). I think that credit spreading in each direction (into the past [lower left] and into the future [upper right]) is not equivalent. This can be seen in Figure 1D, where the two tasks show credit spreading differently. I think a lot more could be studied here. Does credit spreading in each of these directions decode in interesting ways in different places in the brain?

Comments on revisions:

After revision, I have no additional comments.

eLife. 2025 Apr 15;13:RP101841. doi: 10.7554/eLife.101841.3.sa4

Author response

Phillip Witkowski 1, Lindsay Rondot 2, Zeb Kurth-Nelson 3, Mona M Garvert 4, Raymond J Dolan 5, Timothy E Behrens 6, Erie Boorman 7

The following is the authors’ response to the original reviews.

Reviewer 1:

Point 1 of public reviews and point 2 of recommendations to authors.

Temporal ambiguity in credit assignment: While the current design provides clear task conditions, future studies could explore more ambiguous scenarios to further reflect real-world complexity…. The role of ambiguity is very important for the credit assignment process. However, in the current task design, the task instructions almost eliminate the ambiguity about which trial's choice should be assigned credit. The authors claim the real-world complexity of credit assignment in this task design. However, the real-world complexity of this type of temporal credit assignment involves this type of temporal ambiguity of responsibility as causal events. I am curious about the consequence of increasing the complexity of the credit assignment process, which is closer to the complexity in the real world.

We agree that the structure of causal relationships can be more ambiguous in real-world contexts. However, we also believe that there are multiple ways in which a task might approach “real-world complexity”. One way is by increasing the ambiguity in the relationships between choices and outcomes (as done by Jocham et al., 2016). Another is by adding interim decisions that must be completed before viewing the outcome of a first choice, which mimics task structures such as the cooking tasks described in the introduction. In such tasks, the temporal structure of the actions may be irrelevant, but the relationship between choice identities and the outcomes is critical to being effective in the task (e.g., it doesn’t matter whether I add spice before or after the salt, all I need to know is that adding spice will result in spicy soup). While ambiguity about either form of causal relation is clearly an important part of real-world complexity, and would make credit assignment harder, our study focuses on how links between outcomes and specific past choice identities are created at the neural level when they are known to be causal.

We consequently felt it necessary to resolve temporal ambiguity for participants. Instructing participants on the structure of the task allowed us to make assumptions about how credit assignment for choice identities should proceed (assign credit to the choice made N trials back) and allowed us to make positive predictions about the content of representations in OFC when viewing an outcome. This gave the highest power to detect multivariate information about the causal choice and the highest interpretability of such findings.

In contrast, if we had not resolved this ambiguity, it would be difficult to tell if incorrect decoding from the classifier resulted from noise in the neural signal, or if on that trial participants were assigning credit to non-causal choices that they erroneously believed to have caused the outcome due to the perceived temporal structure. We believe this would have ultimately decreased our power to determine whether representations of the causal choice were present at the time of outcome because we would have to make assumptions about what counts as a “true” causal representation.
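The rule described above (assign credit to the choice made N trials back) can be illustrated with a minimal sketch. This is not the behavioral model used in the study: the delta-rule update, the learning rate `alpha`, the starting value of 0.5, and the function name `assign_credit` are all assumptions introduced for illustration only.

```python
from collections import deque

def assign_credit(choices, outcomes, n_back, alpha=0.3, n_options=2):
    """Delta-rule update that credits the choice made n_back trials earlier.

    n_back=0 mirrors the direct condition (credit the latest choice);
    n_back=1 mirrors the indirect condition (credit the previous choice).
    The learner itself is a hypothetical stand-in, not the authors' model.
    """
    values = [0.5] * n_options
    history = deque(maxlen=n_back + 1)  # holds the pending choices
    for choice, outcome in zip(choices, outcomes):
        history.append(choice)
        if len(history) <= n_back:
            continue  # no causal choice exists yet for this outcome
        causal = history[0]  # the choice made n_back trials ago
        values[causal] += alpha * (outcome - values[causal])
    return values
```

With the same choice and outcome sequences, changing `n_back` from 0 to 1 shifts every outcome onto the preceding choice, which is why a known causal structure fixes what counts as the "correct" representation to decode at outcome.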

We have commented on this in the discussions (p.13):

“While our study was designed to focus on the complexity of assigning credit in tasks with different known causal structures, another important component of real-world credit assignment is temporal ambiguity. To isolate the mechanisms which create associations between specific choices and specific outcomes, we instructed participants on the causal structure of each task, removing temporal ambiguity about the causal choice. However, our results are largely congruent with previously reported results in tasks that dissolved the typical experimental trial structure, producing temporal ambiguity, and which observed more pronounced spreading of effect, in addition to appropriate credit assignment (Jocham et al, 2016). Namely, this study found that activation in the lOFC increased only when participants received rewards contingent on a previous action, an effect that was more pronounced in subjects whose behavior reflected more accurate credit assignment. This suggests a shared lOFC mechanism for credit assignment in different types of complex environments. Whether these mechanisms extend to situations where the temporal causal structure is completely unknown remains an important question.”

Point 2 of public reviews and point 1 of recommendations to authors

Role of task structure understanding: The difference in task comprehension between human subjects in this study and animal subjects in previous studies offers an interesting point of comparison…. The credit assignment involves the resolution of the ambiguity in which the causal responsibility of an outcome event is assigned to one of the preceding events. In the original study of Walton and his colleagues, the monkey subjects could not be instructed on the task structure defining the causal relationships of the events. Then, the authors of the original study observed the spreading of the credit assignments to the "irrelevant" events, which did not occur in the same trial of the outcome event but to the events (choices) in neighbouring trials. This aberrant pattern of the credit assignment can be due to the malfunctions of the credit assignment per se or the general confusion of the task structure on the part of the monkey subjects. In the current study design, the subjects are humans and they are not confused about the task structure. Consistently, it is well known that human subjects rarely show the same patterns of the "spreading of credit assignment". So the implicit mechanism of the credit assignment process involves the understanding of the task structure. In the current study, there are clearly demarked task conditions that almost resolve the ambiguity inherent in the credit assignment process. Yet, the focus of the current analysis stops short of elucidating the role of understanding the task structure. It would be great if the authors could comment on the general difference in the process between the conditions, whether it is behavioral or neural.

We would like to thank the reviewer for making this important point. We believe that understanding the structure of the credit-assignment problem above is quite important, at least for the type of credit assignment described here. That is, because participants know that the outcome viewed is caused by the choice they made, 0 or 1 trials into the past, they can flexibly link choice identities to the newly observed outcomes as the probabilities change. Note, however, that this is already very challenging in the 1-back condition because participants need to track the two independently changing probabilities. We believe this is critical to address the questions we aimed to answer with this experiment, as described above.

We agree that this might be quite different from previous studies done with non-human primates, which also included many more training trials and lesions to the lOFC. Both of these aspects could manifest as differences in task performance and processing at behavioural and neural levels, respectively. Consistent with this possibility, in our task, we found no differences in credit spreading between conditions, suggesting that humans were quite precise in both, despite causal relationships being harder to track in the “indirect transition condition”. This lack of credit spreading could be because humans better understood the task-structure compared to macaques or be due to differences in functioning of the OFC and other regions. Because all participants were trained to understand, and were cued with explicit knowledge of, the task structure, it is difficult to isolate its role as we would need another condition in which they were not instructed about the task structure. This would also be an interesting study, and we leave it to future research to parse the contributions of task-structure ambiguity to credit assignment.

Point 3 of public reviews.

The authors used a sophisticated method of multivariate pattern analysis to find the neural correlate of the pending representation of the previous choice, which will be used for the credit assignment process in the later trials. The authors tend to use expressions that these representations are maintained throughout this intervening period. However, the analysis period is specifically at the feedback period, which is irrelevant to the credit assignment of the immediately preceding choice. This task period can interfere with the ongoing credit assignment process. Thus, rather than the passive process of maintaining the information of the previous choice, the activity of this specific period can mean the active process of protecting the information from interfering and irrelevant information. It would be great if the authors could comment on this important interpretational issue.

We agree that lFPC is likely actively protecting the pending choice representation from interference with the most recent choice for future credit assignment. This interpretation is largely congruent with the idea of “prospective memory” (e.g., Burgess, Gonen-Yaacovi, Volle, 2011), in which the lFPC can be thought of as protecting information that will be needed in the future but is not currently needed for ongoing behavior. That said, from our study alone it is difficult to make claims about whether the frontal pole is actively protecting this information from potentially interfering processes. Our “indirect transition condition” only contains trials where there is incoming, potentially interfering information about new outcomes, but no trials that might avoid interference (e.g., an interim choice made but there is nothing to be learned from it). We comment on this important future direction on page 14:

“One interpretation of these results is that the lFPC actively protects information about causal choices when potentially interfering information must be processed. Future studies will be needed to determine if the lFPC’s contributions are specific to these instances of potential interference, and whether this is a passive or active process”

Point 3 of recommendation to authors

A slightly minor, but still important issue is the interpretation of the role of lOFC. The authors compared the observed patterns of the credit assignment to the ideal patterns of credit assignment. Then, the similarity between these two matrices is used to find the associated brain region. In the assumption that lOFC is involved in the optimal credit assignment, the result seems reasonable. But as mentioned above, the current design involves the heavy role of understanding the task structure, it is debatable whether the lOFC is just involved in the credit assignment process or a more general role of representing the task structure.

We agree that this is an important distinction to make, and it is very likely that multiple regions of the OFC carry information about the task structure, and the extent to which participants understood this structure may be reflected in behavioral estimates of credit assignment or the overall patterns of the matrices (though all participants verbalized the correct structure prior to the task). However, we believe that in our task the lOFC is specifically involved in credit-assignment because of the content of the information we decoded. We demonstrated that the lOFC and HPC carry information about the causal choice during the outcome. These results cannot be explained by differences in understanding of the task structure because that understanding would have been consistent across trials where participants choose either shape identity. Thus, a classifier could not use this to separate these types of trials and would reflect chance decoding.

One interpretation of the lOFC’s role in credit assignment is that it is particularly important when a model of the task structure has to be used to assign credit appropriately. Here, we show that the lOFC reinstates specific causal representations precisely at the time credit needs to be assigned, which are appropriate to participants’ knowledge of the task structure. These representations may exist alongside representations of the task structure, in the lOFC and other regions of the brain (Park et al., 2020; Boorman et al., 2021; Seo and Lee, 2010; Schuck et al., 2016). We have added the following sentences to clarify our perspective on this point in the discussion (p. 13):

“Our results from the “indirect transition” condition show that these patterns are not merely representations of the most recent choice but are representations of the causal choice given the current task structure, and may exist alongside representations of the task structure, in the lOFC and elsewhere (Boorman et al., 2021; Park et al., 2020; Schuck et al., 2016; Seo & Lee, 2010).”

Point 4 of public reviews and point 4 of recommendation to authors

Broader neural involvement: While the focus on specific regions of interest (ROIs) provided clear results, future studies could benefit from a whole-brain analysis approach to provide a more comprehensive understanding of the neural networks involved in credit assignment… Also, given the ROI constraint of the analysis, other neural structures may be involved in representing the task structure but not detected in the current analysis.

Given our strong a priori hypotheses about regions of interest (ROIs) in this study, we focused on these specific areas. This choice was based on theoretical and empirical grounds that guided our investigation. However, we thank the reviewer for pointing this out and agree that there could be other unexplored areas that are critical to credit-assignment which we did not examine.

We conducted the same searchlight decoding procedure on a whole brain map and corrected for multiple comparisons using TFCE. We found no significant regions of the brain in the “direct transition condition” but did find other significant regions in our information connectivity analysis of the “indirect transition condition”. In addition to replicating the effects in lOFC and HPC, we also found a region of mOFC which showed a strong correlation with pending choice in lFPC. It’s difficult to say whether this region is involved in credit assignment per se, because we did not see this region in the “direct transition condition” and so we cannot say that it is consistently related to this process. However, the mOFC is thought to be critical to representing the current task state (Schuck et al., 2016), and the task structure (Park et al., 2020). In our task, it could be a critical region for communicating how to assign credit given the more complex task structure of the “indirect transition condition” but more evidence would be needed to support this interpretation.

For now, we have added the results of this whole brain analysis to a new supplementary figure S7 (page 41), and all unthresholded maps have been deposited in a Neurovault repository, which is linked in the paper, for interested readers to assess.

Minor points:

There are some missing and confusing details in the Figure reference in the main text. For example, references to Figure 3 are almost missing in the section "Pending item representations in FPl during indirect transitions predict credit assignment in lOFC". For readability, the authors should improve this point in this section and other sections.

Thank you to the reviewer for pointing this out. We have now added references to Figure 3 on page 8:

“Our analysis revealed a cluster of voxels specifically within the right lFPC ([x,y,z] = [28, 54, 8], t(19) = 3.74, pTFCE <0.05 ROI-corrected; left hemisphere all pTFCE > 0.1, Fig. 3A)”

And on page 10:

Specifically, we found significant correlations in decoding distance between lFPC and bilateral lOFC ([x,y,z] = [-32,24, -22], t(19) = 3.81, [x,y,z] = [20, 38, -14], t(19) = 3.87, pTFCE <0.05 ROI corrected) and bilateral HC ([x,y,z] = [-28, -10, -24], t(19) = 3.41, [x,y,z] = [22, -10, -24], t(19) = 4.21, pTFCE <0.05 ROI corrected), Fig. 3C.

Task instructions for the two conditions (direct and indirect) play important roles in the study. If possible, please include the following parts in the figures and descriptions in the introduction and/or results sections.

We have now included a short description of the condition instructions beginning on page 5:

“Participants were instructed about which condition they were in with a screen displaying “Your latest choice” in the direct transition condition, and “Your previous choice” in the indirect condition.”

And have modified Figure 1 to include the instructions in the title of each condition. We thought this to be the most parsimonious solution so that the choice options in the examples were not occluded.

The subject sample size might be slightly too small in the current standards. Please give some justifications.

We originally selected the sample size for this study to be commensurate with previous studies that looked for similar behavioral and neural effects (see Boorman et al., 2016; Howard et al., 2015; Jocham et al., 2016). This has been mentioned in the “methods” section on page 24.

However, to be thorough, we performed a power analysis of this sample size using simulations based on an independently collected, unpublished data set. In this data set, 28 participants completed an associative learning task similar to the task in the current manuscript. We trained a classifier to decode the causal choice option at the time of feedback, using the same searchlight and cross-validation procedures described in the current manuscript, for the same lateral OFC ROI. We calculated power for various sample sizes by drawing N participants with replacement 1000 times, for values of N ranging from 15 to 25. After sampling the participants, we tested for significant decoding of the causal choice within the subset of data, using small-volume TFCE correction to correct for multiple comparisons. Finally, we calculated the proportion of these samples that were significant at a level of pTFCE <.05.
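The resampling logic of this procedure can be sketched as follows. This is a simplified illustration under stated assumptions, not the analysis code used for the manuscript: each participant is reduced to a single hypothetical effect value (for example, decoding accuracy minus chance in the ROI), and a one-sided sign-flip permutation test stands in for the searchlight decoding with small-volume TFCE correction. The function names and the `n_perm` parameter are introduced here for illustration.

```python
import random

def sign_flip_p(values, n_perm, rng):
    """One-sided sign-flip permutation p-value for mean(values) > 0.

    Assumed stand-in for the TFCE-corrected test described in the text.
    """
    observed = sum(values) / len(values)
    hits = 0
    for _ in range(n_perm):
        flipped = sum(v if rng.random() < 0.5 else -v for v in values)
        if flipped / len(values) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

def bootstrap_power(effects, n, n_boot=1000, n_perm=200, alpha=0.05, seed=0):
    """Proportion of size-n resamples (with replacement) that reach p < alpha."""
    rng = random.Random(seed)
    significant = 0
    for _ in range(n_boot):
        sample = rng.choices(effects, k=n)  # draw n participants with replacement
        if sign_flip_p(sample, n_perm, rng) < alpha:
            significant += 1
    return significant / n_boot
```

Calling `bootstrap_power(effects, n)` for each `n` from 15 to 25, with `effects` holding one value per pilot participant, would trace out a power curve of the kind described; the power estimate obtained this way depends on the stand-in test and would not reproduce the reported figures.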

The results of this procedure show that an N of 20 would result in 84.2% power, which is slightly above the typically acceptable level of 80%. We have added the following sentences to the methods section on page 25:

“Using an independent, unpublished data set, we conducted a power analysis for the desired neural effect in lOFC. We found that this number of participants had 84% power to detect this effect (Fig. S8).”

We also added the following figure to the supplemental figures page (42):

Reviewer 2:

I have several concerns regarding the causality analyses in this study. While multivariate analyses of information connectivity between regions are interesting and appear rigorous, they make some assumptions about the nature of the input data. It is unclear if fMRI, with its poor temporal resolution (in addition to possible region-specific heterogeneity in the readouts), can be coupled with these causal analysis methods to meaningfully study dynamics in a decision task where temporal dynamics are a core component (i.e., delay). It would be helpful to include more information/justification on the methods for inferring relationships across regions from fMRI data. Along this line, discussing the reported findings in light of these limitations would be essential.

We agree that fMRI is limited for capturing fast neural dynamics, and that it can be difficult to separate events that occur within a few seconds. However, we designed the information connectivity analysis to maximally separate the two events in question: the representation of the causal choice being held in a pending state, and the representation of the causal choice during credit assignment. These events were separated by at least 10 seconds and by 15 seconds on average, which is commensurate with recommended intervals for disentangling information in such analyses (Mumford et al., 2012, 2014; also see van Loon et al., 2018, eLife, as an example of fluctuations in decodability over time). This feature of our task design may not have been clear because information connectivity analyses are typically performed within the same task period. We clarify this point on page 32:

“Note that the decoding fidelity metric at each time point represents the decodability of the same choice at different phases of the task. These phases were separated by at least 10 seconds and 15 seconds on average, which can be sufficient for disentangling unique activity (Mumford et al., 2012, 2014).”

However, we agree with the reviewer that the limitations of fMRI make it difficult to precisely determine how the roles of the OFC and lFPC might change over time, and whether other regions may contribute to information transfer at timescales that cannot be detected by fMRI. Further, we do not wish to imply causality between lFPC and lOFC (something we believe we do not claim in the paper), only that information strength in lFPC predicts subsequent strength of the same information in the OFC and HC. We have clarified this limitation on page 14:

“Although we show evidence that lFPC is involved in maintaining specific content about causal choices during interim choices, the limited temporal resolution of fMRI makes it difficult to tell if other regions may be supporting the learning processes at timescales not detectable in the BOLD response. Thus, it is possible that the network of regions supporting credit assignment in complex tasks may be much larger. Our results provide a critical first step in discerning the nature of interactions between cognitive subsystems that make different contributions to the learning process in these complex tasks.”

Reviewer 3:

Point 1 of public reviews:

They do find (not surprisingly) that the one-back task is harder. It would be good to ensure that the reason that they had more trouble detecting direct HC & lOFC effects on the harder task was not because the task is harder and thus that there are more learning failures on the harder one-back task. (I suspect their explanation that it is mediated by FPl is likely to be correct. But it would be nice to do some subsampling of the zero-back task [matched to the success rate of the one-back task] to ensure that they still see the direct HC and lOFC there).

We would like to thank the reviewer for this comment and agree that the “indirect transition condition” is more difficult than the direct transition condition. However, in this task it is difficult to have an explicit measure of learning failures per se because the “correctness” of a choice is to some extent subjective (i.e., based on the gift card preference and the computational model). We could infer when learning failures occur through the computational model by looking at trials in which participants made choices that the model would consider improbable (i.e., non-reward-maximizing) while accounting for outcome preference. However, there are also many other possible explanations for these choices, such as exploratory or confirmatory strategies, lapses in attention, etc. Thus, we could not guarantee that the two conditions would be uniquely matched in difficulty with specific regard to learning even if we subsampled these trials. We feel this issue would be better left to future experiments that can specifically compare learning failures. We have now addressed this point when discussing the model on page 31:

“Note that learning failures are not trivial to identify in our paradigm and model, because every choice is based on a participant’s preference between gift card outcomes, and the ability of the computational model to accurately estimate participants’ beliefs in the stimulus-outcome transition probabilities.”

Point 2 of public reviews:

The evidence that they present in the main text (Figure 3) that the HC and lOFC are mediated by FPl is a correlation. I found the evidence presented in Supplemental Figure 7 to be much more convincing. As I understand it, what they are showing in SF7 is that when FPl decodes the cue, then (and only then) HC and lOFC decode the cue. If my understanding is correct, then this is a much cleaner explanation for what is going on than the secondary correlation analysis. If my understanding here is incorrect, then they should provide a better explanation of what is going on so as to not confuse the reader.

SF7 (now Figures 3C and 3D) does show that positive decoding in the HC and lOFC is more likely to occur when there is positive decoding in lFPC. However, the analyses shown in these figures are only meant to be control analyses to further characterise what is being captured, but not necessarily implied, by the information connectivity analysis. For example, in principle the classifier might never correctly decode a choice label in the lOFC or HC while still getting closer to the hyperplane when the lFPC patterns are correctly decoded. This would lead to a positive correlation, but the result would be difficult to interpret since the patterns in lOFC and HC are incorrect. Figure SF7A (now Fig. 3C) shows that this is not the case: lateral OFC and HC have higher than chance positive decoding when lFPC has positive decoding. Figure SF7B (now Fig. 3D) shows that we can decode that information even if a new hyperplane is constructed. However, both cases have less information about the relationship between these regions because they do not include the trials where the lOFC/HC and lFPC classifiers were incorrect at the same time. The correlation in Figure 3B includes these failures, giving a more holistic picture of the data. We therefore try to concisely clarify this point on page 10:

“These signed distances allow us to relate both success in decoding information, as well as failures, between regions.”

And here on page 10:

“Subsequent analyses confirmed that this effect was due to these regions showing a significant increase in positive (correct) decoding in trials where pending information could be positively (correctly) decoded in lFPC, and not simply due to a reduction in incorrect information fidelity (see Fig. 3C & 3D).”

And have integrated these figures on page 9:
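As a rough illustration of the signed-distance logic discussed in this response, the sketch below contrasts the trial-wise correlation of signed classifier distances between two regions with the conditional rate of positive (correct) decoding. It is a simplified sketch under stated assumptions, not the analysis code: the trial values are invented, the function names are hypothetical, and a plain Pearson correlation is assumed as the connectivity measure.

```python
import math


def signed_distance(decision_value, true_label):
    """Distance from the hyperplane, positive when the classifier falls on the
    correct side. Assumes labels coded as +1 / -1 (hypothetical convention)."""
    return decision_value * true_label


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def positive_rate_given_positive(source, target):
    """Share of trials with correct target decoding among trials with correct source decoding."""
    selected = [t for s, t in zip(source, target) if s > 0]
    return sum(1 for t in selected if t > 0) / len(selected)


# Invented signed distances per trial (assumption): pending period in lFPC,
# feedback period in lOFC.
lfpc = [0.8, -0.3, 0.5, -0.6, 0.2, -0.1]
lofc = [0.6, -0.2, 0.4, -0.5, -0.1, 0.1]

# Hypothetical counterexample: lOFC is never correct, but lies closer to the
# hyperplane when lFPC is correct, so the correlation is still positive.
lofc_never_correct = [-0.1, -0.6, -0.2, -0.8, -0.3, -0.5]

r_real = pearson_r(lfpc, lofc)
r_counter = pearson_r(lfpc, lofc_never_correct)
rate_real = positive_rate_given_positive(lfpc, lofc)
rate_counter = positive_rate_given_positive(lfpc, lofc_never_correct)
```

In this toy example both correlations are positive, but only the first case shows above-zero positive decoding in lOFC on trials where lFPC decoding is positive. This is why the conditional check complements the correlation rather than replacing it.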

Point 3 of public reviews:

I like the idea of "credit spreading" across trials (Figure 1E). I think that credit spreading in each direction (into the past [lower left] and into the future [upper right]) is not equivalent. This can be seen in Figure 1D, where the two tasks show credit spreading differently. I think a lot more could be studied here. Does credit spreading in each of these directions decode in interesting ways in different places in the brain?

We agree that this is an interesting question because each component of the off diagonal (upper and lower triangles) may reflect qualitatively different processes of credit spreading. However, we believe this analysis is difficult to carry out with the current dataset for two reasons. First, we designed this study to ask specifically about the information represented in key credit assignment regions during precise credit assignment, meaning we did not optimize the task to induce credit spreading at any point. Indeed, our efforts to train participants on the task were to ensure they would correctly assign credit as much as possible. Figure 1F shows that the regression coefficients representing credit spreading in each condition are near zero (in the negative direction), with little individual variability compared to the credit assignment coefficients. Thus, any analysis aiming to test for credit spreading would unfortunately be poorly powered. Studies such as Jocham et al. (2016), with more variability in causal structures, or studies that introduce ambiguity about the causal structure by dissolving the typical trial structure, would be better suited to address this interesting question. Second, due to our design, it is difficult to intuitively determine what kind of information should be coded by neural regions when credit spreads to the upper diagonal, since these cells reflect current outcomes that are being linked to future choices.

Replace all the FPl with LFPC (lateral frontal polar cortex)

We have now replaced “FPl” with “LFPC” throughout the text and figures.

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Data Citations

    1. Witkowski PP, Rondot LJH. 2024. Neural mechanisms of credit assignment for delayed outcomes during associative learning. NeuroVault. 17702
    2. Witkowski PP, Rondot LJH. 2024. Neural mechanisms of credit assignment for delayed outcomes during associative learning. Open Science Framework. b9m6q

    Supplementary Materials

    MDAR checklist

    Data Availability Statement

    Unthresholded group-level statistical maps have been deposited at NeuroVault (https://neurovault.org/collections/17702/) and are publicly available as of the date of publication. Links are listed in the key resources table. All original code has been deposited at Open Science Framework (https://osf.io/b9m6q/) and is publicly available as of the date of publication.

    The following datasets were generated:

    Witkowski PP, Rondot LJH. 2024. Neural mechanisms of credit assignment for delayed outcomes during associative learning. NeuroVault. 17702

    Witkowski PP, Rondot LJH. 2024. Neural mechanisms of credit assignment for delayed outcomes during associative learning. Open Science Framework. b9m6q


    Articles from eLife are provided here courtesy of eLife Sciences Publications, Ltd
