PLOS Computational Biology. 2022 Sep 12;18(9):e1010500. doi: 10.1371/journal.pcbi.1010500

Thalamic regulation of frontal interactions in human cognitive flexibility

Ali Hummos 1,2,#, Bin A Wang 3,4,#, Sabrina Drammis 1,2,5,#, Michael M Halassa 1,2,*, Burkhard Pleger 3,4
Editor: Stefano Palminteri 6
PMCID: PMC9499289  PMID: 36094955

Abstract

Interactions across frontal cortex are critical for cognition. Animal studies suggest a role for mediodorsal thalamus (MD) in these interactions, but the computations performed and direct relevance to human decision making are unclear. Here, inspired by animal work, we extended a neural model of an executive frontal-MD network and trained it on a human decision-making task for which neuroimaging data were collected. Using a biologically-plausible learning rule, we found that the model MD thalamus compressed its cortical inputs (dorsolateral prefrontal cortex, dlPFC) underlying stimulus-response representations. Through direct feedback to dlPFC, this thalamic operation efficiently partitioned cortical activity patterns and enhanced task switching across different contingencies. To account for interactions with other frontal regions, we expanded the model to compute higher-order strategy signals outside dlPFC, and found that the MD offered a more efficient route for such signals to switch dlPFC activity patterns. Human fMRI data provided evidence that the MD engaged in feedback to dlPFC, and had a role in routing orbitofrontal cortex inputs when subjects switched behavioral strategy. Collectively, our findings contribute to the emerging evidence for thalamic regulation of frontal interactions in the human brain.

Author summary

The expansion of frontal cortex during mammalian evolution suggested a prominent role in intelligent and adaptive behavior, overshadowing earlier parts of the brain. However, recent rodent studies have pointed to a role for the cognitive mediodorsal thalamus (MD) in sustaining and flexibly switching representations in the frontal cortex, although the direct relevance to human decision-making is unclear. Here, inspired by animal work, we extended a neural model of an executive frontal-MD network and trained it on a human decision-making task for which human neuroimaging data were collected. We found that the model MD thalamus learned an abstract representation of its cortical inputs and provided direct feedback to frontal cortex, leading to flexible computations and enhanced task switching. These abstract MD representations and the ability to re-organize frontal computations created an efficient mechanism by which the MD can integrate input from other regions to select behavioral strategy dynamically. The model predicted an efficient route through the MD for frontal region interactions, and we found consistent evidence in human neuroimaging data. Collectively, our findings contribute to the emerging evidence for thalamic regulation of frontal interactions in the human brain.

Introduction

The expansion of frontal cortex during mammalian evolution is thought to have given rise to advanced cognition [1]. In humans, frontal cortex consists of several regions that are generally thought to have specialized functions [2–5]. For example, the dorsolateral region is linked to executing cognitive functions [6], while the orbitofrontal cortex (OFC) is associated with adaptive and flexible responding in a changing environment [7]. The ventromedial region is associated with estimating the value and salience of sensory and cognitive variables and thereby, guides value-based decision-making [8].

Interactions between these regions are thought to implement a variety of functions relevant to the flexibility by which cognitive resources are deployed. For example, interactions between medial prefrontal cortical regions and the anterior cingulate cortex are thought to implement a process of updating beliefs about higher-order contextual associations [9–11]. Interactions between OFC and dorsolateral prefrontal cortex (dlPFC) are thought to implement a form of reinforcement learning [12,13], which has been recently modeled by artificial recurrent neural networks (RNNs) [14].

Recent studies have indicated a role for the thalamus in cortico-cortical interactions in non-human animals [15,16] and humans [17,18]. These studies have provided a complementary perspective on thalamic function to the classical notion of a relay, often seen in thalamic regions receiving inputs from sensors and projecting to primary sensory cortex [19]. Specifically, such studies of associative thalamic regions have indicated a role for this subcortical collection of excitatory neurons, devoid of recurrent connections, in sustaining, switching, and synchronizing connected cortical areas [15,20–24]. For example, in mice, multiple studies have shown that the mediodorsal thalamus (MD) projects to areas of prefrontal cortex with signals able to sustain or switch task-relevant activity patterns, enabling the maintenance of working memory on one end and task switching on another [20,21]. The anatomically divergent and convergent cortical projections originating from the thalamus make it an ideal location not only for information integration but also for the deployment of simultaneous instructions [25,26]. Specifically, studies have shown diffuse projections from neurons in MD thalamus across several areas of prefrontal cortex in rodents [27–30]. Beyond the MD’s dense connections with frontal cortical regions, the structure also receives projections from regions within the temporal cortex, the midbrain, and the basal ganglia [16,27,31], positioning it well to have a broad integrative function. Given the studies above, two questions are pertinent: first, is the MD thalamus a functional mediator and a communication bridge for frontal cortical areas? Second, what are the precise computational functions of such engagement?

In this study, we built on recent work in rodents and extended a neural model that captures interactions between the MD and frontal cortex [21], hereafter referred to as the thalamocortical “neural model”. We implemented biologically plausible learning rules at the corticothalamic projections capable of learning abstractions of the dominant representations in dlPFC (as inferred from the rodent work). A key model assumption from the rodent work was that the MD thalamus performs an intermediate-level computation rather than a sensory signal relay [20,21]. We first confirmed that this is the case for the human brain, by examining fMRI data of human participants performing a probabilistic inference task [32]. We then asked the neural model to solve components of the same probabilistic inference task and identified a role for the MD in rapid and flexible gating of human dlPFC strategy representations.

We next used the model to interrogate computational mechanisms underlying interactions across frontal regions. Therefore, we considered computations partly attributed to the OFC in representing latent states in a task [33,34] and appropriately switching them at change points [35]. We considered whether an OFC model would communicate its representation of behavioral strategy using a corticocortical OFC-dlPFC pathway or using a transthalamic OFC-MD-dlPFC pathway. Simulations showed that leveraging the thalamic route by OFC inputs required far fewer neurons and a shorter-lasting switch signal compared to a direct OFC-dlPFC pathway. Critically, human fMRI data subjected to dynamic causal modelling was consistent with the trans-thalamic route, providing evidence for the MD having a central role in regulating distributed frontal cortical interactions. All told, our modeling and experimental analysis provide compelling evidence for thalamic gating and integration in human cognitive flexibility.

Results

Thalamocortical neural model solves human task and shows comparable behavioral dynamics

Human probabilistic inference task

Participants learned the predictive strength of a tactile cue (up or down) in forecasting a subsequently presented target stimulus (up or down, Fig 1A) [32,36]. Participants responded by either matching the input cue (match rule) or responding with the opposite direction (non-match rule). The task consisted of unannounced pseudorandom blocks with different association levels between rule and rewards, which we refer to as distinct contexts requiring different sensorimotor mappings akin to the rodent work [21]. Contexts had cue-reward associations that were strongly predictive (90% of match responses rewarded, or 10%), moderately predictive (70% and 30%) and non-predictive (50%).
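The block structure described above can be illustrated with a short Python sketch. This is not the experimental code: the function name `make_block`, the default of 50 trials per block (an approximation of the reported block length), and the seeded random generator are assumptions made for illustration.

```python
import random

def make_block(p_match, n_trials=50, rng=None):
    """Return a list of (cue, target) pairs for one block.

    p_match is the probability that the target direction equals the cue,
    i.e. that a 'match' response would be rewarded on that trial.
    """
    rng = rng or random.Random(0)          # hypothetical fixed seed for reproducibility
    flip = {"up": "down", "down": "up"}
    trials = []
    for _ in range(n_trials):
        cue = rng.choice(["up", "down"])
        target = cue if rng.random() < p_match else flip[cue]
        trials.append((cue, target))
    return trials

# Association levels named in the text: strongly, moderately and non-predictive.
ASSOCIATION_LEVELS = [0.9, 0.7, 0.5, 0.3, 0.1]
```

A session would then concatenate blocks drawn from `ASSOCIATION_LEVELS` in pseudorandom order, without signaling the block boundaries to the participant or model.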

Fig 1. Comparison between human participants and neural model performance on a probabilistic inference task.

A. Behavioral task design: on individual trials, human participants were asked to generate a behavioral choice based on a somatosensory cue input. Inputs were either up or down, and the mapping onto the choice (match or non-match) changed covertly across blocks of ~50 trials. Participants were told what the correct choice was on every trial (target). The mapping was probabilistic with different association levels (90%, 70% and 50%). B. Cartoon of the main neural model used in this study. The model contains a recurrently connected reservoir of rate neurons that capture the dlPFC, with long range connections to feedforward MD neurons with a winner-take-all mechanism. The model receives task relevant sensory inputs (up and down) as well as value of the mapping strategies (match and non-match). The model outputs behavioral choice (up or down) through two output neurons that linearly decode the reservoir activity. C. Mean task performance of human participants (purple) and of the neural model (green) plotted over 10 blocks of association levels, to show qualitative similarity (middle). Grey bars reflect the correct rule that was rewarded on that trial and green bars show the model response (bottom). The red trace shows the match strategy value signaled by the vmPFC (top, see Methods) which served as an input to the dlPFC RNN. These expected reward values closely follow the association levels through the experiment (black line, top). D. Comparison of model and human performance of high (90/10% predictability), low (70/30%), and non-predictive (50%) association levels. Pairwise comparisons of human responses across association levels were statistically significant for all pairs (left; two-way ANOVA; 90/10% and 70/30%, ***p < 0.0001; 90/10% and 50%, ***p < 0.0001; 70/30% and 50%, ***p < 0.0001). 
Pairwise comparisons of model responses across association levels were statistically significant for all pairs (middle; two-way ANOVA; 90/10% and 70/30%, ***p < 0.0001; 90/10% and 50%, ***p < 0.0001; 70/30% and 50%, *p = 0.039). Comparison of model and human responses with association levels revealed no significant differences for all association levels (right; two-way ANOVA). E. Correlation between human and model performance with randomly sampled chunks of trials (humans: 15 trials, model: 150 trials) sampled from all trials (Pearson correlation, R² = 0.89, ***p < 0.0001).

The neural model

Previous data analysis of fMRI signals in this paradigm revealed multiple interactions between frontal cortical areas and the MD thalamus [32,37]. To derive insight into putative computational mechanisms, we extended a neural model encompassing the MD and an executive PFC region (Fig 1B) which had been used to implement a context switching task in mice [21]. Our neural model includes a reservoir of recurrently connected neurons to model the executive prefrontal cortex (dlPFC in primates), consistent with prior modeling efforts in this domain [38,39]. Sensory inputs (up and down tactile cues) are projected onto the dlPFC (presumably through the sensory cortical hierarchy), consistent with recordings of sensory responsive neurons in the mouse prefrontal cortex [21], and analysis of human fMRI detailed below. Responses (up and down) are read out from two output units with the output weights learned through a biologically-plausible learning rule (node perturbation [40], see Methods).
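For readers unfamiliar with node perturbation, the following Python sketch shows the general idea on a single linear readout unit. It is a generic textbook-style illustration, not the rule as specified in the Methods: the function name `train_readout`, the squared-error objective, and the values of `lr`, `sigma` and `seed` are all assumptions.

```python
import random

def train_readout(x, target, n_steps=500, lr=0.05, sigma=0.05, seed=0):
    """Node perturbation on one linear output unit y = w . x."""
    rng = random.Random(seed)
    w = [0.0] * len(x)
    for _ in range(n_steps):
        y = sum(wi * xi for wi, xi in zip(w, x))
        noise = rng.gauss(0.0, sigma)            # random perturbation of the output node
        err_clean = (y - target) ** 2            # error without the perturbation
        err_noisy = (y + noise - target) ** 2    # error with the perturbation
        # Move weights against perturbations that increased the error;
        # on average this follows the negative error gradient.
        scale = -lr * (err_noisy - err_clean) * noise / sigma ** 2
        w = [wi + scale * xi for wi, xi in zip(w, x)]
    return w
```

The update uses only locally available quantities (the presynaptic activity, the injected noise, and a scalar change in error), which is why rules of this family are often described as biologically plausible.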

Model MD neurons are devoid of local excitatory connections, consistent with electrophysiological findings [41], and include a winner-take-all mechanism representing mutual inhibition through the thalamic reticular nucleus [41,42]. Previous work in mice indicated that the MD thalamus integrates its prefrontal inputs to generate a compressed representation of the task’s context [20,21]. We explored whether such a representation could emerge in the model based on a Hebbian learning rule at the dlPFC to MD corticothalamic connections, inspired by recent work [43] suggesting that Hebbian learning mediated unsupervised clustering. Using a higher number of MD neurons yielded similar results, so for simplicity we kept the number at 2 neurons for the current study.
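A minimal Python sketch of how winner-take-all competition combined with Hebbian updates can cluster cortical activity patterns is given below. It is an illustration under stated assumptions rather than the model's implementation: the names `winner_take_all` and `md_step`, the learning rate, and the decay term that keeps weights bounded are hypothetical choices.

```python
def winner_take_all(drive):
    """Keep only the most strongly driven MD unit active."""
    winner = max(range(len(drive)), key=lambda i: drive[i])
    return [1.0 if i == winner else 0.0 for i in range(len(drive))]

def md_step(weights, pfc_rates, lr=0.1):
    """One MD update; weights[i][j] connects dlPFC unit j to MD unit i.

    The weights are modified in place and the MD activity is returned.
    """
    drive = [sum(w * r for w, r in zip(row, pfc_rates)) for row in weights]
    md = winner_take_all(drive)
    for i, row in enumerate(weights):
        for j, r in enumerate(pfc_rates):
            # Hebbian term (pre * post) with a decay that keeps weights bounded
            # (assumed form); only the winning MD unit (md[i] == 1) learns.
            row[j] += lr * md[i] * (r - row[j])
    return md
```

With two MD units, repeated presentation of two distinct dlPFC activity patterns drives each unit's weights toward one of the patterns, so that each MD unit comes to signal one cluster of cortical activity.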

Previous experimental evidence showed that MD thalamic projections engaged direct inhibition of prefrontal neural activity and indirect amplification of local functional connectivity through disinhibitory mechanisms [20,44,45]. We model these experimental observations by an additive (suppressive) and multiplicative (amplifying) thalamic output to the prefrontal reservoir recurrent neural network (RNN). Model behavior did not require learning at these projections and they were chosen randomly and held fixed for stability.
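The two thalamic effects can be sketched as a gain and a bias applied to each reservoir unit. The Python fragment below is a simplified illustration, not the published model equations: the function names, the `tanh` nonlinearity, and the form `gain = 1 + W_mul · md` are assumptions.

```python
import math

def thalamic_modulation(md, w_mul, w_add):
    """Return per-neuron gains and biases set by MD activity.

    w_mul[i][k] and w_add[i][k] connect MD unit k to dlPFC unit i.
    Additive weights are expected to be negative (suppressive).
    """
    gains = [1.0 + sum(w * m for w, m in zip(row, md)) for row in w_mul]
    biases = [sum(w * m for w, m in zip(row, md)) for row in w_add]
    return gains, biases

def pfc_update(rates, recurrent, ext_input, gains, biases):
    """Rate update in which MD scales the recurrent drive and adds a bias."""
    new_rates = []
    for i, row in enumerate(recurrent):
        rec = sum(j * r for j, r in zip(row, rates))
        new_rates.append(math.tanh(gains[i] * rec + biases[i] + ext_input[i]))
    return new_rates
```

When no MD unit is active, the gains are 1 and the biases are 0, so the reservoir runs unmodulated; an active MD unit amplifies recurrent drive in some units and suppresses others, depending on the fixed random weights.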

A key difference between the previous animal work upon which our neural model was designed and the current task is that inferring a block switch in the current task requires a record of recent rewards. So, we provided the model with inputs representing the value of the two behavioral strategies in recent trials (i.e., expected rewards from match vs. non-match strategies). These value inputs improved model behavioral flexibility and provided a signal to disambiguate the two strategies based on recent reward patterns (S1A–S1D Fig). Consistent with this modeling assumption, fMRI data of humans performing the same task showed that vmPFC activity correlated with prior belief about outcome value, indicating the availability of such signals in the prefrontal region (S1E and S1F Fig, also see [46,47]).

Human and model performances

Our dlPFC-MD neural model was able to solve the human inference task, exhibited performance qualitatively similar to human performance, and was able to flexibly switch between the behavioral strategies at block changepoints (Fig 1C–1E). Specifically, the neural model was able to match human performance on outputting the correct rule over blocks with multiple association levels and with qualitatively similar dynamics in switching between blocks (example in Fig 1C). Human participants learned the task successfully, with more correct performance in blocks with high predictability (90%/10% association levels) compared to low predictability (70%/30%) or unpredictability (50%; Fig 1D, left). There were no significant differences in behavioral performance between human data and the neural model when sampled from high, low, and non-predictive association levels (Fig 1D); performance was highly correlated when randomly sampled from trials across the task (Fig 1E).

Human fMRI reveals MD engagement in a cognitive role in the task

To draw direct parallels between the neural model and human data, we returned to our previously collected dataset [32] and took a different analytical approach from the published work. In mice, prefrontal inhibition impaired overall performance, while MD inhibition led to impairment specifically when animals were required to switch behavior [21]. As such, we constrained the generalized linear model (GLM) analysis to the prefrontal and thalamic areas, and used a smaller smoothing kernel for improved spatial resolution (see Methods). With this new analysis, we were able to directly evaluate the engagement of different prefrontal and thalamic areas by a small-volume correction [48] (for brain-wide GLM see S1 and S2 Tables). Within the frontal regions, the GLM showed significant activity modulation in dlPFC in trials involving strategy switching (Switching>Staying, t (27) = 7.07, p < 0.001; small-volume FWE-corrected; Fig 2A), and interestingly, also in the orbitofrontal cortex (OFC) (Switching>Staying, t (27) = 7.31, p < 0.001; small-volume FWE-corrected, Fig 2A, middle).

Fig 2. Human fMRI confirms the dlPFC, not OFC or MD, receives the driving sensory input.

A. Strategy switching (Switching>Staying) entailed significant activity in right dlPFC, right OFC and right MD; here projected on axial, coronal and sagittal MRI brain slices. Brain activations displayed at p < 0.001 (uncorrected, red) and p < 0.05 (whole brain FWE correction, yellow; small-volume correction, blue). B. Models with dlPFC, MD and OFC receiving sensory tactile driving “inputs” respectively. C. Bayesian Model Selection revealed dlPFC, not OFC or MD, as the input region. The left panel shows the average and standard error of log-evidence for all models in the corresponding inputs family (we have 6 models for each input family, see S5 Fig). The sum of the log-evidence over subjects was used to compute the expected model posterior probabilities of each input family which are shown in right panel. (dlPFC–dorsolateral prefrontal cortex; OFC–orbitofrontal cortex; MD–thalamic mediodorsal nucleus).

Within the thalamic region, the MD thalamus was the only thalamic area modulated by switching (Fig 2A, right, peak MNI coordinates x/y/z = 12/−10/8, t (27) = 4.37, p = 0.02, small-volume FWE-corrected). These findings corroborated the involvement of these areas included in the neural model as a direct interpretation of physiological recordings from mice performing a similar task [21], and added the OFC as a region of interest which we introduce to the model in a later section.

This frontal network switching motif (dlPFC, OFC and MD) offered a starting point for testing one of the key neural model assumptions: that the MD thalamus is engaged in an intermediate computational role (compressing dlPFC sensorimotor mapping) rather than as a relay of sensory inputs to the dlPFC. By leveraging Dynamic Causal Modelling (DCM) together with family-level Bayesian model selection (BMS) on fMRI data, we first evaluated three competing hypotheses: do sensory tactile inputs arrive at the dlPFC, OFC, or MD (DCM C-matrix, Fig 2B)? BMS evaluates the model evidence, which trades off the accuracy and complexity of a causal model, and uses the group log-evidence (which is equivalent to the log group Bayes factor) to quantify the relative goodness of fit. The sum of the log-evidence over subjects can be used to compute how likely it is that a specific model generated the data at the group level (i.e., the expected model posterior probability). Consistent with the neural model construction and a direct extrapolation from previous animal work [21], we found that the model posterior probability revealed very strong evidence in favor of the causal model in which sensory inputs first enter the dlPFC-MD-OFC circuit through the dlPFC (posterior probability = 1.00, Fig 2C). While the non-relay functions of thalamic nuclei in the human brain are suspected, evidence remains indirect and, to our knowledge, this is one of the few direct demonstrations that a human thalamic area engages in a task not to relay sensory inputs. In addition, this finding also relates to controversies about the direction of communication between the dlPFC and OFC, where some evidence points to the OFC as the recipient of multi-modal sensory input [49] and thereby, likely forwarding such information to dlPFC, while other evidence points to OFC being hierarchically higher and contextualizing representations in dlPFC [50]. Our findings are in support of the latter view, at least in this task. Our analysis did not include lower-level sensory areas but focused on the subgraph of the three areas implicated in cognitive flexibility in this task.
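The step from summed log-evidence to a group-level posterior probability can be made concrete with a short Python sketch. It assumes a fixed-effects comparison with a uniform prior over models (or model families); the function name `group_posterior` is hypothetical and the sketch does not reproduce the analysis software used for the study.

```python
import math

def group_posterior(log_evidence):
    """Posterior probability of each model given per-subject log-evidence.

    log_evidence[s][m] is the log-evidence of model (or family) m for subject s.
    Assumes a uniform prior over models and a fixed-effects group analysis.
    """
    n_models = len(log_evidence[0])
    group = [sum(subj[m] for subj in log_evidence) for m in range(n_models)]
    top = max(group)                        # subtract the max for numerical stability
    weights = [math.exp(g - top) for g in group]
    total = sum(weights)
    return [w / total for w in weights]
```

Because the group log-evidence enters through an exponential, a difference of even a few units between the best and second-best model is enough to push the winning posterior probability close to 1.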

Neural model MD activity suggests a role in flexible dlPFC switching

To gain computational insight into our neural model of dlPFC-MD, we considered the encoding properties of dlPFC and MD. As predicted from previous studies [21], the dlPFC reservoir showed mixed selectivity, with neurons encoding a variety of response patterns (Fig 3A–3C), including neurons encoding a cue in a single context (Fig 3A), neurons encoding the same cue across contexts (Fig 3B) and neurons switching their encoding across contexts (Fig 3C). Importantly, those neural patterns were readily read-out as behavioral responses in the output neurons (up versus down; S2A and S2B Fig). In comparison, neural model MD neurons encoded the dominant strategy within a block, which we interpret as a temporal context, as it is not signaled by a contextual cue but rather can be inferred from its structure in time (Fig 3D and 3E). MD neurons signaled the temporal context by exhibiting a dominant output that co-varied with the level of association across blocks (S2C and S2D Fig).

Fig 3. Context encoding in MD enhances prefrontal cognitive flexibility in the neural model.

A-C. Example responses from 3 dlPFC neurons to sensory input (‘Up’ or ‘Down’) in trials where match strategy was rewarded (match block, left panels), or non-match strategy was rewarded (non-match block, middle panels), and the trial averaged responses over the first 3 blocks of the experiment (right panels). D. The same panels for one MD neuron showing responses selective to match context. E. Quantification of the coding properties using regression to decode task-relevant variables from population activity of either dlPFC or MD. Context could be decoded from either dlPFC or MD (left), but cue was only decodable from dlPFC activity and not MD (right). F. We tested the modulation of MD activity by rule or by cues by taking the correlation between trial-averaged MD activity and a vector representing either the cues (1 for up trials and -1 for down) or rule (1 for match rule or -1 for non-match). To demonstrate the effect of the dynamic Hebbian eligibility trace we tested MD modulation as we varied the eligibility trace time constant (tau). Lower (higher) tau values biased MD to encode sensory cues (rule). G. Comparing mean (± STD) of ratio correct responses for MD intact and MD lesioned models from 20 random instantiations of the models. Experiment had blocks alternating between 90% and 10% match trials as other association levels showed similar patterns. H. Same as in G, but showing the effects of lesioning MD after training with MD intact for a few blocks (lesion marked with red arrow). Statistical testing with t-test over performance means in each block for 20 random runs.

We considered how these model MD representations might emerge and examined the Hebbian learning dynamics at the dlPFC-MD corticothalamic connections. Since components of Hebbian plasticity operated at different timescales [51–53], we considered a dynamic Hebbian eligibility trace and parametrized its time-constant. Smaller time-constant values biased MD to encode the fast-changing signals (up or down cues) while larger time-constant values biased MD to encode the slower temporal context signals (i.e., encoding the dominant strategy within a block, Fig 3F). This finding suggested that the brain may be using long-timescale eligibility traces at the corticothalamic connections for the thalamus to encode current temporal context.
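The effect of the time constant can be seen in a toy Python example in which the eligibility trace is modeled as a leaky integrator. The function name `eligibility_trace`, the discrete-time update, and the two test signals are assumptions chosen for illustration, not the trace as defined in the Methods.

```python
def eligibility_trace(signal, tau):
    """Leaky integration of a coactivity signal with time constant tau (in steps)."""
    trace, history = 0.0, []
    for x in signal:
        trace += (x - trace) / tau
        history.append(trace)
    return history

# Cue-like signal: flips sign on every step.
fast = [1.0 if t % 2 == 0 else -1.0 for t in range(200)]
# Context-like signal: constant within a block.
slow = [1.0] * 200

short_tau_fast = eligibility_trace(fast, tau=1.0)    # follows the fast signal exactly
long_tau_fast = eligibility_trace(fast, tau=50.0)    # averages the fast signal away
long_tau_slow = eligibility_trace(slow, tau=50.0)    # accumulates the slow signal
```

A long time constant filters out the rapidly alternating cue signal while retaining the slowly varying context signal, which is the qualitative pattern described for the MD.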

To evaluate the contribution of MD activity patterns to dlPFC function, we performed a perturbation study in which we eliminated the output of the MD to the dlPFC. Compared to the MD intact model, the MD lesioned model exhibited fairly equivalent steady-state performance but had perturbed transient performance when the level of association switched across blocks (switching deficit; Fig 3G). We noticed similar patterns with other association levels (70%, 50%, 30%) and we only show alternating 90% and 10% blocks in the following experiments for clarity. Of note, the difference between the MD intact and MD lesioned models starts to vanish over successive blocks, indicating that after multiple rehearsals the cortical network may eventually learn the two rules and switch between them using slow synaptic plasticity. Further, in a model trained with intact MD, lesioning after acquisition of behavior causes a similar pattern with poor performance selectively in trials following rule switches, but again the plasticity from dlPFC to the output layer eventually recovers the behavior with sufficient training (Fig 3H). These impairment patterns are consistent with animal experiments where MD inhibition delayed the acquisition of tasks requiring flexible responses [23,24,54–56], but eventually MD-inhibited animals achieved normal performance after extensive training [23,24,57].

To understand the effect of MD engagement on dlPFC encoding of task-relevant variables, we systematically compared dlPFC neural activity patterns across the MD intact and MD lesioned neural models. We found that MD engagement reduced the number of dlPFC neurons that show cue selectivity (up vs. down) across the two behavioral contexts rather than only in one (Figs 4A and S3), indicating that the MD may impart a higher degree of computational specialization within the dlPFC. Consistent with this notion, the connection weights from cue selective neurons in the dlPFC reservoir to the output neurons grew more consistently during the appropriate behavioral context, if they were cue selective only in that context (Fig 4B and 4C, see Methods), indicating that MD partitioning of dlPFC activity patterns allowed for readily separable and computationally efficient representations in dlPFC.

Fig 4. Selective dlPFC sensory representations in the MD intact neural model.

A. We identified model dlPFC cells that were cue-responsive in one of the two behavioral contexts selectively and cells that were cue-responsive in both contexts non-selectively, using logistic regression. We found the most significant difference between the MD intact and MD lesioned models was an increase in non-selective cells with MD lesioning (Mann-Whitney U test, two-tailed, both contexts p = 0.003, match-selective p = 0.210, non-match selective p = 0.449). Bars represent means ± SEM. B. Schematic shows cartoon of cells that were cue-responsive only in one of the behavioral contexts, and the connections to the appropriate output cell (left). These cells had their weights to the appropriate output neuron increase coherently in the corresponding behavioral context and remain dormant out of context (right). C. In comparison, cells that responded to the same cue in either context showed impaired learning with forgetting of previous learning after each experimental block, as rewarded behavioral responses reversed. We show weight averages from the population of dlPFC cells encoding one of the input cues, as identified by logistic regression, to the appropriate output neuron, with shaded areas representing ±SD.

We sought to dissociate the contributions of the multiplicative and the additive thalamocortical projections, representing, respectively, the activation of fast spiking and the indirect inhibition of dendrite-targeting dlPFC interneurons [58]. We varied the strength of each type of projections by multiplying the corresponding weights by a factor. We noticed that increasing the weights of the additive projections increased performance and cognitive flexibility up to a limit, after which performance deteriorated again (S4A and S4C Fig). Performance increase coincided with decreased correlation of neural activity between match trials and non-match trials (S4C and S4E Fig), while deterioration in performance coincided with increased correlation between the up and down trials, decreasing the separability of incoming sensory inputs and impairing sensory processing (S4E Fig). In contrast, gradually amplifying the multiplicative effect improved performance significantly, reaching the ideal ceiling performance after learning the behavior in the first two blocks (0.9 ratio correct from the 3rd block onwards) (S4B and S4D Fig). Increasing the multiplicative projection strength lowered the neural activity correlation between the two contexts while maintaining low correlation between the two sensory cues (S4F Fig). Of note, to isolate the contribution of the thalamocortical projections, this experiment had minimal model components with one MD neuron artificially activated for each context (no corticothalamic learning), and no value inputs (from our representation of vmPFC).

Model MD provides an efficient route for dlPFC-OFC interaction in strategy switching

Analysis of human prefrontal fMRI revealed engagement of OFC in the network switching motif (dlPFC, OFC and MD), therefore we extended our model with a module representing OFC computations, to gain further computational insight into the role of the MD in possible OFC-dlPFC interactions. The OFC is known to represent latent states in a task [33,34] and switch them appropriately at change points [35]. Therefore, such latent variables could adjust dlPFC activity patterns beyond what the MD alone may provide. To test this hypothesis with minimal further assumptions on the neural architectures involved, we abstracted OFC as two nodes representing inference over the current block, one node representing predominantly match strategy blocks and the other, non-match strategy blocks. The two nodes were updated according to a Bayesian estimator of the probability of an experiment block switch, which enabled the OFC representation to detect block changes faster than the dlPFC-MD circuit alone could. Hebbian synapses connected the two OFC nodes to MD and dlPFC neurons, allowing the OFC representation to learn which MD and dlPFC neurons were active for each behavioral strategy.
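One way to picture such an estimator is as a two-state belief update with a small per-trial probability of a block switch. The Python sketch below is an illustration only: the function name `update_belief`, the hazard rate of 0.02, and the use of a single reward probability `p_high` are assumptions, and the estimator described in the Methods may differ.

```python
def update_belief(b_match, match_rewarded, p_high=0.9, hazard=0.02):
    """Update the belief that the current block favors the match strategy.

    b_match        : belief in the match context before this trial
    match_rewarded : True if a match response was (or would have been) rewarded
    p_high         : assumed reward probability of the favored strategy
    hazard         : assumed per-trial probability of an unannounced block switch
    """
    # Prediction step: the context may have switched since the last trial.
    prior = b_match * (1.0 - hazard) + (1.0 - b_match) * hazard
    # Likelihood of the observed outcome under each context.
    like_match = p_high if match_rewarded else 1.0 - p_high
    like_non = 1.0 - p_high if match_rewarded else p_high
    # Bayes rule over the two context nodes.
    evidence = prior * like_match + (1.0 - prior) * like_non
    return prior * like_match / evidence
```

The two OFC nodes would then correspond to `b_match` and `1 - b_match`. Because a few surprising outcomes in a row are sufficient to flip the belief, such an estimator can signal a block change within a handful of trials.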

As such, the OFC can now use its representations and learned connections to the neural model to cast a vote on the current behavioral strategy. We considered two possible hypotheses: that OFC would send its vote directly to dlPFC (cortico-cortical pathway), or through MD (transthalamic pathway). We found both pathways to be effective at improving performance in the trials right after a block switch (Fig 5A and 5B), as the OFC Bayesian estimator rapidly detected the switch and biased the dlPFC-MD circuit towards the appropriate behavioral strategy. However, the pathway to dlPFC required learning 1000 parameters and sending the signal to 500 dlPFC neurons. To compare the computational efficiency of the two pathways, we explored whether the number of involved dlPFC neurons could be lowered, but we found that performance deteriorated rather rapidly with fewer than the full 500 neurons (Fig 5C). In contrast, the MD pathway required learning 4 parameters and sending the signal to only 1 of the 2 MD neurons (Fig 5C). While this result depends to an extent on our proposal that MD has a more compressed representation of the task context, these trends would also hold simply because the MD has fewer neurons. In addition, the MD pathway required the OFC signal to be active for fewer timesteps at the beginning of each trial (Fig 5D). As such, the model indicated computational advantages to the transthalamic pathway, and we next turned to human data to find support for either pathway in the human brain.
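The parameter counts quoted above are consistent with fully connected projections from the two OFC nodes to each target population. The short Python calculation below makes that arithmetic explicit; the assumption of full connectivity and the helper name `n_pathway_weights` are ours.

```python
N_OFC_NODES = 2      # match and non-match context nodes
N_DLPFC = 500        # dlPFC reservoir neurons receiving the OFC signal
N_MD = 2             # MD neurons

def n_pathway_weights(n_source, n_target):
    """Number of weights in a fully connected projection (assumed connectivity)."""
    return n_source * n_target

direct = n_pathway_weights(N_OFC_NODES, N_DLPFC)        # cortico-cortical pathway
transthalamic = n_pathway_weights(N_OFC_NODES, N_MD)    # OFC-to-MD pathway
```

Under this assumption the direct pathway has 250 times as many learnable weights as the transthalamic one.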

Fig 5. Human fMRI confirms computational advantage of transthalamic routing of cortical information suggested by neural model.

We considered a Bayesian model capturing computations in the OFC that detected context changes and cued the dlPFC-MD neural model to switch behavioral strategy accordingly. Two nodes of the Bayesian model represented belief in the current context (predominantly match or non-match context). Context nodes were associated through Hebbian learning with neurons in MD or dlPFC that happened to be active. When the OFC model detected a switch, it activated the appropriate node for the inferred context, thereby activating dlPFC or MD neurons associated with it. A. The OFC switch signal enhanced performance immediately after a block change, and sending the OFC signal to dlPFC or through a transthalamic route to MD performed comparably. B. Average performance in the 30 trials after a switch shows comparable enhancement in performance in the model with OFC sending the switch signal to MD or to dlPFC, compared with the model with no OFC signaling (paired t-test, MD vs dlPFC p = 0.6, dlPFC vs OFC off ***p < 0.0001, and MD vs OFC off ***p < 0.0001). C. The transthalamic pathway required OFC to activate only 1 of the 2 MD neurons, whereas lowering the number of dlPFC neurons receiving the signal immediately deteriorated performance. D. We varied the number of timesteps within a trial that OFC input needed to be active, and the transthalamic path required a shorter duration signal to achieve ceiling performance. E. Turning to human fMRI analysis, Bayesian model selection from fMRI data revealed that the causal pattern with a connection from OFC to MD rather than to dlPFC was superior in explaining the difference between trials where humans switched strategy or stayed in the same strategy (model posterior probability = 0.99, for all model comparisons see S5B Fig). F. Modulation of connectivity in Switching from fMRI data. Green connections represent positive modulation, red connections represent negative modulation. The numbers alongside the connections indicate the coupling parameters from Bayesian parameter averaging, which represent the strength of effective connectivity in Hz.

Human fMRI consistent with a transthalamic pathway for switching behavioral strategy

We next applied DCM to human fMRI data and compared the resulting patterns of causal interactions amongst dlPFC, OFC, and MD as humans switched their behavioral strategy (DCM B-matrix). Among six competing dlPFC-OFC-MD causal patterns (S5 Fig) in which the driving input was directed to dlPFC, the pattern with a causal connection from OFC to MD outperformed the alternatives, including causal patterns with OFC to dlPFC connections (posterior probability = 0.99, Fig 5E). Bayesian parameter averaging (BPA) across participants revealed that the connections from OFC to MD and from dlPFC to OFC, as well as the connections between dlPFC and MD in both directions, were all significantly strengthened by strategy switching (Switching>Staying, posterior density > 0.95) (Fig 5F). These DCM results based on human fMRI data support the engagement of a transthalamic pathway when humans switched their decision strategy.

Discussion

Altogether, by combining neural network modeling with experimental approaches, our neural model bridged insights from animal recordings and interpretative predictions about networks of the human brain. The ability to use existing fMRI datasets from human participants greatly benefited both guiding and validating the modeling effort.

Our neural model extends previous neural network models of frontal-thalamic networks with the aim of capturing task-relevant computations in the human brain. We introduced Hebbian plasticity at the corticothalamic connections from dlPFC to MD to cluster recent experiences into contexts. The Hebbian plasticity trace evolved over time and served as an inductive bias to detect temporal patterns, similar to recent work [43]. By allowing the network to discover consistent patterns of neural activity, it no longer required the task identifier input provided to MD, as used in previous work [21,59,60]. Analysis of human fMRI data confirmed the active involvement of MD showing distinct patterns of directional interactions when behavioral strategy was switched.

While this probabilistic inference task might be a simplified version of the probabilistic inference humans are capable of, its simplicity allowed for direct comparison with a closely related contextual behavior task in mice [20,45,61]. This setup allowed us to assume that the model architecture inspired by mouse physiological recordings is applicable to the human task, and then verify these assumptions by analyzing human fMRI data. In addition, in comparison to the mouse cross-modal attention task, upon which the model was based, the human task was a purely tactile task, whereas the mouse task is cross-modal, requiring the animals to attend to auditory and visual cues. In the mouse task, the current rule is directly signaled per trial and requires no inference, while in the human task, rules were inferred from monitoring of expected versus actual outcome. Further, humans were instructed that rule changes may occur (although unannounced), whereas animals had to learn the task structure from trial-by-trial errors. Despite these differences, we found that the extended rodent model reliably captured human behavior, emphasizing the robustness of our model.

Multiplicative gain control reduces catastrophic forgetting

Recent studies found that MD input modulated the gain of cortical recurrent connections multiplicatively [20], and such findings were the basis for a recent neural network model of the thalamocortical loop [21]. This current study extends this work by allowing for plastic corticothalamic connections, and random and fixed thalamocortical connections, adding to the generality of this neural model. Recent modeling work showed a role for multiplicative gating in continual learning problems where it reduces interferences between learned memories and reduces catastrophic forgetting [62,63]. Similar to our method, these models learned to infer task or context boundaries from current inputs, but had a gating mechanism abstracted at a different hierarchical level. For example, the dendritic gated networks mapped the molecular layer interneurons in the cerebellum gating the dendrites of a single Purkinje cell [63], whereas our formulation examines a more coordinated and far-reaching control of the entire dlPFC RNN to redefine its computations dynamically, presumably through coordinated MD modulation of dlPFC local interneurons [20,58].

Transthalamic communication as a workspace and a coordination hub

Higher order thalamic nuclei provide an alternative transthalamic route to the direct corticocortical communication pathway [19,64], but the possible roles of the transthalamic route remain unclear. Here we demonstrate through a neural model and human fMRI analysis that the transthalamic route can be an efficient pathway to collect votes from prefrontal regions to coordinate current behavioral strategy. While this emphasizes a role for MD as a shared workspace, it is important to note that these contextual signals emerged from interacting with executive dlPFC which also relied on these contextual signals from MD to separate learning of two different input-output transformations. By interacting intimately with MD, dlPFC gained additional flexibility in its computations and more contextualized sensory representations. At the same time, this creates a structure where dlPFC computation can be modulated, by other brain areas, through influencing MD activity. A recent hypothesis proposes the basal ganglia implementing a fast reinforcement learning algorithm to discover MD activity patterns that improve current rewards [65]. In such a scheme, basal ganglia and possibly other cortical areas could exploit the ability to reconfigure the computations of another cortical area by simply modulating the associated MD, without requiring any long-term changes or plasticity. Future theoretical and computational work should be helpful in addressing these questions.

Current and expected reward associations

Guided by fMRI findings, we examined how frontal cortical areas (dlPFC and OFC) might exploit the compressed representations in MD to coordinate their computations. The OFC is one of the main cortical targets of MD projections, and is engaged in goal-directed behavior [66]. Disconnecting OFC from thalamic afferents leads to impairments in assessing action-outcome associations [56]. The lateral parts of OFC, which our fMRI analyses identified in the context of strategy switching, represent predicted outcomes [67] and mediate the updated valuation of outcome desirability [68,69]. Another frontal region, the vmPFC, encodes beliefs about outcome values [70], which can be an intermediate signal needed for the computation of belief over the current context in OFC. One possibility is that vmPFC sends value information to MD, where it can be read out by both dlPFC and OFC. However, since vmPFC was active in both arms of our fMRI contrast, our data could not indicate where vmPFC might send its value information, and we simply entered it as input to dlPFC. A cascade of frontal regions coordinating their computations seems plausible and will be the subject of future modeling work.

Modular computational compartments

Our neural model used fixed thalamocortical connections and successfully captured behavior and flexible contextual switching, indicating a potential architectural design where the thalamocortical connections are either fixed or are updated at a much lower rate compared to corticothalamic connections. We initially reasoned that Hebbian learning at the thalamocortical synapses would induce formation of modules in the RNN, but in practice it created an echo chamber where the corticothalamic synapses learned an association and the thalamocortical synapses reinforced it, preventing the model from switching to other states, indicating that the brain might have special solutions for learning reciprocal projections between two areas.

Recent work by Tsuda and colleagues revealed interesting segregation of an RNN into distinct computational modules merely by diffusely scaling neuronal inputs multiplicatively, by neuromodulation effect [71]. In contrast, our neural model suggests contextual signals from MD that ultimately modulate activity of local cortical interneurons to disinhibit neuronal dendrites and scale their input gain [58]. Our setting allows for precise and more hierarchical reconfiguration of the RNN as each neuron gets a specific (yet randomly chosen) weight by which its inputs are scaled, creating multiple computational units or modules, defined by the domains of inhibitory interneurons. Physiological data report an MD multiplicative gating, and we here demonstrate its effect in partitioning the dlPFC activity into distinct computational units, which is of great interest to understanding the neural basis of higher cognition on one end and developing machine learning counterparts on another. Segregating neural networks into modules might allow for flexible reconfiguration of frontal area circuitry and to solve tasks by composing modules in different combinations.

Conclusions

Although primate MD is equipped with some unique features not identified in rodents, such as an intrinsic population of interneurons releasing GABA [72], our rodent MD-dlPFC neural model successfully captured contextual switching behavior in humans, suggesting a context-specific, evolutionarily well-preserved interaction between both areas. Connectivity between MD and prefrontal cortex is, however, not restricted to single areas but multifaceted, so that individual neurons from different MD subdivisions exhibit reciprocal projections that diverge to simultaneously contact several different PFC subdivisions [30,73,74]. Distinct MD populations thus seem to facilitate corticocortical communication via trans-thalamic pathways [75] to support several different cognitive entities in rodents and non-human primates, such as working memory [76] and attentional control [20]. Whether the underlying MD-PFC interactions can also be translated to humans and segregated in a comparable way as shown by our MD-dlPFC neural modeling approach remains an open question for future studies.

Given that the MD appears to make cortical computations within and across areas more efficient, we imagine that perturbations of this process may decrease the computational efficiency of frontal networks and result in behavioral abnormalities (e.g., increased switching failures may manifest as perseverative behavior or obsessive thoughts in autism or obsessive compulsive disorder). More importantly, future work targeting the thalamus with non-invasive neuromodulation, guided by theory and models, could potentially provide a viable strategy for augmenting task-engagement, switching and general cognitive abilities in disorders of the frontal cortex.

Methods

Ethics statement

This study was approved by the ethics committee of the medical faculty at the Ruhr University Bochum, Germany (registration number 16–5786). All participants gave written informed consent prior to participation.

Code generated for this study has been deposited at <GitHub.com/hummosa/MD-reservoir>. Datasets generated during this study were previously published [32] and are available at <https://ruhr-uni-bochum.sciebo.de/s/xKBPyW7ZGLs2q2g>.

Participant details

This study uses the previously collected dataset from twenty-eight healthy human participants (mean age ± SD: 25.3 ± 3.9 years) [32,36]. Only male participants were included to avoid influences of hormonal fluctuations over the menstrual cycle in females on learning and associated blood-oxygen-level-dependent (BOLD) signals [77–79]. All participants were right-handed as assessed by the Edinburgh Handedness Inventory [80] and had normal or corrected-to-normal vision.

Probabilistic inference task design and structure

Participants were instructed to infer the next tactile stimulus from the current cue by choosing either the same direction (match) or the opposite (non-match), and the predictability of the cue was manipulated by changing the strength of the cue-target contingency over time. The whole experiment consisted of 10 blocks, 2 blocks for each of the five cue-target contingencies (i.e., strongly predictive: 90% or 10%; moderately predictive: 70% or 30%; and non-predictive trials: 50%). Each block consisted of an equal number of the two tactile patterns (‘Up’ or ‘Down’), presented for 500 ms in random order as either cue or target stimulus. The sequence of blocks was pseudorandomized and fixed across participants to ensure inter-subject comparability [81,82]. Participants were informed that the cue-target contingency would change over time, but the exact probabilities were kept unknown. To prevent participants from predicting the onset of a new block, the two blocks for each prediction strength were presented once with 30 trials and once with 40 trials. The experiment consisted of 350 trials in total, which we split into three runs, each lasting ~10 min. The details of the equipment used were described previously [32,36].
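The block structure described above can be checked with a short sketch. The variable names are illustrative and not taken from the study code, and the actual pseudorandomized block order is not reproduced here.

```python
# Five cue-target contingencies (probability that the target matches the cue).
contingencies = [0.9, 0.1, 0.7, 0.3, 0.5]

# Each contingency appears in two blocks: one with 30 trials, one with 40.
block_lengths = (30, 40)

# Enumerate all (contingency, block length) pairs; the order here is NOT the
# pseudorandomized order used in the experiment.
blocks = [(p, n) for p in contingencies for n in block_lengths]
total_trials = sum(n for _, n in blocks)

print(len(blocks), total_trials)
```

Under these assumptions the enumeration gives 10 blocks and 350 trials, in agreement with the totals stated in the text.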

In the computational model, up and down cue inputs along with vmPFC input were fed to the dlPFC RNN using four input nodes. Trials were 200 time-steps long (representing 200 ms), and inputs were presented for the first 100 ms. To directly compare human and model performance in Fig 1, the model required two pre-training blocks to match human performance at the beginning of the experiment, and also required 10 times as many trials in a block to reach the same performance at the end of the block. In all other analyses, model results are shown without any pre-training and the model was trained on blocks with 500 trials.

FMRI data processing

The fMRI data acquired from 28 human participants in the previously published papers [32,36] were used in this study. The preprocessed data were re-analyzed using the general linear model (GLM) in SPM12. Briefly, images underwent slice-timing correction, spatial realignment, and normalization to the MNI template using the unified segmentation approach [83]. Finally, normalized images were spatially smoothed using a Gaussian filter with a full-width half-maximum kernel of 6 mm. Data were high-pass filtered at 1/128 Hz.

For each participant, we conducted a first level GLM. Events were time-locked to the onset of the presentation of the cue stimulus using stick functions and split into two regressors, one for Staying (no strategy switches) and the other one for Switching trials. Onsets were convolved with the canonical hemodynamic response function in an event-related fashion. Functional data from the three runs were concatenated using the spm_fmri_concatenate.m function in SPM12. Using this function, the high-pass filtering and temporal nonsphericity calculations were corrected to account for the original session length. Regressors of no interest included the presentation of the target stimuli (all trials collapsed to a single regressor), invalid trials (i.e., missing or late responses) and the six head-motion parameters as estimated during the realignment procedure.

Using the GLM, we investigated neural activity underlying switching of the response strategy using the contrast “Switching > Staying”. Based on previous mouse studies and to draw direct parallels to our neural model, we applied a smaller smoothing kernel (6 mm instead of 8 mm) and constrained the GLM analysis to the prefrontal areas and thalamus. The resulting contrast images for all participants were entered into a group-level one-sample t-test and thresholded at p < 0.05, small-volume corrected for multiple comparisons at the voxel level (family-wise error rate, FWE) using anatomical regions of interest of the entire prefrontal cortex and thalamus (PFC was defined using the Automatic Anatomical Labeling atlas [84] and thalamus using the Anatomy Toolbox [85]).

Dynamic causal modeling (DCM) of fMRI data

To investigate effective connectivity and compare different network theories, we performed bilinear deterministic DCM [86] using SPM12. Following our GLM results and a priori hypotheses, network nodes were represented by the two prefrontal regions (right dlPFC and OFC) and the thalamic MD, in which neural responses were related to Switching>Staying in the GLM. Subject-specific time series were extracted from the nearest local maximum within a sphere with a radius of 8 mm centered on each node’s group maximum. The first eigenvariate was extracted across all voxels surviving p = 0.05, uncorrected, within a 4 mm sphere centered on the individual peak voxel. The resulting BOLD time series were adjusted for effects of no interest (e.g., invalid trials and movement parameters). Following these procedures, time series for all three areas could be extracted in 24 out of the 28 participants. In four participants we could not obtain right thalamic time series because activations did not meet the above criteria. These four participants were excluded from DCM analyses.

DCMs are specified in terms of fixed (endogenous) connections between brain areas and condition-specific changes in the strength of these connections (i.e., modulatory or bilinear effects). Given that every brain region is connected reciprocally [87], we assumed reciprocal endogenous connections between the three regions of interest. We specified models with different modulatory (bilinear) effects by the conditions (Staying and Switching). Specifically, we tested whether the feedforward, feedback or both connections between thalamus and dlPFC or OFC were modulated by Switching>Staying. To restrict model space, we fixed connections in both directions between dlPFC and MD, i.e., original mouse network switching motif 18. This resulted in 6 competing models that were further evaluated (S5 Fig).

We used a two-step fixed-effects Bayesian model selection (BMS) to assess the most likely among a set of competing models about the mechanisms that generated observed data in dlPFC, OFC, and MD. The fixed-effect analysis was used as we assumed that all participants were best described by the same brain network, but with different connection strengths. In a first step, we used family-level inference to determine whether models with sensory input into dlPFC, OFC or MD best explained the observed data. For simplicity, we did not include lower-level sensory areas in our DCMs, focusing instead on the subgraph modeling of the brain regions engaged by switching strategy. Second, the models of the winning family were compared to identify the most plausible DCM of modulation effects by the strategy switching in the task.

BMS rests on the model evidence P(y|m), i.e. the probability of observing the data y given a particular model m, and uses the group log-evidence to quantify the relative goodness of models. The log-evidence of a model is calculated as the negative variational free energy under the Laplace approximation. It represents a generic tradeoff between the accuracy and complexity of a model that can be derived from first principles of probability theory. The sum of the log-evidences over participants (which is equivalent to the log group Bayes factor) can be used to compute how likely it is that a specific model generated the data at the group level (i.e., the expected model posterior probability).
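As an illustration of the group-level computation described above, the following minimal sketch sums log-evidences over participants and converts them to model posterior probabilities. It assumes a uniform prior over models; the function name and the example numbers are hypothetical and are not taken from the study or from SPM12.

```python
import math

def fixed_effects_bms(log_evidence):
    """Fixed-effects Bayesian model selection (sketch, uniform model prior).

    log_evidence[s][m] is the log-evidence of model m for participant s.
    Returns the posterior probability of each model at the group level.
    """
    n_models = len(log_evidence[0])
    # Group log-evidence: sum of log-evidences over participants.
    group = [sum(subject[m] for subject in log_evidence) for m in range(n_models)]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(group)
    weights = [math.exp(value - peak) for value in group]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical log-evidences for 2 participants and 2 models.
example = [[-100.0, -102.0], [-98.0, -101.0]]
posterior = fixed_effects_bms(example)
print(posterior)
```

In this made-up example the group log-evidence difference between the two models is 5, which already corresponds to a posterior probability above 0.99 for the first model.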

Parameters of the winning model were then summarized by Bayesian parameter averaging (BPA), which computes a joint posterior density for the entire group by combining the individual posterior densities [88,89]. A posterior probability criterion of 95% was considered to reflect significant effective connectivity.

Thalamocortical neural model

The model has four main structures: dlPFC as a recurrent neural network (RNN) with reservoir dynamics, MD as a 2-neuron network with winner-take-all dynamics, vmPFC as an estimator of available action strategy values, and OFC as a Bayesian observer to estimate the probability of an experimental block changepoint.

The dlPFC receives cue information input (up or down) along with estimated behavioral strategy rewards from the vmPFC (see ‘Maximum likelihood-algorithmic model of vmPFC’ below). Network output was a readout projection from dlPFC to 2 neurons representing responses ’up’ or ’down’. Reciprocal connections link dlPFC and MD, with weights to MD following Hebbian plasticity and fixed weights from MD having additive and multiplicative effects on dlPFC. Two sets of weights were plastic: from dlPFC to output using node perturbation, and from dlPFC to MD using Hebbian learning. Connections from input to dlPFC, recurrent connections in dlPFC, and connections from MD to dlPFC were all fixed.

We describe each element below.

Dorsolateral prefrontal cortex model architecture

We modeled the dorsolateral prefrontal cortex (dlPFC) as a reservoir RNN of 500 neurons with fixed recurrent connectivity weights wij. Input to each neuron Ii evolved dynamically according to:

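$$\tau \frac{dI_i(t)}{dt} = -I_i(t) + \sum_k w_{ik}^{inp}\,\mathrm{Inp}_k(t) + \sum_j w_{ij}\, r_j(t) + \sum_j \Big( \sum_m w_{im}^{MD}\, r_m^{MD}(t) \Big) w_{ij}\, r_j(t) + \sum_m w_{im}^{MD}\, r_m^{MD}(t) + \rho_i^{noise}$$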

Where $w_{ik}^{inp}$ were the weights from input units to dlPFC neurons (described below). $w_{ij}$ were the recurrent weights drawn from a Gaussian (mean: 0, var: 0.75/sqrt(2*Nsub), where Nsub = 200 is the number of cells receiving each input), with each row then zero-centered to maintain stability. $w_{im}^{MD}$ were the weights from MD. The time constant τ was set to 0.02 s and the discretization time step dt was 0.001 s (1 ms, matching the 200 time-steps that represent a 200 ms trial). $\rho_i^{noise}$ is the noise added to each neuron, generated as:

ρinoiseN(0,13)(t)

Input was then passed through a tanh activation function and activations were clipped to positive values.

$$r_i(t) = \big[\tanh\big(I_i(t)\big)\big]_+$$

The multiplicative and the additive inputs from MD to dlPFC use the same set of fixed weights, drawn from a normal distribution with mean 0 and variance 0.1.
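A minimal sketch of one Euler update of the dlPFC dynamics described in this section is given below. It is written under stated assumptions rather than taken from the study code: the leak term is assumed to be −I_i, τ and dt are assumed to share units (0.02 and 0.001), and the noise standard deviation `noise_sd` is a hypothetical parameter that defaults to zero. The function and argument names are illustrative.

```python
import math
import random

def dlpfc_step(I, r, inp, r_md, w_inp, w_rec, w_md,
               tau=0.02, dt=0.001, noise_sd=0.0, rng=random):
    """One Euler step of the dlPFC input currents and rectified tanh rates.

    I, r     : current inputs and rates of the N dlPFC neurons
    inp      : activity of the input units
    r_md     : activity of the MD neurons
    w_inp[i][k], w_rec[i][j], w_md[i][m] : fixed weights onto neuron i
    """
    n = len(I)
    new_I = []
    for i in range(n):
        ext = sum(w_inp[i][k] * inp[k] for k in range(len(inp)))
        rec = sum(w_rec[i][j] * r[j] for j in range(n))
        md = sum(w_md[i][m] * r_md[m] for m in range(len(r_md)))
        noise = rng.gauss(0.0, noise_sd) if noise_sd > 0 else 0.0
        # MD acts twice with the same weights: it scales the recurrent
        # drive (md * rec) and adds directly to the input (md).
        dI = (-I[i] + ext + rec + md * rec + md + noise) * dt / tau
        new_I.append(I[i] + dI)
    # tanh activation clipped to positive values.
    new_r = [max(0.0, math.tanh(x)) for x in new_I]
    return new_I, new_r

# Single neuron with recurrent drive 0.5 and MD drive 1.0.
I1, r1 = dlpfc_step(I=[0.0], r=[1.0], inp=[0.0], r_md=[1.0],
                    w_inp=[[0.0]], w_rec=[[0.5]], w_md=[[1.0]])
print(I1, r1)
```

In the single-neuron usage example the total drive is 0.5 (recurrent) + 0.5 (MD-scaled recurrent) + 1.0 (additive MD) = 2.0, so one step with dt/τ = 0.05 moves the input from 0 to 0.1.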

Inputs to the dlPFC

Each cue input unit projected to a population of 200 dlPFC neurons, and each half of these 200 neurons received inputs from one of the value input units, i.e., strategy value (match or non-match) and cue (up or down) combinations were selectively projected to neurons in the dlPFC reservoir such that each strategy-cue combination projected to 100 dlPFC neurons. Weights from input units to their target dlPFC population were sampled uniformly with values between 0.2–0.4. Although inputs to the model were structured, the recurrent dynamics eventually dictated neuronal behavior (not shown), so we instead decoded neuronal responses using logistic regression as described below.

Medio-dorsal thalamus model architecture

The medio-dorsal thalamus (MD) was modeled as two neurons with no recurrent connectivity but with winner-take-all (WTA) dynamics capturing inhibitory interactions with the thalamic reticular nucleus. The two neurons received input from dlPFC neurons through Hebbian plastic weights $w_{ij}^{dlPFC}$. The two MD neurons had the same time constant as dlPFC neurons, and at each time step the neuron with the higher input had its activation set to 1 while the other was set to 0.

Learning at the dlPFC to MD weights followed a Hebbian rule, with the pre-synaptic eligibility trace evolving dynamically with a time constant τpre (2000ms), as follows:

$$\Delta\rho(t) = \frac{1}{\tau_{pre}}\big[r(t) - \rho(t-1)\big]$$
$$\Delta w^{dlPFC \to MD} = \alpha\,\rho(t)\, r^{MD}(t)$$

Where Δρ(t) is the change in the Hebbian eligibility trace at time t, r(t) and rMD(t) are the firing rates of dlPFC and MD respectively, and α is the learning rate (set to 5 × 10⁻⁵). Both pre- and postsynaptic activities were centered around zero by subtracting their respective means. To prevent Hebbian learning instabilities, weights were clipped to [-0.1, 0.1] and rescaled to keep their L2 norm constant at the end of each trial. In experiments where MD was lesioned, we removed the multiplicative and the additive MD input to dlPFC neurons. Because model performance was then impaired by a decrease in dlPFC activations, we compensated by multiplying the recurrent connections in dlPFC by a compensation factor (1.3) to bring activation levels back up to values similar to those of the MD intact model.
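The MD winner-take-all step and the Hebbian update described in this section can be sketched as follows. This is a hedged illustration, not the study code. It assumes that one time step corresponds to 1 ms (so the trace moves by 1/τpre per step), that activities are centered by subtracting the mean across neurons, and that the L2 norm is taken over the whole weight matrix. All function names are hypothetical.

```python
import math

def md_winner_take_all(md_input):
    """Set the MD neuron with the highest input to 1 and the others to 0."""
    winner = max(range(len(md_input)), key=lambda m: md_input[m])
    return [1.0 if m == winner else 0.0 for m in range(len(md_input))]

def update_trace(trace, r_dlpfc, tau_pre=2000.0):
    """Leaky pre-synaptic eligibility trace: rho += (r - rho) / tau_pre."""
    return [rho + (r - rho) / tau_pre for rho, r in zip(trace, r_dlpfc)]

def hebbian_step(w, trace, r_md, alpha=5e-5, clip=0.1):
    """Update w[m][j] (dlPFC neuron j -> MD neuron m) and clip the result.

    Assumption: centering subtracts the mean across neurons.
    """
    pre_mean = sum(trace) / len(trace)
    post_mean = sum(r_md) / len(r_md)
    updated = []
    for m, row in enumerate(w):
        post = r_md[m] - post_mean
        updated.append([
            min(clip, max(-clip, w_mj + alpha * (trace[j] - pre_mean) * post))
            for j, w_mj in enumerate(row)
        ])
    return updated

def rescale_l2(w, target_norm):
    """Rescale the whole matrix to a fixed L2 norm (assumed convention)."""
    norm = math.sqrt(sum(x * x for row in w for x in row))
    if norm == 0.0:
        return w
    return [[x * target_norm / norm for x in row] for row in w]

# Two dlPFC neurons, two MD neurons; a large alpha is used only to make
# the effect of a single update visible.
r_md = md_winner_take_all([0.2, 0.7])
trace = update_trace([0.0, 0.0], [1.0, 0.0])
w = hebbian_step([[0.0, 0.0], [0.0, 0.0]], [1.0, 0.0], [1.0, 0.0], alpha=0.1)
w = rescale_l2(w, target_norm=1.0)
print(r_md, trace, w)
```

With centered activities, a dlPFC neuron that is more active than average strengthens its weight onto the winning MD neuron and weakens its weight onto the losing one, which is what lets the two MD neurons come to represent different contexts.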

Output and readout weights learning

Learning at these synapses is implemented through node perturbation, a biologically-realistic approximation of backpropagation of error [40]. Node perturbation injects a small input into the output neurons and evaluates the changes in the network performance measure (e.g., reward or error). Each input synapse to that node is then changed by a product of the brief noise magnitude and activity of the pre-synaptic neuron, scaled by the improvement of network performance. Mathematically, input to the output neuron i at time step t evolves according to:

$$\tau \frac{dI_i(t)}{dt} = \sum_j w_{ij}^{out}\, r_j(t) + \zeta_i(t)$$

where ζi(t) is drawn from a uniform distribution between (-1, 1).

Weights $w_{ij}^{out}$ are then updated according to:

$$w_{ij}^{out} \leftarrow w_{ij}^{out} + \mu\,\zeta_i(t)\, r_j(t)\,\delta e$$

where μ is the learning rate (5 × 10⁻⁵), rj is the activity of the pre-synaptic dlPFC neurons, and δe is the change in reward received.
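The readout update described above can be illustrated with a small sketch. It only shows the weight change for a given perturbation and change in reward. How δe is computed from task performance is not specified here, so it is passed in as an argument. Function names are hypothetical.

```python
import random

def sample_perturbation(n_outputs, rng=random):
    """Draw one perturbation per output neuron, uniform between -1 and 1."""
    return [rng.uniform(-1.0, 1.0) for _ in range(n_outputs)]

def node_perturbation_update(w_out, zeta, r, delta_e, mu=5e-5):
    """w_out[i][j] += mu * zeta[i] * r[j] * delta_e.

    zeta    : perturbation injected into each output neuron
    r       : activity of the pre-synaptic dlPFC neurons
    delta_e : change in reward attributed to the perturbation
    """
    return [
        [w_out[i][j] + mu * zeta[i] * r[j] * delta_e for j in range(len(r))]
        for i in range(len(zeta))
    ]

# One output neuron, two dlPFC neurons; a large mu is used for readability.
w_new = node_perturbation_update([[0.0, 0.0]], zeta=[0.5], r=[1.0, 2.0],
                                 delta_e=1.0, mu=0.1)
print(w_new)
```

When the perturbation coincides with an improvement in reward (δe > 0), weights move in the direction of the perturbation, in proportion to presynaptic activity. When reward drops, they move in the opposite direction.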

Maximum likelihood-algorithmic model of vmPFC

We computed the likelihood of strategy values (or equivalently, association levels) over the range [0, 1] given strategies executed and rewards collected over a horizon of 10 recent trials. We calculate the expected reward probability from executing a match strategy (q) as:

$$P(q \mid d_{T-h:T}) = \prod_{t=T-h}^{T} \frac{P(d_t \mid q)}{P(d_t)}$$

Where h is the number of trials in the horizon, $d_{T-h:T}$ are the rewards and actions in the horizon buffer, and T is the total number of trials. Strategy values sent to dlPFC are the expected rewards of the match and non-match strategies, output as {q, 1 − q}.

We used the maximum likelihood estimate as the current estimate of the q value and used it as output to dlPFC for the next trial. If the current estimate diverges from the recent average of rewards by more than a threshold (0.15), the model considers a possible block change and adds its current estimate to a stored bank of previously encountered values. It then checks whether the recent average of rewards matches any of the stored values. If not, it concludes that a new behavioral context has been encountered, creates a new estimate initialized at 0.5, and adapts it to match the recent average of rewards. As such, this model relies on slowly changing estimates of outcome values to disambiguate outcome values in different contexts. For a more rapid detection of context changes we also utilize a Bayesian model as described next.
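A minimal sketch of the maximum likelihood step is shown below, assuming a Bernoulli reward model in which the match strategy is rewarded with probability q and the non-match strategy with probability 1 − q, evaluated on a discrete grid over [0, 1]. The grid resolution, function name, and data format are assumptions for illustration, and the stored bank of previously encountered values is not modeled.

```python
import math

def ml_strategy_values(history, horizon=10, grid_size=101):
    """Maximum likelihood estimate of the match-strategy value q.

    history: list of (strategy, rewarded) pairs, where strategy is
             "match" or "non-match" and rewarded is True or False.
    Returns (q, 1 - q), the values of the match and non-match strategies.
    """
    recent = history[-horizon:]
    best_q, best_ll = 0.5, -math.inf
    for g in range(grid_size):
        q = g / (grid_size - 1)
        ll = 0.0
        for strategy, rewarded in recent:
            p_reward = q if strategy == "match" else 1.0 - q
            p = p_reward if rewarded else 1.0 - p_reward
            # Floor the probability to avoid log(0) at the grid edges.
            ll += math.log(max(p, 1e-12))
        if ll > best_ll:
            best_q, best_ll = q, ll
    return best_q, 1.0 - best_q

# Ten recent trials using the match strategy, eight of them rewarded.
history = [("match", True)] * 8 + [("match", False)] * 2
q_match, q_nonmatch = ml_strategy_values(history)
print(q_match, q_nonmatch)
```

For a Bernoulli model the likelihood peaks at the observed reward fraction, so eight rewarded trials out of ten with the match strategy give a match value close to 0.8.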

Bayesian observer representing OFC computation

The representation of latent variables and their appropriate updating at changepoints was modelled as a Bayesian estimate of the probability of a context switch, together with two latent variables representing match or non-match context that were updated when the probability of a context switch reached a threshold. Specifically, the probability of a switch in context at some point in the most recent h trials was computed as:

$$P(s \mid d_{T-h:T}) = \frac{1}{h} \sum_{t=T-h}^{T} \frac{P(d \mid s_t)\, P(s_t)}{P(d)}$$

Where P(s|dT−h:T) is the probability of a context switch given the actions and outcomes over the last h (= 10) trials. The prior P(st) is initialized at 1/T. P(d|st) is the likelihood function of a switch at trial t considering the h most recent actions and outcomes (d), calculated as follows:

$$P(d \mid s_i) = \prod_{t=T-h}^{T-h+i} P(d_t \mid c) \prod_{t=T-h+i}^{T} P(d_t \mid \bar{c})$$

Where c is the current context belief nodes taking value [1,0] for match context and [0,1] for non-match context. They are modeled as a Markov chain simplified and made deterministic. Once the probability of a changepoint reaches threshold, the two context belief nodes are flipped and the buffer containing recent trials is cleared (horizon reset to current trial).

$$c = \begin{cases} c, & P(s) < \theta_s \\ [1,1] - c, & P(s) > \theta_s \end{cases}$$
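The switch likelihood and the deterministic flip of the context nodes can be illustrated with the sketch below. It is not the study implementation. The per-trial outcome model (`p_consistent`, the probability that the strategy matching the context is rewarded) and the threshold value `theta` are hypothetical, and each trial is assigned to exactly one side of the candidate switch point.

```python
def flip(context):
    """[1, 0] (match) <-> [0, 1] (non-match), i.e. [1, 1] - c."""
    return [1 - context[0], 1 - context[1]]

def outcome_likelihood(strategy, rewarded, context, p_consistent=0.8):
    """P(d_t | c) under an assumed Bernoulli outcome model."""
    context_name = "match" if context[0] == 1 else "non-match"
    p_reward = p_consistent if strategy == context_name else 1.0 - p_consistent
    return p_reward if rewarded else 1.0 - p_reward

def switch_likelihood(trials, i, context):
    """Likelihood of the buffer if the context switched before trial i."""
    flipped = flip(context)
    like = 1.0
    for t, (strategy, rewarded) in enumerate(trials):
        current = context if t < i else flipped
        like *= outcome_likelihood(strategy, rewarded, current)
    return like

def update_context(context, p_switch, theta=0.5):
    """Flip the belief nodes once the switch probability exceeds theta."""
    return flip(context) if p_switch > theta else context

# Match strategy rewarded three times, then unrewarded three times.
trials = [("match", True)] * 3 + [("match", False)] * 3
likelihoods = [switch_likelihood(trials, i, [1, 0]) for i in range(len(trials) + 1)]
best_switch_point = max(range(len(likelihoods)), key=lambda i: likelihoods[i])
print(best_switch_point, update_context([1, 0], p_switch=0.9))
```

In the usage example the most likely switch point is the fourth trial (index 3), where rewards for the match strategy stop, and a switch probability above threshold flips the belief from match to non-match.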

The two nodes representing current inferred context c are associated through Hebbian plasticity to active MD and dlPFC cells in that trial as follows:

$$\Delta w^{OFC \to MD} = \alpha\, c\, r^{MD}$$
$$\Delta w^{OFC \to dlPFC} = \alpha\, c\, r^{dlPFC}$$

Where rMD and rdlPFC were the trial-averaged activity of MD and dlPFC neurons respectively, and α is the learning rate set at 0.001. Vectors c, rMD, and rdlPFC were zero-centered. After each trial the weights from OFC to MD and to dlPFC were normalized to a length of 1 and 2, respectively, as OFC to dlPFC weights were shared amongst a larger number of neurons. The connections from OFC to the target areas were activated after the first two blocks of the experiment, allowing Hebbian learning to converge before producing behavior. They were also activated briefly at the beginning of each trial, and we parametrized this duration to look for the shortest signal necessary to influence the dlPFC-MD circuit behavior (Fig 5D). In the experiment reported in Fig 5C, we varied the number of neurons receiving the signal and used a duration of 10 ms. Conversely, in the experiment in Fig 5D, we varied the duration and used 500 as the number of neurons.

Finding cue-responsive neurons with logistic regression

To find cue-responsive neurons, we applied logistic regression over the mean trial activity in the interval 50–150 time steps. For an individual neuron, we plotted the points (x, y) where x is the mean trial activity and y is 1 if the cue was up and 0 if the cue was down. Applying logistic regression gave us a pseudo r-squared value for each neuron; in the case that the data was perfectly linearly separable, we set r-squared to be 1.

Applying this technique to trials within the match and non-match contexts separately allowed us to identify which neurons were cue responsive in one or both contexts, or not at all. These classes of interest correspond to the corners of the R2 histograms in S3A Fig. Neurons that did not respond to cues were in the upper-left corner, and neurons that responded in both contexts were in the lower-right corner, whereas neurons that were cue responsive only in one of the contexts were in the upper-right and lower-left corners.

Statistics

Fig 1D. To compare human and model performance across association levels, we compared the average performance per association level across all human participants (N = 28; mean age ± SD: 25.1 ± 3.8 years; male) and model simulations (N = 5). Bars represent mean +/- SEM; the mean represents average performance for high (90/10%, human mean 0.79 +/- 0.05, model mean 0.80 +/- 0.02), low (70/30%, human mean 0.58 +/- 0.05, model mean 0.59 +/- 0.02), and non-informative (50%, human mean 0.433 +/- 0.05, model mean 0.50 +/- 0.01) association levels, where an observation is the accuracy of a single trial (0 or 1 for correct or incorrect response). We performed a two-way ANOVA with multiple comparison testing (anovan.m, multcompare.m, MATLAB) with high, low, and non-informative association levels as factors and human versus model interactions. Statistical significance was found across association level groups in both the human and model data (human high/low, p < 0.0001, high/non-informative, p < 0.0001, low/non-informative, p < 0.0001; model high/low, p < 0.0001, high/non-informative, p < 0.0001, low/non-informative, p = 0.039). No statistically significant difference was found between human and model data for each association level grouping (90/10% and 70/30%, p = 0.990; 90/10% and 50%, p = 0.990; 70/30% and 50%, p = 0.101).

Fig 1E. To measure the correlation between human and model performance, we compared the average performance across all human participants (N = 28; mean age ± SD: 25.1 ± 3.8 years; male) to model performance (N = 5). We binned trial data across time with bin sizes of 15 and 150 for the human and model data, respectively, to obtain the same number of bins for both. We computed the mean of each bin and compared the human and model means per bin. The correlation coefficient r was calculated using corr2.m (MATLAB), and significance was determined by the Pearson correlation coefficient using corrcoef.m (MATLAB). The Pearson correlation between human and model performance was r = 0.94, p = 1.35e-08.
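The binning and correlation steps can be sketched in plain Python as follows. The original analysis used MATLAB (corr2.m, corrcoef.m), so this is an illustrative equivalent rather than the study code. The trial counts of 350 and 3500 follow from the task description and the tenfold longer model blocks, and keeping a final partial bin is an assumption.

```python
import math

def bin_means(values, bin_size):
    """Mean of consecutive bins; a final partial bin is kept (assumption)."""
    bins = [values[i:i + bin_size] for i in range(0, len(values), bin_size)]
    return [sum(b) / len(b) for b in bins]

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

# Placeholder accuracy sequences, used only to show that the bin counts match.
human_bins = bin_means([0.0] * 350, 15)
model_bins = bin_means([0.0] * 3500, 150)
print(len(human_bins), len(model_bins))
print(pearson_r([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```

With bin sizes of 15 and 150, both the 350 human trials and the 3500 model trials yield 24 bins, so the per-bin means can be correlated directly.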

Fig 2A. To investigate the neural activations underlying switching of the response strategy, we compared trials in which the response strategy was switched with trials in which it was maintained (Switching vs. Staying). Individual contrast images (Switching > Staying) were entered into a group-level one-sample t-test in SPM. Group-level significance was thresholded at p < 0.05, small-volume corrected for multiple comparisons at the voxel level (family-wise error rate, FWE) using anatomical regions of interest in the prefrontal cortex and thalamus.

Fig 3E. We used a linear regression model to decode cue or context from either MD or dlPFC trial-averaged population activity and compared the accuracy of each model to chance-level prediction (50%). We tested 10 network instantiations and found that context was decodable above chance from both dlPFC and MD activity (p-values 3.7e-7, 1.5e-14), while cue was decodable from dlPFC activity (p-value 1.2e-8) but not from MD activity (p-value 0.1778).

Fig 3G and 3H. To compare the mean accuracy of the MD lesioned and MD intact models, we simulated 20 runs of the model with different initializations of the network (different random seeds). We averaged performance in each experimental block for each of the 20 runs. We used a t-test to compare the two groups of means from the MD lesioned and MD intact models in each block. P-values < 0.05 are labeled *, < 0.01 **, < 0.001 ***, and > 0.05 ns.
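The per-block comparison and the star labels can be illustrated with a short sketch. It assumes a two-sample Student's t statistic with pooled variance, since the text does not say which t-test variant was used; the function names and the toy accuracies are hypothetical. Converting the statistic to a p-value needs the t distribution, which a statistics library would supply.

```python
import math
from statistics import mean, variance

def pooled_t_statistic(a, b):
    """Two-sample Student's t statistic with pooled variance (an assumed variant)."""
    n_a, n_b = len(a), len(b)
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / (n_a + n_b - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var * (1 / n_a + 1 / n_b))

def significance_label(p):
    """Map a p-value to the labels used in the figure.

    A p-value of exactly 0.05 is not covered by the stated thresholds;
    this sketch treats it as 'ns'.
    """
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return "ns"

# Toy block accuracies for one block (hypothetical, four runs per model).
md_intact = [0.9, 0.8, 0.85, 0.95]
md_lesioned = [0.6, 0.7, 0.65, 0.55]
t_value = pooled_t_statistic(md_intact, md_lesioned)
```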

Fig 4A. To compare the number of cue-responsive neurons in a model with and without an MD component, we identified and counted the neurons that were cue-responsive exclusively in match contexts, the neurons that were cue-responsive exclusively in non-match contexts, and the neurons that were cue-responsive regardless of the context across multiple simulations (N = 10; see Finding cue-responsive neurons with logistic regression). Bars represent mean +/- SEM; the mean represents the average number of neurons for a given model and given cue-responsivity (match in MD intact model 37.5+/-2.57; match in MD lesioned model 52.3+/-3.58; non-match in MD intact model 39.3+/-6.10; non-match in MD lesioned model 55.2+/-2.63; context indifferent in MD intact model 71.4+/-6.55; context indifferent in MD lesioned model 138.9+/-9.19). Pairwise comparisons were computed within groups. Statistical significance was determined by a two-tailed Mann-Whitney test (match context p = 0.01; context-indifferent p = 0.9e-4). The difference in the non-match context was not statistically significant (two-tailed Mann-Whitney test, p = 0.10).
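As a rough illustration of the summary statistics and test named for Fig 4A, the sketch below computes mean +/- SEM and a two-tailed Mann-Whitney U test. It makes several assumptions: SEM uses the sample standard deviation, and the p-value uses the normal approximation without tie or continuity correction, whereas statistical packages may use an exact test for samples this small. Function names and data are hypothetical.

```python
import math
from statistics import mean, stdev

def mean_sem(values):
    """Mean and standard error of the mean (sample SD divided by sqrt(n))."""
    return mean(values), stdev(values) / math.sqrt(len(values))

def mann_whitney_u(x, y):
    """Two-tailed Mann-Whitney U test using the normal approximation.

    Returns (U, p). Tied values receive average ranks; no tie or
    continuity correction is applied to the variance.
    """
    pooled = sorted((v, g) for g, sample in enumerate((x, y)) for v in sample)
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n1, n2 = len(x), len(y)
    rank_sum_x = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u1 = rank_sum_x - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)
    z = (u - n1 * n2 / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    p = min(1.0, math.erfc(abs(z) / math.sqrt(2)))
    return u, p

# Toy neuron counts (hypothetical, not the values reported above).
u_stat, p_value = mann_whitney_u([1, 2, 3, 4, 5], [6, 7, 8, 9, 10])
```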

Fig 4B. To examine which weights discriminated between the two models, we averaged weights from all cells identified as encoding cues only in a match context, grouped them by whether they projected to the matching or the non-matching output neuron, and plotted their evolution throughout the experiment using alternating 90% match and 10% match blocks. We repeated the same plot for cells that were cue-responsive in both contexts. Shaded areas are +/- SEM.

Fig 5E–5F. We compared the effective connectivity within the winning DCM model between Switching and Staying. Parameters of the winning model were summarized by Bayesian parameter averaging, which computes a joint posterior density for the entire group by combining the individual posterior densities. The parameter estimates for Switching and Staying trials were then contrasted against each other. A posterior probability criterion of 90% was considered to reflect a significant difference. The parameters were significantly higher for Switching than for Staying trials for the dlPFC->OFC, OFC->MD, MD->dlPFC and dlPFC->MD connections.
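The idea behind combining individual posteriors and applying a 90% criterion can be sketched for a single scalar parameter. This is a simplified illustration, not the SPM routine: it assumes Gaussian posteriors, uses plain precision weighting without the correction for shared priors that a full Bayesian parameter average applies, and treats the Switching and Staying group posteriors as independent. All names and numbers are hypothetical.

```python
import math

def bayesian_parameter_average(means, variances):
    """Precision-weighted combination of Gaussian posteriors for one parameter.

    Returns the group posterior mean and variance (simplified: no prior correction).
    """
    precisions = [1.0 / v for v in variances]
    total_precision = sum(precisions)
    group_mean = sum(m * p for m, p in zip(means, precisions)) / total_precision
    return group_mean, 1.0 / total_precision

def prob_greater(mean_a, var_a, mean_b, var_b):
    """Posterior probability that parameter a exceeds parameter b (independent Gaussians)."""
    z = (mean_a - mean_b) / math.sqrt(var_a + var_b)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2)))

# Toy subject-level posteriors for one connection (hypothetical values).
switch_mean, switch_var = bayesian_parameter_average([0.2, 0.4], [0.01, 0.01])
stay_mean, stay_var = bayesian_parameter_average([0.0, 0.2], [0.01, 0.01])

p_switch_gt_stay = prob_greater(switch_mean, switch_var, stay_mean, stay_var)
exceeds_criterion = p_switch_gt_stay > 0.90  # 90% posterior probability criterion
```

Precision weighting means that subjects with tighter posteriors contribute more to the group estimate, which is the main property that distinguishes this summary from a simple mean across subjects.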

S1F Fig. We compared the prior belief–related activity in vmPFC between Switching and Staying. The prior belief-related activity (beta values) in vmPFC extracted from both Staying and Switching was entered into a two-tailed paired-sample t-test. The significance level was defined as p < 0.05. Bars represent mean +/- SEM.

Supporting information

S1 Table. Brain regions related to the Strategy Switching (Switching > Staying) (p < 0.001, uncorrected).

(DOCX)

S2 Table. Brain regions correlating with the prior belief in Switching and Staying.

(p < 0.001, uncorrected).

(DOCX)

S1 Fig. Value inputs improve model behavioral flexibility and correlate with fMRI activity in human vmPFC.

We compared the model with and without value input (+vmPFC, -vmPFC respectively), and also with and without output from MD (+MD, -MD, respectively). A. Performance of the model with no value inputs shows little behavioral flexibility, with significant dips in performance at block changepoints. B. Trial-averaged MD activity correlation with the ground-truth current context for each trial. Only the model with both value inputs and intact MD output to dlPFC showed appropriate encoding of context in MD. C. The weights from dlPFC neurons to output showed significant learning and unlearning in the model without value input. Adding value inputs leads to more coherent weight changes across blocks, and adding MD further reduces destructive learning and unlearning. D. We considered the distribution of output weights at the end of the experiment and used its variance as a measure of dispersion. Weights that learned and then unlearned across blocks remained close to zero with low variance. E. The comparison of prior belief–related fMRI activity in vmPFC between Switching and Staying in human participants. Prior belief about the outcome value, derived from the Hierarchical Gaussian Filter model (details in previously published papers [32,37]), correlated with the activity in vmPFC for both Staying and Switching strategies. The results are projected on axial MRI brain slices. Brain activations are displayed at p < 0.001 (uncorrected, red). For other regions see S2 Table. F. The prior belief-related activity (beta values) in vmPFC extracted from both Staying and Switching was entered into a paired-sample t-test. The result showed no significant difference between Staying and Switching (p = 0.30). The error bars depict the standard error.

(TIF)

S2 Fig. Responses of output neurons and MD neurons.

A. Responses of the ‘Down’ output neuron during a trial with the input sensory cue as ‘Up’ or ‘Down’, first in a match context (left), then in a non-match context (middle), and trial-averaged activity of the same output neuron over the first three blocks of the experiment (right). Output neuron activity for the correct responses separates with readout weight learning over the first three blocks of the experiment and correctly reads out the target output from dlPFC activity. B. Same as in A but for the ‘Up’ output neuron. C. Behavioral responses of the model across an experiment starting with 90% and 10% blocks, but then including 70% and 50% association-level blocks. D. Responses of one of the MD neurons, with some increased responses in the opposite context when the association level is less predictive. E. Weights from dlPFC neurons to one MD neuron, averaged over the 5 neurons with the highest eligibility trace in block 1 (blue), or the 5 lowest (orange), showing opposing weight dynamics across blocks.

(TIF)

S3 Fig. Changes in representation and behavior in the MD lesioned model.

A. Histogram of correlation values (pseudo R-squared) between individual neuronal activity and input cue, with trials drawn from a match context on the x axis, and a non-match context on the y-axis (See Methods, ‘Finding cue-responsive neurons with logistic regression’). The histogram for the MD intact model (left) and MD lesioned model (middle) were subtracted to highlight the differences (right), showing mainly fewer cells that correlated with input cue in both contexts.

(TIF)

S4 Fig. Multiplicative thalamocortical projections separate context representations without obfuscating sensory cue representations.

Experiments used alternating blocks in which 10% or 90% of match trials were rewarded, with no vmPFC inputs, one MD neuron artificially activated for each type of block, and Hebbian learning at the corticothalamic projections disabled. We tested the model with multiplicative or additive thalamocortical projections separately and parametrized the strength of either projection by multiplying its values by a factor from 1 to 40. A. Model performance for selected strengths of additive projections and B. multiplicative projections. C. Increasing the strength of additive projections initially improves performance and behavioral flexibility at block changepoints, but performance rapidly peaks and declines. D. Increasing the strength of multiplicative projections consistently increases performance until it steadily reaches a ratio correct of 0.9 with minimal dips at block changepoints. E. Increasing the strength of additive projections decreased neural activity correlation in dlPFC for match vs non-match contexts, but also rapidly increased correlation between up and down trials until they became highly correlated and presumably difficult to decode. F. Increasing the strength of multiplicative projections decreases neural activity correlation in dlPFC between contexts with a limited increase in correlation between cues.

(TIF)

S5 Fig. Causal pattern space of DCM for fMRI data.

A. Illustration of the model space for Bayesian model selection. We specified six models to determine whether the feedforward, feedback or both connections between OFC were modulated by the strategy switching. We constrained the space to models with assumed reciprocal dlPFC-MD connections, as the role of these connections has been demonstrated in animal studies [21]. Bayesian model selection revealed that among the models with the tactile input directed to dlPFC, model 3 was superior to the other 5 models. B. The log-evidence and posterior probability for each of the 6 models.

(TIF)
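The relation between the log-evidence and posterior probability shown in S5 Fig panel B can be illustrated with a short sketch. It assumes fixed-effects model selection with a uniform prior over models, in which case the posterior probabilities are a softmax of the group log-evidences; the source does not state which selection scheme was used, and the function name and numbers are hypothetical.

```python
import math

def posterior_model_probabilities(log_evidences):
    """Posterior model probabilities under a uniform model prior.

    Subtracting the maximum log-evidence before exponentiating avoids
    numerical underflow for large negative values.
    """
    max_le = max(log_evidences)
    weights = [math.exp(le - max_le) for le in log_evidences]
    total = sum(weights)
    return [w / total for w in weights]

# Toy log-evidences for three models (hypothetical values).
probs = posterior_model_probabilities([-100.0, -97.0, -110.0])
```

A difference of about 3 in log-evidence already corresponds to a posterior probability of roughly 0.95 for the better model when only two models are compared, which is why small log-evidence gaps can be decisive.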

Data Availability

Code is publicly available at <Github.com/hummosa/MD-reservoir>. Behavioral data and group-level fMRI data are available at <https://ruhr-uni-bochum.sciebo.de/s/xKBPyW7ZGLs2q2g>.

Funding Statement

SD is funded by National Science Foundation awards CCF-2139936, CCF-1810758, CCF-0939370, and CCF-2003830. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1.Vernier P. A natural history of the frontal cortex. Revue Neurologique. 2018;174: 737. doi: 10.1016/j.neurol.2018.09.002 [DOI] [PubMed] [Google Scholar]
  • 2.Baniqued PL, Gallen CL, Kranz MB, Kramer AF, D’Esposito M. Brain network modularity predicts cognitive training-related gains in young adults. Neuropsychologia. 2019;131: 205–215. doi: 10.1016/j.neuropsychologia.2019.05.021 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3.Cohen JR, D’Esposito M. The Segregation and Integration of Distinct Brain Networks and Their Relationship to Cognition. J Neurosci. 2016;36: 12083–12094. doi: 10.1523/JNEUROSCI.2965-15.2016 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Koechlin E, Ody C, Kouneiher F. The Architecture of Cognitive Control in the Human Prefrontal Cortex. Science. 2003;302: 1181–1185. doi: 10.1126/science.1088545 [DOI] [PubMed] [Google Scholar]
  • 5.Koechlin E. Prefrontal executive function and adaptive behavior in complex environments. Current Opinion in Neurobiology. 2016;37: 1–6. doi: 10.1016/j.conb.2015.11.004 [DOI] [PubMed] [Google Scholar]
  • 6.Jones DT, Graff-Radford J. Executive Dysfunction and the Prefrontal Cortex. Continuum (Minneap Minn). 2021;27: 1586–1601. doi: 10.1212/CON.0000000000001009 [DOI] [PubMed] [Google Scholar]
  • 7.Schoenbaum G, Khamassi M, Pessiglione M, Gottfried JA, Murray EA. The magical orbitofrontal cortex. Behav Neurosci. 2021;135: 108. doi: 10.1037/bne0000470 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8.Dundon NM, Shapiro AD, Babenko V, Okafor GN, Grafton ST. Ventromedial Prefrontal Cortex Activity and Sympathetic Allostasis During Value-Based Ambivalence. Front Behav Neurosci. 2021;15: 615796. doi: 10.3389/fnbeh.2021.615796 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9.Sarafyazd M, Jazayeri M. Hierarchical reasoning by neural circuits in the frontal cortex. Science. 2019;364. doi: 10.1126/science.aav8911 [DOI] [PubMed] [Google Scholar]
  • 10.Botvinick MM, Braver TS, Barch DM, Carter CS, Cohen JD. Conflict monitoring and cognitive control. Psychol Rev. 2001;108: 624–652. doi: 10.1037/0033-295x.108.3.624 [DOI] [PubMed] [Google Scholar]
  • 11.Botvinick M, Nystrom LE, Fissell K, Carter CS, Cohen JD. Conflict monitoring versus selection-for-action in anterior cingulate cortex. Nature. 1999;402: 179–181. doi: 10.1038/46035 [DOI] [PubMed] [Google Scholar]
  • 12.Leong YC, Radulescu A, Daniel R, DeWoskin V, Niv Y. Dynamic Interaction between Reinforcement Learning and Attention in Multidimensional Environments. Neuron. 2017;93: 451–463. doi: 10.1016/j.neuron.2016.12.040 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13.Nejati V, Salehinejad MA, Nitsche MA. Interaction of the Left Dorsolateral Prefrontal Cortex (l-DLPFC) and Right Orbitofrontal Cortex (OFC) in Hot and Cold Executive Functions: Evidence from Transcranial Direct Current Stimulation (tDCS). Neuroscience. 2018;369: 109–123. doi: 10.1016/j.neuroscience.2017.10.042 [DOI] [PubMed] [Google Scholar]
  • 14.Song HF, Yang GR, Wang X-J. Reward-based training of recurrent neural networks for cognitive and value-based tasks. Behrens TE, editor. eLife. 2017;6: e21492. doi: 10.7554/eLife.21492 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Saalmann YB, Pinsk MA, Wang L, Li X, Kastner S. The Pulvinar Regulates Information Transmission Between Cortical Areas Based on Attention Demands. Science. 2012;337: 753–756. doi: 10.1126/science.1223082 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.Halassa MM, Kastner S. Thalamic functions in distributed cognitive control. Nat Neurosci. 2017;20: 1669–1679. doi: 10.1038/s41593-017-0020-1 [DOI] [PubMed] [Google Scholar]
  • 17.Hwang K, Bertolero MA, Liu WB, D’Esposito M. The Human Thalamus Is an Integrative Hub for Functional Brain Networks. J Neurosci. 2017;37: 5594–5607. doi: 10.1523/JNEUROSCI.0067-17.2017 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18.Wen X, Li W, Liu Y, Liu Z, Zhao P, Zhu Z, et al. Exploring communication between the thalamus and cognitive control-related functional networks in the cerebral cortex. Cogn Affect Behav Neurosci. 2021. [cited 17 Apr 2021]. doi: 10.3758/s13415-021-00892-y [DOI] [PubMed] [Google Scholar]
  • 19.Sherman SM. Thalamus plays a central role in ongoing cortical functioning. Nat Neurosci. 2016;19: 533–541. doi: 10.1038/nn.4269 [DOI] [PubMed] [Google Scholar]
  • 20.Schmitt LI, Wimmer RD, Nakajima M, Happ M, Mofakham S, Halassa MM. Thalamic amplification of cortical connectivity sustains attentional control. Nature. 2017;545: 219–223. doi: 10.1038/nature22073 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21.Rikhye RV, Gilra A, Halassa MM. Thalamic regulation of switching between cortical representations enables cognitive flexibility. Nat Neurosci. 2018;21: 1753–1763. doi: 10.1038/s41593-018-0269-z [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22.Loukavenko EA, Wolff M, Poirier GL, Dalrymple-Alford JC. Impaired spatial working memory after anterior thalamic lesions: recovery with cerebrolysin and enrichment. Brain Struct Funct. 2016;221: 1955–1970. doi: 10.1007/s00429-015-1015-x [DOI] [PubMed] [Google Scholar]
  • 23.Wolff M, Faugère A, Desfosses É, Coutureau É, Marchand AR. Mediodorsal but not anterior thalamic nuclei lesions impair acquisition of a conditional discrimination task. Neurobiol Learn Mem. 2015;125: 80–84. doi: 10.1016/j.nlm.2015.07.018 [DOI] [PubMed] [Google Scholar]
  • 24.Alcaraz F, Naneix F, Desfosses E, Marchand AR, Wolff M, Coutureau E. Dissociable effects of anterior and mediodorsal thalamic lesions on spatial goal-directed behavior. Brain Struct Funct. 2016;221: 79–89. doi: 10.1007/s00429-014-0893-7 [DOI] [PubMed] [Google Scholar]
  • 25.Wolff M, Vann SD. The Cognitive Thalamus as a Gateway to Mental Representations. J Neurosci. 2019;39: 3–14. doi: 10.1523/JNEUROSCI.0479-18.2018 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26.Rubio-Garrido P, Pérez-de-Manzo F, Porrero C, Galazo MJ, Clascá F. Thalamic Input to Distal Apical Dendrites in Neocortical Layer 1 Is Massive and Highly Convergent. Cerebral Cortex. 2009;19: 2380–2395. doi: 10.1093/cercor/bhn259 [DOI] [PubMed] [Google Scholar]
  • 27.Wolff M, Morceau S, Folkard R, Martin-Cortecero J, Groh A. A thalamic bridge from sensory perception to cognition. Neuroscience & Biobehavioral Reviews. 2021;120: 222–235. doi: 10.1016/j.neubiorev.2020.11.013 [DOI] [PubMed] [Google Scholar]
  • 28.Alcaraz F, Marchand AR, Courtand G, Coutureau E, Wolff M. Parallel inputs from the mediodorsal thalamus to the prefrontal cortex in the rat. European Journal of Neuroscience. 2016;44: 1972–1986. doi: 10.1111/ejn.13316 [DOI] [PubMed] [Google Scholar]
  • 29.Murphy MJM, Deutch AY. Organization of afferents to the orbitofrontal cortex in the rat. Journal of Comparative Neurology. 2018;526: 1498–1526. doi: 10.1002/cne.24424 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30.Kuramoto E, Pan S, Furuta T, Tanaka YR, Iwai H, Yamanaka A, et al. Individual mediodorsal thalamic neurons project to multiple areas of the rat prefrontal cortex: A single neuron-tracing study using virus vectors. J Comp Neurol. 2017;525: 166–185. doi: 10.1002/cne.24054 [DOI] [PubMed] [Google Scholar]
  • 31.Leung BK, Balleine BW. Ventral Pallidal Projections to Mediodorsal Thalamus and Ventral Tegmental Area Play Distinct Roles in Outcome-Specific Pavlovian-Instrumental Transfer. J Neurosci. 2015;35: 4953–4964. doi: 10.1523/JNEUROSCI.4837-14.2015 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32.Wang BA, Pleger B. Confidence in Decision-Making during Probabilistic Tactile Learning Related to Distinct Thalamo-Prefrontal Pathways. Cerebral Cortex. 2020;30: 4677–4688. doi: 10.1093/cercor/bhaa073 [DOI] [PubMed] [Google Scholar]
  • 33.Yu LQ, Wilson RC, Nassar M. Adaptive learning is structure learning in time. Neuroscience & Biobehavioral Reviews. 2021. doi: 10.1016/j.neubiorev.2021.06.024 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34.Schuck NW, Cai MB, Wilson RC, Niv Y. Human Orbitofrontal Cortex Represents a Cognitive Map of State Space. Neuron. 2016;91: 1402–1412. doi: 10.1016/j.neuron.2016.08.019 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35.Nassar MR, McGuire JT, Ritz H, Kable JW. Dissociable Forms of Uncertainty-Driven Representational Change Across the Human Brain. J Neurosci. 2019;39: 1688–1698. doi: 10.1523/JNEUROSCI.1713-18.2018 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36.Wang BA, Schlaffke L, Pleger B. Modulations of Insular Projections by Prior Belief Mediate the Precision of Prediction Error during Tactile Learning. J Neurosci. 2020;40: 3827–3837. doi: 10.1523/JNEUROSCI.2904-19.2020 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37.Mathys C, Daunizeau J, Friston KJ, Stephan KE. A Bayesian foundation for individual learning under uncertainty. Front Hum Neurosci. 2011;5: 39. doi: 10.3389/fnhum.2011.00039 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38.Dominey P, Arbib M, Joseph JP. A model of corticostriatal plasticity for learning oculomotor associations and sequences. J Cogn Neurosci. 1995;7: 311–336. doi: 10.1162/jocn.1995.7.3.311 [DOI] [PubMed] [Google Scholar]
  • 39.Enel P, Procyk E, Quilodran R, Dominey PF. Reservoir Computing Properties of Neural Dynamics in Prefrontal Cortex. PLoS Comput Biol. 2016;12. doi: 10.1371/journal.pcbi.1004967 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40.Fiete IR, Seung HS. Gradient Learning in Spiking Neural Networks by Dynamic Perturbation of Conductances. Phys Rev Lett. 2006;97: 048104. doi: 10.1103/PhysRevLett.97.048104 [DOI] [PubMed] [Google Scholar]
  • 41.Lee S-C, Cruikshank SJ, Connors BW. Electrical and chemical synapses between relay neurons in developing thalamus. J Physiol. 2010;588: 2403–2415. doi: 10.1113/jphysiol.2010.187096 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42.Pinault D, Bourassa J, Deschênes M. Thalamic reticular input to the rat visual thalamus: a single fiber study using biocytin as an anterograde tracer. Brain Res. 1995;670: 147–152. doi: 10.1016/0006-8993(94)01303-y [DOI] [PubMed] [Google Scholar]
  • 43.Bouchacourt F, Palminteri S, Koechlin E, Ostojic S. Temporal chunking as a mechanism for unsupervised learning of task-sets. Elife. 2020;9. doi: 10.7554/eLife.50469 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 44.Llinas RR, Leznik E, Urbano FJ. Temporal binding via cortical coincidence detection of specific and nonspecific thalamocortical inputs: a voltage-dependent dye-imaging study in mouse brain slices. Proc Natl Acad Sci U S A. 2002;99: 449–454. doi: 10.1073/pnas.012604899 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 45.Mukherjee A, Lam NH, Wimmer RD, Halassa MM. Thalamic circuits for independent control of prefrontal signal and noise. Nature. 2021; 1–8. doi: 10.1038/s41586-021-04056-3 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 46.Fellows LK. Deciding how to decide: ventromedial frontal lobe damage affects information acquisition in multi-attribute decision making. Brain. 2006;129: 944–952. doi: 10.1093/brain/awl017 [DOI] [PubMed] [Google Scholar]
  • 47.Hare TA, Camerer CF, Knoepfle DT, O’Doherty JP, Rangel A. Value Computations in Ventral Medial Prefrontal Cortex during Charitable Decision Making Incorporate Input from Regions Involved in Social Cognition. J Neurosci. 2010;30: 583–590. doi: 10.1523/JNEUROSCI.4089-09.2010 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 48.Worsley KJ, Marrett S, Neelin P, Vandal AC, Friston KJ, Evans AC. A unified statistical approach for determining significant signals in images of cerebral activation. Hum Brain Mapp. 1996;4: 58–73. doi: [DOI] [PubMed] [Google Scholar]
  • 49.Cavada C, Compañy T, Tejedor J, Cruz-Rizzolo RJ, Reinoso-Suárez F. The Anatomical Connections of the Macaque Monkey Orbitofrontal Cortex. A Review. Cerebral Cortex. 2000;10: 220–242. doi: 10.1093/cercor/10.3.220 [DOI] [PubMed] [Google Scholar]
  • 50.Fine JM, Hayden BY. The whole prefrontal cortex is premotor cortex. Philosophical Transactions of the Royal Society B: Biological Sciences. 2022;377: 20200524. doi: 10.1098/rstb.2020.0524 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 51.Bliss TVP, Collingridge GL, Morris RGM, Lisman J. Long-term potentiation: outstanding questions and attempted synthesis. Philosophical Transactions of the Royal Society of London Series B: Biological Sciences. 2003;358: 829–842. doi: 10.1098/rstb.2002.1242 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 52.Frey U, Morris RGM. Synaptic tagging: implications for late maintenance of hippocampal long-term potentiation. Trends in Neurosciences. 1998;21: 181–188. doi: 10.1016/s0166-2236(97)01189-2 [DOI] [PubMed] [Google Scholar]
  • 53.Zenke F, Gerstner W. Hebbian plasticity requires compensatory processes on multiple timescales. Philosophical Transactions of the Royal Society B: Biological Sciences. 2017;372: 20160259. doi: 10.1098/rstb.2016.0259 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 54.Chakraborty S, Kolling N, Walton ME, Mitchell AS. Critical role for the mediodorsal thalamus in permitting rapid reward-guided updating in stochastic reward environments. In: eLife [Internet]. eLife Sciences Publications Limited; 2 May 2016. [cited 2 Jun 2022]. doi: 10.7554/eLife.13588 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 55.Alcaraz F, Fresno V, Marchand AR, Kremer EJ, Coutureau E, Wolff M. Thalamocortical and corticothalamic pathways differentially contribute to goal-directed behaviors in the rat. Elife. 2018;7: e32517. doi: 10.7554/eLife.32517 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 56.Fresno V, Parkes SL, Faugère A, Coutureau E, Wolff M. A thalamocortical circuit for updating action-outcome associations. Behrens TE, Schoenbaum G, Corbit L, editors. eLife. 2019;8: e46187. doi: 10.7554/eLife.46187 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 57.Ouhaz Z, Ba-M’hamed S, Mitchell AS, Elidrissi A, Bennis M. Behavioral and cognitive changes after early postnatal lesions of the rat mediodorsal thalamus. Behavioural Brain Research. 2015;292: 219–232. doi: 10.1016/j.bbr.2015.06.017 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 58.Mukherjee A, Bajwa N, Lam NH, Porrero C, Clasca F, Halassa MM. Variation of connectivity across exemplar sensory and associative thalamocortical loops in the mouse. Colgin LL, f M, Wolff M, Aggleton JP, editors. eLife. 2020;9: e62554. doi: 10.7554/eLife.62554 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 59.Mante V, Sussillo D, Shenoy KV, Newsome WT. Context-dependent computation by recurrent dynamics in prefrontal cortex. Nature. 2013;503: 78–84. doi: 10.1038/nature12742 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 60.Miconi T. Biologically plausible learning in recurrent neural networks reproduces neural dynamics observed during cognitive tasks. Frank MJ, editor. eLife. 2017;6: e20899. doi: 10.7554/eLife.20899 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 61.Wimmer RD, Schmitt LI, Davidson TJ, Nakajima M, Deisseroth K, Halassa MM. Thalamic control of sensory selection in divided attention. Nature. 2015;526: 705–709. doi: 10.1038/nature15398 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 62.Veness J, Lattimore T, Budden D, Bhoopchand A, Mattern C, Grabska-Barwinska A, et al. Gated Linear Networks. arXiv:191001526 [cs, math, stat]. 2020. [cited 21 Jun 2021]. Available: http://arxiv.org/abs/1910.01526 [Google Scholar]
  • 63.Sezener E, Grabska-Barwińska A, Kostadinov D, Beau M, Krishnagopal S, Budden D, et al. A rapid and efficient learning rule for biological neural circuits. bioRxiv. 2021; 2021.03.10.434756. doi: 10.1101/2021.03.10.434756 [DOI] [Google Scholar]
  • 64.Prasad JA, Carroll BJ, Sherman SM. Layer 5 Corticofugal Projections from Diverse Cortical Areas: Variations on a Pattern of Thalamic and Extrathalamic Targets. J Neurosci. 2020;40: 5785–5796. doi: 10.1523/JNEUROSCI.0529-20.2020 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 65.Wang MB, Halassa MM. Thalamocortical contribution to solving credit assignment in neural systems. arXiv:210401474 [q-bio]. 2021. [cited 18 Jun 2021]. Available: http://arxiv.org/abs/2104.01474 [Google Scholar]
  • 66.Parkes SL, Ravassard PM, Cerpa J-C, Wolff M, Ferreira G, Coutureau E. Insular and Ventrolateral Orbitofrontal Cortices Differentially Contribute to Goal-Directed Behavior in Rodents. Cereb Cortex. 2018;28: 2313–2325. doi: 10.1093/cercor/bhx132 [DOI] [PubMed] [Google Scholar]
  • 67.Murray EA, Moylan EJ, Saleem KS, Basile BM, Turchi J. Specialized areas for value updating and goal selection in the primate orbitofrontal cortex. Gold JI, editor. eLife. 2015;4: e11695. doi: 10.7554/eLife.11695 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 68.Rudebeck PH, Saunders RC, Lundgren DA, Murray EA. Specialized Representations of Value in the Orbital and Ventrolateral Prefrontal Cortex: Desirability versus Availability of Outcomes. Neuron. 2017;95: 1208–1220.e5. doi: 10.1016/j.neuron.2017.07.042 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 69.Murray EA, Rudebeck PH. Specializations for reward-guided decision-making in the primate ventral prefrontal cortex. Nat Rev Neurosci. 2018;19: 404–417. doi: 10.1038/s41583-018-0013-4 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 70.Rouault M, Drugowitsch J, Koechlin E. Prefrontal mechanisms combining rewards and beliefs in human decision-making. Nat Commun. 2019;10: 301. doi: 10.1038/s41467-018-08121-w [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 71.Tsuda B, Pate SC, Tye KM, Siegelmann HT, Sejnowski TJ. Neuromodulators enable overlapping synaptic memory regimes and nonlinear transition dynamics in recurrent neural networks. bioRxiv. 2021; 2021.05.31.446462. doi: 10.1101/2021.05.31.446462 [DOI] [Google Scholar]
  • 72.Arcelli P, Frassoni C, Regondi MC, De Biasi S, Spreafico R. GABAergic neurons in mammalian thalamus: a marker of thalamic complexity? Brain Res Bull. 1997;42: 27–37. doi: 10.1016/s0361-9230(96)00107-4 [DOI] [PubMed] [Google Scholar]
  • 73.Ray JP, Price JL. The organization of projections from the mediodorsal nucleus of the thalamus to orbital and medial prefrontal cortex in macaque monkeys. J Comp Neurol. 1993;337: 1–31. doi: 10.1002/cne.903370102 [DOI] [PubMed] [Google Scholar]
  • 74.Klein JC, Rushworth MFS, Behrens TEJ, Mackay CE, de Crespigny AJ, D’Arceuil H, et al. Topography of connections between human prefrontal cortex and mediodorsal thalamus studied with diffusion tractography. NeuroImage. 2010;51: 555–564. doi: 10.1016/j.neuroimage.2010.02.062 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 75.Saalmann YB. Intralaminar and medial thalamic influence on cortical synchrony, information transmission and cognition. Front Syst Neurosci. 2014;8. doi: 10.3389/fnsys.2014.00083 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 76.Bolkan SS. Thalamic projections sustain prefrontal activity during working memory maintenance. Nat Neurosci. 2017;20: 987–996. doi: 10.1038/nn.4568 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 77.Dreher JC. Menstrual cycle phase modulates reward-related neural function in women. Proc Natl Acad Sci U S A. 2007. pp. 2465–2470. doi: 10.1073/pnas.0605569104 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 78.Sacher J, Okon-Singer H, Villringer A. Evidence from neuroimaging for the role of the menstrual cycle in the interplay of emotion and cognition. Front Hum Neurosci. 2013;7: 374. doi: 10.3389/fnhum.2013.00374 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 79.Wetherill RR, Jagannathan K, Hager N, Maron M, Franklin TR. Influence of menstrual cycle phase on resting-state functional connectivity in naturally cycling, cigarette-dependent women. Biol Sex Differ. 2016;7: 24. doi: 10.1186/s13293-016-0078-6 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 80.Oldfield RC. The assessment and analysis of handedness: the Edinburgh inventory. Neuropsychologia. 1971;9: 97–113. doi: 10.1016/0028-3932(71)90067-4 [DOI] [PubMed] [Google Scholar]
  • 81.Iglesias S. Hierarchical prediction errors in midbrain and basal forebrain during sensory learning. Neuron. 2013;80: 519–530. doi: 10.1016/j.neuron.2013.09.009 [DOI] [PubMed] [Google Scholar]
  • 82.Vossel S. Cholinergic stimulation enhances Bayesian belief updating in the deployment of spatial attention. J Neurosci. 2014;34: 15735–15742. doi: 10.1523/JNEUROSCI.0091-14.2014 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 83.Ashburner J, Friston KJ. Unified segmentation. NeuroImage. 2005;26: 839–851. doi: 10.1016/j.neuroimage.2005.02.018 [DOI] [PubMed] [Google Scholar]
  • 84.Tzourio-Mazoyer N, Landeau B, Papathanassiou D, Crivello F, Etard O, Delcroix N, et al. Automated anatomical labeling of activations in SPM using a macroscopic anatomical parcellation of the MNI MRI single-subject brain. Neuroimage. 2002;15: 273–289. doi: 10.1006/nimg.2001.0978 [DOI] [PubMed] [Google Scholar]
  • 85.Eickhoff SB, Stephan KE, Mohlberg H, Grefkes C, Fink GR, Amunts K, et al. A new SPM toolbox for combining probabilistic cytoarchitectonic maps and functional imaging data. Neuroimage. 2005;25: 1325–1335. doi: 10.1016/j.neuroimage.2004.12.034 [DOI] [PubMed] [Google Scholar]
  • 86.Friston KJ, Harrison L, Penny W. Dynamic causal modelling. NeuroImage. 2003;19: 1273–1302. doi: 10.1016/s1053-8119(03)00202-7 [DOI] [PubMed] [Google Scholar]
  • 87.Friston KJ. Functional and effective connectivity: a review. Brain Connect. 2011;1: 13–36. doi: 10.1089/brain.2011.0008 [DOI] [PubMed] [Google Scholar]
  • 88.Neumann J, Lohmann G. Bayesian second-level analysis of functional magnetic resonance images. NeuroImage. 2003;20: 1346–1355,. doi: 10.1016/S1053-8119(03)00443-9 [DOI] [PubMed] [Google Scholar]
  • 89.Garrido MI, Kilner JM, Kiebel SJ, Stephan KE, Friston KJ. Dynamic causal modelling of evoked potentials: a reproducibility study. NeuroImage. 2007;36: 571–580,. doi: 10.1016/j.neuroimage.2007.03.014 [DOI] [PMC free article] [PubMed] [Google Scholar]
PLoS Comput Biol. doi: 10.1371/journal.pcbi.1010500.r001

Decision Letter 0

Daniele Marinazzo, Stefano Palminteri

17 May 2022

Dear Dr Hummos,

Thank you very much for submitting your manuscript "Thalamic regulation of frontal interactions in human cognitive flexibility" for consideration at PLOS Computational Biology.

As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. In light of the reviews (below this email), we would like to invite the resubmission of a significantly-revised version that takes into account the reviewers' comments.

While we ask you to thoroughly address all remarks (this is a major revision; the paper will be sent back to the original reviewers), we would like to attract your attention to the following points:

Several reviewers pointed to the fact that some features and characteristics of the model were rather obscure (both its structure and its function - including how it reacted to the lesion experiments). We believe that clarifying this part will crucially improve the readability (and possible impact) of your paper.

Concerning fMRI analysis, R1 points to a lack of clarity concerning the region of interest selection (this is a major issue in fMRI), as well as insufficient reporting: both these aspects should be taken extremely seriously during the revisions. Finally, R2 makes a very interesting and relevant remark, that is that the current implementation of the analyses does not take into account the model predictions (activity in the different nodes). I believe this approach could provide important insights.

We cannot make any decision about publication until we have seen the revised manuscript and your response to the reviewers' comments. Your revised manuscript is also likely to be sent to reviewers for further evaluation.

When you are ready to resubmit, please upload the following:

[1] A letter containing a detailed list of your responses to the review comments and a description of the changes you have made in the manuscript. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

[2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file).

Important additional instructions are given below your reviewer comments.

Please prepare and submit your revised manuscript within 60 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email. Please note that revised manuscripts received after the 60-day due date may require evaluation and peer review similar to newly submitted manuscripts.

Thank you again for your submission. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,

Stefano Palminteri

Associate Editor

PLOS Computational Biology

Daniele Marinazzo

Deputy Editor

PLOS Computational Biology

***********************

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: Summary of the research and overall impression

This study aims to translate to humans the past research lines carried out on the rodent model: the computational modeling of the role of the mediodorsal thalamus (MD) on prefrontal cortex interactions during decision making. This is ambitious as the human prefrontal cortex is significantly more complex with debated roles of its sub-territories in decision making, and the MD holds unique primate features not identified in rodents. The authors propose a decision-making task switching across different contingencies inspired by past work on rodent models (changing visual vs. auditory cuing by up vs. down tactile cuing). It allowed them to use the computational model developed in rodents, and extend it to humans. Nicely, the authors report the fMRI activity during their decision-making task, allowing them to identify the orbitofrontal cortex (OFC) and dorsolateral prefrontal cortex (dlPFC) activities to correlate with switching behavior during their task, and demonstrate that MD is engaged in feedback with the dlPFC, and transmits OFC inputs during strategy switching. Overall, this paper is an ambitious and convincing attempt to model the role of MD in prefrontal cortex interactions during strategy switches in line with contingency changes during a simple decision-making task in humans. Regarding the thalamus, studies on the role of the MD in humans are mostly lesion studies, which are often non-specific, with a large spectrum of effects on amnesia, dementia, autonomic functions, mood, and sleep/waking cycle. Thalamic nuclei are small and deep structures that are difficult to record with classical non-invasive brain imaging studies, and the few intracranial recording studies focused on memory formation. In this context, proposing a computational model of the role of MD in humans is ambitious, but it brings valuable insights to this largely unexplored domain of research. In particular, this study gives MD a role other than relaying sensory inputs. To my knowledge, this is the first direct evidence of such a role of thalamic nuclei in humans. Moreover, such models could help to understand the deficits induced by lesions or dysfunction of MD in humans.

To me, the outcomes of this study fit the ambition and impact expectations of PLOS Computational Biology. However, if the computation part of the study appears robust and convincing, some clarifications on the selection of the prefrontal regions seem necessary (see major issues below). A more exhaustive report of fMRI results is expected, and the course of the presentation should be rebuilt to show more clearly the unbiased identification of the prefrontal regions of interest.

Major issues

1. The course of the presentation is misleading. While the challenge of the prefrontal complexity in humans is well exposed since the beginning, the literature on sub-territories functions in decision making/task switching is too succinctly introduced and discussed. The dlPFC is given in the introduction as the region of interest for the model with few justifications. Actually, the dlPFC was not reported in the Change>Keep contrast for the same fMRI task in Burkhard Pleger et al., Cerebral Cortex, 2020. Authors should consider using the fMRI data as an unbiased first step of their study to label the entities of their model, and then more extensively discuss this result in line with the existing human literature.

2. In line with the previous comment, it is unclear if all cortical regions were explored or if the analysis was a-priori constrained to the dlPFC, OFC, and ventromedial prefrontal cortex (vmPFC). For example, it is surprising that activations related to Prior belief appear only in the vmPFC. The table with statistical effect for each cortical ROI is a must-have supplementary element in this study (tables for both Switching>Staying and Staying>Switching contrasts, and Prior belief).

Minor issues

1. The thalamic fMRI data is constrained to the MD, justified by the fact that previous studies in rodents focused on MD, and by the connectivity pattern of the MD. However, the effect in the other limbic thalamic nuclei would have been nice to have, knowing that little evidence exists yet on the functional differences between those nuclei in humans.

2. The statistical difference between correct predictions in blocks with low predictability compared to unpredictable blocks is not reported. The results from the one-way ANOVA should be shown in figure 1D. Actually, outcomes from a two-way ANOVA with interaction would be preferable, with the type (Human/model) and association level (90-10/70-30/50%) as factors.

3. The authors made a significant and appreciable writing effort. However, they should consider an additional proofreading of their manuscript, including spelling, spacing, figure labeling (e.g. figure 1D and 1E are inverted between figure and text), and figures (e.g. missing error bar in figure 1D).

Reviewer #2: In this paper, Hummos and colleagues refine a previously developed neural model of an executive frontal-thalamic (MD) circuit based on work in mice, and train it on a human decision-making task based on either a matching or non-matching rule on tactile percepts associated with variable contingencies. Both the nature of the rule and the contingencies can be parametrically varied to stimulate flexible responding. As a major take-home, the data suggest that the MD could act as a transthalamic hub to support indirect cortico-cortical communication in humans which may be required when flexible responding is warranted, as also supported by neuroimaging data. Altogether the paper is well written and of high interest for the field and even the broader audience as it effectively stimulates thoughts and proposes new insights to reveal the computational principles at play within thalamocortical circuits, with a possible anatomical relevance. I believe there are a few points that could be developed or clarified to increase the value of the paper further.

1/ The neural model

Some additional descriptions or statements could be provided to better explain some crucial aspects of the model, especially for non-specialists. When considering the main features of the model, it is crucial to distinguish between what is driven by well-established facts from what is actually a working hypothesis. For example, the rationale that the MD exhibits “winner take-it-all” computational properties should be better explained. Is this based on empirical facts or a computational hypothesis? The reference provided l.148 based on anatomical work from Pinault does not seem to support this statement. There are only 2 neurons considered for the MD in the model, so is this a limit of the current model or one of its features to implement the “winner take-it-all” rule? In a similar way, can we really make a meaningful functional comparison between the 500-neuron dlPFC reservoir and the 2-neuron MD? See lines 353-354 for one instance of the latter.

I was also a bit confused on the statement regarding the RNN which “did not have a mechanism to encode recent rewards” (the paragraph l.161-165). It is then said that the model is provided with inputs representing the value of each rule “computed over recent trials”. But if it is computed over recent trials what is then the difference from having a memory of recent rewards? Please clarify the difference.

The data illustrated in figure 2C need to be more comprehensively described and analyzed. What do top and bottom panels represent exactly, what is the difference between the two and how does this ultimately support the conclusion that dlPFC, not OFC or MD acts as input region? There is only one short mention in the main text l.207-208 which is insufficient to understand this and it is crucial for the next parts of the paper.

Finally, I think it would have been great to discuss a little further the extent to which the proposed model also has anatomical relevance. I think it is relatively well described that the OFC is the recipient of multiple sensory streams (e.g. classic neuroanatomy work from Cavada et al., Cerebral Cortex 2000), consistent with a major integrative role for this region. So I expect that some neuroanatomists would have expected that inputs may be conveyed from the OFC to the dlPFC rather than the opposite. This at least deserves a more thorough discussion. Perhaps views from Barbas & colleagues could also be considered here to strengthen this aspect (e.g. García-Cabezas & Barbas, 2017). In direct line with this, the final paragraph l.358-366 is very short and does not bring much data for the reader to help consider the relevance of the proposed model (inputs reach dlPFC first, not OFC or MD). This part would benefit from being a little more scholarly, with clearer explanations of the data considered and the types of analyses that have been done.

2/ Perturbation experiment

The inclusion of a perturbation approach is nice and could be very informative to test some of the predictions of the model. In its present form however, it is difficult to have a clear functional understanding of MD deletion in the RNN. The data presented on figures 3F-I need to be more thoroughly explained and analyzed. At present the main text and the caption convey ambiguous statements. The main text says that the initial effect of MD lesion on flexible switching became less prominent after further training (l.233-234) but the caption of figure 3 (l.253-254) states that initial blocks showed no lesion effect unlike subsequent blocks, which suggests the opposite. So I don’t really understand the nature of the deficit here. A transient deficit or acquisition impairment is a common feature in the rodent literature documenting the effect of MD lesion and has been relatively widely discussed (there are multiple reviews that document this specific issue) and it would be nice to compare that with the present dataset. I think the effect of MD lesions really needs a much clearer description of what exactly is being impaired, what is not, and whether any of the effects are permanent or temporary. Given the scope of the paper, it would be expected to compare that with actual lesion data; there is a myriad of those available both in rodents and NHPs. This may help to strengthen the relevance of the model further and/or to identify some limits.

3/ Context vs rule

I was confused by some of the terminology used in the paper and in particular the use of the term ‘context’ which I did not expect to be necessary here. Both ‘temporal context’ and ‘behavioral context’ are used in the ms to describe the fact that the rule can change (matching vs non-matching). The first occurrence of this is line 218 and I really don’t understand the rationale for labelling this as ‘temporal context’ when what is really meant here is that MD neurons may encode the current rule within a block. Introducing ‘context’ here seems unnecessary and actually incorrect since there is no element to signal a context. Adding ‘temporal’ (and then ‘behavioral’ and multiple further instances) only makes the issue even worse and obscures much of the interpretation. I would really suggest removing these terms throughout the whole ms or better explaining the rationale for using them (and especially what using them brings). To identify the current rule, participants must track any difference between expected and actual outcome but I really don’t see how the context could play a role in this.

I have a few more minor comments.

1. I think that the data considered in the introduction and the whole rationale to present the focus on the MD is a bit thin. There is a lot of literature out there and only 2 MD papers are considered. It should be easy to build a stronger case considering the vast literature available. Similarly, I found the initial question l.71 to introduce the present work a bit narrow as it only asks whether the MD could mimic Pulvinar functions on frontal systems. The Pulvinar surely is a high-order thalamic nucleus like the MD but unlike the MD it is modality specific so I think the MD is a candidate to have actually possibly broader (even more integrative) functions, at least across modalities. I think it would impact more significantly on the general audience to better introduce the present work and how it is positioned to influence what is now a quite dynamic field, with important functional questions.

2. The mentions of the non-relay functions of the MD may appear a bit trivial in 2022 (e.g. l.78, l.202, l.205, and especially l.209). Surely we knew before the present work was conducted that the MD is not relaying tactile sensory information to the prefrontal cortex. Consider the sentence l. 208-209; alternatively, what would be the evidence demonstrating instead that a thalamic area actually engages in a task in humans to relay sensory information? I don’t think this brings much to the main topic of the work and it is possibly distracting.

To get back to the previous point, I even wondered upon reading this why the authors did not look at the Pom instead (the high-order thalamic nucleus for somatosensory information) as an actual parallel to the Pulvinar (especially l.205)?

3. Considering the model was initially developed to account for the performance of mice in a cross-modal attention task, I think it would be great to compare more clearly the mouse task with the human one as they have a number of clear differences which may impact at the functional level (e.g. cross-modal versus purely tactile task, cue signaling the current rule versus need to monitor the difference between expected and actual outcomes to identify the current rule, mice learn everything by trial and error while participants are warned there will be rule changes, even though the changes are not directly signaled, etc.).

4. In the discussion, the rationale for the statement l.437 that “thalamocortical connections are either fixed or updated at a much lower rate compared to corticothalamic projections” is not provided or explained. Please expand here. Is the fact that CT projections typically outnumber TC projections relevant here?

5. Some typos: l.139 perhaps a ‘to’ is missing, l.157 ‘disinhibitory’

Reviewer #3: The manuscript by Hummos and colleagues provides a modelling study investigating thalamocortical circuit mechanisms of context-dependent learning. This is combined with both behavioural and fMRI data from a human learning experiment, to which the model was applied. DCM on the fMRI data is used in attempt to arbitrate between different model architectures. In the task, participants are presented with a tactile stimulus (up or down) to which they have to respond with a match- or non-match response. The cue-response mappings (match- vs non-match) are probabilistic and change between blocks of trials (0.9, 0.7, 0.5 probability of the match- vs non-match response being correct).
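
As a reading aid, the block structure summarized above can be sketched as a small trial generator. This is only an illustration under stated assumptions: the function name `make_task`, the number of trials per block, and the uniform random cue are hypothetical and are not taken from the manuscript; only the up/down cue, the match/non-match rule, and the per-block match probability come from the description above.

```python
import random

def make_task(block_probs, trials_per_block, seed=0):
    """Generate a toy trial sequence (hypothetical helper, not the authors' code).

    block_probs: probability that 'match' is the rewarded rule in each block.
    trials_per_block: assumed block length (the real task may differ).
    """
    rng = random.Random(seed)
    opposite = {"up": "down", "down": "up"}
    trials = []
    for p_match in block_probs:
        for _ in range(trials_per_block):
            cue = rng.choice(["up", "down"])  # tactile cue direction
            rule = "match" if rng.random() < p_match else "non-match"
            # Under 'match' the rewarded response equals the cue, otherwise its opposite.
            rewarded = cue if rule == "match" else opposite[cue]
            trials.append({"cue": cue, "p_match": p_match,
                           "rule": rule, "rewarded": rewarded})
    return trials

# Example: three blocks with match probabilities 0.9, 0.5 and 0.1.
example = make_task([0.9, 0.5, 0.1], trials_per_block=40)
```

Because block changes are not signaled in such a sequence, the current rule can only be inferred from the history of outcomes, which is the inference problem the reviewers discuss.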

The neural network model had four components: dlPFC (modelled as RNN with reservoir dynamics), mediodorsal thalamus (MD), vmPFC (providing value-related input), and OFC (providing estimates of the probability that the underlying latent state has changed). Importantly, thalamocortical connections (MD-->dlPFC) were fixed, while corticothalamic (dlPFC-->MD) weights underwent Hebbian learning. The model behaviour closely tracks human choice data and model vmPFC output closely follows the underlying action probabilities. fMRI results show increased activity on strategy switch vs stay trials in dlPFC, MD, and OFC. DCM applied to these fMRI data supports a model in which sensory inputs first enter dlPFC, rather than being relayed via MD.

There are several results in this paper that I found very interesting. The authors investigated the coding properties of model dlPFC and MD neurons. In line with previous experimental findings, the architecture gave rise to mixed selectivity in dlPFC neurons. In contrast, MD neural activity was tightly related to the current context, or block (match vs non-match). Notably, the latter property of MD neurons relied on the long time scale of the Hebbian eligibility trace. When this time constant was reduced, MD neurons were biased to tracking the sensory signal (up vs down cue) that varies from trial to trial. Learning data of a model with MD lesions (output of MD to dlPFC removed) showed a "reversal learning deficit" - the lesioned model took longer to learn the new contingencies following a covert block switch, but it eventually reached control levels within a few hundred trials. The MD lesion also reduced cue selectivity in dlPFC neurons. Furthermore, in the intact model, connections from the dlPFC reservoir to the output neurons increased for the appropriate context, but remained silent out of context. Finally, DCM results provided evidence in favour of transthalamic routing of OFC signals (OFC signalling to dlPFC via MD, rather than direct cortico-cortical signalling).
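
The dependence on the trace time constant described above can be illustrated with a deliberately small toy, which is not the authors' model. It assumes four cortical units (one per rule/cue pair), two MD units with winner-take-all competition, a low-pass presynaptic trace with time constant `tau` (in trials), and a mean-subtracted Hebbian update. All names, initial weights, and parameter values are hypothetical choices made for this sketch.

```python
def simulate_md(tau, n_blocks=4, trials_per_block=40, lr=0.02):
    """Toy corticothalamic Hebbian learning with a presynaptic trace (illustrative only)."""
    # Assumed initial weights; the small bias on w[0][0] only breaks the first tie.
    w = [[0.51, 0.5, 0.5, 0.5], [0.5, 0.5, 0.5, 0.5]]
    trace = [0.0] * 4
    rules, winners = [], []
    for block in range(n_blocks):
        rule = block % 2                 # rule alternates between blocks
        for t in range(trials_per_block):
            cue = t % 2                  # cue alternates from trial to trial
            active = 2 * rule + cue      # one cortical unit per (rule, cue) pair
            for j in range(4):
                x = 1.0 if j == active else 0.0
                trace[j] += (x - trace[j]) / tau   # low-pass presynaptic trace
            drive = [sum(w[m][j] * trace[j] for j in range(4)) for m in range(2)]
            winner = 0 if drive[0] >= drive[1] else 1   # winner-take-all
            mean_trace = sum(trace) / 4
            for m in range(2):
                post = 1.0 if m == winner else 0.0
                for j in range(4):
                    # Hebbian update: post activity times mean-subtracted trace.
                    w[m][j] += lr * (post - 0.5) * (trace[j] - mean_trace)
                    w[m][j] = min(1.0, max(0.0, w[m][j]))  # keep weights bounded
            rules.append(rule)
            winners.append(winner)
    return rules, winners

def context_match(rules, winners):
    """Fraction of trials on which the winning MD unit index equals the block rule."""
    return sum(r == w for r, w in zip(rules, winners)) / len(rules)

slow_rules, slow_winners = simulate_md(tau=10.0)   # long trace
fast_rules, fast_winners = simulate_md(tau=1.0)    # trace equals the current input
```

In this toy, a long trace lets the two cortical units that are co-active within a block become bound to the same MD unit, so the winner follows the block rule apart from a short lag after each switch. With `tau=1.0` the trace carries no history, and the winner already flips with the cue on the first two trials.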

As stated above, I think this manuscript provides important new insights regarding the thalamocortical circuit mechanisms underlying context-dependent learning/probabilistic inference. I should point out that I am not an expert in neural network modelling (far from it), so I cannot provide a well-founded judgement on these methods. My comments below also mainly pertain to clarifying some questions that came up while reading the manuscript.

1) I think the cartoon of the model in figure 1B should be elaborated. The neural network model takes centre stage, so this needs to be clear. Before I arrived at the methods, it had not been really clear to me which components the model included - and which brain areas they were thought to represent. For example, I initially thought that the authors used vmPFC and OFC interchangeably, until I realized that these were separate components. Perhaps it would even be an option to have a separate figure solely dedicated to the model (and the variations of it that were used) to make the approach more straightforward to follow. Questions on the architecture that still remain open to me:

- How exactly does the MD component of the model look? I think that it contains two excitatory neurons that project to a population of inhibitory cells (hypothesized in the reticular nucleus) that provide feedback inhibition - is this correct?

- The OFC component is (at least what I understand) not at all present in the schematic (1B) - was it added only at a later stage?

- Is there anything about the RNN architecture that would make it particular to the dlPFC? In other words, could this component map to any cortical area (with strong recurrent connectivity)? Did the mapping of the component to dlPFC result from the fMRI findings?

2) I was intrigued when looking at the model MD neuron firing patterns (Figure 3D) - this looked to me like a code that very sharply represents the latent task state - thus holding a representation very similar to those found in OFC (Wilson et al., Schuck et al.). I have difficulty understanding how the representation found here would differ from an OFC representation of latent states. The MD representation seems to be very sharp (apparently emerging immediately after a block switch) and binary (where I would assume the latter part results from the winner-take-all dynamics in MD?). Could you please elaborate on how this emerges (in particular so abruptly) - is it a representation that is inherited from the OFC component, which I assume will be similarly sharp (threshold crossing of the switch probability)?

3) The fMRI results (with the exception of the DCM) are touched on only quite briefly. In particular, I had been wondering what the rationale was for using a simple switch>stay contrast as the main fMRI contrast (which yielded dlPFC, OFC and MD; figure 2). I think this could be elaborated a bit in order to relate the regions more to their underlying functions as specified in the model. Wouldn't it be interesting to use e.g. the output of the OFC component as regressor in the GLM to test whether this region truly tracks the underlying latent state? Similarly with the model MD output (possibly also dlPFC). I appreciate that the dominant focus of this manuscript leans towards the modelling, but I would hope that a more thorough analysis of the fMRI data could further strengthen the authors' case.

Minor:

1) In the introduction, the authors sometimes switch between pulvinar and MD thalamus (e.g. line 71, 72) - would the authors see them as equivalent?

2) The task figure (1A) does not seem to show the non-predictive (p = 0.5) context? Further to this, what do the eight circles represent? In figure 1C, it is not quite clear to me what is shown with the dots. I think more description in the caption could help. Overall, while I found all other figures rather straightforward to read and clear, I was struggling quite a bit with this one (see also my questions about model architecture above).

3) It seems that some literature is not cited correctly. In the introduction (line 56/57), two studies (Takahashi et al., Schultz et al.) are cited in support of OFC-dlPFC interactions, but as far as I remember neither of these investigated this topic.

4) I was wondering what specific purpose the HGF was used for? It only briefly comes up, almost in passing on page 5. It is mentioned that prior beliefs about outcome probability were weaker on switch compared to stay trials - and then the authors move on. This seems unnecessary to me, but maybe I'm missing something, in which case I would ask the authors to elaborate.

5) For the model MD lesion studies, blocks with 0.9 and 0.1 match probability were used - unlike the main experiment with 0.9, 0.7, 0.5, 0.3, 0.1 - why?

**********

Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code —e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: None

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Antoine Collomb-Clerc

Reviewer #2: Yes: Mathieu Wolff

Reviewer #3: No

Figure Files:

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

Data Requirements:

Please note that, as a condition of publication, PLOS' data policy requires that you make available all data used to draw the conclusions outlined in your manuscript. Data must be deposited in an appropriate repository, included within the body of the manuscript, or uploaded as supporting information. This includes all numerical values that were used to generate graphs, histograms etc. For an example in PLOS Biology see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5.

Reproducibility:

To enhance the reproducibility of your results, we recommend that you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option to publish peer-reviewed clinical study protocols. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1010500.r003

Decision Letter 1

Daniele Marinazzo, Stefano Palminteri

19 Aug 2022

Dear Dr. Hummos,

We are pleased to inform you that your manuscript 'Thalamic regulation of frontal interactions in human cognitive flexibility' has been provisionally accepted for publication in PLOS Computational Biology.

Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow up email. A member of our team will be in touch with a set of requests.

Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.

IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.

Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us now if you or your institution is planning to press release the article. All press must be co-ordinated with PLOS.

Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Computational Biology. 

Best regards,

Stefano Palminteri

Academic Editor

PLOS Computational Biology

Daniele Marinazzo

Section Editor

PLOS Computational Biology

***********************************************************

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: The authors exhaustively answered the comments of the first review. The manuscript appears strengthened, with an increased impact. Once again, this study is of high interest to the field as it highlights a new role of the human thalamus, and may help the understanding of symptoms following thalamic lesions.

Importantly, major issues concerning ROIs have been well clarified in both the results and methods sections. The methodological approach to fMRI data is clearer in the text. It is now straightforward why results may differ from previously published work. Moreover, the effort of the authors to bring a more detailed introduction of human PFC subregions is appreciable for the conceptualization of the explored ROIs. Finally, constraining the analysis to specific ROIs based on a strong hypothesis related to previous studies seems fully acceptable, as long as it is well specified, which is now the case.

Regarding minor issues, thank you to the authors for replying and revising thoroughly in line with the comments. The first comment arose more from scientific curiosity, and the authors’ answer suggests that switching/flexibility appears specific to MD. Other limbic nuclei are likely involved in this task, but more in memory/reinforcement learning processes.

Reviewer #2: The authors have addressed all my comments. I really appreciate the efforts and, as a whole, the paper is clearly even more interesting to me. On the definition side of things, I am still wondering whether a term like "state" (state representation) might be better suited than "context", but this comment is only the expression of my scientific interest. I am not asking to discuss the issue further, especially after the definition that the authors provided. I have no further comment.

Reviewer #3: The authors have clarified all my questions and performed a thorough revision, and I think the manuscript is now even better and clearer to read. There is only one remaining point from my side. Previously, I had suggested regressing the model outputs against the fMRI signal. The authors have argued that this likely can't be done in a meaningful way, as the model outputs a 'generic average behaviour', which does not account for the individual subjects' variation in when they switch their decision strategy. I would fully agree with this. But then I read in response to a comment by reviewer #2 that the RNN was provided with inputs representing recent rewards, and that these were experimenter-computed. This made me wonder whether it would not be possible to provide subject-specific value estimates to the RNN. Could one not fit a simple RL model to each subject's behaviour? This way the inputs to the RNN would take into consideration the individual sequence of choices made, and outcomes experienced, by the participants. Please consider this merely a suggestion. If this could work the way I think, then maybe it might be worthwhile trying it. But to be clear: I think the paper is strong and comprehensive the way it is now; it does not require this extra analysis. Congrats on this exciting work.
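
Editorial illustration of the suggestion above: one simple way to obtain subject-specific value estimates is to fit a Rescorla-Wagner (delta-rule) learner with a softmax choice rule to each participant's sequence of choices and outcomes. The sketch below is only an assumption about what such a fit could look like; it is not the authors' method (the article derives prior beliefs from a Hierarchical Gaussian Filter), and the function names, the two-option coding, the initial value of 0.5, and the parameter grid are all hypothetical.

```python
import math

def rw_values(choices, rewards, alpha, n_options=2, v0=0.5):
    """Return the value estimates held before each trial (delta-rule learner)."""
    values = [v0] * n_options  # hypothetical neutral starting values
    history = []
    for choice, reward in zip(choices, rewards):
        history.append(list(values))  # values available when choosing
        values[choice] += alpha * (reward - values[choice])  # prediction-error update
    return history

def neg_log_likelihood(choices, rewards, alpha, beta):
    """Negative log-likelihood of the observed choices under a softmax rule."""
    nll = 0.0
    for values, choice in zip(rw_values(choices, rewards, alpha), choices):
        logits = [beta * v for v in values]
        m = max(logits)  # subtract the max for numerical stability
        log_norm = m + math.log(sum(math.exp(x - m) for x in logits))
        nll -= logits[choice] - log_norm
    return nll

def fit_subject(choices, rewards):
    """Coarse grid search over learning rate and inverse temperature (assumed grid)."""
    alphas = [i / 20 for i in range(1, 20)]
    betas = [0.5, 1, 2, 4, 8, 16]
    return min(((a, b) for a in alphas for b in betas),
               key=lambda p: neg_log_likelihood(choices, rewards, *p))

# Toy sequence for one hypothetical subject: 0/1 code the chosen option,
# rewards are 1 (rewarded) or 0 (not rewarded).
choices = [0, 0, 1, 0, 0, 1, 1, 1]
rewards = [1, 1, 0, 1, 0, 1, 1, 1]
alpha, beta = fit_subject(choices, rewards)
trial_values = rw_values(choices, rewards, alpha)  # candidate per-trial model inputs
```

Under these assumptions, `trial_values` holds one value vector per trial that reflects the individual's own choice and outcome history, which is the kind of input the suggestion has in mind. A grid search is used only to keep the sketch dependency-free; a gradient-based or hierarchical fit would be the more usual choice in practice.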

**********

Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code —e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Antoine Collomb-Clerc

Reviewer #2: Yes: Mathieu Wolff

Reviewer #3: No

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1010500.r004

Acceptance letter

Daniele Marinazzo, Stefano Palminteri

6 Sep 2022

PCOMPBIOL-D-22-00338R1

Thalamic regulation of frontal interactions in human cognitive flexibility

Dear Dr Hummos,

I am pleased to inform you that your manuscript has been formally accepted for publication in PLOS Computational Biology. Your manuscript is now with our production department and you will be notified of the publication date in due course.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.

Soon after your final files are uploaded, unless you have opted out, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

Thank you again for supporting PLOS Computational Biology and open-access publishing. We are looking forward to publishing your work!

With kind regards,

Zsofia Freund

PLOS Computational Biology | Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom ploscompbiol@plos.org | Phone +44 (0) 1223-442824 | ploscompbiol.org | @PLOSCompBiol

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Table. Brain regions related to the Strategy Switching (Switching > Staying) (p < 0.001, uncorrected).

    (DOCX)

    S2 Table. Brain regions correlating with the prior belief in Switching and Staying.

    (p < 0.001, uncorrected).

    (DOCX)

    S1 Fig. Value inputs improve model behavioral flexibility and correlate with fMRI activity in human vmPFC.

    We compared the model with and without value input (+vmPFC and -vmPFC, respectively), and with and without output from MD (+MD and -MD, respectively). A. The model with no value inputs shows little behavioral flexibility, with significant dips in performance at block changepoints. B. Correlation of trial-averaged MD activity with the ground-truth current context for each trial. Only the model with both value inputs and intact MD output to dlPFC showed appropriate encoding of context in MD. C. The weights from dlPFC neurons to output showed significant learning and unlearning in the model without value input. Adding value inputs leads to more coherent weight changes across blocks, and adding MD further reduces destructive learning and unlearning. D. We considered the distribution of output weights at the end of the experiment and used its variance as a measure of dispersion. Weights that were learned and then unlearned across blocks remained close to zero with low variance. E. Comparison of prior belief–related fMRI activity in vmPFC between Switching and Staying in human participants. Prior belief about the outcome value, derived from a Hierarchical Gaussian Filter model (details in previously published papers [32,37]), correlated with activity in vmPFC for both the Staying and Switching strategies. The results are projected on axial MRI brain slices. Brain activations are displayed at p < 0.001 (uncorrected, red). For other regions see S2 Table. F. The prior belief-related activity (beta values) in vmPFC extracted from Staying and Switching was compared with a paired-sample t-test. The result showed no significant difference between Staying and Switching (p = 0.30). Error bars depict the standard error.

    (TIF)

    S2 Fig. Responses of output neurons and MD neurons.

    A. Responses of the ‘Down’ output neuron during a trial with the input sensory cue ‘Up’ or ‘Down’, first in a match context (left), then in a non-match context (middle), and trial-averaged activity of the same output neuron over the first three blocks of the experiment (right). With readout weight learning over the first three blocks of the experiment, output neuron activity for correct responses separates and correctly reads out the target output from dlPFC activity. B. Same as in A but for the ‘Up’ output neuron. C. Behavioral responses of the model across an experiment starting with 90% and 10% blocks, and then including 70% and 50% association-level blocks. D. Responses of one of the MD neurons, showing some increased responses in the opposite context when the association level is less predictive. E. Weights from dlPFC neurons to one MD neuron, averaged over the 5 neurons with the highest eligibility trace in block 1 (blue) or the 5 lowest (orange), showing opposing weight dynamics across blocks.

    (TIF)

    S3 Fig. Changes in representation and behavior in the MD lesioned model.

    A. Histogram of correlation values (pseudo R-squared) between individual neuronal activity and input cue, with trials drawn from a match context on the x-axis and a non-match context on the y-axis (see Methods, ‘Finding cue-responsive neurons with logistic regression’). The histograms for the MD-intact model (left) and the MD-lesioned model (middle) were subtracted to highlight the differences (right), showing mainly fewer cells that correlated with the input cue in both contexts.

    (TIF)

    S4 Fig. Multiplicative thalamocortical projections separate context representations without obfuscating sensory cue representations.

    Experiments with alternating blocks in which 10% and 90% of match trials were rewarded, with no vmPFC inputs, but with one MD neuron artificially activated for each type of block and Hebbian learning at the corticothalamic projections disabled. We tested the model with multiplicative or additive thalamocortical projections separately and parametrized the strength of either projection by multiplying its values by a factor from 1 to 40. A. Model performance for selected strengths of additive projections and B. multiplicative projections. C. Increasing the strength of additive projections initially improves performance and behavioral flexibility at block changepoints, but performance rapidly peaks and declines. D. Increasing the strength of multiplicative projections consistently increases performance until it steadily reaches a 0.9 ratio correct, with minimal dips at block changepoints. E. Increasing additive projection strength decreased neural activity correlation in dlPFC for match vs non-match contexts, but also rapidly increased correlation between up and down trials until they became highly correlated and presumably difficult to decode. F. Increasing multiplicative projection strength decreases neural activity correlation in dlPFC between contexts, with a limited increase in correlation between cues.

    (TIF)

    S5 Fig. Causal pattern space of DCM for fMRI data.

    A. Illustration of the model space for Bayesian model selection. We specified six models to determine whether the feedforward, feedback or both connections between OFC were modulated by the strategy switching. We constrained the space to models with assumed dlPFC to MD reciprocal connections, as the role of these connections has been demonstrated in animal studies [21]. Bayesian model selection revealed that among the models with the tactile input directed to dlPFC, model 3 was superior to the other 5 models. B. The log-evidence and posterior probability for each of the 6 models.

    (TIF)

    Attachment

    Submitted filename: Response to reviewers.docx

    Data Availability Statement

    Code made publicly available at <Github.com/hummosa/MD-reservoir>. Behavioral data and group-level fMRI data available at <https://ruhr-uni-bochum.sciebo.de/s/xKBPyW7ZGLs2q2g>.


    Articles from PLoS Computational Biology are provided here courtesy of PLOS
