Abstract
The brain must maintain a stable world model while rapidly adapting to the environment, but the underlying mechanisms are not known. Here, we posit that cortico-cerebellar loops play a key role in this process. We introduce a computational model of cerebellar networks that learn to drive cortical networks with task-outcome predictions. First, using sensorimotor tasks, we show that cerebellar feedback in the presence of stable cortical networks is sufficient for rapid task acquisition and switching. Next, we demonstrate that, when trained in working memory tasks, the cerebellum can also underlie the maintenance of cognitive-specific dynamics in the cortex, explaining a range of optogenetic and behavioural observations. Finally, using our model, we introduce a systems consolidation theory in which task information is gradually transferred from the cerebellum to the cortex. In summary, our findings suggest that cortico-cerebellar loops are an important component of task acquisition, switching, and consolidation in the brain.
Subject terms: Network models, Learning algorithms
How the brain maintains a stable world model while swiftly adapting to environmental changes remains unclear. Here, the authors propose that the cerebellum drives cortical dynamics, enabling rapid task acquisition, switching, and consolidation.
Introduction
Learning to interact with the environment requires the ongoing integration of rapidly changing sensory cues with future behavioral outcomes. Growing evidence suggests that cortical dynamics integrate the task-specific information that is needed for such sensory-behavioural transformations1–5. One dominant view in the field assumes that cortical networks are themselves learnt or optimised, leading to the rich dynamics required for task performance6–8. However, to help ensure a stable representation of the world, cortical plasticity must be kept under control and relatively weak9–12. This raises the question of how the brain can quickly acquire new task-specific dynamics in the presence of relatively fixed cortical connectivity.
One possible solution is to consider feedback loops that drive cortical dynamics13. Computational studies have extended recurrent neural network (RNN) models of cortical networks (Fig. 1a) to incorporate feedback loops for task acquisition. One type of feedback loop drives RNN dynamics by projecting the readout back to the RNN14–16 (Fig. 1b). Building on this line of work, two recent theoretical studies have suggested that thalamo-cortical feedback can both prepare and control RNN dynamics to achieve flexible motor sequencing17,18. All of these studies assume that connectivity within the RNN itself remains fixed, thereby avoiding complex learning rules while being able to reuse RNN dynamics for different contexts19. However, these approaches either assume relatively simple feedback (i.e., a linear combination of RNN activity) or rely on theoretically optimal, but biologically implausible, derivations of the feedback signal. In particular, the possible role of more powerful, highly adaptable brain regions is often overlooked.
Fig. 1. Schematic of cortical recurrent networks with different types of feedback.
a Model variant with no feedback: temporal external input ($x_t$) is fed to a cortical RNN (grey) and a linear readout layer (blue) produces the final model output ($z_t$). b Model variant with readout-only feedback: in this scheme there is a feedback loop in which the RNN also receives readout predictions as extra input14,16. c Model variant with cerebellar feedback: a copy of RNN activity ($h_t$) is sent to a (feedforward) cerebellar network, which feeds its own cerebellar predictions ($c_t$) back to the cortical network. d A key property of our cerebellar network is that it learns via behavioural timing-specific learning rules, in line with experimental observations43. In this learning rule the error between the cerebellar prediction c and future behavioural outcomes y (150 ms ahead) triggers plasticity via climbing fibres at the parallel fibre input of Purkinje cells.
Here we focus on the feedback loop between two key brain regions, the cortex and the cerebellum. The cerebellum is a highly plastic system and is well placed to drive cortical dynamics via a set of stereotypical, but functionally separable cortico-cerebellar loops20,21. Indeed, an ever-growing array of clinical22, functional imaging23,24, and optogenetic25–28 studies support an important cerebellar contribution to cortical activity in both motor and non-motor domains. Recently, two hypotheses on the computational role of cortico-cerebellar loops have been put forward29–32. The first asserts that the cerebellum reinforces cortical-dependent goal-directed behaviour by appropriately steering or stabilising cortical states in real time29,30. The second also casts the cerebellum as a facilitator of goal-directed cortical transitions, but one that acts indirectly, via teaching signals which lead to cortical plasticity31,32. Whilst these two views may co-exist, it is the former that is well placed to operate under weakly plastic cortical networks. Moreover, the cerebellum acting as an instantaneous driver of cortical dynamics is in line with the fast activity-dependent cortico-cerebellar interactions that have been observed experimentally25,27,28.
Here, we put forward a computational framework in which the cerebellum learns to rapidly steer and stabilise task-dependent cortical dynamics. We test this model on a variety of motor and non-motor tasks, proposing that the cerebellum is optimised to support task acquisition in the cortex. This reduces the burden of learning in cortical networks and allows a given cortical area to rapidly switch between different tasks. In line with this, we show that a strong cortical dependence on cerebellar feedback arises after learning, consistent with recent behavioural and optogenetic experiments. Finally, we use this model to put forward a cerebellar-to-cortical systems consolidation theory, in which quickly learnt task-specific information encoded by the cerebellum is gradually transferred to the cortex. Overall, we introduce a computationally and experimentally supported theory for cerebellar-supported task acquisition, switching and consolidation in the brain.
Results
A computational model of cerebellar-driven cortical dynamics for task acquisition
To study the role that cerebellar feedback can have in driving cortical dynamics during task acquisition, we explore different variants of cortical RNNs: without feedback (Fig. 1a), with readout feedback (Fig. 1b)14,16 and with feedback provided by a cortico-cerebellar loop (Fig. 1c). We introduce a model of cortico-cerebellar loops, in which a cortical RNN is reciprocally connected to a feedforward cerebellar network. In our model, temporal RNN representations $h_t$ are passed to the cerebellar network to compute task-specific predictions $c_t$, which are then sent back to the same cortical RNN. The final model output $z_t$ is then a linear readout of the RNN activity:
$$h_t = \alpha\, h_{t-1} + (1-\alpha)\, f\left(W^{\mathrm{rec}} h_{t-1} + W^{\mathrm{in}} x_t + W^{\mathrm{cb}} c_t\right), \qquad z_t = W^{\mathrm{rdt}} h_t \qquad (1)$$
where α denotes the cortical internal memory (or leak) of the RNN neurons, f(·) is the cortical activation function (see Methods), $W^{\mathrm{rec}}$, $W^{\mathrm{in}}$ and $W^{\mathrm{cb}}$ are the recurrent, input, and cerebellar weights onto the RNN respectively, and $W^{\mathrm{rdt}}$ are the readout weights (see Fig. S1 for a detailed schematic). For computational efficiency, and because the tasks are relatively long, we train our model using a discrete approximation of a continuous RNN (see Methods). To highlight the need for optimised network connectivity rather than inherent cortical memory mechanisms, in our experiments we generally focus on a small α = 0.1 (see Methods).
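A minimal sketch of one step of the cortical update, assuming Eq. (1) has the standard discrete leaky form h_t = α h_{t−1} + (1 − α) f(W_rec h_{t−1} + W_in x_t + W_cb c_t) with readout z_t = W_rdt h_t. The choice f = tanh is an assumption (the activation is specified in Methods), and the names `rnn_step`, `W_rec`, `W_in` and `W_cb` are hypothetical, not the authors' code. Matrices are plain lists of rows so that no third-party packages are needed.

```python
import math


def matvec(W, v):
    """Multiply a matrix stored as a list of rows by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rnn_step(h_prev, x_t, c_t, W_rec, W_in, W_cb, alpha, f=math.tanh):
    """Assumed leaky update: h_t = alpha*h_{t-1} + (1-alpha)*f(W_rec h_{t-1} + W_in x_t + W_cb c_t)."""
    drive = [
        r + i + c
        for r, i, c in zip(matvec(W_rec, h_prev), matvec(W_in, x_t), matvec(W_cb, c_t))
    ]
    return [alpha * h + (1.0 - alpha) * f(d) for h, d in zip(h_prev, drive)]


def readout(h_t, W_rdt):
    """Linear readout z_t = W_rdt h_t."""
    return matvec(W_rdt, h_t)


# Usage: with all weights at zero the drive vanishes, so the state only decays by alpha.
zeros2x2 = [[0.0, 0.0], [0.0, 0.0]]
zeros2x1 = [[0.0], [0.0]]
h1 = rnn_step([1.0, 2.0], [0.0], [0.0], zeros2x2, zeros2x1, zeros2x1, alpha=0.1)
print(h1)  # [0.1, 0.2]
```

Because tanh(0) = 0, a zero drive leaves only the α h_{t−1} term, which is why a small α leaves the network with little intrinsic memory of past inputs.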
The cerebellar feedback $c_t$ is a feedforward computation on the previous RNN activity:
$$c_t = W^{\mathrm{PF}}\, \mathrm{ReLU}\left(W^{\mathrm{MF}} h_{t-1}\right) \qquad (2)$$
where $W^{\mathrm{MF}}$ represents the cerebellar (input) mossy fibre (MF) weights onto granule cells (GC) and $W^{\mathrm{PF}}$ the parallel fibre (PF) weights from GCs to Purkinje cells (PC), here representing the output. Together, these constitute the main stages of processing in the cerebellum33–35. In general we model the MF projection $W^{\mathrm{MF}}$ as highly divergent, with an input/output ratio of 1:20 (see Methods), and the GC layer as a rectified linear function (ReLU), in line with the large number of cerebellar GCs and their responses36,37. As we demonstrate in our results, and consistent with prior work, the dimensionality expansion and non-linearity at the GC layer enable better representations during learning.
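The cerebellar module of Eq. (2) can be sketched as follows, assuming c_t = W_PF ReLU(W_MF h_{t−1}) with the 1:20 MF-to-GC expansion stated above. The Gaussian initialisation of the fixed MF weights (standard deviation 1/√n_rnn) and the zero initialisation of the plastic PF weights are illustrative assumptions; `make_cerebellum` and `cerebellum_forward` are hypothetical helper names.

```python
import math
import random


def matvec(W, v):
    """Multiply a matrix stored as a list of rows by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def make_cerebellum(n_rnn, n_out, expansion=20, seed=0):
    """Fixed, divergent MF weights (n_rnn -> expansion*n_rnn GCs) and plastic PF weights."""
    rng = random.Random(seed)
    n_gc = expansion * n_rnn
    scale = 1.0 / math.sqrt(n_rnn)
    W_MF = [[rng.gauss(0.0, scale) for _ in range(n_rnn)] for _ in range(n_gc)]
    W_PF = [[0.0] * n_gc for _ in range(n_out)]  # learnt later from climbing-fibre errors
    return W_MF, W_PF


def cerebellum_forward(h_prev, W_MF, W_PF):
    """c_t = W_PF ReLU(W_MF h_{t-1}); GC activity is also returned for use in plasticity."""
    gc = [max(0.0, a) for a in matvec(W_MF, h_prev)]
    return matvec(W_PF, gc), gc


W_MF, W_PF = make_cerebellum(n_rnn=8, n_out=2)
c_t, gc = cerebellum_forward([0.5] * 8, W_MF, W_PF)
print(len(gc), c_t)  # 160 [0.0, 0.0]
```

Only `W_PF` is meant to change during learning; `W_MF` stays fixed, so the 160 GCs act as a random non-linear expansion of the 8 cortical units.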
We use biologically plausible gradient descent38 to optimise cortical weights during the acquisition of a given task (Eq. (1)). In particular, we minimise the temporal error $\sum_t \mathcal{L}(z_t, y_t)$, where $y_t$ denotes the desired task outcome at time t and $\mathcal{L}$ is the task error function (see Methods). These weights can all be optimised simultaneously during learning; we refer to this case as fully plastic. However, a key idea that we put forward in this study is that it is not the neocortex, but in fact the cerebellum, which acts as a key driver for task acquisition. For this reason we highlight the case in which RNN plasticity is constrained. In particular, we focus on conditions in which RNN plasticity is either absent (the fixed RNN case) or strictly limited to its input synapses, i.e., only the input and cerebello-cortical weights in Eq. (1) are plastic (the input plastic case). The latter case considers plasticity at both sensory and cerebello-cortical inputs during task acquisition, in line with experimental observations showing plasticity at cerebellar pathways to the cortex39,40.
In contrast to cortical learning, the cerebellum is always optimised, through a separate but related cerebellar error that compares the cerebellar output $c_{t-\tau}$ with the desired outcome $y_t$. In line with classical models of the cerebellum33 we assume that learning occurs at the parallel fibres $W^{\mathrm{PF}}$, mediated by climbing fibre error signals, whilst mossy fibre inputs $W^{\mathrm{MF}}$ remain fixed. Like the cortical prediction error, the cerebellar error function depends on the desired task outcome y. However, as we will see later, it is advantageous for the cerebellum to provide predictions of future outcomes. To enable this we formulate a temporal cerebellar learning rule, which we term a behavioural timing-specific learning rule, in which the cerebellum learns by comparing its own output from a predefined time window τ earlier with current desired outcomes (Fig. 1d). This learning rule predicts the need for temporally precise coordination between parallel fibre inputs and subsequent climbing fibre error signals to achieve plasticity, in line with experimental findings41–46. It thus enables the cerebellum to predict future outcomes, i.e., $c_t \approx y_{t+\tau}$. For our motor-based tasks we generally consider a cerebellar time window of τ ≈ 150 ms43, and for the later cognitive tasks we use longer windows of τ ≈ 600 ms (see Methods).
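One way to read the behavioural timing-specific rule is as a delta rule at PF synapses: at time t the climbing-fibre error y_t − c_{t−τ} is paired with the GC activity that produced c_{t−τ}, and MF weights are left untouched. The squared-error form, the learning rate and the expression of τ in simulation steps are assumptions, and the one-hot "time-locked" GC code in the hypothetical `demo` function is a toy used only to show that c_s converges to y_{s+τ}.

```python
def pf_update(W_PF, gc_hist, c_hist, y_t, t, tau_steps, lr):
    """Delta rule on PF weights: score the prediction made tau steps ago against the current target."""
    s = t - tau_steps
    if s < 0:
        return W_PF  # no prediction old enough to be scored yet
    err = [y - c for y, c in zip(y_t, c_hist[s])]  # climbing-fibre error signal
    gc = gc_hist[s]  # GC (parallel fibre) activity that produced c_{t-tau}
    return [[w + lr * e * g for w, g in zip(row, gc)] for row, e in zip(W_PF, err)]


def demo(ys, tau_steps=2, epochs=200, lr=0.5):
    """Toy run with one-hot GC activity per timestep; returns learnt predictions c_s for every s."""
    T = len(ys)
    gc_hist = [[1.0 if j == s else 0.0 for j in range(T)] for s in range(T)]
    W_PF = [[0.0] * T]  # a single Purkinje output
    for _ in range(epochs):
        # Forward pass: c_s = W_PF gc_s with the current weights
        c_hist = [[sum(w * g for w, g in zip(W_PF[0], gc))] for gc in gc_hist]
        for t in range(T):
            W_PF = pf_update(W_PF, gc_hist, c_hist, [ys[t]], t, tau_steps, lr)
    return [W_PF[0][s] for s in range(T)]


preds = demo([0.0, 1.0, 2.0, 3.0, 4.0])
print([round(p, 6) for p in preds[:3]])  # [2.0, 3.0, 4.0], i.e. c_s ~ y_{s+2}
```

Because each target only updates the synapses that were active τ steps earlier, the learnt output is shifted forward in time by τ, which is the sense in which c_t ≈ y_{t+τ}.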
Cerebellum learns to drive cortical dynamics during a line drawing task
To study the functional consequences of cortico-cerebellar loops, we first test the model in a motor-based line drawing task. In this task the model receives one of six cues at the beginning of the task and learns to either remain still or produce one of five possible straight lines (Fig. 2a; see Methods). Feedback on the desired outcome (i.e., the target line) is provided at each timestep. Consistent with behavioural studies of cerebellar patients47, we find that cerebellar feedback significantly improves both learning of the task and final performance (Fig. 2a, b). The ability of cerebellar feedback to facilitate learning does not depend on the degree of plasticity and internal memory in the cortical RNN (Fig. 2c). Interestingly, a fixed RNN with a plastic cerebellum achieves the same learning performance as a fully plastic or input plastic RNN. In contrast, when no feedback or a simple readout feedback is provided, the network can fail to learn the task due to the leaky properties of RNNs (Fig. 2b, c). Classical cerebellar models posit that the cerebellum can act as a direct controller of motor tasks33. To contrast this view with our model, we also train an RNN with a direct cerebellar readout which, apart from the cortico-cerebellar feedback weights, uses the same free network parameters, and find it insufficient to learn the task (Figs. S1 and S2).
Fig. 2. Cerebellum learns to drive cortical dynamics during a line drawing task.
a Given one of six possible stimuli at the first timestep the model must learn to draw a corresponding line (dotted black line) or remain still. Model output after training is shown for three model architectures with a fixed RNN. b Learning curves of models in a (same colour-coding). MSE denotes mean squared error. c Average training error across different levels of RNN internal memory (α) and plasticity (fixed RNN, input plastic and fully plastic) for the no feedback and cerebellar feedback models; arrow denotes cortical internal memory used in the other panels (α = 0.1). d Average training error of cortico-cerebellar model under varying numbers of granule cells and cerebellar temporal windows (τ). Orange arrow denotes default parameter choices. e Prediction error between cortical output and itself (grey) or cortical output and cerebellar output (orange) for different temporal delays. f Evolution of first (upper panel) and second (lower panel) principal components of cortical RNN for different stimuli, colour-coded as in a, using small (τ = 0 ms) and large (τ = 250 ms) cerebellar time windows. g Variance across cues from both first and second PCs (cf. f) for different cerebellar temporal windows, τ. h Model output for different periods of cerebellar ablation (blue box represents period of ablation). i Output x and y coordinates of the lines drawn in (h). j Average model error across all inputs for ablation periods in (h, i). k Average error for different degrees of plasticity and ablation periods (left to right) as in (h–j). l Average change in task error for models with versus without cerebellar feedback during (black) and after (blue) training for different degrees of cortical plasticity. All results are averaged over 5 different initial conditions. Error bars represent standard error of the mean.
Next, we study how two known cerebellar features, (i) a large number of granule cells and (ii) behavioural timing-specific plasticity rules, contribute to task proficiency. We find that combining a high number of granule cells with a learning rule with a non-zero temporal horizon τ results in better cerebellar learning (Fig. S3), which in turn drives better cortical representations and overall task performance (Fig. 2d and Figs. S3, S4). Moreover, because both the cortical RNN readout and the cerebellar network are trained on the same desired outcome, we observe that cerebellar output effectively predicts the cortical readout τ ms ahead (Fig. 2e). Our model thus provides a theory of how the cerebellum learns to predict upcoming movements48,49.
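The comparison in Fig. 2e amounts to measuring the error between the cerebellar output c_t and the cortical readout z_{t+d} as a function of the delay d. A minimal sketch for scalar signals follows; the helper names and the synthetic signals (c constructed as z advanced by 3 steps) are illustrative, not model output.

```python
def lagged_mse(c, z, lag):
    """Mean squared error between c_t and z_{t+lag} over all valid t (scalar signals)."""
    pairs = [(c[t], z[t + lag]) for t in range(len(c) - lag)]
    return sum((a - b) ** 2 for a, b in pairs) / len(pairs)


def best_lag(c, z, max_lag):
    """Delay (in steps) at which c best predicts z."""
    return min(range(max_lag + 1), key=lambda d: lagged_mse(c, z, d))


# Synthetic example: c is z advanced by 3 steps, i.e. c_t = z_{t+3}.
z = [float(t * t) for t in range(20)]
c = [z[min(t + 3, 19)] for t in range(20)]
print(best_lag(c, z, max_lag=6))  # 3
```

Under the timing-specific rule, the minimum of this curve would be expected near d = τ expressed in simulation steps.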
The advantage of a large number of granule cells has been well studied and is likely due to better linear separability of their inputs50. However, what are the computational advantages of the cerebellum providing the cortical RNN with expected future outcomes? Due to RNN leakiness, sensory cues are rapidly forgotten. A large cerebellar τ therefore gives the cerebellar network the ability to map RNN activity to desired outcomes early in the task. Consistent with this, we find that the predictive cerebellar output drives outcome-dependent RNN representations (Fig. 2f, g). This potent initial drive of cortical activity could account for the observed role of the cerebellum in movement initiation51,52.
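The effect of leakiness can be made concrete with a worked example under simplifying assumptions that go beyond the text: Eq. (1) taken in the leaky form in which α scales the previous state, an activation with f(0) = 0 (e.g., tanh), and zero total drive to the RNN after a cue at t = 0 (no recurrent, sensory or cerebellar input).

```latex
h_t = \alpha\, h_{t-1} + (1-\alpha)\, f(0) = \alpha\, h_{t-1}
\quad\Longrightarrow\quad h_k = \alpha^{k}\, h_0 ,
\qquad \alpha = 0.1 \;\Rightarrow\; h_3 = 10^{-3}\, h_0 .
```

In this limit a cue's trace shrinks a thousandfold within three steps, so information that must persist has to be carried by the drive terms, which is the role a predictive cerebellar input can play.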
Finally, to directly examine the role of cerebellar feedback on cortical dynamics, we inhibit (or “ablate”) cerebellar output (i.e., $c_t = 0$ in Eq. (1)) during different stages of the task. In each case we observe a significant impairment in model output, which returns to baseline after the ablation period (Fig. 2h–j). Moreover, this effect is most detrimental to task performance when ablation occurs at the start (Fig. 2k). These findings are consistent with the observed freezing effect of cerebellar lesions on gait53. In line with both cortical and cerebellar networks working jointly to perform the task, we find that when the RNN is fully plastic, cerebellar ablations have a significant but reduced impact on the cortical dynamics (Fig. 2k, l and Fig. S5). This impact is further reduced when training the cortical RNN with more powerful artificial learning algorithms (Fig. S6), suggesting that the extent of cerebellar involvement depends not only on the presence but also on the effectiveness of cortical plasticity. We also observe that the cortical RNN is particularly sensitive to noise in the cerebellar output: adding noise to this output leads to irregular behaviour (Fig. S7), in line with the classical motor symptoms of cerebellar ataxia54.
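The ablation protocol (setting c_t = 0 over a window) can be sketched as a closed-loop rollout. This assumes the leaky form of Eq. (1) with f = tanh, c_t = W_PF ReLU(W_MF h_{t−1}) for Eq. (2), and untrained random weights, so it illustrates only the mechanics of ablation, not trained behaviour; all function names are hypothetical.

```python
import math
import random


def matvec(W, v):
    """Multiply a matrix stored as a list of rows by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def random_weights(n_in=1, n_rnn=8, n_out=2, expansion=20, seed=0):
    """Untrained random weights, for illustrating the mechanics only."""
    rng = random.Random(seed)

    def g(rows, cols, std):
        return [[rng.gauss(0.0, std) for _ in range(cols)] for _ in range(rows)]

    s = 1.0 / math.sqrt(n_rnn)
    return {"in": g(n_rnn, n_in, 1.0), "rec": g(n_rnn, n_rnn, s), "cb": g(n_rnn, n_out, 1.0),
            "mf": g(expansion * n_rnn, n_rnn, s), "pf": g(n_out, expansion * n_rnn, 0.1),
            "rdt": g(n_out, n_rnn, s)}


def rollout(xs, W, alpha=0.1, ablate=None):
    """Closed cortico-cerebellar loop; ablate=(start, stop) sets c_t = 0 for start <= t < stop."""
    h = [0.0] * len(W["rec"])
    zs, cs = [], []
    for t, x_t in enumerate(xs):
        gc = [max(0.0, a) for a in matvec(W["mf"], h)]  # GC layer driven by h_{t-1}
        c_t = matvec(W["pf"], gc)  # cerebellar prediction (Eq. (2) form)
        if ablate is not None and ablate[0] <= t < ablate[1]:
            c_t = [0.0] * len(c_t)  # ablation: silence cerebellar output
        drive = [r + i + c for r, i, c in
                 zip(matvec(W["rec"], h), matvec(W["in"], x_t), matvec(W["cb"], c_t))]
        h = [alpha * hi + (1.0 - alpha) * math.tanh(d) for hi, d in zip(h, drive)]
        zs.append(matvec(W["rdt"], h))  # readout z_t
        cs.append(c_t)
    return zs, cs


W = random_weights()
xs = [[1.0]] + [[0.0]] * 9  # cue at t = 0 only
z_ctrl, _ = rollout(xs, W)
z_abl, c_abl = rollout(xs, W, ablate=(3, 6))
```

Because the loop is deterministic, outputs before the ablation window match the control run exactly, while outputs from the window onward diverge; in the trained model, the corresponding deviation from the target is what Fig. 2j reports.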
Taken together, this motor-based task highlights the computational benefits of training a cerebellar network to drive cortical dynamics, predicting that the cortex can critically depend on cerebellar feedback for successful task execution. Furthermore, we demonstrate that cerebellar plasticity can effectively replace the need for local cortical plasticity.
Cerebellar-mediated task switching in cortical networks
We have shown that cortico-cerebellar loops can enable successful task learning with minimal cortical plasticity. This opens the possibility of reusing cortical networks across different contexts and behaviours.
To demonstrate the model’s ability to adapt and perform context-dependent task switching, we consider how models trained in the line-drawing task can be retrained to a curl-field variant55. In particular, we analyse how the cerebellar network can (i) successfully enable learning in a new task context and also (ii) rapidly revert, or switch, to a previously learned context.
As expected, when the new task context is introduced to the model, there is a steep increase in error before the model successfully learns the new task (Fig. 3a, left and middle). Notably, however, when the original task is reintroduced, the fixed RNN model recovers the initial dynamics significantly faster than the fully plastic model and more faithfully captures the behavioural data from macaque monkeys55 (Fig. 3a, right). This relatively slow switching back suggests that the fully plastic RNN is more prone to forgetting the original task9.
Fig. 3. Context-dependent cerebellar feedback can enable multi-task learning and switching in the cortex.
a Training error of cortico-cerebellar models originally trained for line drawing (cf. Fig. 2; α = 0.5). The models continue to execute the line-drawing task (left) before being trained on a novel curl-field variant of the task (middle) and then finally switch back to the original task (right). Data from behavioural experiments in macaque monkeys is reproduced here for comparison (bottom; ref. 55). b Average training error across different levels of parallel fibre (PF) task overlap for the different tasks for the fixed RNN (top) and fully plastic (bottom) models. Task periods colour-coded as in a. Arrows denote degree of PF task overlap used in (a, c–f). c Model output for each of the three training periods defined in a for the zero-overlap condition; “zero-shot” output corresponds to the model output in the first trial when task 1 is reintroduced. d Model retention score for task 1. The retention score is computed as the error of task 1 during baseline over the error at the first trial after switching back to task 1. e, f Change in (e) activity and (f) covariance in the RNN population between task 1 (baseline) and after learning task 2. Mean changes in experimental data in f are reproduced (see Methods) from neuronal recordings obtained from premotor (PMd) and primary motor (M1) cortices in macaque monkeys55. All results are averaged over 5 different initial conditions. Error bars represent standard error of the mean.
We then asked how the cerebellar network might enable even faster task switching. In line with observed context-dependent activations56,57 and plasticity rules58 in the cerebellum, we consider cerebellar PFs which are task-specific. The extent of task-specificity at PFs is modelled by the PF task overlap; full overlap (100%) would imply that the same exact PFs are used across task contexts, while zero overlap (0%) implies that a completely different set of PFs is used for each task respectively.
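The text does not spell out how overlapping PF sets are drawn, so the sketch below uses one simple assumed scheme: each task gets the same number of PFs, a fraction `overlap` of which are shared by all tasks and the rest reserved for that task. GC activity is gated by the current task's mask before reaching Purkinje cells; `pf_task_masks` and `gate` are hypothetical names.

```python
def pf_task_masks(n_pf, n_tasks, overlap):
    """One binary mask per task; a fraction `overlap` of each task's PFs is shared by all tasks."""
    if not 0.0 <= overlap <= 1.0:
        raise ValueError("overlap must lie in [0, 1]")
    per_task = n_pf // n_tasks
    n_shared = round(overlap * per_task)
    n_own = per_task - n_shared
    masks = []
    for k in range(n_tasks):
        active = set(range(n_shared))  # PFs common to every task
        start = n_shared + k * n_own
        active.update(range(start, start + n_own))  # PFs reserved for task k
        masks.append([1.0 if i in active else 0.0 for i in range(n_pf)])
    return masks


def gate(gc, mask):
    """Only PFs in the current task's mask transmit GC activity to Purkinje cells."""
    return [g * m for g, m in zip(gc, mask)]


m0, m1 = pf_task_masks(n_pf=10, n_tasks=2, overlap=0.0)
shared = sum(a * b for a, b in zip(m0, m1))
print(sum(m0), sum(m1), shared)  # 5.0 5.0 0.0
```

Under a delta rule such as the timing-specific PF update, each weight change is proportional to presynaptic GC activity, so gated-off PFs receive no update. With zero overlap, learning the second task therefore leaves the first task's PF weights untouched, which is one way to obtain a zero-shot switch; with full overlap both tasks write to the same weights.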
Our results show that the degree of PF task overlap predicts a tradeoff between the speed of learning the new task and the ability to rapidly switch back to the original task (Fig. 3b). Specifically, whilst maximal PF task overlap is beneficial when a new task is introduced, rapid switching is favoured when distinct PFs are used. To highlight the ability to immediately switch back to the original task (zero-shot switch), we focus on the zero-overlap case. For the fixed RNN, but not the fully plastic RNN, the model achieves near-perfect switching to the original task (Fig. 3c, d). Consistent with the need to learn a new task, all models show a substantial change in neuronal activity (Fig. 3e and Fig. S8a). However, we expect that models with minimal local cortical plasticity should show minimal changes in the underlying dynamics of both tasks. To test this, we measure changes in the covariance of the neuronal activity between the new task and the initial task (see Methods and ref. 59). As predicted, only the models with reduced cortical plasticity show the minimal changes observed experimentally (Fig. 3f and Fig. S8b). In contrast, for the fully plastic model the dynamics acquired after switching back to the initial task differ significantly from baseline (Fig. S8c, d). This suggests that the fully plastic model learns a new solution to the initial task, explaining its relative slowness in switching.
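The two quantities used in this comparison can be sketched as follows. The retention score follows the definition in the Fig. 3 caption (baseline task-1 error over the error on the first trial after switching back), taking the mean over baseline trials as an assumption. The covariance-change measure here, a relative Frobenius-norm difference between neuron-by-neuron covariance matrices, is a stand-in chosen for illustration and is not necessarily the measure of ref. 59 used in Methods.

```python
import math


def retention_score(baseline_errors, first_trial_error):
    """Task-1 baseline error (mean, assumed) divided by the error on the first trial after switching back."""
    baseline = sum(baseline_errors) / len(baseline_errors)
    return baseline / first_trial_error


def covariance(X):
    """Neuron-by-neuron sample covariance of activity X (rows = samples, cols = neurons)."""
    n, d = len(X), len(X[0])
    mu = [sum(row[j] for row in X) / n for j in range(d)]
    return [[sum((row[i] - mu[i]) * (row[j] - mu[j]) for row in X) / (n - 1)
             for j in range(d)] for i in range(d)]


def relative_cov_change(X_before, X_after):
    """||C_after - C_before||_F / ||C_before||_F (illustrative stand-in metric)."""
    C0, C1 = covariance(X_before), covariance(X_after)
    diff = math.sqrt(sum((a - b) ** 2 for r0, r1 in zip(C0, C1) for a, b in zip(r0, r1)))
    norm = math.sqrt(sum(a * a for r in C0 for a in r))
    return diff / norm


X = [[1.0, 2.0], [3.0, 5.0], [4.0, 4.0]]
X_shifted = [[v + 1.0 for v in row] for row in X]  # mean activity changes...
print(round(relative_cov_change(X, X_shifted), 12))  # ...but covariance does not: 0.0
print(retention_score([0.1, 0.1], 0.2))  # 0.5
```

The shifted example shows why both measures are useful: adding a constant to every neuron's activity produces a change in activity of the kind plotted in Fig. 3e while leaving the covariance, and hence this measure, unchanged.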
Overall, we apply our models to demonstrate a cerebellar-driven solution to multi-task learning and task switching. We show that the underlying dynamics preserved by a fixed cortical RNN, supported by context-dependent cerebellar feedback, can support rapid behavioural changes whilst minimising forgetting of previously acquired task knowledge.
Cerebellar temporal basis supports non-linear drawing task
Above we have modelled a case in which the cerebellum learns to drive cortical dynamics using a single predictive time window (namely τ = 150 ms). However, recent work has revealed a diversity of temporal plasticity windows at play in the cerebellum43,60 (Fig. 4a). Such diversity may enable the cerebellum to learn a temporal basis for upcoming events, which could enhance its ability to predict future outcomes.
Fig. 4. Cerebellar temporal basis supports cortical dynamics of a non-linear digit drawing task.
a Schematic of cerebellar learning with a temporal basis. We consider multiple populations of Purkinje cells with different learning time windows τ. b Model output after training for different input examples of the digit drawing task (fixed RNN; α = 0.1). c Learning curves of models in b together with readout feedback model (blue). d Average training error across different levels of RNN cortical internal memory (α) and plasticity assumptions. e Performance of cerebellar feedback for different numbers of granule cells and cerebellar time windows. Orange arrow indicates default parameter choices with a single cerebellar time window; red arrow indicates temporal basis model with multiple time-windows. f Model output under control and cerebellar ablation conditions for example inputs (digit 2 in upper panels and digit 4 in lower panels); dashed red line represents model output during and after ablation period. g Average model error across all inputs for control (left) and ablation (right) conditions. h Average error for different degrees of cortical plasticity and ablation periods (middle period illustrated in f, g). i Average change in task error for models with versus without cerebellar feedback during (black) and after (blue) training across different degrees of cortical plasticity. All results are averaged over 5 different initial conditions. Error bars represent standard error of the mean.
To demonstrate the benefit of diversity in temporal windows we consider a more realistic (and challenging) variant of the line-drawing task in which the model is now trained to produce a digit-like output (Fig. 4b; see Methods). This task is selected so as to produce a non-linear and highly varied set of future desired outcomes and therefore the need for richer cerebellar predictions. In particular, we consider a cerebellar network which simultaneously learns with a range, or “temporal basis”, of time-windows τi ∈ [0 ms, 250 ms] such that its prediction effectively spans a relatively long window of upcoming desired outcomes (see Methods).
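A temporal basis can be sketched as several Purkinje populations, each trained with the same timing-specific delta rule but with its own window τ_i spanning 0 to 250 ms. The 50 ms simulation step, the six evenly spaced windows and the one-hot GC code are hypothetical choices for illustration; reading all populations at a single step then returns a window of upcoming desired outcomes rather than a single future point.

```python
DT_MS = 50  # hypothetical simulation step
TAUS_MS = [0, 50, 100, 150, 200, 250]  # assumed basis of learning windows spanning [0, 250] ms


def train_temporal_basis(ys, taus_ms=TAUS_MS, dt_ms=DT_MS, epochs=100, lr=0.5):
    """Each Purkinje population i learns with its own window tau_i (toy one-hot GC code)."""
    T = len(ys)
    lags = [round(tau / dt_ms) for tau in taus_ms]
    # W[i][s]: weight from the GC active at step s onto population i, so c^{(i)}_s = W[i][s]
    W = [[0.0] * T for _ in lags]
    for _ in range(epochs):
        for i, k in enumerate(lags):
            for t in range(k, T):
                s = t - k  # step at which the prediction now being scored was made
                W[i][s] += lr * (ys[t] - W[i][s])  # error y_t - c^{(i)}_{t - tau_i}
    return W, lags


ys = [float(t * t) for t in range(10)]
W, lags = train_temporal_basis(ys)
# Read out all populations at step s = 2: a window of upcoming outcomes y_2 ... y_7
print([round(W[i][2], 6) for i in range(len(lags))])  # [4.0, 9.0, 16.0, 25.0, 36.0, 49.0]
```

With a single window, the cerebellar output at each step carries one future point; the basis instead provides the cortex with several future points at once, which is the kind of richer prediction a non-linear trajectory may require.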
We find that this heterogeneity of cerebellar time windows enables both faster learning and higher final performance (Fig. 4b, c and Fig. S9). As expected, for the simpler line-drawing task, having multiple time windows does not improve learning (Fig. S9c). Moreover, in line with the results above, a fixed RNN achieves performance comparable to the plastic RNN models across different degrees of internal memory in the cortical network (Fig. 4d). When comparing network performance across different numbers of granule cells and time windows, we find that higher numbers of granule cells combined with multiple time-window learning achieve the best average learning performance (Fig. 4e). Finally, as with the simpler line-drawing task, we find that cerebellar ablation is detrimental to the maintenance and development of these representations (Fig. 4f–h), in a way that depends on the degree of cortical plasticity (Fig. 4i and Fig. S10).
These results suggest that the diversity of behavioural timing-specific learning windows observed experimentally in the cerebellum43,60 improves behaviour under more challenging task conditions.
Cerebellar-driven cortical dynamics maintains beliefs in an evidence accumulation task
So far we have focused purely on motor-based tasks, but growing evidence strongly suggests that the cerebellum also plays important roles in functions that go beyond direct motor control21,61,62. To demonstrate this, we model an evidence accumulation task that has been shown to be cerebellar-dependent27. In this study, Deverett et al.27 showed that optogenetic inhibition of the cerebellar output nuclei disrupts the ability of mice to determine whether the left or right cheek received more air puffs over a period of time (Fig. 5a). Unlike in the previous tasks, here the desired outcome is only provided at the end of the task, making error-related signals highly sparse.
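For concreteness, a trial of the evidence accumulation task can be generated as below. This is a simplified, hypothetical generator rather than the protocol of Deverett et al.: each step delivers one puff (+1 right, −1 left), an assumed per-trial bias sets the probability of right puffs, an odd number of puffs avoids ties, and the desired outcome is supplied only at the final step, mirroring the sparse error signal described above.

```python
import random


def make_trial(n_puffs, rng, p_right=None):
    """Hypothetical evidence-accumulation trial: +1.0 = right puff, -1.0 = left puff."""
    if n_puffs % 2 == 0:
        raise ValueError("use an odd number of puffs so that there are no ties")
    if p_right is None:
        p_right = rng.choice([0.3, 0.7])  # assumed per-trial bias towards one side
    puffs = [1.0 if rng.random() < p_right else -1.0 for _ in range(n_puffs)]
    label = 1 if sum(puffs) > 0 else 0  # 1 = "right", 0 = "left"
    # Desired outcome is provided only at the final step (sparse supervision)
    targets = [None] * (n_puffs - 1) + [label]
    return puffs, targets


rng = random.Random(1)
puffs, targets = make_trial(11, rng)
```

Because only the last entry of `targets` is defined, any error signal for both the cortical and the cerebellar learner exists only at the end of the trial, so information about early puffs must be maintained until then.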
Fig. 5. Cortico-cerebellar model mimics mouse behaviour during evidence accumulation task.
a Schematic of evidence accumulation task27: a random sequence of non-zero inputs (“air puffs”) is delivered in the leftward (−) or rightward (+) direction. The model must integrate this input and decide at the end of the task which side received more input overall. b Learning curves of models (fixed RNN; α = 0.1) without feedback (grey), with readout feedback (blue) and with cerebellar feedback (orange). c Change in average training error of the cortico-cerebellar model with respect to the no feedback model across different levels of cortical internal memory (α) and degrees of cortical plasticity. d Model beliefs over time without (orange) and with complete cerebellar ablation (purple) in model (upper panels) and data-derived behavioural model (lower panels) reproduced from Deverett et al.27. Thin model lines represent one example seed. Belief P denotes model output probability. e Normalised regression weights at different periods of input presentation (cue) during control (upper) and ablation (lower) conditions for both model (orange line) and behavioural data (black line). f Model and data error under different ablation periods and degrees of cortical plasticity. g Average change in task error for models with versus without cerebellar feedback across different cue durations. h Average change in task error for models with versus without cerebellar feedback during and after training across different degrees of cortical plasticity. All model results are averaged over 5 different initial conditions. Error bars represent standard error of the mean.
Similar to the motor tasks studied above, cerebellar feedback improves task learning relative to models without feedback or with readout feedback (Fig. 5b). Moreover, a fixed RNN achieves performance comparable or even superior to the fully plastic models across a range of degrees of cortical internal memory (Fig. 5c and Fig. S11). These results suggest that weakly plastic cortical networks driven by the cerebellum may also be sufficient for learning cognitive-based tasks with sparse error information.
Next, our ablation analysis reveals strong similarities to the optogenetic observations by Deverett et al.27. In particular, cerebellar ablation greatly impairs the model’s capacity to maintain and develop beliefs, mirroring the behavioural effects observed experimentally (Fig. 5d and Fig. S12). Indeed, using the same behavioural regression performed by Deverett et al.27 (see Methods), we show that cerebellar ablation in later periods leads to a final choice in which information about previously seen inputs is greatly reduced (Fig. 5e), in line with experimental findings. Because more information is effectively lost, we find that ablation near the end of the task has a particularly detrimental impact on task performance, consistent with behavioural observations (Fig. 5f), and this leads to below-chance performance on “history-centric” trials, which rely more on initial inputs (Fig. S12; see Methods). These ablation results also emphasise that even though the cerebellum is trained with teaching signals close to the end of the task, cerebellar predictions prove to be valuable earlier in the task (Fig. 5f). Finally, to demonstrate that task performance also depends on cortical dynamics, we performed (partial) ablation of the cortical RNN and observed similar behavioural deficits (Fig. S13a–c).
Given that cerebellar feedback is necessary to preserve information over time and counteract leaky cortical dynamics, we predicted that the behavioural effect of cerebellar ablation would depend on the timescale of the task and would weaken for shorter task durations. Indeed, we find that the effect of ablation on performance increases with task length (Fig. 5g and Fig. S14; see Methods). As in the previous motor-based tasks, our model predicts that cerebellar feedback is particularly helpful in the presence of weak cortical plasticity (Fig. 5g, h).
Overall, our model predicts that the proper maintenance of model selectivity depends critically on cerebellar feedback during evidence accumulation. Consistent with behavioural results, these effects are emphasised when cerebellar ablation occurs in the later stages of the task.
Cerebellar feedback sustains cortical dynamics in a delayed association task
Next we aim to demonstrate that cerebellar networks can also effectively drive cortical dynamics in tasks with long delay periods, while capturing both neuronal and behavioural observations. To achieve this we model a delayed association task that was recently shown to depend on cortico-cerebellar loops25. In this study mice were presented with one of two stimuli (left or right) followed by a delay period, after which they were trained to lick in the corresponding direction (Fig. 6a, top). At the same time, neural selectivity was recorded both in the anterior lateral motor cortex (ALM), a working memory and planning region, and in the cerebellar output nuclei (Fig. 6a, bottom). Timed photoinhibition revealed that ALM selectivity depends strongly on the cerebellar output nuclei, and vice versa.
Fig. 6. Cerebellar network sustains cortical dynamics during delayed association task in line with optogenetic experiments.
a Delayed association task (top); a sensory cue is presented followed by a delay and decision period25. The cortico-cerebellar loop models the interactions between a working memory region and a cognitive module of the cerebellum (bottom). b Learning curves of model without feedback (grey), readout feedback (blue) or cerebellar feedback (orange) for models with an input plastic RNN (α = 0.1). c Change in average training error of the cortico-cerebellar model with respect to the no feedback model across different levels of cortical internal memory (α) and degrees of plasticity in the cortical RNN. d Cue selectivity during the delay period without (left) and with cerebellar ablation (right; blue area denotes period of ablation and thin line shows control) in the model (upper panels) and optogenetic experiments (lower panels) reproduced from Gao et al.25. e First decision principal component (dPC) during the delay period without (left) and with (right) cerebellar ablation in the model (top) and in optogenetic experiments (bottom)25. f Cue selectivity during the delay period with cerebellar ablation when using the fully plastic RNN (cf. d). g Model error during cerebellar ablation (input plastic RNN; control error shown with dashed-dotted line). Dotted grey line denotes chance level. h Average error from cerebellar ablation at different points during the delay period and different degrees of cortical plasticity. i Average change in task error for models with versus without cerebellar feedback during and after training across different degrees of cortical plasticity. j Model error for different numbers of cerebellar granule cells (GCs) and delay period lengths in the delayed association task (fixed RNN; α = 0.1). k Signal-to-noise ratio (SNR) of RNN activities (left y-axis) and number of GCs needed to decode the stimulus from these activities (right y-axis). Results are averaged over 5 different initial conditions. Error bars represent standard error of the mean. 
Mouse schematic in panel a used with permission from Petrucco, L. (2020). Mouse head schema. Zenodo. 10.5281/zenodo.3925903.
To model this task we follow the same protocol used experimentally25, in which one of two possible cues is presented followed by a delay period, after which the model makes a cue-based response (left or right; see Methods). Given the lack of sensory or teaching information during the delay period, it is particularly vital in this task for the cortico-cerebellar network to sustain stimulus representations. It is important to note that a standard randomly initialised RNN is unlikely to achieve this property, since memories of previous inputs naturally decay in the absence of task-induced plasticity19.
We observe that cerebellar feedback consistently enables task acquisition (Fig. S15), and identify a particularly interesting case in which plasticity in the RNN is limited strictly to its input synapses (input plastic). In this case cerebellar feedback significantly improves cortical learning to reach near-perfect performance, whilst also enabling a high degree of stability in task selectivity throughout the delay period (Fig. 6b−d and Fig. S15). We speculated that input plasticity is particularly important for this task because the cerebellum is required to sustain task-specific predictions in the RNN throughout the entire delay period. We verified this stronger cerebello-cortical drive by using concepts from control theory63. In particular, we can explicitly relate cerebello-cortical optimisation to a quantitative increase in the impact, or energy, of cerebellar feedback onto RNN activity (Fig. S16; see Methods). Moreover, the ability of the cerebellum to drive cortical task dynamics should depend on the cortical network’s intrinsic ability to provide a rich temporal representation of the task. In line with this view, our results show that (even untrained) cortical recurrent weights are important in maintaining cerebellar predictions over time (Fig. S17). Coupled with the relative failure of the model to learn with an open-loop cortico-cerebellar architecture (Figs. S1d and S15), this is consistent with a recent follow-up study of this experimental paradigm which specifically highlights the importance of conjunctive cortico-cerebellar communication for task acquisition26.
Next, to demonstrate that the cerebellum helps drive task-specific dynamics in the cortical RNN, we performed a simulated ablation in which the cerebellum is transiently removed during the delay period. Consistent with in vivo neural recordings25, we find that both cerebellar and cortical ablation drastically disrupt cortical task selectivity (Fig. 6d and Fig. S13d–f). We next show a similar effect in the model’s latent dynamics: using demixed principal component analysis64, we observe that the choice component of the RNN’s population dynamics collapses rapidly during the ablation period, consistent with neural data (Fig. 6e). As with the previous tasks, our model predicts that this effect depends on the degree of plasticity in the cortical RNN. In particular, a fully plastic RNN notably fails to capture the strong dependence on cerebellar feedback observed experimentally (Fig. 6f and Fig. S18; compare with Fig. 6d, bottom right). Indeed, we only observe an effect on performance consistent with experimental findings when cortical plasticity is limited (Fig. 6f−i). Taken together, our results suggest that the cerebellum, not the cortex, is the primary site of learning during the acquisition of this working memory task25.
As mentioned, a prevalent feature in classical cerebellar theories is that the divergence provided by the granular layer enables a linear separation of similar inputs34,35,65. Whilst this has typically been studied using isolated models of the cerebellum, it has recently been suggested that this feature may be of relevance in the context of memories in the cortex which merge or “collapse” onto similar representations over time66.
We tested this in our model and observed that a large number of cerebellar granule cells is indeed particularly valuable when the initial stimulus is followed by a long delay (Fig. 6j). In particular, our results show that as the signal-to-noise ratio (SNR) of the cortical RNN activity decreases over time, more granule cells are required to decode the stimulus from that activity (Fig. 6k; see Methods). It should be noted that while the learning rule operates within a 600 ms window, cerebellar predictions become effective only after 1.2 s. The model therefore demonstrates that the cerebellum is uniquely placed to decode cortical representations whose task-relevant signals naturally weaken over time. This may explain recent experimental results which suggest the cerebellum is particularly important for tasks that involve long delay periods67.
Overall, these results demonstrate that our model can capture working memory tasks and the observed dependency of cortical dynamics on cerebellar input. Moreover, our model makes the prediction that the cerebellum is a key site of plasticity during acquisition of delayed association tasks.
Cerebellar task knowledge can be consolidated in the cortex
In each of the previous tasks, cerebellar feedback is shown to mediate learning and the maintenance of task-specific cortical dynamics. However, the neocortex is known to encode long-term representations of tasks10. This suggests a need for a “consolidation” period, during which the memory stored in the cerebellum may be transferred to cortical areas.
To demonstrate cerebellar-to-cortical systems consolidation in our model we develop consolidation-specific learning rules. To achieve consolidation we train cortical recurrent weights to mimic cerebellar input (see Methods). In principle, this should be readily attainable, since the addition of cortico-cerebellar feedback itself can be interpreted as a low-rank modification of the RNN weights68. We also gradually decay the cerebellar-to-cortical input weights so that over training the cerebellum stops driving the cortical network, thereby giving full control of the task to the cortical RNN (Fig. 7a).
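To make the low-rank interpretation concrete, the sketch below takes the simplest case, linear readout feedback, and assumes (our simplification, not stated above) that the fed-back readout is computed from the same presynaptic rates f(h) that drive the recurrent input. The feedback then adds a matrix of rank at most K, the readout dimension, to Whh. For cerebellar feedback the signal passes through the nonlinear granule layer, so the analogy holds only in the weaker sense that the feedback current is confined to a K-dimensional subspace. Variable names and scalings are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 50, 2                                        # RNN units, readout dimension
W_hh = rng.standard_normal((N, N)) / np.sqrt(N)     # recurrent weights
W_rdt = rng.standard_normal((K, N)) / np.sqrt(N)    # readout: z = W_rdt @ f(h)
W_zh = rng.standard_normal((N, K))                  # feedback of z into the RNN

# Feeding z back is equivalent to adding a rank-K term to the recurrent matrix
delta = W_zh @ W_rdt
W_eff = W_hh + delta

r = np.tanh(rng.standard_normal(N))                 # presynaptic rates f(h)
assert np.allclose(W_hh @ r + W_zh @ (W_rdt @ r), W_eff @ r)
print(np.linalg.matrix_rank(delta))  # 2
```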
Fig. 7. Cerebellum can mediate task consolidation in the cortex.
a Schematic of the proposed theory of cerebellar-to-cortical task consolidation. During the initial learning phase (left), task representations are primarily driven by the cerebellum and RNN connectivity is not yet task-specialised. During the consolidation phase there is a period of cerebellar-to-cortical (CC) task information transfer (middle), whereby CC interaction drives plasticity in the cortical RNN. After consolidation (right), the RNN can operate effectively without the need for cerebellar input. The colour of the structures reflects the importance of each component throughout consolidation. b Model error in the delayed association task (Fig. 6) throughout consolidation with (purple) and without (orange) cerebellar ablation. For reference an optimal consolidation model is also given (green). Dotted black line denotes chance. c Model selectivity with and without cerebellar ablation at different stages of the consolidation process; titles colour coded according to arrows in (b). d Strength of the cerebellar-to-cortical weights (top), local cortical weights (Whh; middle) and change in local cortical weights (ΔWhh; bottom) over the period of consolidation. Strength and change are measured by the Euclidean norm. e Cosine similarity between cRNN (RNN and cerebellar network) activities before and during consolidation. f Cosine similarity between the learned recurrent input currents (generated locally in the cortical RNN) during consolidation and the total cortical input current (generated locally and by cerebellar-cortical input) in the pre-consolidation network. Similarity of the consolidation model is shown in orange and the optimal consolidation model in green. g Task error after the consolidation period for models with different initial degrees of performance prior to consolidation. Results are averaged over 5 different initial conditions. Error bars represent standard error of the mean.
We tested this computational theory of consolidation on the cortico-cerebellar models (input plastic condition) trained on the previous delayed association task (Fig. 6d, top left). We consider two types of learning rule. The first is a simple biologically plausible rule, which depends on the ratio of cerebellar-to-cortical input and total RNN activity. Specifically, the recurrent weight wij from cortical neuron i to j evolves according to a rule (see Methods, Eq. (9)) in which the numerator depends on the cerebellar input to neuron j, through the jth row of the cerebellar-to-cortical weight matrix, and the denominator is a normalising factor which may be computed by cortical interneurons71. This normalisation factor helps ensure stability, as commonly used in a range of models of long-term synaptic plasticity69,70. For comparison we also consider a theoretically optimal (but biologically unrealistic) rule based on a least squares solution (see Methods).
In both cases, we observe that the RNNs gradually learn to perform the task without the need for cerebellar input (Fig. 7b, c). During this period, the cerebello-cortical weights decay gradually to zero, whilst relatively small but important weight modifications take place within the cortical RNN (Fig. 7d). By construction of the learning rule, the cerebello-cortical activities throughout the consolidation period closely resemble, or “replay”, their original pre-consolidation values, and the RNN is eventually able to independently recreate the (pre-consolidation) cerebellar-dependent dynamics (Fig. 7e, f). Such “replay” of task-dependent dynamics is consistent with experimental observations of cerebello-cortical interactions during sleep72.
We also find that a model with fixed RNN connectivity does not perform as well as the input plastic condition (Fig. S19). This is likely due to improved network stability when using the input plastic case, compared to the purely fixed RNN (Fig. S18a, b and S20). Related to this, we find that models which have not yet perfected the task exhibit worse performance after consolidation (Fig. 7g).
In summary, the framework we introduce here suggests that cortico-cerebellar loops may play an important role in systems consolidation by gradually transferring rapidly learnt cerebellar knowledge to the cortex.
Discussion
Growing experimental evidence suggests that cortico-cerebellar loops support behaviour, but their computational roles have remained unclear. Here we have introduced a systems-level modelling framework in which a feedforward cerebellar network receives the state of a cortical RNN and provides task-specific predictions in return. In our model, cerebellar feedback facilitates learning by shaping the underlying cortical dynamics during motor and cognitive tasks in a way that is consistent with both behavioural and optogenetic studies. Our work suggests that the cerebellum is a key site of learning in the brain, allowing for rapid context-switching of cortical dynamics that underlie behaviour. We finish by introducing a theory of cerebellar-to-cortical systems consolidation, in which task-specific knowledge is gradually transferred to the cortical network.
Our model is related to previous network architectures in that it uses feedback to enhance neuronal representations and selectivity in an otherwise fixed RNN, thereby facilitating task-relevant downstream processes15,16. There is growing interest in neuroscience in the role that feedback can play in cortical circuits. For example, two recent theoretical studies demonstrate how thalamic feedback implemented by cortico-thalamic loops can flexibly prepare and execute motor sequences17,18. We highlight two key computational differences in our work. First, in our model feedback is not derived from a linear function of the RNN (as is usually done when using simple readout or thalamic networks), but from a divergent cerebellar-like feedforward network (Fig. 1). Second, our model incorporates behavioural timing-specific learning rules in line with experimental findings41,43–46. We show that these cerebellar features improve task acquisition relative to a standard readout feedback architecture14–16 (Figs. 2, 4, 5 and 6). Interestingly, we observe that these cerebellar learning rules show competitive task performance when compared with optimising the cerebellum using error signals derived directly from the cortical readout via the backpropagation algorithm (Fig. S21). Conceptually, therefore, our model makes the prediction that error-driven cerebellar plasticity alone suffices to successfully learn and maintain adequate task representations, whilst the cortex remains relatively stable.
By retraining cortico-cerebellar networks on a novel task, we propose a key role for the cerebellum in task switching (Fig. 3). In particular, we show that cerebellar feedback may provide a solution to the problem of context-dependent adaptation, which requires (i) an ability to learn a new context but also (ii) an instant retrieval of appropriate responses to previously learned contexts73,74. Interestingly, we observe that while recurrent cortical plasticity enables adaptation to a new task context, there is catastrophic forgetting of the original context. This is at odds with well-known behaviour in the primate, and provides a computational explanation for why local modifications in the monkey cortex during motor adaptation appear to be limited55. In our model, rapid task switching is achieved by context-specific activation of cerebellar parallel fibres. In future work it would be of interest to compare different mechanisms by which the cerebellum may realise context-dependent processing; for example, a recent study has suggested that dendritic gating via cerebellar interneurons may perform this role75. Moreover, recent observations suggest that the cerebellar-driven thalamus enables context-dependent responses in the cortex for movement initiation52,57 and cognitive tasks76. Indeed, our work suggests that fast context-switching is easier to incorporate in the relatively simple, divergent and rapidly learnable feedforward architecture of the cerebellum than in highly intricate cortical RNNs with weak plasticity.
There are a number of other well-described cortico-cerebellar properties that would be of interest to study in the context of our framework. For example, it is well known that the thalamus is a major intermediary between the cerebellum and cortex. In fact, the study preceding that of Gao et al.25 identified the thalamic nuclei that receive cerebellar input as a key driver of task-relevant dynamics77. One would therefore expect thalamic perturbation to have similar effects to cerebellar ablation, which is exactly what was previously shown in the same delayed association task77. Indeed, by making a simple extension to our framework we can provide a model that captures the effect of both thalamic and cerebellar ablation (Fig. S22). However, the computational role and possible benefit of the thalamus in mediating cerebellar-cortical communication should be explored further. Incorporating additional intermediate filters such as pontine nuclei and cerebellum78,79, enforcing sparse connectivity of mossy fibres34,35,80, and considering synaptic plasticity driven primarily by long-term depression43 are all also likely to offer important biological and computational insights.
A unifying framework of the cortico-cerebellar loop, and indeed the cerebellum itself, may extend to non-motor tasks. Recent task-based fMRI studies have revealed functional diversity of the cerebellar cortex across a range of cognitive functions23. Our model inherently implies a high degree of heterogeneity: it suggests that different modules would be required to drive different parts of the cortex that in turn underlie different cognitive functions. In this study, we modelled recent behavioural and optogenetic experimental observations25,27 that directly implicate the cerebellum in supporting cortical dynamics during evidence accumulation and delayed association tasks (Figs. 5 and 6). In particular, our results show that cortico-cerebellar interactions are sufficient to learn tasks with highly sparse teaching signals (i.e., only at the end of the task). Furthermore, the model predicts that the cerebellar influence becomes more pronounced for longer task durations (typically of the order of seconds). This phenomenon is attributed both to the preservation of task-specific dynamics through the cortico-cerebellar loop and to the cerebellum’s intrinsic capacity, enhanced by its extensive hidden granular layer, to disentangle task-specific information from overlapping cortical dynamics. Significantly, we best capture experimental observations in conditions in which RNN plasticity is limited, making the prediction that the cerebellum is the primary site of learning for these tasks. This provides an alternative to the commonly assumed view that cortical areas are optimised for specific tasks6–8. Finally, in contrast to several previous studies, Oostland et al.81 found that a cerebellum-specific transgenic mouse model exhibited faster learning in a sensory evidence-accumulation task.
This accelerated learning may be due to compensatory plasticity in other brain regions, but further research is needed to clarify the differences between this study and earlier findings in the field25,27, including the model introduced here.
In our model, the cerebellum drives cortical dynamics based on prediction error signals that depend on the desired task outcome. In alignment with prevailing cerebellar models, we propose that the inferior olive (IO) computes the error signals essential for learning in Purkinje cells (PCs). There exist projections that reach the IO from three key brain regions: the neocortex, the mesodiencephalic junction, and the ventral tegmental area (VTA)82. Of particular relevance to our manuscript are the direct projections from the neocortex and the potential targets relayed via the mesodiencephalic junction83–86. Additionally, there is evidence for connections between the VTA and the IO87, which could transmit reward-based signals, in line with our model for evidence accumulation and delayed association tasks. However, it remains to be tested exactly how the reward-predictive representations developed by our model compare to those found experimentally.
Here we have also introduced a theory of cerebello-cortical task consolidation. Our theory suggests that cerebellar and cortical learning may operate at different timescales: after an initial fast stage of learning driven by the cerebellum, a period of consolidation ensues in which the cortex gradually acquires task-specific knowledge encoded in the cerebellum (Fig. 7). This view of systems task consolidation is in line with growing experimental evidence suggesting an important role of cerebellar-to-cortical task consolidation72,88,89. For example, Xu et al.72 have observed similar replay-like cerebellar-to-cortical task-specific neuronal dynamics during wakefulness and sleep. Such a combination of fast and gradual learning is reminiscent of recent experimental results which suggest significantly faster timescales of plasticity in the hippocampus compared to the prefrontal cortex during a cognitive task90. Moreover, the consolidation period can be related to the idea that a task-optimised cerebellum can be utilised as a cortical teacher, following another recently proposed computational model of the cortico-cerebellar loop31,32. Although this model itself fails to capture the observed instantaneous cortical dependency on cerebellar output at very (Fig. S23), we highlight that it is in principle possible for cerebellar-thalamo-cortical projections to act as both a driver and a teacher of cortical states. For example, anatomical evidence supports a dual role in which cerebellar-thalamic projections can deliver “driving” and “teaching” input via the basal and apical dendrites of cortical pyramidal cells, respectively91.
Although our work suggests that the cerebellum is particularly beneficial in the presence of minimal cortical plasticity, this does not mean that there should be no cortical plasticity. Indeed, in principle, all that is required is that the cortex shows weaker plasticity than the cerebellum (Fig. S24). A key consideration in interpreting our results is that the experimental studies we consider were performed in adult animals, which are well documented to exhibit reduced cortical plasticity92–94. Theoretical and experimental research also underscores that relatively feedforward brain areas, such as the cerebellum and hippocampus, possess crucial characteristics, like sparse input layers, that facilitate rapid learning65,80. Additionally, the cerebellum is classically proposed to enable fast motor adaptation through supervised learning, in contrast with the more gradual, unsupervised learning in the neocortex95.
Our work highlights commonalities of cortico-cerebellar interactions in motor and cognitive tasks alike. However, it also suggests interesting differences. The first marked distinction relates to the increased significance of cerebellar-to-cortical (input) plasticity during pure working memory tasks (Fig. 6). This is in line with recent experimental evidence showing stronger plasticity at higher-order thalamo-cortical pathways40. Indeed, because of the need to sustain information during the delay period without sensory or teaching input, it is advantageous for the network to encode a point attractor-like state (see Fig. S18, left). Cerebello-cortical plasticity39,40 may thus enable greater controllability of cerebellar feedback to push the network to these states during working memory tasks, but less so in motor-based tasks63 (Fig. S16).
Relatedly, the second difference we highlight is that cerebello-cortical consolidation is more readily achieved in the presence of networks with stable dynamics (cf. Fig. 7 and Fig. S19). We speculate that unstable network dynamics make cerebellar-to-cortical consolidation less reliable. Therefore, we predict that while cerebellar-to-cortical systems consolidation might be possible for near-perfected tasks which involve discrete stable representations (e.g., working memory tasks), for tasks which are not yet fully learned, or which require faster, more dynamic responses (as often required in the motor domain), cerebellar control is likely to be required throughout life.
In conclusion, our work suggests that while the cortex encodes a stable model of the world, it is the cerebellum that allows for quick and flexible adaptation to new environmental conditions. This new cerebellar-guided knowledge can then be gradually consolidated in the cortex.
Methods
Model architecture and training
The complete dynamics of each model architecture that we consider (Fig. S1; no feedback, readout feedback, cerebellar feedback, no feedback with cerebellar readout) are given in Table 1. In all of our simulations we use a recurrent neural network (RNN) with 50 time-discrete units (see section below).
Table 1.
Dynamics of the different model variants, where ht is the cortical RNN state, zt the readout and ct cerebellar feedback
No feedback | Readout feedback | Cerebellar feedback | No feedback (cerebellar readout) | |
---|---|---|---|---|
ht | αht−1 + Whhf(ht−1) + Wihxt | αht−1 + Whhf(ht−1) + Wihxt + Wzhzt | αht−1 + Whhf(ht−1) + Wihxt | |
zt | Wrdtf(ht) | Wrdtf(ht) | Wrdtf(ht) | |
ct | NA | NA | NA |
For the experiments presented here we set and is the cerebellar feedforward network with one hidden layer, . Whh RNN recurrent weights; Wih stimulus-to-RNN weights, Wrdt (cortical) readout weights; , cerebellar-to-RNN weights, WMF cerebellar mossy fibre weights, WPF cerebellar parallel fibre weights; set as ReLU.
Unless otherwise stated, the feedforward cerebellar network contains a single hidden layer with 1000 units (granule cells), but other hidden layer sizes are also considered (Figs. 2d and 4e). This yields a divergence from the cortical RNN to the cerebellar granular layer of 50:1000 = 1:20. The cerebellar output layer, which we interpret as Purkinje cells, on the other hand, mirrors the desired task outcome and is therefore of significantly lower dimensionality (3 in evidence accumulation task and 2 in all other tasks).
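A minimal sketch of the cerebellar module with the dimensions given above (50 cortical units, 1000 granule cells, 2 Purkinje-cell outputs). The uniform initialisation scale, the absence of biases, the ReLU granule nonlinearity and the use of tanh(h) as the mossy-fibre input are assumptions for illustration; in the model only the parallel-fibre weights W_PF would be plastic.

```python
import numpy as np

rng = np.random.default_rng(0)

N_RNN, N_GC, N_PC = 50, 1000, 2  # cortical units, granule cells, Purkinje cells

# Illustrative initialisation (simplified; not the exact scheme of the Methods)
W_MF = rng.uniform(-1, 1, size=(N_GC, N_RNN)) / np.sqrt(N_RNN)  # mossy fibres: fixed
W_PF = rng.uniform(-1, 1, size=(N_PC, N_GC)) / np.sqrt(N_GC)    # parallel fibres: plastic

def relu(x):
    return np.maximum(x, 0.0)

def cerebellum(h, W_MF, W_PF):
    """Map a cortical state h (shape [N_RNN]) to a cerebellar prediction c."""
    g = relu(W_MF @ np.tanh(h))  # divergent granule-cell layer (assumed ReLU)
    c = W_PF @ g                 # low-dimensional Purkinje-cell output
    return c, g

h = rng.standard_normal(N_RNN)
c, g = cerebellum(h, W_MF, W_PF)
print(c.shape, g.shape, N_GC // N_RNN)  # (2,) (1000,) 20
```

The 1:20 divergence is simply the ratio of the granule-layer size to the RNN size.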
For each task simulation, network parameters are initialised as follows. The RNN input, recurrent and cerebellar feedback weights Wih, Whh, are drawn from a uniform distribution where . The readout weights Wrdt and cerebellar weights, WMF, WPF, are initialised according to where bk denotes the “kaiming bound” He et al.96 (slope ). The biases of the cortical readout are drawn from , where nin denotes the input size of the layer. In line with existing models of cortical networks16, in our model we do not obey Dale’s law and use a tanh activation function. In future work it would be of interest to test a variant of our model with explicit excitatory and inhibitory cortical populations. We conducted each task simulation with 5 random seeds for initialisation, which were sufficient to demonstrate the robustness of the model across multiple initial conditions. Note that we also tested several other control models (see Supplementary Figs.).
During learning of a task, model parameters are updated by gradient descent on the task error E = ∑tEt with respect to the model parameters (see section below). For each dataset, each training session covers 1000 random examples, presented to the model in batches of 10, each of which we call a “trial”. The test set (used after training) also covers 1000 randomly generated examples. When analysing the learned network dynamics (e.g., model output with and without cerebellar ablation), the model with the best validation error during training was selected. An ADAM optimiser97 was used with initial learning rate η = 0.001 for the RNN (when plastic), readout and cerebellar network, except for the delayed association task, for which we found an RNN learning rate of η = 0.0025 to provide more stable learning. The different plasticity constraints of the entire model, termed “fixed RNN”, “input plastic”, and “fully plastic”, are defined with respect to the cortical parameters of Eq. (1) as follows. For the fixed RNN case, only the cortical readout weights Wrdt are learned. For the input plastic case, the RNN input weights, namely the stimulus-to-RNN weights Wih and the cerebellar-to-RNN weights, are also learned. Finally, for the fully plastic case, the recurrent weights Whh are also learned. In all of these cases the cerebellar “parallel fibres” WPF are learned, whilst the “mossy fibres” WMF remain constant, in line with mossy fibre synapses being (relatively) stable33,98.
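The three cortical plasticity conditions can be summarised as sets of trainable parameters. The sketch uses hypothetical parameter names, with W_ch standing in for the cerebellar-to-RNN weights; following the description above, the cerebellar parallel fibres are always trained and the mossy fibres never are.

```python
# Hypothetical parameter names for the three cortical plasticity conditions
CORTICAL_PLASTICITY = {
    "fixed RNN":     {"W_rdt"},                           # readout only
    "input plastic": {"W_rdt", "W_ih", "W_ch"},           # + stimulus and cerebellar inputs
    "fully plastic": {"W_rdt", "W_ih", "W_ch", "W_hh"},   # + recurrent weights
}

def trainable_params(condition):
    """Return the parameters updated under a given cortical plasticity condition.

    Parallel fibres (W_PF) are always plastic; mossy fibres (W_MF) never are.
    """
    return CORTICAL_PLASTICITY[condition] | {"W_PF"}

print(sorted(trainable_params("input plastic")))
```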
In each of the considered tasks we report the change in error during and after training as a result of cerebellar feedback (Figs. 2l, 4i, 5h and 6i). The change in error during training is computed as the average difference in training error between the cerebellar feedback and no feedback models. The change in error after training is computed as the average difference in test error between a trained cerebellar feedback model and a trained cerebellar feedback model subject to cerebellar ablation. As in the main results, this cerebellar ablation after training may be transient. In particular, for the line drawing and digit drawing tasks we consider transient ablation during the middle period of the task, for the delayed association task we consider transient ablation as in Fig. 6d−g, and for the evidence accumulation task we consider full cerebellar ablation.
Continuous dynamics of RNN model
A continuous version of our RNN can be expressed as
3 |
where τM is the membrane time constant (not to be confused with the cerebellar time window τ), Rm is the membrane resistance, and f is the rate-based non-linearity, which we set as tanh. Discretising Eqs. (3) with timesteps of Δt yields the equations in Table 1, where α = exp(−Δt/τM). Note that, as in ref. 38, we ignore the factor (1 − α)Rm. This simplifies notation and has no effect on dynamics if model weights are scaled accordingly. In general we use τM ≈ 20 ms and Δt = 50 ms for the drawing tasks (Figs. 2, 3 and 4) and a higher τM ≈ 90 ms with Δt = 200 ms for the cognitive tasks (Figs. 5, 6 and 7), in line with ref. 6. In both cases this gives a cortical internal memory α = 0.1.
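As a worked check of the discretisation, assume the exact-exponential scheme α = exp(−Δt/τM), which is consistent with the values quoted above. Inverting it for α = 0.1 recovers time constants close to the stated τM ≈ 20 ms and τM ≈ 90 ms. The function names are illustrative.

```python
import math

def leak_factor(dt_ms, tau_m_ms):
    """alpha = exp(-dt / tau_M) for an exact-exponential discretisation (assumed)."""
    return math.exp(-dt_ms / tau_m_ms)

def membrane_time_constant(dt_ms, alpha):
    """Invert alpha = exp(-dt / tau_M) to obtain tau_M in ms."""
    return -dt_ms / math.log(alpha)

for task, dt in [("drawing", 50), ("cognitive", 200)]:
    tau_m = membrane_time_constant(dt, 0.1)
    print(task, round(tau_m, 1), round(leak_factor(dt, tau_m), 3))
# drawing 21.7 0.1
# cognitive 86.9 0.1
```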
Cortical and cerebellar learning rules
When the desired task outcome yt is provided the associated error is computed as for the cortical network and for the cerebellar network, where denotes the task error function (mean squared error and cross-entropy loss for regression and classification tasks respectively) and τ is the cerebellar time window. The error gradients for the readout and cerebellar weights Wrdt, WPF can then be obtained locally with a simple delta-rule on the gradient of the error signal. That is,
4 |
where η denotes the learning rate of the cortico-cerebellar network and g denotes the hidden granule cell activity of the cerebellar network which is computed as (cf. Eq. (2)).
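The delta rule for the parallel-fibre weights can be sketched as follows, assuming a linear Purkinje-cell output, a mean-squared error, and that the prediction made at step t is compared against the target τ steps later. The function name and default learning rate are hypothetical; the point is that the update only uses the local error and the presynaptic granule activity g.

```python
import numpy as np

def pf_delta_rule(W_PF, g_t, y_future, eta=1e-3):
    """One delta-rule step for the parallel-fibre weights (illustrative).

    Assumes a linear Purkinje output c_t = W_PF @ g_t trained with a squared
    error to predict the target y_{t+tau}, where tau is the cerebellar window.
    """
    c_t = W_PF @ g_t
    err = c_t - y_future                      # prediction error (teaching signal)
    return W_PF - eta * np.outer(err, g_t)    # error x presynaptic granule activity
```

Because the gradient factorises into a postsynaptic error and a presynaptic activity, each synapse can be updated without any information from the rest of the network.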
For the input/recurrent weights Wih, , Whh (when plastic), obtaining error gradients is more difficult as temporal dependencies need to be considered. To improve biological plausibility, in this work we avoid backpropagation through time (BPTT) and instead use the eprop algorithm38. Details can be found in ref. 38, but the main idea is that BPTT can be approximated with a combination of locally computed synaptic eligibility traces and the current learning signal. Specifically, the error gradient for a given synapse wji from neuron i to j is computed as
5 |
where for ease of notation we now use the superscript to denote timestep t and is the neuron j learning signal (obtained by one-step backpropagation through space except for the cerebellar readout architecture in Fig. S1d). is the synaptic eligibility trace of wji, defined recursively by
(6)
where is initialised as zero. Note that the terms in Eq. (6) are locally available to the synapse. In the case of our network dynamics (Eq. (1)), the eligibility trace is simply defined by , where ai is the activation of the presynaptic neuron i (e.g. or ci).
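To illustrate why eligibility traces can replace BPTT, the toy example below uses a single leaky unit with no recurrent loop, h^t = α h^{t−1} + w a^t. In this case the e-prop gradient Σ_t L^t e^t is exact, with learning signal L^t = h^t − y^t and trace e^t = α e^{t−1} + a^t. We check it against a finite-difference gradient. The trace form is our assumption, since Eq. (6) did not survive extraction. In the full RNN, e-prop drops the contributions that propagate through recurrent connections, so it is only an approximation there.

```python
def forward(w, a, alpha):
    """Single leaky unit: h^t = alpha * h^{t-1} + w * a^t, with h^0 = 0."""
    h, hs = 0.0, []
    for a_t in a:
        h = alpha * h + w * a_t
        hs.append(h)
    return hs


def loss(w, a, y, alpha):
    """E = 1/2 * sum_t (h^t - y^t)^2."""
    return 0.5 * sum((h - y_t) ** 2 for h, y_t in zip(forward(w, a, alpha), y))


def eprop_grad(w, a, y, alpha):
    """dE/dw = sum_t L^t * e^t, with learning signal L^t = h^t - y^t and
    eligibility trace e^t = alpha * e^{t-1} + a^t (initialised at zero)."""
    trace, grad = 0.0, 0.0
    for h_t, a_t, y_t in zip(forward(w, a, alpha), a, y):
        trace = alpha * trace + a_t      # computed forward in time, locally at the synapse
        grad += (h_t - y_t) * trace      # learning signal times eligibility trace
    return grad


a = [1.0, 0.0, 0.5, -0.3, 0.8]   # presynaptic activity a_i^t
y = [0.2, 0.1, 0.4, 0.0, 0.5]    # targets for the unit
w, alpha, eps = 0.7, 0.1, 1e-5
finite_diff = (loss(w + eps, a, y, alpha) - loss(w - eps, a, y, alpha)) / (2 * eps)
```

The trace is updated forward in time alongside the dynamics, so no backward pass through time is needed.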
For all weights, the error gradients are accumulated across multiple examples (i.e., batch update) and timesteps before the weights themselves are updated.
Learning rules for cerebellar-to-cortical consolidation
A period of “consolidation” is considered for the trained models of the delayed association task (Fig. 7 and Fig. S19). During this period the model is presented with further trials (batch size 10) of training data, but without their associated targets. The forward dynamics of the model run as normal (Eq. (1)), but we now use a consolidation learning rule for the RNN weights. We consider both an optimal learning rule, based on a least-squares solution, and a simple biologically plausible learning rule.
We first present the optimal consolidation learning rule, since it motivates the biological rule. We want to change the recurrent (cortico-cortical) input to match the cerebellar-cortical input over the task. To this end, we concatenate the time-dependent RNN activities H = ⨁t≥1 ht and cerebellar output activities C = ⨁t≥1 ct, where ⊕ denotes vector concatenation. We then set the change in recurrent weight ΔconsWhh with where is the RNN consolidation learning rate and Flsq is the least-squares solution
(7)
At the same time the cerebellar-cortical weights decay according to
(8)
where is the rate of cerebellar-cortical decay. In the experiments shown we select .
For the biological learning rule, the cerebellar-cortical weights decay as in Eq. (8), but the RNN weights are now updated according to the ratio of cerebellar feedback to the whole population activity. That is, for the recurrent weight from neuron i to neuron j we have
(9)
for an arbitrary timestep t, where denotes the jth row of the cerebellar-cortical weight matrix .
To show that Eq. (9) leads to changes in cortico-cortical input that are proportional to the cerebellar-cortical input, note that the change in recurrent input to a given RNN neuron j at time t becomes
(10)
That is, we recover a solution (up to proportionality) to Eq. (7). For this biological learning rule, to improve network stability, we found it beneficial to increase the RNN consolidation learning rate such that (where Δconswij is accumulated over the whole sequence). This explains the initially faster learning (over the first few trials) for the biological learning rule (Fig. 7F).
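The sketch below implements one local rule that matches the verbal description of Eqs. (9) and (10). Each weight change is proportional to the cerebellar feedback onto neuron j multiplied by a_i / ‖a‖². As a result, the change in recurrent input to neuron j equals η_cons times its cerebellar input. The rule's exact form is our assumption, since Eq. (9) did not survive extraction. The multiplicative decay of the cerebellar-cortical weights is also an assumed form of Eq. (8). `consolidation_step` and `recurrent_input` are hypothetical names, and the update is shown for a single timestep rather than accumulated over the sequence.

```python
def recurrent_input(W, a):
    """Recurrent drive to each neuron j: sum_i W[j][i] * a[i]."""
    return [sum(w * a_i for w, a_i in zip(row, a)) for row in W]


def consolidation_step(W_hh, W_ch, a, c, eta_cons, eta_decay):
    """One timestep of a hypothetical local consolidation rule (updates in place).

    Assumed form: dw_ji = eta_cons * (W_ch[j] . c) * a_i / ||a||^2, followed by
    multiplicative decay of the cerebellar-cortical weights."""
    norm_sq = sum(x * x for x in a)                  # whole-population activity
    cb_input = recurrent_input(W_ch, c)              # cerebellar feedback to each neuron j
    for j, row in enumerate(W_hh):
        for i in range(len(row)):
            row[i] += eta_cons * cb_input[j] * a[i] / norm_sq
    for row in W_ch:                                  # assumed form of the decay
        for k in range(len(row)):
            row[k] *= 1.0 - eta_decay
    return cb_input


W_hh = [[0.0] * 3 for _ in range(3)]
W_ch = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
a, c = [0.5, -1.0, 2.0], [1.0, 0.5]
before = recurrent_input(W_hh, a)
cb = consolidation_step(W_hh, W_ch, a, c, eta_cons=0.1, eta_decay=0.05)
after = recurrent_input(W_hh, a)
```

The rule uses no target, which is why the consolidation is unsupervised. Recurrent input gradually takes over the role of the decaying cerebellar input.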
During this consolidation period no optimiser is used (i.e. Adam is not used). Note that these consolidation learning rules do not require information about the desired task outcome (i.e. the target) and are in that sense unsupervised.
Demixed principal component analysis
To study the response dynamics specific to task variables in the delayed association task (Fig. 6), we perform demixed principal component analysis (dPCA)64. dPCA extracts low-dimensional components that explain maximal population variance while being constrained by task-specific variables. As a result, we obtain principal components that are specific to task variables; here, the task variable of interest is the animal/model choice. The data we provide as input to dPCA form a three-dimensional array (n, s, t), whose dimensions index neurons (trial-averaged activity, concatenated across animals/seeds), choice identity and time, respectively. dPCA is applied both to the model representations (after learning) and to the neural data acquired in ref. 25.
Task details
Line drawing task
For the line drawing task, the model has to transform one of six possible 10-dimensional binary inputs x ∈ {0, 1}^10 at timestep 1 into an associative 2-dimensional “go” line yline (for five of the inputs) or a “no-go” response of staying at the origin (for one of the inputs). The starting point of each line is the origin, and the endpoints of the lines are evenly spaced on the unit circle (see Fig. 2a, black dashed line). The model learns to draw the line over 20 discrete timesteps, with the intermediate target points spaced evenly, i.e. for a line with endpoint yend we have .
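The following minimal sketch constructs the line-drawing targets. It assumes the five endpoints lie at angles 2πk/5 (the angular offset is not stated) and that the target at timestep t is (t/20)·yend for t = 1, …, 20. The text only says the intermediate points are evenly spaced, and the exact formula was lost in extraction. `line_targets` is a hypothetical name.

```python
import math


def line_targets(n_lines=5, T=20):
    """Return n_lines "go" trajectories plus one "no-go" trajectory, each T points long."""
    targets = []
    for k in range(n_lines):
        theta = 2 * math.pi * k / n_lines               # assumed: first endpoint at angle 0
        end = (math.cos(theta), math.sin(theta))        # endpoint on the unit circle
        # evenly spaced points from the origin towards the endpoint (assumed t / T spacing)
        targets.append([(t / T * end[0], t / T * end[1]) for t in range(1, T + 1)])
    targets.append([(0.0, 0.0)] * T)                    # "no-go": stay at the origin
    return targets


targets = line_targets()
```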
For the stimulus timestep (timestep 1), as well as for the remaining 19 timesteps, the model receives (through its Wih connection) zero-mean Gaussian noise with σ = 0.1. Model errors are computed as the mean-squared error to the target response. Unless otherwise stated, a cerebellar time window τ = 3 timesteps (≈150 ms when α = 0.1) is used. The prediction error across time delay t0 between cortical output and cerebellar (or cortical) output (Fig. 2e) is computed as the cue/time average , where ‖·‖ is the Euclidean norm.
To analyse the effects of cerebellar ablation we consider partial cerebellar ablation at the start, middle, and end of the sequence (Fig. 2h−k and Fig. S5). The specific time windows of these ablation periods are timesteps [1-6, 8-13, 15-20] (inclusive), respectively.
Curl-field variant: Once the models of the line drawing task were trained, we tested whether they could map the same external inputs onto a curl-field variant of the task (see ref. 55). For this we selected models with cortical internal memory α = 0.5, since we found that this resulted in faster learning, comparable to the experimental data presented in55; models with α = 0.1 (as presented in Fig. 2) also learn, but more slowly. Switching to and learning this new curl-field task “context” involved retraining the models on new desired outcomes (central grey curves in Fig. 3c).
Specifically, the curl-field target responses have the same end-point for each line (or same “no-go” zero cue), but intermediate target points now form a semi-ellipse between the origin and the respective end-point. Given the desired endpoint , this can be parameterised by
(11)
where is the angle to the end point and t runs uniformly between 0 and π (or, for direction towards (xend, yend) as in our experiments, from π to 2π).
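The sketch below generates one semi-ellipse consistent with this description. Along the origin-to-endpoint axis, progress is u = (r/2)(1 + cos t), and the sideways excursion is v = b sin t, rotated by the endpoint angle θ. As t runs from π to 2π, the curve therefore moves from the origin towards the endpoint. Eq. (11) itself was lost in extraction. The semi-minor axis b = 0.25, the 20-point sampling and the name `curl_field_target` are our assumptions.

```python
import math


def curl_field_target(x_end, y_end, T=20, b=0.25):
    """Hypothetical semi-elliptical trajectory from the origin to (x_end, y_end)."""
    r = math.hypot(x_end, y_end)          # distance to the endpoint
    theta = math.atan2(y_end, x_end)      # angle to the endpoint
    pts = []
    for n in range(T):
        t = math.pi + math.pi * (n + 1) / T   # runs from just after pi up to 2*pi
        u = 0.5 * r * (1 + math.cos(t))       # progress along the origin-endpoint axis
        v = b * math.sin(t)                   # sideways excursion (<= 0 on (pi, 2*pi))
        # rotate the (u, v) frame so that the u-axis points at the endpoint
        pts.append((u * math.cos(theta) - v * math.sin(theta),
                    u * math.sin(theta) + v * math.cos(theta)))
    return pts


pts = curl_field_target(1.0, 0.0)
```

The final point coincides with the line-task endpoint, and halfway along the trajectory the target sits at the midpoint of the line, displaced sideways by b.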
To test how context-dependent cerebellar processing could enable rapid task switching, we considered the extent to which parallel fibre (PF) weights are shared across task contexts. In particular, we define the PF task overlap as the percentage of the PFs used in each context that are shared with the other context. For example, if the PF task overlap is 25%, then 25% of the PFs used for cerebellar processing apply to both task contexts, whilst the remaining 75% apply specifically to (and are trained only in) the current context. Before learning, the PFs that are not shared (i.e. those that apply only to the curl-field context) are initialised randomly as in the original line-drawing task.
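The bookkeeping behind the PF task overlap can be sketched as index sets. Each context uses the same number of PFs, and a fraction `overlap` of them is shared. With 100 PFs per context and 25% overlap, the two contexts share 25 PFs and use 175 distinct PFs in total. Contiguous indexing and the name `pf_partition` are illustrative assumptions.

```python
def pf_partition(n_pf_per_context, overlap):
    """Hypothetical PF index sets for two task contexts sharing a fraction `overlap` of PFs."""
    n_shared = round(overlap * n_pf_per_context)
    n_specific = n_pf_per_context - n_shared
    shared = list(range(n_shared))                                    # used by both contexts
    ctx1 = shared + list(range(n_shared, n_shared + n_specific))      # e.g. original line task
    ctx2 = shared + list(range(n_shared + n_specific, n_shared + 2 * n_specific))  # e.g. curl field
    return ctx1, ctx2


c1, c2 = pf_partition(100, 0.25)
```

Only the context-specific indices would be freshly initialised when the new context is introduced. The shared indices carry their weights over from the first task.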
Neuronal activity and covariance during task switching: The changes in activity and in covariance (Fig. 3d−f and Fig. S8) are computed as in ref. 59. We record the time-dependent RNN activities (post non-linearity) for 1000 input examples in several periods: task 1 baseline, task 2, and switching back to task 1 (Fig. 3a). For the latter two periods, activities are recorded at the end of each period, whereas for the baseline period we take two samples, at its start and at its end. The change in activity of a given neuron i between any two periods P1 and P2 is its average change in activity, given by
(12)
where , are the time-varying, input-dependent activities of neuron i for periods P1 and P2, respectively, and stdi is the standard deviation of that neuron at the start of the task 1 baseline period. Here |·| denotes the average absolute difference in activity across timesteps and input examples.
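A minimal sketch of the per-neuron change in activity follows. It computes the mean absolute difference over matched (timestep, input example) pairs, divided by the neuron's standard deviation at the start of the baseline. Using the population standard deviation (`statistics.pstdev`) is our assumption. The function names are hypothetical.

```python
import statistics


def baseline_std(acts_baseline_start):
    """Per-neuron standard deviation at the start of the task 1 baseline (population std assumed)."""
    return [statistics.pstdev(a) for a in acts_baseline_start]


def activity_change(acts_p1, acts_p2, std_baseline):
    """acts_p*[i]: neuron i's activity flattened over (timestep, input example) pairs,
    in the same order for both periods. Returns one normalised change per neuron."""
    return [sum(abs(x - y) for x, y in zip(a1, a2)) / len(a1) / s
            for a1, a2, s in zip(acts_p1, acts_p2, std_baseline)]


p1 = [[0.0, 1.0, 2.0, 3.0], [1.0, 1.0, 3.0, 3.0]]
p2 = [[1.0, 2.0, 3.0, 4.0], [1.0, 1.0, 3.0, 3.0]]
changes = activity_change(p1, p2, baseline_std(p1))
```

Dividing by each neuron's baseline spread makes changes comparable across neurons with very different activity ranges.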
For each period, we also compute the covariance matrix of the RNN population. The change in covariance between two periods is then computed as 1 minus the Pearson correlation between their respective covariance matrices59.
For the task 2 and task 1 switching periods we report changes with respect to the start of the task 1 baseline period. To account for natural variability in the network and to better compare with the neural data in55, we normalise the changes by subtracting the change observed within the baseline period itself. For example, the change in covariance in the task 2 period is , where B1, B2 and T2 denote the start of the task 1 baseline, the end of the task 1 baseline, and (the end of) task 2, respectively. We apply the same normalisation to the reported experimental changes in monkey M1 and PmD55; this normalisation yields near-zero (average) changes in M1 and PmD activity (Fig. 3f).
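The sketch below computes the change in covariance as 1 minus the Pearson correlation between two covariance matrices. It then applies the baseline normalisation as a difference, change(B1, P) − change(B1, B2), which is our reading of "subtracting the change observed within the baseline". Pooling samples over timesteps and inputs, and correlating all N × N entries (rather than only the upper triangle), are further assumptions. The helper names are hypothetical.

```python
def covariance_matrix(X):
    """X[i] = samples of neuron i (equal lengths). Returns the N x N population covariance."""
    n, m = len(X), len(X[0])
    means = [sum(row) / m for row in X]
    return [[sum((X[i][k] - means[i]) * (X[j][k] - means[j]) for k in range(m)) / m
             for j in range(n)] for i in range(n)]


def pearson(u, v):
    """Pearson correlation of two equal-length sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sum((a - mu) ** 2 for a in u) ** 0.5
    sv = sum((b - mv) ** 2 for b in v) ** 0.5
    return cov / (su * sv)


def covariance_change(X1, X2):
    """1 minus the Pearson correlation between the flattened covariance matrices."""
    c1 = [x for row in covariance_matrix(X1) for x in row]
    c2 = [x for row in covariance_matrix(X2) for x in row]
    return 1.0 - pearson(c1, c2)


def normalised_change(change_fn, B1, B2, P):
    """Change from the baseline start B1 to period P, minus the within-baseline change."""
    return change_fn(B1, P) - change_fn(B1, B2)


X = [[1.0, 2.0, 3.0, 4.0], [2.0, 1.0, 4.0, 3.0], [0.0, 1.0, 0.0, 2.0]]
X_other = [[1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0], [0.0, 1.0, 0.0, 2.0]]
```

Because the Pearson correlation ignores overall scale, a uniform change in activity magnitude leaves the covariance change at zero. Only changes in the structure of co-variation register.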
The number of task 2 training trials shown in Fig. 3a (500 trials) leads to good, but not perfect, performance. To demonstrate that the models can eventually perform task 2 to a near-perfect standard, the model outputs presented in Fig. 3c were obtained after 1000 training trials.
Digit drawing task
For the digit drawing task the inputs are the same 10-dimensional binary vectors used in the line drawing task, except that the model must now draw an associative digit over 20 timesteps instead of a line (Fig. 4a). The targets ydigit are constructed manually within the space [0, 1]^2 and resemble the digits 0 to 5 (inclusive). For the exact implementation, refer to the provided code (see below).
For the standard model with cerebellar feedback, a cerebellar time window τ = 3 timesteps (≈150 ms when α = 0.1) is generally used. For the model using cerebellar feedback with a temporal basis, we model the cerebellum with a range of time windows, i.e. for some distinct τi ≥ 0 ms. In this task we consider τi = i timesteps for i = 0, 1, 2, 3, 4, 5 (i.e. 0–250 ms), so that the final cerebellar output is a concatenation of task predictions spanning the following 250 ms. Explicitly, after training we have cerebellar feedback , where ⊕ denotes vector concatenation.
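The sketch below shows the ideal (fully trained) temporal-basis feedback. At each timestep t it concatenates the task targets at t + τi for τi = 0, …, 5 timesteps. Clamping indices that run past the end of the sequence to the final target is our assumption. `temporal_basis_feedback` is a hypothetical name.

```python
def temporal_basis_feedback(Y, taus=range(6)):
    """Concatenate future targets Y[t + tau_i] for every timestep t.
    Indices beyond the sequence end are clamped to the last step (assumption)."""
    T = len(Y)
    feedback = []
    for t in range(T):
        vec = []
        for tau in taus:
            vec.extend(Y[min(t + tau, T - 1)])   # prediction tau steps ahead
        feedback.append(vec)
    return feedback


Y = [[float(t), -float(t)] for t in range(20)]   # toy 2-D trajectory over 20 timesteps
fb = temporal_basis_feedback(Y)
```

With 2-D outputs and six windows, the cortex receives a 12-dimensional feedback vector at every timestep.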
Zero-mean Gaussian noise is added to the input at each timestep. Model errors are computed as the mean-squared error to the target response.
To analyse the effects of cerebellar ablation we consider the same partial cerebellar ablation periods as in the line-drawing task. That is, we consider cerebellar ablation at the start, middle, and end of the sequence (Fig. 4 and Fig. S10), which correspond to timesteps [1-6, 8-13, 15-20] (inclusive), respectively.
Evidence accumulation task
In the evidence accumulation task the model receives 2-dimensional binary inputs (i.e. x ∈ {0, 1}^2) over a presentation period of Tpres = 45 timesteps. A non-zero input can occur in at most one of the two dimensions; that is, , where the rate of zero inputs defines the input sparsity ρ (ρ = 0.7 in our simulations). The presentation of input is followed by a delay period of Tdel = 5 timesteps, after which the model must classify in which dimension more non-zero inputs were received (or whether the numbers in each dimension were the same). That is, the desired outcome y takes one of three values, corresponding respectively to more input in the first dimension, more input in the second dimension, or equal input. This task resembles the experimental design of27, in which mice were trained to select the side on which their whiskers received more air puffs.
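The following generator produces a noise-free trial under these rules. The presentation period is 45 timesteps (the three 15-step regression windows), followed by a 5-step delay. Each step is silent with probability ρ = 0.7 and otherwise delivers a single input to one side. Choosing the side uniformly at random is our assumption, and the Gaussian input noise described below is omitted. `make_trial` is a hypothetical name.

```python
import random


def make_trial(rng, T_pres=45, T_del=5, rho=0.7):
    """Hypothetical generator for one evidence-accumulation trial (noise-free)."""
    x = []
    for _ in range(T_pres):
        if rng.random() < rho:           # zero input with probability rho (sparsity)
            x.append([0, 0])
        elif rng.random() < 0.5:         # otherwise one non-zero input, side chosen uniformly
            x.append([1, 0])
        else:
            x.append([0, 1])
    x.extend([0, 0] for _ in range(T_del))   # silent delay period
    n_first = sum(s[0] for s in x)
    n_second = sum(s[1] for s in x)
    # 0: more in the first dimension, 1: more in the second, 2: equal counts
    y = 0 if n_first > n_second else 1 if n_second > n_first else 2
    return x, y


trials = [make_trial(random.Random(seed)) for seed in range(20)]
```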
Zero-mean Gaussian noise is added to the input at each timestep. Model errors are defined by the cross-entropy loss to the target response. Model “belief” (Figs. 5d and S12) is defined as the model probability of the correct classification (obtained by applying a softmax to the readout). Unless otherwise stated, a cerebellar time window τ = 3 timesteps (≈600 ms when α = 0.1) is used. For both readout and cerebellar feedback models, we apply a softmax operation to the feedback returned to the RNN so as to bound its values between 0 and 1.
To analyse the effects of cerebellar ablation we consider full cerebellar ablation (for the entire sequence 1-50; see Fig. 5d and Fig. S12a–c, left) and also partial periods of ablation: at the start, middle, and end of the sequence (Fig. 5e, f and Fig. S12a–c, right). The specific time windows of these partial ablation periods are timesteps [1-15, 15-30, 30-45] (inclusive), respectively. To improve readability of our results, the mean error presented in the training curves for this task is smoothed using a Savitzky-Golay filter with window length 25 and polynomial order 3.
To compute the dependence of model choice on inputs over different temporal bins (Fig. 5f), we follow the method in27. In particular, we divide the presentation period evenly into three time windows ([1-15], [16-30] and [31-45]) and fit the model choice with a logistic regression model
(13)
where denotes the predicted model choice probability, S is the logistic sigmoid function, Ei = #Ri − #Li is the difference between the total numbers of ‘right’ and ‘left’ inputs in window i, and βi is the weight on that window. is fitted to minimise the negative log likelihood of the observed model decisions. We present the normalised weights of each window .
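The sketch below fits the window weights by full-batch gradient descent on the mean negative log likelihood. It includes a bias term and normalises the weights by their sum. Both choices are assumptions, since Eq. (13) and the normalisation formula were lost in extraction. The synthetic "ground truth" weights are illustrative only and are not results from the paper.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_window_weights(E, choices, lr=0.1, iters=300):
    """E[n] = [E_1, E_2, E_3] evidence per window; choices[n] in {0, 1}.
    Gradient descent on the mean negative log likelihood; beta[0] is a bias (assumption)."""
    k, n = len(E[0]), len(E)
    beta = [0.0] * (k + 1)
    for _ in range(iters):
        grad = [0.0] * (k + 1)
        for e, c in zip(E, choices):
            p = sigmoid(beta[0] + sum(b * x for b, x in zip(beta[1:], e)))
            err = p - c                          # derivative of the NLL w.r.t. the logit
            grad[0] += err
            for i, x in enumerate(e):
                grad[i + 1] += err * x
        beta = [b - lr * g / n for b, g in zip(beta, grad)]
    return beta


def normalise(weights):
    """Assumed normalisation: each window weight divided by the sum of window weights."""
    total = sum(weights)
    return [w / total for w in weights]


# Synthetic usage: choices generated from known (illustrative) window weights.
rng = random.Random(1)
true_beta = [0.2, 0.5, 1.0]
E = [[rng.randint(-5, 5) for _ in range(3)] for _ in range(600)]
choices = [1 if rng.random() < sigmoid(sum(b * x for b, x in zip(true_beta, e))) else 0
           for e in E]
beta = fit_window_weights(E, choices)
weights = normalise(beta[1:])
```

A model that relies mostly on recent evidence shows a larger weight on the final window, whereas reliance on early evidence indicates that past inputs are being maintained.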
History-centric cases: In line with27, we find cerebellar ablation to be particularly detrimental for input examples whose correct classification depends on adequately maintaining past inputs (Fig. 5e, f and Fig. S12); we refer to these as “history-centric” examples. We define an input example as history-centric if exposure to only the final third of the input sequence would lead strictly to the wrong answer, that is, examples (x, y) for which the “final-third target” is not equal to the desired outcome .
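This definition can be checked directly. We compute the outcome from the final third of the presentation period alone and compare it with the outcome from the full presentation period. The function names are hypothetical, and the input is assumed to contain only the presentation period (no delay).

```python
def outcome(x):
    """0: more first-dimension inputs, 1: more second-dimension inputs, 2: equal."""
    n1 = sum(s[0] for s in x)
    n2 = sum(s[1] for s in x)
    return 0 if n1 > n2 else 1 if n2 > n1 else 2


def is_history_centric(x_pres):
    """True if the final third alone yields a different answer from the full sequence."""
    third = len(x_pres) // 3
    return outcome(x_pres[-third:]) != outcome(x_pres)


early_evidence = [[1, 0], [1, 0], [1, 0], [0, 0], [0, 1], [0, 0]]   # final third points the wrong way
late_agrees = [[1, 0], [0, 0], [0, 0], [0, 0], [1, 0], [0, 0]]      # final third agrees
```

A final third containing no inputs at all also counts as history-centric whenever the full sequence is unbalanced, because the final-third target is then "equal".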
Sub-second task lengths: To identify whether the dependence on cerebellar feedback holds at shorter timescales, we consider cue presentation periods of 0.1–1 s (Fig. 5g). For these simulations there is no delay period and the input sparsity is ρ = 0.5. We apply a finer time discretisation, Δt = 10 ms, and redefine the cortical internal memory α and rescale the network parameters accordingly. The cerebellar network is trained with a time window τ = 3 timesteps in each case.
Delayed association task
In the delayed association task the model must associate one of two 10-dimensional binary inputs at timestep 1 with a desired binary response y at timestep T, where T is the sequence length or “delay” period25. We select T = 15 timesteps but also consider other lengths (Fig. 6j). The task error (as presented in the main text) is defined at the end of the sequence. For stability, we train the network output from 5 timesteps before the end of the sequence onwards (timestep 10 onwards when T = 15).
Zero-mean Gaussian noise is added to the input at each timestep. Model errors are defined by the cross-entropy loss to the target response. Model “selectivity” is defined as the model output (readout) at the dimension of the correct classification (prior to the softmax operation). Unless otherwise stated a cerebellar time window τ = 3 timesteps (≈600 ms when α = 0.1) is used. For both readout and cerebellar feedback models, we apply a softmax operation to the feedback returned to the RNN so as to bound its values between 0 and 1.
To analyse the effects of cerebellar ablation we consider cerebellar ablation within a particular time window between timesteps 8-12 (inclusive) which approximately mirrors the timings in25 (Fig. 6d, e and Fig. S18) and also partial ablation periods during the start, middle, and end of the sequence (Fig. 6f). The specific time windows of these partial ablation periods are timesteps [1-5, 6-10, 11-15], respectively. To improve readability of our results, the mean error presented in the training curves for this task is smoothed using a Savitzky-Golay filter with window length 25 and polynomial order 3.
For this task we consider how the model evolves during a consolidation period (Fig. 7). At the end of each consolidation trial we measure the model error (Fig. 7B), activity (Fig. 7E) and recurrent input (Fig. 7F) over a test set of 1000 randomly generated examples. The activity here is the concatenation of the activity in the cortical RNN and in the hidden layer of the cerebellar network (over all examples and timesteps). We compute the cosine similarity between these activities and the initial activities prior to consolidation; for comparison, we also compute the cosine similarity between the initial activities and a shuffled version of the initial activities (averaged over 100 samples). To analyse how the recurrent input changes, we proceed as follows. At each timestep we consider the cortical RNN state h and cerebellar feedback c. We then compute the cosine similarity between Whh h and , where are the pre-consolidation RNN weights and cerebellar-cortical weights, respectively.
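A minimal sketch of the activity comparison follows. It computes the cosine similarity between two activity vectors and a chance-level reference obtained by shuffling the initial activities. The shuffled reference matters because non-negative activities already yield fairly high cosine similarities by chance. Shuffling all entries of the concatenated vector, and the function names, are our assumptions.

```python
import math
import random


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length activity vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def shuffled_baseline(initial, n_samples=100, seed=0):
    """Average cosine similarity between the initial activities and random shuffles of them."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        perm = initial[:]          # copy, then permute all entries
        rng.shuffle(perm)
        total += cosine_similarity(initial, perm)
    return total / n_samples


initial = [1.0, 2.0, 3.0, 4.0]
baseline = shuffled_baseline(initial)
```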
Control-theoretic estimation of cerebellar feedback
For the delayed association task we analyse cerebellar-to-cortical input from a control-theoretic point of view. In particular, we quantify the effect of plasticity in the pathway between the cerebellar network and cortical RNN () on cortical activations by estimating the energy cerebellar feedback induces in RNN state space63. This level of energy reflects the potency of feedback onto the RNN: a low energy would reveal a suppressed RNN response, whereas a high energy would reveal an amplified response. We speculated that these two cases would arise from a non-optimised and optimised , respectively (Fig. S16a).
As per Kao and Hennequin63, we compute the energy of cerebellar feedback through the controllability Gramian P associated with the RNN dynamics. Informally, P describes the “intrinsic manifold” of the RNN, i.e. the directions in state space that the RNN is most (or least) likely to visit. Formally, given a direction v in state space, the average energy generated along direction v is
$$E(v) = v^{\top} P \, v \qquad (14)$$
In general, the Gramian matrix P is only defined for linear systems. In this work we therefore generalise the notion of controllability to the non-linear RNN dynamics defined in Eq. (1). Here we use the noise covariance matrix Σ in its place, which for linear systems has been shown to be equivalent to the Gramian, Σ = P (ref. 63). Explicitly, we compute Σ as the time-averaged covariance of the RNN hidden activations ht under noisy inputs that follow a Wiener process. That is, where is a set of N samples of RNN states, each of which evolves according to
(15)
In our experiments we use N = 500 samples and simulate Eq. (15). To ignore intrinsic RNN transients that occur at the start of the simulation, we discard the RNN states during the first 5 simulation timesteps when computing Σ. The energy generated from cerebellar feedback is then , where is the normalised direction driven by the cerebellum in RNN state space. We report the energy generated (during the noise dynamics of Eq. (15)) by cerebellar feedback at timestep 10, a time chosen strictly after the initial RNN transient phase (Fig. S16b). For comparison, we also compute the energy generated by 100 random directions sampled from , where I is the identity matrix. To aid interpretability, we then normalise these energies by their highest possible value , i.e. the energy of the input that elicits maximal amplification of the RNN dynamics. This value can be computed as u⊤Σu, where u is the (unit-norm) principal eigenvector of Σ; it therefore equals the largest eigenvalue of Σ.
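The normalisation step can be sketched given a covariance estimate Σ. The energy along a unit direction v is v⊤Σv, and the maximal energy u⊤Σu equals the largest eigenvalue, which we find here by power iteration. Estimating Σ from simulated noisy RNN trajectories is not shown. Starting the power iteration from an all-ones vector is an assumption that fails only if that vector is orthogonal to the principal eigenvector. The helper names are hypothetical.

```python
def matvec(S, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(s_ij * v_j for s_ij, v_j in zip(row, v)) for row in S]


def energy(S, v):
    """Energy v^T S v along the unit-normalised direction v."""
    norm = sum(x * x for x in v) ** 0.5
    u = [x / norm for x in v]
    return sum(u_i * su_i for u_i, su_i in zip(u, matvec(S, u)))


def max_energy(S, iters=500):
    """u^T S u for the principal eigenvector u of a symmetric PSD S (power iteration)."""
    v = [1.0] * len(S)
    for _ in range(iters):
        w = matvec(S, v)
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return energy(S, v)


def normalised_energy(S, v):
    """Energy along v relative to the maximal achievable energy (value in [0, 1])."""
    return energy(S, v) / max_energy(S)


Sigma = [[4.0, 0.0], [0.0, 1.0]]   # toy covariance with one dominant direction
```

For this toy Σ, the dominant direction gives a normalised energy of 1, the weak direction gives 0.25, and the diagonal direction gives 0.625.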
Cerebellum decodes low-signal cortical representations
For the delayed association task we discussed the need for a greater number of hidden cerebellar units (granule cells) to achieve good task performance (Fig. 6j). In particular, we find that the number of granule cells (GCs) required is inversely proportional to the signal-to-noise ratio (SNR) of the RNN hidden neurons.
To estimate SNR(RNN) in the models for the delayed association task (Fig. 6k, left axis), we suppose that the population activity in the RNN can be divided into two components such that f(h) = ζ + ω, where ζ is a task-dependent component that depends on the current task condition s (i.e. left or right stimulus), and ω is a task-agnostic component that does not depend on s (but instead depends on, for example, intrinsic RNN connectivity and noise). The SNR is then defined as the ratio of the variances of these two components: .
We compute the variance of the task-agnostic component as the (average) variance of the population under the same task stimulus s, i.e. . By similarly calculating the total variance , the variance of the task-relevant component is then simply computed as the difference from the total variance, i.e. . To determine the minimum number of granule cells required to decode the stimulus from the RNN activity (Fig. 6k, right axis), we tested whether the cerebellar network could be trained to successfully discriminate the stimulus after 40 training sessions for varying quantities of granule cells (quantities as described below). The cerebellar network was deemed to successfully decode the stimulus if, for at least 4 of the 5 seeds, the average error during the last 4 training sessions was less than 5%.
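The following minimal sketch implements the SNR estimate. The noise variance is the within-stimulus population variance averaged over stimuli, the signal variance is the total variance minus the noise variance, and SNR is their ratio. We assume that "population variance" means the per-neuron variance averaged over neurons, which leaves the ratio unchanged compared with summing over neurons. The decomposition is exact for equal numbers of samples per stimulus. The function names are hypothetical.

```python
def population_variance(samples):
    """Mean over neurons of each neuron's (population) variance across samples."""
    m, n = len(samples), len(samples[0])
    total = 0.0
    for i in range(n):
        vals = [s[i] for s in samples]
        mu = sum(vals) / m
        total += sum((v - mu) ** 2 for v in vals) / m
    return total / n


def estimate_snr(acts_by_condition):
    """acts_by_condition[s]: list of RNN activity vectors f(h) recorded under stimulus s."""
    noise_var = sum(population_variance(a) for a in acts_by_condition) / len(acts_by_condition)
    total_var = population_variance([x for a in acts_by_condition for x in a])
    signal_var = total_var - noise_var        # variance explained by the stimulus
    return signal_var / noise_var


left = [[1.0, 0.0], [3.0, 0.0]]    # toy activity under the left stimulus
right = [[5.0, 0.0], [7.0, 0.0]]   # toy activity under the right stimulus
snr = estimate_snr([left, right])
```

In this toy case the within-stimulus spread is small relative to the separation between stimuli, giving an SNR of 4.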
Cerebellar-thalamic feedback
The thalamus is a key intermediary between the cerebellar output nuclei and the cortex. To demonstrate that our results hold when this brain region is included, we also consider a cortico-cerebellar-thalamic circuit (Fig. S22a) for the delayed association task. In this circuit we model the thalamus as a feedforward network of 4 hidden units. Both the RNN and cerebellar output activities are projected onto the thalamus via random connections (initialised as for the other feedforward layers in the model, such as the cortical readout), which remain fixed throughout learning. The thalamic activity θt is then used to update the cortical dynamics as in Eq. (1) (i.e. replacing ct with θt).
With this model we can not only demonstrate that the cerebellum is still crucial for maintaining task representations within the cortico-cerebellar-thalamic loop, but also replicate experiments in which the thalamus itself is directly shown to impact cortical dynamics. Specifically, we weaken the thalamic output by a factor of 0.25 and observe subdued cortical selectivity, as observed experimentally77.
Comparison with cerebellar-mediated cortical plasticity
To the best of our knowledge, the only other general computational model of the cortico-cerebellar loop is that proposed by Boven et al.31,32. In this model the cerebellum provides the cortex with predicted teaching signals, which mediate the local RNN weight updates. Whilst we believe our model can work in combination with this model (see Discussion), we highlight that that model inherently fails to capture the dependency of cortical dynamics on cerebellar output at fast timescales (e.g. within a single trial).
To demonstrate this, we implemented this model of cerebellar-mediated cortical plasticity in the multi-task learning paradigm and in the delayed association task (Fig. S23). For the multi-task learning paradigm we use two separate cerebellar modules, or “synthesizers”, one for each task context, as in the zero parallel fibre overlap case (cf. Fig. 3). We set the initial learning rate to 0.001 for all parameters in the multi-task paradigm and to 5e-5 for the delayed association task. We used the same number of RNN and cerebellar units as in our model, with truncated backpropagation through time using a truncation size (or “cortical feedback horizon”) of one timestep. In general, whilst these models are capable of learning an individual task, they were unable to replicate the fast cortico-cerebellar dependency observed experimentally or to enable fast switching between different task contexts.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Acknowledgements
We would like to thank the Neural & Machine Learning group, Ellen Boven, James M Shine, Paul Dodson, Everton Agnes, Laureline Logiaco, Jake Stroud, James Bennett and Heike Stein for useful feedback. J.P. was funded by an EPSRC Doctoral Training Partnership award (EP/R513179/1), P.C. by the Wellcome Trust (209453/Z/17/Z) and R.P.C. by the Medical Research Council (MR/X006107/1), BBSRC (BB/X013340/1), EPSRC (EP/X029336/1) and an ERC-UKRI Frontier Research Guarantee Grant (EP/Y027841/1). This work used the HPC system Blue Pebble at the University of Bristol, UK. We would like to thank Dr Stewart for a donation that supported the purchase of GPU nodes embedded in the Blue Pebble HPC system.
Author contributions
J.P. developed the computational framework with the guidance of R.P.C. J.P. performed all numerical and analytical work. J.P. and R.P.C. wrote the manuscript, with contributions from P.C. R.P.C. supervised the project.
Peer review information
Nature Communications thanks the anonymous reviewer(s) for their contribution to the peer review of this work. A peer review file is available.
Data availability
Simulated data used for our experiments can be generated using the scripts at https://github.com/neuralml/ccLoops (10.5281/zenodo.13960780). The experimental data used to test the model predictions are available from the respective papers, as indicated throughout. Source data are provided with this paper.
Code availability
We used the PyTorch library (version 1.7.0) for all neural network models. The code is available at https://github.com/neuralml/ccLoops (10.5281/zenodo.13960780).
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Contributor Information
Joseph Pemberton, Email: jpmbrton@uw.edu.
Rui Ponte Costa, Email: rui.costa@dpag.ox.ac.uk.
Supplementary information
The online version contains supplementary material available at 10.1038/s41467-024-55315-6.
References
- 1. Asaad, W. F., Rainer, G. & Miller, E. K. Task-specific neural activity in the primate prefrontal cortex. J. Neurophysiol. 84, 451–459 (2000).
- 2. Miller, E. K. & Cohen, J. D. An integrative theory of prefrontal cortex function. Annu. Rev. Neurosci. 24, 167–202 (2001).
- 3. Tanji, J. & Hoshi, E. Role of the lateral prefrontal cortex in executive behavioral control. Physiol. Rev. 88, 37–57 (2008).
- 4. Mansouri, F. A., Tanaka, K. & Buckley, M. J. Conflict-induced behavioural adjustment: a clue to the executive functions of the prefrontal cortex. Nat. Rev. Neurosci. 10, 141–152 (2009).
- 5. Banerjee, A. et al. Value-guided remapping of sensory cortex by lateral orbitofrontal cortex. Nature 585, 245–250 (2020).
- 6. Song, H. F., Yang, G. R. & Wang, X.-J. Reward-based training of recurrent neural networks for cognitive and value-based tasks. eLife 6, e21492 (2017).
- 7. Orhan, A. E. & Ma, W. J. A diverse range of factors affect the nature of neural representations underlying short-term memory. Nat. Neurosci. 22, 275–283 (2019).
- 8. Aoi, M. C., Mante, V. & Pillow, J. W. Prefrontal cortex exhibits multidimensional dynamic encoding during decision-making. Nat. Neurosci. 23, 1410–1420 (2020).
- 9. French, R. M. Catastrophic forgetting in connectionist networks. Trends Cogn. Sci. 3, 128–135 (1999).
- 10. Klinzing, J. G., Niethard, N. & Born, J. Mechanisms of systems memory consolidation during sleep. Nat. Neurosci. 22, 1598–1610 (2019).
- 11. Jedlicka, P., Tomko, M., Robins, A. & Abraham, W. C. Contributions by metaplasticity to solving the catastrophic forgetting problem. Trends Neurosci. 45, 656–666 (2022).
- 12. Flesch, T., Saxe, A. & Summerfield, C. Continual task learning in natural and artificial agents. Trends Neurosci. 46, 199–210 (2023).
- 13. Abbott, L. & Svoboda, K. Brain-wide interactions between neural circuits. Curr. Opin. Neurobiol. 65, iii–v (2020).
- 14. Jaeger, H. & Haas, H. Harnessing nonlinearity: Predicting chaotic systems and saving energy in wireless communication. Science 304, 78–80 (2004).
- 15. Maass, W., Joshi, P. & Sontag, E. D. Computational aspects of feedback in neural circuits. PLoS Comput. Biol. 3, e165 (2007).
- 16. Sussillo, D. & Abbott, L. F. Generating coherent patterns of activity from chaotic neural networks. Neuron 63, 544–557 (2009).
- 17. Logiaco, L., Abbott, L. F. & Escola, S. Thalamic control of cortical dynamics in a model of flexible motor sequencing. Cell Rep. 35, 109090 (2021).
- 18. Kao, T.-C., Sadabadi, M. S. & Hennequin, G. Optimal anticipatory control as a theory of motor preparation: A thalamo-cortical circuit model. Neuron 109, 1567–1581 (2021).
- 19. Maass, W., Natschläger, T. & Markram, H. Real-time computing without stable states: A new framework for neural computation based on perturbations. Neural Comput. 14, 2531–2560 (2002).
- 20. Middleton, F. A. & Strick, P. L. Basal ganglia and cerebellar loops: motor and cognitive circuits. Brain Res. Rev. 31, 236–250 (2000).
- 21. Ramnani, N. The primate cortico-cerebellar system: anatomy and function. Nat. Rev. Neurosci. 7, 511–522 (2006).
- 22. Carlson, E. S. et al. Catecholaminergic Innervation of the Lateral Nucleus of the Cerebellum Modulates Cognitive Behaviors. J. Neurosci. 10.1523/JNEUROSCI.2406-20.2021 (2021).
- 23. King, M., Hernandez-Castillo, C. R., Poldrack, R. A., Ivry, R. B. & Diedrichsen, J. Functional boundaries in the human cerebellum revealed by a multi-domain task battery. Nat. Neurosci. 22, 1371–1378 (2019).
- 24. Brissenden, J. A., Tobyne, S. M., Halko, M. A. & Somers, D. C. Stimulus-specific visual working memory representations in human cerebellar lobule VIIb/VIIIa. J. Neurosci. 41, 1033–1045 (2021).
- 25. Gao, Z. et al. A cortico-cerebellar loop for motor planning. Nature 563, 113 (2018).
- 26. Zhu, J., Hasanbegović, H., Liu, L. D., Gao, Z. & Li, N. Activity map of a cortico-cerebellar loop underlying motor planning. Nat. Neurosci. 26, 1916–1928 (2023).
- 27. Deverett, B., Kislin, M., Tank, D. W. & Wang, S. S.-H. Cerebellar disruption impairs working memory during evidence accumulation. Nat. Commun. 10, 1–7 (2019).
- 28. Chabrol, F. P., Blot, A. & Mrsic-Flogel, T. D. Cerebellar contribution to preparatory activity in motor neocortex. Neuron 103, 506–519 (2019).
- 29. Li, N. & Mrsic-Flogel, T. D. Cortico-cerebellar interactions during goal-directed behavior. Curr. Opin. Neurobiol. 65, 27–37 (2020).
- 30. Tanaka, H., Ishikawa, T., Lee, J. & Kakei, S. The Cerebro-Cerebellum as a Locus of Forward Model: A Review. Front. Syst. Neurosci. 14, 19 (2020).
- 31. Pemberton, J., Boven, E., Apps, R. & Costa, R. P. Cortico-cerebellar networks as decoupling neural interfaces. Adv. Neural Info. Processing Syst. 34, 7745–7759 (2021).
- 32. Boven, E., Pemberton, J., Chadderton, P., Apps, R. & Costa, R. P. Cerebro-cerebellar networks facilitate learning through feedback decoupling. Nat. Commun. 14, 1–18 (2023).
- 33. Marr, D. A theory of cerebellar cortex. J. Physiol. 202, 437–470 (1969).
- 34. Babadi, B. & Sompolinsky, H. Sparseness and expansion in sensory representations. Neuron 83, 1213–1226 (2014).
- 35. Litwin-Kumar, A., Harris, K. D., Axel, R., Sompolinsky, H. & Abbott, L. F. Optimal degrees of synaptic connectivity. Neuron 93, 1153–1164 (2017).
- 36. Rössert, C., Solinas, S., D’Angelo, E., Dean, P. & Porrill, J. Model cerebellar granule cells can faithfully transmit modulated firing rate signals. Front. Cell. Neurosci. 8, 304 (2014).
- 37. Chadderton, P., Schaefer, A. T., Williams, S. R. & Margrie, T. W. Sensory-evoked synaptic integration in cerebellar and cerebral cortical neurons. Nat. Rev. Neurosci. 15, 71–83 (2014).
- 38. Bellec, G. et al. A solution to the learning dilemma for recurrent networks of spiking neurons. Nat. Commun. 11, 1–15 (2020).
- 39. Aumann, T. D. Cerebello-thalamic synapses and motor adaptation. Cerebellum 1, 69–77 (2002).
- 40. Audette, N. J., Bernhard, S. M., Ray, A., Stewart, L. T. & Barth, A. L. Rapid plasticity of higher-order thalamocortical inputs during sensory learning. Neuron 103, 277–291 (2019).
- 41. Wang, S. S.-H., Denk, W. & Häusser, M. Coincidence detection in single dendritic spines mediated by calcium release. Nat. Neurosci. 3, 1266–1273 (2000).
- 42. Medina, J. F., Carey, M. R. & Lisberger, S. G. The representation of time for motor learning. Neuron 45, 157–167 (2005).
- 43. Suvrathan, A., Payne, H. L. & Raymond, J. L. Timing rules for synaptic plasticity matched to behavioral function. Neuron 92, 959–967 (2016).
- 44. Shim, H. G. et al. Long-term depression of intrinsic excitability accompanied by synaptic depression in cerebellar Purkinje cells. J. Neurosci. 37, 5659–5669 (2017).
- 45. Rowan, M. J. M. et al. Graded control of climbing-fiber-mediated plasticity and learning by inhibition in the cerebellum. Neuron 99, 999–1015 (2018).
- 46. Suvrathan, A. Beyond STDP—towards diverse and functionally relevant plasticity rules. Curr. Opin. Neurobiol. 54, 12–19 (2019).
- 47. Sanes, J. N., Dimitrov, B. & Hallett, M. Motor learning in patients with cerebellar dysfunction. Brain 113, 103–120 (1990).
- 48. Yamamoto, K., Kawato, M., Kotosaka, S. & Kitazawa, S. Encoding of movement dynamics by Purkinje cell simple spike activity during fast arm movements under resistive and assistive force fields. J. Neurophysiol. 97, 1588–1599 (2007).
- 49. Ebner, T. J. & Pasalar, S. Cerebellum predicts the future motor state. Cerebellum 7, 583 (2008).
- 50. Lanore, F., Cayco-Gajic, N. A., Gurnani, H., Coyle, D. & Silver, R. A. Cerebellar granule cell axons support high-dimensional representations. Nat. Neurosci. 24, 1142–1150 (2021).
- 51. Christie, J. M. & Gaffield, M. A. The cerebellum encodes and influences the initiation and termination of discontinuous movements. bioRxiv https://doi.org/10.1101/2021.06.24.449622 (2021).
- 52. Dacre, J. et al. A cerebellar-thalamocortical pathway drives behavioral context-dependent movement initiation. Neuron 109, 2326–2338 (2021).
- 53. Fasano, A., Laganiere, S. E., Lam, S. & Fox, M. D. Lesions causing freezing of gait localize to a cerebellar functional network. Ann. Neurol. 81, 129–141 (2017).
- 54. Marsden, J. F. Cerebellar ataxia. Handb. Clin. Neurol. 159, 261–281 (2018).
- 55. Perich, M. G., Gallego, J. A. & Miller, L. E. A neural population mechanism for rapid learning. Neuron 100, 964–976 (2018).
- 56. Milak, M. S., Bracha, V. & Bloedel, J. R. Context-dependent modulation of cerebellar nuclear neurons related to the performance of specific movement segments. Soc. Neurosci. Abstr. 20, 1746 (1994).
- 57. Shahshahani, L., King, M., Nettekoven, C., Ivry, R. & Diedrichsen, J. Selective recruitment of the cerebellum evidenced by task-dependent gating of inputs. eLife 13, RP96386 (2024).
- 58. Ogasawara, H., Doi, T., Doya, K. & Kawato, M. Nitric oxide regulates input specificity of long-term depression and context dependence of cerebellar learning. PLoS Comput. Biol. 3, e179 (2007).
- 59. Feulner, B. et al. Small, correlated changes in synaptic connectivity may facilitate rapid motor learning. Nat. Commun. 13, 1–14 (2022).
- 60. Raymond, J. L. & Medina, J. F. Computational principles of supervised learning in the cerebellum. Annu. Rev. Neurosci. 41, 233–253 (2018).
- 61. Ito, M. Control of mental activities by internal models in the cerebellum. Nat. Rev. Neurosci. 9, 304 (2008).
- 62. Wagner, M. J. & Luo, L. Neocortex–cerebellum circuits for cognitive processing. Trends Neurosci. 43, 42–54 (2020).
- 63. Kao, T.-C. & Hennequin, G. Neuroscience out of control: control-theoretic perspectives on neural circuit dynamics. Curr. Opin. Neurobiol. 58, 122–129 (2019).
- 64. Kobak, D. et al. Demixed principal component analysis of neural population data. eLife 5, e10989 (2016).
- 65. Cayco-Gajic, N. A. & Silver, R. A. Re-evaluating circuit mechanisms underlying pattern separation. Neuron 101, 584–602 (2019).
- 66. Stein, H. Why does the neocortex need the cerebellum for working memory? J. Neurosci. 41, 6368–6370 (2021).
- 67. Locke, T. M. et al. Dopamine D1 receptor-positive neurons in the lateral nucleus of the cerebellum contribute to cognitive behavior. Biol. Psychiatry 84, 401–412 (2018).
- 68. Schuessler, F., Mastrogiuseppe, F., Dubreuil, A., Ostojic, S. & Barak, O. The interplay between randomness and structure during learning in RNNs. Adv. Neural Inf. Process. Syst. 33, 13352–13362 (2020).
- 69. Oja, E. A simplified neuron model as a principal component analyzer. J. Math. Biol. 15, 267–273 (1982).
- 70. van Rossum, M. C. W., Bi, G.-Q. & Turrigiano, G. G. Stable Hebbian learning from spike timing-dependent plasticity. J. Neurosci. 20, 8812–8821 (2000).
- 71. Coultrip, R., Granger, R. & Lynch, G. A cortical model of winner-take-all competition via lateral inhibition. Neural Netw. 5, 47–54 (1992).
- 72. Xu, W., De Carvalho, F. & Jackson, A. Conserved population dynamics in the cerebro-cerebellar system between waking and sleep. J. Neurosci. 42, 9415–9425 (2022).
- 73. Lewis, R. F. Context-dependent adaptation of visually-guided arm movements and vestibular eye movements: role of the cerebellum. Cerebellum 2, 123–130 (2003).
- 74. Heald, J. B., Lengyel, M. & Wolpert, D. M. Contextual inference underlies the learning of sensorimotor repertoires. Nature 600, 489–493 (2021).
- 75. Sezener, E. et al. A rapid and efficient learning rule for biological neural circuits. bioRxiv 2021.03.10.434756 (2021).
- 76. Hwang, K., Shine, J. M., Cole, M. W. & Sorenson, E. Thalamocortical contributions to cognitive task activity. eLife 11, e81282 (2022).
- 77. Guo, Z. V. et al. Maintenance of persistent activity in a frontal thalamocortical loop. Nature 545, 181–186 (2017).
- 78. Uusisaari, M. & De Schutter, E. The mysterious microcircuitry of the cerebellar nuclei. J. Physiol. 589, 3441–3457 (2011).
- 79. Muscinelli, S., Wagner, M. & Litwin-Kumar, A. Optimal routing to cerebellum-like structures. bioRxiv (2022).
- 80. Cayco-Gajic, N. A., Clopath, C. & Silver, R. A. Sparse synaptic connectivity is required for decorrelation and pattern separation in feedforward networks. Nat. Commun. 8, 1–11 (2017).
- 81. Oostland, M. et al. Cerebellar acceleration of learning in an evidence-accumulation task. bioRxiv 2021.12.23.474034 (2021).
- 82. Kostadinov, D. & Häusser, M. Reward signals in the cerebellum: origins, targets, and functional implications. Neuron 110, 1290–1303 (2022).
- 83. De Zeeuw, C. I. et al. Microcircuitry and function of the inferior olive. Trends Neurosci. 21, 391–400 (1998).
- 84. Garden, D. L. F., Rinaldo, L. & Häusser, M. Inferior olive to Purkinje cell communication: evolution of microcircuitry. Eur. J. Neurosci. 46, 2640–2656 (2017).
- 85. Ten Brinke, M. M. et al. Encoding of action by the Purkinje cells of the cerebellum. Nat. Neurosci. 22, 1691–1702 (2019).
- 86. Wang, S. S.-H., Kloth, A. D. & Badura, A. Neural circuits for cerebellar control of movement. Annu. Rev. Neurosci. 44, 251–276 (2021).
- 87. Fallon, J. H. & Moore, R. Y. Monoamine innervation of the basal forebrain: V. Dopamine innervation of the basal forebrain, superior colliculus, and ventral tegmental area. J. Comp. Neurol. 222, 507–524 (1984).
- 88. Canto, C. B., Onuki, Y., Bruinsma, B., van der Werf, Y. D. & De Zeeuw, C. I. The sleeping cerebellum. Trends Neurosci. 40, 309–323 (2017).
- 89. De Zeeuw, C. I. & Canto, C. B. Sleep deprivation directly following eyeblink-conditioning impairs memory consolidation. Neurobiol. Learn. Mem. 170, 107165 (2020).
- 90. Corrigan, B. W. et al. Distinct neural codes in primate hippocampus and lateral prefrontal cortex during associative learning in virtual environments. Neuron 110, 2155–2169 (2022).
- 91. Anastasiades, P. G., Collins, D. P. & Carter, A. G. Mediodorsal and ventromedial thalamus engage distinct L1 circuits in the prefrontal cortex. Neuron 109, 314–330 (2021).
- 92. Berardi, N., Pizzorusso, T., Ratto, L. & Maffei, L. Molecular basis of plasticity in the visual cortex. Curr. Opin. Neurobiol. 10, 142–148 (2000).
- 93. Berardi, N., Pizzorusso, T. & Maffei, L. Critical periods during sensory development. Trends Neurosci. 23, 104–111 (2003).
- 94. Hensch, T. K. Critical period mechanisms in developing visual cortex and beyond. Nat. Rev. Neurosci. 6, 877–888 (2005).
- 95. Doya, K. Complementary roles of basal ganglia and cerebellum in learning and motor control. Curr. Opin. Neurobiol. 10, 732–739 (2000).
- 96. He, K., Zhang, X., Ren, S. & Sun, J. Delving deep into rectifiers: surpassing human-level performance on ImageNet classification. Proc. IEEE Int. Conf. Comput. Vis. 1026–1034 (2015).
- 97. Kingma, D. P. & Ba, J. Adam: a method for stochastic optimization. arXiv:1412.6980 (2014).
- 98. Albus, J. S. A theory of cerebellar function. Math. Biosci. 10, 25–61 (1971).
Data Availability Statement
Simulated data used for our experiments can be generated using the scripts at https://github.com/neuralml/ccLoops (10.5281/zenodo.13960780). The experimental data used to test the model predictions are available from the respective papers, as indicated throughout. Source data are provided with this paper.
We used the PyTorch library (version 1.7.0) for all neural network models. The code is available at https://github.com/neuralml/ccLoops (10.5281/zenodo.13960780).