Skip to main content
Nature Portfolio logoLink to Nature Portfolio
. 2024 Nov 28;639(8054):421–429. doi: 10.1038/s41586-024-08433-6

Individual variability of neural computations underlying flexible decisions

Marino Pagan 1,2,, Vincent D Tang 1, Mikio C Aoi 1,3, Jonathan W Pillow 1, Valerio Mante 4,5, David Sussillo 6,7, Carlos D Brody 1,8,
PMCID: PMC11903320  PMID: 39608399

Abstract

The ability to flexibly switch our responses to external stimuli according to contextual information is critical for successful interactions with a complex world. Context-dependent computations are necessary across many domains13, yet their neural implementations remain poorly understood. Here we developed a novel behavioural task in rats to study context-dependent selection and accumulation of evidence for decision-making46. Under assumptions supported by both monkey and rat data, we first show mathematically that this computation can be supported by three dynamical solutions and that all networks performing the task implement a combination of these solutions. These solutions can be identified and tested directly with experimental data. We further show that existing electrophysiological and modelling data are compatible with the full variety of possible combinations of these solutions, suggesting that different individuals could use different combinations. To study variability across individual subjects, we developed automated, high-throughput methods to train rats on our task and trained many subjects using these methods. Consistent with theoretical predictions, neural and behavioural analyses revealed substantial heterogeneity across rats, despite uniformly good task performance. Our theory further predicts a specific link between behavioural and neural signatures, which was robustly supported in the data. In summary, our results provide an experimentally supported theoretical framework to analyse individual variability in biological and artificial systems that perform flexible decision-making tasks, open the door to cellular-resolution studies of individual variability in higher cognition, and provide insights into neural mechanisms of context-dependent computation more generally.

Subject terms: Dynamical systems, Decision, Network models, Cognitive control, Neural encoding


Behavioural experiments to study decision-making in response to context-dependent accumulation of evidence provide testable models that are consistent with the heterogeneity in neural signatures among rats that perform well in trials.

Main

We are often required to use context or top-down goals to select relevant information from a sensory stream, ignore irrelevant information and guide further action. For example, if we hear our name called in a crowded room and our goal is to turn towards the caller, regardless of their identity, information about the location of the sound will drive our actions; but if we wish to respond on the basis of the identity of the caller, the frequencies, in the very same sound, will be most important for driving our actions. As with other types of decision, when the evidence for or against different choices is noisy or uncertain, accumulation of many observations over time is an important strategy for reducing noise1,4,7,8. Here we explore the neural mechanisms that underlie our ability to flexibly accumulate evidence about external stimuli and to switch our response according to contextual information.

We developed a series of experimental and computational techniques to address this question. First, we developed a behavioural pulse-based task in rats to study context-dependent selection and accumulation of evidence for decision-making. Delivering evidence in highly random, yet precisely known pulses provided us with high statistical power to precisely characterize the rats’ behaviour and neural dynamics. Then, using an automated, high-throughput procedure, we trained many rats to solve the task, which enabled us to uncover a surprising degree of variability in the behaviour and neural dynamics across individuals, even when they were all well-trained, high performing animals. Next, we developed a mathematical framework that defined the space of solutions for networks that can implement the required computation. The theoretical framework predicted that variability in position in that solution space, within and across individuals, should be the underlying variable that would jointly drive variability in behaviour and neural responses—implying that behavioural and neural variability should be tightly correlated. Our experimental data robustly confirmed this theoretical prediction. Finally, we developed techniques to engineer artificial recurrent neural networks (RNNs) across the full range of our theoretical solution space and showed that gradient-descent methods, as typically used to train network models, lead to only one corner of the possible data-compatible solutions.

Flexible evidence accumulation in rats

To study the neural basis of context-dependent selection and accumulation of sensory evidence, we trained rats on a novel auditory task in which, in alternating blocks of trials, subjects were cued to determine either the prevalent location (LOC) or the prevalent frequency (FRQ) of a sequence of randomly timed auditory pulses (Fig. 1a). The relative rates of left versus right and high- versus low-frequency pulses corresponded to the strength of the evidence about LOC and FRQ, respectively (Fig. 1b). These relative rates were chosen randomly and independently on each trial, and were used to generate a train of pulses that were maximally randomly timed—that is, having a Poisson distribution. Correct performance requires selecting the relevant feature for a given context, accumulating the pulses of evidence for that feature over time, and ignoring the irrelevant feature. Many rats were trained to good performance on this task using an automated training procedure (Fig. 1c; training code available at https://github.com/Brody-Lab/flexible_decision_making_training) and most rats learned the task in a timespan between two and five months (Extended Data Fig. 2g). After training, rats associated the audiovisual cue presented at the beginning of each trial with the correct task context, and were able to switch between selected stimulus features within four trials of a new context block (Extended Data Fig. 1e). Our task structure was inspired by a previous visual task used with macaques4—major distinctions between the previous and current tasks included the species difference, the sensory modality difference, and the pulse-based nature of our task; this last will be key for the analyses performed below. Despite the important differences across tasks, attained performances were similar across the two species (Extended Data Fig. 1c,f). We reasoned that the highly random yet precisely known stimulus pulses, together with large numbers of trials and subjects, would provide us with statistical power to characterize both behavioural9 and neural responses.

Fig. 1. Rats can perform context-dependent evidence accumulation.

Fig. 1

a, The task. Each trial starts with a sound indicating context (LOC or FRQ), followed by a 1.3-s train of randomly timed auditory pulses. Each pulse is played either from a left or right speaker, and has either low or high frequency (freq.). In LOC trials, subjects must turn, at the end of the stimulus, towards the side that played the higher total number of pulses, ignoring frequency. In FRQ trials, subjects must turn right if there was a higher number of high-frequency pulses (Hi) and left if there was a higher number of low-frequency pulses (Lo). An identical stimulus can be associated with opposite responses in the two contexts. L, left; R, right. b, The stimulus set. c, Logistic fits of psychometric curves for 20 rats after training (more than 120,000 trials for each rat). In the LOC context, choices are mostly affected by location; in the FRQ context, choices are mostly affected by frequency. d, Population activity evolving over time corresponds to a trajectory in a high-dimensional neural space. This trajectory is projected onto axes that optimally encode momentary LOC and FRQ evidence and choice. e, Trajectory of choice-modulated neural activity, projected onto its first two principal components (PC1 and PC2). The trajectory was computed separately for each context, but the principal components were computed in common across contexts. The choice axis was defined as the straight-line fit to the trace from t = 0 to t = 1.3 s. f, Population trajectories from recordings in FOF of rats performing the task. Trajectories are projected onto choice and LOC axes (top row) or choice and FRQ axes (bottom row). Trajectories are sorted by strength of location (top row) or frequency (bottom row). Stim, stimulus. g, Same analysis as in f, for recordings from FEF of macaque monkeys performing an analogous visual version of the task, with motion and colour contexts4.

Extended Data Fig. 2. Training procedure.

Extended Data Fig. 2

a) Stage 1: rats are trained only on the location task, with strong location evidence and no frequency evidence (pulses consist of superimposed low and high frequency). The context cue is played before each trial. b) Stage 2: rats learn to alternate between the location and frequency context. In the frequency context rats are presented with strong frequency evidence and no location evidence (stereo pulses). c) Stage 3: introduction of pulse modulation. In the frequency context, pulses are now presented on either side (but with no prevalent side). In the location context, pulses are either high-frequency or low-frequency (but with no prevalent frequency). d) Stage 4: irrelevant information is introduced, but the relevant information is always at maximum strength. e) Stage 5: relevant information can have intermediate strength. f) Stage 6: relevant information can have low strength. g) Training progression. Most rats learn stages 1-3 in approximately 2 weeks, but it takes a much longer time to learn stages 4-6 because of the introduction of irrelevant evidence. The feature selection index quantifies whether rats attend to the correct feature and ignore the irrelevant feature (see methods). The black dashed line indicates chance, the red dashed line indicates the threshold performance to consider a rat trained. Most rats learn the task within 2-5 months.

Extended Data Fig. 1. Comparison of rat task and monkey task.

Extended Data Fig. 1

a) In the rat task, the subject is cued using an audiovisual stimulus, and is presented with a train of randomly-timed auditory pulses varying in location and frequency. In different contexts, the subject determines the prevalent location or the prevalent frequency of the pulses. b) Stimulus set for the rat task: strength of location and prevalent frequency are varied independently on each trial. c) Psychometric curves for the rat task (n = 20 rats). d) In the monkey task, the subject is cued using the shape and color of a fixation dot, and is presented with a field of randomly-moving red and green dots. In different contexts, the subject determines the prevalent color or the prevalent motion of the dots. e) Stimulus set for the monkey task: strength of motion and prevalent color are varied independently on each trial. f) Psychometric curves for the monkey task (n = 2 macaque monkeys). g) Rats rapidly switch between contexts. Performances saturate within the first 4-5 trials in the block. The weight of location and frequency evidence is computed using a logistic regression (see methods). Thin lines indicate individual rats, thick lines indicate the average across rats. h) Full matrix of behavioral performances for one example rat across the two contexts.

To compare neural dynamics in a decision-making region across monkeys and rats, we examined neural activity in the frontal orienting fields (FOF) while rats performed our task. The FOF are a rat cortical region that is thought to be involved in decision-making for orienting choice responses10,11, and have been suggested as homologous or analogous to macaque frontal eye fields (FEF)10,12,13, which are the cortical region recorded in the previous monkey task4. Consistent with a key role for the FOF in our task, bilateral optogenetic silencing of rat FOF demonstrated that they are required for accurate performance of the task (Extended Data Fig. 4; n = 3 rats). We implanted tetrodes into the FOF and into another frontal region, the medial prefrontal cortex (mPFC), and we recorded from n = 3,495 putative single neurons during n = 199 sessions from n = 7 rats while they performed the task shown in Fig. 1. As with previous reports in frontal cortices of macaques and rodents, we found that task-related firing rates were highly heterogeneous across neurons. We then carried out the same analysis that had been applied to the (also heterogeneous) neurons recorded from monkey FEF4,14, and found strong qualitative similarities across the two species (compare Fig. 1f,g). The analysis, known as targeted dimensionality reduction (TDR) begins by describing neural population activity at a given moment in time as a point in ‘neural space’, where each axis represents the firing rate of one of the N recorded neurons. As activity evolves over the duration of a trial, a trajectory in N-dimensional neural space is traced out (Fig. 1d). Following ref. 4, neurons recorded separately in different sessions were combined into a single time-evolving N-dimensional neural vector. This ‘pseudo-population’ activity was averaged across trials with a given generative pulse rate (that is, within each of the 36 blocks in Fig. 1b), for each of the two contexts, and for each of the subject’s choices. These trajectories were projected onto the orthogonalized linear subspaces that best predicted the subject’s choice, momentary location evidence or momentary frequency evidence (illustrated as different axes in Fig. 1d). We found that trajectories for different evidence strengths were clearly separated along the axis of each sensory feature (see separation of traces along the vertical axes of the panels of Fig. 1f; only correct trials are shown). This was true regardless of whether the feature was relevant or irrelevant (compare vertical separation for left versus right columns in Fig. 1f). A similar observation in the monkey data (Fig. 1g) previously led to the conclusion that irrelevant feature information was not gated out from reaching frontal cortex4; the same conclusion applies to our rat data. Next, we present a theoretical analysis that applies equally to this scenario (no gating of irrelevant information before reaching frontal cortex), as well as to alternative mechanisms that rely on early gating, an aspect we return to in the discussion. Overall, the marked qualitative similarity between the rat (Fig. 1f) and monkey (Fig. 1g) traces suggests that the underlying neural mechanisms in the two species may be similar enough that an active exchange of ideas between studies in the two species will be very fruitful.

Extended Data Fig. 4. Electrophysiology and optogenetics techniques.

Extended Data Fig. 4

a) 64-channel custom-made multi-tetrode drive, allowing independent movement of 16 tetrodes. This drive was used in one rat for wired recordings. b) 128-channel custom-made multi-tetrode drive, allowing independent movement of 4 bundles with 8 tetrodes each. This drive was used in six rats for wireless recordings. c) Device for wireless optogenetic perturbation. In the implant, two chemically sharpened optic fibers targeting both hemispheres are attached using optical glue to two laser diodes. The laser diodes are controlled independently by a control board, which communicates wirelessly with the computer controlling the behavior. The control board can be attached/detached using a microUSB connector. d) Example rat with wireless electrophysiology implant and headstage. e) Example rat with wireless optogenetic implant and control board. f,g) Result of inactivation of FOF. 3 rats expressed AAV2/5-mDlx-ChR2-mCherry and were stimulated with blue light (450 nm, 25mW) for the full duration of the stimulus. f) Result of unilateral inactivation on rats’ choices as a function of strength of relevant evidence (averaged across the two contexts). Activation of each laser was randomized across trials. g) Result of bilateral FOF inactivation on rats’ choices as a function of strength of relevant evidence (averaged across the two contexts). h,i) Example responses of single units recorded in FOF (h) and in mPFC (i). Shown are the peri-stimulus time histograms of responses for correct trials, averaged according to context and choice. Units in both areas exhibit significant heterogeneity and large modulation according to combinations of the rat’s upcoming choice and the current context. The dashed vertical lines indicate the beginning of the pulse-train stimulus presentation, the end of the pulse-train stimulus presentation, and the average time when the rat performed a poke in one of the two side ports to indicate his choice. Shaded areas indicate standard errors.

Using model-based TDR analysis14, we found the two-dimensional subspace that best accounts for the contribution of the animal’s choice to the neural activity (accounting for 81.3% of the variance). We then projected the kernel-based estimates of ‘go-right’ and ‘go-left’ trajectories (which are noise-reduced versions of the raw trajectories) (Extended Data Fig. 5h) onto it (Fig. 1e). During the stimulus presentation (t = 0 to t = 1.3 s, a period during which subjects must accumulate sensory evidence), this choice-related information in firing rates evolved along an essentially one-dimensional straight line in neural space (accounting for 73.3% of the variance), only later curving into a second dimension (see Extended Data Fig. 5 for per-animal analysis). This is consistent with previous findings, with the initial linear phase having been suggested as corresponding to gradual evidence accumulation, whereas the subsequent rotation may correspond to formation of a motor plan15,16, perhaps after commitment to a decision14,17. We will focus on evidence accumulation during this linear phase, while the decision is being formed, and will refer to the corresponding line in neural space as the ‘choice axis’: the animal’s upcoming choice can be predicted from position on this axis. Crucially, both correct and incorrect trials are used for this analysis, allowing to separate this choice-predictive signal from responses to sensory stimuli. In a final similarity with the monkey data, we found that the choice axes, estimated separately for each of the two contexts, were essentially parallel (average angle between contexts = 1.6°; not significantly different from 0 (P > 0.1) for 6 out of 7 rats; Methods). Consequently, in the theoretical development below we will assume that the direction of the choice axis is the same in the two contexts. However, this simplifying assumption can be relaxed, as addressed in the discussion and detailed in Extended Data Fig. 10.

Extended Data Fig. 5. Choice-related dynamics, computed independently for each rat, and across the two contexts.

Extended Data Fig. 5

For each rat, the horizontal and vertical axes in the two subpanels are the same across the two panels, and are computed using data from both contexts. In panels a-g, the dynamics in each context are computed using the choice kernels of the pulse-based regression (see Fig. 1.1 in Extended Discussion). The kernels provide a regularized, noise-reduced version of the raw trajectories (which are shown for Rat 1 in panel h). The black dot indicates the time of the start of stimulus presentation (t = 0), the purple dots indicate the end of stimulus presentation (t = 1.3s). The line indicates the choice axis computed in the given context, and above the panels is indicated the angle between the choice axes computed across the two contexts.

Extended Data Fig. 10. Extension of the theory to the general case with context-dependent line attractors.

Extended Data Fig. 10

a) Rewriting of the equation describing the differential integration of a pulse across contexts (Equation 1) after the assumption that the line attractor is parallel across the two contexts is dropped. In this equation, the first three terms correspond to the same terms as in Equation 2, with addition of a fourth term, which captures changes in the direction of the line attractor along the average input direction. (b) Graphical intuition of the four solutions in the general case where the line attractor is not parallel across the two contexts. Top left: changes of the input along the direction of the line attractor (“direct input modulation”). Top right: changes of the input along a direction orthogonal to the line attractor (“indirect input modulation”). Bottom left: changes of the direction of the line attractor across the two contexts. Bottom right: changes of the component of the selection vector orthogonal to the line attractor across contexts (“selection vector modulation”).

Three components underlie task solutions

It has long been hypothesized that neural dynamics around the choice axis are well approximated by a line attractor18,19—that is, that the choice axis is formed by a closely packed sequence of stable points. This follows from the idea that the position of the system on the choice axis corresponds to net accumulated evidence towards right versus left choice; in temporal gaps between pulses of evidence, an accumulator must be able to stably maintain accumulated values, and thus position anywhere along this axis should be a stable point. We now develop theoretical implications of this computation-through-dynamics18 line attractor hypothesis, which lead to a new description of the space of possible network solutions consistent with the hypothesis, and to new experimental predictions that we find to be robustly supported by the data.

A key implication of the line attractor hypothesis, which follows from linearized approximations of the dynamics of the system, is that a sensory stimulus pulse that perturbs the system along direction i has a net effect on position along the choice axis4,19 given by the dot product of that input vector i and the ‘selection vector’ s. That is, the change in choice axis position is equal to s ⋅ i (Box 1). Thus, in the linear dynamics approximation, and under the line attractor hypothesis, the simple dot product s ⋅ i summarizes the result of the interaction of local recurrent dynamics (represented by s) with a pulse of external input (represented by i).

It follows that for a pulse of evidence to have a greater effect on choice in the context in which it is relevant than when it is irrelevant, s ⋅ i must be greater in the relevant context than in the irrelevant context. The recurrent dynamics in the decision-making region could be different in the two contexts; similarly, context-dependent modulation of early sensory responses2024 could lead the direction i along which a pulse of a given feature perturbs the system to be different in the two contexts. Thus, indicating relevant versus irrelevant context with a subscript (sREL versus sIRR and iREL versus iIRR for relevant versus irrelevant, respectively), the general condition for a given feature’s input pulse to have greater effect on choice when relevant versus irrelevant is:

Δ(si)=sRELiRELsIRRiIRR>0

where Δ indicates difference across contexts. For each of the features being considered (in our experiments, LOC and FRQ), this difference Δ(s ⋅ i) can be rewritten as the sum of three components (Fig. 2c).

Δ(si)=12(sLOC+sFRQ)s¯(iLOCiFRQ)Δi+12(iLOC+iFRQ)i¯(sLOCsFRQ)Δs=s¯ΔiInputmodulation+Δsi¯Selectionvectormodulation 1
s¯ΔiIndirectinputmodulation+s¯ΔiDirectinputmodulation+Δsi¯Selectionvectormodulation 2

where the overbar symbol represents the average over the two contexts, Δ represents difference between the two contexts, and Δi and Δi represent the component of Δi that is orthogonal and parallel to the choice axis, respectively. For any given feature (here, either LOC or FRQ), and for any given network that solves the task (and thus has Δ(s ⋅ i) > 0), the percentage that each of the components contributes to the total Δ(s ⋅ i) can be visualized in terms of distances from the vertices of a triangle—that is, a point in barycentric coordinates (Fig. 2g). We emphasize that all positions on the triangle have Δ(s ⋅ i) > 0 and thus all describe solutions; the different positions describe variations across networks that embody different solutions for the task. This will be a key aspect to understanding variability across different individuals that all solve the task. Indirect input modulation (IIM), the first component in equation (2), is what follows if the difference across contexts is a change in the input vector i, with the change orthogonal to the line attractor. The direct input modulation (DIM), the second component, follows from change in the input i that is parallel to the line attractor. Selection vector modulation (SVM), the third component, follows from a change in the selection vector s that represents the recurrent dynamics in the decision-making region itself. The manner in which each of the components of equation (2) lead to a greater change in line attractor position for the relevant context than for the irrelevant context is illustrated in Fig. 2d–f.

Fig. 2. Context-dependent evidence selection can be dissected into three components.

Fig. 2

a, The stimulus provides a train of go-left (down arrow) versus go-right (up arrow) pulses of LOC evidence (top) and FRQ evidence (bottom). Pulses of relevant evidence must move the system’s position along the choice axis, whereas irrelevant evidence should have negligible effect. b, The final effect of a single pulse of evidence is equal to the dot product of the selection vector sREL and the input vector iREL. In the irrelevant context, the pulse effect equals sIRR ⋅ iIRR. c, To solve the task, relevant evidence must have a larger effect than irrelevant evidence. This can be rewritten as the sum of three components, spanning the space of possible solutions. Δ indicates difference across contexts; bar indicates mean across contexts. d, The IIM is a change in input across contexts, orthogonal to the choice axis. Bottom left, the projection onto the choice axis is initially identical across contexts, differing only after the relaxation dynamics. Bottom right, the differential pulse response (the difference across contexts in the projection onto the choice axis of the response to a pulse) increases gradually from zero. e, The SVM describes changes across contexts in the recurrent dynamics. As in d, the differential pulse response is initially zero and increases only after the relaxation dynamics. f, The DIM is a change in the input vector parallel to the choice axis. In contrast to d,e, the differential pulse response is non-zero immediately upon pulse presentation. g, Top, all recurrent networks that solve the task can be expressed as a weighted sum of three components and can therefore be mapped inside a triangle with barycentric coordinates. Bottom, the vertical axis quantifies how quickly the differential pulse response diverges from zero. A second axis (oblique line) captures how much the network relies on context-dependent modulation of inputs versus context-dependent modulation of recurrent dynamics.

Different positions on the triangle of Fig. 2g are not merely distinct mathematically; they have different, and important, biological implications. First, where a network solution lies along the ‘change in inputs’ versus ‘change in dynamics’ tilted axis in Fig. 2g has important anatomical implications. For networks at the ‘change in dynamics’ corner, the anatomical locus of context-dependence must be in decision-making regions, as it is the recurrent dynamics of these regions that differ across contexts. By contrast, for networks at the ‘change in inputs’ end of the axis, the anatomical locus of context-dependence could be outside decision-making regions—for example, it could lie in modulation of responses in sensory regions2024 or in modulation of the pathways from sensory to decision-making regions25. Second, where a network solution lies along the vertical ‘fast’ versus ‘slow’ axis in Fig. 2g has both neural and behavioural implications. We describe the neural implications first. Networks at the slow end of the axis have 0% DIM—that is, they are all mixtures of IIM and SVM. For both IIM and SVM, the projection of the position of the system onto the choice axis immediately after a pulse of evidence is the same for the two contexts, and the difference across contexts develops only gradually (Fig. 2d,e, ‘differential response’). By contrast, networks at the fast end of the axis are 100% DIM, and for these a difference across contexts in the projection onto the choice axis is immediate (Fig. 2f). It is in this sense that neural context-dependence effects on the choice axis are fast at the DIM end of the axis, and slow at the base of the axis (s.v.m or i.i.m). If behavioural choices are driven by the position of the system on the choice axis, it follows that solution diversity on this axis will produce consequent behavioural diversity; we examine this idea further in Fig. 5.

Fig. 5. Theory predicts and experimental data confirm that variability in the neural slope index should explain variability in the behavioural slope index.

Fig. 5

a,b, Schematics of the theoretical reasoning. a, For a network using mostly DIM, there is immediate and sustained separation along the neural choice axis between relevant and irrelevant pulses. Thus the differential effect (across contexts) of a pulse on choice does not depend on whether the pulse is presented early (left) or late (middle) relative to choice commitment. The temporally flat differential pulse response of the neurons thus results in a temporally flat differential behavioural kernel (right). b, By contrast, for a network using SVM or IIM, pulses have a differential effect on choice only after relaxation dynamics. Pulses presented well before choice commitment have a substantially different effect on choice across contexts (left), whereas pulses presented immediately before choice commitment have no time to induce a differential impact (middle). Gradually diverging neural differential pulse responses thus result in gradually converging differential behavioural kernels (right panel). c, Data from n = 30 engineered RNNs spanning the vertical axis of the barycentric coordinates (colours as Fig. 3b). Left, examples of neural differential pulse kernels (as in Fig. 3e–h), each from a single RNN. Bottom, examples of differential behavioural kernels (as in Fig. 4). RNN models follow the theoretical prediction, with anti-correlated slope indices for neural differential pulse kernels and differential behavioural kernels. d, Experimental data (conventions as in c). Data follow the theoretical prediction, with anti-correlated slope indices for behavioural and neural measures. Shapes of individual data points indicate LOC and FRQ features for each of the n = 7 rats. Error bars are centred around the mean and indicate bootstrapped standard errors.

Two parenthetical remarks follow from the algebraic rewriting in equation (2). First, early gating out of irrelevant information (iIRR = 0) is a special case within this framework, and can be either DIM (example 1 in Supplementary Discussion) or IIM (example 2 in Supplementary Discussion). Second, the direction of the line attractor enters the rewriting only in the step from equation (1) to equation (2), when distinguishing IIM versus DIM. This is because this step describes Δi as the sum of a component orthogonal and a component parallel to a particular reference direction that is fixed across the two contexts; here, this reference is the direction of the line attractor. We focus here on the case where the line attractor direction is the same in the two contexts for simplicity and because it is what we found in our rat data (Fig. 1) and what was found in the monkey data4. However, equation (2) can be extended to the case of line attractors that are not parallel across the two contexts15 (Discussion and Extended Data Fig. 10).

Box 1 Dynamics around line attractors.

Linearized dynamics around a fixed point in neural space can be represented by drdt=Mr, where M is a matrix and r is a vector that represents the system’s position in neural space relative to the fixed point.

The eigencoordinates e, defined by e ≡ V−1 ⋅ r, where the columns of V are the eigenvectors of M, can also be used to describe these dynamics. The advantage of eigencoordinates is that each element j of the vector e evolves over time independently of the others, following ej(t)=ej(t=0)exp(λjt), where λj is the eigenvalue corresponding to the jth eigenvector.

For a line attractor, one eigenvalue (by convention the one with index j = 0) has value 0 (λ0 = 0) and consequently e0(t)=constant=e0(t=0). All other eigenvalues have a negative real part, implying that their corresponding eigencoordinates decay to zero over time, as the system state relaxes back onto the line attractor. Thus, if an external input pulse i perturbs the system off the line attractor onto position r(t = 0) = i, it follows that, after the transients in which eigencoordinates j > 0 decay to zero, the new position on the line attractor, relative to the starting fixed point, will be given by e0(t) = e0(t = 0), since this will be the only non-zero eigencoordinate.

The zeroth eigencoordinate of the initial position, e0(t = 0), will be the dot product of the the top row of V, which we label as the selection vector s and the input vector i (refs. 4,19):

e(t=0)=V1r(t=0)=V1i

which implies that net motion along the line attractor caused by an input pulse i is equal to s ⋅ i.

Pulse analyses distinguish solutions

Artificial model networks can be used to illustrate approaches to solving the task. To find networks with many individual heterogeneous units, as observed in the experimental data (see for example, Extended Data Fig. 4), Mante et al.4 trained RNNs to perform the task. Using the analyses of Fig. 1d–g, they observed important similarities between the neural trajectories in the experimental data and in the trained RNNs. Upon analysing the linearized dynamics of the RNNs, they found that the trained RNNs solved the task using SVM. This prompted their influential suggestion of SVM as the leading candidate for how the brain implements context-dependent decision-making. What was unappreciated at the time was that the linearization that they used (‘activation space’ linearization; see Supplementary Information, ‘Linearizing RNN dynamics in firing rate space versus activation space’) precluded observing input vector modulation (whether direct or indirect) for the type of inputs used in their networks26. We therefore repeated their analysis, but using a linearization (‘firing rate space’ linearization) that does permit observing input vector modulation in these RNNs27. Starting from randomly chosen initial network weights, we trained many RNNs to solve the task, analysed their linearized dynamics, and using equation (2), plotted the position of each RNN in barycentric coordinates. The results with the new linearization at first sight confirmed the essence of the conclusion of Mante et al.4, namely, that the trained RNN solutions are densest near the SVM corner at bottom left (Fig. 3a). However, the insight in equation (2), together with our choice of linearization in firing rate space, also allowed us to engineer RNNs that solve the task and lie at any chosen point within the barycentric coordinates (Methods)—that is, we are no longer constrained to exclusively examine the set of RNN solutions that are produced through training. Surprisingly, we found that SVM is not required to produce trajectories such as those in Fig. 1f,g. Instead, network solutions at any point within the barycentric coordinates, not only those close to the SVM corner, produce traces that are qualitatively similar to the experimental data (see Fig. 3d and Extended Data Fig. 6). This suggests that analyses such as the one in Fig. 1f,g, which averages trials within each stimulus block in Fig. 1b, cannot readily distinguish between different solutions across the barycentric coordinates of Fig. 2g—a space that, as described above, spans all possible solutions that are consistent with the choice axis being parallel across the two contexts.

Fig. 3. Backpropagation-trained RNNs explore a subset of possible solutions, whereas engineered RNNs span the full solution space, matching heterogeneity in experimental data.

Fig. 3

a, Distribution of 1,000 RNNs trained using backpropagation through time: networks favoured SVM, as found in Mante et al.4. b, RNNs can be engineered to lie anywhere in the space of solutions (Extended Data Fig. 6), including, as shown here, the vertical axis, from 0% to 100% DIM. cf, Each row analyses a single trained RNN, with different rows having different DIM percentages, as indicated in b. c, Networks across the 0 to 100% DIM axis perform the task with psychometric curves qualitatively similar to experimental data (Fig. 1c). d, All of the networks have neural activity that produces TDR traces that are qualitatively similar to the experimental data (compare with Fig. 1f,g). e, In contrast to c,d, differential pulse responses (as in Fig. 2d–f) distinguish the different RNNs. f, Estimation of the differential pulse responses using kernel regression methods applicable to experimental data (Methods) match the calculated differential pulse responses from d. g, The slope index (Methods) quantifies the slope of the traces. Applied to the estimated differential pulse responses in e, it has a monotonic relationship with DIM percentage, and therefore can be used as a proxy measure for DIM percentage. h, Differential pulse responses estimated from experimental data for each of the FRQ (bottom) and LOC (left) features, with the corresponding parallel indices plotted against each other (top right). Arrows point to the parallel index value of each of the examples shown. Error bars indicate bootstrapped standard errors. Data from n = 7 recorded rats.

Extended Data Fig. 6. Engineered recurrent neural networks (RNNs) across the entire solution space (Fig. 2g) all qualitatively reproduce rat TDR trial-based dynamics, but are distinguished by pulse-based analysis.

Extended Data Fig. 6

a) Architecture of the RNNs. (b) TDR analysis (orange frame) and pulse-based analysis (purple frame) applied to RNNs generated to span different points within the solution space, as indicated by the RNN symbol on the barycentric coordinates. The TDR analysis and the pulse-based analysis of one RNN at each position are shown, connected to their RNN position by the arrow. For the position at the very center of the triangle, three different RNNs at that position, trained by starting from different random initial weights, are shown. All RNNs qualitatively reproduce rat TDR trial-based dynamics. The variability of trial-based TDR seen across RNNs is not predictive of the position within the solution space, and even RNNs generated from the same point can produce variable TDR trajectories. In contrast, the estimated pulse-triggered response reliably indicates the position of RNNs along the vertical axis of the solution space.

By contrast, the descriptions of the three components illustrated in Fig. 2d–f suggest that analysing the response of the system to pulses of evidence would better distinguish different solutions—an analysis that our pulse-based task is well suited to. A full characterization would require an estimate of each of the dynamics selection vectors sREL and sIRR, which unfortunately are not directly observable. Nevertheless, the direction of the choice axis is straightforwardly estimated (Fig. 1), making the projection of the system’s state onto the choice axis a readily assayed measure. Figure 2d–f, bottom right shows that the difference across contexts of the time evolution of this projection (the differential pulse response) can serve as an assay of the percentage of DIM in the solution because it can distinguish solutions along the fast versus slow axis of Fig. 2g. This is illustrated in Fig. 3 using engineered RNNs, for which we can analytically compute their position on the barycentric coordinates (Fig. 3b) and can also directly measure the differential pulse response (Fig. 3e and Methods). As a summary of the temporal shape of the differential pulse response, we use the slope of a straight-line fit to it (slope index; Methods); the smallest slope index corresponds to Fig. 3f, top, and the largest slope index corresponds to Fig. 3f, bottom. Figure 3g confirms that in the RNNs, this slope index can be used as a measure of a network solution’s position on the fast versus slow axis. On the basis of previous approaches28, we developed kernel-based regression methods to measure the differential pulse response from neural activity recorded experimentally, and validated these methods in the RNNs (compare Fig. 3e,f). We then applied them to experimental data from each of our seven rats and for each of the LOC and FRQ features (Fig. 3h). Of note, we did not find that a particular slope index consistently characterized solutions across rats. Instead, there was high variability across rats in this measure, and even across features within a single rat; no apparent correlation between the LOC and FRQ slope indices was visible (Fig. 3h, top right).

Linking neural and behaviour variability

A widespread hypothesis in the field is that behavioural choices are driven by the system’s position on the choice axis2931. If this is correct, then fast versus slow context-dependent effects on the choice axis, as produced by large versus small DIM percentages (Fig. 3e,f), should have corresponding behavioural correlates. To assess the effect on behavioural choices of pulses at different times of a trial, we used logistic regression to compute behavioural kernels for LOC and FRQ evidence in each of the two LOC and FRQ contexts; each these kernels is a measure, from behavioural data, of the relative weight that evidence presented across different time points of a trial has on the subject’s choices (Methods). For a given feature, either LOC or FRQ, we refer to the difference across contexts as the differential behavioural kernel (panels along vertical and horizontal axes of Fig. 4; Extended Data Fig. 3). The shape of an individual’s differential behavioural kernel for one feature did not appear to predict the shape of the kernel for the other feature (Fig. 4, top right), similar to our finding with the neural differential pulse responses (Fig. 3h). Nevertheless, the theory predicts that neural differential pulse responses and differential behavioural kernels should be tightly linked. Figure 5a,b illustrates the concept. We use the simplifying assumption that the neural differential pulse response (Fig. 3) does not depend on time within a trial or on previously presented evidence (data supporting this assumption are in Extended Data Fig. 7g). If T is the time at which position on the choice axis is read out to commit to a right versus left choice, then the context-dependent difference in the impact on behavioural choices of a pulse at time t will follow the neural differential pulse response at an interval T − t after the pulse. For DIM, with a differential pulse response that is immediate and sustained (Fig. 3e,f, top), the differential behavioural effect of a pulse should then be the same whether it is presented close to or long before the choice commitment time T, producing a flat differential behavioural kernel (that is, slope index = 0; Fig. 5a). However, for SVM or IIM with differential pulse responses that grow only gradually from zero (Fig. 3e,f bottom), the differential behavioural effect of a pulse will be small if presented shortly before choice commitment, and larger if presented longer before. This should result in a converging differential behavioural kernel (slope index > 0; Fig. 5b). In other words, the shape of the differential behavioural kernel should be the reflection on the time axis of the differential pulse response. These two very different types of measures—behavioural versus neural—are thus predicted to have the same slope index (but with opposite sign). We tested this prediction on RNNs engineered to solve the task using different amounts of DIM As predicted, the slope indices of the two different measures were tightly anti-correlated (Fig. 5c). We then tested whether a similar relationship existed for the rats’ behavioural and neural experimental data. To avoid any spurious correlations, we used different sets of pulses to assay each measure: we used pulses from the first half of the stimulus to measure the neural differential pulse response, and pulses from the second half of the stimulus to measure the differential behavioural kernel. We found robust support in the data for the theoretical prediction that the two measures should be correlated (Fig. 5d, r = −0.73, P < 0.01), with the correlation also holding for LOC evidence alone (r = −0.71, P < 0.1) or for FRQ evidence alone (r = −0.71, P < 0.1). Thus, although there is no correlation within the neural measure (Fig. 3h) or within the behavioural measure (Fig. 4), and although the two measures were assayed on entirely different sets of pulses, the theoretical prediction that they should be strongly correlated was confirmed (Fig. 5d). These results support both the overall theoretical framework, which was built around the line attractor hypothesis for the choice axis from which behaviour is read out, and the idea that variability in a solution’s position in the barycentric coordinates of Fig. 2g is the common source underlying and explaining the neural and behavioural variability in Figs. 3h and 4.

Fig. 4. Differential behavioural kernels show substantial heterogeneity across and within subjects, even when all subjects perform the task well.

Fig. 4

Behavioural kernels are a behaviour-based measure of how much weight the pulses from different time-points within a trial have on a subject’s decision (Methods and Supplementary Fig. 1). For a given feature, the differential behavioural kernel, shown here, is the difference in the behavioural kernel when that feature is in its relevant versus irrelevant context. Time axes run from the start of the auditory pulse trains (t = 0) to their end (t = 1.3 s). Figure conventions as in Fig. 3h, but the data here are behavioural, not neural. n = 18 rats.

Extended Data Fig. 3. Behavioral data for all rats.

Extended Data Fig. 3

Rat ID color indicates whether rat was used for electrophysiology (red), optogenetics (cyan) or only for behavior (black). a) Psychometric curves for frequency evidence, measuring the fraction of right choices as a function of strength of frequency evidence (6 levels of strength, see Fig. 1b). Green indicates frequency context (relevant), purple indicates location context (irrelevant). b) Weights for frequency evidence computed using the behavioral logistic regression for each rat (see Fig. 1d); colors as in panel a. c) Differential behavioral kernel for frequency evidence across all rats. d) Psychometric curves for location evidence, measuring the fraction of right choices as a function of strength of location evidence (6 levels of strength, see Fig. 1b). Green indicates location context (relevant), purple indicates frequency context (irrelevant). e) Weights for location evidence computed using the behavioral logistic regression for each rat (see Fig. 1d); colors as in panel d. f) Differential behavioral kernel for location evidence across all rats. Shaded areas indicate bootstrapped standard errors. g) The slope index computed from behavioral trials in the first half split is highly correlated with the slope index computed using the second half split (r = 0.58; p = 0.000013). The significance of the correlation was computed using the Student’s t distribution. h) Psychometric curves can be predicted with high precision from the weights of the logistic regression. Data are shown from the seven rats used for electrophysiology recordings.

Extended Data Fig. 7. Validation of pulse regression method.

Extended Data Fig. 7

a) Example application of the pulse regression to one example recorded unit. (b) Fraction of explained variance as a function of firing rate across all recorded units. (c,d) The pulse-regression kernels provide an accurate estimate of the response to a single isolated pulse. In (c) are shown the responses to a single isolated pulse of either location or frequency evidence in both contexts for an example RNN unit. In (d) are shown the estimates of these pulses from the dynamics of the RNN solving the task with regular trials featuring many consecutive pulses presented at 40 Hz. (e) Comparison of the direction of the true line attractor (computed by finding the RNN’s fixed points, see methods) with the choice axis estimated by the trial-based regression (Fig. 1f,g) and the pulse-based regression (Fig. 3). The choice axis closely approximates the direction of the true line attractor. (f) Kernels estimated using the assumption of gaussian noise closely approximate those estimated using the assumption of Poisson noise. Kernels are shown here for one example neuron. (g) Prediction accuracy does not improve when two separate kernels are computed for the early portion of the stimulus and the late portion of the stimulus. Here is shown the improvement in cross-validated prediction accuracy across all recorded neurons when using two separate kernels as compared to using a single kernel throughout the stimulus. The significance was evaluated using a two-tailed paired-sample t-test (p > 0.1). (h) Population pulse responses for two example rats, and corresponding differential pulse-triggered kernels for the 10 individual neurons with largest contributions to the choice axis.

Discussion

An influential conceptual approach known as ‘computation through dynamics’18,19,32 has posited that an understanding of neural activity from a mathematical dynamical systems perspective will enable explanation of high-level phenomena such as cognition. Our work supports this view: starting from the longstanding hypothesis that decision evidence accumulation occurs along a line attractor (a concept drawn from dynamical systems; Supplementary Information), with the system’s position on this line attractor driving choice behaviour, and adding an algebraic rewriting of how the linearized dynamics around such an attractor would differ across two contexts, we developed a theory that describes and accounts for the variability in the properties of different solutions used by equally well-performing individuals. The theory predicted a tight link between otherwise disparate neural and behavioural measurements. This prediction was then found to be well supported in the data across multiple animals.

The approach led to multiple insights: theoretical insights, defining the space of possible solutions (Fig. 2g); biological insights, describing the behavioural, neural and anatomical implications of the different solutions; conceptual insights, identifying the underlying source that links neural and behavioural variability (Fig. 5); and technical insights, enabling us to engineer RNNs that could not be constructed before, spanning the full space of solutions (Fig. 3a,b).

We describe our theoretical work as a ‘framework’ because it does not specify particular network implementations. Instead, it defines axes onto which all possible dynamical solutions can be projected and described, with the position of a solution on this space quantifying how pulse-evoked dynamics change across contexts. The different components of the barycentric coordinates of Fig. 2g can also be viewed in terms of an associated latent circuit that clearly separates each component (Extended Data Fig. 9). Each point in the space constrains features of the circuits that map to it, but each point could nevertheless be implemented in multiple ways. Recent computational work has described several different implementations of context-dependent decision-making in RNNs3335 (but see ref. 36 regarding ref. 35). Since the barycentric coordinates of Fig. 2g can be used to describe any network that solves the task with line attractors that are parallel across contexts (and see Extended Data Fig. 10 otherwise), all of the networks in refs. 4,3335 can be located on those coordinates. The rank 1 networks described in ref. 33 map onto points lying exclusively along the right edge of the triangle of barycentric coordinates in Fig. 2g (the input modulation edge). This is because networks with a non-zero SVM component require rank 2 or higher (Supplementary Information). The idealized latent network solution of ref. 34 (their fig. 3b) maps onto the bottom right corner of Fig. 2g (100% IIM). The recurrent network version of ref. 35 (their fig. S5H), which modulates the linearized inputs and the recurrent dynamics equally, maps onto a point at the centre of the left edge of the triangle. Finally, as described in Fig. 3, ref. 4 maps onto the bottom left corner (100% SVM). All three of refs. 4,34,35 each describe solutions that cover only a restricted region of the barycentric coordinates, and therefore do not address the variability we observed across individuals (see Supplementary Information for more on the relationship between refs. 3335 and our work).

Extended Data Fig. 9. Three distinct “languages” can capture the three fundamental solutions to the task.

Extended Data Fig. 9

a) “Linear algebra language”. As derived in the main text in Equations 1 and 2, the overall differential integration can be expressed as a sum of three terms. (b) “Network dynamics language”. The three solutions are associated with distinct pulse-evoked dynamics within the space spanned by the line attractor and the selection vector. (c) “Circuit dynamics language”. The three solutions are associated with three different latent circuit structures. To show this, we first note that our derivation of task solutions stems from focusing on linearized dynamics around fixed points of a line attractor (Fig. 2c). These linearized dynamics can be interpreted as an equivalent linear circuit whose synaptic connectivity matrix is defined by the state transition matrix (i.e. matrix M in Equation (1)). This circuit can be further simplified into a feedforward circuit using the Schur transformation (Goldman, 2009), which operates a change of coordinates to transform the state transition into an upper triangular form. In the resulting circuit, the first node represents the accumulator (i.e. the line attractor), and it receives feed-forward inputs from the other nodes of the circuit. Our three solutions can be interpreted as three different ways to modulate the connectivity of this circuit across the two contexts. In the case of “direct input modulation”, it is the input to the accumulator node that varies across contexts. In the case of “indirect input modulation”, it is the input to the other nodes that changes across contexts, and this differential input eventually reaches the accumulator through the feed-forward connections. Finally, in the case of “selection vector modulation”, the input to all nodes stays the same across contexts, but the feed-forward connections between the other nodes and the accumulator node change across contexts.

Our work also provides a cautionary note, highlighting the fact that trained RNNs, which are commonly used to model brain function4,3742, need not comprise the full set of solutions consistent with the biological data. We found that training led towards only one corner of the full space of solutions (Fig. 3a). It was a deeper understanding of the mathematics behind solutions (equations (1) and (2)), not the use of trained networks, that enabled us to engineer data-compatible RNNs across the full space of solutions (Fig. 3b–f and Extended Data Fig. 6).

The interactions between afferent input signals and recurrent dynamics are a key part of understanding context-dependent computations. This view is closely related to the alignment of inputs and dynamics recently reported for sensory learning43. For example, large context-dependent changes in the sensory input (that is, a large Δi in equation (1)) are not sufficient to conclude that those context-dependent changes in inputs drive context-dependent decision-making: only those input changes that are aligned to s¯, the average direction in neural space representing the recurrent dynamics, will produce a context-dependent effect on decisions (through Δis¯). For the same reason, we note that although our data (Fig. 1) and that of ref. 4 are not compatible with ‘early gating’ (that is, blocking irrelevant evidence from reaching decision-making regions), the data are nevertheless compatible with input modulation (Fig. 3 and Extended Data Fig. 6). Several further studies have also provided evidence against early gating5,6,44, but there are nevertheless multiple studies providing evidence in favour of early gating24,35,45,46, making the issue a matter of ongoing debate. It has been argued that early gating is indicated by a representation of evidence in decision-making regions that is weaker in the irrelevant context (that is, a smaller magnitude |i|, in our terminology)45, but example 3 in Supplementary Information illustrates a counter-example in which the context with smaller |i| is actually the one in which i has a larger effect on decisions, because it has the larger s ⋅ i; in other words, the interaction with recurrent dynamics needs to be taken into account before firm conclusions can be drawn. Similar to individual variability across the vertical axis of the solution space of Fig. 2g, which we believe is a result of all of the encompassed solutions being capable of solving the task, solutions with or without early gating are equally capable of solving the task (and both lie within the framework that we describe; see examples 1 and 2 in Supplementary Information). It is thus possible that there could be variability across tasks and individuals, and perhaps even within them, in the use of early gating. Further work will be needed to resolve the relative prevalence or absence of early gating.

We have focused on the case in which the choice axes of the two contexts are parallel to each other. A recent study15 reported that in contrast to the findings of ref. 4 in monkey FEF and our findings in rat FOF, choice axes in monkey parietal cortex rotated across two task contexts. This motivated a broadening of our barycentric coordinates framework, and Extended Data Fig. 10 and the discussion in Supplementary Information describe how it can be extended to choice axes that rotate across contexts. In that more complex case, there are four components that add up to the net context-dependent effect, rather than three, and the barycentric coordinates therefore exist in a tetrahedron instead of a triangle. However, the core concepts of the framework remain the same. The same study15 further contrasted with the approximately linear choice axis that we (Fig. 1e, t = 0 to t = 1.3 s) and others4,14,29,47,48 have found, in that they reported a curved choice axis due to a direction in neural space that encoded the magnitude of a trial’s difficulty, regardless of the sign of the subject’s upcoming choice. We speculate that differences across the studies could perhaps be explained by individual differences in the strength of difficulty encoding. In tasks or individuals where the difficulty encoding is stronger, the curvature would become a more important feature.

Even though our experiments were performed with rats, the similarity in the results of behavioural (Extended Data Fig. 1c,f) and neural (Fig. 1e,f) analyses that could be carried out in common with monkeys suggests that conclusions reached from rat data may generalize to other species. Using humans, a recent context-dependent decision-making study49 found that different stimulus features were processed independently. This finding is in line with our result that rat subjects can use separate mixtures of context-dependent components to select and accumulate each of the two features (Figs. 3h and 4).

Electrophysiological studies are frequently centred on findings that are similar across subjects, and it is common practice to report the result for an ‘average’ subject. However, our results reveal a surprising degree of heterogeneity across, and even within, individual subjects, underscoring the importance of characterizing the computations used by each individual50. This issue may be of particular importance for cognitive computations, which are largely internal and therefore potentially subject to substantial covert variability across subjects. Here, studying how computations vary across subjects was made possible by two key methodologies: (1) an efficient, automated procedure to train a sufficient number of rats9; and (2) characterization of the computations of each individual by leveraging the statistical power afforded by a randomly timed, pulse-based stimulus9.

A limitation of our analyses of the experimental data is that we are currently unable to discriminate between mechanisms that rely on context-dependent changes of recurrent dynamics (SVM) versus changes in the linearized sensory inputs (input vector modulation—that is, the oblique axis in Fig. 2g, bottom). A full characterization of the relevant neural dynamics will require estimation of the selection vector s for each context. Simultaneous recordings from large neural populations, combined with the application of recently developed latent dynamics estimation methods such as LFADS (latent factor analysis via dynamical systems)51 or FINDR (flow-field inference from neural data using deep recurrent networks)52, may prove instrumental in future work in this direction. Another potential limitation stems from the possibility that recurrent dynamics might evolve more rapidly53 than the current time resolution in our measurements, leaving us unable to discriminate between contextual input modulation and fast recurrent modulation. However, our results indicate that our analyses quantified the speed of evidence selection as smoothly varying across subjects (Figs. 3h and 4 and Extended Data Fig. 8), suggesting that in most subjects dynamics are slow enough to be captured with our method.

Extended Data Fig. 8. Differential pulse responses and behavioral kernels.

Extended Data Fig. 8

a) Differential pulse responses across the RNNs shown in Fig. 5c. The number above each behavioral kernel indicates the fraction of direct input modulation for the associated RNN (same notation as in Extended Data Fig. 6). (b) Corresponding behavioral kernel for each RNN. (c) Differential pulse responses across all rats shown in Fig. 5d (n = 7 rats, two features per rat). Gray indicates location feature, blue indicates frequency feature. (d) Corresponding behavioral kernels for each rat and feature. Shaded areas indicate bootstrapped standard errors.

In sum, our work provides a general framework to describe and investigate artificial and biological networks for flexible decision-making, and enables cellular-resolution study of individual variability in the neural computations that underlie higher cognition.

Methods

Subjects

All animal use procedures were approved by the Princeton University Institutional Animal Care and Use Committee (IACUC) and were carried out in accordance with NIH standards. All subjects were male Long-Evans rats between the ages of 6 and 24 months, that were kept on a reversed light–dark cycle. All training and testing procedures were performed during the dark cycle. Rats were placed on a restricted water schedule to motivate them to work for a water reward. A total of 26 rats were used for the experiments presented in this study. Of these, 7 rats were used for electrophysiology recordings, and 3 rats were implanted with optical fibres for optogenetic inactivation.

Behaviour

All rats included in this study were trained to perform a task requiring context-dependent selection and accumulation of sensory evidence (Fig. 1a). The task was performed in a behavioural box consisting of three straight walls and one curved wall with three nose ports. Each nose port was equipped with an LED to deliver visual stimuli, and with an infrared beam to detect the rat’s nose when entering the port. In addition, above the two side ports there were speakers to deliver sound stimuli, and water cannulas to deliver a water reward. At the beginning of each trial, rats were presented with an audiovisual cue indicating the context of the current trial, either LOC context or FRQ context. The context cues consisted of 1-s-long, clearly distinguishable frequency modulated sounds, and in addition the LOC context was signalled by turning on the LEDs of all three ports, whereas in the FRQ context only the centre LED was turned on. After the end of the context cue, the rats were required to place their nose into the centre port. While maintaining fixation in the centre port, rats were presented with a 1.3-s-long train of randomly timed auditory pulses. Each pulse was played either from the speaker to the rat’s left or from the speaker to their right, and each pulse a 5-ms pure tone with either low frequency (6.5 kHz) or high frequency (14 kHz). The pulse trains were generated by Poisson processes with different underlying rates. The strength of the location evidence was manipulated by varying the relative rate of right versus left pulses, and the strength of the frequency evidence was manipulated by varying the relative rate of high versus low pulses (Fig. 1b). The overall pulse rate was kept constant at 40 Hz. In the LOC context, rats were rewarded if they turned, at the end of the stimulus, towards the side that had played the greater total number of pulses, ignoring the frequency of the pulses. In blocks of frequency trials, rats were rewarded for orienting left if the total number of low-frequency pulses was higher than the total number of high-frequency pulses, and orienting right otherwise, ignoring the location of the pulses. The context was kept constant in blocks of trials, and block switches occurred after a minimum of 30 trials per block, and when a local estimate of performance reached a threshold of 80% correct. Behavioural sessions lasted 2–4 h, and rats performed on average 542 trials per session. On average, rats switched across 14.6 context blocks per session.

Electrophysiology

Tetrodes were constructed using nickel/chrome alloy wire, 12.7 μm (Sandvik Kanthal), and were gold-plated to 200 kΩ at 1 kHz. Tetrodes were mounted onto custom-made drives54 (Extended Data Fig. 4a,b), and the microdrives were implanted using previously described surgical stereotaxic implantation techniques11. Five rats were implanted with bilateral electrodes targeting FOF, centred at +2 mm anteroposterior (AP), ±1.3 mm mediolateral (ML) from bregma, while two rats were implanted with bilateral electrodes targeting the prelimbic area of mPFC, with coordinates +3.2 mm AP, ±0.75 mm ML from bregma. In 1 rat with an implant in FOF, 16 tetrodes were connected to a 64-channel electronic interface board, and recordings were performed using a wired setup (Open-Ephys). In the other 6 rats, 32 tetrodes per rat were connected to a 128-channel electronic interface board and recordings were performed using wireless headstages (Spikegadgets; Extended Data Fig. 4d).

Optogenetics

Preparation of chemically sharpened optical fibres (0.37 NA, 400 μm core; Newport) and basic virus injection techniques were the same as previously described11. At the targeted coordinates (FOF, +2 mm AP, ±1.3 mm ML from bregma), injections of 9.2 nl of adeno-associated virus (AAV) (AAV2/5-mDlx-ChR2-mCherry, three rats) were made every 100 μm in depth for 1.5 mm. Four additional injection tracts were completed at coordinates 500 μm anterior, 500 μm posterior, 500 μm medial and 500 μm lateral from the central tract. In total, 1.5 μl of virus was injected over approximately 30 min. Chemically sharpened fibres were lowered down the central injection track. Virus expression was allowed to develop for eight weeks before optogenetic stimulation began. Optogenetic stimulation was delivered at 25 mW using a customized wireless system derived from the Cerebro system55,56 (https://karpova-lab.github.io/cerebro; Extended Data Fig. 4c,e).

Analysis of behaviour

All code for data collection was written in Matlab 2019b. Data was extracted from all behavioural sessions in which rats’ fraction of correct responses was equal or above 70%, feature selection index (see below) was equal or above 0.7, and in which rats performed at least 100 trials. Analysis of behaviour was performed for all rats with electrophysiology or optogenetics implants, as well as for all other rats that performed at least 120,000 valid trials—that is, where the rat maintained fixation for the full duration of the pulse train before making a decision. Psychometric curves (Fig. 1c and Extended Data Fig. 3) were used to display the fraction of rightward choices as a function of the difference between the total number of right pulses and left pulses (location evidence strength), and as a function of the difference between the total number of high pulses and low pulses (frequency evidence strength). These curves were fit to a four-parameter logistic function9:

y(x)=y0+a1+e(xx0)b 3

To quantify whether a rat selected the contextually relevant evidence to form its decisions on a given session, we computed a feature selection index. For this purpose, we performed a logistic regression for each of the two contexts, where the rat’s choices were fit as a function of the strength of location and frequency evidence. For each context, we considered all valid trials, and we compiled the rat’s choices, as well as the strength of location and frequency evidence. The vector of choices was parameterized as a binary vector (right = 1; left = 0), the strength of location evidence was computed as the difference between the rate of right and the rate of left pulses, while the strength of frequency evidence was computed as the difference between the rate of high-frequency and low-frequency pulses. In the LOC context, we fit the probability of choosing right on trial k using the logistic regression:

logit(P(right)k)=sLOCEVD,kLOCCTXwLOCEVDLOCCTX+sFRQEVD,kLOCCTXwFRQEVDLOCCTX+βLOCCTX 4

where sLOCEVD,kLOCCTX indicates the strength of location evidence on trial k, sFRQEVD,kLOCCTX indicates the strength of frequency evidence on trial k, wLOCEVDLOCCTX is the weight of location evidence on the rat’s choices, wFRQEVDLOCCTX is the weight of frequency evidence on the rat’s choices, and βLOC CTX is a bias term. The relative weight of location evidence in the LOC context was computed as

Relativeweightlocation=wLOCEVDLOCCTXwLOCEVDLOCCTX+wFRQEVDLOCCTX 5

Similarly, in the FRQ context we fit the rat’s choices as

logit(P(right)k)=sLOCEVD,kFRQCTXwLOCEVDFRQCTX+sFRQEVD,kFRQCTXwFRQEVDFRQCTX+βFRQCTX 6

where sLOCEVD,kFRQCTX indicates the strength of location evidence on trial k, sFRQEVD,kFRQCTX indicates the strength of frequency evidence on trial k, wLOCEVDFRQCTX is the weight of location evidence on the rat’s choices, wFRQEVDFRQCTX is the weight of frequency evidence on the rat’s choices, and βFRQ CTX is a bias term. The relative weight of frequency evidence in the FRQ context was computed as

Relativeweightfrequency=wFRQEVDFRQCTXwLOCEVDFRQCTX+wFRQEVDFRQCTX 7

Finally, the feature selection index was then computed as the average between the relative weight of location in the LOC context (equation (5)) and the relative weight of frequency in the FRQ context (equation (7)):

Featureselectionindex=12wLOCEVDLOCCTXwLOCEVDLOCCTX+wFRQEVDLOCCTX+wFRQEVDFRQCTXwLOCEVDFRQCTX+wFRQEVDFRQCTX 8

The feature selection index was used to precisely quantify the rats’ learning during training, as this metric enables comparison of data across stages with different evidence strength (Extended Data Fig. 2g). In addition, the relative weight of location and frequency were computed for each rat as a function of the position of a trial within the block (for example, immediately after a block switch, one trial after a block switch, and so on), providing a measure of the rats’ ability to rapidly switch attended feature upon context switching (Extended Data Fig. 1g).

Behavioural logistic regression

To quantify the dynamics of evidence accumulation, behavioural data was analysed using another logistic regression. Importantly, in equations (5) and (7) we quantified the rat’s weighting of evidence using a single number, because we considered the generative rates—that is, the expected strength of location and frequency evidence on a given trial. Now, we seek instead to quantify how these weights vary throughout stimulus presentation, by taking advantage of the knowledge of the exact pulse timing. For each rat, data across all sessions was compiled into a single vector of choices (right versus left), and two matrices detailing the pulse information presented on every trial. More specifically, the choice vector was parameterized as a binary vector (right = 1; left = 0), with dimensionality N, where N is the total number of valid trials. Pulse information was split into location evidence and frequency evidence, and was binned into 26 bins with 50-ms width. For a given bin, the amount of location evidence was computed as the natural logarithm of the ratio between the number of right and the number of left pulses, and was compiled in a location pulse matrix XL with dimensionality N × 26. Similarly, frequency evidence was computed as the logarithm of the ratio between high-frequency and low-frequency pulses, and was compiled into a frequency pulse matrix XF with dimensionality N × 26. We chose to use the logarithm of the ratio instead of the difference because it provided a better fit to cross-validated data. To quantify the impact on choices of evidence presented at different time points we fit a logistic regression, where the probability of choosing right at trial k was given by

logit(P(right)k)=t=126Xk,tLwtL+Xk,tFwtF+β 9

where Xk,tL indicates the location evidence at time t on trial k, Xk,tF indicates the frequency evidence at time t on trial k, wtL indicates the location weight at time t, wtF indicates the frequency weight at time t, and β indicates the bias to one particular side. Weights were fit using ridge regression, and the ridge regularizer was chosen to optimally predict cross-validated choices. The regression was applied separately for trials in the LOC context, and trials in the FRQ context, resulting in four sets of weights computed for each rat (Supplementary Fig. 1.2). To study how evidence was differentially integrated across the two contexts, we then computed a differential behavioural kernel. The location differential kernel was equal to the difference between the location weights computed in the LOC context, and the location weights computed in the FRQ context. Similarly, the frequency differential kernel was equal to the difference between the frequency weights computed across the two contexts.

To quantify the shape of the differential behavioural kernels, we computed a behavioural slope index. To obtain this, we computed the straight line that provided the least-square fit of the difference between the weights across the two contexts. The slope index was defined as the slope of this fitting line.

As a result, a slope index = 0 indicates that the fitting line is perfectly horizontal (that is, the difference between the two sets of weights is constant at all time points), while a slope index <0 indicates a decreasing difference between the weights across contexts, and a slope index >0 indicates a rising difference. Empirically we found that differential behavioural kernels predominantly displayed convergence towards the end of the pulse stimulus presentation (Fig. 4 and Extended Data Fig. 8).

Analysis of neural data

Spike sorting was performed using MountainSort57, followed by manual curation of the results. In total, 3,495 putative single units were recorded from 5 rats in FOF (number of units in each rat: 2,047, 832, 258, 94, 54), while 210 units were recorded from 2 rats in mPFC (number of units in each rat: 112, 98). To measure the responses of individual neurons, peri-stimulus time histograms were computed by binning spikes in 20-ms intervals, and averaging responses for trials according to choice and context. Responses of single neurons in both areas were highly heterogeneous and multiplexed multiple types of information (Extended Data Fig. 4), and no systematic difference was found in the encoding of task variables across the two regions (see for example, Extended Data Fig. 5), so all studies of neural activity were carried out at the level of neural populations, and pooling data from FOF and from mPFC.

Trial-based TDR analysis of neural population dynamics

To study trial-averaged population dynamics, we applied model-based TDR (mTDR)14, a dimensionality-reduction method that seeks to identify the dimensions of population activity that carry information about different task variables. This method was applied to our rat dataset, and to reanalyse a dataset collected while macaque monkeys performed a similar visual task4 (Extended Data Fig. 1). In brief, the goal of mTDR is to identify the parameters of a model where the activity of each neuron is described as a linear combination of different task variables (choice, time, context and stimulus strength). For each of these task variables, the model retrieves a time-varying weight vector wi(t) (with number of elements, indexed by i, equal to the number of recorded neurons) specifying the linear relationship between the value of that variable and the activity of each neuron at each time point (each variable v contributes an additive component v ⋅ wi(t) to the firing rate of neuron i), and the collection of these weight vectors across all neurons are constrained to form a low-rank matrix. Singular value decomposition of this low-rank weight matrix is then used to identify basis vectors that maximally encode each of the task variables. Using this method, we identified one axis maximally encoding information about the upcoming choice of the animal (choice axis), one axis maximally encoding information about the momentary strength of the first stimulus feature (location for rat data, motion for monkey data), and one axis maximally encoding information about the momentary strength of the second stimulus feature (frequency for rat data, colour for monkey data). To study how neural dynamics evolved in this reduced space, we first averaged the activity of each neuron across all correct trials according to the strength of location evidence, strength of frequency evidence (that is, within each of the 36 blocks; Fig. 1b), and context, and choice. For this analysis, spike counts were computed in 50-ms non-overlapping bins with centres starting at the beginning of the pulse train presentation and ending 50 ms after the end of the pulse train presentation. For any given trial condition, a pseudo-population (that is, including non-simultaneously recorded neurons) was computed for each time point by compiling the responses of all neurons into a single vector. The trajectory of this vector over time was then projected on the retrieved task-relevant axes to evaluate population dynamics (Fig. 1d–g).

Pulse-based TDR analysis of neural population dynamics

To estimate the impact of evidence pulses and other task variables on neural responses, we fit the activity of each recorded unit using a pulse-based linear regression (Supplementary Fig. 1.1). For each neuron, spike counts were computed in 20-ms non-overlapping bins with centres starting 1 s before the beginning of the pulse train presentation, and ending 700 ms after the end of the stimulus presentation. The activity of neuron i at time t on trial k was described as

ri,t(k)=βchoice;i,tchoice(k)+βcontext;i,tcontext(k)+βtime;i,t++βLOC,LOC;ipulsesLOC,LOC(k)+βLOC,FRQ;ipulsesLOC,FRQ(k)+βFRQ,LOC;ipulsesFRQ,LOC(k)+βFRQ,FRQ;ipulsesFRQ,FRQ(k) 10

where xchoice(k) indicates the rat’s choice on trial k (right = 1, left = 0), xcontext(k) indicates the context on trial k (location = 1, frequency = 0), pulsesLOC,LOC(k) indicates the signed location evidence (number of right pulses minus number of left pulses) presented at each time bin on trial k in the LOC context, pulsesLOC,FRQ(k) indicates location evidence in the FRQ context, pulsesFRQ,LOC(k) indicates frequency evidence (number of high pulses minus number of low pulses) in the LOC context, and pulsesFRQ,FRQ(k) indicates frequency evidence in the FRQ context. The first three regression coefficients βchoice;i, βcontext;i and βtime;i account for modulations of neuron i across time according to choice, context and time. The other four sets of regression coefficients βLOC,LOC;i, βLOC,FRQ;i, βFRQ,LOC;i and βFRQ,FRQ;i indicate the effect of a pulse on the subsequent neural activity, and * indicates a convolution of each kernel with the pulse train; for example, in the case of location evidence in the LOC context:

βLOC,LOC;ipulsesLOC,LOC(k)=βLOC,LOC;ipulsesLOC,LOC(k;tτ) 11

meaning that the element at position τ of kernel βLOC,LOC;i represents the impact of a pulse of location evidence in the LOC context on the activity of unit i after a time. The three kernels for choice, context and time describe modulations from 1 s before stimulus start to 0.7 s after stimulus end in 20-ms non-overlapping bins, resulting in 151-dimensional vectors. The 4 pulse kernels describe modulations from the time of pulse presentation to 0.65 s after pulse presentation resulting in 33-dimensional vectors. To avoid overfitting, this regression was regularized using a ridge regularizer, as well as an L2 smoothing prior58. Pulse kernels were regarded as an approximation of the neural response to each pulse type (an assumption confirmed by analysis of RNNs) (Fig. 3e,f and Extended Data Fig. 7c,d).

We wish to emphasize that the critical difference between our previous trial-based application of TDR and the current pulse-based analysis is merely that in the previous trial-based analysis, stimuli are described as two scalar numbers, namely the expected strength of location and frequency evidence over the entirety of a trial. That is, the analysis ignores the precise timing of pulses. By contrast, the pulse-based analysis leverages knowledge of the precise timing of evidence presentation, a feature made possible by the pulse-based nature of our task. Besides that difference in how the stimulus regressors are treated, all other regressors are the same in the two methods; as a consequence, the resulting kernels are very similar across the two methods. This is true in particular for the choice kernels, thus leading to highly similar choice axes using either of the two methods, albeit the kernel-based method is regularized to reduce noise (see the high degree of alignment between the choice axes computed using either method versus the analytically computed line attractor direction in RNNs trained to perform the task, Extended Data Fig. 7e). Details of the computation of choice axis using the kernel-based are provided in ‘Estimating the choice axis’.

Finally, we note that there is a difference between the granularity of the neural kernels (20 ms) and the behavioural kernels (50 ms; see ‘Behavioural logistic regression’). In the case of the neural analysis, we noticed that the initial pulse-triggered response was often very fast, and that a shorter 20-ms time bin was best suited to allow us to capture its shape, especially in the first time points after the pulse presentation. By contrast, we noticed that the logistic regression was often noisier, and required pooling over at least 50-ms time bins to prevent behavioural kernels from being too noisy. For this reason, we decided to choose the optimal time bin size for each method, rather than using the same time bin for both analyses.

Estimating the choice axis

To compute the population choice axis, we compiled the choice kernels across all neurons, limited to a time window during the presentation of the pulse train stimulus (0 to 1.3 s after stimulus start), into a matrix Mc that is Nneurons × Ntime bins in size. The first principal component of this matrix (that is, the first eigenvector of McMcT, after correcting for the mean firing rate of each neuron), is the Nneurons-long vector in neural space that captures the most variance across choice kernels. This vector was then taken as the choice axis. The pulse-evoked population responses, and their projection onto the choice axis, were computed by compiling pulse kernels across all N neurons recorded from the same rat (Extended Data Fig. 5). At each point in time, the pulse kernel values across all neurons are a vector Nneurons in length; this was projected onto the choice axis (which is a vector of the same length). We then studied the time evolution of the results of this projection, which we referred to as the ‘projection onto the choice axis of population pulse response kernel’.

To test whether the direction of the choice axis was different across the two contexts, we computed the axis for each animal twice, using data collected only from one context at at time (Fig. 1e and Extended Data Fig. 5). To assess whether the direction of the choice axes computed for each context were significantly different from each other, for each rat we performed a random permutation test, where on each iteration we shuffled the context label of each trial. This label-shuffled data becomes the null model. We then recomputed the choice axis separately for trials labelled with each of the two contexts, and measured the angle between the two axes. Done across many shufflings, this provided us with a distribution of the angles between choice axes to be expected from the null model—that is, if there were no difference across contexts.

Estimating differential neural kernels

To study the differential evolution of pulse-evoked population responses across the two contexts, we computed a differential pulse response. For location evidence, the differential pulse response was defined as the difference between the projection onto the choice axis of the response to location pulses in the LOC context, and the response to location pulses in the FRQ context. For frequency evidence, the differential pulse response was computed as the difference between the projection onto the choice axis of the frequency pulse response in the FRQ context, minus the frequency pulse response in the LOC context (Supplementary Fig. 1.1c).

Summarizing the shape of the neural kernels in a slope index

To quantify the shape of differential pulse responses, we computed a neural slope index. To obtain this, we computed the straight line that provided the least-square fit of the difference between the pulse responses across the two contexts. The slope index was defined as the slope of this fitting line. As a result, a slope index = 0 indicates that the fitting line is perfectly horizontal (that is, the difference between the two pulse responses is constant at all time points), a slope index >0 indicates a rising differential response, and a slope index <0 indicates a decreasing differential response. Empirically we found that differential pulse responses only displayed positive (or zero) slope indices—that is, further amplifying the effect of relevant over irrelevant evidence onto the choice axis (Fig. 3h and Extended Data Fig. 8).

Recurrent neural networks

To validate our analyses of behaviour and neural dynamics, and to gather a deeper understanding of the mathematical mechanisms that could underlie our rats’ context-dependent behaviour, we trained RNNs to perform a pulse-based context-dependent evidence accumulation task analogous to that performed by the rats.

The activity of the N = 100 hidden units of each network (Extended Data Fig. 6a) was defined by the dynamical equations

x^=Wr^+i^τr^˙=r^+g(x^) 12

where τ is the network time constant, x^ is the vector of activations of each unit, with each of its elements interpreted as roughly paralleling the net input current to a neuron, W is the matrix of connections between units, i^ is the external input to each unit, and g() is a pointwise nonlinearity whose output is interpreted as roughly paralleling the activity (firing rate) of a neuron given that neuron’s net input current. We used g()=tanh(), but similar results should apply with other standard nonlinearities.

The input i^ is in turn composed of several terms:

i^=b+Wcc+wLOCiLOC+wFRQiFRQ 13

The first term, b, represents a bias to each unit that is constant across time and trials. In the second term, c is a two-element-long column vector that encodes current context in a one-hot manner (in the LOC context, c=10, and in the FRQ context, c=01). The matrix Wc is N × 2 in size, so its first column represents an additive bias to the units in the LOC context while its second column represents an additive bias in the FRQ context. In the next two terms, the time-dependent scalars iLOC and iFRQ represent the momentary LOC and FRQ evidence, respectively, with wLOC and wFRQ representing how each of those impact the units of the network.

The output of the network was determined by a single output unit performing a linear readout of the activity of the RNN units:

z=wOTr^+kO 14

where wO indicates the N × 1 vector of output weights assigned to each hidden unit and kO is a scalar representing the output bias. The choice of the network on a given trial was determined by the sign of z at the last time point (T = 1.3 s). During training and analysis, evolution of the network was computed in 10-ms time steps. During training, τ was set to 10 ms, but in subsequent analyses τ was set to 100 ms, so as to replicate the autocorrelation timescale observed in neural data.

Training of RNNs using backpropagation

RNNs were trained using backpropagation through time with the Adam optimizer and implemented in the Python JAX framework. The weights of the network were initialized using a standard normal distribution, modified according to the number of inputs to a unit, and then rescaled. If η is drawn from a standard normal distribution η ~ N(0, 1), input weights were chosen as η1U; recurrent weights were chosen as η0.8N; output weights were chosen as η1N; where U indicates the number of inputs (U = 4) and N indicates the number of hidden units (N = 100). All the biases of the network were initialized at 0. The initial conditions were also learned, and were also initialized randomly from a standard normal distribution, with each element of the initial condition initialized as 0.1. The Adam parameters for training were b1 = 0.9; b2 = 0.999; epsilon = 0.1. The learning rate followed an exponential decay with initial step size = 0.002, and decay factor = 0.99998. Training occurred over 120,000 batches with a batch size of 256 trials. Using this procedure, we trained 1,000 distinct RNNs to solve the task using different random initializations on each run (Fig. 3a). All networks learned to perform the task with high accuracy (see for example, Fig. 3c). All the code for training, analysis and engineering of RNNs is available at https://github.com/Brody-Lab/flexible_decision_making_rnn.

Analysis of RNN mechanisms

To analyse the linear dynamics implemented by each RNN to perform context-dependent evidence accumulation, we first identified the fixed points of each trained network using a previously described optimization procedure4,59. We then linearized around that fixed point, as follows.

Around any given point (r^0,i^0), a first-order Taylor expansion tells us that the dynamics (equation (12)) will be approximated by

τr^τr^˙(r^0,i^0)+τr^˙r^(r^r^0)+τr^˙i^(i^i^0) 15

where the partial derivatives are evaluated at (r^0,i^0). When (r^0,i^0) is a fixed point, r^˙(r^0,i^0)=0. Using equation (12), we can obtain the derivatives

τr^˙r^=I+DWτr^˙i^=D 16

where D is a diagonal matrix that we refer to as the gain matrix, and whose elements are given by

Djj=g(x^0j) 17

with g being the derivative of the pointwise nonlinearity g() and x^0j being determined by the fixed point, as they are the elements of x^0=Wr^0+i^0.

Combining equation (16) with equation (15), and changing variables to

r=r^r^0i=i^i^0 18

we obtain the linearized dynamics

τr˙=r+DWr+Di 19

In the absence of sensory evidence—that is, in the silences between clicks when iLOC = 0 and iFRQ = 0—the fixed points of the system will be determined by i^0=b+Wcc. The fixed points are therefore context-dependent, and as a consequence, the gain matrix D will also be context-dependent, since it is a function of the fixed point around which we are linearizing (equation (17)). The context-dependence of D is what leads to different linearized dynamics in the two contexts.

The linearized connectivity matrix that determines the recurrent dynamics, D ⋅ W, depends on D; and the linearized input vector, D ⋅ i, also depends on D. Thus, this formulation allows both context-dependent modulation of the recurrent dynamics and of the input vector.

The discussion in Supplementary Information describes how RNN equations linearized in the activation space x^, even while equivalent to the dynamics used here, do not allow observing context-dependent input modulation. This would eliminate the right and top corners of the barycentric coordinates of Fig. 2g. Analyses linearizing in activation space x^ are therefore limited to describing solutions as being 100% SVM.

For each trained RNN, we focused on the analysis of the linearized dynamics corresponding to the fixed point with the smallest absolute network output |z| (that is, where the network is closest to the decision boundary), but results were similar when considering different fixed points (that is, linearized dynamics were mostly similar across different fixed points). Similar to previous reports4, we found that in every well-trained network, fixed points were roughly aligned to form a line attractor for each of the two contexts, and that eigendecomposition of the Jacobian matrix D ⋅ W reveals a single eigenvalue close to 0, and all other eigenvalues with a negative real value. This reflects the existence of a single stable direction of evidence accumulation (the line attractor), surrounded by stable dynamics.

The right eigenvector associated with the eigenvalue closest to 0 defined the direction of the line attractor ρ, while the corresponding left eigenvector defined the direction of the selection vector s. For each network, we computed these vectors separately for the two contexts by setting the contextual input c as c=10 in the LOC context, and c=01 in the FRQ context, before computing the fixed points and the eigendecomposition. As a result, for each network we computed the line attractor in each of the two contexts, which we denote as ρLOC and ρFRQ, and the selection vector in each of the two contexts (sLOC and sFRQ), as well as the linearized input D ⋅ i in each of the two contexts (iLOC and iFRQ). Using these quantities, we directly computed the terms in equation (2) to quantify how much each of the three components contributed to differential pulse accumulation, and we plotted the results for 1,000 RNNs in barycentric coordinates (Fig. 3a).

Engineering of RNNs to implement arbitrary combinations of components

To engineer RNNs that would implement arbitrary combinations of components, we started from the RNN solutions obtained from standard training using backpropagation through time. For a given trained network, we first computed the fixed points of the network and the linearized network dynamics, and we identified the line attractor, selection vector and effective input across the two contexts (see above). Because the RNN dynamics are known (equations (12) and (13)), the linearized dynamics can be expressed in closed form as a function of the network weights:

Mj=Frj=wRjtanh(wRrfixed+wCc+k) 20
i=Fu=wutanh(wRrfixed+wCc+k) 21

where Mj indicates the jth column of the Jacobian matrix, wRj indicates the jth column of the matrix of recurrent weights, rfixed indicates the network activity at the fixed point, tanh indicates the first derivative of the hyperbolic tangent nonlinearity, and ⊙ indicates the Hadamard product or element-wise multiplication, where the elements of two vectors are multiplied element-by-element to produce a vector of the same size. We further define the saturation factor for each of the two contexts as:

satLOC=tanh(wRrfixed,LOC+wCcLOC+k) 22
satFRQ=tanh(wRrfixed,FRQ+wCcFRQ+k) 23

where rfixed,LOC indicates the fixed point with the smallest absolute network output in the LOC context, rfixed,FRQ indicates the fixed point with the smallest absolute network output in the FRQ context, cLOC indicates the context input in the LOC context (1, 0), and cFRQ indicates the context input in the FRQ context (0, 1). The effective input for the two contexts can therefore be computed as:

iLOC=wusatLOCiFRQ=wusatFRQ 24

The three components of context-dependent differential integration defined in equation (2) can therefore be rewritten as a function of the input weights wu. The SVM, which is equal to the dot product between the difference in the selection vector and the average effective input, can be rewritten as:

Δsi¯=ΔsiLOC+iFRQ2=ΔswusatLOC+satFRQ2==Δswusat¯=wu(Δssat) 25

where sat¯ indicates the average saturation factor across contexts, and the last step takes advantage of the associative property of the Hadamard and dot product. The DIM, which is equal to the dot product between the difference in the effective input and the line attractor, can be rewritten as:

Δiρ¯=(iLOCiFRQ)ρ¯=wu(satLOCsatFRQ)ρ¯=wuΔsatρ¯=wu(Δsatρ¯) 26

where Δsat indicates the difference between the saturation factor across the two contexts. The IIM, which is equal to the dot product between the difference in the effective input and the average selection vector orthogonal to the line attractor s, can be rewritten as:

Δis¯=(iLOCiFRQ)s¯=wu(satLOCsatFRQ)s¯=wuΔsats¯=wu(Δsats¯) 27

Knowledge of equations (21), (22) and (23) allows us to identify input vectors that produce network dynamics relying on any arbitrary combinations of the three components. For example, producing a network using exclusively SVM requires the first component (equation (21)) to be large, while the second (equation (22)) and third (equation (23)) components must be 0. In other words, the input weights wu must satisfy:

wu(Δssat¯)>0wu(Δsatρ¯)=0wu(Δsats¯)=0 28

In addition, we must also require that the network does not accumulate the pulse in the irrelevant context. Because we are conducting this analysis for pulses of location evidence, this means that the dot product between the effective input and the selection vector in the FRQ context should be 0:

iFRQsFRQ=0iFRQsFRQ=wusatFRQsFRQ==wu(satFRQsFRQ)=0 29

Finally, we then use the Gram-Schmidt process to find the set of weight wu maximally aligned to the vector Δssat¯, and orthogonal to vectors Δsatρ¯, Δsats¯ and satFRQ ⊙ sFRQ. Similar considerations can be applied to produce networks using different mechanisms. For example, to engineer a network that uses only DIM the input weight must be maximally aligned to Δsatρ¯ and orthogonal to Δssat¯, Δsats¯ and satFRQ ⊙ sFRQ. Engineering networks implementing combinations of mechanisms can be obtained by choosing the input vector as a linear combination between extreme network solutions. Finally, we emphasize that the mechanism chosen for one stimulus feature (for example, location) is entirely independent from the mechanism chosen for the other stimulus feature (for example, frequency).

Statistical methods

Comparison of the strength of the encoding of relevant versus irrelevant information (Fig. 1f,g) was performed by quantifying the variability across responses to different stimulus strengths, normalized by trial-by-trial variability, limiting the analysis to the subspace orthogonal to choice encoding. Error bars for neural and behavioural kernels were computed using bootstrapping. On each iteration of the bootstrap procedure, we randomly resampled trials, with replacement, and we computed the standard error as the standard deviation of the bootstrapped values over 100 iterations.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Online content

Any methods, additional references, Nature Portfolio reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at 10.1038/s41586-024-08433-6.

Supplementary information

Supplementary Information (1.4MB, pdf)

This file contains an extended discussion, including supplementary figures and additional references.

Reporting Summary (2.1MB, pdf)

Acknowledgements

The authors thank S. Ostojic, K. Miller, S. Fusi and S. Druckmann for discussion and feedback on the manuscript, and J. Teran and C. Kopec for animal and laboratory support. This work was funded by the Howard Hughes Medical Institute and by NIH grant R21MH124383. M.P. was supported by a Simons Collaboration on the Global Brain Postdoctoral Fellowship, and by a Simons Foundation Autism Research Initiative Bridge to Independence Award.

Extended data figures and tables

Author contributions

M.P. and C.D.B. designed the experiment. M.P., V.M. and C.D.B. designed the automated training procedure. M.P. and V.D.T. performed the experiments. M.C.A. and J.W.P. developed the mTDR analysis. M.P., M.C.A. and J.W.P. designed the pulse-based analysis of neural data. All authors contributed to the conceptual development of the theory. M.P. and C.D.B. developed the mathematical framework. M.P. and D.S. trained and analysed artificial neural networks. M.P. and C.D.B. wrote the manuscript after discussions among all authors. C.D.B. supervised the project.

Peer review

Peer review information

Nature thanks Cristina Savin and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Data availability

Rat behavioural and electrophysiological data are available at https://github.com/Brody-Lab/flexible_decision_making_rats. Modelling data are available at https://github.com/Brody-Lab/flexible_decision_making_rnn.

Code availability

The code to train rats is available at https://github.com/Brody-Lab/flexible_decision_making_training. All the code for training, analysis and engineering of RNNs is available at https://github.com/Brody-Lab/flexible_decision_making_rnn.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Contributor Information

Marino Pagan, Email: mpagan@ed.ac.uk.

Carlos D. Brody, Email: brody@princeton.edu

Extended data

is available for this paper at 10.1038/s41586-024-08433-6.

Supplementary information

The online version contains supplementary material available at 10.1038/s41586-024-08433-6.

References

  • 1.Okazawa, G. & Kiani, R. Neural mechanisms that make perceptual decisions flexible. Annu. Rev. Physiol. 10.1146/annurev-physiol-031722-024731 (2022). [DOI] [PMC free article] [PubMed]
  • 2.Livneh, Y. et al. Homeostatic circuits selectively gate food cue responses in insular cortex. Nature546, 611–616 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3.Sarel, A. et al. Natural switches in behaviour rapidly modulate hippocampal coding. Nature609, 119–127 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Mante, V., Sussillo, D., Shenoy, K. V. & Newsome, W. T. Context-dependent computation by recurrent dynamics in prefrontal cortex. Nature503, 78–84 (2013). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5.Siegel, M., Buschman, T. J. & Miller, E. K. Cortical information flow during flexible sensorimotor decisions. Science348, 1352–1355 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Sasaki, R. & Uka, T. Dynamic readout of behaviorally relevant signals from area MT during task switching. Neuron62, 147–157 (2009). [DOI] [PubMed] [Google Scholar]
  • 7.Gold, J. I. & Shadlen, M. N. The neural basis of decision making. Annu. Rev. Neurosci.30, 535–574 (2007). [DOI] [PubMed] [Google Scholar]
  • 8.Brody, C. D. & Hanks, T. D. Neural underpinnings of the evidence accumulator. Curr. Opin. Neurobiol.37, 149–157 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9.Brunton, B. W., Botvinick, M. M. & Brody, C. D. Rats and humans can optimally accumulate evidence for decision-making. Science340, 95–98 (2013). [DOI] [PubMed] [Google Scholar]
  • 10.Erlich, J. C., Bialek, M. & Brody, C. D. A cortical substrate for memory-guided orienting in the rat. Neuron72, 330–343 (2011). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11.Hanks, T. D. et al. Distinct relationships of parietal and prefrontal cortices to evidence accumulation. Nature520, 220–223 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12.Leonard, C. M. The prefrontal cortex of the rat. I. Cortical projection of the mediodorsal nucleus. II. Efferent connections. Brain Res.12, 321–343 (1969). [DOI] [PubMed] [Google Scholar]
  • 13.Sinnamon, H. M. & Galer, B. S. Head movements elicited by electrical stimulation of the anteromedial cortex of the rat. Physiol. Behav.33, 185–190 (1984). [DOI] [PubMed] [Google Scholar]
  • 14.Aoi, M. C., Mante, V. & Pillow, J. W. Prefrontal cortex exhibits multidimensional dynamic encoding during decision-making. Nat. Neurosci.23, 1410–1420 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Okazawa, G., Hatch, C. E., Mancoo, A., Machens, C. K. & Kiani, R. Representational geometry of perceptual decisions in the monkey parietal cortex. Cell184, 3748–3761.e18 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.Charlton, J. A. & Goris, R. L. T. Abstract deliberation by visuomotor neurons in prefrontal cortex. Nat. Neurosci.27, 1167–1175 (2024). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17.Luo, T. Z. et al. Transitions in dynamical regime and neural mode underlie perceptual decision-making. Preprint at bioRxiv10.1101/2023.10.15.562427 (2023).
  • 18.Vyas, S., Golub, M. D., Sussillo, D. & Shenoy, K. V. Computation through neural population dynamics. Annu. Rev. Neurosci.43, 249–275 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19.Seung, H. S. How the brain keeps the eyes still. Proc. Natl Acad. Sci. USA93, 13339–13344 (1996). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20.Barbosa, J. et al. Early selection of task-relevant features through population gating. Nat. Commun.14, 6837 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21.Reynolds, J. H. & Chelazzi, L. Attentional modulation of visual processing. Annu. Rev. Neurosci.27, 611–647 (2004). [DOI] [PubMed] [Google Scholar]
  • 22.Noudoost, B., Chang, M. H., Steinmetz, N. A. & Moore, T. Top-down control of visual attention. Curr. Opin. Neurobiol.20, 183–190 (2010). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23.Maunsell, J. H. R. & Treue, S. Feature-based attention in visual cortex. Trends Neurosci.29, 317–322 (2006). [DOI] [PubMed] [Google Scholar]
  • 24.Wimmer, R. D. et al. Thalamic control of sensory selection in divided attention. Nature526, 705–709 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25.Servan-Schreiber, D., Printz, H. & Cohen, J. D. A network model of catecholamine effects: gain, signal-to-noise ratio, and behavior. Science249, 892–895 (1990). [DOI] [PubMed] [Google Scholar]
  • 26.Pagan, M., Valente, A., Ostojic, S. & Brody, C. D. Brief technical note on linearizing recurrent neural networks (RNNs) before vs after the pointwise nonlinearity. Preprint at 10.48550/arXiv.2309.04030 (2023).
  • 27.Maheswaranathan, N. & Sussillo, D. How recurrent networks implement contextual processing in sentiment analysis. Preprint at 10.48550/arXiv.2309.04030 (2020).
  • 28.Park, I. M., Meister, M. L. R., Huk, A. C. & Pillow, J. W. Encoding and decoding in parietal cortex during sensorimotor decision-making. Nat. Neurosci.17, 1395–1403 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29.Peixoto, D. et al. Decoding and perturbing decision states in real time. Nature591, 604–609 (2021). [DOI] [PubMed] [Google Scholar]
  • 30.Kurikawa, T., Haga, T., Handa, T., Harukuni, R. & Fukai, T. Neuronal stability in medial frontal cortex sets individual variability in decision-making. Nat. Neurosci.21, 1764–1773 (2018). [DOI] [PubMed] [Google Scholar]
  • 31.Orlandi, J. G., Abdolrahmani, M., Aoki, R., Lyamzin, D. R. & Benucci, A. Distributed context-dependent choice information in mouse posterior cortex. Nat. Commun.14, 192 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32.Hopfield, J. J. Neurons with graded response have collective computational properties like those of two-state neurons. Proc. Natl Acad. Sci. USA81, 3088–3092 (1984). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33.Dubreuil, A., Valente, A., Beiran, M., Mastrogiuseppe, F. & Ostojic, S. The role of population structure in computations through neural dynamics. Nat. Neurosci.25, 783–794 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34.Langdon, C. & Engel, T. A. Latent circuit inference from heterogeneous neural responses during cognitive tasks. Preprint at bioRxiv10.1101/2022.01.23.477431 (2022). [DOI] [PMC free article] [PubMed]
  • 35.Flesch, T., Juechems, K., Dumbalska, T., Saxe, A. & Summerfield, C. Orthogonal representations for robust context-dependent task performance in brains and neural networks. Neuron110, 1258–1270.e11 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36.Flesch, T. et al. Are task representations gated in macaque prefrontal cortex? Preprint at 10.48550/arXiv.2306.16733 (2023).
  • 37.Duan, C. A. et al. Collicular circuits for flexible sensorimotor routing. Nat. Neurosci.24, 1110–1120 (2021). [DOI] [PubMed] [Google Scholar]
  • 38.Perich, M. G. & Rajan, K. Rethinking brain-wide interactions through multi-region ‘network of networks’ models. Curr. Opin. Neurobiol.65, 146–151 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39.Orhan, A. E. & Ma, W. J. A diverse range of factors affect the nature of neural representations underlying short-term memory. Nat. Neurosci.22, 275–283 (2019). [DOI] [PubMed] [Google Scholar]
  • 40.Wang, J., Narain, D., Hosseini, E. A. & Jazayeri, M. Flexible timing by temporal scaling of cortical responses. Nat. Neurosci.21, 102–110 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 41.Sohn, H., Narain, D., Meirhaeghe, N. & Jazayeri, M. Bayesian computation through cortical latent dynamics. Neuron103, 934–947.e5 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42.Remington, E. D., Narain, D., Hosseini, E. A. & Jazayeri, M. Flexible sensorimotor computations through rapid reconfiguration of cortical dynamics. Neuron98, 1005–1019.e5 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 43.Chadwick, A. et al. Learning shapes cortical dynamics to enhance integration of relevant sensory input. Neuron111, 106–120.e10 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 44.Rodgers, C. C. & DeWeese, M. R. Neural correlates of task switching in prefrontal cortex and primary auditory cortex in a novel stimulus selection task for rodents. Neuron82, 1157–1170 (2014). [DOI] [PubMed] [Google Scholar]
  • 45.Takagi, Y., Hunt, L. T., Woolrich, M. W., Behrens, T. E. & Klein-Flügge, M. C. Adapting non-invasive human recordings along multiple task-axes shows unfolding of spontaneous and over-trained choice. eLife10, e60988 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 46.Barbosa, J. et al. Early selection of task-relevant features through population gating. Nat. Commun.14, 6837 (2023). [DOI] [PMC free article] [PubMed]
  • 47.Li, N., Daie, K., Svoboda, K. & Druckmann, S. Robust neuronal dynamics in premotor cortex during motor planning. Nature532, 459–464 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 48.Ni, A. M., Ruff, D. A., Alberts, J. J., Symmonds, J. & Cohen, M. R. Learning and attention reveal a general relationship between population activity and behavior. Science359, 463–465 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 49.Ritz, H. & Shenhav, A. Humans reconfigure target and distractor processing to address distinct task demands. Psychol. Rev.131, 349 (2024). [DOI] [PMC free article] [PubMed]
  • 50.Prinz, A. A., Bucher, D. & Marder, E. Similar network activity from disparate circuit parameters. Nat. Neurosci.7, 1345–1352 (2004). [DOI] [PubMed] [Google Scholar]
  • 51.Pandarinath, C. et al. Inferring single-trial neural population dynamics using sequential auto-encoders. Nature Methods15, 805–815 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 52.Kim, T. D. et al. Flow-field inference from neural data using deep recurrent networks. Preprint at bioRxiv10.1101/2023.11.14.567136 (2023).
  • 53.Murphy, B. K. & Miller, K. D. Balanced amplification: a new mechanism of selective amplification of neural activity patterns. Neuron61, 635–648 (2009). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 54.Aronov, D. & Tank, D. W. Engagement of neural circuits underlying 2D spatial navigation in a rodent virtual reality system. Neuron84, 442–456 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 55.Brown, J. et al. Expanding the optogenetics toolkit by topological inversion of rhodopsins. Cell175, 1131–1140.e11 (2018). [DOI] [PubMed] [Google Scholar]
  • 56.Tervo, D. & Karpova, A. Y. Rapidly inducible, genetically targeted inactivation of neural and synaptic activity in vivo. Curr. Opin. Neurobiol.17, 581–586 (2007). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 57.Chung, J. E. et al. A fully automated approach to spike sorting. Neuron95, 1381–1394.e6 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 58.Pillow, J. W. et al. Spatio-temporal correlations and visual signalling in a complete neuronal population. Nature454, 995–999 (2008). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 59.Sussillo, D. & Barak, O. Opening the black box: low-dimensional dynamics in high-dimensional recurrent neural networks. Neural Comput.25, 626–649 (2013). [DOI] [PubMed] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Supplementary Information (1.4MB, pdf)

This file contains an extended discussion, including supplementary figures and additional references.

Reporting Summary (2.1MB, pdf)

Data Availability Statement

Rat behavioural and electrophysiological data are available at https://github.com/Brody-Lab/flexible_decision_making_rats. Modelling data are available at https://github.com/Brody-Lab/flexible_decision_making_rnn.

The code to train rats is available at https://github.com/Brody-Lab/flexible_decision_making_training. All the code for training, analysis and engineering of RNNs is available at https://github.com/Brody-Lab/flexible_decision_making_rnn.


Articles from Nature are provided here courtesy of Nature Publishing Group

RESOURCES