Author manuscript; available in PMC 2011 Sep 1.
Published in final edited form as: Nat Neurosci. 2011 Feb 13;14(3):366–372. doi: 10.1038/nn.2752

A reservoir of time constants for memory traces in cortical neurons

Alberto Bernacchia 1, Hyojung Seo 1, Daeyeol Lee 1, Xiao-Jing Wang 1
PMCID: PMC3079398  NIHMSID: NIHMS263877  PMID: 21317906

Abstract

According to the reinforcement learning theory of decision making, reward expectation is computed by integrating past rewards with a fixed timescale. By contrast, we found that a wide range of time constants is available across cortical neurons recorded from monkeys performing a competitive game task. By recognizing that reward modulates neural activity multiplicatively, we found that one or two time constants of reward memory can be extracted for each neuron in prefrontal, cingulate, and parietal cortex. These timescales ranged from hundreds of milliseconds to tens of seconds, following a power-law distribution that is consistent across areas and is reproduced by a "reservoir" neural network model. The neuronal memory timescales were weakly but significantly correlated with those of the monkeys' decisions. Our findings suggest a flexible memory system, in which neural subpopulations with distinct sets of long or short memory timescales may be selectively deployed according to task demands.


In economic behavior, choices that have a higher reward expectation are favoured, and adaptive decision-making depends on our ability to learn reward expectation through past rewards associated with our actions. The neural mechanisms underlying this process have been the subject of growing interest, since they could provide important insights into how learning occurs in the brain, and how humans and other animals make economic decisions. Neural correlates of reward valuation have been observed in different studies1-3, and interpreted in the framework of reinforcement learning (RL) theory4-5. In the RL model, reward expectation is computed by weighting the previous rewards through a temporal filter, which quantifies the memory trace of rewards. The optimal duration of the filter (memory) depends on the predictability of the environment. If the payoffs for the same option change often and unpredictably, then rewards should be filtered on short timescales in order to track the fast changes in a volatile environment; by contrast, if past rewards reliably predict future ones, then they should be filtered on long timescales to exploit a stable environment6-7. The neural mechanism underlying switching between long and short time constants for computing reward expectation remains poorly understood.

On which timescale does the brain filter rewards? So far a few studies have estimated the time constant of this filter from behavior, and assessed how past rewards affect choice selection8-12, but the neural mechanisms responsible for such timescales are still unknown. To address this issue, we analyzed the activity of cortical neurons in monkeys performing a competitive game task. Using a method based on the idea that reward memory modulates neural activity multiplicatively, we show that memory time constants can be extracted from the activity of single neurons. We found that a different timescale for reward memory can be associated with each recorded neuron, and there is a wide range of timescales across neurons, obeying a power law distribution. The same distribution is found across three different cortical areas, anterior cingulate cortex (ACCd), dorsolateral prefrontal cortex (DLPFC), and lateral intraparietal cortex (LIP). Hence, each area is endowed with a reservoir of time constants for reward memory, which are distributed heterogeneously across neurons.

We found that the time constants estimated from pairs of simultaneously recorded neurons are uncorrelated, implying that our results cannot be explained by a single time constant for all neurons that changes slowly over time. On the other hand, our analysis of the animals' behavior suggests that the timescale over which reward events affect decisions changes across experimental sessions, possibly reflecting the animals' attempt to increase their payoff by exploring different strategies. The time constants for reward memory at the behavioural and neuronal levels were weakly but significantly correlated across experimental sessions. Finally, we show that a randomly connected circuit model, akin to a "reservoir" network13-15, can reproduce the observed distribution of timescales, provided that the network operates at the critical point (or "edge of chaos")16-18. Taken together, these findings suggest a distributed, flexible neural system for reward valuation and memory.

Results

Multiplicative memory traces in cortical neurons

We analyzed single-neuron activity recorded from three cortical areas, dorsal anterior cingulate cortex19 (ACCd, 154 neurons), dorsolateral prefrontal cortex20 (DLPFC, 322 neurons) and lateral intraparietal cortex21 (LIP, 205 neurons) of six monkeys performing a matching pennies task11,22 (Fig.1a). In each trial, the animal chose one of two targets by shifting its gaze, while the computer made its choice by simulating a rational opponent; the animal received reward if its choice matched that of the computer. We computed the firing rate of each neuron by counting the spikes in twelve time intervals of 250ms (coloured bars in Fig. 1a), which are referred to as epochs. These include six epochs (1.5s) before saccade initiation (pre-fixation, fore-period, delay) and six epochs (1.5s) after saccade completion (choice fixation, feedback and post-feedback). Consistent with previous studies23-25, we found that the activity of neurons varied substantially in different trial epochs (99% of neurons, 675/681, ANOVA p<0.05). The time course of the activity in successive epochs differs substantially across neurons. Fig.2a shows the trial-averaged firing rate in the different epochs of a trial (squares) for an example neuron recorded in ACCd. The activity of this neuron decreased after the saccade to a chosen target and increased after the feedback period.

Figure 1.

Behavioral task and schematic illustration of memory traces. (a) In the matching pennies task, the monkey was required to fixate a central spot during the fore-period (500 ms) and delay period (500 ms) while the two choice targets (green disks) were displayed. Then, the central spot disappeared and the monkey made a saccadic eye movement to one of the two choice targets, and maintained its gaze on the chosen target for 500ms (choice fixation). A red ring appearing around the correct target revealed the computer's choice, and if it matched the animal's choice (as illustrated), reward was delivered 500 ms later. Coloured bars at the bottom show the twelve 250ms intervals (epochs) used to compute the firing rates in the analysis. (b,c) Two hypothetical neurons. The neuron in panel b has a constant average firing rate (black line), while the firing rate of the neuron in panel c depends on the trial epoch, repeating in each of the three consecutive trials. Red lines show the change in activity due to the outcome in the first trial (continuous line – reward, dashed line – no reward). The inset shows the memory trace of the reward, given by the difference between the red and black lines. The memory trace of the neuron in panel b shows a simple decay, while that of the neuron in panel c is multiplicatively modulated by the epoch-dependent activity.

Figure 2.

An example neuron in ACCd showing multiplicative modulation of memory traces by the epoch code. The colors in all panels denote trial epochs, following the format of Fig.1a. (a) The epoch code for an example neuron, i.e. the firing rate computed in twelve 250ms epochs within a trial and averaged over all trials (black squares, interpolated by the black line, broken during the saccade). Coloured disks correspond to the slopes fitted in panel c (error bars, ±SE); their correlation with the epoch code quantifies the multiplicative modulation, and is referred to as the factorization index (FI=0.97 in this example). (b) The memory trace f of past rewards in the same neuron, up to five trials in the past. Coloured dots and error bars (±SE) show the results of the multiple linear regression model, Eq.(1), and the black line is the exponential fit (Eq.(2), continuous line, exponential ex(t); broken line, modulated envelope g·ex(t)). The parameters for the fit are shown (A, amplitude; τ, timescale). (c) The memory trace f (from panel b), plotted as a function of the exponential function ex. The lines are least squares fits, each line encompassing a particular epoch and all five trial lags. According to the factorization, the slopes should correspond to the epoch code, f = g·ex. The values of the slopes are plotted in panel a (coloured disks) and compared with the epoch code g(k).

We then examined the effect of reward on the activity of neurons. Neural activity in all three cortical areas carries information about past reward events19-22. In order to characterize the memory trace of each neuron, we introduce a novel approach that is schematically illustrated in Fig.1 for two hypothetical neurons (panels (b) and (c)). The black line shows the average time course of neural activity in three consecutive trials, which is constant for the neuron in panel (b), while it depends on the trial epoch for the neuron in panel (c). The red lines illustrate the change in activity, with respect to the average time course, due to the outcome in the first trial (reward or no reward). The difference with respect to the black line is defined as the memory trace of the reward, and is shown in the inset. Note that the average time course, in black, is obtained by averaging the two red traces, corresponding to the two possible outcomes. Panel (b) shows a simple decay of the reward memory trace, which slowly fades in time; here both the memory trace and the average time course are independent of the trial epoch. By contrast, in panel (c) the time course of neural activity depends on the trial epoch, and is nearly zero during the delay period. This implies that this hypothetical neuron cannot signal any reward memory during the delay period, since it never produces any spikes during that epoch. Based on this intuition, we hypothesized that the memory trace in a given epoch is proportional to the average firing rate in that epoch. In that case, in addition to the decay, the memory trace (inset) is modulated (multiplied) by the average firing rate. In the next section, we show that, although individual neurons differ in their firing rates and the types of memory decay, this general principle holds.

We define the “epoch code” as the firing rate averaged across all trials, as a function of the different epochs (e.g. Fig.2a), denoted by g(k) (k=1,…,12 epochs, in temporal order). In order to separate the contributions of epoch and reward memory to neural activity, we modeled the firing rate measured in trial n and epoch k, denoted by FR(n,k), as the sum of the epoch code g(k) and a filter f(n',k) convolved with the animal's reward history in previous trials (last 5 trials; in each trial Rew = +1 indicates reward; Rew = −1, no-reward), namely,

FR(n,k) = g(k) + Σ_{n'=0}^{5} f(n',k)·Rew(n−n')  (1)

The filter f describes how the reward in a given trial affects neural activity in the subsequent trials, assuming that the effects of rewards in successive trials are additive. For example, f(3,4) describes the effect of a reward after 3 trials during epoch 4. The filter f corresponds to our definition of memory trace as illustrated in Fig.1b,c; it reflects the deviation from the epoch-dependent time course g(k) due to a reward event. Since in this study Rew(n) is nearly a random sequence11 of +1 and −1, averaging the firing rates over all trials recovers the epoch code g(k) (the average over n of FR(n,k)). We estimated the memory trace f(n',k) by applying multiple linear regression to the data according to Eq.(1). One example memory trace is given in Fig. 2b (same neuron as in Fig.2a, colour denotes epoch), which is negative, i.e. reward decreases the activity of this neuron in subsequent trials. The memory trace does not decay monotonically, but its strength is modulated throughout the trial, consistent with the epoch code. According to the multiplicative model illustrated in Fig.1b,c, we assumed that the memory trace f is factorized into the epoch code g(k) and an exponential function ex(t), as described by the following equation,

FR(n,k) = g(k) + g(k)·Σ_{n'=0}^{5} ex(t)·Rew(n−n')  (2)

The filter f considered in Eq.(1) is now replaced by the product of two factors g(k)·ex(t), where ex(t) = A·e^{−t/τ} is an exponential decay function, and t is the time elapsed since the outcome (see Methods). By applying this model to the neuron shown in Figure 2, for example, we obtained a timescale of memory decay τ = 6.9 trials and an amplitude A = −0.24. The exponential function (ex) and its modulated envelope (g·ex) are shown in Fig. 2b by the continuous and broken line, respectively. According to the factorization (f = g·ex), the constant of proportionality between the memory trace f and the exponential function ex, estimated in different epochs (Fig.2c), should reproduce the epoch code g(k). The estimated slopes for this neuron closely followed the epoch code, indicating that the factorization is nearly exact (Fig. 2a). The factorization index (FI) of a neuron, defined as the correlation coefficient between the epoch code and the proportionality constants (slopes), was 0.97 for this neuron.
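
To make the fitting procedure concrete, the following is a minimal sketch in Python/NumPy of the factorized fit of Eq.(2) and of the factorization index. All function and variable names are illustrative assumptions, not the code used in the study, and the elapsed time is simplified to trial lags rather than physical time (see Methods).

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import pearsonr

def fit_factorized(fr, rew):
    """fr: firing rates, shape (N trials, 12 epochs); rew: +1/-1 sequence, shape (N,).
    Fits FR(n,k) = g(k) * (1 + sum_j A*exp(-t_j/tau)*Rew(n-j)), as in Eq. (2)."""
    g = fr.mean(axis=0)                 # epoch code: trial-averaged rate per epoch
    N = len(rew)

    def sq_error(params):
        A, tau = params
        err = 0.0
        for n in range(5, N):
            # trial lag j as a crude stand-in for the physical elapsed time t
            mem = sum(A * np.exp(-j / tau) * rew[n - j] for j in range(6))
            err += np.sum((fr[n] - g * (1.0 + mem)) ** 2)
        return err

    res = minimize(sq_error, x0=[0.1, 2.0], method='Nelder-Mead')
    A, tau = res.x
    return g, A, tau

def factorization_index(f, ex, g):
    """f: regression memory trace, shape (6 lags, 12 epochs); ex: exponential
    evaluated at the same lags; g: epoch code. FI = corr(slopes, epoch code)."""
    slopes = [np.polyfit(ex, f[:, k], 1)[0] for k in range(12)]
    return pearsonr(slopes, g)[0]
```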

The modulated decay of the memory trace was observed in the majority of the recorded neurons in all three cortical areas. In some cases, the sum of two exponential functions, ex(t) = A1·e^{−t/τ1} + A2·e^{−t/τ2}, fitted the data better than a single exponential, in which case the memory trace often exhibits a biphasic characteristic (with A1 and A2 of opposite sign; see the fourth column of Fig. 3). Using the Bayesian Information Criterion (BIC), we found that the best fit was a single exponential for 269 neurons and a double exponential for 268 neurons, while the remaining 144 neurons were fitted best by a model with ex(t)=0. The latter is interpreted as no memory, and the corresponding neurons were excluded from further analysis. We tested the validity of the fitting procedure by randomly reshuffling the order of trials in each session, and we consistently found that 96% of neurons (656/681) show no memory after reshuffling.

Figure 3.

Firing rates and memory traces for six neurons, two for each of the three recorded areas. For each of the six neurons, epoch codes (first and third column) and memory traces (second and fourth column) are shown, in the same format as in Figure 2a and 2b. The second column shows monotonic decay of the memory trace, while the fourth column shows biphasic memory traces (double exponential). Different neurons have different firing rates, both in magnitude and time course, and different types of memory decay, but they are all consistent with an exponential (single or double) decay of the memory modulated by the epoch code. FIs for these neurons are: (a,b) 0.98, (c,d) 0.91, (e,f) 0.98, (g,h) 0.84, (i,j) 0.97, (k,l) 0.61.

Fig.3 shows six example neurons, two for each of the three cortical areas, and for each neuron, the average firing rate (epoch code) and memory trace are plotted. Different neurons have different magnitudes and time courses of the firing rate, and they all show a decaying memory trace modulated by the epoch code. We stress that the broken lines in the plots of memory traces (Fig 3, second and fourth columns) are not the result of fitting the coloured dots; rather, they result from an independent application of the factorized model in Eq.(2) to each neuron's firing rate. Although the activity of most neurons is consistent with an exponential decay of the memory trace (79%, 537/681, single and double exponentials), a fraction of them did not show a modulation of the memory by the epoch code. This is quantified by the factorization index, which is significantly positive for approximately half of the neurons showing a memory effect (46%, 249/537, p<0.05 t-test). We found a small but significant difference in the fraction of neurons with memory across different areas (87% in ACCd, 75% in DLPFC and 78% in LIP, χ2-test p=0.01).

We next investigated how the timescales of memory traces were distributed across neurons in different cortical areas. Fig.4 shows the distribution of timescales in all cortical areas (black circles), whereas coloured lines show the distribution for the three different areas, which are remarkably consistent. The red line shows a power law fit with an exponent of −2. The power law implies that timescales are distributed in a wide range of values. In fact, for a power law distribution ~ τ^{−2}, the variance increases with the sample size and, in principle, arbitrarily large timescales would be observed with a proportionally large increment in the number of recorded neurons. Note that the power law tail applies for timescales equal to or larger than one trial, which are those timescales that might be involved in memory (see below). About 20% of all recorded neurons (133/681) had a timescale larger than one trial (29% in ACCd, 19% in DLPFC and 13% in LIP, χ2-test p=0.0005, see Fig.S1c-e in the Supplementary Material). Since the timescales from one (τ) and two-exponential functions (τ1, τ2) were distributed similarly (see Fig.S1a,b in the Supplementary Material), we pooled all timescales (a total of 805 timescales from 269 single exponential and 268 double exponential, i.e. 269 τ, 268 τ1 and 268 τ2). ACCd contributed 197 timescales from 71 single exponential and 63 double exponential functions (71 τ, 63 τ1 and 63 τ2), whereas 20 neurons had no memory. A total of 362 timescales were obtained from DLPFC with 124 single and 119 double exponential functions (124 τ, 119 τ1 and 119 τ2), and 79 DLPFC neurons had no memory. LIP neurons contributed 246 timescales from 74 single and 86 double exponential functions (74 τ, 86 τ1 and 86 τ2), and 45 LIP neurons showed no memory.
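
As an illustration of how such a distribution can be assessed, the sketch below bins fitted timescales logarithmically and estimates the power-law exponent by a least-squares fit in log-log coordinates. This is a hedged reconstruction of the style of analysis shown in Fig.4, with illustrative names; it is not the authors' code.

```python
import numpy as np

def log_binned_density(taus, n_bins=10):
    """Density of timescales in logarithmic bins: count divided by bin length."""
    edges = np.logspace(np.log10(taus.min()), np.log10(taus.max()), n_bins + 1)
    counts, _ = np.histogram(taus, bins=edges)
    density = counts / np.diff(edges)
    centers = np.sqrt(edges[:-1] * edges[1:])   # geometric bin centers
    return centers, density

def powerlaw_exponent(centers, density):
    """Least-squares slope in log-log coordinates; near -2 for the tail in Fig. 4."""
    mask = density > 0
    slope, _ = np.polyfit(np.log10(centers[mask]), np.log10(density[mask]), 1)
    return slope
```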

Figure 4.

Distribution of the timescales characterizing the reward memory traces across neurons. Black disks show the density for the neurons in all three cortical areas in the corresponding bin, i.e. the count of timescales divided by the bin length (error bars: ±SE). The inset shows the count of the timescales in the same bins, in a linear scale (a total of 805 timescales). Grey markers show the density separately for each of the three different cortical areas (square - ACCd, 197 timescales; upward triangle - DLPFC, 362; downward triangle - LIP, 246). The red line (red curve in the inset) shows a power law fit (exponent = −2).

Comparison with behavior

Are the neural memory timescales relevant for learning and decision-making? The matching pennies task used in this study does not necessarily require the memory of past rewards, and the optimal strategy for the monkey is to choose randomly and unpredictably. Although the overall performance of monkeys was nearly optimal, their trial-by-trial decisions, locally in time, were influenced by previous rewards and actions11,19-22. We analyzed the behavior of monkeys in different experimental sessions by fitting their decisions with a standard reinforcement learning model5 (RL, see Methods). The learning rate parameter (α) of the RL model quantifies the behavioral timescale of the memory trace (α ~ 1/τ). The resulting likelihood was significantly larger than the likelihood for reshuffled trials, and the model fit to the behavioural data was significant in 78% of the sessions (196/250, p<0.05). We found that the timescales of behavioural memory vary across sessions, possibly suggesting that monkeys adopt different strategies in successive sessions. For the 196 sessions fitted by the RL model, the distribution of behavioral timescales followed a power law distribution (Fig.5a), and the exponent was consistent with that measured in the neural distribution. Hence, the distributions of behavioural and neuronal timescales qualitatively match with each other. This result suggests that there might be a relationship between the memory trace observed at the neural level and that observed at the behavioural level. We tested this hypothesis by comparing the neural timescale for reward memory observed during a given recording session with the behavioural timescale fit in that session (when both are available), and we found a small but significant correlation across sessions (Fig.5b, R=0.12, p=0.003), suggesting that the activity of single neurons is related, albeit weakly, to the behavioural strategy of the animals.

Figure 5.

Distribution of behavioral timescales and their relationship with the neural memory timescales. (a) Time constant τ estimated from the learning rate α (τ ~ 1/α) of a reinforcement learning model fit to the monkey's behavioural data. Black disks show the density in the corresponding bin, i.e. the count of timescales divided by the bin length (error bars, ±SE). The inset shows the count of the timescales in the same bins, in linear scale (a total of 196 timescales). The red line (red curve in the inset) shows a power law fit (exponent = −1.9). (b) The scatterplot of behavioural vs neural memory timescales obtained from all sessions where both were available. Neural timescales from different types of fit (τ from single exponential and τ1, τ2 from double exponential) are shown in different colours. Behavioral and neural timescales show a small but significant correlation (R=0.12, p=0.003).

Do the reward memory timescales also change within a single session? We determined whether the timescales are stable within a single recording session by dividing each session into two separate blocks (halves) of trials, and we re-estimated both the neural and behavioral timescales separately in the two blocks. Results are shown in Fig.6, suggesting that both the behavioral (panel (a)) and neural memory timescales (panel (b)) were fairly stable within a single session.

Figure 6.

Stability of behavioural (a) and neural memory timescales (b) within an experimental session. In both panels, the scatterplot of the timescales fitted in the second half of the trials is plotted against the timescales fitted in the first half of the trials in the same session. The correlation is significantly different from zero in both cases (R=0.4 for behavioural timescales, R=0.77 for neural timescales), suggesting that both types of timescales are fairly stable within a single session. Neural memory timescales from different types of fit (τ from single exponential and τ1, τ2 from double exponential) are shown in different colours.

The neural and behavioural timescales might fluctuate together across sessions, but their small correlation indicates that there is only a weak coupling. Indeed, we found that at any moment, the timescales of reward memory varied across cortical neurons. In each recording session of our study, only a few neurons were recorded simultaneously (about two on average). When we estimated memory timescales for pairs of simultaneously recorded neurons, the correlation between their time constants was not significantly different from zero (312 pairs of timescales, R=0.07, p=0.2). This result suggests that the broad distribution of memory time constants observed in the data reflects a variability of timescales across different neurons, rather than resulting from a memory timescale fixed for all neurons that collectively changes across sessions.

Taken together, our results support the conclusion that a diverse collection of neural memory timescales, a “reservoir”, is available across cortical neurons at any given time. The animal's behavior may be determined by a readout system that is able to sample, at different times, from a variety of timescales present in the reservoir. The reservoir might not be static, and it may change its distribution of timescales from day to day. During competitive games, the subjects might also take into account their recent choices to determine their future behavior. Therefore, we tested whether any memory trace of choice exists in the recorded neurons, by applying the same analysis of Eqs.(1) and (2), substituting reward with choice. Our results indicate that multiplicative modulation and a power law distribution of memory timescales also hold for memory trace of past choices (see Fig.S2 in the Supplementary Material). A detailed analysis of the neural memory of choice, and of how the two types of memory for reward and choice may be combined, will be the subject of a separate study.

Neural network model for memory traces

What neural mechanism(s) accounts for the statistical properties of reward memory described above? To address this question, we constructed a simple neural network model that reproduces the observed neural memory traces (Fig.7a, Fig.S3). Model neurons integrate the reward signals by receiving a current impulse whenever a reward is obtained. Since neurons are recurrently connected and form loops, their activities reverberate and are able to maintain the memory of reward events. However, those memories decay and are slowly forgotten according to a time course which depends on the pattern of synaptic connections among neuron pairs. Specifically, the activity of the neurons evolves according to dv/dt = J·v(t) + h·Rew(t), where v is a vector of M components, each component being the activity of a different neuron in the reservoir (M=1000 neurons in simulations); J is the synaptic connectivity matrix of their interactions; h is a vector representing the relative strength of the reward input Rew(t) to each neuron. For our purposes, the specific form of the input signals is not important; the results will depend only on the synaptic matrix J. We assumed the connection weights (the entries of the matrix J) to be randomly distributed, and we looked for candidate probability distributions such that the network model reproduces the distributions of timescales and amplitudes observed in the neural data from behaving monkeys (see Supplementary Text). Amplitudes determine the extent of the immediate response of neurons to reward, with respect to the average activity. While time constants have a power-law distribution (Fig.4), the distribution of amplitudes is exponential (Fig.8a, where we used A for one exponential and A1+A2 for two exponentials).
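
A minimal simulation of this linear reservoir is sketched below (Python/NumPy; the network size matches the text, but the coupling statistics, input weights and time units are illustrative assumptions). The per-neuron decay timescales follow from the eigenvalues of J, since each mode relaxes as exp(Re(λ)·t).

```python
import numpy as np

M, dt, T = 1000, 0.01, 50.0           # neurons, Euler time step, duration (a.u.)
rng = np.random.default_rng(0)
gain = 0.99                           # just below the critical point (see text)
J = -np.eye(M) + gain * rng.normal(0.0, 1.0 / np.sqrt(M), (M, M))  # leak + random coupling
h = rng.normal(0.0, 1.0, M)           # reward input weights

v = h.copy()                          # state right after a single reward impulse at t = 0
traces = [v[:4].copy()]
for _ in range(int(T / dt)):
    v = v + dt * (J @ v)              # Euler integration of dv/dt = J v (no further input)
    traces.append(v[:4].copy())       # memory traces of four example neurons (cf. Fig. 7a)

# Timescales of the network modes: each decays as exp(Re(lambda)*t), so tau = -1/Re(lambda)
taus = -1.0 / np.linalg.eigvals(J).real
```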

Figure 7.

Neural responses (memory traces) in the model and distribution of timescales of the memory traces in model neurons. (a) The memory traces of four model neurons. (b) Black disks show the density of timescales in the corresponding bin, i.e. the count of timescales divided by the bin length (error bars, ±SE). The inset shows the count of the timescales in the same bins, in a linear scale (a total of 1000 timescales). The red line (red curve in the inset) shows a power law fit (exponent = −2).

Figure 8.

Distribution of amplitudes of the memory traces in the neural data (a) and model (b). In both panels, black disks show the density in the corresponding bin, i.e. the count of amplitudes divided by the bin length (error bars, ±SE). The inset shows the count of the amplitudes in the same bins, in a linear scale (537 amplitudes in the data, 1000 in the model). Amplitudes are plotted as absolute values, since the distribution is approximately symmetric (symmetry is shown in the inset). Grey markers show the density separately for the three different recorded areas (squares: ACCd, 134 amplitudes, upward triangles: DLPFC, 243, downward triangles: LIP, 160). The red line (red curve in the inset) shows an exponential fit (e^{−|A|}).

First, we found that the connection weights must be broadly distributed among neuron pairs, and this endows the network with a wide variety of timescales. Intuitively, the stronger the connection, the longer the reverberation of the input and hence the timescale of the memory trace. However, if connections are also heterogeneous, then weaker connections, and smaller timescales, will also contribute to the memory traces. If the width of the distribution of connection weights reaches a certain threshold, a power-law distribution of timescales is observed (Fig.7b), which is characterized by a high probability for both small and large timescales. This is a distinct type of network state "at a critical point" (or "edge of chaos" in nonlinear systems), which has been proposed to be desirable for many kinds of computations16-18. In our model, the criticality corresponds to the situation where the system is on the verge of losing stability. When the width of the connection distribution exceeds the critical level, the linear system is unstable and the model would need to be extended to include nonlinearities such as saturation of neural activity. For the sake of simplicity, here we limit ourselves to the linear model, which is sufficient for the purpose of reproducing the observed power-law distribution of timescales under specific conditions.

A second desirable property of the network is that its dynamics is robust with respect to small changes of the connection strengths. If the coding of the memory changes dramatically as a result of small changes in the connection strengths (e.g. synaptic noise), it would be difficult for a downstream system to interpret that code. A known property of the connection matrix J ensuring that kind of robustness is normality, which guarantees that there is an orthogonal set of eigenvectors26 (but see Refs 27-29 for non-normal neural network models). If J is normal, we showed that the amplitudes of the memory traces follow an exponential distribution (Fig.8b), consistent with the experimental observations (Fig.8a). To our knowledge, our results provide the first complete statistical description of a network connection matrix based on in vivo neuronal recordings of behaving animals (see also Refs 30-32).
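
As a sketch of this second ingredient, a normal matrix can be built explicitly from an orthogonal eigenvector basis. The eigenvalue distribution below is an assumption chosen purely for illustration; note that a density of eigenvalues that is roughly uniform near zero yields a τ^{−2} tail, since τ = −1/λ.

```python
import numpy as np
from scipy.stats import ortho_group

M = 500
Q = ortho_group.rvs(M, random_state=1)      # orthogonal eigenvectors => J is normal
rng = np.random.default_rng(1)
lam = -rng.uniform(1e-3, 2.0, M)            # real, strictly negative eigenvalues (stable)
J = Q @ np.diag(lam) @ Q.T

assert np.allclose(J @ J.T, J.T @ J)        # normality: J commutes with its transpose
taus = -1.0 / lam                           # uniform lambda near 0 => p(tau) ~ tau**-2
```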

Discussion

The power law of timescales suggests that the duration of reward memory trace is highly diverse across cortical neurons. The same diversity is observed across three cortical areas, suggesting that the computation of reward memory is a distributed process. This finding is in line with an increasing appreciation that neural encoding of cognitive variables is highly heterogeneous and distributed33,34. Prefrontal cortex plays a major role in dynamic decision processes encoding and updating values1-4. While anterior cingulate cortex was implicated in monitoring conflict between incompatible response processes35 or detecting performance errors36, recent studies have placed more emphasis on its role in representing both positive and negative values19,37. Parietal cortex has also been implicated in decision making based on the value representation as well as on the accumulation of sensory evidence38,39.

Our work provides a comprehensive description of memory traces in terms of a specific distribution of timescales across a population of neurons, and introduces a framework that could potentially be applicable to different brain areas and different types of memory. The concept of multiplicative modulation of memory traces can be used to deduce the neural memory timescales in various tasks, and to test the idea that a different set of time constants is selected to adapt to a specific environment6,7. Although the global optimal strategy for the matching pennies task is to choose randomly and therefore does not require memory, the animals made their decisions largely on the basis of their reward history11,19-22. Perhaps in the persistent search for an appropriate strategy, they sampled different timescales across experimental sessions. We found that those behavioural timescales followed a similar distribution as, and were weakly yet significantly correlated with, the timescales observed at the neural level. This suggests the possibility that the behavior might be driven by a mechanism that appropriately samples from a range of timescales in a neural network, which has yet to be elucidated. Alternatively, this weak correlation might be caused by factors that are currently not understood. Note that the observed range is different for the neuronal versus behavioural time constants. Also, we have not attempted to fit the behavioural data by an RL model endowed with multiple time constants. Future work is needed to further assess the correlation between neural memory traces and behavior. Regardless, our results suggest that reward memory with multiple time constants might be used to compute the value functions of reinforcement learning theory on more than one timescale. Similarly, the double exponential decay of memory may correspond to a reward prediction error signal: if the short timescale (τ1) is small enough (about one trial or smaller) then the corresponding exponential filter will respond primarily to the reward in the present trial, while the long timescale (τ2) may provide a value signal by weighting the rewards in the past few trials. When the two exponentials have opposite sign, they roughly subtract the value from the actual reward signal, thereby providing a reward prediction error. It was noted previously that a biphasic filtering in dopamine neurons might provide a reward prediction error40.

Besides the memory for reward, the activity of primate cortical neurons reflects other types of short-term memory. The time course of memory-related activity varies across different neurons and different task protocols, including persistent, ramping, and multi-phasic activity41-43. Memory traces in the neural signals are mixed with other task-dependent factors44,45, and it has been debated whether other processes involved in goal-directed behavior could be mistakenly identified as memory, such as spatial attention46, motor planning47, anticipation of future events48, or timing49. The epoch code in the present task might include many of those processes, and we have shown that memory signals could be dissociated from those factors by assuming a multiplicative computation. The hypothesis of a multiplicative effect of memory on neural activity could be tested by looking more closely at the multi-phasic time course of memory-related activity observed in other experiments. The computational advantage of the multiplicative effect of memory needs to be further investigated. For example, it may serve the appropriate recall of memories at different epochs (see Supplementary Text), as observed in a recent study50.

Reservoir-type networks have been the subject of active research in computational neuroscience and machine learning13-15, but so far experimental support that such networks are adopted by the brain has been lacking. Those models predict that the memory of input signals is stored in a large, recurrent and heterogeneous network (reservoir), in a distributed manner, and that a desired output is obtained by a trainable combination of the response signals in the reservoir. The heterogeneous encoding of the input allows the flexible learning of different output functions. In our context, that may correspond to a flexible change in strategy resulting from the variety of timescales for reward memory present in the reservoir. We present direct experimental evidence, at the level of single neurons, for a high-dimensional reservoir network of reward memory traces in prefrontal, cingulate and parietal areas of the primate cortex. This empirical finding is reproduced by a simple computational model, which suggests that reward filtering in the cortex involves a dynamic reservoir network operating at the critical point, leading to a power-law distribution of time constants. The output of the network, supposedly driving the animal's behavior, is not explicitly modeled in our equations. Further studies are necessary to elucidate how the motor areas read out the memory of reward and choices, and how the two are combined to subserve adaptive choice behavior.

Power-law distributions are unusual, as they imply a high probability for both large and small time constants. A diversity of time constants also means a broad range of learning rates, since the two are inversely related to each other. This is noteworthy, since a shift from an exploitive to an exploratory strategy as the environment becomes uncertain is often assessed by an increase in the learning rate10. It remains to be seen whether the same or different distribution of learning rates holds across species when faced with a similar environment, and whether it can be flexibly modified to adapt to different circumstances. Ultimately, this framework could lead to a new model for predicting how reward expectation is computed and how reward memory affects decision-making.

Methods

Animal preparation and electrophysiological recording

All the data were collected using the same behavioural task and electrophysiological techniques. Here we give a brief summary of the methods, described previously in Refs. 19-21. Five male and one female rhesus monkeys were used. The animal's head was fixed during the experiment, and eye movements were monitored at a sampling rate of 225 Hz with a high-speed eye tracker (Thomas Recording). Animals performed an oculomotor free-choice task22 (matching pennies, Fig. 1a). Trials began with the animal fixating a small yellow square (0.9° × 0.9°) displayed at the center of the computer screen for a 0.5-s fore-period. Two identical green disks were presented at 5° eccentricity in diametrically opposed locations along the horizontal meridian for a 0.5-s delay period. The extinction of the central target signalled the animal to shift its gaze toward one of the targets within 1 s. After the animal maintained its fixation on the chosen peripheral target for 0.5 s, a red ring appeared around the target selected by the computer. The animal was rewarded only if it chose the same target as the computer, which simulated a rational decision maker in the matching pennies game trying to minimize the animal's expected payoff. Before each trial, the computer made a prediction for the animal's choice by computing the conditional probabilities for the animal to choose each target given its choices and rewards in the preceding four trials. The computer made a random choice if the probabilities were consistent with unbiased behaviors, otherwise it would bias its selection against the prediction. Single-unit activity was recorded using a 5-channel multi-electrode recording system (Thomas Recording) from three cortical regions: the dorsal bank of anterior cingulate sulcus19 (ACCd, area 24c, 2 male monkeys, 8-12kg), dorsolateral prefrontal cortex20,22 (DLPFC, anterior to the frontal eye field; 4 male and 1 female monkeys, 5-12kg), and the lateral bank of the intraparietal sulcus21 (LIP, 2 male and 1 female monkeys, 5-11kg). All the neurons were recorded without pre-screening. The placement of the recording chamber was guided by magnetic resonance (MR) images, and confirmed by metal pins inserted in known anatomical locations at the end of the experiment in some animals. In three animals, two recording chambers were used for simultaneous recording of DLPFC and LIP. All the experimental procedures were approved by the Institutional Animal Care and Use Committee at Yale University and conformed to the Public Health Services Policy on Humane Care and Use of Laboratory Animals and the Guide for the Care and Use of Laboratory Animals.

Multiple regression analysis of memory traces

This section explains the method used to estimate the memory traces f(n',k) from the observed neuronal firing rates and sequence of rewards. In each trial, firing rates were computed in twelve time intervals of 250ms each (see Fig.1a). The following model was used to fit the firing rates: the firing rate of a neuron depends on the trial epoch k, following the epoch code g(k); after the outcome is revealed (feedback period) in each trial, the firing rate is changed by an amount +f(n',k) for reward and −f(n',k) for no reward, where n' is the number of trials elapsed since that outcome. Effects of outcomes in successive trials are additive. The firing rate FR(n,k) is thus described by the following equation

FR(n,k) = g(k) + Σ_{n'=0}^{5} f(n',k)·Rew(n−n') + noise,  (M.1)

where the index k labels the epoch (k = 1,…,12) and the indices n and n' label trials. The effect of reward extends up to 5 trials (n'=0,…,5), while the index n runs over all N trials available in each neuron recording (starting after the first 5 trials, n = 6,…,N). In order to determine f(n',k) and g(k), we applied a multiple regression model by using the known FR(n,k) and Rew(n) (= +1/−1 for reward/no reward). Note that the epoch code g(k) depends on the twelve different epochs within a trial, while the reward Rew(n) depends only on trial number. As a consequence, the regression can be applied separately for each epoch. For a fixed epoch k, the seven unknown variables g(k), f(0,k), f(1,k), f(2,k), f(3,k), f(4,k), f(5,k) can be determined by using the known values of FR(n,k) and Rew(n) in N–5 trials (n=6,…,N). Using a parsimonious matrix notation and omitting the epoch label k, Eq.(M.1) can be rewritten as

FR = Rew·f + noise  (M.2)

where the vector of the known firing rates FR is given by

FR = [FR(6,k), FR(7,k), …, FR(N,k)]^T  (M.3)

The seven unknown variables are collected into a single vector f,

f = [g(k), f(0,k), f(1,k), f(2,k), f(3,k), f(4,k), f(5,k)]^T  (M.4)

The matrix Rew is known, given by

Rew = [ 1  Rew(6)  Rew(5)    Rew(4)    Rew(3)    Rew(2)    Rew(1)
        1  Rew(7)  Rew(6)    Rew(5)    Rew(4)    Rew(3)    Rew(2)
        ⋮    ⋮       ⋮         ⋮         ⋮         ⋮         ⋮
        1  Rew(N)  Rew(N−1)  Rew(N−2)  Rew(N−3)  Rew(N−4)  Rew(N−5) ]  (M.5)

Because the sequence of rewards is nearly random and N is large, different columns of the matrix Rew are nearly orthogonal. This implies that the matrix product (Rew^T·Rew) is well conditioned, and the solution f_sol minimizing the variance of the noise (or squared error) is robust and given by

f_sol = (Rew^T·Rew)^{−1}·Rew^T·FR  (M.6)

This expression is used to obtain the results. The confidence intervals for f_sol are derived from the residual errors according to the MATLAB function regress.

The matrix product (Rew^T·Rew) is approximately proportional to the identity matrix. When Rew^T·Rew = I, the filter is equal to the firing rate averaged over all trials, where the average is conditioned on the past rewards. This is equivalent to the cross-correlation between the input (rewards) and output (firing rates), and its application would correspond to a reverse correlation method, commonly used in the analysis of sensory neural coding. However, in the main text we showed only results from the multiple regression analysis. For simplicity, we used an average over all trials as the definition of the epoch code g(k) in the main text, making use of the above approximation.
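
For concreteness, a minimal Python/NumPy stand-in for this per-epoch regression (the study used the MATLAB function regress) might look as follows; the names are illustrative, and trials are 0-indexed here.

```python
import numpy as np

def memory_trace_regression(fr_epoch, rew):
    """fr_epoch: firing rate in one fixed epoch k, shape (N,);
    rew: reward sequence of +1/-1, shape (N,).
    Returns the least-squares estimate [g(k), f(0,k), ..., f(5,k)] of Eq. (M.4)."""
    N = len(rew)
    # Design matrix of Eq. (M.5): a constant column plus the six lagged rewards
    R = np.array([[1.0] + [rew[n - j] for j in range(6)] for n in range(5, N)])
    fr = fr_epoch[5:]
    f_sol, *_ = np.linalg.lstsq(R, fr, rcond=None)   # implements Eq. (M.6)
    return f_sol
```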

Exponential memory traces and model selection

The model considered here is similar to that of Eq.(M.1), but we assumed that the memory traces are an exponential function ex(t) rescaled by the epoch code g(k), namely,

FR(n,k) = g(k) + g(k)·Σ_{n'=0}^{5} ex(t)·Rew(n−n') + noise  (M.7)

The filter f considered in Eq.(M.1) is replaced by g(k)·ex(t). We considered two different exponential functions, a single exponential and the sum of two exponentials, namely,

ex1(t) = A·e^{−t/τ}  (M.8)
ex2(t) = A1·e^{−t/τ1} + A2·e^{−t/τ2}  (M.9)

where τ1 < τ2. The physical time t depends on all indices k, n and n', because the time elapsed between different epochs and between successive trials is variable, due to the variability in the time taken by the animal to start a trial and to make a saccade to one of the two targets. Based on the time stamps generated during the experiment, we computed the physical time t = t(n,k,n') as the difference between the time corresponding to a given trial and epoch (n,k) and the time corresponding to the feedback epoch of n' trials in the past (up to 5 trials). Note that the memory trace f obtained by the multiple linear regression is not computed in physical time. For that analysis, we assumed that the saccade reaction time of the animal in all trials is equal to 120ms (average), and that the time elapsed between the initiation of two successive trials is 3.4s (median).

The epoch code g(k) was fixed by the firing rates averaged across trials, while the parameters of the exponential function (two parameters (A, τ) when using Eq.(M.8) and four parameters (A1, τ1, A2, τ2) when using Eq.(M.9)) were estimated using a non-linear curve-fitting procedure, implemented by the MATLAB function fminsearch, minimizing the variance of the noise (sum of squared errors) in Eq.(M.7). Fitting was repeated ten times for each neuron and each model, in the search for a global minimum of the error. Any parameters resulting in unrealistic values were discarded, such as negative values of τ, τ1 or τ2, values of τ larger than 20 trials, and the absolute value of A or (A1+A2) larger than 4. We determined the parameters for all neurons in both exponential models, single and double exponential, and denoted the corresponding square errors by σ1² and σ2², respectively. We also computed the variance of firing rate, σ0², as the square error for a zero filter model, i.e. ex=0 or FR=g+noise. Among the three models, the selection of the appropriate one for each neuron was determined according to the Bayesian information criterion (BIC):

BIC_i = m·log(σ_i²) + p_i·log(m)  (M.10)

where p_i denotes the number of parameters in the model, with p_0 = 1, p_1 = 3 and p_2 = 5 for the 0-, 1- and 2-exponential fits, respectively (note that the variance σ_i² is also a parameter); m is the number of data points, m = 12(N−5) (12 epochs and N−5 trials for each neuron). The model with the minimum BIC was chosen for each neuron. As a control of the fitting procedure, we reshuffled the label n in the firing rates FR(n,k), assigning to each firing rate the value of a random trial, and we repeated the entire procedure.
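
The sketch below illustrates this model comparison under the simplifying assumption that the exponentials are fitted directly to the regression-derived memory trace sampled at times t, rather than to the raw firing rates as in Eq.(M.7); function names and starting values are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

def bic(sq_err, n_params, m):
    # Eq. (M.10), with sigma^2 estimated as the mean squared error
    return m * np.log(sq_err / m) + n_params * np.log(m)

def select_exponential_model(t, f):
    """t: elapsed times (array); f: memory-trace values at those times.
    Returns 0, 1 or 2, the number of exponentials with minimum BIC."""
    m = len(t)
    err0 = np.sum(f ** 2)                                     # ex(t) = 0 model
    e1 = minimize(lambda p: np.sum((f - p[0] * np.exp(-t / p[1])) ** 2),
                  x0=[0.1, 2.0], method='Nelder-Mead')
    e2 = minimize(lambda p: np.sum((f - p[0] * np.exp(-t / p[1])
                                      - p[2] * np.exp(-t / p[3])) ** 2),
                  x0=[0.1, 1.0, -0.1, 5.0], method='Nelder-Mead')
    # In practice, fits with unrealistic parameters (e.g. negative tau) are discarded
    bics = [bic(err0, 1, m), bic(e1.fun, 3, m), bic(e2.fun, 5, m)]
    return int(np.argmin(bics))
```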

Reinforcement learning fit of behavior

We applied a standard reinforcement learning model5, separately for each recording session, to analyze how the animal's choice was influenced by the outcomes of its previous choices. For example, when right target R was chosen in trial t, the value function for R, denoted by QR(t), was updated according to:

Q_R(t+1) = Q_R(t) + α·[Rew(t) − Q_R(t)]  (M.11)

where Rew(t) denotes the reward received by the animal in trial t, and the term inside the square brackets is commonly defined as the reward prediction error (RPE), i.e. the discrepancy between the actual reward and the expected reward. A similar equation holds for the left value function Q_L(t). The probability that the animal would choose the rightward target in trial t, P_R(t), was determined by the softmax transformation as follows:

P_R(t) = exp(β·Q_R(t)) / [exp(β·Q_L(t)) + exp(β·Q_R(t))]  (M.12)

where β, referred to as the inverse temperature, determines the randomness of the animal's choices. Model parameters (α,β) were estimated separately for each recording session by using a maximum likelihood procedure, where the likelihood is the product of probabilities in all trials (Eq.(M.12)), in each trial using R or L according to the monkey's actual choice. The parameter values maximizing the likelihood were found by using the MATLAB function fminsearch. The significance of the estimation was assessed, for each session, by constructing 100 surrogate sessions, each one obtained by reshuffling the order of trials. The distribution of 100 maximum likelihoods obtained by the estimation procedure was then compared with the maximum likelihood of the non-reshuffled case, which was considered significant if it was not smaller than the fifth largest reshuffled likelihood (p ≤ 0.05).
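
A minimal Python stand-in for this maximum-likelihood fit (the study used fminsearch) is sketched below; the choice/reward encodings and starting values are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_likelihood(params, choices, rewards):
    """choices: 0 = left, 1 = right; rewards: 1 = reward, 0 = no reward."""
    alpha, beta = params
    Q = np.zeros(2)
    nll = 0.0
    for c, r in zip(choices, rewards):
        p_right = np.exp(beta * Q[1]) / (np.exp(beta * Q[0]) + np.exp(beta * Q[1]))
        p_choice = p_right if c == 1 else 1.0 - p_right
        nll -= np.log(p_choice + 1e-12)          # accumulate Eq. (M.12) over trials
        Q[c] += alpha * (r - Q[c])               # Eq. (M.11): update the chosen value
    return nll

# Example usage on hypothetical per-session data:
# fit = minimize(neg_log_likelihood, x0=[0.2, 1.0], args=(choices, rewards),
#                method='Nelder-Mead')           # alpha ~ 1/tau, the behavioral timescale
```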

Value functions and RPE signals can be related to the exponential filters estimated for individual neurons. If a single value function (for a given stimulus/action) and a single reward (delivered at time zero) are considered, the solution of Eq.(M.11) can be approximated by an exponential response, i.e. Q(t) = (1−1/τ)^t ≈ exp(−t/τ), provided that τ is larger than 1 trial. When a sequence of rewards is delivered instead of a single one, the value is a superposition of the exponential responses for each reward.
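
The approximation can be spelled out in one line, assuming a single reward at time zero, no further input, and α = 1/τ with τ > 1 trial, so that Eq.(M.11) reduces to Q(t+1) = (1−α)·Q(t):

```latex
Q(t) = (1-\alpha)^t \, Q(0) = e^{\,t \ln(1 - 1/\tau)} \, Q(0) \approx e^{-t/\tau} \, Q(0),
\qquad \text{since } \ln(1 - 1/\tau) \approx -1/\tau \ \text{for } \tau \gg 1 .
```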

Supplementary Material


Acknowledgements

We thank James Mazer and Min Whan Jung for comments on an earlier version of the manuscript, Rishidev Chaudhuri, Michael Harre and John Murray for discussions. This work was supported by the NIH grant R01 MH062349 and the Swartz Foundation (A.B. and X.J.W.), and by NIH grants R01 MH073246 (X.J.W., D.L.) and DA029330 (D.L.).

Footnotes

Author Contributions

All authors participated in the research design and the preparation of the manuscript. H.S. collected the data, A.B. and H.S. analyzed data, A.B. and X-J.W. performed modeling.

Competing Financial Interest

The authors declare no competing financial interests.

References

1. Kable JW, Glimcher PW. The neurobiology of decision: consensus and controversy. Neuron. 2009;63:733–745. doi: 10.1016/j.neuron.2009.09.003.
2. Rushworth MF, Behrens TE. Choice, uncertainty and value in prefrontal and cingulate cortex. Nat. Neurosci. 2008;11:389–397. doi: 10.1038/nn2066.
3. Wang X-J. Decision making in recurrent neural circuits. Neuron. 2008;60:215–234. doi: 10.1016/j.neuron.2008.09.034.
4. Soltani A, Lee D, Wang X-J. Neural mechanism for stochastic behavior during a competitive game. Neural Networks. 2006;19:1075–1090. doi: 10.1016/j.neunet.2006.05.044.
5. Sutton RS, Barto AG. Reinforcement Learning: An Introduction. MIT Press; Cambridge, MA: 1998.
6. Behrens TE, Woolrich MW, Walton ME, Rushworth MF. Learning the value of information in an uncertain world. Nat. Neurosci. 2007;10:1214–1221. doi: 10.1038/nn1954.
7. Daw ND, O'Doherty JP, Dayan P, Seymour B, Dolan RJ. Cortical substrates for exploratory decisions in humans. Nature. 2006;441:876–879. doi: 10.1038/nature04766.
8. Lau B, Glimcher PW. Dynamic response-by-response models of matching behavior in rhesus monkeys. J. Exp. Anal. Behav. 2005;84:555–579. doi: 10.1901/jeab.2005.110-04.
9. Corrado GS, Sugrue LP, Seung HS, Newsome WT. Linear-Nonlinear-Poisson models of primate choice dynamics. J. Exp. Anal. Behav. 2005;84:581–617. doi: 10.1901/jeab.2005.23-05.
10. Kennerley SW, Walton ME, Behrens TE, Buckley MJ, Rushworth MF. Optimal decision making and the anterior cingulate cortex. Nat. Neurosci. 2006;9:940–947. doi: 10.1038/nn1724.
11. Lee D, Conroy ML, McGreevy BP, Barraclough DJ. Reinforcement learning and decision making in monkeys during a competitive game. Cogn. Brain Res. 2004;22:45–58. doi: 10.1016/j.cogbrainres.2004.07.007.
12. Kim S, Hwang J, Seo H, Lee D. Valuation of uncertain and delayed rewards in primate prefrontal cortex. Neural Networks. 2009;22:294–304. doi: 10.1016/j.neunet.2009.03.010.
13. Maass W, Natschläger T, Markram H. Real-time computing without stable states: a new framework for neural computation based on perturbations. Neural Comput. 2002;14:2531–2560. doi: 10.1162/089976602760407955.
14. Jaeger H, Lukosevicius M, Popovici D, Siewert U. Optimization and applications of echo state networks with leaky-integrator neurons. Neural Networks. 2007;20:335–352. doi: 10.1016/j.neunet.2007.04.016.
15. Verstraeten D, Schrauwen B, D'Haene M, Stroobandt D. An experimental unification of reservoir computing methods. Neural Networks. 2007;20:391–403. doi: 10.1016/j.neunet.2007.04.003.
16. Sussillo D, Abbott LF. Generating coherent patterns of activity from chaotic neural networks. Neuron. 2009;63:544–557. doi: 10.1016/j.neuron.2009.07.018.
17. Bertschinger N, Natschläger T. Real-time computation at the edge of chaos in recurrent neural networks. Neural Comput. 2004;16:1413–1436. doi: 10.1162/089976604323057443.
18. Langton CG. Computation at the edge of chaos: phase transitions and emergent computations. Physica D. 1990;42:12–37.
19. Seo H, Lee D. Temporal filtering of reward signals in the dorsal anterior cingulate cortex during a mixed-strategy game. J. Neurosci. 2007;27:8366–8377. doi: 10.1523/JNEUROSCI.2369-07.2007.
20. Seo H, Barraclough DJ, Lee D. Dynamic signals related to choices and outcomes in the dorsolateral prefrontal cortex. Cereb. Cortex. 2007;17:i110–i117. doi: 10.1093/cercor/bhm064.
21. Seo H, Barraclough DJ, Lee D. Lateral intraparietal cortex and reinforcement learning during a mixed-strategy game. J. Neurosci. 2009;29:7278–7289. doi: 10.1523/JNEUROSCI.1479-09.2009.
22. Barraclough DJ, Conroy ML, Lee D. Prefrontal cortex and decision making in a mixed-strategy game. Nat. Neurosci. 2004;7:404–410. doi: 10.1038/nn1209.
23. Lapish CC, Durstewitz D, Chandler LJ, Seamans JK. Successful choice behavior is associated with distinct and coherent network states in anterior cingulate cortex. Proc. Natl. Acad. Sci. USA. 2008;105:11963–11968. doi: 10.1073/pnas.0804045105.
24. Sigala N, Kusunoki M, Nimmo-Smith I, Gaffan D, Duncan J. Hierarchical coding for sequential task events in the monkey prefrontal cortex. Proc. Natl. Acad. Sci. USA. 2008;105:11969–11974. doi: 10.1073/pnas.0802569105.
25. Jin DZ, Fujii N, Graybiel AM. Neural representation of time in cortico-basal ganglia circuits. Proc. Natl. Acad. Sci. USA. 2009;106:19156–19161. doi: 10.1073/pnas.0909881106.
26. Trefethen LN, Embree M. Spectra and Pseudospectra: The Behavior of Nonnormal Matrices and Operators. Princeton University Press; Princeton, NJ: 2005.
27. Murphy BK, Miller KD. Balanced amplification: a new mechanism of selective amplification of neural activity patterns. Neuron. 2009;61:635–648. doi: 10.1016/j.neuron.2009.02.005.
28. Ganguli S, Huh D, Sompolinsky H. Memory traces in dynamical systems. Proc. Natl. Acad. Sci. USA. 2008;105:18970–18975. doi: 10.1073/pnas.0804451105.
29. Goldman MS. Memory without feedback in a neural network. Neuron. 2009;61:621–634. doi: 10.1016/j.neuron.2008.12.012.
30. Schneidman E, Berry MJ, Segev R, Bialek W. Weak pairwise correlations imply strongly correlated network states in a neural population. Nature. 2006;440:1007–1012. doi: 10.1038/nature04701.
31. Brunel N, Hakim V, Isope P, Nadal J-P, Barbour B. Optimal information storage and the distribution of synaptic weights: perceptron versus Purkinje cell. Neuron. 2004;43:745–757. doi: 10.1016/j.neuron.2004.08.023.
32. Ganguli S, Bisley JW, Roitman JD, Shadlen MN, Goldberg ME, Miller KD. One-dimensional dynamics of attention and decision making in LIP. Neuron. 2008;58:15–25. doi: 10.1016/j.neuron.2008.01.038.
33. Duncan J. An adaptive coding model of neural function in prefrontal cortex. Nat. Rev. Neurosci. 2001;2:820–829. doi: 10.1038/35097575.
34. Rigotti M, Rubin DBD, Wang X-J, Fusi S. Internal representation of task rules by recurrent dynamics: the importance of the diversity of neural responses. Front. Comput. Neurosci. 2010;4:24. doi: 10.3389/fncom.2010.00024.
35. Botvinick MM, Braver TS, Barch DM, Carter CS, Cohen JD. Conflict monitoring and cognitive control. Psychol. Rev. 2001;108:624–652. doi: 10.1037/0033-295x.108.3.624.
36. Holroyd CB, Coles MGH. The neural basis of human error processing: reinforcement learning, dopamine, and the error-related negativity. Psychol. Rev. 2002;109:679–709. doi: 10.1037/0033-295X.109.4.679.
37. Wallis JD, Kennerley SW. Heterogeneous reward signals in prefrontal cortex. Curr. Opin. Neurobiol. 2010;20:191–198. doi: 10.1016/j.conb.2010.02.009.
38. Platt ML, Glimcher PW. Neural correlates of decision variables in parietal cortex. Nature. 1999;400:233–238. doi: 10.1038/22268.
39. Roitman JD, Shadlen MN. Response of neurons in the lateral intraparietal area during a combined visual discrimination reaction time task. J. Neurosci. 2002;22:9475–9489. doi: 10.1523/JNEUROSCI.22-21-09475.2002.
40. Bayer HM, Glimcher PW. Midbrain dopamine neurons encode a quantitative reward prediction error signal. Neuron. 2005;47:129–141. doi: 10.1016/j.neuron.2005.05.020.
41. Rainer G, Miller EK. Time course of object-related neural activity in the primate prefrontal cortex during a short-term memory task. Eur. J. Neurosci. 2002;15:1244–1254. doi: 10.1046/j.1460-9568.2002.01958.x.
42. Machens CK, Romo R, Brody CD. Functional, but not anatomical, separation of "what" and "when" in prefrontal cortex. J. Neurosci. 2010;30:350–360. doi: 10.1523/JNEUROSCI.3276-09.2010.
43. Shafi M, Zhou Y, Quintana J, Chow C, Fuster J, Bodner M. Variability in neuronal activity in primate cortex during working memory tasks. Neuroscience. 2007;146:1082–1108. doi: 10.1016/j.neuroscience.2006.12.072.
44. Curtis CE, Lee D. Beyond working memory: the role of persistent activity in decision making. Trends Cogn. Sci. 2010;14:216–222. doi: 10.1016/j.tics.2010.03.006.
45. Passingham D, Sakai K. The prefrontal cortex and working memory: physiology and brain imaging. Curr. Opin. Neurobiol. 2004;14:163–168. doi: 10.1016/j.conb.2004.03.003.
46. Lebedev MA, Messinger A, Kralik JD, Wise SP. Representation of attended versus remembered locations in prefrontal cortex. PLoS Biol. 2004;2:1919–1935. doi: 10.1371/journal.pbio.0020365.
47. Funahashi S, Chafee MV, Goldman-Rakic PS. Prefrontal neuronal activity in rhesus monkeys performing a delayed anti-saccade task. Nature. 1993;365:753–756. doi: 10.1038/365753a0.
48. Rainer G, Rao SG, Miller EK. Prospective coding for objects in primate prefrontal cortex. J. Neurosci. 1999;19:5493–5505. doi: 10.1523/JNEUROSCI.19-13-05493.1999.
49. Brody CD, Hernandez A, Zainos A, Romo R. Timing and neural encoding of somatosensory parametric working memory in macaque prefrontal cortex. Cereb. Cortex. 2003;13:1196–1207. doi: 10.1093/cercor/bhg100.
50. Bromberg-Martin ES, Matsumoto M, Nakahara H, Hikosaka O. Multiple timescales of memory in lateral habenula and dopamine neurons. Neuron. 2010;67:499–510. doi: 10.1016/j.neuron.2010.06.031.
