eLife. 2023 Mar 16;12:e80680. doi: 10.7554/eLife.80680

Neural learning rules for generating flexible predictions and computing the successor representation

Ching Fang 1, Dmitriy Aronov 1, LF Abbott 1, Emily L Mackevicius 1,2
Editors: Srdjan Ostojic3, Timothy E Behrens4
PMCID: PMC10019889  PMID: 36928104

Abstract

The predictive nature of the hippocampus is thought to be useful for memory-guided cognitive behaviors. Inspired by the reinforcement learning literature, this notion has been formalized as a predictive map called the successor representation (SR). The SR captures a number of observations about hippocampal activity. However, the algorithm does not provide a neural mechanism for how such representations arise. Here, we show that the dynamics of a recurrent neural network naturally calculate the SR when the synaptic weights match the transition probability matrix. Interestingly, the predictive horizon can be flexibly modulated simply by changing the network gain. We derive simple, biologically plausible learning rules to learn the SR in a recurrent network. We test our model with realistic inputs and match hippocampal data recorded during random foraging. Taken together, our results suggest that the SR is more accessible in neural circuits than previously thought and can support a broad range of cognitive functions.

Research organism: Other

eLife digest

Memories are an important part of how we think, understand the world around us, and plan out future actions. In the brain, memories are thought to be stored in a region called the hippocampus. When memories are formed, neurons store events that occur around the same time together. This might explain why, in the brains of animals, the activity associated with retrieving memories is often not just a snapshot of what happened at a specific moment: it can also include information about what the animal might experience next. This can have a clear utility if animals use memories to predict what they might experience next and plan out future actions.

Mathematically, this notion of predictiveness can be summarized by an algorithm known as the successor representation. This algorithm describes what the activity of neurons in the hippocampus looks like when retrieving memories and making predictions based on them. However, even though the successor representation can computationally reproduce the activity seen in the hippocampus when it is making predictions, it is unclear what biological mechanisms underpin this computation in the brain.

Fang et al. approached this problem by trying to build a model that could generate the same activity patterns computed by the successor representation using only biological mechanisms known to exist in the hippocampus. First, they used computational methods to design a network of neurons that had the biological properties of neural networks in the hippocampus. They then used the network to simulate neural activity. The results show that the activity of the network they designed was able to exactly match the successor representation. Additionally, the data resulting from the simulated activity in the network fitted experimental observations of hippocampal activity in Tufted Titmice.

One advantage of the network designed by Fang et al. is that it can generate predictions in flexible ways. That is, it can make both short- and long-term predictions from what an individual is experiencing at the moment. This flexibility means that the network can be used to simulate how the hippocampus learns in a variety of cognitive tasks. Additionally, the network is robust to different conditions. Given that the brain has to be able to store memories in many different situations, this is a promising indication that this network may be a reasonable model of how the brain learns.

The results of Fang et al. lay the groundwork for connecting biological mechanisms in the hippocampus at the cellular level to cognitive effects, an essential step to understanding the hippocampus, as well as its role in health and disease. For instance, their network may provide a concrete approach to studying how disruptions to the ways neurons make and break connections can impair memory formation. More generally, better models of the biological mechanisms involved in making computations in the hippocampus can help scientists better understand and test out theories about how memories are formed and stored in the brain.

Introduction

To learn from the past, plan for the future, and form an understanding of our world, we require memories of personal experiences. These memories depend on the hippocampus for formation and recall (Scoville and Milner, 1957; Penfield and Milner, 1958; Corkin, 2002), but an algorithmic and mechanistic understanding of memory formation and retrieval in this region remains elusive. From a computational perspective, a key function of memory is to use past experiences to inform predictions of possible futures (Bubic et al., 2010; Wayne et al., 2018; Whittington et al., 2020; Momennejad, 2020). This suggests that hippocampal memory is stored in a way that is particularly suitable for forming predictions. Consistent with this hypothesis, experimental work has shown that, across species and tasks, hippocampal activity is predictive of the future experience of an animal (Skaggs and McNaughton, 1996; Lisman and Redish, 2009; Mehta et al., 1997; Payne et al., 2021; Muller and Kubie, 1989; Pfeiffer and Foster, 2013; Schapiro et al., 2016; Garvert et al., 2017). Furthermore, theoretical work has found that models endowed with predictive objectives tend to resemble hippocampal activity (Blum and Abbott, 1996; Mehta et al., 2000; Stachenfeld et al., 2017; Momennejad et al., 2017; Geerts et al., 2020; Recanatesi et al., 2021; Whittington et al., 2020; George et al., 2021). Thus, it is clear that predictive representations are an important aspect of hippocampal memory.

Inspired by work in the reinforcement learning (RL) field, these observations have been formalized by describing hippocampal activity as a predictive map under the successor representation (SR) algorithm (Dayan, 1993; Gershman et al., 2012; Stachenfeld et al., 2017). Under this framework, an animal’s experience in the world is represented as a trajectory through some defined state space, and hippocampal activity predicts the future experience of an animal by integrating over the likely states that an animal will visit given its current state. This algorithm further explains how, in addition to episodic memory, the hippocampus may support relational reasoning and decision making (Recanatesi et al., 2021; Mattar and Daw, 2018), consistent with differences in hippocampal representations in different tasks (Markus et al., 1995; Jeffery, 2021). The SR framework captures many experimental observations of neural activity, leading to a proposed computational function for the hippocampus (Stachenfeld et al., 2017).

While the SR algorithm convincingly argues for a computational function of the hippocampus, it is unclear what biological mechanisms might compute the SR in a neural circuit. Thus, several relevant questions remain that are difficult to probe with the current algorithm. What kind of neural architecture should one expect in a region that can support this computation? Are there distinct forms of plasticity and neuromodulation needed in this system? What is the structure of hippocampal inputs to be expected? A biologically plausible model can explore these questions and provide insight into both mechanism and function (Marr and Poggio, 1976; Frank, 2015; Love, 2021).

In other systems, it has been possible to derive biological mechanisms with the goal of achieving a particular network function or property (Zeldenrust et al., 2021; Karimi et al., 2022; Pehlevan et al., 2017; Olshausen and Field, 1996; Burbank, 2015; Aitchison et al., 2021; Földiák, 1990; Tyulmankov et al., 2022). Key to many of these models is the constraint that learning rules at any given neuron can only use information local to that neuron. A promising direction towards such a neural model of the SR is to use the dynamics of a recurrent neural network (RNN) to perform SR computations (Vértes and Sahani, 2019; Russek et al., 2017). An RNN model is particularly attractive as the hippocampus is highly recurrent, and its connectivity patterns are thought to support associative learning and recall (Gardner-Medwin, 1976; McNaughton and Morris, 1987; Marr et al., 1991; Liu et al., 2012). However, an RNN model of the SR has not been tied to neural learning rules that support its operation and allow for testing of specific hypotheses.

Here, we show that an RNN with local learning rules and an adaptive learning rate exactly calculates the SR at steady state. We test our model with realistic inputs and make comparisons to neural data. In addition, we compare our results to the standard SR algorithm with respect to the speed of learning and the learned representations in cases where multiple solutions exist. Our work provides a mechanistic account for an algorithm that has been frequently connected to the hippocampus, but could only be interpreted at an algorithmic level. This network-level perspective allows us to make specific predictions about hippocampal mechanisms and activity.

Results

The successor representation

The SR algorithm described in Stachenfeld et al., 2017 first discretizes the environment explored by an animal (whether a physical or abstract space) into a set of N states that the animal transitions through over time (Figure 1A). The animal’s behavior can then be thought of as a Markov chain with a corresponding transition probability matrix T ∈ ℝ^{N×N} (Figure 1B). T gives the probability that the animal transitions to a state s′ from the state s in one time step: T_{ji} = P(s′ = i | s = j). The SR matrix is defined as

M = \sum_{t=0}^{\infty} \gamma^t T^t = (I - \gamma T)^{-1} \quad (1)

Figure 1. The successor representation and an analogous recurrent network model.

(A) The behavior of an animal running down a linear track can be described as a transition between discrete states, where the states encode spatial location. (B) By counting the transitions between different states, the behavior of an animal can be summarized in a transition probability matrix T. (C) The successor representation matrix is defined as M = \sum_{t=0}^{\infty} \gamma^t T^t. Here, M is shown for γ=0.6. Dashed boxes indicate the slices of M shown in (D) and (E). (D) The fourth row of the M matrix describes the activity of each state-encoding neuron when the animal is at the fourth state. (E) The fourth column of the M matrix describes the place field of the neuron encoding the fourth state. (F) Recurrent network model of the SR (RNN-S). The current state of the animal is one-hot encoded by a layer of input neurons. Inputs connect one-to-one onto RNN neurons with synaptic connectivity matrix J=T. The activity of the RNN neurons is represented by x. SR activity is read out from one-to-one connections from the RNN neurons to the output neurons. The example here shows inputs and outputs when the animal is at state 4. (G) Feedforward neural network model of the SR (FF-TD). The M matrix is encoded in the weights from the input neurons to the output layer neurons, where the SR activity is read out. (H) Diagram of the terms used for the RNN-S learning rule. Terms in red are used for potentiation while terms in blue are used for normalization (Equation 4). (I) As in (H), but for the feedforward-TD model (Equation 11). To reduce the notation indicating time steps, we use a prime in place of (t) and no added notation for (t-1).

Here, γ ∈ (0,1) is a temporal discount factor. M_{ji} can be seen as a measure of the occupancy of state i over time if the animal starts at state j, with γ controlling how much to discount time steps in the future (Figure 1C). The SR of state j is the jth row of M and represents the states that an animal is likely to transition to from state j. Stachenfeld et al., 2017 demonstrate that, if one assumes each state drives a single neuron, the SR of j resembles the population activity of hippocampal neurons when the animal is at state j (Figure 1D). They also show that the ith column of M resembles the place field (activity as a function of state) of a hippocampal neuron representing state i (Figure 1E). In addition, the ith column of M shows which states are likely to lead to state i.
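
As a concrete illustration of Equation 1, the following sketch builds a transition matrix for a biased random walk on a circular track and computes M directly. The state count, transition bias, and γ are illustrative choices, not values taken from the paper.

```python
import numpy as np

N = 10       # number of discrete states on the circular track (arbitrary choice)
gamma = 0.6  # temporal discount factor, as in Figure 1C

# Transition probabilities: rows index the current state, T[j, i] = P(s' = i | s = j).
T = np.zeros((N, N))
for j in range(N):
    T[j, j] = 0.2                # stay in place
    T[j, (j + 1) % N] = 0.5      # step forward (biased walk)
    T[j, (j - 1) % N] = 0.3      # step backward

# Equation 1: M = sum_t gamma^t T^t = (I - gamma T)^{-1}
M = np.linalg.inv(np.eye(N) - gamma * T)

row = M[3]      # SR of state 4: expected discounted future occupancy (Figure 1D)
col = M[:, 3]   # "place field" of the neuron encoding state 4 (Figure 1E)
```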

Recurrent neural network computes SR at steady state

We begin by drawing connections between the SR algorithm (Stachenfeld et al., 2017) and an analogous neural network architecture. The input to the network encodes the current state of the animal and is represented by a layer of input neurons (Figure 1FG). These neurons feed into the rest of the network that computes the SR (Figure 1FG). The SR is then read out by a layer of output neurons so that downstream systems receive a prediction of the upcoming states (Figure 1FG). We will first model the inputs ϕ as one-hot encodings of the current state of the animal (Figure 1FG). That is, each input neuron represents a unique state, and input neurons are one-to-one connected to the hidden neurons.

We first consider an architecture in which a recurrent neural network (RNN) is used to compute the SR (Figure 1F). Let us assume that the T matrix is encoded in the synaptic weights of the RNN. In this case, the steady state activity of the network in response to input ϕ retrieves a row of the SR matrix, Mϕ (Figure 1F, subsection 4.14). Intuitively, this is because each recurrent iteration of the RNN progresses the prediction by one transition. In other words, the tth recurrent iteration raises T to the tth power as in Equation 1. To formally derive this result, we first start by defining the dynamics of our RNN with classical rate network equations (Amari, 1972). At time t, the firing rate x(t) of N neurons given each neuron’s input ϕ(t) follows the discrete-time dynamics (assuming a step size Δt=1)

\Delta x = -x(t) + \gamma J f(x(t)) + \phi(t) \quad (2)

Here, γ scales the recurrent activity and is a constant factor for all neurons. The synaptic weight matrix J ∈ ℝ^{N×N} is defined such that J_{ij} is the synaptic weight from neuron j to neuron i. Notably, this notation is transposed from what is used in the RL literature, where conventions have the first index as the starting state. Generally, f is some nonlinear function in Equation 2. For now, we will consider f to be the identity function, rendering this equation linear. Under this assumption, we can solve for the steady state activity x_{ss} as

x_{ss} = (I - \gamma J)^{-1} \phi \quad (3)

Equivalence between Equation 1 and Equation 3 is clearly reached when J=T (Russek et al., 2017; Vértes and Sahani, 2019). Thus, if the network can learn T in its synaptic weight matrix, it will exactly compute the SR.
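
A minimal sketch of this equivalence, reusing T, M, N, and gamma from the snippet above. Because the network convention indexes weights as (post, pre), the transpose of the RL-convention T built earlier, the code stores the weights as T.T; iterating the dynamics then converges to the corresponding row of M.

```python
import numpy as np

J = T.T                          # weight from neuron j to neuron i (transposed convention)
phi = np.zeros(N); phi[3] = 1.0  # one-hot input: the animal is at state 4

x = np.zeros(N)
for _ in range(500):             # discrete-time dynamics with step size 1, f = identity
    x = x + (-x + gamma * J @ x + phi)   # Equation 2

assert np.allclose(x, M[3], atol=1e-8)   # steady state equals the SR of state 4
```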

Here, the factor γ represents the gain of the neurons in the network, which is factored out of the synaptic strengths characterized by J. Thus, γ is an independently adjustable factor that can flexibly control the strength of the recurrent dynamics (see Sompolinsky et al., 1988). A benefit of this flexibility is that the system can retrieve successor representations of varying predictive strengths by modulating the gain factor γ. In this way, the predictive horizon can be dynamically controlled without any additional learning required. We will refer to the γ used during learning of the SR as the baseline γ, or γB.

We next consider what is needed in a learning rule such that J approximates T. In order to learn a transition probability matrix, a learning rule must associate states that occur sequentially and normalize the synaptic weights into a valid probability distribution. We derive a learning rule that addresses both requirements (Figure 1H, Appendix 2),

\Delta J_{ij} = \eta \, x_i(t) \, x_j(t-1) - \eta \, x_j(t-1) \sum_k J_{ik} \, x_k(t-1), \quad (4)

where η is the learning rate. The first term in Equation 4 is a temporally asymmetric potentiation term that counts states that occur in sequence. This is similar to spike-timing dependent plasticity, or STDP (Bi and Poo, 1998; Skaggs and McNaughton, 1996; Abbott and Blum, 1996).

The second term in Equation 4 is a form of synaptic depotentiation. Depotentiation has been hypothesized to be broadly useful for stabilizing patterns and sequence learning (Földiák, 1990; Fiete et al., 2010), and similar inhibitory effects are known to be elements of hippocampal learning (Kullmann and Lamsa, 2007; Lamsa et al., 2007). In our model, the depotentiation term in Equation 4 imposes local anti-Hebbian learning at each neuron; that is, each column of J is normalized independently. This normalizes the observed transitions from each state by the number of visits to that state, such that transition statistics are correctly captured. We note, however, that other ways of column-normalizing the synaptic weight matrix may give similar representations (Appendix 7).

Crucially, the update rule (Equation 4) uses information local to each neuron (Figure 1H), an important aspect of biologically plausible learning rules. We show that, in the asymptotic limit, the update rule extracts information about the inputs ϕ and learns T exactly despite having access only to neural activity x (Appendix 3). We will refer to an RNN using Equation 4 as the RNN-Successor, or RNN-S. Combined with recurrent dynamics (Equation 3), RNN-S computes the SR exactly (Figure 1H).
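
The following sketch applies Equation 4 along a simulated walk, reusing N and T from the earlier snippet; the learning rate and walk length are arbitrary choices. With one-hot inputs, each column of J converges to the empirical transition probabilities out of the corresponding state, so J approaches T.T.

```python
import numpy as np

rng = np.random.default_rng(0)
eta, n_steps = 0.01, 50_000
probs = [0.3, 0.2, 0.5]            # backward / stay / forward, matching T above

J_learned = np.zeros((N, N))
s_prev = 0
for _ in range(n_steps):
    s = (s_prev + rng.choice([-1, 0, 1], p=probs)) % N
    x_now = np.zeros(N); x_now[s] = 1.0          # activity at time t
    x_prev = np.zeros(N); x_prev[s_prev] = 1.0   # activity at time t-1
    # Equation 4: local potentiation of pre(t-1) -> post(t), plus column normalization.
    J_learned += eta * (np.outer(x_now, x_prev) - np.outer(J_learned @ x_prev, x_prev))
    s_prev = s

# Residual error fluctuates on a scale set by the (static) learning rate.
print(np.abs(J_learned - T.T).max())
```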

As an alternative to the RNN-S model, we consider the conditions necessary for a feedforward neural network to compute the SR. Under this architecture, the M matrix must be encoded in the weights from the input neurons to the hidden layer neurons (Figure 1G). This can be achieved by updating the synaptic weights with a temporal difference (TD) learning rule, the standard update used to learn the SR in the usual algorithm. Although the TD update learns the SR, it requires information about multiple input layer neurons to make updates for the synapse from input neuron j to output neuron i (Figure 1I). Thus, it is useful to explore other possible mechanisms that are simpler to compute locally. We refer to the model described in Figure 1I as the feedforward-TD (FF-TD) model. The FF-TD model implements the canonical SR algorithm.
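
For contrast, here is a sketch of the canonical SR TD update that the FF-TD model implements. Note that updating the weights for starting state s requires the full rows M[s] and M[s_next], i.e., information about many neurons rather than a single synapse. The function name and learning rate are our own illustrative choices.

```python
import numpy as np

def td_update(M, s, s_next, eta=0.05, gamma=0.6):
    """One temporal-difference update of the SR after observing a transition s -> s_next."""
    one_hot = np.eye(len(M))[s]
    # TD target: immediate occupancy plus the discounted prediction from the next state.
    M[s] += eta * (one_hot + gamma * M[s_next] - M[s])
    return M
```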

Evaluating SR learning by biologically plausible learning rules

To evaluate the effectiveness of the RNN-S learning rule, we tested its accuracy in learning the SR matrix for random walks. Specifically, we simulated random walks with different transition biases in a 1D circular track environment (Figure 2A). The RNN-S can learn the SR for these random walks (Figure 2B).

Figure 2. Comparing the effects of an adaptive learning rate and plasticity kernels in RNN-S.

(A) Sample one-minute segments from random walks on a 1 meter circular track. Possible actions in this 1D walk are to move forward, stay in one place, or move backward. Action probabilities are uniform (top), biased to move forward (middle), or biased to stay in one place (bottom). (B) M matrices estimated by the RNN-S model in the full random walks from (A). (C) The proposed learning rate normalization. The learning rate η_j for synapses out of neuron j changes as a function of its activity x_j and recency bias λ. Dotted lines are at [0.0, 0.5, 1.0]. (D) The mean row sum of T over time computed by the RNN-S with an adaptive learning rate (blue) or the RNN-S with static learning rates (orange). Darker lines indicate larger static learning rates. Lines show the average over 5 simulations from walks with a forward bias, and shading shows 95% confidence interval. A correctly normalized T matrix should have a row sum of 1.0. (E) As in (D), but for the mean absolute error in estimating T. (F) As in (E), but for mean absolute error in estimating the real M, and with performance of FF-TD included, with darker lines indicating slower learning rates for FF-TD. (G) Lap-based activity map of a neuron from RNN-S with static learning rate η = 10^{-1.5}. The neuron encodes the state at 45 cm on a circular track. The simulated agent is moving according to forward-biased transition statistics. (H) As in (G), but for RNN-S with adaptive learning rate. (I) The learning rate over time for the neuron in (G) (orange) and the neuron in (H) (blue). (J) Mean-squared error (MSE) at the end of meta-learning for different plasticity kernels. The pre→post (K+) and post→pre (K-) sides of each kernel were modeled by A e^{-t/τ}. Heatmap indices indicate the values that τ was fixed to. Here, K+ is always a positive function (i.e., A was positive), because performance was uniformly poor when K+ was negative. K- could be either positive (left, “Post → Pre Potentiation”) or negative (right, “Post → Pre Depression”). Regions where the learned value for A was negligibly small were set to high errors. Errors are max-clipped at 0.03 for visualization purposes. 40 initializations were used for each K+ and K- pairing, and the heatmap shows the minimum error achieved over all initializations. (K) Plasticity kernels chosen from the areas of lowest error in the grid search from (J). Left is post → pre potentiation. Right is post → pre depression. Kernels are normalized by the maximum, and dotted lines are at one-second intervals.

Figure 2—figure supplement 1. Comparing model performance in different random walks.

(a-c) As in Figure 2D–F of the main document, but for a walk with uniform action probabilities.

Because equivalence is only reached in the asymptotic limit of learning (i.e., ΔJ → 0), our RNN-S model learns the SR slowly. In contrast, animals are thought to be able to learn the structure of an environment quickly (Zhang et al., 2021), and neural representations in an environment can also develop quickly (Monaco et al., 2014; Sheffield and Dombeck, 2015; Bittner et al., 2015). To remedy this, we introduce a dynamic learning rate that allows for faster normalization of the synaptic weight matrix, similar to the formula for calculating a moving average (Appendix 4). For each neuron, suppose that a trace n of its recent activity is maintained with some time constant λ ∈ (0,1],

n(t) = \sum_{t' < t} \lambda^{t - t'} x(t') \quad (5)

If the learning rate of the outgoing synapses from each neuron j is inversely proportional to n_j (η = 1/n_j(t)), the update equation quickly normalizes the synapses to maintain a valid transition probability matrix (Appendix 4). Modulating synaptic learning rates as a function of neural activity is consistent with experimental observations of metaplasticity (Abraham and Bear, 1996; Abraham, 2008; Hulme et al., 2014). We refer to this as an adaptive learning rate and contrast it with the previous static learning rate. We consider the setting where λ=1, so the learning rate monotonically decreases over time (Figure 2C). In general, however, the learning rate could increase or decrease over time if λ<1 (Figure 2C), and n could be reset, allowing for rapid learning. Our learning rule with the adaptive learning rate is the same as in Equation 4, with the exception that η = min(1/n_j(t), 1.0) for synapses J_{*j}. This learning rule still relies only on information local to the neuron, as in Figure 1H.
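
A sketch of the adaptive variant, reusing N from the earlier snippets; the recursive trace update is one way to implement Equation 5. With λ = 1 and one-hot inputs, the first visit to a state uses η = 1 (one-shot learning of its outgoing column), and each later visit averages in the new observation, which is why the weights normalize quickly.

```python
import numpy as np

lam = 1.0            # recency bias lambda; lam < 1 would gradually forget old visits
n = np.zeros(N)      # per-neuron activity trace (Equation 5)

def rnn_s_adaptive_update(J, x_prev, x_now):
    """One RNN-S update with the adaptive learning rate eta_j = min(1/n_j, 1)."""
    global n
    n = lam * (n + x_prev)                               # recursive form of the trace
    eta = np.minimum(1.0 / np.maximum(n, 1e-12), 1.0)    # one rate per presynaptic neuron
    dJ = np.outer(x_now, x_prev) - np.outer(J @ x_prev, x_prev)  # Equation 4 terms
    return J + eta[None, :] * dJ                         # rate eta_j scales column J[:, j]
```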

The RNN-S with an adaptive learning rate normalizes the synapses more quickly than a network with a static learning rate (Figure 2D, Figure 2—figure supplement 1) and learns T faster (Figure 2E, Figure 2—figure supplement 1). The RNN-S with a static learning rate exhibits more of a tradeoff between normalizing synapses quickly (Figure 2D, Figure 2—figure supplement 1A) and learning M accurately (Figure 2E, Figure 2—figure supplement 1). However, both versions of the RNN-S estimate M more quickly than the FF-TD model (Figure 2F, Figure 2—figure supplement 1).

Place fields can form quickly, but over time the place fields may skew if transition statistics are consistently biased (Stachenfeld et al., 2017; Monaco et al., 2014; Sheffield and Dombeck, 2015; Bittner et al., 2015). The adaptive learning rate recapitulates both of these effects, which are thought to be caused by slow and fast learning processes, respectively. A low learning rate can capture the biasing of place fields, which develops over many repeated experiences. This is seen in the RNN-S with a static learning rate (Figure 2G). However, a high learning rate is needed for hippocampal place cells to develop sizeable place fields in one shot. Both these effects of slow and fast learning can be seen in the neural activity of an example RNN-S neuron with an adaptive learning rate (Figure 2H). After the first lap, a sizeable field is induced in a one-shot manner, centered at the cell’s preferred location. In subsequent laps, the place field slowly distorts to reflect the bias of the transition statistics (Figure 2H). The model is able to capture these learning effects because the adaptive learning rate transitions between high and low learning rates, unlike the static version (Figure 2I).

Thus far, we have assumed that the RNN-S learning rule uses pre→post activity over two neighboring time steps (Equation 4). A more realistic framing is that a convolution with a plasticity kernel determines the weight change at any synapse. We tested how this affects our model and what range of plasticity kernels best supports the estimation of the SR. We do this by replacing the pre→post potentiation term in Equation 4 with a convolution:

\Delta J_{ij} = x_i(t) \sum_{t' \le t} K_+(t - t') \, x_j(t') + x_j(t) \sum_{t' \le t} K_-(t - t') \, x_i(t') - \eta \, x_j(t-1) \sum_k J_{ik} \, x_k(t-1) \quad (6)

In the above equation, the full kernel K is split into a pre→post kernel (K+) and a post→pre kernel (K-). K+ and K- are parameterized as independent exponential functions, A e^{-t/τ}.
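
A sketch of the convolutional potentiation terms in Equation 6 (the normalization term is unchanged from Equation 4); the step size, amplitudes, and time constants here are illustrative choices, not the values recovered by the evolutionary search below.

```python
import numpy as np

dt = 0.1                              # seconds per time step (assumed)
lags = np.arange(0, 2.0, dt)          # kernel support of 2 s
K_plus = 1.0 * np.exp(-lags / 0.3)    # pre -> post kernel, A e^{-t/tau}
K_minus = -0.5 * np.exp(-lags / 0.3)  # post -> pre kernel (depressive in this example)

def kernel_potentiation(x_hist, i, j, t):
    """Potentiation of J_ij at step t, given the activity history x_hist (steps x neurons)."""
    L = min(len(lags), t + 1)
    past = np.arange(t, t - L, -1)          # times t, t-1, ... aligned with kernel lags
    pre = x_hist[past, j] @ K_plus[:L]      # sum_{t' <= t} K+(t - t') x_j(t')
    post = x_hist[past, i] @ K_minus[:L]    # sum_{t' <= t} K-(t - t') x_i(t')
    return x_hist[t, i] * pre + x_hist[t, j] * post
```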

To systematically explore the space of plasticity kernels that can be used to learn the SR, we performed a grid search over the sign and the time constants of the pre→post and post→pre sides of the plasticity kernels. For each fixed sign and time constant, we used an evolutionary algorithm to learn the remaining parameters that determine the plasticity kernel. We find that plasticity kernels that are STDP-like are more effective than others, although plasticity kernels with slight post→pre potentiation work as well (Figure 2J). The network is sensitive to the time constant and tends to find solutions for time constants around a few hundred milliseconds (Figure 2J–K). Our robustness analysis indicates the timescale of a plasticity rule in such a circuit may be longer than expected from standard STDP, but within the timescale of changes in behavioral states. We note that this also contrasts with behavioral timescale plasticity (Bittner et al., 2015), which integrates over a window that is several seconds long. Finally, we see that even plasticity kernels with slightly different time constants may give results with minimal error from the SR matrix, even if they do not estimate the SR exactly (Figure 2J). This suggests that, although other plasticity rules could be used to model long-horizon predictions, the SR is a reasonable (although not strictly unique) model to describe this class of predictive representations.

RNN-S can compute the SR with arbitrary γR under a stable regime of γB

We next investigate how robust the RNN-S model is to the value of γ. Typically, for purposes of fitting neural data or for RL simulations, γ will take on values as high as 0.9 (Stachenfeld et al., 2017; Barreto et al., 2017). However, previous work that used RNN models reported that recurrent dynamics become unstable if the gain γ exceeds a critical value (Sompolinsky et al., 1988; Zhang et al., 2021). This could be problematic as we show analytically that the RNN-S update rule is effective only when the network dynamics are stable and do not have non-normal amplification (Appendix 2). If these conditions are not satisfied during learning, the update rule no longer optimizes for fitting the SR and the learned weight matrix will be incorrect.

We first test how the value of γB, the gain of the network during learning, affects the RNN-S dynamics. The dynamics become unstable when γB exceeds 0.6 (Figure 3—figure supplement 1). Specifically, the eigenvalues of the synaptic weight matrix exceed the critical threshold for learning when γB>0.6 (Figure 3A, ‘Linear’). As expected from our analytical results, the stability of the network is tied to the network’s ability to estimate M. RNN-S cannot estimate M well when γB>0.6 (Figure 3B, ‘Linear’). We explored two strategies to enable RNN-S to learn at high γ.

Figure 3. RNN-S requires a stable choice of γB during learning, and can compute the SR with any γR.

(A) Maximum real eigenvalue of the J matrix at the end of random walks under different γB. The network dynamics were either fully linear (solid) or had a tanh nonlinearity (dashed). Red line indicates the transition into an unstable regime. 45 simulations were run for each γB; line indicates mean, and shading shows 95% confidence interval. (B) MAE of M matrices learned by RNN-S with different γB. RNN-S was simulated with linear dynamics (solid line) or with a tanh nonlinearity added to the recurrent dynamics (dashed line). Test datasets used various biases in action probability selection. (C) M matrix learned by RNN-S with tanh nonlinearity added in the recurrent dynamics. A forward-biased walk on a circular track was simulated, and γB=0.8. (D) The true M matrix of the walk used to generate (C). (E) Simulated population activity over the first ten laps in a circular track with γB=0.4. Dashed box indicates the retrieval phase, where learning is turned off and γR=0.9. Boxes are zoomed in on three-minute windows.

Figure 3—figure supplement 1. Understanding the effects of recurrency on stability.

(A) Mean absolute error (MAE) of M matrices learned by RNN-S with different baseline γ and different numbers of recurrent steps in dynamics. Test datasets used various biases in action probability selection. Errors are max-clipped at 10^1 for visualization purposes. (B) M matrix learned by RNN-S with two recurrent steps in dynamics and baseline γ=0.8. A forward-biased walk on a circular track was simulated. (C) As in (B), but for four recurrent steps. (D) As in (B), but for five recurrent steps. Three examples are shown from different sampled walks to highlight the runaway activity of the network. (E) As in (B), but for the RNN-S activity calculated as (I - γJ)^{-1}. Note that this calculation amounts to an unstable fixed point in the dynamics that cannot be reached when the network is in an unstable regime. (F) Mean absolute error (MAE) in T made by RNN-S with linear dynamics using γB during learning. (G) MAE in M for γR made by RNN-S with linear dynamics using γB during learning. (H) As in (G), but the dynamics now have a tanh nonlinearity.

One way to tame this instability is to add a saturating nonlinearity into the dynamics of the network. This is a feature of biological neurons that is often incorporated in models to prevent unbounded activity (Dayan and Abbott, 2001). Specifically, instead of assuming the network dynamics are linear (f is the identity function in Equation 2), we add a hyperbolic tangent into the dynamics equation. This extends the stable regime of the network; the eigenvalues do not exceed the critical threshold until γB>0.8 (Figure 3A). Similar to the linear case, the network with nonlinear dynamics fits M well until the critical threshold for stability (Figure 3B). These differences are clear visually as well. While the linear network does not estimate M well for γB=0.8 (Figure 3B), the estimate of the nonlinear network (Figure 3C) is a closer match to the true M (Figure 3D). However, there is a tradeoff between the stabilizing effect of the nonlinearity and the potential loss of accuracy in calculating M with a nonlinearity (Figure 3—figure supplement 1).

We explore an alternative strategy for computing M with arbitrarily high γ in the range 0 ≤ γ < 1. We have thus far pushed the limits of the model in learning the SR for different γB. However, an advantage of our recurrent architecture is that γ is a global gain modulated independently of the synaptic weights. Thus, an alternative strategy for computing M with high γ is to consider two distinct modes that the network can operate under. First, there is a learning phase in which the plasticity mechanism actively learns the structure of the environment and the model is in a stable regime (i.e., γB is small). Separately, there is a retrieval phase during which the gain γR of the network can be flexibly modulated. By changing the gain, the network can compute the SR with arbitrary prediction horizons, without any changes to the synaptic weights. We show the effectiveness of separate network phases by simulating a 1D walk where the learning phase uses a small γB (Figure 3E). Halfway through the walk, the animal enters a retrieval mode and accurately computes the SR with higher γR (Figure 3E).

Under this scheme, the model can compute the SR for any γ<1 (Figure 3—figure supplement 1). The separation of learning and retrieval phases stabilizes neural dynamics and allows flexible tuning of predictive power depending on task context.
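
A sketch of the two-mode scheme, reusing N and the learned weights J_learned from the earlier snippet: the same weights support different predictive horizons simply by changing the retrieval gain, and an optional tanh bounds the activity as in Figure 3.

```python
import numpy as np

def retrieve(J, phi, gamma_R, n_iter=300, nonlinear=False):
    """Iterate Equation 2 at gain gamma_R (weights frozen) and return steady-state activity."""
    f = np.tanh if nonlinear else (lambda v: v)
    x = np.zeros(len(phi))
    for _ in range(n_iter):
        x = gamma_R * (J @ f(x)) + phi
    return x

phi = np.zeros(N); phi[3] = 1.0                    # one-hot input at state 4
sr_short = retrieve(J_learned, phi, gamma_R=0.4)   # short predictive horizon
sr_long = retrieve(J_learned, phi, gamma_R=0.9)    # longer horizon, same weights
```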

RNN-S can be generalized to more complex inputs with successor features

We wondered how RNN-S performs given more biologically realistic inputs. We have so far assumed that an external process has discretized the environment into uncorrelated states so that each possible state is represented by a unique input neuron. In other words, the inputs ϕ are one-hot vectors. However, inputs into the hippocampus are expected to be continuous and heterogeneous, with states encoded by overlapping sets of neurons (Hardcastle et al., 2017). When inputs are not one-hot, there is not always a canonical ground-truth T matrix to fit and the predictive representations are referred to as successor features (Barreto et al., 2017; Kulkarni et al., 2016). In this setting, the performance of a model estimating successor features is evaluated by the temporal difference (TD) loss function.

Using the RNN-S model and update rule (Equation 4), we explore more realistic inputs ϕ and refer to ϕ as ‘input features’ for consistency with the successor feature literature. We vary the sparsity and spatial correlation of the input features (Figure 4A). As before (Figure 3—figure supplement 1H), the network will operate in separate learning and retrieval modes, where γB is below the critical value for stability. Under these conditions, the update rule will learn

J = R_{\phi\phi}(-1) \, R_{\phi\phi}(0)^{-1} \quad (7)

Figure 4. Generalizing the model to more realistic inputs.

(A) Illustration of possible feature encodings ϕ for two spatially adjacent states in green and red. Feature encodings may vary in sparsity level and spatial correlation. (B) Average value of the STDP component (red) and the decorrelative normalization (solid blue) component of the gradient update over the course of a random walk. In dashed blue is a simpler Oja-like independent normalization update for comparison. Twenty-five simulations of forward-biased walks on a circular track were run, and shading shows 95% confidence interval. Input features are 3% sparse, with 10 cm spatial correlation. (C) Top: Example population activity of neurons in the RNN-S using the full decorrelative normalization rule over a 2min window of a forward-biased random walk. Population activity is normalized by the maximum firing rate. Bottom: As above, but for RNN-S using the simplified normalization update. (D) Shifts in place field peaks after a half hour simulation from the first two minutes of a 1D walk. Proportion of shifts in RNN-S with one-hot inputs shown in gray. Proportion of shifts in RNN-S with feature encodings (10% sparsity, 7.5 cm spatial correlation, γR=0.8) shown in blue. Each data point is the average shift observed in one simulated walk, and each histogram is over 40 simulated walks. Solid line indicates the reported measure from Mehta et al., 2000.

Figure 4—figure supplement 1. Comparing place field shift and skew effects for different feature encodings.

(A–D) Average firing rate as a function of position on a circular track for four example neurons. The walk and feature encodings were generated as in Figure 4D of the main text. Each neuron is sampled from a different walk. ‘Before Learning’ refers to firing fields made from the first 2-minute window of the walk. ‘After Learning’ refers to firing fields made from the entire walk. (E–F) As in (A–D), but for two neurons from a walk where the features were one-hot encoded.

at steady state, where R_{ϕϕ}(τ) is the correlation matrix of ϕ with time lag τ (Appendix 3). Thus, the RNN-S update rule has the effect of normalizing the input feature via a decorrelative factor (R_{ϕϕ}(0)^{-1}) and mapping the normalized input to the feature expected at the next time step in an STDP-like manner (R_{ϕϕ}(-1)). This interpretation generalizes the result that J=T in the one-hot encoding case (Appendix 3).
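
A sketch of this steady state estimated empirically from a trajectory of feature vectors; the small ridge term is a numerical safeguard added here, not part of the derivation.

```python
import numpy as np

def steady_state_J(phis, eps=1e-6):
    """Estimate J = R_phiphi(-1) R_phiphi(0)^{-1} from a (time steps x features) trajectory."""
    R_lag = phis[1:].T @ phis[:-1] / (len(phis) - 1)    # E[phi(t) phi(t-1)^T], lag -1
    R_zero = phis[:-1].T @ phis[:-1] / (len(phis) - 1)  # E[phi(t-1) phi(t-1)^T], lag 0
    return R_lag @ np.linalg.inv(R_zero + eps * np.eye(phis.shape[1]))
```

With one-hot inputs, R_{ϕϕ}(0) is diagonal with state-visit frequencies, so this expression reduces to the empirical transition matrix, recovering the J=T result above.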

We wanted to further explore the function of the normalization term. In the one-hot case, it operates over each synapse independently and makes a probability distribution. With more realistic inputs, it operates over a set of synapses and has a decorrelative effect. We first ask how the decorrelative term changes over learning of realistic inputs. We compare the mean value of the STDP term of the update (x_i(t) x_j(t-1)) to the normalization term of the update (x_j(t-1) \sum_k J_{ik} x_k(t-1)) during a sample walk (Figure 4B). The RNN-S learning rule has stronger potentiating effects in the beginning of the walk. As the model learns more of the environment and converges on the correct transition structure, the strength of the normalization term balances out the potentiation term. It may be that the normalization term is particularly important in maintaining this balance as inputs become more densely encoded. We test this hypothesis by using a normalization term that operates on each synapse independently (similar to Oja’s Rule, Oja, 1982, Appendix 5). We see that the equilibrium between potentiating and depressing effects is not achieved by this type of independent normalization (Figure 4B, Appendix 6).

We wondered whether the decorrelative normalization term is necessary for the RNN-S to develop accurate representations. By replacing the decorrelative term with an independent normalization, features from non-adjacent states begin to be associated together and the model activity becomes spatially non-specific over time (Figure 4C, top). In contrast, using the decorrelative term, the RNN-S population activity is more localized (Figure 4C, bottom).

Interestingly, we noticed an additional feature of place maps as we transitioned from one-hot feature encodings to more complex feature encodings. We compared the representations learned by the RNN-S in a circular track walk with one-hot features versus more densely encoded features. For both input distributions, the RNN-S displayed the same skewing in place fields seen in Figure 2 (Figure 4—figure supplement 1). However, the place field peaks of the RNN-S model additionally shifted backwards in space for the more complex feature encodings (Figure 4D). This was not seen for the one-hot encodings (Figure 4D). The shifting in the RNN-S model is consistent with the observations made in Mehta et al., 2000 and demonstrates the utility of considering more complex input conditions. A similar observation was made in Stachenfeld et al., 2017 with noisy state inputs. In both cases, field shifts could be caused by neurons receiving external inputs at more than one state, particularly at states leading up to its original field location.

RNN-S estimates successor features even with naturalistic trajectories

We ask whether RNN-S can accurately estimate successor features, particularly under conditions of natural behavior. Specifically, we used the dataset from Payne et al., 2021, gathered from foraging Tufted Titmice in a 2D arena (Figure 5A). We discretize the arena into a set of states and encode each state as in Section 2.5. Using position-tracking data from Payne et al., 2021, we simulate the behavioral trajectory of the animal as transitions through the discrete state space. The inputs into the successor feature model are the features associated with the states in the behavioral trajectory.

Figure 5. Fitting successor features to data with RNN-S over a variety of feature encodings.

(A) We use behavioral data from Payne et al., where a Tufted Titmouse randomly forages in a 2D environment while electrophysiological data is collected (replicated with permission from authors). Two example trajectories are shown on the right. (B) Temporal difference (TD) loss versus the spatial correlation of the input dataset, aggregated over all sparsity levels. Here, γR=0.75. Line shows mean, and shading shows 95% confidence interval. (C) As in (B), but measuring TD loss versus the sparsity level of the input dataset, aggregated over all spatial correlation levels. (D) TD loss for RNN-S with datasets with different spatial correlations and sparsities. Gray areas were not represented in the input dataset due to the feature generation process. Here, γR=0.75, and three simulations were run for each spatial correlation and sparsity pairing under each chosen γR. (E) As in (D), but for FF-TD. (F) TD loss of each model as a function of γR, aggregated over all input encodings. Line shows mean, and shading shows 95% confidence interval.

Figure 5—figure supplement 1. Parameter sweep details and extended TD error plots.

(A) The values of P (initial sparsity of random vectors before spatial smoothing) and σ sampled in our parameter sweep for Figures 5 and 6 in the main text. See Methods 4.10 for more details of how feature encodings were generated. (B) The values of S (final sparsity of features, measured after spatial smoothing) and σ sampled in our parameter sweep for Figures 5 and 6 in the main text. (C) A sample state encoded by the firing rate of 200 input neurons. Here, S=0.11 and σ=2. (D) As in Figure 5F of the main text, with the results from a random feedforward network included (“Shuffle”). The random network was constructed by randomly drawing weights from the distribution of weights learned by the FF-TD network. The random network is representative of a model without learned structure but with a similar magnitude of weights as the FF-TD model. (E) Spatial correlation of the feature encoding for an example state with the features of all other states. The 14×14 states are laid out in their position in the 2D arena. Here, the sample state is the state in the center of the 2D arena and σ=2.0. (F) As in (E), but for σ=0.0. (G) As in Figure 5D of the main text, but for RNN-S (first row) and FF-TD (second row) with γR=0.4 (left column), γR=0.6 (middle column), and γR=0.8 (right column).

We first wanted to test whether the RNN-S model was robust across a range of different types of input features. We calculate the TD loss of the model as a function of the spatial correlation across inputs ϕ (Figure 5B). We find that the model performs well across a range of inputs but loss is higher when inputs are spatially uncorrelated. This is consistent with the observation that behavioral transitions are spatially local, such that correlations across spatially adjacent features aid in the predictive power of the model. We next examine the model performance as a function of the sparsity of inputs ϕ (Figure 5C). We find the model also performs well across a range of feature sparsity, with lowest loss when features are sparse.

To understand the interacting effects of spatial correlation and feature sparsity in more detail, we performed a parameter sweep over both of these parameters (Figure 5D, Figure 5—figure supplement 1A-F). We generated random patterns according to the desired sparsity and smoothness with a spatial filter to generate correlations. This means that the entire parameter space is not covered in our sweep (e.g. the top-left area with high correlation and high sparsity is not explored). Note that since we generate ϕ by randomly drawing patterns, the special case of one-hot encoding is also not included in the parameter sweep (one-hot encoding is already explored in Figure 2). The RNN-S seems to perform well across a wide range, with highest loss in regions of low spatial correlation and low sparsity.
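
A sketch of one way to generate such inputs, assuming the process described above (random sparse patterns smoothed over spatially adjacent states to induce correlations); the paper's exact procedure is in Methods 4.10 and may differ in detail.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def make_features(n_states, n_inputs, p_active, sigma, seed=0):
    """Return an (n_states x n_inputs) feature matrix for states on a circular track."""
    rng = np.random.default_rng(seed)
    # Random sparse binary patterns: each input neuron is active at a fraction of states.
    phi = (rng.random((n_states, n_inputs)) < p_active).astype(float)
    # Smooth each input neuron's tuning across adjacent states (circular boundary).
    phi = gaussian_filter1d(phi, sigma=sigma, axis=0, mode="wrap")
    return phi / phi.max()

phis = make_features(n_states=100, n_inputs=200, p_active=0.03, sigma=2.0)
```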

We want to compare the TD loss of RNN-S to that of a non-biological model designed to minimize TD loss. We repeat the same parameter sweep over input features with the FF-TD model (Figure 5E, Figure 5—figure supplement 1G). The FF-TD model performs similarly to the RNN-S model, with lower TD loss in regions with low sparsity or higher correlation. We also tested how the performance of both models is affected by the strength of γR (Figure 5F). Both models show a similar increase in TD loss as γR increases, although the RNN-S has a slightly lower TD loss at high γ than the FF-TD model. Both models perform substantially better than a random network with weights of comparable magnitude (Figure 5—figure supplement 1D).

Unlike in the one-hot case, there is no ground-truth T matrix for non-one-hot inputs, so representations generated by RNN-S and FF-TD may look different, even at the same TD loss. Therefore, to compare the two models, it is important to compare representations to neural data.

RNN-S fits neural data in a random foraging task

Finally, we tested whether the neural representations learned by the models with behavioral trajectories from Figure 5 match hippocampal firing patterns. We performed new analysis on neural data from Payne et al., 2021 to establish a dataset for comparison. The neural data from Payne et al., 2021 was collected from electrophysiological recordings in titmouse hippocampus during freely foraging behavior (Figure 6A). Payne et al., 2021 discovered the presence of place cells in this area. We analyzed statistics of place cells recorded in the anterior region of the hippocampus, where homology with rodent dorsal hippocampus is hypothesized (Tosches et al., 2018). We calculated the distribution of place field size measured relative to the arena size (Figure 6B), as well as the distribution of the number of place fields per place cell (Figure 6C). Interestingly, with similar analysis methods, Henriksen et al., 2010 see similar statistics in the proximal region of dorsal CA1 in rats, indicating that our analyses could be applicable across organisms.

Figure 6. Comparing place fields from RNN-S to data.

(A) Dataset is from Payne et al., where a Tufted Titmouse randomly forages in a 2D environment while electrophysiological data is collected (replicated with permission from authors). (B) Distribution of place cells with some number of fields, aggregated over all cells recorded in all birds. (C) Distribution of place cells with some field size as a ratio of the size of the arena, aggregated over all cells recorded in all birds. (D) Average proportion of non-place cells in RNN-S, aggregated over simulations of randomly drawn trajectories from Payne et al. Feature encodings are varied by spatial correlation and sparsity as in Figure 5. Each simulation used 196 neurons. As before, three simulations were run for each spatial correlation and sparsity pairing under each chosen γR. (E) As in (D), but for average field size of place cells. (F) As in (D), but for average number of fields per place cell. (G) As in (D) and (E), but comparing place cell statistics using the KL divergence (DKL) between RNN-S and data from Payne et al. At each combination of input spatial correlation and sparsity, the distribution of field sizes is compared to the neural data, as is the distribution of number of fields per neuron, then the two DKL values are summed. Contour lines are drawn at DKL values of 1, 1.5, and 2 bits. (H) Place fields of cells chosen from the region of lowest KL divergence. (I) As in (G), but for FF-TD. (J) Change in KL divergence for field size as a function of γ. Line shows mean, and shading shows 95% confidence interval. (K) Same as (J), but for number of fields.

Figure 6—figure supplement 1. Extended place field evaluation plots.

(A) As in Figure 6E–G of the main text, but for γR=0.4 (left column) and γR=0.8 (right column). In addition, the plots showing KL divergence (in bits) for the distribution of field sizes and number of fields per cell are shown. (B) As in (A), but for FF-TD. (C) As in Figure 6H of the main text, but for FF-TD with γR=0.4 and (D) FF-TD with γR=0.8. (E) Total KL divergence across γR for RNN-S, FF-TD, the random network from Figure 6D (‘Shuffle’), and the split-half noise floor from the Payne et al. dataset (‘Data’). This noise floor is calculated by comparing the place field statistics of random halves of the neurons from Payne et al. We measure the KL divergence between the distributions calculated from each random half. This is repeated 500 times, and it is representative of a lower bound on KL divergence. Intuitively, it should not be possible to fit the data of Payne et al. as well as the dataset itself can.

In order to test how spatial representations in the RNN-S are impacted by input features, we performed parameter sweeps over input statistics. As in Payne et al., 2021, we define place cells in the model as cells with at least one statistically significant place field under permutation tests. Under most of the parameter range, all RNN-S neurons would be identified as a place cell (Figure 6D). However, under conditions of high spatial correlation and low sparsity, a portion of neurons (12%) do not have any fields in the environment. These cells are excluded from further analysis. We measured how the size of place fields varies across the parameter range (Figure 6E). The size of the fields increases as a function of the spatial correlation of the inputs, but is relatively insensitive to sparsity. This effect can be explained as the spatial correlation of the inputs introducing an additional spatial spread in the neural activity. Similarly, we measured how the number of place fields per cell varies across the parameter range (Figure 6F). The number of fields is maximal for conditions in which input features are densely encoded and spatial correlation is low. These are conditions in which each neuron receives inputs from multiple, spatially distant states.

Finally, we wanted to identify regions of parameter space that were similar to the data of Payne et al., 2021. We measured the KL divergence between our model’s place field statistics (Figure 6D–E) and the statistics measured in Payne et al., 2021 (Figure 6B–C). We combined the KL divergence of both these distributions to find the parameter range in which the RNN-S best fits neural data (Figure 6G). This optimal parameter range occurs when inputs have a spatial correlation of σ = 8.75 cm and sparsity 0.15. We note that the split-half noise floor of the dataset of Payne et al., 2021 is a KL divergence of 0.12 bits (Figure 6—figure supplement 1E). We can visually confirm that the model fits the data well by plotting the place fields of RNN-S neurons (Figure 6H).
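
A sketch of the comparison metric, assuming histograms of field sizes (or field counts) over matched bins; the binning and the direction of the divergence are our assumptions.

```python
import numpy as np

def kl_bits(p_counts, q_counts, eps=1e-10):
    """D_KL(P || Q) in bits between two histograms defined over the same bins."""
    p = np.asarray(p_counts, float) + eps; p /= p.sum()
    q = np.asarray(q_counts, float) + eps; q /= q.sum()
    return float(np.sum(p * np.log2(p / q)))

# Total divergence sums the two place field statistics compared in the text, e.g.:
# total = kl_bits(model_field_sizes, data_field_sizes) + kl_bits(model_n_fields, data_n_fields)
```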

We wondered whether the predictive gain (γR) of the representations affects the ability of the RNN-S to fit data. The KL divergence changes only slightly as a function of γR. Mainly, the KL divergence of the place field size increases as γR increases (Figure 6J), but little effect is seen in the distribution of the number of place fields per neuron (Figure 6K).

We next tested whether the neural data was better fit by representations generated by RNN-S or the FF-TD model. Across all parameters of the input features, despite having similar TD loss (Figure 5DE), the FF-TD model has much higher divergence from neural data (Figure 6GI, Figure 6—figure supplement 1), similar to a random feedforward network (Figure 6—figure supplement 1E).

Overall, our RNN-S model seems to strike a balance between performance in estimating successor features, similarity to data, and biological plausibility. Furthermore, our analyses provide a prediction of the input structure into the hippocampus that is otherwise not evident in an algorithmic description or in a model that only considers one-hot feature encodings.

Discussion

Hippocampal memory is thought to support a wide range of cognitive processes, especially those that involve forming associations or making predictions. However, the neural mechanisms that underlie these computations in the hippocampus are not fully understood. A promising biological substrate is the recurrent architecture of the CA3 region of the hippocampus and the plasticity rules observed. Here, we showed how a recurrent network with local learning rules can implement the successor representation, a predictive algorithm that captures many observations of hippocampal activity. We used our neural circuit model to make specific predictions of biological processes in this region.

A key component of our plasticity rule is a decorrelative term that depresses synapses based on coincident activity. Such anti-Hebbian or inhibitory effects are hypothesized to be broadly useful for learning, especially in unsupervised learning with overlapping input features (Litwin-Kumar and Doiron, 2014; Sadeh and Clopath, 2021; Pehlevan et al., 2018; Payne et al., 2021). Consistent with this hypothesis, anti-Hebbian learning has been implicated in circuits that perform a wide range of computations, from distinguishing patterns (Földiák, 1990), to familiarity detection (Tyulmankov et al., 2022), to learning birdsong syllables (Mackevicius et al., 2020). This inhibitory learning may be useful because it decorrelates redundant information, allowing for greater specificity and capacity in a network (Sadeh and Clopath, 2021; Földiák, 1990). Our results provide further support of these hypotheses and predict that anti-Hebbian learning is fundamental to a predictive neural circuit.

We derive an adaptive learning rate that allows our model to quickly learn a probability distribution, and generally adds flexibility to the learning process. The adaptive learning rate changes such that neurons that are more recently active have a slower learning rate. This is consistent with experimental findings of metaplasticity at synapses (Abraham and Bear, 1996; Abraham, 2008; Hulme et al., 2014), and theoretical proposals that metaplasticity tracks the uncertainty of information (Aitchison et al., 2021). In RNN-S, the adaptive learning rate improves the speed of learning and better recapitulates hippocampal data. Our adaptive learning rate also has interesting implications for flexible learning. Memory systems must be able to quickly learn new associations throughout their lifetime without catastrophe. Our learning rate is parameterized by a forgetting term λ that controls the timescale in which environmental statistics are expected to be stationary. Although we fixed λ=1 in our simulations, there are computational benefits in considering cases where λ<1. This parameter provides a natural way for a memory system to forget gradually over time and prioritize recent experiences, in line with other theoretical studies that have also suggested that learning and forgetting on multiple timescales allow for more flexible behavior (Kaplanis et al., 2018; Fusi et al., 2007).

We tested the sensitivity of our network to various parameters and found a broad range of valid solutions. Prior work has sought to understand how an emergent property of a network could be generated by multiple unique solutions (Goldman et al., 2001; Prinz et al., 2004; Bittner et al., 2021; Hertäg and Clopath, 2022). It has been suggested that redundancy in solution space makes systems more robust, accounting for margins of error in the natural world (Marder and Goaillard, 2006; Marder and Taylor, 2011). In a similar vein, our parameter sweep over plasticity kernels revealed that a sizeable variety of kernels give solutions that resemble the SR. Although our model was initially sensitive to the value of γ, we found that adding biological components, such as nonlinear dynamics and separate network modes, broadened the solution space of the network.

Several useful features arise from the fact that RNN-S learns the transition matrix T directly, while separating out the prediction timescale, γ, as a global gain factor. It is important for animals to engage in different horizons of prediction depending on task or memory demands (Mattar and Lengyel, 2022; Bellmund et al., 2020). In RNN-S, changing the prediction time horizon is as simple as increasing or decreasing the global gain of the network. Mechanistically, this could be accomplished by a neuromodulatory gain factor that boosts γ, perhaps by increasing the excitability of all neurons (Heckman et al., 2009; Nadim and Bucher, 2014). In RNN-S, it was useful to have low network gain during learning (γB), while allowing higher gain during retrieval to make longer timescale predictions (γR). This could be accomplished by a neuromodulatory factor that switches the network into a learning regime (Pawlak et al., 2010; Brzosko et al., 2019), for example acetylcholine, which reduces the gain of recurrent connections and increases learning rates (Hasselmo, 1999; Hasselmo, 2006). The idea that the hippocampus might compute the SR with flexible γ could help reconcile recent results that hippocampal activity does not always match high-γ SR (Widloski and Foster, 2022; Duvelle et al., 2021). Additionally, flexibility in predictive horizons could explain the different timescales of prediction observed across the anatomical axes of the hippocampus and entorhinal cortex (Jung et al., 1994; Dolorfo and Amaral, 1998; Brun et al., 2008; Kjelstrup et al., 2008; Strange et al., 2014; Poppenk et al., 2013; Brunec and Momennejad, 2022). Specifically, a series of successor networks with different values of γ used in retrieval could establish a gradient of predictive timescales. Functionally, this may allow for learning hierarchies of state structure and could be useful for hierarchical planning (McKenzie et al., 2014; Momennejad and Howard, 2018; Ribas-Fernandes et al., 2019).

Estimating T directly provides RNN-S with a means to sample likely future trajectories, or distributions of trajectories, which is computationally useful for many memory-guided cognitive tasks beyond reinforcement learning, including reasoning and inference (Ostojic and Fusi, 2013; Goodman et al., 2016). The representation afforded by T may also be particularly accessible in neural circuits. Ostojic and Fusi, 2013 note that only a few general assumptions are needed for synaptic plasticity rules to estimate transition statistics. Thus, it is reasonable to assume that some form of transition statistics is encoded broadly across the brain.

Interestingly, we also found that the recurrent network fit hippocampal data better than a feedforward network. A promising direction for further work involves untangling which brain areas and cognitive functions can be explained by deep (feedforward) neural networks (Bonnen et al., 2021), and which rely on recurrent architectures, or even richer combinations of generative structures (Das et al., 2021). Recurrent networks, such as RNN-S, support generative sequential sampling, reminiscent of hippocampal replay, which has been proposed as a substrate for planning, imagination, and structural inference (Foster and Wilson, 2006; Singer et al., 2013; Momennejad and Howard, 2018; Evans and Burgess, 2020; Kay et al., 2020).

There are inherent limitations to the approach of using a recurrent network to estimate the SR. For instance, network dynamics can be prone to issues of instability due to the recurrent buildup of activity. To prevent this instability, we introduce two different modes of operation, ‘learning’ and ‘retrieval’. An additional limitation is that errors in the estimated one-step transition can propagate over the course of the predictive rollout. This is especially problematic if features are more densely coded or more correlated, which makes one-step transition estimations more difficult. These insights into vulnerabilities of a recurrent network have interesting parallels in biology. Some hippocampal subfields are known to be highly recurrent (Schaffer, 1892; Ramón and Cajal, 1904; Miles and Wong, 1986; Le Duigou et al., 2014). This recurrency has been linked to the propensity of the hippocampus to enter unstable regimes, such as those that produce seizures (Sparks et al., 2020; Thom, 2014; Lothman et al., 1991; Knight et al., 2012). It remains an open question how a healthy hippocampus maintains stable activity, and to what extent the findings in models such as ours can suggest biological avenues to tame instability.

Other recent theoretical works have also sought to find biological mechanisms to learn successor representations, albeit with different approaches (Brea et al., 2016; de Cothi and Barry, 2020; Bono et al., 2023; Lee, 2022; George et al., 2023). For instance, the model from George et al., 2023 explores a feedforward network that takes advantage of theta phase-precession to learn the SR. They analyze how place cells deform around boundaries and the function of the dorsal-ventral gradient in field size. The model introduced by Bono et al., 2023 uses a feedforward network with hippocampal replay. They explore how replay can modulate the bias-variance tradeoff of their SR estimate and apply their model to fear-conditioning data. It is important to note that these mechanisms are not mutually exclusive with RNN-S. Taken together with our work, these models suggest that there are multiple ways to learn the SR in a biological circuit and that these representations may be more accessible to neural circuits than previously thought.

Methods

Code availability

Code is posted on Github: https://github.com/chingf/sr-project; Fang, 2022.

Random walk simulations

We simulated random walks in 1D (circular track) and 2D (square) arenas. In 1D simulations, we varied the probability of staying in the current state and transitioning forwards or backwards to test different types of biases on top of a purely random walk. In 2D simulations, the probabilities of each possible action were equal. In our simulations, one timestep corresponds to 1/3 second and spatial bins are assumed to be 5 cm apart. This speed of movement (15 cm/s) was chosen to be consistent with previous experiments. In theory, one can imagine different choices of timestep size to access different time horizons of prediction; that is, the choice of timestep interacts with the choice of γ in determining the prediction horizon.
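
As a minimal sketch of the 1D circular-track walk (parameter names such as p_stay and p_fwd are ours, not from the released code):

import numpy as np

def random_walk_1d(n_states=16, n_steps=5400, p_stay=0.2, p_fwd=0.4, seed=0):
    # One timestep = 1/3 s, so 5400 steps correspond to 30 min of foraging.
    rng = np.random.default_rng(seed)
    p_bwd = 1.0 - p_stay - p_fwd          # remaining probability mass
    states = np.zeros(n_steps, dtype=int)
    for t in range(1, n_steps):
        step = rng.choice([0, 1, -1], p=[p_stay, p_fwd, p_bwd])
        states[t] = (states[t - 1] + step) % n_states   # circular boundary
    return states

Varying p_fwd relative to p_bwd reproduces the biased walks described above.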

RNN-S model

This section provides details and pseudocode of the RNN-S simulation. Below are explanations of the most relevant variables:

N: Number of states, also equal to the number of neurons in the RNN
x: N-length vector of RNN neural activity
J: (N×N) synaptic weight matrix
M: (N×N) SR matrix
ϕ: N-length input vector into the network
b: Binary variable indicating learning (0) or retrieval (1) mode
γ_B: Value of γ the network uses to calculate M in learning mode
γ_R: Value of γ the network uses to calculate M in retrieval mode
n: Variable that tracks the activity of neurons integrated over time
λ: Discount value the network uses to calculate n
η: Learning rate

The RNN-S algorithm is as follows:

Algorithm 1 RNN-S.
1:  Inputs:
2:    ϕ(t) for t = 1, …, T
3:    b(t) for t = 1, …, T
4:  Initialize:
5:    J ← 0_{N×N}
6:    n ← 0_N
7:    x(t) ← 0_N for t = 1, …, T
8:  for t = 1, …, T do
9:    if b(t) == 1 then                                // Retrieval Mode
10:     M ← (I − γ_R J)^{−1}
11:     x(t) ← M ϕ(t)
12:   else                                             // Learning Mode
13:     M ← (I − γ_B J)^{−1}
14:     x(t) ← M ϕ(t)
15:     n ← x(t) + λn                                  // Learning rate update
16:     ΔJ ← x(t) x(t−1)^⊤ − (J x(t−1)) x(t−1)^⊤       // Calculate weight update
17:     η ← 1/n                                        // Get learning rates (elementwise inversion)
18:     η ← min(η, 1.0)                                // Learning rates can't exceed 1.0
19:     J_ij ← J_ij + η_j ΔJ_ij                        // Update synaptic weight matrix
20:   end if
21: end for
22: return x
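
For concreteness, the following numpy sketch implements Algorithm 1 (a minimal re-expression written for this section, not the released code; variable names mirror the pseudocode):

import numpy as np

def rnn_s(phi, b, gamma_B=0.0, gamma_R=0.5, lam=1.0):
    # phi: (T, N) inputs; b: (T,) mode flags (1 = retrieval, 0 = learning).
    T_steps, N = phi.shape
    J = np.zeros((N, N))                 # synaptic weights, estimate of T^T
    n = np.zeros(N)                      # integrated activity for learning rates
    x = np.zeros((T_steps, N))
    x_prev = np.zeros(N)
    for t in range(T_steps):
        gamma = gamma_R if b[t] == 1 else gamma_B
        x[t] = np.linalg.solve(np.eye(N) - gamma * J, phi[t])   # x = M phi
        if b[t] == 0:                    # learning mode only
            n = x[t] + lam * n                                  # line 15
            dJ = np.outer(x[t], x_prev) - (J @ x_prev)[:, None] * x_prev[None, :]  # line 16
            eta = np.minimum(1.0 / np.maximum(n, 1e-12), 1.0)   # lines 17-18
            J += eta[None, :] * dJ       # J_ij += eta_j * dJ_ij (line 19)
        x_prev = x[t]
    return x, J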

RNN-S with plasticity kernels

We introduce additional kernel-related variables to the RNN-S model above that are optimized by an evolutionary algorithm (see following methods subsection for more details):

A_+, τ_+: Pre→post side of the plasticity kernel, K_+(t) = A_+ e^{−t/τ_+}
A_−, τ_−: As above, but for the post→pre side
α_d: Scaling term to allow for different self-synapse updates
α_n: Scaling term to allow for different learning rate updates

We also define the variable t_k = 20, which is the length of the temporal support for the plasticity kernel. The value of t_k was chosen such that e^{−t_k/τ} was negligibly small for the range of τ we were interested in. The update algorithm is the same as in Algorithm 1, except lines 15-16 are replaced with the following:

Algorithm 2 Plasticity kernel update
n ← α_n x + λn                                        // Learning rate update
k_+ ← A_+ Σ_{t′=0}^{t_k} x(t − t′) e^{−t′/τ_+}        // Convolution with plasticity kernel
k_− ← A_− Σ_{t′=0}^{t_k} x(t − t′) e^{−t′/τ_−}
ΔJ_K ← x(t) k_+^⊤ + k_− x(t)^⊤                        // Calculate contribution to update from plasticity kernel
ΔJ_K[ii] ← α_d x_i(t) (k_+)_i                         // Updates to self-synapses use separate scaling
ΔJ ← ΔJ_K − (Jx) x^⊤                                  // Calculate weight update
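
As a sketch of the convolution step (assuming x_hist holds the last t_k + 1 activity vectors, with x_hist[0] = x(t)):

import numpy as np

def kernel_drive(x_hist, A, tau):
    # Computes A * sum_{t'=0}^{t_k} x(t - t') * e^{-t'/tau}.
    t_k = x_hist.shape[0] - 1
    weights = np.exp(-np.arange(t_k + 1) / tau)
    return A * (weights @ x_hist)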

Metalearning of RNN parameters

To learn the parameters of the RNN-S plasticity rule, we use the covariance matrix adaptation evolution strategy (CMA-ES). The training data provided are walks simulated from a random distribution of 1D walks. Walks varied in the number of states, the transition statistics, and the number of timesteps simulated. The loss function was the mean-squared error (MSE) between the RNN J matrix and the ideal estimated T matrix at the end of the walk.

RNN-S with truncated recurrent steps and nonlinearity

For the RNN-S model with t_max recurrent steps, lines 10 and 13 in Algorithm 1 are replaced with M ← Σ_{t′=0}^{t_max} γ^{t′} J^{t′}.

For RNN-S with nonlinear dynamics, there is no closed-form solution. So, we select a value for t_max and replace lines 10 and 13 in Algorithm 1 with an iterative update run for t_max steps: Δx = −x + γJ tanh(x) + ϕ. We choose t_max such that γ^{t_max} < 10^{−4}.
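
A sketch of this iterative retrieval (a unit Euler step size is our assumption here):

import numpy as np

def retrieve_nonlinear(J, phi, gamma, t_max):
    # Iterates dx = -x + gamma * J * tanh(x) + phi for t_max steps.
    x = np.zeros_like(phi)
    for _ in range(t_max):
        x = gamma * (J @ np.tanh(x)) + phi   # x + dx with unit step size
    return x

With γ < 1, t_max can be chosen as the smallest integer satisfying gamma**t_max < 1e-4.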

RNN-S with successor features

We use a tanh nonlinearity as described above. For simplicity, we set γB=0.

RNN-S with independent normalization

As in Algorithm 1, but with the following in place of line 16

ΔJ_ij ← x_i(t) x_j(t−1) − J_ij x_j(t−1)²  (8)

FF-TD Model

In all simulations of the FF-TD model, we use the temporal difference update. We perform a small grid search over the learning rate η to minimize error (for the SR, this is the MSE between the true M and the estimated M; for successor features, this is the temporal difference error). In the one-hot SR case, the temporal difference update given an observed transition from state s to state s′ is:

ΔM_ji =
  γM_{s′i} − M_{si},      if s = j and j ≠ i
  1 + γM_{s′i} − M_{si},  if s = j = i
  0,                      otherwise   (9)

for all synapses j→i. Given arbitrarily structured inputs (as in the successor feature case), the temporal difference update is:

ΔM = η ϕ (ϕ + γM^⊤ϕ′ − M^⊤ϕ)^⊤  (10)

or, equivalently,

ΔM_ji = η (ϕ_i + γ Σ_k M_{ki} ϕ′_k − Σ_k M_{ki} ϕ_k) ϕ_j  (11)
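
In code, a single TD step on M for an observed feature transition (ϕ, ϕ′) might look like the following sketch (our helper, not the released implementation):

import numpy as np

def ff_td_step(M, phi, phi_next, gamma, eta):
    # Per-neuron TD error: phi_i + gamma * (M^T phi')_i - (M^T phi)_i.
    delta = phi + gamma * (M.T @ phi_next) - (M.T @ phi)
    return M + eta * np.outer(phi, delta)    # dM_ji = eta * delta_i * phi_j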

Generation of feature encodings for successor feature models

For a walk with N states, we created N-dimensional feature vectors for each state. We choose an initial sparsity probability p and create feature vectors as random binary vectors with probability p of being ‘on’. The feature vectors were then blurred by a 2D Gaussian filter with variance σ, truncated at 1 standard deviation of support. The blurred features were then min-subtracted and max-normalized. The sparsity of each feature vector was calculated as its L1 norm divided by N. The sparsity s of the dataset was then the median of all the sparsity values computed from the feature vectors. To vary the spatial correlation of the dataset we need only vary σ. To vary the sparsity s of the dataset we vary p, then measure the final s after blurring with σ. Note that, at large σ, the lowest sparsity values in our parameter sweep were not possible to achieve.
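
A sketch of this generation procedure for an L×L arena (so N = L² states), using scipy's Gaussian filter; whether min-subtraction and max-normalization act per vector or globally is a simplification we make here:

import numpy as np
from scipy.ndimage import gaussian_filter

def make_features(L=14, p=0.1, sigma=1.0, seed=0):
    rng = np.random.default_rng(seed)
    N = L * L
    feats = (rng.random((N, N)) < p).astype(float)   # random binary features
    # Blur each feature dimension over the 2D arrangement of states,
    # truncating the Gaussian at 1 standard deviation of support.
    maps = feats.reshape(L, L, N)
    maps = gaussian_filter(maps, sigma=(sigma, sigma, 0), truncate=1.0)
    feats = maps.reshape(N, N)
    feats = feats - feats.min()                      # min-subtract
    feats = feats / feats.max()                      # max-normalize
    s = np.median(feats.sum(axis=1) / N)             # median L1 norm / N
    return feats, s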

Measuring TD loss for successor feature models

We use the standard TD loss function (Equation 18). To measure TD loss, at the end of the walk we take a random sample of observed transition pairs (ϕ, ϕ′). We use these transitions as the dataset to evaluate the loss function.

Analysis of place field statistics

We use the open source dataset from Payne et al., 2021. We select for excitatory cells in the anterior tip of the hippocampus. We then select for place cells using standard measures (significantly place-modulated and stable over the course of the experiment).

We determined place field boundaries with a permutation test as in Payne et al., 2021. We then calculated the number of fields per neuron and the field size as in Henriksen et al., 2010. The same analyses were conducted for simulated neural data from the RNN-S and FF-TD models.

Behavioral simulation of Payne et al

We use behavioral tracking data from Payne et al., 2021. For each simulation, we randomly select an experiment and randomly sample a 28-min window from that experiment. If the arena coverage is less than 85% during the window, we redo the sampling until the coverage requirement is satisfied. We then downsample the behavioral data so that the frame rate is the same as our simulation (3 FPS). Then, we divide the arena into a 14×14 grid. We discretize the continuous X/Y location data into these states. This sequence of states makes up the behavioral transitions that the model simulates.
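
The discretization amounts to binning positions, as in this sketch (arena_min and arena_max are assumed bounding coordinates of the tracking data):

import numpy as np

def discretize_positions(xy, arena_min, arena_max, n_bins=14):
    # xy: (T, 2) array of X/Y positions sampled at 3 FPS.
    bins = ((xy - arena_min) / (arena_max - arena_min) * n_bins).astype(int)
    bins = np.clip(bins, 0, n_bins - 1)       # keep edge samples in range
    return bins[:, 0] * n_bins + bins[:, 1]   # flatten (row, col) to a state id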

Place field plots

From the models, we get the activity of each model neuron over time. We make firing field plots with the same smoothing parameters as Payne et al., 2021.

Citation diversity statement

Systemic discriminatory practices have been identified in neuroscience citations, and a ‘citation diversity statement’ has been proposed as an intervention (Dworkin et al., 2020; Zurn et al., 2020). There is evidence that quantifying discriminatory practices can lead to systemic improvements in academic settings (Hopkins, 2002). Many forms of discrimination could lead to a paper being under-cited, for example authors being less widely known or less respected due to discrimination related to gender, race, sexuality, disability status, or socioeconomic background. We manually estimated the number of male and female first and last authors that we cited, acknowledging that this quantification ignores many known forms of discrimination, and fails to account for nonbinary/intersex/trans folks. In our citations, first-last author pairs were 62% male-male, 19% female-male, 9% male-female, and 10% female-female, somewhat similar to base rates in our field (biaswatchneuro.com). To familiarize ourselves with the literature, we used databases intended to counteract discrimination (blackinneuro.com, anneslist.net, connectedpapers.com). The process of making this statement improved our paper, and encouraged us to adopt less biased practices in selecting what papers to read and cite in the future. We were somewhat surprised and disappointed at how low the number of female authors was, despite being a female-female team ourselves. Citation practices alone are not enough to correct the power imbalances endemic in academic practice (National Academies of Sciences, 2018); correcting them requires changes to how concrete power and resources are distributed.

Acknowledgements

This work was supported through NSF NeuroNex Award DBI-1707398, the Gatsby Charitable Foundation, the New York Stem Cell Foundation (Robertson Neuroscience Investigator Award), National Institutes of Health (NIH Director’s New Innovator Award (DP2-AG071918)), and the Arnold and Mabel Beckman Foundation (Beckman Young Investigator Award). CF received support from the NSF Graduate Research Fellowship Program. ELM received support from the Simons Society of Fellows. We thank Jack Lindsey and Tom George for comments on the manuscript, as well as Stefano Fusi, William de Cothi, Kimberly Stachenfeld, and Caswell Barry for helpful discussions.

Appendix 1

Finding the conditions to retrieve from RNN steady-state activity

The successor representation is defined as

M=(IγT)1 (12)

where T is the transition probability matrix such that T_ji = P(s′ = i | s = j) for current state s and future state s′.

For an RNN with connectivity J, activity x, input ϕ, and gain γ ∈ [0, 1], the (linear) discrete-time dynamics equation is (Amari, 1972)

Δx = −x(t) + γJx(t) + ϕ(t).  (13)

Furthermore, the steady state solution can be found by setting Δx=0

x_SS = (I − γJ)^{−1}ϕ  (14)

Assume that J = T^⊤ as a result of the network using some STDP-like learning rule where pre-post connections are potentiated. The transposition is due to notational differences from the RL literature, where the ijth index typically concerns the direction from state i to state j. This is a result of differences in RL and RNN conventions, in which inputs are left-multiplied and right-multiplied, respectively. Let γ be a neuromodulatory factor that is applied over the whole network (and, thus, does not need to be encoded in the synaptic weights). Then, the equivalence to Equation 12 becomes clear, and our steady state solution can be written as:

x_SS = M^⊤ϕ  (15)

This is consistent with the successor representation framework shown in Stachenfeld et al., 2017, where the columns of the M matrix represent the firing fields of a neuron, and the rows of the M matrix represent the network response to some input.
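
The equivalence in Equations 12-15 is straightforward to verify numerically, as in this sketch with a random transition matrix:

import numpy as np

rng = np.random.default_rng(0)
N, gamma = 8, 0.5
T = rng.random((N, N))
T = T / T.sum(axis=1, keepdims=True)               # rows sum to 1
M = np.linalg.inv(np.eye(N) - gamma * T)           # Equation 12
J = T.T                                            # assumed weights J = T^T
phi = np.zeros(N); phi[3] = 1.0                    # one-hot input at state 3
x_ss = np.linalg.solve(np.eye(N) - gamma * J, phi) # Equation 14
assert np.allclose(x_ss, M.T @ phi)                # Equation 15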

Appendix 2

Deriving the RNN-S learning rule from TD Error and showing the learning rule is valid under a stability condition

Transitions between states (s, s′) are observed as features ϕ(s), ϕ(s′), where ϕ is some function. For notational simplicity, we will write these observed feature transitions as (ϕ, ϕ′). A dataset D is comprised of these observed feature transitions over a behavioral trajectory. Successor features are typically learned by some function approximator ψ(ϕ; θ) that is parameterized by θ and takes in the inputs ϕ. The SF approximator, ψ, is learned by minimizing the temporal difference (TD) loss function (Sutton and Barto, 2018):

L(θ) = E[‖ϕ + γψ^π(ϕ′; θ) − ψ(ϕ; θ)‖₂² | D]  (16)

for the current policy π. Here, the TD target is ϕ + γψ^π(ϕ′; θ). Analogous to the model-free setting where the value function V is being learned, ϕ is in place of the reward r. Following these definitions, we can view the RNN-S as the function approximator ψ:

ψ(ϕ; θ = J) = (I − γJ)^{−1}ϕ  (17)

For a single transition (ϕ, ϕ′), we can write out the loss as follows:

L(θ) = ‖ϕ + γψ^π(ϕ′; θ) − (I − γJ)^{−1}ϕ‖₂²  (18)

For each observed transition, we would like to update ψ such that the loss L is minimized. Thus, we take the gradient of this temporal difference loss function with respect to our parameter θ=J:

∇_J L(θ) = −2(ϕ + γψ^π(ϕ′; θ) − (I − γJ)^{−1}ϕ) ∇_J((I − γJ)^{−1}ϕ)  (19)

We can make the TD approximation (Sutton and Barto, 2018):

∇_J L(θ) = −2(ϕ + γ(I − γJ)^{−1}ϕ′ − (I − γJ)^{−1}ϕ) ∇_J((I − γJ)^{−1}ϕ)  (20)
= −2(ϕ + γ(I − γJ)^{−1}ϕ′ − (I − γJ)^{−1}ϕ)((I − γJ)^{−1}(γ)(I − γJ)^{−1}ϕ)  (21)
= −2((I − γJ)x + γx′ − x)(γ(I − γJ)^{−1}x)^⊤  (22)
= −2γ²(x′ − Jx)((I − γJ)^{−1}x)^⊤  (23)
= −2γ²(x′ − Jx)x^⊤(I − γJ)^{−⊤}  (24)

While −∇_J L(θ) gives the direction of steepest descent in the loss, we will consider a linear transformation of the gradient that allows for a simpler update rule. This simpler update rule will be more amenable to a biologically plausible learning rule. We define this modified gradient as D = −∇_J L(θ)M, where M = (I − γJ). We must first understand the condition for D to be in a direction of descent:

⟨D, −∇_J L⟩ > 0  (25)
−Tr(D^⊤ ∇_J L) > 0  (26)
Tr(∇_J L M ∇_J L^⊤) > 0  (27)
Tr(∇_J L ((M + M^⊤)/2 + (M − M^⊤)/2) ∇_J L^⊤) > 0  (28)
(1/2) Tr(∇_J L (M + M^⊤) ∇_J L^⊤) > 0  (29)

Since the antisymmetric part of M contributes nothing to the trace, this expression is satisfied if M + M^⊤ is positive definite (its eigenvalues are positive). Thus, we find that our modified gradient points in a descent direction if the eigenvalues of M + M^⊤ are positive. Interestingly, this condition is equivalent to stating that the recurrent network dynamics are stable and do not exhibit non-normal amplification (Kumar et al., 2022; Murphy and Miller, 2009; Goldman, 2009). In other words, as long as the network dynamics are in a stable regime and do not have non-normal amplification, our modified gradient reduces the temporal difference loss. Otherwise, the gradient will not point in a descent direction.

We will use the modified gradient D = (x′ − Jx)x^⊤ as our synaptic weight update rule. Our theoretical analysis explains much of the results seen in the main text. As the gain parameter γ_B is increased, the network is closer to the edge of stability (the eigenvalues of M + M^⊤ approach zero, Figure 3A). Stability itself is not enough to guarantee that our update rule is valid. We need the additional constraint that non-normal amplification should not be present (the eigenvalues of M + M^⊤ are positive). In practice, however, this does not seem to be a mode that affects our network. That is, the γ_B value for which the error in the network increases coincides with the γ_B value for which the network is no longer stable (Figure 3B). Our theoretical analysis also shows that the gain γ_B can always be decreased such that the eigenvalues of M + M^⊤ are positive and our update rule is valid (Figure 3E). At the most extreme, one can set γ_B = 0 during learning to maintain stability (as we do in Figure 4 and onwards).
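
In practice, the descent condition can be checked directly from the weights, as in this sketch:

import numpy as np

def update_is_valid(J, gamma_B):
    # Descent direction iff M + M^T is positive definite, with M = I - gamma_B * J.
    M = np.eye(J.shape[0]) - gamma_B * J
    return bool(np.all(np.linalg.eigvalsh(M + M.T) > 0))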

Appendix 3

Proving the RNN-S update rule calculated on firing rates (x) depends only on feedforward inputs (ϕ) at steady state

We will show that our update rule, which uses x (neural activity), converges on a solution that depends only on ϕ (the feedforward inputs). We will also show that in the one-hot case, we learn the SR exactly.

As a reminder, our learning rule for each j→i synapse is:

ΔJ = η(x′ − Jx)x^⊤  (30)

We can solve for the steady state solution of Equation 30 (set ΔJ = 0). Let A = (I − γJ)^{−1} for notational convenience, and recall that in steady state x = Aϕ. Let ⟨·⟩ denote the average over time.

J = ⟨x′x^⊤⟩⟨xx^⊤⟩^{−1}  (31)
J = ⟨Aϕ′(Aϕ)^⊤⟩⟨Aϕ(Aϕ)^⊤⟩^{−1}  (32)
J = A⟨ϕ′ϕ^⊤⟩A^⊤⟨Aϕϕ^⊤A^⊤⟩^{−1}  (33)
J = A⟨ϕ′ϕ^⊤⟩A^⊤(A⟨ϕϕ^⊤⟩A^⊤)^{−1}  (34)

Note that, since A = (I − γJ)^{−1}, J = (1/γ)(I − A^{−1}).

A⟨ϕ′ϕ^⊤⟩A^⊤ = (1/γ)(I − A^{−1})A⟨ϕϕ^⊤⟩A^⊤  (35)
A⟨ϕ′ϕ^⊤⟩A^⊤ = (1/γ)(A⟨ϕϕ^⊤⟩A^⊤ − ⟨ϕϕ^⊤⟩A^⊤)  (36)
A⟨ϕ′ϕ^⊤⟩ = (1/γ)(A⟨ϕϕ^⊤⟩ − ⟨ϕϕ^⊤⟩)  (37)

Thus,

⟨ϕ′ϕ^⊤⟩ = (1/γ)(I − A^{−1})⟨ϕϕ^⊤⟩  (38)

Therefore,

J = ⟨ϕ′ϕ^⊤⟩⟨ϕϕ^⊤⟩^{−1}  (39)
J = R_ϕϕ(1) R_ϕϕ(0)^{−1}  (40)

where R_ϕϕ(τ) is the autocorrelation matrix for some time lag τ. Therefore, the RNN-S weight matrix J at steady state is only dependent on the inputs into the RNN over time.

In the case where ϕ is one-hot, we compute the SR exactly. This is because the steady state solution at each j→i synapse simplifies into the following expression:

J_ij = Σ_t ϕ_j(t−1)ϕ_i(t) / Σ_t ϕ_j(t)  (41)

This is the definition of the transition probability matrix and we see that J = T. Note that the solution for J_ij in Equation 41 is undefined if state j is never visited. We assume each relevant state is visited at least once here.
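
Equation 41 is the maximum-likelihood transition estimate, which can be computed from a state sequence as in this sketch (the denominator here counts visits as a source state, i.e., up to the final timestep):

import numpy as np

def empirical_T(states, N):
    counts = np.zeros((N, N))
    for s, s_next in zip(states[:-1], states[1:]):
        counts[s_next, s] += 1                 # entry [i, j] counts j -> i
    visits = counts.sum(axis=0)                # times each state was a source
    return counts / np.maximum(visits, 1)      # guard states never visited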

Appendix 4

Deriving the adaptive learning rate update rule

This section explains how the adaptive learning rate is derived. The logic will be similar to calculating a weighted running average. Let d_ij(t) be a binary function that is 1 if the transition from timestep t−1 to timestep t is from state j to state i; otherwise, it is 0. Assume ϕ is one-hot encoded. Notice that in the one-hot case, the RNN-S update rule (Equation 4) simplifies to:

ΔJ_ij ← η(d_ij − J_ij x_j)  (42)

What η should be used so J approaches T as quickly as possible? During learning, the empirical transition matrix, T(t), changes at each timestep t, based on the transitions the animal has experienced. Define the total number of times that state ϕ_j happened prior to time t as n_j(t) = Σ_{t′=1}^{t} ϕ_j(t − t′), and define the running count of transitions from state j to state i as c_ij(t) = Σ_{t′=1}^{t} d_ij(t − t′). We want J(t) = T(t), which necessitates

ΔJ_ij(t) = T_ji(t) − T_ji(t−1) = c_ij(t)/n_j(t) − c_ij(t−1)/n_j(t−1)  (43)
= (n_j(t−1)c_ij(t) − c_ij(t−1)n_j(t)) / (n_j(t)n_j(t−1))  (44)

Note that n_j(t) = n_j(t−1) + ϕ_j(t−1), and c_ij(t) = c_ij(t−1) + d_ij(t), which gives us

ΔJ_ij(t) = (n_j(t−1)c_ij(t−1) + n_j(t−1)d_ij(t) − c_ij(t−1)n_j(t−1) − c_ij(t−1)ϕ_j(t−1)) / (n_j(t)n_j(t−1))  (45)
= (n_j(t−1)d_ij(t) − c_ij(t−1)ϕ_j(t−1)) / (n_j(t)n_j(t−1))  (46)
= (1/n_j(t))(d_ij(t) − c_ij(t−1)ϕ_j(t−1)/n_j(t−1))  (47)
= (1/n_j(t))(d_ij(t) − T_ji(t−1)ϕ_j(t−1))  (48)

Therefore, comparing with Equation 42, we can see that a learning rate η_j = 1/n_j(t) will let J match T as quickly as possible. We have defined n in terms of the inputs ϕ for this derivation, but in practice the adaptive learning rate as a function of x works well with the RNN-S update rule (which is also a function of x). Thus, we use the adaptive learning rate defined over x in our combined learning rule for increased biological plausibility.

In its current form, the update equation assumes transitions across all history of inputs are integrated. In reality, there is likely some kind of memory decay. This can be implemented with a decay term λ ∈ (0, 1):

n_j(t) = Σ_{t′=1}^{t} λ^{t′} x_j(t − t′)  (49)

λ determines the recency bias over the observed transitions that make up the T estimate. The addition of λ has the added benefit that it naturally provides a mechanism for learning rates to modulate over time. If λ=1, the learning rate can only monotonically decrease. If λ<1, the learning rate can become strong again over time if a state has not been visited in a while. This provides a mechanism for fast learning of new associations, which is useful for a variety of effects, including remapping.
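
In code, the decaying count and the resulting per-neuron learning rates are one line each (the numerical guard is our addition):

import numpy as np

def adaptive_rates(n, x, lam=0.9, cap=1.0):
    # Recursive form of n_j(t) = sum_{t'} lam^{t'} x_j(t - t'); eta_j = 1/n_j(t).
    n = x + lam * n
    eta = np.minimum(1.0 / np.maximum(n, 1e-12), cap)
    return n, eta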

Appendix 5

Endotaxis model and the successor representation

The learning rule and architecture of our model are similar to a hypothesized “endotaxis” model (Zhang et al., 2021). In the endotaxis model, neurons fire most strongly near a reward, allowing the animal to navigate up a gradient of neural activity, akin to navigating up an odor gradient. The endotaxis model discovers the structure of an environment and can solve many tasks, such as spatial navigation and abstract puzzles. We were interested in similarities between RNN-S and the learning rules for endotaxis, in support of the idea that SR-like representations may be used by the brain for a broad range of intelligent behaviors. Here, we outline similarities and differences between the two model architectures.

The endotaxis paper (Zhang et al., 2021) uses Oja’s rule in an RNN with place-like inputs. The SR can also be learned with an Oja-like learning rule. Oja’s rule is typically written as (Oja, 1982):

ΔJ_ij = η x_j x_i − η J_ij x_i²  (50)

If we assume that there is a temporal asymmetry to the potentiation term (e.g., potentiation is more STDP-like than Hebbian), then we have

ΔJ_ij = η x_j(t−1) x_i(t) − η J_ij x_i(t)²  (51)

We then solve for the steady state solution of this equation, when ΔJij=0:

0 = η x_j(t−1) x_i(t) − η J_ij x_i(t)²  (52)
J_ij = ⟨x_j(t−1) x_i(t)⟩ / ⟨x_i(t)²⟩  (53)
J_ij = Σ_t x_j(t−1) x_i(t) / Σ_t x_i(t)²  (54)

where ⟨·⟩ indicates the time-average of some term. Assume that the plasticity rule does not use x exactly, but instead uses ϕ directly. Given that inputs are one-hot encodings of the animal’s state at some time t, the expression becomes

J_ij = Σ_t ϕ_j(t−1)ϕ_i(t) / Σ_t ϕ_i(t)  (55)

If we assume T is symmetric, J = T. Alternatively, if we use pre-synaptic normalization as opposed to the standard post-synaptic normalization of Oja’s rule (i.e., index j instead of i in the denominator), we also have J = T. Thus, the steady state activity of an RNN with this learning rule retrieves the SR, as shown in subsection 4.14.

Appendix 6

Independent normalization and successor features

If we assume the same Oja-like rule as in Appendix 5, we can also arrive at a similar interpretation in the successor feature case as in Equation 7. By solving for the steady state solution without any assumptions about the inputs ϕ, we get the following equation:

J = R_ϕϕ(1) diag(R_ϕϕ(0))^{−1}  (56)

where diag is a function that retains only the diagonal of the matrix. This expression provides a useful way to contrast the learning rule used in RNN-S with an Oja-like alternative. While RNN-S normalizes by the full autocorrelation matrix, an Oja-like rule only normalizes by the diagonal of the matrix. This is the basis of our independent normalization model in Figure 4BC.

Appendix 7

Comparing alternate forms of normalizing the synaptic weight matrix

The anti-Hebbian term of the RNN-S learning rule normalizes the synaptic weight matrix into exactly a transition probability matrix. We wanted to test how important it was to use this exact normalization and whether other forms of the synaptic weight matrix could yield similar results. We simulated representations that would arise from different normalization procedures. For these tests, we simulate a random walk on a circular track, as in Figure 2, for 10 minutes of simulation time. A model where the synaptic weight matrix exactly estimates the transition probability matrix (as in Equation 4) will give the SR matrix (Appendix 7—figure 1A).

We test a model where the normalization term for the synaptic weight matrix is removed. Thus, J will be equal to the count of observed transitions, i.e., J_ij is equal to the number of experienced transitions from state j to state i. We will refer to this as a count matrix. Without normalization, the values in the count matrix increase steadily over the course of the simulation. This quickly results in unstable dynamics from the weights of the matrix being too large (Appendix 7—figure 1B). A simple way to prevent instability (specifically, to ensure the maximum eigenvalue of the synaptic weight matrix is below 1) is to use an additional scaling factor α over the weights of the matrix, such that J is multiplied by 1/(α max(J)). A careful choice of scaling value can ensure network activity remains stable within this walk, although this is not a generalizable solution, as different scaling values may be needed for different random walks and tasks. However, even with this modification, the representations above are not sufficiently predictive compared to the original SR (the off-diagonal elements of the SR are not captured well), and the activity strength is unevenly distributed across the states (Appendix 7—figure 1CD). It is likely that depressing all synapses by the same factor (similar to Fiete et al., 2010) does not correct for differences in occupancies. In other words, states that happen to be visited more by chance are likely to dominate the dynamics, even if the transition statistics are identical across all states.

Finally, as a further test of different ways of parameterizing the synaptic weight matrix, we examine the steady state neural activity when the count matrix is instead scaled in a row-by-row fashion (Appendix 7—figure 1E). Specifically, we divide each row i of the count matrix by the maximum of row i (and some global scaling factor to ensure stability). Note that this is in contrast to T, where each row is divided by its sum. This is closer to the SR matrix expected if the synaptic weight matrix estimates T. We see there is a slight unevenness early on in learning in the diagonal of the matrix (Appendix 7—figure 1E). However, given enough observed transitions, the predictive representation looks reasonable and quite similar to the SR matrix.

Overall, we see that there are likely other formulations of the synaptic weight matrix that can give a representation similar to the SR. The important ingredient appears to be some type of row-dependent normalization; that is, neurons should have their synaptic weights normalized independently of each other. This ensures that occupancy is not conflated with transition statistics.
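
The normalizations compared in this appendix can be summarized in a few lines (here counts[i, j] is the number of observed j→i transitions, and alpha_g is the global stability factor; both names are ours):

import numpy as np

def normalize_to_T(counts):
    # Divide by per-source-state totals, so each state's outgoing mass sums to 1.
    return counts / np.maximum(counts.sum(axis=0, keepdims=True), 1)

def normalize_row_max(counts, alpha_g=1.75):
    # Divide each row by its maximum, with a global factor for stability.
    return counts / (alpha_g * np.maximum(counts.max(axis=1, keepdims=True), 1))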

Appendix 7—figure 1. SR matrices under different forms of normalization.


(A) The resulting SR matrix from a random walk on a circular track for 10 minutes, if the synaptic weight matrix exactly estimates the transition probability matrix (as in Equation 4). (B) Model as in (A), but with normalization removed. Thus, J will be equal to the count of observed transitions, i.e., J_ij is equal to the number of experienced transitions from state j to state i. We will refer to this as a count matrix. The plot shows the maximum eigenvalue of the weight matrix, where an eigenvalue above 1 indicates instability (Sompolinsky et al., 1988). (C) As in (B), but with an additional scaling factor α over the weights of the matrix, such that J is multiplied by 1/(α max(J)). (D) Steady state neural activity of the model in (C) with scaling factor 1.75. (E) As in (D), but the count matrix is instead scaled in a row-by-row fashion. Specifically, we divide each row i of the count matrix by the maximum of row i (and some global scaling factor to ensure stability).

Funding Statement

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Contributor Information

Emily L Mackevicius, Email: em3406@columbia.edu.

Srdjan Ostojic, École Normale Supérieure Paris, France.

Timothy E Behrens, University of Oxford, United Kingdom.

Funding Information

This paper was supported by the following grants:

  • National Science Foundation NeuroNex Award DBI-1707398 to LF Abbott, Ching Fang, Emily L Mackevicius.

  • Gatsby Charitable Foundation to LF Abbott, Ching Fang, Emily L Mackevicius.

  • New York Stem Cell Foundation Robertson Neuroscience Investigator Award to Ching Fang, Dmitriy Aronov, Emily L Mackevicius.

  • National Institutes of Health NIH Director's New Innovator Award (DP2-AG071918) to Ching Fang, Dmitriy Aronov, Emily L Mackevicius.

  • Arnold and Mabel Beckman Foundation Beckman Young Investigator Award to Ching Fang, Dmitriy Aronov, Emily L Mackevicius.

  • National Science Foundation Graduate Research Fellowship Program to Ching Fang.

  • Simons Foundation Society of Fellows to Emily L Mackevicius.

Additional information

Competing interests

No competing interests declared.

Author contributions

Conceptualization, Resources, Software, Formal analysis, Funding acquisition, Validation, Methodology, Writing – original draft, Writing – review and editing.

Conceptualization, Resources, Supervision, Funding acquisition, Visualization, Methodology, Writing – original draft, Writing – review and editing.

Conceptualization, Resources, Formal analysis, Supervision, Funding acquisition, Visualization, Methodology, Writing – original draft, Writing – review and editing.

Conceptualization, Resources, Software, Formal analysis, Supervision, Funding acquisition, Validation, Visualization, Methodology, Writing – original draft, Writing – review and editing.

Additional files

Transparent reporting form

Data availability

The current manuscript is a computational study, so no data have been generated for this manuscript. Modelling code is publicly available on GitHub: https://github.com/chingf/sr-project (copy archived at swh:1:rev:43320e9b8c15927c67849f768d2a9bf17f68a0ea).

The following previously published dataset was used:

Payne H, Lynch G, Aronov D. 2021. Neural representations of space in the hippocampus of a food-caching bird. Dryad Digital Repository.

References

  1. Abbott LF, Blum KI. Functional significance of long-term potentiation for sequence learning and prediction. Cerebral Cortex. 1996;6:406–416. doi: 10.1093/cercor/6.3.406. [DOI] [PubMed] [Google Scholar]
  2. Abraham WC, Bear MF. Metaplasticity: the plasticity of synaptic plasticity. Trends in Neurosciences. 1996;19:126–130. doi: 10.1016/s0166-2236(96)80018-x. [DOI] [PubMed] [Google Scholar]
  3. Abraham WC. Metaplasticity: tuning synapses and networks for plasticity. Nature Reviews. Neuroscience. 2008;9:387–390. doi: 10.1038/nrn2356. [DOI] [PubMed] [Google Scholar]
  4. Aitchison L, Jegminat J, Menendez JA, Pfister JP, Pouget A, Latham PE. Synaptic plasticity as Bayesian inference. Nature Neuroscience. 2021;24:565–571. doi: 10.1038/s41593-021-00809-5. [DOI] [PubMed] [Google Scholar]
  5. Amari S-I. Characteristics of random nets of analog neuron-like elements. IEEE Transactions on Systems, Man, and Cybernetics. 1972;SMC-2:643–657. doi: 10.1109/TSMC.1972.4309193. [DOI] [Google Scholar]
  6. Barreto A, Dabney W, Munos R, Hunt JJ, Schaul T, Hasselt HP, Silver D. Successor Features for Transfer in Reinforcement Learning. arXiv. 2017 https://arxiv.org/abs/1606.05312
  7. Bellmund JLS, Polti I, Doeller CF. Sequence memory in the hippocampal-entorhinal region. Journal of Cognitive Neuroscience. 2020;32:2056–2070. doi: 10.1162/jocn_a_01592. [DOI] [PubMed] [Google Scholar]
  8. Bi GQ, Poo MM. Synaptic modifications in cultured hippocampal neurons: dependence on spike timing, synaptic strength, and postsynaptic cell type. The Journal of Neuroscience. 1998;18:10464–10472. doi: 10.1523/JNEUROSCI.18-24-10464.1998. [DOI] [PMC free article] [PubMed] [Google Scholar]
  9. Bittner KC, Grienberger C, Vaidya SP, Milstein AD, Macklin JJ, Suh J, Tonegawa S, Magee JC. Conjunctive input processing drives feature selectivity in hippocampal CA1 neurons. Nature Neuroscience. 2015;18:1133–1142. doi: 10.1038/nn.4062. [DOI] [PMC free article] [PubMed] [Google Scholar]
  10. Bittner SR, Palmigiano A, Piet AT, Duan CA, Brody CD, Miller KD, Cunningham J. Interrogating theoretical models of neural computation with emergent property inference. eLife. 2021;10:e56265. doi: 10.7554/eLife.56265. [DOI] [PMC free article] [PubMed] [Google Scholar]
  11. Blum KI, Abbott LF. A model of spatial MAP formation in the hippocampus of the rat. Neural Computation. 1996;8:85–93. doi: 10.1162/neco.1996.8.1.85. [DOI] [PubMed] [Google Scholar]
  12. Bonnen T, Yamins DLK, Wagner AD. When the ventral visual stream is not enough: a deep learning account of medial temporal lobe involvement in perception. Neuron. 2021;109:2755–2766. doi: 10.1016/j.neuron.2021.06.018. [DOI] [PMC free article] [PubMed] [Google Scholar]
  13. Bono J, Zannone S, Pedrosa V, Clopath C. Learning predictive cognitive maps with spiking neurons during behaviour and replays. eLife. 2023;12:e80671. doi: 10.7554/eLife.80671. [DOI] [PMC free article] [PubMed] [Google Scholar]
  14. Brea J, Gaál AT, Urbanczik R, Senn W. Prospective coding by spiking neurons. PLOS Computational Biology. 2016;12:e1005003. doi: 10.1371/journal.pcbi.1005003. [DOI] [PMC free article] [PubMed] [Google Scholar]
  15. Brun VH, Solstad T, Kjelstrup KB, Fyhn M, Witter MP, Moser EI, Moser MB. Progressive increase in grid scale from dorsal to ventral medial entorhinal cortex. Hippocampus. 2008;18:1200–1212. doi: 10.1002/hipo.20504. [DOI] [PubMed] [Google Scholar]
  16. Brunec IK, Momennejad I. Predictive representations in hippocampal and prefrontal hierarchies. The Journal of Neuroscience. 2022;42:299–312. doi: 10.1523/JNEUROSCI.1327-21.2021. [DOI] [PMC free article] [PubMed] [Google Scholar]
  17. Brzosko Z, Mierau SB, Paulsen O. Neuromodulation of spike-timing-dependent plasticity: past, present, and future. Neuron. 2019;103:563–581. doi: 10.1016/j.neuron.2019.05.041. [DOI] [PubMed] [Google Scholar]
  18. Bubic A, von Cramon DY, Schubotz RI. Prediction, cognition and the brain. Frontiers in Human Neuroscience. 2010;4:25. doi: 10.3389/fnhum.2010.00025. [DOI] [PMC free article] [PubMed] [Google Scholar]
  19. Burbank KS. Mirrored STDP implements autoencoder learning in a network of spiking neurons. PLOS Computational Biology. 2015;11:e1004566. doi: 10.1371/journal.pcbi.1004566. [DOI] [PMC free article] [PubMed] [Google Scholar]
  20. Corkin S. What’s new with the amnesic patient h.m.? Nature Reviews. Neuroscience. 2002;3:153–160. doi: 10.1038/nrn726. [DOI] [PubMed] [Google Scholar]
  21. Das R, Tenenbaum JB, Solar-Lezama A, Tavares Z. Autumnsynth: synthesis of reactive programs with structured latent state. Advances in Programming Languages and Neurosymbolic Systems Workshop; 2021. [Google Scholar]
  22. Dayan P. Improving generalization for temporal difference learning: the successor representation. Neural Computation. 1993;5:613–624. doi: 10.1162/neco.1993.5.4.613. [DOI] [Google Scholar]
  23. Dayan P, Abbott LF. Theoretical Neuroscience: Computational and Mathematical Modeling of Neural Systems. Massachusetts, United States: The MIT Press; 2001. [Google Scholar]
  24. de Cothi W, Barry C. Neurobiological successor features for spatial navigation. Hippocampus. 2020;30:1347–1355. doi: 10.1002/hipo.23246. [DOI] [PMC free article] [PubMed] [Google Scholar]
  25. Dolorfo CL, Amaral DG. Entorhinal cortex of the rat: topographic organization of the cells of origin of the perforant path projection to the dentate gyrus. The Journal of Comparative Neurology. 1998;398:25–48. [PubMed] [Google Scholar]
  26. Duvelle É, Grieves RM, Liu A, Jedidi-Ayoub S, Holeniewska J, Harris A, Nyberg N, Donnarumma F, Lefort JM, Jeffery KJ, Summerfield C, Pezzulo G, Spiers HJ. Hippocampal place cells encode global location but not connectivity in a complex space. Current Biology. 2021;31:1221–1233. doi: 10.1016/j.cub.2021.01.005. [DOI] [PMC free article] [PubMed] [Google Scholar]
  27. Dworkin JD, Linn KA, Teich EG, Zurn P, Shinohara RT, Bassett DS. The extent and drivers of gender imbalance in neuroscience reference Lists. Nature Neuroscience. 2020;23:918–926. doi: 10.1038/s41593-020-0658-y. [DOI] [PubMed] [Google Scholar]
  28. Evans T, Burgess N. Replay as Structural Inference in the Hippocampal-Entorhinal System. bioRxiv. 2020 doi: 10.1101/2020.08.07.241547. [DOI]
  29. Fang C. Sr-project. Software Heritage; 2022. swh:1:rev:43320e9b8c15927c67849f768d2a9bf17f68a0ea. https://archive.softwareheritage.org/swh:1:dir:7d0694e03e241f453e530eeb5dd850a85d929de6;origin=https://github.com/chingf/sr-project;visit=swh:1:snp:802a8c5651d1f4615916bbae5ac7d25d89e63748;anchor=swh:1:rev:43320e9b8c15927c67849f768d2a9bf17f68a0ea
  30. Fiete IR, Senn W, Wang CZH, Hahnloser RHR. Spike-time-dependent plasticity and heterosynaptic competition organize networks to produce long scale-free sequences of neural activity. Neuron. 2010;65:563–576. doi: 10.1016/j.neuron.2010.02.003. [DOI] [PubMed] [Google Scholar]
  31. Földiák P. Forming sparse representations by local anti-hebbian learning. Biol Cybern. 1990;64:165–170. doi: 10.1007/BF02331346. [DOI] [PubMed] [Google Scholar]
  32. Foster DJ, Wilson MA. Reverse replay of behavioural sequences in hippocampal place cells during the awake state. Nature. 2006;440:680–683. doi: 10.1038/nature04587. [DOI] [PubMed] [Google Scholar]
  33. Frank M. An Introduction to Model-Based Cognitive Neuroscience. New York, NY: Springer; 2015. [Google Scholar]
  34. Fusi S, Asaad WF, Miller EK, Wang XJ. A neural circuit model of flexible sensorimotor mapping: learning and forgetting on multiple timescales. Neuron. 2007;54:319–333. doi: 10.1016/j.neuron.2007.03.017. [DOI] [PMC free article] [PubMed] [Google Scholar]
  35. Gardner-Medwin AR. The recall of events through the learning of associations between their parts. Proceedings of the Royal Society of London. Series B, Biological Sciences. 1976;194:375–402. doi: 10.1098/rspb.1976.0084. [DOI] [PubMed] [Google Scholar]
  36. Garvert MM, Dolan RJ, Behrens TE. A map of abstract relational knowledge in the human hippocampal-entorhinal cortex. eLife. 2017;6:e17086. doi: 10.7554/eLife.17086. [DOI] [PMC free article] [PubMed] [Google Scholar]
  37. Geerts JP, Chersi F, Stachenfeld KL, Burgess N. A general model of hippocampal and dorsal striatal learning and decision making. PNAS. 2020;117:31427–31437. doi: 10.1073/pnas.2007981117. [DOI] [PMC free article] [PubMed] [Google Scholar]
  38. George D, Rikhye RV, Gothoskar N, Guntupalli JS, Dedieu A, Lázaro-Gredilla M. Clone-structured graph representations enable flexible learning and vicarious evaluation of cognitive maps. Nature Communications. 2021;12:1–17. doi: 10.1038/s41467-021-22559-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
  39. George TM, de Cothi W, Stachenfeld K, Barry C. Rapid learning of predictive maps with STDP and theta phase precession. eLife. 2023;12:e80663. doi: 10.7554/eLife.80663. [DOI] [PMC free article] [PubMed] [Google Scholar]
  40. Gershman SJ, Moore CD, Todd MT, Norman KA, Sederberg PB. The successor representation and temporal context. Neural Computation. 2012;24:1553–1568. doi: 10.1162/NECO_a_00282. [DOI] [PubMed] [Google Scholar]
  41. Goldman MS, Golowasch J, Marder E, Abbott LF. Global structure, robustness, and modulation of neuronal models. The Journal of Neuroscience. 2001;21:5229–5238. doi: 10.1523/JNEUROSCI.21-14-05229.2001. [DOI] [PMC free article] [PubMed] [Google Scholar]
  42. Goldman MS. Memory without feedback in a neural network. Neuron. 2009;61:621–634. doi: 10.1016/j.neuron.2008.12.012. [DOI] [PMC free article] [PubMed] [Google Scholar]
  43. Goodman ND, Tenenbaum JB, Contributors TP. Probabilistic Models of Cognition. 2016. [May 3, 2022]. http://probmods.org/
  44. Hardcastle K, Maheswaranathan N, Ganguli S, Giocomo LM. A multiplexed, heterogeneous, and adaptive code for navigation in medial entorhinal cortex. Neuron. 2017;94:375–387. doi: 10.1016/j.neuron.2017.03.025. [DOI] [PMC free article] [PubMed] [Google Scholar]
  45. Hasselmo ME. Neuromodulation: acetylcholine and memory consolidation. Trends in Cognitive Sciences. 1999;3:351–359. doi: 10.1016/S1364-6613(99)01365-0. [DOI] [PubMed] [Google Scholar]
  46. Hasselmo ME. The role of acetylcholine in learning and memory. Current Opinion in Neurobiology. 2006;16:710–715. doi: 10.1016/j.conb.2006.09.002. [DOI] [PMC free article] [PubMed] [Google Scholar]
  47. Heckman CJ, Mottram C, Quinlan K, Theiss R, Schuster J. Motoneuron excitability: the importance of neuromodulatory inputs. Clinical Neurophysiology. 2009;120:2040–2054. doi: 10.1016/j.clinph.2009.08.009. [DOI] [PMC free article] [PubMed] [Google Scholar]
  48. Henriksen EJ, Colgin LL, Barnes CA, Witter MP, Moser MB, Moser EI. Spatial representation along the proximodistal axis of CA1. Neuron. 2010;68:127–137. doi: 10.1016/j.neuron.2010.08.042. [DOI] [PMC free article] [PubMed] [Google Scholar]
  49. Hertäg L, Clopath C. Prediction-error neurons in circuits with multiple neuron types: formation, refinement, and functional implications. PNAS. 2022;119:e2115699119. doi: 10.1073/pnas.2115699119. [DOI] [PMC free article] [PubMed] [Google Scholar]
  50. Hopkins N. A study on the status of women faculty in science at MIT. AIP Conference Proceedings. American Institute of Physics. 2002;628:103–106. doi: 10.1063/1.1505288. [DOI] [Google Scholar]
  51. Hulme SR, Jones OD, Raymond CR, Sah P, Abraham WC. Mechanisms of heterosynaptic metaplasticity. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences. 2014;369:1633. doi: 10.1098/rstb.2013.0148. [DOI] [PMC free article] [PubMed] [Google Scholar]
  52. Jeffery KJ. How environmental movement constraints shape the neural code for space. Cognitive Processing. 2021;22:97–104. doi: 10.1007/s10339-021-01045-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  53. Jung MW, Wiener SI, McNaughton BL. Comparison of spatial firing characteristics of units in dorsal and ventral hippocampus of the rat. The Journal of Neuroscience. 1994;14:7347–7356. doi: 10.1523/JNEUROSCI.14-12-07347.1994. [DOI] [PMC free article] [PubMed] [Google Scholar]
  54. Kaplanis C, Shanahan M, Clopath C. Continual reinforcement learning with complex synapses. Proceedings of the 35th International Conference on Machine Learning; 2018. pp. 2497–2506. [Google Scholar]
  55. Karimi P, Golkar S, Friedrich J, Chklovskii D. Learning a biologically plausible linear controller for nonlinear systems. APS March Meeting 2022.2022. [Google Scholar]
  56. Kay K, Chung JE, Sosa M, Schor JS, Karlsson MP, Larkin MC, Liu DF, Frank LM. Constant sub-second cycling between representations of possible futures in the hippocampus. Cell. 2020;180:552–567. doi: 10.1016/j.cell.2020.01.014. [DOI] [PMC free article] [PubMed] [Google Scholar]
  57. Kjelstrup KB, Solstad T, Brun VH, Hafting T, Leutgeb S, Witter MP, Moser EI, Moser MB. Finite scale of spatial representation in the hippocampus. Science. 2008;321:140–143. doi: 10.1126/science.1157086. [DOI] [PubMed] [Google Scholar]
  58. Knight LS, Wenzel HJ, Schwartzkroin PA. Inhibition and interneuron distribution in the dentate gyrus of p35 knockout mice. Epilepsia. 2012;53 Suppl 1:161–170. doi: 10.1111/j.1528-1167.2012.03487.x. [DOI] [PubMed] [Google Scholar]
  59. Kulkarni TD, Saeedi A, Gautam S, Gershman SJ. Deep Successor Reinforcement Learning. arXiv. 2016 https://arxiv.org/abs/1606.02396
  60. Kullmann DM, Lamsa KP. Long-Term synaptic plasticity in hippocampal interneurons. Nature Reviews. Neuroscience. 2007;8:687–699. doi: 10.1038/nrn2207. [DOI] [PubMed] [Google Scholar]
  61. Kumar A, Bouchard K, Kitayama K, Jalali B. In: AI and Optical Data Sciences III. Jalali B, Kitayama K, editors. San Francisco, United States: SPIE; 2022. Non-normality in neural networks; pp. 204–227. [DOI] [Google Scholar]
  62. Lamsa KP, Heeroma JH, Somogyi P, Rusakov DA, Kullmann DM. Anti-hebbian long-term potentiation in the hippocampal feedback inhibitory circuit. Science. 2007;315:1262–1266. doi: 10.1126/science.1137450. [DOI] [PMC free article] [PubMed] [Google Scholar]
  63. Le Duigou C, Simonnet J, Teleñczuk MT, Fricker D, Miles R. Recurrent synapses and circuits in the CA3 region of the hippocampus: an associative network. Frontiers in Cellular Neuroscience. 2014;7:262. doi: 10.3389/fncel.2013.00262. [DOI] [PMC free article] [PubMed] [Google Scholar]
  64. Lee H. Toward the biological model of the hippocampus as the successor representation agent. Bio Systems. 2022;213:104612. doi: 10.1016/j.biosystems.2022.104612. [DOI] [PubMed] [Google Scholar]
  65. Lisman J, Redish AD. Prediction, sequences and the hippocampus. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences. 2009;364:1193–1201. doi: 10.1098/rstb.2008.0316. [DOI] [PMC free article] [PubMed] [Google Scholar]
  66. Litwin-Kumar A, Doiron B. Formation and maintenance of neuronal assemblies through synaptic plasticity. Nature Communications. 2014;5:1–12. doi: 10.1038/ncomms6319. [DOI] [PubMed] [Google Scholar]
  67. Liu X, Ramirez S, Pang PT, Puryear CB, Govindarajan A, Deisseroth K, Tonegawa S. Optogenetic stimulation of a hippocampal engram activates fear memory recall. Nature. 2012;484:381–385. doi: 10.1038/nature11028. [DOI] [PMC free article] [PubMed] [Google Scholar]
  68. Lothman EW, Bertram EH, Stringer JL. Functional anatomy of hippocampal seizures. Progress in Neurobiology. 1991;37:1–82. doi: 10.1016/0301-0082(91)90011-o. [DOI] [PubMed] [Google Scholar]
  69. Love BC. Levels of biological plausibility. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences. 2021;376:1815. doi: 10.1098/rstb.2019.0632. [DOI] [PMC free article] [PubMed] [Google Scholar]
  70. Mackevicius EL, Happ MTL, Fee MS. An avian cortical circuit for chunking tutor song syllables into simple vocal-motor units. Nature Communications. 2020;11:1–16. doi: 10.1038/s41467-020-18732-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  71. Marder E, Goaillard JM. Variability, compensation and homeostasis in neuron and network function. Nature Reviews. Neuroscience. 2006;7:563–574. doi: 10.1038/nrn1949. [DOI] [PubMed] [Google Scholar]
  72. Marder E, Taylor AL. Multiple models to capture the variability in biological neurons and networks. Nature Neuroscience. 2011;14:133–138. doi: 10.1038/nn.2735. [DOI] [PMC free article] [PubMed] [Google Scholar]
  73. Markus EJ, Qin YL, Leonard B, Skaggs WE, McNaughton BL, Barnes CA. Interactions between location and task affect the spatial and directional firing of hippocampal neurons. The Journal of Neuroscience. 1995;15:7079–7094. doi: 10.1523/JNEUROSCI.15-11-07079.1995. [DOI] [PMC free article] [PubMed] [Google Scholar]
  74. Marr D, Poggio T. From understanding computation to understanding neural circuitry. MIT Artifical Intelligence Laboratory; 1976. https://dspace.mit.edu/handle/1721.1/5782?show=full [Google Scholar]
  75. Marr D, Willshaw D, McNaughton B. In: From the Retina to the Neocortex. Vaina L, editor. Springer; 1991. Simple memory: a theory for archicortex; pp. 59–128. [DOI] [Google Scholar]
  76. Mattar MG, Daw ND. Prioritized memory access explains planning and hippocampal replay. Nature Neuroscience. 2018;21:1609–1617. doi: 10.1038/s41593-018-0232-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
  77. Mattar MG, Lengyel M. Planning in the brain. Neuron. 2022;110:914–934. doi: 10.1016/j.neuron.2021.12.018. [DOI] [PubMed] [Google Scholar]
  78. McKenzie S, Frank AJ, Kinsky NR, Porter B, Rivière PD, Eichenbaum H. Hippocampal representation of related and opposing memories develop within distinct, hierarchically organized neural schemas. Neuron. 2014;83:202–215. doi: 10.1016/j.neuron.2014.05.019. [DOI] [PMC free article] [PubMed] [Google Scholar]
  79. McNaughton BL, Morris RGM. Hippocampal synaptic enhancement and information storage within a distributed memory system. Trends in Neurosciences. 1987;10:408–415. doi: 10.1016/0166-2236(87)90011-7. [DOI] [Google Scholar]
  80. Mehta MR, Barnes CA, McNaughton BL. Experience-dependent, asymmetric expansion of hippocampal place fields. PNAS. 1997;94:8918–8921. doi: 10.1073/pnas.94.16.8918. [DOI] [PMC free article] [PubMed] [Google Scholar]
  81. Mehta MR, Quirk MC, Wilson MA. Experience-dependent asymmetric shape of hippocampal receptive fields. Neuron. 2000;25:707–715. doi: 10.1016/s0896-6273(00)81072-7. [DOI] [PubMed] [Google Scholar]
  82. Miles R, Wong RK. Excitatory synaptic interactions between CA3 neurones in the guinea-pig hippocampus. The Journal of Physiology. 1986;373:397–418. doi: 10.1113/jphysiol.1986.sp016055. [DOI] [PMC free article] [PubMed] [Google Scholar]
  83. Momennejad I, Russek EM, Cheong JH, Botvinick MM, Daw ND, Gershman SJ. The successor representation in human reinforcement learning. Nature Human Behaviour. 2017;1:680–692. doi: 10.1038/s41562-017-0180-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  84. Momennejad I, Howard MW. Predicting the Future with Multi-Scale Successor Representations. bioRxiv. 2018 doi: 10.1101/449470. [DOI]
  85. Momennejad I. Learning structures: predictive representations, replay, and generalization. Current Opinion in Behavioral Sciences. 2020;32:155–166. doi: 10.1016/j.cobeha.2020.02.017. [DOI] [PMC free article] [PubMed] [Google Scholar]
  86. Monaco JD, Rao G, Roth ED, Knierim JJ. Attentive scanning behavior drives one-trial potentiation of hippocampal place fields. Nature Neuroscience. 2014;17:725–731. doi: 10.1038/nn.3687. [DOI] [PMC free article] [PubMed] [Google Scholar]
  87. Muller RU, Kubie JL. The firing of hippocampal place cells predicts the future position of freely moving rats. The Journal of Neuroscience. 1989;9:4101–4110. doi: 10.1523/JNEUROSCI.09-12-04101.1989. [DOI] [PMC free article] [PubMed] [Google Scholar]
  88. Murphy BK, Miller KD. Balanced amplification: a new mechanism of selective amplification of neural activity patterns. Neuron. 2009;61:635–648. doi: 10.1016/j.neuron.2009.02.005. [DOI] [PMC free article] [PubMed] [Google Scholar]
  89. Nadim F, Bucher D. Neuromodulation of neurons and synapses. Current Opinion in Neurobiology. 2014;29:48–56. doi: 10.1016/j.conb.2014.05.003. [DOI] [PMC free article] [PubMed] [Google Scholar]
  90. National Academies of Sciences . Sexual Harassment of Women: Climate, Culture, and Consequences in Academic Sciences, Engineering, and Medicine. Washington, United States: National Academies of Sciences; 2018. [PubMed] [Google Scholar]
  91. Oja E. A simplified neuron model as a principal component analyzer. Journal of Mathematical Biology. 1982;15:267–273. doi: 10.1007/BF00275687. [DOI] [PubMed] [Google Scholar]
  92. Olshausen BA, Field DJ. Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature. 1996;381:607–609. doi: 10.1038/381607a0. [DOI] [PubMed] [Google Scholar]
  93. Ostojic S, Fusi S. Synaptic encoding of temporal contiguity. Frontiers in Computational Neuroscience. 2013;7:32. doi: 10.3389/fncom.2013.00032. [DOI] [PMC free article] [PubMed] [Google Scholar]
  94. Pawlak V, Wickens JR, Kirkwood A, Kerr JND. Timing is not everything: neuromodulation opens the STDP gate. Frontiers in Synaptic Neuroscience. 2010;2:146. doi: 10.3389/fnsyn.2010.00146. [DOI] [PMC free article] [PubMed] [Google Scholar]
  95. Payne HL, Lynch GF, Aronov D. Neural representations of space in the hippocampus of a food-caching bird. Science. 2021;373:343–348. doi: 10.1126/science.abg2009. [DOI] [PMC free article] [PubMed] [Google Scholar]
  96. Pehlevan C, Mohan S, Chklovskii DB. Blind nonnegative source separation using biological neural networks. Neural Computation. 2017;29:2925–2954. doi: 10.1162/neco_a_01007. [DOI] [PubMed] [Google Scholar]
  97. Pehlevan C, Sengupta AM, Chklovskii DB. Why do similarity matching objectives lead to hebbian/anti-hebbian networks? Neural Computation. 2018;30:84–124. doi: 10.1162/neco_a_01018. [DOI] [PubMed] [Google Scholar]
  98. Penfield W, Milner B. Memory deficit produced by bilateral lesions in the hippocampal zone. A.M.A. Archives of Neurology and Psychiatry. 1958;79:475–497. doi: 10.1001/archneurpsyc.1958.02340050003001. [DOI] [PubMed] [Google Scholar]
  99. Pfeiffer BE, Foster DJ. Hippocampal place-cell sequences depict future paths to remembered goals. Nature. 2013;497:74–79. doi: 10.1038/nature12112. [DOI] [PMC free article] [PubMed] [Google Scholar]
  100. Poppenk J, Evensmoen HR, Moscovitch M, Nadel L. Long-axis specialization of the human hippocampus. Trends in Cognitive Sciences. 2013;17:230–240. doi: 10.1016/j.tics.2013.03.005. [DOI] [PubMed] [Google Scholar]
  101. Prinz AA, Bucher D, Marder E. Similar network activity from disparate circuit parameters. Nature Neuroscience. 2004;7:1345–1352. doi: 10.1038/nn1352. [DOI] [PubMed] [Google Scholar]
  102. Ramón y Cajal S. Textura Del Sistema Nervioso Del Hombre y de Los Vertebrados. Madrid: Imprenta y Librería de Nicolás Moya; 1904. [Google Scholar]
  103. Recanatesi S, Farrell M, Lajoie G, Deneve S, Rigotti M, Shea-Brown E. Predictive learning as a network mechanism for extracting low-dimensional latent space representations. Nature Communications. 2021;12:1–13. doi: 10.1038/s41467-021-21696-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
  104. Ribas-Fernandes JJF, Shahnazian D, Holroyd CB, Botvinick MM. Subgoal- and goal-related reward prediction errors in medial prefrontal cortex. Journal of Cognitive Neuroscience. 2019;31:8–23. doi: 10.1162/jocn_a_01341. [DOI] [PubMed] [Google Scholar]
  105. Russek EM, Momennejad I, Botvinick MM, Gershman SJ, Daw ND. Predictive representations can link model-based reinforcement learning to model-free mechanisms. bioRxiv. 2017 doi: 10.1101/083857. [DOI] [PMC free article] [PubMed]
  106. Sadeh S, Clopath C. Excitatory-Inhibitory balance modulates the formation and dynamics of neuronal assemblies in cortical networks. Science Advances. 2021;7:eabg8411. doi: 10.1126/sciadv.abg8411. [DOI] [PMC free article] [PubMed] [Google Scholar]
  107. Schaffer K. Beitrag Zur histologie Der ammonshornformation. Archiv Für Mikroskopische Anatomie. 1892;39:611–632. doi: 10.1007/BF02961541. [DOI] [Google Scholar]
  108. Schapiro AC, Turk-Browne NB, Norman KA, Botvinick MM. Statistical learning of temporal community structure in the hippocampus. Hippocampus. 2016;26:3–8. doi: 10.1002/hipo.22523. [DOI] [PMC free article] [PubMed] [Google Scholar]
  109. Scoville WB, Milner B. Loss of recent memory after bilateral hippocampal lesions. Journal of Neurology, Neurosurgery, and Psychiatry. 1957;20:11–21. doi: 10.1136/jnnp.20.1.11. [DOI] [PMC free article] [PubMed] [Google Scholar]
  110. Sheffield MEJ, Dombeck DA. Calcium transient prevalence across the dendritic arbour predicts place field properties. Nature. 2015;517:200–204. doi: 10.1038/nature13871. [DOI] [PMC free article] [PubMed] [Google Scholar]
  111. Singer AC, Carr MF, Karlsson MP, Frank LM. Hippocampal SWR activity predicts correct decisions during the initial learning of an alternation task. Neuron. 2013;77:1163–1173. doi: 10.1016/j.neuron.2013.01.027. [DOI] [PMC free article] [PubMed] [Google Scholar]
  112. Skaggs WE, McNaughton BL. Replay of neuronal firing sequences in rat hippocampus during sleep following spatial experience. Science. 1996;271:1870–1873. doi: 10.1126/science.271.5257.1870. [DOI] [PubMed] [Google Scholar]
  113. Sompolinsky H, Crisanti A, Sommers HJ. Chaos in random neural networks. Physical Review Letters. 1988;61:259–262. doi: 10.1103/PhysRevLett.61.259. [DOI] [PubMed] [Google Scholar]
  114. Sparks FT, Liao Z, Li W, Grosmark A, Soltesz I, Losonczy A. Hippocampal adult-born granule cells drive network activity in a mouse model of chronic temporal lobe epilepsy. Nature Communications. 2020;11:1–13. doi: 10.1038/s41467-020-19969-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  115. Stachenfeld KL, Botvinick MM, Gershman SJ. The hippocampus as a predictive MAP. Nature Neuroscience. 2017;20:1643–1653. doi: 10.1038/nn.4650. [DOI] [PubMed] [Google Scholar]
  116. Strange BA, Witter MP, Lein ES, Moser EI. Functional organization of the hippocampal longitudinal axis. Nature Reviews. Neuroscience. 2014;15:655–669. doi: 10.1038/nrn3785. [DOI] [PubMed] [Google Scholar]
  117. Sutton RS, Barto AG. Reinforcement Learning: An Introduction. Stanford University; 2018. [Google Scholar]
  118. Thom M. Review: hippocampal sclerosis in epilepsy: a neuropathology review. Neuropathology and Applied Neurobiology. 2014;40:520–543. doi: 10.1111/nan.12150. [DOI] [PMC free article] [PubMed] [Google Scholar]
  119. Tosches MA, Yamawaki TM, Naumann RK, Jacobi AA, Tushev G, Laurent G. Evolution of pallium, hippocampus, and cortical cell types revealed by single-cell transcriptomics in reptiles. Science. 2018;360:881–888. doi: 10.1126/science.aar4237. [DOI] [PubMed] [Google Scholar]
  120. Tyulmankov D, Yang GR, Abbott LF. Meta-learning synaptic plasticity and memory addressing for continual familiarity detection. Neuron. 2022;110:544–557. doi: 10.1016/j.neuron.2021.11.009. [DOI] [PMC free article] [PubMed] [Google Scholar]
  121. Vértes E, Sahani M. A Neurally Plausible Model Learns Successor Representations in Partially Observable Environments. arXiv. 2019 https://arxiv.org/abs/1906.09480
  122. Wayne G, Hung CC, Amos D, Mirza M, Ahuja A, Grabska-Barwinska A, Rae J, Mirowski P, Leibo JZ, Santoro A. Unsupervised Predictive Memory in a Goal-Directed Agent. arXiv. 2018 https://arxiv.org/abs/1803.10760
  123. Whittington JCR, Muller TH, Mark S, Chen G, Barry C, Burgess N, Behrens TEJ. The tolman-eichenbaum machine: unifying space and relational memory through generalization in the hippocampal formation. Cell. 2020;183:1249–1263. doi: 10.1016/j.cell.2020.10.024. [DOI] [PMC free article] [PubMed] [Google Scholar]
  124. Widloski J, Foster DJ. Flexible rerouting of hippocampal replay sequences around changing barriers in the absence of global place field remapping. Neuron. 2022;110:1547–1558. doi: 10.1016/j.neuron.2022.02.002. [DOI] [PMC free article] [PubMed] [Google Scholar]
  125. Zeldenrust F, Gutkin B, Denéve S. Efficient and robust coding in heterogeneous recurrent networks. PLOS Computational Biology. 2021;17:e1008673. doi: 10.1371/journal.pcbi.1008673. [DOI] [PMC free article] [PubMed] [Google Scholar]
  126. Zhang T, Rosenberg M, Perona P, Meister M. Endotaxis: A Universal Algorithm for Mapping, Goal-Learning, and Navigation. bioRxiv. 2021 doi: 10.1101/2021.09.24.461751. [DOI] [PMC free article] [PubMed]
  127. Zurn P, Bassett DS, Rust NC. The citation diversity statement: a practice of transparency, a way of life. Trends in Cognitive Sciences. 2020;24:669–672. doi: 10.1016/j.tics.2020.06.009. [DOI] [PubMed] [Google Scholar]

Editor's evaluation

Srdjan Ostojic

This important work provides compelling evidence for the biological plausibility of the Successor Representation (SR) algorithm. The SR is a leading computational hypothesis for testing whether the neural networks in specific brain areas perform predictive computations. Establishing a biologically plausible learning rule by which SR representations can form is of high significance in the field of neuroscience.

Decision letter

Editor: Srdjan Ostojic
Reviewed by: Stefano Recanatesi, Arthur Juliani

Our editorial process produces two outputs: (i) public reviews designed to be posted alongside the preprint for the benefit of readers; (ii) feedback on the manuscript for the authors, including requests for revisions, shown below. We also include an acceptance summary that explains what the editors found interesting or important about the work.

Decision letter after peer review:

Thank you for submitting your article "Neural learning rules for generating flexible predictions and computing the successor representation" for consideration by eLife. Your article has been reviewed by 3 peer reviewers, and the evaluation has been overseen by a Reviewing Editor and Timothy Behrens as the Senior Editor. The following individuals involved in review of your submission have agreed to reveal their identity: Stefano Recanatesi (Reviewer #1); Arthur Juliani (Reviewer #2); Srdjan Ostojic (Reviewer #3).

The reviewers have discussed their reviews with one another, and the Reviewing Editor has drafted this summary to help you prepare a revised submission.

Main Comments:

1) The form of the plasticity rule in Equation 4 is motivated by the requirement that synaptic weights encode a properly normalised transition probability matrix (lines 92-96). But why is the normalisation important? What would change if synaptic weights were simply monotonic functions of transition probabilities, without normalisation? Presumably that would allow for a broader range of plasticity rules.

2) The results of the paper rely strongly on the normalizing term in Equation 4. One of the reviewers suggests potentially moving part of the discussion of this term upfront, and enlarging the paragraph that discusses the biological plausibility of this specific term, clearly laying out for the non-expert reader why it is biologically plausible compared to other learning rules. Also consider moving up the material required to establish the novelty of this term: a targeted review of the relevant literature (current lines 358-366 and 413-433). This would allow the reader to understand immediately the significance and relative novelty of the term. For example, this reviewer personally wondered while reading the paper how different this term is from the basic idea of Fiete et al., Neuron 2010 (DOI 10.1016/j.neuron.2010.02.003).

3) Related to the first point, the text insists on the fact that γ is not encoded in the synaptic weights (eg line 89). Again, it is not entirely clear why this is important and justified, since γ is an ad-hoc factor in Equation 2. Presumably the proper normalisation of γ relies on the normalisation of J discussed above? It seems that this constraint could be relaxed.

4) As a consequence of the body of the text being devoted to the analysis of the design choices behind the proposed model, a relatively smaller portion of the work involves direct comparisons with neural data. In these comparisons, while it is apparent that there is a reasonable match between the proposed model and the empirical data, it is difficult to interpret these results. This is because it is unclear what should be expected of a good or bad model given the metrics being analyzed (TD error and KL divergence), and reasonable baselines to compare against are not presented outside of the traditional TD algorithm, which is shown to be comparable to the proposed RNN-based method in a number of cases.

5) It would be useful to have a "limitations" paragraph in the discussion clearly outlining what this learning rule could not achieve. For example, Stachenfeld et al., Nat. Neuro. have many examples where the SR is deployed. Would the learning rule suggested by the authors always work across the board, or are there limitations that could be highlighted where the suggested framework would not work well? No need to perform more experiments/simulations, but simply to share insight regarding the results and the capability of the proposed learning rule.

Other comments/suggestions:

– Page 1: The introduction motivates this work with a discussion of hippocampal memory (storage and retrieval), but the work focuses on the SR which is inherently prospective. The first paragraph of the text could be revised to better make this connection beyond simply stating that the hippocampus is involved in both memory and future prediction.

– Page 2: The end of the introduction would be stronger if the motivation for an RNNs usage was tied to the literature on the known recurrent dynamics of the hippocampus. See for example: https://www.frontiersin.org/articles/10.3389/fncel.2013.00262/full

– Page 6: It is not clear the extent to which the FF-TD model differs from a canonical tabular SR algorithm or linear SF algorithm. My understanding is that it is the same, but the presentation in Figure 1i for example makes this somewhat unclear.

– Pages 6 – 11: it may be of benefit to more strongly support the various modifications to the model with connections to known or hypothesized hippocampal neural dynamics.

– Page 14: It states that "We discretize the arena into a set of states and encode each state as a randomly drawn feature ϕ." If I understand correctly, these features are not completely random, and instead follow the distribution described in Section 2.5. As it currently reads, it seems that these features might be drawn from a uniform random distribution, which would be misleading.

– Page 14: In Section 2.6 there is an assumption that a certain level of TD error corresponds to good performance. It is not clear what should objectively be considered a reasonable TD error. This is especially difficult to interpret in the case where both the RNN-S and FF-TD models display comparable performance. Is there perhaps some other baseline you would expect to perform considerably worse?

– Page 17: In Figure 4 it is somewhat confusing that the KL divergence (subplots G and I) has reversed shading (dark for low values) compared to the other subplots. It would be easier to interpret these graphs if their color coding was more consistent.

– Page 18: Similar to the difficulty of interpreting the TD error results, it is not clear what a "good" or "bad" KL divergence from the neural data would be. Any hypotheses on how to ground the numbers provided here would help to improve the quality of the results.

– Page 20: It is mentioned that the predictive timescale may be a separate gain term which the hippocampus takes as input, but there is evidence that different regions of the hippocampus seem to operate on different timescales. See for example: https://www.jneurosci.org/content/42/2/299.abstract. Is there a way to reconcile these ideas?

– Page 23: Section 4.5 describes the procedure for learning the parameters of the weight update rule as CMA-ES. Mentioning the fact that an evolutionary algorithm is used for learning these weights would help to make Section 2.3 more clear.

– Figures 5D-E and similar supplementary figures: if there is a parameter region that is unexplored then the color used for such region should be outside of the colormap. One of the reviewers suggests replacing white with gray for such region in these figures.

– Line 173: the text makes the distinction between an "SR-like" representation and an "exact SR". What is the difference? Why is it important to have an exact encoding of the SR in the neural activity, rather than eg a monotonic encoding of the SR?

– The RNN described in Equation 2 is not of the standard form (the non-linearity is applied after the connectivity matrix, ie f(J x) instead of Jf(x)). Is this detail important? If not, why not use the more standard form to avoid confusion?

– A line of work in the Fusi lab has examined plasticity rules that lead to the encoding of transition probabilities (eg Fusi et al., Neuron 2007). In particular, a paper by the reviewing editor (Ostojic and Fusi Front Comp Neuro 2013) examined the encoding of transition probabilities using plasticity rules that look similar to this manuscript. This is mentioned just for information, the authors should decide if those papers are relevant.

– Figures 5D-E and similar supplementary figures: if there is a parameter region that is unexplored then the color used for such region should be outside of the colormap. One of the reviewers suggests replacing white with gray for such region in these figures.

eLife. 2023 Mar 16;12:e80680. doi: 10.7554/eLife.80680.sa2

Author response


Main Comments:

1) The form of the plasticity rule in Equation 4 is motivated by the requirement that synaptic weights encode a properly normalised transition probability matrix (lines 92-96). But why is the normalisation important? What would change if synaptic weights were simply monotonic functions of transition probabilities, without normalisation? Presumably that would allow for a broader range of plasticity rules.

The reviewer makes an interesting point that a range of possible rules may yield useful representations even if they do not learn the transition probability matrix exactly. We tested these ideas (see below) and found that normalization is generally important for maintaining stable dynamics in the recurrent network. Many forms of normalization can learn predictive representations similar to the SR, as long as the normalization is performed across rows of the weight matrix independently. We have added a few sentences of text (lines 107-116) and a supplementary figure summarizing and showing these results. The details of our additional analyses and the text added to the manuscript are given below.

We constructed representations using different normalization procedures. For these tests, we simulated a random walk on a circular track, as in Appendix 7-figure 1B, for 10 minutes of simulation time.

If the synaptic weight matrix estimates the normalized transition probability matrix (as in equation 4), the resulting SR matrix over the course of the walk is shown in Appendix 7-figure 1A.

As an initial test of the role of normalization, we did the same simulation, removing normalization. Thus, $J$ will be equal to the count of observed transitions, i.e. $J_{ij}$ is equal to the number of experienced transitions from state $j$ to state $i$. We refer to this as a count matrix. Note that, without normalization, the values in the count matrix increase steadily over the course of the simulation. This quickly results in unstable dynamics due to the weights of the matrix being too large. We can quantify this instability by plotting the maximum eigenvalue of the weight matrix, where an eigenvalue ≥ 1 indicates instability (Sompolinsky et al., 1988; Appendix 7-figure 1B).
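To make this effect concrete, here is a minimal sketch in Python (the ring size, walk length, and seed are illustrative choices of ours, not parameters from the manuscript):

    import numpy as np

    # Unbiased random walk on a ring of n states; J_count[i, j] counts
    # observed transitions from state j to state i, with no normalization.
    rng = np.random.default_rng(0)
    n = 20
    J_count = np.zeros((n, n))
    state = 0
    for step in range(5000):
        nxt = (state + rng.choice([-1, 1])) % n
        J_count[nxt, state] += 1.0
        state = nxt
        if (step + 1) % 1000 == 0:
            # Spectral radius of the weight matrix; values >= 1 imply
            # unstable linear recurrent dynamics.
            rho = np.max(np.abs(np.linalg.eigvals(J_count)))
            print(f"step {step + 1}: spectral radius = {rho:.1f}")

The printed spectral radius grows steadily with the number of observed transitions, crossing 1 almost immediately.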

A simple way to prevent instability is to scale the weights of the matrix by a constant factor such that the dynamics are in the stable regime (specifically, ensuring that the maximum eigenvalue of the synaptic weight matrix is below 1). A careful choice of a scaling value can ensure network activity remains stable within this walk, although this is not a generalizable solution as different scaling values will be needed for different random walks and tasks.

Informed by Appendix 7-figure 1C, we chose a scaling factor of 1.75 and tested what the neural representations look like throughout the walk.

Compared to the ground truth SR, the representations above are not very predictive (the off-diagonal elements of the SR are not captured well), and the activity strength seems to be unevenly distributed across states. For instance, there is more activity at state 8 than at other states. It is likely that depressing all synapses by the same factor (similar in flavor to Fiete et al., 2010) does not correct for differences in occupancies. In other words, states that happen to be visited more by chance are likely to dominate the dynamics, even if the transition statistics are identical across all states (Appendix 7-figure 1D).

As a further test of different ways of parameterizing the synaptic weight matrix, we can instead scale the count matrix in a row-by-row fashion. Specifically, we divide each row $i$ of the count matrix by the maximum of row $i$ (and some global scaling factor to ensure stability). Note that this is in contrast to $T$, where each row is divided by its sum. The resulting steady-state activity matrices are shown in Appendix 7-figure 1E.
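The comparison can be sketched in a few lines (the outgoing-weights-in-columns convention, scaling constant, and discount here are our illustrative assumptions):

    import numpy as np

    # Build a transition-count matrix from a random walk on a ring.
    rng = np.random.default_rng(0)
    n = 20
    J_count = np.zeros((n, n))
    state = 0
    for _ in range(5000):
        nxt = (state + rng.choice([-1, 1])) % n
        J_count[nxt, state] += 1.0
        state = nxt

    # Sum-normalization recovers a proper transition estimate T_hat;
    # max-normalization (plus a global scaling factor) is the alternative.
    T_hat = J_count / J_count.sum(axis=0, keepdims=True)
    J_max = J_count / (2.0 * J_count.max(axis=0, keepdims=True))

    gamma = 0.6
    M_sum = np.linalg.inv(np.eye(n) - gamma * T_hat)
    M_max = np.linalg.inv(np.eye(n) - gamma * J_max)
    print(np.abs(M_sum - M_max).mean())  # nearly coincide for this symmetric walk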

This is closer to the SR matrix expected if the synaptic weight matrix estimates $T$. Early in learning, there is a slight unevenness along the diagonal of the matrix. However, given enough observed transitions, the predictive representation looks reasonable and quite similar to the SR matrix.

Overall, we see that the intuition of the reviewer is correct: there are likely other formulations of the synaptic weight matrix that can give a representation similar to the SR. The important ingredient for this to happen appears to be some form of row-dependent normalization—that is, neurons should have their synaptic weights normalized independently of each other. This ensures that occupancy is not conflated with transition statistics. We added the figures in Appendix 7. We also added the following passage to the introduction of the plasticity rule in Results section 2.2 (and use the suggested reference to Fiete 2010):

“The second term in equation 4 is a form of synaptic depotentiation. Depotentiation has been hypothesized to be broadly useful for stabilizing patterns and sequence learning [37, 49], and similar inhibitory effects are known to be elements of hippocampal learning [50, 51]. In our model, the depotentiation term in equation 4 imposes local anti-Hebbian learning at each neuron—that is, each column of $J$ is normalized independently. This normalizes the observed transitions from each state by the number of visits to that state, such that transition statistics are correctly captured. We note, however, that other ways of column-normalizing the synaptic weight matrix may give similar representations (Figure S1).”

2) The results of the paper rely strongly on the normalizing term in Equation 4. One of the reviewers suggests potentially moving part of the discussion of this term upfront, and enlarging the paragraph that discusses the biological plausibility of this specific term, clearly laying out for the non-expert reader why it is biologically plausible compared to other learning rules. Also consider moving up the material required to establish the novelty of this term: a targeted review of the relevant literature (current lines 358-366 and 413-433). This would allow the reader to understand immediately the significance and relative novelty of the term. For example, this reviewer personally wondered while reading the paper how different this term is from the basic idea of Fiete et al., Neuron 2010 (DOI 10.1016/j.neuron.2010.02.003).

The reviewer points out that more context and clarity around the plasticity rule would be useful, particularly since an understanding of the plasticity rule is integral to the paper.

We would like to clarify that, although the RNN-S is more biologically plausible than the FF-TD learning rule, there is likely additional biological complexity/realism that can be added to the RNN-S learning rule. We wanted to find the simplest rule that could capture the essence of the SR, which is why we focused on the particular form of the learning rule we used.

Aside from plausibility, a key aspect of the RNN-S normalizing term (as discussed in Main Comment 1) is that it independently normalizes each column of the synaptic weight matrix. This is in contrast, say, to the Fiete et al., 2010 paper (which has a global depressive term) and other similar plasticity rules. We directly tested the effect of different normalizations on the RNN representations (see response to Main Comment 1), and find that column-specific normalization is important for capturing transition statistics.

To make these subtle points more clear to the reader, we added additional sentences about biological realism and other forms of learning rules in the section where the learning rule is introduced:

“Crucially, the update rule (equation 4) uses information local to each neuron (Figure 1h), an important aspect of biologically plausible learning rules.

The second term in equation 4 is a form of synaptic depotentiation. Depotentiation has been hypothesized to be broadly useful for stabilizing patterns and sequence learning [37, 49], and similar inhibitory effects are known to be elements of hippocampal learning [50, 51]. In our model, the depotentiation term in equation 4 imposes local anti-Hebbian learning at each neuron—that is, each column of $J$ is normalized independently. This normalizes the observed transitions from each state by the number of visits to that state, such that transition statistics are correctly captured. We note, however, that other ways of column-normalizing the synaptic weight matrix may give similar representations (Figure S1).”

3) Related to the first point, the text insists on the fact that γ is not encoded in the synaptic weights (eg line 89). Again, it is not entirely clear why this is important and justified, since γ is an ad-hoc factor in Equation 2. Presumably the proper normalisation of γ relies on the normalisation of J discussed above? It seems that this constraint could be relaxed.

The reviewer is correct that factorizing γ as a separate factor from the synaptic weights (J) is a notational choice. We include γ as a factor distinct from the synaptic strengths for several reasons. The first is consistency with previous literature. In the SR literature, γ is factorized out of normalized transition matrices. Similarly, in the RNN literature, it is typical to analyze a global gain factor (g), which determines the operating regime of the network (Sompolinsky et al., 1988). Our second reason is mechanistic. We interpret γ as a measure of the gain of the network units, that is, as a physiological property of the neurons. The synaptic matrix J, on the other hand, measures the strengths of synapses. Keeping them separate allows for more flexibility, which leads to the third reason. Treating γ as a separate factor allows the network to retrieve successor representations of different predictive strengths. Importantly, this dynamic predictive ability is achieved without changing any synaptic weights and without additional learning. In other words, a separate γ allows us to decouple the learning and retrieval processes, providing more flexibility in using the SR.

To make this point more clearly in the text, we added the following lines to explain this rationale:

“Here, the factor $\gamma$ represents the gain of the neurons in the network, which is factored out of the synaptic strengths characterized by J. Thus, $\gamma$ is an independently adjustable factor that can flexibly control the strength of the recurrent dynamics (see [46]). A benefit of this flexibility is that the system can retrieve successor representations of varying predictive strengths by modulating the gain factor $\gamma$. In this way, the predictive horizon can be dynamically controlled without any additional learning required.”
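As an illustration of this point, a short sketch (using a ground-truth ring transition matrix of our own construction) shows how the same weights yield predictive maps of different horizons as the gain is varied:

    import numpy as np

    # Ground-truth one-step transition matrix for an unbiased walk on a
    # ring of n states (columns sum to 1).
    n = 20
    T = np.zeros((n, n))
    for j in range(n):
        T[(j - 1) % n, j] = 0.5
        T[(j + 1) % n, j] = 0.5

    # With the weights fixed, changing only the retrieval gain gamma
    # changes the predictive horizon of the steady-state SR.
    for gamma in (0.3, 0.6, 0.9):
        M = np.linalg.inv(np.eye(n) - gamma * T)
        # Total discounted occupancy per starting state is 1 / (1 - gamma).
        print(f"gamma = {gamma}: predictive mass = {M.sum(axis=0)[0]:.2f}")

No relearning occurs between the three retrievals; only the gain changes.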

4) As a consequence of the body of the text being devoted to the analysis of the design choices behind the proposed model, a relatively smaller portion of the work involves direct comparisons with neural data. In these comparisons, while it is apparent that there is a reasonable match between the proposed model and the empirical data, it is difficult to interpret these results. This is because it is unclear what should be expected of a good or bad model given the metrics being analyzed (TD error and KL divergence), and reasonable baselines to compare against are not presented outside of the traditional TD algorithm, which is shown to be comparable to the proposed RNN-based method in a number of cases.

The reviewer suggests conducting control analyses to help with interpretability of the TD error and KL divergence results. Specifically, they suggest comparing the performance of the RNN and FF network to good and bad models.

As an example of a “bad” model, we calculate the TD error and KL divergence of a feedforward network with weights randomly drawn from the distribution of weights of the FF-TD model at the end of learning. We call this the Shuffle model; it is representative of a model without learned structure but with a similar magnitude of weights as the FF-TD model. As expected, the Shuffle model has much higher TD error than the RNN or FF network.

As an example of a “good” model, we calculate the KL divergence between randomly split halves of the dataset from Payne et al. (2021). Specifically, we compare the place field statistics of a random half of the neurons from Payne et al. (2021) with another random half. We calculate the KL divergence between the distributions calculated from each random half. This is repeated 500 times. We call this “Data” in the plot, and it is representative of a lower bound on KL divergence. Intuitively, it should not be possible to fit the data of Payne et al. as well as the dataset itself can. We compare this KL divergence to the KL divergence between each model and the neural data.
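For readers who wish to reproduce the flavor of this baseline, here is a minimal sketch of a split-half noise-floor computation (the stand-in statistic, binning, and repeat count are our illustrative assumptions; this is not the Payne et al. dataset):

    import numpy as np

    rng = np.random.default_rng(4)
    stat = rng.gamma(shape=4.0, scale=3.0, size=200)  # stand-in place-field statistic

    def kl_bits(p, q, eps=1e-9):
        # KL divergence in bits between two (smoothed, normalized) histograms.
        p = p + eps; q = q + eps
        p = p / p.sum(); q = q / q.sum()
        return float(np.sum(p * np.log2(p / q)))

    bins = np.linspace(0.0, 40.0, 21)
    divs = []
    for _ in range(500):
        idx = rng.permutation(stat.size)
        h1, _ = np.histogram(stat[idx[:100]], bins=bins)
        h2, _ = np.histogram(stat[idx[100:]], bins=bins)
        divs.append(kl_bits(h1.astype(float), h2.astype(float)))
    print(f"split-half noise floor ~ {np.mean(divs):.3f} bits")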

The KL divergence of the Shuffle model is quite close to that of the FF network, suggesting that most of the “place-like” qualities of the FF network are more likely a reflection of the input features being “place-like” than of the learned weights constructing these place fields. Place fields from the RNN-S model are more similar to the neural data than those from the FF-TD or Shuffle models. The analysis on split halves of the Payne et al. dataset shows a low KL divergence (specifically, around 0.12) that is much smaller than that of any of the models.

We add the two plots into the supplementary material (Figure S). Additionally, we report these results in the following sections of the main text:

In the section introducing TD error:

“We want to compare the TD loss of RNN-S to that of a non-biological model designed to minimize TD loss… Both models show a similar increase in TD loss as $\gamma_R$ increases, although the RNN-S has a slightly lower TD loss at high $\gamma$ than the FF-TD model. Both models perform substantially better than a random network with weights of comparable magnitude (Figure S5d).”

In the section introducing the summed KL divergence:

“We combined the KL divergence of both these distributions to find the parameter range in which the RNN-S best fits neural data (Figure 6g). This optimal parameter range occurs when inputs have a spatial correlation of $\sigma \approx 8.75$ cm and sparsity $\approx 0.15$. We note that the split-half noise floor of the dataset of Payne et al. is a KL divergence of $0.12$ bits (Figure S6E).

We next tested whether the neural data was better fit by representations generated by RNN-S or the FF-TD model. Across all parameters of the input features, despite having similar TD loss (Figure 5de), the FF-TD model has much higher divergence from neural data (Figure 6gi, Figure S6), similar to a random feedforward network (Figure S6E).”

5) It would be useful to have a “limitations” paragraph in the discussion clearly outlining what this learning rule could not achieve. For example, Stachenfeld et al., Nat. Neuro. have many examples where the SR is deployed. Would the learning rule suggested by the authors always work across the board, or are there limitations that could be highlighted where the suggested framework would not work well? No need to perform more experiments/simulations, but simply to share insight regarding the results and the capability of the proposed learning rule.

The reviewer suggests adding further conceptual clarity to the discussion by giving insight into the limitations of the model and its differences from the Stachenfeld et al. simulations. We anticipate that the RNN-S can capture the results seen in Stachenfeld et al., since the SR learned by the algorithm in Stachenfeld et al. is identical to the representation learned by the RNN-S in the one-hot case.

However, using a recurrent architecture does impose limitations on how the learning rule is structured and how the network can be used. In particular, care must be taken to avoid instability in the network due to a build-up of recurrent activity. We proposed ‘learning’ and ‘retrieval’ modes in the network precisely to control such instabilities. Furthermore, the recurrency of the network means that errors in transition structure can compound across a long horizon of prediction. This is especially problematic in the case of non-one-hot features, where greater errors in transition estimation are likely for more densely coded features.

We added a limitations paragraph in the discussion focusing on the limitations of using a recurrent network model (as opposed to a feedforward network). We also further tie these observations to biological evidence:

“There are inherent limitations to the approach of using a recurrent network to estimate the SR. For instance, network dynamics can be prone to issues of instability due to the recurrent buildup of activity. To prevent this instability, we introduce two different modes of operation, “learning” and “retrieval”. An additional limitation is that errors in the estimated one-step transition can propagate over the course of the predictive rollout. This is especially problematic if features are more densely coded or more correlated, which makes one-step transition estimations more difficult. These insights into vulnerabilities of a recurrent network have interesting parallels in biology. Some hippocampal subfields are known to be highly recurrent [92, 93, 94, 95]. This recurrency has been linked to the propensity of the hippocampus to enter unstable regimes, such as those that produce seizures [96, 97, 98, 99]. It remains an open question how a healthy hippocampus maintains stable activity, and to what extent the findings in models such as ours can suggest biological avenues to tame instability.”

Other comments/suggestions:

– Page 1: The introduction motivates this work with a discussion of hippocampal memory (storage and retrieval), but the work focuses on the SR which is inherently prospective. The first paragraph of the text could be revised to better make this connection beyond simply stating that the hippocampus is involved in both memory and future prediction.

The reviewer makes an important point, and in fact the connection between episodic memory and predictive maps in the hippocampus is an active area of research in the hippocampus field. We have edited the first paragraph of the introduction to better explain and flesh out the hypothesized connections between predictive coding in the hippocampus and its function in memory:

“To learn from the past, plan for the future, and form an understanding of our world, we require memories of personal experiences. These memories depend on the hippocampus for formation and recall [1, 2, 3], but an algorithmic and mechanistic understanding of memory formation and retrieval in this region remains elusive. From a computational perspective, a key function of memory is to use past experiences to inform predictions of possible futures [4, 5, 6, 7]. This suggests that hippocampal memory is stored in a way that is particularly suitable for forming predictions.”

– Page 2: The end of the introduction would be stronger if the motivation for an RNNs usage was tied to the literature on the known recurrent dynamics of the hippocampus. See for example: https://www.frontiersin.org/articles/10.3389/fncel.2013.00262/full


We add the following sentences in the introduction to further motivate the usage of RNNs with known hippocampal anatomy/dynamics, adding in the suggested reference:

“A promising direction towards such a neural model of the SR is to use the dynamics of a recurrent neural network (RNN) to perform SR computations [39, 40]. An RNN model is particularly attractive as the hippocampus is highly recurrent, and its connectivity patterns are thought to support associative learning and recall [41, 42, 43, 44]. However, an RNN model of the SR has not been tied to neural learning rules that support its operation and allow for testing of specific hypotheses.”

– Page 6: It is not clear the extent to which the FF-TD model differs from a canonical tabular SR algorithm or linear SF algorithm. My understanding is that it is the same, but the presentation in Figure 1i for example makes this somewhat unclear.

Yes, you’re correct that the FF-TD model is exactly the linear SR/SF algorithm. We’ve added clarifying sentences in the section introducing the FF-TD model to emphasize this:

“As an alternative to the RNN-S model, we consider the conditions necessary for a feedforward neural network to compute the SR. Under this architecture, the $M$ matrix must be encoded in the weights from the input neurons to the hidden layer neurons (Figure 1g). This can be achieved by updating the synaptic weights with a temporal difference (TD) learning rule, the standard update used to learn the SR in the usual algorithm… The FF-TD model implements the canonical SR algorithm.”
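For concreteness, a tabular sketch of this canonical TD update (the ring environment, learning rate, and discount are illustrative assumptions of ours):

    import numpy as np

    rng = np.random.default_rng(1)
    n, gamma, alpha = 20, 0.6, 0.1
    M = np.eye(n)  # tabular SR estimate; M[s, s2] ~ discounted visits to s2 from s
    s = 0
    for _ in range(50000):
        s_next = (s + rng.choice([-1, 1])) % n
        target = np.zeros(n)
        target[s] = 1.0
        # TD(0) update toward the one-step bootstrapped target.
        M[s] += alpha * (target + gamma * M[s_next] - M[s])
        s = s_next

    # Ground truth for the ring: M* = (I - gamma * T)^(-1), T row-stochastic.
    T = np.zeros((n, n))
    for j in range(n):
        T[j, (j - 1) % n] = 0.5
        T[j, (j + 1) % n] = 0.5
    M_true = np.linalg.inv(np.eye(n) - gamma * T)
    print(np.abs(M - M_true).mean())  # small residual error after learning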

– Pages 6 – 11: it may be of benefit to more strongly support the various modifications to the model with connections to known or hypothesized hippocampal neural dynamics.

The reviewer suggests motivating the modifications to the model by connecting to known biological mechanisms. We currently make these connections in the Discussion section. To make these points earlier in the paper, we summarize some of these points made in the discussion and put them earlier in the paper. Specifically, we make the following additions:

In the section introducing the anti-Hebbian term:

“The second term in equation 4 is a form of synaptic depotentiation. Depotentiation has been hypothesized to be broadly useful for stabilizing patterns and sequence learning [37, 49], and similar inhibitory effects are known to be elements of hippocampal learning [50, 51].”

In the section introducing the adaptive learning rate:

“If the learning rate of the outgoing synapses from each neuron $j$ is inversely proportional to $n_j$ (i.e., $\eta = 1/n_j(t)$), the update equation quickly normalizes the synapses to maintain a valid transition probability matrix (Supplementary Notes 4). Modulating synaptic learning rates as a function of neural activity is consistent with experimental observations of metaplasticity [56, 57, 58].”
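A minimal sketch of this normalization in the one-hot case (the ring walk and variable names are ours; the update is written in the running-average form implied by $\eta = 1/n_j(t)$):

    import numpy as np

    rng = np.random.default_rng(2)
    n = 20
    J = np.zeros((n, n))
    visits = np.zeros(n)
    state = 0
    for _ in range(20000):
        nxt = (state + rng.choice([-1, 1])) % n
        visits[state] += 1.0
        eta = 1.0 / visits[state]  # learning rate decays with visit count n_j
        target = np.zeros(n)
        target[nxt] = 1.0
        # Potentiate the observed transition and depotentiate the other
        # outgoing synapses of this neuron; the column stays normalized.
        J[:, state] += eta * (target - J[:, state])
        state = nxt

    print(J[:, 0].sum())         # ~1.0: a valid transition probability column
    print(J[n - 1, 0], J[1, 0])  # each ~0.5 for the unbiased ring walk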

In the section adding a nonlinearity into network dynamics:

“One way to tame this instability is to add a saturating nonlinearity into the dynamics of the network. This is a feature of biological neurons that is often incorporated in models to prevent unbounded activity [60]. Specifically, instead of assuming the network dynamics are linear ($f$ is the identity function in equation 2), we add a hyperbolic tangent into the dynamics equation.”
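A short sketch of why the saturation matters (the random weights and gain are chosen by us to sit past the stability edge):

    import numpy as np

    rng = np.random.default_rng(3)
    n = 20
    J = rng.normal(scale=1.5 / np.sqrt(n), size=(n, n))  # gain past the edge
    x0 = 0.1 * rng.normal(size=n)
    x_lin, x_tanh = x0.copy(), x0.copy()
    dt = 0.1
    for _ in range(500):
        x_lin = x_lin + dt * (-x_lin + J @ x_lin)               # linear dynamics
        x_tanh = x_tanh + dt * (-x_tanh + np.tanh(J @ x_tanh))  # saturating
    print(f"linear: {np.abs(x_lin).max():.2e}")  # blows up when radius > 1
    print(f"tanh:   {np.abs(x_tanh).max():.2e}")  # stays bounded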

– Page 14: It states that "We discretize the arena into a set of states and encode each state as a randomly drawn feature ϕ." If I understand correctly, these features are not completely random, and instead follow the distribution described in Section 2.5. As it currently reads, it seems that these features might be drawn from a uniform random distribution, which would be misleading.

Thank you for pointing this out—it is true that the features in Section 2.6 are constructed in the same way as in Section 2.5. We've made this explicitly clear in Section 2.6:

“We discretize the arena into a set of states and encode each state as in Section 2.5.”

– Page 14: In Section 2.6 there is an assumption that a certain level of TD error corresponds to good performance. It is not clear what should objectively be considered a reasonable TD error. This is especially difficult to interpret in the case where both the RNN-S and FF-TD models display comparable performance. Is there perhaps some other baseline you would expect to perform considerably worse?

This comment was raised and addressed in Main Comment 4 above.

– Page 17: In Figure 4 it is somewhat confusing that the KL divergence (subplots G and I) has reversed shading (dark for low values) compared to the other subplots. It would be easier to interpret these graphs if their color coding was more consistent.

Thanks for the clarifying suggestion. We reversed the color map for KL divergence.

– Page 18: Similar to the difficulty of interpreting the TD error results, it is not clear what a "good" or "bad" KL divergence from the neural data would be. Any hypotheses on how to ground the numbers provided here would help to improve the quality of the results.

This comment was raised and addressed in Main Comment 4 above.

– Page 20: It is mentioned that the predictive timescale may be a separate gain term which the hippocampus takes as input, but there is evidence that different regions of the hippocampus seem to operate on different timescales. See for example: https://www.jneurosci.org/content/42/2/299.abstract. Is there a way to reconcile these ideas?

The reviewer points out an interesting hippocampal finding that is not obviously explained in the RNN-S model: some anatomical axes of the hippocampal formation appear to contain a continuum of predictive timescales in their neural activity (Dolorfo 1998, Brun 2008). This is a well-supported finding with interesting functional implications, and thus is worth discussing and addressing in the paper.

One way to model an anatomical gradient of predictive timescales is to use a series of RNN-S systems. Each of these systems would have a different $\gamma$ value that is used during retrieval. Thus, despite these systems receiving the same feature inputs, each network can estimate the state of the animal at a different timescale.

Alternatively, the gradient in timescales or granularity could exist on the input level. As in the first hypothesis, we can assume a series of RNN-S systems, except all systems utilize the same $\gamma$ value during retrieval. If each system receives inputs that encode different granularities of the animal’s state space (in a spatial example: perhaps one set of inputs uses a state space that divides the arena into quadrants, while another set of inputs uses a state space that divides the arena into a 10x10 grid), then each RNN-S network will naturally develop representations across a continuum of scales.

Both these hypotheses can be functionally useful as a way to learn hierarchical structure and use that information for planning.

We choose to emphasize the first hypothesis (a gradient of $\gamma$ values), and summarize this idea by adding the following sentences to the discussion paragraph on flexible $\gamma$:

“The idea that the hippocampus might compute the SR with flexible $\gamma$ could help reconcile recent results that hippocampal activity does not always match high-$\gamma$ SR [79, 80]. Additionally, flexibility in predictive horizons could explain the different timescales of prediction observed across the anatomical axes of the hippocampus and entorhinal cortex [88, 89, 90, 91, 92]. Specifically, a series of successor networks with different values of $\gamma$ used in retrieval could establish a gradient of predictive timescales. Functionally, this may allow for learning hierarchies of state structure and could be useful for hierarchical planning [93, 94, 95].”

– Page 23: Section 4.5 describes the procedure for learning the parameters of the weight update rule as CMA-ES. Mentioning the fact that an evolutionary algorithm is used for learning these weights would help to make Section 2.3 more clear.

We added additional sentences in Section 2.3 clarifying how parameters were learned:

“To systematically explore the space of plasticity kernels that can be used to learn the SR, we performed a grid search over the sign and the time constants of the pre -> post and post -> pre sides of the plasticity kernels. For each fixed sign and time constant, we used an evolutionary algorithm to learn the remaining parameters that determine the plasticity kernel.”

– figures 5D-E and similar supplementary figures: if there is a parameter region that is unexplored then the color used for such region should be outside of the colormap. One of the reviewers suggests replacing white with gray for such region in these figures.

Thanks for the clarifying suggestion. We switched the color of the unexplored region from white to gray.

– Line 173: the text makes the distinction between an "SR-like" representation and an "exact SR". What is the difference? Why is it important to have an exact encoding of the SR in the neural activity, rather than eg a monotonic encoding of the SR?

By “exact SR”, we mean the error between the steady state dynamics matrix and the SR matrix is precisely zero. “SR-like” was our loose way of referring to representation matrices with some amount of mean absolute error from the SR matrix that was still seemingly minimal but not zero.

The reviewer raises an important question about how crucial it is for the network to learn the SR exactly, versus other representations that may also capture long-horizon predictions.

This is similar in spirit to the question raised in Main Comment 1. We showed in an analysis for Main Comment 1 that it may not be necessary to learn the SR exactly, and that a range of possible rules may yield similar representations. The SR is convenient as a reasonable formalization of long-horizon predictions. The analysis referenced in line 173 (Figure 2J) also shows that plasticity kernels with varying time constants yield representations that are similar to the SR.

We clarify the statement previously in line 173, replacing the term “is SR-like” with “has minimal error from the SR matrix”. We also emphasize that the results of the analysis further support that many predictive representations look similar to each other, and that an exact mathematical equivalence to the SR is not the most important aspect of a predictive representation:

“Finally, we see that even plasticity kernels with slightly different time constants may give results with minimal error from the SR matrix, even if they do not estimate the SR exactly (Figure 2j). This suggests that, although other plasticity rules could be used to model long-horizon predictions, the SR is a reasonable—though not strictly unique—model to describe this class of predictive representations.”

– The RNN described in Equation 2 is not of the standard form (the non-linearity is applied after the connectivity matrix, ie f(J x) instead of Jf(x)). Is this detail important? If not, why not use the more standard form to avoid confusion?

The reviewer points out that our equation uses $\Delta x = -x + f(Jx) + i$. It is standard to use either a firing rate representation ($\Delta r = -r + f(Jr + i)$) or a voltage representation ($\Delta v = -v + Jf(v) + i$). Choosing between the firing rate or voltage representation is not critical. Indeed, if input is not considered, these forms are equivalent up to a transformation (Miller and Fumarola 2012).

Nevertheless, the reviewer is correct that we used a non-conventional amalgamation of these two standard forms.

We re-ran analyses using Jf(x) instead of f(J x). We find that there is no obvious difference in the results generated. We updated the equation in the text to match the standard form. We then updated Figures 3, 5, 6, S3, S5, S6 to reflect this change in the model.
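As a sanity check that the two forms behave similarly here, a small sketch (weights, input, and operating regime are illustrative assumptions of ours; in a weakly driven regime the steady states nearly coincide):

    import numpy as np

    rng = np.random.default_rng(5)
    n = 20
    J = rng.normal(scale=0.5 / np.sqrt(n), size=(n, n))
    i_ext = 0.1 * rng.normal(size=n)
    dt = 0.1

    def steady_state(update, steps=2000):
        # Iterate the dynamics to (approximate) steady state.
        x = np.zeros(n)
        for _ in range(steps):
            x = x + dt * update(x)
        return x

    x_fJx = steady_state(lambda x: -x + np.tanh(J @ x) + i_ext)  # f(Jx) form
    x_Jfx = steady_state(lambda x: -x + J @ np.tanh(x) + i_ext)  # Jf(x) form
    print(np.abs(x_fJx - x_Jfx).max())  # small in this weakly driven regime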

– A line of work in the Fusi lab has examined plasticity rules that lead to the encoding of transition probabilities (eg Fusi et al., Neuron 2007). In particular, a paper by the reviewing editor (Ostojic and Fusi Front Comp Neuro 2013) examined the encoding of transition probabilities using plasticity rules that look similar to this manuscript. This is mentioned just for information, the authors should decide if those papers are relevant.

Thanks for the recommendation—the Ostojic and Fusi paper is indeed quite relevant. It seems that equation 2 of the Ostojic and Fusi paper is the same as our plasticity rule under one-hot encoding assumptions. In the case with more complex input features, the Ostojic and Fusi rule would be identical to the “Independent Normalization” model we present as a comparison in Figure 4.

Overall, it is promising and exciting to find another study that arrives at similar conclusions: “Our study shows that synapses encode transition probabilities under general assumptions and this indicates that temporal contiguity is likely to be encoded and harnessed in almost every neural circuit in the brain.” (from the abstract of Ostojic and Fusi).

We have updated the discussion to include this reference:

“Estimating $T$ directly provides RNN-S with a means to sample likely future trajectories, or distributions of trajectories, which is computationally useful for many memory-guided cognitive tasks beyond reinforcement learning, including reasoning and inference (Ostojic and Fusi, 2013; Goodman et al., 2016). The representation afforded by $T$ may also be particularly accessible by neural circuits. Ostojic and Fusi (2013) note that only a few general assumptions are needed for synaptic plasticity rules to estimate transition statistics. Thus, it is reasonable to assume that some form of transition statistics is encoded broadly across the brain.”

– Figures 5D-E and similar supplementary figures: if there is a parameter region that is unexplored then the color used for such region should be outside of the colormap. One of the reviewers suggests replacing white with gray for such region in these figures.

This comment is a duplicate of an earlier comment and has been addressed above.

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Data Citations

    1. Payne H, Lynch G, Aronov D. 2021. Neural representations of space in the hippocampus of a food-caching bird. Dryad Digital Repository.

    Supplementary Materials

    Transparent reporting form

    Data Availability Statement

    The current manuscript is a computational study, so no data have been generated for this manuscript. Modelling code is publicly available on GitHub: https://github.com/chingf/sr-project (copy archived at swh:1:rev:43320e9b8c15927c67849f768d2a9bf17f68a0ea).

    The following previously published dataset was used:

    Payne H, Lynch G, Aronov D. 2021. Neural representations of space in the hippocampus of a food-caching bird. Dryad Digital Repository.

