Abstract
Exploration, consolidation, and planning depend on the generation of sequential state representations. However, these algorithms require disparate forms of sampling dynamics for optimal performance. We theorize how the brain should adapt internally generated sequences for particular cognitive functions and propose a neural mechanism by which this may be accomplished within the entorhinal-hippocampal circuit. Specifically, we demonstrate that the systematic modulation along the MEC dorsoventral axis of grid population input into hippocampus facilitates a flexible generative process which can interpolate between qualitatively distinct regimes of sequential hippocampal reactivations. By relating the emergent hippocampal activity patterns drawn from our model to empirical data, we explain and reconcile a diversity of recently observed, but apparently unrelated, phenomena such as generative cycling, diffusive hippocampal reactivations, and jumping trajectory events.
1. Introduction
The entorhinal-hippocampal circuit (EHC) is thought to contribute to a diverse range of cognitive functions1,2. An important motif within this circuit is the sequential non-local reactivation of hippocampal place codes3,4. The archetypal instantiation of this functionality is “replay”, which refers to a temporally compressed representation of a previously experienced trajectory embedded within hippocampal sharp-wave ripples (SWRs)5. Initially observed during sleep, replay is thought to subserve long-term memory consolidation in neocortical networks6. More recently, sequential non-local hippocampal reactivations have been observed that do not fit with this classical definition of replay7. While rodents quietly rest, ensemble place cell activity appears to random walk through a cognitive map of a familiar environment instead of veridically replaying a rodent’s physical traversals8. In the awake immobile state, SWR-related trajectory events encode novel goal-directed routes9. During active movement, an alternative form of sequential non-local reactivation may also occur. Theta sequences, which typically phase precess through local positions, may sweep ahead to remote locations along potential paths available to the rodent10–12.
Spanning these diverse forms of sequential hippocampal representation, we consider a unified algorithmic theme conceptualizing hippocampus as a sequence generator13, contributing sample trajectories drawn from cognitive maps to computations being executed downstream in cortex. Establishing neocortical memory traces via synaptic plasticity may be characterized as a learning process extracting information from experiential replay during sleep14. Hypothetical environment trajectories, encoded in SWRs or theta sequences, may be thought of as samples of possible future behaviors which are input into a planning algorithm for the purposes of optimizing exploration and prospective decision-making15,16. We suggest that this generalized perspective imposes a substantial computational obligation on the EHC as a generative sampling system since the performance of algorithms can vary substantially depending on the statistical and dynamical structure of the input samples17. This motivates our computational hypothesis that hippocampal sequence generation is systematically modulated in order to optimize the resulting sampling regime for the current cognitive objective. Since previous computational EHC models have tended to focus on relatively specific applications such as localization18–20 or vector-based navigation21, the necessity to modulate sequence generation between cognitive algorithms is not addressed. Therefore, the broader computational viewpoint taken here raises unique theoretical questions such as what alternative modes of hippocampal sequence generation are to be expected, by what neural mechanism can such distinctive dynamics be systematically regulated, and how can this be efficiently achieved for large relational spaces (e.g., graphs) which may be non-spatial in nature22.
We develop an algorithmic framework and associated neural mechanism by which distinct dynamical modes of sequence generation may be parsimoniously realized in a pathway between grid cells in medial entorhinal cortex (MEC) and place cells in the CA1 subregion of hippocampus (HC)2. The critical technical innovation is the characterization of grid cells as encoding infinitesimal generators of hippocampal sequence generation. A generator is a mathematical object that specifies how a system evolves in continuous time23,24. Conceptualizing hippocampal sequence generation as a dynamical system operating over a cognitive map, generators then determine the probabilities with which a given position will be reactivated at any time. We propose that MEC grid cells encode generators in a decomposed format. This enables a simple neural mechanism to flexibly interpolate hippocampal sequence generation between qualitatively and quantitatively distinctive regimes from random walks with Lévy jumps to generative cycling (Fig. 1A-E). In a phenomenological linear network model of hippocampal sequence generation, we demonstrate the systematic modulation of grid cells arrayed dorsoventrally as a function of spatial scale in the MEC layer and the consequential effects on place cell activity. In simulating this model, we show that our theory reconciles a diversity of empirical observations in sequential hippocampal reactivations.
2. Results
2.1. Spectral modulation in an entorhinal-hippocampal network model of sequence generation
How might sequences of positions in an environment or, more generally, states of an internal world model be simulated within the brain? This question can be posed formally within a generative probabilistic framework as how to sample state sequences x = (x_0, x_1, …, x_t) from a probability distribution p(x) defined over state-space 𝒳. A typical example considered in this study is the sequence distribution p(x) based on the hypothetical decision-making policy of a rodent in an experimental task (Fig. 1F), as this will allow us to relate sequences generated by our model to the sequential non-local reactivations encoded in hippocampal place cells. We consider the state-space 𝒳 to be discrete as this allows us to study both continuous spatial domains (via discretization) and inherently discrete spaces (e.g. graphs) in a common formalism based on matrix-vector products. When studying discretized continuous state-spaces, we interpolate discrete probability distributions where appropriate. Analogous techniques in purely continuous domains replace matrix-vector products with integral transforms24. Therefore, our model may in principle be applied across a wide range of cognitive maps, relational spaces, mental models, or intuitive physics models22,25.
An internal simulation is initialized based on a distribution p(x_0) over states at an initial time t = 0. We compactly denote this distribution over initial states as ρ_0 = p(x_0). How can this initial distribution (for example, a rodent’s initial position in an experiment) be combined with dynamics information (the rodent’s behavioral policy) in order to compute the state distribution ρ_t = p(x_t) at an arbitrary timepoint t in the future (where the rodent will be)? To answer this question, we need to understand how the state distribution ρ evolves in time. This is characterized by its time derivative ∂ρ_t/∂t. Assuming that the dynamics depend only on the current state and that they do not change over time, the evolution of a state distribution ρ is determined by a master equation
$$\frac{\partial \rho_t}{\partial t} = \tau^{-1} \rho_t O \tag{1}$$
where O is a matrix known as an infinitesimal generator23,26. The generator O encodes stochastic transitions between states at short timescales (see Section B.2 of the Supplementary Math Note (SMN) for details). For example, in a T-maze, an entry in O could encode the local bias for a left turn at a critical junction leading to reward acquisition (Fig. 1F). The tempo parameter τ modulates the speed of the simulated evolution. The master equation (Eqn. 1) implies that a dynamical system is economically encoded in an initial state distribution ρ_0 and a generator O since the distribution ρ_t of possible states of the system at any time t can be retrieved from this information.
The master equation (Eqn. 1) has an analytic solution:
$$\rho_{\Delta t} = \rho_0 \, e^{\tau^{-1} \Delta t \, O} \tag{2}$$
Given an initial state distribution ρ_0, the propagator e^{τ^{-1}Δt O} is a time-dependent matrix which evolves ρ_0 through the time interval Δt to ρ_Δt by propagating the initial state probability mass ρ_0 across the state-space under the dynamics specified by O. Fixing Δt = 1, the propagator P_τ = e^{τ^{-1}O} can be applied iteratively to generate state distributions on successive time steps as ρ_{t+1} = ρ_t e^{τ^{-1}O}. State sequences characterizing the simulated evolution of the system can therefore be generated by recursively applying this time-step propagator P_τ and sampling
$$x_{t+1} \sim \mathbf{1}_{x_t} P_\tau \tag{3}$$
where 1_x is a one-hot vector indicating that state x is active with probability one. This will result in state sequences x that accurately reflect the generative distribution of sequences p(x) defined by the generator O and initialization ρ_0 (Section B.8, SMN).
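To make this recursion concrete, the full pipeline from generator to sampled sequence can be written in a few lines. The following is a minimal sketch (not the code used for the reported simulations), assuming a random-walk generator on a ring of 20 states and using an off-the-shelf matrix exponential; the state-space, parameter values, and random seed are illustrative choices.

```python
import numpy as np
from scipy.linalg import expm

n, tau = 20, 1.0
rng = np.random.default_rng(0)

# Random-walk generator O on a ring: unit outflow split between the two
# neighbors; the diagonal is set so that each row sums to zero.
O = np.zeros((n, n))
for i in range(n):
    O[i, (i - 1) % n] = O[i, (i + 1) % n] = 0.5
O -= np.diag(O.sum(axis=1))

P_tau = expm(O / tau)  # time-step propagator P_tau = e^{tau^-1 O} (Eqn. 2)

x, sequence = 0, [0]   # initial distribution rho_0 = 1_x
for t in range(50):
    # guard against tiny negative entries from finite-precision expm
    p = np.clip(P_tau[x], 0.0, None)
    x = int(rng.choice(n, p=p / p.sum()))  # x_{t+1} ~ 1_{x_t} P_tau (Eqn. 3)
    sequence.append(x)
```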
Directly computing the propagator is challenging since it requires an infinite sum of matrix powers. In Section B.3 of the SMN, we show that P_τ can be computed efficiently using a generator eigendecomposition O = GΛW as
$$P_\tau = G \, e^{\tau^{-1} \Lambda} \, W \tag{4}$$
Furthermore, the facility to freely modulate the tempo τ of propagation is a highly desirable property. This would enable coarse hierarchical sequence generation to be run expeditiously rather than at the true rate of evolution of the external world or, if time allows, an internal simulation could be slowed down for a more fine-grained analysis27–29. We show that, given this representation (Eqn. 4), the tempo τ can be efficiently manipulated via a computational mechanism which we refer to as spectral modulation. Since Λ is the diagonal matrix of O-eigenvalues, its exponentiation is trivially accomplished by exponentiating the eigenvalues separately along the diagonal, [e^{τ^{-1}Λ}]_{kk} = e^{τ^{-1}λ_k}. Multiplication by G projects a state distribution ρ_t onto the generator eigenvectors ϕ_k = [G]_{·k}, which we refer to as the spectral components of the propagator. Note that we employ this term as a broad reference for generator eigenvectors or components of alternative generator decompositions, potentially constrained by other considerations such as non-negativity, which facilitate spectral modulation (see Section A.1 SMN). In this spectral representation, time shifts simply correspond to rescaling according to the power spectrum
$$s_\tau(\lambda_k) = e^{\tau^{-1} \lambda_k} \tag{5}$$
Each spectral component ϕ_k is scaled by s_τ(λ_k) based on its eigenvalue λ_k. Finally, W projects the spectral representation of the future state distribution ρ_{t+1} back onto the state-space 𝒳. This spectral format factorizes time and position within an environment such that a linear readout can generate a propagator for any timescale.
We apply the spectral propagator (Eqn. 4) recursively to generate state sequences according to
$$x_{t+1} \sim \mathbf{1}_{x_t} \, G S W \tag{6}$$
where S = e^{τ^{-1}Λ} is the power spectrum matrix. This generator-based model of sequence generation can be minimally realized in a linear feedback network model of EHC (Fig. 2A) in which the activity profiles of the network units are qualitatively consistent with those of grid cells in medial entorhinal cortex (MEC) and place cells in hippocampus (HC). Specifically, we equate the spectral components ϕ_k (columns of G) with grid cells topographically organized by spatial scale along the dorsoventral axis of MEC (Fig. 2B)30. The linear readout W from the MEC layer to the hippocampal layer embeds the future state distribution ρ_{t+1} in a predictive place code28. See Section A (SMN) for further considerations regarding biological plausibility and connections to other EHC models.
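The spectral route to the same propagator can be sketched by eigendecomposing the generator, continuing the ring example above. For the symmetric random-walk generator the readout is simply W = Gᵀ; the non-negativity-constrained decompositions mentioned above are not implemented here.

```python
# Spectral decomposition O = G diag(lam) W of the symmetric generator above.
lam, G = np.linalg.eigh(O)
W = G.T  # linear readout back onto the state-space (hippocampal layer)

def propagator(s):
    """Assemble a propagator from a power spectrum s over components (Eqn. 4)."""
    return G @ np.diag(s) @ W

# Tempo modulation: rescale each component phi_k by s_tau(lam_k) (Eqn. 5).
assert np.allclose(propagator(np.exp(lam / tau)), expm(O / tau))
```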
The critical mechanism of spectral modulation, which sets the tempo of propagation, is the systematic regulation of MEC grid cell output as a function of spatial scale according to the power spectrum s. At the implementation level, we hypothesize that spectral modulation may be accomplished via gain control or grid rescaling according to top-down cortical input. An example of the latter would be if a small-scale grid firing map is enlarged to a medium scale in order to increase MEC power at medium scales and reduce MEC power at small scales. Notably, grid modules appear to have the capacity to rescale independently30, do so as a function of experience and presumed cognitive processing31, and cause a consistent rescaling of place fields in hippocampus when the grid scale is perturbed32. More generally, several empirical results support the critical contribution of entorhinal input towards the coherent temporal organization of hippocampal activity33,34. Indeed, beyond tempo control set by τ, we study several other parametric and non-parametric classes of power spectra with highly distinctive causal effects on hippocampal sequence generation. Each of these EHC settings will be motivated as an optimized operational mode for a particular cognitive process.
2.2. Foraging in an open environment
Consider a rodent foraging in a large open environment. Without cues indicating where food may be located, its exploration process must search each location. How might it generate the next environment position to inspect? Approaches to this problem range in terms of computational complexity from serial visitations minimizing repetitions (imposing increasingly burdensome memory and planning costs) to random sampling (requiring no memory). Though generators could be adaptively designed to implement sophisticated forms of uncertainty-driven exploration (e.g., based on Gaussian processes), here we focus on maximizing the efficiency of random sampling in the low-complexity limit. Assuming a random walk generator O_rw (implying the rodent has no information as to where food may be located), we study the effect of spectrally modulating tempo in generating the next position to visit. If the rodent repeatedly samples target states at large spatial scales (τ → 0), then it will repeatedly traverse the entire environment, expending too much energy. In contrast, a small-scale search pattern (τ ≫ 0) will lead to the rodent oversampling within a limited area and taking too long to fully explore the environment. Defining exploration efficiency as the fraction of the environment visited per cumulative distance traversed, neither tempo regime delivers a satisfactory return.
This ubiquitous conundrum has been extensively studied in the foraging literature, leading to the Lévy flight foraging hypothesis35. Theoretical analysis and simulations have shown that exploration efficiency is maximized by interleaving jumps (i.e., sampling successive positions separated by a large distance) with local search patterns. Effectively, this strikes a balance between local searches (τ ≫ 0) and global re-orientations to new positions in an environment (τ → 0). These distinctive search dynamics are proposed to be naturally selected for in animals across a wide variety of ecological niches due to their universally advantageous properties. The hypothesis draws its name from the Lévy distribution, which characterizes the distribution of possible next positions in a Euclidean space35. This is a heavy-tailed distribution: in addition to a high probability of sampling a nearby position, it assigns a small probability to generating a large jump to a more distal region.
We show that analogous heavy-tailed propagation distributions can be accessed in our model via an alternative form of spectral modulation, thus providing a mechanistic account regarding how such sampling may be accomplished in the EHC. Based on mathematical considerations (Section C.1 SMN), we introduce the stability parameter α which determines the entorhinal power spectrum s_{τ,α} according to
$$s_{\tau,\alpha}(\lambda) = e^{-\tau^{-1} |\lambda|^{\alpha}} \tag{7}$$
For α = 1, the power spectrum is unchanged, s_{τ,1} = s_τ (compare Eqn. 5, Fig. 2C). This results in random walks, which correspond to diffusions in continuous spaces; we therefore refer to the α = 1 regime as diffusive (Section B.6, SMN). Setting α < 1, the linear readout from the MEC layer in our circuit model reflects a propagator with probability mass smoothly redistributed from nearby positions to remote positions (Fig. 2D). Sequence generation with α < 1 is referred to as superdiffusive (Section 5.3, Methods). Although both superdiffusive and diffusive sequence generation are ultimately truncated by the limited extent of an environment, they are differentially sensitive to the possible range of spatiotemporal scales. Intuitively, stability modulation results in a scale-invariant sampling process since sampling can occur at any spatiotemporal scale simultaneously on each iteration. In contrast, τ specifies a limited range of spatiotemporal scales and therefore tempo modulation can never lead to scale-invariance. This implies that α-modulation and τ-modulation have fundamentally distinct effects on the statistical structure of sequence generation (Fig. E1).
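A sketch of stability modulation under the same spectral machinery, continuing the ring example; the spectrum follows Eqn. 7, and the α value and printed comparison are illustrative.

```python
def alpha_spectrum(lam, tau, alpha):
    # s_{tau,alpha}(lam) = exp(-tau^-1 |lam|^alpha); alpha = 1 recovers s_tau
    return np.exp(-np.abs(lam) ** alpha / tau)

P_diff = propagator(alpha_spectrum(lam, tau=1.0, alpha=1.0))   # diffusive
P_super = propagator(alpha_spectrum(lam, tau=1.0, alpha=0.5))  # superdiffusive

# The superdiffusive row places more probability mass on remote states
# (heavier tails) than the diffusive row starting from the same state.
print(P_diff[0].round(3))
print(P_super[0].round(3))
```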
We compared diffusive (α = 1, Fig. 2E) and superdiffusive (α = 0.5, Fig. 2F) sampling in an open box environment typically used in rodent foraging experiments8,36, postulating that EHC generates candidate exploratory trajectories which the rodent subsequently pursues physically. Whereas diffusive behavioral trajectories failed to fully explore all areas in the simulated window of time (Fig. 2G), superdiffusive trajectories visited positions in an approximately homogeneous distribution across the entire arena (Fig. 2H), thus highlighting the flexibility with which superdiffusions adapt to an arbitrary environment scale. With respect to the standard measure of exploration efficiency, superdiffusions explored significantly more positions in the open environment as a function of distance traversed than diffusions (Fig. 2I,J). This is also the case for structured state-spaces, such as compartmentalized environments with obstacles (Fig. E2). Consistent with our theoretical arguments and simulations, superdiffusive sequential activation of hippocampal place cells (Fig. 2K,L) and superdiffusive rodent behavioral trajectories have been observed8,36 in foraging experiments where rodents are required to explore environments with essentially random distributions of food locations.
2.3. Goal-directed trajectory events with heterogeneous jumps
We relate our model operating in the superdiffusive regime to heterogeneous sequences of hippocampal representations exhibiting jump transitions to motivationally salient locations9,11. In the first experiment we simulated, SWR-related place cell activity in the CA1 subregion of dorsal hippocampus was recorded from rats engaged in random foraging and spatial memory tasks in alternation9. In the spatial memory task, a single reward location in the open square arena was repeatedly baited and thus the rat could remember this predictable “Home” location and engage in goal-directed navigation in order to acquire the reward. The application of neural decoding analyses revealed the rapid sequential encoding of positions across the environment while the rats were task-engaged but immobile (Fig. 3A-D). This study provided evidence that the hippocampus encodes novel goal-directed paths to memorized locations, which were significantly over-represented in the generated trajectories (“away-events”, Fig. 3B,D). We simulated the generation of trajectory events in this experiment with our model in order to demonstrate a novel computational mechanism by which a memorized location may be stored in a generator representation and remotely activated exclusively in the superdiffusive regime. We subtly manipulated a random-walk generator O, leveraging a distinguishing feature of the generator-propagator formalism, namely the ability to independently modify the spatial and temporal statistics of sequence generation in a spatially localized manner (Section B.10, SMN). Specifically, we controlled the remote activation of the rewarded Home location by scaling the generator transition rates at home states according to v^{-1}O_{h·}, where h indexes home states and v is a scalar specifying their motivational value (Section 5.1.3, Methods). Initializing the rodent’s position away from the Home location and activating superdiffusive sequence generation in our EHC model (Fig. 3E-H) results in trajectory events reflecting random walks with biased jumps to the rewarded location (Fig. 3F,H). Using the same generator in the diffusive regime (Fig. 3I-L) does not over-represent the Home location (Fig. 3J,L). Comparing propagators between sequence generation regimes explains this remote activation as the localized increase in sampling probability exclusively at the Home location in the superdiffusive regime (Fig. 3M-P).
In another study of multi-goal foraging in a circular track environment11, it was observed that theta sequences exhibited a strong dependency on the currently targeted goal. As the rats initiated their behavioral trajectories, theta sequences exhibited a non-local, non-diffusive pattern of activation: place cells near the goal destination were frequently active along with place cells near the rat’s actual position within individual cycles. We constructed a generator O which encoded a goal-directed policy of clockwise movement around this circular track, with turn-offs to the three goal locations G1, G2, and G3 spaced at regular intervals. Assuming that the rodent is located at the start position, we generated sequences in the diffusive (α = 1, Fig. 4A) and superdiffusive (α = 0.5, Fig. 4B,C) regimes. We observed that the latter had a strong tendency to generate jumps to goal locations interleaved with local roll-outs, as observed empirically (Fig. 4D,E). This effect emerges from the fact that remote goal locations are over-represented in superdiffusive propagation (Fig. 4F). Taking the distance around the track to the furthest encoded location as the look-ahead distance, the distributions of look-aheads scaled with the distance to the target goal, as recorded in theta sequences (Fig. 4G). After placing the rodent at locations along the circular track before each of the goal turn-offs, and without changing any parameters, the sequences exhibited short-range look-aheads which did not scale with the distance to the targeted goal (Fig. 4H).
These effects emerge from the specific combination of superdiffusive sequence generation and a goal-sensitive generator. In the linear track (Fig. 2B-D) and open box (Fig. 2E-H) simulations where random walk generators were used, it was shown that the superdiffusive regime engenders spatially extended jumps between states. Due to the metric correspondence between space and time (set by a specific velocity), this can be equivalently stated as superdiffusions generating large jumps through time13. That is, superdiffusive sequence generation interleaves state transitions over short time-scales (resulting in small spatial steps) with state transitions over long time-scales (resulting in large jumps). Since a goal-directed policy results in goal locations being over-represented in the stationary distribution of internal simulations, the probability of sampling such states in a superdiffusion is relatively high. Therefore, as observed in Fig. 4F, superdiffusive propagation specifically over-weights the goal locations (regardless of the current position of the rodent, Fig. E3).
2.4. Generative cycling emerges from minimally autocorrelated sampling
A critical element of many planning algorithms is prospectively evaluating possible future trajectories. In order to accomplish this, model-based simulations are often employed such as in Monte Carlo Tree Search37. These methods rely on sampling sequences of states in a state-space, retrieving rewards associated with those states, and computing Monte Carlo estimates of choice-relevant objectives. For example, an agent can produce an estimate of the average reward expected to be accrued based on sampled states (x_1, …, x_N) and a reward function r(x) as
$$\hat{r} = \frac{1}{N} \sum_{n=1}^{N} r(x_n) \tag{8}$$
and then input such estimates into an action selection algorithm. Sequentially sampling states from a propagator, retrieving the associated rewards, and estimating the expected average reward via Eqn. 8 forms a Markov chain Monte Carlo (MCMC) algorithm in the service of planning (see Section C.4, SMN).
The quality of an MCMC estimator (Eqn. 8) is quantified by its sample variance, which reports how variable the estimate will be across different sample sequences17. A major source of sample variance in MCMC, which also afflicts the generator model in both the diffusive and superdiffusive regimes, is generative autocorrelations. Technically, the sample variance is proportional to the integrated autocorrelation time Δt_ac (Eqn. 68, SMN):
$$\Delta t_{\mathrm{ac}} = 1 + 2 \sum_{t=1}^{\infty} C_X(t) \tag{9}$$
where C_X(t) is the autocorrelation function of the state variable X at lag t. Intuitively, Δt_ac is the average number of iterations that a sampling algorithm requires in order to generate a single independent sample. Although samples may be generated on each iteration t = 1, 2, 3, …, we can expect new independent samples to be generated only on iterations t = Δt_ac, 2Δt_ac, 3Δt_ac, …. Standard practice in MCMC estimation, such as with the Metropolis-Hastings algorithm, is to simply discard autocorrelated samples (see Section C.4.1 SMN for further details). This brings into focus a sharp trade-off between sampling time and estimation accuracy, which necessarily burdens any cognitive function dependent on Monte Carlo estimation15.
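For intuition, Δt_ac can be estimated empirically from emitted sequences by truncating the sum in Eqn. 9 at a maximum lag. A minimal sketch, reusing numpy as np and assuming a scalar coding of states (e.g., positions along a track) with an arbitrary truncation at nine lags:

```python
def integrated_autocorrelation_time(xs, max_lag=9):
    """Estimate Delta t_ac = 1 + 2 * sum_t C_X(t) from one sequence (Eqn. 9)."""
    xs = np.asarray(xs, dtype=float) - np.mean(xs)
    c0 = np.mean(xs * xs)  # lag-zero autocovariance for normalization
    C = [np.mean(xs[:-t] * xs[t:]) / c0 for t in range(1, max_lag + 1)]
    return 1.0 + 2.0 * float(np.sum(C))
```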
We show that a fundamentally different solution to the autocorrelation problem of MCMC estimation is available for generator-based sampling. The need to dispose of correlated samples may be obviated by directly optimizing the power spectrum in order to minimize autocorrelations in the emitted sequences. The resulting sequences are then composed of approximately independent samples, thus facilitating rapid online simulation, estimation, and responsive action loops. Technically, we show that the integrated autocorrelation time Δt_ac of the state variable X can be expressed analytically in terms of the power spectrum s(k) (where k indexes the spectral components), and that the constraints necessary to ensure that the resulting propagator is valid are linear (Section C.5 SMN). Therefore, the minimally autocorrelated power spectrum s_mac(k) can be identified using standard optimization routines. We hypothesized that minimally autocorrelated sequence generation should result in a sampling process which methodically shifts between the most salient points of divergence under dynamical evolution of the system. As a counterpoint, consider the genesis of autocorrelations in diffusive sequences (Fig. 2E,G). In diffusions, nearby states tend to be closely associated within the sampling dynamics due to overlapping propagation distributions. Therefore, there tend to be relatively likely paths back to previously visited states, leading to a large integrated autocorrelation time Δt_ac. Minimally autocorrelated sequence generation can avoid this pitfall by restructuring its propagation dynamics in order to successively sample states which do not admit likely paths between them.
We studied this computational hypothesis in the context of a spatial alternation task where a rodent was required to make a binary decision at a junction, with only one choice leading to reward12. Alternating representations of hypothetical future trajectories were identified at several levels of neural organization in dorsal HC while the rats approached the critical junction (but not after the junction turn). In particular, place cells encoding the left and right arms fired in an alternating fashion within the theta band. Assuming that the rodent would initiate planning as it approached the junction where it is required to make a decision, we posited that hippocampal sequence generation would shift to a regime of minimally autocorrelated sampling (Fig. 5A). The power spectrum s_mac which minimized the integrated autocorrelation time (Eqn. 9) bore a strikingly dissimilar profile to the parametrically modulated power spectra s_{τ,α} (Fig. 5B). The most notable distinction was the emergence of counterweighted spectral components across spatial scales. The heaviest negative weighting applied to the spectral component encoding a high-level hierarchical decomposition of the environment, which we refer to as the dominant spectral component (DSC, Fig. 5C).
Despite the fact that the minimally autocorrelated propagator (Fig. 5D) samples states in both arms at the junction, similar to a diffusive propagator, minimally autocorrelated sequence generation subsequently deviated radically from diffusion. Sequences generated under minimally autocorrelated spectral modulation (orange, Fig. 5E) were strongly reminiscent of the reported generative cycling phenomenon in that successive state samples were repeatedly drawn from the opposing arm in the maze12. These stood in stark contrast to diffusive sequences generated both before and after the junction (red, Fig. 5E). Diffusively propagating into one of the arms means that it is relatively likely that a sequence will remain in that arm for a long time, thus increasing the generative autocorrelation. This would imply that the rodent’s internal simulation does not have sufficient diversity (in particular, it has not sampled the other arm) in order to make an informed decision. As predicted, estimated autocorrelations (Fig. 5F) were significantly lower in generatively cycling sequences (orange) compared to the diffusive (red) or superdiffusive (blue) regimes. In contrast to diffusive propagation (Fig. 5G), minimally autocorrelated sampling leverages the hierarchical structure of the environment in order to generate sequences that efficiently sample across dynamically divergent components of the environment based on repulsive propagation (Fig. 5H).
2.5. Diffusive hippocampal reactivations for structure consolidation
In contrast to the superdiffusive sequences observed during random and goal-directed foraging, offline hippocampal reactivations exhibit a diffusive operational mode during rest8. In an experiment where rats foraged for randomly dropped food pellets, spatial trajectories were decoded from SWRs during a post-exploration rest period (“sleep SWRs”) as well as during immobile pauses in active exploration (“wake SWRs”). The SWR sequence generation regime is statistically identified by estimating the mean displacement function (MD) md(t) = ⟨‖x_t − x_0‖⟩ from the decoded position sequences x = (x_0, …, x_t) (Section 5.3, Methods). On a log-log plot, the MDs of diffusions and superdiffusions are linearly related to time with different slopes α^{-1} determined by stability24. By studying the slope of estimated MDs, it can therefore be concluded that while wake SWRs were superdiffusive (α < 1, consistent with rodent movements), sleep SWRs encoded random walks (α = 1). Furthermore, sleep SWRs were recorded over a range of velocities, as parametrized by tempo τ in our model (with τ decreasing as velocity increases). Notably, SWR trajectory velocity was uniquely related to fast gamma power, suggesting that it may vary as a function of MEC input8,38. We reconcile these two SWR modes within our model from two perspectives. Mechanistically, we show that spectral modulation can interpolate between the distinct statistical regimes of SWRs associated with sleep versus wake by fitting the empirical MD measurements as a function of stability α (Fig. 6A), and that the empirical sleep SWR step distributions (Fig. 6B) are well-approximated by diffusive propagators (Fig. 6C).
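Applied to decoded or simulated sequences, the MD diagnostic amounts to averaging displacements from the starting state and reading off the log-log slope. A sketch, reusing numpy as np and assuming a hypothetical mapping coords from each state index to a coordinate array; the fit excludes the zero-displacement origin:

```python
def mean_displacement(trajectories, coords):
    """md(t) = <||x_t - x_0||> averaged over a list of state-index sequences."""
    T = min(len(x) for x in trajectories)
    return np.array([
        np.mean([np.linalg.norm(coords[x[t]] - coords[x[0]])
                 for x in trajectories])
        for t in range(T)
    ])

# md = mean_displacement(decoded_sequences, coords)
# ts = np.arange(1, len(md))
# slope, _ = np.polyfit(np.log(ts), np.log(md[ts]), 1)  # approx. 1/alpha
```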
Furthermore, in support of our computational hypothesis that the statistical dynamics of sequential hippocampal reactivations are flexibly altered between exploration or planning in the wake state and memory consolidation during rest, we studied performance metrics for exploration, learning, and Monte Carlo estimation algorithms as a function of the regime generating the input state sequences (Fig. 7A-D). We simulated a learning process whereby an environment representation, formalized as the successor representation (SR), is acquired through error-driven learning based on state sequences28,39. The SR is a predictive state representation whereby the representation of each state encodes the rate at which other states will be visited in the future (Section C.2 SMN). While superdiffusions provide better exploration efficiency (Fig. 7B), and minimally autocorrelated sampling exhibits the best sampling coverage (Fig. 7D), both are conspicuously inferior in terms of SR consolidation accuracy (Fig. 7C). Diffusive sequence generation resulted in the best learned approximation to the random walk SR (Fig. 7E-L). This result can be understood theoretically: diffusions embody fundamental spatial biases in their statistical structure (Section 5.1.7, Methods). Though we focused on consolidating the random-walk SR as reflecting a homogeneous predictive map in the absence of salient states such as rewards or landmarks, diffusive sequence generation is also relatively optimized for consolidating directed predictive maps (Fig. E4).
2.6. Dysregulated entorhinal input degrades spatiotemporal consistency of hippocampal activity
In order to flexibly shift between different regimes of sequence generation, our model requires that the spectral modulation of MEC activity be coherently balanced across grid modules. Dysregulated entorhinal activity may imbalance spectral modulation, thereby disrupting the spatiotemporal structure of hippocampal representations. In particular, spectral modulation with stability α > 1 defines a pathological regime of sequence generation which we refer to as turbulence. Despite seemingly minor differences in the power spectra from diffusion α = 1 to turbulence α = 2 (Fig. 8A), the generated sequences differ greatly. Turbulent sequences are highly irregular and fail to reflect the structure of the underlying space (Fig. 8B). In simulation, increasingly turbulent sampling propagates approximately uniformly across states independently of the initial position, and propagators may even fail to preserve the state probability density. We suggest that the erroneous modulation of grid cell activity may therefore contribute to psychopathologies in cognitive processes dependent on coherent sampling from internal representations.
In particular, a core positive symptom of schizophrenia is conceptual disorganization. Identifying conceptual representations with nodes in an internal semantic network, this formal thought disorder is indicative of a progressive degradation of the relational structure between nodes during sequence generation, consistent with that observed in the generator model as EHC dynamics are shifted deeper into the turbulent regime. The resultant disorder in the hippocampal layer of our model is reminiscent of that observed in mouse models of schizophrenia40. In one study, electrophysiological recordings of neural activity in the CA1 hippocampal subregion of forebrain-specific calcineurin knockout (KO) mice were acquired as the mice freely explored40. Such KO mice had previously been shown to exhibit several behavioral abnormalities reminiscent of those diagnosed in schizophrenia, such as impairments in working memory and latent inhibition. The knockout of plasticity-mediating calcineurin leads to a shift towards potentiation, and we sought to highlight a possible mechanism by which the resulting over-excitability may disrupt hippocampal SWRs through the lens of our theory. In particular, we model the effect of calcineurin knockout as an imbalance in the spectral regulation of entorhinal input into hippocampus. Notably, the relationship between the temporal displacement of spikes between distinct cells given the spatial displacement of their place fields was abolished under calcineurin knockout40. This loss of spatiotemporal coupling is the key quality of turbulent sequence generation. We made a direct quantitative comparison based on a spiking cross-correlation analysis41. In the diffusive regime, the cross-correlations between place cells as a function of the distance between their place fields embody the expected relationship between spatial and temporal displacements across the population code, whereby place cells at a greater distance from one another tend to activate after larger time intervals (Fig. 8C). Performing the same analysis with the same grid code but with a turbulent spectral modulation returns qualitatively distinct results (Fig. 8D), in which the cross-correlation is essentially independent of place field distance, consistent with a complete loss of sensitivity to the underlying spatial structure in sequence generation. While the former reflects the characteristic “V” pattern observed in healthy hippocampal activity (Fig. 8E), the latter replicates the key characteristic of disordered hippocampal sequences recorded from this mouse model of schizophrenia40 (Fig. 8F).
3. Discussion
Sequential hippocampal reactivations traversing cognitive maps are viewed as a potential neural substrate of internal simulation at the systems level5,13,42. We have sought to address some of the apparent variability in the structure and statistics of sequence generation in the EHC8,9,11,12,36,40,41. Our contributions are three-fold. A principled motivation for activating distinct sequence generation regimes depending on the current cognitive process was established through simulation and technical arguments. A linear feedback network model of EHC was proposed which implements the novel technique of spectrally modulating sequence generation. This provides a mechanistic account of how random walk processes may be smoothly interpolated with superdiffusive foraging patterns and minimally autocorrelated sampling. Applications of this model across a variety of environment geometries, behavioral states, and cognitive functions were shown to explain variations in the spatiotemporal structure of hippocampal sequence generation and behavior across a number of experiments.
An open computational question is how might the optimal sequence generation regime for a particular cognitive algorithm and environmental scenario be identified within the brain. We suggest that this arbitration problem is subsumed within the framework of computational rationality which seeks to understand how the parameters of resource-limited algorithms may be optimized43. Previous analyses within this broad remit have focused on optimizing the distribution from which samples are drawn during decision-making44 or how state sequence propagation should be initialized and directed for reinforcement learning16. Generators may be custom designed to embed such desirable algorithmic features within sequence generation.
Our neural network model is designed to provide conceptual clarity regarding how entorhinal spectral modulation may explain variations in hippocampal sequence generation. An integration of other anatomical and functional features of the broader entorhinal-hippocampal system into this model is warranted. Inspiration may be drawn from continuous attractor network models of theta sequences and replay45 and from MEC electrophysiological recordings, which have yielded evidence that MEC input plays a causal role in refining the temporal organization of hippocampal activity33,34. Our simulations are based on sampling from the propagated state distribution on each network iteration assuming a noisy readout to the hippocampal code. Possible extensions of this model include the sampling of multiple steps, or sub-sequences, on each iteration which may be regulated by intra-hippocampal processing between CA3 and CA17,34. Indeed, a more comprehensive generalization of our model would also operationalize the spectral modulation mechanism along the trisynaptic pathway based on the projections from MEC layer II46. With respect to theta-gamma coupling38,47, sequence generation may be embedded within gamma oscillations alternating between hetero-associative and auto-associative intra-hippocampal dynamics36.
The causal relationship between the power spectrum of MEC activity and the statistical dynamics of hippocampal sequence generation is a core prediction of our model that has not been directly tested to our knowledge. In particular, spectral modulation between diffusive sequence generation (associated with learning and consolidation during rest) and superdiffusive dynamics (expected during active exploration and goal-directed behavior) specifies distinct profiles of total grid population activity along the MEC dorsoventral axis potentially underpinned by variations in neural gain and grid rescaling. Furthermore, minimally autocorrelated hippocampal sequence generation (appropriate for sampling-based planning and inference) specifies novel non-parametric spectral motifs which may be identifiable experimentally.
5. Methods
5.1. Simulation details
5.1.1. Linear track simulation, Fig. 2
Propagator simulations were based on a generator defining random walks on a discretized linear track state-space composed of ten states. Propagation densities were interpolated between states. The (τ, α)-modulated power spectrum s_{τ,α}(λ) is defined as
$$s_{\tau,\alpha}(\lambda) = e^{-\tau^{-1} |\lambda|^{\alpha}} \tag{10}$$
and the relative (τ, α)-modulated power spectrum r_{τ,α}(λ) is then
$$r_{\tau,\alpha}(\lambda) = \frac{s_{\tau,\alpha}(\lambda)}{s_{1,1}(\lambda)} \tag{11}$$
The latter is normalized for each propagator to facilitate comparisons across propagators
$$\bar{r}_{\tau,\alpha}(\lambda_k) = \frac{r_{\tau,\alpha}(\lambda_k)}{\sum_{k'} r_{\tau,\alpha}(\lambda_{k'})} \tag{12}$$
This measures the fraction of the total power spectrum accounted for by each spectral component relative to the baseline (τ = 1, α = 1).
5.1.2. Pfeiffer & Foster, Science (2015), Fig. 2
A square enclosure containing no obstacles was represented by a 50 × 50 lattice of states. The stability parameter was set to α = 1 and α = 0.5 for diffusive and superdiffusive exploration respectively. The tempo parameter was set to τ = 1 in both cases. The simulation results reflect 20 sequences composed of 75 samples each. Superdiffusive exploration was more efficient than diffusive across a range of parameter values and changes in the size of the environment. The performance advantage of superdiffusion diminished as the environment size was reduced, as all states became locally accessible to diffusive exploration and global re-positioning was rendered unnecessary. In Fig. 2K, the step size was computed using the Euclidean distance between states embedded within an ambient continuous space.
5.1.3. Pfeiffer & Foster, Nature (2013), Fig. 3
The square arena containing no obstacles was discretely represented by a 25 × 25 lattice of states. The random walk generator O_rw (Section B.5, SMN) on this graph was modified to embed the rewarded Home location by scaling the generator transition rates at home states according to
$$O_{h\cdot} \leftarrow v^{-1} O_{h\cdot} \tag{13}$$
where h indexes home states and v is a scalar quantity specifying the motivational value of home states. This leverages a unique property of the generator formalism whereby the temporal localization of states can be controlled independently of spatial structure (Section B.10, SMN).
In our simulations, the stability parameter was set to α = 1 and α = 0.5 for diffusive and superdiffusive sequence generation respectively. The tempo parameter was set to τ = 5 in both cases and the motivational value parameter v = 100. Trajectory events were simulated via sequence generation emanating from a home location (“home-events”) and four away locations (“away-events”). The home location was specified as four adjacent states close to the center of the environment and the four away locations were placed in each corner. Twenty home-events and ten away-events per away location, each composed of ten samples, were generated. Sampling density was estimated from these trajectory events as the number of samples of each state divided by the total number of samples. This function was computed for each combination of condition (home and away) and sampling regime (diffusive and superdiffusive). The results were robust to variations in simulation parameters and the number of trajectory events.
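A sketch of this generator manipulation (Eqn. 13), assuming a generator matrix O_rw (e.g., a numpy array) and a hypothetical list home of home-state indices. Scaling an entire row preserves the zero row-sum, so the result remains a valid generator:

```python
def embed_home(O_rw, home, v=100.0):
    """Scale transition rates at home states: O_h. <- v^-1 O_h. (Eqn. 13)."""
    O = O_rw.copy()
    O[home, :] /= v  # slows outflow from home states, increasing their occupancy
    return O
```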
5.1.4. Wikenheiser & Redish, Nature Neuroscience (2015), Fig. 4
The circular track was discretized into 24 states with 3 further states connected at regular intervals on the outside of the circle serving as the goal locations. Consistent with the behaviors acquired by the rats during task training, the generator employed reflected a goal-directed behavioral bias in favor of anti-clockwise movement around the track and turn-offs to goal locations. For sequence generation targeting a particular goal, the generator reflected a bias towards staying at the goal location once it was reached in order to reflect the time taken for the reward to be delivered, consummatory behaviors, and trial termination. In the multi-goal scenario (Fig. E3), the generator transition structure facilitated a return to the main track after goal arrival, thus enabling multiple goal locations to be visited. Simulations of theta sequences (for both the individual sequence plots and look-ahead estimations) were composed of five iterations in order to match the number of decoded positions in the figure panels from Ref. [11]. As usual, stabilities α = 1 and α = 0.5 were used for diffusive and superdiffusive sequence generation respectively, with τ = 1 in both. Look-ahead distances were computed as the distance from the initial position to the furthest sampled position. The initial position was the start location marked by an X in the “Trajectory Initiation” simulations (Fig. 4G). In the “Goal Arrival” simulations, the initial position for a particular targeted goal was located in the center of the preceding track segment. The mean and standard error of the look-ahead distances were estimated from 50 samples (an arbitrary number ensuring a stable estimate). The spatial scale (i.e., the numeric distance between two adjacent states) was set in order to roughly match that of the circular track employed in the experiments11.
5.1.5. Kay et al., Cell (2020), Fig. 5
We assumed that the rodent’s decision-making policy reflected a directed run to the junction, a random choice to turn left or right, followed by a directed run to the end of the maze arm (where a reward might be available). A small transition rate opposing the directed transitions was added in order to ensure reversible and aperiodic propagation dynamics and numerically stable analyses. We initialized the spectrum in the diffusive regime and applied standard minimization routines (scipy.optimize.minimize) in order to solve the constrained optimization problem (Eqn. 84). We minimized the integrated autocorrelation time summed over nine lags. Autocorrelations were estimated from 100 generated sequences of 20 iterations each.
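A sketch of this optimization, reusing n, tau, and the spectral components G, W, lam from the ring example of Section 2.1. The scalar coding of states, the analytic form of the autocorrelation objective, and the choice of trust-constr are illustrative stand-ins for the analysis in Section C.5 (SMN):

```python
from scipy.optimize import LinearConstraint, minimize

pi = np.ones(n) / n                    # stationary distribution (symmetric O)
xvals = np.arange(n, dtype=float)      # scalar coding of the states
mu = pi @ xvals
var = pi @ (xvals - mu) ** 2

def dt_ac(s, lags=9):
    """Integrated autocorrelation time of X under propagator G diag(s) W."""
    total = 1.0
    for t in range(1, lags + 1):
        Pt = G @ np.diag(s ** t) @ W   # t-step propagator from the spectrum
        total += 2.0 * ((pi * (xvals - mu)) @ Pt @ (xvals - mu)) / var
    return total

# Validity constraints are linear in s: P_ij(s) = sum_k G_ik s_k W_kj >= 0,
# and the constant component (eigenvalue 0) is fixed so rows sum to one.
A = np.einsum('ik,kj->ijk', G, W).reshape(n * n, n)
nonneg = LinearConstraint(A, 0.0, np.inf)
unit_rows = LinearConstraint(np.eye(n)[[int(np.argmax(lam))]], 1.0, 1.0)

s0 = np.exp(lam / tau)                 # initialize in the diffusive regime
s_mac = minimize(dt_ac, s0, method="trust-constr",
                 constraints=[nonneg, unit_rows]).x
```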
Note that the planning objective (Eqn. 8) is based on the average-reward, infinite-horizon formulation of Markov decision processes51. This facilitates a simpler exposition and a more direct analogy to Markov chain Monte Carlo methods. For episodic and discounted MDPs, a similar but more complicated analysis may be pursued. In this case, the expected cumulative reward forms the value function objective requiring Monte Carlo estimation over sampled sequences. This may be accomplished via MCMC over the joint distribution over states across time points, or else by applying Sequential Monte Carlo methods. Importantly, the integrated autocorrelation time Δt_ac remains the key objective for sampling optimization since it is also reflected in the sample variance of the expected cumulative reward value function estimator.
5.1.6. Stella et al., Neuron (2019), Fig. 6
We hypothesized that, though the rodents’ physical experiences conformed to superdiffusive trajectories, the EHC recapitulated the environment experiences in the form of diffusive trajectories in order to facilitate accurate spatial consolidation, reconsolidation, and maintenance processes during sleep. This is because diffusive replay embodies fundamental inductive biases regarding space, namely that space has a smooth, localized, and isotropic structure. In contrast, superdiffusive trajectories are superior for foraging, as shown in simulation (Fig. 7) and theoretical studies35. Therefore, we also conjectured that, in contrast to sleep SWRs, SWRs occurring during immobile periods interleaved with active foraging (i.e., “wake SWRs”) may reflect non-local, superdiffusive spatial trajectories in order to leverage a cognitive map of the environment in support of exploration. In particular, a superdiffusive sequence of positions may be generated during pauses, which the rodent can then follow physically in order to search for the food pellets.
Although rodents were familiar with the foraging environment at the time of the electrophysiological recordings and thus only a modest amount of new information may be required to be consolidated, theoretical and computational studies have shown that off-line replay is crucial to maintaining precise storage and retrieval mechanisms for previously learned information14. This need is particularly pressing in order to avoid interference due to ongoing cortical plasticity as well as the consolidation of new information such as that drawn from experiences in novel environments as is the case in the protocol of the current experiment. Furthermore, from a spatial cognition perspective, off-line replay may contribute towards “map refinement”8. That is, increasing the accuracy and resolution, and reducing any residual uncertainty, with respect to the structure of cognitive maps that may already be consolidated to a degree.
In order to show that our model supports switching the hippocampal sampling regime between diffusion for consolidation and superdiffusion for exploration based on monosynaptic input into the CA1 region, we modeled the mean displacement curves of generated sequences under diffusion and superdiffusion. These curves were plotted as a function of the time interval Δt and compared to the empirical mean displacements for sleep and wake SWRs respectively (Fig. 6A). The mean displacement curves of sequence generation under the generator model were analytically computed (Eqn. 17, Section 5.3, Methods) as a function of the stability parameters α and diffusion constants K estimated in Ref. [8] (Fig. 6A). In order to focus on stability modulation, which specifically distinguishes between diffusion and superdiffusion, we averaged the empirical data over trajectory velocities (parametrized by τ in our model). In addition to SWR trajectories, we also plot the mean displacement curve for the median-velocity behavioral trajectories (see “behavior”, Fig. 6A). The spatial scale parameter σ was set to 4 in order to approximately match the ratio between the spatial (y-axis) and temporal (x-axis) scales. This parameter can be thought of as defining what constitutes a centimetre in our simulation and does not affect the relative slopes in mean displacement.
5.1.7. Sampling optimization, Fig. 7
A “ring-of-cliques” state-space with five cliques composed of ten states each was studied (Fig. 7A). The defining feature of this state-space is its community structure: within each clique, states are densely connected, whereas only sparse connections exist between cliques. Rings of cliques are commonly studied in the field of social psychology as a proxy for social networks, and it is known that human structure learning is sensitive to such community architectures50. Propagators were based on the random walk generator on this state-space. The power spectrum was parametrized such that the propagator shape was approximately equalized across conditions. This was accomplished by equating the modal probability (i.e., the “height”) of the propagator distributions. In particular, the stability parameter was set to α = 1 for diffusion and α = 0.3 for superdiffusion. These settings were paired with tempo parameters of τ = 20.7 and τ = 3.1 respectively. The integrated autocorrelation time over nine lags was minimized for the min-autocorrelation propagator. See Fig. E7 for a visualization of sample sequences.
For exploration, a single sequence of 100 samples was generated and interpolated into a complete behavioral trajectory. For SR consolidation, sequences of 50 states were generated. The learned SRs (panels F-H) reflect the estimated SR after 500 sequences. For sampling, 10 sequences of 10 states each were generated (resulting in 100 states sampled in total). These are conceptualized as chains running in parallel, as is commonly implemented in MCMC algorithms, and thus the sampling coverage was computed by integrating over the 10 sequences generated. For exploration and sampling, all sequences were initialized at a fixed state presumed to correspond to the position of the agent in the environment while, for learning, sequences were randomly initialized according to the stationary distribution, as if randomly sampling from memory during sleep. All curves reflect the mean and standard error over 50 simulations.
The performance rank order across generative regimes for each measure was robust with respect to variations in τ and α, the number of sequences and sampling iterations, and the parametrization of the environment (i.e., changes to the number of cliques or the number of states per clique). With regard to the latter, the graph diameter appeared to be an important factor in determining the gain in exploration efficiency for the superdiffusive regime relative to the min-autocorrelation regime (Fig. 7B). The graph diameter measures the minimum distance the agent must travel in the worst-case scenario. In the ring-of-cliques model, the graph diameter scales with the number of cliques. As the graph diameter increased and, consequently, large non-local “jumps” were more heavily penalized for distance, superdiffusions tended to excel. Therefore, one would expect that superdiffusive dynamics may be less important in small-world networks, which are characterized by connectivity structures admitting a short path between almost all pairs of nodes. Such networks are inconsistent with Euclidean spaces but are notably over-represented in social and transport networks. We also observed that running multiple chains in parallel increased the min-autocorrelator performance in sampling coverage relative to superdiffusions (Fig. 7D). This suggests that, consistent with MCMC intuitions, there is a cumulative effect of sampling diversity when autocorrelations are minimized. That is, the evolution of each chain increasingly diverges from other chains over sampling iterations. More formally, the sample cross-correlation across chains is low due to the generative minimum autocorrelation property.
The structure consolidation results can be understood theoretically. Diffusion is a random process with zero drift and finite scale (measured as propagation variance as a function of time). Therefore it encapsulates inductive biases regarding spatial structure, namely that space is isotropic and has a scale that distinguishes between local and global structure. Moreover, from an information-theoretic point of view, it is the simplest such random process, being the unique maximum entropy process with these properties52. This implies that diffusive sequences implicitly encode isotropy and scale, and nothing else. In contrast, the distinguishing feature of superdiffusive Lévy flight processes in unbounded Euclidean spaces is the infinite variance of their heavy-tailed propagation distributions24. This means that extraordinarily large jumps may be generated, and so the downstream learning process is unable to distinguish between local structure (learned from individual small jumps within short time periods) and global structure (learned from the accumulation of many small jumps over a period of time).
5.1.8. Karlsson & Frank, Nature Neuroscience (2009) and Suh et al., Neuron (2013), Fig. 8
The diffusion and turbulence regimes were parametrized by α = 1 and α = 2 respectively, with τ set to 20 in both cases. In a linear track environment, 100 state sequences with 100 steps were generated in each of these stochastic regimes. The average spiking activity of each place cell (each encoding a distinct state) was taken to be the output propagator density. The Euclidean distance between states (assuming the underlying graph is embedded in an ambient continuous space) was taken to be the distance between place fields. The relative spike timings were offsets in the number of steps within the generated sequences. Given these modeling assumptions, we computed normalized cross-correlograms of place cell activity as a function of the distance between place fields41. The key qualitative distinction between the cross-correlograms was robustly observed regardless of modifications to the other model parameters (e.g., τ, number of sequences, number of steps).
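A sketch of the cross-correlogram computation under these modeling assumptions (one place cell per state, indicator time series of state visits as spike-train proxies, field distance taken as the index separation on the linear track; normalization is omitted):

```python
def cross_correlograms(sequences, n, max_offset=20):
    """Map each place-field distance to summed pairwise cross-correlograms."""
    xc = {}
    for i in range(n):
        for j in range(i + 1, n):
            corr = np.zeros(2 * max_offset + 1)
            for seq in sequences:
                si = (np.asarray(seq) == i).astype(float)  # cell i activity
                sj = (np.asarray(seq) == j).astype(float)  # cell j activity
                for k, off in enumerate(range(-max_offset, max_offset + 1)):
                    if off >= 0:
                        corr[k] += np.sum(si[:len(si) - off] * sj[off:])
                    else:
                        corr[k] += np.sum(sj[:len(sj) + off] * si[-off:])
            dist = abs(i - j)  # place-field distance on the linear track
            xc[dist] = xc.get(dist, 0) + corr
    return xc
```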
In Fig. 8E, we present a cross-correlogram of SWRs recorded in healthy animals which clearly shows the predicted V-structure, consistent with a systematic relationship between temporal and spatial displacements during sequence generation. This plot is reproduced from Ref. [41]. Note that the cross-correlogram for the littermate controls is also presented in Ref. [40] (see Fig. 4C of Ref. [40]), but there the V-structure is less clear (presumably due to fewer SWRs and a smaller environment). We therefore include the V-structure from the prototypical analyses of Ref. [41], since it is easier to interpret on first viewing.
5.2. Evaluating sequence generation across exploration, learning, and planning
The central computational motivation for conceptualizing MEC as a modulator of hippocampal sequence generation is that distinct cognitive algorithms have fundamentally different requirements regarding the statistical and dynamical structure of the input sequences received from the EHC. To establish this empirically, we evaluated generated sequences with respect to metrics sensitive to the performance of exploration, learning, and sampling-based planning algorithms. These three metrics, namely exploration efficiency, consolidation accuracy, and sampling coverage, are defined in the following three sub-sections.
5.2.1. Exploration efficiency
In order to evaluate how appropriate a generative sampling regime is for exploration, we quantified exploration efficiency as the fraction of environment states or positions visited relative to the cumulative distance traversed. This definition is a simple adaptation to graph structures of the standard definition studied in the foraging literature53. Exploration efficiency is plotted as a function of the number of states sampled in Fig. 7B for diffusive, superdiffusive, and minimally autocorrelated sequence generation. Note that, if the generator samples successive non-adjacent states, the agent is assumed to visit all states along the shortest path between the sampled states, as if physically traversing the environment. In a discrete state-space, the distance is taken to be the number of steps taken, while in a continuous domain it is the Euclidean distance between positions.
Propagators may repeatedly sample the same state over multiple iterations, leading to a period of immobility for the agent. We assume that the cost of remaining in a state is zero since the agent has not traversed any distance. However, re-sampling the same state adds to the total time cost of exploration, defined as the total time taken to sample subsequent states and move to those states. Assuming that hippocampal state sampling is embedded within theta11 or slow-gamma36 cycles, new states are sampled at intervals on the order of centiseconds. In contrast, a rodent requires time periods on the order of seconds to move between sampled positions in a typical open box environment. Thus, the contribution of sampling time to total exploration time is negligible and is therefore not reflected in the exploration efficiency measure. Note that, in this regard, the definition of exploration efficiency contrasts sharply with the sampling coverage measure, which specifically penalizes time rather than distance.
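A minimal sketch of the exploration efficiency computation on a graph, assuming networkx (the function name and interface are ours):

```python
# Sketch: exploration efficiency = fraction of states visited divided by
# cumulative distance traversed; non-adjacent successive samples are
# charged the shortest-path distance, and the intermediate states along
# that path count as visited.
import networkx as nx

def exploration_efficiency(G, samples):
    visited = {samples[0]}
    distance, efficiency = 0.0, []
    for prev, s in zip(samples[:-1], samples[1:]):
        path = nx.shortest_path(G, prev, s)   # [prev] if s == prev (zero cost)
        distance += len(path) - 1
        visited.update(path)
        efficiency.append(len(visited) / len(G) / max(distance, 1.0))
    return efficiency
```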
5.2.2. Consolidation accuracy
We quantify how well a structural representation of an environment (in particular, a successor representation39) can be learned from replayed state sequences. As a measure of consolidation accuracy, we use the Spearman correlation between the true and learned state-space successor representations, which is plotted as a function of the number of sequences generated in Fig. 7C for each modulatory regime. Before learning commences, the prior successor representation is taken to be that generated by a fully-connected graph. This implies that the future expected rate of occupancy is completely homogeneous across initial and successor states a priori. The SR is learned via a standard temporal difference learning rule28,39.
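A sketch of this measure under the stated assumptions follows; the discount factor and learning rate below are illustrative choices of ours:

```python
# Sketch: TD learning of a successor representation (SR) from generated
# sequences, scored against the ground-truth SR by Spearman correlation.
import numpy as np
from scipy.stats import spearmanr

def true_sr(T, gamma=0.9):
    # Closed-form SR of transition matrix T: M = (I - gamma * T)^-1.
    return np.linalg.inv(np.eye(len(T)) - gamma * T)

def td_sr(sequences, n_states, gamma=0.9, lr=0.1):
    # Prior: the SR of a fully-connected graph (homogeneous occupancy).
    M = true_sr(np.full((n_states, n_states), 1.0 / n_states), gamma)
    for seq in sequences:
        for s, s_next in zip(seq[:-1], seq[1:]):
            target = np.eye(n_states)[s] + gamma * M[s_next]
            M[s] += lr * (target - M[s])      # standard TD update
    return M

def consolidation_accuracy(M_learned, M_true):
    return spearmanr(M_learned.ravel(), M_true.ravel()).correlation
```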
5.2.3. Sampling coverage
An important objective in sample-based inference (e.g., in planning37 and reinforcement learning54) is to generate a diversity of sampled states as quickly as possible, as this leads to robust and rapid estimators. Sampling coverage is defined as the fraction of environment states or positions visited relative to the number of sampling iterations (Fig. 7D). Note that, unlike the exploration efficiency measure, sampling coverage does not penalize the distance traversed between states, as it is presumed to reflect an internal sampling mechanism completed prior to any contingent behavior. It does, however, specifically penalize repeatedly sampling the same state, since repeated samples leave the fraction of states visited unchanged while the number of sampling iterations increases. Both exploration efficiency and sampling coverage are sensitive to the fraction of states visited or sampled, respectively. The major distinction between the two is that the former isolates the cost of distance traversed by the agent, while the latter penalizes the time taken to generate the sequence.
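Sampling coverage then reduces to a running count of distinct sampled states; a minimal sketch (interface ours):

```python
# Sketch: sampling coverage = fraction of distinct states sampled per
# sampling iteration; traversal distance is deliberately ignored.
def sampling_coverage(samples, n_states):
    seen, coverage = set(), []
    for s in samples:
        seen.add(s)                      # re-samples add nothing new
        coverage.append(len(seen) / n_states)
    return coverage                      # coverage[t] after t + 1 iterations
```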
5.3. Mean squared displacement and mean displacement as diagnostic measures of the sequence generation regime
Mean squared displacement (MSD), $\mathrm{msd}(t) = \langle |d(t)|^2 \rangle$, is a measure of the scaling law of a stochastic process and can be estimated from trajectory samples8,24. It is a time-dependent function which characterizes the relationship between spatial and temporal displacements. Intuitively, for any time period t, msd(t) measures how far away from the initial state at t = 0 sequence generation is expected to sample as a function of time. Given a trajectory $x = (x_0, \ldots, x_t)$ in Euclidean space, the measure of spatial displacement is taken to be the squared Euclidean distance with respect to the initial state:

$$|d(t)|^2 = \sum_{i=1}^{D} \left( x_{t,i} - x_{0,i} \right)^2 \tag{14}$$

where $x_{t,i}$ is the i-th position coordinate of the process at time t and D is the dimensionality of the state-space. It can be shown analytically24 that, for diffusive-type processes, MSD scales according to
$$\mathrm{msd}(t) = K t^{1/\alpha} \tag{15}$$

In log-space,

$$\log \mathrm{msd}(t) = \frac{1}{\alpha} \log t + \log K \tag{16}$$
and so, on a log-log plot, the intercept is the log of the diffusion constant K (which sets the rate of diffusion independently of τ and α modulation) and the slope is the inverse of the stability parameter α. If the slope of the MSD graph equals 1, the process is a diffusion. A slope greater than 1 indicates supralinear displacements, in which case the process is known as a superdiffusion (α < 1); this is the genesis of the term. Therefore, the stability regime of sequence generation can be inferred from sample trajectories by estimating msd(t). The key implication of a supralinear MSD is that a superdiffusion is expected to sample further away from its origin over time compared to a diffusion. When computing msd(t) for sequences generated by our model, we consider the Euclidean distance d induced by embedding the state-space within an ambient Euclidean space.
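A sketch of this diagnostic, estimating msd(t) from sampled trajectories and recovering K and α from a log-log fit of equation (16) (array shapes and function names are ours):

```python
# Sketch: estimate the MSD scaling law from trajectories and read off the
# diffusion constant K and stability parameter alpha via equation (16).
import numpy as np

def msd(trajectories):
    # trajectories: array of shape (n_trajectories, n_steps, D).
    disp = trajectories - trajectories[:, :1, :]     # displacement from t = 0
    return (disp ** 2).sum(axis=2).mean(axis=0)      # average over trajectories

def fit_scaling(msd_t):
    t = np.arange(1, len(msd_t))                     # skip t = 0 (log(0))
    slope, intercept = np.polyfit(np.log(t), np.log(msd_t[1:]), 1)
    return np.exp(intercept), 1.0 / slope            # estimates of (K, alpha)

# For a standard Brownian walk the fitted slope is ~1 (alpha ~ 1); a fitted
# slope above 1 diagnoses superdiffusive sequence generation (alpha < 1).
```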
The mean displacement (MD) is analyzed in Fig. 6 and Ref. [8]. It is defined as the MSD but without squaring the distance function:

$$\mathrm{md}(t) = \langle |d(t)| \rangle = K t^{1/(2\alpha)} \tag{17}$$

and therefore, in log-space, we have

$$\log \mathrm{md}(t) = \frac{1}{2\alpha} \log t + \log K \tag{18}$$
Statistical Analyses
No statistical analyses were conducted for this study.
Acknowledgements
This work was supported by the Center for Brains, Minds and Machines (funded by NSF STC award CCF-1231216) and the Wellcome Trust (Sir Henry Wellcome Fellowship 110257/Z/15/Z to DM). For the purpose of Open Access, the author has applied a CC BY public copyright licence to any Author Accepted Manuscript version arising from this submission. Thanks to Alberto Bernacchia, Quentin Huys, Zeb Kurth-Nelson, Matt Nour, Evan Russek, Hannah Sheahan, Hrvoje Stojic, Sina Tootoonian, and Elliott Wimmer for discussions and feedback. Thanks to Federico Stella and Jozsef Csicsvari for sharing their data and for performing additional analyses. Thanks to Ben Sorscher and colleagues for sharing their code.
Footnotes
Author Contributions
All authors contributed towards model conceptualization. DCM performed the simulations. DCM prepared a draft of the manuscript, which all authors reviewed and to which all authors contributed edits and comments.
Competing Interests
Daniel McNamee and Samuel Gershman declare no competing interests. Kimberly Stachenfeld and Matthew Botvinick are employed by Google DeepMind.
Reporting Summary
Further information on research design is available in the Life Sciences Reporting Summary linked to this article.
Data Availability
No data were acquired for this study.
Code Availability
Simulation code was written using open source packages in the Python 3 programming environment and can be found at https://github.com/dmcnamee/FlexModEHC.
References
- 1. Buzsáki G, Moser EI. Memory, navigation and theta rhythm in the hippocampal-entorhinal system. Nature Neuroscience. 2013;16:130–138. doi: 10.1038/nn.3304.
- 2. Rowland DC, Roudi Y, Moser MB, Moser EI. Ten Years of Grid Cells. Annual Review of Neuroscience. 2016;39. doi: 10.1146/annurev-neuro-070815-013824.
- 3. Lisman J, et al. Viewpoints: how the hippocampus contributes to memory, navigation and cognition. Nature Neuroscience. 2017. doi: 10.1038/nn.4661.
- 4. Ólafsdóttir HF, Bush D, Barry C. The Role of Hippocampal Replay in Memory and Planning. Current Biology. 2018;28:R37–R50. doi: 10.1016/j.cub.2017.10.073.
- 5. Foster DJ. Replay comes of age. Annual Review of Neuroscience. 2017;40:581–602. doi: 10.1146/annurev-neuro-072116-031538.
- 6. Klinzing JG, Niethard N, Born J. Mechanisms of systems memory consolidation during sleep. Nature Neuroscience. 2019. doi: 10.1038/s41593-019-0467-3.
- 7. Pfeiffer BE. The content of hippocampal “replay”. Hippocampus. 2020:1–13. doi: 10.1002/hipo.22824.
- 8. Stella F, Baracskay P, O’Neill J, Csicsvari J. Hippocampal Reactivation of Random Trajectories Resembling Brownian Diffusion. Neuron. 2019. doi: 10.1016/j.neuron.2019.01.052.
- 9. Pfeiffer BE, Foster DJ. Hippocampal place-cell sequences depict future paths to remembered goals. Nature. 2013;497:74–79. doi: 10.1038/nature12112.
- 10. Johnson A, Redish AD. Neural ensembles in CA3 transiently encode paths forward of the animal at a decision point. Journal of Neuroscience. 2007;27:12176–12189. doi: 10.1523/JNEUROSCI.3761-07.2007.
- 11. Wikenheiser AM, Redish AD. Hippocampal theta sequences reflect current goals. Nature Neuroscience. 2015;18. doi: 10.1038/nn.3909.
- 12. Kay K, et al. Constant Sub-second Cycling between Representations of Possible Futures in the Hippocampus. Cell. 2020. doi: 10.1016/j.cell.2020.01.014.
- 13. Buzsáki G, Tingley D. Space and Time: The Hippocampus as a Sequence Generator. Trends in Cognitive Sciences. 2018;22:853–869. doi: 10.1016/j.tics.2018.07.006.
- 14. Káli S, Dayan P. Off-line replay maintains declarative memories in a model of hippocampal-neocortical interactions. Nature Neuroscience. 2004;7:286–294. doi: 10.1038/nn1202.
- 15. Sanborn A, Chater N. Bayesian Brains without Probabilities. Trends in Cognitive Sciences. 2016;20:883–893. doi: 10.1016/j.tics.2016.10.003.
- 16. Mattar MG, Daw ND. Prioritized memory access explains planning and hippocampal replay. Nature Neuroscience. 2018. doi: 10.1038/s41593-018-0232-z.
- 17. Sokal A. Monte Carlo methods in statistical mechanics: foundations and new algorithms. Springer; 1997. pp. 131–192.
- 18. McNaughton BL, et al. Deciphering the hippocampal polyglot: the hippocampus as a path integration system. Journal of Experimental Biology. 1996;199:173–185. doi: 10.1242/jeb.199.1.173.
- 19. Burak Y, Fiete IR. Accurate path integration in continuous attractor network models of grid cells. PLoS Computational Biology. 2009;5. doi: 10.1371/journal.pcbi.1000291.
- 20. Ocko S, Hardcastle K, Giocomo L, Ganguli S. Emergent elasticity in the neural code for space. Proceedings of the National Academy of Sciences. 2018;115:E11798–E11806. doi: 10.1073/pnas.1805959115.
- 21. Bush D, Barry C, Manson D, Burgess N. Using Grid Cells for Navigation. Neuron. 2015;87:507–520. doi: 10.1016/j.neuron.2015.07.006.
- 22. Behrens T, et al. What Is a Cognitive Map? Organizing Knowledge for Flexible Behavior. Neuron. 2018;100:490–509. doi: 10.1016/j.neuron.2018.10.002.
- 23. Norris J. Markov Chains. Cambridge University Press; 1997.
- 24. Klages R, Radons G, Sokolov IM. Anomalous Transport: Foundations and Applications. John Wiley & Sons; 2008.
- 25. McNamee D, Wolpert DM. Internal Models in Biological Control. Annual Review of Control, Robotics, and Autonomous Systems. 2019;2:339–364. doi: 10.1146/annurev-control-060117-105206.
- 26. Weber MF, Frey E. Master equations and the theory of stochastic path integrals. Reports on Progress in Physics. 2017;80:046601. doi: 10.1088/1361-6633/aa5ae2.
- 27. McNamee D, Wolpert D, Lengyel M. Efficient state-space modularization for planning: theory, behavioral and neural signatures. Advances in Neural Information Processing Systems. 2016;29:4511–4519.
- 28. Stachenfeld K, Botvinick M, Gershman S. The hippocampus as a predictive map. Nature Neuroscience. 2017;20:1643–1653. doi: 10.1038/nn.4650.
- 29. Michelmann S, Staresina BP, Bowman H, Hanslmayr S. Speed of time-compressed forward replay flexibly changes in human episodic memory. Nature Human Behaviour. 2019;3:143–154. doi: 10.1038/s41562-018-0491-4.
- 30. Stensola H, et al. The entorhinal grid map is discretized. Nature. 2012;492:72–78. doi: 10.1038/nature11649.
- 31. Barry C, Ginzberg LL, O’Keefe J, Burgess N. Grid cell firing patterns signal environmental novelty by expansion. Proceedings of the National Academy of Sciences. 2012;109:17687–17692. doi: 10.1073/pnas.1209918109.
- 32. Mallory C, Hardcastle K, Bant J, Giocomo L. Grid scale drives the scale and long-term stability of place maps. Nature Neuroscience. 2018;21:270–282. doi: 10.1038/s41593-017-0055-3.
- 33. Schlesiger M, et al. The medial entorhinal cortex is necessary for temporal organization of hippocampal neuronal activity. Nature Neuroscience. 2015;18:1123–1132. doi: 10.1038/nn.4056.
- 34. Yamamoto J, Tonegawa S. Direct medial entorhinal cortex input to hippocampal CA1 is crucial for extended quiet awake replay. Neuron. 2017;96:217–227.e4. doi: 10.1016/j.neuron.2017.09.017.
- 35. Viswanathan GM, et al. Optimizing the success of random searches. Nature. 1999;401:911–914. doi: 10.1038/44831.
- 36. Pfeiffer BE, Foster DJ. Autoassociative dynamics in the generation of sequences of hippocampal place cells. Science. 2015;349:180–184. doi: 10.1126/science.aaa9633.
- 37. Browne CB, et al. A Survey of Monte Carlo Tree Search Methods. IEEE Transactions on Computational Intelligence and AI in Games. 2012;4:1–43.
- 38. Colgin L, et al. Frequency of gamma oscillations routes flow of information in the hippocampus. Nature. 2009;462:353–357. doi: 10.1038/nature08573.
- 39. Dayan P. Improving Generalization for Temporal Difference Learning: The Successor Representation. Neural Computation. 1993;5:613–624.
- 40. Suh J, Foster D, Davoudi H, Wilson M, Tonegawa S. Impaired hippocampal ripple-associated replay in a mouse model of schizophrenia. Neuron. 2013;80:484–493. doi: 10.1016/j.neuron.2013.09.014.
- 41. Karlsson M, Frank L. Awake replay of remote experiences in the hippocampus. Nature Neuroscience. 2009;12:913–918. doi: 10.1038/nn.2344.
- 42. Wikenheiser AM, Redish AD. Decoding the cognitive map: ensemble hippocampal sequences and decision making. Current Opinion in Neurobiology. 2015;32:8–15. doi: 10.1016/j.conb.2014.10.002.
- 43. Gershman SJ, Horvitz EJ, Tenenbaum JB. Computational rationality: A converging paradigm for intelligence in brains, minds, and machines. Science. 2015;349:273–278. doi: 10.1126/science.aac6076.
- 44. Lieder F, Griffiths T, Hsu M. Overrepresentation of extreme events in decision making reflects rational use of cognitive resources. Psychological Review. 2018;125:1–32. doi: 10.1037/rev0000074.
- 45. Kang L, DeWeese M. Replay as wavefronts and theta sequences as bump oscillations in a grid cell attractor network. eLife. 2019;8. doi: 10.7554/eLife.46351.
- 46. Witter MP, Canto CB, Couey JJ, Koganezawa N, O’Reilly KC. Architecture of spatial circuits in the hippocampal region. Philosophical Transactions of the Royal Society B: Biological Sciences. 2014;369:20120515. doi: 10.1098/rstb.2012.0515.
- 47. Lisman J, Jensen O. The Theta-Gamma Neural Code. Neuron. 2013;77:1002–1016. doi: 10.1016/j.neuron.2013.03.007.
- 48. Hills T, et al. Exploration versus exploitation in space, mind, and society. Trends in Cognitive Sciences. 2015;19:46–54. doi: 10.1016/j.tics.2014.10.004.
- 49. Botvinick MM, Niv Y, Barto AC. Hierarchically organized behavior and its neural foundations: a reinforcement learning perspective. Cognition. 2009;113:262–280. doi: 10.1016/j.cognition.2008.08.011.
- 50. Schapiro A, Rogers T, Cordova N, Turk-Browne N, Botvinick M. Neural representations of events arise from temporal community structure. Nature Neuroscience. 2013;16:486–492. doi: 10.1038/nn.3331.
- 51. Sutton R, Barto A. Reinforcement Learning: An Introduction. MIT Press; 2018.
- 52. Cover T, Thomas J. Elements of Information Theory. Wiley; 2006.
- 53. Viswanathan GM, Da Luz MG, Raposo EP, Stanley HE. The Physics of Foraging: An Introduction to Random Searches and Biological Encounters. Cambridge University Press; 2011.
- 54. Jinnai Y, Park JW, Abel D, Konidaris G. Discovering options for exploration by minimizing cover time. International Conference on Machine Learning; PMLR; 2019. pp. 3130–3139.