Neural kernels for recursive support vector regression as a model for episodic memory

Christian Leibold

doi:10.1007/s00422-022-00926-9

2022 Mar 29;116(3):377–386. doi: 10.1007/s00422-022-00926-9

PMCID: PMC9170657 PMID: 35348879

Abstract

Retrieval of episodic memories requires intrinsic reactivation of neuronal activity patterns. The content of the memories is thereby assumed to be stored in synaptic connections. This paper proposes a theory in which these are the synaptic connections that specifically convey the temporal order information contained in the sequences of a neuronal reservoir to the sensory-motor cortical areas that give rise to the subjective impression of retrieval of sensory motor events. The theory is based on a novel recursive version of support vector regression that allows for efficient continuous learning that is only limited by the representational capacity of the reservoir. The paper argues that hippocampal theta sequences are a potential neural substrate underlying this reservoir. The theory is consistent with confabulations and post hoc alterations of existing memories.

Supplementary Information

The online version contains supplementary material available at 10.1007/s00422-022-00926-9.

Keywords: Theta sequences, Episodic memory, Recursive support vector regression

Introduction

To retrieve episodic memories, brains need to elicit robust internal sequences of neuronal activity patterns that are linked to previous sensory-motor experiences. Thus, neural processes need to be in place that form such activity sequences as well as link them to sensory-motor areas while learning. Episodic memories are further known to be able to change over time by reconsolidation (Sara 2000; Nader et al 2000; Milekic and Alberini 2002; Alberini and Ledoux 2013), eventually even leading to false memories of events that never happened (Loftus 1992; Hyman Jr. et al 1995). This suggests that the architecture of episodic memory is versatile and local in time in the sense that any pair of memory items can be connected into a memory episode independent of context.

Since electrophysiological recordings in animals prohibit correlating activity sequences to introspective retrieval of episodic memories, memory-related activity sequences are typically studied in rodents in association with behavioral performance in navigational tasks (Lee and Wilson 2002; Karlsson and Frank 2009). Activity sequences of hippocampal place cells thereby have been reported to correlate to (Lee and Wilson 2002; Dragoi and Buzsáki 2006; Foster and Wilson 2007) and to causally explain (Jadhav et al 2012; Fernández-Ruiz et al 2019) memory-dependent navigation. Sequences have furthermore been found to exist even before a specific spatial experience has been made by an animal (Dragoi and Tonegawa 2011, 2014; Farooq and Dragoi 2019), suggesting that at least part of the learning process is about establishing synaptic connections between existing intrinsic neuronal sequences and the sensory-motor areas that represent the content of the memory episode.

The idea that multi-purpose intrinsic neuronal dynamics is used to represent time series of extrinsic events has been invented multiple times under the names of echo-state networks (Jaeger and Haas 2004), liquid computing (Maass et al 2002) and reservoir computing (Jaeger 2005; Schrauwen et al 2007; Lukoševičius and Jaeger 2009) and has proven to be both computationally powerful and versatile (Maass et al 2002; Sussillo and Abbott 2009), particularly, since multiple output functions can be learned on the same intrinsic activity trajectory (sequence) and played out in parallel.

There has been considerable previous work on how to construct a dynamical reservoir via the dynamics of neuronal networks (Haeusler and Maass 2007; Sussillo and Abbott 2009; Lazar et al 2009). Different learning rules for the synapses from the sequence reservoir to the output neurons have also been successfully explored, such as the perceptron rule (Maass et al 2002), a Hebb rule (Leibold 2020) or recursive least squares-derived rules (Williams and Zipser 1989; Stanley 2001; Jaeger and Haas 2004; Sussillo and Abbott 2009). The more general applicability of reservoir computing to neuroscience is, however, still limited because several open questions remain, particularly about how to relate reservoir computing ideas to neurophysiological data: For example, can sufficiently rich reservoirs be realized with spiking neuronal networks? How can one find out whether reservoir spiking activity bears meaningful representations in the sense of Marr’s second level, as opposed to just being a “liquid” black box? Can a regression type learning rule be neuronally implemented using local Hebbian principles? How can new information (including false memories) be added to an existing episode or specific memory items be deleted? Can physiologically plausible models realize the universal approximation property (Grigoryeva and Ortega 2018) or are the limits of learning already imposed by interference of weight updates at output synapses below the capacity limit (Amit and Fusi 1994)? Particularly the latter problem is of fundamental importance in applying reservoir computing ideas to brain activity data, since available recordings (and also most models) are generally restricted to only a relatively small number of neurons (limiting representational capacity), whereas a whole real brain has close to infinite capacity for all practical (here experimental) purposes. Finding a representation of reservoir activity would thus eliminate capacity limitations and allow for efficient representation of a huge set of sensory-motor experiences in the synaptic weights.

Here, I propose a neuronal implementation of recursive kernel support vector regression as an efficient one-shot learning rule that is only limited by the representational capacity of the dynamical reservoir and allows for importance scaling (as compared to only graceful decay). Kernels thereby allow reservoir activity to be interpreted as representations in the sense of Marr (Hermans and Schrauwen 2012), which adds to theoretical neuroscience by allowing for specific interpretations of neural activity. For example, I will argue that theta sequences of hippocampal place cells (Foster and Wilson 2007) implement a kernel that represents distance in time or space and that the integration of auditory nerve activity at different delays implements a kernel representing time for acoustic stimuli in a cochlear frequency band. Below the capacity limit, the learning rule implements the well-known recursive (Gauss-Legendre) least mean squares (or FORCE) rule (Sussillo and Abbott 2009) on the underlying neuronal patterns, showing that FORCE-learning is only limited by the capacity of the simulated or measured reservoir.

Results

Let us consider an episodic experience to be fully reflected by the sensory-motor evoked summed postsynaptic potentials $y_{t}^{(k)}$ at all involved neurons k for all points in time t. In order to store the episodic experience as a memory, the synaptic inputs $y_{t}^{(k)}$ need to be linked to a preexisting reservoir state $x_{t}$ such that, whenever $x_{t}$ is present afterward, the learned synaptic connections $w^{(k)}$ from the reservoir evoke the same depolarizations

\begin{matrix} y_{t}^{(k)} = x_{t}^{T} w^{(k)} \end{matrix}

without the presence of the original sensory-motor activity (Fig. 1), i.e., $w^{(k)}$ solve a regression problem with $x_{t}$ as regressors. Considering only depolarizations $y^{(k)}$ in such a one-layer feedforward network, one does not need to consider nonlinearities during spike generation and, with reasonable approximation, the model is an effectively linear network. It also should be noted that in this paper I do not intend to explain the nature of the preexisting sequences $x_{t}$ , and just assume that they exist. For the sake of simplicity, I further drop the neuron index k, since all considerations trivially generalize to multiple neurons.

Besides the scalar product in Eq. (1), biological feasibility imposes two more constraints on how one models learning. First, synaptic plasticity should be activity-dependent and therefore the weights should be a superposition of existing neuronal activity patterns,

\begin{matrix} w = \sum_{t = 1}^{P} x_{t} u_{t} = X u \end{matrix}

with $X = (x_{1}, \dots, x_{P})$ (see Sect. 4 on representer theorem). Second, the learning rule needs to be recursive, i.e., new input–output pairs $(x_{P + 1}, y_{P + 1})$ should be added such that Eq. (1) continues to hold for all previous patterns (no interference) up to the capacity limit, and memory decay beyond the capacity limit should be importance based. In short, the learning rule is supposed to identify the loads $u$ such that the outputs $y_{t}$ are exactly recovered by the model,

\begin{matrix} y = X^{T} X u . \end{matrix}

As long as the kernel matrix $K = X^{T} X$ is invertible (below the capacity limit), the solution for $u$ is exact and straightforward. For non-invertible or badly conditioned K (at or above the capacity limit), the standard approach would be to use the pseudo-inverse $K^{*}$ of K, which optimizes the mean squared deviation between output $y$ and model output $K K^{*} y$ and leads to the classical recursive least squares (RLS) algorithm if applied recursively. RLS on the loads $u$ , however, has two main disadvantages. First, RLS makes explicit use of time making it hard to modify memories by post hoc insertion of new detail within an existing memory sequence. Second, RLS on the loads $u$ is hard to interpret biologically.

I therefore suggest, as an alternative approach, to solve the regression problem by maximizing

\begin{matrix} W (u) = - \frac{1}{2} u^{T} K u + y^{T} u, \end{matrix}

which, for invertible K, yields the exact recovery condition Eq. (3), therefore justifying the use of $W$ as the underlying objective function. Moreover, the maximization problem from Eq. (4) can be derived as the dual problem of support vector regression for $ε$ -insensitive loss (see Sect. 3 and Vapnik 1995; Schölkopf and Smola 2002), further supporting the interpretation of regression.

Since support vector approaches translate to nonlinear models using the kernel trick $K_{nm} = x_{n}^{T} x_{m} \to κ (n, m)$ (Vapnik 1995), the model also provides a foundation for neural implementations of kernels, which can be considered as representations of the topological space spanned by n and m. In the same sense as Marr saw representations to be connected to the algorithmic level, the kernel represents the space of n and m in a sufficient way to fully specify the outlined regression algorithm, and thus, following (Hermans and Schrauwen 2012), I suggest to consider it as being the true neural representation of this space in contrast to considering representations as activity patterns in undersampled cell populations.

Maximizing $W$ results in an update rule for $u$ (see Sect. 3) that translates into a weight change $Δ w = X Δ u$ of

\begin{matrix} Δ w = \underbrace{(y_{P} - w^{T} x_{P})}_{e_{P}} \frac{N x_{P}}{x_{P}^{T} N x_{P}} , \end{matrix}

with $N = \mathbb{1} - X K^{- 1} X^{T}$ (where $\mathbb{1}$ denotes the identity matrix), and an iteration rule

\begin{matrix} N \leftarrow N - \frac{(N x_{P}) {(N x_{P})}^{T}}{x_{P}^{T} N x_{P}}, \end{matrix}

that is equivalent to RLS without forgetting (i.e., without regularization). The learning rule is one shot in the sense that, for any new pattern, the update rules have to be applied only once, and it allows for the functional interpretation error ( $e_{P}$ ) times novelty ( $N$ ): Because $\mathbb{1} - N$ is a projection operator (see Sect. 3), $N x_{P}$ will be 0 whenever $x_{P}$ equals one of the previous patterns already included in X, whereas any component of $x_{P}$ that is orthogonal to all patterns in X will be unaffected by $N$ . The action of $N$ can thus be computationally interpreted as novelty detection. For a naive learner ( $P = 0$ ), the rule is plain Hebbian, since the error equals the output and the novelty equals the input pattern. In Sect. 4, I will suggest a biologically feasible implementation of $N$ and its learning as anti-Hebbian updates of a recurrent neural network. Importantly, the translation into neuron space resulting in Eqs. (5) and (6) is only required to show how the learning rule can be biologically implemented. In contrast to RLS, it is not necessary to use these update rules for all ensuing applications, which are only relying on the numerically much more tractable update rule for the loads $u$ presented in Eq. (7) in Sect. 3.
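The weight-space rule of Eqs. (5) and (6) can be illustrated with a minimal NumPy sketch. It assumes random Gaussian patterns as stand-ins for the reservoir states and a pattern number below the number of neurons; the function name `learn_pattern`, the variable names and the small novelty threshold are illustrative choices, not part of the original model description.

```python
import numpy as np

def learn_pattern(w, N, x, y):
    """One-shot update for a single pair (x, y), following Eqs. (5) and (6)."""
    Nx = N @ x                          # novel component of x
    novelty = x @ Nx                    # x^T N x
    if novelty < 1e-10:                 # x lies in the span of stored patterns
        return w, N                     # skip the update (illustrative choice)
    error = y - w @ x                   # e_P = y_P - w^T x_P
    w = w + error * Nx / novelty        # Eq. (5): error times novelty
    N = N - np.outer(Nx, Nx) / novelty  # Eq. (6): shrink the novelty operator
    return w, N

rng = np.random.default_rng(0)
n_neurons, n_patterns = 50, 20          # below capacity: fewer patterns than neurons
X = rng.standard_normal((n_neurons, n_patterns))   # hypothetical reservoir patterns
y = rng.standard_normal(n_patterns)                # hypothetical target outputs

w = np.zeros(n_neurons)                 # naive learner
N = np.eye(n_neurons)                   # initially every input is fully novel
for t in range(n_patterns):
    w, N = learn_pattern(w, N, X[:, t], y[t])

recall = X.T @ w                        # outputs evoked by the stored patterns
```

Each pattern is presented exactly once, and under these assumptions `recall` should match `y` for all stored patterns up to numerical precision, since the novelty operator keeps new updates orthogonal to earlier patterns.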

As a first neuroscience application, I refer to hippocampal theta sequences (Fig. 2A): Roughly, one considers a subset of place cells to fire in sequence in every cycle of the hippocampal theta oscillation of the local field potential (about 8 Hz in rodents). In the subsequent cycle, the starting neuron of the previous cycle drops out of the sequence but a new neuron is added at the end of the sequence. Thus the activity patterns of close-by cycles are similar, whereas they become more and more distinct the further the cycles are spaced apart.

In the simple theta sequence model outlined above, the overlap (scalar product) of activity patterns decays linearly (see Sect. 3) implementing a kernel $K_{mn} = κ (n - m)$ as a function of the distance $n - m$ of the two cycles (Fig. 2B).

Inserting the triangular linear kernel from Fig. 2B into the learning rule derived by recursively maximizing $W$ , one can recover the original signal $y_{t}$ without simulating the underlying reservoir. Increasing detail of the original signal can be retrieved the more pairs ( $x_{t}, y_{t}$ ) one takes into account for learning (Fig. 2C). Since the kernel is a continuous function, the capacity has become infinite, i.e., any function $y_{t}$ can be recovered if the neuron number N becomes infinite.
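How retrieval can work on kernel values alone is sketched below. Combining Eqs. (1) and (2), the retrieved output at time t is $\sum_{n} u_{n} κ (t, t_{n})$ , so only the kernel has to be evaluated. The sketch assumes a triangular kernel $[S - | n - m |]^{+}$ with constant offset and scale dropped, a sine wave as a stand-in signal, regularly spaced learning points, and a direct solve of Eq. (3) instead of the recursive rule; all names and parameter values are illustrative.

```python
import numpy as np

def triangular_kernel(n, m, S=10.0):
    """Linearly decaying overlap [S - |n - m|]^+ (offset and scale dropped)."""
    return np.maximum(S - np.abs(np.subtract.outer(n, m)), 0.0)

T = 100
t_all = np.arange(T)
signal = np.sin(2 * np.pi * t_all / 50.0)    # stand-in for the signal y_t

t_train = t_all[::4]                         # the P time points used for learning
K = triangular_kernel(t_train, t_train)      # P x P kernel matrix
u = np.linalg.solve(K, signal[t_train])      # loads from exact recovery, y = K u

# Retrieval at every time step uses kernel values only, never the patterns x_t.
retrieved = triangular_kernel(t_all, t_train) @ u
```

At the learning points the retrieved values coincide with the signal; in between, the quality of the fit depends on how well the kernel width matches the time scale of the signal.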

As mentioned above, generalization to multiple neurons is trivial, and to illustrate let us consider each output neuron to reflect one RGB color channel of any pixel in a movie (1.3 million neurons). Using only 20 of 110 movie frames already allows for recovery of the movie snippet with a compression below 20% (Fig. 2D, E).

By construction, the learning rule has no explicit dependence on time; thus, the order in which pairs $(x_{t}, y_{t})$ are presented makes no difference to the final fit (Fig. 3A), which is not the case for the FORCE rule derived from classical least squares.

Fig. 3 — Post hoc addition of memory items. A Left: Retrieval (red) of a low-pass noise signal (see Fig. 2C) of length $T = 100$ for $P = 15$ randomly positioned inputs (circles). Right: Same as left after 35 further inputs (crosses) have been iteratively added to the learning process. B Illustration of A for post hoc insertion of a movie scene. Top: original movie sequence ( $P = 20$ ). Bottom: Movie sequence after a new scene has been inserted to the original snippet ( $P = 35$ ). Movies are provided in Videos S3 and S4

Biologically, this means that any episode can be post hoc modified by learning new pairs $(x_{t}, y_{t})$ with temporal contingencies reflected in the kernel arguments, generating a model of false memories (Fig. 3B).

Every memory system is finite and the way of forgetting fundamentally determines its usefulness for practical applications. A graceful decay of memories over time (Amit and Fusi 1994) is already quite an advantage over catastrophic forgetting in attractor networks (Hopfield 1982); however, the behavioral relevance of a memory may not just depend on how old or young it is. I therefore introduce an importance scaling into the learning rule in that loads $u_{t}$ are multiplied with some attenuation factor $0 \leq a_{t} \leq 1$ . Thus, if one chooses $a_{t} = λ^{(T - t)}, 0 < λ < 1$ , one retains a graceful decay over time as in standard RLS. The resulting learning rule that maximizes the modified $W$ is then obtained by only the small modification of replacing the kernel $κ (n, m)$ by $κ (n, m) a_{n} a_{m}$ (see Sect. 3). The effect of importance scaling is illustrated in Fig. 4A, B, where the learning rule is told to pay more attention to a certain time interval at the cost of worse reconstruction in other time intervals.

Fig. 4 — Importance scaling. A Retrieval (red) of a low-pass noise signal (black; see Fig. 2C) for attenuation parameters $a_{t} = λ^{| P^{*} - t |}$ with $λ = 0.999$ and varying importance centers $P^{*}$ (see titles). B Illustration using the movie snippet from Fig. 2 with importance in the beginning ( $a_{t} = λ^{t}$ , top) and in the end ( $a_{t} = λ^{110 - t}$ , bottom). In the image sequence on top one still sees an erroneous reflection of the glass in the last three images, whereas in the bottom sequence the glass in the first two frames shown erroneously displays the yellowish colors from the end. Movies are provided in Videos S5 and S6. C Left: Retrieval (red) of a low-pass noise signal (black) with $N = 20{,}000$ time steps (only shown between time step 6,000 and 6,500) and $P = 500$ patterns (crosses) with random importance values a (cyan) between 0.5 and 1. Middle: Reconstruction error (absolute difference between black and red line) negatively correlates with a for all $P = 500$ patterns. Right: Error has no dependence on time

Importance may randomly vary over time and thus temporal contingency in a values should not be a necessary prerequisite for importance scaling. Applying the learning rule in a scenario with random a values shows that retrieval error is indeed largest for small a independent of time (Fig. 4C). Post hoc increase of a could thus be considered as a model of memory consolidation, post hoc decrease of a as a model of extinction learning.

With importance scaling as a weighting mechanism at hand, let us now revisit the original capacity question. In the language of the recursive updating rules from Eqs. (7) and (8), the memory and computational demand scale with the square of the number of patterns P. A straightforward choice to limit the capacity is to introduce a cutoff dimension $d_{c}$ such that only the $d_{c}$ patterns with highest importance values a are stored in the algorithm and the other dimensions are set to 0. In Fig. 5A, B I vary $d_{c}$ for low-pass filtered noise signals of different length with linearly increasing importance toward the signal end and observe that for low $d_{c}$ , the reconstruction error increases relatively soon, whereas for $d_{c} \gtrsim 300$ reconstruction works well even for signal lengths up to 10 times larger than $d_{c}$ , which reflects that the geometry of the kernel fits the correlational structure of the signal.

Fig. 5 — Capacity. A Example reconstructions (green) of a signal (orange) for smaller (top) and larger (bottom) cutoff dimensions ( $d_{c} = 100$ and 300, respectively). Sample points used for reconstruction (signal length p = cycles / 2) are shown in blue. For the left panels the signal length equals the cutoff dimension. On the right the signal is 10 times longer than the cutoff (only 1000 data points shown). B Reconstruction error (root mean squared) for different cutoff dimensions (colors as indicated) as a function of signal length (solid lines indicate mean from 20 repetitions, shaded areas the 90 percent quantile). The results are derived from a low-pass noise signal with a running average over 100 time steps and a triangular kernel with length 25 time steps

The need to adjust the kernel length to the time scale of signal fluctuations suggests that more specific signal properties require more specifically designed kernels. In most neuroscience applications, sensory signals are not random but reflect physical constraints of the environment or the sensory periphery. As a next example I therefore consider functions with bandpass characteristics similar to cochlear frequency channels. Knowledge about the preferred local structure of a function (oscillations with a certain frequency) suggests a kernel with similar bandpass characteristics (see Sect. 3 and Fig. 6A). In contrast to the triangular linear kernel which only represents temporal distance, the band kernels represent temporal distance (by their decay) and frequency.

Fig. 6 — Sound reconstruction. A Kernels representing time in a frequency channel with center frequency on top (see Sect. 3 *Frequency kernels*). B Retrieval (red) of the signal (black) in five of the frequency channels (crosses mark memory items). C Reconstruction (red; moved upward for reasons of illustration) of the original sound signal (black; the beginning of the song http://ccmixter.org/files/texasradiofish/63300, CC BY NC) by summing over the filter-weighted channel components (see Sect. 3 *Frequency kernels*). Reconstructed sound file is provided in Audiofile S8, as well as the identically filtered original sound wave (Audiofile S7)

Learning is then performed on each cochlear frequency channel separately and the fitting benefits from both recovering the function values at a few points and the fine structure of the kernel. A post hoc synthesis across frequency channels recovers the original soundwave with high fidelity and smaller memory demand than the original sampling (see Sect. 3 Frequency kernels).

Methods

Recursive support vector regression

Linear support vector regression with $ε$ -insensitive loss (Schölkopf and Smola 2002; Vapnik 1995) is derived from minimizing the squared $L_{2}$ -norm $\frac{1}{2} {‖ w ‖}^{2}$ of the weight vector of the linear model $f (x) = w^{T} x + b$ under the inequality constraints $- (ε + ζ_{n}) \leq y_{n} - f (x_{n}) \leq ε + ζ_{n}^{*}$ , with $ζ_{n}, ζ_{n}^{*} \geq 0$ , and including the sum of slack variables $\sum_{n} (ζ_{n} + ζ_{n}^{*})$ as a regularizer.

The classical work has shown that the resulting optimal solution yields a weight vector of shape

\begin{matrix} w = \sum_{n} (α_{n}^{*} - α_{n}) x_{n} \end{matrix}

that maximizes the dual problem

\begin{matrix} W (u, v) = - \frac{1}{2} u^{T} K u + y^{T} u - ε \sum_{n} v_{n} \end{matrix}

with $K_{nm} = x_{n}^{T} x_{m}$ , $u_{n} = α_{n}^{*} - α_{n}$ , $v_{n} = (α_{n}^{*} + α_{n})$ under the constraints $α, α^{*} \geq 0$ . Hence, for every local maximum of $W$ regarding $u$ , there is a combination of $α_{n}, α_{n}^{*}$ that minimizes $\sum_{n} v_{n}$ , i.e., $α_{n} = 0$ if $u_{n} > 0$ and $α_{n}^{*} = 0$ if $u_{n} < 0$ . For $ε \to 0$ , the maximum in $(u, v)$ converges to $α_{n} = 0$ or $α_{n}^{*} = 0$ , and thus, in this limit, one can drop $v$ from the equations.

Here, a recursive learning rule is derived such that $W$ remains at this maximum if a new observation $(y_{P}, x_{P})$ is added. One therefore denotes $u^{T} = ({\tilde{u}}^{T}, u_{P})$ , $y^{T} = ({\tilde{y}}^{T}, y_{P})$ , and $\tilde{X} = (x_{1}, \dots, x_{P - 1})$ and finds the optimum of

\begin{matrix} W ({({\tilde{u}}^{T}, u_{P})}^{T}) = & - \frac{1}{2} {\tilde{u}}^{T} \tilde{K} \tilde{u} - u_{P} x_{P}^{T} \tilde{X} \tilde{u} \\ & - \frac{1}{2} u_{P}^{2} K_{PP} + {\tilde{y}}^{T} \tilde{u} + y_{P} u_{P} \end{matrix}

\begin{matrix} 0 = & \partial_{\tilde{u}} W = - \tilde{K} \tilde{u} - u_{P} {\tilde{X}}^{T} x_{P} + \tilde{y} \to \\ \tilde{u} = & {\tilde{K}}^{- 1} (\tilde{y} - u_{P} {\tilde{X}}^{T} x_{P}) \end{matrix}

and

\begin{matrix} 0 = & \partial_{u_{P}} W = - x_{P}^{T} \tilde{X} \tilde{u} - K_{PP} u_{P} + y_{P} \\ 0 = & y_{P} - x_{P}^{T} \tilde{X} {\tilde{K}}^{- 1} \tilde{y} - u_{P} (K_{PP} - x_{P}^{T} \tilde{X} {\tilde{K}}^{- 1} {\tilde{X}}^{T} x_{P}) \end{matrix}

If one denotes the optimum loads of the previous $P - 1$ inputs by ${\tilde{u}}^{'} = {\tilde{K}}^{- 1} \tilde{y}$ , one can express the optimality conditions using $x_{P}^{T} \tilde{X} {\tilde{u}}^{'} = x_{P}^{T} w$ , as

\begin{matrix} u_{P} = & \frac{y_{P} - x_{P}^{T} w}{K_{PP} - x_{P}^{T} \tilde{X} {\tilde{K}}^{- 1} {\tilde{X}}^{T} x_{P}} \\ \tilde{u} = & {\tilde{u}}^{'} - u_{P} {\tilde{K}}^{- 1} {\tilde{X}}^{T} x_{P} \end{matrix}

The update rules for $u$ from Eq. (7) require computation of the inverse of $\tilde{K}$ , which, a) is computationally costly and, b) biologically not straightforward. I therefore derived an iteration rule using the Sherman–Morrison–Woodbury identity (Nocedal and Wright 2006), which yields an iteration equation for $K^{- 1}$ from the $P - 1$ st to the Pth pattern

\begin{matrix} K^{- 1} = (\begin{matrix} {\tilde{K}}^{- 1} & 0 \\ 0^{T} & 0 \end{matrix}) + C_{P}^{- 1} (\begin{matrix} \tilde{Q} {\tilde{Q}}^{T} & - \tilde{Q} \\ - {\tilde{Q}}^{T} & 1 \end{matrix}) \end{matrix}

with $\tilde{Q} = {\tilde{K}}^{- 1} {\tilde{X}}^{T} x_{P}$ and $C_{P} = K_{PP} - x_{P}^{T} \tilde{X} {\tilde{K}}^{- 1} {\tilde{X}}^{T} x_{P}$ . The iteration equation (8) can be proven by elementary algebra ( $K^{- 1} K = \mathbb{1}$ , with $\mathbb{1}$ denoting the identity matrix).
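The recursion defined by Eqs. (7) and (8) can be sketched in NumPy as follows. The sketch assumes that the kernel values between the new pattern and the stored ones are available as a vector; the function name `add_pattern`, its argument names and the random test patterns are illustrative and not taken from the original implementation.

```python
import numpy as np

def add_pattern(K_inv, u, k_vec, k_PP, y_P):
    """Add one pattern given its kernel values, following Eqs. (7) and (8).

    K_inv : inverse kernel matrix of the P-1 stored patterns
    u     : optimal loads of the stored patterns
    k_vec : kernel values between the new pattern and the stored ones
    k_PP  : kernel value of the new pattern with itself
    y_P   : target output for the new pattern
    """
    Q = K_inv @ k_vec                        # Q = K~^{-1} X~^T x_P
    C = k_PP - k_vec @ Q                     # C_P
    u_P = (y_P - k_vec @ u) / C              # Eq. (7): load of the new pattern
    u_new = np.append(u - u_P * Q, u_P)      # Eq. (7): corrected previous loads
    P = len(u) + 1
    K_inv_new = np.zeros((P, P))             # Eq. (8): block update of the inverse
    K_inv_new[:-1, :-1] = K_inv + np.outer(Q, Q) / C
    K_inv_new[:-1, -1] = -Q / C
    K_inv_new[-1, :-1] = -Q / C
    K_inv_new[-1, -1] = 1.0 / C
    return K_inv_new, u_new

rng = np.random.default_rng(1)
X = rng.standard_normal((30, 12))            # hypothetical patterns: 30 neurons, 12 items
y = rng.standard_normal(12)
K = X.T @ X                                  # linear kernel; any valid kernel would do

# Start from the first pattern, then add the remaining ones recursively.
K_inv = np.array([[1.0 / K[0, 0]]])
u = np.array([y[0] / K[0, 0]])
for P in range(1, 12):
    K_inv, u = add_pattern(K_inv, u, K[:P, P], K[P, P], y[P])
```

Under these assumptions the recursion should reproduce the direct solution of $K u = y$ , while each step only requires matrix-vector products and one outer product.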

Remarks

Translation of update rules from Eq. (7) to weight updates $Δ w = X Δ u$ is straightforward:
$\begin{matrix} Δ w = & (\tilde{X}, x_{P}) (\begin{matrix} \tilde{u} - {\tilde{u}}^{'} \\ u_{P} \end{matrix}) = (\mathbb{1} - \tilde{X} {\tilde{K}}^{- 1} {\tilde{X}}^{T}) x_{P} u_{P} \\ = & (y_{P} - x_{P}^{T} w) \frac{(\mathbb{1} - \tilde{X} {\tilde{K}}^{- 1} {\tilde{X}}^{T}) x_{P}}{x_{P}^{T} (\mathbb{1} - \tilde{X} {\tilde{K}}^{- 1} {\tilde{X}}^{T}) x_{P}} ; \end{matrix}$
see result from Eq. (5).
$\mathbb{1} - N := \tilde{X} {\tilde{K}}^{- 1} {\tilde{X}}^{T}$ and $N$ are projection operators, since ${[\mathbb{1} - N]}^{2} = [\mathbb{1} - N]$ and $N^{2} = N$ .
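If $P - 1 \leq N$ (with N here denoting the number of neurons, not the novelty operator) and the patterns are linearly independent, $\tilde{K}$ is a Gramian of linearly independent vectors and, hence, invertible.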
For $P - 1$ exceeding N, $\tilde{K}$ can no longer be exactly inverted. Formally this is not necessary using a kernel representation, since the kernel operates on an infinite-dimensional Hilbert space. Biologically, for a finite number N of neurons, approximate inversion can be obtained by importance scaling (see below).
Recursively adding data points continuously increases the dimensions of the matrix $K^{- 1}$ and, hence, memory and computational costs. A brute force strategy to avoid this numerical divergence is to introduce a cutoff dimension, after which one removes the patterns with lowest importance values a. For all figures except Fig. 5, in which we explicitly study this parameter, we used a cutoff dimension of 300.

Importance scaling

Importance is introduced by attenuation factors $0 \leq a_{t} \leq 1$ that scale the inequality constraints of support vector regression: $- (ε + ζ_{n}) \leq a_{n} [y_{n} - f (x_{n})] \leq ε + ζ_{n}^{*}$ . If $a_{n}$ is small, slack variables can also be small and the pair $(y_{n}, x_{n})$ contributes little to the loss via the regularizer. The resulting optimal solution is very similar to the one without attenuation factors, only the weight vector is now

\begin{matrix} w = \sum_{t} u_{t} a_{t} x_{t} \end{matrix}

which, in the computation of the recursive learning rule, requires the replacement

\begin{matrix} κ (n, m) \to κ (n, m) a_{n} a_{m} . \end{matrix}

Biologically, this rule maps to an attenuation of the inputs $x_{t} \to a_{t} x_{t}$ . Thus, patterns with low $a_{t}$ are treated as more different to patterns with large $a_{t}$ , even if they have similar structure.
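The equivalence between scaling the kernel and attenuating the inputs can be checked numerically. The following sketch assumes random Gaussian patterns and random importance values as stand-ins; variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((40, 8))        # hypothetical patterns x_t as columns
a = rng.uniform(0.5, 1.0, size=8)       # importance values a_t

K = X.T @ X                             # unscaled kernel kappa(n, m)
K_scaled = K * np.outer(a, a)           # kappa(n, m) a_n a_m
K_attenuated = (X * a).T @ (X * a)      # kernel of the attenuated inputs a_t x_t
```

Both constructions give the same matrix, so the recursive rule can be run unchanged on the scaled kernel.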

The scaling of the kernel also has interesting consequences for situations in which K is no longer invertible ( $P > N$ ) if constructed from a finite population of neurons. In this case, one can nevertheless apply the iteration equation (8); however, patterns with small $a_{n}$ will contribute only little to $\tilde{Q}$ as the respective rows are scaled down in ${\tilde{X}}^{T}$ . The resulting matrix is hence no longer an exact inverse, but the patterns for which the “inversion” fails are mostly those with low $a_{n}$ . This is best illustrated by assuming $a_{n} = 0$ , in which case the pattern $x_{n}$ has no contribution to $\tilde{Q}$ and hence $K^{- 1}$ , as if it had not been used for learning. Functionally modulating plasticity with a also allows a post hoc improvement in an existing episodic memory, by assigning a higher importance $a_{n}$ to this pattern if the episode is presented a second time.

Theta sequences

Sparse binary random patterns $ξ_{n}$ with Prob( $ξ_{n}^{(k)} = 1$ ) $= f$ are assumed to represent hippocampal ensembles that fire together at a specific phase of the theta cycle. Given that S of those ensembles are activated in sequence during a theta cycle the population pattern in cycle t equals

\begin{matrix} x_{t} = \sum_{k = 0}^{S - 1} ξ_{t + k} \end{matrix}

For a population of N neurons, the overlap of two such patterns can be computed as

\begin{matrix} K_{nm} = & \sum_{k, k^{'}} ξ_{n + k}^{T} ξ_{m + k^{'}} \underset{N \to \infty}{\longrightarrow} {[S - | n - m |]}^{+} ⟨ ξ^{2} ⟩ N \\ & + (S^{2} - {[S - | n - m |]}^{+}) {⟨ ξ ⟩}^{2} N . \end{matrix}

For independent binary random variables, one finds $⟨ ξ ⟩ = ⟨ ξ^{2} ⟩ = f$ , and thus the overlap is a linear triangular kernel

\begin{matrix} K_{nm} = & K (| n - m |) \\ = & N ({[S - | n - m |]}^{+} f (1 - f) + {(S f)}^{2}) \end{matrix}

as depicted in Fig. 2.
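The triangular shape of the overlap can be checked with a small simulation of the theta sequence model. The parameter values below (population size, coding ratio f, sequence length S, number of cycles) are illustrative assumptions and not the values used for the figures.

```python
import numpy as np

rng = np.random.default_rng(3)
n_neurons, f, S, n_cycles = 20000, 0.1, 5, 12

# Sparse binary ensembles xi_n; the pattern of cycle t sums S consecutive ensembles.
xi = (rng.random((n_cycles + S - 1, n_neurons)) < f).astype(float)
x = np.stack([xi[t:t + S].sum(axis=0) for t in range(n_cycles)])

K_empirical = x @ x.T                   # overlaps between all pairs of cycles

# Prediction of the triangular kernel for cycle distance d = |n - m|.
d = np.abs(np.subtract.outer(np.arange(n_cycles), np.arange(n_cycles)))
K_theory = n_neurons * (np.maximum(S - d, 0) * f * (1 - f) + (S * f) ** 2)
```

For a population of this size the empirical overlaps are expected to follow the prediction up to fluctuations of a few percent, which shrink further as the number of neurons grows.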

Frequency kernels

The cochlea separates a sound s(t) into frequency channels that roughly act as band-pass filters and can thus be characterized by a filter kernel $γ_{f} (t)$ , with f denoting the center frequency of the cochlear channel. If one assumes multiple ( $k = 1, \dots, K$ ) auditory nerve fibers to connect to such a frequency channel the linear response of each of those fibers can be modeled as $x_{t}^{(k)} = c_{f} (t - Δ^{(k)}) = (γ_{f} * s) (t - Δ^{(k)})$ with a fiber-specific delay $Δ^{(k)}$ that may reflect differences in fiber lengths, diameters or myelination.

For a large number K of fibers the resulting kernel can be computed as an integral

\begin{matrix} \sum_{k} x_{t}^{(k)} x_{t^{'}}^{(k)} \approx \int d Δ c_{f} (t - Δ) c_{f} (t^{'} - Δ) \\ = \int d u c_{f} (u) c_{f} (t^{'} - t + u) = κ (t^{'} - t), \end{matrix}

which corresponds to the autocorrelation of cochlear response, and for long broadband signals s equals the autocorrelations of the filters $γ_{f}$ . The exponentially decaying kernel used in Fig. 6 reflects exactly such a prototypical autocorrelation.
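That summing over delayed fibers yields a kernel depending only on the time difference can be illustrated in an idealized discrete setting. The sketch assumes one fiber per integer delay with circular shifts (so that the sum over delays covers the whole signal) and smoothed noise as a stand-in for the channel response $c_{f}$ ; neither assumption is taken from the original simulations.

```python
import numpy as np

rng = np.random.default_rng(4)
L = 256
# Stand-in for the channel response c_f: smoothed noise (illustrative assumption).
c = np.convolve(rng.standard_normal(L), np.hanning(16), mode="same")

# One fiber per delay; fibers[k, t] = c[(t - k) mod L].
fibers = np.stack([np.roll(c, delta) for delta in range(L)])

K = fibers.T @ fibers                   # sum over fibers of x_t^(k) x_t'^(k)

# Circular autocorrelation kappa(lag) = sum_u c[u] c[(u + lag) mod L].
kappa = np.array([c @ np.roll(c, -lag) for lag in range(L)])
lag = (np.arange(L)[None, :] - np.arange(L)[:, None]) % L   # lag[t, t'] = t' - t
K_from_kappa = kappa[lag]
```

In this idealized setting the two matrices agree exactly, mirroring the substitution $u = t - Δ$ in the integral above.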

Specifically, a sound signal (the beginning of the CC BY NC song I’ll be your everything by Texas Radio Fish, http://ccmixter.org/files/texasradiofish/63300) was passed through a gammatone filterbank consisting of seven channels (center frequencies $2^{k} \times 200$ Hz, $k = 0, \dots, 6$ ) with width constants $2.019\,\mathrm{ERB}$ (Glasberg and Moore 1990). In each of the channels $ρ_{k}$ data points per cycle (equally spaced) were selected for learning. The parameters $ρ_{k}$ were channel (k-)dependent and equaled 6, 4, 3, 3, 1.5, 1, 0.25 for $k = 0, \dots, 6$ . The recursive KSVR was fitted in each channel independently in chunks of 500 data points.

For full audio reconstruction, the reconstructed signals were Fourier-transformed in each band and divided by the Fourier transforms of the respective gammatone filter kernel omitting frequencies below 10 Hz and above 20 kHz. These filter-corrected components were backtransformed, summed and rescaled to the root mean square level of the original signal.

Discussion

Kernel support vector regression (KSVR) is a powerful tool for function fitting. Here, I presented a biologically plausible neural implementation of recursive KSVR that enables storing episodic memories as temporal sequences of retrieved sensory-motor activity patterns $y_{t}$ (i.e., fitting $y_{t}$ ). The kernels can be biologically interpreted as scalar products of activity patterns $x_{t}$ of a reservoir and provide a neural representation of temporal distance.

Hippocampal theta sequences provide a well-known example that realizes exactly such a reservoir. However, already in the hippocampus, neuronal activity not only consists of sequence-type activity, but also exhibits rate modulations induced by changes in the sensory environment generally known as remapping (Muller and Kubie 1987; Leutgeb et al 2005; Fetterhoff et al 2021). Thus, behavior-related neuronal activity may always contain both reflections $W x_{t}$ of the reservoir and feedforward sensory-motor drive, thereby balancing expectations (i.e., reservoir-driven activity) and sensory reality. This combination of top-down and bottom-up input streams is widely considered to be a general design principle of the neocortex (Douglas and Martin 2004; Larkum 2013), resulting in sensory-motor activity patterns $y_{t}$ at the same time reflecting stimulus-driven responses and intrinsic dynamics as, for example, reflected by synfire chains (Abeles et al 1993).

While the view of neocortex as a hierarchical combination of sensory-motor prediction loops (Ahissar and Kleinfeld 2003) is probably a good proxy of the neurobiological substrate, it is not widely explored in classical artificial neural network research. There, the universal approximation theorem, as a hallmark result, states that neural networks can approximate any function to an arbitrary degree of precision (Cybenko 1989; Hornik 1991), which rather views brains as feedforward function-fitting devices. The field of reservoir computing has extended this idea toward the temporal domain by suggesting that intrinsic neural dynamics represent a time axis as the independent variable of function fitting (Jaeger 2005), thereby allowing neural networks to generate predictions that vary with time. However, to be able to operate on a continuous stream of sensory inputs, the learning rules for the output synapses of the reservoir need to be able to update recursively (Williams and Zipser 1989; Stanley 2001; Sussillo and Abbott 2009), which requires a biological interpretation of the common least-mean-square-derived ideas.

Here, I suggest that the iterative update of the projection operator $N$, which only requires outer products, can be implemented as anti-Hebbian learning in a simple recurrent neuronal network: In the neural space of synaptic weights, $X K^{-1} X^{T} = \mathbb{1} - N$ is of outer product form, as seen from Eq. (8). The matrix $N = \mathbb{1} - X K^{-1} X^{T}$ can thus be interpreted as the connectivity of a recurrent neural network that is learned by anti-Hebbian updates, i.e.,

\begin{matrix} N = \mathbb{1} - \sum_{t = 1}^{P - 1} r_{t} r_{t}^{T} \end{matrix}

with $r_{P} = N x_{P} / \sqrt{x_{P}^{T} N x_{P}}$ . Since $X K^{- 1} X^{T}$ is a projection matrix (see Sect. 3), one furthermore can write $r_{t} = x_{t}^{⊥} / ‖ x_{t}^{⊥} ‖$ with $x_{t}^{⊥}$ being the component of $x_{t}$ that is orthogonal to all previously learned patterns.

This leads to the following interpretation of $r$ as the activity of a neural network in discrete time $s$:

\begin{matrix} r (s + 1) = ϕ [δ_{s, 0} x + N r (s)], \end{matrix}

where the network is initialized at $r (s = 0) = 0$, the input $x$ is present only at time step $s = 0$, and $ϕ (z) = z / ‖ z ‖$. As a result of these dynamics, $r (s) = x_{t}^{⊥} / ‖ x_{t}^{⊥} ‖$ for all time steps $s > 1$. This dynamical fixed point state will then produce an anti-Hebbian weight update according to Eq. (9).
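The network dynamics and the anti-Hebbian update described above can be checked numerically with a short sketch. It assumes a linear kernel $K = X^{T} X$ and linearly independent patterns (otherwise the normalization would divide by zero); the function names `settle` and `learn_pattern` and the chosen sizes are hypothetical.

```python
import numpy as np


def settle(N, x, steps=3):
    """Iterate r(s+1) = phi[delta_{s,0} x + N r(s)] from r(0) = 0, phi(z) = z / ||z||."""
    r = np.zeros_like(x)
    for s in range(steps):
        z = (x if s == 0 else 0.0) + N @ r   # the input x is present only at s = 0
        r = z / np.linalg.norm(z)
    return r


def learn_pattern(N, x):
    """Anti-Hebbian update N <- N - r r^T with r the settled network state."""
    r = settle(N, x)
    return N - np.outer(r, r)


rng = np.random.default_rng(0)
dim, P = 50, 5                       # hypothetical sizes, chosen for illustration
X = rng.standard_normal((dim, P))    # columns are the reservoir patterns x_t

N = np.eye(dim)                      # no patterns learned yet
for t in range(P):
    N = learn_pattern(N, X[:, t])

# Compare with the closed form 1 - X K^{-1} X^T, where K = X^T X.
K = X.T @ X
N_closed = np.eye(dim) - X @ np.linalg.solve(K, X.T)
print(np.allclose(N, N_closed))
```

Three iterations suffice here: the first step normalizes the input, the second applies $N$ once, and further steps leave the state unchanged because $N$ is a projection. The procedure amounts to a Gram-Schmidt orthogonalization carried out by the recurrent network.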

A further drawback of RLS-derived rules was their lack of a theoretical foundation, since they made explicit use of the reservoir patterns that, for technical reasons, were limited to a small subsample of neurons. Here, I use the generalized representer theorem (Schölkopf et al 2001) to translate the weight update into an update rule for the loads (coefficients) $u$ of the input patterns $X$ and thereby avoid an explicit representation of the neural feature space $x_{t}$, requiring only a kernel representation instead (Hermans and Schrauwen 2012). Formulating the learning rule on the loads allows analytical insights for reservoirs of size $N \to \infty$, and also reduces the computational demand of simulating (or recording from) a large number of neurons.
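The point that the loads make an explicit weight vector unnecessary can be illustrated with a small sketch. It assumes the representer form $w = X u$ and a linear kernel $K = X^{T} X$, and, for brevity, obtains the loads by plain interpolation ($K u = y$) rather than by the support vector regression objective used in the paper; all sizes and variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
dim, P = 200, 8                      # hypothetical reservoir size and number of patterns
X = rng.standard_normal((dim, P))    # columns: reservoir patterns x_t
y = rng.standard_normal(P)           # one output channel y_t to be fitted

K = X.T @ X                          # kernel (Gram) matrix of scalar products x_t^T x_t'

# Loads u solve K u = y; plain interpolation stands in for the SVR objective here.
u = np.linalg.solve(K, y)

# Explicit weights follow from the loads, w = X u ...
w = X @ u
# ... but the readout never needs them: w^T x_t = sum_t' u_t' K[t', t].
y_from_weights = X.T @ w
y_from_kernel = K @ u

print(np.allclose(y_from_weights, y_from_kernel))
```

The kernel route only involves $P \times P$ quantities, so its cost does not depend on the number of reservoir neurons once the kernel is known.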

Importantly, this paper considers reservoir activity only in the context of memory retrieval but not replay of reservoir sequences. Replay in the context of reservoirs has often been used to improve performance and stability (Mayer and Browne 2004; Jaeger 2010; Sussillo and Abbott 2012; Reinhart and Jakob Steil 2012; Laje and Buonomano 2013; Jaeger 2017; Leibold 2020). However, changing reservoir patterns would also require changing the readout matrix to maintain the originally learned memory traces $y_{t}$ (Sussillo and Abbott 2012; Reinhart and Jakob Steil 2012; Jaeger 2017). In the context of the model presented here, relearning is not necessary as long as the kernel remains fixed, i.e., the topology of the space is constant. Neurobiologically, however, such a trick would require changing the weights $w$ by replacing the matrix $X$.

I presented two neurobiological examples of how kernel representations are or may be implemented: hippocampal theta sequences and auditory nerve fiber populations. Temporal sequences of activation patterns, however, are ubiquitous in sensory-motor systems and occur on multiple time scales. Thus, the proposed theory may also apply to a multitude of other examples. A prerequisite is to find a continuous representation of time in the population patterns that then translates via a scalar product into kernels with continuous time dependence. Further such examples could be the long-term changes of the hippocampal rate code of place cells (Mankin et al 2012; Ziv et al 2013), activation of cerebellar Purkinje cells during limb movements (Hewitt et al 2011), or olfactory-driven activity that evolves along fixed trajectories after odor presentation (Stopfer and Laurent 1999).

Supplementary Information

Below is the link to the electronic supplementary material.

Download video file^{(388.9KB, mp4)}

Original movie snippet from .re_potemkin, a copyleft crowdsourcing free/open source cinema project (https://re-potemkin.httpdot.net/)

Download video file^{(250.7KB, mp4)}

Reconstruction of the original movie snippet (Video 1).

Download video file^{(337KB, mp4)}

Original movie with time extended in the middle.

Download video file^{(590.4KB, mp4)}

As Supplemental Video 3 but with a new scene added post hoc.

Download video file^{(461KB, mp4)}

Reconstruction of original movie (Video 1) with importance reduced in the last frames.

Download video file^{(322.4KB, mp4)}

Reconstruction of original movie (Video 1) with importance reduced in the first frames.

Download audio file^{(4MB, wav)}

Filtered version (see Methods) of the original sound snippet from the song I’ll be your everything by Texas Radio Fish (http://ccmixter.org/files/texasradiofish/63300, CC BY NC)

Download audio file^{(4MB, wav)}

Reconstruction of Supplemental Audio 1.

Acknowledgements

The work was funded by the German Research Foundation (DFG) under Grant number LE2250/13-1. The author is indebted to Stefan Häusler for discussions and comments on the manuscript.

Funding

Open Access funding enabled and organized by Projekt DEAL.

Repositories

The core recursive KSVR implementation can be found at https://github.com/cleibold/recsvr.

Declarations

Conflict of interest

The author declares no competing interests.

Footnotes

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

Abeles M, Bergman H, Margalit E, et al. Spatiotemporal firing patterns in the frontal cortex of behaving monkeys. J Neurophysiol. 1993;70(4):1629–1638. doi: 10.1152/jn.1993.70.4.1629.
Ahissar E, Kleinfeld D. Closed-loop neuronal computations: focus on vibrissa somatosensation in rat. Cereb Cortex. 2003;13(1):53–62. doi: 10.1093/cercor/13.1.53.
Alberini CM, Ledoux JE. Memory reconsolidation. Curr Biol. 2013;23(17):R746–750. doi: 10.1016/j.cub.2013.06.046.
Amit DJ, Fusi S. Learning in neural networks with material synapses. Neural Comput. 1994;6(5):957–982. doi: 10.1162/neco.1994.6.5.957.
Cybenko G. Approximation by superpositions of a sigmoidal function. Math Control Signals Syst. 1989;2(4):303–314. doi: 10.1007/BF02551274.
Douglas RJ, Martin KA. Neuronal circuits of the neocortex. Annu Rev Neurosci. 2004;27:419–451. doi: 10.1146/annurev.neuro.27.070203.144152.
Dragoi G, Buzsáki G. Temporal encoding of place sequences by hippocampal cell assemblies. Neuron. 2006;50(1):145–157. doi: 10.1016/j.neuron.2006.02.023.
Dragoi G, Tonegawa S. Preplay of future place cell sequences by hippocampal cellular assemblies. Nature. 2011;469(7330):397–401. doi: 10.1038/nature09633.
Dragoi G, Tonegawa S. Selection of preconfigured cell assemblies for representation of novel spatial experiences. Philos Trans R Soc Lond B Biol Sci. 2014;369(1635):20120522. doi: 10.1098/rstb.2012.0522.
Farooq U, Dragoi G. Emergence of preconfigured and plastic time-compressed sequences in early postnatal development. Science. 2019;363(6423):168–173. doi: 10.1126/science.aav0502.
Fernández-Ruiz A, Oliva A, Fermino de Oliveira E, et al. Long-duration hippocampal sharp wave ripples improve memory. Science. 2019;364(6445):1082–1086. doi: 10.1126/science.aax0758.
Fetterhoff D, Sobolev A, Leibold C. Graded remapping of hippocampal ensembles under sensory conflicts. Cell Rep. 2021;36(11):109661. doi: 10.1016/j.celrep.2021.109661.
Foster DJ, Wilson MA. Hippocampal theta sequences. Hippocampus. 2007;17(11):1093–1099. doi: 10.1002/hipo.20345.
Glasberg BR, Moore BC. Derivation of auditory filter shapes from notched-noise data. Hear Res. 1990;47(1–2):103–138. doi: 10.1016/0378-5955(90)90170-T.
Grigoryeva L, Ortega JP. Echo state networks are universal. Neural Netw. 2018;108:495–508. doi: 10.1016/j.neunet.2018.08.025.
Haeusler S, Maass W. A statistical analysis of information-processing properties of lamina-specific cortical microcircuit models. Cereb Cortex. 2007;17(1):149–162. doi: 10.1093/cercor/bhj132.
Hermans M, Schrauwen B. Recurrent kernel machines: computing with infinite echo state networks. Neural Comput. 2012;24(1):104–133. doi: 10.1162/NECO_a_00200.
Hewitt AL, Popa LS, Pasalar S, et al. Representation of limb kinematics in Purkinje cell simple spike discharge is conserved across multiple tasks. J Neurophysiol. 2011;106(5):2232–2247. doi: 10.1152/jn.00886.2010.
Hopfield JJ. Neural networks and physical systems with emergent collective computational abilities. Proc Natl Acad Sci USA. 1982;79(8):2554–2558. doi: 10.1073/pnas.79.8.2554.
Hornik K. Approximation capabilities of multilayer feedforward networks. Neural Netw. 1991;4(2):251–257. doi: 10.1016/0893-6080(91)90009-T.
Hyman IE, Jr, Husband TH, Billings FJ. False memories of childhood experiences. Appl Cognit Psychol. 1995;9(3):181–197. doi: 10.1002/acp.2350090302.
Jadhav SP, Kemere C, German PW, et al. Awake hippocampal sharp-wave ripples support spatial memory. Science. 2012;336(6087):1454–1458. doi: 10.1126/science.1217230.
Jaeger H. Reservoir riddles: suggestions for echo state network research. In: Proceedings of 2005 IEEE international joint conference on neural networks, vol 3. 2005. pp. 1460–1462. doi: 10.1109/IJCNN.2005.1556090.
Jaeger H. Reservoir self-control for achieving invariance against slow input distortions. 2010.
Jaeger H. Using conceptors to manage neural long-term memories for temporal patterns. J Mach Learn Res. 2017;18(13):1–43.
Jaeger H, Haas H. Harnessing nonlinearity: predicting chaotic systems and saving energy in wireless communication. Science. 2004;304(5667):78–80. doi: 10.1126/science.1091277.
Karlsson MP, Frank LM. Awake replay of remote experiences in the hippocampus. Nat Neurosci. 2009;12(7):913–918. doi: 10.1038/nn.2344.
Laje R, Buonomano DV. Robust timing and motor patterns by taming chaos in recurrent neural networks. Nat Neurosci. 2013;16(7):925–933. doi: 10.1038/nn.3405.
Larkum M. A cellular mechanism for cortical associations: an organizing principle for the cerebral cortex. Trends Neurosci. 2013;36(3):141–151. doi: 10.1016/j.tins.2012.11.006.
Lazar A, Pipa G, Triesch J. SORN: a self-organizing recurrent neural network. Front Comput Neurosci. 2009;3:23. doi: 10.3389/neuro.10.023.2009.
Lee AK, Wilson MA. Memory of sequential experience in the hippocampus during slow wave sleep. Neuron. 2002;36(6):1183–1194. doi: 10.1016/S0896-6273(02)01096-6.
Leibold C. A model for navigation in unknown environments based on a reservoir of hippocampal sequences. Neural Netw. 2020;124:328–342. doi: 10.1016/j.neunet.2020.01.014.
Leutgeb JK, Leutgeb S, Treves A, et al. Progressive transformation of hippocampal neuronal representations in “morphed” environments. Neuron. 2005;48(2):345–358. doi: 10.1016/j.neuron.2005.09.007.
Loftus EF. When a lie becomes memory’s truth: memory distortion after exposure to misinformation. Curr Dir Psychol Sci. 1992;1(4):121–123. doi: 10.1111/1467-8721.ep10769035.
Lukoševičius M, Jaeger H. Reservoir computing approaches to recurrent neural network training. Comput Sci Rev. 2009;3(3):127–149. doi: 10.1016/j.cosrev.2009.03.005.
Maass W, Natschläger T, Markram H. Real-time computing without stable states: a new framework for neural computation based on perturbations. Neural Comput. 2002;14(11):2531–2560. doi: 10.1162/089976602760407955.
Mankin EA, Sparks FT, Slayyeh B, et al. Neuronal code for extended time in the hippocampus. Proc Natl Acad Sci USA. 2012;109(47):19462–19467. doi: 10.1073/pnas.1214107109.
Mayer NM, Browne M. Echo state networks and self-prediction. In: Ijspeert AJ, Murata M, Wakamiya N, editors. Biologically inspired approaches to advanced information technology. Berlin: Springer; 2004. pp. 40–48.
Milekic MH, Alberini CM. Temporally graded requirement for protein synthesis following memory reactivation. Neuron. 2002;36(3):521–525.
Muller RU, Kubie JL. The effects of changes in the environment on the spatial firing of hippocampal complex-spike cells. J Neurosci. 1987;7(7):1951–1968. doi: 10.1523/JNEUROSCI.07-07-01951.1987.
Nader K, Schafe GE, Le Doux JE. Fear memories require protein synthesis in the amygdala for reconsolidation after retrieval. Nature. 2000;406(6797):722–726. doi: 10.1038/35021052.
Nocedal J, Wright SJ. Numerical optimization. 2nd ed. New York: Springer; 2006.
Reinhart RF, Jakob Steil J. Regularization and stability in reservoir networks with output feedback. Neurocomputing. 2012;90:96–105. doi: 10.1016/j.neucom.2012.01.032. Advances in artificial neural networks, machine learning, and computational intelligence (ESANN 2011).
Sara SJ. Retrieval and reconsolidation: toward a neurobiology of remembering. Learn Mem. 2000;7(2):73–84. doi: 10.1101/lm.7.2.73.
Schölkopf B, Smola AJ. Learning with kernels: support vector machines, regularization, optimization, and beyond. Adaptive computation and machine learning. MIT Press; 2002. http://www.worldcat.org/oclc/48970254
Schölkopf B, Herbrich R, Smola AJ. A generalized representer theorem. In: Helmbold D, Williamson B, editors. Computational learning theory. Berlin: Springer; 2001. pp. 416–426.
Schrauwen B, Verstraeten D, Campenhout JMV. An overview of reservoir computing: theory, applications and implementations. In: ESANN 2007, 15th European symposium on artificial neural networks, Bruges, Belgium, April 25–27, 2007, Proceedings. 2007. pp. 471–482. https://www.elen.ucl.ac.be/Proceedings/esann/esannpdf/es2007-8.pdf
Stanley GB. Recursive stimulus reconstruction algorithms for real-time implementation in neural ensembles. Neurocomputing. 2001;38–40:1703–1708. doi: 10.1016/S0925-2312(01)00535-5. Computational Neuroscience: Trends in Research 2001.
Stopfer M, Laurent G. Short-term memory in olfactory network dynamics. Nature. 1999;402(6762):664–668. doi: 10.1038/45244.
Sussillo D, Abbott LF. Generating coherent patterns of activity from chaotic neural networks. Neuron. 2009;63(4):544–557. doi: 10.1016/j.neuron.2009.07.018.
Sussillo D, Abbott LF. Transferring learning from external to internal weights in echo-state networks with sparse connectivity. PLoS One. 2012;7(5):e37372. doi: 10.1371/journal.pone.0037372.
Vapnik VN. The nature of statistical learning theory. New York: Springer; 1995.
Williams RJ, Zipser D. A learning algorithm for continually running fully recurrent neural networks. Neural Comput. 1989;1(2):270–280. doi: 10.1162/neco.1989.1.2.270.
Ziv Y, Burns LD, Cocker ED, et al. Long-term dynamics of CA1 hippocampal place codes. Nat Neurosci. 2013;16(3):264–266. doi: 10.1038/nn.3329.


