2022 Mar 29;116(3):377–386. doi: 10.1007/s00422-022-00926-9

Neural kernels for recursive support vector regression as a model for episodic memory

Christian Leibold
PMCID: PMC9170657  PMID: 35348879

Abstract

Retrieval of episodic memories requires intrinsic reactivation of neuronal activity patterns. The content of the memories is thereby assumed to be stored in synaptic connections. This paper proposes a theory in which these are specifically the synaptic connections that convey the temporal order information contained in the sequences of a neuronal reservoir to the sensory-motor cortical areas that give rise to the subjective impression of retrieving sensory-motor events. The theory is based on a novel recursive version of support vector regression that allows for efficient continuous learning limited only by the representational capacity of the reservoir. The paper argues that hippocampal theta sequences are a potential neural substrate of this reservoir. The theory is consistent with confabulations and post hoc alterations of existing memories.

Supplementary Information

The online version contains supplementary material available at 10.1007/s00422-022-00926-9.

Keywords: Theta sequences, Episodic memory, Recursive support vector regression

Introduction

To retrieve episodic memories, brains need to elicit robust internal sequences of neuronal activity patterns that are linked to previous sensory-motor experiences. Thus, neural processes need to be in place that form such activity sequences and link them to sensory-motor areas during learning. Episodic memories are also known to change over time through reconsolidation (Sara 2000; Nader et al 2000; Milekic and Alberini 2002; Alberini and Ledoux 2013), eventually even leading to false memories of events that never happened (Loftus 1992; Hyman Jr. et al 1995). This suggests that the architecture of episodic memory is versatile and local in time, in the sense that any pair of memory items can be connected into a memory episode independent of context.

Since electrophysiological recordings in animals cannot be related to introspective reports of episodic memory retrieval, memory-related activity sequences are typically studied in rodents in association with behavioral performance in navigational tasks (Lee and Wilson 2002; Karlsson and Frank 2009). Activity sequences of hippocampal place cells have thereby been reported to correlate with (Lee and Wilson 2002; Dragoi and Buzsáki 2006; Foster and Wilson 2007) and to causally explain (Jadhav et al 2012; Fernández-Ruiz et al 2019) memory-dependent navigation. Sequences have furthermore been found to exist even before an animal has had a specific spatial experience (Dragoi and Tonegawa 2011, 2014; Farooq and Dragoi 2019), suggesting that at least part of the learning process consists of establishing synaptic connections between existing intrinsic neuronal sequences and the sensory-motor areas that represent the content of the memory episode.

The idea that multi-purpose intrinsic neuronal dynamics can be used to represent time series of extrinsic events has been proposed independently several times, under the names echo-state networks (Jaeger and Haas 2004), liquid computing (Maass et al 2002) and reservoir computing (Jaeger 2005; Schrauwen et al 2007; Lukoševičius and Jaeger 2009), and has proven to be both computationally powerful and versatile (Maass et al 2002; Sussillo and Abbott 2009), particularly because multiple output functions can be learned on the same intrinsic activity trajectory (sequence) and played out in parallel.

There has been considerable previous work on how to construct a dynamical reservoir from the dynamics of neuronal networks (Haeusler and Maass 2007; Sussillo and Abbott 2009; Lazar et al 2009). Different learning rules for the synapses from the sequence reservoir to the output neurons have also been explored successfully, such as the perceptron rule (Maass et al 2002), a Hebb rule (Leibold 2020) or rules derived from recursive least squares (Williams and Zipser 1989; Stanley 2001; Jaeger and Haas 2004; Sussillo and Abbott 2009). The broader applicability of reservoir computing to neuroscience is, however, still limited, because several open questions remain, particularly about how to relate reservoir computing ideas to neurophysiological data. For example, can sufficiently rich reservoirs be realized with spiking neuronal networks? How can one determine whether reservoir spiking activity carries meaningful representations in the sense of Marr’s second level, as opposed to just being a “liquid” black box? Can a regression-type learning rule be implemented neuronally using local Hebbian principles? How can new information (including false memories) be added to an existing episode, or specific memory items be deleted? Can physiologically plausible models realize the universal approximation property (Grigoryeva and Ortega 2018), or are the limits of learning already imposed by interference of weight updates at output synapses below the capacity limit (Amit and Fusi 1994)? The latter problem in particular is of fundamental importance when applying reservoir computing ideas to brain activity data, since available recordings (and most models) are restricted to a relatively small number of neurons (limiting representational capacity), whereas a whole brain has close to infinite capacity for all practical (here, experimental) purposes. Finding a representation of reservoir activity would thus eliminate capacity limitations and allow for efficient representation of a huge set of sensory-motor experiences in the synaptic weights.

Here, I propose a neuronal implementation of recursive kernel support vector regression as an efficient one-shot learning rule that is limited only by the representational capacity of the dynamical reservoir and allows for importance scaling (rather than only graceful decay). Kernels thereby allow reservoir activity to be interpreted as representations in the sense of Marr (Hermans and Schrauwen 2012), which adds to theoretical neuroscience by enabling specific interpretations of neural activity. For example, I will argue that theta sequences of hippocampal place cells (Foster and Wilson 2007) implement a kernel that represents distance in time or space, and that the integration of auditory nerve activity at different delays implements a kernel representing time for acoustic stimuli in a cochlear frequency band. Below the capacity limit, the learning rule implements the well-known recursive (Gauss-Legendre) least squares (or FORCE) rule (Sussillo and Abbott 2009) on the underlying neuronal patterns, showing that FORCE learning is limited only by the capacity of the simulated or measured reservoir.

Results

Let us consider an episodic experience to be fully reflected by the sensory-motor-evoked summed postsynaptic potentials $y_t^{(k)}$ at all involved neurons $k$ for all points in time $t$. In order to store the episodic experience as a memory, the synaptic inputs $y_t^{(k)}$ need to be linked to a preexisting reservoir state $x_t$ such that, whenever $x_t$ is present afterward, the learned synaptic connections $w^{(k)}$ from the reservoir evoke the same depolarizations

$$y_t^{(k)} = x_t^T w^{(k)} \tag{1}$$

without the presence of the original sensory-motor activity (Fig. 1); i.e., the $w^{(k)}$ solve a regression problem with the $x_t$ as regressors. Since only the depolarizations $y^{(k)}$ of this one-layer feedforward network are considered, nonlinearities of spike generation can be ignored and, to a reasonable approximation, the model is an effectively linear network. It should also be noted that in this paper I do not intend to explain the nature of the preexisting sequences $x_t$ and simply assume that they exist. For simplicity, I further drop the neuron index $k$, since all considerations trivially generalize to multiple neurons.

Fig. 1.

Conceptual overview. At any instant in time, let $q$ encode any sensory-motor experience of an agent (human, animal or machine). A neocortical representation $y(q)$ of this experience is evoked by sensory afferents and motor efference copies. The current state $x$ of a reservoir (e.g., in the hippocampal formation) is linked to the temporally coincident experience $q$ by synaptic learning of the connections $W$ from the reservoir to the neocortex. The synaptic change is thereby proportional to the error signal $y - Wx$; see Eq. (5). During retrieval, the reservoir state $x$ previously associated with a real or confabulated experience $\hat q$ evokes a corresponding neocortical representation $y(\hat q)$.

Besides the scalar product in Eq. (1), biological feasibility imposes two more constraints on how one models learning. First, synaptic plasticity should be activity-dependent and therefore the weights should be a superposition of existing neuronal activity patterns,

$$w = \sum_{t=1}^{P} x_t u_t = X u \tag{2}$$

with $X = (x_1, \ldots, x_P)$ (see Sect. 4 on the representer theorem). Second, the learning rule needs to be recursive, i.e., new input–output pairs $(x_{P+1}, y_{P+1})$ should be added such that Eq. (1) continues to hold for all previous patterns (no interference) up to the capacity limit, and memory decay beyond the capacity limit should be importance-based. In short, the learning rule is supposed to identify the loads $u$ such that the outputs $y_t$ are exactly recovered by the model,

$$y = X^T X u. \tag{3}$$

As long as the kernel matrix $K = X^T X$ is invertible (below the capacity limit), the solution for $u$ is exact and straightforward. For non-invertible or badly conditioned $K$ (at or above the capacity limit), the standard approach would be to use the pseudo-inverse $K^{+}$ of $K$, which minimizes the mean squared deviation between the output $y$ and the model output $K K^{+} y$ and leads to the classical recursive least squares (RLS) algorithm if applied recursively. RLS on the loads $u$, however, has two main disadvantages. First, RLS makes explicit use of time, which makes it hard to modify memories by post hoc insertion of new details within an existing memory sequence. Second, RLS on the loads $u$ is hard to interpret biologically.

As an alternative approach, I therefore suggest solving the regression problem by maximizing

$$W(u) = -\tfrac{1}{2} u^T K u + y^T u, \tag{4}$$

which, for invertible $K$, yields the exact recovery condition Eq. (3) (setting the gradient $y - Ku$ to zero), thereby justifying the use of $W$ as the underlying objective function. Moreover, the maximization problem of Eq. (4) can be derived as the dual problem of support vector regression with ε-insensitive loss (see Sect. 3 and Vapnik 1995; Schölkopf and Smola 2002), further supporting its interpretation as regression.
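As a quick numerical check of this claim, the following minimal sketch (plain Python, no dependencies) performs gradient ascent on $W(u)$ from Eq. (4) for three stored items and confirms that the maximizer satisfies $Ku = y$. The triangular kernel, its width, the step size and the sample values are illustrative assumptions, not parameters from the paper.

```python
# Sketch: gradient ascent on W(u) = -1/2 u^T K u + y^T u (Eq. 4).
# Kernel width, step size and data are illustrative assumptions.


def triangular_kernel(d, S=3.0):
    """Triangular kernel kappa(d) = max(S - |d|, 0); the width S is assumed."""
    return max(S - abs(d), 0.0)


def matvec(K, u):
    """Matrix-vector product for lists of lists."""
    return [sum(k * v for k, v in zip(row, u)) for row in K]


def maximize_dual(K, y, lr=0.1, steps=5000):
    """Ascend W(u); its gradient is y - K u, which vanishes exactly when K u = y."""
    u = [0.0] * len(y)
    for _ in range(steps):
        grad = [yi - ki for yi, ki in zip(y, matvec(K, u))]
        u = [ui + lr * gi for ui, gi in zip(u, grad)]
    return u


times = [0, 1, 3]        # time stamps of three stored items
y = [1.0, -0.5, 2.0]     # their target outputs y_t
K = [[triangular_kernel(tn - tm) for tm in times] for tn in times]

u = maximize_dual(K, y)
recovered = matvec(K, u)  # model output at the stored items, Eq. (3): y = K u
print(recovered)
```

The step size must stay below 2 divided by the largest eigenvalue of $K$ for plain gradient ascent to converge; this is a property of the numerical optimizer used here, not of the recursive rule derived below.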

Since support vector approaches translate to nonlinear models via the kernel trick $K_{nm} = x_n^T x_m \to \kappa(n,m)$ (Vapnik 1995), the model also provides a foundation for neural implementations of kernels, which can be considered as representations of the topological space spanned by $n$ and $m$. In the same sense in which Marr saw representations as connected to the algorithmic level, the kernel represents the space of $n$ and $m$ sufficiently to fully specify the outlined regression algorithm; thus, following Hermans and Schrauwen (2012), I suggest considering it as the true neural representation of this space, in contrast to considering representations as activity patterns in undersampled cell populations.

Maximizing $W$ results in an update rule for $u$ (see Sect. 3) that translates into a weight change $\Delta w = X \Delta u$ of

$$\Delta w = \underbrace{(y_P - w^T x_P)}_{e_P}\, \frac{N x_P}{x_P^T N x_P}, \tag{5}$$

with $N = \mathbb{1} - X K^{-1} X^T$, and an iteration rule

$$N \leftarrow N - \frac{(N x_P)(N x_P)^T}{x_P^T N x_P}, \tag{6}$$

that is equivalent to RLS without forgetting (i.e., without regularization). The learning rule is one-shot in the sense that, for any new pattern, the update rules have to be applied only once, and it admits the functional interpretation “error ($e_P$) times novelty ($N$)”: because $\mathbb{1} - N$ is a projection operator (see Sect. 3), $N x_P$ will be 0 whenever $x_P$ lies in the span of (e.g., equals one of) the previous patterns already included in $X$, whereas any component of $x_P$ that is orthogonal to all patterns in $X$ will be unaffected by $N$. The action of $N$ can thus be interpreted computationally as novelty detection. For a naive learner ($P = 0$), the rule is plain Hebbian, since the error equals the output and the novelty equals the input pattern. In Sect. 4, I will suggest a biologically feasible implementation of $N$ and its learning as anti-Hebbian updates of a recurrent neural network. Importantly, the translation into neuron space resulting in Eqs. (5) and (6) is only required to show how the learning rule can be biologically implemented. In contrast to RLS, these update rules need not be used in the applications that follow, which rely only on the numerically much more tractable update rule for the loads $u$ given in Eq. (7) in Sect. 3.
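A minimal sketch of Eqs. (5) and (6) in neuron space, assuming a four-neuron reservoir with three hand-picked, linearly independent patterns (all values illustrative): each step applies “error times novelty” to $w$ and an outer-product update to $N$. The guard that skips patterns without a novel component is an addition of this sketch, not part of the text.

```python
# Sketch of the neuron-space rule: Delta w = e_P * N x_P / (x_P^T N x_P)  (Eq. 5)
# and N <- N - (N x_P)(N x_P)^T / (x_P^T N x_P)                           (Eq. 6).


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def matvec(M, v):
    return [dot(row, v) for row in M]


def learn(patterns, targets, eps=1e-12):
    n = len(patterns[0])
    w = [0.0] * n
    N = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # naive learner: N = identity
    for x, y in zip(patterns, targets):
        Nx = matvec(N, x)        # novel part of x (orthogonal to the stored patterns)
        denom = dot(x, Nx)       # x^T N x = ||N x||^2 because N is a symmetric projector
        if denom < eps:          # guard (sketch only): nothing novel, so no update
            continue
        e = y - dot(w, x)        # error e_P = y_P - w^T x_P
        w = [wi + e * nxi / denom for wi, nxi in zip(w, Nx)]
        N = [[N[i][j] - Nx[i] * Nx[j] / denom for j in range(n)] for i in range(n)]
    return w, N


patterns = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 1.0, 0.0], [1.0, 1.0, 0.0, 1.0]]
targets = [1.0, 2.0, -1.0]
w, N = learn(patterns, targets)

recall = [dot(w, x) for x in patterns]            # all targets reproduced: no interference
novelty_of_first = matvec(N, patterns[0])         # a stored pattern is no longer novel
N_squared = [[dot(N[i], [N[k][j] for k in range(4)]) for j in range(4)] for i in range(4)]
print(recall)
```

Because each new weight change points along $N x_P$, which is orthogonal to all earlier patterns, the outputs for previously stored patterns are untouched, which is the “no interference” property below the capacity limit.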

As a first neuroscience application, I consider hippocampal theta sequences (Fig. 2A): roughly, a subset of place cells fires in sequence in every cycle of the hippocampal theta oscillation of the local field potential (about 8 Hz in rodents). In the subsequent cycle, the starting neuron of the previous cycle drops out of the sequence, but a new neuron is added at the end of the sequence. Thus, the activity patterns of nearby cycles are similar, whereas they become increasingly distinct the further apart the cycles are.

Fig. 2.

Episodic learning with theta sequences. (A) Spike raster plot of the first 300 of N=10,000 neurons implementing theta sequences as described in Sect. 3, Theta sequences (sparseness f=0.01, sequence length S=10). In every theta cycle, the sequence moves one neuron upward. (B) Kernel derived as the scalar product between population patterns from the simulations shown in A (black dots) and theoretical prediction (blue line). (C) Retrieval (red line) of a low-pass noise signal (black) of length T=100 from P observations (crosses; for P see insets from left to right) using the theoretical kernel from B. The signal was generated as a running average (50 time steps) of white noise. (D) Same as C for the brightness signal of five example RGB channels from a movie scene (P=20, T=111, N=576×768×3). (E) Retrieval of a movie snippet (five example frames shown) from .re_potemkin, a copyleft crowd-sourced free/open source cinema project (https://re-potemkin.httpdot.net/). The original movie snippet and its reconstruction are provided as Videos S1 and S2.

In the simple theta sequence model outlined above, the overlap (scalar product) of activity patterns decays linearly (see Sect. 3), implementing a kernel $K_{mn} = \kappa(n-m)$ as a function of the distance $n-m$ of the two cycles (Fig. 2B).

Inserting the triangular linear kernel from Fig. 2B into the learning rule derived by recursively maximizing $W$, one can recover the original signal $y_t$ without simulating the underlying reservoir. The more pairs $(x_t, y_t)$ are taken into account for learning, the more detail of the original signal can be retrieved (Fig. 2C). Since the kernel is a continuous function, the capacity has become infinite, i.e., any function $y_t$ can be recovered in the limit of an infinite number $N$ of neurons.
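The retrieval in Fig. 2C can be sketched without simulating a reservoir: solve Eq. (3) with the triangular kernel for the loads of the sampled points, then evaluate the kernel expansion $\sum_n u_n \kappa(t - t_n)$ at every time step. This sketch assumes a kernel width $S = 10$, uses a sine wave as a deterministic stand-in for the low-pass noise of Fig. 2C, and solves the system in batch rather than recursively (in exact arithmetic, the recursive rule yields the same loads).

```python
import math


def triangular_kernel(d, S=10.0):
    """Triangular kernel kappa(d) = max(S - |d|, 0); the width S is an assumption."""
    return max(S - abs(d), 0.0)


def solve(A, b):
    """Solve A u = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    u = [0.0] * n
    for r in range(n - 1, -1, -1):
        u[r] = (M[r][n] - sum(M[r][c] * u[c] for c in range(r + 1, n))) / M[r][r]
    return u


def fit_and_retrieve(signal, sample_times, S=10.0):
    """Loads from K u = y at the sampled times, then evaluate sum_n u_n kappa(t - t_n)."""
    K = [[triangular_kernel(tn - tm, S) for tm in sample_times] for tn in sample_times]
    u = solve(K, [signal[t] for t in sample_times])
    return [sum(un * triangular_kernel(t - tn, S) for un, tn in zip(u, sample_times))
            for t in range(len(signal))]


def rms(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)) / len(a))


T = 101
signal = [math.sin(2 * math.pi * t / 50) for t in range(T)]   # smooth stand-in signal
coarse = fit_and_retrieve(signal, list(range(0, T, 20)))      # P = 6 observations
fine = fit_and_retrieve(signal, list(range(0, T, 5)))         # P = 21 observations
print(round(rms(signal, coarse), 3), round(rms(signal, fine), 3))
```

With samples 20 steps apart and a kernel width of 10, neighboring kernels do not overlap and the retrieval drops to zero between samples; with samples 5 steps apart, the triangular kernel yields piecewise-linear interpolation between the stored items, so more observations give more detail.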

As mentioned above, the generalization to multiple neurons is trivial. To illustrate, let us consider each output neuron to reflect one RGB color channel of one pixel in a movie (≈1.3 million neurons). Using only 20 of 110 movie frames already allows recovery of the movie snippet at a compression below 20% (Fig. 2D, E).

By construction, the learning rule has no explicit dependence on time; thus, the order in which the pairs $(x_t, y_t)$ are presented makes no difference to the final fit (Fig. 3A), which is not the case for the FORCE rule derived from classical least squares.

Fig. 3.

Post hoc addition of memory items. A Left: Retrieval (red) of a low-pass noise signal (see Fig. 2C) of length T=100 for P=15 randomly positioned inputs (circles). Right: Same as left after 35 further inputs (crosses) have been iteratively added to the learning process. B Illustration of A for post hoc insertion of a movie scene. Top: Original movie sequence (P=20). Bottom: Movie sequence after a new scene has been inserted into the original snippet (P=35). Movies are provided in Videos S3 and S4.

Biologically, this means that any episode can be modified post hoc by learning new pairs $(x_t, y_t)$ whose temporal contingencies are reflected in the kernel arguments, which provides a model of false memories (Fig. 3B).

Every memory system is finite, and the way it forgets fundamentally determines its usefulness for practical applications. A graceful decay of memories over time (Amit and Fusi 1994) is already a considerable advantage over catastrophic forgetting in attractor networks (Hopfield 1982); however, the behavioral relevance of a memory may not depend only on how old or young it is. I therefore introduce importance scaling into the learning rule, in that the loads $u_t$ are multiplied by an attenuation factor $0 \le a_t \le 1$. If one chooses $a_t = \lambda^{T-t}$ with $0 < \lambda < 1$, one retains a graceful decay over time as in standard RLS. The learning rule that maximizes the modified $W$ is then obtained by the small modification of replacing the kernel $\kappa(n,m)$ by $\kappa(n,m)\, a_n a_m$ (see Sect. 3). The effect of importance scaling is illustrated in Fig. 4A, B, where the learning rule is instructed to pay more attention to a certain time interval at the cost of worse reconstruction in other time intervals.

Fig. 4.

Importance scaling. A Retrieval (red) of a low-pass noise signal (black; see Fig. 2C) for attenuation parameters $a_t = \lambda^{|P-t|}$ with $\lambda = 0.999$ and varying importance centers $P$ (see titles). B Illustration using the movie snippet from Fig. 2 with importance at the beginning ($a_t = \lambda^{t}$, top) and at the end ($a_t = \lambda^{110-t}$, bottom). In the image sequence on top, one still sees an erroneous reflection of the glass in the last three images, whereas in the bottom sequence the glass in the first two frames shown erroneously displays the yellowish colors from the end. Movies are provided in Videos S5 and S6. C Left: Retrieval (red) of a low-pass noise signal (black) with N=20,000 time steps (only shown between time steps 6,000 and 6,500) and P=500 patterns (crosses) with random importance values $a$ (cyan) between 0.5 and 1. Middle: The reconstruction error (absolute difference between black and red line) correlates negatively with $a$ for all P=500 patterns. Right: The error has no dependence on time.

Importance may vary randomly over time, and thus temporal correlations in the $a$ values should not be a necessary prerequisite for importance scaling. Applying the learning rule in a scenario with random $a$ values shows that the retrieval error is indeed largest for small $a$, independent of time (Fig. 4C). A post hoc increase of $a$ could thus be considered a model of memory consolidation, and a post hoc decrease of $a$ a model of extinction learning.

With importance scaling as a weighting mechanism at hand, let us now revisit the original capacity question. In terms of the recursive update rules of Eqs. (7) and (8), the memory and computational demands scale with the square of the number of patterns $P$. A straightforward way to limit the capacity is to introduce a cutoff dimension $d_c$ such that only the $d_c$ patterns with the highest importance values $a$ are stored in the algorithm and the other dimensions are set to 0. In Fig. 5A, B, I vary $d_c$ for low-pass filtered noise signals of different lengths with importance increasing linearly toward the signal end, and observe that for low $d_c$ the reconstruction error rises already at relatively short signal lengths, whereas for $d_c \geq 300$ reconstruction works well even for signals up to 10 times longer than $d_c$, which reflects that the geometry of the kernel fits the correlational structure of the signal.
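The cutoff can be sketched as a ranking step followed by an ordinary kernel fit on the surviving items. In the minimal example below, importance grows linearly toward the end of the signal, as described for Fig. 5; the signal, the kernel width and $d_c = 5$ are illustrative assumptions, and for simplicity the kernel is not rescaled by $a_n a_m$ here.

```python
import math


def triangular_kernel(d, S=10.0):
    """Triangular kernel kappa(d) = max(S - |d|, 0); the width S is an assumption."""
    return max(S - abs(d), 0.0)


def solve(A, b):
    """Solve A u = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    u = [0.0] * n
    for r in range(n - 1, -1, -1):
        u[r] = (M[r][n] - sum(M[r][c] * u[c] for c in range(r + 1, n))) / M[r][r]
    return u


def retrieve(signal, stored_times, S=10.0):
    """Fit loads on the stored items only and evaluate the kernel expansion everywhere."""
    K = [[triangular_kernel(tn - tm, S) for tm in stored_times] for tn in stored_times]
    u = solve(K, [signal[t] for t in stored_times])
    return [sum(un * triangular_kernel(t - tn, S) for un, tn in zip(u, stored_times))
            for t in range(len(signal))]


def apply_cutoff(sample_times, importance, d_c):
    """Keep the d_c items with the highest importance; the others get zero loads."""
    ranked = sorted(range(len(sample_times)), key=lambda i: importance[i], reverse=True)
    return sorted(sample_times[i] for i in ranked[:d_c])


signal = [math.sin(2 * math.pi * t / 50) + 0.5 for t in range(100)]
sample_times = list(range(0, 100, 5))                                     # 20 candidate items
importance = [(i + 1) / len(sample_times) for i in range(len(sample_times))]  # grows toward the end
kept = apply_cutoff(sample_times, importance, d_c=5)
retrieved = retrieve(signal, kept)
print(kept, round(retrieved[0], 3), round(signal[0], 3))
```

The most important items are reproduced exactly, while the start of the signal, far from any kept item, is forgotten (retrieved as 0); a kernel whose width matches the signal's correlation time lets fewer kept items cover more of the signal.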

Fig. 5.

Capacity. A Example reconstructions (green) of a signal (orange) for smaller (top) and larger (bottom) cutoff dimensions ($d_c = 100$ and 300, respectively). Sample points used for reconstruction (signal length p = cycles / 2) are shown in blue. For the left panels, the signal length equals the cutoff dimension. On the right, the signal is 10 times longer than the cutoff (only 1000 data points shown). B Reconstruction error (root mean squared) for different cutoff dimensions (colors as indicated) as a function of signal length (solid lines indicate the mean of 20 repetitions, shaded areas the 90 percent quantile). The results are derived from a low-pass noise signal with a running average over 100 time steps and a triangular kernel with a length of 25 time steps.

The need to adjust the kernel length to the time scale of signal fluctuations suggests that more specific signal properties require more specifically designed kernels. In most neuroscience applications, sensory signals are not random but reflect physical constraints of the environment or the sensory periphery. As a next example, I therefore consider functions with bandpass characteristics similar to cochlear frequency channels. Knowledge about the preferred local structure of a function (oscillations with a certain frequency) suggests a kernel with similar bandpass characteristics (see Sect. 3 and Fig. 6A). In contrast to the triangular linear kernel, which only represents temporal distance, the band kernels represent both temporal distance (by their decay) and frequency.

Fig. 6.

Sound reconstruction. A Kernels representing time in a frequency channel, with the center frequency given on top (see Sect. 3, Frequency kernels). B Retrieval (red) of the signal (black) in five of the frequency channels (crosses mark memory items). C Reconstruction (red; shifted upward for illustration) of the original sound signal (black; the beginning of the song http://ccmixter.org/files/texasradiofish/63300, CC BY NC) by summing over the filter-weighted channel components (see Sect. 3, Frequency kernels). The reconstructed sound file is provided as Audiofile S8, together with the identically filtered original sound wave (Audiofile S7).

Learning is then performed on each cochlear frequency channel separately, and the fit benefits both from recovering the function values at a few points and from the fine structure of the kernel. A post hoc synthesis across frequency channels recovers the original sound wave with high fidelity and a smaller memory demand than the original sampling (see Sect. 3, Frequency kernels).

Methods

Recursive support vector regression

Linear support vector regression with ε-insensitive loss (Schölkopf and Smola 2002; Vapnik 1995) is derived by minimizing the squared L2-norm $\tfrac{1}{2}\|w\|^2$ of the weight vector of the linear model $f(x) = w^T x + b$ under the inequality constraints $-(\varepsilon + \zeta_n^*) \le y_n - f(x_n) \le \varepsilon + \zeta_n$, with $\zeta_n, \zeta_n^* \ge 0$, and including the sum of slack variables $\sum_n (\zeta_n + \zeta_n^*)$ as a regularizer.

Classical work has shown that the resulting optimal solution yields a weight vector of the form

$$w = \sum_n (\alpha_n - \alpha_n^*)\, x_n$$

that maximizes the dual problem

$$W(u, v) = -\tfrac{1}{2} u^T K u + y^T u - \varepsilon \sum_n v_n$$

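with $K_{nm} = x_n^T x_m$, $u_n = \alpha_n - \alpha_n^*$, and $v_n = \alpha_n + \alpha_n^*$, under the constraints $\alpha_n, \alpha_n^* \ge 0$. Hence, for every local maximum of $W$ with respect to $u$, there is a combination of $\alpha_n, \alpha_n^*$ that minimizes $\sum_n v_n$, i.e., $\alpha_n^* = 0$ if $u_n > 0$ and $\alpha_n = 0$ if $u_n < 0$. For $\varepsilon \to 0$, the maximum in $(u, v)$ converges to $\alpha_n = 0$ or $\alpha_n^* = 0$, and thus, in this limit, one can drop $v$ from the equations.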

Here, a recursive learning rule is derived such that $W$ remains at this maximum if a new observation $(y_P, x_P)$ is added. One therefore writes $u^T = (\tilde u^T, u_P)$, $y^T = (\tilde y^T, y_P)$, and $\tilde X = (x_1, \ldots, x_{P-1})$, and finds the optimum of

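$$W\big((\tilde u^T, u_P)^T\big) = -\tfrac{1}{2}\, \tilde u^T \tilde K \tilde u - u_P\, x_P^T \tilde X \tilde u - \tfrac{1}{2}\, u_P^2 K_{PP} + \tilde y^T \tilde u + y_P u_P$$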

by

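$$0 = \nabla_{\tilde u} W = -\tilde K \tilde u - u_P \tilde X^T x_P + \tilde y \quad\Rightarrow\quad \tilde u = \tilde K^{-1}\big(\tilde y - u_P \tilde X^T x_P\big)$$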

and

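$$0 = \frac{\partial W}{\partial u_P} = -x_P^T \tilde X \tilde u - K_{PP} u_P + y_P \quad\Rightarrow\quad 0 = y_P - x_P^T \tilde X \tilde K^{-1} \tilde y - u_P \big(K_{PP} - x_P^T \tilde X \tilde K^{-1} \tilde X^T x_P\big)$$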

If one denotes the optimal loads of the previous $P-1$ inputs by $\tilde u = \tilde K^{-1} \tilde y$, one can use $x_P^T \tilde X \tilde u = x_P^T w$ to express the optimality conditions as

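$$u_P = \frac{y_P - x_P^T w}{K_{PP} - x_P^T \tilde X \tilde K^{-1} \tilde X^T x_P}, \qquad \tilde u \leftarrow \tilde u - u_P\, \tilde K^{-1} \tilde X^T x_P \tag{7}$$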
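The update of Eq. (7) can be checked against a batch solution. The sketch below (plain Python; the triangular kernel standing in for $x_n^T x_m$, its width and the data are illustrative assumptions) adds one item at a time, recomputing $\tilde K^{-1}$ explicitly as Eq. (7) requires, and compares the final loads with $u = K^{-1} y$.

```python
def kernel(t, s, S=10.0):
    """Triangular kernel standing in for x_t^T x_s (kernel trick); S is assumed."""
    return max(S - abs(t - s), 0.0)


def inverse(A):
    """Gauss-Jordan inversion with partial pivoting (small dense matrices only)."""
    n = len(A)
    M = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]


def recursive_loads(times, ys):
    """Add (t_P, y_P) one at a time with Eq. (7), recomputing K~^{-1} explicitly."""
    u = []
    for P, (tP, yP) in enumerate(zip(times, ys)):
        old = times[:P]
        k = [kernel(t, tP) for t in old]                 # X~^T x_P
        pred = sum(ui * ki for ui, ki in zip(u, k))      # x_P^T w with w = X~ u~
        Kinv = inverse([[kernel(a, b) for b in old] for a in old]) if old else []
        q = [sum(Kinv[i][j] * k[j] for j in range(P)) for i in range(P)]  # K~^{-1} X~^T x_P
        C = kernel(tP, tP) - sum(ki * qi for ki, qi in zip(k, q))
        uP = (yP - pred) / C
        u = [ui - uP * qi for ui, qi in zip(u, q)] + [uP]
    return u


times = [0.0, 4.0, 9.0, 13.0, 20.0]
ys = [0.3, -1.2, 0.8, 0.1, 2.0]
u_recursive = recursive_loads(times, ys)

K = [[kernel(a, b) for b in times] for a in times]
u_batch = [sum(r * y for r, y in zip(row, ys)) for row in inverse(K)]   # u = K^{-1} y
print([round(v, 6) for v in u_recursive])
```

Recomputing the inverse at every step is exactly the cost that the iteration of Eq. (8) avoids.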

The update rules for $u$ from Eq. (7) require computation of the inverse of $\tilde K$, which (a) is computationally costly and (b) is not straightforward to implement biologically. I therefore derived an iteration rule using the Sherman–Morrison–Woodbury identity (Nocedal and Wright 2006), which yields an iteration equation for $K^{-1}$ from the $(P-1)$st to the $P$th pattern

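$$K^{-1} = \begin{pmatrix} \tilde K^{-1} & 0 \\ 0^T & 0 \end{pmatrix} + C_P^{-1} \begin{pmatrix} \tilde Q \tilde Q^T & -\tilde Q \\ -\tilde Q^T & 1 \end{pmatrix} \tag{8}$$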

with $\tilde Q = \tilde K^{-1} \tilde X^T x_P$ and $C_P = K_{PP} - x_P^T \tilde X \tilde K^{-1} \tilde X^T x_P$. The iteration equation (8) can be verified by elementary algebra ($K^{-1} K = \mathbb{1}$).
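To avoid the explicit inversion, the block update of Eq. (8) can be carried along with Eq. (7). The following sketch (same illustrative kernel and data, both assumptions) does so, checks $K^{-1} K = \mathbb{1}$, and also illustrates the order independence stated in the Results: presenting the same items in reverse order yields the same retrieved function up to rounding.

```python
def kernel(t, s, S=10.0):
    """Triangular kernel standing in for x_t^T x_s; the width S is assumed."""
    return max(S - abs(t - s), 0.0)


def add_pattern(Kinv, u, stored, tP, yP):
    """One recursive step: loads via Eq. (7), inverse kernel matrix via Eq. (8)."""
    k = [kernel(t, tP) for t in stored]                          # X~^T x_P
    Q = [sum(a * b for a, b in zip(row, k)) for row in Kinv]     # Q~ = K~^{-1} X~^T x_P
    C = kernel(tP, tP) - sum(a * b for a, b in zip(k, Q))        # C_P
    uP = (yP - sum(a * b for a, b in zip(u, k))) / C
    u = [ui - uP * qi for ui, qi in zip(u, Q)] + [uP]
    n = len(stored)
    new_inv = [[Kinv[i][j] + Q[i] * Q[j] / C for j in range(n)] + [-Q[i] / C]
               for i in range(n)]
    new_inv.append([-Q[j] / C for j in range(n)] + [1.0 / C])
    return new_inv, u, stored + [tP]


def fit(times, ys):
    Kinv, u, stored = [], [], []
    for t, y in zip(times, ys):
        Kinv, u, stored = add_pattern(Kinv, u, stored, t, y)
    return Kinv, u, stored


def retrieve(u, stored, t):
    return sum(ui * kernel(s, t) for ui, s in zip(u, stored))


times = [0.0, 4.0, 9.0, 13.0, 20.0]
ys = [0.3, -1.2, 0.8, 0.1, 2.0]
Kinv, u, stored = fit(times, ys)
_, u_rev, stored_rev = fit(times[::-1], ys[::-1])   # same items, reversed order

grid = [0.5 * i for i in range(45)]
order_gap = max(abs(retrieve(u, stored, t) - retrieve(u_rev, stored_rev, t)) for t in grid)
K = [[kernel(a, b) for b in stored] for a in stored]
inverse_error = max(abs(sum(Kinv[i][m] * K[m][j] for m in range(5)) - (1.0 if i == j else 0.0))
                    for i in range(5) for j in range(5))
print(order_gap, inverse_error)
```

Each step costs on the order of $P^2$ operations and memory, which is why a cutoff dimension is needed for long recordings.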

Remarks

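  • Translation of the update rules from Eq. (7) to weight updates $\Delta w = X \Delta u$ is straightforward:
    $$\Delta w = (\tilde X, x_P) \begin{pmatrix} \Delta \tilde u \\ u_P \end{pmatrix} = (\mathbb{1} - \tilde X \tilde K^{-1} \tilde X^T)\, x_P\, u_P = (y_P - x_P^T w)\, \frac{(\mathbb{1} - \tilde X \tilde K^{-1} \tilde X^T)\, x_P}{x_P^T (\mathbb{1} - \tilde X \tilde K^{-1} \tilde X^T)\, x_P},$$
    with $\Delta \tilde u = -u_P \tilde K^{-1} \tilde X^T x_P$; see the result in Eq. (5).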
  • $\mathbb{1} - N := \tilde X \tilde K^{-1} \tilde X^T$ and $N$ are projection operators, since $(\mathbb{1} - N)^2 = \mathbb{1} - N$ and $N^2 = N$.

  • If $P - 1 \le N$ and the patterns are linearly independent, $\tilde K$ is the Gram matrix of linearly independent vectors and, hence, invertible.

  • For $P - 1$ exceeding $N$, $\tilde K$ can no longer be exactly inverted. Formally, this is not an issue in a kernel representation, since the kernel operates on an infinite-dimensional Hilbert space. Biologically, for a finite number $N$ of neurons, an approximate inversion can be obtained by importance scaling (see below).

  • Recursively adding data points continuously increases the dimension of the matrix $K^{-1}$ and, hence, memory and computational costs. A brute-force strategy to avoid this unbounded growth is to introduce a cutoff dimension, beyond which the patterns with the lowest importance values $a$ are removed. For all figures except Fig. 5, in which this parameter is studied explicitly, a cutoff dimension of 300 was used.

Importance scaling

Importance is introduced by attenuation factors $0 \le a_t \le 1$ that scale the inequality constraints of support vector regression: $-(\varepsilon + \zeta_n^*) \le a_n [y_n - f(x_n)] \le \varepsilon + \zeta_n$. If $a_n$ is small, the slack variables can also be small, and the pair $(y_n, x_n)$ contributes little to the loss via the regularizer. The resulting optimal solution is very similar to the one without attenuation factors; only the weight vector now reads

$$w = \sum_t u_t a_t x_t$$

which, in the recursive learning rule, requires replacing

$$\kappa(n,m) \to \kappa(n,m)\, a_n a_m.$$

Biologically, this rule maps to an attenuation of the inputs, $x_t \to a_t x_t$. Thus, patterns with low $a_t$ are treated as more different from patterns with large $a_t$, even if they have a similar structure.
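The equivalence between rescaling the kernel and attenuating the inputs is the identity $(a_n x_n)^T (a_m x_m) = a_n a_m\, x_n^T x_m$; the short sketch below checks it numerically for made-up patterns and importance values.

```python
# Sketch: kappa(n, m) -> kappa(n, m) a_n a_m equals attenuating inputs x_t -> a_t x_t.
# Patterns and importance values are made up for illustration.


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


patterns = [[1.0, 0.0, 2.0], [0.5, 1.0, 1.0], [0.0, 2.0, 1.0]]
importance = [1.0, 0.5, 0.1]          # a_t in [0, 1]

K = [[dot(xn, xm) for xm in patterns] for xn in patterns]
K_scaled = [[K[n][m] * importance[n] * importance[m] for m in range(3)] for n in range(3)]

attenuated = [[a * v for v in x] for a, x in zip(importance, patterns)]
K_from_inputs = [[dot(xn, xm) for xm in attenuated] for xn in attenuated]

for row in K_scaled:
    print([round(v, 4) for v in row])
```

The least important pattern (a = 0.1) keeps only 1% of its self-overlap, so it behaves like a weak, nearly orthogonal item during learning.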

The scaling of the kernel also has interesting consequences for situations in which $K$, if constructed from a finite population of neurons, is no longer invertible ($P > N$). In this case, one can nevertheless apply the iteration equation (8); however, patterns with small $a_n$ will contribute only little to $\tilde Q$, as the respective rows of $\tilde X^T$ are scaled down. The resulting matrix is hence no longer an exact inverse, but the patterns for which the “inversion” fails most are those with low $a_n$. This is best illustrated by assuming $a_n = 0$, in which case the pattern $x_n$ makes no contribution to $\tilde Q$ and hence to $K^{-1}$, as if it had not been used for learning. Functionally modulating plasticity with $a$ also allows a post hoc improvement of an existing episodic memory, by assigning a higher importance $a_n$ to a pattern when the episode is presented a second time.

Theta sequences

Sparse binary random patterns $\xi_n$ with $\mathrm{Prob}(\xi_n^{(k)} = 1) = f$ are assumed to represent hippocampal ensembles that fire together at a specific phase of the theta cycle. Given that $S$ of those ensembles are activated in sequence during a theta cycle, the population pattern in cycle $t$ equals

$$x_t = \sum_{k=0}^{S-1} \xi_{t+k}$$

For a population of N neurons, the overlap of two such patterns can be computed as

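$$K_{nm} = \sum_{k} \sum_{k'} \xi_{n+k}^T \xi_{m+k'} \approx N\, [S - |n-m|]_+ \langle \xi^2 \rangle + N\, \big(S^2 - [S - |n-m|]_+\big) \langle \xi \rangle^2 .$$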

For independent binary random variables, one finds $\langle \xi \rangle = \langle \xi^2 \rangle = f$, and thus the overlap is a linear triangular kernel

$$K_{nm} = K(|n-m|) = N \big( [S - |n-m|]_+\, f(1-f) + (S f)^2 \big)$$

as depicted in Fig. 2.
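The triangular kernel can be checked by direct simulation. The sketch below (plain Python) draws sparse binary ensembles, builds $x_t$ as above, and compares lag-averaged overlaps with the predicted kernel. The parameters (N = 10,000, f = 0.05, S = 5) are assumptions chosen so that a few dozen cycles already give a tight match; they differ from those used for Fig. 2.

```python
import random


def theta_patterns(num_cycles, N, S, f, seed=0):
    """x_t = sum_{k=0}^{S-1} xi_{t+k}, with sparse binary ensembles xi_n (Prob = f)."""
    rng = random.Random(seed)
    xi = [[1 if rng.random() < f else 0 for _ in range(N)] for _ in range(num_cycles + S)]
    patterns = []
    for t in range(num_cycles):
        x = [0] * N
        for k in range(S):
            x = [a + b for a, b in zip(x, xi[t + k])]
        patterns.append(x)
    return patterns


def predicted_overlap(d, N, S, f):
    """N ([S - |d|]_+ f (1 - f) + (S f)^2), the triangular kernel above."""
    return N * (max(S - abs(d), 0) * f * (1 - f) + (S * f) ** 2)


N, S, f = 10000, 5, 0.05          # smaller than Fig. 2 settings to keep pure Python fast
max_lag, n_pairs = 7, 40
X = theta_patterns(n_pairs + max_lag, N, S, f)

empirical = {}
for d in range(max_lag + 1):
    overlaps = [sum(a * b for a, b in zip(X[t], X[t + d])) for t in range(n_pairs)]
    empirical[d] = sum(overlaps) / n_pairs

for d in range(max_lag + 1):
    print(d, round(empirical[d]), round(predicted_overlap(d, N, S, f)))
```

The overlap decreases linearly until the lag reaches $S$ and then stays at the chance level $N (S f)^2$ contributed by random coincidences between different ensembles.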

Frequency kernels

The cochlea separates a sound $s(t)$ into frequency channels that roughly act as band-pass filters and can thus be characterized by a filter kernel $\gamma_f(t)$, with $f$ denoting the center frequency of the cochlear channel. If one assumes multiple ($k = 1, \ldots, K$) auditory nerve fibers to connect to such a frequency channel, the linear response of each of those fibers can be modeled as $x_t^{(k)} = c_f(t - \Delta^{(k)}) = (\gamma_f * s)(t - \Delta^{(k)})$ with a fiber-specific delay $\Delta^{(k)}$ that may reflect differences in fiber lengths, diameters or myelination.

For a large number $K$ of fibers, the resulting kernel can be computed as an integral

$$\sum_k x_t^{(k)} x_{t'}^{(k)} \propto \int d\Delta\, c_f(t - \Delta)\, c_f(t' - \Delta) = \int du\, c_f(u)\, c_f(t' - t + u) = \kappa(t' - t),$$

which corresponds to the autocorrelation of the cochlear response and, for long broadband signals $s$, equals the autocorrelation of the filter $\gamma_f$ (up to a constant factor). The exponentially decaying kernel used in Fig. 6 reflects exactly such a prototypical autocorrelation.
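To illustrate that a filter autocorrelation yields a decaying, oscillating kernel that encodes both temporal distance and frequency, the sketch below computes the discrete autocorrelation of a sampled gammatone impulse response. Filter order 4, bandwidth factor 1.019, a 1 kHz center frequency and a 16 kHz sampling rate are assumptions (common gammatone defaults), not the settings used for Fig. 6.

```python
import math


def gammatone_ir(fc, fs, duration, order=4, b_factor=1.019):
    """Sampled gammatone impulse response t^(n-1) exp(-2 pi b t) cos(2 pi fc t).

    Assumption: b = b_factor * ERB(fc), ERB(fc) = 24.7 (4.37 fc / 1000 + 1) Hz,
    with order 4 and b_factor 1.019 as common defaults (not the paper's values).
    """
    b = b_factor * 24.7 * (4.37 * fc / 1000.0 + 1.0)
    return [(k / fs) ** (order - 1) * math.exp(-2 * math.pi * b * k / fs)
            * math.cos(2 * math.pi * fc * k / fs) for k in range(int(duration * fs))]


def autocorrelation(g, lag):
    """kappa(lag) = sum_u g(u) g(u + lag) over all valid u."""
    return sum(g[u] * g[u + lag] for u in range(len(g)) if 0 <= u + lag < len(g))


fs, fc = 16000, 1000.0
g = gammatone_ir(fc, fs, duration=0.02)
kappa = {lag: autocorrelation(g, lag) for lag in range(-48, 49)}
norm = {lag: v / kappa[0] for lag, v in kappa.items()}   # kappa(0) = 1 after normalization
for lag in (0, 4, 8, 12, 16, 32):
    print(lag, round(norm[lag], 3))
```

At 16 kHz, one period of the 1 kHz carrier spans 16 samples, so the kernel is negative at a lag of half a period (8 samples) and positive again after a full period, while its envelope decays with the filter bandwidth.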

Specifically, a sound signal (the beginning of the CC BY NC song “I’ll be your everything” by Texas Radio Fish, http://ccmixter.org/files/texasradiofish/63300) was passed through a gammatone filterbank consisting of seven channels (center frequencies $2^k \times 200$ Hz, $k = 0, \ldots, 6$) with width constants 2.019 ERB (Glasberg and Moore 1990). In each channel, $\rho_k$ data points per cycle (equally spaced) were selected for learning. The parameters $\rho_k$ were channel-dependent and equaled 6, 4, 3, 3, 1.5, 1, and 0.25 for $k = 0, \ldots, 6$. The recursive KSVR was fitted in each channel independently in chunks of 500 data points.
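The channel parameters above translate into a simple count of learned data points per second, $\rho_k \times 2^k \times 200$ Hz, summed over channels. The sketch below tabulates this count together with the ERB of each channel, using the ERB formula of Glasberg and Moore (1990), $\mathrm{ERB}(f) = 24.7\,(4.37 f/1000 + 1)$ Hz; how the text's width constant maps onto this formula is not spelled out and is not assumed here.

```python
# Arithmetic for the filterbank described above: center frequencies 2^k x 200 Hz and
# rho_k learned points per cycle give rho_k * f_k learned points per second per channel.

center_freqs = [2 ** k * 200.0 for k in range(7)]          # 200 Hz ... 12.8 kHz
rho = [6, 4, 3, 3, 1.5, 1, 0.25]                           # points per cycle, k = 0..6
points_per_second = [r * fc for r, fc in zip(rho, center_freqs)]
erb = [24.7 * (4.37 * fc / 1000.0 + 1.0) for fc in center_freqs]   # Glasberg & Moore (1990), Hz

for fc, pps, bw in zip(center_freqs, points_per_second, erb):
    print(f"{fc:8.0f} Hz  {pps:6.0f} points/s  ERB {bw:7.1f} Hz")
total = sum(points_per_second)
print("total learned points per second:", total)
```

The sampling rate of the original recording is not stated in the text; for a hypothetical 44.1 kHz recording, the 24,400 learned points per second summed over all channels would amount to roughly 55% of the original samples.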

For full audio reconstruction, the reconstructed signals were Fourier-transformed in each band and divided by the Fourier transform of the respective gammatone filter kernel, omitting frequencies below 10 Hz and above 20 kHz. These filter-corrected components were back-transformed, summed, and rescaled to the root-mean-square level of the original signal.

Discussion

Kernel support vector regression (KSVR) is a powerful tool for function fitting. Here, I presented a biologically plausible neural implementation of recursive KSVR that enables storing episodic memories as temporal sequences of retrieved sensory-motor activity patterns $y_t$ (i.e., fitting $y_t$). The kernels can be biologically interpreted as scalar products of activity patterns $x_t$ of a reservoir and provide a neural representation of temporal distance.

Hippocampal theta sequences provide a well-known example that realizes exactly such a reservoir. However, even in the hippocampus, neuronal activity consists not only of sequence-type activity but also exhibits rate modulations induced by changes in the sensory environment, generally known as remapping (Muller and Kubie 1987; Leutgeb et al 2005; Fetterhoff et al 2021). Thus, behavior-related neuronal activity may always contain both reflections $W x_t$ of the reservoir and feedforward sensory-motor drive, thereby balancing expectations (i.e., reservoir-driven activity) and sensory reality. This combination of top-down and bottom-up input streams is widely considered to be a general design principle of the neocortex (Douglas and Martin 2004; Larkum 2013), resulting in sensory-motor activity patterns $y_t$ that simultaneously reflect stimulus-driven responses and intrinsic dynamics, as reflected, for example, by synfire chains (Abeles et al 1993).

While the view of the neocortex as a hierarchical combination of sensory-motor prediction loops (Ahissar and Kleinfeld 2003) is probably a good proxy for the neurobiological substrate, it has not been widely explored in classical artificial neural network research. There, the universal approximation theorem, a hallmark result, states that neural networks can approximate any continuous function to an arbitrary degree of precision (Cybenko 1989; Hornik 1991), which rather views brains as feedforward function-fitting devices. The field of reservoir computing has extended this idea to the temporal domain by suggesting that intrinsic neural dynamics represent a time axis as the independent variable of function fitting (Jaeger 2005), thereby allowing neural networks to generate predictions that vary with time. However, to operate on a continuous stream of sensory inputs, the learning rules for the output synapses of the reservoir need to be able to update recursively (Williams and Zipser 1989; Stanley 2001; Sussillo and Abbott 2009), which requires a biological interpretation of the common ideas derived from least squares.

Here, I suggest that the iterative update of the projection operator $N$, which only requires outer products, can be implemented as anti-Hebbian learning in a simple recurrent neural network: in the neural space of synaptic weights, $X K^{-1} X^T = \mathbb{1} - N$ is of outer-product form, as seen from Eq. (8). The matrix $N = \mathbb{1} - X K^{-1} X^T$ can thus be interpreted as the connectivity of a recurrent neural network that is learned by anti-Hebbian updates, i.e.,

$$N = \mathbb{1} - \sum_{t=1}^{P-1} r_t\, r_t^T \qquad (9)$$

with $r_P = N x_P / \sqrt{x_P^T N x_P}$. Since $XK^{-1}X^T$ is a projection matrix (see Sect. 3), one can furthermore write $r_t = x_t^{\perp}/\|x_t^{\perp}\|$, with $x_t^{\perp}$ being the component of $x_t$ that is orthogonal to all previously learned patterns.
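A minimal NumPy sketch of Eq. (9), assuming a linear kernel $K = X^T X$ with the patterns $x_t$ as the columns of $X$, and random, hypothetical reservoir patterns. It checks that applying the anti-Hebbian outer-product update one pattern at a time reproduces the batch projection $\mathbb{1} - XK^{-1}X^T$. The function names are illustrative and are not taken from the author's recsvr repository.

```python
import numpy as np


def projection_update(N, x):
    """One anti-Hebbian step of Eq. (9): remove from N the direction of x
    that is orthogonal to all previously learned patterns."""
    Nx = N @ x
    norm_sq = float(x @ Nx)  # x^T N x = ||N x||^2 because N is a symmetric projection
    if norm_sq < 1e-12:
        # x lies (numerically) in the span of earlier patterns: nothing new to learn
        return N, np.zeros_like(x)
    r = Nx / np.sqrt(norm_sq)         # r_P = N x_P / sqrt(x_P^T N x_P)
    return N - np.outer(r, r), r      # N <- N - r r^T (anti-Hebbian outer product)


def learn_projection(X):
    """Apply the update to the columns x_1, ..., x_P of X in order."""
    dim, P = X.shape
    N = np.eye(dim)
    rs = []
    for t in range(P):
        N, r = projection_update(N, X[:, t])
        rs.append(r)
    return N, np.column_stack(rs)


rng = np.random.default_rng(0)
X = rng.standard_normal((50, 8))       # hypothetical reservoir: 50 neurons, 8 patterns
N, R = learn_projection(X)

K = X.T @ X                            # linear kernel matrix (assumption)
N_batch = np.eye(50) - X @ np.linalg.solve(K, X.T)
print(np.allclose(N, N_batch))         # True
print(np.allclose(R.T @ R, np.eye(8))) # True: the r_t are orthonormal
```

Numerically, this is Gram–Schmidt orthogonalization written as a sequence of rank-one updates; a pattern that already lies in the span of earlier ones leaves $N$ unchanged.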

This leads to the following interpretation of $r$ as the activity of a neural network in discrete time $s$:

$$r(s+1) = \phi\left[\delta_{s,0}\, x + N\, r(s)\right],$$

where the network is initialized at $r(s=0) = 0$, the input $x = x_t$ is present only at time step $s = 0$, and $\phi(z) = z/\|z\|$. As a result of these dynamics, $r(s) = x_t^{\perp}/\|x_t^{\perp}\|$ for all time steps $s > 1$. This dynamical fixed point then produces the anti-Hebbian weight update of Eq. (9).
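The recurrent dynamics can be checked in the same setting. The sketch below (hypothetical sizes, linear kernel assumed) iterates $r(s+1) = \phi[\delta_{s,0}\, x + N r(s)]$ and confirms that from $s = 2$ onward the activity sits at the normalized orthogonal component $x^{\perp}/\|x^{\perp}\|$. Subtracting the outer product of that fixed-point state from $N$ is then the anti-Hebbian step of Eq. (9).

```python
import numpy as np


def phi(z, eps=1e-12):
    """Normalizing nonlinearity phi(z) = z / ||z|| (a zero vector stays zero)."""
    n = np.linalg.norm(z)
    return z / n if n > eps else np.zeros_like(z)


def run_network(N, x, steps=5):
    """Iterate r(s+1) = phi[delta_{s,0} x + N r(s)] starting from r(0) = 0."""
    r = np.zeros_like(x)
    trajectory = [r]
    for s in range(steps):
        drive = (x if s == 0 else 0.0) + N @ r  # external input x only at s = 0
        r = phi(drive)
        trajectory.append(r)
    return trajectory


rng = np.random.default_rng(1)
X = rng.standard_normal((30, 4))                     # hypothetical stored patterns
N = np.eye(30) - X @ np.linalg.solve(X.T @ X, X.T)   # N = 1 - X K^{-1} X^T
x_new = rng.standard_normal(30)
traj = run_network(N, x_new)

x_perp = N @ x_new                  # component orthogonal to the stored patterns
target = x_perp / np.linalg.norm(x_perp)
print([np.allclose(r, target) for r in traj])  # [False, False, True, True, True, True]

r_fixed = traj[-1]
N_next = N - np.outer(r_fixed, r_fixed)       # anti-Hebbian update of Eq. (9)
print(np.allclose(N_next @ x_new, 0.0))       # True: x_new is now in the learned span
```

At $s = 1$ the network still carries the raw input direction $x/\|x\|$; one pass through $N$ removes the already-learned components, and because $N$ is a projection, further passes leave the state unchanged.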

A further drawback of RLS-derived rules has been their lack of a theoretical foundation, since they make explicit use of the reservoir patterns, which, for technical reasons, were limited to a small subsample of neurons. Here, I use the generalized representer theorem (Schölkopf et al 2001) to translate the weight update into an update rule for the loads (coefficients) $u$ of the input patterns $X$. This avoids an explicit representation of the neural feature space $x_t$ and requires only a kernel representation (Hermans and Schrauwen 2012). Formulating the learning rule on the loads allows analytical insights for reservoirs of arbitrarily large size $N$, and it also reduces the computational demand of simulating (or recording from) a large number of neurons.
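To illustrate what learning on the loads means, the sketch below uses only kernel evaluations $k(x_s, x_t)$ and updates $K^{-1}$ and the loads $U = YK^{-1}$ recursively with the block-matrix (Schur complement) inverse. Note that it swaps in minimum-norm kernel interpolation (squared error, no regularization) for the paper's support vector regression with its ε-insensitive loss. It therefore shows the representer structure $y(x) = \sum_t u_t\, k(x_t, x)$, not the author's algorithm; the Gaussian kernel, the class name, and all parameters are assumptions.

```python
import numpy as np


def rbf_kernel(a, b, width=1.0):
    """Hypothetical kernel choice; any positive-definite kernel would do."""
    return float(np.exp(-np.sum((a - b) ** 2) / (2.0 * width ** 2)))


class RecursiveKernelReadout:
    """Minimum-norm kernel interpolation whose loads U = Y K^{-1} are updated
    pattern by pattern via the block-matrix inverse (Schur complement).
    The patterns are only ever passed to the kernel function."""

    def __init__(self, kernel):
        self.kernel = kernel
        self.patterns = []   # stored inputs (used only inside kernel calls)
        self.K_inv = None    # inverse kernel matrix K^{-1}
        self.Y = None        # targets y_t stored as columns

    def add(self, x, y):
        y = np.atleast_1d(np.asarray(y, dtype=float))
        k_pp = self.kernel(x, x)
        if self.K_inv is None:
            self.K_inv = np.array([[1.0 / k_pp]])
            self.Y = y[:, None]
        else:
            k = np.array([self.kernel(p, x) for p in self.patterns])
            a = self.K_inv @ k
            gamma = k_pp - k @ a   # Schur complement; > 0 for a genuinely new pattern
            top_left = self.K_inv + np.outer(a, a) / gamma
            self.K_inv = np.block([[top_left, -a[:, None] / gamma],
                                   [-a[None, :] / gamma, np.array([[1.0 / gamma]])]])
            self.Y = np.hstack([self.Y, y[:, None]])
        self.patterns.append(x)

    @property
    def loads(self):
        return self.Y @ self.K_inv         # U = Y K^{-1}

    def predict(self, x):
        k = np.array([self.kernel(p, x) for p in self.patterns])
        return self.loads @ k              # y(x) = sum_t u_t k(x_t, x)


rng = np.random.default_rng(2)
X = rng.standard_normal((3, 6))   # 6 hypothetical reservoir states (columns), dim 3
Y = rng.standard_normal((2, 6))   # 2-dimensional targets y_t
model = RecursiveKernelReadout(rbf_kernel)
for t in range(6):
    model.add(X[:, t], Y[:, t])

K = np.array([[rbf_kernel(X[:, i], X[:, j]) for j in range(6)] for i in range(6)])
print(np.allclose(model.K_inv, np.linalg.inv(K)))     # True
print(np.allclose(model.predict(X[:, 3]), Y[:, 3]))   # True: stored item is recalled
```

Apart from the kernel evaluations themselves, the cost of each update depends on the number of stored patterns, not on the number of reservoir neurons.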

Importantly, this paper considers reservoir activity only in the context of memory retrieval, not in the context of replay of reservoir sequences. Replay has often been used in reservoir models to improve performance and stability (Mayer and Browne 2004; Jaeger 2010; Sussillo and Abbott 2012; Reinhart and Jakob Steil 2012; Laje and Buonomano 2013; Jaeger 2017; Leibold 2020). However, changing the reservoir patterns would also require changing the readout matrix to maintain the originally learned memory traces $y_t$ (Sussillo and Abbott 2012; Reinhart and Jakob Steil 2012; Jaeger 2017). In the model presented here, relearning is not necessary as long as the kernel remains fixed, i.e., as long as the topology of the space is constant. Neurobiologically, however, even this would require changing the weights $w$ by replacing the matrix $X$.
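A small numerical illustration of this point, under the assumption (made only for this example) that the reservoir patterns change by an orthogonal transformation $Q$, which preserves all scalar products and hence a linear kernel. The loads $U$ remain valid, whereas explicit weights $W = UX^T$ must be rebuilt from the new patterns. As before, a minimum-norm readout stands in for the paper's SVR, and all sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
dim, P = 40, 10
X = rng.standard_normal((dim, P))   # hypothetical reservoir patterns x_t (columns)
Y = rng.standard_normal((3, P))     # memory traces y_t

# Loads from a linear kernel K = X^T X (minimum-norm readout, illustration only)
K = X.T @ X
U = Y @ np.linalg.inv(K)
W = U @ X.T                         # explicit readout weights W = U X^T

# Assumed change of the reservoir: an orthogonal transform Q.
# Every scalar product x_s^T x_t, and hence the kernel, is preserved.
Q, _ = np.linalg.qr(rng.standard_normal((dim, dim)))
X_new = Q @ X
K_new = X_new.T @ X_new

print(np.allclose(K_new, K))        # True: same kernel, so the loads U still apply
print(np.allclose(W @ X_new, Y))    # False: the old weights no longer recall y_t
W_new = U @ X_new.T                 # weights rebuilt by replacing X with X_new
print(np.allclose(W_new @ X_new, Y))  # True
```

Real replay-induced changes need not be orthogonal; the example only shows that whenever the kernel is unchanged, the description in terms of loads survives while the synaptic weights do not.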

I have presented two neurobiological examples of how kernel representations are, or may be, implemented: hippocampal theta sequences and auditory nerve fiber populations. Temporal sequences of activation patterns, however, are ubiquitous in sensory-motor systems and occur on multiple time scales, so the proposed theory may apply to many other systems as well. A prerequisite is a continuous representation of time in the population patterns, which then translates, via the scalar product, into kernels with continuous time dependence. Further examples could include long-term changes in the hippocampal rate code of place cells (Mankin et al 2012; Ziv et al 2013), the activation of cerebellar Purkinje cells during limb movements (Hewitt et al 2011), or olfactory-driven activity that evolves along fixed trajectories after odor presentation (Stopfer and Laurent 1999).
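As a hypothetical example of this prerequisite being met, consider a sequence-type population in which each neuron is active in a Gaussian window around its own preferred time, loosely in the spirit of a theta sequence. The scalar product of two population vectors then yields a kernel that depends continuously on time. For densely tiled preferred times far from the edges, it is close to a Gaussian in the lag $t_1 - t_2$ with width $\sqrt{2}\,\sigma$. All parameters are arbitrary choices for illustration.

```python
import numpy as np

# Hypothetical sequence-type population: n_cells neurons whose activity
# fields tile the time axis [0, 1] with Gaussian profiles of width sigma.
n_cells, sigma = 200, 0.05
centers = np.linspace(0.0, 1.0, n_cells)


def population_pattern(t):
    """Population activity vector x(t) at time t (arbitrary units)."""
    return np.exp(-(t - centers) ** 2 / (2 * sigma ** 2))


def time_kernel(t1, t2):
    """Kernel obtained as the scalar product k(t1, t2) = x(t1) . x(t2)."""
    return float(population_pattern(t1) @ population_pattern(t2))


lags = np.array([0.0, 0.02, 0.05, 0.1, 0.2])
k = np.array([time_kernel(0.5, 0.5 + d) for d in lags])
# Away from the edges, k / k[0] is approximately exp(-d^2 / (4 sigma^2)).
print(np.round(k / k[0], 3))
print(np.round(np.exp(-lags ** 2 / (4 * sigma ** 2)), 3))
```

Any population code whose patterns drift smoothly with time would give a kernel of this kind; the Gaussian tiling simply makes the resulting time dependence easy to read off.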

Supplementary Information

The electronic supplementary material comprises the following files.

Video 1 (mp4, 388.9 KB). Original movie snippet from .re_potemkin, a copyleft, crowdsourced free/open-source cinema project (https://re-potemkin.httpdot.net/).

Video 2 (mp4, 250.7 KB). Reconstruction of the original movie snippet (Video 1).

Video 3 (mp4, 337 KB). Original movie with time extended in the middle.

Video 4 (mp4, 590.4 KB). As Video 3, but with a new scene added post hoc.

Video 5 (mp4, 461 KB). Reconstruction of the original movie (Video 1) with importance reduced in the last frames.

Video 6 (mp4, 322.4 KB). Reconstruction of the original movie (Video 1) with importance reduced in the first frames.

Audio file (wav, 4 MB). Filtered version (see Methods) of an original sound snippet from the song "I’ll be your everything" by Texas Radio Fish (http://ccmixter.org/files/texasradiofish/63300, CC BY-NC).

Acknowledgements

The work was funded by the German Research Foundation (DFG) under grant number LE2250/13-1. The author is indebted to Stefan Häusler for discussions and comments on the manuscript.

Funding

Open Access funding enabled and organized by Projekt DEAL.

Repositories

The core recursive KSVR implementation can be found at https://github.com/cleibold/recsvr.

Declarations

Conflict of interest

The author declares no competing interests.

Footnotes

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  1. Abeles M, Bergman H, Margalit E, et al. Spatiotemporal firing patterns in the frontal cortex of behaving monkeys. J Neurophysiol. 1993;70(4):1629–1638. doi: 10.1152/jn.1993.70.4.1629.
  2. Ahissar E, Kleinfeld D. Closed-loop neuronal computations: focus on vibrissa somatosensation in rat. Cereb Cortex. 2003;13(1):53–62. doi: 10.1093/cercor/13.1.53.
  3. Alberini CM, LeDoux JE. Memory reconsolidation. Curr Biol. 2013;23(17):R746–R750. doi: 10.1016/j.cub.2013.06.046.
  4. Amit DJ, Fusi S. Learning in neural networks with material synapses. Neural Comput. 1994;6(5):957–982. doi: 10.1162/neco.1994.6.5.957.
  5. Cybenko G. Approximation by superpositions of a sigmoidal function. Math Control Signals Syst. 1989;2(4):303–314. doi: 10.1007/BF02551274.
  6. Douglas RJ, Martin KA. Neuronal circuits of the neocortex. Annu Rev Neurosci. 2004;27:419–451. doi: 10.1146/annurev.neuro.27.070203.144152.
  7. Dragoi G, Buzsáki G. Temporal encoding of place sequences by hippocampal cell assemblies. Neuron. 2006;50(1):145–157. doi: 10.1016/j.neuron.2006.02.023.
  8. Dragoi G, Tonegawa S. Preplay of future place cell sequences by hippocampal cellular assemblies. Nature. 2011;469(7330):397–401. doi: 10.1038/nature09633.
  9. Dragoi G, Tonegawa S. Selection of preconfigured cell assemblies for representation of novel spatial experiences. Philos Trans R Soc Lond B Biol Sci. 2014;369(1635):20120522. doi: 10.1098/rstb.2012.0522.
  10. Farooq U, Dragoi G. Emergence of preconfigured and plastic time-compressed sequences in early postnatal development. Science. 2019;363(6423):168–173. doi: 10.1126/science.aav0502.
  11. Fernández-Ruiz A, Oliva A, Fermino de Oliveira E, et al. Long-duration hippocampal sharp wave ripples improve memory. Science. 2019;364(6445):1082–1086. doi: 10.1126/science.aax0758.
  12. Fetterhoff D, Sobolev A, Leibold C. Graded remapping of hippocampal ensembles under sensory conflicts. Cell Rep. 2021;36(11):109661. doi: 10.1016/j.celrep.2021.109661.
  13. Foster DJ, Wilson MA. Hippocampal theta sequences. Hippocampus. 2007;17(11):1093–1099. doi: 10.1002/hipo.20345.
  14. Glasberg BR, Moore BC. Derivation of auditory filter shapes from notched-noise data. Hear Res. 1990;47(1–2):103–138. doi: 10.1016/0378-5955(90)90170-T.
  15. Grigoryeva L, Ortega JP. Echo state networks are universal. Neural Netw. 2018;108:495–508. doi: 10.1016/j.neunet.2018.08.025.
  16. Haeusler S, Maass W. A statistical analysis of information-processing properties of lamina-specific cortical microcircuit models. Cereb Cortex. 2007;17(1):149–162. doi: 10.1093/cercor/bhj132.
  17. Hermans M, Schrauwen B. Recurrent kernel machines: computing with infinite echo state networks. Neural Comput. 2012;24(1):104–133. doi: 10.1162/NECO_a_00200.
  18. Hewitt AL, Popa LS, Pasalar S, et al. Representation of limb kinematics in Purkinje cell simple spike discharge is conserved across multiple tasks. J Neurophysiol. 2011;106(5):2232–2247. doi: 10.1152/jn.00886.2010.
  19. Hopfield JJ. Neural networks and physical systems with emergent collective computational abilities. Proc Natl Acad Sci USA. 1982;79(8):2554–2558. doi: 10.1073/pnas.79.8.2554.
  20. Hornik K. Approximation capabilities of multilayer feedforward networks. Neural Netw. 1991;4(2):251–257. doi: 10.1016/0893-6080(91)90009-T.
  21. Hyman IE Jr, Husband TH, Billings FJ. False memories of childhood experiences. Appl Cognit Psychol. 1995;9(3):181–197. doi: 10.1002/acp.2350090302.
  22. Jadhav SP, Kemere C, German PW, et al. Awake hippocampal sharp-wave ripples support spatial memory. Science. 2012;336(6087):1454–1458. doi: 10.1126/science.1217230.
  23. Jaeger H. Reservoir riddles: suggestions for echo state network research. In: Proceedings of the 2005 IEEE International Joint Conference on Neural Networks, vol 3. 2005. pp. 1460–1462. doi: 10.1109/IJCNN.2005.1556090.
  24. Jaeger H. Reservoir self-control for achieving invariance against slow input distortions. 2010.
  25. Jaeger H. Using conceptors to manage neural long-term memories for temporal patterns. J Mach Learn Res. 2017;18(13):1–43.
  26. Jaeger H, Haas H. Harnessing nonlinearity: predicting chaotic systems and saving energy in wireless communication. Science. 2004;304(5667):78–80. doi: 10.1126/science.1091277.
  27. Karlsson MP, Frank LM. Awake replay of remote experiences in the hippocampus. Nat Neurosci. 2009;12(7):913–918. doi: 10.1038/nn.2344.
  28. Laje R, Buonomano DV. Robust timing and motor patterns by taming chaos in recurrent neural networks. Nat Neurosci. 2013;16(7):925–933. doi: 10.1038/nn.3405.
  29. Larkum M. A cellular mechanism for cortical associations: an organizing principle for the cerebral cortex. Trends Neurosci. 2013;36(3):141–151. doi: 10.1016/j.tins.2012.11.006.
  30. Lazar A, Pipa G, Triesch J. SORN: a self-organizing recurrent neural network. Front Comput Neurosci. 2009;3:23. doi: 10.3389/neuro.10.023.2009.
  31. Lee AK, Wilson MA. Memory of sequential experience in the hippocampus during slow wave sleep. Neuron. 2002;36(6):1183–1194. doi: 10.1016/S0896-6273(02)01096-6.
  32. Leibold C. A model for navigation in unknown environments based on a reservoir of hippocampal sequences. Neural Netw. 2020;124:328–342. doi: 10.1016/j.neunet.2020.01.014.
  33. Leutgeb JK, Leutgeb S, Treves A, et al. Progressive transformation of hippocampal neuronal representations in “morphed” environments. Neuron. 2005;48(2):345–358. doi: 10.1016/j.neuron.2005.09.007.
  34. Loftus EF. When a lie becomes memory’s truth: memory distortion after exposure to misinformation. Curr Dir Psychol Sci. 1992;1(4):121–123. doi: 10.1111/1467-8721.ep10769035.
  35. Lukoševičius M, Jaeger H. Reservoir computing approaches to recurrent neural network training. Comput Sci Rev. 2009;3(3):127–149. doi: 10.1016/j.cosrev.2009.03.005.
  36. Maass W, Natschläger T, Markram H. Real-time computing without stable states: a new framework for neural computation based on perturbations. Neural Comput. 2002;14(11):2531–2560. doi: 10.1162/089976602760407955.
  37. Mankin EA, Sparks FT, Slayyeh B, et al. Neuronal code for extended time in the hippocampus. Proc Natl Acad Sci USA. 2012;109(47):19462–19467. doi: 10.1073/pnas.1214107109.
  38. Mayer NM, Browne M. Echo state networks and self-prediction. In: Ijspeert AJ, Murata M, Wakamiya N, editors. Biologically inspired approaches to advanced information technology. Berlin: Springer; 2004. pp. 40–48.
  39. Milekic MH, Alberini CM. Temporally graded requirement for protein synthesis following memory reactivation. Neuron. 2002;36(3):521–525.
  40. Muller RU, Kubie JL. The effects of changes in the environment on the spatial firing of hippocampal complex-spike cells. J Neurosci. 1987;7(7):1951–1968. doi: 10.1523/JNEUROSCI.07-07-01951.1987.
  41. Nader K, Schafe GE, LeDoux JE. Fear memories require protein synthesis in the amygdala for reconsolidation after retrieval. Nature. 2000;406(6797):722–726. doi: 10.1038/35021052.
  42. Nocedal J, Wright SJ. Numerical optimization. 2nd ed. New York: Springer; 2006.
  43. Reinhart RF, Jakob Steil J. Regularization and stability in reservoir networks with output feedback. Neurocomputing. 2012;90:96–105. doi: 10.1016/j.neucom.2012.01.032.
  44. Sara SJ. Retrieval and reconsolidation: toward a neurobiology of remembering. Learn Mem. 2000;7(2):73–84. doi: 10.1101/lm.7.2.73.
  45. Schölkopf B, Smola AJ. Learning with kernels: support vector machines, regularization, optimization, and beyond. Adaptive computation and machine learning. MIT Press; 2002.
  46. Schölkopf B, Herbrich R, Smola AJ. A generalized representer theorem. In: Helmbold D, Williamson B, editors. Computational learning theory. Berlin: Springer; 2001. pp. 416–426.
  47. Schrauwen B, Verstraeten D, Campenhout JMV. An overview of reservoir computing: theory, applications and implementations. In: ESANN 2007, 15th European Symposium on Artificial Neural Networks, Bruges, Belgium, April 25–27, 2007, Proceedings. 2007. pp. 471–482. https://www.elen.ucl.ac.be/Proceedings/esann/esannpdf/es2007-8.pdf
  48. Stanley GB. Recursive stimulus reconstruction algorithms for real-time implementation in neural ensembles. Neurocomputing. 2001;38–40:1703–1708. doi: 10.1016/S0925-2312(01)00535-5.
  49. Stopfer M, Laurent G. Short-term memory in olfactory network dynamics. Nature. 1999;402(6762):664–668. doi: 10.1038/45244.
  50. Sussillo D, Abbott LF. Generating coherent patterns of activity from chaotic neural networks. Neuron. 2009;63(4):544–557. doi: 10.1016/j.neuron.2009.07.018.
  51. Sussillo D, Abbott LF. Transferring learning from external to internal weights in echo-state networks with sparse connectivity. PLoS One. 2012;7(5):e37372. doi: 10.1371/journal.pone.0037372.
  52. Vapnik VN. The nature of statistical learning theory. New York: Springer; 1995.
  53. Williams RJ, Zipser D. A learning algorithm for continually running fully recurrent neural networks. Neural Comput. 1989;1(2):270–280. doi: 10.1162/neco.1989.1.2.270.
  54. Ziv Y, Burns LD, Cocker ED, et al. Long-term dynamics of CA1 hippocampal place codes. Nat Neurosci. 2013;16(3):264–266. doi: 10.1038/nn.3329.

Articles from Biological Cybernetics are provided here courtesy of Springer
