Biol Cybern. 2022 Mar 14;116(3):327–362. doi: 10.1007/s00422-022-00923-y

The Impact of Sparse Coding on Memory Lifetimes in Simple and Complex Models of Synaptic Plasticity

Terry Elliott
PMCID: PMC9170679  PMID: 35286444

Abstract

Models of associative memory with discrete state synapses learn new memories by forgetting old ones. In the simplest models, memories are forgotten exponentially quickly. Sparse population coding ameliorates this problem, as do complex models of synaptic plasticity that posit internal synaptic states, giving rise to synaptic metaplasticity. We examine memory lifetimes in both simple and complex models of synaptic plasticity with sparse coding. We consider our own integrative, filter-based model of synaptic plasticity, and examine the cascade and serial synapse models for comparison. We explore memory lifetimes at both the single-neuron and the population level, allowing for spontaneous activity. Memory lifetimes are defined using either a signal-to-noise ratio (SNR) approach or a first passage time (FPT) method, although we use the latter only for simple models at the single-neuron level. All studied models exhibit a decrease in the optimal single-neuron SNR memory lifetime, optimised with respect to sparseness, as the probability of synaptic updates decreases or, equivalently, as synaptic complexity increases. This holds regardless of spontaneous activity levels. In contrast, at the population level, even a low but nonzero level of spontaneous activity is critical in facilitating an increase in optimal SNR memory lifetimes with increasing synaptic complexity, but only in filter and serial models. However, SNR memory lifetimes are valid only in an asymptotic regime in which a mean field approximation is valid. By considering FPT memory lifetimes, we find that this asymptotic regime is not satisfied for very sparse coding, violating the conditions for the optimisation of single-perceptron SNR memory lifetimes with respect to sparseness. Similar violations are also expected for complex models of synaptic plasticity.

Keywords: Synaptic plasticity, Memory models, Sparse coding, Stochastic processes

Introduction

One line of experimental evidence suggests that synapses may occupy only a very limited number of discrete states of synaptic strength (Petersen et al. 1998; Montgomery and Madison 2002, 2004; O’Connor et al. 2005a, b; Bartol et al. 2015), or may change their strengths via discrete, jump-like processes (Yasuda et al. 2003; Bagal et al. 2005; Sobczyk and Svoboda 2007). Discrete state synapses overcome the catastrophic forgetting of the Hopfield model (Hopfield 1982) in associative memory tasks, turning memory systems into so-called palimpsests, which learn new memories by forgetting old ones (Nadal et al. 1986; Parisi 1986). Unfortunately, memory lifetimes in the simplest such models are rather limited, growing only logarithmically with the number of synapses (Tsodyks 1990; Amit and Fusi 1994; see also Leibold and Kempter 2006; Barrett and van Rossum 2008; Huang and Amit 2010). Memory lifetimes may be extended by considering either sparse coding at the population level (Tsodyks and Feigel’man 1988) or complex models of synaptic plasticity in which synapses can express metaplasticity (changes in internal states) without necessarily expressing plasticity (changes in strength) (Fusi et al. 2005; Leibold and Kempter 2008; Elliott and Lagogiannis 2012; Lahiri and Ganguli 2013). Two previous studies have examined complex models of synaptic plasticity operating in concert with sparse coding (Leibold and Kempter 2008; Rubin and Fusi 2007). For a discussion of the possible roles of the persistence and transience of memories and the synaptic mechanisms underlying synaptic stability, see, for example, Richards and Frankland (2017), and Rao-Ruiz et al. (2021).

We have proposed integrate-and-express models of synaptic plasticity in which synapses act as low-pass filters in order to control fluctuations in developmental patterns of synaptic connectivity (Elliott 2008; Elliott and Lagogiannis 2009). We have also applied these complex models of synaptic plasticity to memory formation, retention and longevity with discrete synapses (Elliott and Lagogiannis 2012), finding that they outperform cascade models (Fusi et al. 2005) in most biologically relevant regions of parameter space (Elliott 2016b). In this paper, we consider the role of sparse coding in the memory dynamics of a filter-based model. For comparison, we also consider the cascade model (Fusi et al. 2005), the serial synapse model (Leibold and Kempter 2008; Rubin and Fusi 2007) and a model of simple synapses (Tsodyks 1990) using our protocols.

Our paper is organised as follows. In Sect. 2, we present our general approach by describing the two memory storage protocols that we study, considering two different definitions of memory lifetimes, and obtaining general, model-independent results. Then, in Sect. 3, we consider both simple and complex models of synaptic plasticity, obtaining the analytical results required to study memory lifetimes in detail. We compare and contrast results for memory lifetimes in simple and complex models in Sect. 4. Finally, in Sect. 5, we briefly discuss our results.

General approach and formulation

We provide a convenient list of the most commonly used mathematical symbols and their meanings, excluding those that appear in the appendices, in Table 1.

Table 1.

List of frequently used mathematical symbols and their meanings (excluding those in Appendix B)

Symbol : Meaning
P : Number of neurons in the memory system
N : Number of synaptic inputs received by each neuron
𝒩 = NP : Total number of synapses in the memory system
r : Rate of the Poisson process for memory storage (Hz)
S_i(t) : Strength of synapse i, i = 1, …, N, at time t ≥ 0 s, for a typical perceptron in the population
ξ_α : Memory α, α = 0, 1, 2, …, to be stored by a single perceptron at Poisson storage step α
h(t) ≡ h_{ξ_0}(t) : Memory signal for the tracked memory stored at t = 0 s
h_α : Memory signal for the tracked memory immediately after the storage of memory α
h_0 : h_0 = h(0), the initial perceptron activation induced by the tracked memory ξ_0
μ(t), σ(t)^2, SNR(t) : Mean, variance and signal-to-noise ratio of h(t), the single-perceptron memory signal
τ_snr : SNR memory lifetime of a typical memory, for a single perceptron
τ_mfpt(h_0) : MFPT memory lifetime conditioned on a definite activation h_0 induced by a definite, tracked memory
ϑ : Perceptron’s firing threshold
τ_mfpt : Conditional MFPT τ_mfpt(h_0) averaged over all tracked memories with h_0 > ϑ
σ_fpt^2 : Variance in the FPT
A(h), B(h) : Jump moments in the Fokker–Planck equation
ζ : Level of spontaneous electrical activity, with 0 ≤ ζ < 1
f, g : Probabilities of evoked (rather than spontaneous) pre- and postsynaptic (respectively) activities; f ≡ g
s : Number of internal synaptic states for each of the two possible states of synaptic strength
M_+, M_-; M : Potentiating and depressing matrices describing transitions in a single synapse’s state; M = (M_+ + M_-)/2
K_±; K : K_± = (1-f)I + fM_±, I being the identity matrix; K = (K_+ + K_-)/2
T_n : Operator describing simultaneous changes in n synapses’ states at each (average) non-tracked memory storage step
A_n : Normalised unit eigenstate of T_n, giving the joint equilibrium probability distribution of n synapses’ states
Ω : Ω^T = (-1^T | +1^T), the vector by which to weight a synapse’s internal states by their strengths
P_eff : Number of neurons in the population of P neurons that experience evoked activity during tracked memory storage
μ_p(t), σ_p(t)^2, SNR_p(t) : Mean, variance and signal-to-noise ratio of the population memory signal
τ_pop : SNR memory lifetime of a typical memory for the population of neurons
p, ψ : Probability p of synaptic updates for a simple stochastic updater (SU) synapse; ψ = fp is a convenient shorthand
κ^2 : κ^2 = ψ/(2-ψ), the equilibrium correlation between pairs of SU synapses’ strengths in the Hebb protocol
B_0 : B_0 = B(0)/(ψg), the part of B(h) that is independent of h, up to scaling factors
N_eff : Number of a single perceptron’s synapses that experience evoked presynaptic activity during tracked memory storage
Θ : Synaptic filter threshold
X^opt : For any parameter X, the label “opt” indicates the value of X that maximises τ_snr or τ_pop
τ_snr^opt, τ_pop^opt : Maximum values of τ_snr or τ_pop, optimised with respect to some parameter X

Memories and memory lifetimes

We consider a population of P neurons forming a memory system, perhaps performing association or auto-association tasks. Let each neuron receive N synaptic connections from N other neurons that are randomly selected from the entire population. Fully recurrent connectivity would imply that N = P - 1 (excluding self-connections), but in general N ≪ P. Other than the requirement that N < P, N may be regarded as mathematically independent of P. Memories are stored sequentially, one after the other, by this memory system. We take them to be stored at times t ≥ 0 s governed by a Poisson process of rate r Hz. This continuous time approach is more realistic than a discrete time approach in which memories are stored at uniformly spaced time steps. Due to ongoing synaptic plasticity driven by the storage of later memories, the synaptic patterns that embody earlier memories may be degraded, so that the fidelity of recall of earlier memories may fall over time, ultimately falling to an equilibrium or background level of complete amnesia. It is typical in these scenarios to track the fidelity of recall of the first memory as subsequent memories are stored. This first memory is taken to be stored at time t = 0 s on the background equilibrium probability distribution of synaptic strengths.

In previous work, we have focused on a single neuron, or perceptron, in such a system and have examined its recall of stored memories. Here, in a sparse population coding context, we must consider the collective dynamics of the entire population of neurons, but these collective dynamics are nevertheless driven by synaptic processes occurring at the level of single perceptrons in the system. Considering, then, a single perceptron in this population, let its N synapses have strengths S_i(t) ∈ {-1, +1}, i = 1, …, N, at time t ≥ 0 s. These two strength states should be thought of as low and high rather than inhibitory and excitatory. As memories are presented to the system for storage, the perceptron is exposed to synaptic inputs characterised by the N-dimensional vectors ξ_α, α = 0, 1, 2, …, where α indexes the memories. The component ξ_i^α represents the input through synapse i during the presentation of memory α, and for simplicity we assume that these components are independent between synapses and across memories.

In response to each of these memory vectors, the perceptron must generate the correct activation or output. With inputs x_i through its N synapses, the perceptron’s activation is defined as usual by

h_{\underline{x}}(t) = \frac{1}{N}\sum_{i=1}^{N} x_i S_i(t). (1)

The perceptron’s output is some possibly nonlinear function of its activation, where this output can correspond to spontaneous activity under conditions of no (or spontaneous) input. We track the fidelity of recall of the first memory ξ_0 by examining the perceptron’s activation upon re-presentation (but not re-storage) of this memory at later times t > 0 s. We refer to h(t) ≡ h_{ξ_0}(t) as the tracked memory signal or just the memory signal. The dynamics of h(t) will determine the lifetime of memory ξ_0, at least as far as this single perceptron’s capacity to generate the correct output upon re-presentation of ξ_0 is concerned. Of course, we are not interested in the lifetime of any particular tracked memory ξ_0 stored on any particular pattern of synaptic connectivity and subject to any particular sequence of subsequent, non-tracked memories ξ_α, α > 0, stored at any particular set of Poisson-distributed times 0 < t_1 < t_2 < t_3 < ⋯. Rather, we are interested only in the lifetime of a typical tracked memory subject to a typical sequence of later memories. Thus, we consider only the statistical properties of h(t) when suitably averaged over all memories.
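To make this read-out concrete, the following minimal numpy sketch (our own illustration, not code from the paper) draws an equilibrium strength vector and a Hebb-protocol-style input with evoked probability f and spontaneous level ζ, and evaluates the activation of Eq. (1); all parameter values are illustrative.

```python
import numpy as np

# Sketch: the perceptron activation of Eq. (1) for a Hebb-protocol-style input
# with evoked probability f and spontaneous level zeta. Equilibrium strengths
# are drawn uniformly from {-1, +1}; all parameter values are illustrative.
rng = np.random.default_rng(0)
N = 1_000                                    # number of synapses
S = rng.choice([-1, +1], size=N)             # Prob[S_i = +-1] = 1/2
f, zeta = 0.05, 0.1
xi = np.where(rng.random(N) < f, 1.0, zeta)  # evoked (1) or spontaneous (zeta)
h = xi @ S / N                               # activation upon presenting xi
print(h)
```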

Memory lifetimes may be defined in a variety of ways using these statistical properties. The simplest definition is to consider the mean and variance of h(t)

\mu(t) = \mathrm{E}[h(t)], (2a)
\sigma(t)^2 = \mathrm{Var}[h(t)], (2b)

define the signal-to-noise ratio (SNR) as:

\mathrm{SNR}(t) = \left[\mu(t) - \mu(\infty)\right]/\sigma(t), (3)

and then define the memory lifetime as that value of t, call it τ_snr, that is the (largest) finite, non-negative solution of the equation SNR(τ_snr) = 1 when this solution exists; otherwise, we set τ_snr = 0 s (Tsodyks 1990). This is the last time at which μ(t) is distinguishable from its equilibrium value μ(∞) at the level of one standard deviation. Although an “ideal observer” approach to defining memory lifetimes has also been considered (Fusi et al. 2005; Lahiri and Ganguli 2013), it is essentially equivalent to the SNR approach (Elliott 2016b).
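As an illustration of this definition, the sketch below numerically locates τ_snr as the largest root of SNR(t) = 1 by bracketing and bisection; the exponentially decaying mean and constant noise are hypothetical stand-ins, not one of the models analysed in this paper.

```python
import numpy as np
from scipy.optimize import brentq

# Sketch: tau_snr as the largest finite, non-negative root of SNR(t) = 1, with
# tau_snr = 0 when no root exists. The decaying mean and constant noise below
# are hypothetical stand-ins for a model's mu(t) and sigma(t).
mu = lambda t: 0.1 * np.exp(-0.01 * t)   # hypothetical memory signal mean
mu_inf, sigma = 0.0, 0.02                # hypothetical equilibrium and noise
excess = lambda t: (mu(t) - mu_inf) / sigma - 1.0
if excess(0.0) <= 0.0:
    tau_snr = 0.0                        # SNR never exceeds unity
else:
    t_hi = 1.0
    while excess(t_hi) > 0.0:            # bracket the root of SNR(t) = 1
        t_hi *= 2.0
    tau_snr = brentq(excess, 0.0, t_hi)
print(tau_snr)                           # approx. 100*ln(5), about 161 s
```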

The activation h(t) provides a direct read-out of the perceptron’s response to the re-presentation of ξ_0 at later times, and would correspond to a neuron’s membrane potential in a more realistic, integrate-and-fire model. By focusing on this read-out of the perceptron’s state, we are naturally led to consider the first passage time (FPT) for the perceptron’s activation to fall below firing threshold, and thus to consider the mean first passage time (MFPT) for this process, which is the mean taken over all tracked and non-tracked memories (Elliott 2014). We may then define an alternative memory lifetime, call it τmfpt, as this MFPT for the perceptron’s activation in response to re-presentation of a typical tracked memory to fall below firing threshold. We have extensively discussed and contrasted the SNR and FPT approaches to defining memory lifetimes elsewhere (Elliott 2014, 2016a, 2017a). In essence, SNR memory lifetimes are only valid in asymptotic, typically large N regimes, while FPT memory lifetimes are valid in all regimes. SNR lifetimes must therefore be interpreted with caution.

To compute FPT lifetimes, we require Prob[h_{α+1}|h_α], the transition probability describing the probability that the perceptron’s activation (in response to re-presentation of the tracked memory) is h_{α+1} immediately after the storage of average non-tracked memory ξ_{α+1}, given that its activation is h_α immediately before the memory’s storage. This transition probability is most easily computed in simple models of synaptic plasticity, for which it is independent of the memory storage step (Elliott 2014, 2017a, 2019). This independence arises because simple synapses with only two strength states are “stateless” (Elliott 2020), having no internal states and not enough strength states to carry information between consecutive memory storage steps. In this case, all the probabilities Prob[h_{α+1}|h_α] over all the possible, discrete values of h_α and h_{α+1} define the elements of a transition matrix in the perceptron’s activation between memory storage steps that is independent of the non-tracked memory storage step α+1 (α ≥ 0). We can then drop the index α and consider general elements Prob[h′|h] between any two possible values of the perceptron activation, h and h′. We will therefore examine FPT lifetimes only for simple synapses, and SNR memory lifetimes for both simple and complex synapses, but with the understanding that SNR results must be interpreted cautiously. With the transition probabilities Prob[h′|h] being independent of the memory storage step, and with the storage of the definite tracked memory ξ_0 inducing the definite activation h_0 immediately after its storage, the FPT lifetime of the memory ξ_0 is the solution of the equation

\tau_{\rm mfpt}(h_0) = \frac{1}{r} + \sum_{h' > \vartheta} \tau_{\rm mfpt}(h')\,\mathrm{Prob}[h'|h_0], (4)

where transitions to activations below the firing threshold ϑ are disallowed (Elliott 2014; see van Kampen 1992, for a general discussion). Equation (4) generalises in the obvious way to an integral equation when h can be regarded as a continuous rather than discrete variable. Solving Eq. (4) for τ_mfpt(h_0) for all values of h_0 > ϑ entails solving a linear system involving the perceptron activation transition matrix. We therefore refer to Eq. (4) for simplicity but somewhat inaccurately as a matrix equation, to distinguish it from its integral equation equivalent in the continuum limit. This matrix or integral equation (MIE) approach to FPTs is exact. We may also consider an approximation involving the Fokker–Planck equation (FPE) approach by computing the jump moments induced by Prob[h′|h]. Because memories are stored as a Poisson process, the jump moments are simply

A(h) = \mathrm{E}[(h' - h)^1\,|\,h], (5a)
B(h) = \mathrm{E}[(h' - h)^2\,|\,h], (5b)

where the expectation values are calculated with respect to Prob[h′|h]. Then, standard methods (Elliott 2014; see van Kampen 1992, in general) give the MFPT as the solution of the equation:

-\frac{1}{r} = A(h_0)\frac{\mathrm{d}}{\mathrm{d}h_0}\tau_{\rm mfpt}(h_0) + \frac{1}{2}B(h_0)\frac{\mathrm{d}^2}{\mathrm{d}h_0^2}\tau_{\rm mfpt}(h_0), (6)

subject to the boundary condition τ_mfpt(ϑ) = 0. Equations similar to Eqs. (4) and (6) give the higher-order FPT moments (Elliott 2019). Given τ_mfpt(h_0), we obtain τ_mfpt by averaging over the distribution of h_0 (for values of h_0 > ϑ), corresponding to averaging over the tracked memory ξ_0, i.e. τ_mfpt = ⟨τ_mfpt(h_0)⟩_{h_0 > ϑ}.
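In matrix form, solving Eq. (4) amounts to restricting the activation transition matrix to the supra-threshold states and solving a linear system, as the sketch below shows; the lazy random walk used here is a hypothetical stand-in for the model-specific Prob[h′|h] derived in Appendix A.

```python
import numpy as np

# Sketch: the matrix form of Eq. (4). Restrict the activation transition
# matrix to supra-threshold states h > theta and solve (I - P) tau = (1/r) 1
# for the vector of MFPTs. The lazy random walk below is a hypothetical
# stand-in for the model-specific Prob[h'|h] of Appendix A.
r = 1.0                         # Poisson storage rate (Hz)
n = 50                          # number of supra-threshold activation states
P = np.zeros((n, n))
for i in range(n):
    P[i, i] = 0.9               # no change at this storage step
    if i + 1 < n:
        P[i, i + 1] = 0.05      # activation steps up
    if i - 1 >= 0:
        P[i, i - 1] = 0.05      # activation steps down
P[n - 1, n - 1] += 0.05         # reflecting at the top of the range
# state 0 leaks below threshold with probability 0.05 (first passage)
tau = np.linalg.solve(np.eye(n) - P, np.full(n, 1.0 / r))
print(tau[:5])                  # MFPTs conditioned on the initial state
```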

Hebb protocol

We adopt and adapt the memory storage protocol employed by Leibold and Kempter (2008). Their memory system performs an association task. Within the population of P neurons, a sub-population of “cue” neurons is required to activate a sub-population of “target” neurons. Synapses from cue to target neurons experience potentiating induction signals during memory storage, while those from target to cue experience depressing induction signals; all other synapses do not experience plasticity induction signals. Although Leibold and Kempter (2008) do allow for the possibility of some overlap between cue and target sub-populations, this will not be relevant here. The storage of different memories involves different cue and target sub-populations, so that the entire population of P neurons will be involved in storing many memories over time. If cue and target sub-populations are of equal size, as we assume, then potentiation and depression processes are equally balanced on average. This assumption stands in lieu of realistic neuron models, in which we expect (Elliott 2016a) synaptic plasticity to be dynamically regulated to move to stable dynamical fixed points in which such balancing is achieved automatically (Bienenstock et al. 1982; Burkitt et al. 2004; Appleby and Elliott 2006).

While Leibold and Kempter (2008) consider activities ξ_i^α ∈ {0, 1}, corresponding to inactive (ξ_i^α = 0) and active (ξ_i^α = 1) input neurons, we will consider the more general case of ξ_i^α ∈ {ζ, 1}, with 0 ≤ ζ < 1, where ξ_i^α = ζ represents a spontaneous, non-evoked, or background level of activity for an input neuron that is in neither cue nor target sub-populations, while ξ_i^α = 1 represents evoked activity from a cue or target input. We often refer below to active and inactive inputs or neurons, with the understanding that we mean evoked activity and spontaneous activity, respectively. Because synaptic plasticity occurs only between cue and target neurons, synapses between a pre- or postsynaptic neuron that is only spontaneously active do not undergo synaptic plasticity. This accords with our expectations from known physiology: protocols for long-term potentiation (LTP; Bliss and Lømo 1973) and long-term depression (LTD; Lynch et al. 1977) require sustained bouts of evoked electrical activity rather than just spontaneous levels of activity. On a broadly BCM view of synaptic plasticity (Bienenstock et al. 1982), we would expect two thresholds for synaptic plasticity: as activity levels ramp up from spontaneous to weak to strong tetanisation, plasticity switches from none to LTD to LTP. Since synaptic plasticity can only occur between pairs of active, synaptically coupled neurons in this scenario, we refer to it as the Hebb protocol: Hebbian synaptic plasticity is typically understood to mean activity-dependent, bidirectional synaptic plasticity between active pre- and postsynaptic neurons. Although spontaneous activity has by assumption no impact on synaptic plasticity here, it nevertheless has a direct impact on h(t).

For a particular perceptron, let the probability that it is active during the storage of any particular memory be g. Since the perceptron could be part of either cue or target sub-populations, the probability that it is a cue, or that it is a target, during the storage of a memory is just g/2 in each case. The probability that any one of its synaptic inputs is active during memory storage is also just g. However, for the purposes of clarity it is convenient to distinguish between these two probabilities, so we denote the probability that an input is active as f (≡ g). In this way, the appearance of a factor of g indicates a global, postsynaptic factor due to the perceptron, or postsynaptic cell, being in the cue or target population, while a factor of f indicates a local, presynaptic factor due to an input being in the cue or target population. The probability g, or f, controls the sparseness of the memory representation in this memory system. Considering just a single perceptron, if it is neither cue nor target, then none of its synapses can experience plasticity induction signals. If it is a cue, then only those inputs that correspond to target cells (if any) experience plasticity induction signals, and specifically depressing signals. If it is a target, then similarly only cue inputs experience induction signals, and so only potentiating signals. Without loss of generality, we may therefore just assume that during memory storage, an active perceptron’s active inputs are either all cue or all target neurons. This simplifying assumption effectively doubles on average the rate of plasticity induction signals experienced by synapses compared to the scenario in which the perceptron’s active inputs could represent a combination of cue and target neurons. We could therefore just scale f accordingly.

We summarise the Hebb protocol in Fig. 1, which schematically illustrates a sample of the population of pairs of pre- and postsynaptic neurons, showing all possible combinations of presynaptic activities and postsynaptic roles with their respective probabilities, together with the direction of synaptic plasticity induced by them.

Fig. 1.


Schematic illustration of the Hebb protocol for memory storage. Six pairs of synaptically coupled neurons are shown. Each cell body is represented by a triangle, with the value (ζ or 1) inside the triangle indicating the neuron’s activity during memory storage. A neuron’s axon is denoted by a directed line, while two of its dendrites are denoted by the dashed lines. Synaptic coupling is indicated by a small black blob where an axon terminates on a dendrite, with the symbol to the right of the blob indicating the direction of induced synaptic plasticity during memory storage (“↑” indicates potentiation, “↓” depression, and “×” no change). The labels “C”, “N” or “T” attached to a postsynaptic cell body indicate that the neuron is a cue cell, neither a cue cell nor a target cell, or a target cell, respectively, in the population. Probabilities of presynaptic activity (f or 1-f) are indicated, as are the joint probabilities of postsynaptic activity and specific role (g/2 or 1-g). The fact that an active presynaptic neuron synapsing on a cue or target cell always experiences the induction of depression or potentiation, respectively, reflects the simplifying assumption discussed in the main text

To assess memory lifetimes under this protocol, we may track the ability of the cue sub-population to successfully evoke activity in the target sub-population. Considering a single perceptron in the target sub-population, we may obtain general expressions for μ(t) and σ(t)^2 in Eq. (2), where these expressions are independent of any particular model of synaptic plasticity. Because h(t) = (1/N) Σ_{i=1}^{N} ξ_i^0 S_i(t) and similarly h(t)^2 = (1/N^2) Σ_{i,j=1}^{N} ξ_i^0 ξ_j^0 S_i(t) S_j(t), their expectation values lead to

\mu(t) = f\,\mathrm{E}[S_1(t)\,|\,+] + (1-f)\,\zeta\,\underbrace{\mathrm{E}[S_1(t)\,|\,\times]}_{\equiv\,0}, (7a)
\sigma(t)^2 = \frac{f + (1-f)\zeta^2 - \mu(t)^2}{N} + \frac{N-1}{N}\left\{ f^2\,\mathrm{E}[S_1(t)S_2(t)\,|\,{+}{+}] + 2f(1-f)\zeta\,\mathrm{E}[S_1(t)S_2(t)\,|\,{+}{\times}] + (1-f)^2\zeta^2\,\mathrm{E}[S_1(t)S_2(t)\,|\,{\times}{\times}] - \mu(t)^2 \right\}, (7b)

where we could pick any synapse i in Eq. (7a) and any distinct pair of synapses i and j in Eq. (7b), but we restrict without loss of generality to i = 1 and j = 2. In these equations, we condition on whether a synapse has experienced a potentiating induction signal (“+”) with probability f or not (“×”) with probability 1-f, during the storage of ξ_0. For the models of synaptic plasticity that we consider below, the (marginal) equilibrium probability distribution of any single synapse’s strength is uniform, or Prob[S_i(∞) = ±1] = 1/2, so that if a synapse does not experience a plasticity induction signal during the storage of ξ_0, then E[S_i(t)|×] ≡ 0 at t = 0 s, and this remains true for all times t ≥ 0 s when potentiation and depression processes are treated symmetrically, as indicated in Eq. (7a). However, for the pairwise correlations in Eq. (7b) that condition on one or both synapses not having experienced an induction signal, the expectation values do not vanish under the Hebb protocol. This is because of the higher-order equilibrium correlational structure induced by the fact that it is impossible for some of the synapses of an active neuron to experience potentiating induction signals while others experience depressing induction signals during the storage of the same memory under the Hebb protocol.

We may obtain general expressions for the expectation values in Eq. (7) by writing down the transition processes that govern changes in a single synapse’s strength or simultaneous changes in a pair of synapses’ strengths. Let each synapse have s possible internal states for each of its two possible strengths ±1, so that the possible state of a synapse is described by a 2s-dimensional vector, with the internal states for strength -1 (respectively, +1) corresponding to the first (respectively, last) s components. Given the stochastic nature of the plasticity induction signals, this vector defines a joint probability distribution for a synapse’s combined strength and internal state. Let the transition matrix M_+ implement the definite change in a synapse’s state in response to a potentiating induction signal, and M_- that for a depressing induction signal. We then determine the transition matrix governing the change in a single synapse’s state in response to the storage of a typical non-tracked memory by conditioning on all possible combinations of presynaptic activity and postsynaptic role. Defining K_± = (1-f)I + fM_±, where I is the identity matrix, this transition matrix is

T_1 = (1-g)I + \tfrac{1}{2}g K_+ + \tfrac{1}{2}g K_-, (8a)

or just T_1 = (1-g)I + gK, where K = (1-f)I + fM with M = (M_+ + M_-)/2. The three terms in Eq. (8a) arise from conditioning on the three possible perceptron roles in memory storage (determined by the global factor g), while the two terms in each of K_± arise from conditioning on the two possible levels of presynaptic activity (determined by the local factor f). Similarly, the transition operator that governs simultaneous changes in pairs of synapses’ states during typical non-tracked memory storage is

T_2 = (1-g)\,I \otimes I + \tfrac{1}{2}g\,K_+ \otimes K_+ + \tfrac{1}{2}g\,K_- \otimes K_-, (8b)

with the generalisation to T_n for any number of synapses n being clear. The (marginal) equilibrium probability distribution of a single synapse’s state, denoted by A_1, is the (normalised) eigenvector of T_1 with unit eigenvalue, which is also just the unit eigenvector of M. That for any pair of synapses, A_2, corresponds to the unit eigenvector of T_2. However, because T_2 ≠ (1-g) I ⊗ I + g K ⊗ K, we have A_2 ≠ A_1 ⊗ A_1. Rather, A_2 must be explicitly computed as the unit eigenstate of (K_+ ⊗ K_+ + K_- ⊗ K_-)/2. It is this failure of factorisation that induces the non-trivial pairwise correlational structure in the equilibrium state.
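This failure of factorisation is easy to verify numerically. The sketch below (ours, not the paper’s code) builds (K_+ ⊗ K_+ + K_- ⊗ K_-)/2 using the stochastic updater matrices of Sect. 3.1 purely for concreteness, extracts the unit eigenstate, and recovers the pairwise correlation κ^2 = ψ/(2-ψ) of Eq. (17).

```python
import numpy as np

# Sketch: the Hebb-protocol pairwise equilibrium A_2 is the unit eigenstate of
# (K+ (x) K+ + K- (x) K-)/2 and does not factorise as A_1 (x) A_1. The
# stochastic updater matrices of Sect. 3.1 are used purely for concreteness.
f, p = 0.1, 0.1
psi = f * p
K_plus = np.array([[1 - psi, 0.0], [psi, 1.0]])
K_minus = np.array([[1.0, psi], [0.0, 1 - psi]])
pair = 0.5 * (np.kron(K_plus, K_plus) + np.kron(K_minus, K_minus))
w, V = np.linalg.eig(pair)
A2 = np.real(V[:, np.argmin(np.abs(w - 1.0))])
A2 /= A2.sum()                       # normalise to a probability distribution
Omega = np.array([-1.0, 1.0])        # strength weights (-1, +1)
print(np.kron(Omega, Omega) @ A2)    # pairwise correlation, equals psi/(2-psi)
```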

Using T_1 and T_2, we may write down the conditional expectation values in Eq. (7). We define the vector Ω^T = (-1^T | +1^T), where T denotes the transpose and the s-dimensional vector 1 is a vector all of whose components are unity. This vector weights synaptic states according to their two possible strengths. Then,

\mathrm{E}[S_1(t)\,|\,+] = \underline{\Omega}^T e^{(T_1 - I)rt} M_+ \underline{A}_1, (9a)

and

\mathrm{E}[S_1(t)S_2(t)\,|\,{+}{+}] = (\underline{\Omega}^T \otimes \underline{\Omega}^T)\, e^{(T_2 - I \otimes I)rt}\, (M_+ \otimes M_+)\, \underline{A}_2, (9b)

and for the other two pairwise expectation values in Eq. (7b), we replace M_+ ⊗ M_+ in Eq. (9b) by M_+ ⊗ I for +× and by I ⊗ I for ××. Since T_1 - I = fg(M - I), we have μ(t) = f Ω^T e^{(M-I)fgrt} M_+ A_1, so that sparse coding just introduces a multiplicative factor of f and scales the rate r by the product fg in μ(t). In the equilibrium limit, by definition exp[(T_n - I ⊗ ⋯ ⊗ I)rt] v → A_n for any state v corresponding to a probability distribution, as t → ∞. Hence, μ(∞) = f Ω^T A_1 ≡ 0, which always follows when potentiation and depression processes are treated symmetrically. For the equilibrium variance, we obtain

\sigma(\infty)^2 = \frac{f + (1-f)\zeta^2}{N} + \frac{N-1}{N}\left[f + (1-f)\zeta\right]^2 (\underline{\Omega}^T \otimes \underline{\Omega}^T)\underline{A}_2. (10)

The second, covariance term does not in general vanish because of the equilibrium synaptic pairwise correlations.

These general results allow us to obtain SNR lifetimes when the matrices M_± are specified for any particular model of synaptic plasticity. We defer the derivation of the transition matrix elements Prob[h′|h] that are required for FPT lifetimes until we explicitly discuss simple models of synaptic plasticity in Sect. 3.1.
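As a concrete instance of this machinery, the sketch below (ours, not the paper’s code) evaluates μ(t) = f Ω^T exp[(T_1 - I)rt] M_+ A_1 by direct matrix exponentiation, with the stochastic updater matrices of Sect. 3.1 as the illustrative choice of M_±; the output can be checked against the closed form μ(t) = fp e^{-fgprt} of Eq. (18).

```python
import numpy as np
from scipy.linalg import expm

# Sketch: evaluating mu(t) = f Omega^T exp[(T1 - I) r t] M+ A_1 (cf. Eq. 9a)
# by direct matrix exponentiation, with stochastic updater matrices
# (Sect. 3.1) as the illustrative choice of M+ and M-.
f, g, p, r = 0.05, 0.05, 0.1, 1.0
M_plus = np.array([[1 - p, 0.0], [p, 1.0]])
M_minus = np.array([[1.0, p], [0.0, 1 - p]])
M = 0.5 * (M_plus + M_minus)
K = (1 - f) * np.eye(2) + f * M
T1 = (1 - g) * np.eye(2) + g * K
A1 = np.array([0.5, 0.5])            # uniform single-synapse equilibrium
Omega = np.array([-1.0, 1.0])
for t in (0.0, 10.0, 100.0):
    mu = f * Omega @ expm((T1 - np.eye(2)) * r * t) @ M_plus @ A1
    print(t, mu, f * p * np.exp(-f * g * p * r * t))   # matches Eq. (18)
```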

Hopfield protocol

Although the Hebb protocol is intuitive as a means of exploring memory lifetimes in an associative memory system, its non-trivial equilibrium distribution of synaptic states is awkward. To avoid this awkwardness, we may consider an alternative protocol that is nevertheless equivalent to the Hebb protocol in the limit of small fgN. We first define the protocol and then demonstrate the equivalence.

During memory storage, instead of defining cue and target sub-populations, we now specify the entire activity pattern, representing a memory, across the whole population of neurons. We allow these activities to take values from the set {-1, -ζ, +ζ, +1} with probabilities {f/2, (1-f)/2, (1-f)/2, f/2}. Here, the values ±1 represent evoked activity (the neuron is involved in memory storage), with +1 (respectively, -1) representing a strongly (respectively, weakly) tetanising stimulus in the usual LTP (respectively, LTD) sense. In contrast, the values ±ζ represent spontaneous activity (the neuron is not involved in memory storage). For a single perceptron, this amounts to specifying memory vectors ξ_α with components ξ_i^α taking one of these four values, and also specifying the perceptron’s required output in response to an input vector, where this output is drawn from the same set with the same probabilities, but with f replaced by g as usual. We can track the perceptron’s activation when its required output is either +1 or -1, but by symmetry its activation would differ only by a sign between these two cases, so for concreteness we just take the required output to be +1 during the storage of ξ_0. As with the Hebb protocol, a synapse does not experience a plasticity induction signal if its presynaptic input or the postsynaptic perceptron itself is only spontaneously active. However, if both input and perceptron are active, then the synapse experiences a plasticity induction signal, either potentiating if both activities are the same or depressing if different. This is just the standard Hopfield rule (Hopfield 1982), so we refer to this protocol as the Hopfield protocol: we obtain a pattern of synaptic plasticity induction signals in response to evoked activity that is identical to the standard Hopfield rule, but supplemented by the presence of spontaneous activity that does not induce synaptic plasticity.

Figure 2 summarises the Hopfield protocol, showing all allowed combinations of pre- and postsynaptic activities during memory storage, together with their associated probabilities and induced plasticity induction signals. Depressing and potentiating induction signals both occur with the same overall probability fg/2 as in the Hebb protocol.

Fig. 2.


Schematic illustration of the Hopfield protocol for memory storage. The format of this figure is essentially identical to that for the Hebb protocol in Fig. 1, except that labels indicating postsynaptic roles are not required. To avoid duplication, spontaneously active neurons are shown with both possible spontaneous activity levels, ±ζ; the indicated probability, (1-f)/2, is for each of these levels rather than for both

Computing μ(t) and σ(t)^2 for the Hopfield protocol and using the various symmetries E[S_1(t)|+] = -E[S_1(t)|-], E[S_1(t)S_2(t)|++] = E[S_1(t)S_2(t)|--], etc., we obtain

\mu(t) = f\,\mathrm{E}[S_1(t)\,|\,+], (11a)
\sigma(t)^2 = \frac{f + (1-f)\zeta^2 - \mu(t)^2}{N} + \frac{N-1}{N}\left\{ f^2\,\mathrm{E}[S_1(t)S_2(t)\,|\,{+}{+}] + (1-f)^2\zeta^2\,\underbrace{\mathrm{E}[S_1(t)S_2(t)\,|\,{\times}{\times}]}_{\equiv\,0} - \mu(t)^2 \right\}. (11b)

These are structurally identical to the expressions in Eq. (7) for the Hebb protocol, except that the linear terms in ζ are absent because of cancellation. Had we instead used a single level ζ of spontaneous activity rather than the two levels ±ζ, we would have obtained identical linear terms, too. Writing down the transition operators T_1 and T_2 in the Hopfield protocol, we obtain

T_1 = (1-g)I + gK, (12a)
T_2 = (1-g)\,I \otimes I + g\,K \otimes K, (12b)

with immediate generalisation to T_n. The (marginal) equilibrium distribution of a pair of synapses’ states is therefore determined by the unit eigenstate of K ⊗ K and thus of M ⊗ M, and so is just A_2 = A_1 ⊗ A_1; again, generalisation to A_n is immediate. The result is that all conditional expectation values involving at least one synapse that does not experience a plasticity induction event during the storage of ξ_0 vanish, when potentiation and depression processes are treated symmetrically. So, whether we use four-level or three-level activities in the Hopfield protocol, the ζ-dependent contributions to the covariance term in Eq. (11b) drop out, as indicated, so that the variance is affected only by the ζ^2 term in the first term on the right-hand side (RHS) of Eq. (11b). Moreover, the covariance term vanishes entirely in the large t, equilibrium limit, since E[S_1(t)S_2(t)|++] → (Ω^T A_1)^2 ≡ 0, so that σ(∞)^2 = [f + (1-f)ζ^2]/N.

The equivalence of the Hebb and Hopfield protocols in the limit of small fgN is now clear. The corresponding transition matrices T_1 are in any case identical for both protocols, and hence so are the means. For T_2, in both protocols we have that

T_2 - I \otimes I = fg\left[(M-I) \otimes I + I \otimes (M-I)\right] + O(f^2 g), (13)

and for general T_N the O(fg) term on the RHS contains N terms, each of which contains N-1 factors of I and just one factor of M-I. This structure reflects the fact that in the limit of small fgN, at most one of the perceptron’s synapses experiences a plasticity induction signal, regardless of the protocol. The corresponding unit eigenstate of T_N in this limit is just A_1 ⊗ ⋯ ⊗ A_1, regardless of the protocol. Therefore, in the small fgN limit, the equilibrium distribution of synaptic states in the Hebb protocol reduces to that in the Hopfield protocol, and all statistical properties of h(t) must therefore also reduce in the same way. The Hopfield protocol therefore offers a way of extrapolating the small fgN behaviour of the Hebb protocol to larger f without the awkwardness of the Hebb protocol’s equilibrium structure in this regime. Furthermore, the simpler form of the results in the Hopfield protocol allows us to extract the scaling properties of memory lifetimes as a function of small f (or g) in both protocols.

For the non-sparse-coding case of f=1, spontaneous activity does not contribute to the Hopfield protocol’s dynamics, and we recover precisely the Hopfield model with discrete-state synapses. For f<1, we expand the possible activities of neurons to allow for spontaneously active neurons that are not involved in memory storage. Thus, although the Hopfield protocol provides a convenient tool for examining the small fgN limit of the Hebb protocol, we also regard the Hopfield protocol as a fully fledged protocol in its own right, because it constitutes a very natural way of examining sparse coding with a Hopfield plasticity rule.

Population memory lifetimes

So far we have focused on the memory dynamics of a single perceptron. We now consider the memory dynamics of the entire population of P neurons. We do this only for the Hopfield protocol for simplicity. The tracked memory will evoke activity in a sub-population of on average gP neurons. In an experimental protocol, during the storage of the tracked memory we can at least in principle explicitly identify all those neurons that are active, and then subsequently track all their activities during later re-presentations of the tracked memory. Because of synaptic coupling between these tracked neurons and the other on average (1-g)P neurons, spontaneous activity in the other neurons will affect and potentially degrade the activation of the tracked neurons upon re-presentation of the tracked memory, affecting the tracked neurons’ ability to read out the tracked memory. But, as we are only concerned with the tracked neurons’ read-out of the tracked memory, we do not need to explicitly track the activities of all these other neurons: their activities do not directly form part of the memory signal from the tracked neurons.

In the Hopfield protocol, a tracked neuron will by definition have an output of +1 or -1 during memory storage. For a single perceptron, we focused on an output of +1 without loss of generality. A perceptron with an initial output of -1 will have identical dynamics to one with an initial output of +1, except that the activation will be reversed in sign. Therefore, we can just define the memory signal for any active perceptron to be ±h(t), depending on this sign. Denoting the moment generating function (MGF) of +h(t) for a tracked neuron with an initial output of +1 by M(z;t), the MGF for -h(t) for a tracked neuron with an initial output of -1 will also be just M(z;t). All tracked neurons therefore have the same MGF for their memory signals.

Suppose that P_eff neurons form the sub-population that stores the tracked memory, where P_eff is binomially distributed with parameter P and probability g. Although these neurons’ activations will not in general evolve independently, as an extremely coarse approximation we assume that their activations do evolve independently during subsequent memory storage (cf. Rubin and Fusi 2007). Population memory lifetimes obtained from this simplifying assumption will therefore only be theoretical, and perhaps very loose, upper bounds on exact memory lifetimes. With this simplification, the MGF for the memory signal from the tracked sub-population is then M(z;t)^{P_eff}, by independence. Averaging over P_eff, the MGF of the population memory signal is then just [(1-g) + gM(z;t)]^P. The mean, μ_p(t), and variance, σ_p(t)^2, of this population signal follow directly. Ignoring covariance terms (or considering the limit t → ∞ for the variance), we have μ_p(t) = gPμ(t) and σ_p(t)^2 ≈ gPσ(t)^2, where μ(t) and σ(t)^2 are the single-perceptron mean and variance above. Hence, the population SNR, SNR_p(t) = [μ_p(t) - μ_p(∞)]/σ_p(t), is just scaled by the factor √(gP) relative to the single-perceptron SNR, so

\mathrm{SNR}_p(t) \approx \sqrt{gP}\;\mathrm{SNR}(t). (14)

The population SNR memory lifetime, which we denote by τ_pop, is then the solution of SNR_p(τ_pop) = 1. With σ(t)^2 ≈ [f + (1-f)ζ^2]/N in the Hopfield protocol, SNR_p(t) depends on 𝒩 = NP, the total number of synapses in the memory system, but it also contains an additional factor of √g compared to SNR(t), which modifies scaling behaviour compared to single-perceptron results.

Models of synaptic plasticity

Simple synapses: the stochastic updater

The simplest model of synaptic plasticity to consider is one in which synapses lack any internal states so that s=1, and given a plasticity induction signal, they change strength (if possible) with some fixed probability p (Tsodyks 1990). Because a synapse just changes its strength stochastically in this model, we have called such a synapse a “stochastic updater” (SU; Elliott and Lagogiannis 2012). The underlying strength transition matrices are then

M_+ = \begin{pmatrix} 1-p & 0 \\ p & 1 \end{pmatrix}, \qquad M_- = \begin{pmatrix} 1 & p \\ 0 & 1-p \end{pmatrix}, (15)

and so

K_+ = \begin{pmatrix} 1-\psi & 0 \\ \psi & 1 \end{pmatrix}, \qquad K_- = \begin{pmatrix} 1 & \psi \\ 0 & 1-\psi \end{pmatrix}, (16)

where we define ψ=fp for convenience.

The equilibrium distribution of a single synapse’s strength in both protocols is just the normalised unit eigenvector of K, or A_1 = (1, 1)^T/2. For the Hopfield protocol, any pair of synapses’ strengths has the equilibrium distribution A_2 = A_1 ⊗ A_1. For the Hebb protocol, we require the unit eigenstate of (K_+ ⊗ K_+ + K_- ⊗ K_-)/2, which gives

\underline{A}_2 = \frac{1+\kappa^2}{4}\left[\binom{1}{0} \otimes \binom{1}{0} + \binom{0}{1} \otimes \binom{0}{1}\right] + \frac{1-\kappa^2}{4}\left[\binom{1}{0} \otimes \binom{0}{1} + \binom{0}{1} \otimes \binom{1}{0}\right], (17)

where κ^2 = ψ/(2-ψ). The quantity κ^2 determines the pairwise correlations present in this state, since (Ω^T ⊗ Ω^T)A_2 = κ^2. For f → 0, κ^2 → 0 and A_2 → A_1 ⊗ A_1. With these equilibrium distributions, we may explicitly compute μ(t) and σ(t)^2 in both protocols using Eqs. (7) and (11). For the common mean, we obtain

\mu(t) = fp\,e^{-fgprt}, (18)

and for the two variances, we need the various correlation functions in Eqs. (7b) and (11b). For the Hebb protocol, these are

\mathrm{E}[S_1(t)S_2(t)\,|\,{+}{+}] = p^2 e^{-(2-fp)fgprt} + \left[1 - p(2-p)e^{-(2-fp)fgprt}\right]\kappa^2, (19a)
\mathrm{E}[S_1(t)S_2(t)\,|\,{+}{\times}] = \left[1 - p\,e^{-(2-fp)fgprt}\right]\kappa^2, (19b)
\mathrm{E}[S_1(t)S_2(t)\,|\,{\times}{\times}] = \kappa^2, (19c)

and for the Hopfield protocol, we just set κ^2 = 0 in these equations. These results allow us to determine SNR lifetimes for simple, SU synapses. Approximating the Hopfield variance by its asymptotic form σ(∞)^2, the single-perceptron SNR memory lifetime in the Hopfield protocol for a stochastic updater is then

\tau_{\rm snr} \approx \frac{1}{2fgpr}\log_e\!\left[\frac{f^2 p^2}{\sigma_N(\infty)^2}\right], (20)

and for the population SNR lifetime τ_pop, we replace σ_N(∞) by σ_𝒩(∞)/√g, where the subscript in σ_X(∞)^2 = [f + (1-f)ζ^2]/X indicates either N or 𝒩 = NP.
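A short sketch evaluating Eq. (20) and its population counterpart may be helpful; parameter values are illustrative, and we set f = g as in the text.

```python
import numpy as np

# Sketch: the stochastic updater SNR lifetime of Eq. (20), and the population
# version obtained by replacing sigma_N(inf) with sigma_{NP}(inf)/sqrt(g),
# i.e. by using X = N*P*g in sigma_X(inf)^2 = [f + (1-f) zeta^2]/X.
def tau_snr_su(X, f, g, p, r, zeta=0.0):
    sigma2_inf = (f + (1 - f) * zeta**2) / X
    arg = f**2 * p**2 / sigma2_inf
    return max(np.log(arg), 0.0) / (2 * f * g * p * r)

N, P, p, r = 10_000, 1_000, 0.1, 1.0
for f in (0.01, 0.1, 1.0):
    g = f                                     # as in the text, f = g
    print(f, tau_snr_su(N, f, g, p, r),       # single perceptron
          tau_snr_su(N * P * g, f, g, p, r))  # population
```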

To determine FPT lifetimes, we require Prob[h′|h] for the MIE approach, or the induced jump moments A(h) and B(h) for the FPE method. We relegate the derivation of Prob[h′|h] to Appendix A, where we also indicate our numerical methods for obtaining FPTs. From Appendix A, we obtain the jump moments in Eq. (5) for the FPE approach to FPTs. For both protocols we get the same first jump moment

A(h) = -\psi g h, (21)

and for the second jump moment, we get

B(h|N_{\rm eff}) = \left[\frac{N_{\rm eff}}{N^2} + \frac{N - N_{\rm eff}}{N^2}\zeta^2\right]\psi(2-\psi)g + \psi^2 g h^2 + \frac{N_{\rm eff}(N_{\rm eff}-1)}{N^2}\psi^2 g + \frac{2N_{\rm eff}(N - N_{\rm eff})}{N^2}\psi^2 g\,\zeta + \frac{(N - N_{\rm eff})(N - N_{\rm eff} - 1)}{N^2}\psi^2 g\,\zeta^2, (22a)
B(h|N_{\rm eff}) = \left[\frac{N_{\rm eff}}{N^2} + \frac{N - N_{\rm eff}}{N^2}\zeta^2\right]\psi(2-\psi)g + \psi^2 g h^2, (22b)

for the Hebb and Hopfield protocols, respectively. We have explicitly indicated the dependence of B(h) on N_eff, where N_eff is the number of a perceptron’s synapses that are active during the storage of ξ_0. We write B(h|N_eff) = ψg B_0(N_eff) + ψ^2 g h^2, where we separate out the quadratic dependence on h and it is convenient to remove an overall factor of ψg from the definition of B_0(N_eff). Dropping the quadratic term from B(h|N_eff) is equivalent to considering dynamics based on the Ornstein–Uhlenbeck process (Uhlenbeck and Ornstein 1930), which we have found to be a very good approximation (Elliott 2014, 2017a, 2019), so we work with just the constant term.
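For intuition about this Ornstein–Uhlenbeck approximation, the Monte Carlo sketch below (ours, not the paper’s simulation code) integrates dh = -ψg h dt + √(ψg B_0) dW, with the storage rate r = 1 Hz absorbed into the time unit, and records first passages below threshold; all parameter values are illustrative.

```python
import numpy as np

# Sketch: Monte Carlo first passage times for the Ornstein-Uhlenbeck
# approximation, dh = -psi*g*h dt + sqrt(psi*g*B0) dW, with the storage rate
# r = 1 Hz absorbed into the time unit. All parameter values are illustrative.
rng = np.random.default_rng(1)
psi_g, B0 = 0.01, 0.02        # psi*g and the constant part B0 of B(h)
theta, h0 = 0.0, 0.1          # firing threshold and initial activation
dt, t_max, trials = 0.1, 2.0e3, 100
fpts = []
for _ in range(trials):
    h, t = h0, 0.0
    while h > theta and t < t_max:
        h += -psi_g * h * dt + np.sqrt(psi_g * B0 * dt) * rng.standard_normal()
        t += dt
    fpts.append(t)
print(np.mean(fpts), np.std(fpts))   # estimates of tau_mfpt and sigma_fpt
```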

For the MIE approach to FPTs, a technical difficulty as discussed in Appendix A requires us to restrict to the specific case of ζ = 0 only. We use numerical methods to obtain FPT lifetimes from the MIE approach, but for small fN, the dynamics are dominated by N_eff = 1. For N_eff = 1 and ϑ = 0, Eq. (4) is trivial because the only contribution to the sum involves no transition, occurring with probability 1 - ψg/2 regardless of the protocol. Writing σ_fpt^2 as the variance in the FPT, we obtain

{\rm MIE:}\quad \tau_{\rm mfpt} \simeq \frac{N(1+p)}{pgr}, \qquad \sigma_{\rm fpt}^2 \simeq \frac{4N(1+p)}{p^2 f g^2 r^2}, (23)

at leading order, for small f (= g) in both protocols. We see that τ_mfpt scales as 1/f in this regime, but that σ_fpt scales as 1/f^{3/2}. Although σ_fpt swamps τ_mfpt for small f, τ_mfpt is nevertheless robustly positive. We may use our earlier results to obtain the corresponding forms for the FPE approach to FPT lifetimes for small f (see Eqs. (3.29) and (3.30) in Elliott 2017a). We obtain

{\rm FPE:}\quad \tau_{\rm mfpt} \simeq \frac{\log_e 2}{2pfgr}, \qquad \sigma_{\rm fpt}^2 \simeq \frac{\pi^2 + 6\log_e^2 2}{24\,p^2 f^2 g^2 r^2}. (24)

In contrast to Eq. (23), now τ_mfpt scales as 1/f^2 and not 1/f, and σ_fpt scales as 1/f^2 and not 1/f^{3/2}. Moreover, in the FPE approach, the FPT moments have lost their overall scaling with N. Although the forms in Eq. (24) are obtained using mean field approximations that are expected to be invalid when fN is small, in fact we obtain the same scaling behaviour when the expectation values are obtained by averaging properly over h_0 and N_eff. Our simulation results, discussed in Sect. 4, agree with the behaviour in Eq. (23). Therefore, the failure of the FPE approach for small fN in Eq. (24) is due to the approximations intrinsic to the FPE approach itself. These include the diffusion limit and especially the continuum limit. For small fN, the system is nowhere near the continuum limit, so the scaling behaviour must be incorrect there.

Complex synapses

We now turn to models of complex synapses that have internal states, so that s>1. In such models, synapses can undergo metaplastic changes in their internal states without expressing changes in synaptic strength. We will only consider SNR lifetimes in relation to complex synapses. We have studied FPT lifetimes for filter-based synaptic plasticity for both bistate (Elliott 2017b) and multistate (Elliott 2020) synapses in a non-sparse coding context, but we have yet to consider other models of complex synapses. We therefore restrict to SNR lifetimes, but with the caveat that they are valid only in an asymptotic regime.

We have discussed filter-based models of synaptic plasticity at length elsewhere (Elliott 2008; Elliott and Lagogiannis 2009, 2012; Elliott 2016b), so we only briefly summarise them here. Synapses are proposed to implement a form of low-pass filtering by integrating plasticity induction signals in an internal filter state. Synapses then filter out high-frequency noise in their induction signals and pass only low-frequency trends, rendering them less susceptible to changes in strength due to fluctuations in their inputs. Potentiating (respectively, depressing) induction signals increment (respectively, decrement) the filter state, with synaptic plasticity being expressed (if possible) only when the filter reaches an upper (respectively, lower) threshold. For symmetric potentiation and depression processes, we may take these thresholds to be ±Θ. The filter can occupy the 2Θ-1 states -(Θ-1), …, +(Θ-1), with the thresholds ±Θ not being occupiable states. Several variant filter models are distinguishable by their different dynamics upon reaching threshold (Elliott 2016b), but we consider only the simplest of them here. In the simplest model, the filter always resets to the zero filter state upon reaching threshold, regardless of its strength state and regardless of the type of plasticity induction signal. This filter generalises to any multistate synapse. If the synapse is saturated at its upper (respectively, lower) strength state and reaches its upper (respectively, lower) filter threshold upon receipt of a potentiating (respectively, depressing) induction signal, the filter resets to zero despite the fact that it cannot increment (respectively, decrement) its strength. The transitions for this filter for the case of Θ = 3 are illustrated in Fig. 3A. Although for clarity we have shown all permitted transitions between all filter and strength states, we stress that each synapse possesses only a single synaptic filter: the filter is not duplicated for each strength state. Transitions in filter state occur independently of strength state. Nevertheless, to describe transitions in the joint strength and filter state, we require 2(2Θ-1) × 2(2Θ-1) matrices, so s = 2Θ-1, although the number of required physical states for filter-based synapses is just 2Θ-1 for the filter states themselves, and an additional, binary-valued variable for the bistate strength, so a total of 2Θ states.
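A sketch of the construction of M_+ for this reset-to-zero filter may help clarify the state indexing; the (strength, filter) indexing convention below is our own, not the paper’s, and M_- follows by the mirror-image transitions.

```python
import numpy as np

# Sketch: the potentiating transition matrix M+ for the reset-to-zero filter
# (Fig. 3A). States are indexed as (strength, filter) with strength s in
# {0: -1, 1: +1} and filter I in -(Theta-1), ..., +(Theta-1); this indexing
# convention is ours, not the paper's. M- follows by the mirror transitions.
def filter_M_plus(Theta):
    n_f = 2 * Theta - 1                        # number of filter states
    dim = 2 * n_f                              # joint strength-filter states
    def idx(s, I):
        return s * n_f + (I + Theta - 1)
    M = np.zeros((dim, dim))
    for s in (0, 1):
        for I in range(-(Theta - 1), Theta):
            if I + 1 < Theta:                  # below threshold: increment
                M[idx(s, I + 1), idx(s, I)] = 1.0
            else:                              # threshold: reset filter to 0,
                M[idx(1, 0), idx(s, I)] = 1.0  # potentiate if not saturated
    return M

M_plus = filter_M_plus(3)
print(M_plus.sum(axis=0))                      # column-stochastic: all ones
```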

Fig. 3.

Fig. 3

Strength and internal state transitions for various models of complex synapses. Coloured circles indicate synaptic states, with red (respectively, blue) circles corresponding to strength S=-1 (respectively, S=+1), and the labelled numbers inside the circles identifying the particular internal states (indexed by I for filter states and i for serial and cascade states). Different internal states of the same strength state are organised in the same vertical column, while different strength states correspond to different columns. Solid (respectively, dashed) lines between states show transitions caused by potentiating (respectively, depressing) induction signals, with arrows indicating the direction of the transition. Loops to and from the same state indicate no transition. Three different models are shown, as labelled, corresponding to a Θ=3 filter model (A), and s=5 serial (B) and cascade (C) synapse models. For the filter and serial synapse models, given the presence of an induction signal of the correct type, the transition probabilities are unity. For the cascade model, the transition probabilities are as discussed in the main text

We state without derivation the result for μ(t) in this filter model:

\mu_{\rm fil}(t) = \frac{f}{\Theta^3}\sum_{l=0}^{\Theta-1}\cot^2\!\left[\frac{(2l+1)\pi}{4\Theta}\right] e^{-fgrt\left[1-\cos\frac{(2l+1)\pi}{2\Theta}\right]} - \frac{4f}{\Theta^3}\sum_{l=0}^{\lfloor(\Theta-1)/2\rfloor}\cot^2\!\left[\frac{(2l+1)\pi}{2\Theta}\right] e^{-fgrt\left[1-\cos\frac{(2l+1)\pi}{\Theta}\right]}, (25)

where ⌊·⌋ denotes the floor function. This expression is obtained from Eq. (4.24) in Elliott (2016b) just by multiplying by f and inserting a factor of fg into the exponents. This result is required for obtaining SNR lifetimes. The pairwise correlation functions required for σ(t)^2 are computed via numerical matrix methods using the matrices M_± for the filter model (given in Elliott (2016b) or implied by the transitions in Fig. 3A), and we also obtain the Hebb equilibrium distribution A_2 by numerical methods.
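For completeness, a direct numerical evaluation of Eq. (25) looks as follows; parameter values are illustrative.

```python
import numpy as np

# Sketch: direct numerical evaluation of mu_fil(t), Eq. (25); parameter values
# are illustrative.
def mu_fil(t, f, g, r, Theta):
    l1 = np.arange(Theta)
    x1 = (2 * l1 + 1) * np.pi / (2 * Theta)
    term1 = np.sum(np.exp(-f*g*r*t*(1 - np.cos(x1))) / np.tan(x1 / 2)**2)
    l2 = np.arange((Theta - 1) // 2 + 1)   # l = 0, ..., floor((Theta-1)/2)
    x2 = (2 * l2 + 1) * np.pi / Theta
    term2 = np.sum(np.exp(-f*g*r*t*(1 - np.cos(x2))) / np.tan(x2 / 2)**2)
    return (f / Theta**3) * (term1 - 4.0 * term2)

for t in (0.0, 100.0, 1000.0):
    print(t, mu_fil(t, f=0.05, g=0.05, r=1.0, Theta=4))
```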

To estimate SNR lifetimes in the filter model for the Hopfield protocol, we consider the slowest decaying mode in the first and second terms of Eq. (25). For non-sparse coding, it is usually enough to consider just the slowest mode in the first term, but with sparseness, both terms must be considered for a better approximation. For Θ large enough, we then have

\mu_{\rm fil}(t) \approx \frac{16f}{\pi^2\Theta}\left[e^{-\pi^2 fgrt/8\Theta^2} - e^{-\pi^2 fgrt/2\Theta^2}\right]. (26)

Approximating the Hopfield variance by its asymptotic form σ(∞)^2, the single-perceptron SNR memory lifetime for the filter model is then

fgr\,\tau_{\rm snr}^{\rm (fil)} \approx \frac{4\Theta^2}{\pi^2}\log_e\!\left[\frac{256}{\pi^4\Theta^2}\frac{f^2}{\sigma_N(\infty)^2}\right] - \frac{\pi^4\Theta^5}{512}\frac{\sigma_N(\infty)^3}{f^3}, (27)

where in deriving this expression, we have regarded the second term as a correction to the first term, with the first term arising purely from the first term in Eq. (26). To obtain the population SNR memory lifetime τ_pop^{(fil)}, we again just replace σ_N(∞) by σ_𝒩(∞)/√g in Eq. (27).

We also consider the serial synapse model (Leibold and Kempter 2008; Rubin and Fusi 2007). In this model, a synapse performs a symmetric, unbiased, one-step random walk on a set of 2s states between reflecting boundaries. The first (respectively, second) group of s states are identified as corresponding to strength -1 (respectively, +1). For each strength state, there are thus s metastates. If a synapse has strength -1 (respectively, +1) and experiences a sequence of depressing (respectively, potentiating) induction signals, then it is pushed into progressively higher metastates. However, the synapse can only change strength when in the lowest, i = 1 metastate. The transitions are illustrated in Fig. 3B. The transition matrices M_± are just

M_+ = {\rm diag}\{\underbrace{0,\ldots,0}_{2s-1},1\} + {\rm diag}_l\{\underbrace{1,\ldots,1}_{2s-1}\}, (28a)
M_- = {\rm diag}\{1,\underbrace{0,\ldots,0}_{2s-1}\} + {\rm diag}_u\{\underbrace{1,\ldots,1}_{2s-1}\}, (28b)

where diag_u and diag_l denote the upper and lower diagonals, respectively. The eigen-decomposition of M = (M_+ + M_-)/2 is standard (cf. Elliott 2016a, for the eigen-decomposition of the similar matrix C there), so we can directly evaluate μ(t) = f Ω^T e^{(M-I)fgrt} M_+ A_1, where A_1^T = (1^T | 1^T)/(2s). We obtain

\mu_{\rm ser}(t) = \frac{f}{s^2}\sum_{l=0}^{s-1}(-1)^l\cot\!\left[\frac{(2l+1)\pi}{4s}\right] e^{-fgrt\left[1-\cos\frac{(2l+1)\pi}{2s}\right]}. (29)

For the Hebb protocol, we again use numerical matrix methods to obtain A_2. To estimate SNR lifetimes for the Hopfield protocol, it is sufficient to consider just the slowest decaying term in Eq. (29), giving

\mu_{\rm ser}(t) \approx \frac{4f}{\pi s}\,e^{-\pi^2 fgrt/8s^2}, (30)

for s large enough, and hence

fgr\,\tau_{\rm snr}^{\rm ser} \approx \frac{4s^2}{\pi^2}\log_e\!\left[\frac{16}{\pi^2 s^2}\frac{f^2}{\sigma_N(\infty)^2}\right], (31)

as the required approximation, with τ_pop^{ser} obtained in the usual way.
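The sketch below evaluates the serial-synapse lifetime estimate of Eq. (31) across a range of internal state counts s; parameter values are illustrative, and the population variant would again replace σ_N(∞) by σ_𝒩(∞)/√g.

```python
import numpy as np

# Sketch: the serial synapse SNR lifetime estimate of Eq. (31); parameter
# values are illustrative.
def tau_snr_ser(N, f, g, r, s, zeta=0.0):
    sigma2_inf = (f + (1 - f) * zeta**2) / N
    arg = 16.0 * f**2 / (np.pi**2 * s**2 * sigma2_inf)
    return max(np.log(arg), 0.0) * 4 * s**2 / (np.pi**2 * f * g * r)

N, r = 10_000, 1.0
for s in (1, 2, 4, 8):
    print(s, tau_snr_ser(N, f=0.05, g=0.05, r=r, s=s))
```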

In the cascade model of synaptic plasticity (Fusi et al. 2005), there are also 2s metalevels, s for each bistate strength state, but unlike the serial synapse model, a potentiating (respectively, depressing) induction signal for a synapse with strength -1 (respectively, +1) in metastate i can with probability 2^{1-i} (or 2^{2-i} for i = s) cause the synapse to change strength and return to metastate i = 1. The same probabilities govern transitions to higher metastates. The transitions are illustrated in Fig. 3C. The cascade model essentially constitutes a tower of stochastic updaters that progressively render the synapse less labile. We have extensively analysed the cascade model elsewhere (Elliott and Lagogiannis 2012) and compared its memory performance to filter-based synapses, which outperform the cascade model in almost all biologically relevant regions of parameter space (Elliott 2016b). It is possible to obtain analytical results for the Laplace transform of the mean dynamics in the cascade model (Elliott and Lagogiannis 2012), but here we use numerical matrix methods. Rubin and Fusi (2007) give a formula for the SNR based on finding a fit to numerical results. The implied formula for the mean is

\mu_{\rm cas}(t) \approx \frac{1}{4}f\sqrt{\frac{5}{s}}\;\frac{e^{-fgrt/2^{s-2}}}{1+fgrt}. (32)

Taking the asymptotic variance σ(∞)^2 in the Hopfield protocol, we can then use the expression μ_cas(t)/σ_N(∞) for the SNR. This still cannot be solved analytically for the SNR lifetime τ_snr^{cas} (or the population form τ_pop^{cas}), but we can use it to obtain numerical solutions that can be compared to results obtained from exact matrix methods.

A serial or cascade synapse possesses 2s states, with each set of s metalevels duplicated for each strength. Metalevel i for strength -1 cannot be identified with metalevel i for strength +1 because the transitions induced by plasticity induction signals are in opposite directions. This is in contrast to the filter model, in which the filter transitions are independent of the strength state. Serial and cascade synapses therefore possess fully 2s physical states characterising the state of a synapse, while a filter synapse possesses 2Θ physical states and not 2(2Θ-1) states. Hence, we may directly compare the performance of a filter synapse with threshold Θ to a serial or cascade synapse with a total of 2s metastates, or s metastates per strength state.

Results

We now turn to a discussion of our results, comparing and contrasting the various models of synaptic plasticity considered above, for the Hebb and Hopfield protocols. For simplicity we consider simulation results only for SU synapses, to confirm and validate our analytical results. Simulations are run according to protocols discussed extensively elsewhere (see, for example, Elliott and Lagogiannis 2012; Elliott 2014), but modified to allow for sparse coding. We first consider single-perceptron memory lifetimes and then population memory lifetimes.

Single-perceptron memory lifetimes

In Fig. 4, we show results for memory lifetimes for SU synapses with no spontaneous activity, ζ = 0, comparing the Hopfield and Hebb protocols. We consider both FPT and SNR lifetimes, and for FPT lifetimes, we show results for both the FPE and MIE approaches. Simulation results are also shown, although only for f ≥ 10^{-3}: for smaller values it becomes increasingly difficult to obtain enough statistics for decent averaging due to the longer simulation run times. We select an update probability of p = 1/10, which is our standard choice of p in earlier work (see, for example, Elliott 2014). From Eq. (24), rτ_mfpt and rσ_fpt are expected to scale as 1/f^2 for small f for the FPE approach, so we remove this scaling by multiplying by f^2, which in this figure affords greater clarity and resolution.

Fig. 4.


Convergence of Hebb and Hopfield protocol results for stochastic updater synapses in the limit of sparse coding. Scaled single-perceptron memory lifetimes are shown as a function of sparseness, f. Results in red (respectively, blue) correspond to the Hopfield (respectively, Hebb) protocol. Shaded regions indicate f^2 r(τ_mfpt ± σ_fpt) (with the central solid line showing f^2 rτ_mfpt) computed using the FPE approach to FPTs, so that we show the (scaled) MFPT τ_mfpt surrounded by the one standard deviation region around it, governed by σ_fpt. Short-dashed lines show f^2 rτ_mfpt obtained using the exact, MIE approach to FPTs. Circular data points correspond to results from simulation, for f ≥ 10^{-3}. Long-dashed lines show results for f^2 rτ_snr; rτ_snr = 0 for the Hebb protocol over the whole range of f in panel A. The value of N is indicated in each panel. In all panels, p = 1/10, ζ = 0 and ϑ = 0

Above we showed that the Hopfield and Hebb protocols must coincide for f ≪ 1/N. For the various choices of N used in Fig. 4, we see this convergence of both protocols’ results, becoming indistinguishable for f below 1/N, for all forms of memory lifetime. Focusing first on rτ_mfpt from the FPE approach, for smaller N we clearly see that f^2 rτ_mfpt asymptotes to a common, N-independent constant as f becomes small; we would see the same behaviour for larger N too, but would need to take smaller values of f than those used in this figure. We also see that f^2 rτ_mfpt from the MIE approach tracks that from the FPE approach quite closely, and indeed for intermediate values of f and smaller choices of N, it plateaus, so that rτ_mfpt scales as 1/f^2 in this regime. However, for N = 10^3 and f ≲ 10^{-3}, we clearly see the MIE f^2 rτ_mfpt turn downwards and approach zero as f decreases. This behaviour is consistent with the derived form of the exact scaling behaviour in Eq. (23), in which rτ_mfpt ∝ 1/f for small f. We also just see this change for N = 10^4 for f close to 10^{-4}, but for larger N we would need to take f smaller to see the 1/f scaling of the exact form of rτ_mfpt. Our simulation results agree with the results from the MIE approach, validating both. Although we do not take f small enough to see the switch to 1/f scaling for N = 10^3 in Fig. 4A, we nevertheless do clearly see the start of the down-turn at f = 10^{-3}.

For f > 1/N in Fig. 4, we see very significant differences between the Hebb and Hopfield protocols. While for the Hopfield protocol f²rτ_mfpt grows like log_e N for fN large enough, this is not the case for the Hebb protocol: for f in the region of unity, f²rτ_mfpt is, roughly speaking, independent of N. This means that the dynamics are dominated by the correlations between pairs of synapses’ strengths in the Hebb protocol. For f = 1, we obtain rτ_mfpt ≈ 5.34 and 5.35 for N = 10³ and N = 10⁶, respectively, from the FPE approach. (The corresponding values from the MIE approach are 6.64 and 6.79, respectively.) In the regime of f not too far from unity, memory lifetimes in the Hebb protocol are therefore significantly reduced by the synaptic correlations induced by this protocol, and the influence of these correlations cannot be removed by increasing N.

We see that rτ_mfpt is robustly positive in Fig. 4 for all choices of N over the whole range of displayed f, and it remains so for small f because of the scaling behaviour discussed above. However, looking at the one-standard-deviation region around rτ_mfpt, it is clear that in some regimes of f, there can be high variability in FPT memory lifetimes. For the Hopfield protocol, this regime of high variability occurs for small f (where what counts as “small” f depends on N), while in the Hebb protocol, there is an additional regime for f close to unity. High variability does not mean that memories cannot be stored: rτ_mfpt is always robustly positive. Rather, high variability simply means that some memories are stored strongly while others are stored weakly or not at all.

Turning to a consideration of rτ_snr, we see from Fig. 4 that rτ_snr exists (i.e. rτ_snr > 0) in precisely those regions of low variability in FPT lifetimes. Indeed, the results for rτ_snr track quite closely those for r(τ_mfpt − σ_fpt) over some range of f, and deviate from them elsewhere. We have shown in a non-sparse-coding context that FPT and SNR lifetimes for simple synapses essentially coincide (up to additive constants) in the regime in which the distribution of h0 is tightly concentrated around its supra-threshold mean (Elliott 2017a). For the specific case of ϑ = 0, as here, we showed that if we can write the initial variance σ(0)² in the form σ(0)² ≈ B0(N)/2, then the parameter μ ≡ μ(0)²/B0(N) must be large enough, which means μ ≳ 2 (Elliott 2017a). We then have 2μ ≈ [μ(0)/σ(0)]², which is just a condition on the initial SNR. Using the pre-averaged form ⟨B0(N_eff)⟩_N_eff (see Appendix A), this condition reduces to 4/(p²N) ≤ f in the Hopfield protocol for ζ = 0. For the Hebb protocol, the limit of large N with p not too close to unity additionally satisfies the requirement on σ(0)², giving the upper bound f ≤ p/2 for ζ = 0. In the Hebb protocol, we therefore have the interval 4/(p²N) ≤ f ≤ p/2 for equivalence of SNR and FPT memory lifetimes, for ζ = 0. (We must have N ≥ 8/p³ for this interval to exist.) With p = 1/10 in Fig. 4, these conditions are 400/N ≤ f and 400/N ≤ f ≤ 0.05 for the Hopfield and Hebb protocols, respectively. For f ≥ 400/N in both protocols (except for the Hebb protocol for N = 10³, where the bounding range of f is invalid), we do indeed see that the FPE results for f²rτ_mfpt and those for f²rτ_snr run essentially parallel to each other, but that for f < 400/N, f²rτ_snr peels away from f²rτ_mfpt. The same is true for the Hebb protocol for N > 10³: as f increases above 0.05, f²rτ_snr also peels away from f²rτ_mfpt. Thus, these two estimates for the two protocols appear to capture well the region of f for which rτ_snr is a reliable indicator of memory longevity. SNR lifetimes are therefore acceptable surrogates for FPT lifetimes when the latter are subject to low variability, but outside these regions SNR lifetimes fail to capture the possibility of memory storage, albeit with high variability. Importantly, the requirement that f ≥ 4/(p²N) in both protocols means that the SNR approach cannot be extended to very small or even just small f, because such values violate the asymptotic regime. Essentially, then, the SNR approach cannot probe the very sparse coding regime in either protocol.

For the Hopfield protocol, Eq. (20) is just

\[ r\tau_{\rm snr} = \frac{1}{2f^{2}p}\,\log_{e}\!\left[\frac{Nf^{2}p^{2}}{f+(1-f)\zeta^{2}}\right]. \tag{33} \]

With ζ = 0, we require f > 1/(p²N) for rτ_snr > 0, and we see precisely these threshold values for the different choices of N in Fig. 4. Alternatively, we require N > 1/(fp²) for memories to be stored according to the SNR criterion. However, these conditions do not carry over to FPT memory lifetimes: we need neither a minimum N nor a minimum f for rτ_mfpt > 0, because it is always positive. This failure of SNR conditions to carry over to the FPT case also applies to any optimality conditions derived from rτ_snr. From Eq. (33) with ζ = 0, we may find that value of f, f_opt, that maximises rτ_snr, giving rise to rτ_snr^opt, with the result that f_opt = √e/(p²N). Essentially the same value applies to the Hebb protocol, albeit with complicated corrections. However, for the validity of the SNR results, both protocols require f ≥ 4/(p²N). If the SNR optimality condition is valid, then it must satisfy f_opt = √e/(p²N) ≥ 4/(p²N), or √e ≥ 4. This is clearly false, and hence the SNR optimality condition for f is spurious, because at f = f_opt, the asymptotic validity condition is violated. In fact, we may essentially take f as small as we like and rτ_mfpt will continue to grow, albeit with increasing variability in the FPT lifetimes. Thus, although we will shortly consider optimality conditions for SNR memory lifetimes with complex synapses, these conditions must be viewed with extreme caution.
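To make the failure of the SNR optimality condition concrete, the following minimal sketch (our own illustration in Python, not code from the study) numerically maximises Eq. (33) over f for SU synapses and compares the result with the closed forms above; the numerically located ζ = 0 optimum indeed falls below the validity threshold 4/(p²N).

```python
# Illustrative check of Eq. (33) (our own sketch): numerically maximise the
# SU single-perceptron SNR lifetime over f and compare with the closed forms.
import numpy as np

def rtau_snr(f, N, p, zeta):
    # Eq. (33): rtau_snr = log[N f^2 p^2 / (f + (1-f) zeta^2)] / (2 f^2 p),
    # taken to be zero (no SNR lifetime) when the log argument falls below one.
    arg = N * f**2 * p**2 / (f + (1.0 - f) * zeta**2)
    return np.where(arg > 1.0, np.log(arg) / (2.0 * f**2 * p), 0.0)

N, p = 10**5, 0.1
f = np.logspace(-4.0, 0.0, 40001)

lt0 = rtau_snr(f, N, p, zeta=0.0)
print(f[np.argmax(lt0)], np.sqrt(np.e) / (p**2 * N))   # f_opt ~ sqrt(e)/(p^2 N)
print(lt0.max(), p**3 * N**2 / (4.0 * np.e))           # ~ p^3 N^2 / (4e)

lt1 = rtau_snr(f, N, p, zeta=1.0)
print(f[np.argmax(lt1)], np.sqrt(np.e / (p**2 * N)))   # f_opt ~ (e/(p^2 N))^(1/2)
print(lt1.max(), p * N / (2.0 * np.e))                 # ~ pN/(2e)

# SNR validity requires f >= 4/(p^2 N); the zeta = 0 optimum violates it:
print(f[np.argmax(lt0)] >= 4.0 / (p**2 * N))           # False
```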

Figure 4 considers only the case of exactly zero spontaneous activity, ζ = 0. In Fig. 5, we examine the impact of spontaneous activity on SU memory lifetimes. We show only the case of N = 10⁵ to avoid unnecessary clutter, but the results are qualitatively similar for other choices of N. In the Hopfield protocol, ζ appears only through a quadratic term in B(h) or σ(t)², while in the Hebb protocol, ζ also appears through a linear term. This difference makes the Hebb protocol much more sensitive to spontaneous activity than the Hopfield protocol, and we see this explicitly in Fig. 5. In the Hopfield protocol, the asymptotic variance takes the form σ(∞)² = [f + (1−f)ζ²]/N, so ζ exerts a significant influence on memory lifetimes only for f ≲ ζ². We therefore only start to see a divergence of memory lifetimes from those for ζ = 0 at around f ≈ ζ², and this is confirmed in the figure. However, as f is taken small, the dependence of rτ_mfpt (from the FPE) on ζ is lost (just as its dependence on N is lost), so that for very small f, ζ does not affect (FPE) FPT lifetimes, neither their means nor their variances. This is because for small f, the scaling results in Eq. (24) depend only on the A and not the B jump moment, so they depend only on drift and not on diffusion, and ζ appears only through the diffusion term. In contrast to the Hopfield protocol, even a choice of ζ = 0.01 induces a large reduction in memory lifetimes in the Hebb protocol, at least away from the small f regime. For small f, the Hebb and Hopfield protocols coincide, so we observe the same loss of dependence on ζ in (FPE) FPT lifetimes in the Hebb protocol. However, away from the small f regime, the linear term in ζ in B or σ(t)² significantly impacts memory lifetimes.

Fig. 5

Impact of spontaneous activity on stochastic updater single-perceptron memory lifetimes. Results are shown for f²rτ_mfpt (from the FPE approach) and f²rτ_snr for both the Hopfield and Hebb protocols, as indicated in the different panels. Different line styles correspond to different levels of spontaneous activity, ζ, as indicated in the common legend in panel D. Some line styles are absent in panel D because there is no corresponding rτ_snr > 0. In all panels we take N = 10⁵, with p = 1/10 and ϑ = 0 in all cases

Examining Eq. (33) for the Hopfield protocol, for ζ = 0 we have effectively f¹ in the argument of the logarithm, while for ζ = 1 we have f². Roughly speaking, for intermediate values of ζ, the effective power of f switches rapidly from one to two in the vicinity of f = ζ². This switching can be seen clearly in Fig. 5, where, as f decreases, f²rτ_snr (and also f²rτ_mfpt) tracks closely the form for ζ = 0, until it rapidly peels away, following a different power. Although it is still clearly the case that optimality conditions obtained from rτ_snr are invalid, it is nevertheless worth examining f_opt. For ζ = 0, we again obtain f_opt = √e/(p²N), but for ζ = 1, we instead obtain f_opt = (e/(p²N))^{1/2}, so that the N-dependence changes. The corresponding optimal lifetimes are rτ_snr^opt = p³N²/(4e) for ζ = 0 and rτ_snr^opt = pN/(2e) for ζ = 1. Of course, we see explicitly in Fig. 5 that these SNR-derived optimal values of f, and thus the maximum possible SNR lifetimes, are invalid, but SNR lifetimes do at least indicate when FPT lifetimes are subject to lower variability and when they are subject to higher variability.

Considering ζ = 1 is of course biologically meaningless, as then there is no distinction between spontaneous and evoked electrical activity levels. However, taking either ζ = 0 or ζ = 1 allows explicit optimality results to be obtained for these two cases, while such results are not available for intermediate values of ζ. As just indicated, empirically we observe a very rapid switching in dynamics in the vicinity of ζ = √f, with the explicit results for ζ = 0 and ζ = 1 therefore indicating the general behaviour prior to and after, respectively, this switching. When we give results for ζ = 1, we therefore do so with this understanding: the limit is biologically meaningless, but it nevertheless indicates the general behaviour for ζ in excess of around √f.

We now turn to complex models of synaptic plasticity, considering only SNR lifetimes. In Figs. 6 and 7, we plot SNR lifetimes against sparseness, f, for the three complex models discussed above, for both zero and nonzero spontaneous firing rates, and for the Hopfield (Fig. 6) and Hebb (Fig. 7) protocols. All results are obtained by numerical matrix methods to solve the SNR equation μ(τ_snr) = σ(τ_snr), where the standard deviation σ(t) is computed fully rather than via just its asymptotic form σ(∞).

Fig. 6

Spontaneous activity reduces single-perceptron memory lifetimes and limits sparseness in complex synapse models: Hopfield protocol. Single-perceptron SNR memory lifetimes are shown for different complex models of synaptic plasticity under the Hopfield protocol, as a function of sparseness, f. Each panel shows results for the indicated model and choice of spontaneous activity, either ζ = 0 or ζ = 0.1. Results are shown for Θ or s ranging from 2 to 12 in increments of 2, with the particular choice identified by the line colour described by the common legend in panel B. In all cases, N = 10⁵

Fig. 7

Spontaneous activity reduces single-perceptron memory lifetimes and limits sparseness in complex synapse models: Hebb protocol. The format of this figure is identical to that of Fig. 6, except that it shows results for the Hebb protocol, and in the right-hand panels we use the smaller value ζ = 0.01. Some lines of specific colour are absent in some panels because there is no corresponding rτ_snr > 0. In all cases, N = 10⁵

For the Hopfield protocol in Fig. 6 we see, in all cases and for all choices of parameters, an onset of SNR lifetimes at a minimum, threshold value of f, the rapid attainment of a peak or optimal value of rτ_snr, and then a steady fall in lifetimes as f increases further. For all complex models, this onset of SNR lifetimes occurs at increasingly large values of f as Θ or s increases. At least for the parameter ranges in this figure, in the filter and serial models, for a given choice of f, increasing Θ or s increases rτ_snr, although as the number of internal states continues to increase, ultimately rτ_snr will start to fall. In the case of the cascade model, however, the dependence of rτ_snr on s for fixed f is not as simple as for the other complex models. We note that for all models in this figure, the optimal values of rτ_snr decrease with increasing Θ or s, at least for ζ = 0. However, when we increase the spontaneous activity to ζ = 0.1, the optimal values lose most of their dependence on Θ or s in the filter and serial models, although not in the cascade model. This loss of dependence on Θ or s is strongly N- and ζ-dependent. For N = 10³, we must take ζ close to unity before this loss of dependence is noticeable, while for N = 10⁶, even ζ = 10^(−3/2) ≈ 0.0316 is sufficient.

For the Hebb protocol, in Fig. 7, for smaller f we obtain essentially the same results as for the Hopfield protocol, because these two protocols must coincide for f ≪ 1/N regardless of the model of synaptic plasticity. However, for larger f, the synaptic correlation terms induced by the Hebb protocol again significantly impact SNR memory lifetimes, with the impact being greater for larger Θ or s in the filter and serial models. Thus, as with SU synapses under the Hebb protocol, SNR lifetimes exist only in some interval of f (below f = 1), with this interval shrinking and disappearing as the number of internal states increases (or as p decreases for SU synapses). These dynamics dramatically limit the number of internal states that give rise to positive SNR lifetimes. Nevertheless, as Θ or s increases, rτ_snr in general increases, at least until the permissible range of f becomes very small and then disappears entirely. For the cascade model, however, the upper limit on f is, roughly speaking, independent of s, but we also see that in general, as s increases, rτ_snr decreases for fixed f. This relative insensitivity of the upper limit of the permissible range of f to s in the cascade model occurs because the cascade model has different metastates with different update probabilities, with some synapses residing in the lower metastates and so having larger update probabilities than those residing in higher metastates.

In the presence of spontaneous activity, we see a dramatic change in the memory lifetimes. Indeed, such is the sensitivity of the Hebb protocol to ζ, especially for complex synapse models, that in contrast to Fig. 6 for the Hopfield protocol, for which we took ζ = 0.1, in Fig. 7 we take ζ = 0.01. Even with just 1% spontaneous activity, the number of internal states in the filter and serial models that gives rise to positive SNR lifetimes becomes severely restricted. The cascade model under the Hebb protocol is not quite so sensitive, again because of its different metastates, but a 10% level of spontaneous activity would still dramatically restrict the permissible ranges of f and s compared to the Hopfield protocol.

We quantify these observations by explicitly considering the optimal choices of the parameters f and either Θ or s, namely f_opt and either Θ_opt or s_opt, that maximise rτ_snr, giving rise to rτ_snr^opt. In Figs. 8 and 9, we plot f_opt and rτ_snr^opt against Θ or s, for different levels of spontaneous activity, ζ, for the particular choice of N = 10⁵. Results are obtained both by numerical matrix methods and by using the approximations for μ(t) and rτ_snr given in Sect. 3.2. For the latter, we maximise rτ_snr as a function of f for fixed Θ or s.

Fig. 8

Optimal sparseness in complex synapse models for single perceptrons in the Hopfield protocol. The left-hand panels (A, C, E) show the optimal single-perceptron memory lifetimes rτ_snr^opt obtained at the corresponding optimal levels of sparseness f_opt shown in the right-hand panels (B, D, F), for the indicated complex models. Lines show numerical matrix results, while the corresponding data points show approximate analytical results obtained as discussed in the main text. Results are shown for different values of ζ, with identifying line styles corresponding to those in Fig. 5. We have set N = 10⁵ in all panels

Fig. 9

Optimal sparseness in complex synapse models for single perceptrons in the Hebb protocol. The format of this figure is essentially identical to that of Fig. 8, except that it shows results for the Hebb protocol. Approximate analytical results are not available for the Hebb protocol and so are not shown. The termination of a line at a threshold value of Θ or s indicates that above that value, no choice of f generates rτ_snr > 0. We have set N = 10⁵ in all panels

For the Hopfield protocol in Fig. 8, we see that for ζ = 0, rτ_snr^opt falls as a function of Θ or s. However, in the filter and serial models, as ζ increases, the fall in rτ_snr^opt with Θ or s weakens and then disappears; indeed, the exact results in fact show a very slight increase in rτ_snr^opt with Θ or s for ζ = 1, although this behaviour is not noticeable in Fig. 8. For the displayed choice of N = 10⁵, we need only take ζ ≳ 0.1 for the filter and serial models’ rτ_snr^opt to be relatively insensitive to Θ or s. This is N-dependent: for N = 10⁶, even ζ = 0.01 is sufficient; for N = 10³, ζ needs to be quite close to unity. In contrast, for the cascade model, rτ_snr^opt always falls with s for any choice of ζ, including ζ = 1. The behaviour of the filter and serial models’ rτ_snr^opt is easy to extract from the approximate results in Sect. 3.2. Ignoring for simplicity the correction terms in Eq. (27), both filter and serial models’ rτ_snr can be written in the form

\[ r\tau_{\rm snr} \approx \frac{aq^{2}}{f^{2}}\,\log_{e}\!\left[\frac{bNf^{2}}{q^{2}[f+(1-f)\zeta^{2}]}\right] \tag{34} \]

(cf. Eq. (33)), where a and b are numerical constants and q denotes Θ or s. For ζ = 0 we obtain f_opt = √e q²/(bN) and rτ_snr^opt ≈ ab²N²/(2eq²), while for ζ = 1 we obtain f_opt = q√(e/(bN)) and rτ_snr^opt ≈ abN/e. Therefore, f_opt scales differently with q and with N in these two cases, and for ζ = 0, rτ_snr^opt falls as q increases, but for ζ = 1, rτ_snr^opt is completely independent of q. Intermediate choices of ζ result in intermediate behaviours between these two extremes, and the correction terms in Eq. (27) provide only corrections to, rather than fundamentally altering, this behaviour. We see in Fig. 8 that the numerical and approximate analytical results agree well for the filter and serial models, and that, moreover, both these models’ optimal values are very similar. Unfortunately, in the case of the cascade model, no such simple analysis, even using the fitted form for μ_cas(t) in Eq. (32), is available to explain the fact that rτ_snr^opt falls with s for all values of ζ, including ζ = 1. The numerical and fitted results for rτ_snr^opt agree well in the cascade model, although there are quite large discrepancies in f_opt obtained from the (exact) numerical methods and the fitted expression, particularly for larger values of s and for ζ closer to zero than to unity. Fitting our numerical matrix results for rτ_snr^opt in the cascade model to power laws in s and N for large enough s, we find that for ζ = 0, f_opt ∝ s²/N and rτ_snr^opt ∝ N²/s⁴, while for ζ = 1, f_opt ∝ s/N and rτ_snr^opt ∝ N/s². While the scaling behaviour of f_opt is the same as that in the filter and serial models, the dependence of rτ_snr^opt on q ≡ s differs in the cascade model compared to the filter and serial models.

For the Hebb protocol in Fig. 9, again the pairwise correlation structure present in σ(t)², and the Hebb protocol’s extreme sensitivity to even very small levels of spontaneous activity ζ, have a significant impact on optimality conditions. In the filter and serial models, the permissible range of Θ or s is considerably reduced, so that even for ζ = 0.01, the number of internal states cannot exceed 6 in the serial synapse model or 5 in the filter model. As N is reduced from the displayed value of N = 10⁵, the permissible ranges of Θ and s shrink further. The cascade model in the Hebb protocol is also extremely sensitive to noise, but as discussed, the different metastates’ different update probabilities somewhat ameliorate this sensitivity. Nevertheless, increasing ζ from ζ = 0 to just ζ = 0.1 reduces rτ_snr^opt by several orders of magnitude.

In Figs. 10 and 11 we instead examine Θ_opt or s_opt as a function of f, rather than vice versa, so that we maximise rτ_snr with respect to Θ or s while holding f fixed. For the Hopfield protocol in Fig. 10, subject to a minimum, threshold requirement, Θ_opt or s_opt increases as a function of f, for any level of spontaneous activity ζ, in all three complex models considered here. However, as ζ moves from ζ = 0 to ζ = 1, the functional dependence of Θ_opt or s_opt on f changes. We can derive this explicitly by again using the simple expression for rτ_snr in Eq. (34) for the filter and serial models. The optimal value of q (either Θ or s) is

\[ q_{\rm opt} = f\sqrt{\frac{bN}{e[f+(1-f)\zeta^{2}]}}. \tag{35} \]

Thus, as f increases, q_opt essentially switches from linear growth in f to slower, √f growth, at around f = ζ². This behaviour is clearer for the smaller nonzero choices of ζ used in Fig. 10. The corrections due to the additional terms in Eq. (27) do not fundamentally change this behaviour for the filter model. The corresponding optimal SNR memory lifetime is

\[ r\tau_{\rm snr}^{\rm opt} \approx \frac{abN}{e[f+(1-f)\zeta^{2}]} \quad (\text{at } q_{\rm opt}). \tag{36} \]

For ζ = 0, rτ_snr^opt decreases as f increases, but for ζ = 1, rτ_snr^opt is independent of f. As f increases, the transition from rτ_snr^opt being independent of f to falling as 1/f is again sharp, occurring around f ≈ ζ². This transition is clear for the filter and serial models in Fig. 10. In the case of the cascade model, however, although s_opt increases with f, albeit according to clearly different power laws than those for the filter and serial models, the corresponding value of rτ_snr^opt always decreases as a function of f, regardless of ζ.
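The following sketch illustrates Eqs. (34)–(36) for the filter and serial models by maximising Eq. (34) over q at fixed f. The constants a and b are unspecified numerical constants in the text; setting a = b = 1 here is purely an illustrative assumption.

```python
# Illustrative sketch of Eqs. (34)-(36): maximise Eq. (34) over q at fixed f.
# ASSUMPTION: the unspecified numerical constants are set to a = b = 1.
import numpy as np

def rtau_snr_fs(q, f, N, zeta, a=1.0, b=1.0):
    # Eq. (34) for the filter and serial models; zero when the log argument < 1
    arg = b * N * f**2 / (q**2 * (f + (1.0 - f) * zeta**2))
    return np.where(arg > 1.0, (a * q**2 / f**2) * np.log(arg), 0.0)

N, f, zeta = 10**5, 0.01, 0.1
denom = f + (1.0 - f) * zeta**2
q = np.arange(1.0, 200.0, 0.001)
lt = rtau_snr_fs(q, f, N, zeta)

print(q[np.argmax(lt)], f * np.sqrt(N / (np.e * denom)))   # Eq. (35): q_opt
print(lt.max(), N / (np.e * denom))                        # Eq. (36): at q_opt
```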

Fig. 10

Optimal synaptic complexity in complex synapse models for single perceptrons in the Hopfield protocol. The format of this figure is very similar to that of Fig. 8, except that we have optimised with respect to Θ or s rather than f. In panels A and C the lines switch from numerical matrix to approximate analytical results when the corresponding values of Θ_opt or s_opt exceed 20 in the right-hand panels; before this transition, the lines correspond to numerical matrix results and the discrete points to approximate analytical results. We have set N = 10⁵ in all panels

Fig. 11

Optimal synaptic complexity in complex synapse models for single perceptrons in the Hebb protocol. The format of this figure is essentially identical to that of Fig. 10, except that approximate analytical results are not available for the Hebb protocol. We have set N = 10⁵ in all panels

For the Hebb protocol in Fig. 11, again the small f behaviour must be identical to that for the Hopfield protocol in Fig. 10. However, the increase in Θ_opt or s_opt with increasing f is halted and then reversed as f increases further, as the effects of the pairwise synaptic correlations induced by the Hebb protocol are felt. These correlations not only pull down the optimal value of Θ_opt or s_opt, but they also have a deleterious effect on rτ_snr^opt, changing the 1/f behaviour of the filter and serial models in the Hopfield protocol to approximately 1/f³ behaviour in the Hebb protocol (obtained by fitting), for ζ = 0. Furthermore, while spontaneous activity can render rτ_snr^opt independent of f before the switch to 1/f behaviour in the Hopfield protocol for the filter and serial models, in the Hebb protocol rτ_snr^opt always decreases with increasing f, for all three complex models considered here.

Population memory lifetimes

We now turn to population SNR memory lifetimes. Because SNR_p(t) ≈ √(gP) SNR(t), optimisation of τ_pop with respect to Θ or s is not affected by the additional factor involving g. When we instead optimise population SNR lifetimes with respect to f = g, however, the additional factor of √g in μ_p(t)/σ_p(t) compared to μ(t)/σ(t) functionally changes the optima compared to those for a single perceptron, and so we focus on this case. Because of the independence approximation involved in estimating τ_pop in Sect. 2.4, τ_pop is only an upper bound on population SNR lifetimes, and this will be implicit below.

For simple, SU synapses, Eq. (20) indicates that τ_snr and τ_pop differ in the logarithmic term, with the former having argument Nf²p²/[f+(1−f)ζ²] and the latter 𝒩f³p²/[f+(1−f)ζ²], where 𝒩 = NP ≫ N. We therefore see immediately that single-perceptron SNR lifetimes with ζ = 1 and population SNR lifetimes with ζ = 0 have identical f-dependence. For single-perceptron lifetimes with ζ = 0 and ζ = 1 and population lifetimes with ζ = 0 and ζ = 1, the f-dependence under the logarithm is f¹, f², f² and f³, respectively. The effective power of f switches rapidly in the vicinity of f = ζ², in an N- or 𝒩-dependent way. Because 𝒩 ≫ N, we expect very rapid switching in the population case, with only very small, even negligible, levels of spontaneous activity being required to induce the change in effective power. Above we found for single-perceptron optimal lifetimes that rτ_snr^opt ≈ p³N²/(4e) at f_opt = √e/(p²N) for ζ = 0, and rτ_snr^opt ≈ pN/(2e) at f_opt = (e/(p²N))^{1/2} for ζ = 1. For optimal population lifetimes these become rτ_pop^opt ≈ p𝒩/(2e) at f_opt = (e/(p²𝒩))^{1/2} for ζ = 0, and rτ_pop^opt ≈ 3(p𝒩²)^{1/3}/(4e) at f_opt = √e/(p²𝒩)^{1/3} for ζ = 1. Spontaneous activity therefore changes the N-dependence of rτ_snr^opt from N² to N, and the 𝒩-dependence of rτ_pop^opt from 𝒩 to 𝒩^{2/3}, the latter being a smaller overall reduction, although in all cases the dependence on p involves a positive power. Because the dominant behaviour of SNR lifetimes in the filter and serial models is governed by a similar, single logarithmic term, many of these scaling observations for simple synapses carry over unchanged to these complex synapses.
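As an illustration of these population scalings for SU synapses, the sketch below numerically maximises the population analogue of Eq. (33), in which f³ replaces f² inside the logarithm and 𝒩 = NP replaces N; this is our own check, not the paper’s numerical matrix code.

```python
# Our own sketch of the SU population optima: Eq. (33) with f^3 replacing f^2
# inside the logarithm and the population count Ntot = N*P replacing N.
import numpy as np

def rtau_pop(f, Ntot, p, zeta):
    arg = Ntot * f**3 * p**2 / (f + (1.0 - f) * zeta**2)
    return np.where(arg > 1.0, np.log(arg) / (2.0 * f**2 * p), 0.0)

Ntot, p = 1e12, 0.1                    # e.g. N = 10^4, P = 10^8, as in Fig. 12
f = np.logspace(-7.0, 0.0, 70001)

for zeta, f_th, lt_th in [
    (0.0, np.sqrt(np.e / (p**2 * Ntot)), p * Ntot / (2.0 * np.e)),
    (1.0, np.sqrt(np.e) / (p**2 * Ntot)**(1.0 / 3.0),
          3.0 * (p * Ntot**2)**(1.0 / 3.0) / (4.0 * np.e)),
]:
    lt = rtau_pop(f, Ntot, p, zeta)
    # numerical optimum and lifetime vs the closed forms quoted in the text
    print(zeta, f[np.argmax(lt)], f_th, lt.max(), lt_th)
```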

We examine the behaviour of optimal population SNR lifetimes for complex synapses in Fig. 12. Compared to the single-perceptron optimal SNR lifetimes in Fig. 8, the population results in Fig. 12 are markedly different, particularly for the filter and serial models. For these models, with ζ = 0, rτ_pop^opt is now approximately independent of Θ or s, while with ζ > 0, rτ_pop^opt grows as a function of Θ or s. Even with ζ = 0.01, this growth is present and of almost the same profile as that for ζ = 1, while at the single-perceptron level, for smaller choices of N, it is necessary to take ζ close to unity merely to halt the decrease in rτ_snr^opt with increasing Θ or s. This sensitivity to small, nonzero values of ζ at the population level is 𝒩-dependent, but even with 𝒩 = 10⁶ (e.g. N = 10³ and P = 10³), we only require ζ = 0.1 for rτ_pop^opt to adopt the same profile as that for ζ = 1. For the cascade model, however, optimal population SNR lifetimes fall with s just as they do for single perceptrons. Nonzero ζ does render rτ_pop^opt nearly independent of s for small s (s ≲ 6), but for larger s, rτ_pop^opt falls with s. We may quantify the filter and serial models’ population SNR lifetimes as before, using the slowest decaying modes. We obtain

\[ r\tau_{\rm pop} = \frac{aq^{2}}{f^{2}}\,\log_{e}\!\left[\frac{b\mathcal{N}f^{3}}{q^{2}[f+(1-f)\zeta^{2}]}\right], \tag{37} \]

where now we have f³ rather than f² in the numerator of the argument of the logarithm, just as for SU synapses. The optimal values of f are now f_opt = q√(e/(b𝒩)) (cf. √e q²/(bN)) for ζ = 0 and f_opt = √e q^{2/3}/(b𝒩)^{1/3} (cf. q√(e/(bN))) for ζ = 1. The corresponding optimal memory lifetimes are rτ_pop^opt = ab𝒩/e (cf. ab²N²/(2eq²)) and rτ_pop^opt = 3a(bq𝒩)^{2/3}/(2e) (cf. abN/e), respectively. The corrections due to the additional terms in the filter model’s results again modify but do not fundamentally alter this behaviour. Thus, at the population level, for ζ = 0, the filter and serial models’ optimal SNR lifetimes are independent of q, while at the single-perceptron level they fall as 1/q². However, for ζ = 1, the population lifetimes grow as q^{2/3}, while for a single perceptron they are constant. We cannot obtain similar analytical results for the cascade model, so we fit the numerical results for the cascade model in Fig. 12 to power laws in s and 𝒩. We find that for larger values of s, at the population level rτ_pop^opt ∝ 𝒩/s² with f_opt ∝ s/𝒩 for ζ = 0 (cf. N²/s⁴ and s²/N, respectively, for a single perceptron), and rτ_pop^opt ∝ 𝒩^{2/3}/s^{4/3} with f_opt ∝ s^{2/3}/𝒩^{1/3} for ζ = 1 (cf. N/s² and s/N, respectively, for a single perceptron), with the same rapid switching behaviour for intermediate ζ as for the filter and serial models. The population dynamics soften the fall of rτ_pop^opt with s, but not enough to turn the dependence into growth with s.
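The headline behaviour, growth of the optimal population lifetime as q^{2/3} in the presence of spontaneous activity, can be checked directly from Eq. (37); again a = b = 1 is an illustrative assumption.

```python
# Sketch of the q^(2/3) growth of the optimal population lifetime, Eq. (37),
# with nonzero spontaneous activity. ASSUMPTION: a = b = 1 for illustration.
import numpy as np

def rtau_pop_fs(f, q, Ntot, zeta, a=1.0, b=1.0):
    arg = b * Ntot * f**3 / (q**2 * (f + (1.0 - f) * zeta**2))
    return np.where(arg > 1.0, (a * q**2 / f**2) * np.log(arg), 0.0)

Ntot, zeta = 1e12, 1.0
f = np.logspace(-6.0, 0.0, 60001)
for q in (2.0, 4.0, 8.0):
    lt_opt = rtau_pop_fs(f, q, Ntot, zeta).max()
    # numerical optimum vs 3a(bqN)^(2/3)/(2e): doubling q gains a factor 2^(2/3)
    print(q, lt_opt, 3.0 * (q * Ntot)**(2.0 / 3.0) / (2.0 * np.e))
```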

Fig. 12

Optimal sparseness in complex synapse models for neuronal populations in the Hopfield protocol. The format of this figure is essentially identical to that of Fig. 8, which shows results for the single-perceptron case. Lines show numerical solutions of the equation μ_p(τ_pop)/σ_p(τ_pop) = 1 maximised with respect to f, so rτ_pop^opt at f = f_opt, while data points show approximate analytical results. We have set N = 10⁴ and P = 10⁸, or 𝒩 = 10¹², in all panels

In Table 2 we summarise the scaling behaviour of rτ_opt and f_opt as functions of either p or q and of either N or 𝒩, for simple and complex synapses, for both single-perceptron and population results, for ζ = 0 and ζ = 1. In each column, regardless of the model, f_opt scales identically as a function of q or p⁻¹ and of N or 𝒩. This is not surprising for the SU, filter and serial models, whose results for f_opt come from the same dominating logarithmic behaviour, but the cascade results are obtained by fitting numerical matrix data to power laws to extract the behaviour. For τ_opt, we also obtain the same scaling behaviour as a function of N or 𝒩 within each column, again regardless of the model. However, the scaling of τ_opt with q (or p⁻¹) within each column does depend on the particular model of plasticity. Across a row, moving from single-perceptron ζ = 0 and ζ = 1 results to population ζ = 0 and ζ = 1 results, the dependence of τ_opt on q (or p⁻¹) changes in such a way that increasing q (or decreasing p) has an increasingly less deleterious effect on memory lifetimes for SU and cascade synapses. For SU synapses, the power of p reduces from 3 to 1 to 1/3, while for cascade synapses the power of q (or s) changes from −4 to −2 to −4/3. For both SU and cascade synapses, optimal memory lifetimes therefore always decrease as p decreases or as s (the number of metastates) increases, regardless of the level of spontaneous activity, and regardless of whether we work at the single-perceptron or the population level. For filter and serial synapses, however, the power of q changes from −2 to 0 (i.e. no dependence) to +2/3. Increasing the number of serial metastates or filter states available to a serial or filter synapse therefore increases optimal population SNR lifetimes, but only in the presence of spontaneous activity. As Fig. 12 indicates, we need only very low levels of spontaneous activity to induce this growth of optimal population SNR lifetimes with the number of internal states available to filter or serial synapses.

Table 2.

Overall dependence of optimal single-perceptron and population SNR memory lifetimes, and the corresponding optimal sparseness, on model parameters. Here, q represents Θ or s, depending on the complex model, and is assumed large; 𝒩 = NP

                               Single-perceptron                 Population
                               ζ = 0          ζ = 1              ζ = 0          ζ = 1
Stochastic updater   f_opt     (p⁻²/N)¹       (p⁻²/N)^1/2        (p⁻²/𝒩)^1/2    (p⁻²/𝒩)^1/3
                     τ_opt     p³N²           pN                 p𝒩             p^1/3 𝒩^2/3
Non-cascade          f_opt     (q²/N)¹        (q²/N)^1/2         (q²/𝒩)^1/2     (q²/𝒩)^1/3
                     τ_opt     (N/q)²         N                  𝒩              (q𝒩)^2/3
Cascade              f_opt     (q²/N)¹        (q²/N)^1/2         (q²/𝒩)^1/2     (q²/𝒩)^1/3
                     τ_opt     (N/q²)²        (N/q²)¹            (𝒩/q²)¹        (𝒩/q²)^2/3

Discussion

Memory is a complex, multi-level, system-wide phenomenon involving processes occurring over many time scales and across different brain regions, with integrated and orchestrated control processes coordinating, for example, the transition from short- to long-term memory (Eichenbaum and Cohen 2001). Palimpsest models of memory, in which older memories are forgotten as newer ones are stored (Nadal et al. 1986; Parisi 1986), focus on the dynamics of memory storage and retrieval within a single memory system, such as the hippocampal CA3 recurrent network (Andersen et al. 2007). Sparse population coding (see, for example, Csicsvari et al. 2000; Olshausen and Field 2004) enhances memory lifetimes in these memory models by reducing the overall rate of synaptic plasticity at single synapses, so effectively dilating time, and by decorrelating synaptic updates induced by overlapping memories (Tsodyks and Feigel’man 1988). Complex synapse models, involving metaplastic changes in synapses’ internal states without associated changes in synaptic strength, have also been proposed as a way in which to enhance memory lifetimes in palimpsest models (Fusi et al. 2005), whereas we introduced models of integrate-and-express, filter-based synapses as a means of enhancing the stability of developmental patterns of synaptic connectivity in stochastic models of synaptic plasticity (Elliott 2008; Elliott and Lagogiannis 2009).

Understanding the interaction between sparseness and synaptic complexity in palimpsest memory models is therefore crucial (Leibold and Kempter 2008; Rubin and Fusi 2007). Taken at face value, our results for SNR single-perceptron memory lifetimes support two conclusions. First, when optimised with respect to sparseness, optimal single-perceptron SNR lifetimes are longer for lower synaptic complexity. Second, when optimised instead with respect to synaptic complexity, optimal single-perceptron SNR lifetimes again favour lower synaptic complexity as sparseness increases. These conclusions hold regardless of the level of spontaneous activity, although spontaneous activity can arrest the decrease in optimal single-perceptron memory lifetimes. These conclusions appear to argue in favour of reduced synaptic complexity in real neurons in the presence of sparse population coding, at least at the single-neuron level. However, at the population level, the first of these conclusions is overturned, at least for filter and serial synapses. Critically, even in the presence of low but nonzero levels of spontaneous activity, optimal population SNR lifetimes, optimised with respect to sparseness, increase rather than decrease with synaptic complexity, for filter and serial synapses but not for cascade synapses. At a population level, sparseness, synaptic complexity and, crucially, nonzero spontaneous activity interact to promote increased optimal population SNR memory lifetimes. It is remarkable that non-cascade complex synapse models therefore appear to require the existence of spontaneous activity in a population setting with sparse population coding.

In reaching these conclusions, we have employed two superficially rather different memory storage protocols. First, the Hebb protocol uses two-level inputs ξ_i ∈ {ζ, 1}, while the Hopfield protocol uses four-level inputs ξ_i ∈ {−1, −ζ, +ζ, +1}, although in stripping out spontaneous activity, the latter reduces to the standard conventions of the Hopfield model with its two-level inputs ξ_i ∈ {−1, +1}. However, because we have considered binary strength synapses with S_i ∈ {−1, +1}, the effective contributions of these two sets of inputs to a perceptron’s activation are identical: in both protocols, ξ_iS_i takes the same four possible values. Second, the Hebb protocol uses the cue and target sub-populations approach to determine the direction of synaptic plasticity, while the Hopfield protocol uses the standard Hopfield rule governed by the product of evoked pre- and postsynaptic activity. However, in both protocols, synapses experience identical potentiating and depressing plasticity induction signals at the same separate rates, rfg/2. Furthermore, in both protocols, these induction signals are produced by imposing a pattern of electrical activity on the sub-population of active neurons during memory storage, rather than by allowing neurons’ activities to be generated via direct, afferent synaptic drive. Both protocols therefore implicitly assume executive control of memory storage by other brain regions (see, for example, Eichenbaum and Cohen 2001). These two differences are indeed therefore just superficial, and this is reflected in the fact that the mean activation μ(t) evolves identically under both protocols. The real difference between the Hebb and Hopfield protocols does not reside in these matters of convention and definition. Rather, it resides in the fact that an active perceptron’s synapses with active inputs experience either only potentiating or only depressing induction signals during memory storage under the Hebb protocol, while in the Hopfield protocol some experience potentiating and others depressing induction signals. This difference gives rise to the Hebb protocol’s complicated equilibrium structure, with its nonzero pairwise and higher-order synaptic correlation functions. Were this higher-order structure removed, the two protocols would have identical statistics for perceptron activation. Indeed, in the limit of small fgN, in which at most one of a perceptron’s synapses experiences a plasticity induction signal during memory storage, the dynamical difference between the two protocols vanishes and their statistical structures become identical.

Two earlier studies have considered memory lifetimes in complex models of synaptic plasticity in the presence of sparse population coding (Leibold and Kempter 2008; Rubin and Fusi 2007). Leibold and Kempter (2008) used the cue-target protocol that we have adapted and referred to as the Hebb protocol. They employed synaptic strengths S_i ∈ {0, 1} rather than our S_i ∈ {−1, +1}, although this difference is unimportant because it just amounts to an effective re-definition of the firing threshold ϑ (Elliott and Lagogiannis 2012). They also employed two-level activities, but with ξ_i ∈ {0, 1}, so without considering the possible influence of nonzero spontaneous activity, ζ > 0, on memory lifetimes. Rubin and Fusi (2007) used the Hopfield protocol with two-level activities, ξ_i ∈ {−1, +1}, interpreting ξ_i = −1 as spontaneous activity and ξ_i = +1 as evoked activity, and stressed the importance of considering the impact of spontaneous activity on memory lifetimes. We have modelled spontaneous activity in the Hopfield protocol by moving to four-level inputs, but as indicated, this approach is essentially equivalent to two-level inputs for synapses with S_i ∈ {−1, +1} in terms of the overall statistical structure of perceptron activation. However, by using four activity levels, we are able to vary ζ over its allowed range in order to explore the impact of different degrees of spontaneous activity on memory lifetimes. A significant difference between our approach and that of Rubin and Fusi (2007) is that we do not allow spontaneous activity to induce synaptic plasticity, a position that we consider to be mandated by a broadly BCM (Bienenstock et al. 1982) view of synaptic plasticity, as discussed earlier. Finally, our respective definitions of the memory signal, from which SNR memory lifetimes are obtained, differ in a population setting. Rubin and Fusi (2007) define this signal over the entire population of neurons, while we define it over only that sub-population of neurons directly involved in memory storage (the equivalent of Leibold and Kempter (2008)’s target sub-population). This difference leads to different scaling behaviours of optimal population SNR memory lifetimes as a function of the sparseness of the population coding.

The difference between the scaling behaviours of optimal SNR memory lifetimes (optimised with respect to sparseness) in the single-perceptron and population cases is intriguing. Furthermore, the role of even very small levels of spontaneous activity in enhancing optimal population SNR lifetimes with increasing synaptic complexity in non-cascade models is fascinating. However, we have cautioned against over-interpreting results from an SNR analysis of memory lifetimes. This analysis depends on the distribution of h0 being tightly concentrated around its supra-threshold mean. We have shown in earlier work that this requirement is often not satisfied, and that an FPT approach is required to examine memory lifetimes away from this regime (Elliott 2016a, 2017a, 2020). Here, for simple synapses, we have explicitly seen that the single-perceptron SNR analysis breaks down in the limit of small f, and so it cannot probe the very sparse coding regime. The explanation for this failure is straightforward: as f is reduced, the initial SNR μ(0)/σ(0) falls, and below some threshold value of f the SNR validity condition μ(0)/σ(0) ≳ 2 fails. For a single perceptron, we saw that this condition is Nf²p²/[f + (1−f)ζ²] ≥ 4 (for either protocol). Plugging in f_opt for SU synapses with ζ = 0 and ζ = 1, this condition becomes √e ≥ 4 and e ≥ 4, respectively, where we saw the former case earlier. Both conditions are violated, although with spontaneous activity the violation is not so great. Although we have not extended our FPT analysis of filter-based synapses (Elliott 2017a, 2020) to the sparse coding regime considered here, the same issues arise with complex synapses. Therefore, we fully expect single-perceptron SNR optimality conditions to be violated for complex synapses, too.

Whether population SNR optimality conditions are violated, in either simple or complex models, is unclear. We would need to extend our single-perceptron FPT analysis to a population setting. Furthermore, this extended analysis would need to be reducible to the population SNR analysis, with its rather coarse approximation that neurons’ activities evolve independently despite synaptic coupling. However, it is extremely tempting to speculate that the simple synapse condition for population SNR validity is just the obvious generalisation, namely μ_p(0)/σ_p(0) ≳ 2. Using the population results for f_opt for simple synapses, this condition becomes the false e ≥ 4 for ζ = 0 and the true e^{3/2} ≥ 4 for ζ = 1. It is thus quite remarkable that, if this speculation is borne out by a more careful analysis, then optimal population SNR memory lifetimes for simple synapses are valid in the presence of spontaneous activity.

Acknowledgements

I acknowledge the use of the IRIDIS High Performance Computing Facility, and associated support services at the University of Southampton, in the completion of this work.

Open Access

This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Appendix A: Transition matrix elements and jump moments

Table 3 lists frequently used mathematical symbols, and their meanings, that appear exclusively in the appendices.

Table 3.

List of the main mathematical symbols, and their meanings, appearing in the appendices

Symbol                     Meaning
Φ_m^n(p)                   Φ_m^n(p) = ⁿC_m p^m (1−p)^{n−m}, the binomial distribution’s mass function
e(N)                       Normalised unit eigenvector of the (N+1)×(N+1) matrix with transition elements given in Eq. (42)
F_N(z)                     F_N(z) = Σ_{i=0}^{N} e_i(N) z^i, probability generating function of the components of e(N)
G(z,w)                     G(z,w) = Σ_{N=0}^{∞} F_N(z) w^N, ordinary generating function for the PGFs F_N(z)
ε(N), ℱ_N(z), 𝒢(z,w)       Equivalents of e(N), F_N(z), G(z,w) for model-independent arguments
H(x)                       G(z,w) re-expressed in terms of the single dependent variable x = w(1−z)/[2−w(1+z)]
κ_n                        Coefficient in the power series for H(x), or n synapses’ equilibrium strength correlation coefficient
K(z)                       Exponential generating function for the correlation coefficients κ_n
ψ_j                        ψ_j = ψ(1−ψ)^j

To obtain FPT lifetimes for SU synapses, we require Prob[h′|h], or the induced jump moments. For synaptic strengths ±1, the number of synapses of strength +1 uniquely determines h(t). Let N_eff of a perceptron’s synapses be active during the storage of ξ_0. Consider first the Hebb protocol. Immediately before the storage of any subsequent non-tracked memory, let i of these N_eff synapses have strength +1 and j of the other N − N_eff synapses have strength +1. Then,

\[ Nh = (2i - N_{\rm eff}) + \zeta\left[2j - (N - N_{\rm eff})\right]. \tag{38} \]

Similarly, immediately after the storage of this non-tracked memory, let k and l synapses out of the N_eff and N − N_eff synapses, respectively, have strength +1, so that

\[ Nh' = (2k - N_{\rm eff}) + \zeta\left[2l - (N - N_{\rm eff})\right]. \tag{39} \]

Then, the transition operator T_N = (1−g) I⊗⋯⊗I + ½g K₊⊗⋯⊗K₊ + ½g K₋⊗⋯⊗K₋ induces the corresponding transition probability

\[ {\rm Prob}[k,l|i,j] = (1-g)\,\delta_{k,i}\delta_{l,j} + \tfrac{1}{2}g\,\Phi^{N_{\rm eff}-i}_{k-i}(\psi)\,\Phi^{N-N_{\rm eff}-j}_{l-j}(\psi) + \tfrac{1}{2}g\,\Phi^{i}_{i-k}(\psi)\,\Phi^{j}_{j-l}(\psi), \tag{40} \]

where δ_{i,j} is the Kronecker delta and Φ_m^n(p) = ⁿC_m p^m(1−p)^{n−m} is the binomial probability mass function, with ⁿC_m the binomial coefficient. The three terms correspond to the three parts of T_N, with the last two just being products of the probabilities for the possible ways in which the different sets of synapses can change strength to give the required transition process. We can also obtain a similar result for the Hopfield protocol. In this case, let i of the N_eff synapses and j of the other N − N_eff synapses contribute positively to h; and similarly k and l, respectively, to h′. For example, a synapse with a component ξ_0 = +1 (or +ζ) and S(t) = +1 and a synapse with a component ξ_0 = −1 (or −ζ) and S(t) = −1 both contribute positively to i (or j). We again just have Nh = (2i − N_eff) + ζ[2j − (N − N_eff)], and similarly for Nh′. Then, from T_N = (1−g) I⊗⋯⊗I + g K⊗⋯⊗K, we obtain

\[ {\rm Prob}[k,l|i,j] = (1-g)\,\delta_{k,i}\delta_{l,j} + g\sum_{m=0}^{i}\Phi^{i}_{m}\!\left(\tfrac{1}{2}\psi\right)\Phi^{N_{\rm eff}-i}_{k+m-i}\!\left(\tfrac{1}{2}\psi\right) \times \sum_{n=0}^{j}\Phi^{j}_{n}\!\left(\tfrac{1}{2}\psi\right)\Phi^{N-N_{\rm eff}-j}_{l+n-j}\!\left(\tfrac{1}{2}\psi\right), \tag{41} \]

where the first sum arises because m ∈ {0, …, i} of the i synapses can change strength, and then k + m − i of the N_eff − i synapses must also change strength for the i → k transition, and similarly for the second sum. These probabilities Prob[k,l|i,j] uniquely determine Prob[h′|h], from which we can obtain the jump moments in Eq. (5). The final results are given in Eqs. (21) and (22). The second jump moment depends explicitly on N_eff, so we denote it as B(h|N_eff). It is convenient to write B(h|N_eff) = ψg B0(N_eff) + ψ²g h², so that B0(N_eff) is, up to a multiplicative factor, the h-independent part of B(h|N_eff).
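Equation (40) is straightforward to transcribe numerically; the following hypothetical helper (not the author’s simulation code) builds Prob[k,l|i,j] from scipy’s binomial mass function and checks normalisation over (k, l).

```python
# Hypothetical transcription of Eq. (40) for the Hebb protocol, using
# scipy's binomial mass function for Phi; out-of-range arguments give zero.
import numpy as np
from scipy.stats import binom

def prob_hebb(k, l, i, j, N, Neff, g, psi):
    """Prob[k,l|i,j] of Eq. (40): no update, pure potentiation, pure depression."""
    stay = (1.0 - g) * (k == i) * (l == j)
    pot = 0.5 * g * binom.pmf(k - i, Neff - i, psi) * binom.pmf(l - j, N - Neff - j, psi)
    dep = 0.5 * g * binom.pmf(i - k, i, psi) * binom.pmf(j - l, j, psi)
    return stay + pot + dep

# The probabilities are normalised over all reachable (k, l):
N, Neff, g, psi = 20, 8, 0.05, 0.3
i, j = 5, 7
print(sum(prob_hebb(k, l, i, j, N, Neff, g, psi)
          for k in range(Neff + 1) for l in range(N - Neff + 1)))   # ~1.0
```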

To obtain FPT lifetimes from the FPE approach, we must solve Eq. (6) with its N_eff-dependent second jump moment for any particular choice of N_eff. For this given value, we must average the resulting solution τ_mfpt(h0|N_eff) over the initial distribution of h0, which also depends on N_eff, and then average over N_eff according to its binomial distribution, obtaining τ_mfpt = ⟨⟨τ_mfpt(h0|N_eff)⟩_{h0>ϑ}⟩_{N_eff}. However, for large enough fN, it is sufficient to average B0(N_eff) over the distribution of N_eff and just use the average jump moment ψg⟨B0(N_eff)⟩_{N_eff} in Eq. (6). This “pre-averaging” method is similar to a mean field approximation, but goes beyond just replacing N_eff by its mean value, fN. We then average the resulting solution τ_mfpt(h0) over h0, where the unconditional statistics of h0 = h(0) are given in Eqs. (7) and (11) with the correlations in Eq. (19). Because the FPE is valid only to second order, it suffices to take the distribution of h0 to be a Gaussian with these first- and second-order statistics.

Although Prob[k,l|i,j] determines Prob[h′|h] uniquely, the reverse is in general not the case in the presence of nonzero spontaneous activity. In particular, the equation Nh = (2i − N_eff) + ζ[2j − (N − N_eff)] may have multiple solutions for i and j, given a value of h, depending on the value of ζ. To avoid this awkwardness, for determining FPT lifetimes according to the exact, MIE approach of Eq. (4) or its continuum limit, we restrict to the specific case of ζ = 0. Then, the contributions to h(t) from the inactive inputs drop out, and we need only work with Prob[k|i], with the transition processes involving j and l being irrelevant. Since Nh = 2i − N_eff and Nh′ = 2k − N_eff, Prob[k|i] uniquely determines Prob[h′|h] and vice versa. As with Eq. (6), Eq. (4) is then conditioned on N_eff synapses contributing positively to the perceptron’s activation during the storage of ξ_0, and so we must also solve Eq. (4) for each value of N_eff and compute the same double average, τ_mfpt = ⟨⟨τ_mfpt(h0|N_eff)⟩_{h0>ϑ}⟩_{N_eff}. Where feasible, we always perform this exact calculation. For larger choices of N and values of f closer to unity (for which fN remains sizeable), we move to the mean field approximation, setting N_eff = fN (or its closest integer), which is an excellent approximation that makes the calculations tractable. For even larger N (N = 10⁶), we move to the integral equation form of Eq. (4), corresponding to a continuum limit for h. This limit also works well for smaller values of N, but we prefer exact methods where possible. Unlike the FPE approach, in the MIE approach we need the exact distribution of h0, or a good approximation to it, to average correctly over h0, and so we need the equilibrium distribution of all synapses’ strengths. We give the details of the calculation for the Hebb protocol in Appendix B.

Appendix B: Hebb equilibrium structure for simple synapses

We require the unit eigenstate of the operator ½K₊⊗⋯⊗K₊ + ½K₋⊗⋯⊗K₋ for all N synapses, where K± are given in Eq. (16). This operator induces the transition probabilities

\[ {\rm Prob}[k|i] = \tfrac{1}{2}\Phi^{N-i}_{k-i}(\psi) + \tfrac{1}{2}\Phi^{i}_{i-k}(\psi), \tag{42} \]

for the number of synapses of strength +1 in equilibrium (cf. Eq. (40)). These probabilities are the elements of an (N+1)×(N+1) matrix whose unit eigenvector determines the equilibrium distribution. Let the (N+1)-dimensional vector e(N) with components e_i(N), i = 0, …, N, be this eigenvector, with Σ_{i=0}^{N} e_i(N) = 1. Then e_i(N) is the probability that i synapses have strength +1 in equilibrium, where we indicate the dependence of e_i(N) on N for clarity. We define the probability generating function (PGF) G_i(z) = Σ_{k=0}^{N} z^k Prob[k|i] for the columns of the transition matrix, obtaining

\[ G_i(z) = \tfrac{1}{2}z^{i}\left[(1-\psi)+\psi z\right]^{N-i} + \tfrac{1}{2}\left[\psi+(1-\psi)z\right]^{i}, \tag{43} \]

and also define the PGF for the components of the unit eigenvector e(N) by writing F_N(z) = Σ_{i=0}^{N} e_i(N) z^i. Then, the eigenvalue equation can be written as:

\[ F_N(z) = \tfrac{1}{2}\left[(1-\psi)+\psi z\right]^{N} F_N\!\left(\frac{z}{(1-\psi)+\psi z}\right) + \tfrac{1}{2}F_N\!\bigl(\psi+(1-\psi)z\bigr). \tag{44} \]

Forming an ordinary generating function (OGF) by writing G(z,w) = Σ_{N=0}^{∞} F_N(z) w^N, we obtain

\[ G(z,w) = \tfrac{1}{2}\,G\!\left(\frac{z}{(1-\psi)+\psi z},\,\left[(1-\psi)+\psi z\right]w\right) + \tfrac{1}{2}\,G\!\bigl(\psi+(1-\psi)z,\,w\bigr). \tag{45} \]

This equation must be solved subject to the two boundary conditions G(1,w) = 1/(1−w) and G(z,0) = F_0(z) ≡ 1.
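As a numerical cross-check of Eq. (42), a minimal sketch (our own, with illustrative parameter choices) builds the (N+1)×(N+1) transition matrix and extracts the unit eigenvector by direct eigendecomposition; for ψ = 1/2 it should reproduce the uniform equilibrium distribution derived below (see Eq. (64)).

```python
# Our own numerical cross-check of Eq. (42): build the transition matrix and
# extract its unit-eigenvalue eigenvector, the equilibrium distribution e(N).
import numpy as np
from scipy.stats import binom

def equilibrium(N, psi):
    k = np.arange(N + 1)
    T = np.zeros((N + 1, N + 1))
    for i in range(N + 1):
        # Eq. (42): half potentiation, half depression, starting from state i
        T[:, i] = 0.5 * binom.pmf(k - i, N - i, psi) + 0.5 * binom.pmf(i - k, i, psi)
    w, v = np.linalg.eig(T)
    e = np.real(v[:, np.argmin(np.abs(w - 1.0))])
    return e / e.sum()                       # normalise so that sum_i e_i(N) = 1

e = equilibrium(50, 0.5)
print(np.allclose(e, 1.0 / 51.0))            # psi = 1/2: uniform, cf. Eq. (64)
```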

Although Eq. (45) is a nasty functional equation, we can exploit very general, model-independent arguments to simplify it. For indistinguishable synapses, marginalising a general equilibrium distribution A_{N+1} over one synapse must give A_N. Let ε_i(N) be the probability that any i out of N synapses have strength +1 in this general equilibrium distribution A_N. The probability of any particular configuration of i such synapses having strength +1 is then just ε_i(N)/ᴺC_i. Considering an additional synapse added to these N synapses, it could have strength −1 or +1, and these two new, particular configurations have probabilities ε_i(N+1)/ᴺ⁺¹C_i and ε_{i+1}(N+1)/ᴺ⁺¹C_{i+1}, respectively. So, we must have

\[ \frac{1}{^{N}C_{i}}\,\varepsilon_{i}(N) = \frac{1}{^{N+1}C_{i}}\,\varepsilon_{i}(N+1) + \frac{1}{^{N+1}C_{i+1}}\,\varepsilon_{i+1}(N+1), \tag{46} \]

which represents the result of marginalising A_{N+1} over one synapse’s strength to obtain A_N. Then,

\[ \varepsilon_{i}(N) = \frac{N+1-i}{N+1}\,\varepsilon_{i}(N+1) + \frac{i+1}{N+1}\,\varepsilon_{i+1}(N+1). \tag{47} \]

This equation has natural boundary conditions: putting i = N+1 or i = −1 gives zero on the RHS, respecting the convention that ε_i(N) = 0 for i < 0 and i > N. Writing the PGF ℱ_N(z) = Σ_{i=0}^{N} ε_i(N) z^i and using these boundary conditions, the PGF must satisfy the ordinary differential equation

\[ (1-z)\frac{{\rm d}\mathcal{F}_{N+1}(z)}{{\rm d}z} + (N+1)\mathcal{F}_{N+1}(z) = (N+1)\mathcal{F}_{N}(z). \tag{48} \]

Then writing the OGF 𝒢(z,w) = Σ_{N=0}^{∞} ℱ_N(z) w^N, we obtain the partial differential equation

\[ (1-z)\frac{\partial\mathcal{G}(z,w)}{\partial z} + w(1-w)\frac{\partial\mathcal{G}(z,w)}{\partial w} = w\,\mathcal{G}(z,w), \tag{49} \]

which is subject to the boundary conditions 𝒢(1,w) = 1/(1−w) and 𝒢(z,0) = 1. The general solution of this equation can be written in the form:

\[ \mathcal{G}(z,w) = \frac{1}{w(1-z)}\,H\!\left(\frac{w(1-z)}{2-w(1+z)}\right) \tag{50} \]

for an arbitrary (at least once-differentiable) function H(x), where the two boundary conditions at z = 1 and w = 0 impose the same requirement, that H(x)/x → 2 as x → 0. The solution in Eq. (50) imposes a functional constraint on the form of 𝒢(z,w) in any model with indistinguishable synapses.

Applying the general form in Eq. (50) to the particular case in Eq. (45), so writing G(z,w) in terms of some function H(x), where x = w(1−z)/[2−w(1+z)], we reduce the functional equation for a function of two variables to the much simpler and more symmetric functional equation

\[ H(x) = \frac{1}{2(1-\psi)}\left[H\!\left(\frac{(1-\psi)x}{1+\psi x}\right) + H\!\left(\frac{(1-\psi)x}{1-\psi x}\right)\right], \tag{51} \]

in just one variable. We solve this equation using a power series. There are no even terms (because H(0) ≡ 0 guarantees that H(x)/x is finite as x → 0), so we write

\[ H(x) = 2x\sum_{i=0}^{\infty}\kappa_{2i}\,x^{2i}, \tag{52} \]

with κ₀ ≡ 1 satisfying the boundary conditions. The coefficients κ_{2i} must then satisfy the infinite tower of recurrence relations

\[ \frac{\kappa_{2i}}{(2i)!} = \sum_{j=0}^{i}\frac{\kappa_{2j}}{(2j)!}\,(1-\psi)^{2j}\,\frac{\psi^{2(i-j)}}{[2(i-j)]!}. \tag{53} \]

These equations can be solved iteratively to any desired order.
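The iteration is easy to implement; the sketch below (an illustration, assuming only Eq. (53) and κ₀ = 1) solves the recurrence and checks the closed forms for κ₂ and κ₄ given in Eq. (59).

```python
# Illustrative iterative solution of the recurrence in Eq. (53), with
# kappa_0 = 1, checked against the closed forms for kappa_2 and kappa_4.
from math import factorial

def kappas(psi, imax):
    k = [1.0]                                            # kappa_0 = 1
    for i in range(1, imax + 1):
        # move the j = i term to the left-hand side and solve for kappa_{2i}
        s = sum(k[j] / factorial(2 * j) * (1.0 - psi)**(2 * j)
                * psi**(2 * (i - j)) / factorial(2 * (i - j)) for j in range(i))
        k.append(factorial(2 * i) * s / (1.0 - (1.0 - psi)**(2 * i)))
    return k                                             # k[i] = kappa_{2i}

psi = 0.3
k = kappas(psi, 2)
print(k[1], psi / (2.0 - psi))                           # kappa_2
print(k[2], psi**2 * (6.0 - 10.0 * psi + 5.0 * psi**2)
            / ((2.0 - psi)**2 * (2.0 - 2.0 * psi + psi**2)))   # kappa_4
```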

Given the κ_{2i} coefficients, we then have H(x), with G(z,w) following directly as:

\[ G(z,w) = \sum_{i=0}^{\infty}\kappa_{2i}\,\frac{\left[\tfrac{1}{2}w(1-z)\right]^{2i}}{\left[1-\tfrac{1}{2}w(1+z)\right]^{2i+1}}. \tag{54} \]

Then, F_N(z) follows, which can be written in the form:

\[ F_N(z) = \sum_{i=0}^{N}\,^{N}C_{i}\,\kappa_{i}\left(\frac{1-z}{2}\right)^{i}\left(\frac{1+z}{2}\right)^{N-i}, \tag{55} \]

where we define the odd coefficients κ_{2i+1} ≡ 0 so that the expression for F_N(z) takes its simplest form. By definition F_N(z) ≡ Σ_{i=0}^{N} e_i(N) z^i, so by reading off the coefficient of z^i in Eq. (55) we obtain e_i(N) in terms of the κ_{2j} coefficients. From the definition of F_N(z) as the PGF for the number of synapses with strength +1 in equilibrium, the equilibrium distribution A_N takes the form:

\[ A_N = \sum_{i=0}^{N}\frac{e_{i}(N)}{^{N}C_{i}}\;\mathsf{P}\Bigl[\underbrace{\begin{pmatrix}1\\0\end{pmatrix}\otimes\cdots\otimes\begin{pmatrix}1\\0\end{pmatrix}}_{N-i\ \text{factors}}\otimes\underbrace{\begin{pmatrix}0\\1\end{pmatrix}\otimes\cdots\otimes\begin{pmatrix}0\\1\end{pmatrix}}_{i\ \text{factors}}\Bigr], \tag{56} \]

where 𝖯 denotes a sum over all ᴺC_i combinations of the N indicated tensor products, involving i synapses of strength +1 and N − i of strength −1, and the e_i(N) can be expressed in terms of the κ_{2j} via Eq. (55).

Although this completely solves the problem of finding the Hebb equilibrium distribution, in fact we can write A_N directly in terms of the κ_i coefficients. To see this, we first evaluate F_N(z) at z = −1 using Eq. (55), giving F_N(−1) ≡ κ_N. But F_N(−1) = Σ_{i=0}^{N}(−1)^i e_i(N), and this alternating sum is, up to an overall sign, just the equilibrium correlation coefficient E[S_1(∞)×⋯×S_N(∞)], since

\[ E[S_{1}(\infty)\times\cdots\times S_{N}(\infty)] = (-1)^{N}e_{0}(N) + (-1)^{N-1}e_{1}(N) + \cdots + (-1)^{0}e_{N}(N) = (-1)^{N}\sum_{i=0}^{N}(-1)^{i}e_{i}(N). \tag{57} \]

So E[S_1(∞)×⋯×S_N(∞)] = (−1)^N F_N(−1) ≡ (−1)^N κ_N. But κ_{2i+1} ≡ 0, so we can just drop the parity factor (−1)^N. Hence, κ_N is the equilibrium correlation function between the strengths of N synapses, for any choice of N. We must therefore be able to expand A_N directly in terms of these correlation functions. Equation (56) writes A_N in terms of the two orthogonal vectors (1,0)^T and (0,1)^T. Although these definite strength states are the natural ones, we can instead expand the equilibrium state using a different pair of orthogonal vectors. In particular, we may use the pair A_1 = ½(+1,+1)^T and Ā_1 = ½(+1,−1)^T, where the former is just the monosynaptic equilibrium distribution. Then, we may instead write

\[ A_N = \sum_{i=0}^{N}\kappa_{i}\,\mathsf{P}\Bigl[\underbrace{A_{1}\otimes\cdots\otimes A_{1}}_{N-i\ \text{factors}}\otimes\underbrace{\bar{A}_{1}\otimes\cdots\otimes\bar{A}_{1}}_{i\ \text{factors}}\Bigr]. \tag{58} \]

Because κ_{2i+1} = 0, only an even number of Ā_1 vectors can appear in each term. We may confirm by explicit calculation that Eqs. (56) and (58) are equivalent representations of the same equilibrium state A_N, and we may also confirm that the form in Eq. (58) has the correct marginal and correlational structure. For example, to compute a correlation function involving j out of the N synapses, we need to marginalise over N − j synapses. This marginalisation is achieved just by summing over their states, which we do by dotting through with the two-dimensional vector 1 in the N − j relevant places in the tensor product. To obtain expectation values of the other j synapses’ strengths, we just dot through with Ω in the j relevant places. But 1 ≡ 2A_1 and Ω ≡ −2Ā_1, so Eq. (58) is just a disguised expansion in the orthogonal vectors 1 and Ω that must be used when computing the equilibrium correlation functions. Equation (58) is therefore the only possible form with the requisite correlational structure.

Although in obtaining Eq. (53) we have essentially found the Hebb equilibrium distribution, using it to obtain the κ_{2i} is awkward. For example, the first few κ_{2i} are

\[ \kappa_{0}=1,\quad \kappa_{2}=\frac{\psi}{2-\psi},\quad \kappa_{4}=\frac{\psi^{2}}{(2-\psi)^{2}}\,\frac{6-10\psi+5\psi^{2}}{2-2\psi+\psi^{2}},\quad \kappa_{6}=\frac{\psi^{3}}{(2-\psi)^{3}}\,\frac{90-450\psi+1013\psi^{2}-1276\psi^{3}+929\psi^{4}-366\psi^{5}+61\psi^{6}}{6-18\psi+29\psi^{2}-28\psi^{3}+17\psi^{4}-6\psi^{5}+\psi^{6}}, \tag{59} \]

and they become increasingly complicated as i increases. We can compute these coefficients numerically for any given value of ψ, but high precision is required to obtain stable results, with more precision required as N increases. Rather than using Eq. (53) directly, we instead define the exponential generating function (EGF) of the coefficients κ_{2i},

\[ K(z) = \sum_{i=0}^{\infty}\frac{\kappa_{2i}}{(2i)!}\,z^{2i}. \tag{60} \]

This EGF undoes the convolution structure in Eq. (53), and we obtain the q-like equation (see, for example, Andrews et al. 1999)

\[ K(z) = K\bigl((1-\psi)z\bigr)\cosh\psi z. \tag{61} \]

The solution follows by iteration, so that

\[ K(z) = \prod_{j=0}^{\infty}\cosh\bigl[\psi(1-\psi)^{j}z\bigr], \tag{62} \]

where we have used K(0) ≡ 1. This EGF can be evaluated in closed form for the three cases of ψ = 1, ψ = ½ and ψ = 0 (or, strictly, in the limit ψ → 0), giving

\[ K(z) = \begin{cases} 1 & \text{for } \psi=0 \\ \sinh z/z & \text{for } \psi=\tfrac{1}{2} \\ \cosh z & \text{for } \psi=1. \end{cases} \tag{63} \]

The coefficients κ_{2i} follow as: κ_{2i} = δ_{i,0} for ψ = 0; κ_{2i} = 1/(1+2i) for ψ = ½; and κ_{2i} = 1 for ψ = 1. Plugging these into Eq. (55), we obtain

\[ F_N(z) = \begin{cases} \left[\tfrac{1}{2}(1+z)\right]^{N} & \text{for } \psi=0 \\ \dfrac{1}{N+1}\,\dfrac{1-z^{N+1}}{1-z} & \text{for } \psi=\tfrac{1}{2} \\ \tfrac{1}{2}\left(1+z^{N}\right) & \text{for } \psi=1. \end{cases} \tag{64} \]

Thus, for ψ = 0, e_i(N) = ᴺC_i/2^N, so binomially distributed with probability ½; for ψ = ½, e_i(N) = 1/(N+1), so uniformly distributed; and for ψ = 1, e_i(N) = ½(δ_{i,0} + δ_{i,N}), so bimodally distributed with equiprobable spikes at i = 0 and i = N only. The case of ψ = 0 (or ψ → 0) corresponds to the distribution A_N = A_1⊗⋯⊗A_1.

Away from these exact cases, we approximate K(z) by considering only a finite number of terms in the product for the EGF in Eq. (62),

$K_m(z) = \displaystyle\prod_{j=0}^{m} \cosh\big[\psi(1-\psi)^j z\big]$.  (65)

Writing $\cosh$ in its exponential form and considering all combinations of products of the individual terms, the coefficient of $z^i$ in $K_m(z)$ is

$\big[z^i\big] K_m(z) = \dfrac{1+(-1)^i}{2}\,\dfrac{1}{i!}\,\dfrac{1}{2^m} \displaystyle\sum_{\pm} \big(\psi_0 \pm \psi_1 \pm \psi_2 \pm \cdots \pm \psi_m\big)^i$,  (66)

where we write $\psi_j = \psi(1-\psi)^j$, and the sum is over all $2^m$ possible combinations of signs. For $m=0$, the sum just means $\psi_0^i$. Thus, we write

$\kappa_i^{(m)} = \dfrac{1+(-1)^i}{2}\,\dfrac{1}{2^m} \displaystyle\sum_{\pm} \big(\psi_0 \pm \psi_1 \pm \psi_2 \pm \cdots \pm \psi_m\big)^i$,  (67)

and then $\lim_{m\to\infty} \kappa_i^{(m)} = \kappa_i$. For $\psi > 0$, $\psi_j$ falls to zero geometrically fast as $j$ increases, so the convergence is rapid. Thus, a controlled approximation replaces the coefficients $\kappa_i$ with their truncated forms $\kappa_i^{(m)}$, where we need only take $m$ large enough for good convergence. For notational simplicity, we write Eq. (67) in the form:

$\kappa_i^{(m)} = \dfrac{1+(-1)^i}{2}\,\dfrac{1}{2^m} \displaystyle\sum_{\alpha} p_\alpha^i$,  (68)

where the sum over $\alpha$ is shorthand for the full sign sum in Eq. (67), with $p_\alpha$ running over the $2^m$ values $\psi_0 \pm \psi_1 \pm \cdots \pm \psi_m$. Inserting this into Eq. (55), we obtain the PGF

$F_N^{(m)}(z) = \dfrac{1}{2^{m+1}} \displaystyle\sum_{\alpha} \left[ \left(\dfrac{1+p_\alpha}{2} + \dfrac{1-p_\alpha}{2}\,z\right)^{N} + \left(\dfrac{1-p_\alpha}{2} + \dfrac{1+p_\alpha}{2}\,z\right)^{N} \right]$,  (69)

with again $\lim_{m\to\infty} F_N^{(m)}(z) = F_N(z)$. The approximated equilibrium distribution is therefore an equal-weight average over $2^{m+1}$ binomial distributions, all with parameter $N$ but with the $2^{m+1}$ probabilities $\frac{1}{2}(1 \pm \psi_0 \pm \cdots \pm \psi_m)$. Although it involves a sum over $2^{m+1}$ terms, it does not require high numerical precision to obtain stable results, and in general it provides a very efficient method for obtaining the equilibrium distribution for anything but very small $N$. The approximation can be made even more efficient by replacing a binomial distribution with parameter $N$ and probability $\frac{1}{2}(1 \pm p_\alpha)$ with a Gaussian distribution of mean $\frac{1}{2}N(1 \pm p_\alpha)$ and variance $\frac{1}{4}N(1 - p_\alpha^2)$ for $N$ large enough.
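A direct implementation of this binomial-mixture form is short; the sketch below (ours, in Python, with hypothetical names) enumerates the $2^m$ sign combinations of Eq. (67), holding the sign of $\psi_0$ fixed, and then averages the $2^{m+1}$ binomial distributions of Eq. (69).

    import itertools
    from math import comb

    def hebb_equilibrium(psi, N, m=12):
        """e_i^(N), i = 0..N, as the equal-weight average of 2^(m+1) binomial
        distributions with probabilities (1 +/- p_alpha)/2, per Eq. (69)."""
        psis = [psi * (1 - psi) ** j for j in range(m + 1)]
        e = [0.0] * (N + 1)
        for signs in itertools.product((+1, -1), repeat=m):   # psi_0 sign fixed +
            p_alpha = psis[0] + sum(s * pj for s, pj in zip(signs, psis[1:]))
            for q in ((1 - p_alpha) / 2, (1 + p_alpha) / 2):  # the +/- pair in Eq. (69)
                for i in range(N + 1):
                    e[i] += comb(N, i) * q**i * (1 - q) ** (N - i)
        return [x / 2 ** (m + 1) for x in e]

    print(hebb_equilibrium(0.5, N=10))   # each entry ~ 1/(N+1) = 0.0909...

At $\psi = \frac{1}{2}$ and moderate $m$, the output is already indistinguishable from the exact uniform distribution, and for large $N$ each binomial factor can be replaced by the Gaussian described above.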

In Fig. 13 we illustrate the complexity of the Hebb equilibrium distribution for various choices of $\psi$ and $N$. As $\psi$ is reduced from unity to zero, the equilibrium distribution moves from bimodal, via uniform, to binomial (or Gaussian in the continuum limit). We focus on values of $\psi$ not far from $\psi = \frac{1}{2}$, for which the distribution is uniform, so as to capture this transition in the overall structure of the distribution, and also on $\psi = 0.1$ and $\psi = 0.9$ at the extremes of the range. For fixed $\psi$, increasing $N$ can create more oscillations in the distribution: the distribution respects the overall envelope for smaller $N$, but the maxima can split apart into multiple new maxima and minima as $N$ increases. For fixed $N$, the oscillations spread out from the bimodal peaks at $i=0$ and $i=N$ as $\psi$ is reduced, then flip over in the transition through $\psi = \frac{1}{2}$, and finally coalesce as $\psi$ is reduced to zero, where we expect the equilibrium distribution to become exactly binomial (or Gaussian). Even at $\psi = 0.1$, the equilibrium distribution is not far from Gaussian for larger values of $N$.

Fig. 13  Hebb equilibrium probability distribution for stochastic updater synapses. Each panel shows $e_i^{(N)}$ plotted against $i$ for a fixed choice of $\psi = fp$ as indicated, and for different choices of $N$, with both $e_i^{(N)}$ and $i$ scaled appropriately to permit comparison

SNR lifetimes are not sensitive to the full equilibrium structure of the Hebb protocol. When obtaining SNR lifetimes from simulations, we therefore need only ensure that the equilibrium distribution of synaptic strengths has the correct first- and second-order statistical structure. Defining $\underline{A}_\pm = \underline{A}_1 \pm \sqrt{\kappa_2}\,\underline{A}_1^\perp$, we find that

$\underline{A}_1 = \frac{1}{2}\big(\underline{A}_+ + \underline{A}_-\big)$,  (70a)
$\underline{A}_2 = \frac{1}{2}\big(\underline{A}_+ \otimes \underline{A}_+ + \underline{A}_- \otimes \underline{A}_-\big)$,  (70b)
$\underline{A}_3 = \frac{1}{2}\big(\underline{A}_+ \otimes \underline{A}_+ \otimes \underline{A}_+ + \underline{A}_- \otimes \underline{A}_- \otimes \underline{A}_-\big)$.  (70c)

It is not possible in general to write $\underline{A}_N$ for $N \ge 4$ as a similar sum over tensor products with identical factors corresponding to probability distributions. However, the approximation

$\underline{A}_N \approx \frac{1}{2}\big(\underbrace{\underline{A}_+ \otimes \cdots \otimes \underline{A}_+}_{N} + \underbrace{\underline{A}_- \otimes \cdots \otimes \underline{A}_-}_{N}\big)$  (71)

is exact for $N=1$, $N=2$ and $N=3$, and is guaranteed to have the correct marginal first-, second- and third-order statistical structure for $N \ge 4$. In simulations to obtain SNR lifetimes, it therefore suffices to prepare the equilibrium distribution according to Eq. (71) rather than the exact form in Eq. (56).
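In a simulation, Eq. (71) translates into a two-step sampling procedure: choose the mixture component $\underline{A}_+$ or $\underline{A}_-$ with probability $\frac{1}{2}$, then draw all $N$ strengths independently. A minimal sketch (ours; by the $\pm$ symmetry of the mixture, the sign convention attached to $\underline{A}_+$ is immaterial for the ensemble):

    import numpy as np

    def sample_strengths(psi, N, rng=None):
        """Draw N strengths in {-1, +1} from the two-component mixture of
        Eq. (71): each synapse is +1 with probability (1 +/- sqrt(kappa_2))/2."""
        rng = rng or np.random.default_rng()
        kappa2 = psi / (2 - psi)             # Eq. (59)
        s = rng.choice((-1.0, 1.0))          # mixture component A_+ or A_-
        q = (1 + s * np.sqrt(kappa2)) / 2    # P[strength = +1] in this component
        return np.where(rng.random(N) < q, 1, -1)

    # pairwise correlation check: E[S_i S_j] -> kappa_2 = 1/3 at psi = 1/2
    samples = np.array([sample_strengths(0.5, 2) for _ in range(200_000)])
    print((samples[:, 0] * samples[:, 1]).mean())   # ~ 0.333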

To compute $\tau_{\rm mfpt} = \big\langle \langle \tau_{\rm mfpt}(h_0|N_{\rm eff}) \rangle_{h_0 > \vartheta} \big\rangle_{N_{\rm eff}}$ using the MIE approach in Eq. (4), we require the distribution of $h_0$ conditioned on exactly $N_{\rm eff}$ components of $\underline{\xi}_0$ being $+1$. For $\zeta = 0$, for which we can obtain ${\rm Prob}[h'|h]$, the distribution of $h_0$ is equivalent to the distribution of the number of synapses of strength $+1$ after the storage of $\underline{\xi}_0$. Before the storage of $\underline{\xi}_0$, these $N_{\rm eff}$ synapses are in equilibrium, with probability $e_i^{(N_{\rm eff})}$ that $i$ of them have strength $+1$. During the storage of $\underline{\xi}_0$, the other $N_{\rm eff} - i$ may potentiate, each with probability $p$. The probability that $k$ of these $N_{\rm eff}$ synapses have strength $+1$ after the storage of $\underline{\xi}_0$ is therefore just $\sum_{i=0}^{k} e_i^{(N_{\rm eff})} \Phi_{k-i}^{N_{\rm eff}-i}(p)$. Because of the convolution structure of this sum, we can find the PGF for these probabilities in terms of $F_{N_{\rm eff}}(z)$, obtaining (cf. Eq. (44))

$F_{N_{\rm eff}}^{+}(z) = \big[(1-p) + pz\big]^{N_{\rm eff}}\, F_{N_{\rm eff}}\!\left(\dfrac{z}{(1-p)+pz}\right)$.  (72)

Using $F_{N_{\rm eff}}^{+}(z)$, we obtain the first two moments of $h_0$ as

$E[h_0|N_{\rm eff}] = \dfrac{N_{\rm eff}}{N}\,p$,  (73a)
$E[h_0^2|N_{\rm eff}] = \dfrac{N_{\rm eff}}{N^2} + \dfrac{N_{\rm eff}(N_{\rm eff}-1)}{N^2}\big[p^2 + \kappa_2(1-p)^2\big]$,  (73b)

and by averaging over $N_{\rm eff}$, we recover $\mu(0)$ and $\sigma(0)^2$ in Eq. (7) (using Eq. (19a) for the correlation function) for $\zeta = 0$ and with $t = 0\,$s. For smaller values of $N_{\rm eff}$, we require the full conditional distribution of $h_0$ encoded in $F_{N_{\rm eff}}^{+}(z)$, but for larger values, we can safely replace it by a Gaussian with the moments in Eq. (73).
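For reference, the two conditional moments in Eq. (73), and hence the parameters of the replacing Gaussian, are trivial to evaluate; a small helper (ours, with hypothetical names) is:

    def h0_moments(N, N_eff, p, psi):
        """Conditional mean and variance of h0 from Eqs. (73a, b)."""
        kappa2 = psi / (2 - psi)
        mean = (N_eff / N) * p
        second = N_eff / N**2 + (N_eff * (N_eff - 1) / N**2) * (
            p**2 + kappa2 * (1 - p) ** 2)
        return mean, second - mean**2   # (mean, variance) of the Gaussian

    mu, var = h0_moments(N=10_000, N_eff=500, p=0.1, psi=0.05)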

Author Contributions

Not relevant.

Funding

None.

Availability of data and material

Available upon request.

Declarations

Conflict of interest

None.

Code availability

Available upon request.

Ethics approval

Not applicable.

Consent to participate

Not applicable.

Consent for publication

Not applicable.

Footnotes

1. Strictly speaking, we should work at discrete memory storage steps and only move to continuous time at the final step of the calculation, but as we drop covariance terms, the difference is unimportant.

2. We stress that this condition is very different in character from the more usual condition that $\mu(0)/\sigma(0) > 1$, which is just the trivial requirement that the mean initial signal can be distinguished from $\mu(\infty) = 0$ at the one-standard-deviation level in models in which $\mu(t)$ falls monotonically.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  1. Andersen P, Morris RGM, Amaral D, Bliss TVP, O’Keefe J. The hippocampus book. Oxford: Oxford University Press; 2007.
  2. Amit D, Fusi S. Learning in neural networks with material synapses. Neural Comput. 1994;6:957–982. doi: 10.1162/neco.1994.6.5.957.
  3. Andrews G, Askey R, Roy R. Special functions. Cambridge: Cambridge University Press; 1999.
  4. Appleby P, Elliott T. Stable competitive dynamics emerge from multispike interactions in a stochastic model of spike-timing-dependent plasticity. Neural Comput. 2006;18:2414–2464. doi: 10.1162/neco.2006.18.10.2414.
  5. Bagal A, Kao J, Tang CM, Thompson S. Long-term potentiation of exogenous glutamate responses at single dendritic spines. Proc Natl Acad Sci USA. 2005;102:14434–14439. doi: 10.1073/pnas.0501956102.
  6. Barrett A, van Rossum M. Optimal learning rules for discrete synapses. PLoS Comput Biol. 2008;4:e1000230. doi: 10.1371/journal.pcbi.1000230.
  7. Bartol T, Bromer C, Kinney J, Chirillo M, Bourne J, Harris K, Sejnowski T. Nanoconnectomic upper bound on the variability of synaptic plasticity. Elife. 2015;4:e10778. doi: 10.7554/eLife.10778.
  8. Bienenstock E, Cooper L, Munro P. Theory for the development of neuron selectivity: orientation specificity and binocular interaction in visual cortex. J Neurosci. 1982;2:32–48. doi: 10.1523/JNEUROSCI.02-01-00032.1982.
  9. Bliss T, Lømo T. Long-lasting potentiation of synaptic transmission in the dentate area of the unanaesthetized rabbit following stimulation of the perforant path. J Physiol. 1973;232:331–356. doi: 10.1113/jphysiol.1973.sp010273.
  10. Burkitt A, Meffin H, Grayden D. Spike-timing-dependent plasticity: the relationship to rate-based learning for models with weight dynamics determined by a stable fixed point. Neural Comput. 2004;16:885–940. doi: 10.1162/089976604773135041.
  11. Csicsvari J, Hirase H, Mamiya A, Buzsaki G. Ensemble patterns of hippocampal CA3-CA1 neurons during sharp wave-associated population events. Neuron. 2000;28:585–594. doi: 10.1016/S0896-6273(00)00135-5.
  12. Eichenbaum H, Cohen NJ. From conditioning to conscious recollection. Oxford: Oxford University Press; 2001.
  13. Elliott T. Temporal dynamics of rate-based plasticity rules in a stochastic model of spike-timing-dependent plasticity. Neural Comput. 2008;20:2253–2307. doi: 10.1162/neco.2008.06-07-555.
  14. Elliott T. Memory nearly on a spring: a mean first passage time approach to memory lifetimes. Neural Comput. 2014;26:1873–1923. doi: 10.1162/NECO_a_00622.
  15. Elliott T. The enhanced rise and delayed fall of memory in a model of synaptic integration: extension to discrete state synapses. Neural Comput. 2016a;28:1927–1984.
  16. Elliott T. Variations on the theme of synaptic filtering: a comparison of integrate-and-express models of synaptic plasticity for memory lifetimes. Neural Comput. 2016b;28:2393–2460.
  17. Elliott T. First passage time memory lifetimes for simple, multistate synapses. Neural Comput. 2017a;29:3219–3259.
  18. Elliott T. Mean first passage memory lifetimes by reducing complex synapses to simple synapses. Neural Comput. 2017b;29:1468–1527.
  19. Elliott T. First passage time memory lifetimes for simple, multistate synapses: beyond the eigenvector requirement. Neural Comput. 2019;31:8–67. doi: 10.1162/neco_a_01147.
  20. Elliott T. First passage time memory lifetimes for multistate, filter-based synapses. Neural Comput. 2020;32:1069–1143. doi: 10.1162/neco_a_01283.
  21. Elliott T, Lagogiannis K. Taming fluctuations in a stochastic model of spike-timing-dependent plasticity. Neural Comput. 2009;21:3363–3407. doi: 10.1162/neco.2009.12-08-916.
  22. Elliott T, Lagogiannis K. The rise and fall of memory in a model of synaptic integration. Neural Comput. 2012;24:2604–2654. doi: 10.1162/NECO_a_00335.
  23. Fusi S, Drew P, Abbott L. Cascade models of synaptically stored memories. Neuron. 2005;45:599–611. doi: 10.1016/j.neuron.2005.02.001.
  24. Hopfield J. Neural networks and physical systems with emergent collective computational abilities. Proc Natl Acad Sci USA. 1982;79:2554–2558. doi: 10.1073/pnas.79.8.2554.
  25. Huang Y, Amit Y. Precise capacity analysis in binary networks with multiple coding level inputs. Neural Comput. 2010;22:660–688. doi: 10.1162/neco.2009.02-09-967.
  26. Lahiri S, Ganguli S. A memory frontier for complex synapses. In: Burges C, Bottou L, Welling M, Ghahramani Z, Weinberger K, editors. Advances in neural information processing systems. Cambridge: MIT Press; 2013. pp. 1034–1042.
  27. Leibold C, Kempter R. Memory capacity for sequences in a recurrent network with biological constraints. Neural Comput. 2006;18:904–941. doi: 10.1162/neco.2006.18.4.904.
  28. Leibold C, Kempter R. Sparseness constrains the prolongation of memory lifetime via synaptic metaplasticity. Cereb Cortex. 2008;18:67–77. doi: 10.1093/cercor/bhm037.
  29. Lynch G, Dunwiddie T, Gribkoff V. Heterosynaptic depression: a postsynaptic correlate of long-term potentiation. Nature. 1977;266:737–739. doi: 10.1038/266737a0.
  30. Montgomery J, Madison D. State-dependent heterogeneity in synaptic depression between pyramidal cell pairs. Neuron. 2002;33:765–777. doi: 10.1016/S0896-6273(02)00606-2.
  31. Montgomery J, Madison D. Discrete synaptic states define a major mechanism of synapse plasticity. Trends Neurosci. 2004;27:744–750. doi: 10.1016/j.tins.2004.10.006.
  32. Nadal J, Toulouse G, Changeux J, Dehaene S. Networks of formal neurons and memory palimpsests. Europhys Lett. 1986;1:535–542. doi: 10.1209/0295-5075/1/10/008.
  33. O’Connor D, Wittenberg G, Wang SH. Dissection of bidirectional synaptic plasticity into saturable unidirectional processes. J Neurophysiol. 2005a;94:1565–1573.
  34. O’Connor D, Wittenberg G, Wang SH. Graded bidirectional synaptic plasticity is composed of switch-like unitary events. Proc Natl Acad Sci USA. 2005b;102:9679–9684.
  35. Olshausen BA, Field DJ. Sparse coding of sensory inputs. Curr Opin Neurobiol. 2004;14:481–487. doi: 10.1016/j.conb.2004.07.007.
  36. Parisi G. A memory which forgets. J Phys A: Math Gen. 1986;19:L617–L620. doi: 10.1088/0305-4470/19/10/011.
  37. Petersen C, Malenka R, Nicoll R, Hopfield J. All-or-none potentiation at CA3–CA1 synapses. Proc Natl Acad Sci USA. 1998;95:4732–4737. doi: 10.1073/pnas.95.8.4732.
  38. Rao-Ruiz P, Visser E, Mitric M, Smit AB, van den Oever MC. A synaptic framework for the persistence of memory engrams. Front Synaptic Neurosci. 2021;13:661476. doi: 10.3389/fnsyn.2021.661476.
  39. Richards BA, Frankland PW. The persistence and transience of memory. Neuron. 2017;94:1071–1084. doi: 10.1016/j.neuron.2017.04.037.
  40. Rubin D, Fusi S. Long memory lifetimes require complex synapses and limited sparseness. Front Comput Neurosci. 2007;1:7. doi: 10.3389/neuro.01.1.1.001.2007.
  41. Sobczyk A, Svoboda K. Activity-dependent plasticity of the NMDA-receptor fractional Ca2+ current. Neuron. 2007;53:17–24. doi: 10.1016/j.neuron.2006.11.016.
  42. Tsodyks M. Associative memory in neural networks with binary synapses. Mod Phys Lett B. 1990;4:713–716. doi: 10.1142/S0217984990000891.
  43. Tsodyks M, Feigel’man M. The enhanced storage capacity in neural networks with low activity levels. Europhys Lett. 1988;6:101–105.
  44. Uhlenbeck G, Ornstein L. On the theory of Brownian motion. Phys Rev. 1930;36:823–841. doi: 10.1103/PhysRev.36.823.
  45. van Kampen N. Stochastic processes in physics and chemistry. Amsterdam: Elsevier; 1992.
  46. Yasuda R, Sabatini B, Svoboda K. Plasticity of calcium channels in dendritic spines. Nat Neurosci. 2003;6:948–955. doi: 10.1038/nn1112.


