PLOS Comput Biol. 2021 Dec 30;17(12):e1009691. doi: 10.1371/journal.pcbi.1009691

When shared concept cells support associations: Theory of overlapping memory engrams

Chiara Gastaldi 1,*, Tilo Schwalger 2, Emanuela De Falco 3, Rodrigo Quian Quiroga 4,5, Wulfram Gerstner 1
Editor: Abigail Morrison
PMCID: PMC8754331  PMID: 34968383

Abstract

Assemblies of neurons, called concept cells, encode acquired concepts in the human Medial Temporal Lobe. Those concept cells that are shared between two assemblies have been hypothesized to encode associations between concepts. Here we test this hypothesis in a computational model of attractor neural networks. We find that for concepts encoded in sparse neural assemblies there is a minimal fraction cmin of neurons shared between assemblies below which associations cannot be reliably implemented, and a maximal fraction cmax of shared neurons above which single concepts can no longer be retrieved. In the presence of a periodically modulated background signal, such as hippocampal oscillations, recall takes the form of association chains reminiscent of those postulated by theories of free recall of words. Predictions of an iterative overlap-generating model match experimental data on the number of concepts to which a neuron responds.

Author summary

Experimental evidence suggests that associations between concepts are encoded in the hippocampus by cells shared between neuronal assemblies (“overlap” of concepts). What is the necessary overlap that ensures a reliable encoding of associations? Under which conditions can associations induce a simultaneous or a chain-like activation of concepts? Our theoretical model shows that the ideal overlap reflects a tradeoff: the overlap should be larger than a minimum value in order to reliably encode associations, but lower than a maximum value to prevent the loss of individual memories. Our theory explains experimental data from the human Medial Temporal Lobe and provides a mechanism for chain-like recall in the presence of inhibition, while still allowing for simultaneous recall if inhibition is weak.

Introduction

Human memory exploits associations between concepts. If you visited a famous place with a friend, a postcard of that place will remind you of him or her. The episode “with my friend at this place” has given rise to an association between two existing concepts: before the trip (the episodic event), you already knew your friend (first concept) and had seen the place (second concept), but only after the trip, you associate these two concepts.

Concepts are encoded in the human Medial Temporal Lobe (MTL) by neurons, called “concept cells”, that respond selectively and invariantly to stimuli representing a specific person or a specific place [1–3]. Each concept is thought to be represented by an assembly of concept cells that increase their firing rates simultaneously upon presentation of an appropriate stimulus. The fraction γ of neurons in the human MTL which is involved in the representation of each concept is estimated to be γ ∼ 0.23% [4]. Under the assumption that each memory item is represented by the activation of a fixed, but random, subset of active neurons, a single concept is expected to activate γN neurons and two arbitrary concepts are expected to share γ²N cells, where N is the total number of neurons in the relevant brain areas.

Experimental studies have shown that single neurons can become responsive to new concepts while learning pairs of associations [5]. Moreover, it has been estimated that assemblies representing two arbitrary concepts share less than 1% of neurons, whereas assemblies representing previously associated concepts share about 4–5% of neurons [6], suggesting that an increased fraction of shared neurons supports the association between concepts [6–8].

With the presence of shared neurons, the activation of a first assembly (e.g., a place) may also activate a second assembly (e.g., a person). This poses several theoretical questions. First, for the brain to function correctly as a memory network, it must remain possible to recall the two associated concepts separately (e.g. place without your friend), and not automatically the two together. However, if the concepts share too many neurons it becomes likely that the two memory items can no longer be distinguished, but are merged into a single, broader concept encoded by a larger number of active neurons. We therefore ask as a first question: what is the maximally allowed fraction cmax of shared neurons between two assemblies before the possibility of separate memory recalls breaks down? Shared concept cells can be visualised as an overlap between two memory engrams. Below the maximal fraction cmax of shared neurons, each of the associated patterns can be recalled as a separate memory pattern, as schematically illustrated in Fig 1A.

Fig 1. Overlapping concepts can be retrieved separately and jointly.


A) Activation of concepts (schematic). Black filled circles = inactive neurons. Yellow filled circles = active neurons. Colored halos (red, green) represent assignment to a specific concept. When the fraction of shared neurons is small (top row, c < cmax) the two concepts can be recalled separately or together. If the number of shared concept cells is too large (bottom row, c > cmax), the recall of a first concept (red) leads inevitably to the activation of the second associated concept (green). B) Similarity measure. If only a subset of neurons belonging to the first memory engram is activated (top), the configuration exhibits similarities m1 < 1 and m2 = 0. If the first memory is fully recalled, while memory 2 is not (bottom), the similarity measures are m1 = 1 and m2 ≪ 1. C) Dynamics of the similarities for different fractions of shared neurons. The similarities m1 (green) and m2 (red) as a function of time in a full network simulation (solid lines) are compared to predictions of mean-field theory (dashed lines). Strong external stimulation I1 = 0.3 is given to the units belonging to concept μ = 1 during a first stimulation period and a weak external stimulation I2 = 0.1 is given to the units belonging to concept μ = 2 during the second stimulation period (in grey). If c > cmax, concept 2 gets activated without receiving any stimulation. D) Three phase-planes of the dynamics of the similarity variables m1 and m2 for different values of the fraction of shared neurons c. Arrows indicate direction and speed of increase or decrease of the similarity variables. Intersections of the blue and orange lines (the “nullclines” of the two variables m1, m2) indicate fixed points, with stability encoded by color (legend). E) Minimum amplitude of the external stimulation I2 needed to activate the memory of the second concept if the first one is activated (as a function of the fraction of shared neurons c). Parameters: ĥ0 = 0.25, b̂ = 100, rmax = 40 Hz, τ = 25 ms, α = 0, γ = 0.2%. For simulations: N = 10000, P = 2.

As an alternative to a static recall of one or the other concept (or the two associated concepts together), we could also ask whether the activation of a concept would facilitate the recall of an associated one, or even a temporal chain activation of associations (as described in free memory recall tasks [9–12]), due to overlaps in the representations. In this context, we ask a second question: if each concept is represented by a small fraction of active neurons γ, given the activation of a concept, is there a minimal fraction of shared neurons cmin necessary to enable a reliable activation of associated ones?

Moreover, while most experimental studies have dealt with pairwise associations between, say, one person and one place, more recent work has shown that a single neuron can respond to multiple concepts [6], e.g., several related places. In view of this, we ask a third question: how should memory be organized in a neural network such that k different memory engrams all have pairwise overlaps of equal size?

Associative memory in recurrent networks, such as the area CA3 of the hippocampus, has been modeled with attractor neural networks [13–17] where each memory item is encoded as a memory engram [18, 19] in a fixed random subset of neurons (called “pattern” in the theoretical literature [17]) such that no pattern has an overlap above chance with another one. Animal studies provide evidence of attractor dynamics in area CA3 [20, 21]. The few theoretical studies that considered overlapping memory engrams above chance level in the past [22, 23] focused on overlaps arising from a hierarchical organization of memories. Whereas such a hierarchical approach is suitable for modeling memory representation in the cortex, we are interested in modeling the MTL, and in particular area CA3 of the hippocampus, where experimentally no hierarchical or topographical organization has been observed [6]. In experiments, episodic associations between arbitrary different concepts (such as a person and a place)—and shared neurons in the corresponding assemblies—can be induced by joint presentation of images representing the different concepts [5]. Inspired by these experiments, we create pairwise associations between a number of concepts by artificially introducing shared concept cells in the model. We will talk about “overlapping engrams” if the number of shared concept cells is beyond the number γ²N of cells that are shared by chance.

Results

The first two questions introduced above can be summarized as a more general one: What is the role of those concept cells that are shared between stored memory engrams? To answer this question, we consider an attractor neural network of N neurons in which P engrams are stored in the form of binary random patterns [7]. The pattern ξ^μ = {ξ_i^μ ∈ {0, 1}; 1 ≤ i ≤ N} with pattern index μ ∈ {1, …, P} represents one of the stored memory engrams: a value ξ_i^μ = 1 indicates that neuron i is part of the stored memory engram and therefore belongs to the assembly of concept μ, while a value of ξ_i^μ = 0 indicates that it does not. A network that has stored P memory engrams is said to have a memory load of α = P/N.

Since concept cells in the human hippocampus form sparse neural assemblies with a sparseness parameter γ ∼ 0.23% [4], we focus on the case of sparse memory engrams. In other words, an arbitrary neuron i has a low probability γ = Prob(ξ_i^μ = 1) ≪ 1 to participate in the assembly of concept cells corresponding to memory engram μ.
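To make the pattern statistics concrete, the following minimal sketch (illustrative Python/NumPy, not the authors' code; all names are hypothetical) draws sparse binary patterns with sparseness γ and shows how few neurons two independent engrams share by chance.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_patterns(N, P, gamma):
    """Draw P independent sparse binary patterns over N neurons.

    Each entry xi[mu, i] equals 1 with probability gamma, so an engram
    activates on average gamma*N neurons and two independent engrams
    share on average gamma**2 * N neurons (chance level).
    """
    return (rng.random((P, N)) < gamma).astype(float)

# Example: N = 10000 neurons, P = 2 engrams, gamma = 0.2%
xi = make_patterns(N=10_000, P=2, gamma=0.002)
print(xi.sum(axis=1))          # roughly 20 active neurons per engram
print((xi[0] * xi[1]).sum())   # shared neurons; ~0.04 expected by chance
```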

The attractor neural network is implemented in a standard way [24, 25]. Each neuron, i = 1, ⋯, N, is modelled by a firing rate model [25]

\tau \frac{dr_i}{dt} = -r_i + \phi(h_i), \qquad (1)

where ri(t) is the firing rate of neuron i and ϕ(h) = rmax/{1 + exp[−b(hh0)]} is the sigmoidal transfer function, or frequency-current (f-I) curve, characterized by the firing threshold h0, the maximal steepness b, and the maximal firing rate rmax. The patterns ξμ are encoded in the synaptic weights wij via a Hopfield-Tsodyks connectivity for sparse patterns so that the average of synaptic weights across a large population of neurons vanishes [17].

In attractor neural network models, memory engrams μ induce stable values r*_{μ,i} of the neuronal firing rates during the retrieval of a stored concept. In mathematical terms, to each engram μ corresponds a fixed point r*_μ in such a way that the firing rate r*_{μ,i} of neuron i is high if ξ_i^μ = 1 and low if ξ_i^μ = 0. When the network state r(t) is initialized close enough to the stored memory μ, the attractor dynamics drives the network to the retrieval state r*_μ, characterized by persistent activity of all those neurons that belong to the assembly of concept μ.

The similarity between the momentary network state and a stored memory μ is defined as

m_\mu(t) = \frac{1}{N\gamma(1-\gamma)\, r_{\max}} \sum_{j=1}^{N} \left(\xi_j^\mu - \gamma\right) r_j(t). \qquad (2)

The similarity measures the correlation between the firing rates {r_j(t)}_{j=1,…,N} and the stored pattern ξ^μ such that, if memory concept μ is retrieved, then mμ ∼ 1 (schematics in Fig 1B), and, if no memory is recalled (resting state), then mμ ∼ 0 for all μ. The similarity of the network activity with a stored memory develops as a function of time. For example, computer simulations of a network of N = 10000 interacting neurons indicate that, if one of two engrams that share concept cells is stimulated for 120 ms, then the similarity of the network activity with this engram increases to a value close to one, indicating that the memory has been recalled (Fig 1C, middle), while the second memory is only weakly activated, as quantified by a small but non-zero similarity. However, if the fraction of shared neurons is above a maximally allowed fraction cmax, then the second memory always gets activated even before it is stimulated (Fig 1C, bottom), indicating that associations are so strong that the two concepts have been merged.
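The retrieval protocol described above can be summarized in a short simulation sketch (illustrative Python/NumPy, not the authors' code; the weight matrix w is assumed to be given, for instance by the Hopfield-Tsodyks rule of Eq (5) in Materials and methods, and the f-I parameters are placeholders):

```python
import numpy as np

def similarity(r, xi_mu, gamma, r_max):
    """Similarity m_mu between the network state r and pattern xi_mu, Eq (2)."""
    N = r.size
    return np.dot(xi_mu - gamma, r) / (N * gamma * (1.0 - gamma) * r_max)

def simulate(w, xi, I_ext, T, dt=1e-3, tau=0.025,
             r_max=40.0, b=1.0, h0=0.0, gamma=0.002):
    """Euler integration of the rate dynamics, Eq (1), with a sigmoidal f-I curve.

    w     : (N, N) recurrent weight matrix
    xi    : (P, N) stored patterns, used only to report similarities
    I_ext : function t -> external input vector of length N
    """
    N = w.shape[0]
    r = np.zeros(N)
    ms = []
    for step in range(int(T / dt)):
        t = step * dt
        h = w @ r + I_ext(t)                         # total input, cf. Eq (7)
        phi = r_max / (1.0 + np.exp(-b * (h - h0)))  # f-I curve
        r += dt / tau * (-r + phi)
        ms.append([similarity(r, xi_mu, gamma, r_max) for xi_mu in xi])
    return np.array(ms)   # similarities m_mu(t) over time, one column per pattern
```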

Maximal fraction of shared neurons between memory engrams

In order to better understand the network dynamics, we develop a mathematical theory that depends on the fraction of neurons c that are shared between two engrams. The total number n of shared neurons in a network of size N depends on c and the sparsity parameter γ via the relation n = γcN.
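As a worked example with the numbers used later for the comparison with data (γ = 0.2%, c = 4%, N = 100,000), the relation gives

n = \gamma\, c\, N = 0.002 \times 0.04 \times 100{,}000 = 8

shared neurons per pair of overlapping engrams.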

Let us imagine gradually increasing the fraction of shared neurons between the first two memory engrams. At the lowest end, c = γ, the patterns ξ1 and ξ2 are independent, and hence cell assemblies 1 and 2 share a small fraction of neurons corresponding to chance level. It is well known that, in this case, each memory engram generates a separate attractive fixed point of the network dynamics [17], implying that the two corresponding concepts can be retrieved separately. However, experimental data report that, for associated concepts, the fraction of shared neurons c ∼ 4–5% [6] is much larger than the chance level γ ∼ 0.23%. This observation suggests that the patterns ξ1 and ξ2 of two associated memory engrams have a fraction of shared neurons larger than chance level, c > γ. On the other hand, in the (trivial) limit case of a large fraction of shared neurons, c → 1, the two memory engrams and hence the two cell assemblies share all neurons, and it is clearly impossible to retrieve one memory without the other.

To study the maximal fraction of shared neurons cmax at which independent memory recall breaks down, we use a mean-field approach for large networks and work in the limit N → ∞. In this limit, it is possible to fully describe the network dynamics using the similarities mμ as the relevant macroscopic variables. Since we are interested in the retrieval process of concepts μ = 1 and 2, we assume the similarity of the present network state with other memories μ > 2 to be close to zero: we will refer to these non-activated memories as “background patterns”. Under these assumptions, we find dynamical mean-field equations that capture the network dynamics through the similarity variables m1 and m2.

\tau \frac{dm_1}{dt} = -m_1 + F_1(m_1, m_2) \qquad (3a)
\tau \frac{dm_2}{dt} = -m_2 + F_2(m_1, m_2) \qquad (3b)

where the explicit form of the functions F1 and F2 is given in Eq (44) of Materials and methods (or Eq (45) for the special case of small load α). Eq (3) represents a two-dimensional dynamical system which can be analyzed using phase-plane analysis. Fig 1D shows three phase-planes in the m1–m2 space, each for a different value of the fraction of shared neurons. The m1- or m2-nullclines solve dm1/dt = 0 or dm2/dt = 0 in Eqs (3a) and (3b), respectively. The intersections between the m1- and m2-nullclines are equilibrium solutions, or fixed points, of the mean-field dynamics and are color-coded according to their stability. For c = γ, we identify four stable fixed points: the resting state (m1, m2) = (0, 0); two single-retrieval states, (m1, m2) = (1, 0) and (m1, m2) = (0, 1), corresponding to the retrieval of concept μ = 1 and the retrieval of concept μ = 2, respectively; and, finally, a symmetric state which corresponds to the activation of both concepts simultaneously (m1 = m2 ≲ 1).

Once the maximally allowed value c = cmax is reached, the two single-retrieval states merge with their nearby saddle points and disappear. The numerical value of the maximal fraction of shared neurons is extracted following the procedure described in the paragraph “Extract numerically the maximal correlation” in the Materials and methods. For fractions of shared neurons c > cmax, only two stable fixed points are left: the resting state and the symmetric state in which the assemblies of both concepts are activated together. This symmetric state is the theoretical description of the regime that we qualitatively predicted above, where the activation of a first concept leads inevitably to the activation of the second, overlapping one (Fig 1C, bottom). The minimum external stimulation needed to activate the second concept depends on the fraction of shared neurons (Fig 1E). With our choice of parameters, no external stimulation is needed to recall the second memory if the fraction of shared neurons is c > cmax = 22%, since the two concepts have merged into a single one and are always recalled together.

In the limit of infinite steepness b → ∞, vanishing load α = 0 and vanishing sparseness γ → 0, the value cmax0 of the maximal fraction of shared neurons can be calculated analytically. Since this value provides an upper bound of the maximal fraction of shared neurons for arbitrary b, we have the inequality (Fig 2A)

c_{\max} \le c_{\max}^{0} \equiv \gamma + (1-\gamma)\,\frac{h_0}{A\, r_{\max}}, \qquad (4)

where A characterizes the overall strength of synaptic weights (see Eq (5) below). Further analysis (see Materials and methods) shows that the stationary states of the mean-field dynamics depend—apart from the parameters γ, C and α related to the patterns—only on two dimensionless parameters: the rescaled firing threshold ĥ0 = h0/(A rmax) and the rescaled steepness b̂ = b·(A rmax). We find that the maximal fraction of shared neurons cmax increases with b̂ and also with ĥ0 (Fig 2A).
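For the parameters of Fig 1 (ĥ0 = 0.25, γ = 0.2%), the bound of Eq (4) can be evaluated directly:

c_{\max}^{0} = \gamma + (1-\gamma)\,\hat{h}_0 = 0.002 + 0.998 \times 0.25 \approx 0.25,

consistent with the finite-steepness value cmax = 22% reported above.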

Fig 2. The maximal fraction cmax of shared neurons depends on the neuronal frequency-current curve but not on the memory load.


A) Maximal fraction (cmax, color code) as a function of the parameters b̂ = b A rmax (steepness) and ĥ0 = h0/(A rmax) (firing threshold) of the frequency-current curve. Contour lines are added for the indicated values of cmax. In the black area the resting state is the only stable solution. Vertical white dashed lines indicate the theoretical upper bound cmax^0 for different values of ĥ0. The green square indicates the parameter choice used in Figs 1 and 2B and 2C. The green star indicates the parameters extracted for the Macaque inferotemporal cortex [24]. B) Maximal fraction cmax of shared neurons as a function of the memory load α = P/N (left graph) without (solid grey line) or with overlaps in pairs of two of the P − 2 background patterns (dashed green line); and as a function of the number p of correlated patterns (histogram, right graph). C) As in Fig 1C, but with a large number of background patterns (α = 0.2). Network activity exhibits only small similarity with background patterns (diversely colored lines) but large similarity with the stimulated pattern μ = 1. Parameters (unless specified): γ = 0.2%, b̂ = 100, ĥ0 = 0.25, rmax = 40 Hz, τ = 25 ms; α = 0 in A-B. For simulations in C: N = 10000, p = 2, γ = 0.2%.

We proceed by studying how the maximal fraction of shared neurons varies as a function of the memory load α = P/N (Fig 2B). As the load increases, we observe that the maximal fraction of shared neurons decreases, but the change is modest. This weak dependence on the load is robust against two variations of the network where (i) self-interaction of neurons is excluded; or (ii) the P − 2 background patterns are also overlapping in pairs, e.g., pattern 3 is overlapping with pattern 4, 5 with 6, etc. For both modifications, the mean-field equations look slightly different (Materials and methods) but neither modification leads to a significant change of the maximal fraction of shared neurons cmax (Fig 2A). In a network that has stored a total of P memory engrams, the maximal fraction of shared neurons could potentially depend on the group size p of patterns that are all overlapping with each other. So far we have considered p = 2. We extended the mean-field approach to the case of three and four overlapping patterns (SI) by rewriting and adapting Eq (44) in Materials and Methods. Again we find that the maximal overlap is not significantly influenced by the group size p of overlapping patterns (Fig 2B). The group size p can be large provided that the total number of patterns P does not exceed the memory capacity of the network.

In summary, we found a maximal fraction cmax of shared neurons beyond which the retrieval of single concepts is no longer possible. The value of cmax depends on the frequency-current curve of the neurons.

What is the minimal fraction of shared concept cells to encode associations?

We find that a symmetric double-retrieval state exists where two concepts are recalled at the same time (Fig 1D, top), even if the fraction of shared concept cells is at chance level. This co-activation of two unrelated concepts could be an artifact of the model considered so far.

In order to check whether our findings in Figs 1 and 2 are generic, we added to the network the effect of inhibitory neurons by implementing a negative feedback proportional to the overall activity of the N neurons in the network. Inhibitory feedback of strength J0 > 0 causes competition during recall of memories. We find that for J0 = 0.5, each of the two concepts can be recalled individually, but simultaneous recall of both concepts is not possible if the fraction of shared concept cells is at chance level (Fig 3A). If we increase the fraction of shared concept cells above c = 5%, then individual as well as simultaneous recall of the two associated memories becomes possible (Fig 3B). The effect becomes even more pronounced at c = 20% (Fig 3C). If the fraction of shared neurons reaches a high value of c = 50%, then the separate retrieval of the two individual concepts is no longer possible, indicating that the two concepts have merged into a single one (Fig 3D). Thus, in the presence of inhibition of strength J0, we find that the fraction c of shared neurons must be in a range cmin(J0) < c < cmax(J0) to enable individual as well as joint recall of associated concepts. For J0 < 0.5, the minimal fraction cmin(J0) is at chance level, and for J0 = 0.5 it is cmin = 5%.

Fig 3. The existence of a symmetric double-retrieval state requires a fraction of shared neurons above chance level in the presence of global inhibition.


Four phase-planes showing the stable fixed points in the presence of global inhibition, for fractions of shared neurons A) c = γ, B) c = 5%, C) c = 20%, D) c = 50%. On the diagonal, the nullclines lie nearly on top of each other (dashed line). Parameters: ĥ0 = 0, b̂ = 500, α = 0, γ = 0.2%, J0 = 0.5.

Association chains

Neurons shared between memory engrams have been proposed to be the basis for the recall of a memorized list of words [9–12, 26]. In order to translate this idea to chains of associated concepts (Fig 4A), we follow earlier work [9–12, 26] and add two ingredients to the model of the previous subsection. First, the strength of the global inhibitory feedback is now periodically modulated by oscillations mimicking hippocampal oscillatory activity. The oscillations provide a clock signal that triggers transitions between overlapping concepts. Second, we add to each neuron i an adaptation current θi(t) in order to prevent the network state from immediately returning to the previous concept. With this extended model, the network state hops from one concept to the next (Fig 4B). Transitions are repeated, but after some time the network state returns to one of the already retrieved memories, leading to a cycle of patterns [9] (Fig 4B). In network simulations where concepts are represented by sparse memory engrams (γ = 0.2%), we allow a subgroup of p = 2, 4 or 16 memory engrams to share a fraction of neurons of c = 20%. Because the number of shared concept cells is identical between all pairs of concepts within the same subgroup, the order of the recalled concepts depends on the initial condition. If the subgroup of overlapping engrams is small (p = 2, 4), all memory items are retrieved, while for a large group of overlapping engrams (p = 16) the cycle closes once a subgroup of the overlapping memory engrams has been retrieved (Fig 4B). The number of concepts in the cycle depends on the time scale of adaptation: in Fig 4 we use τθ ≈ 2 T_J0, which determines a cycle of minimum three concepts.

Fig 4. Chain of associations requires shared concept cells.


A) Schematic of a chain of association cycling between two concepts. Assignment of cells to assemblies is indicated by the color of the halos. Filled black circles indicate inactive neurons and filled yellow circles indicate active neurons. The schematic corresponds to the top plot of panel B. B) Full network simulation for engrams overlapping above chance level (c = 20% > γ) with low sparsity (γ = 0.2%). Each line corresponds to the similarity mμ with one of the stored memory engrams as a function of time. A subgroup of p engrams is overlapping (top to bottom: p = 2, 4, 16). If the network state is initialized to retrieve one of the overlapping concepts, other concepts within the subgroup are retrieved later. C) Same as in B, but memory engrams are independent (c = γ) and only share cells by chance. By decreasing the mean activity γ, the retrieval dynamics of a chain of memories is disrupted. The match between mean-field theory and simulations is shown in S4 Fig. Parameters: N = 10000, P = 16, b̂, τθ = 1.125 s, T = 3.75 ms, T_J0 = 625 ms, τ = 25 ms, rmax = 40 Hz.

In previous studies [9–12, 26], each memory engram involved a large fraction (γ = 10%) of neurons so that transitions could rely on the number of units shared by chance. The overlaps between memory assemblies may vary due to finite-size effects. The concept that shares the largest overlap with the currently active one is activated next, until the association chain falls into a periodic cycle of patterns. However, given that the value of the sparsity in the MTL is much smaller (γ ∼ 0.23%), it is natural to ask whether the number of neurons shared by chance (c = γ) is sufficient to induce a sequence of memory retrievals. Our simulations indicate that this is not the case (Fig 4C). Thus, in a network storing assemblies with a realistic level of sparsity γ ∼ 0.2%, memory engrams with a fraction of shared neurons above chance level are necessary for the retrieval of chains of concepts.

To better understand the role of overlaps between engrams for the formation of association chains, we extend the mean-field dynamics to include the global feedback with periodic modulation J0(t). Since simulations indicate that overlaps are necessary, we want to estimate the minimal and maximal fractions of shared neurons required to enable association chains. Because, in our model, the periodic modulation of the global inhibition strength J0(t) is slow, we consider the mean-field dynamics and the corresponding phase portraits quasi-statically at the two extreme cases, where J0 is at its maximum and where J0 is at its minimum. For our parameter setting, when J0(t) is clamped at its minimum, the network possesses three stable states: the resting state and the two single-retrieval states (Fig 5B, left). For a successful association chain, we need concepts to be retrievable separately. The fraction of shared neurons, cmax, that makes the two single-retrieval states disappear therefore sets the upper bound of the useful range of c. This cmax is analogous to the cmax of the previous section, but evaluated in the presence of periodic inhibition.

Fig 5. Dependence of association chains on sparsity and neuronal parameters.


A) Dynamical mean-field solutions for m1 and m2 in the case of two correlated patterns. The grey dashed line shows the modulation of J0(t). B) Phase planes corresponding to the minimum (J0 = 0.7) and maximum (J0 = 1.2) value of inhibition in the case of two associated patterns. C) Minimal and maximal fraction of shared concept cells as a function of the sparsity γ and D) of the steepness b. E) Table with the values shown in C and D. Parameters (unless specified): γ = 0.2%, b̂ = 100, c = 20%, τθ = 1.125 s, T = 3.75 ms, T_J0 = 625 ms, θi = 0 for every i.

Next, we consider the situation when the global inhibition is clamped at its maximum and find the minimal fraction such that the system has, besides the resting state, a second fixed point for m1 = m2 > 0 where the assemblies of both the previous and next concept are simultaneously active at low firing rates. Since this state is necessary to enable the transition, we call it the transition state. If the transition state is present, the network could, once global inhibition decreases, either return from the transition state to the previous concept, or jump to the next one (Fig 5B, right side). However, in the presence of adaptation (which is not included in the phase plane picture of Fig 5), the transition to the next concept is systematically favored because neurons participating in the assembly of the earlier concept are fatigued. The existence of the transition state is a necessary condition for the formation of temporal association chains. Thus, the lower bound of the fraction of shared neurons cmin is the smallest overlap such that the transition state exists. Since in the mean-field limit, the transition state appears only for c > γ, a fraction of shared neurons above chance level is needed to allow the hopping between concepts. In Fig 5C–5E we show the dependence of the maximal and minimal fraction of shared concept cells upon the sparsity γ and the steepness b: in both cases the dependence is not strong, but sparser networks lead to a slightly smaller range of the admissible fraction c of shared neurons supporting association chains. Importantly, the minimal fraction of shared neurons necessary for association chains is significantly above the fraction of neurons that are shared by chance. We find that for a suitable choice of neuronal and network parameters, association chains are possible for realistic values of γ and c as measured in human MTL. This suggests that, in principle, associations could be implemented as sequences of transitions if the number of shared neurons is above cmin.

In conclusion, we have shown the need for overlaps between memory engrams—equivalent to a number of shared concept cells significantly above chance level – to explain free memory recall as a chain of associations in recurrent networks such as the human CA3 where each engram involves only a small fraction of neurons.

How does a network embed groups of overlapping memories?

In our discussion of shared concept cells, we have so far mainly focused on neurons that are shared between a single pair of memory engrams, such as one place and one person. However, humans are able to memorize many different persons and places, some memories forming subgroups of associated items, others not. In order to compare our network model with human data, we therefore need to encode several subgroups of two or more overlapping memory engrams in the same network of N neurons. Based on the results of the previous sections, we wondered whether we can explain the experimental distribution of the number of concepts a single neuron responds to. We find that imposing the fraction c of shared concept cells between pairs of concepts does not predict uniquely how many neurons are used if a given number of memory engrams is embedded in a network. Therefore, imposing c as a target number of shared concept cells while encoding multiple concepts is not sufficient to predict whether a given neuron responds to 3 or 5 different concepts. The question then is: to how many concepts does a single neuron respond if several groups of overlapping engrams have been embedded in the network?

To study this question we consider three different algorithms that all construct memory engrams of 200 neurons per memory with a pairwise overlap of 8 neurons in a network of 100,000 neurons, i.e., γ = 0.2% and c = 4% (Fig 6B). First, we consider an iterative overlap-generating model, a non-hierarchical model in which we impose a fixed target number of shared concept cells as the only condition. Second, we consider two hierarchical models, in which every subgroup of associated memory patterns is derived from a single “parent” pattern, which does not take part in the subgroup. In the hierarchical generative model, only neurons that belong to the parent pattern can contribute to the neural representation of the patterns in the subgroup. On the contrary, in the indicator neuron model, the parent pattern is composed of indicator neurons that have a fixed probability λind of appearing in each of the subgroup’s patterns. Non-indicator neurons can also take part in the representation of the subgroup’s patterns with a different probability Ω. In other words, in the hierarchical generative model, neurons that do not belong to the parent pattern are excluded from the representation of all of the associated patterns in a subgroup, while in the indicator neuron model, no neuron is excluded from contributing to the representation of the subgroup. When we embed subgroups of 16 engrams with identical numbers of pairwise shared neurons, the iterative overlap-generating model needs about 2400 neurons out of the 100,000 available neurons, whereas the two hierarchically organized algorithms need about 2400 or 3000 neurons, respectively. In order to understand which of the three algorithms explains the experimental data best, we quantify the predictions of the three algorithms under the assumption that not just one, but several subgroups of patterns are embedded in the same network, and compare the predictions with experimental data using a previously published dataset of human concept cells [6].
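The following sketch illustrates the spirit of the iterative overlap-generating construction (a simplified, hypothetical Python implementation, not the algorithm described in the SI): each new engram of a subgroup reuses a fixed number of neurons from every previously built engram and recruits the remaining neurons from the pool of unused ones.

```python
import numpy as np

rng = np.random.default_rng(1)

def iterative_overlap_subgroup(N, p, size, n_shared):
    """Build p engrams of `size` neurons in which every new engram reuses
    exactly `n_shared` neurons from each previously built engram.
    Simplified sketch: it ignores the rare case in which neurons reused
    from different earlier engrams coincide, which can slightly inflate
    some pairwise overlaps."""
    engrams = []
    for _ in range(p):
        chosen = set()
        for earlier in engrams:                      # reuse shared neurons
            free = list(set(earlier.tolist()) - chosen)
            chosen.update(rng.choice(free, size=n_shared, replace=False).tolist())
        used = set().union(*(e.tolist() for e in engrams)) if engrams else set()
        fresh = [i for i in range(N) if i not in used and i not in chosen]
        chosen.update(rng.choice(fresh, size=size - len(chosen), replace=False).tolist())
        engrams.append(np.array(sorted(chosen)))
    return engrams

# Numbers of Fig 6: gamma = 0.2% of N = 100,000 -> 200 neurons; c = 4% -> 8 shared neurons
subgroup = iterative_overlap_subgroup(N=100_000, p=4, size=200, n_shared=8)
print([len(np.intersect1d(subgroup[0], e)) for e in subgroup[1:]])  # approximately [8, 8, 8]
```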

Fig 6. A single neuron responds to several concepts.


A) Probability that a neuron responds to a given number of concepts: comparison between data and 3 different algorithms: the hierarchical generative model and the indicator neuron model, which both build overlapping engrams in a hierarchical way, and the iterative overlap-generating model, which is a non-hierarchical algorithm. Each algorithm was run 40 times to generate the mean and error bars (only upward bars are displayed, corresponding to one standard deviation). B) For each of the three algorithms, we generated three subgroups of patterns containing p = 16, p = 4, or p = 2 patterns, respectively, as well as an isolated pattern (p = 1). The table gives the expected total number of active neurons in each subgroup in a neural network of 100,000 neurons if patterns have sparsity γ = 0.2% and a pairwise fraction of shared neurons c = 4%.

The dataset contains the activity of 4066 neurons recorded from the human MTL during the presentation of several visual stimuli. We can extract the experimental probability that a single neuron responds to exactly k different concepts (Fig 6A, black stars). From the probability distribution, we observe the existence of neurons responding to a large number of concepts (10 or more), but also a sizable fraction of neurons that respond to 5 or 6 different concepts. We will refer to those neurons as multi-responsive neurons.

To describe the data, we take into account the size and number of subgroups used in the experimental stimulation paradigm (SI). We find that only the iterative overlap-generating model fits the data (Fig 6A), i.e., it is the only one that predicts the correct probability of multi-responsive neurons. Since the iterative overlap-generating model is not based on a hierarchical generation of patterns, this suggests that the MTL encodes large subgroups of memory engrams in a non-hierarchical way, in agreement with earlier work [6].

Robustness to heterogeneity

Because biological neural networks present different forms of heterogeneity, we have checked our model’s robustness to (i) the heterogeneity of frequency-current curves and (ii) dilution of the number of synaptic connections.

In the experimental data set, each neuron is characterized by a different baseline firing rate and a different maximal rate in response to the preferred stimulus. We therefore introduce in our model heterogeneous frequency-current curves, characterised by minimum and maximum firing rates (rmin)_i and (rmax)_i respectively, and renormalize the network dynamics appropriately (Materials and methods). Despite the heterogeneity, simulations indicate that memory recall with heterogeneity is nearly indistinguishable from that without (compare Fig 7A with Fig 2C, and Fig 7B with Fig 4B).

Fig 7. The model is robust to heterogeneity of frequency-current curves.


Full network simulations A) in the absence of adaptation (equivalent to Fig 2C) and B) in the presence of adaptation and periodic inhibition. C) The model is robust to dilution of the synaptic connections. Full network simulations equivalent to Fig 2C.

Secondly, we allow the weight matrix to be diluted. Whereas so far we have assumed an “all-to-all” connectivity, we now introduce the dilution coefficient d, which indicates the fraction of actual synaptic connections compared to the N² potential ones. Importantly, for sparsely connected networks, the theory still contains the parameter α for the memory load, except that α is redefined to α = P/M, where M is the mean number of connections per neuron (Materials and methods). Simulations in Fig 7C show that the model is robust for d = 0.8, i.e. after dropping 20% of all possible synaptic connections and an appropriate rescaling of the average connection strength. We have explored lower values of the connection probability d to approach a more biologically plausible regime [27]. However, increasing the dilution of the connections takes the full network simulations away from the mean-field regime in which the theory is valid. The problem could be overcome by proportionally increasing the network size N, but the computational cost grows with the square of the network size N. The optimization of the simulation of a very diluted attractor neural network is beyond the goals of the current work.
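A minimal sketch of the dilution step (one simple choice of rescaling; the text only states that the average connection strength is rescaled appropriately, so the factor 1/d below is an assumption):

```python
import numpy as np

def dilute_weights(w, d, seed=0):
    """Keep each of the N^2 potential synapses with probability d and rescale
    the surviving weights by 1/d so that the mean recurrent input is preserved
    (assumed rescaling, see lead-in)."""
    rng = np.random.default_rng(seed)
    mask = rng.random(w.shape) < d
    return np.where(mask, w / d, 0.0)

# Example corresponding to Fig 7C: drop 20% of all possible connections
# w_diluted = dilute_weights(w, d=0.8)
```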

Discussion

Our results bridge observations and theories from four different fields: first, experimental observations in the human MTL [1, 3, 6, 28, 29]; second, experimental observations of memory engrams [18, 19]; third, the theory of association chains used to explain free memory recall [9–12]; and fourth, the classic theory of attractor neural networks [13, 25]. Our main result is that, in networks where concepts are encoded by sparse assemblies, the number of shared concept cells must be above chance level but below a maximal number in order to enable a reliable encoding of associations. With the 4–5% overlap between memory assemblies reported in the human MTL [6], association chains are possible for a range of parameters of frequency-current curves. Our work extends the classical mean-field formalism [15] to memory engrams that exhibit pairwise overlap, both in a static and a chain-like retrieval setting.

While sparsity limits the number of concept cells shared by chance, Hebbian learning could induce sharing of concept cells between a small number of specific memory engrams [6]. The existence of a maximal fraction of shared neurons implies that Hebbian learning must work with an intrinsic control mechanism so as to avoid unwanted merging of separate concepts.

Our model allows us to make novel predictions for experiments. We see from Fig 1C and 1D that stimulating an associated concept is easier in the presence of shared concept cells than without. We extend this paradigm to several concepts and form the following prediction. Imagine having two sub-groups of overlapping memories, both involving the same person P0, in one case related to a person P1 and in another case to a person P2. How can we dissociate between the two memories (P0 and P1 vs P0 and P2)? Our model predicts that the dissociation is possible by introducing different contexts (e.g., different places). Say P0 and P1 are related to context C1 (Barcelona) and P0 and P2 are related to context C2 (Pisa); then whenever P0 is activated together with C1, P1 will also tend to be activated but not P2, and whenever P0 is activated in context C2, P2 will tend to be co-activated but not P1. In Fig 8 we illustrate the experimental setup with the following simulation: we store 5 concepts, two contexts C1 and C2 and three persons P0, P1, P2. The context could be a place C1 that has a rather strong overlap with both concepts of persons P0 and P1, but not P2. The place C2, instead, has overlap with P0 and P2 but not P1. During the simulation we give a weak ambiguous stimulation to P1 and P2. If we stimulate the concept of person P0, P0 has overlap with P1 and P2 and so far there is no bias either way. Later we activate context C1, and this disentangles the memories by favouring the recall of P0 together with P1. We emphasize that the activation of C1 provides a bias towards activating the concepts of persons P0 and P1, but this bias is not strong enough on its own to recall P0 or P1. Even the co-activation of C1 and P0 is not enough to automatically activate P1, in the context of the static model without adaptation or inhibition. In our framework the activation of one context favors the recall of concepts associated with it, and it can be qualitatively compared to the neuron-specific gating model proposed in [30], where the activation of one context defines a subset of available neurons.

Fig 8. Simulation procedure that predicts that the context disentangles memories.


A) Schematic of the overlaps between the five stored concepts: three persons (P0, blue; P1, orange; P2, green) and two contexts (C1, red; C2, violet). B) During the first stimulation period (indicated by the shaded grey area) the concept of Person 0 (P0) is strongly stimulated and during the second grey period the concept of Context 1 (C1) is strongly stimulated, leading to the activation of P1. Person 1 (P1) and Person 2 (P2) are always weakly stimulated. C) Same as B, but in the second stimulation period we strongly stimulate concept C2 instead of C1, leading to the activation of P2. Parameters: intensity of the weak stimulation = 0.02, intensity of the strong stimulation = rmax, N = 20000, P = 5, b̂ = 100, τ = 25 ms, rmax = 40 Hz, correlation within each of the two sub-groups P0-P1-C1 and P0-P2-C2, C = 0.1.

Association chains could form the basis of a “stream of thought” where the direction of transitions from one concept to the next is based on learned associations. Our oscillatory network dynamics is inspired by the model of Romani, Tsodyks and collaborators [9–12]. Even though in the Romani-Tsodyks model memory engrams are independent, finite-size effects make some pairs of engrams share neurons above chance level, which enables sequential recall in the presence of a periodic background input. We find that in large networks with a sparse coding level (γ ≈ 0.23%), neurons shared by chance are not enough to reliably induce the retrieval of a chain of concepts. Sequential memory retrieval is possible only for overlaps larger than chance, potentially representing associations learned during real-life episodes. Instead of transitions triggered by oscillations, transitions could also be triggered by two adaptation mechanisms that act on different time scales, without the need for periodic inhibition [31–33].

Attractor networks with sparse patterns [17] and random connectivity [34] are suitable candidate models for biological memory because they present two features: (i) memory retrieval after stimulation with a partial cue and (ii) sustained activity after a stimulus has been removed. One of the points of critique of attractor networks, traditionally analyzed with the replica [35] or cavity [36, 37] method, has been the unrealistic assumption of symmetric connections. However, the derivation used here, based on dynamical systems arguments [38], can easily be generalized to the case of asymmetric connectivity.

We discuss the possibility of allowing all patterns to share the same amount of correlation at the very end of the section “Overlapping background patterns” in the Materials and methods. In this case, we show that the standard deviation of the background noise is proportional to P²/N. If we make the standard mean-field assumption that both P and N tend to infinity with constant ratio P/N = α, then the quenched noise due to the presence of background patterns would diverge. However, we can define the memory load α′ = P²/N and assume that P and N tend to infinity keeping α′ constant. In this scenario, the network capacity is drastically reduced. Alternative approaches to take into account the fact that some neurons are more easily recruited are (1) to assign heterogeneous gain functions, thereby making some neurons more excitable, or (2) to consider those neurons to be the “multi-responsive neurons” that we describe in the section “How does a network embed groups of overlapping memories?”.

The maximal number of patterns that can be stored in an attractor neural network has attracted a lot of research [15, 17, 39]. However, does the hippocampus actually operate in the regime of high memory load? Even though we do not believe that the hippocampus stores words, we may estimate a rough upper bound for the load α = P/M in the area CA3 of the hippocampus from the number of words a native English speaker knows (which is about P = 30’000 according to The Economist, Lexical Facts) and the number of input connections per neuron (which is about M = 30’000 [40]). Hence we estimate an upper bound of α of about 1 if concepts are stored in area CA3—and our theory captures such a high load.

The maximal fraction of neurons which two concepts can share before they effectively merge into a single concept mainly depends on two dimensionless parameters: the rescaled threshold ĥ0 = h0/(A rmax) and the rescaled steepness b̂ = A rmax b. Since these parameters have so far not been estimated for the human CA3 area of the hippocampus or for the MTL in general, we checked the parameters of the frequency-current curve of Macaque inferotemporal cortex [24], for which we find cmax = 34%.

Finally, by comparing the experimentally measured number of concepts a neuron responds to with the model predictions, we find that the iterative overlap-generating model can predict the number of multi-responsive neurons quite accurately. The algorithm of how to build overlapping engrams plays a key role in fitting the experimental data and confirms the idea that memory engrams in the hippocampus are not hierarchically organised.

Materials and methods

We consider an attractor neural network of N rate units with firing rates r_i, i = 1, …, N, in which P memory engrams are stored. Each engram μ, 1 ≤ μ ≤ P, is given by a binary random pattern ξ^μ = [ξ_1^μ, …, ξ_N^μ]^T, where ξ_i^μ ∈ {0, 1} are Bernoulli random variables with mean ⟨ξ_i^μ⟩ = γ. Here and in the following, 〈.〉 indicates the expectation over the random numbers ξ_i^μ that make up the patterns. Each neuron follows the rate dynamics of Eq (1), where the synaptic weight from neuron j to neuron i is defined as [17, 24]

w_{ij} = \frac{A}{N\gamma(1-\gamma)} \sum_{\mu=1}^{P} \left(\xi_j^\mu - \gamma\right)\left(\xi_i^\mu - \gamma\right). \qquad (5)

Here, the constant A can be interpreted as the global scale of the “connection strength”. For independent patterns, the synaptic weight w_ij has mean zero, ⟨w_ij⟩ = 0, and variance ⟨w_ij²⟩ = A²P/N².
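In code, the weight matrix of Eq (5) can be built directly from the pattern matrix (illustrative Python/NumPy sketch, not the authors' implementation):

```python
import numpy as np

def hopfield_tsodyks_weights(xi, gamma, A=1.0):
    """Synaptic weights of Eq (5) for a pattern matrix xi of shape (P, N):
    w_ij = A / (N * gamma * (1 - gamma)) * sum_mu (xi_j^mu - gamma)(xi_i^mu - gamma)."""
    P, N = xi.shape
    d = xi - gamma                                   # deviations from the mean activity
    return (A / (N * gamma * (1.0 - gamma))) * (d.T @ d)

# For independent patterns the mean weight is ~0 and the variance ~ A^2 * P / N^2
rng = np.random.default_rng(2)
xi = (rng.random((50, 2000)) < 0.002).astype(float)
w = hopfield_tsodyks_weights(xi, gamma=0.002)
print(w.mean(), w.var(), 50 / 2000**2)
```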

Model without adaptation and global feedback

In the results in Figs 1 and 2, each neuron follows the Wilson-Cowan dynamics [41]

\tau \frac{dr_i}{dt} = -r_i + \phi(h_i), \qquad (6)

where the total input driving neuron i is

h_i(t) = \sum_{j=1}^{N} w_{ij}\, r_j(t) + I_i(t) = A\, r_{\max} \sum_{\mu=1}^{P} \left(\xi_i^\mu - \gamma\right) m_\mu(t) + I_i(t). \qquad (7)

Here, Ii is the external input to neuron i and

m_\mu(t) = \frac{1}{N\gamma(1-\gamma)\, r_{\max}} \sum_{j=1}^{N} \left(\xi_j^\mu - \gamma\right) r_j(t) \qquad (8)

is the similarity measure (also called “overlap” in the attractor network literature). It measures the similarity (correlation) of the current network state with pattern μ; cf. Eq (2). In Figs 1C (during the first stimulation period), 2C and 6, the external input I_i = I_ext ξ_i^1 is positive during stimulation for all neurons that belong to the assembly of pattern μ = 1, and zero for all other neurons.

In Eq (6), the input is passed through the transfer function ϕ (also called f-I curve in the Results section), which is chosen to be a sigmoid:

\phi(h) = \frac{r_{\max}}{1 + e^{-b(h - h_0)}}. \qquad (9)

The parameters that define the transfer function can be interpreted as follows: rmax is the maximal firing rate, b is the steepness of the transfer function and h0 is the bias which is commonly interpreted as firing threshold. While h0 is a hard threshold for b → ∞, at finite b the model exhibits a soft threshold allowing firing activity even below h0.

Model with adaptation and global inhibitory feedback

For Fig 4 of the Results, we added adaptation and a global inhibitory feedback to the model as described in previous studies [9–12] (see also [42] for a similar rate model with adaptation). Specifically, we add two negative feedback terms to the input potential:

h_i(t) = \sum_{j=1}^{N} w_{ij}\, r_j(t) - \theta_i(t) - \frac{J_0(t)}{\gamma N} \sum_{j=1}^{N} r_j(t) + I_i. \qquad (10)

First, the variable θi(t) models neuron-specific firing-rate adaptation via the first-order kinetics

\tau_\theta \frac{d\theta_i}{dt} = -\theta_i + D_\theta\, r_i. \qquad (11)

Here, τθ is the adaptation time constant and Dθ determines the strength of adaptation. Note that this adaptation model with a hyperpolarizing feedback current is equivalent to a model in which adaptation is implemented as an increase in the threshold h0 + θi(t).

Second, the global inhibitory feedback term proportional to J0(t) (third term in Eq (10)) provides a clock signal that triggers transitions between attractors. The strength of the global feedback, J0(t), is modulated periodically in time:

J_0(t) = \frac{1}{2}\left(J_{\max} - J_{\min}\right) \sin\!\left(\frac{2\pi}{T_{J_0}}\, t - \frac{\pi}{2}\right) + \frac{1}{2}\left(J_{\max} + J_{\min}\right) \qquad (12)

Importantly, inhibition proportional to the summed activity of the network units penalizes network configurations with many active neurons and therefore reduces the stability of the double-retrieval state where two memories are recalled together. Here, the strength J0(t) of the global feedback is modulated periodically between the values 0.7 and 1.2 with a sinusoidal time course of period T_J0 that sets the time scale of transitions between memories. Note that the model without adaptation and global feedback is a special case of the full model obtained by setting Dθ = 0 and J0(t) ≡ 0. For the results of Fig 3, J0 is a constant parameter and Dθ = 0.
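The following sketch summarizes one integration step of the model with adaptation and periodically modulated global inhibition, Eqs (10)-(12) (illustrative Python, not the authors' code; J_min = 0.7, J_max = 1.2, T_J0 = 625 ms, τ = 25 ms and τθ = 1.125 s are the values quoted in the text and figure captions, while D_theta and the f-I parameters are placeholders):

```python
import numpy as np

def J0_of_t(t, J_min=0.7, J_max=1.2, T_J0=0.625):
    """Periodically modulated strength of the global inhibition, Eq (12)."""
    return 0.5 * (J_max - J_min) * np.sin(2.0 * np.pi * t / T_J0 - np.pi / 2.0) \
        + 0.5 * (J_max + J_min)

def step(r, theta, w, I_ext, t, dt, tau=0.025, tau_theta=1.125,
         D_theta=1.0, gamma=0.002, r_max=40.0, b=1.0, h0=0.0):
    """One Euler step of the rate model with adaptation and global feedback.

    Input potential as in Eq (10); adaptation variable as in Eq (11)."""
    N = r.size
    h = w @ r - theta - J0_of_t(t) / (gamma * N) * r.sum() + I_ext
    phi = r_max / (1.0 + np.exp(-b * (h - h0)))
    r_new = r + dt / tau * (-r + phi)
    theta_new = theta + dt / tau_theta * (-theta + D_theta * r)
    return r_new, theta_new
```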

Non-dimensionalization of the model

The calculations below are considerably simplified if the model is nondimensionalized. We take into account that rmax has units of 1/time and the parameter A has units of current ⋅ time, and in the following we measure time in units of rmax⁻¹ and input current in units of A rmax.

Model without adaptation and global feedback

Using the dimensionless quantities

\hat{h}_i = \frac{h_i}{A\, r_{\max}}, \quad \hat{h}_0 = \frac{h_0}{A\, r_{\max}}, \quad \hat{b} = b\, A\, r_{\max}, \quad \hat{r}_i = \frac{r_i}{r_{\max}}, \qquad (13)
\hat{w}_{ij} = \frac{w_{ij}}{A}, \quad \hat{\tau} = \tau\, r_{\max}, \quad \hat{I}_i(t) = \frac{I_i(t)}{A\, r_{\max}}, \qquad (14)

the nondimensionalized model without adaptation reads

\hat{\tau} \frac{d\hat{r}_i}{d\hat{t}} = -\hat{r}_i + \hat{\phi}(\hat{h}_i), \quad \text{with} \quad \hat{h}_i = \sum_{j=1}^{N} \hat{w}_{ij}\, \hat{r}_j + \hat{I}_i \qquad (15)

with the transfer function ϕ̂(ĥ) = 1/{1 + exp[−b̂(ĥ − ĥ0)]}.

Model with adaptation and global feedback

Introduction of further dimensionless quantities

\hat{\tau}_\theta = \tau_\theta\, r_{\max}, \quad \hat{\theta} = \frac{\theta}{A\, r_{\max}}, \quad \hat{D}_\theta = \frac{D_\theta}{A}, \quad \hat{J}_0(\hat{t}) = \frac{J_0(\hat{t}/r_{\max})}{A} \qquad (16)

leads to the nondimensionalized model with adaptation

\hat{\tau} \frac{d\hat{r}_i}{d\hat{t}} = -\hat{r}_i + \hat{\phi}(\hat{h}_i), \qquad (17)
\hat{\tau}_\theta \frac{d\hat{\theta}_i}{d\hat{t}} = -\hat{\theta}_i + \hat{D}_\theta\, \hat{r}_i \qquad (18)

with input

\hat{h}_i = \sum_{j=1}^{N} \hat{w}_{ij}\, \hat{r}_j - \hat{\theta}_i - \frac{\hat{J}_0(t)}{\gamma N} \sum_{j=1}^{N} \hat{r}_j(t) + \hat{I}_i. \qquad (19)

Review of attractor theory

Starting from the overlap definition Eq (2), we can write equations for the overlap variables. We first focus on the model without adaptation and global feedback. For this case, we follow an approach that is well known in the literature [17, 38]. Taking the temporal derivative of the similarity mμ yields

\hat{\tau} \frac{dm_\mu}{d\hat{t}} = \frac{1}{N\gamma(1-\gamma)} \sum_{j=1}^{N} \left(\xi_j^\mu - \gamma\right) \hat{\tau} \frac{d\hat{r}_j}{d\hat{t}}. \qquad (20)

By inserting the expression for the single neuron dynamics Eq (15) and recognizing the overlap definition Eq (2), we obtain:

\hat{\tau} \frac{dm_\mu}{d\hat{t}} = -m_\mu + F_\mu(m_1, \ldots, m_P), \qquad (21)

with

F_\mu(m_1, \ldots, m_P) = \frac{1}{N\gamma(1-\gamma)} \sum_{j=1}^{N} \left(\xi_j^\mu - \gamma\right) \hat{\phi}(\hat{h}_j). \qquad (22)

In this equation, the dependence on the overlaps m1, …, mP is contained in the input term ĥ_j. From Eq (15) and by using the definition of the weights w_ij, Eq (5), we have

\hat{h}_i = \sum_{\mu=1}^{P} \left(\xi_i^\mu - \gamma\right) m_\mu + \hat{I}_i. \qquad (23)

In what follows, we are interested in finding equilibrium solutions of Eq (21), for which mμ = Fμ(m1, …, mP). Because we are interested in pattern retrieval, we consider, without loss of generality, the retrieval of pattern 1. To this end, we assume that among all mμ, only m1 is significantly larger than zero. This network state could be the result of a stimulation in the direction of pattern 1: Î_i(t) = Î(t) ξ_i^1. Under this assumption we can rewrite the input term ĥ_i, isolating the contribution from m1:

\hat{h}_i = \left(\xi_i^1 - \gamma\right) m_1 + \sum_{\mu=2}^{P} \left(\xi_i^\mu - \gamma\right) m_\mu + \hat{I}(t)\, \xi_i^1. \qquad (24)

We call the patterns that are not recalled “background patterns”; in the present case, these are all patterns with μ ≥ 2. The second term on the r.h.s. of Eq (24) represents the contribution from the background patterns, causing some degree of heterogeneity of the input potential for neurons with the same selectivity to pattern 1. For large P, this heterogeneity can be captured by replacing the term Σ_{μ=2}^{P} (ξ_i^μ − γ) m_μ by a Gaussian random variable with mean zero and variance

\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} \sum_{\mu=2}^{P} \sum_{\nu=2}^{P} \left(\xi_i^\mu - \gamma\right)\left(\xi_i^\nu - \gamma\right) m_\mu m_\nu = \gamma(1-\gamma) \sum_{\nu=2}^{P} (m_\nu)^2. \qquad (25)

To obtain the result in Eq (25), we used the assumption that the patterns ξ_i^μ and ξ_i^ν are uncorrelated, and the fact that only the term with μ = ν survives because ⟨ξ_i^μ ξ_i^ν + γ² − γξ_i^ν − γξ_i^μ⟩_i = δ_{μν} γ(1−γ). Here and in the following, the bracket ⟨x_i⟩_i of a variable x_i denotes the population average ⟨x_i⟩_i = (1/N) Σ_{i=1}^{N} x_i. In the next passages, we compute (m_μ)², μ ≠ 1, in the large network limit N → ∞. For μ = 2, …, P, we expand Eq (22) around m_μ = 0 up to first order in m_μ:

F_\mu(m_1, \ldots, m_P) \approx \frac{1}{\gamma(1-\gamma)N} \sum_{j=1}^{N} \left[ \left(\xi_j^\mu - \gamma\right) \hat{\phi}(\hat{h}_j)\Big|_{m_\mu=0} + \left(\xi_j^\mu - \gamma\right)^2 \hat{\phi}'(\hat{h}_j)\Big|_{m_\mu=0}\, m_\mu \right]. \qquad (26)

At equilibrium, mμ = Fμ(m1, …, mP), and we thus have

m_\mu \left(1 - \frac{\sum_{j=1}^{N} \left(\xi_j^\mu - \gamma\right)^2 \hat{\phi}'(\hat{h}_j)}{\gamma(1-\gamma)N}\right) = \frac{1}{\gamma(1-\gamma)N} \sum_{j=1}^{N} \left(\xi_j^\mu - \gamma\right) \hat{\phi}(\hat{h}_j), \qquad \mu \ge 2. \qquad (27)

On the left-hand side of the last expression, we can make a simplification, utilizing the fact that ξ_j^μ is uncorrelated with ϕ̂′(ĥ_j) in the N → ∞ limit:

\lim_{N\to\infty} \frac{1}{N} \sum_{j=1}^{N} \frac{\left(\xi_j^\mu - \gamma\right)^2 \hat{\phi}'(\hat{h}_j)}{\gamma(1-\gamma)} = \lim_{N\to\infty} \frac{1}{N} \sum_{j=1}^{N} \hat{\phi}'(\hat{h}_j) = \left\langle \hat{\phi}'(\hat{h}_i) \right\rangle_i. \qquad (28)

We can therefore define the quantity q ≡ ⟨ϕ̂′(ĥ_i)⟩_i as the expectation of ϕ̂′(ĥ_i) over neurons. As a consequence, mμ can be written as

m_\mu = \frac{1}{\gamma(1-\gamma)(1-q)N} \sum_{j=1}^{N} \left(\xi_j^\mu - \gamma\right) \hat{\phi}(\hat{h}_j). \qquad (29)

Using this equation, we can finally compute the square of mν for ν ≥ 2:

(m_\nu)^2 = \frac{1}{\gamma^2(1-\gamma)^2(1-q)^2 N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} \left(\xi_i^\nu - \gamma\right)\left(\xi_j^\nu - \gamma\right) \hat{\phi}(\hat{h}_i)\, \hat{\phi}(\hat{h}_j), \qquad (30)
= \frac{1}{\gamma^2(1-\gamma)^2(1-q)^2 N^2} \sum_{i=1}^{N} \left(\xi_i^\nu - \gamma\right)^2 \left[\hat{\phi}(\hat{h}_i)\right]^2, \qquad (31)
= \frac{p}{\gamma(1-\gamma)(1-q)^2 N}. \qquad (32)

where p ≡ ⟨ϕ̂²(ĥ_i)⟩_i. Similarly to Eq (25), we used that in the double sum Σ_{i=1}^{N} Σ_{j=1}^{N}, only the terms with i = j survive:

\frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} \left(\xi_i^\nu - \gamma\right)\left(\xi_j^\nu - \gamma\right) \hat{\phi}(\hat{h}_i)\, \hat{\phi}(\hat{h}_j) = \frac{1}{N^2} \sum_{i=1}^{N} \left(\xi_i^\nu - \gamma\right)^2 \left[\hat{\phi}(\hat{h}_i)\right]^2 + \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j\neq i}^{N} \left(\xi_i^\nu \xi_j^\nu - \gamma\xi_i^\nu - \gamma\xi_j^\nu + \gamma^2\right) \hat{\phi}(\hat{h}_i)\, \hat{\phi}(\hat{h}_j). \qquad (33)

The population average in the last term factorizes owing to the independence of ξ_i^ν and ĥ_i in the limit N → ∞, and thus vanishes:

\frac{1}{N^2} \sum_{i=1}^{N} \sum_{j\neq i}^{N} \left(\xi_i^\nu \xi_j^\nu - \gamma\xi_i^\nu - \gamma\xi_j^\nu + \gamma^2\right) \hat{\phi}(\hat{h}_i)\, \hat{\phi}(\hat{h}_j) = \left[\frac{1}{N^2} \sum_{i=1}^{N} \sum_{j\neq i}^{N} \left(\xi_i^\nu \xi_j^\nu - \gamma\xi_i^\nu - \gamma\xi_j^\nu + \gamma^2\right)\right] \cdot \left[\frac{1}{N^2} \sum_{i=1}^{N} \sum_{j\neq i}^{N} \hat{\phi}(\hat{h}_i)\, \hat{\phi}(\hat{h}_j)\right] = 0 \qquad (34)

Here, we have used that the first factor is a vanishing population average: γ² − 2γ² + γ² = 0. The standard deviation of the neuron-to-neuron variability (heterogeneity), Eq (25), is thus

\sigma = \sqrt{\alpha R}, \qquad R \equiv \frac{p}{(1-q)^2}. \qquad (35)

As a result, the input potentials, Eq (24), can be expressed at equilibrium as

\hat{h}_i = \left(\xi_i^1 - \gamma\right) m_1 + \sqrt{\alpha R}\, Z_i + \hat{I}\, \xi_i^1, \qquad (36)

where Z_i ∼ N(0, 1) are Gaussian random variables. Therefore, we find from Eq (22) that the overlap m1 at equilibrium satisfies

m_1 = F_1(m_1) \equiv \frac{1}{\gamma(1-\gamma)} \left\langle \left(\xi_i^1 - \gamma\right) \hat{\phi}(\hat{h}_i) \right\rangle_i, \qquad (37)

where ĥ_i is given by Eq (36). The population averages ⟨⋅⟩_i can be treated as expectations over the independent random variables ξ_i^1 and Z_i. On the one hand, ξ_i^1 is a Bernoulli variable such that ξ_i^1 = 1 with probability P1 = γ and ξ_i^1 = 0 with probability P0 = 1 − γ. On the other hand, Z_i is a standard normal random variable with probability density p_Z(z) = exp(−z²/2)/√(2π). We can therefore rewrite the population average in Eq (37) explicitly, resulting in

F_1(m_1) = \frac{1}{\gamma(1-\gamma)} \sum_{k=0,1} P_k\, (k-\gamma) \int \hat{\phi}\!\left(\hat{h}_k(m_1, z)\right) \frac{e^{-z^2/2}\, dz}{\sqrt{2\pi}}, \qquad (38a)

where we defined

\hat{h}_k(m, z) = (k-\gamma)\, m + \sqrt{\alpha R(m)}\, z + I_k, \qquad k \in \{0, 1\}, \qquad (38b)
R(m) = \frac{p(m)}{\left[1 - q(m)\right]^2}, \qquad (38c)
q(m) = \sum_{k=0,1} P_k \int \hat{\phi}'\!\left(\hat{h}_k(m, z)\right) \frac{e^{-z^2/2}\, dz}{\sqrt{2\pi}}, \qquad (38d)
p(m) = \sum_{k=0,1} P_k \int \hat{\phi}^2\!\left(\hat{h}_k(m, z)\right) \frac{e^{-z^2/2}\, dz}{\sqrt{2\pi}}. \qquad (38e)

Dynamical mean-field equations

Approximating the function F1(m1, …, mP) in the dynamical Eq (21) by the simplified function F1(m1) derived in the previous section, the retrieval of pattern 1 can be described by the closed dynamical mean-field equation

\hat{\tau} \frac{dm_1}{d\hat{t}} = -m_1 + F_1(m_1). \qquad (39)

For small network load, α ≪ 1, the effect of background patterns in Eqs (36) and (38b) can be neglected. In this case, we can set mν = 0 for ν ≥ 2 and it is straightforward to calculate F1(m1). The result is Eq (38) with αR = 0:

$F^{1}(m^{1})=\frac{1}{\gamma(1-\gamma)}\sum_{k=0,1}P_k\,(k-\gamma)\,\hat\phi\big((k-\gamma)\,m^{1}+I_k\big).$ (40)

Note that the mean-field dynamics Eqs (39) and (40) in the small-load limit (α = P/N → 0 as N → ∞) is exact.
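As an illustration of Eqs (39) and (40), the following minimal sketch (not the repository code) integrates the small-load mean-field dynamics with the Euler method; the dimensionless sigmoidal transfer function and the parameter values (γ, b̂, ĥ0, stimulus amplitude and window) are assumptions taken from the figure captions.

```python
import numpy as np

# Minimal sketch: Euler integration of the small-load mean-field dynamics,
# Eqs (39)-(40), for a single pattern.  The transfer function and all
# parameter values below are assumptions, not the authors' exact choices.

gamma, b_hat, h0_hat, tau = 0.002, 100.0, 0.25, 1.0

def phi(h):
    return 1.0 / (1.0 + np.exp(-b_hat * (h - h0_hat)))   # dimensionless f-I curve

def F1(m, I1=0.0):
    # Eq (40): average over the two selectivity classes k in {0, 1}
    Pk = {0: 1.0 - gamma, 1: gamma}
    Ik = {0: 0.0, 1: I1}                                   # stimulus targets pattern-1 neurons
    return sum(Pk[k] * (k - gamma) * phi((k - gamma) * m + Ik[k])
               for k in (0, 1)) / (gamma * (1.0 - gamma))

dt, T = 0.01, 20.0
m1 = 0.0
for step in range(int(T / dt)):
    t = step * dt
    I1 = 0.5 if 1.0 <= t <= 5.0 else 0.0                  # transient stimulation window
    m1 += dt / tau * (-m1 + F1(m1, I1))
print("final overlap m1 =", m1)
```

After the transient stimulus, the overlap relaxes to the retrieval fixed point m1 ≈ 1 for the chosen parameters.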

For large network load α (i.e. α = O(1) as N → ∞), the effect of background patterns may not be negligible. As shown above, the equilibrium solution in this case is given by Eqs (37) and (38). For the non-stationary dynamics Eq (39), we still use Eq (38) for F1(m1), even though this equation has been derived under the assumption of stationarity. This means that we assume that the overlaps with background patterns are always at their equilibrium value while the overlap variables with retrieved patterns evolve in time. While this assumption is not strictly true, it gives results in excellent agreement with full network simulations (Fig 2C). In other words, the mean-field dynamics in Fig 2C is correct before stimulus onset and after the system has retrieved pattern 1, whereas during transients the dynamics with F1(m1) given by Eq (38) is an approximation. Moreover, we argued in the Discussion that a small or even negligible network load (α ≈ 0) is a biologically plausible assumption for the human MTL. In this case, the dynamical mean-field equations for α = 0, Eqs (39) and (40), are valid.

Mean-field equations for two overlapping patterns

Overlap between two engrams is implemented as two patterns with a non-zero Pearson correlation coefficient. Without loss of generality, we take patterns ξ1 and ξ2 to be correlated, while all other P − 2 patterns are independent. We define the correlation C between the two patterns as the Pearson correlation coefficient (covariance/variance):

$C=\frac{\mathrm{Cov}(\xi_i^{1},\xi_i^{2})}{\mathrm{Var}(\xi_i^{\mu})}=\frac{P_{11}-\gamma^{2}}{\gamma(1-\gamma)},$ (41)

where $P_{11}=P(\xi_i^{1}=1,\xi_i^{2}=1)=\langle\xi_i^{1}\xi_i^{2}\rangle_i$ is the joint probability that a neuron is selective to both patterns. We generate correlated patterns with mean activity $\langle\xi_i^{1}\rangle_i=\langle\xi_i^{2}\rangle_i=\gamma$ and correlation coefficient C, using the procedure described in the SI. The fraction c of shared neurons is related to C by the identity c = C(1 − γ) + γ.

We are interested in the retrieval dynamics of the correlated patterns ξ1 and ξ2.

The derivation of the system of mean-field equations for the case of two correlated patterns, Eq (43), is analogous to that described in the section above. To that end, we also assume that the stimulus only depends on the selectivities $\xi_i^{1}$ and $\xi_i^{2}$ of the neuron, i.e. $\hat I_i(t)=I_{\xi_i^{1},\xi_i^{2}}(t)$ for all neurons i = 1, …, N. The input potential $\hat h_i$ now has two non-negligible signal terms, one from $\xi^{1}$ and one from $\xi^{2}$:

$\hat h_i(m^{1},m^{2})=(\xi_i^{1}-\gamma)\,m^{1}+(\xi_i^{2}-\gamma)\,m^{2}+\sqrt{\alpha R(m^{1},m^{2})}\,Z_i+I_{\xi_i^{1},\xi_i^{2}}(t),$ (42)

where

$\hat\tau\,\frac{dm^{1}}{dt}=-m^{1}+\frac{1}{\gamma(1-\gamma)}\big\langle(\xi_i^{1}-\gamma)\,\hat\phi(\hat h_i)\big\rangle_i$ (43a)
$\hat\tau\,\frac{dm^{2}}{dt}=-m^{2}+\frac{1}{\gamma(1-\gamma)}\big\langle(\xi_i^{2}-\gamma)\,\hat\phi(\hat h_i)\big\rangle_i$ (43b)
$q=\big\langle\hat\phi'(\hat h_i)\big\rangle_i$ (43c)
$p=\big\langle\hat\phi^{2}(\hat h_i)\big\rangle_i$ (43d)
$R(m^{1},m^{2})=\frac{p}{(1-q)^{2}},$ (43e)

and ZiN(0, 1), i = 1, …, N are independent, standard normal random variables. Analogous to Eq (38) we compute the population averages in Eq (43) explicitly leading to the mean-field dynamics

$\hat\tau\,\frac{dm^{1}}{dt}=-m^{1}+F^{1}(m^{1},m^{2})$ (44a)
$\hat\tau\,\frac{dm^{2}}{dt}=-m^{2}+F^{2}(m^{1},m^{2}).$ (44b)

Here, the nonlinear functions F1 and F2 are given by (μ = 1, 2)

$F^{\mu}(m^{1},m^{2})=\sum_{x_1=0,1}\sum_{x_2=0,1}\frac{x_\mu-\gamma}{\gamma(1-\gamma)}\,P_{x_1x_2}\int_{-\infty}^{\infty}\frac{dz}{\sqrt{2\pi}}\,e^{-\frac{z^{2}}{2}}\,\hat\phi\big(\hat h_{x_1x_2}(m^{1},m^{2},z)\big)$ (44c)

with

$\hat h_{x_1x_2}(m^{1},m^{2},z)=\sum_{\nu=1,2}(x_\nu-\gamma)\,m^{\nu}+I_{x_1,x_2}(t)+\sqrt{\alpha R_h(m^{1},m^{2})}\,z.$ (44d)

This function can be interpreted as the mean-field input potential of a neuron with selectivity ξi1=x1 and ξi2=x2, background variability Zi = z, in the case when the network has overlap m1 and m2 with patterns 1 and 2, respectively. The last term in Eq (44d) captures the influence of background patterns on the mean-field dynamics of m1(t) and m2(t). This influence is quantified by the function Rh(m1, m2) representing the mean squared overlap of the system with the background patterns μ = 3, …, P. We used a subscript h for Rh(m1, m2) to indicate that R depends functionally on the mean-field potential h^x1x2(m1,m2,z). This functional is given by

$R_h(m^{1},m^{2})=\frac{p}{(1-q)^{2}}$ (44e)
$q=\sum_{x_1=0,1}\sum_{x_2=0,1}P_{x_1,x_2}\int_{-\infty}^{\infty}\hat\phi'\big(\hat h_{x_1x_2}(m^{1},m^{2},z)\big)\,e^{-\frac{z^{2}}{2}}\,\frac{dz}{\sqrt{2\pi}}$ (44f)
$p=\sum_{x_1=0,1}\sum_{x_2=0,1}P_{x_1,x_2}\int_{-\infty}^{\infty}\hat\phi^{2}\big(\hat h_{x_1x_2}(m^{1},m^{2},z)\big)\,e^{-\frac{z^{2}}{2}}\,\frac{dz}{\sqrt{2\pi}}.$ (44g)

The mean-field input potentials $\hat h_{x_1x_2}(m^{1},m^{2},z)$, $x_1, x_2 \in \{0, 1\}$, needed in Eq (44c) are obtained from the self-consistent solution of the functional Eqs (44d)–(44g); details are given in Section “Numerical solutions”. Eqs (44) simplify significantly for α = 0, which is the parameter choice of most figures, so it is worth writing the $m^{1}$ and $m^{2}$ dynamics explicitly in the case of negligible load:

$\hat\tau\,\frac{dm^{1}}{dt}=-m^{1}+\frac{1}{\gamma(1-\gamma)}\Big\{P_{11}(1-\gamma)\,\hat\phi\big[(1-\gamma)(m^{1}+m^{2})+I_1+I_2\big]+P_{10}(1-\gamma)\,\hat\phi\big[(1-\gamma)m^{1}-\gamma m^{2}+I_1\big]-P_{01}\gamma\,\hat\phi\big[-\gamma m^{1}+(1-\gamma)m^{2}+I_2\big]-P_{00}\gamma\,\hat\phi\big[-\gamma(m^{1}+m^{2})\big]\Big\},$ (45a)
$\hat\tau\,\frac{dm^{2}}{dt}=-m^{2}+\frac{1}{\gamma(1-\gamma)}\Big\{P_{11}(1-\gamma)\,\hat\phi\big[(1-\gamma)(m^{1}+m^{2})+I_1+I_2\big]-P_{10}\gamma\,\hat\phi\big[(1-\gamma)m^{1}-\gamma m^{2}+I_1\big]+P_{01}(1-\gamma)\,\hat\phi\big[-\gamma m^{1}+(1-\gamma)m^{2}+I_2\big]-P_{00}\gamma\,\hat\phi\big[-\gamma(m^{1}+m^{2})\big]\Big\}.$ (45b)

Here, we have used the specific form Ix1, x2(t) = I1(t)x1 + I2(t)x2, x1, x2 ∈ {0, 1}, of the external currents, where the coefficients I1(t) and I2(t) are the external input currents given selectively to the neurons of pattern 1 and 2, respectively.
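The zero-load two-pattern dynamics of Eq (45) can be integrated in the same way as the single-pattern case. The sketch below is an illustration rather than the authors' implementation; it uses the joint probabilities of Eq (82) expressed through the correlation C, and all parameter values are assumptions.

```python
import numpy as np

# Minimal sketch: Euler integration of the alpha = 0 two-pattern dynamics,
# Eq (45), with joint probabilities P_{x1 x2} from Eq (82).  Parameter
# values are illustrative assumptions.

gamma, b_hat, h0_hat, tau, C = 0.002, 100.0, 0.25, 1.0, 0.06

def phi(h):
    return 1.0 / (1.0 + np.exp(-b_hat * (h - h0_hat)))

P11 = gamma**2 + gamma * (1 - gamma) * C
P10 = P01 = gamma * (1 - gamma) * (1 - C)
P00 = (1 - gamma)**2 + gamma * (1 - gamma) * C

def rhs(m1, m2, I1, I2):
    # the four population inputs appearing in Eq (45)
    h11 = (1 - gamma) * (m1 + m2) + I1 + I2
    h10 = (1 - gamma) * m1 - gamma * m2 + I1
    h01 = -gamma * m1 + (1 - gamma) * m2 + I2
    h00 = -gamma * (m1 + m2)
    pref = 1.0 / (gamma * (1 - gamma))
    dm1 = -m1 + pref * (P11*(1-gamma)*phi(h11) + P10*(1-gamma)*phi(h10)
                        - P01*gamma*phi(h01) - P00*gamma*phi(h00))
    dm2 = -m2 + pref * (P11*(1-gamma)*phi(h11) - P10*gamma*phi(h10)
                        + P01*(1-gamma)*phi(h01) - P00*gamma*phi(h00))
    return dm1, dm2

dt, m1, m2 = 0.01, 0.0, 0.0
for step in range(2000):
    t = step * dt
    I1 = 0.5 if 1.0 <= t <= 5.0 else 0.0   # stimulate pattern 1 only
    d1, d2 = rhs(m1, m2, I1, 0.0)
    m1 += dt / tau * d1
    m2 += dt / tau * d2
print(f"m1 = {m1:.3f}, m2 = {m2:.3f}")
```

Stimulating only pattern 1 and letting the system relax illustrates the analytical result derived below: for sub-critical correlation the network settles in the single-retrieval state with m¹ ≈ 1 and m² ≈ C.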

The same mean-field procedure can be generalized to several correlated binary patterns, as in Fig 2B. The generalization is straightforward: we re-write the system in Eq (44) with one dynamical equation for each correlated pattern and add the corresponding terms to the input $\hat h(x_1,\dots,x_\mu,z)$. Finally, we need the joint probabilities $P_{x_1,x_2,x_3}$ and $P_{x_1,x_2,x_3,x_4}$. The general formula for the joint probability is given in Eq (97) below. For instance, for three correlated patterns, the mean-field dynamics analogous to Eq (44) is given by

$\hat\tau\,\frac{dm^{1}}{dt}=-m^{1}+\frac{1}{\gamma(1-\gamma)}\big\langle(\xi_i^{1}-\gamma)\,\hat\phi(\hat h_i)\big\rangle_i$ (46a)
$\hat\tau\,\frac{dm^{2}}{dt}=-m^{2}+\frac{1}{\gamma(1-\gamma)}\big\langle(\xi_i^{2}-\gamma)\,\hat\phi(\hat h_i)\big\rangle_i$ (46b)
$\hat\tau\,\frac{dm^{3}}{dt}=-m^{3}+\frac{1}{\gamma(1-\gamma)}\big\langle(\xi_i^{3}-\gamma)\,\hat\phi(\hat h_i)\big\rangle_i$ (46c)
$q=\big\langle\hat\phi'(\hat h_i)\big\rangle_i$ (46d)
$p=\big\langle\hat\phi^{2}(\hat h_i)\big\rangle_i$ (46e)
$R(m^{1},m^{2},m^{3})=\frac{p}{(1-q)^{2}}$ (46f)

where

$\hat h_i(m^{1},m^{2},m^{3})=(\xi_i^{1}-\gamma)\,m^{1}+(\xi_i^{2}-\gamma)\,m^{2}+(\xi_i^{3}-\gamma)\,m^{3}+\sqrt{\alpha R(m^{1},m^{2},m^{3})}\,Z_i+I_i.$ (47)

Excluding self-interaction

In Section “Review: mean-field equations for independent patterns” we show the derivation of the mean-field equations for the retrieval of one pattern in an attractor neural network with self-connections (“autapses”). To make the network more biologically plausible and to avoid the creation of local minima around the attractors corresponding to the stored patterns, we now consider the case where self-interactions are excluded [38]. The effect of excluding the self-interaction term on the input terms in Eqs (38a) and (44d) is captured by the correction term [38]:

$\frac{q\,\alpha\,\hat\phi(\hat h)}{1-q}.$ (48)

Then, Eq (44d) becomes

$\hat h_{x_1x_2}(m^{1},m^{2},z)=(x_1-\gamma)\,m^{1}+(x_2-\gamma)\,m^{2}+\frac{q\,\alpha\,\hat\phi\big(\hat h_{x_1x_2}(m^{1},m^{2},z)\big)}{1-q}+\sqrt{\alpha R}\,z+I_{x_1,x_2}(t),$ (49)

where again $I_{x_1,x_2}(t)=I_1(t)\,x_1+I_2(t)\,x_2$ and $x_1, x_2 \in \{0, 1\}$. In our simulations, we used the same stimulation for both patterns, i.e. $I_1(t) = I_2(t) \equiv I(t)$. Since Eq (49) defines the input term implicitly, it is solved recursively; this correction is used for Fig 2B, left-hand side.

Overlapping background patterns

In Fig 2B we explore the possibility that the maximal fraction of shared neurons cmax might be influenced by the presence of Pearson correlations between pairs of background patterns. Moreover, the assumption that there are many subgroups of overlapping memory engrams seems more biologically plausible. If we let the background patterns overlap in subgroups of two patterns each, the variable R in the mean-field equations Eq (43) needs to be replaced by the expression derived below, Eq (60). In this section, we provide the derivation of the critical correlation in the presence of correlations between background patterns (main text, Fig 2B left).

To start with, let us suppose that each pattern is correlated with just one other, so a given pattern $\xi^{\nu}$ is only correlated with one other pattern $\xi^{\nu'}$, $\nu\ne\nu'$. In the following, the prime notation ν′ denotes, for any given pattern ν, the index of the associated correlated pattern. What changes, compared to the derivation in Section “Review: mean-field equations for independent patterns”, is the variance of the heterogeneity term in Eq (25), $\sigma^{2}=\sum_{\mu,\nu>2}\langle\xi_i^{\mu}\xi_i^{\nu}-\gamma\xi_i^{\nu}-\gamma\xi_i^{\mu}+\gamma^{2}\rangle_i\,m^{\mu}m^{\nu}$, since patterns are now pair-wise correlated. For a fixed pair (ν, ν′), we obtain

$\big\langle\xi_i^{\mu}\xi_i^{\nu}-\gamma\xi_i^{\nu}-\gamma\xi_i^{\mu}+\gamma^{2}\big\rangle_i=\delta_{\mu\nu}\,\gamma(1-\gamma)+\delta_{\mu,\nu'}\,(P_{11}-\gamma^{2})$ (50)
$=\gamma(1-\gamma)\big[\delta_{\mu,\nu}+\delta_{\mu,\nu'}\,C\big],$ (51)

where the second term at the right hand side captures the effect of correlation. Background patterns can still be approximated by a Gaussian variable in the large network limit, in this case with variance:

$\sigma^{2}=\gamma(1-\gamma)\sum_{\nu=3}^{P}\big[(m^{\nu})^{2}+C\,m^{\nu}m^{\nu'}\big].$ (52)

In order to compute Eq (52), we need to derive $(m^{\nu})^{2}$ and $m^{\nu}m^{\nu'}$. In what follows, we use the same definitions of q and p as in Eq (38). Let us start by writing $m^{\nu}$ as a first-order Taylor expansion for $m^{\nu}$ and $m^{\nu'}$ both small:

$F^{\nu}(m^{1},\dots,m^{P})\approx\frac{1}{\gamma(1-\gamma)N}\sum_{i=1}^{N}(\xi_i^{\nu}-\gamma)\,\hat\phi(\hat h_i)+\frac{1}{\gamma(1-\gamma)N}\sum_{i=1}^{N}(\xi_i^{\nu}-\gamma)^{2}\,\hat\phi'(\hat h_i)\,m^{\nu}+\frac{1}{\gamma(1-\gamma)N}\sum_{i=1}^{N}(\xi_i^{\nu}-\gamma)(\xi_i^{\nu'}-\gamma)\,\hat\phi'(\hat h_i)\,m^{\nu'}$ (53)

Then, following passages analogous to Eqs (26)–(29), we obtain the expressions:

$(1-q)\,m^{\nu}=\frac{1}{\gamma(1-\gamma)N}\sum_{i=1}^{N}(\xi_i^{\nu}-\gamma)\,\hat\phi(\hat h_i)+qC\,m^{\nu'},$ (54a)
$(1-q)\,m^{\nu'}=\frac{1}{\gamma(1-\gamma)N}\sum_{i=1}^{N}(\xi_i^{\nu'}-\gamma)\,\hat\phi(\hat h_i)+qC\,m^{\nu}.$ (54b)

Eq (54) is a linear system of the form:

$D\,m^{\nu}=B+qC\,m^{\nu'}$ (55a)
$D\,m^{\nu'}=B'+qC\,m^{\nu}$ (55b)

where $B=\frac{1}{\gamma(1-\gamma)N}\sum_{i=1}^{N}(\xi_i^{\nu}-\gamma)\,\hat\phi(\hat h_i)$, $D=(1-q)$, similarly $B'=\frac{1}{\gamma(1-\gamma)N}\sum_{i=1}^{N}(\xi_i^{\nu'}-\gamma)\,\hat\phi(\hat h_i)$, and $C=\frac{P_{11}-\gamma^{2}}{\gamma(1-\gamma)}$. The system Eq (54) has solutions

$m^{\nu}=\frac{D\,B+qC\,B'}{D^{2}-(qC)^{2}}$ (56a)
$m^{\nu'}=\frac{D\,B'+qC\,B}{D^{2}-(qC)^{2}}.$ (56b)

We are now ready to write the expressions for (mν)2 and mνmν:

$(m^{\nu})^{2}=\frac{D^{2}B^{2}+(qC)^{2}(B')^{2}+2DqC\,BB'}{\big(D^{2}-(qC)^{2}\big)^{2}},\qquad m^{\nu}m^{\nu'}=\frac{D^{2}BB'+qCD\,(B')^{2}+qCD\,B^{2}+(qC)^{2}BB'}{\big(D^{2}-(qC)^{2}\big)^{2}},$ (57)

where B and B′ are analogous to the term on the right-hand side of Eq (27): $\langle(B')^{2}\rangle=\langle B^{2}\rangle=\frac{p}{N\gamma(1-\gamma)}$. Note that $B^{2}$ and $(B')^{2}$ are equal in expectation; however, $\langle BB'\rangle\ne\langle B^{2}\rangle$ due to the correlation between $\xi^{\nu}$ and $\xi^{\nu'}$. The last missing piece is the cross term $\langle BB'\rangle$, which can also be calculated analogously to Eq (32):

$\langle BB'\rangle=\frac{1}{[N\gamma(1-\gamma)]^{2}}\sum_{i=1}^{N}\sum_{j=1}^{N}\big\langle(\xi_i^{\nu}-\gamma)(\xi_j^{\nu'}-\gamma)\big\rangle\,\hat\phi(\hat h_i)\hat\phi(\hat h_j)=\frac{1}{[N\gamma(1-\gamma)]^{2}}\sum_{i=1}^{N}(P_{11}-\gamma^{2})\,\hat\phi^{2}(\hat h_i)=\frac{P_{11}-\gamma^{2}}{N[\gamma(1-\gamma)]^{2}}\,p=\frac{C\,p}{N\gamma(1-\gamma)}.$ (58)

In the first step, we used the fact that the first-order approximations of $\hat\phi(\hat h_i)$ and $\hat\phi(\hat h_j)$ are independent of the patterns (see Eq (53)). Plugging the expressions for $(m^{\nu})^{2}$ and $m^{\nu}m^{\nu'}$ into Eq (52), we obtain the variance

$\sigma^{2}=\frac{\gamma(1-\gamma)}{(D^{2}-q^{2}C^{2})^{2}}\sum_{\nu\ge3}\Big\{D^{2}B^{2}+q^{2}C^{2}(B')^{2}+2DqC\,BB'+CD^{2}BB'+DqC^{2}B^{2}+DqC^{2}(B')^{2}+q^{2}C^{3}BB'\Big\}=\frac{\alpha\,p}{(D^{2}-q^{2}C^{2})^{2}}\Big[D^{2}+q^{2}C^{2}+4DqC^{2}+q^{2}C^{4}\Big]$ (59)

Finally we can write the expression for the effective R = 〈σ2〉/α, under the effect of pairwise correlation between background patterns:

$R=\frac{p}{\big[D^{2}-(qC)^{2}\big]^{2}}\Big[D^{2}+q^{2}C^{2}+4DqC^{2}+q^{2}C^{4}\Big]=\frac{p}{\big[(1-q)^{2}-(qC)^{2}\big]^{2}}\Big[(1-q)^{2}+q^{2}C^{2}+4(1-q)qC^{2}+q^{2}C^{4}\Big].$ (60)

The expression for R obtained in this way replaces the corresponding one in the system Eq (43).

The derivation of Eq (60) can be extended to the case in which background patterns share correlation C within non-overlapping groups of exactly n patterns. To do so, we extend the linear system Eq (55):

$M\cdot\begin{pmatrix}m^{\nu_1}\\ m^{\nu_2}\\ \vdots\\ m^{\nu_n}\end{pmatrix}=\begin{pmatrix}D & -qC & \cdots & -qC\\ -qC & D & \cdots & -qC\\ \vdots & & \ddots & \vdots\\ -qC & -qC & \cdots & D\end{pmatrix}\begin{pmatrix}m^{\nu_1}\\ m^{\nu_2}\\ \vdots\\ m^{\nu_n}\end{pmatrix}=\begin{pmatrix}B_1\\ B_2\\ \vdots\\ B_n\end{pmatrix}$ (61)

where M is an n × n matrix. In order to find the solution $\mathbf m=M^{-1}\mathbf B$ of system Eq (61) we need to invert the matrix M. Indeed, matrices M of the form

$\{M\}_{ij}=\begin{cases}D & \text{if } i=j\\ -qC & \text{if } i\ne j\end{cases}$ (62)

are invertible. In order to derive the inverse matrix, we can rewrite M as $M = A-qC\,vv^{T}$, where A is diagonal with entries $A_{i,i}=D+qC$ and v is a column vector of all ones. If M and A are both invertible, we can use the Sherman–Morrison formula:

$M^{-1}=\big(A-qC\,vv^{T}\big)^{-1}=A^{-1}-\frac{-qC\,A^{-1}vv^{T}A^{-1}}{1-qC\,v^{T}A^{-1}v}.$ (63)

Since A is diagonal, $(A^{-1})_{i,i}=(A_{i,i})^{-1}=\frac{1}{D+qC}$. Then

$\{M^{-1}\}_{ij}=\begin{cases}\dfrac{1}{D+qC}-\dfrac{1}{c\,(D+qC)^{2}} & \text{if } i=j\\[1mm] -\dfrac{1}{c\,(D+qC)^{2}} & \text{if } i\ne j\end{cases}$ (64)

where the constant $c=-\frac{1}{qC}+\frac{n}{D+qC}$. Terms can be rearranged to obtain:

$\{M^{-1}\}_{ij}=\frac{1}{Z}\begin{cases}c\,(D+qC)-1 & \text{if } i=j\\ -1 & \text{if } i\ne j\end{cases}$ (65)

where

$Z=c\,(D+qC)^{2}=-\frac{\big[D-(n-1)\,qC\big](D+qC)}{qC}.$ (66)

As a final note, we consider the case in which all patterns are equally correlated, then

$\sigma^{2}=\frac{P}{N}\,\gamma(1-\gamma)\,(m^{\nu})^{2}+\frac{P^{2}}{N}\,(P_{11}-\gamma^{2})\,m^{\nu}m^{\mu},\qquad \nu,\mu\ge 3.$ (67)

The second term in the variance diverges as N → ∞, because P = αN, unless $P_{11}=\gamma^{2}$. We conclude that, in the limit N → ∞ and assuming that the ratio between patterns and neurons is a finite constant α > 0, it is not possible to allow a correlation C > 0 between all stored patterns.

Extract numerically the maximal correlation

In order to numerically compute the maximal correlation, we use the bifurcation diagram in Fig 9B: the fixed points in the phase-plane are projected on the m1-axis and their positions are plotted as C increases. From the bifurcation diagram we can extract the value Cmax at which the single retrieval states merge with the saddle points and disappear. Thus, at C = Cmax we have a saddle-node bifurcation (see the derivation of Eq (4)).
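A minimal numerical sketch of this idea for the α = 0 case (simpler than the full grid search described in Section “Numerical solutions”) is to initialize the dynamics of Eq (45) in the single-retrieval state and increase C until that state no longer survives; all parameter values below are assumptions.

```python
import numpy as np

# Minimal sketch (not the repository's grid-search code): estimate C_max by
# initialising the alpha = 0 two-pattern dynamics, Eq (45), in the
# single-retrieval state (m1 = 1, m2 = 0) and checking, for increasing C,
# whether it relaxes to an asymmetric fixed point or collapses onto m1 = m2.

gamma, b_hat, h0_hat, dt, n_steps = 0.002, 100.0, 0.25, 0.01, 4000

def phi(h):
    return 1.0 / (1.0 + np.exp(-b_hat * (h - h0_hat)))

def relax(C):
    P11 = gamma**2 + gamma*(1-gamma)*C
    P10 = P01 = gamma*(1-gamma)*(1-C)
    P00 = (1-gamma)**2 + gamma*(1-gamma)*C
    m1, m2 = 1.0, 0.0
    pref = 1.0 / (gamma*(1-gamma))
    for _ in range(n_steps):
        h11 = (1-gamma)*(m1+m2); h10 = (1-gamma)*m1 - gamma*m2
        h01 = -gamma*m1 + (1-gamma)*m2; h00 = -gamma*(m1+m2)
        dm1 = -m1 + pref*(P11*(1-gamma)*phi(h11) + P10*(1-gamma)*phi(h10)
                          - P01*gamma*phi(h01) - P00*gamma*phi(h00))
        dm2 = -m2 + pref*(P11*(1-gamma)*phi(h11) - P10*gamma*phi(h10)
                          + P01*(1-gamma)*phi(h01) - P00*gamma*phi(h00))
        m1 += dt*dm1; m2 += dt*dm2
    return m1, m2

for C in np.arange(0.0, 0.6, 0.02):
    m1, m2 = relax(C)
    if abs(m1 - m2) < 1e-2:           # single-retrieval state has disappeared
        print("estimated C_max just below", round(C, 2))
        break
```

For finite steepness b̂ the estimate lies somewhat below the b̂ → ∞ value ĥ0 derived below.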

Fig 9.


A) Four phase-planes of the dynamics of variables m1 and m2 for different values of correlation C. Fixed points are color-coded by their stability: blue = stable, green = saddle and red = unstable. B) Bifurcation diagram. The projection of the fixed-point positions on m1 is plotted against C. The critical correlation Cmax is highlighted by the black dashed line. C) Same as B, but in the limit $\hat b\to\infty$, which leads to $C_{max}\to\hat h_0$. Parameters: γ = 0.002, $\hat b=100$, $\hat h_0=0.25$, α = 0.

The value of the maximal correlation Cmax can be calculated analytically in the limit of infinite steepness b → ∞, vanishing sparseness γ → 0, and vanishing load α = 0. This value matches the one extracted from the bifurcation plot in Fig 9C.

Mean-field dynamics in the presence of adaptation and global feedback

In order to derive the mean-field equations for the model with adaptation and global feedback, we consider the simplest case, in which only two patterns are correlated ($\xi^{1}$ and $\xi^{2}$) while all the others are independent. Analogously to Section “Mean-field equations for two overlapping patterns”, we can group neurons into four homogeneous populations (in the presence of background patterns, the neural populations will be slightly inhomogeneous): neurons that are selective to both patterns ($\xi_i^{1}=\xi_i^{2}=1$), neurons selective to pattern 1 but not 2 ($\xi_i^{1}=1,\xi_i^{2}=0$), neurons selective to pattern 2 but not 1 ($\xi_i^{1}=0,\xi_i^{2}=1$), and neurons that are selective to neither pattern 1 nor pattern 2 ($\xi_i^{1}=\xi_i^{2}=0$). The probability for a neuron to belong to population $(x_1,x_2)$, i.e. $\xi_i^{1}=x_1$ and $\xi_i^{2}=x_2$, is the joint probability $P_{x_1,x_2}$ in Eq (93). Furthermore, each population $(x_1,x_2)$ is characterized by a different firing threshold $\hat\theta_{x_1x_2}(t)$. Analogous to the derivation of Eq (44), we obtain the six-dimensional mean-field dynamics:

$\hat\tau\,\frac{dm^{1}}{dt}=-m^{1}+F^{1}\big(m^{1},m^{2},\{\hat\theta_{x_1x_2}\}\big),$ (68a)
$\hat\tau\,\frac{dm^{2}}{dt}=-m^{2}+F^{2}\big(m^{1},m^{2},\{\hat\theta_{x_1x_2}\}\big),$ (68b)
$\hat\tau_{\theta}\,\frac{d\hat\theta_{x_1x_2}}{dt}=-\hat\theta_{x_1x_2}+\hat\theta_{0}+\hat D_{\theta}\,\hat r_{x_1x_2}\big(m^{1},m^{2},\hat\theta_{x_1x_2}\big),\qquad x_1,x_2\in\{0,1\}.$ (68c)

Here, we have introduced the nonlinear functions

$F^{\mu}\big(m^{1},m^{2},\{\hat\theta_{x_1x_2}\}\big)=\sum_{x_1=0,1}\sum_{x_2=0,1}P_{x_1,x_2}\,\frac{x_\mu-\gamma}{\gamma(1-\gamma)}\,\hat r_{x_1x_2}\big(m^{1},m^{2},\hat\theta_{x_1x_2}\big),\qquad \mu=1,2$ (68d)
$\hat r_{x_1x_2}\big(m^{1},m^{2},\hat\theta_{x_1x_2}\big)=\int_{-\infty}^{\infty}\hat\phi\big(\hat h_{x_1x_2}(m^{1},m^{2},\hat\theta_{x_1x_2},z)\big)\,e^{-\frac{z^{2}}{2}}\,\frac{dz}{\sqrt{2\pi}},$ (68e)

with the mean-field input potential

$\hat h_{x_1,x_2}\big(m^{1},m^{2},\{\hat\theta_{x_1x_2}\},z\big)=(x_1-\gamma)\,m^{1}+(x_2-\gamma)\,m^{2}+\sqrt{\alpha R}\,z-\hat\theta_{x_1x_2}-\frac{\hat J_0(t)}{\gamma}\sum_{k_1=0,1}\sum_{k_2=0,1}P_{k_1,k_2}\,\hat r_{k_1k_2}\big(m^{1},m^{2},\hat\theta_{k_1k_2}\big),$ (68f)

and the mean squared overlap of background patterns R given by

$R=\frac{p}{(1-q)^{2}}$ (68g)
$q=\sum_{x_1=0,1}\sum_{x_2=0,1}P_{x_1,x_2}\int_{-\infty}^{\infty}\hat\phi'\big(\hat h_{x_1,x_2}(m^{1},m^{2},\{\hat\theta_{x_1x_2}\},z)\big)\,e^{-\frac{z^{2}}{2}}\,\frac{dz}{\sqrt{2\pi}}$ (68h)
$p=\sum_{x_1=0,1}\sum_{x_2=0,1}P_{x_1,x_2}\int_{-\infty}^{\infty}\hat\phi^{2}\big(\hat h_{x_1,x_2}(m^{1},m^{2},\{\hat\theta_{x_1x_2}\},z)\big)\,e^{-\frac{z^{2}}{2}}\,\frac{dz}{\sqrt{2\pi}}$ (68i)

In order to obtain $\hat r_{x_1x_2}(m^{1},m^{2},\hat\theta_{x_1x_2})$ in Eqs (68c) and (68d), Eqs (68e)–(68i) need to be solved self-consistently (for more details, see Section “Numerical Solutions”).

Analogously to the previous section, we can extract numerically the minimal and maximal correlation using the bifurcation analysis described in the Supplementary Information (S2 Fig).

Stability of the fixed points

In order to compute the stability of the fixed points in Fig 9, we compute the eigenvalues of the Jacobian matrix J of the m1–m2 dynamics at the fixed-point location in the m1–m2 plane. The Jacobian matrix is symmetric and the three independent entries are computed from Eq (43) as:

$J_{11}(m^{1},m^{2})=\frac{\partial\big(-m^{1}+F^{1}(m^{1},m^{2})\big)}{\partial m^{1}}=-1+\frac{A}{\gamma(1-\gamma)}\big\langle(\xi_i^{1}-\gamma)^{2}\,\phi'(h_i)\big\rangle_i$ (69)
$J_{12}(m^{1},m^{2})=\frac{\partial\big(-m^{1}+F^{1}(m^{1},m^{2})\big)}{\partial m^{2}}=J_{21}(m^{1},m^{2})=\frac{A}{\gamma(1-\gamma)}\big\langle(\xi_i^{1}-\gamma)(\xi_i^{2}-\gamma)\,\phi'(h_i)\big\rangle_i$ (70)
$J_{22}(m^{1},m^{2})=\frac{\partial\big(-m^{2}+F^{2}(m^{1},m^{2})\big)}{\partial m^{2}}=-1+\frac{A}{\gamma(1-\gamma)}\big\langle(\xi_i^{2}-\gamma)^{2}\,\phi'(h_i)\big\rangle_i$ (71)

In the numerical computation of the J, we exploited the symmetries under exchange of m1 and m2: J22(m1, m2) = J11(m2, m1) and J21(m1, m2) = J12(m2, m1).

Analogously to the system in Eq (43), the Jacobian matrix can also be adapted to the case of 3 or 4 correlated patterns, using the joint probabilities in Eq (97) and the generic forms

$J_{\mu,\mu}=-1+\frac{A}{\gamma(1-\gamma)}\big\langle(\xi_i^{\mu}-\gamma)^{2}\,\phi'(h_i)\big\rangle_i,$ (72)
$J_{\mu,\nu}=\frac{A}{\gamma(1-\gamma)}\big\langle(\xi_i^{\mu}-\gamma)(\xi_i^{\nu}-\gamma)\,\phi'(h_i)\big\rangle_i.$ (73)

The limit case b → ∞ (Heaviside transfer function)

In the limit b → ∞, the transfer function converges to the Heaviside step function $\phi(h)=r_{max}\,\Theta(h-h_0)$, which leads to some simplifications in the explicit form of the mean-field system Eq (43). First of all, we can write $\phi^{2}(h)=r_{max}^{2}\,\Theta(h-h_0)$ and $\phi'(h)=r_{max}\,\delta(h-h_0)$, where δ(x) is the Dirac delta function. In the dimensionless notation, we then have $\hat\phi(\hat h)=\Theta(\hat h-\hat h_0)$, $\hat\phi^{2}(\hat h)=\Theta(\hat h-\hat h_0)$ and $\hat\phi'(\hat h)=\delta(\hat h-\hat h_0)$. We can rewrite Eq (43) as follows:

$\big\langle\hat\phi(\hat h_i)\big\rangle_i=\sum_{x_1=0,1}\sum_{x_2=0,1}P_{x_1x_2}\int_{-\infty}^{\infty}\frac{dz}{\sqrt{2\pi}}\,e^{-\frac{z^{2}}{2}}\,\Theta\big(\hat h_{x_1x_2}(m^{1},m^{2},z)-\hat h_0\big)$ (74)
$\hat\tau\,\frac{dm^{1}}{dt}=-m^{1}+\sum_{x_1=0,1}\sum_{x_2=0,1}\frac{x_1-\gamma}{\gamma(1-\gamma)}\,P_{x_1x_2}\int_{-\infty}^{\infty}\frac{dz}{\sqrt{2\pi}}\,e^{-\frac{z^{2}}{2}}\,\Theta\big(\hat h_{x_1x_2}(m^{1},m^{2},z)-\hat h_0\big)$ (75a)
$\hat\tau\,\frac{dm^{2}}{dt}=-m^{2}+\sum_{x_1=0,1}\sum_{x_2=0,1}\frac{x_2-\gamma}{\gamma(1-\gamma)}\,P_{x_1x_2}\int_{-\infty}^{\infty}\frac{dz}{\sqrt{2\pi}}\,e^{-\frac{z^{2}}{2}}\,\Theta\big(\hat h_{x_1x_2}(m^{1},m^{2},z)-\hat h_0\big)$ (75b)
$q=\sum_{x_1=0,1}\sum_{x_2=0,1}P_{x_1x_2}\int_{-\infty}^{\infty}\frac{dz}{\sqrt{2\pi}}\,e^{-\frac{z^{2}}{2}}\,\delta\big(\hat h_{x_1x_2}(m^{1},m^{2},z)-\hat h_0\big)$ (75c)
$p=\sum_{x_1=0,1}\sum_{x_2=0,1}P_{x_1x_2}\int_{-\infty}^{\infty}\frac{dz}{\sqrt{2\pi}}\,e^{-\frac{z^{2}}{2}}\,\Theta\big(\hat h_{x_1x_2}(m^{1},m^{2},z)-\hat h_0\big)$ (75d)
$R(m^{1},m^{2})=\frac{p}{(1-q)^{2}},$ (75e)

where

$\hat h_i(m^{1},m^{2})=(\xi_i^{1}-\gamma)\,m^{1}+(\xi_i^{2}-\gamma)\,m^{2}+\sqrt{\alpha R(m^{1},m^{2})}\,Z_i+I_{\xi_i^{1},\xi_i^{2}}(t).$ (76)

In the next step, the complementary error function erfc is useful. It is defined as erfc(x) = 1 − erf(x), where erf is the error function, and we use the following identity, which follows directly from the definition:

$\int_{c}^{\infty}\frac{e^{-\frac{x^{2}}{2}}}{\sqrt{2\pi}}\,dx=\frac{1}{2}\,\mathrm{erfc}\!\left(\frac{c}{\sqrt{2}}\right).$ (77)

The identity in Eq (77) allows to rewrite the system Eq (75) as:

$\hat\tau\,\frac{dm^{1}}{dt}=-m^{1}+\frac{1}{2\gamma(1-\gamma)}\sum_{x_1,x_2}P_{x_1,x_2}\,(x_1-\gamma)\,\mathrm{erfc}\!\left(\frac{\hat h_0-\hat h_{x_1x_2}}{\sqrt{2\alpha R}}\right)$ (78a)
$\hat\tau\,\frac{dm^{2}}{dt}=-m^{2}+\frac{1}{2\gamma(1-\gamma)}\sum_{x_1,x_2}P_{x_1,x_2}\,(x_2-\gamma)\,\mathrm{erfc}\!\left(\frac{\hat h_0-\hat h_{x_1x_2}}{\sqrt{2\alpha R}}\right)$ (78b)
$q=\sum_{x_1,x_2}P_{x_1,x_2}\,\frac{1}{\sqrt{2\pi\alpha R}}\,e^{-\frac{(\hat h_0-\hat h_{x_1x_2})^{2}}{2\alpha R}}$ (78c)
$p=\frac{1}{2}\sum_{x_1,x_2}P_{x_1,x_2}\,\mathrm{erfc}\!\left(\frac{\hat h_0-\hat h_{x_1x_2}}{\sqrt{2\alpha R}}\right)$ (78d)
$R=\frac{p}{(1-q)^{2}},$ (78e)

It is important to make a remark on the units of the system: if we do not use the unit-less notation, then the variable q is proportional to rmax and the variable p is proportional to rmax2.

If we consider the case where neural self-interaction is excluded, an extra correction term should be added to the input $h(x_1, x_2, z)$, and its limit for b → ∞ reads as follows:

$\frac{A^{2}q\,\alpha\,\phi(h)}{1-Aq}\;\xrightarrow{\;b\to\infty\;}\;\frac{A\,\alpha\,r_{max}}{2}.$ (79)

In the dimensionless notation, the correction term reduces to the constant α/2, and we can write explicitly the input term $\hat h_{x_1x_2}(m^{1},m^{2},z)$ when self-interaction is excluded:

$\hat h_{x_1x_2}(m^{1},m^{2},z)=(x_1-\gamma)\,m^{1}+(x_2-\gamma)\,m^{2}+\frac{\alpha}{2}+\sqrt{\alpha R}\,z+I_{x_1,x_2}(t).$ (80)

Finally, in order to derive the critical correlation, let us consider the retrieval state of pattern 1 (that of pattern 2 is symmetric under exchange of m1 and m2 in the absence of external input): in this state, m1 = 1, and m2 depends on the correlation C, as emerges from Fig 9A; but what is its exact value? It can be computed analytically in the limit b → ∞ and γ → 0. We rewrite the equation for m2 in Eq 45 as:

$\hat\tau\,\frac{dm^{2}}{dt}=-m^{2}+\Big\{\frac{P_{11}}{\gamma}\,\hat\phi\big[(1-\gamma)(m^{1}+m^{2})+I_1+I_2\big]-\frac{P_{10}}{1-\gamma}\,\hat\phi\big[(1-\gamma)m^{1}-\gamma m^{2}+I_1\big]+\frac{P_{01}}{\gamma}\,\hat\phi\big[-\gamma m^{1}+(1-\gamma)m^{2}+I_2\big]-\frac{P_{00}}{1-\gamma}\,\hat\phi\big[-\gamma(m^{1}+m^{2})\big]\Big\}.$ (81)

Next we need to write the probabilities Px1,x2 as a function of γ:

$P_{11}=\gamma^{2}+\gamma(1-\gamma)\,C$ (82a)
$P_{10}=P_{01}=P(x_2=0\,|\,x_1=1)\,P(x_1=1)=\gamma(1-\gamma)-C\,\gamma(1-\gamma)$ (82b)
$P_{00}=1-P_{11}-P_{10}-P_{01}=(1-\gamma)^{2}+\gamma(1-\gamma)\,C.$ (82c)

Then, in the limit γ → 0, we have

$\hat\tau\,\frac{dm^{2}}{dt}=-m^{2}+\Big\{C\,\hat\phi\big[m^{1}+m^{2}+I_1+I_2\big]+(1-C)\,\hat\phi\big[m^{2}+I_2\big]-\hat\phi[0]\Big\}.$ (83)

Using the limit b → ∞ and assuming that the first concept is being recalled (m1 = 1) and that there is no external input, we obtain

$m^{2}=C\,\Theta\big(1+m^{2}-\hat h_0\big)+(1-C)\,\Theta\big(m^{2}-\hat h_0\big)-\Theta\big(-\hat h_0\big).$ (84)

Since $\hat h_0<1$ and $m^{2}\ge 0$, the term $\Theta(1+m^{2}-\hat h_0)=1$. On the other hand, $\Theta(-\hat h_0)=0$. Therefore, $m^{2}=C$ as long as $m^{2}<\hat h_0$ (cf. the bifurcation diagram in Fig 9C). In the limit case where $m^{2}\to\hat h_0$ we obtain Eq (4):

$C_{max}=\hat h_0=\frac{h_0}{A\,r_{max}}.$ (85)

How does a network embed groups of overlapping memories? Different algorithms to generate correlated patterns

In this section we describe how a single subgroup of K patterns with sparseness γ is created according to three different algorithms. Patterns belonging to the same subgroup correspond to associated concepts and share pair-wise a fraction of neurons c. For the hierarchical generative model and the indicator neuron model, we associate the algorithm to the theoretical probability distribution for a neuron to respond exactly to k concepts out of K.

Hierarchical generative model

We start by creating a “parent” pattern which is not part of the subgroup. The parent pattern has sparseness λ = γ/c: prob (ξiparent=1)=λ. We proceed to create the actual patterns by copying the ones of the parent pattern with probability c, while the zeros stay untouched, following the conditional probabilities

$\mathrm{prob}\big(\xi_i^{\mu}=1\,\big|\,\xi_i^{parent}=1\big)=c,$ (86a)
$\mathrm{prob}\big(\xi_i^{\mu}=0\,\big|\,\xi_i^{parent}=1\big)=1-c,$ (86b)
$\mathrm{prob}\big(\xi_i^{\mu}=1\,\big|\,\xi_i^{parent}=0\big)=0,$ (86c)
$\mathrm{prob}\big(\xi_i^{\mu}=0\,\big|\,\xi_i^{parent}=0\big)=1.$ (86d)

This ensures that the patterns ξiμ have the right sparseness and fraction of pair-wise shared neurons. The sparseness can be checked as follows:

$\mathrm{prob}(\xi_i^{\mu}=1)=\lambda\,c=\gamma,$ (87a)
$\mathrm{prob}(\xi_i^{\mu}=0)=\lambda(1-c)+(1-\lambda)=1-\gamma.$ (87b)

On the other hand, the fraction of pair-wise shared neurons is given by the conditional probability that a neuron is part of pattern ν given that it is part of pattern μ:

$\mathrm{prob}\big(\xi_i^{\nu}=1\,\big|\,\xi_i^{\mu}=1\big)=c+(1-c)\,\delta_{\mu\nu}.$ (88)

Hence the fraction of pair-wise shared neurons is c, as it should be. More generically, the theoretical probability (or the expectation) that a neuron participates in k patterns out of K is

$P_K(k)=\frac{K!}{(K-k)!\,k!}\,\lambda\,c^{k}(1-c)^{K-k}+(1-\lambda)\,\delta_{k0}.$ (89)
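A minimal sketch of the hierarchical generative model, with illustrative values of N, K, γ and c (these choices, and the random seed, are assumptions), comparing the empirical counts with $P_K(k)$ from Eq (89):

```python
import numpy as np
from math import comb

# Minimal sketch of the hierarchical generative model: a parent pattern with
# sparseness lambda = gamma / c, from which each of the K patterns copies the
# active neurons independently with probability c (Eq (86)).

rng = np.random.default_rng(0)
N, K, gamma, c = 100_000, 5, 0.002, 0.04
lam = gamma / c

parent = rng.random(N) < lam                              # prob(parent_i = 1) = lambda
patterns = parent[None, :] & (rng.random((K, N)) < c)     # copy active units with prob c

print("empirical sparseness:", patterns.mean())           # should be close to gamma
k_counts = patterns.sum(axis=0)                           # number of concepts per neuron

def P_K(k):
    # theoretical distribution, Eq (89)
    p = lam * comb(K, k) * c**k * (1 - c)**(K - k)
    return p + (1 - lam) if k == 0 else p

for k in range(K + 1):
    print(k, (k_counts == k).mean(), P_K(k))
```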

Indicator neuron model

To create a subgroup of pair-wise associated patterns using indicator neurons (i.e. neurons that indicate the subgroup), we proceed in three steps:

  1. generate with probability λ a small subset of indicator neurons for this subgroup. This subset gives a parent pattern of indicator neurons:
    $\mathrm{prob}\big(\xi_i^{parent}=1\big)=\lambda_{ind}=\frac{c\gamma-\gamma^{2}}{(1-\epsilon)^{2}-2\gamma(1-\epsilon)+c\gamma}.$ (90)

    In a network of N neurons, nind = λind N are indicator neurons.

  2. To create each pattern μ of the subgroup, copy indicator neurons with probability (1 − ϵ):
    $\mathrm{prob}\big(\xi_i^{\mu}=1\,\big|\,\xi_i^{parent}=1\big)=1-\epsilon$ (91)
  3. Add random neurons (with probability Ω) to pattern μ
    $\mathrm{prob}\big(\xi_i^{\mu}=1\,\big|\,\xi_i^{parent}=0\big)=\Omega=\frac{\gamma-\lambda_{ind}(1-\epsilon)}{1-\lambda_{ind}}.$ (92)

    This last probability can also be interpreted as the probability of flipping a 0 from the parent pattern when creating the correlated patterns.

With this construction, the total number of neurons that are active in pattern μ is $\lambda_{ind}N(1-\epsilon)+(1-\lambda_{ind})N\,\frac{\gamma-\lambda_{ind}(1-\epsilon)}{1-\lambda_{ind}}=N\gamma$, as it should be. The value of $\lambda_{ind}$ is chosen to ensure that the fraction of pair-wise shared neurons is c. Indeed, we found it by solving $\gamma c=\lambda_{ind}(1-\epsilon)^{2}+(1-\lambda_{ind})\,\Omega^{2}$.

In this work, we always choose ϵ = Ω. For this specific case, it is possible to derive ϵ directly from the correlation C and the sparseness γ.

We create a “parent” pattern $\xi^{0}$ with mean activity $\langle\xi_i^{0}\rangle_i=\lambda$. Starting from $\xi^{0}$ we create $\xi^{1}$ and $\xi^{2}$: each unit i has probability ϵ of being equal to $\xi_i^{0}$ and probability 1 − ϵ of being flipped compared to $\xi_i^{0}$. All other patterns $\xi^{\mu}$, μ = 3, …, P are drawn independently from a Bernoulli distribution with probability $P(\xi_i^{\mu}=1)=\gamma$. The joint probabilities $P_{kl}=P(\xi_i^{1}=k,\xi_i^{2}=l)$ can be computed as functions of the probabilities λ and ϵ:

$P_{11}=\lambda\,\epsilon^{2}+(1-\lambda)(1-\epsilon)^{2},$ (93a)
$P_{10}=P_{01}=\lambda\,\epsilon(1-\epsilon)+(1-\lambda)\,\epsilon(1-\epsilon)=\epsilon(1-\epsilon),$ (93b)
$P_{00}=\lambda(1-\epsilon)^{2}+(1-\lambda)\,\epsilon^{2}.$ (93c)

Note that by this procedure we only obtain non-negative correlations C ∈ [0, 1].
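A minimal sketch of this parent-and-flip construction (the values of λ and ϵ are illustrative assumptions here; Eq (96) below shows how to obtain them from γ and C), verifying the resulting sparseness and correlation empirically:

```python
import numpy as np

# Minimal sketch of the construction behind Eq (93): each unit of xi^1 and
# xi^2 copies the parent bit with probability eps and is flipped with
# probability 1 - eps.  lam and eps are illustrative inputs.

rng = np.random.default_rng(1)
N, lam, eps = 1_000_000, 0.004, 0.999

parent = rng.random(N) < lam

def child():
    keep = rng.random(N) < eps               # keep the parent bit with prob eps
    return np.where(keep, parent, ~parent)   # otherwise flip it

xi1, xi2 = child(), child()
gamma_emp = xi1.mean()
P11_emp = (xi1 & xi2).mean()
C_emp = (P11_emp - gamma_emp**2) / (gamma_emp * (1 - gamma_emp))   # Eq (41)
print("gamma =", gamma_emp, " P11 =", P11_emp, " C =", C_emp)
```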

Using P11 from Eq (93), we can express C as

$C(\lambda,\epsilon)=\frac{P_{11}-\gamma^{2}}{\gamma(1-\gamma)}=\frac{\lambda(1-\lambda)\,(2\epsilon-1)^{2}}{\gamma(1-\gamma)}.$ (94)

Similarly, the mean activity of the correlated patterns can be expressed as a function of λ and ϵ as

$\gamma(\lambda,\epsilon)=\big\langle\xi_i^{1}\big\rangle_i=\big\langle\xi_i^{2}\big\rangle_i=\lambda\,\epsilon+(1-\lambda)(1-\epsilon).$ (95)

So far, we showed how to generate correlated patterns given the probabilities λ and ϵ. Conversely, how do we choose λ and ϵ given the mean activity γ and the correlation C, C ≥ 0? To this end, we invert the above relations in order to solve for λ(C, γ) and ϵ(C, γ):

$\lambda=\frac{\gamma+\epsilon-1}{2\epsilon-1},$ (96a)
$2\epsilon^{3}-3\epsilon^{2}+\big[1+2\gamma(1-\gamma)(1-C)\big]\,\epsilon-\gamma(1-\gamma)(1-C)=0.$ (96b)

Eq (96b) has up to three solutions; we chose those that are real and lie in the range [0, 1].
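A minimal sketch of this inversion using numpy.roots; the root ϵ = 1/2 of Eq (96b), for which Eq (96a) is singular, is skipped (the remaining treatment of the roots is an assumption on our part):

```python
import numpy as np

# Minimal sketch: recover (lambda, eps) from a target sparseness gamma and
# correlation C by solving the cubic Eq (96b) and applying Eq (96a).

def lambda_eps_from(gamma, C):
    g = gamma * (1 - gamma) * (1 - C)
    coeffs = [2.0, -3.0, 1.0 + 2.0 * g, -g]            # Eq (96b)
    sols = []
    for eps in np.roots(coeffs):
        if abs(eps.imag) > 1e-10:
            continue                                     # keep real roots only
        eps = float(eps.real)
        if not (0.0 <= eps <= 1.0) or abs(2 * eps - 1.0) < 1e-9:
            continue                                     # skip eps = 1/2 (Eq (96a) singular)
        lam = (gamma + eps - 1.0) / (2.0 * eps - 1.0)    # Eq (96a)
        if 0.0 <= lam <= 1.0:
            sols.append((lam, eps))
    return sols

print(lambda_eps_from(gamma=0.002, C=0.1))
```

The two admissible solutions are related by (λ, ϵ) ↔ (1 − λ, 1 − ϵ) and generate the same pattern statistics.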

The same procedure can be generalized to generate several correlated binary patterns. The general formula for the joint probability can be written as follows:

$P_{x_1,\dots,x_n}=\lambda\,\epsilon^{a}(1-\epsilon)^{b}+(1-\lambda)\,\epsilon^{b}(1-\epsilon)^{a},$ (97)

where $a=\sum_{\mu=1}^{n}x_\mu$ is the number of $x_\mu$ variables taking value 1 and $b = n − a$ is the number of $x_\mu$ variables taking value 0. The value of the joint probabilities in Eq (97) is invariant under permutation of the $x_\mu$.

Iterative overlap-generating model

In this subgroup construction, we do not define any parent pattern. We define the number of active neurons as γN and the number of pair-wise shared neurons as γcN.

  • 1) We define the set of “untouched neurons”, which counts all neurons at the beginning of the procedure

  • 2) To create pattern 1 we randomly sample γN neurons and exclude the sampled neurons from the untouched ones.

We follow the iterative steps, from 3) to 5), to create patterns 2 to K.

  • 3) For every pattern ξμ with μ from 2 to K, compare it with each of the already created patterns. Let’s suppose we are comparing the new pattern μ with the already formed pattern ν. a) check how many neurons are in common between the two. b) sample from pattern ν the remaining neurons needed to reach γcN shared neurons.

  • 4) Complete pattern μ by adding neurons from the untouched ones until reaching γN active units.

  • 5) Remove the units used in point 4 from the untouched ones.

It is important to underline the necessity of point 3a). To illustrate this point, let us consider the case in which we are building a subgroup of 3 patterns. We build the first one as in point 2. When we build pattern 2 starting from scratch, it does not yet share any neuron with pattern 1, so we just sample γcN neurons from pattern 1 and γN(1 − c) from the untouched neurons. Now we move to pattern 3. As before, it does not yet share neurons with pattern 2, so we pick γcN neurons from it. Now we compare pattern 3 with pattern 1: it can happen that, among the neurons we picked from pattern 2, some belong to pattern 1 as well; that is why we need to adjust the number of neurons to pick in order to preserve the correct amount of pair-wise correlation.

When the subgroup size K is big, however, it is still possible to exceed the correct fraction of shared neurons between some of the patterns that are built last. Suppose we are creating a subgroup of size K = 16: we start by applying point 3 of the algorithm between patterns 16 and 15, then patterns 16 and 14, and so on. It can happen that, when we get to the point of picking neurons from patterns 4, 3, 2, 1, we take some neurons that also belong to pattern 15 but are not the ones we picked in the previous iterations, and thus they get accepted. This creates a higher correlation between the last-built patterns in large subgroups. We checked that this does not significantly influence the average pairwise correlation in the virtual experiments described in the next section.
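A minimal sketch of the iterative overlap-generating algorithm (steps 1–5 above), with illustrative sizes; the repository implementation may differ in details:

```python
import numpy as np

# Minimal sketch of the iterative overlap-generating model.
rng = np.random.default_rng(2)
N, K, gamma, c = 50_000, 5, 0.002, 0.04
n_active, n_shared = int(gamma * N), int(gamma * c * N)

untouched = set(range(N))                                     # step 1
patterns = []

first = set(int(i) for i in rng.choice(N, size=n_active, replace=False))  # step 2
untouched -= first
patterns.append(first)

for mu in range(1, K):
    new = set()
    for nu_pattern in patterns:                               # step 3
        already = len(new & nu_pattern)                       # 3a) neurons already shared
        missing = max(n_shared - already, 0)
        if missing > 0:                                       # 3b) top up to gamma*c*N shared
            picked = rng.choice(list(nu_pattern - new), size=missing, replace=False)
            new.update(int(i) for i in picked)
    fill = rng.choice(list(untouched), size=n_active - len(new), replace=False)  # step 4
    new.update(int(i) for i in fill)
    untouched -= new                                          # step 5
    patterns.append(new)

# check pair-wise sharing (should be close to gamma*c*N for each pair)
for a in range(K):
    for b in range(a + 1, K):
        print(a, b, len(patterns[a] & patterns[b]))
```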

Comparing algorithm predictions with experimental data

The experimental dataset of Fig 6 comes from a previous publication [6]. Data were collected in 100 recording sessions with epileptic patients implanted with chronic depth electrodes in the MTL for the monitoring of epileptic seizures. Micro-wires recorded the localized neural activity; spike detection and sorting allowed to identify the activity of 4066 single neurons. During recordings, patients were shown different pictures of known people and places repeated several times. For each neuron, the stimuli eliciting a response were identified using a statistical criterion based on the modulations of firing rate during stimulus presentation compared to baseline epochs. For additional details on the dataset and data processing we refer to the original publication. The association between each pair of stimuli was estimated using a web-based association score.

In order to compare the predictions of the algorithms with the data, we try to reproduce the real data by running virtual experiments based on the three algorithms presented in the previous section. In each virtual experiment we replicate the conditions of the real experiment as follows. For each real experimental session, we first extract the number of responsive neurons in that session. We then group the presented stimuli into clusters based on an association matrix derived from the web-association scores. To do so, we use a hierarchical agglomerative clustering algorithm with a threshold equal to the mean of the association matrix for the session. Such clusters determine the number and size of the pattern subgroups we have to build for the corresponding virtual experiment.

We can then proceed with the virtual experiment: in each session we a) build subgroups of patterns with the same number and size as the clusters of stimuli, for each of the three algorithms, and then b) sample one neuron at a time and count how many patterns it responds to. c) Finally, the counts of how many stimuli a neuron responds to are pooled with those of the other sessions. We sample neurons until we match the number of responsive neurons of the real experimental session. Each virtual experiment contains N = 10^5 neurons and is run 40 times; Fig 6C shows the normalised mean and standard deviation.

We choose to ignore non-responding neurons in our analysis, since it is likely that the proportion of non-responsive neurons compared to that of responsive ones is largely underestimated in the experiment (non-responsive neurons are more likely to remain silent during the experiment and not to be recorded at all).

Comparing virtual experiments and expected distributions

It is also possible to compare the virtual experiments with the theoretical distributions in Eqs (89) and (97). Eqs (89) and (97) provide the probability that a neuron is selective to k out of K patterns if a single subgroup of stimuli is stored in the network. But how do we combine such probabilities when several subgroups of patterns are stored in the network? We define $\Psi_s(k)$ as the probability that a neuron responds to exactly k patterns in session s. We know from the previous section the number and sizes $G_j$ of the subgroups present in each session. Then

$\Psi_s(k)=\sum_{j=k}^{\max K}G_j\,\frac{P_j(k)}{\zeta_j}$ (98)

where $\max K$ is the largest of all subgroup sizes $K_j$ and $\zeta_j = 1 − P_j(0)$ is the probability that a neuron takes part in subgroup j. The formula Eq (98) is valid under the assumption that subgroups are strictly disjoint, meaning that we assume that the same neuron cannot take part in encoding patterns belonging to different subgroups. This assumption does not hold exactly for the way we algorithmically build subgroup patterns in the virtual experiments; however, dropping it makes the expression for $\Psi_s(k)$ intractable. Finally, the probabilities $\Psi_s(k)$ from each session must be combined into the final distribution $\Psi_{final}(k)$:

$\Psi_{final}(k)=\frac{\sum_s N_s^{sample}\,\Psi_s(k)}{\sum_s\sum_k N_s^{sample}\,\Psi_s(k)}=\frac{N_1^{sample}\,\Psi_1(k)+N_2^{sample}\,\Psi_2(k)+\dots}{N_{tot}^{sample}}$ (99)

where $N_s^{sample}$ is the number of responsive neurons measured in experimental session s and $N_{tot}$ is the total number of measured responsive neurons. In the last passage, note that $\sum_s\sum_k N_s^{sample}\Psi_s(k)=\sum_s N_s^{sample}\sum_k\Psi_s(k)=N_{tot}$, since $\sum_k\Psi_s(k)=1$ in every session s. The comparison between the theoretical distributions, the virtual experiments and the experimental data is shown in S6 Fig. The virtual experiments are the same as in Fig 6: we re-ran the experiment 40 times and took the average (main points) and standard deviation (error bars). The small mismatch between the theoretical predictions and the virtual experiments is due to the fact that in the theoretical prediction we do not allow the same neuron to take part in two or more subgroups of concepts, while there is no such restriction in the virtual experiment. The theoretical prediction and the mean of the virtual experiments are very close, showing that only very few neurons take part in encoding different subgroups.
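A minimal sketch of Eqs (98)–(99), here using $P_j(k)$ from the hierarchical model, Eq (89); the session composition and the per-session normalization of $\Psi_s(k)$ are illustrative assumptions:

```python
import numpy as np
from math import comb

# Minimal sketch: combine per-session distributions Psi_s(k) into
# Psi_final(k), weighting by the number of responsive neurons per session.

gamma, c = 0.002, 0.04
lam = gamma / c

def P_j(k, K):                                   # Eq (89) for a subgroup of size K
    p = lam * comb(K, k) * c**k * (1 - c)**(K - k)
    return p + (1 - lam) if k == 0 else p

def Psi_session(subgroup_sizes, k_max):
    zeta = {K: 1.0 - P_j(0, K) for K in set(subgroup_sizes)}
    psi = np.zeros(k_max + 1)
    for k in range(1, k_max + 1):
        psi[k] = sum(P_j(k, K) / zeta[K] for K in subgroup_sizes if K >= k)
    return psi / psi.sum()                        # normalise over responsive neurons

# illustrative sessions: (list of subgroup sizes, number of responsive neurons)
sessions = [([3, 2], 40), ([4], 25), ([2, 2, 3], 60)]
k_max = max(max(sizes) for sizes, _ in sessions)

num = sum(n * Psi_session(sizes, k_max) for sizes, n in sessions)
Psi_final = num / sum(n for _, n in sessions)     # Eq (99)
print(Psi_final)
```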

Heterogeneous frequency-current curves

The frequency-current function of model neurons is neuron-specific and re-written as

$\phi_i(x)=\frac{(r_{max})_i-(r_{min})_i}{1+e^{-\hat b\,(x-(\hat h_0)_i)}}+(r_{min})_i,$ (100)

where the values of $(r_{min})_i$ and $(r_{max})_i$ are randomly sampled for each neuron from Gaussian distributions with means and standard deviations $(\mu_{min},\sigma_{min})$ and $(\mu_{max},\sigma_{max})$, respectively. The parameter $(\hat h_0)_i$ is then defined as $(\hat h_0)_i=h_0\,\big((r_{max})_i-(r_{min})_i\big)/(A\,\mu_{max}^{2})$, where $h_0$ is a global constant. Finally, in the firing-rate equation, we re-scale the firing rates as follows:

$r_i\;\to\;\max\!\left[0,\;\frac{r_i-(r_{min})_i}{(r_{max})_i-(r_{min})_i}\right]\mu_{max}.$ (101)

In Fig 7 we choose the parameters μmin = 0 Hz, μmax = 40 Hz, and σmin = σmax = 4 Hz.
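A minimal sketch of Eqs (100)–(101); the global constants A, h0 and b̂, and the random seed, are assumptions:

```python
import numpy as np

# Minimal sketch of the heterogeneous frequency-current curves, Eqs (100)-(101).
rng = np.random.default_rng(3)
N = 10_000
mu_min, sigma_min = 0.0, 4.0        # Hz
mu_max, sigma_max = 40.0, 4.0       # Hz
A, h0, b_hat = 1.0, 0.25, 100.0     # assumed global constants

r_min = rng.normal(mu_min, sigma_min, N)
r_max = rng.normal(mu_max, sigma_max, N)
h0_i = h0 * (r_max - r_min) / (A * mu_max**2)     # neuron-specific threshold

def phi_i(x):
    # Eq (100), evaluated element-wise for all neurons at input potential x
    return (r_max - r_min) / (1.0 + np.exp(-b_hat * (x - h0_i))) + r_min

r = phi_i(0.3)                                             # example input potential
r_rescaled = np.maximum(0.0, (r - r_min) / (r_max - r_min)) * mu_max   # Eq (101)
print(r_rescaled[:5])
```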

Diluted weight matrix

We define an attractor neural network of N units, where each unit receives input from M others. The probability of having a connection between two units is therefore d = M/N. The load of the network is defined as α = P/N, where P is the total number of patterns. We also assume that A/d = constant (to be introduced into the dimensional analysis). The input term Eq 7 is filtered with a transfer function ϕ, which is chosen to be a sigmoid as in Eq 9. The connection matrix, wij, contains the synaptic weights between neurons i and j, but, compared to Eq 5, connections are diluted with probability d as defined in [24]

$w_{ij}=\frac{A}{N\gamma(1-\gamma)}\,\frac{d_{ij}}{d}\sum_{\mu=1}^{P}(\xi_i^{\mu}-\gamma)(\xi_j^{\mu}-\gamma)$ (102)

where $d_{ij}$ is 1 with probability M/N and 0 otherwise, and the constant A can be interpreted as a “connection strength”. In order for the weights to have expectation 〈wij〉 = 0, we subtract the mean activity of the patterns, $\langle\xi_i^{\mu}\rangle=\gamma$. Using the similarity measure introduced in Eq 2, the input terms $h_i$ can also be re-written as a function of the overlaps $m^{1},\dots,m^{P}$, using the definition of the weights, Eq (102), and that of the overlaps:

$h_i=\sum_{j}^{N}w_{ij}\,r_j=\frac{A}{Nd\,\gamma(1-\gamma)}\sum_{j}^{N}d_{ij}\sum_{\mu=1}^{P}(\xi_i^{\mu}-\gamma)(\xi_j^{\mu}-\gamma)\,r_j=\frac{A}{Nd\,\gamma(1-\gamma)}\sum_{j}^{N}d_{ij}\,(\xi_i^{1}-\gamma)(\xi_j^{1}-\gamma)\,r_j+\frac{A}{Nd\,\gamma(1-\gamma)}\sum_{j}^{N}d_{ij}\sum_{\mu=2}^{P}(\xi_i^{\mu}-\gamma)(\xi_j^{\mu}-\gamma)\,r_j$ (103)

where we have separated the “signal” related to the first pattern being retrieved from a noise term $Y_i$. We write $h_i = Am^{1} + Y_i$, with

$Y_i=\frac{A}{Nd\,\gamma(1-\gamma)}\sum_{j}^{N}(1-d_{ij})\,(\xi_i^{1}-\gamma)(\xi_j^{1}-\gamma)\,r_j+\frac{A}{Nd\,\gamma(1-\gamma)}\sum_{j}^{N}d_{ij}\sum_{\mu=2}^{P}(\xi_i^{\mu}-\gamma)(\xi_j^{\mu}-\gamma)\,r_j$ (104)

Since the terms $d_{ij}$ and $(\xi_i^{1}-\gamma)$ are independent, we have $\langle Y_i\rangle=0$.

We assume Yi to be distributed like a Gaussian with variance

$\sigma^{2}=\big\langle(Y_i)^{2}\big\rangle_i=\frac{1}{N}\sum_{i}^{N}\frac{A^{2}}{N^{2}d^{2}\gamma^{2}(1-\gamma)^{2}}\sum_{\mu\ne1}^{P}\sum_{\nu\ne1}^{P}(\xi_i^{\mu}-\gamma)(\xi_i^{\nu}-\gamma)\sum_{jk}(\xi_j^{\mu}-\gamma)(\xi_k^{\nu}-\gamma)\,d_{ij}d_{ik}\,r_jr_k=\frac{1}{N}\sum_{i}^{N}\frac{A^{2}}{N^{2}d^{2}\gamma^{2}(1-\gamma)^{2}}\sum_{\mu\ne1}^{P}(\xi_i^{\mu}-\gamma)^{2}\sum_{j}d_{ij}(\xi_j^{\mu}-\gamma)^{2}\,r_j^{2}$ (105)

In the last passage, we used the fact that $\langle(\xi_j^{\mu}-\gamma)(\xi_k^{\nu}-\gamma)\rangle=\delta_{jk}\,\delta_{\mu\nu}\,\gamma(1-\gamma)$ and $d_{ij}^{2}=d_{ij}$. We then apply the same independence argument as used for the signal term and obtain

$\sigma^{2}=\frac{A^{2}r_{max}^{2}}{d}\,\gamma(1-\gamma)\sum_{\mu}(m^{\mu})^{2}$ (106)

From this point on, the derivation proceeds as in the SI; the correction term for excluding self-interaction would need to be recomputed for the diluted case.

The final difference in the equations is that the noise term $\sqrt{\alpha R}\,z$, with α = P/N, should be substituted by $\sqrt{(\alpha/d)\,R}\,z$; equivalently, one can keep the original form with the effective load $\alpha'=P/M=\alpha/d$.
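A minimal sketch of the diluted connectivity of Eq (102), with illustrative sizes (the seed and parameter values are assumptions):

```python
import numpy as np

# Minimal sketch: build the diluted covariance-rule weight matrix of Eq (102).
rng = np.random.default_rng(4)
N, P, gamma, d, A = 2000, 10, 0.002, 0.8, 1.0

xi = (rng.random((P, N)) < gamma).astype(float)        # stored binary patterns
dil = (rng.random((N, N)) < d).astype(float)           # dilution mask d_ij

J = (xi - gamma).T @ (xi - gamma)                      # sum_mu (xi_i - gamma)(xi_j - gamma)
w = A / (N * gamma * (1 - gamma) * d) * dil * J        # Eq (102)
np.fill_diagonal(w, 0.0)                               # optionally exclude self-interaction
print(w.shape, w.mean())
```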

Numerical solutions

The code used to generate the results of this work can be downloaded from: https://github.com/ChiaraGastaldi/pub_Gastaldi_2021_AttractorNetwork.git.

Two correlated patterns: Finding the fixed points

The system in Eq (43) is solved numerically to obtain the nullclines, fixed points, and flux arrows plotted in Fig 9. Fixed points are obtained through a grid search in the three-dimensional space spanned by $m^{1}$, $m^{2}$ and R. For each value of $R_{val}\in[0,\max(R)]$ and $m^{1}_{val},m^{2}_{val}\in[\text{lower bound},\text{upper bound}]$, Eqs (44d)–(44g) are solved. We call the value of R obtained in this way $R_{reconstructed}$. $R_{val}$ is accepted as a candidate solution if $R_{val}$ and $R_{reconstructed}$ are close enough, namely

$\big|R_{val}-R_{reconstructed}\big|<\text{correction-constant}\cdot\text{step}.$ (107)

The quantity called “step” is the step size of the linear space we used to span R,

$\text{step}=\frac{\max(R)}{\text{Resolution}}$ (108)

The correction constant can increase or decrease the range in which we accept a value $R_{val}$ as a valid solution: it is equal to 1 in most cases, but can be chosen a bit bigger than one to avoid counting the same fixed point multiple times. The values of $R_{val}$ that satisfy Eq (107) are then used to solve Eq (44), providing the values $m^{1}_{reconstructed}$ and $m^{2}_{reconstructed}$. Analogously to before, we find the solutions of Eq (44) by comparing the values $m^{1}_{val}$ and $m^{2}_{val}$ with the recomputed counterparts $m^{1}_{reconstructed}$ and $m^{2}_{reconstructed}$ as follows

$\big|m^{\mu}_{val}-m^{\mu}_{reconstructed}\big|<\text{correction-constant}\cdot\text{step},$ (109)

where the step is defined as

$\text{step}=\frac{|\text{upper bound}-\text{lower bound}|}{\text{Resolution}}.$ (110)
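For the α = 0 case, where the R dimension drops out, the grid search reduces to scanning (m¹, m²) and keeping the points whose recomputed overlaps stay within one grid step, as in Eqs (109)–(110). A minimal sketch (parameter values and resolution are assumptions):

```python
import numpy as np

# Minimal sketch of the alpha = 0 fixed-point grid search over (m1, m2),
# using the zero-load functions F1, F2 of Eq (45).

gamma, b_hat, h0_hat, C = 0.002, 100.0, 0.25, 0.1

def phi(h):
    return 1.0 / (1.0 + np.exp(-b_hat * (h - h0_hat)))

P11 = gamma**2 + gamma*(1-gamma)*C
P10 = P01 = gamma*(1-gamma)*(1-C)
P00 = (1-gamma)**2 + gamma*(1-gamma)*C
pref = 1.0 / (gamma*(1-gamma))

def F(m1, m2):
    h11 = (1-gamma)*(m1+m2); h10 = (1-gamma)*m1 - gamma*m2
    h01 = -gamma*m1 + (1-gamma)*m2; h00 = -gamma*(m1+m2)
    F1 = pref*(P11*(1-gamma)*phi(h11) + P10*(1-gamma)*phi(h10)
               - P01*gamma*phi(h01) - P00*gamma*phi(h00))
    F2 = pref*(P11*(1-gamma)*phi(h11) - P10*gamma*phi(h10)
               + P01*(1-gamma)*phi(h01) - P00*gamma*phi(h00))
    return F1, F2

lower, upper, resolution, correction = -0.2, 1.2, 300, 1.0
step = (upper - lower) / resolution
grid = np.linspace(lower, upper, resolution)

fixed_points = []
for m1 in grid:
    for m2 in grid:
        F1, F2 = F(m1, m2)
        if abs(m1 - F1) < correction*step and abs(m2 - F2) < correction*step:
            fixed_points.append((round(float(m1), 3), round(float(m2), 3)))
print(fixed_points)
```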

List of parameters

Figs 1C, 9A and 5 and S3 Fig) resolution = 1000, correction-constant = 1, size = 1000, max(R) = 0.3, lower bound = -0.2, upper bound = 1.2. S1 Fig) resolution = 1000, correction-constant = 1, size = 1000, max(R) = 0.3, lower bound = -0.05, upper bound = 1.05. Fig 2A and 2B) resolution = 100, correction-constant = 1.1, size = 50, max(R) = 0.3, lower bound = -0.05, upper bound = 1.05. S2 Fig) resolution = 500, correction-constant = 1, size = 500, max(R) = 0.3, lower bound = -0.2, upper bound = 1.2. S5 Fig) resolution = 500, correction-constant = 1, size = 200, max(R) = 0, lower bound = -0.2, upper bound = 1.2.

Two correlated patterns with adaptation and periodic inhibition

In order to solve the dynamical mean-field equations in the presence of adaptation and global inhibition (as done in S3 and S4 Figs), we compute at each point in time $\bar\phi_{x_1x_2}(m^{1},m^{2},\Theta_{x_1x_2})$, p, q, R and $J_0(t)$. In particular, the four $\bar\phi_{x_1x_2}(m^{1},m^{2},\Theta_{x_1x_2})$ are solved first, and recursively, since they are functions of themselves. We then update $\Theta_{x_1,x_2}(t)$ with the Euler method. Finally, we compute $m^{1}(t)$ and $m^{2}(t)$. To do so, we make a time-scale separation argument: we assume that the $m^{1}(t)$ and $m^{2}(t)$ dynamics are much faster than those of $\Theta_{x_1,x_2}(t)$ and $J_0(t)$, i.e. $\tau\ll T_{J_0}<T_\theta$. According to this approximation, at each point in time we let $m^{1}(t)$ and $m^{2}(t)$ reach their equilibrium values given the current $\Theta_{x_1,x_2}(t)$ and $J_0(t)$. In other words, at each point in time, we consider all other dynamical quantities frozen, then let $m^{1}(t)$ and $m^{2}(t)$ evolve according to their dynamics (using the Euler method) until convergence, and finally update the other quantities.

To find the fixed points in S3 Fig, we proceed as in the non-adaptive case: we do a grid search in the space spanned by $m^{1}$, $m^{2}$ and R. For each candidate value of R, the $\bar\phi_{x_1x_2}(m^{1},m^{2},\Theta_{x_1x_2})$ are computed recursively. Finally, for the obtained values of R and $\bar\phi_{x_1x_2}(m^{1},m^{2},\Theta_{x_1x_2})$, the solutions for $m^{1}$ and $m^{2}$ are found.

Excluding self-interaction: A numerical approximation

In Fig 2B, we compute the critical correlation for non-zero network load, α > 0, in the case where we include the correction that excludes self-interaction. To find the numerical solutions for the fixed points, we approximated the input term $h(x_1, x_2, z)$ to first order in z as follows:

$h(x_1,x_2,z)=\big\langle h(x_1,x_2,z)\big\rangle_z+A\sqrt{\alpha r}\,z.$ (111)

Then the quantity

$\big\langle h(x_1,x_2,z)\big\rangle_z=A\,r_{max}(x_1-\gamma)\,m^{1}+A\,r_{max}(x_2-\gamma)\,m^{2}+\Big\langle\frac{A^{2}q\,\alpha\,\phi\big(h(x_1,x_2,z)\big)}{1-Aq}\Big\rangle_z$ (112)

can be approximated by

$\big\langle h(x_1,x_2,z)\big\rangle_z\approx A\,r_{max}(x_1-\gamma)\,m^{1}+A\,r_{max}(x_2-\gamma)\,m^{2}+\frac{A^{2}q\,\alpha\,\phi\big(\langle h(x_1,x_2,z)\rangle_z\big)}{1-Aq}$ (113)

which is equivalent to taking the zeroth-order term of the Taylor expansion of $h(x_1, x_2, z)$ for small z.

Stability of the fixed points

The stability of the fixed points in Figs 9, S5 and S2 is obtained by computing the Jacobian matrix of the differential equations for $m^{1}$ and $m^{2}$ from Eqs (43a) and (43b), respectively. Analogously, the stability of the fixed points in S2 and S3 Figs is obtained by computing the Jacobian matrix of the differential equations for $m^{1}$ and $m^{2}$ from Eqs (68a) and (68b).

When the steepness of the transfer function is very high, b > 1000, we approximate the transfer function with a Heaviside step function. The system Eq (43), as well as the Jacobian matrix, can then be rewritten in the simpler form valid for b → ∞, given in Eq (78).

In the numerical computation of the Jacobian matrix described in Section “Stability of the fixed points”, we exploited the symmetries under exchange of $m^{1}$ and $m^{2}$, for example $J_{22}(m^{1}, m^{2}) = J_{11}(m^{2}, m^{1})$ and so on.

Supporting information

S1 Text. Supplementary text for “When shared concept cells support associations: Theory of overlapping memory engrams”.

(PDF)

S1 Fig. Evolution of m1(t) and m2(t) according to the mean-field dynamics for super-critical correlation.

The system is initialized in the rest state. During the stimulation period (0.5–8 s) m1(t) receives external input. A) The system state is plotted in the phase-plane before, during and after stimulation, respectively. B) The delay between the activation of m1(t) and m2(t) is highlighted. Parameters: γ = 0.002, $\hat b=100$, $\hat h_0=0.25$, $r_{max}=1$, $\hat\tau=\tau=1/r_{max}$, α = 0, C = 0.2.

(TIF)

S2 Fig. Estimation of the correlation range in which retrieval of a chain of concepts is possible.

A) Estimation of the maximum correlation, which corresponds to the loss of the two single-retrieval states, when J0 is lowest. B) Estimation of the minimum correlation, which corresponds to the creation of the stable fixed point at m1 = m2 > 0, when the inhibition J0 is at its maximum. In both A and B adaptation is frozen and θ = 0. Parameters: γ = 0.002, α = 0, $\hat b=50$, $\hat h_0=0$, $\min(\hat J_0)=0.7$, $\max(\hat J_0)=1.2$.

(TIF)

S3 Fig

A) Dynamical mean-field solutions for m1 and m2 in the case of two independent patterns. B) Phase planes corresponding to the minimum (J0 = 0.7) and maximum (J0 = 1.2) value of inhibition in the case of two independent patterns. C,D) Same as A and B, but for correlated patterns C = 0.2. Parameters in A—D: γ = 0.1, α = 0, b = 100. E, F) Same as C and D but in the low activity regime and for independent patterns. G, H) Same as C and D but in the low activity regime. Parameters in E—H: γ = 0.002, α = 0, b = 100, τθ = 45, T = 0.015, TJ0=25. For the dynamics: resolution = 200, factor = 1. For the phase-planes: resolution = 1000, factor = 1, upper bound = 1.2, lower bound = -0.2 (same as Figs 2 and 4).

(TIF)

S4 Fig

Retrieval dynamics in the presence of adaptation according to the mean-field equations (dashed lines), and comparison with Fig 4A (shaded solid lines) A) Only two patterns are correlated. B) Four patterns are correlated. Parameters: N = 104, P = 16 in full network simulations and α = 0 in mean-field. γ = 0.002, τθ = 45, T = 0.015, TJ0=25 in both.

(TIF)

S5 Fig. Equivalent of Fig 9 but with the parameters extracted from [24].

In A and B the transfer function parameters are taken as those of function ϕ in [24]: A = 3.55, rmax = 76.2, b = 0.82, h0 = 2.46. On the other hand, in C and D we estimated the parameters of a sigmoid function that fits the function f(ϕ) in [24] as follows: A = 3.55, rmax = 0.83, b = 4.35, h0 = 1.7. In all plots γ = 0.001. A and C) The phase-plane for c = 0 shows the position of fixed points. B and D) Bifurcation diagram and critical fraction of shared neurons according to different parameter choices.

(TIF)

S6 Fig. Comparison between model prediction and data.

Probability of finding a neuron responding to a given number of concepts as measured from experimental data (black stars), predicted by the three algorithms (as in Fig 6, the area within one standard deviation of the error bars is shaded) and theoretically forecast for the indicator neuron model (light blue) and for the hierarchical generative model (light green) obtained from Eq (99). The theoretical predictions are not smooth curves due to the choice of matching the subgroup sizes to the dataset.

(TIF)

Acknowledgments

We thank Valentin Marc Schmutz, Martin Barry, Alireza Modirshanechi and Johanni Brea for useful comments and discussions.

Data Availability

All relevant data are within the manuscript and its Supporting information files. The code used to numerically solve the equations derived in the manuscript is available at https://github.com/ChiaraGastaldi/pub_Gastaldi_2021_AttractorNetwork.git.

Funding Statement

WG and CG were supported by the Swiss National Science Foundation (www.nsf.gov), grant agreement 200020_184615 and by the European Union Horizon 2020 Framework Program (https://ec.europa.eu/programmes/horizon2020/) under agreement no. 785907 (HumanBrain Project, SGA2). RQQ acknowledges support from Biotechnology and Biological Sciences Research Council. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. Quiroga R Quian, Reddy Leila, Kreiman Gabriel, Koch Christof, and Fried Itzhak. Invariant visual representation by single neurons in the human brain. Nature, 435(7045):1102, 2005. doi: 10.1038/nature03687 [DOI] [PubMed] [Google Scholar]
  • 2. Ison Matias J and Quiroga Rodrigo Quian. Selectivity and invariance for visual object perception. Front Biosci, 13:4889–4903, 2008. doi: 10.2741/3048 [DOI] [PubMed] [Google Scholar]
  • 3. Quiroga Rodrigo Quian. Neural representations across species. Science, 363(6434):1388–1389, 2019. doi: 10.1126/science.aaw8829 [DOI] [PubMed] [Google Scholar]
  • 4. Waydo Stephen, Kraskov Alexander, Quiroga Rodrigo Quian, Fried Itzhak, and Koch Christof. Sparse representation in the human medial temporal lobe. Journal of Neuroscience, 26(40):10232–10234, 2006. doi: 10.1523/JNEUROSCI.2101-06.2006 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5. Ison Matias J, Quiroga Rodrigo Quian, and Fried Itzhak. Rapid encoding of new memories by individual neurons in the human brain. Neuron, 87(1):220–230, 2015. doi: 10.1016/j.neuron.2015.06.016 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6. De Falco Emanuela, Ison Matias J, Fried Itzhak, and Quiroga Rodrigo Quian. Long-term coding of personal and universal associations underlying the memory web in the human brain. Nature communications, 7:13408, 2016. doi: 10.1038/ncomms13408 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7. Rey Hernan G, De Falco Emanuela, Ison Matias J, Valentin Antonio, Alarcon Gonzalo, Selway Richard, Richardson Mark P, and Quiroga Rodrigo Quian. Encoding of long-term associations through neural unitization in the human medial temporal lobe. Nature communications, 9(1):1–13, 2018. doi: 10.1038/s41467-018-06870-2 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8. Rey Hernan G., Gori Belen, Chaure Fernando J., Collavini Santiago, Blenkmann Alejandro O., Seoane Pablo, Seoane Eduardo, Kochen Silvia, and Quiroga Rodrigo Quian. Single neuron coding of identity in the human hippocampal formation. Current Biology, 30(6):1152–1159.e3, 2020. doi: 10.1016/j.cub.2020.01.035 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9. Romani Sandro, Pinkoviezky Itai, Rubin Alon, and Tsodyks Misha. Scaling laws of associative memory retrieval. Neural computation, 25(10):2523–2544, 2013. doi: 10.1162/NECO_a_00499 [DOI] [PubMed] [Google Scholar]
  • 10. Recanatesi Stefano, Katkov Mikhail, Romani Sandro, and Tsodyks Misha. Neural network model of memory retrieval. Frontiers in computational neuroscience, 9:149, 2015. doi: 10.3389/fncom.2015.00149 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11. Recanatesi Stefano, Katkov Mikhail, and Tsodyks Misha. Memory states and transitions between them in attractor neural networks. Neural computation, 29(10):2684–2711, 2017. doi: 10.1162/neco_a_00998 [DOI] [PubMed] [Google Scholar]
  • 12. Naim Michelangelo, Katkov Mikhail, Romani Sandro, and Tsodyks Misha. Fundamental law of memory recall. Physical Review Letters, 124(1):018101, 2020. doi: 10.1103/PhysRevLett.124.018101 [DOI] [PubMed] [Google Scholar]
  • 13. Hopfield John J. Neural networks and physical systems with emergent collective computational abilities. Proceedings of the national academy of sciences, 79(8):2554–2558, 1982. doi: 10.1073/pnas.79.8.2554 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14. Weisbuch Gérard and Fogelman-Soulié Françoise. Scaling laws for the attractors of hopfield networks. Journal de Physique Lettres, 46(14):623–630, 1985. doi: 10.1051/jphyslet:019850046014062300 [DOI] [Google Scholar]
  • 15. Amit Daniel J and Amit Daniel J. Modeling brain function: The world of attractor neural networks. Cambridge university press, 1992. [Google Scholar]
  • 16. Kanter I and Sompolinsky Haim. Associative recall of memory without errors. Physical Review A, 35(1):380, 1987. doi: 10.1103/PhysRevA.35.380 [DOI] [PubMed] [Google Scholar]
  • 17. Tsodyks Mikhail V and Feigel’man Mikhail V. The enhanced storage capacity in neural networks with low activity level. EPL (Europhysics Letters), 6(2):101, 1988. doi: 10.1209/0295-5075/6/2/002 [DOI] [Google Scholar]
  • 18. Tonegawa Susumu, Pignatelli Michele, Roy Dheeraj S, and Ryan Tomás J. Memory engram storage and retrieval. Current opinion in neurobiology, 35:101–109, 2015. doi: 10.1016/j.conb.2015.07.009 [DOI] [PubMed] [Google Scholar]
  • 19. Josselyn Sheena A and Tonegawa Susumu. Memory engrams: Recalling the past and imagining the future. Science, 367 (6473), 2020. doi: 10.1126/science.aaw4325 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20. Rennó-Costa César, Lisman John E, and Verschure Paul FMJ. A signature of attractor dynamics in the ca3 region of the hippocampus. PLoS Comput Biol, 10(5):e1003641, 2014. doi: 10.1371/journal.pcbi.1003641 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21. Wills Tom J, Lever Colin, Cacucci Francesca, Burgess Neil, and O’Keefe John. Attractor dynamics in the hippocampal representation of the local environment. Science, 308(5723):873–876, 2005. doi: 10.1126/science.1108905 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22. Bös Siegfried, Kühn R, and van Hemmen JL. Martingale approach to neural networks with hierarchically structured information. Zeitschrift für Physik B Condensed Matter, 71(2):261–271, 1988. doi: 10.1007/BF01312798 [DOI] [Google Scholar]
  • 23. Boboeva Vezha, Brasselet Romain, and Treves Alessandro. The capacity for correlated semantic memories in the cortex. Entropy, 20(11):824, 2018. doi: 10.3390/e20110824 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24. Pereira Ulises and Brunel Nicolas. Attractor dynamics in networks with learning rules inferred from in vivo data. Neuron, 99(1):227–238.e4, 2018. doi: 10.1016/j.neuron.2018.05.038 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25. Hopfield J. J. Neurons with graded response have collective computational properties like those of two-state neurons. Proceedings of the National Academy of Sciences, 81(10):3088–3092, 1984. doi: 10.1073/pnas.81.10.3088 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26. Katkov Mikhail, Romani Sandro, and Tsodyks Misha. Effects of long-term representations on free recall of unrelated words. Learning & Memory, 22(2):101–108, 2015. doi: 10.1101/lm.035238.114 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27. Guzman Segundo Jose, Schlögl Alois, Frotscher Michael, and Jonas Peter. Synaptic mechanisms of pattern completion in the hippocampal ca3 network. Science, 353(6304):1117–1123, 2016. doi: 10.1126/science.aaf1836 [DOI] [PubMed] [Google Scholar]
  • 28. Quiroga Rodrigo Quian. Concept cells: the building blocks of declarative memory functions. Nature Reviews Neuroscience, 13(8):587–597, 2012. doi: 10.1038/nrn3251 [DOI] [PubMed] [Google Scholar]
  • 29. Quiroga Rodrigo Quian. Plugging in to human memory: advantages, challenges, and insights from human single-neuron recordings. Cell, 179(5):1015–1032, 2019. doi: 10.1016/j.cell.2019.10.016 [DOI] [PubMed] [Google Scholar]
  • 30. Podlaski William F, Agnes Everton J, and Vogels Tim P. Context-modular memory networks support high-capacity, flexible, and robust associative memories. BioRxiv, 2020. [Google Scholar]
  • 31. Russo Eleonora, Namboodiri Vijay MK, Treves Alessandro, and Kropff Emilio. Free association transitions in models of cortical latching dynamics. New Journal of Physics, 10(1):015008, 2008. doi: 10.1088/1367-2630/10/1/015008 [DOI] [Google Scholar]
  • 32. Russo Eleonora and Treves Alessandro. Cortical free-association dynamics: Distinct phases of a latching network. Physical Review E, 85(5):051920, 2012. doi: 10.1103/PhysRevE.85.051920 [DOI] [PubMed] [Google Scholar]
  • 33. Akrami Athena, Russo Eleonora, and Treves Alessandro. Lateral thinking, from the hopfield model to cortical dynamics. Brain research, 1434:4–16, 2012. doi: 10.1016/j.brainres.2011.07.030 [DOI] [PubMed] [Google Scholar]
  • 34. Amit Daniel J and Brunel Nicolas. Model of global spontaneous activity and local structured activity during delay periods in the cerebral cortex. Cerebral cortex (New York, NY: 1991), 7(3):237–252, 1997. [DOI] [PubMed] [Google Scholar]
  • 35. Amit Daniel J, Gutfreund Hanoch, and Sompolinsky Haim. Information storage in neural networks with low levels of activity. Physical Review A, 35(5):2293, 1987. doi: 10.1103/PhysRevA.35.2293 [DOI] [PubMed] [Google Scholar]
  • 36. Mézard Marc, Parisi Giorgio, and Virasoro Miguel. Spin glass theory and beyond: An Introduction to the Replica Method and Its Applications, volume 9. World Scientific Publishing Company, 1987. [Google Scholar]
  • 37. Shamir Maoz and Sompolinsky Haim. Thouless-anderson-palmer equations for neural networks. Physical Review E, 61(2):1839, 2000. doi: 10.1103/PhysRevE.61.1839 [DOI] [PubMed] [Google Scholar]
  • 38. Shiino Masatoshi and Fukai Tomoki. Self-consistent signal-to-noise analysis and its application to analogue neural networks with asymmetric connections. Journal of Physics A: Mathematical and General, 25(7):L375, 1992. doi: 10.1088/0305-4470/25/7/017 [DOI] [Google Scholar]
  • 39. Amit Daniel J, Gutfreund Hanoch, and Sompolinsky Haim. Storing infinite numbers of patterns in a spin-glass model of neural networks. Physical Review Letters, 55(14):1530, 1985. doi: 10.1103/PhysRevLett.55.1530 [DOI] [PubMed] [Google Scholar]
  • 40. Andersen Per, Morris Richard, Amaral David, Bliss Tim, and O’Keefe John. The hippocampus book. Oxford university press, 2006. [Google Scholar]
  • 41. Wilson Hugh R and Cowan Jack D. Excitatory and inhibitory interactions in localized populations of model neurons. Biophysical journal, 12(1):1–24, 1972. doi: 10.1016/S0006-3495(72)86068-5 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42. Muscinelli S. P., Gerstner W., and Schwalger T. How single neuron properties shape chaotic dynamics and signal transmission in random neural networks. PLOS Comput. Biol., 15(6):e1007122, June 2019. doi: 10.1371/journal.pcbi.1007122 [DOI] [PMC free article] [PubMed] [Google Scholar]
PLoS Comput Biol. doi: 10.1371/journal.pcbi.1009691.r001

Decision Letter 0

Lyle J Graham, Abigail Morrison

12 Jul 2021

Dear Mrs Gastaldi,

Thank you very much for submitting your manuscript "When shared concept cells support associations: theory of overlapping memory engrams" for consideration at PLOS Computational Biology. As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. The reviewers appreciated the attention to an important topic. Based on the reviews, we are likely to accept this manuscript for publication, providing that you modify the manuscript according to the review recommendations.

Please prepare and submit your revised manuscript within 30 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email.

When you are ready to resubmit, please upload the following:

[1] A letter containing a detailed list of your responses to all review comments, and a description of the changes you have made in the manuscript. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out

[2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file).

Important additional instructions are given below your reviewer comments.

Thank you again for your submission to our journal. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,

Abigail Morrison

Associate Editor

PLOS Computational Biology

Lyle Graham

Deputy Editor

PLOS Computational Biology

***********************

A link appears below if there are any accompanying review attachments. If you believe any reviews to be missing, please contact ploscompbiol@plos.org immediately:

[LINK]

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: The authors use a computational model of attractor neural network for modeling association between concepts through shared concept cells. Their main findings are: (i) there is a minimum fraction c_min of shared cells, below which assemblies can't be recalled simultaneously, i.e. no association is created between the two concepts; (ii) there is a maximum fraction of shared cells c_max, above which assemblies can't be recalled individually, i.e. the two concepts are merged into one; (iii) c_min exists in the presence of global inhibition, and its value should be above chance level; (iv) the value of c_max depends on the threshold and steepness of the frequency-current curves, and does not depend on memory load; (v) in the presence of a periodically modulated background signal, the recall takes the form of association chains; (vi) predictions of a non hierarchical iterative overlap generating model match experimental data on the number of concepts to which a neuron responds, suggesting MTL encodes memory engrams in a non hierarchical way.

I find the results interesting and relevant for the PLOS CB audience. It is also well organized and clearly written.

Some minor comments:

Section - Association chains

It would be interesting to have a bit more information about the reactivation. Is there an order? What defines it? When does the cycle close? Is it clear what defines the length/order of a cycle?

Section - How does a network embed groups of overlapping memories?

The motivation for the three different algorithms is not very clear. Hierarchical vs. non-hierarchical is clear, but why two non-hierarchical algorithms are used, for example, is not well motivated. A brief description of, or intuition for, the three algorithms in the Results could help.

Section - Robustness to heterogeneity

What happens to the model predictions for further dilution of the weight matrix (d<0.8)? Some discussion about this would be interesting, considering the lower connection probability found in CA3 networks (Guzman et al., Science, 2016. DOI: 10.1126/science.aaf1836).

Typos:

Caption Figure 4C: "By decreasing their mean activity γ,"

Line 379: "transitions could also triggered"

Line 418: "can be interpret"

Reviewer #2: I reviewed a previous version of this manuscript (in thesis chapter form). The (minor) issues I brought up have been resolved in this version. I report below my previous summary of the findings and include suggestions for a couple of possible additional discussion points.

The focus of this work is related to the experimental observation of “concept” cells, discovered by recording neurons from the medial temporal lobe (MTL) of human subjects suffering from epilepsy. These neurons are active whenever specific abstract information - e.g., a specific person or object - is presented to or retrieved by a subject. In several experiments from various labs, it has been estimated that the representation of a given concept in MTL circuits is very sparse: only a small fraction of neurons is active on average for a given concept. These observations, and several other earlier findings related to sparse memory representations of other stimuli in other species, have driven a considerable amount of research on artificial neural networks capable of storing and retrieving many sparse memory items. In these networks, memory items are represented by the activation of a small group of neurons. The choice of which neuron encodes which item is decided, independently for each neuron and each item, by tossing a biased coin. This bias determines the sparsity of the representations. The synaptic weight between every pair of neurons is then determined according to their desired activation across all the items. With an appropriate choice of the weights, the activation of a group of neurons becomes an attractor of the network dynamics.
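
To make this construction concrete, the following minimal sketch stores a few sparse random patterns with a covariance-type Hebbian rule and retrieves one of them from a degraded cue; the rule, the threshold dynamics and all parameter values are illustrative stand-ins and are not taken from the manuscript.

import numpy as np

# Illustrative sketch of a sparse attractor network: random sparse patterns
# stored with a covariance ("Hebbian") rule, then retrieved from a noisy cue.
# All values below are assumptions for illustration, not the paper's model.
rng = np.random.default_rng(1)
N, P, gamma = 2000, 10, 0.05       # neurons, stored patterns, sparsity

# Each neuron joins each pattern independently with probability gamma.
xi = (rng.random((P, N)) < gamma).astype(float)

# Weights determined by the desired activations across all items.
J = (xi - gamma).T @ (xi - gamma) / (gamma * (1 - gamma) * N)
np.fill_diagonal(J, 0.0)

# Start from a degraded cue of pattern 0 and iterate a simple threshold dynamics.
state = xi[0] * (rng.random(N) < 0.8)     # 80% of the pattern's neurons active
theta = 0.5                               # assumed activation threshold
for _ in range(20):
    state = (J @ state > theta).astype(float)

print("overlap with pattern 0:", state @ xi[0] / xi[0].sum())

With these illustrative settings the degraded cue typically converges back onto the stored pattern within a few iterations, i.e., each stored pattern acts as an attractor of the dynamics.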

The questions addressed in this work arise from the discrepancies between the experimental observations and the models. The standard assumption of random and uncorrelated sparse encoding of patterns in attractor models has an immediate consequence for the conjunctive encoding of multiple items. For instance, if a neuron participates in the encoding of a memory item with probability p, the probability that the same neuron will encode two memory items is p^2. This basic observation is at odds with empirical findings from statistical estimates of the encoding of concept cells. In the experiments, pairs of concepts have a higher probability of being encoded by a concept cell than the chance level defined above, especially if the two concepts are related (e.g., semantically). As a first question, the authors hence asked if and how attractor networks can deal with the higher-than-chance correlation between representations of distinct items. The problem is systematically attacked with a mathematical analysis of a new model that incorporates these extra correlations. First, the scenario in which two correlated patterns are stored in an attractor network is analyzed. The problem is studied with a mean-field description of the system, corroborated by numerical solutions of the full network dynamics. As a result, the maximal amount of correlation between two patterns that still allows the network to retrieve the individual patterns, as opposed to only a mixture of the two, could be characterized. From this complete description of the possible dynamical regimes for two correlated patterns, the authors moved on to examine several variations of the problem within the theory: (i) two correlated patterns stored together with many other uncorrelated patterns; (ii) several stored patterns where distinct pairs are correlated; (iii) correlations among more than two patterns. This is a thorough characterization of the possible states of networks storing correlated patterns in a variety of conditions, and it also shows that the estimated sparsity and correlations in concept cells are within the bounds computed in the paper. This work shows that the standard attractor network models can be extended to account for interesting electrophysiological measurements in humans.
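
The chance-level argument can be checked with a few lines of code; the values of N and p below are illustrative only.

import numpy as np

# Chance-level overlap between two independently drawn sparse assemblies:
# each neuron joins an assembly with probability p, so on average p**2 * N
# neurons are shared, i.e., a fraction p of either assembly.
rng = np.random.default_rng(0)
N, p = 1_000_000, 0.0023          # illustrative values

a = rng.random(N) < p
b = rng.random(N) < p

print("expected shared neurons:", p**2 * N)
print("observed shared neurons:", int(np.count_nonzero(a & b)))
print("shared fraction of assembly a:", np.count_nonzero(a & b) / np.count_nonzero(a))

The number of shared neurons concentrates around p^2 N, so the shared fraction of either assembly is close to p itself, well below the above-chance overlaps reported experimentally for associated concepts.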

A second question addressed in the paper arises from another set of experimental and theoretical observations related to human memory. In free recall experiments, human participants are presented with a list of words, one word at a time. The participants are then asked to retrieve as many words as possible, in no particular order. There are a number of interesting systematic effects in the way the words are recalled in these experiments. Many of these effects can be explained by a variant of the standard attractor networks described above, where some additional dynamics (e.g., periodic modulation of inhibition or adaptation) induce transitions between the attractors in the network, mimicking the sequential retrieval of words in free recall. In these models, the transitions are determined by the correlations between representations, which are assumed to arise from the random overlaps between the uncorrelated representations. This mechanism relies on the finite size of the network models (and the not-so-sparse representations used in these models). The second question addressed in the paper is whether and how transitions between the attractors would occur with very sparse, but correlated, patterns. In particular, the lower bounds on correlations that would allow the network dynamics to transition between attractors have been examined. The study of this problem allowed the authors to show that the empirical estimates of correlation would also allow the network to perform retrieval in a way that could account for the results from free recall experiments.

Lastly, the paper examines three different models for generating correlated patterns (including models that rely on hierarchical representations of concepts). The resulting probability distributions for a neuron to be active for k different concepts are then compared with experimental data. The results argue against a hierarchical encoding in the MTL recordings. I particularly appreciate the effort of linking the theory with experimental data.

In summary, this work advances our understanding of computations with discrete attractors in recurrent neural networks. I think that this work is a novel useful contribution to the study of memory processes in theoretical neuroscience. This work also strengthens the link between theoretical and experimental approaches to the study of memory phenomena.

I recommend publication.

Below a couple of points that the authors might want to consider:

- The paper primarily examines correlations between disjoint pairs of stored items. It could be useful to see a discussion about scenarios where any pair has above chance correlations (e.g., some of the neurons tend to be active almost regardless of the identity of the memory item). Some concrete examples in hippocampus, in the context of spatial coding, can be found in (Rich, Liaw & Lee, 2014; Grosmark & Buzsaki, 2016) among others.

- The paper examines associative transitions between memory items in the presence of correlations, motivated by the experimental and modeling work on free recall. One prominent feature of free recall is the sublinear scaling of the number of retrieved items vs the number of presented items. It would be useful to see a discussion of these scaling laws within the context of the theory proposed by the authors.

- In Discussion, the context-dependent disentanglement of memories could be compared and contrasted with the work described in (Podlaski, Agnes & Vogels, bioRxiv 2020).

Minor: “backgrong” typo in Fig 2B legend

**********

Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code —e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: No: I have not seen a link to data and code, apologies if it's there and I missed it.

**********

PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Figure Files:

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

Data Requirements:

Please note that, as a condition of publication, PLOS' data policy requires that you make available all data used to draw the conclusions outlined in your manuscript. Data must be deposited in an appropriate repository, included within the body of the manuscript, or uploaded as supporting information. This includes all numerical values that were used to generate graphs, histograms, etc. For an example in PLOS Biology see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5.

Reproducibility:

To enhance the reproducibility of your results, we recommend that you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option to publish peer-reviewed clinical study protocols. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols

References:

Review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript.

If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1009691.r003

Decision Letter 1

Lyle J Graham, Abigail Morrison

29 Nov 2021

Dear Mrs Gastaldi,

We are pleased to inform you that your manuscript 'When shared concept cells support associations: theory of overlapping memory engrams' has been provisionally accepted for publication in PLOS Computational Biology.

Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow-up email. A member of our team will be in touch with a set of requests.

Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.

IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.

Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us now if you or your institution is planning to press release the article. All press must be co-ordinated with PLOS.

Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Computational Biology. 

Best regards,

Abigail Morrison

Associate Editor

PLOS Computational Biology

Lyle Graham

Deputy Editor

PLOS Computational Biology

***********************************************************

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1009691.r004

Acceptance letter

Lyle J Graham, Abigail Morrison

16 Dec 2021

PCOMPBIOL-D-21-00730R1

When shared concept cells support associations: theory of overlapping memory engrams

Dear Dr Gastaldi,

I am pleased to inform you that your manuscript has been formally accepted for publication in PLOS Computational Biology. Your manuscript is now with our production department and you will be notified of the publication date in due course.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.

Soon after your final files are uploaded, unless you have opted out, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

Thank you again for supporting PLOS Computational Biology and open-access publishing. We are looking forward to publishing your work!

With kind regards,

Olena Szabo

PLOS Computational Biology | Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom ploscompbiol@plos.org | Phone +44 (0) 1223-442824 | ploscompbiol.org | @PLOSCompBiol

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Text. Supplementary text for “When shared concept cells support associations: Theory of overlapping memory engrams”.

    (PDF)

    S1 Fig. Evolution of m1(t) and m2(t) according to the mean-field dynamics for super-critical correlation.

    The system is initialized in the rest state. During the stimulation period (0.5–8s) m1(t) receives external input. A) The system state is plotted in the phase-plane before, during and after stimulation respectively. B) The delay between the activation of m1(t) and m2(t) is highlighted. Parameters: γ = 0.002, b̂ = 100, ĥ0 = 0.25, rmax = 1, τ̂ = τ = 1/rmax, α = 0, C = 0.2.

    (TIF)

    S2 Fig. Estimation of the correlation range in which retrieval of a chain of concepts is possible.

    A) Estimation of the maximum correlation, which corresponds to the loss of the two single retrieval states, when J0 is lowest. B) Estimation of the minimum correlation, which corresponds to the creation of the stable fixed point at m1 = m2 > 0, when the inhibition J0 is at its maximum. In both A and B adaptation is frozen and θ = 0. Parameters: γ = 0.002, α = 0, b̂ = 50, ĥ0 = 0, min(Ĵ0) = 0.7, max(Ĵ0) = 1.2.

    (TIF)

    S3 Fig

    A) Dynamical mean-field solutions for m1 and m2 in the case of two independent patterns. B) Phase planes corresponding to the minimum (J0 = 0.7) and maximum (J0 = 1.2) value of inhibition in the case of two independent patterns. C,D) Same as A and B, but for correlated patterns, C = 0.2. Parameters in A–D: γ = 0.1, α = 0, b = 100. E,F) Same as C and D but in the low activity regime and for independent patterns. G,H) Same as C and D but in the low activity regime. Parameters in E–H: γ = 0.002, α = 0, b = 100, τ_θ = 45, T = 0.015, T_J0 = 25. For the dynamics: resolution = 200, factor = 1. For the phase-planes: resolution = 1000, factor = 1, upper bound = 1.2, lower bound = -0.2 (same as Figs 2 and 4).

    (TIF)

    S4 Fig

    Retrieval dynamics in the presence of adaptation according to the mean-field equations (dashed lines), and comparison with Fig 4A (shaded solid lines). A) Only two patterns are correlated. B) Four patterns are correlated. Parameters: N = 10^4, P = 16 in full network simulations and α = 0 in the mean-field; γ = 0.002, τ_θ = 45, T = 0.015, T_J0 = 25 in both.

    (TIF)

    S5 Fig. Equivalent of Fig 9 but with the parameters extracted from [24].

    In A and B the transfer function parameters are taken as those of the function ϕ in [24]: A = 3.55, rmax = 76.2, b = 0.82, h0 = 2.46. In C and D, instead, we estimated the parameters of a sigmoid function that fits the function f(ϕ) in [24] as follows: A = 3.55, rmax = 0.83, b = 4.35, h0 = 1.7. In all plots γ = 0.001. A and C) The phase-plane for c = 0 shows the position of fixed points. B and D) Bifurcation diagram and critical fraction of shared neurons according to different parameter choices.

    (TIF)

    S6 Fig. Comparison between model prediction and data.

    Probability of finding a neuron responding to a given number of concepts, as measured from experimental data (black stars), as predicted by the three algorithms (as in Fig 6, the area within one standard deviation is shaded), and as theoretically predicted for the indicator neuron model (light blue) and for the hierarchical generative model (light green) obtained from Eq (99). The theoretical predictions are not smooth curves because the subgroup sizes are matched to the dataset.

    (TIF)

    Attachment

    Submitted filename: Answer to Reviewers.pdf

    Data Availability Statement

    All relevant data are within the manuscript and its Supporting information files. The code used to numerically solve the equations derived in the manuscript is available at https://github.com/ChiaraGastaldi/pub_Gastaldi_2021_AttractorNetwork.git.

