PLOS Computational Biology. 2022 Oct 10; 18(10): e1010547. doi: 10.1371/journal.pcbi.1010547

Multiple bumps can enhance robustness to noise in continuous attractor networks

Raymond Wang 1,2, Louis Kang 2,*
Editor: Xuexin Wei
PMCID: PMC9584540  PMID: 36215305

Abstract

A central function of continuous attractor networks is encoding coordinates and accurately updating their values through path integration. To do so, these networks produce localized bumps of activity that move coherently in response to velocity inputs. In the brain, continuous attractors are believed to underlie grid cells and head direction cells, which maintain periodic representations of position and orientation, respectively. These representations can be achieved with any number of activity bumps, and the consequences of having more or fewer bumps are unclear. We address this knowledge gap by constructing 1D ring attractor networks with different bump numbers and characterizing their responses to three types of noise: fluctuating inputs, spiking noise, and deviations in connectivity away from ideal attractor configurations. Across all three types, networks with more bumps experience less noise-driven deviations in bump motion. This translates to more robust encodings of linear coordinates, like position, assuming that each neuron represents a fixed length no matter the bump number. Alternatively, we consider encoding a circular coordinate, like orientation, such that the network distance between adjacent bumps always maps onto 360 degrees. Under this mapping, bump number does not significantly affect the amount of error in the coordinate readout. Our simulation results are intuitively explained and quantitatively matched by a unified theory for path integration and noise in multi-bump networks. Thus, to suppress the effects of biologically relevant noise, continuous attractor networks can employ more bumps when encoding linear coordinates; this advantage disappears when encoding circular coordinates. Our findings provide motivation for multiple bumps in the mammalian grid network.

Author summary

Our brains maintain an internal sense of location and direction so we can, for example, find our way to the door if the lights go off. A class of neural circuits called continuous attractor networks is believed to be responsible for this ability. These circuits must be resilient against the myriad forms of imperfections and random fluctuations present in the brain, which can degrade the accuracy of their encoded information. We have discovered a new way in which continuous attractor networks can improve their robustness to noise: they should distribute their activity among multiple regions in the network, called bumps, instead of concentrating it in a single bump. Bump number is a fundamental feature of continuous attractor networks, but its connection to error suppression has never been appreciated. A recent experiment in rodents suggests that one such network indeed contains multiple regions of activity; our finding provides motivation for why such a configuration may have evolved.

Introduction

Continuous attractor networks (CANs) sustain a set of activity patterns that can be smoothly morphed from one to another along a low-dimensional manifold [1–3]. Network activity is typically localized into attractor bumps, whose positions along the manifold can represent the value of a continuous variable. These positions can be set by external stimuli, and their persistence serves as a memory of the stimulus value. Certain CAN architectures are also capable of a feature called path integration. Instead of receiving the stimulus value directly, the network receives its changes and integrates over them by synchronously moving the attractor bump [4–6]. Path integration allows systems to estimate an external state based on internally perceived changes, which is useful in the absence of ground truth.

Path-integrating CANs have been proposed as a mechanism through which brains encode various physical coordinates. Head direction cells in mammals and compass neurons in insects encode spatial orientation by preferentially firing when the animal faces a particular direction relative to landmarks (Fig 1A, top; Refs [7] and [8]). They achieve this as members of 1D CANs whose attractor manifolds have ring topologies [9, 10]. For the case of compass neurons, a ring structure also exists anatomically, and its demonstration of continuous attractor dynamics is well-established [8, 11–13]. Grid cells in mammals encode position by preferentially firing at locations that form a triangular lattice in 2D space (1D analogue in Fig 1A, bottom; Ref [14]). They are thought to form a 2D CAN with toroidal topology [15–18], and mounting experimental evidence supports this theory [19–22]. The ability for head direction cells, compass neurons, and grid cells to maintain their tunings in darkness without external cues demonstrates that these CANs can path integrate [8, 14, 23].

Fig 1. Continuous attractor networks with any number of bumps can produce head direction cells and grid cells.


(A) Desired tuning curves of a head direction cell and a 1D grid cell. (B) Orientation and position coordinates whose changes drive bump motion. (C) One- and two-bump ring attractor networks. Each black neuron produces the desired tuning curves in A. In the two-bump network, the coupling to coordinate changes is half as strong, and the second bump is labeled for clarity.

CANs also appear in studies of other brain regions and neural populations. Signatures of continuous attractor dynamics have been detected in the prefrontal cortex during spatial working memory tasks [24–26]. Theorists have further invoked CANs to explain place cells [27, 28], hippocampal view cells [29], eye tracking [4, 6], visual orientation tuning [30, 31], and perceptual decision making [32, 33]. Thus, CANs are a crucial circuit motif throughout the brain, and better understanding their performance would provide meaningful insights into neural computation.

One factor that strongly affects the performance of CANs in path integration is biological noise. To accurately represent physical coordinates, attractor bumps must move in precise synchrony with the animal’s trajectory. Hence, the bump velocity must remain proportional to the driving input that represents coordinate changes [18]. Different sources of noise produce different types of deviations from this exact relationship, all of which lead to path integration errors. While noisy path-integrating CANs have been previously studied [10, 18, 34, 35], these works did not investigate the role of bump number. CANs with different connectivities can produce different numbers of attractor bumps, which are equally spaced throughout the network and perform path integration by moving in unison [16, 18, 36]. Two networks with different bump numbers have the same representational capability (Fig 1). They can share the same attractor manifold and produce neurons with identical tuning curves, as long as the coupling strength between bump motion and driving input scales appropriately. The computational advantages of having more or fewer bumps are unknown.

Our aim is to elucidate the relationship between bump number and robustness to noise. We first develop a rigorous theoretical framework for studying 1D CANs that path integrate and contain multiple bumps. Our theory predicts the number, shape, and speed of bumps. We then introduce three forms of noise. The first is Gaussian noise added to the total synaptic input, which can represent fluctuations in a broad range of cellular processes occurring at short timescales. The second is Poisson spiking noise. The third is noise in synaptic connectivity strengths; the ability for bumps to respond readily to driving inputs is generally conferred by a precise network architecture. We add Gaussian noise to the ideal connectivity and evaluate path integration in this setting. The first two forms of noise are independent over time and neurons, in contrast to the third. We find that networks with more bumps can better resist all three forms of noise under certain encoding assumptions. These observations are explained by our theoretical framework with simple scaling arguments. The following Results section presents all simulation findings and major theoretical conclusions; complete theoretical derivations are found in the Theoretical model section.

Results

Bump formation in a ring attractor network

We study a 1D ring attractor network that extends the model of Ref [37] to allow for multiple attractor bumps. It contains two neural populations α ∈ {L, R} at each network position x, with N neurons in each population (Fig 2A). Each neuron is described by its total synaptic input g that obeys the following dynamics:

\tau \frac{dg_\alpha(x,t)}{dt} + g_\alpha(x,t) = \sum_\beta \int dy\, W_\beta(x,y)\, s_\beta(y,t) + A \pm_\alpha \gamma b(t) + \zeta_\alpha(x,t), \qquad (1)

where ±_L means − and ±_R means +. Aside from spiking simulations, firing rates s are given by

s_\alpha(x,t) = \phi[g_\alpha(x,t)], \qquad (2)

where ϕ is a nonlinear activation function. For all simulations in this Results section, we take ϕ to be the rectified linear unit (ReLU) activation function (Eq 35). Our theoretical formulas for diffusion coefficients and velocities in this section also assume a ReLU ϕ. In S1 Text, we consider a logistic ϕ instead and find that all major conclusions are preserved (Fig A in S1 Text), and in the Theoretical methods section, we derive most expressions for general ϕ. W is the synaptic connectivity and only depends on the presynaptic population β. It obeys a standard continuous attractor architecture based on local inhibition that is strongest at an inhibition distance l. Each population has its synaptic outputs shifted by a small distance ξ ≪ l in opposite directions. We use the connectivity profile described in Fig 2B and Eq 38 for all simulations, but all theoretical expressions in this Results section are valid for any W. A is the resting input to all neurons. The driving input, or drive, b is proportional to changes in the coordinate encoded by the network; for the physical coordinates in Fig 1B, it represents the animal’s velocity obtained from self-motion cues. In our results, b is constant in time. It is coupled to the network with strength γ. We will consider various forms of noise ζ. Finally, τ is the neural time constant.
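The dynamics in Eqs 1 and 2 are straightforward to integrate numerically. The sketch below is not the authors' code; it assumes a simple forward-Euler discretization, a ReLU activation, precomputed connectivity matrices W[0] and W[1] for the two presynaptic populations, and illustrative parameter values.

```python
import numpy as np

def simulate(W, b=0.0, A=1.0, gamma=1.0, tau=0.01, dt=0.001, T=1.0, sigma=0.0):
    """Forward-Euler integration of Eqs 1-2 for populations L (row 0) and R (row 1)."""
    N = W[0].shape[0]
    g = np.random.uniform(0.0, 1.0, size=(2, N))    # random initial synaptic inputs
    signs = np.array([-1.0, +1.0])                  # -gamma*b for L, +gamma*b for R
    for _ in range(int(T / dt)):
        s = np.maximum(g, 0.0)                      # Eq 2 with ReLU phi (Eq 35)
        recurrent = W[0] @ s[0] + W[1] @ s[1]       # sum over presynaptic populations
        zeta = sigma * np.random.randn(2, N)        # optional input noise (Eq 9)
        g += (dt / tau) * (-g + recurrent + A + signs[:, None] * gamma * b + zeta)
    return g
```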

Fig 2. Bump formation in a ring attractor network.


(A) Network schematic with populations L and R and locally inhibitory connectivity W. (B and C) Networks with 200 neurons and 3 bumps. (B) Connectivity weights for a neuron at the origin. The inhibition distance is l = 29 and the connectivity shift is ξ = 2. (C) Steady-state synaptic inputs. Curves for both populations lie over each other. With a ReLU activation function, the firing rates follow the solid portions of the colored lines and are 0 over the dashed portions. The bump distance is λ = 200/3. Thick gray line indicates Eq 4. (D and E) Networks with 500 neurons. (D) More bumps and shorter bump distances are produced by smaller inhibition distances. Points indicate data from 10 replicate simulations. Line indicates Eq 5. (E) The inhibition distance l = 55 corresponds to the black point in D with λ = 125 and M = 4. These values also minimize the Lyapunov functional (Eq 6), which varies smoothly across λ for infinite networks (line) and takes discrete values for finite networks (points). (F) The scaled bump shape remains invariant across network sizes and bump numbers, accomplished by rescaling connectivity strengths according to Eq 7. Curves for different parameters lie over one another.

With no drive b = 0 and no noise ζ = 0, the network dynamics in Eqs 1 and 2 can be simplified to

\tau \frac{dg(x,t)}{dt} + g(x,t) = 2\int dy\, W(x - y)\, \phi[g(y,t)] + A, \qquad (3)

where 2W(x − y) = ∑β Wβ(x, y) and the synaptic inputs g are equal between the two populations. This baseline equation evolves towards a periodic steady-state g with approximate form (see also Ref [38]):

g(x) = a \cos\frac{2\pi(x - x_0)}{\lambda} + d. \qquad (4)

Expressions for a and d are given in the Theoretical model section (Eq 60). The firing rates s(x) = ϕ[g(x)] exhibit attractor bumps with periodicity λ, a free parameter that we call the bump distance (Fig 2C). x0 is the arbitrary position of one of the bumps. It parameterizes the attractor manifold with each value corresponding to a different attractor state up to λ.

The bump number M = N/λ is determined through λ. It can be predicted by the fastest-growing mode in a linearized version of the dynamics (Eq 43; Refs [39] and [40]). The mode with wavenumber q and corresponding wavelength 2π/q grows at rate (2W̃(q) − 1)/τ, where W̃(q) is the Fourier transform of W(x). Thus,

\frac{2\pi}{\lambda} = \underset{q}{\mathrm{argmax}}\; \tilde{W}(q). \qquad (5)

Fig 2D shows that simulations follow the predicted λ and M over various inhibition distances l. Occasionally for small l, a different mode with a slightly different wavelength will grow quickly enough to dominate the network. A periodic network enforces an integer bump number, which discretizes the allowable wavelengths and prevents changes in λ and M once they are established. In an aperiodic or infinite system, the wavelength can smoothly vary from an initial value to a preferred length over the course of a simulation [18, 41]. To determine this preferred λ theoretically, we notice that the nonlinear dynamics in Eq 3 obey the Lyapunov functional

L = -\int dx\, dy\, W(x - y)\, s(x)\, s(y) + \int dx \int_0^{s(x)} d\rho\, \phi^{-1}[\rho] - A \int dx\, s(x). \qquad (6)

In the Theoretical model section, we find for ReLU ϕ that L is minimized when q = 2π/λ maximizes W̃(q) (Eq 66). This is the same condition as for the fastest-growing mode in Eq 5 (Fig 2E). In other words, the wavelength λ most likely to be established in a periodic network is the preferred bump distance in an aperiodic or infinite system, up to a difference of one fewer or extra bump due to discretization.
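As a numerical illustration of Eq 5, the predicted bump number on a ring of N neurons can be read off from the discrete Fourier transform of the connectivity profile. This sketch assumes a 1D profile W_profile of length N, centered on neuron 0 with periodic wraparound (as produced, for example, by Eq 38).

```python
import numpy as np

def predicted_bump_number(W_profile):
    """Return the bump number M and bump distance lambda predicted by Eq 5."""
    N = len(W_profile)
    W_tilde = np.real(np.fft.fft(W_profile))   # Fourier modes at q = 2*pi*m/N
    m = 1 + np.argmax(W_tilde[1 : N // 2])     # skip the uniform (q = 0) mode
    return m, N / m                            # M and lambda = N / M
```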

We now understand how to produce different bump numbers M in networks of different sizes N by adjusting the inhibition distance l. To compare networks across different values of M and N, we scale the connectivity strength W according to

W_\beta(x,y) \propto \frac{M}{N}. \qquad (7)

This keeps the total connectivity strength per neuron ∫dxWβ(x, y) constant over M and N. In doing so, the shape of each attractor bump as a function of scaled network position x/λ remains invariant (Fig 2F). Thus, Eq 7 removes additional variations in bump shape and helps to isolate our comparisons across M and N to those variables themselves. In S1 Text, we consider the alternative without this scaling and find that many major results are preserved (Fig B in S1 Text).
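For concreteness, a connectivity matrix with the profile of Eq 38, the output shift ξ, and the M/N scaling of Eq 7 could be assembled as below. This is a hedged sketch: the proportionality constant in Eq 7 is absorbed into w, the parameter values are illustrative, and the sign convention for which population is shifted in which direction is arbitrary here.

```python
import numpy as np

def build_connectivity(N, l, xi, w=1.0, M=1):
    """Return (W_L, W_R): connectivity matrices for presynaptic populations L and R."""
    x = np.arange(N)
    d = (x[None, :] - x[:, None] + N / 2) % N - N / 2   # signed distance on the ring

    def profile(dist):
        out = w * (np.cos(np.pi * dist / l) - 1) / 2    # Eq 38; zero beyond 2*l
        out[np.abs(dist) >= 2 * l] = 0.0
        return out

    scale = M / N                                       # Eq 7 (constant absorbed into w)
    return scale * profile(d + xi), scale * profile(d - xi)  # outputs shifted oppositely
```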

Bump dynamics: Path integration and diffusion

The drive b produces coherent bump motion by creating an imbalance between the two neural populations. A positive b increases input to the R population and decreases input to the L population (Fig 3A). Because the synaptic outputs of the former are shifted to the right, the bump moves in that direction. Similarly, a negative b produces leftward bump motion. The bump velocity vdrive can be calculated in terms of the baseline firing rates s(x) obtained without drive and noise (see also Refs [37] and [42]):

v_{\mathrm{drive}} = -\frac{\gamma b \xi \int dx\, \frac{d^2 s}{dx^2}}{\tau \int dx \left(\frac{ds}{dx}\right)^2}. \qquad (8)

Fig 3. Dynamics in a ring attractor network.


(A–C) Networks with 200 neurons and 3 bumps. (A) Synaptic inputs for populations L and R under drive b = 2. Snapshots taken at 150 ms intervals demonstrate rightward motion. (B) Bump velocity is proportional to drive. The connectivity shift is ξ = 2. (C) Bump velocity is largely proportional to connectivity shift. The drive is b = 0.5. (D–H) Networks with synaptic input noise. (D) Bump displacements for 48 replicate simulations demonstrating diffusion with respect to coherent motion. Networks with 200 neurons and 1 bump. (E and F) Mean bump velocity is proportional to drive and remains largely independent of network size, bump number, and noise magnitude. (G and H) Bump diffusion coefficient scales quadratically with noise magnitude, remains largely independent of drive, and varies with network size and bump number. The noise magnitude is σ = 0.5 in D, E, and G, and the drive is b = 0.5 in D, F, and H. Values for both bumps in two-bump networks lie over each other. Points indicate data from 48 replicate simulations and bars indicate bootstrapped standard deviations. Dotted gray lines indicate Eqs 8 and 10.

As a note, these integrals, as well as subsequent ones, do not include the singular points at the edges of attractor bumps. Eq 8 states that bump velocity is proportional to drive b and connectivity shift ξ, which is reflected in our simulations, with some deviation at larger ξ (Fig 3B and 3C). The strict proportionality between v and b is crucial because it implies faithful path integration [18]. If b(t) represents coordinate changes (such as angular or linear velocity in Fig 1B), then the bump position θ(t) will accurately track the coordinate itself (orientation or position).

In contrast to drive, uncorrelated noise ζ produces bump diffusion. To illustrate this effect, we introduce one form of ζ that we call synaptic input noise. Suppose ζ is independently sampled for each neuron at each simulation timestep from a Gaussian distribution with mean 0 and variance σ2. Loosely, it can arise from applying the central limit theorem to the multitude of noisy synaptic and cellular processes occurring at short timescales. Then,

\langle \zeta_\alpha(x,t) \rangle = 0, \qquad \langle \zeta_\alpha(x,t)\, \zeta_\beta(y,t') \rangle = \sigma^2 \Delta t\, \delta(t - t')\, \delta_{\alpha\beta}\, \delta(x - y), \qquad (9)

where the timestep Δt sets the resampling rate of ζ, and angle brackets indicate averaging over an ensemble of replicate simulations. Input noise causes bumps to diffuse away from the coherent driven motion (Fig 3D). The mean velocity 〈v〉 remains proportional to drive b, which means that the network still path integrates on average (Fig 3E). Since 〈v〉 is largely independent of noise magnitude σ, and the bump diffusion coefficient D is largely independent of b, drive and input noise do not significantly interact within the explored parameter range (Fig 3F and 3G). D can be calculated in terms of the baseline firing rates (see also Refs [35] and [43]):

D_{\mathrm{input}} = \frac{\sigma^2 \Delta t}{4\tau^2 \int dx \left(\frac{ds}{dx}\right)^2}. \qquad (10)

The quadratic dependence of D on σ is confirmed by simulation (Fig 3H).
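In practice, D can be estimated from replicate simulations like those in Fig 3D: for 1D diffusion, the variance of bump displacement about its ensemble mean grows as 2Dt. A minimal sketch, assuming `positions` is an array of unwrapped bump positions with shape (replicates, timepoints):

```python
import numpy as np

def diffusion_coefficient(positions, dt):
    """Estimate D from the linear growth of displacement variance across replicates."""
    t = np.arange(positions.shape[1]) * dt
    var = np.var(positions - positions.mean(axis=0), axis=0)  # spread about coherent motion
    slope = np.polyfit(t, var, 1)[0]                          # fit var(t) = 2*D*t
    return slope / 2.0
```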

We now turn our attention to bump number M and network size N. The mean bump velocity 〈v〉 is independent of these parameters (Fig 3E and 3F), which can be understood theoretically. Bump shapes across M and N are simple rescalings of one another (Fig 2F), so derivatives of s with respect to x are simply proportional to M (more bumps imply faster changes) and inversely proportional to N (larger networks imply slower changes). Similarly, integrals of expressions containing s over x are simply proportional to N. In summary,

\frac{ds}{dx} \propto \frac{M}{N}, \qquad \frac{d^2 s}{dx^2} \propto \frac{M^2}{N^2}, \qquad \int dx \propto N. \qquad (11)

Applying these scalings to Eq 8, we indeed expect vdrive to be independent of M and N. In contrast, Fig 3G and 3H reveal that the diffusion coefficient D varies with these parameters. When a one-bump network is increased in size from 200 to 400 neurons, D increases as well, which implies greater path integration errors. This undesired effect can be counteracted by increasing the bump number from 1 to 2, which lowers D below that of the one-bump network with 200 neurons. These initial results suggest that bump number and network size are important factors in determining a CAN’s resilience to noise. We will explore this idea in greater detail.

Mapping network coordinates onto physical coordinates

Before further comparing networks with different bump numbers M and sizes N, we should scrutinize the relationship between bump motion and the physical coordinate encoded by the network. After all, the latter is typically more important in biological settings. First, we consider the trivial case in which each neuron represents a fixed physical interval across all M and N; this is equivalent to using network coordinates without a physical mapping (Fig 4A). It is suited for encoding linear variables like position that lack intrinsic periodicity, so larger networks can encode wider coordinate ranges. However, with more bumps or fewer neurons, the range over which the network can uniquely encode different coordinates is shortened. We assume that ambiguity among coordinates encoded by each bump can be resolved by additional cues, such as local features, that identify the true value among the possibilities [4446]; this process will be examined in detail below. We leave quantities with dimensions of network distance in natural units of neurons.

Fig 4. Possible mappings between network coordinates and two types of physical coordinates.


(A) In networks encoding linear coordinates such as position, one neuron always represents a fixed physical interval. This mapping is trivial and identical to using network coordinates. (B) In networks encoding circular coordinates such as orientation, the bump distance always represents 360°.

Multi-bump networks are intrinsically periodic, especially those with a ring architecture. A natural way for them to encode a circular coordinate like orientation would be to match network and physical periodicities. For example, the bump distance may always represent 360° across different M and N so that neurons always exhibit unimodal tuning (Fig 4B). This relationship implies that quantities with dimensions of network distance should be multiplied by powers of the conversion factor

360^\circ \cdot \frac{M}{N}, \qquad (12)

which converts units of neurons to degrees. In other words, circular mapping implies normalizing network distances by the bump distance λ = N/M.

For circular mapping, we must also ensure that networks with different bump numbers M and sizes N path integrate consistently with one another. The same drive b should produce the same bump velocity v in units of degree/s. To do so, we rescale the coupling strength γ only under circular mapping:

\gamma \propto \frac{N}{M}. \qquad (13)

This effectively compensates for the factor of M/N in Eq 12. To see this explicitly, recall that vdrive does not depend on M and N in units of neuron/s, as shown in Fig 3E and 3F and previously explained through scaling arguments. Under circular mapping, vdrive would be multiplied by one power of the conversion factor in Eq 12. Since its formula contains γ in the numerator (Eq 8), vdrive receives an additional power of the rescaling factor in Eq 13. The two factors cancel each other, so vdrive does not depend on M and N under either mapping:

v_{\mathrm{drive}} \propto 1 \;\; \text{(linear)}, \qquad v_{\mathrm{drive}} \propto 1 \;\; \text{(circular)}. \qquad (14)

Thus, a consistent relationship between b and vdrive is preserved in units of both neurons/s and degrees/s.

Of course, there are other possible mappings between network and physical coordinates across bump numbers and network sizes. For example, intermediate scalings can be achieved with the conversion factor (M/N)μ for 0 < μ < 1 instead of Eq 12, with the corresponding γ ∝ (N/M)μ instead of Eq 13. But for the rest of our paper, we will consider the linear and circular cases, which correspond to μ = 0 and μ = 1, respectively. To be clear, networks with the same ring architecture are used for both mappings. We will see how noise affects encoding quality in either case.
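The two mappings amount to a single conversion step applied to displacements measured in neurons. The helper below is only illustrative; cm_per_neuron is an assumed constant for the linear case, and the coupling rescaling γ ∝ N/M (Eq 13) must be applied separately when circular mapping is used.

```python
def to_physical(delta_neurons, N, M, mapping, cm_per_neuron=1.0):
    """Convert a network displacement (in neurons) to physical units under either mapping."""
    if mapping == "linear":
        return delta_neurons * cm_per_neuron      # each neuron represents a fixed interval
    if mapping == "circular":
        return delta_neurons * 360.0 * M / N      # Eq 12: one bump distance maps to 360 degrees
    raise ValueError("mapping must be 'linear' or 'circular'")
```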

More bumps improve robustness to input and spiking noise under linear mapping

We now revisit the effect of input noise on bump diffusion, as initially explored in Fig 3D–3H. We measure how the diffusion coefficient D varies with bump number M and network size N under linear and circular mappings. Under linear mapping, D decreases as a function of M but increases as a function of N (Fig 5A and 5B). Thus, more bumps attenuate diffusion produced by input noise, which is especially prominent in large networks. However, for circular coordinates, D remains largely constant with respect to M and decreases with respect to N (Fig 5A and 5B). Increasing the number of bumps provides no benefit. These results can be understood through Eqs 10, 11 and 12, which predict

D_{\mathrm{input}} \propto \frac{N}{M^2} \;\; \text{(linear)}, \qquad D_{\mathrm{input}} \propto \frac{1}{N} \;\; \text{(circular)}. \qquad (15)

Fig 5. Bump diffusion due to input and spiking noise.


(A, B) Networks with synaptic input noise of magnitude σ = 0.5 and drive b = 0.5. Dotted gray lines indicate Eq 10. (A) Diffusion decreases with bump number under linear mapping and remains largely constant under circular mapping. Networks with 600 neurons. (B) Diffusion increases with network size under linear mapping and decreases under circular mapping. Networks with 3 bumps. (C and D) Same as A and B, but for networks with Poisson spiking noise instead of input noise. Dotted gray lines indicate Eq 20. Points indicate data from 48 replicate simulations and bars indicate bootstrapped standard deviations.

Two powers of the conversion factor in Eq 12 account for the differences between the two mappings.

Next, we investigate networks with spiking noise instead of input noise. To do so, we replace the deterministic formula for firing rate in Eq 2 with

s_\alpha(x,t) = \frac{c_\alpha(x,t)}{\Delta t}. \qquad (16)

Here, s is a stochastic, instantaneous firing rate given by the number of spikes c emitted in a simulation timestep divided by the timestep duration Δt. We take the c’s to be independent Poisson random variables driven by the deterministic firing rate:

c_\alpha(x,t) \sim \mathrm{Pois}\big[\phi[g_\alpha(x,t)]\, \Delta t\big]. \qquad (17)

As fully explained in the Theoretical model section (Eq 99), we can approximate this spiking process by the rate-based dynamics in Eqs 1 and 2 with the noise term

\zeta_\alpha(x,t) = \sum_\beta \int dy\, W_\beta(x,y) \sqrt{\frac{\phi[g_\beta(y,t)]}{\Delta t}}\, \eta_\beta(y,t). \qquad (18)

The η’s are independent random variables with zero mean and unit variance:

\langle \eta_\alpha(x,t) \rangle = 0, \qquad \langle \eta_\alpha(x,t)\, \eta_\beta(y,t') \rangle = \Delta t\, \delta(t - t')\, \delta_{\alpha\beta}\, \delta(x - y). \qquad (19)

As for Eq 9, the simulation timestep Δt sets the rate at which η is resampled. This spiking noise produces bump diffusion with coefficient (see also Ref [43])

D_{\mathrm{spike}} = \frac{\int dx\, s(x) \left(\frac{ds}{dx}\right)^2}{4\tau^2 \left[\int dx \left(\frac{ds}{dx}\right)^2\right]^2}. \qquad (20)

As before, s is the baseline firing rate configuration without noise and drive. Through the relationships in Eqs 11 and 12, Dspike scales with M and N in the same way as Dinput does:

D_{\mathrm{spike}} \propto \frac{N}{M^2} \;\; \text{(linear)}, \qquad D_{\mathrm{spike}} \propto \frac{1}{N} \;\; \text{(circular)}. \qquad (21)

These findings are presented in Fig 5C and 5D along with simulation results that confirm our theory. Spiking noise behaves similarly to input noise. Increasing bump number improves robustness to noise under linear mapping but has almost no effect under circular mapping. Bump diffusion in larger networks is exacerbated under linear mapping but suppressed under circular mapping. For both input noise and spiking noise, the conversion factor in Eq 12 produces the differences between the two mappings. Coupling strength rescaling in Eq 13 does not play a role because γ does not appear in Eqs 10 and 20. In S1 Text, we consider splitting a large network with many bumps into smaller networks, each with fewer bumps; the intact network and the combined readout of the split networks exhibit similar diffusion properties.
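Spiking simulations only require replacing the deterministic rate of Eq 2 with the stochastic rate of Eqs 16 and 17 at each timestep. A minimal sketch, assuming a ReLU ϕ and a NumPy random generator:

```python
import numpy as np

def spiking_rates(g, dt, rng):
    """Stochastic instantaneous rates: Poisson counts per timestep divided by dt (Eqs 16-17)."""
    rates = np.maximum(g, 0.0)          # deterministic rates phi[g]
    counts = rng.poisson(rates * dt)    # Eq 17: c ~ Pois(phi[g] * dt)
    return counts / dt                  # Eq 16: s = c / dt

# usage: s = spiking_rates(g, dt=0.001, rng=np.random.default_rng(0))
```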

To evaluate noise robustness in a complementary way, we perform mutual information analysis of networks with input noise. Mutual information describes how knowledge of one random variable can reduce the uncertainty in another, and it serves as a general metric for encoding quality. Before proceeding, we mention a related quantity called Fisher information, which is directly related to mutual information [47, 48] and inversely related to bump diffusion in CANs [43]. Thus, we expect that networks with less diffusion in Fig 5 should generally contain more mutual information about their encoded coordinate. Fisher information also permits a more intuitive explanation for our diffusion scalings. It is proportional to network size N [43], because monitoring a larger number of noisy neurons tells us more about the encoded coordinate. Otherwise, it only depends on the tuning curves of neurons within the network; in particular, steeper tuning curves convey quadratically more information [43]. For our networks, across M and N, linear tuning curves are simple rescalings of each other with derivative inversely proportional to λ = N/M, while circular tuning curves remain unimodal and identical (Figs 2F and 4). Thus, Fisher information should be proportional to N ⋅ (M/N)2 and N for linear and circular coordinates, respectively. Applying the inverse proportionality between Fisher information and diffusion coefficients discovered by Ref [43], we arrive at Eqs 15 and 21.

Using simulations, we investigate the mutual information I between the physical coordinate encoded by the noisy network, represented by the random variable U with discretized sample space U, and the activity of a single neuron, represented by the random variable S with discretized sample space S (see Simulation methods):

I[S;U] = \sum_{s \in \mathcal{S},\, u \in \mathcal{U}} p(s|u)\, p(u) \log \frac{p(s|u)}{p(s)}. \qquad (22)

We then average I across neurons. Larger mean mutual information implies more precise coordinate encoding and greater robustness to noise. Note that the joint activities of all the neurons confer much more coordinate information than single-neuron activities do, but since estimating high-dimensional probability distributions over the former is computationally very costly, we use the latter as a metric for network performance.

The physical coordinate U is either position or orientation and obeys the mappings described in Fig 4 across bump numbers M and network sizes N. To obtain the probability distributions in Eq 22 required to compute I, we initialize multiple replicate simulations at evenly spaced coordinate values u (Fig 6A). We do not apply a driving input, so the networks should still encode their initialized coordinates at the end of the simulation. However, they contain input noise that degrades their encoding. Collecting the final firing rates produces p(s|u) for each neuron. For both position and orientation, we consider narrow and wide coordinate ranges to assess network performance in both regimes.
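A sketch of the estimator for Eq 22, assuming `rates` holds the final firing rate of one neuron with shape (coordinate values, replicates), a uniform p(u), and a simple equal-width binning of rates to form the discretized sample space S:

```python
import numpy as np

def mutual_information(rates, n_bins=20):
    """Estimate I[S;U] in bits for a single neuron from replicate simulations (Eq 22)."""
    edges = np.linspace(rates.min(), rates.max() + 1e-9, n_bins + 1)
    n_u, n_rep = rates.shape
    p_s_given_u = np.stack([np.histogram(rates[u], bins=edges)[0] / n_rep
                            for u in range(n_u)])      # shape (n_u, n_bins)
    p_u = np.full(n_u, 1.0 / n_u)
    p_s = p_u @ p_s_given_u                            # marginal over coordinate values
    safe_s = np.where(p_s_given_u > 0, p_s_given_u, 1.0)
    safe_m = np.where(p_s > 0, p_s, 1.0)
    terms = p_s_given_u * np.log2(safe_s / safe_m)     # zero-probability entries contribute zero
    return float(p_u @ terms.sum(axis=1))
```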

Fig 6. Mutual information between neural activity and physical coordinates with input noise of magnitude σ = 0.5.


(A) To compute mutual information, we initialize replicate simulations without input drive at different coordinate values (thick black lines) and record the final neural activities (thin colored lines). The physical coordinate can be linear or circular and its range can be narrow or wide; here, we illustrate two possibilities for networks with 600 neurons and 3 bumps. (B and C) Mutual information between physical coordinate and single-neuron activity under narrow coordinate ranges. (B) Information increases with bump number for linear coordinates and remains largely constant for circular coordinates. Networks with 600 neurons. (C) Information decreases with network size for linear coordinates and increases for circular coordinates. Networks with 3 bumps. (D and E) Mutual information between physical coordinate and single-neuron activity under wide coordinate ranges. The trends in B and C are preserved for circular coordinates. They are also preserved for linear coordinates, except for the shaded regions in which the coordinate range exceeds the bump distance. (F) Coarse local cues are active over different quadrants of the wide coordinate ranges. (G and H) Mutual information between physical coordinate and the joint activities of a single neuron with the four cues in F under wide coordinate ranges. The trends in B and C are preserved for both linear and circular coordinates. Points indicate data from 96 replicate simulations at each coordinate value averaged over neurons and bars indicate bootstrapped standard errors of the mean. Cue icons adapted from Streamline Freemoji (CC BY license, https://www.streamlinehq.com/emojis/freebies-freemojis).

We first consider narrow coordinate ranges. For linear coordinates, information increases as a function of M but decreases as a function of N; for circular coordinates, it does not strongly depend on M and increases as a function of N (Fig 6B and 6C). These results exactly corroborate those in Fig 5A and 5B obtained for bump diffusion, since we expect information and diffusion to be inversely related.

We next consider wide coordinate ranges. Our ring networks can only uniquely represent coordinate ranges up to their bump distances (converted to physical distances by Fig 4). Beyond these values, two physical coordinates separated by the converted bump distance cannot be distinguished by the network. Our mutual information analysis captures this phenomenon; for linear coordinates, the increase in information with larger M or smaller N as observed in Fig 6B and 6C disappears once the converted bump distance drops below the physical range of 200 cm (green shaded regions of Fig 6D and 6E). In this regime, the benefits of more bumps and smaller networks toward decreasing diffusion are counteracted by bump ambiguity. In contrast, the circular mapping in Fig 4 lacks bump ambiguity since the bump distance is always converted to the maximum physical range of 360°, so the same qualitative trends in mutual information are observed for any coordinate range (Fig 6D and 6E).

For linear coordinates with wide ranges, the advantages of increasing bump number can be restored by coarse local cues. We illustrate this process by introducing four cues, each of which is active over a different quadrant and is otherwise inactive (Fig 6F). They can be conceptualized as two-state sensory neurons or neural populations that fire in the presence of a local stimulus. By themselves, the cues do not encode precise coordinate values. Mutual information computed with the joint activity of each neuron with these cues recovers the behavior observed for narrow ranges across all M and N (Fig 6G and 6H). Ring attractor neurons provide information beyond the 2 bits conveyed by the cues alone, and for position, this additional information increases with more bumps and fewer neurons without saturating.

In summary, our conclusions about robustness to input noise obtained by diffusion analysis are also supported by mutual information analysis. Moreover, the latter explicitly reveals how networks encoding wide, linear coordinate ranges can leverage coarse local cues to address ambiguities and preserve the advantages of multiple bumps. In S1 Text, we calculate mutual information for a few additional regimes (Fig C of S1 Text; see also Refs [49] and [50], which investigated the Fisher information conveyed by multi-bump tuning curves).

More bumps improve robustness to connectivity noise under linear mapping

Another source of noise in biological CANs is inhomogeneity in the connectivity W. Perfect continuous attractor dynamics requires W to be invariant to translations along the network [9, 10, 16, 18, 28], a concept related to Goldstone’s theorem in physics [51, 52]. We consider the effect of replacing WW + V, where V is a noisy connectivity matrix whose entries are independently drawn from a zero-mean Gaussian distribution. V disrupts the symmetries of W. This noise is quenched and does not change over the course of the simulation, in contrast to input and spiking noise, which are independently sampled in time. It contributes a noise term

\zeta_\alpha(x,t) = \sum_\beta \int dy\, V_{\alpha\beta}(x,y)\, s_\beta(y,t). \qquad (23)

This formula implies that V produces correlated ζ’s across neurons, which also differs from input and spiking noise. Because of these distinctions, the dominant effect of connectivity noise is not diffusion, but drift. V induces bumps to move with velocity vconn(θ), even without drive b:

v_{\mathrm{conn}}(\theta) = -\frac{\sum_{\alpha\beta} \int dx\, dy\, V_{\alpha\beta}(x,y)\, \frac{ds(x-\theta)}{dx}\, s(y-\theta)}{2\tau \int dx \left(\frac{ds}{dx}\right)^2}. \qquad (24)

The movement is coherent but irregular, as it depends on the bump position θ (Fig 7A). Refs [53] and [54] refer to vconn(θ) as the drift velocity.
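Given a noise matrix V and the baseline rate profile, Eq 24 can be evaluated directly on a grid of bump positions. The sketch below assumes `s0` is the baseline firing-rate profile (length N) and V has shape (2, 2, N, N), indexed by postsynaptic and presynaptic population; positions are in units of neurons.

```python
import numpy as np

def drift_velocity(V, s0, tau):
    """Evaluate the drift velocity v_conn(theta) of Eq 24 at each integer bump position."""
    N = len(s0)
    ds = np.gradient(s0)                              # ds/dx
    denom = 2.0 * tau * np.sum(ds ** 2)
    v = np.zeros(N)
    for theta in range(N):
        ds_t = np.roll(ds, theta)                     # ds(x - theta)/dx
        s_t = np.roll(s0, theta)                      # s(y - theta)
        num = sum(ds_t @ V[a, b] @ s_t for a in range(2) for b in range(2))
        v[theta] = -num / denom
    return v
```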

Fig 7. Bump trapping due to connectivity noise at low drive.


(A–C) Networks with 600 neurons, 1 bump, and the same realization of connectivity noise of magnitude 0.002. (A) Theoretical values for drift velocity as a function of bump position using Eq 24. (B) Bumps drift towards trapped positions over time. The drive is b = 0. Arrows indicate predictions from vconn(θ) crossing 0 with negative slope in A. Lines indicate simulations with different starting positions. (C) Bump trajectories with smallest positive and negative drive required to travel through the entire network. Respectively, b = 0.75 and b = −0.52. The larger of the two in magnitude is the escape drive b0 = 0.75. Note that positions with low bump speed exhibit large velocities in the opposite direction in A. (D and E) Networks with multiple realizations of connectivity noise of magnitude 0.002. (D) Escape drive decreases with bump number under linear mapping and remains largely constant under circular mapping. Networks with 600 neurons. (E) Escape drive increases with network size under linear mapping and remains largely constant under circular mapping. Networks with 3 bumps. Points indicate simulation means over 48 realizations and bars indicate standard deviations. Dotted gray lines indicate Eq 26 averaged over 96 realizations.

Connectivity noise traps bumps at low drive b. We first consider b = 0, so bump motion is governed solely by drift according to dθ/dt = vconn(θ). The bump position θ has stable fixed points wherever vconn(θ) crosses 0 with negative slope [53, 54]. Simulations confirm that bumps drift towards these points (Fig 7B). The introduction of b adds a constant vdrive that moves the curve in Fig 7A up for positive b or down for negative b:

v_{\mathrm{total}}(\theta) = v_{\mathrm{drive}} + v_{\mathrm{conn}}(\theta). \qquad (25)

If vtotal(θ) still crosses 0, bumps would still be trapped. The absence of bump motion in response to coordinate changes encoded by b would be a catastrophic failure of path integration. To permit bump motion through the entire network, the drive must be strong enough to eliminate all zero-crossings. Fig 7C shows bump motion at this drive for both directions of motion. The positive b is just large enough for the bump to pass through the region with the most negative vconn(θ) in Fig 7A; likewise for negative b and positive vconn(θ). We call the larger absolute value of these two drives the escape drive b0. Simulations show that b0 decreases with bump number M and increases with network size N under linear mapping (Fig 7D and 7E). A smaller b0 implies weaker trapping, so smaller networks with more bumps are more resilient against this phenomenon. Under circular mapping, however, b0 demonstrates no significant dependence on M or N. We can predict b0 by inverting the relationship in Eq 8 between b and v:

b_0 = -\max_\theta \left| v_{\mathrm{conn}}(\theta) \right| \cdot \frac{\tau \int dx \left(\frac{ds}{dx}\right)^2}{\gamma \xi \int dx\, \frac{d^2 s}{dx^2}}. \qquad (26)

This theoretical result agrees well with values obtained by simulation (Fig 7D and 7E). In the Theoretical model section, we present a heuristic argument (Eq 124) that leads to the observed scaling of escape drive on M and N:

b_0 \propto \frac{N}{M} \;\; \text{(linear)}, \qquad b_0 \propto 1 \;\; \text{(circular)}. \qquad (27)

At high drive |b| > b0, attractor bumps are no longer trapped by the drift velocity vconn(θ). Instead, the drift term produces irregularities in the total velocity vtotal(θ) (Fig 8A). They can be decomposed into two components: irregularities between directions of motion and irregularities within each direction. Both imply errors in path integration because v and b are not strictly proportional. To quantify these components, we call |v+(θ)| and |v(θ)| the observed bump speeds under positive and negative b. We define speed difference as the unsigned difference between mean speeds in either direction, normalized by the overall mean speed:

\text{speed difference} = \frac{2\left|\mathrm{mean}_\theta |v_+(\theta)| - \mathrm{mean}_\theta |v_-(\theta)|\right|}{\mathrm{mean}_\theta |v_+(\theta)| + \mathrm{mean}_\theta |v_-(\theta)|}. \qquad (28)

Fig 8. Bump speed irregularity due to connectivity noise at high drive.


(A) Bump speed as a function of bump position with connectivity noise of magnitude 0.002 and drive b = 1.5. Network with 600 neurons, 1 bump, and the same realization of connectivity noise as in Fig 7A–7C. Thick gray lines indicate Eq 25. (B–E) Networks with multiple realizations of connectivity noise of magnitude 0.002 and drive b = 1.5. (B) Speed difference between directions decreases with bump number under linear mapping and remains largely constant under circular mapping. Networks with 600 neurons. (C) Speed difference increases with network size under linear mapping and remains largely constant under circular mapping. Networks with 3 bumps. (D and E) Same as B and C, but for speed variability within each direction. Points indicate simulation means over 48 realizations and bars indicate standard deviations. Dotted gray lines indicate Eqs 30 and 31 averaged over 96 realizations.

We then define speed variability as the standard deviation of speeds within each direction, averaged over both directions and normalized by the overall mean speed:

\text{speed variability} = \frac{\mathrm{std}_\theta |v_+(\theta)| + \mathrm{std}_\theta |v_-(\theta)|}{\mathrm{mean}_\theta |v_+(\theta)| + \mathrm{mean}_\theta |v_-(\theta)|}. \qquad (29)
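Both metrics reduce to a few lines once the bump speeds |v+(θ)| and |v−(θ)| have been measured at each position; a sketch, assuming two 1D arrays of speeds sampled over θ:

```python
import numpy as np

def speed_metrics(speed_pos, speed_neg):
    """Speed difference (Eq 28) and speed variability (Eq 29) from sampled bump speeds."""
    mean_pos, mean_neg = speed_pos.mean(), speed_neg.mean()
    overall = (mean_pos + mean_neg) / 2.0
    difference = abs(mean_pos - mean_neg) / overall
    variability = (speed_pos.std() + speed_neg.std()) / (2.0 * overall)
    return difference, variability
```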

Speed difference and speed variability follow the same trends under changes in bump number M and network size N (Fig 8B–8E). Under linear mapping, they decrease with M and increase with N. Under circular mapping, they do not significantly depend on M and N. These are also the same trends exhibited by the escape drive b0 (Fig 7D and 7E). In terms of theoretical quantities, the formulas for speed difference and variability become

\text{speed difference} = \frac{2\left|\mathrm{mean}_\theta\, v_{\mathrm{conn}}(\theta)\right|}{|v_{\mathrm{drive}}|} \qquad (30)

and

\text{speed variability} = \frac{\mathrm{std}_\theta\, v_{\mathrm{conn}}(\theta)}{|v_{\mathrm{drive}}|}. \qquad (31)

These expressions match the observed values well (Fig 8B–8E). In the Theoretical methods section, we calculate the observed dependence of speed difference (Eq 113) and variability (Eq 120) on M and N:

\text{speed difference and variability} \propto \frac{N}{M} \;\; \text{(linear)}, \qquad \text{speed difference and variability} \propto 1 \;\; \text{(circular)}. \qquad (32)

For all results related to connectivity noise, the coupling strength rescaling in Eq 13 produces the differences between the two mappings via the γ in Eq 8. The conversion factor in Eq 12 does not play a role because escape drive, speed difference, and speed variability do not have dimensions of network distance.

To summarize, CANs with imperfect connectivity benefit from more attractor bumps when encoding linear coordinates. This advantage is present at all driving inputs and may be more crucial for larger networks. On the other hand, connectivity noise has largely the same consequences for networks of all bump numbers and sizes when encoding circular coordinates.

Discussion

We demonstrated how CANs capable of path integration respond to three types of noise. Additive synaptic input noise and Poisson spiking noise cause bumps to diffuse away from the coherent motion responsible for path integration (Figs 3 and 5). This diffusion is accompanied by a decrease in mutual information between neural activity and encoded coordinate (Fig 6). Connectivity noise produces a drift velocity field that also impairs path integration by trapping bumps at low drive and perturbing bump motion at high drive (Figs 7 and 8).

For all three types of noise, CANs with more attractor bumps exhibit less deviation in bump motion in network units. This is observed across network parameters (Figs A and B in S1 Text). Thus, they can more robustly encode linear variables whose mapping inherits network units and does not rescale with bump number (Fig 4A). If grid cell networks were to encode spatial position in this manner, then multiple attractor bumps would be preferred over a single bump. Ref [20] reports experimental evidence supporting multi-bump grid networks obtained by calcium imaging of mouse medial entorhinal cortex. Our work implies that the evolution of such networks may have been partially encouraged by biological noise. Additional bumps introduce greater ambiguity among positions encoded by each bump, but this can be resolved by a rough estimate of position from additional cues, such as local landmarks [44, 55–57], another grid module with different periodicity [15, 40, 41, 44, 46, 58], or a Bayesian prior based on recent memory [45]. In this way, grid modules with finer resolution and more attractor bumps could maintain a precise egocentric encoding of position, while coarser modules and occasional allocentric cues would identify the true position out of the few possibilities represented. We explicitly explored one realization of this concept and observed how cues enable networks to improve their information content by increasing bump number, despite a concomitant increase in bump ambiguity (Fig 6F–6H).

In contrast, CANs encoding circular variables may rescale under different bump numbers to match periodicities (Fig 4B), which eliminates any influence of bump number on encoding accuracy for all three types of noise. If head direction networks were to encode orientation in this manner, then they would face less selective pressure to evolve beyond the single-bump configuration observed in Drosophila [8]. Moreover, without the assumption of bump shape invariance accomplished by Eq 7, robustness to all three types of noise decreases with bump number, which actively favors single-bump orientation networks (Fig B in S1 Text). Further experimental characterization of bump number in biological CANs, perhaps through techniques proposed by Ref [59], would test the degree to which the brain can leverage the theoretical advantages identified in this work.

Under linear mapping, larger CANs exhibit more errors in path integration from all three types of noise. The immediate biological implication is that larger brains face a dramatic degradation of CAN performance, accentuating the importance of suppressing error with multi-bump configurations. However, the simple rule that one neuron always represents a fixed physical interval does not need to be followed, and larger animals may tolerate greater absolute errors in path integration because they interact with their environments over larger scales. Nevertheless, our results highlight the importance of considering network size when studying the performance of noisy CANs. Under circular mapping, bump diffusion decreases with network size for input and spiking noise, and the magnitude of errors due to connectivity noise is independent of network size. This implies that head direction networks can benefit from incorporating more neurons; the observed interactions among such networks across different mammalian brain regions may act in this manner to suppress noise [60].

The computational advantages of periodic over nonperiodic encodings have been extensively studied in the context of grid cells [45, 46, 49, 61–64]. Our results extend these findings by demonstrating that some kinds of periodic encodings can perform better than others. Our results also contribute to a rich literature on noisy CANs. Previous studies have investigated additive input noise [35, 43, 54, 65, 66], multiplicative input noise [67], spiking noise [18, 43, 54, 62, 63, 68], and quenched noise due to connectivity or input inhomogeneities [10, 53, 54, 66, 69]. Among these works, the relationship between bump number and noise has only been considered in the context of multiple-item working memory, in which a single network can be dynamically loaded with different numbers of bumps [62, 63, 67, 68]. Interestingly, they find that robustness to noise decreases with bump number, which is opposite to our results (cf. Ref [63], which reports no dependence between bump number and encoding accuracy under certain conditions). It appears that CANs designed for path integration with fixed bump number and CANs designed for multiple-item working memory with variable bump number differ crucially in their responses to noise. Further lines of investigation that compare these two classes would greatly contribute to our general understanding of CANs.

Beyond our concrete results on CAN performance, our work offers a comprehensive theoretical framework for studying path-integrating CANs. We derive a formula for the multi-bump attractor state and a Lyapunov functional that governs its formation. For a ReLU activation function, we calculate all key dynamical quantities such as velocities and diffusion coefficients in terms of firing rates. Our formulas yield scaling relationships that facilitate an intuitive understanding for their dependence on bump number and network size. Much of our theoretical development does not assume a specific connectivity matrix or nonlinear activation function, which allows our results to have wide significance. For example, we expect them to hold for path-integrating networks that contain excitatory synapses. Other theories have been developed for bump shape [37, 38, 53, 66, 67, 70], path integration velocity [37, 42], diffusion coefficients [35, 43, 54, 66, 67, 71], and drift velocity [10, 53, 54]. Our work unifies these studies through a simple framework that features path integration, multiple bumps, and a noise term that can represent a wide range of sources. It can be easily extended to include other components of theoretical or biological significance, such as slowly-varying inputs [27, 66, 72], synaptic plasticity [34, 73], neural oscillations [7476], and higher-dimensional attractor manifolds [2, 28].

Theoretical model

Architecture

We investigate CAN dynamics through a 1D ring attractor network. This class of network has been analyzed in previous theoretical works, and at various points, our calculations will parallel those in Refs [37, 38, 43, 53, 54, 66], and [42].

There are two neurons at each position i = 0, …, N − 1 with population indexed by α ∈ {L, R} (Fig 2A). For convenient calculation, we unwrap the ring and connect copies end-to-end, forming a linear network with continuous positions x ∈ (−∞, ∞). Unless otherwise specified, integrals are performed over the entire range. To map back onto the finite-sized ring network, we enforce our results to have a periodicity λ that divides N. For example, λ = N corresponds to a single-bump configuration. Integrals would then be performed over [0, N), with positions outside this range corresponding to their equivalents within this range.

The network obeys the following dynamics for synaptic inputs g:

\tau \frac{dg_\alpha(x,t)}{dt} + g_\alpha(x,t) = \sum_\beta \int dy\, W_\beta(x,y)\, s_\beta(y,t) + A \pm_\alpha \gamma b(t) + \zeta_\alpha(x,t), \qquad (33)

where ±_L means − and ±_R means +, and the opposite for ∓_α. τ is the neural time constant, W is the synaptic connectivity, and A is the resting input. The nonlinear activation function ϕ converts synaptic inputs to firing rates:

s_\alpha(x,t) = \phi[g_\alpha(x,t)]. \qquad (34)

Most of our results will apply to general ϕ, but we also consider a ReLU activation function specifically:

\phi[g] = \begin{cases} g & g > 0 \\ 0 & g \le 0. \end{cases} \qquad (35)

In this section, we will explicitly mention when we specifically consider the ReLU case, and we will always simplify the function away. Thus, if an expression contains the symbol ϕ, then it holds for general ϕ. In the Results section, formulas for Dinput, Dspike, vdrive, and vconn(θ) as well as all simulation results invoke Eq 35. We will also use this form in the Bump shape g subsection of this section. On the other hand, scalings for Dinput, Dspike, vdrive, and vconn(θ) in the Results section will hold for general ϕ, as long as the connectivity obeys Eq 7 such that g(x/λ) remains invariant over M and N (Fig 2F). b is the driving input, γ is its coupling strength, and ζ is the noise, which can take different forms. γb and ζ are small compared to the rest of the right-hand side of Eq 33. For notational convenience, we will often suppress dependence on t.

Wβ(x, y) obeys a standard continuous attractor architecture based on a symmetric and translation-invariant W:

W_\beta(x,y) = W(x - y \mp_\beta \xi) \quad \text{where} \quad W(-x) = W(x). \qquad (36)

Each population β deviates from W by a small shift ξ ≪ N in synaptic outputs. Thus, the following approximation holds:

\sum_\beta W_\beta(x,y) \approx 2W(x - y). \qquad (37)

We will consider the specific form of W (Fig 2B):

W(x) = \begin{cases} w\, \dfrac{\cos(\pi x/l) - 1}{2} & |x| < 2l \\ 0 & |x| \ge 2l \end{cases} \;=\; \begin{cases} w\, \dfrac{\cos kx - 1}{2} & |x| < 2\pi/k \\ 0 & |x| \ge 2\pi/k, \end{cases} \qquad (38)

where k = π/l. We will explicitly mention when we specifically consider this form; in fact, we only do so for Eqs 46, 47, 59 and 60, as well as for our simulation results in the Results section. Otherwise, each expression holds for general W.

Baseline configuration without drive and noise

Linearized dynamics and bump distance λ

First, we consider the case of no drive b = 0 and no noise ζ = 0. The dynamical equation Eq 33 becomes

\tau \frac{dg_\alpha(x)}{dt} + g_\alpha(x) = \sum_\beta \int dy\, W_\beta(x,y)\, \phi[g_\beta(y)] + A. \qquad (39)

Since the right-hand side no longer depends on α, g must be the same for both populations, and we can use Eq 37 to obtain

\tau \frac{dg(x)}{dt} + g(x) = 2\int dy\, W(x - y)\, \phi[g(y)] + A. \qquad (40)

We analyze these dynamics using the Fourier transform F. Our chosen convention, applied to the function h, is

\tilde{h}(q) = \mathcal{F}[h](q) = \int dx\, e^{-iqx} h(x), \qquad h(x) = \mathcal{F}^{-1}[\tilde{h}](x) = \int \frac{dq}{2\pi}\, e^{iqx}\, \tilde{h}(q). \qquad (41)

Fourier modes h̃(q) represent sinusoids with wavenumber q and corresponding wavelength 2π/q. Applying this transform to Eq 40, we obtain

\tau \frac{d\tilde{g}(q)}{dt} + \tilde{g}(q) = 2\tilde{W}(q)\, \mathcal{F}[\phi[g]](q) + 2\pi A\, \delta(q). \qquad (42)

In this subsection, we consider the case of small deviations, such that g(x) ≈ g0 and ϕ[g(x)] ≈ ϕ[g0] + ϕ′[g0](g(x) − g0). Then, Eq 42 becomes

\tau \frac{d\tilde{g}(q)}{dt} + \tilde{g}(q) = 2\phi'[g_0]\, \tilde{W}(q)\, \tilde{g}(q) + 2\pi A_0\, \delta(q), \qquad (43)

where A0 = A + 2W̃(0)(ϕ[g0] − ϕ′[g0]g0). The solution to this linearized equation for q ≠ 0 is

\tilde{g}(q,t) = \tilde{g}(q,0)\, e^{r(q)t}. \qquad (44)

Each mode grows exponentially with rate

r(q) = \frac{2\phi'[g_0]\, \tilde{W}(q) - 1}{\tau}, \qquad (45)

so the fastest-growing component of g is the one that maximizes W̃(q), as stated in Eq 5 of the Results section. The wavelength 2π/q of that component predicts the bump distance λ.

For the specific W in Eq 38, its Fourier transform is

\tilde{W}(q) = -\frac{w k^2 \sin(2\pi q/k)}{k^2 q - q^3}, \qquad (46)

so

\lambda = \frac{2\pi}{\underset{q}{\mathrm{argmin}}\, \dfrac{k^2 \sin(2\pi q/k)}{k^2 q - q^3}} = \frac{2l}{\underset{\psi}{\mathrm{argmin}}\, \dfrac{\sin 2\pi\psi}{\psi - \psi^3}} \approx 2.28\, l. \qquad (47)

λ is proportional to l, as also noted by Refs [16, 18, 41], and [40].
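The numerical factor in Eq 47 follows from a one-line minimization; a quick check (a sketch, not the authors' code):

```python
import numpy as np

# Minimize sin(2*pi*psi) / (psi - psi**3) over psi in (0, 1) to recover lambda / l.
psi = np.linspace(0.01, 0.99, 100_000)
psi_star = psi[np.argmin(np.sin(2 * np.pi * psi) / (psi - psi ** 3))]
print(2 / psi_star)   # approximately 2.28
```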

Bump shape g

We call the steady-state synaptic inputs g without drive and noise the baseline configuration. To calculate its shape, we must account for the nonlinearity of the activation function ϕ and return to Eq 42. We invoke our particular form of ϕ in Eq 35 to calculate F[ϕ[g]](q). g must be periodic, and its periodicity is the bump distance λ with wavenumber κ = 2π/λ. Without loss of generality, we take g to have a bump centered at 0. Since W is symmetric, g is an even function. We define z as the position where g crosses 0:

g(z)=0. (48)

If g is approximately sinusoidal, then g(x) > 0 wherever nλ − z < x < nλ + z for any integer n. The ReLU ϕ in Eq 35 implies

\phi[g(x)] = g(x)\, \Phi(x) \quad \text{where} \quad \Phi(x) = \sum_{n=-\infty}^{\infty} \Theta[x - n\lambda + z]\, \Theta[-(x - n\lambda - z)]. \qquad (49)

Θ is the Heaviside step function. The Fourier transform for Φ is

\tilde{\Phi}(q) = \frac{2\sin qz}{q} \sum_{n=-\infty}^{\infty} e^{-2\pi i n q/\kappa} = \frac{2\sin qz}{q} \sum_{n=-\infty}^{\infty} \delta\!\left(n - \frac{q}{\kappa}\right) = \frac{2\kappa \sin qz}{q} \sum_{n=-\infty}^{\infty} \delta(q - n\kappa), \qquad (50)

where the second equality comes from the Fourier series for a Dirac comb. Therefore,

\mathcal{F}[\phi[g]](q) = \frac{1}{2\pi} \int dq'\, \tilde{\Phi}(q - q')\, \tilde{g}(q') = \frac{1}{\pi} \sum_{n=-\infty}^{\infty} \frac{\sin n\kappa z}{n}\, \tilde{g}(q - n\kappa), \qquad (51)

so Eq 42 becomes

\tau \frac{d\tilde{g}(q)}{dt} + \tilde{g}(q) = \frac{2}{\pi} \tilde{W}(q) \sum_{n=-\infty}^{\infty} \frac{\sin n\kappa z}{n}\, \tilde{g}(q - n\kappa) + 2\pi A\, \delta(q). \qquad (52)

This equation describes the full dynamics of g with a ReLU activation function. It contains couplings between all modes q that are multiples of the wavenumber κ, which corresponds to the bump distance.

To find the baseline g, we set dg̃/dt = 0. We also simplify g̃(q) by only considering the lowest modes that couple to each other: q = 0, ±κ. Due to symmetry, W̃(−q) = W̃(q) and g̃(−q) = g̃(q). Eq 52 gives

\tilde{g}(0) = \frac{2}{\pi} \tilde{W}(0) \left[\kappa z\, \tilde{g}(0) + 2\sin(\kappa z)\, \tilde{g}(\kappa)\right] + 2\pi A\, \delta(0),
\qquad \tilde{g}(\kappa) = \frac{2}{\pi} \tilde{W}(\kappa) \left[\sin(\kappa z)\, \tilde{g}(0) + \left(\kappa z + \frac{\sin 2\kappa z}{2}\right) \tilde{g}(\kappa)\right]. \qquad (53)

Now we need to impose Eq 48: g(z) = 0. To do so, we note that g̃(0) and g̃(κ) are both proportional to δ(0) according to Eq 53. That means g̃(q) has the form

\tilde{g}(q) = G_0\, \delta(q) + G\, \delta(q - \kappa) + G\, \delta(q + \kappa), \qquad (54)

where G0 and G are the Fourier modes with delta functions separated. This implies

g(x) = \frac{G_0}{2\pi} + \frac{G}{\pi} \cos\kappa x, \qquad (55)

and g(z) = 0 implies

G_0 = -2\cos(\kappa z)\, G. \qquad (56)

Substituting Eqs 54 and 56 into Eq 53, we obtain

\frac{G}{\pi} = \frac{-\pi A}{2\tilde{W}(0)\,(\sin\kappa z - \kappa z \cos\kappa z) + \pi \cos\kappa z}, \qquad \kappa z - \cos\kappa z\, \sin\kappa z = \frac{\pi}{2\tilde{W}(\kappa)}. \qquad (57)

We can solve the second equation of Eq 57 for κz and then substitute it into the first equation to obtain G. This then gives us g(x), which becomes through Eqs 55 and 56

g(x) = \frac{G}{\pi}\left(\cos\kappa x - \cos\kappa z\right). \qquad (58)

In particular, let’s use the W defined by Eq 38 with Fourier transform Eq 46. Then,

\tilde{W}(0) = -\frac{2\pi w}{k} \quad \text{and} \quad \tilde{W}(\kappa) = -\frac{w}{\kappa}\, \frac{k^2}{k^2 - \kappa^2}\, \sin\frac{2\pi\kappa}{k}. \qquad (59)

Thus,

g(x) = \frac{G}{\pi}\left(\cos\kappa x - \cos\kappa z\right),
\qquad \kappa z - \cos\kappa z\, \sin\kappa z = -\pi \Big/ \left[\frac{2w}{\kappa}\, \frac{k^2}{k^2 - \kappa^2}\, \sin\frac{2\pi\kappa}{k}\right],
\qquad \frac{G}{\pi} = A \Big/ \left[\frac{4w}{k}\left(\sin\kappa z - \kappa z \cos\kappa z\right) - \cos\kappa z\right]. \qquad (60)

This provides expressions for a and d in Eq 4 of the Results section, where a = G/π and d = −(G/π)cos κz.

Lyapunov functional and bump distance λ

The dynamical equation in Eq 40 admits a Lyapunov functional. In analogy to the continuous Hopfield model [77], we can define a Lyapunov functional in terms of s(x) = ϕ[g(x)]:

L = -\int dx\, dy\, W(x - y)\, s(x)\, s(y) + \int dx \int_0^{s(x)} d\rho\, \phi^{-1}[\rho] - A \int dx\, s(x). \qquad (61)

The nonlinearity ϕ must be invertible in the range (0, s) for any possible firing rate s. For L to be bounded from below for a network of any size N, we need

  1. W(x) to be negative-definite, and

  2. ∫_0^s dρ ϕ^{−1}[ρ] − As to be bounded from below for any possible firing rate s.

We can check that these hold for our particular functions. Eq 38 immediately shows that the first condition is met. Eq 35 states that ϕ^{−1}[ρ] = ρ when ρ > 0, so ∫_0^s dρ ϕ^{−1}[ρ] − As = s²/2 − As, which satisfies the second condition.

Now we take the time derivative and use Eq 40:

\frac{dL}{dt} = -\int dx \left\{2\int dy\, W(x - y)\, s(y) - \phi^{-1}[s(x)] + A\right\} \frac{ds(x)}{dt} = -\tau \int dx\, \frac{dg(x)}{dt} \frac{ds(x)}{dt} = -\tau \int dx\, \phi'[g(x)] \left(\frac{dg(x)}{dt}\right)^2. \qquad (62)

As long as ϕ is a monotonically nondecreasing function, dL/dt ≤ 0. Thus, L is a Lyapunov functional.

Now we seek to simplify Eq 61. Suppose we are very close to a steady-state solution, so dg/dt ≈ 0. We substitute Eq 40 into Eq 61 to obtain

L = -\frac{1}{2}\int dx\,[g(x) - A]\,s(x) + \int dx\int_0^{s(x)} d\rho\,\phi^{-1}[\rho] - A\int dx\,s(x) = -\frac{1}{2}\int dx\,g(x)\,s(x) + \int dx\int_0^{s(x)} d\rho\,\phi^{-1}[\rho] - \frac{A}{2}\int dx\,s(x). (63)

Now we invoke our ReLU ϕ from Eq 35 to obtain

L = -\frac{1}{2}\int dx\,[g(x) - s(x)]\,s(x) - \frac{A}{2}\int dx\,s(x) = -\frac{A}{2}\int dx\,s(x). (64)

The last equality was obtained by noticing that for any x, either s(x) = 0 or g(x) − s(x) = 0 with our ϕ. Therefore, the stable solution that minimizes L is the one that maximizes the total firing rate.

We can apply our sinusoidal g in Eq 58 to perform the integral, recalling that κ = 2π/λ:

L = -\frac{NAG}{2\pi^2}(\sin\kappa z - \kappa z\cos\kappa z), (65)

where N is the network size. So L depends on G and the quantity κz, which we will rewrite as ψ. We now simplify Eq 65 using Eq 57:

L = \frac{-NA^2(\sin\psi - \psi\cos\psi)}{4\tilde{W}(0)(\sin\psi - \psi\cos\psi) - 2\pi\cos\psi} = \frac{NA^2}{-4\tilde{W}(0) + \dfrac{2\pi}{\tan\psi - \psi}}. (66)

Note that since W is negative-definite, W˜(0) = ∫dx W(x) < 0. Also note that 1/(tan ψ − ψ) is a monotonically decreasing function of ψ in the range [0, π]. Thus, to minimize L, we need to minimize ψ. Meanwhile, Eq 57 now reads ψ − cos ψ sin ψ = π/[2W˜(κ)]. The left-hand side is a monotonically increasing function of ψ in the range [0, π], so to minimize ψ, we need to maximize W˜(κ). Thus, the Lyapunov-stable wavelength λ = 2π/κ is the one that maximizes W˜(κ). This is the same mode that grows fastest for the linearized dynamics in Eq 45.
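
As a quick sanity check (not taken from the paper), the two monotonicity claims used in this argument can be verified numerically:

```python
# Numerical check of the monotonicity claims used above (illustrative, not from the paper).
import numpy as np

psi = np.linspace(1e-3, np.pi - 1e-3, 100000)
f1 = 1.0 / (np.tan(psi) - psi)          # appears in the second form of Eq 66
f2 = psi - np.cos(psi) * np.sin(psi)    # left-hand side of the second relation in Eq 57

print("1/(tan psi - psi) decreasing on (0, pi):", bool(np.all(np.diff(f1) < 0)))
print("psi - cos(psi) sin(psi) increasing on (0, pi):", bool(np.all(np.diff(f2) > 0)))
```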

Bump motion under drive and noise

Dynamics along the attractor manifold

Now that we have determined the baseline configuration g, including the bump shape and bump distance, we investigate its motion under drive b and noise ζ. We introduce θ to label the position of the configuration. It can be defined as the center of mass or the point of maximum activity of one of the bumps. We expand the full time-dependent configuration with respect to the baseline configuration located at θ:

g_\alpha(x,t) = g(x - \theta) + \delta g_\alpha(x,t). (67)

g(x − θ) solves Eq 40 with dg/dt = 0; to facilitate calculations below, we will write the baseline equation in this form:

g(x - \theta) = \sum_\beta\int dy\,W_\beta(x,y)\,\phi[g(y - \theta)] + A. (68)

Substituting Eq 67 into Eq 33 and invoking Eq 68, we obtain the following linearized dynamics for δg:

\tau\frac{d\,\delta g_\alpha(x,t)}{dt} + \delta g_\alpha(x,t) = \sum_\beta\int dy\,W_\beta(x,y)\,\phi'[g(y - \theta)]\,\delta g_\beta(y,t) \pm_\alpha \gamma b(t) + \zeta_\alpha(x,t). (69)

We can rewrite this as

\tau\frac{d\,\delta g_\alpha(x,t)}{dt} = \sum_\beta\int dy\,K_{\alpha\beta}(x,y;\theta)\,\delta g_\beta(y,t) \pm_\alpha \gamma b(t) + \zeta_\alpha(x,t), (70)

where

K_{\alpha\beta}(x,y;\theta) = W_\beta(x,y)\,\phi'[g(y - \theta)] - \delta_{\alpha\beta}\,\delta(x - y). (71)

We will often suppress the argument of derivatives of g. If we consider a configuration located at θ, dg/dx implies dg(x − θ)/dx. We make the argument explicit when necessary.

If we differentiate Eq 68 by θ, we obtain

\frac{dg}{dx} = \sum_\beta\int dy\,W_\beta(x,y)\,\phi'[g(y - \theta)]\frac{dg}{dy}, \qquad 0 = \sum_\beta\int dy\,K_{\alpha\beta}(x,y;\theta)\frac{dg}{dy}, (72)

which indicates that dg/dx is a right eigenvector of K with eigenvalue 0. To be explicit about this, we recover the discrete case by converting continuous functions to vectors and matrices:

g_i = g(i - \theta), \qquad \Delta g_i = \left.\frac{dg(x - \theta)}{dx}\right|_{x = i}, \qquad K_{\alpha\beta ij} = K_{\alpha\beta}(i, j; \theta). (73)

If we concatenate matrices and vectors across populations as

J = \begin{pmatrix} K_{LL} & K_{LR} \\ K_{RL} & K_{RR} \end{pmatrix}, \qquad \mathbf{e} = \begin{pmatrix} \Delta\mathbf{g} \\ \Delta\mathbf{g} \end{pmatrix}, (74)

e is the right null eigenvector of J: 0 = Σj Jij ej.

Since K is not symmetric, its left and right eigenvectors may be different. To find the left null eigenvector, we again differentiate Eq 68 with respect to θ, but this time interchanging variables x and y:

\frac{dg}{dy} = \sum_\beta\int dx\,W_\beta(y,x)\,\phi'[g(x - \theta)]\frac{dg}{dx} \approx 2\int dx\,W(x - y)\,\phi'[g(x - \theta)]\frac{dg}{dx}. (75)

The second relation is obtained from Eqs 36 and 37. Replacing the position y by y ±β ξ, where ξ is the connectivity shift, we get

\frac{dg(y - \theta \pm_\beta \xi)}{dy} \approx 2\int dx\,W(x - y \mp_\beta \xi)\,\phi'[g(x - \theta)]\frac{dg(x - \theta)}{dx}, (76)

where we have made the arguments of g explicit. Let’s define shifted versions of the baseline g for each population α:

\bar{g}_\alpha(x) = g(x \pm_\alpha \xi). (77)

Since ξ is small,

\sum_\alpha \bar{g}_\alpha(x) \approx 2g(x). (78)

Applying these expressions to Eq 76 and recalling Eq 36,

\frac{d\bar{g}_\beta}{dy} \approx 2\int dx\,W_\beta(x,y)\,\phi'[g(x - \theta)]\frac{dg}{dx} \approx \sum_\alpha\int dx\,W_\beta(x,y)\,\phi'[g(x - \theta)]\frac{d\bar{g}_\alpha}{dx}. (79)

Finally, we multiply both sides of the equation by ϕ′[g(y − θ)] to obtain

\phi'[g(y - \theta)]\frac{d\bar{g}_\beta}{dy} \approx \sum_\alpha\int dx\,W_\beta(x,y)\,\phi'[g(y - \theta)]\,\phi'[g(x - \theta)]\frac{d\bar{g}_\alpha}{dx} \quad\Longrightarrow\quad 0 \approx \sum_\alpha\int dx\,K_{\alpha\beta}(x,y;\theta)\,\phi'[g(x - \theta)]\frac{d\bar{g}_\alpha}{dx}. (80)

Thus ϕ′[g(x − θ)] dḡα/dx is the left null eigenvector for Kαβ. Again, to be explicit, the discrete equivalent is

J = \begin{pmatrix} K_{LL} & K_{LR} \\ K_{RL} & K_{RR} \end{pmatrix}, \qquad \mathbf{f} = \begin{pmatrix} \phi'[\mathbf{g}]\odot\Delta\bar{\mathbf{g}}_L \\ \phi'[\mathbf{g}]\odot\Delta\bar{\mathbf{g}}_R \end{pmatrix}, (81)

where ⊙ represents element-wise (Hadamard) multiplication. Then, f is the left null eigenvector of J: 0 = Σi Jij fi.

We now revisit Eq 67 and assume that g changes such that the bumps slowly move along the attractor manifold:

g_\alpha(x,t) \approx g(x - \theta(t)), \qquad \frac{d\,\delta g_\alpha(x,t)}{dt} = \frac{dg_\alpha(x,t)}{dt} \approx -\frac{dg(x - \theta(t))}{dx}\frac{d\theta}{dt}. (82)

Again for simplicity, we will often suppress arguments of derivatives of g and dependence on t. We return to Eq 70, project it along the left null eigenvector, and apply Eq 82 to obtain

-\tau\frac{d\theta}{dt}\sum_\alpha\int dx\,\phi'[g(x - \theta)]\frac{d\bar{g}_\alpha}{dx}\frac{dg}{dx} = \gamma b\sum_\alpha\int dx\,(\pm_\alpha 1)\,\phi'[g(x - \theta)]\frac{d\bar{g}_\alpha}{dx} + \sum_\alpha\int dx\,\phi'[g(x - \theta)]\frac{d\bar{g}_\alpha}{dx}\,\zeta_\alpha(x). (83)

The velocity of bump motion is given by dθ/dt. It is

\frac{d\theta}{dt} \approx -\frac{\gamma b\sum_\alpha\int dx\,(\pm_\alpha 1)\,\phi'[g(x - \theta)]\,\dfrac{d\bar{g}_\alpha(x - \theta)}{dx}}{2\tau\int dx\,\phi'[g(x - \theta)]\left(\dfrac{dg(x - \theta)}{dx}\right)^2} - \frac{\sum_\alpha\int dx\,\phi'[g(x - \theta)]\,\dfrac{d\bar{g}_\alpha(x - \theta)}{dx}\,\zeta_\alpha(x)}{2\tau\int dx\,\phi'[g(x - \theta)]\left(\dfrac{dg(x - \theta)}{dx}\right)^2}, (84)

where we have made the arguments of g explicit. This equation encapsulates all aspects of bump motion for our theoretical model. It includes dependence on both drive b and noise ζ, the latter of which is kept in a general form. We will proceed by considering specific cases of this equation.

Path integration velocity vdrive due to driving input b

The noiseless case of Eq 84 with ζα(x) = 0 yields the bump velocity due to drive b, which is responsible for path integration:

v_{\mathrm{drive}} = -\frac{\gamma b\int dx\,\phi'[g(x - \theta)]\left(\dfrac{d\bar{g}_R}{dx} - \dfrac{d\bar{g}_L}{dx}\right)}{2\tau\int dx\,\phi'[g(x - \theta)]\left(\dfrac{dg}{dx}\right)^2}. (85)

Note that this expression is independent of the position θ. We can explicitly remove θ by shifting the dummy variable x → x + θ:

v_{\mathrm{drive}} = -\frac{\gamma b\int dx\,\phi'[g(x)]\left(\dfrac{dg(x + \xi)}{dx} - \dfrac{dg(x - \xi)}{dx}\right)}{2\tau\int dx\,\phi'[g(x)]\left(\dfrac{dg(x)}{dx}\right)^2} \approx -\frac{\gamma b\,\xi\int dx\,\phi'[g(x)]\,\dfrac{d^2 g}{dx^2}}{\tau\int dx\,\phi'[g(x)]\left(\dfrac{dg}{dx}\right)^2}. (86)

Now let’s consider the specific ReLU activation function ϕ. Eq 35 implies

\phi'[g] = \begin{cases} 1 & g > 0 \\ 0 & g \le 0, \end{cases} \qquad\text{so}\qquad \phi'[g]^2 = \phi'[g] \quad\text{and}\quad \phi[g]\cdot\phi'[g] = \phi[g]. (87)

These identities, along with the definition for s (Eq 34), give

\phi'[g(x)]\,\frac{d^2 g}{dx^2} = \frac{d^2 s}{dx^2}, \qquad \phi'[g(x)]\left(\frac{dg}{dx}\right)^2 = \left(\frac{ds}{dx}\right)^2, \qquad \phi[g(x)]\left(\frac{dg}{dx}\right)^2 = s(x)\left(\frac{ds}{dx}\right)^2. (88)

Applying the first two equalities to Eq 86 produces Eq 8 of the Results section.
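
As an illustration, the shifted-derivative form of Eq 86 can be evaluated directly on a discretized baseline profile. In this sketch the profile, its amplitude, and all parameter values are assumptions chosen only to show the bookkeeping; they are not the values behind the paper's figures.

```python
# Sketch: evaluate v_drive (Eq 86) on a ring by replacing integrals with sums over neurons.
import numpy as np

N, M = 600, 3                          # network size and bump number (illustrative)
kappa = 2 * np.pi * M / N
psi, G_over_pi = 1.0, 2.0              # assumed kappa*z and amplitude from Eq 57
tau, gamma, b, xi = 10.0, 0.1, 1.0, 2  # ms, gain, drive, connectivity shift (neurons)

x = np.arange(N)
g = G_over_pi * (np.cos(kappa * x) - np.cos(psi))   # baseline input, Eq 58
phi_prime = (g > 0).astype(float)                   # ReLU: phi'[g] = Theta(g)
dgdx = np.gradient(g)
dgdx_plus = np.gradient(np.roll(g, -xi))            # derivative of g(x + xi)
dgdx_minus = np.gradient(np.roll(g, xi))            # derivative of g(x - xi)

v_drive = -gamma * b * np.sum(phi_prime * (dgdx_plus - dgdx_minus)) \
          / (2 * tau * np.sum(phi_prime * dgdx**2))
print(f"v_drive ~ {v_drive:.4f} neurons per ms (sketch value)")
```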

Now we reintroduce noise ζ and assume it is independent across neurons and timesteps, with mean 〈ζ〉. If we average Eq 84 over ζ, the numerator of the second term becomes

\sum_\alpha\int dx\,\phi'[g(x - \theta)]\,\frac{d\bar{g}_\alpha(x - \theta)}{dx}\,\langle\zeta\rangle = 0. (89)

The integral vanishes because g is even and Σα dḡα/dx is odd. Thus,

\left\langle\frac{d\theta}{dt}\right\rangle = v_{\mathrm{drive}}, (90)

demonstrating that networks with independent noise still path integrate on average.

Diffusion Dinput due to input noise

Independent noise ζ produces diffusion, a type of deviation in bump motion away from the average trajectory. It is quantified by the diffusion coefficient D:

\left\langle\left[\theta(t) - \langle\theta(t)\rangle\right]^2\right\rangle = 2Dt. (91)

In terms of derivatives of θ,

\left\langle\left[\theta(t) - \langle\theta(t)\rangle\right]^2\right\rangle = \int_0^t\int_0^t dt'\,dt''\left\langle\left(\frac{d\theta}{dt'} - \left\langle\frac{d\theta}{dt'}\right\rangle\right)\left(\frac{d\theta}{dt''} - \left\langle\frac{d\theta}{dt''}\right\rangle\right)\right\rangle. (92)

Eqs 84 and 90 imply

\frac{d\theta}{dt} - \left\langle\frac{d\theta}{dt}\right\rangle = -\frac{\sum_\alpha\int dx\,\phi'[g(x - \theta)]\,\dfrac{d\bar{g}_\alpha}{dx}\,\zeta_\alpha(x)}{2\tau\int dx\,\phi'[g(x - \theta)]\left(\dfrac{dg}{dx}\right)^2}. (93)

We then shift the dummy variable x → x + θ(t) and reintroduce explicit dependence on t to obtain

\left\langle\left[\theta(t) - \langle\theta(t)\rangle\right]^2\right\rangle = \int_0^t\int_0^t dt'\,dt''\,\frac{\sum_{\alpha\beta}\int dx\,dy\,\phi'[g(x)]\,\phi'[g(y)]\,\dfrac{d\bar{g}_\alpha}{dx}\,\dfrac{d\bar{g}_\beta}{dy}\,\left\langle\zeta_\alpha(x + \theta(t'), t')\,\zeta_\beta(y + \theta(t''), t'')\right\rangle}{4\tau^2\left[\int dx\,\phi'[g(x)]\left(\dfrac{dg}{dx}\right)^2\right]^2}. (94)

One class of independent ζ is Gaussian noise added to the total synaptic input, which represents neural fluctuations at short timescales. We assume it is independent across neurons and timesteps with zero mean and fixed variance σ2:

\langle\zeta_\alpha(x,t)\rangle = 0, \qquad \langle\zeta_\alpha(x,t)\,\zeta_\beta(y,t')\rangle = \sigma^2\,\Delta t\,\delta(t - t')\,\delta_{\alpha\beta}\,\delta(x - y). (95)

Δt is the simulation timestep, which defines the rate at which the random noise variable is resampled. Eq 94 then becomes, with the help of Eq 78,

\left\langle\left[\theta(t) - \langle\theta(t)\rangle\right]^2\right\rangle = \int_0^t dt'\,\frac{\sigma^2\Delta t\sum_\alpha\int dx\,\phi'[g(x)]^2\left(\dfrac{d\bar{g}_\alpha}{dx}\right)^2}{4\tau^2\left[\int dx\,\phi'[g(x)]\left(\dfrac{dg}{dx}\right)^2\right]^2} \approx \frac{\sigma^2\Delta t\int dx\,\phi'[g(x)]^2\left(\dfrac{dg}{dx}\right)^2}{2\tau^2\left[\int dx\,\phi'[g(x)]\left(\dfrac{dg}{dx}\right)^2\right]^2}\cdot t. (96)

Reconciling this with the definition of the diffusion coefficient D in Eq 91 yields

D_{\mathrm{input}} = \frac{\sigma^2\Delta t\int dx\,\phi'[g(x)]^2\left(\dfrac{dg}{dx}\right)^2}{4\tau^2\left[\int dx\,\phi'[g(x)]\left(\dfrac{dg}{dx}\right)^2\right]^2}. (97)

Applying Eq 88 for a ReLU ϕ gives Eq 10 of the Results section.
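
The same discretization turns Eq 97 into a sum over neurons. Again, the baseline profile and the noise parameters below are illustrative assumptions.

```python
# Sketch: evaluate D_input (Eq 97) from a discretized baseline profile (illustrative values).
import numpy as np

N, M, tau = 600, 3, 10.0
sigma, dt = 0.5, 0.5                   # input-noise standard deviation and timestep (ms)
kappa = 2 * np.pi * M / N
psi, G_over_pi = 1.0, 2.0

x = np.arange(N)
g = G_over_pi * (np.cos(kappa * x) - np.cos(psi))
phi_prime = (g > 0).astype(float)
dgdx = np.gradient(g)

D_input = sigma**2 * dt * np.sum(phi_prime**2 * dgdx**2) \
          / (4 * tau**2 * np.sum(phi_prime * dgdx**2)**2)
print(f"D_input ~ {D_input:.2e} neurons^2 per ms (sketch value)")
```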

Diffusion Dspike due to spiking noise

Instead of input noise, we consider independent noise arising from spiking neurons. In this case, the stochastic firing rate s is no longer the deterministic expression in Eq 34. Instead,

s_\alpha(x,t) = \frac{c_\alpha(x,t)}{\Delta t}, (98)

where c is the number of spikes emitted in a simulation timestep of length Δt. We model each cα(x, t) as an independent Poisson-like random variable driven by the deterministic firing rate ϕ[gα(x, t)] with Fano factor F. It has mean ϕ[gα(x, t)]Δt and variance Fϕ[gα(x, t)]Δt. Therefore,

s_\alpha(x,t) = \phi[g_\alpha(x,t)] + \sqrt{\frac{F\,\phi[g_\alpha(x,t)]}{\Delta t}}\,\eta_\alpha(x,t), (99)

where each ηα(x, t) is an independent random variable with zero mean and unit variance:

\langle\eta_\alpha(x,t)\rangle = 0, \qquad \langle\eta_\alpha(x,t)\,\eta_\beta(y,t')\rangle = \Delta t\,\delta(t - t')\,\delta_{\alpha\beta}\,\delta(x - y). (100)

As in Eq 95, the simulation timestep Δt defines the rate at which η is resampled. By substituting Eq 99 into Eq 33, we see that spiking neurons can be described by deterministic firing rate dynamics with the stochastic noise term

\zeta_\alpha(x,t) = \sum_\beta\int dy\,W_\beta(x,y)\sqrt{\frac{F\,\phi[g_\beta(y,t)]}{\Delta t}}\,\eta_\beta(y,t). (101)

Now we calculate the diffusion coefficient produced by this noise. Eq 93 becomes

\frac{d\theta}{dt} - \left\langle\frac{d\theta}{dt}\right\rangle = -\frac{\sum_{\alpha\beta}\int dx\,dy\,W_\beta(x,y)\,\phi'[g(x - \theta)]\,\dfrac{d\bar{g}_\alpha}{dx}\sqrt{\dfrac{F\,\phi[g(y - \theta)]}{\Delta t}}\,\eta_\beta(y)}{2\tau\int dx\,\phi'[g(x - \theta)]\left(\dfrac{dg}{dx}\right)^2} = -\frac{\sum_\beta\int dy\,\dfrac{d\bar{g}_\beta}{dy}\sqrt{\dfrac{F\,\phi[g(y - \theta)]}{\Delta t}}\,\eta_\beta(y)}{2\tau\int dx\,\phi'[g(x - \theta)]\left(\dfrac{dg}{dx}\right)^2}. (102)

We recalled Eqs 67 and 79 to obtain these equalities. We then proceed as for input noise to calculate

\left\langle\left[\theta(t) - \langle\theta(t)\rangle\right]^2\right\rangle = \int_0^t\int_0^t dt'\,dt''\,\frac{\dfrac{F}{\Delta t}\sum_{\alpha\beta}\int dx\,dy\,\sqrt{\phi[g(x)]\,\phi[g(y)]}\,\dfrac{d\bar{g}_\alpha}{dx}\,\dfrac{d\bar{g}_\beta}{dy}\,\left\langle\eta_\alpha(x + \theta(t'), t')\,\eta_\beta(y + \theta(t''), t'')\right\rangle}{4\tau^2\left[\int dx\,\phi'[g(x)]\left(\dfrac{dg}{dx}\right)^2\right]^2}, (103)

which yields the diffusion coefficient

D_{\mathrm{spike}} = \frac{F\int dx\,\phi[g(x)]\left(\dfrac{dg}{dx}\right)^2}{4\tau^2\left[\int dx\,\phi'[g(x)]\left(\dfrac{dg}{dx}\right)^2\right]^2}. (104)

After applying Eq 88 for a ReLU ϕ and setting F = 1 for Poisson spiking, we obtain Eq 20 of the Results section.
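
Eq 104 can be evaluated the same way; the only changes are the Fano factor in place of σ²Δt and the rectified rate in the numerator. The profile and parameter values below are again illustrative assumptions.

```python
# Sketch: evaluate D_spike (Eq 104) with F = 1 for Poisson spiking (illustrative values).
import numpy as np

N, M, tau, F = 600, 3, 10.0, 1.0
kappa = 2 * np.pi * M / N
psi, G_over_pi = 1.0, 2.0

x = np.arange(N)
g = G_over_pi * (np.cos(kappa * x) - np.cos(psi))
s = np.maximum(g, 0.0)                 # phi[g] for the ReLU
phi_prime = (g > 0).astype(float)
dgdx = np.gradient(g)

D_spike = F * np.sum(s * dgdx**2) / (4 * tau**2 * np.sum(phi_prime * dgdx**2)**2)
print(f"D_spike ~ {D_spike:.2e} neurons^2 per ms (sketch value)")
```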

Drift velocity vconn(θ) due to quenched connectivity noise

Suppose that we perturb the symmetric, translation-invariant W by a small component V representing deviations away from an ideal attractor architecture:

W_\beta(x,y) \to W_\beta(x,y) + V_{\alpha\beta}(x,y). (105)

By Eq 33, this produces the noise term

\zeta_\alpha(x,t) = \sum_\beta\int dy\,V_{\alpha\beta}(x,y)\,\phi[g_\beta(y,t)]. (106)

In contrast to input and spiking noise, this noise is correlated across neurons and time, so it cannot be averaged away as in Eqs 89 and 90. Substituting Eq 106 into Eq 84, we obtain

\frac{d\theta}{dt} = v_{\mathrm{drive}} + v_{\mathrm{conn}}(\theta), (107)

where the drift velocity is

v_{\mathrm{conn}}(\theta) = -\frac{\sum_{\alpha\beta}\int dx\,dy\,V_{\alpha\beta}(x,y)\,\phi'[g(x - \theta)]\,\dfrac{dg(x - \theta)}{dx}\,\phi[g(y - \theta)]}{2\tau\int dx\,\phi'[g(x - \theta)]\left(\dfrac{dg(x - \theta)}{dx}\right)^2}. (108)

Because V is already small, we ignored ξ in Eq 77 to obtain this expression. We have also made the dependence on bump position θ explicit to illustrate how it influences vconn(θ). After applying Eq 88 for a ReLU ϕ, we obtain Eq 24 of the Results section.
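
A small sketch shows how Eq 109 turns a single random draw of V into a position-dependent drift profile, from which the speed difference, speed variability, and maximum drift discussed next can be read off. The noise scale, network size, and baseline profile here are illustrative assumptions.

```python
# Sketch: drift velocity v_conn at each position theta for one random perturbation V (Eq 109).
import numpy as np

rng = np.random.default_rng(0)
N, M, tau = 200, 2, 10.0
kappa = 2 * np.pi * M / N
psi, G_over_pi = 1.0, 2.0
V = rng.normal(0.0, 1e-4, size=(2, 2, N, N))        # assumed quenched connectivity noise

x = np.arange(N)
v_conn = np.empty(N)
for theta in range(N):
    g = G_over_pi * (np.cos(kappa * (x - theta)) - np.cos(psi))
    s = np.maximum(g, 0.0)                           # firing rates for bumps centered at theta
    ds = np.gradient(s)                              # their derivative, Delta s
    num = sum(ds @ V[a, b] @ s for a in range(2) for b in range(2))
    v_conn[theta] = -num / (2 * tau * np.sum(ds**2))

print(f"mean_theta v_conn = {v_conn.mean():.2e}, std_theta = {v_conn.std():.2e}, "
      f"max |v_conn| = {np.abs(v_conn).max():.2e}")
```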

We now make scaling arguments for speed difference (Eq 30), speed variability (Eq 31), and escape drive b0 (Eq 26). To do so, we impose a ReLU ϕ and return to discrete variables to be explicit:

v_{\mathrm{conn};\theta} = -\frac{\sum_{\alpha\beta}\sum_{ij} V_{\alpha\beta ij}\cdot\Delta s_{i-\theta}\cdot s_{j-\theta}}{2\tau\sum_i\left(\Delta s_{i-\theta}\right)^2}. (109)

We need to understand how the numerator scales with M and N. It is a weighted sum of 4N² independent Gaussian random variables Vαβij and is thus a Gaussian random variable itself. It has zero mean, but its variance is proportional to N² · M²/N². The N² comes from the number of terms in the sum and the M²/N² comes from the scaling of ds/dx (Eq 11). In combination with the scaling of the denominator, we conclude that vconn;θ is a Gaussian random variable with

\mathrm{E}[v_{\mathrm{conn};\theta}] = 0, \qquad \mathrm{Var}[v_{\mathrm{conn};\theta}] \propto \frac{N^2}{M^2}. (110)

Eq 109 implies that vconn;θ is correlated over θ. The weights for the sum over Vαβij are the firing rates and their derivatives for a bump centered at θ. If θ is slightly changed, almost the same entries of V will be summed over with similar weights. The amount of correlation across θ is determined by the degree of overlap in weights, and therefore, by the width and number of bumps. Let’s consider the effects of changing N and M on the covariance matrix Cov[vconn;θ, vconn;θ]. A larger N increases the bump width and the correlation length proportionally, so values of the main diagonal decay proportionally more slowly into the off diagonals. A larger M redistributes values among the diagonals by decreasing the bump width and adding more bumps, but it does not change the total amount of correlation. Thus,

\sum_{\theta,\theta'}\mathrm{Cov}[v_{\mathrm{conn};\theta}, v_{\mathrm{conn};\theta'}] \propto N^2\cdot\mathrm{Var}[v_{\mathrm{conn};\theta}]. (111)

This allows us to evaluate

\mathrm{Var}\!\left[\operatorname{mean}_\theta v_{\mathrm{conn};\theta}\right] = \mathrm{Var}\!\left[\frac{1}{N}\sum_\theta v_{\mathrm{conn};\theta}\right] = \frac{1}{N^2}\sum_{\theta,\theta'}\mathrm{Cov}[v_{\mathrm{conn};\theta}, v_{\mathrm{conn};\theta'}] \propto \frac{N^2}{M^2}. (112)

As a sum of zero-mean Gaussian random variables, meanθ vconn;θ is also a zero-mean Gaussian random variable. That means |meanθ vconn;θ| follows a folded normal distribution, which obeys

\mathrm{E}\!\left[\left|\operatorname{mean}_\theta v_{\mathrm{conn};\theta}\right|\right] = \sqrt{\frac{2}{\pi}\mathrm{Var}\!\left[\operatorname{mean}_\theta v_{\mathrm{conn};\theta}\right]} \propto \frac{N}{M}. (113)

Combining this with Eqs 12 and 14 produces the scalings for speed difference in Eq 32.

We now study speed variability, which involves the expression

\operatorname{std}_\theta v_{\mathrm{conn};\theta} = \sqrt{\frac{1}{N}\sum_\theta v_{\mathrm{conn};\theta}^2}. (114)

Since each vconn;θ is Gaussian, the sum of their squares follows a generalized chi-square distribution. Its mean is the trace of the covariance matrix Cov[vconn;θ, vconn;θ], which is equal to N times the variance. Thus, by Eq 110,

\mathrm{E}\!\left[\frac{1}{N}\sum_\theta v_{\mathrm{conn};\theta}^2\right] = \frac{1}{N}\cdot N\cdot\mathrm{Var}[v_{\mathrm{conn};\theta}] \propto \frac{N^2}{M^2}. (115)

We are interested in the square root of the random variable on the left-hand side, and we anticipate its expected value to scale as the square root of the right-hand side. We can make this argument precise. Suppose H is a random variable with a probability distribution function p(h) that scales with a power of the parameter B. We can write

p(h) = B^n\,P(B^m h) (116)

for exponents n and m, where the rescaled probability distribution function P does not scale with B. Conservation of total probability implies

B^n\int dh\,P(B^m h) = B^n B^{-m}\int dh\,P(h) = 1. (117)

Thus, m = n. Next, suppose we know that E[H] ∝ B^o:

\mathrm{E}[H] = B^n\int dh\,h\,P(B^n h) = B^{-n}\int dh\,h\,P(h) \propto B^o. (118)

Thus, n = −o. We can now conclude that E[√H] ∝ √E[H]:

\mathrm{E}[\sqrt{H}] = B^{-o}\int dh\,\sqrt{h}\,P(B^{-o} h) = B^{o/2}\int dh\,\sqrt{h}\,P(h) \propto B^{o/2}. (119)

Applying this result to Eq 115, we obtain

\mathrm{E}\!\left[\operatorname{std}_\theta v_{\mathrm{conn};\theta}\right] = \mathrm{E}\!\left[\sqrt{\frac{1}{N}\sum_\theta v_{\mathrm{conn};\theta}^2}\right] \propto \sqrt{\mathrm{E}\!\left[\frac{1}{N}\sum_\theta v_{\mathrm{conn};\theta}^2\right]} \propto \frac{N}{M}. (120)

Combining this with Eqs 12 and 14 produces the scalings for speed variability in Eq 32.

The escape drive b0 involves the expression maxθ|vconn;θ|. Extreme value statistics for correlated random variables is generally poorly understood. We follow Ref [78] and provide a heuristic argument for its scaling. We can partition vconn;θ across θ into groups that are largely independent from one another based on its correlation structure. As discussed above, vconn;θ is a weighted sum of independent Gaussian random variables Vαβij (Eq 109). The weights are products between the firing rates sjθ and their derivatives Δsiθ for a configuration centered at position θ. If we choose two θ’s such that their bumps do not overlap, the corresponding vconn;θ’s will sum over different Vαβij’s and will be independent. Thus, λ/z roughly sets the number of independent components, where λ is the bump distance and z is the bump width. This ratio does not change with M or N in our networks (Fig 2F), so the maximum function does not change the scaling of |vconn;θ|:

\max_\theta\left|v_{\mathrm{conn};\theta}\right| \propto \left|v_{\mathrm{conn};\theta}\right|. (121)

The scaling of E[|vconn;θ|] can be determined from Var[vconn;θ] through arguments similar to those made in Eqs 116–119. Suppose we know that Var[H] ∝ B^o and E[H] = 0. Then,

\mathrm{Var}[H] = B^n\int dh\,h^2\,P(B^n h) = B^{-2n}\int dh\,h^2\,P(h) \propto B^o. (122)

Thus, n = −o/2. We can now conclude that E[|H|] ∝ √Var[H]:

\mathrm{E}[|H|] = B^{-o/2}\int dh\,|h|\,P(B^{-o/2} h) = B^{o/2}\int dh\,|h|\,P(h) \propto B^{o/2}. (123)

Applying this result to Eq 121, we obtain

\mathrm{E}\!\left[\max_\theta\left|v_{\mathrm{conn};\theta}\right|\right] \propto \mathrm{E}\!\left[\left|v_{\mathrm{conn};\theta}\right|\right] \propto \sqrt{\mathrm{Var}[v_{\mathrm{conn};\theta}]} \propto \frac{N}{M}. (124)

Combining this with Eqs 12, 13 and 26 produces the scalings for the escape drive b0 in Eq 27.

Simulation methods

Dynamics and parameter values

To simulate the dynamics in Eq 33, we discretize the network by replacing neural position x with index i and propagate forward in time with the simple Euler method:

\tau\frac{g_{\alpha i}(t + \Delta t) - g_{\alpha i}(t)}{\Delta t} + g_{\alpha i}(t) = \sum_\beta\sum_j W_{\beta ij}\,s_{\beta j}(t) + A \pm_\alpha \gamma b(t) + \zeta_{\alpha i}(t). (125)

We use τ = 10 ms. We use Δt = 0.5 ms and A = 1 for all simulations except those with spiking neurons. In the latter case, we use a finer timestep Δt = 0.1 ms and set A = 0.1 ms⁻¹. Synaptic inputs g and resting inputs A can be dimensionless for rate-based simulations, but they must have units of rate for spiking simulations. We use γ = 0.1 for rate-based simulations and γ = 0.01 ms⁻¹ for spiking simulations. In all cases, we run the simulation for 1000 timesteps to allow the bumps to form before recording any data. To achieve the relationship in Eq 13 for circular mapping, we rescale γ with network size N and bump number M:

\gamma \to \gamma\cdot\frac{N}{600}\cdot\frac{3}{M}. (126)

The connectivity W takes the form in Eq 38. Unless otherwise specified, we use shift ξ = 2. To produce M bumps in a network of size N, we turn to Eq 47 and set l = 0.44N/M. Note that an alternative to Eqs 13 and 126 would be to rescale ξ proportionally with l ∝ N/M under circular mapping, since bump velocity is also proportional to ξ (Fig 3C). We use w = 8M/N ≈ 3.5/l. For the case of 2l > N/2, which corresponds to a one-bump network, the tails of the cosine function extend beyond the network size. Instead of truncating them, we wrap them around the ring:

W(x) \to W(x) + W(x - N) + W(x + N). (127)

This procedure, along with the scaling of w with N and M, accomplishes Eq 7 and keeps the total connectivity strength per neuron ∑i Wi constant across all N and M, where Wi is the discrete form of W(x).
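
The following self-contained sketch implements the Euler update of Eq 125 for a rate-based ring with two populations. The cosine connectivity profile is a reconstruction of Eq 38 and its overall normalization is an assumption; the remaining parameter values follow the text above. With these settings the network should settle into M activity bumps.

```python
# Minimal rate-based simulation of Eq 125 (not the paper's code; the connectivity form is reconstructed).
import numpy as np

rng = np.random.default_rng(1)
N, M = 600, 3
tau, dt, A, gamma, b = 10.0, 0.5, 1.0, 0.1, 0.0   # ms, ms, resting input, gain, drive
xi = 2                                            # connectivity shift (neurons)
l = 0.44 * N / M                                  # connectivity length scale
w = 8 * M / N                                     # connectivity strength

idx = np.arange(N)
def W_matrix(shift):
    # Signed ring distance between postsynaptic i and presynaptic j, offset by the shift
    d = (idx[:, None] - idx[None, :] - shift + N / 2) % N - N / 2
    out = -0.5 * w * (1.0 - np.cos(np.pi * d / l))    # assumed cosine profile for Eq 38
    out[np.abs(d) > 2 * l] = 0.0                      # zero outside the footprint (no wrap needed for M = 3)
    return out

W_L, W_R = W_matrix(-xi), W_matrix(+xi)               # left- and right-shifted populations

g_L = A + 0.1 * rng.standard_normal(N)
g_R = A + 0.1 * rng.standard_normal(N)
for step in range(4000):                              # 2 s of simulated time
    s_L, s_R = np.maximum(g_L, 0.0), np.maximum(g_R, 0.0)   # ReLU firing rates
    recurrent = W_L @ s_L + W_R @ s_R
    g_L += dt / tau * (-g_L + recurrent + A - gamma * b)
    g_R += dt / tau * (-g_R + recurrent + A + gamma * b)

S = np.maximum(g_L, 0.0) + np.maximum(g_R, 0.0)
n_bumps = int(np.sum((S > 0) & (np.roll(S, 1) <= 0)))
print("active segments (should equal M):", n_bumps)
```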

To generate the Poisson-like spike counts cαi(t) in Eq 98, we rescale Poisson random variables:

c_{\alpha i}(t) = F\cdot C_{\alpha i}(t), \qquad C_{\alpha i}(t) \sim \mathrm{Pois}\!\left[\phi[g_{\alpha i}(t)]\,\Delta t / F\right]. (128)

These counts will be multiples of the Fano factor F. To produce a cαi(t) whose domain is the natural numbers, one can follow Ref [18], which takes multiple samples of Cαi(t) during each timestep.
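
A short version of the rescaled-Poisson sampling in Eq 128, with assumed rates and Fano factor, looks like this:

```python
# Sketch of the rescaled-Poisson spike counts in Eq 128 (rates and Fano factor are assumed).
import numpy as np

rng = np.random.default_rng(2)
F, dt = 2.0, 0.1                          # Fano factor and timestep (ms)
rate = np.array([0.0, 0.05, 0.10])        # phi[g] for a few neurons, spikes per ms

C = rng.poisson(rate * dt / F)            # Poisson draw with mean phi[g] * dt / F
c = F * C                                 # spike counts, multiples of F (Eq 128)
s = c / dt                                # stochastic firing rates (Eq 98)
print(c, s)
```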

To obtain theoretical values in Figs 3, 5, 7 and 8, we need to substitute the baseline inputs gi into the appropriate equations. We use noiseless and driveless simulations to generate gi instead of using Eq 4.

Bump position

We track the position θ of each bump using the firing rate summed across both populations Si(t) = ∑α ϕ[gαi(t)]. We first compute the circular center of mass of Si(t) with periodicity N/M:

\theta_0 = \frac{N}{2\pi M}\,\operatorname{atan2}\!\left[\sum_i S_i(t)\sin(2\pi i M/N),\ \sum_i S_i(t)\cos(2\pi i M/N)\right]. (129)

atan2 is the two-argument arctangent, and we choose its branch cut such that its range is [0, 2π). Thus, θ0 lies between 0 and N/M and represents the bump position averaged periodically across bumps. To track the position of each bump independently, we then partition the network into segments of length ⌊N/M⌋. If N/M is not an integer, we skip one neuron between some segments to have them distributed as evenly as possible throughout the network. We perform a circular shift of Si(t) such that network position θ0 is shifted to the middle of the first segment N/2M, after rounding both quantities to integers. The purpose of this process is to approximately center each bump within a segment so that Si(t) drops to 0 before reaching segment boundaries. We then calculate the center of mass of Si(t) within each segment. After reversing the circular shift, these centers of mass are taken to be the bump positions.
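
The sketch below reproduces this two-step readout on a synthetic rate profile: Eq 129 gives the bump position modulo N/M, and segment-wise centers of mass give one position per bump. The synthetic bumps are an assumption used only to exercise the procedure.

```python
# Sketch of the bump-position readout: circular center of mass (Eq 129) plus per-segment refinement.
import numpy as np

N, M = 600, 3
i = np.arange(N)
true_positions = np.array([40.0, 240.0, 440.0])         # assumed bump centers for testing
S = np.zeros(N)
for p in true_positions:                                # synthetic triangular bumps of half-width 30
    d = np.abs((i - p + N / 2) % N - N / 2)
    S += np.maximum(1.0 - d / 30.0, 0.0)

ang = 2 * np.pi * i * M / N
theta0 = (N / (2 * np.pi * M)) * np.arctan2(np.sum(S * np.sin(ang)), np.sum(S * np.cos(ang)))
theta0 %= N / M                                         # branch chosen so theta0 lies in [0, N/M)

seg = N // M
shift = int(round(seg / 2)) - int(round(theta0))        # center each bump within a segment
S_shifted = np.roll(S, shift)
positions = []
for m in range(M):
    j = np.arange(m * seg, (m + 1) * seg)
    block = S_shifted[j]
    com = np.sum(j * block) / np.sum(block)             # ordinary center of mass within the segment
    positions.append((com - shift) % N)                 # undo the circular shift
print("theta0 =", round(theta0, 2), "bump positions =", [round(p, 1) for p in positions])
```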

Path integration velocity and diffusion

To obtain our results in Figs 3 and 5, we run each simulation for T = 5 s. To extract the bump velocity v produced by a constant drive b, we calculate the mean displacement Θ as a function of time offset u:

\Theta(u) = \frac{\Delta t}{T - u}\sum_t\left[\theta(t + u) - \theta(t)\right]. (130)

θ is the bump position. This equation averages over the fiducial starting time t, which ranges from 0 to T − u − Δt in increments of Δt. We vary u between 0 and T/2 in increments of Δt; the maximum is T/2 to ensure enough t's for accurate averaging. We then fit Θ(u) to a line through the origin to obtain the velocity:

\Theta(u) \approx v u. (131)

We calculate the diffusion coefficient D based on an ensemble of replicate simulations. In this section, angle brackets will indicate averaging over this ensemble. Following the definition of D in Eq 92, we calculate each bump’s position relative to the mean motion of the ensemble:

\omega(t) = \theta(t) - \langle\theta(t)\rangle. (132)

We compute squared displacements and then average over fiducial starting times to obtain a mean squared displacement for each bump as a function of time offset u:

\Omega(u) = \frac{\Delta t}{T - u}\sum_t\left[\omega(t + u) - \omega(t)\right]^2. (133)

t and u span the same time ranges as they did for Θ. We average Ω(u) over the ensemble and fit it to a line through the origin to obtain the diffusion coefficient:

\langle\Omega(u)\rangle \approx 2Du. (134)

For simulations with M bumps, we arbitrarily assign identity numbers 1, …, M to bumps in each simulation. We perform ensemble averaging over bumps with the same identity numbers; that is, we only average over one bump per simulation. This way, we obtain separate values for each bump in Fig 3E–3H; nevertheless, these values lie on top of each other. In Fig 3B and 3C, each point represents v averaged across bumps. To calculate the mean velocity 〈v〉 in Fig 3E and 3F, we fit 〈Θ(u)〉 to a line through the origin. To estimate standard deviations for Figs 3E–3H and 5, we create 48 bootstrapped ensembles, each of which contains 48 replicate simulations sampled with replacement from the original ensemble. We calculate 〈v〉 or D for each bootstrapped ensemble and record the resulting standard deviation. In Fig 5, each point represents D and its estimated standard deviation averaged across bumps.
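
Applied to synthetic drift–diffusion trajectories, the displacement fits of Eqs 130–134 look like the sketch below. The trajectories, their true v and D, and the shortened duration are assumptions used only to generate test data; the sketch also pools starting times and replicates in a single average rather than bootstrapping.

```python
# Sketch of the velocity and diffusion fits (Eqs 130-134) on synthetic trajectories.
import numpy as np

rng = np.random.default_rng(3)
dt, T, n_rep = 0.5, 1000.0, 48                 # ms, ms, replicate simulations
n_steps = int(T / dt)
v_true, D_true = 0.02, 5e-4                    # assumed neurons/ms and neurons^2/ms

steps = v_true * dt + np.sqrt(2 * D_true * dt) * rng.standard_normal((n_rep, n_steps))
theta = np.cumsum(steps, axis=1)               # drift-diffusion bump trajectories

us = np.arange(1, n_steps // 2)                # time offsets up to T/2, in timesteps
Theta = np.array([np.mean(theta[:, u:] - theta[:, :-u]) for u in us])           # Eq 130, pooled
omega = theta - theta.mean(axis=0, keepdims=True)                                # Eq 132
Omega = np.array([np.mean((omega[:, u:] - omega[:, :-u]) ** 2) for u in us])     # Eq 133, pooled

t_off = us * dt
v_fit = np.sum(Theta * t_off) / np.sum(t_off**2)         # line through the origin (Eq 131)
D_fit = np.sum(Omega * t_off) / (2 * np.sum(t_off**2))   # line through the origin (Eq 134)
print(f"v_fit = {v_fit:.4f} (true {v_true}), D_fit = {D_fit:.2e} (true {D_true:.1e})")
```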

Trapping and position-dependent velocity

For simulations with connectivity noise, we determine the escape drive b0 (Fig 7), the smallest drive that allows the bumps to travel through the entire network, by a binary search over b. We perform 8 rounds of search between the limits 0 and 1.28 and another 8 rounds between 0 and −1.28 to obtain b0 within an accuracy of 0.01. In each round, we run a simulation with the test b and see whether the bumps travel through the network or get trapped. Traveling through the network means that every position (rounded to the nearest integer) has been visited by a bump, and trapping means that the motion of at least one bump slows below a threshold for a length of time.

To obtain the position-dependent bump velocity v(θ) produced by connectivity noise when |b| > b0, we run a simulation until the bumps have traveled through the network. At each timestep, we record the positions of the bumps (binned to the nearest integer) and their instantaneous velocities with respect to the previous timestep. We smooth the velocities in time with a Gaussian kernel of width 10 ms, which is the neural time constant τ. We compute the mean and standard deviation of these smoothed velocities for each position bin.
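
In code, the smoothing and binning step might look like the following sketch, with a synthetic trajectory standing in for the simulation and scipy's Gaussian filter performing the 10 ms smoothing.

```python
# Sketch of the position-dependent velocity estimate (synthetic trajectory, assumed parameters).
import numpy as np
from scipy.ndimage import gaussian_filter1d

rng = np.random.default_rng(4)
N, dt, tau = 200, 0.5, 10.0                            # network size, timestep (ms), smoothing width (ms)
inst_v = 0.05 + 0.02 * rng.standard_normal(20000)      # instantaneous velocities, neurons/ms
theta = np.cumsum(inst_v * dt)                         # unwrapped bump trajectory
pos = np.round(theta).astype(int) % N                  # integer position bins on the ring

smooth_v = gaussian_filter1d(inst_v, sigma=tau / dt)   # Gaussian kernel of width tau

mean_v, std_v = np.full(N, np.nan), np.full(N, np.nan)
for p in range(N):
    sel = pos == p
    if np.any(sel):
        mean_v[p], std_v[p] = smooth_v[sel].mean(), smooth_v[sel].std()
print("positions visited:", int(np.sum(~np.isnan(mean_v))), "of", N)
```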

Mutual information

For simulations with input noise, we explore the mutual information between encoded coordinate and single-neuron activity (Fig 6). To do so, we must generate data from which we can compute p(s|u) in Eq 22, for coordinate u ∈ U and activity s ∈ S. We have chosen one set of conditions for performing this analysis, which we detail below.

We first choose to represent either a linear or circular coordinate, which we take to be position or orientation, respectively. We then choose to represent a narrow or wide coordinate range umax, which is 20 cm or 200 cm for position and 36° or 360° for orientation. We divide the range into 20 equally spaced coordinates such that U = {umax/20, …, umax}. We convert these coordinates to network positions according to the mappings in Fig 4. For each coordinate value u, we initialize 96 replicate simulations at the corresponding network position by applying additional synaptic input to the desired bump positions during bump formation. We run the simulations for 5 s, record the final firing rates, and bin them using 6 equally spaced bins from 0 to the 99th percentile across all neurons. All rates above the 99th percentile are also added to the 6th bin. These bins define the discrete S, and normalizing the bin counts produces p(s|u). We marginalize over u to obtain p(s), and p(u) is uniform. We can then use Eq 22 to compute the mutual information.

The 4 local cues in Fig 6F–6H correspond to 4 activity states Scue separate from the 6 activity bins of the CAN neurons, Sneuron. The joint sample space of a single neuron with cues is thus S=Sneuron×Scue with 6 × 4 = 24 total states. We bin neural activity across these more numerous states, using the coordinate value u to determine the cue state value, to again compute p(s|u) and then the mutual information.

We choose to compute mutual information with single-neuron activities binned into 6 discrete states due to computational tractability. A better indication of encoding quality for the entire network would involve using the joint activity of multiple neurons. However, assuming the same binning process, that would involve estimating probability distributions over 6^n states for n neurons, which would require exponentially more replicate simulations per coordinate value than the 96 we use. Alternatively, one could reduce the dimensionality of the network activity by projecting it onto various attractor configurations, as done by Ref [79].
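
The binning and plug-in estimate described above can be written compactly as follows. The synthetic tuning curve, the noise model, and the coordinate range are assumptions; only the binning scheme (6 activity states, a 99th-percentile cap, and uniform p(u)) follows the text.

```python
# Sketch of the mutual-information estimate: I = sum_u p(u) sum_s p(s|u) log2[p(s|u)/p(s)].
import numpy as np

rng = np.random.default_rng(5)
n_u, n_rep, n_bins = 20, 96, 6
u = np.linspace(0, 1, n_u, endpoint=False)                # encoded coordinate (arbitrary units)
tuning = np.maximum(np.cos(2 * np.pi * (u - 0.3)), 0.0)   # assumed single-neuron tuning curve
rates = tuning[:, None] + 0.2 * rng.standard_normal((n_u, n_rep))   # noisy replicate responses

edges = np.linspace(0.0, np.percentile(rates, 99), n_bins + 1)
s = np.clip(np.digitize(rates, edges[1:-1]), 0, n_bins - 1)         # 6 discrete activity states

p_su = np.zeros((n_bins, n_u))
for j in range(n_u):
    p_su[:, j] = np.bincount(s[j], minlength=n_bins) / n_rep        # p(s|u) from replicate counts
p_u = np.full(n_u, 1.0 / n_u)                                       # uniform p(u)
p_s = p_su @ p_u                                                    # marginal p(s)

with np.errstate(divide="ignore", invalid="ignore"):
    ratio = np.where(p_su > 0, p_su / p_s[:, None], 1.0)            # log2(1) = 0 where p(s|u) = 0
    mi = np.sum(p_u * np.sum(p_su * np.log2(ratio), axis=0))
print(f"mutual information ~ {mi:.3f} bits")
```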

Supporting information

S1 Text. Contains text on different model parameters, additional information results, and splitting networks, as well as Figs A, B, and C.

(PDF)

Acknowledgments

We are grateful to Steven Lee for sharing his code and to John Widloski for a careful reading of this manuscript.

Data Availability

Simulation and analysis code is available at https://github.com/louiskang-group/wang-2022.

Funding Statement

RW and LK were funded by RIKEN Center for Brain Science. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. Amari Si. Dynamics of pattern formation in lateral-inhibition type neural fields. Biol Cybern. 1977;27(2):77–87. doi: 10.1007/BF00337259 [DOI] [PubMed] [Google Scholar]
  • 2. Ermentrout GB, Cowan JD. A mathematical theory of visual hallucination patterns. Biol Cybern. 1979;34(3):137–150. doi: 10.1007/BF00336965 [DOI] [PubMed] [Google Scholar]
  • 3. Milnor J. On the concept of attractor. Commun Math Phys. 1985;99(2):177–195. doi: 10.1007/BF01212280 [DOI] [Google Scholar]
  • 4. Cannon SC, Robinson DA, Shamma S. A proposed neural network for the integrator of the oculomotor system. Biol Cybern. 1983;49(2):127–136. doi: 10.1007/BF00320393 [DOI] [PubMed] [Google Scholar]
  • 5. McNaughton BL, Chen LL, Markus EJ. Dead reckoning, landmark learning, and the sense of direction: a neurophysiological and computational hypothesis. J Cognit Neurosci. 1991;3(2):190–202. doi: 10.1162/jocn.1991.3.2.190 [DOI] [PubMed] [Google Scholar]
  • 6. Seung HS. How the brain keeps the eyes still. Proc Natl Acad Sci USA. 1996;93(23):13339–13344. doi: 10.1073/pnas.93.23.13339 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7. Taube JS, Muller RU, Ranck JB. Head-direction cells recorded from the postsubiculum in freely moving rats. I. Description and quantitative analysis. J Neurosci. 1990;10(2):420–435. doi: 10.1523/JNEUROSCI.10-02-00420.1990 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8. Seelig JD, Jayaraman V. Neural dynamics for landmark orientation and angular path integration. Nature. 2015;521(7551):186–191. doi: 10.1038/nature14446 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9. Skaggs WE, Knierim JJ, Kudrimoti HS, McNaughton BL. A model of the neural basis of the rat’s sense of direction. Adv Neural Inf Process Syst. 1995;7:173–180. [PubMed] [Google Scholar]
  • 10. Zhang K. Representation of spatial orientation by the intrinsic dynamics of the head-direction cell ensemble: A theory. J Neurosci. 1996;16(6):2112–2126. doi: 10.1523/JNEUROSCI.16-06-02112.1996 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11. Kim SS, Rouault H, Druckmann S, Jayaraman V. Ring attractor dynamics in the Drosophila central brain. Science. 2017;356(6340):849–853. doi: 10.1126/science.aal4835 [DOI] [PubMed] [Google Scholar]
  • 12. Turner-Evans D, Wegener S, Rouault H, Franconville R, Wolff T, Seelig JD, et al. Angular velocity integration in a fly heading circuit. eLife. 2017;6:e04577. doi: 10.7554/eLife.23496 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13. Green J, Adachi A, Shah KK, Hirokawa JD, Magani PS, Maimon G. A neural circuit architecture for angular integration in Drosophila. Nature. 2017;546(7656):101–106. doi: 10.1038/nature22343 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14. Hafting T, Fyhn M, Molden S, Moser MB, Moser EI. Microstructure of a spatial map in the entorhinal cortex. Nature. 2005;436(7052):801–806. doi: 10.1038/nature03721 [DOI] [PubMed] [Google Scholar]
  • 15. McNaughton BL, Battaglia FP, Jensen O, Moser EI, Moser MB. Path integration and the neural basis of the’cognitive map’. Nat Rev Neurosci. 2006;7(8):663–678. doi: 10.1038/nrn1932 [DOI] [PubMed] [Google Scholar]
  • 16. Fuhs MC, Touretzky DS. A spin glass model of path integration in rat medial entorhinal cortex. J Neurosci. 2006;26(16):4266–4276. doi: 10.1523/JNEUROSCI.4353-05.2006 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17. Guanella A, Kiper D, Verschure P. A model of grid cells based on a twisted torus topology. Int J Neural Syst. 2007;17(04):231–240. doi: 10.1142/S0129065707001093 [DOI] [PubMed] [Google Scholar]
  • 18. Burak Y, Fiete IR. Accurate path integration in continuous attractor network models of grid cells. PLOS Comput Biol. 2009;5(2):e1000291. doi: 10.1371/journal.pcbi.1000291 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19. Yoon K, Buice MA, Barry C, Hayman R, Burgess N, Fiete IR. Specific evidence of low-dimensional continuous attractor dynamics in grid cells. Nat Neurosci. 2013;16(8):1077–1084. doi: 10.1038/nn.3450 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20. Gu Y, Lewallen S, Kinkhabwala AA, Domnisoru C, Yoon K, Gauthier JL, et al. A map-like micro-organization of grid cells in the medial entorhinal cortex. Cell. 2018;175(3):736–750. doi: 10.1016/j.cell.2018.08.066 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21. Gardner RJ, Lu L, Wernle T, Moser MB, Moser EI. Correlation structure of grid cells is preserved during sleep. Nat Neurosci. 2019;22(4):598–608. doi: 10.1038/s41593-019-0360-0 [DOI] [PubMed] [Google Scholar]
  • 22. Gardner RJ, Hermansen E, Pachitariu M, Burak Y, Baas NA, Dunn BA, et al. Toroidal topology of population activity in grid cells. Nature. 2022;602(7895):123–128. doi: 10.1038/s41586-021-04268-7 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23. Goodridge JP, Dudchenko PA, Worboys KA, Golob EJ, Taube JS. Cue control and head direction cells. Behav Neurosci. 1998;112(4):749–761. doi: 10.1037/0735-7044.112.4.749 [DOI] [PubMed] [Google Scholar]
  • 24. Constantinidis C, Wang XJ. A neural circuit basis for spatial working memory. Neuroscientist. 2004;10(6):553–565. doi: 10.1177/1073858404268742 [DOI] [PubMed] [Google Scholar]
  • 25. Edin F, Klingberg T, Johansson P, McNab F, Tegnér J, Compte A. Mechanism for top-down control of working memory capacity. Proceedings of the National Academy of Sciences. 2009;106(16):6802–6807. doi: 10.1073/pnas.0901894106 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26. Wimmer K, Nykamp DQ, Constantinidis C, Compte A. Bump attractor dynamics in prefrontal cortex explains behavioral precision in spatial working memory. Nat Neurosci. 2014;17(3):431–439. doi: 10.1038/nn.3645 [DOI] [PubMed] [Google Scholar]
  • 27. Tsodyks M, Sejnowski T. Associative memory and hippocampal place cells. Int J Neural Syst. 1995;6:S81–S86. [Google Scholar]
  • 28. Samsonovich A, McNaughton BL. Path integration and cognitive mapping in a continuous attractor neural network model. J Neurosci. 1997;17(15):5900–5920. doi: 10.1523/JNEUROSCI.17-15-05900.1997 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29. Stringer SM, Rolls ET, Trappenberg TP. Self-organizing continuous attractor network models of hippocampal spatial view cells. Neurobiol Learn Mem. 2005;83(1):79–92. doi: 10.1016/j.nlm.2004.08.003 [DOI] [PubMed] [Google Scholar]
  • 30. Ben-Yishai R, Bar-Or RL, Sompolinsky H. Theory of orientation tuning in visual cortex. Proc Natl Acad Sci USA. 1995;92(9):3844–3848. doi: 10.1073/pnas.92.9.3844 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31. Somers DC, Nelson SB, Sur M. An emergent model of orientation selectivity in cat visual cortical simple cells. J Neurosci. 1995;15(8):5448–5465. doi: 10.1523/JNEUROSCI.15-08-05448.1995 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32. Brody CD, Romo R, Kepecs A. Basic mechanisms for graded persistent activity: discrete attractors, continuous attractors, and dynamic representations. Curr Opin Neurobiol. 2003;13(2):204–211. doi: 10.1016/S0959-4388(03)00050-3 [DOI] [PubMed] [Google Scholar]
  • 33. Machens CK, Romo R, Brody CD. Flexible control of mutual inhibition: a neural model of two-interval discrimination. Science. 2005;307(5712):1121–1124. doi: 10.1126/science.1104171 [DOI] [PubMed] [Google Scholar]
  • 34. Stringer SM, Trappenberg TP, Rolls ET, Araujo IETd. Self-organizing continuous attractor networks and path integration: one-dimensional models of head direction cells. Netw Comput Neural Syst. 2002;13(2):217–242. doi: 10.1080/net.13.2.217.242 [DOI] [PubMed] [Google Scholar]
  • 35. Wu S, Hamaguchi K, Amari Si. Dynamics and computation of continuous attractors. Neural Comput. 2008;20(4):994–1025. doi: 10.1162/neco.2008.10-06-378 [DOI] [PubMed] [Google Scholar]
  • 36. Stringer SM, Rolls ET, Trappenberg TP. Self-organising continuous attractor networks with multiple activity packets, and the representation of space. Neural Networks. 2004;17(1):5–27. doi: 10.1016/S0893-6080(03)00210-7 [DOI] [PubMed] [Google Scholar]
  • 37. Xie X, Hahnloser RHR, Seung HS. Double-ring network model of the head-direction system. Phys Rev E. 2002;66(4):041902. doi: 10.1103/PhysRevE.66.041902 [DOI] [PubMed] [Google Scholar]
  • 38.Widloski J. Grid cell attractor networks: development and implications. Doctoral dissertation, University of Texas at Austin. 2015;.
  • 39. Sorscher B, Mel G, Ganguli S, Ocko S. A unified theory for the origin of grid cells through the lens of pattern formation. Adv Neural Inf Process Syst. 2019;32:10003–10013. [Google Scholar]
  • 40. Khona M, Chandra S, Fiete IR. From smooth cortical gradients to discrete modules: spontaneous and topologically robust emergence of modularity in grid cells. bioRxiv. 2022; p. 2021.10.28.466284. [Google Scholar]
  • 41. Kang L, Balasubramanian V. A geometric attractor mechanism for self-organization of entorhinal grid modules. eLife. 2019;8:e46687. doi: 10.7554/eLife.46687 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42. Mosheiff N, Burak Y. Velocity coupling of grid cell modules enables stable embedding of a low dimensional variable in a high dimensional neural attractor. eLife. 2019;8:e48494. doi: 10.7554/eLife.48494 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 43. Burak Y, Fiete IR. Fundamental limits on persistent activity in networks of noisy neurons. Proc Natl Acad Sci USA. 2012;109(43):17645–17650. doi: 10.1073/pnas.1117386109 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 44. O’Keefe J, Burgess N. Dual phase and rate coding in hippocampal place cells: theoretical significance and relationship to entorhinal grid cells. Hippocampus. 2005;15(7):853–866. doi: 10.1002/hipo.20115 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 45. Sreenivasan S, Fiete I. Grid cells generate an analog error-correcting code for singularly precise neural computation. Nat Neurosci. 2011;14(10):1330–1337. doi: 10.1038/nn.2901 [DOI] [PubMed] [Google Scholar]
  • 46. Stemmler M, Mathis A, Herz AVM. Connecting multiple spatial scales to decode the population activity of grid cells. Sci Adv. 2015;1(11):e1500816–e1500816. doi: 10.1126/science.1500816 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 47. Brunel N, Nadal JP. Mutual information, Fisher information, and population coding. Neural Comput. 1998;10(7):1731–1757. doi: 10.1162/089976698300017115 [DOI] [PubMed] [Google Scholar]
  • 48. Wei XX, Stocker AA. Mutual information, Fisher information, and efficient coding. Neural Comput. 2016;28(2):305–326. doi: 10.1162/NECO_a_00804 [DOI] [PubMed] [Google Scholar]
  • 49. Mathis A, Herz AVM, Stemmler M. Optimal population codes for space: grid cells outperform place cells. Neural Comput. 2012;24(9):2280–2317. doi: 10.1162/NECO_a_00319 [DOI] [PubMed] [Google Scholar]
  • 50. Papadimitriou CH. On the optimality of grid cells. arXiv. 2016; p. 1606.04876. [Google Scholar]
  • 51. Nambu Y. Quasi-particles and gauge invariance in the theory of superconductivity. Phys Rev. 1960;117(3):648–663. doi: 10.1103/PhysRev.117.648 [DOI] [Google Scholar]
  • 52. Goldstone J. Field theories with Superconductor solutions. Nuovo Cimento. 1961;19(1):154–164. doi: 10.1007/BF02812722 [DOI] [Google Scholar]
  • 53. Itskov V, Hansel D, Tsodyks M. Short-term facilitation may stabilize parametric working memory trace. Front Comput Neurosci. 2011;5:40. doi: 10.3389/fncom.2011.00040 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 54. Seeholzer A, Deger M, Gerstner W. Stability of working memory in continuous attractor networks under the control of short-term plasticity. PLOS Comput Biol. 2019;15(4):e1006928. doi: 10.1371/journal.pcbi.1006928 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 55. Krupic J, Bauza M, Burton S, Lever C, O’Keefe J. How environment geometry affects grid cell symmetry and what we can learn from it. Philos Trans R Soc B. 2014;369(1635):20130188. doi: 10.1098/rstb.2013.0188 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 56. Bush D, Burgess N. A hybrid oscillatory interference/continuous attractor network model of grid cell firing. J Neurosci. 2014;34(14):5065–5079. doi: 10.1523/JNEUROSCI.4017-13.2014 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 57. Hardcastle K, Ganguli S, Giocomo LM. Environmental boundaries as an error correction mechanism for grid cells. Neuron. 2015;86(3):827–839. doi: 10.1016/j.neuron.2015.03.039 [DOI] [PubMed] [Google Scholar]
  • 58. Stensola H, Stensola T, Solstad T, Frøland K, Moser MB, Moser EI. The entorhinal grid map is discretized. Nature. 2012;492(7427):72–78. doi: 10.1038/nature11649 [DOI] [PubMed] [Google Scholar]
  • 59. Widloski J, Marder MP, Fiete IR. Inferring circuit mechanisms from sparse neural recording and global perturbation in grid cells. eLife. 2018;7:e33503. doi: 10.7554/eLife.33503 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 60. Taube JS. The head direction signal: origins and sensory-motor integration. Annu Rev Neurosci. 2007;30(1):181–207. doi: 10.1146/annurev.neuro.29.051605.112854 [DOI] [PubMed] [Google Scholar]
  • 61. Fiete IR, Burak Y, Brookings T. What grid cells convey about rat location. J Neurosci. 2008;28(27):6858–6871. doi: 10.1523/JNEUROSCI.5684-07.2008 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 62. Wei Z, Wang XJ, Wang DH. From distributed resources to limited slots in multiple-item working memory: a spiking network model with normalization. J Neurosci. 2012;32(33):11228–11240. doi: 10.1523/JNEUROSCI.0735-12.2012 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 63. Almeida R, Barbosa J, Compte A. Neural circuit basis of visuo-spatial working memory precision: a computational and behavioral study. J Neurophysiol. 2015;114(3):1806–1818. doi: 10.1152/jn.00362.2015 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 64. Wei XX, Prentice J, Balasubramanian V. A principle of economy predicts the functional architecture of grid cells. eLife. 2015;4:e08362. doi: 10.7554/eLife.08362 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 65. Compte A, Brunel N, Goldman-Rakic PS, Wang XJ. Synaptic mechanisms and network dynamics underlying spatial working memory in a cortical network model. Cereb Cortex. 2000;10(9):910–923. doi: 10.1093/cercor/10.9.910 [DOI] [PubMed] [Google Scholar]
  • 66. Kilpatrick ZP, Ermentrout B. Wandering bumps in stochastic neural fields. SIAM J Appl Dyn Syst. 2013;12(1):61–94. doi: 10.1137/120877106 [DOI] [Google Scholar]
  • 67. Krishnan N, Poll DB, Kilpatrick ZP. Synaptic efficacy shapes resource limitations in working memory. J Comput Neurosci. 2018;44(3):273–295. doi: 10.1007/s10827-018-0679-7 [DOI] [PubMed] [Google Scholar]
  • 68. Bouchacourt F, Buschman TJ. A flexible model of working memory. Neuron. 2019;103(1):147–160.e8. doi: 10.1016/j.neuron.2019.04.020 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 69. Can T, Krishnamurthy K. Emergence of memory manifolds. arXiv. 2021; p. 2109.03879. [Google Scholar]
  • 70. Wu S, Amari Si, Nakahara H. Population coding and decoding in a neural field: a computational study. Neural Comput. 2002;14(5):999–1026. doi: 10.1162/089976602753633367 [DOI] [PubMed] [Google Scholar]
  • 71. Qi Y, Breakspear M, Gong P. Subdiffusive dynamics of bump attractors: mechanisms and functional roles. Neural Comput. 2015;27(2):255–280. doi: 10.1162/NECO_a_00698 [DOI] [PubMed] [Google Scholar]
  • 72. Fung CCA, Wong KYM, Wu S. A moving bump in a continuous manifold: a comprehensive study of the tracking dynamics of continuous attractor neural networks. Neural Comput. 2010;22(3):752–792. doi: 10.1162/neco.2009.07-08-824 [DOI] [PubMed] [Google Scholar]
  • 73. Renart A, Song P, Wang XJ. Robust spatial working memory through homeostatic synaptic scaling in heterogeneous cortical networks. Neuron. 2003;38(3):473–485. doi: 10.1016/S0896-6273(03)00255-1 [DOI] [PubMed] [Google Scholar]
  • 74. Thurley K, Leibold C, Gundlfinger A, Schmitz D, Kempter R. Phase precession through synaptic facilitation. Neural Comput. 2008;20(5):1285–1324. doi: 10.1162/neco.2008.07-06-292 [DOI] [PubMed] [Google Scholar]
  • 75. Navratilova Z, Giocomo LM, Fellous JM, Hasselmo ME, McNaughton BL. Phase precession and variable spatial scaling in a periodic attractor map model of medial entorhinal grid cells with realistic after-spike dynamics. Hippocampus. 2012;22(4):772–789. doi: 10.1002/hipo.20939 [DOI] [PubMed] [Google Scholar]
  • 76. Kang L, DeWeese MR. Replay as wavefronts and theta sequences as bump oscillations in a grid cell attractor network. eLife. 2019;8:e46351. doi: 10.7554/eLife.46351 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 77. Hopfield JJ. Neurons with graded response have collective computational properties like those of two-state neurons. Proc Natl Acad Sci USA. 1984;81(10):3088–3092. doi: 10.1073/pnas.81.10.3088 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 78. Majumdar SN, Pal A, Schehr G. Extreme value statistics of correlated random variables: A pedagogical review. Phys Rep. 2020;840:1–32. doi: 10.1016/j.physrep.2019.10.005 [DOI] [Google Scholar]
  • 79. Roudi Y, Treves A. Representing where along with what information in a model of a cortical patch. PLOS Comput Biol. 2008;4(3):e1000012. doi: 10.1371/journal.pcbi.1000012 [DOI] [PMC free article] [PubMed] [Google Scholar]
PLoS Comput Biol. doi: 10.1371/journal.pcbi.1010547.r001

Decision Letter 0

Daniele Marinazzo, Xuexin Wei

5 May 2022

Dear Dr. Kang,

Thank you very much for submitting your manuscript "Multiple bumps can enhance robustness to noise in continuous attractor networks" for consideration at PLOS Computational Biology.

As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. In light of the reviews (below this email), we would like to invite the resubmission of a significantly-revised version that takes into account the reviewers' comments.

We cannot make any decision about publication until we have seen the revised manuscript and your response to the reviewers' comments. Your revised manuscript is also likely to be sent to reviewers for further evaluation.

When you are ready to resubmit, please upload the following:

[1] A letter containing a detailed list of your responses to the review comments and a description of the changes you have made in the manuscript. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

[2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file).

Important additional instructions are given below your reviewer comments.

Please prepare and submit your revised manuscript within 60 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email. Please note that revised manuscripts received after the 60-day due date may require evaluation and peer review similar to newly submitted manuscripts.

Thank you again for your submission. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,

Xuexin Wei

Associate Editor

PLOS Computational Biology

Daniele Marinazzo

Deputy Editor

PLOS Computational Biology

***********************

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: The authors analyze and simulate the dynamical evolution of bump solutions in perturbed versions of continuous attractor networks. Their analysis relies primarily on weak noise or weak connectivity perturbation assumptions so that the stochastic motion of bumps can be approximated by linearizing around stationary bump or pattern solutions. As the authors make clear, these approaches and findings are actually fairly standard at this point and already developed and used by a number of authors to analyze stochastic bump dynamics in neural network models that are similar but maybe not quite quantitatively parameterized the same. The new finding proffered by this paper is simply that for the particular model parameterization increasing the number of bumps decreases bump diffusion for "linear" networks and increases bump diffusion for "circular" networks. Linear networks reside on a non-periodic line and circular networks are periodic.

My primary issue with the work is that the coupling strength scaling (Eq. 13) the authors choose seems to be doing a lot of the work in shaping these relations, but this is not well justified. Moreover, I do not see how general this result is or if you've simply chosen constituent functions of the model (ϕ and W) that are non-generic and so produce a behavior that would not likely persist in other networks with different transfer or weight functions. In fact, in a recent review by Stein, Barbosa, & Compte (2021; Curr Op Neuro), this issue is alluded to, in that qualitative trends in bump diffusion increasing/decreasing can be model function choice dependent. I don't think the authors are really hiding anything, but I would like to see how well these results hold up across a broader range of model function choice. Otherwise, I'd say given the technical nature of the paper, it is presented thoroughly and clearly. See below for more detailed comments.

Comments:

1. You seem to suggest that networks with M bumps essentially code information the same way regardless of bump number. However, a single neuron will see activity rise and fall M times in a "rotation period" and so the same velocity integration process can represent an interval of 1/M the length. This is made fairly clear in past papers of grid cell models that the authors cite. It is unclear how bump identity could be known by a single neuron (in Fig. 1). It seems like you would shorten the period. What is driving the rotation of the bumps? Is it now that you have to pack neurons of a particular tuning N times as tightly?

2. The intro is wrong in asserting no one else has examined the effect of bump number of noise-induced WM degradation. Many past papers have:

Edin et al. Mechanism for top-down control of working memory capacity (2009) PNAS

Wei et al. From distributed resources to limited slots in multiple-item working memory: a spiking network model with normalization (2012) J Neurosci

Almeida et al. Neural circuit basis of visuo-spatial working memory precision: a computational and behavioral study (2015) J Neurophysiol

Krishnan et al. Synaptic efficacy shapes resource limitations in working memory (2018) J Comput Neurosci

Bouchacourt & Buschman. A flexible model of working memory (2019) Neuron

3. Per my point above, what justifies rescaling connectivity as in Eq 7? A lot of results in the paper hinge on this, but it is not well justified.

4. Provide justification/citation for the reduction from Eq 1 to Eq 3. Does this depend on the ReLu transfer function assumption? You do this in the Theoretical model section too, but don’t really prove the reduction is valid.

5. Line 119: Doesn’t the period of the network impact the spacing more than a small difference? Seems like this would be the case especially if L/λ equals an integer plus a half.

6. Why does keeping the bump shape the same make for a fair comparison? Neural architecture would seem more likely to be the constrained quantity. You should provide more evidence that there is some straightforward mechanism for the amplitude of neural connectivity to be varied so that bump are of the same size regardless of their number. This seems like it would be a complicated feedback control mechanism.

7. You introduce the notion of linear vs circular coordinates, but you are not very clear about what boundary conditions you use in the linear coordinate case. Are they free boundaries? In general you are not clear about what weight function W is used in any of the plots. Is this somewhere buried in the methods at the end?

8. Line 355: You say most of your results apply to a general nonlinear transfer function. Where do you make it clear that this is true? All your plots are for the ReLU, so I’m not sure where you’re showing results for other nonlinearities. Do Eq. 8, 10, 12 all hold for general ϕ or specifically the ReLU? Make this more clear.

9. Your analysis assumes periodic solutions g that are sinusoidal, but shouldn’t there also be solutions that might just be single localized bumps? Did you explore this possibility? Is it clear that choosing W as in Eq. 37 rules out such solutions? If so, it seems to me that the W you chose is nongeneric, since typically in continuum bump attractor networks we should find single bumps with quiescence around.

Reviewer #2: The paper by Wang and Kang addresses an important problem in theoretical neuroscience. Continuous attractors are an important conceptual model in neuroscience, but it is known that noise can cause them to drift. This will hamper both the retention of information in continuous attractor models of working memory and the integration of input in continuous attractor models of path integration.

The main contribution of the paper is that noise induced drift can be minimized by using connectivity patterns that yield more bumps. This is shown both by solid analytical calculations and computer simulations.

In general this is a good paper, but I think it suffers from two major and one minor issues as I describe below.

Major Issues:

1. Although I largely agree with the author's finding that drift due to fast noise can be reduced by having more bumps, I do not find that this necessarily confers, or that the authors show that it confers, a benefit when looked at from the perspective of the presumed function of continuous attractors. Having more bumps in a network of N neurons, in addition to reducing noise, reduces the resolution by which the external covariate can be encoded. In other words, what the authors do not address is how much "information" is lost or maintained as a result of drift with different number of bumps. Consider using a continuous attractor of N neurons for keeping an angle in working memory. The original angle, encoded by the original position of the bump(s), and the angle encoded by the position of the bump(s) after a given time T may be closer to each other when there are more bumps in the network. However, with one bump, the 360 degrees can be represented in steps of 360/N degrees, while with two bumps in steps of 360*2/N. I think the authors should perhaps use mutual information to study this tradeoff as has been used e.g. in Roudi and Treves PLoS Comp Biol 2008.

2. The authors only focus on the effect of fast noise on the drift. An equally important problem, if not more important, is that of the drift induced by quenched noise, e.g. when the connectivity is not the perfect connectivity in Fig 2b but has a temporally fixed noise added to it, e.g. see Eq. 2 of Itskov et al 2011. I think the authors should discuss the interaction between multi-bump solutions and the influence of quenched noise in more detail, perhaps also discussing the recent result in https://arxiv.org/pdf/2109.03879.pdf

Minor:

There is also a minor issue regarding the way the authors discuss CANs. The authors seem to confuse evidence that is compatible with CANs with evidence that shows the existence of CANs in the brain. As far as I can say, none of the papers cited by the authors show that CANs are actually implemented in the brain or are actually used by the brain. The cited evidence are either theoretical models or experimental data that are consistent or supportive of CANs. I suggest the authors be more careful (e.g. "The brain uses path-integrating CANs" can be changed to "Path integrating CANs have been proposed as a mechanism ...").

**********

Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code —e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: No: The response to this question the authors give is simply the link https://louiskang.group/repo but this does not include a folder for code for the current paper.

Reviewer #2: Yes

**********

PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Figure Files:

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

Data Requirements:

Please note that, as a condition of publication, PLOS' data policy requires that you make available all data used to draw the conclusions outlined in your manuscript. Data must be deposited in an appropriate repository, included within the body of the manuscript, or uploaded as supporting information. This includes all numerical values that were used to generate graphs, histograms etc.. For an example in PLOS Biology see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5.

Reproducibility:

To enhance the reproducibility of your results, we recommend that you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option to publish peer-reviewed clinical study protocols. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1010547.r003

Decision Letter 1

Daniele Marinazzo, Xuexin Wei

30 Jul 2022

Dear Dr. Kang,

Thank you very much for submitting your manuscript "Multiple bumps can enhance robustness to noise in continuous attractor networks" for consideration at PLOS Computational Biology. As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. The reviewers appreciated the attention to an important topic. Based on the reviews, we are likely to accept this manuscript for publication, providing that you modify the manuscript according to the review recommendations.

Please prepare and submit your revised manuscript within 30 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email.

When you are ready to resubmit, please upload the following:

[1] A letter containing a detailed list of your responses to all review comments, and a description of the changes you have made in the manuscript. Please note that, while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

[2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file).

Important additional instructions are given below your reviewer comments.

Thank you again for your submission to our journal. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,

Xuexin Wei

Associate Editor

PLOS Computational Biology

Daniele Marinazzo

Deputy Editor

PLOS Computational Biology

***********************

A link appears below if there are any accompanying review attachments. If you believe any reviews to be missing, please contact ploscompbiol@plos.org immediately:

[LINK]

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: Thanks to the authors for addressing my concerns. I now support publication of the article in PLoS Computational Biology.

Reviewer #3: The continuous attractor model is believed to represent the dynamics and structure of certain populations of neurons in the brain – such as head direction cells and grid cells, thought to encode the direction the animal is facing and its location. The networks are usually given a distinct topology and a predefined connectivity, and the activity of each neuron is given by the time-dependent synaptic input. Depending on the connectivity, one or more activity packets or “bumps” arise on the network. These may be smoothly translated along the network, and the position of the packet(s) usually depicts the current state of some external variable – e.g. position in space or head direction. It is still an open question whether the brain follows continuous attractor dynamics and, if so, whether the networks hold one or more bumps.
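
To make the model class sketched above concrete, the following is a minimal, illustrative ring-attractor simulation in Python. It is not the authors' code: the parameter values (N, M, tau, g), the cosine-shaped kernel, and the saturating rectification are simplifying assumptions chosen for brevity, not the connectivity or activation function used in the manuscript.

    import numpy as np

    # Minimal, illustrative 1D ring attractor (not the authors' code).
    # N neurons on a ring; a cosine recurrent kernel with M periods lets
    # M activity bumps form once the dynamics settle.
    N, M = 256, 2                  # network size and bump number (assumed values)
    tau, g, dt = 10.0, 4.0, 0.5    # time constant, recurrent gain, time step (assumed)

    theta = 2 * np.pi * np.arange(N) / N
    # Excitation between neurons whose preferred phases differ by a multiple
    # of 2*pi/M, effective inhibition otherwise (textbook cosine kernel).
    W = g * np.cos(M * (theta[:, None] - theta[None, :])) / N

    s = 0.1 * np.random.rand(N)    # small random activity to break symmetry
    for _ in range(2000):
        inp = W @ s + 0.1                              # recurrent input plus uniform drive
        s += dt / tau * (-s + np.clip(inp, 0.0, 1.0))  # saturating rectified rate dynamics

    # Count contiguous active regions around the ring; their common phase
    # would encode the stored coordinate.
    active = s > 0.5 * s.max()
    print("bumps formed:", np.sum(active & ~np.roll(active, 1)))

In this toy version, setting M = 4 at the same N simply raises the kernel's spatial frequency and yields four bumps; the manuscript's networks reach different bump numbers through their own connectivity choices, but the sketch conveys the basic picture of bumps that can sit anywhere on the ring.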

Wang and Kang give a convincing exposé of how, under certain circumstances, there may be an advantage in a continuous attractor network (CAN) having multiple bumps. The authors provide a thorough analysis of the effect of three variants of noise – synaptic input (1), spiking noise (2), and perturbations in connectivity (3) – on a 1D CAN model with ring topology under varying bump numbers and network sizes. Robustness is measured through bump diffusion coefficients and mutual information for noise types 1 and 2, and through bump “escape drive” and bump velocity irregularities for type 3. Through mathematical derivations, the dependence of the noise on the number of bumps and neurons is clearly shown, and the findings are supported by simulations. The authors conclude that, for CANs representing a linear variable (e.g. location on a linear track), robustness to noise increases with more bumps but decreases with a larger number of neurons. For CANs representing a circular variable (e.g. head direction), there is no effect of adding more bumps, but increasing the number of neurons makes the network more robust. The figures are polished and easy to understand, and the text is well written. The code is neatly organized, easy to implement, and runs quickly.

The revision has improved the paper by adding mutual information as a second measure of robustness, clarifying certain statements, including references to the working-memory CAN literature, and adding an analysis of a different non-linear activation function as well as an analysis using unscaled connectivity. Notably, the latter showed that fewer bumps had greater robustness to noise for the circular mapping.

The analysis is elaborate and the theory sound. However, such an analysis depends on the assumptions made in the model. The authors explore a few different conditions – linear vs. circular mapping, ReLU and sigmoidal activation functions, scaled vs. unscaled connectivity weights, and three types of noise. The specific results rely heavily on the mapping between internal and external variables and possibly on other choices made throughout – e.g. the topology and dimension of the network. The plausibility of the assumptions is only briefly discussed, and the intuition behind the assumptions and their consequences should be clarified for the reader. Thus, to me, the impact seems to lie more in the framework than in the results, and a main concern is the extent to which the title and main text focus on the increased robustness of multi-bump networks. Clarifying and softening the claims may help avoid giving the impression that the increased robustness is a general result for all CANs.

In particular, explaining more explicitly how the choice of mapping relates to the results, and why multiple bumps and a low number of neurons are beneficial for a linear mapping but not a circular one, would make it easier to understand the non-triviality of the findings and thus the impact of the paper. For example, doubling the number of bumps doubles the number of neurons encoding each (fixed) location under both linear and circular mapping. That is, going from one to two bumps is similar to splitting the population into two networks: in the linear case, each encodes half the range of the original network with the same interval between encoded positions; under circular mapping, each encodes the full range but with twice the interval between encoded angles. Could the results thus depend on the number of neurons encoding each (fixed) interval, the size of the interval, the total range of coordinates encoded by the network, or perhaps just the “spread” of noise across bumps? How do the results compare when splitting the networks: is the combined readout of multiple networks more or less robust than the readout of one network with multiple bumps, and which is more biologically plausible? Showing more intuitively how the results relate to the assumptions would make the exposition clearer and help the reader understand what is really making a difference.
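
To make the arithmetic behind the two mappings concrete, the following toy calculation (not taken from the manuscript or the review) converts a hypothetical noise-driven bump displacement into readout error under each mapping. The network size, the length assigned to each neuron, and the assumption that the displacement shrinks roughly in inverse proportion to bump number are invented values, chosen only to mirror the paper's qualitative conclusion.

    # Toy readout-error arithmetic for the two mappings (illustrative values only).
    N = 1200                  # neurons on the ring (assumed)
    cm_per_neuron = 1.0       # linear mapping: fixed length per neuron (assumed)
    base_displacement = 4.0   # hypothetical noise-driven bump shift (neurons) at M = 1

    for M in (1, 2, 4):
        delta = base_displacement / M            # assumed suppression with more bumps
        spacing = N / M                           # neurons between adjacent bumps
        linear_err = delta * cm_per_neuron        # position error under the linear mapping
        circular_err = delta * 360.0 / spacing    # angle error: one spacing maps to 360 deg
        print(f"M={M}: linear {linear_err:.1f} cm, circular {circular_err:.1f} deg")

With these numbers the linear error falls from 4.0 cm to 1.0 cm as bumps are added, while the circular error stays at 1.2 degrees, because the conversion factor of 360 degrees per bump spacing grows exactly as fast as the assumed displacement shrinks.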

Lines 130-131 say the scaling of W removes any influence of bump shape. However, the change of firing rate with network position (i.e. the slope of the bump, ds/dx ~ M/N = 1/λ) seems to be an important factor for robustness (e.g. eq. 10). It would seem the results could be explained by saying that adding more bumps (in the linear case) decreases the bump distance, increasing ds/dx (hence changing the bump shape) and thus increasing robustness, while under circular mapping ds/dx is unchanged as the network distance is scaled. In Fig. 10A, we see how the change in w changes the shape of the bump (and ds/dx), as seen from eqs. 58 and 60, which should also be reflected in the results. A clearer derivation/analysis of this effect would be enlightening.
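
One way to spell out the relation invoked here, under the simplifying assumption (made for illustration, not taken from the manuscript's derivation) that the bump's peak rate s_max stays roughly fixed while its width scales with the spacing λ = N/M:

    \lambda = \frac{N}{M}, \qquad
    \left|\frac{ds}{dx}\right| \sim \frac{s_{\max}}{\lambda} = \frac{s_{\max} M}{N}

At fixed N, doubling the bump number then roughly doubles the typical slope of the activity profile in network coordinates; under the circular mapping, the rescaling of network distance to angle cancels this factor, leaving the slope with respect to the encoded angle unchanged.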

Similarly, it could be explicitly shown in Fig. 6 how the temporal/spatial shape of the bumps changes when introducing more bumps and how this leads to a different spatial tuning curve per neuron and thus a higher mutual information score for the linear case. In this regard, I am also missing mutual information scores in Fig. 10.

In computing mutual information, I wonder whether it would be fairer to look at the whole coordinate range of the network. Looking at different numbers of bumps is similar to studying neurons from grid cell modules of different scales, and I would expect the descriptive power of each neuron to be the same but at different spatial scales. Visual cues (or multiple modules) may be involved in confining the possible states, but would the results then be obvious, in that grid modules with small spacing give a more precise description of location (in a short interval) than those with large spacing?

It would also be interesting to see how robust the readout of the combined network of grid cell modules (of different spacing) would be with a varying number of bumps and how they could interact to resolve the ambiguity.

**********

Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: No: I don't see a code repo link.

Reviewer #3: Yes

**********

PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #3: Yes: Erik Hermansen

Figure Files:

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, log in and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

Data Requirements:

Please note that, as a condition of publication, PLOS' data policy requires that you make available all data used to draw the conclusions outlined in your manuscript. Data must be deposited in an appropriate repository, included within the body of the manuscript, or uploaded as supporting information. This includes all numerical values that were used to generate graphs, histograms, etc. For an example in PLOS Biology, see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5.

Reproducibility:

To enhance the reproducibility of your results, we recommend that you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option to publish peer-reviewed clinical study protocols. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols

References:

Review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript.

If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1010547.r005

Decision Letter 2

Daniele Marinazzo, Xuexin Wei

6 Sep 2022

Dear Dr. Kang,

We are pleased to inform you that your manuscript 'Multiple bumps can enhance robustness to noise in continuous attractor networks' has been provisionally accepted for publication in PLOS Computational Biology.

Before your manuscript can be formally accepted, you will need to complete some formatting changes, which you will receive in a follow-up email. A member of our team will be in touch with a set of requests.

Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.

IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.

Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us now if you or your institution is planning to press release the article. All press must be co-ordinated with PLOS.

Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Computational Biology. 

Best regards,

Xuexin Wei

Academic Editor

PLOS Computational Biology

Daniele Marinazzo

Section Editor

PLOS Computational Biology

***********************************************************

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #3: I thank the authors for addressing my concerns; their findings are now more intuitively explained through revised and added sentences and through the discussion of Fisher information. The added analysis of multiple networks and the added mutual information results showcase the ambiguity of the advantage of multiple bumps (split networks with fewer bumps can have similar noise resilience, and the mutual information results describe the range/resolution, and hence possibly the function, of the network rather than its noise robustness), making the article more balanced.

The article serves as a nice addition to the CAN literature and I support its publication in PLoS Computational Biology.

**********

Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #3: Yes

**********

PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #3: Yes: Erik Hermansen

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1010547.r006

Acceptance letter

Daniele Marinazzo, Xuexin Wei

28 Sep 2022

PCOMPBIOL-D-22-00269R2

Multiple bumps can enhance robustness to noise in continuous attractor networks

Dear Dr Kang,

I am pleased to inform you that your manuscript has been formally accepted for publication in PLOS Computational Biology. Your manuscript is now with our production department and you will be notified of the publication date in due course.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.

Soon after your final files are uploaded, unless you have opted out, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

Thank you again for supporting PLOS Computational Biology and open-access publishing. We are looking forward to publishing your work!

With kind regards,

Zsofia Freund

PLOS Computational Biology | Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom ploscompbiol@plos.org | Phone +44 (0) 1223-442824 | ploscompbiol.org | @PLOSCompBiol

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Text. Contains text on different model parameters, additional mutual information results, and splitting networks, as well as Figs A, B, and C.

    (PDF)

    Attachment

    Submitted filename: Response.pdf

    Attachment

    Submitted filename: Response.pdf

    Data Availability Statement

    Simulation and analysis code is available at https://github.com/louiskang-group/wang-2022.


    Articles from PLoS Computational Biology are provided here courtesy of PLOS
