Proceedings of the National Academy of Sciences of the United States of America. 2017 Oct 17;114(44):E9366–E9375. doi: 10.1073/pnas.1705841114

Balanced excitation and inhibition are required for high-capacity, noise-robust neuronal selectivity

Ran Rubin a,1, L F Abbott a,b, Haim Sompolinsky c,d
PMCID: PMC5676886  PMID: 29042519

Significance

Neurons and networks in the cerebral cortex must operate reliably despite multiple sources of noise. Using a mathematical analysis and model simulations, we show that noise robustness requires synaptic connections to be in a balanced regime in which excitation and inhibition are strong and largely cancel each other. Our theory predicts an optimal ratio for the number of excitatory and inhibitory synapses that depends on the statistics of afferent activity and is consistent with data. This distinct form of excitation–inhibition balance is essential for robust neuronal selectivity and crucial for stability in associative memory networks, and it emerges automatically from learning in the presence of noise.

Keywords: E/I balance, synaptic learning, associative memory

Abstract

Neurons and networks in the cerebral cortex must operate reliably despite multiple sources of noise. To evaluate the impact of both input and output noise, we determine the robustness of single-neuron stimulus selective responses, as well as the robustness of attractor states of networks of neurons performing memory tasks. We find that robustness to output noise requires synaptic connections to be in a balanced regime in which excitation and inhibition are strong and largely cancel each other. We evaluate the conditions required for this regime to exist and determine the properties of networks operating within it. A plausible synaptic plasticity rule for learning that balances weight configurations is presented. Our theory predicts an optimal ratio of the number of excitatory and inhibitory synapses for maximizing the encoding capacity of balanced networks for given statistics of afferent activations. Previous work has shown that balanced networks amplify spatiotemporal variability and account for observed asynchronous irregular states. Here we present a distinct type of balanced network that amplifies small changes in the impinging signals and emerges automatically from learning to perform neuronal and network functions robustly.


The response properties of neurons in many brain areas including cerebral cortex are shaped by the balance between coactivated inhibitory and excitatory synaptic inputs (1–5) (for a review see ref. 6). Excitation–inhibition balance may have different forms in different brain areas or species, and its emergence likely arises from multiple mechanisms. Theoretical work has shown that, when externally driven, circuits of recurrently connected excitatory and inhibitory neurons with strong synapses settle rapidly into a state in which population activity levels ensure a balance of excitatory and inhibitory currents (7, 8). Experimental evidence in some systems indicates that synaptic plasticity plays a role in maintaining this balance (9–12). Here we address the question of what computational benefits are conferred by excitation–inhibition balance by examining the properties of balanced and unbalanced neuronal circuits. Although it has been shown that networks in the balanced state have advantages in generating a fast and linear response to changing stimuli (7, 8, 13, 14), the advantages and disadvantages of excitation–inhibition balance for general information processing have not been elucidated [except in special architectures (15–17)]. Here we compare the computational properties of neurons operating with and without excitation–inhibition balance and present a constructive computational reason for strong, balanced excitation and inhibition: It is needed for neurons to generate selective responses that are robust to output noise, and it is crucial for the stability of memory states in associative memory networks. The distinct balanced networks we present here naturally and automatically emerge from synaptic learning that endows neurons and networks with robust functionality.

We begin our analysis by considering a single neuron receiving input from a large number of afferents. We characterize its basic task as discriminating patterns of input activation to which it should respond by firing action potentials from other patterns which should leave it quiescent. Neurons implement this form of response selectivity by applying a threshold to the sum of inputs from their presynaptic afferents. The simplest (parsimonious) model that captures these basic elements is the binary model neuron (18, 19), which has been studied extensively (20–23) and used to model a variety of neuronal circuits (24–28). Our work is based on including and analyzing the implications of four fundamental neuronal features not previously considered together: (i) nonnegative input, corresponding to the fact that neuronal activity is characterized by firing rates; (ii) a membrane potential threshold for neuronal firing above the resting potential (and hence a silent resting state); (iii) sign-constrained and bounded synaptic weights, meaning that individual synapses are either excitatory or inhibitory and the total synaptic strength is limited; and (iv) two sources of noise, input and output noise, representing fluctuations arising from variable stimuli and inputs and from processes within the neuron. As will be shown, these features imply that, when the number of input afferents is large, synaptic input must be strong and balanced if the neuron's response selectivity is to be robust. We extend our analysis to recurrently connected networks storing long-term memory and find that similar balanced synaptic patterns are required for the stability of the memory states against noise. In addition, maximizing the performance of neurons and networks in the balanced state yields a prediction for the optimal ratio of excitatory to inhibitory inputs in cortical circuits.

Results

Our model neuron is a binary unit that is either active or quiescent, depending on whether its membrane potential is above or below a firing threshold. The potential, labeled VPSP, is a weighted sum of inputs xi, i = 1, 2, …, N, that represent afferent firing rates and are thus nonnegative,

V_{PSP}(\boldsymbol{x},\boldsymbol{w}) = V_{rest} + \sum_{i=1}^{N} w_i x_i, \qquad [1]

where Vrest is the resting potential of the neuron and x and w are N-component vectors with elements xi and wi, respectively. The weight wi represents the synaptic efficacy of the ith input. If VPSP ≥ Vth the neuron is in an active state; otherwise, it is in a quiescent state. To implement the segregation of excitatory and inhibitory inputs, each weight is constrained so that wi ≥ 0 if input i is excitatory and wi ≤ 0 if input i is inhibitory.
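For concreteness, the response rule of Eq. 1 can be written in a few lines of code (a minimal sketch in Python; the function and variable names are ours, not from the original work):

```python
import numpy as np

def binary_neuron_response(x, w, v_rest=0.0, v_th=1.0):
    """Eq. 1 plus threshold: return 1 (active) if V_PSP >= V_th, else 0 (quiescent).

    x : nonnegative input rates, shape (N,)
    w : synaptic weights, >= 0 for excitatory and <= 0 for inhibitory afferents
    """
    v_psp = v_rest + np.dot(w, x)
    return int(v_psp >= v_th)
```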

To function properly in a circuit, a neuron must respond selectively to an appropriate set of inputs. To characterize selectivity, we define a set of P exemplar input vectors 𝒙μ, with μ = 1, 2, …, P, and randomly assign them to two classes, denoted as "plus" and "minus." The neuron must respond to inputs belonging to the plus class by firing (active state) and to the minus class by remaining quiescent. This means that the neuron is acting as a perceptron (18–22, 25, 27, 29). We assume the P input activations, 𝒙μ, are drawn identically and independently from a distribution with nonnegative means, 𝒙¯, and covariance matrix, C (when N is large, higher moments of the distribution of x have negligible effect). For simplicity we assume that the stimulus average activities are the same for all input neurons within a population, so that x̄i = x̄exc(inh) ≥ 0, and that C is diagonal with equal variances within a population, σi² = σ²exc(inh). Note that synaptic weights are in units of membrane potential over input activity levels (firing rates) and hence will be measured in units of (Vth − Vrest)/σexc.
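A minimal sketch of how such a task could be generated, assuming gamma-distributed nonnegative inputs with population-specific means and CVs (the particular distributions and parameter values here are illustrative choices, not the paper's):

```python
import numpy as np

def make_task(N=1000, P=500, f_exc=0.8, mean_exc=1.0, cv_exc=1.0,
              mean_inh=4.0, cv_inh=0.5, seed=0):
    """Draw P nonnegative input patterns and random plus/minus labels.

    Gamma distributions with the requested means and CVs stand in for the
    afferent firing-rate statistics; the numbers are illustrative.
    """
    rng = np.random.default_rng(seed)
    n_exc = int(f_exc * N)
    is_exc = np.arange(N) < n_exc

    def draw(mean, cv, size):
        k = 1.0 / cv**2                              # gamma shape
        return rng.gamma(k, mean / k, size=size)     # scale = mean / shape

    X = np.empty((P, N))
    X[:, is_exc] = draw(mean_exc, cv_exc, (P, n_exc))
    X[:, ~is_exc] = draw(mean_inh, cv_inh, (P, N - n_exc))
    y = rng.choice([+1, -1], size=P)   # plus / minus class labels, even split on average
    return X, y, is_exc
```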

We call weight vectors that correctly categorize the P exemplar input patterns, 𝒙μ for μ = 1, 2, …, P, solutions of the categorization task presented to the neuron. Before describing in detail the properties of the solutions, we outline a broad distinction between two types of possible solutions. One type is characterized by weak synapses, i.e., individual synaptic weights that are inversely proportional to the total number of synaptic inputs, wi ∼ 1/N [note that weights weaker than O(1/N) will not enable the neuron to cross the threshold]. For this solution type, the total excitatory and inhibitory parts of the membrane potential are of the same order as the neuron's threshold. An alternative scenario is a solution in which individual synaptic weights are relatively strong, wi ≫ 1/N. In this case, both the total excitatory and inhibitory parts of the potential are, individually, much greater than the threshold, but they make approximately equal contributions, so that excitation and inhibition tend to cancel, and the mean VPSP is close to threshold. We call the first type of solution unbalanced and the second type balanced. Importantly, since both balanced and unbalanced solutions solve the categorization task with the same value of Vth, the two solution types are not related to each other by a global scaling of the weights but represent different patterns of {wi}. Note that the norm of the weight vector, |𝒘| = √(Σi wi²), serves to distinguish the two types of solutions. This norm is of order 1/√N for unbalanced solutions and of order 1 in the balanced case. Weights with norms stronger than O(1) lead to membrane potential values that are much larger in magnitude than the neuron's threshold. For biological neurons postsynaptic potentials of such magnitude can result in very high, unreasonable firing rates (although see ref. 30). We therefore impose an upper bound on the weight norm, |𝒘| ≤ Γ, where Γ is of order 1. We now argue that the differences between unbalanced and balanced solutions have important consequences for the way the system copes with noise.

As mentioned above, neurons in the central nervous system are subject to multiple sources of noise, and their performance must be robust to its effects. We distinguish two biologically relevant types of noise: input noise resulting from the fluctuations of the stimuli and sensory processes that generate the stimulus-related input x and output noise arising from afferents unrelated to a particular task or from biophysical processes internal to the neuron, including fluctuations in the effective threshold due to spiking history and adaptation (31–33) (for theoretical modeling see ref. 34). Both sources of noise result in trial-by-trial fluctuations of the membrane potential VPSP and, for a robust solution, the probability of changing the state of the output neuron relative to the noise-free condition must be low. The two sources of noise differ in their dependence on the magnitude of the synaptic weights. Because input noise is filtered through the same set of synaptic weights as the signal, its effect on the membrane potential is sensitive to the magnitude of those weights. Specifically, if the trial-to-trial variability of each input xiμ is characterized by SD σin, the fluctuations it generates in the membrane potential have SD |𝒘|σin (Fig. 1, Top Left and Top Right). On the other hand, the effect of output noise is independent of the synaptic weights w. Output noise characterized by SD σout induces membrane potential fluctuations with the same SD σout for both types of solutions (Fig. 1, Bottom Left and Bottom Right).
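This scaling difference is easy to verify numerically. The sketch below, with made-up weights and noise levels, estimates the membrane-potential SD produced by each noise source for weak (unbalanced-like) and strong (balanced-like) weights:

```python
import numpy as np

rng = np.random.default_rng(1)
N, trials = 1000, 5000
x = rng.exponential(1.0, N)                 # one noise-free input pattern
sigma_in, sigma_out = 0.5, 0.1              # illustrative noise levels

for scale, label in [(1.0 / N, "unbalanced-like"), (1.0 / np.sqrt(N), "balanced-like")]:
    w = scale * rng.uniform(0.5, 1.5, N)    # toy weights of the stated magnitude
    # input noise is filtered through w; output noise is added to the potential directly
    v_input_noise = (x + sigma_in * rng.standard_normal((trials, N))) @ w
    v_output_noise = x @ w + sigma_out * rng.standard_normal(trials)
    print(f"{label}: |w|*sigma_in = {np.linalg.norm(w) * sigma_in:.4f}, "
          f"measured input-noise SD = {v_input_noise.std():.4f}, "
          f"output-noise SD = {v_output_noise.std():.4f}")
```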

Fig. 1.

Only balanced solutions can be robust to both input and output noise. Each panel depicts membrane potentials resulting from different input patterns in a classification task. Weights are unbalanced [|𝒘| = O(1/√N), Top Left and Bottom Left] or balanced [|𝒘| = O(1), Top Right and Bottom Right]. The neuron is in an active state only if the membrane potential is greater than the threshold Vth. The input pattern class (plus or minus) is specified by the squares underneath the horizontal axis. Each input pattern determines a membrane potential (mean, horizontal bars) that fluctuates from one presentation to another due to input noise (Top Left and Top Right) and output noise (Bottom Left and Bottom Right). Vertical bars depict the magnitude of the noise in each case. The variability of the mean VPSP across input patterns (which is the signal differentiating input pattern classes) is proportional to |𝒘|. As a result, the mean VPSPs for unbalanced solutions (Top Left and Bottom Left) cluster close to the threshold [difference from threshold O(1/√N)]. For balanced solutions (Top Right and Bottom Right), the mean VPSPs have a larger spread [potential difference O(1)]. Input noise (fluctuations of xi, Top Left and Top Right) produces membrane potential fluctuations with SD that is proportional to |𝒘|, which is O(1/√N) for unbalanced solutions (Top Left) and O(1) for balanced solutions (Top Right). Output noise (Bottom Left and Bottom Right) produces membrane potential fluctuations that are independent of |𝒘|, so it is of the same magnitude for both solution types. Thus, while both balanced and unbalanced solutions can be robust to input noise, only balanced solutions can also be robust to substantial output noise.

We can now appreciate the basis for the difference in the noise robustness of the two types of solutions. For unbalanced solutions, the difference between the potential induced by typical plus and minus noise-free inputs (the signal) is of the order of |𝒘| = O(1/√N) (Fig. 1, Top Left and Bottom Left). Although the fluctuations induced by input noise are of this same order (Fig. 1, Top Left), output noise yields fluctuations in the membrane potential of order 1, which is much larger than the magnitude of the weak signal (Fig. 1, Bottom Left). In contrast, for balanced solutions, the signal differentiating plus and minus patterns is of order |𝒘| = O(1), which is the same order as the fluctuations induced by both types of noise (Fig. 1, Top Right and Bottom Right). Thus, we are led to the important observation that the balanced solution provides the only hope for producing selectivity that is robust against both types of noise. However, there is no guarantee that robust, balanced solutions exist or that they can be found and maintained in a manner that can be implemented by a biological system. Key questions, therefore, are, Under what conditions does a balanced solution to the selectivity task exist? And what are, in detail, its robustness properties? Below, we derive conditions for the existence of a balanced solution, analyze its properties, and study the implications for single-neuron and network computation. We show that, subject to a small reduction of the total information stored in the network, robust and balanced solutions exist and can emerge naturally when learning occurs in the presence of output noise.

Balanced and Unbalanced Solutions.

We begin by presenting the results of an analytic approach (20–22) for determining existence conditions and analyzing properties of weights that generate a specified selectivity, independent of the particular method or learning algorithm used to find the weights (SI Replica Theory for Sign- and Norm-Constrained Perceptron). We validate the theoretical results by using numerical methods that can determine the existence of such weights and find them if they exist (SI Materials and Methods).

When the number of patterns P is too large, solutions may not exist. The maximal value of P that permits solutions is proportional to the number of synapses, N, so a useful measure is the ratio α = P/N, which we call the load. The capacity, denoted as αc, is the maximal load that permits solutions to the task. The capacity depends on the relative number of plus and minus input patterns. For simplicity we assume throughout that the two classes are equal in size (but see SI Capacity for Noneven Split of Plus and Minus Patterns). A classic result for the perceptron with weights that are not sign constrained is that the capacity is αc = 2 (20, 35, 36). For the "constrained perceptron" considered here, we find that αc depends also on the fraction of excitatory afferents, denoted by fexc. This fraction is an important architectural feature of neuronal circuits and varies in different brain systems. For fexc = 0, namely a purely inhibitory circuit, the capacity vanishes, because when all of the input to the neuron is inhibitory, VPSP cannot reach threshold and the neuron is quiescent for all stimuli. When the circuit includes excitatory synapses, the task can be solved by appropriate shaping of the strength of the excitatory and inhibitory synapses, and this ability improves as the fraction of excitatory synapses increases. Therefore, for fexc > 0, αc increases with fexc up to a maximum of αc = 1 (half the capacity of an unconstrained perceptron) for fractions equal to or greater than a critical fraction fexc = fexc*. This dependence can be summarized by the capacity curve αc(fexc) (Fig. 2A, solid line) bounding the range of loads that admit solutions for the different excitatory/inhibitory ratios.

Fig. 2.

Balanced and unbalanced solutions. (A) Perceptron solutions as a function of load and fraction of excitatory weights. Above the capacity line [αc(fexc), solid line] no solution exists. Balanced solutions exist only below the balanced capacity line [αb(fexc), dashed shaded line]. Between the balanced capacity and maximum capacity lines, only unbalanced solutions exist (U). On the other hand, below the balanced capacity line, unbalanced solutions coexist with balanced ones (B+U). (B) The norm of the synaptic weight vector of typical solutions as a function of the load [in units of (Vth − Vrest)/σexc]. Below αb the norm is clipped at its upper bound Γ (in this case Γ = 1). Above αb the norm collapses and is of order 1/√N (shown here for N = 3,000). (C) The input imbalance index (IB, Eq. 3) of typical solutions as a function of the load. Note the sharp onset of imbalance above αb. In B and C fexc = 0.8, yielding αc = 1. See SI Materials and Methods for other parameters used. For simulation results see Fig. S1.

Interestingly, fexc* depends on the statistics of the inputs (SI Replica Theory for Sign- and Norm-Constrained Perceptron). We denote the coefficient of variation (CV) of the excitatory and inhibitory input activities by CVexc = σexc/x̄exc and CVinh = σinh/x̄inh, respectively. These measure the degree of stimulus tuning of the two afferent populations. In terms of these quantities, the critical excitatory fraction is

f_{exc}^{*} = \frac{CV_{exc}}{CV_{exc} + CV_{inh}}. \qquad [2]

In other words, the critical ratio between the number of excitatory and inhibitory afferents [fexc*/(1 − fexc*)] equals the ratio of their degree of tuning. To understand the origin of this result, we note that to maximize the encoding capacity, the relative strength of the weights should be inversely proportional to the SD of their afferents, w̄exc(inh) ∝ 1/σexc(inh), implying that the mean total synaptic input is proportional to fexc·w̄exc·x̄exc + finh·w̄inh·x̄inh ∝ fexc/CVexc − finh/CVinh, where finh = 1 − fexc. For excitatory fraction fexc > fexc* this mean total synaptic input is positive, allowing the voltage to reach the threshold and the neuron to implement the required selectivity task with optimally scaled weights. Thus, the capacity of the neuron is unaffected by changes in fexc in the range fexc* ≤ fexc ≤ 1. For excitatory fraction fexc < fexc* the neuron cannot remain responsive (reach threshold) with optimally scaled weights, and thus the capacity is reduced.
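As a quick worked example of Eq. 2 (the CV values below are illustrative, not taken from the paper):

```python
def critical_exc_fraction(cv_exc, cv_inh):
    """Eq. 2: the critical fraction of excitatory afferents."""
    return cv_exc / (cv_exc + cv_inh)

# Illustrative numbers: E afferents twice as sharply tuned as I afferents
# (cv_exc / cv_inh = 2) give a critical fraction of 2/3, i.e. twice as many
# excitatory as inhibitory synapses at the optimum.
print(critical_exc_fraction(cv_exc=1.0, cv_inh=0.5))   # 0.666...
```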

In cortical circuits, inhibitory neurons tend to fire at higher firing rates and are thought to be more broadly tuned than excitatory neurons (4, 37, 38), implying fexc* > 0.5 (SI Effects of E and I Input Statistics). This is consistent with the abundance of excitatory synapses in cortex. However, input statistics that make fexc* < 0.5 do not change the qualitative behavior we discuss (SI Effects of E and I Input Statistics and Fig. S2A).

Fig. S2.

Effects of input statistics. (A) Solution type vs. fexc and α (as a fraction of αcUnconst) (Capacity for Noneven Split of Plus and Minus Patterns) for different values of ϕ = CVexc/CVinh. From Left to Right ϕ = 1/2, 1, 2. Lines are as in Fig. 2A. (B) Type of maximal κin solutions vs. ϕ and κinmax for different values of λ = σinh/σexc. For a wide range of ϕ and λ these solutions are unbalanced for all values of κinmax. Here fexc = 0.8 and λ = 1/2, 1, 2 from Left to Right. (C) Fraction of silent weights for maximal κout solutions vs. the load for different values of λ. Fraction of E silent weights is shown in blue and fraction of I silent weights is depicted in red. Here fexc = 0.8, ϕ = 2, pout = 0.1, and λ = 1/2, 1, 2 from Left to Right. Notably, for unbalanced, maximal κin solutions the fraction of silent weights is constant and equals 0.5 for both E and I inputs (Saddle-point equations for the maximal κin solution and Distribution of synaptic weights).

For load levels below the capacity, many synaptic weight vectors solve the selectivity task and we now describe the properties of the different solutions. In particular, we investigate the parameter regimes where balanced or unbalanced solutions exist. We find that unbalanced solutions with weight vector norms of order 1/√N exist for all load values below αc. As for the balanced solutions with weight vector norms of order 1, they exist below a critical value αb, which may be smaller than αc. Specifically, for fexc ≤ fexc* balanced solutions exist for all load values below capacity; i.e., αb = αc. For fexc > fexc*, αb is smaller than αc and decreases with fexc until it vanishes at fexc = 1 (Fig. 2A, dashed shaded line). The absence of balanced solutions for fexc = 1 is clear, as there is no inhibition to balance the excitatory inputs. Furthermore, the synaptic excitatory weights must be weak (scaling as 1/N) to ensure that VPSP remains close to threshold (slightly above it for plus patterns and slightly below it for minus ones). For 1 ≥ fexc > fexc* the predominance of excitatory afferents precludes a balanced solution if the load is high; i.e., for αb < α ≤ αc. As argued above and shown below, balanced solutions are more robust than unbalanced solutions. Hence, we can identify fexc* as the optimal fraction of excitatory input, because it is the fraction of excitatory afferents for which the capacity of balanced solutions is maximal.

For loads below αb both balanced and unbalanced solutions exist, raising the question, What would be the character of a weight vector that is sampled randomly from the space of all possible solutions? Our theory predicts that whenever balanced solutions exist, the vast majority of the solutions are balanced and furthermore have a weight vector norm that is saturated at the upper bound Γ. This is a consequence of the geometry of high-dimensional spaces in which volumes are dominated by the volume elements with the largest radii (see SI Replica Theory for Sign- and Norm-Constrained Perceptron for details). Thus, for fexc > fexc*, the typical solution undergoes a transition from balanced to unbalanced weights as α crosses the balanced capacity line αb(fexc). At this point the norm of the solution collapses from Γ to |𝒘| ∼ 1/√N (Fig. 2B).

As explained above, for balanced solutions we expect to find a near cancellation of the total excitatory (E) and inhibitory (I) inputs. Our theory confirms this expectation. To measure the degree of E-I cancellation for any solution, we introduce the imbalance index,

IB = \frac{\sum_{i} w_i \bar{x}_i}{\sum_{i \in exc} w_i \bar{x}_i - \sum_{i \in inh} w_i \bar{x}_i}, \qquad [3]

where the overbar symbol denotes an average over all of the input patterns (μ) and, as mentioned above, E weights are nonnegative (wi ≥ 0) and I weights are nonpositive (wi ≤ 0). Whereas for the unbalanced solution the IB is of order 1, for the balanced solution it is small, of order 1/√N. Thus, the typical solution below αb has zero imbalance (to leading order in N), but the imbalance increases sharply as α increases beyond αb (Fig. 2C).
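A direct implementation of Eq. 3 (a short sketch; the array names are ours):

```python
import numpy as np

def imbalance_index(w, x_mean, is_exc):
    """Eq. 3: net mean drive divided by the summed magnitudes of the E and I mean drives."""
    drive = w * x_mean                # per-synapse contribution to the mean potential
    exc = drive[is_exc].sum()         # >= 0, since excitatory weights are nonnegative
    inh = drive[~is_exc].sum()        # <= 0, since inhibitory weights are nonpositive
    return (exc + inh) / (exc - inh)
```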

Noise Robustness of Balanced and Unbalanced Solutions.

To characterize the effect of noise on the different solutions, we introduce two measures, input robustness κin and output robustness κout, which characterize the robustness of the noise-free solutions to the addition of two types of noise. To ensure robustness to output noise, the noise-free membrane potential that is the closest to the threshold must be sufficiently far from it. Thus, we define

\kappa_{out} = \min_{\mu} \left| \sum_{i=1}^{N} w_i x_i^{\mu} - 1 \right|, \qquad [4]

where the minimum is taken over all of the input patterns in the task and the threshold is 1 [because we measure the weights in units of (Vth − Vrest)/σexc]. The second measure, which characterizes robustness to input noise, must take into account the fact that the fluctuations in the membrane potential induced by this form of noise scale with the size of the synaptic weights. Hence, κin = κout/|𝒘| [κin corresponds to the notion of margin in machine learning (39)]. Efficient algorithms for finding the solution with a maximum possible value of κin have been studied extensively (39, 40). We have developed an efficient algorithm for finding solutions with maximal κout (SI Materials and Methods).
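Both robustness measures can be computed directly from a candidate weight vector and the noise-free patterns (a minimal sketch following Eq. 4 and the definition κin = κout/|𝒘|; names are ours):

```python
import numpy as np

def robustness(w, X, y, v_th=1.0):
    """Output robustness (Eq. 4) and input robustness kappa_in = kappa_out / |w|.

    X : noise-free patterns, shape (P, N); y : labels (+1 plus, -1 minus).
    Assumes w already solves the task, so every margin below is positive.
    """
    margins = y * (X @ w - v_th)          # signed distance of each pattern from threshold
    kappa_out = margins.min()
    kappa_in = kappa_out / np.linalg.norm(w)
    return kappa_out, kappa_in
```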

We now ask, What are the possible values of the input and output robustness of unbalanced and balanced solutions? Our theory predicts that the majority of both balanced and unbalanced solutions have vanishingly small values of κin and κout and are thus very sensitive to noise. However, for a given load (below capacity) robust solutions do exist, with a spectrum of robustness values up to maximal values, κinmax > 0 and κoutmax > 0. Since the magnitude of w scales both signal and noise in the inputs, κinmax is not sensitive to |𝒘| and hence is of O(1) for both unbalanced and balanced solutions. On the other hand, κoutmax = κinmax·|𝒘| is proportional to |𝒘|. Thus, we expect κoutmax to be of O(1) when balanced solutions exist and of O(1/√N) when only unbalanced solutions exist. In addition, we expect that increasing the load will reduce the value of κinmax and κoutmax as the number of constraints that need to be satisfied by the synaptic weights increases.

In Fig. 3 we present the values of κinmax and κoutmax vs. the load. As expected, we find that the values of both κinmax and κoutmax reach zero as the load approaches the capacity, αc (and diverge, as N → ∞, for vanishingly small loads). However, κoutmax is only substantial (of order 1) and proportional to Γ below αb where balanced solutions exist (Fig. 3 A and B). In contrast, κinmax remains of order 1 up to the full capacity, αc (Fig. 3C). What are the properties of "optimal" solutions that achieve the maximal robustness to either input or output noise? We find that the solutions that achieve the maximal output robustness, κoutmax, are balanced for all α ≤ αb and their norm saturates the upper bound, Γ (Fig. S3B). Interestingly, for a wide range of input parameters (SI Replica Theory for Sign- and Norm-Constrained Perceptron, Effects of E and I Input Statistics, and Fig. S2B), solutions that achieve the maximal input robustness, κinmax, are unbalanced solutions (Fig. S3C). Nevertheless, we find that below the critical balance load, αb, the κin values of the balanced maximal κout solutions are of the same order as, and indeed close to, κinmax (Fig. 3C, dashed shaded line). In fact, the balanced solution with maximal κout also possesses the maximal value of κin that is possible for balanced solutions.

Fig. 3.

Maximal values of input and output robustness. (A) Maximal value of κout vs. load [in units of Γσexc/(Vth − Vrest)]. No solutions exist above the maximal κout line (κoutmax, solid line). Below κoutmax, for output robustness that is of order 1, only balanced solutions exist. (B) Maximal value of κout for loads between αb and αc (in units of σexc/x̄exc). In this range only unbalanced solutions exist and the maximal κout values (solid line) scale as 1/√N. (C) Maximal value of κin vs. load (in units of σexc). No solutions exist above the maximal κin line (κinmax, solid line). For the parameters used, solutions that achieve κinmax are unbalanced. The maximal value of κin for balanced solutions (dashed shaded line) is not far from κinmax and is attained by solutions that maximize κout for α < αb. In A–C, theory and numerical results are depicted in solid or shaded lines and shaded circles, respectively. Error bars depict SE of the mean. See SI Materials and Methods for parameters used. For further simulation results see Fig. S3.

Fig. S3.

Properties of maximal output and input robustness solutions. (A) Input robustness, κin, vs. the load for the maximal κin solution (red) and the maximal κout solution (blue). (B) Norm of synaptic weight vector vs. the load for the maximal κout solution. In the balanced regime (α < αb) the norm saturates its upper bound Γ = 1. Since the norm is constant, maximizing κout in the balanced regime is equivalent to maximizing κin under the constraint |𝒘| = Γ. (C) Rescaled norm of the synaptic weight vector (√N·|𝒘|) vs. the load for the maximal κin solution. To demonstrate the 1/√N scaling of the weight vector norm, colors depict results for N = 750 (gray), N = 1,500 (green), and N = 3,000 (red). In A–C lines depict theoretical predictions. fexc = 0.8, pout = 0.1, ϕ = CVexc/CVinh = 2, and λ = σinh/σexc = 2; results are averaged over 100 samples.

We conclude that solutions that are robust to both input and output noise exist for loads less than αb, which for fexc > fexc* is smaller than αc. However, as long as fexc is close to fexc*, the reduction in capacity from αc to αb imposed by the requirement of robustness is small.

Balanced and Unbalanced Solutions for Spiking Neurons.

Neurons typically receive their input and communicate their output through action potentials. Thus, a fundamental question is, How will the introduction of spike-based input and spiking output affect our results? Here we show that the main properties of balanced and unbalanced synaptic efficacies, as discussed above, remain when the inputs are spike trains and the model neuron implements spiking and membrane potential reset mechanisms.

We consider a leaky integrate-and-fire (LIF) neuron that is required to perform the same binary classification task we considered using the perceptron. Each input is characterized by a vector of firing rates, 𝒙μ. Each afferent generates a Poisson spike train over an interval from time t = 0 to t = T, with mean rate ri ∝ xiμ. The LIF neuron integrates these input spikes (SI Materials and Methods) and emits an output spike whenever its membrane potential crosses a firing threshold. After each output spike, the membrane potential is reset to the resting potential, and the integration of inputs continues. We define the output state of the LIF neuron, using the total number of output spikes nspikes: The neuron is quiescent if nspikes ≤ nthr and active if nspikes > nthr, where nthr is chosen to maximize classification performance. We do not discuss the properties of learning in LIF neurons (41–45), but instead test the properties of the solutions (weights) obtained from the perceptron model when they are used for the LIF neuron. In particular, we compare the performance of the balanced, maximal κout solution and the unbalanced, maximal κin solution. When the synaptic weights of the LIF neuron are set according to the two perceptron solutions, the mean output of the LIF neuron correctly classifies the input patterns (according to the desired classification; Fig. S4). Consistent with the results for the perceptron, we find that with no output noise the performance of both solutions is good, even in the presence of the substantial input noise caused by Poisson fluctuations in the number of input spikes and their timings (Fig. 4 A–C). When the output noise magnitude is increased (SI Materials and Methods), however, the performance of the unbalanced maximal κin solution quickly deteriorates, whereas the performance of the balanced maximal κout solution remains largely unaffected (Fig. 4 D–F). Thus, the spiking model recapitulates the general results found for the perceptron.

Fig. S4.

Neuronal selectivity for a spiking neuron. Both panels depict the histograms of the mean output spike count for patterns belonging to the plus (blue) and minus (red) classes of an LIF neuron with balanced weights maximizing κout (Left) and unbalanced weights maximizing κin (Right). Here the magnitude of the output noise is zero. In both cases the mean output spike count can be used to correctly classify the patterns. For parameters used see Fig. 3.

Fig. 4.

Selectivity in a spiking model. A and B (D and E) depict the output of an LIF neuron with no (high) output noise for the balanced maximal κout solution (A and D) and the unbalanced maximal κin solution (B and E). C and F depict the receiver operating characteristic (ROC) curves for the two solutions under the no output noise (C) and high output noise (F) conditions obtained as the decision threshold (nthr) is modified from 0 to ∞. Consistent with the results of the perceptron, the performances of the two solutions with no output noise are very similar with a slight advantage for the maximal κin solution. With higher levels of output noise, the performance of the unbalanced maximal κin solution quickly deteriorates, whereas the performance of the balanced maximal κout solution is only slightly affected. |𝒘| of the balanced solution was chosen to equalize the mean output spike count across all patterns in both solutions (mean nspike ≈ 4). See SI Materials and Methods for parameters used.

Balanced and Unbalanced Synaptic Weights in Associative Memory Networks.

Thus far, we have considered the selectivity of a single neuron, but our results also have important implications for recurrently connected neuronal networks, in particular recurrent networks implementing associative memory functions. Models of associative memory in which stable fixed points of the network dynamics represent memories, and memory retrieval corresponds to the dynamic transformation of an initial state to one of the memory-representing fixed points, have been a major focus of memory research for many years (24, 27, 28, 46–48). For the network to function as an associative memory, memory states must have large basins of attraction so that the network can perform pattern completion, recalling a memory from an initial state that is similar but not identical to it. In addition, memory retrieval must be robust to output noise. As we will show, the variables κin and κout for the synaptic weights projecting onto individual neurons in the network are closely related to the sizes of the basins of attraction of the memories and the robustness to output noise, respectively.

We consider a network that consists of N·fexc E and N·(1 − fexc) I recurrently connected binary neurons. The network operates in discrete time steps and at each step the state of one randomly chosen neuron, i, is updated according to

s_i(t+1) = \Theta\left[\sum_{j \neq i} J_{ij} s_j(t) + \eta_{out}(t) - 1\right]. \qquad [5]

Here Θ(x) = 1 for x ≥ 0 and 0 otherwise, Jij is the weight of the synapse from neuron j to neuron i, and ηout(t), the output noise, is a Gaussian random variable with SD σout. P randomly chosen binary activity patterns {𝐬μ}, μ = 1, 2, …, P (where each siμ ∈ {0,1}) representing the stored memories are encoded in the recurrent synaptic matrix J. This is achieved by treating each neuron, say i, as a perceptron with a weight vector 𝒘i = {Jij}j≠i that maps its inputs {sjμ} from all other neurons to its desired output siμ for each memory state (Fig. 5 A and B and SI Materials and Methods). This creates an attractor network in which the memory states are fixed points of the dynamics in the noise-free condition (σout = 0) (20).
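A minimal sketch of the asynchronous dynamics of Eq. 5 (our discretized implementation and naming, not the authors' simulation code):

```python
import numpy as np

def run_network(J, s0, n_steps, sigma_out=0.0, rng=None):
    """Asynchronous binary dynamics of Eq. 5 with firing threshold 1."""
    rng = rng or np.random.default_rng()
    s = s0.astype(float).copy()
    N = len(s)
    for _ in range(n_steps):
        i = rng.integers(N)                         # update one randomly chosen neuron
        h = J[i] @ s - J[i, i] * s[i]               # recurrent input, excluding any self-term
        s[i] = 1.0 if h + sigma_out * rng.standard_normal() >= 1.0 else 0.0
    return s
```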

Fig. 5.

Recurrent associative memory network constructed using single-neuron feedforward learning. (A) A fully connected recurrent network of E and I neurons in a particular memory state. Active (quiescent) neurons are shown in black (white). E and I synaptic connections (Jij) are shown in yellow and blue, respectively (not all connections are depicted). Lines symbolize axons, and synapses are shown as small circles. (B) To find an appropriate Jij, the postsynaptic weights of each neuron are set using the memory-state activities of the other neurons as input and its own memory state as the desired output. In this example, neuron 4 will implement its desired memory state through modification of the weights J4j for j = 1, 2, 3, 5, 6, 7. C and E show the fraction of erroneous (different from a given memory pattern) neurons in the network as a function of time. (C) Network dynamics with σout = 0. An initial state of the network can either converge to the memory state (blue) or diverge to other network states (red). (D) Probability of converging to a memory state vs. initial pattern distortion (SI Materials and Methods) for a network with unbalanced maximal κin weights (green), a network with balanced maximal κout weights (black), and a network with balanced maximal κout weights with unlearned inhibition (gray, main text). (E) Network dynamics with σout > 0. The network is initialized at the memory state. The dynamics can be stable (blue; the network remains close to the memory state) or unstable (red; the network diverges to another state). (F) Probability of stable dynamics for at least 500N time steps for networks initialized at the memory state in the presence of output noise vs. σout. Colors are the same as in D. (G) Maximal output noise magnitude vs. load for networks with a balanced synaptic weight matrix maximizing κout. Similar to κout, the maximal output noise magnitude is of order 1 only below αb. Above it, even though solutions exist, they are extremely sensitive to output noise. Results are shown for fexc = 0.8 (green) and fexc = 0.9 (magenta). See SI Materials and Methods for parameters used.

We do not attempt to perform a complete analysis of the effects of input and output noise in recurrent networks, a difficult challenge. Instead, we link observations from our single-neuron analysis to key features of a recurrent network performing a memory function. The capacity of such a memory network is defined as the maximal load for which the memory patterns can be fixed points of the noise-free dynamics, stable against single-neuron perturbations. This condition is met as long as the single-neuron synaptic weights possess substantial κin (i.e., κin ∼ O(1)) for all neurons. Thus, the single-neuron capacities will determine the overall network capacity. As we showed before, the capacity of a single-neuron perceptron depends on the statistics of its desired output (which in our case is the sparsity of activity across memory states). Since this statistic may be different in E and I populations, the single-neuron capacity of the two populations may vary, and hence the global capacity of the recurrent network is the minimum of the single-neuron capacities of the two neuron types. As long as P is smaller than this critical capacity, a recurrent weight matrix exists for which all P memory states are stable fixed points of the noiseless dynamics. However, such solutions are not unique, and the choice of a particular matrix can endow the network with different robustness properties. As stated above, to properly function as an associative memory the fixed points must have large basins of attraction. Corruption of the initial state away from the parent memory pattern introduces variability into the inputs of each neuron for subsequent dynamic iterations and hence is equivalent to injecting input noise in the single-neuron feedforward case. The network propagates this initial input noise in a nontrivial way; however, its magnitude always remains proportional to the magnitude of the norm of the neurons' synaptic weights. We therefore expect that a large basin of attraction is achieved when the matrix J yields a large input noise robustness for each neuron in the (noise-free) fixed points (49, 50). When output noise is introduced to the network dynamics (σout > 0), the network may propagate it as input noise to other neurons in subsequent time steps. However, initially its magnitude is proportional only to σout and is unaffected by the scale of the synaptic weights. Thus, we expect that the requirement that the memory states and retrieval will be robust against output noise is satisfied when J yields a large output noise robustness for each neuron in the (noise-free) fixed points. We therefore consider two types of recurrent connections: one in which each row of J is a weight vector that maximizes κin and hence, in the chosen parameter regime, is necessarily unbalanced, and a second one in which the rows of the connection matrix correspond to balanced solutions that maximize κout.

We estimate the basins of attraction of the memory patterns numerically by initializing the network in states that are corrupted versions of the memory states (SI Materials and Methods) and observing whether the network, with σout=0, converges to the parent memory state (Fig. 5C, blue) or diverges away from it (Fig. 5C, red). We define the size of the basin of attraction as the maximum distortion in the initial state that ensures convergence to the parent memory with high probability.
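A sketch of this basin-of-attraction estimate, reusing the run_network function from the sketch following Eq. 5. Here the initial state is corrupted by flipping each neuron with probability δ, a simplification of the activity-preserving corruption used in SI Materials and Methods, and the convergence and probability criteria are our own illustrative choices:

```python
import numpy as np

def basin_size(J, memory, deltas, n_trials=20, sweeps=50, tol=0.01, rng=None):
    """Largest initial distortion delta from which the parent memory is reliably recovered."""
    rng = rng or np.random.default_rng()
    N = len(memory)
    largest = 0.0
    for delta in deltas:                                     # e.g. np.linspace(0, 0.5, 11)
        recovered = 0
        for _ in range(n_trials):
            flips = rng.random(N) < delta                    # simple flip corruption (see note above)
            s0 = np.where(flips, 1 - memory, memory)
            s = run_network(J, s0, n_steps=sweeps * N, sigma_out=0.0, rng=rng)
            recovered += np.mean(s != memory) <= tol         # back at the parent memory?
        if recovered / n_trials >= 0.9:                      # "high probability" criterion (ours)
            largest = delta
    return largest
```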

Comparing the basins of attraction of the two types of networks, we find that the mean basin of attraction of the unbalanced network is moderately larger than that of the balanced one (Fig. 5D), consistent with the slightly lower value of κin in the balanced case (Fig. 5D). On the other hand, the behavior of the two networks is strikingly different in the presence of output noise. To illustrate this, we start each network at a memory state and determine whether it is stable (remains in the vicinity of this state for an extended period), despite the noise in the dynamics (Fig. 5E). We estimate the output noise tolerance of the network by measuring the maximal value of σout for which the memory states are stable (Fig. 5F). We find that memory states in the balanced solution with maximal κout are stable for noise levels that (for the network sizes used in the simulation) are an order of magnitude larger than for the unbalanced network with maximal κin (Fig. 5F).

Finally, we ask how the noise robustness of the memory states in the balanced network depends on the number of memories. As shown in Fig. 5F, for a fixed level of load below capacity, memory patterns are stable (Pstable > 0.5) as long as levels of noise remain below a threshold value, which we denote as σoutmax(α). When σout increases beyond σoutmax(α), stability of the memory states rapidly deteriorates. The critical noise function σoutmax(α) decreases smoothly from a large value at small α to zero at a level of load, αb. This load coincides with the maximal load for which both E and I neurons have balanced solutions (Fig. 5G). For loads αb < α < αc, all solutions are unbalanced, and hence the magnitude of the stochastic dynamical component can be at most of order 1/√N.

The Role of Inhibition in Associative Memory Networks.

In our associative memory network model, we assumed that both E and I neurons code desired memory states and that all network connections are modified by learning. Most previous models of associative memory that separate excitation and inhibition assume that memory patterns are restricted to the E population, whereas inhibition provides stabilizing inputs (14, 48, 51–54). To address the emergence of balanced solutions in scenarios where the I neurons do not represent long-term memories, we studied an architecture where I to E, I to I, and E to I connections are random sparse matrices with large amplitudes, resulting in I activity patterns driven by the E memory states. In such conditions, the I subnetwork exhibits irregular asynchronous activity with an overall mean activity that is proportional to the mean activity of the driving E population (7, 55, 56). Although the mean I feedback provided to the E neurons can balance the mean excitation, the variability in this feedback injects substantial noise onto the E neurons, which degrades system performance (SI Recurrent Networks with Nonlearned Inhibition). This variability stems from the differences in I activity patterns generated by the different E memory states (albeit with the same mean). Additional noise is caused by the temporally irregular activity of the chaotic I dynamics. Next we ask whether the system's performance can be improved through plasticity in the I to E connections, for which some experimental evidence exists (23, 57–60). Indeed, we find an appropriate plasticity rule for this pathway (SI Recurrent Networks with Nonlearned Inhibition) that suppresses the spatiotemporal fluctuations in the I feedback, yielding a balanced state that behaves similarly to the fully learned networks described above (Fig. 5 D and F, gray lines). Interestingly, in this case the basins of attraction of the balanced network are comparable to or even larger than the basins of the unbalanced fully learned network (compare gray to green curves in Fig. 5D). Despite the fact that no explicit memory patterns are assigned to the I population, the I activity plays a computational role that goes beyond providing global I feedback; when the weights of the I to E connections are shuffled, the network's performance significantly degrades (Fig. S5).

Fig. S5.

Effect of shuffling learned I weights in recurrent networks with nonlearned I activity. Shaded line depicts the performance of the network with random E to I and I to I connections and learned E to E and I to E connections (Recurrent Networks with Nonlearned Inhibition) (same as gray line in Fig. 5D). Solid line depicts the performance of the same network with the I weights of each E neuron randomly shuffled. Thus, the distribution of I synaptic weights for each E neuron is identical in both cases. This result shows that the learned I weights are important for network performance and stability.

Learning Robust Solutions.

Thus far, we have presented analytical and numerical investigations of solutions that support selectivity or associative memory and provide substantial robustness to noise. However, we did not address the way in which these robust solutions could be learned by a biological system. In fact, as stated above, the majority of solutions for these tasks have vanishingly small output and input robustness, and the maximum-robustness solutions described above are found numerically by special learning algorithms. Therefore, an important question is whether noise-robust weights can emerge naturally from synaptic learning rules that are appropriate for neuronal circuits.

The actual algorithms used for learning in neural circuits are generally unknown, especially within a supervised learning scenario. Experiments suggest that learning rules may depend on brain area and both pre- and postsynaptic neuron types (for example, refs. 57–59, 61; for reviews see refs. 60, 62–64). From a theoretical perspective, the properties of the solutions found through learning, and in particular their noise robustness, depend on both the type and parameters of the algorithm and the properties of the space of possible solutions. However, our theory suggests that a general, simple way to ensure that learning arrives at a robust solution is to introduce noise during learning. Indeed, this is a common practice in machine learning for increasing generalization abilities [a specific form of data augmentation (65, 66)]. The rationale is that learning algorithms that achieve low error in the presence of noise necessarily lead to solutions that are robust against noise levels at least as large as those present during learning. In the case we are considering, learning in the presence of substantial input noise should lead to solutions that have substantial κin, and introducing output noise during learning should lead to solutions with substantial κout. We note that κin may be large even if κout remains small (for example, in unbalanced solutions with maximal κin) but not vice versa [because κout of order 1 implies |𝒘| (and as a result κin) of order 1 as well]. Therefore, learning in the presence of significant output noise should lead to solutions that are robust to both input and output noise, whereas learning in the presence of input noise alone may lead to unbalanced solutions that are sensitive to output noise, depending on the details of the learning algorithm. We therefore predict that successful learning in the presence of output noise is a sufficient condition for the emergence of excitation–inhibition balance.

To demonstrate that robust balanced solutions emerge in the presence of output noise, we consider a variant of the perceptron learning algorithm (18) in which we have enforced the sign constraints on the weights (29) and, in addition, added a weight decay term implementing a soft constraint on the magnitude of the weights (SI Materials and Methods). This supervised learning rule possesses several important properties that are required for biological plausibility: It is online, and weights are modified incrementally after each pattern presentation; it is history independent so that each weight update depends only on the current pattern and error signal; and finally, it is simple and local, and weight updates are a function of the error signal and quantities that are available locally at the synapse (presynaptic activity and synaptic efficacy). When this learning rule is applied to train a selectivity task in the presence of substantial output noise, the resulting solution has a balanced weight vector with substantial κout and κin (Fig. 6, shaded lines). In contrast, if learning occurs with weak output noise, the algorithm's tendency to reduce the magnitude of the weights causes the resulting solution to be unbalanced with small κout, while its κin may be large if substantial input noise is present during learning (Fig. 6, solid lines). When this learning rule is applied in the load regime where only unbalanced solutions exist (αb < α < αc), learning fails to achieve reasonable performance when applied in the presence of large output noise. When the noise is scaled down to the value allowed by κoutmax ∼ 1/√N, learning yields unbalanced solutions with robustness values of the order of the maximum allowed in this region (Fig. S6).

Fig. 6.

Emergence of E-I balance from learning in the presence of output noise. All panels show the outcome of perceptron learning for a noisy neuron (SI Materials and Methods) under low (σout= 0.01, solid lines) and high (σout= 0.1, shaded lines) output noise conditions. Except for σout, all model and learning parameters are identical for the two conditions (including σin= 0.1). (A) Mean training error vs. learning cycle. On each cycle, all of the input patterns to be learned are presented once. The error decays and plateaus at its minimal value under both low and high output noise conditions. (B) Mean IB (Eq. 3) vs. learning cycle. IB remains of order 1 under low output noise conditions and drops close to zero under high output noise conditions. (C) Mean input robustness (κin) vs. learning cycle. Input robustness is high under both output noise conditions. (D) Mean output robustness (κout) vs. learning cycle. Output robustness is substantial only under the high output noise learning condition. These results demonstrate that robust balanced solutions naturally emerge under learning in the presence of high output noise. See SI Materials and Methods for other parameters used.

Fig. S6.

Perceptron learning with input and output noise for αb < α < αc. A–D depict the outcome of simple perceptron learning for a noisy neuron (Materials and Methods) under low output noise conditions (σout = 0.01/N, solid lines) and high output noise conditions (σout = 0.01, shaded lines). Except for σout all model and learning parameters are identical for the two conditions (including σin = 0.01). (A) Mean training error vs. learning cycle. At each cycle all of the learned input patterns are presented once. (B) Mean imbalance index vs. learning cycle. IB remains of order 1 under low output noise conditions and drops to lower values under high output noise conditions. (C) Mean input robustness (κin) vs. learning cycle. (D) Mean rescaled output robustness (Nκout) vs. learning cycle. The error decays and plateaus at its minimal value under both low and high output noise conditions; however, for high output noise the error remains substantial. Both output and input robustness are negative under the high output noise conditions. (The learning does not find a weight vector that performs the classification of the noise-free patterns correctly.) Input and output robustness are positive when the output noise scales at most as 1/N. Random patterns are binary patterns xiμ ∈ {0, 1} with equal probabilities and an even split of plus and minus patterns. N = 3,000, P = 2,400. Learning algorithm parameters: ε = 10−8, ρ = 0.02/N, σin = 0.01. Results are averaged over 50 samples.

SI Materials and Methods

Finding Perceptron Solutions.

There are a number of numerical methods for choosing a weight vector w that generates a specified selectivity (25, 27, 29, 39). For numerical simulations we developed algorithms that find the maximal κout and maximal κin solutions that obey the imposed biological constraints. These solutions can be found directly by solving conic programming optimization problems for which efficient algorithms exist and are widely available (73). For details see Finding Maximal κin and κout Solutions.
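As an illustration, the maximal-κout problem can be posed as a second-order cone program and handed to an off-the-shelf solver. The sketch below assumes the cvxpy package; it is a schematic formulation in the spirit of the conic programs referred to here, not the authors' implementation, and the variable names are ours:

```python
import numpy as np
import cvxpy as cp

def max_kappa_out(X, y, is_exc, gamma, v_th=1.0):
    """Maximize the worst-case distance from threshold subject to sign constraints
    and the norm bound |w| <= gamma (a second-order cone program)."""
    P, N = X.shape
    w = cp.Variable(N)
    kappa = cp.Variable()
    constraints = [cp.norm(w, 2) <= gamma,
                   w[np.where(is_exc)[0]] >= 0,          # excitatory weights nonnegative
                   w[np.where(~is_exc)[0]] <= 0,         # inhibitory weights nonpositive
                   cp.multiply(y, X @ w - v_th) >= kappa]
    cp.Problem(cp.Maximize(kappa), constraints).solve()
    return w.value, kappa.value
```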

Random Patterns in Numerical Estimation of 𝜿outmax and 𝜿inmax Solutions.

In numerical experiments for Figs. 3 and 4 and Figs. S1 and S3, E inputs for the random patterns were drawn i.i.d. from an exponential distribution with unity mean and SD. I inputs were drawn from a Gamma distribution with shape parameter k and scale parameter θ [the probability density function of the Gamma distribution is P(x) = (1/(Γ(k)θ)) (x/θ)^{k−1} e^{−x/θ}, where Γ(k) is the Gamma function].
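A minimal sketch of this pattern-generation step (the sizes N and P below are illustrative, not the values used in the figures):

```python
import numpy as np

rng = np.random.default_rng(2)
N, P, f_exc, k, theta = 1000, 500, 0.8, 2.0, 2.0     # illustrative sizes; k, theta as above
n_exc = int(f_exc * N)

X = np.empty((P, N))
X[:, :n_exc] = rng.exponential(1.0, (P, n_exc))      # E inputs: mean = SD = 1
X[:, n_exc:] = rng.gamma(k, theta, (P, N - n_exc))   # I inputs: Gamma(shape k, scale theta)
```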

Fig. S1.

Numerical measurement of capacity and balanced capacity. (A) Capacity of the sign-constrained perceptron, αc, vs. the fraction of excitatory inputs, fexc, as a fraction of the capacity of an unconstrained perceptron (Capacity for Noneven Split of Plus and Minus Patterns). Theory is depicted in black. Simulation results are shown in blue for pout = 0.5 and red for pout = 0.1. To measure αc we measure the probability of the existence of a solution as a function of α. We estimate αc by the load at which this probability is 1/2. (B) Capacity of balanced solutions, αb, as a fraction of αc vs. fexc for fexc > fexc*. Since κoutmax solutions are balanced whenever balanced solutions exist, to measure αb we measure the probability of finding a balanced κoutmax solution, i.e., a solution that saturates the upper bound on |𝒘|. We estimate αb by the load at which this probability is 1/2. In both A and B, N = 3,000, CVexc/CVinh = 2.

Dynamics of LIF Neuron.

Input spike trains.

For each input pattern 𝒙μ input spike trains of input afferent i=1,2,,N were drawn randomly from a Poisson process with rate ri=Axiμ, for duration T.

Synaptic input.

Given the set of input spike trains {ti}, i = 1, 2, …, N, the contribution of synaptic input to the membrane potential is given by Vsyn(t) = Σi wi Σ_{ti} K(t − ti), where wi is the synaptic efficacy of the synapse from the ith input afferent and K(t) is a postsynaptic potential kernel. K(t) = 0 for t < 0 and is given by K(t) = V0(e^{−t/τm} − e^{−t/τs}) for t > 0, where τm and τs are the membrane and synaptic time constants, respectively, and V0 is such that the maximal value of K(t) is one.

Output noise.

Output noise was added to the neuron's membrane potential as random synaptic input Vo.n.(t) = Σ_{j=1}^{Nnoise} gj K(t − tj), where gj was randomly drawn from a zero mean Gaussian distribution with SD σn and tj ∈ (0, T) was randomly drawn from a uniform distribution.

Voltage reset.

After each threshold crossing the membrane potential was reset to its resting potential. Given the set of output spike times {tspike}, the total contribution of the voltage reset to the membrane potential can be written as Vreset(t) = −(Vth − Vrest) Σ_{tspike} R(t − tspike), where Vrest and Vth are the neuron's resting and threshold potentials, respectively, and R(t) implements the postspike voltage reset. R(t) = 0 for t < 0 and is given by R(t) = e^{−t/τm} for t ≥ 0. This form ensures the voltage is reset to the resting potential immediately after an output spike.

Membrane potential.

Finally, the neuron’s membrane potential is given by V(t)=Vrest+Vsyn(t)+Vo.n.(t)+Vreset(t), where Vreset is computed given Vsyn and Vo.n..
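Putting the pieces above together, a simple discretized simulation of this LIF model might look as follows (a sketch with our own discretization, function name, and default parameters; it is not the authors' code):

```python
import numpy as np

def lif_spike_count(x, w, T=0.2, dt=1e-4, A=30.0, tau_m=0.03, tau_s=0.01,
                    v_rest=0.0, v_th=1.0, n_noise=0, sigma_n=0.0, rng=None):
    """Output spike count of the LIF neuron for one input pattern x (firing rates).

    Poisson input spikes (rate A*x_i) and noise events are filtered by the kernel
    K(t) = V0 (exp(-t/tau_m) - exp(-t/tau_s)); each output spike subtracts a reset
    kernel (v_th - v_rest) exp(-t/tau_m) from the potential.
    """
    rng = rng or np.random.default_rng()
    n_bins = int(round(T / dt))

    # peak-normalize the double-exponential kernel so that max K(t) = 1
    t_peak = tau_m * tau_s / (tau_m - tau_s) * np.log(tau_m / tau_s)
    v0 = 1.0 / (np.exp(-t_peak / tau_m) - np.exp(-t_peak / tau_s))

    # summed, weighted input per time bin: Poisson afferent spikes plus Gaussian noise events
    drive = rng.poisson(np.tile(A * x * dt, (n_bins, 1))) @ w
    if n_noise > 0:
        np.add.at(drive, rng.integers(0, n_bins, n_noise),
                  sigma_n * rng.standard_normal(n_noise))

    decay_m, decay_s = np.exp(-dt / tau_m), np.exp(-dt / tau_s)
    p_m = p_s = reset = 0.0
    n_spikes = 0
    for d in drive:
        p_m = p_m * decay_m + d          # slow exponential trace of all inputs
        p_s = p_s * decay_s + d          # fast exponential trace of all inputs
        reset *= decay_m
        v = v_rest + v0 * (p_m - p_s) - reset
        if v >= v_th:
            n_spikes += 1
            reset += v_th - v_rest       # start a new reset kernel at this spike
    return n_spikes
```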

Simulations of Recurrent Networks.

Memory states.

Networks were trained to implement a set of P memory states, specified by x_i^μ ∈ {0,1}, i=1,2,…,N, μ=1,2,…,P, as stable fixed points of the noise-free dynamics. Memory states were randomly chosen i.i.d. from binary distributions with parameter p_exc/inh according to the type of the ith input afferent; i.e., Pr(x_i^μ=1) = p_exc/inh and Pr(x_i^μ=0) = 1 − p_exc/inh.

Initial pattern distortion.

To start the network close to a memory state 𝒙^μ, the initial state of the network, s_i(t=0) for i=1,2,…,N, was randomly chosen according to Pr(s_i=1) = (1−δ)Θ(2x_i^μ−1) + δ [p_exc/inh/(1−p_exc/inh)] Θ(−2x_i^μ+1), where δ is the initial pattern distortion level (Fig. 5B) and Θ(x)=1 for x≥0 and 0 otherwise. This procedure ensures that the mean activity levels of E and I neurons in the initial state are the same as their mean activity levels in the memory state (74).
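A small sketch of this sampling rule, with illustrative sizes; the factor δ p/(1−p) follows the reconstructed expression above and preserves the mean activity on average.

import numpy as np

rng = np.random.default_rng(1)

def distort(x_mu, p, delta):
    """Sample an initial state s from a 0/1 memory pattern x_mu with mean activity p."""
    prob_one = np.where(x_mu == 1, 1.0 - delta, delta * p / (1.0 - p))
    return (rng.random(x_mu.size) < prob_one).astype(int)

# Hypothetical values: p = 0.1, delta = 0.2.
p, delta = 0.1, 0.2
x_mu = (rng.random(1000) < p).astype(int)
s0 = distort(x_mu, p, delta)
print(x_mu.mean(), s0.mean())   # mean activity is preserved on average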

Perceptron Learning Algorithm.

The perceptron learning algorithm (Fig. 6 and Fig. S6) learns to classify a set of P labeled patterns. At learning time step t one pattern 𝒙_t with desired output y_t = ±1 is presented to the neuron. The output of the perceptron, s_t, is given by s_t = sign(𝒘_t^T 𝒙_t + η_t − 1), where η_t is a Gaussian random variable with zero mean and variance |𝒘_t|² σin² + σout². The error signal is defined as e_t = y_t Θ(−s_t y_t), where Θ(x)=1 for x>0 and zero otherwise. After each pattern presentation all synapses are updated. The synaptic weights of E inputs are updated according to w_{i,t+1} = [(1−ε)w_{i,t} + ρ e_t x_{i,t}]_+ and weights of I inputs are updated according to w_{i,t+1} = [(1−ε)w_{i,t} + ρ e_t x_{i,t}]_−, where [x]_± = xΘ(±x), ε is a weight decay constant, and ρ is a constant learning rate. At each learning cycle (P learning time steps) all patterns are presented sequentially in a random order (randomized at each learning cycle).
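The rule can be written compactly as in the hedged sketch below, with toy sizes rather than the parameters of Fig. 6; the firing threshold is taken to be 1, as in the expression for s_t above.

import numpy as np

rng = np.random.default_rng(2)

# Toy sizes (Fig. 6 uses N=3,000, P=900).
N, P, f_exc = 400, 200, 0.8
eps, rho = 5e-7, 0.1 / N                            # weight decay and learning rate
sigma_in, sigma_out = 0.1, 0.0
n_exc = int(f_exc * N)
sign = np.r_[np.ones(n_exc), -np.ones(N - n_exc)]   # +1 for E synapses, -1 for I synapses

X = rng.binomial(1, 0.5, size=(P, N)).astype(float) # binary random patterns
y = rng.choice([-1.0, 1.0], size=P)                 # desired labels
w = sign * np.abs(rng.normal(0, 1 / np.sqrt(N), N)) # E entries >= 0, I entries <= 0

for cycle in range(200):
    errors = 0
    for mu in rng.permutation(P):                   # random order each cycle
        eta = rng.normal(0, np.sqrt(np.sum(w**2) * sigma_in**2 + sigma_out**2))
        s = np.sign(w @ X[mu] + eta - 1.0)          # output with threshold 1
        e = y[mu] if s * y[mu] <= 0 else 0.0        # error signal e_t = y_t Theta(-s_t y_t)
        errors += e != 0.0
        w = (1 - eps) * w + rho * e * X[mu]         # decay applied at every presentation
        # clip each synapse back to its allowed sign ([.]_+ for E, [.]_- for I)
        w = np.where(sign > 0, np.maximum(w, 0.0), np.minimum(w, 0.0))
    if errors == 0:
        break
print("cycles run:", cycle + 1, " errors in final cycle:", errors)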

Figure Parameters.

Fig. 2.

In all panels σinh/σexc=2 and CVexc/CVinh=2, with an even split between responsive/unresponsive labels. In B and C fexc=0.8.

Fig. 3.

In all panels N=3,000, k=2, and θ=2 (Random Patterns in Numerical Estimation of κoutmax and κinmax Solutions), leading to σinh/σexc = 2√2 and CVexc/CVinh = √2. fexc=0.8 with an even split between responsive/unresponsive labels. Numerical results are averaged over 100 samples.

Fig. 4.

In all panels N=1,000, P=1,000, fraction of plus patterns pout=0.1, fexc=0.8, Vrest=0, Vthr=1, τm = 30 ms, τs = 10 ms, T = 200 ms, A = 30 Hz (Dynamics of LIF Neuron). Random patterns were drawn as described in Random Patterns in Numerical Estimation of κoutmax and κinmax Solutions with k=2 and θ=2. Maximal κout solutions were found with Γ=1.5 in units of (Vth − Vrest)/σexc. No output noise was added in A–C. In D and E output noise was added with Nnoise=30,000 and σn = 2/√Nnoise (Dynamics of LIF Neuron).

Fig. 5.

In C–F N=2,000, P=1,000, fexc=0.8, pexc=0.1, pinh=0.2, Γ = 10√(pexc(1−pexc)) in units of (Vth − Vrest)/σexc. In D and F results are averaged over 10 networks and 10 patterns from each network. See Recurrent Networks with Nonlearned Inhibition for parameters of I connectivity of the nonlearned inhibition networks (gray lines). In G maximal output noise magnitude is defined as the value of σout for which the stable pattern probability is 1/2. To minimize finite-size effects in simulations we used N=3,000, fexc=0.8, pexc=0.5, pinh=0.5, Γ = 10√(pexc(1−pexc)) in units of (Vth − Vrest)/σexc. Stable pattern probability for each load and noise level was estimated by averaging over five networks and 20 patterns from each network.

Fig. 6.

Random patterns are binary patterns x_i^μ ∈ {0,1} with equal probabilities and an even split of plus and minus patterns. N=3,000, P=900. Learning algorithm parameters are ε = 5×10⁻⁷, ρ=0.1/N, σin=0.1 (Perceptron Learning Algorithm). Results are averaged over 50 samples.

Finding Maximal 𝜿in and Maximal 𝜿out Solutions.

Here we describe how finding the maximal κin and maximal κout solutions can be expressed as convex conic optimization problems. This allows us to efficiently validate the theoretical results. As noted in the main text, maximizing κin is equivalent to maximizing the margin of the solution’s weight vector as is done by support vector machines (39). However, to our knowledge, the application of conic optimization tools for maximizing κout is a unique contribution of our work.

Solution weight vectors, w, with input robustness κin or output robustness κout, satisfy the inequalities

∀μ: y^μ(𝒘^T 𝒙^μ − Vth) ≥ D, [S1]

where D_in = |𝒘|κin and D_out = κout (here we assume without loss of generality that Vrest=0).

For each solution w we define effective weights, u, and effective threshold b [the so-called canonical weights and threshold (39)] given by

ui=Λwi [S2]
b=ΛVth, [S3]

where Λ>0 is chosen such that ΛD=1 (for either Din or Dout).

Together with the sign and norm constraints on the weights, u and b must satisfy the linear constraints

∀μ: y^μ(𝒖^T 𝒙^μ − b) ≥ 1,  ∀i: s_i u_i ≥ 0,  b ≥ 0, [S4]

where si=1 if wi is excitatory and si=1 if wi is inhibitory, and the quadratic constraint

|𝒖|² ≤ b²Γ²/Vth², [S5]

which enforces the constraint |𝒘| ≤ Γ.

For the effective weights and threshold, κin is given by κin = 1/|𝒖| and κout is given by κout = Vth/b. Thus, maximizing κin is equivalent to minimizing |𝒖| and maximizing κout is equivalent to minimizing b. We therefore define a minimization cost function E(𝒖,b) that is given by

E_in(𝒖,b) = (1/2) 𝒖^T 𝒖, [S6]

for the κinmax solution, and

Eout(𝒖,b)=b, [S7]

for the κoutmax solution.

To find the maximal κin or maximal κout solution we solve the conic program

min_{𝒖,b,τ} E(𝒖,b) + βτ [S8]

in the limit β → ∞, subject to

∀μ: y^μ(𝒖^T 𝒙^μ − b) ≥ 1 − τ,  ∀i: s_i u_i ≥ 0,  b ≥ 0,  τ ≥ 0,  b²Γ²/Vth² ≥ |𝒖|². [S9]

τ is a global regularization variable that ensures the existence of a solution to the optimization problem (Eqs. S8 and S9) even when the linear constraints S4 are not realizable. In practice it is sufficient to set β to be a large constant (we set β = 10⁵). If the optimal value of τ is zero, the solution corresponds to the optimal perceptron solution for the classification task. If the optimal value of τ is greater than zero, it indicates that the labeled patterns are not linearly separable and that there is no zero-error solution to the classification task. If a solution with τ=0 is found, the optimal weights are given by 𝒘 = Vth 𝒖/b.
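The authors solved these programs with CVXOPT (73). The sketch below instead expresses the same conic program with the CVXPY modeling package, purely for illustration; the quadratic constraint S5 is written in its equivalent second-order-cone form, and the problem sizes are toy values.

import numpy as np
import cvxpy as cp

rng = np.random.default_rng(3)

# Toy instance (illustrative sizes; the paper's simulations are much larger).
N, P, f_exc, Vth, Gamma = 100, 60, 0.8, 1.0, 1.5
s = np.r_[np.ones(int(f_exc * N)), -np.ones(N - int(f_exc * N))]  # +1 for E, -1 for I
X = rng.exponential(1.0, size=(P, N))
y = rng.choice([-1.0, 1.0], size=P)

u = cp.Variable(N)
b = cp.Variable(nonneg=True)
tau = cp.Variable(nonneg=True)
beta = 1e5

constraints = [
    cp.multiply(y, X @ u - b) >= 1 - tau,   # margin constraints (Eq. S9)
    cp.multiply(s, u) >= 0,                 # sign constraints
    cp.norm(u, 2) <= (Gamma / Vth) * b,     # cone form of |u|^2 <= b^2 Gamma^2 / Vth^2
]
# E_out(u, b) = b gives the maximal kappa_out solution; use 0.5*cp.sum_squares(u) for maximal kappa_in.
prob = cp.Problem(cp.Minimize(b + beta * tau), constraints)
prob.solve()

if tau.value < 1e-6:
    w = Vth * u.value / b.value             # recover synaptic weights w = Vth u / b
    print("kappa_out =", Vth / b.value, " |w| =", np.linalg.norm(w))
else:
    print("patterns not separable under the constraints (tau > 0)")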

SI Capacity for Noneven Split of Plus and Minus Patterns

The capacity of a perceptron with no sign constraints on synaptic weights for classification of random patterns is a function of the fraction of plus patterns in the desired classification, pout (20–22), and is given by

αc^Unconst. = [pout ∫_{−∞}^{Δ} Dt (t−Δ)² + (1−pout) ∫_{Δ}^{∞} Dt (t−Δ)²]^(−1),

where Dt is the Gaussian integration measure, Dt=et222πdt, and the order parameter Δ is given by the solution to the equation

0 = pout ∫_{−∞}^{Δ} Dt (t−Δ) + (1−pout) ∫_{Δ}^{∞} Dt (t−Δ).
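These two equations are straightforward to evaluate numerically. The following sketch (standard SciPy quadrature and root finding, not the authors' code) recovers the classic value αc^Unconst. = 2 for pout = 0.5 and larger capacities for biased label splits, under the reconstructed form of the integrals above.

import numpy as np
from scipy.integrate import quad
from scipy.optimize import brentq

phi = lambda t: np.exp(-t**2 / 2) / np.sqrt(2 * np.pi)   # Gaussian measure Dt

def saddle(delta, p_out):
    # p_out * int_{-inf}^{Delta} Dt (t - Delta) + (1 - p_out) * int_{Delta}^{inf} Dt (t - Delta)
    i1 = quad(lambda t: phi(t) * (t - delta), -np.inf, delta)[0]
    i2 = quad(lambda t: phi(t) * (t - delta), delta, np.inf)[0]
    return p_out * i1 + (1 - p_out) * i2

def alpha_unconstrained(p_out):
    delta = brentq(saddle, -10, 10, args=(p_out,))        # solve the saddle-point equation for Delta
    i1 = quad(lambda t: phi(t) * (t - delta) ** 2, -np.inf, delta)[0]
    i2 = quad(lambda t: phi(t) * (t - delta) ** 2, delta, np.inf)[0]
    return 1.0 / (p_out * i1 + (1 - p_out) * i2)

print(alpha_unconstrained(0.5))   # classic Cover/Gardner result: 2
print(alpha_unconstrained(0.1))   # capacity grows as the label split becomes more biased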

Fig. S1A depicts the theoretical and measured αc of our “constrained” perceptron as a fraction of the corresponding unconstrained capacity vs. fexc for two values of pout. Fig. S1B depicts theoretical and measured αb as a fraction of αc for two values of pout.

SI Effects of E and I Input Statistics

Our results depend, of course, on parameters, but in a fairly reduced way. In particular, the properties we discuss depend on the ratio of the inputs' standard deviations, λ=σinh/σexc, and the ratio of their coefficients of variation, φ=CVexc/CVinh (Replica Theory for Sign- and Norm-Constrained Perceptron). As discussed in the main text, φ determines the optimal fraction of excitatory synapses, f*exc, which can be written as f*exc = φ/(1+φ) (Eq. 2 in the main text). Thus, the shape of the phase diagram changes with φ (Fig. S2A). The parameter λ has more subtle effects. We note here the main effect λ has on the maximal κin and maximal κout solutions.

Balanced and Unbalanced Maximal 𝜿in Solutions.

The maximal κin solutions can be either balanced or unbalanced, depending on fexc, φ, λ, and the value of κin^max (example in Fig. S2B). Importantly, for a wide range of reasonable parameters [for example, φ ≤ fexc/(1−fexc) and λ ≥ 1] the κin^max solution is unbalanced for all values of κin^max.

Fraction of “Silent” Weights in Maximal 𝜿in and Maximal 𝜿out Solutions.

As noted in previous studies (25, 27), a prominent feature of “critical” solutions with sign-constrained weights, such as the maximal κin and maximal κout solutions, is that a finite fraction of the synapses are silent; i.e., wi=0. Our theory allows us to derive the full distribution of synaptic efficacies (Distribution of Synaptic Weights) and calculate the fraction of silent weights for each solution. For the maximal κout solutions in the unbalanced regime (αb<α<αc), the fraction of E (I) silent weights is always larger (smaller) than 1/2 (Fig. S2C). However, in the balanced regime (α<αb) the qualitative behavior depends on λ (Fig. S2C). Interestingly, for unbalanced maximal κin solutions the fraction of silent weights is constant and equals 1/2 for both E and I inputs (SP equations for the maximal κin solution and Distribution of Synaptic Weights).

Tuning Properties of Cortical Neurons Suggest That in Cortex fexc>0.5.

In cortical circuits, I neurons tend to fire with higher firing rates and are thought to be more broadly tuned than E neurons, implying, under reasonable assumptions, that both λ and ϕ are greater than 1, leading to fexc>0.5.

To see this, we consider input neurons with Gaussian tuning curves to some external stimulus variable φ[0,1]; i.e., the mean response, xi, of neuron i to stimulus φ is given by

x_i = A_i exp[−(φ − φ_i^pref)²/(2δ_i²)], [S10]

where A_i, φ_i^pref, and δ_i characterize the response properties of the neuron. Assuming that φ is distributed uniformly and, for simplicity, that δ_i ≪ 1, the mean and variance of the neurons’ responses are given by

x̄_i = √(2π) A_i δ_i, [S11]

and

σ_i² ≈ A_i² δ_i √π, [S12]

where we neglect terms of order δi2. We now assume that Ai=Aexc and δi=δexc if neuron i is E and that Ai=Ainh and δi=δinh if neuron i is I. Further, we assume that I neurons respond with a higher firing rate (Ainh>Aexc) and are more broadly tuned (δinh>δexc). In this case we have

λ = (A_inh √δ_inh)/(A_exc √δ_exc) > 1, [S13]

and

φ = √(δ_inh/δ_exc) > 1. [S14]
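A quick Monte Carlo check of these small-δ approximations, with hypothetical tuning parameters (the specific A and δ values below are illustrative only):

import numpy as np

rng = np.random.default_rng(4)

def stats(A, delta, n=200_000):
    """Mean, SD, and CV of the Gaussian tuning curve A*exp(-(phi-0.5)^2/(2*delta^2))
    for a stimulus phi distributed uniformly in [0, 1]."""
    phi = rng.uniform(0, 1, n)
    x = A * np.exp(-(phi - 0.5) ** 2 / (2 * delta**2))
    return x.mean(), x.std(), x.std() / x.mean()

# Hypothetical values with I neurons firing more and tuned more broadly than E neurons.
A_exc, d_exc = 10.0, 0.01
A_inh, d_inh = 30.0, 0.04

m_e, s_e, cv_e = stats(A_exc, d_exc)
m_i, s_i, cv_i = stats(A_inh, d_inh)
print("lambda =", s_i / s_e, " small-delta approx:", (A_inh * np.sqrt(d_inh)) / (A_exc * np.sqrt(d_exc)))
print("phi    =", cv_e / cv_i, " small-delta approx:", np.sqrt(d_inh / d_exc))
print("f*_exc =", (cv_e / cv_i) / (1 + cv_e / cv_i))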

SI Recurrent Networks with Nonlearned Inhibition

In our basic model for an associative memory network we assume that the activity of both E and I neurons is specified in the desired memory states and that all network connections are learned. Both of these assumptions can be modified, creating new scenarios with different computational properties.

First, we assume that the memory state is specified only by the activity of E neurons and that the memory is recalled when the activity of E neurons matches the memory state, regardless of the activity of I neurons. The problem of learning in such a network is computationally hard, since the learning needs to optimize the activity of the I neurons using the full connectivity matrix. We do not address this scenario here. Instead, we forgo the assumption that the E to I and I to I connections onto I neurons are learned and replace them with randomly chosen connections; i.e., we assume that E to I and I to I connections are random rather than learned.

Choosing Random Synapses for I Neurons.

In this scenario the activity of I neurons is determined by the network dynamics. We consider random I to I and E to I weights with means JII and JIE and standard deviations σJII and σJIE. We examine the distribution of I neurons’ membrane potential, given that the activity of E neurons is held at a memory state in which poutexcN neurons are active. When N is large, this distribution is Gaussian and we assume correlations are weak. Thus, the mean activity in the network is the probability that the membrane potential is above threshold and is given by the equation

m_I = H((Vth − ⟨V⟩)/√(σ²(V))), [S15]

where H(x) = ∫_x^∞ (e^(−y²/2)/√(2π)) dy, and ⟨V⟩ and σ²(V) are the mean and variance of the membrane potential of I neurons, respectively.

On the other hand, given the mean activity, mI, the mean and variance of the membrane potentials are given by

⟨V⟩ = N(p_out^exc g_exc J_IE − m_I (1−g_exc) J_II) [S16]
σ²(V) = N([σ_JIE² + (1−p_out^exc) J_IE²] p_out^exc g_exc + [σ_JII² + (1−m_I) J_II²] m_I (1−g_exc)). [S17]

Together, Eqs. S15S17 define the relations between mI, JII, JIE, σJIE, and σJII.

In our simulations we set mI and the mean and variance of the I to I connections and choose the mean and variance of the E to I connections according to the solution of Eq. S15 [when N is large, J_IE is given by N J_IE ≈ (Vth + m_I N(1−g_exc)J_II)/(g_exc p_out^exc)]. In particular, we choose an I network with binary weights in which each I neuron projects to another I neuron with probability pII with synaptic efficacy jII = 1/(N pII). Each E neuron projects to an I neuron with probability pIE with synaptic efficacy jIE that ensures that the mean I activity level at the memory states is mI.
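As an illustration, the sketch below iterates Eq. S15 to a fixed point and then chooses jIE so that the self-consistent mean I activity matches a target level. It follows one plausible reading of Eqs. S15–S17 for binary random weights (mean p·j, variance p(1−p)j²); all numerical values are placeholders, not the authors' settings.

import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm

# Illustrative network parameters.
N, g_exc, p_out_exc, Vth = 2000, 0.8, 0.15, 1.0
p_II, p_IE, m_target = 0.5, 0.5, 0.4
j_II = 1.0 / (N * p_II)                      # I-to-I efficacy as given above

def mean_activity(j_IE):
    """Damped fixed-point iteration of Eq. S15 for a given E-to-I efficacy."""
    J_IE, J_II = p_IE * j_IE, p_II * j_II    # means of the sparse binary weight distributions
    var_IE = p_IE * (1 - p_IE) * j_IE**2     # variances of the binary weights
    var_II = p_II * (1 - p_II) * j_II**2
    m = 0.5
    for _ in range(500):
        V_mean = N * (p_out_exc * g_exc * J_IE - m * (1 - g_exc) * J_II)
        V_var = N * ((var_IE + (1 - p_out_exc) * J_IE**2) * p_out_exc * g_exc
                     + (var_II + (1 - m) * J_II**2) * m * (1 - g_exc))
        m = 0.5 * m + 0.5 * norm.sf((Vth - V_mean) / np.sqrt(V_var))   # H(x) = 1 - Phi(x)
    return m

# Choose j_IE so that the self-consistent mean I activity equals the target level.
j_IE = brentq(lambda j: mean_activity(j) - m_target, 1e-6, 1.0)
print("j_IE =", j_IE, " m_I =", mean_activity(j_IE))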

In this parameter regime, the I subnetwork exhibits asynchronous activity, with mean activity mI, at the E memory states. However, different memory states lead to different asynchronous states.

Training Set Definition.

E neurons need to learn to remain stationary at the desired memory states, given the network activity at this state. However, since the activity of the I subnetwork is not stationary at the desired memory states, the training set for learning is not well defined.

To properly define the training set, we sample nsample instances of the generated I activity for each memory state when the activity of the E neuron is clipped to this memory state. Sampling was performed by running the I network dynamics and recording the state of the I neurons after T=100N time steps. We then use the sampled activity patterns together with the E memory states as an extended training set (with Pnsample patterns) for the E neurons.

Learned Network Stability.

The nonfixed point dynamics of the I subnetwork imply that the convergence of the learning on the training set does not entail that the memory states themselves are dynamically stable, in contrast to our prior model in which I neurons learn their synaptic weights. Therefore, after training we measure the probabilities that patterns are stable. This is done by the following procedure: First, we run the network dynamics (with σout=0) when the E neurons’ activity is clipped to the memory state, for Tinit=50N time steps. We then release the E neurons to evolve according to the natural network dynamics and observe whether their activity remains in the vicinity of the memory state for T=500N time steps. In a similar way we test the basins of attraction, starting the E network from a distorted version of the memory state instead of the memory state itself.

Learning Only E to E Connections.

First, we consider the case in which I to E connections are random: Each I neuron projects to an E neuron with probability pEI with synaptic efficacy jEI = 1/(N pEI). We then try to find appropriate E to E connections, using the learning scheme described above. We find that the pattern-to-pattern fluctuations in the I feedback, due to the variance of the I to E connections and the variance in the activity of the I network neurons, are substantial and of the same order as the signal differentiating the memory states. In fact, in this scenario the parameters we consider (N=2,000, P=1,000, gexc=0.8, poutexc=0.15, pII=1/2, pIE=1/2, mI=0.4, pEI=1/2, nsample=40) are above the system’s memory capacity and we are unable to find appropriate E weights that implement the desired memory states for the training set. We conclude that this form of balancing I feedback is too restrictive due to the heterogeneity of I to E connections and the variability of I neurons’ activity.

Learning Both E to E and I to E Connections.

In this scenario we find the maximal κout solution for the extended training set described above. For the parameters used (N=2,000, P=1,000, gexc=0.8, poutexc=0.15, pII=1/2, pIE=1/2, mI=0.4, nsample=40) we are able to find solutions that implement all of the desired memory states for the extended training set. In addition, we find that the E memory states are dynamically stable with very high probability (we did not observe any unstable pattern). For numerical results see Fig. 5 and Fig S5.

SI Replica Theory for Sign- and Norm-Constrained Perceptron

We use the replica method (75) to calculate the system’s typical properties. For the perceptron architecture the replica symmetric solution has been shown to be stable and exact (2022).

Given a set of P patterns, 𝒙μ, and desired labels yμ=±1 for μ=1,2,,P, the Gardner volume is given by

V_G = ∫ D(𝒘) Π_{μ=1}^P Θ[y^μ(𝒘^T 𝒙^μ − Vth) − K], [S18]

where Θ[x] is the Heaviside step function and D(𝒘) is an integration domain obeying the sign constraints and the norm constraint |𝒘| ≤ Γ.

We assume input patterns and labels are drawn independently from distributions with nonnegative means x̄_exc(inh) and SDs σ_exc(inh). Labels are independently drawn from a binary distribution with Pr(y^μ=1) = pout and Pr(y^μ=−1) = 1 − pout.

We handle both input and output robustness criteria by using different K for each case,

K_in = |𝒘| σexc κin,  K_out = Vth κout, [S19]

where here, κin and κout are dimensionless numbers representing the input robustness in units of σexc and the output robustness in units of Vth, respectively.

Further, we define the parameters

λ = σinh/σexc,  η = x̄inh/x̄exc, [S20]

and

φ = CVexc/CVinh. [S21]

The Order Parameters.

We calculate the mean logarithm of the Gardner volume ⟨ln V_G⟩_{x,y} averaged over the E and I input distributions and the desired label distribution. The result of the calculation expresses ⟨ln V_G⟩_{x,y} as a stationary-phase integral over a free energy that is a function of several order parameters. The values of the order parameters are determined by the saddle-point equations of the free energy.

In our model the saddle-point equations are a system of six equations for the six order parameters: q,Q,θ,Δ,B, and C.

The order parameters q,Q,θ,andΔ have a straightforward physical interpretation.

The parameter q is the mean typical correlation coefficient between the VPSPs elicited by two different solutions to the same classification task: Given two typical solution weight vectors 𝒘α and 𝒘β, q is given by

q = Σ_{i=1}^N λ_i² w_i^α w_i^β / √[(Σ_{i=1}^N λ_i² (w_i^α)²)(Σ_{i=1}^N λ_i² (w_i^β)²)], [S22]

where λi=1 if wi is E and λi=λ if wi is I.

Given a typical solution w, the physical interpretation of Q and θ is given by

Q = Σ_{i=1}^N λ_i² w_i² / Σ_{i=1}^N w_i² [S23]
θ = Vth/(σexc (Σ_{i=1}^N λ_i² w_i²)^(1/2)). [S24]

The norm constraint on the weights is satisfied as long as

θ ≥ Vth/(σexc Γ √Q). [S25]

Thus, there are two types of solutions: one in which the value of θ is determined by the saddle-point equation (unbalanced solutions) and the other in which θ is clipped to its lower-bound value (balanced solutions). Note that q and Q remain of order 1 for any scaling of |𝒘|, while θ scales as √N when |𝒘| is of order 1/√N and is of order 1 when |𝒘| is of order 1.

The physical interpretation of Δ can be expressed through the relation

Δ = θ(1 − x̄exc Σ_{i=1}^N η_i w_i / Vth), [S26]

where ηi=1 if wi is E and ηi=η if wi is I.

Summary of Main Results.

Before describing the full saddle-point (SP) equations and their various solutions in detail, we provide a brief general summary of the results that will hopefully provide some flavor of the derivations for the interested reader.

Since θ is bounded from below by Vth/(σexcΓ√Q), we have two sets of SP equations, which we term the balanced and the unbalanced sets. In both sets, given the free energy F(Q,q,Δ,θ,B,C), five of the SP equations are given by

∂F/∂Q = ∂F/∂q = ∂F/∂Δ = ∂F/∂B = ∂F/∂C = 0. [S27]

The sixth equation is

∂F/∂θ = 0 [S28]

in the unbalanced set and is

θ = Vth/(σexc Γ √Q) [S29]

in the balanced set. Importantly, we find that Eq. S28 has solutions only when θ ∝ √N, which implies |𝒘| ∝ 1/√N, and Eq. S29 implies |𝒘| = Γ ∼ O(1), justifying the naming of the two sets. The solutions to the two sets of SP equations define the range of possible values of α, κin, and κout that permits the existence of solution weight vectors. There are a number of interesting cases that we analyze below.

We first consider the solutions of the SP equations for zero κout and κin. In this case the SP describes the typical solutions that dominate the Gardner volume. Since the N-dim. volume of balanced solutions with |𝒘| ∼ 1 is exponentially larger than the volume of unbalanced solutions with |𝒘| ∼ 1/√N, we expect that balanced solutions will dominate the Gardner volume whenever they exist. Indeed, solving the two sets of SP equations we find that solutions to the balanced set exist only for α<αb while solutions for the unbalanced set exist only for αb<α<αc.

Next, we examine the values of κout that permit solutions to the balanced and unbalanced sets of SP equations. Importantly, we show that the unbalanced set can be solved only for κout ∝ 1/√N. Thus, unbalanced solutions cannot have κout of O(1); conversely, all solutions with κout of O(1) are balanced.

Of particular interest are the so-called critical solutions for which q→1. In this limit the typical correlation coefficient between the VPSPs elicited by two different solutions to the same classification task approaches unity, which implies that only one solution exists and the Gardner volume shrinks to zero. Thus, for a given κin or κout, the value of α for which q→1 is the maximal load for which solutions exist. In this case, the SP describes the properties of the maximal κout or κin solutions.

The structure of the equations in this limit is relatively simple. First, the order parameter Δ is given by the solutions to

0 = pout ∫_{−∞}^{Δ+K} Dt (t−Δ−K) + (1−pout) ∫_{Δ−K}^{∞} Dt (t−Δ+K) [S30]

with the robustness parameter K being K_in = κin/√Q or K_out = θκout, and the integration measure, Dt, given by Dt = (e^(−t²/2)/√(2π)) dt. Second, we find a simple relation between critical loads of the constrained perceptron considered here and the critical loads of the classic unconstrained perceptron, α^Unconst.:

α = 2C α^Unconst.. [S31]

αUnconst. is given by

α^Unconst. = [pout ∫_{−∞}^{Δ+K} Dt (t−Δ−K)² + (1−pout) ∫_{Δ−K}^{∞} Dt (t−Δ+K)²]^(−1), [S32]

which is indeed the critical load of an unconstrained perceptron with a given margin K (21). Finding the critical load is then reduced to solving for the order parameter C. For each value of κin>0 or κout>0 only one set of SP equations can be solved, determining whether the maximal κin or κout solutions are balanced or unbalanced. By examining the range of solutions for each set we can find the values of κin^max and κout^max for any α and determine that (i) the maximal κout solution is balanced for α ≤ αb and unbalanced for αb<α<αc and (ii) for a wide range of parameters the maximal κin solution is unbalanced for all α<αc. In addition, we find that for αb<α<αc, κout^max is given by

κout^max = (σexc/(x̄exc √N)) κ0, [S33]

where κ0 is finite and larger than zero when α approaches αb from above and κ0 approaches zero when α approaches αc. The above result implies that output robustness can be increased when the tuning of the input is increased. As we discuss in the main text in the context of neuronal selectivity in purely E circuits, sparse input activity is one way to increase the input tuning. If we consider sparse binary inputs with mean activity level s ≪ 1, the output robustness will be given by κ0 √((1−s)/(sN)) ≈ κ0/√(sN).

Finally, we consider the solutions of the SP equations in the critical limit (q→1) for κin=κout=0. In this limit the SP describes the capacity and balanced capacity. We note that for κout=κin=0, Δ (and as a result αc^Unconst.) is independent of all of the other order parameters, simplifying the equations. In this case, we have only two coupled SP equations (for the order parameters B, C, and θ), given by

(1 − σexc Bθ/(x̄exc 2√C √N))² C = fexc γ_+(B) + (1−fexc) γ_−(B/φ) [S34]
σexc θ/(x̄exc √N) = [fexc γ′_+(B) + (1−fexc) φ γ′_−(B/φ)]/(2√C), [S35]

where we defined the functions

γ_±(x) = (x²+1)/2 − (1/2) ∫Du (x+u)² Θ[±(x+u)] [S36]
γ′_±(x) = −x + ∫Du (x+u) Θ[±(x+u)]. [S37]

For the balanced SP equations we have θ = Vth/(σexcΓ√Q) and for the unbalanced SP equations, Eq. S28 reduces to B=0. Finally, α is given by Eq. S31.

For the unbalanced set (B=0) we have γ_±(0) = 1/2 − (1/2)∫_0^∞ Du u² = 1/4 and γ′_±(0) = ±∫_0^∞ Du u = ±1/√(2π). We immediately get C = 1/4 and

θ = √(2N/π) [fexc/CVexc − (1−fexc)/CVinh]. [S38]

This solution suggests that at capacity the solutions are unbalanced (θ ∝ √N and hence |𝒘| ∝ 1/√N) and that capacity as a function of fexc is constant with

αc = (1/2) αc^Unconst.. [S39]

However, this solution is valid only as long as θ > Vth/(σexcΓ√Q), which is true only as long as

[fexc/CVexc − (1−fexc)/CVinh] > 0, [S40]

which implies

fexc > f*exc = CVexc/(CVexc + CVinh). [S41]

For the solution for the balanced set (θ = Vth/(σexcΓ√Q)) terms with θ/√N can be neglected and we have the equation

0 = fexc γ′_+(B) + (1−fexc) φ γ′_−(B/φ) [S42]

for the order parameter B. C and Q are given by

C = fexc γ_+(B) + (1−fexc) γ_−(B/φ) [S43]
Q = [fexc γ_+(B) + (1−fexc) γ_−(B/φ)] / [fexc γ_+(B) + ((1−fexc)/λ²) γ_−(B/φ)]. [S44]

This solution gives us the balanced capacity line

αb(fexc) = 2[fexc γ_+(B) + (1−fexc) γ_−(B/φ)] α^Unconst.(pout), [S45]

where B is given by the solution of Eq. S42 and α^Unconst.(pout) is given by Eqs. S32 and S30 with K=0.
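Under the reconstructed reading of Eqs. S42–S45 given above, the balanced capacity line can be evaluated numerically as a consistency check (this is not the authors' code; for pout = 0.5 the unconstrained capacity is 2, so at fexc = f*exc the balanced line should meet αc = αc^Unconst./2 = 1).

import numpy as np
from scipy.integrate import quad
from scipy.optimize import brentq
from scipy.stats import norm

alpha_unc_half = 2.0   # unconstrained capacity for p_out = 0.5 (Eq. S32 with Delta = 0, K = 0)

def gamma(x, s):
    """gamma_{+/-}(x) = (x^2+1)/2 - 1/2 int Du (x+u)^2 Theta[s(x+u)], s = +1 or -1."""
    integrand = lambda u: norm.pdf(u) * (x + u) ** 2
    val = quad(integrand, -x, np.inf)[0] if s > 0 else quad(integrand, -np.inf, -x)[0]
    return 0.5 * (x**2 + 1) - 0.5 * val

def gamma_p(x, s):
    """gamma'_{+/-}(x) = -x + int Du (x+u) Theta[s(x+u)]."""
    integrand = lambda u: norm.pdf(u) * (x + u)
    val = quad(integrand, -x, np.inf)[0] if s > 0 else quad(integrand, -np.inf, -x)[0]
    return -x + val

def alpha_balanced(f_exc, phi):
    eq = lambda B: f_exc * gamma_p(B, +1) + (1 - f_exc) * phi * gamma_p(B / phi, -1)
    B = brentq(eq, -20, 20)                                       # Eq. S42
    C = f_exc * gamma(B, +1) + (1 - f_exc) * gamma(B / phi, -1)   # Eq. S43
    return 2 * C * alpha_unc_half                                 # Eq. S45

phi = 2.0
print(alpha_balanced(phi / (1 + phi), phi))   # ~1.0 at f_exc = f*_exc for p_out = 0.5
print(alpha_balanced(0.5, phi))               # below f*_exc the balanced capacity is smaller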

Detailed Solutions of the SP Equations.

Below we provide the SP equations and their solutions under various conditions. We also provide the derived form of the distributions of synaptic weights for critical solutions.

The general SP equations.

We define the following:

F_h = pout ∫Dt ln H[X_+(t)] + (1−pout) ∫Dt ln H[X_−(t)] [S46]
Dt = (e^(−t²/2)/√(2π)) dt [S47]
H(x) = ∫_x^∞ Dt [S48]
X_±(t) = [∓(√q t − Δ) + K]/√(1−q) [S49]
K_in = κin/√Q [S50]
K_out = θ κout [S51]
φ_+ = 1, φ_− = φ [S52]
λ_+ = 1, λ_− = λ [S53]
f_+ = fexc, f_− = 1 − fexc [S54]
α = P/N [S55]
θ′ = σexc(θ − Δ)/(x̄exc √N) [S56]
Z_± = 2C[2C − 2√C Bθ′ + (1−q)×(1 − 2αQ (∂F_h/∂Q)(1 − Q/λ_±²))]^(−1) [S57]
Φ_±(x,z,q) = z(x²+1)/2 + ((1−q)/2)×[1 + ∫Du J_1(±√z(x+u)/√(1−q))] [S58]
Φ′_±(x,z,q) = −zx ∓ √(z(1−q)) ∫Du J_2(±√z(x+u)/√(1−q)) [S59]
J_1(x) = x H′(x)/H(x) [S60]
J_2(x) = H′(x)/H(x). [S61]

The SP equations are given by

C/Q=±f±λ±2Z±Φ±(Bϕ,Z±,q) [S62]
C=±f±Z±Φ±(Bϕ±,Z±,q) [S63]
2Cθ=±f±ϕ±Φ±(Bϕ±,Z±,q) [S64]
α=C(1q)2(Fhq)1 [S65]
σexcBx¯exc2CN=12(1q)FhΔ(Fhq)1 [S66]
σexcBx¯exc2CN=1q2θC+12(1q)Fhθ(Fhq)1𝑶𝑹θ=VthσexcΓQ. [S67]

It is important to note the relation between θ and θ′. θ′ is of the order of θ/√N (Δ remains of order 1 under all conditions). Thus, for unbalanced solutions θ ∝ √N and θ′ is of O(1), while for balanced solutions θ is of O(1) and θ′ is of O(1/√N) and can be neglected.

SP equations for typical solutions.

For typical solutions we solve the SP equations for κin=0 or κout=0, leading to K=0. In this case we have ∂F_h/∂θ = ∂F_h/∂Q = 0 and thus

Z±=2C[2C2CBθ+(1q)]1 [S68]

and the SP equation for θ is

σexc2CBθx¯excN=1q𝑶𝑹θ=VthσexcΓQ. [S69]

We now can solve the SP equations for the unbalanced case with

Z±=1andθ>VthσexcΓQ,θ>0 [S70]

and for the balanced case with

Z±=2C[2C+(1q)]1andθ=VthσexcΓQ,θ=0. [S71]

We find that for α<αb a solution exists only for equations of the balanced case while for α>αb a solution exists for the equations for the unbalanced case. Thus, typical solutions are balanced below αb and unbalanced above it. The norm of the weight and the IB depicted in Fig. 2 B and C are given by

|𝒘|=1Qθ [S72]
IB=±f±ϕ±Φ±(Bϕ±,Z,q)±±f±ϕ±Φ±(Bϕ±,Z,q). [S73]

Solutions with significant 𝜿𝐨𝐮𝐭 are balanced.

In this section we show that all unbalanced solutions have output robustness of order 1/√N and, equivalently, that solutions with κout of order 1 are balanced.

Theorem.

All unbalanced solutions have output robustness of the order of 1/√N.

Proof.

In the case of output robustness we have K=Kout=θκout and thus FhQ=0. We are looking for unbalanced solutions (θ>0,θO(N)) so we have the equations

σexcBx¯exc2CN=12(1q)FhΔ(Fhq)1 [S74]
σexcBx¯exc2CN=1q2θC+12(1q)Fhθ(Fhq)1. [S75]

Both equations must be satisfied and therefore we have (using Eqs. S65, S74, and S75)

0=FhΔ+Fhθ1αθ. [S76]

Performing the derivatives, we get

0=(κout+1)poutDtJ2(X+)+(κout1)(1pout)DtJ2(X)1qαθ. [S77]

Now, we use Eq. S74, leading to

poutDtJ2(X+)=(1pout)DtJ2(X)M/N, [S78]

where we defined M as

M=2Bσexc1qCx¯excFhq, [S79]

which remains of O(1). Thus, we are left with

0=2κout(1pout)DtJ2(X)(κout+1)MN1qαθ. [S80]

Note that J2(x)<0 and the first term is negative (nonzero). The other two terms scale as 1/√N and therefore the equation can be satisfied only if κout = κ0/√N.

SP equations for critical solutions.

To find the capacity, balanced capacity, and solutions with maximal output and input robustness we consider the limit q→1.

We define

GQ=limq1Q(1q)(Fhq)1FhQ, [S81]

and thus in this limit Z± is given by

Z±=[1Bθ2C+(1Qλ±2)GQ]1. [S82]

In addition,

lim_{q→1} Φ_±(x,z,q) = z γ_±(x) [S83]
γ_±(x) = (x²+1)/2 − (1/2) ∫Du (x+u)² Θ[±(x+u)] [S84]
lim_{q→1} Φ′_±(x,z,q) = z γ′_±(x) [S85]
γ′_±(x) = −x + ∫Du (x+u) Θ[±(x+u)], [S86]

and, in the q1 limit we have

(1−q) ∂F_h/∂Δ = M(Δ,K) [S87]
M(Δ,K) = pout ∫_{−∞}^{Δ+K} Dt (t−Δ−K) + (1−pout) ∫_{Δ−K}^{∞} Dt (t−Δ+K) [S88]
(1−q)² ∂F_h/∂q = 1/(2α^Unconst.(Δ,K)) [S89]
α^Unconst.(Δ,K) = [pout ∫_{−∞}^{Δ+K} Dt (t−Δ−K)² + (1−pout) ∫_{Δ−K}^{∞} Dt (t−Δ+K)²]^(−1). [S90]

We now write the final form of the SP equations for critical solutions:

C = Σ_± f_± Z_±² γ_±(B/φ_±) [S91]
Q = Σ_± f_± Z_±² γ_±(B/φ_±) / Σ_± (f_±/λ_±²) Z_±² γ_±(B/φ_±) [S92]
θ′ = Σ_± f_± φ_± Z_± γ′_±(B/φ_±) / (2√(Σ_± f_± Z_±² γ_±(B/φ_±))) [S93]
0 = M(Δ,K) [S94]
B/(2√C √N) = (x̄_ex/σ_ex) 2√C α^Unconst.(Δ,K) lim_{q→1}(1−q) ∂F_h/∂θ  OR  θ = Vth/(σexc Γ √Q). [S95]

Finally α is given by

α = 2C α^Unconst.(Δ,K). [S96]

Capacity and balanced capacity.

The capacity is given for δ=0 and κ=0. In this case both Fhθ and FhQ are zero and Eq. S95 has two possible solutions:

B = 0,  θ > Vth/(σexc Γ √Q), [S97]

for unbalanced solutions or

θ = Vth/(σexc Γ √Q),  θ′ = 0, [S98]

for balanced solutions. In both cases we have Z±=1.

Unbalanced solution.

The SP equations become

Q = [Σ_± (f_±/λ_±²)]^(−1) [S99]
θ′ = (1/√(2π)) Σ_± (±) f_± φ_± [S100]
C = 1/4 [S101]

and the capacity is given by

αc = (1/2) α^Unconst.(Δ,0), [S102]

where Δ is given by M(Δ,0)=0.

This solution is valid only when θ is larger than its O(1) lower bound, which is guaranteed in the large-N limit as long as θ′ > 0. Using Eq. S100, this entails that

fexc > f*exc [S103]

with

f*exc = φ/(1+φ) [S104]

or conversely φ < fexc/(1−fexc).

Balanced solution.

In this solution we have θ = Vth/(σexc Γ √Q), θ′ = 0.

B is given by the solution to

Σ_± f_± φ_± γ′_±(B/φ_±) = 0 [S105]

and we have

Q = Σ_± f_± γ_±(B/φ_±) / Σ_± (f_±/λ_±²) γ_±(B/φ_±) [S106]
C = Σ_± f_± γ_±(B/φ_±) [S107]

and

αb = 2C α^Unconst.(Δ,0), [S108]

where Δ is given by M(Δ,0)=0.

This gives the balanced capacity line. For fexc < f*exc this is the capacity line as well. Thus, for fexc < f*exc, at capacity the solution is balanced.

Coexistence of balanced and unbalanced solutions below the balanced capacity line.

To show that unbalanced solutions coexist with balanced solutions for any α<αb, we calculate the capacity of unbalanced solutions with a given norm. This can be done by solving Eqs. S91S94 while imposing the condition |𝒘|=VthrWNσexc through the SP equation of θ:

θ=NWQ. [S109]

We therefore have

θ=σexc(θΔ)x¯excNσexcx¯excWQ=1WQ. [S110]

We are interested in the capacity and therefore we take K=0. As a result we have

Z±=[1B2CQW2]1 [S111]

and the SP equations become

1W=±f±ϕ±γ±(Bϕ±)2±f±λ±2γ±(Bϕ±) [S112]
C=[1BW2CQ]2±f±γ±(Bϕ±) [S113]
0=M(Δ,0), [S114]

where Q is given by

Q=±f±γ±(Bϕ±)±f±λ±2γ±(Bϕ±). [S115]

Given the value of B and Q the equation for C can be solved for C and we get

C=B2QW+±f±γ±(Bϕ±)(B2QW)2. [S116]

This solution is valid as long as C0. The conditional capacity αc(W) is then given by

αc(W)=2CαUnconst.(Δ,0). [S117]

It is easy to see that for W → ∞, the SP equations converge to the equations of the balanced capacity and thus αc(W) approaches αb. In addition, we find that for fexc < f*exc, αc(W) is a monotonically increasing function of W. Another way to interpret this result is to “invert the function” and ask, What is the minimal value of W that permits solutions given α? Our result implies that strictly below αb the minimal value of W that permits solutions is of O(1) [i.e., |𝒘| of O(1/√N)] and unbalanced solutions exist. The minimal W diverges as α approaches αb and hence the solution at αb is balanced [|𝒘| of O(1)].

SP equations for the maximal 𝜿𝐢𝐧 solution.

In this case we have K=Kin=δQ and therefore Fhθ=0. For unbalanced solutions we have

B=0,θ>VthσexcΓQ,θ>0 [S118]

and for balanced solutions we have

θ=VthσexcΓQ,θ=0. [S119]

In both cases Z± is given by

Z±=[1+(1Qλ±2)GQ]1 [S120]

with

GQ=κin/QαUnconst.(Δ,κin/Q)×[poutΔ+κinQDt(tΔκinQ)(1pout)ΔκinQDt(tΔ+κinQ)]. [S121]
Unbalanced solution.

In this case we have equations for Δ and Q:

M(Δ,κin/Q)=0 [S122]
Q=±f±Z±2±1λ±2f±Z±2. [S123]

We then have

θ=12π±±f±ϕ±Z±12±f±Z±2 [S124]
C=14±f±Z±2 [S125]

and

α=2CαUnconst.(Δ,κin/Q). [S126]
Balanced solution.

In this case we have equations for Δ, B, and Q:

M(Δ,κin/Q)=0 [S127]
Q=±f±γ±(Bϕ±)Z±2±1λ±2f±γ±(Bϕ±)Z±2 [S128]
0=±f±ϕ±γ±(Bϕ±)Z±. [S129]

The equations for GQ (Eq. S121) and α (Eq. S126) remain the same; however, C is given by

C=±f±γ±(Bϕ±)Z±2. [S130]
Transition between balanced and unbalanced solutions.

Transition points between balanced and unbalanced solutions depend on the values of φ, λ, and fexc. Transition points are points at which both B = 0 and θ′ = 0. Thus, we have

φ* = fexc Z_+ / [(1−fexc) Z_−], [S131]

where Q and Δ are given by Eqs. S123 and S122. Thus, φ* is a function of κin and λ. Solutions are balanced for φ > φ* and unbalanced for φ < φ* (Fig. S2B).

SP equations for the maximal 𝜿𝐨𝐮𝐭 solution.

Unbalanced solution.

This solution is valid for α>αb, fexc > f*exc. We look for a solution with θ′ > 0 and thus θ must scale as √N.

In this case K=θκ and so FhQ=0 and Z±=[1Bθ2C]1.We then have

Q=±f±γ±(Bϕ±)±1λ±2f±γ±(Bϕ±) [S132]

and we are left with equations to solve for θ, B, and Δ:

θ=±f±ϕ±γ±(Bϕ±)2±f±γ±(Bϕ±) [S133]
M(Δ,K)=0 [S134]

and

B2CN=x¯exσexκαUnconst.(Δ,K)[fΔ+KDt(tΔK)(1f)ΔKDt(tΔ+K)]. [S135]

There is a solution only if

κ=σexcx¯excNκ0,θNx¯exσexθ [S136]

so we have K=θκ0 and

B2C=κ0αUnconst.(Δ,θκ0)×[fΔ+θκ0Dt(tΔθκ0)(1f)Δθκ0Dt(tΔ+θκ0)]. [S137]

Finally, we have C=Z2±f±γ±(Bϕ±), from which we can isolate C to have

C=[±f±γ±(Bϕ±)][1B±f±ϕ±γ±(Bϕ±)2±f±γ±(Bϕ±)]2. [S138]

α is given as before, α = 2C α^Unconst.(Δ, θ′κ0).

The equations given in this section are equivalent to the ones derived in Eq. S27.

Balanced solution.

We look for balanced solutions with θ=VthσexcΓQ,θ=0. The SP equations in this case are given by the same equations as the balanced solution described in SP equations for the maximal κin solution with κoutVth/σexcΓ replacing κin.

Distribution of Synaptic Weights.

We derive the mean distribution of synaptic weights for critical solutions (q1)

P±(w)=H(Bϕ±)δ(w)+θ2Nλ±22πσw±2exp×[(θNλ±w+Bϕ±σw±)22σw±2], [S139]

with σw±=Z±2C, where P+ and P denote the probability densities for E and I synaptic weights, respectively, δ(x) is the Dirac delta function, and weights are given in units of Vth/σexc. The fraction of silent synapses is given by H(B) for E synapses and by H(Bϕ) for I synapses.

Discussion

The results we have presented come from imposing a set of fundamental biological constraints: fixed-sign synaptic weights, nonnegative afferent activities, a positive firing threshold (relative to the resting potential), and both input and output forms of noise. Amit et al. (23) studied the maximal margin solution for the sign-constrained perceptron and showed that it has half the capacity of the unconstrained perceptron. However, this previous work considered afferent activities that were centered around zero and a neuron with zero firing threshold, features that preclude the presence of the behavior exhibited by the more biologically constrained model studied here. Chapeton et al. (27) studied perceptron learning with sign-constrained weights and a preassigned level of robustness, but considered only solutions in the unbalanced regime which, as we have shown, are extremely sensitive to output noise.

Learning in neural circuits involves a trade-off between exhausting the system’s capacity for implementing complex input–output functions on the one hand and ensuring good generalization properties on the other. A well-known approach in machine learning has been to search for solutions that fit the training examples while maximizing the distance of samples from the decision surface, a strategy known as maximizing the margin (21, 23, 39). The margin being maximized in this case corresponds, in our framework, to κin. Work in computational neuroscience has implicitly optimized a robustness parameter equivalent to our κout (25, 27). To our knowledge, the two approaches have not been distinguished before or shown to result in solutions with dramatically different noise sensitivities. In particular, over a wide parameter range, we have shown that maximizing κout leads to a balanced solution with minimal sensitivity to output noise and robustness to input noise that is almost as good as that of the maximal margin solution, with only a modest trade-off in capacity. On the other hand, maximizing the margin (κin) often leads to unbalanced solutions with extreme sensitivity to output noise.

The perceptron has long been considered a model of cerebellar learning and computation (67, 68). More recently, Brunel et al. (25) investigated the capacity and robustness of a perceptron model of a cerebellar Purkinje cell, taking all weights to be E. In view of the analysis presented here, balanced solutions are not possible in this case (fexc = 1), and solutions that maximize either input-noise or output-noise robustness both have κout ∼ 1/√N. These two types of solutions differ in their weight distributions, with experimentally testable consequences for the predicted circuit structure [SI κoutmax and κinmax Solutions in Purely E Networks and Fig. S2C; Brunel et al. (25) considered only solutions that maximize κout]. Output robustness of the unbalanced solutions can be increased by making the input activity patterns sparse. Denoting by s the mean fraction of active neurons in the input, maximum output robustness scales as κout ∼ 1/√(Ns) (Fig. 3B and SI Replica Theory for Sign- and Norm-Constrained Perceptron). Thus, the high sparsity in input activation (granule cell activity) of the cerebellum relative to the modest sparsity in the neocortex is consistent with the former being dominated by E modifiable synapses.

Interestingly, our results suggest an optimal ratio of E to I synapses. Capacity in the balanced regime is optimal when fexc = f*exc, with f*exc determined by the CVs (with respect to stimulus) of the E and I inputs (Eq. 2). Thus, optimality predicts a simple relation between the fraction of E and I inputs and their degree of tuning. Estimating the CVs from existing data is difficult, but it would be interesting to check whether input statistics and connectivity ratios in different brain areas are consistent with this prediction. The commonly observed value in cortex, fexc ≈ 0.8, would be optimal for input statistics with CVexc/CVinh ≈ 4. In general, we expect that CVexc/CVinh > 1, which implies that fexc > 1/2.

For most of our work, we assumed that I neurons learn to represent specific sensory and long-term memory information, the same as the E ones, and that all synaptic pathways are learned using similar learning rules. While plasticity in both E and I pathways has been observed (5759, 61, 63, 64, 69), accumulating experimental evidence indicates a high degree of cell-type and synaptic-type specificity of the plasticity rules. In addition, synaptic plasticity is under tight control of neuromodulatory systems. At present, it is unclear how to interpret our learning rules in terms of concrete experimentally observed synaptic plasticity. Other functional models of neural learning assume learning only within the E population with inhibition acting as a global stabilizing force. In the case of sensory processing, our approach is consistent with the observation of a similar stimulus tuning of E and I postsynaptic currents in many cortical sensory areas. The role of I neurons in memory representations is less known (but see ref. 70). Importantly, we have shown that our main results are valid also in the case in which I neurons do not explicitly participate in the coding of the memories. Interestingly, our work suggests that even if I neurons are only passive observers during learning processes, learning of I synapses onto E cells can amplify the memory stability of the system against fluctuations in the I feedback. Given the diversity of I cell types it is likely that in the real circuits inhibition plays multiple roles, including both conveying information and providing stability.

Several previous models of associative memory have incorporated biological constraints on the sign of the synapses, Dale’s law, assuming variants of Hebbian plasticity in the E to E synapses (14, 48, 51–54). The capacity of these Hebbian models is relatively poor, and their basins of attraction are small, except at extremely sparse activity levels. In contrast, our model applies a more powerful learning rule that, while keeping the sign constraints on the synapses, exhibits significantly superior performance, with high capacity even for moderate sparsity levels, large basins of attraction, and high robustness to output noise.

From a dynamical systems perspective, the associative memory networks we construct exhibit unusual properties. In most associative memory network models, large basins of attraction endow the memory states with robustness against stochasticity in the dynamics (i.e., output noise). Here, we found that, for the same set of fixed-point memories, the synaptic weights with the largest possible basins (the unbalanced solutions with maximal κin) are very sensitive to even mild levels of stochasticity, whereas the balanced synaptic weights with somewhat reduced basins have substantially increased output noise robustness.

At the network level, as at the single-neuron level, imposing basic features of neural circuitry—positive inputs, bounded synapses of fixed sign, a positive firing threshold, and sources of noise—forces neural circuits into the balanced regime. A recent class of models showing computational benefits of balanced inputs uses extremely strong synapses, which are outside the range we have discussed (16). These models are stabilized by instantaneous transmission of signals between neurons which are not required in the range of synaptic strength we consider.

Previous models of balanced networks have highlighted the ability of networks with strong E and I recurrent synapses to settle into a state in which the total input is dynamically balanced without special tuning of the synaptic strengths. Such a state is characterized by a high degree of intrinsically generated spatiotemporal variability (7). Mean population activities respond fast and in a linear fashion to external inputs. Typically, these networks lack the population-level nonlinearity required to generate multiple attractors. In contrast, we have explored the capacity of the balanced network to support multiple stable fixed points by tuning the synaptic strengths through appropriate learning. We note that fully understanding and characterizing the dynamic properties of these networks and their relation to previously studied models remains an important challenge. Despite the dynamic and functional differences in the two classes of networks, the balancing of excitation and inhibition plays a similar role in both. In the first scenario, synaptic balance amplifies small changes in the spatial or temporal properties of the external drive. Similarly, in the present scenario, balanced synaptic architecture leads to enhanced robustness by amplifying the small variations in the synaptic inputs induced by changes in the stimulus or memory identity. It would be very interesting to combine fast dynamics with robust associative memory capabilities.

In conclusion, we have uncovered a fundamental principle of neuronal learning under basic biological constraints. Our work reveals that excitation–inhibition balance may have a critical computational role in producing robust neuronal functionality that is insensitive to output noise. We showed that this balance is important at the single-neuron level for both spiking and nonspiking neurons and at the level of recurrently connected neural networks. Further, the theory suggests that excitation–inhibition balance may be a collective, self-maintaining, emergent phenomenon of synaptic plasticity. Any successful neuronal learning process in the presence of substantial output noise will lead to strong balanced synaptic efficacies with noise robustness features. The fundamental nature of this result suggests that it should apply across a variety of neuronal circuits that learn in the presence of noise.

Materials and Methods

Detailed methods and simulation parameters are given in SI Materials and Methods.

Software.

To acknowledge their contribution to scientific work we cite the open source projects that directly and most crucially contributed to the current work: The Python stack of scientific computing [CPython, Numpy, Scipy, Matplotlib (71), Jupyter/Ipython (72), and others], CVXOPT (73) (convex conic optimization), and IPyparallel (parallelization).

Code Availability.

Python code for simulations and numerical solution of saddle-point equations is available upon request.

SI 𝜿outmax and 𝜿inmax Solutions in Purely E Networks

In purely E networks (fexc=1) all solutions are unbalanced and output robustness can be achieved by sparse input (25) or tonic inhibition (28). However, the distinction between output robustness and input robustness still applies and, surprisingly, maximizing either κin or κout leads to two different solutions with qualitatively different properties.

In particular, as noted in refs. 25 and 27, the fraction of silent weights of the κoutmax solutions increases as the load decreases. Thus, if the network implements the maximal κout solution, network connectivity, as measured in a pairwise stimulation experiment, is expected to be sparse. However, for the maximal κin solution the fraction of silent weights is constant and remains 1/2 for all values of the load. Thus, measured network connectivity is expected to be higher.

Establishing a correspondence between theory and experiment in this case is confounded by the difficulty of experimentally distinguishing between silent synapses and completely absent synapses that were never available as inputs to the postsynaptic neuron during learning.

Acknowledgments

We thank Misha Tsodyks for helpful discussions. Research was supported by National Institutes of Health Grant MH093338 (to L.F.A. and R.R.), the Gatsby Charitable Foundation through the Gatsby Initiative in Brain Circuitry at Columbia University (L.F.A. and R.R.) and the Gatsby Program in Theoretical Neuroscience at the Hebrew University (H.S.), the Simons Foundation (L.F.A., R.R., and H.S.), the Swartz Foundation (L.F.A., R.R., and H.S.), and the Kavli Institute for Brain Science at Columbia University (L.F.A. and R.R.).

Footnotes

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1705841114/-/DCSupplemental.

References

  • 1.Anderson JS, Carandini M, Ferster D. Orientation tuning of input conductance, excitation, and inhibition in cat primary visual cortex. J Neurophysiol. 2000;84:909–926. doi: 10.1152/jn.2000.84.2.909. [DOI] [PubMed] [Google Scholar]
  • 2.Wehr M, Zador AM. Balanced inhibition underlies tuning and sharpens spike timing in auditory cortex. Nature. 2003;426:442–446. doi: 10.1038/nature02116. [DOI] [PubMed] [Google Scholar]
  • 3.Okun M, Lampl I. Instantaneous correlation of excitation and inhibition during ongoing and sensory-evoked activities. Nat Neurosci. 2008;11:535–537. doi: 10.1038/nn.2105. [DOI] [PubMed] [Google Scholar]
  • 4.Poo C, Isaacson JS. Odor representations in olfactory cortex: “Sparse” coding, global inhibition, and oscillations. Neuron. 2009;62:850–861. doi: 10.1016/j.neuron.2009.05.022. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5.Atallah BV, Scanziani M. Instantaneous modulation of gamma oscillation frequency by balancing excitation with inhibition. Neuron. 2009;62:566–577. doi: 10.1016/j.neuron.2009.04.027. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Isaacson J, Scanziani M. How inhibition shapes cortical activity. Neuron. 2011;72:231–243. doi: 10.1016/j.neuron.2011.09.027. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.van Vreeswijk C, Sompolinsky H. Chaos in neuronal networks with balanced excitatory and inhibitory activity. Science. 1996;274:1724–1726. doi: 10.1126/science.274.5293.1724. [DOI] [PubMed] [Google Scholar]
  • 8.Vreeswijk Cv, Sompolinsky H. Chaotic balanced state in a model of cortical circuits. Neural Comput. 1998;10:1321–1371. doi: 10.1162/089976698300017214. [DOI] [PubMed] [Google Scholar]
  • 9.Froemke RC, Merzenich MM, Schreiner CE. A synaptic memory trace for cortical receptive field plasticity. Nature. 2007;450:425–429. doi: 10.1038/nature06289. [DOI] [PubMed] [Google Scholar]
  • 10.Dorrn AL, Yuan K, Barker AJ, Schreiner CE, Froemke RC. Developmental sensory experience balances cortical excitation and inhibition. Nature. 2010;465:932–936. doi: 10.1038/nature09119. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11.Sun YJ, et al. Fine-tuning of pre-balanced excitation and inhibition during auditory cortical development. Nature. 2010;465:927–931. doi: 10.1038/nature09079. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12.Li Yt, Ma Wp, Pan Cj, Zhang LI, Tao HW. Broadening of cortical inhibition mediates developmental sharpening of orientation selectivity. J Neurosci. 2012;32:3981–3991. doi: 10.1523/JNEUROSCI.5514-11.2012. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13.Tsodyks MV, Sejnowski T. Rapid state switching in balanced cortical network models. Netw Comput Neural Syst. 1995;6:111–124. [Google Scholar]
  • 14.van Vreeswijk C, Sompolinsky H. Course 9-irregular activity in large networks of neurons in Les Houches. In: Carson C, Boris G, David H, Claude M, Jean D, editors. Methods and Models in Neurophysics. Vol 80. Elsevier; Amsterdam: 2005. pp. 341–406. [Google Scholar]
  • 15.Lim S, Goldman MS. Balanced cortical microcircuitry for maintaining information in working memory. Nat Neurosci. 2013;16:1306–1314. doi: 10.1038/nn.3492. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.Boerlin M, Machens CK, Denève S. Predictive coding of dynamical variables in balanced spiking networks. PLoS Comput Biol. 2013;9:e1003258. doi: 10.1371/journal.pcbi.1003258. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17.Lajoie G, Lin KK, Thivierge JP, Shea-Brown E. Encoding in balanced networks: Revisiting spike patterns and chaos in stimulus-driven systems. PLoS Comput Biol. 2016;12:e1005258. doi: 10.1371/journal.pcbi.1005258. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18.Rosenblatt F. Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms. Spartan Books; Washington, DC: 1962. [Google Scholar]
  • 19.Minsky ML, Papert SA. Perceptrons: Expanded Edition. MIT Press Cambridge; MA: 1988. [Google Scholar]
  • 20.Gardner E. Maximum storage capacity in neural networks. Europhys Lett. 1987;4:481–485. [Google Scholar]
  • 21.Gardner E. The space of interactions in neural network models. J Phys A Math Gen. 1988;21:257–270. [Google Scholar]
  • 22.Gardner E, Derrida B. Optimal storage properties of neural network models. J Phys A Math Gen. 1988;21:271–284. [Google Scholar]
  • 23.Amit DJ, Campbell C, Wong KYM. The interaction space of neural networks with sign-constrained synapses. J Phys A Math Gen. 1989;22:4687–4693. [Google Scholar]
  • 24.Hopfield JJ. Neural networks and physical systems with emergent collective computational abilities. Proc Natl Acad Sci USA. 1982;79:2554–2558. doi: 10.1073/pnas.79.8.2554. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25.Brunel N, Hakim V, Isope P, Nadal JP, Barbour B. Optimal information storage and the distribution of synaptic weights: Perceptron versus Purkinje cell. Neuron. 2004;43:745–757. doi: 10.1016/j.neuron.2004.08.023. [DOI] [PubMed] [Google Scholar]
  • 26.Clopath C, Nadal JP, Brunel N. Storage of correlated patterns in standard and bistable Purkinje cell models. PLoS Comput Biol. 2012;8:e1002448. doi: 10.1371/journal.pcbi.1002448. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27.Chapeton J, Fares T, LaSota D, Stepanyants A. Efficient associative memory storage in cortical circuits of inhibitory and excitatory neurons. Proc Natl Acad Sci USA. 2012;109:E3614–E3622. doi: 10.1073/pnas.1211467109. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28.Brunel N. Is cortical connectivity optimized for storing information? Nat Neurosci. 2016;19:749–755. doi: 10.1038/nn.4286. [DOI] [PubMed] [Google Scholar]
  • 29.Amit DJ, Wong KYM, Campbell C. Perceptron learning with sign-constrained weights. J Phys A Math Gen. 1989;22:2039–2045. [Google Scholar]
  • 30.Denève S, Machens CK. Efficient codes and balanced networks. Nat Neurosci. 2016;19:375–382. doi: 10.1038/nn.4243. [DOI] [PubMed] [Google Scholar]
  • 31.Brown DA, Adams PR. Muscarinic suppression of a novel voltage-sensitive K+ current in a vertebrate neurone. Nature. 1980;283:673–676. doi: 10.1038/283673a0. [DOI] [PubMed] [Google Scholar]
  • 32.Madison DV, Nicoll RA. Control of the repetitive discharge of rat CA 1 pyramidal neurones in vitro. J Physiol. 1984;354:319–331. doi: 10.1113/jphysiol.1984.sp015378. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33.Fleidervish IA, Friedman A, Gutnick MJ. Slow inactivation of Na+ current and slow cumulative spike adaptation in mouse and guinea-pig neocortical neurones in slices. J Physiol. 1996;493:83–97. doi: 10.1113/jphysiol.1996.sp021366. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34.Benda J, Herz AVM. A universal model for spike-frequency adaptation. Neural Comput. 2003;15:2523–2564. doi: 10.1162/089976603322385063. [DOI] [PubMed] [Google Scholar]