Balanced excitation and inhibition are required for high-capacity, noise-robust neuronal selectivity

Ran Rubin; L F Abbott; Haim Sompolinsky

doi:10.1073/pnas.1705841114

. 2017 Oct 17;114(44):E9366–E9375. doi: 10.1073/pnas.1705841114

Balanced excitation and inhibition are required for high-capacity, noise-robust neuronal selectivity

Ran Rubin ^a,¹, L F Abbott ^a,^b, Haim Sompolinsky ^c,^d

PMCID: PMC5676886 PMID: 29042519

Significance

Neurons and networks in the cerebral cortex must operate reliably despite multiple sources of noise. Using a mathematical analysis and model simulations, we show that noise robustness requires synaptic connections to be in a balanced regime in which excitation and inhibition are strong and largely cancel each other. Our theory predicts an optimal ratio for the number of excitatory and inhibitory synapses that depends on the statistics of afferent activity and is consistent with data. This distinct form of excitation–inhibition balance is essential for robust neuronal selectivity and crucial for stability in associative memory networks, and it emerges automatically from learning in the presence of noise.

Keywords: E/I balance, synaptic learning, associative memory

Abstract

Neurons and networks in the cerebral cortex must operate reliably despite multiple sources of noise. To evaluate the impact of both input and output noise, we determine the robustness of single-neuron stimulus selective responses, as well as the robustness of attractor states of networks of neurons performing memory tasks. We find that robustness to output noise requires synaptic connections to be in a balanced regime in which excitation and inhibition are strong and largely cancel each other. We evaluate the conditions required for this regime to exist and determine the properties of networks operating within it. A plausible synaptic plasticity rule for learning that balances weight configurations is presented. Our theory predicts an optimal ratio of the number of excitatory and inhibitory synapses for maximizing the encoding capacity of balanced networks for given statistics of afferent activations. Previous work has shown that balanced networks amplify spatiotemporal variability and account for observed asynchronous irregular states. Here we present a distinct type of balanced network that amplifies small changes in the impinging signals and emerges automatically from learning to perform neuronal and network functions robustly.

The response properties of neurons in many brain areas including cerebral cortex are shaped by the balance between coactivated inhibitory and excitatory synaptic inputs (1–5) (for a review see ref. 6). Excitation–inhibition balance may have different forms in different brain areas or species and its emergence likely arises from multiple mechanisms. Theoretical work has shown that, when externally driven, circuits of recurrently connected excitatory and inhibitory neurons with strong synapses settle rapidly into a state in which population activity levels ensure a balance of excitatory and inhibitory currents (7, 8). Experimental evidence in some systems indicates that synaptic plasticity plays a role in maintaining this balance (9–12). Here we address the question of what computational benefits are conferred by the excitation–inhibition balance properties of balanced and unbalanced neuronal circuits. Although it has been shown that networks in the balanced states have advantages in generating a fast and linear response to changing stimuli (7, 8, 13, 14), the advantages and disadvantages of excitation–inhibition balance for general information processing have not been elucidated [except in special architectures (15–17)]. Here we compare the computational properties of neurons operating with and without excitation–inhibition balance and present a constructive computational reason for strong, balanced excitation and inhibition: It is needed for neurons to generate selective responses that are robust to output noise, and it is crucial for the stability of memory states in associative memory networks. The distinct balanced networks we present here naturally and automatically emerge from synaptic learning that endows neurons and networks with robust functionality.

We begin our analysis by considering a single neuron receiving input from a large number of afferents. We characterize its basic task as discriminating patterns of input activation to which it should respond by firing action potentials from other patterns which should leave it quiescent. Neurons implement this form of response selectivity by applying a threshold to the sum of inputs from their presynaptic afferents. The simplest (parsimonious) model that captures these basic elements is the binary model neuron (18, 19), which has been studied extensively (20–23) and used to model a variety of neuronal circuits (24–28). Our work is based on including and analyzing the implications of four fundamental neuronal features not previously considered together: (i) nonnegative input, corresponding to the fact that neuronal activity is characterized by firing rates; (ii) a membrane potential threshold for neuronal firing above the resting potential (and hence a silent resting state); (iii) sign-constrained and bounded synaptic weights, meaning that individual synapses are either excitatory or inhibitory and the total synaptic strength is limited; and (iv) two sources of noise, input and output noise, representing fluctuations arising from variable stimuli and inputs and from processes within the neuron. As will be shown, these features imply that, when the number of input afferents is large, synaptic input must be strong and balanced if the neuron’s response selectivity is to be robust. We extend our analysis to recurrently connected networks storing long-term memory and find that similar balanced synaptic patterns are required for the stability of the memory states against noise. In addition, maximizing the performance of neurons and networks in the balanced state yields a prediction for the optimal ratio of excitatory to inhibitory inputs in cortical circuits.

Results

Our model neuron is a binary unit that is either active or quiescent, depending on whether its membrane potential is above or below a firing threshold. The potential, labeled $V_{PSP}$ , is a weighted sum of inputs $x_{i}$ , $i = 1, 2, \dots, N$ , that represent afferent firing rates and are thus nonnegative,

V_{PSP} (𝒙, 𝒘) = V_{rest} + \sum_{i = 1}^{N} w_{i} x_{i},

[1]

where $V_{rest}$ is the resting potential of the neuron and x and w are $N$ -component vectors with elements $x_{i}$ and $w_{i}$ , respectively. The weight $w_{i}$ represents the synaptic efficacy of the $i$ th input. If $V_{PSP} \geq V_{th}$ the neuron is in an active state; otherwise, it is in a quiescent state. To implement the segregation of excitatory and inhibitory inputs, each weight is constrained so that $w_{i} \geq 0$ if input $i$ is excitatory and $w_{i} \leq 0$ if input $i$ is inhibitory.

To function properly in a circuit, a neuron must respond selectively to an appropriate set of inputs. To characterize selectivity, we define a set of $P$ exemplar input vectors $𝒙^{μ}$ , with $μ = 1,2, \dots, P$ , and randomly assign them to two classes, denoted as “plus” and “minus.” The neuron must respond to inputs belonging to the plus class by firing (active state) and to the minus class by remaining quiescent. This means that the neuron is acting as a perceptron (18–22, 25, 27, 29). We assume the $P$ input activations, $𝒙^{μ}$ , are drawn identically and independently from a distribution with nonnegative means, $\bar{𝒙}$ , and covariance matrix, $C$ (when $N$ is large, higher moments of the distribution of x have negligible effect). For simplicity we assume that the stimulus average activities are the same for all input neurons within a population, so that ${\bar{x}}_{i} = {\bar{x}}_{exc (inh)} \geq 0$ , and that $C$ is diagonal with equal variances within a population, $σ_{i}^{2} = σ_{exc (inh)}^{2}$ . Note that synaptic weights are in units of membrane potential over input activity levels (firing rates) and hence will be measured in units of $(V_{th} - V_{rest}) / σ_{exc}$ .

We call weight vectors that correctly categorize the $P$ exemplar input patterns, $𝒙^{μ}$ for $μ = 1, 2, \dots, P$ , solutions of the categorization task presented to the neuron. Before describing in detail the properties of the solutions, we outline a broad distinction between two types of possible solutions. One type is characterized by weak synapses, i.e., individual synaptic weights that are inversely proportional to the total number of synaptic inputs, $w_{i} \sim 1 / N$ [note that weights weaker than $O (1 / N)$ will not enable the neuron to cross the threshold]. For this solution type, the total excitatory and inhibitory parts of the membrane potential are of the same order as the neuron’s threshold. An alternative scenario is a solution in which individual synaptic weights are relatively strong, $w_{i} \sim 1 / \sqrt{N}$ . In this case, both the total excitatory and inhibitory parts of the potential are, individually, much greater than the threshold, but they make approximately equal contributions, so that excitation and inhibition tend to cancel, and the mean $V_{PSP}$ is close to threshold. We call the first type of solution unbalanced and the second type balanced. Importantly, since both balanced and unbalanced solutions solve the categorization task with the same value of $V_{th}$ , the two solution types are not related to each other by a global scaling of the weights but represent different patterns of ${w_{i}}$ . Note that the norm of the weight vector, $| 𝒘 | = \sqrt{\sum_{i = 1}^{N} w_{i}^{2}}$ , serves to distinguish the two types of solutions. This norm is of order of $1 / \sqrt{N}$ for unbalanced solutions and of order $1$ in the balanced case. Weights with norms stronger than $O (1)$ lead to membrane potential values that are much larger in magnitude than the neuron’s threshold. For biological neurons postsynaptic potentials of such magnitude can result in very high, unreasonable firing rates (although see ref. 30). We therefore impose an upper bound of the weight norm $| 𝒘 | \leq Γ$ , where $Γ$ is of order 1. We now argue that the differences between unbalanced and balanced solutions have important consequences for the way the system copes with noise.

As mentioned above, neurons in the central nervous system are subject to multiple sources of noise, and their performance must be robust to its effects. We distinguish two biologically relevant types of noise: input noise resulting from the fluctuations of the stimuli and sensory processes that generate the stimulus-related input x and output noise arising from afferents unrelated to a particular task or from biophysical processes internal to the neuron, including fluctuations in the effective threshold due to spiking history and adaptation (31–33) (for theoretical modeling see ref. 34). Both sources of noise result in trial-by-trial fluctuations of the membrane potential $V_{PSP}$ and, for a robust solution, the probability of changing the state of the output neuron relative to the noise-free condition must be low. The two sources of noise differ in their dependence on the magnitude of the synaptic weights. Because input noise is filtered through the same set of synaptic weights as the signal, its effect on the membrane potential is sensitive to the magnitude of those weights. Specifically, if the trial-to-trial variability of each input $x_{i}^{μ}$ is characterized by SD $σ_{in}$ , the fluctuations it generates in the membrane potential have SD $| 𝒘 | σ_{in}$ (Fig. 1, Top Left and Top Right). On the other hand, the effect of output noise is independent of the synaptic weights w. Output noise characterized by SD $σ_{out}$ induces membrane potential fluctuations with the same SD $σ_{out}$ for both types of solutions (Fig. 1, Bottom Left and Bottom Right).

Fig. 1. — Only balanced solutions can be robust to both input and output noise. Each panel depicts membrane potentials resulting from different input patterns in a classification task. Weights are unbalanced [ $| 𝒘 | = O (1 / \sqrt{N})$ , *Top Left* and *Bottom Left*] or balanced [ $| 𝒘 | = O (1)$ , *Top Right* and *Bottom Right*]. The neuron is in an active state only if the membrane potential is greater than the threshold $V_{th}$ . The input pattern class (plus or minus) is specified by the squares underneath the horizontal axis. Each input pattern determines a membrane potential (mean, horizontal bars) that fluctuates from one presentation to another due to input noise (*Top Left* and *Top Right*) and output noise (*Bottom Left* and *Bottom Right*). Vertical bars depict the magnitude of the noise in each case. The variability of the mean $V_{PSP}$ across input patterns (which is the signal differentiating input pattern classes) is proportional to $| 𝒘 |$ . As a result, the mean $V_{PSP}$ s for unbalanced solutions (*Top Left* and *Bottom Left*) cluster close to the threshold [difference from threshold $O (1 / \sqrt{N})$ ]. For balanced solutions (*Top Right* and *Bottom Right*), the mean $V_{PSP}$ s have a larger spread [potential difference $O (1)$ ]. Input noise (fluctuations of $x_{i}$ , *Top Left* and *Top Right*) produces membrane potential fluctuations with SD that is proportional to $| 𝒘 |$ , which is of $O (1 / \sqrt{N})$ for unbalanced solutions (*Top Left*) and of $O (1)$ for balanced solutions (*Top Right*). Output noise (*Bottom Left* and *Bottom Right*) produces membrane potential fluctuations that are independent of $| 𝒘 |$ , so it is of the same magnitude for both solution types. Thus, while both balanced and unbalanced solutions can be robust to input noise, only balanced solutions can also be robust to substantial output noise.

We can now appreciate the basis for the difference in the noise robustness of the two types of solutions. For unbalanced solutions, the difference between the potential induced by typical plus and minus noise-free inputs (the signal) is of the order of $| 𝒘 | = O (1 / \sqrt{N})$ (Fig. 1, Top Left and Bottom Left). Although the fluctuations induced by input noise are of this same order (Fig. 1, Top Left), output noise yields fluctuations in the membrane potential of order $1,$ which is much larger than the magnitude of the weak signal (Fig. 1, Bottom Left). In contrast, for balanced solutions, the signal differentiating plus and minus patterns is of order $| 𝒘 | = O (1)$ , which is the same order as the fluctuations induced by both types of noise (Fig. 1, Top Right and Bottom Right). Thus, we are led to the important observation that the balanced solution provides the only hope for producing selectivity that is robust against both types of noise. However, there is no guarantee that robust, balanced solutions exist or that they can be found and maintained in a manner that can be implemented by a biological system. Key questions, therefore, are, Under what conditions does a balanced solution to the selectivity task exist? And what are, in detail, its robustness properties? Below, we derive conditions for the existence of a balanced solution, analyze its properties, and study the implications for single-neuron and network computation. We show that, subject to a small reduction of the total information stored in the network, robust and balanced solutions exist and can emerge naturally when learning occurs in the presence of output noise.

Balanced and Unbalanced Solutions.

We begin by presenting the results of an analytic approach (20–22) for determining existence conditions and analyzing properties of weights that generate a specified selectivity, independent of the particular method or learning algorithm used to find the weights (SI Replica Theory for Sign- and Norm-Constrained Perceptron). We validate the theoretical results by using numerical methods that can determine the existence of such weights and find them if they exist (SI Materials and Methods).

When the number of patterns $P$ is too large, solutions may not exist. The maximal value of $P$ that permits solutions is proportional to the number of synapses, $N$ , so a useful measure is the ratio $α = P / N$ , which we call the load. The capacity, denoted as $α_{c}$ , is the maximal load that permits solutions to the task. The capacity depends on the relative number of plus and minus input patterns. For simplicity we assume throughout that the two classes are equal in size (but see SI Capacity for Noneven Split of Plus and Minus Patterns). A classic result for the perceptron with weights that are not sign constrained is that the capacity is $α_{c} = 2$ (20, 35, 36). For the “constrained perceptron” considered here, we find that $α_{c}$ depends also on the fraction of excitatory afferents, denoted by $f_{exc}$ . This fraction is an important architectural feature of neuronal circuits and varies in different brain systems. For $f_{exc} = 0$ , namely a purely inhibitory circuit, the capacity vanishes, because when all of the input to the neuron is inhibitory, $V_{PSP}$ cannot reach threshold and the neuron is quiescent for all stimuli. When the circuit includes excitatory synapses, the task can be solved by appropriate shaping of the strength of the excitatory and inhibitory synapses, and this ability increases the larger the fraction of excitatory synapses is. Therefore, for $f_{exc} > 0$ , $α_{c}$ increases with $f_{exc}$ up to a maximum of $α_{c} = 1$ (half the capacity of an unconstrained perceptron) for fractions equal to or greater than a critical fraction $f_{exc} = f_{exc}^{⋆}$ . This dependence can be summarized by the capacity curve $α_{c} (f_{exc})$ (Fig. 2A, solid line) bounding the range of loads which admit solutions for the different excitatory/inhibitory ratios.

Fig. 2. — Balanced and unbalanced solutions. (A) Perceptron solutions as a function of load and fraction of excitatory weights. Above the capacity line [ $α_{c} (f_{exc})$ , solid line] no solution exists. Balanced solutions exist only below the balanced capacity line [ $α_{b} (f_{exc})$ , dashed shaded line]. Between the balanced capacity and maximum capacity lines, only unbalanced solutions exist (U). On the other hand, below the balanced capacity line, unbalanced solutions coexist with balanced ones (B+U). (B) The norm of the synaptic weight vector of typical solutions as a function the load [in units of $(V_{th} - V_{rest}) / σ_{exc}$ ]. Below $α_{b}$ the norm is clipped at its upper bound $Γ$ (in this case $Γ = 1$ ). Above $α_{b}$ the norm collapses and is of order $1 / \sqrt{N}$ (shown here for $N = 3, 000$ ). (C) The input imbalance index (IB, Eq. 3) of typical solutions as a function of the load. Note the sharp onset of imbalance above $α_{b}$ . In B and C $f_{exc} = 0.8$ , yielding $α_{c} = 1$ . See *SI Materials and Methods* for other parameters used. For simulation results see Fig. S1.

Interestingly, $f_{exc}^{⋆}$ depends on the statistics of the inputs (SI Replica Theory for Sign- and Norm-Constrained Perceptron). We denote the coefficient of variation (CV) of the excitatory and inhibitory input activities by ${CV}_{exc} = σ_{exc} / {\bar{x}}_{exc}$ and ${CV}_{inh} = σ_{inh} / {\bar{x}}_{inh}$ , respectively. These measure the degree of stimulus tuning of the two afferent populations. In terms of these quantities, the critical excitatory fraction is

f_{exc}^{⋆} = \frac{{CV}_{exc}}{{CV}_{exc} + {CV}_{inh}} .

[2]

In other words, the critical ratio between the number of excitatory and inhibitory afferents [ $f_{exc}^{⋆}$ /(1 − $f_{exc}^{⋆}$ )] equals the ratio of their degree of tuning. To understand the origin of this result, we note that to maximize the encoding capacity, the relative strength of the weights should be inversely proportional to the SD of their afferents, ${\bar{w}}_{exc (inh)} \propto 1 / σ_{exc (inh)}$ , implying that the mean total synaptic inputs are proportional to $f_{exc} {\bar{w}}_{exc} {\bar{x}}_{exc} + f_{inh} {\bar{w}}_{inh} {\bar{x}}_{inh} = f_{exc} / {CV}_{exc} - f_{inh} / {CV}_{inh}$ , where $f_{inh} = 1 - f_{exc}$ . For excitatory fraction $f_{exc} > f_{exc}^{⋆}$ this mean total synaptic inputs are positive, allowing the voltage to reach the threshold and the neuron to implement the required selectivity task with optimally scaled weights. Thus, the capacity of the neuron is unaffected by changes in $f_{exc}$ in the range $f_{exc}^{⋆} \leq f_{exc} \leq 1$ . For excitatory fraction $f_{exc} < f_{exc}^{⋆}$ the neuron cannot remain responsive (reach threshold) with optimally scaled weights, and thus the capacity is reduced.

In cortical circuits, inhibitory neurons tend to fire at higher firing rates and are thought to be more broadly tuned than excitatory neurons (4, 37, 38), implying $f_{exc}^{⋆} > 0.5$ (SI Effects of E and I Input Statistics). This is consistent with the abundance of excitatory synapses in cortex. However, input statistics that make $f_{exc}^{⋆} < 0.5$ do not change the qualitative behavior we discuss (SI Effects of E and I Input Statistics and Fig. S2A).

Fig. S2. — Effects of input statistics. (A) Solution type vs. $f_{exc}$ and $α$ (as a fraction of $α_{c}^{Unconst}$ ) (*Capacity for Noneven Split of Plus and Minus Patterns*) for different values of $ϕ = {CV}_{exc} / {CV}_{inh}$ . From *Left* to *Right* $ϕ = 1 / \sqrt{2}, 1, \sqrt{2}$ . Lines are as in Fig. 2A. (B) Type of maximal $κ_{in}$ solutions vs. $ϕ$ and $κ_{in}^{\max}$ for different values of $λ = σ_{inh} / σ_{exc}$ . For a wide range of $ϕ$ and $λ$ these solutions are unbalanced for all values of $κ_{in}^{\max}$ . Here $f_{exc} = 0.8$ and $λ = 1 / 2, 1, 2$ from *Left* to *Right*. (C) Fraction of silent weights for maximal $κ_{out}$ solutions vs. the load for different values of $λ$ . Fraction of E silent weights is shown in blue and fraction of I silent weights is depicted in red. Here $f_{exc} = 0.8, ϕ = \sqrt{2}$ , $p_{out} = 0.1$ , and $λ = 1 / 2, 1, 2$ from *Left* to *Right*. Notably, for unbalanced, maximal $κ_{in}$ solutions the fraction of silent weights is constant and equals 0.5 for both E and I inputs (*Saddle-point equations for the maximal $κ$ in solution* and *Distribution of synaptic weights*).

For load levels below the capacity, many synaptic weight vectors solve the selectivity task and we now describe the properties of the different solutions. In particular, we investigate the parameter regimes where balanced or unbalanced solutions exist. We find that unbalanced solutions with weight vector norms of order $1 / \sqrt{N}$ exist for all load values below $α_{c}$ . As for the balanced solutions with weight vector norms of order $1$ , they exist below a critical value $α_{b}$ which may be smaller than $α_{c}$ . Specifically, for $f_{exc} \leq f_{exc}^{⋆}$ balanced solutions exist for all load values below capacity; i.e., $α_{b} = α_{c}$ . For $f_{exc} > f_{exc}^{⋆}$ , $α_{b}$ is smaller than $α_{c}$ and decreases with $f_{exc}$ until it vanishes at $f_{exc} = 1$ (Fig. 2A, dashed shaded line). The absence of balanced solutions for $f_{exc} = 1$ is clear, as there is no inhibition to balance the excitatory inputs. Furthermore, the synaptic excitatory weights must be weak (scaling as $1 / N$ ) to ensure that $V_{PSP}$ remains close to threshold (slightly above it for plus patterns and slightly below it for minus ones). For $1 \geq f_{exc} > f_{exc}^{⋆}$ the predominance of excitatory afferents precludes a balanced solution if the load is high; i.e., $α_{b} \leq α \leq α_{c}$ . As argued above and shown below, balanced solutions are more robust than unbalanced solutions. Hence, we can identify $f_{exc}^{⋆}$ as the optimal fraction of excitatory input, because it is the fraction of excitatory afferents for which the capacity of balanced solutions is maximal.

For loads below $α_{b}$ both balanced and unbalanced solutions exist, raising the question, What would be the character of a weight vector that is sampled randomly from the space of all possible solutions? Our theory predicts that whenever the balanced solutions exist, the vast majority of the solutions are balanced and furthermore have a weight vector norm that is saturated at the upper bound $Γ$ . This is a consequence of the geometry of high-dimensional spaces in which volumes are dominated by the volume elements with the largest radii (see SI Replica Theory for Sign- and Norm-Constrained Perceptron for details). Thus, for $f_{exc} > f_{exc}^{⋆}$ , the typical solution undergoes a transition from balanced to unbalanced weights as $α$ crosses the balanced capacity line $α_{b} (f_{exc})$ . At this point the norm of the solution collapses from $Γ$ to $| 𝒘 | \sim 1 / \sqrt{N}$ (Fig. 2B).

As explained above, for balanced solutions we expect to find a near cancellation of the total excitatory (E) and inhibitory (I) inputs. Our theory confirms this expectation. To measure the degree of E-I cancellation for any solution, we introduce the imbalance index,

IB = \frac{\sum_{i} w_{i} {\bar{x}}_{i}}{\sum_{i \in exc} w_{i} {\bar{x}}_{i} - \sum_{i \in inh} w_{i} {\bar{x}}_{i}},

[3]

where the overbar symbol denotes an average over all of the input patterns ( $μ$ ) and, as mentioned above, E weights are nonnegative ( $w_{i} \geq 0$ ) and I weights are nonpositive ( $w_{i} \leq 0$ ). Whereas for the unbalanced solution the IB is of order $1$ , for the balanced solution it is small, of order $1 / \sqrt{N}$ . Thus, the typical solution below $α_{b}$ has zero imbalance (to leading order in $N)$ , but the imbalance increases sharply as $α$ increases beyond $α_{b}$ (Fig. 2C).

Noise Robustness of Balanced and Unbalanced Solutions.

To characterize the effect of noise on the different solutions, we introduce two measures, input robustness $κ_{in}$ and output robustness $κ_{out}$ , which characterize the robustness of the noise-free solutions to the addition of two types of noise. To ensure robustness to output noise, the noise-free membrane potential that is the closest to the threshold must be sufficiently far from it. Thus, we define

κ_{out} = \min_{μ} | \sum_{i = 1}^{N} w_{i} x_{i}^{μ} - 1 |,

[4]

where the minimum is taken over all of the input patterns in the task and the threshold is 1 [because we measure the weights in units of $(V_{th} - V_{rest}) / σ_{exc}$ ]. The second measure, which characterizes robustness to input noise, must take into account the fact that the fluctuations in the membrane potential induced by this form of noise scale with the size of the synaptic weights. Hence, $κ_{in} = κ_{out} / | 𝒘 |$ [ $κ_{in}$ corresponds to the notion of margin in machine learning (39)]. Efficient algorithms for finding the solution with a maximum possible value of $κ_{in}$ have been studied extensively (39, 40). We have developed an efficient algorithm for finding solutions with maximal $κ_{out}$ (SI Materials and Methods).

We now ask, What are the possible values of the input and output robustness of unbalanced and balanced solutions? Our theory predicts that the majority of both balanced and unbalanced solutions have vanishingly small values of $κ_{in}$ and $κ_{out}$ and are thus very sensitive to noise. However, for a given load (below capacity) robust solutions do exist, with a spectrum of robustness values up to maximal values, $κ_{in}^{\max} > 0$ and $κ_{out}^{\max} > 0$ . Since the magnitude of w scales both signal and noise in the inputs, $κ_{in}^{\max}$ is not sensitive to $| 𝒘 |$ and hence is of $O (1)$ for both unbalanced and balanced solutions. On the other hand, $κ_{out}^{\max} = κ_{in}^{\max} | 𝒘 |$ is proportional to $| 𝒘 |$ . Thus, we expect $κ_{out}^{\max}$ to be of $O (1)$ when balanced solutions exist and of $O (1 / \sqrt{N})$ when only unbalanced solutions exist. In addition, we expect that increasing the load will reduce the value of $κ_{in}^{\max}$ and $κ_{out}^{\max}$ as the number of constraints that need to be satisfied by the synaptic weights increases.

In Fig. 3 we present the values of $κ_{in}^{\max}$ and $κ_{out}^{\max}$ vs. the load. As expected, we find that the values of both $κ_{in}^{\max}$ and $κ_{out}^{\max}$ reach zero as the load approaches the capacity, $α_{c}$ (and diverges, as $N \to \infty$ , for vanishingly small loads). However, $κ_{out}^{\max}$ is only substantial (of order 1) and proportional to $Γ$ below $α_{b}$ where balanced solutions exist (Fig. 3 A and B). In contrast, $κ_{in}^{\max}$ remains of order 1 up to the full capacity, $α_{c}$ (Fig. 3C). What are the properties of “optimal” solutions that achieve the maximal robustness to either input or output noise? We find that the solutions that achieve the maximal output robustness, $κ_{out}^{\max}$ , are balanced for all $α \leq α_{b}$ and their norm saturates the upper bound, $Γ$ (Fig. S3B). Interestingly, for a wide range of input parameters (SI Replica Theory for Sign- and Norm-Constrained Perceptron, Effects if E and I Input Statistics, and Fig. S2B), solutions that achieve the maximal input robustness, $κ_{in}^{\max}$ , are unbalanced solutions (Fig. S3C). Nevertheless, we find that below the critical balance load, $α_{b}$ , the $κ_{in}$ values of the balanced maximal $κ_{out}$ solutions are of the same order as, and indeed close to, $κ_{in}^{\max}$ (Fig. 3C, dashed shaded line). In fact, the balanced solution with maximal $κ_{out}$ also poses the maximal value of $κ_{in}$ that is possible for balanced solutions.

Fig. S3. — Properties of maximal output and input robustness solutions. (A) Input robustness, $κ_{in}$ , vs. the load for the maximal $κ_{in}$ solution (red) and the maximal $κ_{out}$ solution (blue). (B) Norm of synaptic weight vector vs. the load for the maximal $κ_{out}$ solution. In the balanced regime ( $α < α_{b}$ ) the norm saturates its upper bound $Γ = 1$ . Since the norm is constant, maximizing $κ_{out}$ in the balanced regime is equivalent to maximizing $κ_{in}$ under the constraint $| 𝒘 | = Γ$ . (C) Rescaled norm of synaptic weight vector ( $\sqrt{N} | 𝒘 |$ ) vs. the load for the maximal $κ_{in}$ solution. To demonstrate the $1 / \sqrt{N}$ scaling of the weight vector norm, colors depict results for $N = 750$ (gray), $N = 1, 500$ (green), and $N = 3, 000$ (red). In *A–C* lines depict theoretical prediction. $f_{exc} = 0.8$ , $p_{out} = 0.1$ , $ϕ = {CV}_{exc} / {CV}_{inh} = \sqrt{2},$ and $λ = σ_{inh} / σ_{exc} = 2$ ; results are averaged over 100 samples.

We conclude that solutions that are robust to both input and output noise exist for loads less than $α_{b}$ which for $f_{exc} > f_{exc}^{⋆}$ is smaller than $α_{c}$ . However, as long as $f_{exc}$ is close to $f_{exc}^{⋆}$ , the reduction in capacity from $α_{c}$ to $α_{b}$ imposed by the requirement of robustness is small.

Balanced and Unbalanced Solutions for Spiking Neurons.

Neurons typically receive their input and communicate their output through action potentials. Thus, a fundamental question is, How will the introduction of spike-based input and spiking output affect our results? Here we show that the main properties of balanced and unbalanced synaptic efficacies, as discussed above, remain when the inputs are spike trains and the model neuron implements spiking and membrane potential reset mechanisms.

We consider a leaky integrate-and-fire (LIF) neuron that is required to perform the same binary classification task we considered using the perceptron. Each input is characterized by a vector of firing rates, $𝒙^{μ}$ . Each afferent generates a Poisson spike train over an interval from time $t = 0$ to $t = T$ , with mean rate $r_{i} \propto x_{i}^{μ}$ . The LIF neuron integrates these input spikes (SI Materials and Methods) and emits an output spike whenever its membrane potential crosses a firing threshold. After each output spike, the membrane potential is reset to the resting potential, and the integration of inputs continues. We define the output state of the LIF neuron, using the total number of output spikes $n_{spikes}$ : The neuron is quiescent if $n_{spikes} \leq n_{thr}$ and active if $n_{spikes} > n_{thr}$ , where $n_{thr}$ is chosen to maximize classification performance. We do not discuss the properties of learning in LIF neurons (41–45), but instead test the properties of the solutions (weights) obtained from the perceptron model when they are used for the LIF neuron. In particular, we compare the performance of the balanced, maximal $κ_{out}$ solution and the unbalanced, maximal $κ_{in}$ solution. When the synaptic weights of the LIF neuron are set according to the two perceptron solutions, the mean output of the LIF neuron correctly classifies the input patterns (according to the desired classification; Fig. S4). Consistent with the results for the perceptron, we find that with no output noise the performance of both solutions is good, even in the presence of the substantial input noise caused by Poisson fluctuations in the number of input spikes and their timings (Fig. 4 A–C). When the output noise magnitude is increased (SI Materials and Methods), however, the performance of the unbalanced maximal $κ_{in}$ solution quickly deteriorates, whereas the performance of the balanced maximal $κ_{out}$ solution remains largely unaffected (Fig. 4 D–F). Thus, the spiking model recapitulates the general results found for the perceptron.

Fig. S4. — Neuronal selectivity for a spiking neuron. Both panels depict the histograms of the mean output spike count for patterns belonging to the plus (blue) and minus (red) classes of an LIF neuron with balanced weights maximizing $κ_{out}$ (*Left*) and unbalanced weights maximizing $κ_{in}$ (*Right*). Here the magnitude of the output noise is zero. In both cases the mean output spike count can be used to correctly classify the patterns. For parameters used see Fig. 3.

Fig. 4. — Selectivity in a spiking model. A and B (D and E) depict the output of an LIF neuron with no (high) output noise for the balanced maximal $κ_{out}$ solution (A and D) and the unbalanced maximal $κ_{in}$ solution (B and E). C and F depict the receiver operating characteristic (ROC) curves for the two solutions under the no output noise (C) and high output noise (F) conditions obtained as the decision threshold ( $n_{thr}$ ) is modified from 0 to $\infty$ . Consistent with the results of the perceptron, the performances of the two solutions with no output noise are very similar with a slight advantage for the maximal $κ_{in}$ solution. With higher levels of output noise, the performance of the unbalanced maximal $κ_{in}$ solution quickly deteriorates, whereas the performance of the balanced maximal $κ_{out}$ solution is only slightly affected. $| 𝒘 |$ of the balanced solution was chosen to equalize the mean output spike count across all patterns in both solutions (mean $n_{spike} \sim 4$ ). See *SI Materials and Methods* for parameters used.

Balanced and Unbalanced Synaptic Weights in Associative Memory Networks.

Thus far, we have considered the selectivity of a single neuron, but our results also have important implications for recurrently connected neuronal networks, in particular recurrent networks implementing associative memory functions. Models of associative memory in which stable fixed points of the network dynamics represent memories, and memory retrieval corresponds to the dynamic transformation of an initial state to one of the memory-representing fixed points, have been a major focus of memory research for many years (24, 27, 28, 46–48). For the network to function as an associative memory, memory states must have large basins of attraction so that the network can perform pattern completion, recalling a memory from an initial state that is similar but not identical to it. In addition, memory retrieval must be robust to output noise. As we will show, the variables $κ_{in}$ and $κ_{out}$ for the synaptic weights projecting onto individual neurons in the network are closely related to the sizes of the basins of attraction of the memories and the robustness to output noise, respectively.

We consider a network that consists of $N f_{exc}$ E and $N (1 - f_{exc})$ I, recurrently connected binary neurons. The network operates in discrete time steps and at each step the state of one randomly chosen neuron, $i$ , is updated according to

s_{i} (t + 1) = Θ [\sum_{j \neq i} J_{i j} s_{j} (t) + η_{out} (t) - 1] .

[5]

Here $Θ (x) = 1$ for $x \geq 0$ and 0 otherwise, $J_{i j}$ is the weight of the synapse from neuron $j$ to neuron $i$ , and $η_{out} (t)$ , the output noise, is a Gaussian random variable with SD $σ_{out}$ . $P$ randomly chosen binary activity patterns ${𝐬^{μ}}, μ = 1, 2, \dots, P$ (where each $s_{i}^{μ} = {0,1}$ ) representing the stored memories are encoded in the recurrent synaptic matrix $J$ . This is achieved by treating each neuron, say $i,$ as a perceptron with a weight vector $𝒘^{i} = {J_{i j}}_{j \neq i}$ that maps its inputs ${s_{j}^{μ}$ } from all other neurons to its desired output $s_{i}^{μ}$ for each memory state (Fig. 5 A and B and SI Materials and Methods). This creates an attractor network in which the memory states are fixed points of the dynamics in the noise-free condition ( $σ_{out} = 0$ ) (20).

Fig. 5. — Recurrent associative memory network constructed using single-neuron feedforward learning. (A) A fully connected recurrent network of E and I neurons in a particular memory state. Active (quiescent) neurons are shown in black (white). E and I synaptic connections ( $J_{i j}$ ) are shown in yellow and blue, respectively (not all connections are depicted). Lines symbolize axons, and synapses are shown as small circles. (B) To find an appropriate $J_{i j}$ , the postsynaptic weights of each neuron are set using the memory-state activities of the other neurons as input and its own memory state as the desired output. In this example, neuron 4 will implement its desire memory state through modification of the weights $J_{4 j}$ for $j = 1, 2, 3, 5, 6, 7$ . C and E show the fraction of erroneous (different from a given memory pattern) neurons in the network as a function of time. (C) Network dynamics with $σ_{out} = 0$ . An initial state of the network can either converge to the memory state (blue) or diverge to other network states (red). (D) Probability of converging to a memory state vs. initial pattern distortion (*SI Materials and Methods*) for a network with unbalanced maximal $κ_{in}$ weights (green), a network with balanced maximal $κ_{out}$ weights (black), and a network with balanced maximal $κ_{out}$ weights with unlearned inhibition (gray, main text). (E) Network dynamics with $σ_{out} > 0$ . The network is initialized at the memory state. The dynamics can be stable (blue; the network remains close to the memory state), or unstable (red; the network diverges to another state). (F) Probability of stable dynamics for at least $500 N$ time steps for networks initialized at the memory state in the presence of output noise vs. $σ_{out}$ . Colors are the same as in D. (G) Maximal output noise magnitude vs. load for networks with balanced synaptic weights matrix maximizing $κ_{out}$ . Similar to $κ_{out}$ , the maximal output noise magnitude is of order 1 only below $α_{b}$ . Above it, even though solutions exist they are extremely sensitive to output noise. Results are shown for $f_{exc} = 0.8$ (green) and $f_{exc} = 0.9$ (magenta). See *SI Materials and Methods* for parameters used.

We do not attempt to perform a complete analysis of the effects of input and output noise in recurrent networks, a difficult challenge. Instead, we link observations from our single-neuron analysis to key features of a recurrent network performing a memory function. The capacity of such a memory network is defined as the maximal load for which the memory patterns can be fixed points of the noise-free dynamics, stable against single-neuron perturbations. This condition is met as long as the single-neuron synaptic weights possess substantial $κ_{in}$ (i.e., $κ_{in} \sim O (1)$ ) for all neurons. Thus, the single-neuron capacities will determine the overall network capacity. As we showed before, the capacity of a single-neuron perceptron depends on the statistics of its desired output (which in our case is the sparsity of activity across memory states). Since this statistic may be different in E and I populations, the single-neuron capacity of the two populations may vary, and hence the global capacity of the recurrent network is the minimum of the single-neuron capacities of the two neuron types. As long as $P$ is smaller than this critical capacity, a recurrent weight matrix exists for which all $P$ memory states are stable fixed points of the noiseless dynamics. However, such solutions are not unique, and the choice of a particular matrix can endow the network with different robustness properties. As stated above, to properly function as an associative memory the fixed points must have large basins of attraction. Corruption of the initial state away from the parent memory pattern introduces variability into the inputs of each neuron for subsequent dynamic iterations and hence is equivalent to injecting input noise in the single-neuron feedforward case. The network propagates this initial input noise in a nontrivial way; however, its magnitude always remains proportional to the magnitude of the norm of the neurons’ synaptic weights. We therefore expect that a large basin of attraction is achieved when the matrix $J$ yields a large input noise robustness for each neuron in the (noise-free) fixed points (49, 50). When output noise is introduced to the network dynamics ( $σ_{out} > 0$ ), the network may propagate it as input noise to other neurons in subsequent time steps. However, initially its magnitude is proportional only to $σ_{out}$ and is unaffected by the scale of the synaptic weights. Thus, we expect that the requirement that the memory states and retrieval will be robust against output noise is satisfied when $J$ yields a large output noise robustness for each neuron in the (noise-free) fixed points. We therefore consider two types of recurrent connections: one in which each row of $J$ is a weight vector that maximizes $κ_{in}$ and hence, in the chosen parameter regime, is necessarily unbalanced and a second one in which the rows of the connection matrix correspond to balanced solutions that maximize $κ_{out}$ .

We estimate the basins of attraction of the memory patterns numerically by initializing the network in states that are corrupted versions of the memory states (SI Materials and Methods) and observing whether the network, with $σ_{out} = 0$ , converges to the parent memory state (Fig. 5C, blue) or diverges away from it (Fig. 5C, red). We define the size of the basin of attraction as the maximum distortion in the initial state that ensures convergence to the parent memory with high probability.

Comparing the basins of attraction of the two types of networks, we find that the mean basin of attraction of the unbalanced network is moderately larger than that of the balanced one (Fig. 5D), consistent with the slightly lower value of $κ_{in}$ in the balanced case (Fig. 5D). On the other hand, the behavior of the two networks is strikingly different in the presence of output noise. To illustrate this, we start each network at a memory state and determine whether it is stable (remains in the vicinity of this state for an extended period), despite the noise in the dynamics (Fig. 5E). We estimate the output noise tolerance of the network by measuring the maximal value of $σ_{out}$ for which the memory states are stable (Fig. 5F). We find that memory states in the balanced solution with maximal $κ_{out}$ are stable for noise levels that (for the network sizes used in the simulation) are an order of magnitude larger than for the unbalanced network with maximal $κ_{in}$ (Fig. 5F).

Finally, we ask how the noise robustness of the memory states in the balanced network depends on the number of memories. As shown in Fig. 5F, for a fixed level of load below capacity, memory patterns are stable ( $P_{stable} > 0.5$ ) as long as levels of noise remain below a threshold value, which we denote as $σ_{out}^{\max} (α)$ . When $σ_{out}$ increases beyond $σ_{out}^{\max} (α)$ , stability of the memory states rapidly deteriorates. The critical noise function $σ_{out}^{\max} (α)$ decreases smoothly from a large value at small $α$ to zero at a level of load, $α_{b}$ . This load coincides with the maximal load for which both E and I neurons have balanced solutions (Fig. 5G). For loads $α_{b} < α < α_{c}$ , all solutions are unbalanced, and hence the magnitude of the stochastic dynamical component can be at most of order $1 / \sqrt{N}$ .

The Role of Inhibition in Associative Memory Networks.

In our associative memory network model, we assumed that both E and I neurons code desired memory states and that all network connections are modified by learning. Most previous models of associative memory that separate excitation and inhibition assume that memory patterns are restricted to the E population, whereas inhibition provides stabilizing inputs (14, 48, 51–54). To address the emergence of balanced solution in scenarios where the I neurons do not represent long-term memories, we studied an architecture where I to E, I to I, and E to I connections are random sparse matrices with large amplitudes, resulting in I activity patterns driven by the E memory states. In such conditions, the I subnetwork exhibits irregular asynchronous activity with an overall mean activity that is proportional to the mean activity of the driving E population (7, 55, 56). Although the mean I feedback provided to the E neurons can balance the mean excitation, the variability in this feedback injects substantial noise onto the E neurons, which degrades system performance (SI Recurrent Networks with Nonlearned Inhibition). This variability stems from the differences in I activity patterns generated by the different E memory states (albeit with the same mean). Additional noise is caused by the temporal irregular activity of the chaotic I dynamics. Next we ask whether the system’s performance can be improved through plasticity in the I to E connections for which some experimental evidence exists (23, 57–60). Indeed, we find an appropriate plasticity rule for this pathway (SI Recurrent Networks with Nonlearned Inhibition) that suppresses the spatiotemporal fluctuations in the I feedback, yielding a balanced state that behaves similarly to the fully learned networks described above (Fig. 5 D and F, gray lines). Interestingly, in this case the basins of attraction of the balanced network are comparable to or even larger than the basins of the unbalanced fully learned network (compare gray to green curves in Fig. 5D). Despite the fact that no explicit memory patterns are assigned to the the I populations, the I activity plays a computational role that goes beyond providing global I feedback; when the weights of the I to E connections are shuffled, the network’s performance significantly degrades (Fig. S5).

Fig. S5. — Effect of shuffling learned I weights in recurrent networks with nonlearned I activity. Shaded line depicts the performance of the network with random E to I and I to I connections and learned E to E and I to E connections (*Recurrent Networks with Nonlearned Inhibition*) (same as gray line in Fig. 5D). Solid line depicts the performance of the same network with the I weights of each E neuron randomly shuffled. Thus, the distribution of I synaptic weights for each E neuron is identical in both cases. This result shows that the learned I weights are important for network performance and stability.

Learning Robust Solutions.

Thus far, we have presented analytical and numerical investigations of solutions that support selectivity or associative memory and provide substantial robustness to noise. However, we did not address the way in which these robust solutions could be learned by a biological system. In fact, as stated above, the majority of solutions for these tasks have vanishingly small output and input robustness and the above maximum robustness solutions are found numerically by special learning algorithms. Therefore, an important question is whether noise robust weights can emerge naturally from synaptic learning rules that are appropriate for neuronal circuits.

The actual algorithms used for learning in the neural circuits are generally unknown, especially within a supervised learning scenario. Experiments suggest that learning rules may depend on brain area and both pre- and postsynaptic neuron types (for example, refs. 57–59, 61; for reviews see refs. 60, 62–64). From a theoretical perspective, the properties of the solutions found through learning, and in particular their noise robustness, depend on both the type and parameters of the algorithm and the properties of the space of possible solutions. However, our theory suggests that a general, simple way to ensure that learning arrives at a robust solution is to introduce noise during learning. Indeed, this is a common practice in machine learning for increasing generalization abilities [a specific form of data augmentation (65, 66)]. The rationale is that learning algorithms that achieve low error in the presence of noise necessarily lead to solutions that are robust against noise levels at least as large as those present during learning. In the case we are considering, learning in the presence of substantial input noise should lead to solutions that have substantial $κ_{in}$ and introducing output noise during learning should lead to solutions with substantial $κ_{out}$ . We note that $κ_{in}$ may be large even if $κ_{out}$ remains small (for example, in unbalanced solutions with maximal $κ_{in}$ ) but not vice versa [because $κ_{out}$ of order 1 implies $| 𝒘 |$ (and as a result $κ_{in}$ ) of order 1 as well]. Therefore, learning in the presence of significant output noise should lead to solutions that are robust to both input and output noise, whereas learning in the presence of input noise alone may lead to unbalanced solutions that are sensitive to output noise, depending on the details of the learning algorithm. We therefore predict that performing successful learning in the presence of output noise is a sufficient condition for the emergence of excitation–inhibition balance.

To demonstrate that robust balanced solutions emerge in the presence of output noise, we consider a variant of the perceptron learning algorithm (18) in which we have forced the sign constraints on the weights (29) and, in addition, added a weight decay term implementing a soft constraint on the magnitude of the weights (SI Materials and Methods). This supervised learning rule possesses several important properties that are required for biological plausibility: It is online, and weights are modified incrementally after each pattern presentation; it is history independent so that each weight update depends only on the current pattern and error signal; and finally, it is simple and local, and weight updates are a function of the error signal and quantities that are available locally at the synapse (presynaptic activity and synaptic efficacy). When this learning rule is applied to train a selectivity task in the presence of substantial output noise, the resulting solution has a balanced weight vector with substantial $κ_{out}$ and $κ_{in}$ (Fig. 6, shaded lines). In contrast, if learning occurs with weak output noise, the algorithm’s tendency to reduce the magnitude of the weights causes the resulting solution to be unbalanced with small $κ_{out}$ , while its $κ_{in}$ may be large if substantial input noise is present during learning (Fig. 6, solid lines). When this learning rule is applied in the load regime where only unbalanced solutions exist ( $α_{b} < α < α_{c}$ ), learning fails to achieve reasonable performance when applied in the presence of large output noise. When noise is scaled down to the value allowed by $κ_{out}^{\max} \propto 1 / \sqrt{N}$ , learning yields unbalanced solutions with robustness values of the order of the maximum allowed in this region (Fig. S6).

Fig. 6. — Emergence of E-I balance from learning in the presence of output noise. All panels show the outcome of perceptron learning for a noisy neuron (*SI Materials and Methods*) under low ( $σ_{out} = 0.01$ , solid lines) and high ( $σ_{out} = 0.1$ , shaded lines) output noise conditions. Except for $σ_{out}$ , all model and learning parameters are identical for the two conditions (including $σ_{in} = 0.1$ ). (A) Mean training error vs. learning cycle. On each cycle, all of the input patterns to be learned are presented once. The error decays and plateaus at its minimal value under both low and high output noise conditions. (B) Mean IB (Eq. 3) vs. learning cycle. IB remains of order 1 under low output noise conditions and drops close to zero under high output noise conditions. (C) Mean input robustness ( $κ_{in}$ ) vs. learning cycle. Input robustness is high under both output noise conditions. (D) Mean output robustness ( $κ_{out})$ vs. learning cycle. Output robustness is substantial only under the high output noise learning condition. These results demonstrate that robust balanced solutions naturally emerge under learning in the presence of high output noise. See *SI Materials and Methods* for other parameters used.

Fig. S6. — Perceptron learning with input and output noise for $α_{b} < α < α_{c}$ . *A–D* depict the outcome of simple perceptron learning for a noisy neuron (*Materials and Methods*) under low output noise conditions ( $σ_{out} = 0.01 / \sqrt{N}$ , solid lines) and high output noise conditions ( $σ_{out} = 0.01$ , shaded lines). Except for $σ_{out}$ all model and learning parameters are identical for the two conditions (including $σ_{in} = 0.01$ ). (A) Mean training error vs. learning cycle. At each cycle all of the learned input patterns are presented once. (B) Mean imbalance index vs. learning cycle. IB remains of order 1 under low output noise conditions and drops to lower values under high output noise conditions. (C) Mean input robustness ( $κ_{in}$ ) vs. learning cycle. (D) Mean rescaled output robustness ( $\sqrt{N} κ_{out}$ ) vs. learning cycle. The error decays and plateaus at its minimal value under both low and high output noise conditions; however, for high output noise the error remains substantial. Both output and input robustness are negative under the high output noise conditions. (The learning does not find a weights vector that performs the classification of the noise-free patterns correctly.) Input and output robustness are positive when the output noise scales at most as $1 / \sqrt{N}$ . Random patterns are binary pattern $x_{i}^{μ} \in {0, 1}$ with equal probabilities and an even split of plus and minus patterns. $N = 3, 000$ , $P = 2, 400$ . Learning algorithm parameters: $ε = 10^{- 8}$ , $ρ = 0.02 / N$ , $σ_{in} = 0.01$ . Results are averaged over 50 samples.

SI Materials and Methods

Finding Perceptron Solutions.

There are a number of numerical methods for choosing a weight vector w that generates a specified selectivity (25, 27, 29, 39). For numerical simulations we developed algorithms that find the maximal $κ_{out}$ and maximal $κ_{in}$ solutions that obey the imposed biological constraints. These solutions can be found directly by solving conic programming optimization problems for which efficient algorithms exist and are widely available (73). For details see Finding Maximal $κ$ in and $κ$ out Solutions.

Random Patterns in Numerical Estimation of $𝜿_{out}^{max}$ and $𝜿_{in}^{max}$ Solutions.

In numerical experiments for Figs. 3 and 4 and Figs. S1 and S3, E inputs for the random patterns were drawn i.i.d. from an exponential distribution with unity mean and SD. I inputs were drawn from a Gamma distribution with shape parameter $k$ and scale parameter $θ$ [the probability density function of the Gamma distribution is given by $P (x) = \frac{1}{Γ (k) θ} {(\frac{x}{θ})}^{k - 1} e^{- \frac{x}{θ}}$ , where $Γ (k)$ is the Gamma function].

Fig. S1. — Numerical measurement of capacity and balanced capacity. (A) Capacity of sign-constrained weights perceptron, $α_{c}$ , vs. the fraction of excitatory inputs, $f_{exc}$ , as a fraction of the capacity of an unconstrained perceptron (*Capacity for Noneven Split of Plus and Minus Patterns*). Theory is depicted in black. Simulation results are shown in blue for $p_{out} = 0.5$ and red for $p_{out} = 0.1$ . To measure $α_{c}$ we measure the probability of the existence of a solution as a function of $α$ . We estimate $α_{c}$ by the load at which this probability is 1/2. (B) Capacity of balanced solutions, $α_{b}$ , as a fraction of $α_{c}$ vs. $f_{exc} > f_{exc}^{⋆}$ . Since $κ_{out}^{\max}$ solutions are balanced whenever balanced solutions exist, to measure $α_{b}$ we measure the probability of finding a balanced $κ_{out}^{\max}$ solution, i.e., a solution that saturates the upper bound on $| 𝒘 |$ . We estimate $α_{b}$ by the load at which this probability is 1/2. In both A and B, $N = 3, 000$ , ${CV}_{exc} / {CV}_{inh} = \sqrt{2}$ .

Dynamics of LIF Neuron.

Input spike trains.

For each input pattern $𝒙^{μ}$ input spike trains of input afferent $i = 1, 2, \dots, N$ were drawn randomly from a Poisson process with rate $r_{i} = A x_{i}^{μ},$ for duration $T$ .

Synaptic input.

Given the set of input spike trains ${t_{i}},$ $i = 1,2, \dots, N$ the contribution of synaptic input to the membrane potential is given by $V_{syn} (t) = \sum_{i} w_{i} \sum_{t_{i}} K (t - t_{i})$ , where $w_{i}$ is the synaptic efficacy of the synapse from the $i$ th input afferent and $K (t)$ is a postsynaptic potential kernel. $K (t) = 0$ for $t < 0$ and is given by $K (t) = V_{0} (e^{- \frac{t}{τ_{m}}} - e^{- \frac{t}{τ_{s}}})$ for $t > 0$ , where $τ_{m}$ and $τ_{s}$ are the membrane and synaptic time constants, respectively, and $V_{0}$ is such that the maximal value of $K (t)$ is one.

Output noise.

Output noise was added to the neuron’s membrane potential as random synaptic input $V_{o . n .} (t) = \sum_{j = 1}^{N_{noise}} g_{j} K (t - t_{j})$ , where $g_{j}$ was randomly drawn from a zero mean Gaussian distribution with SD $σ_{n}$ and $t_{j} \in (0, T)$ was randomly drawn from a uniform distribution.

Voltage reset.

After each threshold crossing the membrane potential was reset to its resting potential. Given the set of output spike times ${t_{spike}}$ , the total contribution of voltage reset to the membrane potential can be written as $V_{reset} (t) = - (V_{th} - V_{rest}) \sum_{t_{spike}} R (t - t_{spike})$ , where $V_{rest}$ and $V_{th}$ are the neuron’s resting and threshold potentials, respectively, and $R (t)$ implements the postspike voltage reset. $R (t) = 0$ for $t < 0$ and is given by $R (t) = e^{- \frac{t}{τ_{m}}}$ for $t \geq 0$ . This form ensures the voltage is reset to the resting potential immediately after an output spike.

Membrane potential.

Finally, the neuron’s membrane potential is given by $V (t) = V_{rest} + V_{syn} (t) + V_{o . n .} (t) + V_{reset} (t)$ , where $V_{reset}$ is computed given $V_{syn}$ and $V_{o . n .}$ .

Simulations of Recurrent Networks.

Memory states.

Networks were trained to implement a set of $P$ memory states, specified by $x_{i}^{μ} \in {0,1},$ $i = 1, 2, \dots, N$ , $μ = 1, 2, \dots, P$ , as stable fixed points of the noise-free dynamics. Memory states were randomly chosen i.i.d. from binary distributions with parameter $p_{exc / inh}$ according to the type of the $i$ th input afferent; i.e., $\Pr (x_{i}^{μ} = 1) = p_{exc / inh}$ and $\Pr (x_{i}^{μ} = 0) = 1 - p_{exc / inh}$ .

Initial pattern distortion.

To start the network close to a memory state $𝒙^{μ}$ , the initial state of the network, $s_{i} (t = 0)$ for $i = 1, 2, \dots, N$ , was randomly chosen according to $\Pr (s_{i} = 1) = (1 - δ) Θ (2 x_{i}^{μ} - 1) + δ \frac{p_{exc / inh}}{1 - p_{exc / inh}} Θ (- 2 x_{i}^{μ} + 1)$ , where $δ$ is the initial pattern distortion level (Fig. 5B) and $Θ (x) = 1$ for $x \geq 0$ and 0 otherwise. This procedure ensures that the mean activity levels of E and I neurons in the initial state are the same as their mean activity levels in the memory state (74).

Perceptron Learning Algorithm.

The perceptron learning algorithm (Fig. 6 and Fig. S6) learns to classify a set of $P$ labeled patterns. At learning time-step $t$ one pattern $𝒙_{t}$ with desired output $y_{t} = \pm 1$ is presented to the neuron. The output of the perceptron $s_{t}$ is given by $s_{t} = sign (𝒘_{t}^{T} 𝒙_{t} + η_{t} - 1),$ where $η_{t}$ is a Gaussian random variable with zero mean and variance ${| 𝒘_{t} |}^{2} σ_{in}^{2} + σ_{out}^{2}$ . The error signal is defined as $e_{t} = y_{t} Θ (- s_{t} y_{t})$ , where $Θ (x) = 1$ for $x > 0$ and zero otherwise. After each pattern presentation all synapses are updated. The synaptic weights of E inputs are updated according to $w_{i, t + 1} = {[(1 - ε) w_{i, t} + ρ e_{t} x_{i, t}]}_{+}$ and weights of I inputs are updated according to $w_{i, t + 1} = {[(1 - ε) w_{i, t} + ρ e_{t} x_{i, t}]}_{-}$ , where ${[x]}_{\pm} = x Θ (\pm x)$ , $ε$ is a weight decay constant, and $ρ$ is a constant learning rate. At each learning cycle ( $P$ learning time steps) all patterns are presented sequentially in a random order (randomized at each learning cycle).

Figure Parameters.

Fig. 2.

In all panels $σ_{inh} / σ_{exc} = 2$ and ${CV}_{exc} / {CV}_{inh} = \sqrt{2}$ . $σ_{inh} / σ_{exc} = 2, {CV}_{exc} / {CV}_{inh} = \sqrt{2}$ with an even split between responsive/unresponsive labels. In B and C $f_{exc} = 0.8$ .

Fig. 3.

In all panels $N = 3, 000$ , $k = 2$ , and $θ = \sqrt{2}$ (Random Patterns in Numerical Estimation of $κ_{out}^{\max}$ and $κ_{in}^{\max}$ Solutions), leading to $σ_{inh} / σ_{exc} = 2 {CV}_{exc} / {CV}_{inh} = \sqrt{2}$ , $f_{exc} = 0.8$ with an even split between responsive/unresponsive labels. Numerical results are averaged over 100 samples.

Fig. 4.

In all panels $N = 1, 000$ , $P = 1, 000$ , fraction of plus patterns $p_{out} = 0.1,$ $f_{exc} = 0.8$ , $V_{rest} = 0$ , $V_{thr} = 1$ , $τ_{m} = 30 ms,$ $τ_{s} = 10 ms$ , $T = 200 ms,$ $A = 30 Hz$ (Dynamics of LIF Neuron). Random patterns were drawn as described in Random Patterns in Numerical Estimation of $κ_{out}^{\max}$ and $κ_{in}^{\max}$ Solutions with $k = 2$ and $θ = \sqrt{2}$ . Maximal $κ_{out}$ solutions were found with $Γ = 1.5$ in units of $(V_{th} - V_{rest}) / σ_{exc}$ . No output noise was added in A–C. In D and E output noise was added with $N_{noise} = 30, 000$ and $σ_{n} = 2 / \sqrt{N_{noise}}$ (Dynamics of LIF Neuron).

Fig. 5.

In C–F $N = 2, 000$ , $P = 1, 000$ , $f_{exc} = 0.8$ , $p_{exc} = 0.1$ , $p_{inh} = 0.2$ , $Γ = 10 \sqrt{p_{exc} (1 - p_{exc})}$ in units of $(V_{th} - V_{rest}) / σ_{exc}$ . In D and F results are averaged over 10 networks and 10 patterns from each network. See Recurrent Networks with Nonlearned Inhibition for parameters of I connectivity of the nonlearned inhibition networks (gray lines). In G maximal output noise magnitude is defined as the value of $σ_{out}$ for which the stable pattern probability is 1/2. To minimize finite-size effects in simulations we used $N = 3, 000$ , $f_{exc} = 0.8,$ $p_{exc} = 0.5$ , $p_{inh} = 0.5$ , $Γ = 10 \sqrt{p_{exc} (1 - p_{exc})}$ in units of $(V_{th} - V_{rest}) / σ_{exc}$ . Stable pattern probability for each load and noise level was estimated by averaging over five networks and 20 patterns from each network.

Fig. 6.

Random patterns are binary pattern $x_{i}^{μ} \in {0,1}$ with equal probabilities and an even split of plus and minus patterns. $N = 3, 000$ , $P = 900$ . Learning algorithm parameters are $ε = 5 \cdot 10^{- 7}$ , $ρ = 0.1 / N$ , $σ_{in} = 0.1$ (Perceptron Learning Algorithm). Results are averaged over 50 samples.

Finding Maximal $𝜿_{in}$ and Maximal $𝜿_{out}$ Solutions.

Here we describe how finding the maximal $κ_{in}$ and maximal $κ_{out}$ solutions can be expressed as convex conic optimization problems. This allows us to efficiently validate the theoretical results. As noted in the main text, maximizing $κ_{in}$ is equivalent to maximizing the margin of the solution’s weight vector as is done by support vector machines (39). However, to our knowledge, the application of conic optimization tools for maximizing $κ_{out}$ is a unique contribution of our work.

Solution weight vectors, w, with input robustness $κ_{in}$ or output robustness $κ_{out}$ , satisfy the inequalities

\forall μ y^{μ} (𝒘^{T} 𝒙^{μ} - V_{th}) \geq D,

[S1]

where $D_{in} = | 𝒘 | κ_{in}$ and $D_{out} = κ_{out}$ (here we assume without loss of generality that $V_{rest} = 0$ ).

For each solution w we define effective weights, u, and effective threshold $b$ [the so-called canonical weights and threshold (39)] given by

u_{i} = Λ w_{i}

[S2]

b = Λ V_{th},

[S3]

where $Λ > 0$ is chosen such that $Λ D = 1$ (for either $D_{in}$ or $D_{out}$ ).

Together with the sign and norm constraints on the weights, u and $b$ must satisfy the linear constraints

\forall μ y^{μ} (𝒖^{T} 𝒙^{μ} - b) \geq 1 \forall i s_{i} u_{i} \geq 0 b \geq 0,

[S4]

where $s_{i} = 1$ if $w_{i}$ is excitatory and $s_{i} = - 1$ if $w_{i}$ is inhibitory, and the quadratic constraint

{| 𝒖 |}^{2} \leq b^{2} Γ^{2} / V_{th}^{2},

[S5]

which enforces the constraint $| 𝒘 | \leq Γ$ .

For the effective weights and threshold, $κ_{in}$ is given by $κ_{in} = \frac{1}{| 𝒖 |}$ and $κ_{out}$ is given by $κ_{out} = \frac{V_{th}}{b}$ . Thus, maximizing $κ_{in}$ is equivalent to minimizing $| 𝒖 |$ and maximizing $κ_{out}$ is equivalent to minimizing $b$ . We therefore define a minimization cost function $E (𝒖, b)$ that is given by

E_{in} (𝒖, b) = \frac{1}{2} 𝒖^{T} 𝒖,

[S6]

for the $κ_{in}^{\max}$ solution, and

E_{out} (𝒖, b) = b,

[S7]

for the $κ_{out}^{\max}$ solution.

To find the maximal $κ_{in}$ or maximal $κ_{out}$ solution we solve the conic program

\min_{u, b, τ} E (𝒖, b) + β τ

[S8]

in the limit of $β \to \infty$ , subject to

\forall μ y^{μ} (𝒖^{T} 𝒙^{μ} - b) \geq 1 - τ \forall i s_{i} u_{i} \geq 0 b \geq 0 τ \geq 0 b^{2} Γ^{2} / V_{th}^{2} \geq {| 𝒖 |}^{2} .

[S9]

$τ$ is a global regularization variable that ensures the existence of a solution to the optimization problem (Eqs. S8 and S9) even when the linear constraints S4 are not realizable. In practice it is sufficient to set $β$ to be a large constant (we set $β = 10^{5}$ ). If the optimal value of $τ$ is zero, the solution corresponds to the optimal perceptron solution for the classification task. If the optimal value of $τ$ is greater then zero, it indicates that the labeled patterns are not linearly separable and that there is no zero error solution to the classification task. Given that a solution with $τ = 0$ is found, the optimal weights are given by $𝒘 = V_{th} 𝒖 / b$ .

SI Capacity for Noneven Split of Plus and Minus Patterns

The capacity of a perceptron with no sign constraints on synaptic weights for classification of random patterns is a function of the fraction of plus patterns in the desired classification, $p_{out}$ (20–22), and is given by

α_{c}^{Unconst .} = {[p_{out} \int_{- \infty}^{Δ} D t {(t - Δ)}^{2} + (1 - p_{out}) \int_{Δ}^{\infty} D t {(t - Δ)}^{2}]}^{- 1},

where $D t$ is the Gaussian integration measure, $D t = \frac{e^{- \frac{t^{2}}{2}}}{\sqrt{2 π}} d t$ , and the order parameter $Δ$ is given by the solution to the equation

0 = p_{out} \int_{- \infty}^{Δ} D t (t - Δ) + (1 - p_{out}) \int_{Δ}^{\infty} D t (t - Δ) .

Fig. S1A depicts the theoretical and measured $α_{c}$ of our “constrained” perceptron as a fraction of the corresponding unconstrained capacity vs. $f_{exc}$ for two values of $p_{out}$ . Fig. S1B depicts theoretical and measured $α_{b}$ as a fraction of $α_{c}$ for two values of $p_{out}$ .

SI Effects of E and I Input Statistics

Our results depend, of course, on parameters, but in a fairly reduced way. In particular, the properties we discuss depend on the ratio of the inputs standard deviations, $λ = σ_{inh} / σ_{exc}$ , and the ratio of their coefficients of variation, $ϕ = {CV}_{exc} / {CV}_{inh}$ (Repica Theory for Sign- and Norm-Constrained Perceptron). As discussed in the main text $ϕ$ determines the value of the optimal fraction of excitatory synapses, $f_{exc}^{⋆}$ , which can be written as $f_{exc}^{⋆} = ϕ / (1 + ϕ)$ (Eq. 2 in the main text). Thus, the shape of the phase diagram changes with $ϕ$ (Fig. S2A). The parameter $λ$ has more subtle effects. We note here the main effect $λ$ has on the maximal $κ_{in}$ and maximal $κ_{out}$ solutions.

Balanced and Unbalanced Maximal $𝜿_{in}$ Solutions.

The maximal $κ_{in}$ solutions can be either balanced or unbalanced, depending on $f_{exc}, ϕ, λ$ and the value of $κ_{in}^{\max}$ (example in Fig. S2B). Importantly, for a wide range of reasonable parameters [for example, $ϕ \leq f_{exc} / (1 - f_{exc})$ and $λ \geq 1$ ] the $κ_{in}^{\max}$ solution is unbalanced for all values of $κ_{in}^{\max}$ .

Fraction of “Silent” Weights in Maximal $𝜿_{in}$ and Maximal $𝜿_{out}$ Solutions.

As noted in previous studies (25, 27), a prominent feature of “critical” solutions with sign-constraint weights, such as the maximal $κ_{in}$ and maximal $κ_{out}$ solutions, is that a finite fraction of the synapses are silent; i.e., $w_{i} = 0$ . Our theory allows us to derive the full distribution of synaptic efficacies (Distribution of synaptic weights) and calculate the fraction of silent weights for each solution. For the maximal $κ_{out}$ solutions in the unbalanced regime ( $α_{b} < α < α_{c}$ ), the fraction of E (I) silent weights is always larger (smaller) then 1/2 (Fig. S2C). However, in the balanced regime ( $α < α_{b}$ ) the qualitative behavior depends on $λ$ (Fig. S2C). Interestingly, for unbalanced maximal $κ_{in}$ solutions the fraction of silent weights is constant and equals 1/2 for both E and I inputs (Saddle-point equations for the maximal $κ$ in solution and Distribution of synaptic weights).

Tuning Properties of Cortical Neurons Suggest That in Cortex $f_{exc}^{⋆} > 0.5$ .

In cortical circuits, I neurons tend to fire with higher firing rates and are thought to be more broadly tuned than E neurons, implying, under reasonable assumptions, that both $λ$ and $ϕ$ are greater than 1, leading to $f_{exc}^{⋆} > 0.5$ .

To see this, we consider input neurons with Gaussian tuning curves to some external stimulus variable $φ \in [0,1]$ ; i.e., the mean response, $x_{i}$ , of neuron $i$ to stimulus $φ$ is given by

x_{i} = A_{i} \exp [- {(φ - φ_{i}^{pref})}^{2} / (2 δ_{i}^{2})],

[S10]

where $A_{i}$ , $φ_{i}^{pref}$ , and $δ_{i}$ characterize the response properties of the neuron. Assuming that $φ$ is distributed uniformly and, for simplicity, that $δ_{i} ≪ 1$ , the mean and variance of the neurons’ responses are given by

{\bar{x}}_{i} = \sqrt{2 π} A_{i} δ_{i},

[S11]

and

σ_{i}^{2} ≃ A_{i}^{2} δ_{i} \sqrt{π},

[S12]

where we neglect terms of order $δ_{i}^{2}$ . We now assume that $A_{i} = A_{exc}$ and $δ_{i} = δ_{exc}$ if neuron $i$ is E and that $A_{i} = A_{inh}$ and $δ_{i} = δ_{inh}$ if neuron $i$ is I. Further, we assume that I neurons respond with a higher firing rate ( $A_{inh} > A_{exc}$ ) and are more broadly tuned ( $δ_{inh} > δ_{exc})$ . In this case we have

λ = \frac{A_{inh} \sqrt{δ_{inh}}}{A_{exc} \sqrt{δ_{exc}}} > 1,

[S13]

and

ϕ = \sqrt{\frac{δ_{inh}}{δ_{exc}}} > 1.

[S14]

SI Recurrent Networks with Nonlearned Inhibition

In our basic model for an associative memory network we assume that the activity of both E and I neurons is specified in the desired memory states and that all network connections are learned. Both of these assumptions can be modified, creating new scenarios with different computational properties.

First, we assume that memory state is specified only by the activity of E neurons and that the memory is recalled when the activity of E neurons matches the memory state regardless of the activity of I neurons. The problem of learning in such a network is computationally hard since the learning needs to optimize the activity of the I neurons, using the full connectivity matrix. In our work we do not address this scenario. Instead we forgo the assumption that E and I connections onto I neurons are learned and replace them with randomly chosen connections, i.e., assume that E to I and I to I connections are not learned and random.

Choosing Random Synapses for I Neurons.

In this scenario the activity of I neurons is determined by the network dynamics. We consider random I to I and E to I weights with means $J_{II}$ and $J_{IE}$ and standard deviations $σ_{J_{II}}$ and $σ_{J_{IE}}$ . We examine the distribution of I neurons’ membrane potential, given that the activity of E neurons is held at a memory state in which $p_{out}^{exc} N$ neurons are active. When $N$ is large, this distribution is Gaussian and we assume correlations are weak. Thus, the mean activity in the network is the probability that the membrane potential is above threshold and is given by the equation

m_{I} = H (\frac{V_{th} - ⟨ V ⟩}{\sqrt{σ^{2} (V)}}),

[S15]

where $H (x) = \int_{x}^{\infty} \frac{e^{- \frac{y^{2}}{2}}}{\sqrt{2 π}} d y$ , and $⟨ V ⟩$ and $σ^{2} (V)$ are the mean and variance of the membrane potential of I neurons, respectively.

On the other hand, given the mean activity, $m_{I}$ , the mean and variance of the membrane potentials are given by

⟨ V ⟩ = N (p_{out}^{exc} g_{exc} J_{IE} - m_{I} (1 - g_{exc}) J_{II})

[S16]

σ^{2} (V) = N ([σ_{J_{IE}}^{2} + (1 - p_{out}^{exc}) J_{IE}^{2}] p_{out}^{exc} g_{exc} + [σ_{J_{II}}^{2} + (1 - m_{I}) J_{II}^{2}] m_{I} (1 - g_{exc})) .

[S17]

Together, Eqs. S15–S17 define the relations between $m_{I}$ , $J_{II}$ , $J_{IE}$ , $σ_{J_{IE}}$ , and $σ_{J_{II}}$ .

In our simulations we set $m_{I}$ and the mean and variance of the I to I connections and choose the mean and variance of the E to I connections according to the solution of Eq. S15 [when $N$ is large, $J_{IE}$ is given by $N J_{IE} ≃ (V_{th} + m_{I} N (1 - g_{exc}) J_{II}) / (g_{exc} p_{out}^{exc})$ ]. In particular, we choose an I network with binary weights in which each I neuron projects to another I neuron with probability $p_{II}$ with synaptic efficacy $j_{II} = 1 / (\sqrt{N} p_{II})$ . Each E neuron projects to an I neuron with probability $p_{IE}$ with synaptic efficacy $j_{IE}$ that ensures that the mean I activity level at the memory states is $m_{I}$ .

In this parameter regime, the I subnetwork exhibits asynchronous activity, with mean activity $m_{I}$ , at the E memory states. However, different memory states lead to different asynchronous states.

Training Set Definition.

E neurons need to learn to remain stationary at the desired memory states, given the network activity at this state. However, since the activity of the I subnetwork is not stationary at the desired memory states, the training set for learning is not well defined.

To properly define the training set, we sample $n_{sample}$ instances of the generated I activity for each memory state when the activity of the E neuron is clipped to this memory state. Sampling was performed by running the I network dynamics and recording the state of the I neurons after $T = 100 N$ time steps. We then use the sampled activity patterns together with the E memory states as an extended training set (with $P n_{sample}$ patterns) for the E neurons.

Learned Network Stability.

The nonfixed point dynamics of the I subnetwork imply that the convergence of the learning on the training set does not entail that the memory states themselves are dynamically stable, in contrast to our prior model in which I neurons learn their synaptic weights. Therefore, after training we measure the probabilities that patterns are stable. This is done by the following procedure: First, we run the network dynamics (with $σ_{out} = 0$ ) when the E neurons’ activity is clipped to the memory state, for $T_{init} = 50 N$ time steps. We then release the E neurons to evolve according to the natural network dynamics and observe whether their activity remains in the vicinity of the memory state for $T = 500 N$ time steps. In a similar way we test the basins of attraction, starting the E network from a distorted version of the memory state instead of the memory state itself.

Learning Only E to E Connections.

First, we consider the case in which I to E connections are random: Each I neuron projects to an E neuron with probability $p_{EI}$ with synaptic efficacy $j_{EI} = 1 / (\sqrt{N} p_{EI})$ . We then try to find an appropriate E to E connection, using the learning scheme described above. We find that the pattern to pattern fluctuations in the I feedback due to the variance of the I to E connections and the variance in the I network neurons’ activation are substantial and of the same order of the signal differentiating the memory states. In fact, in this scenario the parameters we consider ( $N = 2, 000$ , $P = 1, 000,$ $g_{exc} = 0.8$ , $p_{out}^{exc} = 0.15,$ $p_{II} = 1 / 2,$ $p_{I E} = 1 / 2,$ $m_{I} = 0.4$ , $p_{EI} = 1 / 2$ , $n_{sample} = 40$ ) are above the system’s memory capacity and we are unable to find appropriate E weights which implement the desired memory states for the training set. We conclude that this form of balancing I feedback is too restrictive due to the heterogeneity of I to E connections and variability of I neurons’ activity.

Learning Both E to E and I to E Connections.

In this scenario we find the maximal $κ_{out}$ solution for the extended training set described above. For the parameters used ( $N = 2, 000$ , $P = 1, 000,$ $g_{exc} = 0.8$ , $p_{out}^{exc} = 0.15,$ $p_{II} = 1 / 2,$ $p_{IE} = 1 / 2,$ $m_{I} = 0.4,$ $n_{sample} = 40$ ) we are able to find solutions that implement all of the desired memory states for the extended training set. In addition, we find that the E memory states are dynamically stable with very high probability (we did not observe any unstable pattern). For numerical results see Fig. 5 and Fig S5.

SI Replica Theory for Sign- and Norm-Constrained Perceptron

We use the replica method (75) to calculate the system’s typical properties. For the perceptron architecture the replica symmetric solution has been shown to be stable and exact (20–22).

Given a set of $P$ patterns, $𝒙^{μ},$ and desired labels $y^{μ} = \pm 1$ for $μ = 1, 2, \dots, P$ , the Gardner volume is given by

V_{G} = \int D (𝒘) \prod_{μ = 1}^{P} Θ [y^{μ} (w^{T} 𝒙^{μ} - V_{th}) - K],

[S18]

where $Θ [x]$ is the Heaviside step function and $D (𝒘)$ is an integration domain obeying the sign and norm constraint $| 𝒘 | \leq Γ$ .

We assume input pattern and labels are drawn independently from distributions with nonnegative means ${\bar{x}}_{exc (inh)}$ and SD $σ_{exc (inh)}$ . Labels are independently drawn from a binary distribution with $\Pr (y^{μ} = 1) = p_{out}$ and $\Pr (y^{μ} = - 1) = 1 - p_{out}$ .

We handle both input and output robustness criteria by using different $K$ for each case,

K_{in} = | 𝒘 | σ_{exc} κ_{in} K_{out} = V_{th} κ_{out},

[S19]

where here, $κ_{in}$ and $κ_{out}$ are dimensionless numbers representing the input robustness in units of $σ_{exc}$ and the output robustness in units of $V_{th}$ , respectively.

Further, we define the parameters

λ = \frac{σ_{inh}}{σ_{exc}}, η = \frac{{\bar{x}}_{inh}}{{\bar{x}}_{exc}},

[S20]

and

ϕ = \frac{{CV}_{exc}}{{CV}_{inh}} .

[S21]

The Order Parameters.

We calculate the mean logarithm of the Gardner volume ${⟨ {⟨ \ln V_{G} ⟩}_{x} ⟩}_{y}$ averaged over the E and I input distributions and the desired label distribution. The result of the calculation is expressing ${⟨ {⟨ \ln V_{G} ⟩}_{x} ⟩}_{y}$ as a stationary-phase integral over a free energy that is a function of several order parameters. The value of the order parameters is determined by the saddle-point equations of the free energy.

In our model the saddle-point equations are a system of six equations for the six order parameters: $q, Q, θ, Δ, B$ , and $C$ .

The order parameters $q, Q, θ, and Δ$ have a straightforward physical interpretation.

The parameter $q$ is the mean typical correlation coefficient between the $V_{PSP}$ s elicited by two different solutions to the same classification task: Given two typical solution weight vectors $𝒘^{α}$ and $𝒘^{β}$ , $q$ is given by

q = \frac{\sum_{i = 1}^{N} λ_{i}^{2} w_{i}^{α} w_{i}^{β}}{\sqrt{(\sum_{i = 1}^{N} λ_{i}^{2} w_{i}^{α 2}) (\sum_{i = 1}^{N} λ_{i}^{2} w_{i}^{β 2})}},

[S22]

where $λ_{i} = 1$ if $w_{i}$ is E and $λ_{i} = λ$ if $w_{i}$ is I.

Given a typical solution w, the physical interpretation of $Q$ and $θ$ is given by

Q = \frac{\sum_{i = 1}^{N} λ_{i}^{2} w_{i}^{2}}{\sum_{i = 1}^{N} w_{i}^{2}}

[S23]

θ = \frac{V_{th}}{σ_{exc}} {(\sum_{i = 1}^{N} λ_{i}^{2} w_{i}^{2})}^{- \frac{1}{2}} .

[S24]

The norm constraint on the weights is satisfied as long as

θ \geq \frac{V_{th}}{σ_{exc} Γ \sqrt{Q}} .

[S25]

Thus, there are two types of solutions: one in which the value of $θ$ is determined by the saddle-point equation (unbalanced solutions) and the other in which $θ$ is clipped to its lower-bound value (balanced solutions). Note that $q$ and $Q$ remain of order 1 for any scaling of $| 𝒘 |$ while $θ$ scales as $\sqrt{N}$ when $| 𝒘 |$ is of order $1 / \sqrt{N}$ and is of order $1$ when $| 𝒘 |$ is of order 1.

The physical interpretation of $Δ$ can be expressed through the relation

Δ = θ (1 - \frac{{\bar{x}}_{exc} \sum_{i = 1}^{N} η_{i} w_{i}}{V_{th}}),

[S26]

where $η_{i} = 1$ if $w_{i}$ is E and $η_{i} = η$ if $w_{i}$ is I.

Summary of Main Results.

Before describing the full saddle-point (SP) equations and their various solutions in detail, we provide a brief general summary of the results that will hopefully provide some flavor of the derivations for the interested reader.

Since $θ$ is bounded from below by $V_{th} / (σ_{exc} Γ \sqrt{Q})$ , we have two sets of SP equations which we term the balanced and the unbalanced sets. In both sets, given the free energy $F (Q, q, Δ, θ, B, C)$ , five of the SP equations are given by

\frac{\partial F}{\partial Q} = \frac{\partial F}{\partial q} = \frac{\partial F}{\partial Δ} = \frac{\partial F}{\partial B} = \frac{\partial F}{\partial C} = 0.

[S27]

The sixth equation is

\frac{\partial F}{\partial θ} = 0

[S28]

in the unbalanced set and is

θ = \frac{V_{th}}{σ_{exc} Γ \sqrt{Q}}

[S29]

in the balanced set. Importantly, we find that Eq. S28 has solutions only when $θ \sim \sqrt{N}$ , which implies $| 𝒘 | \sim 1 / \sqrt{N}$ , and Eq. S29 implies $| 𝒘 | = Γ \sim O (1)$ , justifying the naming of the two sets. The solutions to the two sets of SP equations define the range of possible values of $α$ , $κ_{in}$ , and $κ_{out}$ that permits the existence of solution weight vectors. There are a number of interesting cases that we analyze below.

We first consider the solutions of the SP equations for zero $κ_{out}$ and $κ_{in}$ . In this case the SP describes the typical solutions that dominate the Gardner volume. Since the $N$ -dim. volume of balanced solutions with $| 𝒘 | \sim 1$ is exponentially larger than the volume of unbalanced solutions with $| 𝒘 | \sim 1 / \sqrt{N}$ , we expect that balanced solutions will dominate the Gardner volume whenever they exist. Indeed, solving the two sets of SP equations we find that solutions to the balanced set exist only for $α < α_{b}$ while solutions for the unbalanced set exist only for $α_{b} < α < α_{c}$ .

Next, we examine the values of $κ_{out}$ that permit solutions to the balanced and unbalanced sets of SP equations. Importantly, we show that the unbalanced set can be solved only for $κ_{out} \propto 1 / \sqrt{N}$ . Thus, unbalanced solutions cannot have $κ_{out}$ of $O (1)$ or conversely all solutions with $κ_{out}$ of $O (1)$ are balanced.

Of particular interest are the so-called critical solutions for which $q \to 1$ . At this limit the typical correlation coefficient between the $V_{PSP}$ s elicited by two different solutions to the same classification task approaches unity, which implies that only one solution exists and the Gardner volume shrinks to zero. Thus, for a given $κ_{in}$ or $κ_{out}$ , the value of $α$ for which $q \to 1$ is the maximal load for which solutions exist. In this case, the SP describes the properties of the maximal $κ_{out}$ or $κ_{in}$ solutions.

The structure of the equations in this limit is relatively simple. First, the order parameter $Δ$ is given by the solutions to

0 = p_{out} \int_{- \infty}^{Δ + \tilde{K}} D t (t - Δ - \tilde{K}) + (1 - p_{out}) \int_{Δ - \tilde{K}}^{\infty} D t (t - Δ + \tilde{K})

[S30]

with the robustness parameter $\tilde{K}$ being ${\tilde{K}}_{in} = κ_{in} / \sqrt{Q}$ or ${\tilde{K}}_{out} = θ κ_{out}$ and the integration measure, $D t$ , is given by $D t = \frac{e^{- \frac{t^{2}}{2}}}{\sqrt{2 π}} d t$ . Second, we find a simple relation between critical loads of the constrained perceptron considered here and the critical loads of the classic unconstrained perceptron, $α^{Unconst .}$ :

α = 2 C α^{Unconst .} .

[S31]

$α^{Unconst .}$ is given by

α^{Unconst .} = {[p_{out} \int_{- \infty}^{Δ + \tilde{K}} D t {(t - Δ - \tilde{K})}^{2} + (1 - p_{out}) \int_{Δ - \tilde{K}}^{\infty} D t {(t - Δ + \tilde{K})}^{2}]}^{- 1},

[S32]

which is indeed the critical load of an unconstrained perceptron with a given margin $\tilde{K}$ (21). Finding the critical load is then reduced to solving for the order parameter $C$ . For each value of $κ_{in} > 0$ or $κ_{out} > 0$ only one set of SP equations can be solved, determining whether the maximal $κ_{in}$ or $κ_{out}$ solutions are balanced or unbalanced. By examining the range of solutions for each set we can find the value of $κ_{in}^{\max}$ and $κ_{out}^{\max}$ for any $α$ and determine that (i) the maximal $κ_{out}$ solution is balanced for $α \leq α_{b}$ and unbalanced for $α_{b} < α < α_{c}$ and (ii) for a wide range of parameters the maximal $κ_{in}$ is unbalanced for all $α < α_{c}$ . In addition, we find that for $α_{b} < α < α_{c}$ , $κ_{out}^{\max}$ is given by

κ_{out}^{\max} = \frac{σ_{exc}}{{\bar{x}}_{exc} \sqrt{N}} κ_{0},

[S33]

where $κ_{0}$ is finite and larger than zero when $α$ approaches $α_{b}$ from above and $κ_{0}$ approaches zero when $α$ approaches $α_{c}$ . The above result implies that output robustness can be increased when the tuning of the input is increased. As we discuss in the main text in the context of neuronal selectivity in purely E circuits, sparse input activity is one way to increase the input tuning. If we consider sparse binary inputs with mean activity level $s ≪ 1$ , the output robustness will be given by $κ_{0} \sqrt{(1 - s) / s N} ≃ κ_{0} / \sqrt{s N}$ .

Finally, we consider the solutions of the SP equations in the critical limit ( $q \to 1$ ) for $κ_{in} = κ_{out} = 0$ . In this limit the SP describes the capacity and balanced capacity. We note that for $κ_{out} = κ_{in} = 0$ , $Δ$ (and as a result $α_{c}^{Unconst .}$ ) is independent of all of the other order parameters, simplifying the equations. In this case, we have only two coupled SP equations (for the order parameters $B$ , $C$ , and $θ$ ), given by

{(1 - \frac{σ_{exc} B θ}{{\bar{x}}_{exc} \sqrt{2 C N}})}^{2} C = f_{exc} γ_{+} (B) + (1 - f_{exc}) γ_{-} (B ϕ)

[S34]

\frac{σ_{exc} θ}{{\bar{x}}_{exc} \sqrt{N}} = - \frac{f_{exc} γ_{+}^{'} (B) + (1 - f_{exc}) ϕ γ_{-}^{'} (B ϕ)}{\sqrt{2 C}},

[S35]

where we defined the functions

γ_{\pm} (x) = \frac{x^{2} + 1}{2} - \frac{1}{2} \int D u {(x + u)}^{2} Θ [\pm (x + u)]

[S36]

γ_{\pm}^{'} (x) = x - \int D u (x + u) Θ [\pm (x + u)] .

[S37]

For the balanced SP equations we have $θ = V_{th} / (σ_{exc} Γ \sqrt{Q})$ and for the unbalanced SP equations, Eq. S28 reduces to $B = 0$ . Finally, $α$ is given by Eq. S31.

For the unbalanced set ( $B = 0$ ) we have $γ_{\pm} (0) = \frac{1}{2} - \frac{1}{2} \int_{0}^{\infty} D u {(u)}^{2} = \frac{1}{4}$ and $γ_{\pm}^{'} (0) = \mp \int_{0}^{\infty} D u (u) = \mp \frac{1}{\sqrt{π}}$ . We immediately get $C = \frac{1}{4}$ and

θ = \sqrt{\frac{2 N}{π}} [f_{exc} / {CV}_{exc} - (1 - f_{exc}) / {CV}_{inh}] .

[S38]

This solution suggests that at capacity the solutions are unbalanced $(θ \sim \sqrt{N} \Rightarrow | 𝒘 | \sim 1 / \sqrt{N})$ and that capacity as a function of $f_{exc}$ is constant with

α_{c} = \frac{1}{2} α_{c}^{Unconst .} .

[S39]

However, this solution is valid only as long as $θ > V_{th} / (σ_{exc} Γ \sqrt{Q})$ , which is true only as long as

[f_{exc} / {CV}_{exc} - (1 - f_{exc}) / {CV}_{inh}] > 0,

[S40]

which implies

f_{exc} > f_{exc}^{⋆} = \frac{{CV}_{exc}}{{CV}_{exc} + {CV}_{inh}} .

[S41]

For the solution for the balanced set $(θ = V_{th} / (σ_{exc} Γ \sqrt{Q}))$ terms with $θ / \sqrt{N}$ can be neglected and we have the equation

0 = f_{exc} γ_{+}^{'} (B) + (1 - f_{exc}) ϕ γ_{-}^{'} (B ϕ)

[S42]

for the order parameter $B$ . $C$ and $Q$ are given by

C = f_{exc} γ_{+} (B) + (1 - f_{exc}) γ_{-} (B ϕ)

[S43]

Q = \frac{f_{exc} γ_{+} (B) + (1 - f_{exc}) γ_{-} (B ϕ)}{f_{exc} γ_{+} (B) + (\frac{1 - f_{exc}}{λ^{2}}) γ_{-} (B ϕ)} .

[S44]

This solution gives us the balanced capacity line

α_{b} (f_{exc}) = 2 [f_{exc} γ_{+} (B) + (1 - f_{exc}) γ_{-} (B ϕ)] {α^{Uncont .}}_{(p_{out})},

[S45]

where $B$ is given by the solution of Eq. S42 and ${α^{Uncont .}}_{(p_{out})}$ is given by Eqs. S32 and S30 with $\tilde{K} = 0$ .

Detailed Solutions of the SP Equations.

Below we provide the SP equations and their solutions under various conditions. We also provide the derived form of the distributions of synaptic weights for critical solutions.

The general SP equations.

We define the following:

F_{h} = p_{out} \int D t \ln H [- X_{+} (t)] + (1 - p_{out}) \int D t \ln H [X_{-} (t)]

[S46]

D t = \frac{e^{- \frac{t^{2}}{2}}}{\sqrt{2 π}} d t

[S47]

H (x) = \int_{x}^{\infty} D t

[S48]

X_{\pm} (t) = \frac{\sqrt{q} t - Δ \mp \tilde{K}}{\sqrt{1 - q}}

[S49]

{\tilde{K}}_{in} = κ_{in} / \sqrt{Q}

[S50]

{\tilde{K}}_{out} = θ κ_{out}

[S51]

ϕ_{+} = 1, ϕ_{-} = ϕ

[S52]

λ_{+} = 1, λ_{-} = λ

[S53]

f_{+} = f_{exc}, f_{-} = 1 - f_{exc}

[S54]

α = \frac{P}{N}

[S55]

\tilde{θ} = \frac{σ_{exc} (θ - Δ)}{{\bar{x}}_{exc} \sqrt{N}}

[S56]

Z_{\pm} = 2 C {[2 C - \sqrt{2 C} B \tilde{θ} + (1 - q) \times (1 - 2 α Q \frac{\partial F_{h}}{\partial Q} (1 - \frac{Q}{λ_{\pm}^{2}}))]}^{- 1}

[S57]

Φ_{\pm} (x, z, q) = \frac{z (x^{2} + 1)}{2} + \frac{1 - q}{2} \times [1 + \int D u J_{1} (\pm \frac{\sqrt{z} (x + u)}{\sqrt{(1 - q)}})]

[S58]

Φ_{\pm}^{'} (x, z, q) = z x \pm \sqrt{z (1 - q)} \int D u J_{2} (\pm \frac{\sqrt{z} (x + u)}{\sqrt{(1 - q)}})

[S59]

J_{1} (x) = \frac{H' (x)}{H (x)} x

[S60]

J_{2} (x) = \frac{H' (x)}{H (x)} .

[S61]

The SP equations are given by

C / Q = \sum_{\pm} \frac{f_{\pm}}{λ_{\pm}^{2}} Z_{\pm} Φ_{\pm} (B ϕ, Z_{\pm}, q)

[S62]

C = \sum_{\pm} f_{\pm} Z_{\pm} Φ_{\pm} (B ϕ_{\pm}, Z_{\pm}, q)

[S63]

\sqrt{2 C} \tilde{θ} = - \sum_{\pm} f_{\pm} ϕ_{\pm} Φ_{\pm}^{'} (B ϕ_{\pm}, Z_{\pm}, q)

[S64]

α = - \frac{C}{{(1 - q)}^{2}} {(\frac{\partial F_{h}}{\partial q})}^{- 1}

[S65]

\frac{σ_{exc} B}{{\bar{x}}_{exc} \sqrt{2 C N}} = - \frac{1}{2 (1 - q)} \frac{\partial F_{h}}{\partial Δ} {(\frac{\partial F_{h}}{\partial q})}^{- 1}

[S66]

\frac{σ_{exc} B}{{\bar{x}}_{exc} \sqrt{2 C N}} = \frac{1 - q}{2 θ C} + \frac{1}{2 (1 - q)} \frac{\partial F_{h}}{\partial θ} {(\frac{\partial F_{h}}{\partial q})}^{- 1} 𝑶𝑹 θ = \frac{V_{th}}{σ_{exc} Γ \sqrt{Q}} .

[S67]

It is important to note the relation between $θ$ and $\tilde{θ}$ . $\tilde{θ}$ is of the order of $θ / \sqrt{N}$ ( $Δ$ remains of order 1 under all conditions). Thus, for unbalanced solutions $θ \sim \sqrt{N}$ and $\tilde{θ}$ is of $O (1)$ while for balanced solutions $θ$ is of $O (1)$ and $\tilde{θ}$ is of $O (1 / \sqrt{N})$ and can be neglected.

SP equations for typical solutions.

For typical solutions we solve the SP equations for $κ_{in} = 0$ or $κ_{out} = 0$ , leading to $K = 0$ . In this case we have $\frac{\partial F_{h}}{\partial θ} = \frac{\partial F_{h}}{\partial Q} = 0$ and thus

Z_{\pm} = 2 C {[2 C - \sqrt{2 C} B \tilde{θ} + (1 - q)]}^{- 1}

[S68]

and the SP equation for $θ$ is

\frac{σ_{exc} \sqrt{2 C} B θ}{{\bar{x}}_{exc} \sqrt{N}} = 1 - q 𝑶𝑹 θ = \frac{V_{th}}{σ_{exc} Γ \sqrt{Q}} .

[S69]

We now can solve the SP equations for the unbalanced case with

Z_{\pm} = 1 and θ > \frac{V_{th}}{σ_{exc} Γ \sqrt{Q}}, \tilde{θ} > 0

[S70]

and for the balanced case with

Z_{\pm} = 2 C {[2 C + (1 - q)]}^{- 1} and θ = \frac{V_{th}}{σ_{exc} Γ \sqrt{Q}}, \tilde{θ} = 0.

[S71]

We find that for $α < α_{b}$ a solution exists only for equations of the balanced case while for $α > α_{b}$ a solution exists for the equations for the unbalanced case. Thus, typical solutions are balanced below $α_{b}$ and unbalanced above it. The norm of the weight and the IB depicted in Fig. 2 B and C are given by

| 𝒘 | = \frac{1}{\sqrt{Q} θ}

[S72]

IB = \frac{\sum_{\pm} f_{\pm} ϕ_{\pm} Φ_{\pm}^{'} (B ϕ_{\pm}, Z, q)}{\sum_{\pm} \pm f_{\pm} ϕ_{\pm} Φ_{\pm}^{'} (B ϕ_{\pm}, Z, q)} .

[S73]

Solutions with significant $𝜿_{𝐨𝐮𝐭}$ are balanced.

In this section we show that all unbalanced solutions have output robustness of order $1 / \sqrt{N}$ and, equivalently, solutions with $κ_{out}$ of order 1 are balanced.

Theorem.

All unbalanced solutions have output robustness of the order of $1 / \sqrt{N} .$

Proof.

In the case of output robustness we have $\tilde{K} = {\tilde{K}}_{out} = θ κ_{out}$ and thus $\frac{\partial F_{h}}{\partial Q} = 0$ . We are looking for unbalanced solutions $(\tilde{θ} > 0, θ \sim O (\sqrt{N}))$ so we have the equations

\frac{σ_{exc} B}{{\bar{x}}_{exc} \sqrt{2 C N}} = - \frac{1}{2 (1 - q)} \frac{\partial F_{h}}{\partial Δ} {(\frac{\partial F_{h}}{\partial q})}^{- 1}

[S74]

\frac{σ_{exc} B}{{\bar{x}}_{exc} \sqrt{2 C N}} = \frac{1 - q}{2 θ C} + \frac{1}{2 (1 - q)} \frac{\partial F_{h}}{\partial θ} {(\frac{\partial F_{h}}{\partial q})}^{- 1} .

[S75]

Both equations must be satisfied and therefore we have (using Eqs. S65, S74, and S75)

0 = \frac{\partial F_{h}}{\partial Δ} + \frac{\partial F_{h}}{\partial θ} - \frac{1}{α θ} .

[S76]

Performing the derivatives, we get

0 = (κ_{out} + 1) p_{out} \int D t J_{2} (- X_{+}) + (κ_{out} - 1) (1 - p_{out}) \int D t J_{2} (X_{-}) - \frac{\sqrt{1 - q}}{α θ} .

[S77]

Now, we use Eq. S74, leading to

p_{out} \int D t J_{2} (- X_{+}) = (1 - p_{out}) \int D t J_{2} (X_{-}) - M / \sqrt{N},

[S78]

where we defined $M$ as

M = \frac{\sqrt{2} B σ_{exc} \sqrt{1 - q}}{\sqrt{C} {\bar{x}}_{exc}} \frac{\partial F_{h}}{\partial q},

[S79]

which remains of $O (1)$ . Thus, we are left with

0 = 2 κ_{out} (1 - p_{out}) \int D t J_{2} (X_{-}) - (κ_{out} + 1) \frac{M}{\sqrt{N}} - \frac{\sqrt{1 - q}}{α θ} .

[S80]

Note that $J_{2} (x) < 0$ and the first term is negative (nonzero). The other two terms scale as $1 / \sqrt{N}$ and therefore the equation can be satisfied only if $κ = \frac{κ_{0}}{\sqrt{N}}$ .

SP equations for critical solutions.

To find the capacity, balanced capacity, and solutions with maximal output and input robustness we consider the limit $q \to 1$ .

We define

G_{Q} = lim_{q \to 1} \frac{Q}{(1 - q)} {(\frac{\partial F_{h}}{\partial q})}^{- 1} \frac{\partial F_{h}}{\partial Q},

[S81]

and thus in this limit $Z_{\pm}$ is given by

Z_{\pm} = {[1 - \frac{B \tilde{θ}}{\sqrt{2 C}} + (1 - \frac{Q}{λ_{\pm}^{2}}) G_{Q}]}^{- 1} .

[S82]

In addition,

lim_{q \to 1} Φ_{\pm} (x, z, q) = z γ_{\pm} (x)

[S83]

γ_{\pm} (x) = \frac{x^{2} + 1}{2} - \frac{1}{2} \int D u {(x + u)}^{2} Θ [\pm (x + u)]

[S84]

lim_{q \to 1} Φ_{\pm}^{'} (x, z, q) = z γ_{\pm}^{'} (x)

[S85]

γ_{\pm}^{'} (x) = x - \int D u (x + u) Θ [\pm (x + u)],

[S86]

and, in the $q \to 1$ limit we have

(1 - q) \frac{\partial F_{h}}{\partial Δ} = M (Δ, \tilde{K})

[S87]

M (Δ, \tilde{K}) = p_{out} \int_{- \infty}^{Δ + \tilde{K}} D t (t - Δ - \tilde{K}) + (1 - p_{out}) \int_{Δ - \tilde{K}}^{\infty} D t (t - Δ + \tilde{K})

[S88]

{(1 - q)}^{2} \frac{\partial F_{h}}{\partial q} = - \frac{1}{2} α^{Unconst .} (Δ, \tilde{K})

[S89]

α^{Unconst .} (Δ, \tilde{K}) = {[p_{out} \int_{- \infty}^{Δ + \tilde{K}} D t {(t - Δ - \tilde{K})}^{2} + (1 - p_{out}) \int_{Δ - \tilde{K}}^{\infty} D t {(t - Δ + \tilde{K})}^{2}]}^{- 1} .

[S90]

We now write the final form of the SP equations for critical solutions:

C = \sum_{\pm} f_{\pm} Z_{\pm}^{2} γ_{\pm} (B ϕ_{\pm})

[S91]

Q = \frac{\sum_{\pm} f_{\pm} Z_{\pm}^{2} γ_{\pm} (B ϕ_{\pm})}{\sum_{\pm} \frac{f_{\pm}}{λ_{\pm}^{2}} Z_{\pm}^{2} γ_{\pm} (B ϕ_{\pm})}

[S92]

\tilde{θ} = - \frac{\sum_{\pm} f_{\pm} ϕ_{\pm} Z_{\pm} γ_{\pm}^{'} (B ϕ_{\pm})}{\sqrt{2 \sum_{\pm} f_{\pm} Z_{\pm}^{2} γ_{\pm} (B ϕ_{\pm})}}

[S93]

0 = M (Δ, \tilde{K})

[S94]

\frac{B}{\sqrt{2 C N}} = - \frac{{\bar{x}}_{ex}}{σ_{ex}} 2 C α^{Unconst .} (Δ, \tilde{K}) lim_{q \to 1} (1 - q) \frac{\partial F_{h}}{\partial θ} 𝑶𝑹 θ = \frac{V_{th}}{σ_{exc} Γ \sqrt{Q}} .

[S95]

Finally $α$ is given by

α = 2 C α^{Unconst .} (Δ, \tilde{K}) .

[S96]

Capacity and balanced capacity.

The capacity is given for $δ = 0$ and $κ = 0$ . In this case both $\frac{\partial F_{h}}{\partial θ}$ and $\frac{\partial F_{h}}{\partial Q}$ are zero and Eq. S95 has two possible solutions:

B = 0, θ > \frac{V_{th}}{σ_{exc} Γ \sqrt{Q}},

[S97]

for unbalanced solutions or

θ = \frac{V_{th}}{σ_{exc} Γ \sqrt{Q}}, \tilde{θ} = 0,

[S98]

for balanced solutions. In both cases we have $Z_{\pm} = 1$ .

Unbalanced solution.

The SP equations become

Q = \frac{1}{\sum_{\pm} \frac{1}{λ_{\pm}^{2}} f_{\pm}}

[S99]

\tilde{θ} = \frac{1}{\sqrt{π}} \sum_{\pm} \pm f_{\pm} ϕ_{\pm}

[S100]

C = \frac{1}{4}

[S101]

and the capacity is given by

α_{c} = \frac{1}{2} α^{Unconst .} (Δ, 0),

[S102]

where $Δ$ is given by $M (Δ, 0) = 0$ .

This solution is valid only when $θ$ is larger than its $O (1)$ lower bound, which is guarantied in the large $N$ limit as long as $\tilde{θ} > 0$ . Using Eq. S100, this entails that

f_{exc} > f_{exc}^{⋆}

[S103]

with

f_{exc}^{⋆} = \frac{ϕ}{1 + ϕ}

[S104]

or conversely $ϕ < \frac{f_{exc}}{1 - f_{exc}} .$

Balanced solution.

In this solution we have $θ = \frac{V_{th}}{σ_{exc} Γ \sqrt{Q}}, \tilde{θ} = 0$ .

$B$ is given by the solution to

\sum_{\pm} f_{\pm} ϕ_{\pm} γ_{\pm}^{'} (B ϕ_{\pm}) = 0

[S105]

and we have

Q = \frac{\sum_{\pm} f_{\pm} γ_{\pm} (B ϕ_{\pm})}{\sum_{\pm} \frac{1}{λ_{\pm}^{2}} f_{\pm} γ_{\pm} (B ϕ_{\pm})}

[S106]

C = \sum_{\pm} f_{\pm} γ_{\pm} (B ϕ_{\pm})

[S107]

and

α_{b} = 2 C α^{Unconst .} (Δ, 0),

[S108]

where $Δ$ is given by $M (Δ, 0) = 0$ .

This gives the balanced capacity line. For $f_{exc} < f_{exc}^{⋆}$ this is the capacity line as well. Thus, for $f_{exc} < f_{exc}^{⋆}$ , at capacity the solution is balanced.

Coexistence of balanced and unbalanced solutions below the balanced capacity line.

To show that unbalanced solutions coexist with balanced solutions for any $α < α_{b}$ , we calculate the capacity of unbalanced solutions with a given norm. This can be done by solving Eqs. S91–S94 while imposing the condition $| 𝒘 | = \frac{V_{thr} W}{\sqrt{N} σ_{exc}}$ through the SP equation of $θ$ :

θ = \frac{\sqrt{N}}{W \sqrt{Q}} .

[S109]

We therefore have

\tilde{θ} = \frac{σ_{exc} (θ - Δ)}{{\bar{x}}_{exc} \sqrt{N}} ≃ \frac{σ_{exc}}{{\bar{x}}_{exc} W \sqrt{Q}} = \frac{1}{\tilde{W} \sqrt{Q}} .

[S110]

We are interested in the capacity and therefore we take $K = 0$ . As a result we have

Z_{\pm} = {[1 - \frac{B}{\sqrt{2 C Q {\tilde{W}}^{2}}}]}^{- 1}

[S111]

and the SP equations become

\frac{1}{\tilde{W}} = - \frac{\sum_{\pm} f_{\pm} ϕ_{\pm} γ_{\pm}^{'} (B ϕ_{\pm})}{\sqrt{2 \sum_{\pm} \frac{f_{\pm}}{λ_{\pm}^{2}} γ_{\pm} (B ϕ_{\pm})}}

[S112]

C = {[1 - \frac{B}{\tilde{W} \sqrt{2 C Q}}]}^{- 2} \sum_{\pm} f_{\pm} γ_{\pm} (B ϕ_{\pm})

[S113]

0 = M (Δ, 0),

[S114]

where $Q$ is given by

Q = \frac{\sum_{\pm} f_{\pm} γ_{\pm} (B ϕ_{\pm})}{\sum_{\pm} \frac{f_{\pm}}{λ_{\pm}^{2}} γ_{\pm} (B ϕ_{\pm})} .

[S115]

Given the value of $B$ and $Q$ the equation for $C$ can be solved for $\sqrt{C}$ and we get

\sqrt{C} = \frac{B}{\sqrt{2 Q} \tilde{W}} + \sqrt{\sum_{\pm} f_{\pm} γ_{\pm} (B ϕ_{\pm}) - {(\frac{B}{\sqrt{2 Q} \tilde{W}})}^{2}} .

[S116]

This solution is valid as long as $\sqrt{C} \geq 0$ . The conditional capacity $α_{c} (\tilde{W})$ is then given by

α_{c} (\tilde{W}) = 2 C α^{Unconst .} (Δ, 0) .

[S117]

It is easy to see that for $\tilde{W} \to \infty$ , the SP equations converge to the equations of the balanced capacity and thus $α_{c} (\tilde{W})$ approaches $α_{b}$ . In addition, we find that for $f_{exc} < f_{exc}^{⋆}$ , $α_{c} (\tilde{W})$ is a monotonically increasing function of $\tilde{W}$ . Another way to interpret this result is to “invert the function” and ask, What is the minimal value of $\tilde{W}$ that permits solutions given $α$ ? Our result implies that strictly below $α_{b}$ the minimal value of $\tilde{W}$ that permits solutions is of $O (1)$ [i.e., $| 𝒘 |$ of $O (1 / \sqrt{N})$ ] and unbalanced solutions exist. The minimal $\tilde{W}$ diverges as $α$ approaches $α_{b}$ and hence the solution at $α_{b}$ is balanced [ $| 𝒘 |$ of $O (1)$ ].

SP equations for the maximal $𝜿_{𝐢𝐧}$ solution.

In this case we have $\tilde{K} = {\tilde{K}}_{in} = \frac{δ}{\sqrt{Q}}$ and therefore $\frac{\partial F_{h}}{\partial θ} = 0$ . For unbalanced solutions we have

B = 0, θ > \frac{V_{th}}{σ_{exc} Γ \sqrt{Q}}, \tilde{θ} > 0

[S118]

and for balanced solutions we have

θ = \frac{V_{th}}{σ_{exc} Γ \sqrt{Q}}, \tilde{θ} = 0.

[S119]

In both cases $Z_{\pm}$ is given by

Z_{\pm} = {[1 + (1 - \frac{Q}{λ_{\pm}^{2}}) G_{Q}]}^{- 1}

[S120]

with

G_{Q} = κ_{in} / \sqrt{Q} α^{Unconst .} (Δ, κ_{in} / \sqrt{Q}) \times [p_{out} \int_{- \infty}^{Δ + \frac{κ_{in}}{\sqrt{Q}}} D t (t - Δ - \frac{κ_{in}}{\sqrt{Q}}) - (1 - p_{out}) \int_{Δ - \frac{κ_{in}}{\sqrt{Q}}}^{\infty} D t (t - Δ + \frac{κ_{in}}{\sqrt{Q}})] .

[S121]

Unbalanced solution.

In this case we have equations for $Δ$ and $Q$ :

M (Δ, κ_{in} / \sqrt{Q}) = 0

[S122]

Q = \frac{\sum_{\pm} f_{\pm} Z_{\pm}^{2}}{\sum_{\pm} \frac{1}{λ_{\pm}^{2}} f_{\pm} Z_{\pm}^{2}} .

[S123]

We then have

\tilde{θ} = \frac{\frac{1}{\sqrt{2 π}} \sum_{\pm} \pm f_{\pm} ϕ_{\pm} Z_{\pm}}{\sqrt{\frac{1}{2} \sum_{\pm} f_{\pm} Z_{\pm}^{2}}}

[S124]

C = \frac{1}{4} \sum_{\pm} f_{\pm} Z_{\pm}^{2}

[S125]

and

α = 2 C α^{Unconst .} (Δ, κ_{in} / \sqrt{Q}) .

[S126]

Balanced solution.

In this case we have equations for $Δ$ , $B$ , and $Q$ :

M (Δ, κ_{in} / \sqrt{Q}) = 0

[S127]

Q = \frac{\sum_{\pm} f_{\pm} γ_{\pm} (B ϕ_{\pm}) Z_{\pm}^{2}}{\sum_{\pm} \frac{1}{λ_{\pm}^{2}} f_{\pm} γ_{\pm} (B ϕ_{\pm}) Z_{\pm}^{2}}

[S128]

0 = \sum_{\pm} f_{\pm} ϕ_{\pm} γ_{\pm}^{'} (B ϕ_{\pm}) Z_{\pm} .

[S129]

The equations for $G_{Q}$ (Eq. S121) and $α$ (Eq. S126) remain the same; however, $C$ is given by

C = \sum_{\pm} f_{\pm} γ_{\pm} (B ϕ_{\pm}) Z_{\pm}^{2} .

[S130]

Transition between balanced and unbalanced solutions.

Transition points between balanced and unbalanced solutions depend on the value of $ϕ,$ $λ$ , and $f_{exc}$ . Transition points are points in which both $B = 0$ and $\tilde{θ} = 0$ . Thus, we have

ϕ^{⋆} = \frac{f_{exc} Z_{+}}{(1 - f_{exc}) Z_{-}},

[S131]

where $Q$ and $Δ$ are given by Eqs. S123 and S122. Thus, $ϕ^{⋆}$ is a function of $κ_{in}$ and $λ$ . Solutions are balanced for $ϕ > ϕ^{⋆}$ and unbalanced for $ϕ < ϕ^{⋆}$ (Fig. S2B).

SP equations for the maximal $𝜿_{𝐨𝐮𝐭}$ solution.

Unbalanced solution.

This solution is valid for $α > α_{b}$ , $f_{exc} > f_{exc}^{⋆}$ . We look for a solution with $\tilde{θ} > 0$ and thus $θ$ must scale as $\sqrt{N}$ .

In this case $\tilde{K} = θ κ$ and so $\frac{\partial F_{h}}{\partial Q} = 0$ and $Z_{\pm} = {[1 - \frac{B \tilde{θ}}{\sqrt{2 C}}]}^{- 1}$ .We then have

Q = \frac{\sum_{\pm} f_{\pm} γ_{\pm} (B ϕ_{\pm})}{\sum_{\pm} \frac{1}{λ_{\pm}^{2}} f_{\pm} γ_{\pm} (B ϕ_{\pm})}

[S132]

and we are left with equations to solve for $\tilde{θ},$ $B$ , and $Δ$ :

\tilde{θ} = - \frac{\sum_{\pm} f_{\pm} ϕ_{\pm} γ_{\pm}^{'} (B ϕ_{\pm})}{\sqrt{2 \sum_{\pm} f_{\pm} γ_{\pm} (B ϕ_{\pm})}}

[S133]

M (Δ, \tilde{K}) = 0

[S134]

and

\frac{B}{\sqrt{2 C N}} = - \frac{{\bar{x}}_{ex}}{σ_{ex}} κ α^{Unconst .} (Δ, \tilde{K}) [f \int_{- \infty}^{Δ + \tilde{K}} D t (t - Δ - \tilde{K}) - (1 - f) \int_{Δ - \tilde{K}}^{\infty} D t (t - Δ + \tilde{K})] .

[S135]

There is a solution only if

κ = \frac{σ_{exc}}{{\bar{x}}_{exc} \sqrt{N}} κ_{0}, θ ≃ \sqrt{N} \frac{{\bar{x}}_{ex}}{σ_{ex}} \tilde{θ}

[S136]

so we have $\tilde{K} = \tilde{θ} κ_{0}$ and

\frac{B}{\sqrt{2 C}} = - κ_{0} α^{Unconst .} (Δ, \tilde{θ} κ_{0}) \times [f \int_{- \infty}^{Δ + \tilde{θ} κ_{0}} D t (t - Δ - \tilde{θ} κ_{0}) - (1 - f) \int_{Δ - \tilde{θ} κ_{0}}^{\infty} D t (t - Δ + \tilde{θ} κ_{0})] .

[S137]

Finally, we have $C = Z^{2} \sum_{\pm} f_{\pm} γ_{\pm} (B ϕ_{\pm}),$ from which we can isolate $C$ to have

C = [\sum_{\pm} f_{\pm} γ_{\pm} (B ϕ_{\pm})] {[1 - \frac{B \sum_{\pm} f_{\pm} ϕ_{\pm} γ_{\pm}^{'} (B ϕ_{\pm})}{2 \sum_{\pm} f_{\pm} γ_{\pm} (B ϕ_{\pm})}]}^{2} .

[S138]

$α$ is given as before, $α = 2 C α^{Unconst .} (Δ, \tilde{θ} κ_{0})$ .

The equations given in this section are equivalent to the ones derived in Eq. S27.

Balanced solution.

We look for balanced solutions with $θ = \frac{V_{th}}{σ_{exc} Γ \sqrt{Q}}, \tilde{θ} = 0$ . The SP equations in this case are given by the same equations as the balanced solution described in SP equations for the maximal $κ$ in solution with $κ_{out} V_{th} / σ_{exc} Γ$ replacing $κ_{in}$ .

Distribution of Synaptic Weights.

We derive the mean distribution of synaptic weights for critical solutions ( $q \to 1$ )

P_{\pm} (w) = H (\mp B ϕ_{\pm}) δ (w) + \sqrt{\frac{θ^{2} N λ_{\pm}^{2}}{2 π σ_{w \pm}^{2}}} \exp \times [- \frac{{(θ \sqrt{N} λ_{\pm} w + B ϕ_{\pm} σ_{w \pm})}^{2}}{2 σ_{w \pm}^{2}}],

[S139]

with $σ_{w \pm} = \frac{Z_{\pm}}{\sqrt{2 C}}$ , where $P_{+}$ and $P_{-}$ denote the probability densities for E and I synaptic weights, respectively, $δ (x)$ is the Dirac delta function, and weights are given in units of $V_{th} / σ_{exc}$ . The fraction of silent synapses is given by $H (- B)$ for E synapses and by $H (B ϕ)$ for I synapses.

Discussion

The results we have presented come from imposing a set of fundamental biological constraints: fixed-sign synaptic weights, nonnegative afferent activities, a positive firing threshold (relative to the resting potential), and both input and output forms of noise. Amit et al. (23) studied the maximal margin solution for the sign-constrained perceptron and showed that it has half the capacity of the unconstrained perceptron. However, this previous work considered afferent activities that were centered around zero and a neuron with zero firing threshold, features that preclude the presence of the behavior exhibited by the more biologically constrained model studied here. Chapeton et al. (27) studied perceptron learning with sign-constrained weights and a preassigned level of robustness, but considered only solutions in the unbalanced regime which, as we have shown, are extremely sensitive to output noise.

Learning in neural circuits involves a trade-off between exhausting the system’s capacity for implementing complex input–output functions on the one hand and ensuring good generalization properties on the other. A well-known approach in machine learning has been to search for solutions that fit the training examples while maximizing the distance of samples from the decision surface, a strategy known as maximizing the margin (21, 23, 39). The margin being maximized in this case corresponds, in our framework, to $κ_{in}$ . Work in computational neuroscience has implicitly optimized a robustness parameter equivalent to our $κ_{out}$ (25, 27). To our knowledge, the two approaches have not been distinguished before or shown to result in solutions with dramatically different noise sensitivities. In particular, over a wide parameter range, we have shown that maximizing $κ_{out}$ leads to a balanced solution with minimal sensitivity to output noise and robustness to input noise that is almost as good as that of the maximal margin solution, with only a modest trade-off in capacity. On the other hand, maximizing the margin ( $κ_{in}$ ) often leads to unbalanced solutions with extreme sensitivity to output noise.

The perceptron has long been considered a model of cerebellar learning and computation (67, 68). More recently, Brunel et al. (25) investigated the capacity and robustness of a perceptron model of a cerebellar Purkinje cell, taking all weights to be E. In view of the analysis presented here, balanced solutions are not possible in this case ( $f_{exc} = 1$ ), and solutions that maximize either input-noise or output-noise robustness both have $κ_{out} \propto 1 / \sqrt{N}$ . These two types of solutions differ in their weight distributions, with experimentally testable consequences for the predicted circuit structure [SI $κ_{out}^{\max}$ $and$ $κ_{in}^{\max}$ Solutions in Purely E Networks and Fig. S2C; Brunel et al. (25) considered only solutions that maximize $κ_{out}$ ]. Output robustness of the unbalanced solutions can be increased by making the input activity patterns sparse. Denoting by $s$ the mean fraction of active neurons in the input, maximum output robustness scales as $κ_{out} \sim 1 / \sqrt{N s}$ (Fig. 3B and SI Replica Theory for Sign- and Norm-Constrained Perceptron). Thus, the high sparsity in input activation (granule cell activity) of the cerebellum relative to the modest sparsity in the neocortex is consistent with the former being dominated by E modifiable synapses.

Interestingly, our results suggest an optimal ratio of E to I synapses. Capacity in the balanced regime is optimal when $f_{exc} = f_{exc}^{⋆}$ , with $f_{exc}^{⋆}$ determined by the CVs (with respect to stimulus) of the E and I inputs (Eq. 2). Thus, optimality predicts a simple relation between the fraction of E and I inputs and their degree of tuning. Estimating the CVs from existing data is difficult, but it would be interesting to check whether input statistics and connectivity ratios in different brain areas are consistent with this prediction. The commonly observed value in cortex, $f_{exc} ≃ 0.8$ , would be optimal for input statistics with ${CV}_{exc} / {CV}_{inh} ≃ 4$ . In general, we expect that ${CV}_{exc} / {CV}_{inh} > 1$ , which implies that $f_{exc}^{⋆} > 1 / 2$ .

For most of our work, we assumed that I neurons learn to represent specific sensory and long-term memory information, the same as the E ones, and that all synaptic pathways are learned using similar learning rules. While plasticity in both E and I pathways has been observed (57–59, 61, 63, 64, 69), accumulating experimental evidence indicates a high degree of cell-type and synaptic-type specificity of the plasticity rules. In addition, synaptic plasticity is under tight control of neuromodulatory systems. At present, it is unclear how to interpret our learning rules in terms of concrete experimentally observed synaptic plasticity. Other functional models of neural learning assume learning only within the E population with inhibition acting as a global stabilizing force. In the case of sensory processing, our approach is consistent with the observation of a similar stimulus tuning of E and I postsynaptic currents in many cortical sensory areas. The role of I neurons in memory representations is less known (but see ref. 70). Importantly, we have shown that our main results are valid also in the case in which I neurons do not explicitly participate in the coding of the memories. Interestingly, our work suggests that even if I neurons are only passive observers during learning processes, learning of I synapses onto E cells can amplify the memory stability of the system against fluctuations in the I feedback. Given the diversity of I cell types it is likely that in the real circuits inhibition plays multiple roles, including both conveying information and providing stability.

Several previous models of associative memory have incorporated biological constraints on the sign of the synapses, Dale’s law, assuming variants of Hebbian plasticity in the E to E synapses (14, 48, 51–54). The capacity of the these Hebbian models is relatively poor, and their basins of attraction are small, except at extremely sparse activity levels. In contrast, our model applies a more powerful learning rule that, while keeping the sign constraints on the synapses, exhibits significantly superior performance: with high capacity even for moderate sparsity levels, large basins of attraction, and high robustness to output noise.

From a dynamical systems perspective, the associative memory networks we construct exhibit unusual properties. In most associative memory network models large basins of attractions endow the memory state with robustness against stochasticity in the dynamics (i.e., output noise). Here, we found that, for the same set of fixed-point memories, the synaptic weights with the largest possible basins (the unbalanced solutions with maximal $κ_{in}$ ) are very sensitive to even mild levels of stochasticity, whereas the balanced synaptic weights with somewhat reduced basins have substantially increased output noise robustness.

At the network level, as at the single-neuron level, imposing basic features of neural circuitry—positive inputs, bounded synapses of fixed sign, a positive firing threshold, and sources of noise—forces neural circuits into the balanced regime. A recent class of models showing computational benefits of balanced inputs uses extremely strong synapses, which are outside the range we have discussed (16). These models are stabilized by instantaneous transmission of signals between neurons which are not required in the range of synaptic strength we consider.

Previous models of balanced networks have highlighted the ability of networks with strong E and I recurrent synapses to settle into a state in which the total input is dynamically balanced without special tuning of the synaptic strengths. Such a state is characterized by a high degree of intrinsically generated spatiotemporal variability (7). Mean population activities respond fast and in a linear fashion to external inputs. Typically, these networks lack the population-level nonlinearity required to generate multiple attractors. In contrast, we have explored the capacity of the balanced network to support multiple stable fixed points by tuning the synaptic strengths through appropriate learning. We note that fully understanding and characterizing the dynamic properties of these networks and their relation to previously studied models remains an important challenge. Despite the dynamic and functional differences in the two classes of networks, the balancing of excitation and inhibition plays a similar role in both. In the first scenario, synaptic balance amplifies small changes in the spatial or temporal properties of the external drive. Similarly, in the present scenario, balanced synaptic architecture leads to enhanced robustness by amplifying the small variations in the synaptic inputs induced by changes in the stimulus or memory identity. It would be very interesting to combine fast dynamics with robust associative memory capabilities.

In conclusion, we have uncovered a fundamental principle of neuronal learning under basic biological constraints. Our work reveals that excitation–inhibition balance may have a critical computational role in producing robust neuronal functionality that is insensitive to output noise. We showed that this balance is important at the single-neuron level for both spiking and nonspiking neurons and at the level of recurrently connected neural networks. Further, the theory suggests that excitation–inhibition balance may be a collective, self-maintaining, emergent phenomenon of synaptic plasticity. Any successful neuronal learning process in the presence of substantial output noise will lead to strong balanced synaptic efficacies with noise robustness features. The fundamental nature of this result suggests that it should apply across a variety of neuronal circuits that learn in the presence of noise.

Materials and Methods

Detailed methods and simulation parameters are given in SI Materials and Methods.

Software.

To acknowledge their contribution to scientific work we cite the open source projects that directly and most crucially contributed to the current work: The Python stack of scientific computing [CPython, Numpy, Scipy, Matplotlib (71), Jupyter/Ipython (72), and others], CVXOPT (73) (convex conic optimization), and IPyparallel (parallelization).

Code Availability.

Python code for simulations and numerical solution of saddle-point equations is available upon request.

SI $𝜿_{out}^{max}$ and $𝜿_{in}^{max}$ Solutions in Purely E Networks

In purely E networks ( $f_{exc} = 1$ ) all solutions are unbalanced and output robustness can be achieved by sparse input (25) or tonic inhibition (28). However, the distinction between output robustness and input robustness still applies and, surprisingly, maximizing either $κ_{in}$ or $κ_{out}$ leads to two different solutions with qualitatively different properties.

In particular, as noted in refs. 25 and 27, the fraction of silent weights of the $κ_{out}^{\max}$ solutions increases as the load decreases. Thus, if the network implements the maximal $κ_{out}$ solution, network connectivity, as measured in a pairwise stimulation experiment, is expected to be sparse. However, for the maximal $κ_{in}$ solution the fraction of silent weights is constant and remains $1 / 2$ for all values of the load. Thus, measured network connectivity is expected to be higher.

Establishing correspondence between theory and experiment in this case is confounded by the difficulty to experimentally distinguish between silent synapses and completely absent synapses that were never available as inputs for the postsynaptic neuron during learning.

Acknowledgments

We thank Misha Tsodyks for helpful discussions. Research was supported by National Institutes of Health Grant MH093338 (to L.F.A. and R.R.), the Gatsby Charitable Foundation through the Gatsby Initiative in Brain Circuitry at Columbia University (L.F.A. and R.R.) and the Gatsby Program in Theoretical Neuroscience at the Hebrew University (H.S.), the Simons Foundation (L.F.A., R.R., and H.S.), the Swartz Foundation (L.F.A., R.R., and H.S.), and the Kavli Institute for Brain Science at Columbia University (L.F.A. and R.R.).

Footnotes

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1705841114/-/DCSupplemental.

References

1.Anderson JS, Carandini M, Ferster D. Orientation tuning of input conductance, excitation, and inhibition in cat primary visual cortex. J Neurophysiol. 2000;84:909–926. doi: 10.1152/jn.2000.84.2.909. [DOI] [PubMed] [Google Scholar]
2.Wehr M, Zador AM. Balanced inhibition underlies tuning and sharpens spike timing in auditory cortex. Nature. 2003;426:442–446. doi: 10.1038/nature02116. [DOI] [PubMed] [Google Scholar]
3.Okun M, Lampl I. Instantaneous correlation of excitation and inhibition during ongoing and sensory-evoked activities. Nat Neurosci. 2008;11:535–537. doi: 10.1038/nn.2105. [DOI] [PubMed] [Google Scholar]
4.Poo C, Isaacson JS. Odor representations in olfactory cortex: “Sparse” coding, global inhibition, and oscillations. Neuron. 2009;62:850–861. doi: 10.1016/j.neuron.2009.05.022. [DOI] [PMC free article] [PubMed] [Google Scholar]
5.Atallah BV, Scanziani M. Instantaneous modulation of gamma oscillation frequency by balancing excitation with inhibition. Neuron. 2009;62:566–577. doi: 10.1016/j.neuron.2009.04.027. [DOI] [PMC free article] [PubMed] [Google Scholar]
6.Isaacson J, Scanziani M. How inhibition shapes cortical activity. Neuron. 2011;72:231–243. doi: 10.1016/j.neuron.2011.09.027. [DOI] [PMC free article] [PubMed] [Google Scholar]
7.van Vreeswijk C, Sompolinsky H. Chaos in neuronal networks with balanced excitatory and inhibitory activity. Science. 1996;274:1724–1726. doi: 10.1126/science.274.5293.1724. [DOI] [PubMed] [Google Scholar]
8.Vreeswijk Cv, Sompolinsky H. Chaotic balanced state in a model of cortical circuits. Neural Comput. 1998;10:1321–1371. doi: 10.1162/089976698300017214. [DOI] [PubMed] [Google Scholar]
9.Froemke RC, Merzenich MM, Schreiner CE. A synaptic memory trace for cortical receptive field plasticity. Nature. 2007;450:425–429. doi: 10.1038/nature06289. [DOI] [PubMed] [Google Scholar]
10.Dorrn AL, Yuan K, Barker AJ, Schreiner CE, Froemke RC. Developmental sensory experience balances cortical excitation and inhibition. Nature. 2010;465:932–936. doi: 10.1038/nature09119. [DOI] [PMC free article] [PubMed] [Google Scholar]
11.Sun YJ, et al. Fine-tuning of pre-balanced excitation and inhibition during auditory cortical development. Nature. 2010;465:927–931. doi: 10.1038/nature09079. [DOI] [PMC free article] [PubMed] [Google Scholar]
12.Li Yt, Ma Wp, Pan Cj, Zhang LI, Tao HW. Broadening of cortical inhibition mediates developmental sharpening of orientation selectivity. J Neurosci. 2012;32:3981–3991. doi: 10.1523/JNEUROSCI.5514-11.2012. [DOI] [PMC free article] [PubMed] [Google Scholar]
13.Tsodyks MV, Sejnowski T. Rapid state switching in balanced cortical network models. Netw Comput Neural Syst. 1995;6:111–124. [Google Scholar]
14.van Vreeswijk C, Sompolinsky H. Course 9-irregular activity in large networks of neurons in Les Houches. In: Carson C, Boris G, David H, Claude M, Jean D, editors. Methods and Models in Neurophysics. Vol 80. Elsevier; Amsterdam: 2005. pp. 341–406. [Google Scholar]
15.Lim S, Goldman MS. Balanced cortical microcircuitry for maintaining information in working memory. Nat Neurosci. 2013;16:1306–1314. doi: 10.1038/nn.3492. [DOI] [PMC free article] [PubMed] [Google Scholar]
16.Boerlin M, Machens CK, Denève S. Predictive coding of dynamical variables in balanced spiking networks. PLoS Comput Biol. 2013;9:e1003258. doi: 10.1371/journal.pcbi.1003258. [DOI] [PMC free article] [PubMed] [Google Scholar]
17.Lajoie G, Lin KK, Thivierge JP, Shea-Brown E. Encoding in balanced networks: Revisiting spike patterns and chaos in stimulus-driven systems. PLoS Comput Biol. 2016;12:e1005258. doi: 10.1371/journal.pcbi.1005258. [DOI] [PMC free article] [PubMed] [Google Scholar]
18.Rosenblatt F. Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms. Spartan Books; Washington, DC: 1962. [Google Scholar]
19.Minsky ML, Papert SA. Perceptrons: Expanded Edition. MIT Press Cambridge; MA: 1988. [Google Scholar]
20.Gardner E. Maximum storage capacity in neural networks. Europhys Lett. 1987;4:481–485. [Google Scholar]
21.Gardner E. The space of interactions in neural network models. J Phys A Math Gen. 1988;21:257–270. [Google Scholar]
22.Gardner E, Derrida B. Optimal storage properties of neural network models. J Phys A Math Gen. 1988;21:271–284. [Google Scholar]
23.Amit DJ, Campbell C, Wong KYM. The interaction space of neural networks with sign-constrained synapses. J Phys A Math Gen. 1989;22:4687–4693. [Google Scholar]
24.Hopfield JJ. Neural networks and physical systems with emergent collective computational abilities. Proc Natl Acad Sci USA. 1982;79:2554–2558. doi: 10.1073/pnas.79.8.2554. [DOI] [PMC free article] [PubMed] [Google Scholar]
25.Brunel N, Hakim V, Isope P, Nadal JP, Barbour B. Optimal information storage and the distribution of synaptic weights: Perceptron versus Purkinje cell. Neuron. 2004;43:745–757. doi: 10.1016/j.neuron.2004.08.023. [DOI] [PubMed] [Google Scholar]
26.Clopath C, Nadal JP, Brunel N. Storage of correlated patterns in standard and bistable Purkinje cell models. PLoS Comput Biol. 2012;8:e1002448. doi: 10.1371/journal.pcbi.1002448. [DOI] [PMC free article] [PubMed] [Google Scholar]
27.Chapeton J, Fares T, LaSota D, Stepanyants A. Efficient associative memory storage in cortical circuits of inhibitory and excitatory neurons. Proc Natl Acad Sci USA. 2012;109:E3614–E3622. doi: 10.1073/pnas.1211467109. [DOI] [PMC free article] [PubMed] [Google Scholar]
28.Brunel N. Is cortical connectivity optimized for storing information? Nat Neurosci. 2016;19:749–755. doi: 10.1038/nn.4286. [DOI] [PubMed] [Google Scholar]
29.Amit DJ, Wong KYM, Campbell C. Perceptron learning with sign-constrained weights. J Phys A Math Gen. 1989;22:2039–2045. [Google Scholar]
30.Denève S, Machens CK. Efficient codes and balanced networks. Nat Neurosci. 2016;19:375–382. doi: 10.1038/nn.4243. [DOI] [PubMed] [Google Scholar]
31.Brown DA, Adams PR. Muscarinic suppression of a novel voltage-sensitive K+ current in a vertebrate neurone. Nature. 1980;283:673–676. doi: 10.1038/283673a0. [DOI] [PubMed] [Google Scholar]
32.Madison DV, Nicoll RA. Control of the repetitive discharge of rat CA 1 pyramidal neurones in vitro. J Physiol. 1984;354:319–331. doi: 10.1113/jphysiol.1984.sp015378. [DOI] [PMC free article] [PubMed] [Google Scholar]
33.Fleidervish IA, Friedman A, Gutnick MJ. Slow inactivation of Na+ current and slow cumulative spike adaptation in mouse and guinea-pig neocortical neurones in slices. J Physiol. 1996;493:83–97. doi: 10.1113/jphysiol.1996.sp021366. [DOI] [PMC free article] [PubMed] [Google Scholar]
34.Benda J, Herz AVM. A universal model for spike-frequency adaptation. Neural Comput. 2003;15:2523–2564. doi: 10.1162/089976603322385063. [DOI] [PubMed] [Google Scholar]
35.Cover TM. Geometrical and statistical properties of systems of linear inequalities with applications in pattern recognition. IEEE Trans Electron Comput. 1965;EC-14:326–334. [Google Scholar]
36.Venkatesh SS. Denker JS. AIP Conference Proceedings. Vol 151. AIP Publishing; Melville, NY: 1986. Epsilon capacity of neural networks; pp. 440–445. [Google Scholar]
37.Liu Bh, et al. Visual receptive field structure of cortical inhibitory neurons revealed by two-photon imaging guided recording. J Neurosci. 2009;29:10520–10532. doi: 10.1523/JNEUROSCI.1915-09.2009. [DOI] [PMC free article] [PubMed] [Google Scholar]
38.Kerlin AM, Andermann ML, Berezovskii VK, Reid RC. Broadly tuned response properties of diverse inhibitory neuron subtypes in mouse visual cortex. Neuron. 2010;67:858–871. doi: 10.1016/j.neuron.2010.08.002. [DOI] [PMC free article] [PubMed] [Google Scholar]
39.Vapnik V. The Nature of Statistical Learning Theory. Springer; New York: 2000. [Google Scholar]
40.Bottou L, Lin CJ. Bottou L, Chapelle O, DeCoste D, Weston J. Large Scale Kernel Machines. MIT Press; Cambridge, MA: 2007. Support vector machine solvers; pp. 301–320. [Google Scholar]
41.Gütig R, Sompolinsky H. The tempotron: A neuron that learns spike timing-based decisions. Nat Neurosci. 2006;9:420–428. doi: 10.1038/nn1643. [DOI] [PubMed] [Google Scholar]
42.Memmesheimer RM, Rubin R, Ölveczky B, Sompolinsky H. Learning precisely timed spikes. Neuron. 2014;82:925–938. doi: 10.1016/j.neuron.2014.03.026. [DOI] [PubMed] [Google Scholar]
43.Gütig R. Spiking neurons can discover predictive features by aggregate-label learning. Science. 2016;351:aab4113. doi: 10.1126/science.aab4113. [DOI] [PubMed] [Google Scholar]
44.Rubin R, Gütig R, Sompolinsky H. DiLorenzo PM, Victor JD. Spike Timing: Mechanisms and Function. CRC Press; Boca Raton, FL: 2013. Neural coding and decoding with spike times; pp. 35–64. [Google Scholar]
45.Gütig R. To spike, or when to spike? Curr Opin Neurobiol. 2014;25:134–139. doi: 10.1016/j.conb.2014.01.004. [DOI] [PubMed] [Google Scholar]
46.Amit DJ, Gutfreund H, Sompolinsky H. Storing infinite numbers of patterns in a spin-glass model of neural networks. Phys Rev Lett. 1985;55:1530–1533. doi: 10.1103/PhysRevLett.55.1530. [DOI] [PubMed] [Google Scholar]
47.Tsodyks MV, Feigel’man MV. The enhanced storage capacity in neural networks with low activity level. Europhys Lett. 1988;6:101–105. [Google Scholar]
48.Roudi Y, Latham PE. A balanced memory network. PLOS Comput Biol. 2007;3:e141. doi: 10.1371/journal.pcbi.0030141. [DOI] [PMC free article] [PubMed] [Google Scholar]
49.Krauth W, Nadal JP, Mezard M. Basins of attraction in a perceptron-like neural network. Complex Syst. 1988;2:387–408. [Google Scholar]
50.Krauth W, Nadal JP, Mezard M. The roles of stability and symmetry in the dynamics of neural networks. J Phys A Math Gen. 1988;21:2995–3011. [Google Scholar]
51.Amit DJ, Treves A. Associative memory neural network with low temporal spiking rates. Proc Natl Acad Sci USA. 1989;86:7871–7875. doi: 10.1073/pnas.86.20.7871. [DOI] [PMC free article] [PubMed] [Google Scholar]
52.Golomb D, Rubin N, Sompolinsky H. Willshaw model: Associative memory with sparse coding and low firing rates. Phys Rev A. 1990;41:1843–1854. doi: 10.1103/physreva.41.1843. [DOI] [PubMed] [Google Scholar]
53.Hasselmo ME. Acetylcholine and learning in a cortical associative memory. Neural Comput. 1993;5:32–44. [Google Scholar]
54.Barkai E, Bergman RE, Horwitz G, Hasselmo ME. Modulation of associative memory function in a biophysical simulation of rat piriform cortex. J Neurophysiol. 1994;72:659–677. doi: 10.1152/jn.1994.72.2.659. [DOI] [PubMed] [Google Scholar]
55.Kadmon J, Sompolinsky H. Transition to chaos in random neuronal networks. Phys Rev X. 2015;5:041030. [Google Scholar]
56.Harish O, Hansel D. Asynchronous rate chaos in spiking neuronal circuits. PLoS Comput Biol. 2015;11:e1004266. doi: 10.1371/journal.pcbi.1004266. [DOI] [PMC free article] [PubMed] [Google Scholar]
57.Nugent FS, Kauer JA. LTP of GABAergic synapses in the ventral tegmental area and beyond. J Physiol. 2008;586:1487–1493. doi: 10.1113/jphysiol.2007.148098. [DOI] [PMC free article] [PubMed] [Google Scholar]
58.Chevaleyre V, Castillo PE. Heterosynaptic LTD of hippocampal GABAergic synapses: A novel role of endocannabinoids in regulating excitability. Neuron. 2003;38:461–472. doi: 10.1016/s0896-6273(03)00235-6. [DOI] [PubMed] [Google Scholar]
59.D’amour J, Froemke R. Inhibitory and excitatory spike-timing-dependent plasticity in the auditory cortex. Neuron. 2015;86:514–528. doi: 10.1016/j.neuron.2015.03.014. [DOI] [PMC free article] [PubMed] [Google Scholar]
60.McBain CJ, Kauer JA. Presynaptic plasticity: Targeted control of inhibitory networks. Curr Opin Neurobiol. 2009;19:254–262. doi: 10.1016/j.conb.2009.05.008. [DOI] [PMC free article] [PubMed] [Google Scholar]
61.Lu Jt, Li Cy, Zhao JP, Poo Mm, Zhang Xh. Spike-timing-dependent plasticity of neocortical excitatory synapses on inhibitory interneurons depends on target cell type. J Neurosci. 2007;27:9711–9720. doi: 10.1523/JNEUROSCI.2513-07.2007. [DOI] [PMC free article] [PubMed] [Google Scholar]
62.Kullmann DM, Lamsa KP. Long-term synaptic plasticity in hippocampal interneurons. Nat Rev Neurosci. 2007;8:687–699. doi: 10.1038/nrn2207. [DOI] [PubMed] [Google Scholar]
63.Lamsa KP, Kullmann DM, Woodin MA. Spike-timing dependent plasticity in inhibitory circuits. Front Synaptic Neurosci. 2010;2:8. doi: 10.3389/fnsyn.2010.00008. [DOI] [PMC free article] [PubMed] [Google Scholar]
64.Larsen RS, Sjöström PJ. Synapse-type-specific plasticity in local circuits. Curr Opin Neurobiol. 2015;35:127–135. doi: 10.1016/j.conb.2015.08.001. [DOI] [PMC free article] [PubMed] [Google Scholar]
65.Srivastava N, Hinton G, Krizhevsky A, Sutskever I, Salakhutdinov R. Dropout: A simple way to prevent neural networks from overfitting. J Mach Learn Res. 2014;15:1929–1958. [Google Scholar]
66.LeCun Y, Bengio Y, Hinton G. Deep learning. Nature. 2015;521:436–444. doi: 10.1038/nature14539. [DOI] [PubMed] [Google Scholar]
67.Marr D. A theory of cerebellar cortex. J Physiol. 1969;202:437–470. doi: 10.1113/jphysiol.1969.sp008820. [DOI] [PMC free article] [PubMed] [Google Scholar]
68.Albus JS. A theory of cerebellar function. Math Biosci. 1971;10:25–61. [Google Scholar]
69.Hennequin G, Everton AJ, Vogels TP. Inhibitory plasticity: Balance, control and codependence. Annu Rev Neurosci. 2017;40:557, 579. doi: 10.1146/annurev-neuro-072116-031005. [DOI] [PubMed] [Google Scholar]
70.Wilent WB, Nitz DA. Discrete place fields of hippocampal formation interneurons. J Neurophysiol. 2007;97:4152–4161. doi: 10.1152/jn.01200.2006. [DOI] [PubMed] [Google Scholar]
71.Hunter JD. Matplotlib: A 2d graphics environment. Comput Sci Eng. 2007;9:90–95. [Google Scholar]
72.Pérez F, Granger BE. IPython: A system for interactive scientific computing. Comput Sci Eng. 2007;9:21–29. [Google Scholar]
73.Andersen M, Dahl J, Liu Z, Vandenberghe L. Sra S, Nowozin S, Wright SJ. Optimization for Machine Learning. MIT Press; Cambridge, MA: 2011. Interior-point methods for large-scale cone programming; pp. 55–83. [Google Scholar]
74.Litwin-Kumar A, Harris KD, Axel R, Sompolinsky H, Abbott LF. Optimal degrees of synaptic connectivity. Neuron. 2017;93:1153–1164.e7. doi: 10.1016/j.neuron.2017.01.030. [DOI] [PMC free article] [PubMed] [Google Scholar]
75.Engel A, Broeck C. Statistical Mechanics of Learning. Cambridge Univ Press; Cambridge, UK: 2001. [Google Scholar]

[r1] 1.Anderson JS, Carandini M, Ferster D. Orientation tuning of input conductance, excitation, and inhibition in cat primary visual cortex. J Neurophysiol. 2000;84:909–926. doi: 10.1152/jn.2000.84.2.909. [DOI] [PubMed] [Google Scholar]

[r2] 2.Wehr M, Zador AM. Balanced inhibition underlies tuning and sharpens spike timing in auditory cortex. Nature. 2003;426:442–446. doi: 10.1038/nature02116. [DOI] [PubMed] [Google Scholar]

[r3] 3.Okun M, Lampl I. Instantaneous correlation of excitation and inhibition during ongoing and sensory-evoked activities. Nat Neurosci. 2008;11:535–537. doi: 10.1038/nn.2105. [DOI] [PubMed] [Google Scholar]

[r4] 4.Poo C, Isaacson JS. Odor representations in olfactory cortex: “Sparse” coding, global inhibition, and oscillations. Neuron. 2009;62:850–861. doi: 10.1016/j.neuron.2009.05.022. [DOI] [PMC free article] [PubMed] [Google Scholar]

[r5] 5.Atallah BV, Scanziani M. Instantaneous modulation of gamma oscillation frequency by balancing excitation with inhibition. Neuron. 2009;62:566–577. doi: 10.1016/j.neuron.2009.04.027. [DOI] [PMC free article] [PubMed] [Google Scholar]

[r6] 6.Isaacson J, Scanziani M. How inhibition shapes cortical activity. Neuron. 2011;72:231–243. doi: 10.1016/j.neuron.2011.09.027. [DOI] [PMC free article] [PubMed] [Google Scholar]

[r7] 7.van Vreeswijk C, Sompolinsky H. Chaos in neuronal networks with balanced excitatory and inhibitory activity. Science. 1996;274:1724–1726. doi: 10.1126/science.274.5293.1724. [DOI] [PubMed] [Google Scholar]

[r8] 8.Vreeswijk Cv, Sompolinsky H. Chaotic balanced state in a model of cortical circuits. Neural Comput. 1998;10:1321–1371. doi: 10.1162/089976698300017214. [DOI] [PubMed] [Google Scholar]

[r9] 9.Froemke RC, Merzenich MM, Schreiner CE. A synaptic memory trace for cortical receptive field plasticity. Nature. 2007;450:425–429. doi: 10.1038/nature06289. [DOI] [PubMed] [Google Scholar]

[r10] 10.Dorrn AL, Yuan K, Barker AJ, Schreiner CE, Froemke RC. Developmental sensory experience balances cortical excitation and inhibition. Nature. 2010;465:932–936. doi: 10.1038/nature09119. [DOI] [PMC free article] [PubMed] [Google Scholar]

[r11] 11.Sun YJ, et al. Fine-tuning of pre-balanced excitation and inhibition during auditory cortical development. Nature. 2010;465:927–931. doi: 10.1038/nature09079. [DOI] [PMC free article] [PubMed] [Google Scholar]

[r12] 12.Li Yt, Ma Wp, Pan Cj, Zhang LI, Tao HW. Broadening of cortical inhibition mediates developmental sharpening of orientation selectivity. J Neurosci. 2012;32:3981–3991. doi: 10.1523/JNEUROSCI.5514-11.2012. [DOI] [PMC free article] [PubMed] [Google Scholar]

[r13] 13.Tsodyks MV, Sejnowski T. Rapid state switching in balanced cortical network models. Netw Comput Neural Syst. 1995;6:111–124. [Google Scholar]

[r14] 14.van Vreeswijk C, Sompolinsky H. Course 9-irregular activity in large networks of neurons in Les Houches. In: Carson C, Boris G, David H, Claude M, Jean D, editors. Methods and Models in Neurophysics. Vol 80. Elsevier; Amsterdam: 2005. pp. 341–406. [Google Scholar]

[r15] 15.Lim S, Goldman MS. Balanced cortical microcircuitry for maintaining information in working memory. Nat Neurosci. 2013;16:1306–1314. doi: 10.1038/nn.3492. [DOI] [PMC free article] [PubMed] [Google Scholar]

[r16] 16.Boerlin M, Machens CK, Denève S. Predictive coding of dynamical variables in balanced spiking networks. PLoS Comput Biol. 2013;9:e1003258. doi: 10.1371/journal.pcbi.1003258. [DOI] [PMC free article] [PubMed] [Google Scholar]

[r17] 17.Lajoie G, Lin KK, Thivierge JP, Shea-Brown E. Encoding in balanced networks: Revisiting spike patterns and chaos in stimulus-driven systems. PLoS Comput Biol. 2016;12:e1005258. doi: 10.1371/journal.pcbi.1005258. [DOI] [PMC free article] [PubMed] [Google Scholar]

[r18] 18.Rosenblatt F. Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms. Spartan Books; Washington, DC: 1962. [Google Scholar]

[r19] 19.Minsky ML, Papert SA. Perceptrons: Expanded Edition. MIT Press Cambridge; MA: 1988. [Google Scholar]

[r20] 20.Gardner E. Maximum storage capacity in neural networks. Europhys Lett. 1987;4:481–485. [Google Scholar]

[r21] 21.Gardner E. The space of interactions in neural network models. J Phys A Math Gen. 1988;21:257–270. [Google Scholar]

[r22] 22.Gardner E, Derrida B. Optimal storage properties of neural network models. J Phys A Math Gen. 1988;21:271–284. [Google Scholar]

[r23] 23.Amit DJ, Campbell C, Wong KYM. The interaction space of neural networks with sign-constrained synapses. J Phys A Math Gen. 1989;22:4687–4693. [Google Scholar]

[r24] 24.Hopfield JJ. Neural networks and physical systems with emergent collective computational abilities. Proc Natl Acad Sci USA. 1982;79:2554–2558. doi: 10.1073/pnas.79.8.2554. [DOI] [PMC free article] [PubMed] [Google Scholar]

[r25] 25.Brunel N, Hakim V, Isope P, Nadal JP, Barbour B. Optimal information storage and the distribution of synaptic weights: Perceptron versus Purkinje cell. Neuron. 2004;43:745–757. doi: 10.1016/j.neuron.2004.08.023. [DOI] [PubMed] [Google Scholar]

[r26] 26.Clopath C, Nadal JP, Brunel N. Storage of correlated patterns in standard and bistable Purkinje cell models. PLoS Comput Biol. 2012;8:e1002448. doi: 10.1371/journal.pcbi.1002448. [DOI] [PMC free article] [PubMed] [Google Scholar]

[r27] 27.Chapeton J, Fares T, LaSota D, Stepanyants A. Efficient associative memory storage in cortical circuits of inhibitory and excitatory neurons. Proc Natl Acad Sci USA. 2012;109:E3614–E3622. doi: 10.1073/pnas.1211467109. [DOI] [PMC free article] [PubMed] [Google Scholar]

[r28] 28.Brunel N. Is cortical connectivity optimized for storing information? Nat Neurosci. 2016;19:749–755. doi: 10.1038/nn.4286. [DOI] [PubMed] [Google Scholar]

[r29] 29.Amit DJ, Wong KYM, Campbell C. Perceptron learning with sign-constrained weights. J Phys A Math Gen. 1989;22:2039–2045. [Google Scholar]

[r30] 30.Denève S, Machens CK. Efficient codes and balanced networks. Nat Neurosci. 2016;19:375–382. doi: 10.1038/nn.4243. [DOI] [PubMed] [Google Scholar]

[r31] 31.Brown DA, Adams PR. Muscarinic suppression of a novel voltage-sensitive K+ current in a vertebrate neurone. Nature. 1980;283:673–676. doi: 10.1038/283673a0. [DOI] [PubMed] [Google Scholar]

[r32] 32.Madison DV, Nicoll RA. Control of the repetitive discharge of rat CA 1 pyramidal neurones in vitro. J Physiol. 1984;354:319–331. doi: 10.1113/jphysiol.1984.sp015378. [DOI] [PMC free article] [PubMed] [Google Scholar]

[r33] 33.Fleidervish IA, Friedman A, Gutnick MJ. Slow inactivation of Na+ current and slow cumulative spike adaptation in mouse and guinea-pig neocortical neurones in slices. J Physiol. 1996;493:83–97. doi: 10.1113/jphysiol.1996.sp021366. [DOI] [PMC free article] [PubMed] [Google Scholar]

[r34] 34.Benda J, Herz AVM. A universal model for spike-frequency adaptation. Neural Comput. 2003;15:2523–2564. doi: 10.1162/089976603322385063. [DOI] [PubMed] [Google Scholar]

[r35] 35.Cover TM. Geometrical and statistical properties of systems of linear inequalities with applications in pattern recognition. IEEE Trans Electron Comput. 1965;EC-14:326–334. [Google Scholar]

[r36] 36.Venkatesh SS. Denker JS. AIP Conference Proceedings. Vol 151. AIP Publishing; Melville, NY: 1986. Epsilon capacity of neural networks; pp. 440–445. [Google Scholar]

[r37] 37.Liu Bh, et al. Visual receptive field structure of cortical inhibitory neurons revealed by two-photon imaging guided recording. J Neurosci. 2009;29:10520–10532. doi: 10.1523/JNEUROSCI.1915-09.2009. [DOI] [PMC free article] [PubMed] [Google Scholar]

[r38] 38.Kerlin AM, Andermann ML, Berezovskii VK, Reid RC. Broadly tuned response properties of diverse inhibitory neuron subtypes in mouse visual cortex. Neuron. 2010;67:858–871. doi: 10.1016/j.neuron.2010.08.002. [DOI] [PMC free article] [PubMed] [Google Scholar]

[r39] 39.Vapnik V. The Nature of Statistical Learning Theory. Springer; New York: 2000. [Google Scholar]

[r40] 40.Bottou L, Lin CJ. Bottou L, Chapelle O, DeCoste D, Weston J. Large Scale Kernel Machines. MIT Press; Cambridge, MA: 2007. Support vector machine solvers; pp. 301–320. [Google Scholar]

[r41] 41.Gütig R, Sompolinsky H. The tempotron: A neuron that learns spike timing-based decisions. Nat Neurosci. 2006;9:420–428. doi: 10.1038/nn1643. [DOI] [PubMed] [Google Scholar]

[r42] 42.Memmesheimer RM, Rubin R, Ölveczky B, Sompolinsky H. Learning precisely timed spikes. Neuron. 2014;82:925–938. doi: 10.1016/j.neuron.2014.03.026. [DOI] [PubMed] [Google Scholar]

[r43] 43.Gütig R. Spiking neurons can discover predictive features by aggregate-label learning. Science. 2016;351:aab4113. doi: 10.1126/science.aab4113. [DOI] [PubMed] [Google Scholar]

[r44] 44.Rubin R, Gütig R, Sompolinsky H. DiLorenzo PM, Victor JD. Spike Timing: Mechanisms and Function. CRC Press; Boca Raton, FL: 2013. Neural coding and decoding with spike times; pp. 35–64. [Google Scholar]

[r45] 45.Gütig R. To spike, or when to spike? Curr Opin Neurobiol. 2014;25:134–139. doi: 10.1016/j.conb.2014.01.004. [DOI] [PubMed] [Google Scholar]

[r46] 46.Amit DJ, Gutfreund H, Sompolinsky H. Storing infinite numbers of patterns in a spin-glass model of neural networks. Phys Rev Lett. 1985;55:1530–1533. doi: 10.1103/PhysRevLett.55.1530. [DOI] [PubMed] [Google Scholar]

[r47] 47.Tsodyks MV, Feigel’man MV. The enhanced storage capacity in neural networks with low activity level. Europhys Lett. 1988;6:101–105. [Google Scholar]

[r48] 48.Roudi Y, Latham PE. A balanced memory network. PLOS Comput Biol. 2007;3:e141. doi: 10.1371/journal.pcbi.0030141. [DOI] [PMC free article] [PubMed] [Google Scholar]

[r49] 49.Krauth W, Nadal JP, Mezard M. Basins of attraction in a perceptron-like neural network. Complex Syst. 1988;2:387–408. [Google Scholar]

[r50] 50.Krauth W, Nadal JP, Mezard M. The roles of stability and symmetry in the dynamics of neural networks. J Phys A Math Gen. 1988;21:2995–3011. [Google Scholar]

[r51] 51.Amit DJ, Treves A. Associative memory neural network with low temporal spiking rates. Proc Natl Acad Sci USA. 1989;86:7871–7875. doi: 10.1073/pnas.86.20.7871. [DOI] [PMC free article] [PubMed] [Google Scholar]

[r52] 52.Golomb D, Rubin N, Sompolinsky H. Willshaw model: Associative memory with sparse coding and low firing rates. Phys Rev A. 1990;41:1843–1854. doi: 10.1103/physreva.41.1843. [DOI] [PubMed] [Google Scholar]

[r53] 53.Hasselmo ME. Acetylcholine and learning in a cortical associative memory. Neural Comput. 1993;5:32–44. [Google Scholar]

[r54] 54.Barkai E, Bergman RE, Horwitz G, Hasselmo ME. Modulation of associative memory function in a biophysical simulation of rat piriform cortex. J Neurophysiol. 1994;72:659–677. doi: 10.1152/jn.1994.72.2.659. [DOI] [PubMed] [Google Scholar]

[r55] 55.Kadmon J, Sompolinsky H. Transition to chaos in random neuronal networks. Phys Rev X. 2015;5:041030. [Google Scholar]

[r56] 56.Harish O, Hansel D. Asynchronous rate chaos in spiking neuronal circuits. PLoS Comput Biol. 2015;11:e1004266. doi: 10.1371/journal.pcbi.1004266. [DOI] [PMC free article] [PubMed] [Google Scholar]

[r57] 57.Nugent FS, Kauer JA. LTP of GABAergic synapses in the ventral tegmental area and beyond. J Physiol. 2008;586:1487–1493. doi: 10.1113/jphysiol.2007.148098. [DOI] [PMC free article] [PubMed] [Google Scholar]

[r58] 58.Chevaleyre V, Castillo PE. Heterosynaptic LTD of hippocampal GABAergic synapses: A novel role of endocannabinoids in regulating excitability. Neuron. 2003;38:461–472. doi: 10.1016/s0896-6273(03)00235-6. [DOI] [PubMed] [Google Scholar]

[r59] 59.D’amour J, Froemke R. Inhibitory and excitatory spike-timing-dependent plasticity in the auditory cortex. Neuron. 2015;86:514–528. doi: 10.1016/j.neuron.2015.03.014. [DOI] [PMC free article] [PubMed] [Google Scholar]

[r60] 60.McBain CJ, Kauer JA. Presynaptic plasticity: Targeted control of inhibitory networks. Curr Opin Neurobiol. 2009;19:254–262. doi: 10.1016/j.conb.2009.05.008. [DOI] [PMC free article] [PubMed] [Google Scholar]

[r61] 61.Lu Jt, Li Cy, Zhao JP, Poo Mm, Zhang Xh. Spike-timing-dependent plasticity of neocortical excitatory synapses on inhibitory interneurons depends on target cell type. J Neurosci. 2007;27:9711–9720. doi: 10.1523/JNEUROSCI.2513-07.2007. [DOI] [PMC free article] [PubMed] [Google Scholar]

[r62] 62.Kullmann DM, Lamsa KP. Long-term synaptic plasticity in hippocampal interneurons. Nat Rev Neurosci. 2007;8:687–699. doi: 10.1038/nrn2207. [DOI] [PubMed] [Google Scholar]

[r63] 63.Lamsa KP, Kullmann DM, Woodin MA. Spike-timing dependent plasticity in inhibitory circuits. Front Synaptic Neurosci. 2010;2:8. doi: 10.3389/fnsyn.2010.00008. [DOI] [PMC free article] [PubMed] [Google Scholar]

[r64] 64.Larsen RS, Sjöström PJ. Synapse-type-specific plasticity in local circuits. Curr Opin Neurobiol. 2015;35:127–135. doi: 10.1016/j.conb.2015.08.001. [DOI] [PMC free article] [PubMed] [Google Scholar]

[r65] 65.Srivastava N, Hinton G, Krizhevsky A, Sutskever I, Salakhutdinov R. Dropout: A simple way to prevent neural networks from overfitting. J Mach Learn Res. 2014;15:1929–1958. [Google Scholar]

[r66] 66.LeCun Y, Bengio Y, Hinton G. Deep learning. Nature. 2015;521:436–444. doi: 10.1038/nature14539. [DOI] [PubMed] [Google Scholar]

[r67] 67.Marr D. A theory of cerebellar cortex. J Physiol. 1969;202:437–470. doi: 10.1113/jphysiol.1969.sp008820. [DOI] [PMC free article] [PubMed] [Google Scholar]

[r68] 68.Albus JS. A theory of cerebellar function. Math Biosci. 1971;10:25–61. [Google Scholar]

[r69] 69.Hennequin G, Everton AJ, Vogels TP. Inhibitory plasticity: Balance, control and codependence. Annu Rev Neurosci. 2017;40:557, 579. doi: 10.1146/annurev-neuro-072116-031005. [DOI] [PubMed] [Google Scholar]

[r70] 70.Wilent WB, Nitz DA. Discrete place fields of hippocampal formation interneurons. J Neurophysiol. 2007;97:4152–4161. doi: 10.1152/jn.01200.2006. [DOI] [PubMed] [Google Scholar]

[r71] 71.Hunter JD. Matplotlib: A 2d graphics environment. Comput Sci Eng. 2007;9:90–95. [Google Scholar]

[r72] 72.Pérez F, Granger BE. IPython: A system for interactive scientific computing. Comput Sci Eng. 2007;9:21–29. [Google Scholar]

[r73] 73.Andersen M, Dahl J, Liu Z, Vandenberghe L. Sra S, Nowozin S, Wright SJ. Optimization for Machine Learning. MIT Press; Cambridge, MA: 2011. Interior-point methods for large-scale cone programming; pp. 55–83. [Google Scholar]

[r74] 74.Litwin-Kumar A, Harris KD, Axel R, Sompolinsky H, Abbott LF. Optimal degrees of synaptic connectivity. Neuron. 2017;93:1153–1164.e7. doi: 10.1016/j.neuron.2017.01.030. [DOI] [PMC free article] [PubMed] [Google Scholar]

[r75] 75.Engel A, Broeck C. Statistical Mechanics of Learning. Cambridge Univ Press; Cambridge, UK: 2001. [Google Scholar]

PERMALINK

Balanced excitation and inhibition are required for high-capacity, noise-robust neuronal selectivity

Ran Rubin

L F Abbott

Haim Sompolinsky

Series information

Significance

Abstract

Results

Fig. 1.

Balanced and Unbalanced Solutions.

Fig. 2.

Fig. S2.

Noise Robustness of Balanced and Unbalanced Solutions.

Fig. 3.

Fig. S3.

Balanced and Unbalanced Solutions for Spiking Neurons.

Fig. S4.

Fig. 4.

Balanced and Unbalanced Synaptic Weights in Associative Memory Networks.

Fig. 5.

The Role of Inhibition in Associative Memory Networks.

Fig. S5.

Learning Robust Solutions.

Fig. 6.

Fig. S6.

SI Materials and Methods

Finding Perceptron Solutions.

Random Patterns in Numerical Estimation of 𝜿outmax and 𝜿inmax Solutions.

Fig. S1.

Dynamics of LIF Neuron.

Input spike trains.

Synaptic input.

Output noise.

Voltage reset.

Membrane potential.

Simulations of Recurrent Networks.

Memory states.

Initial pattern distortion.

Perceptron Learning Algorithm.

Figure Parameters.

Fig. 2.

Fig. 3.

Fig. 4.

Fig. 5.

Fig. 6.

Finding Maximal 𝜿in and Maximal 𝜿out Solutions.

SI Capacity for Noneven Split of Plus and Minus Patterns

SI Effects of E and I Input Statistics

Balanced and Unbalanced Maximal 𝜿in Solutions.

Fraction of “Silent” Weights in Maximal 𝜿in and Maximal 𝜿out Solutions.

Tuning Properties of Cortical Neurons Suggest That in Cortex fexc⋆>0.5.

SI Recurrent Networks with Nonlearned Inhibition

Choosing Random Synapses for I Neurons.

Training Set Definition.

Learned Network Stability.

Learning Only E to E Connections.

Learning Both E to E and I to E Connections.

SI Replica Theory for Sign- and Norm-Constrained Perceptron

The Order Parameters.

Summary of Main Results.

Detailed Solutions of the SP Equations.

The general SP equations.

SP equations for typical solutions.

Solutions with significant 𝜿𝐨𝐮𝐭 are balanced.

Theorem.

Proof.

SP equations for critical solutions.

Capacity and balanced capacity.

Unbalanced solution.

Balanced solution.

Coexistence of balanced and unbalanced solutions below the balanced capacity line.

SP equations for the maximal 𝜿𝐢𝐧 solution.

Unbalanced solution.

Balanced solution.

Transition between balanced and unbalanced solutions.

SP equations for the maximal 𝜿𝐨𝐮𝐭 solution.

Unbalanced solution.

Balanced solution.

Distribution of Synaptic Weights.

Random Patterns in Numerical Estimation of $𝜿_{out}^{max}$ and $𝜿_{in}^{max}$ Solutions.

Finding Maximal $𝜿_{in}$ and Maximal $𝜿_{out}$ Solutions.

Balanced and Unbalanced Maximal $𝜿_{in}$ Solutions.

Fraction of “Silent” Weights in Maximal $𝜿_{in}$ and Maximal $𝜿_{out}$ Solutions.

Tuning Properties of Cortical Neurons Suggest That in Cortex $f_{exc}^{⋆} > 0.5$ .

Solutions with significant $𝜿_{𝐨𝐮𝐭}$ are balanced.

SP equations for the maximal $𝜿_{𝐢𝐧}$ solution.

SP equations for the maximal $𝜿_{𝐨𝐮𝐭}$ solution.

SI $𝜿_{out}^{max}$ and $𝜿_{in}^{max}$ Solutions in Purely E Networks