Abstract
Neural computation in biological and artificial networks relies on the nonlinear summation of many inputs. The structural connectivity matrix of synaptic weights between neurons is a critical determinant of overall network function, but quantitative links between neural network structure and function are complex and subtle. For example, many networks can give rise to similar functional responses, and the same network can function differently depending on context. Whether certain patterns of synaptic connectivity are required to generate specific network-level computations is largely unknown. Here we introduce a geometric framework for identifying synaptic connections required by steady-state responses in recurrent networks of threshold-linear neurons. Assuming that the number of specified response patterns does not exceed the number of input synapses, we analytically calculate the solution space of all feedforward and recurrent connectivity matrices that can generate the specified responses from the network inputs. A generalization accounting for noise further reveals that the solution space geometry can undergo topological transitions as the allowed error increases, which could provide insight into both neuroscience and machine learning. We ultimately use this geometric characterization to derive certainty conditions guaranteeing a nonzero synapse between neurons. Our theoretical framework could thus be applied to neural activity data to make rigorous anatomical predictions that follow generally from the model architecture.
I. INTRODUCTION
Structure-function relationships are fundamental to biology [1–3]. In neural networks, the structure of synaptic connectivity critically shapes the functional responses of neurons [4,5], and large-scale techniques for measuring neural network structure and function provide exciting opportunities for examining this link quantitatively [6–15]. The ellipsoid body in the central complex of Drosophila is a beautiful example where modeling showed how the structural pattern of excitatory and inhibitory connections enables a persistent representation of heading direction [16–19]. Lucid structure-function links have also been found in several other neural networks [20–23]. However, it is generally hard to predict either neural network structure or function from the other [5,24]. For example, functionally inferred connectivity can capture neuronal response correlations without matching structural connectivity [25–28], and network simulations with structural constraints do not automatically reproduce function [29–31]. Two broad modeling difficulties hinder the establishment of robust structure-function links. First, models with too much detail are difficult to adequately constrain and analyze. Second, models with too little detail may poorly match biological mechanisms, the model mismatch problem. Here we propose a rigorous theoretical framework that attempts to balance these competing factors to predict components of network structure required for function.
Neural network function probably does not depend on the exact strength of every synapse. Indeed, multiple network connectivity structures can generate the same functional responses [32,33], as illustrated by structural variability across individual animals [24,34] and artificial neural networks [29,35–37]. Such redundancy may be a general feature of emergent phenomena in physics, biology, and neuroscience [38–40]. Nevertheless, some important details may be consistent despite this variability, and here we find well-constrained structure-function links by characterizing all connectivity structures that are consistent with the desired functional responses [24]. We also account for ambiguities caused by measurement noise. Our goal is not to find degenerate networks that perform equivalently in all possible scenarios. We instead seek a framework that finds connectivity required for specific functional responses, independently of whatever else the network might do.
The model mismatch problem has at least two facets. First, neurons and synapses are incredibly complex [41–44], but which complexities are needed to elucidate specific structure-function relationships is unclear [5,45,46]. This issue is very hard to address in full generality, and here we seek a theoretical framework that makes clear experimental predictions that can adjudicate candidate models empirically. In particular, we predict neural network structure only when it occurs in all networks generating the functional responses. This high bar precludes the analysis of biophysically-detailed network models, which require numerical exploration of the connectivity space that is typically incomplete [24,32,47–49]. We instead focus on recurrent firing rate networks of threshold-linear neurons, which are growing in popularity because they strike an appealing balance between biological realism, computational power, and mathematical tractability [12,16,18,20,22,23,29,30,37,50–55].
The second facet of the model mismatch problem is hidden variables, such as missing neurons, neuromodulator levels, and physiological states [5,56–58]. Here we take inspiration from whole-brain imaging in small organisms [15], such as Caenorhabditis elegans [9], larval zebrafish [8,12,57], and larval Drosophila [11], and assume access to all relevant neurons. Our model neglects neuromodulators and other state variables, which would be interesting to consider in the future. Furthermore, many experiments indirectly assess neuronal spiking activity, such as by calcium fluorescence [58–61] or hemodynamic responses [25,62–64]. We restrict our analysis to steady-state responses to mitigate mismatch between fast firing rate changes and these inherently slow measurement techniques.
Our analysis begins with an analytical characterization of synaptic weight matrices that realize specified steady-state responses as fixed points of neural network dynamics [Figs. 1(a) and 1(b)]. A key insight is that asymmetrically constrained dimensions appear as a consequence of the threshold nonlinearity. Synaptic weight components in these semiconstrained dimensions are completely uncertain in one half of the dimension but well-constrained in the other. We then compute error surfaces by finding weight matrices with fixed points near the desired ones. This error landscape has a continuum of local and global minima, and constant-error surfaces exhibit topological transitions that add semiconstrained dimensions as the error increases. This may help explain the importance of weight initialization in machine learning, as poorly initialized models can get stuck in semiconstrained dimensions that abruptly vanish at nonzero error. By studying the geometric structure of the neural network ensemble that can approximate the functional responses, we derive analytical formulas that pinpoint a subset of connections, which we term certain synapses, that must exist for the model to work [Fig. 1(c)]. These analytical results are especially useful for studying high-dimensional synaptic weight spaces that are otherwise intractable. Since the presence of a synapse is readily measurable, our theory generates accessible experimental predictions [Fig. 1(c)]. Tests of these predictions assess the utility of the modeling framework itself, as the predictions hold across model parameters. Their successes and failures can thus move us forward toward identifying the mechanistic principles governing how neural networks implement brain computations.
The rest of the paper begins in Sec. II with a toy problem that concretely demonstrates the approach illustrated in Fig. 1 and relates the geometry of the solution space (all synaptic weight matrices that realize a given set of response patterns) to the concept of a certain synapse. In Sec. III, we explain how the solution space for a limited number of response patterns can be calculated for an arbitrarily large threshold-linear recurrent neural network. Section IV is devoted to three simple toy problems that provide additional insights into how the geometry of the solution space can help us to identify certain synapses. This is followed by Sec. V, where we explain and numerically test the precise algebraic relation that must be satisfied for a synapse to be certain when the response patterns are orthonormal. Section VI generalizes our analyses to include noise, including numerical tests via simulation. Finally, Sec. VII concludes the paper by summarizing our main results and discussing important future directions.
II. AN ILLUSTRATIVE TOY PROBLEM
To gain intuition on how robust structure-function links can be established, including the effects of nonlinearity, we begin by analyzing the structural implications of functional responses in a very simple threshold-linear feedforward network [Fig. 2(a)]. We assume that two input neurons, x1 and x2, provide signals to a single driven neuron, y, via synaptic weights, w1 and w2. The weights are unknown, and we constrain their possible values using two neuronal response patterns, labeled μ = + and μ = −. We suppose that steady-state activities of the input neurons and driven neuron are nonlinearly related according to
y = Φ(w1 x1 + w2 x2),  (1)
where x1, x2, and y denote firing rates of the corresponding neurons, and
Φ(a) = max(a, 0)  (2)
is the threshold-linear transfer function. The driven neuron responds (y = 1) when x1 = x2 = 1 in the μ = + pattern. In contrast, the driven neuron does not respond (y = 0) when x1 = −x2 = 1 in the μ = − pattern. If the transfer function were linear, then there would be a unique set of weights, (w1, w2) = (1/2, 1/2), that produces these driven neuron responses, the brown dot in Fig. 2(b).
How does the nonlinearity change the solution space of weights that reproduce the driven neuron responses? To answer this question, we define two linear combinations of weights,
η+ = w1 + w2,  η− = w1 − w2,  (3)
which correspond to the driven neuron’s input drive in patterns μ = ±. Equation (1) now yields rather simple algebraic constraints for the two patterns:
Φ(η+) = 1  ⇒  η+ = 1,  (4)
Φ(η−) = 0  ⇒  η− ⩽ 0.  (5)
Note that η− would have had to be zero if Φ were linear, but because the threshold-linear transfer function turns everything negative into a null response, η− can now also be any negative number. However, sufficiently negative values of η− correspond to implausibly large weight vectors, and hence we focus on solutions with norm bounded above by some value, W. The nonlinearity thus turns the unique linear solution [brown dot in Fig. 2(b)] into a continuum of solutions [yellow line segment in Fig. 2(b)]. This continuum lies along what we will refer to as a semiconstrained dimension. Indeed, this will turn out to be a generic feature of threshold-linear neural networks: every time there is a null response, a semiconstrained dimension emerges in the solution space.1
Although we found infinitely many weight vectors that solve the problem, all solutions to the problem have a synaptic connection x2 → y, and this connection is always excitatory [Fig. 2(b)]. Positive, negative, or zero connection weights are all possible for x1 → y. However, this range of possibilities reveals why the value of the synaptic weight bound, W, has important implications for the solution space. For example, all solutions in Fig. 2(b) with norm less than 1 have w1 > 0, whereas larger magnitude weight vectors have w1 ≤ 0. Therefore, one would be certain that an excitatory x1 → y synapse exists if the weight bound were biologically known to be less than Wcr = 1. We refer to this weight bound as W-critical. Looser weight bounds raise the possibility that the synapse is absent or inhibitory. Note that too tight weight bounds, here less than 1/√2, can exclude all solutions.
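This toy problem is small enough to check by brute force. The sketch below (our illustration, not the paper's code; function names and tolerances are ours) randomly screens weight vectors within the bound W, keeps those satisfying Eqs. (4) and (5), and confirms that w2 is always positive while the sign of w1 is fixed only when W < Wcr = 1.

```python
import numpy as np

def phi(a):
    """Threshold-linear transfer function, Eq. (2)."""
    return np.maximum(a, 0.0)

def toy_solutions(W, n_samples=400_000, tol=2e-3, seed=0):
    """Randomly screen weight vectors with norm <= W and keep those that reproduce
    the two response patterns of Sec. II: y = 1 for (x1, x2) = (1, 1) and
    y = 0 for (x1, x2) = (1, -1)."""
    rng = np.random.default_rng(seed)
    w = rng.uniform(-W, W, size=(n_samples, 2))
    w = w[np.linalg.norm(w, axis=1) <= W]        # spherical weight bound
    y_plus = phi(w[:, 0] + w[:, 1])              # response in the mu = + pattern
    y_minus = phi(w[:, 0] - w[:, 1])             # response in the mu = - pattern
    keep = (np.abs(y_plus - 1.0) < tol) & (y_minus < tol)
    return w[keep]

for W in (0.9, 2.0):
    sols = toy_solutions(W)
    print(f"W = {W}: {len(sols)} solutions, "
          f"w1 in [{sols[:, 0].min():+.3f}, {sols[:, 0].max():+.3f}], "
          f"min w2 = {sols[:, 1].min():+.3f}")
# Expected: w2 > 0 in every retained solution, whereas w1 > 0 only when W < 1.
```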
The example of Fig. 2 concretely illustrates the general procedure diagramed in Fig. 1. First, we specified a network architecture and steady-state response patterns [Figs. 1(a) and 2(a)]. Second, we found all synaptic weight vectors that can implement the nonlinear transformation [Figs. 1(b) and 2(b)]. Finally, we determined whether individual synaptic weights varied in sign across the solution space [Figs. 1(c) and 2(b)]. Section III will generalize the first two parts of this procedure to characterize the solution space of any threshold-linear recurrent neural network, assuming that the number of response patterns is at most the dimensionality of the weight vectors. Sections IV and V will then generalize the final part of this procedure to pinpoint synaptic connections that are critical for generating any specified set of orthonormal responses.
III. SOLUTION SPACE GEOMETRY
A. Neural network structure and dynamics
Consider a neural network of ℐ input neurons that send signals to a recurrently connected population of 𝒟 driven neurons [Fig. 3(a)]. We compactly represent the network connectivity with a matrix of synaptic weights, wim, where i = 1, …, 𝒟 indexes the driven neurons, and m = 1, …, 𝒟 + ℐ indexes presynaptic neurons from both the driven and input populations. We suppose that activity in the population of driven neurons dynamically evolves according to
τi dyi/dt = −yi + Φ( Σ_{j=1..𝒟} wij yj + Σ_{m=1..ℐ} wi,𝒟+m xm ),  (6)
where yi is the firing rate of the ith driven neuron, xm is the firing rate of the mth input neuron, and τi is the time constant that determines how long the ith driven neuron integrates its presynaptic signals. It is possible that prior biological knowledge dictates that certain synapses appearing in Eq. (6) are absent. For notational convenience, in this paper we will assume that the number of synapses onto each driven neuron remains the same,2 and we will denote this number of the incoming synapses as 𝒩. Note that 𝒩 = ℐ + 𝒟 for a general recurrent network, 𝒩 = ℐ + 𝒟 − 1 for recurrent networks without self-synapses, and 𝒩 = ℐ for feedforward networks. We suppose that the network functionally maps input patterns, xμm, to steady-state driven signals, yμi ⩾ 0, where μ = 1, …, 𝒫 labels the patterns [Fig. 3(b)]. We assume throughout that 𝒫 ⩽ 𝒩, as the number of known response patterns is typically small, and the number of possible synaptic inputs is large. Experimentally, different response patterns often correspond to different stimulus conditions, so we will often refer to μ as a stimulus index and xμm → yμi as a stimulus transformation.
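For concreteness, a minimal integrator for the assumed dynamics of Eq. (6) is sketched below (ours, not the paper's code; we split the weight matrix into a recurrent block W_rec and a feedforward block W_in, so that wim is their horizontal concatenation). It simply runs Euler steps to late times and is used only to read off candidate steady states.

```python
import numpy as np

def simulate_steady_state(W_rec, W_in, x, tau=1.0, dt=0.01, T=200.0):
    """Euler-integrate the threshold-linear rate dynamics of Eq. (6),
        tau * dy/dt = -y + Phi(W_rec @ y + W_in @ x),
    for a constant input pattern x, and return the late-time activity of the
    driven population.  tau may be a scalar or a length-D array of time constants.
    This sketch does not detect oscillation or divergence."""
    phi = lambda a: np.maximum(a, 0.0)
    y = np.zeros(W_rec.shape[0])
    for _ in range(int(T / dt)):
        y = y + (dt / tau) * (-y + phi(W_rec @ y + W_in @ x))
    return y
```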
B. Decomposing a recurrent network into 𝒟 feedforward networks
Our goal is to find features of the synaptic weight matrix that are required for the stimulus transformation discussed above. For notational simplicity, let us consider the case where we potentially have all-to-all connectivity, so that 𝒩 = 𝒟 + ℐ, but we will later explain how our arguments generalize. Since all time-derivatives are zero at steady-state, the response properties provide 𝒟 × 𝒫 nonlinear equations for 𝒟 × 𝒩 unknown parameters3:
yμi = Φ( Σ_{j=1..𝒟} wij yμj + Σ_{m=1..ℐ} wi,𝒟+m xμm ),  i = 1, …, 𝒟,  μ = 1, …, 𝒫.  (7)
Inspection of the above equation, however, reveals that each neuron’s steady-state activity depends only on a single row of the connectivity matrix [Fig. 3(c)]; the responses of the ith driven neuron, {yμi, μ = 1, …, 𝒫}, are only affected by its incoming synaptic weights, {wim, m = 1, …, 𝒩}. Thus, the above equations separate into 𝒟 independent sets of equations, one for each driven neuron. In other words, we now have to solve 𝒟 feedforward problems, each of which will characterize the incoming synaptic weights of a particular driven neuron, which we term the target neuron. Note that since a generic target neuron receives signals from both the input and the driven populations, the activities of both input and driven neurons serve to produce the presynaptic input patterns that drive the responses of the target neuron in the reduced feedforward problem.
C. Solution space for feedforward networks
We have just seen how we can solve the problem of finding synaptic weights consistent with steady-state responses of a recurrent population of neurons, provided we know how to solve the equivalent problem for feedforward networks. Accordingly, we will now focus on a feedforward network, where a single target neuron, y, receives inputs from 𝒩 neurons {xm; m = 1, …, 𝒩}, to find the ensemble of synaptic weights that reproduce this target neuron’s observed responses. The constraint equations are
yμ = Φ( Σ_{m=1..𝒩} xμm wm ),  μ = 1, …, 𝒫,  (8)
where yμ now stands for the activity of the target neuron driven by the μth input pattern, and w⃗ is the 𝒩-vector of synaptic weights onto the target neuron. Assuming that the 𝒫 × 𝒩 matrix x is rank 𝒫, we let the 𝒩 × 𝒩 matrix X be rank 𝒩 with Xμm = xμm for μ = 1, …, 𝒫. This implies that the last 𝒩 − 𝒫 rows of X span the null space of x, and X defines a basis transformation on the weight space,
ημ = Σ_{m=1..𝒩} Xμm wm,  μ = 1, …, 𝒩.  (9)
The 𝒩 linearly independent columns of X−1 define the basis vectors corresponding to the η coordinates,
(v⃗μ)m = (X−1)mμ,  μ, m = 1, …, 𝒩.  (10)
In other words,
v⃗μ = Σ_{m=1..𝒩} (X−1)mμ êm,  (11)
where {êm} is the physical orthonormal basis whose coordinates, {wm}, correspond to the material substrates of network connectivity. These basis vectors can be obtained from the {v⃗μ} by an inverse basis transformation:
êm = Σ_{μ=1..𝒩} Xμm v⃗μ.  (12)
We can thus write any vector of incoming weights as
w⃗ = Σ_{μ=1..𝒩} ημ v⃗μ.  (13)
In terms of η coordinates, the nonlinear constraint equations take a rather simple form:
yμ = Φ(ημ),  μ = 1, …, 𝒫.  (14)
Accordingly, η coordinates succinctly parametrize the solution space of all weight matrices that support the specified fixed points [Fig. 3(d)]. Each η dimension can be neatly categorized into one of three types. First, for each stimulus condition μ where yμ > 0, we must have ημ > 0. This in turn implies that Φ(ημ) = ημ = yμ. Because the coordinate ημ must adopt a specific value to generate the transformation, we say that μ defines a constrained dimension. We denote the number of constrained dimensions as 𝒞 ⩽ 𝒫. Second, note that the threshold in the transfer function implies that Φ(a) = 0 for all a ⩽ 0. Therefore, for any stimulus condition such that yμ = 0, we have a solution whenever ημ ⩽ 0. Because positive values of ημ are excluded but all negative values are equally consistent with the transformation, we say that μ defines a semiconstrained dimension. We denote the number of semiconstrained dimensions as 𝒮 = 𝒫 − 𝒞. Finally, we have no constraint equations for ημ if μ = 𝒫 + 1, …, 𝒩. Because all positive or negative values of ημ are equally consistent with the stimulus transformation, we say that μ defines an unconstrained dimension. We denote the number of unconstrained dimensions as 𝒰 = 𝒩 − 𝒫. Altogether, the stimulus transformation is consistent with every incoming weight vector that satisfies
ημ = yμ for the 𝒞 constrained dimensions (yμ > 0, μ ⩽ 𝒫),  ημ ⩽ 0 for the 𝒮 semiconstrained dimensions (yμ = 0, μ ⩽ 𝒫),  and ημ arbitrary for the 𝒰 unconstrained dimensions (μ = 𝒫 + 1, …, 𝒩).  (15)
Note that one can enumerate the solutions in the physically meaningful w coordinates by simply applying the inverse basis transformation in Eq. (9) to any solution found in η coordinates.
Going forward, it will be convenient to extend the 𝒫-dimensional vector of target neuron activity to an 𝒩-dimensional vector whose components along the unconstrained dimensions are equal to zero, because this will allow us to compactly write equations in terms of dot products between the activity vector and vectors in the 𝒩-dimensional weight space. Rather than introducing a new notation for this extended 𝒩-dimensional vector, we simply write with yμ = 0 for μ = 𝒫 + 1, …, 𝒩. It is critical to remember that this is merely a notational convenience, and the solution space distinguishes between semiconstrained dimensions and unconstrained dimensions according to Eq. (15). In particular, yμ = 0 is a constraint equation for semiconstrained dimensions, but yμ = 0 is a notational convenience for unconstrained dimensions.
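The construction above is straightforward to implement. The sketch below (our code; it assumes the 𝒫 rows of x are linearly independent) extends x to a full-rank X using a null-space basis from the SVD and then draws weight vectors that satisfy Eq. (15) by choosing η coordinates dimension by dimension and mapping back with Eq. (13).

```python
import numpy as np

def extended_basis(x):
    """Extend the P x N pattern matrix x to an invertible N x N matrix X whose
    last N - P rows span the null space of x (Sec. III C).  Returns X and X^{-1}."""
    P, N = x.shape
    _, _, Vt = np.linalg.svd(x)            # rows Vt[P:] form an orthonormal null-space basis
    X = np.vstack([x, Vt[P:]])
    return X, np.linalg.inv(X)

def sample_solution(x, y, W=1.0, seed=0):
    """Draw one incoming weight vector consistent with Eq. (15): eta_mu = y_mu on
    constrained dimensions, eta_mu <= 0 on semiconstrained dimensions, and eta_mu
    free on unconstrained dimensions.  The spherical bound ||w|| <= W is enforced
    by rejection, so a solution requires ||y|| <= W."""
    rng = np.random.default_rng(seed)
    P, N = x.shape
    X, X_inv = extended_basis(x)
    while True:
        eta = np.empty(N)
        eta[:P] = np.where(y > 0, y, -np.abs(rng.normal(0.0, W, P)))
        eta[P:] = rng.normal(0.0, W, N - P)
        w = X_inv @ eta                    # back to physical coordinates, Eq. (13)
        if np.linalg.norm(w) <= W:
            return w
```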
D. Back to the recurrent network
To understand how the solution space geometry of the feedforward network can be translated back to the recurrent network, it is useful to group together the steady-state activities of all input and driven neurons that are presynaptic to the ith driven neuron as a 𝒫 × 𝒩 input pattern matrix, z(i).4 The entries of the matrix, z(i)μm, correspond to the responses of the mth presynaptic neuron to the μth stimulus. At this point it is easy to see that when biological constraints dictate that some of the synapses are absent, then one should just exclude those presynaptic neurons when constructing z(i), such that the m index excludes those presynaptic neurons. Similarly, by a suitable reordering, which will depend on the driven neuron, we can always ensure that m = 1, …, 𝒩 runs only over the neurons that are presynaptic to the given driven neuron.
Once the input patterns feeding into the ith neuron are known, we can follow the steps outlined in the previous subsection to define the 𝒩 × 𝒩 full rank extension of z(i), Z(i), and the η(i) coordinates via
η(i)μ = Σ_{m=1..𝒩} Z(i)μm wim.  (16)
The nature of the coordinates, that is whether they are constrained, semiconstrained, or unconstrained, is determined by how the ith neuron responded to the stimulus conditions, as in Eq. (15). Repeating this process for all driven neurons provides a geometric characterization of the entire recurrent network solution space, which involves all elements of the synaptic weight matrix, wim.
An important special case is all-to-all network connectivity. In this case, the Z(i) matrices are the same for all driven neurons, and therefore the directions corresponding to the η coordinates are also preserved.5 In particular, the orientation of the unconstrained subspace with respect to the physical basis does not change from one driven neuron to another. However, how a given driven neuron responds to a particular stimulus determines whether the corresponding η direction is going to be constrained or semiconstrained for the feedforward network associated with that driven neuron.
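In code, assembling the presynaptic pattern matrix z(i) for each driven neuron is a bookkeeping step; a sketch is given below (ours; Y holds the 𝒫 × 𝒟 driven responses, X_in the 𝒫 × ℐ input responses, and any known-absent synapses would simply be dropped as additional excluded columns).

```python
import numpy as np

def presynaptic_patterns(Y, X_in, i, allow_self=True):
    """Build the P x N matrix z^(i) of steady-state activities presynaptic to
    driven neuron i (Sec. III D): all driven-neuron responses (optionally without
    the self-synapse column i) concatenated with all input-neuron responses."""
    D = Y.shape[1]
    cols = [j for j in range(D) if allow_self or j != i]
    return np.hstack([Y[:, cols], X_in])
```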
IV. CERTAIN SYNAPSES IN ILLUSTRATIVE 3D EXAMPLES
Although we have found infinitely many weight matrices that produce a given stimulus transformation, it is nevertheless possible that the solutions imply firm anatomical constraints (e.g., Sec. II). In this paper we focus on finding synapses that must be nonzero in order for the response patterns to be fixed points of the neural network dynamics. We refer to such synapses as certain, because the synapse must exist in the model, and its sign is identifiable from the response patterns. It is clear from the geometry of the solution space that the relative orientations between the η coordinates and the physical w coordinates are significant determinants of synapse certainty. To build quantitative intuition for how the solution space geometry precisely determines synapse certainty, we begin by first analyzing a few illustrative toy problems. In the next section we will describe the more general treatment of high-dimensional networks. Importantly, we select and parametrize each toy problem to introduce concepts and notations that will reappear in the general solution.
More specifically, we first consider three feedforward examples with 𝒩 = 3 [Fig. 4(a)]. The first two examples have 𝒫 = 3, and the third has 𝒫 = 2. In the first example, we will assume that the driven neuron does not respond to the first two stimulus patterns, but responds positively to the third pattern. So we have two semiconstrained and one constrained dimension,
y1 = y2 = 0 and y3 > 0,  so that  η1 ⩽ 0,  η2 ⩽ 0,  η3 = y3.  (17)
In contrast, in the second example we will have two constrained and one semiconstrained dimension,
y1 > 0, y2 > 0, and y3 = 0,  so that  η1 = y1,  η2 = y2,  η3 ⩽ 0.  (18)
The final example will feature one unconstrained, one semiconstrained, and one constrained dimension,
y1 = 0 and y2 > 0,  so that  η1 ⩽ 0,  η2 = y2,  and η3 is unconstrained.  (19)
For technical simplicity we will consider orthonormal input patterns, X−1 = XT , which implies that
Σ_{m=1..𝒩} Xμm Xνm = δμν,  (20)
where δμν is the Kronecker δ function, which equals 1 if μ = ν and 0 if μ ≠ ν, so v⃗μ · v⃗ν = δμν. This trivially implies that the η coordinates are related to the synaptic coordinates via a rotation, so the spherical biological bound on the physical coordinates transforms to an identical spherical bound on the η coordinates:
Σ_{μ=1..𝒩} ημ² = Σ_{m=1..𝒩} wm² ⩽ W².  (21)
A. Problem 1
Let us first focus on the example with two semiconstrained and one constrained dimension, whose solution space is depicted in deep yellow in Fig. 4(b). Suppose we are interested in assessing whether the w1 synapse is certain. Since the w1 = 0 plane divides the weight space into the positive and the negative halves, the synapse will be certain if this plane does not intersect with the solution space, which clearly depends on the orientation of the plane relative to the various η directions [Fig. 4(b)]. It is thus useful to consider how the w1 = 0 plane’s unit normal vector pointing toward positive weights, , is oriented relative to the η directions. For ease of graphical illustration, here we assume the specific orientation diagramed in Figs. 4(b) and 4(c). Using Eq. (12) and the orthogonality of X, we can parametrize as
(22) |
where
(23) |
[Figs. 4(b) and 4(c)]. Geometrically, and are unit vectors along the projections of onto the constrained and semiconstrained subspaces [Fig. 4(b)]. Thus, cos θ ⩾ 0 and sin θ ⩾ 0, making θ an acute angle. In this example, γ is also an acute angle, as depicted in Fig. 4(c).
Note that all solutions lie within the two-dimensional semiconstrained subspace having η3 = y3. The w1 = 0 plane intersects this semiconstrained subspace as a line [Figs. 4(b) and 4(c)], and its equation in η coordinates is
(24) |
From the geometry of the problem [Fig. 4(c)], it is clear that if the perpendicular distance, ds, from the origin to this line is large enough, then it will not intersect the all-negative quadrant of the semiconstrained subspace within the weight bound. According to simple trigonometry, this occurs when
(25) |
where is the radius of the semiconstrained subspace containing the solutions. The perpendicular distance can be identified from Eq. (24) as
(26) |
Substituting this expression for ds into Eq. (25), one finds through simple algebra that the w1 = 0 hyperplane does not intersect the solution space, and hence the synapse is certain, if the response magnitude exceeds a critical value,
(27) |
which we generally refer to as y-critical.
Notice that if θ increases in Fig. 4(b), then the orange line in Fig. 4(c) comes closer to the origin, making it intersect with the solution space for more γ angles. Therefore, the synapse is more difficult to identify, and indeed Eq. (27) shows that ycr increases. However, if γ increases, then the orange line in Fig. 4(c) rotates away from the solution space, making the synapse easier to identify with small ds. Accordingly, ycr decreases.
It will turn out that the concept of y-critical is general, and ycr can always be expressed in terms of projections of ê along several specific directions. In this example, if we define es* and ey to be projections of along, and respectively, then it is easy to check that one can re-express ycr as
(28) |
We will later discover that these projections are closely related to correlations between pre-synaptic and postsynaptic neuronal activity patterns. Thus, the expressions in Eq. (28) will provide a deeper understanding of the determinants of synapse certainty.
B. Problem 2
Having identified two key angles, θ and γ, that play a role in synapse certainty, let us look at the example of two constrained and one semiconstrained dimension to uncover other important geometric quantities. In this case, the solution space is a ray defined by η1 = y1, η2 = y2, and −∞ < η3 ⩽ 0, and the magnitude of η3 is at most
|η3| ⩽ √(W² − y1² − y2²)  (29)
for solutions within the weight bound [Fig. 4(d)]. Figure 4(d) shows a geometry where the w1 = 0 plane intersects the solution space at the point
(30) |
Now we must have
(31) |
as the intersection point lies on the w1 = 0 plane by definition, where we have defined as in the previous toy problem. The projection directions of onto the constrained and semiconstrained subspaces are given by
(32) |
[Fig. 4(d)]. Then combining Eqs. (22) and (32), we can find an equation to determine η3 at the intersection point
(33) |
We next introduce α to represent the angle between and [Fig. 4(e)], such that
(34) |
where . The first two terms in Eq. (33) can then be trigonometrically combined with a difference of angles identity to arrive at
(35) |
To be able to identify the sign of w1, this intersection point must lie beyond the weight bounds of the solution line segment, so its η3 component must exceed the bound in Eq. (29) in magnitude. After some straightforward algebra we obtain the certainty condition as
(36) |
From the geometry of the problem in Figs. 4(d) and 4(e), one sees that as θ or α increases, the point where the orange hyperplane intersects the yellow line is closer to the origin. Indeed, ycr increases, making it more difficult to identify the synapse sign. Again, one can re-express ycr as Eq. (28) in terms of projections, with the role of being played by .
C. Problem 3
Through the two above examples we found three angles, θ, α, and γ, that determine how large the response of the driven neuron has to be in order for a given synapse to be certain. However, in both examples the number of patterns was equal to the number of synapses, 𝒫 = 𝒩. When 𝒫 < 𝒩, we have unconstrained dimensions, and the projection of the vector into the unconstrained subspace will also matter, because it relates to how much we do not know about the response properties of the driven neuron.
Here we consider an 𝒩 = 3 example with one constrained, one semiconstrained, and one unconstrained dimension [Fig. 4(f)]. In this case, we can express the synaptic direction as a linear combination of its projections along the constrained, semiconstrained, and unconstrained dimensions as
(37) |
where we can always choose the directions of the unit vectors to make θ and φ acute angles. For the example shown in Fig. 4(f), this is achieved by choosing
(38) |
Obtaining the certainty condition again involves ascertaining whether the w1 = 0 hyperplane intersects the deep yellow solution space [Fig. 4(f)]. In the example of Fig. 4(f), one can see that increasing the driven neuron response moves the yellow plane up, and there will come a critical point when the orange w1 = 0 plane just touches the solution space at the corner (η1 = 0, η2 = ycr, η3). Thus,
(39) |
Since this corner point has a negative η3 component and lies on the bounding sphere, we must also have
(40) |
[Fig. 4(f)]. Substituting in the w1 = 0 plane equation,
(41) |
we can then determine ycr through simple algebra as
(42) |
The final result now depends on the two acute orientation angles, θ and φ. By inspection of Fig. 4(f) or Eq. (42), it is clear that ycr increases if either θ or φ increases toward π/2. One therefore needs a larger response (y2) to make the synapse certain. We can again express ycr in terms of projections
(43) |
where eu is the projection of along , and es* does not appear because the intersection occurred at the origin of the semiconstrained subspace.
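Each of the three toy problems can also be checked numerically without the closed-form expressions for ycr. The sketch below (our code) samples the solution space directly in η coordinates according to Eq. (15), maps back to physical weights using the orthogonality of X, and reports the range of w1; the synapse is certain exactly when that range excludes zero.

```python
import numpy as np

def w1_range(X, y, P, W=1.0, n=500_000, seed=0):
    """Brute-force certainty check for the N = 3 examples of Sec. IV.
    X: 3 x 3 orthogonal pattern matrix; y: the P specified target responses.
    Samples eta per Eq. (15), applies the spherical bound of Eq. (21), and maps
    back with w = X^T eta (valid because X is orthogonal).  Returns (min, max)
    of the first synaptic weight over the sampled solutions."""
    rng = np.random.default_rng(seed)
    eta = rng.uniform(-W, W, size=(n, 3))          # covers unconstrained dimensions
    for mu in range(P):
        if y[mu] > 0:
            eta[:, mu] = y[mu]                      # constrained: eta_mu = y_mu
        else:
            eta[:, mu] = -np.abs(eta[:, mu])        # semiconstrained: eta_mu <= 0
    eta = eta[np.linalg.norm(eta, axis=1) <= W]     # weight bound, Eq. (21)
    w1 = eta @ X[:, 0]                              # w1 = sum_mu eta_mu X_mu1
    return w1.min(), w1.max()
```

For example, problem 1 corresponds to P = 3 and y = (0, 0, y3); scanning y3 and recording when the returned range stops straddling zero gives a numerical estimate of ycr for comparison with Eq. (27).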
V. CERTAIN SYNAPSES, THE GENERAL TREATMENT
A. High-dimensional feedforward networks
We have seen in the previous section how geometric considerations can identify synapses that must be present to generate observed response patterns in small networks. One can similarly ask when a synapse is required in high-dimensional networks [Fig. 5(a)].
Although the rigorous derivation is intricate, this certainty condition is remarkably simple for orthonormal X (Appendix A). Quantitatively, orthonormality of X implies that only a few parameters matter for the certainty condition, each illustrated in the previous section and abstractly summarized in Fig. 5(b).
For any given synapse, its physical basis vector, , can always be written as a sum of components in the constrained, semiconstrained, and unconstrained subspaces,
(44) |
where , and denote the partial sums over μ in the constrained, semiconstrained, and unconstrained subspaces, respectively. Note that are orthogonal unit vectors if and only if X is an orthogonal matrix. In this case, the decomposition of is a sum of three orthogonal vectors that can be parameterized by two angles,
(45) |
where , , and are unit vectors in the constrained, semiconstrained, and unconstrained subspaces, and (θ, ϕ) are spherical coordinates6 specifying the orientation of with respect to these subspaces [e.g., Fig. 4(f)]. In particular,
(46) |
As we have seen in the toy examples, these two orientation angles heavily influence whether the synapse is certain.
Additionally, because the solution space’s height along [e.g., Fig. 4(b)] is controlled by the angle between and , the equation for the wm = 0 hyperplane that divides the positive and negative synaptic regions in the solution space depends on
(47) |
where y is the length of and α is the angle between and [Fig. 4(e)]. Finally, there is another critical angle, which we call γ, that encodes how is oriented with respect to the solution space in the semiconstrained subspace. Using a more convenient direction, , which is either along or opposite to the direction, we define γ to be the minimal angle between and the solution space [e.g., Fig. 4(b)]. It is generally given by
(48) |
(Appendix A), where is the μth component of , and we have suppressed m to avoid cluttered notation. Although this definition and equation for γ may initially appear opaque, we soon clarify its meaning in terms of interpretable projections of the synapse vector.
Putting all the pieces together, we find that the mth synapse must be present, and its sign is unambiguous, if and only if y exceeds the critical value
(49) |
(Appendix A). Intuitively, W bounds the magnitude of weight vectors, and large W increases ycr by admitting more solutions. Note that a synapse is certain, for a given y, when the weight bound is less than a critical value,
(50) |
Finally, we note that we must have W ⩾ y for any solutions to exist. One can straightforwardly obtain the special cases Eqs. (27), (36), and (42), by substituting α = φ = 0, γ = φ = 0, and α = cosγ = 0 in the general expression given by Eq. (49).
The geometric description of Eq. (49) can be written more intuitively as
(51) |
(Appendix A), where is the unit vector in the solution space that is most aligned with [e.g., Fig. 4(b)], and ey, es*, and eu are the projections of onto , , and [Fig. 5(b)]. Indeed, Eqs. (28) and (43) can be readily recognized as special cases of the above general expression.
Each of these projections is interpretable in light of the fact that xμm represents the activity level of the mth presynaptic neuron in the μth response pattern. Most simply,
(52) |
is a normalized correlation of the pre- and postsynaptic activity. As expected, synapse certainty is aided by large magnitudes of ey. Moreover, the sign of a certain synapse is the sign of this correlation, or equivalently the sign of ey. Synapse sign identifiability is hindered by large values of
(53) |
which effectively measures the weakness of the presynaptic neuron’s activity, as it is the amount of presynaptic drive for which we do not have any information on the target neuron’s response. The more subtle quantity is
(54) |
The condition that selects for patterns where the sign of the presynaptic activity is Sgn(cosα) = Sgn(ey), but the postsynaptic neuron does not respond. In other words, presynaptic activity should have promoted a response in the target neuron according to the observed activity correlation. That it does not generates uncertainty in the sign of the synapse. See Appendix A for a heuristic derivation of ycr based on this argument.
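To make the projections concrete, the following sketch computes ey and eu directly from the activity patterns, following our reading of Eqs. (52) and (53) for orthonormal input patterns (the formulas here paraphrase the verbal definitions and are not copied from the paper): ey is the presynaptic-postsynaptic activity correlation normalized by the response norm, and eu is the norm of the synaptic direction's component in the unconstrained subspace. The subtler quantity es* requires the sign bookkeeping of Eq. (54) and is not attempted here.

```python
import numpy as np

def ey_eu(x, y, m):
    """x: P x N matrix of (orthonormal) presynaptic patterns, y: length-P vector
    of target responses, m: presynaptic neuron index.  Returns (ey, eu) as we
    read their definitions: ey = correlation of x[:, m] with y, normalized by
    ||y||; eu^2 = 1 - sum_mu x[mu, m]^2, the squared projection of the synaptic
    direction onto the unconstrained subspace."""
    ey = float(x[:, m] @ y) / np.linalg.norm(y)
    eu = np.sqrt(max(0.0, 1.0 - float(np.sum(x[:, m] ** 2))))
    return ey, eu
```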
We can gain more useful intuition by interpreting our result in relation to what we would obtain in a linear neural network. In the linear problem, there are only constrained and unconstrained dimensions; every dimension that was semiconstrained in the nonlinear problem becomes constrained, with all solutions having ημ = yμ for μ = 1, …, 𝒫. This implies that
(55) |
Returning to the nonlinear problem, recall that the certainty condition finds the largest y for which the solution space and wm = 0 hyperplane intersect within the weight bound, and this intersection is simply a point when y = ycr. Importantly, each semiconstrained dimension can either behave like a linear constrained dimension with ημ = yμ = 0 at this intersection point (toy problems 1 and 3), or like an unconstrained dimension with ημ < 0 at the intersection point (toy problems 1 and 2).7 The first case occurs when and have opposite signs and ; the second case occurs when they have the same sign and . This means that one could compute the nonlinear theory’s y-critical from ycr,lin by appending the second class of semiconstrained dimensions onto the unconstrained dimensions. Mathematically, this corresponds to the replacement
(56) |
which indeed transforms Eq. (55) to Eq. (51). The role of es* is to quantify the uncertainty introduced by the subset of semiconstrained dimensions that do not behave as constrained at the intersection point.
Since the parameters ey,es*, and eu cannot be set independently, it is convenient to reparameterize Eq. (51) as
(57) |
where , and all three composite parameters can be independently set between 0 and 1. Conceptually, ry and rs* merely normalize ey and es* by their maximal values, and ep is the projection of into the activity-constrained subspace spanned by both constrained and semiconstrained dimensions. One could also interpret rs* as quantifying the effect of threshold nonlinearity. For instance, rs* = 0 describes the case where all semiconstrained dimensions are effectively constrained, but rs* increases as some of the semiconstrained dimensions start to behave like unconstrained dimensions. As expected, ycr is a decreasing function of and and an increasing function of [Fig. 5(c)].
B. Regarding nonorthogonal input patterns
While a complete treatment of the certainty condition for generally correlated input patterns is beyond the scope of this paper, we could find a conservative bound for y-critical that may be useful when patterns are close to being orthogonal. The details of the derivation are discussed in the final subsection of Appendix A.
The major challenge caused by nonorthogonal patterns is that the spherical weight space becomes elliptical in terms of the η coordinates. Thus, the main idea behind the bound is that one can always find the sphere that just encompasses this ellipse. We can then use our formalism to obtain a conservative y-critical, such that if the norm of is larger than this value then all solutions within the encompassing sphere have a consistent sign for the synapse under consideration. An interesting insight that emerges from our analysis is that the relative orientations between
(58) |
and the various important η directions play the role of θ, φ, α and γ (Appendix A). Note that when X is an orthogonal matrix. We anticipate that will also be an important player in a more comprehensive treatment of nonorthogonal patterns.
C. Application to recurrent networks
As we explained in Sec. III, to find the ensemble of all incoming weight vectors onto the ith driven neuron, one can use the results obtained for the feedforward network and just substitute X with the Z(i) matrix. Consequently, identifying certain synapses onto the ith neuron can follow the route outlined for the feedforward scenario as long as Z(i) is orthogonal. So, for example, if we want to ascertain whether any incoming synapse to the ith neuron is certain, we have to replace xμm → z(i)μm and yμ → yμi in Eqs. (51)–(54) to compute ycr.
D. Numerical illustration of the certainty condition
To illustrate and test the theory numerically, we first considered a small neural network of three input neurons and three driven neurons [Fig. 6(a)]. This small number of synapses meant that we could comprehensively scan the entire spherical weight space without relying on a numerical algorithm to find solutions.8 This is important because numerical techniques, such as gradient descent learning, potentially find a biased set of solutions that incompletely test the theory. We supposed that each driven neuron has three inputs, and we constrained weights with two orthonormal stimulus responses. We set W = 1 for all simulations and numerically screened weights randomly. See Appendix F for complete simulation details.
The first driven neuron in Fig. 6(a), y1, receives only feedforward drive, and we suppose that it responds to one stimulus condition with response y (μ = 2), but it does not respond to the other (μ = 1). Its synapses thus have one constrained, one semiconstrained, and one unconstrained dimension, and all of the terms in Eq. (49) contribute to y-critical. We could thus use y1 to verify Eq. (49). Moreover, this scenario includes the illustrative example of Fig. 4(f) as a special case, so we could also use y1 to verify Eq. (42).
To these ends, we decided to focus on a two-parameter family of input patterns,
(59) |
where rows correspond to different input patterns and columns correspond to different input neurons, as usual, and we extend x to the full-rank orthogonal matrix
(60) |
By Eq. (44), the physical basis vector corresponding to the synapse from the first input neuron is thus
(61) |
and it has the same general form as Eqs. (37) and (45), where plays the role of . If ψ and χ are both acute, then one can identify them with θ and φ in Fig. 4(f), and the roles of and are played by and , respectively. In this case α = 0, cosγ = 0, and the theoretical dependencies of ycr on θ and φ are given by Eq. (42). Figure 6(b) illustrates these dependencies as the purple and dark green curves. If ψ is acute, but χ is obtuse, then according to our conventions, θ = ψ, φ = π – χ, , and . Now α = 0 and cosγ = 1, and our general formula, Eq. (49), implies
(62) |
These dependencies are plotted as the pink and the light green curves in Fig. 6(b). We do not plot cases where ψ is obtuse, because obtuse and acute ψ result in equivalent ycr formulas. Whether ψ is acute or obtuse nevertheless matters because it determines the sign of the w1 synapse when it is certain.
The black dots in Fig. 6(b) show the largest response magnitude, y, for which we numerically found solutions with both positive and negative w1 (see Appendix F for numerical methods), thereby providing a numerical estimate of ycr. The theoretical curves and numerical points precisely aligned in all cases. The differences between the light and dark theoretical curves illustrate the effect of nonlinearity. When χ is obtuse, the semiconstrained dimension effectively behaves as unconstrained, and the mixing angle between the semiconstrained and unconstrained dimension is irrelevant to y-critical. When χ is acute, the semiconstrained dimension effectively behaves as constrained, as if its coordinate were set to zero. Moreover, these results confirmed that stronger responses were needed to fix synapse signs when the synaptic direction was less aligned with the constrained dimension [Fig. 6(b), purple and pink]. Furthermore, smaller y-critical values occurred when the synaptic direction was anti-aligned with the semiconstrained dimension [Fig. 6(b), purple versus pink, dark green versus light green].
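A numerical estimate of y-critical of the kind plotted as black dots in Fig. 6(b) can be obtained by repeating the random screen at many response magnitudes; a sketch is below (our code, with hypothetical parameter names). For each y it samples the solution space in η coordinates and records whether both signs of the chosen synapse are present; the largest such y estimates ycr.

```python
import numpy as np

def estimate_ycr(X, P, constrained, syn=0, W=1.0, n=200_000, n_grid=100, seed=0):
    """X: N x N orthogonal pattern matrix; 'constrained' lists the pattern indices
    with nonzero response, all given the same magnitude y; the remaining mu < P
    are semiconstrained and mu >= P are unconstrained.  Returns the largest
    scanned y for which randomly screened solutions still contain both signs of
    synapse 'syn'."""
    rng = np.random.default_rng(seed)
    N = X.shape[0]
    ycr_est = 0.0
    for y in np.linspace(0.01, W, n_grid):
        eta = rng.uniform(-W, W, size=(n, N))
        for mu in range(P):
            eta[:, mu] = y if mu in constrained else -np.abs(eta[:, mu])
        eta = eta[np.linalg.norm(eta, axis=1) <= W]
        if eta.shape[0] == 0:
            continue
        w_syn = eta @ X[:, syn]
        if w_syn.min() < 0.0 < w_syn.max():      # both signs present: not yet certain
            ycr_est = y
    return ycr_est
```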
We next wanted to check the validity of our results for the recurrently connected neurons in Fig. 6(a). We therefore needed to tailor the steady-state activity levels of the recurrent network to result in orthogonal presynaptic input patterns for each driven neuron. In mathematical terms, Z(i) must be an orthogonal matrix for i = 1, 2, 3. We achieved this by considering a two-parameter family of driven neuronal responses in which the activity patterns of y2 and y3 were matched to those of x1 and x3, respectively. This construction means that all three driven neurons receive the same input patterns. To ensure positivity of driven neuronal responses, we set χ as an acute angle and ψ as the negative of an acute angle.
Although y2 has both feedforward and recurrent inputs, we can analyze its connectivity in exactly the same way as y1. Recurrence only complicates the analysis for neurons that synapse onto themselves, like y3, since changing the output activity also changes the input drive. So y and ycr are not independent. Here we focused on the certainty condition for the self-synapse, wy3,y3, for which ycr = cos χ and y = sin χ. Therefore, the synapse should be certain if 45° < χ ⩽ 90°. Since θ = π/2 − χ according to our conventions,9 this is equivalent to 0 ⩽ θ < 45° [Fig. 6(c), top]. Our numerical results precisely recapitulated these theoretical expectations [Fig. 6(c), bottom], as the self-connection was consistently positive across all simulations whenever this condition on θ was met. See Appendix E for certainty condition analyses for other synapses onto y3 and Appendix F for complete simulation details.
VI. ACCOUNTING FOR NOISE
A. Finding the solution space in the presence of noise
So far we have only considered exact solutions to the fixed point equations. However, it is also important to determine weights that lead to fixed points near the specified ones. For example, biological variability and measurement noise generally make it infeasible to specify exact biological responses. Furthermore, numerical optimization typically produces model networks that only approximate the specified computation. We therefore define the ℰ-error surface as those weights that generate fixed points a distance ℰ from the specified ones,
ℰ² = Σ_{μ=1..𝒫} Σ_{i=1..𝒟} (ỹμi − yμi)²,  (63)
where yμi is the specified activity of the ith driven neuron in the μth fixed point, and ỹμi is the corresponding activity level in the fixed point approached by the model network when it is initialized as yi(t = 0) = yμi. If the network dynamics do not approach a fixed point, perhaps oscillating or diverging instead [54], we say ℰ = ∞.
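For a feedforward target neuron, the error in Eq. (63) reduces to comparing Φ of the input drive with the specified responses; a minimal sketch is below (ours). For recurrent networks one would instead simulate the dynamics from each specified pattern, e.g., with a routine like the simulate_steady_state sketch above, before measuring the distance.

```python
import numpy as np

def feedforward_error(w, x, y):
    """Error of Eq. (63) for a single feedforward target neuron: the Euclidean
    distance between the specified responses y (length P) and the responses
    produced by weights w for the input patterns x (P x N)."""
    y_hat = np.maximum(x @ w, 0.0)
    return float(np.linalg.norm(y_hat - y))
```

In this form, the topological transitions discussed below are easy to probe numerically: once the allowed error reaches the smallest specified response, weight vectors whose drive for that pattern is negative begin to enter the ℰ-error surface.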
Each ℰ-error surface can be found exactly for feedforward networks. For illustrative purposes, let us first consider the 𝒟 = 1 feedforward scenario in which the driven neuron is active in every response pattern. This means that yμ > 0 for all μ = 1, …, 𝒫, and we can reorder the μ indices to sort the driven neuron responses in ascending order, 0 < y1 < y2 < ⋯ < y𝒫. Here we assumed that no two response levels are exactly equal, as is typical of noisy responses. Since all responses are positive, the zero-error solution space has no semiconstrained dimensions, and the only freedom for choosing w is in the 𝒰 = 𝒩 − 𝒫 unconstrained dimensions. Therefore, the zero-error surface of exact solutions, 𝒱0, is a 𝒰-dimensional linear subspace, and 𝒱0 is a point in the 𝒫-dimensional activity-constrained subspace.
How does this geometry change as we allow error? For 0 < ℰ < y1, we must have ỹμ > 0 for all μ. Therefore, the nonlinearity is irrelevant, and ℰ-error surfaces are spherical in the activity-constrained η coordinates [Eq. (63), Fig. 7(a(i))]. However, once ℰ = y1 it becomes possible that ỹ1 = 0, and suddenly a semi-infinite line of solutions appears with η1 ⩽ 0. As ℰ further increases, this line dilates to a high-dimensional cylinder [Fig. 7(a(ii))]. A similar transition happens at ℰ = y2, whereafter two cylinders cap the sphere [Fig. 7(a(iii))]. Things get more interesting as ℰ increases further because two transitions are possible. A third cylinder appears at ℰ′ = y3. However, at ℰ″ = √(y1² + y2²) it is possible for both ỹ1 and ỹ2 to be zero, and the two cylindrical axes merge into a semi-infinite hyperplane defined by η1 ⩽ 0, η2 ⩽ 0. Thus, when ℰ′ < ℰ″ the error surface grows to attach a third cylinder [Fig. 7(a(iv))], and when ℰ″ < ℰ′ the two cylindrical surfaces merge to also include planar surfaces in between [Fig. 7(a(v))]. These topological transitions continue by adding new cylinders and merging existing ones, and the sequence is easily calculable from {yμ}. Note that we use the terminology “topological transition” to emphasize that the structure of the error surface changes discontinuously at these values of error. The geometric transitions we observe here also relate to topological changes in a formal mathematical sense. For instance, while there are no noncontractible circles in Fig. 7(a(ii)), one develops as we transition to Fig. 7(a(iii)).
In general, yμ may also be zero or negative in the presence of noise. Whenever yμ = 0, the μth response pattern generates a semiconstrained dimension in 𝒱0. If some response levels are negative, then there are no exact solutions at all. However, it becomes possible to find solutions once ℰ is at least as large as the norm of the negative response values, and each response pattern associated with a negative yμ then acts as a semiconstrained dimension in 𝒱ℰ. As illustrated above, more semiconstrained dimensions open up as more error is allowed in each of these cases.
This geometry only approximates ℰ-error surfaces for recurrent networks (Appendix C). For instance, displacing yμi from its specified value changes the input patterns that define the η directions for downstream driven neurons, but this effect is neglected here. We will nevertheless find that this feedforward approximation to ℰ-error surfaces is practically useful for predicting synaptic connectivity in recurrent networks as well.
B. Predicting connectivity in the presence of noise
The threshold nonlinearity and error-induced topological transitions can have a major impact on synapse certainty [Fig. 7(b)]. For example, one might model a neuronal dataset with a linear neural network and find that models with acceptably low error consistently have positive signs for some synapses. However, if measured neural activity was sometimes comparable to the noise level, then semiconstrained dimensions could open up that suddenly make some of these synapse signs ambiguous [Fig. 7(b), left]. Although semiconstrained dimensions can never make an ambiguous synapse fully unambiguous, semiconstrained dimensions can heavily affect the distribution of synapse signs across the model ensemble by providing a large number of solutions that have consistent anatomical features [Fig. 7(b), right].
We therefore generalized the certainty condition to include the effects of error, including topological transitions in the error surface (Appendix C). As before, finding the certainty condition amounts to determining when the wm = 0 hyperplane intersects the solution space within the weight bound, but to account for noise of magnitude ε, we must now check whether an intersection occurs with any ℰ-error surface with ℰ ⩽ ε. No intersections will occur if and only if every nonnegative activity vector within ε of the provided 𝒫-vector of noisy target neuron activity [Fig. 3(c)] satisfies its zero-error certainty condition, as each such vector is a possible denoised version of the data [Eq. (63)]. We thus define y-critical in the presence of noise as the maximal ycr [Eq. (51)] among this set of candidate activity vectors.
Although we lack an exact expression for y-critical in the presence of noise, we derived several useful bounds and approximations (Appendix C). We usually focus on a theoretical upper bound for y-critical, ycr,max. Note that this upper bound suffices for making rigorous predictions for certain synapses, because y > ycr,max ⇒ y > y-critical. In the absence of topological transitions, this formula is
(64) |
We also computed a lower bound, ycr,min, to assess the tightness of the upper bound. This bound is
(65) |
without topological transitions. Both bounds increase with error and should be considered to be bounded above by W. As expected, both expressions reduce to Eq. (51) as ε/W → 0. We also note that the two bounds coincide, to leading order in ε/W, if ey ≪ max(es*, eu) and ep/max(es*, eu) = 𝒪(1), and we argue in Appendix B that this is typical when the network size is large.
The effect of topological transitions is that ycr,max and ycr,min become the maximums of several terms, each corresponding to a way that constrained dimensions could behave as semiconstrained within the error bound (Appendix C). We compute each term from generalizations of Eqs. (64) and (65) that account for the amount of error needed to open up semiconstrained dimensions.
C. Testing the theory with simulations
To examine our theory’s validity, we assessed its predictions with numerical simulations of feedforward and recurrent networks [Fig. 8(a)]. Each assessment used gradient descent learning to find neural networks whose late time activity approximated some specified orthogonal configuration of input neuron activity and driven neuron activity (Appendix F). We then used our analytically derived certainty condition with noise to identify a subset of synapses that were predicted to not vary in sign across the model ensemble (W = 1), and we checked these predictions using the numerical ensemble. We similarly checked predictions from simpler certainty conditions that ignored the nonlinearity or neglected topological transitions in the error surface (Appendix C). Note that we expected gradient descent learning to often fail at finding good solutions in high dimensions, as our theory predicts that each semiconstrained dimension induces local minima in the error surface [Fig. 7(a)]. Since we did not want the theory to bias our numerical verification of it, we focused our simulations on small to moderately sized networks, where we could reasonably sample the initial weight distribution randomly. Future work will consider more realistic neural network applications.
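The kind of gradient descent screen described above can be prototyped in a few lines; the sketch below (our code, not the paper's; the learning rate, step count, and projection onto the weight ball are illustrative choices) fits a single feedforward target neuron by subgradient descent on the squared error while enforcing the bound W by projection.

```python
import numpy as np

def fit_feedforward(x, y, W=1.0, lr=0.05, steps=20_000, seed=0):
    """Minimize sum_mu (Phi(w . x^mu) - y_mu)^2 over weight vectors with ||w|| <= W.
    x: P x N input patterns, y: length-P target responses."""
    rng = np.random.default_rng(seed)
    w = rng.normal(0.0, 0.1, x.shape[1])
    for _ in range(steps):
        drive = x @ w                            # input drive eta_mu for each pattern
        resid = np.maximum(drive, 0.0) - y       # Phi(drive) - y
        grad = x.T @ (resid * (drive > 0.0))     # subgradient (up to a factor of 2)
        w -= lr * grad
        nrm = np.linalg.norm(w)
        if nrm > W:
            w *= W / nrm                         # project back onto the weight ball
    return w
```

Running such a fit from many random initializations yields an ensemble of approximate solutions whose errors can then be evaluated with Eq. (63).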
We first considered feedforward network architectures, for which our analytical treatment of noise is exact. To illustrate how nonlinearity and noise affect synapse certainty, we calculated the magnitude of postsynaptic activity needed to make a particular synapse sign certain [Fig. 8(b)]. We specifically considered 10² random input-output configurations of a small feedforward network with 6 input neurons (𝒫 = 5, 𝒞 = 2), which were tailored to have orthonormal input patterns and generate one topological error surface transition at small errors. In particular, we generated random orthogonal matrices by exponentiating random antisymmetric matrices, we set one element of the target activity vector to a small random value to encourage the topological transition, and we ensured that its other nonzero random element was large enough to preclude additional transitions (Appendices C and F). For each input-output configuration, we then systematically varied the magnitude of driven neuron activity, y, finding 10⁵ synaptic weight matrices with moderate error, ℰ² ≈ ε², for each magnitude y. Since randomly screening a six-dimensional synaptic weight space is not numerically efficient, we applied gradient descent learning. Nevertheless, the small network size meant that we could comprehensively sample the solution space and numerically probe the distinct predictions made by each bound or approximation used to estimate y-critical.
As expected, the maximum value of y that produced numerical solutions with mixed synapse signs [Fig. 8(b), black dots] was always below the theoretical upper bound for y-critical [Fig. 8(b), black line]. In contrast, mixed-sign numerical ensembles were often found above theoretical y-critical values that neglected topological transitions in the error surface [Fig. 8(b), yellow line] or that neglected the nonlinearity entirely [Fig. 8(b), cyan line]. This means that these simplified calculations for estimating y-critical make erroneous predictions, because the synapse sign is supposed to be exclusively positive or negative whenever y exceeds y-critical, by definition. Therefore, we were able to accurately assess synapse certainty, and this generally required us to include both the nonlinearity and noise-induced topological transitions in the error surface.
We next asked how often we could identify certain synapses in larger networks. For this purpose, we generated 25 random input-output configurations in the feedforward setting (Appendix F), again with orthonormal input patterns, but this time we increased the number of input neurons from 4 to 100 across the configurations [Fig. 8(c)]. As we increased the size of the network, we kept 𝒞/𝒩 fixed at 0.25 and 𝒫/𝒩 fixed at 1 [Fig. 8(c), brown] or 0.5 [Fig. 8(c), purple]. These scaling relationships put our simulations in the setting of high-dimensional statistics [65], where both the number of parameters and the number of constraints increase with the size of the network. In this high-dimensional regime, a simple heuristic argument suggests that the number of zero-error certain synapses should scale linearly with the number of synapses (Appendix B), because ycr and the typical magnitude of y scale equivalently with 𝒩. Here we tested this prediction by setting the direction of the activity vector randomly, setting y = 1 − ln 2/𝒞 to approximate the median norm of vectors in the unit 𝒞-ball (Appendix B), and numerically finding a small error solution for each configuration (ℰ²/𝒫 ≈ 10⁻⁶).
As expected, we empirically found that the number of certain synapses predicted by the theory [Fig. 8(c), solid lines] scaled linearly with the network size [Fig. 8(c), dashed lines]. The jaggedness of the solid curves reflects the fact that each point is specific to the random input-output configuration constructed for that value of 𝒩. The purple curve corresponds to the case when 𝒩 = 2𝒫 = 4𝒞 and the brown curve when 𝒩 = 𝒫 = 4𝒞. Furthermore, for every certain synapse predicted, we verified that its predicted sign was realized in the numerical solution we found [Fig. 8(c), circles]. These results suggest that the theory will predict many synapses to be certain in realistically large neural systems.
Finally, we empirically tested our theory for a recurrent network [Figs. 8(d) and 8(e)], where our treatment of noise is only approximate. For this purpose, we considered networks without the self-coupling terms, 𝒩 = ℐ + 𝒟 − 1. We constructed a single random configuration with nonnegative driven neuron responses and orthogonal presynaptic patterns for one of the driven neurons (Appendix F). This driven neuron could thus serve as the target neuron for our analyses. Note that it is sometimes possible to orthogonalize the input patterns for more than one driven neuron, but this is irrelevant to our analysis and is not pursued here. We then used gradient descent learning to find around 4500 networks that approximated the desired fixed points with variable accuracy. For technical simplicity, we first found connectivity matrices using a proxy cost function that treated the network as if it were feedforward. We then simulated the neural network dynamics with these weights and correctly evaluated the model's error as prescribed by Eq. (63).
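The following sketch illustrates this two-step procedure in miniature: weights are first fit with a feedforward proxy cost, and the resulting recurrent network is then simulated to score its true fixed-point error. The leaky threshold-linear dynamics, learning rate, and network sizes written here are illustrative assumptions; they are not the exact forms of Eq. (6), Eq. (63), or the configuration of Appendix F.

```python
# Illustrative two-step fit: a feedforward proxy cost for the weights, followed
# by simulation of assumed leaky threshold-linear dynamics to score the true
# fixed-point error.  Sizes, learning rate, and dynamics are assumptions.
import numpy as np

rng = np.random.default_rng(1)
relu = lambda u: np.maximum(u, 0.0)

n_in, n_driven, n_pat = 8, 4, 6
x = rng.uniform(0.0, 1.0, size=(n_pat, n_in))       # input-neuron fixed points
y = rng.uniform(0.0, 1.0, size=(n_pat, n_driven))   # desired driven fixed points
z = np.hstack([y, x])                               # presynaptic activity (driven + input)

W = 0.01 * rng.normal(size=(n_driven, n_driven + n_in))
np.fill_diagonal(W[:, :n_driven], 0.0)              # no self-couplings

lr = 0.05
for _ in range(5000):                               # step 1: proxy feedforward cost
    u = z @ W.T                                     # P x D summed drive
    grad = ((relu(u) - y) * (u > 0)).T @ z / n_pat
    W -= lr * grad
    np.fill_diagonal(W[:, :n_driven], 0.0)

def fixed_point_error(W, x, y, dt=0.1, steps=3000):  # step 2: simulate the dynamics
    total = 0.0
    for mu in range(y.shape[0]):
        r = np.zeros(y.shape[1])
        for _ in range(steps):
            r += dt * (-r + relu(W @ np.concatenate([r, x[mu]])))
        total += np.sum((r - y[mu]) ** 2)
    return total

print("summed squared fixed-point error:", fixed_point_error(W, x, y))
```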
This network ensemble revealed that constrained and semiconstrained dimensions accurately explained the structure of the solution space for recurrent networks with nonzero error. Figure 8(d) shows the projection of the corresponding solution space along two η directions, one predicted to be constrained by the feedforward theory and the other predicted to be semiconstrained. As predicted, the extension of the solution space along the negative semiconstrained direction was clearly discernible. However, recurrence implies that the exact solution space is not perfectly cylindrical around the semiconstrained axes (Appendix C), because the driven neuron inputs to the target neuron can themselves vary due to noise. Here this effect was empirically insignificant, and the geometric structure of the solution space conformed rather well to our feedforward prediction. One might have expected the error [color in Fig. 8(d)] to increase monotonically as one moves away from the center of the semiconstrained cylinder, but this expectation is incorrect for two reasons. First, we are visualizing the error surface as a projection along two dimensions, yet variations in the other η coordinates add variation to the error. Second, we are visualizing the solution space for one target neuron, but other driven neurons in the recurrent network contribute to the summed error represented by the color.
Moreover, the theory correctly predicted how the number of certain synapses would decrease as a function of ε [Fig. 8(e)], and we never found a numerical violation of the theoretical certainty condition that included nonlinearity and noise. In Fig. 8(e), the yellow circles represent the number of certain synapses that were predicted by the theory and verified to have synapse signs that agreed with the theoretical prediction. Here accurate predictions did not require us to account for topological error surface transitions. In contrast, although our simulations usually agreed with the predictions of the linear theory [Fig. 8(e), cyan circles], they could also disagree. In Fig. 8(e), the blue crosses indicate configurations where the linear theory incorrectly predicted some synapse signs. The absence of red crosses reiterates the consistency of predictions coming from the nonlinear treatment.
VII. DISCUSSION
In summary, we enumerated all threshold-linear recurrent neural networks that generate specified sets of fixed points, under the assumption that the number of candidate synapses onto a neuron is at least the specified number of fixed points. We found that the geometry of the solution space was elegantly simple, and we described a coordinate transformation that permits easy classification of weight-space dimensions into constrained, semiconstrained, and unconstrained varieties. This geometric approach also generalized to approximate error-surfaces of model parameters that imprecisely generate the fixed points. We used this geometric description of the error surface to analyze structure-function links in neural networks. In particular, we found that it is often possible to identify synapses that must be present for the network to perform its task, and we verified the theory with simulations of feedforward and recurrent neural networks.
Rectified-linear units are also popular in state-of-the-art machine learning models [29,66–68], so the fundamental insights we provide into the effects of neuronal thresholds on neural network error landscapes may have practical significance. For example, machine learning often works by taking a model that initially has high error and gradually improving it by modifying its parameters along the direction of steepest error descent [69]. However, error surfaces with high error can have semiconstrained dimensions that abruptly vanish at lower errors (Fig. 7). Local parameter changes typically cannot move the model through these topological transitions, because models that wander deeply into semiconstrained dimensions are far from where they must be to move down the error surface. The model has continua of local and global minima, and the network needs to be initialized correctly to reach its lowest possible errors. This could provide insight into deep learning theories that view its success as a consequence of weight subspaces that happen to be initialized well [70,71].
The geometric simplicity of the zero-error solution space provides several insights into neural network computation. Every time a neuron has a vanishing response, half of a dimension remains part of the solution space, which the network could explore to perform other tasks. In other words, by replacing an equality constraint with an inequality constraint, simple thresholding nonlinearities effectively increase the computational capacity of the network [72,73]. The flexibility afforded by vanishing neuronal responses thereby provides an intuitive way to understand the impressive computational power of sparse neural representations [50,74–76]. Furthermore, the brain could potentially use this flexibility to set some synaptic strengths to zero, thereby improving wiring efficiency. This would link sparse connectivity to sparse response patterns, both of which are observed ubiquitously in neural systems.
Our theory could be extended in several important ways. First, we only derived the certainty condition to identify critical synapses from orthonormal sets of fixed points. Although our orthogonal analysis also provides a conservative bound for a general set of fixed points (Appendix A), a more precise analysis will be needed to pinpoint synapses in realistic biological settings where stimulus-induced activity patterns may be strongly correlated. Since our error surface description made no orthonormality assumptions, this analysis will only require more complicated geometrical calculations to discern whether the synapse sign is consistent across the space of low-error models. Furthermore, we could use the error surfaces to identify multisynapse anatomical motifs that are required for function, or to estimate the fraction of models in which an uncertain synapse is excitatory versus inhibitory. It would also be interesting to relax the assumption that the number of fixed points is small. This would allow us to consider scenarios where the fixed points can only be generated nonlinearly. We could also consider cases where no exact solution exists at all. Here we assumed that we knew the activity level of every neuron in the circuit. This is not always the case, and it will be important to determine how unobserved neurons alter the error landscape for synaptic weights connecting the observed neurons. The error landscape geometry will also be affected by recurrent network effects that we ignored here (Appendix C). It will be interesting to see whether the geometric toolbox of theoretical physics can provide insights into the nontrivial effects of unobserved neurons and recurrent network dynamics. Finally, we note that it will sometimes be important to analyze networks with alternate nonlinear transfer functions. Our analyses already apply exactly to recurrent networks with arbitrary threshold-monotonic nonlinear transfer functions (Appendix D). Moreover, our analyses can approximate any nonlinearity by treating its departures from threshold-linearity as noise (Appendix D). An extension to capped rectified linear units [67], which saturate above a second threshold, would also be straightforward. In particular, semiconstrained dimensions would emerge from any condition where the target neuron is inactive or saturated.
Our primary motivation for undertaking this study was to find rigorous theoretical methods for predicting neural circuit structure from its functional responses. This identification can be used to corroborate or broaden circuit models that posit specific connectivity patterns, such as center-surround excitation-inhibition in ring attractors [16–18] or contralateral relay neuron connectivity in zebrafish binocular vision [12,77]. More generally, if an experimental test violates the certainty conditions we derived using our ensemble modeling approach, it will suggest that some aspect of model mismatch is important. We could then move on to the development of qualitatively improved models that might modify neuronal nonlinearities, relax weight bounds, incorporate subcellular processes or neuromodulation, or hypothesize hidden cell populations. However, we hope that our focus on predictions that follow with certainty from simple network assumptions will enable predictions that are relatively insensitive to minor mismatches between our abstract model and the real biological brain. More nuanced predictions may require more nuanced models.
An important parameter of the theory is the weight bound. In particular, W bounds the magnitude of synaptic weight vectors in biological networks, and our certainty condition declares a synapse to be necessary when the ratio y/W exceeds a critical value. It is not a priori clear how to set this scale parameter without additional biological data. Nevertheless, one could use the neuronal activity data to compute each synapse’s W-critical value, below which the certainty condition is satisfied, and rank-order the synapses according to decreasing W-critical values. Until we know the value of W, we do not know where to draw the line between certain synapses and uncertain synapses. However, our theory predicts that all of the certain synapses will be at the top of the list, which specifies a sequence of experimentally testable predictions and may already provide biological insights into the important synaptic connections. Testing these predictions can help constrain the theory’s biological bound parameter.
Our theory describes function at the level of neural representations. This description is useful because many systems neuroscience experiments measure representations directly, and it is important to build mechanistic models that explain these data in terms of neural network interactions [12,15,18,77]. However, it would also be interesting to link structure to function at the higher levels of behavior and cognition. This is a significantly different problem because multiple representations can support the same high-level functions, and both neural network structure and representation can change over time [78–85]. Consequently, experimental tests of our current framework must measure network structure and representation on timescales shorter than the network’s representational dynamics, and certain synapses may be most biologically meaningful in innate circuits with limited plasticity. Extensions to our framework may also be useful for relating structural and representational dynamics in circuits for learning [86].
An exciting prospect is to explore how our ensemble modeling framework can be combined with other theoretical principles and biological constraints to obtain more refined structure-function links. For instance, we could refine our ensemble by restricting to stable fixed points. Alternatively, once the sign of a given synapse is identified, Dale’s principle might allow us to fix the signs of all other synapses from this neuron [87]. This would restrict the solution space and could make other synapses certain. Utilizing limited connectomic data to impose similar restrictions might also be a fruitful way to benefit from large-scale anatomical efforts [7,10,13,14]. Finally, rather than restricting the magnitude of the incoming synaptic weight vector, we could consider alternate biologically relevant constraints, such as limiting the number of synapses, minimizing the total wiring length, or positing that the network operates at capacity [88,89]. These changes would modify the certainty conditions in our framework, as well as our experimental predictions. We could therefore assess candidate optimization principles and biological priors experimentally. While the base framework developed here was designed to identify crucial network connections required for function, we hope that our approach will eventually allow us to assess theoretical principles that determine how neural network structure follows from function.
ACKNOWLEDGMENTS
The authors thank Tianzhi (Lambus) Li, Srini Turaga, Andrew Saxe, Ran Darshan, and Larry Abbott for helpful discussions and comments on the manuscript. This work was supported by the Howard Hughes Medical Institute and the Janelia Visiting Scientist Program.
APPENDIX A: A CERTAINTY CONDITION TO PINPOINT SYNAPSES REQUIRED FOR SPECIFIED RESPONSE PATTERNS
1. Preliminaries
For completeness, we begin by briefly reviewing a few central concepts from the main manuscript.
a. From recurrent to feedforward networks
Let us consider a neural network of ℐ input neurons that send signals to an interconnected population of 𝒟 driven neurons governed by dynamical equations (6), as described in the main manuscript. At steady-state, since all time-derivatives are zero, Eq. (6) yields
$y_{\mu i} = \Phi\!\left(\sum_{m=1}^{\mathcal{N}} w_{im}\, z_{\mu m}\right), \qquad \mu = 1, \dots, \mathcal{P}, \quad i = 1, \dots, \mathcal{D},$  (A1)
where, as prescribed in the main manuscript, yμi and xμm denote steady-state activity levels of the driven and input neurons to the μth stimulus, which we have combined into zμm, and 𝒩 is the number of incoming synapses onto each of the driven neurons. Equation (A1) provides 𝒟 × 𝒫 nonlinear equations for 𝒟 × 𝒩 unknown parameters. However, we immediately notice that the steady-state activity of neuron i depends only on the ith row of the connectivity matrix, so these equations separate into 𝒟 independent sets of 𝒫 equations with 𝒩 unknowns, the weights onto a given driven neuron. In other words, the recurrent network involving 𝒟 driven and ℐ input neurons decomposes into 𝒟 feedforward networks with 𝒩 = 𝒟 + ℐ feedforward inputs. The steady-state equations for these feedforward networks are given by
$y_{\mu} = \Phi\!\left(\sum_{m=1}^{\mathcal{N}} w_{m}\, z_{\mu m}\right), \qquad \mu = 1, \dots, \mathcal{P},$  (A2)
where we have now suppressed the i index in yμi and in wim. For this feedforward network we will refer to the ith neuron as the target neuron, and it is as if all the neurons (driven and input) were providing feedforward inputs to it. As long as we only consider exact solutions to the fixed-point equations, the problem of identifying synaptic connectivity in a recurrent network reduces to solving the problem for feedforward networks. Thus, in the rest of this Appendix we will focus on identifying wm's satisfying Eq. (A2).
Note that the main text used notation emphasizing that the set of presynaptic neurons may depend on the target neuron, but we simply write zμm throughout the Appendices with the understanding that the formalism applies to a specified target neuron whose index is suppressed. Furthermore, for conceptual simplicity the main text first stated many results in a feedforward setting with a single driven neuron, but the Appendices immediately treat the general case where presynaptic partners may come from either the input or driven populations of neurons.
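The bookkeeping implied by Eqs. (A1) and (A2) can be summarized in a short sketch that assembles, for a chosen target neuron, the presynaptic response matrix z and the corresponding target response vector; the function and variable names are ours, introduced only for illustration.

```python
# Sketch of the decomposition of the recurrent fixed-point problem into one
# feedforward problem per target neuron [Eqs. (A1)-(A2)]; names are illustrative.
import numpy as np

def presynaptic_matrix(y, x, i, self_coupling=True):
    """P x N matrix z of presynaptic steady-state activities for target neuron i.
    y: (P, D) driven responses, x: (P, I) input responses.
    With self_coupling=False, neuron i's own response is dropped, so N = D + I - 1."""
    if self_coupling:
        return np.hstack([y, x])                       # N = D + I
    keep = [m for m in range(y.shape[1]) if m != i]
    return np.hstack([y[:, keep], x])                  # N = D + I - 1

def target_responses(y, i):
    """P-vector of target-neuron responses appearing on the left of Eq. (A2)."""
    return y[:, i]
```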
b. A convenient set of variables
In all our discussions in this section the input neuronal response matrix, zμm, will be assumed to be fixed. Note that zμm connects synaptic weight vectors to the target response vector and can be used to define 𝒫 weight combinations, the η coordinates. Each η coordinate controls the target response to a single stimulus condition:
$\eta_\mu \equiv \sum_{m=1}^{\mathcal{N}} z_{\mu m}\, w_m, \qquad \mu = 1, \dots, \mathcal{P}.$  (A3)
It is rather convenient to extend this set of 𝒫 η coordinates to a basis set of 𝒩 η coordinates, such that all synaptic weights can be uniquely expressed as a linear combination of these η coordinates, and vice versa. To see how this can be done, we will henceforth make the simplifying assumption that the 𝒫 × 𝒩 matrix z has the maximal rank, 𝒫, although we anticipate that much of our framework, results, and insights will apply more generally. If z has maximal rank, then its kernel will be an (𝒩 − 𝒫)-dimensional linear subspace spanned by (𝒩 − 𝒫) orthogonal basis vectors, denoted by εμ for μ = 𝒫 + 1, …, 𝒩. We can now extend z to an 𝒩 × 𝒩 matrix, Z, as follows:
$Z_{\mu m} = \begin{cases} z_{\mu m}, & \mu = 1, \dots, \mathcal{P}, \\ \epsilon_{\mu m}, & \mu = \mathcal{P}+1, \dots, \mathcal{N}, \end{cases}$  (A4)
where εμm is the mth component of the null vector εμ. With this construction, it is easy to see that the new η coordinates,
$\eta_\mu = \sum_{m=1}^{\mathcal{N}} Z_{\mu m}\, w_m, \qquad \mu = \mathcal{P}+1, \dots, \mathcal{N},$  (A5)
remain completely unconstrained by the specified response patterns, as these linear combinations do not contribute to any of the target responses. In contrast, the original η coordinates,
$\eta_\mu = \sum_{m=1}^{\mathcal{N}} Z_{\mu m}\, w_m = \sum_{m=1}^{\mathcal{N}} z_{\mu m}\, w_m, \qquad \mu = 1, \dots, \mathcal{P},$  (A6)
are all constrained by the data:
$\Phi(\eta_\mu) = y_\mu \;\;\Rightarrow\;\; \begin{cases} \eta_\mu = y_\mu, & \mu = 1, \dots, \mathcal{C}, \\ \eta_\mu \leq 0, & \mu = \mathcal{C}+1, \dots, \mathcal{P}, \end{cases}$  (A7)
where for notational simplicity we have ordered the response patterns such that yμ ≠ 0 only for μ = 1, …, 𝒞. Also, we extend the yμ's to an 𝒩-dimensional vector by assigning yμ = 0 for μ = 𝒫 + 1, …, 𝒩.
The extended response matrix Z defines a basis transformation connecting the physical synaptic directions, êm, with the directions
(A8) |
along which the η coordinates change. These vectors clearly differentiate directions in the weight space that are activity-constrained by neuronal responses (μ = 1, …, 𝒫) from those that are not (μ = 𝒫 + 1, …, 𝒩). We can express any weight vector in either the physical basis or the η basis:
where
(A9) |
For later convenience we also define the number of semiconstrained and unconstrained dimensions as 𝒮 = 𝒫 − 𝒞 and 𝒰 = 𝒩 − 𝒫, respectively.
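As a concrete illustration of this construction, the sketch below extends z with an orthonormal basis of its kernel and maps a weight vector to its η coordinates; the helper names are ours, and the final assertion uses the orthonormal-pattern special case treated in the next subsection.

```python
# Sketch of the eta-coordinate construction: extend z by an orthonormal basis of
# its kernel [Eq. (A4)] and transform a weight vector to eta coordinates.
import numpy as np
from scipy.linalg import null_space

def extend_and_transform(z, w):
    P, N = z.shape
    eps_basis = null_space(z).T      # (N - P) x N orthonormal rows spanning ker(z)
    Z = np.vstack([z, eps_basis])    # full-rank N x N extension
    eta = Z @ w                      # first P entries: constrained/semiconstrained;
    return Z, eta                    # last N - P entries: unconstrained

# Toy usage with orthonormal rows, where Z is orthogonal and w = Z.T @ eta:
rng = np.random.default_rng(2)
z = np.linalg.qr(rng.normal(size=(5, 3)))[0].T   # 3 x 5, orthonormal rows
w = rng.normal(size=5)
Z, eta = extend_and_transform(z, w)
assert np.allclose(Z.T @ eta, w)
```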
2. Derivation of the certainty condition for orthogonal input patterns
Our goal here is to use the solution space (i.e., the ensemble of weights that precisely reproduce the specified target responses) to derive a condition for when we can be certain that a given synapse must be nonzero. For technical simplicity, we will specialize to the case when all the response patterns are orthonormal, i.e.,
$\sum_{m=1}^{\mathcal{N}} z_{\mu m}\, z_{\nu m} = I_{\mu\nu},$  (A10)
where I is the identity matrix. Then we can always choose the extended Z matrix to be an 𝒩 × 𝒩 orthogonal matrix, such that Z−1 = ZT and the basis vectors defined above form an orthonormal basis. Motivated by biological constraints, we will impose a bound on the magnitude of the synaptic weight vector. For orthonormal response patterns, this translates into a spherical bound on the η coordinates as well [see Fig. 5(b)]:
$\sum_{m=1}^{\mathcal{N}} w_m^2 = \sum_{\mu=1}^{\mathcal{N}} \eta_\mu^2 \leq W^2.$  (A11)
We refer to this 𝒩-dimensional ball, in which all admissible synaptic weights reside, as the weight space.
a. A heuristic argument for y-critical
Before diving into the rigorous and technical derivation, in this subsection we first try to intuitively understand how the certainty condition (51) can arise. For this purpose, let us start with a linear theory with no unconstrained dimension, so 𝒮 = 𝒰 = 0. In this case, there is a unique set of weights that can precisely reproduce the observed responses:
$w_m = \sum_{\mu=1}^{\mathcal{N}} Z_{\mu m}\, y_\mu.$  (A12)
Since Zμm = zμm represents the responses of the mth presynaptic neuron, the solution for the mth synaptic weight (A12) is simply the correlation between the pre- and postsynaptic activity. In a linear theory, the sign of the synapse is thus dictated by the sign of the correlation between the pre- and postsynaptic responses.
Let us now allow a single (𝒩th) unconstrained direction. One can think of this situation as if we lacked information about how the target neuron would respond to the unconstrained stimulus pattern. If we knew that this response was, say, yu, then we would be able to determine the sign of wm:
$w_m = \sum_{\mu=1}^{\mathcal{P}} Z_{\mu m}\, y_\mu + Z_{\mathcal{N} m}\, y_u.$  (A13)
However, we do not know the value of the last term; if it can cancel the first term for some allowed value of yu, then the overall sign becomes ambiguous. Conversely, the sign of wm becomes certain if
$\left|\sum_{\mu=1}^{\mathcal{P}} Z_{\mu m}\, y_\mu\right| > |Z_{\mathcal{N} m}|\, |y_u| \quad \text{for all allowed } y_u.$  (A14)
Now, it is easy to recognize that the first term is just ey y, where we have suppressed the m index on ê to reduce notational clutter and will continue to do so while referring to the synapse direction under consideration; ey refers to the projection of ê along ŷ. Also, note that in this simple case with one unconstrained direction, the projection of ê along the unconstrained subspace is just given by eu = Z𝒩m. Further, since Z is orthogonal, to have any solution at all
$\sum_{\mu=1}^{\mathcal{P}} y_\mu^2 + y_u^2 \leq W^2, \quad \text{i.e.,} \quad |y_u| \leq \sqrt{W^2 - y^2}.$  (A15)
Substituting the maximum |yu| from Eq. (A15) into Eq. (A14), after some algebra we get the condition for sign certainty as
$y > y_{\mathrm{cr}} = \frac{|e_u|}{\sqrt{e_y^2 + e_u^2}}\, W.$  (A16)
The same argument applies if the 𝒩th direction is semiconstrained instead of unconstrained, with one notable difference. If the 𝒩th pattern is semiconstrained, then y𝒩 = 0 and the nonlinear thresholding masks how the target neuron would have responded in a linear model. However, the ambiguity in sign can only arise if the second term can take a sign opposite to the first term, i.e., opposite to Sgn(ey). Moreover, for the thresholding to act, the target response for the semiconstrained pattern must be negative in the linear theory, so Z𝒩m has to have the same sign as ey to generate the ambiguity. If it is indeed so, then we obtain a certainty condition that is identical to Eq. (A16) except that eu → es, the projection of ê along the semiconstrained direction:
$y > y_{\mathrm{cr}} = \frac{|e_s|}{\sqrt{e_y^2 + e_s^2}}\, W.$  (A17)
If Z𝒩m and ey have opposite signs, then the synapse always has the same sign throughout the solution space.
While this derivation of ycr is heuristic and only deals with a single semiconstrained or unconstrained dimension, it provides intuition for the general result (51). Essentially, whether the sign of a given synapse is constant across the solution space depends on two competing quantities: the correlation between the pre- and postsynaptic responses; and the strength of the postsynaptic drive for patterns where the target response is either unknown or masked by the thresholding nonlinearity.
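A small numerical experiment makes this competition explicit. The sketch below samples all allowed values of the unknown response yu and checks whether the synapse sign can flip, comparing the outcome with the threshold y > W|eu|/√(ey² + eu²) reconstructed in Eq. (A16); the sizes and random seed are arbitrary.

```python
# Numerical check of the heuristic condition, using the threshold reconstructed
# in Eq. (A16).  One unconstrained direction, orthonormal patterns.
import numpy as np

rng = np.random.default_rng(3)
W, N = 1.0, 6                                   # N = P + 1
Z = np.linalg.qr(rng.normal(size=(N, N)))[0]    # random orthogonal extended basis
m = 0
e = Z[:, m]                                     # components e_mu = Z_{mu m}
e_u = e[-1]                                     # projection on the unconstrained direction

y_hat = rng.normal(size=N - 1)
y_hat /= np.linalg.norm(y_hat)                  # orientation of the known responses
e_y = e[:-1] @ y_hat                            # projection of e along y_hat

y_cr = W * abs(e_u) / np.sqrt(e_y**2 + e_u**2)  # reconstructed Eq. (A16)

for label, y in [("below", 0.5 * y_cr), ("above", 0.5 * (y_cr + W))]:
    y_u = np.linspace(-1, 1, 1001) * np.sqrt(W**2 - y**2)   # all allowed unknown responses
    w_m = y * e_y + e_u * y_u                               # Eq. (A13), constrained part summed
    print(label, "y_cr: sign can flip?", w_m.min() < 0 < w_m.max())
```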
b. Hyperplane dividing excitatory and inhibitory synaptic regions
Having gained some intuition about the certainty condition, let us now proceed to a rigorous derivation of the result. Since the constrained coordinates are fixed for the weight vectors that belong to the solution space [the deep yellow wedge in Fig. 5(b)], we must have
$\eta_\mu = y_\mu, \qquad \mu = 1, \dots, \mathcal{C},$  (A18)
so that the solution space resides within an (𝒰 + 𝒮)-dimensional ball with radius
$\sqrt{W^2 - y^2}, \qquad \text{where } y^2 \equiv \sum_{\mu=1}^{\mathcal{C}} y_\mu^2,$  (A19)
as depicted by the yellow region in Fig. 5(b). We refer to this semiconstrained plus unconstrained subspace as the flexible subspace.
Now, the synaptic direction of interest, ê, can be decomposed into its projections along the constrained, semiconstrained, and unconstrained subspaces. For notational simplicity, let us denote eμ as the component of ê along the μth basis direction; for orthonormal patterns, eμ = Zμm, which follows from the orthogonality assumption and Eq. (A9). In general, in this manuscript we will use subscripts on e to denote projections of ê along different directions or subspaces. We can now write
(A20) |
where
(A21) |
are unit vectors that lie within the constrained, semiconstrained and unconstrained subspaces, and
(A22) |
are the projections of ê along these directions. One could think of θ and φ as defining a spherical coordinate system whose x, y, and z axes are played by the three unit vectors in Eq. (A21), and our definitions (A20)–(A22) imply the convention 0 ⩽ {θ, φ} < π/2. For later convenience, let us also introduce the projection of ê onto the activity-constrained subspace,
$e_p \equiv \sqrt{\sum_{\mu=1}^{\mathcal{P}} e_\mu^2}.$  (A23)
We would also like to emphasize that we can compute θ, φ just from the knowledge of the neuronal responses, zμm = eμ, which is particularly useful for numerical calculations:
(A24) |
Now, any weight vector in the solution space can be written as
(A25) |
where the semiconstrained and unconstrained projections of the weight vector may vary, while the constrained part is fixed by the specified responses. Using Eqs. (A20) and (A25), one then finds that the wm = 0 hyperplane dividing the excitatory and inhibitory regions in the flexible subspace satisfies the equation
(A26) |
where we have now also suppressed the index m in wm. Also, we have defined α ∈ [0, π] to be the angle between the constrained projection of ê and the response direction ŷ. We now notice that the origin of the flexible subspace is in the solution space, and the sign of w for this solution point is given by
$\mathrm{Sgn}(w) = \mathrm{Sgn}\!\left(\sum_{\mu=1}^{\mathcal{C}} Z_{\mu m}\, y_\mu\right) = \mathrm{Sgn}(\cos\alpha).$  (A27)
In other words, if the sign of the synapse is certain, this certain sign must be Sgn(cos α), which corresponds to the sign of the correlation between the target neuron and the presynaptic neuron. Intuitively, positive correlations point to an excitatory connection, and negative correlations point to an inhibitory connection.
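In code, this observation reduces to a one-line prediction for the sign that a certain synapse must take; the helper name is ours.

```python
# The sign that a certain synapse must take is the sign of the pre/post response
# correlation, Sgn(cos alpha) in Eq. (A27).
import numpy as np

def predicted_certain_sign(z, y, m):
    """The only sign that synapse m can take if it is certain."""
    return np.sign(z[:, m] @ y)
```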
c. Special case without unconstrained dimensions
To derive the certainty condition, let us start by looking at the case when 𝒫 = 𝒩, so that there are no unconstrained directions, or equivalently, ϕ = 0. In this case, the solution space is just the all-negative orthant in the 𝒮-dimensional semiconstrained hypersphere (Fig. 9), and the equation for the w = 0 hyperplane can be written as
(A28) |
where the right-hand side (RHS) is positive, and we have introduced
(A29) |
which flips the direction of the semiconstrained projection if cos α > 0, or equivalently, if ey > 0. Now, if the w = 0 hyperplane (orange lines in Fig. 9) is far enough from the origin that it does not intersect the all-negative orthant within the weight bounds, then we can be certain that w is nonzero and always has a consistent sign. To check this, we need to compare the cone angle that the orange hyperplane subtends at the center, φ, with the minimum angle, γ, that the flipped semiconstrained direction makes with the all-negative orthant.
First, φ can easily be inferred from trigonometry:
$\cos\varphi = \frac{d_s}{\sqrt{W^2 - y^2}},$  (A30)
where ds = y cot θ|cos α| represents the distance from the center of the semiconstrained sphere to the hyperplane. The expression for ds follows from the general mathematical result that if
$\vec{\beta} \cdot \vec{x} = \beta_0$  (A31)
is an equation for a hyperplane, where x denotes the coordinate vector and β, β0 are constants, then the perpendicular distance, d⊥, to it from a point x0 is given by
$d_\perp = \frac{|\vec{\beta} \cdot \vec{x}_0 - \beta_0|}{|\vec{\beta}\,|}.$  (A32)
Note that we are interested in the distance from the origin, x0 = 0, to the hyperplane satisfying Eq. (A28), so the distance reduces to β0/|β|.
To provide geometric intuition for γ, let us first assume that the flipped semiconstrained direction does not point into the all-negative orthant. If we can find its projection onto the correct boundary of the solution space, then γ is given by the angle between this direction and the appropriate semiconstrained boundary vector (Fig. 9). Since all components of vectors in the solution space (the all-negative orthant) have to be negative or zero, to find the appropriate projection onto the boundary of the solution space we essentially have to set all the positive components to zero:
(A33) |
depending upon whether Sgn(ey) = ±. Here sb,μ and sμ are just the μth components of the boundary vector and the semiconstrained projection vector, respectively, and Θ(x) is the Heaviside step function, which is one if x is positive and zero otherwise. Then γ is given by
(A34) |
where again the sign in Θ is determined by the sign of ey.
A formal way to see that γ is indeed given by Eq. (A34) is to start with any unit vector, ŵs, lying in the solution space. Then the angle, γ, between ŵs and the flipped semiconstrained direction is given by
(A35) |
where we have defined A± as the sets of all μ indices for which the corresponding component of the flipped semiconstrained direction is positive or negative, respectively. Since ŵs is in the solution space, wsμ ⩽ 0, and therefore the second term sums positive quantities while the first term subtracts them. Thus,
(A36) |
where both equalities are achieved when ŵs is aligned with the boundary semiconstrained vector, as argued previously. Note also that this formal proof did not assume any restriction on the direction of the semiconstrained projection, and thus Eq. (A34) turns out to be a general result that also holds if this direction points into the all-negative orthant.
Combining Eqs. (A34) and (A30), the certainty condition now reads
(A37) |
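The recipe just described for γ, namely flip the semiconstrained projection according to the sign of ey, zero out its positive components, and measure the angle to the truncated vector, can be written as a short sketch; the sign convention follows the verbal description above and assumes ey ≠ 0.

```python
# Sketch of the gamma computation described above.
import numpy as np

def gamma_to_negative_orthant(s, e_y):
    v = -np.sign(e_y) * np.asarray(s, dtype=float)   # flipped semiconstrained projection
    v = v / np.linalg.norm(v)
    v_boundary = np.minimum(v, 0.0)                  # positive components set to zero
    if not v_boundary.any():                         # points entirely away from the orthant
        return np.pi / 2
    # For this truncation, v . v_b = |v_b|^2, so cos(gamma) = |v_b|.
    return np.arccos(np.clip(np.linalg.norm(v_boundary), 0.0, 1.0))
```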
d. General case with unconstrained dimensions
We can extend the above analysis to the case when we have unconstrained dimensions by noting that, for a given set of unconstrained coordinates, the solution space is again the all-negative orthant in a semiconstrained hypersphere. Isometry along the unconstrained dimensions ensures that it is always possible to make one of the null directions, let us say the 𝒩th, align with the unconstrained projection of ê. Then the w = 0 hyperplane Eq. (A26) reads
(A38) |
which can be rewritten as
(A39) |
where we have introduced ηu to denote the unconstrained coordinate along this direction. To have a certain synapse, the w = 0 hyperplane cannot intersect the solution space for any allowed value of ηu.
The flipped semiconstrained direction is independent of the unconstrained coordinates, and hence the value of γ remains unchanged. However, the cone angle, φ, does depend on the unconstrained coordinates in two ways. First, the radius of the 𝒮-dimensional spherical subspace containing admissible solutions is now
$r_s = \sqrt{W^2 - y^2 - \eta_u^2 - \eta_\perp^2},$  (A40)
where η⊥ is the magnitude of the weight vector in the (𝒰 − 1)-dimensional unconstrained subspace that is perpendicular to the chosen null direction. We note in passing that Eq. (A40) implies ηu² + η⊥² ⩽ W² − y². Second, the distance of the hyperplane from the origin that follows from Eq. (A39) is now a function of ηu:
(A41) |
Strictly speaking, this expression for the distance is only valid as long as the numerator in the ds expression stays positive. However, if there exists an allowed ηu (let us call it ηu0) for which the numerator can vanish, then the synapse cannot have a certain sign, because at that point ds = 0, the hyperplane intersects the origin, and the weight can vanish even in a linear theory. In fact, the ds = 0 condition provides us with the y-critical value below which the synapse sign becomes uncertain in a linear theory:
$y_{\mathrm{cr,lin}} = \frac{|e_u|}{\sqrt{e_y^2 + e_u^2}}\, W.$  (A42)
So we will now look into cases where y ⩾ ycr,lin, which also means that Eq. (A41) remains valid.
Combining Eqs. (A40) and (A41) we get
(A43) |
For us to be certain that w is nonzero, we have to make sure that even the largest φ that one can obtain by varying η⊥ and ηu is still smaller than γ. Clearly, to make φ large it is best to set η⊥ = 0. It is also clear from inspection that cos φ initially decreases as ηu increases from zero, being dominated by the linear term. However, as the quadratic term in ηu in the denominator becomes more and more important, cos φ reaches a minimum and starts to increase. Imposing stationarity with respect to ηu, we find that this minimum is reached at
(A44) |
where we substituted ycr,lin from Eq. (A42) and used the fact that W ⩾ y ⩾ ycr,lin to obtain the inequality. This proves that the minimum of cos φ indeed occurs at an allowed positive value of ηu. Substituting the above into Eq. (A43), after some algebra we find that this minimum value of cos φ, or equivalently the maximum φ, is given by
(A45) |
The certainty condition then requires
(A46) |
which can be recast as
(A47) |
It is illuminating to express y-critical in terms of the projections ey, eu, and es∗ of the synaptic direction ê along, respectively, the data vector ŷ, the unconstrained unit vector, and the semiconstrained boundary vector:
(A48) |
where
(A49) |
and eu is given by Eq. (A22). We note that setting ϕ = 0 precisely reproduces the correct limit with no unconstrained directions (A37).
3. Regarding orthogonal input patterns in recurrent networks
While our analysis of the solution space and the certainty condition (A48) translate directly to recurrent networks, the requirement of orthogonality for the derivation of our certainty condition imposes certain technical restrictions on its scope when it comes to recurrent neural networks.
The certainty condition we derived for feedforward networks can be applied to two different recurrent neural network setups. First, let us consider networks where neurons have self-couplings. A consequence of having orthogonal response patterns in this case is that, as long as W ⩾ 1, the certainty condition can only be satisfied for the self-couplings wii. This is because the imposition of orthogonality in the response patterns also restricts the correlation between the target neuron and the other neurons:
(A50) |
However, for the synapse sign to be certain, the responses of the pre- and postsynaptic neurons need to be correlated. To see the problem more quantitatively, suppose we are interested in constraining the synapse from the mth neuron onto the ith neuron, as before. Now, the first 𝒫 elements of the unit vectors êi and êm contain the responses of the ith and the mth neuron in the 𝒫 patterns. We have already derived a decomposition of êm in terms of its projections onto the constrained, semiconstrained, and unconstrained subspaces (A20). Similarly, êi can be decomposed as
(A51) |
where the first component lies entirely along the constrained directions, and the second is orthogonal to it and only has components along the unconstrained directions. Then orthogonality implies
(A52) |
Starting from the certainty condition (A48), we can now go through a sequence of (in)equalities:
(A53) |
where we substituted sin θ sin ϕ from Eq. (A52). Note that the RHS is minimized when the relevant projection vectors are either aligned or antialigned. Even in this case, RHS = W²y², and thus the certainty condition cannot be satisfied if W ⩾ 1. One can check that when i = m, because the RHS in the first equation of (A52) is one and not zero, no similar constraint appears. Indeed, the certainty condition may then be satisfied, depending upon the specific response patterns.
As a second possibility, suppose that no self-couplings are present. Then, to be able to apply our framework and determine the couplings wim for a given i, we only need the truncated rows of z, with the ith column entry removed, to be orthonormal. Therefore, the response of the ith driven neuron, which consists of the entries of the ith column, can now be chosen independently of the responses of its presynaptic partners. In other words, êi and êm, m ≠ i, no longer need to satisfy the orthogonality constraint of Eq. (A52). Consequently, the wim weights can indeed satisfy the certainty condition, just as in the feedforward case.
4. Implied conservative bound on the certainty condition for nonorthogonal input patterns
A complete treatment of the certainty conditions for nonorthogonal fixed point patterns is beyond the scope of this work. However, here we provide some preliminary results and insights by explaining how our formalism for analyzing orthogonal fixed-point patterns can be simply adapted to derive an exact, but conservative, upper bound for y-critical that applies to general sets of patterns.
Conceptually speaking, deriving the certainty condition amounts to determining when the wm = 0 hyperplane intersects the solution space within the sphere of weight vectors with norm at most W. Because the solution space is exceedingly simple in η coordinates, our orthogonal analysis used η coordinates to conveniently recast the equations for the bounding sphere and the wm = 0 hyperplane. To adapt this analysis to the nonorthogonal case, it is important to account for three changes to the geometry of the problem. Most fundamentally, the directions corresponding to η coordinates are no longer orthonormal. However, the mathematical notions of orthogonality and normality are implicitly defined with respect to the inner-product structure imposed on the vector space, and our geometrical calculations from the orthogonal case easily carry over to the nonorthogonal case if we redefine the inner-product structure of the weight space to give an orthonormal coordinate system with respect to the η coordinates rather than the physical coordinates. In practice, all that this entails is interpreting the η coordinates as if they defined coordinates along orthogonal axes, and we will never need to explicitly write down the associated inner product. Second, the equation for the weight bound in the orthogonal system of η coordinates describes an ellipsoid around the origin, rather than a sphere. However, for any ellipsoid one can find a sphere that just encompasses it. If we can find the radius of this bounding sphere, then one can look for an intersection anywhere within this sphere, and our geometrical approach for deriving Eq. (A48) will carry over and provide a conservative bound for y-critical. This bound will poorly approximate the true y-critical when some axes of the ellipsoid are much longer than others. Third, the normal vector to the wm = 0 hyperplane is no longer the synaptic direction ê when expressed in the orthogonal system of η coordinates. Therefore, the projections of ê in Eq. (A48) must be generalized to become projections of the hyperplane's normal vector.
To obtain the radius of the bounding sphere, consider the singular value decomposition (SVD) of the data matrix:
$z = L\, \Lambda\, R^{T},$  (A54)
where L and R are 𝒫 × 𝒫 and 𝒩 × 𝒩 orthogonal rotation matrices, and Λ is a 𝒫 × 𝒩 rectangular diagonal matrix whose only nonzero entries are given by
$\Lambda_{\mu\mu} = \lambda_\mu, \qquad \mu = 1, \dots, \mathcal{P}.$  (A55)
Note that λ1, …, λ𝒫 are called the singular values of z. We can now define rotated coordinates:
$\vec{w}\,' = R^{T} \vec{w}, \qquad \vec{\eta}\,' = L^{T} \vec{\eta},$  (A56)
so that
$\eta'_\mu = \lambda_\mu\, w'_\mu, \qquad \mu = 1, \dots, \mathcal{P}.$  (A57)
Note that η and η′ are 𝒫 vectors in the current notation. We also note that since w′ is just a rotation of the original synaptic coordinates, the biological bound does not change as we go from w to w′ coordinates:
$\sum_{m=1}^{\mathcal{N}} w_m'^2 = \sum_{m=1}^{\mathcal{N}} w_m^2 \leq W^2.$  (A58)
This makes it possible to find an inequality in terms of the η′ coordinates:
$\sum_{\mu=1}^{\mathcal{P}} \eta_\mu'^2 = \sum_{\mu=1}^{\mathcal{P}} \lambda_\mu^2\, w_\mu'^2 \leq \lambda_{\max}^2 W^2 \;\;\Rightarrow\;\; \sum_{\mu=1}^{\mathcal{P}} \eta_\mu^2 \leq \lambda_{\max}^2 W^2,$  (A59)
where in the last step we have used the fact that the orthogonal matrix L does not change the L2-norm as one goes from η′ to η coordinates, and λmax is defined as the maximal singular value. To obtain spherical symmetry, we thus define unconstrained coordinates via
(A60) |
such that
$\sum_{\mu=1}^{\mathcal{N}} \eta_\mu^2 \leq \lambda_{\max}^2 W^2.$  (A61)
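Numerically, the conservative bounding-sphere radius requires only the largest singular value of z, as the following sketch (with our own helper name) illustrates.

```python
# The conservative bounding-sphere radius only needs the largest singular value of z.
import numpy as np

def bounding_sphere_radius(z, W):
    """Radius of the sphere in eta space enclosing the image of |w| <= W, i.e.,
    lambda_max * W, cf. Eqs. (A54)-(A61)."""
    return np.linalg.svd(z, compute_uv=False).max() * W

# Orthonormal rows are a special case with lambda_max = 1:
rng = np.random.default_rng(4)
z_orth = np.linalg.qr(rng.normal(size=(7, 4)))[0].T
assert np.isclose(bounding_sphere_radius(z_orth, W=2.0), 2.0)
```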
We now realize that the problem of finding y-critical using this conservative bound can be recast into the problem of the orthogonal case: As we just described, the η coordinates satisfy the conservative spherical bound. Our goal can then be to find the minimum value of y for which the hyperplane satisfying wm = 0 does not intersect the solution space. Now, if we define Z to be, as in the orthogonal case, a full-rank extension of z:
(A63) |
so that for all values of μ,
(A64) |
then the hyperplane equation can be rewritten as
(A65) |
From Eq. (A65) it is clear that
(A66) |
is perpendicular to the wm = 0 hyperplane, where we remind the reader that the basis directions were defined by Eq. (A9). This normal vector thus plays the role of ê, and its orientation with respect to the constrained, semiconstrained, and unconstrained dimensions determines the certainty condition. Specifically,
(A67) |
where the various projections of the normal vector are given by
(A68) |
We remind the reader that in the orthogonal case the set A− contained each semiconstrained μ index along which the component of ê had the same sign as ey. Similarly, here A− contains those semiconstrained μ indices along which the component of the normal vector has the same sign as ny. To summarize, the above analysis suggests that both Z and Z−1 will play an important role in generalizing the certainty condition to nonorthogonal patterns, and especially the relative orientation of the normal vector (defined by Z−1) with respect to the η directions.
APPENDIX B: ESTIMATING THE PROBABILITY THAT A SYNAPSE IS CERTAIN IN LARGE FEEDFORWARD NETWORKS
For given values of 𝒩 (= ℐ), 𝒞, and 𝒫 in a feedforward setting, we will here try to assess how likely it is that noiseless orthonormal neuronal responses require a given synapse to be nonzero. As we have seen in Eq. (49), whether a synapse is certain to exist depends on six parameters: θ, φ, γ, α, W, and y. The first four quantities depend on how ê is oriented with respect to various directions in the weight space. Since ê is a unit vector, we typically expect its component along any given direction to be of order 1/√𝒩. Thus, we typically expect
(B1) |
Hence, we approximate the typical ycr as
(B2) |
Let us now suppose that all the dimensions scale with the network size, such that
$\frac{\mathcal{C}}{\mathcal{N}}, \;\; \frac{\mathcal{S}}{\mathcal{N}}, \;\; \frac{\mathcal{U}}{\mathcal{N}} = \mathrm{const}.$  (B3)
Then, we find that as the network size increases ycr behaves as
(B4) |
and ycr is essentially pushed up toward W.
However, the typical scale of y behaves similarly as the dimensions increase. To see this concretely, let us treat the nonzero target responses as a 𝒞-dimensional vector and assume that every such vector is equally likely within a sphere of radius W (larger activity levels of the target neuron admit no solutions). Then the average and median values of y are given by
$\langle y \rangle = \frac{\mathcal{C}}{\mathcal{C}+1}\, W \qquad \text{and} \qquad y_{\mathrm{med}} = 2^{-1/\mathcal{C}}\, W \approx \left(1 - \frac{\ln 2}{\mathcal{C}}\right) W,$  (B5)
respectively. Since y and ycr scale similarly as one increases the network size, the probability of a synapse being certain should not change as the network size increases. In Fig. 8(c), we show that if we choose y = 1 − ln 2/𝒞 as the approximate median value in simulations with random input-output configurations (see Appendix F for details), then the number of certain synapses does indeed increase linearly with 𝒩.
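The median value used here is easy to check by Monte Carlo sampling of vectors distributed uniformly in a 𝒞-ball, as in the sketch below (with W = 1 assumed for the normalization used in the simulations).

```python
# Monte Carlo check of the median norm of vectors uniformly distributed in a
# C-dimensional ball (W = 1), compared with 2^(-1/C) ~ 1 - ln(2)/C.
import numpy as np

rng = np.random.default_rng(5)

def uniform_ball_norms(C, n_samples=200_000, W=1.0):
    g = rng.normal(size=(n_samples, C))
    directions = g / np.linalg.norm(g, axis=1, keepdims=True)
    radii = W * rng.uniform(size=n_samples) ** (1.0 / C)   # uniform in volume
    return np.linalg.norm(directions * radii[:, None], axis=1)

for C in (5, 20, 100):
    print(C, np.median(uniform_ball_norms(C)), 2.0 ** (-1.0 / C), 1.0 - np.log(2.0) / C)
```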
To quantitatively estimate the probability of finding a certain synapse, we can compute the fraction of the volume of target response vectors for which the synapse is certain, given the typical projections (B2), relative to the volume of target response vectors for which solutions to the steady-state equations exist. We know that the target response vector has to lie within a 𝒞-dimensional sphere of radius W in order for there to be any solutions to the problem. However, for the synapse sign to be certain, we need y ⩾ ycr, where ycr is given by Eq. (B2). So, we need to compare the volume of the spherical shell ycr ⩽ y ⩽ W with the volume of the full 𝒞-dimensional sphere, Vy⩽W. To find the shell volume, we have to subtract the 𝒞-dimensional spherical volume with radius ycr from the spherical volume with radius W. Since n-dimensional spherical volumes scale as the nth power of the radius, the probability, P, of ascertaining the sign of the synapse is approximately given by
$P \approx 1 - \left(\frac{y_{\mathrm{cr}}}{W}\right)^{\mathcal{C}}.$  (B6)
Now, when 𝒮, 𝒰 ≫ 1 we can approximately evaluate the RHS as follows:
(B7) |
Thus, we get
$P \approx 1 - e^{-\mathcal{C}/(\mathcal{S} + 2\,\mathcal{U})}.$  (B8)
The most prominent feature of Eq. (B8) is that the probability only depends on the ratios of the various dimensions. Hence, it does not change as we increase the size of the network as long as the ratios are kept constant.
For the purpose of illustration and to numerically test this feature, we assessed how certainty predictions change when the network size is increased while holding the ratios between 𝒞, 𝒮, and 𝒰 fixed. In Fig. 8(c) we have plotted the number of certain synapses in simulations generated from random data as we scale up 𝒩 while maintaining the ratios between 𝒞, 𝒮, and 𝒰 (see Appendix F for more details). We illustrate two cases. In the first example, no unconstrained directions were present, and 𝒮 = 3𝒞. Then P = 1 − e^{−1/3} ≈ 0.28, so one has a 28% chance of being able to determine the sign of the connections. This answer incidentally is the same as for an example with 𝒞 = 𝒮 = 𝒰. As another example, Fig. 8(c) considered the case when 𝒰 = 2𝒞 = 2𝒮. According to Eq. (B8), then P = 1 − e^{−1/5} ≈ 0.18, so the chance of determining the sign drops to about 18%. We only expect these numbers to be approximate. For example, our arguments relied on the assumption that all target responses admitting solutions are equally likely, an assumption that definitely needs to be revisited for realistic networks. However, the scaling behavior should hold for other probability distributions as long as the scale of y behaves similarly to Eq. (B5) with increasing 𝒩.
APPENDIX C: NONZERO-ERROR CERTAINTY CONDITIONS
There are various reasons why we may want to consider not only weights that exactly reproduce the specified neuronal responses, but also weights that do so approximately. For instance, we are always limited by the accuracy of the measurement apparatus. More importantly, there are various sources of biological noise that typically lead to uncertainties in the observed values of neuronal responses. For the purpose of this paper, we will consider any set of weights to be part of the ε-error solution space if it is able to reproduce the specified neuronal responses with an error ⩽ ε [see Eq. (63) for the definition of ℰ]. We will neglect uncertainties in the input responses to the target neuron, but we will comment on their possible effects toward the end of this Appendix.
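For reference, the following is a minimal sketch of an ε-error membership test, assuming that Eq. (63) amounts to the summed squared deviation between the specified responses and the thresholded drive for one target neuron; this assumption stands in for the definition in the main text, and only this assumed error function would need to change if the normalization differs.

```python
# Assumed error function standing in for Eq. (63): summed squared deviation
# between specified responses and the thresholded drive for one target neuron.
import numpy as np

def squared_error(w, z, y):
    return np.sum((y - np.maximum(z @ w, 0.0)) ** 2)

def in_solution_space(w, z, y, eps):
    """Membership in the eps-error solution space under the assumed error form."""
    return squared_error(w, z, y) <= eps**2
```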
1. Errors in feedforward networks
Let us first focus on feedforward networks. Allowing for error increases the value of ycr by expanding the solution space. One way to think about this is to realize that we now have to make sure that Eq. (51) is satisfied for any nonnegative denoised response, namely the observed response vector plus a noise vector δ, both of which lie in the 𝒫-dimensional activity-constrained subspace. We will initially assume that all the observed responses are nonnegative, so a zero-error solution is possible and the noise is bounded by |δ| ⩽ ε. Our strategy will be to first seek the minimum y needed to have a certain synapse for a given δ. We then find the maximum among these y-critical values as we let δ vary within the ε ball. Since this procedure guarantees that the w = 0 hyperplane does not intersect any part of the solution space with ℰ ⩽ ε, it implies that the synapse must exist for the network to generate the specified response patterns. The synapse's sign will match the zero-error analysis. We will first estimate y-critical when the error is small enough not to induce topological transitions in the error surface. In the subsequent sections, we will include the effects of topological transitions, as well as explain how to deal with situations where some of the observed responses are negative, which is possible due to noise.
a. When all observed responses are nonnegative and no topological transitions occur
To understand how errors affect the certainty conditions, let us consider the case where the observed responses are nonnegative and the allowed error satisfies 0 < ε < min{yμ}μ=1, …, 𝒞, so that no topological transitions can occur. If some responses that were zero in the observed response vector are now nonzero in the denoised response, then both ey and es∗ can change due to the noise. Without loss of generality, let us assume that the noise vector δ only has nonzero components along the μ = 𝒞 + 1, …, 𝒬 semiconstrained dimensions, as well as along some (or all) of the constrained dimensions. Then ey changes to
(C1) |
Furthermore, if some previously semiconstrained components that contributed to es∗ have now become constrained, then the boundary vector no longer has those components. This means that we have to subtract these components from es∗:
(C2) |
where Aμ is 1 if ey and eμ have the same sign and 0 otherwise. This follows from the definitions of the boundary projection vector (A33) and es∗ (A49). Thus, for a given δ the certainty condition (A48) yields
(C3) |
As before, one can interpret the above inequality as equivalently specifying either y-critical or W-critical. For a fixed set of responses, one can obtain the minimum value of the left-hand side (LHS) of the latter inequality by varying δ within the ε ball. The square root of this minimum is W-critical, and as long as W is less than W-critical, we will have a certain synapse. Inverting the relation, one finds y-critical as the minimum y needed to make the synapse sign certain for all δ, given the response orientation and W. More explicitly, equating the two sides of the inequality for any given δ, response orientation, and W, we get a minimal y that depends on δ. To find y-critical, we have to take the maximum of these minimal y values as we vary δ over all possibilities in the ε ball.
Let us first obtain a lower bound on y-critical. By inspection of the LHS of the above inequality, it is clear that the more the δ-dependent terms can cancel the response-dependent terms, the harder it is to satisfy the certainty condition. We observe that in Eq. (C3) the second term is minimized when δ has no components along the semiconstrained directions. Accordingly, one can obtain a lower bound on y-critical by substituting such a δ in Eq. (C3):
(C4) |
where the simplifications follow because δ has no components along the semiconstrained directions. We will see later that this simple lower bound can approximate the actual y-critical very well in many situations. Notice that we used a subscript “0” to denote this lower bound. This is because, as we will soon see, when noise allows for topological transitions, one may be able to obtain stricter lower bounds by allowing some constrained dimensions to behave as semiconstrained. The “0” emphasizes that no constrained indices behave as semiconstrained. Next, we can find an upper bound for y-critical by noting
(C5) |
where the first inequality holds because of the bound on the noise, and the second because we have dropped positive (δ²) terms. Then we can obtain an upper bound on y-critical by finding a y such that even the last expression on the RHS is greater than W². Specifically,
(C6) |
So, let us try to find the noise vector δ that minimizes the LHS:
(C7) |
where we have noted that δ must lie in the activity-constrained subspace. It is now clear that the LHS is minimized if δ anti-aligns with the projection of ê onto this subspace. Then Eq. (C6) yields
(C8) |
Equating the two sides of Eq. (C8) and solving for y, we now get an upper bound for y-critical:
(C9) |
where ξ is the norm of this projection and can be simplified as
(C10) |
Thus, we have
(C11) |
As with the lower bound, we will see that to obtain the correct upper bound in the presence of topological transitions, one has to maximize over several upper bounds. Hence, we label the above upper bound, which does not include any effects from topological transitions, with an index “0.”
Finally, we would like to point out that for small errors one can also obtain an approximate correction to y-critical that lies in between ycr,min,0 and ycr,max,0. To obtain this estimate, let us first write down the bound on y that one would obtain from Eq. (C3) as δ → 0:
(C12) |
If δ has components along any semiconstrained direction that contributes toward the original boundary vector, then es∗ is reduced, and comparing Eqs. (A48) and (C12) we see that, as δ → 0, the bound on y will be less than the zero-error ycr. In other words, for sufficiently small errors, if δ explores directions that contribute to es∗, then the corresponding bound on y is going to be smaller than even the zero-error ycr. Thus, for these small errors, the leading-order correction to Eq. (A48) is obtained only if δ does not have any components along these semiconstrained directions. We can therefore reorder the indices such that the semiconstrained directions along which excursions of δ will be considered range over μ = 𝒞 + 1, …, 𝒬; i.e., these, and only these, semiconstrained indices do not contribute to es∗. To obtain the certainty condition, one can then follow steps (C5) through (C9), except that δ is restricted to have nonzero components only along the constrained directions and those semiconstrained directions that do not contribute to es∗, i.e., along μ = 1, …, 𝒬. In other words, it can at most anti-align with a truncated projection of ê,
(C13) |
where ξtrunc is the norm of this truncated projection and can be simplified as
(C14) |
Substituting ξ = ξtrunc into the counterpart of Eq. (C9), and keeping only the linear terms in ε, we thus get the leading order correction to Eq. (A48):
(C15) |
We will see later how ycr,appr,0 can be generalized to provide an approximation, ycr,appr, to y-critical that accounts for topological transitions.
Reassuringly, we see that at ε = 0, ycr,appr,0, ycr,max,0, and ycr,min,0 all reduce to the zero-error ycr of Eq. (A48). Also, the coefficient of ε in ycr,appr,0 is greater than that of ycr,min,0 but less than that of ycr,max,0. Note that ycr,appr,0 coincides with ycr,min,0 in the maximally nonlinear case. In Fig. 10(a), we have plotted how these different quantities depend on ep, ey, and es∗. In particular, we note that as the network size increases, these curves typically come closer together [Fig. 10(b)], so that they provide a good approximation for y-critical. Finally, for future reference, we point out that for a given set of input patterns, zμm, the various y-criticals that we have computed above depend on the orientation of the target response vector, ŷ, and the total noise budget, ε.
b. Comparing predictions from linear and nonlinear models
To assess the effects of nonlinearity, it is useful to compare the predictions of certain synapses between the linear and nonlinear theories. In a linear theory, there are no semiconstrained directions, and therefore a lower bound, a leading-order approximation, and an upper bound on y-critical can be obtained from Eqs. (C4), (C15), and (C11), respectively, by setting es∗ = 0:
(C16) |
(C17) |
(C18) |
We note that since all these quantities are increasing functions of es∗, the linear values are always less than or equal to their nonlinear counterparts. Since no topological transitions are possible in a linear theory, these expressions do not need a qualifying “0” index. In Fig. 10, we show a comparison between the upper bounds on y-critical obtained in the linear and the nonlinear theories.
c. ycr,max, when one and only one nonzero response is smaller than noise
In the previous section we considered responses that are either zero, or positive and greater than the noise bound, ε. In this section, we consider a situation where one and only one of the observed responses is smaller in magnitude than the noise, |y1| < ε.
Note that once we admit noise, it is possible for the small observed response to be negative. In this case, there is no zero-error solution as Φ(η1) cannot be negative, and therefore we need a minimum noise, and incur a minimum error:
(C19) |
In fact, since the noise for this observation has to be positive, we must have
(C20) |
Then using
(C21) |
we obtain a modified bound on the noise:
(C22) |
since δ1 ⩾ |y1|. Or,
$\sum_{\mu \neq 1} \delta_\mu^2 \leq \varepsilon^2 - y_1^2 \equiv \varepsilon'^2.$  (C23)
Let us now introduce a new reduced response vector whose response to the first pattern is set to zero:
$y'_1 = 0, \qquad y'_\mu = y_\mu \;\; \text{for } \mu \neq 1.$  (C24)
We can then identify δ1 + y1 to be the noise associated with the μ = 1 response in this new feedforward problem, while the other δμ's can continue to represent the noise associated with all the other responses. Thus, a sufficient condition for a given synapse to be certain is
(C25) |
A very similar condition arises if y1 is positive but small enough to admit a topological transition. To see how, let us first remember that in order for a synapse to be certain, the solution space should not intersect the w = 0 hyperplane. Now, let us look at the part of the solution space coming from denoised responses whose first component remains positive. Since the solution space corresponding to these responses does not have any additional semiconstrained dimension compared to the observed response, the condition for no intersection with this part of the solution space is simply given by
(C26) |
a condition that guarantees a certain synapse when no topological transitions are considered. Next, consider the solution space for denoised responses whose first component is zero. The solution space for these responses has an additional semiconstrained dimension corresponding to the first pattern. We can therefore use the reduced response vector (C24), so that the solution space corresponding to this new response vector with the error bound ε′ (C23), together with the solution space considered above, accounts for the full solution space of the observed response with error ε. Note that the noise budget is again reduced according to Eq. (C23), since we are committed to making at least an error of y1 to convert the first response to a semiconstrained dimension. To ensure that there is no intersection of the w = 0 hyperplane with the solution space, we must therefore also satisfy Eq. (C25). We note that to calculate the right-hand side using Eq. (C11), the various projections have to be recalculated according to
(C27) |
The condition (C25) on the reduced response vector translates to a condition on y:
(C28) |
where we have defined ŷμ to be the μth component of the unit vector ŷ.
We note that this is a nonlinear inequality, as the right-hand side depends on y through its implicit dependence on ε′. When y1 is negative, only Eq. (C28) needs to be satisfied to guarantee a certain synapse, but if y1 is positive, both Eqs. (C26) and (C28) have to be satisfied. It is not hard to see how this process should be continued if one has more than one topological transition within the allowed error. Since we know the precise sequence of topological transitions, all of the sequential certainty conditions can in principle be obtained. A synapse is certain if all of its certainty conditions are satisfied.
So far, we have obtained a way to check whether a synapse is certain given the response data. We also have an upper bound on y-critical, ycr,max,0, that ignores effects from topological transitions when all the observed responses are nonnegative. We will now investigate how topological transitions can change this upper bound. We will start by quantifying the effects of a single topological transition, finding a potentially new upper bound for y-critical, ycr,max, such that if y ⩾ ycr,max, then the synapse is certain. Suppose we start out with a data vector whose norm is so large that there are no topological transitions. Then, as we decrease the norm while keeping the orientation, ŷ, fixed, eventually a semiconstrained dimension will open up in the solution space, in our example along the first direction. If we keep decreasing the norm further, then at some point another response dimension will become semiconstrained due to the presence of noise. Let us, however, consider the situation where ycr,max (which is yet to be computed) turns out to be larger than the norm at which the second transition occurs. In this case, we do not have to consider this possibility (or any further transitions), because the second transition cannot occur if y ⩾ ycr,max. We will later find a condition that guarantees this. Since we are trying to find the smallest value of ycr,max that we can, all of this means that at y = ycr,max one of the two inequalities, (C26) or (C25), becomes an equality. While the first equality is trivial to solve, as its right-hand side does not depend on ycr,max, the second equation is highly nonlinear:
(C29) |
where
(C30) |
In particular, we notice that there are two competing effects that ultimately determine ycr,max,1. The numerator depends on ε′, which decreases as ycr,max,1 increases and therefore has an overall effect of decreasing ycr,max,1. However, the presence of in the denominator within the square root tends to increase ycr,max,1. To determine the correct upper bound on y-critical, one has to compare the value obtained from Eq. (C29) with , and then choose the maximum, because only then are both inequalities (C26) and (C25) satisfied. For the negative-response case, we simply need to solve Eq. (C29) to obtain ycr,max,1.
Now, determining ycr,max,1 from Eq. (C28) involves solving a quartic equation, leading to expressions that are not particularly insightful. However, we can obtain a relatively simple conservative estimate that bypasses the nonlinearity if we have a lower bound on y-critical, ycr,min, because we can use this bound to overestimate ε′:
(C31) |
Before we describe how we can obtain ycr,min, let us note that if
(C32) |
then the second transition occurs at a magnitude that is lower than ycr,max,1, and therefore does not need to be incorporated in the ycr,max,1 calculation. Indeed, Eq. (C32) is a sufficient condition but not a necessary one.
d. ycr,min, when one and only one nonzero response is smaller than noise
When we have a small response, |y1| < ε, we have seen that we have to consider the solution space around a reduced response vector, (C24), with a smaller error budget, ε′ (C23). Accordingly, we can obtain an equation for a lower bound on y-critical using Eq. (C4),24
(C33) |
where μ = 1 along with μ = 𝒞 + 1 … 𝒫 are all treated as semiconstrained. As before, since the norms along and are related via
(C34) |
we get an equation for ycr,min,1 very similar to Eq. (C29) for ycr,max,1:
(C35) |
Although nonlinear, the above equation reduces to a quadratic equation for ycr,min,1,
(C36) |
solving which we get25
(C37) |
For positive y1, the above expression provides another lower bound, in addition to the one obtained without the transition (C4). To ensure that we have the tightest possible lower bound, we thus maximize:
(C38) |
e. When more than one nonzero response is smaller than the allowed error
It is not difficult to see how the arguments above generalize when more than one observed response is small (< ε). We have to consider cases where all the negative observed responses, together with different possible combinations of the positive responses, are set to zero. Let us denote by T one such possible set of μ indices. As before, we define a reduced response vector, which is now indexed by T:
(C39) |
so yT,μ = 0 for all μ ∈ T. Then, essentially following the same algebraic manipulations as above we obtain a lower bound according to
(C40) |
To reiterate, whenever an observed response is negative, which is inconsistent with a threshold-linear transfer function, some of the noise budget has to be used up to bring this response up to zero, and the same reduction of the noise budget occurs if one wants to consider topological transitions. Each ycr,min,T evaluated this way provides a lower bound, so we take the maximum over all of them to find the tightest lower bound, ycr,min. To make things explicit, let us also enumerate the new expressions for the various projections of that one needs to calculate :
(C41) |
Once we have a lower bound, we can obtain conservative upper bounds analogous to Eq. (C31) for each T:
(C42) |
As before, for a consistent upper bound for y-critical, we need to take the maximum over all ycr,max,T ’s.
Finally, we note that we do not need to consider all possible transitions. While going through the sequence of transitions, as soon as we find a T such that for all μ ∉ T, we can stop: by the time is small enough that any additional yμ’s can be set to zero, the synapse is already uncertain.
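To make the enumeration over candidate sets T concrete, here is a minimal Python sketch. The helper `ycr_min_for_set` is a hypothetical placeholder standing in for Eq. (C40), which is not reproduced here; only the combinatorial structure of the search follows the text.

```python
from itertools import combinations

import numpy as np


def tightest_lower_bound(y, eps, ycr_min_for_set):
    """Return the tightest lower bound max_T ycr_min_T over candidate sets T.

    `ycr_min_for_set(y, T)` is a hypothetical callable implementing Eq. (C40).
    All negative observed responses are always placed in T; every combination
    of the small (0 < y_mu < eps) positive responses is then added on top,
    mirroring the enumeration described in the text.
    """
    y = np.asarray(y, dtype=float)
    negative = [i for i, v in enumerate(y) if v < 0]
    small_pos = [i for i, v in enumerate(y) if 0 < v < eps]
    best = -np.inf
    for k in range(len(small_pos) + 1):
        for extra in combinations(small_pos, k):
            T = tuple(negative) + extra
            best = max(best, ycr_min_for_set(y, T))
    return best
```

The early-stopping criterion described above can be used to prune this enumeration as soon as a candidate set renders the synapse uncertain for all remaining responses.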
f. Numerical simulation
To illustrate the behavior of the various y-critical functions and check their utility, in Fig. 11 we have plotted ycr,min (light gray curve) and ycr,max (black curve) for the same 10² configurations as those depicted in Fig. 8(b), involving a feedforward simulation with 𝒩 = 6, 𝒫 = 5, 𝒞 = 2, and ℇ < ε = 0.1. We also defined ycr,appr as the maximum over different approximations, ycr,appr,T, that incorporate topological transitions and are defined as natural generalizations of Eq. (C15):
(C43) |
We have plotted ycr,appr, the approximation of y-critical, in green in Fig. 11. As in Fig. 8(b), the black dots denote the maximum value of y in our simulations that still admitted mixed signs for the synapse under consideration; for details on the simulations, see Appendix F. As one can see, most of the black dots closely track the ycr,min curve, but some dots lie between the ycr,appr and ycr,min curves.
2. New sources of corrections in recurrent neural networks
It is clear that recurrent neural networks inherit error corrections to y-critical that were already present in the feedforward case. There are two additional sources of error that one could consider as one moves from feedforward to recurrent networks. However, our numerical simulations of recurrent networks suggest that these are sometimes small effects, and we leave their systematic study for the future.
First, we could account for the fact that the directions themselves can change. The inputs driving any given driven neuron can no longer be assumed to be fixed at zμm if the other driven neurons suffer from noise, yet it is these activity patterns that define the directions and ημ coordinates. Allowing noise in the input neurons would lead to similar corrections.
Second, the total error in Eq. (63) may be unevenly distributed across the driven neurons. If the total squared error summed over all responses and neurons is , then on average the root-mean-square error associated with each driven neuron is . We can thus hope that substituting in the various y-critical formulas will provide a good approximation. However, it is also possible that a few neurons incur most of the error (up to εtot), potentially violating the certainty conditions computed from the root-mean-square error over neurons.
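To make the bookkeeping behind this estimate explicit, here is a hedged restatement (the symbols εtot and 𝒟 follow the surrounding text; the per-neuron errors εi and the exact form of Eq. (63) are notational assumptions):
\[
\sum_{i=1}^{\mathcal{D}} \varepsilon_i^2 \;=\; \varepsilon_{\rm tot}^2
\quad\Longrightarrow\quad
\sqrt{\frac{1}{\mathcal{D}}\sum_{i=1}^{\mathcal{D}} \varepsilon_i^2}
\;=\; \frac{\varepsilon_{\rm tot}}{\sqrt{\mathcal{D}}},
\qquad\text{while}\qquad
\max_i \varepsilon_i \;\le\; \varepsilon_{\rm tot},
\]
where εi denotes the error incurred by the ith driven neuron. Substituting εtot/√𝒟 is therefore accurate on average but can be exceeded by individual neurons.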
APPENDIX D: BEYOND THRESHOLD-LINEAR TRANSFER FUNCTIONS
So far, we have always modeled the firing rate as a threshold-linear function applied to the input drive. Here we will explain how our analyses of y-critical with noise also provide a formalism to analyze a much more general class of nonlinear transfer functions.
1. Bounded deviations from the threshold-linear function
Let us start by considering transfer functions with bounded differences from the threshold-linear function:
(D1) |
In this case, the fixed-point equations become
(D2) |
Since |Δ(x)| is bounded by Δ0, we have a bound on the squared norm of :
(D3) |
It is therefore clear that we can estimate y-critical for the Ψ nonlinearity with exactly the same formalism that we used to estimate y-critical for the threshold nonlinearity in the presence of noise. In particular, all the y-critical estimates [(64), (65), (C15)] are valid with the substitution, . Moreover, one can account for other sources of noise (bounded by ε0) by instead substituting
(D4) |
to obtain estimates and bounds on y-critical.
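Since Eqs. (D3) and (D4) are not reproduced above, the following display is only a sketch of the reasoning behind the substitution, under the assumption that the bound |Δ(x)| ⩽ Δ0 is applied independently to each of the 𝒫 response patterns:
\[
\big|y_\mu - [\eta_\mu]_+\big| \;=\; \big|\Delta(\eta_\mu)\big| \;\le\; \Delta_0
\quad\Longrightarrow\quad
\sqrt{\sum_{\mu=1}^{\mathcal{P}} \big(y_\mu - [\eta_\mu]_+\big)^2}
\;\le\; \sqrt{\mathcal{P}}\,\Delta_0 ,
\]
so every fixed point of the Ψ network lies within a distance of order √𝒫 Δ0 of an exact threshold-linear solution, and an independent noise source bounded by ε0 simply adds to this budget, giving an effective error of order ε0 + √𝒫 Δ0.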
2. Bounded departures from any threshold-monotonic nonlinearity
Let us now consider transfer functions, Ψ(x), that are close to a function, Ξ(x), that monotonically increases above a threshold, xT :
(D5) |
Accordingly, we find
(D6) |
Since the monotonicity condition ensures that Ξ−1 is well defined above threshold, we then have the upper bound,
(D7) |
Additionally, if yμ > Δ0, then we also have a lower bound:
(D8) |
Thus, combining the upper and lower bounds, we find
(D9) |
However, if yμ ⩽ Δ0, then there is no lower bound, and any ημ satisfying the upper bound (D7) is allowed. Now, we can introduce effective responses, representing the midpoint of possible superthreshold input drives,
(D10) |
and effective noise limits,
(D11) |
which allow ημ to span the full allowed range. By inspection, we now see that the solution space is equivalent to the solution space of a threshold-linear problem:
Thus, again all the y-critical estimates [(64), (65), (C15)] will be valid with the substitution and a conservative error bound
(D13) |
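Equations (D10)–(D13) are likewise not reproduced above, so the following is only one plausible reading of the verbal description, with the interval endpoints and the conservative combination treated as assumptions:
\[
\eta_\mu^{\max} = \Xi^{-1}\!\big(y_\mu + \Delta_0\big), \qquad
\eta_\mu^{\min} =
\begin{cases}
 \Xi^{-1}\!\big(y_\mu - \Delta_0\big), & y_\mu > \Delta_0,\\[2pt]
 x_T, & y_\mu \le \Delta_0,
\end{cases}
\]
\[
\tilde y_\mu = \tfrac12\big(\eta_\mu^{\max} + \eta_\mu^{\min}\big), \qquad
\tilde\Delta_\mu = \tfrac12\big(\eta_\mu^{\max} - \eta_\mu^{\min}\big),
\]
that is, the superthreshold drives consistent with an observed response y_μ form an interval, the effective response is its midpoint, and a conservative error budget can be obtained by combining the half-widths Δ̃_μ (together with any additional measurement noise) in quadrature.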
APPENDIX E: CERTAIN SYNAPSES IN LOW-DIMENSIONAL RECURRENT NETWORKS WITH SELF-CONNECTIONS
As discussed in Appendix A, when one moves from feedforward to recurrent neural networks with self-synapses, the input patterns can no longer be considered independent from the target neuron responses. How then does one assess synapse certainty for driven neurons with self-synapses, such as y3 in Fig. 6(a), y in Fig. 12(a), y1 in Fig. 12(c), and y2 in Fig. 12(c)?
We begin by concretely analyzing neuron y in Fig. 12(a), because this is the conceptually simplest example, and fundamentally the mathematical analyses are the same for the other examples. In particular, to test our formalism and analytical results using this neuron, we performed low-dimensional simulations where the 𝒩 × 𝒩 extended input pattern matrix was
(E1) |
which is the same as X in Eq. (60), except that the role of x3 is now played by y itself.26 The third column of Eq. (E1) corresponds to the responses of the driven neuron, but it also provides a self-input. The two input neuron responses are given by the first two columns. Equation (E1) is meant to correspond to the case where 𝒫 = 2 and 𝒞 = 1, such that μ = 1, 2, 3 correspond to the constrained, semiconstrained, and unconstrained response patterns, respectively. For the purpose of numerical testing, we assumed that χ ∈ (0°, 90°) and ψ ∈ (−90°, 0°), as this range of angles ensures that driven neuron responses were nonnegative.27 We set W = 1. See Appendix F for numerical simulation details.
This example problem has one self-coupling, w, and two feedforward couplings, u1 and u2. For this response structure, one can use Eq. (51) to calculate ycr for each of these three couplings. We find
(E2) |
We assess synapse certainty by checking whether these formulas for ycr are smaller than the magnitude of ,
(E3) |
Since W = 1, we see that the self-coupling becomes certain only if
(E4) |
[Fig. 12(b), left], where θ has been defined to be the angle between and ,28 consistent with the conventions of Eq. (37). Next, basic trigonometric manipulations tell us that the certainty condition can never be satisfied for u1 [Fig. 12(b), middle]. Finally, we see that the certainty condition falls just short of being satisfied for u2, which here implies that all solutions have u2 ⩾ 0 [Fig. 12(b), right]. This is because it is a very special case where cos γ = es∗ = 0 and in Eq. (A51) is aligned with , so that both inequalities in Eq. (A53) turn into equalities. In each panel of Fig. 12(b), we tracked the fraction of positive and negative synapse signs across the simulations as we varied θ. In particular, we see that w had a unique sign as long as θ < 45°, u1 always had mixed signs, and u2 was nonnegative.29
The same response matrix (E1) can also be used to analyze neurons y3 in Fig. 6(a) and y2 in Fig. 12(c). The only difference is that the three columns respectively encode the responses of the second driven neuron, second input neuron, and third driven neuron in Fig. 6(a), and the responses of the single input neuron and the two driven neurons, y1 and y2, in Fig. 12(c). This correspondence can be seen by comparing the numerical results in Figs. 12(b) and 12(d). We used this correspondence to avoid having to simulate the full network in Fig. 6(a), and the numerical results in Fig. 6(c) are the same as those in Fig. 12(b).
APPENDIX F: NUMERICAL METHODS
1. Low-dimensional numerical methods
Here we detail the numerical methods relevant for Figs. 6 and 12.
a. Feedforward analysis
To test the analytic dependence in Fig. 6(b), we wanted to simulate solutions without biasing ourselves by the particular search algorithm used to find them. Accordingly, to find solutions to the fixed-point equations (A2) with very small error (ℇ < ε = 0.01), we performed a random screen where each weight was chosen randomly from a uniform distribution between −1 and +1. For feedforward circuits, given the synaptic weights, one can obtain the fixed-point responses of the target neuron by direct substitution of the known input responses into Eq. (A2) and then compare these simulated target responses with the known target responses. We varied ψ, χ in the response data (E1) systematically in steps of 6°.30 For the light and dark green curves, ψ was fixed at 45° and χ was varied between (0°, 90°) and (90°, 180°), respectively, while for the pink and purple curves χ was fixed at 45° and 135°, respectively, and ψ was varied between (0°, 90°). Finally, for a given choice of ψ, χ, we systematically varied y between 0 and 1 in intervals of Δy = 0.01. For each value of ψ, χ, and y, we obtained ~10²–10⁴ solutions31 satisfying the error bound and the biological bound (A11) from five to ten million different trial weight vectors. We then identified the maximal value of y for which the solutions had both positive and negative w1’s. This simulation point should lie beneath the theoretical ycr if no error is allowed; however, since the error is small but nonzero, the y-criticals determined from simulations occasionally slightly exceeded the theoretical value. Also, since we varied y in small steps of Δy = 0.01, we expected the simulated y-criticals to be discrete but close to the theoretical predictions, which is exactly what we found in Fig. 6(b).
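As a concrete illustration of this screening procedure, here is a minimal Python sketch (our own rendering, not the original code). Two assumptions are made: the error is the Euclidean norm of the response mismatch for a threshold-linear target neuron, and the biological bound (A11), which is not reproduced here, is treated as a cap on the weight norm.

```python
import numpy as np


def random_screen(x, y_target, eps=0.01, w_max=1.0, n_trials=5_000_000, rng=None):
    """Random screen over feedforward weights, mirroring the text.

    Each trial draws every weight uniformly from [-1, 1] and keeps the weight
    vector if (i) the assumed biological bound ||w|| <= w_max holds and (ii)
    the threshold-linear fixed-point responses match the targets to within eps.
    x has shape (P, N) with one input pattern per row; y_target has shape (P,).
    """
    if rng is None:
        rng = np.random.default_rng()
    P, N = x.shape
    solutions = []
    for _ in range(n_trials):
        w = rng.uniform(-1.0, 1.0, size=N)
        if np.linalg.norm(w) > w_max:
            continue  # violates the assumed biological bound
        err = np.maximum(x @ w, 0.0) - y_target  # direct substitution into the fixed point
        if np.linalg.norm(err) < eps:
            solutions.append(w)
    return np.array(solutions)
```

Checking whether a given weight takes both signs across the collected solutions then mirrors how the simulated y-critical values were identified.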
b. Recurrent analysis
Because the recurrent network solution space separates into several feedforward solution spaces at zero error, we numerically treated the driven neurons one at a time. To find solutions for the recurrent neurons in Figs. 6(a), 12(a), and 12(c), we fixed χ, ψ and then performed screens with random weights, selected in the same manner as in the feedforward simulations discussed earlier. For each set of weights, and for each μ = 1, 2, we obtained the late-time values of y by solving the time-evolution equation [Eq. (6) with τi = 20 ms] using Euler’s method, starting from the initial conditions yi(0) = yμi. We used a time step of Δt = 0.2 ms. The ’s obtained from the simulation at late times, t ~ 600 ms, were then compared with yμ to obtain ℇ. If the weights satisfied the error bound and the biological bound (A11), then we considered them solutions and checked the signs of the synaptic weights. For every value of ψ, χ, we found at least 50 solutions32 to test the certainty predictions.
2. High-dimensional numerical methods
Here we detail the numerical methods relevant for Figs. 8 and 11.
a. Generating random orthogonal matrices
In several simulations we had to generate orthogonal response matrices, i.e., 𝒫 orthonormal 𝒩-dimensional vectors. This was done by first generating an 𝒩 × 𝒩 matrix, G, with each entry randomly selected from a uniform distribution between −1 and +1. We then antisymmetrized the matrix, G → (G − GT)/2, and obtained a random orthogonal response matrix via matrix exponentiation, Z = eG, where the matrix exponential is defined by substituting G into the power series of the exponential function (not by exponentiating individual matrix elements). The first 𝒫 rows of Z then served as the 𝒫 orthogonal patterns, and Z can be interpreted as the orthogonal extension of z, as discussed before.
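This construction is straightforward to implement; below is a minimal sketch using NumPy and SciPy (the choice of library, the function name, and the final orthonormality check are ours, not the original code).

```python
import numpy as np
from scipy.linalg import expm


def random_orthogonal_patterns(n, p, rng=None):
    """Return a (p x n) matrix whose rows are orthonormal n-dimensional patterns.

    Follows the procedure described in the text: draw G with entries uniform in
    [-1, 1], antisymmetrize it, and matrix-exponentiate to obtain an orthogonal
    matrix Z = exp(G).  The first p rows of Z then serve as the p patterns.
    """
    if rng is None:
        rng = np.random.default_rng()
    G = rng.uniform(-1.0, 1.0, size=(n, n))
    G = 0.5 * (G - G.T)   # antisymmetrize: G -> (G - G^T)/2
    Z = expm(G)           # the exponential of an antisymmetric matrix is orthogonal
    return Z[:p, :]


# quick orthonormality check for a small example
z = random_orthogonal_patterns(n=6, p=5)
assert np.allclose(z @ z.T, np.eye(5), atol=1e-10)
```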
b. Generating random orthogonal matrices with nonnegativity constraints
In recurrent networks, all driven neurons must have nonnegative responses for all patterns. Accordingly, when the input response pattern includes responses of driven neurons, we followed a different procedure for generating the response matrix, which works as long as ℐ ⩾ 𝒫 − 1. We started by choosing a (𝒫 × 𝒩)-dimensional matrix, z, containing the responses of 𝒟 driven neurons and ℐ ⩾ 𝒫 − 1 input neurons. The first 𝒟 columns of z corresponded to driven neuron responses and the last ℐ columns to input neuron responses, such that z = (y x). We made sure that the responses of the driven neurons were all nonnegative, as the threshold nonlinearity dictates, by choosing them to lie randomly between 0 and 1. To mimic a sparse response pattern, we set driven responses to 0 with 50% probability. The feedforward inputs, however, were randomly selected between −1 and 1. We then orthogonalized the response patterns using the input-neuron responses as follows. We started by normalizing the ν = 1 pattern:
(F1) |
Then, for each row, ν = 2 … 𝒫, in a sequential order we performed the following operations:
- We started by defining a (ν − 1)-dimensional square matrix, x′:
(F2) |
- We next changed the first m = 1 … ν − 1 elements of the νth row of x, and thus of z. The other elements of the νth row of z were left unchanged; in particular, none of the driven neuron responses changed during this step.
(F3) |
- Finally, we rescaled all the elements of the νth row of z for normalization:
(F4) |
This algorithm essentially uses the responses of the input neurons to the νth stimulus to ensure that the full νth response pattern, involving both the driven and input neurons, is orthogonal to all μ ⩽ ν − 1 patterns.
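Because Eqs. (F2)–(F4) are not reproduced above, the sketch below is only one way to realize the verbal description: for each row ν it solves a small linear system for the first ν − 1 input-neuron entries so that the row becomes orthogonal to all previous rows, and then rescales the whole row. The function and variable names, and the use of numpy.linalg.solve, are our own assumptions.

```python
import numpy as np


def orthogonalize_rows_fixing_driven(z, n_driven):
    """Orthonormalize the rows of z (shape P x N), touching only input entries.

    For the nu-th row, the first (nu - 1) input-neuron entries are recomputed so
    that the row is orthogonal to all previous rows; the driven-neuron entries
    are only rescaled by the final (positive) normalization, which preserves
    their nonnegativity.  Requires at least P - 1 input neurons, and a
    nonsingular linear system (otherwise the random draw must be repeated).
    """
    z = np.array(z, dtype=float)
    P, N = z.shape
    z[0] /= np.linalg.norm(z[0])                  # cf. Eq. (F1): normalize the first row
    for v in range(1, P):
        free = np.arange(n_driven, n_driven + v)  # input entries to be adjusted
        fixed = np.setdiff1d(np.arange(N), free)  # all other entries are kept
        A = z[:v][:, free]                        # (v x v) system matrix, cf. x'
        b = z[:v][:, fixed] @ z[v, fixed]         # overlap contributed by fixed entries
        z[v, free] = np.linalg.solve(A, -b)       # enforce orthogonality to rows 0..v-1
        z[v] /= np.linalg.norm(z[v])              # cf. Eq. (F4): renormalize the row
    return z
```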
c. Generating target responses and response directions
To generate 𝒫 target responses with 𝒮 null responses, we simply randomly selected numbers between 0 and 1 for the 𝒞 = 𝒫 − 𝒮 nonzero responses.
In some simulations, we wanted to consider situations where one has to account for a single topological transition to compute y-critical. Accordingly, we tailored the responses as follows. First, we set one nonzero response of the target neuron to a small value, 0.1ε. was then obtained by dividing the response vector by its norm. We then only considered those ’s whose other entries were large enough to prevent additional topological transitions from affecting y-critical. This was done by: evaluating ycr,min, the theoretical lower bound for y-critical that includes the first topological transition (Appendix C); constructing , which approximates the activity vector right below y-critical; and ensuring that all the other entries of were greater than the allowed error, which guarantees that no other constrained dimensions can become semiconstrained in between ycr,min and the true y-critical. This way, typically one and only one constrained direction became semiconstrained when we allowed solutions with errors ≲ε.
d. Finding solutions using gradient descent learning in feedforward networks
In all the high-dimensional simulations, we had to find solutions to the fixed point equations (A2). Since scanning a high-dimensional synaptic weight space randomly is not numerically efficient, we applied gradient descent learning33 to obtain solutions. For feedforward networks, this meant using the loss function
(F5) |
We performed gradient descent optimization until we reached the desired error bound, ℇ < ε. The initial weights were first chosen randomly from a uniform distribution between −1 and 1. The initial weight vector was then rescaled to have a norm between 0 and W = 1, chosen uniformly.
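Since Eq. (F5) is not reproduced above, the sketch below assumes a standard squared fixed-point error for a single driven neuron and treats the biological bound (A11) as a cap on the weight norm; the function and variable names are ours, not the original code.

```python
import numpy as np


def fit_feedforward_weights(x, y_target, eps=0.1, lr=0.02, w_max=1.0,
                            max_steps=200_000, rng=None):
    """Gradient descent on a stand-in for Eq. (F5) for one driven neuron.

    x        : (P x N) input patterns (rows are patterns mu = 1..P)
    y_target : (P,) desired steady-state responses
    Loss (assumed): E^2 = sum_mu (y_mu - [w . x_mu]_+)^2, minimized until E < eps.
    """
    if rng is None:
        rng = np.random.default_rng()
    N = x.shape[1]
    w = rng.uniform(-1.0, 1.0, size=N)
    w *= rng.uniform(0.0, w_max) / np.linalg.norm(w)  # initial norm uniform in (0, W]
    for _ in range(max_steps):
        drive = x @ w
        err = np.maximum(drive, 0.0) - y_target
        if np.linalg.norm(err) < eps:
            return w                                   # solution found
        grad = x.T @ (err * (drive > 0))               # ReLU subgradient of the loss
        w -= lr * grad
        if np.linalg.norm(w) > w_max:                  # assume (A11) caps ||w|| at W
            w *= w_max / np.linalg.norm(w)
    return None                                        # no solution within the step budget
```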
e. Finding solutions using gradient descent learning in recurrent networks
To find solutions for recurrent neural networks, we used the modified loss function,
(F6) |
to perform gradient descent, instead of Eq. (63). Since the responses of the driven neurons can vary for nonzero errors, the two loss functions, ℇ and , differ. However, it is numerically much quicker to obtain solutions via gradient descent with than to use backpropagation through time to account for the entire time evolution of the network. Thus, the strategy we adopted to find solutions with ℇ ≲ ε was to first find weights satisfying . The gradient descent was done in two stages. In the first stage we minimized the error associated with each individual driven neuron, treating it as a feedforward problem. Once each of these errors was less than , we performed a second stage of gradient descent to minimize down to . Next, we obtained the late-time values of the yi’s by solving the time-evolution equations (6), with τi = 20 ms, using Euler’s method with time step Δt = 0.2 ms, for the weights obtained via gradient descent and starting from the initial conditions yi(0) = yμi, ∀μ,i. The ’s obtained at late times, t ~ 600 ms, were compared with yμi to obtain ℇ. Finally, we checked that the weights satisfied the biological bound34 (A11).
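For concreteness, the verification step can be sketched as follows, assuming Eq. (6) takes the standard threshold-linear rate form τ ẏ = −y + [recurrent drive + feedforward drive]_+; the matrix names W and U and the function below are illustrative, not the original code.

```python
import numpy as np


def simulate_recurrent_fixed_point(W, U, x_mu, y_init, tau=20.0, dt=0.2, t_end=600.0):
    """Euler integration of assumed threshold-linear rate dynamics for one pattern mu.

    W couples the driven neurons, U carries the feedforward input, x_mu is the
    input pattern, and y_init = y_mu sets the initial state.  Times are in ms;
    tau = 20 ms, dt = 0.2 ms, and t_end ~ 600 ms follow the text.
    """
    y = np.array(y_init, dtype=float)
    ff_drive = U @ x_mu                                  # feedforward drive is constant in time
    for _ in range(int(t_end / dt)):
        drive = W @ y + ff_drive
        y += (dt / tau) * (-y + np.maximum(drive, 0.0))  # Euler step of tau*dy/dt = -y + [drive]_+
    return y                                             # late-time response, compared with y_mu
```

Comparing the returned late-time responses with the target responses yμi then yields the error ℇ used to accept or reject a candidate weight configuration.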
f. Other minor simulation details
In Fig. 8(b), we show results from a simulation with 𝒩 = 6, 𝒫 = 5, 𝒞 = 2, and ℇ < ε = 0.1. The solutions were obtained using a gradient descent learning rate of 0.02. We varied the norm of the target response vector, , systematically by Δy = 0.01 in a manner similar to the low-dimensional simulations.
In Fig. 8(c), we considered a single input-output configuration for a given value of 𝒩 and 𝒫, and we found a single solution with using a gradient descent learning rate of 0.005.
In Figs. 8(d)–8(e), we show results of a simulation for a 𝒩 = 10, 𝒟 = 4, ℐ = 7, 𝒫 = 8, and 𝒞 = 3 network where the norm of was fixed to 0.79, which approximates the median value of y for the given values of 𝒩, 𝒫, and 𝒞 [Eq. (B5)]. Our solutions were obtained using a gradient descent learning rate of 0.004 and the overall error satisfied .
Footnotes
Assuming that the number of patterns does not exceed the dimensionality of the synaptic weight vector.
It will become progressively evident that our construction of the solution space and certainty condition can be trivially adapted to the case where the number of presynaptic neurons changes from one driven neuron to another.
In fact, one can easily incorporate the case when the number of presynaptic partners differs from one driven neuron to another. This just means that the z(i) matrices will have dimensions 𝒫 × 𝒩i, where 𝒩i represents the number of presynaptic partners of the ith neuron.
Nevertheless, the vector spaces of synaptic weights are fundamentally distinct for different driven neurons, as these vector spaces pertain to the incoming synapses onto different driven neurons. The fact that the Z(i) matrices are the same for all i means that the relative orientation of the η directions, with respect to the physical w-coordinate axes (labeled by the presynaptic indices), remains the same for all the driven neurons.
The angles also depend on the synapse but we have dropped the m index for brevity.
Since this intersection point depends on m, the semiconstrained dimension indexed by μ can behave as constrained for some synapses and unconstrained for others.
Our results for the certainty condition hold for network ensembles that exactly generate the desired responses. For numerical tests, we had to allow for small deviations from the desired responses, but our predictions proved robust.
We performed this numerical experiment with several random configurations to confirm that the results did not qualitatively depend on the random sample.
For example, imagine projecting the 3D surfaces in Fig. 7(a) along two dimensions.
We do want to point out that in the main manuscript since we were introducing the various concepts and relevant quantities, for clarity we did explicitly keep track of the m index.
Note that our relation between the sign of the synapse and the sign of the correlation is based on a linear response.
Again, we remind the readers that in the main manuscript these projected vectors were denoted by , and .
In particular, for an orthogonal Z matrix, the two inner product structures defined via and are equivalent, but this is not the case when Z is nonorthogonal. Although one would conventionally adopt the first inner product structure, both the derivation and interpretation of the conservative y-critical formula is easier in terms of the latter inner product structure, which makes all of the response pattern directions orthonormal by definition.
(A62) |
The allowed must also lie in the all positive orthant, but as we will compute the ratio of two spherical volumes the reduction factor will cancel out.
The components of along these semiconstrained directions are all positive since must be nonnegative.
This will happen if a component of that was zero now has a nonzero component.
This assumes that y > ε. Smaller values of y permit and all weights can be set to zero.
This quadratic equation obviously has two solutions. The correct one can easily be identified, for instance, by taking the ε → 0 limit.
Since we are only interested in the leading-order correction, we could also drop the 𝒪(δ²) terms needed to arrive at an expression such as the RHS of Eq. (C5).
Here the “1” in the subscript indicates that this possibility for y-critical is computed by only considering the first topological transition.
To remind the readers, the expression for ycr,min,0 was obtained by computing ycr for , a denoised point that is allowed because of the noise. In this case, the corresponding point is .
The second root gives a negative result, and accordingly does not reduce to the correct ε → 0 limit.
In the context of Fig. 6(a), y, x1, and x2 can be identified with y3, y2, and x2, respectively.
This range also allowed us to ensure nonnegativity for the other example recurrent circuits in Figs. 6(a) and 12(c).
(E5) |
As we explained before, ycr,u2 = y for all values of θ. Hence, the certainty condition is not satisfied because u2 may be zero. The fact that u2 can vanish is not discernible from our simulations because the weight magnitudes were generated randomly, and weights where u2 = 0 comprise a zero measure set.
Since here we were primarily interested in the zero-error result, we restricted ourselves to a range of ψ, χ where no topological transitions can occur due to the small but finite error we had to allow for numerical simulations.
The number of solutions varied between 200 and 40 000, depending primarily on the value of y; the higher the value, the more difficult it typically was to find solutions.
The number of solutions varied between 10⁴ and 10⁵ for the 𝒟 = 1 simulation in Fig. 12(b) and between 56 and 110 for the 𝒟 = 2 simulation in Fig. 12(d). Note that for the 𝒟 = 2 simulation we have a six-dimensional weight space, which makes it much harder to find solutions through random scanning. Also, for this latter case we only checked that the biological constraint is satisfied by the incoming weights to y1.
Typically with learning rate ~0.01.
We only checked that the target weights satisfied the weight bound, as that is what matters for the certainty conditions. Since we initialized weights amongst the nontarget neurons to be between −7 and 7, it is likely that other components of the weight matrix were large.
References
- [1] Watson JD and Crick FHC, Molecular structure of nucleic acids, Nature (London) 171, 737 (1953).
- [2] Milo R et al., Network motifs: Simple building blocks of complex networks, Science 298, 824 (2002).
- [3] Hunter P and Nielsen P, A strategy for integrative computational physiology, Physiology 20, 316 (2005).
- [4] Seung HS, Reading the book of memory: Sparse sampling versus dense mapping of connectomes, Neuron 62, 17 (2009).
- [5] Bargmann CI and Marder E, From the connectome to brain function, Nat. Methods 10, 483 (2013).
- [6] Bock DD et al., Network anatomy and in vivo physiology of visual cortical neurons, Nature (London) 471, 177 (2011).
- [7] Varshney LR, Chen BL, Paniagua E, Hall DH, and Chklovskii DB, Structural properties of the Caenorhabditis elegans neuronal network, PLoS Comput. Biol. 7, e1001066 (2011).
- [8] Ahrens MB et al., Brain-wide neuronal dynamics during motor adaptation in zebrafish, Nature (London) 485, 471 (2012).
- [9] Schrödel T, Prevedel R, Aumayr K, Zimmer M, and Vaziri A, Brain-wide 3D imaging of neuronal activity in Caenorhabditis elegans with sculpted light, Nat. Methods 10, 1013 (2013).
- [10] Ohyama T et al., A multilevel multimodal circuit enhances action selection in Drosophila, Nature (London) 520, 633 (2015).
- [11] Lemon WC et al., Whole-central nervous system functional imaging in larval Drosophila, Nat. Commun. 6, 7924 (2015).
- [12] Naumann EA et al., From whole-brain data to functional circuit models: The zebrafish optomotor response, Cell 167, 947 (2016).
- [13] Hildebrand DGC et al., Whole-brain serial-section electron microscopy in larval zebrafish, Nature (London) 545, 345 (2017).
- [14] Scheffer LK et al., A connectome and analysis of the adult Drosophila central brain, eLife 9, e57443 (2020).
- [15] Biswas T, Bishop WE, and Fitzgerald JE, Theoretical principles for illuminating sensorimotor processing with brain-wide neuronal recordings, Curr. Opin. Neurobiol. 65, 138 (2020).
- [16] Ben-Yishai R, Bar-Or RL, and Sompolinsky H, Theory of orientation tuning in visual cortex, Proc. Natl. Acad. Sci. USA 92, 3844 (1995).
- [17] Skaggs WE, Knierim JJ, Kudrimoti HS, and McNaughton BL, A model of the neural basis of the rat's sense of direction, Adv. Neural Inf. Process. Syst. 7, 173 (1995).
- [18] Kim SS, Rouault H, Druckmann S, and Jayaraman V, Ring attractor dynamics in the Drosophila central brain, Science 356, 849 (2017).
- [19] Turner-Evans DB et al., The neuroanatomical ultrastructure and function of a biological ring attractor, Neuron 108, 145 (2020).
- [20] Kim JS et al., Space-time wiring specificity supports direction selectivity in the retina, Nature (London) 509, 331 (2014).
- [21] Kornfeld J et al., EM connectomics reveals axonal driven variation in a sequence-generating network, eLife 6, e24364 (2017).
- [22] Wanner AA and Friedrich RW, Whitening of odor representations by the wiring diagram of the olfactory bulb, Nat. Neurosci. 23, 433 (2020).
- [23] Vishwanathan A et al., Predicting modular functions and neural coding of behavior from a synaptic wiring diagram, 10.1101/2020.10.28.359620 (2020).
- [24] Marder E and Taylor AL, Multiple models to capture the variability in biological neurons and networks, Nat. Neurosci. 14, 133 (2011).
- [25] Friston K, Harrison L, and Penny W, Dynamic causal modeling, NeuroImage 19, 1273 (2003).
- [26] Schneidman E, Berry MJ II, Segev R, and Bialek W, Weak pairwise correlations imply strongly correlated network states in a neural population, Nature (London) 440, 1007 (2006).
- [27] Pillow JW et al., Spatio-temporal correlations and visual signalling in a complete neuronal population, Nature (London) 454, 995 (2008).
- [28] Huang H and Ding M, Linking functional connectivity and structural connectivity quantitatively: A comparison of methods, Brain Connect. 6, 99 (2016).
- [29] Tschopp FD, Reiser MB, and Turaga SC, A connectome based hexagonal lattice convolutional network model of the Drosophila visual system, arXiv:1806.04793.
- [30] Zarin AA, Mark B, Cardona A, Litwin-Kumar A, and Doe CQ, A multilayer circuit architecture for the generation of distinct locomotor behaviors in Drosophila, eLife 8, e51781 (2019).
- [31] Litwin-Kumar A and Turaga SC, Constraining computational models using electron microscopy wiring diagrams, Curr. Opin. Neurobiol. 58, 94 (2019).
- [32] Prinz AA, Bucher D, and Marder E, Similar network activity from disparate circuit parameters, Nat. Neurosci. 7, 1345 (2004).
- [33] Fisher D, Olasagasti I, Tank DW, Aksay ER, and Goldman MS, A modeling framework for deriving the structural and functional architecture of a short-term memory microcircuit, Neuron 79, 987 (2013).
- [34] Goaillard JM, Taylor AL, Schulz DJ, and Marder E, Functional consequences of animal-to-animal variation in circuit parameters, Nat. Neurosci. 12, 1424 (2009).
- [35] Baldi P and Hornik K, Neural networks and principal component analysis: Learning from examples without local minima, Neural Networks 2, 53 (1989).
- [36] Dauphin YN et al., Identifying and attacking the saddle point problem in high-dimensional optimization, in Advances in Neural Information Processing Systems (MIT Press, Cambridge, MA, 2014).
- [37] Kawaguchi K, Deep learning without poor local minima, in Advances in Neural Information Processing Systems (MIT Press, Cambridge, MA, 2016).
- [38] Machta BB, Chachra R, Transtrum MK, and Sethna JP, Parameter space compression underlies emergent theories and predictive models, Science 342, 604 (2013).
- [39] Transtrum MK et al., Perspective: Sloppiness and emergent theories in physics, biology, and beyond, J. Chem. Phys. 143, 010901 (2015).
- [40] O'Leary T, Sutton AC, and Marder E, Computational models in the age of large datasets, Curr. Opin. Neurobiol. 32, 87 (2015).
- [41] Abbott LF and Regehr WG, Synaptic computation, Nature (London) 431, 796 (2004).
- [42] Spruston N, Pyramidal neurons: Dendritic structure and synaptic integration, Nat. Rev. Neurosci. 9, 206 (2008).
- [43] Zeng H and Sanes JR, Neuronal cell-type classification: Challenges, opportunities and the path forward, Nat. Rev. Neurosci. 18, 530 (2017).
- [44] Grant SGN, Synapse molecular complexity and the plasticity behaviour problem, Brain Neurosci. Adv. 2, 239821281881068 (2018).
- [45] Curto C and Morrison K, Relating network connectivity to dynamics: Opportunities and challenges for theoretical neuroscience, Curr. Opin. Neurobiol. 58, 11 (2019).
- [46] Billeh YN et al., Systematic integration of structural and functional data into multiscale models of mouse primary visual cortex, Neuron 106, 388 (2020).
- [47] Almog M and Korngreen A, Is realistic neuronal modeling realistic? J. Neurophysiol. 116, 2180 (2016).
- [48] Bittner SR et al., Interrogating theoretical models of neural computation with emergent property inference, eLife 10, e56265 (2021).
- [49] Gonçalves PJ et al., Training deep neural density estimators to identify mechanistic models of neural dynamics, eLife 9, e56261 (2020).
- [50] Treves A and Rolls ET, What determines the capacity of autoassociative memories in the brain? Netw. Comput. Neural Syst. 2, 371 (1991).
- [51] Salinas E and Abbott LF, A model of multiplicative neural responses in parietal cortex, Proc. Natl. Acad. Sci. USA 93, 11956 (1996).
- [52] Hahnloser RLT, On the piecewise analysis of networks of linear threshold neurons, Neural Netw. 11, 691 (1998).
- [53] Hahnloser RH, Seung HS, and Slotine JJ, Permitted and forbidden sets in symmetric threshold-linear networks, Neural Comput. 15, 621 (2003).
- [54] Morrison K, Degeratu A, Itskov V, and Curto C, Diversity of emergent dynamics in competitive threshold-linear networks: A preliminary report, arXiv:1605.04463.
- [55] Curto C, Geneson J, and Morrison K, Fixed points of competitive threshold-linear networks, Neural Comput. 31, 94 (2019).
- [56] Marder E, Neuromodulation of neuronal circuits: Back to the future, Neuron 76, 1 (2012).
- [57] Mu Y et al., Glia accumulate evidence that actions are futile and suppress unsuccessful behavior, Cell 178, 27 (2019).
- [58] Aitchison L et al., Model-based Bayesian inference of neural activity and connectivity from all-optical interrogation of a neural circuit, in Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS'17), Long Beach, CA (2017).
- [59] Grienberger C and Konnerth A, Imaging calcium in neurons, Neuron 73, 862 (2012).
- [60] Wilt BA, Fitzgerald JE, and Schnitzer MJ, Photon shot noise limits on optical detection of neuronal spikes and estimation of spike timing, Biophys. J. 104, 51 (2013).
- [61] Theis L et al., Benchmarking spike rate inference in population calcium imaging, Neuron 90, 471 (2016).
- [62] Logothetis NK and Pfeuffer J, On the nature of the BOLD fMRI contrast mechanism, Magn. Reson. Imaging 22, 1517 (2004).
- [63] Bartolo MJ et al., Stimulus-induced dissociation of neuronal firing rates and local field potential gamma power and its relationship to the resonance blood oxygen level-dependent signal in macaque primary visual cortex, Eur. J. Neurosci. 34, 1857 (2011).
- [64] Heinzle J, Koopmans PJ, den Ouden HEM, Raman S, and Stephan KE, A hemodynamic model for layered BOLD signals, NeuroImage 125, 556 (2016).
- [65] Advani M, Lahiri S, and Ganguli S, Statistical mechanics of complex neural systems and high dimensional data, J. Stat. Mech. (2013) P03014.
- [66] Nair V and Hinton GE, Rectified linear units improve restricted Boltzmann machines, in Proceedings of the International Conference on Machine Learning (ICML) (ACM Press, New York, NY, 2010).
- [67] Krizhevsky A, Convolutional deep belief networks on CIFAR-10 (2010), https://www.cs.toronto.edu/~kriz/conv-cifar10-aug2010.pdf.
- [68] Xu B, Wang N, Chen T, and Li M, Empirical evaluation of rectified activations in convolutional network, in Deep Learning Workshop (ICML, 2015).
- [69] Rumelhart DE, Hinton GE, and Williams RJ, Learning representations by back-propagating errors, Nature (London) 323, 533 (1986).
- [70] Frankle J and Carbin M, The lottery ticket hypothesis: Finding sparse, trainable neural networks, arXiv:1803.03635.
- [71] Zhou H, Lan J, Liu R, and Yosinski J, Deconstructing lottery tickets: Zeros, signs, and the supermask, in Advances in Neural Information Processing Systems, edited by Wallach H, Larochelle H, Beygelzimer A, d'Alché-Buc F, Fox E, and Garnett R (Curran Associates, 2019), Vol. 32.
- [72] Cover TM, Geometrical and statistical properties of systems of linear inequalities with applications in pattern recognition, IEEE Trans. Electron. Comput. 14, 326 (1965).
- [73] Gardner E, The space of interactions in neural network models, J. Phys. A: Math. Gen. 21, 257 (1988).
- [74] Marr D, A theory of cerebellar cortex, J. Physiol. 202, 437 (1969).
- [75] Olshausen BA and Field DJ, Sparse coding with an overcomplete basis set: A strategy employed by V1? Vision Res. 37, 3311 (1997).
- [76] Glorot X, Bordes A, and Bengio Y, Deep sparse rectifier neural networks, in Proceedings of the 14th International Conference on Artificial Intelligence and Statistics (AISTATS) (ACM Press, New York, NY, 2011).
- [77] Kubo F et al., Functional architecture of an optic flow-responsive area that drives horizontal eye movements in zebrafish, Neuron 81, 1344 (2014).
- [78] Trachtenberg JT et al., Long-term in vivo imaging of experience-dependent synaptic plasticity in adult cortex, Nature (London) 420, 788 (2002).
- [79] Ziv Y et al., Long-term dynamics of CA1 hippocampal place codes, Nat. Neurosci. 16, 264 (2013).
- [80] Attardo A, Fitzgerald JE, and Schnitzer MJ, Impermanence of dendritic spines in live adult CA1 hippocampus, Nature (London) 523, 592 (2015).
- [81] Driscoll LN, Pettit NL, Minderer M, Chettih SN, and Harvey CD, Dynamic reorganization of neuronal activity patterns in parietal cortex, Cell 170, 986 (2017).
- [82] Rule ME, O'Leary T, and Harvey CD, Causes and consequences of representational drift, Curr. Opin. Neurobiol. 58, 141 (2019).
- [83] Schoonover CE, Ohashi SN, Axel R, and Fink AJP, Representational drift in primary olfactory cortex, Nature (London) 594, 541 (2021).
- [84] Marks TD and Goard MJ, Stimulus-dependent representational drift in primary visual cortex, Nat. Commun. 12, 5169 (2021).
- [85] Deitch D, Rubin A, and Ziv Y, Representational drift in the mouse visual cortex, Curr. Biol. 31, 4327 (2021).
- [86] Kappel D, Legenstein R, Habenschuss S, Hsieh M, and Maass W, A dynamic connectome supports the emergence of stable computational function of neural circuits through reward-based learning, eNeuro 5, ENEURO.0301-17.2018 (2018).
- [87] Burnstock G, Cotransmission, Curr. Opin. Pharmacol. 4, 47 (2004).
- [88] Chen BL, Hall DH, and Chklovskii DB, Wiring optimization can relate neuronal structure and function, Proc. Natl. Acad. Sci. USA 103, 4723 (2006).
- [89] Brunel N, Hakim V, Isope P, Nadal J, and Barbour B, Optimal information storage and the distribution of synaptic weights: Perceptron versus Purkinje cell, Neuron 43, 745 (2004).