Summary
Synaptic connectivity varies widely across neuronal types. Cerebellar granule cells receive five orders of magnitude fewer inputs than the Purkinje cells they innervate, and cerebellum-like circuits including the insect mushroom body also exhibit large divergences in connectivity. In contrast, the number of inputs per neuron in cerebral cortex is large and relatively uniform. We investigate how the dimension of a representation formed by a population of neurons depends on how many inputs they each receive and what this implies for learning associations. Our theory predicts that the dimensions of the cerebellar granule-cell and Drosophila Kenyon-cell representations are maximized at degrees of synaptic connectivity that match those observed anatomically, showing that sparse connectivity is sometimes superior to dense connectivity. When input synapses are subject to supervised plasticity, however, dense wiring becomes advantageous, suggesting that the type of plasticity exhibited by a set of synapses is a major determinant of connection density.
Introduction
Extensive synaptic connectivity is often cited as a key element of neural computation, a prime example being the cerebral cortex, where neurons each receive thousands of inputs. However, the majority of neurons in the human brain, the 50 billion neurons that form the cerebellar granule-cell layer, each receive input from only about four of the mossy fibers innervating the cerebellum (Eccles et al., 1966; Llinás et al., 2003). Sparse input is a feature shared by neurons that play roles analogous to granule cells in other neural circuits with cerebellum-like structures, such as the dorsal cochlear nucleus, the electrosensory lobe of electric fish, and the insect mushroom body (Mugnaini et al., 1980; Bell et al., 2008; Keene and Waddell, 2007). What is the functional role of diversity in synaptic connectivity, and what determines the appropriate number of input connections to a given neuronal type? In this study, we address these questions by investigating the ability of populations of neurons with different degrees of connectivity to support associative learning.
Both cerebellar and cerebrocortical regions are involved in a variety of experience-dependent adaptive behaviors (Raymond et al., 1996; Buonomano and Merzenich, 1998). In cerebellar cortex and other cerebellum-like circuits, synaptic modifications associated with learning occur among the elaborate dendrites of densely connected output neurons, for example cerebellar Purkinje cells (Ito et al., 1982) and the output neurons of the Drosophila mushroom body (Hige et al., 2015). Classic Marr-Albus theories of associative learning propose that the abundance of granule cells supports a high-dimensional representation of the information conveyed to the cerebellum by mossy fibers, and that the large number of synapses received by Purkinje cells allows them access to this representation to form associations (Marr, 1969; Albus, 1971). These theories assume that the inputs to granule cells are random and not modified during learning. Anatomical and physiological studies suggest that the handful of inputs received by granule cells in the electrosensory lobe of electric fish (Kennedy et al., 2014) and by Kenyon cells, the granule-cell analogs of the mushroom body, are a random subset of the afferents to these structures (Murthy et al., 2008; Caron et al., 2013; Gruntman and Turner, 2013). In many regions of cerebellar cortex, granule cells receive diverse (Huang et al., 2013; Chabrol et al., 2015; Ishikawa et al., 2015, but see Jörntell and Ekerot, 2006; Bengtsson and Jörntell, 2009 and Discussion), though not completely random (Billings et al., 2014) mossy-fiber input. In Marr-Albus theories, learning in cerebellar cortex relies exclusively on climbing-fiber-dependent modifications of the connections between parallel fibers and Purkinje cells, but unsupervised forms of plasticity have been reported for synapses from mossy fibers onto granule cells (Hansel et al., 2001; Schweighofer et al., 2001; Gao et al., 2012; D’Angelo, 2014; Gao et al., 2016, but see Rylkova et al., 2015).
The logic of experience-dependent circuit modifications is less clear in cerebral cortex, where densely connected neurons exhibit diverse forms of synaptic plasticity (Abbott and Nelson, 2000). Recent theoretical studies have proposed that populations of randomly connected cerebrocortical neurons support high-dimensional representations that enhance the ability of readout neurons to learn associations (Hansel and van Vreeswijk, 2012; Rigotti et al., 2013; Barak et al., 2013; Babadi and Sompolinsky, 2014), much as in theories of cerebellar cortex. Although these studies support the idea of random, high-dimensional representations as substrates for associative learning, they do not explain why the degree of synaptic connectivity in granule-cell and cerebrocortical layers is so different.
To address this issue, we explore the effects of degree of connectivity, excitation/inhibition balance, and synaptic weight distribution on the ability of a large neural representation to support associative learning tasks. We also investigate whether synaptic plasticity of input connections (e.g. mossy fiber to granule cell), augmenting plasticity of output connections (e.g. granule cell to Purkinje cell, as in Marr-Albus theories), improves performance. In these analyses, we distinguish between unsupervised synaptic plasticity that normalizes or otherwise modulates the gain of synaptic input without the aid of a feedback (“error”) signal and supervised synaptic plasticity that exploits feedback signals to reshape the neural representation based on prior experience.
Using a combination of analytic calculation and computer simulation, we find that the number of connections per neuron required to produce a high-dimensional representation increases slowly with the number of neurons. For a wide range of conditions (but not all), dimension and learning performance are maximized if the number of inputs is small. These results apply when the input synapses onto the granule or granule-like cells are selected randomly or approximately randomly and are not modified by a supervised learning procedure. In contrast, if we permit supervised modification of these synapses during the learning of a task, dense connectivity becomes advantageous. Our theory predicts a degree of connectivity that maximizes dimension for cerebellum-like structures and quantitatively matches what is observed for cerebellar granule cells and Kenyon cells of the Drosophila mushroom body. It also predicts that supervised synaptic plasticity at multiple stages of processing is necessary to exploit the dense connectivity of cerebral cortex.
Results
The “granule-cell” layer we consider consists of M neurons that each receive K excitatory connections from a random subset of N input channels and feedforward inhibition via a globally connected inhibitory neuron (Figure 1A). We assume that, as in the cerebellum and its analogs, M > N. We refer to K as the synaptic degree. Because the neurons in the second layer are selective to combinations of their inputs, we refer to the two layers as the input (e.g. mossy-fiber) and mixed (e.g. granule-cell) layers, respectively.
Figure 1.
Schematic of random expansion. (A) N input channels project to a mixed layer of M neurons. Each mixed-layer neuron receives K connections from a random subset of input channels and inhibition via a global inhibitory neuron. (B) Schematic of Drosophila mushroom body. Kenyon cells (the mixed-layer neurons) receive inputs from antennal lobe projection neurons in the calyx (shaded blue region) and send bifurcated parallel-fiber axons that form the mushroom body peduncle and lobes (shaded green region). Mushroom body output neurons read out the activity of parallel fibers. Kenyon cells excite and are inhibited by the anterior paired lateral (APL) neuron. (C) Schematic of cerebellar cortex. Mossy fibers project to the granule-cell (mixed) layer. Granule cells send parallel-fiber axons that are read out by Purkinje cells. Granule cells are inhibited by Golgi cells, which are excited by mossy and parallel fibers.
The ratio of the number of mixed-layer neurons to inputs, M/N, which we call the expansion ratio, is a critical parameter for our study. Our analysis assumes that the input channels are independent, so multiple channels with redundant activity are classified as a single input. This means that, for the Drosophila mushroom body, the number of distinct inputs is N = 50, an estimate of the number of antennal lobe glomeruli, while the estimated number of Kenyon cells is M = 2000 (Keene and Waddell, 2007), yielding an expansion ratio of 40 (Figure 1B). For the cerebellum, the number of mossy fibers and granule cells presynaptic to a single Purkinje cell in the rat are estimated to be N = 7000 and M = 209,000 (Marr, 1969; Harvey and Napper, 1991; Tyrrell and Willshaw, 1992; see Methods), meaning the expansion ratio in this case is 30 if the mossy fibers are assumed to be independent (Figure 1C).
Heterogeneous responses via random connectivity
A necessary requirement to produce a high-dimensional representation in the mixed layer is that its neurons should respond to different ensembles of inputs. To understand what is needed to produce this response heterogeneity, we first ask the simple question: what values of K ensure that, with high probability, each mixed-layer neuron receives a distinct subset of inputs? This question is a variant of the “birthday problem,” which concerns the likelihood of M people being born on unique days of the year. More generally one can ask: what is the probability p of M draws with replacement from R equally likely possibilities all being different? In our case, the number of possibilities is R = N choose K, the number of ways a mixed-layer neuron can choose K presynaptic partners from N inputs. The maximum value of this probability is always at K = N/2, but this maximum typically lies in the middle of a large range of K for which p is extremely close to 1. We therefore denote by K* the smallest value of K for which p attains at least 95% of its maximum. The level of 95% is, of course, arbitrary, but the value of K* typically changes very little, if at all, if p is varied over a reasonable range near 100% (see below).
Using numbers appropriate for the mushroom body, N = 50 and M = 2000, leads to K* = 7, equal to the average observed number of projection-neuron inputs received by Kenyon cells (Caron et al., 2013; p takes the values 0.88, 0.98, and 0.996 for K = 6, 7, and 8). Using the same criterion for the cerebellum (N = 7000 and M = 209,000), leads to K* = 4, equal to the typical number observed anatomically (Eccles et al., 1966; p takes values of 0.69, 0.9998, and greater than 0.9999 for K = 3, 4, and 5). That a small number of connections is sufficient to ensure each mixed-layer neuron receives a unique set of inputs owes to the rapid growth of N choose K with K. In other words, the combinatorial explosion in the number of possible wirings with synaptic degree permits even very sparsely connected systems to eliminate duplication in the mixed layer with high probability.
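This combinatorial argument can be checked with a short calculation. The sketch below uses the standard birthday-problem approximation p ≈ exp(−M(M − 1)/2R) with R = N choose K, together with the parameter values quoted above; the function names are ours.

```python
from math import comb, exp

def p_distinct(N, K, M):
    """Approximate probability that M mixed-layer neurons, each wired to a
    random K-subset of N inputs, all receive distinct input subsets
    (birthday-problem approximation with R = C(N, K) possibilities)."""
    R = comb(N, K)
    return exp(-M * (M - 1) / (2 * R))

def k_star(N, M, frac=0.95):
    """Smallest synaptic degree K at which p_distinct reaches `frac` of its
    maximum over K; the maximum is attained at K = N // 2."""
    p_max = p_distinct(N, N // 2, M)
    return next(K for K in range(1, N) if p_distinct(N, K, M) >= frac * p_max)

# Drosophila mushroom body: N = 50 glomeruli, M = 2000 Kenyon cells
print(p_distinct(50, 7, 2000), k_star(50, 2000))   # ~0.98, K* = 7
# Rat cerebellum: N = 7000 mossy fibers, M = 209,000 granule cells
print(k_star(7000, 209000))                        # K* = 4
```

The values reproduce those in the text: p rises from roughly 0.88 at K = 6 to 0.98 at K = 7 for the mushroom body, so K* = 7, and K* = 4 for the cerebellar parameters.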
Dimension of mixed-layer inputs and responses
The combinatorial calculation in the previous section treated two mixed-layer neurons as distinct even if only one of their inputs differed, but partially overlapping inputs may introduce correlations between neural responses that reduce the quality of the mixed-layer representation. To provide a more nuanced analysis, we define and calculate a quantitative measure that characterizes the mixed-layer representation: its dimension. We define the dimension of a system with M degrees of freedom x = (x1, x2, …, xM), as
$$\mathrm{dim}(\mathbf{x}) = \frac{\left(\sum_{i=1}^{M} \lambda_i\right)^2}{\sum_{i=1}^{M} \lambda_i^2} \qquad (1)$$
where λi are the eigenvalues of the covariance matrix of x computed by averaging over the distribution of inputs to the system (Abbott et al., 2011). If the components of x are independent and have the same variance, all the eigenvalues are equal and dim(x) = M. Conversely, if the components are correlated so that the data points are distributed equally in each dimension of an m-dimensional subspace of the full M-dimensional space, only m eigenvalues will be nonzero and dim(x) = m (Figure 2A). Later, we will show that this measure is closely related to the classification performance of a readout of mixed-layer responses.
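This "participation ratio" measure of dimension is easy to compute from data. A minimal sketch (parameter values illustrative) confirms the two limiting cases described above: independent equal-variance components give dim(x) ≈ M, while data spread equally within an m-dimensional subspace give dim(x) ≈ m.

```python
import numpy as np

def dimension(X):
    """dim(x) from Equation 1: the participation ratio of the eigenvalues
    of the covariance matrix of the data X (shape: samples x M)."""
    lam = np.linalg.eigvalsh(np.cov(X, rowvar=False))
    return lam.sum() ** 2 / (lam ** 2).sum()

rng = np.random.default_rng(0)
M, n = 50, 20000
# Independent, equal-variance components: dimension close to M = 50
x_full = rng.standard_normal((n, M))
# Equal variance along each direction of a 5-dimensional subspace: dimension ~ 5
Q, _ = np.linalg.qr(rng.standard_normal((M, 5)))   # orthonormal basis of subspace
x_low = rng.standard_normal((n, 5)) @ Q.T
print(dimension(x_full), dimension(x_low))
```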
Figure 2.
Dimension of mixed-layer representation. (A) Illustration of dimension for a system with three degrees of freedom x1, x2, and x3. Each point represents the response of the system to an input. The dataset consisting of the magenta points has a higher dimension than that of the green points. (B) The input-current dimension dim(h) and mixed-layer dimension dim(m), normalized by N, are plotted as a function of the expansion ratio M/N (light curve and dark curve, respectively). For each M, dimension is evaluated at the value of K that maximizes it. Results are shown for N = 1000, f = 0.1, and homogeneous excitatory synaptic weights. (C) Dependence of dimension on K. (D) Dependence of dimension on K for networks with and without inhibition, in the limit of a large expansion ratio.
Increased dimension of the mixed-layer representation requires, in addition to mixing of the inputs received by each neuron, a nonlinearity provided by their input-output response function. We begin by analyzing the effect of input mixing alone and then address the effect of nonlinearity, for the case of purely excitatory input (later we will add inhibition). In studying the effect of mixing alone, the components of the vector we consider, which we call h, are the total synaptic currents received by each of the mixed-layer neurons. These currents are given by h = Js, where J is the M × N matrix of synaptic weights describing the strengths of the connections from the input layer to the mixed layer, and s is the vector of activities for the input layer. We consider the case of uncorrelated inputs with a uniform variance across them. If the excitatory connections onto the mixed-layer neurons have homogeneous weights,
$$\mathrm{dim}(\mathbf{h}) \approx \frac{N}{1 + N/M + K^2/N} \qquad (2)$$
when M and N are large. Two features of this result are noteworthy. First, dim(h) ≤ N, even when M ≫ N (Figure 2B, light curve). This is because h is a linear function of the activity of the input layer s, so it cannot have dimension higher than N. Second, increasing the synaptic degree K reduces the dimension (Figure 2C, light curve). This is due to the final term in the denominator, which arises from correlations between the entries of h. On average, two mixed-layer neurons share K2/N of their K inputs, leading to an average correlation of K/N. Indeed, as K approaches N, all the currents received by mixed-layer neurons become identical, and dim(h) approaches 1. Thus, unlike the combinatorial calculation of the previous section, this analysis indicates that increasing K beyond 1 is detrimental because it reduces the dimension due to increased correlations, even when the identities of the inputs to each neuron are different.
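The decrease of dim(h) with K can be verified directly. Because the inputs are uncorrelated with unit variance, Cov(h) = JJᵀ exactly, so the dimension can be computed from the wiring matrix alone. The simulation below (sizes illustrative) tracks an estimate of the form N/(1 + N/M + K²/N), whose final term reflects the average K/N correlation discussed above.

```python
import numpy as np

rng = np.random.default_rng(1)
N, M = 500, 2500

def dim_h(K):
    """dim(h) for currents h = J s. For uncorrelated unit-variance inputs,
    Cov(h) = J J^T exactly, so dimension follows from J alone."""
    J = np.zeros((M, N))
    for i in range(M):
        J[i, rng.choice(N, size=K, replace=False)] = 1.0   # homogeneous weights
    C = J @ J.T
    return np.trace(C) ** 2 / np.sum(C * C)

for K in (1, 10, 100):
    # shared-input estimate: correlations (final term) grow with K
    print(K, round(dim_h(K), 1), round(N / (1 + N / M + K ** 2 / N), 1))
```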
However, our real interest is the dimension of the nonlinear output of the mixed layer. We consider mixed-layer neurons with binary outputs given by m = Θ(h − θ), where Θ is a step function (applied element-wise) and θ is a vector of activity thresholds, one for each neuron. The thresholds are chosen so that each neuron is active with probability f, averaged across all the input patterns which, for now, we take to be standard Gaussians (we refer to the input-layer response to a particular stimulus as an input pattern). We call f the coding level of the mixed layer. Under these assumptions, the mixed-layer dimension is given by
$$\mathrm{dim}(\mathbf{m}) \approx \frac{M}{1 + (M-1)\left(\langle\rho_{ij}\rangle^2 + \mathrm{Var}(\rho_{ij})\right)} \qquad (3)$$
where ρij is the correlation coefficient of the activities mi and mj of neurons i and j, averaged over the distribution of input patterns (a more general expression holds when the coding level of the neurons is not identical; see Methods). Thus, the maximum mixed-layer dimension is limited by correlations among its neurons and saturates to a limiting value as the expansion ratio grows (Equation 3; Figure 2B, dark curve; also see Babadi and Sompolinsky, 2014). The limiting value scales linearly with N (since Var (ρij) ~ 1/N; Figure S1). For a coding level of f = 0.1, the saturation suggests that expansion ratios beyond 10–50 do not increase the mixed-layer dimension. This maximum expansion ratio and the maximum dimension both increase as the coding level is reduced (Figure S1), but as we will see, representations with extremely low coding levels do not necessarily lead to improved discrimination (Barak et al., 2013; Babadi and Sompolinsky, 2014).
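The effect of the nonlinearity can be demonstrated numerically. In the sketch below (sizes illustrative, thresholds set at the (1 − f) quantile to fix the coding level), the dimension of the binary responses m, estimated from sampled patterns, exceeds both the input-current dimension and the linear bound N.

```python
import numpy as np

rng = np.random.default_rng(2)
N, M, K, f, P = 200, 2000, 5, 0.1, 5000

# Random excitatory wiring: K unit-weight inputs per mixed-layer neuron
J = np.zeros((M, N))
for i in range(M):
    J[i, rng.choice(N, size=K, replace=False)] = 1.0

s = rng.standard_normal((P, N))          # standard Gaussian input patterns
h = s @ J.T                              # input currents
theta = np.quantile(h, 1 - f, axis=0)    # thresholds enforcing coding level f
m = (h > theta).astype(float)            # binary mixed-layer responses

def dim_est(X):
    """Participation-ratio dimension estimated from sampled responses."""
    lam = np.linalg.eigvalsh(np.cov(X, rowvar=False))
    return lam.sum() ** 2 / (lam ** 2).sum()

dim_h, dim_m = dim_est(h), dim_est(m)
print(dim_h, dim_m)   # the nonlinearity lifts the dimension above the bound N
```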
We next investigate the dependence of mixed-layer dimension on K. When M is small, the mixed-layer dimension is similar to the input-current dimension and nearly constant over a wide range of K values (Figure 2C, top). However, for larger M, dimension initially grows with K, as increased K results in each mixed-layer neuron being selective to different combinations of inputs (Figure 2C, middle and bottom). The dimension is maximized for an intermediate value of K that depends on N, f, and the distribution of synaptic weights. For the case of homogeneous excitatory synaptic weights, this value is K = 9 for N = 1000 and f = 0.1. Above this value, dimension decreases because of positive average correlations among the mixed-layer neurons (Figure 2C; Figure S1). Thus, the detrimental effect of even small average correlations (〈ρij〉 in the denominator of Equation 3) on dimension leads to dimension being maximized at small K.
Many studies have shown that inhibition can decorrelate neural activity (Ecker et al., 2010; Renart et al., 2010; Wiechert et al., 2010), so we next investigate if inhibition can increase dimension by reducing correlations among mixed-layer neurons. In the Drosophila mushroom body, a single GABAergic interneuron on each side of the brain inhibits the Kenyon-cell population globally (Liu and Davis, 2009; Figure 1B). In the cerebellum, Golgi cells are vastly outnumbered by the granule cells they inhibit (Eccles et al., 1966; Figure 1C). We therefore begin by introducing a single neuron that inhibits all mixed-layer neurons in proportion to its input, which is equal to the summed input-layer activity (Figure 1A). When inhibition is tuned to balance excitation on average (see Methods), the distribution of the correlation between inputs received by pairs of mixed-layer neurons has zero mean, although the variance remains finite (Figure S1). Consequently, dim(h) does not decrease with K, and dim(m) does not exhibit a peak at small K (Figure 2D). However, even in this case, dimension quickly approaches its maximum (which occurs at K = N/2), and when N = 1000 and the expansion ratio is large, it attains 95% of its maximum value at K = 29, or approximately 3% connectivity. Furthermore, the increase in dimension due to inhibition is only substantial for sufficiently large K.
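The decorrelating effect of balanced global inhibition can be seen in a small simulation. A single inhibitory neuron with homogeneous weights that sums all inputs and balances the mean excitatory drive is equivalent to subtracting K/N from every effective weight; this removes the positive average correlation (sizes below are illustrative).

```python
import numpy as np

rng = np.random.default_rng(3)
N, M, K = 1000, 2000, 100

J = np.zeros((M, N))
for i in range(M):
    J[i, rng.choice(N, size=K, replace=False)] = 1.0

def mean_offdiag_corr(W):
    """Average pairwise correlation of currents h = W s for uncorrelated
    unit-variance inputs (Cov(h) = W W^T)."""
    C = W @ W.T
    R = C / np.sqrt(np.outer(np.diag(C), np.diag(C)))
    return (R.sum() - M) / (M * (M - 1))

# Purely excitatory input: pairs share K^2/N inputs on average, giving an
# average correlation near K/N = 0.1
c_exc = mean_offdiag_corr(J)
# Balanced global inhibition subtracts K/N from every effective weight,
# leaving zero average correlation (the variance remains finite)
c_inh = mean_offdiag_corr(J - K / N)
print(c_exc, c_inh)
```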
Thus, nonlinear mixed-layer neurons with small synaptic degree are sufficient to produce high-dimensional representations. This observation is consistent with the combinatorial argument of the first section, which showed that the explosion in possible wirings with synaptic degree leads to few redundant mixed-layer neurons, even when the degree is small. The present analysis also shows that positive average correlations limit dimension when mixed-layer neurons receive purely excitatory input, and that when K is large global inhibition can increase dimension through decorrelation. Although our analytic calculations are most easily performed for systems with feed-forward inhibition, we verified with simulations that our qualitative results also hold for sufficiently strong recurrent inhibition (Figure S2).
Optimal connectivity for random representations under resource constraints
The observations of the previous section suggest that a representation formed by many neurons with small synaptic degree may be higher-dimensional than one formed by fewer neurons with large synaptic degree. Therefore, when constructing a randomly wired neural system with limited resources, the former strategy may be preferable. To formalize this intuition, we ask: What combination of mixed-layer neuron number M and synaptic degree K maximizes the dimension of the mixed-layer representation when the total number of connections S = MK is limited to some maximum value? This is equivalent to fixing the number of presynaptic sites to which mixed-layer neurons can connect.
For f = 0.1 and S = 14,000, consistent with the Drosophila mushroom body (Keene and Waddell, 2007; Caron et al., 2013), the optimum occurs at K = 4 when inhibition is absent or K = 8 when it is present (Figure 3A), close to the observed value of 7. For f = 0.01 and S = 8.4×105, parameters consistent with the cerebellar granule-cell representation (Eccles et al., 1966; Harvey and Napper, 1991), the optimum occurs at K = 4 (Figure 3B), equal to the observed value. In this case, the presence of inhibition has a vanishing effect when K is small. This is because for N = 7000 and small K, the average overlap of inputs received by granule-cell pairs is negligible, and decorrelation via inhibition is not necessary. Although we chose the mixed-layer coding level based on available experimental estimates (Chadderton et al., 2004; Honegger et al., 2011), our conclusions do not depend strongly on this parameter (Figure 3C and D). For both the Drosophila mushroom body and the cerebellum, therefore, allocating synaptic resources among randomly connected neurons to maximize dimension quantitatively predicts the observed synaptic degree.
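The fixed-budget optimization can be sketched numerically for the mushroom-body parameters. The sampled-pattern estimator below is biased downward relative to the analytic curves and omits inhibition, so the precise optimum may shift by a degree or two, but the maximum at small K is robust.

```python
import numpy as np

rng = np.random.default_rng(4)
N, S, f, P = 50, 14000, 0.1, 1000

def dim_mixed(K):
    """Estimated dim(m) when a fixed budget of S = M*K connections is spent
    on M = S // K neurons of degree K (homogeneous weights, no inhibition).
    Dimension is estimated from P sampled patterns via the P x P Gram matrix."""
    M = S // K
    J = np.zeros((M, N))
    for i in range(M):
        J[i, rng.choice(N, size=K, replace=False)] = 1.0
    h = rng.standard_normal((P, N)) @ J.T
    m = (h > np.quantile(h, 1 - f, axis=0)).astype(float)
    X = m - m.mean(axis=0)
    lam = np.linalg.eigvalsh(X @ X.T / (P - 1))  # same nonzero spectrum as Cov(m)
    return lam.sum() ** 2 / (lam ** 2).sum()

dims = {K: dim_mixed(K) for K in (1, 2, 4, 6, 8, 12, 20)}
print(max(dims, key=dims.get), dims)   # the optimum lies at a small degree
```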
Figure 3.
Optimal connectivity with constrained connection number. (A) For each K, dim(m)/N is plotted with the number of mixed-layer neurons set to M = S/K. S is equal to 14,000, an estimate of the total number of connections from projection neurons onto Kenyon cells in each hemisphere of the Drosophila mushroom body (Keene and Waddell, 2007; Caron et al., 2013). Other parameters are N = 50, an estimate of the number of antennal lobe glomeruli, and f = 0.1. Arrow indicates the average value of K observed anatomically. (B) Same as (A) but for parameters consistent with the cerebellar granule-cell representation (Eccles et al., 1966; Harvey and Napper, 1991), S = 8.4 × 105, N = 7000, and f = 0.01. (C and D) Optimal value of K, defined as the maximum of curves computed as in (A) and (B) for different mixed-layer coding levels. In (B) and (D), points for inhibition and no inhibition lie on top of each other when K ≪ N.
Heterogeneous synaptic weights
EPSP amplitudes recorded in vitro are not homogeneous but broadly distributed (Sargent et al., 2005; Song et al., 2005). To examine if our results are valid in the presence of this heterogeneity, we ask what changes when synaptic weights are drawn from a distribution with mean 〈w〉 and variance Var (w). We first examine the dimension of the input currents. When M ≫ N ≫ 1, and in the absence of inhibition,
$$\mathrm{dim}(\mathbf{h}) \approx \frac{N}{1 + \dfrac{K^2}{N}\left(\dfrac{\langle w\rangle^2}{\langle w\rangle^2 + \mathrm{Var}(w)}\right)^{2}} \qquad (4)$$
First, we observe that the input-current dimension still decreases as K increases (compare with Equation 2). However, this decrease occurs more slowly when Var (w) is large. This makes sense because, with heterogeneous weights, even neurons with identical input ensembles can receive different input currents. Accordingly, when considering the nonlinear mixed-layer responses, the maximum dimension occurs for a larger value of K = 19 when synaptic weights are sampled from a log-normal distribution matching recorded EPSP amplitudes in neocortical slices (Song et al., 2005) and inhibition is absent, compared to K = 9 for the case of homogeneous weights (Figure 4A, compare to Figure 2D).
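The weakening of shared-input correlations by weight heterogeneity can be checked directly. In the sketch below, the log-normal parameter (sigma = 0.8) is illustrative rather than fitted to the recordings cited in the text.

```python
import numpy as np

rng = np.random.default_rng(5)
N, M, K = 500, 2500, 50

def dim_h(weight_fn):
    """dim(h) for random wiring with weights drawn from weight_fn;
    Cov(h) = J J^T for uncorrelated unit-variance inputs."""
    J = np.zeros((M, N))
    for i in range(M):
        J[i, rng.choice(N, size=K, replace=False)] = weight_fn(K)
    C = J @ J.T
    return np.trace(C) ** 2 / np.sum(C * C)

d_hom = dim_h(lambda k: 1.0)
# Log-normal weights: neurons with overlapping input ensembles still receive
# different currents, so correlations are weaker and dimension is higher
d_het = dim_h(lambda k: rng.lognormal(0.0, 0.8, size=k))
print(d_hom, d_het)
```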
Figure 4.
The case of heterogeneous synaptic weights. (A) Dependence of dimension on K for log-normal synaptic weights chosen from a distribution consistent with recordings from neocortical neurons (Song et al., 2005), with N = 1000, f = 0.1, and in the limit of a large expansion ratio. (B) Similar to (A) for different numbers of global inhibitory neurons NI . (C) Dependence of mixed-layer coding level f on input-layer coding level for K = 10. Input-layer activities were sampled from a standard Gaussian distribution truncated at zero, with a coding level of finput. Mixed-layer activity thresholds were chosen to obtain an average coding level of f = 0.1 across input patterns when finput is drawn uniformly from the range (0,1) for each pattern. (D and E) Same as Figure 3A and B but for log-normal synaptic weights (NI = 1 for the case of inhibition). Parameters of the log-normal distribution were chosen to match data from neocortical neurons (Song et al., 2005; filled circles) or cerebellar granule cells (Sargent et al., 2005; open circles). The presence of inhibition or the number of inhibitory neurons NI does not affect the shape of the curves in (E).
Next, we add global inhibition. We assume that the synaptic weights of the connections from the input layer onto both mixed-layer neurons and the global inhibitory neuron are heterogeneous, while inhibitory weights are homogeneous. In this case, the addition of one inhibitory neuron only modestly increases the mixed-layer dimension (Figure 4A). To understand why this improvement is weaker than for homogeneous weights (Figure 2D), we again examine the correlation of the input currents received by pairs of mixed-layer neurons. When the weights of the connections received by the inhibitory neuron from the input layer are chosen from a distribution with mean 〈wI 〉 and variance Var (wI ), the average input-current correlation is proportional to
$$\langle\rho_{ij}\rangle \propto \frac{\mathrm{Var}(w_I)}{\langle w_I\rangle^2 + \mathrm{Var}(w_I)} \qquad (5)$$
when inhibition is tuned to minimize correlation (see Methods). Only when all the connections received by the inhibitory neuron have homogeneous weights (so that Var (wI) = 0), can perfect decorrelation occur. This is because, when these weights are heterogeneous, inhibition serves as an extra source of correlated input, precluding full decorrelation.
However, this heterogeneity can be averaged out by the presence of multiple inhibitory neurons, leading to more effective decorrelation. When the number of inhibitory neurons is increased, the maximum dimension increases, and the maximum is located at a larger value of K (Figure 4B). Notably, the advantage of inhibition is restricted to large K, as in the case of homogeneous weights (Figure 2D).
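The averaging effect of multiple inhibitory neurons can be illustrated as follows. With log-normal weights onto both the mixed layer and the inhibitory population (parameters illustrative), we scan the inhibitory gain and record the smallest achievable average pairwise correlation; pooling many inhibitory neurons reduces the residual set by their weight heterogeneity.

```python
import numpy as np

rng = np.random.default_rng(6)
N, M, K = 400, 800, 40

# Heterogeneous (log-normal) excitatory weights, K per mixed-layer neuron
J = np.zeros((M, N))
for i in range(M):
    J[i, rng.choice(N, size=K, replace=False)] = rng.lognormal(0.0, 0.8, K)

def residual_corr(NI):
    """Smallest average pairwise input-current correlation achievable by
    scaling a global inhibitory signal whose input weights are themselves
    log-normal; pooling NI inhibitory neurons averages out heterogeneity."""
    u = rng.lognormal(0.0, 0.8, (NI, N)).mean(axis=0)   # pooled inhibitory drive
    best = np.inf
    for g in np.linspace(0.0, 0.25, 25):                # scan inhibitory gain
        W = J - g * u                                   # effective weights
        C = W @ W.T                                     # Cov(h) for white inputs
        R = C / np.sqrt(np.outer(np.diag(C), np.diag(C)))
        best = min(best, abs((R.sum() - M) / (M * (M - 1))))
    return best

r1, r100 = residual_corr(1), residual_corr(100)
print(r1, r100)   # more inhibitory neurons allow fuller decorrelation
```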
There seems to be little benefit to the presence of inhibition when K is small and N is large, at least as far as dimensionality is concerned. However, inhibition may be useful for different purposes. For example, if inputs convey signals with different coding levels, it is difficult to set an activity threshold that assures that neurons respond to signals with a low coding level without saturating for signals with a high coding level (in other words, the dynamic range of the mixed layer is low). Global inhibition can mitigate this problem by increasing a neuron’s effective threshold for dense signals (Marr, 1969; Albus, 1971; Billings et al., 2014). In our model, one global inhibitory neuron is sufficient to increase the dynamic range of the mixed layer, even if it sums its inputs through connections with heterogeneous weights (Figure 4C).
Inhibition may therefore serve two purposes in neural circuits implementing random expansions: normalization and decorrelation. The former can be accomplished with a single inhibitory neuron that provides global graded inhibition (Figure 4C), while the latter requires either precisely tuned inhibitory synaptic weights (Figure 2D) or sufficiently many inhibitory neurons to average heterogeneities in synaptic weights (Figure 4B). The benefits of decorrelation are more pronounced as synaptic degree grows, but even in this case dimension increases at best logarithmically (Figure 4B). Thus, neural circuits containing many mixed-layer neurons with relatively small synaptic degrees and few global inhibitory neurons are capable of providing much of the benefits of a random expansion. Accordingly, even for broadly distributed synaptic weights, the optimal value of K for the mushroom body and cerebellum calculated in the manner of Figure 3 is no more than 7 (Figure 4D and E). Notably, when the distribution of weights is chosen to match recordings from cerebellar granule cells (Figure 4D and E, open circles; Sargent et al., 2005), rather than recordings from neocortical neurons that predict a broader distribution (Figure 4D and E, filled circles; Song et al., 2005), dimension is increased, and the optimal synaptic degrees more closely match the values observed anatomically.
Local connectivity in cerebellar cortex
The inputs to Drosophila mushroom body Kenyon cells appear to be random (Caron et al., 2013), but the extent to which this is true for cerebellar cortex is unclear. We extended a recent model (Billings et al., 2014) to determine the dimension of the representation formed by cerebellar granule cells when their inputs are chosen in accordance with the constraints imposed by the spatial arrangement of granule cells and mossy fibers. In the model, synaptic connections are formed depending on the distance between granule cells and mossy-fiber rosettes (Figure 5A; see Methods).
Figure 5.
Detailed model of cerebellar cortex. (A) Schematic of random model (left) and local model (right) of mossy-fiber to granule-cell connectivity based on Billings et al., 2014. (B) Distribution of the number of shared mossy-fiber inputs received by pairs of granule cells when connections are chosen randomly (random model, gray) or based on the distance between mossy-fiber rosettes and granule cells in the local model (black). (C) Dimension as a function of K as in Figure 3B for the two models. For the local model, the granule-cell density is scaled inversely with K to fix the total number of connections. Results are shown for excitatory synaptic weights with a distribution of strengths chosen to match recordings from cerebellar granule cells (Sargent et al., 2005), but the results do not change if global inhibition is added to the model (see Figure 4E).
This connectivity rule leads to an increased proportion of nearby granule-cell pairs that have multiple shared mossy-fiber inputs (Figure 5B). Consistent with this increased redundancy, the dimension of the resulting representation decreases by approximately 50% compared to the case of random connectivity (Figure 5C), suggesting that spatially structured wiring limits dimension. Nonetheless, the optimal synaptic degree still occurs at K = 4. Thus, sparse connectivity can be sufficient to create high-dimensional representations even when dimension is limited by nonrandom wiring.
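The increased input sharing under spatially constrained wiring can be sketched with a simplified version of the local model: each granule cell connects to its K nearest rosettes (a simplification of the distance-dependent rule of Billings et al., 2014; cell counts and spatial scales are illustrative, not anatomical).

```python
import numpy as np

rng = np.random.default_rng(7)
n_mf, n_gc, K = 300, 1500, 4

mf = rng.uniform(0.0, 1.0, (n_mf, 3))   # mossy-fiber rosette positions
gc = rng.uniform(0.0, 1.0, (n_gc, 3))   # granule-cell positions

def shared_counts(conn):
    """Number of mossy fibers shared by each granule-cell pair."""
    A = np.zeros((n_gc, n_mf))
    A[np.arange(n_gc)[:, None], conn] = 1.0
    S = A @ A.T
    return S[np.triu_indices(n_gc, k=1)]

# Local model: each granule cell contacts its K nearest rosettes
d = np.linalg.norm(gc[:, None, :] - mf[None, :, :], axis=2)
conn_local = np.argsort(d, axis=1)[:, :K]
# Random model: K rosettes chosen uniformly at random
conn_rand = np.stack([rng.choice(n_mf, size=K, replace=False) for _ in range(n_gc)])

frac = {name: (shared_counts(c) >= 2).mean()
        for name, c in (("local", conn_local), ("random", conn_rand))}
print(frac)   # local wiring yields far more pairs sharing two or more inputs
```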
Classification of mixed-layer responses
Dimensionality provides a convenient measure of the quality of a mixed-layer representation because it makes few assumptions about how the representation is used. A less abstract measure is the ability of a decoder to correctly classify input patterns on the basis of mixed-layer responses. Consider a set of P patterns of activity sμ for μ = 1…P in the input layer, each of which is associated with a positive or negative valence, denoted by vμ = ±1, with equal probability. Patterns associated with positive or negative valences might correspond to neural activity evoked by appetitive or aversive stimuli, for example. We examine the performance of a fully connected Hebbian classifier of mixed-layer responses trained to discriminate between input patterns of opposing valences (Figure 6A; see Methods). During the training phase, the classifier updates its synaptic weights according to the mixed-layer activity and the valence so that, after training, the weights are given by $\mathbf{w} = \sum_{\mu=1}^{P} v^{\mu}\left(\mathbf{m}^{\mu} - f\right)$, where mμ is the mixed-layer activity pattern evoked by input pattern sμ (the subtraction of the coding level f is performed element-wise). To assess the performance of this classifier during the test phase, we compute its error rate when classifying the activity evoked by instances of previously learned input patterns that have been corrupted by noise. Input noise produces noise in the mixed layer, which we quantify by a measure $\Delta = d/\left(2f(1-f)\right)$, where d is the mean-squared distance between the mixed-layer activity during training and testing (see Methods). If noise is not present, Δ = 0, while if noise is so large that training and test patterns are uncorrelated, Δ = 1.
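The classifier can be sketched in a few lines. The Hebbian rule sums valence-weighted, mean-subtracted mixed-layer responses, and performance is tested on noise-corrupted versions of the learned patterns (all parameter values below are illustrative, not those of the figures).

```python
import numpy as np

rng = np.random.default_rng(8)
N, M, K, f, P = 200, 2000, 5, 0.1, 300

J = np.zeros((M, N))
for i in range(M):
    J[i, rng.choice(N, size=K, replace=False)] = 1.0

# Thresholds set on a separate calibration set so each neuron is active
# with probability f
theta = np.quantile(rng.standard_normal((2000, N)) @ J.T, 1 - f, axis=0)

def mixed(s):
    """Binary mixed-layer response m = Theta(h - theta)."""
    return (s @ J.T > theta).astype(float)

S = rng.standard_normal((P, N))                # learned input patterns
v = rng.choice([-1.0, 1.0], size=P)            # random valences
w = (v[:, None] * (mixed(S) - f)).sum(axis=0)  # Hebbian readout weights

S_noisy = S + 0.3 * rng.standard_normal((P, N))   # corrupted test patterns
pred = np.sign((mixed(S_noisy) - f) @ w)
error = (pred != v).mean()
print(error)   # well below the chance rate of 0.5
```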
Figure 6.
Classification of mixed-layer responses. (A) Schematic of network with a classifier that computes a weighted sum of mixed-layer activity to determine the valence of an input pattern. (B) Δ for N = 1000 and an expansion ratio of M/N = 5, for log-normal synaptic weights (Song et al., 2005) without inhibition (black) and with NI = 100 global inhibitory neurons (red). Left: Gaussian input patterns (curves with and without inhibition lie on top of each other). Right: Binary input patterns with coding level of finput = 0.5. For Gaussian patterns, the standard deviation of the noise at the input layer is 0.3 times that of the signal, while for binary patterns, noise is introduced by randomly flipping the activity of 10% of the inputs. For binary patterns and no inhibition, the mixed-layer coding level cannot be adjusted to exactly match the desired level of f = 0.1, because the input to each mixed-layer neuron takes only discrete values. We therefore choose the threshold that leads to the highest f ≤ 0.1. For K = 1, 2, 3, it is not possible to find such a threshold with f > 0, so data is not shown. (C) Error rate of a Hebbian classifier when classifying P = 1000 random input patterns. Circles denote simulations and dashed lines denote prediction from Equation 6.
For the case of uncorrelated input patterns and valences, mixed-layer neurons with a fixed coding level, and input-layer to mixed-layer connections that are not modified by supervised learning, there is a particularly simple relationship between the performance of a Hebbian classifier and mixed-layer dimension (Equation 3). The error rate of the classifier is expressed in terms of a signal-to-noise ratio (SNR) for the mixed layer through the relation $\text{error rate} = H(\sqrt{\mathrm{SNR}})$, where $H(x) = \int_x^\infty \frac{dz}{\sqrt{2\pi}}\, e^{-z^2/2}$ (Babadi and Sompolinsky, 2014). The SNR is proportional to the mixed-layer dimension (see Methods) and given by
$$\mathrm{SNR} = \frac{(1-\Delta)^2\,\dim(\mathbf{m})}{P} \qquad (6)$$
when P is large. Thus, the SNR, and hence the error rate, which is a monotonically decreasing function of the SNR, depend only on the mixed-layer dimension, the number of input patterns, and Δ. We have already explored the dependence of dimension on synaptic degree (Figures 2–5). All that remains is to understand the dependence of Δ, which is determined by the difference between the mixed-layer representation of an input and that of the same input corrupted by noise. For a fixed coding level and Gaussian noise in the input layer, the dependence is trivial; Δ is independent of synaptic degree (Figure 6B, left). Thus, the synaptic degree that maximizes dimension for a given coding level also minimizes error rate, regardless of the level of noise in the input layer (Figure 6C, left). We note that Δ is high when the coding level is low (Barak et al., 2013; Babadi and Sompolinsky, 2014; Figure S3), meaning that representations with a very low coding level have a reduced SNR even though they have a higher dimension (Figure S1).
Gaussian input patterns and noise may apply when inputs received by mixed-layer neurons are communicated via smoothly varying firing rates. However, for neurons with small synaptic degree, a few EPSCs are often sufficient to generate a spiking response (Chadderton et al., 2004). In this case, it is more appropriate to describe the input patterns as discrete spike counts. While the qualitative dependence of mixed-layer dimension on synaptic degree is similar for discrete and Gaussian inputs (Figure S4), the dependence of Δ is not (Figure 6B, right). Surprisingly, sparse connectivity leads to lower Δ than dense connectivity in this case (except for very small K; see Figure 6 caption). This further biases the optimal connectivity toward small synaptic degree (Figure 6C, right). Noise reduction at small K arises from the fact that, when coding level is low, many mixed-layer neurons are just slightly above or below threshold and thus sensitive to noise. When K is large, all neurons experience noise in a fraction of their inputs, while when K is small, there is a subpopulation of neurons whose input is not strongly affected by noise and that can respond reliably. The effect is maximal for low noise and high mixed-layer coding level (Figure S3). Thus, small synaptic degree can curtail the amplification of input noise, if this noise arises from low-rate spiking activity.
For the above analyses, we assumed that only fluctuations in the input layer contribute to noise in the mixed layer. However, if synaptic release is unreliable and uncorrelated across release sites (Markram et al., 1997; Sargent et al., 2005), Δ can be reduced by increasing synaptic degree to average over these fluctuations (Figure S5). This averaging biases the optimal synaptic degree toward larger values. Cerebellar granule cells appear to be capable of responding reliably to mossy fiber activation (Chadderton et al., 2004), but quantifying the relative contributions of input-layer and synaptic fluctuations to noise in the mixed layer requires further analysis.
Learned mixed-layer representations
We have shown that small synaptic degree is sufficient to obtain optimal or near-optimal classification performance when the connections and synaptic weights of the mixed-layer neurons are set randomly and the activities of input-layer and mixed-layer neurons are normalized. What happens if synaptic weights can be modified by plasticity directed by a learning process? We considered the performance of a classifier after both unsupervised and supervised modifications of the connections received by mixed-layer neurons.
Unsupervised learning has been proposed to improve classification by regulating the activity of the cerebellar granule-cell layer (Schweighofer et al., 2001; D’Angelo, 2014). We hypothesized that such learning could be particularly important when the activities of different inputs are not normalized. In this case, synaptic plasticity could increase the influence of presynaptic inputs whose activity only weakly activates mixed-layer neurons, ensuring each mixed-layer neuron conveys information relevant to a task (Schweighofer et al., 2001). We implemented a plasticity rule that bidirectionally adjusts synaptic weights based on recent activity to compensate for changes in activity level, as well as one that adjusts the excitability of mixed-layer neurons to maintain a desired coding level (see Methods). After learning, the error rate of the classifier is reduced (Figure 7A). Notably, small synaptic degree is still sufficient to attain high performance.
Figure 7.
Learning mixed-layer representations. (A) Error rate of a Hebbian classifier before (magenta) and after (black) unsupervised learning. The input-layer activity and mixed-layer coding level are initially heterogeneous (see Methods). Parameters are N = 1000, M/N = 5, P = 1000, and f = 0.1. Global inhibition is present, and excitatory synaptic weights are initialized to be homogeneous before learning. (B) Effect of supervised learning on error rate. Results are shown for learning that associates inputs with either random mixed-layer target patterns (blue) or mixed-layer target patterns in which neurons are sensitive to combinations of input and valence (cyan). Results are compared to the case of synaptic weights drawn randomly from a standard Gaussian distribution (black). Other parameters are as in (A). (C) Same as (B) but for mixed-layer dimension, averaged over the distribution of random input patterns.
We next asked if supervised learning could further reduce the error rate for a system in which inputs and mixed-layer responses have already been normalized. To implement learning, the strengths of connections received by mixed-layer neurons were modified with a Hebbian rule to associate each input pattern with a target pattern of activity in the mixed layer (see Methods). In this scenario, the problem of learning is to determine the appropriate target patterns. Unlike the unsupervised case, this form of learning requires a feedback signal to determine the target patterns.
A previous study proposed associating each input pattern with a random and uncorrelated target pattern (Babadi and Sompolinsky, 2014). This approach attempts to reduce the overlap of any two target patterns, which may otherwise be large if mixed-layer neurons are highly correlated. Such learning reduces the error rate, but only for large synaptic degree (Figure 7B, blue curve). The magnitude of the reduction is greater for very low coding levels (Figure S6), although it is still restricted to large synaptic degree.
Improved performance can be achieved if mixed-layer target patterns incorporate information about valence. When target patterns are selective for conjunctions of valence and input, classification performance improves dramatically for large synaptic degree (Figure 7B, cyan curve). This improvement demonstrates the utility of mixed representations that incorporate information about both input and desired outcome. Such representations can be generated in a biologically plausible manner by computing linear combinations of input and valence and applying a threshold (see Methods).
The mixed-layer representation is not uncorrelated with valence or with the input patterns presented during training in the case of supervised learning, and therefore the error rate is no longer simply related to dimension (Equation 6). In fact, the dimensions of the representations described above, computed over the distribution of all random input patterns, are lower than the dimension of a random representation (Figure 7C). This reflects a tradeoff between minimizing classification error for known patterns and creating a high-dimensional representation of arbitrary novel patterns. Surprisingly, the benefits of these learned representations, when compared to random connectivity with Gaussian statistics, are only evident when K > 10. This suggests that, when inputs to mixed-layer neurons are appropriately normalized and synaptic degree is small, implementing a supervised learning procedure may not improve performance on associative learning tasks.
Discussion
Our analysis demonstrates that the sparse connectivity observed in the cerebellar granule-cell layer and analogous structures is well suited for producing high-dimensional representations that can be read out by densely connected output neurons. This conclusion is supported by simple combinatorial arguments, analysis of dimension (Figures 2–5), and the relationship between dimension and classification error (Figure 6C), all of which demonstrate that the performance of randomly wired neural systems quickly saturates as synaptic degree grows. While random wiring is unable to exploit synaptic degrees beyond this saturation point, supervised Hebbian modifications of input synapses can dramatically improve classification performance for large synaptic degrees (Figure 7B). Such modifications may therefore be present in cerebrocortical circuits, where input and output neurons are both densely connected.
Expansion via sparse connectivity in neural systems
Classic theories of cerebellar cortex were primarily interested in the mechanisms that maintain a low coding level (that is, sparse activity) in the granule-cell layer and did not consider its dimension (Marr, 1969; Albus, 1971). While a low coding level can increase dimension (Figure S1), it is not sufficient to do so, because correlations in activity can limit dimension regardless of the coding level (Figure 2A). The representations we considered exhibit low coding levels, but we note that a low coding level does not require or imply sparse connectivity. Marr (1969) implied that sparse connectivity can ensure that the granule-cell coding level does not saturate as the level of mossy fiber activity is varied, and a similar argument has been made for the mutual information between mossy-fiber input and granule-cell responses (Billings et al., 2014). However, the importance of sparse connectivity for this purpose is reduced if inhibition is included (see Figure 4C), or if preprocessing normalizes the activity of the inputs (see below). Our result that small synaptic degree maximizes dimension does not depend strongly on coding level (Figure 3C and D).
Our analysis shows that, when synaptic wiring is random, the dimension of a representation formed by a large number of sparsely connected neurons is typically higher than that of a smaller number of densely connected neurons (Figure 3). This intuition holds as long as the ratio of mixed-layer neurons to inputs is below 10–50. Above this expansion ratio, increases in dimension are modest and require both well-tuned inhibition and large increases in synaptic degree (Figure S1). The observed expansion ratio in the Drosophila mushroom body and an estimate of the expansion ratio of the granule cells presynaptic to a single Purkinje cell in cerebellar cortex both lie within this range of 10–50 (Keene and Waddell, 2007; Marr, 1969; Tyrrell and Willshaw, 1992). Thus, the extensive dendritic arbors of the output neurons in both circuits may provide near-optimal classification ability, allowing cerebellum-like circuits to operate close to the limit of performance for sparsely connected neural systems. While we established a formal link between dimension and one specific computation, binary classification (Figure 6), high-dimensional representations may be useful for a variety of tasks (Rigotti et al., 2013) and thus desirable even in circuits whose output neurons do not perform binary classification.
The cerebellar granule-cell representation
Classic theories of cerebellar cortex assumed that granule cells employ fixed, random connections from mossy fibers to produce a high-dimensional representation (Marr, 1969; Albus, 1971). The spatial organization of cerebellar cortex (Billings et al., 2014; Figure 5) rules out the extreme case in which every mossy-fiber input can be assigned randomly to any granule cell. However, within defined spatial domains, inputs to individual granule cells appear heterogeneous (Huang et al., 2013; Chabrol et al., 2015; Ishikawa et al., 2015), supporting a spatially restricted form of randomness. This is sufficient for our theory because granule cells represent heterogeneous combinations of mossy-fiber input relevant to the classification performed by their postsynaptic Purkinje cells, which are unlikely to require all possible input combinations. Some studies have suggested that granule-cell input is more homogeneous (Jörntell and Ekerot, 2006; Bengtsson and Jörntell, 2009) and that granule cells function as noise filters (Ekerot and Jörntell, 2008) or improve generalization (Spanne and Jörntell, 2015). However, even when we impose a corresponding level of similarity of tuning in our cerebellar cortex model, the synaptic degree that maximizes dimension is still small (K ≤ 4), although the dimension of the resulting representation is reduced (Figure S7). Another study related to ours, of which we recently became aware, shows that sparse connectivity also improves convergence speed in a model of associative learning in cerebellar cortex (Cayco-Gajic et al., 2016).
Classic models of cerebellar cortex assume that the inputs to granule cells are fixed (Marr, 1969; Albus, 1971), but plasticity has been reported at mossy-fiber to granule-cell synapses (Hansel et al., 2001; Schweighofer et al., 2001; Gao et al., 2012; D’Angelo, 2014; Gao et al., 2016), although these synapses appear morphologically stable (Rylkova et al., 2015). Prior modeling studies have suggested that these modifications reflect an unsupervised learning process that allows granule cells to respond to inputs relevant to a task (Schweighofer et al., 2001). Consistent with this, unsupervised learning improves classification performance in our model (Figure 7A). However, the performance of these models still saturates when synaptic degree is small, in contrast to models subject to supervised learning (Figure 7B). The fact that climbing-fiber error signals target Purkinje cells but not granule cells is consistent with this observation. It is also possible that mossy-fiber plasticity is useful for functions other than improving associative learning performance, and further work is necessary to test this hypothesis (D’Angelo, 2014).
The distribution of the strengths of connections received by cerebellar granule cells from mossy fibers appears to be less variable than the very broad, log-normal distribution observed for neocortical neurons (Sargent et al., 2005; Song et al., 2005). In our model, this reduced heterogeneity increases the dimension of the granule-cell representation (Figure 4E). In contrast, the broader distribution for neocortical neurons may be the outcome of a supervised learning process rather than the basis of a high-dimensional representation, consistent with our predictions (Figure 7) and other theoretical analyses that show similar distributions can be generated through associative learning (Chapeton et al., 2012; Brunel, 2016).
Neural representations for associative learning
We considered the simple case of learning associations between outputs and random, uncorrelated input patterns with a fixed coding level, a standard benchmark for learning systems (Gardner, 1988; Barak et al., 2013; Babadi and Sompolinsky, 2014). Under these assumptions, we derived a direct relationship between the performance of a Hebbian classifier and dimension (Equation 6), thus making explicit the previously assumed relationship between these two quantities that characterize random mixed-layer representations (Rigotti et al., 2013). Our qualitative results also hold for more sophisticated pseudoinverse and maximum-margin classifiers, which can achieve low error rates when mixed-layer neurons have heterogeneous coding levels (Figure S8).
The activities of the inputs to biological systems will inevitably exhibit some level of correlation, violating the simplifying assumptions of our model. However, the magnitude of this correlation may be reduced by decorrelating mechanisms within the pre-processing circuitry of cerebellum-like systems such as the Drosophila antennal lobe (Bhandawat et al., 2007). Similarly, normalizing circuitry may ensure that input coding levels remain relatively constant (Carandini and Heeger, 2012). For more complicated distributions of input patterns and noise that strongly violate our assumptions, different analyses are required, and the results will depend on the exact forms of these distributions. Some simple forms of correlation among input channels do not substantially affect the location of the optimal synaptic degree (Figure S7). For a classification problem related to the challenging “parity” task, in which only knowledge of the pattern of activity in every one of a set of independent input streams provides any information about valence (Barak et al., 2013), larger synaptic degree is needed when there are many input streams (Figure S8). Dense connectivity or structural plasticity mechanisms capable of selecting relevant inputs may be necessary for such tasks.
Our analysis also did not consider the complex temporal dynamics that underlie many forms of associative learning. To extend our model to the temporal domain for systems that lack strong recurrence, each input pattern could be interpreted as the averaged activity during a period comparable to the integration time constant of a mixed-layer neuron. In this case, it is necessary to consider an ensemble of patterns with correlations determined by the temporal statistics of the input. Short-term synaptic plasticity (Xu-Friedman and Regehr, 2003; Chabrol et al., 2015) and unsupervised learning mechanisms (Schweighofer et al., 2001) are likely to control these statistics to ensure well-timed input to mixed-layer neurons. In general, we expect the qualitative result that small synaptic degree is sufficient for high performance to hold provided that the temporal overlap of inputs is sufficient to allow mixed-layer neurons to respond to heterogeneous input combinations.
In our model, supervised modifications of the synaptic weights received by mixed-layer neurons, in addition to those of the classifier, are required to exploit large synaptic degree but provide little improvement when synaptic degree is small (Figure 7B). We also found little improvement at small synaptic degree using a state-of-the-art implementation of a gradient descent algorithm for modifying synaptic weights (Figure S9; Kingma and Ba, 2014). We cannot rule out the existence of other learning rules that lead to improved performance, but, if they exist, these rules may not be easily implementable by biological systems.
Learning in cerebellum-like and cerebrocortical systems
These observations suggest that supervised learning of the inputs to cerebellar granule cells or analogous neurons in cerebellum-like regions may be unnecessary to achieve near-optimal performance for learning associations when input-layer activity is normalized (Figure 7). As described above, the modifications of mossy-fiber to cerebellar granule-cell synaptic transmission that have been identified in vitro and in vivo may underlie an unsupervised learning process (Schweighofer et al., 2001). Another study reported enhanced Kenyon-cell responses to conditioned odors in honeybees, but this change may reflect learning at other stages of the olfactory processing hierarchy (Szyszka et al., 2008). In cephalopods, plasticity in the vertical lobe, a cerebellum-like structure, appears to be confined to output neurons in cuttlefish, but present only in the dendrites of intermediate-layer, not output-layer, neurons in Octopus vulgaris (Shomrat et al., 2011). This does not contradict our analysis, which predicts that supervised learning of mixed-layer responses is dispensable only in the presence of output-layer plasticity. Experiments suggest that Kenyon cells in the locust mushroom body are densely connected (Jortner et al., 2007), unlike in Drosophila, implying that their input synapses may potentially exhibit associative modifications.
For densely connected neural systems, we investigated various forms of supervised modifications of input-layer to mixed-layer connections and found that learning based on mixed-layer conjunctions of valence and input provides the greatest improvement. This is consistent with the observation that both task-relevant variables and reward are represented in higher neocortical regions (Saez et al., 2015). Although our analysis neglected features of cerebral cortex including recurrence and complex temporal dynamics, it suggests that supervised learning of both inputs and outputs is critical to take advantage of its dense connectivity. Thus, cerebrocortical systems are likely to exploit supervised synaptic plasticity at multiple levels of processing, similar to artificial neural networks that attain high performance on complex categorization tasks (LeCun et al., 2015).
STAR Methods
KEY RESOURCES TABLE
| REAGENT or RESOURCE | SOURCE | IDENTIFIER |
|---|---|---|
| Antibodies | ||
| Bacterial and Virus Strains | ||
| Biological Samples | ||
| Chemicals, Peptides, and Recombinant Proteins | ||
| Critical Commercial Assays | ||
| Deposited Data | ||
| Experimental Models: Cell Lines | ||
| Experimental Models: Organisms/Strains | ||
| Oligonucleotides | ||
| Recombinant DNA | ||
| Software and Algorithms | ||
| TensorFlow machine learning library | Google Research | http://www.tensorflow.org |
| Adam gradient descent optimizer | Kingma & Ba, 2014 | http://arxiv.org/abs/1412.6980 |
| Algorithms for calculation of dimension and error rate | This paper | http://www.columbia.edu/~ak3625/code/litwinkumar_et_al_dimension_2017.zip |
| Other | ||
CONTACT FOR RESOURCE SHARING
Further requests for resources should be directed to and will be fulfilled by the Lead Contact, Ashok Litwin-Kumar (ak3625@columbia.edu).
METHOD DETAILS
Combinatorics
Suppose we draw M items with replacement from R possibilities. The probability p that all items are distinct is given by
$$p = \frac{R!}{(R-M)!\,R^M} = \prod_{i=0}^{M-1}\left(1 - \frac{i}{R}\right) \qquad (7)$$
For the case of a synaptic degree of K and N input channels, setting $R = \binom{N}{K}$ yields the probability of all mixed-layer neurons being distinct.
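Equation 7 is the classic birthday-problem calculation and is easy to check numerically. The input and mixed-layer counts below are round illustrative numbers, not anatomical measurements; for R ≫ M the product is well approximated by exp(−M(M−1)/(2R)).

```python
import math

def prob_all_distinct(M, R):
    """Probability that M draws with replacement from R possibilities are all distinct."""
    p = 1.0
    for i in range(M):
        p *= (R - i) / R
    return p

# For synaptic degree K drawn from N input channels, a mixed-layer neuron
# can receive any of R = C(N, K) distinct input subsets.
N, K, M = 7000, 4, 200000          # illustrative sizes
R = math.comb(N, K)                # ~1e14 possible subsets
p = prob_all_distinct(M, R)        # close to exp(-M*(M-1)/(2*R)) since R >> M
```

Even with hundreds of thousands of neurons, the combinatorial space created by K = 4 is so large that nearly all neurons receive distinct input subsets.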
Neuron model
The mixed-layer responses are given by
$$\mathbf{m} = \Theta\left(J\mathbf{s} - \boldsymbol{\theta}\right) \qquad (8)$$
where J are the synaptic weights, s is the input-layer activity, θ are the activity thresholds, and Θ is the Heaviside step function (applied element-wise). When inhibition is present, J = J+ + J−, where J+ represents excitatory synaptic weights and $J^- = J_m J_s^T$ represents global inhibition. Jm is a vector of length M with each element equal to −α, and Js is a vector of length N with each element either equal to 1 (homogeneous inhibition) or equal to $\frac{1}{N_I}\sum_{i=1}^{N_I} w_{I,i}$ (heterogeneous inhibition), where wI,i is drawn from the same distribution as the excitatory synaptic weights and NI is the number of inhibitory neurons. When α = K/N, inhibition and excitation are balanced. For all analyses of dimension, we choose the value of α that minimizes input-current correlations; this value depends on the distribution of inhibitory synaptic weights and is less than K/N except for homogeneous weights (see Calculation of input-current dimension). We note that our model is insensitive to the mean of the distribution of input activities, since the activity thresholds can be adjusted to compensate for a change in mean (specifically, if $s_j \leftarrow s_j + \mu$, the thresholds may be redefined as $\theta_i \leftarrow \theta_i + \mu \sum_j J_{ij}$).
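A minimal numpy sketch of Equation 8 with homogeneous weights and balanced global inhibition (α = K/N). For simplicity, a single threshold is set here by a quantile over one pattern's input currents to hit the target coding level, a shortcut rather than the threshold-setting procedure of the text.

```python
import numpy as np

rng = np.random.default_rng(1)
N, M, K, f = 1000, 5000, 4, 0.1

# each mixed-layer neuron receives K of the N inputs with unit weight
J = np.zeros((M, N))
for i in range(M):
    J[i, rng.choice(N, size=K, replace=False)] = 1.0

alpha = K / N                               # balanced global inhibition
h = (J - alpha) @ rng.standard_normal(N)    # input currents for one Gaussian pattern

theta = np.quantile(h, 1 - f)               # common threshold achieving coding level f
m = (h > theta).astype(float)               # element-wise Heaviside nonlinearity
```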
For heterogeneous weights, synaptic weights were drawn from a log-normal distribution with μ = −0.702, σ = 0.936 (consistent with neocortical data; Song et al., 2005) or μ = 0, σ = 0.438 (leading to a CV of 0.46, consistent with data from cerebellar granule cells; Sargent et al., 2005).
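The quoted CV follows from the log-normal σ alone, since the coefficient of variation of a log-normal distribution is √(exp(σ²) − 1) independent of μ; a quick check using the values in the text:

```python
import math

def lognormal_cv(sigma):
    """Coefficient of variation of a log-normal distribution (independent of mu)."""
    return math.sqrt(math.exp(sigma**2) - 1)

cv_granule = lognormal_cv(0.438)   # ~0.46, the granule-cell value quoted above
cv_cortex = lognormal_cv(0.936)    # much broader neocortical-like distribution
```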
Dimension
To compute the dimension of a system whose state is defined by x, we construct the covariance matrix C = 〈(x − 〈x〉)(x − 〈x〉)T〉. Using Tr C = Σi λi, where {λi} are the eigenvalues of C, and noting that C is symmetric, we find
$$\dim(\mathbf{x}) = \frac{(\mathrm{Tr}\,C)^2}{\mathrm{Tr}\,C^2} = \frac{M\langle C_{ii}\rangle^2}{\langle C_{ii}^2\rangle + (M-1)\langle C_{ij}^2\rangle} \qquad (9)$$
where Cii and Cij are random variables representing the diagonal and off-diagonal elements of C, respectively. For the case of the input currents, analytic expressions can be obtained in terms of the distribution of synaptic weights (see Calculation of input-current dimension), leading to Equations 2 and 4. For the case of binary mixed-layer responses with identical coding levels for each neuron, 〈Cii〉 = f(1 − f), Var(Cii) = 0, and the expression reduces to Equation 3 when M is large. The distribution of Cij can be obtained exactly for the case of homogeneous weights, while for heterogeneous weights we estimated the distribution by sampling (see Heterogeneous weights).
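The participation-ratio dimension used throughout can be computed directly from any covariance matrix; a minimal sketch:

```python
import numpy as np

def dimension(C):
    """Participation ratio dim = (Tr C)^2 / Tr C^2 of a covariance matrix C."""
    return np.trace(C) ** 2 / np.trace(C @ C)
```

For M independent neurons with equal variance (C proportional to the identity), this returns M; for perfectly correlated neurons (C a matrix of ones), it returns 1.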
Classifier performance
The synaptic weights of the Hebbian classifier are given by $\mathbf{w} = \sum_{\mu=1}^{P} v^\mu (\mathbf{m}^\mu - f)$, where mμ − f is short for element-wise subtraction of f from mμ. We assume that the training patterns and noisy test patterns have a mean-squared distance (equivalent to the Hamming distance divided by M) of d and equal coding levels. Denoting the classifier input by gμ = w· (m̂μ −f), where m̂μ represents a noisy test pattern,
$$\langle v^\mu g^\mu \rangle = M\left(f(1-f) - \frac{d}{2}\right) = Mf(1-f)(1-\Delta) \qquad (10)$$
The average is taken over noise realizations. The variance Var (vμgμ) is equal to
$$\mathrm{Var}(v^\mu g^\mu) = \mathrm{Var}\left(\sum_{\nu=1}^{P} v^\mu v^\nu (\mathbf{m}^\nu - f)\cdot(\hat{\mathbf{m}}^\mu - f)\right) \qquad (11)$$
When P is large, this quantity can be approximated by the sum over ν ≠ μ. In the special case of random and uncorrelated patterns and noise, for ν ≠ μ the statistics of vμvν(mν −f) · (m̂μ−f) are identical to those of Oμν ≡ (mν − f) · (mμ − f), which are Gaussian distributed with mean zero. Therefore Var (vμgμ) ≈ PVar (Oμν). Oμν is a random variable representing the overlap of the mixed-layer responses to two random input patterns. We now relate these overlaps to the covariance matrix of mixed-layer responses, and thus to dimension.
Consider a set of Q mixed-layer responses to random input patterns {mμ}, μ = 1…Q, and construct the matrix M, whose μth column is equal to mμ − f. The covariance matrix of mixed-layer responses can be written as $C = \frac{1}{Q} M M^T$. Furthermore, Var (Oμν) is equal to the variance of the off-diagonal elements of O = MTM. Therefore,
$$\mathrm{Var}(O^{\mu\nu}) = \sum_{ij} C_{ij}^2 = \mathrm{Tr}\,C^2 \qquad (12)$$
Thus, $\mathrm{Var}(v^\mu g^\mu) \approx P\,\mathrm{Var}(O^{\mu\nu}) = P\,\mathrm{Tr}\,C^2$.
We define the signal-to-noise ratio
$$\mathrm{SNR} = \frac{\langle v^\mu g^\mu\rangle^2}{\mathrm{Var}(v^\mu g^\mu)} \qquad (13)$$
For the case of a fixed coding level, Tr C = Mf(1 − f). We can therefore write
$$\mathrm{SNR} = \frac{M^2 f^2 (1-f)^2 (1-\Delta)^2}{P\,\mathrm{Tr}\,C^2} = \frac{(1-\Delta)^2 (\mathrm{Tr}\,C)^2}{P\,\mathrm{Tr}\,C^2} = \frac{(1-\Delta)^2\,\dim(\mathbf{m})}{P} \qquad (14)$$
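Equation 14 can be checked numerically in the simplest setting: independent binary patterns, for which dim(m) ≈ M and Δ = 0, so SNR ≈ M/P and the error rate should be near H(√(M/P)). The sizes below are illustrative and chosen so that SNR = 1.

```python
import math
import numpy as np

rng = np.random.default_rng(5)
M, P, f = 500, 500, 0.1            # independent patterns: SNR = M/P = 1

m = (rng.random((P, M)) < f).astype(float)
v = rng.choice([-1.0, 1.0], size=P)
w = (v[:, None] * (m - f)).sum(axis=0)          # Hebbian classifier weights
error_rate = np.mean(np.sign((m - f) @ w) != v)

snr = M / P
predicted = 0.5 * math.erfc(math.sqrt(snr / 2))  # H(sqrt(SNR)) ~ 0.159 here
```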
Unsupervised learning of mixed-layer inputs
To study unsupervised learning in the presence of heterogeneous input-layer activity, we calculated the error rate of the classifier before and after an unsupervised learning period consisting of 1000 presentations of random input patterns. During this learning period, but not during the calculation of error rate, synaptic weights and the thresholds of mixed-layer neurons are modified.
The activity of the jth input channel is drawn from a Gaussian distribution with mean 0 and variance $\sigma_j^2$, with σj uniformly distributed between 0.25 and 1.75. We define a variable uij(t) for each connection, with uij(0) = 1. When an input pattern s(t) is presented, uij is updated according to
$$u_{ij}(t) = u_{ij}(t-1) + \alpha\left(s_j(t)^2 - u_{ij}(t-1)\right) \qquad (15)$$
with α = 0.01 and t = 1, 2, …, 1000. The variable uij(t) thus estimates the variance of input channel j. Synaptic weights are set equal to $J_{ij}(t) = (u_{ij}(t))^{-1/2}$.
To update neuronal thresholds, we initially set θi(0) to be identical for all i, leading to heterogeneous coding levels across neurons (the initial value is chosen so that the average coding level is 0.1). When an input pattern is presented, thresholds are updated according to
$$\theta_i(t) = \theta_i(t-1) + \eta\left(m_i(t) - f_t\right) \qquad (16)$$
with η = 0.01. This learning rule ensures that each neuron approaches a target coding level of ft = 0.1.
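Both running updates can be sketched together. The heterogeneous input variances follow the text; the neuron's drive is simplified here to a unit-variance Gaussian so the threshold's fixed point (the Gaussian 90th percentile for a target coding level of 0.1) is known, and more iterations than in the text are used so the estimates settle.

```python
import numpy as np

rng = np.random.default_rng(6)
alpha, eta, f_t, T = 0.01, 0.01, 0.1, 20000

sigma = rng.uniform(0.25, 1.75, size=100)    # heterogeneous input standard deviations
u = np.ones(100)                             # running variance estimates (Equation 15)
theta = 0.0                                  # single-neuron threshold (Equation 16)
for _ in range(T):
    s = sigma * rng.standard_normal(100)
    u += alpha * (s**2 - u)                  # track each channel's variance
    h = rng.standard_normal()                # simplified unit-variance drive
    theta += eta * (float(h > theta) - f_t)  # nudge threshold toward target coding level

weights = u ** -0.5                          # J = u^(-1/2): weak channels upweighted
```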
Supervised learning of mixed-layer inputs
We compared classification performance when input-layer to mixed-layer connections were chosen from a standard Gaussian distribution to performance when they were modified by Hebbian learning. Since the learning rule does not constrain the sign of the synaptic weights, we did not require the elements of J to be positive. To implement Hebbian learning, each input pattern sμ is associated with a target mixed-layer pattern tμ. The synaptic weights are then set according to
$$J = \sum_{\mu=1}^{P} (\mathbf{t}^\mu - f)(\mathbf{s}^\mu)^T \qquad (17)$$
Target patterns are either random binary patterns with coding level f or patterns that mix input and valence. For the latter case, target patterns are generated by setting
$$\mathbf{t}^\mu = \Theta\left(J_0 \mathbf{s}^\mu + J_v v^\mu - \theta_t\right) \qquad (18)$$
θt is chosen so that each mixed-layer neuron is active for a fraction f of the target patterns. J0 is an M × N matrix with entries drawn independently from a zero-mean Gaussian distribution with variance 1/N, while Jv is an M-dimensional vector with entries drawn independently from a zero-mean Gaussian distribution with variance 100.
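A sketch of generating valence-conjunction target patterns (Equation 18) and the resulting Hebbian weights. The parameter variances follow the text; the per-neuron threshold is set empirically by a quantile over the training set, and the mean-subtracted Hebbian form is the same one used for the classifier.

```python
import numpy as np

rng = np.random.default_rng(7)
N, M, P, f = 200, 1000, 50, 0.1

s = rng.standard_normal((P, N))                 # input patterns
v = rng.choice([-1.0, 1.0], size=P)             # valences

J0 = rng.standard_normal((M, N)) / np.sqrt(N)   # entries with variance 1/N
Jv = rng.standard_normal(M) * 10.0              # entries with variance 100

drive = s @ J0.T + v[:, None] * Jv              # P x M linear mix of input and valence
theta_t = np.quantile(drive, 1 - f, axis=0)     # per-neuron threshold for coding level f
targets = (drive > theta_t).astype(float)       # conjunctive target patterns

# Hebbian association of inputs with mean-subtracted targets
J = (targets - f).T @ s
```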
Calculation of input-current dimension
If the input patterns have zero mean, then hμ = Jsμ has zero mean. The (i, j)th element of the covariance matrix of the input currents is therefore
$$C_{ij} = \langle h_i^\mu h_j^\mu \rangle = J_i \langle \mathbf{s}^\mu (\mathbf{s}^\mu)^T \rangle J_j^T \qquad (19)$$
where Ji is the ith row of J and the average is taken over input patterns for a fixed weight matrix. If the input patterns are also uncorrelated, 〈sμ(sμ)T〉 = I, and the above expression leads to C = JJT. This is the case we analyze.
To compute the input-current dimension, we calculate
$$\dim(\mathbf{h}) = \frac{\left(\sum_i \lambda_i\right)^2}{\sum_i \lambda_i^2} \qquad (20)$$
where {λi} are the eigenvalues of C, and therefore
$$\dim(\mathbf{h}) = \frac{(\mathrm{Tr}\,C)^2}{\mathrm{Tr}\,C^2} \qquad (21)$$
If Cii is a random variable representing the diagonal elements of C and Cij the off-diagonal elements, then
$$\dim(\mathbf{h}) = \frac{M\langle C_{ii}\rangle^2}{\langle C_{ii}^2\rangle + (M-1)\langle C_{ij}^2\rangle} \qquad (22)$$
The averages are taken over the population of mixed-layer neurons. Thus, the dimension is determined by the mean and variance of the distributions of Cii = Ji · Ji and Cij = Ji · Jj across the population.
Homogeneous weights
We first consider the case where the rows of J each have K nonzero elements, at random locations. At these locations, Jij = 1. Thus, Cii = K and Tr C = MK. For i ≠ j, Cij ~ Hypergeom(N,K,K), and Cij = Cji. This yields
$$\langle C_{ij}\rangle = \frac{K^2}{N}, \qquad \mathrm{Var}(C_{ij}) = \frac{K^2(N-K)^2}{N^2(N-1)} \qquad (23)$$
Thus,
$$\dim(\mathbf{h}) = \frac{MK^2}{K^2 + (M-1)\left[\dfrac{K^2(N-K)^2}{N^2(N-1)} + \dfrac{K^4}{N^2}\right]} \qquad (24)$$
When M,N ≫ K this simplifies to
$$\dim(\mathbf{h}) \approx \frac{M}{1 + \dfrac{M}{N}\left(1 + \dfrac{K^2}{N}\right)} \qquad (25)$$
In the limit of a large expansion, when M ≫ N ≫ K,
$$\dim(\mathbf{h}) \approx \frac{N}{1 + K^2/N} = \frac{N^2}{N + K^2} \qquad (26)$$
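The hypergeometric prediction of Equations 23 and 24 can be checked against a single sampled connectivity matrix, assuming uncorrelated unit-variance inputs so that C = JJᵀ; the sizes below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(8)
M, N, K = 2000, 400, 4

# random homogeneous wiring: K unit weights per mixed-layer neuron
J = np.zeros((M, N))
for i in range(M):
    J[i, rng.choice(N, size=K, replace=False)] = 1.0

C = J @ J.T                                          # input-current covariance
dim_empirical = np.trace(C) ** 2 / np.trace(C @ C)

mean_Cij = K**2 / N                                  # hypergeometric mean of C_ij
var_Cij = K**2 * (N - K) ** 2 / (N**2 * (N - 1))     # hypergeometric variance
dim_theory = M * K**2 / (K**2 + (M - 1) * (var_Cij + mean_Cij**2))
```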
Homogeneous weights with homogeneous inhibition
We now suppose that all neurons receive inhibition proportional to the summed activity of the input layer. This is equivalent to adding a negative offset to all weights. Let J = J+ + J−, where J+ is determined as above for homogeneous weights and every element of J− is equal to −α. Using the results of the previous section leads to
〈Cii〉 = K(1 − α)² + (N − K)α²,  〈Cij〉 = K²/N − 2αK + Nα²,  Var(Cij) = K²(N − K)² / (N²(N − 1)).   (27)
When α = K/N, the row sums of J are equal to zero and excitation and inhibition are “balanced”. In this case, 〈Cii〉 = K(1 − K/N) and 〈Cij〉 = 0. Hence
dim(h) = M / [1 + (M − 1)/(N − 1)].   (28)
When M,N ≫ K this simplifies to
dim(h) ≈ MN / (M + N).   (29)
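The balanced result can be checked numerically. The sketch below builds rows with K entries equal to 1 at random locations, subtracts the balancing offset K/N, and compares the empirical dimension to M/(1 + (M − 1)/(N − 1)); the 20% tolerance is a loose allowance for finite-size fluctuations:

```python
import numpy as np

rng = np.random.default_rng(1)
M, N, K = 400, 100, 4

def balanced_homogeneous_J(M, N, K, rng):
    # K entries equal to 1 per row at random locations, then subtract K/N
    J = np.zeros((M, N))
    for i in range(M):
        J[i, rng.choice(N, size=K, replace=False)] = 1.0
    return J - K / N

dims = []
for _ in range(5):
    J = balanced_homogeneous_J(M, N, K, rng)
    C = J @ J.T
    dims.append(np.trace(C) ** 2 / np.trace(C @ C))
empirical = float(np.mean(dims))
predicted = M * (N - 1) / (M + N - 2)  # approaches MN/(M+N) for large M, N
```

Note that with these parameters the predicted dimension (about 80) is far larger than K, illustrating how sparse balanced wiring can still produce a high-dimensional representation.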
Heterogeneous weights
Now we suppose each nonzero element of J is drawn from a distribution so that Jij ~ w, where w is a random variable. In this case,
〈Cii〉 = K〈w²〉,  Var(Cii) = K Var(w²).   (30)
For a given i ≠ j, the number of nonzero elements in the sum is again distributed as Hypergeom(N,K,K). Also, the individual nonzero elements have mean 〈w〉² and second moment 〈w²〉². This leads to
〈Cij〉 = (K²/N)〈w〉²,  Var(Cij) = (K²/N)(〈w²〉² − 〈w〉⁴) + (K²(N − K)²/(N²(N − 1)))〈w〉⁴.   (31)
Thus, the dimension is
dim(h) = MK²〈w²〉² / [K²〈w²〉² + K Var(w²) + (M − 1)(K⁴〈w〉⁴/N² + (K²/N)(〈w²〉² − 〈w〉⁴) + (K²(N − K)²/(N²(N − 1)))〈w〉⁴)].   (32)
When M,N ≫ 1, this simplifies to
dim(h) ≈ M / [1 + Var(w²)/(K〈w²〉²) + (M − 1)(K²〈w〉⁴/(N²〈w²〉²) + 1/N)].   (33)
In the limit of a large expansion, when M ≫ N ≫ K,
dim(h) ≈ N / (1 + K²〈w〉⁴/(N〈w²〉²)).   (34)
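The first two moments of C used in this section are easy to confirm by simulation. This sketch draws the nonzero weights from an exponential distribution (an arbitrary illustrative choice, for which 〈w〉 = 1 and 〈w²〉 = 2) and compares the empirical diagonal and off-diagonal means of C = JJT to K〈w²〉 and (K²/N)〈w〉²:

```python
import numpy as np

rng = np.random.default_rng(2)
M, N, K = 2000, 100, 4

# Nonzero weights drawn iid from Exp(1): <w> = 1, <w^2> = 2 (illustrative choice)
J = np.zeros((M, N))
for i in range(M):
    J[i, rng.choice(N, size=K, replace=False)] = rng.exponential(1.0, size=K)

C = J @ J.T
diag = np.diag(C)
off = C[~np.eye(M, dtype=bool)]

mean_Cii = float(diag.mean())  # should approach K <w^2> = 8
mean_Cij = float(off.mean())   # should approach (K^2/N) <w>^2 = 0.16
```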
Heterogeneous weights with homogeneous inhibition
Again, we assume J = J+ + J−, where J+ is determined as above for heterogeneous weights and every entry of J− equals −α. Then,
〈Cij〉 = K²〈w〉²/N − 2αK〈w〉 + Nα².   (35)
To determine the variance of Cij, observe that the terms that contribute to the variance are those for which either J+ik or J+jk is nonzero. We therefore define C̃ij as the sum over only these entries. This sum can be decomposed into those indices for which both J+ik and J+jk are nonzero and those for which only one is nonzero. We denote the number of these as nb and ns, respectively. Given nb and ns,
Var(C̃ij | nb, ns) = nb[〈(w − α)²〉² − (〈w〉 − α)⁴] + ns α² Var(w).   (36)
Now, observing that nb is distributed as Hypergeom(N,K,K) and that ns = 2(K − nb), we use the law of total variance to find
Var(Cij) = (K²/N)[〈(w − α)²〉² − (〈w〉 − α)⁴] + 2K(1 − K/N)α² Var(w) + (K²(N − K)²/(N²(N − 1)))(〈w〉² − α²)².   (37)
When α = K〈w〉/N, inhibition balances excitation and 〈Cij〉 = 0. In this case 〈Cii〉 ≈ K〈w²〉, and in the limit of N ≫ K,
dim(h) ≈ M / [1 + Var(w²)/(K〈w²〉²) + (M − 1)/N].   (38)
Heterogeneous weights with heterogeneous inhibition
Let us suppose that J = J+ + JmJsT, where the entries of the M-dimensional vector Jm are homogeneous and equal to −α, while the entries of the N-dimensional vector Js are drawn from a distribution with mean 〈wI〉 and variance Var(wI). This represents the case of global inhibition, with heterogeneous weights of the connections from the input layer onto the inhibitory neuron but homogeneous weights of the connections from the inhibitory neuron onto the mixed-layer neurons.
The average correlation between the input currents is
〈Cij〉 = K²〈w〉²/N − 2αK〈w〉〈wI〉 + Nα²〈wI²〉.   (39)
The value of α that minimizes this quantity is
α = K〈w〉〈wI〉 / (N〈wI²〉).   (40)
Inserting this expression into Equation 39 leads to
〈Cij〉 = (K²〈w〉²/N) · Var(wI)/〈wI²〉.   (41)
Also, in this case the average variance is
〈Cii〉 = K〈(w − αwI)²〉 + (N − K)α²〈wI²〉.   (42)
Calculation of mixed-layer dimension
For Gaussian patterns and a homogeneous coding level across the mixed-layer neurons, we can evaluate Cij = 〈(mi − f)(mj − f)〉 = 〈mimj〉 − f² by determining the variance and covariance of the input currents. The input to neuron i is Gaussian distributed with 〈hi〉 = 0 and Var(hi) = σi² = Ji · Ji. This determines the threshold of neuron i through f = 〈Θ(hi − θi)〉, which gives θi = σiH⁻¹(f), where H is the Gaussian tail probability defined below. The covariance of hi and hj is Cov(hi, hj) = Ji · Jj. These quantities are sufficient to determine Cij using the approach described below. We use this approach to calculate the dimension in the main text for the case of Gaussian patterns. For non-Gaussian patterns, we compute the dimension by direct simulation.
Suppose hi, hj are zero-mean Gaussian random variables with variances σi², σj² and covariance ρ. We can evaluate 〈mimj〉 = 〈Θ(hi − θi)Θ(hj − θj)〉 by defining three auxiliary unit Gaussian random variables xi, xj, and y and writing hi = σi(xi√(1 − c) + y√c) and hj = σj(xj√(1 − c) + y√c), where c = ρ/(σiσj). The xi and y variables represent the independent and common components of the input-current fluctuations, respectively. Then
〈mimj〉 = ∫ Dy H((θi/σi − y√c)/√(1 − c)) H((θj/σj − y√c)/√(1 − c)),   (43)
where H(x) = ∫x∞ Dz = ½ erfc(x/√2) is the Gaussian tail probability.
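Equation 43 reduces the two-dimensional Gaussian average to a one-dimensional integral over the shared variable y, which can be evaluated by simple quadrature. The sketch below (function names and grid are illustrative) checks the σi = σj = 1, ρ = 0.3, zero-threshold case against the closed-form orthant probability 1/4 + arcsin(ρ)/(2π):

```python
import numpy as np
from math import erfc, sqrt, asin, pi

def H(x):
    # Gaussian tail probability H(x) = P(z > x) = (1/2) erfc(x / sqrt(2))
    return 0.5 * np.vectorize(erfc)(x / sqrt(2.0))

def mean_mimj(si, sj, rho, thi, thj, n=4001):
    # <m_i m_j> = integral Dy H((thi/si - y*sqrt(c))/sqrt(1-c))
    #                         * H((thj/sj - y*sqrt(c))/sqrt(1-c)),
    # with c = rho/(si*sj); assumes 0 <= c < 1
    c = rho / (si * sj)
    y, dy = np.linspace(-8.0, 8.0, n, retstep=True)
    Dy = np.exp(-0.5 * y**2) / np.sqrt(2.0 * np.pi)
    integrand = H((thi / si - np.sqrt(c) * y) / np.sqrt(1.0 - c)) \
              * H((thj / sj - np.sqrt(c) * y) / np.sqrt(1.0 - c))
    return float(np.sum(Dy * integrand) * dy)

p = mean_mimj(1.0, 1.0, 0.3, 0.0, 0.0)
expected = 0.25 + asin(0.3) / (2.0 * pi)  # closed-form orthant probability
```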
Homogeneous weights
When weights are homogeneous, the correlation between the responses of a pair of mixed-layer neurons depends only on how many shared inputs they receive. The number of inputs shared by neurons i and j is distributed as Hypergeom(N,K,K). If inhibition is absent, σi² = K and ρ equals the number of shared inputs, while if it is present, σi² = K(1 − K/N) and ρ equals the number of shared inputs minus K²/N. These quantities and Equation 43 are sufficient to evaluate the distribution of correlations and thus the dimension.
Heterogeneous weights
For the case of heterogeneous weights, we obtain random samples from the distribution of Cij by generating random rows of J, computing σi, σj, and ρ, and using Equation 43 to compute 〈mimj〉. These random samples are used to estimate the mean and variance of the distribution.
Spatial model of mossy fiber-granule cell connectivity
We adapted the model described in Billings et al., 2014 to develop an anatomically realistic connectivity matrix between mossy fibers and granule cells. In the model, connections are formed based on the distances between mossy-fiber rosettes and granule cells. We extended the model beyond the 80μm-diameter sphere in Billings et al., 2014 to cover a region containing all granule cells presynaptic to a single model Purkinje cell.
A single Purkinje cell forms a large dendritic tree approximately 250 μm in diameter (Eccles et al., 1967). We therefore consider a cylinder with diameter 250 μm, representing the unfolded granular-layer region containing granule cells that send parallel fibers to the Purkinje cell. The length of the cylinder is chosen so that the number of granule cells contained within it is equal to the value of 209,000 reported to synapse onto a single Purkinje cell in rats (Harvey and Napper, 1991). Using a granule-cell density of 1.9 × 10⁶ per mm³ (Harvey and Napper, 1991; Billings et al., 2014), the resulting length is 2240 μm, which agrees with the typically reported parallel-fiber extent of 2000–3000 μm (Eccles et al., 1967; Albus, 1971). Here we have assumed that all parallel fibers that pass through the Purkinje-cell dendritic tree form connections with it. Anatomical studies indicate that this is true for more than half of the parallel fibers passing through a typical Purkinje-cell dendritic tree (Harvey and Napper, 1991).
Glomeruli containing mossy-fiber rosettes are placed within the cylinder using the procedure in Billings et al., 2014, which we now describe. The density of mossy fibers is equal to the observed glomerular density of 6.6 × 10⁵ per mm³ divided by the number of rosettes per mossy fiber. In the 80 μm-diameter sphere of Billings et al., 2014, mossy fibers typically formed 2–3 rosettes, but anatomical studies indicate that mossy fibers form on average approximately 20 rosettes in total (Eccles et al., 1967; Fox et al., 1967). We assume that mossy fibers form 10 rosettes in the region we consider; varying this number does not influence our qualitative results. The value of 10 yields 7257 unique mossy fibers in the region, consistent with previous estimates (Marr, 1969 estimated 7,000; Tyrrell and Willshaw, 1992 estimated 13,000). For each mossy fiber, the first rosette is placed uniformly within the cylinder, while subsequent rosettes are displaced relative to the previous rosette by distances that are exponentially distributed in the x, y, and z directions (the axis of the cylinder is oriented along the z direction). The average displacements are 2 μm, 58 μm, and 21 μm in the x, y, and z directions (Sultan, 2001; Billings et al., 2014), reflecting the anisotropic paths of the mossy fibers.
Granule cells and their connections are formed using a procedure identical to that of Billings et al., 2014. Granule cells are placed uniformly within the cylinder according to the granule-cell density. Each forms connections with K = 4 mossy-fiber rosettes. The K connected rosettes are those whose distances to the granule cell are closest to 15 μm, the typical granule-cell dendritic length. Granule cells are disallowed from forming multiple connections to a single mossy fiber via multiple rosettes.
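A toy-scale version of this placement-and-wiring procedure can be sketched as follows (reduced cell counts so it runs quickly; all lengths in μm, with the cylinder shortened relative to the full 2240 μm model described above):

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy-scale parameters; the full model uses a 250 um-diameter, 2240 um-long
# cylinder with ~209,000 granule cells and ~7257 mossy fibers.
R, L = 125.0, 500.0
n_mf, rosettes_per_mf, n_gc, K = 100, 10, 1000, 4

def place_in_cylinder(n, rng):
    # uniform points in a cylinder of radius R and length L (axis along z)
    r = R * np.sqrt(rng.uniform(size=n))
    phi = rng.uniform(0.0, 2.0 * np.pi, size=n)
    z = rng.uniform(0.0, L, size=n)
    return np.column_stack([r * np.cos(phi), r * np.sin(phi), z])

# First rosette of each mossy fiber is uniform; subsequent rosettes are displaced
# by exponential steps with mean (2, 58, 21) um in (x, y, z), with random sign.
rosette_pos, rosette_mf = [], []
first = place_in_cylinder(n_mf, rng)
for m in range(n_mf):
    p = first[m].copy()
    for _ in range(rosettes_per_mf):
        rosette_pos.append(p.copy())
        rosette_mf.append(m)
        p = p + rng.exponential([2.0, 58.0, 21.0]) * rng.choice([-1.0, 1.0], size=3)
rosette_pos = np.asarray(rosette_pos)
rosette_mf = np.asarray(rosette_mf)

# Each granule cell connects to the K rosettes whose distances are closest to the
# 15 um dendritic length, skipping rosettes of an already-connected mossy fiber.
gc_pos = place_in_cylinder(n_gc, rng)
conn = np.zeros((n_gc, n_mf), dtype=bool)
for g in range(n_gc):
    d = np.abs(np.linalg.norm(rosette_pos - gc_pos[g], axis=1) - 15.0)
    for idx in np.argsort(d):
        if not conn[g, rosette_mf[idx]]:
            conn[g, rosette_mf[idx]] = True
            if conn[g].sum() == K:
                break
```

For simplicity this sketch does not clip rosettes that drift outside the cylinder, which the full model's geometry handles implicitly through its larger volume.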
QUANTIFICATION AND STATISTICAL ANALYSIS
Results for dimension, error, and other quantities are first averaged over random input patterns for a fixed network architecture, then over random network architectures. Unless otherwise noted, the standard error of the mean across network architectures for the plotted quantities is smaller than the width of the marks.
DATA AND SOFTWARE AVAILABILITY
Software was written in the Julia (http://julialang.org) and Python (http://python.org) programming languages. Implementations of algorithms used to compute quantities presented in this study are available at: http://www.columbia.edu/~ak3625/code/litwin-kumar_et_al_dimension_2017.zip
Supplementary Material
Acknowledgments
We are grateful to Natasha Cayco-Gajic for discussions and advice concerning modeling of cerebellar cortex. We thank Stefano Fusi and Nate Sawtell for comments on the manuscript, and Mark Goldman and Hannah Payne for discussions that helped shape an early version of this study. We also acknowledge the support of the Marine Biological Laboratories in Woods Hole where the work was partially completed. Research was supported by NIH grant F32DC014387 (A.L.-K.), the UW NIH Big Data for Neuroscience and Genetics training grant and a Boeing Fellowship (K.D.H.), the Howard Hughes Medical Institute (R.A.), the Mathers Foundation (R.A. and L.F.A.), NIH grant U01NS090449 (H.S.), the Gatsby Charitable Foundation and the Simons Foundation under the Simons Collaboration for the Global Brain (H.S. and L.F.A.), CRCNS grant NSF-1430065 (L.F.A.), and the Swartz and Kavli Foundations (L.F.A.).
Footnotes
Author contributions
A.L.-K., K.D.H., R.A., H.S., and L.F.A. conceived the study and wrote the paper. A.L.-K. and K.D.H. performed simulations and analyses.
References
- Abadi M, et al. TensorFlow: Large-scale machine learning on heterogeneous distributed systems. 2016. arXiv preprint arXiv:1603.04467.
- Abbott LF, Nelson SB. Synaptic plasticity: Taming the beast. Nature Neuroscience. 2000;3:1178–1183. doi: 10.1038/81453.
- Abbott LF, Rajan K, Sompolinsky H. Interactions between intrinsic and stimulus-evoked activity in recurrent neural networks. In: Ding M, Glanzman DL, editors. The dynamic brain: An exploration of neuronal variability and its functional significance. Oxford University Press; New York: 2011. pp. 65–82.
- Albus JS. A theory of cerebellar function. Mathematical Biosciences. 1971;10:25–61.
- Babadi B, Sompolinsky H. Sparseness and expansion in sensory representations. Neuron. 2014;83:1213–1226. doi: 10.1016/j.neuron.2014.07.035.
- Barak O, Rigotti M, Fusi S. The sparseness of mixed selectivity neurons controls the generalization–discrimination trade-off. Journal of Neuroscience. 2013;33:3844–3856. doi: 10.1523/JNEUROSCI.2753-12.2013.
- Bell CC, Han V, Sawtell NB. Cerebellum-like structures and their implications for cerebellar function. Annual Review of Neuroscience. 2008;31:1–24. doi: 10.1146/annurev.neuro.30.051606.094225.
- Bengtsson F, Jörntell H. Sensory transmission in cerebellar granule cells relies on similarly coded mossy fiber inputs. Proceedings of the National Academy of Sciences. 2009;106:2389–2394. doi: 10.1073/pnas.0808428106.
- Bhandawat V, Olsen SR, Gouwens NW, Schlief ML, Wilson RI. Sensory processing in the Drosophila antennal lobe increases reliability and separability of ensemble odor representations. Nature Neuroscience. 2007;10:1474–1482. doi: 10.1038/nn1976.
- Billings G, Piasini E, Lőrincz A, Nusser Z, Silver RA. Network structure within the cerebellar input layer enables lossless sparse encoding. Neuron. 2014;83:960–974. doi: 10.1016/j.neuron.2014.07.020.
- Brunel N. Is cortical connectivity optimized for storing information? Nature Neuroscience. 2016;19:749–755. doi: 10.1038/nn.4286.
- Buonomano DV, Merzenich MM. Cortical plasticity: From synapses to maps. Annual Review of Neuroscience. 1998;21:149–186. doi: 10.1146/annurev.neuro.21.1.149.
- Carandini M, Heeger DJ. Normalization as a canonical neural computation. Nature Reviews Neuroscience. 2012;13:51–62. doi: 10.1038/nrn3136.
- Caron SJC, Ruta V, Abbott LF, Axel R. Random convergence of olfactory inputs in the Drosophila mushroom body. Nature. 2013;497:113–117. doi: 10.1038/nature12063.
- Cayco-Gajic A, Clopath C, Silver A. Expansion recoding through sparse sampling in the cerebellar input layer speeds learning. BMC Neuroscience. 2016;17:59.
- Chabrol FP, Arenz A, Wiechert MT, Margrie TW, DiGregorio DA. Synaptic diversity enables temporal coding of coincident multisensory inputs in single neurons. Nature Neuroscience. 2015;18:718–727. doi: 10.1038/nn.3974.
- Chadderton P, Margrie TW, Häusser M. Integration of quanta in cerebellar granule cells during sensory processing. Nature. 2004;428:856–860. doi: 10.1038/nature02442.
- Chapeton J, Fares T, LaSota D, Stepanyants A. Efficient associative memory storage in cortical circuits of inhibitory and excitatory neurons. Proceedings of the National Academy of Sciences. 2012;109:E3614–E3622. doi: 10.1073/pnas.1211467109.
- D’Angelo E. The organization of plasticity in the cerebellar cortex: From synapses to control. Progress in Brain Research. 2014;210:31–58. doi: 10.1016/B978-0-444-63356-9.00002-9.
- Eccles JC, Ito M, Szentagothai J. The cerebellum as a neuronal machine. Springer-Verlag; New York: 1967.
- Eccles JC, Llinás R, Sasaki K. The mossy fibre-granule cell relay of the cerebellum and its inhibitory control by Golgi cells. Experimental Brain Research. 1966;1:82–101. doi: 10.1007/BF00235211.
- Ecker AS, et al. Decorrelated neuronal firing in cortical microcircuits. Science. 2010;327:584–587. doi: 10.1126/science.1179867.
- Ekerot CF, Jörntell H. Synaptic integration in cerebellar granule cells. The Cerebellum. 2008;7:539–541. doi: 10.1007/s12311-008-0064-6.
- Fox CA, Hillman DE, Siegesmund KA, Dutta CR. The primate cerebellar cortex: A Golgi and electron microscopic study. Progress in Brain Research. 1967;25:174–225. doi: 10.1016/S0079-6123(08)60965-6.
- Gao Z, van Beugen BJ, De Zeeuw CI. Distributed synergistic plasticity and cerebellar learning. Nature Reviews Neuroscience. 2012;13:619–635. doi: 10.1038/nrn3312.
- Gao Z, et al. Excitatory cerebellar nucleocortical circuit provides internal amplification during associative conditioning. Neuron. 2016;89:645–657. doi: 10.1016/j.neuron.2016.01.008.
- Gardner E. The space of interactions in neural network models. Journal of Physics A: Mathematical and General. 1988;21:257.
- Gruntman E, Turner GC. Integration of the olfactory code across dendritic claws of single mushroom body neurons. Nature Neuroscience. 2013;16:1821–1829. doi: 10.1038/nn.3547.
- Hansel C, Linden DJ, D’Angelo E. Beyond parallel fiber LTD: The diversity of synaptic and non-synaptic plasticity in the cerebellum. Nature Neuroscience. 2001;4:467–475. doi: 10.1038/87419.
- Hansel D, van Vreeswijk C. The mechanism of orientation selectivity in primary visual cortex without a functional map. Journal of Neuroscience. 2012;32:4049–4064. doi: 10.1523/JNEUROSCI.6284-11.2012.
- Harvey R, Napper R. Quantitative studies on the mammalian cerebellum. Progress in Neurobiology. 1991;36:437–463. doi: 10.1016/0301-0082(91)90012-p.
- Hige T, Aso Y, Modi MN, Rubin GM, Turner GC. Heterosynaptic plasticity underlies aversive olfactory learning in Drosophila. Neuron. 2015;88:985–998. doi: 10.1016/j.neuron.2015.11.003.
- Honegger KS, Campbell RA, Turner GC. Cellular-resolution population imaging reveals robust sparse coding in the Drosophila mushroom body. Journal of Neuroscience. 2011;31:11772–11785. doi: 10.1523/JNEUROSCI.1099-11.2011.
- Huang CC, et al. Convergence of pontine and proprioceptive streams onto multimodal cerebellar granule cells. eLife. 2013;2:e00400. doi: 10.7554/eLife.00400.
- Ishikawa T, Shimuta M, Häusser M. Multimodal sensory integration in single cerebellar granule cells in vivo. eLife. 2015;4:e12916. doi: 10.7554/eLife.12916.
- Ito M, Sakurai M, Tongroach P. Climbing fibre induced depression of both mossy fibre responsiveness and glutamate sensitivity of cerebellar Purkinje cells. Journal of Physiology. 1982;324:113. doi: 10.1113/jphysiol.1982.sp014103.
- Jörntell H, Ekerot CF. Properties of somatosensory synaptic integration in cerebellar granule cells in vivo. Journal of Neuroscience. 2006;26:11786–11797. doi: 10.1523/JNEUROSCI.2939-06.2006.
- Jortner RA, Farivar SS, Laurent G. A simple connectivity scheme for sparse coding in an olfactory system. Journal of Neuroscience. 2007;27:1659–1669. doi: 10.1523/JNEUROSCI.4171-06.2007.
- Keene AC, Waddell S. Drosophila olfactory memory: Single genes to complex neural circuits. Nature Reviews Neuroscience. 2007;8:341–354. doi: 10.1038/nrn2098.
- Kennedy A, et al. A temporal basis for predicting the sensory consequences of motor commands in an electric fish. Nature Neuroscience. 2014;17:416–422. doi: 10.1038/nn.3650.
- Kingma D, Ba J. Adam: A method for stochastic optimization. 2014. arXiv preprint arXiv:1412.6980.
- LeCun Y, Bengio Y, Hinton G. Deep learning. Nature. 2015;521:436–444. doi: 10.1038/nature14539.
- Liu X, Davis RL. The GABAergic anterior paired lateral neuron suppresses and is suppressed by olfactory learning. Nature Neuroscience. 2009;12:53–59. doi: 10.1038/nn.2235.
- Llinás RR, Walton KD, Lang EJ. Cerebellum. In: Shepherd GM, editor. The Synaptic Organization of the Brain. 5th ed. Oxford University Press; New York: 2003. pp. 271–309.
- Markram H, Lübke J, Frotscher M, Roth A, Sakmann B. Physiology and anatomy of synaptic connections between thick tufted pyramidal neurones in the developing rat neocortex. Journal of Physiology. 1997;500:409. doi: 10.1113/jphysiol.1997.sp022031.
- Marr D. A theory of cerebellar cortex. Journal of Physiology. 1969;202:437–470. doi: 10.1113/jphysiol.1969.sp008820.
- Mugnaini E, Osen KK, Dahl AL, Friedrich VL Jr, Korte G. Fine structure of granule cells and related interneurons (termed Golgi cells) in the cochlear nuclear complex of cat, rat and mouse. Journal of Neurocytology. 1980;9:537–570. doi: 10.1007/BF01204841.
- Murthy M, Fiete I, Laurent G. Testing odor response stereotypy in the Drosophila mushroom body. Neuron. 2008;59:1009–1023. doi: 10.1016/j.neuron.2008.07.040.
- Raymond JL, Lisberger SG, Mauk MD. The cerebellum: A neuronal learning machine? Science. 1996;272:1126–1131. doi: 10.1126/science.272.5265.1126.
- Renart A, et al. The asynchronous state in cortical circuits. Science. 2010;327:587–590. doi: 10.1126/science.1179850.
- Rigotti M, et al. The importance of mixed selectivity in complex cognitive tasks. Nature. 2013;497:585–590. doi: 10.1038/nature12160.
- Rylkova D, Crank AR, Linden DJ. Chronic in vivo imaging of ponto-cerebellar mossy fibers reveals morphological stability during whisker sensory manipulation in the adult rat. eNeuro. 2015;2:ENEURO.0075-15.2015. doi: 10.1523/ENEURO.0075-15.2015.
- Saez A, Rigotti M, Ostojic S, Fusi S, Salzman CD. Abstract context representations in primate amygdala and prefrontal cortex. Neuron. 2015;87:869–881. doi: 10.1016/j.neuron.2015.07.024.
- Sargent PB, Saviane C, Nielsen TA, DiGregorio DA, Silver RA. Rapid vesicular release, quantal variability, and spillover contribute to the precision and reliability of transmission at a glomerular synapse. Journal of Neuroscience. 2005;25:8173–8187. doi: 10.1523/JNEUROSCI.2051-05.2005.
- Schweighofer N, Doya K, Lay F. Unsupervised learning of granule cell sparse codes enhances cerebellar adaptive control. Neuroscience. 2001;103:35–50. doi: 10.1016/s0306-4522(00)00548-0.
- Shomrat T, et al. Alternative sites of synaptic plasticity in two homologous “fan-out fan-in” learning and memory networks. Current Biology. 2011;21:1773–1782. doi: 10.1016/j.cub.2011.09.011.
- Song S, Sjöström PJ, Reigl M, Nelson S, Chklovskii DB. Highly nonrandom features of synaptic connectivity in local cortical circuits. PLoS Biology. 2005;3:e68. doi: 10.1371/journal.pbio.0030068.
- Spanne A, Jörntell H. Questioning the role of sparse coding in the brain. Trends in Neurosciences. 2015;38:417–427. doi: 10.1016/j.tins.2015.05.005.
- Sultan F. Distribution of mossy fibre rosettes in the cerebellum of cat and mice: Evidence for a parasagittal organization at the single fibre level. European Journal of Neuroscience. 2001;13:2123–2130. doi: 10.1046/j.0953-816x.2001.01593.x.
- Szyszka P, Galkin A, Menzel R. Associative and non-associative plasticity in Kenyon cells of the honeybee mushroom body. Frontiers in Systems Neuroscience. 2008;2:3. doi: 10.3389/neuro.06.003.2008.
- Tyrrell T, Willshaw D. Cerebellar cortex: Its simulation and the relevance of Marr’s theory. Philosophical Transactions of the Royal Society B: Biological Sciences. 1992;336:239–257. doi: 10.1098/rstb.1992.0059.
- Wiechert MT, Judkewitz B, Riecke H, Friedrich RW. Mechanisms of pattern decorrelation by recurrent neuronal circuits. Nature Neuroscience. 2010;13:1003–1010. doi: 10.1038/nn.2591.
- Xu-Friedman MA, Regehr WG. Ultrastructural contributions to desensitization at cerebellar mossy fiber to granule cell synapses. Journal of Neuroscience. 2003;23:2182–2192. doi: 10.1523/JNEUROSCI.23-06-02182.2003.