Abstract
The ability of the brain to discriminate among visual stimuli is constrained by their retinal representations. Previous studies of visual discriminability have been limited either to low-dimensional artificial stimuli or to purely theoretical considerations without a realistic encoding model. Here we propose a novel framework for understanding the stimulus discriminability achieved by retinal representations of naturalistic stimuli, using the methods of information geometry. To model the joint probability distribution of neural responses conditioned on the stimulus, we created a stochastic encoding model of a population of salamander retinal ganglion cells based on a three-layer convolutional neural network model. This model accurately captured not only the mean response to natural scenes but also a variety of second-order statistics. With the model and the proposed theory, we computed the Fisher information metric over stimuli to study the most discriminable stimulus directions. We found that the most discriminable direction varied substantially across stimuli, allowing an examination of the relationship between the most discriminable direction and the current stimulus. By examining responses generated by the most discriminable stimuli, we further found that the most discriminative response mode is often aligned with the most stochastic mode. This finding carries the important implication that under natural scenes, retinal noise correlations are information-limiting rather than increasing information transmission, as has been previously speculated. We additionally observed that sensitivity saturates less in the population than in single cells, and that Fisher information varies less with firing rate than sensitivity does.
We conclude that under natural scenes, population coding benefits from complementary coding and helps to equalize the information carried by different firing rates, which may facilitate decoding of the stimulus under principles of information maximization.
1. Introduction
Neural populations represent information with their collective activity. In sensory neuroscience, much effort has focused on characterizing the sensitivity of neurons to different stimuli, but the true function of a sensory system is to discriminate between stimuli. A quantitative description of discriminability in a neural population requires not only precise knowledge of sensitivity to stimuli, but also the structure of noise correlations in the population.
Neural representation manifolds constitute a modern approach to studying neural populations in a geometric framework, and the recent machine learning literature has seen extensive research on the geometry of representation manifolds of neural networks trained for artificial tasks [1, 2, 3, 4, 5]. In separate work, Fisher information has been widely utilized in neuroscience for measuring decoding fidelity, particularly in theoretical studies exploring the effect of noise correlations [6, 7, 8]. In this study, we propose a framework for studying the Fisher information geometry of representation manifolds for natural visual scenes in a population of visual neurons, in particular the retina. In addition, we apply such a manifold analysis, for the first time, to the neural representation of natural stimuli in real data.
The discriminability of biological visual systems has long been studied with low-dimensional and discrete artificial stimuli [9, 10, 11]. However, naturalistic visual stimuli in the real world are inherently high-dimensional, and discriminability is generally heterogeneous across different types of changes in a stimulus. Recent advances in convolutional neural networks (CNNs) have enabled the accurate prediction of retinal responses to naturalistic stimuli, as well as of ethological computations [12, 13, 14]. Nevertheless, previous work has predominantly focused on deterministic, trial-averaged neural codes, which are not adequate for computing discriminability. Thus an encoding model of stochastic retinal representations for natural scenes is required, which constitutes one of the primary contributions of this work.
We present here a highly accurate model of both sensitivity and stochasticity for the retinal ganglion cell (RGC) population, thus creating the first accurate model of discriminability in a neural population for its natural inputs. From an analysis of the Fisher information geometry of the retinal representation of stimuli, we find that the most discriminable direction for the RGC population varies greatly across stimuli and is correlated with the mean neural response. We further address a long standing question as to the effect of noise correlations on information. We find that the most discriminative mode in the representation space often aligns with the most stochastic direction, implying that noise correlations in the retina are detrimental to information coding. Finally, we analyze how discriminability depends on firing rate of individual cells and in the population, and discuss the implications in relation to principles of information maximization.
2. Theory
In this section, we describe the basic theory of information geometry [15] and its application to sensory neuroscience. Let $\mathbf{s}$ denote the stimulus vector and $\mathbf{r}$ denote the spike count vector, whose dimension is equal to the number of neurons encoding the stimulus. Each stimulus induces a conditional probability distribution of sensory neural responses $p(\mathbf{r} \mid \mathbf{s})$ and mean firing rates $\mathbf{f}(\mathbf{s}) = \mathbb{E}[\mathbf{r} \mid \mathbf{s}]$. Along any decoding vector $\mathbf{w}$, the sensitivity is defined as the magnitude of the stimulus gradient $\|\nabla_{\mathbf{s}}(\mathbf{w}^{\top}\mathbf{f}(\mathbf{s}))\|$, while the stochasticity is defined as the standard deviation of $\mathbf{w}^{\top}\mathbf{r}$ conditioned on $\mathbf{s}$ (Figure 1). In our case in the retina, $\mathbf{f}(\mathbf{s})$ is modeled by a noiseless CNN and $p(\mathbf{r} \mid \mathbf{s})$ is modeled by its stochastic counterpart.
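As a concrete illustration, the sensitivity and stochasticity along a decoding vector can be estimated numerically. The sketch below substitutes a toy linear-ReLU encoder with additive Gaussian noise for the paper's CNN; all shapes, weights, and noise levels are illustrative assumptions, not the fitted model.

```python
import numpy as np

rng = np.random.default_rng(0)

W = rng.normal(size=(4, 3))          # toy encoder: 4 neurons, 3-dim stimulus

def mean_response(s):
    # Noiseless encoder f(s): linear filter followed by ReLU.
    return np.maximum(W @ s, 0.0)

def sample_response(s, sigma=0.1, n=10000):
    # Stochastic counterpart: additive Gaussian output noise (illustrative).
    f = mean_response(s)
    return f + sigma * rng.normal(size=(n, f.size))

s = np.array([1.0, -0.5, 0.3])
w = np.array([1.0, 0.0, 1.0, 0.0])
w = w / np.linalg.norm(w)            # unit decoding vector in response space

# Sensitivity: norm of the stimulus gradient of w . f(s) (finite differences).
eps = 1e-5
grad = np.array([
    (w @ mean_response(s + eps * e) - w @ mean_response(s - eps * e)) / (2 * eps)
    for e in np.eye(3)
])
sensitivity = np.linalg.norm(grad)

# Stochasticity: standard deviation of w . r conditioned on s, from samples.
stochasticity = (sample_response(s) @ w).std()
```

With isotropic output noise of standard deviation 0.1 and a unit-norm decoding vector, the estimated stochasticity should be close to 0.1.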
Figure 1:
Schematic of basic concepts in a sensory system. The sensory network maps from the stimulus space to the representation manifold. The differentiation operator maps the stimulus tangent space to the representation tangent space. With noise in the network, each stimulus corresponds to a cloud of points in the representation space. Generally, the sensitivity (the blue and the red arrows) and the stochasticity (the green oval) are both heterogeneous.
The representation manifold is defined as the manifold of mean neural responses induced by stimuli: $\mathcal{M} = \{\mathbf{f}(\mathbf{s})\}$. We also define a statistical manifold of conditional response distributions $\{p(\mathbf{r} \mid \mathbf{s})\}$ parameterized by stimuli, which yield a natural coordinate system. To equip the statistical manifold with a Riemannian metric, we consider the Kullback–Leibler divergence between two nearby distributions:

$$D_{\mathrm{KL}}\big(p(\mathbf{r} \mid \mathbf{s}) \,\big\|\, p(\mathbf{r} \mid \mathbf{s} + d\mathbf{s})\big) = \tfrac{1}{2}\, d\mathbf{s}^{\top} G(\mathbf{s})\, d\mathbf{s} + O\big(\|d\mathbf{s}\|^{3}\big) \tag{1}$$

where $G(\mathbf{s})$ is the Fisher information matrix with respect to the stimulus, and the KL divergence is expanded up to second order. The Fisher information metric naturally characterizes the local discriminability of stimuli near $\mathbf{s}$.
When we assume that neural responses follow multivariate Gaussian distributions and that the covariance matrix is locally independent of the stimulus, $\Sigma(\mathbf{s}) \approx \Sigma$, the Fisher information metric can be expressed as

$$G(\mathbf{s}) = J(\mathbf{s})^{\top}\, \Sigma^{-1}\, J(\mathbf{s}) \tag{2}$$

where $J(\mathbf{s}) = \partial \mathbf{f} / \partial \mathbf{s}$ is the Jacobian matrix. Eq. 2 is commonly termed the linear Fisher information, which bounds the performance of unbiased linear decoders; such decoders may be more biologically plausible than nonlinear ones [16, 17, 18]. The assumption of a locally constant covariance is a good approximation in our case because a neural network consisting only of ReLU nonlinearities is locally equivalent to an effective linear network [19], and the covariance matrix of a linear network is constant. This metric expression is interpretable in that discriminability is proportional to sensitivity and inversely proportional to noise. The length of a geodesic connecting two stimuli under this information metric measures the discriminability between them when represented by the neural population.
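The linear Fisher information of Eq. 2 is straightforward to compute numerically. The following sketch uses a random Jacobian and covariance as stand-ins for model-derived quantities; the dimensions and values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

n_neurons, n_stim = 5, 3
J = rng.normal(size=(n_neurons, n_stim))         # Jacobian df/ds (illustrative)
A = rng.normal(size=(n_neurons, n_neurons))
Sigma = A @ A.T + n_neurons * np.eye(n_neurons)  # positive-definite noise covariance

# Linear Fisher information metric, Eq. 2: G = J^T Sigma^{-1} J.
G = J.T @ np.linalg.solve(Sigma, J)

# The discriminability of a stimulus direction ds is sqrt(ds^T G ds).
ds = np.array([1.0, 0.0, 0.0])
d = np.sqrt(ds @ G @ ds)
```

Since $\Sigma$ is positive definite, $G$ is symmetric positive semi-definite, so every direction has a real, nonnegative discriminability.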
Equivalently, the mean response $\mathbf{f}$ on the representation manifold can also serve as a coordinate system, in which case the metric is given by $G_{\mathbf{f}} = \Sigma^{-1}$. The metric matrix transforms between the stimulus space and the representation space as a tensor, $G = J^{\top} G_{\mathbf{f}}\, J$ [1].
We define here several important quantities for analysis. The eigenvector of $G(\mathbf{s})$ associated with the largest eigenvalue is the most discriminable direction in the stimulus tangent space, which we term the most discriminable input (MDI) at stimulus $\mathbf{s}$. The square root of the eigenvalue is the corresponding discriminability [20]. The MDI in the stimulus space can be pushed forward to the most discriminative response mode (MDR) in the representation tangent space via the Jacobian matrix: $\mathrm{MDR} = J \cdot \mathrm{MDI}$. Similarly, the most sensitive direction in the stimulus (input) tangent space (MSI) can be obtained as the principal eigenvector of $J^{\top} J$, and the associated most sensitive response mode (MSR) obeys $\mathrm{MSR} = J \cdot \mathrm{MSI}$. Additionally, the most stochastic (noisy) response mode (MNR) is defined as the principal eigenvector of the noise covariance matrix $\Sigma$, which is simply principal component analysis. For a ReLU neural network with small stochasticity, the stimulus dependence of the aforementioned tangent space features is solely derived from the stimulus dependence of the effective linear network [19].
One practical challenge of the above analysis with natural visual stimuli is the extremely high dimensionality of the stimulus space. Fortunately, we only need to compute the Fisher information matrix in a subspace defined by the recorded cell population. First we define the instantaneous receptive field of each cell $i$ as the direction of greatest sensitivity to the stimulus at one stimulus point, which is the gradient of that cell’s response, $\mathbf{h}_i(\mathbf{s}) = \nabla_{\mathbf{s}} f_i(\mathbf{s})$ [13]. Then we only need to compute the Fisher information in the subspace spanned by the instantaneous receptive fields, as justified by the following theorem.
Theorem: The top $N$ most discriminable directions are linear combinations of the instantaneous receptive fields, where $N$ is the number of output neurons.
Proof: Let $\mathcal{H}$ denote the subspace spanned by the instantaneous receptive fields. Suppose that the most discriminable direction $\mathbf{v}$ (with $\|\mathbf{v}\| = 1$) is not in $\mathcal{H}$. Then we can decompose $\mathbf{v} = \mathbf{v}_{\perp} + \mathbf{v}_{\parallel}$, where $\mathbf{v}_{\perp} \neq \mathbf{0}$ is the orthogonal component in $\mathcal{H}^{\perp}$ and $\mathbf{v}_{\parallel}$ is the parallel component in $\mathcal{H}$, with $0 < \|\mathbf{v}_{\parallel}\| < 1$. Since the rows of $J$ lie in $\mathcal{H}$, we have $J \mathbf{v}_{\perp} = \mathbf{0}$, and following Eq. 2,

$$\mathbf{v}^{\top} G\, \mathbf{v} = \mathbf{v}_{\parallel}^{\top} G\, \mathbf{v}_{\parallel} < \frac{\mathbf{v}_{\parallel}^{\top}}{\|\mathbf{v}_{\parallel}\|}\, G\, \frac{\mathbf{v}_{\parallel}}{\|\mathbf{v}_{\parallel}\|} \tag{3}$$

which contradicts the fact that $\mathbf{v}$ is the most discriminable direction. Therefore, $\mathbf{v}$ is in $\mathcal{H}$. Replacing $G$ with the deflated matrix $G - \lambda_{1} \mathbf{v}\mathbf{v}^{\top}$, the same analysis proves that the second most discriminable direction is also in $\mathcal{H}$. Subsequently, the whole theorem follows by induction.
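The theorem can also be checked numerically: for $G = J^{\top}\Sigma^{-1}J$, any eigenvector with nonzero eigenvalue necessarily lies in the row space of $J$, i.e., the span of the instantaneous receptive fields. A toy verification, with illustrative dimensions:

```python
import numpy as np

rng = np.random.default_rng(2)

n_neurons, n_stim = 4, 10            # stimulus dimension >> number of neurons
J = rng.normal(size=(n_neurons, n_stim))          # rows = instantaneous RFs
Sigma = np.diag(rng.uniform(0.5, 2.0, size=n_neurons))

G = J.T @ np.linalg.solve(Sigma, J)  # Fisher metric, Eq. 2
evals, evecs = np.linalg.eigh(G)
mdi = evecs[:, -1]                   # most discriminable input (top eigenvector)

# Residual of the MDI outside the span of the receptive fields.
Q, _ = np.linalg.qr(J.T)             # orthonormal basis of span{RFs}
residual = np.linalg.norm(mdi - Q @ (Q.T @ mdi))
```

The residual is zero up to floating-point error, consistent with the theorem: the MDI is a linear combination of the receptive fields.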
3. Methods
The spiking activity of a population of tiger salamander retinal ganglion cells was recorded in response to a sequence of jittering natural images. We created a stochastic encoding model with a CNN architecture that included independent noise in the stimulus and in each unit of the CNN. The CNN parameters were first trained using backpropagation to fit the mean firing rates, creating a deterministic model, and were then frozen. Noise parameters included one parameter per layer setting the standard deviation of the independent noise, and one parameter for binomial noise in each ganglion cell. These were then optimized by maximizing the log-likelihood of the empirical data and exploring the parameter space to match several second-order statistics of the neural responses. Once parameter optimization was completed, we computed the Fisher information metric numerically for each stimulus and analyzed the most discriminable directions.
3.1. Stochastic model
The model architecture is shown in Figure 2. The model takes a spatiotemporal visual stimulus as input, and its output is a set of spike counts, one for each neuron in each time bin. The deterministic part of the model, including its hyperparameters and training method, is adopted from Refs. [12, 13, 14]; it is a three-layer CNN chosen after an extensive architecture search. This CNN is the state-of-the-art retinal encoding model for natural scenes in that (1) its prediction accuracy for RGC mean responses outperforms other models; (2) the activity of its internal units is correlated with retinal interneuron responses recorded separately and not used in training; and (3) the model trained only on natural scenes can reproduce a wide range of phenomena induced by artificial stimuli. Thus, the deterministic part of our model has a clear correspondence with the real retina in that its circuitry is mechanistically interpretable and it captures many retinal computations. Every convolutional filter in our model was implemented by linearly stacking a sequence of 3 × 3 small filters, which outperforms the traditional single-filter implementation [21]. A parametric tanh nonlinearity was attached to the last convolutional layer to enforce the refractory period constraint:
$$f_i^{\mathrm{out}} = r_i^{\max}\, \tanh\!\big(a_i\, f_i^{\mathrm{in}} + b_i\big) \tag{4}$$

where $a_i$, $b_i$ are cell-dependent model parameters, and $r_i^{\max}$ represents the cell-dependent maximal firing rate.
Figure 2:
Model architecture. The first layer consists of eight 15 × 15 spatiotemporal convolutional filters. The second layer consists of eight 11 × 11 filters. The final layer consists of C 9 × 9 filters, where C represents the number of cell types. Batch normalization is applied after each convolutional filter. The nonlinearities in the first two layers are ReLUs, while in the last layer, the nonlinearity consists of the SoftPlus function followed by a parametric tanh function. Gaussian white noise is added to the stimulus and the first two convolutional layers, and independent binomial noise is applied to the final nonlinearity. The one-hot layer selects the location of the unit in the final convolutional layer to match that of the recorded neuron. Further details are explained in the main text.
The essential module that enables us to study retinal population coding is the one-hot layer. Model parameters are fit with the one-hot layer, which by the end of training bijectively maps a few output units of the final convolutional layer to recorded neurons. The one-hot layer is then removed after parameter optimization so that we can analyze the representation of the full population, typically consisting of thousands of units. The rationale behind this method is the mosaic organization of each retinal cell type, in which all ganglion cells of a given type tile the retina with their dendrites [11]. Formally, the one-hot layer is defined as
$$r_n = \sum_{i,j} w_{n,ij}\, z_{c(n),ij} \tag{5}$$

where the input $z$ is the output of the last convolutional layer, $n$ is the index of recorded neurons, $i$, $j$ (and $k$, $l$ below) are location indices, $\mathbf{w}_n$ is the linear combination weight vector, and $c(n)$ is the neuron-to-channel map. Each $\mathbf{w}_n$ will converge to a one-hot array after model training with our one-hot loss function, which is adapted from the semantic loss in Ref. [22]

$$L_{\text{one-hot}} = -\sum_{n} \log \sum_{i,j} w_{n,ij} \prod_{(k,l) \neq (i,j)} \big(1 - w_{n,kl}\big) \tag{6}$$
and which reflects the locations of recorded neurons. Each output channel represents a ganglion cell type and was determined using hierarchical clustering of output channels (see the Supplementary Material for details).
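For illustration, a semantic-style "exactly one" loss of this kind [22] can be written directly: it is minimized when exactly one weight is 1 and all others are 0. The exact parameterization used in the paper's fits may differ, so this is a sketch rather than the fitted implementation.

```python
import numpy as np

def one_hot_loss(w):
    """Negative log-probability that exactly one entry of w is 'on'.

    w: array of weights in [0, 1] over (flattened) spatial locations.
    Computes -log sum_i w_i * prod_{j != i} (1 - w_j), as in a semantic
    'exactly one' constraint.
    """
    p = 0.0
    for i in range(len(w)):
        term = w[i]
        for j in range(len(w)):
            if j != i:
                term *= (1.0 - w[j])
        p += term
    return -np.log(p + 1e-12)   # small constant guards against log(0)

w_onehot = np.array([0.0, 1.0, 0.0, 0.0])   # a converged one-hot array
w_spread = np.array([0.25, 0.25, 0.25, 0.25])  # mass spread over locations

loss_onehot = one_hot_loss(w_onehot)
loss_spread = one_hot_loss(w_spread)
```

The one-hot weight vector achieves (near-)zero loss, while spreading the same total mass over several locations is penalized, which drives each $\mathbf{w}_n$ toward a single location during training.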
Gaussian white noise was added to the stimulus to simulate stochasticity in photoreceptors, and was also added to pre-threshold signals in the first two layers to simulate stochasticity in bipolar cells and amacrine cells. We found that Gaussian noise outperforms Poisson noise in these layers, which can presumably be attributed to the noise generation mechanism in the retina and the central limit theorem. The stochasticity of ganglion cells was modeled as a parametric binomial probability mass function:
$$P(k \mid \mathbf{s}) \;\propto\; \left[\binom{M_i}{k}\, q^{k} (1 - q)^{M_i - k}\right]^{\alpha_i}, \qquad k = 0, 1, \dots, M_i \tag{7}$$

where $k$ denotes the spike count in a time bin, $q$ is the rate parameter having a one-to-one correspondence with the mean firing rate $f_i$, $\alpha_i$ is the variability parameter to be optimized, and $M_i$ is the cell-dependent upper bound of spike counts in one time bin, reflecting the refractory period of that cell, which can be determined directly from data. In practice, for given $\alpha_i$ and $M_i$ we can interpolate the mapping from $q$ to the mean spike count and use its inverse to determine the rate parameter producing the desired mean firing rate, i.e., $q = q(f_i; \alpha_i, M_i)$. Although we have explored a variety of probability mass functions to fit the sub-Poisson statistics of retinal ganglion cells [23, 24], the binomial noise of Eq. 7 performs the best among them (see the Supplementary Material) and is also the most natural choice given the refractory period constraint. We stress that the binomial noise on ganglion cells is independent noise, and thus will not qualitatively affect the geometry or the functional implications of noise correlations.
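As a sketch of this interpolation procedure (under the assumption that the parametric form is a tempered binomial, $P(k) \propto [\binom{M}{k} q^k (1-q)^{M-k}]^{\alpha}$; the exact form used in the fits may differ), the rate parameter can be determined numerically from a target mean:

```python
import numpy as np
from math import comb

def pmf(q, alpha, M):
    # Tempered binomial pmf (assumed form): alpha > 1 sharpens the
    # distribution, giving sub-binomial variance.
    w = np.array([(comb(M, k) * q**k * (1 - q)**(M - k)) ** alpha
                  for k in range(M + 1)])
    return w / w.sum()

def rate_for_mean(mu, alpha, M, grid=2001):
    # Invert the monotone map q -> mean spike count by interpolation,
    # as described in the text.
    qs = np.linspace(1e-6, 1 - 1e-6, grid)
    means = np.array([pmf(q, alpha, M) @ np.arange(M + 1) for q in qs])
    return np.interp(mu, means, qs)

M, alpha, mu = 5, 2.0, 1.3           # illustrative values
q = rate_for_mean(mu, alpha, M)
dist = pmf(q, alpha, M)
ks = np.arange(M + 1)
mean = dist @ ks
var = dist @ (ks - mean) ** 2        # sub-Poisson: var < mean
```

The recovered distribution matches the target mean, and its variance falls below the mean, consistent with the sub-Poisson statistics of ganglion cells.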
3.2. Optimization
Since the second-order statistics are functions of the entire dataset and cannot be decomposed into terms depending on individual stimuli, we optimized deterministic parameters and stochastic parameters separately. CNN parameters and one-hot parameters were optimized together using the standard Poisson loss function to fit mean firing rates smoothed with a 10 ms Gaussian filter. The optimization was performed using ADAM [25] via PyTorch [26] on NVIDIA TITAN Xp, GeForce GTX TITAN X, GeForce RTX 3090, TITAN RTX, TITAN V, and GeForce RTX 2080 Ti GPUs. The network was regularized with an L2 weight penalty at each layer and an L1 penalty on the model output. The tanh parameters of the final nonlinearity were then fitted with least-squares regression. Finally, we optimized stochastic parameters while freezing all deterministic parameters. Variability parameters were optimized by maximizing the log-likelihood of recorded spike counts based on empirical firing rates. The standard deviations of the Gaussian noise in each layer were optimized through grid searches to best fit empirical noise correlations and stimulus correlations by minimizing a smoothed and weighted average of mean squared errors.
3.3. Computation of Fisher information metric
For each stimulus, the noise covariance matrix $\Sigma$ was estimated by repeatedly simulating the stochastic model thousands of times. We then computed the eigendecomposition of $\Sigma$, obtaining local information about the representation manifold: $\Sigma = \sum_{k} \lambda_k \mathbf{u}_k \mathbf{u}_k^{\top}$. We applied Gram–Schmidt orthogonalization to the instantaneous receptive fields to obtain a basis of $\mathcal{H}$, which is a subspace of the stimulus tangent space. Then, following Eq. 2, the Fisher information matrix with respect to the stimulus can be computed as

$$G_{ab} = \sum_{k} \frac{\big(\mathbf{u}_k^{\top}\, \partial_a \mathbf{f}\big)\big(\mathbf{u}_k^{\top}\, \partial_b \mathbf{f}\big)}{\lambda_k} \tag{8}$$

where $a$, $b$ index the stimulus coordinates under the basis of $\mathcal{H}$. In practice, we summed only over the hundreds of most stochastic modes, which explain 85% to 90% of the variance, as the sensitivity and stochasticity of the remaining modes are both close to zero, indicating that those modes are orthogonal to the tangent space of the representation manifold. The gradient with respect to the stimulus was computed using the noiseless model.
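A minimal numerical sketch of this truncation: when the sensitivity is confined to the span of the most stochastic noise modes, summing Eq. 8 over only those modes reproduces the full metric. Dimensions and the number of retained modes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)

n_neurons, n_sub, k = 6, 3, 4        # k = number of retained noise modes
A = rng.normal(size=(n_neurons, n_neurons))
Sigma = A @ A.T + 0.1 * np.eye(n_neurons)
lam, U = np.linalg.eigh(Sigma)       # Sigma = sum_k lam_k u_k u_k^T (ascending)

# Confine the sensitivity to the span of the k most stochastic modes,
# mirroring the observation that low-variance modes carry no sensitivity.
B = rng.normal(size=(k, n_sub))
J_sub = U[:, -k:] @ B                # df/ds expressed in the subspace basis

proj = U.T @ J_sub                   # mode-by-mode projections u_k . d_a f
G_full = (proj.T / lam) @ proj       # full sum: equals J^T Sigma^{-1} J
G_trunc = (proj[-k:].T / lam[-k:]) @ proj[-k:]   # top-k modes only
```

Here `G_trunc` matches `G_full` exactly because the discarded modes are orthogonal to the sensitivity; in the paper the same logic justifies summing over only the top stochastic modes.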
4. Results
4.1. Model evaluation
Early stochastic encoding models of the retina were primarily developed for artificial stimuli [27, 28, 29], and their results cannot be directly generalized to natural scenes due to the stimulus-dependent nature of noise in the retina. Here, our fitted model was evaluated by comparing various commonly used second-order statistics, including correlation measures and single-cell variability measures, with neural data (Figure 3). Surprisingly, the model prediction matches the neural data remarkably well despite the model having only a few noise parameters. This result suggests that because noise in the retina propagates through the same network pathways as the signal, both are captured by the CNN structure. This indicates a strikingly simple origin for noise correlations between all ganglion cell pairs: independent noise that starts in individual upstream cells and then propagates through the same network that creates the mean stimulus sensitivity. Statistics for preparations with longer test sequences tend to be more reliable, and consequently our model performs better on these. For the first-order statistics (mean firing rates), the Pearson correlation coefficient between model predictions and data ranges from 70% to 80%, similar to Refs. [12, 13]. Such an accurate model of both sensitivity and stochasticity ensures that the computation of the Fisher information matrix is reliable (Eq. 2).
Figure 3:
The model captures a variety of second-order statistics of neural responses (see the Supplementary Material for definitions). (a) Comparison of total pairwise correlation, stimulus correlation, and noise correlation between neural data and model prediction. Black lines represent perfect agreement. Each circle corresponds to a pair of neurons in an experimental preparation coded by its color and the circle radius is proportional to the square root of the total length of the test set. (b) Comparison of single-cell variability measures including the Fano factor and trial-to-trial correlation between neural data and model prediction. Neuron indices share the same color code with (a).
4.2. The most discriminable directions
Some examples of the most discriminable stimulus direction (MDI) and the most discriminative response mode (MDR) are shown in Figure 4a (see the Supplementary Material for more examples). The top eigenvalues were often similar, so to avoid over-interpreting the importance of individual eigenmodes, we found the sparsest direction (with the least L1 norm) in the space spanned by the top 5 MDIs for visualization. Only spatial components are presented here, as we found that the temporal components for different stimuli are very similar both to each other and to those of the instantaneous receptive fields [13]. We found that the MDI is strongly stimulus dependent and is typically localized in a region where the sensitivity is high, an effect that arises from the locality of convolutional connections. The MDI tends to appear around the central region (red box in Figure 4a) because pixels within this area affect more output units than pixels near the edges or corners.
Figure 4:
Most discriminable directions. (a) Examples of the most discriminable directions for different stimuli in the test set. The first column shows a representative frame of the spatiotemporal stimulus. The second column shows the spatial component of the spatiotemporal MDI obtained from the singular value decomposition. Red boxes correspond to the unit array in the final layer. The third column shows the mean response averaged over cell-type channels. The fourth column shows the absolute values of MDR averaged over channels. The last column shows the superposition of the MDI and stimulus. (b) Histogram of the cosine similarity between the mean response vector and its projection onto the space spanned by the top 5 MDRs. Similarities between shuffled pairs are also shown for baseline comparison. (c) One example of the stimulus (left panel, the central 18 × 18 array is shown) and the corresponding MDR for different cell type channels.
One general observation is that the MDI varies greatly across stimuli. From a mathematical point of view, this is to say that certain features of the tangent space depend on the base point. With respect to neural function, this variability potentially indicates the presence of adaptation to the stimulus. We found that the most discriminable direction is related to the base point through the representation space, such that the MDR and the mean response are correlated (Figure 4b). This correlation can be attributed to the fact that neurons with intermediate firing rates exhibit the highest single-cell sensitivity and discriminability (Figure 6a). As a result, the MDI sometimes captures the most salient feature of the stimulus (the first row of Figure 4a), although this is not always the case (the second row of Figure 4a). Similar stimuli and mean responses can nonetheless produce distinct MDIs, especially when the mean response pattern has multiple spatial components that can each yield a localized MDR (the third and fourth rows of Figure 4a). Moreover, the correlation between the MDR and the mean response provides a mechanism by which higher brain regions could estimate the most informative future response changes of retinal ganglion cells from the current response and adjust the corresponding sensitivity accordingly, which could benefit predictive coding [30]. Additionally, we found that different parts of the stimulus generate the MDR for different cell types (Figure 4c), indicating that different cell types signal different stimulus regions in the image as conveying the most information.
Figure 6:
Firing rate dependency. (a) Log-scale 2D histograms showing the firing rate dependencies of stochasticity, sensitivity, and discriminability for individual neurons whose definitions are given in the theory section. Mean values were plotted with thick black lines. (b) Scatter plots showing the population-averaged firing rate dependencies of stochasticity, sensitivity, and discriminability for the first MDR. The linear regression of the populational discriminability is shown.
4.3. Noise correlations in the retina limit information coding
The role of noise correlations in population coding has long been debated. Multiple scenarios have been advanced where noise correlations are detrimental to information coding [31, 10]. However, others suggest that noise correlations under certain circumstances may not limit information coding and can even be beneficial [32, 6, 9]. More specifically, it has been proposed that in response space, noise correlations decrease information when the differences in the signal are aligned with the noise direction, and increase information when they are orthogonal [33] (Figure 5a).
Figure 5:
Noise correlations in the retina are detrimental to information coding. (a) Compared to independent noise (left), noise correlations can either increase information (middle) or decrease information (right) [33]. (b) Cosine similarities between the top 5 MDRs and the top 30 MNRs averaged over stimuli. (c) Discriminability of MDI for different numbers of output units. The plot for trial-shuffled responses with independent noise is shown for comparison. Output units are randomly selected from the central 500µm × 500µm region [34]. (d) Sensitivity and stochasticity of the top 200 MDRs averaged over stimuli.
Here we tested these two hypotheses for the retina by computing cosine similarities between MDRs and MNRs. It turns out that the most discriminative mode is closely aligned with the most stochastic mode and orthogonal to modes with small variability (Figure 5b), implying that noise correlations in the retina are information-limiting correlations. We also performed a more direct test of this conclusion by analyzing trial-shuffled responses, which have independent noise. We found that removing noise correlations in this way created responses that were significantly more discriminative than the original model outputs with correlated noise for every population size (Figure 5c). To understand the mechanism of this reduction in information, we analyzed the eigenspectrum of the Fisher information matrix (Figure 5d). Both the sensitivity and the stochasticity increase with the discriminability of the MDR, and their ratio reaches a maximum at the noisiest mode. Therefore, we propose that the fundamental reason for such information-limiting correlations is that noise propagates through the same network as the signal, causing the stochasticity ellipsoid to align closely with, and be rounder than, the sensitivity ellipsoid. A trivial and extreme example of this arrangement occurs when all noise originates at the level of the stimulus, e.g., in photoreceptors; the resulting noise correlations are then necessarily detrimental.
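The logic of the trial-shuffling comparison can be reproduced in a two-neuron toy example: keeping only the diagonal of $\Sigma$ (independent noise) increases linear Fisher information when the signal direction is aligned with the noise axis, and decreases it when the signal is orthogonal, matching the two scenarios of Figure 5a. All numbers are illustrative.

```python
import numpy as np

Sigma = np.array([[1.0, 0.8],
                  [0.8, 1.0]])               # noise elongated along (1, 1)

def linear_fi(J, Sigma):
    # Linear Fisher information J^T Sigma^{-1} J for a 1D stimulus parameter.
    return (J.T @ np.linalg.solve(Sigma, J)).item()

J_aligned = np.array([[1.0], [1.0]])         # signal along the noise axis
fi_aligned = linear_fi(J_aligned, Sigma)
fi_aligned_shuf = linear_fi(J_aligned, np.diag(np.diag(Sigma)))  # shuffled

J_orth = np.array([[1.0], [-1.0]])           # signal orthogonal to noise axis
fi_orth = linear_fi(J_orth, Sigma)
fi_orth_shuf = linear_fi(J_orth, np.diag(np.diag(Sigma)))
```

In the aligned case, shuffling raises the Fisher information (the correlations were information-limiting, as we find for the retina); in the orthogonal case, shuffling lowers it.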
4.4. Firing rate dependency
The firing rate dependencies of stochasticity, sensitivity, and discriminability for individual neurons and for the whole population are shown in Figure 6. For individual neurons, both the sensitivity and the stochasticity peak around intermediate firing rates and decay at high firing rates (a consequence of the refractory period). Thus, the single-cell discriminability also saturates around intermediate firing rates. In contrast, population coding does not exhibit such a decay at high rates, implying that neurons can encode information complementarily, which is a benefit of population coding. In other words, even when a fraction of neurons become saturated by the stimulus and uninformative at high firing rates, other neurons can still provide discriminative responses to stimulus changes. Although the population sensitivity and stochasticity increase significantly with the mean firing rate, the firing rate dependency of the population discriminability remains relatively flat, which is potentially beneficial for information transmission assuming that stimuli are uniformly distributed on the natural scene manifold [35, 36].
5. Discussion
In summary, we established a novel information-geometric framework to understand the high dimensional representation manifold of the retinal population for natural scenes, one that can also be generalized to other neural systems. However, there still exist remaining questions and further work that can be pursued within this framework.
First, our analysis primarily focused on examining the local metric tensor, which characterizes the effects of infinitesimal stimulus changes. It would be interesting to extend the analysis to finite stimulus changes to explore more global geometry of the representation manifold. For example, one exciting direction is to find geodesics between two stimuli under the information metric and observe how one stimulus transitions into another with the least amount of discriminable changes for the retinal population [37].
Second, we found that noise correlations in the retina are information-limiting. This conclusion is surprisingly robust to many hyperparameter selections, even when the second-order statistics are not fitted well. This raises the question of whether it is a general result for all feedforward networks with independent noise applied to their inputs and intermediate layers. If this is indeed the case, noise correlations in other brain areas that do not limit information coding must arise from different mechanisms.
Finally, the Fisher information metric in this work was computed in the full stimulus space. However, most points in the stimulus space represent meaningless noise images that no animal has ever encountered. Therefore, an important avenue for future research would involve incorporating the structure of natural stimuli and computing the metric with stimuli constrained to a natural-scene manifold.
Supplementary Material
Acknowledgements
This work was supported by grants from the NEI, R01EY022933, R01EY025087 and P30EY026877 (SAB).
Footnotes
37th Conference on Neural Information Processing Systems (NeurIPS 2023).