Abstract
The role of correlated firing in representing information has been a subject of much discussion. Several studies in retina, visual cortex, somatosensory cortex, and motor cortex have suggested that it plays only a minor role, carrying < 10% of the total information carried by the neurons (Gawne and Richmond, 1993; Nirenberg et al., 2001; Oram et al., 2001; Petersen et al., 2001; Rolls et al., 2003). A limiting factor of these studies, however, is that they were carried out using pairs of neurons; how the results extend to large populations was not clear. Recently, new methods for modeling network firing patterns have been developed (Pillow et al., 2008; Nirenberg and Pandarinath, 2010), opening the door to answering this question for more complete populations. One study, Pillow et al. (2008), showed that including correlations increased information by a modest amount, ~20%; however, this work used only a single retina (primate) and a white noise stimulus. Here we performed the analysis using several retinas (mouse) and both white noise and natural scene stimuli. The results showed that correlations added little information when white noise stimuli were used (~13%), similar to Pillow et al.’s findings, and essentially no information when natural scene stimuli were used. Further, the results showed that ignoring correlations did not change the quality of the information carried by the population (as measured by comparing the full pattern of decoding errors). These results suggest generalization: pairwise analyses in several species show that correlations account for very little of the total information. Now, analyses with large populations in two species show a similar result: correlations still account for only a small fraction of the total information and, most significantly, the amount is not statistically significant when natural stimuli are used, making rapid advances in the study of population coding possible.
Keywords: correlations, noise correlation, retinal ganglion cell, parallel processing, information, vision, population coding, decoding, spike trains
Introduction
In the last several years, there has been a great deal of interest in whether correlations in spiking patterns carry important information (e.g., Meister et al., 1995; Nirenberg et al., 2001; Wu et al., 2001; Schneidman et al., 2006; also see reviews by Latham and Nirenberg, 2005; Averbeck et al., 2006). The question arises frequently because the answer has critical bearing on the research approaches that can be used to understand population coding. If these correlations do carry information, then direct, i.e., brute force, approaches for characterizing population activity can’t be used: one simply can’t find the mapping from stimulus to response, as such a mapping would require estimating response distributions in high dimensions – at least N dimensions for N neurons. For populations of more than 3 or 4 cells, the amount of data needed would be impossibly large, and one would have to turn to indirect approaches, such as estimating the response distributions with the correlational structure modeled parametrically.
In contrast, if correlations do not carry unique information, direct approaches become viable even for large populations. Under these conditions, one can characterize the population response distributions from the single neuron distributions. This latter scenario would allow much more rapid advances in the field of population coding.
Much of the work addressing this issue has focused on pairwise analyses. These studies, which include a broad range of neural areas, showed that correlations carry little information – less than 10% of the total information carried by each pair (Gawne and Richmond, 1993; Nirenberg et al., 2001; Oram et al., 2001; Petersen et al., 2001; Rolls et al., 2003). How this result scales with population size, however, is still a subject of debate. One possibility is that as population size increases, each pair will continue to contribute about the same amount of information. Since the number of pairs is proportional to the square of the number of neurons, this scaling behavior predicts that for large populations, correlations could become a substantial or even dominant carrier of information. Another alternative is that as the size of the population grows, only some pairs contribute, or the contributions of the individual pairs become redundant. In these scenarios, the correlations would remain a small contributor to the total amount of information.
Recently, methods for modeling the firing patterns of retinal ganglion cells to white noise stimuli have been developed and used to address this (Pillow et al., 2008). Analysis of the importance of correlations using these models showed that correlations increased information by a relatively small amount, ~20%; however, the work used only a single retina and a white noise stimulus; no natural stimuli were used.
Here we performed the analysis using several retinas, large populations of cells, and both white noise and natural stimuli. The results showed that correlations added little information when white noise stimuli were used (~13%), similar to the results of Pillow et al. (2008), and essentially no information when natural stimuli were used. Thus, the second alternative is more likely the correct one: correlations are a relatively minor contributor to the information carried by populations of neurons, not only for neuronal pairs but also for whole populations. This is also consistent with a smaller population study with natural scenes in salamander (Oizumi et al., 2010). Correlations therefore likely play a different role in network functioning (e.g., reinforcing network learning (reviewed in Feldman, 2009) or shaping network development (reviewed in Blankenship and Feller, 2010)).
Methods
Defining correlations
Two types of correlations are commonly referred to in the literature. One is called “noise correlation” (Gawne and Richmond, 1993) and is the focus of this paper. Neural responses r=(r1,…,rn) are noise-correlated if and only if
$$p(r_1,\ldots,r_n \mid x) \neq \prod_{i=1}^{n} p(r_i \mid x),$$
where (r1,…,rn) are the individual neural responses that constitute the population response r to the stimulus x.
The second type is called “signal correlation” (Gawne and Richmond, 1993) and differs from noise correlation in that it takes the average over all stimuli. Neural responses are signal-correlated if and only if
To provide intuition for what these two types of correlations are, we give an example, following from Nirenberg and Latham (2003). Suppose one presents a flash of light while recording from two ON-type ganglion cells that lie far apart on the retina (such that their receptive fields don’t overlap). Because the cells are both ON cells, they will both fire at the onset of the flash. The similarity in their response is an example of signal correlations, and its role in neural coding is clear and not disputed. If, though, the two cells are close enough to receive common input from presynaptic cells (e.g., common photoreceptors, amacrine cells, etc.), then they would show correlations above and beyond the signal correlations. These extra correlations are the noise correlations; their contribution to the information carried by the cells has become the subject of much debate and is the focus of this paper.
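The two-cell example above can be made concrete with a small numerical sketch. It is an illustration only, not part of the analysis pipeline: the function names (`factorizes`, `independent_joint`, `average_over_stimuli`), the binary spike/no-spike responses, and the firing probabilities are all assumptions chosen for clarity. It checks whether a joint response distribution equals the product of its single-cell marginals, first for each stimulus separately (the noise-correlation test) and then after averaging over stimuli (used here as a simple proxy for signal correlation).

```python
from itertools import product

def factorizes(joint, tol=1e-12):
    """True if joint[(r1, r2)] equals the product of its two marginals."""
    p1, p2 = {}, {}
    for (r1, r2), p in joint.items():
        p1[r1] = p1.get(r1, 0.0) + p
        p2[r2] = p2.get(r2, 0.0) + p
    return all(abs(joint.get((a, b), 0.0) - p1[a] * p2[b]) <= tol
               for a, b in product(p1, p2))

def independent_joint(q1, q2):
    """Joint distribution of two binary cells firing independently
    with spike probabilities q1 and q2 (hypothetical values)."""
    return {(a, b): (q1 if a else 1 - q1) * (q2 if b else 1 - q2)
            for a, b in product((0, 1), repeat=2)}

def average_over_stimuli(conditionals, prior):
    """Marginal joint p(r1, r2) = sum_x p(r1, r2 | x) p(x)."""
    avg = {}
    for x, joint in conditionals.items():
        for r, p in joint.items():
            avg[r] = avg.get(r, 0.0) + prior[x] * p
    return avg

# Two distant ON cells: conditionally independent, but both driven by the flash.
conditionals = {"flash": independent_joint(0.9, 0.9),
                "dark": independent_joint(0.1, 0.1)}
no_noise_corr = all(factorizes(j) for j in conditionals.values())
averaged = average_over_stimuli(conditionals, {"flash": 0.5, "dark": 0.5})
has_signal_corr = not factorizes(averaged)
```

In this toy case the cells are independent given the stimulus, so there is no noise correlation, yet their responses are correlated once stimuli are averaged over, because both follow the flash.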
Stimuli
The retinas were stimulated with two photopic, grayscale stimuli of identical luminance and contrast: binary spatio-temporal white noise (WN) and a grayscale natural scene movie (NS). The natural scene movie was recorded in New York City’s Central Park, and had a temporal power spectrum of 1/f^2.04, where f is temporal frequency, and a spatial power spectrum of 1/ω^2.09, where ω is spatial frequency. Both were presented at 15 Hz, using an LCD projector driven by a computer running custom software on a real-time version of Red Hat Linux. Luminance was 0.24 μW/cm² on the retina (in the photopic range); root-mean-squared contrast was 0.087 μW/cm². The white noise stimulus covered 10 × 9 squares (with each square corresponding to 160 × 160 μm on the retina); the natural movie stimulus covered 20 × 18 squares (with each square covering 80 × 80 μm on the retina). For each stimulus, we had a training set, which was used to fit model parameters, and a testing set, which was used for evaluating the models and making calculations; the latter were referred to as the “out-of-sample stimuli”.
Measuring degree of correlation
The degree of correlation was measured using the excess correlated fraction (ECF) following Nirenberg et al. (2001). For each pair of cells, the ECF was calculated as follows: first, the “raw fraction” of correlated spikes was determined. This was the number of spikes that occurred within 1 ms of each other divided by the total number of spikes produced by the pair. A second quantity, the “shifted fraction”, was then determined. It was obtained by pairing responses from the two cells when they were presented with the stimulus at different times, i.e., when their responses were shifted by one repeat relative to each other (Perkel et al., 1967). The shifted fraction was then calculated by counting the number of spikes in the shifted pair that occurred within 1 ms of each other and dividing this by the total number of spikes for the pair. The ECF is then the difference between the raw and shifted fractions.
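The ECF calculation can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the code used for the analysis: spike times are given in ms as one list per stimulus repeat, a spike counts as "correlated" if the other cell fired at least once within the window, and the shift by one repeat is implemented circularly. The function names are hypothetical.

```python
from bisect import bisect_left

def has_partner(t, sorted_other, window):
    """True if sorted_other contains a spike within +/- window of time t."""
    i = bisect_left(sorted_other, t - window)
    return i < len(sorted_other) and sorted_other[i] <= t + window

def correlated_fraction(trials_a, trials_b, window=1.0):
    """Spikes with a partner in the other cell, divided by all spikes of the pair."""
    n_corr = n_total = 0
    for a, b in zip(trials_a, trials_b):
        a, b = sorted(a), sorted(b)
        n_corr += sum(has_partner(t, b, window) for t in a)
        n_corr += sum(has_partner(t, a, window) for t in b)
        n_total += len(a) + len(b)
    return n_corr / n_total if n_total else 0.0

def excess_correlated_fraction(trials_a, trials_b, window=1.0):
    """Raw fraction minus the fraction obtained after shifting cell B by one repeat."""
    raw = correlated_fraction(trials_a, trials_b, window)
    shifted_b = trials_b[1:] + trials_b[:1]  # circular shift (an assumption)
    shifted = correlated_fraction(trials_a, shifted_b, window)
    return raw - shifted

# Two repeats; in each repeat one spike per cell is nearly synchronous.
cell_a = [[10.0, 50.0], [20.0, 80.0]]
cell_b = [[10.5, 70.0], [20.2, 95.0]]
ecf = excess_correlated_fraction(cell_a, cell_b)
```

Here four of the eight spikes have a partner within 1 ms on the same repeat and none do after the shift, so the raw fraction is 0.5 and the shifted fraction is 0.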
Shift-corrected cross-correlograms were generated in a manner similar to that described above for obtaining the ECF. Briefly, for each pair of neurons, the raw cross-correlogram was first determined from the two cells’ simultaneously-recorded responses. The “shift predictor” was calculated from their responses recorded on separate repeats, and was then subtracted from the raw cross-correlogram to yield the shift-corrected cross-correlogram (Perkel et al., 1967).
Independent and coupled models
Two models were constructed from the neural responses: one in which the neurons were treated as independent, and one in which coupling among the neurons was included. Each model consisted of a set of parameters that were fit by maximizing the log-likelihood of observed spiking data collected for 10 minutes using the training stimulus set (see Stimuli above).
In the independent model (Fig. 2A), the mth neuron’s firing rate was modeled by
$$\lambda_m(t) = N_m\big((X * L_m)(t) + (H_m * \tau_m)(t)\big) \qquad (1)$$
where X is the stimulus, * denotes spatiotemporal convolution, Lm is the spatiotemporal impulse response corresponding to the linear filter for the mth neuron, and Nm is a function that describes its nonlinearity. The nonlinearities were parameterized as cubic spline functions with 6 knots. Knots were spaced to cover the range of values given by the linear filter output of the models. Hm is the spike history filter for the mth neuron, and τm is the sequence of spike times for the mth neuron.
In the coupled model (Fig. 2B) (Pillow et al., 2008), the influence of neuron k on neuron m is modeled by convolution of a linear filter with the sequence of spikes on neuron k. The outputs of these convolutions for all neighbors k are summed together and added to the stimulus filter output for neuron m, and the sum is then fed into the mth neuron’s nonlinearity function, Nm. Formally, the model is
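$$\lambda_m(t) = N_m\Big((X * L_m)(t) + (H_m * \tau_m)(t) + \sum_{k \neq m} (C_{mk} * \tau_k)(t)\Big) \qquad (2)$$
where C_{mk} is the linear coupling filter from neuron k to neuron m, and τk is the sequence of spike times of neuron k.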
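The difference between the two models can be illustrated in discrete time. The sketch below is an assumption-laden simplification rather than the fitted models: the stimulus-filter output is taken as precomputed, spike trains are binned counts, filters are short causal kernels, and an exponential stands in for the spline nonlinearity. All names (`causal_filter_output`, `firing_rate`, the `coupling` dictionary keyed by `(target, source)`) are hypothetical.

```python
import math

def causal_filter_output(spike_counts, kernel, t):
    """Sum of kernel[j] * spike_counts[t - 1 - j]; only strictly past bins contribute."""
    return sum(kernel[j] * spike_counts[t - 1 - j]
               for j in range(len(kernel)) if t - 1 - j >= 0)

def firing_rate(m, t, stim_drive, spikes, history, coupling, nonlinearity=math.exp):
    """Rate of neuron m in bin t.

    stim_drive[m][t]: precomputed stimulus-filter output for neuron m
    spikes[k][t]:     binned spike counts of neuron k
    history[m]:       post-spike kernel of neuron m
    coupling[(m, k)]: kernel for the influence of neuron k on neuron m;
                      an empty dict gives the independent model
    """
    g = stim_drive[m][t] + causal_filter_output(spikes[m], history[m], t)
    for (target, source), kernel in coupling.items():
        if target == m:
            g += causal_filter_output(spikes[source], kernel, t)
    return nonlinearity(g)

stim_drive = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
spikes = [[1, 0, 0], [0, 0, 0]]          # neuron 0 fires in bin 0
history = [[-1.0], [-1.0]]               # brief self-suppression
coupling = {(1, 0): [0.5]}               # neuron 0 excites neuron 1

rate_independent = firing_rate(1, 1, stim_drive, spikes, history, {})
rate_coupled = firing_rate(1, 1, stim_drive, spikes, history, coupling)
```

With identical stimulus drive, the coupled rate of neuron 1 rises after neuron 0 fires, while the independent rate is unaffected; this is the only structural difference between the two models.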
Each neuron’s linear filter Lm was assumed to be a product of a spatial function (extending through 5 × 5 squares), and a temporal function (extending through 18 time bins, 67 ms each). Dimensionality was further reduced by constraining the temporal function to a linear combination of 10 basis functions (raised cosines), as in Nirenberg et al. (2010), following Pillow et al. (2008). Similarly, coupling filters were parameterized by four raised-cosine basis functions (using additional basis functions was not found to improve fitting), and had a temporal extent of 9 ms (slightly longer than the observed correlogram widths).
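To illustrate how a raised-cosine basis reduces the number of free filter parameters, here is a small sketch. The text does not give the exact parameterization, so the logarithmic time warping, the bump width of four center spacings, and the peak positions used below are assumptions in the style commonly used for such bases, and `raised_cosine_basis` is a hypothetical name.

```python
import math

def raised_cosine_basis(n_basis, times, t_first_peak, t_last_peak, offset=1.0):
    """Return n_basis raised-cosine bumps evaluated at `times`.

    Bump centers are evenly spaced on a log-warped time axis between
    t_first_peak and t_last_peak (warping and spacing are assumptions).
    """
    if n_basis < 2:
        raise ValueError("need at least two basis functions")

    def warp(t):
        return math.log(t + offset)

    lo, hi = warp(t_first_peak), warp(t_last_peak)
    spacing = (hi - lo) / (n_basis - 1)
    basis = []
    for j in range(n_basis):
        center = lo + j * spacing
        row = []
        for t in times:
            arg = (warp(t) - center) * math.pi / (2 * spacing)
            row.append(0.5 * (1 + math.cos(arg)) if abs(arg) <= math.pi else 0.0)
        basis.append(row)
    return basis

# Four basis functions over a 9 ms window sampled at 1 ms (illustrative values).
times = [float(t) for t in range(9)]
basis = raised_cosine_basis(4, times, 0.0, 6.0)

# A coupling filter is then a weighted sum of the basis functions:
weights = [0.8, -0.2, 0.1, 0.0]          # hypothetical fitted coefficients
coupling_filter = [sum(w * b[i] for w, b in zip(weights, basis))
                   for i in range(len(times))]
```

Only the four weights would be fit for each coupling filter in this sketch, rather than one value per time bin.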
Parameters were fit using an expectation maximization procedure, as described by Paninski et al. (2007). The quantity maximized is the log likelihood of the observed spike trains under the model (for a given stimulus). Because the models we use produce an output governed by an inhomogeneous Poisson process, the log likelihood of the observed spike trains can be calculated according to the standard formula:
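$$Z = \sum_m \left[ \sum_i \log \lambda_m\big(\tau_m(i)\big) - \int \lambda_m(t)\,dt \right] \qquad (3)$$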
In this equation, λm is the instantaneous firing rate of the mth neuron (predicted by either the independent model (eq. 1) or the coupled model (eq. 2)), and Z is the log-likelihood of the population activity in which the mth neuron spikes at times τm. Z is a sum of terms, one for each neuron m. The term for the mth neuron (the expression in the larger square brackets) is composed of a sum and an integral. The sum term boosts the log-likelihood if the neuron produces a spike when the model’s instantaneous firing rate (λm) is high. The integral is a standard penalty term that prevents models from achieving a high log-likelihood merely by predicting high firing rates overall (Paninski et al., 2007).
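The roles of the two terms can be seen in a discretized version of the log likelihood. The following is a sketch under simple assumptions: time is divided into bins of width `dt` seconds, rates are in spikes/s, the integral is approximated by a sum over bins, and `poisson_log_likelihood` is a hypothetical name.

```python
import math

def poisson_log_likelihood(rates, spikes, dt):
    """Discretized inhomogeneous-Poisson log likelihood, summed over neurons.

    rates[m][t]:  model firing rate of neuron m in bin t (spikes/s)
    spikes[m][t]: observed spike count of neuron m in bin t
    dt:           bin width in seconds
    """
    z = 0.0
    for lam_m, n_m in zip(rates, spikes):
        spike_term = sum(n * math.log(lam) for lam, n in zip(lam_m, n_m) if n > 0)
        integral_term = sum(lam_m) * dt   # approximates the integral of the rate
        z += spike_term - integral_term
    return z

# One neuron, one spike in the first of 100 bins (10 ms bins).
observed = [[1] + [0] * 99]
peaked_model = [[100.0] + [1.0] * 99]    # high rate only where the spike occurs
always_high = [[100.0] * 100]            # high rate everywhere

z_peaked = poisson_log_likelihood(peaked_model, observed, dt=0.01)
z_high = poisson_log_likelihood(always_high, observed, dt=0.01)
```

Both models assign the same rate at the spike, so their sum terms are equal; the integral term is what penalizes the model that predicts a high rate throughout.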
To find the parameters that maximize the log likelihood in Eq. 3, we used the following procedure. We began by assuming that the nonlinearity Nm was exponential, since in this case, a global maximum for the log likelihood Z (Eq. 3) is assured (Paninski et al., 2007). After optimizing the linear filters for an exponential nonlinearity (by gradient ascent), the exponential nonlinearity was replaced by a spline, as in Nirenberg et al. (2010). Final model parameters were then determined by alternating stages of maximizing the log likelihood with respect to (i) the spline coefficients and (ii) the filter parameters, until a maximum was reached.
Models were validated by observing their spike-train prediction performance for 10 minutes of novel spiking data, that is, data produced from the retina by out-of-sample stimuli (stimuli not used to fit the model).
Recording
Electrophysiological recordings were obtained in vitro from the isolated mouse retina. Recordings of central retinal ganglion cells (RGCs) were performed on a 64 electrode multi-electrode array using methods described previously (Dedek et al., 2008). Spike waveforms were recorded using a Plexon Instruments Multichannel Neuronal Acquisition Processor (Dallas, TX), and a standard spike sorting method (Fee et al., 1996) was used to identify individual cells. Recordings contained 16 to 33 cells per retina; cells that had refractory period violations above 5% were excluded.
Calculating information
For both stimuli (white noise and natural movies) and both models (independent and coupled), we estimated information rates by decoding the population responses and comparing the accuracy of the decoded stimulus with the actual stimulus. Information calculations were performed from data produced by the test stimuli (i.e., data not used to fit the model parameters).
For white noise, we followed the procedure of Pillow et al. (2008): briefly, we used the Bayes’ least squares estimator to decode population responses for both the independent and coupled models. The stimulus variable, xi, was a 10-element binary vector representing the luminance of a single spatial checker (with spatial location chosen to maximize coverage by the population of recorded cells) over 10 frames. The response variable r was the set of spike trains recorded across all neurons (r1,…,rn) for 2 seconds beginning with the first frame of xi. For each response r, we calculated the likelihood that it would have been elicited by each of the 2^10 possible stimuli xi. This is the likelihood exp(Z) given by Eq. 3, where the rate functions on the right-hand side are taken from Eq. 1 for the independent model or from Eq. 2 for the coupled model. It is proportional to p(r|xi), the probability that the stimulus xi would elicit a population response r. To convert p(r|xi) to what we need for decoding, the posterior probability p(xi|r), we use Bayes’ theorem:
$$p(x_i \mid r) = \frac{p(r \mid x_i)\,p(x_i)}{\sum_j p(r \mid x_j)\,p(x_j)},$$
where in this case p(xi) is uniform. The Bayes' least squares estimate of the stimulus given the response r is a probability-weighted sum over all possible stimuli:
$$\hat{x}(r) = \sum_i x_i\, p(x_i \mid r).$$
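The decoding step for white noise can be sketched as follows. This is an illustration under assumptions, not the analysis code: the model likelihood is abstracted as a user-supplied function `log_likelihood(x)` returning log p(r|x) for the observed response, the two luminance levels are coded as −1 and +1, and the toy likelihood at the end is purely hypothetical.

```python
import math
from itertools import product

def bayes_least_squares(log_likelihood, n_frames=10, levels=(-1.0, 1.0)):
    """Posterior-mean stimulus under a uniform prior over all binary sequences.

    log_likelihood(x) must return log p(r | x) for the observed response r.
    Returns (estimate, posterior), with posterior ordered as the enumerated stimuli.
    """
    stimuli = list(product(levels, repeat=n_frames))    # 2**n_frames candidates
    log_l = [log_likelihood(x) for x in stimuli]
    peak = max(log_l)                                   # subtract for numerical stability
    weights = [math.exp(v - peak) for v in log_l]
    total = sum(weights)
    posterior = [w / total for w in weights]            # Bayes' theorem, uniform prior
    estimate = [sum(p * x[f] for p, x in zip(posterior, stimuli))
                for f in range(n_frames)]
    return estimate, posterior

# Toy likelihood: the response is informative only about the first frame.
def toy_log_likelihood(x):
    return 2.0 * x[0]

estimate, posterior = bayes_least_squares(toy_log_likelihood)
```

In the toy case the estimate for the first frame is tanh(2) ≈ 0.96 and the remaining frames are estimated as 0, reflecting that the response carries no information about them.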
Finally, as in Pillow et al. (2008), decoding performance was converted into an estimate of information rate (in bits/sec) via
where <·> denotes averaging across all responses. Note that the above equation is an estimate of the log of the signal-to-noise ratio (SNR) between the stimulus x and the decoded estimate x̂. Implicit in this estimate is the assumption that the white noise stimulus can be considered to be approximately Gaussian (see Pillow et al., 2008). There are two reasons that we choose to decode a single spatial pixel: 1) this allows for a direct comparison with Pillow et al. (2008), and 2) it facilitates the information calculations under the assumptions of a Gaussian channel. We use short snippets for decoding as it allows us to repeat the decoding calculation for many samples of the response, which is a necessary step for computing the SNR.
For natural scene stimuli, we calculated information by an approach that did not require the Gaussian approximation. The stimulus variable x consisted of the starting point of the 150-frame natural movie sequence, that is, x0 was the movie that started at frame 0, x1 was the movie that started at frame 1, etc. Each response r was taken as the spike trains recorded over a single 67 ms segment of the movie. The decoded stimulus was computed as the element from the stimulus set that maximized the posterior probability:
$$\hat{x} = \underset{x_i}{\arg\max}\; p(x_i \mid r).$$
Mutual information was then calculated between x and x̂ with the standard plug-in estimator of entropy (Antos and Kontoyiannis, 2001). In order to ensure that the stimulus entropy was not a critical factor in our calculations, we also performed this analysis using half of the stimuli (x0, x2, x4, etc.) and a third of the stimuli (x0, x3, x6, etc.). As a further check for robustness, we repeated the calculation with the response variable consisting of longer segments of the spike trains, at lengths of 133 ms and 200 ms (the same was done for the decoding used for confusion matrices described below).
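A minimal sketch of this calculation is given below, assuming that the per-stimulus log likelihoods are already available and that the prior is uniform (so maximizing the posterior is equivalent to maximizing the likelihood). The function names are hypothetical, and no bias correction is applied to the plug-in estimate.

```python
import math
from collections import Counter

def map_decode(log_likelihoods):
    """Index of the stimulus with the highest log p(r | x_i) (uniform prior)."""
    return max(range(len(log_likelihoods)), key=log_likelihoods.__getitem__)

def plug_in_mutual_information(pairs):
    """Plug-in estimate (bits) of the mutual information between presented
    and decoded stimuli, from a list of (presented, decoded) index pairs."""
    n = len(pairs)
    joint = Counter(pairs)
    p_presented = Counter(x for x, _ in pairs)
    p_decoded = Counter(y for _, y in pairs)
    return sum((c / n) * math.log2(c * n / (p_presented[x] * p_decoded[y]))
               for (x, y), c in joint.items())

# Perfect decoding of 4 equiprobable stimuli carries log2(4) = 2 bits.
perfect = [(i, i) for i in range(4)] * 5
# A decoder that always answers "0" carries no information.
uninformative = [(i, 0) for i in range(4)] * 5

mi_perfect = plug_in_mutual_information(perfect)
mi_none = plug_in_mutual_information(uninformative)
```

Subsampling the stimulus set (every second or third starting frame) corresponds to restricting `pairs` to the retained stimuli before calling the estimator.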
For information estimates, 95% confidence intervals were determined by taking 1000 bootstrap resamples of 200 decoded segments for the white noise stimulus conditions, and by taking 200 bootstrap resamples of 60 decoded segments (for each of the stimuli) for natural scenes.
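The resampling procedure can be sketched as a percentile bootstrap. The text does not state how the interval was formed from the resamples, so the percentile method, the fixed random seed, and the name `bootstrap_ci` are assumptions made for illustration.

```python
import random

def bootstrap_ci(samples, statistic, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for statistic(samples)."""
    rng = random.Random(seed)
    n = len(samples)
    stats = sorted(
        statistic([samples[rng.randrange(n)] for _ in range(n)])  # resample with replacement
        for _ in range(n_resamples)
    )
    lower = stats[int((alpha / 2) * n_resamples)]
    upper = stats[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return lower, upper

# Example: 95% interval for the mean of 200 hypothetical per-segment values.
values = [float(v) for v in range(200)]
ci_low, ci_high = bootstrap_ci(values, lambda v: sum(v) / len(v))
```

For the information estimates, `samples` would hold the decoded segments and `statistic` would recompute the information from each resampled set.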
Finally, we used the natural scene stimuli and responses to construct confusion matrices, as in Pandarinath et al. (2010). Briefly, a confusion matrix gives, for each presented stimulus, the probability that the neural response to it will be decoded as each of the possible stimuli (Hand, 1981). To find these probabilities, we used the same Bayesian approach as for calculating information. Each recorded response r was decoded by choosing the stimulus x most likely to have produced it, that is, the stimulus for which p(xi|r) was maximized. Here again, p(xi|r) was calculated from p(r|xi) using Bayes’ theorem, where the prior p(xi) was uniform. For each presentation of stimulus xi that resulted in a response r that was decoded as stimulus xj, the entry at position (i,j) in the confusion matrix was incremented.
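The construction just described amounts to counting and then normalizing each row. A minimal sketch, assuming the decoding results are available as (presented, decoded) index pairs and using the hypothetical name `confusion_matrix`:

```python
def confusion_matrix(pairs, n_stimuli):
    """Row i, column j: fraction of presentations of stimulus i decoded as stimulus j."""
    counts = [[0] * n_stimuli for _ in range(n_stimuli)]
    for presented, decoded in pairs:
        counts[presented][decoded] += 1          # increment entry (i, j)
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix

# Stimulus 0 is decoded correctly twice and confused with stimulus 1 once;
# stimulus 1 is always decoded correctly.
decoded_pairs = [(0, 0), (0, 0), (0, 1), (1, 1)]
cm = confusion_matrix(decoded_pairs, n_stimuli=2)
```

Building one such matrix from the independent-model decoder and one from the coupled-model decoder, on the same responses, allows the two patterns of errors to be compared entry by entry.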
Results
Our goal is to determine the extent to which information transmission by a neuronal population – the output cells of the retina – depends on correlations, specifically noise correlations (see Methods). Our strategy is to decode the population responses with two decoders: one that ignores correlations and one that exploits them. The decoding is carried out via a Bayesian framework, and therefore requires a model of the relationship between stimulus and response. The model determines how we treat the correlations: when we decode using an “independent” model, we ignore correlations; when we decode using a “coupled” model, we take them into account (see Methods for descriptions of the models).
To approach this goal, we proceeded in four steps. First, we measured the degree of correlation in our system, the ganglion cells of the mouse retina. Second, we constructed the two input/output models and verified that they perform as expected – that is, that the coupled model predicts correlations while the independent model does not. Third, using each model, we decoded out-of-sample stimuli (i.e., the test stimuli), both white noise and natural scenes. Finally, using the decoding results, we compared the information rates for the two models and assessed the quality of the decoding using confusion matrices.
We begin by characterizing the correlations in our model system, the mouse retina (Fig. 1). Fig. 1A shows 6 representative cross-correlograms, covering the range of strengths and timescales. The lower panels summarize the results for the population: Fig. 1B shows the distribution of correlation strengths, quantified by the excess correlated fraction, ECF (Nirenberg et al., 2001, see Methods), and Fig. 1C shows the distribution of correlation timescales. The range of ECFs encountered (up to 33%) agrees with that previously reported for mouse retina (up to 34% in Nirenberg et al., 2001, and 33% in Jacobs et al., 2009) and other mammalian species (up to 27% in cat (Mastronarde, 1983) and 28% in rabbit (DeVries, 1999)). Similarly, the range of correlation timescales (<1 ms to 5 ms, with a tail out to 11 ms) is also consistent with results reported for other mammalian species, including cat, rabbit, and mouse (Mastronarde, 1983; DeVries, 1999; Nirenberg et al., 2001). Note that the range of ECFs in the retina is broad, and their potential impact, as discussed in the literature, is proposed to occur in two ways: through the cumulative effect of the many small correlations that make up the majority of the correlations, or through groups of highly correlated cells, which, though rarer, could also carry significant information (see Wu et al., 2001; Schneidman et al., 2006; Pillow et al., 2008; Meister et al., 1995; Nirenberg et al., 2001; Chatterjee et al., 2007; also discussed in Latham and Nirenberg, 2005, and Averbeck et al., 2006).
Having established that our dataset is representative of the typical levels of correlated activity in the retina, we constructed the two retinal input/output models mentioned above – the one that ignores correlations (the independent model), and the one that takes them into account (the coupled model). The two models are shown schematically in Fig. 2. The independent model (Fig. 2A) consists of a linear filter, a static nonlinearity, and Poisson spike generation for each neuron (i.e., an LNP cascade) and a post-spike filter for modeling refractoriness. The coupled model (Fig. 2B) consists of the same with an added dependence on the firing of other neurons in the population.
In Figs. 5 and 6, we will show the results of using the models to decode stimuli, but first we show that they have the necessary properties to allow us to test the hypothesis that correlations matter. Specifically, we show that both models predict the average responses well, indicating that they are capturing signal correlations, but only the coupled model predicts the detailed firing patterns and their correlations, indicating that it is capturing both signal and noise correlations, the latter being the correlations that are the subject of so much debate.
Fig. 3 shows the evaluation of the two models using the white noise stimulus. Fig. 3A shows rasters from several representative cells produced by the out-of-sample white noise stimulus (the testing set), along with predictions from the independent and coupled models, and Fig. 3B shows the corresponding average peristimulus time histograms (PSTHs).
Fig. 3C and D summarize the results for the whole dataset. As can be seen in the figures, both models predict the average responses equally well (Fig. 3C). Fig. 3D then shows that, in addition, the coupled model predicts the detailed firing patterns better than the independent model. The increase was quantified by calculating the log likelihood of the observed spike trains, given each model (Methods, Eq. 3). As shown in Fig. 3D, the coupled model consistently achieves the better performance.
Finally, to confirm that the coupled model was achieving the better performance for the right reason, that is, because it was capturing the noise correlations in the spike trains, we calculated cross-correlograms. Each plot in Fig. 3E shows the shift-corrected cross-correlogram of the true responses of a pair of neurons, along with the predictions from the independent and coupled models (the shift-corrected correlograms show only the noise correlations, not the full correlograms which also contain signal correlations as in Pillow et al., 2008 and Oizumi et al., 2010). As shown in the figure, the coupled model provides a good fit to the cross-correlograms, whereas the independent model fails to predict any peaks at all.
Fig. 4 provides a parallel analysis of the two models for natural scene stimuli. As was the case for the responses to white noise, the independent and coupled models provide equally good predictions of the average responses. This is seen in representative rasters (Fig. 4A) and PSTHs (Fig. 4B), and in the summary across all cells in all retinas (Fig. 4C). Also in agreement with the findings for white noise, the coupled model is better able to predict individual spike trains (Fig. 4D) and cross-correlograms (Fig. 4E).
Having shown that the coupled model captures the noise correlations, we now use a Bayesian decoding framework (given in Methods) to determine how much information these correlations carry.
Fig. 5A shows the results of the decoding analysis for the white noise stimulus in the three retinas. As shown in the figure, capturing the correlations led to only a small increase in decoding performance, about 1 bit/s. On average this amounted to a 13% increase in information.
Fig. 5B extends this analysis to the responses to natural scenes and shows that for these stimuli, the contribution of noise correlations is even less, effectively undetectable. As was the case for white noise, taking noise correlations into account via the coupled model yields superior predictions of spike trains on a trial-by-trial basis (Fig. 4D). But, as shown in Fig. 5B, despite this, taking noise correlations into account yielded no detectable increase in the amount of information that can be decoded about the stimulus. Similar results were obtained when other response lengths were used for decoding (see Methods). Thus, the information in the correlations is essentially redundant to the information in the independent firing patterns, and including them in the decoding does not add information.
The Supplement shows two additional analyses that further support these conclusions: Fig. S1 shows that the results are robust to goodness of fit (as measured by R²), and Fig. S2 shows that the results hold when the analysis is restricted to subpopulations of cells with particularly high mutual correlations.
At this point, we have shown that taking noise correlations into account has minimal effect on the amount of information carried by the population. We now take this one step further, and ask whether they change the kind of information that is carried. To do this, we constructed confusion matrices corresponding to the information calculations for the natural scene stimuli shown in Fig. 5B. Two confusion matrices are shown: one is constructed with 50 stimuli drawn from the natural scene movies, and the other with 75 stimuli, also drawn from the natural scene movies. The stimuli were 67 ms movie segments. These matrices, shown in Fig. 6, indicate which stimuli are correctly decoded, and which ones are mistaken for another. Rows correspond to the presented stimulus; columns correspond to the decoded stimulus. The intensity of a pixel in row i and column j corresponds to the probability that the neural response to stimulus i is decoded as stimulus j. Thus, perfect decoding would result in a bright diagonal line, and off-diagonal elements represent the pattern of decoding errors.
As is clear from Fig. 6, the pattern of decoding for the independent model is the same as the pattern for the coupled model – that is, the pattern of correct classifications (the on-diagonal elements), and the pattern of errors (the off-diagonal elements) are the same for both models. This finding also held for decoding among 150 stimuli and for decoding with longer response durations (133 ms and 200 ms). We conclude that under natural scene stimulation, taking noise correlations into account has little impact not only on the amount of information that is present, but also on the kind of information – that is, it has almost no effect on which stimuli are properly decoded, and which are confused.
Discussion
The question of whether neural responses can be understood or decoded without taking noise correlations into account is crucial to the study of population coding. If cells can’t be treated as independent units, then a recording from one neuron can’t be understood without taking into account the neurons interacting with it. Determining the correlational structure for large populations of neurons is nontrivial: using brute force methods on populations with more than a few neurons is not possible, since the amount of data needed to characterize the population response scales exponentially with the number of cells in the population. Therefore, alternative approaches, such as characterizing the correlational structure parametrically, would be needed. For this reason, the answer to the question of whether noise correlations carry information has direct and significant impact on the direction of research efforts in the field (Nirenberg et al., 2001; Averbeck and Lee, 2004; Latham and Nirenberg, 2005; Averbeck et al., 2006; see Brown et al., 2004 in addition for further discussion of the problems of studying large neuronal populations).
Though this question has already been studied for pairs of neurons, with the finding that noise correlations add little or negligible amounts of information (Gawne and Richmond, 1993; Nirenberg et al., 2001; Oram et al., 2001; Petersen et al., 2001; Rolls et al., 2003; Levine, 2004), the possibility remained that they could contribute in larger amounts in more complete populations. The concern remained in part because of the matter of scaling: as the number N of neurons in a population grows, the number of pairs of neurons grows in proportion to N^2. Thus, if the noise correlations from each pair were to contribute non-redundantly to the total information, their contribution would eventually become very large even though the contribution from individual pairs is small. Furthermore, higher-order (non-pairwise) interactions could also contribute in large populations. This study, however, shows that neither of these was the case: even in the large populations we used (as high as 26 neurons), noise correlations continued to account for only a small fraction of the total information.
Correlations do not carry significant information in complete subpopulations
Unlike the two previous population studies (Pillow et al., 2008; Oizumi et al., 2010), our analysis included more than one cell class; this was done to allow correlations among different cell types to be included, such as those between ON cells and ON-OFF cells or OFF cells and ON-OFF cells. To relate our results to previous studies, though, we also performed the analysis on two cell type-specific patches (Fig. 7) – a patch of 10 ON cells and a patch of 7 ON-OFF cells. We found that our results still hold for both of these patches, and, in fact, within these single cell type patches, correlations account for even less of the information than in the full mixed populations, possibly because single cell type mosaics do not include the strong correlations that exist between different cell types with overlapping response properties (DeVries, 1999; Greschner et al., 2011) (e.g., between ON and ON-OFF cells).
Ignoring correlations does not affect the quality of the information
We emphasize that our analysis went a step beyond determining the contribution of noise correlations to the amount of information: we showed that taking noise correlations into account also does not change the kind of information carried. We showed this using confusion matrices, which delineate which stimuli were correctly decoded, and exactly which errors were made. As shown in Fig. 6, the confusion matrices that emerge when correlations are taken into account are very nearly identical to the confusion matrices that emerge when they are ignored. Thus, taking noise correlations into account does not change either the amount or the quality of the information carried by retinal spike trains.
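The comparison described above can be sketched in a few lines. This is a minimal illustration, not the code used in this study: the function names, the toy label lists, and the choice of maximum absolute difference as a summary of how far two confusion matrices differ are all assumptions made for the example.

```python
import numpy as np


def confusion_matrix(presented, decoded, n_stimuli):
    """Return P(decoded | presented): rows index the presented stimulus,
    columns index the stimulus chosen by the decoder."""
    counts = np.zeros((n_stimuli, n_stimuli), dtype=float)
    for s, d in zip(presented, decoded):
        counts[s, d] += 1
    row_totals = counts.sum(axis=1, keepdims=True)
    # Leave rows of zeros for stimuli that were never presented.
    return np.divide(counts, row_totals,
                     out=np.zeros_like(counts), where=row_totals > 0)


def max_abs_difference(cm_a, cm_b):
    """Largest entry-wise discrepancy between two confusion matrices
    (an illustrative summary statistic, chosen for simplicity)."""
    return float(np.max(np.abs(cm_a - cm_b)))


# Toy data (made up): three stimuli, two presentations each.
presented = [0, 0, 1, 1, 2, 2]
decoded_with_corr = [0, 1, 1, 1, 2, 0]    # hypothetical decoder using correlations
decoded_independent = [0, 1, 1, 1, 2, 2]  # hypothetical decoder ignoring them

cm_corr = confusion_matrix(presented, decoded_with_corr, 3)
cm_ind = confusion_matrix(presented, decoded_independent, 3)
print(max_abs_difference(cm_corr, cm_ind))
```

The diagonal of each matrix gives the fraction of correct decodes per stimulus, and the off-diagonal entries show exactly which stimuli were confused with which; two decoders carry the same kind of information when both parts of the matrices agree.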
Conclusions
We have shown that for multiple types of stimuli (white noise and natural scenes), cells in a retinal population can be treated as independent with little or no loss of information. This greatly simplifies, and therefore facilitates, the study of retinal population coding. The fact that the result holds for natural stimuli is notable because natural stimuli have both local and long-range correlations that are not found in white noise, and, as a result, the noise correlations they induce may differ. Hence, generalization of the result from white noise to natural scenes is nontrivial, and, since natural stimuli are the biologically relevant ones, establishing this generalization is critical.
One question that naturally arises is whether the conclusions – that population coding can be studied by treating the neurons as independent – will generalize to other regions of the CNS. At the level of pairs of neurons, the contribution of noise correlations is similar between the retina and many cortical areas, including visual, motor, and somatosensory (Oram et al., 2001; Petersen et al., 2001; Rolls et al., 2003), and this contribution is small. While this suggests generalization, it doesn’t assure it. Good parametric models for capturing global input-output relationships will open the door to direct tests along the lines of the ones reported here.
Supplementary Material
Highlights.
We examine the role of correlations for carrying information in large populations
We find that correlations add little information under white noise stimulation
We find that correlations add no discernible information under natural scenes
This means neurons may be treated as independent for many applications
Acknowledgments
We thank I. Bomash for helpful discussion and J. Pillow both for helpful discussion and for generously contributing software, particularly at the initiation of the project. This work was supported by NIH EY012978 to S.N., NIH EY07977 and the Tri-Institutional Training Program in Vision Research for M.M. and Z.N., and the Tri-Institutional Training Program in Computational Biology for Z.N.
Footnotes
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
References
- Antos A, Kontoyiannis I. Convergent properties of functional estimates for discrete distributions. Random Structures and Algorithms. 2001;19:163–193.
- Averbeck BB, Lee D. Coding and transmission of information by neural ensembles. Trends Neurosci. 2004;27(4):225–30. doi: 10.1016/j.tins.2004.02.006.
- Averbeck BB, Latham PE, Pouget A. Neural correlations, population coding and computation. Nat Rev Neurosci. 2006;7(5):358–66. doi: 10.1038/nrn1888.
- Blankenship AG, Feller MB. Mechanisms underlying spontaneous patterned activity in developing neural circuits. Nat Rev Neurosci. 2010;11:18–29. doi: 10.1038/nrn2759.
- Brown EN, Kass RE, Mitra PP. Multiple neural spike train data analysis: state-of-the-art and future challenges. Nat Neurosci. 2004;7(5):456–461. doi: 10.1038/nn1228.
- Chatterjee S, Merwine DK, Amthor FR, Grzywacz NM. Properties of stimulus-dependent synchrony in retinal ganglion cells. Vis Neurosci. 2007;24:827–843. doi: 10.1017/S0952523807070757.
- Dedek K, Pandarinath C, Alam NM, Wellershaus K, Schubert T, Willecke K, et al. Ganglion cell adaptability: does the coupling of horizontal cells play a role? PLoS One. 2008;3(3):e1714. doi: 10.1371/journal.pone.0001714.
- DeVries SH. Correlated firing in rabbit retinal ganglion cells. J Neurophysiol. 1999;81(2):908–920. doi: 10.1152/jn.1999.81.2.908.
- Fee MS, Mitra PP, Kleinfeld D. Automatic sorting of multiple unit neuronal signals in the presence of anisotropic and non-gaussian variability. J Neurosci Methods. 1996;69(2):175–188. doi: 10.1016/S0165-0270(96)00050-7.
- Feldman DE. Synaptic mechanisms for plasticity in neocortex. Ann Rev Neurosci. 2009;32:33–35. doi: 10.1146/annurev.neuro.051508.135516.
- Gawne TJ, Richmond BJ. How independent are the messages carried by adjacent inferior temporal cortical neurons? J Neurosci. 1993;13(7):2758–2771. doi: 10.1523/JNEUROSCI.13-07-02758.1993.
- Greschner M, Shlens J, Bakolitsa C, Field GD, Gauthier JL, Jepson LH, Sher A, Litke AM, Chichilnisky EJ. Correlated firing among major ganglion cell types in the primate retina. J Physiol. 2011;589(1):75–86. doi: 10.1113/jphysiol.2010.193888.
- Hand DJ. Discrimination and classification. Wiley; Chichester, UK: 1981.
- Jacobs AL, Fridman G, Douglas RM, Alam NM, Latham PE, Prusky GT, Nirenberg S. Ruling out and ruling in neural codes. PNAS. 2009;106(14):5936–5941. doi: 10.1073/pnas.0900573106.
- Levine MW. The potential coding utility of intercell cross-correlations in the retina. Biol Cybern. 2004;91(3):182–7. doi: 10.1007/s00422-004-0492-4.
- Mastronarde DN. Correlated firing of cat retinal ganglion cells. I. Spontaneously active inputs to X- and Y-cells. J Neurophysiol. 1983;49(2):303–24. doi: 10.1152/jn.1983.49.2.303.
- Meister M, Lagnado L, Baylor DA. Concerted Signaling by Retinal Ganglion Cells. Science. 1995;270(5239):1207–1210. doi: 10.1126/science.270.5239.1207.
- Nirenberg S, Carcieri SM, Jacobs AL, Latham PE. Retinal ganglion cells act largely as independent encoders. Nature. 2001;411(6838):698–701. doi: 10.1038/35079612.
- Nirenberg S, Bomash I, Pillow JW, Victor JD. Heterogeneous response dynamics in retinal ganglion cells: the interplay of predictive coding and adaptation. J Neurophysiol. 2010;103(6):3184–3194. doi: 10.1152/jn.00878.2009.
- Nirenberg S, Latham PE. Decoding neuronal spike trains: how important are correlations? PNAS. 2003;100(12):7348–7353. doi: 10.1073/pnas.1131895100.
- Nirenberg S, Pandarinath C. A retinal prosthetic strategy with the capacity to restore normal vision. Society for Neuroscience Abstracts. 20102010:20.1.
- Oizumi M, Ishii T, Ishibashi K, Hosoya T, Okada M. Mismatched decoding in the brain. J Neurosci. 2010;30(13):4815–4826. doi: 10.1523/JNEUROSCI.4360-09.2010.
- Oram MW, Hatsopoulos NG, Richmond BJ, Donoghue JP. Excess synchrony in motor cortical neurons provides redundant direction information with that from coarse temporal measures. J Neurophysiol. 2001;86(4):1700–1716. doi: 10.1152/jn.2001.86.4.1700.
- Pandarinath C, Victor JD, Nirenberg S. Symmetry Breakdown in the ON and OFF Pathways of the Retina at Night: Functional Implications. J Neurosci. 2010;30(30):10006–10014. doi: 10.1523/JNEUROSCI.5616-09.2010.
- Paninski L, Pillow J, Lewi J. Statistical models for neural encoding, decoding, and optimal stimulus design. Prog Brain Res. 2007;165:493–507. doi: 10.1016/S0079-6123(06)65031-0.
- Perkel DH, Gerstein GL, Moore GP. Neuronal spike trains and stochastic point processes. II. Simultaneous spike trains. Biophys J. 1967;7:419–440. doi: 10.1016/S0006-3495(67)86597-4.
- Petersen RS, Panzeri S, Diamond ME. Population coding of stimulus location in rat somatosensory cortex. Neuron. 2001;32(3):503–514. doi: 10.1016/s0896-6273(01)00481-0.
- Pillow JW, Shlens J, Paninski L, Sher A, Litke AM, Chichilnisky EJ, Simoncelli EP. Spatio-temporal correlations and visual signaling in a complete neuronal population. Nature. 2008;454:995–999. doi: 10.1038/nature07140.
- Rolls ET, Franco L, Aggelopoulos NC, Reece S. An information theoretic approach to the contributions of the firing rates and the correlations between the firing of neurons. J Neurophysiol. 2003;89(5):2810–2822. doi: 10.1152/jn.01070.2002.
- Schneidman E, Berry MJ, Segev R, Bialek W. Weak pairwise correlations imply strongly correlated network states in a neural population. Nature. 2006;440:1007–1012. doi: 10.1038/nature04701.
- Wu S, Nakahara H, Amari S. Population coding with correlation and an unfaithful model. Neural Comput. 2001;13:775–797. doi: 10.1162/089976601300014349.