Large-scale two-photon imaging revealed super-sparse population codes in the V1 superficial layer of awake monkeys

Shiming Tang; Yimeng Zhang; Zhihao Li; Ming Li; Fang Liu; Hongfei Jiang; Tai Sing Lee

doi:10.7554/eLife.33370

2018 Apr 26;7:e33370. doi: 10.7554/eLife.33370

Editor: Emilio Salinas

PMCID: PMC5953536 PMID: 29697371

Abstract

One general principle of sensory information processing is that the brain must optimize efficiency by reducing the number of neurons that process the same information. The sparseness of the sensory representations in a population of neurons reflects the efficiency of the neural code. Here, we employ large-scale two-photon calcium imaging to examine the responses of a large population of neurons within the superficial layers of area V1 with single-cell resolution, while simultaneously presenting a large set of natural visual stimuli, to provide the first direct measure of the population sparseness in awake primates. The results show that only 0.5% of neurons respond strongly to any given natural image — indicating a ten-fold increase in the inferred sparseness over previous measurements. These population activities are nevertheless necessary and sufficient to discriminate visual stimuli with high accuracy, suggesting that the neural code in the primary visual cortex is both super-sparse and highly efficient.

Research organism: Rhesus macaque

Introduction

The efficient-coding hypothesis is an important organizing principle of any sensory system (Barlow, 1981; Olshausen and Field, 1996). It predicts that neuronal population responses should be sparse, although the optimal level of sparseness depends on many factors. Most of the experimental evidence of sparse coding comes from the responses of individual neurons that were exposed to a large set of natural image stimuli, measured using single-unit recording techniques (Haider et al., 2010; Hromádka et al., 2008; Rust and DiCarlo, 2012; Vinje and Gallant, 2000). The sparseness of a neuron’s response to a large set of stimuli in these studies was used to infer the sparseness of the population responses. The first direct measurement of population response sparseness was performed with two-photon (2P) GCaMP6 signal imaging in rodents (Froudarakis et al., 2014). One potential confound in this work, however, is that GCaMP6 responses are subject to saturation at neuronal firing rates above 60–80 Hz (Chen et al., 2013; Froudarakis et al., 2014), leading to the potential to under-estimate the sparseness measures that can capture the peakedness or sharpness of the population response distributions. Thus, direct and accurate measurement of the population sparseness of neuronal response, particularly in non-human primates, is required.

In this study, we provided the first direct measurement of population sparseness from V1 of awake macaques. We performed 2P imaging on a large population of neurons using the genetically encoded calcium indicator GCaMP5 (Akerboom et al., 2012; Denk et al., 1990), delivered with adeno-associated viruses (AAVs). We showed previously that GCaMP5 exhibits linear non-saturating responses across a wide range of firing rates (10–150 Hz) (Li et al., 2017), allowing us to measure accurately the response sparseness of almost all of the neurons in layer two in V1 within an 850 µm × 850 µm field of view, the spatial scale of about one hypercolumn.

Results and discussion

Our 2P imaging of GCaMP5 recorded neuronal population calcium (Ca²⁺) responses to 2250 natural images in V1 layer two in two awake macaques (about 1000 neurons each). Each monkey performed a fixation task while stimuli were presented to the appropriate retinotopic position in the visual field. Each trial sequence consisted of: a blank screen presented for one second, followed by a visual stimulus for one second. Each activated cell’s region-of-interest (ROI) was defined as the compact region (>25 pixels) in which brightness exceeded three standard deviations (stds) above baseline, for each individual differential image. The standard ratio of fluorescence change (ΔF/F0) of each of these regions of interest (ROIs) during stimulus presentation was calculated as the neuron’s response (see 'Materials and methods').
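The ROI criterion described above (a connected region of more than 25 pixels exceeding three standard deviations in a differential image) can be sketched as a flood fill. This is only an illustrative sketch, not the authors' Matlab code: the function name `find_rois`, the use of 4-connectivity, the per-pixel (rather than region-averaged) thresholding, and the choice of the whole-image mean and standard deviation as the baseline statistics are all assumptions made for the example.

```python
from collections import deque


def find_rois(diff_image, n_std=3.0, min_pixels=25):
    """Return connected supra-threshold regions of a 2-D differential image.

    diff_image is a list of rows (ON-period average minus OFF-period average).
    ASSUMPTION: baseline mean/std are taken over the whole image, and pixels
    are joined with 4-connectivity; the paper does not specify either choice.
    """
    h, w = len(diff_image), len(diff_image[0])
    pixels = [v for row in diff_image for v in row]
    mean = sum(pixels) / len(pixels)
    std = (sum((v - mean) ** 2 for v in pixels) / len(pixels)) ** 0.5
    cutoff = mean + n_std * std

    seen = [[False] * w for _ in range(h)]
    rois = []
    for y0 in range(h):
        for x0 in range(w):
            if seen[y0][x0] or diff_image[y0][x0] <= cutoff:
                continue
            # Breadth-first flood fill over neighbouring supra-threshold pixels.
            queue, region = deque([(y0, x0)]), []
            seen[y0][x0] = True
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w and not seen[ny][nx]
                            and diff_image[ny][nx] > cutoff):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Keep only regions larger than the minimum size (>25 pixels).
            if len(region) > min_pixels:
                rois.append(region)
    return rois
```

Each returned ROI is a list of (row, column) pixel coordinates; small supra-threshold specks are discarded by the size criterion.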

The receptive fields (RFs) of the neurons were characterized using oriented gratings and bars presented in various positions. The RF centers of the imaged neurons were located between 3° and 5° in eccentricity. In each trial, a stimulus of 4° × 4° in size was presented, randomly drawn from a set of 2250 natural image stimuli (Figure 1, Figure 1—figure supplement 1c). The entire set of stimuli was repeated three times. These natural images evoked robust visual responses in the imaged neurons (Figure 1a,b, Figure 1—figure supplement 1c).

Figure 1. (a, b) Calcium images of the neuronal population response to two different natural images (shown in the insets). Typically, only a few neurons, among the nearly 1000 neurons measured (1225 neurons for Monkey A or 982 neurons for Monkey B), responded strongly to a single patch of natural scenes. (c) The overall neuronal population responses to all 2250 natural images. Each cell was color-coded according to the response intensity to its optimal stimulus. (d, e) The distributions of neuronal population responses to the two natural images. Abscissa indicates the 1225 neurons that showed a significant response to natural images, in ranked order according to their responses to each image. Ordinate indicates ΔF/F0. (f, g) Frequency histograms showing the distributions of the number of stimuli (out of 2250) (y-axis) with different population sparseness, measured by the number of neurons activated strongly (x-axis). On average, fewer than 0.5% of the cells (6 cells out of 1225 for monkey A, and 4.1 cells out of 982 for monkey B) responded above half of their peak responses for any given image.

We recorded data from 1225 neurons in monkey A and 982 neurons in monkey B, and found that only a few neurons in each monkey responded strongly to any given image (Figure 1a,b), though most of the neurons responded strongly to at least one of the images in the set (Figure 1c). The rank-ordered distributions of the population responses were always sharply peaked (Figure 1d,e). On average, the percentage of neurons that responded above their own half-maximum was 0.49% (6.0/1,225) for monkey A and 0.42% (4.1/982) for monkey B (Figure 1f,g). This is a measure of population sparseness that can capture the peakedness of the population response distribution (see 'Materials and methods'). In other words, only about 0.5% of the cells responded strongly to each image, indicating a very high degree of sparseness in the strong population responses or a very sharp population response distribution.

We also examined each neuron’s stimulus specificity or life-time sparseness (Willmore et al., 2011). Interestingly, we found that most cells responded strongly to only a small number of images in the whole stimulus set (Figure 2a,b). The preferred images for individual neurons often shared common features. For example, neuron 653 of monkey A was most excited when its receptive field (0.8° in diameter) covered the lower rim of the cat’s eye (indicated by the red dashed line in the inset in Figure 2b). The neuron’s preference for the specific curve feature was further confirmed by checking its selectivity to a variety of artificial patterns (Figure 2b). Similarly, neuron 949 of Monkey A was selective to a different specific curvature embedded within the natural stimuli (Figure 2d). A more systematic characterization of these cells using artificial stimulus patterns has been reported previously (Tang et al., 2018), and this work showed that many of these neurons are highly selective to specific complex patterns. To measure the sharpness of each neuron’s stimulus tuning curve (its life-time response sparseness), we computed the percentage of the stimuli that excited each neuron to >50% of the peak response found from the entire stimulus set. This population average was 0.49% (11/2,250) for Monkey A and 0.41% (9.3/2,250) for Monkey B (Figure 2e,f). This suggests that a high degree of stimulus specificity or life-time sparseness goes hand-in-hand with a heightened population sparseness.

Figure 2. (a, b) The response of one example cell (cell 653) to the entire set of natural scene stimuli, exhibiting a high level of stimulus specificity. (c, d) Another example cell (cell 949) also shows high stimulus specificity. (e, f) The distributions of the stimulus specificity of neurons, in terms of the half-height width of the stimulus tuning curves. Each cell would typically respond strongly to fewer than 0.5% of the natural images in our test set.

To understand how much information was carried in the sparse ensemble of population activities, we evaluated how well the sparse neural responses allow a decoder to discriminate the 2250 stimuli (Quian Quiroga and Panzeri, 2009; Froudarakis et al., 2014). The entire population’s decoding accuracy was 54% for Monkey A and 38% for Monkey B, whereas the chance accuracy was 0.04% (1/2,250) (Figure 3a,b, horizontal dashed lines). These population decoding accuracies estimate the highest achievable decoding performance from the activities recorded for the full population, and serve as the upper limit of accuracy performance that can be achieved by decoders that are made from any subset of neurons in the population (Figure 3a,b, horizontal dashed lines).

Figure 3. Y axes show the cross-validated decoding accuracy on the 2,250-way image classification task. Dashed lines are the reference ‘achievable decoding performance’ in accuracy obtained using the original entire neural population responses; red lines (‘top only’ in the legend) show the decoding accuracies when different percentages of the top responses were kept and lower responses were removed (set to zero); blue lines (‘top excluded’ in the legend) show the decoding accuracies when different percentages of top responses were removed (set to zero) and lower responses were kept. X axes show the percentage of top responses included (red curves) and excluded (blue curves). See 'Materials and methods', 'Decoding analysis' for details. Gray vertical lines highlight the decoding accuracies including or excluding the top 0.5% responses. Since our classification task is a 2250-way one, the chance accuracy is 1/2,250, or about 0.04%.

Now that we know the highest achievable decoding performance of the entire population, we can build on that idea to assess the contribution of strong sparse responses to the overall decoding accuracy. We zeroed all responses below the top 0.5% response threshold and built a new decoding model to discriminate the 2250 stimuli. Here, we assume that only the strong signals above the threshold would be conveyed to downstream neurons successfully. Remarkably, a decoding accuracy of 28% for monkey A and 21% for monkey B could be achieved with only the top 0.5% of the strongest signals included (Figure 3a,b vertical gray lines). This means that transmitting the top 0.5% of the strongest responses for each image was sufficient to realize 50% of the maximum achievable decoding performance for either monkey.

Conversely, we can assess the necessity of the top 0.5% of responses by repeating the decoding experiment after removing the strongest signals by setting them to 0, and keeping the remaining 99.5% of the signals intact. The blue curves in Figure 3a and b show the decoding performance with signals above a range of percentage thresholds ‘removed’. The intersections of the blue curves with the gray lines indicate that the decoding performance dropped by 50% when the strongest 0.5% of the signals were removed. Thus, we showed that these strong and sparse signals contain the information necessary to realize half of the decoding performance.

Figure 3a and b also provide a more complete picture of the decoding performance as a function of percentage threshold. Although the top 0.5% of the signals are both necessary and sufficient for realizing 50% of the performance, 99% of decoding performance is not reached until the top 40% (for monkey A) and top 30% (for monkey B) of the responses are included (saturation of the red curves). However, without the top 5% of responses, the decoding performance drops to practically zero (blue curve). Thus, the strongest 5% of the responses are necessary for the achievement of full performance, but insufficient by themselves. In other words, the presence of other weaker signals in the population is required to achieve the full performance. The decoding results revealed that significant information content is indeed carried by the super-sparse strong responses.

We note that, in the analyses described above, the selection of responses that were kept or removed was based on the absolute response magnitude of the cells, rather than on a normalized response magnitude for each cell as determined by its peak response. We made this decision because we assumed that downstream neurons may know where the signals come from, but it is not clear whether they can know (or remember from historical responses) the peak response magnitude of the neuron that is providing the signals. For better comparison with the population sparseness measure, we repeated the decoding experiment but this time chose the percentage threshold relative to the peak response of each individual neuron. The results were qualitatively similar, and showed that strong sparse responses still carry a disproportionate amount of information (Figure 3—figure supplement 1). Quantitatively, the top 1.5% of the responses are now required to achieve 50% of the decoder’s highest achievable performance. This suggests that absolute response strengths may potentially convey more discriminable information to downstream neurons. The decrease in performance based on the top 0.5% of responses selected on the basis of a relative threshold (Figure 3—figure supplement 1, red curve) is understandable because the relative threshold will include some useless contributions from neurons that have weak peak responses and will exclude the more useful contributions of some neurons that have high peak responses, amplifying the effect of noise, particularly when a small number of neurons are selected.

In conclusion, this study provides the first simultaneous recording of a large dense population of neurons in V1 at single-cell resolution in response to a large set of natural stimuli, using 2P imaging in awake macaque. Earlier studies had provided life-time sparseness measurements in rodents (Hromádka et al., 2008; Haider et al., 2010), non-human primates (Rolls and Tovee, 1995; Vinje and Gallant, 2000; Rust and DiCarlo, 2012) and humans (Quiroga et al., 2005), as well as population sparseness measurement in rodents (Froudarakis et al., 2014), but our study provides the first direct measurement of sparseness of large-scale neuronal population responses, carried out in awake macaque monkeys and made possible by large-scale 2P imaging techniques. We found that a very small ensemble of neurons from V1’s superficial layer are activated at any one time in response to any given natural image. Using decoding analysis, we showed that the small ensembles of neural responses provide a surprisingly large amount of information to downstream neurons, providing for the discrimination of complex image patterns in natural scenes.

Earlier studies inferred population sparseness on the basis of measurements of life-time sparseness. For the first time, we have shown that direct measurements of population sparseness are indeed comparable to life-time sparseness measurements. However, the level of sparseness that we observed (0.5% at half maximal response) was considerably higher than that predicted by the earlier life-time estimates of sparseness, which were based on single unit recording in macaques (Rust and DiCarlo, 2012). Studies on rodents have yielded a considerable range of estimates of sparseness that vary across measurement techniques (Haider et al., 2010; Hromádka et al., 2008; Froudarakis et al., 2014). A single-unit study that used the cell-attached patch technique might have been the most accurate to date (Hromádka et al., 2008), and it showed that neurons were mostly silent in the awake auditory cortex, inferring that less than 2% of the neuronal population showed ‘well-driven’ responses (>20 Hz response frequency) to natural sounds. Our imaging shows that neurons in the superficial layers of V1 are densely packed and have small cell bodies. Thus, it may not be possible to obtain stable and well isolated single-unit signals over several hours using extracellular recording methods. Our study has reduced the biases inherent to previous extracellular recording studies — in neuronal sampling, in stimulus sampling, and in single-cell isolation — by invoking a response in virtually all of the neurons in a single field of view and in a particular layer by using a large set of natural stimuli.

The high degree of population sparseness that we observed is consistent with two recent conjectures from theoretical neuroscience. First, based on the metabolic costs of spiking, one group posits that fewer than 1% of the neurons should be substantially active concurrently in any brain area (Lennie, 2003). Second, and more importantly, theoretical sparse-coding studies have suggested that because V1 neurons are at least 200 times more abundant than their thalamic input, V1 neurons could be quite specialized in their feature selectivity and thus highly sparse in their population responses (Olshausen, 2013; Rehn and Sommer, 2007). We have indeed observed this very finding using 2P imaging techniques in the V1 superficial layers (Tang et al., 2018). These findings are reminiscent of the highly specific codes exhibited by neurons in the human medial temporal lobes (Quiroga et al., 2005), suggesting that many V1 neurons might be akin to highly specific ‘grandmother neurons’, although they may encode information in the form of an extremely sparse population code. The observed high degree of population sparseness and life-time sparseness are also consistent with our earlier observation that V1 neurons in this layer were tuned to complex patterns with a great degree of specificity (Tang et al., 2018, see also Hegdé and Van Essen, 2007). These findings reveal the complexity and specificity of feature selectivity and the super-sparse neural representation within a V1 hypercolumn, providing new understanding of the neural codes in the macaque primary visual cortex.

Materials and methods

Key resources table.

Reagent type (species) or resource	Designation	Source or reference	Identifiers	Additional information
Strain, strain background (Macaque)	Rhesus monkeys	Beijing Prima Biotech Inc	http://www.primasbio.com/cn/Default	http://www.primasbio.com/cn/Default
Recombinant DNA reagent	AAV1.hSyn.GCaMP5G	Penn Vector Core	V5072MI-R
Software, algorithm	Matlab 7.12.0 (R2011a)	MathWorks	Matlab 7.12.0 (R2011a)	https://www.mathworks.com
Software, algorithm	Codes for the decoding analysis and image movement correction	This paper	Codes for the decoding analysis and image movement correction	https://github.com/leelabcnbc/sparse-coding-elife2018 (copy archived at https://github.com/elifesciences-publications/sparse-coding-elife2018)

All experimental protocols were approved by the Peking University Animal Care and Use Committee (LSC-TangSM-5).

Subjects

The study used two adult rhesus monkeys (A and B), who were 4 and 5 years of age and weighed 5 and 7 kg, respectively (Li et al., 2017). Two sequential surgeries were performed on each animal under general anesthesia and strictly sterile conditions. In the first surgery, a 16 mm hole was drilled in the skull over V1. The dura was opened to expose the cortex, into which 50–100 nl AAV1.hSynap.GCaMP5G.WPRE.SV40 (AV-1-PV2478, titer 2.37e13 (GC/ml), Penn Vector Core) was pressure-injected at a depth of ~500 μm. After AAV injection, the dura was sutured, the skull cap was placed back, and the scalp was sutured. Then the animal was returned to its cage for recovery. Antibiotic (Ceftriaxone sodium, Youcare Pharmaceutical Group Co. Ltd., China) was administered for one week. After 45 days, a second surgery was performed, in which three head-posts were implanted on each animal’s skull, two on the forehead and one on the back of the head. A T-shaped steel frame was connected to these head-posts for head stabilization during imaging. The skull and dura were later opened again to expose the cortex. A glass cover-slip (diameter 8 mm and thickness 0.17 mm) was glued to a titanium ring and gently pressed onto the cortical surface. A ring-shaped GORE membrane (20 mm in outer diameter) was inserted under the dura. The titanium ring was glued to the dura and skull with dental acrylic to form an imaging chamber. The whole chamber (formed by thick dental acrylic) was covered by a steel shell to prevent breakage of the cover-slip when the animal was returned to the home cage.

Behavioral task

During imaging, each monkey sat in a standard primate chair with head restraint and performed a fixation task, which involved fixating on a small white spot (0.1°) within a window of 1° for over 2 s to obtain a juice reward. Eye position was monitored with an infrared eye-tracking system (ISCAN, Inc.) at 120 Hz.

Visual stimuli

Visual stimuli were generated using the ViSaGe system (Cambridge Research Systems) and displayed on a 17-inch LCD monitor (Acer V173, 80 Hz refresh rate) positioned 45 cm from the animal’s eyes. Each stimulus was presented for 1 s after a 1 s blank within a fixation period of 2 s. We estimated the RF sizes and positions of the imaged neurons with small drifting gratings and bars presented at different locations. The RFs were estimated to be 0.2° to 0.8° in size with RF locations between 3° and 5° in eccentricity for both monkeys.
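For readers converting the stimulus sizes quoted here into physical sizes on the monitor, the standard visual-angle relation can be sketched as follows. The function name is hypothetical, and the formula assumes a flat screen with the stimulus centered on the line of sight, which is only an approximation for stimuli placed at 3° to 5° eccentricity.

```python
import math


def visual_angle_to_cm(angle_deg, distance_cm):
    """Extent on a flat screen (cm) that subtends angle_deg at distance_cm.

    ASSUMPTION: the extent is centered on the line of sight; off-axis
    stimuli deviate slightly from this value.
    """
    return 2.0 * distance_cm * math.tan(math.radians(angle_deg) / 2.0)


# A 4-degree stimulus viewed from 45 cm spans roughly 3.1 cm on the screen.
stimulus_width_cm = visual_angle_to_cm(4.0, 45.0)
```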

Drifting and oriented gratings were tested to examine the visual responses of imaged neurons (Li et al., 2017). Small patches (0.8° in diameter) of gratings with 100% contrast square waves were presented to the center of RFs of imaged cells, with two spatial frequencies (4.0 and 8.0 cyc/deg) at two temporal frequencies (1 and 2 Hz), six orientations, and two directions (30° apart).

A natural stimulus set (NS) of 2,250 4° × 4° stimulus patches extracted from different natural scene photos was used to examine the neuronal responses to natural stimuli. The order of the stimuli was randomized in each session. These stimuli were tested on monkeys A and B, each with at least three repetitions.

Eye movement control

We analyzed the distribution of eye-positions during stimulus ON periods. The monkeys’ fixation during stimulus presentation (from 1 to 2 s in the graph) was stable and accurate. The spread of eye positions during stimulus presentation, with standard deviations smaller than 0.05°, was significantly smaller than the typical receptive field sizes (ranging from 0.2° to 0.8°) of neurons at 3–5° eccentricities. To examine whether eye movements made a significant contribution to the distribution of neuronal population responses, we compared the standard deviations (stds) of eye position in different population response classes of neurons: (1) weak responses (ΔF/F0 <0.5), (2) sparse strong responses (one or two cells responded), (3) dense responses (more than ten cells responded). We found no statistically significant differences in the distribution of eye position data in these three classes (Tang et al., 2018), indicating that the observed effects were not caused by movement differences. The ROC and decoding analysis (Figure 1—figure supplement 2), demonstrating the reliability of the neural responses across trials, confirm that the sparse population responses were repeatedly evoked by stimuli, and not by random eye-movement jitters.

Two-photon imaging

Following a 10-day recovery period after the second surgery, the animals were trained to maintain eye-fixation. Two-photon imaging was performed using a Prairie Ultima IV (In Vivo) 2P microscope (Bruker Nano, Inc., FMBU, formerly Prairie Technologies) powered by a Ti:Sapphire laser (Mai Tai eHP, Spectra Physics). The wavelength of the laser was set at 1000 nm. With a 16× objective (0.8 N.A., Nikon), an area of 850 μm × 850 μm was imaged. A standard slow galvanometer scanner was used to obtain static images of cells with high resolution (1024 × 1024). The fast resonant scan (up to 32 frames per second) was used to obtain images of neuronal activity. The images were recorded at 8 frames per second by averaging every 4 frames. Infected cells of up to 700 μm in depth were imaged. We primarily focused on cells that were 160 μm to 180 μm deep, which included a high density of infected cells.

Imaging data analysis

All data analyses were performed using customized Matlab software (The MathWorks, Natick, MA). The images from each session were first realigned to a template image (the average image of 1000 frames) using a normalized cross-correlation-based translation algorithm, which corrected the X-Y offset of images caused by the relative movements between the objective and the cortex (Li et al., 2017).
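As an illustration of translation-only realignment by normalized cross-correlation, the sketch below searches a small range of integer X-Y shifts and returns the one that best matches a template. It is not the authors' Matlab implementation: the function names, the brute-force search, the integer-pixel resolution, and the `max_shift` search range are all assumptions chosen to keep the example short.

```python
def ncc(a, b):
    """Normalized cross-correlation of two equally long lists of pixel values."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    norm_a = sum((x - mean_a) ** 2 for x in a) ** 0.5
    norm_b = sum((y - mean_b) ** 2 for y in b) ** 0.5
    return num / (norm_a * norm_b) if norm_a > 0 and norm_b > 0 else 0.0


def best_shift(template, frame, max_shift=2):
    """Return (dy, dx) such that frame[y + dy][x + dx] best matches template[y][x].

    ASSUMPTION: shifts are integer pixels within +/- max_shift, and only the
    central window (which stays inside the frame for every shift) is compared.
    """
    h, w = len(template), len(template[0])
    best = (float("-inf"), 0, 0)
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            t_pix, f_pix = [], []
            for y in range(max_shift, h - max_shift):
                for x in range(max_shift, w - max_shift):
                    t_pix.append(template[y][x])
                    f_pix.append(frame[y + dy][x + dx])
            score = ncc(t_pix, f_pix)
            if score > best[0]:
                best = (score, dy, dx)
    return best[1], best[2]
```

Once the shift is known, the frame can be realigned by reading each pixel from `frame[y + dy][x + dx]`. A practical implementation would typically use FFT-based correlation for speed on 1024 × 1024 images.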

The cell density was high in superficial V1, and many cell bodies were quite dim at rest. It was difficult to identify these cells directly by eye or by algorithm on the basis of their morphology as captured in static images. We therefore identified ROIs for cell bodies on the basis of their responses. The differential images (the average of the frames in the stimulus ON period minus the average of the frames in the stimulus OFF period, for each stimulus condition) were first filtered using low-pass and high-pass Gaussian filters (5 pixels and 50 pixels). Notably, these two filters were used solely for ROI identification. In all further analyses, we used the raw data without any filtering. Connected subsets of pixels (>25 pixels) with average pixel value greater than 3 stds in these differential images were identified as active neuronal ROIs. Note that this empirical value of 3 stds was used only for deciding the ROIs of the activated cells, and was not used as a cutoff threshold for measuring neuronal responses (Figure 1—figure supplement 3). The ratio of fluorescence change (ΔF/F0) of these ROIs was calculated for each activated cell. ΔF = F − F0, where F0 is the baseline activity during the blank screen prior to stimulus onset in each trial and F is fluorescence activity in the ROI during stimulus presentation in the trial. A neuropil-correction was performed with an index of 0.7 (Chen et al., 2013).
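The per-trial response computation can be sketched as below. The text states only that a neuropil correction with an index of 0.7 was applied; the specific form used here (subtracting 0.7 times the surrounding neuropil fluorescence from the ROI fluorescence before computing ΔF/F0) is an assumption based on the common convention, and the function and argument names are hypothetical.

```python
def delta_f_over_f0(f_blank, f_stim, neuropil_blank=None, neuropil_stim=None, r=0.7):
    """Compute dF/F0 = (F - F0) / F0 for one ROI in one trial.

    f_blank / f_stim: ROI fluorescence samples during the blank and stimulus periods.
    neuropil_*: optional neuropil fluorescence samples for the same frames.
    ASSUMPTION: neuropil correction is F_roi - r * F_neuropil with r = 0.7.
    """
    def corrected(trace, neuropil):
        if neuropil is None:
            return list(trace)
        return [f - r * n for f, n in zip(trace, neuropil)]

    blank = corrected(f_blank, neuropil_blank)
    stim = corrected(f_stim, neuropil_stim)
    f0 = sum(blank) / len(blank)  # baseline during the blank screen
    f = sum(stim) / len(stim)     # mean fluorescence during the stimulus
    return (f - f0) / f0
```

Because the neuropil term is subtracted from both F and F0, the correction lowers the baseline and therefore increases ΔF/F0 for a given raw fluorescence change.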

Sparseness measure

The sparseness measure is used to quantify the peakedness of the response distribution. There are several different definitions of sparseness and corresponding sparseness measures (Willmore et al., 2011). One intuitive definition of sparse codes, described by Willmore et al., 2011, is that “the population response distribution that is elicited by each stimulus is peaked (exhibits population sparseness). A peaked distribution is one that contains many small (approximately zero) magnitude values and only a small number of large values. Thus, a neural code will have high population sparseness if only a small proportion of the neurons are strongly active at any given time.” A measure consistent with this intuition is the percentage of neurons that responded strongly, above a certain threshold relative to their peak response. This measure has been used in other studies (Rust and DiCarlo, 2012), typically with a half-peak response threshold.

The sparseness measures based on the calculation suggested by Rolls and Tovee (1995) and by Vinje and Gallant (2000) are popular for quantifying the sparseness of spiking data, but they are very sensitive to measurement noise and uncertain baselines because of nonlinearities and missing responses in the low spiking-rate range (<10 Hz) in calcium imaging (Li et al., 2017). The sparseness measure that we used in this study, which is based on the percentage of the cells or stimuli above the half-maximum of each neuron, is much less sensitive to low-level activities (iceberg effect) or baseline fluctuations in the calcium signal.
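The half-maximum measure described above can be written compactly. The following is a minimal sketch, assuming the responses are arranged as an images-by-neurons matrix of mean ΔF/F0 values; the function name and the toy numbers are illustrative only.

```python
def half_max_sparseness(responses):
    """Half-maximum sparseness from responses[i][j] (image i, neuron j).

    Returns (population, lifetime):
      population[i] = fraction of neurons above half of their own peak for image i
      lifetime[j]   = fraction of images driving neuron j above half of its peak
    """
    n_images = len(responses)
    n_neurons = len(responses[0])
    # Peak response of each neuron across the whole stimulus set.
    peaks = [max(responses[i][j] for i in range(n_images)) for j in range(n_neurons)]
    # strong[i][j] is True when neuron j exceeds half of its own peak for image i.
    strong = [[responses[i][j] > 0.5 * peaks[j] for j in range(n_neurons)]
              for i in range(n_images)]
    population = [sum(row) / n_neurons for row in strong]
    lifetime = [sum(strong[i][j] for i in range(n_images)) / n_images
                for j in range(n_neurons)]
    return population, lifetime


# Toy example: 3 images x 4 neurons.
toy = [
    [1.0, 0.1, 0.0, 0.2],
    [0.1, 2.0, 0.1, 0.9],
    [0.0, 0.3, 0.8, 2.0],
]
population, lifetime = half_max_sparseness(toy)
```

Smaller values indicate sparser codes; with the recorded data, the averages of these fractions correspond to the roughly 0.5% figures reported in the Results.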

Stability and reliability of the neuronal measurements

For each single neuron, we examined whether the sparse strong responses (ΔF/F0 >50% max) observed across the 2250 stimuli were reliable across trials by performing the following ROC analysis (Quiroga et al., 2005). We set all the stimuli that produced mean responses greater than 50% of the observed maximum mean peak of the cell to be in the ON class, and all other stimuli to be in the OFF class. We computed the ROC for classifying the ON class against the OFF class based on the response of each single trial. If the responses above the half-maximum are stable across all trials, then the AUC (the area under the ROC curve) will be close to 1.0, because the ON and OFF classes are readily discriminable. The null hypothesis is that sparse strong responses are spurious single-trial epileptic responses that are not repeatable across trials. To test this hypothesis, we shuffled all the responses against the stimulus labels, and recomputed the mean responses for all the stimuli across the trials. We performed 1000 shuffles. We found that most of the shuffled cases have much lower average peak responses because of the mismatch of the rigorous sparse responses across trials, suggesting that the sparse responses in the original data are reliable. To make an even stricter and fairer comparison with the original data in ROC terms, for each shuffle, we recomputed the maximum mean response, used half of this maximum as the threshold to sort the stimuli into ON and OFF classes, and repeated the ROC analysis to obtain the AUC for this shuffle. The p-value is the fraction of the 1000 shuffles whose AUC reaches the AUC of the original data. With this ROC analysis, we found that the null hypothesis was rejected (p<0.01) for >96% of the neurons (Figure 1—figure supplement 2).
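The reliability test above can be sketched for a single neuron as follows. This is an illustrative reading of the procedure, not the authors' code: the function names are hypothetical, the AUC is computed by direct pairwise comparison (equivalent to the Mann-Whitney statistic), each stimulus is assumed to have the same number of trials, and returning 0.5 when one class is empty is a convention added for the example.

```python
import random


def auc(on_values, off_values):
    """Probability that a random ON trial exceeds a random OFF trial (ties count 0.5)."""
    if not on_values or not off_values:
        return 0.5  # ASSUMPTION: undefined AUC is reported as chance level
    wins = 0.0
    for a in on_values:
        for b in off_values:
            wins += 1.0 if a > b else 0.5 if a == b else 0.0
    return wins / (len(on_values) * len(off_values))


def half_max_auc(trial_responses):
    """trial_responses[k] = single-trial responses of one neuron to stimulus k."""
    means = [sum(r) / len(r) for r in trial_responses]
    threshold = 0.5 * max(means)
    on = [v for r, m in zip(trial_responses, means) if m > threshold for v in r]
    off = [v for r, m in zip(trial_responses, means) if m <= threshold for v in r]
    return auc(on, off)


def shuffle_p_value(trial_responses, n_shuffles=1000, seed=0):
    """Return (observed AUC, fraction of shuffles whose AUC reaches it)."""
    rng = random.Random(seed)
    observed = half_max_auc(trial_responses)
    n_rep = len(trial_responses[0])
    flat = [v for r in trial_responses for v in r]
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(flat)  # break the link between responses and stimulus labels
        shuffled = [flat[i:i + n_rep] for i in range(0, len(flat), n_rep)]
        # The ON/OFF threshold is recomputed from the shuffled means.
        if half_max_auc(shuffled) >= observed:
            hits += 1
    return observed, hits / n_shuffles
```

A neuron whose strong responses repeat across trials yields an observed AUC near 1.0 and a small shuffle fraction, which is the case in which the null hypothesis is rejected.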

Decoding analysis

We used a nearest centroid classifier to discriminate the 2250 images on the basis of the population responses in each trial. As each image was tested three times, the nearest centroid classifier was trained on the basis of two trials for all images and tested on the hold-out trials. We repeated the procedure for each trial, performing 3-fold cross-validations.

For each monkey, we constructed neural response matrices (with dimension 2250 × 1225 for monkey A, and 2250 × 982 for monkey B) for three trials X⁽¹⁾, X⁽²⁾, and X⁽³⁾ that store the neural responses to all images in each trial as rows in its matrix. We trained and tested nearest-centroid classifiers via a three-fold cross-validation procedure across trials in a 2250-way image decoding task. Specifically, for trial t, during training, we computed the centroids of the other two trials C^(t) (if t = 1, C⁽¹⁾= (X⁽²⁾ + X⁽³⁾)/2; if t = 2, C⁽²⁾= (X⁽¹⁾ + X⁽³⁾)/2, etc.) and stored C^(t) in the classifier; during testing, given some row k of X^(t), which is the population neural response vector to image k in trial t, the (trained) classifier computed the Euclidean distances between row k of X^(t) and every row of C^(t). The model outputted the index (1,2,…,2249,2250) of the row in C^(t) that gives the smallest distance. The correct output is k and all other outputs are incorrect. The average decoding accuracy for this trial is defined as the percentage of correct outputs over all rows of X^(t). We repeated the above procedure for each trial and reported the average of three (average) decoding accuracies.
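The cross-validated nearest-centroid procedure above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the code in the authors' repository: the function name is hypothetical, and the three trial matrices are assumed to be stacked into one array of shape (trials, images, neurons).

```python
import numpy as np

def decode_accuracy(trials):
    """Leave-one-trial-out nearest-centroid decoding accuracy.

    trials: array (n_trials, n_images, n_neurons); trials[t] plays the
    role of X^(t) and the mean of the remaining trials that of C^(t).
    """
    n_trials, n_images, _ = trials.shape
    accuracies = []
    for t in range(n_trials):
        test = trials[t]
        centroids = np.delete(trials, t, axis=0).mean(axis=0)
        # Squared Euclidean distance between every test row and every
        # centroid, using ||a - b||^2 = ||a||^2 - 2 a.b + ||b||^2.
        d2 = ((test ** 2).sum(axis=1)[:, None]
              - 2.0 * test @ centroids.T
              + (centroids ** 2).sum(axis=1)[None, :])
        predicted = d2.argmin(axis=1)
        # The correct output for row k is k.
        accuracies.append(np.mean(predicted == np.arange(n_images)))
    return float(np.mean(accuracies))

# Toy data (hypothetical): 50 images, 40 neurons, 3 trials.
rng = np.random.default_rng(0)
signal = rng.standard_normal((50, 40))
noisy_trials = signal[None] + 0.1 * rng.standard_normal((3, 50, 40))
acc_signal = decode_accuracy(noisy_trials)                       # repeatable code
acc_noise = decode_accuracy(rng.standard_normal((3, 50, 40)))    # no shared signal
```

Minimizing the squared distance gives the same prediction as minimizing the Euclidean distance, so the square root is omitted.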

In our experiments, we first set the X^(t)s defined above to be the original recorded neural responses and computed the decoding accuracies for both monkeys. We refer to the accuracies obtained from the original neural data as ‘achievable decoding accuracies’. Later, to evaluate the amount of information in the strong sparse portions of the neural data, we set X^(t)s to be thresholded versions of the original data. We tried two classes of thresholding methods: ‘top only’ (red in Figure 3) and ‘top excluded’ (blue in Figure 3). In ‘top only’, we kept only the largest responses (p%) across images and trials in the thresholded version and set the smaller responses (100-p%) to zero. In ‘top excluded’, which is complementary to ‘top only’, we set the largest responses (p%) to zero and kept the smaller responses (100-p%). For both ‘top only’ and ‘top excluded’, we evaluated decoding accuracies at the following percentages (crosses in Figure 3): 0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 40, 50, 60, 70, 80, 90, and 99.
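The two complementary thresholding schemes can be sketched as below. The function name is hypothetical, and the sketch assumes that a single absolute threshold is taken over all entries of the response array (all trials, images and neurons); the text does not state how ties at the threshold were handled.

```python
import numpy as np

def split_by_top_percent(trials, p):
    """Return the 'top only' and 'top excluded' versions of the responses.

    trials: array of responses (any shape); p: percentage in [0, 100].
    Assumption: one absolute threshold is computed over all entries.
    """
    if p <= 0:
        # Nothing is kept in 'top only'; nothing is removed in 'top excluded'.
        return np.zeros_like(trials), trials.copy()
    threshold = np.percentile(trials, 100.0 - p)
    is_top = trials >= threshold
    top_only = np.where(is_top, trials, 0.0)
    top_excluded = np.where(is_top, 0.0, trials)
    return top_only, top_excluded

# Toy data (hypothetical): responses 1..100 arranged as 1 trial x 10 x 10.
toy = np.arange(1, 101, dtype=float).reshape(1, 10, 10)
top_only, top_excluded = split_by_top_percent(toy, 10)
none_kept, all_kept = split_by_top_percent(toy, 0)
```

Either output can then be passed to the decoder in place of the original responses.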

Given that the population sparseness was computed on the basis of the half-maximum of each individual neuron’s response, we also repeated the decoding experiment using a percentage threshold that is relative to each neuron’s peak response, rather than the absolute response threshold, to select the ‘top responding’ neurons to be included or excluded. The excluded responses were again set to 0 in the data matrix, and the decoder was trained and tested as before.
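A per-neuron version of the thresholding can be sketched in the same way. The function name is hypothetical, and the sketch assumes that each neuron's peak is taken over all single-trial responses; a peak based on trial-averaged responses would be an equally plausible reading of the text.

```python
import numpy as np

def split_by_fraction_of_peak(trials, frac):
    """Threshold each neuron relative to its own peak response.

    trials: array (n_trials, n_images, n_neurons); frac: value in [0, 1].
    Assumption: the peak is the maximum over all trials and images.
    Returns the 'top only' and 'top excluded' arrays.
    """
    peak = trials.max(axis=(0, 1), keepdims=True)
    is_top = trials >= frac * peak
    return np.where(is_top, trials, 0.0), np.where(is_top, 0.0, trials)

# Toy data (hypothetical): 1 trial x 3 images x 2 neurons.
toy = np.array([[[1.0, 0.20],
                 [0.4, 0.05],
                 [0.6, 0.08]]])
rel_top_only, rel_top_excluded = split_by_fraction_of_peak(toy, 0.5)
```

Unlike a single absolute threshold, this keeps the strongest responses of weakly driven neurons as well.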

Software available

The code used for the decoding analysis and image movement correction can be found in https://github.com/leelabcnbc/sparse-coding-elife2018 (Zhang et al., 2018; copy archived at https://github.com/elifesciences-publications/sparse-coding-elife2018).

Acknowledgements

We are grateful to many colleagues for their insightful discussion and generous help on this paper, and in particular to Stephen L Macknik, Susana Martinez-Conde and Shefali Umrania for editing the manuscript. We thank Wenbiao Gan for the early provision of AAV-GCaMP5; and Peking University Laboratory Animal Center for excellent animal care. We acknowledge the Janelia Farm program for providing the GCaMP5-G construct, specifically Loren L Looger, Jasper Akerboom, Douglas S Kim, and the Genetically Encoded Calcium Indicator (GECI) project at Janelia Farm Research Campus Howard Hughes Medical Institute.

Funding Statement

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Contributor Information

Shiming Tang, Email: tangshm@pku.edu.cn.

Tai Sing Lee, Email: tai@cnbc.cmu.edu.

Emilio Salinas, Wake Forest School of Medicine, United States.

Funding Information

This paper was supported by the following grants:

National Natural Science Foundation of China 31730109 to Shiming Tang.
National Natural Science Foundation of China China Outstanding Young Researcher Award 30525016 to Shiming Tang.
National Basic Research Program of China 2017YFA0105201 to Shiming Tang.
Peking University Project 985 grant to Shiming Tang.
Beijing Municipal Commission of Science and Technology Z151100000915070 to Shiming Tang.
NIH Office of the Director 1R01EY022247 to Tai Sing Lee.
National Science Foundation CISE 1320651 to Tai Sing Lee.
Intelligence Advanced Research Projects Activity D16PC00007 to Tai Sing Lee.

Additional information

Competing interests

No competing interests declared.

Author contributions

Conceptualization, Resources, Data curation, Software, Funding acquisition, Investigation, Methodology, Writing—original draft, Project administration, Writing—review and editing.

Data curation, Writing—original draft, Writing—review and editing.

Data curation, Investigation.

Methodology.

Conceptualization, Data curation, Funding acquisition, Investigation, Writing—original draft, Writing—review and editing.

Ethics

Animal experimentation: All procedures involving animals were in accordance with the Guide of Institutional Animal Care and Use Committee (IACUC) of Peking University Animals, and approved by the Peking University Animal Care and Use Committee (LSC-TangSM-5). All surgery was performed under general anesthesia and strictly sterile conditions, and every effort was made to minimize suffering.

Additional files

Transparent reporting form

elife-33370-transrepform.docx (241.2 KB, docx)

DOI: 10.7554/eLife.33370.010

Data availability

All data generated or analysed during this study are included in the manuscript and supporting files. Source data files have been provided for Figures 1, 2 and 3.

References

Akerboom J, Chen TW, Wardill TJ, Tian L, Marvin JS, Mutlu S, Calderón NC, Esposti F, Borghuis BG, Sun XR, Gordus A, Orger MB, Portugues R, Engert F, Macklin JJ, Filosa A, Aggarwal A, Kerr RA, Takagi R, Kracun S, Shigetomi E, Khakh BS, Baier H, Lagnado L, Wang SS, Bargmann CI, Kimmel BE, Jayaraman V, Svoboda K, Kim DS, Schreiter ER, Looger LL. Optimization of a GCaMP calcium indicator for neural activity imaging. Journal of Neuroscience. 2012;32:13819–13840. doi: 10.1523/JNEUROSCI.2601-12.2012.
Barlow HB. The Ferrier Lecture, 1980: critical limiting factors in the design of the eye and visual cortex. Proceedings of the Royal Society B: Biological Sciences. 1981;212:1–34. doi: 10.1098/rspb.1981.0022.
Chen TW, Wardill TJ, Sun Y, Pulver SR, Renninger SL, Baohan A, Schreiter ER, Kerr RA, Orger MB, Jayaraman V, Looger LL, Svoboda K, Kim DS. Ultrasensitive fluorescent proteins for imaging neuronal activity. Nature. 2013;499:295–300. doi: 10.1038/nature12354.
Denk W, Strickler JH, Webb WW. Two-photon laser scanning fluorescence microscopy. Science. 1990;248:73–76. doi: 10.1126/science.2321027.
Froudarakis E, Berens P, Ecker AS, Cotton RJ, Sinz FH, Yatsenko D, Saggau P, Bethge M, Tolias AS. Population code in mouse V1 facilitates readout of natural scenes through increased sparseness. Nature Neuroscience. 2014;17:851–857. doi: 10.1038/nn.3707.
Haider B, Krause MR, Duque A, Yu Y, Touryan J, Mazer JA, McCormick DA. Synaptic and network mechanisms of sparse and reliable visual cortical activity during nonclassical receptive field stimulation. Neuron. 2010;65:107–121. doi: 10.1016/j.neuron.2009.12.005.
Hegdé J, Van Essen DC. A comparative study of shape representation in macaque visual areas v2 and v4. Cerebral Cortex. 2007;17:1100–1116. doi: 10.1093/cercor/bhl020.
Hromádka T, Deweese MR, Zador AM. Sparse representation of sounds in the unanesthetized auditory cortex. PLoS Biology. 2008;6:e16. doi: 10.1371/journal.pbio.0060016.
Lennie P. The cost of cortical computation. Current Biology. 2003;13:493–497. doi: 10.1016/S0960-9822(03)00135-0.
Li M, Liu F, Jiang H, Lee TS, Tang S. Long-Term Two-Photon Imaging in Awake Macaque Monkey. Neuron. 2017;93:1049–1057. doi: 10.1016/j.neuron.2017.01.027.
Olshausen BA, Field DJ. Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature. 1996;381:607–609. doi: 10.1038/381607a0.
Olshausen BA. Highly overcomplete sparse coding. Proceedings of SPIE; 2013. 86510S.
Quian Quiroga R, Panzeri S. Extracting information from neuronal populations: information theory and decoding approaches. Nature Reviews Neuroscience. 2009;10:173–185. doi: 10.1038/nrn2578.
Quiroga RQ, Reddy L, Kreiman G, Koch C, Fried I. Invariant visual representation by single neurons in the human brain. Nature. 2005;435:1102. doi: 10.1038/nature03687.
Rehn M, Sommer FT. A network that uses few active neurones to code visual input predicts the diverse shapes of cortical receptive fields. Journal of Computational Neuroscience. 2007;22:135–146. doi: 10.1007/s10827-006-0003-9.
Rolls ET, Tovee MJ. Sparseness of the neuronal representation of stimuli in the primate temporal visual cortex. Journal of Neurophysiology. 1995;73:713–726. doi: 10.1152/jn.1995.73.2.713.
Rust NC, DiCarlo JJ. Balanced increases in selectivity and tolerance produce constant sparseness along the ventral visual stream. Journal of Neuroscience. 2012;32:10170–10182. doi: 10.1523/JNEUROSCI.6125-11.2012.
Tang S, Lee TS, Li M, Zhang Y, Xu Y, Liu F, Teo B, Jiang H. Complex Pattern Selectivity in Macaque Primary Visual Cortex Revealed by Large-Scale Two-Photon Imaging. Current Biology. 2018;28:38–48. doi: 10.1016/j.cub.2017.11.039.
Vinje WE, Gallant JL. Sparse coding and decorrelation in primary visual cortex during natural vision. Science. 2000;287:1273–1276. doi: 10.1126/science.287.5456.1273.
Willmore BD, Mazer JA, Gallant JL. Sparse coding in striate and extrastriate visual cortex. Journal of Neurophysiology. 2011;105:2907–2919. doi: 10.1152/jn.00594.2010.
Zhang Y, Tang S, Li M. sparse-coding-elife2018. 33b196c. 2018. https://github.com/leelabcnbc/sparse-coding-elife2018

eLife. doi: 10.7554/eLife.33370.014

Decision letter

Editor: Emilio Salinas

In the interests of transparency, eLife includes the editorial decision letter and accompanying author responses. A lightly edited version of the letter sent to the authors after peer review is shown, indicating the most substantive concerns; minor comments are not usually included.

Thank you for submitting your article "Large-scale two-photon imaging revealed super-sparse population codes in V1 superficial layer of awake monkeys" for consideration by eLife. Your article has been reviewed by three peer reviewers, and the evaluation has been overseen by a Reviewing Editor and David Van Essen as the Senior Editor. The following individuals involved in review of your submission have agreed to reveal their identity: Anna Wang (Reviewer #2); Stanley Klein (Reviewer #3).

The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.

Summary:

In this study, Tang and colleagues use two-photon calcium imaging in awake monkeys to investigate the sparseness of responses in V1. All reviewers were enthusiastic about the manuscript, noting that it was technically novel and conceptually important for delineating how V1 encodes the visual world. For instance, 5 important advances were pointed out: (1) No previous study has provided a direct data-based quantitative measure of sparseness in terms of number of cells in the population (<0.5%). This finding demonstrates that very few cells, for a large number of stimulus types (from simple to complex), are needed to encode each stimulus. This is a conclusion that, due to limitations of sampling, could not be arrived at by existing single unit neurophysiology approaches. (2) The study presents a large number of different stimuli, both simple and complex, to a large dense population of single cells, all observed simultaneously in monkey cortex. This is an innovative way to track how all cells within a single locus of cortex respond under a large number of different conditions. (3) This study invites reconsideration of V1 as an area that encodes simple features of visual stimuli. (4) This study opens up new ideas about what a hypercolumn is. (5) Technically, this is a high quality, tour-de-force study of monkey visual cortex that few in the world can achieve.

The reviewers also pointed out several areas where the manuscript requires improvement, most notably (1) discussing the possibility that GCaMP is missing lower firing rate neurons; (2) improving the clarity and justification of the analysis methods, as well as possible alternate interpretations of current findings; (3) improving the discussion of why these findings are novel, and what are the implications for the role of V1 in visual processing, as well as for the meaning of a hypercolumn; and (4) making the figures more clear. Specific comments and suggestions follow.

Essential revisions:

1) Concerns about GCaMP5.

1a) The authors point to the linearity of GCaMP5 as an advantage of the current study. While this is correct for large enough firing rates, it fails to mention the significant iceberg effect of GCaMP5, which is clearly demonstrated in the authors' previous work (only spike rates larger than 10 Hz evoke detectable fluorescence changes). In the current paper, the iceberg effect is only acknowledged in passing in the Materials and methods section. This iceberg effect is a serious issue for the current study. This paper reports average firing rates between 15 and 30 Hz, which would suggest that a detection threshold of 10 Hz for the calcium imaging indeed will result in an overestimation of sparseness by failing to detect weaker responses. Most of the chronic calcium indicators currently available (including GCaMP6) have this problem, so there is currently no suitable alternative. However, the shortcomings of the chosen indicator, and the resulting potential for overestimating sparseness, should be addressed much more clearly than is currently the case. In its current form the paper is misleading, because it does not acknowledge the confounds of the GCaMP nonlinearities on the sparseness measurements.

1b) One fix for this problem could be to include actual measurements of spike rates for a population of V1 neurons. While those data would suffer from sampling biases not present in the two-photon data, they would still be the strongest possible complement to this data set. Without spike data, it is basically impossible to assess how much of a problem the iceberg effect could be.

1c) Perhaps a sensitivity analysis could be performed to test different assumptions about the response distributions below the Ca signal threshold. For instance, what happens if each "non-responsive" neuron is assigned a random response between 0 and 10 spikes/s? How would the sparseness results change in that case? That type of statistical analysis could be useful to estimate or provide bounds to the error in the sparseness measurement caused by the iceberg effect.

1d) The statement that the linearity of GCaMP5 makes the sensor more suitable than GCaMP6 (which potentially saturates at higher rates) is incorrect. Determining sparseness across a population requires a determination of how many neurons are 'on' during a particular stimulus presentation, not accurate measurements of tuning functions. The same holds for determining the life-time sparseness of a neuron, which also only requires a determination of how many stimuli drive a neuron over a baseline level, not the precise tuning function.

2) Concerns about statistical/analysis methods

2a) Sparseness measures. The authors use the half-height bandwidth as a measure for sparseness. This choice is rather arbitrary and should be further justified. It would seem more plausible to count all stimuli that evoke responses that are significantly larger than the baseline. At a minimum, the authors should explore how their assessment of sparseness changes if the criterion threshold is changed (how many stimuli evoke responses that are 10%, 20% etc. of the maximum response). In general, any such measure is problematic because of the iceberg effect of GCaMP mentioned above. This also needs to be discussed more explicitly.

2b) Comparison to traditional sparseness measures. The authors assert that sparseness measures used previously (as the one by Rolls and Tovee) are not applicable here because they are sensitive to changes in baseline level. However, these previous studies used baseline subtracted firing rates to calculate sparseness. The sensitivity of the traditional measures to changes in baseline levels therefore requires further explanation.

2c) Decoding. The 2 analysis parts of the paper are somewhat disconnected. One emphasizes single cell selectivity, whereas the other emphasizes population sparseness. It might be useful to set up the idea that single cell selectivity does or does not predict population sparseness. It seems two concepts are correlated, but I could imagine that this need not be so.

The first part assesses sparseness by thresholding responses on a per neuron basis. The second part assesses decoding based on groups of neurons by thresholding the population. What happens to the decoding if the response matrix is computed with the thresholding of part 1 applied (i.e., setting all responses below the half maximum for a neuron to 0)?

Furthermore, the discussion of the decoding results should be improved. Currently, it seems to imply a rather arbitrary threshold of around 20% that is considered 'good decoding' (e.g., in the comparison of the decoding results from the top 0.5% – which are around 20-30%, and the decoding results from the bottom 0.5% – which are around 15%). Both are far from the chance level, so these statements need to be further justified.

Finally, the authors conclude that the comparison of decoding performance for top and bottom responses demonstrates that strong responses are both necessary and sufficient to transfer relevant information. This is incorrect. The sufficiency is indeed demonstrated (accepting the assertion that decoding performance above some threshold constitutes successful decoding). However, to demonstrate necessity, they would have to demonstrate that successful decoding always implies the occurrence of strong responses. This is not the same as demonstrating that weak responses do not allow 'successful' decoding.

2d) It would be interesting to compare the sparseness of responses evoked by natural images to that evoked by gratings (which in other studies have been shown to drive a large percentage of superficial V1 neurons). This would also allow a better assessment of how many neurons could potentially respond, further alleviating concerns about cell health or other properties of the imaging region (although this concern is largely addressed by the fact that most neurons respond to at least some of the images in the natural image set).

2e) A famous paper by Quiroga, Reddy, Kreiman, Koch and Fried (2005) illustrated extremely high sparsity in cells that responded to images of Jennifer Aniston and Halle Berry. I'm actually surprised that there was no mention in the present paper of that finding by Quiroga. Although a direct comparison may not be appropriate given the differences in areas, it may still be informative to ask whether the V1 cells have greater sparseness than the Jennifer Aniston cell.

Another reason for connecting to the Quiroga paper is that they also do ROC analyses, but their ROC curves look very different than those of the present paper (see point 5e below). The comparison may provide further evidence that the sparseness calculations, including the ROC calculations, were done properly.

3) Clarification of experimental methods employed.

3a) Cell count. The overall number of cells per imaging region is crucial for estimating sparseness. The ROI-definition procedure adopted by the authors appears reasonable and well justified. However, a few additional details would be useful:

- Additional images of identified cells so that the accuracy of the chosen approach can be assessed.

- What does the imaging region from monkey 2 look like?

- Which manual steps are involved in the procedure (presumably, somebody checks the identified ROIs)?

- Were all data collected on the same day? If not, how are data and cell counts combined across days? How many cells were stable across those days?

- How many of these cells are filled in versus have a ring-like label? This will help to assess how many of the included neurons are presumably healthy and should exhibit normal responses.

3b) Visual stimuli. Only the size of the stimuli is given. What are the other characteristics of the natural image set? Are they in color? Are they isoluminant with the background? What spatial frequencies and colors do they span? How different is their content? Are they part of one of the standard sets of natural images used in other studies?

The claim that single neurons respond to similar features of stimuli is not well supported and premature [“neuron 653 of monkey A was most excited when its receptive field (0.8° in diameter) covered the lower rim of the cat's eye”; “neuron 949 of Monkey A was found to be selective to an opposite curvature embedded in its preferred natural stimulus set”].

4) Points for further discussion/elaboration.

4a) The Discussion is focused on issues of sparseness. However, there are two issues that are also worth mentioning. First, this study provides a fresh view of what V1 is doing, one that shifts the emphasis from simple orientation selectivity to complex natural stimuli, and which gives a novel perspective on how the brain encodes natural stimuli. And second, the results show that, within the span of a single hypercolumn, all stimuli presented could be largely decoded. This supports Hubel and Wiesel's original concept that a hypercolumn contains all the machinery to encode everything at a single point in space, except that the manner of encoding may be distinct from (or more complex than) the original concept of selection from amongst an array of systematically organized ocular dominance, orientation, and color columns.

4b) A very important item that should be made very clear is the last sentence of the Abstract where the authors correctly claim that this is the first paper that shows sparseness of neuronal population responses in V1 in primates. They need to point out that papers like Froudarakis et al. (2014) were in mice and papers like Quiroga et al. (2005) were in humans but not in V1. The statement at the end of the tenth paragraph of the Results and Discussion needs to make that clear. It is such an important point that it needs to be pointed out in the Introduction and the Conclusion. Other researchers would then become aware of the power of two-photon imaging.

5) Clarification of data/analyses in the figures.

5a) I find Figure 1F and G vs. L and M description very confusing. I think this is the interpretation: For population sparseness, one expects the distribution to show that for most images only few cells respond. For single cell selectivity, one expects each cell to respond to only a few images. Somehow the description of these graphs seems garbled. [E.g. then for each picture only 0.5% of cells responded means: in Monkey A, for 250 pictures 1 cell responded; for ~200 pictures 6 cells responded; for 5 pictures 20 cells responded. Alternative interpretation of graph: 1 cell responded to 250 pictures; 6 cells responded to ~200 pictures; 20 cells responded to 5 pictures? I would not use 'half-height bandwidths'. Use 'number of cells'. Why is stimulus specificity called 'life-time sparseness'? F-G, L-M should be described better to distinguish what each is saying (single cell selectivity vs. population sparseness/redundancy). Maybe partly it is the terminology that is used.]

5b) Figure 1: Why are there points below zero in D and E?

5c) Figure 1: 'Cells 653 and 949 are colored red respectively.' Don't see this.

5d) Figure 2: 0.5% contributes 50% of information and 5% contributes to 80% of information, what are remaining cells doing?

5e) Figure 1—figure supplement 2: Item C shows ROC curves for 99 shuffled trials. The part that wasn't at all clear to me was why did you compare the results to a shuffled version of the results. And why did the shuffled data have such a high hit rate at zero false alarms. I would have thought that the shuffling would greatly reduce the hit rate. That is quite different from Quiroga's (2005) paper, which shows more normal curves with the false positive rate close to zero. If the authors are unable to use the Quiroga method then they should explain why, and why they end up with the very unusual shape for the ROC curves.

5f) Perhaps it would be helpful to the reader to have Figure 1 broken up into multiple figures. By doing that the figures would be larger with details more visible. One improvement that would be helpful for Figures 1D, E would be to have the x-axis be a log axis so that the cell rank of the first 20 or so neurons would be more visible. I believe this is the plot that best demonstrates sparseness so it needs to be very clear.

I would suggest also spending substantially more effort clarifying F and F0 on which panels D and E are based. Another question is what is the role of noise for the one second intervals where the stimulus is shown? There is expected to be more noise in that interval than in the interval without the stimulus. How is that noise estimated?

5g) In connection with Figure 1—figure supplement 2 it would be useful to show A and B as histograms in addition to the way they are presently shown.

5h) I suspect Figure 2 is an important figure. However it wasn't at all clear to me how it was calculated. I did figure out that the small inset was a continuation of the large plot that stopped at 10%. I wonder if the inset could be removed if log axes were used for the x-axis and the data would go from say 0.1% to 100%. But more important is to clarify how the red and blue data were calculated.

eLife. 2018 Apr 26;7:e33370. doi: 10.7554/eLife.33370.015

Author response

Essential revisions:

1) Concerns about GCaMP5.

1a) The authors point to the linearity of GCaMP5 as an advantage of the current study. While this is correct for large enough firing rates, it fails to mention the significant iceberg effect of GCaMP5, which is clearly demonstrated in the authors' previous work (only spike rates larger than 10 Hz evoke detectable fluorescence changes). In the current paper, the iceberg effect is only acknowledged in passing in the Materials and methods section. This iceberg effect is a serious issue for the current study. This paper reports average firing rates between 15 and 30 Hz, which would suggest that a detection threshold of 10 Hz for the calcium imaging indeed will result in an overestimation of sparseness by failing to detect weaker responses. Most of the chronic calcium indicators currently available (including GCaMP6) have this problem, so there is currently no suitable alternative. However, the shortcomings of the chosen indicator, and the resulting potential for overestimating sparseness, should be addressed much more clearly than is currently the case. In its current form the paper is misleading, because it does not acknowledge the confounds of the GCaMP nonlinearities on the sparseness measurements.

The concerns about the iceberg effect may have arisen because we did not sufficiently clarify the definition of sparseness that we used. There are in fact several different definitions of sparseness rather than one. We used a definition of sparseness that is based on quantifying the peakedness of the neuronal response distribution, which is robust and less sensitive to low-level activities. The iceberg effect is not a serious problem for the sparseness measure (based on the definition we used) in this study.

1) Willmore, Mazer and Gallant, 2011 discussed four definitions of sparseness commonly used in the field. One we found intuitive is as follows: “a sparse code (is defined) as one in which the population response distribution elicited by each stimulus is peaked (we will refer to this as “population sparseness”). A peaked distribution is one that contains many small (or 0) values and a small number of large values. Thus a neural code will have high population sparseness if only a small proportion of neurons are strongly active at any given time”.

2) The half-height width of the ranked response distributions is a simple measure of the peakedness. A small half-height width of the ranked response distributions indicates that “only a small proportion of neurons are strongly active at any given time”. Thus, the half-height width of the ranked response distributions provides a standard and comparable measure of sparseness (Rust and DiCarlo, 2012).

3) There are several different definitions of sparseness (and corresponding sparseness measures) rather than one. Actually, there is no simple perfect measure of sparseness. The popular sparseness measure suggested by Rolls and Tovee cannot precisely capture the peakedness of a neuronal response distribution, because multiple response distribution profiles can give the same sparseness index. For example, consider the following two neurons: one neuron has tens of strong peak responses and hundreds of very low responses (i.e. fat head but thin tail in its ranked response distribution), while the other neuron has only a few strong responses and hundreds of medium-low responses (thin head but fat tail in the distribution). These two neurons have quite different response distributions, but could have the same Rolls and Tovee sparseness measure. That means that when a study reports a low sparseness based on Rolls and Tovee’s measure, we have no way of knowing whether this low sparseness comes from dense peak responses (the fat heads) or medium weak responses (the fat tails). A similar issue also holds for the population sparseness measure.

4) The sparseness definition suggested by the reviewers – “a determination of how many neurons are 'on' (over baseline)” – has problems too. It would treat the peaked strong responses and the distributed weak responses equally, and thus could not characterize the peakedness of the response distribution. Probably because of these defects, a simple “on” definition of sparseness, as suggested, is not widely used in this field.

5) Balancing all considerations, in this study, we used the half-height width of the ranked response distributions to quantify the peakedness of the response distribution: for population sparseness, this is the percentage of neurons that exceed half of their peak responses; for life-time sparseness, it is the percentage of stimuli that make a neuron fire at more than half of its peak response. This measure is robust and less sensitive to variations in low-level activities (including the noise and nonlinearity in the low firing range in calcium imaging). To make this issue more explicit, we have added a statement in the text that this sparsity measure does not indicate the percentage of neurons that exhibit any statistically significant response (probably a large fraction of neurons will show some weak responses, and given enough trials, these responses can be deemed significant), but rather the percentage of neurons that exhibit strong responses (above half of their maximum firing rates). Half-height is used to decide whether a response is “strong” or not for computing sparseness. The half-height bandwidth of the ranked response distribution has been used as a sparseness measure in other studies (Rust and DiCarlo, 2012), and is not an arbitrary measure that we developed.

6) As explained above, the sparseness measure based on the half-height width of the ranked response distributions (the peakedness measure) concerns mostly the strong responses (mountains), not the distributed weak responses (valleys) or the mean firing rates. Notably, a highly selective neuron with very strong responses to some optimal stimuli (which may exceed 150 Hz) will likely have a very low average firing rate (even under 10 Hz). We found, as reported in our Neuron paper, that firing rates below 10 Hz (~0.1 in dF/F0) could not be linearly measured by GCaMP5. Missing these low-level responses would surely under-estimate how many neurons turn on, as the reviewer rightly pointed out. But given that the mountains of strong responses are far above that sea level of 0.1, their half-height is also well above the sea level, and the iceberg effect is minimal for the half-height bandwidth sparseness measure used in this study (see also the discussion of 1c below).

1b) One fix for this problem could be to include actual measurements of spike rates for a population of V1 neurons. While those data would suffer from sampling biases not present in the two-photon data, they would still be the strongest possible complement to this data set. Without spike data, it is basically impossible to assess how much of a problem the iceberg effect could be.

It is true that obtaining the distribution of spiking rates of a population of V1 neurons in response to the 2,250 stimuli would be a good comparison. However, this is not a simple task. First, as the reviewers pointed out, it is very hard to sample neurons densely with microelectrodes without bias. Second, the neurons in that layer are densely packed, which would make isolation and cell-sorting difficult. Third, chronic multi-electrode recording from a particular superficial layer might not be stable enough to obtain equivalent data. Given these factors, any such comparative study would likely not be conclusive. Furthermore, given the findings of our Neuron paper, and with a better clarification of our sparseness measure and its rationale, we feel such a comparison is not strictly necessary, particularly because our sparseness measure was chosen precisely to mitigate the iceberg or sea-level effect that the reviewers were worried about.

1c) Perhaps a sensitivity analysis could be performed to test different assumptions about the response distributions below the Ca signal threshold. For instance, what happens if each "non-responsive" neuron is assigned a random response between 0 and 10 spikes/s? How would the sparseness results change in that case? That type of statistical analysis could be useful to estimate or provide bounds to the error in the sparseness measurement caused by the iceberg effect.

We have performed this test as the reviewers suggested. We assigned each non-responsive neuron (with dF/F0 < 0; also tested with dF/F0 < 0.1) a random response between 0 and 0.1 dF/F0 (roughly corresponding to 0 to 10 spikes/s). As discussed above, we found that the iceberg effect on the sparseness measure is very small. In most cases, the half-height widths of the ranked responses barely changed. In monkey A, the measured sparseness slightly increased from 0.49% to 0.50%. In monkey B, the measured sparseness did not change at all (0.41%).
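The logic of this test can be sketched on a single ranked population response. This is a simplified illustration with made-up values, not the analysis performed on the data. It takes the half-height relative to the peak of that one curve, whereas the study's measure uses each neuron's own peak, so it will not reproduce the small change seen in monkey A. Whenever the peak exceeds 0.2 dF/F0, the half-height lies above the fill range, so filling sub-threshold responses cannot change the count.

```python
import random


def half_height_count(responses):
    """Number of responses above half of the maximum response."""
    half = 0.5 * max(responses)
    return sum(1 for r in responses if r > half)


def fill_subthreshold(responses, floor=0.1, seed=0):
    """Replace every response below `floor` with a random value in [0, floor]."""
    rng = random.Random(seed)
    return [r if r >= floor else rng.uniform(0.0, floor) for r in responses]


# Hypothetical population response to one image (dF/F0): 5 strong responders,
# 20 moderate ones, and 975 "non-responsive" cells at or below zero.
population = [1.8, 1.5, 1.2, 1.0, 0.95] + [0.3] * 20 + [-0.02] * 975

before = half_height_count(population)
after = half_height_count(fill_subthreshold(population))
print("half-height count before:", before, "after:", after)
```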

1d) The statement that the linearity of GCaMP5 makes the sensor more suitable than GCaMP6 (which potentially saturates at higher rates) is incorrect. Determining sparseness across a population requires a determination of how many neurons are 'on' during a particular stimulus presentation, not accurate measurements of tuning functions. The same holds for determining the life-time sparseness of a neuron, which also only requires a determination of how many stimuli drive a neuron over a baseline level, not the precise tuning function.

We disagree with the reviewers on this point. The reviewers might have missed a crucial aspect of the definition of sparseness, which does depend on the shape of the tuning curves. As mentioned above, we used an original definition of sparseness: “a neural code will have high population sparseness if only a small proportion of neurons are strongly active at any given time”. This definition is applicable to life-time sparseness as well. We have stated that our sparseness measure is the percentage of strong responses, not simply of any significant response. We will make this point clearer in the main text.

The advantage of GCaMP5 is that it catches all the peaks in the responses even though it misses some low-level activity (below 10 Hz). GCaMP6s is more sensitive and tends to saturate above 60-80 Hz, flattening many strong peaks into plateaus and reducing the measured sparseness, although it captures more weak responses in the low-firing-rate regime. Thus GCaMP5, not GCaMP6, is more appropriate for this study. However, if one’s definition of sparseness is simply the percentage of neurons that are turned on (regardless of strength), then GCaMP6 would be more appropriate. But as discussed above, such a definition, while it seems intuitive to a general audience, does not characterize the peakedness of the response distribution, which is a critical aspect of the preferred definition of sparseness in the field (see Willmore et al., 2011).

2) Concerns about statistical/analysis methods

2a) Sparseness measures. The authors use the half-height bandwidth as a measure for sparseness. This choice is rather arbitrary and should be further justified. It would seem more plausible to count all stimuli that evoke responses that are significantly larger than the baseline. At a minimum, the authors should explore how their assessment of sparseness changes if the criterion threshold is changed (how many stimuli evoke responses that are 10%, 20% etc. of the maximum response). In general, any such measure is problematic because of the iceberg effect of GCaMP mentioned above. This also needs to be discussed more explicitly.

As discussed above (1a), we did not adopt “count all stimuli that evoke responses that are significantly larger than the baseline” or “a determination of how many neurons are 'on' (over baseline)” as a definition of sparseness, because measures based on such a definition treat peaked strong responses and distributed weak responses equally and cannot characterize the peakedness of the response distribution, which is a critical aspect of the sparseness definitions commonly used in the field (see Willmore et al., 2011).

Denser sampling of the bandwidths of the ranked response curve at various response height thresholds (such as 75%, 50%, 25% and so on) might provide a more comprehensive profile of the response distributions, as shown in Author response image 1. However, the half-height width of the ranked response distributions is commonly used for counting strong responses and quantifying peakedness. A very low height threshold will not work for quantifying peakedness. The sparseness measure based on the peakedness analysis is not sensitive to the low-level iceberg effect (see also the response in (1c)).
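A threshold sweep of this kind can be sketched as follows. The ranked response curve is made up for illustration and is not taken from Author response image 1. The sketch shows why a very low height threshold stops measuring peakedness: the width then starts counting the broad shelf of weak responses.

```python
def width_at(responses, fraction):
    """Number of responses exceeding `fraction` of the peak response."""
    cut = fraction * max(responses)
    return sum(1 for r in responses if r > cut)


# Hypothetical ranked dF/F0 curve: a sharp peak, a short shoulder,
# then a broad shelf of weak responses.
ranked = [2.0, 1.4, 0.9, 0.6, 0.45, 0.3] + [0.15] * 40 + [0.05] * 200

widths = {f: width_at(ranked, f) for f in (0.75, 0.5, 0.25, 0.1, 0.05)}
for fraction, width in widths.items():
    print(f"threshold {fraction:.2f} x peak -> width {width}")
```

In this toy curve the width grows slowly from the 75% to the 10% threshold and then jumps once the threshold drops into the weak-response shelf.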

2b) Comparison to traditional sparseness measures. The authors assert that sparseness measures used previously (as the one by Rolls and Tovee) are not applicable here because they are sensitive to changes in baseline level. However, these previous studies used baseline subtracted firing rates to calculate sparseness. The sensitivity of the traditional measures to changes in baseline levels therefore requires further explanation.

Indeed, Rolls and Tovee’s measure requires the subtraction of a baseline. However, determining a baseline precisely is difficult for Ca imaging signals. Because we tested thousands of stimuli and most of the responses are very weak, a slight variation in our baseline estimate would yield a large variation in the estimated sparseness value, and there is also the danger of the iceberg effect. Hence, we chose a measure that does not require a precise estimate of the baseline and is more robust to the known insensitivity to signals below 10 Hz, as discussed above. We will clarify this further in the Materials and methods section.

2c) Decoding. The 2 analysis parts of the paper are somewhat disconnected. One emphasizes single cell selectivity, whereas the other emphasizes population sparseness. It might be useful to set up the idea that single cell selectivity does or does not predict population sparseness. It seems two concepts are correlated, but I could imagine that this need not be so.

Actually, the first analysis part includes both population sparseness (Figure 1) and stimulus selectivity (life-time sparseness) (Figure 2). A paper from Willmore, Mazer and Gallant, 2011 discussed the correlation between lifetime sparseness and population sparseness carefully. Simply, when the responses are distributed (like those in this study), these two sparseness measures will be quite close. The second part is about decoding. The main motivation for the decoding analysis is to examine the amount of information that is contained in the strong sparse responses.

The first part assesses sparseness by thresholding responses on a per neuron basis. The second part assesses decoding based on groups of neurons by thresholding the population. What happens to the decoding if the response matrix is computed with the thresholding of part 1 applied (i.e., setting all responses below the half maximum for a neuron to 0)?

This is a correct observation. We agree that there is a certain dissonance in using a relative half-height threshold to characterize population sparseness in Part 1 and then using an absolute threshold for decoding in Part 2. We have actually done both, but we feel that using an absolute threshold makes it clearer that the strong response in the absolute sense is important, as there is no guarantee that a cell attains its absolute peak response within this set of 2,250 natural images. While decoding with an absolute response threshold makes more sense biologically, no one would accept an absolute threshold for estimating population sparseness. In any case, the results of the two approaches are comparable, and we now include the new result in Figure 3—figure supplement 1 for comparison. Showing both is important because they make clear that the sparse strong responses are meaningful for decoding in both the relative and the absolute sense. The absolute threshold might be even more important than the relative threshold for the biological reasons mentioned above: downstream neurons might mostly care about the strength of strong and robust input signals, and might not remember the “peak response” of each of their sources.
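The difference between the two thresholding schemes can be illustrated on a toy response matrix. This is a sketch under assumed values: the matrix and the absolute cut of 0.8 dF/F0 are hypothetical and are not the settings used for the decoding analysis.

```python
def threshold_relative(resp, fraction=0.5):
    """Zero every response at or below `fraction` of that neuron's own peak."""
    out = []
    for row in resp:
        cut = fraction * max(row)
        out.append([r if r > cut else 0.0 for r in row])
    return out


def threshold_absolute(resp, cut):
    """Zero every response at or below one cut shared by all neurons."""
    return [[r if r > cut else 0.0 for r in row] for row in resp]


# Toy dF/F0 matrix: 2 neurons (rows) x 3 stimuli (columns).
resp = [
    [1.6, 0.2, 0.1],   # strongly driven neuron
    [0.1, 0.3, 0.05],  # weakly driven neuron whose "peak" is only 0.3
]

relative = threshold_relative(resp)
absolute = threshold_absolute(resp, cut=0.8)  # 0.8 is an arbitrary choice
print("relative:", relative)
print("absolute:", absolute)
```

With the relative threshold, the weak neuron keeps its 0.3 response because that is its own peak in this stimulus set. With the absolute threshold, only responses that are strong in absolute terms survive.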

Furthermore, the discussion of the decoding results should be improved. Currently, it seems to imply a rather arbitrary threshold of around 20% that is considered 'good decoding' (e.g., in the comparison of the decoding results from the top 0.5% – which are around 20-30%, and the decoding results from the bottom 0.5% – which are around 15%). Both are far from the chance level, so these statements need to be further justified.

Our intent was to show that the top 0.5% of the responses alone can achieve about 50% of the achievable decoding performance (28%/54% for monkey A, 21%/38% for monkey B), and that removing the top 0.5% of the responses drops the accuracy to less than 50% of the achievable performance (19%/54% for monkey A and 10%/38% for monkey B). We believe the reviewer might have misinterpreted the blue curve: at 0.5% (where it intersects the gray line), the blue curve shows the accuracy when the top 0.5% of responses are removed, not the accuracy when only the bottom 0.5% of responses are used. The two curves are used to demonstrate the “sufficiency” (red curve) and “necessity” (blue curve) of the top 0.5%.
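The ratios quoted above can be checked with a few lines of arithmetic. The accuracies are the ones given in this response. The chance level of 1/2250 is an assumption, made on the premise that the decoder chooses among all 2,250 images.

```python
# Decoding accuracy (%) with all responses, with the top 0.5% only, and with
# the top 0.5% removed, as quoted in the response.
full = {"A": 54, "B": 38}
top_only = {"A": 28, "B": 21}
top_removed = {"A": 19, "B": 10}

for monkey in ("A", "B"):
    sufficiency = top_only[monkey] / full[monkey]
    necessity = top_removed[monkey] / full[monkey]
    print(f"monkey {monkey}: top 0.5% alone reaches {sufficiency:.0%} of full accuracy; "
          f"without the top 0.5%, {necessity:.0%} remains")

# Assumed chance level for a 2,250-way classification.
chance_percent = 100 / 2250
print(f"chance level: {chance_percent:.3f}%")
```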

Finally, the authors conclude that the comparison of decoding performance for top and bottom responses demonstrates that strong responses are both necessary and sufficient to transfer relevant information. This is incorrect. The sufficiency is indeed demonstrated (accepting the assertion that decoding performance above some threshold constitutes successful decoding). However, to demonstrate necessity, they would have to demonstrate that successful decoding always implies the occurrence of strong responses. This is not the same as demonstrating that weak responses do not allow 'successful' decoding.

This issue might arise from the reviewer’s misinterpretation of the blue curve (see above). The blue curve indicates that without the top 0.5% of strong responses, decoding performance drops below 50% of the achievable decoding accuracy. At some level, this demonstrates the “necessity” of the top 0.5% for achieving good decoding, because without them the remaining 99.5% of the population responses can realize only 50% (or slightly less) of the achievable performance. We hope that with a better clarification of the blue curve, we can resolve the confusion, and that the reviewer will see the “necessity” of the strong responses. In fact, the bottom 90% of the responses alone perform extremely badly (close to chance, <0.1%; blue curve at 10% on the x-axis). On the other hand, in the presence of the strong responses, the bottom 90% of responses can contribute significantly, accounting for over 20% of the achievable decoding performance (the red curve).

2d) It would be interesting to compare the sparseness of responses evoked by natural images to that evoked by gratings (which in other studies have been shown to drive a large percentage of superficial V1 neurons). This would also allow a better assessment of how many neurons could potentially respond, further alleviating concerns about cell health or other properties of the imaging region (although this concern is largely addressed by the fact that most neurons respond to at least some of the images in the natural image set).

Thanks for this suggestion. We have in fact shown this result in our earlier papers (Neuron, 2017 and Current Biology, 2018). With a standard significance test (t-test or ANOVA), more than 90% of cells responded significantly to orientation gratings. But those responses were much weaker than the optimal strong responses elicited by natural images or complex shapes. That result is logically and tightly connected with this study, but we regret that we cannot include those results in this paper because of publication policy.

2e) A famous paper by Quiroga, Reddy, Kreiman, Koch and Fried (2005) illustrated extremely high sparsity in cells that responded to images of Jennifer Aniston and Halle Berry. I'm actually surprised that there was no mention in the present paper of that finding by Quiroga. Although a direct comparison may not be appropriate given the differences in areas, it may still be informative to ask whether the V1 cells have greater sparseness than the Jennifer Aniston cell.

That is a very interesting point. We are aware of that paper, and cited it in the context of using a similar ROC analysis to show that the sparse responses are not due to spurious chance occurrences. Like the reviewer, we also see the analogy, in the sense that both these MTL neurons and our V1 cells exhibit a similarly high degree of stimulus specificity. But we feel it might be a bit too controversial to make a direct comparison. We now mention this potential connection in our Discussion/Conclusion paragraphs.

Another reason for connecting to the Quiroga paper is that they also do ROC analyses, but their ROC curves look very different than those of the present paper (see point 5e below). The comparison may provide further evidence that the sparseness calculations, including the ROC calculations, were done properly.

Please see the response in 5e below.

3) Clarification of experimental methods employed.

3a) Cell count. The overall number of cells per imaging region is crucial for estimating sparseness. The ROI-definition procedure adopted by the authors appears reasonable and well justified. However, a few additional details would be useful:

- Additional images of identified cells so that the accuracy of the chosen approach can be assessed.

We used the same algorithms to identify the neurons as reported in our previous work (Tang et al., 2017). All cells with significant activity (over 3 SD) in the differential images evoked by any stimulus were identified automatically as ROIs. All identified ROIs are shown in Figure 1C. We overlaid the ROIs on an image of the cells to show that the ROIs extracted on the basis of activity were well matched to cell bodies (Figure 1—figure supplement 3).

- What does the imaging region from monkey 2 look like?

We have added images from monkey 2 in new Figure 1—figure supplement 2.

- Which manual steps are involved in the procedure (presumably, somebody checks the identified ROIs)?

The ROI identification is automatic without manual operations.

- Were all data collected on the same day? If not, how are data and cell counts combined across days? How many cells were stable across those days?

Yes, all data were collected on the same day.

- How many of these cells are filled in versus have a ring-like label? This will help to assess how many of the included neurons are presumably healthy and should exhibit normal responses.

See Figure 1B: most of the cells (more than 95%) have a ring-like label.

3b) Visual stimuli. Only the size of the stimuli are given. What are the other characteristics of the natural image set? Are they in color? Are they isoluminant with the background? What spatial frequencies and colors do they span? How different is their content? Are they part of one of the standard sets of natural images used in other studies?

Yes, they are in color (see the examples in Figure 1C, 1H and 1J). They are a large set of raw ‘natural images’. We did not modify their color or contrast, as we sought to understand how the visual cortex works under natural conditions in this study. The images were cropped from photos containing varied objects and scenes at varying viewing distances, and thus cover wide ranges of spatial frequencies and colors. Each stimulus is 4 x 4 degrees and hence 6-8 times larger than the receptive fields, which were placed at the center of the images.

The claim that single neurons respond to similar features of stimuli is not well supported and premature [“neuron 653 of monkey A was most excited when its receptive field (0.8° in diameter) covered the lower rim of the cat's 1 eye”; “neuron 949 of Monkey A was found to be selective to an opposite curvature embedded in its preferred natural stimulus set”].

We have systematically investigated the pattern selectivity of V1 neurons in another study (monkey A of the two studies is actually the same, and many neurons overlap). We found most of the V1 superficial layer neurons exhibit high degree of selectivity to more complex and higher order patterns, such as corners, curvatures and radiating lines (Tang et al., 2018). Such selectivity could be the reason for the high degree of sparseness observed in this study. The two cells are just two typical examples for connecting the two studies. In the other study, each cell was tested with 9,500 patterns. The five patterns shown in Panel B and D of Figure 2 are just examples from the 9,500 tested patterns. More complete and detailed analysis of pattern selectivity of the neurons is presented in the other paper.

(http://www.cell.com/current-biology/fulltext/S0960-9822(17)31521-X)

4) Points for further discussion/elaboration.

4a) The Discussion is focused on issues of sparseness. However, there are two issues that are also worth mentioning. First, this study thus provides a fresh view of what V1 is doing, one that shifts the emphasis from simple orientation selectivity to complex natural stimuli, and which gives novel perspective on how the brain encodes natural stimuli. And second, the results show that, within the span of a single hypercolumn, all stimuli presented could be largely decoded. This supports Hubel and Wiesel's original concept that a hypercolumn contains all the machinery to encode everything at a single point in space, except that the manner of encoding may be distinct from (or more complex than) the original concept of selection from amongst an array of systematically organized ocular dominance, orientation, and color columns.

Thanks for these suggestions! We have made the first point in our Current Biology paper (Tang, et al., 2018) with explicit evidence on the pattern selectivity of the neurons. Second point is an interesting suggestion. While we completely agree with the reviewers’ interpretation, we were afraid the evidence presented in this paper by itself is not sufficient to make this statement.

4b) A very important item that should be made very clear is the last sentence of the Abstract where the authors correctly claim that this is the first paper that shows sparseness of neuronal population responses in V1 in primates. They need to point out that papers like Froudarakis et al. (2014) were in mice and papers like Quiroga, et al. (2005) were in humans but not in V1. The statement at the end of the tenth paragraph of the Results and Discussion needs to make that clear. It is such an important point that it needs to be pointed out in the Introduction and the Conclusion. Other researchers would then become aware of the power of two-photon imaging.

Thanks for these suggestions! Actually, we already mentioned this in the Introduction. But now we added the following as the first sentence in the “Conclusion”:

“In conclusion, while earlier studies provided life-time sparseness measurements in rodents (Hromadka et al. 2008; Haider et al. 2010), non-human primates (Rolls and Tovee, 1995; Vinje and Gallant, 2000; Rust and DiCarlo 2012) and humans (Quiroga et al. 2005), and population sparseness measurement in rodents (Froudarakis et al. 2014), our study provided the first direct measurement of sparseness of large-scale neuronal population responses in awake macaque monkeys, made possible by large-scale two-photon imaging.”

5) Clarification of data/analyses in the figures.

5a) I find Figure 1F and G vs. L and M description very confusing. I think this is the interpretation: For population sparseness, one expects the distribution to show that for most images only few cells respond. For single cell selectivity, one expects each cell to respond to only a few images. Somehow the description of these graphs seems garbled. [E.g. then for each picture only 0.5% of cells responded means: in Monkey A, for 250 pictures 1 cell responded; for ~200 pictures 6 cells responded; for 5 pictures 20 cells responded. Alternative interpretation of graph: 1 cell responded to 250 pictures; 6 cells responded to ~200 pictures; 20 cells responded to 5 pictures?

The reviewer’s expectation is correct. “For population sparseness, one expects the distribution to show that for most images only few cells respond. For single cell selectivity, one expects each cell to respond to only a few images.” But the interpretation of the figure is wrong. We will try to make it clearer:

In such a frequency histogram graph, when there is a bar with height of 250 (Y-axis) at half-width of cells = 1 (x-axis), it means that there were 250 cases (pictures) with one picture activating only one cell strongly (among the 1225 cells in monkey A), i.e. 250 extremely sparse cases with only 1 cell responding to each picture.

We have also modified the captions of Figure 1F and G as follows: “(F and G) Frequency histograms showing the number of stimuli (out of 2250) (y-axis) that produce population responses with different population sparseness, measured in half-height bandwidth, i.e. the number of neurons activated strongly (x-axis). It shows that less than 0.5% of the cells (6 cells out of 1225 for monkey A, and 4.1 cells out of 982 for monkey B) responded above half of their peak responses for any given image on average.”
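How such a frequency histogram is read can be sketched with a toy example. The per-stimulus counts below are made up; only the population size of 1225 cells (monkey A) comes from the text.

```python
from collections import Counter

# Hypothetical number of strongly activated cells (above half of their own
# peak) for each of eight stimuli.
strong_cells_per_stimulus = [1, 1, 1, 6, 6, 20, 3, 1]

# x-axis: number of strongly activated cells; y-axis: number of stimuli.
histogram = Counter(strong_cells_per_stimulus)
for n_cells in sorted(histogram):
    print(f"{histogram[n_cells]} stimuli strongly activated {n_cells} cell(s)")

# Mean half-height bandwidth as a fraction of the imaged population.
n_cells_total = 1225
mean_cells = sum(strong_cells_per_stimulus) / len(strong_cells_per_stimulus)
print(f"mean: {mean_cells:.2f} cells = {100 * mean_cells / n_cells_total:.2f}% of the population")
```

A bar of height 4 at x = 1 in this toy histogram means that four pictures each strongly activated exactly one cell, not that one cell responded to four pictures.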

I would not use 'half-height bandwidths'. Use 'number of cells'.

True. To minimize jargon, we changed half-height bandwidths to the percentage of the number of neurons or stimuli, which might be more explicit and easier to understand. We might still use half-height bandwidths at some places, because this is a standard expression in sparse code studies.

Why is stimulus specificity called 'life-time sparseness'? F-G, L-M should be described better to distinguish what each is saying (single cell selectivity vs. population sparseness/redundancy). Maybe partly it is the terminology that is used.]

Yes. We agree that life-time sparseness is technical jargon in the field (Science 287: 1273-1276, 2000; J Neurophysiol 105: 2907–2919, 2011). We equate it with stimulus specificity. However, stimulus specificity might carry the connotation that we know what stimulus feature the neuron is coding for. In sparse coding studies such as those of Gallant or McCormick, the researchers often did not or could not determine the exact stimulus (which often includes the contextual surround) that gives the neuron a strong sparse response; hence they use the term life-time sparseness to indicate that the neuron fires rarely, to only some specific stimuli, without claiming any knowledge of the neuron’s preferred stimulus.

5b) Figure 1: Why are there points below zero in D and E?

There are two possible reasons for negative values in the calcium imaging signal. First, some neurons may have spontaneous activity that was suppressed by some stimuli, which decreased the fluorescence signal (F < F0). Second, measurement noise could also make some signals negative.

5c) Figure 1: 'Cells 653 and 949 are colored red respectively.' Don't see this.

Sorry about the error. We have deleted this sentence.

5d) Figure 2: 0.5% contributes 50% of information and 5% contributes to 80% of information, what are remaining cells doing?

The responses of 0.5%-5% of the cells contribute to the encoding of a given image. The responses of another 0.5%-5% of the cells contribute to the encoding of another image. Every cell is useful, just for different stimuli. In a way, this is very similar to the case in Quiroga’s paper, where one cell coded for Jennifer Aniston, another cell for Halle Berry, and yet another for the Sydney Opera House, and these three cells did not respond to other objects or concepts.

5e) Figure 1—figure supplement 2: Item C shows ROC curves for 99 shuffled trials. The part that wasn't at all clear to me was why did you compare the results to a shuffled version of the results.

The logic is as follows. If the strong responses (say > half maximum for a neuron) in the raw data are random and spurious signals, then the curve for the raw data will be comparable with those of the shuffled versions, and they will have similar ROC curves and AUCs. On the other hand, if the strong responses to specific stimuli in the raw data are reliable and repeatable across trials, the AUC will be significantly higher than those from the shuffled data. The example shown in Figure 1—figure supplement 2 argues strongly against the hypothesis that the strong responses (three red data points in Figure 1—figure supplement 2A) came from random or spurious single-trial epileptic responses.

And why did the shuffled data have such a high hit rate at zero false alarms. I would have thought that the shuffling would greatly reduce the hit rate. That is quite different from Quiroga's (2005) paper, which shows more normal curves with the false positive rate close to zero. If the authors are unable to use the Quiroga method then they should explain why, and why they end up with the very unusual shape for the ROC curves.

Yes, there are some key differences between Quiroga’s ROC analysis and ours. In Quiroga’s case, images from a certain individual or object formed the positive class, and images from others formed the negative; in our case, images with high enough (higher than half-maximum) mean responses form the positive class, and images with lower mean responses form the negative.

In Quiroga’s case, the set of positive images was fixed across all shuffles, and the authors tested whether the classification performance of a neuron for this fixed set of images against other images was by chance or not; in our case, the set of positive images varied across all shuffles, and we tested whether the classification performance of a neuron for high-mean-response images against others using per-trial responses was by chance or not. We cannot use Quiroga’s method for assigning positive and negative classes because our stimuli do not come from shared individuals or objects; because we changed the set of positive images for each shuffle, the ROC curves are biased to have high hit rates. Regardless of all differences, our ROC analysis is still valid, showing that the strong responses were not spurious or epileptic; otherwise, the shuffle operation would not change the AUC much.

To illustrate the point, let us imagine the simplest case. Suppose a cell responds strongly to only one particular stimulus among 2,250 in every repeat, and there are two repeats. There is then only one strong peak in the mean stimulus-tuning curve. The ROC curve will start at (0, 1), with AUC (area under the curve) = 1. The shuffled version of the two trials will have two peaks (weaker, but still significantly stronger than the rest) in the mean tuning curve. In this case, the ROC curve will start at (0, 0.5) with AUC near 0.75, as the hit rate is only 0.5 before any false positive appears. When there are three repeats with one repeatable strong peak, the mean tuning curve of the shuffled case will have three peaks, as in our case; the ROC curve will then start at (0, 0.33) with AUC around 0.667. This is because, as described above, the positive set of images is re-defined for the shuffled case (indicated by the blue spikes in Figure 2—figure supplement 1A), whereas the original positive set is indicated by the red spikes in Figure 2—figure supplement 1. In the real situation, as in the example shown in Figure 2—figure supplement 1, there are very strong peaks but also peaks close to 0.5 of the maximum, and the shuffled trials might contain some additional peaks due to noise or other low responses, making the ROC curve start at (0, 0.33–0.4) with AUC = 0.68 on average. When we increase the number of repeats, the starting point of the ROC curve will move toward (0, 0). In Quiroga’s case, the set of positive images was fixed, hence the ROC curves of shuffled responses start at (0, 0). That is the reason for the difference. In any case, repeatable responses (across trials) will have a significantly higher AUC than shuffled trials, whereas spurious responses (unrepeatable across trials) will have an AUC similar to that of the shuffled trials.
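The idealised case described above can be simulated directly. This is a minimal sketch under stated assumptions, not the analysis code used for the figure supplement: the cell responds with 1.0 to a single stimulus and 0 otherwise, the positive class is re-derived from the mean tuning curve of whatever trials are passed in, and the AUC is computed in its rank (Mann-Whitney) form with ties counted as one half. All function names are hypothetical.

```python
import random


def positive_set(trials):
    """Stimuli whose trial-averaged response exceeds half of the peak mean response."""
    n_stim = len(trials[0])
    mean = [sum(t[s] for t in trials) / len(trials) for s in range(n_stim)]
    half = 0.5 * max(mean)
    return {s for s in range(n_stim) if mean[s] > half}


def roc_summary(trials):
    """Return (hit rate at zero false alarms, AUC) computed from per-trial responses."""
    pos_idx = positive_set(trials)
    pos = [t[s] for t in trials for s in pos_idx]
    neg = [t[s] for t in trials for s in range(len(t)) if s not in pos_idx]
    top_neg = max(neg)
    hit0 = sum(1 for p in pos if p > top_neg) / len(pos)
    # Probability that a positive sample outranks a negative one (ties count 0.5).
    score = 0.0
    for p in pos:
        score += sum(1.0 if p > n else 0.5 if p == n else 0.0 for n in neg)
    return hit0, score / (len(pos) * len(neg))


def shuffle_trials(trials, rng):
    """Permute the stimulus labels independently within each trial."""
    shuffled = []
    for t in trials:
        t = list(t)
        rng.shuffle(t)
        shuffled.append(t)
    return shuffled


def one_peak_cell(n_stim, n_repeats, peak_at=0):
    """Idealised cell: response 1.0 to one stimulus in every repeat, 0 otherwise."""
    return [[1.0 if s == peak_at else 0.0 for s in range(n_stim)]
            for _ in range(n_repeats)]


raw = one_peak_cell(n_stim=2250, n_repeats=2)
print("raw (hit rate at zero false alarms, AUC):", roc_summary(raw))
print("shuffled:", roc_summary(shuffle_trials(raw, random.Random(0))))
```

As long as the shuffled peaks land on different stimuli, the shuffled summary follows the values given in the text: the hit rate at zero false alarms is one over the number of repeats.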

5f) Perhaps it would be helpful to the reader to have Figure 1 broken up into multiple figures. By doing that the figures would be larger with details more visible. One improvement that would be helpful for Figures1D, E would be to have the x-axis be a log axis so that the cell rank of the first 20 or so neurons would be more visible. I believe this is the plot that best demonstrates sparseness so it needs to be very clear.

Thanks for this constructive suggestion! We have separated Figure 1 into two figures. As for the suggestion of using a log x-axis for the neuronal response distribution plots (Figure 1D and E), as shown here, while a log x-axis allows easier visualization of the data, we feel it is not commonly used in the field for tuning curves and might confuse some readers. We prefer to use the linear scale in the main text, as it better conveys the dramatic sharpness of the tuning curves and simplifies comparisons across studies.

I would suggest also spending substantially more effort clarifying F and F0 on which panels D and E are based. Another question is what is the role of noise for the one second intervals where the stimulus is shown? There is expected to be more noise in that interval than in the interval without the stimulus. How is that noise estimated?

We used the standard definitions of F0, F and dF in this field. For each ROI, F0 is the average fluorescence intensity during the interval without the stimulus, and F is the average fluorescence intensity during the interval with the stimulus. Yes, as shown in Figure 1I and K, the noise or variability of the signals in the interval with the stimulus is larger than in the interval without the stimulus. That is consistent with what is observed in traditional electrophysiological studies.
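As a minimal sketch of these definitions for a single ROI and a single trial, the Python snippet below computes dF = F − F0 normalized by F0. The normalization by F0 is the common convention and is assumed here; the function name, the frame indices and the example trace are hypothetical and are not taken from the study.

```python
import numpy as np

def delta_f_over_f0(trace, stim_on, stim_off):
    """Return (F - F0) / F0 for one ROI trace.

    F0: mean fluorescence over the frames before stimulus onset (no stimulus).
    F:  mean fluorescence over the frames in [stim_on, stim_off) (stimulus shown).
    """
    trace = np.asarray(trace, dtype=float)
    f0 = trace[:stim_on].mean()
    f = trace[stim_on:stim_off].mean()
    return (f - f0) / f0

# Hypothetical trace: four blank frames followed by four stimulus frames.
trace = [100, 100, 100, 100, 150, 150, 150, 150]
print(delta_f_over_f0(trace, stim_on=4, stim_off=8))  # 0.5
```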

5g) In connection with Figure 1—figure supplement 2 it would be useful to show A and B as histograms in addition to the way they are presently shown.

We have added histograms for top 3 responses for Figure 1—figure supplement 2A and B.

5h) I suspect Figure 2 is an important figure. However it wasn't at all clear to me how it was calculated. I did figure out that the small inset was a continuation of the large plot that stopped at 10%. I wonder if the inset could be removed if log axes were used for the x-axis and the data would go from say 0.1% to 100%. But more important is to clarify how the red and blue data were calculated.

We thank the reviewer for the advice. We have now improved the description in the main text. We considered merging the insets into the larger plots by using log axes; however, we found that the resulting figures were messier overall.

Associated Data


Supplementary Materials

Transparent reporting form

elife-33370-transrepform.docx (241.2 KB, docx)

DOI: 10.7554/eLife.33370.010

Data Availability Statement

All data generated or analysed during this study are included in the manuscript and supporting files. Source data files have been provided for Figures 1, 2 and 3.

[bib1] Akerboom J, Chen TW, Wardill TJ, Tian L, Marvin JS, Mutlu S, Calderón NC, Esposti F, Borghuis BG, Sun XR, Gordus A, Orger MB, Portugues R, Engert F, Macklin JJ, Filosa A, Aggarwal A, Kerr RA, Takagi R, Kracun S, Shigetomi E, Khakh BS, Baier H, Lagnado L, Wang SS, Bargmann CI, Kimmel BE, Jayaraman V, Svoboda K, Kim DS, Schreiter ER, Looger LL. Optimization of a GCaMP calcium indicator for neural activity imaging. Journal of Neuroscience. 2012;32:13819–13840. doi: 10.1523/JNEUROSCI.2601-12.2012.

[bib2] Barlow HB. The Ferrier Lecture, 1980: critical limiting factors in the design of the eye and visual cortex. Proceedings of the Royal Society B: Biological Sciences. 1981;212:1–34. doi: 10.1098/rspb.1981.0022.

[bib3] Chen TW, Wardill TJ, Sun Y, Pulver SR, Renninger SL, Baohan A, Schreiter ER, Kerr RA, Orger MB, Jayaraman V, Looger LL, Svoboda K, Kim DS. Ultrasensitive fluorescent proteins for imaging neuronal activity. Nature. 2013;499:295–300. doi: 10.1038/nature12354.

[bib4] Denk W, Strickler JH, Webb WW. Two-photon laser scanning fluorescence microscopy. Science. 1990;248:73–76. doi: 10.1126/science.2321027.

[bib5] Froudarakis E, Berens P, Ecker AS, Cotton RJ, Sinz FH, Yatsenko D, Saggau P, Bethge M, Tolias AS. Population code in mouse V1 facilitates readout of natural scenes through increased sparseness. Nature Neuroscience. 2014;17:851–857. doi: 10.1038/nn.3707.

[bib6] Haider B, Krause MR, Duque A, Yu Y, Touryan J, Mazer JA, McCormick DA. Synaptic and network mechanisms of sparse and reliable visual cortical activity during nonclassical receptive field stimulation. Neuron. 2010;65:107–121. doi: 10.1016/j.neuron.2009.12.005.

[bib7] Hegdé J, Van Essen DC. A comparative study of shape representation in macaque visual areas v2 and v4. Cerebral Cortex. 2007;17:1100–1116. doi: 10.1093/cercor/bhl020.

[bib8] Hromádka T, Deweese MR, Zador AM. Sparse representation of sounds in the unanesthetized auditory cortex. PLoS Biology. 2008;6:e16. doi: 10.1371/journal.pbio.0060016.

[bib9] Lennie P. The cost of cortical computation. Current Biology. 2003;13:493–497. doi: 10.1016/S0960-9822(03)00135-0.

[bib10] Li M, Liu F, Jiang H, Lee TS, Tang S. Long-Term Two-Photon Imaging in Awake Macaque Monkey. Neuron. 2017;93:1049–1057. doi: 10.1016/j.neuron.2017.01.027.

[bib11] Olshausen BA, Field DJ. Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature. 1996;381:607–609. doi: 10.1038/381607a0.

[bib12] Olshausen BA. Highly overcomplete sparse coding. Proceedings of SPIE; 2013. 86510S.

[bib13] Quian Quiroga R, Panzeri S. Extracting information from neuronal populations: information theory and decoding approaches. Nature Reviews Neuroscience. 2009;10:173–185. doi: 10.1038/nrn2578.

[bib14] Quiroga RQ, Reddy L, Kreiman G, Koch C, Fried I. Invariant visual representation by single neurons in the human brain. Nature. 2005;435:1102. doi: 10.1038/nature03687.

[bib15] Rehn M, Sommer FT. A network that uses few active neurones to code visual input predicts the diverse shapes of cortical receptive fields. Journal of Computational Neuroscience. 2007;22:135–146. doi: 10.1007/s10827-006-0003-9.

[bib16] Rolls ET, Tovee MJ. Sparseness of the neuronal representation of stimuli in the primate temporal visual cortex. Journal of Neurophysiology. 1995;73:713–726. doi: 10.1152/jn.1995.73.2.713.

[bib17] Rust NC, DiCarlo JJ. Balanced increases in selectivity and tolerance produce constant sparseness along the ventral visual stream. Journal of Neuroscience. 2012;32:10170–10182. doi: 10.1523/JNEUROSCI.6125-11.2012.

[bib18] Tang S, Lee TS, Li M, Zhang Y, Xu Y, Liu F, Teo B, Jiang H. Complex Pattern Selectivity in Macaque Primary Visual Cortex Revealed by Large-Scale Two-Photon Imaging. Current Biology. 2018;28:38–48. doi: 10.1016/j.cub.2017.11.039.

[bib19] Vinje WE, Gallant JL. Sparse coding and decorrelation in primary visual cortex during natural vision. Science. 2000;287:1273–1276. doi: 10.1126/science.287.5456.1273.

[bib20] Willmore BD, Mazer JA, Gallant JL. Sparse coding in striate and extrastriate visual cortex. Journal of Neurophysiology. 2011;105:2907–2919. doi: 10.1152/jn.00594.2010.

[bib21] Zhang Y, Tang S, Li M. sparse-coding-elife2018. 33b196c. 2018. https://github.com/leelabcnbc/sparse-coding-elife2018
