Abstract
The way that humans and animals perceive the lightness of an object depends on its physical luminance as well as its surrounding context. While neuronal responses throughout the visual pathway are modulated by context, the relationship between neuronal responses and lightness perception is poorly understood. We searched for a neuronal mechanism of lightness by recording responses of neuronal populations in monkey primary visual cortex (V1) and area V4 to stimuli that produce a lightness illusion in humans, in which the lightness of a disk depends on the context in which it is embedded. We found that the way individual units encode the luminance (or equivalently for our stimuli, contrast) of the disk and its context is extremely heterogeneous. This motivated us to ask whether the population representation in either V1 or V4 satisfies three criteria: 1) disk luminance is represented with high fidelity, 2) the context surrounding the disk is also represented, and 3) the representations of disk luminance and context interact to create a representation of lightness that depends on these factors in a manner consistent with human psychophysical judgments of disk lightness. We found that populations of units in both V1 and V4 fulfill the first two criteria but that we cannot conclude that the two types of information in either area interact in a manner that clearly predicts human psychophysical measurements: the interpretation of our population measurements depends on how subsequent areas read out lightness from the population responses.
NEW & NOTEWORTHY A core question in visual neuroscience is how the brain extracts stable representations of object properties from the retinal image. We searched for a neuronal mechanism of lightness perception by determining whether the responses of neuronal populations in primary visual cortex and area V4 could account for a lightness illusion measured using human psychophysics. Our results suggest that comparing psychophysics with population recordings will yield insight into neuronal mechanisms underlying a variety of perceptual phenomena.
Keywords: lightness, population coding, psychophysics, visual cortex
INTRODUCTION
A fundamental building block of our perception of object properties is lightness. For achromatic objects, lightness is the perceptual attribute of the object’s surface that varies from black, through gray, to white. The perceived lightness of an object’s surface is related to, but not completely determined by, its luminance, which characterizes the light reflected from the object to the eye. The luminance of the reflected light is the total spectral radiance after weighting by a measure of the spectral sensitivity of the visual system.
When only the diffuse surface reflectance of an object is varied, while its shape, position, the objects around it, and the incident illumination are held fixed, variation in luminance predicts variation in perceived lightness. When the context within which the object is viewed is varied, however, luminance is no longer a reliable predictor of lightness. Effects of context upon lightness support lightness constancy, wherein the perceived lightness of an object remains roughly constant across changes in illumination (Adelson 2000; Kingdom 2011).
The dissociation between physical luminance and perceived lightness is readily illustrated by Adelson’s checker shadow illusion, as elaborated by Gilchrist (http://persci.mit.edu/gallery/checkershadow; Gilchrist 2006). Figure 1 illustrates our variant of this illusion. Disks that have the same luminance and the same immediate surround (and thus the same local contrast) have different perceived lightnesses; disks that are perceived as lying in a shadow (Fig. 1, right, which we refer to as “shadow” stimuli) are perceived as lighter than disks that do not lie in a shadow (Fig. 1, left, which we refer to as “paint” stimuli).
The goal of this study was to understand the relationship between neuronal population responses and lightness perception. Previous studies in animal models have found that a small proportion of neurons in the visual cortex respond in ways that are consistent with lightness perception in humans, but the responses of single neurons are heterogeneous (Huang and Paradiso 2008; Hung et al. 2007; Kinoshita and Komatsu 2001; MacEvoy et al. 1998; MacEvoy and Paradiso 2001; Roe et al. 2005; Rossi et al. 1996; Rossi and Paradiso 1996, 1999; Vladusich et al. 2006). However, perception of complex stimuli is presumably driven by the joint activity of neuronal populations, and it is not currently clear how population neuronal activity is integrated to give rise to lightness perception. Functional imaging studies (e.g., using functional MRI) that measure the blood oxygen level-dependent signal, which reflects the integrated activity of many neurons, have reported varied findings regarding neural correlates of lightness (Boyaci et al. 2007, 2010; Cornelissen et al. 2006; Corney et al. 2009; Haynes et al. 2004; Pereverzeva and Murray 2008; Perna et al. 2005).
Here we use a combination of human psychophysics, simultaneous recordings from dozens of neurons in areas V1 and V4 in monkeys, and neuronal population data analysis techniques to explore the way that luminance/contrast and context are encoded and might be read out. We found that neuronal populations in both V1 and V4 represent variation in both luminance and context. The relationship between the representations of luminance and context is complicated. We show that lightness information can be read out from the responses of neuronal populations in V1 and V4 in a manner that is consistent with the illusion illustrated in Fig. 1, perhaps by neurons in premotor areas in parietal or frontal cortex that are thought to be involved in the formation of perceptual decisions (Gold and Shadlen 2007; Heekeren et al. 2008). At the same time, we show that the neural representations in V1 and V4 do not obligatorily lead to lightness representations consistent with the illusion. Rather, the interpretation of the information in our recorded populations depends on how that information is read out. More generally, our work shows how analyzing the responses of neuronal populations as a whole can illuminate the neuronal mechanisms underlying perceptual and cognitive processes.
MATERIALS AND METHODS
Visual Stimuli
To better understand the neuronal and psychophysical underpinnings of lightness perception, we studied visual stimuli of the type illustrated in Fig. 1. A central disk was embedded in a checkerboard image, either within a shadowed region (“shadow” checkerboard) or within a luminance-matched region without a shadow (“paint” checkerboard). We studied how the perceived lightness of the center disk depends on context, which in our study refers to the difference between the paint and shadow surrounding checkerboards. Our stimuli differ from the original checker-shadow illusion in that the lightness effect occurs across disks viewed in the center of two separate images, rather than within a single image.
Importantly, in our stimuli, the disks always had the same immediate surround (the center check is the same luminance in the paint and shadow versions) and the same average global surround. Indeed, the average luminance of each of the 25 corresponding checks in the paint and shadow checkerboards was the same. The only difference between shadow and paint checkerboards was the spatial distribution of light in the first and second off diagonals. In the paint version, each check was spatially uniform. In the shadow version, the luminance was governed by a cumulative normal computed as a function of the distance from the off diagonals. This produced a penumbra-like gradient. For our stimuli, the paint-shadow illusion cannot be mediated by changes in local contrast nor by light adaptation to the overall luminance of the context images, because these two factors are matched. Indeed, disk luminance and disk contrast are perfectly correlated for our stimuli, so that our experiments do not distinguish between luminance and contrast representations. We reasoned that by silencing contrast and light adaptation, our stimuli would be more likely to reveal the action of cortical computations that support lightness perception (see Hillis and Brainard 2007). For simplicity in the following, we will describe the disks in terms of their luminance; the reader should bear in mind that given our stimuli, we could equally well have used a contrast representation with all else remaining unchanged.
The disk luminance values were expressed in normalized units that vary between 0 and 1. A luminance of 1 corresponded to about 260 or 300 cd/m2 for the psychophysical experiments (varying across the 2 monitors used, see below) and ~105 cd/m2 for the physiological measurements, with the exact value in each type of experiment varying as the monitors aged. In our normalized units, the mean luminance of each image was 0.485, and the check that immediately surrounded the disks had a luminance of 0.170.
Note that the paint and shadow stimuli themselves differed sufficiently, based on the spatial distribution of light in the first and second off diagonals alone, so that both human and nonhuman primate subjects would almost certainly be able to reliably discriminate between the two.
Psychophysical Experiments
All human psychophysical procedures were approved by the Institutional Review Board of the University of Pennsylvania, and the experiments were conducted in accord with the tenets of the Declaration of Helsinki. All human subjects provided written informed consent.
Stimuli were displayed on one of two NEC Spectra View LCD monitors (Model PA241W, 24-in. display, maximum luminance 300 cd/m2, stimulus chromaticity [0.31 0.33]; and Model PA271W, 27-in. display, maximum luminance 260 cd/m2, stimulus chromaticity [0.31 0.32]) from a distance of 57 cm using a pixel resolution of 1,920 × 1,200 pixels. The displays were controlled with eight-bit precision per channel using Matlab (The MathWorks), with a combination of routines from mgl (http://gru.stanford.edu/doku.php/mgl/overview) and the Psychophysics Toolbox (psychtoolbox.org; Brainard 1997; Pelli 1997). The monitors were calibrated using standard methods (Brainard et al. 2002), and the nonlinear input-output relation of each monitor channel was corrected using table lookup. Stimulus size in pixels was adjusted across the two monitors so that the visual angle of the stimuli was the same on each.
On each trial, human subjects viewed either two paint checkerboards side-by-side or one paint and one shadow checkerboard side-by-side. In the latter case, the left-right position of the paint and shadow checkerboards was randomized across trials. Each checkerboard contained a disk in its central square (as in Fig. 1). We refer to one disk as the reference disk and the other as the test disk. The left-right location of the reference disk was randomized across trials. There were three reference disk luminances, 0.25, 0.5, and 0.75 on the normalized [0–1] luminance scale, and across trials the reference disk appeared in each of the two types of checkerboards. The test disk’s luminance was adjusted over trials using staircase procedures. The subject’s task on each trial was to indicate which of the two disks appeared lighter. No complex elaboration about what was meant by the term “lighter” was provided to the subjects, and none reported to us that they found the term confusing. There is a large amount of literature on the effect of instructions in judgments of lightness and color; see Radonjić and Brainard (2016) for a recent treatment.
The size of the checkerboards was 3.5° of visual angle, and they were presented centered vertically and with their centers located at ±3.5° horizontally. Each check in the checkerboards was a 0.7° square. The disks had a diameter of 0.35° and were centered on the center square of the checkerboards.
Trials for each choice of checkerboard pairings (paint-shadow and paint-paint) were run in separate sessions. The paint-paint checkerboard pairing served primarily as a control. There were two separate staircases in each session for each choice of which checkerboard contained the reference disk and reference disk luminance, so 12 staircases per session in all (2 choices of which checkerboard contained the reference × 3 reference luminances × 2 staircases). For each combination of checkerboard containing the reference and reference disk luminance, one staircase was 2 up 1 down and the other was 1 up 2 down. At the start of each session there were five practice trials, chosen randomly from the set of possible trial types. The practice trials were followed by 20 blocks of 12 trials, where each block contained one trial from each of the staircases presented in random order. Each individual staircase was thus 20 trials long; that is, there was a fixed number of trials per staircase. Thus there were 245 trials per session, including the 5 practice trials.
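The trial bookkeeping for a session can be sketched as follows. This is a minimal illustration only: the labels and the function name `build_session_schedule` are hypothetical, and the staircase update rules and step sizes (not specified in the text) are deliberately omitted.

```python
import itertools
import random

# Hypothetical labels for the factors that define a staircase.
REFERENCE_CONTEXTS = ["paint", "shadow"]      # checkerboard containing the reference disk
REFERENCE_LUMINANCES = [0.25, 0.5, 0.75]      # normalized luminance units
STAIRCASE_RULES = ["2up1down", "1up2down"]
N_PRACTICE = 5
N_BLOCKS = 20


def build_session_schedule(seed=0):
    """Return the 12 staircases and the per-trial order of staircases for one session."""
    rng = random.Random(seed)
    staircases = list(itertools.product(
        REFERENCE_CONTEXTS, REFERENCE_LUMINANCES, STAIRCASE_RULES))
    # Practice trials are drawn at random from the possible trial types.
    schedule = [rng.choice(staircases) for _ in range(N_PRACTICE)]
    # Each block contains one trial from every staircase, in random order.
    for _ in range(N_BLOCKS):
        block = staircases[:]
        rng.shuffle(block)
        schedule.extend(block)
    return staircases, schedule


staircases, schedule = build_session_schedule()
print(len(staircases), len(schedule))  # 12 245
```

The counts follow directly from the design: 2 × 3 × 2 = 12 staircases, and 5 + 20 × 12 = 245 trials.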
Four subjects (3 female, 1 male; ages 19–50) participated in the experiment. All were naïve as to the purpose of the study and had visual acuity of 20/40 or better as tested with a Snellen eye chart. The subjects ran two sessions of each condition (paint-shadow and paint-paint) to complete what we refer to as a single determination of the psychophysical paint-shadow effect. We made one such determination for subjects BAF and EJE and two each for subjects AQR and CNJ. In cases where we aggregate data across subjects, we treat each determination separately, so that data from subjects AQR and CNJ are weighted more heavily than those from subjects BAF and EJE.
For the paint-shadow data, for each combination of reference disk luminance and location (reference in paint or shadow checkerboard), we combined the data from the two within-session staircases and fit a cumulative normal to them, using a maximum likelihood fitting method. The fit was implemented using the Palamedes toolbox (Kingdom and Prins 2010; www.palamedestoolbox.org). The point of subjective equality (PSE; luminance corresponding to 50% lighter judgments) for each session was obtained from this fit. Thus there were six paint-shadow PSEs obtained per session. These were aggregated across the two sessions for each subject/determination to find a paint-shadow effect, as explained in results. The same data analysis procedure was applied for the paint-paint data, although in this case the reference was always in a paint checkerboard. Nonetheless, we gave each of the two paint checkerboards a nominal label to allow a parallel analysis and obtained six paint-paint PSEs for each session as well.
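To illustrate what a maximum likelihood cumulative normal fit and its PSE involve, the following is a minimal Python sketch. It is not the Palamedes implementation used for the analyses: it assumes a binomial likelihood with no lapse rate, replaces a proper optimizer with a coarse grid search, and uses made-up response counts.

```python
import math


def cumulative_normal(x, mu, sigma):
    """Probability of a 'test lighter' response at test luminance x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def neg_log_likelihood(mu, sigma, luminances, n_lighter, n_trials):
    """Binomial negative log likelihood of the observed response counts."""
    eps = 1e-9  # keeps log() finite when the predicted probability is 0 or 1
    total = 0.0
    for x, k, n in zip(luminances, n_lighter, n_trials):
        p = min(max(cumulative_normal(x, mu, sigma), eps), 1.0 - eps)
        total -= k * math.log(p) + (n - k) * math.log(1.0 - p)
    return total


def fit_pse(luminances, n_lighter, n_trials):
    """Grid-search maximum likelihood fit; returns (PSE, sigma)."""
    best = None
    for i in range(1, 200):          # mu from 0.005 to 0.995
        mu = i * 0.005
        for j in range(1, 101):      # sigma from 0.005 to 0.5
            sigma = j * 0.005
            nll = neg_log_likelihood(mu, sigma, luminances, n_lighter, n_trials)
            if best is None or nll < best[0]:
                best = (nll, mu, sigma)
    return best[1], best[2]


# Hypothetical data: test luminances and counts of "test lighter" responses.
test_luminances = [0.30, 0.35, 0.40, 0.45, 0.50, 0.55]
lighter_counts = [0, 1, 3, 7, 9, 10]
trial_counts = [10] * 6
pse, sigma = fit_pse(test_luminances, lighter_counts, trial_counts)
```

For a cumulative normal, the PSE is simply the fitted mean, since that is the luminance at which the function crosses 50%.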
Electrophysiology Experiments
Stimulus presentation, subjects, and electrophysiological recordings.
Nonhuman primate subjects passively fixated while we presented single checkerboard stimuli and recorded neuronal responses. We presented visual stimuli on a calibrated CRT monitor (calibrated to linearize intensity, 1,024 × 768 pixels, 120-Hz refresh rate) placed 57 cm from the animal. We monitored eye position using an infrared eye tracker (Eyelink 1000; SR Research). We used custom software (written in Matlab using the Psychophysics Toolbox; Brainard 1997; Pelli 1997) to present stimuli and monitor behavior. We recorded eye position (1,000 samples per second), neuronal responses (30,000 samples per second), and the signal from a photodiode to align neuronal responses to stimulus presentation times (30,000 samples per second) using hardware from Ripple Microsystems.
The subjects in our physiological experiments were four adult male rhesus monkeys (BR, JD, ST, and SY, Macaca mulatta, 8.8, 10.0, 9.0, and 9.3 kg, respectively). All animal procedures were approved by the Institutional Animal Care and Use Committees of the University of Pittsburgh and Carnegie Mellon University. Before training, we implanted each animal with a titanium head holder. Then, the animal was trained to passively fixate while we presented peripheral visual stimuli. Monkeys BR, JD, and ST were also trained to perform other visually guided tasks that were not used in the current experiments. Once training was complete, we implanted a microelectrode array (Blackrock Microsystems). In monkeys BR and ST, we implanted a 10 × 10 microelectrode array in area V1. In monkeys SY and JD, we implanted a pair of 6 × 8 microelectrode arrays in V4. In monkey SY, both arrays were in V4 in the right hemisphere while monkey JD received bilateral V4 implants. We identified areas V1 and V4 using stereotactic coordinates and by visually inspecting the sulci. We placed the V1 arrays posterior to the border between V1 and V2 and placed the V4 arrays between the lunate and the superior temporal sulci. The two V4 arrays were connected to a single percutaneous connector. The distance between adjacent electrodes was 400 μm, and each electrode was 1-mm long. At this depth, the electrodes were likely to be in the middle cortical layers, although the curvature of the brain relative to the array and other experimental factors make it difficult to be certain.
We recorded neuronal activity during daily experiments for several weeks in each animal. During each daily experiment, the monkeys were rewarded for passively fixating while we presented single checkerboard stimuli for 1,000 ms. We varied the location, size, and orientation of the stimuli across all of our experiments. The stimuli were generally positioned such that both the disk and at least some of the shadowed part of the shadow checkerboard (and corresponding region of the paint checkerboard) fell within the classical spatial receptive fields of the population of neurons we recorded and typically spanned sizes between 4 and 15° of visual angle. The stimuli were repositioned within and across days with the goal of changing the configuration of different parts of the image on the receptive field of different neurons. We define an experimental session as a set of all disk luminances in both paint and shadow contexts that were presented at one location, size, and orientation. Multiple sessions were often collected during a single daily experiment with different stimulus configurations randomly interleaved. Because V1 receptive fields are substantially smaller than those of V4 neurons, the stimuli typically covered a greater proportion of V1 surrounds but the sizes of stimuli were varied across experiments performed in both areas. We collected data that spanned luminance values between 0.05 and 1, in steps of 0.05, and analyzed data for stimuli where the disk was an increment relative to its immediate surround (disk luminance: 0.20 or greater). Only trials where the monkey maintained good fixation were retained. Our data sets were limited by the availability of high-quality neuronal recordings and the monkeys’ willingness to complete a sufficient number of behavioral trials. 
We included data from sessions 1) where there were at least five trials for each intensity for paint and for shadow stimuli where the monkey maintained good fixation, and 2) where the best root-mean-squared decoding error of our linear population decoder variants (see x-axis values in Fig. 8, B and C, below) was ≤0.20. We adopted this latter criterion because it seemed most conservative to make statements about the relationship between the encoding of luminance and context in situations where luminance was encoded with reasonable fidelity. The root mean squared error (RMSE) criterion led us to exclude 4 of 6 sessions from monkey BR, 1 of 17 sessions from monkey ST, 17 of 28 sessions from monkey JD, and 2 of 128 sessions from monkey SY. Note that simply decoding each disk luminance with a null model that estimates all disk luminances as the mean disk luminance leads to a mean (across sessions) RMSE of 0.24.¹ We verified that relaxing the RMSE exclusion criterion to this null model value of 0.24, which excluded only 5 of 179 sessions, did not affect our conclusions. The average number of total trials per session in the data set was 648. Our analyzed data set includes 18 recording sessions in V1 (2 from monkey BR and 16 from monkey ST) and 137 recording sessions in V4 (11 from monkey JD and 126 from monkey SY). Trial-by-trial data for each session, in the form of disk luminance, stimulus type (paint or shadow), and resulting spikes per electrode, are available in a public data repository at URL https://figshare.com/articles/Individual_Session_Data/5948077/1. The data can also be provided in a rawer form upon request.
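The null model value can be checked with a short calculation. The sketch below assumes an equal number of trials at each analyzed disk luminance (0.20 to 1.00 in steps of 0.05); actual sessions may have had unequal trial counts, which is one reason the across-session mean need not match exactly.

```python
import math

# Analyzed disk luminances: 0.20, 0.25, ..., 1.00 (17 values).
luminances = [round(0.20 + 0.05 * i, 2) for i in range(17)]

# Null model: always guess the mean luminance.
mean_luminance = sum(luminances) / len(luminances)
null_rmse = math.sqrt(
    sum((x - mean_luminance) ** 2 for x in luminances) / len(luminances))

print(round(null_rmse, 3))  # 0.245
```

Under this assumption the null RMSE is about 0.245, in line with the reported value of 0.24.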
All spike sorting was done manually following the experiment using Plexon’s Offline Sorter. We sorted single units as well as multiunit clusters (multiunit clusters, which comprise the majority of our data set, were sorted to remove noise). During recordings from the chronically implanted microelectrode arrays we used in V1 and V4, it was nearly impossible to tell whether we recorded from the same single-unit or multiunit clusters on the array across subsequent days. Because of this, our primary analyses are based on neurons that were recorded simultaneously during a single recording session. For this study, we have combined data from single units and multiunits, and we use the term “unit” to refer to either. We included units for analysis if their response to the checkerboard stimuli was significantly different from the baseline response 100 ms before stimulus onset (t-test, P < 0.01, with Bonferroni correction for the number of units recorded during the session). We recorded from an average of 96 units per session from monkey BR, 97 from monkey ST, 55 from monkey JD, and 83 from monkey SY. To allow for the latency of V1 and V4 responses, our analyses are based on spike counts calculated from 30 to 1,030 ms after stimulus onset for V1 and 50–1,050 ms after stimulus onset for V4. Analysis of response dynamics and indication that our data are unlikely to be sensitive to the response interval chosen for analysis are shown in results (see Fig. 6).
Population decoding.
To study how neuronal populations encode luminance and context, we used a linear decoding approach.
We used standard linear regression to predict disk luminance from the spike count responses of populations of simultaneously recorded units. We fit the luminance of paint and shadow trials (all together) as a linear combination of the responses of all simultaneously recorded units to stimuli with disk luminances ≥0.2 (all stimuli where the disks were increments relative to their immediate surround). Specifically, we fit the model x = Yb + b0, where x is a #trials by 1 vector of disk luminances, Y is a #trials by #units matrix of spike count responses to the analyzed stimuli, b is a #units by 1 vector of weights, and b0 is a scalar affine term. We then assessed the decoded luminance of paint and shadow trials separately. To assess how well the luminance decoders performed, the RMSE of the luminance predictions was obtained by comparing the decoded luminances to the true disk luminances using 10-fold cross validation.
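A linear decoder of this form, evaluated with 10-fold cross validation, can be sketched in Python with NumPy as follows. This is an illustration rather than the analysis code used in the study: the function name, the random fold assignment, and the synthetic Poisson population are all assumptions made for the example.

```python
import numpy as np


def cross_validated_decoding(spike_counts, luminances, n_folds=10, seed=0):
    """Fit x = Y b + b0 on training folds; return held-out predictions and RMSE.

    spike_counts: (#trials, #units) array Y; luminances: (#trials,) array x.
    """
    rng = np.random.default_rng(seed)
    n_trials = spike_counts.shape[0]
    # Randomly assign each trial to one of n_folds roughly equal folds.
    fold_ids = rng.permutation(n_trials) % n_folds
    decoded = np.empty(n_trials)
    for fold in range(n_folds):
        test = fold_ids == fold
        train = ~test
        # Append a column of ones so the last coefficient is the affine term b0.
        design_train = np.column_stack([spike_counts[train], np.ones(train.sum())])
        coeffs, *_ = np.linalg.lstsq(design_train, luminances[train], rcond=None)
        design_test = np.column_stack([spike_counts[test], np.ones(test.sum())])
        decoded[test] = design_test @ coeffs
    rmse = np.sqrt(np.mean((decoded - luminances) ** 2))
    return decoded, rmse


# Synthetic example: 40 hypothetical units with linear luminance tuning.
rng = np.random.default_rng(1)
true_luminance = rng.choice(np.arange(0.20, 1.0001, 0.05), size=400)
gains = rng.normal(0.0, 10.0, size=40)
rates = np.clip(20.0 + np.outer(true_luminance, gains), 0.1, None)
counts = rng.poisson(rates).astype(float)
decoded, rmse = cross_validated_decoding(counts, true_luminance)
```

Given a Boolean array marking shadow trials, the returned `decoded` values can be split by context to compare decoded luminance for paint and shadow stimuli.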
We chose to study linear decoders because 1) it is known that the computations required for linear decoding can be implemented by neurons, and 2) information that may be read out by a linear decoder is reasonably described as explicitly represented in the neuronal population (Majaj et al. 2015; Pagan et al. 2013). The latter point is important, as our primary interest is not in determining whether information about disk luminance and paint-shadow context is present in some form in V1 and V4, but rather in determining the degree to which this information explicitly supports a paint-shadow effect. In preliminary analyses, we also explored maximum likelihood decoders, which estimated disk luminance based on which luminance maximized the likelihood of the observed responses. These did not perform as well in a cross-validated RMSE sense as standard linear regressions, presumably because our data set does not contain enough trials to adequately estimate the unit response parameters (e.g., response mean, response variance) required to compute response likelihoods, and we do not report results based on maximum likelihood decoders. We also repeated our decoding analyses using trial-shuffled data to destroy noise correlations between simultaneously recorded units. Although correlated neurons contribute to the non-unique decoding weights we observed, destroying the correlation structure had a small effect on the decoding RMSE and did not change the key features of the paint-shadow effect shown in results (see Fig. 8, B and C).
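One common way to trial-shuffle data is to permute trial order independently for each unit within each stimulus condition, which preserves each unit's tuning while removing trial-by-trial covariation. The sketch below assumes this scheme; the text does not specify the exact shuffling procedure, and the function name is hypothetical.

```python
import numpy as np


def shuffle_trials_within_condition(spike_counts, condition_labels, seed=0):
    """Independently permute each unit's trials within each stimulus condition.

    spike_counts: (#trials, #units) array; condition_labels: (#trials,) array
    with one label per trial (e.g., a code for luminance and context).
    """
    rng = np.random.default_rng(seed)
    shuffled = spike_counts.copy()
    for condition in np.unique(condition_labels):
        idx = np.flatnonzero(condition_labels == condition)
        for unit in range(spike_counts.shape[1]):
            # A fresh permutation per unit breaks across-unit trial alignment.
            shuffled[idx, unit] = spike_counts[rng.permutation(idx), unit]
    return shuffled


# Small example: 8 trials, 5 units, 2 conditions.
example_counts = np.arange(40, dtype=float).reshape(8, 5)
example_labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])
example_shuffled = shuffle_trials_within_condition(example_counts, example_labels)
```

Each unit's set of responses within a condition is unchanged, so per-condition means and variances are preserved; only the pairing of responses across units on a given trial changes.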
Note that our decoding methods allow any combination of weights on the units we recorded. If luminance information was best decoded using the responses of a small number of well-tuned units (perhaps those whose receptive fields best overlapped the stimuli we used), the weights of those units could be large while the weights on the rest of the population could be 0. In practice, however, the distributions of weights were broad, and the decodings were nonunique (see results).
RESULTS
Human Subjects Perceive Higher Lightness in the Shadow Than in the Paint Context
The main goal of the human psychophysical experiments was to quantify the paint-shadow effect for our stimuli.
An example psychometric function is shown in Fig. 2A. This plots the fraction of trials on which test disks seen in the shadow checkerboard were judged to be lighter than the reference disk in the paint checkerboard, as a function of test disk luminance. The luminance of the reference disk was 0.5. Sensibly, as the test disk luminance increased it was judged lighter more of the time. The PSE obtained from the maximum-likelihood cumulative normal fit to the data is indicated by the dashed line. This value was inferred from the fit and corresponds to the test disk luminance at which subjects would report that the test was lighter than the reference on 50% of trials. The difference between the PSE and the reference disk luminance shows the perceptual effect of context on lightness. Here, the luminance of the PSE is less than that of the reference disk, indicating in turn that disks of equal luminance appear lighter in the shadow checkerboard than in the paint checkerboard.
Figure 2B summarizes the psychophysical results for all of the paint-shadow measurements for the same subject/determination whose example psychometric function is shown in Fig. 2A. Each point represents one PSE from a single session, with data from the two sessions of the single determination shown. For the cases where the reference disk was in the paint checkerboard, the reference disk luminance is on the x-axis and the PSE luminance is on the y-axis. For cases where the reference disk was in the shadow checkerboard, the PSE luminance is on the x-axis and the reference disk luminance is on the y-axis. In all cases, a lower luminance was required for a disk in shadow to appear the same as a disk in paint. We summarized this effect (which we refer to as the paint-shadow effect) as the negative log10 of the slope of the best-fit line through the PSE points (solid line in Fig. 2B). The fit line was constrained to pass through the origin, and in fitting we only considered points where the x-axis value was in the range 0.25–0.75.² This allows us to make a matched choice that avoids neuronal saturation when we perform a parallel analysis on the neuronal data below. For the data shown in Fig. 2B, the slope was 0.84 and the paint-shadow effect was 0.08. The convention of choosing the negative, rather than positive, log10 slope makes a positive paint-shadow effect one that is consistent with the observation that disks of the same luminance appear lighter in shadow.
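The paint-shadow effect computation can be sketched as follows. The example assumes an ordinary least-squares fit for the line through the origin (the fitting criterion is not stated in the text) and uses hypothetical PSE points chosen to lie exactly on a line of slope 0.84.

```python
import math


def paint_shadow_effect(paint_luminances, shadow_luminances, x_range=(0.25, 0.75)):
    """Return (slope, effect) for a least-squares line through the origin.

    x values are disk luminances in paint; y values are the perceptually
    matched disk luminances in shadow. Only points with x inside x_range
    are used. effect = -log10(slope).
    """
    pairs = [(x, y) for x, y in zip(paint_luminances, shadow_luminances)
             if x_range[0] <= x <= x_range[1]]
    # Least-squares slope of y = slope * x with no intercept.
    slope = sum(x * y for x, y in pairs) / sum(x * x for x, _ in pairs)
    return slope, -math.log10(slope)


# Hypothetical matches lying on a line of slope 0.84.
slope, effect = paint_shadow_effect([0.25, 0.50, 0.75], [0.21, 0.42, 0.63])
print(round(slope, 2), round(effect, 2))  # 0.84 0.08
```

A slope below 1 means that less luminance is needed in shadow to match a disk in paint, which gives a positive effect under the negative-log convention.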
The results shown in Fig. 2B were characteristic of the data we obtained from other subjects/determinations. Figure 2C plots the paint-shadow effect obtained for each subject/determination (solid circles). The mean effect was 0.064 (±0.006 SE). We applied the same analysis procedures to control paint-paint checkerboard pairings (solid squares). Here, as expected, the mean paint-shadow effect was close to 0 (0.005 ± 0.001). The psychophysical data provide a quantitative measurement in humans of the magnitude of the paint-shadow illusion for our stimuli.
Criteria for a Neuronal Explanation for the Lightness Illusion
The existence of lightness illusions and constancy tells us that the neuronal representation of lightness combines information about the luminance of light reflected from objects with information about the context in which they are viewed. The goal of the physiological part of this study is to understand how information from these two separate sources is represented in populations of cortical neurons and, in particular, the degree to which and the manner in which the neuronal representation of disk luminance is affected by variation between the paint and shadow contexts. Our guiding hypothesis is that luminance and context are represented jointly in neuronal populations in the visual cortex and that this ultimately leads to a context-dependent transformation of luminance to lightness as the information is read out by subsequent processing stages. More specifically, we test the idea that parsimonious accounts of decoding surface lightness from the measured population can provide a higher readout for disks in the shadow checkerboard than for disks in the paint checkerboard. If this is the case, the nature of the population code reveals a mechanism that can contribute to the visual system’s ultimate representation of surface lightness. Thus we suggest that the way that disk luminance and context are encoded in a candidate neuronal population should satisfy the following criteria.
Criterion 1.
Disk luminance should be encoded with good fidelity in ways that are broadly consistent with human discrimination psychophysics. For example, human subjects are more sensitive to subtle luminance changes of a disk when the luminance of the disk is low than when it is high.
Criterion 2.
Context should also affect the population responses when disk luminance is held fixed, so that the readout of disk luminance could be affected by context.
Criterion 3.
Plausible methods of reading out lightness from the population responses, regardless of whether the responses of individual neurons themselves vary monotonically with luminance, should accommodate a context effect in the same direction as the illusion, so that the readout lightness of a disk in shadow is higher than that of a corresponding-luminance disk in paint. In addition, the same readout should account for the fact that lightness increases with luminance when context is held fixed.
To provide intuition for how neuronal representations could fulfill the above criteria, Fig. 3, A and B, illustrate two scenarios. The schematics in Fig. 3, A and B, show a neuronal population space. Each dimension in this space could be taken to represent the firing rate of one of the simultaneously recorded neurons (so a population of 100 neurons would be represented in a 100-dimensional space). The response to a visual stimulus would then be represented by a point that indicated the number of spikes each unit fired during the stimulus presentation. More generally, a neuronal population space could be a lower dimensional projection of the individual neuron firing rate space. In both schematics shown, the variation in neuronal response to changes in the luminance of disks in the paint checkerboard is encoded along the direction represented by the red arrow, while the luminance of disks in shadow checkerboards is encoded along the direction represented by the blue arrow.
In the scenario represented by Fig. 3A, varying the luminance of disks in both paint and shadow causes the neuronal representation to vary along a single direction, with the effect of context being to shift the representation of disks in shadow along this direction relative to the representation of disks in paint. Here it would be natural to read out the lightness of the disk by projecting the responses onto a readout dimension that was aligned with the common direction of stimulus variation. This dimension is shown in Fig. 3A by the dashed black arrow, with the position of the arrow shifted laterally from the red and blue arrows to avoid excessive clutter in the depiction. Because the effect of paint vs. shadow context shown in Fig. 3A is to shift the representation of disk luminance along the single direction of variation, the lightness decoder illustrated will produce an obligate paint-shadow effect. This coding idea underlies, at least implicitly, a number of single-unit and functional MRI studies of the neuronal representation of lightness, in which the question posed is whether the response magnitude of individual units or voxels to luminance is shifted by context in a direction consistent with perceptual effects evoked by the stimuli under study (Boyaci et al. 2007, 2010; Cornelissen et al. 2006; Haynes et al. 2004; Kinoshita and Komatsu 2001; MacEvoy and Paradiso 2001; Pereverzeva and Murray 2008; Perna et al. 2005; Roe et al. 2005; Rossi and Paradiso 1996, 1999).
In the scenario represented by Fig. 3B, the direction of response variation corresponding to varying disk luminance is different for the paint and shadow contexts. As in Fig. 3A, the neuronal representation of luminance is affected by context but in a qualitatively different manner. Here multiple possible readout directions for decoding lightness are shown (black dashed arrows), and across these there are potential tradeoffs between the precision with which the readout lightness encodes within-context luminance variation and the degree to which the readout will reveal a paint-shadow effect. The goal of our study is to determine whether the representation of lightness in early visual cortex carries with it a requisite paint-shadow effect (as in Fig. 3A) or whether, as in Fig. 3B, there are many equivalent readout dimensions, which carry with them a range of paint-shadow effects.
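The difference between the two schematic scenarios can be made concrete with a toy two-unit population. The sketch below is illustrative only: the direction vectors, the luminance value, and the function names are hypothetical and are not derived from the recordings. It assumes noiseless responses that scale linearly with luminance along a context-specific direction, and shows that when paint and shadow vary along different directions (as in Fig. 3B), the ratio of decoded shadow to decoded paint values at equal luminance depends on the readout direction.

```python
import math

def project(response, readout):
    """Decoded value: projection of a population response onto a unit-norm readout."""
    norm = math.sqrt(sum(w * w for w in readout))
    return sum(r * w for r, w in zip(response, readout)) / norm

# Hypothetical directions of luminance variation in a two-unit population space.
PAINT_DIR = (1.0, 0.0)    # paint context
SHADOW_DIR = (0.8, 0.6)   # shadow context, a different direction (Fig. 3B-like)

def response(luminance, direction):
    """Noiseless toy response: luminance scales the context-specific direction."""
    return tuple(luminance * d for d in direction)

def shadow_over_paint(readout, luminance=0.5):
    """Ratio of decoded shadow to decoded paint value at equal disk luminance."""
    shadow = project(response(luminance, SHADOW_DIR), readout)
    paint = project(response(luminance, PAINT_DIR), readout)
    return shadow / paint

ratio_a = shadow_over_paint((1.0, 0.0))        # shadow decoded as darker
ratio_b = shadow_over_paint((1.0, 0.5))        # shadow decoded as lighter
ratio_c = shadow_over_paint((1.0, 1.0 / 3.0))  # no difference between contexts
```

In this toy case, three readouts that all track within-context luminance give ratios below, above, and equal to 1, which is the sense in which the paint-shadow effect is not obligate in a Fig. 3B-like representation.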
Note that these schematics are simplified for illustrative purposes. They show the effect of luminance variation as lines in a one-dimensional subspace of a two-dimensional space, but the actual dimensionalities are higher. In addition, in higher dimensions, the variation may trace out nonlinear paths within the subspaces they occupy, and the subspaces occupied by the paint and shadow could share some dimensions but diverge in others. These considerations add richness beyond what is shown in the schematics. Below, we analyze our data to understand the relationship between lightness perception and the neuronal population representations of luminance and context.
Neuronal Populations in V1 and V4 Encode Luminance and Context
Based on previous studies (Leopold and Logothetis 1996; Rossi and Paradiso 1999; Sheinberg and Logothetis 1997), we chose primary visual cortex (V1) and V4 as areas in which to examine the neuronal representation of disk luminance and the effect of context. We recorded simultaneously from several dozen units in each area and positioned the stimuli so that they overlapped the receptive fields of the recorded units (Fig. 4A). The exact positioning, size, and orientation of the checkerboard were varied across sessions (see materials and methods).
In both areas V1 and V4, we found individual units that were selective for luminance and/or context. Figure 4B shows the mean firing rates of four example units as a function of luminance in the paint (red) or shadow conditions (blue). The responses of these four example units are quite heterogeneous. For example, the unit shown in the upper left panel is modulated by disk luminance but there is little if any effect of paint vs. shadow context. In contrast, the unit shown on the upper right is modulated by context but shows little modulation by disk luminance.
This heterogeneity was typical of our data set (Fig. 4C, example units are specified by the colored inset in each plot in Fig. 4B) and is expected given that we recorded from neurons with a range of receptive field locations and tuning properties. We characterized each unit by a luminance index and a paint-shadow index (see definitions in caption to Fig. 4). A positive luminance index indicates that a unit responds to paint stimuli with a higher firing rate as disk luminance increases. A positive paint-shadow index indicates that a unit responds more to shadow stimuli than paint stimuli when disk luminance is equated, irrespective of whether this is a response to the central test disk itself or to any other component of the display. Thus luminance and paint-shadow indexes of the same sign (Fig. 4C, 1st and 3rd quadrants) indicate neurons whose individual response properties are consistent with the psychophysics, while indexes of opposite sign (2nd and 4th quadrants) indicate neurons whose response properties go in the opposite direction from the psychophysics.
Consistent with reports in the literature of the existence of atypical luminance and contrast response tuning curves (Bushnell et al. 2011; Sani et al. 2013), our data set contained many units that had either positive or negative luminance or paint-shadow indexes. In both areas and for both indexes, the population means and medians were close to 0. Because it is difficult to determine the extent to which we recorded from the same units on subsequent days and thus the extent to which the data across days are independent, it is difficult to determine whether deviations from 0 mean are statistically significant. In V1, the average luminance index was 0.072 (SD = 0.091) and the average paint-shadow index was 0.0087 (SD = 0.026). In V4, the average luminance index was −0.0093 (SD = 0.082) and the average paint-shadow index was 0.0048 (SD = 0.042). However, a central tendency single-number summary of the population response (e.g., mean or median) is not the most useful measure for connecting the neural measurements to perception. Rather, understanding the neural basis for the checker-shadow illusion requires understanding how the responses of many neurons (or a subset of neurons) may be read out to guide lightness perception.
The luminance and paint-shadow indexes of the units did not strongly depend on the relationship between the units’ receptive field locations and the visual stimuli, and this relation did not explain a large proportion of the variance in the population data. The slopes of the best fit lines relating the absolute value of the luminance index and the distance between the center of the unit’s receptive field and the center of the disk were very close to 0 (−0.045 for V1 and 0.0015 for V4). Similarly, the slopes of the best fit lines relating the absolute value of the paint-shadow index and the distance between the receptive field center and the nearest diagonal line across the checkerboard that differentiates the paint from shadow stimuli were also very close to 0 (−0.0039 for V1 and 0.0017 for V4).
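As an illustration of this kind of slope estimate, the following minimal sketch fits an ordinary least-squares line to the absolute value of an index as a function of receptive field distance. The data values, distance units, and function name are hypothetical and chosen only to show the computation.

```python
def best_fit_slope(x, y):
    """Ordinary least-squares slope of y regressed on x (with an intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx

# Hypothetical example: receptive field distance (arbitrary units) and signed index.
distances = [0.0, 1.0, 2.0, 3.0]
indexes = [0.10, -0.08, 0.09, -0.11]

# The fit uses the absolute value of the index, so units with strong positive
# and strong negative indexes are treated alike.
slope = best_fit_slope(distances, [abs(v) for v in indexes])
```

A slope near 0, as in this toy example, indicates that index magnitude changes little with distance over the sampled range.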
Despite these weak relationships, it remains likely that the way V1 and V4 neurons respond to visual stimuli depends on the way those stimuli fall on their receptive fields. Indeed, earlier single-unit studies, where stimuli were chosen with respect to the properties of individual neurons, nonetheless found modest contextual modulation of the activity of single cortical neurons in situations where context affects lightness, and observed considerable neuron-to-neuron heterogeneity (for review, see Paradiso et al. 2006). Our data set is more diverse, and we think more closely approximates the range of neuronal responses involved in lightness perception in natural vision, in which the heterogeneity of single neuron responses is presumably overcome by basing percepts and behaviors on the activity of large neuronal populations.
The heterogeneity of selectivity to luminance and context is reminiscent of the mixed and uncorrelated selectivity of neurons in many sensory areas to different stimulus features (for example, in primate area MT; DeAngelis and Uka 2003; Smolyanskaya et al. 2013). Such mixed selectivity may be a natural consequence of a common underlying neuronal architecture as well as the fact that the neurons we recorded are tuned for many other stimulus features (e.g., orientation, spatial frequency, temporal frequency, color, texture, etc.). When selectivity is mixed, using the responses of many neurons, regardless of their specific tuning properties, may be advantageous for coding (Fusi et al. 2016; Rigotti et al. 2013).
We hypothesize that despite the heterogeneity we observe across units, the population response structure might support a parsimonious readout consistent with lightness perception. We next evaluate this hypothesis using the three criteria described above.
Criterion 1: Sensitivity of the neuronal population to small changes in disk luminance.
A basic prerequisite for a neuronal population to be part of the neuronal substrate for lightness perception is that it encodes luminance with reasonable precision. We asked how the neuronal population responses could be used to discriminate between a disk of particular luminance and one of increased luminance. Figure 5A shows the average ability of a cross-validated linear classifier to detect luminance increments of 0.1 as a function of the number of units (color) and the base disk luminance (x-axis) to which the increment was added for both V1 (Fig. 5A, left) and V4 (Fig. 5A, right) data.
The classifier performs above chance for all choices of number of units and performance worsens as base disk luminance rises. This is also true for the data from each individual animal (data not shown). The exact quantitative performance of the decoder depends on experimental factors such as the number of units we recorded from, the amount of noise in the recordings, the location of and size of the stimulus on the retina, and the extent to which the central disk overlapped with the receptive fields of the units. It is notable that, in addition to being above chance, the performance of the decoder shares a key feature with the psychophysical data. This feature is shown in Fig. 5B, where we plot the average fraction of trials on which psychophysical subjects correctly judged that a disk with given base luminance plus an increment of 0.1 was lighter than a disk of the base luminance alone, when both disks were presented in the paint checkerboard. Here too we see a decrease in fraction correct with base luminance, a phenomenon that is generally referred to as Weber’s Law. The fact that neuronal population sensitivity matches, qualitatively, this feature of the psychophysics supports the idea that the neurons we recorded are involved in the processing that transforms luminance to lightness; these neuronal populations satisfy the first criterion we proposed for a candidate neuronal explanation for the lightness illusion.
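The logic of a cross-validated linear classifier for luminance increments can be sketched as follows. This is not the classifier used in the study, whose details are given in materials and methods; it is a least-squares linear discriminant applied to synthetic spike counts, and the function name, trial counts, and noise parameters are assumptions made for illustration.

```python
import numpy as np

def cv_linear_classifier_accuracy(base, incr, n_folds=10, seed=0):
    """Cross-validated accuracy of a least-squares linear discriminant.

    base, incr: arrays of shape (trials, units) with responses to the base
    luminance and to the base luminance plus an increment.
    """
    rng = np.random.default_rng(seed)
    X = np.vstack([base, incr])
    y = np.concatenate([-np.ones(len(base)), np.ones(len(incr))])
    order = rng.permutation(len(y))
    correct = 0
    for test_idx in np.array_split(order, n_folds):
        train_idx = np.setdiff1d(order, test_idx)
        # A constant column gives the discriminant an offset term.
        A = np.column_stack([X[train_idx], np.ones(len(train_idx))])
        w, *_ = np.linalg.lstsq(A, y[train_idx], rcond=None)
        B = np.column_stack([X[test_idx], np.ones(len(test_idx))])
        correct += int(np.sum(np.sign(B @ w) == y[test_idx]))
    return correct / len(y)

# Synthetic population (hypothetical): heterogeneous sensitivity to the increment.
rng = np.random.default_rng(1)
n_trials, n_units = 200, 20
sensitivity = rng.normal(0.0, 1.0, n_units)   # positive and negative units
base = rng.normal(10.0, 2.0, (n_trials, n_units))
incr = rng.normal(10.0, 2.0, (n_trials, n_units)) + sensitivity
accuracy = cv_linear_classifier_accuracy(base, incr)
```

Because the weights are fit on training folds only, units with positive and negative sensitivity both contribute, and chance performance is 0.5.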
Criterion 2: Context affects neuronal population responses.
The schematics in Fig. 3 depict two neuronal population representations of luminance that would lead to a paint-shadow effect: one in which the neuronal representations of paint and shadow overlap in population space and one in which the paint and shadow stimuli vary along different directions. To begin to differentiate between these possibilities, we assessed the similarity of the representations of paint and shadow stimuli by visualizing the mean responses in each context and luminance condition as a function of time throughout the stimulus viewing period. To do so, we binned the responses in 20-ms bins, performed dimensionality reduction [using Gaussian Process Factor Analysis (Cowley et al. 2013; Yu et al. 2009); https://users.ece.cmu.edu/~byronyu/software/DataHigh/datahigh.html], and plotted the trajectories for each luminance and context (snapshots of projections from the Gaussian Process Factor Analysis representation onto the best two dimensions, as determined by linear discriminant analysis, for each cortical area and several time bins are shown in Fig. 6). This analysis was done using the entire data set, combining across sessions. It suggests that although there are interesting dynamics to the raw population responses, information about both luminance and context is represented in both V1 and V4 throughout the response period and that there is no large qualitative change in the representation of either luminance or context across the response interval. For this latter reason, all of the other analyses in this paper are performed using data aggregated across the entire response interval (see materials and methods).
Quantifying the neuronal paint-shadow effect.
To relate the neuronal representations of disk luminance to the perceptual paint-shadow effect, we need a method to quantify the neuronal paint-shadow effect in a manner that allows comparison to the psychophysically measured paint-shadow effect. Our approach was to use a linear regression technique to decode a neural correlate of disk lightness from the population responses and then determine whether the result, which we refer to as the decoded lightness, differs for paint and shadow disks in a way that is consistent with the psychophysics.
To illustrate the idea, we begin by considering decoders that recover disk luminance from neuronal population responses. We used linear regression to predict disk luminance. We fit the luminance of paint and shadow trials (all together) as a linear combination of the responses of all simultaneously recorded units to stimuli with disk luminances ≥0.2 (all stimuli where the disks are increments relative to their immediate surround). We then assessed the decoded luminance of paint and shadow trials separately. To assess how well the luminance decoders performed, the RMSE of the luminance predictions was obtained by comparing the decoded luminances to the true disk luminances using 10-fold cross validation. Figure 7A shows the mean decoded luminance obtained in this manner for an example V4 session for paint disks (red) and shadow disks (blue), as a function of stimulus luminance. The RMSE for this example decoding was 0.19, which may be compared with a null model value of 0.24 that would be obtained from simply assigning to every disk the mean luminance of all of the presented disks. The improvement in decoding RMSE relative to the null value indicates that the units recorded in this session carried information about disk luminance. In this session, disks in shadow were decoded to higher luminances than disks in paint. This suggests the possibility of using the luminance decoder as a way to generate decoded lightness. Indeed, for this example session, taking the decoded luminance as decoded lightness, the result is qualitatively consistent with the perceptual paint-shadow effect, where disks in shadow are perceived as lighter.
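The decoding procedure described above can be sketched with synthetic data. The code below is a minimal illustration, not the analysis code used for the paper: it fits a single linear decoder to paint and shadow trials together with 10-fold cross validation, compares the resulting RMSE with a null model that assigns every disk the mean luminance, and then averages the decoded values separately by context. The simulated responses, parameter values, and function names are all assumptions.

```python
import numpy as np

def cv_decode(responses, luminance, n_folds=10, seed=0):
    """Cross-validated linear-regression decoding of disk luminance."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(luminance))
    decoded = np.empty(len(luminance))
    for test_idx in np.array_split(order, n_folds):
        train_idx = np.setdiff1d(order, test_idx)
        A = np.column_stack([responses[train_idx], np.ones(len(train_idx))])
        w, *_ = np.linalg.lstsq(A, luminance[train_idx], rcond=None)
        B = np.column_stack([responses[test_idx], np.ones(len(test_idx))])
        decoded[test_idx] = B @ w   # predictions for held-out trials only
    return decoded

def rmse(pred, target):
    return float(np.sqrt(np.mean((pred - target) ** 2)))

# Synthetic session (hypothetical): units mix luminance and context sensitivity.
rng = np.random.default_rng(2)
n_trials, n_units = 400, 30
luminance = rng.choice(np.linspace(0.2, 1.0, 9), n_trials)   # increments only
is_shadow = rng.integers(0, 2, n_trials).astype(bool)
lum_gain = rng.normal(0.0, 5.0, n_units)
ctx_gain = rng.normal(0.0, 1.0, n_units)
responses = (10.0
             + np.outer(luminance, lum_gain)
             + np.outer(is_shadow.astype(float), ctx_gain)
             + rng.normal(0.0, 3.0, (n_trials, n_units)))

decoded = cv_decode(responses, luminance)            # paint and shadow fit together
decoder_rmse = rmse(decoded, luminance)
null_rmse = rmse(np.full(n_trials, luminance.mean()), luminance)
mean_decoded_paint = float(decoded[~is_shadow].mean())   # assessed separately
mean_decoded_shadow = float(decoded[is_shadow].mean())
```

A decoder RMSE below the null RMSE indicates that the simulated population carries luminance information, which parallels the comparison of 0.19 with 0.24 reported for the example session.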
We examined whether the decoders tended to draw on the responses of just a few units or more broadly on the responses of many units. For each session, we ordered the absolute value of the decoding weights, from largest to smallest. We then computed the sum of these weights and asked how many units accounted for 25, 50, and 75% of that sum. We expressed these numbers as a percentage of the total number of units for that session. For a decoder that draws primarily on just a few units, a small number of units would account for most of the absolute weight. We found that, on average, it required 5% of units to account for 25% (±2% SD) of the total absolute weight, 16% (±5%) to account for 50% of the total, and 36% (±6%) to account for 75%. Thus, on average, ~30% of the units contributed to the central 50% (25–75%) of the total absolute weight. We interpret this as indicating that the optimal decoder draws broadly on the responses of the neural population. As a check on this conclusion, we repeated the analysis using cross-validated lasso regression. Lasso regression uses an L1-norm regularization term to minimize the number of nonzero weights obtained in the regression solution and thus provides a more conservative approach. We chose the regularization hyperparameter based on a fivefold cross-validation procedure, in which we evaluated the cross-validated RMSE as a function of the regularization hyperparameter (25 values logarithmically spaced between 10^−5 and 10) and chose the value that gave the lowest cross-validated RMSE. Using this value, we reanalyzed our data set with lasso regression. We found weight distribution values very similar to those obtained with standard linear regression. [With lasso regression, it required 5% of units to account for 25% (±2% SD) of the total absolute weight, 15% (±4%) to account for 50% of the total, and 34% (±5%) to account for 75%. Thus, even with the more conservative lasso regression method, ~29% of the units (on average) contributed to the central 50% (25–75%) of the total absolute weight.]
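The weight-concentration measure can be written compactly. The sketch below follows the steps described above: it sorts the absolute weights, accumulates them, and reports the percentage of units needed to reach each fraction of the total. It is applied to a hypothetical weight vector, and the function name and example weights are illustrative assumptions.

```python
def percent_units_for_weight(weights, fractions=(0.25, 0.50, 0.75)):
    """Percentage of units needed to reach each fraction of total absolute weight."""
    magnitudes = sorted((abs(w) for w in weights), reverse=True)
    total = sum(magnitudes)
    result = {}
    for fraction in fractions:
        running, count = 0.0, 0
        for m in magnitudes:
            running += m
            count += 1
            # Small tolerance guards against floating-point rounding at the threshold.
            if running >= fraction * total - 1e-12:
                break
        result[fraction] = 100.0 * count / len(magnitudes)
    return result

# Hypothetical decoding weights for a 10-unit session.
weights = [4.0, -3.0, 2.0, -1.0, 0.5, -0.5, 0.25, -0.25, 0.25, 0.25]
concentration = percent_units_for_weight(weights)
# concentration == {0.25: 10.0, 0.5: 20.0, 0.75: 30.0}
```

A decoder dominated by a few units would reach 75% of the total with a very small percentage of units; a broadly distributed decoder needs a larger percentage.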
To quantify the neuronal paint-shadow effect for the illustrative decoding shown in Fig. 7A, we fit the relationship between stimulus luminance and decoded luminance/lightness with a smooth function (fits shown as solid lines in Fig. 7A), and then used the fits to identify luminances for disks in paint and disks in shadow that decoded to the same value. Figure 7B plots pairs of neuronal matches (solid blue circles) obtained in this manner. We quantified the neuronal paint-shadow effect as the negative log10 of the slope of the best fitting line through the plotted points, with the line constrained to pass through the origin.3 For the data shown in Fig. 7B, the neuronal paint-shadow effect was 0.05. Figure 7, C and D, shows the same analysis for an example V1 session, where the neuronal paint-shadow effect was found to be small (0.01).
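The paint-shadow effect computation can be illustrated with a short sketch. It assumes, as a labeled assumption not stated in this passage, that each match pairs a paint disk luminance (x) with the shadow disk luminance (y) that decodes to the same value, so that a slope below 1 yields a positive effect, meaning shadow disks are decoded as lighter. The matched values and function names are hypothetical.

```python
import math

def slope_through_origin(x, y):
    """Least-squares slope of the line y = slope * x (no intercept)."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

def paint_shadow_effect(paint_lum, matched_shadow_lum):
    """Negative log10 of the slope of the match line through the origin."""
    return -math.log10(slope_through_origin(paint_lum, matched_shadow_lum))

# Hypothetical matches: each shadow disk needs 90% of the paint luminance
# to decode to the same value.
paint = [0.2, 0.4, 0.6, 0.8]
shadow = [0.18, 0.36, 0.54, 0.72]
effect = paint_shadow_effect(paint, shadow)   # about 0.046
```

On this scale, a slope of 1 (identical matches) gives an effect of 0, and the sign flips if shadow disks require more luminance than paint disks to reach the same decoded value.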
A feature of the illustrative analysis shown in Fig. 7 is that it reflects the combined effect of the way that paint and shadow stimuli are represented in the neuronal population and the action of the particular way we chose to build the decoder. That is, for illustrative purposes we equated decoded lightness with the output of a decoder built to estimate stimulus luminance. As depicted in the schematic shown in Fig. 3B, when the population response to paint and shadow stimuli is multidimensional, there may be multiple decoders that can read out a neural correlate of lightness that preserves information about variation in disk luminance with high fidelity. Indeed, the decoding approach we illustrated in Fig. 7 was one that sought to find the same decoded luminance for both paint and shadow disks; that is, it was a decoder that sought to minimize the inferred neuronal paint-shadow effect. Because our interest is in whether the neuronal population codes can support the observed lightness effects, simply restricting the analysis to a decoder built to estimate luminance, as we did for illustrative purposes, is not appropriate. Rather, we want to characterize the range of paint-shadow effects that emerge when we explore a set of decoders, while at the same time requiring that the decoders preserve information about luminance variation.
Criterion 3: The neuronal representations in V1 and V4 may be read out with high precision in a manner that produces a paint-shadow effect.
To explore the range of neuronal paint-shadow effects obtainable with high-fidelity linear decoders, we determined the cost in decoding quality (quantified as RMSE) when we introduced a paint-shadow gain into the regression. That is, rather than constructing a single lightness decoder that attempts to estimate veridical disk luminance, we constructed a set of lightness decoders by introducing different estimation targets for paint disks and for shadow disks. We did this by defining a gain factor, g, and dividing the target decoded paint luminance by g while multiplying the target decoded shadow luminance by g. The gain factor g implicitly sets a target paint-shadow effect for the decoder, and in this sense the decoders illustrated in Fig. 7 had a target paint-shadow effect of 0. We varied g at 20 levels, corresponding to paint-shadow effects between −0.15 and 0.11, a range that encompasses the human paint-shadow effect as well as an equal-sized effect in the opposite direction. For each resulting decoder we obtained the paint-shadow effect using the procedure illustrated by Fig. 7. Figure 8A shows the obtained paint-shadow effects as a function of decoding RMSE for an example V4 session. Here the decoding RMSE was computed with respect to the target values, that is, with respect to paint disk luminance divided by g and shadow disk luminance multiplied by g. We found that we obtain a wide range of paint-shadow effects (for this session between approximately −0.11 and 0.07) without a large effect on the decoding RMSE. If we restrict attention to RMSE values within 5% of the minimum value we found across all regressions, the range is still substantial (black filled circles in Fig. 8A), approximately −0.04 to 0.04.
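The two bookkeeping steps in this analysis, building gain-adjusted targets and selecting decoders whose RMSE lies within 5% of the minimum, can be sketched as follows. The relation between g and the target effect used here, 2·log10(g), is an assumption that follows from defining the effect as the negative log10 of the slope relating matched shadow luminance to paint luminance; the function names and example values are hypothetical.

```python
import math

def lightness_targets(luminances, is_shadow, g):
    """Regression targets: paint luminance divided by g, shadow multiplied by g."""
    return [lum * g if shadow else lum / g
            for lum, shadow in zip(luminances, is_shadow)]

def implied_target_effect(g):
    """Target effect under the stated assumption.

    Equal targets require L_shadow * g == L_paint / g, so the match line has
    slope 1 / g**2 and the effect is -log10(1 / g**2) = 2 * log10(g).
    """
    return 2.0 * math.log10(g)

def effect_range_within_tolerance(rmses, effects, tolerance=0.05):
    """Range of effects among decoders with RMSE within tolerance of the minimum."""
    best = min(rmses)
    kept = [e for r, e in zip(rmses, effects) if r <= best * (1.0 + tolerance)]
    return min(kept), max(kept)

# Hypothetical (RMSE, measured effect) pairs for six gain values in one session.
rmses = [0.200, 0.192, 0.190, 0.191, 0.198, 0.215]
effects = [-0.11, -0.04, 0.00, 0.02, 0.04, 0.07]
low, high = effect_range_within_tolerance(rmses, effects)   # (-0.04, 0.04)
```

With g = 1 the targets equal the veridical luminances and the implied target effect is 0, which corresponds to the luminance decoder used for illustration in Fig. 7.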
Figure 8, B and C, shows for each experimental session the range of paint-shadow effects we obtained from lightness decoders whose RMSE was within 5% of the best RMSE obtained across all the examined decoders. The ranges were typically large (mean 0.077 in V1 and 0.054 in V4). The range straddles 0 for 81% of 155 recording sessions (89% of 18 V1 sessions, including both sessions from monkey BR and 14/16 sessions from monkey ST, and 80% of 137 V4 sessions, including 7/11 sessions from monkey JD and 103/126 sessions from monkey SY) and is strictly <0 for about the same number of sessions as it is strictly >0 (8% of 155 sessions strictly positive compared with 11% strictly negative).
Our results show that many (although not most) of the high-precision lightness decodings lead to a paint-shadow effect that is in the direction of the psychophysics. These data are therefore consistent with the idea that the populations of V1 and V4 units we recorded satisfy the final criterion for a candidate neuronal explanation for the lightness illusion. That said, the neuronal paint-shadow effect we observe is not obligate; one can construct high-precision decoders that do not show the paint-shadow effect as well as ones that do. In addition, the upper edge of the decoded paint-shadow effect range is frequently smaller in magnitude than the mean psychophysical paint-shadow effect (see horizontal solid black line in Fig. 8, B and C). These observations suggest a population representation more like that depicted in Fig. 3B than in Fig. 3A (see also Fig. 7). Thus the integration of context and luminance we observe in V1 and V4 neuronal populations provides a mechanism for computations that begin to extract perceptual lightness from stimulus luminance, but these populations do not reveal directly the full computations nor the specific readout mechanisms that would link the population code directly to the perceptual effects.
DISCUSSION
A Candidate Neuronal Explanation for Perceived Lightness
We combined human psychophysics, simultaneous recordings from dozens of neurons in V1 and V4, and neuronal population analyses to investigate the neuronal population mechanisms underlying lightness perception. With the psychophysics, we quantified a lightness illusion in the form of a measured paint-shadow effect. With neuronal recordings for the same stimuli, we found that the population representation of disk luminance is affected by the context in which the disk is presented. We found that the nature of the population representation allowed a range of high-precision decoders. Although it was generally the case that this range included ones that produced neuronal paint-shadow effects in the direction consistent with the psychophysics (for which the neuronal lightness of a disk of a given luminance was higher in the shadow than in the paint context), it was also the case that the range included ones that produced the opposite effect.
At first it might seem maladaptive to have context alter the lightness of disks that share the same luminance. Such contextual interactions, however, can be useful if we regard the function of lightness perception as providing a stable representation of object surface reflectance across changes in illumination, as well as across changes in other contextual variables (e.g., object shape, position and pose). Our study shows that population representations early in the visual cortex (V1 and V4) combine information about the disk luminance and context so that a subsequent high-precision linear readout could lead to a representation of lightness consistent with the paint-shadow effect. Our work leaves open the question of whether such a readout is in fact deployed by the visual system, as there are also high-precision readouts that are inconsistent with perceived lightness. In addition, it is an open question as to whether a single fixed readout can accommodate lightness constancy with respect to contextual changes beyond the paint-shadow manipulation we studied.
Our stimuli had the property that they equate the luminance of the local and global surround of the disks. They thus seem likely to silence a number of retinal mechanisms that contribute to lightness perception more generally: contrast coding and light adaptation. We designed the stimuli to emphasize the role of cortical processing in our measurements, and our work does not characterize the contributions of contrast coding and light adaptation nor does it distinguish luminance coding from contrast coding. It would be interesting in future work to study more general stimulus manipulations with our methods.
Is It Possible To Positively Identify the Neuronal Mechanism for Lightness Perception Given Experimentally Feasible Data Sets?
Our results speak to how the representations of luminance in neuronal populations in V1 and V4 meet the three criteria we described for a candidate neuronal mechanism underlying the checker-shadow illusion. First, we showed that both areas encode luminance (or equivalently for our stimuli, contrast) with relatively high fidelity and that the neural sensitivity of both V1 and V4 populations depends on disk luminance in a manner qualitatively similar to the dependence revealed by human psychophysics. Second, we showed that the representations of luminance and context interact in the neuronal population.
With respect to the third criterion, that plausible methods of decoding neuronal lightness from population responses should result in the luminance of shadow stimuli being read out as higher in lightness than that of corresponding paint stimuli, our results show that for many sessions in both V1 and V4, it is possible to construct high-precision luminance decoders that result in a paint-shadow effect that is similar to our psychophysical results (e.g., all decodings from V1 or V4 whose range bars cross the solid black line in Fig. 8, B and C). It would be tempting to declare “victory” at this juncture and conclude that the existence of such decoders implies that population responses in these areas form the neural basis of the paint-shadow effect. However, we also found a substantial number of high-precision luminance decoders that are accompanied by the opposite paint-shadow effect, and even more that are consistent with no paint-shadow effect at all. Thus conclusions about the relation between the neural population responses in V1 and V4 and the perceptual paint-shadow effect are contingent on assumptions about how the information carried by the population is read out. Our current data do not test these assumptions. We emphasize that this uncertainty was not a foregone conclusion: for some sessions, we do find responses where the range of decoder paint-shadow effects is sufficiently narrow to allow strong statements about the population recorded in that session. Had all sessions revealed decoders that were consistent in this manner, our data would have supported stronger conclusions.
Given the above, an important lesson from this study is that the neuronal weightings corresponding to high-precision lightness decoders (or likely decoders of any visual feature from experimentally feasible numbers of units) are far from unique. That is, the weighting can change substantially across decoders that have close to equal precision, resulting in a large range of, e.g., paint-shadow effects.
Why were we unable to conclusively identify a neuronal mechanism for the paint-shadow effect? There are at least four possibilities:
1) Monkeys do not experience the illusion. This seems unlikely given the similarity of monkey and human vision, but training monkeys to do a lightness task using the current stimuli would be required to positively rule out this possibility (see Huang et al. 2002).
2) We were looking in the wrong brain areas. Previous work suggests that early visual areas are involved with lightness perception (Boyaci et al. 2007, 2010; Cornelissen et al. 2006; Haynes et al. 2004; Kinoshita and Komatsu 2001; MacEvoy and Paradiso 2001; Pereverzeva and Murray 2008; Perna et al. 2005; Roe et al. 2005; Rossi et al. 1996; Rossi and Paradiso 1996, 1999), but the paint-shadow effect could in principle be revealed more clearly if we had applied our methods to characterize population activity in another, potentially downstream area (such as inferotemporal cortex; for example, see Leopold and Logothetis 1996; Sheinberg and Logothetis 1997). Indeed, our results suggest that the paint-shadow effect is set up by the representation of lightness in early visual cortex but finalized by the particular way that other parts of the brain process and read out this information.
3) We focused on the wrong subsets of neurons. Previous single unit studies of the neural basis of lightness optimized stimulus features (e.g., size and location) to match the tuning of the unit under study. Our multineuron recording approach made that impossible, and as a result, the majority of the units we recorded had tuning features that were suboptimal for the particular visual stimuli we presented. To explore this possibility, we made pseudopopulations consisting only of the units in which the absolute value of their luminance indexes (computed as in Fig. 4) were in the top decile. Unsurprisingly, these pseudopopulations encoded more luminance information than the actual recorded populations. As with the recorded populations, however, high quality lightness decodings were accompanied by a wide range of paint-shadow effects. This result is consistent with the observation that units with high-magnitude luminance indexes had paint-shadow indexes of both signs (Fig. 4C). That is, our recordings contained units that respond more to high luminance and either more to paint than shadow stimuli or vice versa. Therefore, even if we recorded only from subpopulations of units for which the stimuli were optimized, we likely would have reached the same conclusions. It is also possible that our electrode arrays missed critical subpopulations of neurons, possibly because the most relevant neurons are located in cortical layers not adequately sampled by our electrodes.
4) V1 and V4 contain neuronal mechanisms that are the direct correlate of the paint-shadow effect, but we recorded from too few neurons to reveal these mechanisms or applied the wrong analysis methods. Indeed, the lightness decoding weights we obtained are likely different from the ones the monkey uses. There are many possible reasons for this: we recorded from a very small subset of the neurons that respond to these stimuli; the luminance sensitivity, context dependence, and trial-to-trial variability of many neurons are correlated (meaning that even the truly optimal weights are nonunique); the monkey may use a nonlinear decoder whose behavior differs in fundamental ways from that of the linear decoder we considered; the neuronal responses might have been modulated by attention or motivation had our monkeys been performing a luminance discrimination task; etc. Recording from all of the relevant neurons is not feasible; there are likely many thousands of neurons in multiple areas that respond to these stimuli, and constructing decoders with that many neurons would require a number of trials that is several orders of magnitude larger than is experimentally feasible. Whether physiological data of the sort we recorded can ever positively identify the neuronal mechanisms underlying phenomena such as the paint-shadow effect remains an interesting question for future work. Consistent with previous work (Boyaci et al. 2007, 2010; Cornelissen et al. 2006; Haynes et al. 2004; Kinoshita and Komatsu 2001; MacEvoy and Paradiso 2001; Pereverzeva and Murray 2008; Perna et al. 2005; Roe et al. 2005; Rossi et al. 1996; Rossi and Paradiso 1996, 1999), we did observe a small number of units in both V1 and V4 that exhibited strong sensitivity to both luminance and context (see Fig. 4). It remains possible that this subset of neurons plays a special role in lightness perception, but our data are equally consistent with the possibility that they do not.
One approach to address this possibility, as well as the issue raised in point 1 above, would be to train the monkeys to do a lightness discrimination task. We could then try to infer decoders that predict behavior and examine the units whose firing strongly influenced the decoders. Successful identification of this type of choice-based decoder might require a richer behavior than the typical two alternative forced choice discrimination task. In addition, the power of this method might be improved by increasing the number of stimulus dimensions varied in the experiments.
Neuronal Population Measures Can Reveal Representations of Sensory or Cognitive Factors for Which Individual Neuronal Responses Are Heterogeneous
Many studies have found that the responses of single neurons in early visual cortex are context dependent (Friedman et al. 2003; Kinoshita and Komatsu 2001; MacEvoy et al. 1998; MacEvoy and Paradiso 2001; Roe et al. 2005; Rossi and Paradiso 1996, 1999; Rossi et al. 1996), but linking neuronal activity to lightness perception has proven difficult. Our approach differs from previous investigations in that we recorded from populations of neurons using a stimulus that was not optimized for the specific neurons under study. This experimental situation is more analogous to natural vision, where different aspects of a complex stimulus fall on the receptive fields of different neurons. With the use of these stimuli, our results demonstrate that neuronal responses to luminance and context are extremely heterogeneous (Fig. 4; see also Bushnell et al. 2011; Sani et al. 2013), even as early as the primary visual cortex. Furthermore, in response to these stimuli, many individual units were not very sensitive to luminance (e.g., many luminance indexes were close to 0; Fig. 4). Nonetheless, the best luminance decoders drew nontrivially on the responses of many units. These results suggest that lightness, and likely many other perceptual phenomena, may arise from the readout of activity across large neuronal populations. Understanding the nature of these population representations in sensory cortices and how they are read out by downstream areas will require recording from, and analyzing the responses of, neuronal populations as a whole in response to multiple dimensions of stimulus variation, likely coupled with behavioral measures (see also Shapley and Hawken 2011).
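The point that a population readout can recover luminance even when individual units are only modestly sensitive can be illustrated with a small simulation. The sketch below is not our analysis pipeline and uses no recorded data: the unit count, gains, noise level, binary context coding, and all function names are hypothetical choices made for illustration. It fits a least-squares linear decoder to synthetic, heterogeneous responses and compares its held-out error with that of the best single unit and with a constant-prediction baseline (one plausible, assumed analog of a null value).

```python
import numpy as np

def simulate_population(n_units=60, n_trials=400, seed=0):
    """Synthetic responses of heterogeneous units (illustrative assumption, not data)."""
    rng = np.random.default_rng(seed)
    luminance = rng.uniform(0.0, 1.0, size=n_trials)            # disk luminance, arbitrary units
    context = rng.integers(0, 2, size=n_trials).astype(float)   # hypothetical binary context label
    lum_gain = rng.normal(0.0, 1.0, size=n_units)               # mixed-sign luminance sensitivity
    ctx_gain = rng.normal(0.0, 1.0, size=n_units)               # mixed-sign context sensitivity
    baseline = rng.uniform(5.0, 15.0, size=n_units)
    mean_rates = baseline + np.outer(luminance, lum_gain) + np.outer(context, ctx_gain)
    responses = mean_rates + rng.normal(0.0, 1.0, size=mean_rates.shape)  # trial-to-trial noise
    return responses, luminance, context

def fit_linear_decoder(responses, target):
    """Least-squares weights (plus an offset term) mapping responses to the target."""
    design = np.column_stack([responses, np.ones(responses.shape[0])])
    weights, *_ = np.linalg.lstsq(design, target, rcond=None)
    return weights

def decode(responses, weights):
    design = np.column_stack([responses, np.ones(responses.shape[0])])
    return design @ weights

def rmse(predicted, actual):
    return float(np.sqrt(np.mean((predicted - actual) ** 2)))

responses, luminance, context = simulate_population()
train, test = np.arange(0, 300), np.arange(300, 400)

# Population decoder: all units contribute to the luminance estimate.
weights = fit_linear_decoder(responses[train], luminance[train])
population_rmse = rmse(decode(responses[test], weights), luminance[test])

# Single-unit decoders: one unit at a time, same fitting procedure.
single_unit_rmse = []
for unit in range(responses.shape[1]):
    w_unit = fit_linear_decoder(responses[train][:, [unit]], luminance[train])
    single_unit_rmse.append(rmse(decode(responses[test][:, [unit]], w_unit), luminance[test]))
best_single_rmse = min(single_unit_rmse)

# Constant-prediction baseline: always guess the mean training luminance.
null_rmse = rmse(np.full(test.size, luminance[train].mean()), luminance[test])

print(f"population: {population_rmse:.3f}  best single unit: {best_single_rmse:.3f}  null: {null_rmse:.3f}")
```

In this toy setting the population decoder should outperform every single-unit decoder because it pools many weak signals and can discount the context-driven component of the responses. The same caveat raised above applies: correlated tuning makes the fitted weights nonunique, so the weights themselves should not be read as the readout a downstream area uses.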
Together, our results suggest that information about luminance and context is encoded in large neuronal populations in V1 and V4 in a manner that could, but does not necessarily, account for the paint-shadow effect.
GRANTS
This work was supported by National Eye Institute (NEI) Grants 4R00-EY-020844-03 and R01-EY-022930 (to M. R. Cohen) and R01-E10016 (to D. H. Brainard), a training grant slot under National Institute of Neurological Disorders and Stroke Grant 5T32NS7391-14 (to D. A. Ruff), a Whitehall Foundation Grant (to M. R. Cohen), a Klingenstein-Simons Fellowship (to M. R. Cohen), Simons Foundation Collaboration on the Global Brain Grants 709862 (to M. R. Cohen) and 324759 (to D. H. Brainard), a Sloan Research Fellowship (to M. R. Cohen), and a McKnight Scholar Award (to M. R. Cohen). This work was also supported by NEI core grants P30-EY-008098 (to M. R. Cohen) and P30-EY-001583 (to D. H. Brainard).
DISCLOSURES
No conflicts of interest, financial or otherwise, are declared by the authors.
AUTHOR CONTRIBUTIONS
D.A.R., D.H.B., and M.R.C. conceived and designed research; D.A.R. and D.H.B. performed experiments; D.A.R., D.H.B., and M.R.C. analyzed data; D.A.R., D.H.B., and M.R.C. interpreted results of experiments; D.A.R., D.H.B., and M.R.C. prepared figures; D.A.R., D.H.B., and M.R.C. drafted manuscript; D.A.R., D.H.B., and M.R.C. edited and revised manuscript; D.A.R., D.H.B., and M.R.C. approved final version of manuscript.
ACKNOWLEDGMENTS
We thank Joshua Alberts for assistance with recordings, Karen McCracken for technical assistance, and Amy Ni for helpful comments on an earlier version of the manuscript.
Footnotes
The null value varies slightly from session to session, depending on how many stimuli of each disk luminance were presented in that session. The reported value of 0.24 is the mean over sessions where the decoded root mean squared error was better than 0.20.
In preliminary analyses, we also explored fitting the data with an intercept parameter and the line constrained to have a slope of 1. These fits were of about the same quality as the slope-only fits, and we chose the slope-only fits for the theoretical reason that these describe the gain-change computation required to achieve lightness constancy across a change of illuminant intensity.
This fitting choice parallels the way we obtained the psychophysical paint-shadow effect. As with the psychophysical data, we have not pursued a detailed comparison of different functional forms for fitting the relationship illustrated by Fig. 7, B and D.
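The two one-parameter fits contrasted in the footnotes above (a line through the origin with free slope, and a line with slope fixed at 1 and a free intercept) both have closed-form least-squares solutions. The sketch below is a minimal illustration under stated assumptions: the function names and the example values of `x` and `y` are hypothetical and are not taken from our data or from Fig. 7.

```python
import numpy as np

def fit_slope_only(x, y):
    """Least-squares fit of y = slope * x (line through the origin)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    slope = float(np.dot(x, y) / np.dot(x, x))
    residual = y - slope * x
    return slope, float(np.sqrt(np.mean(residual ** 2)))

def fit_intercept_only(x, y):
    """Least-squares fit of y = x + intercept (slope constrained to 1)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    intercept = float(np.mean(y - x))
    residual = y - (x + intercept)
    return intercept, float(np.sqrt(np.mean(residual ** 2)))

# Hypothetical example: y is an exact gain change of x, so the slope-only
# model describes it perfectly and the unit-slope model does not.
x = np.array([0.2, 0.4, 0.6, 0.8])
y = 0.9 * x

slope, slope_rmse = fit_slope_only(x, y)
intercept, intercept_rmse = fit_intercept_only(x, y)
print(f"slope-only: slope={slope:.3f}, rmse={slope_rmse:.3f}")
print(f"intercept-only: intercept={intercept:.3f}, rmse={intercept_rmse:.3f}")
```

A pure gain change scales all values by a common factor, which is why the slope-only form is the natural description of that computation; over a narrow range of values the two forms can fit about equally well.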
REFERENCES
- Adelson EH. Lightness perception and lightness illusions. In: The New Cognitive Neurosciences (2nd ed.), edited by Gazzaniga M. Cambridge, MA: MIT Press, 2000, p. 339–351.
- Boyaci H, Fang F, Murray SO, Kersten D. Responses to lightness variations in early human visual cortex. Curr Biol 17: 989–993, 2007. doi: 10.1016/j.cub.2007.05.005.
- Boyaci H, Fang F, Murray SO, Kersten D. Perceptual grouping-dependent lightness processing in human early visual cortex. J Vis 10: 4, 2010. doi: 10.1167/10.9.4.
- Brainard DH. The Psychophysics Toolbox. Spat Vis 10: 433–436, 1997. doi: 10.1163/156856897X00357.
- Brainard DH, Pelli D, Robson T. Display characterization. In: Encyclopedia of Imaging Science and Technology, edited by Hornak J. New York: Wiley, 2002, p. 172–188. doi: 10.1002/0471443395.img011.
- Bushnell BN, Harding PJ, Kosai Y, Bair W, Pasupathy A. Equiluminance cells in visual cortical area V4. J Neurosci 31: 12398–12412, 2011. doi: 10.1523/JNEUROSCI.1890-11.2011.
- Cornelissen FW, Wade AR, Vladusich T, Dougherty RF, Wandell BA. No functional magnetic resonance imaging evidence for brightness and color filling-in in early human visual cortex. J Neurosci 26: 3634–3641, 2006. doi: 10.1523/JNEUROSCI.4382-05.2006.
- Corney D, Haynes JD, Rees G, Lotto RB. The brightness of colour. PLoS One 4: e5091, 2009. doi: 10.1371/journal.pone.0005091.
- Cowley BR, Kaufman MT, Butler ZS, Churchland MM, Ryu SI, Shenoy KV, Yu BM. DataHigh: graphical user interface for visualizing and interacting with high-dimensional neural activity. J Neural Eng 10: 066012, 2013. doi: 10.1088/1741-2560/10/6/066012.
- DeAngelis GC, Uka T. Coding of horizontal disparity and velocity by MT neurons in the alert macaque. J Neurophysiol 89: 1094–1111, 2003. doi: 10.1152/jn.00717.2002.
- Friedman HS, Zhou H, von der Heydt R. The coding of uniform colour figures in monkey visual cortex. J Physiol 548: 593–613, 2003. doi: 10.1113/jphysiol.2002.033555.
- Fusi S, Miller EK, Rigotti M. Why neurons mix: high dimensionality for higher cognition. Curr Opin Neurobiol 37: 66–74, 2016. doi: 10.1016/j.conb.2016.01.010.
- Gilchrist A. Seeing Black and White. Oxford, UK: Oxford University Press, 2006. doi: 10.1093/acprof:oso/9780195187168.001.0001.
- Gold JI, Shadlen MN. The neural basis of decision making. Annu Rev Neurosci 30: 535–574, 2007. doi: 10.1146/annurev.neuro.29.051605.113038.
- Haynes JD, Lotto RB, Rees G. Responses of human visual cortex to uniform surfaces. Proc Natl Acad Sci USA 101: 4286–4291, 2004. doi: 10.1073/pnas.0307948101.
- Heekeren HR, Marrett S, Ungerleider LG. The neural systems that mediate human perceptual decision making. Nat Rev Neurosci 9: 467–479, 2008. doi: 10.1038/nrn2374.
- Hillis JM, Brainard DH. Distinct mechanisms mediate visual detection and identification. Curr Biol 17: 1714–1719, 2007. doi: 10.1016/j.cub.2007.09.012.
- Huang X, MacEvoy SP, Paradiso MA. Perception of brightness and brightness illusions in the macaque monkey. J Neurosci 22: 9618–9625, 2002. doi: 10.1523/JNEUROSCI.22-21-09618.2002.
- Huang X, Paradiso MA. V1 response timing and surface filling-in. J Neurophysiol 100: 539–547, 2008. doi: 10.1152/jn.00997.2007.
- Hung CP, Ramsden BM, Roe AW. A functional circuitry for edge-induced brightness perception. Nat Neurosci 10: 1185–1190, 2007. doi: 10.1038/nn1948.
- Kingdom F, Prins N. Psychophysics: A Practical Introduction. New York: Academic, 2010.
- Kingdom FA. Lightness, brightness and transparency: a quarter century of new ideas, captivating demonstrations and unrelenting controversy. Vision Res 51: 652–673, 2011. doi: 10.1016/j.visres.2010.09.012.
- Kinoshita M, Komatsu H. Neural representation of the luminance and brightness of a uniform surface in the macaque primary visual cortex. J Neurophysiol 86: 2559–2570, 2001. doi: 10.1152/jn.2001.86.5.2559.
- Leopold DA, Logothetis NK. Activity changes in early visual cortex reflect monkeys’ percepts during binocular rivalry. Nature 379: 549–553, 1996. doi: 10.1038/379549a0.
- MacEvoy SP, Kim W, Paradiso MA. Integration of surface information in primary visual cortex. Nat Neurosci 1: 616–620, 1998. doi: 10.1038/2849.
- MacEvoy SP, Paradiso MA. Lightness constancy in primary visual cortex. Proc Natl Acad Sci USA 98: 8827–8831, 2001. doi: 10.1073/pnas.161280398.
- Majaj NJ, Hong H, Solomon EA, DiCarlo JJ. Simple learned weighted sums of inferior temporal neuronal firing rates accurately predict human core object recognition performance. J Neurosci 35: 13402–13418, 2015. doi: 10.1523/JNEUROSCI.5181-14.2015.
- Pagan M, Urban LS, Wohl MP, Rust NC. Signals in inferotemporal and perirhinal cortex suggest an untangling of visual target information. Nat Neurosci 16: 1132–1139, 2013. doi: 10.1038/nn.3433.
- Paradiso MA, Blau S, Huang X, MacEvoy SP, Rossi AF, Shalev G. Lightness, filling-in, and the fundamental role of context in visual perception. Prog Brain Res 155: 109–123, 2006. doi: 10.1016/S0079-6123(06)55007-1.
- Pelli DG. The VideoToolbox software for visual psychophysics: transforming numbers into movies. Spat Vis 10: 437–442, 1997. doi: 10.1163/156856897X00366.
- Pereverzeva M, Murray SO. Neural activity in human V1 correlates with dynamic lightness induction. J Vis 8: 1–10, 2008. doi: 10.1167/8.15.8.
- Perna A, Tosetti M, Montanaro D, Morrone MC. Neuronal mechanisms for illusory brightness perception in humans. Neuron 47: 645–651, 2005. doi: 10.1016/j.neuron.2005.07.012.
- Radonjić A, Brainard DH. The nature of instructional effects in color constancy. J Exp Psychol Hum Percept Perform 42: 847–865, 2016. doi: 10.1037/xhp0000184.
- Rigotti M, Barak O, Warden MR, Wang XJ, Daw ND, Miller EK, Fusi S. The importance of mixed selectivity in complex cognitive tasks. Nature 497: 585–590, 2013. doi: 10.1038/nature12160.
- Roe AW, Lu HD, Hung CP. Cortical processing of a brightness illusion. Proc Natl Acad Sci USA 102: 3869–3874, 2005. doi: 10.1073/pnas.0500097102.
- Rossi AF, Paradiso MA. Temporal limits of brightness induction and mechanisms of brightness perception. Vision Res 36: 1391–1398, 1996. doi: 10.1016/0042-6989(95)00206-5.
- Rossi AF, Paradiso MA. Neural correlates of perceived brightness in the retina, lateral geniculate nucleus, and striate cortex. J Neurosci 19: 6145–6156, 1999. doi: 10.1523/JNEUROSCI.19-14-06145.1999.
- Rossi AF, Rittenhouse CD, Paradiso MA. The representation of brightness in primary visual cortex. Science 273: 1104–1107, 1996. doi: 10.1126/science.273.5278.1104.
- Sani I, Santandrea E, Golzar A, Morrone MC, Chelazzi L. Selective tuning for contrast in macaque area V4. J Neurosci 33: 18583–18596, 2013. doi: 10.1523/JNEUROSCI.3465-13.2013.
- Shapley R, Hawken MJ. Color in the cortex: single- and double-opponent cells. Vision Res 51: 701–717, 2011. doi: 10.1016/j.visres.2011.02.012.
- Sheinberg DL, Logothetis NK. The role of temporal cortical areas in perceptual organization. Proc Natl Acad Sci USA 94: 3408–3413, 1997. doi: 10.1073/pnas.94.7.3408.
- Smolyanskaya A, Ruff DA, Born RT. Joint tuning for direction of motion and binocular disparity in macaque MT is largely separable. J Neurophysiol 110: 2806–2816, 2013. doi: 10.1152/jn.00573.2013.
- Vladusich T, Lucassen MP, Cornelissen FW. Do cortical neurons process luminance or contrast to encode surface properties? J Neurophysiol 95: 2638–2649, 2006. doi: 10.1152/jn.01016.2005.
- Yu BM, Cunningham JP, Santhanam G, Ryu SI, Shenoy KV, Sahani M. Gaussian-process factor analysis for low-dimensional single-trial analysis of neural population activity. J Neurophysiol 102: 614–635, 2009. doi: 10.1152/jn.90941.2008.