eLife. 2020 Nov 11;9:e58360. doi: 10.7554/eLife.58360

A new no-report paradigm reveals that face cells encode both consciously perceived and suppressed stimuli

Janis Karan Hesse 1,2, Doris Y Tsao 1,2
Editors: Ming Meng3, Floris P de Lange4
PMCID: PMC7676863  PMID: 33174836

Abstract

A powerful paradigm to identify neural correlates of consciousness is binocular rivalry, wherein a constant visual stimulus evokes a varying conscious percept. It has recently been suggested that activity modulations observed during rivalry may represent the act of report rather than the conscious percept itself. Here, we performed single-unit recordings from face patches in macaque inferotemporal (IT) cortex using a no-report paradigm in which the animal’s conscious percept was inferred from eye movements. We found that large proportions of IT neurons represented the conscious percept even without active report. Furthermore, on single trials we could decode both the conscious percept and the suppressed stimulus. Together, these findings indicate that (1) IT cortex possesses a true neural correlate of consciousness and (2) this correlate consists of a population code wherein single cells multiplex representation of the conscious percept and veridical physical stimulus, rather than a subset of cells perfectly reflecting consciousness.

Research organism: Rhesus macaque

Introduction

Having conscious experience is arguably the most important reason why it matters to us whether we are alive or dead. The question of which signals in the brain reflect this conscious experience, and which reflect obligatory processing of input regardless of conscious experience, is a central puzzle of neuroscience. For example, activations in the retina may correlate with the conscious percept of flashing light but are arguably entirely driven by physical input, much of which never evolves into a conscious percept. Another driver of neural activity that can be confounded with signals related to conscious perception is report. Recently, it has been suggested that brain regions may correlate with conscious perception simply because they are driven by the active report of it (Aru et al., 2012; Block, 2019; Block, 2020; Boly et al., 2017; Frässle et al., 2014; Koch et al., 2016; Overgaard and Fazekas, 2016; Panagiotaropoulos et al., 2020; Safavi et al., 2014; Tsuchiya et al., 2016; Tsuchiya et al., 2015).

A paradigm known as binocular rivalry is useful for distinguishing responses related to conscious perception from those driven by obligatory processing of physical input (Blake et al., 2014; Tong et al., 2006): when two incompatible stimuli such as a face and an object are shown to the left and right eyes, respectively, one does not perceive a constant superimposition of the two but rather an alternation between face and object, even though the physical input is fixed (Figure 1a). Since these alternations are internally generated, they cannot be attributed to pure feedforward processing of external input.

Figure 1. A novel no-report paradigm.

(a) Illustration of binocular rivalry stimuli used in the paradigm. Four example trials are shown. Each trial was presented continuously for 800 ms without a blank period between trials. The first and second rows show stimuli in the left and right eyes, respectively. If different stimuli are shown to the left and right eyes, as in this example, one’s percept will spontaneously alternate between the two, as shown in the example perceptual trajectory in the third row. Stimuli in each eye contained a fixation spot at one of four possible positions. (b) Example eye traces from a human subject. Red and blue traces show the distance of the eye position from the fixation spot in the right and left eyes, respectively. Thick lines show the average. Traces are aligned to the onset of a trial where the subject reported that the percept switched from face to object (left) or object to face (right). (c) The bar plot shows the average proportion of trials where the inferred percept matched the percept reported by button press. White circles show accuracies of individual subjects. We inferred that a subject was perceiving face or object if the subject fixated on the face fixation spot (i.e., fixation spot in the eye of the face stimulus) or object fixation spot (i.e., fixation spot in the eye of the object stimulus), respectively, for at least half of the trial.

In previous studies, researchers trained monkeys to report their percept during binocular rivalry by releasing a lever. They found that the proportion of cells modulated by the reported percept increases along the visual hierarchy, with 20% of cells showing modulations in V1 (Leopold and Logothetis, 1996) compared to 90% of cells showing modulations in inferotemporal (IT) cortex (Sheinberg and Logothetis, 1997). Using functional magnetic resonance imaging (fMRI), Tong et al., 1998 found that the human fusiform face area responds to reported perceptual switches. Using single-unit recording, Gelbard-Sagiv et al., 2018 found that the activity of neurons in the human medial temporal lobe and frontal cortex is also modulated by the reported percept.

Although binocular rivalry dissociates the conscious percept from physical input, an important confounding factor remains. In all studies cited above, the monkey or human subject always actively reported their percept by a motor response. Thus it is possible that the observed neuronal activations were due to the act of report itself, including introspection, decision making, and motor action accompanying report, rather than a switch in conscious percept. This concern was emphasized in an fMRI experiment by Frässle et al., 2014 who compared modulations in the brain with and without active report. Many of the modulations observed in higher-level brain regions such as the frontal lobe disappeared when subjects did not actively report perceptual switches.

To infer the subject’s percept in the absence of report, Frässle et al. used two no-report paradigms that depended on pupil size and optokinetic nystagmus, respectively. To exploit pupil size, they presented stimuli with different brightness in the two eyes, causing the subject’s pupil size to vary according to the dominant percept’s brightness. To exploit optokinetic nystagmus, they presented gratings moving in opposite directions in the two eyes, causing the subject’s eye position to reflexively follow the direction of the dominant grating. Therefore, the conscious percept could be inferred by reading out pupil size and drift of eye position, respectively.

These no-report paradigms allow accurate prediction of the subject’s percept but are not free of confounds themselves (Overgaard and Fazekas, 2016). First, pupil size is known to correlate with arousal, surprise, attention, and other confounding factors (Bradley et al., 2008; Hoeks and Levelt, 1993; Preuschoff et al., 2011). Second, when optokinetic nystagmus is applied to moving non-grating stimuli such as natural objects that drive IT cortex, there will be confounding physical stimulus differences. For example, the dominant stimulus that is smoothly pursued by the subject’s eyes will tend to be stationary on the subject’s fovea and optimally modulate IT areas with foveal biases, while the non-dominant stimulus will be more eccentric and have increased motion velocity. Moreover, optokinetic nystagmus is still present in monkeys in which the conscious percept is diminished due to anesthesia with low doses of ketamine (Leopold et al., 2002).

Here, we introduce a new no-report paradigm that relies on active tracking of a fixation spot, unlike the reflex-based paradigms mentioned above. In this fixation-based paradigm the subject is required to maintain fixation on a jumping spot, a task that many animals in vision research are already trained to perform. While following the fixation spot, subjects view either unambiguous, monocular stimuli physically switching between a face and an object, or a binocular rivalry stimulus that switches only perceptually. For the binocular rivalry stimulus, a fixation spot is shown to each eye at different positions on the screen. Thus, when perceiving a face in the left eye, the subject will generally perceive only the fixation spot in the left eye and saccade to it, ignoring the fixation spot in the right eye. In this way, the subject’s percept can be inferred from eye movement patterns without active report.

In a second innovation, we performed electrophysiological recordings using a novel 128-electrode site Neuropixels-like probe that allowed us to measure responses from large numbers of cells simultaneously. This allowed us to address for the first time the extent to which neural activity is modulated by conscious perception in single trials. Sheinberg and Logothetis, 1997 reported that 90% of IT cells are modulated by conscious perception. However, a fact that has been largely overlooked is that the response modulations found in that study during the rivalry condition were clearly smaller than those in the physical condition. It is possible that the decrease arose due to incorrect reporting of the percept by the monkey on some trials, and cells were modulated just as strongly by perceptual as by physical alternations. However, the decrease could also have been due to a more interesting possibility: mixed selectivity of cells for the conscious percept and the suppressed stimulus on single trials in the rivalry condition. In other words, it is possible that single cells encode both the conscious percept and the suppressed stimulus during rivalry. Inter-trial averaging confounds these two possibilities. To distinguish them, it is critical to compare perceptual vs. physical response modulations for single trials.

To explore correlates of conscious perception, we targeted recordings to macaque face patches ML and AM. The macaque face patch system constitutes an anatomically connected network of regions in IT cortex dedicated to face processing (Chang and Tsao, 2017; Grimaldi et al., 2016; Hesse and Tsao, 2020; Tsao et al., 2006) and has served as an archetypal system for understanding object recognition in IT in general (Bao et al., 2020). To date, most response properties of cells in the face patch network can be explained using a feedforward framework without invoking conscious perception. For example, the functional hierarchy of this network, with increasing view invariance as one moves anterior from ML to AM (Freiwald and Tsao, 2010), can be explained by simple feedforward pooling mechanisms (Leibo et al., 2017). The representation of facial identity by cells in face patches through projection onto specific preferred axes can also be explained by feedforward mechanisms (Chang and Tsao, 2017).

Here, we explore activity in the face patch network using a binocular rivalry paradigm in which neural activity modulation is difficult to explain by feedforward filtering processes, since the stimulus remains unchanged. The hierarchical and feedback-rich organization of the face patch network (Freiwald and Tsao, 2010; Grimaldi et al., 2016) makes it a ripe testbed to examine the neural circuits underlying construction of conscious visual experience beyond feedforward filtering of visual input. It has been postulated that the fundamental architecture of the cortex is a predictive loop in which inference guided by internal priors plays a key role in determining what we see (Rao and Ballard, 1999). One explanation for binocular rivalry is that it arises as a consequence of such predictive coding, reflecting a high-level prior that two objects cannot occupy the same space (Hohwy et al., 2008).

We recorded from fMRI-identified face patches ML and AM in two monkeys using high channel-count electrodes, while we inferred the animals’ conscious percept through the no-report paradigm described above. We found that large proportions of cells in both face patches (57% in ML and 73% in AM) encoded the conscious percept even without active report. Population activity of perceptually modulated cells was modulated more weakly during rivalry than during physical stimulus transitions in single trials. Nevertheless, we could reliably decode the dynamically changing conscious percept from activity in single trials. Surprisingly, we could also decode suppressed stimuli using activity from the same cells, indicating that single cells multiplex information about the conscious percept and the suppressed stimulus. These findings suggest that the neural correlate of consciousness within IT cortex resides in a population code rather than a subset of cells perfectly reflecting consciousness, and different linear readouts can decode either the consciously perceived or the suppressed stimulus from the same population.

Results

We first confirmed that it is possible to correctly infer a subject’s conscious percept using a fixation-based no-report paradigm through a behavioral experiment in humans. We presented binocular rivalry stimuli consisting of a face (e.g., Obama) in the right eye and a non-face object (e.g., a taco) in the left eye, causing the percept to stochastically alternate between the two (Figure 1a). Each of the stimuli contained a fixation spot that jumped to one of four possible locations every trial. Trials were 2000 ms long and contained no blank period, that is, stimuli were presented continuously. If subjects fixated at the fixation spot presented in the right eye on a given trial, we inferred that they perceived the face and vice versa for the object. To verify that the percept of face or object could be inferred from fixations, we instructed six naïve human subjects to perform the fixation task while simultaneously reporting their conscious percept with button presses. On trials where the percept switched, subjects also switched the fixation spot they were following (Figure 1b). We were able to infer which image the subjects were consciously perceiving with accuracies ranging from 86% to 98% across subjects (average: 93%, Figure 1c).
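The fixation-based inference rule can be sketched in a few lines. This is a minimal illustration, not the authors' analysis code: the half-trial criterion comes from the Figure 1c legend, while the function names, the gaze-sample format, and the 2-degree fixation window (`radius`) are assumptions made for the example.

```python
import math


def infer_percept(eye_samples, face_spot, object_spot, radius=2.0):
    """Label one trial 'face', 'object', or None from gaze samples.

    eye_samples: list of (x, y) gaze positions in degrees of visual angle.
    face_spot / object_spot: (x, y) of the fixation spot shown with each image.
    radius: fixation window in degrees (assumed value, not from the paper).
    """
    def near(sample, spot):
        return math.dist(sample, spot) <= radius

    n = len(eye_samples)
    if n == 0:
        return None
    on_face = sum(near(s, face_spot) for s in eye_samples)
    on_object = sum(near(s, object_spot) for s in eye_samples)
    # Rule from the Figure 1c legend: the spot must be fixated
    # for at least half of the trial.
    if on_face >= n / 2 and on_face > on_object:
        return "face"
    if on_object >= n / 2 and on_object > on_face:
        return "object"
    return None  # neither spot followed long enough; trial left unlabeled


def agreement(inferred, reported):
    """Fraction of labeled trials where inferred and reported percepts match."""
    pairs = [(i, r) for i, r in zip(inferred, reported) if i is not None]
    return sum(i == r for i, r in pairs) / len(pairs) if pairs else float("nan")
```

Applied per trial and compared against button presses with `agreement`, a rule of this kind would produce the sort of accuracy measure summarized in Figure 1c.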

We next used the same method in monkeys to infer their conscious percept while recording from face patches ML and AM in IT. Importantly, the two monkeys in this study had never been trained to report their percept. They had previously been trained to maintain fixation on a spot (presented binocularly) and learned to perform the new task within 1 or 2 days, respectively (maintaining fixation on a spot for at least 80% of all trials). Since the monkeys were so adept at the task, we set the trial length to 800 ms (compared to 2000 ms in humans); this allowed higher temporal fidelity in determining the animal’s percept. We presented two types of stimuli: in the ‘physical’ condition, unambiguous monocular stimuli were physically switched between face and object. In the ‘perceptual’ (binocular rivalry) condition, the same face and object were continuously presented to the right and left eyes, respectively, so any changes in percept were internally generated. To account for individuals’ eye dominance, we balanced the contrasts of the stimuli in the two eyes so that the monkey followed both fixation spots equally often in the rivalry condition. After balancing, median dominance durations were 7.2 s for faces and 7.2 s for objects across the two monkeys. Similarly, in human subjects, median dominance durations were 8 s for faces and 10 s for objects as estimated from fixation patterns, and 8.1 s for faces and 8.3 s for objects as estimated from reports. We inferred switches during rivalry when monkeys behaviorally switched from following the fixation spot in one eye to following the fixation spot in the other eye (example eye traces, Figure 2a, top). Spike rasters from an example ML cell showed a stronger response after switches from object to face compared to switches from face to object (Figure 2a, bottom; rasters aligned to onset of trials in which a switch occurred). Figure 2b compares average response time courses to physical vs. perceptual switches in two example cells, one from ML and one from AM. Both cells responded more strongly to a physically presented face than object. Importantly, in the binocular rivalry condition the response of both cells was also higher when the monkey perceived a face (as inferred by its eye movement) than when the monkey perceived an object. Since the physical stimulus was constant in this condition, the response reflected the monkey’s conscious percept of a face and not just the physical input.

Figure 2. Example face cells modulated by both physical and perceptual switches.

(a) Top: Example eye traces from a macaque performing the task aligned to a trial where the inferred percept switched from face to object (left) and from object to face (right). Red and blue curves indicate distances from the face and object fixation spots, respectively (as in Figure 1b). Bottom: Spike raster of an example ML cell recorded in the same session as for the top panel. Responses are aligned to all trials where the inferred percept switched from face to object (left) and from object to face (right). (b) Left: Coronal slices from magnetic resonance imaging scan showing recording locations for the two example cells in this figure (top: face patch ML, bottom: face patch AM). Color overlay shows functional magnetic resonance imaging activation to visually presented faces vs. non-face objects. Middle: Peristimulus histograms (PSTHs) show neuronal response time courses aligned to trial onsets where the visual stimulus was physically switched from face to object (blue) or from object to face (red). Right: PSTHs aligned to trial onsets where the inferred percept switched from face to object (blue) or object to face (red). ML cell is same cell as in (a). Shaded areas indicate standard error of the mean across trials.

We recorded a total of 348 cells in ML and 210 cells in AM that were selective, i.e., they showed a significant difference between face and object in the physical switch condition (p<0.05, two-sided two-sample t-test). Since we recorded from face patches, most cells showed stronger responses to the physically presented face stimulus. Importantly, most cells kept their preference in the binocular rivalry condition (Figure 3). In face patch ML, 57% (200/348) of cells were significantly modulated by the conscious percept in the binocular rivalry condition and showed preference consistent with the physical switch condition (p<0.05, two-sided t-test), while 10% (34/348) of cells were significantly but inconsistently modulated. In AM, a face patch that receives input from ML (Grimaldi et al., 2016) and is the highest patch in the face patch hierarchy within IT (Freiwald and Tsao, 2010), the percentage of significant consistent modulation increased to 73% (153/210), with only 2% (5/210) showing significant inconsistent modulation. For both patches there was a clear correlation between modulation by the physical stimulus and modulation by the percept in binocular rivalry ($p = 2 \times 10^{-83}$, Pearson’s r=0.70, N=558 cells). Thus, in a no-report paradigm, cells in IT exhibit modulations by the conscious percept that reflect their response selectivity to physically unambiguous inputs.
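The per-cell quantity plotted in Figure 3 is a modulation index: the difference of mean spike counts for face and object, divided by their sum. The sketch below computes that index and checks whether a cell keeps its preference across conditions. The function names and the toy spike counts are hypothetical, and the significance tests reported above (t-tests, Pearson correlation) are deliberately left out.

```python
from statistics import mean


def modulation_index(face_counts, object_counts):
    """(R_face - R_object) / (R_face + R_object) from per-trial spike counts."""
    r_face, r_object = mean(face_counts), mean(object_counts)
    total = r_face + r_object
    # A silent cell has no defined preference; report 0 rather than divide by 0.
    return (r_face - r_object) / total if total > 0 else 0.0


def is_consistent(mi_physical, mi_rivalry):
    """True when the cell prefers the same stimulus in both conditions."""
    return mi_physical * mi_rivalry > 0


# Toy example (made-up spike counts for one cell).
mi_phys = modulation_index([12, 15, 11, 14], [3, 4, 2, 5])
mi_riv = modulation_index([9, 10, 8, 11], [5, 6, 4, 7])
print(mi_phys, mi_riv, is_consistent(mi_phys, mi_riv))
```

In this toy cell the rivalry index has the same sign as the physical index but a smaller magnitude, which is the pattern the text describes for consistently modulated cells.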

Figure 3. Large proportions of face cells show modulation by conscious percept.

The scatterplot shows modulation indices $(R_{\mathrm{face}} - R_{\mathrm{object}})/(R_{\mathrm{face}} + R_{\mathrm{object}})$ measuring the difference in responses (i.e., average spike count $R$) on trials where the inferred percept was face vs. trials where the inferred percept was object for the physical monocular condition (x-axis) and perceptual binocular rivalry condition (y-axis). Yellow and orange triangles show cells from ML without and with significant difference between perceived face and perceived object response in the binocular rivalry condition, respectively. Blue and green squares show cells from AM without and with significant difference between perceived face and perceived object response in the binocular rivalry condition, respectively.

Figure 3—figure supplement 1. Color and eye-of-origin confound control.

Left: Scatterplot similar to Figure 3 but modulation indices $(R_{\mathrm{preferred}} - R_{\mathrm{nonpreferred}})/(R_{\mathrm{preferred}} + R_{\mathrm{nonpreferred}})$ now show the difference between preferred and non-preferred stimulus. The preferred stimulus is face if the response to face is higher and non-face object if the response to non-face object is higher in the physical condition. Thus, by definition the x-values of all cells are positive. Right: Scatterplot of modulation indices $(R_{\mathrm{preferred}} - R_{\mathrm{nonpreferred}})/(R_{\mathrm{preferred}} + R_{\mathrm{nonpreferred}})$ for the same preferred and non-preferred object identities of stimuli when the colors and eye-of-origin of the two stimuli were switched. Importantly, the preference of a given stimulus identity was assigned based on responses to stimuli of the original color and eye-of-origin. N = 193 for ML and N = 120 for AM for both plots. Symbols have same conventions as in Figure 3.

After eliminating the report confound, two important potential confounds remain. First, cells could be selective for the eye-of-origin of the fixation point that the animal is following (e.g., a cell could respond selectively to a fixation spot in the fovea of the left eye). Second, since we presented binocular stimuli using red/cyan anaglyph goggles, a confound could arise if cells were selective for the color of the fixation spot that is in the fovea. To control for these two potential confounds, we switched the colors and eye-of-origin of the face and object stimuli, that is, where the face and its corresponding fixation spot were previously presented in red in one eye, they were now presented in cyan in the other eye and vice versa for the object (Figure 3—figure supplement 1). If cells followed color or eye-of-origin, then all the dots in the upper right quadrant in Figure 3—figure supplement 1a should move to the lower left quadrant in Figure 3—figure supplement 1b. Instead, the majority of cells followed the object identity rather than color or eye-of-origin for both the physical and perceptual conditions ($p = 9 \times 10^{-42}$ for physical condition and $p = 10^{-19}$ for perceptual condition, one-sided t-test, N=313 cells, alternative hypothesis: modulation indices for switched condition are greater than 0). This confirms that cells in IT cortex indeed represent the conscious percept rather than the color or eye-of-origin of the fixation spot.

The strong modulation by conscious percept in single cells suggests that we should be able to decode the percept on single trials from population activity. To test this, we performed recordings from multiple neurons simultaneously using S-probes with 32 electrode sites and passive Neuropixels-like probes with 128 electrode sites (see Materials and methods for details). Figure 4 shows the recordings from face patch ML in one session using the Neuropixels probe. In this session, we recorded 81 cells simultaneously, of which 63 were face-selective (Figure 4a). An example population time course snippet of cells recorded simultaneously in the perceptual switch condition showed clearly stronger activity across the recorded population during perception of face compared to object (Figure 4b). The average population response across cells to perceptual switches is shown in Figure 4c. We found above chance decoding of the perceptual condition in all 12 sessions (in all but one session, responses were recorded in both ML and AM, and cells were pooled across the two patches). Cross-validated accuracies of linear classifiers across different sessions are shown in Figure 4d (see Materials and methods). Decoding accuracy was 99% for the best session and 95% on average for the physical condition. For the perceptual condition, decoding accuracy was 88% on the best session and 78% on average.
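Single-trial decoding of this kind can be illustrated with a small cross-validated linear classifier. The sketch below is not the classifier from Materials and methods, which is not specified in this section: it swaps in a nearest-centroid rule (which has a linear decision boundary between two classes) with leave-one-out cross-validation, and runs it on synthetic data with made-up firing rates.

```python
import random


def centroid(rows):
    """Per-cell mean across a list of population vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def loo_decoding_accuracy(trials, labels):
    """Leave-one-out accuracy of a nearest-centroid decoder.

    trials: list of population vectors (one spike count per cell).
    labels: inferred percept for each trial, e.g. 'face' or 'object'.
    """
    correct = 0
    for i, (x, y) in enumerate(zip(trials, labels)):
        # Fit class centroids on every trial except the held-out one.
        train = [(t, l) for j, (t, l) in enumerate(zip(trials, labels)) if j != i]
        cents = {lab: centroid([t for t, l in train if l == lab])
                 for lab in {l for _, l in train}}
        pred = min(cents, key=lambda lab: sq_dist(x, cents[lab]))
        correct += pred == y
    return correct / len(trials)


# Synthetic demo: 60 trials, 20 hypothetical cells that fire more for faces.
rng = random.Random(0)
labels = ["face", "object"] * 30
trials = [[rng.gauss(8.0 if lab == "face" else 4.0, 2.0) for _ in range(20)]
          for lab in labels]
accuracy = loo_decoding_accuracy(trials, labels)
print(f"leave-one-out decoding accuracy: {accuracy:.2f}")
```

Holding out the test trial before computing the centroids is what makes the accuracy cross-validated; fitting on all trials and then scoring the same trials would overstate performance.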

Figure 4. Multi-channel recordings allow decoding of conscious percept on single trials.

(a) Left: Average responses (baseline-subtracted and normalized) of cells (rows) to 96 stimuli (columns) from six categories, including faces and other objects. Right: Waveforms of cells corresponding to rows on the left. Gray vertical bar on left indicates cells that significantly preferred face over object in the physical condition (p<0.05). (b) Top: Example eye trace across 24 trials as in Figure 1b during binocular rivalry (i.e., only perceptual, no physical switches). The inferred percept across trials according to eye trace is indicated by shading (red = face, blue = non-face object). Small black dots on top of eye traces indicate time points where our method detected saccades (see Materials and methods), which are used in Figure 5. Bottom: Response time course snippet of a population of 81 neurons recorded with a Neuropixels-like probe in ML simultaneously with the eye trace at top. Each row represents one cell; ordering same as in (a). Face-selective cells indicated by gray vertical bar on left. (c) Normalized average population response across all significantly face-selective ML cells recorded from one Neuropixels session (same session as in a and b) to perceptual switch from object to face (red) and face to object (blue). Shaded areas indicate standard error of the mean across cells. (d) Cross-validated decoding accuracy of a linear classifier trained to discriminate trials of inferred percept face vs. inferred percept object for the physical switch condition (x-axis) and perceptual switch condition (y-axis). Each plus symbol represents a session of neurons recorded simultaneously with multi-channel electrodes.

Looking at the population time course, we noticed bursts of activity that appeared to be triggered by saccades, which occurred even when an object was perceived (blue epochs in Figure 4b; small black dots on top indicate detected saccades). This suggested to us that cells modulated by perception might still carry information about the physical stimulus: the bursts may have been caused by responses to the suppressed face stimulus. To investigate this further, we selected cells that (i) showed both significant physical and perceptual modulation and (ii) consistently preferred the face over the object. We then averaged responses across these cells and computed response time courses triggered by individual saccades, grouped by whether a saccade occurred during a trial inferred to be face or object, respectively (Figure 5). We observed response modulations for both physical and perceptual conditions starting around 130 ms after saccade onset (Figure 5a). In the physical condition, a saccade during an object epoch led to response suppression, while a saccade during a face epoch led to response increase. In striking contrast, in the rivalry condition saccades led to response increase in both object and face epochs. As a consequence, during rivalry the response difference to a saccade between face and object, though significant ($p = 6 \times 10^{-23}$, two-sample t-test, N=701 saccades for object, N=703 saccades for face), was weaker than during the physical condition. Computing histograms of responses averaged across neurons for individual saccades shows that responses in the rivalry condition were less bimodal and spanned a smaller range compared to the physical condition (Figure 5b). Importantly, this difference in response profiles between physical and perceptual conditions was apparent even when pooling across both face and object trials (Figure 5b, middle), and hence cannot be explained by mistakes in inferring the percept from eye movements. We computed the absolute value of these responses and found the difference in response distributions to be significant (Figure 5b, right, $p = 6 \times 10^{-35}$, two-sample t-test on absolute value distributions, N=229 saccades for physical condition, N=1404 saccades for perceptual condition).
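The saccade-triggered analysis can be sketched as two steps: count spikes in a window after each saccade, then rescale so that the mean object response is −1 and the mean face response is +1, as in the Figure 5 legend. The 0–400 ms window follows that legend; the function names, the use of spike times in seconds, and the choice of which means to normalize by are assumptions for illustration.

```python
def saccade_triggered_counts(spike_times, saccade_onsets, window=(0.0, 0.4)):
    """Spike count per saccade within `window` (seconds relative to onset)."""
    start, stop = window
    return [sum(start <= t - onset < stop for t in spike_times)
            for onset in saccade_onsets]


def normalize(response, mean_object, mean_face):
    """Rescale so the mean object response maps to -1 and mean face to +1."""
    midpoint = (mean_face + mean_object) / 2
    half_range = (mean_face - mean_object) / 2
    return (response - midpoint) / half_range


# Toy example: one cell, two saccades (made-up spike times in seconds).
spikes = [0.05, 0.10, 0.45, 1.02, 1.30, 1.39]
counts = saccade_triggered_counts(spikes, saccade_onsets=[0.0, 1.0])
print(counts)
print([normalize(c, mean_object=2, mean_face=6) for c in counts])
```

After this rescaling, responses that cluster near ±1 form a bimodal histogram, while responses that sit between the two extremes pull the distribution of absolute values toward zero, which is the comparison made in Figure 5b.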

Figure 5. Saccade-triggered responses are less bimodal during rivalry.

(a) Single-trial responses during saccades averaged across simultaneously recorded ML neurons from the same session as in Figure 4b that were significantly face-selective for both physical and perceptual conditions. Individual neuron responses were normalized to make the mean object response −1 and the mean face response +1. Rows of each plot correspond to response time courses to individual saccades, aligned to saccade onset, and sorted by average response during 0–400 ms after saccade onset. Top: Physical condition. Bottom: Perceptual condition. Left, middle, and right columns correspond to saccades during object epochs, face epochs, and across both, respectively. The difference between perceptual and physical conditions in the third column shows that this difference cannot be simply attributed to mislabeling of perceptual state by the no-report paradigm. (b) Histograms of saccade-aligned responses averaged across a time window of 0–400 ms after saccade onset and across neurons (after normalizing as in (a)) that were significantly modulated for both physical and perceptual conditions. Top: Physical condition. Bottom: Perceptual condition. Left: Saccades for face and object plotted separately in red and blue, respectively. Middle: Saccades for either face or object epochs plotted in gray. Right: Absolute values of normalized responses plotted in light gray.

Figure 5—figure supplement 1. Lack of bimodality is a general trademark of rivalry.

(a) Trial responses in ML are less bimodal during rivalry. Histograms have same conventions as Figure 5b but instead of averaging neuron responses for individual saccades, responses are averaged across trial duration for individual trials. (b) Trial responses in AM are less bimodal during rivalry. Same conventions as in (a), but instead of the Neuropixels-like probe in ML, cells were recorded simultaneously from AM using a 32-channel S-probe.

The observation of different response profiles for physical and perceptual conditions was not specific for saccades: histograms were also less bimodal and spanned a smaller range for the rivalry condition when triggering responses on trial onsets rather than saccades in both ML (Figure 5—figure supplement 1a, $p = 9 \times 10^{-15}$, two-sample t-test on absolute value distributions, N=150 trials for physical condition, N=571 trials for perceptual condition) and AM (Figure 5—figure supplement 1b, p=0.0014, two-sample t-test on absolute value distributions, N=120 trials for physical condition, N=480 trials for perceptual condition). Therefore, it appears that throughout rivalry, for perceptually modulated cells, response differences to face and object are less pronounced than in the physical condition, and this is true in both ML and AM. One tantalizing explanation for this phenomenon is that perceptually modulated cells may be multiplexing information about both the physical stimulus and the perceptual state during single trials, allowing both to be simultaneously represented across the face patch hierarchy.

Is it possible that the apparent responses to the suppressed face were due to incomplete suppression, leading to piecemeal percepts on some trials? We performed simulations of the worst-case effect of mixture, in which the percept would be exactly half-face and half-object, by taking the responses of the physical condition and averaging responses to face and body on a specific proportion of trials. The simulated distributions only became statistically indistinguishable from the observed binocular rivalry condition if 50–70% of trials were mixed percepts of half-face and half-body. This is markedly inconsistent with reports from every human subject that on most trials they did not perceive any mixture. We of course cannot be absolutely sure that monkeys do not experience mixed percepts significantly more often than humans. Yet, under the reasonable assumption that percepts were similar in the two species, trials with mixed or piecemeal percepts cannot account for the difference in response distributions between physical and perceptual conditions.
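The logic of this mixture simulation can be sketched as follows. Responses from the physical condition are resampled, and on a chosen fraction of simulated trials the face and object responses are averaged to mimic a half-and-half percept. The function names, the resampling scheme, and the use of the mean absolute response as a summary are assumptions; the paper compared full distributions statistically, which this sketch does not reproduce.

```python
import random
from statistics import mean


def simulate_mixture(face_resp, object_resp, mix_fraction, n_trials, seed=0):
    """Simulate rivalry responses under a mixed-percept account.

    face_resp / object_resp: normalized responses from the physical condition.
    mix_fraction: proportion of trials assumed to be half-face, half-object.
    """
    rng = random.Random(seed)
    simulated = []
    for _ in range(n_trials):
        f, o = rng.choice(face_resp), rng.choice(object_resp)
        if rng.random() < mix_fraction:
            simulated.append((f + o) / 2)  # worst case: exact 50/50 mixture
        else:
            simulated.append(rng.choice((f, o)))  # clean face or object trial
    return simulated


def mean_abs(responses):
    """Summary of how far responses sit from zero (1 = fully bimodal at +/-1)."""
    return mean(abs(r) for r in responses)


# Toy physical-condition responses, already normalized to about +1 / -1.
face = [0.9, 1.0, 1.1, 1.05]
obj = [-1.1, -1.0, -0.9, -0.95]
for frac in (0.0, 0.3, 0.6, 1.0):
    print(frac, round(mean_abs(simulate_mixture(face, obj, frac, 500)), 2))
```

Sweeping `mix_fraction` and asking at which value the simulated distribution stops being distinguishable from the observed rivalry distribution is the comparison that yields the 50–70% range quoted above.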

To directly test the hypothesis that cells multiplex information about the perceptually dominant and suppressed stimulus, we performed a new experiment in which we varied the identity of the suppressed stimulus. In this experiment, instead of having only two rivalling stimuli, we used three images A, B, and C to create two different binocular rivalry stimuli, (A,B) and (A,C), presented in separate blocks (Figure 6a). This allowed us to keep the dominant percept fixed as image A, and compare responses to trial types $(\mathbf{A},B)$ and $(\mathbf{A},C)$ to test whether neural responses could discriminate the suppressed stimulus (bold font indicates the dominant image, as inferred by eye movements). We trained a linear decoder to distinguish between trial types $(\mathbf{A},B)$ and $(\mathbf{A},C)$. Remarkably, the decoding accuracy for distinguishing the two trial types was 74% (Figure 6b). For comparison, the decoding accuracy for distinguishing $(A,\mathbf{B})$ vs. $(A,\mathbf{C})$, in which the suppressed image A is fixed and the dominant image varies, from the same cell population was 88%. Thus, while the conscious percept can be decoded better than the suppressed stimulus, face cells do encode significant information about the latter. Potential mislabeling of trials by the no-report paradigm could not account for this decoding accuracy (see Supplementary text).
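The logic of cross-validated decoding with a label-shuffle control can be sketched as follows. This is an illustration only: the study used a linear support vector machine in Matlab, whereas the sketch below substitutes a nearest-centroid rule (also a linear classifier for two classes) so that it runs with the Python standard library alone. All names and the toy spike counts are hypothetical.

```python
import random

def centroid(rows):
    """Mean population vector of a list of trials."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def loo_accuracy(X, y):
    """Leave-one-out accuracy: each trial is held out once and classified
    by the nearest class centroid computed from the remaining trials."""
    correct = 0
    for i in range(len(X)):
        train = [(x, lab) for j, (x, lab) in enumerate(zip(X, y)) if j != i]
        labels = sorted(set(lab for _, lab in train))
        cents = {lab: centroid([x for x, l in train if l == lab]) for lab in labels}
        pred = min(labels, key=lambda lab: sq_dist(X[i], cents[lab]))
        correct += pred == y[i]
    return correct / len(X)

def shuffle_control(X, y, n_shuffles=100, seed=0):
    """Accuracies obtained after randomly permuting the trial labels."""
    rng = random.Random(seed)
    accuracies = []
    for _ in range(n_shuffles):
        shuffled = list(y)
        rng.shuffle(shuffled)
        accuracies.append(loo_accuracy(X, shuffled))
    return accuracies

# Toy data: rows are trials, columns are spike counts of two cells.
X = [[10, 1], [11, 2], [9, 1], [1, 10], [2, 11], [1, 9]]
y = ["B", "B", "B", "C", "C", "C"]  # identity of the suppressed image
observed = loo_accuracy(X, y)
null_distribution = shuffle_control(X, y)
```

The observed accuracy is then compared with the distribution of shuffled-label accuracies, as in Figure 6b.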

Figure 6. Face cells multiplex information about both the perceptually dominant and perceptually suppressed stimulus.

(a) Schematic of experiment design. Two types of binocular rivalry stimuli consisting of image pairs (A,B) and (A,C), respectively, were presented. During one image pair block, 12 trials corresponding to the 12 positions of the two fixation spots were presented in randomized order before the next block corresponding to the other image pair was presented. This was repeated more than 60 times. (b) Decoding accuracy for distinguishing $(\mathbf{A},B)$ from $(\mathbf{A},C)$ was 74% (black vertical line), even though the conscious percept was A for both trial types (bold marks the dominant image). As a control, we shuffled labels 100 times and attempted to perform decoding. Gray bars show the distribution of decoding accuracies for these 100 shuffle iterations. (c) Scatterplot showing dominant stimulus modulation indices $\mathrm{MI}_{\mathrm{dominant}}=(R_{A\mathbf{B}}-R_{A\mathbf{C}})/(R_{A\mathbf{B}}+R_{A\mathbf{C}})$ on the x-axis and suppressed stimulus modulation indices $\mathrm{MI}_{\mathrm{suppressed}}=(R_{\mathbf{A}B}-R_{\mathbf{A}C})/(R_{\mathbf{A}B}+R_{\mathbf{A}C})$ on the y-axis. Each triangle represents 1 of 66 physically selective cells recorded in one session from face patch ML with a 64-ch S-probe. (d) Schematic of three possible models for how perceptually modulated neurons may encode consciously perceived and suppressed stimuli during binocular rivalry. Left: (I) Neural responses encode the conscious percept in binocular rivalry identically to the corresponding unambiguous physical stimulus; x and y axes represent two dimensions of neural state space. Middle: (II) Responses during binocular rivalry lie in between the two stimuli but are biased toward the dominant stimulus. Right: (IIa) Spikes reflect a weighted sum of consciously perceived and suppressed stimuli and are generated through a Poisson process based on average firing rates. (IIb) Two different types of spikes, defined, for example through a temporal code, encode the consciously perceived and veridical physical stimulus, respectively. The time course in this schematic is from a single perceptual dominance period and divided into different epochs that represent either the conscious or physical stimulus.

Do the same cells multiplex information about both the conscious and suppressed stimuli, or are there two distinct subpopulations, one encoding the conscious stimulus and another encoding the suppressed stimulus? To address this question, we compared modulation indices for the dominant stimulus with modulation indices for the suppressed stimulus for each cell (bold marks the dominant image). For the former, we fixed the suppressed stimulus while varying the dominant stimulus, that is, we compared responses to trial type $(A,\mathbf{B})$ with responses to trial type $(A,\mathbf{C})$ to compute the modulation index $\mathrm{MI}_{\mathrm{dominant}}=(R_{A\mathbf{B}}-R_{A\mathbf{C}})/(R_{A\mathbf{B}}+R_{A\mathbf{C}})$. For the latter, we fixed the dominant stimulus while varying the suppressed stimulus, that is, we compared responses to trial type $(\mathbf{A},B)$ with responses to trial type $(\mathbf{A},C)$ to compute the modulation index $\mathrm{MI}_{\mathrm{suppressed}}=(R_{\mathbf{A}B}-R_{\mathbf{A}C})/(R_{\mathbf{A}B}+R_{\mathbf{A}C})$ (Figure 6c). We found a positive correlation between dominant stimulus modulation indices and suppressed stimulus modulation indices ($p=1.4\times10^{-6}$, Pearson’s r=0.55, N=66 physically selective cells). This suggests that cells strongly modulated by the dominant stimulus tend to be similarly modulated by the suppressed stimulus. Thus there are not two separate populations of cells that encode conscious and unconscious stimuli.
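As an illustration, a modulation index of the form (R1 − R2)/(R1 + R2) and its correlation across cells can be computed as in the following Python sketch. The function names, the handling of silent cells, and the toy spike counts are assumptions made for this example and are not taken from the study's analysis code.

```python
from statistics import mean

def modulation_index(counts_1, counts_2):
    """(R1 - R2) / (R1 + R2), where R1 and R2 are spike counts averaged
    across the trials of each of the two trial types."""
    r1, r2 = mean(counts_1), mean(counts_2)
    if r1 + r2 == 0:
        return 0.0  # assumption: a silent cell is treated as unmodulated
    return (r1 - r2) / (r1 + r2)

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# Toy example for one cell: per-trial spike counts when B or C is dominant
# (A suppressed) and when B or C is suppressed (A dominant).
mi_dominant = modulation_index([12, 14, 10], [4, 6, 5])
mi_suppressed = modulation_index([9, 10, 8], [7, 6, 8])
```

Computing both indices for every cell and passing the two lists to `pearson_r` gives the across-cell correlation of the kind plotted in Figure 6c.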

In summary, our findings indicate that the neural correlate of consciousness in IT does not reside in a subset of cells perfectly reflecting consciousness but rather in a population code. This is supported by the findings that (i) modulation by the conscious percept is weaker than modulation by the physical stimulus (Figures 3 and 5), (ii) both consciously perceived and suppressed stimuli can be decoded from the same population (Figure 6b), and (iii) modulation indices for consciously perceived and suppressed stimuli are correlated in single cells (Figure 6c).

Discussion

In this study, we developed a new no-report paradigm for tracking conscious state and used it to investigate the neural correlate of consciousness in face patches within macaque IT cortex. We made two new findings. First, we found that face patches ML and AM are modulated by conscious perception and do not merely encode the physical input. Importantly, monkeys in this study had never been trained to actively report their percept. Instead, we were able to infer their percept from eye movements using a new no-report paradigm. Thus activity modulations attributed to switches in conscious perception in IT cannot be explained simply by active report. Second, we found that cells in face patches are modulated by the identity of both the consciously perceived stimulus and the suppressed stimulus, such that both stimuli can be read out from the same population using different linear decoders. This finding challenges the widely held notion that in IT cortex almost all neurons respond only to the consciously perceived stimulus.

Previous single-unit recordings in IT cortex using active report to infer the percept found 90% of cells represent the conscious percept (Sheinberg and Logothetis, 1997). Here, we found proportions of 57% in ML and 73% in the more anterior patch AM. The quantitative difference may be due to several factors including different recording sites (Sheinberg and Logothetis recorded from both upper and lower banks of the superior temporal sulcus in a less specifically targeted manner), imperfect accuracy of the no-report paradigm, and differences in stimuli and analysis methods. Importantly, our results confirm that the majority of cells in IT cortex do represent conscious perception. Furthermore, this new paradigm makes studies of consciousness in monkeys more accessible, by replacing the need to train the animal to signal its conscious percept (which can be a laborious process) with a simple task that only requires animals to follow a fixation spot.

Our results show that for cells that are modulated by conscious perception, the modulation is not ‘all-or-none’. Instead, the average response modulation during the perceptual condition was weaker than during the physical condition (Figure 3). This was also observed in a previous study of rivalry (Sheinberg and Logothetis, 1997), but somehow, this fact has been forgotten in popular lore surrounding the neural correlates of consciousness. For example, the Wikipedia entry for ‘neural correlates of consciousness’ states that “in [inferior temporal cortex] almost all neurons responded only to the perceptually dominant stimulus, so that a ‘face’ cell only fired when the animal indicated that it saw the face and not the pattern presented to the other eye”. We think the reason this fact (the decreased average modulation of IT cells by switches in conscious percept compared to switches in physical stimulus) has not garnered much attention is that, until now, it could be explained simply by imperfect labeling of the animal’s perceptual state.

The key question is: what happens during single trials? In the rivalry condition, do responses in single trials look like those to either physically presented faces or objects? By recording from a large number of face cells simultaneously using a novel 128-electrode site probe specifically designed for use in primates, we could address this question for the first time. Surprisingly, we found a dramatically different response profile on single trials between the perceptual and physical conditions (Figure 5). Although in the physical condition responses clustered into two groups, in the rivalry condition responses appeared unimodal, lying in between the two clusters for the physical condition. This suggests that single cells are multiplexing the conscious percept and the veridical physical stimulus during single trials. To directly test this hypothesis, we presented more than one binocular rivalry stimulus, created from pairs of three images, and found that the subconscious stimulus could indeed be decoded from face patch activity. Moreover, the same cells that were strongly modulated by the conscious percept also tended to be strongly modulated by the suppressed stimulus, ruling out the existence of a subpopulation of cells in IT purely reflecting consciousness. These findings strongly suggest that rivalry is not fully resolved before IT. It remains an open question where and how the conscious percept is ultimately isolated from the suppressed stimulus to produce conscious awareness of the former and not the latter.

In Figure 6d, we sketch three models for how perceptually modulated cells in IT cortex could encode stimuli during binocular rivalry. In Model I, cells exactly reflect the conscious percept, encoding it the same way they would encode an unambiguous stimulus. In Model II, the response during binocular rivalry is in between the responses to the two unambiguous stimuli, with the contributions of the two stimuli weighted differently depending on which stimulus is dominant and which stimulus is suppressed. Thus, both the consciously perceived stimulus and suppressed stimulus can be decoded using two different decoders. For Model II, one can further distinguish between two different sub-models depending on whether consciously perceived and suppressed stimuli are encoded by different subsets of spikes or not: in Model IIa, spikes are stochastically generated from the average firing rate on a trial, which is determined by a linearly weighted sum of consciously perceived and suppressed stimuli. Alternatively, in Model IIb, there are two different types of spikes that encode the conscious percept or physical stimulus, respectively. The type of a spike may depend on the phase of a high-frequency oscillation at which the spike occurs (the oscillation would need to be faster than alternations in perceptual dominance), or on whether the spike occurs synchronously with spikes from other neurons. Unlike Model IIa, Model IIb harbors an explicit neural correlate of the conscious percept within a subset of spikes. Importantly, our result that the suppressed stimulus can be decoded rules out the cartoon picture of Model I. Our findings are compatible with both Models IIa and IIb, and future experiments may be able to distinguish between the two.

Compared to previous approaches that attempted to isolate representations of the conscious percept, our new no-report binocular rivalry paradigm has several advantages. For flash suppression, where a stimulus flashed in one eye suppresses the stimulus in the other eye, report is also not required (Tsuchiya and Koch, 2005; Wilke et al., 2003; Wolfe, 1984). However, in that case, the physical input when the target is perceived is not identical to that when it is suppressed, and thus any modulation observed may be driven entirely externally. Indeed, it is known that if a distractor stimulus is presented simultaneously with a preferred stimulus, the response can be reduced compared to when the preferred stimulus is presented alone, due to simple normalization mechanisms (Bao and Tsao, 2018). Another paradigm that has been widely used to study the neural correlates of consciousness is backward masking. Here, the stimulus is presented for such a short time before being masked that sometimes it enters consciousness and sometimes not (Breitmeyer et al., 1984). So far, backward masking has always relied on report. Also, it is more susceptible to modulations arising from bottom-up withdrawal of attention or low-level (e.g., retinal) noise, whereas in binocular rivalry perceptual switches appear to be internally generated. One potential confound described by Block as the ‘bored monkey problem’ is that the monkey may still be thinking about whether it is perceiving object or face and internally report it even if it is not required to actively report it (Block, 2020). It is methodologically very difficult to entirely remove this confound, but the fact that monkeys had to simultaneously perform a very challenging unrelated task of saccading to jumping fixation points should at least alleviate this concern.

Alternative approaches to the no-report paradigms of Frässle et al., 2014 have been developed in which the monkey or human subject is unaware of when a perceptual switch is happening and hence cannot report it, either due to anesthesia or due to the difference in stimuli being too subtle to report. Brascamp et al., 2015 reported that fMRI responses to binocular rivalry switches in fronto-parietal regions disappear when the difference between the percepts is made so subtle that subjects cannot report it; however, it is possible that the difference between the percepts was just too small to be picked up by the fMRI signal. Zou et al., 2016 created rivalry stimuli from orthogonal gratings where the grating in one eye was flickered fast enough that it was perceived as uniform gray and only produced fMRI activations in early visual cortex. These stimuli evoked rivalry according to behavioral reports whereas physically uniform stimuli do not, indicating that competition occurred in early visual cortex. In another study consistent with competition in early visual cortex, Xu et al., 2016 performed optical imaging in V1 while monkeys were anesthetized. They found that during binocular rivalry activations clearly alternated in counter-phase between left eye and right eye dominance columns. We note that competition occurring in V1 is not incompatible with our findings, although our findings suggest that rivalry is unlikely to be fully resolved in early areas, given our ability to decode the suppressed stimulus from cells in IT. It should also be noted as a caveat that hemodynamic signals, as measured in the above studies by fMRI or optical imaging, only indirectly reflect neural activity and have previously shown discrepancies with single-unit responses (Leopold and Logothetis, 1996; Tong and Engel, 2001). 
Overall, to the best of our knowledge, the current study describes the representation of conscious and subconscious stimuli in IT cells in the most confound-free way to date. Our study complements a study conducted in parallel by Kapoor et al., 2020 that found modulations by conscious percept in prefrontal cortex using a different no-report paradigm based on optokinetic nystagmus.

The existence of two directly connected functional modules with a hierarchical relationship (ML and AM) that both encode the conscious percept of a particular type of object opens the possibility for future studies to investigate how changes in conscious percept are coordinated across the brain. Recordings and perturbations in multiple face patches simultaneously using high-channel count recordings may reveal whether switches occur in a feedforward or feedback wave, and thus yield insight into how our interpretation of the world can be rendered consistent across different levels of representation.

Materials and methods

All animal procedures in this study complied with local and National Institutes of Health guidelines, including the US National Institutes of Health Guide for Care and Use of Laboratory Animals. All experiments were performed with the approval of the Caltech Institutional Animal Care and Use Committee (IACUC). The human psychophysics experiment complied with a protocol approved by the Caltech Institutional Review Board (IRB).

Targeting

Two male rhesus macaques were implanted with head posts and trained to fixate on a dot for juice reward. We targeted face patches ML and AM in IT cortex for electrophysiological recordings. ML and AM were identified using fMRI. Monkeys were scanned in a 3T scanner (Siemens), as described previously (Tsao et al., 2006). MION contrast agent was injected to increase signal-to-noise ratio. During fMRI, monkeys passively viewed blocks of faces and blocks of other objects to identify face-selective patches in the brain. Recording chambers (Crist) were implanted over ML and AM. Guide tubes were inserted into the brain 4 mm past the dura through custom-printed grids placed inside the chamber, and electrodes were advanced to the target through the guide tube. Both chamber placement and grid design were planned with the software Planner (Ohayon and Tsao, 2012). After insertion of tungsten electrodes, correct targeting of the desired location was confirmed with anatomical MRI scans.

Electrophysiology

Recordings were performed using tungsten electrodes (FHC) with 1 MΩ impedance and, after correct targeting was confirmed, with 32-channel S-probes (Plexon) with 75 µm and 100 µm inter-electrode distance, and, in three sessions, with passive Neuropixels-like probe prototypes (IMEC) (Dutta et al., 2019; Jun et al., 2017; Trautmann et al., 2019). These prototypes were a limited stock of test devices that were developed and used for testing as part of the development of primate Neuropixels probes and are not available for other labs. Unlike the final product, the prototypes had 128 passive electrode sites across 2 mm (arranged in two parallel staggered bands), but used the same electrode materials and shank specifications (45 mm total shank length). In the additional experiment performed to decode the suppressed stimulus (Figure 6), we recorded with a novel 64-ch. S-probe in face patch ML. All electrodes were advanced to the target using an oil hydraulic Microdrive (Narishige). Neural signals were recorded using an Omniplex system (Plexon). Local field potentials were low-pass filtered at 200 Hz and recorded at 1000 Hz, and units were high-pass filtered at 300 Hz and recorded at 40 kHz. Only well-isolated units were considered for further analysis.

Task

Monkeys were head fixed and viewed an LCD screen (Acer) of 47° size in a dark room. Monkeys viewed stimuli of 5° size wearing red/cyan anaglyph goggles custom made with filters to match the red and green/blue emission spectrum of the screen, respectively, so that inputs to left and right eyes could be controlled independently. Emission spectra were measured using a PR-650 SpectraScan colorimeter (Photo Research). Eye position was monitored using an infrared eye tracking system (ISCAN). The camera recorded one eye through the red/cyan anaglyph filter. We measured the precision of ISCAN eye positions by computing the absolute value of distances between 1 ms adjacent eye data. The median and 99% confidence interval of this jitter was 0.038° and 0.34°, respectively. Note that these confidence intervals should not be contaminated by saccades which occur less frequently than 10 Hz and therefore make up less than 1% of the distribution. In the first phase of the experiment, monkeys passively viewed at least five repeats of 61 screening stimuli in pseudorandom order (250 ms ON time, 100 ms OFF time) with a fixation spot of 0.25° diameter in the center of the screen. Screening stimuli consisted of 20 images of faces and 41 images of non-face objects. During this phase, monkeys received a juice reward for maintaining fixation for at least 3 s. Subsequently, for the main experiment, stimuli contained one or two fixation spots at one of four possible locations (top, bottom, left, and right, 1° from the center) and were presented for 800 ms ON time and 0 ms OFF time. In the case of two fixation spots, stimuli contained one fixation spot per eye and the two spots never appeared at the same location. During this phase, the monkey received a juice reward if the monkey maintained fixation within 0.5° of one of the fixation spots for at least half of the trial duration (i.e., 400 ms, not required to be contiguous). 
Stimuli during the main experiment included (1) a monocular face/monocular object with one fixation spot and (2) a binocular stimulus composed of a face and a fixation spot in one eye, and an object and a second fixation spot in the other eye. During the binocular rivalry condition, even though the same stimulus was presented continuously, we refer to the 800 ms duration, after which the two fixation spots would change position, as one trial. To improve rivalry and minimize periods of mixture, face and object stimuli were presented at high contrast on backgrounds consisting of gratings that were orthogonal in the two eyes. Moreover, we applied orthogonal orientation filters (with concentration $\sigma_{\mathrm{angle}}=0.5$) to the face and object stimuli, respectively, to increase local orientation contrast and further reduce periods of mixture. For human subjects, stimuli were identical except that the trial duration was 2000 ms, since, unlike the monkeys, they had not been extensively trained on the task and hence needed more time to saccade to the jumping fixation spots. During the additional session performed for decoding the suppressed stimulus (Figure 6), we presented stimuli in a block design. Each block corresponded to an image pair, for example (A,B), where each fixation position was presented in randomized order, that is, eight trials for the physical condition (including four trials of unambiguous A and four trials of unambiguous B), and 12 trials for the perceptual condition, after which another block was presented (Figure 6a). We repeated this design so that each image-pair block was presented for at least 60 repetitions.

Online analysis

Spikes were isolated and sorted online using the PlexControl software (Plexon). During the screening phase, the average number of spikes during the time window from 100 ms to 300 ms was calculated for each unit and stimulus. For each stimulus, the average response across units was determined after normalizing the response of each unit by subtracting the mean and dividing by the standard deviation for the unit. Subsequently, the face stimulus with the highest average response and the object stimulus with the lowest average response were chosen to generate stimuli for the main experiment.
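The stimulus selection step described above can be sketched in Python as follows. This is a minimal illustration, not the online analysis code itself: the function names are hypothetical, and the use of the population (rather than sample) standard deviation for normalization is an assumption, since the text does not specify it.

```python
from statistics import mean, pstdev

def zscore(values):
    """Subtract the unit's mean and divide by its standard deviation."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s if s > 0 else 0.0 for v in values]

def pick_stimuli(responses, is_face):
    """responses[u][s] is the mean spike count of unit u to screening
    stimulus s in the 100-300 ms window; is_face[s] flags face images.
    Returns (index of best face, index of worst non-face object)."""
    normalized = [zscore(unit) for unit in responses]
    n_stim = len(is_face)
    # Average normalized response across units for every stimulus.
    avg = [mean(unit[s] for unit in normalized) for s in range(n_stim)]
    faces = [s for s in range(n_stim) if is_face[s]]
    objects = [s for s in range(n_stim) if not is_face[s]]
    best_face = max(faces, key=lambda s: avg[s])
    worst_object = min(objects, key=lambda s: avg[s])
    return best_face, worst_object

# Toy example: two units, two face stimuli followed by two object stimuli.
toy_responses = [[10, 8, 2, 1], [20, 25, 5, 9]]
toy_is_face = [True, True, False, False]
chosen = pick_stimuli(toy_responses, toy_is_face)
```

Normalizing each unit before averaging keeps units with high firing rates from dominating the choice of stimuli.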

Offline analysis

For human subjects, the inferred percept based on button-presses on a given trial was determined according to the last report the subject made before the end of the trial. For humans and monkeys, we also determined their inferred percept based on eye movements depending on which fixation spot they fixated on if they fixated on one of the fixation spots for at least half of the trial duration (i.e., 400 ms for monkeys or 1000 ms for humans, not required to be contiguous). We computed L1 norms for the distance between eye position and a given fixation spot (Figures 1b, 2a, and 4b). We accounted for an average saccade delay of 350 ms, by analyzing the eye data from 350 ms after trial onset until 350 ms after trial end. For Figure 3, Figure 3—figure supplement 1, Figure 4d, Figure 5—figure supplement 1, and Figure 6, in order to exclude trials during which the percept switched back to the opposite percept, we also required the following trial to have the same inferred percept as the current trial. Spikes were re-sorted using the software OfflineSorter (Plexon). For the Neuropixels prototypes, since the high density of electrodes allowed the same neuron to appear on multiple channels, we used Kilosort2 to re-sort spikes (Pachitariu et al., 2016). A total of 653 and 481 cells were recorded in monkey A and monkey O, respectively. To correct for delays in stimulus presentation, we used a photodiode that detected the onset and offset of the stimuli. The output of the photodiode was fed into the recording system and later used to synchronize the onset of the stimulus and the neurophysiological data during offline analysis. Peristimulus time histograms (PSTHs) were smoothed with a box kernel (100 ms width). For computing modulation indices we used the average spike count across trials as response.
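The eye-movement criterion for inferring the percept can be illustrated with the Python sketch below. It assumes one eye sample per millisecond, a 0.5° fixation tolerance as in the reward criterion, and the L1 distance for the tolerance test; the text states the L1 norm only for the plotted distances, so its use here is an assumption. The function names are hypothetical.

```python
def analysis_window(trial_onset_ms, trial_ms, delay_ms=350):
    """Shift the trial window by the average saccade delay."""
    return trial_onset_ms + delay_ms, trial_onset_ms + trial_ms + delay_ms

def infer_percept(eye_xy, spots, radius_deg=0.5, min_fraction=0.5):
    """eye_xy: (x, y) eye positions in degrees within the analysis window.
    spots: mapping from percept label to fixation spot position.
    Returns the label of the spot fixated for at least min_fraction of
    the samples (not necessarily contiguous), or None if there is none."""
    if not eye_xy:
        return None
    best_label, best_count = None, 0
    for label, (sx, sy) in spots.items():
        # Count samples within the tolerance of this spot (L1 distance).
        near = sum(1 for x, y in eye_xy if abs(x - sx) + abs(y - sy) <= radius_deg)
        if near >= min_fraction * len(eye_xy) and near > best_count:
            best_label, best_count = label, near
    return best_label

# Toy trial: the face spot is to the right, the object spot to the left.
spots = {"face": (1.0, 0.0), "object": (-1.0, 0.0)}
trace = [(1.0, 0.1)] * 500 + [(-1.0, 0.0)] * 300
percept = infer_percept(trace, spots)
```

Trials for which no spot meets the criterion return `None` and would be left without an inferred percept.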
Decoding analysis was performed with a support vector machine with a linear kernel (Matlab fitcsvm) trained to discriminate trials where the inferred percept was face or object, respectively. As predictor variables we used the spike count during the 800 ms of each trial for all simultaneously recorded neurons. All decoding accuracies were cross-validated. In more detail, one trial was chosen for testing and the remaining trials for training; this was repeated for all trials to compute decoding accuracies. Criteria for detecting a saccade were as follows: A saccade was detected at time t if the distance between the mean eye position during t−100,…t−2 ms and the mean eye position during t+2,…t+100 ms was greater than 0.5°, and the eye position during t−100,…t−2 ms and t+2,…t+100 ms, respectively, stayed within 0.5° of the respective mean for at least 80% of the duration of each period. We also required consecutive saccades to be at least 100 ms apart from each other. All analyses were performed using Matlab (MathWorks).
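The saccade detection criteria above can be written out as a short Python sketch. It assumes one eye-position sample per millisecond and Euclidean distances in degrees, neither of which is stated explicitly for this step, and it keeps the earliest qualifying time point when several fall within 100 ms of each other. The function names are hypothetical.

```python
from math import hypot

def mean_point(points):
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def is_stable(points, center, radius_deg, min_fraction):
    """True if enough samples lie within radius_deg of the window mean."""
    inside = sum(1 for p in points
                 if hypot(p[0] - center[0], p[1] - center[1]) <= radius_deg)
    return inside >= min_fraction * len(points)

def detect_saccades(eye_xy, jump_deg=0.5, radius_deg=0.5,
                    min_fraction=0.8, min_gap_ms=100):
    """Return sample indices (ms) at which a saccade is detected."""
    saccades = []
    for t in range(100, len(eye_xy) - 100):
        before = eye_xy[t - 100:t - 1]  # samples t-100 ... t-2
        after = eye_xy[t + 2:t + 101]   # samples t+2 ... t+100
        mb, ma = mean_point(before), mean_point(after)
        if hypot(ma[0] - mb[0], ma[1] - mb[1]) <= jump_deg:
            continue  # mean eye position did not move far enough
        if not (is_stable(before, mb, radius_deg, min_fraction)
                and is_stable(after, ma, radius_deg, min_fraction)):
            continue  # eye was not steady before and after the jump
        if saccades and t - saccades[-1] < min_gap_ms:
            continue  # enforce the minimum spacing between saccades
        saccades.append(t)
    return saccades

# Toy trace: steady fixation, then a 2 degree jump to the right at 300 ms.
toy_trace = [(0.0, 0.0)] * 300 + [(2.0, 0.0)] * 300
found = detect_saccades(toy_trace)
```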

Acknowledgements

This work was supported by HHMI and the Simons Foundation. We are grateful to members of the Tsao lab for feedback on the manuscript, Varun Wadia for helping us collect the human subject data, Audo Flores for animal support, Daniel Wagenaar and Eric Trautmann for technical assistance, and Barun Dutta, Tim Harris, Tirin Moore, Michael Shadlen, Krishna Shenoy, and HHMI for contributions to development of NHP Neuropixels probes.

Appendix 1

Supplementary text

In the Results section we found that the suppressed stimulus, that is, B or C in binocular rivalry trials $(\mathbf{A},B)$ vs. $(\mathbf{A},C)$, where A is the consciously perceived image (bold marks the dominant image), could be decoded from neural activity with 74% accuracy. A natural question arising from the decoding accuracy of 74% is whether this could have been due to mislabeling by the no-report paradigm. On some trials, the conscious percept may have been mislabeled as $(\mathbf{A},B)$ or $(\mathbf{A},C)$ and actually have been $(A,\mathbf{B})$ or $(A,\mathbf{C})$, respectively. In this case, even if cells only encode the conscious percept and not the suppressed stimulus, the decoding accuracy would be higher than chance, as on those mislabeled trials, the decoder could successfully discriminate based on a difference in conscious percept. To address this concern, below we estimate the worst-case decoding accuracy increase we could expect from mislabelings under the null hypothesis that neurons do not encode the suppressed stimulus. For image pair (A,B), we could decode $(\mathbf{A},B)$ vs. $(A,\mathbf{B})$, that is, whether A or B was consciously perceived as in Figure 4d, with 89% accuracy in this session. If we had recorded more neurons, or neurons that were more selective, we would expect a decoding accuracy at least as high. Since there is physically no difference between trial types $(\mathbf{A},B)$ and $(A,\mathbf{B})$, any information that the decoder was able to acquire must have come from the difference in conscious percept. Thus, we can use 89% as a lower bound for the estimated accuracy of the no-report paradigm in inferring the correct conscious percept in this session. Under the null hypothesis that neurons only encode the conscious percept, the decoding accuracy for distinguishing $(\mathbf{A},B)$ from $(\mathbf{A},C)$ for 89% of trials should be chance. For the remaining 11% of trials, the conscious percept may have been B or C, respectively. Even if the decoder can decode all of these mislabeled trials with 100% accuracy (which is an overestimate), the decoding accuracy across all trials would be at most 89%×50%+11%×100%=55.5%. So even in the worst case, the mislabeled trials would not lead to the observed decoding accuracy of 74%. This suggests that face cells do indeed encode the suppressed image.
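The worst-case bound can be restated as a small Python calculation. The function names are hypothetical; the second function simply inverts the same formula to ask how inaccurate the percept labels would need to be for mislabeling alone to reach a given decoding accuracy.

```python
def worst_case_accuracy(label_accuracy, chance=0.5, mislabeled_accuracy=1.0):
    """Upper bound on decoding accuracy under the null hypothesis:
    correctly labeled trials decode at chance, mislabeled trials at
    mislabeled_accuracy."""
    return label_accuracy * chance + (1.0 - label_accuracy) * mislabeled_accuracy

def label_accuracy_needed(observed_accuracy, chance=0.5, mislabeled_accuracy=1.0):
    """Label accuracy at which the worst-case bound equals the observed
    decoding accuracy (the same formula solved for label_accuracy)."""
    return (mislabeled_accuracy - observed_accuracy) / (mislabeled_accuracy - chance)

bound = worst_case_accuracy(0.89)          # 0.89 * 0.5 + 0.11 * 1.0
required = label_accuracy_needed(0.74)     # solves 0.5 * p + (1 - p) = 0.74
```

With the values from the text, the bound is 55.5%, and the labeling accuracy would have to fall to about 52% before mislabeling alone could produce a decoding accuracy of 74%.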

Funding Statement

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Contributor Information

Janis Karan Hesse, Email: jhesse@caltech.edu.

Doris Y Tsao, Email: dortsao@caltech.edu.

Ming Meng, South China Normal University, China.

Floris P de Lange, Radboud University, Netherlands.

Funding Information

This paper was supported by the following grants:

  • Howard Hughes Medical Institute to Doris Y Tsao.

  • Simons Foundation to Doris Y Tsao.

Additional information

Competing interests

No competing interests declared.

Author contributions

Janis Karan Hesse: Conceptualization, Data curation, Software, Formal analysis, Validation, Investigation, Visualization, Methodology, Writing - original draft, Writing - review and editing.

Doris Y Tsao: Conceptualization, Resources, Supervision, Funding acquisition, Methodology, Writing - original draft, Project administration, Writing - review and editing.

Ethics

Human subjects: The behavioral experiment with human subjects for the human psychophysics experiment complied with a protocol approved by the Caltech Institutional Review Board (IRB 19-0903). Informed consent was obtained from all subjects.

Animal experimentation: All animal procedures in this study complied with local and National Institutes of Health guidelines, including the US National Institutes of Health Guide for Care and Use of Laboratory Animals. All experiments were performed with the approval of the Caltech Institutional Animal Care and Use Committee (IACUC), under protocol #1574.

Additional files

Transparent reporting form

Data availability

All data generated or analysed during this study are included in the manuscript and supporting files.

References

  1. Aru J, Bachmann T, Singer W, Melloni L. Distilling the neural correlates of consciousness. Neuroscience & Biobehavioral Reviews. 2012;36:737–746. doi: 10.1016/j.neubiorev.2011.12.003.
  2. Bao P, She L, McGill M, Tsao DY. A map of object space in primate inferotemporal cortex. Nature. 2020;583:103–108. doi: 10.1038/s41586-020-2350-5.
  3. Bao P, Tsao DY. Representation of multiple objects in macaque category-selective Areas. Nature Communications. 2018;9:1774. doi: 10.1038/s41467-018-04126-7.
  4. Blake R, Brascamp J, Heeger DJ. Can binocular rivalry reveal neural correlates of consciousness? Philosophical Transactions of the Royal Society B: Biological Sciences. 2014;369:20130211. doi: 10.1098/rstb.2013.0211.
  5. Block N. What is wrong with the No-Report paradigm and how to fix it. Trends in Cognitive Sciences. 2019;23:1003–1013. doi: 10.1016/j.tics.2019.10.001.
  6. Block N. Finessing the bored monkey problem. Trends in Cognitive Sciences. 2020;24:167–168. doi: 10.1016/j.tics.2019.12.012.
  7. Boly M, Massimini M, Tsuchiya N, Postle BR, Koch C, Tononi G. Are the neural correlates of consciousness in the front or in the back of the cerebral cortex? clinical and neuroimaging evidence. The Journal of Neuroscience. 2017;37:9603–9613. doi: 10.1523/JNEUROSCI.3218-16.2017.
  8. Bradley MM, Miccoli L, Escrig MA, Lang PJ. The pupil as a measure of emotional arousal and autonomic activation. Psychophysiology. 2008;45:602–607. doi: 10.1111/j.1469-8986.2008.00654.x.
  9. Brascamp J, Blake R, Knapen T. Negligible fronto-parietal BOLD activity accompanying unreportable switches in bistable perception. Nature Neuroscience. 2015;18:1672–1678. doi: 10.1038/nn.4130.
  10. Breitmeyer BG, Hoar WS, Randall D, Conte FP. Visual Masking: An Integrative Approach. Clarendon Press; 1984.
  11. Chang L, Tsao DY. The code for facial identity in the primate brain. Cell. 2017;169:1013–1028. doi: 10.1016/j.cell.2017.05.011.
  12. Dutta B, Andrei A, Harris T, Lopez C, O’Callahan J, Putzeys J, Raducanu B, Severi S, Stavisky S, Trautmann E. The neuropixels probe: a CMOS based integrated microsystems platform for neuroscience and brain-computer interfaces. 2019 IEEE International Electron Devices Meeting (IEDM); 2019.
  13. Frässle S, Sommer J, Jansen A, Naber M, Einhauser W. Binocular rivalry: frontal activity relates to introspection and action but not to perception. Journal of Neuroscience. 2014;34:1738–1747. doi: 10.1523/JNEUROSCI.4403-13.2014.
  14. Freiwald WA, Tsao DY. Functional compartmentalization and viewpoint generalization within the macaque face-processing system. Science. 2010;330:845–851. doi: 10.1126/science.1194908.
  15. Gelbard-Sagiv H, Mudrik L, Hill MR, Koch C, Fried I. Human single neuron activity precedes emergence of conscious perception. Nature Communications. 2018;9:2057. doi: 10.1038/s41467-018-03749-0.
  16. Grimaldi P, Saleem KS, Tsao D. Anatomical connections of the functionally defined "Face Patches" in the Macaque Monkey. Neuron. 2016;90:1325–1342. doi: 10.1016/j.neuron.2016.05.009.
  17. Hesse JK, Tsao DY. The macaque face patch system: a turtle’s underbelly for the brain. Nature Reviews Neuroscience. 2020;21:695–716. doi: 10.1038/s41583-020-00393-w.
  18. Hoeks B, Levelt WJM. Pupillary dilation as a measure of attention: a quantitative system analysis. Behavior Research Methods, Instruments, & Computers. 1993;25:16–26. doi: 10.3758/BF03204445.
  19. Hohwy J, Roepstorff A, Friston K. Predictive coding explains binocular rivalry: an epistemological review. Cognition. 2008;108:687–701. doi: 10.1016/j.cognition.2008.05.010.
  20. Jun JJ, Steinmetz NA, Siegle JH, Denman DJ, Bauza M, Barbarits B, Lee AK, Anastassiou CA, Andrei A, Aydın Ç, Barbic M, Blanche TJ, Bonin V, Couto J, Dutta B, Gratiy SL, Gutnisky DA, Häusser M, Karsh B, Ledochowitsch P, Lopez CM, Mitelut C, Musa S, Okun M, Pachitariu M, Putzeys J, Rich PD, Rossant C, Sun WL, Svoboda K, Carandini M, Harris KD, Koch C, O'Keefe J, Harris TD. Fully integrated silicon probes for high-density recording of neural activity. Nature. 2017;551:232–236. doi: 10.1038/nature24636.
  21. Kapoor V, Dwarakanath A, Safavi S, Werner J, Besserve M, Panagiotaropoulos TI, Logothetis NK. Decoding the contents of consciousness from prefrontal ensembles. bioRxiv. 2020. doi: 10.1101/2020.01.28.921841.
  22. Koch C, Massimini M, Boly M, Tononi G. Neural correlates of consciousness: progress and problems. Nature Reviews Neuroscience. 2016;17:307–321. doi: 10.1038/nrn.2016.22.
  23. Leibo JZ, Liao Q, Anselmi F, Freiwald WA, Poggio T. View-Tolerant face recognition and hebbian learning imply Mirror-Symmetric neural tuning to head orientation. Current Biology. 2017;27:62–67. doi: 10.1016/j.cub.2016.10.015.
  24. Leopold DA, Plettenberg HK, Logothetis NK. Visual processing in the ketamine-anesthetized monkey. Experimental Brain Research. 2002;143:359–372. doi: 10.1007/s00221-001-0998-0.
  25. Leopold DA, Logothetis NK. Activity changes in early visual cortex reflect monkeys' percepts during binocular rivalry. Nature. 1996;379:549–553. doi: 10.1038/379549a0.
  26. Ohayon S, Tsao DY. MR-guided stereotactic navigation. Journal of Neuroscience Methods. 2012;204:389–397. doi: 10.1016/j.jneumeth.2011.11.031.
  27. Overgaard M, Fazekas P. Can No-Report paradigms extract true correlates of consciousness? Trends in Cognitive Sciences. 2016;20:241–242. doi: 10.1016/j.tics.2016.01.004.
  28. Pachitariu M, Steinmetz NA, Kadir SN, Carandini M, Harris KD. Fast and accurate spike sorting of high-channel count probes with KiloSort. Advances in Neural Information Processing Systems. 2016.
  29. Panagiotaropoulos TI, Dwarakanath A, Kapoor V. Prefrontal Cortex and Consciousness: Beware of the Signals. Trends in Cognitive Sciences. 2020;24:343–344. doi: 10.1016/j.tics.2020.02.005.
  30. Preuschoff K, 't Hart BM, Einhäuser W. Pupil dilation signals surprise: evidence for noradrenaline's Role in Decision Making. Frontiers in Neuroscience. 2011;5:115. doi: 10.3389/fnins.2011.00115.
  31. Rao RP, Ballard DH. Predictive coding in the visual cortex: a functional interpretation of some extra-classical receptive-field effects. Nature Neuroscience. 1999;2:79–87. doi: 10.1038/4580.
  32. Safavi S, Kapoor V, Logothetis NK, Panagiotaropoulos TI. Is the frontal lobe involved in conscious perception? Frontiers in Psychology. 2014;5:1063. doi: 10.3389/fpsyg.2014.01063.
  33. Sheinberg DL, Logothetis NK. The role of temporal cortical Areas in perceptual organization. PNAS. 1997;94:3408–3413. doi: 10.1073/pnas.94.7.3408.
  34. Tong F, Nakayama K, Vaughan JT, Kanwisher N. Binocular rivalry and visual awareness in human extrastriate cortex. Neuron. 1998;21:753–759. doi: 10.1016/S0896-6273(00)80592-9.
  35. Tong F, Meng M, Blake R. Neural bases of binocular rivalry. Trends in Cognitive Sciences. 2006;10:502–511. doi: 10.1016/j.tics.2006.09.003.
  36. Tong F, Engel SA. Interocular rivalry revealed in the human cortical blind-spot representation. Nature. 2001;411:195–199. doi: 10.1038/35075583.
  37. Trautmann EM, Stavisky SD, Lahiri S, Ames KC, Kaufman MT, O'Shea DJ, Vyas S, Sun X, Ryu SI, Ganguli S, Shenoy KV. Accurate estimation of neural population dynamics without spike sorting. Neuron. 2019;103:292–308. doi: 10.1016/j.neuron.2019.05.003.
  38. Tsao DY, Freiwald WA, Tootell RB, Livingstone MS. A cortical region consisting entirely of face-selective cells. Science. 2006;311:670–674. doi: 10.1126/science.1119983.
  39. Tsuchiya N, Wilke M, Frässle S, Lamme VAF. No-Report paradigms: extracting the true neural correlates of consciousness. Trends in Cognitive Sciences. 2015;19:757–770. doi: 10.1016/j.tics.2015.10.002.
  40. Tsuchiya N, Frässle S, Wilke M, Lamme V. No-report and report-based paradigms jointly unravel the NCC: response to Overgaard and fazekas. Trends in Cognitive Sciences. 2016;20:242–243. doi: 10.1016/j.tics.2016.01.006.
  41. Tsuchiya N, Koch C. Continuous flash suppression reduces negative afterimages. Nature Neuroscience. 2005;8:1096–1101. doi: 10.1038/nn1500.
  42. Wilke M, Logothetis NK, Leopold DA. Generalized flash suppression of salient visual targets. Neuron. 2003;39:1043–1052. doi: 10.1016/j.neuron.2003.08.003.
  43. Wolfe JM. Reversing ocular dominance and suppression in a single flash. Vision Research. 1984;24:471–478. doi: 10.1016/0042-6989(84)90044-0.
  44. Xu H, Han C, Chen M, Li P, Zhu S, Fang Y, Hu J, Ma H, Lu HD. Rivalry-Like neural activity in primary visual cortex in anesthetized monkeys. Journal of Neuroscience. 2016;36:3231–3242. doi: 10.1523/JNEUROSCI.3660-15.2016.
  45. Zou J, He S, Zhang P. Binocular rivalry from invisible patterns. PNAS. 2016;113:8408–8413. doi: 10.1073/pnas.1604816113.

Decision letter

Editor: Ming Meng1
Reviewed by: Brad Duchaine2

In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.

Acceptance summary:

Binocular rivalry is a prominent type of bistable perception (illusion), in which an observer's conscious perception automatically switches while stimuli remain unchanged. The present study combines cutting-edge neurophysiological recordings and a novel no-report paradigm to revisit whether macaque inferotemporal (IT) cortex correlates with the animal's conscious percept. The results are provocative, suggesting that a) cells in the IT cortex are modulated by conscious percept; b) single cells may multiplex representation of illusory percept and physical stimulus.

Decision letter after peer review:

Thank you for submitting your article "Representation of conscious percept without report in the macaque face patch network" for consideration by eLife. Your article has been reviewed by three peer reviewers, and the evaluation has been overseen by a Reviewing Editor and Floris de Lange as the Senior Editor. The following individual involved in review of your submission has agreed to reveal their identity: Brad Duchaine (Reviewer #2).

The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.

We would like to draw your attention to changes in our revision policy that we have made in response to COVID-19 (https://elifesciences.org/articles/57162). Specifically, when editors judge that a submitted work as a whole belongs in eLife but that some conclusions require a modest amount of additional new data, as they do with your paper, we are asking that the manuscript be revised to either limit claims to those supported by data in hand, or to explicitly state that the relevant conclusions require additional supporting data.

Our expectation is that the authors will eventually carry out the additional experiments and report on how they affect the relevant conclusions either in a preprint on bioRxiv or medRxiv, or if appropriate, as a Research Advance in eLife, either of which would be linked to the original paper.

Summary:

The study by Hesse and Tsao on "Representation of conscious percept without report in the macaque face patch network" presents confirmatory results reported first in seminal studies by Sheinberg and Logothetis from the 90's showing that neurons in infero-temporal (IT) cortex represent the conscious percept in a binocular rivalry paradigm rather than the physical stimulus. This earlier work has inspired neuroscientists for generations, and the present study is a refreshing update in presenting a novel paradigm to study conscious perception without the necessity of active report.

The authors recorded a large set of neurons with a neuropixel prototype and 32 channel probes in parts of the macaque face patch system, specifically patches ML and AM. They devised a novel task during which perceptual switches between a face and an object were denoted with specific fixation point locations, so that the percept could be tracked without the need to track manual responses. Thereby, any contributions of the motor system to rivalry could be ruled out. This 'no report' paradigm was first established in humans, and then applied to monkeys trained to fixate for periods of time. The rivalry conditions were compared to physical alternations. The authors show compellingly that a large proportion of ML neurons, and the majority of AM neurons follow the percept during rivalry. The perceptual state could also be decoded from population activity.

The main novelty of this study is the innovation of a new paradigm that can be used to study conscious perception in individuals who are simply trained on fixation. The results (both behavioral and recordings) are compelling. While the study confirms that IT neurons can reflect the percept of an individual, it does not show where that percept is generated, that is whether it reflects feedback signals from PFC, or other neural structures (see Kapoor study), or locally generated, stochastic alternations.

Revisions for this paper:

1) The population result is nice given the novel neuropixels recordings, but perhaps not surprising given that the perceptual state could be determined from the majority of single units? Were you able to decode the percept also from the 32 channel probes? Perhaps discuss in greater detail what the population analysis adds.

2) Rivalry switches often occur slowly, often on the order of seconds. It would be helpful to include more details regarding the behavior (e.g. length of fixation, duration of stable percepts, frequency of switches). Was the paradigm presented in a trial structure? How long were these trials? Was this different for humans and monkeys? Your time scale for the human studies denotes several seconds, the monkey timescales are typically a few hundred milliseconds, which is very short. As you know, it often takes time to even become aware of a switch. More detail on these issues will be helpful given that the no report task is the major advance of the study.

3) The authors mention that Frassle et al., 2014, used a no-report paradigm, but I'm curious why the authors didn't discuss other rivalry studies that have used no-report paradigm such as Brascamp et al., 2015, Zou et al., 2016, and Xu et al., 2016. These papers used quite different approaches than the current study, but they seem to establish that the neural modulations that accompany rivalry can occur in the absence of a report.

4) What does it really mean when the cells carry information about both perceived and unperceived stimulus (as shown in Figures 4B and 5B)? If face patch neurons multiplex information as the authors suggested, then what role might they take in the neural circuits underlying the conscious percept? It has been reported that the responses of high-level visual neurons to suppressive stimuli were almost eliminated (Sheinberg and Logothetis, 1997). Could the authors explain their differences and elaborate why information multiplexing in IT neurons happens only when perceptual reports are not demanded?

5) While I think the novel no-report design is very smart, I wonder how accurate it is. The inferred percepts might be distorted by many factors, such as piecemeal rivalry. Such paradigm imperfection might weaken the average response modulation in the rivalry condition, as you might oversimplify the percept, which in fact is not a complete face or object. If so, does the less pronounced modulation during rivalry than in the physical condition truly reflect neural representation, or a side effect from mislabeling?

Revisions expected in follow-up work:

How was the ISCAN system integrated with the goggles? What was the quality of eye movement measurements? Did you examine microsaccades that also can contribute to switches?

eLife. 2020 Nov 11;9:e58360. doi: 10.7554/eLife.58360.sa2

Author response


[…] The main novelty of this study is the innovation of a new paradigm that can be used to study conscious perception in individuals who are simply trained on fixation. The results (both behavioral and recordings) are compelling. While the study confirms that IT neurons can reflect the percept of an individual, it does not show where that percept is generated, that is whether it reflects feedback signals from PFC, or other neural structures (see Kapoor study), or locally generated, stochastic alternations.

We would like to thank the reviewers for both the compliments and constructive criticism on the manuscript. We have revised the paper incorporating all the feedback and believe that the manuscript is significantly stronger due to this process.

In addition to the valuable revisions suggested by the reviewers, we are happy to announce that we have also performed a new experiment. Even though the reviewers did not require additional experiments, we believe that this addition adds scientific value to the article by directly addressing the outstanding question whether face patches indeed encode the unperceived stimulus, for which we until now only had suggestive evidence:

Decoding the suppressed stimulus

The finding that modulations in the perceptual condition are weaker and responses are distributed less bimodally on a single-trial basis suggested that cells may be multiplexing information about not only the consciously perceived stimulus but also the suppressed, subconscious stimulus. However, since in all previously recorded sessions binocular rivalry stimuli consisted of only two rivalling images, this could not be shown directly. In a new experiment, we therefore used three images, A, B, and C, and presented two different binocular rivalry stimuli made of image pairs (A,B) and (A,C), respectively (Figure 6A). This allowed us to compare trials where A was consciously perceived but the suppressed stimulus was either B or C, i.e., the animal’s conscious perception was the same in both types of trials, and only the suppressed stimulus varied. We asked whether we can decode the suppressed stimulus, i.e., distinguish between trial types $(\mathbf{A},B)$ and $(\mathbf{A},C)$ based on neural responses, where the image name in bold indicates the consciously perceived image, as inferred by eye movements. We performed this experiment while recording from face patch ML with a 64 channel S-probe. The decoding accuracy for distinguishing the two trial types with different suppressed images was 74% (Figure 6B). This indicates that face cells do encode the subconscious stimulus. Do the same cells multiplex information about both the conscious and subconscious stimulus or are there two distinct subpopulations, with one population encoding the conscious stimulus and another encoding the subconscious stimulus? To address this question, we compared modulation indices for the dominant stimulus with modulation indices for the suppressed stimulus for each cell. For the former, we fixed the suppressed stimulus while varying the dominant stimulus, i.e., $MI_{\mathrm{dominant}} = (R_{A\mathbf{B}} - R_{A\mathbf{C}})/(R_{A\mathbf{B}} + R_{A\mathbf{C}})$, and for the latter we fixed the dominant stimulus while varying the suppressed stimulus, i.e., $MI_{\mathrm{suppressed}} = (R_{\mathbf{A}B} - R_{\mathbf{A}C})/(R_{\mathbf{A}B} + R_{\mathbf{A}C})$. We found a positive correlation between dominant stimulus modulation indices and suppressed stimulus modulation indices ($p = 1.4 \times 10^{-6}$, Pearson’s $r = 0.55$, $n = 66$ physically selective cells, Figure 6C). This suggests that cells that are strongly modulated by the dominant stimulus tend to be similarly modulated by the suppressed stimulus. Thus, we did not find evidence for separate populations of cells that encode conscious and unconscious stimulus, respectively.
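The two modulation indices and their correlation across cells can be illustrated with a minimal Python sketch. The firing rates below are made-up numbers for four hypothetical cells, and the variable and function names (`r_a_supp_b_dom`, `modulation_index`, `pearson_r`) are assumptions introduced for illustration; this is not the analysis code used in the study.

```python
import numpy as np

def modulation_index(r1, r2):
    """Return (r1 - r2) / (r1 + r2) per cell; NaN where r1 + r2 == 0."""
    r1 = np.asarray(r1, dtype=float)
    r2 = np.asarray(r2, dtype=float)
    denom = r1 + r2
    out = np.full(denom.shape, np.nan)
    np.divide(r1 - r2, denom, out=out, where=denom != 0)
    return out

def pearson_r(x, y):
    """Pearson correlation, ignoring cells with an undefined index."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    keep = np.isfinite(x) & np.isfinite(y)
    return float(np.corrcoef(x[keep], y[keep])[0, 1])

# Hypothetical mean responses (spikes/s), one entry per cell.
# Suppressed image fixed at A, perceived image is B or C:
r_a_supp_b_dom = [30.0, 10.0, 20.0, 5.0]
r_a_supp_c_dom = [10.0, 30.0, 20.0, 15.0]
# Perceived image fixed at A, suppressed image is B or C:
r_a_dom_b_supp = [24.0, 16.0, 20.0, 9.0]
r_a_dom_c_supp = [16.0, 24.0, 20.0, 11.0]

mi_dominant = modulation_index(r_a_supp_b_dom, r_a_supp_c_dom)
mi_suppressed = modulation_index(r_a_dom_b_supp, r_a_dom_c_supp)

print("MI dominant:  ", mi_dominant)
print("MI suppressed:", mi_suppressed)
print("Pearson r:    ", pearson_r(mi_dominant, mi_suppressed))
```

In this toy example the suppressed-stimulus indices have the same sign as the dominant-stimulus indices but a smaller magnitude, which is the pattern a positive correlation across cells would reflect.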

A natural question arising from the decoding accuracy of 74% is whether this could be due to mislabeling by the no-report paradigm. On some trials, the conscious percept may have been mislabeled as $(\mathbf{A},B)$ or $(\mathbf{A},C)$ and actually have been $(A,\mathbf{B})$ or $(A,\mathbf{C})$, respectively (bold again marking the consciously perceived image). In this case, even if cells only encode the conscious percept and not the suppressed stimulus, the decoding accuracy may have been higher than chance because on those mislabeled trials, the decoder successfully discriminated based on a difference in conscious percept. The following calculation addresses this concern: We will estimate the worst-case decoding accuracy increase we could expect from these mislabelings under the null hypothesis that neurons do not encode the suppressed stimulus. Within image pair (A,B), we could decode $(\mathbf{A},B)$ vs. $(A,\mathbf{B})$, i.e., whether A or B was consciously perceived as in Figure 4D, with 89% accuracy in this session. If we had recorded more neurons, or neurons that were more selective, we would expect a decoding accuracy at least as high. Given the nature of the no-report binocular rivalry paradigm there is physically no difference between trial types $(\mathbf{A},B)$ and $(A,\mathbf{B})$, and hence any information that the decoder was able to acquire must have come from the difference in conscious percept. Thus, we can use 89% as a lower bound for the estimated accuracy of the no-report paradigm of inferring the correct conscious percept in this session. Under the null hypothesis that neurons only encode the conscious percept, the decoding accuracy for distinguishing $(\mathbf{A},B)$ from $(\mathbf{A},C)$ for 89% of trials should be chance (since for these trials, the conscious percept is correctly decoded as A). For the remaining 11% of trials, the conscious percept may have been B or C, respectively. Even if the decoder can decode all of these mislabeled trials with 100% accuracy (which is an overestimate), the decoding accuracy across all trials would be at most 89% × 50% + 11% × 100% = 55.5%. So even in the worst case, the mislabeled trials would not lead to the observed decoding accuracy of 74%. This suggests that face cells do indeed encode the suppressed image.
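The worst-case bound can be checked with a few lines of Python. The function name `worst_case_accuracy` and its parameters are assumptions chosen for this sketch; the inputs (89% labeling accuracy, 50% chance level, 74% observed accuracy) are the figures quoted above.

```python
def worst_case_accuracy(label_accuracy, chance=0.5, mislabeled_accuracy=1.0):
    """Upper bound on decoding accuracy under the null hypothesis.

    Correctly labeled trials are decoded at chance; mislabeled trials are
    (generously) assumed to be decoded at `mislabeled_accuracy`.
    """
    if not 0.0 <= label_accuracy <= 1.0:
        raise ValueError("label_accuracy must lie in [0, 1]")
    mislabeled_fraction = 1.0 - label_accuracy
    return label_accuracy * chance + mislabeled_fraction * mislabeled_accuracy

bound = worst_case_accuracy(0.89)   # 0.89 * 0.5 + 0.11 * 1.0
observed = 0.74

print(f"worst-case accuracy under the null: {bound:.3f}")
print(f"observed accuracy exceeds the bound: {observed > bound}")
```

Under these assumptions the bound only reaches the observed 74% if labeling accuracy falls to 52%, since 0.52 × 0.5 + 0.48 × 1.0 = 0.74.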

We have incorporated the new experiment and associated analyses into the revised manuscript. We believe this additional evidence significantly strengthens the paper, and raises it from a mostly confirmatory study to one that challenges the currently dominant concept of how rivalrous stimuli are represented in IT cortex (Figure 6D, Model I versus Model II).

Revisions for this paper:

1) The population result is nice given the novel neuropixels recordings, but perhaps not surprising given that the perceptual state could be determined from the majority of single units? Were you able to decode the percept also from the 32 channel probes? Perhaps discuss in greater detail what the population analysis adds.

We agree that the perceptual modulation of single units predicts that perceptual state can be decoded. The decoding analysis is a proof of concept that perceptual content can be decoded on a single-trial basis with accuracies much higher than chance (95% for physical and 78% for perceptual on average across sessions). This is something we could not achieve with single electrodes: When performing the decoding with single neurons, decoding accuracies were merely 61% ± 12% for physical and 55% ± 6% for perceptual (mean accuracy ± standard deviation across neurons). While the use of Neuropixels prototypes represents an innovation in terms of number of simultaneously recorded channels, it was not necessary to use Neuropixels to decode the percept above chance. Indeed, of the 12 data points in Figure 4D, only three sessions included Neuropixels data. On the 9 other sessions, we recorded with two 32-ch. S-probes in ML and AM and still obtained decoding accuracies much higher than chance. We make this clearer in the text now:

“Recordings were performed using tungsten electrodes (FHC) with 1 MΩ impedance and, after correct targeting was confirmed, with 32-channel S-probes (Plexon) with 75 µm and 100 µm inter-electrode distance, and, in three sessions, with passive Neuropixels-like probe prototypes (IMEC) (Dutta et al., 2019; Jun et al., 2017; Trautmann et al., 2019).”

What the population analysis really adds in our opinion is that we can ask how conscious percepts are encoded during binocular rivalry on single trials. Previous electrophysiological studies averaged across trials (e.g., Sheinberg and Logothetis, 1997) and found weaker modulation on average for binocular rivalry as compared to physical switches. However, it is unclear whether this weaker modulation strength was the case across all trials or whether it arose from mislabeling of percept on some trials. Therefore, a common perception is that in IT most cells reflect conscious perception exactly (see, e.g., reviewer comment #4 below). Our single-trial analysis of large numbers of simultaneously recorded cells shows that the distribution of single-trial responses during binocular rivalry is less bimodal and spans a smaller range, indicating that cells are truly more weakly modulated during binocular rivalry. This raised the interesting possibility that cells may multiplex information about the veridical physical stimulus and the conscious percept, which we were able to confirm by decoding the suppressed stimulus from a population of simultaneously recorded neurons. We emphasize this fact in the Introduction:

“In a second innovation, we performed electrophysiological recordings using a novel 128-electrode site Neuropixels-like probe that allowed us to measure responses from large numbers of cells simultaneously. […] Inter-trial averaging confounds these two possibilities; to distinguish them, it is critical to compare perceptual versus physical response modulations for single trials.”

2) Rivalry switches often occur slowly, often on the order of seconds. It would be helpful to include more details regarding the behavior (e.g. length of fixation, duration of stable percepts, frequency of switches). Was the paradigm presented in a trial structure? How long were these trials? Was this different for humans and monkeys? Your time scale for the human studies denotes several seconds, the monkey timescales are typically a few hundred milliseconds, which is very short. As you know, it often takes time to even become aware of a switch. More detail on these issues will be helpful given that the no report task is the major advance of the study.

Thank you for this helpful comment. We have now clarified and supplemented the pertinent information in the Materials and methods section. The binocular rivalry stimuli were presented continuously, but fixation spot positions changed at regular intervals and we defined a trial structure based on that. For monkey experiments, the duration of each trial was 800 ms (i.e., fixation spots jumped to a new position every 800 ms). For the human experiment, we set the trial duration to 2000 ms, since the study participants, unlike the monkeys, had not been extensively trained on the task and hence needed more time to saccade to the jumping fixation spots. We have now clarified these details in the text:

“Subsequently, for the main experiment, stimuli contained one or two fixation spots at one of four possible locations (top, bottom, left, and right, 1 degree from the center) and were presented for 800 ms ON time and 0 ms OFF time. […] During the binocular rivalry condition, even though the same stimulus was presented continuously, we refer to the 800 ms duration, after which the two fixation spots would change position, as one trial.”

“For human subjects, stimuli were identical except that the trial duration was 2000 ms, since they had not been extensively trained on the task unlike monkeys and hence needed more time to saccade to the jumping fixation spots.”

The trial duration determines the temporal resolution with which we were able to infer switches in percept. However, the trial duration was much shorter than the typical dominance duration of the percept: In monkeys, median dominance duration was 7.2 seconds for faces and 7.2 seconds for objects. In humans, median dominance duration was 8 seconds for faces and 10 seconds for objects as estimated from fixation patterns, and 8.1 seconds for faces and 8.3 seconds for objects as estimated from reports. We now include the dominance durations in the Results section of the manuscript:

“To account for individuals’ eye dominance, we balanced the contrasts of the stimuli in the two eyes so that the monkey followed both fixation spots equally often in the rivalry condition. […] Similarly, in human subjects median dominance durations were 8 seconds for faces and 10 seconds for objects as estimated from fixation patterns, and 8.1 seconds for faces and 8.3 seconds for objects as estimated from reports.”
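One plausible way to estimate dominance durations from fixation patterns is to label each trial with the inferred percept and measure runs of identical consecutive labels. The Python sketch below assumes that procedure; the text does not spell out the exact method, and the label sequence, function name, and `drop_edges` option are hypothetical.

```python
from itertools import groupby
from statistics import median

def dominance_durations(labels, trial_s, drop_edges=False):
    """Group consecutive identical percept labels into dominance periods.

    labels     : per-trial inferred percept, in presentation order
    trial_s    : trial duration in seconds (sets the temporal resolution)
    drop_edges : discard the first and last runs, which are truncated by
                 the start and end of the recording
    Returns a dict mapping each percept to a list of durations in seconds.
    """
    runs = [(percept, sum(1 for _ in group)) for percept, group in groupby(labels)]
    if drop_edges:
        runs = runs[1:-1]
    durations = {}
    for percept, n_trials in runs:
        durations.setdefault(percept, []).append(n_trials * trial_s)
    return durations

# Hypothetical sequence of inferred percepts, one per 800 ms trial.
labels = ["face"] * 3 + ["object"] * 5 + ["face"] * 4 + ["object"] * 2
durations = dominance_durations(labels, trial_s=0.8)

for percept, values in durations.items():
    print(percept, "median dominance duration (s):", median(values))
```

Because durations are multiples of the trial length, dominance periods shorter than one trial cannot be resolved, which is why the trial duration needs to be short relative to the dominance durations reported above.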

3) The authors mention that Frassle et al., 2014, used a no-report paradigm, but I'm curious why the authors didn't discuss other rivalry studies that have used no-report paradigm such as Brascamp et al., 2015, Zou et al., 2016, and Xu et al., 2016. These papers used quite different approaches than the current study, but they seem to establish that the neural modulations that accompany rivalry can occur in the absence of a report.

We thank the reviewers for directing our attention to these interesting alternative approaches to no-report paradigms and have added them to the Discussion:

“Alternative approaches to the no-report paradigms of Frässle et al., 2014, have been developed in which the monkey or human subject is unaware of when a perceptual switch is happening and hence cannot report it, either due to anesthesia or due to the difference in stimuli being too subtle to report. […] Thus, to the best of our knowledge, the current study reveals representation of the conscious percept in IT cells in the most confound-free way to date.”

4) What does it really mean when the cells carry information about both perceived and unperceived stimulus (as shown in Figures 4B and 5B)? If face patch neurons multiplex information as the authors suggested, then what role might they take in the neural circuits underlying the conscious percept? It has been reported that the responses of high-level visual neurons to suppressive stimuli were almost eliminated (Sheinberg and Logothetis, 1997). Could the authors explain their differences and elaborate why information multiplexing in IT neurons happens only when perceptual reports are not demanded?

It would be hard to imagine how a circuit mechanism within IT could generate switches of conscious percept if IT cells did not encode any information about the suppressed stimulus, since the neural state would be indistinguishable from that evoked by an unambiguous stimulus. The additional experiment described above confirms that cells do indeed multiplex information about the perceived and unperceived stimulus, as both the perceived stimulus and the unperceived stimulus can be decoded from the population, using different decoders. Figure 6C suggests that there are not two distinct populations for encoding perceived and unperceived stimulus, respectively; instead, the same neuron may have mixed selectivity for perceived and unperceived stimulus. This leaves open the possibility that IT or downstream areas are involved in switches of conscious percept. We mention this in the Discussion:

“To directly test this hypothesis, we presented more than one binocular rivalry stimulus, created from pairs of three images, and found that the subconscious stimulus could indeed be decoded from face patch activity. […] It remains an open question where and how the conscious percept is ultimately isolated from the suppressed stimulus to produce conscious awareness of the former and not the latter.”

It is a common misconception, which we also had at the outset of this project, that high-level visual neurons in IT reflect the conscious percept exactly as they reflect physical stimuli. Figure 5 of the original paper by Sheinberg and Logothetis, 1997, on neural correlates of binocular rivalry, shows that cells were significantly more weakly modulated during rivalry. Notably, when the non-preferred stimulus was perceived in rivalry, responses were not eliminated. The original study used single electrodes and averaged across trials, and hence, it could not be determined whether the weaker modulation stemmed from mislabeling on a subset of trials. Importantly, we do not think that the multiplexing happens only if reports are demanded. Instead, the weaker modulation appears to be a hallmark of rivalry whether it is reported or not.

5) While I think the novel no-report design is very smart, I wonder how accurate it is. The inferred percepts might be distorted by many factors, such as piecemeal rivalry. Such imperfections in the paradigm might weaken the average response modulation in the rivalry condition, as you might oversimplify the percept, which in fact is not a complete face or object. If so, does the less pronounced modulation during rivalry than in the physical condition truly reflect neural representation, or is it a side effect of mislabeling?

The effect of mislabeled and mixture trials is a valid concern. We therefore optimized the stimulus to enhance competition between the stimuli and decrease periods of mixture using a variety of methods including: (1) making the stimulus as small as possible while still allowing accurate tracking of fixation patterns (5 degrees in total), (2) increasing the contrast of both eyes’ object images, (3) adding fixation marks to help with fusion, (4) adding orthogonal gratings in the background of the objects to increase local orientation contrast, and (5) applying orientation filters to the object images that were orthogonal in the left and right eyes to further increase local orientation contrast. We asked human subjects to report whether and how frequently they perceived mixture during the experiment, and all subjects reported that they could see only one of the objects most of the time. See the Materials and methods section:

“During the binocular rivalry condition, even though the same stimulus was presented continuously, we refer to the 800 ms duration, after which the two fixation spots would change position, as one trial. […] Moreover, we applied orthogonal orientation filters (with concentration 𝜎_angle = 0.5°) to the face and object stimuli, respectively, to increase local orientation contrast and further reduce periods of mixture.”

We think that trials with mixture or mislabeled percept did contribute to the weaker modulation averaged across trials, assuming that proportions of mislabeling and mixture were similar. However, we think that these factors cannot explain the radically different single-trial response profiles between rivalrous and unambiguous conditions; the former were much less bimodal than the latter and spanned a smaller range, despite the binocular rivalry condition having been presented in many more trials. We performed simulations of the effect of mixture on the data shown in Figure 5. We assumed different proportions of mixture from 0% to 100% and simulated the worst-case effect of mixture (i.e. exactly half-face, half-object) on responses in the physical condition, by averaging the responses to pairs of face and object trials. We used the same statistical test as described in the paper and found that the physical trial responses became statistically indistinguishable from binocular rivalry responses only if we added 50% to 70% mixture, whereas each human subject reported not seeing any mixture on most trials. Note that this analysis is independent of correct labeling of conscious percepts. We now describe this simulation in the Results section:

“Importantly, this difference in response profiles between physical and perceptual conditions was apparent even when pooling across both face and object trials (Figure 5B, middle), and hence cannot be explained by mistakes in inferring the percept from eye movements. […] Yet, under the reasonable assumption that they were similar, trials with mixed or piecemeal percepts cannot account for the difference in response distributions between physical and perceptual conditions.”
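The worst-case mixture simulation described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the analysis code used for the paper: the function names (`simulate_mixture`, `ks_statistic`) are hypothetical, the response values in the demo are synthetic, and the two-sample Kolmogorov-Smirnov statistic is only a stand-in for "the same statistical test as described in the paper", which is not specified in this response.

```python
import random
from bisect import bisect_right


def simulate_mixture(face, obj, proportion, rng):
    """Replace a fraction of physical-condition trials with worst-case mixtures.

    A mixture trial is modelled as the mean of one randomly drawn face-trial
    response and one randomly drawn object-trial response, i.e. a percept that
    is exactly half-face, half-object. `proportion` is the fraction (0 to 1)
    of pooled trials replaced in this way.
    """
    pooled = list(face) + list(obj)
    n_mix = round(proportion * len(pooled))
    mixed_idx = set(rng.sample(range(len(pooled)), n_mix))
    simulated = []
    for i, response in enumerate(pooled):
        if i in mixed_idx:
            simulated.append(0.5 * (rng.choice(face) + rng.choice(obj)))
        else:
            simulated.append(response)
    return simulated


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic (largest gap between the
    empirical CDFs). Assumed stand-in, not the test used in the paper."""
    a, b = sorted(a), sorted(b)
    return max(
        abs(bisect_right(a, v) / len(a) - bisect_right(b, v) / len(b))
        for v in a + b
    )


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic placeholder responses (arbitrary units), NOT recorded data.
    face = [rng.gauss(20.0, 3.0) for _ in range(200)]
    obj = [rng.gauss(5.0, 3.0) for _ in range(200)]
    unmixed = face + obj
    for proportion in (0.0, 0.25, 0.5, 0.75, 1.0):
        mixed = simulate_mixture(face, obj, proportion, rng)
        print(proportion, round(ks_statistic(unmixed, mixed), 3))
```

In an actual analysis, the simulated distribution for each mixture proportion would be compared against the recorded rivalry responses rather than against the unmixed physical responses, to find the proportion at which the two become indistinguishable.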

Revisions expected in follow-up work:

How was the ISCAN system integrated with the goggles? What was the quality of eye movement measurements? Did you examine microsaccades that also can contribute to switches?

The ISCAN camera recorded the position of one eye through the anaglyph filter. The presence of the filter only slightly impaired the quality of eye movement measurements. We have added this information to the Materials and methods section:

“Eye position was monitored using an eye tracking system (ISCAN). The camera recorded one eye through the red/cyan anaglyph filter.”

We measured the precision of ISCAN eye positions by computing the absolute value of distances between eye position samples 1 ms apart. The median and 99% confidence interval, respectively, were 0.038 degrees and 0.34 degrees. Note that saccades should not contaminate the confidence interval estimate of this jitter, since saccades occur less frequently than 10 Hz. For comparison, the distance between fixation spots was 1.4 or 2 degrees, much larger than the jitter magnitude.

Author response image 1. Quality of eye movement measurements.

Histogram shows counts of Euclidean distances between eye positions of adjacent milliseconds in the range from 0 to 1 degree visual angle across all recorded sessions. Median and 99% confidence interval (CI) are shown in orange and yellow, respectively.

We now mention this in the Materials and methods:

“We measured the precision of ISCAN eye positions by computing the absolute value of distances between 1 ms adjacent eye data. […] Note that these confidence intervals should not be contaminated by saccades which occur less frequently than 10 Hz and therefore make up less than 1% of the distribution.”
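The precision estimate described above can be sketched as follows. This is an illustrative reconstruction under assumptions, not the code used for the paper: the function names are hypothetical, the eye trace is assumed to be sampled once per millisecond in degrees of visual angle, and the "99% confidence interval" is interpreted here as the 99th percentile of the sample-to-sample distances (nearest-rank definition).

```python
import math
import statistics


def adjacent_distances(xs, ys):
    """Euclidean distance between each pair of consecutive eye samples.

    xs, ys: horizontal and vertical eye position in degrees, assumed to be
    sampled every 1 ms.
    """
    return [
        math.hypot(x1 - x0, y1 - y0)
        for x0, y0, x1, y1 in zip(xs, ys, xs[1:], ys[1:])
    ]


def percentile(values, q):
    """Nearest-rank percentile of `values` for q in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]


def jitter_summary(xs, ys):
    """Return (median, 99th percentile) of sample-to-sample distances."""
    distances = adjacent_distances(xs, ys)
    return statistics.median(distances), percentile(distances, 99)
```

Because saccades are argued to make up less than 1% of the samples, the 99th percentile in this sketch is dominated by measurement jitter rather than by genuine eye movements.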

Given the measurement noise above, we were able to detect saccades over distances of 0.5 degrees or larger, which under some definitions can still be considered microsaccades. It has been reported that perceptual switches happen more frequently around microsaccades (Sabrin and Kertesz, 1980; van Dam and van Ee, 2006). However, in our no-report paradigm we infer the conscious percept based on which fixation spot a subject saccades to on a given trial, and therefore we can sample the percept no more often than saccades occur. Hence, we cannot determine whether the percept switched more during microsaccades than during static fixation. In terms of neural modulation, we did find that saccades evoked response increases during binocular rivalry, and the response increase was slightly higher when we inferred that the preferred stimulus was perceived than when the non-preferred stimulus was perceived; see Figure 5 and the Results section:

“We observed response modulations for both physical and perceptual conditions starting around 130 ms after saccade onset (Figure 5A). […] As a consequence, during rivalry the response difference to a saccade between face and object, though significant (𝑝 = 6 × 10⁻²³, two-sample t-test, 𝑁 = 701 saccades for object, 𝑁 = 703 saccades for face), was weaker than during the physical condition.”

References:

Sabrin, H. W., and Kertesz, A. E. (1980). Microsaccadic eye movements and binocular rivalry. Perception & Psychophysics, 28(2), 150-154.

van Dam, L. C., and van Ee, R. (2006). Retinal image shifts, but not eye movements per se, cause alternations in awareness during binocular rivalry. Journal of Vision, 6(11), 3-3.

Associated Data

    Supplementary Materials

    Transparent reporting form

    Data Availability Statement

    All data generated or analysed during this study are included in the manuscript and supporting files.


    Articles from eLife are provided here courtesy of eLife Sciences Publications, Ltd
