Journal of Vision. 2025 Sep 10;25(11):6. doi: 10.1167/jov.25.11.6

Binocular cues to 3D face structure increase activation in depth-selective visual cortex with negligible effects in face-selective areas

Eva Deligiannis, Marisa Donnelly, Carol Coricelli, Karsten Babin, Kevin M Stubbs, Chelsea Ekstrand, Laurie M Wilcox, Jody C Culham
PMCID: PMC12429739  PMID: 40928314

Abstract

Studies of visual face processing often use flat images as proxies for real faces due to their ease of manipulation and experimental control. Although flat images capture many features of a face, they lack the rich three-dimensional (3D) structural information available when binocularly viewing real faces (e.g., binocular cues to a long nose). We used functional magnetic resonance imaging to investigate the contribution of naturalistic binocular depth information to univariate activation levels and multivariate activation patterns in depth- and face-selective human brain regions. We used two cameras to capture images of real people from the viewpoints of the two eyes. These images were presented with natural viewing geometry (such that the size, distance, and binocular disparities were comparable to a real face at a typical viewing distance). Participants viewed stereopairs under four conditions: accurate binocular disparity (3D), zero binocular disparity (two-dimensional [2D]), reversed binocular disparity (pseudoscopic 3D), and no binocular disparity (monocular 2D). Although 3D faces (both 3D and pseudoscopic 3D) elicited higher activation levels than 2D faces, as well as distinct activation patterns, in depth-selective occipitoparietal regions (V3A, V3B, IPS0, IPS1, hMT+), face-selective occipitotemporal regions (OFA, FFA, pSTS) showed limited sensitivity to internal facial disparities. These results suggest that 2D images are a reasonable proxy for studying the neural basis of face recognition in face-selective regions, although contributions from 3D structural processing within the dorsal visual stream warrant further consideration.

Keywords: 3D vision, face processing, functional magnetic resonance imaging, binocular disparity, stereopsis

Introduction

Face perception is essential in real life for recognizing and communicating with other people. This importance is reflected by the extensive research that has been done on face processing and its neural substrates. Indeed, the fusiform face area (FFA) has been empirically declared one of the “most interesting parts of the brain” (Behrens, Fox, Laird, & Smith, 2013) based on the impact of neuroimaging studies of faces. Although the fusiform gyrus has received the most attention in face processing studies (Kanwisher, McDermott, & Chun, 1997), studies in humans and non-human primates consistently identify an interconnected network of regions that support different aspects of face processing (Bernstein, Erez, Blank, & Yovel, 2018; Duchaine & Yovel, 2015; Freiwald, Duchaine, & Yovel, 2016; Freiwald & Tsao, 2010; Grimaldi, Saleem, & Tsao, 2016; Haxby, Hoffman, & Gobbini, 2000).

Importantly, however, the study of how faces are processed in the brain can more accurately be described as the study of how images (typically photographs) of faces are processed. The main reason for the use of images is that, unlike real faces, images can be easily presented and carefully controlled while participants undergo functional magnetic resonance imaging (fMRI). Relatively few studies consider that real stimuli, including faces, contain a wealth of additional information that is missing from images (Snow & Culham, 2021). Yet, even a seminal model of face processing (Bruce & Young, 1986) emphasized that the information from a two-dimensional (2D) image of a face cannot explain how real-world recognition is tolerant to changes in orientation, expression, and lighting. Of course, faces in real life are fully three-dimensional (3D), including binocular disparity derived from the different perspectives of the two eyes. In contrast, images of faces only convey a sense of 3D structure from preserved monocular depth cues such as texture and shading, which conflict with binocular signals that the image is a flat surface. Binocular cues, in combination with oculomotor cues (particularly vergence), may be used to infer the physical size and distance of a face from the observer, which is commonly misrepresented in images. Other aspects of face realism missing from static images have also been shown to be important. Specifically, dynamic videos of faces, with changing expression and eye gaze, activate a broader network of brain areas than 2D static images, including the posterior superior temporal sulcus (for a review, see Pitcher & Ungerleider, 2021). This raises the question of whether binocular cues, like dynamic cues, might enhance neural face processing.

The addition of binocular vision can provide important information about both 3D structure (disparity gradients) and 3D distance (specifically, the egocentric distance from the observer) that are absent in 2D images of faces. First, stereoscopic 3D information about the volumetric structure of a face could theoretically facilitate face detection and recognition (even across different viewpoints such as from frontal to intermediate views). If, for example, a person has a particularly long nose or a sharp chin, the addition of stereopsis might allow them to be recognized faster, even across different viewpoints. Second, richer information about the distance of a face from an observer affects the interactions possible with an individual. For example, you might shout to a person 10 meters away or reach out to shake the hand of someone within arm's reach. However, images often misrepresent the size and distance of faces (as in a billboard or a magazine photograph). Interestingly, neuroimaging evidence suggests that brain areas sensitive to 3D structure versus 3D distance are largely dissociable (Durand, Peeters, Norman, Todd, & Orban, 2009).

There is longstanding debate as to whether stimuli, including faces and objects, are represented based on their inferred 3D structures or specific viewpoints of the stimulus as it is rotated in 3D space (reviewed in Peissig & Tarr, 2007). Structural description models suggest that object features are integrated within an internal 3D representation of a stimulus (Biederman, 1985; Marr & Nishihara, 1978), whereas view-based recognition models suggest that invariance is accomplished by multiple representations from different views of a stimulus (Bulthoff, Edelman, & Tarr, 1995; Peissig & Tarr, 2007).

Although both structural and view-based models can explain object recognition, converging behavioral and neural evidence suggests that faces are processed using specialized mechanisms that differ from general object processing (for a review, see Duchaine & Yovel, 2008). Early psychophysical evidence suggested that upright faces are processed holistically (Tanaka & Farah, 1993; Yin, 1969; Young, Hellawell, & Hay, 1987). Moreover, neurophysiological correlates of these effects have been shown in humans (Schiltz & Rossion, 2006; Yovel & Kanwisher, 2005) and in non-human primates (Freiwald, Tsao, & Livingstone, 2009). Even Biederman, one of the strongest proponents of structural description models for object recognition, argued that faces could not be represented by structural descriptions alone because all faces share the same, highly predictable basic structure (Biederman & Kalocsais, 1997). By this view, unlike objects, faces are thought to be processed in a holistic and viewpoint-dependent manner to enable detection of the subtle variations in facial parts that confer identity. In addition to neural evidence for holistic processing, some face-selective areas or patches show at least partial sensitivity to specific views of a face (Andrews & Ewbank, 2004; Axelrod & Yovel, 2012; Freiwald & Tsao, 2010; Pitcher, Walsh, Yovel, & Duchaine, 2007).

Although there is evidence that face processing is viewpoint dependent and holistic, supporting view-based recognition models, this does not mean that 3D features are not relevant. In fact, under natural conditions, holistic representations may be based on different views of the face that incorporate 3D structural information. Certain 3D features could be particularly important for recognizing faces across viewpoints. For example, an observer who looks at a person head-on may extract richer volumetric information about the face than looking at a 2D image taken from the same perspective, allowing them to recognize the same individual more quickly in a three-quarter or profile view. Indeed, stereoscopic 3D faces have been found to evoke broader viewpoint tuning (Deng et al., 2024) and better viewpoint generalization than 2D images (Burke, Taubert, & Higman, 2007).

The addition of stereoscopic depth information has been shown to improve face recognition for both frontal and intermediate views of a face. Behavioral evidence largely shows that stereoscopically presented 3D faces are recognized more accurately than 2D faces (Chelnokova & Laeng, 2011; Eng et al., 2017; Liu, Laeng, & Czajkowski, 2020; but see also opposing results from Liu, Ward, & Young, 2006). Face identity aftereffects have been observed for faces defined by a range of depth cues (stereopsis, shading, texture, and structure from motion), suggesting that face processing mechanisms have access to binocular disparity information (Dehmoobadsharifabadi & Farivar, 2016). Moreover, these aftereffects generalized across depth cues, suggesting that multiple cues are integrated for processing facial identity. Yet, despite evidence that 3D information impacts face recognition, to our knowledge no studies have explored the neural correlates of these effects.

Although there is a rich literature on binocular disparity processing in early and mid-level stages of the visual system (Cumming & DeAngelis, 2001; Parker, 2007; Welchman, 2016), much less is known about stereoscopic processing within high-level, category-selective regions. In large part, this is because experiments on depth processing have relied largely on simplified visual stimuli, typically random dot stereograms (RDSs) and line elements to probe cortical disparity sensitivity. These fMRI studies have consistently identified strong disparity selectivity in higher order dorsal visual areas—specifically, V3A, V3B, IPS0 (V7), and hMT+ (Backus, Fleet, Parker, & Heeger, 2001; Brouwer, van Ee, & Schwarzbach, 2005; Finlayson, Zhang, & Golomb, 2017; Georgieva, Peeters, Kolster, Todd, & Orban, 2009; Goncalves et al., 2015; Minini, Parker, & Bridge, 2010; Preston, Li, Kourtzi, & Welchman, 2008; Tsao et al., 2003). Both fMRI and single-cell studies in humans have shown that disparity sensitivity extends into the ventral visual stream, including object-selective visual areas in lateral occipital cortex (Decramer et al., 2019; Georgieva et al., 2009; Minini et al., 2010; Preston et al., 2008), and the fusiform gyrus (Gonzalez, Relova, Prieto, & Peleteiro, 2005). Although no studies have explicitly contrasted the neural processing of 3D versus 2D faces, activation in the occipital face area (OFA) was modulated by presenting planar images of faces at different stereoscopically simulated distances (Nag, Berman, & Golomb, 2019).

The goal of the present study was to understand how the addition of binocular disparity modulates face processing in the brain, using fMRI to examine both univariate activation levels and multivariate activation patterns. Face stimuli were full-color, head-and-shoulder photographs of real people (Figure 1). In the 3D condition, stimuli were presented stereoscopically such that the left and right eyes saw different views (Figure 1b). In the 2D condition, the same image was presented to both eyes such that stereopsis conveyed the impression of a flat image (Figure 1a). To optimize realism, unlike most studies of category-selectivity, our stimuli were captured and presented with natural viewing geometry. Specifically, the face subtended an appropriate real-world physical size (on average, 25 cm high) at a comfortable conversational distance (100 cm), and binocular disparities were within a range similar to what would be encountered for real faces at that distance.

Figure 1.

Experimental conditions and stimulus presentation timing alongside representation of the manipulations in the images viewed by each eye to create the four conditions. (a–d) The four static viewing conditions differed in terms of the presence and consistency of binocular disparity and 2D depth information. Binocular disparity was either present (a–c) or absent (d). When present, disparities were zero (2D) (a), consistent with natural binocular viewing (3D) (b), or reversed from the normal face configuration (P-3D) (c). Binocular disparity was generated by capturing images of real faces with two cameras separated by the average interpupillary distance. In the 2D condition, the same image was presented to both eyes, and the result was a flat 2D view with zero binocular disparity. For 3D views, the left and right eye images were presented to the respective eyes. In the P-3D condition, the left and right eye images were reversed to maintain the same local amplitudes of binocular disparity but with reversed polarity. Finally, in the Mono condition, the right eye was occluded, eliminating binocular depth cues that conflict with monocular cues. Note that, here, right and left eye views have been swapped to enable cross-fusion; see Supplementary Figure S1 for condition examples without the clutter of the accompanying text. (e) Participants wore passive polarizing glasses with a rod attached to a cover on the right eye. Prior to viewing monocular stimuli, participants were visually cued to rotate the rod, moving an occluding cover in front of the right eye. (f) Participants performed a one-back task as they fixated on the cross at the nasion. Trials lasted 1000 ms, with 800-ms stimulus presentation and 200-ms interstimulus interval. Photographs were taken by the authors, and written consent was provided by the individuals for their use in academic publications.

We were particularly interested in 3D depth cues to facial structure. Although there are interesting implications for 3D distance in the processing of real faces (Khandhadia, Murphy, Koyano, Esch, & Leopold, 2023), to avoid confounding the two we focused our manipulations on the role of 3D facial structure and did not vary 3D distance. Our stimuli conveyed some information about 3D distance based on the combination of vergence and binocular disparity information (Rogers & Bradshaw, 1995), although Linton (2020) asserted that vergence provides little or no distance information when isolated from all other depth information. However, our stereoscopic photographs of face models were taken in front of a flat, non-textured background, to avoid large disparities with the background that would provide additional 3D distance information.

Our crucial contrast was between cortical responses to 3D and 2D faces. Although higher activation for 3D than 2D stimuli would be expected for known human depth-selective areas in occipitoparietal cortex (V3A, V3B, IPS0) and hMT+, our key question was whether 3D faces would evoke different activation levels and patterns than 2D faces in face-selective areas, including FFA, OFA, and the posterior superior temporal sulcus (pSTS). If activation levels and response patterns differed for 3D versus 2D faces in face areas, this would suggest that these regions process binocular information about 3D structure. Alternatively, an absence of such differences would suggest that 2D images are a reasonable proxy for real faces in studies of visual processing.

Although the 3D greater than 2D contrast was the cornerstone of the experiment, we included two additional conditions to further understand which aspects of the stimuli could drive potential differences between our 3D and 2D stimuli. We expected the crucial difference between our 3D and 2D conditions to be the availability of binocular information about volumetric structure, but potential confounds could also affect the results. First, 3D faces contain binocular information that is consistent with our everyday experience of faces being convex, whereas our 2D faces would be consistent with looking at less natural faces, as in photographs. Second, for our 2D images, the preserved monocular depth cues (such as shading) conflicted with the binocular disparity information, which indicated zero disparity. To identify whether a preference for 3D faces was driven by disparity information congruent with natural faces or by a preference for disparity generally, a situation with incongruent disparity was induced by swapping the right and left eye views (i.e., pseudoscopic 3D, or P-3D) (Figure 1c). This manipulation, known as pseudoscopy (Wheatstone, 1852), preserves the magnitude of binocular disparity at each point in the display but inverts the direction. For example, in a P-3D face, the disparity between the nose tip and eyes would be the same magnitude as in a real face, but disparity of the nose tip would be consistent with it being farther from the viewer than the eyes. Interestingly, when presented with pseudoscopic natural 3D images, participants often fail to recognize that the depth is inverted (Palmisano, Hill, & Allison, 2016; van den Enden & Spekreijse, 1989). This effect can be powerful for faces, as evidenced by the strong effect of the hollow-face illusion (Gregory, 1970). Note, however, that regional brain activation differences for 3D and P-3D faces could reflect either the stimulus differences (the presence or absence of disparity, the congruency of disparity with monocular cues) or the perceptual differences. Finally, to determine if cue conflict contributes to the unnaturalness of 2D images, one eye was occluded in the monocular condition (Mono), removing both cue conflict and binocular disparity (Figure 1d).

Methods

Participants

The analysis included data from 23 participants (10 females, 13 males; mean age, 24 years; range, 17–34 years). Data from one additional participant were collected but excluded from the final analysis due to excessive head motion (as detailed below in the MRI data preprocessing section). Participants were recruited from the University of Western Ontario (London, ON, Canada). All participants had normal or corrected-to-normal vision, normal depth perception (stereoacuity thresholds of 40 seconds of arc or better on the Randot Preschool Stereotest), no neurological conditions, and no history of strabismus or amblyopia. Informed consent was obtained prior to the start of the study. Participants received financial compensation for their participation. Experimental procedures were approved by the Non-Medical Research Ethics Board of the University of Western Ontario (protocol 115109), in accordance with the tenets of the Declaration of Helsinki.

Individuals from the university community were recruited to serve as photographic subjects. They provided informed consent for their images to be used experimentally and published in conference presentations and publications.

Apparatus

Stimuli were projected from a light-emitting diode (LED) digital light processing PROPixx 3D MRI-Compatible Projector (VPixx Technologies, Saint-Bruno, QC, Canada), which used a high-speed circular polarizer to present full-color images (8-bit color depth) at a resolution of 1920 × 1080 and frequency of 120 Hz. Here, images were presented using an interleaved left/right eye image presentation method, in which the polarizer alternated polarization from frame to frame, such that each eye received images at 60 Hz. In the scanner, participants wore passive polarized glasses and viewed stimuli on a polarization-preserving back-projection screen. The optical path included two large first-surface mirrors outside the bore of the scanner, each angled at 45° (to enable placement of the projector at a position in the scanner room with the lowest magnetic field) and a first-surface mirror, angled at 45°, above the head coil (to enable the participant to view the screen). First-surface mirrors were used to preserve the polarization of the light and enable simultaneous viewing of stereopairs (i.e., separate images for the left and right eyes).

Images were projected on a screen that was shaped to fit the bore of the scanner (VPixx Technologies), with a horizontal diameter of 54.6 cm (30.5°) and height of 38.5 cm (21.8°). The viewing distance from the eye to the screen was 1 meter. This viewing distance is consistent with comfortable interpersonal distance for face viewing in Canada (Geers & Coello, 2023; Sorokowska et al., 2017). On average the faces, including hair, subtended 14° × 11° of retinal angle (24 cm × 19.6 cm), which is consistent with the natural size of a face.

Participants manually covered and uncovered their right eye before and after monocular condition blocks. To do so, they used their right index finger and thumb to turn a rod affixed to a hinged eye cover attached to the glasses; this action was designed to require only a small, distal body movement to minimize movement artifacts (Figure 1e).

Stimuli

Frontal photographs, including the face, head, and shoulders, were taken of 23 young adults with diverse demographics (12 female, 11 male). Many studies of human face processing remove other features, particularly hair and clothing, to isolate the faces (but see Sinha & Poggio, 1996); however, because the main goal of this study was to contrast 3D and 2D versions of natural images of people, we did not remove these features. Photographic subjects wore a solid, dark gray t-shirt, and head images were captured in front of an untextured background of the same color as the shirt. The absence of a textured background minimized relative binocular disparities between the face and the background, leaving only disparities due to features within the head.

In the 3D condition, the nasion was aligned to the screen plane (i.e., zero disparity at the nasion); therefore, most of the face appeared behind the screen plane (i.e., uncrossed disparities) except, for example, the nose and forehead, which appeared in front of the screen plane (i.e., crossed disparities). Crossed and uncrossed disparities were reversed in the P-3D condition. The resulting high-quality, naturalistic stereopairs produced compelling, realistic percepts of natural and P-3D faces (see Supplementary Figure S1 for enlarged and uncluttered left and right eye views of each condition for easier cross fusion).

On average, the nearest (nose) and farthest (ears) points on the face had a disparity difference of 0.35° (range, 0.26°–0.43°), corresponding to 10.74 cm of depth (range, 7.76–13.52 cm). In the P-3D condition, the sign of the disparity was reversed. To closely match natural viewing geometry, stereopairs of face models were captured using two mirrorless digital cameras (α7RIII; Sony, Tokyo, Japan) with full-frame sensors (35.9 mm × 24 mm). The cameras were positioned facing first-surface mirrors angled toward the models to achieve a 60-mm separation between the centers of the lenses, which is near the average interpupillary distance (Dodgson, 2004). The dual-camera rig faced the models from a distance of 1 meter to match the 1-meter viewing distance used in the scanner. The lines of sight of the cameras were parallel to one another rather than “toed-in” (converged), as recommended to reduce keystone distortions (Allison, 2007). The cameras had a resolution of 42.4 megapixels and were fitted with 55-mm lenses, the available focal length closest to the widely stated 43-mm focal length of the human eye. The relative aperture (f/5.5) and set design ensured that the depth of field was consistent with the degree of focus of the human eye.
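To make the viewing geometry concrete, the following minimal sketch converts between relative angular disparity and metric depth using the standard small-angle approximation, with the 60-mm camera separation and 1-m distance described above. This is the textbook approximation, not the authors' code, and it lands close to, but not exactly on, the reported values; exact numbers depend on the full viewing geometry.

```python
import numpy as np

# Small-angle stereo geometry: relative disparity (rad) ~= IPD * depth / D**2,
# where IPD is the interocular separation and D is the viewing distance.
IPD = 0.060   # interocular / camera separation (m), as in the rig described
D = 1.0       # viewing distance (m), as in the scanner setup

def depth_from_disparity(disparity_deg):
    """Metric depth interval (m) for a given relative disparity (deg)."""
    return np.deg2rad(disparity_deg) * D**2 / IPD

def disparity_from_depth(depth_m):
    """Relative disparity (deg) for a given depth interval (m)."""
    return np.rad2deg(IPD * depth_m / D**2)

print(depth_from_disparity(0.35))    # ~0.102 m; the text reports 10.74 cm
print(disparity_from_depth(0.1074))  # ~0.37 deg; the text reports 0.35 deg
```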

The low-level properties of the images (i.e., color, luminance) were carefully matched to standard, real-world viewing conditions to evoke a compelling percept of realism from the faces. Presenting stereopairs using a polarizing projector avoided the unnatural color distortions common to anaglyphs that use color filters to separate the images presented to each eye (e.g., red–cyan or red–green). A color card was included in one of the raw images per face model to enable color correction in Lightroom (Adobe, San Jose, CA). A neutral background with 50% gray was chosen to minimize ghosting (the presence of features from one eye's image in the other eye's image due to limitations of polarizing filters). To compensate for the differences in luminance display linearity during stimulus generation and projection, which alter the color profile of the images (with the PROPixx projector utilizing a linear gamma function), we applied a custom nonlinear gamma-correction color lookup table that was the inverse gamma of the stimulus generation monitor. The lighting level (5700 K) was chosen to mimic standard indoor lighting conditions, and light sources were positioned to the side of the face models, creating shadows that increased the salience of the pictorial depth cues. Luminance levels were kept constant and chosen to minimize ghosting. To avoid luminance confounds between conditions, both 2D and 3D stimuli were viewed through the polarizer and polarized lenses, with images to the two eyes segregated by interleaving presentation using the blue line technique in the Psychophysics Toolbox extension (Kleiner, Brainard, & Pelli, 2007) in MATLAB 2019b (MathWorks, Natick, MA). In the 2D and Mono conditions, the same image was presented to each channel.
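As an illustration of the lookup-table step, here is a minimal sketch. The gamma value is a placeholder (the calibrated value of the stimulus-generation monitor was not reported), and the direction of the correction assumes that 8-bit values authored on a gamma-encoded monitor must be pre-distorted before display on the linear PROPixx.

```python
import numpy as np

# Build a 256-entry lookup table that applies the stimulus-generation
# monitor's gamma to 8-bit values so that a linear projector reproduces
# the intended luminance profile. gamma = 2.2 is a placeholder, not the
# calibrated value from the study.
monitor_gamma = 2.2

levels = np.arange(256) / 255.0
lut = np.round(255 * levels ** monitor_gamma).astype(np.uint8)

# Applying the LUT to an 8-bit image is then a simple indexing operation:
# corrected = lut[image]
```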

Procedure

Face stimuli were presented in a block design, with one run consisting of two 16-second fixation baseline epochs (at the beginning and end of the run) and eight 24-second condition epochs, each followed by a fixation baseline epoch that lasted for 24 seconds to allow the blood-oxygen-level-dependent (BOLD) signal to return to baseline between visual stimulation periods. An additional 5-second period preceded and followed the Mono condition blocks, during which participants received a visual cue to occlude or disocclude the right eye. To ensure comfortable stereoscopic viewing and to minimize vergence accommodation conflicts (Hoffman, Girshick, Akeley, & Banks, 2008), the faces were positioned so that the overlaid fixation cross was at the plane of the display. One run was 412 seconds: (8 stimulus condition blocks × 24 seconds) + (4 occlusion/disocclusion periods × 5 seconds) + (7 baseline condition blocks × 24 seconds) + (2 beginning/end baseline blocks × 16 seconds). Participants performed eight experimental runs.
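To make the run structure concrete, a small sketch that lays out one example run as a list of epochs and verifies the stated 412-second total. The condition order shown is illustrative only; in the experiment, order was counterbalanced across and within runs.

```python
# One example run as (label, duration_s) epochs: 8 condition blocks x 24 s,
# 4 occlusion-cue periods x 5 s, 7 interleaved baselines x 24 s, and
# 2 start/end baselines x 16 s.
conditions = ["2D", "Mono", "3D", "P-3D", "Mono", "2D", "P-3D", "3D"]

epochs = [("fixation", 16)]
for i, cond in enumerate(conditions):
    if cond == "Mono":
        epochs.append(("occlusion cue", 5))      # cover the right eye
    epochs.append((cond, 24))
    if cond == "Mono":
        epochs.append(("disocclusion cue", 5))   # uncover the right eye
    if i < len(conditions) - 1:
        epochs.append(("fixation", 24))          # let BOLD return to baseline
epochs.append(("fixation", 16))

assert sum(d for _, d in epochs) == 412
```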

The condition epochs each contained 24 trials; in each trial within a block, a different face was presented for 800 ms followed by a 200-ms fixation period (Figure 1f). Baseline periods between epochs enabled estimation of the BOLD response without trial history confounds and provided the necessary time for manual occlusion or disocclusion of the right eye before and after Mono viewing conditions (Figure 1e). Participants performed a standard one-back task in which they were instructed to press a button when two identical trials were presented sequentially, to keep task and attentional demands consistent throughout the run. The four conditions (2D, 3D, P-3D, Mono) occurred twice per run, and condition order was counterbalanced across and within runs to control for order effects.

Functional localizers

Face-selective regions

To independently localize regions that preferentially respond to faces, two additional runs were performed in which participants passively viewed blocks of 2D images from five categories (Figure 2): (1) faces (https://pics.stir.ac.uk/2D_face_sets.htm); (2) objects; (3) places; (4) reachspaces as defined by Josephs and Konkle (2019); and (5) phase-scrambled stimuli. Objects and places were included because the contrast of Faces > (Objects and Places) has often been used to define face-selective regions (Kanwisher et al., 1997; Large, Cavina-Pratesi, Vilis, & Culham, 2008; Rosenke, Van Hoof, Van Den Hurk, Grill-Spector, & Goebel, 2021; Yovel & Kanwisher, 2005). Reachspace scenes were included for use in a related experiment on 3D versus 2D reachspaces but are not relevant here. Scrambled images were generated by phase scrambling the places stimuli. Faces and objects were superimposed on the phase-scrambled images to ensure that all stimuli had the same visual extent. Participants were instructed to maintain fixation on a central point for the duration of the run. Each run contained five sets of five 16-second condition blocks. Each set of five blocks was interleaved with 16-second baseline fixation blocks, also included at the beginning and end of the run. Each stimulus block comprised 16 different stimuli, each presented for 800 ms with a 200-ms interstimulus interval. One run was 496 seconds: (25 stimulus conditions + 6 baseline conditions) × 16 seconds. Every participant completed one or two face localizer runs, depending on the amount of time available in the session after the main experimental runs were collected.

Figure 2.

Example stimuli from the category localizer. Participants passively viewed five categories of stimuli presented in a block design. The category localizer was used to define face-selective regions independent from the experimental data for subsequent ROI analyses. Localizer face stimuli could not be used for publication, so representative face stimuli are shown here. Photographs of faces were taken by the authors, and written consent was provided by the individuals for their use in academic publications.

Depth-selective regions

To identify disparity-preferring brain regions, we performed an additional run to contrast activation for random dot patterns with and without binocular disparity. The stimuli were modified from past fMRI studies by Tsao et al. (2003) and Verhoef, Bohon, and Conway (2015). We employed a two-by-two design that varied the presence of disparity (3D vs. 2D) and the amount of separation between the squares (creating the appearance of a grid vs. no grid) (Figure 3). The addition of the grid condition enabled definition of areas with a preference for disparity rather than a simple preference for boundaries or edges. Stimuli were presented in a block design. Each run contained four sets of the condition epochs, with each of the four condition blocks lasting 16 seconds. These sets were interleaved with 16-second baseline fixation blocks, also included at the beginning and end of the run. Participants were instructed to fixate on a small white cross placed at the center of the display and presented at the plane of the screen. Each trial lasted 1000 ms, during which the stereograms were presented for 800 ms followed by a 200-ms intertrial interval displaying a neutral full-field gray stimulus that was matched for constant luminance. One run was 336 seconds: (16 stimulus conditions + 5 baseline conditions) × 16 seconds. Every participant completed one or two depth localizer runs, depending on the amount of time available.

Figure 3.

Overview of stimuli used to localize disparity-selective regions of cortex. Schematic of the random dot stereograms presented that create disparity-defined checkerboard stimuli (3D) and control conditions without disparity (2D). Here the black outlines are used to simulate squares with one of five possible disparity levels; however, these outlines were not present in the display. Because differences between 3D and 2D stimuli in the no-grid condition could be explained by the perception of edges, not disparity per se, we also included 3D and 2D conditions with gaps between the elements. Disparity-selective regions would be expected to respond more to 3D than 2D stimuli for both the no-grid and grid conditions (i.e., regardless of the absence/presence of other boundaries).

Stimuli were stationary random-dot stereograms with a 25% dot density and 0.09° × 0.09° dot size presented on a gray background (Figure 3). The dot distribution updated at a frequency of 25 Hz. The disparity checkerboard contained 8 × 6 squares (each square subtended 3.3° × 4° in the no-grid conditions or 3.2° × 3.9° in the grid conditions). The disparity of the squares was equally distributed across five levels (−0.22°, −0.11°, 0°, 0.11°, and 0.22°), with no adjacent squares having the same disparity. In the 2D conditions, all squares had 0° disparity, and the grid conditions were created by the addition of a 10-pixel-wide offset between adjacent squares. Stereograms were black dots on a gray background to minimize ghosting that can occur with polarized projection.
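A generic random-dot-stereogram construction along these lines is sketched below. The image size and degrees-to-pixels scale are placeholders, the constraint that adjacent squares differ in disparity is omitted, and dots simply wrap within each square rather than being redrawn at occluded edges; this illustrates the technique, not the authors' stimulus code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Disparity-defined checkerboard stereopair: draw one random-dot image, then
# horizontally shift each checkerboard square by +/- half its disparity in the
# two eyes' images. Only the disparity, never the dot pattern itself, defines
# the squares.
H, W = 480, 640                    # image size in pixels (placeholder)
PX_PER_DEG = 40                    # display scale (placeholder)
rows, cols = 6, 8                  # 8 x 6 squares, as in the localizer
levels_deg = [-0.22, -0.11, 0.0, 0.11, 0.22]

dots = (rng.random((H, W)) < 0.25).astype(float)   # 25% dot density

left, right = dots.copy(), dots.copy()
sq_h, sq_w = H // rows, W // cols
for r in range(rows):
    for c in range(cols):
        disp_px = int(round(rng.choice(levels_deg) * PX_PER_DEG))
        ys = slice(r * sq_h, (r + 1) * sq_h)
        xs = slice(c * sq_w, (c + 1) * sq_w)
        left[ys, xs] = np.roll(dots[ys, xs], disp_px // 2, axis=1)
        right[ys, xs] = np.roll(dots[ys, xs], -(disp_px - disp_px // 2), axis=1)
```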

MRI data acquisition

Data were acquired on a 3T MAGNETOM Prisma-Fit MRI scanner (Siemens Healthineers, Erlangen, Germany) at the Centre for Functional and Metabolic Mapping (CFMM) at the University of Western Ontario. To provide full-field binocular viewing without occlusion from the elements surrounding the eyes in a standard Siemens head coil, a custom open-face, 28-channel head coil was used. This coil was modified from a Siemens 32-channel head coil by CFMM technicians. We acquired whole-brain fMRI data, including the eyes and cerebellum, from eight task runs and three localizer runs acquired using a T2*-weighted multi-echo multi-band echoplanar imaging sequence: repetition time (TR) = 1000 ms; slice thickness = 3 mm; in-plane resolution = 3 × 3 mm; number of slices = 60; time to echo (TE) = 9.82 ms, 24.11 ms, and 38.4 ms; flip angle (FA) = 40°; field of view (FOV) = 210 × 210 mm. We used a multi-echo sequence to increase the signal-to-noise ratio in the data and thus improve statistical power by optimally combining all three echo timeseries (Kundu et al., 2017). To optimize the magnetic field homogeneity over the functional planes, a constrained, 3D, phase-shimming procedure was performed (Klassen & Menon, 2004). A high-resolution, T1-weighted reference image was acquired for each subject using an MPRAGE acquisition sequence (TR = 2300 ms, in-plane resolution = 1 mm, TE = 2.98 ms, FA = 9°, FOV = 256 × 256 mm).
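The "optimal combination" of echoes typically refers to a T2*-weighted average of the echo timeseries; a minimal sketch under that assumption, with a fixed nominal T2* rather than the voxelwise estimate a real pipeline would compute from the echoes themselves:

```python
import numpy as np

# T2*-weighted combination of multi-echo fMRI data: each echo is weighted by
# TE * exp(-TE / T2*), which emphasizes the echoes with the best BOLD
# contrast-to-noise. A fixed nominal T2* is used here for simplicity.
TES = np.array([9.82, 24.11, 38.4])   # echo times (ms), as acquired

def optimally_combine(echo_data, t2star=30.0):
    """echo_data: array (n_echoes, n_voxels, n_timepoints)."""
    weights = TES * np.exp(-TES / t2star)
    weights /= weights.sum()
    return np.tensordot(weights, echo_data, axes=1)

# Example: three echoes, 1000 voxels, 412 timepoints of simulated data
data = np.random.randn(3, 1000, 412)
combined = optimally_combine(data)    # -> shape (1000, 412)
```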

MRI data preprocessing

Minimal initial data preprocessing was performed using fMRIPrep 21.0.2 (Esteban et al., 2019; Esteban et al., 2022) (RRID:SCR 016216). For the functional data, a reference volume and its skull-stripped version were generated from the shortest echo of the BOLD run using a custom fMRIPrep step. Head-motion parameters with respect to the BOLD reference (transformation matrices and six corresponding rotational and translational parameters) were estimated using mcflirt (Jenkinson, Bannister, Brady, & Smith, 2002) (FSL 6.0.5.1:57b01774) prior to spatiotemporal filtering. BOLD runs were slice-time corrected to 0.456 second (0.5 of slice acquisition range 0–0.912 second) using 3dTshift from the Analysis of Functional NeuroImages software suite (AFNI; National Institutes of Health, Bethesda, MD) (Cox & Hyde, 1997) (RRID:SCR_005927). The BOLD timeseries (including slice-timing correction) were resampled onto their original, native space by applying the transforms to correct for head motion. The head motion transforms were used as a threshold for exclusion. A participant was excluded if their head motion exceeded a volume-to-volume displacement of 3 mm of translation or 3° of rotation in the majority (4+) of the runs. This threshold resulted in one participant being excluded from the final analysis.
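A minimal sketch of the stated exclusion rule, assuming mcflirt's six-column parameter layout (three rotations in radians, then three translations in mm):

```python
import numpy as np

def run_exceeds_motion(params, trans_mm=3.0, rot_deg=3.0):
    """Flag a run if any volume-to-volume change exceeds 3 mm of translation
    or 3 degrees of rotation. params: array (n_volumes, 6) for one run."""
    deltas = np.abs(np.diff(params, axis=0))
    rot = np.rad2deg(deltas[:, :3]).max()   # largest rotation step (deg)
    trans = deltas[:, 3:].max()             # largest translation step (mm)
    return trans > trans_mm or rot > rot_deg

def exclude_participant(runs):
    """Exclude if most (4 or more of 8) runs exceed the motion threshold.
    runs: list of (n_volumes, 6) motion-parameter arrays, one per run."""
    return sum(run_exceeds_motion(p) for p in runs) >= 4
```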

Further TE-dependent denoising was performed using a multi-echo independent component analysis (ICA) denoising pipeline in Tedana 0.0.12 (Ahmed et al., 2022; DuPre et al., 2021; Kundu et al., 2013; Kundu, Inati, Evans, Luh, & Bandettini, 2012). In the Tedana workflow, the minimally processed echo timeseries outputs from fMRIPrep were optimally combined and denoised using a principal component analysis (PCA) followed by ICA. The Tedana PCA decomposed the data into components that were subsequently classified as TE-dependent or TE-independent using the default Akaike information criterion, the least aggressive classification approach, with a seed of 42. Finally, ICA was implemented to classify BOLD and non-BOLD components, after which BOLD components were retained, and non-BOLD components were removed from the final denoised timeseries.

Final data processing and subsequent statistical analyses were performed in the BrainVoyager QX 22.2 software package (Brain Innovation, Maastricht, The Netherlands). Each functional run was aligned to the participant's intensity non-uniformity-corrected, T1-weighted native space image from fMRIPrep and then transformed into Montreal Neurological Institute (MNI) stereotaxic space (Fonov, Evans, McKinstry, Almli, & Collins, 2009) with resampling to a functional resolution of (3 mm)³. Finally, a temporal high-pass filter was applied to the data, removing frequencies below 3 cycles/run. For multivariate analyses, unsmoothed data were analyzed; data in the univariate analyses were spatially smoothed with a 5-mm Gaussian full-width-at-half-maximum kernel. The general linear model (GLM) was applied to data that had been transformed from raw signal units to percent signal change. Predictors were generated by convolving box-car predictors for each of the conditions (excluding the fixation baseline) with the default double-gamma hemodynamic response function (HRF) in BrainVoyager. We did not include motion parameters as predictors of no interest because Tedana would have already accounted for such confounds.
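For illustration, the predictor construction and GLM fit can be sketched as follows; the double-gamma parameterization here is a common one and stands in for BrainVoyager's default, which may differ in detail:

```python
import numpy as np
from scipy.stats import gamma

TR = 1.0  # s, as acquired

def double_gamma_hrf(t, peak=6.0, under=16.0, ratio=1 / 6):
    """A common double-gamma HRF; BrainVoyager's default may differ slightly."""
    h = gamma.pdf(t, peak) - ratio * gamma.pdf(t, under)
    return h / h.max()

def boxcar_predictor(n_vols, onsets_s, duration_s):
    """Convolve a 0/1 block timecourse with the HRF, sampled at the TR."""
    box = np.zeros(n_vols)
    for onset in onsets_s:
        box[int(onset / TR):int((onset + duration_s) / TR)] = 1.0
    hrf = double_gamma_hrf(np.arange(0, 32, TR))
    return np.convolve(box, hrf)[:n_vols]

def fit_glm(Y, X):
    """Ordinary least-squares betas. Y: (n_vols, n_voxels), X: (n_vols, n_preds)."""
    return np.linalg.lstsq(X, Y, rcond=None)[0]
```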

Data analysis

Data from the main experiment were analyzed using a random-effects (RFX) GLM with predictors for each of the four experimental conditions. The 5-second periods preceding and following the Mono condition, during which participants were cued to manually occlude/disocclude the right eye, were modeled as a predictor of no interest (a box-car predictor convolved with the double-gamma HRF) to account for any brain activation related to the action. Data from the localizers was modeled with GLMs based on convolved box-car predictors for the main conditions.

We primarily used a region of interest (ROI) approach to optimize the statistical power of the contrasts in the experimental data by independently defining restricted voxels for analysis and reducing the need for conservative corrections that are necessary when performing whole-brain contrasts containing hundreds of thousands of voxels. An MNI brain mask template was applied to the statistical maps for both localizers, excluding voxels outside the template region. As a supplement, we also performed whole-brain voxelwise/searchlight analyses for univariate/multivariate analyses to ensure that the ROI analyses had not excluded other regions that showed robust differences in activation levels/patterns.

Definition of face-preferring ROIs

The face localizer was robust enough to identify functionally distinct ROIs in a majority of individual participants, using a fixed-effects GLM and a contrast of Faces > (Objects + Places). This contrast has been used to reliably localize face-selective areas including FFA and OFA (Kanwisher et al., 1997; Large et al., 2008; Rosenke et al., 2021; Yovel & Kanwisher, 2005). In this dataset, we defined individual ROIs including bilateral OFA, FFA, and pSTS using statistical maps of all positively activated voxels at a minimum statistical threshold of t ≥ 2.5. This relatively liberal threshold, uncorrected for multiple comparisons, is justified by the fact that it was used only to independently define ROIs that were later evaluated with strict, corrected statistical tests. An overly conservative threshold for individual data would have led to failures to detect ROIs in more participants.

First, a plausible location of an ROI was identified based on MNI-based atlases (Rosenke et al., 2021 for OFA and FFA; Neurosynth map “face” for pSTS) overlaid on an individual participant's functional activation map. Within or near the expected region outlined by the atlas or map, the voxel with the highest t-value was defined as the peak voxel for the region. The resulting ROI was then defined as the 33 functional voxels of (3 mm)³ with the highest t-values that were contiguous with the peak voxel. Face-selective regions showed largely distinct activation foci for each ROI (OFA, FFA, and pSTS), but, in cases where adjacent regions showed overlapping activation clusters, the clusters were divided by following the spatial gradient of t-values and defining the boundary at the lowest t-value. We could not reliably identify all six ROIs for all subjects, nor could an ROI with 33 functional voxels be reliably identified in all subjects at t ≥ 2.5. Therefore, subsequent univariate and multivariate ROI-based analyses were only performed in regions that could be identified in at least two-thirds of participants (i.e., n ≥ 15), and for each participant an activation focus of at least 16 functional voxels had to be present for each region included (see Table 1 and Figure 4). These criteria resulted in the exclusion of left pSTS from ROI-based analyses. Additionally, for each participant, we used the face ROIs defined by Rosenke et al. (2021) to verify that none of the ROIs overlapped with the susceptibility artifact in the temporal lobe.
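The peak-anchored region-growing rule can be sketched as below: a greedy selection of the highest-t voxels contiguous with the peak. Face (6-)connectivity is assumed here, as the exact adjacency criterion was not specified.

```python
import numpy as np

def grow_roi(tmap, peak, n_voxels=33, t_min=2.5):
    """Greedily grow an ROI from the peak voxel, always adding the highest-t
    voxel that is face-adjacent to the current region. tmap: 3D array of
    t-values; peak: (x, y, z) tuple of the peak voxel."""
    offsets = [(1, 0, 0), (-1, 0, 0), (0, 1, 0),
               (0, -1, 0), (0, 0, 1), (0, 0, -1)]

    def neighbors(v):
        for dx, dy, dz in offsets:
            n = (v[0] + dx, v[1] + dy, v[2] + dz)
            if all(0 <= n[i] < tmap.shape[i] for i in range(3)):
                yield n

    roi = {peak}
    frontier = set(neighbors(peak))
    while len(roi) < n_voxels and frontier:
        best = max(frontier, key=lambda v: tmap[v])
        if tmap[best] < t_min:
            break                          # region cannot reach full size
        roi.add(best)
        frontier.remove(best)
        frontier.update(n for n in neighbors(best) if n not in roi)
    return roi

# Usage: roi = grow_roi(tmap, peak=(40, 30, 22))
```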

Table 1.

Localizer-defined face-selective regions based on individual ROIs, where faces > (objects + places).

Coordinates (x, y, z) are average peak MNI coordinates.

Brain area | Number of subjects | x ± SEM | y ± SEM | z ± SEM | Average extent (mm³)
Right OFA | 22 | 40 ± 1 | −78 ± 1 | −9 ± 1 | 879
Left OFA | 17 | −38 ± 1 | −81 ± 1 | −13 ± 1 | 864
Right FFA | 23 | 40 ± 1 | −53 ± 1 | −19 ± 1 | 875
Left FFA | 20 | −41 ± 1 | −53 ± 2 | −20 ± 1 | 872
Right pSTS | 18 | 52 ± 1 | −46 ± 2 | 10 ± 1 | 891
Left pSTS | 12 | −41 ± 8 | −54 ± 2 | −12 ± 1 | 815
Figure 4.

Percent overlap of individual subject ROIs in the five face-selective regions included in the final analysis. ROIs were restricted to the 33 functional voxels contiguous with the peak voxel in each region; therefore, the resulting overlap shown here does not reflect the extent of overlap present in the activation maps. Face photograph was taken by the authors with written consent from the individual for use in academic publications.

Definition of depth-preferring ROIs

We defined depth-selective activation by contrasting both 3D conditions against the 2D conditions using a conjunction of (3D > 2D) AND (3D Grid > 2D Grid). The statistical map of all positively active voxels at the group level (t > 3.5; p < 0.002; effective p for a conjunction of independent criteria, p < 0.002²) revealed a large cluster of activation in each hemisphere that overlapped with retinotopic visual areas V3A, V3B, IPS0, and IPS1 as defined by Wang, Mruczek, Arcaro, and Kastner (2015). In addition, activation was observed in the human middle temporal complex, hMT+, as defined by Rosenke et al. (2021). The statistical t map for the conjunction contrast is available at https://identifiers.org/neurovault.image:903805.

To test activation levels in specific dorsal-stream visual areas (V3A, V3B, IPS0, and IPS1) that could not be individually resolved in the localizer data, we performed a conjunction contrast between the group statistical map generated from the localizer data and each unilateral region of interest defined by Wang et al. (2015). For example, rV3A was defined by the overlap between the conjunction contrast: (3D > 2D) AND (3D Grid > 2D Grid) AND (rV3A in the Wang atlas). For the complete list of atlas ROIs tested in separate conjunction contrasts, see Table 2.
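In practice, such a conjunction ROI reduces to a voxelwise intersection of thresholded maps and an atlas mask; a minimal sketch with placeholder array names:

```python
import numpy as np

# A voxel counts as depth-selective within, e.g., right V3A only if it passes
# threshold in BOTH localizer contrasts AND falls inside the atlas region.
# All inputs are same-shape 3D volumes; names are placeholders.
T_CRIT = 3.5   # per-contrast threshold (p < 0.002), as stated in the text

def conjunction_roi(t_3d_vs_2d, t_3dgrid_vs_2dgrid, atlas_mask):
    return (t_3d_vs_2d > T_CRIT) & (t_3dgrid_vs_2dgrid > T_CRIT) & atlas_mask
```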

Table 2.

Localizer-defined depth-selective regions based on a conjunction contrast between group maps and atlas ROIs.

Coordinates (x, y, z) are peak MNI coordinates.

Brain area | Atlas ROI | x | y | z | Extent (mm³)
[(3D > 2D) AND (3D Grid > 2D Grid)] AND Wang et al. (2015):
 Right V3A | rh_v3a | 21 | −94 | 25 | 2288
 Left V3A | lh_v3a | −21 | −85 | 19 | 1892
 Right V3B | rh_v3b | 30 | −88 | 22 | 1548
 Left V3B | lh_v3b | −30 | −88 | 13 | 1160
 Right IPS0 | rh_IPS0 | 28 | −80 | 20 | 2088
 Left IPS0 | lh_IPS0 | −27 | −79 | 25 | 1998
 Right IPS1 | rh_IPS1 | 24 | −67 | 52 | 840
 Left IPS1 | lh_IPS1 | −24 | −79 | 52 | 1108
[(3D > 2D) AND (3D Grid > 2D Grid)] AND Rosenke et al. (2021):
 Right hMT+ | rh_hMT_motion | 48 | −67 | 1 | 844
 Left hMT+ | lh_hMT_motion | −47 | −76 | 10 | 530

Univariate ROI and whole-brain analyses

Data from the main experiment were analyzed by extracting mean beta weights within each of the 15 ROIs (Figure 5) and performing paired-sample, two-tailed t-tests between conditions. Given that our fundamental hypothesis was that there would be differences between 3D and 2D conditions, we began by testing this contrast in all face and depth ROIs. A false discovery rate (FDR) correction was performed on the 15 paired-sample t-tests (7 regions of interest × 2 hemispheres + right pSTS) for 3D > 2D, with a minimum significance level of q < 0.05.
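For reference, a minimal sketch of the Benjamini–Hochberg step-up procedure (the standard FDR variant, assumed here since the specific variant was not named) applied to a set of p-values, returning significance flags and per-rank critical values like those reported in Table 3:

```python
import numpy as np

def fdr_bh(pvals, q=0.05):
    """Benjamini-Hochberg step-up FDR procedure (assumed variant).
    Returns a boolean significance array (original order) and each test's
    critical value q * rank / m. Note that the step-up rule marks every test
    up to the largest passing rank as significant, which can differ from
    comparing each p-value only to its own critical value."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)                       # rank tests by p-value
    crit = q * np.arange(1, m + 1) / m          # critical value per rank
    below = p[order] <= crit
    significant = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()          # largest rank that passes
        significant[order[:k + 1]] = True
    return significant, crit[np.argsort(order)] # map back to input order

# Usage: sig, crit = fdr_bh(list_of_15_pvalues, q=0.05)
```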

Figure 5.

Face-selective and depth-selective regions of interest defined using independent functional localizers. Face-selective areas shown were based on a superimposition of all individual subject ROIs. Individual adjacent depth-selective areas were resolved using a conjunction contrast between group localizer t-stat maps and atlases for visual and functional regions (Rosenke et al., 2021; Wang et al., 2015). Note that ROI definitions were performed in volumetric space, and the presentation on cortical surfaces here is solely for illustration.

In ROIs for which the main contrast, 3D versus 2D, reached statistical significance at q < 0.05, we performed three subsequent tests to explore the pattern of responses further (Figure 6); if it did not reach significance, these additional contrasts were not performed. This strategy served to reduce the number of contrasts performed and thus the severity of the correction for multiple comparisons. Nevertheless, because the correction for multiple comparisons did not include the follow-up contrasts, they should be viewed as exploratory.

Figure 6.

Hypothesized univariate results including a priori statistical tests. (a) The main contrast of interest, 3D > 2D (red), was tested in all regions of interest. If the 3D > 2D contrast was significant, three additional statistical tests were performed to examine alternative explanations for a 3D > 2D difference (black). (b) Hypothetical pattern of results if 3D faces drive a larger response than 2D faces that cannot be explained by the resolution of cue conflict or a preference for disparity generally, which would suggest sensitivity to naturalistic congruent disparity. (c) If 3D evokes greater activation simply due to the presence of non-zero binocular disparity cues, then there would be no statistically significant difference in activation between 3D and P-3D.

The first two follow-up contrasts, between the 3D and P-3D conditions and between the P-3D and 2D conditions, indicated whether the difference between 3D and 2D faces was driven only by faces with congruent, natural disparities (yielding 3D > P-3D; Figure 6b) or by disparity regardless of direction (yielding 3D ≈ P-3D and P-3D > 2D; Figure 6c). Alternatively, we hypothesized that the absence of congruent disparity in the 2D condition might elicit greater activation in the ventral visual stream due to increased processing demands required to extract pictorial depth cues from 2D images or that these areas may rely more heavily on monocular cues. Therefore, the third follow-up contrast, between the 2D and Mono conditions, was performed to indicate whether differences were related to cue conflicts (yielding 2D > Mono) or not (yielding 2D ≈ Mono).

Although the main approach was an ROI analysis, to determine whether any brain regions other than the ROIs we selected showed differences between conditions, we also performed a whole-brain voxelwise analysis. Specifically, we performed an RFX analysis of the main experimental contrast, 3D – 2D, using a cluster correction based on a Monte Carlo simulation (p < 0.001, cluster size 9 voxels of [3 mm]³ in the BrainVoyager cluster correction plug-in ClusterThresh).

Multivariate ROI and whole-brain analyses

We performed representational similarity analysis (RSA), as described by Kriegeskorte, Mur, and Bandettini (2008), to examine the pattern of activation across voxels both within each ROI and across the whole brain. For each participant, a univariate GLM was performed to model the activity of every voxel in the brain for each condition within each run. For each participant and each run, every voxel was normalized to its own mean (by subtracting the mean activation across the four conditions from the beta value of each condition). The methods described below were performed both within ROIs and in an exploratory whole-brain searchlight analysis, which used a sphere with a diameter of seven functional voxels, that is, 123 functional voxels of (3 mm)³ per sphere, or 3321 mm³.

Next, we computed representational similarity matrices (RSMs) for each ROI and each sphere. Pearson correlations were calculated within and between conditions. For each participant, this process was repeated for all possible equal divisions of runs. The final RSA data matrix for each participant was the average correlation matrix across all run splits.
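A minimal sketch of this computation, assuming beta patterns arranged per run and condition, and implementing the "equal divisions of runs" as half/half splits:

```python
import numpy as np
from itertools import combinations

def rsm_across_splits(betas):
    """betas: array (n_runs, n_conditions, n_voxels), mean-centered per run
    across conditions. Returns the (n_conditions x n_conditions) RSM averaged
    over all half/half splits of runs, correlating each condition's mean
    pattern in one half with every condition's mean pattern in the other.
    Diagonal cells are within-condition correlations; off-diagonal cells are
    between-condition correlations. (Each split and its complement both occur,
    which leaves the average unchanged.)"""
    n_runs, n_cond, _ = betas.shape
    rsms = []
    for half in combinations(range(n_runs), n_runs // 2):
        other = [r for r in range(n_runs) if r not in half]
        a = betas[list(half)].mean(axis=0)          # (n_cond, n_voxels)
        b = betas[other].mean(axis=0)
        rsm = np.corrcoef(a, b)[:n_cond, n_cond:]   # cross-half correlations
        rsms.append(rsm)
    return np.mean(rsms, axis=0)
```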

To probe whether conditions were represented differently, we performed paired-samples t-tests on a subset of the correlation values in the RSMs. Following the same logic as the univariate analyses, we initially tested whether 2D and 3D are represented more like themselves than like one another (i.e., testing the correlations both within 2D and within 3D vs. the correlation between 2D and 3D) (Figure 7c). As in the univariate analysis, we performed FDR correction on the 15 initial t-tests. If this test was significant at an uncorrected p < 0.05, to explore what features may be driving the 3D versus 2D difference, we then tested within versus between 3D and P-3D (Figure 7d) and within versus between 2D and Mono (Figure 7e). In the searchlight analysis, statistical t maps were cluster-corrected based on a Monte Carlo simulation (p < 0.001, cluster size 3 functional voxels of [3 mm]³ in the BrainVoyager cluster correction plug-in ClusterThresh).

Figure 7.

Univariate bar graphs and multivariate result matrices for all 15 ROIs tested. (a, b) Two models were tested in the multivariate analyses using both an ROI and a whole-brain searchlight approach. (c–e) Paired-samples t-tests were performed on a subset of the correlation values. For example, (c) represents the test of the average correlation for 2D – 2D and 3D – 3D versus the correlation between 2D and 3D. (f) Ten ROIs (of the 15 tested) showed a statistically significant difference for 3D > 2D at *p < 0.05 in the univariate results. Of these regions, seven maintained statistical significance following FDR correction at q < 0.05 (+). Representational similarity matrices for each region are shown below the univariate plots for that region. Three colors indicate statistically significant differences for three different comparisons: black for comparisons between 3D and 2D, orange for comparisons between 3D and P-3D, and pink for differences between 2D and Mono. The nature of the tests differed slightly between the univariate and multivariate analyses; black solid borders indicate an FDR-corrected statistically significant difference for 3D > 2D in the univariate results but (3D:3D and 2D:2D) > (3D:2D) in the multivariate results (q < 0.05), and black dashed borders indicate uncorrected p < 0.05. Orange outlines indicate a statistically significant difference for 3D > P-3D in the univariate results and (3D:3D and P-3D:P-3D) > (3D:P-3D) in the multivariate results (p < 0.05). Pink borders indicate a statistically significant difference for 2D > Mono in the univariate and (2D:2D and Mono:Mono) > (2D:Mono) in the multivariate results (p < 0.05).

We generated two hypothetical models to qualitatively characterize whether facial depth was represented based on the presence or absence of disparity (Figure 7a) or conflict between binocular and monocular cues (Figure 7b). These models served as visual representations of hypothetical patterns but were not statistically tested against RSMs because the t-tests provided more focused hypothesis testing.

Results

Univariate results

Overall, 3D faces evoked stronger responses in depth-selective occipitoparietal cortex but only a marginal, and largely non-significant, increase across face areas. 3D faces elicited stronger responses than 2D faces in almost all depth-selective areas, with most regions surviving FDR correction for the number of ROIs (Figure 7f). The exceptions were right IPS1 and left hMT+ (significant only at an uncorrected p < 0.05) and left IPS1 (no significant difference).

Of the face-selective regions tested, 3D faces only evoked stronger responses than 2D faces in right OFA, which did not survive FDR correction (passing only p < 0.05 uncorrected). The effect size in regions with 3D > 2D ranged from medium to large. See Table 3 for summary statistics.

Table 3.

Summary statistics for FDR-corrected paired-sample t-tests of 3D – 2D. Notes: Regions marked with an asterisk were significant at an uncorrected p < 0.05 but not after FDR correction. Effect size is only reported for areas where 3D > 2D reached significance (uncorrected p < 0.05).

Brain area | t(23) | p | FDR-corrected critical value | Significant at q < 0.05? | Cohen's d | Effect size
Right V3A | 3.44 | 0.002 | 0.01 | Yes | 0.7 | Medium
Left V3A | 3.00 | 0.007 | 0.01 | Yes | 0.6 | Medium
Right V3B | 2.41 | 0.025 | 0.027 | Yes | 0.5 | Medium
Left V3B | 4.40 | 0.0002 | 0.003 | Yes | 0.9 | Large
Right IPS0 | 2.77 | 0.011 | 0.02 | Yes | 0.6 | Medium
Left IPS0 | 2.91 | 0.008 | 0.02 | Yes | 0.6 | Medium
Right IPS1* | 2.13 | 0.045 | 0.03 | No | 0.4 | Medium
Left IPS1 | 0.22 | 0.83 | 0.05 | No | – | –
Left hMT+* | 2.41 | 0.025 | 0.023 | No | 0.5 | Medium
Right hMT+ | 3.75 | 0.001 | 0.007 | Yes | 0.8 | Large
Right OFA* | 2.19 | 0.04 | 0.03 | No | – | –
Left OFA | 1.61 | 0.123 | 0.04 | No | 0.3 | Small
Right FFA | 1.30 | 0.21 | 0.04 | No | – | –
Left FFA | 0.5 | 0.61 | 0.05 | No | – | –
Right pSTS | 0.85 | 0.408 | 0.04 | No | – | –

In areas with a significant difference between 3D and 2D (after FDR correction), further exploratory paired-samples t-tests revealed that most areas showed greater responses for stimuli containing non-zero disparities (3D and P-3D), regardless of the direction of the disparities and congruency with expectations for faces. Specifically, most depth-selective regions showed greater activation for P-3D faces compared with 2D faces at p < 0.05, including left V3B and V3A, IPS0, and hMT+ bilaterally. Furthermore, no areas showed a statistically significant difference between 3D and P-3D, suggesting that the greater response to 3D faces was driven by the presence of binocular disparities, regardless of whether the direction of the disparities matched the expected polarity of faces. Finally, 2D faces evoked a stronger response than monocular viewing unilaterally in right V3B and right IPS0 (p < 0.05).

Our alternate hypothesis was that 2D faces may elicit greater activation than 3D faces because of monocular and binocular cue conflict. The monocular viewing condition was included as a control condition because it lacked cue conflict. However, in the absence of any areas where 2D > 3D, the greater response for 2D versus Mono is likely reflective of the extent of visual input (i.e., binocular vs. monocular visual inputs) as opposed to the presence or absence of cue conflict. Taken together, these results suggest that bilateral occipitoparietal depth areas have the greatest response to stimuli containing non-zero binocular disparities regardless of their consistency with expected 3D face shape or other monocular depth cues. In contrast to the pattern in depth areas, the presence or absence of non-zero disparity cues did not significantly alter responses in face-selective areas.

At a cluster-corrected threshold of p < 0.001, whole-brain analysis of 3D > 2D only revealed activation around left V3B, which is consistent with the large effect size for 3D > 2D in left V3B in the ROI analysis. Although we analyzed carefully thresholded maps to be consistent with the conventions of the field in correcting for multiple comparisons, others have recently advocated for providing maps that use transparency to convey the more continuous (rather than artificially binarized) nature of fMRI activation (Sundermann, Pfleiderer, McLeod, & Mathys, 2024; Taylor et al., 2023; Taylor et al., 2025). As such, we have provided the unthresholded statistical t map at https://identifiers.org/neurovault.image:903808. As suggested by the advocates of unthresholded maps, this map shows greater consistency with past fMRI studies (Backus et al., 2001; Brouwer et al., 2005; Finlayson et al., 2017; Georgieva et al., 2009; Goncalves et al., 2015; Minini et al., 2010; Preston et al., 2008; Tsao et al., 2003) that found extensive bilateral activation across a broader swath of regions around V3B, extending anteriorly along the intraparietal sulcus, and around MT+ and surrounding lateral occipitotemporal cortex.

One potential reason why we may have found limited responses to the addition of 3D (vs. 2D) information in depth-selective regions, and only marginal effects in one face-selective region (right OFA), could be individual differences in stereoscopic vision. For example, Georgieva and colleagues (2009) had participants perform a psychophysical 3D-shape adjustment task and used it to select only the participants with the keenest stereoscopic vision (22 of 38). Even within this select group, they found correlations between psychophysical performance and fMRI activation in V3A. Although our screening was not so exclusive and we did not collect psychophysical performance data, we reasoned that, if there were individual differences in the caliber of stereoscopic vision, these should be reflected in the size of the 3D – 2D effect across brain regions. Indeed, the 3D – 2D differences were strongly correlated between regions: participants who showed a large effect of 3D in one region (e.g., left V3A) tended to show large effects in other regions as well (r = 0.76). As an exploratory analysis of the possible role of individual variability in stereo sensitivity, we computed each participant's average 3D – 2D beta weight difference across the 10 depth-selective regions and across the five face-selective regions and performed a Pearson correlation between the two. This enabled us to test whether stronger individual responses to disparity in the dorsal visual pathway (i.e., higher activation in depth-selective regions) were correlated with stronger responses to 3D faces in face-selective regions. This analysis showed a significant correlation between dorsal-stream disparity-selective regions and ventral-stream face-selective regions (r = 0.52, p < 0.05) (Figure 8). Notably, most participants showed a positive effect (3D > 2D) across depth-selective regions, and the participants with the largest effects in depth-selective regions also showed positive effects in face-selective regions. Though only exploratory, this result suggests that future studies should examine individual differences more strategically.
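
As a concrete illustration, a minimal Python sketch of this correlation analysis follows; the data array is a random placeholder, and only the averaging-then-correlating logic mirrors our analysis.

```python
# Minimal sketch of the exploratory individual-differences analysis: average
# each participant's 3D - 2D beta difference over depth-selective and
# face-selective ROIs, then correlate the two averages across participants.
# `diff` is a hypothetical (n_subjects x n_rois) array of 3D - 2D differences;
# the first 10 columns stand in for depth ROIs, the last 5 for face ROIs.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(1)
diff = rng.normal(0.2, 0.4, (20, 15))   # placeholder 3D - 2D differences

depth_avg = diff[:, :10].mean(axis=1)   # mean over 10 depth-selective ROIs
face_avg = diff[:, 10:].mean(axis=1)    # mean over 5 face-selective ROIs

r, p = pearsonr(depth_avg, face_avg)
print(f"r = {r:.2f}, p = {p:.3f}")      # cf. r = 0.52, p < 0.05 (Figure 8)
```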

Figure 8.

Difference in activation magnitude for 3D – 2D faces in disparity-selective areas versus face-selective areas. The increase in activation for 3D relative to 2D faces averaged across the 10 depth-selective regions tested (bilateral V3A, V3B, IPS0, IPS1, and hMT+) was significantly correlated with the corresponding increase averaged across the five face-selective regions (bilateral OFA and FFA and right pSTS; r = 0.52, p < 0.05). Each data point corresponds to an individual participant.

Multivariate results

Interestingly, multivariate analyses revealed differences in how depth-selective areas represent 3D versus P-3D faces (see RSA matrices in Figure 7f). Consistent with the univariate results in depth areas, paired-samples t-tests revealed that 3D faces were represented by different spatial patterns than 2D faces (Figure 7c) in bilateral V3A and left V3B (FDR-corrected q < 0.05; solid black boxes in the RSA matrices). Right V3B and bilateral IPS0 showed similar patterns, but the differences did not survive correction for multiple comparisons (p < 0.05, uncorrected; dashed black boxes in the RSA matrices). However, unlike the univariate results, the follow-up test described in Figure 7d showed that incongruent disparity was represented differently from congruent disparity in V3A bilaterally (p < 0.05; white boxes in the RSA matrices for V3A in Figure 7f), despite there being no difference in the overall magnitude of activation evoked by these conditions. Neither V3B nor IPS0 showed significant differences between 3D and P-3D, suggesting that these areas are sensitive to the magnitude of disparity but not to the congruency of the disparity cues. Additionally, the test of 2D versus Mono (Figure 7e) revealed that binocular and monocular viewing elicited different patterns of activation in bilateral V3A, V3B, and IPS0 (p < 0.05; pink boxes in Figure 7f). As with the univariate results, this may be because these areas, which are sensitive to disparity and thus tuned to binocular inputs, encode monocular and binocular inputs differently.
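
For readers unfamiliar with RSA (Kriegeskorte, Mur, & Bandettini, 2008), the minimal Python sketch below illustrates the core computation assumed in such analyses: building a condition-by-condition representational dissimilarity matrix (RDM) from voxel patterns using correlation distance. All data and dimensions are placeholders, not values from our analysis.

```python
# Sketch of the RSA logic: build a condition-by-condition dissimilarity
# matrix (RDM) from ROI voxel patterns using correlation distance.
# `patterns` is a hypothetical dict of condition -> voxel-pattern vector.
import numpy as np

conditions = ["3D", "2D", "P-3D", "Mono"]
rng = np.random.default_rng(2)
patterns = {c: rng.normal(size=200) for c in conditions}  # placeholder patterns

n = len(conditions)
rdm = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        r = np.corrcoef(patterns[conditions[i]], patterns[conditions[j]])[0, 1]
        rdm[i, j] = 1.0 - r                 # correlation distance

print("d(3D, 2D) =", rdm[0, 1])             # the cell comparing 3D and 2D
```

In an analysis like ours, the cell comparing two conditions (e.g., 3D vs. 2D) would be extracted per participant and per ROI and then tested across participants.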

Surprisingly, neither bilateral hMT+ nor IPS1 showed a significant difference in activation patterns between 3D and 2D, despite showing strong differences in activation levels for the same contrast. Furthermore, none of the five face ROIs showed a significant difference between 3D and 2D. Based on our a priori hypotheses, no further statistical tests were performed in these regions. Overall, the noise ceiling was higher in depth-selective areas and lower in face-selective areas, suggesting greater within-subject variability in activation patterns in face areas than in depth areas. Whole-brain searchlight analysis with cluster correction did not reveal any significant clusters in which 3D and 2D were represented differently.
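
The noise ceiling mentioned above can be estimated in several ways; one common approach in the RSA literature, sketched below with placeholder data, brackets it by correlating each participant's RDM with the group mean computed without and with that participant.

```python
# Sketch of a common noise-ceiling estimate: correlate each subject's RDM
# with the group-mean RDM excluding that subject (lower bound) and including
# it (upper bound). `rdms` is a hypothetical (n_subjects x n_cells) array of
# vectorized RDMs for one ROI; the data are random placeholders.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(3)
rdms = rng.normal(size=(20, 6))             # placeholder vectorized RDMs

lower, upper = [], []
for s in range(len(rdms)):
    others = np.delete(rdms, s, axis=0).mean(axis=0)
    lower.append(pearsonr(rdms[s], others)[0])             # leave-one-out mean
    upper.append(pearsonr(rdms[s], rdms.mean(axis=0))[0])  # full-group mean

print(f"noise ceiling: [{np.mean(lower):.2f}, {np.mean(upper):.2f}]")
```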

Discussion

Our study extends fMRI research on depth perception from simple stimuli (such as random dots and lines) to higher order stimuli (such as faces) under viewing conditions that closely match the natural environment. Although 3D faces elicited higher activation levels and different activation patterns than 2D faces in depth-selective occipitoparietal regions, we found limited sensitivity to internal facial disparities in face-selective occipitotemporal regions. Specifically, among face-selective regions, we found only a marginal preference for 3D > 2D faces in right OFA, which did not survive correction for multiple comparisons. These results suggest that 2D images are effective proxies for real faces in basic studies of face recognition, at least for frontal faces and tasks comparable to ours. In depth-selective occipitoparietal cortex, V3A was sensitive to the sign of the disparity, whereas other regions responded to non-zero disparities regardless of sign (V3B, IPS0, IPS1, hMT+). Our results with natural stimuli corroborate findings of depth processing from more reductionist stimuli. Although we did not find robust differences beyond known depth-selective regions, the approach we have developed opens new avenues to investigate the contributions of stereoscopic vision for other stimuli and test conditions.

Absence of 3D versus 2D differences in face-selective areas

Our results suggest that the additional information about the internal structure of faces conveyed by binocular vision has little or no reliable effect on fMRI activation in face-selective regions. Surprisingly, the addition of binocular disparity, which provides additional information about structure and form, did not reliably modulate responses in OFA or FFA, even though these regions are implicated in the processing of facial form. These results do not provide robust evidence that the previously reported recognition benefit for 3D versus 2D faces is supported by face-selective regions; instead, such benefits may rely on depth-selective occipitoparietal cortex. Furthermore, the lack of sensitivity to 3D faces in face-selective regions suggests that higher order recognition processes do not depend on internal 3D representations, as predicted by the structural description model (Biederman, 1985; Marr & Nishihara, 1978). That said, Gauthier and Tarr (2016) have suggested that the debate between structural description models and view-dependent models of recognition is no longer informative: the features contained in image representations may vary with task demands, viewing context, and the brain region of interest, given evidence that simply changing these parameters can yield results supporting either model. Perhaps, then, in our experiment, the additional facial structure information available from disparity was neither necessary nor helpful for performing the relatively easy one-back task. Future fMRI studies with more challenging tasks, such as face discrimination across viewpoints, are needed to evaluate this possibility.

An alternate interpretation of the lack of sensitivity to 3D versus 2D faces in face-selective regions is that these regions are insensitive to binocular cues to internal facial structure but may nevertheless be sensitive to 3D distance. We intentionally restricted our face stimuli and their context to test cortical sensitivity to the presence, direction, and absence of relative disparity; as a result, there was very little information in the stimuli regarding their distance from the observer. Because of the textureless background, the distance between the faces and the far surface was somewhat ambiguous, the faces' absolute distance from the observer was fixed, and binocular cues to distance such as vergence or vertical disparity would likely be imprecise (Howard & Rogers, 2012; Rogers & Bradshaw, 1995). This impoverished distance information may have weakened the depth effect in OFA, which showed only a trend for 3D > 2D that did not survive correction for multiple comparisons. Face-selective regions, particularly OFA, may be more sensitive to 3D distance information than to subtle 3D structure information, in support of lower-level aspects of face processing such as parsing and detecting a face relative to the rest of a scene (e.g., Pitcher, Walsh, & Duchaine, 2011; Tsantani et al., 2021).

The distinction between 3D structure and 3D distance can reconcile the absence of 3D selectivity in our study with other studies that did find 3D versus 2D differences in ventral stream regions. A previous study presented planar images of faces (as well as objects, scenes, and scrambled stimuli) at different positions in depth (i.e., virtual distances) relative to the fixation plane and found a preference for near disparities in OFA, regardless of stimulus category (Nag et al., 2019). However, because the stimuli were planar images, binocular vision conveyed only distance information, not 3D structure information. Similar studies using simple lines or random-dot stereogram (RDS) patterns have reported effects of 3D distance but not 3D structure in the mid-fusiform gyrus (Durand et al., 2009; Georgieva et al., 2009), although both studies did find an effect of 3D structure in the posterior inferior temporal gyrus (ITG). One important consideration for our results is that our disparity localizer contained only zero-order disparity and not second-order disparity (i.e., 3D curvature, as described in Georgieva et al., 2009), so we did not explicitly localize regions sensitive to 3D structure, such as the ITG. Had we used a localizer with second-order disparity cues, we might have localized additional higher order areas sensitive to 3D structure that prefer 3D over 2D faces. Furthermore, the macaque inferior temporal lobe has been shown to contain near-disparity-selective patches that overlap with face-selective patches along its posterior portion (Verhoef et al., 2015). Additional evidence that regions tuned to faces also encode 3D distance information comes from a recent study showing that neurons in a face-selective region of macaque inferotemporal cortex were modulated by 3D physical size, which requires sensitivity to metric spatial information (Khandhadia et al., 2023).

Another consideration is that we instructed participants to maintain fixation to minimize potential confounds from eye movements, but this may have disrupted how disparity information is typically processed during free viewing of a face and, in turn, dampened the neural response to 3D faces in face-selective regions. Studies using 2D images have shown that viewing faces recruits specific gaze strategies that remain highly stable over time (Mehoudar, Arizpe, Baker, & Yovel, 2014) despite individual (Arizpe, Walsh, Yovel, & Baker, 2017) and cultural (Caldara, 2017) variability. Importantly, these individual gaze patterns are necessary for optimal face recognition: identification accuracy was significantly impaired when people were forced to fixate a location on a face other than their optimal fixation area (Peterson & Eckstein, 2013), and free viewing of faces improved recognition accuracy by 28% compared with fixation (Henderson, Williams, & Falk, 2005). At a neural level, viewing preferred or optimal areas of a 2D face elicited stronger neural face discrimination responses (Stacchi, Ramon, Lao, & Caldara, 2019), and the number of fixations made during free viewing of a 2D face positively correlated with activation in both the FFA and the hippocampus (Liu, Shen, Olsen, & Ryan, 2017). Gaze patterns for 3D faces have been shown to differ from those for 2D stimuli (Chelnokova & Laeng, 2011; Liu et al., 2020), and face recognition accuracy improved when 3D cues to depth were available (Chelnokova & Laeng, 2011; Eng et al., 2017; Liu et al., 2020). Interestingly, the improvement in recognition accuracy was restricted to upright faces and absent for inverted faces (Eng et al., 2017), suggesting that the additional depth information is only useful when it is consistent with expected facial configurations. Freely viewing a 3D face would also require more complex vergence eye movements to fixate different parts of the face. Although relative disparity remains unchanged with changing vergence, absolute disparity varies with fixation distance (Howard & Rogers, 2012). Changes in absolute disparity over time can provide additional information about 3D form, and, as shown by Quinlan and Culham (2007), such vergence changes have measurable neural correlates. Therefore, free viewing of 3D faces may provide stronger cues to 3D form and more strongly recruit areas involved in processing facial form.
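
To illustrate the distinction between absolute and relative disparity drawn above, the small sketch below uses the small-angle approximation; the 6.3-cm interocular distance and the feature distances are illustrative assumptions, not measurements from our study.

```python
# Illustration of why vergence changes matter: relative disparity between two
# facial features is invariant to fixation distance, whereas each feature's
# absolute disparity changes as the eyes refixate. Small-angle approximation;
# interocular distance (IOD) and feature distances are assumed values.
import numpy as np

IOD = 0.063                                   # interocular distance (m), assumed

def abs_disparity(d, fixation):
    """Absolute disparity (radians) of a point at distance d (small angles)."""
    return IOD * (1.0 / d - 1.0 / fixation)

nose, ears = 0.97, 1.00                       # assumed feature distances (m)
for fixation in (nose, ears):
    d_nose = abs_disparity(nose, fixation)
    d_ears = abs_disparity(ears, fixation)
    rel = d_nose - d_ears                     # relative disparity: constant
    print(f"fixate {fixation} m: nose {d_nose*3438:.1f} arcmin, "
          f"ears {d_ears*3438:.1f} arcmin, relative {rel*3438:.1f} arcmin")
```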

One interesting alternative possibility is that simulated 3D faces (i.e., stereoscopic projections that simulate depth) may not elicit the same responses as real faces with physical 3D form. Recent work suggests that binocular cues to perceived depth may be down-weighted in stereoscopic 3D displays compared with physically real displays (Rzepka et al., 2023). Future studies are needed to understand the differences between simulated and physical reality, possibly even comparing simulated faces with real faces, although this would introduce additional potential confounds related to social presence and animacy.

Another possibility is that sensitivity to disparity-defined depth varied within our sample, decreasing the relevance of the additional disparity cues to facial structure for some participants. Although all participants met a minimum stereoacuity threshold of 40 arcseconds, a previous study employing a more fine-grained psychophysical depth judgment task found considerable variability in stereo sensitivity (Georgieva et al., 2009). In that study, over 40% of screened participants were deemed ineligible, and stereo sensitivity varied considerably even among eligible participants. Such variability is consistent with other findings; for example, Hess, To, Zhou, Wang, and Cooperstock (2015) reported that more than 30% of a large sample exhibited poor stereo sensitivity. Therefore, in a typical population with variable stereo sensitivity, the additional binocular disparity information in faces may have limited impact, although disparity-defined facial structure may still be relevant for individuals with high stereo sensitivity. Interestingly, Georgieva and colleagues (2009) found that fMRI activity in left V3A/VIPS was correlated with perceived depth, suggesting that individual variability in disparity sensitivity affects neural responses to depth information. In our data, participants with stronger effects of stereopsis (3D vs. 2D faces) in depth-selective areas also showed positive effects of stereopsis in face-selective areas, consistent with considerable individual differences in the effect of stereopsis on brain responses.
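
For context on the 40-arcsecond screening criterion, the back-of-envelope calculation below converts a disparity threshold into the smallest resolvable depth step at a given distance (delta_d ≈ eta * d^2 / IOD under the small-angle approximation); the viewing distance and interocular distance are assumed values.

```python
# Back-of-envelope check on what a 40-arcsecond stereoacuity threshold means
# in depth: the smallest detectable depth step is approximately
# delta_d = eta * d**2 / IOD (small angles), where eta is the disparity
# threshold. The viewing distance and IOD below are assumed values.
import math

eta = 40 / 3600 * math.pi / 180      # 40 arcsec in radians
d = 0.90                             # assumed viewing distance (m)
IOD = 0.063                          # assumed interocular distance (m)

delta_d = eta * d**2 / IOD
print(f"~{delta_d * 1000:.1f} mm at {d} m")  # roughly 2.5 mm
```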

Finally, although single-cell recordings in humans have identified both face-selective neurons (Decramer et al., 2021) and stereo-selective neurons in the fusiform gyrus (Gonzalez et al., 2005), no studies have explicitly tested face-selective neurons for stereo selectivity. It is possible that 3T fMRI lacks the sensitivity required to detect neural responses to 3D structure within face-selective regions. Alternatively, disparity-defined depth may significantly drive neural activation only in the absence of other visual cues to depth. All of our stimuli contained monocular as well as binocular depth information, so our 3D > 2D contrast was sensitive only to activation above and beyond that elicited by 2D faces with monocular depth information. In contrast, both Gonzalez et al. (2005) and Decramer et al. (2019) used stereograms that contrasted non-zero disparity with zero disparity, a design that may encourage depth-selective neurons to rely more heavily on stereoscopic cues. A ceiling effect from the monocular depth information in faces may therefore have limited our ability to detect additional activation from binocular depth cues. Future studies could address this by investigating stereo sensitivity in category-selective regions using high-field MRI and face stimuli defined purely by disparity, as in Dehmoobadsharifabadi and Farivar (2016).

3D versus 2D differences in known depth-selective areas

As predicted, higher order visual areas in occipitoparietal cortex showed greater activation for faces containing non-zero binocular disparities (3D and P-3D compared with 2D). Specifically, activation for 3D > 2D was greater in bilateral V3A, IPS0, and hMT+, as well as in left V3B and right IPS1 (and P-3D > 2D in bilateral V3A, IPS0, and hMT+ and in left V3B), with no significant differences in activation magnitude between 3D and P-3D. These results add to a growing body of literature that consistently identifies V3A, as well as V3B, IPS0 (V7), and hMT+, as cortical areas involved in the processing of binocular disparity cues (Brouwer et al., 2005; Finlayson et al., 2017; Georgieva et al., 2009; Goncalves et al., 2015; Li et al., 2017; Minini et al., 2010; Preston et al., 2008; Tsao et al., 2003).

The only significant difference between natural and pseudoscopically presented faces (3D and P-3D) emerged in the multivariate response pattern in V3A. Such limited effects are rather surprising, considering that multiple studies have shown sensitivity to near versus far disparities not only in V3A (Nasr & Tootell, 2020) but across a much larger range of occipitoparietal brain regions, including V2 and V3 (Nasr & Tootell, 2018), V3B (Li et al., 2017), and parietal cortex (Li et al., 2017; Tootell et al., 2022). Interestingly, columns showing preferences for near versus far disparities in parietal cortex appear to be interdigitated with columns preferring face stimuli in near versus far space (Tootell et al., 2022), suggesting that low-level disparity information may be particularly important for face stimuli, given the ecological relevance of determining the distance of other people. Because our stimuli differed in the predominance of near (crossed) disparities in 3D versus far (uncrossed) disparities in P-3D, we would have predicted more widespread differences between these two conditions. The discrepancies between our findings and those of earlier studies may be due to differences in the stimuli (e.g., the range of disparities) or the experimental approaches (e.g., MVPA in our case vs. high-resolution fMRI in Tootell et al., 2022). Alternatively, the discrepancies may result from how the stimuli were perceived rather than their physical properties. Most notably, given observers' predisposition to see faces as convex, as in the hollow-face illusion (Gregory, 1970; Hill & Bruce, 1993), our participants may have failed to perceive the differences between 3D and P-3D faces. Indeed, people vary in their perception of pseudoscopic stimuli, particularly those with limited experience of pseudoscopy (Palmisano et al., 2016). If mid- and high-level occipitoparietal regions depend more on the perceived than the physical stimulus, this may have reduced or eliminated neural differences between 3D and P-3D.

Conclusions

Our results suggest that stereoscopic 3D information does not dramatically alter the neural processing of facial structure in face-selective areas, validating the continued use of 2D images as proxies for real, 3D faces, at least under some circumstances. In our study, face representations may have lacked 3D structural information because the information available in a 2D image was sufficient to perform the simple one-back recognition task. Alternatively, 3D structural information may be processed only in the dorsal visual stream, regardless of stimulus category.

The approach we developed here makes it possible to study high-level stereoscopic vision using naturalistic 3D stimuli with correct real-world geometry. Although we found little contribution of stereoscopic 3D information to the processing of face structure, binocular cues may nevertheless be informative about the 3D location of a face relative to an observer, both because position-in-depth differences help segment faces from the background and because the distance of faces is ecologically important for how we interact with conspecifics (Khandhadia et al., 2023). Furthermore, stereoscopic 3D information about structure may become more important for other tasks, such as more challenging face discrimination or viewpoint generalization tasks. Our approach also opens the door to further studies of the contribution of real-world cues to the processing of other stimulus categories in the human brain, many of which contain much larger disparities than faces. One open question is whether both 3D structure and distance information are useful for object recognition, which may recruit different processing mechanisms than faces. Another is whether binocular information about the spatial locations of single objects, or the relative locations of multiple objects in complex scenes, is important not only in occipitoparietal depth-selective regions but also in other dorsal-stream regions involved in guiding actions to targets, where egocentric distance information conveys actability (e.g., Gallivan, Cavina-Pratesi, & Culham, 2009) and is crucial to performance.

Supplementary Material

Supplement 1
jovi-25-11-6_s001.pdf (205.8KB, pdf)

Acknowledgments

The authors thank Kyle Gilbert, Joe Gati, and Ravi Menon for development of the 28-channel head coil and Kevin Stubbs for assistance with coding. We thank VPixx, particularly Sophie Kenny, for assistance with optimizing the projector.

This work was funded by the New Frontiers in Research Fund Exploration Grants program (to JCC) and the Natural Sciences and Engineering Research Council of Canada (Discovery grant to JCC; Research Tools and Instruments grant to JCC; two Undergraduate Student Research Awards to ED) and was supported by the Canada Research Chairs program (to JCC), a Canada First Research Excellence Fund BrainsCAN grant (to Western University), and a Brain Canada Platform Support Grant (to Ravi Menon).

Commercial relationships: none.

Corresponding author: Jody Culham.

Email: jculham@uwo.ca.

Address: Centre for Brain and Mind, Western University, London, Ontario N6A 3K7, Canada.

References

1. Ahmed, Z., Bandettini, P. A., Bottenhorn, K. L., Caballero-Gaudes, C., Dowdle, L. T., DuPre, E., … Whitaker, K. (2022). ME-ICA/tedana: 0.0.12. Retrieved from https://zenodo.org/records/6461353.
2. Allison, R. S. (2007). Analysis of the influence of vertical disparities arising in toed-in stereoscopic cameras. Journal of Imaging Science and Technology, 51(4), 317–327.
3. Andrews, T. J., & Ewbank, M. P. (2004). Distinct representations for facial identity and changeable aspects of faces in the human temporal lobe. NeuroImage, 23(3), 905–913, 10.1016/j.neuroimage.2004.07.060.
4. Arizpe, J., Walsh, V., Yovel, G., & Baker, C. I. (2017). The categories, frequencies, and stability of idiosyncratic eye-movement patterns to faces. Vision Research, 141, 191–203, 10.1016/j.visres.2016.10.013.
5. Axelrod, V., & Yovel, G. (2012). Hierarchical processing of face viewpoint in human visual cortex. The Journal of Neuroscience, 32(7), 2442–2452, 10.1523/JNEUROSCI.4770-11.2012.
6. Backus, B. T., Fleet, D. J., Parker, A. J., & Heeger, D. J. (2001). Human cortical activity correlates with stereoscopic depth perception. Journal of Neurophysiology, 86(4), 2054–2068, 10.1152/jn.2001.86.4.2054.
7. Behrens, T. E. J., Fox, P., Laird, A., & Smith, S. M. (2013). What is the most interesting part of the brain? Trends in Cognitive Sciences, 17(1), 2–4, 10.1016/j.tics.2012.10.010.
8. Bernstein, M., Erez, Y., Blank, I., & Yovel, G. (2018). An integrated neural framework for dynamic and static face processing. Scientific Reports, 8(1), 7036, 10.1038/s41598-018-25405-9.
9. Biederman, I. (1985). Human image understanding: Recent research and a theory. Computer Vision, Graphics, and Image Processing, 32(1), 29–73, 10.1016/0734-189X(85)90002-7.
10. Biederman, I., & Kalocsais, P. (1997). Neurocomputational bases of object and face recognition. Philosophical Transactions of the Royal Society of London. Series B: Biological Sciences, 352(1358), 1203–1219, 10.1098/rstb.1997.0103.
11. Brouwer, G. J., van Ee, R., & Schwarzbach, J. (2005). Activation in visual cortex correlates with the awareness of stereoscopic depth. The Journal of Neuroscience, 25(45), 10403–10413, 10.1523/JNEUROSCI.2408-05.2005.
12. Bruce, V., & Young, A. (1986). Understanding face recognition. British Journal of Psychology, 77(3), 305–327, 10.1111/j.2044-8295.1986.tb02199.x.
13. Bulthoff, H. H., Edelman, S. Y., & Tarr, M. J. (1995). How are three-dimensional objects represented in the brain? Cerebral Cortex, 5(3), 247–260, 10.1093/cercor/5.3.247.
14. Burke, D., Taubert, J., & Higman, T. (2007). Are face representations viewpoint dependent? A stereo advantage for generalising across different views of faces. Vision Research, 47(16), 2164–2169, 10.1016/j.visres.2007.04.018.
15. Caldara, R. (2017). Culture reveals a flexible system for face processing. Current Directions in Psychological Science, 26(3), 249–255, 10.1177/0963721417710036.
16. Chelnokova, O., & Laeng, B. (2011). Three-dimensional information in face recognition: An eye-tracking study. Journal of Vision, 11(13):27, 1–15, 10.1167/11.13.27.
17. Cox, R. W., & Hyde, J. S. (1997). Software tools for analysis and visualization of fMRI data. NMR in Biomedicine, 10(4–5), 171–178.
18. Cumming, B. G., & DeAngelis, G. C. (2001). The physiology of stereopsis. Annual Review of Neuroscience, 24(1), 203–238, 10.1146/annurev.neuro.24.1.203.
19. Decramer, T., Premereur, E., Uytterhoeven, M., Van Paesschen, W., Van Loon, J., Janssen, P., … Theys, T. (2019). Single-cell selectivity and functional architecture of human lateral occipital complex. PLoS Biology, 17(9), e3000280, 10.1371/journal.pbio.3000280.
20. Decramer, T., Premereur, E., Zhu, Q., van Paesschen, W., van Loon, J., Vanduffel, W., … Theys, T. (2021). Single-unit recordings reveal the selectivity of a human face area. The Journal of Neuroscience, 41(45), 9340–9349, 10.1523/JNEUROSCI.0349-21.2021.
21. Dehmoobadsharifabadi, A., & Farivar, R. (2016). Are face representations depth cue invariant? Journal of Vision, 16(8):6, 1–15, 10.1167/16.8.6.
22. Deng, Z., Gao, J., Li, T., Chen, Y., Gao, B., Fang, F., … Chen, J. (2024). Viewpoint adaptation revealed potential representational differences between 2D images and 3D objects. Cognition, 251, 105903, 10.1016/j.cognition.2024.105903.
23. Dodgson, N. A. (2004). Variation and extrema of human interpupillary distance. Proceedings of SPIE, 5291, 36–46, 10.1117/12.529999.
24. Duchaine, B., & Yovel, G. (2008). Face recognition. In Fritzsch B. (Ed.), The senses: A comprehensive reference (pp. 329–357). Cambridge, MA: Academic Press, 10.1016/B978-012370880-9.00334-0.
25. Duchaine, B., & Yovel, G. (2015). A revised neural framework for face processing. Annual Review of Vision Science, 1(1), 393–416, 10.1146/annurev-vision-082114-035518.
26. DuPre, E., Salo, T., Ahmed, Z., Bandettini, P., Bottenhorn, K., Caballero-Gaudes, C., … Handwerker, D. (2021). TE-dependent analysis of multi-echo fMRI with tedana. Journal of Open Source Software, 6(66), 3669, 10.21105/joss.03669.
27. Durand, J. B., Peeters, R., Norman, J. F., Todd, J. T., & Orban, G. A. (2009). Parietal regions processing visual 3D shape extracted from disparity. NeuroImage, 46(4), 1114–1126, 10.1016/j.neuroimage.2009.03.023.
28. Eng, Z. H. D., Yick, Y. Y., Guo, Y., Xu, H., Reiner, M., Cham, T. J., … Chen, S. H. A. (2017). 3D faces are recognized more accurately and faster than 2D faces, but with similar inversion effects. Vision Research, 138, 78–85, 10.1016/j.visres.2017.06.004.
29. Esteban, O., Markiewicz, C. J., Blair, R. W., Moodie, C. A., Isik, A. I., Erramuzpe, A., … Gorgolewski, K. J. (2019). fMRIPrep: A robust preprocessing pipeline for functional MRI. Nature Methods, 16(1), 111–116, 10.1038/s41592-018-0235-4.
30. Esteban, O., Markiewicz, C. J., Goncalves, M., Kent, J. D., DuPre, E., Ciric, R., … Gorgolewski, K. J. (2022). fMRIPrep: A robust preprocessing pipeline for functional MRI. Retrieved from https://zenodo.org/records/6476576.
31. Finlayson, N. J., Zhang, X., & Golomb, J. D. (2017). Differential patterns of 2D location versus depth decoding along the visual hierarchy. NeuroImage, 147, 507–516, 10.1016/j.neuroimage.2016.12.039.
32. Fonov, V., Evans, A., McKinstry, R., Almli, C., & Collins, D. (2009). Unbiased nonlinear average age-appropriate brain templates from birth to adulthood. NeuroImage, 47, S102, 10.1016/S1053-8119(09)70884-5.
33. Freiwald, W. A., Duchaine, B., & Yovel, G. (2016). Face processing systems: From neurons to real-world social perception. Annual Review of Neuroscience, 39(1), 325–346, 10.1146/annurev-neuro-070815-013934.
34. Freiwald, W. A., & Tsao, D. Y. (2010). Functional compartmentalization and viewpoint generalization within the macaque face-processing system. Science, 330(6005), 845–851, 10.1126/science.1194908.
35. Freiwald, W. A., Tsao, D. Y., & Livingstone, M. S. (2009). A face feature space in the macaque temporal lobe. Nature Neuroscience, 12(9), 1187–1196, 10.1038/nn.2363.
36. Gallivan, J. P., Cavina-Pratesi, C., & Culham, J. C. (2009). Is that within reach? fMRI reveals that the human superior parieto-occipital cortex encodes objects reachable by the hand. The Journal of Neuroscience, 29(14), 4381–4391, 10.1523/JNEUROSCI.0377-09.2009.
37. Gauthier, I., & Tarr, M. J. (2016). Visual object recognition: Do we (finally) know more now than we did? Annual Review of Vision Science, 2(1), 377–396, 10.1146/annurev-vision-111815-114621.
38. Geers, L., & Coello, Y. (2023). The relationship between action, social and multisensory spaces. Scientific Reports, 13(1), 202, 10.1038/s41598-023-27514-6.
39. Georgieva, S., Peeters, R., Kolster, H., Todd, J. T., & Orban, G. A. (2009). The processing of three-dimensional shape from disparity in the human brain. The Journal of Neuroscience, 29(3), 727–742, 10.1523/JNEUROSCI.4753-08.2009.
40. Goncalves, N. R., Ban, H., Sánchez-Panchuelo, R. M., Francis, S. T., Schluppeck, D., & Welchman, A. E. (2015). 7 Tesla fMRI reveals systematic functional organization for binocular disparity in dorsal visual cortex. The Journal of Neuroscience, 35(7), 3056–3072, 10.1523/JNEUROSCI.3047-14.2015.
41. Gonzalez, F., Relova, J. L., Prieto, A., & Peleteiro, M. (2005). Evidence of basal temporo-occipital cortex involvement in stereoscopic vision in humans: A study with subdural electrode recordings. Cerebral Cortex, 15(1), 117–122, 10.1093/cercor/bhh114.
42. Gregory, R. L. (1970). The intelligent eye. London, UK: Weidenfeld & Nicolson.
43. Grimaldi, P., Saleem, K. S., & Tsao, D. (2016). Anatomical connections of the functionally defined “face patches” in the macaque monkey. Neuron, 90(6), 1325–1342, 10.1016/j.neuron.2016.05.009.
44. Haxby, J. V., Hoffman, E. A., & Gobbini, M. I. (2000). The distributed human neural system for face perception. Trends in Cognitive Sciences, 4(6), 223–233, 10.1016/S1364-6613(00)01482-0.
45. Henderson, J. M., Williams, C. C., & Falk, R. J. (2005). Eye movements are functional during face learning. Memory & Cognition, 33(1), 98–106, 10.3758/BF03195300.
46. Hess, R. F., To, L., Zhou, J., Wang, G., & Cooperstock, J. R. (2015). Stereo vision: The haves and have-nots. i-Perception, 6(3), 2041669515593028, 10.1177/2041669515593028.
47. Hill, H., & Bruce, V. (1993). Independent effects of lighting, orientation, and stereopsis on the hollow-face illusion. Perception, 22(8), 887–897, 10.1068/p220887.
48. Hoffman, D. M., Girshick, A. R., Akeley, K., & Banks, M. S. (2008). Vergence-accommodation conflicts hinder visual performance and cause visual fatigue. Journal of Vision, 8(3):33, 1–30, 10.1167/8.3.33.
49. Howard, I. P., & Rogers, B. J. (2012). Perceiving in depth: Volume 2. Stereoscopic vision. Oxford, UK: Oxford University Press.
50. Jenkinson, M., Bannister, P., Brady, M., & Smith, S. (2002). Improved optimization for the robust and accurate linear registration and motion correction of brain images. NeuroImage, 17(2), 825–841, 10.1006/nimg.2002.1132.
51. Josephs, E. L., & Konkle, T. (2019). Perceptual dissociations among views of objects, scenes, and reachable spaces. Journal of Experimental Psychology: Human Perception and Performance, 45(6), 715–728, 10.1037/xhp0000626.
52. Kanwisher, N., McDermott, J., & Chun, M. M. (1997). The fusiform face area: A module in human extrastriate cortex specialized for face perception. The Journal of Neuroscience, 17(11), 4302–4311, 10.1523/JNEUROSCI.17-11-04302.1997.
53. Khandhadia, A. P., Murphy, A. P., Koyano, K. W., Esch, E. M., & Leopold, D. A. (2023). Encoding of 3D physical dimensions by face-selective cortical neurons. Proceedings of the National Academy of Sciences, USA, 120(9), e2214996120, 10.1073/pnas.2214996120.
54. Klassen, L. M., & Menon, R. S. (2004). Robust automated shimming technique using arbitrary mapping acquisition parameters (RASTAMAP). Magnetic Resonance in Medicine, 51(5), 881–887, 10.1002/mrm.20094.
55. Kleiner, M., Brainard, D., & Pelli, D. (2007). What's new in psychtoolbox-3? Perception, 36(ECVP Abstract Supplement), 14.
56. Kriegeskorte, N., Mur, M., & Bandettini, P. (2008). Representational similarity analysis – connecting the branches of systems neuroscience. Frontiers in Systems Neuroscience, 2, 4, 10.3389/neuro.06.004.2008.
57. Kundu, P., Brenowitz, N. D., Voon, V., Worbe, Y., Vértes, P. E., Inati, S. J., … Bullmore, E. T. (2013). Integrated strategy for improving functional connectivity mapping using multiecho fMRI. Proceedings of the National Academy of Sciences, USA, 110(40), 16187–16192, 10.1073/pnas.1301725110.
58. Kundu, P., Inati, S. J., Evans, J. W., Luh, W.-M., & Bandettini, P. A. (2012). Differentiating BOLD and non-BOLD signals in fMRI time series using multi-echo EPI. NeuroImage, 60(3), 1759–1770, 10.1016/j.neuroimage.2011.12.028.
59. Kundu, P., Voon, V., Balchandani, P., Lombardo, M. V., Poser, B. A., & Bandettini, P. A. (2017). Multi-echo fMRI: A review of applications in fMRI denoising and analysis of BOLD signals. NeuroImage, 154, 59–80, 10.1016/j.neuroimage.2017.03.033.
60. Large, M. E., Cavina-Pratesi, C., Vilis, T., & Culham, J. C. (2008). The neural correlates of change detection in the face perception network. Neuropsychologia, 46(8), 2169–2176, 10.1016/j.neuropsychologia.2008.02.027.
61. Li, Y., Zhang, C., Hou, C., Yao, L., Zhang, J., & Long, Z. (2017). Stereoscopic processing of crossed and uncrossed disparities in the human visual cortex. BMC Neuroscience, 18(1), 80, 10.1186/s12868-017-0395-7.
62. Linton, P. (2020). Does vision extract absolute distance from vergence? Attention, Perception, & Psychophysics, 82(6), 3176–3195, 10.3758/s13414-020-02006-1.
63. Liu, C. H., Ward, J., & Young, A. W. (2006). Transfer between two- and three-dimensional representations of faces. Visual Cognition, 13(1), 51–64, 10.1080/13506280500143391.
64. Liu, H., Laeng, B., & Czajkowski, N. O. (2020). Does stereopsis improve face identification? A study using a virtual reality display with integrated eye-tracking and pupillometry. Acta Psychologica, 210, 103142, 10.1016/j.actpsy.2020.103142.
65. Liu, Z. X., Shen, K., Olsen, R. K., & Ryan, J. D. (2017). Visual sampling predicts hippocampal activity. The Journal of Neuroscience, 37(3), 599–609, 10.1523/JNEUROSCI.2610-16.2016.
66. Marr, D., & Nishihara, H. K. (1978). Representation and recognition of the spatial organization of three-dimensional shapes. Proceedings of the Royal Society B: Biological Sciences, 200(1140), 269–294, 10.1098/rspb.1978.0020.
67. Mehoudar, E., Arizpe, J., Baker, C. I., & Yovel, G. (2014). Faces in the eye of the beholder: Unique and stable eye scanning patterns of individual observers. Journal of Vision, 14(7):6, 1–11, 10.1167/14.7.6.
68. Minini, L., Parker, A. J., & Bridge, H. (2010). Neural modulation by binocular disparity greatest in human dorsal visual stream. Journal of Neurophysiology, 104(1), 169–178, 10.1152/jn.00790.2009.
69. Nag, S., Berman, D., & Golomb, J. D. (2019). Category-selective areas in human visual cortex exhibit preferences for stimulus depth. NeuroImage, 196, 289–301, 10.1016/j.neuroimage.2019.04.025.
70. Nasr, S., & Tootell, R. B. H. (2018). Visual field biases for near and far stimuli in disparity selective columns in human visual cortex. NeuroImage, 168, 358–365, 10.1016/j.neuroimage.2016.09.012.
71. Nasr, S., & Tootell, R. B. H. (2020). Asymmetries in global perception are represented in near- versus far-preferring clusters in human visual cortex. The Journal of Neuroscience, 40(2), 355–368, 10.1523/JNEUROSCI.2124-19.2019.
72. Palmisano, S., Hill, H., & Allison, R. S. (2016). The nature and timing of tele-pseudoscopic experiences. i-Perception, 7(1), 2041669515625793, 10.1177/2041669515625793.
73. Parker, A. J. (2007). Binocular depth perception and the cerebral cortex. Nature Reviews Neuroscience, 8(5), 379–391, 10.1038/nrn2131.
74. Peissig, J. J., & Tarr, M. J. (2007). Visual object recognition: Do we know more now than we did 20 years ago? Annual Review of Psychology, 58(1), 75–96, 10.1146/annurev.psych.58.102904.190114.
75. Peterson, M. F., & Eckstein, M. P. (2013). Individual differences in eye movements during face identification reflect observer-specific optimal points of fixation. Psychological Science, 24(7), 1216–1225, 10.1177/0956797612471684.
76. Pitcher, D., & Ungerleider, L. G. (2021). Evidence for a third visual pathway specialized for social perception. Trends in Cognitive Sciences, 25(2), 100–110, 10.1016/j.tics.2020.11.006.
77. Pitcher, D., Walsh, V., & Duchaine, B. (2011). The role of the occipital face area in the cortical face perception network. Experimental Brain Research, 209(4), 481–493, 10.1007/s00221-011-2579-1.
78. Pitcher, D., Walsh, V., Yovel, G., & Duchaine, B. (2007). TMS evidence for the involvement of the right occipital face area in early face processing. Current Biology, 17(18), 1568–1573, 10.1016/j.cub.2007.07.063.
79. Preston, T. J., Li, S., Kourtzi, Z., & Welchman, A. E. (2008). Multivoxel pattern selectivity for perceptually relevant binocular disparities in the human brain. The Journal of Neuroscience, 28(44), 11315, 10.1523/JNEUROSCI.2728-08.2008.
80. Quinlan, D. J., & Culham, J. C. (2007). fMRI reveals a preference for near viewing in the human parieto-occipital cortex. NeuroImage, 36(1), 167–187, 10.1016/j.neuroimage.2007.02.029.
81. Rogers, B. J., & Bradshaw, M. F. (1995). Disparity scaling and the perception of frontoparallel surfaces. Perception, 24(2), 155–179, 10.1068/p240155.
82. Rosenke, M., Van Hoof, R., Van Den Hurk, J., Grill-Spector, K., & Goebel, R. (2021). A probabilistic functional atlas of human occipito-temporal visual cortex. Cerebral Cortex, 31(1), 603–619, 10.1093/cercor/bhaa246.
83. Rzepka, A. M., Hussey, K. J., Maltz, M. V., Babin, K., Wilcox, L. M., & Culham, J. C. (2023). Familiar size affects perception differently in virtual reality and the real world. Philosophical Transactions of the Royal Society B: Biological Sciences, 378(1869), 20210464, 10.1098/rstb.2021.0464.
84. Schiltz, C., & Rossion, B. (2006). Faces are represented holistically in the human occipito-temporal cortex. NeuroImage, 32(3), 1385–1394, 10.1016/j.neuroimage.2006.05.037.
85. Sinha, P., & Poggio, T. (1996). I think I know that face. Nature, 384(6608), 404, 10.1038/384404A0.
86. Snow, J. C., & Culham, J. C. (2021). The treachery of images: How realism influences brain and behavior. Trends in Cognitive Sciences, 25(6), 506–519, 10.1016/j.tics.2021.02.008.
87. Sorokowska, A., Sorokowski, P., Hilpert, P., Cantarero, K., Frackowiak, T., Ahmadi, K., … Pierce, J. D. (2017). Preferred interpersonal distances: A global comparison. Journal of Cross-Cultural Psychology, 48(4), 577–592, 10.1177/0022022117698039.
88. Stacchi, L., Ramon, M., Lao, J., & Caldara, R. (2019). Neural representations of faces are tuned to eye movements. The Journal of Neuroscience, 39(21), 4113–4123, 10.1523/JNEUROSCI.2968-18.2019.
89. Sundermann, B., Pfleiderer, B., McLeod, A., & Mathys, C. (2024). Seeing more than the tip of the iceberg: Approaches to subthreshold effects in functional magnetic resonance imaging of the brain. Clinical Neuroradiology, 34(3), 531–539, 10.1007/s00062-024-01422-2.
90. Tanaka, J. W., & Farah, M. J. (1993). Parts and wholes in face recognition. The Quarterly Journal of Experimental Psychology Section A, 46(2), 225–245, 10.1080/14640749308401045.
91. Taylor, P. A., Aggarwal, H., Bandettini, P., Barilari, M., Bright, M., Caballero-Gaudes, C., … Chen, G. (2025). Go figure: Transparency in neuroscience images preserves context and clarifies interpretation. arXiv, 10.48550/arXiv.2504.07824.
92. Taylor, P. A., Reynolds, R. C., Calhoun, V., Gonzalez-Castillo, J., Handwerker, D. A., Bandettini, P. A., … Chen, G. (2023). Highlight results, don't hide them: Enhance interpretation, reduce biases and improve reproducibility. NeuroImage, 274, 120138, 10.1016/j.neuroimage.2023.120138.
93. Tootell, R. B. H., Nasiriavanaki, Z., Babadi, B., Greve, D. N., Nasr, S., & Holt, D. J. (2022). Interdigitated columnar representation of personal space and visual space in human parietal cortex. The Journal of Neuroscience, 42(48), 9011–9029, 10.1523/JNEUROSCI.0516-22.2022.
94. Tsantani, M., Kriegeskorte, N., Storrs, K., Williams, A. L., McGettigan, C., & Garrido, L. (2021). FFA and OFA encode distinct types of face identity information. The Journal of Neuroscience, 41(9), 1952–1969, 10.1523/JNEUROSCI.1449-20.2020.
95. Tsao, D. Y., Vanduffel, W., Sasaki, Y., Fize, D., Knutsen, T. A., Mandeville, J. B., … Tootell, R. B. H. (2003). Stereopsis activates V3A and caudal intraparietal areas in macaques and humans. Neuron, 39(3), 555–568, 10.1016/S0896-6273(03)00459-8.
96. van den Enden, A., & Spekreijse, H. (1989). Binocular depth reversals despite familiarity cues. Science, 244(4907), 959–961, 10.1126/science.2727687.
97. Verhoef, B.-E., Bohon, K. S., & Conway, B. R. (2015). Functional architecture for disparity in macaque inferior temporal cortex and its relationship to the architecture for faces, color, scenes, and visual field. The Journal of Neuroscience, 35(17), 6952–6968, 10.1523/JNEUROSCI.5079-14.2015.
98. Wang, L., Mruczek, R. E. B., Arcaro, M. J., & Kastner, S. (2015). Probabilistic maps of visual topography in human cortex. Cerebral Cortex, 25(10), 3911–3931, 10.1093/cercor/bhu277.
99. Welchman, A. E. (2016). The human brain in depth: How we see in 3D. Annual Review of Vision Science, 2(1), 345–376, 10.1146/annurev-vision-111815-114605.
100. Wheatstone, C. (1852). Contributions to the physiology of vision. Part the second: On some remarkable, and hitherto unobserved, phenomena of binocular vision. Philosophical Transactions of the Royal Society of London, 142, 1–17, 10.1098/rstl.1852.0001.
101. Yin, R. K. (1969). Looking at upside-down faces. Journal of Experimental Psychology, 81(1), 141–145, 10.1037/h0027474.
102. Young, A. W., Hellawell, D., & Hay, D. C. (1987). Configurational information in face perception. Perception, 16(6), 747–759, 10.1068/p160747.
103. Yovel, G., & Kanwisher, N. (2005). The neural basis of the behavioral face-inversion effect. Current Biology, 15(24), 2256–2262, 10.1016/j.cub.2005.10.072.
