Environmental surfaces and the compression of perceived visual space

Zheng Bian; George J Andersen

doi:10.1167/11.7.4

Author manuscript; available in PMC: 2011 Jul 14.

Published in final edited form as: J Vis. 2011 Jun 7;11(7):4. doi: 10.1167/11.7.4

PMCID: PMC3136083 NIHMSID: NIHMS306371 PMID: 21669858

Abstract

The present study examined whether the compression of perceived visual space varies according to the type of environmental surface being viewed. To examine this issue, observers made exocentric distance judgments when viewing simulated 3D scenes. In 4 experiments, observers viewed ground and ceiling surfaces and performed either an L-shaped matching task (Experiments 1, 3, and 4) or a bisection task (Experiment 2). Overall, we found considerable compression of perceived exocentric distance on both ground and ceiling surfaces. However, the perceived exocentric distance was less compressed on a ground surface than on a ceiling surface. In addition, this ground surface advantage did not vary systematically as a function of the distance in the scene. These results suggest that the perceived visual space when viewing a ground surface is less compressed than the perceived visual space when viewing a ceiling surface and that the perceived layout of a surface varies as a function of the type of the surface.

Keywords: depth, space and scene perception, 3D surface and shape perception

Introduction

An important ability for human observers is to perceive the layout of three-dimensional space and use this information to guide behavior. Previous studies have examined how information from the physical environment is used to perceive 3D space (Blank, 1961; Foley, Ribeiro-Filho, & Da Silva, 2004; Koenderink, van Doorn, & Lappin, 2000; Luneberg, 1947). Significant distortions in perceived size, shape, and distance suggest that visual space is not Euclidean (e.g., Baird & Biersdorf, 1967; Gilinsky, 1951; Levin & Haber, 1993; Norman, Todd, Perotti, & Tittle, 1996; Ooi, Wu, & He, 2006; Wagner, 1985). For instance, Gilinsky (1951) had observers successively set equivalently appearing depth intervals on a ground plane and found that the depth extent of the interval increased with increased distance from the observer. A similar result was obtained by Blank (1961) using a bisection task. These results, considered together, suggest that visual space is not uniform but is compressed and is best described mathematically by a hyperbolic function.

Previous findings suggest that the compression of visual space may be influenced by factors such as the depth cues available to the observers (full cue vs. reduced cue) and whether eye, head, and body movements were allowed (Wagner, 1985). This suggests that the visual system may take advantage of (or be limited to) the information available in the environment and may construct a representation of visual space in accordance with the available information (Indow, 1991). Previously, we have found that the perception of distance varies according to the type of environmental surfaces present in the scene. Specifically, we found that observers organize the ordinal depth within a scene in accordance with a ground surface as compared to other environmental surfaces (Bian, Braunstein, & Andersen, 2005, 2006). Given the importance of the ground surface in organizing the ordinal depth within a scene, we examined in the present study whether the compression of visual space is different for ground and ceiling surfaces.

The importance of the ground surface in perceiving the layout of 3D scenes was discussed approximately 1000 years ago in Alhazen’s (1989, translation) writings and more recently by Gibson (1950) in his “ground theory.” Recent studies have examined the unique role of the ground surface in the perceptual organization of the 3D space by comparing it directly with other environmental surfaces, especially the ceiling surface. For example, Epstein (1966) found that the “height in the picture” cue was less effective when a ceiling surface was presented as compared to when a ground surface was presented. McCarley and He (2000, 2001) found that visual search was faster on an implicit ground surface than on an implicit ceiling surface defined by binocular disparity. Their finding was extended by Morita and Kumada (2003) who showed superior visual search performance on a ground surface defined by pictorial cues. Champion and Warren (2010) also obtained an advantage of the ground surface as compared to the ceiling surface in 3D size estimation. Bian et al. (2005) found that when the ground surface and the ceiling surface provided conflicting information about the relative distance of objects in a scene, observers used the information on the ground surface to determine the layout of the scene. They referred to this result as the ground dominance effect. In a follow-up study, Bian et al. (2006) showed that the ground dominance effect was mainly due to the differences in the projections of ground and ceiling surfaces, with visual field location having a minor effect. Recent research has also found a ground dominance effect for older observers, although the magnitude of the effect was smaller than that found for younger observers (Bian & Andersen, 2008). Finally, using a change detection paradigm, Bian and Andersen (2010) found that changes to a ground surface or objects on a ground surface were detected faster than changes to a ceiling surface or objects attached to a ceiling surface and that this advantage was mainly due to superior encoding, rather than retrieval and comparison, of ground surface information.

The unique role of the ground surface in the perceptual organization of 3D scenes is generally attributed to the ground surface being universal whereas other environmental surfaces are present only in artificial environments, such as buildings (Gibson, 1950). The ground surface supports almost all objects and the locomotion of most land-dwelling animals either directly (Gibson, 1950) or indirectly through a series of “nested contact relations” (Meng & Sedgwick, 2001, 2002). Through evolution, our visual system may have been adapted to the perspective structure of the ground surface and, consequently, is able to process the information on the ground surface more efficiently as compared to other environmental surfaces (McCarley & He, 2000). One explanation for this advantage was proposed by He and Ooi (2000). According to their “quasi-2D” theory, the visual system may encode the location of objects on a common visual surface using a quasi-2D coordinate system instead of a 3D Cartesian coordinate system. The benefit of this approach is to reduce computational demand when encoding a representation of the scene. It is possible that the degree to which the visual system uses this quasi-2D coordinate system may be greater for a ground surface than for other environmental surfaces. As a result, the encoding of object locations and relative distance between objects on a ground surface is more veridical than on a ceiling surface.

These studies, considered together, suggest that the perceived visual space when viewing a ground surface may be different from the perceived visual space when viewing a ceiling surface. Specifically, the perceived distance on a ceiling surface may be more compressed than on a ground surface. One recent study by Thompson, Dilda, and Creem-Regehr (2007) showed that the perceived egocentric distance on a ceiling surface, measured using a blind-walking task, was as accurate as that on a ground surface. In the current study, we used a perceptual matching task to examine exocentric distance judgments on a ceiling as compared to a ground surface and to examine whether the compression of visual space when viewing a ceiling surface differed from the compression of visual space when viewing a ground surface.

The experiments were conducted using computer-generated 3D displays in which texture and motion parallax information was present. In Experiment 1, we used an L-shaped matching task similar to that used by Feria, Braunstein, and Andersen (2003) to examine whether exocentric distance judgments were more compressed on a ceiling surface than on a ground surface. In addition, we were interested in whether the difference between the two surfaces in the compression of perceived visual space varied as a function of distance in the scene. In Experiment 2, we used a bisection task and examined whether the results obtained in Experiment 1 were due to a difference in perceived length in the frontal-parallel plane between the ground surface and the ceiling surface. In Experiment 3, we examined whether linear perspective was necessary to produce the ground surface advantage in perceived exocentric distance. In Experiments 1–3, a single environmental surface was presented. In Experiment 4, we examined whether similar results would occur when both surfaces were presented in the scene.

Experiment 1

In Experiment 1, we used an L-shaped matching task similar to that used in Feria et al. (2003) to examine whether judged exocentric distance varied based on the presence of a ground or ceiling surface. Since previous studies using visual matching tasks have found a large compression of depth (e.g., Loomis, Da Silva, Fujita, & Fukusima, 1992), a more accurate response would mean less compression of perceived visual space. In addition, previous research has found that compression of a depth interval increased systematically as a function of distance in the scene (Levin & Haber, 1993; Loomis et al., 1992; Loomis & Philbeck, 1999; Norman et al., 1996; Ooi et al., 2006), suggesting a distortion of perceived visual space. In Experiment 1, we also manipulated the distance of the L-shape from the observers and examined whether the difference in judged depth extent on the two surfaces varied as a function of distance in the scene.

On each trial, three poles were attached to either a ground surface or a ceiling surface, forming an inverse “L” shape. The task of the observers was to adjust the horizontal arm of the “L” so that its length matched the length of the arm of the “L” that extended in depth. If perceived depth is less compressed when viewing a ground surface than when viewing a ceiling surface, then the adjusted ratio of the L-shape should be more accurate when a ground surface is present than when a ceiling surface is present. Previous studies have found that greater depth was perceived when stimuli with motion parallax information were presented as compared to when stationary stimuli were presented (Gibson, Gibson, Smith, & Flock, 1959; Rogers & Graham, 1979; Smith & Smith, 1963). In the present study, we manipulated motion parallax information and examined whether it had a differential effect on the perceived depth when viewing ground and ceiling surfaces.

Methods

Observers

The observers were 12 undergraduate students (6 males and 6 females) from the University of California, Riverside. All observers were paid for their participation, were naive regarding the purpose of the experiment, and had normal or corrected-to-normal visual acuity.

Stimuli

The stimuli were computer-generated 3D scenes composed of either a ground or ceiling surface with a 64 × 64 random black–white rectangle texture. The simulated dimensions of the surface were 19.2 m × 34.3 m and each rectangle measured 30 cm × 53.6 cm. The average luminance of the white rectangles was 60.8 cd/m². The simulated distances from the observer to the near and far ends of the plane were 571 cm and 4000 cm, respectively (the calculation of the scene dimensions was based on an eye height of 120 cm). Three red vertical poles were attached to the surface and formed an inverse L-shape (see Figure 1). The first pole was positioned close to the observers (“the front pole”), the second pole was positioned directly behind the first pole (“the back pole”), and the third pole was positioned either to the left or to the right side of the second pole (“the side pole”). The simulated depth interval between the front pole and the back pole was 12 m, subtending a visual angle of 3.39° and 2.18° when the front pole was 12 m and 16 m away from the observer, respectively. The locations of the front and back poles were fixed, whereas the side pole could be adjusted horizontally by the observer. On each trial, the height of each pole varied randomly between 24 cm and 44 cm and the width of each pole varied randomly between 3.5 cm and 8 cm, in order to prevent the use of size information as a depth cue. The initial position of the side pole varied randomly between 50 cm and 100 cm from the back pole. When motion parallax information was available, the whole scene oscillated horizontally back and forth at an average speed of 90 cm/s. The actual speed in each frame was determined by a sine-wave function. The duration of each cycle was 8 s (480 frames).
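The visual angles quoted for the depth interval can be checked with a short calculation. The sketch below is illustrative only and is not the authors' code. It assumes a flat surface 120 cm from the eye, takes the angle between the lines of sight to the bases of the front and back poles, and scales that angle by the approximately 19% lens magnification reported in the Apparatus section. The function name and the choice to scale the angle directly (a small-angle approximation) are assumptions; under them the calculation gives values close to the 3.39° and 2.18° reported for the 12 m interval.

```python
import math

EYE_HEIGHT_M = 1.2          # eye height stated in the text
LENS_MAGNIFICATION = 1.19   # approximate; assumed here to scale small angles linearly


def depth_interval_angle_deg(front_m, interval_m,
                             eye_height_m=EYE_HEIGHT_M,
                             magnification=LENS_MAGNIFICATION):
    """Visual angle (degrees) between the bases of the front and back poles.

    Hypothetical helper: angular elevation of each pole base is measured
    from the horizon, and the difference is scaled by the lens magnification.
    """
    near = math.atan2(eye_height_m, front_m)
    far = math.atan2(eye_height_m, front_m + interval_m)
    return math.degrees(near - far) * magnification


for front in (12, 16):
    angle = depth_interval_angle_deg(front, 12)
    print(f"front pole at {front} m: {angle:.2f} deg")
```

The same helper can be applied to the front pole distances and depth intervals listed in the Design section to see how quickly the projected interval shrinks with distance.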

Design

Four independent variables were manipulated: (1) surface type (ground or ceiling), (2) motion parallax information (present or absent), (3) simulated distance of the front pole to the observers (“front pole distance,” 8 m, 12 m, 16 m, or 20 m), and (4) simulated distance between the front and back poles (“depth interval,” 6 m or 8 m). The variable of motion parallax information was blocked and the order was counterbalanced across observers. In each block, the 16 combinations of surface type, front pole distance, and depth interval were each presented 6 times. The side pole was to the left or right of the back pole with equal probability. Four practice trials (2 trials for each surface) were inserted at the beginning of each block. The order of the trials for each observer in each block was randomized.
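The block structure described above can be sketched in a few lines. This is a minimal illustration, not the software used in the study: the names, the dictionary layout, and the use of a seeded random generator are all assumptions, and the four practice trials are left out.

```python
import itertools
import random

SURFACES = ("ground", "ceiling")
FRONT_POLE_DISTANCES_M = (8, 12, 16, 20)
DEPTH_INTERVALS_M = (6, 8)
REPLICATIONS = 6


def build_block(rng):
    """Return the randomized experimental trials for one motion-parallax block."""
    conditions = itertools.product(SURFACES, FRONT_POLE_DISTANCES_M, DEPTH_INTERVALS_M)
    trials = []
    for surface, front_m, interval_m in conditions:
        for _ in range(REPLICATIONS):
            trials.append({
                "surface": surface,
                "front_pole_m": front_m,
                "depth_interval_m": interval_m,
                # Side pole placed left or right with equal probability.
                "side": rng.choice(("left", "right")),
            })
    rng.shuffle(trials)  # trial order randomized within the block
    return trials


block = build_block(random.Random(0))
print(len(block))  # 2 x 4 x 2 conditions x 6 replications = 96
```

With two blocks (motion parallax present and absent), each observer would complete 192 experimental trials under this reading of the design.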

Apparatus

The displays were presented on a 21-inch (53 cm) flat screen CRT monitor with a pixel resolution of 1024 by 768, driven by a Dell Dimension XPS workstation running Windows XP Professional. The dimensions of the display on the monitor were 40.0 cm (W) × 30.0 cm (H), subtending a visual angle of 31.3° × 23.7°. The center of the monitor was 120 cm above the floor. A black viewing hood was placed in front of the monitor to cover the edges of the screen. A 19-cm-diameter glass collimating lens, which magnified the images by approximately 19%, was located between the observer and the monitor. The purpose of the collimating lens was to remove accommodation as a flatness cue and, thus, increase the perceived depth of the 3D scenes. The distance between the eyes and the collimating lens was approximately 10 cm and the distance from the eyes to the monitor was 85 cm. A chin rest was mounted at a position appropriate to this viewing distance. A Logitech Attack 3 joystick was used to control the position of the side pole.
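The reported display size in degrees follows from the screen dimensions, the viewing distance, and the lens magnification. The sketch below is an illustration under stated assumptions rather than the authors' calculation: it treats the lens as enlarging the image by a factor of 1.19 at the 85 cm viewing distance, which gives values close to the reported 31.3° × 23.7°. The function name is hypothetical.

```python
import math

VIEWING_DISTANCE_CM = 85.0
LENS_MAGNIFICATION = 1.19  # "approximately 19%"; assumed to scale image size


def visual_angle_deg(size_cm, distance_cm=VIEWING_DISTANCE_CM,
                     magnification=LENS_MAGNIFICATION):
    """Full visual angle (degrees) of an extent centered on the line of sight."""
    half_extent = magnification * size_cm / 2.0
    return 2.0 * math.degrees(math.atan2(half_extent, distance_cm))


width_deg = visual_angle_deg(40.0)
height_deg = visual_angle_deg(30.0)
print(f"{width_deg:.1f} x {height_deg:.1f} deg")
```

Without the magnification factor the same display would subtend roughly 26.5° × 20.0°, so the lens accounts for the difference from the reported values.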

Procedure

The experiment was run in a dark room. The observers viewed the display monocularly through the collimating lens with their head position fixed by a chin rest and one of their eyes (the weaker eye) covered by an eye patch. An eye height of 120 cm was used. The observers were instructed to use a joystick to adjust the position of the side pole such that the horizontal separation between the side pole and the back pole matched the perceived distance between the front pole and the back pole. Once satisfied with their response, the observer pressed a button on the joystick to proceed to the next trial. The judged distance between the side pole and the back pole was recorded.

Results and discussion

The aspect ratio between the adjusted horizontal extent and the simulated depth (the depth interval between the front and back poles) was measured. An accurate judgment would result in an aspect ratio of 1. Formally, the judgment is defined as

$$r = \frac{D'}{D}, \tag{1}$$

where r is the aspect ratio, D′ is the horizontal separation between the side and back poles, and D is the simulated depth interval between the front and back poles.

Overall, the aspect ratio for all observers varied from 0.20 to 0.33, suggesting a large compression of depth, a result consistent with previous research that used computer-generated scenes (Feria et al., 2003) and research conducted in real scenes (Beusmans, 1998, Figure 11). The aspect ratio for each observer in each condition was analyzed in a 2 (surface type) × 2 (motion parallax information) × 4 (front pole distance) × 2 (depth interval) analysis of variance (ANOVA). The main effect of surface type was significant (F(1, 11) = 18.83, p < 0.01). The average aspect ratio was 0.25 when the poles were located on a ground surface and 0.23 when the poles were located on a ceiling surface, suggesting that judged exocentric distance showed less distortion on a ground surface than on a ceiling surface. The main effect of motion parallax information was significant (F(1, 11) = 24.05, p < 0.01). According to this result, observers judged more depth with a moving scene (mean aspect ratio = 0.26) than with a stationary scene (mean aspect ratio = 0.23).

There were significant main effects of front pole distance (F(3, 33) = 28.08, p < 0.01) and depth interval (F(1, 11) = 66.92, p < 0.01), which were qualified by a significant 3-way interaction of surface type, front pole distance, and depth interval (F(3, 33) = 3.73, p < 0.05, see Figure 2). Specifically, the interaction between surface type and front pole distance was significant (F(3, 33) = 2.91, p < 0.05) when the depth interval was 6 m but was not statistically significant when the depth interval was 8 m (F(3, 33) = 2.68, p = 0.06). No other interactions reached significance (p > 0.05).

Figure 2. Aspect ratio of judged depth and simulated depth as a function of surface type, motion parallax, front pole distance, and depth interval from Experiment 1. The top and bottom panels are the results for motion parallax and stationary scene conditions, respectively. Error bars represent ±1 standard error.

As the front pole distance increased from 8 m to 20 m, the judged aspect ratio decreased from 0.28 to 0.22. If the observers had responded to the projected size (retinal images) rather than the simulated size of the depth interval, then the aspect ratio would have been 0.15 for the 8-m condition and 0.06 for the 20-m condition, respectively. Our results suggest that observers were responding to the simulated distance between the front pole and the back pole.
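The projected-size predictions of 0.15 and 0.06 can be recovered from simple perspective geometry. The sketch below is an illustration, not the authors' derivation: it assumes a pinhole projection with unit focal length, an eye height of 1.2 m, and a horizontal extent measured at the distance of the back pole. Under these assumptions the predicted ratio reduces to eye height divided by front pole distance and does not depend on the depth interval. The function name is hypothetical.

```python
def projected_aspect_ratio(front_m, interval_m, eye_height_m=1.2):
    """Aspect ratio expected if observers matched image-plane extents.

    Assumes a pinhole projection with unit focal length.
    """
    back_m = front_m + interval_m
    # Vertical image-plane extent between the bases of the front and back poles.
    image_depth_extent = eye_height_m / front_m - eye_height_m / back_m
    # Horizontal separation at the back pole's distance that projects
    # to the same image-plane extent.
    matched_horizontal_m = image_depth_extent * back_m
    return matched_horizontal_m / interval_m


for front in (8, 20):
    print(front, round(projected_aspect_ratio(front, 6), 2))
```

Both predictions fall well below the judged ratios of 0.28 and 0.22, which is the basis for concluding that observers were not simply matching retinal extents.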

Overall, the results are consistent with our prediction that judged depth on a ground surface was less compressed than on a ceiling surface. This ground surface advantage did not vary systematically as a function of distance in the scene. Our results are also consistent with previous studies showing a ground surface advantage in the perceptual organization of 3D scenes (e.g., McCarley & He, 2000).

Experiment 2

In Experiment 1, we found that the aspect ratio of an L-shape when observers adjusted the horizontal extent to match the perceived depth was smaller when viewing a ceiling surface than when viewing a ground surface, suggesting a difference in the compression of visual space for a ground as compared to a ceiling surface. These results, however, could be attributed to a difference in the perceived length in the frontal-parallel plane rather than a difference in the perceived depth when viewing the surfaces. In other words, it is possible that perceived depth was the same on ground and ceiling surfaces but that the horizontal extent (the horizontal separation between the back pole and the side pole) was perceived to be smaller on the ground surface than on the ceiling surface.

In Experiment 2, we examined this possibility with a bisection task. On each trial, three horizontal poles were positioned on either the ground or ceiling surface. The three poles were parallel to each other and separated in depth (see Figure 3). The observers adjusted the position of the middle pole to match the depth interval between the front and middle poles to the interval between the middle and back poles. If the results obtained in Experiment 1 were due to a difference in the perceived horizontal extent in the frontal-parallel plane, then we would expect similar bisection judgments when viewing ground and ceiling surfaces. On the other hand, if the results of Experiment 1 were due to a difference in perceived depth between the two surfaces, then a similar ground surface advantage should occur when performing a bisection task.

Figure 3. An example of the stimuli with a ground surface used in Experiment 2.