Transsaccadic Representation of Layout: What is the time course of Boundary Extension?

Christopher A Dickinson; Helene Intraub

doi:10.1037/0096-1523.34.3.543

Author manuscript; available in PMC: 2009 Sep 29.

Published in final edited form as: J Exp Psychol Hum Percept Perform. 2008 Jun;34(3):543–555. doi: 10.1037/0096-1523.34.3.543



PMCID: PMC2754043 NIHMSID: NIHMS142583 PMID: 18505322

Abstract

How rapidly does boundary extension (BE) occur? Across experiments, trials included a 3-scene sequence (325 ms/picture), masked interval, and repetition of one scene. The repetition was the same view or differed (more close-up or more wide-angle). Observers rated the repetition as same, closer, or more wide-angle than the original view on a 5-point scale. Masked intervals were 100, 250, 625, or 1000 ms in Experiment 1, and 42, 100, or 250 ms in Experiments 2 and 3. BE occurred in all cases: identical views were rated as too “close-up”, and distractor views elicited the rating asymmetry typical of BE (wider-angle distractors were rated as being more similar to the original than were closer-up distractors). Most important, BE was evident when only a 42-ms mask separated the original and test views. Experiments 1 and 3 included conditions eliciting a gaze shift prior to the rating test; this did not eliminate BE. Results show that BE is available soon enough and is robust enough to play an on-line role in view integration, perhaps supporting incorporation of views within a larger spatial framework.

Keywords: Scene perception, Transsaccadic memory, Visuospatial memory, Boundary extension, Spatial cognition


We can never see the surrounding visual world all at once. Instead, we must sample it a part at a time through successive movements of the eyes and head. An interesting aspect of memory for a single view of a scene is that it will often be remembered as having shown more of the scene than was available in the sensory information—observers remember seeing beyond the edges of the view. This is referred to as boundary extension (BE; Intraub & Richardson, 1989). Although BE is an error with respect to the stimulus, it provides a good prediction of the world beyond the view. For this reason, it has been suggested that BE might serve an adaptive function in scene representation by placing each view within its larger spatial framework (Intraub, 2002, 2007; Intraub, Bender, & Mangels, 1992).

This hypothesis has received support from both behavioral and neuroimaging data. Behavioral research has shown that BE is not present in memory for all pictures of objects (e.g., an object on a blank background), but only for those in which the background depicts part of the visual world (i.e., scene layout; Gottesman & Intraub, 2002, 2003; Intraub, Gottesman & Bills, 1998). A similar distinction between pictures that include scene layout and those that do not is reflected in the heightened neural responses of the parahippocampal place area (PPA) to pictures of scene layout (Epstein & Kanwisher, 1998). More recently, an fMRI study of brain activation in the presence of BE revealed that indeed PPA was highly activated when BE occurred (Park, Intraub, Yi, Widders, & Chun, 2007). These experiments indicate that BE is part of the representation of scene layout. The purpose of the present research is to determine the early time course of BE. Specifically, at what stage of processing does BE occur?

In the typical experiment, short series of photographs were presented for multi-second durations (e.g., 15 s each), and memory was tested minutes to 48 hours later (e.g., Candel, Merckelbach, & Zandbergen, 2003; Gottesman & Intraub, 2002; Intraub et al., 1992, 1998; Intraub & Richardson, 1989; Mathews & Mackintosh, 2004). Findings from these studies could suggest that BE is a long-term memory phenomenon. In fact, Koriat, Goldsmith, and Pansky (2000), in their review of memory errors, grouped BE with memory errors for text in which “...schematic knowledge is used to make inferences and suppositions that go beyond the actual input event...” (p. 494). They noted that schema-induced errors such as these tend to increase over time as memory becomes less detailed. If BE requires long retention times to occur, then it could not play an on-line role during visual scanning. It might be that at the earliest stages of processing (e.g., between fixations), visual memory for a just-fixated view is strong enough to maintain fairly veridical boundaries.

However, an experiment described in Previc and Intraub (1997) demonstrated, somewhat surprisingly, that BE did occur rapidly enough to be observed across a series of perception/action cycles during drawing. Observers viewed four photographs for 15 s each and then drew them from memory. Another group drew them from a projected image on a screen at the front of the room. As expected, BE occurred in the memory group. What was striking was that BE also occurred in drawings made by observers who could see the photographs. Does this mean that BE occurs during perception?

To answer this question it is important to consider the task carefully. While an observer is literally looking at a picture, he or she can see where the picture ends and can imagine what would likely exist in the world beyond the edges of the view. Both the sensory information and the expected layout beyond the edges are part of the observer's representation. Unlike well-known perceptual illusions (e.g., the Müller-Lyer illusion), BE does not occur while the stimulus is in view. However, when drawing, observers did not maintain fixation on the projected image. They shifted their gaze from the projected image to the paper on their desks, thus relying on memory of the view while they drew. They occasionally looked up to sample the image, then looked down and drew again, alternating between sensory perception and memory. This suggests that BE occurred within a few seconds after the sensory image was gone, while the observer was drawing.

Consistent with this observation, Intraub, Gottesman, Willey, and Zuk (1996) demonstrated that BE occurs for photographs presented in brief rapid serial visual presentation (RSVP) sequences when memory was tested for the last item in the sequence only 1 s later (see also Bertamini, Jones, Spooner, & Hecht, 2005). Given these results, two possibilities arise. First, BE might occur a second or more following offset of the stimulus, making it a very short-term memory error: available soon after perception, but not rapid enough to play a role in transsaccadic memory. Alternatively, BE might occur as soon as the sensory input is gone. Rather than occurring “in memory,” it might instead be part of the unfolding process of scene perception, which involves a rapidly changing cycle of sensory perception and memory.

If BE is available during the time between saccades (i.e., transsaccadic memory; see Irwin, 1991, 1993), then we would expect to see it under the following three conditions. First, it should occur following a brief glimpse of a scene (analogous to the duration of a single fixation). BE is known to occur following presentations as brief as 250 ms and 333 ms (Bertamini et al., 2005; Intraub et al., 1996) and following 500-ms presentations (Intraub, Hoffman, Wetherhold, & Stoehs, 2006).

Second, it should be evident following a gap in sensory input commensurate with a “typical” saccade (on the order of 30 to 50 ms; Rayner, 1998). However, retention times briefer than 1 s have not been tested. Third, BE would need to survive a gaze shift caused by a change in the position of the eyes or the head. Recent research shows that BE occurs following a single eye movement when memory is tested 2 s later (Intraub et al., 2006), but it is not known whether it occurs when tested immediately after a gaze shift. It is possible that the process of planning and executing a gaze shift might delay the onset of BE for a couple of seconds, thus preventing its inclusion in transsaccadic memory.

Might Early Memory Buffers Prevent BE?

It may be that early visual buffers maintain a fairly veridical representation of layout, essentially protecting the representation from distortions. Iconic memory has been categorized as a high-capacity, brief-duration, veridical buffer that is disrupted by both luminance and pattern masking. So, for example, if a single fixation on a scene were followed by a fixation on an empty region of space, then a veridical representation of the spatial expanse of that view might be maintained in an iconic representation (i.e., as informational persistence) for 100 to 300 ms (Irwin & Brown, 1987; Irwin & Yeomans, 1986).

What if, however, a single fixation on a view of a scene is followed by another fixation whose contents would disrupt an iconic representation, or, alternatively, an iconic representation is not maintained across a gaze shift? Visual short-term memory (VSTM) has been categorized as a longer lasting (i.e., multi-second) buffer that does not maintain a literal copy of the physical visual stimulus (i.e., it is not a point-by-point copy) and that is capacity-limited but not disrupted by visual masking (Irwin, 1991; Phillips, 1974). Numerous studies suggest that VSTM representations are not literal copies of the display (Gordon & Irwin, 1996; Henderson, 1994; Hollingworth, Hyun, & Zhang, 2005; Irwin, 1991, 1992; Irwin & Andrews, 1996; Olson & Jiang, 2004; Phillips, 1974). This does not imply, however, that VSTM representations contain no visual information. Here we ask whether VSTM might maintain a veridical representation for a brief interval following a picture's offset, thus preventing BE.

Might Gaze Shifts Delay BE?

If BE occurs rapidly enough to support view integration on-line, it would not be of use unless it could also survive a shift in gaze. In prior boundary extension research, stimulus and test views have always been presented in the same physical location. Might the demands of attention/motor systems engaged during gaze shifts early in processing either delay the onset of BE or cause memory for boundaries simply to be poor—causing errors but not the strong unidirectional error that characterizes BE? Numerous studies have demonstrated that gaze shifts suppress a variety of cognitive processes, whereas others appear to continue across them unimpeded (for a review, see Irwin & Brockmole, 2004). A brief examination of the known properties of transsaccadic memory might shed some light on whether BE might be included in transsaccadic memory.

Most experiments on transsaccadic memory have focused not on scenes as a whole, but on the properties of individual objects that are remembered across a saccade. In general, results suggest that representations of structural descriptions of objects are retained (Carlson-Radvansky & Irwin, 1995; Carlson-Radvansky, 1999; Verfaillie & De Graef, 2000; Verfaillie, De Troy, & Van Rensbergen, 1994), but there is also evidence for retention of specific visual information (object orientation: Henderson & Siefert, 1999, 2001; but not detailed object contours: Henderson, 1997).

Boundary extension, however, involves an extrapolation of layout. Although the literature on transsaccadic memory has not focused on scene layout, there have been several studies that have focused on retention of spatial relations. These provide evidence that spatial information is available in transsaccadic memory. It has been shown that information that specifies the structural relations of parts of a single object is included in a transsaccadic representation (Carlson-Radvansky & Irwin, 1995). In addition, information about the configuration, or the spatial relations among different objects, can also be represented (Carlson-Radvansky, 1999; Deubel, 2004; Germeys, de Graef, Panis, van Eccelpoel, & Verfaillie, 2004). The question we ask is whether this transsaccadic representation includes extrapolated layout (BE) or instead maintains a more veridical representation of the view.

The Current Experiments

To explore the early time course of BE and test its resiliency to shifts in gaze, in all three experiments we used Intraub et al.'s (1996) three-picture RSVP method. Three pictures were presented for 325 ms each in a continuous sequence followed by a masked retention interval and subsequent boundary memory test. There were three reasons for choosing this method. First, by embedding a target picture in a rapidly changing series, we could approximate the dynamic nature of visual scanning.¹ Second, the design allowed us to test boundary memory following presentations of 0, 1, or 2 intervening items, thus allowing us to determine if factors such as conceptual masking (Intraub, 1984; Potter, 1976) during successive presentation might influence BE. Third, the rapidity of input coupled with observer uncertainty about which picture would be tested would minimize the observer's ability to develop verbal strategies (e.g., “the man's head is .5 cm from the top”) for remembering boundary placement over the course of the session.

In all three experiments, on each trial, the observer was required to rate the repeated scene as showing the same view, a more close-up view, or a more wide-angle view than before on a 5-point scale (Intraub & Richardson, 1989). Across experiments, the interval between offset of the last picture and onset of the test picture was always masked, and it ranged from 42 ms (comparable to the duration of a saccade) to 1 s (to replicate earlier research). To test the effect of a shift in gaze on boundary extension, in Experiments 1, 3a, and 3b, test pictures were presented either in the same location as the RSVP sequence or to the left or right side of the screen.

Depending on the experiment, when a scene repeated, it could be the same, a more close-up view, or a more wide-angle view than one of the pictures in the presentation sequence. In this way, all the patterns of response that are diagnostic of boundary extension could be addressed following each masked interval. These patterns have been replicated in many studies (e.g., Bertamini et al., 2005; Gottesman & Intraub, 2002; Intraub et al., 1992, 1998; Intraub & Richardson, 1989). The three patterns of interest are as follows:

When the target picture and the test picture are identical close-ups, observers tend to reject the test picture as being the same, reporting instead that it is more close-up than the target picture. This was tested in Experiments 1–3.
Target pictures that are tight close-ups yield more boundary extension than wider angle views of the same scene; in fact, wider angle views tend to yield no directional distortion (see Gottesman & Intraub, 2002; Intraub et al., 1992; Intraub & Berkowits, 1996). This was tested in Experiment 2.
When a close-up is the target, and a wider view is presented at test, observers rate the pair as being more similar than when the reverse is the case. This is because BE for a closer target causes it to be remembered as looking more like the wider angle test picture. This asymmetry was also observed in neural responses to dissimilar pictures in scene selective brain regions (Park et al., 2007). This signature pattern was tested in Experiment 2.
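As a rough illustration, the three diagnostic patterns can be expressed as simple comparisons on the 5-point scale used here, where negative values mean the test picture was judged closer-up than the target. The sketch below is hypothetical: the function name `be_signatures` and its input format are our own, and it compares raw means only, whereas the experiments rely on confidence intervals and ANOVAs. The second pattern needs the WW condition, which only Experiment 2 included.

```python
from statistics import mean

def be_signatures(ratings: dict[str, list[float]]) -> dict[str, bool]:
    """Check the three boundary-extension signatures from per-condition ratings.

    HYPOTHETICAL helper: `ratings` maps a condition code ("CC", "WW", "CW", "WC")
    to boundary ratings on the -2..+2 scale (negative = judged closer-up).
    Compares raw means only; no inferential statistics.
    """
    m = {cond: mean(vals) for cond, vals in ratings.items()}
    return {
        # 1. Identical close-ups are judged "too close-up" (mean below zero).
        "cc_below_zero": m["CC"] < 0,
        # 2. Close-up targets yield more BE (more negative) than wide-angle targets.
        "cc_more_than_ww": m["CC"] < m["WW"],
        # 3. CW pairs are judged more similar (rating nearer 0) than WC pairs.
        "cw_wc_asymmetry": abs(m["CW"]) < abs(m["WC"]),
    }
```

Expressing the asymmetry with absolute values reflects the logic in the third pattern: BE makes a close-up target resemble a wider test view, so CW ratings drift toward "the same," while WC ratings move further from it.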

Experiment 1

In Experiment 1, BE was tested for a single picture immediately following a three-picture RSVP sequence and a masked interval of 1 s, 625 ms, 250 ms, or 100 ms. We selected the latter three intervals as coarse divisions of the 1-s interval after which BE is known to occur (Bertamini et al., 2005; Intraub et al., 1996). To provide the most sensitive test of BE, on half the trials the same close-up served as both target and test picture. Thus, in the critical case for testing time course (when the target picture is in the final serial position of the RSVP sequence), the same close-up view of a complex scene was interrupted by a masked interval as brief as 100 ms. To ensure that observers were focused on the task and were using the scale appropriately, the scenes on the remaining trials were divided into two sets: in one, a close-up target was tested with a more wide-angle view of the same scene; in the other, a wider angle view was tested with a close-up version.

Finally, to determine if BE survives a gaze shift between the target picture and the test picture, the test picture appeared equally often in the same location as the target (center screen) or shifted to the right or left side of the screen. If BE does not occur at intervals briefer than 1 s, or if a gaze shift disrupts the extrapolation of expected surrounding space, then BE cannot play an on-line role in view integration during active scanning.

Method

Participants

A total of 144 University of Delaware undergraduates, fulfilling a requirement for an introductory psychology course, participated in the experiment. All reported having normal or corrected-to-normal vision and normal color vision.

Apparatus

All stimuli were presented on a 21” flat-screen CRT monitor (32-bit color, 1024 × 768 pixels, 120-Hz refresh rate) driven by a video card with 128 MB of video memory. Stimulus presentation was controlled by a Pentium-based PC running Microsoft Windows XP. The software was based on a template program written in C and supplied by SR Research Inc.; it used Simple DirectMedia Layer (SDL) v.1.2.9 to access the video hardware. The viewing distance was approximately 80 cm, and on average, pictures subtended 9.2° × 10.2° of visual angle (widths ranged from 5.9° to 13.7°; heights ranged from 8.9° to 10.4°).
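Visual angle and on-screen size are related by θ = 2·arctan(s / 2d), where s is the extent on the screen and d is the viewing distance. A minimal Python sketch, assuming the extent is centred on the line of sight and using the approximate 80-cm distance reported above, converts between the two; the helper names are illustrative, not part of the authors' software.

```python
import math

VIEWING_DISTANCE_CM = 80.0  # approximate viewing distance from the Apparatus section

def visual_angle_deg(size_cm: float, distance_cm: float = VIEWING_DISTANCE_CM) -> float:
    """Full visual angle (degrees) subtended by an extent centred on the line of sight."""
    return math.degrees(2 * math.atan(size_cm / (2 * distance_cm)))

def size_for_angle_cm(angle_deg: float, distance_cm: float = VIEWING_DISTANCE_CM) -> float:
    """Inverse conversion: on-screen extent (cm) needed to subtend angle_deg."""
    return 2 * distance_cm * math.tan(math.radians(angle_deg) / 2)

# Approximate physical widths for the reported range of picture widths.
for deg in (5.9, 9.2, 13.7):
    print(f"{deg:>5.1f} deg -> {size_for_angle_cm(deg):5.2f} cm")
```

Under these assumptions, the average picture width of 9.2° corresponds to roughly 12.9 cm on the screen.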

Stimuli

Stimuli were 96 color photographs that depicted people engaging in various activities; for example, a football player kicking a football, a man tossing a pizza, and a couple dancing. Some of the images were copied, with permission, from The Big Box of Art database (Hemera Technologies, Inc.); others were downloaded from the Internet. All stimuli were presented on a gray background. Of the 96 pictures, 32 served as targets (i.e., to-be-tested pictures: 2 in the practice trials and 30 in the experimental trials). The other 64 pictures served as fillers in the presentation sequence (i.e., the two non-target pictures in the RSVP triad). A given scene (close or wide version) was always presented with the same filler items. Each target scene could be presented in its close-up version or its more wide-angle version, as is illustrated in Figure 1.² Close-ups were created by enlarging the wide-angle version by 8% to 21% in area and then cropping the enlarged version to be the same size as the original. Thus, both versions were the same size, but the wide-angle version showed a larger amount of background surrounding the main object or objects.

Figure 1. A close-up version of a target picture (left) and a wider angle version of the same target picture (right).
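Because the close-up manipulation is specified as an increase in area, the linear magnification is its square root: an 8% increase corresponds to a scale factor of about 1.039, and a 21% increase to exactly 1.1. The sketch below computes that factor and a crop box that returns the enlarged image to the original size. It assumes a centred crop, which the text does not specify, and `closeup_geometry` is a hypothetical helper rather than the procedure the authors used.

```python
import math

def closeup_geometry(width: int, height: int, area_increase: float) -> tuple[float, tuple[int, int, int, int]]:
    """Return the linear scale factor and a crop box for making a close-up.

    `area_increase` is fractional (0.08 for 8%). Scaling both dimensions by
    sqrt(1 + area_increase) enlarges the area by exactly that fraction.
    ASSUMPTION: the crop is centred; the paper does not say where it was taken.
    """
    scale = math.sqrt(1.0 + area_increase)
    new_w, new_h = round(width * scale), round(height * scale)
    left = (new_w - width) // 2
    top = (new_h - height) // 2
    # Crop box (left, top, right, bottom) in the enlarged image's pixel coordinates,
    # chosen so the close-up has the same pixel size as the wide-angle original.
    return scale, (left, top, left + width, top + height)
```

Since the crop keeps 1/(1 + area increase) of the wide-angle view's area, the close-ups would show from roughly 93% (8% condition) down to about 83% (21% condition) of the wide-angle content.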

Design and Procedure

A depiction of a trial sequence is shown in Figure 2. Each self-initiated trial began with a central fixation point that remained on screen for 500 ms. The RSVP sequence followed (325 ms/picture) in the center of the screen. The target picture appeared equally often in serial position 1, 2, or 3. Observers did not know which picture would ultimately be tested. The RSVP sequence was immediately followed by a masked interval of 100, 250, 625, or 1000 ms (between groups). For a target picture that appeared in serial position 3, the retention interval would be the same as the duration of the masked interval; for target pictures presented earlier in the sequence, the retention interval would be equal to the duration of the masked interval plus the duration of the picture or pictures that followed it.
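The retention-interval rule just described is simple arithmetic on the 325-ms presentation rate. This Python sketch (the function name is hypothetical) computes the interval between target offset and test onset for each serial position; it does not model the extra time needed to move the eyes on trials where the test picture appears off center.

```python
PICTURE_MS = 325  # duration of each picture in the RSVP sequence

def retention_interval_ms(serial_position: int, masked_interval_ms: int, n_pictures: int = 3) -> int:
    """Time from target offset to test-picture onset.

    A target in the last serial position is followed only by the masked
    interval; earlier targets are also followed by the remaining pictures.
    """
    if not 1 <= serial_position <= n_pictures:
        raise ValueError("serial_position must be between 1 and n_pictures")
    pictures_after_target = n_pictures - serial_position
    return masked_interval_ms + pictures_after_target * PICTURE_MS

# Retention intervals for serial positions 1, 2, and 3 in each Experiment 1 group.
for mask in (100, 250, 625, 1000):
    print(mask, [retention_interval_ms(pos, mask) for pos in (1, 2, 3)])
```

For the 100-ms group, for example, targets in serial positions 1, 2, and 3 were tested 750, 425, and 100 ms after their offset.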

Figure 2. A schematic illustration of a trial sequence with the test picture in the center of the screen (maintain-fixation condition). Note that, unlike in the illustration, the actual stimuli did not fill the screen.

The mask was a pattern mask that filled the screen and had a dynamically changing central portion. As is shown in Figure 2, the central component was a schematic face that subtended 5.5° × 5.5° of visual angle. A sequence of 4 different faces was shown, and each face was visible for 150 ms or until the masked interval ended, whichever came first. The repeated onsets were intended to minimize implicit verbalization and to keep the observers’ eyes on the center of the screen before the test picture appeared. At the end of the masked interval, the test picture was shown either in the display's center (maintain-fixation trials) or to the left or right of center (shift-gaze trials), shifted by an average of 5.8° (range: 4.2° to 7.4°). It appeared equally often at each location, without warning. On shift-gaze trials, the interval between the offset of the final picture in the RSVP sequence and the observer's first fixation on the test picture also included the time the observer needed to shift his or her gaze to the test picture after its onset. On maintain-fixation trials, this interval was equal to the masked interval.

Observers were asked to rate whether the test picture showed the same view, a more close-up view, or a more wide-angle view than before using a 5-point Likert scale. The alternatives (and their corresponding numerical values) were “much closer up (–2),” “a little closer up (–1),” “the same (0),” “a little farther away (1),” and “much farther away (2).” The test picture was visible until the observer clicked one of these choices with the mouse. Observers then indicated how confident they were about their response by clicking “sure (3),” “pretty sure (2),” “not sure (1),” or “don't remember that picture (0).” The fixation point for the next trial then appeared and the observer initiated the trial by clicking the mouse.
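The verbal response options map directly onto the numerical codes given above. A minimal sketch of that scoring step follows. The dictionaries mirror the labels in the text, while `score_trials` and its input format (pairs of rating and confidence labels) are assumptions for illustration. It also applies the exclusion rule reported in the Results, dropping trials on which the observer chose “don't remember that picture.”

```python
# Response labels and numeric codes as listed in the Procedure.
RATING_SCALE = {
    "much closer up": -2,
    "a little closer up": -1,
    "the same": 0,
    "a little farther away": 1,
    "much farther away": 2,
}
CONFIDENCE_SCALE = {
    "sure": 3,
    "pretty sure": 2,
    "not sure": 1,
    "don't remember that picture": 0,
}

def score_trials(trials: list[tuple[str, str]]) -> list[int]:
    """Convert (rating_label, confidence_label) pairs into numeric boundary ratings.

    HYPOTHETICAL input format. Trials with confidence 0 ("don't remember that
    picture") are excluded, mirroring the rule reported in the Results.
    """
    return [RATING_SCALE[rating] for rating, conf in trials if CONFIDENCE_SCALE[conf] > 0]
```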

There were a total of 30 trials. On 15 trials the target and test picture were identical close-ups (CC trials). On 14 trials the target and test picture were different views of the same scene; half the time a close-up was the target and the test picture was a more wide-angle view (CW trials) and half the time the reverse was true (WC trials). Each target picture was tested in only one of these conditions across observers. The 30th trial was always a “dummy” trial in which the target picture was tested with the same view. This trial had to be added to allow us to show each target at each serial position equally often across observers, while at the same time having an equal number of CW and WC trials. Responses made on this trial were not included in the analyses.
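The trial counts above (15 CC, 7 CW, 7 WC, and 1 dummy trial) sum to 30, and each target had to appear in every serial position equally often across observers. One way to achieve that balance, shown purely as a hypothetical sketch because the authors do not describe their exact assignment scheme, is to give each target a fixed condition and rotate its serial position across three groups of observers.

```python
from collections import Counter

# Trial counts per condition for one observer (30 trials in total).
CONDITION_COUNTS = {"CC": 15, "CW": 7, "WC": 7, "dummy": 1}
N_POSITIONS = 3  # three-picture RSVP sequence

def build_trials(group: int) -> list[tuple[int, str, int]]:
    """Return (target_index, condition, serial_position) for one observer group.

    HYPOTHETICAL scheme: each target keeps a fixed condition, and its serial
    position is rotated across three observer groups (group = 0, 1, 2), so
    every target occupies positions 1, 2, and 3 once across the groups.
    """
    conditions = [cond for cond, n in CONDITION_COUNTS.items() for _ in range(n)]
    return [
        (target, cond, (target + group) % N_POSITIONS + 1)
        for target, cond in enumerate(conditions)
    ]

# Condition and serial-position counts for the first observer group.
trials = build_trials(0)
print(Counter(cond for _, cond, _ in trials))
print(Counter(pos for _, _, pos in trials))
```

With 30 trials, this rotation also places exactly 10 targets in each serial position within every group.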

Results and Discussion

Observers were rather confident of their ratings; on average, 21%, 57%, and 20% of their responses were rated as “sure,” “pretty sure,” and “not sure,” respectively. They reported not recognizing the test picture on only 2% of trials, and these trials were excluded from analysis. A 4 × 2 (masked interval × side of display) mixed-design ANOVA comparing observers’ mean boundary ratings for test pictures presented on the left vs. right side of the screen revealed no main effect of location, F(1, 140) = 1.73, n.s., and no interaction with the masked interval, F < 1. Observers’ mean boundary ratings were therefore collapsed across the left and right locations.

Critical CC trials: Targets and test pictures are identical close-ups

Figure 3 (left) shows the mean boundary rating at each serial position (collapsed over the spatial position of the test picture). The 95% confidence intervals revealed that BE occurred at each serial position for each masked interval. Thus boundary extension occurred even at the briefest interval tested—when the final picture in a sequence was repeated only 100 ms later. To determine if the size of the BE effect was influenced by the duration of the masked interval, a 3 × 4 (serial position × retention interval) mixed-design ANOVA was conducted. It revealed no main effect of the masked interval's duration, F(3, 140) = 1.16, n.s., no effect of serial position, F < 1, and no interaction, F < 1. The lack of a serial position effect shows that the onset of new meaningful pictures during RSVP did not disrupt incorporation of the extrapolated region into the spatial representation of the scenes—that is, there was no effect of conceptual masking (Intraub, 1984; Potter, 1976).

Figure 3. Observers’ mean boundary ratings for each serial position of the target (collapsed across spatial position of the test picture) are shown on the left, and their mean boundary ratings for each spatial position of the test picture (center, side; collapsed across serial position of the target) are shown on the right, for each retention interval (Experiment 1). All error bars show the 95% confidence interval of the mean. Means that are significantly less than zero reflect boundary extension; means that are significantly greater than zero reflect boundary restriction.
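The criterion in the caption (a mean reliably below zero indicates boundary extension) amounts to asking whether the whole 95% confidence interval lies below zero. The sketch below uses the standard library's `statistics.NormalDist` for a normal-approximation interval; a t-based interval, which suits samples of this size better, would be slightly wider. The function names are hypothetical.

```python
from statistics import NormalDist, mean, stdev

def mean_ci95(values: list[float]) -> tuple[float, float, float]:
    """Return (mean, lower, upper) for an approximate 95% CI of the mean.

    Uses the normal quantile (about 1.96) rather than a t quantile, so the
    interval is slightly too narrow for small samples.
    """
    m = mean(values)
    z = NormalDist().inv_cdf(0.975)
    half_width = z * stdev(values) / len(values) ** 0.5
    return m, m - half_width, m + half_width

def classify_distortion(values: list[float]) -> str:
    """Label a set of per-observer mean ratings on the -2..+2 scale."""
    m, lower, upper = mean_ci95(values)
    if upper < 0:
        return "boundary extension"    # whole CI below zero: test judged too close-up
    if lower > 0:
        return "boundary restriction"  # whole CI above zero: test judged too wide-angle
    return "no reliable distortion"
```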

Figure 3 (right panel) shows the mean boundary rating as a function of the spatial location of the test picture for each masked interval (collapsed over serial position). As shown by the 95% confidence intervals in the figure, boundary extension occurred whether or not a shift in gaze intervened between presentation and test. Observers were never forewarned about the location of the test picture, yet when it shifted away from center screen, the concomitant gaze shift had no inhibitory effect on BE. The expanded representation of layout clearly survived the shift in attention and subsequent gaze shift, suggesting that BE is available during the time course of transsaccadic memory.

CW and WC Trials

The mean boundary ratings for CW trials and for WC trials (collapsed over serial position and spatial position) for each masked interval are shown in Figure 4 (left and right panels, respectively). Observers were able to recognize the presence of distractors, and were clearly using the scale appropriately. Consistent with the occurrence of BE, the right panel shows that observers were quite good at recognizing when the test picture was more close-up than the target, whereas the left panel shows that more wide-angle test pictures were sometimes mistaken as being the same as the target.

Figure 4. Observers’ mean boundary ratings (collapsed across both serial position of the target and spatial position of the test picture) for trials on which a close-up target was tested with a wide-angle view (left) and for trials on which a wide-angle target was tested with a close-up view (right), for each retention interval (Experiment 1).

Experiment 2

Experiment 1 showed that a 100-ms interruption was sufficient to elicit BE. In Experiment 2 we decreased the briefest interval further to 42 ms (commensurate with a saccade); intervals tested were 250 ms, 100 ms, and 42 ms. To enhance the observer's ability to retain a veridical representation, the RSVP sequence and test picture were always in the same location (center screen). In addition, we sought to obtain converging evidence for BE through implementation of a design used in many prior BE studies (e.g., Intraub et al., 1998; Intraub & Richardson, 1989), in which targets were either close-up or wide-angle views of a scene, and the test picture was either the same view as the target or its complement. This yielded four different test conditions: 1) Close-up view tested with the same close-up view (CC trials), 2) Wide-angle view tested with the same wide-angle view (WW trials), 3) Close-up view tested with the wide-angle version of the scene (CW), and 4) Wide-angle view tested with the close-up version of the scene (WC). Scenes were counterbalanced across these four conditions. In this way we could determine if all three patterns that are diagnostic of boundary extension (i.e., boundary extension for CC trials, little or none for WW trials, and a CW–WC asymmetry, as discussed in the introduction) would occur.
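Because the display refreshed at 120 Hz (about 8.33 ms per frame), nominal durations are realized as whole numbers of frames. Assuming each duration was rounded to the nearest frame (the text does not say how timing was implemented), the 42-ms interval corresponds to 5 frames, or about 41.7 ms, and the 325-ms presentation rate to 39 frames. A small sketch of that conversion, with hypothetical names:

```python
REFRESH_HZ = 120                 # monitor refresh rate from the Apparatus section
FRAME_MS = 1000 / REFRESH_HZ     # duration of one refresh frame (about 8.33 ms)

def to_frames(duration_ms: float) -> int:
    """Nearest whole number of refresh frames for a nominal duration (ASSUMED rounding)."""
    return round(duration_ms / FRAME_MS)

# Nominal durations used across the experiments and their frame-quantized values.
for nominal in (42, 100, 250, 325, 625, 1000):
    n = to_frames(nominal)
    print(f"{nominal:>4} ms -> {n:>3} frames = {n * FRAME_MS:7.2f} ms")
```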