Journal of Vision. 2014 Oct 7;14(12):3. doi: 10.1167/14.12.3

Changing viewer perspectives reveals constraints to implicit visual statistical learning

Yuhong V Jiang 1, Khena M Swallow 2
PMCID: PMC4189525  PMID: 25294640

Abstract

Statistical learning—learning environmental regularities to guide behavior—likely plays an important role in natural human behavior. One potential use is in search for valuable items. Because visual statistical learning can be acquired quickly and without intention or awareness, it could optimize search and thereby conserve energy. For this to be true, however, visual statistical learning needs to be viewpoint invariant, facilitating search even when people walk around. To test whether implicit visual statistical learning of spatial information is viewpoint independent, we asked participants to perform a visual search task from variable locations around a monitor placed flat on a stand. Unbeknownst to participants, the target was more often in some locations than others. In contrast to previous research on stationary observers, visual statistical learning failed to produce a search advantage for targets in high-probability regions that were stable within the environment but variable relative to the viewer. This failure was observed even when conditions for spatial updating were optimized. However, learning was successful when the rich locations were referenced relative to the viewer. We conclude that changing viewer perspective disrupts implicit learning of the target's location probability. This form of learning shows limited integration with spatial updating or spatiotopic representations.

Keywords: statistical learning, visual attention, spatial updating, viewpoint specificity

Introduction

The human mind's ability to extract and use regularities in complex environments, often in the absence of awareness, is stunningly powerful. From language acquisition to visual perception (Fiser & Aslin, 2001; Reber, 1993; Saffran, Aslin, & Newport, 1996), statistical learning has been characterized as ubiquitous (Turk-Browne, 2012), powerful (Reber, 1993; Stadler & Frensch, 1998), and useful for perceptual and attentive processing (Brady & Chun, 2007; Chun & Jiang, 1998; Goujon, Brockmole, & Ehinger, 2012; Kunar, Flusberg, Horowitz, & Wolfe, 2007; Zhao, Al-Aidroos, & Turk-Browne, 2013). These findings suggest that statistical learning is a critical factor in perceiving and adapting to environmental regularities. Yet only a few studies have directly tested the idea that implicit visual statistical learning (VSL) allows mobile observers to extract visual statistics that are environmentally stable. This study examines the roles of explicit awareness, spatial updating, and reference frames in the ability of mobile observers to learn statistical regularities in the environment.

Perhaps because it can be acquired implicitly and is observed in a variety of domains, statistical learning is often considered an evolutionarily old capability (Reber, 1993). As such, the suggestion that it facilitates basic survival behavior (such as visual search) is appealing. The ability to rapidly learn and use knowledge of where valuable resources are located should be important for survival, as it could optimize effort allocation (Chukoskie, Snider, Mozer, Krauzlis, & Sejnowski, 2013; Smith, Hood, & Gilchrist, 2010). Consistent with this proposal, when a visual search target is more often found in some screen locations than others, people prioritize the “rich” locations even though they are unable to explicitly report where those locations are (Geng & Behrmann, 2005; Jiang, Swallow, Rosenbaum, & Herzig, 2013; Umemoto, Scolari, Vogel, & Awh, 2010).

However, many search tasks involve viewer movements, changing the locations of the rich regions relative to the observer. For VSL to facilitate search in moving observers, it must overcome a difficult computational challenge: representing important locations in a manner that allows them to consistently influence behavior from multiple perspectives. One solution to this problem could be the formation of a map that codes the rich locations relative to landmarks in the external environment (a “spatiotopic map” or other environment-centered coding; Burr & Morrone, 2012). Alternatively, locations may be coded relative to the viewer but updated or remapped following viewer locomotion or eye movements (Cavanagh, Hunt, Afraz, & Rolfs, 2010; Colby & Goldberg, 1999; Wang & Spelke, 2000; Wurtz, 2008). Both solutions are evidently used to code space in navigation and localization tasks (Burr & Morrone, 2012; Wang & Spelke, 2000; Wurtz, 2008). However, for VSL to facilitate search in moving observers, implicit learning must be integrated with spatiotopic representations or with spatial updating mechanisms. Several recent studies have examined how VSL is used when people move through space. These studies have produced inconsistent findings regarding whether implicit VSL can facilitate search when people search from variable perspectives.

Participants in one recent study searched for a target character among distractor characters on a monitor that was laid flat on a stand (Jiang, Swallow, & Capistrano, 2013). Unbeknownst to the participants, the target was more often in one visual quadrant (50%) than in any one of the other three quadrants (17%). Despite being unable to report which quadrant was more likely to contain the target, participants who always searched from the same position found the target faster when it was in the high-probability, rich quadrant rather than a low-probability, sparse quadrant (Figure 1). This pattern changed when participants searched from random locations around the monitor on each trial. Under these conditions, participants failed to prioritize the target-rich quadrant. They also lacked explicit awareness about where the target-rich quadrant was. Movement itself did not disrupt learning: when participants moved between trials but always returned to the same starting position, they had no difficulty prioritizing the target-rich quadrant. These findings showed that changing the viewer's perspective is detrimental to the acquisition of environmentally stable visual statistics.

Figure 1.

(A) Experimental setup. Participants searched for a target on a monitor that was laid flat on a stand. (B) A sample search display from Experiment 1. The green footprint indicates where participants should stand for that trial (in the actual experiment, the footprint preceded, rather than overlaid, the search display). (C) Results from a previous study on observers who always searched from the same position. Adapted from Jiang, Swallow, and Capistrano (2013), Copyright the Association for Research in Vision and Ophthalmology (ARVO©).

Two other studies provide suggestive evidence of environment-centered VSL, however. In one study, participants searched for a hidden target light embedded in the floor (Smith et al., 2010). When participants reached a light, they switched it on to discover whether it was the target (defined as a particular color). The hidden target was more likely to be in one side of the room (80%) than the other (20%). When their starting position was fixed, participants were able to use this fact to more quickly find a target in the rich side of the room. They showed no learning, however, when their starting position was random and all lights had the same color, suggesting that changes in viewpoint interfered with learning. This was not always the case, however. If different colored lights marked the two sides of the room, participants were able to prioritize the rich side. In this situation, many participants spontaneously reported noticing that the target was unevenly distributed. Another study asked participants to find a coin in a large (64 m²) outdoor environment (Jiang, Won, Swallow, & Mussack, in press). The coin was more often placed in one region of the environment than in other regions. Participants were able to prioritize the target-rich region. Much like participants in the study by Smith et al. (2010), participants in the outdoor task were highly accurate in identifying the rich region. These data showed that mobile participants were sometimes capable of acquiring environmental regularities. However, such learning was accompanied by explicit awareness of what was learned.

The contradiction between the three studies reviewed above raises questions about when and why implicit VSL is insensitive to environmentally stable visual statistics. To address this question, the current study systematically examines the roles of explicit awareness, spatial updating, and reference frames in one form of VSL: location probability learning. We selected location probability learning as our testing paradigm because it is considered an important mechanism for foraging and visual search (Chukoskie et al., 2013; Smith et al., 2010). In addition, spatial location is the key element in location probability learning, making this paradigm ideal for examining the spatial reference frame of implicitly learned visual statistics. Finally, the large effect size of location probability learning under fixed-viewing conditions facilitates the interpretation of potential null results. At the end of the article, we will discuss the generalizability of our findings to other forms of VSL.

First, we tested the hypothesis that the implicit VSL of the target's likely location in a visual environment is more successful if conditions for spatial updating are optimized. In the study by Jiang, Swallow, and Capistrano (2013), participants moved to a random side of the search space on each trial. These unpredictable changes in viewer perspective could have increased the difficulty of spatial updating over the course of the task. Moreover, the visual environment was relatively sparse, reducing the likelihood that landmarks could be used to form an environment-centered reference frame. One goal of the current study is to optimize conditions for environment-centered VSL. To this end, participants made small, predictable perspective changes from one trial to the next (Experiment 1). In addition, we compared performance in a visually rich map search task (Experiment 2) with that in a visually sparse letter search task.

Second, we tested the role of explicit awareness in learning environmental regularities. Although two studies have shown some evidence for environment-centered learning (Jiang et al., in press; Smith et al., 2010), neither examined the relationship between explicit awareness and environment-centered learning. In Experiment 3, we therefore used a task in which participants would be likely to acquire various levels of explicit awareness of the visual statistics. If explicit awareness is important for acquiring environment-centered learning, then participants who showed greater awareness should also show more learning.

Finally, previous studies had little to say about why implicit location probability learning was insensitive to environment-centered visual statistics. One possibility is that this type of learning is intrinsically viewer centered. If this is the case, then visual statistics that are referenced relative to the viewer should be readily acquired. Alternatively, if other factors (such as disorientation) interfered with learning, then moving observers should be unable to acquire any type of visual regularities, including viewer-centered visual statistics. A third goal of this study is to test the source of failure for implicit learning (Experiment 4).

Experiment 1

To examine whether implicit VSL can result in environment-centered learning in mobile observers, participants in Experiment 1 changed their standing position on a trial-by-trial basis in a visual search task. They searched for a rotated letter target (T or L) among symbol distractors (distorted +) and reported whether it was a T or an L. There was one target on each trial. The items were presented on a monitor that was laid flat on a stand (Figure 1A). At the beginning of each trial, participants were cued to stand at one position around the stand (Figure 1B). Experiment 1A served as a replication of the study by Jiang, Swallow, and Capistrano (2013) and also enabled cross-experiment comparisons. In this experiment, the standing position was chosen randomly from the four sides of the monitor. Consequently, the participants' viewpoint could change 0°, 90° clockwise or counter-clockwise, or 180° from one trial to the next. The goal of Experiment 1B was to optimize conditions for spatial updating. In this experiment, the standing position changed in 30° increments along a single direction around the monitor (e.g., clockwise). Perspective change was therefore small and predictable. The two versions of the experiment were otherwise identical.

We divided the experiment into 20 blocks of trials. In the first 16 blocks (mobile phase), across multiple trials, the target was more often located in one target-rich quadrant than in any one of the target-sparse quadrants. The high-probability, target-rich locations were stable within the environment. However, because the standing position changed from one trial to the next, the location of the rich quadrant relative to the participant was variable. Note that changes in perspective occurred between trials rather than during a trial. In the last four blocks (stationary phase), the target was equally likely to appear in any quadrant. In addition, participants always searched from the same position to minimize interference from movements. If an attentional bias toward the target-rich quadrant had developed in the mobile phase, then it should manifest as a persisting preference for the (previously) target-rich quadrant in the stationary phase.

Participants had full access to stable environmental landmarks in the testing room. Room furniture, an experimenter, and a lamp at one corner of the room were constantly in view. One side of the computer monitor was colored red to provide an additional landmark that was readily visible during search. These cues could be used to code the target-rich locations in an environment-centered representation or to update viewer-centered representations following movement. If implicit VSL extracts regularities in an environment-centered fashion, then it should result in faster search response time (RT) when the target appears in the rich quadrant rather than the sparse quadrants. In contrast, if implicit VSL is viewpoint specific, then it may fail to produce a search advantage in the rich quadrant. Finally, it is possible that a search advantage in the rich quadrant would be found when the perspective change is small and predictable (Experiment 1B) but not when it is large and unpredictable (Experiment 1A). Such findings would suggest that implicit VSL can support environment-centered learning if conditions for spatial updating are optimized.

Method

Participants

Participants in all experiments reported in this study were college students between the ages of 18 and 35 years. They were naïve to the purpose of the study, had normal vision, and received $10/hour or extra course credit for their time. No participants performed more than one experiment.

There were 16 participants in each of Experiments 1A and 1B. The estimated statistical power was greater than 0.99 in detecting an effect size as large as those found in previous studies on stationary observers (Cohen's d = 2.93; Jiang, Swallow, & Capistrano, 2013, experiment 3).
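As a rough check on this power estimate, the following sketch computes power for a within-subject (paired) t test with n = 16, α = 0.05, and the reported effect size. It is our illustration, not the authors' analysis; the exact test assumed by the original power calculation is not specified in the text.

```python
# Minimal sketch: power to detect Cohen's d = 2.93 with n = 16,
# assuming a paired/one-sample t test at alpha = .05.
from statsmodels.stats.power import TTestPower

power = TTestPower().power(effect_size=2.93, nobs=16, alpha=0.05,
                           alternative="two-sided")
print(f"Estimated power: {power:.4f}")  # effectively 1.0 (> 0.99)
```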

Equipment

A 17-in. touchscreen monitor (1200 × 860 pixels resolution) was placed flat on a 35-in.-tall stand (Figure 1A). Tape on the floor marked four sides of the stand. Participants responded with a wireless mouse. A 25-watt lamp illuminated the room. Viewing distance varied according to the participant's height and was approximately 55 to 90 cm.

Stimuli

Each display contained 12 items (one target and 11 distractors) placed in randomly selected locations in an invisible 10 × 10 matrix (19 × 19 cm). The items were white and presented against a black background. Three items were placed in each quadrant. The target was either a T or an L, and the distractors were distorted plus symbols (+; all items 1.3 × 1.3 cm). All items had a random orientation of 0°, 90°, 180°, or 270°. A red bar (1.3 × 19 cm) on one side of the display provided a consistent landmark (Figure 1B).
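As an illustration only (the quadrant labels, grid indexing, and function name below are our own assumptions, not the authors' code), one way to generate such a display is to sample three cells per quadrant from the 10 × 10 grid and assign the target to the designated quadrant:

```python
import random

GRID = 10  # 10 x 10 invisible placement grid

def place_items(target_quadrant):
    """Place 1 target and 11 distractors, 3 items per quadrant, no overlap."""
    cells = {"NW": [(r, c) for r in range(5) for c in range(5)],
             "NE": [(r, c) for r in range(5) for c in range(5, GRID)],
             "SW": [(r, c) for r in range(5, GRID) for c in range(5)],
             "SE": [(r, c) for r in range(5, GRID) for c in range(5, GRID)]}
    items = []
    for quad, quad_cells in cells.items():
        for i, cell in enumerate(random.sample(quad_cells, 3)):
            role = "target" if (quad == target_quadrant and i == 0) else "distractor"
            items.append((role, cell, random.choice([0, 90, 180, 270])))
    return items
```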

Design and procedure

After 10 trials of practice involving random standing positions and random target locations, participants completed 20 blocks of experimental trials, with 24 trials per block. Before each trial, a cue (a green footprint icon in Experiment 1A, or an arrow in Experiment 1B) on the monitor indicated the position that participants should move to (Figure 1B). This position changed from trial to trial in the first 16 blocks but remained the same in the last four blocks. In Experiment 1A, the standing position cue changed randomly to one of four equidistant locations around the monitor. Therefore, from one trial to the next, the participants' viewpoint changed 0°, 90° clockwise or counter-clockwise, or 180°. In Experiment 1B, the standing position cue changed in 30° increments along a consistent direction, clockwise for half of the participants and counter-clockwise for the other half. An experimenter stayed in the room to monitor compliance.

Once in position, participants touched a square in the middle of the monitor to initiate a trial. The touch response required eye-hand coordination and ensured that the eye position returned to the center of the display. The search display appeared 200 ms later and remained in view until participants clicked the mouse to report whether the target was a T or an L. Trials that lasted more than 10 s were considered outliers, which happened on less than 1% of trials in all experiments except Experiment 3. The display was erased after the response. A tone provided feedback about response accuracy.

In the first 16 blocks, the target appeared in one target-rich quadrant on 50% of the trials and in any of the other three target-sparse quadrants on 16.7% of the trials. Which quadrant was rich was counterbalanced across participants but remained the same for a given participant. The target-rich quadrant was environmentally stable and did not change when participants moved to different standing positions. In the last four blocks, the target appeared in each quadrant on 25% of the trials. Participants were not informed about where the target was likely to be.
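For concreteness, the sketch below reconstructs one participant's trial schedule under this design: a fixed, environment-referenced rich quadrant receiving 50% of targets in the 16 mobile-phase blocks, a uniform distribution in the last four blocks, and a standing side that is re-drawn on every mobile-phase trial (as in Experiment 1A). All names are hypothetical; this is a reconstruction, not the authors' experiment code.

```python
import random

QUADRANTS = ["NE", "SE", "SW", "NW"]   # environment- (screen-) referenced quadrants
SIDES = [0, 90, 180, 270]              # possible standing positions, Experiment 1A

def block_targets(rich_quadrant, biased, n_trials=24):
    """Target quadrants for one 24-trial block."""
    if biased:   # mobile phase: 12 rich trials (50%), 4 per sparse quadrant (16.7%)
        targets = [rich_quadrant] * 12
        targets += [q for q in QUADRANTS if q != rich_quadrant] * 4
    else:        # stationary phase: 6 trials per quadrant (25%)
        targets = QUADRANTS * 6
    random.shuffle(targets)
    return targets

rich = random.choice(QUADRANTS)        # counterbalanced across participants
schedule = []
for block in range(1, 21):
    for target_quadrant in block_targets(rich, biased=(block <= 16)):
        side = random.choice(SIDES) if block <= 16 else 0
        schedule.append((block, side, target_quadrant))
```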

Recognition

After the search task, participants were asked to select the quadrant where the target was most often found.

Results and discussion

Accuracy

One participant in Experiment 1B had low accuracy (less than 80%); this person's data were excluded from the analysis. For the other participants, search performance was highly accurate (greater than 97%) and was unaffected by the target's quadrant condition (p > 0.10 in both Experiments 1A and 1B). This was also the case in all subsequent experiments. Therefore, the analysis in all experiments used the mean RT from correct trials as the dependent measure.

Experiment 1A

In Experiment 1A, participants moved randomly to any side of the monitor before each visual search trial. The target-rich quadrant (in the first 16 blocks) was environmentally stable. Despite the presence of multiple stable environmental landmarks, participants failed to prioritize the search in the rich quadrant (Figure 2). An analysis of variance (ANOVA) on the target's location (rich or sparse quadrant) and block (1–16) showed that RT improved as the experiment progressed, F(15, 225) = 12.61, p < 0.001, ηp2 = 0.46, but RT did not differ between target-rich and target-sparse quadrants, F(1, 15) = 1.25, p > 0.25, nor did target quadrant interact with block, F < 1. Furthermore, it is unlikely that participants learned where the target was likely to be but were unable to use that knowledge when moving. For the last four blocks, participants stood at a single position, but the target was equally likely to appear in any quadrant. Under these conditions, VSL should manifest as a persistent attentional bias toward the previously rich quadrant (Jiang, Swallow, Rosenbaum, et al., 2013; Umemoto et al., 2010). This, however, did not occur. The RT in the stationary phase was unaffected by whether the target was in the formerly rich or sparse quadrant, F < 1.
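In outline, this analysis is a 2 (target quadrant) × 16 (block) repeated-measures ANOVA on mean correct-trial RT. The sketch below shows one way to run it, assuming a hypothetical long-format table of trials; it is not the authors' analysis script.

```python
# Sketch: repeated-measures ANOVA on mean correct-trial RT, assuming a
# DataFrame `trials` with columns: subject, block, quadrant ("rich"/"sparse"),
# rt (ms), correct (bool).
import pandas as pd
from statsmodels.stats.anova import AnovaRM

def quadrant_by_block_anova(trials: pd.DataFrame):
    mobile = trials[trials["correct"] & (trials["block"] <= 16)]
    # One mean RT per subject x quadrant x block cell (AnovaRM expects balanced cells).
    cell_means = (mobile.groupby(["subject", "quadrant", "block"], as_index=False)["rt"]
                        .mean())
    return AnovaRM(data=cell_means, depvar="rt", subject="subject",
                   within=["quadrant", "block"]).fit()

# print(quadrant_by_block_anova(trials))
```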

Figure 2.

Results from Experiment 1A. Error bars show ±1 SEM of the difference between the target-rich and target-sparse conditions.

Experiment 1B

In Experiment 1A, participants moved a relatively large distance to an unpredictable location on each trial. Both the large change in perspective and its unpredictability could disrupt spatial updating (Tsuchiai, Matsumiya, Kuriki, & Shioiri, 2012). In Experiment 1B, changes to the participant's search position were both predictable (participants moved in a single direction around the display) and small (a 30° change from one trial to the next; Figure 3A). However, in the first 16 blocks, RT was similar when the target appeared in the rich quadrant and the sparse quadrants, F < 1, and this effect did not interact with block, F < 1. The main effect of block was significant, F(15, 210) = 7.01, p < 0.001, ηp2 = 0.33. The stationary phase also did not reveal an attentional bias toward the target-rich quadrant, F(1, 14) = 1.59, p > 0.20 (Figure 3B).

Figure 3.

(A) A bird's eye view of the search display and possible standing positions around the monitor (in the actual experiment, the standing position cue preceded, rather than overlaid, the search display). (B) Results from Experiment 1B. Error bars show ±1 SEM of the difference between the target-rich and target-sparse conditions.

A direct comparison between Experiments 1A and 1B revealed no main effect of target quadrant, F < 1, and no interaction between experiment and target quadrant, F(1, 29) = 1.16, p > 0.25. Thus, even though participants made small incremental changes in viewpoint, were exposed to many viewpoints around the display, and could predict where they were going next, they were unable to prioritize the target-rich locations of the environment.

Recognition

The percentage of participants who correctly identified the rich quadrant was 12.5% in Experiment 1A and 20% in Experiment 1B. These values did not differ from chance (chance: 25%), ps > 0.15 on a binomial test. Thus, participants did not acquire explicit awareness about where the target was most often found. Recognition performance also did not interact with probability learning (p > 0.10 for data combined between Experiments 1A and 1B).
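For illustration, the recognition counts implied by these percentages (2 of 16 in Experiment 1A; 3 of 15 in Experiment 1B) can be compared against chance with an exact binomial test. The test direction and the code below are our assumptions, not the authors' script.

```python
# Sketch: exact binomial test of quadrant recognition against chance (25%).
from scipy.stats import binomtest

for label, hits, n in [("Experiment 1A", 2, 16), ("Experiment 1B", 3, 15)]:
    result = binomtest(hits, n, p=0.25, alternative="greater")
    print(f"{label}: {hits}/{n} correct, p = {result.pvalue:.3f}")
```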

Comparison with previous findings

Data from Experiment 1 should be contrasted with those of a previous study in which participants searched similar displays from the same standing position (Jiang, Swallow, & Capistrano, 2013, experiment 3). In that study, participants walked halfway toward another side of the monitor and back between trials. Despite this movement, participants found the target faster when it was in the rich quadrant rather than in the sparse quadrants (Figure 1C). The effect was large, Cohen's d = 2.93, and provides a baseline against which null results can be interpreted. We therefore compared data from Experiment 1A with those from our previous study (Jiang, Swallow, & Capistrano, 2013, experiment 3) in an ANOVA using experiment as the between-subject factor and target quadrant (rich or sparse) and block (1–16) as within-subject factors. This test revealed a significant interaction between experiment and target quadrant, F(1, 30) = 21.37, p < 0.001, ηp2 = 0.42.

Discussion

Changing the viewer's perspective from trial to trial disrupts learning, both when those changes are random and large and when they are predictable and small. These findings significantly expand the conditions under which implicit VSL is disrupted. They show that even when conditions for spatial updating are optimal, participants are unable to acquire environmental regularities that cannot be consistently represented relative to the viewer.

Experiment 2

People typically search for useful items in visually rich environments. Although many potential landmarks were available in Experiment 1 (e.g., room furniture), the search display itself was impoverished. Other than a red bar on one side of the monitor, participants did not have other task-relevant cues to reference the target's location. Enriching the search display may be a critical factor for producing environment-centered VSL. In fact, making two sides of a room perceptually distinct produced environment-centered VSL in a previous study (Smith et al., 2010). Participants in Experiment 2 were therefore asked to search a visually rich display (a Google map) for a traffic icon. The target—an icon of a car or a gas station—was most often found in one quadrant of the map, allowing the participants to reliably reference the target's location to the map itself. Similar to Experiment 1A, participants stood at a random location before each trial, so the target-rich locations were variable relative to their perspective. If a visually rich environment is sufficient to produce environment-centered VSL, then participants in Experiment 2 should find the icon faster when it appears in the quadrant that is most likely to contain it.

Methods

Participants

Sixteen new participants (18–35 years old) completed Experiment 2.

Stimuli

Four satellite images (45° aerial view; 1 in. on the map = 20 m in real space) of an unfamiliar university campus (19 × 19 cm) were acquired through Google Maps. Street labels were removed. The maps were visually rich and not symmetrical along any axis. One map was randomly assigned to each participant. This map was displayed for the entire experiment in the same orientation relative to the monitor. The search target was an icon of a car or a gas station (0.6 × 0.6 cm; in one of four orientations, 0°, 90°, 180°, or 270°, randomly selected for each trial). The icon was placed in a randomly selected location of the map. There were 100 possible locations from a 10 × 10 invisible grid that subtended 19 × 19 cm.

Design and procedure

This experiment used the same design as Experiment 1A, except for the stimuli and task. Participants completed 20 blocks of testing (24 trials per block). Just like Experiment 1A, for the first 16 blocks, the target icon was more often placed in one quadrant of the display (50%) than in any one of the other quadrants (16.7%). The rich quadrant was stable relative to the larger environment (e.g., the room) as well as the map itself. Participants changed their standing position from trial to trial randomly around the four sides of the monitor. For the last four blocks, participants stood at the same position to perform the search. In addition, the target icon could appear in any quadrant with equal probability (25%).

Recognition

At the completion of the visual search task, participants were shown four maps simultaneously (one in each quadrant). They were asked to choose the map that they saw in the experiment. Following this response, the correct map was displayed in the same way as during the search task. Participants were asked to touch the quadrant of the map where the target icon was most often found.

Results

Search RT

The use of a rich search display failed to yield VSL in Experiment 2 (Figure 4). In the first 16 blocks, RT was comparable when the target appeared in a rich quadrant or sparse quadrant, F < 1. This similarity in search times did not interact with experimental block, F(15, 225) = 1.01, p > 0.40, and no effect emerged in the stationary phase either, F(1, 15) = 1.78, p > 0.20.

Figure 4.

A sample Google Map display (A) and results (B) from Experiment 2. Error bars show ±1 SEM of the difference between the target-rich and target-sparse conditions.

Recognition

All participants were able to identify the map that they saw from a set of four maps. However, only 31.3% of the participants were able to identify the target-rich quadrant on the map, which did not differ significantly from chance (25%; p > 0.15 on a binomial test). Quadrant recognition accuracy did not interact with location probability learning, p > 0.10.

Discussion

The map search task of Experiment 2 provided rich environmental cues that could facilitate landmark-based (including spatiotopic) coding of the environment. Participants actively explored the Google map to spot a traffic icon—a car or a gas station. Over several hundred trials of visual search, participants acquired high familiarity with the map. However, they failed to prioritize search in the one quadrant of the map that frequently contained the search target. The failure of VSL was also accompanied by a lack of explicit awareness about where the search target was most often found. These data show that the richness of visual cues, by itself, is insufficient to produce environment-centered VSL.

Experiment 3

The failure of implicit VSL in Experiments 1 and 2 suggests that this form of learning is not environment centered. The observation that changing viewer perspective disrupts implicit VSL poses significant constraints on how much it could influence search in everyday contexts (e.g., when searching through a produce section for a preferred kind of apple). As we will discuss later, these data have important implications for understanding the function of implicit VSL. However, they also run counter to the intuition that people should be able to prioritize important locations during search even when their standing positions change. In fact, two previous studies that tested participants in a large environment have evidenced some degree of environment-centered VSL (Jiang et al., in press; Smith et al., 2010). In both of those studies, however, participants were highly accurate in explicitly recognizing the target-rich region.

The goal of Experiment 3 is to examine whether participants can spontaneously acquire environment-centered VSL and, if so, whether explicit awareness correlates with the degree of learning. Experiment 3 used a statistical learning paradigm that was known to produce explicit knowledge (Brockmole, Castelhano, & Henderson, 2006; Ehinger & Brockmole, 2008). In this experiment, participants searched for a small green letter overlaid on a natural scene. Multiple scenes and multiple target locations were used, each appearing 20 times over the course of the experiment. For half of the scenes (the old condition), the target's location was consistent within the scene across all repetitions. For the other half of the scenes (the shuffled condition), the mapping between the scene and the target's location was variable over time. By associating a single target location with a unique scene in the old condition, this design increased the likelihood that participants would become aware of where the target was in the scene (Brockmole et al., 2006). Similar to Experiments 1 and 2, participants' standing position changed randomly from trial to trial. Thus, the scene and the target's location were variable relative to the participants. If explicit awareness facilitates environment-centered VSL, then as long as participants became aware of the scene-target association, they should acquire learning in Experiment 3. Furthermore, individuals who showed greater awareness of the scene-target association should demonstrate greater VSL. In contrast, if participants are completely unable to acquire environment-centered learning in our setup (perhaps as a result of disorientation), then VSL should fail even in people who became aware of the scene-target association.

Method

Participants

Sixteen new participants (18–25 years old) completed Experiment 3.

Stimuli

Sixteen scenes were selected randomly for each participant from a set of 48 indoor and outdoor scenes. In addition, 16 target locations were randomly chosen from 100 possible locations (a 10 × 10 invisible grid subtending 19 × 19 cm), with the constraint that there were an equal number of possible target locations in each visual quadrant. The target was a small green letter (0.4 × 0.4 cm), either T or L, presented in one of four possible orientations (0°, 90°, 180°, or 270°). Because the participant's standing position could be at any side of the monitor, the scenes were presented in various orientations such that, from each standing position, four of the scenes appeared upright. The orientation of each scene was environmentally stable and did not change when participants moved.

Design and procedure

The experiment was divided into 20 blocks of trials with 16 trials per block. Each trial started with the standing position cue that signaled the participants to move to one side of the monitor. The standing position was randomly chosen on each trial. Once in position, participants initiated the trial by touching the central fixation point. Two hundred milliseconds later, a scene was presented along with a green letter overlaid on it. Participants reported whether the green letter was a T or an L by pressing the left or right mouse button. The display was erased upon the mouse click response or after 10 s, whichever occurred earlier. The 10-s time-out cutoff was chosen to keep the experiment from becoming excessively long. An illustration of the procedure and stimuli can be found at http://jianglab.psych.umn.edu/ViewpointSpecificity.mov. It may be necessary to pause the video to find the target.

Each block of trials involved 16 unique scenes and 16 unique target locations. The same scenes and target locations were used in all blocks. Half of the scenes and target locations were assigned to the old condition. The other half were assigned to the shuffled condition. In the old condition, a different target location was assigned to each of the eight scenes; this mapping was held constant across the 20 blocks of trials. In the shuffled condition, the mapping between the remaining eight scenes and the target's locations was variable. Participants received no information about the experimental manipulation and were asked to find the letter as quickly and as accurately as possible.
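One way to implement this old/shuffled manipulation is sketched below: the scene-to-location mapping is fixed across blocks for old scenes and re-drawn every block for shuffled scenes. Variable and function names are hypothetical; this is a reconstruction, not the authors' code.

```python
import random

def make_exp3_blocks(scenes, locations, n_blocks=20):
    """scenes: 16 scene IDs; locations: 16 grid locations, one per scene."""
    assert len(scenes) == 16 and len(locations) == 16
    old_scenes, shuffled_scenes = scenes[:8], scenes[8:]
    old_locs, shuffled_locs = locations[:8], locations[8:]
    old_map = dict(zip(old_scenes, old_locs))          # constant across blocks
    blocks = []
    for _ in range(n_blocks):
        trials = [(scene, old_map[scene]) for scene in old_scenes]
        remapped = random.sample(shuffled_locs, len(shuffled_locs))
        trials += list(zip(shuffled_scenes, remapped)) # remapped every block
        random.shuffle(trials)                         # 16 trials per block
        blocks.append(trials)
    return blocks
```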

There were four practice trials before the experiment to familiarize participants with the search task. Scenes used in the practice trials differed from those used in the main experiment.

Recognition test

At the completion of the visual search task, participants were presented with 24 scenes one at a time: eight new scenes, eight scenes from the old condition, and eight scenes from the shuffled condition, in a random order. Participants were asked to first report whether they had seen the scene in the experiment. Following that response, they were asked to touch the location of the target letter on that scene. Participants were told to make a guess about the target's location if they were not sure or if they thought they had not seen the scene before. To estimate localization accuracy, we calculated the Euclidean distance between the participant's response and the actual target's location. The correct target location for old scenes was the location where the target was placed on that scene during the experiment. The correct target location for shuffled and new scenes was a randomly selected location (from the set of eight possible target locations used for the shuffled scenes or from the set of 16 possible target locations for the new scenes).
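The localization error is simply the straight-line pixel distance between the touched point and the scored target location, as in the minimal sketch below (coordinates are assumed to be (x, y) pixel tuples):

```python
# Sketch: Euclidean localization error in pixels.
import math

def localization_error(touch_xy, target_xy):
    dx = touch_xy[0] - target_xy[0]
    dy = touch_xy[1] - target_xy[1]
    return math.hypot(dx, dy)

# localization_error((412, 380), (300, 295)) -> ~140.6 pixels
```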

Results and discussion

Search accuracy

Participants made few incorrect responses: 2.5% in the old condition and 2.4% in the shuffled condition. This difference was not significant, p > 0.50. However, participants timed out (failed to respond within 10 s) on 6.1% of the trials in the old condition and 15.9% of the trials in the shuffled condition, a difference that was significant, p < 0.001. In the following RT analysis, we report data from trials receiving a correct response. The pattern of results was the same if timed-out trials were included and assigned the maximum time (10 s).

Search RT

Across all participants (Figure 5A), RT was significantly faster in the old condition than in the shuffled condition, F(1, 15) = 18.96, p < 0.001, ηp2 = 0.56. This effect increased as the experiment progressed, yielding a significant interaction between condition (old or shuffled) and block, F(19, 285) = 3.11, p < 0.001, ηp2 = 0.17. The RT also improved in later blocks compared with earlier blocks, F(19, 285) = 4.90, p < 0.001, ηp2 = 0.25. Thus, unlike Experiments 1 and 2, VSL facilitated visual search in Experiment 3 even though participants changed their perspective from trial to trial.

Figure 5.

Results of Experiment 3. (A) RT during the visual search task. Error bars show ±1 SEM of the difference between the old and shuffled conditions. (B) The size of VSL during the visual search (shuffled RT minus old RT) correlated with explicit awareness (the localization error on new scenes minus the localization error on old scenes). Each “x” represents data from one participant.

Scene recognition and scene-target localization error

Participants were highly accurate in the scene recognition task. They correctly recognized 88.3% of the old scenes and 94.5% of the shuffled scenes, with a 3.3% false alarm rate for new scenes. The correct recognition rate was comparable between the old and shuffled scenes (p > 0.08) and was substantially higher than the false alarm rate (p < 0.001).

To examine whether people became explicitly aware of the consistent scene-target association in the old condition, we measured the target localization error. In units of pixels, the mean localization error was 141 pixels on old scenes, which was significantly less than 347 pixels on new scenes (p < 0.001) or 325 pixels on shuffled scenes (p < 0.001). Thus, averaged across all participants, there was evidence that people had acquired explicit knowledge of the target's location on old scenes.

Explicit awareness and VSL

To examine whether explicit awareness about the scene-target association had supported the learning shown in the search task, we computed the correlation between the size of VSL and the degree of explicit awareness. VSL was indexed by the difference in search RT between the old and shuffled conditions. This was calculated for blocks 11 to 20, during which learning had stabilized. Explicit awareness was indexed by target localization error (in units of pixels) in new scenes minus localization error in old scenes. As seen in Figure 5B, individuals who were better able to explicitly report where the target was in the old scene also showed a larger RT benefit in the old condition, Pearson's r = 0.79, p < 0.001.
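In code, this correlation amounts to the following sketch, assuming one value per participant for each index (mean RTs from blocks 11 to 20 and mean localization errors); the array and function names are hypothetical, not the authors' script.

```python
# Sketch: correlating the VSL benefit with explicit awareness across participants.
import numpy as np
from scipy.stats import pearsonr

def awareness_vsl_correlation(rt_shuffled, rt_old, err_new, err_old):
    vsl_benefit = np.asarray(rt_shuffled) - np.asarray(rt_old)  # ms, blocks 11-20
    awareness = np.asarray(err_new) - np.asarray(err_old)       # pixels
    r, p = pearsonr(awareness, vsl_benefit)
    return r, p
```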

Discussion

Experiment 3 demonstrates that changes in viewpoint do not always interfere with environment-centered VSL. These results were found even when the search task was conducted on a computer monitor, rather than in a large space. Results from Experiment 3 can be contrasted with those of Experiments 1 and 2. We believe that the main difference lies in explicit awareness. Whereas participants in Experiments 1 and 2 were unaware of where the target was most likely located, participants in Experiment 3 acquired explicit awareness. In fact, those who had greater awareness (as indexed by more accurate localization of the target on a scene) also showed greater VSL.

The approach used in Experiment 3 was intrinsically correlational: participants who showed greater VSL also had greater awareness of where the target was in the scene. The experiment did not address whether explicit knowledge causes environment-centered learning (or vice versa). In addition, it did not rule out the possibility that learning the one-to-one mapping between a scene and a target location (Experiment 3) was easier than learning the rich quadrant across all trials (Experiments 1 and 2). However, it seems unlikely that the visual statistics used in Experiment 3 were intrinsically easier to learn. The ease of computing summary statistics might lead to the opposite prediction (e.g., Alvarez & Oliva, 2008). In fact, when explicit learning was required, participants did not rapidly learn to associate a specific location with a specific spatial configuration (Chun & Jiang, 2003) or a specific scene (Brockmole & Henderson, 2006). In addition, an earlier study provided a stronger test of whether explicit knowledge enables environment-centered learning in the same task as that of Experiments 1 and 2 (Jiang, Swallow, & Capistrano, 2013, experiment 4). That experiment was identical to Experiment 1, except that participants were explicitly told where the target was likely to appear before they began the experiment. If explicit knowledge of the target's likely location promotes environment-centered learning, then these participants should show an advantage for the target-rich quadrant even though they moved to a random search position for each trial. This was the case.

In contrast to this earlier study, Experiment 3 is the first to show that participants are able to spontaneously acquire environment-centered learning even in the absence of explicit instructions. These data show that incidental learning of environment-centered visual statistics is possible. However, the strong correlation between the size of learning and the participants' level of awareness points to the explicit nature of the learning. Together with previous findings on explicit learning in VSL (Jiang, Swallow & Capistrano, 2013; Jiang et al., in press), Experiment 3 provides strong evidence that (a) environment-centered VSL is possible but (b) such learning likely depends on explicit awareness of the underlying statistics. As we will show next, these results contrast with viewer-centered learning, which does not depend on explicit awareness.

Experiment 4

The first three experiments suggested that in mobile observers, environment-centered learning depends on explicit awareness of where target-rich locations are. Yet these experiments say little about the spatial representation that implicit VSL uses. One possibility is that unlike explicit learning, implicit VSL codes attended locations in a viewer-centered coordinate system that is not updated when participants move to a new location (Jiang & Swallow, 2013b; Jiang, Swallow, & Sun, 2014). If this is true, then moving observers should be able to learn statistical regularities that are referenced relative to their perspective. Therefore, in Experiment 4, we modified Experiment 1A (letter search) and Experiment 2 (map search) by referencing the target-rich quadrant relative to the participant. The display and the task were identical to those of Experiments 1 and 2. What “moved” was the target's location probability relative to the world. Figure 6 illustrates the experimental design. For example, for a quarter of the participants, the target-rich quadrant would always be in their upper right visual field regardless of where they stood. Because the searcher's viewpoint was random, the target-rich quadrant's location was random relative to the screen and environment.

Figure 6.

An illustration of the location probability manipulation used in Experiment 4. The displays show three different trials. As indicated by the footprint location, participants stood at different positions around the flat monitor. The target-rich quadrant was referenced relative to the participants' standing position (in this example, always in their upper right).

If implicit VSL is viewer centered, then a search advantage for the target-rich, viewer-centered quadrant should develop during the training session (i.e., RTs should improve more when the target appears in the target-rich quadrant than in the target-sparse quadrant). In contrast, if the visual statistics must be stable in the environment for learning to occur, or if other factors (such as disorientation) interfere with performance, then no learning should be evident.

Method

Participants

Thirty-two new participants were tested, 16 in Experiment 4A and 16 in Experiment 4B.

Stimuli

Experiment 4A used the same letter-among-symbol stimuli as in Experiment 1. Experiment 4B used the same icon-on-a-map stimuli as in Experiment 2.

Design and procedure

Experiment 4A was the same as Experiment 1A, except that the target-rich quadrant was defined relative to the viewer (e.g., it was always to the participant's upper left, regardless of his or her standing position). In Blocks 1 to 16, the target appeared in the viewer-centered rich quadrant 50% of the time (the other quadrants were equally likely to contain a target, 16.7%). In addition, participants moved to a new, randomly determined standing position before each trial. In Blocks 17 to 20, the target was equally likely to appear in any quadrant (25%), and participants stood in one position. Like Experiment 1A, the first 16 blocks tested whether participants could acquire implicit VSL when moving, and the last four blocks tested whether the learned attentional bias persisted in a stationary phase.

Experiment 4B was the same as Experiment 2, except that the target-rich quadrant was consistently referenced relative to the participant and therefore was random relative to the map. Similar to Experiment 2, the Google map itself did not change orientation. However, where the target was most often found depended on the participant's standing position. To make this concrete, imagine the screen as a clock with its face pointed to the ceiling. If the target-rich quadrant was to the participant's upper left, then it would be the area between the 9 and the 12 when he or she stood at the 6-o'clock position. If the participant stood at the 9-o'clock position, the target-rich quadrant would be the area between the 12 and the 3, and so on.
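The clock-face mapping can be made explicit with the sketch below (labels and indexing are our own assumptions, not the authors' code). It reproduces the worked example in the text: a viewer-relative upper-left rich quadrant falls between the 9 and the 12 when standing at the 6-o'clock position, and between the 12 and the 3 when standing at the 9-o'clock position.

```python
# Sketch: mapping a viewer-relative quadrant onto the flat screen, treating the
# screen as a clock face and the standing positions as the 6, 9, 12, and 3 o'clock sides.
SCREEN_REGIONS = ["12-3", "3-6", "6-9", "9-12"]       # clock sectors of the screen
STANDING_ORDER = ["6", "9", "12", "3"]                # standing sides, in rotation order
RELATIVE_OFFSET = {"upper_right": 0, "lower_right": 1,
                   "lower_left": 2, "upper_left": 3}  # offsets from the 6-o'clock view

def screen_region(standing_position, viewer_quadrant):
    """Return the clock sector that holds the viewer-relative rich quadrant."""
    shift = STANDING_ORDER.index(standing_position)
    return SCREEN_REGIONS[(RELATIVE_OFFSET[viewer_quadrant] + shift) % 4]

assert screen_region("6", "upper_left") == "9-12"
assert screen_region("9", "upper_left") == "12-3"
```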

Recognition tests

Recognition tests were the same as those in Experiments 1A and 2. However, the target-rich quadrant was defined relative to the participant rather than the external environment. For example, if the target-rich quadrant was in the upper left visual field relative to the participant, then the correct recognition response would be to touch the part of the screen corresponding to the upper left visual field.

Results

Experiment 4A search RT

The data provided clear evidence that VSL developed rapidly when statistical regularities were stable relative to the viewer (Figure 7A). In Experiment 4A's letter search task, participants were significantly faster when the target appeared in the viewer-centered, target-rich quadrant rather than the target-sparse quadrants in the mobile phase, F(1, 15) = 17.12, p < 0.001, ηp2 = 0.53. This advantage was significant as early as Block 3 (p < 0.03) and did not significantly interact with block, F(15, 225) = 1.21, p > 0.25. The attentional bias toward the target-rich quadrant persisted even after the target became evenly distributed in the stationary phase, demonstrating long-term persistence of viewer-centered VSL, F(1, 15) = 7.90, p < 0.02, ηp2 = 0.35.

Figure 7.

Results from the letter search task of Experiment 4A (A) and the map search task of Experiment 4B (B). The target-rich region was variable in the external environment but consistent relative to the viewer's standing position. Error bars show ±1 SEM of the difference between the target-rich and target-sparse conditions.

Experiment 4B search RT

The findings in the letter search task were replicated in the map search task of Experiment 4B (Figure 7B). Even though a visually rich environment (the map) might be expected to interfere with viewer-centered learning, participants were significantly faster finding the car/gas icon when it appeared in the viewer-centered rich quadrant in the first 16 blocks, F(1, 15) = 32.72, p < 0.001, ηp2 = 0.69. This effect stabilized early and did not interact with block, F < 1. It also persisted in the stationary phase, F(1, 15) = 15.44, p < 0.001, ηp2 = 0.51.

Recognition test

The percentage of participants who correctly identified the rich quadrant was 18.7% in Experiment 4A and 18.7% in Experiment 4B, which did not differ significantly from chance, p > 0.10. Recognition performance did not interact with the size of VSL, p > 0.10 in both experiments.

Thus, the presence of VSL in Experiment 4 could not be attributed to increased explicit awareness of the experimental manipulation. Instead, these data suggest that implicit VSL is egocentric. Information that can be consistently referenced relative to the observer accumulates over time to facilitate visual search, even though this information is random in an environment-centered reference frame.

General discussion

In most studies, statistical learning occurs without an intention to learn or an awareness of what was learned. In fact, implicit learning is a powerful mechanism for extracting statistical regularities, even in young infants, the elderly, and brain-damaged patients (Perruchet & Pacton, 2006; Reber, 1993; Saffran et al., 1996). Indeed, the statistics used in Experiments 1 and 2 are easily acquired when participants are stationary (Geng & Behrmann, 2005; Jiang, Swallow, Rosenbaum, et al., 2013) and when they move halfway to one position and back between each trial (Jiang, Swallow, & Capistrano, 2013). Effect sizes were large in those studies. Importantly, VSL is functionally beneficial in these tasks. Because it has such a large effect on search times (it sped up RT by ∼20% in Experiments 3 and 4), VSL could optimize effort allocation.

This study demonstrates, however, that under some circumstances, implicit VSL fails to extract environmental regularities. Environmentally stable regularities are learned and used to facilitate visual search when the viewer is stationary (Geng & Behrmann, 2005; Jiang, Swallow, Rosenbaum, et al., 2013; Umemoto et al., 2010). Once the viewer assumes variable perspectives, location probability learning occurs only for visual statistics that maintain viewpoint consistency, or when the viewer becomes explicitly aware of the environmental regularities. The lack of environment-centered implicit VSL for environmentally stable statistics is striking. It also provides significant constraints on what VSL might reflect and what it might be used for.

The failure of implicit VSL to acquire environmental regularities in Experiments 1 and 2 indicates that it is not well integrated with spatiotopic coding or spatial updating mechanisms. Two conditions are considered important for establishing spatiotopic representations (Burr & Morrone, 2012). First, attention should be available to form the spatiotopic map. Second, there should be sufficient time for establishing a viewer-invariant representation. Experiments 1 and 2 met these conditions (participants attended to the display, they had time to update their representations after moving to the new position, and environmental landmarks were present). However, learning was disrupted. This is not to say that spatiotopic representations could not form in some tasks and conditions. They just do not appear to support implicit VSL in conditions tested in our study.

Even without a spatiotopic map of the environment, however, it is possible that implicit VSL could prioritize a region of space relative to the external environment through spatial updating. Spatial updating often occurs in anticipation of, or following, an eye movement or perspective change (Cavanagh et al., 2010; Colby & Goldberg, 1999; Wurtz, 2008). Experiments 1 and 2 were designed to support spatial updating: viewpoint changes were introduced by viewer locomotion rather than display rotation, plenty of environmental cues were present, and in one experiment, small, predictable changes in perspective were used. Yet participants failed to learn where the target-rich region was. These data indicate that implicit VSL of the type investigated here is poorly integrated with spatial updating.

Because the viewpoint manipulation is not commonly used in visual search experiments, it is important to consider whether it might have led participants to expect that we were looking for viewpoint dependence (or, alternatively, viewpoint independence). This seems unlikely for several reasons. First, any expectations participants develop as a result of trying to guess the purpose of the experiment should result in explicit attentional biases. Yet there was no evidence that learning depended on explicit knowledge of the target's location probability. Other data further emphasize the distinction between explicit expectations and probability cuing: Viewer-centered probability cuing persists even after participants are explicitly told to expect the target in a rich quadrant defined relative to the environment (Jiang et al., 2014). Just as importantly, viewpoint manipulations do not exclusively lead to viewer-centered learning. In previous studies, tilting one's body and head through the vertical plane induced some environment-centered learning (Jiang & Swallow, 2013a), as did performing the task in an outdoor environment (Jiang et al., in press). In both of these studies, environment-centered learning was associated with above-chance recognition rates. These data suggest that if participants had set up expectations about the experimenter's intent, such expectations would more likely produce environment-centered rather than viewer-centered learning.

Our data suggest that the functional significance of implicit VSL may be very different than previously supposed. Implicit statistical learning is often considered an important mechanism for extracting statistical regularities that are stable in the environment, such as frequently co-occurring sounds (Saffran et al., 1996) or consistently paired objects (Fiser & Aslin, 2001; Turk-Browne, 2012). As a result, unsupervised VSL may be useful for representing the hierarchical structure of features, objects, and their relationship in the external world (Fiser & Aslin, 2005; Orbán, Fiser, Aslin, & Lengyel, 2008). Although the view of VSL as a mechanism for “extracting what is out there” is satisfactory in paradigms that involve perception only, it is more difficult to resolve with tasks that involve covert or overt action. Visual search, for example, is more than just perceiving what is in the visual world. It also involves actively shifting covert and overt attention between items until the target object is found. Unlike visual perception (which may be centered on objects or environmental locations), visual action is inherently egocentric (Goodale & Haffenden, 1998). Consistent with this claim, spatial attention is referenced primarily egocentrically (Golomb, Chun, & Mazer, 2008; Jiang & Swallow, 2013b; Mathôt & Theeuwes, 2010). Therefore, the function of VSL in active tasks is unlikely to be simply about perception. Rather, VSL may serve to increase the likelihood that successful actions will repeat in the future. On this interpretation, VSL did not accumulate in Experiments 1 and 2 because the target-rich region was randomly located in a viewer-centered (possibly even action-centered) reference frame. In contrast, VSL was present in Experiment 4 because the statistical regularities were stable relative to the viewer. As a result, successful movements of attention through space were relatively consistent across trials.

Locations that are rich in food and other resources should be independent of the viewer's perspective. It is therefore puzzling that a statistical learning system would remain egocentric when evolutionary pressures should encourage the formation of environment-centered representations. Three factors may jointly explain this puzzle. First, the external pressure for extracting truly viewer-independent representations may not be as great as imagined. Both natural and manmade environments are highly constrained and limit the number of possible viewpoints one may have. A viewer-centered representation may suffice in these situations. Second, viewer-centered representations can be advantageous. They are easy to compute because neurons in occipital and parietal cortices are predominantly retinotopic (Saygin & Sereno, 2008). In addition, a major purpose of vision is to guide visuomotor action. Because actions carried out by one's eyes, hands, and body are predominantly referenced relative to the viewer (Goodale & Haffenden, 1998), an egocentric visual learning system can more easily and rapidly interface with motor systems. Finally, the egocentric system is complemented by additional mechanisms that can be referenced to the external environment. Specifically, awareness of environmental regularities allows a person to prioritize locations that are likely to contain a target, even as their viewpoint changes.

The current study is an initial step toward understanding when and why implicit VSL fails. Two previous studies have shown some (limited) success for environment-centered VSL (Jiang et al., in press; Smith et al., 2010), whereas a third study has demonstrated a complete failure (Jiang, Swallow, & Capistrano, 2013). Our study showed that the failure of environment-centered learning is unlikely to be attributable to the lack of environmental cues (Experiment 2), to abrupt changes in viewpoint (Experiment 1B), or to general disruptions such as spatial disorientation (Experiment 4). Rather, explicit awareness of the underlying statistical regularity may be key to overcoming viewpoint dependency.

This study leaves open the possibility that large-scale environments, such as the ones used by Jiang et al. (in press) and Smith et al. (2010), may facilitate environment-centered representations. In a large environment, most of the search space is beyond arm's reach; it is in the person's "perception space" rather than "action space." In addition, search often entails active viewer movement within the search space. Under such conditions, visual search may rely more on external environmental cues and may be less constrained by the viewer's perspective than was found here. To firmly test the role of implicit VSL in natural behaviors such as foraging, it is important to extend the current findings from laboratory testing to the real world while minimizing explicit awareness of the underlying statistics.

Furthermore, our study has not exhaustively tested conditions that promote environment-centered VSL. Successful learning in Experiment 3 suggests that people can acquire environment-centered VSL. One potential explanation is that participants in Experiment 3 viewed the background images as task relevant, which could have promoted environment-centered learning. However, it is unclear why participants in Experiment 2 would not also use the background scene, as it was also predictive of the target's likely locations. Further research is needed to directly examine the effects of task relevance on environment-centered learning. This research will also need to examine whether the effects of task relevance are mediated by explicit awareness.

VSL involves diverse paradigms and mechanisms. In some paradigms, spatial locations are entirely irrelevant, such as learning the association between two novel shapes (Fiser & Aslin, 2001; Turk-Browne, 2012). In other paradigms, target locations are defined relative to the locations of other items (e.g., contextual cueing; Chun & Jiang, 1998). Compared with location probability learning, these other paradigms have received less systematic investigation of viewpoint specificity. Several findings suggest that viewpoint specificity may generalize to other paradigms of implicit VSL. For example, in two studies, contextual cueing was disrupted when participants' viewpoint changed (Chua & Chun, 2003; Tsuchiai et al., 2012). In addition, associative learning of a pair of visual objects was eliminated when the pair was turned upside down (Vickery & Jiang, 2009). Additional research is needed to test whether all forms of implicit spatial learning are viewer centered.

Conclusion

This study shows that changes in an observer's perspective disrupt the ability to implicitly learn visual statistical regularities that are stable in the environment. When testing is done on a computer, VSL of spatial locations is egocentric unless explicit awareness of the regularities is acquired. Thus, the conditions for successful VSL may be more limited than what is suggested by contemporary research. The role of VSL may not be to extract statistical regularities that are stable in the environment but to increase the likelihood that successful behavior will be repeated. Future research should examine whether implicit VSL remains egocentric in large-scale, real-world search tasks.

Acknowledgments

Y. V. J. designed the study, supervised data collection, analyzed data, and wrote the article. K. M. S. designed the study and wrote the article. Both authors approved the final version of the article for submission. We thank Chris Capistrano, Tian Saltzman, Julia Cistera, and Bo-Yeong Won for assistance with data collection and Kate Briggs for maintaining the research experience program through which participants were recruited. This study was funded in part by NIH grant 102586.

Commercial relationships: none.

Corresponding author: Yuhong V. Jiang.

Email: Jiang166@umn.edu.

Address: Department of Psychology, University of Minnesota, Minneapolis, MN, USA.

Contributor Information

Yuhong V. Jiang, Email: jiang166@umn.edu.

Khena M. Swallow, Email: kms424@cornell.edu.

References

  1. Alvarez G. A., Oliva A. (2008). The representation of simple ensemble visual features outside the focus of attention. Psychological Science, 19, 392–398. doi:10.1111/j.1467-9280.2008.02098.x
  2. Brady T. F., Chun M. M. (2007). Spatial constraints on learning in visual search: Modeling contextual cuing. Journal of Experimental Psychology: Human Perception and Performance, 33, 798–815. doi:10.1037/0096-1523.33.4.798
  3. Brockmole J. R., Castelhano M. S., Henderson J. M. (2006). Contextual cueing in naturalistic scenes: Global and local contexts. Journal of Experimental Psychology: Learning, Memory, and Cognition, 32(4), 699–706. doi:10.1037/0278-7393.32.4.699
  4. Brockmole J. R., Henderson J. M. (2006). Recognition and attention guidance during contextual cueing in real-world scenes: Evidence from eye movements. Quarterly Journal of Experimental Psychology, 59, 1177–1187. doi:10.1080/17470210600665996
  5. Burr D. C., Morrone M. C. (2012). Constructing stable spatial maps of the world. Perception, 41, 1355–1372.
  6. Cavanagh P., Hunt A. R., Afraz A., Rolfs M. (2010). Visual stability based on remapping of attention pointers. Trends in Cognitive Sciences, 14, 147–153. doi:10.1016/j.tics.2010.01.007
  7. Chua K.-P., Chun M. M. (2003). Implicit scene learning is viewpoint dependent. Perception & Psychophysics, 65, 72–80.
  8. Chukoskie L., Snider J., Mozer M. C., Krauzlis R. J., Sejnowski T. J. (2013). Learning where to look for a hidden target. Proceedings of the National Academy of Sciences, USA, 110(Suppl. 2), 10438–10445. doi:10.1073/pnas.1301216110
  9. Chun M. M., Jiang Y. (1998). Contextual cueing: Implicit learning and memory of visual context guides spatial attention. Cognitive Psychology, 36, 28–71. doi:10.1006/cogp.1998.0681
  10. Chun M. M., Jiang Y. (2003). Implicit, long-term spatial contextual memory. Journal of Experimental Psychology: Learning, Memory, and Cognition, 29, 224–234.
  11. Colby C. L., Goldberg M. E. (1999). Space and attention in parietal cortex. Annual Review of Neuroscience, 22, 319–349. doi:10.1146/annurev.neuro.22.1.319
  12. Ehinger K. A., Brockmole J. R. (2008). The role of color in visual search in real-world scenes: Evidence from contextual cuing. Perception & Psychophysics, 70, 1366–1378. doi:10.3758/PP.70.7.1366
  13. Fiser J., Aslin R. N. (2001). Unsupervised statistical learning of higher-order spatial structures from visual scenes. Psychological Science, 12, 499–504.
  14. Fiser J., Aslin R. N. (2005). Encoding multielement scenes: Statistical learning of visual feature hierarchies. Journal of Experimental Psychology: General, 134, 521–537. doi:10.1037/0096-3445.134.4.521
  15. Geng J. J., Behrmann M. (2005). Spatial probability as an attentional cue in visual search. Perception & Psychophysics, 67, 1252–1268.
  16. Golomb J. D., Chun M. M., Mazer J. A. (2008). The native coordinate system of spatial attention is retinotopic. Journal of Neuroscience, 28, 10654–10662. doi:10.1523/JNEUROSCI.2525-08.2008
  17. Goodale M. A., Haffenden A. (1998). Frames of reference for perception and action in the human visual system. Neuroscience and Biobehavioral Reviews, 22, 161–172.
  18. Goujon A., Brockmole J. R., Ehinger K. A. (2012). How visual and semantic information influence learning in familiar contexts. Journal of Experimental Psychology: Human Perception and Performance, 38, 1315–1327. doi:10.1037/a0028126
  19. Jiang Y. V., Swallow K. M. (2013a). Body and head tilt reveals multiple frames of reference for spatial attention. Journal of Vision, 13(13):9, 1–11, http://www.journalofvision.org/content/13/13/9, doi:10.1167/13.13.9.
  20. Jiang Y. V., Swallow K. M. (2013b). Spatial reference frame of incidentally learned attention. Cognition, 126, 378–390. doi:10.1016/j.cognition.2012.10.011
  21. Jiang Y. V., Swallow K. M., Capistrano C. G. (2013). Visual search and location probability learning from variable perspectives. Journal of Vision, 13(6):13, 1–13, http://www.journalofvision.org/content/13/6/13, doi:10.1167/13.6.13.
  22. Jiang Y. V., Swallow K. M., Rosenbaum G. M., Herzig C. (2013). Rapid acquisition but slow extinction of an attentional bias in space. Journal of Experimental Psychology: Human Perception and Performance, 39, 87–99. doi:10.1037/a0027611
  23. Jiang Y. V., Swallow K. M., Sun L. (2014). Egocentric coding of space for incidentally learned attention: Effects of scene context and task instructions. Journal of Experimental Psychology: Learning, Memory, and Cognition, 40, 233–250. doi:10.1037/a0033870
  24. Jiang Y. V., Won B.-Y., Swallow K. M., Mussack D. M. (in press). Spatial reference frame of attention in a large outdoor environment. Journal of Experimental Psychology: Human Perception and Performance. doi:10.1037/a0036779
  25. Kunar M., Flusberg S., Horowitz T., Wolfe J. (2007). Does contextual cuing guide the deployment of attention? Journal of Experimental Psychology: Human Perception and Performance, 33, 816–828. doi:10.1037/0096-1523.33.4.816
  26. Mathôt S., Theeuwes J. (2010). Gradual remapping results in early retinotopic and late spatiotopic inhibition of return. Psychological Science, 21, 1793–1798. doi:10.1177/0956797610388813
  27. Orbán G., Fiser J., Aslin R. N., Lengyel M. (2008). Bayesian learning of visual chunks by human observers. Proceedings of the National Academy of Sciences, USA, 105, 2745–2750. doi:10.1073/pnas.0708424105
  28. Perruchet P., Pacton S. (2006). Implicit learning and statistical learning: One phenomenon, two approaches. Trends in Cognitive Sciences, 10, 233–238. doi:10.1016/j.tics.2006.03.006
  29. Reber A. S. (1993). Implicit learning and tacit knowledge: An essay on the cognitive unconscious. New York: Oxford University Press.
  30. Saffran J. R., Aslin R. N., Newport E. L. (1996). Statistical learning by 8-month-old infants. Science, 274, 1926–1928.
  31. Saygin A. P., Sereno M. I. (2008). Retinotopy and attention in human occipital, temporal, parietal, and frontal cortex. Cerebral Cortex, 18, 2158–2168. doi:10.1093/cercor/bhm242
  32. Smith A. D., Hood B. M., Gilchrist I. D. (2010). Probabilistic cuing in large-scale environmental search. Journal of Experimental Psychology: Learning, Memory, and Cognition, 36, 605–618. doi:10.1037/a0018280
  33. Stadler M. A., Frensch P. A. (1998). Handbook of implicit learning. Thousand Oaks, CA: Sage.
  34. Tsuchiai T., Matsumiya K., Kuriki I., Shioiri S. (2012). Implicit learning of viewpoint-independent spatial layouts. Frontiers in Psychology, 3, 207. doi:10.3389/fpsyg.2012.00207
  35. Turk-Browne N. B. (2012). Statistical learning and its consequences. Nebraska Symposium on Motivation, 59, 117–146.
  36. Umemoto A., Scolari M., Vogel E. K., Awh E. (2010). Statistical learning induces discrete shifts in the allocation of working memory resources. Journal of Experimental Psychology: Human Perception and Performance, 36, 1419–1429. doi:10.1037/a0019324
  37. Vickery T. J., Jiang Y. V. (2009). Associative grouping: Perceptual grouping of shapes by association. Attention, Perception & Psychophysics, 71, 896–909. doi:10.3758/APP.71.4.896
  38. Wang R. F., Spelke E. S. (2000). Updating egocentric representations in human navigation. Cognition, 77, 215–250.
  39. Wurtz R. H. (2008). Neuronal mechanisms of visual stability. Vision Research, 48(20), 2070–2089. doi:10.1016/j.visres.2008.03.021
  40. Zhao J., Al-Aidroos N., Turk-Browne N. B. (2013). Attention is spontaneously biased toward regularities. Psychological Science, 24, 667–677. doi:10.1177/0956797612460407
