Visual search for category sets: Tradeoffs between exploration and memory

Melissa M Kibbe; Eileen Kowler

doi:10.1167/11.3.14

Author manuscript; available in PMC: 2012 Dec 11.

Published in final edited form as: J Vis. 2011 Mar 18;11(3):14. doi: 10.1167/11.3.14

PMCID: PMC3519289 NIHMSID: NIHMS425332 PMID: 21421747

Abstract

Limitations of working memory force a reliance on motor exploration to retrieve forgotten features of the visual array. A category search task was devised to study tradeoffs between exploration and memory in the face of significant cognitive and motor demands. The task required search through arrays of hidden, multi-featured objects to find three belonging to the same category. Location contents were revealed briefly by either a: (1) mouseclick, or (2) saccadic eye movement with or without delays between saccade offset and object appearance. As the complexity of the category rule increased, search favored exploration, with more visits and revisits needed to find the set. As motor costs increased (mouseclick search or oculomotor search with delays) search favored reliance on memory. Application of the model of J. Epelboim and P. Suppes (2001) to the revisits produced an estimate of immediate memory span (M) of about 4–6 objects. Variation in estimates of M across category rules suggested that search was also driven by strategies of transforming the category rule into concrete perceptual hypotheses. The results show that tradeoffs between memory and exploration in a cognitively demanding task are determined by continual and effective monitoring of perceptual load, cognitive demand, decision strategies and motor effort.

Keywords: visual search, active vision, memory, exploration, eye movements, saccades, arm movement, immediate memory, oculomotor, categorization, cognitive load, decision-making

Introduction

Active visual tasks, such as searching a room for misplaced keys, or driving a car along an unfamiliar route, can make extraordinary demands on visual, cognitive and motor resources. Information must be gathered from large regions of space and retained for extended periods of time. Visual details are continually being forgotten, and must be retrieved by means of motor actions, such as movements of the eye or head. Decisions about whether to rely on an accumulating (but fragile) memory for the contents of a scene, or to refresh memory by revisiting previously seen locations, may be made at intervals ranging from one to three times each second. These decisions must weigh the risks of relying on a potentially inaccurate memory against the costs in time or effort of generating the motor actions needed to explore the environment. The challenges faced during active tasks increase when the tasks impose significant cognitive requirements involving the generation and evaluation of hypotheses about the contents of the scene. This study investigates the tradeoffs between exploration and memory in a cognitively demanding task that involves both visual search and categorization.

Much recent effort has been devoted to understanding the tradeoffs between memory and motor exploration during active tasks. Initial reports emphasized the limited capacity of memory in contrast to the seemingly unlimited ability to generate eye movements (Ballard, Hayhoe, & Pelz, 1995; O'Regan, 1992). This perspective was supported by novel studies of eye movements during “active” visual tasks, showing that people preferred to re-examine previously seen locations, rather than relying on memory, in order to accomplish tasks such as copying arrangements of colored blocks (Ballard et al., 1995) or solving problems in geometry (Epelboim & Suppes, 2001). Subsequent work, however, altered views about the balance between memory and exploration. Studies showed that despite the limits in the capacity of immediate memory for scene details during active tasks, memory can be better than expected, depending on the importance or predictability of the details (Brady, Konkle, Alvarez, & Oliva, 2009; Droll & Hayhoe, 2007; Hollingworth & Henderson, 2002; Pertzov, Avidan, & Zohary, 2009), the location of the details relative to the planned pathway of the saccadic eye movements (Bays & Husain, 2008; Gersch, Kowler, Schnitzer, & Dosher, 2008), or the number of times details were previously viewed (Epelboim et al., 1995; Melcher, 2001; Melcher & Kowler, 2001). In addition, motor exploration proved not to be cost-free. Planning of saccadic eye movements requires time and attention, so that people often avoid making saccades, or decide to alter the saccadic path, if the time needed for planning saccades is too long (Araujo, Kowler, & Pavel, 2001; Coëffé & O'Regan, 1987; Hooge & Erkelens, 1998) or if the distances that must be traveled are large (Ballard et al., 1995; Hardiess, Gillner, & Mallot, 2008; Inamdar & Pomplun, 2003). Taken together, these prior findings show that management of resources during active visual tasks is not a matter of favoring either memory or motor planning exclusively, but requires decisions about how to strike the appropriate balance between the two.

The prior work cited above focused mainly on perceptual or perceptual-motor tasks that made significant demands on visual memory and motor planning. Natural tasks, however, often impose significant cognitive demands as well. We investigated the role of both cognitive and motor demands in controlling the tradeoff between memory and exploration by testing performance in a difficult visual search task. The task required searchers to explore arrays of hidden objects to find three multi-featured targets that belonged to the same category. Cognitive demands were controlled by varying the complexity of the rule that defined the category. Motor demands were varied by changing the effector mediating the search (arm or eye) and by imposing different time constraints. The goal was to find out how the cognitive and motor demands of the task affected strategies of relying on memory or exploration.

How might cognitive demands affect strategies for balancing memory and exploration? When thinking and decision making become demanding, the best strategy may be to decrease the reliance on immediate or working memory in favor of a greater reliance on exploration, thereby freeing the limited memory resources for use in thinking or planning, rather than in retaining the contents of the display. Cognitive demands could also have more subtle effects. For example, it is possible that only a subset of the features of fixated objects are encoded during each glance (Alvarez & Cavanagh, 2004; Bays, Catalao, & Husain, 2009; Droll, Hayhoe, Triesch, & Sullivan, 2005; Olson & Jiang, 2002), and the selection of which features to encode may depend on the task demands.

The effect of cognitive task demands on memory has been addressed previously using dual-task methods. In a classic study, Baddeley and Hitch (1974) concluded that cognitive demands of a task do not affect memory based on their finding that words or numbers could be retained in working memory during simultaneous performance of a separate, unrelated task. Other studies have shown that sets of visual objects can be retained in working memory while performing a concurrent visual search task (Woodman, Vogel, & Luck, 2001), although visual search is slowed when the spatial locations of objects have to be retained during search (Oh & Kim, 2004; Woodman & Luck, 2004). Similarly, learning a new, rule-based category was more difficult when performing a concurrent working memory task, suggesting that working memory resources were required for learning the category (Zeithamova & Maddox, 2006). However, in such dual-task studies, in contrast to most real-world tasks, the to-be-remembered array is unrelated to the primary task. This encourages a compartmentalization of resources in ways that might not be relevant when memory and cognitive demands are integrated into a single task (Sperling & Dosher, 1986), as they are in most natural situations.

Present study

The present study imposed concurrent cognitive, perceptual and motor demands during a visual search task and used the observed search pattern, mediated by either arm or eye movements, to infer how memory resources were managed. The task required subjects to search through an array of hidden, multi-featured objects to find three that belonged to the same category. The task was constructed so that objects could be viewed only one at a time, allowing the searcher to decide when to go back and re-explore a previously visited location. Since only one object was viewed at a time, the experimenters could keep track of these decisions by observing the searcher's motor behavior.

The approach was inspired in part by Epelboim and Suppes's (2001) study of eye movements while solving geometry problems. They used sequences of eye fixations to estimate the span of immediate memory by analyzing the pattern of revisits to previously viewed locations. They estimated the span of immediate memory to be about 4 or 5 regions of a diagram (similar to typical estimates; e.g., Luck & Vogel, 1997), with revisits serving to replenish this limited store as regions were forgotten. We will apply Epelboim and Suppes's (2001) model to our search data. (See Zelinsky, Loschky, & Dickinson, 2010, for a similar model of revisits, applied to a memorization task.)

The cognitive demands of Epelboim and Suppes's (2001) geometry task were considerable, and were likely to have played a large role in determining which regions of the diagram were fixated. Subjects decided which regions of a diagram were most relevant as they worked through the problem, and thus were able to make strategic decisions about limited resource allocation “online”. This makes for a more natural task in which limited resources must be allocated dynamically. However, because of the complex nature of the geometry problems it is difficult to systematically quantify how task demands affected memory use.

In the present study, cognitive task demands were controlled by varying the complexity of the search rules, and motor demands were controlled by varying the time and effort needed to visit locations. Specifically:

(1) Variation in the cognitive demands of the task. The cognitive demands of the search task were manipulated by varying the complexity of the rule that defined the set of target objects. Previous work has shown that the subjective difficulty of a category rule depends on both the number of features relevant to the rule and the nature of the decisions about candidate targets (e.g., searching for objects that share one or more features vs. objects that differ on one or more features). Feldman (2000), following on classical work by Shepard, Hovland, and Jenkins (1961), found that the difficulty of learning a new category from examples was proportional to the length of the shortest propositional formula that is logically equivalent to that category; the more logically complex the formula, the more difficult the category was to learn (see also Aitkin & Feldman, 2006; Feldman, 2006; Pothos & Chater, 2002; Pothos & Close, 2008). More recently, Jacob and Hochstein (2008) found that, in a search task in which subjects had to find sets of objects with either the same features or different features, same-feature sets were detected more quickly than different-feature sets. They concluded that detecting similarities results from use of a basic, built-in perceptual process, and is thus less effortful than finding differences.

The present study used five different category rules which were based on either one, two, or three of the objects' four features. Each rule required evaluating either conjunctions of features (each feature value is the same), or exclusive disjunctions of features (each feature value is completely different), or both. Based on prior work (Aitkin & Feldman, 2006; Feldman, 2006; Jacob & Hochstein, 2008), the complexity of the five rules used in the current task was assumed to increase by either adding a feature to the rule or by incorporating a disjunction. If the cognitive demands of the task influence memory use, we expect that as category complexity increases, searchers should visit and revisit more object locations, rather than rely on memory, to make the decision. Two additional aspects of the task should be emphasized. First, the task involves category search, and not category learning, since it required finding a set of objects that satisfied a rule presented before each trial (several possible sets that could satisfy the rule were available in each display). Second, the contents of the displays were chosen so that an “ideal searcher” with perfect memory could find the targets after searching the same number of object locations (between 4 and 5) regardless of the category rule. Thus, an effect of the type of category rule on search would imply that the human searcher (unlike the ideal searcher) was encountering performance limits due to limitations imposed by cognitive processes and strategies and not due to statistical fluctuations within the display.
(2) Variation in the motor costs of the task. Motor costs were varied by testing both manual search (Experiment 1), in which subjects searched through objects by clicking locations with a mouse, and the (presumably less demanding) oculomotor search (Experiment 2), in which fixation on a location revealed the object. Two kinds of oculomotor search were tested: search with delay, in which a brief pause was imposed between fixating a location and the appearance of the object at that location, and search with no delay, in which the object appeared as soon as the fixation was detected. In the delay condition, the duration of the delay was chosen such that the time to carry out the search approximated that of the manual search found in Experiment 1. When the motor demands of a task are high (as in manual search, or in delayed oculomotor search), the best strategy may be to reduce exploration, thus minimizing the amount of time or physical action required for the task. If this were the case, manual search or oculomotor search with delay would be carried out with fewer visits and revisits to object locations than oculomotor search with no delay.

As a preview: Manipulations of the cognitive and motor demands of the task altered the search patterns, and, by implication, the use of and reliance on memory. Increasing the cognitive demands of the task, and decreasing the motor demands, each resulted in more visits and more revisits to object locations, that is, a bias to favor exploration over memory. Further analyses done to estimate the span of immediate visual memory from the pattern of revisits using the model of Epelboim and Suppes (2001) yielded estimates similar to those found in prior work with very different task constraints (Epelboim & Suppes, 2001; Jacob & Hochstein, 2009), although the estimates of immediate memory span did vary as a function of both the motor demands and rule complexity. These results show that concurrent monitoring of perceptual states, cognitive load, and motor effort determines the strategies used to control the balance between exploration and memory during active visual tasks.

A portion of these results was presented at meetings of the Vision Sciences Society (Kibbe, 2008; Kibbe, Kowler, & Feldman, 2009).

Experiment 1

Methods

Subjects

Eight subjects participated. Subjects were either undergraduates recruited from the General Psychology subject pool who earned course credits for participation, paid subjects who earned $10 for participation, or graduate student volunteers. Subjects all had normal or corrected-to-normal vision. Four subjects completed two sessions of 25 trials, for a total of 50 trials each, and four subjects completed one session of 25 trials. An additional subject was tested but the data were not analyzed due to a visual impairment not disclosed prior to the experiment.

Stimuli

Stimuli were displayed on a Dell 19” LCD monitor (refresh rate 75 Hz) viewed from a distance of 118 cm. Displays consisted of nine “hidden” objects (2.9° × 2.9°) arranged in a 3 × 3 array (7.8° horizontally by 7.0° vertically). The location of each object was indicated by a black outline (3.1° × 2.8°) and the distance between midpoints of objects was 4.6° horizontally and 4.4° vertically. Objects were revealed by clicking on their location with a mouse. Each object was defined by four trinary features: color (red, green, or blue), shape (oval, rectangle, or diamond), texture (solid, striped, or grid), and orientation (upright, downward, or sideways). Figure 1 shows a sample array.

A sample array of objects for a trial. In the actual experiment, these objects were hidden from view and could be revealed one at a time by either a mouse click (Experiment 1) or an eye fixation (Experiment 2). The category rule is displayed at the bottom of the screen at all times. This sample shows Category S (Objects share one feature). There are 10 possible correct sets in the sample array.

On each trial subjects searched for a set of three objects belonging to the same category according to one of the five possible category rules. The five category rules were formed by combining conjunction and disjunction rules over the objects' features (see Figure 2 for examples):

Category S: Objects share one feature (the other 3 features are irrelevant);
Category SS: Objects share two features (the other 2 features are irrelevant);
Category SD: Objects share one feature, differ on one feature, with the remaining 2 features irrelevant;
Category SSD: Objects share two features, differ on one feature, with the remaining feature irrelevant;
Category SDD: Objects share one feature, differ on two features, with the remaining feature irrelevant.

Category rules and examples of each. During training, subjects were presented with three-object sets and were asked to judge whether they were an example of the category rule.

Here, “differ” means that each object must have a different value of the feature (e.g., one is red, one is blue, and one is green). Since the searcher was never told which specific features to search for, the searcher had to decide which features and objects to explore and which of those might satisfy the category.
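The five rules can be made concrete with a minimal sketch, assuming that "share" means all three objects have the same value on a feature, that "differ" means all three values are distinct, and that a rule is satisfied when at least the required number of features fall into each class (the remaining features being irrelevant). The names `RULES` and `satisfies` and the tuple encoding of objects are illustrative assumptions, not part of the original study.

```python
# Each object is a tuple: (color, shape, texture, orientation).
# RULES maps a category name to (features that must be shared,
# features that must differ). This encoding is an assumption.
RULES = {"S": (1, 0), "SS": (2, 0), "SD": (1, 1), "SSD": (2, 1), "SDD": (1, 2)}


def satisfies(objects, rule):
    """Return True if the three objects form a correct set under `rule`."""
    n_same_needed, n_diff_needed = RULES[rule]
    n_same = n_diff = 0
    # zip(*objects) yields, per feature, the three values across the objects.
    for values in zip(*objects):
        n_distinct = len(set(values))
        if n_distinct == 1:               # all three objects share the value
            n_same += 1
        elif n_distinct == len(objects):  # every object has a different value
            n_diff += 1
    return n_same >= n_same_needed and n_diff >= n_diff_needed


a = ("red", "oval", "solid", "upright")
b = ("red", "rectangle", "striped", "upright")
c = ("red", "diamond", "grid", "sideways")
trio = [a, b, c]  # same color; shape and texture differ; orientation is mixed
```

In this example the trio shares one feature and differs on two, so under these assumptions it satisfies Categories S, SD, and SDD but not SS or SSD.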

An experimental session consisted of 25 trials organized into five blocks of five trials each. Each category rule (see above) was tested once per block. The order of testing rules within a block was pseudo-randomized with the constraint that the same category rule was never tested twice in succession and no two categories ever appeared in the same order in each block.

The nine objects on each trial were selected such that an ideal searcher (with no memory loss), limited only by statistical fluctuations in the content of the display, would have about the same probability of finding a correct set of three objects for each category rule, regardless of complexity. To create each trial, an algorithm tested every possible 3-object combination of a randomly drawn set of nine objects against a given category rule. The algorithm chose the nine objects such that on each trial there were 9 to 12 possible correct sets of three objects for each rule. If the nine objects drawn did not fit the criterion, a new set of nine objects was drawn. To verify the success of this algorithm, an ideal searcher was programmed to perform the search task by choosing locations to visit at random, storing the object at each visited location in memory, and then checking its memory after every visit to see whether it had found a set of three objects that satisfied the category. The ideal searcher completed 500 trials of each category type (2500 total trials) and performed about the same regardless of category rule, requiring an average of 4.7 visits to find a correct set.
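The display-construction step described above amounts to rejection sampling, which might be sketched as follows. This is an illustration under stated assumptions, not the authors' program: feature values are coded 0 to 2, objects are drawn uniformly with replacement from the 81 possible objects, and the names `satisfies`, `count_correct_sets`, and `draw_display` are hypothetical.

```python
import random
from itertools import combinations, product

# (shared features required, differing features required) per rule (assumed encoding).
RULES = {"S": (1, 0), "SS": (2, 0), "SD": (1, 1), "SSD": (2, 1), "SDD": (1, 2)}

# All 3**4 = 81 possible objects: four features with three values each.
ALL_OBJECTS = list(product(range(3), repeat=4))


def satisfies(trio, rule):
    """True if the three objects meet the rule's share/differ requirements."""
    n_same_needed, n_diff_needed = RULES[rule]
    n_same = n_diff = 0
    for values in zip(*trio):  # one tuple of three values per feature
        n_distinct = len(set(values))
        n_same += n_distinct == 1
        n_diff += n_distinct == 3
    return n_same >= n_same_needed and n_diff >= n_diff_needed


def count_correct_sets(display, rule):
    """Test every 3-object combination (84 for nine objects) against the rule."""
    return sum(satisfies(trio, rule) for trio in combinations(display, 3))


def draw_display(rule, rng, lo=9, hi=12, n_objects=9):
    """Redraw nine random objects until the number of correct sets is in [lo, hi]."""
    while True:
        display = [rng.choice(ALL_OBJECTS) for _ in range(n_objects)]
        if lo <= count_correct_sets(display, rule) <= hi:
            return display


display = draw_display("SS", random.Random(0))
```

Under uniform random draws, acceptable displays for the simplest rule (Category S) appear to be rare, so this naive loop may need many redraws for that rule; a practical implementation might bias the draw instead.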

Training

Before beginning the experiment, subjects received 26 training examples to familiarize them with the categories and objects. In each training example, subjects were given a category (e.g. “Objects share one feature”). They were then presented with three objects, and asked to decide whether the three objects belonged to the given category. Subjects were given feedback as to whether they were correct and an explanation as to why or why not. An experimenter observing the subject answered any questions that the subject had about the categories. Subjects were allowed to go through the training examples as many times as they liked until they were confident that they understood the categories. Most subjects felt confident that they had learned the categories after going through all the training examples only once or twice.

Procedure

A sample sequence of events in a trial is illustrated in Figure 3. Before each trial, an instruction screen appeared that defined the category rule for that trial. When subjects were ready to proceed, they clicked on the instruction screen and nine outline rectangles (one per object) appeared. These nine rectangles indicated the location of each object. The content of a given location was revealed by moving the mouse cursor to the location and clicking on the location. A revealed object remained visible for 1 second. Only one object could be viewed at a time. While one object was visible, clicking on another location had no effect. Inspection of the locations continued until the set of 3 had been found. A right-click of the mouse was used to select a location as belonging to the set. A selected object remained visible and was highlighted with a heavy black border. Once selected, an object could not be unselected. After three objects were selected, the trial ended. It was possible to select one or two objects and then continue the search; however, this strategy was followed only rarely (<1%). (Subjects were advised that selecting only a portion of the set before finding all three objects was likely to lead to error since choices could not be revised.) Subjects were allowed to search for up to two minutes, at which point the trial timed out. No subjects failed to complete the trial in the allotted time. Eye movements were not recorded in Experiment 1.
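The trial logic described above can be summarized as a small state machine. The sketch below is illustrative only: the class name `Trial`, its methods, and the use of seconds since array onset as the time base are assumptions, as is the choice to let a right-click select a location at any time and to ignore left-clicks on already selected locations.

```python
class Trial:
    """Minimal model of one manual-search trial (times in seconds from array onset)."""

    REVEAL_S = 1.0     # a revealed object stays visible for 1 second
    TIMEOUT_S = 120.0  # trials time out after two minutes
    SET_SIZE = 3       # the trial ends once three objects are selected

    def __init__(self):
        self.visits = []    # locations revealed by left-click, in order
        self.selected = []  # locations chosen by right-click (cannot be undone)
        self._visible_until = float("-inf")

    def finished(self, t):
        return len(self.selected) == self.SET_SIZE or t >= self.TIMEOUT_S

    def left_click(self, loc, t):
        """Reveal `loc`; ignored while another object is still visible."""
        if self.finished(t) or t < self._visible_until or loc in self.selected:
            return False
        self.visits.append(loc)
        self._visible_until = t + self.REVEAL_S
        return True

    def right_click(self, loc, t):
        """Select `loc` as a member of the set; selections are permanent."""
        if self.finished(t) or loc in self.selected:
            return False
        self.selected.append(loc)
        return True


trial = Trial()
first = trial.left_click(0, t=0.0)    # reveals location 0
blocked = trial.left_click(1, t=0.5)  # ignored: location 0 is still visible
second = trial.left_click(1, t=1.0)   # allowed once the 1 s reveal has elapsed
```

Recording `visits` in order is what allows visits and revisits to be tallied afterward from the motor record alone.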

A sample trial for Category S in Experiment 1. Each screen represents the sequence of actions over the course of the trial. Once three objects were selected, the trial ended.

Results

There were three main performance measures: 1) error (selecting a set of objects that did not satisfy the category rule); 2) mean total number of visits to objects per trial; and 3) mean number of revisits to previously viewed objects per trial.
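The second and third measures can be computed directly from the ordered list of locations a subject revealed on a trial. A minimal sketch, assuming locations are coded as integers 0 to 8 and that a revisit is any visit to a location already viewed earlier in the same trial (the function name is hypothetical):

```python
def count_visits_and_revisits(visit_sequence):
    """Return (total visits, revisits) for one trial's sequence of locations."""
    seen = set()
    revisits = 0
    for loc in visit_sequence:
        if loc in seen:  # this location was already viewed on this trial
            revisits += 1
        seen.add(loc)
    return len(visit_sequence), revisits


# Six visits in total; the second views of locations 4 and 0 count as revisits.
visits, revisits = count_visits_and_revisits([0, 4, 8, 4, 2, 0])
```

Per-trial counts obtained this way would then be averaged within each category rule.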

Error

Error was defined as selecting a set of three objects that did not satisfy the category rule. Figure 4 shows that the mean number of errors within each category rule was low, an average of 1 error or fewer per subject for each category rule, with the highest number of errors for the most complex category. This works out to a total of only 17 trials with errors out of all 275 trials tested (across the 5 rules and 8 subjects). The remaining analyses of visits and revisits are based solely on trials in which an error was not made.

Visits

The mean number of objects viewed per trial was an indicator of the search strategy. The mean number of visits increased significantly as category complexity increased (Figure 5), from about 9 visits/trial for the easiest category rule, to 17 visits/trial for the most difficult rule (F(4) = 7.021, p < 0.001). Post-hoc analysis indicated significant increases in mean visits between adjacent categories, except between Category SD (same on one feature, differing on one feature) and Category SSD (same on two features, differing on one feature) (LSD p < 0.03).

The large effect of the category rule on the number of visits (Figure 5) was not due to variations in the probability of encountering sets of objects that satisfied the category rule. Displays were constructed so that an ideal searcher, with perfect memory and limited only by the statistical fluctuations in the display, could find a correct set using about the same number of visits across the category rules (see Methods). Results of simulations using the ideal searcher tested in 500 simulations per rule are shown in Figure 5. The ideal searcher found correct sets in an average of 4.7 visits, with small (<.5 object) fluctuations across the rules. Further simulations, in which the ideal searcher's memory was limited to the contents of 4 locations, produced similar results, with 4.8 visits on average required to find the sets, and only small differences in the mean number of objects viewed across the categories. Thus, the effect of rule complexity on the number of visits, shown in Figure 5, represents limitations imposed by cognitive factors or by variation in search strategies across the different rules.
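A simulation of this kind might look like the following sketch. Several details are assumptions not stated in the text: the searcher picks uniformly among locations not currently held in memory, a limited memory drops its oldest entry first, and the names `ideal_search` and `satisfies` are hypothetical. Objects are coded as tuples of four feature values.

```python
import random
from collections import deque
from itertools import combinations

# (shared features required, differing features required) per rule (assumed encoding).
RULES = {"S": (1, 0), "SS": (2, 0), "SD": (1, 1), "SSD": (2, 1), "SDD": (1, 2)}


def satisfies(trio, rule):
    """True if the three objects meet the rule's share/differ requirements."""
    n_same_needed, n_diff_needed = RULES[rule]
    n_same = n_diff = 0
    for values in zip(*trio):
        n_distinct = len(set(values))
        n_same += n_distinct == 1
        n_diff += n_distinct == 3
    return n_same >= n_same_needed and n_diff >= n_diff_needed


def ideal_search(display, rule, rng, capacity=None, max_visits=10_000):
    """Return the number of visits needed to hold a correct set in memory.

    capacity=None models perfect memory; an integer keeps only the most
    recently visited locations. Returns None if no set is found.
    """
    memory = deque(maxlen=capacity)  # remembered locations, oldest first
    for n_visits in range(1, max_visits + 1):
        candidates = [loc for loc in range(len(display)) if loc not in memory]
        if not candidates:  # everything is remembered and no set exists
            return None
        memory.append(rng.choice(candidates))
        remembered = [display[loc] for loc in memory]
        if any(satisfies(trio, rule) for trio in combinations(remembered, 3)):
            return n_visits
    return None


# Toy display in which every object shares feature 0, so any three objects
# satisfy Category S and the search must end on the third visit.
toy_display = [(0, i % 3, i // 3, 0) for i in range(9)]
n_perfect = ideal_search(toy_display, "S", random.Random(1))
n_limited = ideal_search(toy_display, "S", random.Random(1), capacity=4)
```

Averaging the returned visit counts over many displays per rule would give the kind of per-rule comparison reported for the ideal searcher; the toy display here is only a sanity check and is not expected to reproduce the reported means.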

Adding disjunctions to the category rule played a larger role in increasing the number of visits than adding features (see increases between Category SS vs. SD and Category SSD vs. SDD in Figure 5). This suggests that the increases in the number of visits across category rules were not due exclusively to limitations on the number of features that can be held in memory.

Revisits

Performance was characterized by frequent revisits to previously seen object locations. The mean number of revisits increased as complexity increased (F(4) = 9.026, p < .001, Figure 5), with revisits constituting more than half of the total number of visits for the most complex category. Post-hoc analyses showed that adding a disjunction to the category rule resulted in a significant increase in revisits (Category SS vs. SD: LSD mean difference = 3.27 revisits, p < 0.05; Category SSD vs. SDD: LSD mean difference = 5.98 revisits, p < 0.001).

Summary

Search became more difficult as the complexity of the categories increased, requiring more visits to object locations, more revisits, and resulting in more errors. The effect could not be driven by statistical fluctuations in the stimuli because stimuli were chosen such that an ideal searcher performed nearly identically on each category.

The increase in number of visits across the categories was also not due solely to the number of features defining the category. There were significant increases in the number of visits between categories defined over the same number of features, but differing in categorical structure, as when a category rule contained a disjunction rather than a conjunction. Thus, the effects of category rule on search involved issues of cognitive search strategies, and not exclusively feature memory load.

Experiment 1 required arm movements and mouse clicks to search the object array. Experiment 2 reduced the motor demands by testing search mediated by eye movements. In oculomotor search, the motor demands should be reduced because moving the arm is more effortful and takes longer than moving the eye. In the oculomotor search task, a saccade-contingent method was used so that the contents of each location were not visible until the location was fixated. Given the expected difference between the time needed for oculomotor and manual search, two different types of oculomotor search were tested: 1) a no-delay condition in which the contents of a location were revealed as soon as it was fixated, and 2) a delay condition in which a brief delay was imposed between the fixation of an object and that object becoming visible.