Searching near and far: the attentional template incorporates viewing distance

Surya Gayet; Elisa Battistoni; Sushrut Thorat; Marius V Peelen

doi:10.1037/xhp0001172

Author manuscript; available in PMC: 2024 Sep 11.

Published in final edited form as: J Exp Psychol Hum Percept Perform. 2024 Feb 1;50(2):216–231. doi: 10.1037/xhp0001172

PMCID: PMC7616437 EMSID: EMS198528 PMID: 38376937

Abstract

According to theories of visual search, observers generate a visual representation of the search target (the ‘attentional template’) that guides spatial attention towards target-like visual input. In real-world vision, however, objects produce vastly different visual input depending on their location: your car produces a retinal image that is ten times smaller when it’s parked fifty compared to five meters away. Across four experiments, we investigated whether the attentional template incorporates viewing distance when observers search for familiar object categories. On each trial, participants were precued to search for a car or person in the near or far plane of an outdoor scene. In ‘search trials’, the scene reappeared and participants had to indicate whether the search target was present or absent. In intermixed ‘catch-trials’, two silhouettes were briefly presented on either side of fixation (matching the shape and/or predicted size of the search target), one of which was followed by a probe-stimulus. We found that participants were more accurate at reporting the location (Exp. 1 & 2) and orientation (Exp. 3) of probe-stimuli when they were presented at the location of size-matching silhouettes. Thus, attentional templates incorporate the predicted size of an object based on the current viewing distance. This was only the case, however, when silhouettes also matched the shape of the search target (Exp. 2). We conclude that attentional templates for finding objects in scenes are shaped by a combination of category-specific attributes (shape) and context-dependent expectations about the likely appearance (size) of these objects at the current viewing location.

Significance statement

When searching for an object in our surroundings, traditional theories of visual search posit that we generate a mental picture of the object we are looking for (the “attentional template”). Depending on where we look (e.g., further away), however, an object will produce a vastly different (i.e., smaller) image on the retina. Here we show that observers flexibly adjust their attentional template, based on their current search location, effectively accounting for viewing distance by searching for a smaller version of the object when searching further away. These findings reconcile traditional theories of visual search with the challenges imposed by naturalistic vision.

Materials and resources

All materials (stimuli, experiment scripts, raw data, data processing scripts, complete output of statistical analyses) are publicly accessible via the following online repository: https://osf.io/84tbv/.

Introduction

Every moment in time our retinae collect unfathomable amounts of information from the world around us. Because the vast majority of this visual input is irrelevant to our current behavioral goals, our visual system is equipped with means to favor behaviorally relevant visual input over irrelevant visual input. One such means lies at the heart of most leading theories of visual search: these theories posit that observers generate a visual representation of the object they are looking for (a so-called attentional template), thus optimally preparing the visual processing stream to favor visual input that resembles the template (such as the target object), at the expense of visual input that does not (Duncan, & Humphreys, 1989; Wolfe, 1994; Desimone, & Duncan, 1995; Kastner, & Ungerleider, 2001; Wolfe, & Horowitz, 2004; Eimer, 2014; for reviews, see Battistoni, et al., 2017; Beck, & Kastner, 2009). Evidence for template-based visual search mostly comes from lab-based studies using impoverished visual displays, which stand in stark contrast with the complexity of naturalistic visual environments. Therefore, it remains a matter of debate to what extent well-established mechanisms of visual search generalize to real-world vision (Wolfe & Horowitz, 2004; Wolfe, et al., 2011; Peelen, & Kastner, 2014; Wolfe, 2021).

Human observers are particularly proficient in detecting objects in naturalistic scenes (Potter, 1975; Thorpe, et al., 1996; Li, et al., 2002; Peelen, et al., 2009; Wolfe, et al., 2011), despite their inherent complexity and clutter, as compared to the typical impoverished displays that are used in most studies investigating visual search. This proficiency suggests that mechanisms of visual search are particularly well-adapted to complex naturalistic vision (Peelen, & Kastner, 2014). Natural scenes provide a rich source of information that observers can capitalize on during search, by constraining the likely locations and identity of objects in the scene (i.e., contextual guidance; Torralba et al., 2006; Neider, & Zelinsky, 2006; Droll & Eckstein, 2008; Malcolm, & Henderson, 2010; Spotorno, et al., 2014; Boettcher et al., 2018; for a review, see Castelhano, & Krzyś, 2020). Naturalistic environments, however, pose a fundamental challenge to the core principle of template-based visual search: the image that any given target object will produce on the retinae is unknown in advance, because it varies with the (unknown) location of the target object. Its color or brightness depends on the illumination (e.g., in the sun, in the shade or under artificial lighting), its shape depends on the viewpoint (e.g., viewed from the side, from above, or at an angle), and, most dramatically, its size can vary by orders of magnitude depending on the distance between the target object and the observer. Consequently, it remains unknown to the observer what template needs to be generated to effectively search for a given target object, which calls into question the usefulness of template-based visual search during real-world vision.

In this study we test one key mechanism that could solve this problem, focusing on the predictable relationship between viewing distance and retinal object size. We test the hypothesis that human observers account for viewing distance when searching for a given object. This would entail that observers effectively search for a smaller projection of the object when searching far away (generating a smaller attentional template), and for a larger projection of the object when searching nearby (generating a larger attentional template). In favor of this hypothesis, it has been shown that attentional templates can be flexibly adjusted to match the current task demands during naturalistic search (Yu et al., 2023). For instance, observers can adjust the tuning (or: precision) of the attentional template, to account for the uncertainty of target object appearance (Lleras et al., 2022; Witkowski, & Geng, 2022; Hout, & Goldinger, 2015; Bravo, & Farid, 2012), or adjust the feature content of the attentional template to optimally distinguish the target object from anticipated distractor objects (Howard, et al., 2011; Boettcher et al., 2020; Lerebourg et al., 2023). Moreover, priming the upcoming target object with word-cues or semantically congruent scenes benefits subsequent search (Stein, & Peelen, 2017; Robbins, & Hout, 2020; Malcolm & Henderson, 2009), suggesting that observers adjust their attentional template to account for the provided context. Most specifically, we recently showed that when participants prepare to search for a target object nearby (compared to far away), patterns of neural activity emerge in visual cortex that are similar to activity patterns evoked by viewing large (compared to small) images of this target object (Gayet, & Peelen, 2022). This shows that the human visual system anticipates the size of an object depending on the viewing distance. But does this visual-like activity evoked during search preparation benefit search behavior in any way? 
In other words: do human observers generate distance-dependent (i.e., size-specific) attentional templates to aid visual search? One finding supporting this possibility is that observers sometimes fail to identify an object that is disproportionally large compared to its background (Eckstein, et al., 2017). Going against our hypothesis, however, are results from studies showing that attentional templates can be invariant to such visual attributes as orientation (Reeder, & Peelen, 2013) and size (Bravo, & Farid, 2009). This invariance may particularly apply to highly familiar real-world object categories (cars, people), for which detection is highly efficient (e.g., Li, et al., 2002; Thorpe, et al., 1996; see also Stein, & Peelen, 2017). According to this view, an object-specific attentional template (e.g., of a car) would benefit search irrespective of its orientation or size. Here, we ask whether the attentional template incorporates (retinal object) size during naturalistic visual search, when size can be directly inferred from the scene context (i.e., viewing distance).

To answer this question, we conducted a series of behavioral lab-based experiments, in which participants were searching for one of two possible object categories (a person or a car), at different viewing distances within outdoor scene photographs. The viewing distance informed participants of the (retinal image) size of the target object, allowing them to incorporate size information in their preparatory attentional template. To test whether the attentional template indeed contained size information we used a dual-task design. In “search trials” participants searched for a pre-cued object category (a car or person) and reported which of two briefly presented scenes contained the target object. Critically, the size of the target object was −in principle− predictable, based on the layout of the search scene (Experiment 1) or on a cue instructing where to search (in depth; Experiments 2-3). The goal of these trials was to motivate participants to instill a preparatory attentional template that could potentially incorporate size information. In intermixed “catch trials”, we used a dot-probe task that allows for probing attentional biases (MacLeod, et al., 1986), and has been used to reveal the contents of the attentional template (Reeder, & Peelen, 2013; Reeder, et al., 2015; Gayet, & Peelen, 2019). In this task, the search cue is unexpectedly followed by two task-irrelevant silhouettes (on both sides of fixation), of a car or person of differing sizes. Participants are tasked with responding to a simple target stimulus presented to the left or right of fixation, immediately after the presentation of the silhouettes. The idea is that, if one silhouette matches the attentional template to a better extent than the other silhouette (e.g., a car versus a person silhouette), attention will be directed to the location of the matching silhouette, thus improving target reports at that location. 
In the current study, this approach allowed us to measure a specific aspect of the search template that is key to naturalistic visual search (whether it incorporates the size of the target object, as predicted from viewing distance), while preserving the experimental control of reductionist experiments.

To preface the results, we demonstrate that attentional templates are retinal size-specific (Experiments 1-3). These size-specific attentional templates, however, only favor size-consistent visual objects that resemble the search target; they do not favor all objects of the predicted size (Experiments 1-2). The data further show that observers could infer the predicted retinal size of the search target from the viewing distance in the scene, following a location cue, even when the viewing distance changed trial-by-trial (Experiments 2-3). This showcases the ability of observers to flexibly change the size of their attentional template when searching at different locations of a visual scene. Importantly, visual discrimination performance (on an orthogonal task) was better at the location of size-consistent compared to size-inconsistent silhouettes (Experiment 3), which implies that size-consistent objects attracted spatial attention. Together, these findings show that observers infer the predicted retinal size of a search target from the viewing distance in a scene to favor target-like visual input during naturalistic visual search.

Experiment 1

Methods

Transparency and openness

The current study adheres to all Transparency and Openness Promotion (TOP) guidelines regarding research transparency; in the OSF project dedicated to this study (https://osf.io/84tbv/) we provide (1) the experiment scripts and stimuli that were used for data collection, (2) the raw data, (3) the data pre-processing and analysis scripts, and (4) the complete output of all statistical analyses. The experiments in this study were not pre-registered. Nonetheless, we believe that the risk of false positive inflation caused by the degrees of freedom in data analysis choices is minimized by (1) applying minimal data exclusion, by (2) presenting three internal (conceptual) replications of the main finding, by (3) using the exact same analysis pipeline in all studies, and by (4) showing consistent statistical outcomes across different types of statistical tests. The years of data collection were 2017 (Experiment 1), 2019 (Experiment 2), and 2021 (Experiment 3).

Participants

Thirty healthy students from the University of Trento participated in Experiment 1, which comprised two experimental sessions conducted on different days. All participants (25 women; mean age 23.3 years, SD = 3.8) had normal or corrected-to-normal vision and provided written informed consent to take part in the study. Most participants received monetary compensation (€8,-/session), but three participants took part for course credits. The experiment was approved by the Ethics Committee of the University of Trento. The sample size for Experiment 1 was based on resource availability; formal power analyses were conducted for all subsequent experiments (see Methods section of Experiment 2).

Setup

Stimuli were presented on a 19” Philips 109P monitor with a screen resolution of 1024 x 768 pixels and a refresh rate of 100 Hz. Stimulus presentation and response registration were done with MATLAB 8.0 using Psychtoolbox-3 (Brainard, 1997; Pelli, 1997). All stimuli were presented on a uniform gray background, with a black plus-sign (“+”) at the center serving as a fixation point. Viewing distance was fixed at 55 cm from the monitor using a chin-rest.

Natural scene stimuli (search trials)

A total of 378 outdoor scene photographs were found via Google Image search or retrieved from previous studies. Of those, 162 had target objects (i.e., people or cars) in the foreground (near location), which were thus relatively large: 54 scenes with cars, 54 scenes with people, and 54 scenes with cars and people. Another 162 scenes had target objects in the background (far location), which were thus relatively small: again, this comprised 54 scenes with cars, 54 scenes with people, and 54 scenes with cars and people. The remaining 54 scenes contained no target objects. In order to increase the number of scene stimuli, each of these 378 scenes was horizontally mirrored, amounting to a total of 756 unique scene stimuli. The 324 scenes with near/large target objects were used in one experimental session (the Near Target session), the 324 scenes with far/small target objects were used in another experimental session (the Far Target session), and the remaining 108 scenes without target objects were used in both sessions (see Figure 1a).

Figure 1. (a) Scene stimuli used in the search task. During the Near Targets session, target objects (person or car) were located in the foreground, and their retinal image size was therefore relatively large. During the Far Targets session, target objects were located in the background, thus producing a relatively small retinal image. (b) Silhouette stimuli used in the catch trials. The sizes of the silhouettes were matched to the sizes of the target objects presented within the search trial scenes.

All scenes were converted to greyscale and rescaled to 427 (horizontal) by 320 (vertical) pixels, subtending 15.8 by 11.7 degrees of visual angle. The average height of the target objects was 52 pixels for “far” persons, 240 pixels for “near” persons, 56 pixels for “far” cars, and 287 pixels for “near” cars. Of note, the largest “far” object of the stimulus set was smaller than the smallest “near” object, thus ensuring the validity of the session-specific manipulation of expected object size.
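The conversion from pixels to degrees of visual angle can be sketched as follows. This is an illustration rather than the authors' code: the visible screen width (36.6 cm) is an assumption that is not reported in the text, pixels are assumed to be square, and the function name is hypothetical.

```python
import math

def visual_angle_deg(size_px, screen_width_px, screen_width_cm, distance_cm):
    """Visual angle (degrees) subtended by a stimulus centered at fixation."""
    # Convert the stimulus extent from pixels to centimeters on the screen.
    size_cm = size_px * screen_width_cm / screen_width_px
    # Standard formula: angle = 2 * atan(size / (2 * viewing distance)).
    return math.degrees(2 * math.atan(size_cm / (2 * distance_cm)))

# ASSUMPTION: visible screen width of about 36.6 cm (not stated in the text).
SCREEN_WIDTH_CM = 36.6
SCREEN_WIDTH_PX = 1024   # reported screen resolution (horizontal)
DISTANCE_CM = 55         # reported viewing distance

scene_width_deg = visual_angle_deg(427, SCREEN_WIDTH_PX, SCREEN_WIDTH_CM, DISTANCE_CM)
far_person_deg = visual_angle_deg(52, SCREEN_WIDTH_PX, SCREEN_WIDTH_CM, DISTANCE_CM)
near_person_deg = visual_angle_deg(240, SCREEN_WIDTH_PX, SCREEN_WIDTH_CM, DISTANCE_CM)
```

Under the assumed screen width, the horizontal scene extent comes out close to the reported 15.8 degrees; a different screen width would shift all values proportionally.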

Silhouette stimuli (catch trials)

The stimuli used in the catch trials were black silhouettes of cars and people, presented on the uniform gray background. A total of 576 silhouettes were selected from stimuli used in previous experiments, or created from images of cars and people found via Google Image search, using GIMP (https://www.gimp.org). This resulted in 144 unique silhouette stimuli in each size (large, small) and category (person, car) condition (see Figure 1b). These silhouettes were scaled to match the sizes of the target objects presented within the natural scenes used in the search task.

Experimental procedure

The experiment consisted of two sessions of 45 minutes each: a “Near Targets” session in which all target objects in the scenes were relatively nearby (and thus subtended a large retinal image), and a “Far Targets” session in which all target objects in the scenes were relatively far away (and thus subtended a small retinal image). Each participant completed both sessions on separate days, and the second session was completed within a week of the first session. The order of sessions (“Near Targets” first or “Far Targets” first) was counterbalanced across participants. Each session comprised nine blocks of 64 trials each, of which 48 were search trials (75%) and 16 were catch trials (25%). The silhouettes were large in half of the catch trials, and small in the other half. Therefore, each block comprised catch trials with two size-consistent silhouettes (i.e., large silhouettes in the “Near Targets” session, small silhouettes in the “Far Targets” session) and catch trials with two size-inconsistent silhouettes (i.e., large silhouettes in the “Far Targets” session, small silhouettes in the “Near Targets” session). The order of trials within a block was pseudo-randomized, so that search trials, catch trials with large silhouettes, and catch trials with small silhouettes were intermixed. The only restriction was that the first three trials of each block were always search trials, to ensure that participants were engaged in the (size-specific) search task before the first catch trial appeared. At the start of each session participants performed one practice block to familiarize themselves with the task.
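The block structure described above can be sketched as a short trial-order generator. This is a minimal illustration, not the experiment script (which was written in MATLAB and is available in the OSF repository). It assumes that the large and small catch trials are split evenly within each block (8 and 8); the function and label names are hypothetical.

```python
import random

def make_block(rng, n_search=48, n_catch_large=8, n_catch_small=8, n_lead_search=3):
    """Return a pseudo-randomized list of trial types for one 64-trial block."""
    # The first three trials are always search trials.
    lead = ["search"] * n_lead_search
    # ASSUMPTION: the remaining trials are shuffled without further constraints.
    rest = (["search"] * (n_search - n_lead_search)
            + ["catch_large"] * n_catch_large
            + ["catch_small"] * n_catch_small)
    rng.shuffle(rest)
    return lead + rest

# A seeded generator makes the example reproducible.
block = make_block(random.Random(0))
```

Whether a silhouette size counts as size-consistent then depends on the session: large silhouettes are consistent in the “Near Targets” session and small silhouettes in the “Far Targets” session.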

Search trials

The order of events in search trials is depicted in the top row of Figure 2. Each search trial started with a central fixation cross (500 ms), followed by the letter “C” or “P” (500 ms), which instructed participants to search for a car or person in the upcoming scene images (for Italian-speaking participants, this was replaced with an “M” or “P”, for “macchina” and “persona” respectively).

After another fixation cross (1000 ms), during which observers could prepare for the search task, two scenes were simultaneously presented for 67ms on either side of fixation, in one of four possible combinations: (1) car in the left scene, person in the right scene; (2) person in the left scene, car in the right scene; (3) both person and car in the left scene, no target objects in the right scene; and (4) no target objects in the left scene, both person and car in the right scene. These combinations ensured that viewing one object (e.g., a car) in a scene was not predictive of the location of the other object, hence inciting participants to search for the cued object (rather than inferring its location from the location of the other object).

The scenes were followed by a blank screen of variable duration (range [10 ms, 300 ms]), and two backward masks that covered the same presentation area as the scenes (350 ms). The duration between scene offset and mask onset was titrated using an adaptive staircase procedure, aiming at a search task performance of 75% correct in both (“Near Targets” and “Far Targets”) experimental sessions. This was done by reducing the duration of the blank screen by 20 ms when accuracy (from the 6th trial onwards) rose above 75% and by increasing its duration by 20 ms when accuracy dipped below 75% correct.
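The staircase rule can be sketched as a single update function. This is an illustrative reading of the description, not the authors' implementation: it assumes that accuracy is the running proportion correct over all completed search trials, that adjustments begin once six trials have been completed, and that the duration is clamped to the stated range. All names are hypothetical.

```python
def update_blank_ms(blank_ms, n_correct, n_trials,
                    target=0.75, step_ms=20, lo_ms=10, hi_ms=300, first_trial=6):
    """Return the scene-to-mask blank duration (ms) for the next search trial."""
    # ASSUMPTION: no adjustment before the sixth completed trial.
    if n_trials < first_trial:
        return blank_ms
    accuracy = n_correct / n_trials
    if accuracy > target:
        blank_ms -= step_ms   # too easy: mask follows the scene sooner
    elif accuracy < target:
        blank_ms += step_ms   # too hard: allow more time before the mask
    # Keep the duration within the reported range of 10 to 300 ms.
    return max(lo_ms, min(hi_ms, blank_ms))
```

For example, with a current blank of 150 ms and six of six trials correct, the next blank would be 130 ms; at the 10 ms floor, further correct responses cannot shorten the interval.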

The masks were followed by a fixation cross (1660 ms), during which observers reported which scene (left or right of fixation) contained the target object, using the “z” and “n” keys (for left or right scene, respectively). Finally, a feedback screen (500 ms) indicated whether they were correct (“+1”) or incorrect (“+0”).

To test whether the staircase procedure was successful in equating search task difficulty between the Near Targets session and the Far Targets session, we conducted a 2x2 repeated-measures ANOVA with the factors Object (person versus car) and Distance (near versus far), on both accuracy and response times. A main effect of Distance on accuracy showed that participants were more accurate in localizing target objects in the Near Targets session (M = 87.9%, SD = 4.2) than in the Far Targets session (M = 74.5%, SD = 6.5), F(1,29) = 179.12, p < .001, η² = .729. Similarly, a main effect of Distance on reaction times showed that participants were faster in localizing target objects in the Near Targets session (M = 549 ms, SD = 93) than in the Far Targets session (M = 609 ms, SD = 100), F(1,29) = 11.18, p = .002, η² = .257. These results show that larger objects remained easier to find than smaller objects, despite the thresholding procedure that was aimed at equating performance between Distance conditions. This probably reflects that localization of relatively large objects was too easy with a presentation time of 67 ms, even at the shortest scene-mask interval of 10 ms (which motivated us to use a different staircase procedure in Experiments 2 and 3).

Catch trials

The order of events in catch trials (dot-probe task) is depicted in the second and third row of Figure 2. The start of a catch trial was indistinguishable from that of a search trial, thus inciting participants to generate an attentional template in anticipation of the search task. That is, the trial started with a fixation cross (500 ms), a letter cue (500 ms), and another fixation cross (1000 ms). Then, instead of two scenes, two silhouettes were presented on either side of fixation (for 67 ms). The two silhouettes were either both small or both large (i.e., they were both either consistent or inconsistent with the size of search targets in the current session), and one silhouette was always of a car and the other of a person (i.e., one silhouette matched and the other silhouette mismatched the category of the search target).

After the silhouettes, a fixation point was briefly presented (50 ms), and a small circular target dot appeared on one side of fixation (100 ms), either at the location of the silhouette that matched the category of the search target (valid trials) or at the location of the mismatching silhouette (invalid trials).

After the offset of the target dot, the fixation cross remained on screen for 1660 ms, during which participants could report the location of the target dot (left or right of fixation), using the “z” and “n” keys (for left or right, respectively). Participants were instructed to ignore the task-irrelevant silhouettes. Finally, a feedback screen (500 ms) indicated whether they were correct (“+1”) or incorrect (“+0”).

Data analysis

We focus our analyses of catch trials on accuracy because pilot experiments revealed that our effects of interest were better captured by accuracy differences than by reaction time differences between conditions. For transparency, and to verify that our reported effects are not the result of changes in speed-accuracy trade-offs, we report all reaction time analyses in Supplemental Materials S1. Before performing the analyses, we collapsed the catch-trial data across all conditions of non-interest (e.g., the specific category of the silhouette); additional analyses in Supplemental Materials S2 show that none of the outcomes reported in the main manuscript depend on these conditions of non-interest.

All tests reported in the Results section and Supplemental Materials are two-tailed within-subject tests with a significance threshold of 0.05. To compare between pairs of conditions, we use paired-samples t-tests when normality assumptions are met (according to a Shapiro-Wilk test, with a significance threshold of 0.05), and we use Wilcoxon signed-rank tests when they are violated. In case multiple factors are included in the analysis (e.g., Experiment 1), we always use repeated-measures ANOVAs, which are robust to violations of normality (Blanca, Arnau, García-Castro, & Bono, 2023) and offer more flexibility than the non-parametric alternatives. Whenever parametric tests are used, we report parametric measures of central tendency (mean), effect sizes (d_z, or η²), and spread (standard deviation). Conversely, whenever non-parametric tests are used, we report non-parametric measures of central tendency (median), effect sizes (rank-biserial correlation), and spread (inter-quartile range). Finally, for all critical tests, we also conducted two-sided one-sample bootstrap tests (1 × 10⁶ permutations) comparing the difference between conditions-of-interest to zero.
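A two-sided one-sample bootstrap test of a paired difference against zero could look roughly like the sketch below. The text does not specify the exact bootstrap procedure, so the percentile method, the p-value definition, and the reduced number of resamples (10,000 rather than one million) are all assumptions; the example data are invented for illustration.

```python
import random
import statistics

def bootstrap_mean_test(diffs, n_boot=10_000, seed=0):
    """Percentile bootstrap of the mean paired difference, tested against zero."""
    rng = random.Random(seed)
    n = len(diffs)
    # Resample participants with replacement and store the mean of each resample.
    boot_means = sorted(
        statistics.fmean(rng.choices(diffs, k=n)) for _ in range(n_boot)
    )
    # 95% percentile confidence interval.
    ci_low = boot_means[int(0.025 * n_boot)]
    ci_high = boot_means[int(0.975 * n_boot) - 1]
    # ASSUMPTION: two-sided p is twice the smaller tail proportion around zero.
    frac_below = sum(m <= 0 for m in boot_means) / n_boot
    frac_above = sum(m >= 0 for m in boot_means) / n_boot
    p_value = min(1.0, 2 * min(frac_below, frac_above))
    return statistics.fmean(diffs), (ci_low, ci_high), p_value

# Hypothetical per-participant differences in proportion correct (not real data).
example_diffs = [0.05, 0.02, 0.08, -0.01, 0.04, 0.03, 0.06, 0.00, 0.07, 0.02]
mean_diff, ci, p = bootstrap_mean_test(example_diffs)
```

Because the bootstrap makes no normality assumption, it can serve as a check alongside both the parametric and the rank-based tests.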

To address the main question of whether observers incorporate the predicted retinal size of a target object in the attentional template, we analyzed participants’ average accuracy on catch trials. Catch trial data were analyzed as a function of two experimental factors: category-validity (of the target dot location relative to the silhouettes), and size-consistency (of the silhouettes with the search task session). In valid trials the target dot appeared at the location of a silhouette that matched the search cue (i.e., a car silhouette when participants were cued to search for a car, or a person silhouette when participants were cued to search for a person). In invalid trials the target dot appeared at the opposite location, where the silhouette mismatched the search cue (i.e., a car silhouette when participants were cued to search for a person, or a person silhouette when participants were cued to search for a car). In half of the trials, the silhouettes were size-consistent, which entails that the size of the silhouettes was consistent with the size of the search targets (i.e., large silhouettes in the “Near Targets” session, and small silhouettes in the “Far Targets” session). In the other half of the trials, the silhouettes were size-inconsistent, which entails that the size of the silhouettes was inconsistent with the size of the search targets (i.e., large silhouettes in the “Far Targets” session, and small silhouettes in the “Near Targets” session). Figure 3a illustrates the four conditions of the 2x2 factorial design. Mean accuracy scores were computed for each participant and for each of the four conditions of interest, only excluding trials in which no response was provided within the 1660ms time window.

Figure 3. (a) Visualization of the two-by-two factorial design (for simplicity, all four cells depict “Person” search, in a “Near Targets” session). The dot target appeared either at a valid location (i.e., at the location of a person silhouette following the “P” search cue, or at the location of a car silhouette following the “C” search cue) or an invalid location (vice versa). The size of the silhouettes was either consistent with the size of the search targets (i.e., large silhouettes in a “Near Targets” session, or small silhouettes in a “Far Targets” session) or inconsistent (vice versa). (b) Left: mean proportion correct in each of the 2x2 conditions depicted in panel a. Right: validity effect (performance on valid minus invalid trials) for the size-consistent and size-inconsistent conditions. Transparent dots are individual participant means; error bars in the interaction plot represent the within-subject standard error of the mean (Cousineau, 2005); the whisker on the right-most bar of the difference plot shows the 95% confidence interval of the paired difference between size-consistency conditions.

Results

Catch trial analysis

If participants generate an attentional template that incorporates the predicted retinal size of a target object, the category-validity effect (higher accuracy for reporting target dots appearing at the category-valid location than the category-invalid location) should be more pronounced on trials with size-consistent silhouettes than with size-inconsistent silhouettes. This would imply that size-consistent silhouettes more closely resemble the attentional template than size-inconsistent silhouettes and, thus, that size information is incorporated in the attentional template.

Following size-consistent silhouettes, participants were more accurate on category-valid trials (M = 98.6%, IQR = 2.8) than on category-invalid trials (M = 89.6%, IQR = 10.8), W = 406, p < .001, rank-biserial correlation = 1.00 (p_bootstrap < .001, 95% CI [6.3%, 11.6%]). Following size-inconsistent silhouettes as well, participants were more accurate on category-valid trials (M = 99.3%, IQR = 1.4) than on category-invalid trials (M = 93.8%, IQR = 12.2), W = 300, p < .001, rank-biserial correlation = 1.00 (p_bootstrap < .001, 95% CI [4.5%, 9.3%]). The occurrence of this validity effect shows that the attentional template contained category-selective information (i.e., distinguishing between car and person targets). Most importantly, and confirming our main hypothesis, this category-validity effect was larger for size-consistent silhouettes (M = 7.6%, IQR = 9.0) than for size-inconsistent silhouettes (M = 4.2%, IQR = 11.5), as showcased by a significant interaction effect between category-validity and size-consistency on response accuracy, F(1,29) = 9.88, p = .004, η² = .009 (p_bootstrap = .002, 95% CI [0.7%, 3.2%]). This pattern of results (visualized in Figure 3b) supports the hypothesis that participants incorporated the expected size of target objects in their attentional template.
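In a 2x2 fully within-subject design, the interaction is equivalent to comparing the validity effect (valid minus invalid accuracy) between the two size-consistency conditions, and the interaction F(1, n−1) equals the squared paired t statistic on that difference of differences. The sketch below illustrates this relationship; it is not the authors' analysis script, the function names are hypothetical, and the four-participant data set is invented.

```python
import math
import statistics

def validity_effect(acc_valid, acc_invalid):
    """Per-participant validity effect: accuracy on valid minus invalid trials."""
    return [v - i for v, i in zip(acc_valid, acc_invalid)]

def interaction_f(effect_consistent, effect_inconsistent):
    """F statistic and degrees of freedom for the 2x2 within-subject interaction."""
    # Difference of differences, one value per participant.
    diffs = [c - i for c, i in zip(effect_consistent, effect_inconsistent)]
    n = len(diffs)
    # One-sample t statistic of the difference scores against zero.
    t = statistics.fmean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    # For a single-degree-of-freedom within-subject contrast, F = t squared.
    return t ** 2, (1, n - 1)

# Hypothetical proportions correct for four participants (not real data).
effect_con = validity_effect([0.98, 0.99, 0.97, 1.00], [0.90, 0.88, 0.91, 0.89])
effect_inc = validity_effect([0.99, 0.98, 0.99, 1.00], [0.95, 0.93, 0.96, 0.92])
f_value, df = interaction_f(effect_con, effect_inc)
```

Framing the interaction as a single difference score per participant is also what makes it possible to submit it to the one-sample bootstrap test described in the Data analysis section.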

Note that the main effect of size-consistency was also significant, F(1,29) = 9.50, p = .004, η² = .009 (p_bootstrap = .001, 95% CI [0.5%, 2.1%]), which shows that, irrespective of the location of the target dot, presenting two size-consistent silhouettes interfered more with catch-trial localization performance than presenting two size-inconsistent silhouettes (i.e., the vertical offset between lines in Figure 3b).

Interim discussion

The goal of Experiment 1 was to test whether observers incorporate the expected size of a target object in their attentional template. This hypothesis was confirmed. Category-specific silhouette stimuli influenced localization reports of the target dot more when they matched the expected size of the cued target object (e.g., small silhouettes in a “Far Targets” session) than when they mismatched the expected size (e.g., small silhouettes in a “Near Targets” session). This implies that the expected size of the cued target object was used during search preparation; otherwise, the dot-probe performance for size-consistent and size-inconsistent silhouette conditions would not differ.

In this experiment, however, observers might not have predicted the size of the target object based on the viewing distance, but could have based their expectations of object size on the prevalence of (large or small) target objects within an experimental session. As such, it remains unclear whether observers could also incorporate object size in their attentional template during real-world search, where size needs to be inferred from the viewing distance in the scene, on a moment-to-moment basis.

The goal of Experiment 2A was to test whether observers also incorporate object size in their attentional template when they need to infer the size of the target object from the current search location in a scene, as would be done during real-world visual search. To this end, participants now previewed the search scene that contained a location cue, informing participants about the viewing distance to the object (and thus its retinal size). This approach also allows us to test whether observers can incorporate a new predicted object size in their attentional template in a trial-by-trial manner, which would indicate that observers can flexibly alter their attentional template as a function of search location (e.g., from saccade to saccade during real-world visual search). Because event-based designs (such as Experiment 2) are typically less powerful than block-based designs (such as Experiment 1), we decided to directly pit the two conditions-of-interest against each other within each trial, by contrasting a size-consistent silhouette with a size-inconsistent silhouette (both of the target object category).