The Change Detection Advantage for Animals: An Effect of Ancestral Priorities or Progeny of Experimental Design?

Thomas Hagen; Bruno Laeng

doi:10.1177/2041669516651366

. 2016 Jun 27;7(3):2041669516651366. doi: 10.1177/2041669516651366

The Change Detection Advantage for Animals: An Effect of Ancestral Priorities or Progeny of Experimental Design?

Thomas Hagen ^1,^✉, Bruno Laeng ¹

PMCID: PMC4934668 PMID: 27433331

Abstract

The “animate monitoring” hypothesis proposes that humans are evolutionarily predisposed to recruit attention toward animals. Support for this has repeatedly been obtained through the change detection paradigm where animals are detected faster than artifacts. The present study shows that the advantage for animals does not stand up to more rigorous experimental controls. Experiment 1 used artificially generated change detection scenes and counterbalanced identical target objects across two sets of scenes. Results showed that detection performance is determined more by the surrounding scene than semantic category. Experiment 2 used photographs from the original studies and replaced the target animals with artifacts in the exact same locations, such that the surrounding scene was kept constant while manipulating the target category. Results replicated the original studies when photos were not manipulated but agreed with the findings of our first experiment in that the advantage shifted to the artifacts when object categories replaced each other in the original scenes. A third experiment used inverted and blurred images so as to disrupt high-level perception but failed to erase the advantage for animals. Hence, the present set of results questions whether the supposed attentional advantage for animals can be supported by evidence from the change detection paradigm.

Keywords: change detection, change blindness, animals, animate, replication, evolution, attention, null result

Introduction

The “animate monitoring” hypothesis (New, Cosmides, & Tooby, 2007) states that modern humans have inherited a mechanism which biases attention toward animate objects. Such a mechanism seems highly plausible from an evolutionary perspective, as it should provide great opportunities for survival (fleeing) and nutrition (hunting). Specifically, New et al. (2007) argued for the existence of such a mechanism by testing human subjects in a “change detection” task (Simons & Levin, 1997) where photographs containing animals and artifacts (man-made objects) would rapidly change (e.g., by repeatedly removing and reinserting a target object). Such a task consists of displaying a sequence of images for short durations (e.g., 250 ms): first the original image, then a blank image before a modified image is displayed, the sequence is then completed by a second blank before it is repeated (see Figure 1 for an illustration). Participants are then instructed to locate where the modification takes place as quickly as possible. The resulting time taken to detect the target is typically interpreted as a measure of how quickly attention can be drawn to the target.

Figure 1. — Example of one sequence of images in the change detection task. The “original” image was first displayed for 250 ms, this was followed by a “blank” (white) image for 250 ms before the “modified” image was displayed for 250 ms. The series was completed by another “blank” image before the sequence was displayed again.

New et al. (2007) manipulated the presence of animals (including humans) and artifacts in photographs. As predicted, they found that changes to animals and humans were detected more readily than changes to artifacts, supporting the hypothesis that attentional mechanisms are preferentially tuned to detect animate beings. They concluded that the evidence was consistent with an evolutionary account, where humans, under evolutionary pressure, evolved the ability to preferentially and spontaneously direct stronger attentional resources toward humans and non-human animals than to artifacts, plants, or geological features. The proposal is thus that visual attention operates on the basis of selective mechanisms that prioritize attention towards animals (or animate objects; cf. Caramazza & Shelton, 1998) and that such a mechanism should have evolved as a sort of “interrupt circuit” (i.e., an alert mechanism) for focused attention so as to reorient or bias attention toward areas of the visual scene that were not previously attended on the basis of low-level visual characteristics.

Other studies (New, Schultz, Wolf, Niehaus, Klin, German, & Scholl, 2010; Wang, Tsuchiya, New, Hurlemann, & Adolphs, 2014) have replicated these findings with success. One study showed that attention toward animals is not impaired in participants with autism (New et al., 2010); and another showed that patients with amygdala lesions can still show a preference for animals in this task (Wang et al., 2014). However, these studies used the same sets of images as the original study (except for Wang et al., 2014, where a few extra images were added to the original set).Thus, in the current literature, the advantage for animals appears to be quite robust and obtainable with a relatively small collection of images. In fact, New et al. (2007) implicitly made the assumption that the time to detect a change in an image is largely dependent on the ability of the target object to capture attention and that any variations in detection time caused by the surrounding context of the scenes could be controlled for by taking an average detection time across 14 to 24 images.

However, the set of images used in all of the above studies had disparate collections of scenes for the different categories, which could have introduced other effects since they were not kept constant for both sets of objects. Given the theory, one should expect an even clearer advantage when removing variations introduced by differences in the photographs containing the targets. Thus, in the first experiment of this study, we kept constant the context and manipulated only the targets category.

Experiment 1

Photographs can vary along a large number of perceptual parameters. Moreover, photographs produced by humans are not random observations of reality. Humans often seek to clarify any objects of interest to effectively convey a particular perception (especially professionals like photographers, artists, and designers); consequently the objects of interest are often presented within little cluttered or crowded contexts (Wichmann, Drewes, Rosas, & Gegenfurtner, 2010).

Photographs produced by humans at ground level typically contain a surface on which objects are located. Objects can vary in their position and distance from the photographer (depth), their relative physical size and their proximity to surrounding objects. Compared with classical psychological experiments (e.g., arrays of objects), these images are challenging to control and scrutinize with objective measures. In an attempt to control for such differences between photographs, New et al. (2007) collected subjective ratings of how interesting and “busy” the background was. However, several studies have clearly shown change blindness for objects placed in visual arrays or artificially created scenes (Freeman & Pelli, 2007; Henderson & Hollingworth, 1999; Zelinsky, 2001), thus indicating that artificial displays can be sufficient to produce change blindness. An advantage of using artificially constructed scenes is the ability to better control the stimuli used and their context. Further, this approach makes it trivial to obtain a large set of images per condition. Thus, for the present experiment, we generated artificial scenes by placing objects on a plane tilted in depth. Further, each image contained an equal combination of objects from both categories, thus balancing the number of times objects from each category were presented. This approach allowed us to keep the background constant across scenes.

Consequently, we prepared two experiments where we made efforts to counterbalance the influence of the surrounding scene across the experiments by switching the target category (animal or artifact) between identical sets of images. While a strict within-subjects design should provide more statistical power, for the present goal, it is not viable to show the same change detection image to the same participant twice, as one tends to learn rather well the location of the change after its first encounter. We note that our original expectation was in line with New et al. accounts, and we expected to find an advantage for animals in either type of experiments. In fact, we expected to increase the effect size of the experiment, by effectively cancelling out most of the noise introduced by using dissimilar images per category in the individual experiments. Clearly, we were surprised by the results, which in turn led us to revise and put some doubt on the interpretation of the original findings as well.

Methods

Materials

A set of 138 color studio photographs of animals (52) and artifacts (86) were collected from the internet and the Bank of Standardized Stimuli (Brodeur, Guérard & Bouras, 2014). The background was chosen to be a photograph of a flat green grass plane with blue sky. The objects were separated from their backgrounds and resized so that each object could be displayed with a realistic size in relation to the other objects. Each object was adjusted to perceptually blend in with the color tone and luminance of the target background scene with Photoshop software. As most object photographs were produced in studios, they appeared to have even illumination and hence no directional illumination (as can be produced by natural sunlight).

An algorithm was used to select 30 image pairs of animals and artifacts that were maximally matched on a set of variables. More specifically, each of the 30 images of animals was matched with an image of an artifact that were minimally different in size, saturation, luminance, object-background contrast (Naber, Hilger, & Einhäuser, 2012) based on the distance between object and background in DKL color space (Derrington, Krauskopf, & Lennie, 1984), perimeter and JPEG compression ratios (as a proxy for complexity; Forsythe, Mulhern & Sawey, 2008), Hypercomplex Fourier Transform (HFT) visual salience (Li, Levine, An, Xu & He, 2013), as well as a measure of visual salience developed by Itti, Koch, and Niebur (1998).

The placing of objects was done by use of a custom-made software algorithm which ensured correct scaling in depth and realistic spacing and overlapping of objects. The algorithm further ensured that all object locations were randomized, except for the rule that object category should alternate for each progression in depth. Each display contained 18 to 20 objects, and it was ensured that no object was displayed twice in the same display. The heading or horizontal flip of each object was decided by a randomizer. Examples of these scenes can be seen in Figure 2.

Figure 2. — Example display used in Experiment 1. In this display, the sheep located in the lower center portion of the scene is removed and consequently reappears in the same location.

From the selection of image pairs an algorithm generated 1,000 scenes each containing approximately 10 animals and 10 artifacts. This arrangement made it possible to select 10,000 different change locations for each category. A selection of possible combinations of scenes was done with the “cube method of stratified balanced sampling” (Deville & Tillé, 2004). This algorithm ensured that target objects in both categories (animals and artifacts) would be approximately equalized in eccentricity, size, depth, crowding (Freeman & Pelli, 2007; Wallace & Tjan, 2011), and occlusion. This method further ensured that the final collection of images would maximally use the same target object three times. It is argued that this process helps to distance the experimenter from the selection and manipulation process, while also aiming to balance potential confounding variables.

The sampling resulted in a set of images containing 64 scenes of changing animals and 64 scenes of changing artifacts. Next, we created a new stimulus set by replacing the target object with its optimally matched pair from the opposite category. This preserved the overall structure of the scenes while minimizing the difference introduced by changing the target object. The first set of images will be referred to as Experiment 1A, and the second set as Experiment 1B. An illustration of this process can be seen in Figure 3. Note that Experiments 1A and 1B used the same set of target objects with the same frequency, as each object had a designated matching object in the opposite category. Thus, both experiments (1A and 1B) consisted of the same images except for the crucial manipulation of the target object (i.e., replacing animal targets with artifacts). Consequently, when comparing change detection performance to animals in Experiment 1A against artifacts in Experiment 1B and artifacts in Experiment 1A to animals in Experiment 1B, we would effectively be controlling for context (by keeping it constant) while being able to measure the sheer influence of target category.

Figure 3. — Examples of images used in Experiment 1A and 1B. Target objects are indicated with red circles (not used in the actual experiments). The surrounding scene was kept constant as we manipulated the category of the target object (animal or artifact).

Participants

We used Crowdflower® to recruit 277 participants, 150 (115 males) for Experiment 1A and 127 (87 males) for Experiment 1B. Participants were monetarily remunerated for their time. For Experiment 1A, the participants mean age was 31.2 years (range = 16–64, SD = 9.9). For Experiment 1B, the participants mean age was 29.5 years (range = 17–61, SD = 8.7). Each participant was allowed to complete only one version of the experiment.

Apparatus

The experiments were implemented in JavaScript and participants used their own computers at their ease, as it is the case with crowd sourcing experiments (Crump, McDonnell, & Gureckis, 2013).

Procedure

Participants were required to complete 20 practice trials (constructed in the same manner as the experimental stimuli) and to achieve an accuracy of at least 80% before continuing to the actual experiment which consisted of 128 trials. Each trial started with a fixation cross in the middle of the display for 1,000 ms followed by a 100 ms blank screen before the change detection loop started. One sequence of the loop consisted of (a) the original image for 250 ms followed by (b) a blank screen for 250 ms before the (c) changed image were presented for 250 ms followed by (d) another blank screen for 250 ms (see Figure 1 for an illustration) and so on. This sequence was maximally repeated 10 times. Every trial contained a change. Participants were instructed to press the space bar as fast as possible when they thought they had located the change. If participants failed to press the space bar within 10 s from the start of each trial a feedback screen was displayed; but if participant reported a change, then the original image was displayed with instructions to use the computer mouse to click on the position of the changing object. After recording the mouse click, a feedback screen was displayed informing subjects whether they had provided a correct response or not, along with their accumulated accuracy. The feedback screen was displayed until the subject pressed the space bar to continue the experiment. Trial order was randomized across subjects. The task normally lasted about 15 min. Subjects were randomly assigned to complete either experiment (1A or 1B) after agreeing to an informed consent in accordance with the Declaration of Helsinki. The experiment was also approved by the institutional review board.

Results

One subject was removed from the analysis of Experiment 1A (0.6% of all trials) and five subjects were removed from Experiment 1B (3.9% of all trials), all for having accuracy rates below 70%.

A mixed ANOVA with category (animals, artifacts) and experiment (1A, 1B) showed no significant main effect for Category, F(1, 269) = 0.004, p = .95, or Experiment, F(1, 269) = 1.7, p = .19, but a significant interaction between Category and Experiment, F(1, 269) = 152.8, p < .001.

Further t tests conducted to investigate this result showed that animals (M = 3,176 ms, SD = 559 ms) were detected significantly faster than artifacts (M = 3,403 ms, SD = 533 ms), t(148) = 8.3, p < .001, 95% CI [173, 281], d = 0.41, in Experiment 1A while artifacts (M = 3,071 ms, SD = 515 ms) were detected significantly faster than animals (M = 3,346 ms, SD = 541 ms), t(121) = 9.1, p < 0.001, 95% CI [215, 334], d = 0.52, in Experiment 1B. In fact, this demonstrates a complete reversal of the typical advantage for animals described in the literature and suggests that the advantage for either image set may not be specific to the category of the target object (see Figure 4).

Figure 4. — Mean response times (detection times) with error bars indicating 95% confidence intervals for within-subjects designs (Cousineau, 2005)..

Next, we examined whether the change in target objects between experiments resulted in significantly different response times. An independent samples t test between Experiment 1A artifacts and Experiment 1B animals failed to find the two conditions significantly different, t(257) = 0.87, p = .38, 95% CI [−186, 71]. An analogous result was also found between 1A animals and 1B artifacts, t(265) = 1.6, p = .10, 95% CI [−233, 23]. These results, together with the previous ones, suggest that the advantage for either image set is likely to be dependent on the surrounding scene and not the category of the target object.

Undetected changes in Experiment 1A were 5.6% for animals and 8.9% for artifacts; for Experiment 1B, the proportions were 7.8% for animals and 5.3% for artifacts.

Discussion

As expected from the findings of New et al. (2007), Experiment 1A did show faster responses for animals as compared with artifacts. However, surprisingly, when we replaced within the same locations the target objects, with members of the opposite category in Experiment 1B the advantage was reversed and artifacts showed faster responses than animals. Further, it was shown that the mean response times did not significantly different between similar scenes across the experiments.

These results suggest that using shapes of animals as targets does not necessarily lead to faster response times in a change detection experiment. In fact, category seemed irrelevant to the advantage from one experiment (1A) to the other (1B) since the advantage was simply reversed and to a same extent. Even assuming that the images or scenes were highly artificial or that the category sets were unbalanced in some unknown attribute would not predict a complete reversal of the advantages. Given that the shapes remained constant between 1A and 1B and only the targets locations were switched, it would seem that either location per se or in relation to surrounding objects or the whole scene plays a key role, irrespective of object category.

Given the present results, we were led to reconsider whether using realistic photographs of natural scenes (as in New et al., 2007) is a key ingredient for finding category effects. One could argue that the images of the previous experiments, though they carefully controlled the surround scene around the target objects, contained implausible arrangements of disparate objects, unlikely to be seen in the real world. Thus, we may have enhanced purely contextual effects and overshadowed possible category differences. In fact, New et al. (2007) used 14 images per category in their Experiments 1 to 4 and 24 images for Experiment 5. As remarked previously, in their experiments, every scene was unique to a specific target object; from it seems difficult to conclude, also in the light of the strong effects of context in the present experiment, whether the advantage for the animals would remain when the animal shapes were replaced within the same scene by an artifact. Hence, to further investigate how robust a change detection advantage for animals can be we used, the original images used by New et al. (2007) and generated additional conditions where we replaced the target animals with either artifacts or other animal shapes.

Experiment 2

The above findings suggest that the location of the target object within a particular scene might be a more important predictor of change detection performance than the semantic category of the target object. Hence, we asked whether replacing the animal targets with artifacts in the images used by New et al. (2007) could influence and possibly erase the detection advantage. If the targets category (animal vs. artifact) is the key factor leading to faster detection responses, then we should expect a considerable change in responses from such substitution; more precisely, we should observe a dramatic drop in change detection performance for artifacts located within the same scenes originally used for animal targets. However, based on the results of Experiment 1, such alternations may not lead to a significant change in detection performance.

Ideally, we should design the experiment orthogonally, by exchanging target objects in all images such that artifacts will take the place of the animals and animals of artifacts. However, this did not seem like a viable solution for at least this set of images used by New et al. (2007). In fact, a large proportion of artifact target objects had properties that would result in very awkward positions (e.g., rooftop, wall, etc.) or sizes (e.g., silo, house, vertical pole, etc.) for animal substitutions. As remarked, keeping the images plausible and naturalistic may be an important aspect influencing the appearance and disappearance of the sought-after effects and one should avoid testing the hypothesis with non-ecological assemblies of objects. However, in the original photos, most of the animal targets had appropriate positions and relative sizes for being replaced with artifact targets without the resulting scene looking unnatural or absurd. Indeed, to avoid incongruency effects between the content of a scene and the target, we selected artifacts which were highly plausible as alternative objects in the same context (e.g., a pram for a dog in the backstreet scene; an anchor for a fish in the underwater scene; see Figure 5).Thus, we were forced to only manipulate these images originally containing animal targets.

Figure 5. — Images in Experiment 2A are identical to those used by New et al. Images in Experiment 2B (“Artifacts for Animals”) are modified to contain an artifact as the target object, while in Experiment 2C (“New Animals”), they are modified to contain animals that were not original to the image. Original images © 2007 by The National Academy of Sciences of the USA.

To make a more balanced comparison between our manipulated images across categories, we constructed two versions of the animal target images: One where we replaced the animal target with an artifact and one where we replaced the animal target with an image of an animal not originally contained in the scene (as a control condition to the act of modifying the images). Both artifact and animal images used for replacement were gathered from the internet by searching for images visually similar to the original animals.

Finally, it would be important to rule out that any confounds related to object-scene consistency or perceived interest could have contributed to earlier detections for our artifact insertions, thus we also collected consistency and interest ratings where participants responded on a scale from 1 to 7 (not at all-very) on how well the targets fit the theme of the scene and how interesting they were.