The nature of instructional effects in color constancy

Ana Radonjić; David H Brainard

doi:10.1037/xhp0000184

Author manuscript; available in PMC: 2017 Jun 1.

Published in final edited form as: J Exp Psychol Hum Percept Perform. 2016 Jan 4;42(6):847–865. doi: 10.1037/xhp0000184

PMCID: PMC4873441 NIHMSID: NIHMS740538 PMID: 26727021

Abstract

The instructions subjects receive can have a large effect on experimentally measured color constancy, but the nature of these effects and how their existence should inform our understanding of color perception remain unclear.

We used a factorial design to measure how instructional effects on constancy vary with experimental task and stimulus set. In each of two experiments, we employed both a classic adjustment-based asymmetric matching task and a novel color selection task. Four groups of naive subjects were instructed to make adjustments/selections based on 1) color (neutral instructions), 2) the light reaching the eye (physical spectrum instructions), 3) the actual surface reflectance of an object (objective reflectance instructions) or 4) the apparent surface reflectance of an object (apparent reflectance instructions). Across the two experiments we varied the naturalness of the stimuli.

We find clear interactions between instructions, task and stimuli. With simplified stimuli (Experiment 1), instructional effects were large and the data revealed two instruction-dependent patterns. In one (neutral and physical spectrum instructions) constancy was low, inter-subject variability was also low, and adjustment-based and selection-based constancy were in agreement. In the other (reflectance instructions) constancy was high, inter-subject variability was large, adjustment-based constancy deviated from selection-based constancy and for some subjects selection-based constancy increased across sessions. Similar patterns held for naturalistic stimuli (Experiment 2), although instructional effects were smaller. We interpret these two patterns as signatures of distinct task strategies — one is perceptual, with judgments based primarily on the perceptual representation of color; the other involves explicit instruction-driven reasoning.

Keywords: color constancy, instructional effects, asymmetric matching, color selection, task strategy

Understanding how the visual system extracts information about the color of the objects in the environment is a fundamental open question in vision science. The problem arises because of the inherent ambiguity of the signal that reaches the photoreceptors: the spectrum of light reflected from objects to the eye depends not only on object surface reflectance, which is the physical correlate of object color, but also on incident illumination. Any reflected light spectrum can result from a myriad of different surface-and-illuminant combinations. By the same token, a fixed object will reflect different spectra when viewed under different illuminations. Disentangling the intrinsic surface reflectance component from the transient and variable illumination component is computationally challenging. Despite the challenge, the visual system provides a fairly constant perceptual representation of object color, and we rely on this representation to guide action (e.g., selecting fresh and avoiding spoiled food). The underlying mechanisms that support such color constancy, however, are not fully understood.

Endeavors to understand constancy are complicated by the finding that the instructions subjects receive can modulate the degree of experimentally measured constancy. Early influential studies on this topic, conducted by Arend and his collaborators (Arend & Reeves, 1986; Arend & Goldstein, 1987; Arend, Reeves, Schirillo, & Goldstein, 1991), demonstrated instructional effects in the context of an asymmetric matching task. In their experiments, subjects viewed a pair of stimulus configurations, each a simulation of illuminated papers presented on a computer screen. The simulated papers across the two configurations were identical, but their simulated illumination differed. Subjects were asked to adjust a test patch in one configuration to match a corresponding patch (the standard) in the other, and the data were analyzed in terms of how much constancy the matches revealed. Across conditions, the task instructions were varied and this resulted in different degrees of measured constancy. When instructed to adjust the test to “match the hue and saturation (and/or brightness) of the test patch to those of the standard patch […] while disregarding, as much as possible, other areas in the screen”, subjects' matches indicated low constancy. Constancy increased considerably, however, when the same subjects were asked to “make the test patch look as if it were cut from the same piece of paper”. Similar instructional effects were measured for full color stimuli (Arend & Reeves, 1986; Arend et al., 1991) and for achromatic stimuli (Arend & Goldstein, 1987).

Several characteristics of the experimental design of these early studies are worth noting. First, the studies employed relatively simple stimuli. These were simulations of flat matte surfaces rendered under spatially diffuse illumination; they consisted either of pairs of surfaces (disk-and-annulus configurations) or larger sets of overlapping rectangular surfaces (Mondrian configurations). Second, the effects of instructions were measured within-subjects. Third, the majority of subjects were experienced (the authors and the members of their labs): they were aware of the computational problem of color constancy and familiar with how changes in illumination affect the light reflected from a fixed object.

A number of subsequent studies replicated the instructional effects reported by Arend and colleagues using asymmetric matching and similar stimulus configurations. This was done both for successive (Troost & de Weert, 1991) and simultaneous (Bauml, 1999; Cornelissen & Brenner, 1995; Troost & de Weert, 1991) matching and both with within-subjects (Cornelissen & Brenner, 1995; Troost & de Weert, 1991) and between-subjects designs (Bauml, 1999). One of those studies found instructional effects only for experienced subjects (Cornelissen & Brenner, 1995), but others reported instructional effects for naive subjects as well (Bauml, 1999; Troost & de Weert, 1991).

Large instructional effects on color constancy were also revealed in a study that used a modified version of the asymmetric matching task in which, rather than adjusting the test surface, the subjects rated (on a scale of 0% to 100%) the extent to which the test appeared either (1) the same hue and saturation as the target surface or (2) as if it were made of the same piece of paper as the target (Reeves, Amano, & Foster, 2008). In this study, which used both within- and between-subjects designs, the subjects also completed a task in which they provided a yes/no judgment on whether the two Mondrian configurations were made out of the same material; this type of judgment was highly correlated with the surface-based (paper) matching ratings. In a related study, Van Es et al. (2007) showed that across a simulated change in illumination subjects are able to make fairly accurate judgments of both (1) the local properties of the test surface (did the test patch change in hue/saturation/brightness), which was interpreted to indicate low color constancy, and (2) global properties of the scene (did the test patch change in a manner consistent with the overall change in illumination), which was interpreted to indicate higher constancy.

Some studies which probed achromatic color perception introduced a third type of instructions: in addition to brightness and lightness (paper) matches, the subjects were asked to make brightness contrast matches (“make the brightness difference between the test and the surround the same as between the standard patch and the surround”) and, under certain conditions, these matches differed from both brightness and lightness matches (Arend & Spehar, 1993a, 1993b; Blakeslee, Reetz, & McCourt, 2008; for another version of instructional manipulation in the lightness domain see Rudd, 2010).

A number of color and lightness constancy studies, however, varied instructions along the same lines as the early studies of Arend and colleagues and failed to find substantial effects (Delahunt & Brainard, 2004; Logvinenko & Tokunaga, 2011; Madigan & Brainard, 2014; Ripamonti et al., 2004). These studies were all conducted using naive subjects and experimental methods other than adjustment-based asymmetric matching (such as achromatic adjustment or palette matching) and, predominantly, a between-subjects design (the Logvinenko & Tokunaga, 2011, study was within-subjects). In addition, these studies employed more naturalistic stimulus configurations than the studies reviewed above that reported large instructional effects. Here the stimuli were real illuminated objects or fairly realistic graphics simulations (e.g., three-dimensional scenes, presented stereoscopically). This difference suggests that both the task and the class of stimuli used in the experiment may modulate instructional effects in constancy studies.

That the choice of stimuli affects experimentally measured constancy is also suggested by a number of studies that used neutral (non-specific) instructions. In our recent work, for example, we showed that when stimuli were fairly realistic simulations of illuminated objects constancy was good, but that it dramatically decreased when stimuli were reduced to square patches presented against a textured background, even though the colorimetric characteristics of the stimuli were closely matched (Radonjić, Cottaris, & Brainard, 2015b). High degrees of constancy were also found in other studies that used naturalistic stimuli (Brainard, Brunt, & Speigle, 1997; Kraft & Brainard, 1999), suggesting that ‘surface-based’ instructions are not necessary for good constancy. It remains unclear, however, whether introducing such instructions would have led to even higher degrees of measured constancy in these experiments (see also Wright, 2013).

In summary, the extant literature makes clear that it is possible to find instructional effects in studies of color and lightness constancy. What is much less clear, however, is the nature of these effects and what they tell us about human color and lightness constancy (Brainard & Radonjić, 2014; Kingdom, 2011).

Some authors have argued that different instructions prompt subjects to report about different aspects of a fixed perceptual representation, in the same way one can, for example, independently judge an object’s size or its orientation (Arend & Spehar, 1993a). In a similarly dualistic vein, it has also been proposed that different instructions probe different types of processes that support color constancy or different “perceptual modes” (Arend & Reeves, 1986; Arend et al., 1991; see also Rock, 1983). Others posit that the perception of object color is based on a unitary perceptual representation and that instructional effects reflect the fact that certain types of instructions drive subjects to rely on explicit reasoning (from the unitary perceptual representation) to make a prompted-for match. In this regard, some posit that hue/saturation/brightness instructions prompt subjects to reason when making (unnatural) judgments about the characteristics of the proximal stimulus (Gibson, 1950; Gilchrist, 2012; see also Koffka, 1935; MacLeod, 2012). Others, however, argue that subjects tend to engage in explicit reasoning when they are given surface-based (paper) instructions and that under these instructions, their matches are best described as inferred color (or lightness) judgments (Blakeslee & McCourt, 2015; Blakeslee et al., 2008). Distinguishing between the various theoretical accounts is challenging because it is not clear what experimental data would clearly support one over the others.

To provide a better understanding of instructional effects on constancy and their nature, we designed a study to measure systematically whether and how such effects depend on stimulus set and experimental task. We asked four groups of subjects, each of which received a different type of instructions, to complete two different color constancy tasks: a classic asymmetric matching task, for which instructional effects have been frequently reported, and a color selection task, recently developed in our lab.

In the color selection task, subjects are asked to select objects based on color across a change in illumination. The task is intended to probe constancy in a manner that captures the real-world use of color, where we frequently rely on it to select objects to meet specific goals. For example, we use color information to select the ripest tomatoes, rather than adjusting the tomatoes until they look ripe enough to eat. Our previous study compared constancy across color selection and asymmetric matching for neutral instructions, and found good agreement between the two tasks. This finding held for both simplified and naturalistic stimuli (Radonjić et al., 2015b).

In this study we also used two different classes of stimuli: simplified and naturalistic. These were identical to the stimuli we used in our previous study (Radonjić et al., 2015b). Our simplified stimuli (Experiment 1) consisted of simulations of diffusely illuminated two-dimensional patterns of rectangular matte paper patches; they resembled the flat-matte diffusely illuminated geometric patterns used in studies that report large instructional effects. Our naturalistic stimuli (Experiment 2) consisted of realistic simulations of three-dimensional scenes and were presented stereoscopically. They were modeled after the color cube illusion of Lotto and Purves (1999) and depicted a large multifaceted cube suspended in mid-air in the center of a room in which the illumination varied spatially.

We used four different types of instructions, labeled as neutral, physical spectrum, objective reflectance and apparent reflectance instructions. The neutral instructions simply ask subjects to judge color, without any further definition of the term color. The physical spectrum instructions are formulated to probe sensory experience and ask subjects to make judgments based on the qualities of the light reaching their eye while disregarding any signals in the image that might indicate changes in illumination. These are similar to the hue/saturation/brightness instructions used in the prior literature. The objective reflectance and the apparent reflectance instructions are both surface-based and the difference between them is subtle: the latter asks for judgment based on subjective appearance, while the former asks for judgment based on objective surface properties (e.g., “adjust the test so that it is made out of the same material as the target square” vs. “adjust the test so that it looks like it is made from the same material as the target” (see also Wagner, 2012)). Data for the neutral instructions were reported in our previous paper (Radonjić et al., 2015b), and are presented again here to provide a baseline for comparison.

Experiment 1

Methods

Apparatus

The stimuli were presented on a calibrated 21″ CRT color monitor (ViewSonic, Model Graphic Series G225fB) driven via a dual-port video card (NVIDIA GeForce GT120) at a pixel resolution of 1280 by 1024 and refresh rate of 75 Hz and with 8-bit resolution for each RGB channel. The host computer was an Apple Macintosh with an Intel Xeon quad-core processor. An eye tracker (EyeLink 1000, Desktop remote model, SR Research) was used to record the position of the eye, but eye-tracking data will not be reported here.

The subject’s head position was stabilized using a chin rest. Subjects viewed the stimuli monocularly using their right eye while their left eye was covered with an eye patch. The distance between the subject’s eye and the center of the screen was 76 cm. The experimental programs were written in Matlab and relied on Psychtoolbox (Brainard, 1997; Pelli, 1997, http://psychtoolbox.org) and mgl (http://justingardner.net/doku.php/mgl/overview) routines.

Stimulus

The stimulus configuration consisted of five squares (each 3.5 cm a side, 2.6°) presented against a textured color background (Figure 1). The square in the center of the screen served as the target and was surrounded by four squares (each at 8° eccentricity, measured from the center of the target to the center of the surrounding square).

Figure 1. Example stimuli for the color selection task. The square in the center of the configuration is the target and it is surrounded by four squares. In the illuminant-constant example (left), the top square is the tristimulus and reflectance match for the target while the bottom square is a competitor (C₋₁). In the yellowish illuminant-change example (center) the square on the right is the tristimulus match, while the bottom square is the reflectance match. In the bluish illuminant-change example (right), the top square is the tristimulus match while the square on the right is the reflectance match. All three examples show the gray target. (Color version of all figures is available in the online edition.)
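The sizes above are reported both in centimeters and in degrees of visual angle. As a minimal sketch of how the two relate, assuming the 76 cm eye-to-screen distance given under Apparatus and an extent centered on the line of sight, the conversion can be written as follows. The function names are illustrative and are not taken from the authors' Matlab code.

```python
import math

# Eye-to-screen distance reported in the Apparatus section.
VIEWING_DISTANCE_CM = 76.0


def visual_angle_deg(size_cm, distance_cm=VIEWING_DISTANCE_CM):
    """Visual angle (degrees) subtended by an extent centered on the line of sight."""
    return math.degrees(2.0 * math.atan(size_cm / (2.0 * distance_cm)))


def size_cm_for_angle(angle_deg, distance_cm=VIEWING_DISTANCE_CM):
    """Inverse conversion: on-screen extent (cm) that subtends angle_deg."""
    return 2.0 * distance_cm * math.tan(math.radians(angle_deg) / 2.0)


square_deg = visual_angle_deg(3.5)  # side of each square; rounds to 2.6 degrees
print(round(square_deg, 1))
```

The same function applied to a 13.3 cm extent gives approximately 10°, which agrees with the size reported for the circular region around the target.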

Across trials, we used four different colored targets (Figure 2A). They all had the same luminance (23.6 cd/m²). One target (“gray”) was achromatic (CIELAB chroma: 0.4, hue angle: 314.01°); the remaining three (“rose”, “teal” and “green”) were equal in saturation (chroma: 25.6), but were sampled from different regions of the hue circle (hue angle: 355.31°, 227.14°, 135.46°).

Figure 2. **Panel A.** The four targets (shown under the standard illuminant). **Panel B.** Competitor set in the color selection task for one target (gray) in the illuminant-constant (top row) and the illuminant-changed conditions (middle row: yellowish; bottom row: bluish). See text for details on how the competitor sets were constructed.

The textured background behind the squares was constructed as an array of simulated small rectangular Munsell papers (0.26 × 0.23 cm each, 0.17° × 0.19°). On illuminant-constant trials, the background behind the target and the surrounding squares was uniformly illuminated by the standard illuminant (6500 K CIE daylight; Figure 1, left). On illuminant-changed trials, the simulated illuminant of the background behind the surrounding squares changed to the yellowish test illuminant (4500 K CIE daylight; Figure 1, center) on one half of the trials and to the bluish test illuminant (12000 K CIE daylight; Figure 1, right) on the other half. On these trials, a small circular area of the background around the target (13.3 cm in diameter, 10°) remained under the standard illuminant.

The stimulus background was created by randomly sampling (with replacement) from a subset of ~220 Munsell paper samples (out of 462) whose surface reflectance is known (Nickerson, 1957). The subset only included the samples that we could render within the gamut of our display in each illuminant condition and whose luminance (under the standard illuminant) was at least 20 cd/m². We created 10 different background patterns (160 × 128 patches each). On each trial, one of these backgrounds was randomly chosen and rendered under the simulated illuminants appropriate for a given trial.
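The background construction described above can be sketched as a filter followed by sampling with replacement. This is only an illustration: the sample records, their field names, and the reading of 160 × 128 as columns × rows are assumptions, and the placeholder data below stands in for the real Munsell reflectances.

```python
import random


def eligible_samples(samples, min_luminance=20.0):
    """Keep samples that are in gamut under every illuminant and bright enough."""
    return [
        s for s in samples
        if s["in_gamut_all_illuminants"] and s["luminance_standard"] >= min_luminance
    ]


def make_background(pool, n_cols=160, n_rows=128, rng=random):
    """Return an n_rows x n_cols grid of sample ids drawn with replacement."""
    ids = [s["id"] for s in pool]
    return [rng.choices(ids, k=n_cols) for _ in range(n_rows)]


# Hypothetical stand-in for the 462 Munsell samples (values are placeholders).
rng = random.Random(0)
samples = [
    {"id": i, "in_gamut_all_illuminants": i % 2 == 0, "luminance_standard": 10.0 + 0.1 * i}
    for i in range(462)
]
pool = eligible_samples(samples)

# Ten background patterns; one would be chosen at random on each trial.
backgrounds = [make_background(pool, rng=rng) for _ in range(10)]
trial_background = rng.choice(backgrounds)
```

Storing sample ids rather than rendered colors mirrors the description in the text, where a chosen pattern is rendered under whichever simulated illuminants a given trial requires.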

The mean xy chromaticities of the backgrounds in the illuminant-constant, yellowish illuminant-changed and bluish illuminant-changed conditions were [0.33, 0.34], [0.38, 0.38] and [0.29, 0.30], and the corresponding mean luminances were 45.17, 29.41 and 30.32 cd/m², respectively. Across trials, the targets (as well as all the competitors from the color selection task; see below) were luminance decrements relative to the average of the textured backgrounds.

Color selection task

At the beginning of each trial, a black fixation cross was displayed against the textured background at the center of the screen. To initialize a trial, subjects used the computer mouse to move a cursor (a small black dot) to the center of the cross and clicked the mouse. The stimulus (the target and the surrounding squares) was then displayed.

Two of the surrounding squares were distractors, whose color was highly dissimilar from the target. The remaining two squares were competitors and their degree of color similarity to the target varied across trials. The subject’s task was to use the mouse to move the cursor onto the surrounding square that was closest to the target in color and click the mouse. The meaning of the word “color” was defined by experimental instructions (see below).

On each trial, the two competitors were drawn from a set, which was predefined for each target and illumination condition (Figure 2B). In the illuminant-changed condition, the competitor set included:

(1)
The tristimulus match for the target (denoted as T) which had a different simulated surface reflectance, but the same tristimulus coordinates as the target under the standard illuminant.
(2)
The reflectance match for the target (R), which had the same simulated surface reflectance as the target. The tristimulus coordinates of the reflectance match were different from those of the target because of the change in simulated illuminant.
(3–5)
Three color samples (C₁, C₂, C₃), which were equally spaced along the line in CIELAB color space that connected the tristimulus and the reflectance match. We used the XYZ coordinates of the standard illuminant ([90.38, 95.22, 103.39]) as the white point for conversion of XYZ to CIELAB values.
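The construction of the competitor set can be sketched with the standard CIE XYZ-to-CIELAB conversion and linear interpolation between the two matches. The white point is the one given in the text. The reading that T, C₁, C₂, C₃ and R are equally spaced (that is, the three samples sit at one quarter, one half and three quarters of the way from T to R) is an assumption, as are the function names and the example coordinates.

```python
# XYZ of the standard illuminant, used as the CIELAB white point (from the text).
WHITE_XYZ = (90.38, 95.22, 103.39)


def _f(t):
    """CIELAB companding function with its linear segment near zero."""
    delta = 6.0 / 29.0
    if t > delta ** 3:
        return t ** (1.0 / 3.0)
    return t / (3.0 * delta ** 2) + 4.0 / 29.0


def xyz_to_lab(xyz, white=WHITE_XYZ):
    """Convert XYZ tristimulus values to CIELAB (L*, a*, b*)."""
    fx, fy, fz = (_f(v / w) for v, w in zip(xyz, white))
    return (116.0 * fy - 16.0, 500.0 * (fx - fy), 200.0 * (fy - fz))


def competitor_set(t_lab, r_lab, n_between=3):
    """Return [T, C1, ..., Cn, R], equally spaced along the T-to-R line in CIELAB."""
    steps = n_between + 1
    return [
        tuple(t + (r - t) * k / steps for t, r in zip(t_lab, r_lab))
        for k in range(steps + 1)
    ]


# Hypothetical CIELAB coordinates for a tristimulus match and a reflectance match.
t_lab = (50.0, 0.0, 0.0)
r_lab = (50.0, 8.0, -4.0)
for label, lab in zip(("T", "C1", "C2", "C3", "R"), competitor_set(t_lab, r_lab)):
    print(label, lab)
```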

In the illuminant-constant condition the competitor set included five color samples: the tristimulus match for the target (which, in this condition, was also the reflectance match) and the two closest competitors from the yellowish (C₋₁ and C₋₂) and from the bluish (C₁ and C₂) competitor set.

On each trial, the squares that served as distractors were randomly drawn (without replacement) from a predefined set of distractors for each target. This set consisted of simulations of the Munsell papers used for the background checks (under the standard illuminant) that differed from the target and any of its competitors by at least 20 CIELAB ΔE units.
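The distractor criterion can be illustrated with a small filter. The sketch below assumes that "CIELAB ΔE" means the Euclidean distance in CIELAB (the 1976 ΔE*ab formula); the candidate coordinates are invented for the example.

```python
import math


def delta_e_ab(lab1, lab2):
    """Euclidean distance between two CIELAB triplets (CIE 1976 delta E*ab)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(lab1, lab2)))


def eligible_distractors(candidates, target, competitors, min_delta_e=20.0):
    """Keep candidates at least min_delta_e away from the target and every competitor."""
    references = [target] + list(competitors)
    return [
        c for c in candidates
        if all(delta_e_ab(c, ref) >= min_delta_e for ref in references)
    ]


# Hypothetical CIELAB values, for illustration only.
target = (55.0, 0.0, 0.0)
competitors = [(55.0, 4.0, -2.0), (55.0, 8.0, -4.0)]
candidates = [(55.0, 10.0, 0.0), (70.0, 30.0, 30.0), (40.0, -25.0, 5.0)]
print(eligible_distractors(candidates, target, competitors))
```

Here the first candidate is rejected because it lies only 10 units from the target, while the other two clear the 20-unit threshold against every reference color.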

Illuminant-constant and illuminant-changed trials were blocked. Within a block of trials, each target was presented with all pairwise combinations of its competitors. Thus, each illuminant-constant block consisted of 40 trials (1 standard illuminant × 4 targets × 10 possible competitor pairs) while each illuminant-changed block consisted of 80 trials (2 test illuminants × 4 targets × 10 possible competitor pairs; bluish and yellowish trials intermixed) presented in random order.
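The block sizes follow from the number of unordered pairs that can be formed from five competitors, C(5, 2) = 10. A short sketch of the enumeration is given below; the dictionary layout and the text labels used for the competitors are assumptions made for illustration.

```python
import random
from itertools import combinations, product

TARGETS = ("gray", "rose", "teal", "green")


def build_block(illuminants, competitor_labels, rng=random):
    """All illuminant x target x competitor-pair trials, in random order."""
    pairs = list(combinations(competitor_labels, 2))  # 10 pairs for 5 competitors
    trials = [
        {"illuminant": ill, "target": tgt, "pair": pair}
        for ill, tgt, pair in product(illuminants, TARGETS, pairs)
    ]
    rng.shuffle(trials)
    return trials


rng = random.Random(1)
constant_block = build_block(("standard",), ("C-2", "C-1", "T/R", "C1", "C2"), rng)
changed_block = build_block(("yellowish", "bluish"), ("T", "C1", "C2", "C3", "R"), rng)
print(len(constant_block), len(changed_block))  # 40 80
```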

At the beginning of the first session all subjects completed a brief training which consisted of four illuminant-constant trials (each with a different target).

Subjects completed 20–28 illuminant-constant blocks and 30–32 illuminant-changed blocks across 7–9 one-hour sessions. Typically, the first and the fourth session consisted of illuminant-constant blocks, while the remaining sessions consisted of illuminant-changed blocks. In the last session, the subjects completed both types of trials (all remaining trials needed to finish the experiment), with all illuminant-constant blocks completed first.

Asymmetric matching task

The stimuli in the asymmetric matching task closely matched those used in the color selection task: after the subject initiated a trial (in the same manner), the stimulus configuration, consisting of five squares presented against the textured background, was displayed. As with the color selection task, the square in the center served as the target and was surrounded by four squares. One of the surrounding squares was the test square and its color was set to either white or black at the beginning of the trial. The remaining three squares were randomly chosen from the predefined set of distractors.

The subjects’ task was to adjust the test square to match the target in color (as defined by the instructions). They completed the task using a controller which allowed adjustment of the test’s CIELAB L*, chroma and hue. The subjects could take as much time as they needed to set the desired match. They were also allowed to select a “match impossible” option if they felt they were not able to achieve the desired match; there were only a few such trials and they were excluded from the analysis (1/72 for subjects tuj, hfe and goh; 2/80 for mik and 2/72 for nke).
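Because the controller operated on L*, chroma and hue, each setting corresponds to a point in CIELAB through the usual polar-to-Cartesian relation a* = C cos h, b* = C sin h. The sketch below shows that conversion and its inverse; it does not describe how the authors' software implemented the controller. The L* value in the example is a placeholder, while the chroma and hue angle are those reported for the "rose" target.

```python
import math


def lch_to_lab(lightness, chroma, hue_deg):
    """Convert CIELAB lightness, chroma and hue angle (degrees) to (L*, a*, b*)."""
    h = math.radians(hue_deg)
    return (lightness, chroma * math.cos(h), chroma * math.sin(h))


def lab_to_lch(lightness, a, b):
    """Inverse conversion; the hue angle is returned in the range [0, 360)."""
    return (lightness, math.hypot(a, b), math.degrees(math.atan2(b, a)) % 360.0)


# Placeholder L* of 50; chroma and hue angle are the values reported for "rose".
rose_lab = lch_to_lab(50.0, 25.6, 355.31)
print(rose_lab)
print(lab_to_lch(*rose_lab))
```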

Each illuminant-constant block consisted of 4 trials (1 standard illuminant × 4 targets), while each illuminant-changed block consisted of 8 trials (2 test illuminants × 4 targets). The subjects completed 6 blocks of trials in each illuminant condition in 3–4 one-hour sessions (except subject mik, who completed 7 illuminant-changed blocks, and subject mil, who completed 7 illuminant-changed and 4 illuminant-constant blocks of trials). The sessions were blocked by illumination condition: the subjects completed all illuminant-constant blocks of trials (in 1 or 2 sessions) before moving to illuminant-changed blocks (completed in 2–3 sessions).

Asymmetric matching training

Prior to the first asymmetric matching session all subjects completed training to familiarize themselves with the matching task and learn how to use the controller. We used the same apparatus as for the experiment, but the subjects viewed the display binocularly without an eye patch. In the training, the subjects completed 3–7 blocks of trials (up to 10 trials per block) across 2–3 sessions. On each training trial, two squares (the target and the test) were presented adjacent to one another against the illuminant-constant background and the subject made a symmetric color match. Target colors were set randomly (by drawing a random triplet of RGB values). The first training trial was completed by the experimenter, who demonstrated how to use the controller and provided step-by-step explanations while making the match. The subject made the second match with the experimenter’s help and then continued to make matches unassisted and without immediate feedback. At the beginning of each subsequent training session the experimenter reviewed the worst matches from the previous session (assessed using the CIELAB ΔE metric) and encouraged the subjects to make matches that agreed more closely with the target.

Subjects

Sixteen subjects (3 male and 13 female, all aged 19–22) participated in the experiment. They all had normal color vision, as assessed by the Ishihara plates (Ishihara, 1977, up to one plate incorrect). All except one had normal or corrected-to-normal visual acuity of 20/40 or better (as assessed by a Snellen chart). Measured visual acuity for the remaining subject (fai) was 20/50. All experimental procedures were approved by the University of Pennsylvania Institutional Review Board and were in accordance with the APA Ethical Principles and the World Medical Association Helsinki Declaration.

Instructions

We used four different sets of experimental instructions (neutral, physical spectrum, objective reflectance and apparent reflectance). A different group of four subjects was assigned to each instructional condition; each subject received only one type of instructions in all phases of the experiment. Within each instructional group, two of the four subjects completed the color selection task first, while the other two completed the asymmetric matching task first.
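The resulting between-subjects design (four instruction groups of four subjects, with task order counterbalanced within each group) can be laid out as in the sketch below. The text does not say how subjects were allocated to groups, so the random allocation, the subject identifiers and the function name are all assumptions made for illustration.

```python
import random

INSTRUCTIONS = (
    "neutral",
    "physical spectrum",
    "objective reflectance",
    "apparent reflectance",
)
TASK_ORDERS = (
    ("color selection", "asymmetric matching"),
    ("asymmetric matching", "color selection"),
)


def assign_design(subject_ids, rng):
    """Allocate 16 subjects to 4 instruction groups, 2 per task order in each group."""
    ids = list(subject_ids)
    if len(ids) != 16:
        raise ValueError("this sketch expects exactly 16 subjects")
    rng.shuffle(ids)  # random allocation is an assumption, not stated in the text
    plan = {}
    for g, instructions in enumerate(INSTRUCTIONS):
        for k, sid in enumerate(ids[4 * g: 4 * g + 4]):
            plan[sid] = {"instructions": instructions, "task_order": TASK_ORDERS[k % 2]}
    return plan


# Hypothetical subject identifiers.
plan = assign_design([f"s{i:02d}" for i in range(16)], random.Random(3))
for sid in sorted(plan):
    print(sid, plan[sid]["instructions"], plan[sid]["task_order"][0], "first")
```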

At the beginning of the experiment all subjects except those in the neutral instructions group went through the induction procedure in which they were familiarized with the type of color judgment they were asked to make in the experiment. A full description of the induction procedure is provided in the supporting material available online (http://color.psych.upenn.edu/supplements/instructionaleffects/). Briefly, the subjects were taught the difference between surface reflectance and reflected light through a series of demonstrations in which they observed how the changes in illumination affected the light reflected from different colored papers. The induction procedure was repeated twice for each observer: (1) at their very first session (the illuminant-constant color selection session for the subjects who did the selection task first or the asymmetric matching training session for those who did asymmetric matching first) and (2) before the first illuminant-change session of the task they were to complete first.

The subjects also received task-specific instructions, which were read to them and repeated before each experimental session for the duration of the experiment (and after the induction procedure when it was performed). For each condition and task we provide the instructions verbatim in the supporting material. The procedural aspects of the instructions were the same across groups, but the way the term color was defined differed. For each instructional group the instructions for the color selection task were as follows (ellipses mark omitted passages that described procedural aspects of the experiment):

Neutral: “Your task is to click on the test square that is closest to the target square in color.”

Physical spectrum: “You should think about these squares as simulations of illuminated paper surfaces. In this context, your task is to click on the test square from which the light reaching your eye is most similar to the light from the target square. […] In the experiment, you may notice that on some trials there will be a change in background behind the test squares. Try to ignore that as much as possible and focus on choosing the test square that delivers the most similar light to your eye as the target square – as if you were looking at the test squares through the tube that we used when we explained to you the difference between surface reflectance and reflected light [in the induction procedure].”

Objective reflectance: “You should think about these squares as simulations of illuminated paper surfaces. In this context, your task is to click on the test square that is cut from the piece of paper most similar to the target square, that is the test that has the same reflectance properties as the target. […] In the experiment you may notice that on some trials there will be a change in background behind the test squares. Think of this as a change of illumination and focus on choosing the test square that is closest in surface reflectance to the target. That is, choose the test that would be most similar to the target, if the target were under the changed illumination as well.”

Apparent reflectance: “You should think about these squares as simulations of illuminated paper surfaces. In this context, your task is to click on the test square that looks like it is cut from the piece of paper most similar to the target square, that is the test that looks like it has the same reflectance properties as the target. […] In the experiment, you may notice that on some trials there will be a change in background behind the test squares. Think of this as a change of illumination and focus on choosing the test square that looks closest in surface reflectance to the target. That is, choose the test that would look most similar to the target, if the target were under the changed illumination as well.”

The instructions for the asymmetric matching task closely matched those used in the color selection task:

Neutral.Your task is to adjust the test square so that it matches the target square in color.

Physical spectrum.Your task is to adjust the test square so that the light reaching your eye from it is the same as the light reaching your eye from the target square. In the experiment, you may notice that on some trials there will be a change in background behind the four squares. Try to ignore that as much as possible and focus on adjusting the test square so that the light it delivers to your eye is the same as that from the target square – as if you were looking at the test square and the target through the tube that we used when we explained to you the difference between surface reflectance and reflected light.

Objective reflectance. Your task is to adjust the test square so that it has the same reflectance properties as the target square. […] In the experiment you may notice that on some trials there will be a change in background behind the four squares. Think of this as a change of illumination and focus on adjusting the test square so that it matches the target in surface reflectance. That is adjust the test so that it matches the target, if the target were under the changed illumination as well.

Apparent reflectance. Your task is to adjust the test square so that it looks like it is cut from the same piece of paper as the target square. That is, adjust the test square so that it looks like it has the same reflectance properties as the target square. […] In the experiment, you may notice that on some trials there will be a change in background behind the four squares. Think of this as a change of illumination and focus on adjusting the test square so that it looks like it has the same surface reflectance as the target. That is adjust the test so it looks like the target, if the target were under the changed illumination as well.

In the asymmetric matching task (and its training) we also used different terms to refer to the CIELAB L* dimension across instructional groups: the term “intensity” was used in the neutral instructions, “brightness” in the physical spectrum instructions, and “lightness” in the objective and apparent reflectance instructions. The label for the button controlling the intensity on the controller schema used in the asymmetric matching training also differed across conditions to reflect this change (see supporting material).

Post-experiment questionnaire

At the end of the study, subjects completed a short questionnaire in which they were asked to describe any strategy they might have used to complete the color selection and the asymmetric matching tasks. They were also invited to provide any additional comments they might have had about the experiment.

Supporting material

For both experiments supporting material available online (http://color.psych.upenn.edu/supplements/instructionaleffects/) provides detailed colorimetric specification of the stimuli (including CIELAB, xyY and LMS values for each target and competitor), target reflectances and spectra of each illuminant, instructions verbatim, the post-experiment questionnaires with responses, tabulated results of statistical analyses and individual data for each subject.

Data analysis: Color selection

We developed a method that allows us to quantify the degree of color constancy that mediates a subject’s performance in the color selection task. Our method relies on the observer model implemented in maximum likelihood difference scaling (Maloney & Yang, 2003), and we have described it in detail in earlier papers (Radonjić, Cottaris, & Brainard, 2015a; Radonjić et al., 2015b). Briefly, we assume that each stimulus (the target and the competitors) occupies a certain position in an underlying one-dimensional perceptual representation. This position is subject to perceptual noise, and on each trial it is described as a draw from a normal distribution centered on its mean position. The subject’s choice is modeled as a comparison between the target and competitor representations on that trial, with the subject choosing the competitor whose representation is closest to that of the target. Our analysis method takes as input the subject’s choices across a series of trials and, via a numerical search procedure, recovers the mean positions of the target and of each competitor in the underlying perceptual representation that best account for the subject’s choices measured in the experiment.
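The observer model described above can be illustrated with a minimal Python sketch. This is not the authors' implementation: it assumes equal, independent noise for all stimuli, holds the competitor positions fixed at known values (the published method recovers them as well), and replaces the numerical search with a simple grid search. All function names, positions and trial counts are hypothetical.

```python
import math
import random
from itertools import combinations


def norm_cdf(x, mean, sd):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def p_choose_first(t, a, b, sd=1.0):
    """Probability that the competitor with mean position a is judged closer
    to the target (mean position t) than the competitor with mean position b."""
    # |A - T| < |B - T| is equivalent to (A - B) * (A + B - 2T) < 0.
    # With equal, independent noise the two factors are independent normals
    # with variances 2*sd^2 and 6*sd^2 (an assumption of this sketch).
    p_u_neg = norm_cdf(0.0, a - b, sd * math.sqrt(2.0))
    p_v_neg = norm_cdf(0.0, a + b - 2.0 * t, sd * math.sqrt(6.0))
    return p_u_neg * (1.0 - p_v_neg) + (1.0 - p_u_neg) * p_v_neg


def simulate_counts(t, competitors, sd, n_per_pair, rng):
    """Simulate choices for every competitor pair.
    Returns {(i, j): number of trials on which competitor i was chosen}."""
    counts = {}
    for i, j in combinations(range(len(competitors)), 2):
        wins = 0
        for _ in range(n_per_pair):
            T = rng.gauss(t, sd)
            A = rng.gauss(competitors[i], sd)
            B = rng.gauss(competitors[j], sd)
            wins += abs(A - T) < abs(B - T)
        counts[(i, j)] = wins
    return counts


def fit_target(counts, competitors, n_per_pair, sd=1.0, step=0.01):
    """Grid-search maximum likelihood estimate of the target position,
    restricted to the range spanned by the competitors."""
    lo, hi = min(competitors), max(competitors)
    n_steps = int(round((hi - lo) / step))
    best_t, best_ll = None, -math.inf
    for k in range(n_steps + 1):
        t = lo + k * step
        ll = 0.0
        for (i, j), wins in counts.items():
            p = p_choose_first(t, competitors[i], competitors[j], sd)
            p = min(max(p, 1e-9), 1.0 - 1e-9)  # guard against log(0)
            ll += wins * math.log(p) + (n_per_pair - wins) * math.log(1.0 - p)
        if ll > best_ll:
            best_t, best_ll = t, ll
    return best_t


# Hypothetical example: six equally spaced competitors, true target at 2.3.
rng = random.Random(0)
competitors = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
counts = simulate_counts(2.3, competitors, 1.0, 200, rng)
estimate = fit_target(counts, competitors, 200)
```

In this sketch the recovered `estimate` plays the role of the target position in the perceptual representation. The closed-form choice probability holds only under the equal-noise assumption stated in the comments.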

In the recovered representation, the position of the target is the selection-based match for a given illumination condition. Conceptually, the selection-based match is equivalent to the selection-based point of subjective equality; it is the color sample that the subject would select on the majority of trials as “the closest one to the target” over any other competitor (see Radonjić et al., 2015a).

When the position of the selection-based match falls within the range of the competitors, we assume that the relative distances between the match and the two adjacent competitors are preserved in the recovered representation and we use linear interpolation to infer the CIELAB coordinates of the selection-based match. In the illuminant-constant condition, in which both the target and the competitors are presented under the same illumination, the selection-based match is expected to fall at the tristimulus match. In the illuminant-change condition, the distance of the selection-based match from the reflectance match indicates the degree of color constancy in the color selection task: the closer the selection-based match is to the reflectance match, the higher the constancy.
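The step of inferring CIELAB coordinates for an in-range match from its two adjacent competitors can be sketched as follows. The function name, the recovered positions and the CIELAB values are hypothetical and serve only to illustrate the calculation; competitor positions are assumed to be distinct.

```python
def interpolate_match(match_pos, comp_positions, comp_lab):
    """Infer CIELAB coordinates of a match from the two competitors that
    bracket its recovered position, preserving relative distances."""
    pairs = sorted(zip(comp_positions, comp_lab))
    for (p0, lab0), (p1, lab1) in zip(pairs, pairs[1:]):
        if p0 <= match_pos <= p1:
            w = (match_pos - p0) / (p1 - p0)  # fractional distance from p0
            return tuple(c0 + w * (c1 - c0) for c0, c1 in zip(lab0, lab1))
    raise ValueError("match position is outside the competitor range")


# Hypothetical recovered positions and CIELAB (L*, a*, b*) coordinates.
positions = [0.0, 1.0, 2.0]
lab = [(50.0, 0.0, 0.0), (50.0, 2.0, 4.0), (50.0, 4.0, 8.0)]
match_lab = interpolate_match(1.5, positions, lab)
```

Here a match recovered halfway between the second and third competitors is assigned CIELAB coordinates halfway between theirs.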

For each target and illuminant-change condition we quantify constancy by computing a color constancy index (CCI) following the formula:

CCI = 1 - (b / a),

where b denotes the Euclidean distance (in three-dimensional CIELAB space) between the selection-based match and the reflectance match and a denotes the distance between the tristimulus match and the reflectance match (Arend et al., 1991).
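As an illustration, the index can be computed in a few lines of Python. The CIELAB coordinates below are hypothetical, and the function name is ours.

```python
import math


def cci(match_lab, tristimulus_lab, reflectance_lab):
    """Color constancy index: 1 - b / a, with distances in CIELAB space."""
    b = math.dist(match_lab, reflectance_lab)        # match to reflectance match
    a = math.dist(tristimulus_lab, reflectance_lab)  # tristimulus to reflectance match
    return 1.0 - b / a


# Hypothetical (L*, a*, b*) coordinates.
T = (50.0, 0.0, 0.0)    # tristimulus match
R = (50.0, 0.0, 20.0)   # reflectance match
M = (50.0, 0.0, 15.0)   # a subject's match
index = cci(M, T, R)
```

A match at the tristimulus match yields an index of 0 (no constancy), and a match at the reflectance match yields an index of 1 (perfect constancy). In the example, the match lies three quarters of the way from T to R.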

In some cases the recovered position of the selection-based match falls outside the range of the competitors. In the illuminant-changed condition, this would occur, for example, if across all pairwise combinations of the competitors a subject always chose the sample that is closer to the reflectance match. In this case, the selection-based match is not well constrained by the data and its recovered position will be beyond the reflectance match, in the direction of overconstancy. Similarly, if a subject always chose the competitor in the pair that is closer to the tristimulus match, the selection-based match would fall out of range on the tristimulus end. Rather than excluding these out-of-range matches from the analysis, we assigned them a position in color space that qualitatively captured the underlying pattern of choices. That is, we assigned them coordinates that were outside the range of the competitors but along the same line in CIELAB space, at 1/10 of the inter-competitor distance from R (out of range on the reflectance end) or T (out of range on the tristimulus end). These positions corresponded to selection-based color constancy indices of 0.975 and −0.025, respectively. Out-of-range matches occurred with some frequency in Experiment 1 (as we discuss below), but in only one instance in Experiment 2 (subject kkd; rose target in the yellowish illuminant-change condition).
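The arithmetic behind the two assigned indices can be checked with a short Python sketch. It makes two assumptions. The first is that the inter-competitor distance is one quarter of the distance between T and R, a value chosen here because it reproduces the reported indices. The second is that the out-of-range position is placed beyond R (or T) on the side away from the other anchor. The coordinates and function names are hypothetical.

```python
import math


def cci(match_lab, tristimulus_lab, reflectance_lab):
    """Color constancy index: 1 - b / a, with distances in CIELAB space."""
    b = math.dist(match_lab, reflectance_lab)
    a = math.dist(tristimulus_lab, reflectance_lab)
    return 1.0 - b / a


def out_of_range_match(t_lab, r_lab, spacing, end):
    """Place a match 1/10 of the competitor spacing beyond R or T,
    along the line through T and R (direction is an assumption)."""
    length = math.dist(t_lab, r_lab)
    unit = [(r - t) / length for t, r in zip(t_lab, r_lab)]
    if end == "reflectance":
        return tuple(r + 0.1 * spacing * u for r, u in zip(r_lab, unit))
    if end == "tristimulus":
        return tuple(t - 0.1 * spacing * u for t, u in zip(t_lab, unit))
    raise ValueError("end must be 'reflectance' or 'tristimulus'")


# Hypothetical anchors; spacing is assumed to be a quarter of the T-to-R distance.
T = (50.0, 0.0, 0.0)
R = (50.0, 0.0, 20.0)
spacing = math.dist(T, R) / 4.0

cci_reflectance_end = cci(out_of_range_match(T, R, spacing, "reflectance"), T, R)
cci_tristimulus_end = cci(out_of_range_match(T, R, spacing, "tristimulus"), T, R)
```

Under these assumptions the two indices come out at approximately 0.975 and −0.025, matching the values stated in the text.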

Data analysis: Asymmetric matching

For the asymmetric matching task, for each subject we computed the mean match (across all repetitions) for a given target and illuminant condition. We then used this match to compute color constancy indices using the same formula we used to quantify constancy in the color selection task.

Results

For each of our 16 subjects, Figure 3A shows the mean recovered position of the selection-based match in each illuminant condition, averaged across targets. Figure 3B shows the selection-based color constancy indices for the yellowish and bluish illumination change.

Figure 3. Panel A shows the mean position of each subject’s selection-based match (averaged across targets) relative to the space of competitors. Panel B shows each subject’s mean selection-based color constancy indices. Panel C shows mean adjustment-based color constancy indices measured with asymmetric matching. Subjects that belong to different instructional groups are separated by vertical lines, and letters at the top of the plot indicate the instructional condition (N: neutral, PS: physical spectrum, OR: objective reflectance, AR: apparent reflectance). The illuminant-constant condition is shown in gray, the yellowish illuminant-changed condition in orange and the bluish illuminant-changed condition in blue. Error bars represent +/− 1 SEM. Within each group, different symbols indicate different subjects.

The data reveal clear instructional effects. When subjects were asked to select objects based on their surface reflectance properties, constancy was higher than when they were instructed to make selections based on reflected light or when they were given neutral instructions. A three-way repeated-measures analysis of variance (ANOVA), with instructional group as a between-subject factor (4 groups) and test illuminant (yellowish vs. bluish) and target reflectance (4 different reflectances) as within-subject factors, revealed a significant main effect of instructions, F(3, 12) = 11.26; p = 0.001. We did not find a significant main effect of target or test illuminant, or any significant interaction between the factors (although there were trends for a main effect of illuminant, F(1, 12) = 4.68, p = 0.051, and an Illuminant x Instruction interaction, F(3, 12) = 3.00, p = 0.07). Supporting material available online provides the complete results for all statistical analyses we conducted.

The instructional effects we find are large: the mean color constancy index was 0.10 for both the neutral and the physical spectrum group, but 0.60 for the objective reflectance group and 0.77 for the apparent reflectance group. Figure 4A (filled bars) shows mean constancy indices for each group, averaged over subjects and test illuminants. We used a bootstrapping procedure to further explore differences in constancy between the groups. On each iteration of the bootstrapping, we sampled (randomly, with replacement) a new set of 4 subjects from the 4 actual subjects in a given instructional group. We then computed the mean constancy indices for each group over 2000 iterations of the resampling, along with the corresponding 90% confidence intervals (Efron & Tibshirani, 1993, pp. 178–201). The bootstrapped means were essentially identical to the group means (+/− 0.005); the bootstrapped 90% confidence intervals are plotted as error bars in Figure 4A. Comparing the confidence intervals across groups suggests that (1) constancy was lower for the neutral group than for either the objective or apparent reflectance groups, (2) constancy was also lower for the physical spectrum group than for either the objective or apparent reflectance groups and (3) the neutral and physical spectrum groups did not differ from each other in constancy, nor did the objective and apparent reflectance groups.
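A resampling procedure of this kind can be sketched in Python as follows. The sketch uses simple percentile intervals, which is an assumption on our part; the text does not state which interval construction from Efron and Tibshirani was used. The per-subject indices are hypothetical.

```python
import random
import statistics


def bootstrap_mean_ci(values, n_iter=2000, level=0.90, seed=0):
    """Resample subjects with replacement and return the mean of the
    bootstrapped means together with a percentile confidence interval."""
    rng = random.Random(seed)
    n = len(values)
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_iter)
    )
    tail = (1.0 - level) / 2.0
    lower = means[int(tail * n_iter)]
    upper = means[int((1.0 - tail) * n_iter) - 1]
    return statistics.fmean(means), lower, upper


# Hypothetical constancy indices for the 4 subjects of one instructional group.
group_cci = [0.05, 0.08, 0.12, 0.15]
boot_mean, ci_low, ci_high = bootstrap_mean_ci(group_cci)
```

With only 4 subjects per group there are few distinct resamples, so intervals obtained this way are coarse and are best read as a rough guide to between-group differences.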

Figure 4. Panel A shows mean constancy indices for each group for color selection (filled bars) and asymmetric matching (open bars). Panel B shows the standard deviations of the constancy indices for each group. In both panels, error bars are bootstrapped 90% confidence intervals. Panel C plots mean (over targets) selection-based indices against corresponding mean adjustment-based indices for subjects in the neutral (red) and physical spectrum (green) instructions groups. Panel D shows the corresponding plot for objective reflectance (black) and apparent reflectance (gray) groups. Error bars represent +/− 1 SEM. Symbols for individual subjects are the same as in Figure 3. In panels C and D there are two points for each subject as the means for each illuminant-changed condition are plotted separately.

The same general trends are revealed by the constancy indices obtained from the asymmetric matches (Figure 3C), but the instructional effects we measured were smaller. We found significant differences in constancy across groups (main effect of instructions, F(3,12) = 3.74, p < 0.05): matches of subjects who received neutral or physical spectrum instructions indicated a considerably lower degree of constancy (mean CCI of 0.06 and 0.07) than matches of subjects who received objective or apparent reflectance instructions (mean CCI of 0.24 and 0.37). Although the overall degree of constancy varied across targets (main effect of target, F(3,36) = 3.10, p < 0.05; Illuminant x Target interaction, F(3, 36) = 4.31; p < 0.05), these variations were independent of the instructional manipulation (we did not find a significant Instructions x Target, Instructions x Illuminant or Instructions x Target x Illuminant interaction; values for all statistical tests are provided in the supporting material available online). Figure 4A shows mean constancy indices across groups for the asymmetric matching task (open bars), with error bars representing bootstrapped 90% confidence intervals. Comparing the confidence intervals across groups shows the same pattern of differences as we found for the color selection task: constancy did not differ for the neutral and the physical spectrum groups or for the two reflectance groups, but was higher for the two reflectance groups than for the neutral/physical spectrum groups.

We analyzed the variability in constancy indices across instructional groups, using the same bootstrapping method we used to analyze the means. Figure 4B shows the standard deviations of constancy indices for the color selection task (filled bars) and the asymmetric matching task (open bars), with error bars representing 90% bootstrapped confidence intervals. For both tasks, the overall variability was lower for the neutral and the physical spectrum groups than for the objective and the apparent reflectance groups. For the color selection task, this difference exceeded the measurement variability as indicated by the confidence intervals. For the asymmetric matching task the measurement variability was larger and confidence intervals across instructional groups overlapped.

To better understand the nature of instructional effects, we compared subjects’ performance across the two tasks. Figure 4 (panels C and D) plots mean adjustment-based constancy indices measured in the asymmetric matching task against the selection-based constancy indices measured in the color selection task for each subject and test illuminant. For clarity, neutral and physical spectrum instructions groups are plotted in Figure 4C while objective reflectance and apparent reflectance groups are plotted in Figure 4D. The diagonal in the panels indicates the identity line: the closer the matches are to the diagonal, the better the agreement between the two constancy measures.

The figure illustrates three main differences between the neutral and physical spectrum instructions groups on the one hand and the objective and apparent reflectance instructions groups on the other. For the neutral and physical spectrum instructions groups (1) overall constancy is low in both tasks, (2) variability between subjects is also low and (3) the selection-based and adjustment-based asymmetric matches are in good agreement: subjects’ matches group close to the diagonal. In contrast, in the objective and apparent reflectance groups (1) overall constancy is higher in both tasks, (2) variability between subjects is fairly high, and (3) the selection-based matches systematically deviate from the adjustment-based asymmetric matches: all matches in Figure 4D lie above the identity line, indicating that selection-based constancy is higher than constancy measured with the asymmetric matching task.

To contrast the patterns of matches across instructional groups, we also compared the degree of constancy across tasks (averaged across targets and illuminants) via a repeated-measures ANOVA (with instructional group as a between-subject factor and task as a within-subject factor). For all instructional groups, overall constancy was higher in the color selection task than in the asymmetric matching task (main effect of task, F(1,12) = 56.61, p < 0.001). This difference was small for the neutral and physical spectrum groups (0.05 and 0.04, respectively, relative to a typical CCI range of variation of 0 to 1) but quite pronounced for the objective and apparent reflectance groups (0.24 and 0.53), leading to a significant Task x Instructions interaction, F(3,12) = 16.94; p < 0.001. Consistent with our main findings, there was also a main effect of instructions, F(3, 12) = 7.66, p < 0.01.

We also examined the difference in performance across instructional groups by examining the pattern of subject matches across sessions. We did this for the color selection data, where subjects completed between 4 and 5 sessions; exploring such effects for our asymmetric matching data set was not practical because subjects completed only two illuminant-changed sessions for this task.

Figure 5A plots constancy indices as a function of session for two subjects. For subject iul (neutral instructions) constancy does not vary systematically across sessions (left panel). This is typical of all of the subjects in the neutral and physical spectrum groups. In contrast, for subject zdc (apparent reflectance instructions) constancy increases systematically with session (right panel); two other subjects (also from the apparent reflectance group) showed a similar pattern.

Figure 5. Panel A shows examples of two distinct patterns of responses in the color-selection task. Mean selection-based indices (averaged over targets) as a function of session are shown for subjects iul (neutral instructions; left) and zdc (apparent reflectance; right). Error bars represent +/−1 SEM. Panel B shows the slopes of linear fits to each subject’s data across sessions. The yellowish illuminant-changed condition is shown in orange; the bluish illuminant-changed condition in blue.

To quantify the stability of constancy for each subject, we fitted a line to the subject’s selection-based constancy indices as a function of session. The slope of this line then characterizes any systematic linear change. For each subject and illuminant condition these slopes are shown in Figure 5B. The slopes are close to zero for most subjects, indicating stable constancy across sessions. The slopes are high for three of the subjects in the apparent reflectance group. Interestingly, for two of these subjects (zdc and mik) the variation in constancy across sessions spanned practically the entire available range, with low constancy in the first session, similar to the levels measured in the neutral and physical spectrum instructions groups, and excellent constancy in the last session.
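The slope measure can be illustrated with an ordinary least-squares fit, sketched here in Python. Ordinary least squares is our assumption, since the text does not name the fitting method. The session-by-session indices below are hypothetical and do not correspond to any subject's data.

```python
def session_slope(indices):
    """Least-squares slope of constancy index against session number (1, 2, ...)."""
    n = len(indices)
    sessions = range(1, n + 1)
    mean_x = sum(sessions) / n
    mean_y = sum(indices) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sessions, indices))
    sxx = sum((x - mean_x) ** 2 for x in sessions)
    return sxy / sxx


# Hypothetical indices over four sessions.
stable = session_slope([0.10, 0.12, 0.09, 0.11])   # close to zero
rising = session_slope([0.10, 0.40, 0.70, 1.00])   # about 0.3 per session
```

A slope near zero indicates stable constancy across sessions, while a large positive slope indicates constancy that increases with session.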

Plots showing selection-based constancy indices across sessions for each subject are available in the supporting material and include the lines fitted to the data. There we also provide the plots of selection-based matches across sessions for all illuminant conditions; there are only 2–3 sessions to compare in the illuminant-constant condition, but these matches appear fairly stable for all subjects.

Although we find differences in constancy with instructions, it seems unlikely that these are due to individual or group differences in how precisely the subjects performed our experimental tasks. In the illuminant-constant condition, in which the selections or adjustments are made under uniform illumination, subject matches approximate the target well and their precision, measured as the match-to-target distance in CIELAB ΔE, does not significantly differ across instructional groups. For both the color-selection task and the asymmetric matching task, a two-way repeated-measures ANOVA (with instructional group as a between-subject factor and target reflectance as a within-subject factor) failed to reveal a significant main effect of instructions on the match-to-target precision or an Instruction x Target interaction (see supporting material), although in the asymmetric matching task the overall precision of matches differed across targets (main effect of target: F = 6.45, p < 0.01). Moreover, in both tasks the precision of the illuminant-constant matches was good in an absolute sense: mean ΔE across subjects was 1.1 (ranging from 0.6 to 2.1 ΔE) for the color selection task and 1.6 (ranging from 0.9 to 2.3 ΔE) for the asymmetric matching task.

The instructional effects we find (0.59 in the color selection task; 0.24 in the asymmetric matching task; computed as the difference between the neutral/physical spectrum groups and the objective/apparent reflectance groups) are comparable to those previously reported for studies that used similarly simple chromatic stimuli and simultaneous asymmetric matching (mean 0.34, range: 0.19 – 0.57, computed across 9 different experiments from 5 different studies, see also Foster, 2011). Interestingly, the overall constancy in our asymmetric matching task was somewhat lower for both instructional categories (0.06 vs. 0.30) than in the previous studies (0.22 vs. 0.57), possibly due to differences in the stimuli we used (which featured a textured background and in which the illumination change was presented within a continuous image, rather than across two separated images).

Experiment 2

In Experiment 2, we measured the effect of instructions for the color selection and the asymmetric matching task using more naturalistic stimuli. The basic logic of Experiment 2 was the same as for Experiment 1.