Author manuscript; available in PMC: 2013 Apr 22.
Published in final edited form as: J Vis. 2011 Oct 6;11(12):10.1167/11.12.2 2. doi: 10.1167/11.12.2

Color names, color categories, and color-cued visual search: Sometimes, color perception is not categorical

Angela M Brown 1,*, Delwin T Lindsey 2, Kevin M Guckes 1
PMCID: PMC3632453  NIHMSID: NIHMS318725  PMID: 21980188

Abstract

The relation between colors and their names is a classic case-study for investigating the Sapir-Whorf hypothesis that categorical perception is imposed on perception by language. Here, we investigate the Sapir-Whorf prediction that visual search for a green target presented among blue distractors (or vice versa) should be faster than search for a green target presented among distractors of a different color of green (or for a blue target among different blue distractors). Gilbert, Regier, Kay & Ivry (2006) reported that this Sapir-Whorf effect is restricted to the right visual field (RVF), because the major brain language centers are in the left cerebral hemisphere. We found no categorical effect at the Green|Blue color boundary, and no categorical effect restricted to the RVF. Scaling of perceived color differences by Maximum Likelihood Difference Scaling (MLDS) also showed no categorical effect, including no effect specific to the RVF. Two models fit the data: a color difference model based on MLDS and a standard opponent-colors model of color discrimination based on the spectral sensitivities of the cones. Neither of these models, nor any of our data, suggested categorical perception of colors at the Green|Blue boundary, in either visual field.

Introduction

The Sapir-Whorf hypothesis has been influential in the fields of psychology, philosophy, anthropology, and linguistics. According to the Sapir-Whorf hypothesis, our perception of stimuli depends on the names we give to them. Following the classic definition of categorical perception (reviewed in Harnad, 1987), stimuli within categories are perceptually similar to one another, and are given the same name, whereas stimuli from different categories look different and are given different names. According to the Sapir-Whorf hypothesis, it is the distinctive names that are responsible for the categorical perception of stimuli.

Color names are a classic topic of research within the tradition of the Sapir-Whorf hypothesis. Almost all world languages have at least some color names in their lexicons, yet there is important variation around the world in how many color terms are used (Kay, Berlin, Maffi, Merrifield, & Cook, 2009). Furthermore, almost all people have color vision, which can be easily measured in the laboratory and in the field. And finally, the spectral composition of light is continuously variable, so any categorical perception of colors that might exist must be due to the perception of the observer rather than to the stimuli themselves. Colors are easy to specify, measure, and produce, which allows the relations among the physical properties of colors, the perception of colors, and the naming of colors, to rest on a quantitative basis.

Many investigators have attempted to link the color categories defined by color names to other aspects of color-based visual performance such as judged similarity (Kay & Kempton, 1984) and visual search (Daoutis, Franklin, Riddett, Clifford, & Davies, 2006; Lindsey, et al., 2010), and to visual physiology more generally (Ratliff, 1976; Lindsey & Brown, 2002). In particular, a recent paper by Gilbert, et al. (2006) (henceforth, “Gilbert et al.”) claims that visual search reaction time for colors is about 0.024 sec. faster across the Green|Blue color category boundary than for stimuli that are within the Green category or within the Blue category, but only for stimuli presented in the right visual field (RVF). Gilbert et al. found no category effect in the left visual field (LVF). This result suggested in a general way that the language centers in the left cerebral hemisphere are important. Here, we attempt to replicate and extend that result.

Overview of the experiments

Experiments I – IV were visual search reaction time (RT) studies in which the observer viewed a circle of 12 colored stimuli (Fig. 1a,b) and identified the odd-one-out. Classically, if stimuli are perceived categorically, targets are found faster when they are in a different category from the distractors than when they are in the same category as the distractors. We did not find evidence of categorical perception that is restricted to the right visual field. In Experiment I, the target and distractor stimuli were similar to those used in Gilbert et al. Our stimuli replicated those used in Kay and Kempton (1984), being at constant value and chroma, but varying in hue, within the Munsell color ordering system. As in Gilbert et al., the visual search data were button-press reaction-times. In Experiment II, the stimuli were at constant distance from white in CIELAB 1976, a color space based on the discriminability of colors (Schanda, 2007), and the search data were saccadic RTs as the observer looked at the target. In Experiment III, the stimuli were a subset of those used in Experiment II, but the method was button-press RT. In Experiment IV, the stimuli were also at constant eccentricity in CIELAB, while two technical aspects of the experiment were manipulated: (1) either the surrounding field was dark, as in our Experiments I – III, or light, as it was in Gilbert et al., and (2) saccadic RTs were collected as the observer either looked at the target, or looked at a “button” on the right or left hand side of the display, according to whether the target was on the right or the left side of the fixation point. We compared the color at the RT minimum from Experiments I – IV to each observer's Green|Blue boundary, and found that the fastest color was not related to the Green|Blue boundary.

Fig. 1

Stimulus configurations: a, Experiments I and III; b, Experiments II and IV. For the RT experiments, the positions of the “odd” target stimulus varied randomly from trial to trial; for the MOA measurement of the Green|Blue boundary, the “odd” target stimulus was in position 2 (panel a) or 1 (panel b) on all trials. c, top-bottom arrangement in Experiment V, used with the observers from Experiment II; d, right-visual-field stimulus in Experiment V, used with the observers from Experiment IV. The numerals by the disks are for exposition, and the black lines around the targets in a and b are for clarity and none of these were present in the actual stimuli. The colors are not colorimetrically correct because they have been adjusted to show up well on the reader's media (computer screen or printout).

In Experiment V, we addressed the question of whether these experiments provide evidence of categorical perception. We collected scaling data using the Maximum Likelihood Difference Scaling (MLDS) method of Maloney & Yang (2003), and used the scaled color differences to predict the shape of the RT data sets. Those data agreed well, suggesting that the RT data were related to the perceived differences between the colors: RT was shorter when the colors looked more different, and RT was longer when the colors looked more similar. Finally, we modeled the RT data using a standard color-opponent model (Lindsey, et al., 2010). We found no particularly large discrepancy between this standard model and the data in the neighborhood of the Green|Blue boundary, such as that predicted by Gilbert et al. and others. Taken as a whole, the results and analyses suggested that the overall shape of the RT data sets was controlled entirely by visual signals that arise in the cones and are combined in a color-opponent fashion in the earliest stages of visual processing.

General Methods for Experiments I – IV

These studies were carried out under the tenets of the Declaration of Helsinki, and were approved by the Biomedical Institutional Review Board of the Ohio State University. All subjects participated after informed written consent was obtained.

The basic stimulus configuration was a ring of 12 color samples, eleven of which were distractors of one color and one of which was a target of an “odd” color, which was either greener or bluer than the eleven distractors (Fig. 1a, b and Appendix I; calibrations performed with a Pritchard PR-670 SpectraScan spectrophotometer).

The ring of colored samples (disks in Experiments I and III, squares in Experiments II and IV) appeared around the fixation point and remained present until the observer responded. The observer's task was to determine which of the 12 stimuli was the “odd one”, i.e., the target. In Experiments I and III, the observer responded promptly by pressing a button to indicate whether the “odd one” was to the right or left of the fixation point. Color names were never used in the instructions for the RT experiment, and observers generally did not know that color terms or color categories were of interest to the experimenters, except for co-authors AMB and KMG, whose data are clearly marked in every figure where they are reported. The visual-search button-press reaction-time (RT) was the time elapsed between the appearance of the ring of stimuli and the button press, on correct trials only. In Experiments II and IV, the observer looked directly at the target, and the saccadic RT was the time between the onset of the stimulus and the arrival of the line of sight at the correct target stimulus after a single saccadic eye movement. The other details of the stimuli are in Appendix I. The RT results are expressed in units of seconds, as a function of the color azimuth in CIELAB.

The experimental hypothesis, based on Gilbert et al. and Kay and Kempton (1984), was that reaction times for the between-color-category stimulus combinations should be shorter than for the within-color-category combinations, but that this Whorfian effect should apply only to the right visual field. Inasmuch as this hypothesis concerned whether colors were categorically green or blue, each of our experiments included a measurement of each observer's Green|Blue boundary, which took place only after the RT phase of the experiment was completed. The stimulus for this phase of each experiment was the same ring of disks or squares used for the RT phase of the experiment. Eleven of the stimuli were one color and one was a different color, either greener or bluer than the other eleven. The separation in color space between the colors was close to the same as that used in the corresponding RT experiment. On a given method-of-adjustment (MOA) trial, the subject used the computer mouse to adjust the colors so that the Green|Blue boundary was halfway between the two colors. This was true when the colors straddled the Green|Blue boundary and the greener stimulus looked as green as the bluer stimuli looked blue (or vice-versa, in the case of a bluer target and greener distractors). While the method of constant stimuli has a better reputation among psychophysicists as a method of estimating threshold, we found a pronounced effect of the stimulus range on the measured Green|Blue boundary in pilot experiments (Appendix II), which would likely have biased our results. This relation between the stimulus range and measured boundary was not evident in our MOA data, so that is the method we chose.

Throughout this paper, colors are specified by their azimuth angles within a constant-luminance plane in CIELAB 1976 Uniform Color Space (Schanda, 2007), relative to the point (1, 0). RT and MOA data are plotted at the average of the target and distractor azimuth values. A preliminary Mixed-Procedure Analysis of Variance (SPSS) revealed that there were no statistically reliable differences between the target-greener and target-bluer stimulus combinations for a given average azimuth value, so the data for those two stimulus combinations were pooled throughout.
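As a minimal sketch of this convention, the azimuth of a color can be computed from its CIELAB a* and b* coordinates, and a data point is then plotted at the mean of the target and distractor azimuths. Reading "relative to the point (1, 0)" as an angle measured counterclockwise from the positive a* axis is an assumption, and the coordinates below are hypothetical, not the stimuli of Appendix I.

```python
import math

def azimuth_deg(a_star, b_star):
    """Angle of (a*, b*) in degrees, counterclockwise from the +a* axis, in [0, 360)."""
    return math.degrees(math.atan2(b_star, a_star)) % 360.0

def plotted_azimuth(target_deg, distractor_deg):
    """Mean of target and distractor azimuths (assumes no wraparound at 0/360)."""
    return (target_deg + distractor_deg) / 2.0

# Hypothetical (a*, b*) coordinates, for illustration only.
target = azimuth_deg(-40.0, -5.0)      # slightly bluer than 180 degrees
distractor = azimuth_deg(-40.0, 5.0)   # slightly greener than 180 degrees
print(target, distractor, plotted_azimuth(target, distractor))
```

Pooling the target-greener and target-bluer combinations then amounts to treating both orderings of a pair as the same plotted azimuth.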

Experiment I

Experiment I was designed to replicate the basic result of Gilbert et al. by relating each observer's Green|Blue boundary to his/her search behavior in a button-press latency experiment. The target and distractor colors were at value 5 and chroma 6 of the Munsell color order system and were separated by 5 Munsell hue steps (chromaticities shown in Appendix I). The first four colors (hue = 7.5G, 2.5BG, 7.5BG, 2.5B) were from Kay & Kempton (1984), as were the intended colors in Gilbert et al. The fifth color was bluer (hue = 7.5B). To express the results of Experiment I relative to an interval-scaled numerical axis, we calculated the positions of our colors in the CIELAB 1976 Uniform Color Space, and we express our results in units of color azimuth within an isoluminant plane in that space. Because of the way the Munsell color ordering system was constructed, the separation in units of color azimuth in CIELAB space, and separation in ΔE units in CIELAB space, varied from stimulus pair to stimulus pair (minimum=10.4, maximum=14.8). The colors were presented on a CRT video monitor at 41.23 cd/m^2 within a dimmer gray surrounding field (35 cd/m^2) with a fixation stimulus in the center (Appendix I).

Fifteen monolingual native English-speaking adults (ages 20 – 62, median = 41) served as observers. All were right-handed (7 female) and all had normal color vision (D-15 panel). Two were aware of the purposes of the experiment (co-authors AMB and KMG), and the others were experimentally naïve. The observer's task was to find the target, and press the “S” key on the computer keyboard with the left hand if the target was on the left half of the display, or the “L” key with the right hand if the target was on the right half of the display. The computer recorded the subject's RT. Each observer contributed eight iterations through the stimulus set (a total of 960 trials).

Results

On average, observers chose the correct stimulus on 98.6% of trials. Each observer's Green|Blue boundary value (Fig. 2) is indicated by a black triangle just below his/her RT data set. The observers’ average Green|Blue boundary was at 183.46° of azimuth in CIELAB color space (standard deviation = 9.13°). This is within the range reported by Gilbert et al., and did not differ significantly for the LVF compared to the RVF (t14 = 0.736, N.S.; average data in Fig. 3a).

Fig. 2

RT results from Experiment I, on 15 observers. Black disks, LVF; white disks, RVF. Each pair of LVF, RVF curves is for a different observer. The displacement constant is 0.5 sec. That is, the lowermost observer's data in the left-hand panel are plotted at the correct RT value, and each of the other observers’ data are displaced upward for clarity, by an integral multiple of 0.75 sec. Black triangles: each observer's Green|Blue boundary, plotted at an arbitrary y-axis value to point at the position where the RT minimum was predicted to be. * = co-author KMG; § = co-author AMB.

Fig. 3

Analyses of the results of Experiments I, II, and III. a, average data from Fig. 2, ± 1 s.e.m. Black triangle: the average Green|Blue boundary. Dashed line: the colors that were used in the statistical analysis of the local minima. b, The color at which the minimum of the best-fitting parabola occurred, as a function of the Green|Blue boundary. If the RT minimum occurred reliably at the Green|Blue boundary, the two measures would be equal and highly correlated (dashed line). Instead, they were uncorrelated (solid line) and the RT minimum was at a bluer color azimuth than the Green|Blue boundary. c, RT DIFFERENCE as a function of COLOR DIFFERENCE for Experiment I; see text for description of the axes units. The prediction from Gilbert et al. is that the minimum value should be RT DIFFERENCE = -0.024 sec. at COLOR DIFFERENCE = 0 (white disk), and that function should rise to RT DIFFERENCE = 0 for the conditions where both target and distractor are on the same side of zero (black curve). The average value of RT DIFFERENCE at COLOR DIFFERENCE = 0 is statistically significantly different from the prediction. d, e, f, analysis of the results of Experiment II; g, h, i, analysis of the results of Experiment III. d, g, conventions as in a; e, h, conventions as in b; f, i, conventions as in c.

According to the classic definition of categorical color perception, if color perception had been categorical in this experiment, the fastest RT should have occurred at the Green|Blue boundary. To determine whether this was the case, we estimated the color azimuth of the fastest color combination in each observer's data set, pooled across LVF and RVF, by fitting a quadratic equation to the three data points indicated by the dashed line in Fig. 3a, taking the fastest color to be the minimum of that function. Fig. 3b shows the fastest color for each observer as a function of his/her Green|Blue boundary. Contrary to the prediction, the minimum of the RT data was generally bluer than the Green|Blue boundary (average minimum: 192.07° in CIELAB, SD = 5.08; paired t14 = 3.181, p=0.0067), and the correlation between the minima and the boundaries was not statistically significant (r = 0.256, p=0.1779, one-tailed) (Fig.3b). Thus, the minimum RT values of the data set were not strongly related to the observers’ measured Green|Blue boundaries.
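The three-point estimate described above can be sketched as follows: a quadratic passes exactly through three (azimuth, RT) points, and the fastest color is taken as its vertex. This is an illustrative sketch, not the authors' analysis code; the function names and the data values are hypothetical.

```python
def parabola_through(points):
    """Exact quadratic y = a*x**2 + b*x + c through three (x, y) points."""
    (x1, y1), (x2, y2), (x3, y3) = points
    denom = (x1 - x2) * (x1 - x3) * (x2 - x3)
    a = (x3 * (y2 - y1) + x2 * (y1 - y3) + x1 * (y3 - y2)) / denom
    b = (x3 ** 2 * (y1 - y2) + x2 ** 2 * (y3 - y1) + x1 ** 2 * (y2 - y3)) / denom
    c = y1 - a * x1 ** 2 - b * x1
    return a, b, c

def fastest_color(points):
    """Azimuth at the parabola's minimum, or None if the curve is not U-shaped."""
    a, b, _ = parabola_through(points)
    if a <= 0:
        return None  # no local minimum to report
    return -b / (2.0 * a)

# Hypothetical (azimuth in degrees, RT in seconds) triples for one observer.
observer = [(175.0, 0.62), (190.0, 0.55), (205.0, 0.60)]
print(fastest_color(observer))
```

Comparing the vertex with the observer's measured Green|Blue boundary, observer by observer, gives the kind of paired comparison and correlation reported in Fig. 3b.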

Statistical analyses

The first analysis was designed to determine whether the results agreed with those of Gilbert et al. This was a GLM (SPSS) analysis of the data for the three conditions that were the same for this experiment and the Gilbert et al. experiment. The results from the green target, green distractor (average azimuth = 173.76°) and the results from the blue target, blue distractor (average azimuth = 207.97°), were averaged into a single “within-category” group. The within-category results were compared to the between-category group, consisting of the trials with the green target and blue distractors and the blue target and green distractors (average azimuth = 189.16°). Thus, the ANOVA had factors for the visual field of presentation (L-R), the categories of target and distractors (Within-vs-Between (W-B) category groups) and Subjects (SUBJ). The analysis revealed a highly statistically significant effect of W-B (F(1,14) = 30.101, p<.0005, Fig. 4). This result indicates that the data showed a statistically significant minimum near the color azimuth of 189.16° in CIELAB. There was also a significant overall effect of L-R (stimuli in the LVF were slightly faster: F(1,14) = 6.054, p=.027), but there was no statistically significant interaction between L-R and W-B (F(1,14) = .008, p=.932), and no interaction between those factors and SUBJ. We repeated the analysis based on different assumptions, by considering the green target/green distractor separately from the blue target/blue distractor, and by using each subject's own Green|Blue boundary instead of the group average. Neither of these variations yielded results materially different from the original. Thus, when we repeated the statistical analysis of Gilbert et al., on data that were collected using their reported stimuli and method, we saw no evidence that between-category target stimuli are found faster in the RVF than in the LVF.

Fig. 4

Bar graph of RTs from Experiment I, combined for analysis as in Gilbert et al. The RVF was slightly but significantly slower than the LVF, but, unlike in Gilbert et al., the RT difference between the “within color categories” and the “between color categories” conditions is not statistically significantly greater in the RVF than in the LVF.

To further examine the statistical power of our negative result, we performed an SPSS Mixed Procedure analysis of variance specifically aimed at determining whether our data excluded the 0.024 sec. advantage that Gilbert et al. reported for between-category stimulus color combinations, within the RVF only. This experimental hypothesis depended on whether the target was in the same color category as the distractors, but subjects differed in their Green|Blue color boundary locations. Therefore, we subtracted each subject's color boundary azimuth value from the color azimuths of the stimuli. The experimental hypothesis concerned whether the RT was faster in the RVF than in the LVF, so we subtracted each subject's LVF RT from his/her RVF RT. Thus, we created a data set that had bluer stimuli with positive “COLOR DIFFERENCE” (x-axis) numbers, greener stimuli with negative COLOR DIFFERENCE numbers, and color combinations that straddled the Green|Blue boundary at or near COLOR DIFFERENCE = zero (Fig. 3c). If the RVF were selectively faster at the Green|Blue boundary, the RVF – LVF “RT DIFFERENCE” would be at its minimum value at the Green|Blue boundary. When the data are presented in this way, the experimental question can be expressed as follows: Was the RT DIFFERENCE function U-shaped, with its minimum located at COLOR DIFFERENCE x = 0 and RT DIFFERENCE y = -0.024, the difference reported by Gilbert et al.?
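The two subtractions described above can be sketched for a single observer as follows. The function name and all data values are hypothetical; the sketch only illustrates how the COLOR DIFFERENCE and RT DIFFERENCE axes are constructed, not the subsequent Mixed Procedure analysis.

```python
def normalize_observer(azimuths, rt_lvf, rt_rvf, boundary):
    """Return (COLOR DIFFERENCE, RT DIFFERENCE) pairs for one observer.

    COLOR DIFFERENCE = stimulus azimuth minus the observer's Green|Blue boundary
                       (positive = bluer than the boundary, negative = greener).
    RT DIFFERENCE    = RVF RT minus LVF RT (negative = RVF faster).
    """
    return [(az - boundary, rvf - lvf)
            for az, lvf, rvf in zip(azimuths, rt_lvf, rt_rvf)]

# Hypothetical observer: azimuths in degrees, RTs in seconds.
azimuths = [174.0, 182.0, 190.0, 198.0]
rt_lvf = [0.60, 0.56, 0.54, 0.58]
rt_rvf = [0.61, 0.57, 0.55, 0.60]
boundary = 186.0

for color_diff, rt_diff in normalize_observer(azimuths, rt_lvf, rt_rvf, boundary):
    print(color_diff, round(rt_diff, 3))
```

After this step every observer's boundary sits at COLOR DIFFERENCE = 0, so the predicted RVF advantage would appear as a dip in RT DIFFERENCE at zero regardless of where each individual boundary falls.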

The confidence interval containing the RVF – LVF RT difference at the Green|Blue boundary was 0.003446 to 0.026142 sec., a marginally statistically significant effect in the opposite direction from our experimental hypothesis. Thus, our result has the statistical power to specifically reject the -0.024 sec. reported by Gilbert et al.

Experiment II

Many investigators propose the Sapir-Whorf hypothesis as a general theory of visual perception under real-life conditions. Therefore, it is important to ask: are there experimental conditions under which the result of Gilbert et al. holds? The Munsell color order system used by Gilbert et al. and in Experiment I was based on the appearance of colors (see Nickerson, 1940 for a historical review). Perhaps RT depends on the discriminability of colors instead of on their appearance. Experiment II was designed to examine the possible generality of the results of Experiment I by using colors that were approximately equally spaced within CIELAB color space. We chose CIELAB because it was based on the color discrimination data of MacAdam and his colleagues (Wyszecki & Stiles, 1982).

Button-press RT has limited ecological validity: our forebears did not press buttons. On the other hand, human beings and many other animals do look at stimuli that they can see, and in the case of humans, this looking behavior is often in the form of saccadic eye movements. Therefore, in Experiment II, we presented the stimuli in a saccadic eye-movement paradigm, and measured the amount of time between the appearance of the stimulus display and the moment the subject's gaze reached the target stimulus.

The Green|Blue boundary

As in Experiment I, the Green|Blue boundary was measured on each observer using the MOA described above. The stimuli were presented using a CRT video monitor apparatus. The colors were at 35.4 cd/m^2, the target and distractors were separated by approximately 15° in CIELAB color space, and the surrounding field was at 25.2 cd/m^2. Eight monolingual, native English-speaking adults, aged 20 – 60, served as observers in the main experiment. All were right-handed (5 female) and all had normal color vision (D-15 panel). Except for co-author AMB, all observers were experimentally naïve.

Visual Search

The stimuli for the visual search experiment were presented on a rear projection screen using a video projector (JVC DLA-M2000L). After the observer was fixating a tiny dot in the center of the screen, a 750 msec black annulus (diameter = 2.6° v.a., thickness= 0.1° v.a.) appeared, then was replaced by the 20° diameter ring of 12 colored disks. The ring of disks remained in place until the observer responded. The observer's task was to determine which of the colored disks (the target) had a different color from the others, and to fixate it as quickly as possible, while maintaining good accuracy in the choice of disk. The chromaticities of the stimuli are in Appendix I. The colors were spaced approximately every 4.5° of azimuth in CIELAB color space, but the greenest color (as target or distractor) was paired with the fourth-greenest color (as distractor or target), and so forth, so the target and distractor colors were always separated by about 14.1° of azimuth in CIELAB color space (average ΔE = 10.7). This strategy ensured that at least one target-distractor color combination would straddle the boundary for each observer, no matter where over the tested range the Green|Blue boundary fell. An eyetracker (Tobii X120) measured the saccadic RT, that is, the elapsed time between the onset of the stimulus and the time when the first saccadic eye movement arrived at the correct target.
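For readers who want to relate an azimuth separation to ΔE, the sketch below uses the CIELAB 1976 definition of ΔE as Euclidean distance in (L*, a*, b*). For two colors with equal L* and equal distance from white (chroma), this reduces to the chord length 2·C·sin(Δh/2). The lightness and chroma values are hypothetical; only the 14.1° separation is taken from the text, and the actual stimuli were only approximately equally spaced.

```python
import math

def delta_e_ab(lab1, lab2):
    """CIELAB 1976 color difference: Euclidean distance in (L*, a*, b*)."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(lab1, lab2)))

def chord_delta_e(chroma, azimuth_sep_deg):
    """Delta E for two colors of equal L* and equal chroma, separated in azimuth."""
    return 2.0 * chroma * math.sin(math.radians(azimuth_sep_deg) / 2.0)

def lab_at(lightness, chroma, azimuth_deg):
    """(L*, a*, b*) for a color specified by lightness, chroma, and azimuth."""
    h = math.radians(azimuth_deg)
    return (lightness, chroma * math.cos(h), chroma * math.sin(h))

# Hypothetical lightness and chroma; only the 14.1 degree separation is from the text.
L_STAR, CHROMA = 50.0, 40.0
greener = lab_at(L_STAR, CHROMA, 180.0)
bluer = lab_at(L_STAR, CHROMA, 194.1)
print(delta_e_ab(greener, bluer), chord_delta_e(CHROMA, 14.1))
```

The two printed values agree because both colors lie on the same circle around white; pairs that differ in chroma or lightness would need the full three-coordinate distance.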

Results of Experiment II

Observers fixated the correct stimulus on 94.5% of trials, and each observer contributed data from between 5 and 11 iterations (median = 9.5) through the stimulus set, depending on observer availability and the proper function of the equipment. A ninth observer's data were discarded because of technical difficulties with the eyetracker.

Each observer's RT data set (Fig. 5) is presented along with the corresponding Green|Blue boundary value, which is indicated by a black triangle just below his/her RT data set. According to Kay and Kempton (1984) and Gilbert et al., the Sapir-Whorf hypothesis predicts that this estimate of the fastest RT will occur at the Green|Blue boundary. Contrary to that prediction, visual inspection reveals that the numerical minimum of these RT data sets was not consistently close to the boundary. As in Experiment I, we estimated the color azimuth of the fastest color in each observer's data as the minimum of a quadratic function fitted to the range of the five RTs indicated by the dashed line on the graph of the average data (Fig. 3d). The fastest color (184.97° in CIELAB, SD = 1.67) was reliably greener than the Green|Blue boundary (193.63° in CIELAB, SD = 1.67; paired t7 = 3.40, p=0.012), and the two values were not reliably positively correlated with one another (r=-0.310, p=0.228) (Fig. 3e). The quadratic coefficients of those functions were statistically significantly greater than zero (t7=5.96, p=0.0006), indicating that the data show a local minimum over the range of colors indicated by the dashed line.
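With five points the quadratic is fitted by least squares rather than passed exactly through the data. The sketch below solves the normal equations directly, returns the quadratic coefficient and the azimuth of the vertex, and computes a one-sample t statistic for testing whether a set of coefficients exceeds zero. It is an illustration under assumed conventions, not the authors' code; all data values are hypothetical.

```python
import math
import statistics

def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def fit_quadratic(xs, ys):
    """Least-squares fit of y = a*u**2 + b*u + c, with u = x - mean(xs).

    Returns (a, vertex_x); the vertex is a minimum only when a > 0.
    """
    center = sum(xs) / len(xs)
    us = [x - center for x in xs]
    s = [sum(u ** k for u in us) for k in range(5)]                    # power sums S0..S4
    t = [sum((u ** k) * y for u, y in zip(us, ys)) for k in range(3)]  # moments T0..T2
    normal = [[s[4], s[3], s[2]],
              [s[3], s[2], s[1]],
              [s[2], s[1], s[0]]]
    rhs = [t[2], t[1], t[0]]
    d = det3(normal)

    def solve(col):
        # Cramer's rule: replace one column of the normal matrix with the right-hand side.
        m = [row[:] for row in normal]
        for i in range(3):
            m[i][col] = rhs[i]
        return det3(m) / d

    a, b = solve(0), solve(1)
    return a, center - b / (2.0 * a)

def one_sample_t(values):
    """t statistic for the mean of values against zero (df = n - 1)."""
    n = len(values)
    return statistics.mean(values) / (statistics.stdev(values) / math.sqrt(n))

# Hypothetical five-point RT function for one observer (degrees, seconds).
xs = [176.0, 180.5, 185.0, 189.5, 194.0]
ys = [0.58, 0.53, 0.52, 0.54, 0.59]
print(fit_quadratic(xs, ys))
```

A positive quadratic coefficient for each observer, tested across observers with `one_sample_t`, corresponds to the claim that the RT functions have a local minimum within the fitted range.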

Data Analysis

As in Experiment I, we examined the linear RVF – LVF RT DIFFERENCE data, normalized along the x-axis so that each subject's Green|Blue boundary was at COLOR DIFFERENCE=0 (Fig. 3f). The resulting analysis of the 95% confidence intervals around the value at COLOR DIFFERENCE=0 was y=-0.020058 to y= +0.0214, which excludes the value y = -0.024. Thus, this data set excluded the result of Gilbert et al.

Experiment III

Perhaps the problem with Experiment II was that the dependent measure was saccadic latency rather than manual button-presses, as in Gilbert et al. For example, others have pointed out that an extrageniculostriate pathway, and perhaps a geniculostriate pathway via the dorsal stream, are probably involved in targeting saccadic eye movements to a certain location in oculocentric visual space. In contrast, the geniculostriate ventral stream probably mediates cognitive decisions such as whether to press the right or the left button, according to where the target is located in allocentric visual space. Any Whorfian language effect might be more important for a ventral-stream task than for a dorsal-stream task, so maybe our failure to replicate Gilbert et al. in Experiment II was due to the task rather than to any fundamental failure of the Whorf hypothesis. Therefore, in Experiment III, we repeated Experiment II using a manual button-press task rather than a saccadic fixation task.

The RT and MOA experiments were performed on the same CRT apparatus as Experiment I (Appendix I). The chromaticities were paired (target and distractor colors) into 5 color pairs, each target and distractor being separated by approximately 15.9° of azimuth in CIELAB color space (average ΔE = 11.6) (see Appendix I). The average stimulus chromaticities were spaced every 10° around a circle in CIELAB color space. Here, the stimuli were 12 square patches of color, situated on a ring of radius 20 degrees of visual angle. As in Experiment I, the observer pressed the “S” key with the left hand if the target stimulus appeared to the left of the fixation point, and the “L” key with the right hand if the target appeared on the right. We collected 8 iterations through the RT experiment, on 13 color-normal adult observers (D-15 panel; ages 24 – 62, median =37; 9 female). The observers also provided MOA measurements of their Green|Blue boundaries.

Results

Observers pressed the correct key on 97.6% of trials. Inspection of the individual RT data (Fig. 6) shows no reliable minimum at the Green|Blue boundary. As before, we estimated the color azimuth of the fastest color by fitting a quadratic equation to each observer's data set over the range of three stimulus combinations indicated by the dashed line near the average data in Fig. 3g. The average minimum was at 181.26° in CIELAB, SD = 4.42, and was at a reliably greener color than the MOA boundaries (201.58° in CIELAB, SD = 5.59; paired t13 = 8.701, p<0.001). Furthermore, there was no relation between the azimuths of the RT minima and the MOA boundary data (r=-0.405, p=0.067) (Fig. 3h). As in Experiment II, we examined the coefficients of the quadratic functions that went into that analysis, and found that they were reliably positive (t12=15.40, p<.0001), indicating that the RT functions were reliably U-shaped. We normalized the data as in Experiments I and II, by subtracting the LVF data from the RVF data, and subtracting each observer's Green|Blue boundary from the color azimuth value, for each color azimuth (Fig. 3i). The SPSS Mixed Procedure analysis revealed that the 95% confidence interval for the difference between the LVF and RVF data at the boundary was y= +0.015 to y= +0.042. Thus, this data set was also inconsistent with the idea that the Whorf hypothesis applies only to the RVF, and the -0.024 second difference between the LVF and the RVF RTs at the Green|Blue boundary was specifically excluded by this data set. Thus, the failure of Experiment II to reveal a consistent effect like that reported by Gilbert et al. was not due to our use of saccadic RT as the dependent measure.
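The exclusion argument used in Experiments I – III can be summarized in a few lines: the predicted RVF advantage of -0.024 sec. is rejected when it lies outside the interval estimated for the RVF – LVF RT DIFFERENCE at the boundary. The interval endpoints below are the values reported in the text; the helper function is a hypothetical illustration, not part of the SPSS analysis.

```python
GILBERT_EFFECT = -0.024  # predicted RVF - LVF RT DIFFERENCE at the boundary, in seconds

# Interval endpoints (seconds) as reported in the text for each experiment.
intervals = {
    "Experiment I": (0.003446, 0.026142),
    "Experiment II": (-0.020058, 0.0214),
    "Experiment III": (0.015, 0.042),
}

def excludes(interval, value):
    """True if value lies outside the closed interval."""
    low, high = sorted(interval)
    return not (low <= value <= high)

for name, interval in intervals.items():
    print(name,
          "excludes predicted effect:", excludes(interval, GILBERT_EFFECT),
          "| excludes zero:", excludes(interval, 0.0))
```

An interval that also excludes zero on the positive side, as in Experiments I and III, indicates an RVF that was slower rather than faster at the boundary.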

Fig. 6

Individual button-press RT data from Experiment III. Displacement parameter: 0.75 sec. Symbols: the same subjects as in Fig. 5. Other conventions as in Fig. 2.

This experiment was designed using button-press RT to determine whether the results of Experiment II were due to the use of saccadic RT as a dependent measure. The results of Experiments II and III resembled each other closely. Three observers served in both experiments (co-author AMB, KTN and BUI, indicated by the symbols in Figs. 5, 6), so it is of interest to compare the results of those observers across the two methodologies (Fig. 7). Saccadic RT was generally faster than button-press RT. This suggested that either the time required to execute the motor response was longer for the button-press task than the eye-movement task, or else the choice between the two response keys in Experiment III took longer than the choice between the 12 colors in Experiment II. In any case, both of these tasks are choice RTs, and the RTs are always longer than the classic saccadic or button-press latency values for simple RTs. Furthermore, each observer's RT function had a consistent shape across the two experiments, which suggested that the shape was governed by sensory and perceptual factors that were common to the two experiments, rather than random variation or constant factors related to the decision or to the generation of the response itself.

Fig. 5.

RT results from Experiment II. Displacement parameter = 0.75 sec. §, co-author AMB; #, BUI; ⦰, KTN. Other conventions as in Fig. 2.

Fig. 7.

a–d: RT data from four individual observers who served in both button-press RT and saccadic RT experiments. Subjects AMB, KTN, and BUI served in Experiments II and III; subject KMG served in Experiments III and IV. RT for the saccadic eye movements to one of 12 color samples (black disks) was reliably faster than for the button press of one of two response keys (white disks), and the shape of the RT function for each subject was similar across the two tasks. e, f: average saccadic RT data from Experiment IV. RT for the look-at-the-target task (black symbols) was reliably faster than for the look-at-the-button task (white symbols).

Experiment IV

We also ran an intensive series of saccadic RT measurements on three observers to explore the impact of two decisions we made in Experiment II: the decision to use a dark rather than a light background, and the decision to have our observers look at the target, which did not require an explicit left-right decision before responding. These three observers were tested in a 2 × 2 experimental design. The first factor was a light vs. a dark surrounding field. The second factor was task: the observers either looked at the target, as in Experiment II, or they looked at a “button” dot affixed to the right or left side of the screen (Fig. 1b) according to whether the target appeared on the right or the left half of the display. The apparatus for the RT experiment was the same rear projection video screen as we used for the RT data in Experiment II. We used the MOA to measure the Green|Blue boundary of each observer, as in Experiment II, only using the rear projection screen apparatus. The pairs of target and distractor stimuli were separated by about 15.9° in CIELAB color space (average ΔE = 17.2).

Results

A trial was accepted as correct if the first saccadic eye movement arrived within an area of interest around the correct target in the look-at-the-target condition, and around the correct “button” on the correct side in the look-at-the-button condition. By these criteria, performance was 88% correct in the look-at-the-target condition and 90.7% correct in the look-at-the-button condition. The color at which the fastest RT occurred was not reliably related to the Green|Blue boundary, as can be seen in the individual data sets (Fig. 8). As in Experiments I – III, we fitted descriptive quadratic equations to the RT data sets over the range of five data points indicated by the dashed lines at the top of Fig. 8, and extracted an estimate of the local RT minimum for each subject under each condition within that range. The average of all the RT minima in the experiment was at 178.38° in CIELAB, SD = 5.33, which was greener than the MOA settings (199.97° in CIELAB, SD = 5.07, paired t11 = 2.91, p=0.014). As before, there was no clear relation between the RT minima extracted in this way and the Green|Blue boundaries measured using MOA (Fig. 9a; overall r= 0.015, p= 0.4813). The average value of the coefficients of the quadratic term was reliably greater than zero (t11=9.494, p<.0001), indicating that the data, when pooled across all the conditions, were U-shaped.

Fig. 8.

Three observers’ data from Experiment IV. White symbols: RVF; black symbols: LVF; circles: look-at-target, dark surround; diamonds: look-at-target, light surround; triangles: left-right, dark surround; squares: left-right, light surround. Black triangles: Green|Blue boundaries. Displacement parameters for the three subjects were 0.0 (*: co-author KMG), 0.25 sec (†) and 0.35 sec (‡).

Fig. 9.

Analyses of the data from Experiment IV. a, the fastest color as a function of the Green|Blue boundary; line conventions as in Fig 3b. These two quantities are unrelated to one another. b–e, RT DIFFERENCE as a function of COLOR DIFFERENCE. There is no clear tendency for the data to follow the black curve, so there is no obvious tendency for there to be a local minimum near -0.024 sec in the RT DIFFERENCE data.

Because of the small number of observers in Experiment IV, further statistical analysis is unwarranted. By inspection, we note that one of our three observers (co-author KMG, indicated by the *) showed extra-fast RT at the Green|Blue boundary in one of the four conditions. However, in that case, the RT in the RVF was not faster than in the LVF. We processed the results of each observer using the methods described for Experiments I – III. The RT DIFFERENCE values are shown as a function of the COLOR DIFFERENCE values in Fig. 9b–e. Several of the data sets fell below the Y=0 line, indicating that RT was faster in the RVF than in the LVF. However, there was no clear trend for the data in any of the conditions to be a U-shaped function with its minimum at -0.024 sec., as would be predicted by the results of Gilbert et al. Thus, none of the conditions produced an effect similar to that of Gilbert et al. The lack of such an effect suggests that our choices of task and surrounding field lightness were not crucial to our failure to find a Whorfian color boundary effect in the RVF only. Subject and co-author KMG served in both Experiment IV (saccadic RT) and Experiment III (button press RT) (Fig. 7d). As for the other subjects in Fig. 7, his saccadic RT was slightly faster than the button press RT, but the overall shapes of the two functions were similar.

These data allowed us to examine, qualitatively, the effects of the look-at-the-target vs. the look-at-the-button instructions. The data (Fig. 7e, f), averaged across three subjects and both sides of the visual field, revealed that the look-at-the-target data were consistently faster (average difference = 0.038 seconds). The results are similar to those obtained when look-at-the-target saccadic RT data are compared to the right-left button-press RT data. Apparently the “choice” in these choice RT tasks had an important effect on RT.

Experiment V

While Experiments I – IV revealed no reliably extra-fast RT at the Green|Blue boundary specific to the RVF, as Gilbert et al. reported, the data sets did show a large range of RT values. What was this variability due to? One possibility is that the color combinations might not all look equally different to the observer, even though they were approximately equally separated in the Munsell color order system or the CIELAB color space. To investigate this possibility, we measured the perceived differences between the stimuli from Experiments II and IV using Maximum Likelihood Difference Scaling (MLDS), a modern psychophysical scaling technique (Maloney & Yang, 2003; Knoblauch & Maloney, 2008).

Methods for Experiment V

The observers were the same people who served in Experiments II and IV, and the apparatus was the same as was used for the MOA data (the CRT for the observers from Experiment II and the rear projection apparatus for the observers from Experiment IV). For the observers from Experiment II, the stimulus array was two pairs of disks (Fig. 1c). Each top or bottom pair of colors was separated by one, two, or three chromaticity steps, from a set of 11 equally-spaced colors (stimulus chromaticities are in the Appendix). The steps were chosen so that two steps in the MLDS experiment corresponded approximately to the difference between the target and distractors in the RT experiment. The surrounding field was dark gray. For the observers from Experiment IV, the surrounding field was either dark or light, and the targets were grouped on the right or the left half of the visual field (Fig. 1d). There were 10 equally-spaced colors, and two steps in the MLDS experiment corresponded approximately to the difference between the target and distractors in the RT experiment.

On each MLDS trial, the observer judged which of the two pairs of colors was the more dissimilar: the top pair or the bottom pair. This was a nonverbal judgment in the sense that no color terms were needed. In the case of the observers from Experiment II, the data were collected before the MOA and RT data, so the experimentally naïve observers did not know that they would be asked to name or judge any colors. Only co-author AMB knew what hypotheses were being tested. Each observer from Experiment II contributed one run through the MLDS stimulus set. In the case of the observers from Experiment IV, the MLDS measurements were made after the RT data and MOA data were collected; only co-author KMG knew that this was a study of categorical perception at the Green|Blue boundary¹. Each of the three observers from Experiment IV contributed 10 runs through the stimulus set (five runs in the LVF, five in the RVF).

Results

The MLDS algorithms of Knoblauch & Maloney (2008) generated a curve of “Psy” (ψ), the scaled magnitude of the stimuli, normalized to the range (0, ... , 1), as a function of the stimulus values, which were the color angles of the stimuli in CIELAB color space. Fig. 10a shows the average ψ curves for the observers from Experiment II. The difference between the ψ values of two color angles is the scaled perceptual difference between them, delta-Psy (Δψ) (e.g., the red brace in Fig. 10a). In our case, the two color angles were separated by two steps along the ψ function, because those were separated by approximately the same amount (in azimuth of CIELAB) as the targets and distractors in Experiment II. This analysis generated a graph (Fig. 10b) that related stimulus chromaticity to stimulus appearance under our particular conditions: the x-axis was the azimuth in CIELAB color space (averaged across the two color angles), and the y-axis was the scaled perceived differences between target and distractor stimuli (Δψ values). For example, the two colors indicated by the red brace in Fig. 10a were subtracted to obtain the value of Δψ at the tip of the arrow in Fig. 10b. If categorical perception had occurred, we would expect a locally high value of Δψ at the category boundary, where a green stimulus was compared to a blue stimulus, and lower Δψ values above and below the boundary (Fig. 11a; see Harnad (1987), for a review). In contrast, if the colors are not perceived categorically, there will be no locally high value at the putative category boundary (Fig. 11b–d).
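The step from the ψ curve to the Δψ curve can be illustrated with a short sketch. The function `delta_psi` and both example ψ curves are hypothetical and serve only to show the two-step differencing described above; they are not the observers' data, and the MLDS estimation of ψ itself is not reproduced here.

```python
import numpy as np

def delta_psi(angles_deg, psi, step=2):
    """Difference psi between colors `step` positions apart.

    Returns (mean azimuth of each pair, scaled perceived difference).
    """
    a = np.asarray(angles_deg, dtype=float)
    p = np.asarray(psi, dtype=float)
    mid = (a[step:] + a[:-step]) / 2.0   # x-axis: average of the two color angles
    diff = p[step:] - p[:-step]          # y-axis: delta-psi for the pair
    return mid, diff

angles = np.linspace(150.0, 250.0, 11)   # 11 equally spaced hypothetical azimuths

# Non-categorical case: psi grows linearly, so delta-psi is flat.
mid_flat, d_flat = delta_psi(angles, np.linspace(0.0, 1.0, 11))

# Categorical case: psi is a sigmoid centred on a hypothetical 200 deg boundary,
# so delta-psi shows a local maximum at the boundary.
sig = 1.0 / (1.0 + np.exp(-(angles - 200.0) / 8.0))
psi_cat = (sig - sig.min()) / (sig.max() - sig.min())
mid_cat, d_cat = delta_psi(angles, psi_cat)

print("flat delta-psi:", np.round(d_flat, 3))
print("peak of categorical delta-psi at", mid_cat[np.argmax(d_cat)], "deg")
```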

Fig. 10.

MLDS data and their fits to the RT data. a–c, data from the observers from Experiment II, using stimulus configuration from Fig. 1b. Black triangles, MOA Green|Blue boundaries. a, squares, MLDS ψ data; b, diamonds, Δψ data derived from a; c, the reciprocal of the Δψ data (line) was fitted using Eq. 2 to the RT data of Experiment II (circles). d–i, data from the observers from Experiment IV; red and black solid lines, RVF and LVF respectively; d–f, dark surrounding field; g–i, light surrounding field. d, g, squares, MLDS ψ data; solid lines, point-to-point data; e, h, diamonds, Δψ data; solid lines, point-to-point data; f, i, the reciprocals of the Δψ data (solid lines) were fitted to the RT data of Experiment IV using Eq. 2 (white circles, RVF; black circles, LVF). White triangles and dashed lines throughout, the predicted curves for the RVF, taken from the LVF data, but assuming a 0.024 sec. category effect at the Green|Blue boundary (panels f, i).

Fig. 11.

Examples of possible results of an MLDS experiment. Only the curve in (a) shows categorical perception.

Discussion

On the hypothesis that target stimuli are easier and faster to detect when they are more different from their distractors, and slower and harder to detect when they are more similar, one might suppose that the MLDS and the RT data might be related to one another. To explore this possibility, we predicted the RT data from the reciprocal of the MLDS data:

$$RT(x) = RT_{\min} + k\left(\frac{1}{\Delta\psi(x)} - \left(\frac{1}{\Delta\psi(x)}\right)_{\min}\right) \qquad \text{Eq. 2}$$

where Δψ(x) is the MLDS-scaled difference between the two colors of the color combination (e.g., Fig. 10a), x is the average of those two color angles, RT(x) is the RT at x, RTmin is the minimum RT for the data set, and k is a scaling constant fitted to the RT data. The average predictions of Eq. 2 for the RT results of Experiment II appear as the black line in Fig. 10c. For example, the two values of ψ indicated in red in Fig. 10a produced the value of Δψ indicated by the arrow in Fig. 10b, and predicted the value of RT at the tip of the arrow in Fig. 10c.
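One way the prediction of Eq. 2 might be computed is sketched below. It assumes that RTmin is taken as the smallest observed RT and that the single free parameter k is chosen by least squares; the function name `fit_eq2` and the numbers are hypothetical, and the actual fitting procedure may have differed in detail.

```python
import numpy as np

def fit_eq2(delta_psi, rt):
    """Predict RT from the reciprocal of delta-psi (Eq. 2).

    Assumptions of this sketch: RT_min is the smallest observed RT, and k is
    the least-squares slope of (RT - RT_min) on the shifted reciprocal.
    Returns (rt_min, k, predicted_rt).
    """
    inv = 1.0 / np.asarray(delta_psi, dtype=float)
    r = inv - inv.min()                  # 1/dpsi(x) - (1/dpsi(x))_min
    rt = np.asarray(rt, dtype=float)
    rt_min = rt.min()
    denom = np.dot(r, r)
    if denom == 0.0:
        raise ValueError("delta-psi is constant; k is not identifiable")
    k = np.dot(r, rt - rt_min) / denom   # closed-form slope through the origin
    return rt_min, k, rt_min + k * r

# Hypothetical delta-psi values and RTs (sec) for five color combinations
dpsi_example = np.array([0.10, 0.20, 0.25, 0.20, 0.125])
rt_example = np.array([0.43, 0.31, 0.30, 0.33, 0.37])
rt_min, k, rt_pred = fit_eq2(dpsi_example, rt_example)
print("k =", round(k, 4), "prediction errors:", np.round(rt_example - rt_pred, 3))
```

The printed prediction errors are the quantity that would show a localized dip at the boundary if perception were categorical.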

The predicted RTs from the much more extensive individual MLDS ψ data on the three observers from Experiment IV are in Fig. 12. The fits are good, suggesting that RT can be understood from the perceived differences between the targets and the distractors, regardless of which color categories they come from.

Figure 12.

RT data from Fig. 8, pooled across LVF and RVF, compared to the predictions from the delta-Psy results of Experiment V, fitted from Eq. 2 using a least squares criterion. *: co-author KMG.

The average ψ data for the RVF and LVF on the observers from Experiment IV were similar to each other, and so were the average Δψ functions for the RVF and LVF (compare the black and white squares in Figs. 10d and g, and the black and white diamonds in Figs. 10e and h). Thus the MLDS data did not suggest any categorical effect restricted to the RVF. But, how big would the effect be, if the RVF RT data showed a dip of 0.024 sec. near the Green|Blue boundary? To answer that question, we adjusted the MLDS function from the LVF (white triangles and dotted lines in Figs. 10d, g) to predict a dip of 0.024 sec. in the predicted RT function (white triangles in Figs. 10f, i) near the average Green|Blue boundary. The intermediate step in that prediction was the predicted RVF Δψ function (white triangles in Figs. 10e, h). That function shows a prominent maximum of the kind illustrated in Fig. 11a. Whereas the predicted changes to the RT data (Figs. 10f, i) and the MLDS ψ data (Figs. 10d, g) are subtle, the predicted effects on the Δψ functions are large, and are clearly ruled out by the RVF data (white diamonds, with error bars smaller than the data points, are well clear of the large deviation shown by the white triangles). Thus, the results of Experiment V provide evidence against a categorical effect in the perceived differences between colors that straddle the Green|Blue boundary.
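A rough sense of why a subtle change in RT implies a large change in Δψ can be had by inverting Eq. 2. The following is a sketch only: it assumes that k, RTmin, and the minimum reciprocal term are left unchanged by the adjustment, and the symbols Δψ′ (the hypothetical RVF value) and x_b (the boundary color) are introduced here purely for illustration.

$$\frac{1}{\Delta\psi'(x_b)} = \frac{1}{\Delta\psi(x_b)} - \frac{0.024}{k} \quad\Longrightarrow\quad \Delta\psi'(x_b) = \frac{\Delta\psi(x_b)}{1 - 0.024\,\Delta\psi(x_b)/k}$$

Under these assumptions, a small value of k (a shallow dependence of RT on 1/Δψ) means that a 0.024 sec. reduction in RT requires a proportionally large increase in Δψ at the boundary.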

The lack of an RVF-LVF difference in Figs. 10d–i is perhaps not surprising, because the left and right cerebral hemispheres are extensively connected. A Whorfian effect might cause categorically different stimuli presented to the RVF to be identified sooner because of the primary visual projection from the RVF to the left cerebral hemisphere, the same hemisphere where the language centers are located. However, it seems likely that this RVF advantage would be lost if the observer were allowed to respond at will, after the visual signals from both visual fields were allowed to reach whatever language and decision centers are needed, regardless of how the hemispheres are connected (see Roberson & Hanley, 2007, for a similar argument). Such an analysis does not, however, explain why there was no reliable Whorfian effect in either visual field in these data sets.

Color-theoretical Discussion

The need for a null hypothesis

What is lacking in previous studies of Whorfian effects in color vision is an explicit null hypothesis. In the case of the experiment of Gilbert et al., what should the RT be, in the absence of a categorical color effect? The implicit model that underlies much of the published research in this field, beginning with Kay and Kempton (1984), is that RT should be proportional to the separation of the stimuli in some uniform chromaticity space, unless categorical perception modifies that general relation. This implicit model depends heavily on the assumption that the color space within which the stimuli are chosen is uniform for all sensory aspects of visual perception. If the between-category stimuli were in some simple way more different from one another than the within-category stimuli are, that difference alone might explain the greater ease that subjects might have in detecting stimuli defined by between-category differences.

To see how a null hypothesis is necessary, consider the situation where RT is actually a curvilinear function of color azimuth (Fig. 13). In that case, a modest categorical effect might produce RT that is faster than the null hypothesis prediction, but not necessarily faster than all the colors in the data set (Fig. 13a). The Green|Blue boundary would not be the fastest RT (the fastest color might be elsewhere, white triangle in Fig. 13a), but it would be the color where the RT data fall below the null hypothesis prediction. The categorical boundary effect would be evident in the error of prediction, which would show a localized minimum (Fig. 13b). The minimum RT values for the observers in Experiments I – IV were consistent with this possibility, because those values were all close to 185° in CIELAB, but were not closely related to the Green|Blue boundaries (Figs. 2, 5, 6, 8 and Figs. 3b, e and h). Therefore, it was especially important to establish a null hypothesis to evaluate our data, to determine whether the scenario illustrated in Fig. 13 applies here.

Fig. 13.

a. Diagram of a situation where perception is categorical in that RT is extra-fast at the color boundary (the white disk falls below the red prediction curve at the color boundary value indicated by the black triangle), but the extra-fast RT is not the fastest RT in the experiment (the fastest is the color indicated by the white triangle). This situation is especially revealed by the errors of prediction (b) which show a prominent dip at the Green|Blue boundary (black triangle), but none at the minimum of the data set (white triangle).

The choice of color space is a null hypothesis

Investigators in the tradition of Kay & Kempton (1984) generally attempt to assure that their results are due to a perceptual rather than a sensory difference between their stimuli by choosing stimuli that are separated by a constant distance in some presumably uniform metric color space. However, the rigorous assumption of uniformity of any color space is unwarranted. Uniform chromaticity spaces differ from one another in several ways, including in their design criteria. For example, the color samples that define the Munsell color order system were chosen to be perceptually uniformly spaced, but not necessarily to be uniform with respect to sensory color discrimination. Indeed, stimuli that are separated by constant distance in Munsell space are not generally equally discriminable from one another, as specified by CIELAB (e.g., Kuehni, 1999). More generally, the separation of two colors in JND units (by Fechnerian scaling) is known to be an unreliable guide to their perceived difference (Wyszecki, 1972), and the perceptual uniformity of the Munsell color system, even with respect to perceived hue, saturation, and value, is at best an approximation (Indow, 1988). The spacing of the colors in CIELAB and CIELUV uniform chromaticity spaces was intended to represent approximately constant-sized JNDs in all locations and in all directions of color space, based on MacAdam's ellipses (Wyszecki & Stiles, 1982), but even these spaces are both non-uniform and non-isotropic (e.g., Fig. 5.6 in Shevell (2003), and Fig 5.4.1 in Wyszecki & Stiles (1982)). The designs of CIELUV and CIELAB were heavily influenced by the desire for the uniform chromaticity spaces to be computationally simple transformations of CIE color matching functions. This means that the discriminability of equally spaced colors can vary substantially, even when they are close together. 
The problem is even more severe when the “within-category” stimuli are located in a different region of color space than the “between-category” stimuli (e.g., Daoutis, et al. (2006), data re-analyzed and discussed in depth in Drivonikou, Davies, Franklin, & Taylor (2007)). Furthermore, all color spaces, including the Munsell color system, CIELAB, and CIELUV, were designed based on average data, with considerable smoothing, so they will not apply to any single observer. This makes them inadequate benchmarks against which to assess categorical perception of individuals. Finally, the general model that RT should be monotonic with separation in color space is incorrect in the limit, because there is a maximum separation in color space beyond which RT is fast and constant (Nagy & Sanchez, 1990). In view of the modest sizes of some of the effects reported in this literature, it is striking that so little attention has been paid to the metric that quantifies the differences between the stimuli.

It is now clear that simply equating the separations of the stimuli along some psychological continuum is sometimes not sufficient to produce constant RT in a visual search experiment. In previous work (Lindsey, et al., 2010), we and our collaborators studied visual search using colors that were carefully adjusted for their psychological properties: their colorimetric purity was adjusted for constant saturation, and the subjective separations in color between targets and distractors were carefully controlled. In spite of these controls, we found that RT varied drastically with hue. In contrast, RT was predicted very well by a standard “low-level” color-opponent model, which was based on the responses of the well-understood LM and S channels. Thus, a failure of stimuli that have certain carefully-established subjective qualities to behave as expected does not necessarily mean that some higher-level process must be at work: on the contrary, a low-level sensory model might very well account for the results.

The standard color model as the null hypothesis

To get around these difficulties, and to test directly the hypothesis that the perception of colors is categorical, we need a model of RT for colors. Ideally, this model will be a purely sensory model and it will make specific predictions that can be falsified by the results of perceptual studies such as the study of Gilbert et al. What should the RT be, in the conditions of a particular experiment, under Whorfian and non-Whorfian assumptions?

Our null hypothesis was a standard color-opponent model of discrimination between colors. That is, we assumed that when the target and distractor colors are hard to discriminate, RT should be slow, and if the colors are easy to discriminate, RT should be fast. Therefore, we predicted the results of Experiments I – IV, using a standard color-opponent model (see Lindsey et al. (2010), Supplement, and Appendix III below, for further details). Briefly, we calculated the coordinates of our stimuli in MacLeod-Boynton color space, which has axes directly related to L–M excitation and S-cone excitation, at constant luminance (MacLeod & Boynton, 1979). Then, we calculated the difference in excitation presented to the L–M and S-cone channels by the colorimetric difference between target and distractors. Our predicted RT was a function of the reciprocals of these differences, pooled across the two channels:

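$$RT(x) = a + \left( \left( b\,|\Delta LM(x)|^{-1} \right)^{4} + \left( c\,|\Delta S(x)|^{-1}\, g(x) \right)^{4} \right)^{1/4} \qquad \text{Eq. 3}$$

$$g(x) = \frac{50 + 0.14\,S(x)}{50 + 0.14\,S(x)_{\min}} \qquad \text{Eq. 4}$$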

In Equation 3, x is the color azimuth in CIELAB, a is the minimum possible RT, b and c are scalars on the overall contribution of the L–M and S components (values of a, b, and c are in Appendix III), and |ΔLM(x)| and |ΔS(x)| are the absolute values of the differences in L–M and S excitation of the target and distractors. g(x) is a factor from Boynton & Kambe (1980) that takes the increment threshold function for the S cones into account, and is based on an independent experiment (Lindsey et al. (2010), Supplement). The exponent of 4 is the Quick pooling formula, which we use instead of the ArgMin function of Lindsey et al.
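The model of Eqs. 3 and 4 can be sketched in code as follows. The sketch assumes the reading of Eq. 3 in which each channel contributes the reciprocal of its absolute excitation difference, with the S term multiplied by g(x); the function names, the parameter values, and the units of S are hypothetical (the fitted values of a, b, and c are given in Appendix III and are not reproduced here).

```python
import numpy as np

def g_factor(s, s_min=None):
    """Eq. 4: S-cone increment-threshold factor, equal to 1 at the minimum S."""
    s = np.asarray(s, dtype=float)
    if s_min is None:
        s_min = s.min()                  # assumption: minimum over the stimulus set
    return (50.0 + 0.14 * s) / (50.0 + 0.14 * s_min)

def predicted_rt(d_lm, d_s, s, a, b, c, s_min=None):
    """Eq. 3: pool the L-M and S contributions with an exponent of 4."""
    lm_term = b / np.abs(np.asarray(d_lm, dtype=float))   # large when L-M difference is small
    s_term = c * g_factor(s, s_min) / np.abs(np.asarray(d_s, dtype=float))
    return a + (lm_term ** 4 + s_term ** 4) ** 0.25

# Hypothetical excitation differences for three color combinations
rt_model = predicted_rt(d_lm=[0.02, 0.005, 0.02],
                        d_s=[0.5, 0.8, 1.5],
                        s=[10.0, 20.0, 40.0],
                        a=0.40, b=0.002, c=0.05)
print(np.round(rt_model, 3))
```

Because of the exponent of 4, the slower of the two channel terms dominates the prediction, so the result stays close to whichever channel gives the larger contribution. A zero excitation difference in either channel would give an unbounded prediction, in line with the behavior of the L–M term near the tritan point.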

The fits of Eqs. 3 and 4 appear in the left-hand panels of Figs. 14 and 15. The green lines in Figs. 14a–c, and 15 a–d are the contribution of the L–M channel, the blue lines are the contribution of the S channel, and the red lines are the fits of Eq. 3, with the parameters b and c fitted by a least-squares criterion. The minimum RT (parameter a) was held constant at 0.4 seconds for the button-press RT experiments (Experiments I and III), and 0.26 for the saccadic RT experiments (Experiments II and IV), consistent with the observation that the saccadic RT functions are consistently faster overall but not very different in shape across the two methodologies (Fig. 7a–d). The L-M channel increases without bound in Figs. 14a and 15 a–d in the neighborhood of the tritan point where L = M and their difference approaches zero. The fits to the RT data from Experiments I – III (Fig. 14e—g) and IV (Fig. 15d—f) are reasonably good. The errors of prediction are the red curves in the right-hand panels of Figs. 14 and 15. If there were categorical perception at the Green|Blue boundary (black triangles), there would be a local minimum in the error function at that color. There is little evidence of such a local minimum.

Fig. 14.

The RT data from Experiments I, II, and III (a–c) were fitted using Equations 3, 4. Green lines, L-M contribution; blue lines, S contribution; red lines, the full model. d–f, errors of prediction from Eq. 3 (red lines) and from the MLDS fits in the case of Experiment II (black line in e). Black triangles: the average Green|Blue boundaries; green arrow: the L-M contribution goes up to infinity at the tritan colors where L = M. The errors of prediction do not show a pronounced local minimum at the Green|Blue boundary.

Fig. 15.

The average RT data from Experiment IV (left panels), fitted using Equations 3, 4. Right panels, errors of prediction from Eq. 3 (red lines) and the MLDS fits. Conventions as in Fig. 14.

This simple, low-level model works reasonably well, which suggests that the speed with which subjects can do the visual search task is largely controlled by the strength of the sensory color signals available to mediate the response. Particularly, there is no reliable localized negative peak in the error graphs corresponding to the Green|Blue boundary. The success of these fits indicates that the standard color-opponent model of color discrimination is sufficient to account for these RT results, including the minimum values in the neighborhood of 185° of azimuth within CIELAB, without invoking the complication of categorical perception. We suspect that the minima reported by other authors may also be the result of low-level factors that control the sensitivity of the eye to changes of stimulus chromaticity.

General Discussion

In their 2006 paper, Gilbert et al. proposed that the Sapir-Whorf hypothesis applies to the right visual field (RVF) but not the left (LVF). This is an attractive idea, because it seems likely that a Whorfian influence of language on perception would be stronger when the visual stimulus is presented to the cerebral hemisphere where the language centers reside. Gilbert et al. reported a lateralized Whorfian effect in visual search: right-handed observers were about 0.024 sec. faster to find a target when it was from a different color category from its distractors (e.g., a blue target among green distractors) than when it was from the same category (e.g., a blue target among blue distractors), but only when the target was presented in the RVF. When the target fell within the LVF, the within-category and across-category stimulus combinations produced similar response times (RTs). The RT minimum was held to be evidence of categorical perception because it coincided with the Green|Blue boundary measured by the Method of Constant Stimuli. However, that experiment provided no quantitative analysis substantiating the null hypothesis that no RT minimum should occur in the absence of categorical perception, and no alternative explanation of the location of the RT minimum in color space was discussed or ruled out.

This basic result has been the subject of 17 empirical studies that have appeared since Gilbert et al. was published, seven of which reported psychophysical data (Drivonikou, Kay, et al., 2007; Franklin, Drivonikou, Bevis, et al., 2008; Franklin, Drivonikou, Clifford, et al., 2008; Liu, et al., 2009; Roberson, Pak, & Hanley, 2008; Siok, et al., 2009; Zhou, et al., 2010). All seven of these papers reported a minimum RT value in the neighborhood of 185° of color azimuth, which was near the Green|Blue boundary measured using MCS (see Appendix II, below). Four of the articles reported a statistically significant RVF-LVF RT difference in the size of that effect (Drivonikou, Kay, et al., 2007; Franklin, Drivonikou, Bevis, et al., 2008; Franklin, Drivonikou, Clifford, et al., 2008; Roberson, et al., 2008), and three reported no statistically significant difference (Liu, et al., 2009; Siok, et al., 2009; Zhou, et al., 2010). The RT experiments we report here were designed to replicate or refute that important result, and to find the experimental conditions under which it holds.

We were not able to replicate the RT result of Gilbert et al., using either button-press RT or saccadic RT as a dependent measure, using either their own stimuli or stimuli chosen within the CIELAB color space, using stimuli with light or dark surrounding fields. Like many other workers in this field, we did find reliable RT minima in the neighborhood of 185° of color azimuth within CIELAB in all our data sets. However, these minima were not generally obtained with the between-category stimuli, in either visual field, and a -0.024 sec RVF-LVF difference in cross-category RT, the magnitude reported by Gilbert et al., was not consistent with our data.

In contrast to the lack of evidence for a reliable Whorfian effect in any of our data sets, our two modeling efforts worked well. Most directly, the scaled reciprocal of the Δψ data from the MLDS experiments fit the RT data well. This fit is intuitively appropriate, but it has the difficulty that if there were a perceptual discontinuity at the Green|Blue boundary, that discontinuity could affect both the RT and the MLDS data similarly. Therefore, both data sets might have a local minimum at the Green|Blue boundary, but no discontinuity in the fit between the two data sets is to be expected. In a similar vein, the use of the Munsell space for an experiment like that of Gilbert et al. is problematical. If there were indeed a Whorfian effect of language on perception, that perceptual effect should have already adjusted the Munsell color space to be uniform with respect to perceived color differences, so no category effect should be found.

It is more convincing that the standard model of “early” color vision fit the data reasonably well (red lines in Figs. 14 and 15). The important feature of these models is that they provide reasonable fits to the full range of RT data, from the greenest to the bluest stimuli, with no assumptions about the uniformity of the color space within which they were chosen, and without recourse to a Whorfian element or any other high-level cognitive explanation. Indeed, the three estimates of the differences between the colors: the RT data, the perceived differences from the MLDS experiments, and the predicted sensory differences from color theory, are all very similar. The errors of prediction obtained when the RT data are predicted from the perceived differences and the sensory differences are very similar (compare the black and red lines in the right-hand panels of Figs. 14 and 15). This leads us to conclude that the Whorfian hypothesis is not a necessary component of any complete theory of visual perception of the colors in these experiments.

Conclusion

The results of the experiments reported here call into question the traditional use of color perception at the Green|Blue boundary as a paradigm for studying the possible correspondence between perceptual categories and color names. Although colors are necessarily categorized when they are named (otherwise one would need a distinct name for each discriminable color), they are apparently not categorized when they are perceived, at least not under the particular experimental conditions we examined. Of course, it is not appropriate to generalize beyond these data and analyses to speculate whether visual perception is ever categorical. However, these results do challenge the status of the Sapir-Whorf hypothesis as a general theory of visual perception. Furthermore, the use of perceived hue and color terms as a way of studying the possible relation between perceptual and linguistic categories should be re-examined critically, at least for reaction time experiments involving stimuli in the neighborhood of the Green|Blue boundary.

Acknowledgements

This work was supported by the National Institutes of Health–National Eye Institute R21-EY018321, R21-EY018321-0251 and the Ohio Lions Eye Research Institute. We are grateful to Ms Heather Shamp, Ms Renee Rambeau, and all our observers for their assistance in collecting the data, and to Dr. Loraine Sinnott for statistical consultation.

Appendix I

Stimulus parameters for Experiment I

Apparatus

Stimuli were presented using a Mitsubishi Diamond-Pro 9TTXM cathode-ray tube (CRT) video computer monitor. RT responses were entered by means of the computer keyboard. The observer moved the computer mouse to manipulate the colors in the MOA phase of the experiment.

Spatial parameters

For the RT experiment, the target and distractor stimuli were twelve 1.5° × 1.5° v.a. squares, presented with their centers equally spaced around a 5° v.a. diameter circle. For the Method of Adjustment (MOA), the stimulus in position #1 (Fig. 1a) was the “target” color and the other 11 stimuli were the “distractors”.

Chromaticities

The chromaticities of the stimuli appear in Fig. A1a, in CIELAB units, calculated from the calibrated xyY coordinates (Pritchard PR-670 SpectraScan spectrophotometer) with a white point of (0.310, 0.316, 80). The luminance of the colors was 39.5 cd/m², and the luminance of the surrounding gray field was 30.3 cd/m². For the MOA, the chromaticities of the stimuli moved along the curve defined by the white disks, with the target and distractor separated by 15° of azimuth in CIELAB.
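The conversion from calibrated xyY coordinates to CIELAB can be sketched as follows. This is a minimal illustration using the standard CIE 1976 L*a*b* formulas, not the authors' own code. The function names and the example stimulus chromaticity are hypothetical; only the white point (0.310, 0.316, 80) and the 39.5 cd/m² stimulus luminance are taken from the text.

```python
import math

def xyY_to_XYZ(x, y, Y):
    """Convert CIE xyY chromaticity plus luminance to XYZ tristimulus values."""
    X = x * Y / y
    Z = (1.0 - x - y) * Y / y
    return X, Y, Z

def _f(t):
    """CIELAB compression: cube root, with the standard linear segment near zero."""
    delta = 6.0 / 29.0
    if t > delta ** 3:
        return t ** (1.0 / 3.0)
    return t / (3.0 * delta ** 2) + 4.0 / 29.0

def xyY_to_lab(xyY, white_xyY):
    """Convert an (x, y, Y) triple to CIELAB (L*, a*, b*) relative to a white point."""
    X, Y, Z = xyY_to_XYZ(*xyY)
    Xn, Yn, Zn = xyY_to_XYZ(*white_xyY)
    fx, fy, fz = _f(X / Xn), _f(Y / Yn), _f(Z / Zn)
    return 116.0 * fy - 16.0, 500.0 * (fx - fy), 200.0 * (fy - fz)

def hue_azimuth_deg(a, b):
    """Hue angle (azimuth) in the a*, b* plane, in degrees from 0 to 360."""
    return math.degrees(math.atan2(b, a)) % 360.0

# White point from the text; the stimulus chromaticity below is hypothetical.
white = (0.310, 0.316, 80.0)
stimulus = (0.260, 0.340, 39.5)
L, a, b = xyY_to_lab(stimulus, white)
print(f"L*={L:.2f}, a*={a:.2f}, b*={b:.2f}, azimuth={hue_azimuth_deg(a, b):.1f} deg")
```

The azimuth returned by hue_azimuth_deg is the quantity in which the stimulus separations (for example, 15° for the MOA) are expressed.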

Stimulus parameters for Experiment II

Apparatus

For the RT experiment, stimuli were presented on a high-diffusion rear-projection screen by a DILA video projector (JVC DLA-M2000L). RT responses were in the form of eye movements, which were recorded by a Tobii X120 Eyetracker (Stockholm, Sweden). In the MOA phase of the experiment, a ViewSonic P815 CRT was used to present the stimuli, and the observer moved the computer mouse to manipulate the colors.

Spatial parameters

For the RT experiment, the target and distractor stimuli were 3.5° v.a. diameter disks, presented with their centers equally spaced around a 12.5° v.a. diameter circle. For the MOA measurements, the target was in Position #2 (Fig. 1b), and the other 11 stimuli were distractors.

Chromaticities

The chromaticities of the visual search stimuli appear as the white disks in Fig. A1b, obtained from the calibrated (Pritchard PR-670 SpectraScan) xyY values using a white point at (0.310, 0.316, 135). The colors were 65.23 cd/m², and the surrounding field was 49.1 cd/m². The colors were separated by an approximately constant 14° of azimuth in CIELAB color space and an average ΔE of 10.7 (range 9.77 – 11.84). For the MOA, the stimuli were adjusted along the contour described by the white disks, with a constant separation of 15° of azimuth in CIELAB.
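The relation between an azimuth separation and a ΔE value can be illustrated with a short sketch. It assumes an idealized case in which two colors share the same lightness and the same chroma (eccentricity), so that ΔE*ab reduces to the chord length 2·C·sin(Δh/2). The lightness and chroma values used here are hypothetical, chosen only for illustration; the real stimuli varied somewhat in ΔE, as the reported range shows.

```python
import math

def lab_on_contour(lightness, chroma, hue_deg):
    """CIELAB coordinates of a color at a given hue azimuth on a constant-chroma contour."""
    h = math.radians(hue_deg)
    return (lightness, chroma * math.cos(h), chroma * math.sin(h))

def delta_e_ab(lab1, lab2):
    """CIE 1976 color difference: Euclidean distance in L*, a*, b*."""
    return math.dist(lab1, lab2)

def chord_delta_e(chroma, separation_deg):
    """Delta E between two equal-lightness, equal-chroma colors separated in hue."""
    return 2.0 * chroma * math.sin(math.radians(separation_deg) / 2.0)

# Hypothetical lightness, chroma, and hue angles (not values from the paper).
c1 = lab_on_contour(80.0, 44.0, 170.0)
c2 = lab_on_contour(80.0, 44.0, 184.0)
print(f"Delta E from coordinates: {delta_e_ab(c1, c2):.2f}")
print(f"Delta E from chord formula: {chord_delta_e(44.0, 14.0):.2f}")
```

Under these assumptions the two computations agree, which is why a constant azimuth step along a constant-eccentricity contour corresponds to an approximately constant ΔE.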

Fig. A1.

The chromaticities of the stimuli in these experiments. Throughout: gray diamonds, the stimuli used by Gilbert et al., taken from their specified RGB values using the software they specified (www.easyrgb.com); “G” and dotted line, the Green|Blue boundary of Gilbert et al.; dashed line, Green|Blue boundary from the method of constant stimuli in our experiments; white disks, calibrated chromaticities of the green and blue stimuli used in our RT experiments; black triangles, chromaticities of our gray surrounding fields. A, Experiment I. Black dots, the intended chromaticities taken from the Munsell samples used by Kay & Kempton (1984). B, Experiment II. Black disks, calibrated chromaticities of the stimuli used in the MLDS experiment. C, Experiment III. D, Experiment IV. Black disks, calibrated chromaticities of the MLDS stimuli; “D” and “L”, the Green|Blue boundary for the dark and light surrounding fields, respectively.

Stimulus parameters for Experiment III

Apparatus

The RT and MOA apparatus was the same as for Experiment I.

Spatial parameters

Target and distractor stimuli: 3.2° × 3.2° v.a. squares, presented with their centers equally spaced around a 10° v.a. diameter circle. For the Method of Adjustment (MOA), the stimulus in position #1 was the “target” color and the other 11 stimuli were the “distractors”.

Chromaticities

The calibrated chromaticities of the stimuli appear in Fig. A1c; they were calculated from the calibrated xyY coordinates using a white point of (0.310, 0.316, 80). The colors were 39.55 cd/m², and the surrounding field was 30.25 cd/m². The colors were separated by about 15.7° of azimuth in CIELAB, or 11.5 ΔE units (range: 11.1 – 11.97). For the MOA, the stimuli were adjusted along the contour described by the white disks, with a constant separation of 15° of azimuth in CIELAB.

Stimulus parameters for Experiment IV

Apparatus

The same as for Experiment II (for both RT and MOA).

Spatial parameters

The search stimuli and the stimuli for the MOA measurements were the same as for Experiment II. In the “look at the button” conditions only, small paper “buttons” were affixed to the rear-projection screen midway between the fixation target and the left-hand and right-hand color disks (Fig. 1b); the “buttons” were absent during the “look-at-the-stimulus” experiments. The observer looked at the appropriate “button” on each trial to indicate his/her choice of the right-side vs. left-side location of the target.

Chromaticities

The chromaticities for the RT experiments appear in Fig. A1d, obtained from the calibrated xyY values using a white point at (0.310, 0.316, 135). The colors were 55 cd/m² and were separated by about 25.3° of azimuth in CIELAB; the average ΔE was 19.1 (range: 17.8 – 20.8). The surrounding field was 41.3 cd/m² in the “dark surround” condition, and 96.3 cd/m² in the “light surround” condition.

Experiment V

Apparatus

The same as for the corresponding MOA experiment (a CRT for the observers from Experiment II and a rear-projection apparatus for the observers from Experiment IV). The observers responded by pressing keys on the computer mouse.

Spatial parameters

As in Experiments II and IV, the disk diameters were 3.5° v.a. For the observers from Experiment II, the two pairs of colored disks that were judged were in positions 12 and 1, and 6 and 7 respectively (Fig. 1c). For the observers from Experiment IV, the “near” disks on the right-hand side of the display were centered 3.5° v.a. to the right of the midline, and the “far” disks were centered 7° v.a. to the right of the midline (Fig. 1d). The center-to-center vertical separation was 7° v.a. The stimuli on the left-hand side of the display were mirror images of those on the right.

Chromaticities

The chromaticities of the stimuli for the MLDS experiments are presented as black disks in Fig. A1b (for the observers from Experiment II) and A1d (for the observers from Experiment IV).

Appendix II

Measuring the Green|Blue boundary

In Experiments I – IV, we measured each observer's color category boundaries directly using the Method of Adjustment (MOA). While the method of constant stimuli (MCS) enjoys a better reputation among psychophysicists, it is unsuitable for this purpose because, as we verified in a pilot study, the results of an MCS experiment are greatly influenced by the domain of stimuli being judged.

The Method of Constant Stimuli (MCS)

In the MCS pilot study, observers named single colors, using stimuli with chromaticities and luminances similar to those used in Experiment II. The colors were presented in a randomized blocks design. Interpolation yielded the Green|Blue boundary, i.e., the color angle that was named “green” and “blue” equally often. We varied the midpoint of the domain of stimuli being tested from block to block. The function relating the midpoint of the test domain to the Green|Blue boundary had a slope of 0.318 to 0.332, and the MCS estimates of the Green|Blue boundary covered a range of 20° to 24° of color azimuth (Fig. A2, black triangles). In contrast, the Green|Blue boundary was constant when the task was the MOA, performed using the same method as the main experiment (Fig. A2, white triangles).
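The interpolation step can be sketched as follows. The text does not state which interpolation scheme was used, so this example assumes simple linear interpolation between adjacent test colors. The function name and the naming proportions are hypothetical and serve only to illustrate how a 50% “blue” point is located.

```python
def boundary_by_interpolation(angles, p_blue):
    """Return the color angle at which the proportion of "blue" responses crosses 0.5.

    angles: color azimuths (deg) of the test stimuli, in increasing order.
    p_blue: proportion of trials on which each stimulus was named "blue".
    Linear interpolation between neighboring stimuli is an assumption of this sketch.
    """
    if len(angles) != len(p_blue):
        raise ValueError("angles and p_blue must have the same length")
    for i in range(len(angles) - 1):
        a0, a1 = angles[i], angles[i + 1]
        p0, p1 = p_blue[i], p_blue[i + 1]
        # A crossing lies in this interval if the two points straddle 0.5.
        if (p0 - 0.5) * (p1 - 0.5) <= 0 and p0 != p1:
            return a0 + (0.5 - p0) * (a1 - a0) / (p1 - p0)
    raise ValueError("the naming proportions never cross 0.5")

# Hypothetical naming data for one block of the pilot study.
angles = [160, 170, 180, 190, 200]
p_blue = [0.0, 0.1, 0.4, 0.8, 1.0]
print(boundary_by_interpolation(angles, p_blue))
```

Repeating this calculation for blocks with different test-domain midpoints, and regressing the resulting boundaries on the midpoints, gives the kind of slope reported above.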

Fig. A2.

The effects of testing range on the estimated Green|Blue boundary. Black triangles, Method of Constant Stimuli. White triangles: the Method of Adjustment, using the two-stimulus method described in the text. The Method of Constant Stimuli produced clear variation in the measured boundary (**p<0.0005, in each case), whereas the MOA was more reliable (p>0.25 in each case). However, the tendency to follow the range was not perfect, as the slope = 1 hypothesis (dashed lines) is also rejected in each case.

The Method of Adjustment (MOA)

We also collected MOA data for comparison to the MLDS data. In the MOA experiment, the two colors (square #1 vs. squares #2 – #11 in Fig. 1b) maintained a hue angle separation of 15° of azimuth in CIELAB color space, as the observer adjusted their average chromaticity continuously along a constant-eccentricity contour in CIELAB. Each MOA trial began with a new, randomly chosen start point within the range of possible stimulus hues. While fixating a dot in the center of the screen, the observer adjusted the chromaticity of the stimuli until the “target” disk (disk #1 in Fig. 1b) was just green and the other eleven disks were just blue, or vice versa (see Webster & Mollon (1993) for a similar approach to flicker). If this held for a range of possible settings, the observer adjusted the setting until the bluer stimulus looked as blue as the greener stimulus looked green, and the Green|Blue boundary bisected the interval. The observer made 10 settings with square #1 greener than the others and 10 settings with square #1 bluer than the others. The observer's Green|Blue boundary was taken as the average of 8 settings from each set, trimming the bluest and the greenest settings in each case.

Fig. A2 shows the results of this experiment. The measured Green|Blue boundary was highly correlated with the midpoint of the testing range in the case of the MCS measurements, and covered a range of as much as 24° of azimuth in CIELAB. In contrast, there was no effect of the range of colors available on the Green|Blue boundary when the MOA was used. Therefore, all the measurements of the Green|Blue boundary in the main experiments were collected using the MOA.
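The trimming rule described above for the MOA settings can be sketched as follows. This is an illustration rather than the authors' analysis code. It assumes that each setting is recorded as a single hue azimuth, that the “bluest” and “greenest” settings are the two extremes of each set of 10, and that the remaining 16 settings are pooled before averaging. The function name and the example settings are hypothetical.

```python
def trimmed_boundary(settings_target_greener, settings_target_bluer):
    """Estimate the Green|Blue boundary from two sets of MOA settings.

    Each argument is a list of hue azimuths (deg), one per setting.
    The two extreme settings in each set are dropped, and the remaining
    settings from both sets are pooled and averaged (an assumption of this sketch).
    """
    def trim(settings):
        if len(settings) < 3:
            raise ValueError("need at least 3 settings to trim both extremes")
        ordered = sorted(settings)
        return ordered[1:-1]  # drop the lowest and highest azimuth

    kept = trim(settings_target_greener) + trim(settings_target_bluer)
    return sum(kept) / len(kept)

# Hypothetical settings: 10 with the target greener, 10 with the target bluer.
greener = [180, 181, 182, 183, 184, 185, 186, 187, 188, 189]
bluer = [182, 183, 184, 185, 186, 187, 188, 189, 190, 191]
print(trimmed_boundary(greener, bluer))
```

Dropping the extremes before averaging makes the estimate less sensitive to an occasional stray setting.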

Appendix III

The parameters of the model fits in the section “Color-theoretical discussion”.

Table A1.

MLDS parameters (k in Eq. 2)

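Experiment  Data set        Surround  Observer    k
Expt II     -               -         group data  0.10
Expt IV     look at target  dark      bs          0.023
Expt IV     look at target  dark      ke          0.032
Expt IV     look at target  dark      kmg         0.034
Expt IV     look at target  light     bs          0.0168
Expt IV     look at target  light     ke          0.018
Expt IV     look at target  light     kmg         0.0775
Expt IV     look at button  dark      bs          0.01865
Expt IV     look at button  dark      ke          0.0235
Expt IV     look at button  dark      kmg         0.00435
Expt IV     look at button  light     bs          0.031
Expt IV     look at button  light     ke          0.01395
Expt IV     look at button  light     kmg         0.02145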

Table A2.

MacLeod-Boynton-Kambe model parameters

Experiment  Data set        Surround  minRT  L-M    S
Expt I      -               -         0.4    0.020  0.330
Expt II     -               -         0.26   0.064  0.850
Expt III    -               -         0.4    0.030  1.30
Expt IV     look at target  light     0.26   0.130  0.0285
Expt IV     look at target  dark      0.26   0.126  0.033
Expt IV     look at button  light     0.26   0.117  0.028
Expt IV     look at button  dark      0.26   0.120  0.028

Footnotes

1

Co-author KMG was aware that this was a study of color categorical perception, but he was not aware of the hypotheses being tested at the time the RT measurements were being made. His many intellectual contributions to this project occurred after he had served as an observer in Experiments II and IV.

References

  1. Boynton RM, Kambe N. Chromatic difference steps of moderate size measured along theoretically critical axes. Color Research and Application. 1980;5(1):13–23.
  2. Daoutis CA, Franklin A, Riddett A, Clifford A, Davies IRL. Categorical effects in children's colour search: A cross-linguistic comparison. British Journal of Developmental Psychology. 2006;24:373–400.
  3. Drivonikou GV, Davies I, Franklin A, Taylor C. Lateralisation of colour categorical perception: A cross-cultural study. Perception. 2007;36:173–174.
  4. Drivonikou GV, Kay P, Regier T, Ivry RB, Gilbert AL, Franklin A, et al. Further evidence that Whorfian effects are stronger in the right visual field than the left. Proceedings of the National Academy of Sciences of the United States of America. 2007;104(3):1097–1102. doi: 10.1073/pnas.0610132104.
  5. Franklin A, Drivonikou GV, Bevis L, Davies IRL, Kay P, Regier T. Categorical perception of color is lateralized to the right hemisphere in infants, but to the left hemisphere in adults. Proceedings of the National Academy of Sciences of the United States of America. 2008;105(9):3221–3225. doi: 10.1073/pnas.0712286105.
  6. Franklin A, Drivonikou GV, Clifford A, Kay P, Regier T, Davies IRL. Lateralization of categorical perception of color changes with color term acquisition. Proceedings of the National Academy of Sciences of the United States of America. 2008;105(47):18221–18225. doi: 10.1073/pnas.0809952105.
  7. Gilbert AL, Regier T, Kay P, Ivry RB. Whorf hypothesis is supported in the right visual field but not the left. Proceedings of the National Academy of Sciences of the United States of America. 2006;103(2):489–494. doi: 10.1073/pnas.0509868103.
  8. Harnad S. Introduction: Psychophysical and cognitive aspects of categorical perception: a critical overview. In: Harnad S, editor. Categorical perception: the Groundwork of Cognition. Cambridge University Press; Cambridge, New York, Melbourne: 1987.
  9. Indow T. Multidimensional Studies of Munsell-Color Solid. Psychological Review. 1988;95(4):456–470. doi: 10.1037/0033-295x.95.4.456.
  10. Kay P, Berlin B, Maffi L, Merrifield WR, Cook R. The World Color Survey. Center for the Study of Language and Information; Stanford, CA: 2009.
  11. Kay P, Kempton W. What Is the Sapir-Whorf Hypothesis? American Anthropologist. 1984;86(1):65–79.
  12. Knoblauch K, Maloney LT. MLDS: Maximum likelihood difference scaling in R. Journal of Statistical Software. 2008;25(2):1–26.
  13. Kuehni RG. Hue scale adjustment derived from the Munsell system. Color Research and Application. 1999;24(1):33–37.
  14. Lindsey DT, Brown AM. Color naming and the phototoxic effects of sunlight on the eye. Psychological Science. 2002;13:506–512. doi: 10.1111/1467-9280.00489.
  15. Lindsey DT, Brown AM, Reijnen E, Rich AN, Kuzmova YI, Wolfe JM. Color channels, not color appearance or color categories, guide visual search for desaturated color targets. Psychological Science. 2010;21(9):1208–1214. doi: 10.1177/0956797610379861.
  16. Liu Q, Li H, Campos JL, Wang Q, Zhang Y, Qiu J, et al. The N2pc component in ERP and the lateralization effect of language on color perception. Neuroscience Letters. 2009;454(1):58–61. doi: 10.1016/j.neulet.2009.02.045.
  17. MacLeod DIA, Boynton RM. Chromaticity diagram showing cone excitation by stimuli of equal luminance. J. Opt. Soc. Am. 1979;69(8):1183–1186. doi: 10.1364/josa.69.001183.
  18. Maloney LT, Yang JN. Maximum likelihood difference scaling. Journal of Vision. 2003;3(8):573–585. doi: 10.1167/3.8.5.
  19. Nagy A, Sanchez RR. Critical color differences determined with a visual search task. J. Opt. Soc. Am. A. 1990;7(7):1209–1217. doi: 10.1364/josaa.7.001209.
  20. Nickerson D. History of the Munsell color system and its scientific application. J Opt Soc Am. 1940;30(12):575–586.
  21. Ratliff F. On the psychophysiological bases of universal color terms. Proceedings of the American Philosophical Society. 1976;120(5):311–330.
  22. Roberson D, Hanley JR. Color vision: Color categories vary with language after all. Current Biology. 2007;17(15):R605–R607. doi: 10.1016/j.cub.2007.05.057.
  23. Roberson D, Pak H, Hanley JR. Categorical perception of colour in the left and right visual field is verbally mediated: Evidence from Korean. Cognition. 2008;107(2):752–762. doi: 10.1016/j.cognition.2007.09.001.
  24. Schanda J. Colorimetry: Understanding the CIE System. Wiley; Hoboken, New Jersey, USA: 2007.
  25. Shevell SK. The science of color. 2nd ed. Optical Society of America; Amsterdam: Elsevier: 2003.
  26. Siok WT, Kay P, Wang WSY, Chan AHD, Chen L, Luke KK, et al. Language regions of brain are operative in color perception. Proceedings of the National Academy of Sciences of the United States of America. 2009;106(20):8140–8145. doi: 10.1073/pnas.0903627106.
  27. Webster MA, Mollon JD. Contrast Adaptation Dissociates Different Measures of Luminous Efficiency. Journal of the Optical Society of America a-Optics Image Science and Vision. 1993;10(6):1332–&. doi: 10.1364/josaa.10.001332.
  28. Wyszecki G. Color matching and color-difference matching. J Opt Soc Am. 1972;62(1):117–128. doi: 10.1364/josa.62.000117.
  29. Wyszecki G, Stiles WS. Color science : concepts and methods, quantitative data and formulae. 2nd ed. Wiley; New York: 1982.
  30. Zhou K, Mo L, Kay P, Kwok VPY, Ip TNM, Tan LH. Newly trained lexical categories produce lateralized categorical perception of color. Proceedings of the National Academy of Sciences of the United States of America. 2010;107(22):9974–9978. doi: 10.1073/pnas.1005669107.
