Proceedings of the National Academy of Sciences of the United States of America. 2016 Nov 14;113(48):13666–13671. doi: 10.1073/pnas.1613666113

Phylogenetic approach to the evolution of color term systems

Hannah J Haynie a, Claire Bowern b,1
PMCID: PMC5137717  PMID: 27849594

Significance

A major question in the study of both anthropology and cognitive science is why the world’s languages show recurrent similarities in color naming. Here we examine this inherently evolutionary question–the evolution of color systems in language–using phylogenetic methods. We track the evolution of color terms across a large language tree in order to trace the history of the systems. We provide further validation of phylogenetic approaches to culture, and provide an explicit history of color terms across a large language sample, the Pama-Nyungan languages of Australia. Our work is of relevance to anthropologists, psychologists, and linguists.

Keywords: linguistics, color, cognitive science, evolution, Australian languages

Abstract

The naming of colors has long been a topic of interest in the study of human culture and cognition. Color term research has asked diverse questions about thought and communication, but no previous research has used an evolutionary framework. We show that there is broad support for the most influential theory of color term development (that most strongly represented by Berlin and Kay [Berlin B, Kay P (1969) Basic Color Terms: Their Universality and Evolution (Univ of California Press, Berkeley, CA)]); however, we find extensive evidence for the loss (as well as gain) of color terms. We find alternative trajectories of color term evolution beyond those considered in the standard theories. These results not only refine our knowledge of how humans lexicalize the color space and how the systems change over time; they illustrate the promise of phylogenetic methods within the domain of cognitive science, and they show how language change interacts with human perception.


The naming of colors has long been a topic of interest in the study of human culture and cognition. It is a key case study for the link between perception, language, and the categorization of the natural world (1–4). The assumptions central to these lines of research on color naming are often linked, whether implicitly or explicitly, with the ways in which color term systems are believed to evolve. One of the most noteworthy scholarly works on color terms, both in terms of its impact on subsequent research and its clear and explicit evolutionary hypotheses, is the classification system proposed by Berlin and Kay (5) and refined in subsequent works (6–8). However, despite the very clear hypothesis in this literature that the attested range of color-naming systems in language results from evolution along highly constrained pathways, very little has been done to test these claims. Here, we directly examine the evolutionary hypotheses associated with this research tradition: principally, that as color term systems evolve, languages gain but never lose basic color terms; and that the order in which color terms are added to a language’s lexicon is fixed. This approach capitalizes on the different patterns we should find in the presence of strong, universal cognitive constraints on color evolution, compared with those that might result from a more relativistic view, in which every language’s color term system development follows a unique path. We use Bayesian phylogenetic methods, which allow us to probabilistically reconstruct ancestral inventories and evaluate claims regarding the order in which color terms enter (and leave) the lexicon. We apply these techniques to Australia’s Pama-Nyungan language family.

The Color Research Landscape

Universal Patterns in Color Naming.

Berlin and Kay’s influential 1969 study (5) first established the notion of a universal, cross-linguistic typology of color term systems and ascribed the limited range of systems attested in their surveys to a strict developmental pathway. The model outlined in Berlin and Kay (5) and subsequent work (6–10) makes two evolutionary claims. First, the progression through the stages of color system development is hypothesized to be unidirectional. That is, languages gain basic color terms, but they do not lose them. Second, the order in which colors are added to a system is largely fixed.

The Berlin and Kay survey of color terms explicitly tested cross-linguistic variation in color naming, focusing scrutiny on earlier scholars’ treatment of color as a canonical example of linguistic relativity (3, 9). In direct contrast to relativistic views, Berlin and Kay found that languages have no more than 11 basic color terms, and that the systems used to organize these colors occupy only a small portion of the potential design space. They treated the cross-linguistic agreement in the focal points of these color categories as further evidence for universals in color semantics. Furthermore, the seven color systems they identified were hypothesized to represent natural evolutionary stages.

Berlin and Kay’s basic findings have been largely affirmed by the much larger cross-linguistic sample in the World Color Survey (WCS) (8). Importantly, the 11 basic color foci of Berlin and Kay were revised to a set of six basic color foci [the lightness categories black and white, plus the Hering primary colors red, blue, yellow, and green (11)]. These foci are consistent with highly clustered “best examples” of basic colors from the WCS (12).

How Color Systems Evolve.

Theories of color system evolution have themselves changed over the last several decades, as the empirical data and diversity of perspectives involved in this area of research have grown. The evolutionary process outlined in Berlin and Kay (5) comprises seven distinct stages. The most basic system involves a two-category system, with terms centered on the black and white foci. The second stage adds a color associated with the focal category red, followed by either yellow or green in stage III. In stage IV, both yellow and green are present, as well as black, white, and red. Stage V adds blue, followed by brown in stage VI. The final stage involves the addition of pink, purple, orange, and/or gray.
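
The seven-stage sequence just described can be written down as a small lookup, which makes it easy to see which inventories the model does and does not license. The following is only a sketch of the stage definitions as summarized in this section; the function name and the convention of returning None for inventories outside the sequence are illustrative assumptions, not part of Berlin and Kay's work or of our analysis.

```python
# Sketch of the Berlin and Kay (1969) stage sequence as summarized above.
# Function name and None-for-no-match convention are illustrative assumptions.
BASE = {"black", "white"}
CORE = {"red", "yellow", "green"}
LATE = {"pink", "purple", "orange", "gray"}


def berlin_kay_stage(terms):
    """Return the stage label (I-VII) for a set of color terms, or None."""
    terms = set(terms)
    if not BASE <= terms:
        return None
    rest = terms - BASE
    if not rest:
        return "I"                      # black and white only
    if rest == {"red"}:
        return "II"                     # adds red
    if rest in ({"red", "yellow"}, {"red", "green"}):
        return "III"                    # adds yellow or green
    if rest == CORE:
        return "IV"                     # both yellow and green
    if rest == CORE | {"blue"}:
        return "V"                      # adds blue
    if rest == CORE | {"blue", "brown"}:
        return "VI"                     # adds brown
    extra = rest - (CORE | {"blue", "brown"})
    if CORE | {"blue", "brown"} <= rest and extra and extra <= LATE:
        return "VII"                    # adds pink, purple, orange, and/or gray
    return None                         # inventory not licensed by the sequence


print(berlin_kay_stage({"black", "white", "red", "green"}))
print(berlin_kay_stage({"black", "white", "green"}))
```

An inventory that has green but lacks red falls outside the sequence and is returned as None, which is the kind of system that bears on the claims of fixed order and unidirectional gain.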

This core evolutionary model (unidirectional progression through color system development in a fixed order) is maintained in the revised evolutionary model presented in subsequent work (7, 8), which additionally proposes an alternative pathway through stages III and IV. Fig. 1 gives a streamlined version of this model, which, like our data, is not explicit about which foci may be combined in early-stage composite categories. The row labeled A represents Kay and Maffi’s “main line” of color term evolution, which accounts for 83% of the languages in the WCS.

Fig. 1.

Evolutionary pathways of color term systems, after WCS.

Once a language has terms for colors, we would expect them to change over time. Major types of change in vocabulary include semantic shift, where a word extends or contracts its meaning, or is used metaphorically (13). There is no a priori expectation from language change that colors would change as a system; although we do sometimes find words changing in parallel (14), words usually change independently. We assume that cognitive constraints play a role in language change in this domain, while still allowing for normal processes of sound change, semantic shift, and lexical replacement to occur in individual color terms.

Pama-Nyungan Color Systems.

The study of color system evolution in Australian languages represents a unique opportunity to evaluate claims central to the debate regarding color systems. Pama-Nyungan is a large language family that extends across approximately 90% of the Australian mainland. The internal composition of this family has been studied using both traditional (15) and phylogenetic comparative methods (16). The diversity of color-naming systems used by speakers of Pama-Nyungan languages makes it an ideal case for examining the evolution of color terms. The languages range through all five basic evolutionary stages of the WCS model. This is in contrast to other large families such as Indo-European and Austronesian where languages tend to cluster in WCS stage VI, making them unsuitable for recovering evolutionary trajectories using phylogenetics.

Analysis

Color Term Data.

The data for this study consist of basic color terms from 189 Pama-Nyungan languages in the Chirila lexical database (17). The basic color terms were identified based on the association of a form with at least one English translation included in the set of six basic WCS color terms, plus brown, the most frequent secondary color term in our sample.

Data from this sample were coded as a set of seven binary characters, each representing a color category. For each language, the character state (1 or 0) represents the presence or absence, respectively, of a term representing a particular color category in that language’s lexicon. (See especially Figs. S1 and S2 and Table S1).
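
The coding scheme can be illustrated with a short sketch. It assumes, hypothetically, that each language's color vocabulary is available as a list of English glosses; the function name, the input format, and the example glosses are illustrative and do not reflect the actual structure of the Chirila database.

```python
# Sketch of the seven-character binary coding described above.
# The gloss-list input format and the function name are assumptions.
COLOR_CATEGORIES = ("black", "white", "red", "yellow", "green", "blue", "brown")


def code_language(glosses):
    """Map a list of English color glosses to seven presence/absence characters."""
    found = {g.strip().lower() for g in glosses}
    # 1 = a term for the category is attested, 0 = no term attested
    return {category: int(category in found) for category in COLOR_CATEGORIES}


# Hypothetical language with terms glossed as black, white, and red only
example = code_language(["Black", "white", "red"])
print(example)
```

Each language thus contributes one row of seven 0/1 values to the character matrix used in the comparative analyses.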

Fig. S1.

Full phylogeny of Pama–Nyungan with internal nodes labeled with their posterior probability in a sample of 700 trees.

Fig. S2.

DensiTree (Bouckaert and Heled, 2014) plot showing a visualization of the sample of Pama–Nyungan trees, sampled from the Markov chain, illustrating the variation in tree topologies in the sample.

Pama-Nyungan Tree Sample.

We represent the Pama-Nyungan language phylogeny using a sample of 700 trees (see Figs. S1 and S2). The trees were subsampled from a Markov chain used to derive a consensus tree summarizing relationships among Pama-Nyungan languages. The tree was compiled using basic vocabulary data (16, 18). The main clades identified in historical work on Pama-Nyungan are all recovered with high posterior probability, as are many other clades; however, some of the primary branches high in the tree receive equivocal support. We track reconstructions on nodes that have a high posterior probability (that is, that appear frequently in the tree sample), thus avoiding the problem that node reconstruction probabilities can only ever be as high as the probability of the node itself (19).
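
The notion of a node's posterior probability as its frequency in the tree sample can be sketched as follows. The representation of each tree as a set of clades (each clade a frozenset of language labels) and the toy labels A to D are assumptions made purely for illustration; they are not the format of our tree files.

```python
# Sketch: posterior support for a clade = share of sampled trees containing it.
# Trees are represented here (as an assumption) by their sets of clades.
def clade_support(tree_sample, clade):
    """Return the fraction of sampled trees in which `clade` appears."""
    clade = frozenset(clade)
    hits = sum(1 for tree in tree_sample if clade in tree)
    return hits / len(tree_sample)


# Toy sample of three trees over hypothetical languages A, B, C, D
toy_sample = [
    {frozenset({"A", "B"}), frozenset({"A", "B", "C"})},
    {frozenset({"A", "B"}), frozenset({"A", "B", "D"})},
    {frozenset({"A", "C"}), frozenset({"A", "B", "C"})},
]

support_ab = clade_support(toy_sample, {"A", "B"})
print(round(support_ab, 3))
```

Because a state can only be reconstructed at a node in the trees where that node exists, the joint probability of "node present and color present" cannot exceed this support value, which is why we restrict attention to well-supported nodes.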

Bayesian Phylogenetic Methods.

To evaluate the basic evolutionary claims of the WCS theory, we use a Bayesian phylogenetic method for the study of trait evolution, implemented with the BayesTraits software package (19). These methods have previously been used in numerous studies of linguistic and cultural evolution (20–23). Modeling the evolution of cultural traits using phylogenetic comparative methods developed for biological processes is not entirely uncontroversial (24); however, arguments that these approaches are invalidated by differences in the transmission of cultural and biological material have been discussed thoroughly and largely refuted (25, 26).

The analyses presented here focus on two basic hypotheses, made explicit by Kay and Maffi (3, 7) and described above:

  • i) That “languages are frequently observed to gain basic color terms …” but “languages are infrequently or never observed to lose basic color terms.”

  • ii) That languages “gain basic color terms in a partially fixed order,” which proceeds one term at a time.

To test the first of these hypotheses, we use Markov chain Monte Carlo (MCMC) comparative methods to estimate the likelihood of alternative models, given our trees and data. By computing the Bayes factor (BF) support for models that disprefer or disallow the loss of color terms compared with models that allow both gains and losses of colors (by means of their marginal likelihoods), we can evaluate whether Pama-Nyungan color system evolution is consistent with the principle that languages gain color terms but do not lose them (Figs. S3 and S4).

Fig. S3.

Density plots of inferred transition rates for q01 (black) and q10 (blue), “gain predominant” model.

Fig. S4.

Trace plot of Lh for gain/loss models, showing the gain- and loss-predominant models (blue, red, yellow) compared with the single rate model (black).

We examine hypothesized orderings of color term gain by applying reversible jump MCMC (RJMCMC) (22, 27) analyses to data for pairs of colors, comparing dependent and independent models of trait evolution. RJMCMC moves between models with different numbers of parameters as it searches the space of trees and transition rates, sampling models in proportion to their posterior probabilities. By representing pairs of binary traits as a single character with four possible states (00, 01, 10, 11), these analyses characterize dependencies in trait evolution in terms of eight parameters, which represent transitions between these states. Dependent RJMCMC analyses sample across models that allow separate rates for the gain or loss of each trait in the presence or absence of the other; independent analyses have separate gain and loss rates for each trait that do not depend on the other trait’s state. The posterior sample of models generated by these analyses can be used to examine the support for individual parameters that represent ordered gains or losses of colors. We are thus able to assess the evidence for dependent evolution between individual color terms and to test hypotheses about the relative order in which colors are added.
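
The four-state representation and its eight transition parameters can be made concrete with a small sketch. It assumes the usual construction for correlated binary traits, in which only one trait changes at a time, so that direct 00↔11 and 01↔10 transitions have rate zero. The function names and numeric rates are illustrative assumptions and are not BayesTraits input or output.

```python
# Sketch of the four-state model for a pair of binary traits.
# Assumption: only single-trait changes have nonzero rates (eight parameters).
STATES = [(0, 0), (0, 1), (1, 0), (1, 1)]


def rate_matrix(rates):
    """Build a 4x4 rate matrix from a dict {(from_state, to_state): rate}."""
    n = len(STATES)
    q = [[0.0] * n for _ in range(n)]
    for i, s in enumerate(STATES):
        for j, t in enumerate(STATES):
            changed = sum(a != b for a, b in zip(s, t))
            if changed == 1:            # exactly one trait gained or lost
                q[i][j] = rates[(s, t)]
        q[i][i] = -sum(q[i])            # rows of a rate matrix sum to zero
    return q


def independent_rates(gain1, loss1, gain2, loss2):
    """Eight rates in which each trait ignores the state of the other trait."""
    rates = {}
    for s in STATES:
        for trait, (gain, loss) in enumerate(((gain1, loss1), (gain2, loss2))):
            t = list(s)
            t[trait] = 1 - s[trait]
            rates[(s, tuple(t))] = gain if s[trait] == 0 else loss
    return rates


# Illustrative values only
rates = independent_rates(gain1=0.9, loss1=0.3, gain2=0.5, loss2=0.2)
q = rate_matrix(rates)
print(len(rates))
```

A dependent model uses the same matrix layout but lets all eight entries of the dictionary differ, so that, for example, the rate of gaining the second color can depend on whether the first is present.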

Finally, we use MCMC analysis to examine the posterior probabilities for reconstructing each color to selected ancestral nodes. For ancestral state reconstructions, we implement an MCMC analysis that infers a single rate for gain transitions and a single rate for loss transitions across all seven color system characters. This approach treats the color lexicon as a unified system as it estimates the likely ancestral color terms.

Results and Discussion

Gain and Loss of Colors.

The marginal likelihoods of nested models can be used to evaluate the support for a dependent hypothesis (in this case, the hypothesis that color terms are gained but never lost), compared with a null hypothesis (here, that no such constraints act on color term systems). This is done by BF evaluation, which compares the probability of the observed data under two hypotheses represented by these nested models. Here, we follow the guidelines for interpreting log BFs (29), where $2\log \mathrm{BF}_{12} = 2\,[\log L(H_1) - \log L(H_2)]$, with $H_1$ and $H_2$ representing the alternate and null hypotheses, respectively.

We test an unrestricted model that allows both the rates associated with color term gain and loss to vary freely. The hypothesis that color terms can be gained, but never (or almost never) lost, is represented by two models. The first sets the rate parameter for color term loss to zero. The second sets different prior distributions for each of the two rate parameters. This initializes the analysis with a bias toward gains of color terms, compared with losses, but allows for color term loss. The opposite patterns are also tested for models that are biased toward color term loss. A final alternative model restricts the rate parameters for color term gain and loss to be equal, creating a single rate model under which neither gain nor loss is prohibited and both of these processes are assumed to be equally likely.

Models for which the rate parameter for color term gain or the rate parameter for color term loss is set to zero fail to converge. The incompatibility with our data of models that implement exceptionless trends of gain or loss of color categories provides evidence against a strong interpretation of the Kay and Maffi (7) model. An evolutionary explanation for color term systems must allow for at least some color term loss.

Table 1 reports the results for analyses that do converge. Two-parameter analyses all result in similar likelihoods because the gain and loss rates converge to near-identical values across all models, regardless of biases toward gain or loss introduced by priors. The unrestricted model estimates the transition rate for color term gain to be markedly higher than that for color term loss (0.95 versus 0.36, respectively). It is thus unsurprising that an analysis that forces these rates to be equal has a far lower likelihood than the unrestricted model, with BFs showing extremely strong support for the two-parameter model over this single-parameter model. In sum, the results suggest that, although a strict prohibition on the loss of color terms is not compatible with Pama-Nyungan color term system evolution, the processes by which these systems have developed have involved substantially more color term additions than losses.
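
To give a sense of what the estimated gain and loss rates imply, the sketch below uses the closed-form transition probabilities of a two-state continuous-time Markov chain. This is a generic textbook calculation offered as an illustration; it assumes the standard two-state model, and the branch length t is an arbitrary value because the units of branch length are not given here.

```python
import math

# Sketch: two-state continuous-time Markov chain with gain rate (0 -> 1)
# and loss rate (1 -> 0). The branch length t below is illustrative only.


def two_state_probs(gain, loss, t):
    """Transition probabilities over a branch of length t."""
    total = gain + loss
    decay = math.exp(-total * t)
    p_gain = gain / total * (1.0 - decay)   # absent at start, present at end
    p_loss = loss / total * (1.0 - decay)   # present at start, absent at end
    return {"0->0": 1.0 - p_gain, "0->1": p_gain,
            "1->0": p_loss, "1->1": 1.0 - p_loss}


def stationary_presence(gain, loss):
    """Long-run probability that a color term is present."""
    return gain / (gain + loss)


probs = two_state_probs(gain=0.95, loss=0.36, t=1.0)
print(round(stationary_presence(0.95, 0.36), 3))
```

With the reported rates, gains are favored over losses on any branch, but the nonzero loss rate means that presence of a term is never guaranteed to persist.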

Table 1.

BF support for color term system models

Model                    Harmonic mean logL   2logBF    BF support
Gain/loss unrestricted   −656.73
Gain predominant         −657.21              −0.97     Not significant
Loss predominant         −657.17              −0.89     Not significant
Gain/loss equal          −673.89              −34.32    Very strong, negative
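
The 2logBF column can be recomputed from the harmonic mean log-likelihoods in Table 1, taking the unrestricted model as the null hypothesis. This is a minimal worked check; the function and variable names are illustrative.

```python
# Worked check of the 2logBF column in Table 1.
# H1 is each restricted model; H2 is the gain/loss unrestricted model.
def two_log_bf(log_l_h1, log_l_h2):
    """2logBF = 2 * (logL(H1) - logL(H2))."""
    return 2 * (log_l_h1 - log_l_h2)


UNRESTRICTED = -656.73
restricted_models = {
    "Gain predominant": -657.21,
    "Loss predominant": -657.17,
    "Gain/loss equal": -673.89,
}

recomputed = {name: round(two_log_bf(log_l, UNRESTRICTED), 2)
              for name, log_l in restricted_models.items()}
for name, value in recomputed.items():
    print(name, value)
```

Values recomputed from the rounded log-likelihoods agree with the tabulated 2logBF to within 0.01; the small differences presumably reflect rounding of the log-likelihoods in the table.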

Ordering of Color Term Addition: Dependent Evolution Analyses.

Comparisons of dependent and independent models of evolution for each pair of colors are used to identify correlations in the evolution of color terms. Because the dependent model receives substantial BF support (BF > 2) for the majority of color pairs, we further investigate dependencies between color pairs by examining the frequency with which individual transition rates are deleted in RJMCMC (see Table 2).

Table 2.

BF support for correlated evolution between color pairs

Colors          Independent   Dependent   2logBF   BF support
Red–green       −197.35       −195.70     3.29     Moderate
Red–yellow      −199.90       −200.15     −0.49    Not significant
Red–blue        −185.10       −184.12     1.95     Not significant
Red–brown       −172.20       −167.88     8.63     Strong
Green–yellow    −252.59       −233.69     37.81    Very strong
Green–blue      −232.77       −220.58     24.38    Very strong
Green–brown     −226.65       −216.018    21.26    Very strong
Yellow–blue     −236.32       −220.37     31.90    Very strong
Yellow–brown    −229.55       −220.46     18.19    Very strong
Blue–brown      −209.04       −187.35     43.38    Very strong

We find support for evolutionary dependencies between all pairs of colors, as would be predicted by WCS, with the exception of red/yellow and red/blue. For most remaining color pairs, the BF support for the dependent model is strong or very strong; a correlated model of evolution between red and green, however, receives only moderate BF support. For most pairs of colors that are added in adjacent stages along the “main line” WCS trajectory, we find strong evidence of dependent evolution, consistent with a theory in which color terms are added in a fixed order; red/green is again the exception.

Although all pathways in the WCS model involve the addition of red before green, we find no term for red in 11% of the Pama-Nyungan languages that have a term for green. These languages can be explained either by a gain of green before red or, more likely according to our ancestral state reconstructions, a loss of red. Neither of these explanations is consistent with WCS theory. The lack of support for dependencies between red and yellow or blue is likely the result of the fact that red is reconstructed to the root of the tree, and lost independently across several branches of Pama-Nyungan, which vary in the likelihood of a yellow or blue category. Thus, the evolution of red is captured as well by one gain parameter and one loss parameter as it is by separate rates for gain and loss in the presence and absence of yellow/blue. Stronger support for a dependency between red and brown likely reflects the fact that red is found in all sampled languages that have brown.

Because the RJMCMC procedure allows the number of model parameters to vary across iterations, it provides information about the posterior probability that any parameter should be deleted, which is useful for investigating the ordering of gains and losses of colors for which dependent models are supported. To do this, we examine the percentage of iterations in which particular parameters were set to zero in the RJMCMC analysis.
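
The deletion percentage is a simple tally over the posterior sample. The sketch below assumes, hypothetically, that each sampled iteration has been parsed into a dictionary from parameter label to rate, with a rate of 0 meaning the parameter was deleted in that iteration; this is not the BayesTraits output format, and the toy values are invented for illustration.

```python
# Sketch: percentage of sampled iterations in which each rate parameter is zero.
# The list-of-dicts input format and the toy values are assumptions.
def deletion_frequencies(samples, params):
    """Return {parameter: percent of iterations with that rate set to zero}."""
    n = len(samples)
    return {p: 100.0 * sum(1 for s in samples if s[p] == 0) / n for p in params}


# Toy posterior sample of four iterations over three hypothetical parameters
toy_samples = [
    {"a": 1.2, "d": 0.0, "h": 0.4},
    {"a": 0.9, "d": 0.0, "h": 0.0},
    {"a": 1.1, "d": 0.3, "h": 0.5},
    {"a": 1.0, "d": 0.0, "h": 0.0},
]

freqs = deletion_frequencies(toy_samples, ["a", "d", "h"])
print(freqs)
```

A parameter that is deleted in nearly every iteration corresponds to a transition the data do not require; one that is almost never deleted corresponds to a transition the data strongly support.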

We expect two categories of parameters to be frequently set to zero: parameters associated with the gain of a “later” color term in the absence of an “earlier” color term (e.g., the rate for gaining yellow in the absence of red), and those associated with the loss of an “early” color term in the presence of a “late” color term (e.g., the loss of red where a term for yellow is present, arrow (h) in Fig. 2). These two types of parameters are associated with changes that contradict the WCS theory, namely out-of-order additions of terms and losses of terms in later stages of the evolutionary trajectory. Parameters associated with color gains in the order prescribed by the WCS model (arrows a and f in Fig. 2) are expected to be deleted seldom, if ever.

Fig. 2.

Parameters in dependent models.

Indeed, we find that parameters associated with gaining color terms in the order prescribed by the WCS “main line” (edges a and f in Fig. 2 and Table 3) are almost never deleted. The parameter for a gain of a brown term when a blue term is present is the most often deleted, being set to zero in 22% of models.

Table 3.

Deletion frequencies for parameters in RJMCMC model strings, expressed as percentage

Color 1   Color 2   a    b    c    d     e    f    g    h
Red       Green     9    81   2    21    76   0    0    0
Red       Yellow    0    0    18   39    0    0    6    68
Red       Blue      0    0    1    51    2    0    29   18
Red       Brown     0    0    11   95    0    0    11   99
Green     Yellow    0    0    22   29    0    0    12   36
Green     Blue      0    0    1    98    0    0    1    94
Green     Brown     0    0    7    31    0    2    21   63
Yellow    Blue      0    0    1    100   3    0    0    0
Yellow    Brown     0    0    0    100   0    0    0    0
Blue      Brown     22   4    10   26    11   4    0    18

Parameters describing the gain of “late” color terms in the absence of “earlier” color terms (column d in Fig. 2 and Table 3) are expected to be universally deleted under the WCS theory. However, the deletion rates for this parameter are less consistently supportive of WCS hypotheses than the parameters that are associated with “main line” color gains. Percent deletion of this parameter ranges from 21% (for gain of green in the absence of red) to 100% (for gains of blue or brown in the absence of yellow). That is, we never find blue or brown gained without yellow. The deletion rate is also extremely high for the green/blue and red/brown color pairs. For other color pairs, relatively low deletion rates suggest that the ordering of color term gain may not be as secure as suggested by the WCS. Color pairs green/yellow, green/brown, and blue/brown retain the parameter associated with out-of-order color term gain in 69–74% of sampled iterations. For green/yellow, this may suggest that some branches of Pama-Nyungan may evolve along an alternative WCS pathway (i.e., pathway B in Fig. 1).

The parameter associated with losses of “early” color terms in systems where “later” color terms are present (parameter h) is even more variable. Not only do the changes described by this parameter involve losses of color terms, they also result in systems that generally do not fit into WCS classification stages. Despite the strong expectation that this parameter should be deleted, it is retained 100% of the time for color pairs red/green, yellow/blue, and yellow/brown (see column h in Table 3). The yellow category would thus appear to be less resistant to loss than the WCS theory would suggest. Color pairs red/brown and green/blue are more consistent with WCS predictions, with the h parameter deleted 99% and 94% of the time, respectively. As a whole, the patterns of deletion for this parameter across all color pairs show clear evidence for color loss and variable resistance to loss across colors.

The posterior distribution of models produced by RJMCMC is also useful for examining alternative pathways for color term addition. Although the WCS “main line” involves the gain of green before yellow, with the addition of blue only after these two colors have been added, a minority of attested systems surveyed by the WCS show evidence for the addition of yellow before green or the emergence of blue before splitting yellow and green (8). Although the parameter for gaining yellow without green is set to zero in 29% of iterations, the parameter for gaining these colors in the reverse order is never set to zero. These results support the dominance of the green-first pathway in Pama-Nyungan and provide further evidence that both universal and language- or family-specific factors are involved in the evolution of color systems.

We find further support for the WCS “main line” in the dependent model for yellow and blue. Although the parameter associated with gains of blue in the absence of yellow is always deleted, the parameter for gain of yellow in the absence of blue is never deleted. Parameters associated with losing either of these colors in the presence of the other are also almost never deleted. Thus, we find very strong support for the addition of yellow before blue in Pama-Nyungan, but poor support for the notion that these particular terms are resistant to loss.

Ancestral Node Reconstruction.

Ancestral node reconstruction estimates produced by the unconstrained two-parameter analysis across all seven color categories provide further evidence regarding the evolutionary trajectories of color term systems. Fig. 3 displays histograms showing the likelihood of each color’s presence at the root, ancestral nodes corresponding to well-established subgroups, and a sample of other internal nodes.

Fig. 3.

Ancestral state reconstructions on consensus tree.

For the majority of Pama-Nyungan subgroups, the reconstructed color term categories co-occur in patterns that are consistent with the WCS typology. The Paman, Yuin-Kuri, and Durubulic subgroups, for example, all have high probabilities for black, white, and red reconstructing to state 1 (present) at their ancestral nodes, consistent with WCS stage II. Several other subgroups, including Karnic, Thura-Yura, and Ngayarta, have ancestral state probabilities consistent with WCS stage III, with the color categories black, white, red, and green. The alternative WCS stage III configuration, with black, white, red, and yellow, is not as well supported among ancestral state reconstructions. Only Bandjalangic shows this pattern.

Only one Pama-Nyungan subgroup, the Central New South Wales languages, could be plausibly reconstructed with the six-color system of the WCS stage V (black, white, red, yellow, green, blue). However, although the probability of reconstructing blue for this subgroup is fairly secure (0.89), the reconstruction of yellow is less certain (0.45). Regardless of whether this subgroup is reconstructed as a canonical stage V system, it represents a challenge to one of the Kay and Maffi (7) hypotheses. It also represents a rapid elaboration of the color term system, given that its parent node shows strong support for only black, white, and red. The pattern of blue and brown occurring without yellow is even more robust in the Kulin subgroup. Three of its seven languages have blue and brown terms but lack yellow, with probabilities of 82% and 42%, respectively, for reconstructing blue and brown but only 4% for yellow.

Deeper in the tree, we find evidence that basic color term systems involved small numbers of color categories for the majority of the history of Pama-Nyungan. The root shows a high probability of having black, white, and red color categories, with a very small probability of green, presumably due to the prevalence of that color category outside of the Pama–Maric languages. The ancestral node reconstructions between this root and the primary subgroups generally show a progression from three-color systems to four-color systems including green (WCS stage III). A transition from this four-color system to a five-color system (black, white, red, green, yellow/WCS IV) is also apparent within the western branch of the family, although elsewhere in the tree the likelihood of a five-color WCS stage IV system is lower due to low reconstruction probabilities for yellow.

Although the general trend suggested by ancestral node reconstruction probabilities is consistent with WCS evolutionary pathways, a more detailed examination of the results reveals evidence for patterns that contradict Kay and Maffi (7). Reasonably strong evidence for color term loss can be found in languages like Wayilwan (with only black, white, green) within the Central New South Wales subgroup (with a probable reconstruction of black, white, red, green, blue). We also find a reasonably high probability for a green category in Western Pama-Nyungan nodes ancestral to the Kanyara–Mantharta subgroup, although the probability of green in Kanyara–Mantharta itself is very low (0.01). This decrease in the probability of a green category along the branches leading to the Kanyara–Mantharta subgroup can reasonably be interpreted as a likely loss of that color.

Comparative Reconstruction of Terms.

Linguistic reconstruction using traditional comparative methods (30) reveals only a small number of color terms that may be inheritances from Proto-Pama-Nyungan or high nodes within the Pama-Nyungan phylogeny. The forms *kara and *maru, both meaning black, are found in a number of distant subgroups across the tree, suggesting that these items may reconstruct to Proto-Pama-Nyungan. However, if these forms do reconstruct to the root, they must have been independently lost in most branches. Other explanations, such as a parallel semantic shift in several branches, cannot be excluded but are not evident from the available data.

Several obvious sources for color terms are evident. Northern Karnic *tyimpa “black,” for example, occurs in Thura–Yura with the meaning “ashes.” Terms for “ashes” are also recruited to express the color category white, as in Yolŋu *gaywaraŋu. In other instances, we see polysemy (that is, where a color term has multiple meanings, including both color and noncolor terms). Polysemy between white terms and the concepts “shining” and “clean” is evident. Red terms are frequently polysemous with or derived from items meaning “blood” (e.g., Biri kuma) or “red ochre” (e.g., Bandjalangic *kutyin). Proto-Pama-Nyungan *kurnka (“raw” or “unripe”) is used to refer to the green color category in a number of languages. Green terms also come from “leafy” meanings (e.g., Ngayarta *palharra) or “tree” meanings (e.g., *yukiri in many Western languages).

The instability of color terms in Pama-Nyungan is comparable to that in the Indo-European family, where the basic color terms show numerous innovations. In Indo-European, we likewise find color terms that relate to salient objects bearing that color. For example, compare the English loanword orange, Latvian melns “dirty” (cognate with Greek mélas “black”), English black (cognate with Greek phlégo “burn, blaze”; Latin flagrāre “flame, burn”) (ref. 31, p. 1055). However, Indo-European shows an additional pattern, whereby cognate color terms appear in different branches of the family in different meanings. They retain their status as color terms but refer to different portions of the color spectrum. English “yellow” is cognate with Greek khlo:rós “green,” and Old Irish gel “white.” In addition, we also see semantic narrowing and broadening, where terms for specific colors (such as in Sanskrit) derive from a word meaning colored more generally. For example, Sanskrit rakta “red” is from the past participle of the root raj “be colored.” In this last shift, we see a parallel in the Australian data, where Yolŋu miku “red” also means “colored.”

Conclusions

Our work shows an application of Bayesian phylogenetic methods to data that bridges linguistics, cultural anthropology, and cognitive science. Color term systems show themselves to be appropriate to Bayesian reconstruction techniques. We find general support for the WCS model of color term development, but with more nuance, including variability across color categories in the level of support for individual components of the WCS evolutionary theory. We also find exceptions to its predicted patterns, such as the loss of color terms in multiple subgroups. These exceptions are not easily explained by looking at the individual histories of color terms, nor do the linguistic mechanisms associated with color term change in Pama-Nyungan differ from those found in well-studied families like Indo-European. We conclude that the principles outlined in the WCS evolutionary theory do play a substantial role in the development of color term systems, but that further study from an evolutionary perspective can refine our understanding of the interaction of cognitive constraints and language change in shaping lexical systems.

Supporting Information contains the files needed to replicate our analyses, as well as information on the Pama–Nyungan sample phylogenies used in the analysis and the color terms present in the individual languages. Command files are available from pamanyungan.net/color.

Defining Color Terms

Basic color terms were identified based on the association of a form with at least one English translation included in the set of six basic WCS color terms, plus brown, the most frequent secondary color term in our sample. We include items whose basic meanings describe color but involve polysemy with physical objects; excluded are items whose only reference is an object, even when these objects are strongly associated with color (e.g., “red ochre”). These criteria for the inclusion and exclusion of items from the dataset are meant to automatically select terms that are likely to fit Berlin and Kay’s basic color term definition, which focuses on monolexemic terms that express categories in the perceptual space of hue, lightness, and saturation and do not denote a subset of the perceptual space covered by any other color terms. We take a term’s inclusion in a dictionary or vocabulary as evidence that it meets the additional Berlin and Kay criteria of psychological salience and stability across speakers. By designing our lexical sample around basic color categories that have been established by more careful cross-linguistic sampling, we are thus able to indirectly implement Berlin and Kay’s criteria for color term basic-ness.
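The inclusion and exclusion criteria above can be illustrated as a simple filter over dictionary glosses. The following is a minimal sketch, not the coding procedure actually used to build the dataset; the function name, the input layout, and the example entries are all hypothetical.

```python
# The six basic WCS color terms plus brown, as listed in the text.
TRACKED = ("black", "white", "red", "green", "yellow", "blue", "brown")

def code_language(entries):
    """Return presence/absence (1/0) for each tracked color term.

    `entries` is a hypothetical list of (form, glosses, object_only) tuples.
    `object_only` marks items whose only reference is an object
    (e.g. "red ochre"); such items are excluded from the coding.
    """
    present = {color: 0 for color in TRACKED}
    for form, glosses, object_only in entries:
        if object_only:
            continue
        for gloss in glosses:
            if gloss in present:
                present[gloss] = 1
    return present

# Invented entries, for illustration only (not forms from the dataset).
example = [
    ("formA", ["red", "blood"], False),   # color term polysemous with an object: included
    ("formB", ["red ochre"], True),       # object-only reference: excluded
    ("formC", ["blue", "green"], False),  # one form glossed with two English terms
]
coding = code_language(example)
```

In this sketch a form glossed with two English color terms counts toward both categories, which is one possible reading of how multiply glossed forms are treated.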

We acknowledge that this dataset falls short of the desiderata for meticulous language-internal analysis of color terms, because the referential, semantic, and grammatical detail available to us for these words is limited to what has been provided by lexicographers. However, the trade-off is a sample that provides unprecedented coverage within a large language family (i.e., 189 of approximately 290 attested Pama–Nyungan languages). This dense sample provides sufficient coverage of the major Pama–Nyungan subgroups to evaluate models of color system evolution and to reconstruct ancestral states for the color systems of these subgroups.

Many of the languages included in the sample are severely endangered or no longer spoken, making further study of color term denotational ranges impossible. In some cases, a single form may be glossed (that is, translated) with multiple English color terms; in others, a form may be glossed with a single English color term that may not reflect the entire extent of a category. For example, the sources for Gupapuyŋu and Yan-nhaŋu give “blue” and “green” as meanings for the terms milkuminy and walwalyana, respectively. However, the closely related language Djambarrpuyŋu has dhulmu “green.” Now, given that an explanatory gloss for dhulmu gives “as the colo[u]r of the deep sea,” we can be fairly sure that the term covers at least some of the reference of English blue. In other cases, however, the reference of the term is less clear.

Languages for which we have descriptive glosses largely adhere to the principles of color space partitioning described in the literature, in that languages in early stages of color system evolution tend to contain composite terms covering adjacent foci in the color space. However, it is impossible to determine in all cases whether the absence of a term for a particular English gloss represents a gap in the named color space, or simply a decision by a researcher to gloss a form only with a single English color term or with a subset of the basic color foci it can describe. Because we cannot distinguish the exact spectral ranges covered by individual terms, we treat all languages as though their color spaces are exhaustively partitioned and assume that the available English glosses are reasonably close to the focal ranges of their basic color categories. This default assumption of exhaustive partitioning is supported by Kay and Maffi’s (ref. 7, p. 758) finding that 92% of WCS languages demonstrate exceptionless adherence to this ideal.

Pama–Nyungan Phylogenies

We represent the Pama–Nyungan language phylogeny using a sample of 700 trees. The trees were subsampled from the MCMC used to derive a consensus tree summarizing relationships among Pama–Nyungan languages. The tree was compiled using basic vocabulary data, including that described in Bowern and Atkinson (16) [after methods in Gray et al. (18)]. The same sample was used in Zhou and Bowern (20). Here the trees, originally containing 262 taxa, were pruned to the 189 taxa included in our color term sample. Pruning was done using the ape package in R [R Core Development Team (32)]. Bowern and Atkinson’s sample of basic vocabulary included terms for “black,” “white,” and “red,” but no other color terms. The phylogeny estimation uses form–meaning correspondences (cognates), not presence/absence of lexical categories; of the data used to infer the tree sample, only cognates for “red” might provide information that overlaps with the current study. Because this is a very small part of the overall 189-meaning Bowern and Atkinson dataset, we consider the tree sample suitable for our analysis and independent of the color lexicon.
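Pruning itself was done with ape in R. Purely to illustrate what pruning a tree to a subset of taxa involves, the following Python sketch operates on a toy tree written as nested tuples. The representation, the function name, and the taxon labels are assumptions made for this example, and branch lengths are ignored.

```python
def prune(tree, keep):
    """Prune a tree given as nested tuples of taxon names (strings at the tips).

    Returns the pruned tree, or None if no taxon under this node is kept.
    Nodes left with a single child are collapsed, so the result contains
    no internal node with only one descendant.
    """
    if isinstance(tree, str):
        return tree if tree in keep else None
    children = [p for p in (prune(child, keep) for child in tree) if p is not None]
    if not children:
        return None
    if len(children) == 1:
        return children[0]
    return tuple(children)

# Toy five-taxon tree pruned to three taxa (labels are placeholders).
toy_tree = ((("A", "B"), "C"), ("D", "E"))
pruned = prune(toy_tree, keep={"A", "C", "D"})
```

Applying the same operation to every tree in a sample leaves the relationships among the retained taxa unchanged within each tree, which is why the pruned sample can stand in for the original one.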

Fig. 3 in the main paper collapses nodes in the tree to aid in readability and to show the trends in the main subgroups that were tracked. Here, we provide a full phylogeny with nodes labelled for their posterior probability (Fig. S1). As can be seen, posterior probabilities vary substantially across the tree, with some subgroups closer to the root often showing lower probabilities than those closer to the tips. However, the nodes we track (including Paman, Karnic, and Western Pama–Nyungan) have support above 0.8. The probability of a node’s occurrence provides an upper bound on the probability of the reconstruction. The DensiTree (https://github.com/rbouckaert/DensiTree) plot in Fig. S2 provides a convenient visualization of the variation in tree topologies. Note that some of the ambiguity in tree topology does not affect groupings, but rather the height of the nodes (and therefore the timing of the branching). In other cases, the DensiTree representation conveniently summarizes two main hypotheses. For example, the Yolngu subgroup is a sister to the subgroup that unites all of the languages of Western Australia (including the Kanyara–Mantharta, Wati, and Ngumpin–Yapa subgroups, among others) in 55% of our sample (giving a posterior probability of the node of 0.55). In almost all of the remaining trees in the sample, it forms a subgroup with Warluwaric. Likewise, the Central languages (Karnic, Yardli, Paakintyi, Mayi, and Kalkatungic subgroups) group together with posterior probability 1, but the next split in the tree is unclear.
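The posterior probability of a node, as used above, is the proportion of trees in the sample in which the corresponding grouping appears. A minimal sketch of that calculation follows, using toy trees written as nested tuples. The two topologies and the sample of 20 trees are invented to mirror the 0.55 figure and are not the actual Pama–Nyungan trees; the function names are likewise hypothetical.

```python
def tips(tree):
    """Return the frozenset of tip labels under a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(tips(child) for child in tree))

def clades(tree):
    """Return the set of all clades (as frozensets of tips) in a tree."""
    if isinstance(tree, str):
        return set()
    found = {tips(tree)}
    for child in tree:
        found |= clades(child)
    return found

def clade_support(trees, clade):
    """Proportion of trees in the sample that contain `clade` as a subgroup."""
    clade = frozenset(clade)
    return sum(clade in clades(t) for t in trees) / len(trees)

# Two competing toy topologies; the labels stand in for whole subgroups.
t1 = ((("Yolngu", "Western"), "Warluwaric"), "Other")
t2 = ((("Yolngu", "Warluwaric"), "Western"), "Other")
sample = [t1] * 11 + [t2] * 9

support_western = clade_support(sample, {"Yolngu", "Western"})
support_warluwaric = clade_support(sample, {"Yolngu", "Warluwaric"})
```

Because a reconstruction at a node is only defined in trees where that node exists, the support value computed this way caps the probability that can be assigned to any ancestral state at that node.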

The tree sample is included in the file color_trees_189.nex, in Nexus format.

Data and Command Files

Two sets of data files are provided. The first, color_binary_189.txt, contains alignment data for presence/absence of each of the color terms tracked in this analysis (black, white, red, green, yellow, blue, brown). The second set of files is used for running the dependency of traits analysis and contains the same information as the full alignment data files but is broken into pairs of color terms. Files are named by the pairs of color terms they contain; for example, blbr.txt is the dataset for blue and brown. The file archive is color_datafiles.zip and is available from pamanyungan.net/color.
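The relationship between the full presence/absence data and the paired files can be sketched as follows. This is an illustration only: the in-memory layout, the function name, and the toy languages are hypothetical, and the sketch assumes that every pair of the seven tracked terms is of interest. It does not reproduce the actual file format or the file-naming scheme beyond what is stated above.

```python
from itertools import combinations

COLORS = ("black", "white", "red", "green", "yellow", "blue", "brown")

def split_into_pairs(matrix):
    """Split a presence/absence matrix into one two-trait dataset per color pair.

    `matrix` maps language name -> dict of color -> 0/1 (hypothetical layout).
    Returns a dict keyed by (color_a, color_b), each value mapping a
    language name to its pair of presence/absence values.
    """
    return {
        (a, b): {lang: (row[a], row[b]) for lang, row in matrix.items()}
        for a, b in combinations(COLORS, 2)
    }

# Invented languages, for illustration only.
toy = {
    "LangA": dict(zip(COLORS, (1, 1, 1, 0, 0, 0, 0))),
    "LangB": dict(zip(COLORS, (1, 1, 1, 1, 1, 1, 0))),
}
pairs = split_into_pairs(toy)
```

Seven color terms yield 21 unordered pairs, so under this assumption the paired data would consist of 21 two-trait datasets carrying the same information as the full matrix.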

Several sets of command files are also provided, in the color_commands.zip archive (also available on pamanyungan.net/color). These files run BayesTraits from the command line (rather than interactively) and give the options needed to replicate our analyses. Of course, given that these are Bayesian analyses, a small amount of variation in the results is expected from run to run. Our analyses were run five times each to ensure that analyses were consistently convergent and not stuck in local maxima. Screenshots from Tracer (28) below provide illustration of convergence and adequate chain length, along with evidence that repeat runs produced results with only minor variation (of the type that we would expect from a stochastic process such as this). We used the gelman.diag and autocorr functions in the coda package in R to assess autocorrelation and convergence in the chains.
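To make the convergence check concrete, the following Python sketch computes the basic Gelman–Rubin potential scale reduction factor for several chains of one parameter. It is the textbook form of the statistic, given here as an assumption-laden illustration; it is not a reimplementation of coda's gelman.diag, which reports a refined version of this quantity, and the toy chains are invented.

```python
from statistics import mean, variance

def gelman_rubin(chains):
    """Basic potential scale reduction factor for equal-length chains.

    `chains` is a list of lists of sampled values for a single parameter.
    Values close to 1 suggest that the chains are sampling the same
    distribution; values well above 1 suggest a lack of convergence.
    """
    n = len(chains[0])
    chain_means = [mean(c) for c in chains]
    w = mean([variance(c) for c in chains])  # mean within-chain variance
    b = n * variance(chain_means)            # between-chain variance
    var_hat = (n - 1) / n * w + b / n        # pooled estimate of the variance
    return (var_hat / w) ** 0.5

# Toy chains: two that agree and two that sit in different regions.
agreeing = gelman_rubin([[1, 2, 3, 4], [1, 2, 3, 4]])
diverged = gelman_rubin([[1, 2, 3, 4], [11, 12, 13, 14]])
```

With chains this short the statistic is not meaningful in practice; the toy values only show that chains stuck in different regions drive the factor far above 1, which is the pattern that repeated runs are meant to rule out.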

Words for Colors in Pama–Nyungan Languages

The remainder of Supporting Information (Table S1) provides the languages, the words for the color terms in the language, and notes on their etymology. Data come from the Chirila etymological database of Australian languages [Bowern (17)] and full bibliographic details are provided there. Words are quoted in the forms in which they appear in the original data.

Supplementary Material

Supplementary File
pnas.1613666113.st01.docx (79.8KB, docx)

Acknowledgments

This work was supported by National Science Foundation Grants BCS-0844550 and BCS-1423711.

Footnotes

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1613666113/-/DCSupplemental.

References

  • 1. Woodworth RS. The puzzle of color vocabularies. Psychol Bull. 1910;7:325–334.
  • 2. Lucy JA, Shweder RA. Whorf and his critics: Linguistic and nonlinguistic influences on color memory. Am Anthropol. 1979;81:581–615.
  • 3. Brown RW, Lenneberg EH. A study in language and cognition. J Abnorm Psychol. 1954;49(3):454–462. doi: 10.1037/h0057814.
  • 4. Regier T, Kay P. Color naming and sunlight. Psychol Sci. 2004;15:289–290. doi: 10.1111/j.0956-7976.2004.00670.x.
  • 5. Berlin B, Kay P. 1969. Basic Color Terms: Their Universality and Evolution (Univ of California Press, Berkeley, CA).
  • 6. Kay P, McDaniel CK. The linguistic significance of the meanings of basic color terms. Language. 1978;54:610–646.
  • 7. Kay P, Maffi L. Color appearance and the emergence and evolution of basic color lexicons. Am Anthropol. 1999;101:743–760.
  • 8. Kay P, Berlin B, Maffi L, Merrifield WR, Cook R. 2009. The World Color Survey (CSLI Publications, Stanford, CA).
  • 9. Lenneberg EH, Roberts JM. The language of experience: A study in methodology. Int J Am Ling. 1956;V22(2):Memoir 13.
  • 10. Kay P. Synchronic variability and diachronic change in basic color terms. J Lang Soc. 1975;4:257–270.
  • 11. Hering E. 1964. Outlines of a Theory of the Light Sense (Harvard Univ Press, Cambridge, MA).
  • 12. Regier T, Kay P, Cook RS. Focal colors are universal after all. Proc Natl Acad Sci USA. 2005;102(23):8386–8391. doi: 10.1073/pnas.0503281102.
  • 13. Lakoff G, Johnson M. 1980. Metaphors We Live By (Univ of Chicago Press, Chicago).
  • 14. Traugott E, Dasher R. 2004. Regularity in Semantic Change (Cambridge Univ Press, Cambridge, UK).
  • 15. Bowern C, Koch H, editors. 2004. Australian Languages: Classification and the Comparative Method (John Benjamins, Amsterdam).
  • 16. Bowern C, Atkinson Q. Computational phylogenetics and the internal structure of Pama–Nyungan. Language. 2012;88(4):817–845.
  • 17. Bowern C. The Chirila database of Australian languages. Lang Doc Conservat. 2016;10:1–44.
  • 18. Gray RD, Drummond A, Greenhill S. Language phylogenies reveal expansion pulses and pauses in Pacific settlement. Science. 2009;323(5913):479–483. doi: 10.1126/science.1166858.
  • 19. Pagel M, Meade A. Bayesian analysis of correlated evolution of discrete characters. Am Nat. 2006;167:808–825. doi: 10.1086/503444.
  • 20. Zhou K, Bowern C. Quantifying uncertainty in the phylogenetics of Australian numeral systems. Proc R Soc B. 2015;282(1815):20151278. doi: 10.1098/rspb.2015.1278.
  • 21. Holden CJ, Mace R. Spread of cattle led to the loss of matrilineal descent in Africa: A coevolutionary analysis. Proc R Soc B. 2003;270:2425–2433. doi: 10.1098/rspb.2003.2535.
  • 22. Dunn M, Greenhill SJ, Levinson SC, Gray RD. Evolved structure of language shows lineage-specific trends in word-order universals. Nature. 2011;473:79–82. doi: 10.1038/nature09923.
  • 23. Jordan FM. A phylogenetic analysis of the evolution of Austronesian sibling terminologies. Hum Biol. 2011;83:297–321. doi: 10.3378/027.083.0209.
  • 24. Mesoudi A. 2011. Cultural Evolution: How Darwinian Theory Can Explain Human Culture and Synthesize the Social Sciences (Univ of Chicago Press, Chicago).
  • 25. Greenhill SJ, Currie TE, Gray RD. Does horizontal transmission invalidate cultural phylogenies? Proc R Soc B. 2009;276:2299–2306. doi: 10.1098/rspb.2008.1944.
  • 26. Mace R, Jordan FM. Macro-evolutionary studies of cultural diversity: A review of empirical studies of cultural transmission and cultural adaptation. Philos Trans R Soc B. 2011;366:402–411. doi: 10.1098/rstb.2010.0238.
  • 27. Green PJ. Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika. 1995;82:711–732.
  • 28. Rambaut A, Suchard MA, Xie D, Drummond AJ. Tracer v1.6. 2014. Available at beast.bio.ed.ac.uk/Tracer. Accessed August 1, 2016.
  • 29. Kass RE, Raftery AE. Bayes factors. J Am Stat Assoc. 1995;90:773–795.
  • 30. Hock HH, Joseph BD. 1996. Language History, Language Change, and Language Relationship: An Introduction to Historical and Comparative Linguistics (Mouton de Gruyter, Berlin).
  • 31. Buck CD. 1949. A Dictionary of Selected Synonyms in the Principal Indo-European Languages: A Contribution to the History of Ideas (Univ of Chicago Press, Chicago).
  • 32. R Core Development Team. ape: Analyses of Phylogenetics and Evolution. 2015. Available at https://cran.r-project.org/web/packages/ape/index.html. Accessed August 1, 2016.
