Philosophical Transactions of the Royal Society B: Biological Sciences. 2019 Nov 18;375(1789):20190062. doi: 10.1098/rstb.2019.0062

Syntax and compositionality in animal communication

Klaus Zuberbühler
PMCID: PMC6895557  PMID: 31735152

Abstract

Syntax has been found in animal communication but only humans appear to have generative, hierarchically structured syntax. How did syntax evolve? I discuss three theories of evolutionary transition from animal to human syntax: computational capacity, structural flexibility and event perception. The computation hypothesis is supported by artificial grammar experiments consistently showing that only humans can learn linear stimulus sequences with an underlying hierarchical structure, a possible by-product of computationally powerful large brains. The structural flexibility hypothesis is supported by evidence of meaning-bearing combinatorial and permutational signal sequences in animals, with sometimes compositional features, but no evidence for generativity or hierarchical structure. Again, animals may be constrained by computational limits in short-term memory but possibly also by limits in articulatory control and social cognition. The event categorization hypothesis, finally, posits that humans are cognitively predisposed to analyse natural events by assigning agency and assessing how agents impact on patients, a propensity that is reflected by the basic syntactic units in all languages. Whether animals perceive natural events in the same way is largely unknown, although event perception may provide the cognitive grounding for syntax evolution.

This article is part of the theme issue ‘What can animal communication teach us about human language?’

Keywords: primate communication, language evolution, semantics, meaning, permutation, grammar

1. Introduction

The lineages leading to Homo and Pan began to split some 6–8 Ma [1], which led to differences in morphology, behaviour and cognition between the three surviving species. A striking example is human language, with nothing comparable in the communication systems of chimpanzees, bonobos and other animals. While all primates communicate with species-specific vocal repertoires, only humans have evolved a secondary communication system that is based on sophisticated, socially learned motor control of the vocal apparatus, unlike any other primate [2,3]. This difference is already visible in human infants before language: from about four months of age, humans begin to control their vocal apparatus, first by engaging in playful babbling, then followed by the production of simple word-like structures. At the same time, human infants produce non-linguistic vocalizations for social communication [4], very similar to what has been reported in non-human primates [5]. Human babbling has a social bonding function, possibly evolved in response to cooperative breeding [6], but it also provides the scaffolding for increasing vocal control and competence for subsequent speech production. At around 2 years of age, infants go beyond labelling simple referents (‘dada’, ‘duck’) and start producing word combinations [7], a developmental process that ends with the capacity to produce complex utterances with complex syntactic structures, such as ‘Neither did the linguist we consulted object nor was she interested’ (figure 1). Such communication behaviour requires considerable computational power, for both signallers and receivers. One particular challenge is the conversion of hierarchically structured social events into linearly structured speech streams. How and why did humans evolve this capacity?

Figure 1.

Tree structure of a complex English expression conveying a sequence of three social events involving people interacting with a linguist, first as a patient (of a consultation), then as an agent (responding to consultation, showing lack of interest). A, adjective; AP, adjective phrase; C, conjunction; CP, conjunction phrase; D, determiner; I, inflection-bearing element; IP, inflectional phrase; N, (pro)noun; NP, noun phrase; S, sentence; V, verb; VP, verb phrase. Reproduced with permission from Townsend et al. [8] under the Creative Commons Attribution license. (Online version in colour.)

2. Syntax and compositionality

A basic mental operation that underlies this communication system is compositionality, or Frege's principle, which states that the meaning of a complex expression is determined by the meanings of its constituents and the rules used to combine them. In human language, compositionality is found both at the morphological and syntactic level, jointly responsible for the vast expressive power of human language. But syntax and morphology are not always compositional. Idioms (‘to kick the bucket’ = to die; ‘to spill the beans’ = to reveal a secret), for example, have figurative meanings different from their literal meanings [9], which cannot be derived from the meaning of the constituents. How did humans evolve the capacity for compositionality? Can the evolutionary history of compositionality be traced back to our common ancestor with non-human primates? Animals often combine signals into sequences, so an essential question is whether resulting combinations qualify as compositional or simple additions of meanings. The null hypothesis in this line of investigation is that callers produce sequences of (meaningful) constituents that merely reflect ongoing changes in the environment, a sort of running commentary of how events unfold.
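The distinction between a compositional expression and an idiom can be made concrete with a small sketch. Everything in it is an illustrative assumption rather than part of the reviewed work: the toy lexicon, the single verb-object rule and the helper names `compose` and `is_compositional` are hypothetical. The point is only that an expression counts as compositional when its attested meaning can be derived from the meanings of its parts plus the combination rule.

```python
from typing import Callable, Dict, Tuple

Meaning = str

# Hypothetical toy lexicon: literal meanings of the constituents.
LEXICON: Dict[str, Meaning] = {
    "kick": "strike",
    "spill": "pour out",
    "the bucket": "pail",
    "the beans": "legumes",
}


def verb_object_rule(verb: Meaning, obj: Meaning) -> Meaning:
    """One illustrative combination rule: apply the action to the object."""
    return f"{verb}({obj})"


def compose(
    parts: Tuple[str, str],
    rule: Callable[[Meaning, Meaning], Meaning] = verb_object_rule,
) -> Meaning:
    """Derive a meaning from the constituents' meanings and the rule."""
    verb, obj = parts
    return rule(LEXICON[verb], LEXICON[obj])


def is_compositional(parts: Tuple[str, str], attested: Meaning) -> bool:
    """True if the attested meaning equals the derived meaning."""
    return compose(parts) == attested


# Literal reading: derivable from the parts, hence compositional.
print(is_compositional(("spill", "the beans"), "pour out(legumes)"))
# Idiomatic reading: not derivable from the parts.
print(is_compositional(("kick", "the bucket"), "die"))
```

The same test can in principle be applied to animal call sequences, provided the meanings of the individual calls have been established independently.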

The purpose of this paper is to review some of the animal evidence for syntax and examine, on a case-by-case basis, whether the systems show indications of compositionality at the morphological or syntactic level. The key terminology is summarized in table 1.

Table 1.

Key terminology and definitions.

term: definition
syntax: a set of principles by which meaning-bearing units can be combined into well-formed complexes
grammar: a synonym for ‘syntax’ or an umbrella term encompassing the phonology, morphology and syntax of a language
duality of patterning: a characteristic of language by which structural organization is found at two levels: meaningless phonemes can be combined into morphemes and words whereas (meaningful) morphemes and words can be combined into phrases and sentences
compositionality: a process by which meaning is determined by the meanings of the constituent parts and the rule that combines them
merge: a mental operation that takes two syntactic elements and assembles them to form a set
recursion (in linguistics): application of a linguistic rule to the result of the same rule (=nesting)

3. Evolutionary theories

(a). Syntax as a recent evolutionary event

How did humans evolve their capacity to produce and decode syntactically organized structures? A current debate concerns the onset of the transition towards human syntax, i.e. when brains became language-ready. One group of theories posits that this process began only very recently, after the split between humans and their primate relatives a few million years ago. Here, the argument is that human syntax is the output of a powerful computation device that evolved very recently and without relevant precursors. This device enables speakers to compute complex phrase structures and allows infants to acquire the syntax of their native language without specific instructions and at high speed [10, p. 25]. Once present, it exerts its power onto the vocal output by rendering thoughts into a syntactically organized, linear speech output. One particularly prominent proposal within this group is Chomsky's saltation theory, which explains syntax as the output of a brain that has undergone a qualitative change after the split from non-human primates due to a macromutation [11, pp. 176–184]. Although saltation exists as an evolutionary mechanism, examples are mostly from plants, where gene duplications and gene transfers can lead to rapid spread and speciation [12,13]. However, there is no evidence that brain evolution has occurred in this way. A possibly more plausible scenario for why humans have syntax as a mental operation is that it was enabled, as a by-product, by recent brain enlargements. Humans have unusually large brains, especially considering their body size and compared to ancestral and current species (H. sapiens 1408 g; P. troglodytes 406 g, G. gorilla 486 g, P. pygmaeus 512 g; H. habilis 599 g, H. erectus 963 g; [14]) and it is possible that the extra amount of brain mass enables humans to perform more complex computations than individuals with smaller brains, including short-term memory operations. How much brain volume is required for a transition to syntax, however, remains unspecified, but this could potentially be tested by examining syntactic processing in pathologically small brains, such as in microcephaly. Whatever the underlying evolutionary history, the claim is that human syntax emerged only recently and is fundamentally different from all known animal systems, due to its hierarchical, compositional, recursive mental operations that produce multi-segment structures as the main output [15].

(b). Syntax as an ancient evolutionary event

A second theoretical position is that syntax evolution started much earlier, deep within the primate lineage (or before), and then changed gradually from simple to more complex operations. This hypothesis can be tested with research on non-human primates and other animals to retrace the evolutionary history of syntax in deep time, with distantly related species providing important data to understand the role of ecological pressures [8]. A particularly influential idea is the lexical constraint hypothesis (e.g. [16]), which holds that the capacity for syntax emerged as a response to increasing lexicon sizes [17]. If a species continuously increases the number of signals, it will reach a point where further additions become uneconomical compared to combining already existing ones, owing to either production or memory limits.

A second, less explored idea is the event perception hypothesis [18]. The hypothesis states that the origins of syntax lie in the way events are perceived and mentally represented by the members of a species. For humans, natural events consist of agents, actions and patients; they possess a temporal structure and they contain dependencies across event sequences. Assignments of intentionality appear to play a key role during this process [19]. An important observation is that these basic components (agents, patients, actions) correspond to the major grammatical functions of language, i.e. subjects, predicates, direct or indirect objects, tense markers or conjunctions. In this view, the syntactic structure of human language is little more than a coding system that reflects how humans perceive events. Syntax, in other words, is ‘out in the world’, constantly present in natural events, so an important research question is how non-human primates, particularly great apes, perceive natural events. Results will be important to decide whether syntax-ready brains have evolved before modern humans and whether the main transition towards language merely consisted of a communicative externalization of this cognitive propensity.

4. Comparative research

(a). Artificial grammar studies

One way to compare syntactic capacities across species is to carry out artificial grammar studies. This approach is based on the notion that artificial languages can be ranked in terms of structural complexity, such as whether they contain linear or hierarchical rules. The experimental manipulation is to train subjects on sequences of stimuli that comply with a given rule in order to test whether they subsequently perceive violations [20]. So far, the conclusion has been that the processing abilities of animals are limited to linear grammars, while only humans are able to process hierarchically organized grammars [21]. There are exceptions, notably a study on starlings that managed to master a phrase-structure grammar [22], but results have been questioned due to issues with overtraining and other artefacts [23]. Artificial grammar research has also been criticized for low ecological validity [24]. In most studies, experimental stimuli are sequences without relevant content, requiring advanced auditory pattern recognition but no semantic processing. Although this is a deliberate design feature (due to the goal of studying ‘pure’ syntax uncontaminated by semantics), syntax and semantics are not divorced in natural systems, casting some doubt on whether artificial grammar studies are suitable to model syntax evolution.
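The contrast between linear and hierarchical rules can be illustrated with two patterns that are commonly paired in this literature: (AB)^n, which a finite-state device can recognize, and A^nB^n, which requires matching the number of A and B elements. The text above does not name specific grammars, so the choice of these two patterns, the stimulus classes `A` and `B` and the function names are assumptions made for illustration only.

```python
import re


def matches_linear(seq: str) -> bool:
    """(AB)^n with n >= 1: a regular pattern, checkable with a plain regex."""
    return re.fullmatch(r"(AB)+", seq) is not None


def matches_nested(seq: str) -> bool:
    """A^n B^n with n >= 1: not a regular language, so the check counts."""
    n = len(seq) // 2
    return n >= 1 and seq == "A" * n + "B" * n


# Training items would comply with one rule; test items probe violations.
for stimulus in ["ABAB", "AABB", "AAB"]:
    print(stimulus, matches_linear(stimulus), matches_nested(stimulus))
```

Note that passing the A^nB^n check by counting does not by itself show hierarchical processing, which is one reason why results from such designs remain debated.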

(b). Natural communication studies

(i). Animal song

A second approach to the evolution of syntax has been to study natural communication systems across species. In several species, song has been analysed in terms of potential syntactic structures, a particularly striking example being the communication behaviour of humpback whales, with males combining song units into phrases and phrases into songs, all driven by social learning, occasionally even across populations [25]. However, neither humpback whale nor bird songs qualify as syntax in a strict sense (table 1), as the basic units do not seem to bear any meaning. Instead, animal song appears to function typically to repel sexual rivals and attract mating partners, a sexually selected behaviour. In primates, singing has been reported in some socially monogamous species, such as gibbons or indris, but here also it appears to function in reproduction and competition between groups [26]. White-handed gibbons, however, also sing to predators, notably tigers, clouded leopards and pythons, with predator songs and duet songs assembled in different ways [27]. Naturalistic observations [27] and playback experiments [28] suggest that recipients discriminate predator and non-predator songs by showing appropriate behavioural responses. But as individual song units are meaningless, gibbon song does not technically qualify as syntax either. Nevertheless, the white-handed gibbon song represents a potential example of a compositional system, albeit with no evidence that songs are socially learned.

(ii). Permutation syntax

Types of syntax that prevail in the animal communication literature are permutation and combination. In permutations, signals are arranged in ordered ways, whereas in combinations, signals are selected from a repertoire but assembled without order. Permutations have been observed both within call variants and across signal sequences. An example of call variants is the Diana monkey contact call, which consists of a tonal, arched unit that conveys the identity of the caller. Arches differ individually in shape and contextually in tonality, depending on visibility and background noise, suggesting callers have some control over articulation. Importantly, the arched unit can take a prefix, either a trill or a chatter, depending on social circumstances. Trill–arch combinations are given during socio-positive interactions, whereas chatter–arch combinations are part of socio-negative interactions [29]. That such combinations are compositional remains a possibility, pending evidence that the combined meaning goes beyond the meaning of the constituent parts.
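The difference between permutation and combination comes down to whether order is part of the signal. A minimal sketch, assuming call sequences are represented as lists of labels: two sequences are the same permutation only if they match element by element, whereas they are the same combination if they contain the same calls in the same numbers. The function names are hypothetical, and the reversed ‘arch, trill’ order is a made-up contrast, not an attested call.

```python
from collections import Counter
from typing import Sequence


def same_permutation(a: Sequence[str], b: Sequence[str]) -> bool:
    """Order matters: sequences must match element by element."""
    return list(a) == list(b)


def same_combination(a: Sequence[str], b: Sequence[str]) -> bool:
    """Order is ignored: only the multiset of calls is compared."""
    return Counter(a) == Counter(b)


natural = ["trill", "arch"]
reversed_order = ["arch", "trill"]  # hypothetical, for contrast only

print(same_permutation(natural, reversed_order))  # False
print(same_combination(natural, reversed_order))  # True
```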

A second example is male Campbell's monkeys’ predator alarm calls [30]. Here, males produce two basic alarm call types (‘krak’ and ‘hok’), both of which can take an acoustically invariable suffix (‘-oo’) (figure 2). Suffixed alarm calls have been associated with low urgency situations (e.g. predator far away) whereas unsuffixed ones are given when danger is imminent (e.g. predator visible). In playback experiments, unsuffixed krak alarms (given to leopards) caused leopard alarm responses in Diana monkeys (a species that often forms mixed groups with Campbell's monkeys) whereas suffixed krak alarms (given to general disturbances) did not [31]. Importantly, artificially edited krak and krak-oo alarm calls caused the same response patterns as unedited, natural krak and krak-oo alarms, suggesting that the -oo unit modified the meaning of the krak stem. The Campbell's monkey suffixation system has been cited as an example of compositionality, at the morphological level (-oo does not have an independent semantic function) [18,32].

Figure 2.

Suffixation in Campbell's monkey alarm calls: (a) krak alarm calls (given to leopards) have a descending frequency transition (large dashed red arrow) and can take on an optional ‘oo’ suffix (dashed red oval line) to form krak-oo (b); (c) hok alarm calls (given to crowned eagles) have no frequency transition (small dashed red arrow) and can take on an optional ‘oo’ suffix (dashed red oval line) to form hok-oo (d). Adapted with permission from Ouattara et al. [30] under the Creative Commons Attribution license. (Online version in colour.)

Permutations are also common at the sequence level. For example, Campbell's males often produce long sequences of calls to different events [33] and one permutation rule concerns the use of boom calls to introduce non-predator related events, which is meaningful to other primates [34]. Another example is chimpanzee long-distance pant-hoot vocalizations, a charismatic vocal utterance that consists of four distinct units: introduction, build-up, climax and let-down (figure 3). Using machine learning, it was possible to establish the semantics of the four units, with caller identity mainly encoded by the introduction and climax units, age mainly encoded by the build-up, social status mainly encoded by the climax and behavioural context (travel versus food) mainly encoded by the let-down unit [35]. Although it appears that all four units can also be given individually, there is no systematic work on the meaning of these individual calls, so it is not possible to decide whether pant-hoots qualify as compositional signals.

Figure 3.

Spectrogram and information content of chimpanzee pant-hoot calls. The top panel refers to the location within the expression where different types of information were most strongly encoded. The bottom panel depicts a spectrogram of a typical chimpanzee pant-hoot, featuring all four call units. Reproduced with permission from Fedurek et al. [35] under the Creative Commons Attribution license. (Online version in colour.)

Another well-studied permutational system is putty-nosed monkey alarm calls. Here, males produce two types of alarms, pyows and hacks, typically assembled into longer sequences. Roughly speaking, pure pyow series are given to terrestrial disturbances, whereas pure hack series are given to crowned eagles [36]. However, males sometimes produce mixed sequences, i.e. some pyows followed by some hacks, which reliably predict group movement [16]. In playback experiments, pyow-hack sequences triggered movements by subjects towards the speaker, in contrast to pyow or hack series [37], suggesting that listeners attend to the permutation rule. The variable number of pyows and hacks in natural pyow-hack combinations had no measurable impact on recipient responses [38]. Because the apparent meaning of the pyow-hack sequence cannot be inferred from the individual meanings, the system does not qualify as compositional but has been interpreted as idiomatic [39].

(iii). Combination syntax

As opposed to permutation, combinatoriality is defined as ‘the selection of a given number of elements from a larger number without regard to their arrangement’. A first relevant example is titi monkey alarm calling, where adults produce A and B calls to aerial and terrestrial predators, respectively [40,41]. This basic pattern is disrupted if predators are detected in non-standard locations, i.e. a raptor on the ground or a terrestrial predator in the canopy [42]. In such cases, callers produce mixed sequences containing both A and B calls in combinatorial but non-permutational ways [43]. In subsequent playback experiments, it could be shown that sequences recorded under the four experimental conditions (aerial predator ground or canopy, terrestrial predator ground or canopy) elicited adequate behavioural responses (figure 4, [44]). One specific combinatorial feature, the proportion of B-call bigrams in the sequence, had a particularly strong impact on recipient responses, suggesting it served as a main carrier of meaning.
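A proportion of B-call bigrams can be computed from a call sequence in a few lines. The exact definition used in the titi monkey studies is not given in the text above, so the sketch below makes an assumption: a bigram is any pair of adjacent calls, and the measure is the share of adjacent pairs in which both calls are B. The function name and the example sequences are hypothetical.

```python
from typing import Sequence, Tuple


def bigram_proportion(
    calls: Sequence[str], target: Tuple[str, str] = ("B", "B")
) -> float:
    """Share of adjacent call pairs that equal the target bigram."""
    pairs = list(zip(calls, calls[1:]))
    if not pairs:
        return 0.0  # a single call (or none) contains no bigrams
    return sum(pair == target for pair in pairs) / len(pairs)


# Hypothetical sequences, written as strings of A and B calls.
print(bigram_proportion("AAAA"))  # no B-B pairs
print(bigram_proportion("AABB"))  # one of three adjacent pairs is B-B
print(bigram_proportion("BBBB"))  # all adjacent pairs are B-B
```

A measure of this kind summarizes the whole sequence with one number and does not depend on where in the sequence the B-B pairs occur, which fits the description of the system as combinatorial rather than permutational.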

Figure 4.

Proportion of time the listener spent looking upwards depending on the experimental condition. The figure shows raw data (one line per individual), model estimates (black circles) and bootstrapped model estimates (coloured circles, 1000 bootstraps). Subjects looked more upwards when they were tested with sequences elicited by an aerial predator or by a predator in the canopy. Reproduced with permission from Berthet et al. [44]. (Online version in colour.)

For great apes, combinatorial syntax has been reported in bonobo food calling. Here, individuals were shown to select from five different call types (bark, B; peep, P; peep-yelp, PY; yelp, Y; grunt, G) to produce seemingly unordered combinations. Individual calls appeared to carry some meaning, insofar as different calls were preferably given to differently perceived food qualities (figure 5) [45].

Figure 5.

Assessments of meaning of calls given by bonobos when encountering foods of differently perceived qualities. Reproduced with permission from Clay and Zuberbühler [45] under the Creative Commons Attribution license.

In subsequent playbacks, we trained subjects to anticipate highly preferred food (kiwi) in one part of their enclosure, whereas less preferred food (apple) could be found in another part [46]. Following this initial training, we broadcast segments of natural call sequences originally recorded from individuals discovering either kiwi or apple. Although sequences consisted of only four calls (table 2), subjects responded in highly predictable ways, such that sequences originally given to kiwi triggered mainly searching in the kiwi field whereas the opposite was observed with sequences originally given to apples.

Table 2.

Playback stimuli and receiver foraging efforts in bonobos at Twycross Zoo, England.

signallers            receivers
food   sequence       kiwi field  apple field  kiwi bias  apple bias
kiwi   B B P B        21.0        2.5          9.4        1.1
kiwi   B B P B        6.5         2.5          3.6        1.4
kiwi   B B P PY       28.3        5.0          6.7        1.2
apple  PY B B PY      20.0        12.0         2.7        1.6
kiwi   P P PY P       79.0        18.8         5.2        1.2
kiwi   PY P P P       20.8        1.8          12.6       1.1
kiwi   P PY PY P      1.3         2.5          1.5        2.9
kiwi   P P PY Y       16.5        6.5          3.5        1.4
apple  PY P PY PY     14.8        3.3          5.5        1.2
apple  Y PY PY P      0.0         15.8         1.0        >12
apple  Y PY P Y       9.3         2.0          5.7        1.2
apple  PY PY Y PY     5.8         20.8         1.3        4.6
apple  Y P Y Y        6.5         14.3         1.5        3.2
apple  Y Y Y P        3.8         40.8         1.1        11.7
apple  PY PY Y Y      9.5         9.0          2.1        1.9
apple  PY Y PY Y      6.5         17.3         1.4        3.7
apple  PY Y Y Y       2.5         10.3         1.2        5.1

Are bonobo food calls compositional? As mentioned, definitions of compositionality are based on the notion that the meaning of a complex expression is fully determined by its structure and the meanings of its constituents [47]. If permutation (order) is an essential requirement for ‘structure’, then neither titi monkey alarm calling nor bonobo food calling qualifies as compositional. However, in both systems, there is a curious presence of bigrams, which appear to play an organizing role. In titi monkeys, the proportion of B-call bigrams explained much of the monkeys' production and response patterns. In bonobos, the meaning of the sequence appears to be determined by the same principle: 15 of 17 playback sequences contained bigrams of B, P, PY and Y calls, and the presence of such structures explained to a large degree the subsequent search behaviour. In particular, B and P bigrams elicited a strong kiwi bias (in search efforts), whereas PY and Y bigrams elicited an apple bias. Sequences without bigrams (n.a.) elicited an ambiguous response pattern, either a kiwi or an apple bias (table 3).

Table 3.

The role of bigrams in bonobo food calling.

bigram  kiwi search (s)  apple search (s)  kiwi bias  apple bias  N
        (median)         (median)          (median)   (median)
B       20.5             3.8               5.1        1.3         4
P       20.8             6.5               5.2        1.2         3
PY      5.8              9.0               1.5        2.9         5^a
Y       5.2              12.3              1.3        4.2         4^a
n.a.    9.3              2.0               5.7        1.2         1
n.a.    6.5              17.3              1.4        3.7         1

^a 1 of 15 sequences contained both Y and PY bigrams; n.a., no bigrams within the first four calls.
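The search-time medians in table 3 can be recomputed from the playback data in table 2. The sketch below treats a bigram as two consecutive calls of the same type and groups the 17 sequences by the bigrams they contain. This reading of ‘bigram’ is an assumption, as are the names `bigrams` and `summarise`. The field values are copied from table 2; the bias columns are left out because their computation is not described in the text.

```python
from collections import defaultdict
from statistics import median

# Table 2 rows: (food, four-call sequence, kiwi-field search, apple-field search)
ROWS = [
    ("kiwi", "B B P B", 21.0, 2.5),
    ("kiwi", "B B P B", 6.5, 2.5),
    ("kiwi", "B B P PY", 28.3, 5.0),
    ("apple", "PY B B PY", 20.0, 12.0),
    ("kiwi", "P P PY P", 79.0, 18.8),
    ("kiwi", "PY P P P", 20.8, 1.8),
    ("kiwi", "P PY PY P", 1.3, 2.5),
    ("kiwi", "P P PY Y", 16.5, 6.5),
    ("apple", "PY P PY PY", 14.8, 3.3),
    ("apple", "Y PY PY P", 0.0, 15.8),
    ("apple", "Y PY P Y", 9.3, 2.0),
    ("apple", "PY PY Y PY", 5.8, 20.8),
    ("apple", "Y P Y Y", 6.5, 14.3),
    ("apple", "Y Y Y P", 3.8, 40.8),
    ("apple", "PY PY Y Y", 9.5, 9.0),
    ("apple", "PY Y PY Y", 6.5, 17.3),
    ("apple", "PY Y Y Y", 2.5, 10.3),
]


def bigrams(sequence: str) -> set:
    """Call types that occur twice in a row (assumed meaning of 'bigram')."""
    calls = sequence.split()
    return {a for a, b in zip(calls, calls[1:]) if a == b}


def summarise(rows):
    """Per bigram type: (median kiwi search, median apple search, N)."""
    groups = defaultdict(list)
    for _food, seq, kiwi, apple in rows:
        for call in bigrams(seq):  # a sequence may count for two bigram types
            groups[call].append((kiwi, apple))
    return {
        call: (median(k for k, _ in v), median(a for _, a in v), len(v))
        for call, v in groups.items()
    }


summary = summarise(ROWS)
no_bigram = [row for row in ROWS if not bigrams(row[1])]
for call in ("B", "P", "PY", "Y"):
    print(call, summary[call])
print("sequences without bigrams:", len(no_bigram))
```

Under this reading, the grouping yields the sample sizes listed in table 3 (4, 3, 5 and 4, with one sequence counted under both PY and Y) and two sequences without bigrams. The medians are unrounded, so values such as 3.75 appear as 3.8 in the table.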

But what meaning has been computed by these expressions? As B and P calls are already associated with high-quality foods, why do callers not just use pure series of these calls when detecting kiwi? One possibility is that the meaning of some of the component calls is broader. This is certainly the case for P (peep) calls, which are given to a wide range of events, including non-food ones, that require others’ attention [48]. Producing calls in sequences, and particularly producing bigrams, thus allows individuals to generate acoustically conspicuous structures in response to some external events, but not others. Furthermore, great apes in the wild feed on hundreds of items, which differ in a wide range of variables (e.g. greens versus fruits, large versus small amounts, high versus low quality, novel versus familiar), potentially enabling callers to compose a range of complex expressions from a small number of calls to convey such differences. Also, callers may have reasons to encode individual identity into their utterances when finding food, and call sequences may facilitate this demand. It has been argued that food calling is some sort of other-serving behaviour, part of social reciprocity [49], suggesting that marking individual identity could be important when providing information about food encounters. Further work is needed to explore these questions, but for now, it would be difficult to defend a compositionality interpretation in the bonobo call system.

5. Conclusion

(a). Is animal communication compositional?

Are humans unique in having the capacity to compose complex expressions from simpler ones? Schlenker et al. [32] have proposed to distinguish between limited (trivial) and genuine (non-trivial) compositionality. Examples in English are ‘It's humid’ (trivial) and ‘It's very humid’ (non-trivial). The difference lies in the fact that for trivial compositions, there is no need for semantic operations; separate utterances are merely added into meaningful expressions. In genuine compositions, a semantic operation is active, such that the composite expression cannot be analysed from its components (‘it's humid’, ‘very’) in a direct way. In several theory papers, primate call systems have been analysed in such ways, which has led to the conclusion that some systems, particularly Campbell's monkey call suffixation and putty-nosed monkey call permutations, have weak compositional properties, a claim with implications for evolutionary theories of language [32,39,50–53]. Arguably, however, the evidence for compositionality is better for passerine birds, with empirical evidence for compositional structure in both Japanese tits [54] and southern pied babblers [55,56]. For example, Japanese tits produce ‘alert’ calls to predators, ‘recruitment’ calls to non-dangerous situations and ‘alert-recruitment’ sequences to incite mobbing behaviour to a predator. Here, permutation matters since (non-natural) ‘recruitment-alert’ concatenations do not trigger any responses. Similar findings have been reported in southern pied babblers, suggesting that this type of compositional syntax may be widespread in passerine birds.

(b). Is merge uniquely human?

Related to this is the distinction between expressions with linear and hierarchical structures. Townsend et al. [8] discuss the sentence ‘Duck and cover!’ as an example of simple, non-hierarchical compositionality in contrast to ‘Neither did the linguist we consulted object nor was she interested’ as an example of a complex expression that consists of three hierarchically organized sentences. Not all human syntax is hierarchical but an evolutionary theory of syntax must encompass the entire range of phenomena. Nevertheless, it is interesting that all currently known examples of permutation syntax in animals consist of systems containing only two calls. Future studies may provide different results, but for now, animal and human syntax may simply differ in the complexity of ‘merge’ operations, but not in kind [57], with 0-merge systems containing only single lexical items with no combinations and 1-merge systems consisting of simple combinations, with no recursion. As said, all animal syntax appears to fall into this category. 2-merge systems, then, allow for recursion with previously merged complexes whereas 3-merge systems allow for merges of two already merged complexes [57]. This requires an individual to keep a merged complex in short-term memory while processing further mergers. The different degrees of merging, in this view, may simply be due to short-term memory limits, with only larger brained species being capable of carrying out merges of already merged units. Whether or not animals are limited by a capacity for short-term retention of merged complexes (and whether such a capacity is sufficient for recursion) needs to be tested with specific research.
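The graded merge systems described above can be sketched as a classification over nested pairs. This is one possible reading of the text, not a formalization taken from [57]: a lexical item is a string, a merge is a pair, and the degree of a merge is 1 if both parts are items, 2 if one part is itself a merged complex and 3 if both parts are merged complexes. The names `merge` and `merge_degree` and the English example items are hypothetical.

```python
from typing import Tuple, Union

# A lexical item is a string; a merged complex is a pair of expressions.
Expr = Union[str, Tuple["Expr", "Expr"]]


def merge(a: Expr, b: Expr) -> Expr:
    """Assemble two syntactic elements into one complex."""
    return (a, b)


def merge_degree(expr: Expr) -> int:
    """Highest merge degree needed anywhere in the expression (0 to 3)."""
    if isinstance(expr, str):
        return 0  # single lexical item, no combination
    left, right = expr
    complexes = sum(isinstance(part, tuple) for part in (left, right))
    local = 1 + complexes  # 1: item+item, 2: item+complex, 3: complex+complex
    return max(local, merge_degree(left), merge_degree(right))


print(merge_degree("duck"))                                   # 0
print(merge_degree(merge("duck", "cover")))                   # 1
print(merge_degree(merge("the", merge("old", "linguist"))))   # 2
print(merge_degree(merge(merge("the", "linguist"),
                         merge("was", "interested"))))        # 3
```

In this representation, building a degree-2 or degree-3 expression requires holding at least one finished complex while the next merge is carried out, which is the short-term memory demand referred to above.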

(c). How does syntax evolve?

The saltation hypothesis is unlikely to carry much weight in evolutionary discussions of language origins. Although saltation can happen, it is not known to play a role in brain evolution. Syntax as a by-product of enhanced computational capacity is more plausible, especially in light of the proposal that complex phrase structure merges require short-term memory capacities, perhaps only available to humans. Regarding the hypothesis that syntax is an evolutionary response to increasing lexicon demands, no such pattern is visible in the published record of animal syntax. Some of the most interesting examples are from systems that consist only of a small number of calls (Campbell's monkey alarm calls, putty-nosed monkey alarm calls), suggesting that the evolution of further structure would have been possible.

How then did human syntax evolve? One main unresolved problem concerns the transition from non-hierarchical, 1-merge units in animal communication to generative, hierarchically structured, higher-order merges only seen in human language (e.g. [58]). It is possible that this transition actually involved two independent evolutionary events, a capacity for advanced articulatory control and a capacity to perceive others as governed by mental states. Syntax, in this scenario, can only emerge in species that have an urge to communicate perceptions of natural events to others (‘Mitteilungsbedürfnis’), which requires some understanding of other minds. Natural events are structured in hierarchical ways, i.e. in terms of actors and patients interacting in intentional ways, but this needs to be loaded upon a linear signal output, which requires a much richer signal repertoire than normally available to non-human primates. Syntax and compositionality, in this view, can only emerge in species that have evolved the computational resources to perceive, represent and recall natural events in such hierarchical ways and that have a signal system capable of conveying such information.

(d). Future directions

The event perception hypothesis has not been subject to much empirical research, at least not in animals. One possibility is that syntax-ready brains have evolved before modern humans, whereas the main transition towards language merely consisted of a communicative externalization of the cognitive propensity to perceive events in intentional ways (agents, patients). A first question therefore is whether non-human primates perceive and represent events in human-like ways or whether their perceptions are syntactically unstructured and holistic. If primates perceive events like humans, we expect them to visually explore social scenes in rule-governed ways along the main grammatical functions, for example by looking first or longer at the likely actor (‘subject’: NP) compared to the likely patient (‘predicate’: VP), regardless of the actor's and patient's actual appearance, emotional valence or social status (‘complement’: Adjective). Related questions are whether primates perceive inter-dependent event sequences as chronologically structured, suggesting that they may be more surprised by unexpected changes in the actor (NP) than unexpected changes in the action (VP), and whether primates are able to communicate such categories to others. In sum, research on the cognition of event perception is likely to contribute in new ways to the emerging science of animal syntax, currently dominated by studies on natural communication systems, towards a more comprehensive theory of language evolution.

Acknowledgements

The author is very grateful for discussions with Balthasar Bickel, Philippe Schlenker, Emmanuel Chemla, Simon Townsend and Jonathan Fritz.

Data accessibility

This article does not contain any additional data.

Competing interests

I declare I have no competing interests.

Funding

Much of the research reviewed in this article has benefitted from funding by the Leverhulme Trust, the European Research Council, the Royal Zoological Society of Scotland and the Swiss National Science Foundation.

References

  • 1. Langergraber KE, et al. 2012. Generation times in wild chimpanzees and gorillas suggest earlier divergence times in great ape and human evolution. Proc. Natl Acad. Sci. USA 109, 15 716–15 721. (doi:10.1073/pnas.1211740109)
  • 2. Lameira AR, Maddieson I, Zuberbuehler K. 2014. Primate feedstock for the evolution of consonants. Trends Cogn. Sci. 18, 60–62. (doi:10.1016/j.tics.2013.10.013)
  • 3. Fitch WT, de Boer B, Mathur N, Ghazanfar AA. 2016. Monkey vocal tracts are speech-ready. Sci. Adv. 2, e1600723 (doi:10.1126/sciadv.1600723)
  • 4. Kersken V, Zuberbühler K, Gomez J-C. 2017. Listeners can extract meaning from non-linguistic infant vocalisations cross-culturally. Sci. Rep. 7, 41016 (doi:10.1038/srep41016)
  • 5. Laporte MNC, Zuberbühler K. 2011. The development of a greeting signal in wild chimpanzees. Dev. Sci. 14, 1220–1234. (doi:10.1111/j.1467-7687.2011.01069.x)
  • 6. Zuberbühler K. 2011. Cooperative breeding and the evolution of vocal flexibility. In Oxford handbook of language evolution (eds M Tallerman, K Gibson). Oxford, UK: Oxford University Press.
  • 7. Tomasello M. 2009. The usage-based theory of language acquisition. In Cambridge handbook of child language (ed. EL Bavin), pp. 69–87. Cambridge, UK: Cambridge University Press.
  • 8. Townsend SW, Engesser S, Stoll S, Zuberbühler K, Bickel B. 2018. Compositionality in animals and humans. PLoS Biol. 16, e2006425 (doi:10.1371/journal.pbio.2006425)
  • 9. Jackendoff R. 1997. The architecture of the language faculty. Cambridge, MA: MIT Press.
  • 10. Chomsky N. 1965. Aspects of the theory of syntax. Cambridge, MA: MIT Press.
  • 11. Chomsky N. 2006. Language and mind. Cambridge, UK: Cambridge University Press.
  • 12. Serres MH, Kerr AR, McCormack TJ, Riley M. 2009. Evolution by leaps: gene duplication in bacteria. Biol. Dir. 4, 46 (doi:10.1186/1745-6150-4-46)
  • 13. Dufresne F, Hebert PDN. 1994. Hybridization and origins of polyploidy. Proc. Biol. Sci. 258, 141–146. (doi:10.1098/rspb.1994.0154)
  • 14. Herculano-Houzel S, Kaas JH. 2011. Gorilla and orangutan brains conform to the primate cellular scaling rules: implications for human evolution. Brain Behav. Evol. 77, 33–44. (doi:10.1159/000322729)
  • 15. Bolhuis JJ, Tattersall I, Chomsky N, Berwick RC. 2014. How could language have evolved? PLoS Biol. 12, e1001934 (doi:10.1371/journal.pbio.1001934)
  • 16. Arnold K, Zuberbühler K. 2006. Semantic combinations in primate calls. Nature 441, 303 (doi:10.1038/441303a)
  • 17. Nowak MA, Plotkin OB, Jansen VAA. 2000. The evolution of syntactic communication. Nature 404, 495–498. (doi:10.1038/35006635)
  • 18. Zuberbühler K. 2019. Evolutionary roads to syntax. Anim. Behav. 151, 259–265. (doi:10.1016/j.anbehav.2019.03.006)
  • 19. Dasser V, Ulbaek I, Premack D. 1989. The perception of intention. Science 243, 365–367. (doi:10.1126/science.2911746)
  • 20. Friederici AD. 2004. Processing local transitions versus long-distance syntactic hierarchies. Trends Cogn. Sci. 8, 245–247. (doi:10.1016/j.tics.2004.04.013)
  • 21. Fitch WT, Hauser MD. 2004. Computational constraints on syntactic processing in a nonhuman primate. Science 303, 377–380. (doi:10.1126/science.1089401)
  • 22. Abe K, Watanabe D. 2011. Songbirds possess the spontaneous ability to discriminate syntactic rules. Nat. Neurosci. 14, 1067–U1173. (doi:10.1038/nn.2869)
  • 23. Beckers GJL, Berwick RC, Okanoya K, Bolhuis JJ. 2017. What do animals learn in artificial grammar studies? Neurosci. Biobehav. Rev. 81, 238–246. (doi:10.1016/j.neubiorev.2016.12.021)
  • 24. Zuberbühler K. 2018. Combinatorial capacities in primates. Curr. Opin. Behav. Sci. 21, 161–169. (doi:10.1016/j.cobeha.2018.03.015)
  • 25. Garland EC, Rendell L, Lamoni L, Poole MM, Noad MJ. 2017. Song hybridization events during revolutionary song change provide insights into cultural transmission in humpback whales. Proc. Natl Acad. Sci. USA 114, 7822–7829. (doi:10.1073/pnas.1621072114)
  • 26. Marshall JT, Marshall ER. 1976. Gibbons and their territorial songs. Science 193, 235–237. (doi:10.1126/science.193.4249.235)
  • 27. Clarke E, Reichard UH, Zuberbühler K. 2006. The syntax and meaning of wild gibbon songs. PLoS ONE 1, e73 (doi:10.1371/journal.pone.0000073)
  • 28. Andrieu J, Malaivijitnond S, Reichard UH, Zuberbühler K. In preparation. White-handed gibbons discriminate context-specific song compositions.
  • 29. Candiotti A, Zuberbühler K, Lemasson A. 2012. Context-related call combinations in female Diana monkeys. Anim. Cogn. 15, 327–339. (doi:10.1007/s10071-011-0456-8)
  • 30. Ouattara K, Lemasson A, Zuberbühler K. 2009. Campbell's monkeys use affixation to alter call meaning. PLoS ONE 4, e7808 (doi:10.1371/journal.pone.0007808)
  • 31. Coye C, Ouattara K, Zuberbühler K, Lemasson A. 2015. Suffixation influences receivers' behaviour in non-human primates. Proc. R. Soc. B 282, 20150265 (doi:10.1098/rspb.2015.0265)
  • 32. Schlenker P, et al. 2016. Formal monkey linguistics. Theor. Linguist. 42, 1–90. (doi:10.1515/tl-2016-0001)
  • 33. Ouattara K, Lemasson A, Zuberbühler K. 2009. Campbell's monkeys concatenate vocalizations into context-specific call sequences. Proc. Natl Acad. Sci. USA 106, 22 026–22 031. (doi:10.1073/pnas.0908118106)
  • 34. Zuberbühler K. 2002. A syntactic rule in forest monkey communication. Anim. Behav. 63, 293–299. (doi:10.1006/anbe.2001.1914)
  • 35. Fedurek P, Zuberbühler K, Dahl CD. 2016. Sequential information in a great ape utterance. Sci. Rep. 6, 38226 (doi:10.1038/srep38226)
  • 36. Arnold K, Zuberbühler K. 2006. The alarm-calling system of adult male putty-nosed monkeys, Cercopithecus nictitans martini. Anim. Behav. 72, 643–653. (doi:10.1016/j.anbehav.2005.11.017)
  • 37. Arnold K, Zuberbühler K. 2008. Meaningful call combinations in a non-human primate. Curr. Biol. 18, R202–R203. (doi:10.1016/j.cub.2008.01.040)
  • 38. Arnold K, Zuberbühler K. 2012. Call combinations in monkeys: compositional or idiomatic expressions? Brain Lang. 120, 303–309. (doi:10.1016/j.bandl.2011.10.001)
  • 39. Schlenker P, Chemla E, Zuberbühler K. 2016. What do monkey calls mean? Trends Cogn. Sci. 20, 894–904. (doi:10.1016/j.tics.2016.10.004)
  • 40. Caesar C, Byrne R, Young RJ, Zuberbühler K. 2012. The alarm call system of wild black-fronted titi monkeys, Callicebus nigrifrons. Behav. Ecol. Sociobiol. 66, 653–667. (doi:10.1007/s00265-011-1313-0)
  • 41. Caesar C, Byrne RW, Hoppitt W, Young RJ, Zuberbühler K. 2012. Evidence for semantic communication in titi monkey alarm calls. Anim. Behav. 84, 405–411. (doi:10.1016/j.anbehav.2012.05.010)
  • 42. Cäsar C, Zuberbühler K, Young RJ, Byrne RW. 2013. Titi monkey call sequences vary with predator location and type. Biol. Lett. 9, 20130535 (doi:10.1098/rsbl.2013.0535)
  • 43. Berthet M, Neumann C, Mesbahi G, Casar C, Zuberbühler K. 2018. Contextual encoding in titi monkey alarm call sequences. Behav. Ecol. Sociobiol. 72, 8 (doi:10.1007/s00265-017-2424-z)
  • 44. Berthet M, Mesbahi G, Pajot A, Cäsar C, Neumann C, Zuberbühler K. 2019. Titi monkeys combine alarm calls to create probabilistic meaning. Sci. Adv. 5, eaav3991 (doi:10.1126/sciadv.aav3991)
  • 45. Clay Z, Zuberbühler K. 2009. Food-associated calling sequences in bonobos. Anim. Behav. 77, 1387–1396. (doi:10.1016/j.anbehav.2009.02.016)
  • 46. Clay Z, Zuberbühler K. 2011. Bonobos extract meaning from call sequences. PLoS ONE 6, e18786 (doi:10.1371/journal.pone.0018786)
  • 47. Szabó ZG. 2017. Compositionality. In The Stanford encyclopedia of philosophy (ed. EN Zalta) [online]. Stanford, CA: Metaphysics Research Lab.
  • 48. Clay Z, Archbold J, Zuberbühler K. 2015. Functional flexibility in wild bonobo vocal behaviour. PeerJ 3, e1124 (doi:10.7717/peerj.1124)
  • 49. Needle, et al. In preparation.
  • 50. Schlenker P, Chemla E, Arnold K, Lemasson A, Ouattara K, Keenan S, Stephan C, Ryder R, Zuberbühler K. 2014. Monkey semantics: two ‘dialects’ of Campbell's monkey alarm calls. Linguist. Philos. 37, 439–501. (doi:10.1007/s10988-014-9155-7)
  • 51. Schlenker P, Chemla E, Arnold K, Zuberbühler K. 2016. Pyow-hack revisited: two analyses of putty-nosed monkey alarm calls. Lingua 171, 1–23. (doi:10.1016/j.lingua.2015.10.002)
  • 52. Schlenker P, Chemla E, Casar C, Ryder R, Zuberbühler K. 2017. Titi semantics: context and meaning in Titi monkey call sequences. Nat. Lang. Linguist. Theory 35, 271–298. (doi:10.1007/s11049-016-9337-9)
  • 53. Schlenker P, et al. 2016. Formal monkey linguistics: the debate. Theor. Linguist. 42, 173–201. (doi:10.1515/tl-2016-0010)
  • 54. Suzuki TN, Wheatcroft D, Griesser M. 2016. Experimental evidence for compositional syntax in bird calls. Nat. Commun. 7, 10986 (doi:10.1038/ncomms10986)
  • 55. Engesser S, Crane JMS, Savage JL, Russell AF, Townsend SW. 2015. Experimental evidence for phonemic contrasts in a nonhuman vocal system. PLoS Biol. 13, e1002171 (doi:10.1371/journal.pbio.1002171)
  • 56. Engesser S, Ridley A, Townsend S. 2016. Meaningful call combinations and compositional processing in the southern pied babbler. Proc. Natl Acad. Sci. USA 113, 5976–5981. (doi:10.1073/pnas.1600970113)
  • 57. Rizzi L. 2016. Monkey morpho-syntax and merge-based systems. Theor. Linguist. 42, 139–145. (doi:10.1515/tl-2016-0006)
  • 58. Bolhuis JJ, Beckers GJL, Huybregts MAC, Berwick RC, Everaert MBH. 2018. Meaningful syntactic structure in songbird vocalizations? PLoS Biol. 16, e2005157 (doi:10.1371/journal.pbio.2005157)

Articles from Philosophical Transactions of the Royal Society B: Biological Sciences are provided here courtesy of The Royal Society
