Abstract
The extent to which vocal learning can be found in nonhuman primates is key to reconstructing the evolution of speech. Regarding the adjustment of vocal output in relation to auditory experience (vocal production learning in the narrow sense), effects on the ontogenetic trajectory of vocal development as well as adjustment to group-specific call features have been found. Yet, a comparison of the vocalizations of different primate genera revealed striking similarities in the structure of calls and repertoires in different species of the same genus, indicating that the structure of nonhuman primate vocalizations is highly conserved. Thus, modifications in relation to experience only appear to be possible within relatively tight species-specific constraints. By contrast, comprehension learning may be extremely rapid and open-ended. In conjunction, these findings corroborate the idea of an ancestral independence of vocal production and auditory comprehension learning. To overcome the futile debate about whether or not vocal production learning can be found in nonhuman primates, we suggest putting the focus on the different mechanisms that may mediate the adjustment of vocal output in response to experience; these mechanisms may include auditory facilitation and learning from success.
This article is part of the theme issue ‘What can animal communication teach us about human language?’
Keywords: alarm calls, Chlorocebus, speech evolution, Papio, vocal production learning
1. Background
Conventionalized communication in the auditory–vocal domain, such as in human speech, crucially requires learning in both the production and comprehension of sounds. To shed light on the evolutionary origins of vocal learning, numerous studies have investigated vocal learning in nonhuman primates (hereafter ‘primates’). In the context of vocal learning, it is important to distinguish between vocal learning by the caller and vocal learning by the listener. Vocal learning by callers encompasses adjustment of the structure of the vocalizations (vocal production learning in the narrow sense) and adjustment of usage in relation to experience (vocal usage learning). Vocal learning by listeners comprises auditory comprehension learning, which refers to the ability to associate a sound with its source and/or what the sound ‘stands for’, that is, what it predicts [1–4].
Classic studies provided clear evidence that primates do not require auditory input to develop normal species-specific vocalizations: monkey infants raised under social and acoustic isolation [5,6] or cross-fostered between species [7] developed species-typical vocalizations. Yet, a number of recent studies reported evidence for vocal learning in chimpanzees and marmosets. In the following, we will review these studies, also taking some of the earlier studies into consideration, before we turn to our own work on baboons and savannah monkeys. We will conclude with a discussion of the potential mechanisms that underlie the observed effects. We argue that some of the divergent views on the issue of vocal learning may be reconciled by distinguishing more clearly between the different ways in which social and auditory experience may contribute to variation in acoustic structure.
2. Evidence from chimpanzees
A highly influential study on vocal learning in the food grunts of chimpanzees explored changes in acoustic structure over 4 years [8], after integration of nine adult subjects from a group of chimpanzees previously housed in the Beekse Bergen Safari Park (BB group) in The Netherlands, with nine adult subjects housed at the Edinburgh zoo (ED group). Observations were made during 3 years over a 4-year period between 2010 and 2013. The datasets comprised recordings of food grunts during the feeding of apples, as well as proximity data from instantaneous scans, to track changes in the social network during the study period [8]. Over time, the acoustic features of the calls of BB subjects became more similar to those of the ED subjects, as evidenced by a significant interaction between the factors ‘group’ and ‘year of study’ [8], as well as significant differences in non-parametric follow-up tests at the beginning of the study, and a lack of such differences in the last year of the study [9]. A detailed inspection of the individual acoustic features suggested, however, that there was substantial overlap in acoustic features to begin with, as six out of seven of the BB subjects also produced calls that fell within the range of the ED subjects from the beginning of the study [10]. This overlap is relevant for gauging the potential mechanisms underpinning the observed change, as detailed below (§5) in our framework for vocal production learning.
To assess the degree of plasticity in chimpanzee vocalizations, different studies explored the variation between wild social groups or populations. In their analysis of community-specific differences, Crockford & Boesch [11], for instance, investigated the acoustic structure of pant–hoots given by males from three neighbouring communities. A discriminant function analysis revealed that the calls of the three communities could be well distinguished from each other, but when calls from males residing in a distant community were added, the classification accuracy dropped considerably. These results were taken as indirect evidence that male chimpanzees converge in their pant–hoot characteristics, so as to be distinct from males in neighbouring communities [11]. Since the authors did not report the variation in specific parameters, it is difficult to judge the extent of the variation between communities in absolute terms.
Similarly, Mitani et al. [12] observed significant differences in temporal characteristics and in the frequency characteristics of the climax element of chimpanzee pant–hoots uttered by members of two different groups. Marshall et al. [13] studied the acoustic structure of pant–hoots of two groups of captive chimpanzees and compared the recordings with calls from three male chimpanzees belonging to the Kanyawara community in Uganda. Irrespective of whether the pant–hoots were recorded in captivity or the wild, the spectral structure of the elements of this multisyllable call was largely similar, although the authors observed significant variation in temporal aspects of the two captive groups comprising 11 and 3 males.
If vocal learning and conventionalization played a major role in shaping vocal output, one would expect marked differences between communities that never interact. Mitani et al. [14] examined variation in the long-distance calls (pant–hoots) between two large populations of eastern chimpanzees, with 10 males from Mahale and 12 males from Kibale. These populations live about 700 km apart. While the qualitative comparison of calls revealed considerable heterogeneity within populations in terms of the structure of pant–hoots, the quantitative comparison of specific acoustic features identified several significant acoustic differences. Males from Kibale produced longer introductory elements than Mahale males, and Kibale males also produced longer build-up elements at a slower rate than Mahale males. Phase duration also varied between individuals of the two populations [14]. The authors discussed a number of factors that may explain the observed variation, including differences in transmission characteristics of the different habitats, differences in the sound environment, and variation in body size. Although the drivers of the observed variation could not be clearly identified, the study further corroborated the assumption of a strong hard-wired component of vocal output, with little evidence for a role of learning or conventionalization.
In summary, despite some variation between groups and populations, it appears that the call structure of chimpanzees is largely innate. Even the distantly related eastern and western chimpanzees produce the same general call types [15,16]. Within the constraints imposed by the neural mechanisms and genetic architecture underpinning vocal patterns, there appears to be some limited potential for plasticity at the individual and group level. One caveat is that many of the studies on chimpanzees are based on small sample sizes, and thus it is difficult to judge the extent of this plasticity with greater confidence.
3. Evidence from marmosets and macaques
In recent years, another group of primate species has attracted increasing attention with regard to their potential vocal learning abilities, namely marmosets and tamarins. These small callitrichid monkeys live in small family groups with extended family care for young. Females typically give birth to twins, and both the father and older offspring carry the young [17]. When separated, both young and adult subjects emit vocalizations that function to re-establish contact [18,19]. Moreover, calling by one party often prompts counter-calling from the other party [20]. This ‘antiphonal’ calling provides researchers with an excellent opportunity to study the effects of vocal feedback on the development of offspring vocalizations.
During the first two months of life, marmoset vocalizations undergo distinct changes. Specifically, young first emit twitter calls and trills, as well as cries, phee-cries and subharmonic phees. The latter three call types are proposed to transition into adult phee calls [21]. Subjects almost exclusively produce phee calls from an age of about two months. Some of the changes seen in the first two months can be attributed to physiological growth and an increase in strength. At the same time, marmosets exhibit a developmental pattern of FoxP2 expression in their thalamocortical–basal ganglia circuit [22] that has been likened to that of songbirds and humans [23]. Therefore, Takahashi et al. [21] hypothesized that marmoset vocal development may also be prone to the influence of vocal feedback. Indeed, infants that received more contingent parental responses in the form of antiphonal calling began to use the adult form earlier than those that received fewer contingent responses [21].
These observations formed the basis for a subsequent experimental study in which marmoset infants received either high or low rates of contingent parental calls in response to their own spontaneous vocalizations [24]. Three pairs of marmoset twins (six infants) were briefly separated from their family group. The first 5–10 min of a 40-min experimental session were used to estimate spontaneous calling rates of the infants (baseline), while the remaining time comprised closed-loop feedback with high (100%) or low (10%) rates of contingent vocal feedback. Because one infant of each twin pair was assigned to the high- and the other to the low-feedback condition, genetic background and perinatal experience were controlled. The study provided compelling evidence that infants in the high-feedback condition developed the adult phee call variant earlier than those in the low-feedback condition [24]. The authors also checked whether infants in the high-feedback condition produced more calls on average, by examining the call rate in the baseline period, i.e. prior to acoustic feedback. Yet, infant call rates in the experimental period were not reported. If parental calls are not only given in response to infant calls, but also elicit antiphonal calls by the infants themselves, this may have resulted in substantially higher call rates, and thus more practice, for infants in the high-feedback condition.
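The practice confound raised above can be made concrete with a small worked calculation. The sketch below is only an illustration under stated assumptions: the function name `expected_calls`, the 100 spontaneous calls per session and the 30% reply probability are hypothetical and are not taken from [24]. It assumes that each infant call receives playback with a fixed probability, and that each playback elicits one further infant call with a fixed probability, so that expected call numbers follow a geometric series.

```python
def expected_calls(spontaneous_calls, feedback_rate, reply_prob):
    """Expected total infant calls when fed-back calls may trigger replies.

    Assumption (hypothetical model): each infant call receives playback with
    probability `feedback_rate`, and each playback elicits one further infant
    call with probability `reply_prob`. The chain of replies is a geometric
    series with ratio feedback_rate * reply_prob.
    """
    ratio = feedback_rate * reply_prob
    if not 0 <= ratio < 1:
        raise ValueError("feedback_rate * reply_prob must lie in [0, 1)")
    return spontaneous_calls / (1 - ratio)


# Hypothetical values: 100 spontaneous calls, 30% chance of an antiphonal reply.
high = expected_calls(100, feedback_rate=1.0, reply_prob=0.3)
low = expected_calls(100, feedback_rate=0.1, reply_prob=0.3)
print(f"high feedback: {high:.1f} calls, low feedback: {low:.1f} calls")
```

Under these assumed values, identical baseline call rates would still translate into noticeably more calls in the high-feedback condition, which is why call rates from the experimental period would be needed to separate an effect of feedback from an effect of practice.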
Another study on marmosets investigated whether parental auditory feedback is an obligate requirement for proper vocal development or whether it simply accelerates vocal development, by tracing the vocal behaviour of two sets of offspring. Two infants were normally raised, while the other three were separated from the parents after the third postnatal month [25]. All five monkeys eventually produced mature vocalizations. In contrast to normally raised monkeys, however, marmosets with limited parental feedback also produced infant-specific vocal behaviour up to an age of 13 months. The social interactions between infants and parents affected the maturation of the vocal behaviour, including changes in acoustic call structures during development. Specifically, subjects that experienced only limited parental input produced calls with a higher entropy than normally reared monkeys [26].
In summary, there is converging evidence that contingent auditory feedback plays a key role in shaping the developmental trajectory of marmosets. At the same time, irrespective of rearing history and amount and temporal contingency of parental feedback, subjects are ultimately able to produce the regular adult call type.
In order to explore which factors may influence modifications in call structure during early development, Hammerschmidt et al. [27] studied the development of ‘coo’ calls in young rhesus macaques, Macaca mulatta. This call type is useful to study ontogenetic changes, because it can be elicited reliably from birth onwards. Calls were recorded during brief periods of separation from more than 20 rhesus macaque infants from the first week of life until the age of five months. Infants were either raised with their mothers in normal breeding colonies or separated from their mothers at birth and housed in a nursery with other age-matched peers. With increasing age, the ‘coo’ calls underwent several changes: calls dropped in pitch and showed reduced variability in call amplitude and fundamental frequency. In addition, call duration increased slightly. Aside from high residual intra-individual variability throughout the recording period, no significant influence of sex or rearing conditions could be found. Controlling for weight as a reliable proxy measure for body growth and changes in vocal tract characteristics [28,29], all except one significant correlation with age could be excluded. The only acoustic parameter that could not be explained by weight gain was a parameter describing the portion of amplitude gaps (figure 1). To produce constant amplitude throughout a call, it is necessary to produce the correct lung pressure in relation to vocal fold tension. Without a correct combination, no audible sound can be produced [30]. Evidently, young rhesus macaques need some practice to find the correct combination of lung pressure and vocal fold tension to produce the coo modulation (figure 1). All other changes could be explained by growth. The fact that there were no differences between nursery and mother-reared animals confirmed the view that young rhesus macaques do not require an adult model to produce species-specific ‘coo’ calls.
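One common way to ask whether an age effect survives once weight is accounted for is a partial correlation. The source does not state which statistical procedure was used in [27], so the following is a minimal sketch of the general idea rather than a reconstruction of that analysis. All data are simulated, and the variable names and parameter values (`age_days`, `weight_kg`, `pitch_hz`) are hypothetical; pitch is made to depend on age only through weight.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_correlation(x, y, z):
    """Correlation of x and y after removing the linear effect of z from both."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Simulated (hypothetical) infants: weight grows with age, pitch falls with weight.
rng = random.Random(42)
age_days = [rng.uniform(1, 150) for _ in range(60)]
weight_kg = [0.5 + 0.004 * a + rng.gauss(0, 0.05) for a in age_days]
pitch_hz = [800 - 200 * w + rng.gauss(0, 20) for w in weight_kg]

raw_r = pearson(age_days, pitch_hz)
partial_r = partial_correlation(age_days, pitch_hz, weight_kg)
print(f"age vs pitch: r = {raw_r:.2f}; controlling for weight: r = {partial_r:.2f}")
```

In this simulated case, the strong raw correlation between age and pitch shrinks towards zero once weight is partialled out, which mirrors the logic of attributing an age-related change to growth. A parameter that remains correlated with age after this step would, by the same logic, point to something other than growth, such as practice.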
Figure 1.
Spectrograms of rhesus macaque coo calls. Recordings were obtained from two subjects at the age of one to two weeks, three to four weeks, and four to five months, respectively.
4. Comparison of baboon repertoires
Baboons constitute an interesting case to assess the influence of the degree of despotism and other social system characteristics on the vocal repertoire structure, as members of the genus vary greatly in the degree of male tolerance as well as their social organization [31]. The different species are distributed in a wide range of habitats across sub-Saharan Africa and the Arabian Peninsula (figure 2). In our analysis, we focused on three species: the highly despotic female philopatric chacma baboon, the less despotic female philopatric olive baboon and the tolerant male philopatric Guinea baboon [32].
Figure 2.
Spectrograms of baboon loud calls from Guinea baboons (Papio papio), olive baboons (Papio anubis) and chacma baboons (Papio ursinus). (a) Females and (b) males. (c) Distribution of different baboon species on the African continent. Baboon drawings by Steven Nash. (Online version in colour.)
We conducted a detailed acoustic analysis of two types of calls (grunts and loud calls) from the three species. Grunts are produced by males and females of all age classes immediately before or during affiliative interactions, but also before or during group movement [33,34]. Loud calls (figure 2) are also given by male and female animals of all age classes when they lose contact with the group or specific individuals [35]. In addition, loud calls are used as alarm calls upon sighting predators such as lions or leopards. Furthermore, chacma baboons use loud calls as part of their ‘wahoo displays’ in male–male competition [36].
The overall structure of these two call types was comparable in all three species. Yet, both grunts and loud calls varied between species in terms of several acoustic features, including call duration, mean fundamental frequency and mean peak frequency [32]. Based on a discriminant function analysis, it was possible to assign the calls to the respective species, but there was considerable misclassification. For female grunts, 65.5% of calls could be correctly classified; for males, the correct classification was 70.4% [32]. Loud calls showed greater variation between species, as predicted by the hypothesis that loud calls serve to distinguish one group from another, and correspondingly yielded better classification results. For females, 86.3% of calls were correctly assigned to the respective species; for males, the correct classification was 88.9%. Thus, although the general structure of the calls is comparable, detectable differences between the species exist. These may be partly due to size differences between the species. Interestingly, we found no correlation between the degree of acoustic differentiation and phylogenetic distance. By contrast, the acoustic variation between different species of leaf monkeys and gibbons revealed a high concordance with their phylogenetic distance [37,38]. The finding that the acoustic differentiation in baboons is less pronounced than in gibbons and leaf monkeys supports the idea that there were no strong selection pressures favouring species recognition and territorial defence in this genus [39].
In addition, we compared the overall structure of the vocal repertoires of Guinea and chacma baboons, as we had the most comprehensive database for these two species. We found an overall similarity between the most common calls of the two species. We therefore assume that the neural pattern generators giving rise to the different call types are highly conserved between the different baboon species. In summary, selective pressures such as sexual selection or inter-group competition may add to variation of calls, but only within certain constraints existing within the genus [32]. A further striking finding was that species that differ so prominently in their social system characteristics as olive and chacma baboons on the one hand, and Guinea baboons on the other, do not differ in terms of their vocal diversity [32].
A proper comparison of entire repertoires is difficult to achieve, as it takes an extreme sampling effort to collect a sufficient number of calls across all call types and from a sufficient number of individuals [32]. As we pointed out before [32], it would be necessary to collect data from several populations per species for a solid assessment. Yet, from the presently available data, it seems unlikely that a very different picture would emerge. All qualitative and quantitative comparisons available to date clearly suggest that the general call types do not vary fundamentally between different baboon species. Given that the six species differ considerably in terms of their social organization and the degree of sexual selection, this finding is quite remarkable. By contrast, behavioural dispositions such as aggressiveness vary strongly, most probably as a result of variation in inter- and intra-group competition. As a side note, the frequently held assumption that vocal complexity varies with social complexity [40] does not seem to hold in the genus Papio.
(a). Alarm calls in the genus Chlorocebus
Alarm calls are the focus of many of the most influential studies of primate vocal communication, specifically with regard to the semantic content [41–44] and syntactic properties [45–47] of primate vocalizations. The single most influential example in this context is the alarm call system of vervet monkeys, Chlorocebus pygerythrus. In brief, vervet monkeys have evolved different adaptive escape strategies in response to their main predator categories. They climb into trees upon the appearance of a leopard, scan the sky or run into cover when spotting an eagle, and stand bipedally after detecting a snake. They also produce different types of alarm calls in response to each of these main predator categories [48]. Playback experiments revealed that the calls alone are sufficient to elicit adaptive escape responses [49]. Yet, the classic study was conducted on a single population and, until recently, remarkably little was known about the variation between populations or species, although such knowledge provides important insights into the potential flexibility of the alarm call system. More specifically, such comparative research helps to distinguish between local and potentially conventionalized learnt communication systems, and rather hard-wired, evolved solutions in response to predation pressure. We therefore initiated a study of the alarm call system of a congener of vervet monkeys, namely the West African green monkey, Chlorocebus sabaeus, in the Niokolo Koba National Park in Senegal, with complementary research on a subspecies of C. pygerythrus in South Africa [50].
To elicit alarm calls, we presented snake and leopard models, as well as a model of an eagle perched on a tree to green monkeys [51]. The monkeys only responded with alarm calls, vigilance and escape responses to the snake and leopard models, while they largely ignored the eagle model. Putty-nosed monkeys, Cercopithecus nictitans martini, by contrast, produced strong alarm responses, including alarm calls, in response to a similar looking eagle model [52]. Because we had never observed the monkeys to give alarm calls in response to any of the birds of prey in the area since the beginning of our studies in 2009, we considered the possibility that the monkeys in this area are not preyed upon by raptors. This, in turn, provided us with the opportunity to present to them a novel aerial threat, a flying drone, to assess their vocal responses [53].
When we flew the drone over the monkeys, the animals produced distinct calls, and a number of subjects ran into cover [53,54]. We conducted an acoustic analysis of these calls and compared them with the calls given by members of the same study population in response to leopard and snake models recorded in a previous study [51]. Using the classification procedure of a discriminant function analysis (DFA), we found that for female subjects, 80.0% of calls could be correctly assigned to the context in which they were given. This was significantly better than chance, as assessed by a DFA based on permuted data [55]. Drone alarms were clearly distinct from the other two categories, with 95.2% correct classification. For male subjects, the overall correct classification was slightly lower, with 71.2% of calls correctly assigned to the context in which they were given, but yet again this was significantly better than chance [53]. Similar to the findings for females, male drone alarms could be most readily distinguished from the other two alarm call categories: 92.6% of drone calls by male green monkeys were correctly classified. In conclusion, the calls given in response to a novel flying object by green monkeys differed from those given in response to snake and leopard models.
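The logic of testing classification success against permuted data can be sketched as follows. This is not the procedure used in [53] or [55]: a leave-one-out nearest-centroid classifier stands in for the discriminant function analysis so that the example runs with the Python standard library alone, and the permutation simply shuffles context labels across calls without accounting for repeated calls from the same individual. The toy feature values, the function names and the number of permutations are hypothetical.

```python
import random


def centroid(rows):
    """Mean feature vector of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def loo_accuracy(features, labels):
    """Leave-one-out accuracy of a nearest-centroid classifier."""
    correct = 0
    for i, (x, true_label) in enumerate(zip(features, labels)):
        # Build class centroids without the held-out call.
        groups = {}
        for j, (row, lab) in enumerate(zip(features, labels)):
            if j != i:
                groups.setdefault(lab, []).append(row)
        centroids = {lab: centroid(rows) for lab, rows in groups.items()}
        # Assign the held-out call to the closest centroid (squared Euclidean).
        predicted = min(
            centroids,
            key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])),
        )
        correct += predicted == true_label
    return correct / len(labels)


def permutation_p_value(features, labels, n_perm=200, seed=0):
    """Observed accuracy and the share of label shuffles that do at least as well."""
    rng = random.Random(seed)
    observed = loo_accuracy(features, labels)
    shuffled = list(labels)
    at_least_as_good = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if loo_accuracy(features, shuffled) >= observed:
            at_least_as_good += 1
    # Add-one correction so the p-value is never exactly zero.
    return observed, (at_least_as_good + 1) / (n_perm + 1)


# Hypothetical toy data: (duration in ms, mean frequency in Hz) per context.
# Features are left unscaled for brevity; real analyses would standardize them.
rng = random.Random(1)
contexts = {"aerial": (60, 400), "leopard": (200, 900), "snake": (100, 1500)}
features, labels = [], []
for context, (dur, freq) in contexts.items():
    for _ in range(15):
        features.append([rng.gauss(dur, 15), rng.gauss(freq, 100)])
        labels.append(context)

accuracy, p_value = permutation_p_value(features, labels)
print(f"accuracy = {accuracy:.2f}, permutation p = {p_value:.3f}")
```

Shuffling the labels preserves the number of calls per context while breaking any link between context and acoustic structure, so the distribution of accuracies from shuffled data gives the chance level against which the observed classification rate is judged.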
In a second step, we investigated how the green monkey alarm calls compared with those of their East African congeners. We were particularly interested in how green monkey calls given in response to the drone compared with vervet monkey eagle alarms. For this comparison, we used the original recordings of alarm calls collected by Tom Struhsaker, and Dorothy Cheney and Robert Seyfarth that were part of a previous analysis of the vervet monkey alarm call repertoire [56]. Visual inspection of the spectrograms (figure 3) as well as a statistical analysis of key acoustic parameters for females and males (table 1) revealed significant variation in relation to context and only marginal differences in relation to species. For females, both species revealed a relatively similar pattern: alarm calls given to leopards had the longest element duration, calls given in response to snakes had the highest frequency range and calls given in response to aerial threats had the lowest mean frequency. For males, the picture was more differentiated: leopard alarms were much longer than snake alarms in green monkeys, but only slightly longer in vervet monkeys. In green monkeys, snake alarms had the highest frequency range, while in vervets, the frequency range did not differ between leopard and snake alarms. Aerial alarms had the lowest frequency characteristics in both species (figure 4). These differences between species for male calls point to a differential role of sexual selection; this conjecture requires further investigation.
Figure 3.
Spectrograms of West African green monkey and East African vervet monkey alarm calls, in response to different predator/threat types. (a) Female green monkey calls; (b) male green monkey calls; (c) female vervet monkey calls; and (d) male vervet monkey calls.
Table 1.
Variation in savannah monkey alarm calls. Results of multivariate analysis of variance (only p-values indicated, corrected for multiple testing) of four of the most decisive acoustic variables assessing differences in relation to species and context in Chlorocebus sabaeus and Chlorocebus pygerythrus alarm calls, separately for females and males.
acoustic parameters | females: species | females: alarm context | males: species | males: alarm context |
---|---|---|---|---|
call duration | 0.212 | 0.000 | 0.788 | 0.000 |
DFA2 mean | 0.212 | 0.000 | 0.788 | 0.000 |
DF1 mean | 0.168 | 0.000 | 0.788 | 0.000 |
frequency range | 0.016 | 0.000 | 0.744 | 0.000 |
Figure 4.
Acoustic differences of Chlorocebus alarm calls in relation to context and species. Boxplots and individual values for female and male vervet monkey (C.p.) and green monkey (C.s.) alarm calls (blue: aer, aerial; orange: leo, leopard; green: snk, snake). (a) Element duration, (b) mean of the central frequency (DFA2), (c) mean dominant frequency (DF1) and (d) mean frequency range. Boxplots indicate median and interquartile range. Whiskers show values within 1.5 times of the interquartile range. Dots represent individual values. Reprinted with permission from [53].
With 80% correct classification of calls to the three contexts in the discriminant function analysis, however, female green monkey alarm calls were less distinct from each other than the alarm calls of female vervet monkeys. In vervets, the correct classification of calls given in response to eagles, leopards and snakes was 93.3%. In male green monkeys, the alarm calls were also less distinct, with 71.2% correct classification, than those of male vervet monkeys, with 81.3% correct classification. The aerial alarms had the highest similarity values between the two species (figure 5).
Figure 5.
Heat maps reflecting the acoustic similarity of West African green monkey (GM) and East African vervet monkey alarm calls (V). (a) Males and (b) females. aer, aerial alarms; snk, snake alarms; leo, leopard alarms.
These findings extend and confirm an earlier study of the acoustic variation of male Chlorocebus barks [50]. Male barks of two subspecies of C. pygerythrus, whose last common ancestor lived about 1.5 Ma ([57], but see [58]), revealed only marginal acoustic differences, and males of C. sabaeus, whose last common ancestor with C. pygerythrus lived around 2.1 Ma, also produce barks with a highly similar call structure [50]. In summary, despite considerable geographical and phylogenetic distance, the overall structure of these calls and the structure of the alarm call repertoire of these two species appear highly conserved. Studies of further members of the genus would be needed to determine whether this conclusion holds for the entire genus.
(b). Rapid comprehension learning in green monkeys
While our survey of the variation in vocal production revealed only moderate flexibility, there is ample evidence that recipients are able to learn to attribute meaning to a variety of sounds (see [59] for a review). Our study on the green monkeys' responses to the drone provided us with the opportunity to test just how rapidly the animals attached meaning to the sound of the drone. When we prepared to fly the drone over the monkeys for the first time, we noted that the animals appeared to respond to the sound of the drone even before it became visible. This suggested that they are highly sensitive to novel sounds in the environment; further experimental studies are needed to test this notion.
More important in this context, after we had presented the drone, we conducted a playback experiment in which we played back the sound of the drone to the animals [53]. Following the presentation of the drone sound, the monkeys looked significantly longer in the direction of the loudspeaker and were more vigilant than in the control condition in which we played different familiar broad-band noises, such as the sound of the nearby generator. This was true even after the drone had only been presented a single time. Strikingly, the animals were also more likely to look up and scan the sky after the presentation of the drone sound compared with control sounds. In three cases, the subject immediately ran into cover after the presentation of the drone sound. This never happened in response to control sounds [53]. These findings are relevant for two reasons. First, they demonstrate that comprehension learning may be extremely rapid. Second, they reinforce the question of why operant conditioning in the auditory domain is so (excruciatingly) difficult to achieve in the laboratory. Numerous studies have attempted to train monkeys in auditory discrimination tasks and typically needed hundreds of trials [60]. Why nonhuman primates struggle in such laboratory settings, while they may immediately comprehend what a sound predicts in more naturalistic settings, remains a question for further investigation.
5. Overcoming the dichotomous view of primate vocal production learning
The available evidence for plasticity in nonhuman primate vocal production learning suggests two fundamental principles. First, the overall patterns seem to be relatively strongly genetically fixed—in some cases, not only at the level of the species but also at the level of the genus. Second, within the species-specific reaction norms, a certain degree of plasticity appears to be possible, resulting in minor modifications of vocal output in relation to experience. The idea of minor modification as a result of social experience within species-specific typical patterns is not new [61–63], yet we now know much more about the extent of this plasticity during development. At the same time, we also know more about the tight link between phylogenetic relationships and acoustic variation between species. In the light of the available evidence, we feel that it is time to overcome the debate of whether or not nonhuman primates have vocal production learning. To fundamentally advance our understanding of primate vocal production learning, we must turn to the mechanisms that support vocal adjustment in relation to (auditory) experience.
Along similar lines, Arriaga & Jarvis [64] proposed the ‘continuum hypothesis’ for vocal learning in a broader range of taxa (including mice and birds). Arriaga & Jarvis argued that progress in the field of vocal production learning may be hampered by the classification of species into ‘haves’ and ‘have-nots’. They distinguished between vocalizations based on a template and vocalizations generated de novo. Vocalizations based on a template could range from calls that are strictly determined by an innate central pattern generator, to those where the spectral–temporal structure is ‘guided by an externally acquired target (imitation-based modification of a template)’ [64, p. 112]. Vocalizations generated de novo encompass improvisation, i.e. the versatile use of elements that need not have been acquired by experience, as well as full mimicry, i.e. the ‘modification of vocal output guided by an externally acquired target’ [64, p. 113]. Song learning in songbirds typically falls under this latter category, where young birds acquire the template of the adult song by exposure to their father's or other adults’ song during a sensitive phase, and then proceed through a phase of ‘babbling’ or practice, until the song crystallizes [65].
While we generally agree with Arriaga & Jarvis [64] that it is time to overcome the dichotomous view of vocal production learning, we caution that the assumption of a continuum may be premature, or even slightly misleading, as the adjustment of vocal output in relation to (auditory) experience may be mediated by very different mechanisms. A ‘many roads’ metaphor might therefore be more appropriate, at least for the time being. We have identified the following (non-exhaustive) list of mechanisms that underpin vocal production in primates (see [4] for the neurobiology underpinning vocal production in primates).
- (i) Null model: innate call types with potential for within- and between-call type variability in relation to arousal and valence (and perhaps potency) of the caller. No auditory input necessary. This model is sufficient to account for context- and urgency-related variation in nonhuman vocal production.
- (ii) General auditory facilitation: auditory input increases the likelihood of vocal production (irrespective of structure); if increased auditory input consistently leads to increased vocal output, then this may affect maturation processes and developmental trajectories. This may be a simple explanation for some of the variation observed in marmosets deprived of parental input, if, but only if, these deprived subjects indeed call less than their normally reared counterparts.
- (iii) Specific auditory facilitation: auditory input of specific call types facilitates the production of the corresponding call type by the listener. This mechanism presupposes that several variants of the same call type may be represented in the caller's brain, of which certain variants are preferentially activated in response to specific auditory input. This mechanism has been suggested to underlie ‘action-based learning’ [63] and may account for the emergence of group-specific calls observed in different primate species.
- (iv) Learning from success: subjects may learn that the production of specific call variants is more likely to elicit the desired response than other call types. Given that nonhuman primates appear to have a certain degree of control over call usage, this model may provide an alternative explanation for the occurrence of group-specific calls.
- (v) Vocal copying: the vocal production is shaped according to an auditory template stored as a short-term sensory representation.
- (vi) Template learning: an auditory template is acquired during a sensitive phase and vocal output modified to match that template.
- (vii) Innovation/improvisation: volitional generation of (literally) unheard voiced sounds.
We suggest that (i)–(iv) may be found in nonhuman primates, with (i) explaining the greatest amount of variation in nonhuman primate vocalizations. The evidence for (v) in nonhuman primates is weak (or difficult to judge), and for (vi) and (vii) largely absent, at least for voiced sounds. Note that after intensive training, some individuals, such as the chimpanzee Viki, were able to produce an extremely small number of ‘words’, including ‘cup’, ‘up’ and ‘mama’ (reviewed in [66]), but there seems to be no inclination on the animals' side to acquire speech. Instead, there is wide-ranging consensus that nonhuman primates are not obligate learners [4]. A crucial target for further research is to identify the mechanisms that give rise to acoustic changes in relation to social experience, as described for chimpanzees by Watson et al. [8]. In principle, the observed changes could be due to specific auditory facilitation or learning from success and need not constitute instances of vocal learning.
Although the proposed framework may be far from perfect, it stresses the importance of sensory–motor integration at a very basal level. Furthermore, the framework lends itself to empirical tests. Regarding the developmental trajectory of vocalizations, it is critical to control for practice and to assess the extent to which auditory input mostly functions as a trigger of infant vocal output (ii). Ethical considerations discourage the ideal experiment in which infants would be temporarily muted. To test (iii), one would need to expose subjects to specific call variants that fall within the species-typical frame and test whether this alters the call characteristics of vocal output compared with a baseline condition. Likewise, very detailed studies of the use of specific variants and the effects they may have on conspecifics may provide a test of (iv).
We assume that (i) to (iv) are ancestral and also present in humans. Humans also show minor adjustments of their speech in the form of ‘vocal accommodation’ [67]. Yet, the development of speech critically relies on a combination of (v) to (vii), with speech acquisition constituting a prime example of obligate auditory learning [68]. The list of mechanisms compiled above is compatible with the dual network (or dual pathway [69]) model of primate vocal production, which suggests an increasing integration of the more ancestral vocal motor network, which produces species-specific calls with a largely fixed structure, with more derived cortical networks [70,71]. To summarize, we strongly suggest that future research should aim to clarify the mechanisms supporting variation in vocal output in relation to experience, while the question of ‘whether or not’ nonhuman primates have vocal learning should be set aside. The questions are to what extent they reveal vocal production learning, how this is mediated, and how much vocal production is integrated with the processing of information from social and non-social sources other than auditory input.
Acknowledgements
We thank Ludwig Ehrenreich for help with the figures.
Data accessibility
This article has no additional data.
Authors' contributions
J.F. and K.H. conceived the manuscript, wrote the paper and gave final approval of the version to be published.
Competing interests
We declare we have no competing interests.
Funding
This research is supported by the Leibniz ScienceCampus Primate Cognition funded by the Leibniz Association.
References
- 1. Janik VM, Slater PJB. 2000. The different roles of social learning in vocal communication. Anim. Behav. 60, 1–11. (doi:10.1006/anbe.2000.1410)
- 2. Seyfarth RM, Cheney DL. 1997. Some general features of vocal development in nonhuman primates. In Social influences on vocal development (eds Snowdon CT, Hausberger M), pp. 249–273. Cambridge, UK: Cambridge University Press.
- 3. Petkov CI, Jarvis ED. 2012. Birds, primates, and spoken language origins: behavioral phenotypes and neurobiological substrates. Front. Evol. Neurosci. 4, 12. (doi:10.3389/fnevo.2012.00012)
- 4. Fischer J, Hage SR. 2019. Primate vocalization as a model for human speech: scopes and limits. In Human language: from genes and brains to behavior (ed. Hagoort P), pp. 639–656. Cambridge, MA: MIT Press.
- 5. Hammerschmidt K, Freudenstein T, Jürgens U. 2001. Vocal development in squirrel monkeys. Behaviour 138, 1179–1204. (doi:10.1163/156853901753287190)
- 6. Winter PP, Handley D, Schott D. 1973. Ontogeny of squirrel monkey calls under normal conditions and under acoustic isolation. Behaviour 47, 230–239. (doi:10.1163/156853973X00085)
- 7. Owren MJ, Dieter JA, Seyfarth RM, Cheney DL. 1993. Vocalizations of rhesus (Macaca mulatta) and Japanese (M. fuscata) macaques cross-fostered between species show evidence of only limited modification. Dev. Psychobiol. 26, 389–406. (doi:10.1002/dev.420260703)
- 8. Watson SK, Townsend SW, Schel AM, Wilke C, Wallace EK, Cheng L, West V, Slocombe KE. 2015. Vocal learning in the functionally referential food grunts of chimpanzees. Curr. Biol. 25, 495–499. (doi:10.1016/j.cub.2014.12.032)
- 9. Watson SK, Townsend SW, Schel AM, Wilke C, Wallace EK, Cheng L, West V, Slocombe KE. 2015. Reply to Fischer et al. Curr. Biol. 25, R1030–R1031. (doi:10.1016/j.cub.2015.09.024)
- 10. Fischer J, Wheeler BC, Higham JP. 2015. Is there any evidence for vocal learning in chimpanzee food calls? Curr. Biol. 25, R1028–R1029. (doi:10.1016/j.cub.2015.09.010)
- 11. Crockford C, Boesch C. 2003. Context-specific calls in wild chimpanzees, Pan troglodytes verus: analysis of barks. Anim. Behav. 66, 115–125. (doi:10.1006/anbe.2003.2166)
- 12. Mitani JC, Hasegawa T, Gros-Louis J, Marler P, Byrne RW. 1992. Dialects in wild chimpanzees? Am. J. Primatol. 27, 233–243. (doi:10.1002/ajp.1350270402)
- 13. Marshall AJ, Wrangham RW, Arcadi AC. 1999. Does learning affect the structure of vocalizations in chimpanzees? Anim. Behav. 58, 825–830. (doi:10.1006/anbe.1999.1219)
- 14. Mitani JC, Hunley KL, Murdoch ME. 1999. Geographic variation in the calls of wild chimpanzees: a reassessment. Am. J. Primatol. 47, 133–151.
- 15. Marler P. 1969. Vocalizations of wild chimpanzees—an introduction. In Proc. 2nd Int. Cong. Primatology, Atlanta, 1968 (eds Carpenter CR, Hofer H), pp. 94–100. Basel, Switzerland: Karger.
- 16. Mitani JC, Macedonia JM. 1996. Selection for acoustic individuality within the vocal repertoire of wild chimpanzees. Int. J. Primatol. 17, 569–583. (doi:10.1007/BF02735192)
- 17. Zahed SR, Prudom SL, Snowdon CT, Ziegler TE. 2007. Male parenting and response to infant stimuli in the common marmoset (Callithrix jacchus). Am. J. Primatol. 70, 84–92. (doi:10.1002/ajp.20460)
- 18. Bezerra BM, Souto A. 2008. Structure and usage of the vocal repertoire of Callithrix jacchus. Int. J. Primatol. 29, 671–701. (doi:10.1007/s10764-008-9250-0)
- 19. Schrader L, Todt D. 1993. Contact call parameters covary with social context in common marmoset, Callithrix j. jacchus. Anim. Behav. 46, 1026–1028. (doi:10.1006/anbe.1993.1288)
- 20. Miller CT, Beck K, Meade B, Wang X. 2009. Antiphonal call timing in marmosets is behaviorally significant: interactive playback experiments. J. Comp. Physiol. A 195, 783–789. (doi:10.1007/s00359-009-0456-1)
- 21. Takahashi DY, Fenley AR, Teramoto Y, Narayanan DZ, Borjon JI, Holmes P, Ghazanfar AA. 2015. The developmental dynamics of marmoset monkey vocal production. Science 349, 734–738. (doi:10.1126/science.aab1058)
- 22. Kato M, Okanoya K, Koike T, Sasaki E, Okano H, Watanabe S, Iriki A. 2014. Human speech- and reading-related genes display partially overlapping expression patterns in the marmoset brain. Brain Lang. 133, 26–38. (doi:10.1016/j.bandl.2014.03.007)
- 23. Teramitsu I, Kudo LC, London SE, Geschwind DH, White SA. 2004. Parallel FoxP1 and FoxP2 expression in songbird and human brain predicts functional interaction. J. Neurosci. 24, 3152–3163. (doi:10.1523/jneurosci.5589-03.2004)
- 24. Takahashi DY, Liao DA, Ghazanfar AA. 2017. Vocal learning via social reinforcement by infant marmoset monkeys. Curr. Biol. 27, 1844–1852.e6. (doi:10.1016/j.cub.2017.05.004)
- 25. Gultekin YB, Hage SR. 2017. Limiting parental feedback disrupts vocal development in marmoset monkeys. Nat. Commun. 8, 14046. (doi:10.1038/ncomms14046)
- 26. Gultekin YB, Hage SR. 2018. Limiting parental interaction during vocal development affects acoustic call structure in marmoset monkeys. Sci. Adv. 4, 11. (doi:10.1126/sciadv.aar4012)
- 27. Hammerschmidt K, Newman JD, Champoux M, Suomi SJ. 2000. Changes in rhesus macaque ‘coo’ vocalizations during early development. Ethology 106, 873–886. (doi:10.1046/j.1439-0310.2000.00611.x)
- 28. Fitch WT. 1997. Vocal tract length and formant frequency dispersion correlate with body size in rhesus macaques. J. Acoust. Soc. Am. 102, 1213–1222. (doi:10.1121/1.421048)
- 29. Pfefferle D, Fischer J. 2006. Sounds and size: identification of acoustic variables that reflect body size in hamadryas baboons, Papio hamadryas. Anim. Behav. 72, 43–51. (doi:10.1016/j.anbehav.2005.08.021)
- 30. Häusler U. 2000. Vocalization-correlated respiratory movements in the squirrel monkey. J. Acoust. Soc. Am. 108, 1443–1450.
- 31. Fischer J, et al. 2017. Charting the neglected west: the social system of Guinea baboons. Am. J. Phys. Anthropol. 162, 15–31. (doi:10.1002/ajpa.23144)
- 32. Hammerschmidt K, Fischer J. 2019. Baboon vocal repertoires and the evolution of primate vocal diversity. J. Hum. Evol. 126, 1–13. (doi:10.1016/j.jhevol.2018.10.010)
- 33. Rendall D, Seyfarth RM, Cheney DL, Owren MJ. 1999. The meaning and function of grunt variants in baboons. Anim. Behav. 57, 583–592. (doi:10.1006/anbe.1998.1031)
- 34. Meise K, Keller C, Cowlishaw G, Fischer J. 2011. Sources of acoustic variation: implications for production specificity and call categorization in chacma baboon (Papio ursinus) grunts. J. Acoust. Soc. Am. 129, 1631–1641. (doi:10.1121/1.3531944)
- 35. Ey E, Hammerschmidt K, Seyfarth RM, Fischer J. 2007. Age- and sex-related variations in clear calls of Papio ursinus. Int. J. Primatol. 28, 947–960. (doi:10.1007/s10764-007-9139-3)
- 36. Fischer J, Kitchen DM, Seyfarth RM, Cheney DL. 2004. Baboon loud calls advertise male quality: acoustic features and their relation to rank, age, and exhaustion. Behav. Ecol. Sociobiol. 56, 140–148. (doi:10.1007/s00265-003-0739-4)
- 37. Meyer D, Hodges JK, Rinaldi D, Wijaya A, Roos C, Hammerschmidt K. 2012. Acoustic structure of male loud-calls support molecular phylogeny of Sumatran and Javanese leaf monkeys (genus Presbytis). BMC Evol. Biol. 12, 16. (doi:10.1186/1471-2148-12-16)
- 38. Van Ngoc T, Hallam C, Roos C, Hammerschmidt K. 2011. Concordance between vocal and genetic diversity in crested gibbons. BMC Evol. Biol. 11, 36. (doi:10.1186/1471-2148-11-36)
- 39. Wilkins MR, Seddon N, Safran RJ. 2013. Evolutionary divergence in acoustic signals: causes and consequences. Trends Ecol. Evol. 28, 156–166. (doi:10.1016/j.tree.2012.10.002)
- 40. Freeberg TM. 2006. Social complexity can drive vocal complexity: group size influences vocal information in Carolina chickadees. Psychol. Sci. 17, 557–561. (doi:10.1111/j.1467-9280.2006.01743.x)
- 41. Wheeler BC, Fischer J. 2012. Functionally referential signals: a promising paradigm whose time has passed. Evol. Anthropol. 21, 195–205. (doi:10.1002/evan.21319)
- 42. Zuberbühler K. 2003. Referential signaling in non-human primates—cognitive precursors and limitations for the evolution of language. Adv. Study Behav. 33, 265–307. (doi:10.1016/S0065-3454(03)33006-2)
- 43. Townsend SW, Manser MB. 2013. Functionally referential communication in mammals: the past, present and the future. Ethology 119, 1–11. (doi:10.1111/eth.12015)
- 44. Marler P, Evans CS, Hauser MD. 1992. Animal signals: motivational, referential, or both? In Nonverbal vocal communication (eds Papoušek H, Jürgens U, Papoušek M), pp. 66–86. Cambridge, UK: Cambridge University Press.
- 45. Arnold K, Zuberbühler K. 2008. Meaningful call combinations in a non-human primate. Curr. Biol. 18, 202–203. (doi:10.1016/j.cub.2008.01.040)
- 46. Schamberg I, Cheney DL, Clay Z, Hohmann G, Seyfarth RM. 2016. Call combinations, vocal exchanges and interparty movement in wild bonobos. Anim. Behav. 122, 109–116. (doi:10.1016/j.anbehav.2016.10.003)
- 47. Coye C, Ouattara K, Arlet ME, Lemasson A, Zuberbühler K. 2018. Flexible use of simple and combined calls in female Campbell's monkeys. Anim. Behav. 141, 171–181. (doi:10.1016/j.anbehav.2018.05.014)
- 48. Struhsaker TT. 1967. Auditory communication among vervet monkeys (Cercopithecus aethiops). In Social communication among primates (ed. Altmann S), pp. 281–324. Chicago, IL: University of Chicago Press.
- 49. Seyfarth RM, Cheney DL, Marler P. 1980. Vervet monkey alarm calls: semantic communication in a free-ranging primate. Anim. Behav. 28, 1070–1094. (doi:10.1016/S0003-3472(80)80097-2)
- 50. Price T, Ndiaye O, Hammerschmidt K, Fischer J. 2014. Limited geographic variation in the acoustic structure of and responses to adult male alarm barks of African green monkeys. Behav. Ecol. Sociobiol. 68, 815–825. (doi:10.1007/s00265-014-1694-y)
- 51. Price T, Fischer J. 2014. Meaning attribution in the West African green monkey: influence of call type and context. Anim. Cogn. 17, 277–286. (doi:10.1007/s10071-013-0660-9)
- 52. Arnold K, Pohlner Y, Zuberbühler K. 2008. A forest monkey's alarm call series to predator models. Behav. Ecol. Sociobiol. 62, 549–559. (doi:10.1007/s00265-007-0479-y)
- 53. Wegdell FL, Hammerschmidt K, Fischer J. 2019. Conserved alarm calls but rapid auditory learning in monkey responses to novel flying objects. Nat. Ecol. Evol. 3, 1039–1042. (doi:10.1038/s41559-019-0903-5)
- 54. Fischer J, Wegdell FL, Hammerschmidt K. 2019. Data from: Conserved alarm calls but rapid auditory learning in monkey responses to novel flying objects. OSF. (doi:10.17605/OSF.IO/F4UTP)
- 55. Mundry R, Sommer C. 2007. Discriminant function analysis with nonindependent data: consequences and an alternative. Anim. Behav. 74, 965–976. (doi:10.1016/j.anbehav.2006.12.028)
- 56. Price T, Wadewitz P, Cheney DL, Seyfarth RM, Hammerschmidt K, Fischer J. 2015. Vervets revisited: a quantitative analysis of alarm call structure and context specificity. Scient. Rep. 5, 13220. (doi:10.1038/srep13220)
- 57. Perelman P, et al. 2011. A molecular phylogeny of living primates. PLoS Genet. 7, e1001342. (doi:10.1371/journal.pgen.1001342)
- 58. Warren WC, et al. 2015. The genome of the vervet (Chlorocebus aethiops sabaeus). Genome Res. 4, 1921–1933. (doi:10.1101/gr.192922.115.25)
- 59. Fischer J, Price T. 2017. Meaning, intention, and inference in primate vocal communication. Neurosci. Biobehav. Rev. 82, 22–31. (doi:10.1016/j.neubiorev.2016.10.014)
- 60. Scott BH, Mishkin M. 2016. Auditory short-term memory in the primate auditory cortex. Brain Res. 1640, 264–277. (doi:10.1016/j.brainres.2015.10.048)
- 61. Fischer J. 2002. Developmental modifications in the vocal behaviour of nonhuman primates. In Primate audition (ed. Ghazanfar AA), pp. 109–125. Boca Raton, FL: CRC Press.
- 62. Hammerschmidt K, Fischer J. 2008. Constraints in primate vocal production. In Evolution of communicative flexibility: complexity, creativity, and adaptability in human and animal communication (eds Oller DK, Griebel U), pp. 93–119. Cambridge, MA: MIT Press.
- 63. Fischer J. 2008. Transmission of acquired information in nonhuman primates. In Learning and memory: a comprehensive reference (eds Menzel R, Byrne J), pp. 299–313. Oxford, UK: Elsevier.
- 64. Arriaga G, Jarvis ED. 2013. Mouse vocal communication system: are ultrasounds learned or innate? Brain Lang. 124, 96–116. (doi:10.1016/j.bandl.2012.10.002)
- 65. Catchpole CK, Slater PJB. 2008. Bird song: biological themes and variations, 2nd edn. Cambridge, UK: Cambridge University Press.
- 66. Wallman J. 1992. Aping language. Cambridge, UK: Cambridge University Press.
- 67. Snowdon CT. 1997. Affiliative processes and vocal development. Ann. N. Y. Acad. Sci. 807, 340–351. (doi:10.1111/j.1749-6632.1997.tb51931.x)
- 68. Westermann G, Mani N (eds). 2017. Early word learning (Current issues in developmental psychology series). Oxford, UK: Taylor & Francis.
- 69. Jürgens U. 2009. The neural control of vocalization in mammals: a review. J. Voice 23, 1–10. (doi:10.1016/j.jvoice.2007.07.005)
- 70. Hage SR, Nieder A. 2016. Dual neural network model for the evolution of speech and language. Trends Neurosci. 39, 813–829. (doi:10.1016/j.tins.2016.10.006)
- 71. Owren MJ, Amoss RT, Rendall D. 2011. Two organizing principles of vocal production: implications for nonhuman and human primates. Am. J. Primatol. 73, 530–544. (doi:10.1002/ajp.20913)