Evolutionary Biology. 2011 Dec 10;39(3):415–418. doi: 10.1007/s11692-011-9151-6

Orangutan Instrumental Gesture-Calls: Reconciling Acoustic and Gestural Speech Evolution Models

Adriano R Lameira 1, Madeleine E Hardus 2, Serge A Wich 3,4
PMCID: PMC3423562  PMID: 22923853

Call control allows an organism to produce an acoustic signal irrespective of its own underlying emotional state. It is thus a prerequisite to “higher” abilities, such as call imitation, innovation and the use of arbitrary or deceptive calls, and therefore to speech. However, among primates, call control is presumed to be largely confined to humans (Seyfarth and Cheney 2008). Consequently, there is little agreement about its evolutionary precursors (Christiansen and Kirby 2003). Essentially, two major models and lines of evidence have been proposed: speech evolved (1) as an extension of acoustic communication in non-human primates (e.g. Seyfarth et al. 1980; Slocombe and Zuberbühler 2005; Arnold and Zuberbühler 2006; Wich et al. 2009) or (2) from non-human primate gestural communication (e.g. Rizzolatti and Arbib 1998; Corballis 2003; Arbib et al. 2008). These models have been seen as mutually exclusive or as sequential accounts in which calls replace gestures (Brown et al. 1999); however, both face limitations concerning the emergence of call control in our evolutionary lineage. Did call control derive from an essentially emotional call use, or from an essentially voluntary gesture use, like that of non-human primates? The acoustic model needs to explain how a fundamentally close-ended acoustic system became open-ended (i.e. with a limitless number of elements, as in speech). The gestural model needs to clarify the behaviors and respective functional advantages that allowed a shift (or “translation”) from an open-ended gestural system to an open-ended acoustic system.

Other important evolutionary models, such as those on syntax (e.g. Scott-Phillips and Kirby 2010), protolanguage (e.g. Mithen 2005), musilanguage (e.g. Brown et al. 1999), linguistic categories (e.g. Puglisi et al. 2008), increased breathing control (e.g. Maclarnon and Hewitt 2004) and iterated learning (e.g. Smith et al. 2003), and including some that merge acoustic and gestural models, such as those on Motherese (e.g. Falk 2004) and frame/content (e.g. MacNeilage 1998), commonly begin with a hypothetical organism that is equipped a priori with call control, or overlook the behaviors that may have provided the functional advantages towards call control. We propose that recent orangutan (Pongo pygmaeus wurmbii) findings address and reconcile the limitations of these models. Arguments supporting the above-mentioned models are compatible with the view presented.

Recently we have described (Hardus et al. 2009a) how and why wild orangutans use gestures to functionally alter the acoustic characteristics of a particular sound (sensu Lameira et al. 2010) emitted under disturbing contexts, the kiss squeak (Hardus et al. 2009b). By positioning a hand or holding leaves in front of their lips, wild orangutans lower the maximum frequency (i.e. that of highest dB) but keep other parameters of the call similar. Evidence suggests that kiss squeaks are under voluntary motor control in orangutans. When individuals produce these modified variants of the call, they sound as if their body size is bigger than it actually is, reinforcing this impression on a potential predator and potentially deterring it through functional deception.

Kiss squeaks with a hand and on leaves represent, to the best of our knowledge, the only example of instrumental gesture-calls (IGC) in non-human primates. IGC can be defined as gestures that modify oro-laryngeal acoustic production, with or without tools, such as finger-assisted whistling or brass-/woodwind-instrument playing. To achieve this acoustic modification, some sort of physical contact between hands/tools and lips, and possibly tongue, is critically required. Mere physical proximity is unlikely to modify a call considerably, as, for instance, when “loud speaking” through funneled hands. These gestures are importantly distinct from gestures that produce an acoustic signal themselves, with or without tools, and that can be made during call production. Such acoustic gesture-calls have been reported in other ape species (Arcadi et al. 1998) and are possibly present in most non-human primate species, such as when making noisy displays during loud calls and/or alarm calling, by slapping the ground or strongly striking branches. Heuristically, gestures may be considered additive in acoustic gesture-calls, whereas gestures in IGC may be considered multiplicative.

IGC in hominids multiply the number of call-types comprising the acoustic repertoire in an extremely simple way: one call-type used in combination with different gestures produces new call-types. That is, an individual can augment its innate acoustic repertoire solely by means of an ability already present: gesture control. It is very likely that our ape/hominid ancestors would have exploited such a “new” repertoire when available, as a means to transmit more (graded) information, since cognitive abilities in non-human primates have been demonstrated to be richer and more advanced than their acoustic counterparts (Seyfarth and Cheney 2010).
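The additive/multiplicative heuristic can be illustrated with a simple count. This is only a sketch under stated assumptions: the symbols C (number of innate call-types), G (number of call-modifying gestures) and D (number of sound-producing gestures) are introduced here purely for illustration, as is the simplifying assumption that every modifying gesture can be combined with every call-type. Only the kiss squeak case (one call-type produced unaided, with a hand, or on leaves) reflects the observations described above; the other figures are hypothetical.

```latex
% C: innate call-types (hypothetical symbol)
% G: call-modifying gestures, as in IGC (hypothetical symbol)
% D: sound-producing gestures, as in acoustic gesture-calls (hypothetical symbol)
\begin{align*}
N_{\text{additive}} &= C + D \\
N_{\text{multiplicative}} &= C\,(1 + G) \\
\text{Kiss squeak: } C = 1,\ G = 2 &\Rightarrow N_{\text{multiplicative}} = 1 \times (1 + 2) = 3 \\
\text{Hypothetical: } C = 10,\ G = 2,\ D = 2 &\Rightarrow N_{\text{additive}} = 12,\quad N_{\text{multiplicative}} = 30
\end{align*}
```

Under these assumptions, each additional modifying gesture adds C new variants to the repertoire, whereas each additional sound-producing gesture adds only one new element.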

We hypothesise that IGC, dating back to the hominid-pongid split (9–13 m.y.a.; Hobolth et al. 2011), may have provided the direct functional and neural sensory-motor basis towards call control in an early human ancestor essentially lacking this ability; that is, they served as an exaptation for this ability. IGC are remarkable in that they bring into close temporal, motivational, contextual, anatomical and functional association both the gestural and oro-laryngeal systems of motor control in the communication domain. Hand-assisted feeding, for instance, raises the same associations between gestural and oro-laryngeal systems of motor control but in the foraging domain. IGC therefore necessarily comprise the expression of synchronous activations of multiple neural sensory-motor systems in the ape brain. In the ape cerebral cortex, such activations will mainly occur within regions homologous to the cortical homunculus (which comprises the primary motor cortex, which plays a crucial role in general voluntary motor control) and between the cortical homunculus and other cortical systems involved in the domain of communication, such as those homologous to Broca’s and Wernicke’s areas (Taglialatela et al. 2011). Such synchronous activations may have provided a neural interface between the brain areas activated, through functional integration and clustering (Tononi et al. 1998a, b), enabling the sharing of abilities which were previously fundamentally restricted or segregated to particular areas. By means of cortical and neural plasticity (Lieberman 2002a), as in, for example, use-dependent functional reorganization of sensory cortices (Pantev et al. 1998), this interface would have set the basis for the establishment of enhanced and more resilient short- and long-distance circuits. Indeed, cortical and neural plasticity is at the basis of hemispheric asymmetries in key areas of the ape and human brain for communicative signaling (Hopkins and Nir 2010; Perani et al. 2011).

As the focus of voluntary control, the cortical homunculus would represent the main stage for these circuit modifications. The number of areas activated in this region and their mutual proximity would add up to form a momentary local hotspot of activations sufficient to ignite neighbouring areas over which there was previously little voluntary control. Namely, circuitry between the respiration, hand, face, lips, and tongue (somatotopic) locations would expand to include that of larynx areas. These circuits would not necessarily need to be established de novo; instead, they would only need to modestly build and expand on previously existing ones. For instance, a rudimentary but functionally relevant interface between hand, respiration and laryngeal locations (and possibly lips and tongue) is already present in the ape brain, in that use of the right hand for gestures is significantly enhanced when the gestures are accompanied by a call (Hopkins and Cantero 2003). At the same time, pathways between the primary motor cortex and nucleus ambiguus (site of the laryngeal motor-neurons in the medulla oblongata), which are specifically interpreted as representing a crucial neural step in gaining call control (Fitch 2005; Brown et al. 2008), are found in apes but not in monkeys (Kuypers 1958), substantiating the view that a rudimentary interface is already present between systems.

In humans, neuroimaging studies support this evolutionary scenario. For instance, the (somatotopic) location of the larynx/phonation area (that with control over the intrinsic musculature of the larynx, underlying adduction/abduction and tensing/relaxing of the vocal folds) in the cortical homunculus is adjacent to the lips area and the expiratory area (Brown et al. 2008). This means that in humans, phonation, articulation and respiration are neurologically conjunct. Considering that orangutans have been experimentally demonstrated to exert apt voluntary motor control over lips and respiration (Wich et al. 2009; Lameira et al. in review), it is reasonable to view this conjunction as evolutionarily relevant in humans. While laryngeal musculature may operate in complex ways during (online) speech and other functions (Jürgens 2002; Ludlow 2005), the evolutionary genesis of call control theoretically commenced when the first rudimentary neural signal originating in the primary motor cortex was transmitted successfully simply to set the larynx into position during air-flow. The view that flexibility in neural circuitry could have achieved this in our ancestors is supported by a phenomenon known in humans as motor equivalence, where speakers develop different motor strategies, i.e., use different musculatures of the larynx, to achieve the same voice outcome (Ludlow 2005). Accordingly, IGC could potentially explain why the area of representation of the intrinsic laryngeal muscles has seemingly migrated toward the labial area in humans (Brown et al. 2008). In addition, IGC are in concordance with the increasing literature corroborating that gestures and calls/speech are neurally co-processed (e.g. Rizzolatti and Arbib 1998; Bernardis and Gentilucci 2006; Xu et al. 2009).

At the same time, these bimodal behaviors represent cultural variants of orangutan behavior (e.g. van Schaik et al. 2003). Accordingly, enhanced neural connectivity would have also developed across brain systems in areas involved in processing social information, emotional valence and learning, such as the amygdala and the auditory cortex (Remedios et al. 2009). Thus, brain-language (Deacon 1998), biology-culture (Richerson and Boyd 2005) and music-language premises (Brown 1999) are concordant with the IGC hypothesis.

IGC present a parsimonious route to human-like neurophysiology, increased call control and repertoire size in the earliest stages of speech evolution, but one may question their relevance based on the phylogenetic distance between orangutans and humans. Three clarifications are required. Firstly, comparison between human, chimpanzee and orangutan genomes shows that some regions of the human genome more closely resemble the orangutan’s (Hobolth et al. 2011). Although this percentage is approximately 1%, a necessarily bigger percentage is equally similar between humans, chimpanzees and orangutans. While broad genetic underpinnings of speech are not well understood beyond the FOXP2 gene (e.g. Enard et al. 2002), the relevance of genetic proximity within hominoids remains equivocal. Secondly, speech is a bio-cultural evolutionary phenomenon (Richerson and Boyd 2005), and therefore, theories must encompass some degree of interaction between social and genetic mechanisms in the acquisition and transmission of communication signals. Orangutans and chimpanzees are the only apes to show extensive cultures in the wild (e.g. Whiten et al. 1999; van Schaik et al. 2003); thus, both species represent promising models. Thirdly, the description of IGC in orangutans but (so far) not in chimpanzees may constitute a methodological artifact. While cultural variants between populations have been investigated in wild chimpanzees, this record tends to focus on feeding behavior (Watson and Caldwell 2009). Conversely, researchers have investigated geographical variation in orangutans’ complete call repertoire (Hardus et al. 2009b). These conditions may have made IGC more readily described in orangutans than in chimpanzees. There are nonetheless anecdotes suggesting that IGC may be part of the chimpanzee repertoire, such as the use of a hand in front of the mouth to muffle a call, as described by Jane Goodall (Deacon 1998).

This essay presents a new view on the earliest stages of speech evolution, based on orangutan IGC. It builds on the concept that enhanced linguistic ability cannot be totally differentiated from enhanced motor activity (Lieberman 2002b), and argues that IGC may have constituted speech exaptations, providing functional advantages in a human ancestor essentially lacking call control while allowing the emergence of the neural and communicative basis for subsequent selection favouring basic abilities for speech. This view provides future speech evolution models with a new concrete model organism, similar to orangutans in its (1) call control, (2) call repertoire size and (3) reliance on social learning.

Acknowledgments

ARL was financially supported by Fundação para a Ciência e Tecnologia (SFRH/BD/44437/2008). We thank Carel van Schaik, Asif Ghazanfar and two anonymous reviewers for comments on previous versions of the manuscript.

Open Access

This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.

References

  1. Arbib MA, Liebal K, Pika S. Primate vocalization, gesture, and the evolution of human language. Current Anthropology. 2008;49(6):1053–1076. doi: 10.1086/593015.
  2. Arcadi A, Robert D, Boesch C. Buttress drumming by wild chimpanzees: Temporal patterning, phrase integration into loud calls, and preliminary evidence for individual distinctiveness. Primates. 1998;39(4):505–518. doi: 10.1007/BF02557572.
  3. Arnold K, Zuberbühler K. Semantic combinations in primate calls. Nature. 2006;441(7091):303. doi: 10.1038/441303a.
  4. Bernardis P, Gentilucci M. Speech and gesture share the same communication system. Neuropsychologia. 2006;44(2):178–190. doi: 10.1016/j.neuropsychologia.2005.05.007.
  5. Brown S. The “musilanguage” model of music evolution. In: Wallin NL, Merker B, Brown S, editors. The origins of music. Cambridge, MA: MIT Press; 1999. pp. 271–300.
  6. Brown S, Merker B, Wallin NL. An introduction to evolutionary musicology. In: Wallin NL, Merker B, Brown S, editors. The origins of music. Cambridge, MA: MIT Press; 1999. pp. 3–24.
  7. Brown S, Ngan E, Liotti M. A larynx area in the human motor cortex. Cerebral Cortex. 2008;18(4):837–845. doi: 10.1093/cercor/bhm131.
  8. Christiansen MH, Kirby S. Language evolution: Consensus and controversies. Trends in Cognitive Sciences. 2003;7(7):300–307. doi: 10.1016/S1364-6613(03)00136-0.
  9. Corballis MC. From mouth to hand: Gesture, speech, and the evolution of right-handedness. Behavioral and Brain Sciences. 2003;26(02):199–208. doi: 10.1017/s0140525x03000062.
  10. Deacon T. The symbolic species: The co-evolution of language and the brain. NY: Norton; 1998.
  11. Enard W, Przeworski M, Fisher SE, Lai CSL, Wiebe V, Kitano T, Monaco AP, Pääbo S. Molecular evolution of FOXP2, a gene involved in speech and language. Nature. 2002;418(6900):869–872. doi: 10.1038/nature01025.
  12. Falk D. Prelinguistic evolution in early hominins: Whence motherese? Behavioral and Brain Sciences. 2004;27(04):491–503. doi: 10.1017/s0140525x04000111.
  13. Fitch WT. Protomusic and protolanguage as alternatives to protosign. Behavioral and Brain Sciences. 2005;28(02):132–133. doi: 10.1017/S0140525X05290039.
  14. Hardus ME, Lameira AR, van Schaik CP, Wich SA. Tool use in wild orang-utans modifies sound production: A functionally deceptive innovation? Proceedings of the Royal Society B: Biological Sciences. 2009;276(1673):3689–3694. doi: 10.1098/rspb.2009.1027.
  15. Hardus ME, Lameira AR, Singleton I, Morrogh-Bernard HC, Knott CD, Ancrenaz M, Utami SS, Wich S. A description of the orangutan’s vocal and sound repertoire, with a focus on geographic variation. In: Wich S, Mitra Setia T, Utami SS, van Schaik CP, editors. Orangutans. New York: Oxford University Press; 2009. pp. 49–60.
  16. Hobolth A, Dutheil JY, Hawks J, Schierup MH, Mailund T. Incomplete lineage sorting patterns among human, chimpanzee, and orangutan suggest recent orangutan speciation and widespread selection. Genome Research. 2011;21:349–356. doi: 10.1101/gr.114751.110.
  17. Hopkins WD, Cantero M. From hand to mouth in the evolution of language: The influence of vocal behavior on lateralized hand use in manual gestures by chimpanzees (Pan troglodytes). Developmental Science. 2003;6(1):55–61. doi: 10.1111/1467-7687.00254.
  18. Hopkins WD, Nir TM. Planum temporale surface area and grey matter asymmetries in chimpanzees (Pan troglodytes): The effect of handedness and comparison with findings in humans. Behavioural Brain Research. 2010;208(2):436–443. doi: 10.1016/j.bbr.2009.12.012.
  19. Jürgens U. Neural pathways underlying vocal control. Neuroscience and Biobehavioral Reviews. 2002;26(2):235–258. doi: 10.1016/S0149-7634(01)00068-9.
  20. Kuypers HGJM. Some projections from the peri-central cortex to the pons and lower brain stem in monkeys and chimpanzee. Journal of Comparative Neurology. 1958;110:211–255. doi: 10.1002/cne.901100205.
  21. Lameira A, Delgado R, Wich S. Review of geographic variation in terrestrial mammalian acoustic signals: Human speech variation in a comparative perspective. Journal of Evolutionary Psychology. 2010;8(4):309–332. doi: 10.1556/JEP.8.2010.4.2.
  22. Lameira AR, Hardus ME, Kowalsky B, de Vries H, Spruijt BM, Sterck EHM, et al. Orangutan whistling and implications for the emergence of an open-ended call repertoire: A replication and extension. Journal of the Acoustical Society of America. In review.
  23. Lieberman P. On the nature and evolution of the neural bases of human language. American Journal of Physical Anthropology. 2002;119(S35):36–62. doi: 10.1002/ajpa.10171.
  24. Lieberman P. Human language and our reptilian brain. Cambridge, Massachusetts and London, England: Harvard University Press; 2002.
  25. Ludlow CL. Central nervous system control of the laryngeal muscles in humans. Respiratory Physiology & Neurobiology. 2005;147(2–3):205–222. doi: 10.1016/j.resp.2005.04.015.
  26. Maclarnon A, Hewitt G. Increased breathing control: Another factor in the evolution of human language. Evolutionary Anthropology: Issues, News and Reviews. 2004;13(5):181–197. doi: 10.1002/evan.20032.
  27. MacNeilage PF. The frame/content theory of evolution of speech production. Behavioral and Brain Sciences. 1998;21(04):499–511. doi: 10.1017/s0140525x98001265.
  28. Mithen SJ. The singing Neanderthals: The origins of music, language, mind and body. London: Cambridge Journals Online; 2005. pp. 97–112.
  29. Pantev C, Oostenveld R, Engelien A, Ross B, Roberts LE, Hoke M. Increased auditory cortical representation in musicians. Nature. 1998;392(6678):811–814. doi: 10.1038/33918.
  30. Perani D, Saccuman MC, Scifo P, Anwander A, Spada D, Baldoli C, Poloniato A, Lohmann G, Friederici AD. Neural language networks at birth. Proceedings of the National Academy of Sciences. 2011;108(38):16056–16061. doi: 10.1073/pnas.1102991108.
  31. Puglisi A, Baronchelli A, Loreto V. Cultural route to the emergence of linguistic categories. Proceedings of the National Academy of Sciences. 2008;105(23):7936–7940. doi: 10.1073/pnas.0802485105.
  32. Remedios R, Logothetis NK, Kayser C. Monkey drumming reveals common networks for perceiving vocal and nonvocal communication sounds. Proceedings of the National Academy of Sciences. 2009;106(42):18010–18015. doi: 10.1073/pnas.0909756106.
  33. Richerson P, Boyd R. Not by genes alone: How culture transformed human evolution. Chicago: University of Chicago Press; 2005.
  34. Rizzolatti G, Arbib MA. Language within our grasp. Trends in Neurosciences. 1998;21(5):188–194. doi: 10.1016/S0166-2236(98)01260-0.
  35. Scott-Phillips TC, Kirby S. Language evolution in the laboratory. Trends in Cognitive Sciences. 2010;14(9):411–417. doi: 10.1016/j.tics.2010.06.006.
  36. Seyfarth R, Cheney D. Primate social knowledge and the origins of language. Mind & Society. 2008;7(1):129–142. doi: 10.1007/s11299-007-0038-2.
  37. Seyfarth RM, Cheney DL. Production, usage, and comprehension in animal vocalizations. Brain and Language. 2010;115(1):92–100. doi: 10.1016/j.bandl.2009.10.003.
  38. Seyfarth RM, Cheney DL, Marler P. Monkey responses to three different alarm calls: Evidence of predator classification and semantic communication. Science. 1980;210(4471):801–803. doi: 10.1126/science.7433999.
  39. Slocombe KE, Zuberbühler K. Functionally referential communication in a chimpanzee. Current Biology. 2005;15(19):1779–1784. doi: 10.1016/j.cub.2005.08.068.
  40. Smith K, Kirby S, Brighton H. Iterated learning: A framework for the emergence of language. Artificial Life. 2003;9(4):371–386. doi: 10.1162/106454603322694825.
  41. Taglialatela JP, Russell JL, Schaeffer JA, Hopkins WD. Chimpanzee vocal signaling points to a multimodal origin of human language. PLoS ONE. 2011;6(4):e18852. doi: 10.1371/journal.pone.0018852.
  42. Tononi G, McIntosh AR, Russell DP, Edelman GM. Functional clustering: Identifying strongly interactive brain regions in neuroimaging data. NeuroImage. 1998;7(2):133–149. doi: 10.1006/nimg.1997.0313.
  43. Tononi G, Edelman GM, Sporns O. Complexity and coherency: Integrating information in the brain. Trends in Cognitive Sciences. 1998;2(12):474–484. doi: 10.1016/S1364-6613(98)01259-5.
  44. van Schaik CP, Ancrenaz M, Borgen G, Galdikas B, Knott CD, Singleton I, Suzuki A, Utami SS, Merrill M. Orangutan cultures and the evolution of material culture. Science. 2003;299(5603):102–105. doi: 10.1126/science.1078004.
  45. Watson C, Caldwell C. Understanding behavioral traditions in primates: Are current experimental approaches too focused on food? International Journal of Primatology. 2009;30(1):143–167. doi: 10.1007/s10764-009-9334-5.
  46. Whiten A, Goodall J, McGrew WC, Nishida T, Reynolds V, Sugiyama Y, Tutin CEG, Wrangham RW, Boesch C. Cultures in chimpanzees. Nature. 1999;399(6737):682–685. doi: 10.1038/21415.
  47. Wich S, Swartz K, Hardus M, Lameira A, Stromberg E, Shumaker R. A case of spontaneous acquisition of a human sound by an orangutan. Primates. 2009;50(1):56–64. doi: 10.1007/s10329-008-0117-y.
  48. Xu J, Gannon PJ, Emmorey K, Smith JF, Braun AR. Symbolic gestures and spoken language are processed by a common neural system. Proceedings of the National Academy of Sciences. 2009;106(49):20664–20669. doi: 10.1073/pnas.0909197106.

Articles from Evolutionary Biology are provided here courtesy of Springer
