Marmoset vocal behavior shows high plasticity throughout development, a crucial precursor for human vocal development.
Abstract
The vocal behavior of human infants undergoes marked changes across their first year while becoming increasingly speech-like. Conversely, vocal development in nonhuman primates has been assumed to be largely predetermined and completed within the first postnatal months. Contradicting this assumption, we found a dichotomy between the development of call features and vocal sequences in marmoset monkeys, suggestive of a role for experience. While changes in call features were related to physical maturation, sequences of and transitions between calls remained flexible until adulthood. As in humans, marmoset vocal behavior developed in stages correlated with motor and social development stages. These findings are evidence for a prolonged phase of plasticity during marmoset vocal development, a crucial primate evolutionary preadaptation for the emergence of vocal learning and speech.
INTRODUCTION
During the early development of speech in humans, the properties of infant vocalizations markedly change through several developmental stages (1, 2). Directly after birth, infants start vocalizing with genetically predetermined, arousal-based vocalizations through endogenous vocal exploration. This initial vocal repertoire consists of non-speech-like preverbal calls (vegetative sounds, cries, or moans) and prelinguistic, speech-like vocalizations (protophones) (3, 4). This stage is followed by one during which infants produce vocalizations consisting of continuous or interrupted phonations (babbling). These utterances become increasingly speech-like during the first year of life (5, 6). For example, human infant cry rates decrease as their vocal behavior gradually becomes more complex within the first postnatal years (1, 2, 5, 7, 8). Several factors are considered particularly critical for vocal ontogeny, including maturation, learning, and early contingent social interaction (9).
In contrast, all features of vocal development in nonhuman primates were thought for decades to be predominantly a result of physical growth and maturation and, therefore, largely independent of learning processes or external factors (10–12). Previous studies showed that deafness, social isolation, and parental absence have little or no effect on vocal development in squirrel and macaque monkeys (11, 13), leading to the conclusion that monkey vocalizations are largely fixed and lack plasticity. Moreover, previous work revealed that vocal development is accomplished within a few weeks after birth in most monkey species and, therefore, that these animals seem to be equipped with the adult vocal repertoire as early as the juvenile or adolescent stage (12, 14, 15).
Recent experiments in marmoset monkeys, a highly social and loquacious New World monkey species, revealed a degree of vocal flexibility during infancy: infants that received contingent vocal feedback developed their vocalizations faster, transitioning from immature to mature call types earlier than animals that received less feedback (16). Even in these experiments, vocal development appeared to be accomplished within the first 2 postnatal months. Subsequent experiments, however, revealed that the vocal behavior of marmosets can be affected at later time points by changes in social reinforcement of juvenile animals (17, 18), indicating that vocal development might last longer in marmosets than previously thought.
So far, work on vocal development in nonhuman primates has predominantly focused on the first postnatal weeks and disregarded potential changes accompanying later growth. We sought to bridge the gap between infant and adult vocal behavior by longitudinally tracking the trajectory of vocal ontogeny in marmoset monkeys from early infancy until sexual maturity. Using quantitative measures to compare distinct call features, call type usage, and vocal sequence composition, we show that marmoset vocal behavior is constantly developing and highly dynamic throughout adolescence. We demonstrate that both acoustic properties of different call types and vocal behavior changed substantially over the first 15 months after birth. While most changes in acoustic structure of discrete call types exhibited a species-specific development correlated with physical growth as measured by body weight, changes in call usage resulted in distinct developmental patterns for individual monkeys. Our data provide evidence for a prolonged phase of high plasticity during marmoset vocal development, which is among the crucial primate evolutionary preadaptations for the emergence of speech in humans.
RESULTS
We tracked the vocal ontogeny of marmoset monkeys from early infancy until sexual maturity in a longitudinal study. Throughout this period, at least 2 and typically 3 days of recordings were taken each month, starting from the second postnatal week (PW2) to PW64, and conducted in a directed context with both visual and auditory contact with their family (Fig. 1, A and B, and table S1; for details see Materials and Methods). From these recordings, we annotated 143,734 calls from six marmosets. Already at PW2, we observed a broad variety of different call types (Fig. 1C).
Fig. 1. Acoustic development of marmoset calls with age.
(A) Experimental setup. Infant vocal utterances were recorded during brief separation from their family while in visual and auditory contact. (B) Vocal recording schedule. Sixteen vocal recording periods, from PW2 to PW64. Blue shading indicates the recording periods, each consisting of at least two recording days. Developmental stages from (19). (C) Spectrograms of representative vocalizations uttered by the experimental animals. (D to F) Development of the mean (± SEM) call duration (D), peak frequency (E), and mean Wiener entropy (F) distributions from PW2 to adulthood across all monkeys. (G) Mean body weights (± SEM) of all monkeys throughout development. Gray lines indicate the weights of each monkey.
Development of acoustic call structures
We measured 10 acoustic call features (table S2), similar to those used to track marmoset vocal development in past studies (14, 18), to evaluate developmental changes in marmoset call types. By analyzing this high number of acoustic call features, we aimed to avoid potential bias caused by selecting a small number of parameters. To identify variations in acoustic features of discrete call types, we analyzed the distributions of these call features individually for each call type rather than pooling all uttered vocalizations before analysis. The call types produced the most were trill (24.9%, n = 35,388 calls), tsik (13.8%, n = 19,564), segmented trill (13.2%, n = 18,708), trill-phee (11.4%, n = 16,207), twitter (11.3%, n = 16,088), compound cry (6.1%, n = 8,638), cry (5.5%, n = 7,855), and phee (4.5%, n = 6,400) vocalizations, along with other less common vocalization types (each ≤2.9%, n = 13,394) (Fig. 1C and 2B). To compare acoustic call features across all recording periods, we used a repeated measurement permutation test (rmp-test; see Materials and Methods).
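The shares quoted above can be checked directly against the reported counts. The following is a minimal sketch, assuming each percentage is taken relative to the sum of the listed counts; the variable names are illustrative and not taken from the study's analysis code.

```python
# Call counts per type as reported in the text above.
counts = {
    "trill": 35388,
    "tsik": 19564,
    "segmented trill": 18708,
    "trill-phee": 16207,
    "twitter": 16088,
    "compound cry": 8638,
    "cry": 7855,
    "phee": 6400,
    "other": 13394,  # all less common call types pooled
}

# Assumption: the denominator is the sum of the listed counts.
total = sum(counts.values())

# Percentage share of each call type, rounded to one decimal place.
shares = {name: round(100 * n / total, 1) for name, n in counts.items()}

for name, pct in shares.items():
    print(f"{name:16s} {counts[name]:6d}  {pct:5.1f}%")
```

The counts sum to 142,242, which is the number of calls that Materials and Methods reports for the acoustic call structure analysis.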
Fig. 2. Call type distributions during development.
(A) Mean distribution (+ SEM) of the call types across all recorded monkeys throughout development. (B) Spectrograms of exemplar segmented trill, phee, and trill-phee vocalizations uttered by the experimental animals. (C) Mean distribution (+ SEM) of segmented calls across all recorded monkeys throughout development.
Call durations significantly decreased throughout development in three call types (Fig. 1D and fig. S1): trills (rmp-test: P = 0.002), cries (rmp-test: P = 0.005), and compound cries (rmp-test: P = 0.002); and increased in one call type: tsiks (rmp-test: P = 0.02; table S3). While trill call duration decreased and tsik call duration continuously increased from PW2 throughout development, cries and compound cries exhibited a distinct peak at PW10 to PW12 before decreasing afterward until adulthood. This peak might be related to weaning, which occurs in marmosets around this time (19). During weaning, the parents stop carrying offspring and force them to roam independently. The observed increase in cry and compound cry duration might therefore result from the offspring attempting to get the parents’ attention (20). Phee, trill-phee, and twitter calls did not exhibit any clear developmental changes in call duration.
Call frequencies significantly decreased throughout development in three call types (Fig. 1E, fig. S1, table S3): phee (rmp-test: POnset = 0.017, POffset = 0.008, PPeak < 0.001, and PMean < 0.001), trill-phee (rmp-test: POnset = 0.002, POffset = 0.032, PPeak < 0.001, and PMean < 0.001), and trill (rmp-test: POnsetFreq = 0.008, POffsetFreq = 0.001, PPeakFreq = 0.011, and PMeanFreq = 0.008). Cry, compound cry, twitter, and tsik calls did not show significant changes in any of the measured frequency features.
Last, a measure for the tonal purity of a vocal signal (14, 21) known as Wiener entropy significantly increased for three call types (Fig. 1F, fig. S1, and table S3): compound cry (rmp-test: PMean = 0.04 and POnset = 0.044), phee (rmp-test: POnset = 0.008 and PPeak = 0.017), and trill-phee (rmp-test: POffset = 0.007, PPeak < 0.001, and PMean < 0.001). The other call types did not show significant Wiener entropy differences throughout development.
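For readers unfamiliar with the measure, Wiener entropy is commonly computed as the logarithm of the ratio between the geometric and arithmetic means of a power spectrum. The sketch below follows that common definition; it is an assumption that the study used this form, and the function name, the natural-log scaling, and the two toy spectra are illustrative only.

```python
import math


def wiener_entropy(power_spectrum):
    """Log spectral flatness: ln(geometric mean / arithmetic mean).

    Assumption: natural-log convention; other scalings (e.g., dB) exist.
    """
    if not power_spectrum or any(p <= 0 for p in power_spectrum):
        raise ValueError("power values must be positive")
    n = len(power_spectrum)
    log_geometric_mean = sum(math.log(p) for p in power_spectrum) / n
    arithmetic_mean = sum(power_spectrum) / n
    return log_geometric_mean - math.log(arithmetic_mean)


# Toy spectra for illustration only (not data from the study).
flat_spectrum = [1.0] * 8              # equal power in every frequency bin
peaked_spectrum = [1e-6] * 7 + [1.0]   # power concentrated in one bin

print(wiener_entropy(flat_spectrum))
print(wiener_entropy(peaked_spectrum))
```

Under this convention, a flat spectrum gives a value of zero and a spectrum dominated by a single frequency gives a strongly negative value.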
Overall, duration, call frequency, and entropy were statistically different for at least three call types each, showing that vocalizations underwent a transformation during development, whereby durations and call frequencies decreased, while Wiener entropy increased. As shown in earlier studies (14, 18), these changes can be largely explained by physical growth or maturation (Fig. 1G and table S4). It can therefore be assumed that these changes directly correlate with physical maturation. Body size, and therefore the relative size of the vocal tract, and call frequency show a strong negative correlation in mammals, including marmoset monkeys (22, 23).
Development of call type occurrence
Next, we investigated the developmental trajectory of the occurrence of different call types. For this, we focused on the seven most commonly uttered complete (unsegmented) call types: cry, compound cry, trill, phee, trill-phee, twitter, and tsik. We observed that all call types were uttered in the directed context (Fig. 2A and fig. S1), including cry and compound cry, call types previously assumed to be specific to infants and to disappear after the first few weeks after birth (12, 14). This suggests that cry and compound cry call types are produced in specific situations throughout development rather than only in the first few postnatal weeks. However, the mean number of calls changed significantly during development for only one call type, twitter [linear mixed model (LMM): Pcry = 0.925, Pcompound cry = 0.944, Ptrill = 0.442, Pphee = 0.12, Ptrill-phee = 0.118, Ptwitter < 0.001, and Ptsik = 0.982]. Cries and compound cries were steadily uttered throughout development. Twitter calls were produced most frequently in PW2 with a rapid decrease in PW6. Following a brief increase between PW10 and PW14, twitter calls occurred rarely thereafter until adulthood (Fig. 2A and fig. S1).
In addition to regular vocalizations, marmosets are capable of producing segmented versions of some call types (24–26). Here, we observed that segmented trills, trill-phees, and phees are produced by marmoset infants (Fig. 2B and fig. S1). The developmental trajectory of the mean number of segmented calls across animals was distinct from that of complete calls (Fig. 2C). Segmented calls were nearly absent right after birth (PW2). However, the occurrence of segmented trills and trill-phees increased afterward, peaking in PW18 and reaching significance for the segmented trills before significantly decreasing until PW38 and then becoming approximately constant until adulthood (LMM: Psegmented trill < 0.001, Psegmented phee = 0.442, and Psegmented trill-phee = 0.12). The mean segmented phee occurrence was low throughout development. These results indicate a distinct developmental stage from PW6 to PW38, in which marmosets are capable of producing more variable calls with higher flexibility than in the stages before or after.
Development of call type usage
To determine whether the usage of specific call types is subject to developmental changes, we investigated call transitions within periods of vocal behavior. Specifically, marmoset infants produce babbling-like sequences that are characterized by the utterance of both immature and mature call types with short intercall intervals (ICIs) (17, 18, 27). The development of these call sequences remains unclear. Qualitatively, we observed that call sequences markedly changed from infancy to adulthood (Fig. 3A). Therefore, we quantified vocal behavior throughout development by comparing ICIs, call distributions, and transitions between uttered calls.
Fig. 3. Development of call type transitions.
(A) Spectrograms of exemplar vocal sequences produced at different stages of vocal development. (B) Mean distribution (± SEM) of different ratios of ICIs across all recorded monkeys throughout development. (C) Transition diagrams for vocal sequences throughout development. Each node in the diagram corresponds to a call type. Arrows show the transitions between call types. Node size represents the call type’s proportion of all calls. Arrow line thickness represents the probability of that transition. To have a similar number of vocal recordings per age range, we built five time categories. (D) Mean distribution (± SEM) of call type repetition rates across all recorded monkeys. (E) Mean distribution (± SEM) of transition-type usage stereotypy was expressed as the u-index across all recorded monkeys. Higher u values indicate higher evenness (range: 0 to 1). (F) Mean distribution (± SEM) of the Zipf value across all recorded monkeys. Statistical analyses were performed using LMM.
Previous studies showed that infant-specific, babbling-like behavior is defined by vocal sequences of mature and immature call types that are consecutively produced with ICIs between 100 and 500 ms (2, 27–29). ICIs that are longer than 500 ms separate individually produced vocalizations, and ICIs shorter than 100 ms are observed in call combinations such as twitter sequences (18, 27). Sequences with ICIs between 100 and 500 ms have been thought to disappear after about 2 months of vocal development (12, 14). As expected, the distribution of ICIs, a proxy for vocal maturity (18, 27), changed significantly during development (Fig. 3B). From PW2 to PW4, the mean ratio of ICIs above 500 ms across monkeys was low (17.7%), while that for ICIs between 100 and 500 ms was high (49.6%); this ratio then switched in PW6, becoming low for ICIs between 100 and 500 ms (15.9 to 27.5%) and high for ICIs above 500 ms (45.4 to 66%) for the rest of development (LMM: P<100ms < 0.001, P100–500ms = 0.198, and P≥500ms < 0.001). A considerable proportion of ICIs remained between 100 and 500 ms throughout development rather than disappearing after 2 months.
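The three ICI classes described above can be illustrated with a short sketch. It assumes that an ICI is the gap between the offset of one call and the onset of the next, and that the class boundaries are half-open at 100 and 500 ms; both are assumptions, and the function names and example call times are hypothetical.

```python
from collections import Counter


def ici_category(ici_ms):
    """Assign an intercall interval (in ms) to one of three classes."""
    if ici_ms < 100:
        return "<100 ms"      # call combinations such as twitter sequences
    if ici_ms < 500:
        return "100-500 ms"   # babbling-like sequences
    return ">=500 ms"         # individually produced vocalizations


def ici_ratios(calls):
    """Return the share of ICIs per class.

    calls: list of (onset_ms, offset_ms) tuples sorted by onset.
    """
    icis = [nxt[0] - cur[1] for cur, nxt in zip(calls, calls[1:])]
    if not icis:
        raise ValueError("need at least two calls")
    tally = Counter(ici_category(ici) for ici in icis)
    return {label: tally[label] / len(icis)
            for label in ("<100 ms", "100-500 ms", ">=500 ms")}


# Hypothetical call times for illustration only (not data from the study).
example_calls = [(0, 50), (80, 130), (400, 460), (700, 760), (2000, 2100)]
ratios = ici_ratios(example_calls)
print(ratios)
```

Computing these ratios per animal and recording period would yield the kind of distribution summarized in Fig. 3B.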
In addition to the observed shift in ICIs, call transitions also significantly changed during development (rmp-test: P = 0.015; see Materials and Methods; Fig. 3C). Call transitions in PW2 exhibited a pattern distinct from all later periods, characterized predominantly by twitter calls with high repetition rates. Alternations with other call types such as cries, tsiks, and trills were rare in PW2. However, call transitions from PW6 until PW30 were highly variable, occurring between most call types. Here, the marmosets predominantly produced trill calls, alternating with tsik, cry, compound cry, trill-phee, and phee calls. This resulted in a significant increase in call transitions between trills and trill-phees and between trills and cries, as well as in repetitions of trills and trill-phees (see table S5 for details). At the end of adolescence (PW34), the distribution of call transitions stabilized (Fig. 3C and fig. S2). We further quantified developmental changes in vocal complexity and variation using sequence analyses. Mean call type repetitions significantly decreased during development with a rapid and sustained drop after PW18 (LMM: P < 0.001; Fig. 3D), indicating an increase in vocal complexity during marmoset vocal development. This finding was also supported by a measure for stereotypy, the u-index, which compares the predictability of the occurrence of transition types (30). The calculated u values significantly increased with vocal development, indicating a decrease in stereotypy (LMM: P < 0.001; Fig. 3E). However, calculating a measure for sequence complexity known as the Zipf value for each developmental period did not reveal any significant changes (LMM: P = 0.157; Fig. 3F). In addition, the Zipf values remained low (around −2) throughout development, indicating no balance between unification and diversification.
This suggests that the observed change in vocal behavior was not optimized to transfer complex information, despite the observed increase in vocal complexity defined by a decrease in call type repetition and in contrast to complex vocal behavior such as human speech (31, 32).
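The sequence measures discussed above can be sketched in a few lines. The code assumes row-normalized first-order transition probabilities, a repetition rate defined as the share of transitions in which a call type follows itself, and a Zipf value taken as the least-squares slope of log frequency against log rank. These are common choices rather than the study's documented procedure, and all function names and example sequences are hypothetical.

```python
import math
from collections import Counter, defaultdict


def transition_probabilities(sequence):
    """Row-normalized first-order transition probabilities between call types."""
    pair_counts = Counter(zip(sequence, sequence[1:]))
    row_totals = defaultdict(int)
    for (src, _dst), n in pair_counts.items():
        row_totals[src] += n
    return {(src, dst): n / row_totals[src]
            for (src, dst), n in pair_counts.items()}


def repetition_rate(sequence):
    """Share of transitions in which a call type follows itself."""
    pairs = list(zip(sequence, sequence[1:]))
    if not pairs:
        raise ValueError("need at least two calls")
    return sum(a == b for a, b in pairs) / len(pairs)


def zipf_slope(sequence):
    """Least-squares slope of log10(frequency) against log10(rank)."""
    freqs = sorted(Counter(sequence).values(), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two call types")
    xs = [math.log10(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log10(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical sequences for illustration only (not data from the study).
seq = ["twitter", "twitter", "twitter", "trill", "trill", "phee", "trill", "cry"]
probs = transition_probabilities(seq)
# Frequencies 12, 6, 4, 3 are proportional to 1/rank.
zipf_like = ["a"] * 12 + ["b"] * 6 + ["c"] * 4 + ["d"] * 3

print(probs)
print(repetition_rate(seq))
print(zipf_slope(zipf_like))
```

In this convention, a slope near −1 corresponds to the balance between unification and diversification mentioned above, whereas steeper slopes such as −2 indicate a repertoire dominated by a few call types.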
Overall, our findings indicate that some call types underwent distinct developmental changes from birth to adulthood in three different stages: first, an early stage (PW2), consisting of a stereotyped call transition behavior largely composed of only a few uttered call types and followed by a rapid decrease in twitter calls; second, an intermediate stage (PW6 to PW38) characterized by a highly diverse call transition behavior and the occurrence of segmented versions of some call types; and last, a late stage (PW42 to PW64) exhibiting more stable occurrence of call combinations and transitions.
Developmental changes in interindividual similarity of acoustic call structures and call type transitions
We next investigated whether or not the highly variable and flexible developmental trajectories observed in the intermediate stage could result from maturational processes. If shared species-specific maturational processes predominantly account for the observed changes, then developmental trajectories should be similar across individuals. In contrast, changes influenced by experience should result in trajectories that become more and more dissimilar between individuals over time. Therefore, we analyzed the interindividual similarity of acoustic call features and call transitions.
Because some call types were rarely uttered in the late recording periods, we pooled the data into four bins (PW2 to PW16, PW18 to PW32, PW34 to PW48, and PW50 to PW64). We then performed discriminant function analysis (DFA) and computed F values that indicate relative similarity between individuals, with lower values corresponding to higher similarity (Fig. 4A; see Materials and Methods). Interindividual similarity did not significantly change during development for most call types: cries, compound cries, trills, phees, and trill-phees (LMM: Pcry = 0.338, Pcompound cry = 0.414, Ptrill = 0.229, Pphee = 0.229, and Ptrill-phee = 0.878). Significant differences in interindividual similarity were only observed for the short-duration tsik calls (LMM: P = 0.012) and twitter calls (LMM: P < 0.001). However, developmental changes for tsik calls did not follow a linear trajectory, and twitter call similarity across individuals increased during development. The latter finding reveals that, after PW4, the occurrence of twitters rapidly decreased while interindividual differences in this call type declined. Therefore, while acoustic structures of several call types underwent distinct developmental changes (Fig. 1D and fig. S1), they occurred similarly across individuals with maturation, pointing toward shared species-specific processes. In contrast, analysis of call transition pattern similarity (Fig. 4B) revealed interindividual differences across development. In the first weeks after birth (PW2 to PW14), infants exhibited call transition patterns that were highly similar (with low F values). Thereafter, the corresponding F values significantly increased until adulthood (LMM: P < 0.001). These results indicate that sequences of vocalizations became less similar and more unique with maturation, pointing to experience-based causes.
Fig. 4. Development of interindividual vocalization similarity.
(A) Mean interindividual acoustic structure similarity (± SEM) across all monkeys for complete call types. (B) Mean interindividual call type transition similarity (± SEM) across all monkeys. (C) Timeline of the observed vocal development stages and their relationship to general motor and social development in marmoset monkeys (19, 42, 64).
In summary, our analysis of call transition pattern similarity further emphasizes the presence of three vocal developmental stages. The early stage is characterized by a stereotyped call transition behavior with high interindividual similarity. The intermediate stage is defined by a highly diverse call transition behavior with decreased interindividual similarity. Last, the late stage is characterized by a stabilized occurrence of call combinations and transitions with further decreased interindividual similarity.
DISCUSSION
We show that the vocal behavior of marmoset monkeys can be characterized by distinct stages during development from the first postnatal weeks until adulthood. All of the main call types are already present within the first postnatal month, and developmental changes in acoustic call features can largely be explained by physical maturation. These findings are compatible with previous evidence suggesting that basic acoustic call structure is innate and does not require learning through auditory or social feedback (33–35).
We found that marmoset cries and compound cries are produced until adulthood, whereas previous studies assumed that these calls are immature and infant specific (14, 16, 36). Infant cries have been shaped by evolutionary processes to trigger parental behavior that is crucial for the survival of infants in all mammals (37, 38). Similar to humans, marmoset caregivers respond to infant cries mostly by initiating search (20) and providing immediate approach and interaction (29), which calms the infants (39). Our finding indicates that marmoset cries may be a crucial call type used in specific behavioral contexts throughout development and beyond rather than serving as an immature version of other call types. Apparent discrepancies with previous studies, in which infant cries and infant-like call sequences transitioned to phee calls after PW2 to PW3 (14, 16), can be explained by contextual differences. These previous studies, focusing on the first three postnatal months, recorded infant marmosets in the absence of visual contact with their caregivers, which we consider an undirected context. When isolated, adult marmosets produce phee calls, to which conspecifics respond with their own phee calls to establish contact (40, 41). This turn-taking behavior develops rapidly (14), indicating that the shift to phees in the undirected context observed in these studies may reflect the proper expression of vocal behavior in isolated contexts.
While changes in acoustic structure could be predominantly explained by physical growth or maturation, we found that marmosets also exhibit call usage differences throughout development. These changes are unlikely to be explained by physical maturation alone and suggest distinct learning mechanisms as a key feature of marmoset vocal development. In an early stage (PW2), marmosets exhibit a stereotyped and highly repetitive vocal behavior with high interindividual similarity. This stage is followed by an intermediate stage (PW6 to PW38) characterized by a highly diverse and increasingly individualized call transition behavior and the occurrence of segmented versions of some call types. Last, there is a late stage (PW42 to PW64) marked by stable occurrence of call types accompanied by a stable and increasingly individualized call transition behavior. In addition, we show that marmosets produced increasingly dissimilar and unique vocalization sequences during maturation, suggesting that distinct learning mechanisms underlie the development of call transitions. This notion is further supported by a previous study showing that marmosets lacking parental social feedback starting from PW13 develop a different vocal behavior than one of their normally raised siblings (17). In contrast, changes in acoustic structure were similar across individuals, suggesting that species-specific maturation underlies these developmental differences rather than vocal learning.
Although marmosets are born with a largely fixed vocal repertoire that is likely genetically predetermined (12), our findings reveal their vocal behavior to be highly flexible and segregated into three developmental stages. These stages align with previously defined motor and social developmental periods (Fig. 4C). Initially, at PW2, infant vocal behavior is highly distinct, largely fixed, and repetitive, suggestive of a species-specific vocal repertoire shared by the marmosets at birth. Marmoset infants are carried around by caregivers for 96% of PW4 to PW6 (42). They enter a critical period of motor development during PW6 to PW8 in which most adult motor abilities are acquired and then gain physical independence with weaning starting around PW8 to PW10 (19, 43). Around this time, we show that the vocal behavior undergoes a shift from a stable, shared vocal repertoire to a highly flexible, more idiosyncratic stage.
After weaning, marmosets increasingly interact with other group members, engage in social grooming, and become independent of their caregivers. The vocal flexibility we observed during this time period may benefit social development. By the beginning of subadulthood (PW40), most adult behaviors have been acquired. Coinciding with this transition, vocal behavioral flexibility begins to give way to increased stability. Social maturity is acquired during the last half of the subadult phase, when marmosets undergo distinct hormonal changes in puberty before reaching adulthood at PW60 (19). The last stage of vocal development is characterized by an increase in interindividual differences in call usage. Because we carefully controlled the environmental state to be as similar as possible for all experimental animals, we suggest that the observed changes are likely due to experience. This experience could, for example, take the form of interactions between an individual and its parents. Future work will determine whether the differences in call transition signatures that we observed are linked to the experience of hearing a family-specific call sequence or other external factors.
Together, the results of our longitudinal analysis show that marmoset vocal behavior develops in phases similar to motor behavior (42, 43) and social behavior (19). As in human infants (44), our results further suggest that marmoset vocal development is unlikely to be entirely genetically predetermined. The changes we observed also indicate that marmoset vocal development could be associated with considerable brain reorganization during adolescence, similar to that found during the early vocal development in human infants (2) and in other vocal learners such as songbirds (21, 45–47). We speculate that such prolonged brain development and elevated brain plasticity during the ontogeny of marmosets could be a source of the vocal flexibility observed during adolescence, which then stabilizes when approaching adulthood. From a neurophysiological perspective, these findings indicate that vocal pattern generating networks might be directly modulated during vocal development and beyond by social experience. Recent studies have shown that juvenile rhesus monkeys are able to volitionally initiate their vocal output (48, 49) and rapidly modulate their vocalizations in response to external acoustic stimuli (25, 50, 51). Here, frontal lobe structures might serve as hubs for controlling such complex audio-vocal communicative behaviors (52–55) and could enable the vocal plasticity we observed. This recently found frontotemporal network for audio-vocal integration processes may constitute an evolutionary preadaptation in the primate lineage for networks capable of enabling vocal plasticity during development and in adulthood. It will be interesting to see whether and how such frontal cortex circuits contribute to vocal learning in response to social experience. From a behavioral perspective, our results indicate that vocal flexibility leads to an increase in individuality exhibited by distinct call sequence structures for each marmoset. 
These individualized call sequences might allow marmosets to facilitate efficient, dynamic social communication in a visually obstructed and complex environment such as their natural habitat, the canopies of the South American rainforests (56).
Our characterization of marmoset vocal development sets the stage for future studies to decipher how social interactions and reinforcement shape both vocal behavior and the brain, particularly during the flexible intermediate stage of development, which shares features of human development crucial for the emergence of vocal learning and eventually speech.
MATERIALS AND METHODS
Experimental animals
We recorded the vocal utterances of six marmoset monkeys (Callithrix jacchus) born to four pairs of parents: two sets of twins and two single-born animals. The marmoset monkeys used in this study were born in captivity and maintained in their family groups for the duration of the recording timeline. Home cages were situated in the colony room such that each family group could see at least one other family group and hear at least three other family groups. The colony room operated on a 12-hour light/12-hour dark cycle with approximately 27°C temperature and 25 to 50% relative humidity. The animals had ad libitum access to water and were fed twice daily with commercial pellets (New World Primate Diet 5040, LabDiet, St. Louis, MO, USA), fresh fruits or vegetables, mealworms (Product 9264, Bio-Serv, Flemington, NJ, USA), and supplemental feed (Marmoset Diet, ZuPreem, Shawnee, KS, USA). Additional treats, including marshmallows, granola, and dried fruits, were used as positive reinforcements to transfer the animals from their home cage to the recording enclosures. All animal procedures complied with the U.S. National Institutes of Health’s Guide for the Care and Use of Laboratory Animals and were approved by the Institutional Animal Care and Use Committees of the Rockefeller University.
Vocal recordings
We recorded the vocal behavior of marmosets during a brief (20 to 40 min) physical separation from their home cage and family group while they remained in visual contact. These recordings were performed from PW2 until adulthood (PW64), except for one monkey from a set of twins, which was euthanized after three recording periods because of unrelated health complications (reducing the recording time for this animal accordingly). All recordings were made within the 2-week recording periods indicated in Fig. 1B, with the exception of one recording day each for monkeys M1 and M2, conducted 4 days before the PW14 to PW16 recording period. Recordings were conducted in custom-made 24 cm–by–20 cm–by–28 cm cast acrylic (Product 8560 K355, McMaster-Carr, Douglasville, GA, USA) boxes lined with wedge foam (Pro Studio Acoustics) or flat foam (AlphaFlat FOMAF241GRY, Acoustical Solutions, Richmond, VA, USA), attenuating sound to improve recording quality. This design significantly decreased acoustic interference within the recording box and dampened environmental sounds. Nearby vocal signals emanating from the home cage were often recognizable in recordings but were substantially dampened, enabling easy discrimination of the subject’s vocalizations from the other vocalizations by amplitude alone. During vocal recordings, infants were individually placed into a recording box with a transparent side facing toward their home cage. Vocalizations were recorded with a condenser microphone (MKH 8020 with preamplifier MZX 8000, Sennheiser electronic, Wedemark, Germany), which was placed inside the recording box through a side port, and digitized at a sampling frequency of 96 kHz using an audio interface (UR44 or UR22mkII, Steinberg, Hamburg, Germany) connected to a computer running recording software (Audition, Adobe, San Jose, CA, USA). The recording timeline consisted of 16 periods. 
Each period included the last 2 weeks of a 4-week time window, i.e., PW2 to PW4 (postnatal day 14 to 28), PW6 to PW8, and so on until PW64. Within each period, recordings were typically conducted on 3 days, with a minority having only 2 days (9%; table S1). Most but not all recording days were sequential within these periods (table S1). The recording periods were staggered because the monkeys were born on different dates: twins M1/M2 were born 79 days before M3/M6, which were born 32 days before M4, which was born 28 days before M5. As a result, for example, PW2 recordings for M5 occurred on the same days as PW22 recordings for M1/M2. We took care to keep the animals’ environment unchanged, avoiding the introduction of previously unencountered sounds, smells, or objects that could artificially affect their vocal behavior. The first 15 min of each recording session was analyzed. This approach allowed us to capture developmental changes in marmoset vocalizations throughout the entire ontogeny from infancy until adulthood. Animals were weighed on every recording day except for the first three recording periods for M1 and M2 (table S3).
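The schedule arithmetic described above can be illustrated with a short sketch. It assumes that the 16 periods follow the regular pattern PW2 to PW4, PW6 to PW8, and so on up to PW62 to PW64, and it derives birth offsets from the intervals given in the text. The function and variable names are illustrative and were not part of the original analysis.

```python
# Sketch of the recording schedule: 16 periods, each covering the last
# 2 weeks of a 4-week window (PW2-PW4, PW6-PW8, ..., PW62-PW64).
N_PERIODS = 16


def recording_periods(n_periods=N_PERIODS):
    """Return (start_week, end_week) tuples, one per recording period."""
    return [(4 * k - 2, 4 * k) for k in range(1, n_periods + 1)]


# Birth offsets in days relative to twins M1/M2, summed from the
# intervals stated in the text (79, 32, and 28 days).
BIRTH_OFFSET_DAYS = {
    "M1": 0,
    "M2": 0,
    "M3": 79,
    "M6": 79,
    "M4": 79 + 32,
    "M5": 79 + 32 + 28,
}


def age_in_days(subject, reference_subject, reference_age_days):
    """Age of `subject` on the day `reference_subject` is `reference_age_days` old."""
    return (
        reference_age_days
        + BIRTH_OFFSET_DAYS[reference_subject]
        - BIRTH_OFFSET_DAYS[subject]
    )


periods = recording_periods()
print(periods[0], periods[-1])
# Age of M1 in weeks when M5 is 14 days old (start of M5's PW2 period).
print(age_in_days("M1", "M5", 14) / 7)
```

Under these assumptions, M1 and M2 are 153 days old (just under 22 weeks) on the day M5 reaches postnatal day 14, which agrees with the overlap of PW2 and PW22 recordings mentioned above.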
Acoustic analysis
We analyzed 143,734 vocalizations recorded in 230 sessions from six marmoset monkeys (table S1). Acoustic call structure analysis was performed on 142,242 calls, with 1497 calls (1.04%) excluded due to manual marking irregularities. We classified marmoset vocalizations into call types as previously defined (12, 29). In addition, we separated segmented calls (24–26) and complete, unsegmented utterances for trills, phees, and trill-phees. Segmented calls were defined by a segmentation of the call syllable into multiple brief call segments, separated by silent intersegment intervals. For occurrence and vocalization counts, each segment was considered an individual vocalization. Calls were manually annotated and classified on the basis of their spectrotemporal profile and auditory playback. Trill calls are defined by sinusoidal-like frequency modulation (FM) throughout the call. Phees are tone-like long calls with a fundamental frequency (F0) around 7 to 10 kHz. Trill-phees are the combination of trill and phee call types. Cries are broadband calls with F0 around 600 Hz. Compound cries are combinations of cries and any other call type. Tsiks are broadband short calls consisting of a linearly ascending FM sweep that merges directly into a sharply descending linear FM sweep.
Call onsets and offsets were manually annotated using Avisoft-SASLab Pro 5.2 software (Avisoft Bioacoustics). Duration, call frequency (peak, onset, offset, and mean), and entropy values (peak, onset, offset, and mean) were extracted with the same software. Duration was calculated as the difference between the end and the beginning of the vocalization. ICI was defined as the time between the end of a vocalization and the beginning of the next utterance. Wiener entropy was used as a measure of how broadband the power spectrum of a sound signal is and was calculated as the logarithm of the ratio between the geometric and arithmetic means of the values of the power spectrum (14, 46). The signal was band-pass–filtered between 5 and 15 kHz, as the vast majority of marmoset calls fall into this range (29, 36). For analyses, spectrograms were calculated using a 1024-point fast Fourier transform with a Hanning window and 50% overlap.
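The measures defined above can be sketched in a few lines. This is only an illustration, not the Avisoft-SASLab Pro implementation: it assumes the natural logarithm for Wiener entropy (the text specifies only "the logarithm"), adds a small constant to avoid taking the log of zero, and uses hypothetical function names.

```python
import numpy as np


def wiener_entropy(power_spectrum, eps=1e-12):
    """Log of the ratio between the geometric and arithmetic means of a power spectrum.

    Values near 0 indicate a flat (broadband) spectrum; strongly negative
    values indicate a tonal spectrum. `eps` is an assumed guard against log(0).
    """
    p = np.asarray(power_spectrum, dtype=float) + eps
    geometric_mean = np.exp(np.mean(np.log(p)))
    arithmetic_mean = np.mean(p)
    return float(np.log(geometric_mean / arithmetic_mean))


def durations_and_icis(onsets, offsets):
    """Call durations (offset - onset) and intervals between consecutive calls.

    The interval is measured from the end of one call to the beginning
    of the next, as defined in the text.
    """
    onsets = np.asarray(onsets, dtype=float)
    offsets = np.asarray(offsets, dtype=float)
    durations = offsets - onsets
    icis = onsets[1:] - offsets[:-1]
    return durations, icis


# Toy example: a flat spectrum versus a spectrum dominated by one frequency bin.
flat = np.ones(512)
tonal = np.full(512, 1e-6)
tonal[100] = 1.0
print(wiener_entropy(flat), wiener_entropy(tonal))

# Toy example: two calls with onsets and offsets in seconds.
print(durations_and_icis([0.0, 1.0], [0.4, 1.5]))
```

In this sketch, the flat spectrum yields an entropy close to 0, whereas the tonal spectrum yields a clearly negative value.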
Calculation of Zipf value and u-index
We calculated the “Zipf” values on the basis of Zipf’s principle of least effort (31). This principle can be statistically expressed by regressing the log of the rank of signals (in our case, single call types) within a repertoire on the log of their frequency of occurrence. A balance between unification and diversification is indicated by a regression coefficient of −1.00. Zipf showed that diverse human languages follow this rule. Several previous studies also used this value to test whether vocal sequences of animals follow Zipf’s principle (57, 58)
where R is the rank of frequency occurrence, Freq is the frequency of call types, and mean is the log10 mean.
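As a minimal sketch, and assuming that the Zipf value corresponds to the ordinary least-squares slope relating log10 frequency to log10 rank (the conventional orientation, under which a slope of −1.00 indicates the balance described above), the calculation could look as follows. The function name and the example counts are hypothetical.

```python
import numpy as np


def zipf_slope(call_type_counts):
    """Least-squares slope of log10(frequency) against log10(rank).

    Assumption: call types are ranked by descending count (rank 1 = most
    frequent), and call types with zero occurrences are ignored.
    """
    counts = np.array(
        sorted((c for c in call_type_counts if c > 0), reverse=True), dtype=float
    )
    if len(counts) < 2:
        raise ValueError("at least two call types with nonzero counts are required")
    log_rank = np.log10(np.arange(1, len(counts) + 1))
    log_freq = np.log10(counts)
    # Deviations from the log10 means of rank and frequency.
    d_rank = log_rank - log_rank.mean()
    d_freq = log_freq - log_freq.mean()
    return float(np.sum(d_rank * d_freq) / np.sum(d_rank ** 2))


# Hypothetical repertoire in which frequency is proportional to 1/rank.
print(zipf_slope([1200, 600, 400, 300]))
# Hypothetical repertoire in which all call types are equally frequent.
print(zipf_slope([50, 50, 50, 50]))
```

A repertoire whose call-type frequencies fall off as 1/rank gives a slope of about −1, whereas equally frequent call types give a slope of 0.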
In a recent study, Kershenbaum and colleagues (59) compared Zipf estimates of different synthetic and empirical datasets, showing that these estimates could be a helpful way to describe the communicative complexity of vocal sequences in animals.
In addition to the Zipf value, we calculated the u-index. The u-index is based on the Shannon-Wiener index, which is a common way to describe variability in the occurrence of given categories (e.g., to create a biodiversity index). The u-index represents a slight variation of the Shannon-Wiener index and ranges between 0 and 1. High u values indicate more similar frequencies in the usage of categories (30). In this study, we used the u-index to compare the frequency distribution of the transition types, with high u values indicating a more equal distribution of transition types and a lower predictability
where t_i is the relative frequency of occurrence of transition type “i” and N is the number of transition types.
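A form consistent with these definitions is the Shannon entropy of the transition-type frequencies divided by its maximum possible value, log2(N), which is how the U-value is commonly defined in the operant-variability literature (30). The sketch below assumes this normalized form; the function name and the example values are hypothetical.

```python
import math


def u_index(relative_frequencies):
    """Normalized Shannon entropy of a set of relative frequencies.

    Assumed form: u = -sum(t_i * log2(t_i)) / log2(N), where N is the number
    of transition types. Categories with t_i = 0 contribute nothing to the sum.
    """
    t = list(relative_frequencies)
    n = len(t)
    if n < 2:
        raise ValueError("at least two transition types are required")
    entropy = -sum(x * math.log2(x) for x in t if x > 0)
    return entropy / math.log2(n)


# Four transition types used equally often: maximal variability.
print(u_index([0.25, 0.25, 0.25, 0.25]))
# One transition type used exclusively: fully predictable.
print(u_index([1.0, 0.0, 0.0, 0.0]))
```

Equal use of all transition types yields u = 1, and exclusive use of a single transition type yields u = 0, matching the range stated above.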
Statistical analyses
To test for developmental changes in call structure, we analyzed each call type separately across all 16 recording periods. We used the mean values across subjects and recording periods. Because not all animals uttered all call types in every recording period, we used an rmp-test, permuting values across ages (60). This test can handle missing values for a given subject in a repeated-measures design without discarding available values and without requiring any replacement of missing values. We had six subjects. For two subjects, we had no recordings in the first and third month, and for one subject, we had recordings only for the first three months (table S1). In a few cases, some of the subjects did not produce a certain call type during a certain month, resulting in a total of 18 to 27% missing data points. We tested all 10 acoustic features (see table S2) and used a Simes correction to correct for multiple testing (61). An LMM with recording period as the fixed factor and subject as the random factor was used to identify developmental changes in call usage, call type repetitions, and Zipf value. In some cases, we log-transformed the data to meet LMM assumptions.
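The text does not spell out which Simes-based procedure was applied. Purely as an illustration, the sketch below implements Hochberg's step-up procedure, a widely used multiple-testing correction derived from the Simes inequality; whether this exact variant matches the analysis reported here is an assumption, and the function name and P values are hypothetical.

```python
def hochberg_reject(p_values, alpha=0.05):
    """Hochberg step-up procedure (illustrative Simes-based correction).

    Sort the P values in ascending order, find the largest rank k with
    p_(k) <= alpha / (m - k + 1), and reject the k hypotheses with the
    smallest P values. Returns a list of booleans in the input order.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    # Step up from the largest P value towards the smallest.
    for rank in range(m, 0, -1):
        if p_values[order[rank - 1]] <= alpha / (m - rank + 1):
            for i in order[:rank]:
                reject[i] = True
            break
    return reject


# Hypothetical P values for three acoustic features.
print(hochberg_reject([0.01, 0.04, 0.03]))
print(hochberg_reject([0.01, 0.20, 0.03]))
```

In the first example the largest P value is below α, so all three hypotheses are rejected; in the second, only the smallest P value survives the correction.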
To quantify the acoustic similarity across subjects over time, we used a DFA. We first grouped three consecutive recording periods into one triad. Then, we obtained pairwise F values for the acoustic distance between each triad. This approach has been applied in previous studies examining relationships between acoustic structure and genetic or geographic distance (62, 63). To estimate the F values, we entered duration and all frequency features into a stepwise DFA with “subject” as the grouping variable. We did this four times for each 3-month triad and separately for every call type. The selection criterion for acoustic features to enter the discriminant function was P_in = 0.05 and to be removed was P_out = 0.1. To compare changes over time, we used an LMM with time period (3-month triad) as the fixed factor. When necessary for LMM testing, we log-transformed the data.
We used the same approach to quantify the similarity of call type transitions across subjects over time. Here, instead of acoustic features, we used the transition rates between call types per triad to calculate the similarity values (F values). To control for multiple testing (different call types), we used a Simes correction. To test for developmental changes in transition rates, we compared the mean transition rates of all subjects over all five time periods (see Fig. 3C) with an rmp-test (see above). For the overall comparison of changes in transition rates, we used transition type as a between factor. F values and LMMs were calculated using IBM SPSS version 26.0 (IBM, Armonk, NY). In all tests, uncorrected significance was assessed at the α = 0.05 level.
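How transition rates were computed is not detailed here. One simple reading, shown only as a sketch, is the relative frequency of each ordered pair of consecutive call types within a recording; the function name and the example sequence are hypothetical.

```python
from collections import Counter


def transition_frequencies(call_sequence):
    """Relative frequency of each ordered pair of consecutive call types.

    Assumption: a transition is any pair of directly consecutive calls,
    regardless of the silent interval between them.
    """
    pairs = Counter(zip(call_sequence, call_sequence[1:]))
    total = sum(pairs.values())
    if total == 0:
        raise ValueError("at least two calls are required")
    return {pair: count / total for pair, count in pairs.items()}


# Hypothetical sequence of annotated call types.
sequence = ["phee", "trill", "phee", "trill", "cry"]
print(transition_frequencies(sequence))
```

Relative frequencies of this kind sum to 1 and could serve as the t_i values entering the u-index, again under the assumptions stated above.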
Acknowledgments
We are grateful to W. Freiwald for giving us the opportunity to investigate the animal subjects in his laboratory. Without his hospitality, this study would not have been possible. We thank J. Fischer, P. Viswanathan, B. Deen, and P. Schade for comments and fruitful discussion on this manuscript. We thank J. Hartling and M.-L. Herzog for help in data analysis. Funding: This work was supported by the Werner Reichardt Centre for Integrative Neuroscience (CIN) at the Eberhard Karls University of Tübingen (CIN is an Excellence Cluster funded by the Deutsche Forschungsgemeinschaft within the framework of the Excellence Initiative EXC 307). D.G.C.H. received funding support from the Leon Levy Foundation. Author contributions: Y.B.G. conceived the study. S.R.H., Y.B.G., and D.G.C.H. designed the experiments. D.G.C.H. and Y.B.G. conducted the experiments. Y.B.G. and K.H. analyzed the data. S.R.H., K.H., and Y.B.G. created the visualizations. All authors interpreted the data and wrote the manuscript. S.R.H. acquired funding and supervised the project. Competing interests: The authors declare that they have no competing interests. Data and materials availability: All data needed to evaluate the conclusions in the paper are present in the paper or the Supplementary Materials. The original dataset on which our analyses are based can be found at https://doi.org/10.5061/dryad.5tb2rbp3r.
SUPPLEMENTARY MATERIALS
Supplementary material for this article is available at http://advances.sciencemag.org/cgi/content/full/7/27/eabf2938/DC1
REFERENCES AND NOTES
- 1. Oller D. K., Eilers R. E., Neal A. R., Schwartz H. K., Precursors to speech in infancy. J. Commun. Disord. 32, 223–245 (1999).
- 2. D. K. Oller, The Emergence of the Speech Capacity (Lawrence Erlbaum Associates, 2000).
- 3. Scheiner E., Hammerschmidt K., Jürgens U., Zwirner P., The influence of hearing impairment on preverbal emotional vocalizations of infants. Folia Phoniatr. Logop. 56, 27–40 (2004).
- 4. Jhang Y., Oller D. K., Emergence of functional flexibility in infant vocalizations of the first 3 months. Front. Psychol. 8, 300 (2017).
- 5. Oller D. K., Buder E. H., Ramsdell H. L., Warlaumont A. S., Chorna L., Bakeman R., Functional flexibility of infant vocalization and the emergence of language. Proc. Natl. Acad. Sci. 110, 6318–6323 (2013).
- 6. Koopmans-van Beinum F. J., Clement C. J., Van Den Dikkenberg-Pot I., Babbling and the lack of auditory speech perception: A matter of coordination? Dev. Sci. 4, 61–70 (2001).
- 7. Nathani S., Ertmer D. J., Stark R. E., Assessing vocal development in infants and toddlers. Clin. Linguist. Phon. 20, 351–369 (2006).
- 8. Ertmer D. J., Young N. M., Nathani S., Profiles of vocal development in young cochlear implant recipients. J. Speech Lang. Hear. Res. 50, 393–407 (2007).
- 9. Goldstein M. H., King A. P., West M. J., Social interaction shapes babbling: Testing parallels between birdsong and speech. Proc. Natl. Acad. Sci. 100, 8030–8035 (2003).
- 10. Elowson A. M., Snowdon C. T., Lazaro-Perea C., ‘Babbling’ and social context in infant monkeys: Parallels to human infants. Trends Cogn. Sci. 2, 31–37 (1998).
- 11. Hammerschmidt K., Freudenstein T., Jürgens U., Vocal development in squirrel monkeys. Behaviour 138, 1179–1204 (2001).
- 12. Pistorio A. L., Vintch B., Wang X., Acoustic analysis of vocal development in a New World primate, the common marmoset (Callithrix jacchus). J. Acoust. Soc. Am. 120, 1655–1670 (2006).
- 13. Hammerschmidt K., Newman J. D., Champoux M., Suomi S. J., Changes in rhesus macaque “Coo” vocalizations during early development. Ethology 106, 873–886 (2000).
- 14. Takahashi D. Y., Fenley A. R., Teramoto Y., Narayanan D. Z., Borjon J. I., Holmes P., Ghazanfar A. A., The developmental dynamics of marmoset monkey vocal production. Science 349, 734–738 (2015).
- 15. Takahashi D. Y., Fenley A. R., Ghazanfar A. A., Early development of turn-taking with parents shapes vocal acoustics in infant marmoset monkeys. Philos. Trans. R. Soc. B Biol. Sci. 371, 20150370 (2016).
- 16. Takahashi D. Y., Liao D. A., Ghazanfar A. A., Vocal learning via social reinforcement by infant marmoset monkeys. Curr. Biol. 27, 1844–1852.e6 (2017).
- 17. Gultekin Y. B., Hage S. R., Limiting parental feedback disrupts vocal development in marmoset monkeys. Nat. Commun. 8, 14046 (2017).
- 18. Gultekin Y. B., Hage S. R., Limiting parental interaction during vocal development affects acoustic call structure in marmoset monkeys. Sci. Adv. 4, eaar4012 (2018).
- 19. Schultz-Darken N., Braun K. M., Emborg M. E., Neurobehavioral development of common marmoset monkeys. Dev. Psychobiol. 58, 141–158 (2016).
- 20. Ziegler T. E., Sosa M. E., Colman R. J., Fathering style influences health outcome in common marmoset (Callithrix jacchus) offspring. PLOS ONE 12, e0185695 (2017).
- 21. Nottebohm F., Nottebohm M. E., Crane L., Developmental and seasonal changes in canary song and their relation to changes in the anatomy of song-control nuclei. Behav. Neural Biol. 46, 445–471 (1986).
- 22. Fitch W. T. S., Vocal tract length and formant frequency dispersion correlate with body size in rhesus macaques. J. Acoust. Soc. Am. 102, 1213–1222 (1997).
- 23. Pfefferle D., Fischer J., Sounds and size: Identification of acoustic variables that reflect body size in hamadryas baboons, Papio hamadryas. Anim. Behav. 72, 43–51 (2006).
- 24. Zürcher Y., Burkart J. M., Evidence for dialects in three captive populations of common marmosets (Callithrix jacchus). Int. J. Primatol. 38, 780–793 (2017).
- 25. Pomberger T., Risueno-Segovia C., Löschner J., Hage S. R., Precise motor control enables rapid flexibility in vocal behavior of marmoset monkeys. Curr. Biol. 28, 788–794.e3 (2018).
- 26. Risueno-Segovia C., Hage S. R., Theta synchronization of phonatory and articulatory systems in marmoset monkey vocal production. Curr. Biol. 30, 4276–4283.e3 (2020).
- 27. Zhang Y. S., Ghazanfar A. A., Perinatally influenced autonomic system fluctuations drive infant vocal sequences. Curr. Biol. 26, 1249–1260 (2016).
- 28. Snowdon C. T., Elowson A. M., ‘Babbling’ in pygmy marmosets: Development after infancy. Behaviour 138, 1235–1248 (2001).
- 29. Bezerra B. M., Souto A., Structure and usage of the vocal repertoire of Callithrix jacchus. Int. J. Primatol. 29, 671–701 (2008).
- 30. Neuringer A., Deiss C., Olson G., Reinforced variability and operant learning. J. Exp. Psychol. Anim. Behav. Process. 26, 98–111 (2000).
- 31. G. K. Zipf, Human Behavior and the Principle of Least Effort (Addison-Wesley Press, 1949).
- 32. Ferrer i Cancho R., Solé R. V., Least effort and the origins of scaling in human language. Proc. Natl. Acad. Sci. U.S.A. 100, 788–791 (2003).
- 33. K. Hammerschmidt, J. Fischer, Evolution of Communicative Flexibility (MIT Press, 2008), pp. 92–119.
- 34. Jürgens U., Neural pathways underlying vocal control. Neurosci. Biobehav. Rev. 26, 235–258 (2002).
- 35. Ackermann H., Hage S. R., Ziegler W., Brain mechanisms of acoustic communication in humans and nonhuman primates: An evolutionary perspective. Behav. Brain Sci. 37, 529–546 (2014).
- 36. Agamaite J. A., Chang C.-J., Osmanski M. S., Wang X., A quantitative acoustic analysis of the vocal repertoire of the common marmoset (Callithrix jacchus). J. Acoust. Soc. Am. 138, 2906–2928 (2015).
- 37. Lingle S., Riede T., Deer mothers are sensitive to infant distress vocalizations of diverse mammalian species. Am. Nat. 184, 510–522 (2014).
- 38. Hiraoka D., Ooishi Y., Mugitani R., Nomura M., Differential effects of infant vocalizations on approach-avoidance postural movements in mothers. Front. Psychol. 10, 1378 (2019).
- 39. Bowlby J., Attachment and loss: Retrospect and prospect. Am. J. Orthopsychiatry 52, 664–678 (1982).
- 40. Miller C. T., Wang X., Sensory-motor interactions modulate a primate vocal behavior: Antiphonal calling in common marmosets. J. Comp. Physiol. A Neuroethol. Sens. Neural Behav. Physiol. 192, 27–38 (2006).
- 41. Takahashi D. Y., Narayanan D. Z., Ghazanfar A. A., Coupled oscillator dynamics of vocal turn-taking in monkeys. Curr. Biol. 23, 2162–2168 (2013).
- 42. M. Yamamoto, Marmosets and Tamarins: Systematics Behaviour and Ecology, A. B. Rylands, Ed. (Oxford Univ. Press, 1993), pp. 235–254.
- 43. Wang Y., Fang Q., Gong N., Motor assessment of developing common marmosets. Neurosci. Bull. 30, 387–393 (2014).
- 44. R. E. Stark, Child Phonology, G. H. Yeni-Komshian, J. F. Kavanagh, C. A. Ferguson, Eds. (Academic Press, 1980), pp. 73–92.
- 45. Mori C., Liu W. C., Wada K., Recurrent development of song idiosyncrasy without auditory inputs in the canary, an open-ended vocal learner. Sci. Rep. 8, 8732 (2018).
- 46. Tchernichovski O., Mitra P. P., Lints T., Nottebohm F., Dynamics of the vocal imitation process: How a zebra finch learns its song. Science 291, 2564–2569 (2001).
- 47. Knudsen E. I., Sensitive periods in the development of the brain and behavior. J. Cogn. Neurosci. 16, 1412–1425 (2004).
- 48. Hage S. R., Gavrilov N., Nieder A., Cognitive control of distinct vocalizations in rhesus monkeys. J. Cogn. Neurosci. 25, 1692–1701 (2013).
- 49. Pomberger T., Risueno-Segovia C., Gultekin Y. B., Dohmen D., Hage S. R., Cognitive control of complex motor behavior in marmoset monkeys. Nat. Commun. 10, 3796 (2019).
- 50. Eliades S. J., Tsunada J., Auditory cortical activity drives feedback-dependent vocal control in marmosets. Nat. Commun. 9, 2540 (2018).
- 51. Pomberger T., Löschner J., Hage S. R., Compensatory mechanisms affect sensorimotor integration during ongoing vocal motor acts in marmoset monkeys. Eur. J. Neurosci. 52, 3531–3544 (2020).
- 52. Romanski L. M., Goldman-Rakic P., An auditory domain in primate prefrontal cortex. Nat. Neurosci. 5, 15–16 (2002).
- 53. Gavrilov N., Hage S., Nieder A., Functional specialization of the primate frontal lobe during cognitive control of vocalizations. Cell Rep., 2393–2406 (2017).
- 54. Roy S., Zhao L., Wang X., Distinct neural activities in premotor cortex during natural vocal behaviors in a new world primate, the common marmoset (Callithrix jacchus). J. Neurosci. 36, 12168–12179 (2016).
- 55. Miller C. T., Thomas A. W., Nummela S. U., de la Mothe L. A., Responses of primate frontal cortex neurons during natural vocal communication. J. Neurophysiol. 114, 1158–1171 (2015).
- 56. Ghazanfar A. A., Liao D. A., Takahashi D. Y., Volition and learning in primate vocal behaviour. Anim. Behav. 151, 239–247 (2019).
- 57. McCowan B., Hanser S. F., Doyle L. R., Quantitative tools for comparing animal communication systems: Information theory applied to bottlenose dolphin whistle repertoires. Anim. Behav. 57, 409–419 (1999).
- 58. Gustison M. L., Semple S., Ferrer-I-Cancho R., Bergman T. J., Gelada vocal sequences follow Menzerath’s linguistic law. Proc. Natl. Acad. Sci. U.S.A. 113, E2750–E2758 (2016).
- 59. Kershenbaum A., Demartsev V., Gammon D. E., Geffen E., Gustison M. L., Ilany A., Lameira A. R., Shannon entropy as a robust estimator of Zipf’s Law in animal vocal communication repertoires. Methods Ecol. Evol. 12, 553–564 (2020).
- 60. Mundry R., Testing related samples with missing values: A permutation approach. Anim. Behav. 58, 1143–1153 (1999).
- 61. Hochberg Y., Rom D. M., Extensions of multiple testing procedures based on Simes’ test. J. Stat. Plan. Inference 48, 141–152 (1995).
- 62. Thinh V. N., Hallam C., Roos C., Hammerschmidt K., Concordance between vocal and genetic diversity in crested gibbons. BMC Evol. Biol. 11, 36 (2011).
- 63. Meyer D., Hodges J. K., Rinaldi D., Wijaya A., Roos C., Hammerschmidt K., Acoustic structure of male loud-calls support molecular phylogeny of Sumatran and Javanese leaf monkeys (genus Presbytis). BMC Evol. Biol. 12, 16 (2012).
- 64. de Castro Leão A., Duarte Dória Neto A., Bernardete Cordeiro de Sousa M. B. C., New developmental stages for common marmosets (Callithrix jacchus) using mass and age variables obtained by K-means algorithm and self-organizing maps (SOM). Comput. Biol. Med. 39, 853–859 (2009).