Abstract
Formants are important phonetic elements of human speech that are also used by humans and non-human mammals to assess the body size of potential mates and rivals. As a consequence, it has been suggested that formant perception, which is crucial for speech perception, may have evolved through sexual selection. Somewhat surprisingly, though, no previous studies have examined whether sexes differ in their ability to use formants for size evaluation. Here, we investigated whether men and women differ in their ability to use the formant frequency spacing of synthetic vocal stimuli to make auditory size judgements over a wide range of fundamental frequencies (the main determinant of vocal pitch). Our results reveal that men are significantly better than women at comparing the apparent size of stimuli, and that lower pitch improves the ability of both men and women to perform these acoustic size judgements. These findings constitute the first demonstration of a sex difference in formant perception, and lend support to the idea that acoustic size normalization, a crucial prerequisite for speech perception, may have been sexually selected through male competition. We also provide the first evidence that vocalizations with relatively low pitch improve the perception of size-related formant information.
Keywords: vocal communication, formants, acoustic cues to size
1. Introduction
Vocal tract resonances or ‘formants’ are the key acoustic parameters underlying the phonetic diversity of human speech (for an overview, see [1]). However, they can also provide non-linguistic information about potentially important biosocial dimensions of speakers. In particular, lower and more closely spaced formant frequencies are indicative of larger speakers [2,3] because vocal tract length and body size are correlated in humans [4] and longer vocal tracts produce lower formants [5]. Hence, the entire formant pattern is scaled down or up in larger and smaller speakers, respectively. In addition, because physical size often determines the outcome of competitive interactions, the use of formants for assessing body size from vocal signals may have been important in our ancestors for reliably assessing the quality or competitiveness of potential mates and/or rivals [1]. Indeed, recent studies have shown that formant spacing is a reliable cue to body size in several non-human mammal species that can play a functional role in female mate choice and male–male competition (reviewed in [6]). Other recent work [7–10] has shown that human listeners rate speakers with lower formants as sounding larger, more dominant, masculine and attractive.
This body of work not only suggests that formants are used by humans and non-human mammals to assess potential mates and rivals but also indicates that formant perception, which is crucial for speech perception [1,5,11,12], may have evolved through sexual selection. Furthermore, whereas men appear to use formants to judge the physical dominance of potential rivals [8], formants are not consistently found to predict women's attractiveness ratings of men's voices [2,9]. As a result, we may expect men to have more acute perception of size-related formant information in vocal signals. Surprisingly though, no previous studies have investigated whether men and women actually differ in their ability to use formants to make auditory size judgements.
The primary aim of the current study was to investigate whether men and women differ in their ability to make relative size judgements using small differences in the formant spacing of synthetic stimuli representing differently sized animals. We predicted that listeners would rate stimuli with lower formants as coming from larger animals (humans or otherwise), and that male listeners would be better than female listeners at this task. We also examined comparison performance over a wide range of fundamental frequencies (the main determinant of vocal pitch, hereafter F0). Formant perception in human speech is compromised at higher F0s (e.g. [11]), presumably because formant peaks become poorly resolved as the density of harmonics sampling the formant envelope decreases below a certain threshold [13]. Thus, based on the assumption that lower F0 improves the perceptual salience of formants, we predicted that the ability to categorize the apparent size of the vocal stimuli would improve as F0 decreases.
2. Material and methods
(a). Subjects
The study was conducted at the Bader International Study Centre, East Sussex, UK. A total of 55 college undergraduates (18 males and 37 females) completed the experiment. Participants were aged between 17 and 20 years. All participants gave informed consent.
(b). Stimuli
We synthesized a set of vocal stimuli representing differently sized animals using the Praat 5.1.32 DSP package (www.praat.org), following the principles of the source-filter theory of voice production [5]. The stimuli consisted of a 1 s long harmonic complex tone (the ‘source’) combined with a formantGrid pattern (the ‘filter’) with equally spaced formants, so that it approximated an idealized uniform straight tube (or an unperturbed vocal tract). The formant pattern consisted of 10 formants with an overall formant spacing of 1100 Hz (corresponding to a vocal tract length of 15.9 cm), which falls within the typical human range ([5]; for more details see the electronic supplementary material). The stimuli were arranged in matched pairs so that stimuli with the original formant pattern (baseline condition) were followed 0.5 s later by stimuli that had been rescaled by shifting all of the formants up or down by 1–5% (figure 1). Stimulus pairs were created with F0s of 10, 20, 40, 80, 160 and 320 Hz. These F0 values encompass the F0 range of the human speaking voice [5] and allowed us to test the ability of listeners to detect small differences in apparent size across a wide range of F0s (examples of the stimuli are provided as electronic supplementary material).
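As a rough illustration of the formant pattern described above, the following Python sketch computes the resonances of an idealized uniform tube closed at one end, using the quarter-wavelength relation F_i = (2i − 1)c/(4L), and then rescales them by a percentage. The speed of sound (350 m/s) is an assumption, chosen because it approximately reproduces the stated 1100 Hz spacing for a 15.9 cm tract. The function names are hypothetical, and this is not the Praat script used to build the stimuli.

```python
# Sketch only: idealized uniform-tube formants and their rescaling.
SPEED_OF_SOUND_M_S = 350.0  # assumed value; not stated in the text


def tube_formants(vtl_m, n_formants=10, c=SPEED_OF_SOUND_M_S):
    """Resonances (Hz) of a uniform tube closed at one end."""
    return [(2 * i - 1) * c / (4.0 * vtl_m) for i in range(1, n_formants + 1)]


def formant_spacing(formants):
    """Mean spacing (Hz) between adjacent formants."""
    gaps = [hi - lo for lo, hi in zip(formants, formants[1:])]
    return sum(gaps) / len(gaps)


def rescale(formants, percent):
    """Shift every formant up (positive) or down (negative) by a percentage."""
    factor = 1.0 + percent / 100.0
    return [f * factor for f in formants]


baseline = tube_formants(0.159)       # 15.9 cm vocal tract, 10 formants
shifted_up = rescale(baseline, 5)     # higher formants: smaller apparent size
shifted_down = rescale(baseline, -5)  # lower formants: larger apparent size

# Mean spacing of the baseline pattern; close to the 1100 Hz given in the text.
print(formant_spacing(baseline))
```

Under this simple tube model, shifting all formants up by 5% is equivalent to shortening the tube by a factor of 1/1.05, which is why a uniform rescaling can be read as a change in apparent size.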
Figure 1.
(a–f) Spectrograms to illustrate the experimental stimuli at the six different pitch classes (spectrogram settings: window length = 0.05 s, frequency step = 20 Hz, dynamic range = 40 dB, Gaussian window shape). In each example, the formants are shifted up by 5% in the second presentation. The formants are labelled F1–F4.
(c). Experimental procedure
The stimuli were presented through JVC HA-S360 professional headphones (London, UK) at a comfortable pre-set volume. Participants were informed that they would hear pairs of audio stimuli representing two different animals, and that their task was to decide which one sounded ‘larger’ by clicking on the appropriate button on the computer screen. Each participant received 60 unique stimulus pairs representing the six pitch classes (10–320 Hz) with the formants shifted up or down by 1–5%. Custom-written software in Python v. 2.6 was used to randomize stimulus presentation and collect responses. A generalized linear model fitted by maximum likelihood estimation was used to examine variation in listeners’ size categorization performance (see the electronic supplementary material for further details).
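The 60 stimulus pairs are consistent with a full crossing of six F0 values, five shift magnitudes and two shift directions (6 × 5 × 2 = 60). The sketch below enumerates and shuffles such a design. It is not the original Python 2.6 software: the whole-percent steps (1, 2, 3, 4, 5), the dictionary layout, the function names and the scoring rule (a response counts as correct when the stimulus with lower formants is judged larger) are assumptions made for illustration.

```python
import itertools
import random

F0_VALUES_HZ = [10, 20, 40, 80, 160, 320]  # six pitch classes
SHIFT_PERCENTS = [1, 2, 3, 4, 5]           # assumed whole-percent steps
DIRECTIONS = [+1, -1]                      # formants shifted up or down


def build_trials(seed=None):
    """Return the 60 stimulus pairs in a randomized presentation order."""
    trials = [
        {"f0_hz": f0, "shift_percent": direction * pct}
        for f0, pct, direction in itertools.product(
            F0_VALUES_HZ, SHIFT_PERCENTS, DIRECTIONS
        )
    ]
    random.Random(seed).shuffle(trials)
    return trials


def is_correct(trial, chose_second):
    """Score one response under the assumed rule.

    The baseline is played first, so the second stimulus sounds larger
    only when its formants were shifted down.
    """
    second_is_larger = trial["shift_percent"] < 0
    return chose_second == second_is_larger


trials = build_trials(seed=1)
print(len(trials))
```

Passing a seed makes a presentation order reproducible for a given participant; omitting it gives a fresh random order on each run.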
3. Results
Male participants were significantly better at classifying the apparent size of stimuli than female participants (Wald test, p = 0.034) (figure 2a). In addition, significant main effects of formant rescaling (Wald test, p < 0.001) and pitch (Wald test, p = 0.017) on the proportion of correct size judgements made by listeners were revealed: in particular, listeners were better at categorizing low-pitched stimuli according to their apparent size than they were at categorizing high-pitched stimuli (figure 2b), and size categorization performance increased steadily as the difference in formant rescaling between the baseline and test stimulus increased from 1% to 5% (figure 2c). No statistically significant interaction effects were observed (gender × pitch: Wald test, p = 0.945; gender × formant condition: Wald test, p = 0.700; pitch × formant condition: Wald test, p = 0.385; gender × pitch × formant condition: Wald test, p = 0.934).
Figure 2.

(a) Proportion of correct classifications ± s.e.m. made by male and female participants, (b) relationship between the proportion of correct classifications and stimulus pitch, and (c) proportion of correct classifications ± s.e.m. for the different formant rescaling conditions.
4. Discussion
We found that men were significantly better than women at using small differences in the formant spacing of synthetic vocal stimuli to make relative size judgements. This sex difference was consistent for shifts in apparent size of 1–5% and across a wide range of F0s (from 10 to 320 Hz), as indicated by the absence of significant interaction effects. The fact that untrained men are better than women at spontaneously using the formant structure of vocal stimuli to correctly compare their apparent size is consistent with studies showing that women are more reliant on voice pitch than formants when they rate the attractiveness of male voices [9], whereas men tend to use formant spacing for dominance attributions [8]. While men also appear to be better at perceiving temporal and tonal contrasts in speech and non-speech sounds [14,15], to our knowledge, our results represent the first demonstration of a sex difference involving human formant perception.
Furthermore, the ability to perceive formant frequency spacing is crucial for the perception of speech sounds because the human auditory system needs to normalize the size-related formant variation in speech sounds produced by differently sized speakers with different vocal tract lengths, in order to retrieve the phonetic information encoded in the relative, rather than absolute position of formant frequencies [12]. This ‘size normalization’ appears to be applied to all sounds at a relatively early stage in auditory processing [16], suggesting that humans have dedicated perceptual mechanisms for automatically processing size-related formant information. Our results show that this ability is more developed in men, and support the idea that sexual selection might have played a role in the evolution of this key prerequisite of speech perception [1]. Future studies could aim to reveal whether sex differences in the auditory processing of size-related information in vocal signals also exist at a neurological level.
In addition, we have shown that the ability of human listeners to classify the apparent size of synthetic non-speech sounds varying only in their formant spacing is greater for stimuli with low F0. Low F0 vocalizations are predicted to be particularly well suited for highlighting formants because the dense harmonic spacing should allow the formant peaks to be more clearly resolved [13]. Furthermore, ‘pulsatile’ vocalizations, in which the individual glottal pulses are heard as separate events rather than as a continuous pitch, should be ideal for the auditory discrimination of formant frequencies because each discrete pulse contains energy across a broad frequency range, making it likely that formant-related information is emphasized. Interestingly, the vocal repertoires of several animal species include vocalizations characterized by very low F0 that may function to increase the salience of formant-related information [17,18]. Our results provide the first empirical support for the idea that lowering F0 does indeed improve the perception of size-related formant information.
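To make the harmonic-density argument concrete, the sketch below counts how many source harmonics fall within one formant spacing at each F0 used in the study, taking the 1100 Hz baseline spacing from the Methods. It is a simple ratio intended as an illustration, not a perceptual model, and the function name is hypothetical.

```python
FORMANT_SPACING_HZ = 1100.0                # baseline spacing from the Methods
F0_VALUES_HZ = [10, 20, 40, 80, 160, 320]  # F0 values used in the study


def harmonics_per_formant_spacing(f0_hz, spacing_hz=FORMANT_SPACING_HZ):
    """Harmonics are spaced f0_hz apart, so spacing / F0 of them fit
    between two adjacent formant peaks."""
    return spacing_hz / f0_hz


for f0 in F0_VALUES_HZ:
    print(f0, harmonics_per_formant_spacing(f0))
```

By this measure, about 110 harmonics sample each formant interval at an F0 of 10 Hz, compared with roughly 3.4 at 320 Hz, so the spectral envelope is traced far more sparsely at the highest F0.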
We suggest that future studies investigate whether sex differences in the processing of size-related formant information exist in non-human mammals, and examine whether the sex difference we have reported in human listeners is specific to human voice-like sounds or generalizes to other resonant sources. Finally, it is also important to note that the sex difference in size discrimination we report in the current study could be innate or acquired or both. Hence, while it is compatible with the hypothesis that men rely on size assessment more than women, it does not conclusively demonstrate that these abilities arose through sexual selection. For example, it is possible that males learn to cue on size-related information in vocal signals more than females because this information is more important to them during their everyday social interactions. There may also be key differences across cultures, particularly in societies where gender roles differ markedly. Thus, future studies that examine the effects of training and personality, as well as social and cultural factors on the development of human auditory size discrimination, are also warranted.
Acknowledgements
A Leverhulme Trust Early Career Fellowship awarded to Benjamin D. Charlton financially supported this work. The University of Sussex Research Ethics Committee approved the study (BC0312).
References
- 1. Fitch WT. 2010. The evolution of language. Cambridge, UK: Cambridge University Press.
- 2. Bruckert L, Lienard JS, Lacroix A, Kreutzer M, Leboucher G. 2006. Women use voice parameters to assess men's characteristics. Proc. R. Soc. B 273, 83–89. (doi:10.1098/rspb.2005.3265)
- 3. Evans S, Neave N, Wakelin D. 2006. Relationships between vocal characteristics and body size and shape in human males: an evolutionary explanation for a deep male voice. Biol. Psychol. 72, 160–163. (doi:10.1016/j.biopsycho.2005.09.003)
- 4. Fitch WT, Giedd J. 1999. Morphology and development of the human vocal tract: a study using magnetic resonance imaging. J. Acoust. Soc. Am. 106, 1511–1522. (doi:10.1121/1.427148)
- 5. Titze IR. 1994. Principles of voice production. Englewood Cliffs, NJ: Prentice Hall.
- 6. Taylor A, Reby D. 2010. The contribution of source-filter theory to mammal vocal communication research. J. Zool. 280, 221–236. (doi:10.1111/j.1469-7998.2009.00661.x)
- 7. Pisanski K, Rendall D. 2011. The prioritization of voice fundamental frequency or formants in listener's assessments of speaker size, masculinity, and attractiveness. J. Acoust. Soc. Am. 129, 2201–2212. (doi:10.1121/1.3552866)
- 8. Puts DA, Hodges CR, Cardenas RA, Gaulin SJC. 2007. Men's voices as dominance signals: vocal fundamental and formant frequencies influence dominance attributions among men. Evol. Hum. Behav. 28, 340–344. (doi:10.1016/j.evolhumbehav.2007.05.002)
- 9. Feinberg DR, Jones BC, Little AC, Burt DM, Perrett DI. 2005. Manipulations of fundamental and formant frequencies influence the attractiveness of human male voices. Anim. Behav. 69, 561–568. (doi:10.1016/j.anbehav.2004.06.012)
- 10. Puts DA, Apicella CL, Cárdenas RA. 2012. Masculine voices signal men's threat potential in forager and industrial societies. Proc. R. Soc. B 279, 601–609. (doi:10.1098/rspb.2011.0829)
- 11. Kewley-Port D, Li X, Zheng Y, Neel A. 1996. Fundamental frequency effects on thresholds for vowel formant discrimination. J. Acoust. Soc. Am. 100, 2462–2470. (doi:10.1121/1.417954)
- 12. Ladefoged P, Broadbent D. 1957. Information conveyed by vowels. J. Acoust. Soc. Am. 29, 98–104. (doi:10.1121/1.1908694)
- 13. Ryalls JH, Lieberman P. 1982. Fundamental-frequency and vowel perception. J. Acoust. Soc. Am. 72, 1631–1634. (doi:10.1121/1.388499)
- 14. Kempe V, Thoresen JC, Kirk NW, Schaeffler F, Brooks PJ. 2012. Individual differences in the discrimination of novel speech sounds: effects of sex, temporal processing, musical and cognitive abilities. PLoS ONE 7, e48623. (doi:10.1371/journal.pone.0048623)
- 15. McRoberts GW, Sanders B. 1992. Sex differences in performance and hemispheric organization for a nonverbal auditory task. Percept. Psychophys. 51, 118–122. (doi:10.3758/BF03212236)
- 16. Smith DRR, Patterson RD, Turner R. 2005. The processing and perception of size information in speech sounds. J. Acoust. Soc. Am. 117, 305–318. (doi:10.1121/1.1828637)
- 17. Charlton BD, Ellis WAH, McKinnon AJ, Cowin GJ, Brumm J, Nilsson K, Fitch WT. 2011. Cues to body size in the formant spacing of male koala (Phascolarctos cinereus) bellows: honesty in an exaggerated trait. J. Exp. Biol. 214, 3414–3422. (doi:10.1242/jeb.061358)
- 18. Vannoni E, McElligott AG. 2007. Individual acoustic variation in fallow deer (Dama dama) common and harsh groans: a source-filter theory perspective. Ethology 113, 223–234. (doi:10.1111/j.1439-0310.2006.01323.x)

