Summary:
Objectives/Hypotheses.
Charismatic leaders use vocal behavior to persuade their audience, achieve goals, arouse emotional states, and convey personality traits and leadership status. This study investigates voice fundamental frequency (f0) and sound pressure level (SPL) in female and male French, Italian, Brazilian, and American politicians to determine which acoustic parameters are related to cross-gender and cross-cultural common vocal abilities, and which derive from culture-, gender-, and language-specific vocal strategies used to adapt vocal behavior to listeners’ culture-related expectations.
Study Design.
Speech corpora were collected for two formal communicative contexts (leaders address followers or other leaders) and one informal communicative context (dyadic interaction), based on the persuasive goals inherent in each context and on the relative status of the listeners and speakers. Leaders’ acoustic voice profiles were created to show differences in f0 and SPL manipulation with respect to speakers’ gender and language in each communicative context.
Results.
Cross-gender and cross-language similarities in manipulation of average f0 and in f0 and SPL ranges occurred in all communicative contexts. Patterns of f0 manipulation were shared across genders and cultures, suggesting this dimension might be biologically based and is exploited by leaders to convey dominance. Ranges for f0 and SPL seemed to be affected by the communicative context, being wider or narrower depending on the persuasive goal. Results also showed language- and speaker-specific differences in the acoustic manipulation of f0 and SPL over time.
Conclusions.
These findings are consistent with the idea that specific charismatic leaders’ vocal behaviors depend on a fine combination of vocal abilities that are shared across cultures and genders, combined with culturally- and linguistically-filtered vocal strategies.
Keywords: Charisma, Voice quality, f0, SPL, Cross-cultural
1. INTRODUCTION
Charisma is the set of characteristics, including political vision, emotions, and dominance, that leaders use to share beliefs and achieve goals. Charismatic characteristics are displayed through “charisma of the mind”--verbal behaviors that convey the strength of the leaders’ ideas and visions, expressed through spoken words and written texts--and/or through “charisma of the body,” the non-verbal behaviors (voice, facial expression, gesture, posture, etc.) that leaders use to shape ideas and visions and to express personality and emotions. In particular, voice characteristics are fundamental in conveying speakers’ personality traits and internal affective states [1–3], and in identifying speakers and distinguishing them from one another [4–10]. Speakers directly manipulate the acoustical patterns of their speech to convey different traits across communicative contexts (environmental acoustics, audience’s social status, gender, and age [11]). The link between these extrinsic and intrinsic speaker characteristics and the specific acoustic characteristics of speech also depends in part on social context [12,13], and several studies show how speech acoustics affects speakers’ credibility and social attractiveness differently in different languages and cultures. For example, a regional or foreign accent negatively affects speakers’ credibility among American English listeners [14,15], but does not affect speakers’ social attractiveness among Italian listeners [16].
This study investigates voice acoustics in political speech from a cross-gender, cross-language sample of speakers. Our goal is to distinguish vocal manipulations related to gender, which could reflect political leaders’ inherent strengths [17,18] and dominance [19], from those resulting from strategies depending on the language spoken, presumably reflecting learned strategies for conveying strength and dominance. To this end, we measured female and male political leaders’ vocal fundamental frequency (f0) and sound pressure level (SPL) across a variety of communicative contexts. Across species, genders, and cultures, f0 depends in part on learned factors, including the phonetic and phonologic structure of the language being spoken [20] and on the extra-linguistic uses of voice quality shared by a given group of speakers [21]. F0 also depends on the speaker’s anatomy [22–26] and physiology [27,28], and thus can reliably signal physical size [18,29] along with a speaker’s emotional state [30,31], personality [7], sex [32], age [21], attractiveness [33,34], and threat potential [19], in addition to leadership status [17,35,36]. Pitch, the perceptual correlate of f0, has been shown to influence listeners’ choice of a leader [37–40], and is exploited by listeners according to a “frequency code” [19]. This code associates (1) high frequencies with a primary meaning of small and harmless vocalizers, and a secondary meaning of a subordinate attitude and submissive behavior, and (2) low frequencies with a primary meaning of a big and potentially dangerous vocalizer and a secondary meaning of a superior attitude and dominant behavior. Note that most studies particularly focusing on f0 range are based on read speech [41], singing voice [42], voice disorders [27], or acted speech [41], and not on naturally-occurring utterances, which are difficult to gather under controlled circumstances.
SPL is the primary acoustic correlate of perceived loudness, and has been associated with listeners’ perceptions of pragmatic [21] and idiomatic meaning [43], as well as emotional state [1,44]. SPL physically depends on the interaction between subglottal pressure, resistance at the vocal folds, and the status of the upstream vocal tract (see [21] for a review). As a consequence, it is phonetically related to prosodic features, such as pauses in utterances, articulatory changes, and word stress [28]. SPL measurement also depends on the distance between the speaker and the listener or recording device, thus complicating comparison of measurements across occasions or contexts. Finally, environmental factors (e.g., background noise [45,46] and communicative contexts [47]) influence SPL variation, so that experimentally controlled recordings are nearly impossible to gather. For these reasons, absolute measures of SPL from non-controlled settings are poor independent indices of speakers’ identity, sex, or age [21]. However, it is possible to examine normalized ranges of SPL variation (relative SPL, or SPLrel), which can be compared across utterances, speakers, and contexts. This is the approach taken in the present study.
To investigate how biological and social factors affect the speech of political leaders, we studied recorded orations by female and male charismatic speakers and compared the manner in which they varied f0 and SPL across contexts. Three communicative contexts were examined: a monologue addressed to followers in a formal campaign context (the monologue context), a monologue addressed to other politicians at a formal conference in an institutional conference room (the conference context), and an informal face-to-face interview, during which no political topics were addressed (a control condition). Biologically-based uses of voice predict that leadership is conveyed innately, which would lead to similarities between languages and genders across contexts. Use of learned vocal strategies to enhance persuasion suggests that we should expect differences across communicative contexts as speakers tune orations to specific audiences. Specifically, because of the frequency code described above, we hypothesized that all speakers, regardless of gender and language spoken, would use lower mean f0 (compared to the monologue addressed to followers in the campaign context) to convey dominant charisma when addressing an audience of their peers. Speeches addressed to peers should also be characterized by narrower ranges of f0 and SPL, again reflecting efforts to convey dominance. In contrast, we further hypothesized that speakers would use higher f0, with wider f0 and SPL ranges, when addressing listeners of lower status and differing backgrounds and expectations (the monologue context), in order to enhance persuasion by conveying such non-dominant “charisma types” as competence and activeness, combined with activated emotional states (fear, happiness, etc.). 
These manipulations contrast with the dyadic interview, in which speakers should use the narrowest f0 and SPL ranges because the absence of specific persuasive goals would imply that no special prosodic adjustments to speech are required.
Patterns of f0 and SPL manipulation over time and context reflect not only the speaker’s charismatic voice, but also a vocal strategy similar to the climax figure of speech, in which words, sentences, and arguments are delivered in order of increasing duration or importance, with the peak of importance at the end of a discourse [48]. Reflecting the differences in persuasive goals inherent in different communicative contexts, we hypothesized that f0 and SPL will correlate in formal discourse in which speakers address political topics with the goal of persuading the audience. In informal discourse, we do not expect consistent time-related adjustment of vocal f0 and/or SPL, because speakers do not deal with political topics or pursue a specific persuasive goal. Instead, we expect the leaders studied here to differ from one another in how they organize their vocal behavior over time, with these differences reflecting gender and spoken language.
2. MATERIALS AND METHODS
2.1. Corpus
A multi-gender, multi-lingual corpus of political speech was collected from recordings of politicians in four countries with distinct cultures: the United States of America, Italy, France, and Brazil (Supplementary Table S1). Speakers were selected through surveys described in several previous studies [11,49,50]. Briefly, 170 participants (American-English native speakers; 120 females, 50 males; average age 21.96 y.o.) generated lists of adjectives they felt described a charismatic leader, from which the 68 most-frequently occurring responses were selected. Adjectives corresponded to 5 dimensions of charisma: empathy, competence, benevolence, dominance, and ability to induce emotions. The scales were validated by asking 96 additional listeners (French native speakers: 51 females, 13 males; average age 24.5 y.o.; Italian native speakers: 25 females, 7 males; average age 31 y.o.) to rate a set of speakers and then performing factor analysis on the results (see [11] page 15). This resulted in 3 factors: proactive-seductive (e.g., vigorous, active, dynamic, charming, sexy), benevolent-competent (e.g., wise, prudent, fair, sincere, intelligent), and authoritarian-threatening (e.g., self-confident, resolute, threatening, egocentric). Speakers were selected based on their scores on these factors. The final set included two female American English speakers (Hillary Clinton, aged between 62-67 years old at the time of the recording; and Carly Fiorina, 60 y.o.), three male American English speakers (Barack Obama, 51-53 y.o.; Bernie Sanders, 74 y.o.; and Donald Trump, 68-69 y.o.), two male Italian speakers (Luigi de Magistris, 44-45 y.o.; and Walter Veltroni, 57-57 y.o.), two male French speakers (François Hollande, 57-60 y.o.; and Nicolas Sarkozy, 56-57 y.o.), and two male Brazilian Portuguese speakers (Luiz Inácio Lula da Silva, 63-65 y.o.; and José Serra, 67-70 y.o.).
Speech data produced in three different communicative contexts were collected for each speaker. The first was a monologue addressed to followers in a formal campaign context (an arena or other large venue) during which political topics were addressed. In this context, speakers were higher in leadership than their listeners, and attempted to persuade followers to adopt the speakers’ goals (providing resources to help the politician win the next election). The second context was a monologue addressed to other politicians at a formal conference in an institutional conference room, during which political topics were addressed. Speakers and listeners were equal in leadership and social status in this context. During these interactions, the politicians also attempted to persuade colleagues to provide resources to help them maintain leadership status. The final context was an informal face-to-face interview, during which no political topics were addressed. In this type of informal interaction, the politician does not forward a precise persuasive goal related to politics. This third context served as a control condition to verify the validity of the hypotheses above, and also to determine if dominance was displayed in vocal behavior in informal dyadic interaction (see also [51]).
2.2. f0 and SPL measurements
f0 and SPL values were measured from [a] vowels extracted from each speech. [a] was chosen for analysis because its high first formant reduced the likelihood that the frequency tracker would confuse f0 with the first formant (F1) [52]. Mean f0 (f0m) was measured in Hertz (Hz) using Praat software [53]. f0 range (f0rng) values expressed in semitones were obtained through the equation $f0_{rng} = 12 \log_2 (f0_{max} / f0_{min})$, where the maximum (f0max) and minimum (f0min) frequencies were measured in Hz with Praat.
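As an illustration of the semitone conversion, the following minimal sketch assumes the standard definition of 12 semitones per octave, i.e., 12 times the base-2 logarithm of the ratio f0max/f0min. The function name and the example frequencies are hypothetical and are not taken from the study's data or scripts.

```python
import math


def f0_range_semitones(f0_min_hz: float, f0_max_hz: float) -> float:
    """Interval between two frequencies in semitones (12 per octave).

    Assumes the standard conversion 12 * log2(f0_max / f0_min); the
    function name and arguments are illustrative, not the authors' code.
    """
    if f0_min_hz <= 0 or f0_max_hz <= 0:
        raise ValueError("frequencies must be positive")
    return 12.0 * math.log2(f0_max_hz / f0_min_hz)


# Hypothetical vowel token whose f0 spans exactly one octave.
print(f0_range_semitones(110.0, 220.0))  # 12.0
```

Because the measure is a log ratio, a doubling of frequency yields 12 semitones whether it starts from a low or a high voice, which is what makes ranges comparable across female and male speakers.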
SPL measurements were made from audio recordings created without control of microphone-to-mouth distance or recording environment. To allow comparisons across contexts, speakers, genders, and languages, relative SPL (SPLrel) was measured as the difference in dB between minimum and maximum SPL. This subtraction amounts to generating the ratio of minimum to maximum values, thereby normalizing the measure so that values can be compared within and across recordings.
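The normalization argument can be sketched as follows. The example assumes that an uncontrolled recording setup (e.g., an unknown gain or microphone distance) shifts every level in a recording by roughly the same number of dB, which is an idealization; the function names and the sample values are hypothetical.

```python
def spl_relative_db(spl_values_db):
    """Relative SPL: difference in dB between maximum and minimum level."""
    return max(spl_values_db) - min(spl_values_db)


def pressure_ratio(spl_rel_db):
    """Sound-pressure ratio (max/min) implied by a difference in dB."""
    return 10 ** (spl_rel_db / 20.0)


# Hypothetical levels for three vowel tokens in one recording.
recording = [60.0, 72.0, 66.0]
# The same tokens with an assumed constant 10 dB offset from the setup.
shifted = [level - 10.0 for level in recording]

print(spl_relative_db(recording))  # 12.0
print(spl_relative_db(shifted))    # 12.0, the offset cancels out
```

Under this assumption the absolute levels differ between the two lists, but the range does not, which is why SPLrel rather than absolute SPL is compared across recordings.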
To compare vocal behavior across communicative contexts and speakers [41,52], we plotted f0 against SPL to create normalized Voice Range Profiles (VRPs; Figures 1–11) representing the entire vocal output of the charismatic leaders. Correlations among measures within each context were calculated using mean f0m and mean absolute SPL (SPLabs); VRPs across contexts were plotted using f0rng and SPLrel values. In the VRPs the f0 range scale (X axis) ranged from 0 to 30 semitones (ST) and the scale for relative SPL (Y axis) ranged from 1 to 30 dB. Finally, the Kruskal-Wallis test by ranks [54] with post-hoc focused comparisons ([55], pp. 213-214) was used to compare speakers and/or contexts (see also [56]). The Kruskal-Wallis tests were computed on ranked lists of measures comprising the following numbers of [a] vowels collected for each speaker and each communicative context (see Table S1 for the sources of the audio data): Clinton (MON: 445, CON: 269, INT: 225); Fiorina (MON: 150, CON: 306, INT: 392); Obama (MON: 151, CON: 123, INT: 163); Sanders (MON: 116, CON: 88, INT: 32); Trump (MON: 955, CON: 160, INT: 175); de Magistris (MON: 868, CON: 373, INT: 207); Veltroni (MON: 776, CON: 612, INT: 463); Hollande (MON: 125, CON: 293, INT: 364); Sarkozy (MON: 831, CON: 401, INT: 141); da Silva (MON: 796, CON: 1117, INT: 214); Serra (MON: 2133, CON: 1025, INT: 109).
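For readers unfamiliar with the Kruskal-Wallis test by ranks, the sketch below computes the H statistic for three groups, corresponding to the three communicative contexts (hence two degrees of freedom). It is a textbook formulation with average ranks and the usual tie correction, not the authors' analysis script; the function names and the toy data are hypothetical, and the post-hoc focused comparisons are not covered.

```python
from collections import Counter


def average_ranks(values):
    """Rank values from 1..N; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H with tie correction; df = len(groups) - 1."""
    pooled = [v for g in groups for v in g]
    n_total = len(pooled)
    ranks = average_ranks(pooled)
    weighted = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        weighted += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n_total * (n_total + 1)) * weighted - 3.0 * (n_total + 1)
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1.0 - tie_term / (n_total ** 3 - n_total)
    if correction == 0:
        raise ValueError("all observations are identical")
    return h / correction


# Toy f0 values (Hz) for one hypothetical speaker in MON, CON, and INT.
mon, con, intv = [210, 220, 230], [180, 185, 190], [120, 130, 140]
print(round(kruskal_wallis_h([mon, con, intv]), 2))  # 7.2
```

With completely separated groups of three tokens each, H reaches 7.2, the largest value possible for this design; overlapping groups give smaller values.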
Figure 1.

Voice Range Profile for American English speaker Hillary Clinton. X axis: fundamental frequency range (f0rng) in semitones. Y axis: sound pressure level relative range (SPLrel) in decibels. Each point in the scatterplot represents parameters measured in one /a/ vowel. Contexts of communication: a) monologue addressed to the followers (MON); b) monologue addressed to other politicians (CON); c) informal interview addressed to one listener (INT). Mean f0 and SPLabs were positively correlated in all contexts (monologue: r=.36, Fig. 1a; conference: r=.17, Fig. 1b; interview: r=.24, Fig. 1c).
Figure 11.

Voice Range Profile for Brazilian speaker José Serra. X axis: fundamental frequency range (f0rng) in semitones. Y axis: sound pressure level relative range (SPLrel) in decibels. Each point in the scatterplot represents parameters measured in one /a/ vowel. Contexts of communication as in Fig. 1. Mean f0 and SPLabs were positively correlated in all contexts (monologue: r=.33, Fig. 11a; conference: r=.22, Fig. 11b; interview: r=.27, Fig. 11c).
A simple regression analysis was used to study dynamic manipulations of fundamental frequency and sound pressure level over time in political speech. We compared changes in f0 and SPL for different stages of each speech. The predictor variable was time (i.e., the sequence of [a] vowels from the beginning to the end of speech utterances); the dependent variables were f0m and SPLabs.
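A minimal sketch of such a regression is given below, assuming ordinary least squares with the ordinal position of each [a] vowel (1, 2, 3, ...) as the predictor; the source does not state how time was coded beyond the vowel sequence. The function name and the sample values are hypothetical.

```python
def linear_trend(values):
    """Least-squares slope and intercept of values against their order.

    The predictor is the position of each token in the speech (1..n),
    an assumed coding of "time" used here for illustration only.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two tokens")
    xs = range(1, n + 1)
    mean_x = (n + 1) / 2.0
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical mean f0 (Hz) of successive [a] vowels in one speech.
f0_sequence = [180.0, 184.0, 188.0, 192.0]
slope, intercept = linear_trend(f0_sequence)
print(slope, intercept)  # 4.0 176.0
```

A positive slope corresponds to f0 (or SPL) rising from the beginning to the end of the speech, and a negative slope to a decline.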
3. RESULTS
P values were adjusted for multiple comparisons as appropriate for all analyses described below.
3.1. Average fundamental frequency
Table 1 shows speakers’ average voice fundamental frequencies in the three communicative contexts (monologue, conference, and interview). As hypothesized, speakers used the highest mean f0s in the monologues, during which they addressed an audience with lower leadership and social status (mean f0 across speakers = 193 Hz; SD = 18 Hz). Mean f0 was intermediate (161 Hz; SD = 26 Hz) in the conference context, during which speakers addressed an audience with similar leadership and social status, and lowest (126 Hz; SD = 23 Hz) in the context of a dyadic interview (the control condition). Results of Kruskal-Wallis tests (Table 1) confirmed that the three contexts differed significantly (p < .05). Post hoc comparisons of mean f0 ranks within communicative contexts showed that this pattern was significant for all speakers except American English speaker Sanders, whose mean f0 did not vary significantly with context (p > .05), and Brazilian Portuguese speaker Serra, whose mean f0 in the conference and interview contexts did not differ significantly (p > .05) (Supplementary Table S2).
Table 1.
Mean fundamental frequency values for individual female and male charismatic voices for the three communicative contexts. MON: monologue addressed to the followers; CON: monologue addressed to other politicians; INT: interview addressed to an interviewer. The Kruskal-Wallis tests were performed on ranks of mean f0.
| Speaker | Gender | Language | MON (Hz) | CON (Hz) | INT (Hz) | Kruskal-Wallis |
|---|---|---|---|---|---|---|
| Clinton | F | American English | 218 | 188 | 175 | H(2)=196.69, p<.001 |
| Fiorina | F | American English | 206 | 186 | 148 | H(2)=169.23, p<.001 |
| Obama | M | American English | 217 | 182 | 112 | H(2)=317.88, p<.001 |
| Sanders | M | American English | 201 | 181 | 138 | ns |
| Trump | M | American English | 195 | 183 | 136 | H(2)=288.18, p<.001 |
| de Magistris | M | Italian | 182 | 147 | 130 | H(2)=531.81, p<.001 |
| Veltroni | M | Italian | 199 | 166 | 110 | H(2)=855.29, p<.001 |
| Hollande | M | French | 183 | 142 | 111 | H(2)=373.54, p<.001 |
| Sarkozy | M | French | 190 | 184 | 125 | H(2)=229.47, p<.001 |
| da Silva | M | Brazilian Portuguese | 176 | 141 | 100 | H(2)=700.26, p<.001 |
| Serra | M | Brazilian Portuguese | 165 | 114 | 122 | H(2)=1053, p<.001 |
Not surprisingly, female speakers used higher mean f0s overall than did male speakers, particularly in the interview context (Table 1). Cross-language comparisons showed that Brazilian Portuguese speakers had the lowest mean f0 during the monologue and conference contexts. American English speakers’ voices were characterized overall by the highest mean f0 in all three communicative contexts. Only speaker Trump presents a lower mean f0 in monologue, compared to the other American English speakers (Table 1).
3.2. Fundamental frequency range
Both female and male speakers varied their frequency ranges (f0rng) across communicative contexts (Table 2). Monologues were characterized by the widest frequency ranges, conference presentations by intermediate ranges, and interviews by the narrowest f0 range. Kruskal-Wallis tests (Table 2) statistically confirmed these findings for five of the eleven individual leaders (Obama, Trump, Veltroni, Sarkozy, and da Silva). Post hoc comparisons of f0rng mean ranks within communicative contexts (Supplementary Table S3) also confirmed these results, with the exception of American English speaker Obama whose f0rng did not differ significantly (p>.05) in monologue vs. conference or in monologue vs. interview, American English speaker Trump whose f0rng in the monologue and conference communicative contexts did not differ significantly (p>.05), French speaker Sarkozy whose f0rng did not differ significantly (p>.05) for conference vs. interview, and Brazilian Portuguese speaker da Silva whose f0rng in monologue did not differ significantly from interview (p>.05).
Table 2.
Fundamental frequency ranges for the charismatic voices in the different communicative contexts. MON: monologue addressed to the followers; CON: monologue addressed to other politicians; INT: interview addressed to an interviewer. In all cases the critical difference (α=.05) was corrected for the number of tests and for each focused comparison. The Kruskal-Wallis tests were performed on ranks of f0 range calculated in semitones.
| Speaker | Gender | Language | MON (semitones) | CON (semitones) | INT (semitones) | Kruskal-Wallis |
|---|---|---|---|---|---|---|
| Clinton | F | American English | 18.93 | 14.57 | 11.21 | ns |
| Fiorina | F | American English | 20.6 | 18.59 | 16.01 | ns |
| Obama | M | American English | 14.62 | 7.02 | 4.35 | H(2)=11.22, p=.003 |
| Sanders | M | American English | 13.48 | 3.12 | 2.59 | ns |
| Trump | M | American English | 20.44 | 13.92 | 7.85 | H(2)=16.45, p<.001 |
| de Magistris | M | Italian | 16.76 | 15.81 | 14.32 | ns |
| Veltroni | M | Italian | 19.92 | 14.92 | 12.4 | H(2)=107.8, p<.0001 |
| Hollande | M | French | 15.11 | 10.57 | 8.19 | ns |
| Sarkozy | M | French | 18.97 | 17.58 | 15.51 | H(2)=31.4, p<.001 |
| da Silva | M | Brazilian Portuguese | 16.99 | 16.82 | 12.48 | H(2)=85.82, p<.0001 |
| Serra | M | Brazilian Portuguese | 17.13 | 15.99 | 14.66 | ns |
Finally, native language had a significant effect on f0rng, with American English speakers characterized by wider f0rng in the monologue communicative context (Table 2). Cross-gender differences were highlighted by significantly higher f0rng in American English female speakers in comparison to male American English speakers, as expected. This difference was the greatest in conference and interview contexts (Table 2).
3.3. Sound pressure level range
Relative sound pressure level (SPLrel) also varied significantly across the three communicative contexts (Table 3). Monologues were characterized by the widest SPL range; conference orations were characterized by an intermediate range, and interviews by the narrowest SPLrel range. Post-hoc Kruskal-Wallis tests (Supplementary Table S4) indicated that this pattern was significant for seven of the eleven individual speakers: Clinton, Obama, de Magistris, Veltroni, Hollande, da Silva, and Serra. Additional post hoc focused comparisons (Supplementary Table S3) showed that SPL was not significantly different in conference vs. interview contexts for American English speaker Clinton and Italian speaker de Magistris, and that only the monologue and conference contexts differed significantly for Brazilian Portuguese speaker Serra.
Table 3.
Sound pressure level ranges for charismatic voices in different communicative contexts. MON: monologue addressed to the followers; CON: monologue addressed to other politicians; INT: interview addressed to an interviewer. In all cases the critical difference (α=.05) was corrected for the number of tests and for each focused comparison.
| Speaker | Gender | Language | MON (dB) | CON (dB) | INT (dB) | Kruskal-Wallis |
|---|---|---|---|---|---|---|
| Clinton | F | American English | 13 | 11 | 9 | H(2)=30.27, p<.0001 |
| Fiorina | F | American English | 11 | 9 | 8 | ns |
| Obama | M | American English | 9 | 8 | 7 | H(2)=20.15, p<.0001 |
| Sanders | M | American English | 9 | 7 | 3 | ns |
| Trump | M | American English | 13 | 11 | 10 | ns |
| de Magistris | M | Italian | 38 | 24 | 23 | H(2)=15.35, p=.0004 |
| Veltroni | M | Italian | 38 | 21 | 12 | H(2)=274.59, p<.0001 |
| Hollande | M | French | 33 | 28 | 27 | H(2)=15.47, p=.0004 |
| Sarkozy | M | French | 31 | 28 | 25 | ns |
| da Silva | M | Brazilian Portuguese | 25 | 22 | 16 | H(2)=16.48, p<.0002 |
| Serra | M | Brazilian Portuguese | 33 | 30 | 24 | H(2)=8.04, p=.017 |
Cross-gender comparisons, in this study limited to speakers of American English, showed that female and male speakers differed little in SPLrel (Table 3). Cross-language comparisons showed that male American English speakers’ voices were characterized by the narrowest overall SPLrel range (Table 3). Within the American English speakers, only Clinton and Obama showed significant differences between communicative contexts. Italian, French, and Brazilian Portuguese speakers showed wide SPLrel ranges, which differed significantly for all three communicative contexts: a wider range in monologues, a slightly narrower range in the conference context, and the narrowest range in interviews (see Table 3).
3.4. Interactions between fundamental frequency and sound pressure level
Figures 1 to 11 show voice range profiles (VRPs) for each speaker, demonstrating how f0 and SPLrel covary across the three communicative contexts. Across speakers, genders, and languages, these two parameters were positively correlated in all three communicative contexts (Supplementary Table S5). This is consistent with the physiologically based relationship between f0 and SPL: an increase in SPL often results in an increase in f0. However, with few exceptions, correlations were small to moderate in size. This fact, along with examination of the figures, suggests that speakers used rather different approaches to manipulating f0 and SPL, presumably related to prosodic control. For speakers Obama, Sanders, Hollande, and Serra, patterns of covariation between f0 and SPL did not differ substantially across communicative contexts, indicating a consistent manner of self-presentation regardless of the audience or persuasive goal. The pattern of f0 variation was bimodal for a number of speakers, primarily for the monologue context (speakers Clinton, Trump, de Magistris, and Sarkozy), but also in conference presentations (da Silva), and in one case in all contexts (Fiorina). This pattern suggests an oratory style in which pitch, rather than rate or loudness, is used emphatically to engage and arouse the audience. Finally, speakers differed in their patterns of SPL variation, with some (Trump, de Magistris, and Veltroni) using greatest loudness variation in the monologue context, others (Sarkozy, da Silva) using more loudness variation in the conference context, and the rest keeping patterns of loudness relatively constant across contexts.
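The correlations reported in the figure captions and in Supplementary Table S5 can be illustrated with the sketch below. It assumes Pearson's product-moment coefficient, which the text does not name explicitly, and uses hypothetical per-vowel values rather than data from the corpus.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical mean f0 (Hz) and absolute SPL (dB) for five [a] vowels.
f0m = [170.0, 185.0, 190.0, 210.0, 205.0]
spl_abs = [62.0, 66.0, 63.0, 70.0, 71.0]
r = pearson_r(f0m, spl_abs)
print(f"r = {r:.2f}")
```

Values near 0 indicate that f0 and SPL vary largely independently, whereas values near 1 indicate that louder vowels are also consistently higher in pitch.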
3.5. Manipulation of fundamental frequency and sound pressure level over time
Across speakers, similar strategies emerged for adjusting mean f0 over time in the monologue context, but not in other contexts (Supplementary Table S6). Overall patterns of temporal variability in SPL were consistent across all contexts. These patterns held for all individual speakers except Obama, Sanders, and Serra, for whom f0 did not vary over time. In the monologue context speakers both decreased f0 over time (Clinton, Fiorina, Hollande, Sarkozy) and increased it (Trump, de Magistris, Veltroni, da Silva), while most speakers increased f0 over time in conference utterances.
Language-based strategies mostly affected formal speech contexts (monologue and conference). The French, Italian, and Brazilian speakers increased mean f0 over time in every communicative context; the Italians and Brazilians significantly decreased SPL during monologues, and the French significantly decreased SPL in the conference context.
4. DISCUSSION
In this study we investigated the acoustics of charismatic political leaders’ speech by examining within- and cross-language similarities and differences in politicians’ vocal behavior in three different communicative contexts and over time. Analyses focused on speakers’ manipulations of mean fundamental frequency (f0m), fundamental frequency range (f0rng), relative sound pressure level (SPLrel), absolute sound pressure level (SPLabs), and the interaction between f0m and SPLabs. Results showed both shared and idiosyncratic patterns of voice manipulation whose ultimate purpose is to persuade listeners [11,57–59]. All leaders studied here addressed followers of mixed social status (the monologue context, see Figures 1–11, panel a) using high mean f0 (female speakers = 212 Hz; male speakers = 189 Hz), wide f0rng (female speakers = 17.5 semitones; male speakers = 17 semitones), and wide SPLrel ranges (female speakers = 12 dB; male speakers = 25 dB). Mean f0 was lower in the conference or interview contexts (see Table 1). SPLrel ranges were also narrower overall in conferences (see Figures 1–11, panel b) and interviews (see Figures 1–11, panel c). Previous experiments [40,49,60] demonstrated that increasing f0 and f0 variability arouses listeners’ emotions while conveying charisma types in a way that matches the diverse expectations of a large group of listeners regarding what a charismatic leader should sound like, what emotional states a charismatic leader should arouse, and what personality traits a charismatic leader should display. Higher f0 and larger f0 variations appear to emphasize the speakers’ social status as represented by at least three charisma types (proactive-seductive, benevolent-competent, and authoritarian-threatening) that make leaders socially attractive to a group with larger diversity in terms of gender, age, social status, ethnicity, and educational background.
In the conference communicative context in which the speakers’ goal was to persuade a medium-sized audience of their peers, the leaders studied here generally used a less-varying vocal pattern, with mean f0, f0rng, and SPLrel ranges that were significantly lower and narrower than in the monologue context. With the exception of SPLrel ranges for the American English speakers, this vocal profile was shared across cultures, suggesting that it could be (at least partly) biologically based (see Figures 1–11, panel b). This result is consistent with [29], who found that male speakers adjust f0 according to the listener’s perceived social status: male speakers who consider themselves more physically and socially dominant than their listeners tend to use lower f0. The present study shows that female charismatic speakers also lower mean f0 when addressing their peers, further consistent with views that this strategy has an underlying biological basis.
In the control interview context, in which speakers did not address political topics or pursue specific persuasive goals and addressed a single interlocutor, they all used the least varying voice profiles (narrow mean f0; see Figures 1–11, panel c), along with very low mean f0. However, SPLrel ranges were significantly narrower than those for monologue and conference for only three individual speakers (the French speaker Sarkozy and both Brazilian Portuguese speakers da Silva and Serra). Values were significantly higher than monologue for the American English speakers; and they were higher than conference for the French speaker Hollande and for both Italian speakers. This pattern reflects two possible vocal strategies: (i) a shared pattern in which speakers lower f0 and SPL average frequencies and narrow ranges because they consider the listener to be physically and socially submissive, consistent with [29]; or (ii) less variable vocalization because the goal of the speech is not persuasion, so it is not necessary to generate emotional arousal in the listener.
Analyses of acoustic variability over time in charismatic speech showed few commonalities across speakers, but instead a set of individual-specific manipulations of mean f0 and SPLabs. All speakers significantly increased SPLabs over time in the interview context, and most adjusted mean f0 over time in the monologue context, but in different ways (Supplementary Table S6). Female American English speakers and male French speakers decreased mean f0 over time, while male Italian speakers and one Brazilian Portuguese speaker increased it in all three contexts. The Italian speakers also significantly increased SPLabs over time in all three contexts. These varying strategies for voice adjustment over time, termed Vocis Climax (see [11]), are related to the way in which leaders culturally learn how to lead their audiences. Charismatic speakers use average acoustic values and ranges of mean f0 and SPLrel that differ significantly from the beginning to the end of the speech to amplify the emotional connection with the audience, with the aim of arousing emotional states to enhance persuasion. This acoustic strategy appears most strongly in the monologue communicative context, where leadership must be clearly demonstrated to a large and varied crowd. Finally, variation over time in fundamental frequency and sound pressure level suggests that the changes in overall f0 and SPL described above are in fact due to speakers’ adaptation to the particular audience and are not solely a result of bias related to room acoustics or audience size. Speakers’ individual variations of voice parameters (f0, SPL) over time demonstrate the speakers’ adaptation to the particular audience in a voluntary manner (Supplementary Table S6).
5. CONCLUSIONS
In conclusion, the present results show the subtle integration between cross-language abilities and culture-/language-specific strategies that charismatic leaders use to persuade listeners. The study addresses the voice production domain using acoustic analyses and statistical modeling to determine the overall physiological vocal range of charismatic leader-speakers and its adaptation to communicative context and over time. We found evidence of a corresponding exploitation of voice in terms of vocal fundamental frequency and loudness range by leaders of different languages and cultures speaking in the same communicative context, consistent with a biological commonality in the use of these vocal characteristics. Yet we also found evidence of a cultural distinction in the use of these vocal characteristics in terms of modulation over time. Although speakers in high leadership positions share some common vocal behaviors, the acoustics of charismatic speech depend more on the communicative context than on the language spoken. Leaders speaking in formal political contexts, which require high psychological involvement to arouse a large range of emotions and convey specific personality traits, use higher overall voice fundamental frequency and wider fundamental frequency and intensity ranges. Conversely, leaders speaking in informal political communicative contexts, which require lower levels of psychological involvement with less need to arouse emotions or display a specific personality, show lower overall fundamental frequency and narrower fundamental frequency and intensity ranges, and display more idiosyncrasies that mark their individual verbal style.
Supplementary Material
Figure 2.

Voice Range Profile for American English speaker Carly Fiorina. X axis: fundamental frequency range (f0rng) in semitones. Y axis: sound pressure level relative range (SPLrel) in decibels. Each point in the scatterplot represents parameters measured in one /a/ vowel. Contexts of communication as in Fig. 1. Mean f0 and SPLabs were positively correlated in all contexts (monologue: r=.39, Fig. 2a; conference: r=.41, Fig. 2b; interview: r=.34, Fig. 2c).
Figure 3.

Voice Range Profile for American English speaker Barack Obama. X axis: fundamental frequency range (f0rng) in semitones. Y axis: sound pressure level relative range (SPLrel) in decibels. Each point in the scatterplot corresponds to acoustic parameters measured in one /a/ vowel. Contexts of communication as in Fig. 1. Mean f0 and SPLabs were positively correlated in all contexts (monologue: r=.45, Fig. 3a; conference: r=.44, Fig. 3b; interview: r=.36, Fig. 3c).
Figure 4.

Voice Range Profile for American English speaker Bernie Sanders. X axis: fundamental frequency range (f0rng) in semitones. Y axis: sound pressure level relative range (SPLrel) in decibels. Each point in the scatterplot corresponds to acoustic parameters measured in one /a/ vowel. Contexts of communication as in Fig. 1. Mean f0 and SPLabs were positively correlated for monologue (r=.26; Fig. 4a) and interview (r=.82; Fig. 4c), but not for conference (p>.05; Fig. 4b).
Figure 5.

Voice Range Profile for American English speaker Donald Trump. X axis: fundamental frequency range (f0rng) in semitones. Y axis: sound pressure level relative range (SPLrel) in decibels. Each point in the scatterplot corresponds to acoustic parameters measured in one /a/ vowel. Contexts of communication as in Fig. 1. Mean f0 and SPLabs were positively correlated in all contexts (monologue: r=.47, Fig. 5a; conference: r=.37, Fig. 5b; interview: r=.53, Fig. 5c).
Figure 6.

Voice Range Profile for Italian speaker Luigi de Magistris. X axis: fundamental frequency range (f0rng) in semitones. Y axis: sound pressure level relative range (SPLrel) in decibels. Each point in the scatterplot represents parameters measured in one /a/ vowel. Contexts of communication as in Fig. 1. Mean f0 and SPLabs were positively correlated in all contexts (monologue: r=.57, Fig. 6a; conference: r=.68, Fig. 6b; interview: r=.45, Fig. 6c).
Figure 7.

Voice Range Profile for Italian speaker Walter Veltroni. X axis: fundamental frequency range (f0rng) in semitones. Y axis: sound pressure level relative range (SPLrel) in decibels. Each point in the scatterplot represents parameters measured in one /a/ vowel. Contexts of communication as in Fig. 1. Mean f0 and SPLabs were positively correlated in monologue (r=.51; Fig. 7a) and conference (r=.60; Fig. 7b), but not interview (p>.05; Fig. 7c).
Figure 8.

Voice Range Profile for French speaker François Hollande. X axis: fundamental frequency range (f0rng) in semitones. Y axis: sound pressure level relative range (SPLrel) in decibels. Each point in the scatterplot represents parameters measured in one /a/ vowel. Contexts of communication as in Fig. 1. Mean f0 and SPLabs were positively correlated in all contexts (monologue: r=.44, Fig. 8a; conference: r=.57, Fig. 8b; interview: r=.51, Fig. 8c).
Figure 9.

Voice Range Profile for French speaker Nicolas Sarkozy. X axis: fundamental frequency range (f0rng) in semitones. Y axis: sound pressure level relative range (SPLrel) in decibels. Each point in the scatterplot represents parameters measured in one /a/ vowel. Contexts of communication as in Fig. 1. Mean f0 and SPLabs were positively correlated in all contexts (monologue: r=.20, Fig. 9a; conference: r=.51, Fig. 9b; interview: r=.55, Fig. 9c).
Figure 10.

Voice Range Profile for Brazilian Portuguese speaker Luiz Inácio Lula da Silva. X axis: fundamental frequency range (f0rng) in semitones. Y axis: sound pressure level relative range (SPLrel) in decibels. Each point in the scatterplot represents parameters measured in one /a/ vowel. Contexts of communication as in Fig. 1. Mean f0 and SPLabs were positively correlated in all contexts (monologue: r=.63, Fig. 10a; conference: r=.40, Fig. 10b; interview: r=.46, Fig. 10c).
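The two quantities that recur in the figure captions, an f0 range expressed in semitones and a correlation coefficient r between mean f0 and SPLabs, can be sketched as follows. This assumes the conventional semitone definition (12 times the base-2 logarithm of a frequency ratio) and Pearson's product-moment r; the exact definitions used for f0rng and for the reported r values are given outside this section, and the function names and per-vowel data below are hypothetical.

```python
from math import log2, sqrt


def f0_range_semitones(f0_min_hz, f0_max_hz):
    """Width of an f0 interval in semitones (12 semitones = one octave)."""
    if f0_min_hz <= 0 or f0_max_hz <= 0:
        raise ValueError("frequencies must be positive")
    return 12.0 * log2(f0_max_hz / f0_min_hz)


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if s_xx == 0 or s_yy == 0:
        raise ValueError("both sequences must vary")
    return s_xy / sqrt(s_xx * s_yy)


# Hypothetical per-vowel measurements for one speaker in one context
mean_f0_hz = [118.0, 131.0, 125.0, 150.0, 142.0, 160.0]
spl_abs_db = [61.0, 63.5, 62.0, 66.0, 64.0, 67.5]

octave = f0_range_semitones(100.0, 200.0)  # doubling the frequency spans 12 semitones
r = pearson_r(mean_f0_hz, spl_abs_db)
print(f"one octave = {octave:.1f} semitones; r(mean f0, SPLabs) = {r:.2f}")
```

Expressing f0 range on a logarithmic semitone scale rather than in hertz makes ranges comparable across speakers with different average pitch, which is relevant when female and male voices are compared.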
6. ACKNOWLEDGEMENTS
The authors are deeply thankful to Leonardo Lancia (Laboratoire de Phonétique et Phonologie (LPP), CNRS UMR 7018, Paris, France), Nari Rhee (University of California, Los Angeles, USA), and Neda Vesselinova (University of California, Los Angeles, USA). Parts of this research were previously presented at the 2013 Humaine Association on Affective Computing and Intelligent Interaction and at the 171st Meeting of the Acoustical Society of America. This research was supported in part by NIH grant DC01797.
Footnotes
Declarations of interest: none
7. REFERENCES
- [1]. Banse R, Scherer KR, Acoustic profiles in vocal emotion expression, J. Pers. Soc. Psychol. 70 (1996) 614–36. http://www.ncbi.nlm.nih.gov/pubmed/8851745.
- [2]. Bänziger T, Scherer KR, The role of intonation in emotional expressions, Speech Commun. 46 (2005) 252–267. doi: 10.1016/j.specom.2005.02.016.
- [3]. Grandjean D, Bänziger T, Scherer KR, Intonation as an interface between language and affect, Prog. Brain Res. 156 (2006) 235–268.
- [4]. Scherer KR, Judging personality from voice: A cross-cultural approach to an old issue in interpersonal perception, J. Pers. 40 (1972) 191–210.
- [5]. Brown BL, Strong WJ, Rencher AE, Perceptions of personality from speech: Effects of manipulations of acoustical parameters, J. Acoust. Soc. Am. 54 (1973) 29–35.
- [6]. Brown BL, Strong WJ, Rencher AE, 54 voices from 2: The effects of simultaneous manipulation of rate, mean fundamental frequency, and variance of fundamental frequency on ratings of personality from speech, J. Acoust. Soc. Am. 55 (1974) 313–318.
- [7]. Aronovitch CD, The voice of personality: Stereotyped judgments and their relation to voice quality and sex of speaker, J. Soc. Psychol. 99 (1976) 207. http://search.ebscohost.com/login.aspx?direct=true&db=pbh&AN=5391314&lang=fr&site=ehost-live.
- [8]. Scherer KR, Non-linguistic vocal indicators of emotion and psychopathology, in: Izard CE (Ed.), Emot. Personal. Psychopathol., Plenum Press, New York, USA, 1979: pp. 495–529.
- [9]. Scherer KR, Scherer U, Speech Behavior and Personality, in: Darby J (Ed.), Speech Eval. Psychiatry, Grune & Stratton, New York, USA, 1981: pp. 115–135.
- [10]. Mohammadi G, Mortillaro M, Vinciarelli A, The Voice of Personality: Mapping Nonverbal Vocal Behavior into Trait Attributions, in: Int. Work. Soc. Signal Process., 2010: pp. 17–20.
- [11]. Signorello R, La Voix Charismatique : Aspects Psychologiques et Caractéristiques Acoustiques, Université Grenoble Alpes, France and Università degli Studi Roma Tre, Italy, 2014.
- [12]. Cornut G, La voix, Presses Universitaires de France, 2005.
- [13]. Barkat-Defradas M, Dufour F, La mimesis vocale : un phénomène dialogique ?, Cah. Praxématique 49 (2007) 57–77.
- [14]. Tsalikis J, DeShields OW, LaTour MS, The Role of Accent on the Credibility and Effectiveness of the Salesperson, J. Pers. Sell. Sales Manag. 11 (1991) 31–41.
- [15]. Lev-Ari S, Keysar B, Why don’t we believe non-native speakers? The influence of accent on credibility, J. Exp. Soc. Psychol. 46 (2010) 1093–1096. doi: 10.1016/j.jesp.2010.05.025.
- [16]. De Meo A, Vitale M, Pettorino M, Martin P, Acoustic-Perceptual Credibility Correlates of News Reading by Native and Chinese Speakers of Italian, in: Lee W-S, Zee E (Eds.), Proceeding XVI ICPhS, City University of Hong Kong, Hong Kong, China, 2011: pp. 1366–1369.
- [17]. Klofstad CA, Anderson RC, Peters S, Sounds like a winner: voice pitch influences perception of leadership capacity in both men and women, Proc. R. Soc. B Biol. Sci. (2012) 2698–2704. doi: 10.1098/rspb.2012.0311.
- [18]. Pisanski K, Fraccaro PJ, Tigue CC, O’Connor JJM, Röder S, Andrews PW, Fink B, DeBruine LM, Jones BC, Feinberg DR, Vocal indicators of body size in men and women: a meta-analysis, Anim. Behav. 95 (2014) 89–99. doi: 10.1016/j.anbehav.2014.06.011.
- [19]. Ohala JJ, The frequency code underlies the sound symbolic use of voice pitch, Sound Symb. (1994) 325–347. http://linguistics.berkeley.edu/PhonLab/users/ohala/papers/freq_code.pdf.
- [20]. Keating P, Kuo G, Comparison of speaking fundamental frequency in English and Mandarin, J. Acoust. Soc. Am. 132 (2012) 1050–1060. doi: 10.1121/1.4730893.
- [21]. Kreiman J, Sidtis D, Foundations of Voice Studies: An Interdisciplinary Approach to Voice Production and Perception, Wiley-Blackwell, Oxford, UK, 2011.
- [22]. Ladefoged P, ‘Out of chaos comes order’: Physical, biological, and structural patterns in phonetics, in: Proc. Tenth Int. Congr. Phonetic Sci., 1984: pp. 83–95.
- [23]. Disner SF, Vowel quality: The contribution of language particular and language universal factors, UCLA Work. Pap. Phonetics (1983) 1–158.
- [24]. Ladd DR, Dediu D, Kinsella AR, Languages and Genes: Reflections on Biolinguistics and the Nature-Nurture Question, Biolinguistics 2 (2008).
- [25]. Demolin D, Trouville R, Wang R, Signorello R, Oral and nasal vowels effects on subglottal pressure, J. Acoust. Soc. Am. 142 (2017) 2582. doi: 10.1121/1.5014450.
- [26]. Demolin D, Variation of anatomical features and the shape of phonological systems, in: XIII Convegno Naz AISV, Pisa, Italy, 2016.
- [27]. Baken RJ, Orlikoff RF, Clinical Measurement of Speech and Voice, 2nd Revise, Singular Publishing Group Inc, San Diego, USA, 2000.
- [28]. Baken RJ, Clinical measurement of speech and voice, College-Hill Press, Boston, USA, 1987.
- [29]. Puts DA, Hodges CR, Cárdenas RA, Gaulin SJC, Men’s voices as dominance signals: vocal fundamental and formant frequencies influence dominance attributions among men, Evol. Hum. Behav. 28 (2007) 340–344. doi: 10.1016/j.evolhumbehav.2007.05.002.
- [30]. Scherer KR, Vocal Affect Expression. A Review and a Model for Future Research, Psychol. Bull. (1986). doi: 10.1037/0033-2909.99.2.143.
- [31]. Scherer KR, Vocal communication of emotion: A review of research paradigms, Speech Commun. (2003). doi: 10.1016/S0167-6393(02)00084-5.
- [32]. Whiteside SP, Temporal-based acoustic-phonetic patterns in read speech: Some evidence for speaker sex differences, J. Int. Phon. Assoc. (1996). doi: 10.1017/S0025100300005302.
- [33]. Zuckerman M, Driver RE, What sounds beautiful is good: The vocal attractiveness stereotype, J. Nonverbal Behav. 13 (1989) 67–82. doi: 10.1007/BF00990791.
- [34]. Feinberg DR, Jones BC, Law Smith MJ, Moore FR, DeBruine LM, Cornwell RE, Hillier SG, Perrett DI, Menstrual cycle, trait estrogen level, and masculinity preferences in the human voice, Horm. Behav. 49 (2006) 215–22. doi: 10.1016/j.yhbeh.2005.07.004.
- [35]. Anderson RC, Klofstad CA, Preference for Leaders with Masculine Voices Holds in the Case of Feminine Leadership Roles, PLoS One 7 (2012) e51216. doi: 10.1371/journal.pone.0051216.
- [36]. Klofstad CA, Anderson RC, Nowicki S, Perceptions of Competence, Strength, and Age Influence Voters to Select Leaders with Lower-Pitched Voices, PLoS One 10 (2015) e0133779. http://dx.doi.org/10.1371%2Fjournal.pone.0133779.
- [37]. Darwin C, The descent of man, and selection in relation to sex, John Murray, London, UK, 1871.
- [38]. Trivers R, The Evolution of Reciprocal Altruism, Q. Rev. Biol. 46 (1971) 35–57. http://onlinelibrary.wiley.com/doi/10.1002/cbdv.200490137/abstract (accessed June 20, 2013).
- [39]. Tigue CC, Borak DJ, O’Connor JJM, Schandl C, Feinberg DR, Voice pitch influences voting behavior, Evol. Hum. Behav. 33 (2012) 210–216. http://linkinghub.elsevier.com/retrieve/pii/S1090513811001024?showall=true.
- [40]. Signorello R, D’Errico F, Poggi I, Demolin D, How charisma is perceived from speech: A multidimensional approach, in: Proc. 2012 ASE/IEEE Int. Conf. Soc. Comput Soc. 2012, IEEE Computer Society, Amsterdam, The Netherlands, 2012: pp. 435–440. doi: 10.1109/SocialCom-PASSAT.2012.68.
- [41]. Emerich KA, Titze IR, Švec JG, Popolo PS, Logan G, Vocal Range and Intensity in Actors: A Studio Versus Stage Comparison, J. Voice 19 (2005) 78–83. doi: 10.1016/j.jvoice.2004.08.006.
- [42]. Lamarche A, Ternström S, Hertegård S, Not just sound: supplementing the voice range profile with the singer’s own perceptions of vocal challenges, Logoped. Phoniatr. Vocol. 34 (2009) 3–10. doi: 10.1080/14015430802239759.
- [43]. Ahn JS, Yang SY, Sidtis D, The perception and acoustic features of Korean ditropic sentences, in: Meet. Acoust. Soc. Am., 2010: p. 1955.
- [44]. Laukka P, Juslin PN, Bresin R, A dimensional approach to vocal expression of emotion, Cogn. Emot. 19 (2005) 633–653. http://search.ebscohost.com/login.aspx?direct=true&db=bth&AN=17835457&site=ehost-live.
- [45]. Bond ZS, Moore TJ, Gable B, Acoustic-phonetic characteristics of speech produced in noise and while wearing an oxygen mask, J. Acoust. Soc. Am. 85 (1989) 907–912.
- [46]. Garnier M, Henrich N, Speaking in noise: How does the Lombard effect improve acoustic contrasts between speech and ambient noise?, Comput. Speech Lang. 28 (2014) 580–597. https://hal.archives-ouvertes.fr/hal-00874988.
- [47]. Zollinger SA, Brumm H, The Lombard effect, Curr. Biol. 21 (2011) R614–R615. doi: 10.1016/j.cub.2011.06.003.
- [48]. Reboul O, Introduction à la rhétorique, 3ème, Presses Universitaires de France, Paris, France, 1998.
- [49]. D’Errico F, Signorello R, Demolin D, Poggi I, The perception of charisma from voice: A cross-cultural study, in: Proc. - 2013 Hum. Assoc. Conf. Affect. Comput. Intell. Interact. ACII 2013, 2013. doi: 10.1109/ACII.2013.97.
- [50]. Signorello R, Rhee N, The voice acoustics of the 2016 United States presidential election candidates: A cross-gender study, J. Acoust. Soc. Am. 139 (2016) 2123.
- [51]. Griffin D, Gonzalez R, Models of dyadic social interaction, Philos. Trans. R. Soc. B Biol. Sci. 358 (2003) 573–581. doi: 10.1098/rstb.2002.1263.
- [52]. Lamarche A, Putting the Singing Voice on the Map, KTH School of Computer Science and Communication, 2009.
- [53]. Boersma P, Weenink D, Praat: doing phonetics by computer (Version 5.2.26), Computer program, Retrieved on June 14 2011 from http://www.praat.org/, 2011.
- [54]. Kruskal WH, Wallis WA, Use of Ranks in One-Criterion Variance Analysis, J. Am. Stat. Assoc. 47 (1952) 583–621. doi: 10.1080/01621459.1952.10483441.
- [55]. Siegel S, Castellan NJ Jr., Non parametric statistics for the behavioural sciences, McGraw-Hill Int., New York, NY, USA, 1988.
- [56]. Field A, Miles J, Field Z, Discovering Statistics Using R, Sage Publications Inc., London, UK, 2012.
- [57]. Aristotle, Rhetoric, Kindle Ver, Acheron Press, Retrieved from Amazon.fr, 1991.
- [58]. Cicero, De Oratore, Harvard University Press, Cambridge, MA, USA, 1967.
- [59]. Scherer KR, Voice Appeal and Its Role in Political Persuasion, in: Int. Work. Polit. Speech, 2010. http://sspnet.eu/.
- [60]. Signorello R, The biological function of fundamental frequency in leaders’ charismatic voices, J. Acoust. Soc. Am. 136 (2014) 2295.
