The Journal of the Acoustical Society of America. 2015 Dec 1;138(6):EL498–EL503. doi: 10.1121/1.4936643

Effects of voice style, noise level, and acoustic feedback on objective and subjective voice evaluations

Pasquale Bottalico 1,a), Simone Graetzer 1, Eric J Hunter 1
PMCID: PMC4670443  PMID: 26723357

Abstract

Speakers adjust their vocal effort when communicating in different room acoustic and noise conditions and when instructed to speak at different volumes. The present paper reports on the effects of voice style, noise level, and acoustic feedback on vocal effort, evaluated as sound pressure level, and self-reported vocal fatigue, comfort, and control. Speakers increased their level in the presence of babble and when instructed to talk in a loud style, and lowered it when acoustic feedback was increased and when talking in a soft style. Self-reported responses indicated a preference for the normal style without babble noise.

1. Introduction

The interaction between the person, the room, and the activity leads to different sensations of vocal comfort, control, fatigue, and effort. The maximization of vocal comfort and control, and the minimization of vocal fatigue and effort, is particularly important when (1) the person is at high risk of vocal injury, such as in teaching environments1 when the classroom acoustics are poor,2 and (2) the person is speaking with an overused or under-recovered voice.3

Vocal comfort can be defined as a psychological entity of which the magnitude is determined by those aspects that reduce the vocal effort.4 It appears to decrease with the speaker's perceived fatigue and the sensation of needing to increase the voice level.5 Vocal control can be defined as the capacity to self-regulate vocal behaviour, e.g., sound pressure level (SPL) or intensity. The sensation of control relates to the capacity to adjust the voice to maintain a level that is suitable for communication given the environmental conditions. Vocal fatigue is a progressive increase in phonatory effort, from which one can recover with rest.3

Vocal effort can be defined as the exertion of the speaker as quantified by the A-weighted SPL (dB) at a distance of 1 m from the mouth.6 In addition, vocal effort can be defined as a physiological entity that accounts for changes in voice production when loading increases.7 It is affected by speaker-listener distance, background noise level, and other acoustic characteristics of the room. Moreover, the characteristics of the communication environment are known to affect vocal effort in both adults and children.8

In this study, the effects of voice style (corresponding to soft, normal, and loud levels), background noise level, and external auditory feedback on (1) vocal effort (SPL) and (2) self-reported vocal comfort, control, and fatigue were evaluated.

2. Experimental method

The speech of 20 talkers in a semi-reverberant room was recorded in three different styles corresponding to soft, normal, and loud levels, both with and without artificial multi-talker child babble, and with and without polycarbonate panels at 1 m from the subject. These panels increased external auditory feedback, providing an early reflection of a talker's speech. With protocol approval of the Michigan State University's Human Research Protection Programs Human Subject's Review Board, ten males and ten females were recruited to participate. These subjects, aged between 18 and 29 years with a mean (x̄) of 21 years, self-reported as nonsmokers without a speech or hearing impairment. The instructions given for the styles were as follows. Soft: “Imagine you are saying something to a friend who is next to you. You want her to hear you but no one else. Do not whisper”; Normal: “Speak in your normal voice”; Loud: “Imagine you are in a classroom and you want to be heard by all of the children.”

2.1. Room acoustic conditions and measurement procedures

The experiment took place in a classroom of dimensions 5.8 m × 6 m × 2.7 m, in which the floor and ceiling were covered by absorbent material (carpet and absorbent tiles). Speech was acquired by an omnidirectional head-mounted microphone (HMM Glottal Enterprises M-80) and recorded by a Roland R-05 digital recorder with a sampling rate of 44.1 kHz.

Speech was recorded in two noise conditions: background and babble noise. The average background noise level, mainly generated by the HVAC system, was 40.5 dBA. Children's babble noise at an averaged A-weighted level of 61 dB (as measured at the talker position) was emitted by a directional speaker (Yamaha studio monitor model HS5). This level represents a common noise level generated by children in a classroom engaged in quiet group work or individual work with some movement.9

Room acoustic parameters were measured in an unoccupied state without furniture from the impulse responses (IRs) generated by a balloon pop.10 The 12 IRs were recorded using four source positions and three microphone positions. Room acoustic parameters in the octave bands ranging from 125 to 8000 Hz were calculated. The mid-frequency reverberation time (T20 500–1000 Hz) was 0.53 s [standard deviation (s.d.) 0.04 s], while the mid-frequency clarity (C50 500–1000 Hz) was 5.7 dB (s.d. 1.2 dB). Regarding T20, the standard deviation of the mean spatial values (s.d. 0.01 s) was lower than the just noticeable difference (JND, 0.03 s), and therefore T20 was nearly uniform across the room. C50 values ranged between 3.52 and 7.47 dB; higher values were found in the positions closer to the window.
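To make the clarity index concrete, the following is a minimal sketch of how C50 can be computed from an impulse response as the ratio of the energy in the first 50 ms to the energy arriving later. It is not the authors' processing chain: it works on the broadband signal rather than on octave bands, it assumes the IR starts at the arrival of the direct sound, and the function names `clarity_c50` and `synthetic_ir` are hypothetical. The synthetic IR is an idealized exponential decay, used only to exercise the formula.

```python
import math


def clarity_c50(ir, fs, split_ms=50.0):
    """Return C50 in dB: early (0 to 50 ms) over late (after 50 ms) energy.

    Assumes ir[0] is the arrival of the direct sound (an assumption of
    this sketch, not something stated in the paper).
    """
    split = int(round(fs * split_ms / 1000.0))
    early = sum(x * x for x in ir[:split])
    late = sum(x * x for x in ir[split:])
    if late <= 0.0:
        raise ValueError("impulse response has no energy after the split point")
    return 10.0 * math.log10(early / late)


def synthetic_ir(rt60, fs, duration):
    """Idealized IR whose energy decays by 60 dB in rt60 seconds."""
    # Amplitude falls by a factor of 1000 (60 dB) over rt60 seconds.
    k = 3.0 * math.log(10.0) / rt60
    n = int(fs * duration)
    return [math.exp(-k * i / fs) for i in range(n)]


if __name__ == "__main__":
    # Illustrative only: a purely exponential decay with the reported
    # mid-frequency reverberation time of 0.53 s.
    ir = synthetic_ir(rt60=0.53, fs=8000, duration=2.0)
    print(f"C50 of the idealized decay: {clarity_c50(ir, 8000):.2f} dB")
```

A measured room IR contains a direct sound and discrete early reflections, so its C50 generally differs from that of a pure exponential decay with the same reverberation time.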

The dimensions of the transparent shield of the polycarbonate panels were 56 cm by 66 cm (22″ × 26″). The increase in the external auditory feedback introduced by the panels was quantified by means of the C50 calculated from oral-binaural room IRs. These IRs were measured using a Head and Torso Simulator (HATS) placed in the talker position in unoccupied conditions. Sine sweeps were used as excitation signals. Figure 1 shows the trend in C50 in the octave bands ranging from 125 to 8000 Hz with and without panels. The increase due to the panels is evident in the higher frequencies, which are the most important for speech.

Fig. 1. C50 (dB) by panel condition per octave band measured using an oral-binaural impulse response.

2.2. Instructions, stimuli, and questionnaires

The subjects were instructed to read a text comprising three standard passages (“Marvin Williams,” 1st paragraph of the “Rainbow” passage, and “Stella”). The text, which was 1 to 2 min in length, was attached to a small stand placed at a distance of 1 m from the speakers. Subjects were asked to answer three questions after each reading of the text. These concerned the experience of talking in the various acoustic conditions. Subjects responded to the questions by making a vertical tick on a continuous horizontal line of 100 mm length (a visual analogue scale or VAS). The score was measured as the distance of the tick from the left end of the line and converted to a percentage. The questions were as follows. (1) Fatigue: How fatigued would your voice be if you were to speak continuously in this condition for 20 min? (2) Comfort: How comfortable was it to speak in this condition? (3) Control: How well were you able to control your voice in this condition? The extremes of the lines were “not at all” (left) and “extremely” (right).

2.3. Analysis

MATLAB version 2014b was used for speech signal analysis. For each condition, a time history with SPL evaluated at 0.125 s intervals was obtained for the entirety of each reading of the texts, for a total of 12 time histories per subject. The average among all the SPL values was computed per subject and this mean was subtracted from each time history value for that subject (termed ΔSPL). This within-subject centering was performed in order to evaluate the variation in the subject's vocal behavior in the different conditions from their typical vocal behavior.
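The ΔSPL computation described above can be sketched as follows. This is an illustration in Python, not the authors' MATLAB code, and it rests on several assumptions: samples are calibrated sound pressures in pascals, frames are non-overlapping, silent frames are skipped, no frequency weighting is applied, and the subject mean is the arithmetic mean of the dB values. The names `frame_spl` and `delta_spl` are hypothetical.

```python
import math


def frame_spl(samples, fs, frame_s=0.125, ref=20e-6):
    """SPL (dB re 20 µPa) for consecutive non-overlapping frames.

    Assumes `samples` are calibrated pressures in pascals. Frames with
    zero energy are skipped (an assumption of this sketch).
    """
    n = int(round(fs * frame_s))
    levels = []
    for start in range(0, len(samples) - n + 1, n):
        frame = samples[start:start + n]
        mean_sq = sum(x * x for x in frame) / n
        if mean_sq > 0.0:
            levels.append(10.0 * math.log10(mean_sq / ref ** 2))
    return levels


def delta_spl(histories):
    """Subtract one subject's mean SPL from every value of that subject.

    `histories` maps a condition label to that condition's SPL time
    history, all for a single subject.
    """
    all_values = [v for history in histories.values() for v in history]
    subject_mean = sum(all_values) / len(all_values)
    return {cond: [v - subject_mean for v in history]
            for cond, history in histories.items()}


if __name__ == "__main__":
    # Made-up levels for one subject in two of the twelve conditions.
    example = {"soft": [50.0, 52.0], "loud": [70.0, 68.0]}
    print(delta_spl(example))
```

Because the mean is taken over all of a subject's conditions, a positive ΔSPL indicates a frame louder than that subject's typical level in the experiment, and a negative value a quieter one.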

Statistical analysis was conducted using R version 3.1.2. Information-theoretic metrics (including the Akaike information criterion) and the likelihood ratio test were used to compare nested models. Models were built and post hoc comparisons were run using the lme4, lmerTest, and multcomp packages. In particular, linear mixed-effects (LME) models were fit by restricted maximum likelihood (i.e., REML estimates of the covariance parameters were calculated). The Satterthwaite method was used to approximate degrees of freedom.
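For readers unfamiliar with these model-comparison tools, the sketch below shows the arithmetic behind the Akaike information criterion and the likelihood ratio test for two nested models. It is written in Python using only the standard library and is not the R workflow used in the study. The log-likelihood values in the usage lines are invented for illustration, the function names are hypothetical, and the chi-square tail probability is implemented only for 1 and 2 degrees of freedom, where simple closed forms exist.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2.0 * n_params - 2.0 * log_likelihood


def lr_statistic(loglik_reduced, loglik_full):
    """Likelihood ratio statistic for a reduced model nested in a full one."""
    return 2.0 * (loglik_full - loglik_reduced)


def chi2_sf(x, df):
    """Upper-tail chi-square probability, closed forms for df = 1 or 2 only."""
    if x < 0.0:
        raise ValueError("chi-square statistic must be non-negative")
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise NotImplementedError("this sketch only covers df = 1 and df = 2")


if __name__ == "__main__":
    # Hypothetical log-likelihoods, chosen only to illustrate the steps.
    ll_reduced, ll_full = -1000.0, -992.15
    stat = lr_statistic(ll_reduced, ll_full)
    print(f"chi2(1) = {stat:.1f}, p = {chi2_sf(stat, 1):.2e}")
    print(f"AIC reduced = {aic(ll_reduced, 4):.1f}, AIC full = {aic(ll_full, 5):.1f}")
```

The degrees of freedom of the test equal the number of parameters by which the full model exceeds the reduced one, which is why an added two-level factor gives χ2(1) and an added three-level factor gives χ2(2).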

3. Results

3.1. Vocal effort

For the objective analyses of vocal effort, the effects of the variables voice style, noise level, and panel on ΔSPL were considered. The effects are shown in Fig. 2. Summary statistics are reported in Table 1. An LME model was run with the response variable ΔSPL (dB) and the terms style, noise, panel, and time (in 0.125 s intervals), with interactions of style and noise and of noise and panel, and a correlated random intercept and random slope for time by subject. The estimates of the standard deviations of the random effects for the intercept and the slope were 0.71 dB and 0.019 dB per time interval, respectively. The residual standard deviation was 9.9 dB. The fixed effects β coefficients were −11.4 dB for the intercept and −0.008 dB per time interval for time (p = 0.086). The estimate for the normal style was 9.2 dB higher than that of the soft style [standard error (SE) = 0.11, p < 0.0001], while the estimate for the loud style was 16.8 dB higher than that of the soft style (SE = 0.11, p < 0.0001). Tukey contrasts indicated a significant difference between all styles at p < 0.0001. The difference between soft and normal and between normal and loud was 7.7 and 6.9 dB, respectively. The estimate for the babble noise was 9 dB higher than that of the background noise (p < 0.0001). The estimate for panels was 0.23 dB lower than that for the room without panels (p < 0.01).
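The model described in this paragraph can be written out explicitly. The notation below is an assumption introduced for illustration and does not appear in the paper: i indexes the 0.125 s observations, j indexes subjects, t is the time index, and the subscripted β terms stand for the coefficient of whichever style, noise, or panel level applies to observation i of subject j. In lme4 syntax this would correspond to a formula along the lines of `dSPL ~ style * noise + noise * panel + time + (1 + time | subject)`, where the column names are hypothetical.

```latex
\begin{aligned}
\Delta\mathrm{SPL}_{ij} ={}& \beta_0
  + \beta_{\mathrm{style}(ij)}
  + \beta_{\mathrm{noise}(ij)}
  + \beta_{\mathrm{panel}(ij)}
  + \beta_t\, t_{ij} \\
&+ \beta_{\mathrm{style}\times\mathrm{noise}(ij)}
  + \beta_{\mathrm{noise}\times\mathrm{panel}(ij)}
  + u_{0j} + u_{1j}\, t_{ij} + \varepsilon_{ij}, \\
(u_{0j},\, u_{1j})^{\top} \sim{}& \mathcal{N}(\mathbf{0},\, \Sigma),
  \qquad \varepsilon_{ij} \sim \mathcal{N}(0,\, \sigma^2).
\end{aligned}
```

Here u_{0j} and u_{1j} are the subject-specific random intercept and random slope for time, Σ is their 2 × 2 covariance matrix (its off-diagonal element carries the correlation between intercept and slope), and σ is the residual standard deviation.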

Fig. 2. Variation in ΔSPL (dB) with style (a), noise (b), and panel (c) conditions. Error bars represent 95% confidence intervals.

Table 1.

ΔSPL and self-reported vocal fatigue, comfort, and control by panel, style, and noise conditions (across subjects). When panel = 1, panels are present; panel = 0, absent. Style = S refers to soft style, style = N, normal style, and style = L, loud or raised style. When noise = 1, babble noise is present; noise = 0, background noise.

Condition            ΔSPL (dB)     Fatigue (%)   Comfort (%)   Control (%)
Panel  Style  Noise  x̄      s.d.   x̄      s.d.   x̄      s.d.   x̄      s.d.
0      S      0      −11.9  8.7    25.6   26.2   57.8   27.3   48.0   26.0
0      N      0      −2.2   10.8   34.0   25.6   69.2   19.9   72.4   17.8
0      L      0      5.1    12.5   55.2   22.5   46.6   21.9   53.6   19.8
0      S      1      −2.8   6.6    30.3   24.2   44.2   21.7   45.7   21.3
0      N      1      3.3    8.5    30.0   21.5   58.1   24.0   63.0   18.8
0      L      1      9.5    10.4   59.2   24.7   34.5   18.3   40.0   20.6
1      S      0      −11.7  9.0    26.6   23.0   48.4   25.2   47.1   26.3
1      N      0      −2.8   11.0   24.9   23.7   71.3   14.8   68.3   19.8
1      L      0      4.8    13.0   55.2   18.7   52.9   17.4   59.0   18.1
1      S      1      −3.6   6.5    30.1   25.4   40.6   20.8   44.5   20.3
1      N      1      2.6    8.4    30.0   26.8   56.6   22.6   65.3   17.3
1      L      1      9.0    10.8   63.4   18.4   35.7   15.8   44.0   18.2

The interaction between style and noise was significant [χ2(2) = 863, p < 0.0001] as was the interaction between noise and panel [χ2(1) = 15.7, p < 0.0001]; there was a larger difference in ΔSPL between babble and background noise conditions in the soft style than in other styles, and there was a larger difference between the panel and no panel conditions when babble noise was present than when it was absent. The interaction of noise and panel is shown in Fig. 3.

Fig. 3. Variation in ΔSPL (dB) with noise (x axis), style (facets), and panel (line type) conditions. Error bars represent confidence intervals.

3.2. Self-reported vocal fatigue, comfort, and control

For self-reported fatigue, comfort, and control, the effects of the variables voice style, noise level, and panel were considered. Summary statistics are reported in Table 1. Three LME models were fit by REML in which the subjective responses on the VASs were the response variables fatigue, comfort, and control (F, CM, and CN). The predictors were style, noise, and panel, and there was a random intercept for subject.

In the case of self-reported fatigue, there was an effect of style. Tukey's multiple comparisons indicated a difference between soft and loud and between normal and loud styles at p < 0.0001. As could be expected, the loud style was associated with higher self-reported fatigue than soft and normal styles. The estimate of the standard deviation of the random effect (subject) was 10.41%. The residual standard deviation was 21%. No effect of panel or noise was found; however, there was a tendency for self-reported fatigue to increase in the absence of panels and also in the presence of babble noise. The absence of a significant effect of noise confirms the unconscious nature of the Lombard effect.

Regarding self-reported comfort, there was an effect of noise; in the background noise condition, the estimate was 12.8% higher than that for the babble condition (SE = 2.5, p < 0.0001). That is, comfort decreased significantly in the presence of babble noise. There was also an effect of style [χ2(2) = 129.1, p < 0.0001]. Tukey contrasts indicated a difference between normal and soft styles and normal and loud styles at p < 0.0001. As might be expected, the normal style was associated with greater self-reported comfort than soft and loud styles.

The results for self-reported control were similar to those for self-reported comfort. This finding is predictable given the insight into the relationship between control and comfort that “the human being is a comfort-seeking animal who will, given the opportunity, interact with the environment in ways that secure comfort.”11 The presence of babble noise was associated with a decrease of 7.6% in the estimate (p < 0.005). There was also an effect of style [χ2(2) = 55.6, p < 0.0001]. Tukey contrasts for the style variable were very similar to those associated with self-reported comfort (normal-soft and normal-loud contrasts were significant at p < 0.0001).

4. Conclusions

This study describes the effect of voice style, background noise level, and external auditory feedback on vocal effort (SPL) and self-reported vocal comfort, control, and fatigue. The results indicate a reliable effect of style on SPL. The difference in vocal effort (SPL) between soft and normal styles was 9.33 dB while the difference between soft and loud was 16.78 dB. Perceived vocal fatigue did not increase from soft to normal styles but there was an increase of 30% in perceived vocal fatigue from normal/soft to loud styles. Self-reported voice comfort and control were higher (by ≈20%) in normal style than in soft and loud styles, while soft and loud styles did not differ.

Regarding the effect of the artificial multi-talker babble noise, there was an increase in SPL of 8.96 dB when babble noise was present relative to the background noise condition. Given the variation in noise level of approximately 20 dB, the slope of the increase in voice level with noise (Lombard effect) was approximately 0.45 dB/dB (8.96 dB / 20 dB). This result is similar to the slope of 0.33 dB/dB found by Kryter12 in a laboratory setting. Self-reported voice comfort and control were lower (12.8% and 7.7%, respectively) when babble noise was present. No effect of noise on self-reported fatigue was found, confirming that the Lombard effect is unconscious in nature.
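The Lombard slope is simply the change in voice level divided by the change in noise level. The sketch below reproduces that arithmetic from the values reported in this paper (an 8.96 dB increase in SPL, a 40.5 dBA background level, and a 61 dBA babble level). The function name `lombard_slope` is hypothetical, and treating the two A-weighted noise levels as the endpoints of the noise change is an assumption of this sketch.

```python
def lombard_slope(voice_change_db, noise_low_db, noise_high_db):
    """Change in voice level per dB of change in noise level (dB/dB)."""
    noise_change_db = noise_high_db - noise_low_db
    if noise_change_db == 0:
        raise ValueError("noise levels must differ to define a slope")
    return voice_change_db / noise_change_db


if __name__ == "__main__":
    # Values taken from Secs. 2.1 and 4 of this paper.
    slope = lombard_slope(voice_change_db=8.96,
                          noise_low_db=40.5,
                          noise_high_db=61.0)
    print(f"Lombard slope: {slope:.2f} dB/dB")
```

Using the rounded 20 dB noise change instead of the difference between the two measured levels shifts the result only in the second decimal place.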

The increase in the external auditory feedback due to the presence of reflective panels significantly affected vocal effort. SPL decreased by a statistically significant 0.23 dB when panels were present. Importantly, in babble noise, SPL decreased by 0.5 dB. That is to say, the subjects benefited in an objectively measurable way from the panels, but this benefit was not perceived by the subjects.

In conclusion, the present paper reports the effects of voice style, background noise level, and external auditory feedback on subjective and objective voice measurements. These effects were measured under laboratory conditions. Conversations in real world environments with communication partners typically involve communicative (e.g., information-sharing and social) goals, which can be difficult to replicate within a laboratory environment. Nevertheless, such factors as communication goals will be considered in future laboratory work. Previous studies showed a higher slope of the Lombard effect in real settings. For example, a slope of 0.78 dB/dB was reported for teachers in real classrooms.2 Hence, future research will consider early reflection effects in real classroom settings, in which the effects are predicted to increase in strength.

Acknowledgments

The authors would like to thank L. Hunter, L. Glowski, and A. Lee and the subjects for their involvement. Research was supported by the NIDCD of the NIH under Award No. R01DC012315. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

References and links

  • 1. Hunter E. J. and Titze I. R., “Variations in intensity, fundamental frequency, and voicing for teachers in occupational versus non-occupational settings,” J. Speech Lang. Hear. Res. 53(4), 862–875 (2010). doi: 10.1044/1092-4388(2009/09-0040)
  • 2. Bottalico P. and Astolfi A., “Investigations into vocal doses and parameters pertaining to primary school teachers in classrooms,” J. Acoust. Soc. Am. 131, 2817–2827 (2012). doi: 10.1121/1.3689549
  • 3. Hunter E. J. and Titze I. R., “Quantifying vocal fatigue recovery: Dynamic vocal recovery trajectories after a vocal loading exercise,” Ann. Otol. Rhinol. Laryngol. 118(6), 449–460 (2009).
  • 4. Titze I. R., Principles of Voice Production (National Center for Voice and Speech, Salt Lake City, 2000), pp. 1–409.
  • 5. Pelegrín-García P. and Brunskog J., “Speakers' comfort and voice level variation in classrooms: Laboratory research,” J. Acoust. Soc. Am. 132, 249–260 (2012). doi: 10.1121/1.4728212
  • 6. ISO 9921:2002(E), Ergonomics—Assessment of Speech Communication (International Organization for Standardization, Geneva, 2002).
  • 7. Traunmüller H. and Eriksson A., “Acoustic effects of variation in vocal effort by men, women and children,” J. Acoust. Soc. Am. 107, 3438–3451 (2000). doi: 10.1121/1.429414
  • 8. Hunter E. J., Halpern A. E., and Spielman J. L., “Impact of four nonclinical speaking environments on the child's fundamental frequency and voice level: A preliminary case study,” Lang. Speech Hear. Serv. Schools 43, 252–263 (2012). doi: 10.1044/0161-1461(2011/11-0002)
  • 9. Shield B. and Dockrell J. E., “External and internal noise surveys of London primary schools,” J. Acoust. Soc. Am. 115(2), 730–738 (2004). doi: 10.1121/1.1635837
  • 10. ISO 3382-2:2008(E), Acoustics—Measurement of Room Acoustic Parameters, Part 2: Reverberation Time in Ordinary Rooms (International Organization for Standardization, Geneva, 2008).
  • 11. Humphreys M. A. and Nicol J. F., “Understanding the adaptive approach to thermal comfort,” ASHRAE Tech. Data Bull. 14(1), 1–14 (1998).
  • 12. Kryter K. D., “Effects of ear protective devices on the intelligibility of speech in noise,” J. Acoust. Soc. Am. 18, 413–417 (1946). doi: 10.1121/1.1916380
