Abstract
In speech production research, talkers often perform a speech task several times per recording session with different speaking styles or in different environments. For example, Lombard speech studies typically have talkers speak in several different noise conditions. However, it is unknown to what degree simple repetition of a speech task affects speech acoustic characteristics or whether repetition effects might offset or exaggerate effects of speaking style or environment. The present study assessed speech acoustic changes over four within-session repetitions of a speech production taskset performed with two speaking styles recorded in separate sessions: conversational and clear speech. In each style, ten talkers performed a set of three speech tasks four times. Speaking rate, median fundamental frequency, fundamental frequency range, and mid-frequency spectral energy for read sentences were measured and compared across test blocks both within-session and between the two styles. Results indicate that statistically significant changes can occur from one repetition of a speech task to the next, even with a brief practice set and especially in the conversational style. While these changes were smaller than speaking style differences, these findings support using a complete speech set for training while talkers acclimate to the task and to the laboratory environment.
I. INTRODUCTION
Both child and adult talkers adjust their speech production in response to different communication goals,1–4 environments,5,6 or partners.7,8 Further, how talkers make these speech adjustments is impacted by individual differences, such as biological sex (e.g., Refs. 9 and 10), age (e.g., Ref. 11), or disorders.12,13 Speech production can also be temporarily affected by state vocal fatigue, in which a short-term change in voice fundamental frequency (fo) over the duration of a study has been seen regardless of the task type or task order.14 This state fatigue adjustment has also been observed outside the laboratory, for example, in schoolteachers whose voices change over the course of a school day.15,16
In studying speech adjustments like these, quantifying speech acoustic changes using a range of acoustic parameters has been an integral part of both speech production research and clinical assessments. Depending on the research or clinical question, a talker is often required to perform a set of identical speech tasks multiple times. For example, in a recent study seeking to identify the sound level at which the Lombard effect starts, talkers repeated a short set of speech materials ten times in ten different noise environments.14 An underlying assumption of studies like this is that while variation from trial to trial will exist, the effect of the independent variable (in this case, adding background noise) will be larger than the natural test-retest variation. Therefore, to understand how speech production changes due to a test variable in both clinical and research settings, it is vital to identify the stability and repeatability of speech acoustic characteristics between test and retest.17–19
The degree to which speech acoustics are affected by simple repetition of a speech task in a single test session is not currently clear. One recent study examined speech level (dB) variability both within and between talkers by having participants read two standard passages (∼300 words) twice in a row.17 The results showed a minimum of 0.5 dB within-talker variability between the two productions and >1 dB of variability between talkers. Variability in speech metrics can also be affected by such things as talkers' knowledge of and familiarity with a reading passage20 or the speech topic.21 For example, Herman22 described increases in speaking rate and fluency as intermediate-grade students repeatedly read stories aloud. However, most of the rate comparisons were made across several sessions, not within a single session. The same is true for studies examining the effect of task repetition on voice fo. For example, Atkinson23 measured the fo of five talkers who produced a three-word test sentence six different ways (i.e., as a statement or a question and with emphasis placed on either the first, second, or third word), repeating the task at various intervals over the course of a year. Other studies have made comparisons at different times during a single day (e.g., Refs. 24 and 25). A similar approach was recently taken to assess vowel production variability among separate recording sessions held on the same day and on different days.26 Thus, while data exist that could serve as a frame of reference when comparing the acoustic characteristics of speech produced in different recording sessions, data on within-session variability are very sparse.
Talkers may not consciously make the speech adjustments mentioned above or even perceive they have made them, such as in the well-known, nearly reflexive “Lombard” response.27 Indeed, Lindblom's28 hyper- and hypo-articulation theory states that talkers automatically adjust their speaking style in response to the needs of their communication partner and will adopt a more hyperarticulated form of speech when they learn that their communication partner has difficulty understanding them, whether this difficulty arises from the environment (e.g., the presence of background noise) or from the communication partner (e.g., due to their hearing status or language abilities). It is possible that some factors that cause speech production alterations may override the impact of other factors, whether there is an intentional speech goal or not. For example, in a study of short-term speech changes due to randomly presented acoustic environment changes, an overriding speech production change that was dependent on the duration of participation was observed.14 While significant environmental speech production changes as well as general test-retest variability were found, the participation duration effect was still detectable. Likewise, it is possible that intentional, listener-oriented speech production (e.g., child-directed speech, clear speech) would outweigh the impact of general conversational speech variability and/or short-term fatigue. Thus, further study is needed to clarify some of these effects and examine these methodological considerations.
The present study focused on the stability of acoustic characteristics of two speaking styles, conversational and clear speech, as elicited by instructions to talkers who read the task materials. Acoustic differences between these speaking styles are well-attested and numerous29 and are typically classified into global and segmental changes.30 Globally, talkers instructed to speak clearly show decreases in speaking rate30–41 and increases in voice intensity,30,33,37,42 especially in the range between 1000 and 3000 Hz.34,36,40,43 Other global clear speech changes include an increased voice fo32,34–36,42,43 and fo range.30,32,34–36,43,44 At the segmental level, vowels in clear speech are longer in duration than those in conversational speech30,39,45–50 and show vowel-specific formant frequency shifts that combine to expand the vowel space.30,32–35,38,41,43,45–48,51 Vowels also show greater dynamic formant movement in clear speech45,46,51 as well as less formant target undershoot.49 Like vowels, consonants are longer in duration in clear speech.30,52,53 Finally, fricatives have higher spectral peak frequencies,52 and stop consonants are released more often30,32,33,43 in clear speech than in conversational speech.
In most of the clear speech literature, talkers have performed the target speaking tasks in a quiet setting, and if a task was performed more than once within a style, tokens for analysis were chosen to avoid either warm-up or fatigue effects (e.g., Refs. 7 and 46). In preparation for a study where conversational and clear speech would be produced in four different speaking environments, the present study assessed speech acoustic changes over four repetitions of a set of speech production tasks performed in those two speaking styles. For the four global speech acoustic metrics used here, we expected to observe the speaking style differences described above. We also expected to observe certain differences between men and women, particularly for measurements involving fo. However, the focus of this study was the variability that would occur within each recording session. Our hypotheses were that over the course of four repetitions, (1) speaking rate would increase with task familiarity; (2) voice fo would decrease as talkers fatigued; and (3) variability would reduce in the intentional speaking style (clear speech).
II. METHODS
All subject recruitment and other procedures were carried out in accordance with a protocol approved by the University of Utah Institutional Review Board.
A. Talkers
Talkers were recruited from the University of Utah Department of Psychology participant pool, where students receive course credit for participating in various research studies. A total of 19 talkers were enrolled in the study, but materials were analyzed from just ten talkers (five women; average age 21 years, range of 18–24 years). These ten talkers completed both recording sessions and met the following criteria by self-report: (1) they had normal hearing, (2) they had no history of speech or language disorders, (3) they had grown up in Utah, and (4) they affirmed that “I talk like I'm from around here.” Of the nine talkers whose recordings were not analyzed, one did not return for the second recording session, and one disclosed (after the second session) having received speech therapy. The remaining seven of these talkers were excluded based on dialect history: Five reported growing up outside Utah, while two reported growing up in Utah but did not confirm speaking the local dialect.
B. Materials and procedures
Speech materials were recorded in a quiet, sound-treated room using a headset microphone [Shure (Niles, IL) SM-10] and a Marantz (Carlsbad, CA) PMD 670 digital recorder. Additionally, while a VocaLog (Griffin Labs, Temecula, CA) neck collar with accelerometer and microphone routed to a Roland (Hamamatsu, Japan) Edirol R-09 HR digital recorder was used to record voicing-related vibration of the thyroid cartilage, these signals were not used in the present analyses. Talkers participated in two recording sessions with only one speech style performed in each session. In each recording session, the talkers were given three speaking tasks: (1) the first paragraph of the Rainbow Passage,54 (2) a list of 110 sentences (described below), and (3) a picture description task. The three combined speaking tasks will be labeled a taskset, and each taskset was repeated four times in each speaking style.
In the current investigation, only the sentence materials were analyzed. The 110 sentences consisted of 50 “vowel sentences” and six lists of 10 sentences selected from the Hearing in Noise Test (HINT55). The vowel sentences were created by placing each of ten vowels (/i/, /ɪ/, /e/, /ɛ/, /æ/, /a/, /ʌ/, /o/, /ʊ/, and /u/) in /bVd/ context in five different neutral sentence frames (e.g., “He said bead into the microphone.”). Although the task order was fixed for the four repetitions of the taskset, the order of the 110 sentences was randomized for each taskset.
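The composition of the sentence list (10 vowels × 5 frames plus 6 HINT lists × 10 sentences) and its per-taskset randomization can be sketched as follows. This is only an illustration: the sentence identifiers, the function names, and the use of seeded shuffling are assumptions, since the text does not say how the randomization was carried out.

```python
import random

N_VOWELS = 10       # ten vowels placed in /bVd/ context
N_FRAMES = 5        # five neutral sentence frames
N_HINT_LISTS = 6    # six HINT lists
HINT_LIST_LEN = 10  # ten sentences per HINT list


def build_sentence_ids():
    """Return placeholder IDs for the 110 sentences (hypothetical naming)."""
    vowel_ids = [
        f"vowel{v}_frame{f}"
        for v in range(1, N_VOWELS + 1)
        for f in range(1, N_FRAMES + 1)
    ]
    hint_ids = [
        f"hint_list{lst}_item{i}"
        for lst in range(1, N_HINT_LISTS + 1)
        for i in range(1, HINT_LIST_LEN + 1)
    ]
    return vowel_ids + hint_ids


def randomized_order(sentence_ids, seed):
    """Return a shuffled copy of the list; the seed is an assumed detail."""
    order = list(sentence_ids)
    random.Random(seed).shuffle(order)
    return order


sentence_ids = build_sentence_ids()
# One independent random order for each of the four tasksets.
taskset_orders = [randomized_order(sentence_ids, seed) for seed in range(4)]
```

Each taskset then contains the same 110 sentences in a different order.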
Talkers were recorded in two test sessions on two different days following procedures used in Ferguson.7 The conversational speaking style was recorded in the first session to reduce potential contamination of clear speech instructions on the conversational speech. Talkers were given a list of 15 practice sentences to read and were instructed, “Speak as you would in everyday, normal conversation.” During this practice period, talkers were given feedback regarding how conversational their speech sounded and were instructed to repeat the practice list until the experimenter felt the speech was sufficiently conversational. In the second session, talkers were instructed, “Say the sentences as you would if you were talking to a person who has a hearing loss.” They were given the list of practice sentences, but no feedback about their speaking style. This meant each talker decided when they had achieved what they considered “clear speech.” After this practice, talkers performed the taskset (the Rainbow Passage, the 110 sentences, and the picture description) a total of four times (four taskset repetitions) in each test session. They were offered a short break and water to drink after each taskset.
In both recording sessions, talkers were instructed to immediately repeat any sentences that they misread or that contained any disfluencies or extraneous noise (e.g., from handling the printed lists of sentences) by saying “repeating” and then reading the sentence again. If a sentence contained an error, disfluency, or noise that the talker did not notice, the experimenter monitoring the recording prepared a list of sentences that the talker needed to repeat. This list was given to the talker to read after they completed the list of 110 sentences, before they moved on to the picture description task. The list contained each error sentence plus the sentences that preceded and followed it in the original list. For example, if they made an error on sentence 12, the list of sentences to re-read included sentences 11, 12, and 13 (in that order). For any sentence containing an error, only the correctly read version was used in the present analyses.
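The re-read rule described above (each error sentence plus its preceding and following sentences, in list order) can be sketched as below. The function name is hypothetical, and the handling of errors at the first or last list position and of overlapping neighbors is an assumption that the text does not address.

```python
def reread_list(error_positions, n_sentences=110):
    """Return the 1-based list positions to re-read, in original order.

    For every error position, the position itself and its two neighbors
    are included. Positions outside 1..n_sentences are dropped and
    duplicates are merged (both are assumed behaviors).
    """
    positions = set()
    for pos in error_positions:
        for neighbor in (pos - 1, pos, pos + 1):
            if 1 <= neighbor <= n_sentences:
                positions.add(neighbor)
    return sorted(positions)


# The example from the text: an error on sentence 12.
example = reread_list([12])
```

Here `example` holds positions 11, 12, and 13, matching the example given in the text.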
C. Speech production analyses
For each talker, 880 individual sentence files (110 sentences × 4 tasksets × 2 speaking styles) were extracted from the original recording files using Praat56 textgrids generated using ELAN annotation.57 The acoustic speech measures described below were then performed on the 8800 individual sentence files (880 sentences × 10 talkers). While there is a wide range of possible acoustic speech measures, the statistical analysis focused on four speech acoustic characteristics that have commonly been reported as changing in clear vs conversational speech (as detailed in the Introduction) and/or could reasonably capture production changes related to fatigue and familiarity:16 (1) speaking rate, (2) median fo, (3) fo range, and (4) mid-frequency spectral energy.
1. Speaking rate
The duration of each sentence was measured with the aid of semi-automatic custom MATLAB scripts: a research assistant was presented with the recording waveforms and marked the onset and offset of each sentence, and these times and the computed duration between onset and offset were saved in a data file. Speaking rate was then computed for each sentence by dividing the number of syllables in the written sentence by its spoken duration in seconds.
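As a minimal sketch of the rate computation just described, the snippet below divides a syllable count by the marked duration. The function name and the example values are hypothetical and are not taken from the study's scripts or data.

```python
def speaking_rate(n_syllables, onset_s, offset_s):
    """Return speaking rate in syllables per second for one sentence."""
    duration_s = offset_s - onset_s
    if duration_s <= 0:
        raise ValueError("offset must come after onset")
    return n_syllables / duration_s


# Hypothetical sentence: 8 syllables, marked from 0.25 s to 1.85 s.
rate = speaking_rate(8, 0.25, 1.85)
```

With these illustrative values the duration is 1.6 s, so the rate is 5 syllables/s.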
2. Voice fo measures
Each sentence was analyzed using another set of custom MATLAB scripts, which preprocessed and managed the recorded samples. The MATLAB scripts used several different published extraction scripts; of those, the Praat (5.4.17) autocorrelation method58 most closely matched the average of the values yielded by the various techniques. For each recorded sentence, a contour of several hundred fo values taken at 10-ms increments was obtained using the Praat command line,59 and the median fo value (fo-median) and the fo inter-quartile range (fo-IQR) were computed to capture general voice fo characteristics and the time-varying properties of fo, respectively. After the first round of fo extraction, it appeared that some of the participants used vocal fry phonation in a sizeable number of productions, especially in the conversational speech session. Since this fry phonation seemed to be impacting the speech fo extraction, we employed an automatic technique for detection of vocal fry60,61 and removed those fry segments from the extracted fo values before computing fo-median and fo-IQR. On average, segments flagged as vocal fry accounted for less than 12% of all voiced production segments.
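The two summary statistics can be illustrated with a short sketch that takes an fo contour and a parallel list of vocal fry flags. Several details here are assumptions rather than the authors' procedure: unvoiced frames are taken to be coded as 0, the fry flags are assumed to come from a separate detector, and the quartiles use the "inclusive" definition, since the text does not state which quantile definition was used.

```python
import statistics


def fo_summary(fo_hz, is_fry):
    """Return (fo-median, fo-IQR) after dropping unvoiced and fry frames.

    fo_hz  : fo values in Hz, one per 10-ms frame; 0 marks unvoiced (assumed)
    is_fry : booleans of the same length, True where fry was detected
    """
    kept = [f for f, fry in zip(fo_hz, is_fry) if f > 0 and not fry]
    if len(kept) < 2:
        raise ValueError("need at least two usable voiced frames")
    q1, _, q3 = statistics.quantiles(kept, n=4, method="inclusive")
    return statistics.median(kept), q3 - q1


# Hypothetical contour: two low fry frames and two unvoiced frames.
contour = [0, 70, 72, 200, 210, 220, 230, 240, 0]
fry_flags = [False, True, True, False, False, False, False, False, False]
fo_median, fo_iqr = fo_summary(contour, fry_flags)
```

In this toy contour, leaving the two fry frames in would pull the median down and widen the IQR, which is the kind of distortion the fry removal step is meant to avoid.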
3. Mid-frequency spectral energy
The same custom MATLAB scripts used for the voice fo measures also performed spectral analysis on each recorded sentence. First, the long-term average spectrum was calculated between 50 and 8000 Hz using default settings in published spectral algorithms.62 Since previous studies (e.g., Ref. 40) have found a clear speech effect in the mid-frequency range, energy in the spectral band 1–3.15 kHz was also extracted. This was done by calculating first the overall energy (sum of all spectral components between 50 and 8000 Hz) and then the energy between 1 and 3.15 kHz (sum of all spectral components between 1 and 3.15 kHz). The energy difference in dB between these two was then calculated. As the recordings were all completed at the same gain level and the same microphone-to-mouth distance, the mid-frequency energy could also be directly compared. This analysis step was tested on a variety of noise samples (e.g., white, brown, pink) and on spectrally manipulated samples to verify expected differences.
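The band-energy calculation can be sketched as follows, assuming the long-term average spectrum is already available as paired lists of frequencies and linear power values. The function names, the toy spectrum, and the sign convention (mid-frequency band relative to overall energy) are assumptions made for illustration; the published spectral algorithms themselves are not reproduced here.

```python
import math


def band_energy(freqs_hz, power, lo_hz, hi_hz):
    """Sum the power of all spectral components with lo_hz <= f <= hi_hz."""
    return sum(p for f, p in zip(freqs_hz, power) if lo_hz <= f <= hi_hz)


def mid_frequency_difference_db(freqs_hz, power):
    """Return the 1-3.15 kHz energy relative to the 50-8000 Hz energy, in dB."""
    overall = band_energy(freqs_hz, power, 50.0, 8000.0)
    mid = band_energy(freqs_hz, power, 1000.0, 3150.0)
    if overall <= 0 or mid <= 0:
        raise ValueError("spectrum has no energy in the requested band")
    return 10.0 * math.log10(mid / overall)


# Toy spectrum (hypothetical values, linear power units).
freqs = [100.0, 500.0, 1000.0, 2000.0, 3000.0, 4000.0, 8000.0]
power = [4.0, 2.0, 1.0, 1.0, 1.0, 0.5, 0.5]
difference_db = mid_frequency_difference_db(freqs, power)
```

Because the mid-frequency band is a subset of the overall band, this relative measure is never positive under the assumed convention; a spectrum with more of its energy between 1 and 3.15 kHz yields a value closer to 0 dB.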
D. Statistical analyses
For each acoustic measure, mixed-effects linear regression models were carried out on the individual sentence data using Stata 17 to determine the significance of the overall fixed effect of speaking style, the two-way interaction between speaking style and taskset, and the fixed effect of taskset within each speaking style. Our chief interest in this study was the stability of speech production across the four tasksets within each recording session/speaking style, and, thus, values for any specific taskset averaged across the two speaking styles are uninformative, as are differences between identical tasksets in the two speaking styles. We therefore modeled taskset first as potentially interacting with speaking style and then as a fixed effect within each speaking style. Although we tested all six possible contrasts63,64 (i.e., 1 vs 2, 1 vs 3, 1 vs 4, 2 vs 3, 2 vs 4, and 3 vs 4), we report mainly on differences between adjacent tasksets (i.e., 1 vs 2, 2 vs 3, and 3 vs 4), thus, focusing on changes in speech metrics from one taskset to the next.
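The set of taskset contrasts can be enumerated with a few lines of code. This is only a sketch of the bookkeeping (variable names are hypothetical); it does not reproduce the Stata models.

```python
from itertools import combinations

tasksets = [1, 2, 3, 4]

# All pairwise contrasts between the four tasksets.
all_contrasts = list(combinations(tasksets, 2))

# The subset emphasized in the report: contrasts between adjacent tasksets.
adjacent_contrasts = [(a, b) for a, b in all_contrasts if b - a == 1]
```

Four tasksets give six pairwise contrasts, three of which involve adjacent tasksets.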
Since voice fo differs so greatly between adult men and women, this was accounted for in all statistical models by including the fixed effect of talker gender, the two-way interaction between speaking style and talker gender, and the three-way interaction between taskset, speaking style, and talker gender. When the two-way interaction was significant, stratified analyses (i.e., models carried out on subsets of the data) were carried out to identify the source of the interaction, testing the fixed effect of style for each talker gender and the fixed effect of talker gender for each speaking style. Any potential relationship between talker gender and taskset was explored only if the three-way interaction between all three fixed effects was significant. Talker was included as a random intercept in all models.
III. RESULTS
Average speaking rate, fo-median, fo-IQR, and mid-frequency energy for the four tasksets in each speaking style for men and women are displayed in Table I. The results of the overall mixed-effects linear regression analyses for the four measures are summarized in Tables II–V; results of stratified analyses exploring the nature of significant two- and three-way interactions are reported in the text, as are comparisons between tasksets within each speaking style. Given that the acoustic measures were chosen specifically because they previously had been reported to differ between conversational and clear speech, we first report the effect of style for each measure. We next report talker gender effects as well as any interaction between speaking style and talker gender, since several of the measures are known to differ between men and women. Finally, we report contrasts between adjacent tasksets within each speaking style and, thus, address our chief research question.
TABLE I.
Speaking rate, fo-median, fo-IQR, and mid-frequency energy for four tasksets (1–4) (read lists of 110 sentences) in the two speaking styles for the five men and five women.
| | Conversational | | | | Clear | | | |
|---|---|---|---|---|---|---|---|---|
| Taskset | 1 | 2 | 3 | 4 | 1 | 2 | 3 | 4 |
| Speaking rate (syllables/s) | | | | | | | | |
| Men | 5.41 | 5.64 | 5.68 | 5.79 | 4.47 | 4.46 | 4.22 | 4.30 |
| Women | 5.10 | 5.24 | 5.28 | 5.35 | 3.66 | 3.52 | 3.34 | 3.43 |
| fo-median (Hz) | | | | | | | | |
| Men | 127.68 | 130.24 | 133.41 | 128.50 | 141.70 | 111.38 | 144.11 | 144.17 |
| Women | 227.70 | 230.37 | 232.39 | 231.10 | 231.97 | 233.37 | 231.75 | 233.71 |
| fo-IQR (Hz) | | | | | | | | |
| Men | 23.86 | 28.23 | 32.05 | 29.62 | 29.24 | 29.65 | 30.06 | 30.01 |
| Women | 42.55 | 48.11 | 53.80 | 55.63 | 44.36 | 44.23 | 44.47 | 46.20 |
| Mid-frequency energy (dB) | | | | | | | | |
| Men | 34.42 | 34.05 | 34.00 | 33.42 | 35.16 | 34.67 | 34.54 | 34.34 |
| Women | 36.49 | 36.75 | 36.78 | 36.89 | 37.49 | 37.63 | 38.01 | 37.85 |
TABLE II.
Results of linear mixed-effects models for speaking rate. Tests of taskset across speaking styles (i.e., the fixed effect of taskset and the gender × taskset interaction) are excluded. Stratified analyses exploring significant interactions and assessing differences between tasksets within each speaking style are reported in the text.
| | Estimate | Standard error | z-value | p-value |
|---|---|---|---|---|
| Style | −1.51 | 0.02 | −80.46 | <0.001 |
| Gender | −0.63 | 0.26 | −2.4 | 0.02 |
| Style × gender | | | −13.29 | <0.001 |
| Style × taskset | | | −9.96 | <0.001 |
| Style × gender × taskset | | | 0.13 | 0.49 |
TABLE III.
Results of linear mixed-effects models for fo-median. Tests of taskset across speaking styles (i.e., the fixed effect of taskset and the gender × taskset interaction) are excluded. Stratified analyses exploring significant interactions and assessing differences between tasksets within each speaking style are reported in the text.
| | Estimate | Standard error | z-value | p-value |
|---|---|---|---|---|
| Style | 7.74 | 0.31 | 25.08 | <0.001 |
| Gender | 94.54 | 7.49 | 12.63 | <0.001 |
| Style × gender | | | −19.70 | <0.001 |
| Style × taskset | | | −3.55 | <0.001 |
| Style × gender × taskset | | | 0.26 | 0.80 |
TABLE IV.
Results of linear mixed-effects models for fo-IQR. Tests of taskset across speaking styles (i.e., the fixed effect of taskset and the gender × taskset interaction) are excluded. Stratified analyses exploring significant interactions and assessing differences between tasksets within each speaking style are reported in the text.
| | Estimate | Standard error | z-value | p-value |
|---|---|---|---|---|
| Style | −2.10 | 0.31 | −6.56 | <0.001 |
| Gender | 18.22 | 5.15 | 3.54 | <0.001 |
| Style × gender | | | −11.05 | <0.001 |
| Style × taskset | | | −10.08 | <0.001 |
| Style × gender × taskset | | | −3.00 | 0.003 |
TABLE V.
Results of linear mixed-effects models for mid-frequency energy. Tests of taskset across speaking styles (i.e., the fixed effect of taskset and the gender × taskset interaction) are excluded. Stratified analyses exploring significant interactions and assessing differences between tasksets within each speaking style are reported in the text.
| | Estimate | Standard error | z-value | p-value |
|---|---|---|---|---|
| Style (clear vs conversational) | 0.87 | 0.06 | 14.20 | <0.001 |
| Gender (men vs women) | 2.90 | 1.19 | 2.44 | <0.02 |
| Style × gender | | | 2.79 | <0.01 |
| Style × taskset | | | −0.70 | 0.483 |
| Style × gender × taskset | | | 1.17 | 0.242 |
A. Speaking rate
Speaking rate was significantly slower in clear speech [mean (M) = 3.92 syllables (syll)/s] than in conversational speech (M = 5.44 syll/s), and women overall spoke significantly more slowly than men (M = 4.36 vs 4.99 syll/s). The two-way interaction between speaking style and talker gender was significant, as shown in the top left panel of Fig. 1. A stratified analysis showed that the talker gender effect was significant only in clear speech (β = 0.88, z = 2.31, p < 0.03); in conversational speech, speaking rate did not differ significantly between men and women (β = 0.39, z = 1.12, p = 0.26). This analysis also showed that the women reduced their speaking rate when they adopted a clear speech style vs their conversational speech style to a greater extent (β = 1.76, z = 71.48, p < 0.001) than the men (β = 1.27, z = 45.33, p < 0.001).
FIG. 1.
Average speaking rate (top left), median voice fundamental frequency (fo; top right), voice fo-IQR (bottom left), and energy in the 1000–3150 Hz frequency range (mid-frequency energy; bottom right) in conversational (CO) and clear (CL) speech for five women and five men.
The two-way interaction between taskset and speaking style was significant. Stratified analyses showed that during the conversational speaking style session, rate increased significantly by 0.2 syllables per second from taskset 1 to taskset 2 (z = 4.99, p < 0.001), was stable between tasksets 2 and 3 (β = 0.04, z = 0.95, p = 0.34), and increased significantly between tasksets 3 and 4 (β = 0.09, z = 2.49, p < 0.02). As seen in the top left panel of Fig. 2, the biggest increase in speaking rate during the conversational speech recording session occurred between tasksets 1 and 2. During the clear speech recording session, a different pattern was observed. In this second session, talkers significantly reduced their speaking rate between taskset 1 and taskset 2 (β = −0.08, z = −2.86, p < 0.01) and between tasksets 2 and 3 (β = −0.21, z = −7.56, p < 0.001) but spoke more quickly during taskset 4 than during taskset 3 (β = 0.08, z = 2.97, p < 0.01). The three-way interaction between taskset, speaking style, and talker gender was not significant, suggesting that men and women showed similar patterns across the four tasksets in each speaking style.
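As a rough cross-check on these contrasts, the adjacent-taskset differences can be recomputed from the cell means in Table I. The sketch below simply averages the men's and women's means for each taskset; this unweighted pooling is an assumption, and the resulting differences are descriptive values, not the model estimates themselves.

```python
# Speaking rate cell means (syllables/s) copied from Table I.
rate = {
    "conversational": {"men": [5.41, 5.64, 5.68, 5.79],
                       "women": [5.10, 5.24, 5.28, 5.35]},
    "clear": {"men": [4.47, 4.46, 4.22, 4.30],
              "women": [3.66, 3.52, 3.34, 3.43]},
}


def adjacent_differences(style):
    """Differences between consecutive tasksets, pooled over gender."""
    men, women = rate[style]["men"], rate[style]["women"]
    pooled = [(m + w) / 2 for m, w in zip(men, women)]
    return [later - earlier for earlier, later in zip(pooled, pooled[1:])]


conv_diffs = adjacent_differences("conversational")
clear_diffs = adjacent_differences("clear")
```

The pooled differences are approximately 0.19, 0.04, and 0.09 syllables/s in conversational speech and approximately −0.08, −0.21, and 0.09 syllables/s in clear speech, in line with the direction and size of the reported coefficients.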
FIG. 2.
Average speaking rate (top left), median voice fundamental frequency (fo; top right), voice fo-IQR (bottom left), and energy in the 1000–3150 Hz frequency range (mid-frequency energy; bottom right) in conversational (CO) and clear (CL) speech for ten talkers carrying out a speech taskset four times.
B. fo-median
Median voice fo was significantly higher in clear speech (M = 187.62 Hz) than in conversational speech (M = 180.22 Hz) and for women than for men (M = 231.54 vs 136.78 Hz, respectively). The two-way interaction between these two fixed effects was significant, as shown in the top right panel of Fig. 1. The stratified analyses showed that the voice fo difference between women and men was significantly larger in the conversational speaking style (β = 100.41, z = 12.86, p < 0.001) than in clear speech (β = 88.58, z = 10.32, p < 0.001) and that the speaking style effect, while significant for both genders, was much greater for the men (β = 13.60, z = 36.17, p < 0.001) than for the women (β = 1.81, z = 3.83, p < 0.01).
The two-way interaction between taskset and speaking style was significant and can be seen in the top right panel of Fig. 2. In conversational speech, fo-median significantly increased by about 3 Hz from taskset 1 to taskset 2 (β = 2.58, z = 4.09, p < 0.001) and from taskset 2 to taskset 3 (β = 2.59, z = 4.10, p < 0.001) but then significantly decreased by the same amount from taskset 3 to taskset 4 (β = −3.08, z = −4.87, p < 0.001). In clear speech, a slightly different pattern was observed. Median fo increased by about 3 Hz from taskset 1 to taskset 2 (β = 3.16, z = 6.66, p < 0.001), decreased by about 1 Hz between tasksets 2 and 3 (β = −1.16, z = −2.45, p < 0.02), and then increased by about 2 Hz between tasksets 3 and 4 (β = 1.99, z = 2.53, p < 0.02). As was the case for speaking rate, the three-way interaction between taskset, speaking style, and talker gender was not significant for fo-median. Women and men, thus, showed similar median voice fo changes across the four tasksets in each speaking style.
C. fo-IQR
Voice fo variability was significantly lower in clear speech than in conversational speech (M = 37.19 vs 39.24 Hz) and significantly higher for women than for men (M = 47.46 vs 29.09 Hz, respectively). Like the other metrics, fo-IQR showed a significant two-way interaction between speaking style and talker gender, as illustrated in the bottom left panel of Fig. 1. Voice fo variability was higher for women than men in both speaking styles, with a larger talker gender effect in conversational speech (β = 21.57, z = 4.17, p < 0.001) than in clear speech (β = 14.87, z = 2.71, p < 0.01). The speaking style effect was quite different for men and women, although significant for both genders. While the women showed significantly lower fo-IQR in clear vs conversational speech (β = −5.44, z = −10.73, p < 0.001), men increased their fo variability slightly when speaking clearly (β = 1.28, z = 3.52, p < 0.001).
Unlike speaking rate and median voice fo, fo-IQR showed a significant three-way interaction between style, taskset, and talker gender, as shown in Fig. 3. The two-way interaction between style and taskset was significant for both women and men (all |z| > 3.25 and all p < 0.01), but stratified analyses revealed slight differences between these two genders in how fo-IQR changed between tasksets in each speaking style. In conversational speech, men and women showed a very similar pattern of change in fo-IQR between tasksets 1 and 2 (men: β = 4.34; women: β = 5.56; both z > 5 and p < 0.001) and between tasksets 2 and 3 (men: β = 3.81; women: β = 5.68; both z > 4.5 and p < 0.001), where the variability generally increased with taskset. Where women and men differed was between tasksets 3 and 4. While the men showed a significant decrease in fo-IQR (β = −2.4, z = −3.15, p < 0.01), women showed a nonsignificant change (β = 1.83, z = 1.69, p = 0.09). In clear speech, patterns were quite similar for the men and women: the men showed no significant changes in fo-IQR between tasksets (|β| < 0.45, |z| < 0.7, p > 0.5), while women showed a small but significant change only between tasksets 3 and 4 (β = 1.75, z = 2.01, p < 0.05). The differences between tasksets 1 and 2 and between tasksets 2 and 3 were not significant for the women (β < 0.8, z < 0.9, p > 0.35).
FIG. 3.
Average voice fo-IQR in conversational (CO) and clear (CL) speech for five women (top) and five men (bottom) carrying out a speech taskset four times.
D. Mid-frequency energy
Spectral energy between 1000 and 3150 Hz was significantly higher in clear speech than in conversational speech (M = 36.21 vs 35.35 dB) and significantly higher for women than for men (M = 37.24 vs 34.33 dB, respectively). As with the other three metrics, the interaction between style and talker gender was significant for mid-frequency energy, as illustrated in the bottom right panel of Fig. 1. Both genders showed significant increases in mid-frequency energy when speaking clearly vs conversationally, but the effect was larger for the women (β = 1.03, z = 12.11, p < 0.001) than for the men (β = 0.70, z = 8.02, p < 0.001). The talker gender effect was significant and virtually identical for both conversational and clear speech (β = 2.73, z = 1.14 and β = 3.07, z = 1.29, respectively, both p < 0.02). The three-way interaction between taskset, speaking style, and talker gender was not significant.
Although mid-frequency energy was unique among our acoustic metrics in not showing a significant two-way interaction between speaking style and taskset, the taskset effect in each style remains of interest and is shown in the bottom right panel of Fig. 2. Stratified analyses showed that for both styles, there was an overall decline in mid-frequency energy from the beginning to the end of the recording session. In conversational speech, the only significant contrasts among tasksets occurred when taskset 4 was compared to tasksets 1 and 2 (β = −0.31, z = −2.41, p < 0.02 and β = −0.25, z = −1.99, p < 0.05). Contrasts between adjacent tasksets in the conversational speech recording session were not significant (1 vs 2: β = −0.05, z = −0.43; 2 vs 3: β = −0.01, z = −0.07; 3 vs 4: β = −0.25, z = −1.92; all p > 0.05). Similarly, in clear speech, the only significant contrast was between tasksets 1 and 4 (β = −0.23, z = −2.02, p < 0.05); contrasts between adjacent tasksets were all nonsignificant (1 vs 2: β = −0.18, z = −1.56; 2 vs 3: β = 0.12, z = 1.10; 3 vs 4: β = −0.17, z = −1.55; all p > 0.10).
IV. DISCUSSION
A. Speaking style and talker gender effects
The current study evaluated the within-session variability of speech rate, voice fo (median and range), and speech spectral characteristics in read conversational and clear speech styles. As discussed at length in the Introduction, numerous studies have compared the acoustic characteristics of clear vs conversational speech produced by talkers reading aloud; the present results for speaking rate (slower), overall voice fo (higher), and mid-frequency energy (higher) are consistent with this literature. Although absolute speaking rates vary slightly among studies depending on the materials and on how speaking rate is operationalized, the clear/conversational ratio is strikingly consistent. The clear/conversational speaking rate ratio for the present group of talkers was 0.72. When we computed clear/conversational ratios using data reported for speech produced in quiet by younger and older adult talkers in Table I of Smiljanic and Gilbert,40 the results were nearly identical: 0.72 and 0.70, respectively.
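Taken together, the ratio and the difference between styles determine approximate mean speaking rates for each style. The sketch below back-calculates them from the ratio of 0.72 reported above and the clear-minus-conversational difference of −1.51 syllables/s listed in Table VI. The resulting rates are implied by rounded figures rather than reported directly, and the variable names are illustrative only.

```python
# Rounded summary values taken from the text and Table VI.
ratio_cl_over_co = 0.72   # clear / conversational speaking rate
diff_cl_minus_co = -1.51  # clear - conversational, in syllables/s

# CL = ratio * CO and CL - CO = diff  =>  CO = diff / (ratio - 1)
implied_co = diff_cl_minus_co / (ratio_cl_over_co - 1.0)
implied_cl = ratio_cl_over_co * implied_co

print(f"implied conversational rate: {implied_co:.2f} syllables/s")
print(f"implied clear rate:          {implied_cl:.2f} syllables/s")
```

Because both inputs are rounded, the implied rates should be read as approximate.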
The significant increases in overall voice fo and in mid-frequency energy that we observed in clear speech have also been observed in several other studies, although the size of the clear speech effects observed varies somewhat. As an example, we compare the results for our sample to those reported for young adult talkers in quiet in Smiljanic and Gilbert.40 Overall voice fo was 7.4 Hz higher in clear vs conversational speech in our sample, while their clear speech effect was just 2.4 Hz. Conversely, the clear speech effect we found for mid-frequency energy was smaller than theirs (0.86 vs 1.86 dB). Such variability among studies is not surprising, however. The magnitude of clear speech effects varies greatly among talkers, with the two talkers in Bradlow et al.32 providing an excellent example: the male and female talkers (one each) showed overall voice fo differences of 8 and 59 Hz, respectively. Our sample of ten talkers was not large enough to wash out such individual variability, although it was appropriate given the volume of material that was recorded. Methodological details, such as speech materials and elicitation technique, including the instructions used to elicit clear speech,65 also impact the size of the clear speech effect.
Unlike the other three metrics, our results for fo variability stand in marked contrast with the previous literature. Most studies report an expanded fo range in clear vs conversational speech, while we observed a significant decrease in fo range in clear speech. An examination of Fig. 3 suggests that our overall style effect was driven by increased fo range in the later tasksets of the conversational speech recording session, especially for women. These within-session trends are discussed at length below.
Both men and women were included in the present study to have a more representative sample of talkers, rather than specifically to test talker gender differences. Nonetheless, because talker gender differences are commonly seen in speech acoustic characteristics (especially voice fo), talker gender was included in the statistical models to account for any talker gender effects when we tested the effects of speaking style and taskset. Although there were not enough men and women in the present study to support gender-based conclusions, certain trends appear to be consistent with the literature. The effect of talker gender was significant for all four metrics, and the magnitude and direction of the talker gender differences were consistent with the previous literature. For example, the slower speaking rate observed here for women than for men was also seen in Jacewicz et al.,66 while the higher median voice fo for women was also seen in Leung et al.67 and many previous studies. While interactions between speaking style and talker gender have been reported for perceptual measures of clear vs conversational speech (e.g., vowel intelligibility7), they have not been consistently assessed for acoustic measures. For example, Smiljanic and Gilbert,40 who had five talkers per gender in each of three age groups, collapsed across talker gender in their statistical analyses.
B. Taskset effects (within-session variability)
The goal of the present paper was to determine whether the acoustic characteristics of conversational and clear read speech were stable when speaking tasks were repeated several times in a single recording session. Since the interest was in what happened within each specific speaking style, taskset effects were assessed within each speaking style only, not across styles.
1. Speaking rate
Speaking rate increased over the course of the conversational speech recording session, with the biggest change (0.2 syllables/s) occurring between tasksets 1 and 2, a nonsignificant increase between tasksets 2 and 3, and another significant increase between tasksets 3 and 4. These results support our hypothesis that speaking rate would increase with task familiarity and are consistent with literature in the field of reading education, where fluency of oral reading has long been used as an indicator of reading ability.22 The results also indicate that the practice list of 15 sentences the talkers read failed to eliminate practice effects from the conversational speech recordings, despite being designed to serve as a “warm-up” for each speaking style and to familiarize the talkers with the recording procedures, the sentence frames, and the spellings of the various /bVd/ words. The results from our study suggest that any time a speech production task involves reading aloud, practice should consist of reading the complete set of materials aloud, not a subset of them (even a representative one).
Practice effects were also found for speaking rate in the clear speech recording session, but the changes for the first three tasksets were in the opposite direction from the conversational speech session. Instead of increasing, speaking rate decreased between tasksets 1 and 2 and between tasksets 2 and 3. In addition, while the largest taskset effect in conversational speech was between tasksets 1 and 2, for clear speech, the largest change (0.2 syllables/s) was between tasksets 2 and 3. Speaking rate then increased very slightly between tasksets 3 and 4. This pattern of results suggests that it took the talkers two passes through the taskset to achieve a speech register that they thought matched how they would talk to someone with hearing loss. As in the conversational speech session where the practice list of 15 sentences was insufficient to overcome reading practice effects, this practice list was insufficient to overcome speaking style practice effects in the clear speech session. Despite their statistical significance, however, we note that the taskset effects for speaking rate were very small compared to the speaking style effect: The overall clear speech effect for speaking rate was 1.5 syllables/s, while the largest taskset effect in each style was 0.2 syllables/s. Therefore, while there were statistically significant taskset effects, the impact on speech communication would likely be minimal compared to intentional speech style effects.
2. Voice fo measures
In both recording sessions, fo-median rose significantly (3 Hz in conversational speech and 4 Hz in clear speech) between tasksets 1 and 2. This is consistent with a warm-up effect (e.g., Ref. 68) and, like the speaking rate results, could suggest that our practice list of sentences was not long enough to provide sufficient warm-up for either style. After the second taskset, fo-median behaved differently in the two speaking styles. In conversational speech, fo-median rose another 2 Hz between tasksets 2 and 3 before falling 3 Hz between tasksets 3 and 4. This suggests that the warm-up effect continued into taskset 3. The decrease between tasksets 3 and 4 is consistent with our hypothesis that voice fo would decrease as talkers fatigued, although this decrease might also have been attributable to boredom or to impatience with the task. No fatigue effects were observed in the clear speech session, however: fo-median fell by 1 Hz between tasksets 2 and 3 and then rose by 1 Hz between tasksets 3 and 4. This result is counter-intuitive, given that clear speech requires more vocal effort, and presumably is more fatiguing, than conversational speech. Indeed, several clear speech acoustic characteristics, including increases in fo and mid-frequency energy, are considered to be indicators of vocal effort (e.g., Ref. 69). The fo-median data for clear speech also contrast with the speaking rate data, where decreases continued into taskset 3, while the style-related fo increase seems to have stabilized by taskset 2. This suggests that any fatigue effects associated with the more effortful speaking style were offset by the intentional style goal of clear speech. One may also consider whether the magnitude of this fo change is perceptually meaningful or important to a given research question. Researchers should evaluate their production task materials and goals of their study to determine whether a small, but statistically significant, change in fo is meaningful.
In contrast with the other three acoustic metrics, our measure of fo variability (fo-IQR) showed a significant three-way interaction between talker gender, speaking style, and taskset. However, both men and women showed similar patterns of change across the first three tasksets, patterns that differed sharply between the two styles. In conversational speech, fo-IQR increased dramatically between tasksets 1 and 2 and between tasksets 2 and 3, while in clear speech, no significant changes were observed among these tasksets. While we attributed the increases seen for fo-median between tasksets 1 and 2 in both speaking styles to vocal warm-up effects, warm-up does not explain the fo-IQR results, because warm-up should have occurred in both recording sessions. There are several possible explanations for effects occurring only during the first recording session. One is familiarity with the reading task. Another possibility, however, is that talkers spoke with increasingly greater fo variability as they strove to achieve an everyday, conversational speaking style, variability that disappeared when they spoke in an intentionally clear manner in the clear speech session.
For the women, fo-IQR continued to increase between tasksets 3 and 4 in conversational speech, while fo-IQR decreased between these tasksets for the men. The fo-IQR values for the women rose so high that their average fo-IQR was significantly higher in conversational speech than in clear speech. This negative clear speech effect is in marked contrast with the previous clear speech literature (e.g., Refs. 30, 34, 36, and 44) and difficult to explain. One possibility we explored was that the women engaged in vocal fry in the conversational speech session, which increased as they became familiar with the reading task and/or speaking style. This possibility was suggested by subjective impressions of the recorded speech as well as by an earlier study,70 which reported 80% less fry phonation in women reading in a clear speaking style than in a conversational style. While an analysis of fry phonation in our recorded sentences also showed decreased fry in clear speech relative to conversational speech (6% of segments in clear speech vs 12% in conversational speech), the proportion of fry segments was stable across the tasksets in the conversational session. The relative stability of fo-IQR in the clear speech session suggests that whatever modifications talkers made to their fo range when speaking clearly, they were able to accomplish this change during the practice list.
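To make the two fo measures concrete, the sketch below computes a median and an interquartile range from a list of frame-level fo estimates. The fo values are made up for illustration, the function name `fo_median_and_iqr` is hypothetical, and both the coding of unvoiced frames as 0 and the quartile convention (Python's "inclusive" method) are assumptions. The extraction settings actually used in this study are not reproduced here.

```python
import statistics


def fo_median_and_iqr(fo_hz):
    """Return (median, interquartile range) in Hz for the voiced frames of an fo track."""
    # Assumed convention: unvoiced frames are coded as 0 Hz and are excluded.
    voiced = [f for f in fo_hz if f > 0]
    q1, q2, q3 = statistics.quantiles(voiced, n=4, method="inclusive")
    return q2, q3 - q1


# Hypothetical fo track for one sentence (Hz); not data from this study.
example_track = [0.0, 100.0, 110.0, 120.0, 130.0, 140.0, 0.0]
fo_median, fo_iqr = fo_median_and_iqr(example_track)
print(f"fo-median = {fo_median:.1f} Hz, fo-IQR = {fo_iqr:.1f} Hz")
```

A rank-based spread measure such as the IQR is less sensitive to isolated pitch-tracking errors than a minimum-to-maximum range. This is one plausible reason to prefer it, although the text here does not state the authors' rationale.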
3. Mid-frequency energy
While the other three acoustic measures showed significant changes between adjacent tasksets, significant changes in the amount of energy between 1000 and 3150 Hz were only found when taskset 4 was compared to taskset 1 (in both speaking styles) or taskset 2 (in conversational speech only). Further, unlike the other metrics, which showed different taskset effects for the two speaking styles, the pattern of change in mid-frequency energy across tasksets was similar in conversational and clear speech, decreasing gradually from the beginning to the end of each recording session. This pattern of results suggests that mid-frequency energy is a stable characteristic, one that talkers “find” easily in a given recording session without need for task familiarity or warm-up. That is, it appears that talkers adopt a level of vocal effort consistent with task instructions and stay fairly close to their starting level. The slight decreases across the four tasksets might be due to fatigue, although if this were true, we would expect larger effects in clear speech, since clear speech requires greater vocal effort than conversational speech.69
4. General discussion of taskset effects
We had three hypotheses regarding the acoustic changes that we would see across four repetitions of a read speech task using conversational and clear speaking styles. The first was that speaking rate would increase with task familiarity. The data were consistent with this hypothesis for the conversational style: Speaking rate increased between tasksets 1 and 2 and again between tasksets 3 and 4. In the clear speech style, talkers were presumably as familiar with the materials as they could be, so the speaking rate decreases observed from taskset 1 to taskset 2 to taskset 3 can be attributed to talkers gaining confidence with how to produce clear speech.
Our second hypothesis was that voice fo would decrease over the course of each session due to fatigue. While fo-median fell significantly between tasksets 3 and 4 in conversational speech, this did not occur in clear speech, which we assume would be the more fatiguing style to produce. Instead, the results for fo-median were consistent with warm-up effects, at least in the conversational speaking style session. These warm-up effects were also seen for our measure of fo variability, fo-IQR, but again only for the conversational speech session. Taken together, the results support the common procedure of having talkers produce several tokens of the speech stimuli for a given experiment and then discarding the earliest productions when choosing tokens for acoustic or perceptual analysis. Given the significant increases observed in speaking rate and in both fo measures between tasksets 1 and 2, we advise having talkers read the entire set of materials aloud prior to beginning recording, as opposed to a short list of practice sentences as was used here.
Our third hypothesis was that any variability observed across tasksets in the conversational speech session would decrease in the clear speech session. The assumption behind this hypothesis was that when talkers have a specific speech target, such as a level or style, they would be more consistent in their production. However, this pattern only occurred for one of the metrics, fo-IQR. Given the recording procedures, the presence of taskset effects for speaking rate and fo-median in both the conversational and clear speech styles is perhaps not particularly surprising. Recall that while talkers received feedback about their speaking style during the practice sentences in the conversational speech session, they received no feedback in the clear speech session. Identical procedures were used by Ferguson,7 whose intent in the conversational speech session was “to help the talkers produce speech approximating their everyday conversational speech despite the formal laboratory conditions” (p. 2367) and whose intent in the clear speech session was to mimic the non-specific request to speak more clearly that a talker might receive from a communication partner who is having trouble understanding them. Our hypothesis assumed reduced variability when talkers had a specific speech target, but our procedures arguably gave talkers a better-defined target in conversational speech than in clear speech. The effects of specific vs non-specific speaking targets on within-session variability thus remain an open question.
Finally, we note that much of the variability observed from one taskset to the next was quite small. Table VI shows the difference between adjacent tasksets in each speaking style as well as the difference between clear and conversational speech for each acoustic metric. The most dramatic contrast between speaking style effects and taskset effects is for speaking rate, where the speaking style effect was nearly 8 times the size of the largest taskset effect. For fo-median, the clear speech effect was 1.7 times the size of the largest taskset effect. Some taskset effects, while statistically significant (for example, the 1-Hz decrease in fo-median between tasksets 2 and 3 in clear speech), are unlikely to be noticeable in conversation. The statistical significance of this contrast may be due in part to the sheer number of samples analyzed here (110 sentences × 4 tasksets × 2 speaking styles). Nonetheless, the largest taskset changes usually occurred between tasksets 1 and 2, suggesting that the talkers in this study required more extensive practice than they were given. Investigators aiming to characterize the acoustic effects of a specific speech manipulation using read speech should keep in mind the practice effects observed here when designing their recording procedures.
TABLE VI.
Difference between adjacent tasksets in conversational (CO) and clear (CL) speech and difference between CL and CO speech for four acoustic metrics. *, p < 0.05; †, p < 0.01; ‡, p < 0.001.
| Metric | CO: 2 − 1 | CO: 3 − 2 | CO: 4 − 3 | CL: 2 − 1 | CL: 3 − 2 | CL: 4 − 3 | CL − CO |
|---|---|---|---|---|---|---|---|
| Speaking rate (syllables/s) | 0.19‡ | 0.04 | 0.09* | −0.08† | −0.21‡ | 0.08† | −1.51‡ |
| fo-median (Hz) | 2.48‡ | 2.59‡ | −3.05‡ | 4.28‡ | −0.95* | 1.01* | 7.40‡ |
| fo-IQR* (Hz) | 4.94‡ | 4.75‡ | −0.28 | 0.52 | 0.32 | 0.84 | −2.05‡ |
| Mid-frequency energy (dB) | −0.05 | −0.02 | −0.23 | −0.18 | 0.13 | −0.18 | 0.86‡ |
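The comparison between speaking style effects and taskset effects can be recomputed from the rounded values in Table VI. The sketch below is illustrative only: the dictionary layout and the function name `style_to_taskset_ratio` are hypothetical, and because the inputs are rounded table entries, the ratios are approximate.

```python
# Values transcribed from Table VI: differences between adjacent tasksets
# (2-1, 3-2, 4-3) in each style, and the CL - CO style effect.
table_vi = {
    "speaking rate (syllables/s)": {
        "CO": [0.19, 0.04, 0.09], "CL": [-0.08, -0.21, 0.08], "style": -1.51},
    "fo-median (Hz)": {
        "CO": [2.48, 2.59, -3.05], "CL": [4.28, -0.95, 1.01], "style": 7.40},
    "fo-IQR (Hz)": {
        "CO": [4.94, 4.75, -0.28], "CL": [0.52, 0.32, 0.84], "style": -2.05},
    "mid-frequency energy (dB)": {
        "CO": [-0.05, -0.02, -0.23], "CL": [-0.18, 0.13, -0.18], "style": 0.86},
}


def style_to_taskset_ratio(row, styles=("CO", "CL")):
    """Ratio of |style effect| to the largest |adjacent-taskset change| in the given styles."""
    changes = [abs(d) for s in styles for d in row[s]]
    return abs(row["style"]) / max(changes)


for metric, row in table_vi.items():
    overall = style_to_taskset_ratio(row)
    co_only = style_to_taskset_ratio(row, styles=("CO",))
    cl_only = style_to_taskset_ratio(row, styles=("CL",))
    print(f"{metric}: overall {overall:.2f}, CO only {co_only:.2f}, CL only {cl_only:.2f}")
```

With these rounded entries, the fo-median ratio is about 1.7. The speaking rate ratio falls between about 7 and 8, depending on whether the largest change is taken from the clear session (0.21 syllables/s) or the conversational session (0.19 syllables/s).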
V. CONCLUSIONS
The present study investigated the acoustic variability that occurs within a single recording session when a speech task, specifically read speech, is repeated several times, as often occurs in studies examining the effects of speaking style, speaking condition, or other manipulations. Over the course of two recording sessions, scheduled on different days, during which talkers read a list of 110 sentences four times in either a conversational or clear speaking style, talkers' speaking rate, median fo, and fo range showed significant changes among the four readings, with the largest changes occurring between the first and second iterations. Voice fo range was more stable in the clear speaking style than in the conversational style, suggesting reduced variability when talkers are given a speech production target. However, speaking rate and median fo showed similar variability in the two speaking styles. In contrast with the other measures, mid-frequency energy changed very little within each speaking style. While within-session changes were generally small compared to the between-session effect of speaking style (i.e., the clear speech effect), the acoustic effects of other manipulations may be subtler and could be masked by practice effects. We recommend that for any study where talkers read aloud in various conditions, practice should consist of reading the entire set of materials aloud rather than relying on a subset of the materials.
ACKNOWLEDGMENTS
We would like to acknowledge Jaime Booz and Mark Berardi, who assisted with acoustic analysis, and Jessica Staples, who helped prepare this manuscript. The authors gratefully acknowledge Pascal Deboeck and Greg Stoddard for statistical guidance. This research was supported by the NIDCD of the National Institutes of Health under Grant No. R01DC012315. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
Portions of this work were presented at the 167th meeting of the Acoustical Society of America, Providence, RI, USA, April 2014.
AUTHOR DECLARATIONS
Conflict of Interest
The authors have no conflicts to disclose.
Ethics Approval
The research described here was carried out according to procedures approved by the University of Utah Institutional Review Board, Protocol No. IRB_00056031. All participants provided informed consent.
DATA AVAILABILITY
The data that support the findings of this study are available from the corresponding author upon reasonable request.
References
- 1. Hunter E. J., “A comparison of a child's fundamental frequencies in structured elicited vocalizations versus unstructured natural vocalizations: A case study,” Int. J. Pediatr. Otorhinolaryngol. 73, 561–571 (2009). doi:10.1016/j.ijporl.2008.12.005
- 2. McAllister A. and Brandt S. K., “A comparison of recordings of sentences and spontaneous speech: Perceptual and acoustic measures in preschool children's voices,” J. Voice 26, 673.e1–673.e5 (2012). doi:10.1016/j.jvoice.2011.12.013
- 3. Wagner P., Trouvain J., and Zimmerer F., “In defense of stylistic diversity in speech research,” J. Phon. 48, 1–12 (2015). doi:10.1016/j.wocn.2014.11.001
- 4. Xu Y., “In defense of lab speech,” J. Phon. 38, 329–336 (2010). doi:10.1016/j.wocn.2010.04.003
- 5. Hunter E. J., Halpern A. E., and Spielman J. L., “Impact of four nonclinical speaking environments on a child's fundamental frequency and voice level: A preliminary case study,” Lang. Speech Hear. Serv. Sch. 43, 253–263 (2012). doi:10.1044/0161-1461(2011/11-0002)
- 6. Zollinger S. A. and Brumm H., “The Lombard effect,” Curr. Biol. 21, R614–R615 (2011). doi:10.1016/j.cub.2011.06.003
- 7. Ferguson S. H., “Talker differences in clear and conversational speech: Vowel intelligibility for normal-hearing listeners,” J. Acoust. Soc. Am. 116, 2365–2373 (2004). doi:10.1121/1.1788730
- 8. Snow C. E. and Ferguson C. A., Talking to Children: Language Input and Acquisition (Cambridge University, Cambridge, UK, 1977).
- 9. Hunter E. J., Berardi M. L., and van Mersbergen M., “Relationship between tasked vocal effort levels and measures of vocal intensity,” J. Speech Lang. Hear. Res. 64, 1829–1840 (2021). doi:10.1044/2021_JSLHR-20-00465
- 10. Letowski T., Frank T., and Caravella J., “Acoustical properties of speech produced in noise presented through supra-aural earphones,” Ear Hear. 14, 332–338 (1993). doi:10.1097/00003446-199310000-00004
- 11. Amazi D. K. and Garber S. R., “The Lombard sign as a function of age and task,” J. Speech Lang. Hear. Res. 25, 581–585 (1982). doi:10.1044/jshr.2504.581
- 12. Doyle P. J., Goda A. J., and Spencer K. A., “The communicative informativeness and efficiency of connected discourse by adults with aphasia under structured and conversational sampling conditions,” Am. J. Speech Lang. Pathol. 4, 130–134 (1995). doi:10.1044/1058-0360.0404.130
- 13. Easterbrook A., Brown B. B., and Perera K., “A comparison of the speech of adult aphasic subjects in spontaneous and structured interactions,” Int. J. Lang. Commun. Disord. 17, 93–107 (1982). doi:10.3109/13682828209012223
- 14. Bottalico P., Passione I. I., Graetzer S., and Hunter E. J., “Evaluation of the starting point of the Lombard Effect,” Acta Acust. united Acust. 103, 169–172 (2017). doi:10.3813/AAA.919043
- 15. Astolfi A., Carullo A., Pavese L., and Puglisi G. E., “Duration of voicing and silence periods of continuous speech in different acoustic environments,” J. Acoust. Soc. Am. 137, 565–579 (2015). doi:10.1121/1.4906259
- 16. Titze I. R. and Hunter E. J., “Comparison of vocal vibration dose measures for potential damage risk criteria,” J. Speech Lang. Hear. Res. 58, 1425–1439 (2015). doi:10.1044/2015_JSLHR-S-13-0128
- 17. Castellana A., Carullo A., Astolfi A., Puglisi G. E., and Fugiglando U., “Intra-speaker and inter-speaker variability in speech sound pressure level across repeated readings,” J. Acoust. Soc. Am. 141, 2353–2363 (2017). doi:10.1121/1.4979115
- 18. Vogel A. P., Fletcher J., Snyder P. J., Fredrickson A., and Maruff P., “Reliability, stability, and sensitivity to change and impairment in acoustic measures of timing and frequency,” J. Voice 25, 137–149 (2011). doi:10.1016/j.jvoice.2009.09.003
- 19. Vogel A. P. and Maruff P., “Monitoring change requires a rethink of assessment practices in voice and speech,” Logop. Phoniatr. Vocol. 39, 56–61 (2014). doi:10.3109/14015439.2013.775332
- 20. Hubbard C. P. and Prins D., “Word familiarity, syllabic stress pattern, and stuttering,” J. Speech Lang. Hear. Res. 37, 564–571 (1994). doi:10.1044/jshr.3703.564
- 21. Merlo S. and Mansur L. L., “Descriptive discourse: Topic familiarity and disfluencies,” J. Commun. Disord. 37, 489–503 (2004). doi:10.1016/j.jcomdis.2004.03.002
- 22. Herman P. A., “The effect of repeated readings on reading rate, speech pauses, and word recognition accuracy,” Read. Res. Q. 20, 553–565 (1985). doi:10.2307/747942
- 23. Atkinson J. E., “Inter- and intraspeaker variability in fundamental voice frequency,” J. Acoust. Soc. Am. 60, 440–445 (1976). doi:10.1121/1.381101
- 24. Brown W. S., Jr., Morris R. J., and Murry T., “Comfortable effort level revisited,” J. Voice 10, 299–305 (1996). doi:10.1016/S0892-1997(96)80011-7
- 25. Garrett K. L. and Healey E. C., “An acoustic analysis of fluctuations in the voices of normal adult speakers across three times of day,” J. Acoust. Soc. Am. 82, 58–62 (1987). doi:10.1121/1.395437
- 26. Heald S. L. M. and Nusbaum H. C., “Variability in vowel production within and between days,” PLoS One 10, e0136791 (2015). doi:10.1371/journal.pone.0136791
- 27. Egan J. J., “The Lombard reflex: Historical perspective,” Arch. Otolaryngol. 94, 310–312 (1971). doi:10.1001/archotol.1971.00770070502004
- 28. Lindblom B., “Explaining phonetic variation: A sketch of the H&H theory,” in Speech Production and Speech Modelling, edited by Hardcastle W. J. and Marchal A. (Kluwer Academic, Amsterdam, Netherlands, 1990), pp. 403–439.
- 29. Smiljanić R., “Clear speech perception: Linguistic and cognitive benefits,” in The Handbook of Speech Perception, edited by Pardo J. S., Nygaard L. C., Remez R. E., and Pisoni D. B. (Wiley, New York, 2021), pp. 177–205.
- 30. Picheny M. A., Durlach N. I., and Braida L. D., “Speaking clearly for the hard of hearing II: Acoustic characteristics of clear and conversational speech,” J. Speech Lang. Hear. Res. 29, 434–446 (1986). doi:10.1044/jshr.2904.434
- 31. Behrman A., Ferguson S. H., Akhund A., and Moeyaert M., “The effect of clear speech on temporal metrics of rhythm in Spanish-accented speakers of English,” Lang. Speech 62, 5–29 (2019). doi:10.1177/0023830917737109
- 32. Bradlow A. R., Kraus N., and Hayes E., “Speaking clearly for children with learning disabilities: Sentence perception in noise,” J. Speech Lang. Hear. Res. 46, 80–97 (2003). doi:10.1044/1092-4388(2003/007)
- 33. Ferguson S. H., Poore M. A., and Shrivastav R., “Acoustic correlates of reported clear speech strategies,” J. Acad. Rehabil. Audiol. XLIII, 45–64 (2010), available at https://www.audrehab.org/_files/ugd/caf823_fccc0bb4b596427cb742c7c180c96144.pdf.
- 34. Hazan V. and Baker R., “Acoustic-phonetic characteristics of speech produced with communicative intent to counter adverse listening conditions,” J. Acoust. Soc. Am. 130, 2139–2152 (2011). doi:10.1121/1.3623753
- 35. Kato M. and Baese-Berk M. M., “Perceptual consequences of native and non-native clear speech,” J. Acoust. Soc. Am. 151, 1246–1258 (2022). doi:10.1121/10.0009403
- 36. Keerstock S. and Smiljanic R., “Clear speech improves listeners' recall,” J. Acoust. Soc. Am. 146, 4604–4610 (2019). doi:10.1121/1.5141372
- 37. Lam J. and Tjaden K., “Clear speech variants: An acoustic study in Parkinson's disease,” J. Speech Lang. Hear. Res. 59, 631–646 (2016). doi:10.1044/2015_JSLHR-S-15-0216
- 38. Smiljanić R. and Bradlow A. R., “Production and perception of clear speech in Croatian and English,” J. Acoust. Soc. Am. 118, 1677–1688 (2005). doi:10.1121/1.2000788
- 39. Smiljanić R. and Bradlow A. R., “Temporal organization of English clear and conversational speech,” J. Acoust. Soc. Am. 124, 3171–3182 (2008). doi:10.1121/1.2990712
- 40. Smiljanic R. and Gilbert R. C., “Acoustics of clear and noise-adapted speech in children, young, and older adults,” J. Speech Lang. Hear. Res. 60, 3081–3096 (2017). doi:10.1044/2017_JSLHR-S-16-0130
- 41. Whitfield J. A. and Goberman A. M., “Articulatory-acoustic vowel space: Associations between acoustic and perceptual measures of clear speech,” Int. J. Speech Lang. Pathol. 19, 184–194 (2017). doi:10.1080/17549507.2016.1193897
- 42. Durisala N., Prakash S. G., Nambi A., and Batra R., “Intelligibility and acoustic characteristics of clear and conversational speech in Telugu (a South Indian Dravidian language),” Indian J. Otolaryngol. Head Neck Surg. 63, 165–171 (2011). doi:10.1007/s12070-011-0241-7
- 43. Krause J. C. and Braida L. D., “Acoustic properties of naturally produced clear speech at normal speaking rates,” J. Acoust. Soc. Am. 115, 362–378 (2004). doi:10.1121/1.1635842
- 44. Yi H., Pingsterhaus A., and Song W., “Effects of wearing face masks while using different speaking styles in noise on speech intelligibility during the COVID-19 pandemic,” Front. Psychol. 12, 682677 (2021). doi:10.3389/fpsyg.2021.682677
- 45. Ferguson S. H. and Kewley-Port D., “Vowel intelligibility in clear and conversational speech for normal-hearing and hearing-impaired listeners,” J. Acoust. Soc. Am. 112, 259–271 (2002). doi:10.1121/1.1482078
- 46. Ferguson S. H. and Quené H., “Acoustic correlates of vowel intelligibility in clear and conversational speech for young normal-hearing and elderly hearing-impaired listeners,” J. Acoust. Soc. Am. 135, 3570–3584 (2014). doi:10.1121/1.4874596
- 47. Leung K. K., Jongman A., Wang Y., and Sereno J. A., “Acoustic characteristics of clearly spoken English tense and lax vowels,” J. Acoust. Soc. Am. 140, 45–58 (2016). doi:10.1121/1.4954737
- 48. Ménard L., Polak M., Denny M., Burton E., Lane H., Matthies M. L., Marrone N., Perkell J. S., Tiede M., and Vick J., “Interactions of speaking condition and auditory feedback on vowel production in postlingually deaf adults with cochlear implants,” J. Acoust. Soc. Am. 121, 3790–3801 (2007). doi:10.1121/1.2710963
- 49. Moon S. J. and Lindblom B., “Interaction between duration, context, and speaking style in English stressed vowels,” J. Acoust. Soc. Am. 96, 40–55 (1994). doi:10.1121/1.410492
- 50. Tasko S. M. and McClean M. D., “Variations in articulatory movement with changes in speech task,” J. Speech Lang. Hear. Res. 47, 85–100 (2004). doi:10.1044/1092-4388(2004/008)
- 51. Ferguson S. H. and Kewley-Port D., “Talker differences in clear and conversational speech: Acoustic characteristics of vowels,” J. Speech Lang. Hear. Res. 50, 1241–1255 (2007). doi:10.1044/1092-4388(2007/087)
- 52. Maniwa K., Jongman A., and Wade T., “Acoustic characteristics of clearly spoken English fricatives,” J. Acoust. Soc. Am. 125, 3962–3973 (2009). doi:10.1121/1.2990715
- 53. Searl J. and Evitts P. M., “Tongue-palate contact pressure, oral air pressure, and acoustics of clear speech,” J. Speech Lang. Hear. Res. 56, 826–839 (2013). doi:10.1044/1092-4388(2012/11-0337)
- 54. Fairbanks G., Voice and Articulation Drillbook, 2nd ed. (Harper and Row, New York, 1960).
- 55. Nilsson M., Soli S. D., and Sullivan J. A., “Development of the Hearing in Noise Test for the measurement of speech reception thresholds in quiet and in noise,” J. Acoust. Soc. Am. 95, 1085–1099 (1994). doi:10.1121/1.408469
- 56. Boersma P., “Praat, a system for doing phonetics by computer,” Glot Int. 5, 341–345 (2001).
- 57. Brugman H. and Russel A., “Annotating multimedia/multi-modal resources with ELAN,” in Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC'04), Lisbon, Portugal (May 26–28, 2004) (European Language Resources Association, Paris), pp. 2065–2068.
- 58. Boersma P., “Accurate short-term analysis of the fundamental frequency and the harmonics-to-noise ratio of a sampled sound,” Proc. Inst. Phon. Sci. 17, 97–110 (1993), available at http://xeds.eu/other/P_Boersma_Accurate_short-term_analysis_of_the_fundametnal_freq.pdf.
- 59. This method has several modifiable parameters, which can be summarized by the following Praat command line: “To Pitch (ac)… 0.01 fo_min 15 yes 0.03 0.45 0.0025 0.35 0.20 fo_max”. The fo extraction search range (fo_min and fo_max) was set individually for each talker based on their voice use profile (5), derived from their entire set of recordings (which, with four tasksets and two speaking styles, comprised about 20 minutes of recorded speech).
- 60. Ishi C. T., Sakakibara K.-I., Ishiguro H., and Hagita N., “A method for automatic detection of vocal fry,” IEEE Trans. Audio Speech Lang. Process. 16, 47–56 (2008). doi:10.1109/TASL.2007.910791
- 61. Cantor-Cutiva L. C., Bottalico P., and Hunter E., “Factors associated with vocal fry among college students,” Logop. Phoniatr. Vocol. 43, 73–79 (2018). doi:10.1080/14015439.2017.1362468
- 62. Monson B. B., Hunter E. J., and Story B. H., “Horizontal directivity of low- and high-frequency energy in speech and singing,” J. Acoust. Soc. Am. 132, 433–441 (2012). doi:10.1121/1.4725963
- 63. Corrections for multiple comparisons were not needed; the problems that multiple comparisons pose in other analysis techniques do not arise in mixed-effects linear regression analysis (64).
- 64. Gelman A., Hill J., and Yajima M., “Why we (usually) don't have to worry about multiple comparisons,” J. Res. Educ. Eff. 5, 189–211 (2012). doi:10.1080/19345747.2011.618213
- 65. Lam J., Tjaden K., and Wilding G., “Acoustics of clear speech: Effect of instruction,” J. Speech Lang. Hear. Res. 55, 1807–1821 (2012). doi:10.1044/1092-4388(2012/11-0154)
- 66. Jacewicz E., Fox R. A., O'Neill C., and Salmons J., “Articulation rate across dialect, age, and gender,” Lang. Var. Change 21, 233–256 (2009). doi:10.1017/S0954394509990093
- 67. Leung Y., Oates J., Papp V., and Chan S.-P., “Speaking fundamental frequencies of adult speakers of Australian English and effects of sex, age, and geographical location,” J. Voice 36, 434.e1–434.e15 (2020). doi:10.1016/j.jvoice.2020.06.014
- 68. Manjunatha U., Nayak P. S., and Bhat J. S., “Can straw phonation be considered as vocal warm up among speech language pathologists?,” J. Voice 36, 735.e1–735.e6 (2022). doi:10.1016/j.jvoice.2020.08.016
- 69. Beechey T., Buchholz J. M., and Keidser G., “Measuring communication difficulty through effortful speech production during conversation,” Speech Commun. 100, 18–29 (2018). doi:10.1016/j.specom.2018.04.007
- 70. Behrman A. and Akhund A., “The effect of loud voice and clear speech on the use of vocal fry in women,” Folia Phoniatr. Logop. 68, 159–166 (2016). doi:10.1159/000452948
Data Availability Statement
The data that support the findings of this study are available from the corresponding author upon reasonable request.



