Abstract
This study examined acoustic variation of vowels within speakers across speech tasks. The overarching goal of the study was to understand within-speaker variation as one index of the range of normal speech motor behavior for American English vowels. Ten male speakers of American English performed four speech tasks including citation form sentence reading with a clear-speech style (clear-speech), citation form sentence reading (citation), passage reading (reading), and conversational speech (conversation). Eight monophthong vowels in a variety of consonant contexts were studied. Clear-speech was operationally defined as the reference point for describing variation. Acoustic measures associated with the conventions of vowel targets were obtained and examined. These included temporal midpoint formant frequencies for the first three formants (F1, F2, and F3) and the derived Euclidean distances in the F1–F2 and F2–F3 planes. Results indicated that reduction toward the center of the F1–F2 and F2–F3 planes increased in magnitude across the tasks in the order of clear-speech, citation, reading, and conversation. The cross-task variation was comparable for all speakers despite fine-grained individual differences. The characteristics of systematic within-speaker acoustic variation across tasks have potential implications for the understanding of the mechanisms of speech motor control and motor speech disorders.
I. INTRODUCTION
Measures of the acoustic vowel space, often defined by the formant frequencies of corner vowels, have played a dominant role in a variety of applications in speech production studies. For example, the vowel space has served as a reference for articulatory variability and an estimate of speech motor control integrity (e.g., Karlsson and van Doorn, 2012; Sapir et al., 2010; Skodda et al., 2011). Also, vowel space measures have been used as a metric for evaluating the effects of speech rate or style (e.g., clear speech) on speech production (e.g., Ferguson and Kewley-Port, 2007; Fourakis, 1991; Krause and Braida, 2004; Moon and Lindblom, 1994; Picheny et al., 1986). Moreover, vowel space measures have been used for the examination of speaker differences, such as those associated with linguistic and dialectal differences (e.g., Chung et al., 2012; Clopper et al., 2005; Jacewicz et al., 2011) and indications of speech motor control integrity in persons with neurological diseases (e.g., Kent and Kim, 2003; Lansford and Liss, 2014; Skodda et al., 2011; Weismer et al., 2001). As such, vowel space measures are widely used to characterize speech behaviors (also see Flipsen and Lee, 2012).
Many factors can affect vowel space measures, one being vowel reduction as influenced by speech production variations induced by context, rate, stress, and the sex of a given speaker. Conventionally, acoustic changes under the effects of reduction are described as more centralized formant frequencies relative to a given vowel's target values (e.g., Lindblom, 1963; Stevens and House, 1963; Stevens et al., 1966), though a broader application of the concept has been proposed to include any frequency deviation from the target as reduction (Lindgren and Lindblom, 1996). Target values for vowels imply the notion that there are ideal, or most representative, forms of vowels. The conceptualization of idealized vowel targets largely follows the requirements of maximum phonetic distinctiveness among a set of vowels (e.g., Lindblom, 1963; Johnson et al., 1993). However, speakers usually do not achieve these ideal forms. It has been suggested that in functional communicative contexts, sufficient, instead of maximal, distinctiveness is often what is needed (Lindblom, 1990).
Within the framework of the acoustic theory of speech production, a vowel target is associated with a vocal tract configuration most distinct from a tube with uniform cross-sectional area (Stevens and House, 1961; also Johnson et al., 1993). Changes in vocal tract configurations are achieved through complex speech motor events that shape the vocal tract (e.g., Story, 2009). Therefore, vowel targets and reduction (i.e., variations from targets) afford information about the ranges of speech motor behaviors for a particular phonetic goal. It should be acknowledged that similar variations in speech have also been a subject of interest in sociolinguistic research (Blake and Josey, 2003; Labov, 1972). The present study adopts and focuses on the perspective that speech is the culmination of motor events, and sociolinguistic factors are treated as an integral part of the motor behaviors for speech (e.g., Lindblom, 1990).
The potential for reduction in vocal tract shape from these ideal configurations, and therefore reduction in vowel formant frequencies, has been well established for effects of phonetic contexts and rate since the seminal work of Peterson and Barney (1952), Lindblom (1963), Stevens and House (1963), Stevens et al. (1966), and Hillenbrand et al. (2001). Complex consonant context effects on vowel reduction have been reported for place of articulation, voicing, and style of speaking (e.g., Bradlow et al., 1996; Hillenbrand et al., 2001; Hillenbrand et al., 1995; Moon and Lindblom, 1994; Peterson and Barney, 1952; Stevens et al., 1966). In terms of rate, some studies have reported vowel reduction effects with increased rate (e.g., Lindblom, 1963; Moon and Lindblom, 1994; Perkell et al., 2004), while others did not find such effects (e.g., Fourakis, 1991; Gay, 1978; van Son and Pols, 1990). The effects of stress have also been of interest in understanding reduction. A relative decrease in stress tends to be associated with vowel reduction effects (e.g., Fourakis, 1991; Koopmans-van Beinum, 1980).
Thus, vowel reduction characterizes one form of variability in speech production. As demonstrated by the studies reviewed above, vowels produced by healthy speakers are marked by reduction in varying degrees and patterns, depending on task and perhaps speaker. In the face of reduction, however, healthy speakers presumably do not present with communicative challenges (e.g., Lindblom, 1990; Bradlow et al., 1996). As such, can a “typical” range be established for vowel reduction within or among speakers and across the various factors that may induce reduction? That is, what is the range of variability that a healthy system produces in speech? Moreover, what is the nature and magnitude of differences in within-speaker variability across individuals? This inquiry has important implications for further understanding the mechanisms of speech motor control and for better quantification of disordered speech characteristics. It may also shed light on how useful the commonly-used vowel space metric may be in evaluating speech motor control deficits. For example, the ranges of vowel reduction and associated effect of reducing the size of the vowel space produced by healthy systems would provide significant reference data for studies on disordered populations. Accounts of population differences in vowel space measures have focused primarily on averaged vowel space values obtained in limited experimental tasks (e.g., Bunton and Leddy, 2011; Skodda et al., 2011; Turner et al., 1995). Establishing the range of healthy speech production as motor behaviors would reveal additional insights into population differences.
The overarching goal of this study was to examine acoustic variation as the outcome of systematic changes in typical speech motor behaviors for well-defined phonetic goals. In particular, the extents of vowel reduction across different speaking tasks were studied as a window into within-speaker variability. Speaking tasks of interest included citation form sentence reading in clear-speech style (henceforth “clear-speech”), citation form sentence reading in habitual speech style (henceforth “citation”), passage reading (henceforth “reading”), and conversational speech (henceforth “conversation”). The tasks were chosen to represent a range of communicative demands on the speech mechanism, driven by the hypothesized continuum of Hyper- and Hypo-speech (Lindblom, 1990). Vowel productions were examined, specifically through classically defined vowel-target measures. The directions and magnitudes of vowel formant frequency changes were examined for the first three formant (F1, F2, and F3) frequencies at the temporal midpoint of vowel nuclei (e.g., Lindblom, 1963; Stevens et al., 1966; Hillenbrand et al., 1995). Motivated by current knowledge on vowel reduction, vowels produced in clear-speech were operationally defined as the vowel targets, or reference points, for examining reduction effects (e.g., Stevens, 2000).
Two hypotheses were addressed. First, it was hypothesized that the general patterns (i.e., directions of formant changes) of variability across speaking tasks for a group of speakers of the same sex and same dialect would be similar, although fine-grained speaker differences (e.g., extents of changes, relative vowel-to-vowel changes) were expected. Second, vowel reduction for a given speaker (i.e., within-speaker) was hypothesized to be manifested as systematic changes across clear-speech, citation, reading, and conversation tasks. Vowel formant frequencies were expected to become increasingly distant from the targets across these four tasks.
II. METHODS
A. Participants
Ten healthy male native speakers of American English born and raised in Wisconsin participated in the study. The participants were within the age range of 18 to 31 years [M = 25.1 yrs, standard deviation (SD) = 3.75]. All speakers had no known history of speech, language, and/or hearing disorders (by self-report) and were naive to the purposes of the study. The current study focused on male speakers to eliminate sex as an additional factor of variability and to maximize potential cross-task effects given that male speakers have been shown to produce speech characterized by a greater degree of reduction when compared to females (e.g., Byrd, 1994).
B. Procedures
1. Vowels
This study included all four corner vowels (/i/, /æ/, /ɑ/, and /u/) and four lax vowels (/ɪ/, /ɛ/, /ʊ/, and /ʌ/) of American English. Vowels were elicited in a total of eight monosyllabic /h/-vowel-/d/ (hVd) contexts, and 48 monosyllabic consonant-vowel-consonant (CVC) contexts (Table I). Note that the primary interest of the study was in the effects of task on within-speaker reduction, and the variety of phonetic contexts added ecological value as a range of vowel productions was sampled in each task of interest. All contexts comprised real words with four exceptions described below.
TABLE I.
Target words by vowel and initial consonant context. Note: Asterisks mark the three target words with word-final velar stops. Daggers (†) mark the four exceptions to the monosyllabic word rule.
| Vowel | /h/ (hVd) | /b/ (CVC) | /p/ (CVC) | /d/ (CVC) | /t/ (CVC) | /g/ (CVC) | /k/ (CVC) |
|---|---|---|---|---|---|---|---|
| /i/ | heat | bead | Pete | deed | teed | geek* | keyed |
| /ɪ/ | hid | bit | pit | did | tit-for-tat† | gig* | kid |
| /ɛ/ | head | bed | pet | dead | Ted | get | ked |
| /æ/ | had | bad | pad | dad | tad | gad | cad |
| /ɑ/ | hot | cro-bot† | pot | dot | Todd | got | cod |
| /ʊ/ | hood | book* | put | Dould | took* | good | could |
| /u/ | hoot | boot | pood | dude | toot | Goo together† | cooed |
| /ʌ/ | hut | but | putt-putt† | dud | Tut | gut | cut |
The hVd context was included with minor modification. Specifically, to ensure that nearly all words used across speaking tasks in the experiment were real words, both /d/ and /t/ were included in the word-final position given the shared place and manner of articulation and knowledge that post-vocalic voicing effects of consonants on vowel formants are small, at least when the formant frequencies are measured at or around the temporal midpoint of the vowel (Hillenbrand et al., 2001).
For CVC contexts, word-initial consonants included six stop consonants /b/, /p/, /d/, /t/, /g/, and /k/. Following the same rule as for hVd words, word-final consonants were confined to the alveolar environment (i.e., /d/ and /t/) to the extent possible. In rare instances (3 of 56 cases) velar stops (i.e., book, took, geek) were used when no real words were available for a given CVC item with final /d/ or /t/. Although the aim was to include only monosyllabic words, there were four exceptions in which the CVC of interest was part of a multisyllabic word/phrase. Table I shows four exceptions (marked with superscripts) to the monosyllabic word rule (i.e., “cro-bot,” “putt-putt,” “tit-for-tat,” and “Goo together”). These segments were constructed to elicit close approximation to CVC contexts of interest while being semantically meaningful in the reading materials described below. In particular, for /gut/, an approximation of the desired CVC context was created with an open-syllable word followed by a word with an initial alveolar consonant (i.e., Goo together).
2. Reading materials
For the reading task, six novel passages (see Appendix A) were constructed. The reading passages included the vocalic segments listed in Table I, in all hVd and CVC words. The words of interest were strategically placed in the reading materials in an effort to eliminate potential declination effects that may introduce confounding factors to vowel reduction effects (e.g., Ladd, 1988; Vayra and Fowler, 1992). At the sentence level, hVd and CVC words were placed prior to the middle of a sentence to the extent possible. For example, “A hut around the entrance marked the beginning of an adventure.” Alternatively, words were placed prior to the middle of the sentence based on the number of words in a sentence. For example, “She glued a ceramic bead of a bright yellow color onto the ribbon bow.” No word of interest was placed in the last word position of a sentence. Within each reading passage, no segment of interest was placed in the last sentence of a passage. The mean length of the reading passages was 333 words (SD = 36), ranging across passages from 286 to 393 words, comparable to other materials that have been used in the speech production literature (e.g., 329 words in the Rainbow passage and 280 words in the Hunter script) (Crystal and House, 1982; Duffy, 2013).
3. Tasks
Speakers performed four speaking tasks (clear-speech, citation, reading, and conversation) across two 45-min experiment sessions. During the first session, speakers performed the first part of the reading task and engaged in natural conversations. During the second session, speakers completed the second part of the reading task and then performed the citation and clear-speech tasks, in this exact order. These tasks were deliberately organized this way instead of randomized so that speakers were not exposed to the words of interest prior to the reading task. Conversation was again sampled in the second session across several natural conversation opportunities. Further details of tasks are described below.
Prior to the initiation of recordings, each participant completed one practice reading of all passages to become familiar with them. After the practice reading, each speaker read each of the reading passages three times, with the order of the passages randomized across repetitions and participants. Participants were instructed to read the passages in their habitual speech style at a comfortable vocal intensity and speaking rate. In this task, three repetitions of all words were obtained, yielding approximately 21 tokens for each vowel.
For conversation, participants were engaged in open-ended dialogues with the experimenter. Participants were encouraged to elaborate on topics of interest. Approximately five words were selected (see Appendix B for a list of criteria) from the conversational speech samples obtained during the first session and included in the citation and clear-speech tasks for the second session. On average, each participant supplied a total of approximately 10 min of conversational speech across the two experiment sessions. For each participant, an average of nine words (range = 7 to 11) that contained target vowels were included for analysis.
For citation, four repetitions of each target word and of the selected words from conversation were obtained in the carrier phrase “It's a (target word) again,” with the sentences randomized across repetitions. Again, speakers were asked to produce these at a comfortable vocal intensity and speaking rate. In this task, four repetitions of all words were obtained, yielding approximately 28 tokens for each vowel.
For clear-speech, target words of hVd form and selected words from conversation samples were elicited in the carrier phrase following procedures used in the clear-speech literature (Ferguson and Kewley-Port, 2007; Picheny et al., 1986; see also Ferguson et al., 2010). The exact instructions used to elicit a clear-speech style were: “Say these sentences as clearly as possible as if you are talking to someone who is hearing impaired.” A single production was obtained in clear-speech style with the order of sentences randomized.
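The nominal token counts quoted for the reading and citation tasks follow from the word list in Table I. The short sketch below reproduces that arithmetic; the variable and function names are chosen here for illustration only, and the counts exclude the words selected from conversation as well as any missing or discarded tokens, which is why the text describes them as approximate.

```python
# Design constants taken from the text and Table I.
N_VOWELS = 8  # four corner vowels plus four lax vowels
INITIAL_CONSONANTS = ["h", "b", "p", "d", "t", "g", "k"]  # 1 hVd + 6 CVC contexts

words_per_vowel = len(INITIAL_CONSONANTS)        # 7 target words per vowel
total_target_words = N_VOWELS * words_per_vowel  # 8 hVd + 48 CVC words


def tokens_per_vowel(repetitions: int) -> int:
    """Nominal number of tokens per vowel for a task with the given repetitions."""
    return words_per_vowel * repetitions


reading_tokens = tokens_per_vowel(3)   # reading: three repetitions
citation_tokens = tokens_per_vowel(4)  # citation: four repetitions

print(total_target_words, reading_tokens, citation_tokens)
```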
C. Data analysis
1. Measurements
Acoustic measures were made with a computer-based multiple-purpose speech analysis program, TF32 (Milenkovic, 2000). Following procedures described in Turner et al. (1995), both wide-band spectrograms and linear predictive coding analysis were used for acoustic measurements. TF32 provides a default bandwidth of 300 Hz. While this usually offered good resolution for male speakers, analysis bandwidth was adjusted as needed for individual speakers to optimize formant resolution for measurement. Words that contained the vowels of interest were first parsed from the speech samples, and the vowel nucleus, defined as the period of time from the first discernible glottal pulse to the last discernible glottal pulse (Klatt, 1975), was identified.
The measurements included raw frequency data and derived data. For the raw frequency data, the first three formant (F1, F2, F3) frequency values in Hertz (Hz) were manually corrected and obtained at the temporal midpoint of the vowel nucleus (Lindblom, 1963; Peterson and Barney, 1952; Stevens et al., 1966; Hillenbrand et al., 1995). The temporal midpoint measures in clear speech comprised the target values and served as the reference points for reduction. The derived data were the Euclidean distances of vowels in the F1–F2 and F2–F3 planes between clear-speech (reference point) and each of the other conditions.
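As a minimal sketch of the derived measure, the distance between a vowel's clear-speech reference point and its counterpart in another task can be computed in each formant plane as shown below. The function name and the dictionary layout are assumptions made for illustration. The input values are the group means for /i/ reported in Table III, so the results are not expected to match Table IV, which reports group means of per-speaker values.

```python
import math


def plane_distance(reference, other, formants):
    """Euclidean distance (Hz) between two vowel measurements in the plane
    spanned by the two named formants, e.g. ("F1", "F2")."""
    a, b = formants
    return math.hypot(other[a] - reference[a], other[b] - reference[b])


# Group-mean midpoint formant frequencies (Hz) for /i/, copied from Table III.
clear_speech = {"F1": 267, "F2": 2407, "F3": 3184}
citation = {"F1": 282, "F2": 2304, "F3": 2980}

d_f1_f2 = plane_distance(clear_speech, citation, ("F1", "F2"))
d_f2_f3 = plane_distance(clear_speech, citation, ("F2", "F3"))
print(f"F1-F2 distance: {d_f1_f2:.1f} Hz")
print(f"F2-F3 distance: {d_f2_f3:.1f} Hz")
```

Because the distance is unsigned, it captures the magnitude of deviation from the reference point but not its direction; the direction is carried by the signed formant changes.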
2. Reliability
Measurement reliability was evaluated by comparing inter-experimenter replicate measurements of 250 randomly selected words, approximately 6% of the 4319 productions obtained. Two of the three experimenters involved in this study made the reliability measures. Inter-experimenter reliability is, on logical grounds, a more conservative estimate of measurement reliability (i.e., should produce larger measurement errors) when compared to intra-experimenter reliability. Table II summarizes the mean absolute frequency differences and Pearson correlation coefficients (r) for F1, F2, and F3 for vowel midpoint measurements. These results compared well to other reliability reports in the literature, and the frequency differences were less than typically reported reduction effects (e.g., Hillenbrand et al., 1995; Monsen and Engebretson, 1983).
TABLE II.
Summary of inter-experimenter measurement reliability.
| Formant | Vowel target: mean absolute difference | Vowel target: Pearson r | Vowel onset: mean absolute difference | Vowel onset: Pearson r |
|---|---|---|---|---|
| F1 | 15 Hz | 0.99 | 40 Hz | 0.91 |
| F2 | 12 Hz | 0.99 | 42 Hz | 0.94 |
| F3 | 34 Hz | 0.94 | 42 Hz | 0.90 |

| Slice-in-time point | Mean absolute difference |
|---|---|
| Vowel temporal midpoint (target) | 6 ms |
| Onset | 16 ms |
Additional measures of vowel onset frequencies and time point comparisons were included for the reliability test because the vowel midpoint measure of primary interest required the establishment of vowel nucleus boundaries and time point identification. These results are also reported in Table II. All measures used in the statistical analysis were the original measures.
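The two reliability statistics reported in Table II can be computed from paired replicate measurements as sketched below. The function names are illustrative, and the sample values are hypothetical numbers used only to exercise the functions; they are not data from this study.

```python
import math


def mean_absolute_difference(x, y):
    """Mean of |x_i - y_i| over paired measurements."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)


def pearson_r(x, y):
    """Pearson product-moment correlation between paired measurements."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical F1 midpoint values (Hz) measured by two experimenters.
rater_a = [270, 455, 630, 720, 800]
rater_b = [280, 450, 640, 715, 810]

print(mean_absolute_difference(rater_a, rater_b))
print(round(pearson_r(rater_a, rater_b), 3))
```

The two statistics are complementary: the correlation is insensitive to a constant offset between experimenters, whereas the mean absolute difference exposes it directly in the measurement unit.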
3. Statistical analysis
The task-related vowel reduction effects were tested with generalized linear mixed model analysis using IBM® SPSS® Statistics (Version 22) (Cnaan et al., 1997). Whenever available, averages across repetitions were used for analysis. The generalized linear mixed model analyses were performed on the raw data (F1, F2, F3) and derived data (Euclidean distances in the F1–F2 plane and in the F2–F3 plane). With the study-wide alpha level held at 0.05, an alpha level of 0.01 was assigned to each F test by measure. The fixed factors were Vowel and Task. The random factor was Speaker. Vowel and Task were also defined as repeated measures. All main effects and two-way interactions were tested. Post hoc pairwise comparisons were completed using the sequential Bonferroni adjustment method and evaluated at p = 0.05. Note that the Vowel main effect was included in the analysis model primarily for the evaluation of vowel-related interactions. Although a significant Vowel main effect was expected, the individual vowel comparisons did not address the hypotheses of the present study and will not be considered in detail. Consonant context effects were not of interest in this study; the variety of consonant contexts was included and pooled for analysis.
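The alpha allocation described above amounts to dividing the study-wide level equally among the five measures (0.05 / 5 = 0.01). For the post hoc comparisons, the sketch below assumes that the sequential Bonferroni adjustment corresponds to Holm's step-down procedure. It is an illustrative reimplementation rather than the SPSS routine, the function name is hypothetical, and the p values are invented for the example.

```python
def holm_adjust(p_values):
    """Holm step-down adjusted p values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # ascending p
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Adjusted values must not decrease as the raw p values increase.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


measures = ["F1", "F2", "F3", "ED F1-F2", "ED F2-F3"]
alpha_per_measure = 0.05 / len(measures)

# Hypothetical unadjusted p values for the six pairwise comparisons
# among four tasks (not values from this study).
raw_p = [0.001, 0.012, 0.030, 0.004, 0.200, 0.045]
adj_p = holm_adjust(raw_p)
significant = [p <= 0.05 for p in adj_p]
print(alpha_per_measure)
print(significant)
```

Compared with a plain Bonferroni correction, which would multiply every p value by six here, the step-down form is less conservative for all but the smallest p value while still controlling the family-wise error rate.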
III. RESULTS
Below, results are organized following the two hypotheses of this study. First, group (across-speaker) and individual patterns of the acoustic “vowel perimeters” for eight American English vowels across tasks are evaluated. Vowel perimeters here refer to the outline connecting the F1–F2 or F2–F3 coordinates for all vowels studied. The concept of vowel perimeters is the same as the conventional vowel space, but this alternative term is used to distinguish the eight-point vowel acoustic area of interest in this study from any other vowel space computation. Second, statistical outcomes of reduction effects across clear-speech, citation, reading, and conversation tasks are presented.
A. Vowel perimeters across tasks
1. F1–F2 plane
Vowel perimeters across tasks in the F1–F2 plane are shown for all speakers in Fig. 1. The plotted data in each frame of the figure are for an individual speaker, and each data point generally represents an average across repetitions and consonant contexts as described above. For simplicity, only the phonetic symbols for the four corner vowels are given in each frame. Typically the right side of each quadrilateral shows symbols from top to bottom for /i/, /ɪ/, /ɛ/, and /æ/, respectively. On the left side of the quadrilateral, the symbols typically indicate vowels /u/, /ʊ/, /ʌ/, and /ɑ/, respectively. Plotted numbers represent the F1–F2 coordinate values for vowels extracted from conversation (see Fig. 1).
FIG. 1.
Vowel perimeters based on averaged raw frequency values of F1 and F2 are shown for each speaker across four speaking tasks. Any missing data point in the clear-speech condition is indicated by an asterisk, which marks the formant frequency values for the given vowel as an average across tasks.
Figure 1 shows a wide range of sizes and shapes of vowel perimeters across tasks within and across speakers. The clear-speech productions formed the largest vowel perimeter for all except two speakers, tgt16 and tgt19. The vowel perimeter decreased from clear-speech to citation and to reading for 6 out of 10 speakers (tgt01, tgt13, tgt14, tgt15, tgt17, tgt21). However, the decrease in the vowel perimeter was variable across speakers. Productions in conversation (plotted numbers in Fig. 1) were highly variable within and across speakers. For example, vowels produced by speakers tgt13 and tgt20 in conversation appeared to continue the apparent reduction effect relative to reading. To a lesser extent, speakers tgt16 and tgt19 also demonstrated a reduction effect in conversation. Speaker tgt01's conversation productions seemed relatively resistant to further reduction from the reading task. A similar observation can be made for the conversation productions of speakers tgt14 and tgt21, although these speakers presented with somewhat greater changes in formant frequencies when compared to tgt01.
2. F2–F3 plane
Vowel perimeters across tasks in the F2–F3 plane are shown for all speakers in Fig. 2. Overall, the pattern of data in Fig. 2 follows that in Fig. 1, with each frame of the figure showing an individual speaker. Consistent with Fig. 1, only the phonetic symbols for the four corner vowels are given in each frame, as close as possible to the clear-speech hVd points (i.e., filled triangles). However, the data points were not connected by lines given the nature of the F2–F3 plane. Typically the right side of each frame shows symbols for /i/, /ɪ/, /ɛ/, and /æ/, and the left side vowels /u/, /ʊ/, /ʌ/, and /ɑ/. Plotted numbers represent the F2–F3 coordinate values for vowels extracted from conversation (see Fig. 2).
FIG. 2.
(Color online) Vowel perimeters based on averaged raw frequency values of F2 and F3 are shown for each speaker across four speaking tasks. The dotted line was manually plotted on each graph to indicate the separation between the phonetic distinction of front and back vowels.
The F2–F3 vowel perimeter is important for the phonetic categorization of front and back vowels, as illustrated in the current data by the separation of front and back vowels primarily along the F2 axis, with a boundary value of about 1500 Hz as indicated by the dotted line in Fig. 2. The dotted line has been placed by visual inspection of the data points to illustrate the front–back separation. However, conversation productions for speakers tgt13, tgt15, and tgt21 did not yield a clear front–back separation. Also, speaker tgt19 produced vowel /ɑ/ with notable variation in both F2 and F3 frequencies across speaking tasks. Systematic changes across tasks presented as formant frequency shifts toward the front–back distinction roughly around F2 at 1500 Hz, but these task-related frequency changes were most notable for vowels /i/ and /u/ and manifested differently in magnitude across speakers.
B. Vowel reduction effects
Summary statistics for all measures are shown in Tables III and IV. Table III shows the group of speakers' means and standard deviations (in parentheses) for raw F1, F2, and F3 measures. In addition, Table III shows the magnitude and direction of formant frequency change from clear-speech to each of the other speaking tasks (Δ in Table III, also see note). Table IV outlines the group of speakers' means and standard deviations (in parentheses) for the derived Euclidean distances between clear-speech and each of the other tasks in the F1–F2 space as well as the F2–F3 space.
TABLE III.
Group means and standard deviations of formant frequencies by task and signed change from clear-speech (Δ). Note: Unit = Hertz. Values presented are group means across speaker means. One standard deviation is shown in parentheses. Δ = non-clear-speech value minus clear-speech value (e.g., for /i/, Citation F1 − Clear-speech F1 = 282 − 267 = +15).
| Vowel | Statistic | Clear-speech F1 | Clear-speech F2 | Clear-speech F3 | Citation F1 | Citation F2 | Citation F3 | Reading F1 | Reading F2 | Reading F3 | Conversation F1 | Conversation F2 | Conversation F3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| /i/ | Mean | 267 | 2407 | 3184 | 282 | 2304 | 2980 | 301 | 2242 | 2848 | 341 | 2160 | 2692 |
| | (SD) | (28) | (235) | (278) | (20) | (190) | (173) | (19) | (142) | (117) | (21) | (161) | (69) |
| | Δ | | | | +15 | −103 | −204 | +34 | −165 | −335 | +74 | −247 | −491 |
| /ɪ/ | Mean | 449 | 1912 | 2642 | 459 | 1853 | 2594 | 451 | 1811 | 2552 | 471 | 1676 | 2505 |
| | (SD) | (47) | (96) | (119) | (36) | (88) | (87) | (29) | (93) | (85) | (43) | (88) | (65) |
| | Δ | | | | +10 | −59 | −47 | +2 | −101 | −90 | +22 | −236 | −137 |
| /ɛ/ | Mean | 633 | 1737 | 2627 | 613 | 1661 | 2555 | 606 | 1641 | 2532 | 520 | 1681 | 2481 |
| | (SD) | (63) | (99) | (112) | (62) | (63) | (90) | (54) | (80) | (84) | (85) | (96) | (111) |
| | Δ | | | | −20 | −77 | −71 | −28 | −96 | −94 | −113 | −56 | −145 |
| /æ/ | Mean | 721 | 1711 | 2537 | 674 | 1725 | 2502 | 620 | 1712 | 2517 | 628 | 1629 | 2501 |
| | (SD) | (63) | (89) | (124) | (65) | (77) | (98) | (61) | (84) | (98) | (73) | (90) | (84) |
| | Δ | | | | −48 | +13 | −35 | −101 | +1 | −19 | −93 | −83 | −36 |
| /ɑ/ | Mean | 807 | 1343 | 2538 | 767 | 1363 | 2551 | 726 | 1365 | 2503 | 663 | 1435 | 2465 |
| | (SD) | (81) | (76) | (149) | (76) | (114) | (148) | (58) | (41) | (101) | (96) | (86) | (163) |
| | Δ | | | | −40 | +20 | +14 | −81 | +22 | −34 | −144 | +92 | −73 |
| /ʊ/ | Mean | 487 | 1322 | 2394 | 477 | 1341 | 2380 | 472 | 1363 | 2366 | 481 | 1563 | 2304 |
| | (SD) | (58) | (115) | (142) | (39) | (108) | (120) | (36) | (93) | (114) | (50) | (57) | (152) |
| | Δ | | | | −9 | +20 | −13 | −14 | +42 | −28 | −6 | +242 | −89 |
| /u/ | Mean | 293 | 969 | 2331 | 314 | 1144 | 2301 | 339 | 1201 | 2276 | 358 | 1345 | 2344 |
| | (SD) | (33) | (152) | (176) | (22) | (176) | (179) | (21) | (132) | (166) | (n/a) | (n/a) | (n/a) |
| | Δ | | | | +21 | +175 | −29 | +46 | +232 | −55 | +65 | +377 | +13 |
| /ʌ/ | Mean | 665 | 1255 | 2601 | 638 | 1345 | 2512 | 638 | 1349 | 2493 | 551 | 1334 | 2447 |
| | (SD) | (51) | (98) | (146) | (71) | (68) | (99) | (54) | (80) | (91) | (39.13) | (148) | (80) |
| | Δ | | | | −27 | +90 | −89 | −27 | +94 | −108 | −114 | +79 | −154 |
TABLE IV.
Group summary of between-task Euclidean distances. Note: Unit = Hertz. Values presented are group means across speaker means. One standard deviation is shown in parentheses.
| Vowel (clear-speech vs.) | F1–F2 plane: Citation | F1–F2 plane: Reading | F1–F2 plane: Conversation | F2–F3 plane: Citation | F2–F3 plane: Reading | F2–F3 plane: Conversation |
|---|---|---|---|---|---|---|
| /i/ | 126.73 (76.13) | 155.60 (80.74) | 441.33 (101.76) | 338.84 (328.34) | 421.09 (80.74) | 644.14 (127.62) |
| /ɪ/ | 100.25 (43.23) | 150.20 (82.65) | 246.34 (129.90) | 139.95 (52.06) | 189.42 (73.22) | 307.59 (117.75) |
| /ɛ/ | 108.28 (48.43) | 144.54 (72.06) | 189.56 (71.63) | 152.04 (56.77) | 191.10 (52.82) | 243.22 (77.63) |
| /æ/ | 87.19 (34.20) | 154.57 (48.90) | 187.84 (84.56) | 95.80 (32.90) | 143.08 (67.88) | 200.26 (67.39) |
| /ɑ/ | 123.27 (136.16) | 122.82 (48.04) | 219.54 (58.90) | 156.13 (156.97) | 137.63 (66.49) | 211.56 (85.34) |
| /ʊ/ | 138.92 (63.63) | 183.00 (100.44) | 288.23 (109.37) | 169.78 (66.65) | 194.51 (94.88) | 335.45 (116.27) |
| /u/ | 190.73 (112.48) | 247.67 (113.24) | 612.44 (n/a) | 228.84 (117.14) | 276.75 (125.98) | 604.56 (n/a) |
| /ʌ/ | 106.59 (38.51) | 134.52 (98.04) | 188.86 (101.84) | 161.24 (72.72) | 178.26 (90.55) | 252.98 (113.34) |
F test results are reported in Table V. All effects were significant except the Task main effect for F2 and the interaction effect for the F2–F3 Euclidean distance. As discussed earlier, the Vowel main effect was expected for the productions by healthy speakers in this study and was an important inclusion in the analysis model for the potential Task by Vowel interaction effects.
TABLE V.
Summary of generalized linear mixed model analyses. Note: F tests were evaluated at p = 0.01. Significant tests are marked by asterisks and boldface.
| Measure | Vowel | Task | Vowel × Task |
|---|---|---|---|
| F1 | F(7, 266) = 443.35, p < 0.001* | F(3, 266) = 12.69, p < 0.001* | F(21, 266) = 7.48, p < 0.001* |
| F2 | F(7, 266) = 343.74, p < 0.001* | F(3, 266) = 0.20, p = 0.895 | F(21, 266) = 8.50, p < 0.001* |
| F3 | F(7, 266) = 49.69, p < 0.001* | F(3, 266) = 20.71, p < 0.001* | F(21, 266) = 2.58, p < 0.001* |
| Euclidean distance F1–F2 plane | F(7, 191) = 16.09, p < 0.001* | F(2, 191) = 72.21, p < 0.001* | F(14, 191) = 3.85, p < 0.001* |
| Euclidean distance F2–F3 plane | F(7, 191) = 13.26, p < 0.001* | F(2, 191) = 28.97, p < 0.001* | F(14, 191) = 1.30, p = 0.209 |
The formant frequency changes associated with the Task effect were overall in the hypothesized direction from clear-speech to citation, to reading, and to conversation. Figure 3 summarizes the mean raw formant frequency values by task, across all speakers and vowel contexts. The three panels in the figure correspond to the group data for F1, F2, and F3 from top to bottom. Significant pairwise comparisons are also reported in Fig. 3, with the pairs of comparisons indicated by brackets. F1 changes were in the hypothesized direction, with the most extreme F1 values occurring in clear-speech and conversation, depending on the vowel (Fig. 3, top panel). F1 values were not significantly different for two comparison pairs: clear-speech versus citation and citation versus reading. In contrast, F2 remained relatively consistent across tasks (Fig. 3, middle panel). As for F3, all pairwise comparisons were statistically significant (Fig. 3, bottom panel). In clear-speech the F3 value was the greatest, followed by citation, and then reading. F3 was most reduced in the conversation task.
FIG. 3.
Pairwise comparisons for raw formant frequency measures across tasks are illustrated. Each panel corresponds to a formant measure. Each data point (filled circle) represents the estimated mean from the statistical analysis, with the vertical bars indicating one standard error in each direction. The brackets point to the significant pairs of comparisons at p = 0.05.
The Euclidean distances characterized the magnitude of formant frequency deviation from clear-speech to each of the other tasks, illustrated in Fig. 4 in the form of bar graphs. The top panel shows the Euclidean distances in the F1–F2 plane, and the bottom the F2–F3 plane. It is clear from Fig. 4 that the distance from clear-speech became greater across tasks (citation, reading, and conversation). This pattern is supported by the significant comparisons except for the F2–F3 Euclidean distances from clear-speech for citation and reading tasks (Fig. 4, bottom panel).
FIG. 4.
Pairwise comparisons for Euclidean distances across tasks are illustrated. The top panel shows the F1–F2 plane data points, and the bottom panel shows the F2–F3 plane. The bars illustrate the Euclidean distances between clear-speech and each of the other defined tasks. One standard error is marked at the end of each bar as a capped line. The brackets point to the significant pairs of comparisons at p = 0.05.
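As an illustration of the distance measure summarized in Fig. 4, the following Python sketch computes the Euclidean distance in the F1–F2 plane between a clear-speech reference point and the corresponding point in each of the other tasks. The formant values are hypothetical and are given in Hz as an assumption; the helper name `euclidean_distance` is likewise illustrative and is not taken from the analysis procedures of this study.

```python
import math


def euclidean_distance(ref, other):
    """Straight-line distance between two points in a formant plane.

    Each point is a pair such as (F1, F2); the result has the same
    unit as the inputs (Hz is assumed here).
    """
    return math.hypot(other[0] - ref[0], other[1] - ref[1])


# Hypothetical mean (F1, F2) values in Hz for one vowel; not study data.
clear_speech = (300.0, 2300.0)
tasks = {
    "citation": (310.0, 2260.0),
    "reading": (330.0, 2180.0),
    "conversation": (360.0, 2050.0),
}

# Distance of each task from the clear-speech reference point.
distances = {
    task: euclidean_distance(clear_speech, formants)
    for task, formants in tasks.items()
}

for task, distance in distances.items():
    print(f"{task}: {distance:.1f} Hz")
```

The same function applies to the F2–F3 plane when (F2, F3) pairs are supplied instead of (F1, F2) pairs.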
Finally, the statistical analysis yielded significant Task by Vowel interactions for all formant frequency measures and Euclidean distance in the F1–F2 plane (Fig. 5 and Table IV). As illustrated in Fig. 5, the interactions were associated with the varying degrees of cross-task changes by vowel. It is notable that for F1 and F2, the Task effects were, at least to some extent, related to the conventional concepts of phonetic features. For F1 (Fig. 5, top left), the patterns of change across tasks were similar for vowel pairs that share the phonetic feature of tongue height. This can be seen for vowel pairs /i/ and /u/, /ɪ/ and /ʊ/, and /ɛ/ and /ʌ/. The interaction effect of F2 (Fig. 5, top right) appeared to be associated with tongue advancement. Front vowels /i/, /ɪ/, and /æ/ presented with similar cross-task changes while back vowels /ɑ/, /ʊ/, and /u/ shared a different direction of cross-task changes. Importantly, the effects of task were greater for some vowels than others.
FIG. 5.
Task-by-Vowel interactions are illustrated for measures that yielded significant interaction effects. Each panel corresponds to a measure.
IV. DISCUSSION
This study sought to characterize within-speaker vowel reduction across tasks as a means to evaluate the ranges of variation in speech production among healthy individuals. Two hypotheses were addressed. The first hypothesis stated that overall patterns of cross-task reduction for a relatively homogeneous group of speakers would be comparable, with fine-grained speaker-specific effects. The second hypothesis stated that vowel reduction across tasks should be systematic in the sense that production changes should reflect task characteristics associated with different demands on the speech mechanism. Current findings are consistent with these two hypotheses. The descriptive evaluation of vowel perimeters and statistical analysis of reduction effects both demonstrated that within-speaker changes in vowel production followed the theoretical concept that speech tasks could be organized by task demands (Lindblom, 1990). Notable individual differences were seen in the vowel spaces across tasks, though the directions of reduction were largely similar among speakers. These results have important theoretical implications for our understanding of speech production in healthy individuals. Moreover, data presented here have significant implications for the interpretation of speaker group differences, particularly in studies that involve disordered populations.
Substantial within-speaker variations across tasks were present, manifested as unique changes in vowel perimeter areas and shapes (Figs. 1 and 2). For example, for the F1–F2 perimeters (Fig. 1), the majority of speakers produced vowel perimeters in clear-speech, citation, and reading tasks that generally conform to a quadrilateral shape. However, it is clear that the vowel perimeters are more complicated than a quadrilateral description. Speaker tgt13, for example, produced a more triangular shape in the clear-speech task when compared to other tasks, and speaker tgt19's productions of /ɑ/ contributed to a quadrilateral that was skewed toward a triangle especially in the citation task. Furthermore, it is noteworthy that these changes can occur along any dimension. Again, using the F1–F2 perimeter as an example (Fig. 1), some speakers presented with overall changes along both F1 and F2 (e.g., tgt13, tgt20). Some speakers presented with greater changes along F1 than F2 (e.g., tgt01, tgt14, tgt17). In several cases the changes were more confined to specific vowels, or categories of vowels (e.g., /æ/ for tgt16, low vowels for tgt18). In addition, it is evident that the vowel perimeter for the conversation task was highly variable. In many cases, it would not be possible to calculate the conventional vowel space area measures due to the irregular shape and the reduced, or compressed, space produced in conversation.
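For reference, a conventional quadrilateral vowel space area can be computed from corner-vowel coordinates with the shoelace formula, as in the Python sketch below. The corner values are hypothetical (F2, F1) pairs in Hz, and the function name `polygon_area` is illustrative. The formula assumes that the vertices are listed in order around a perimeter that does not cross itself, which is the assumption that irregular conversational perimeters may violate.

```python
def polygon_area(vertices):
    """Area of a simple polygon via the shoelace formula.

    `vertices` is a list of (x, y) pairs ordered around the perimeter.
    The result is only meaningful if the perimeter does not cross itself.
    """
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]  # wrap around to close the polygon
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


# Hypothetical corner-vowel means as (F2, F1) in Hz; not study data.
corners = [
    (2300.0, 300.0),  # /i/
    (1800.0, 700.0),  # /æ/
    (1100.0, 750.0),  # /ɑ/
    (900.0, 320.0),   # /u/
]

area = polygon_area(corners)
print(f"Vowel space area: {area:.0f} Hz^2")
```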
Data here showcase at least a portion of the range of possible normal, healthy behaviors of speech production as indexed by acoustics. To further illustrate this, vowel perimeters across tasks in the F1–F2 plane constructed from the mean values (Table III) are shown in Fig. 6(A), labeled “Present Study” (top left). The plots follow the figure formats used in Fig. 1 (see captions and legends, Figs. 1 and 6). The vowel perimeter range from clear-speech to conversation is marked by the shaded area. For comparison, vowel data from male speakers in Peterson and Barney (1952) were plotted over the present study's cross-task vowel perimeter range in Fig. 6(B) (top right). Note that the shaded area is exactly the same as in Fig. 6(A), but the shading was muted in Fig. 6(A) to facilitate visual contrast. The remainder of Fig. 6 will be discussed in the following paragraph.
FIG. 6.
Vowel perimeters in the F1–F2 plane constructed from the mean data (Table III) are shown with vowel spaces constructed from values reported in three other studies on healthy speakers and speakers with dysarthria for comparison. The shaded area in each panel presents the region of vowel space change from clear-speech to conversation in the present study. The shaded area is muted in the top left panel (Present Study) only to facilitate visual contrast. Mean vowel perimeters across tasks for Present Study are plotted following the format in Fig. 1: clear-speech = triangles, citation = squares, reading = circles, and conversation = numbers. Only habitual data are plotted for Turner et al., 1995, and data plotted from Skodda et al., 2012 are values reported on the first visit of the study.
A comparison of Figs. 6(A) and 6(B) suggests that most experimental tasks used for speech sampling only capture a part of the range of behaviors. In this study, natural changes in production were elicited with varying speaking tasks, and the design of the tasks closely followed the hypothesized Hypo- to Hyper-speech continuum associated with varying communicative demands on the speech mechanism (Lindblom, 1990). The question then arises, “What are the limits of the range of healthy productions?” The clear-speech productions were adopted as the reference points, or targets, in this study. Nevertheless, how close are these productions to a given speaker's most canonical and distinct vowel forms? On the other hand, how representative are the conversational speech productions of a given speaker's most casual vowel forms that still sustain the function of communication? Importantly, how do variations in healthy productions inform us about productions associated with disorders affecting speech motor control (e.g., dysarthria)?
To explore the question that pertains to the role of a healthy range of variation and impairment of speech motor control, select data from two studies on dysarthria were added to Figs. 6(C) and 6(D). Figure 6(C) (bottom left) shows quadrilateral vowel space data from Turner et al. (1995), which examined the effects of speaking rate on vowel production and perceptual outcomes for healthy speakers and speakers with amyotrophic lateral sclerosis (ALS). Only male speakers' data were plotted. Again, the shaded area indicates the mean range of productions from the present study. Figure 6(D) (bottom right) shows data for the triangular vowel space from Skodda et al. (2012), which evaluated the use of vowel space metrics to index disease progression in idiopathic Parkinson's disease (PD). For the exploratory purpose here, only data from the first visit reported by Skodda et al. (2012) were plotted. With the additional panels [Figs. 6(C) and 6(D)], Fig. 6 offers three important insights. First, as discussed above, Fig. 6 suggests that variation in behavior is typically not accounted for and should be considered. In Weismer et al. (2001), variation in vowel space for healthy speakers, speakers with ALS, and speakers with PD was examined and discussed, but such consideration of variation, especially within-speaker variation, is scarce in the literature.
Second, it is notable that productions from the PD speakers in Skodda et al. (2012) fell within the healthy range from this study. Healthy controls' productions were well within the present study's range of variation and were close to the PD productions. In comparison, productions from the ALS speakers in Turner et al. (1995) were just outside or on the edge of the range, falling close to the conversational speech vowel perimeter in this study. The healthy controls in Turner et al. (1995) produced corner vowels that were in the present study's range of variation and were apart from the ALS speakers' productions except for /u/. It must be acknowledged that the young healthy speakers in this study are not the appropriate control group for either of the dysarthria groups in Skodda et al. (2012) or Turner et al. (1995). Nonetheless, young and healthy systems may arguably be considered to be the closest to optimal systems. According to this hypothesis, one would expect the productions in the present study to be quite different from those associated with dysarthria. Figure 6 illustrates that this may not be the case. Even though very little data have been included in this exploratory examination, the presenting patterns suggest that caution should be exercised in the interpretation of population differences based primarily on group average data. Results here also indicate the need for the development of reference data for the range of healthy productions. Most importantly, the use and implications of conventional vowel space metrics as indices for speech motor deficits should be further examined.
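The visual judgment of whether a mean production from another study falls inside or outside the present study's range could be approximated numerically with a point-in-polygon test. The Python sketch below uses the standard ray-casting method. The perimeter and test points are hypothetical values, not data from this study, Turner et al. (1995), or Skodda et al. (2012), and as a simplifying assumption only a single outer perimeter is tested rather than the shaded region between the clear-speech and conversation perimeters.

```python
def point_in_polygon(point, polygon):
    """Ray-casting test: True if `point` lies inside `polygon`.

    `polygon` is a list of (x, y) vertices ordered around a simple
    (non-self-intersecting) perimeter. Points exactly on an edge are
    not handled specially in this sketch.
    """
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Consider only edges that straddle the horizontal line through y.
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical outer perimeter as (F2, F1) pairs in Hz; not study data.
outer_perimeter = [
    (2300.0, 300.0),  # /i/
    (1800.0, 700.0),  # /æ/
    (1100.0, 750.0),  # /ɑ/
    (900.0, 320.0),   # /u/
]

central_point = (1500.0, 500.0)     # hypothetical centralized production
peripheral_point = (2400.0, 280.0)  # hypothetical production beyond /i/

print(point_in_polygon(central_point, outer_perimeter))
print(point_in_polygon(peripheral_point, outer_perimeter))
```

A fuller treatment would apply the test to both the outer and inner perimeters and would consider distances to the perimeter edges, because a mean value lying close to an edge, as described for the ALS data, is not captured by a binary inside/outside result.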
Finally, Fig. 6 calls for increased attention to conversational speech, or to tasks that are more ecologically valid than traditional laboratory speech tasks. Assuming that conversational speech is the speech task, speaking mode, or style through which day-to-day communication is accomplished, current results suggest that much more emphasis should be put on studying conversational speech because it may yield very different characteristics compared to other tasks. Related to this, as mentioned above in the discussion of population differences, vowels produced by speakers with dysarthria in the two studies referenced here were not drastically different from the conversational productions of healthy speakers. This poses important theoretical and practical questions about the mechanism of speech motor control and the identification of abnormal productions.
This work points to a number of directions for future investigations. As discussed in earlier paragraphs, further emphasis on variation as a desirable characteristic in the speech motor control mechanism is important. Including more conversational speech (or speech produced in more natural tasks) would strengthen our current knowledge about production. Additionally, continuing work in characterizing the variation, or flexibility, of speech behaviors would benefit from the inclusion of a perceptual component. Contemporary models of speech motor control have proposed that the goals of speech motor control are flexible and dynamic (e.g., Guenther et al., 1998). A combination of production data and perceptual experiments would provide the platform for testing these theoretical concepts.
The present study has several limitations. The study included a limited number of speakers given the task and measurement demands. A larger sample size could potentially provide the opportunity for establishing cross-task variation profiles among healthy speakers. In addition, a relatively homogeneous group of speakers in terms of demographics and dialect was studied. It would be of interest to study speakers across the life span and speakers of different dialects. Related to this, sociolinguistic factors should be considered further. As discussed earlier, the task-variation approach in this study was associated with the assumption that sociolinguistic effects were an integral part of speech production. Nevertheless, future work may benefit from a more detailed account of sociolinguistic variables. While a variety of consonant contexts were used to elicit vowel productions, the specific consonant contexts were not examined in detail because doing so was not among the purposes of the study. Despite the existing literature on consonant context effects, a more detailed analysis of the potential interaction between task effects and consonant contexts is warranted.
As for the task comparisons, far fewer vowel productions from conversation than from any other task were included in the analysis. For the purposes of the present study, this was due to the need for experimental control. Future work that includes more conversation productions would be important.
ACKNOWLEDGMENTS
This work was supported in part by NIDCD Award No. R01 DC003723 and a Friends of the Waisman Center Grant. The authors thank Lesley Alexander and Phoebe Allen for their assistance with data collection and Reid Durtschi for assistance with manuscript preparation.
APPENDIX A: READING PASSAGES (NUMBERING OF PASSAGES IS ARBITRARY. TARGET WORDS THAT CONTAIN VOCALIC SEGMENTS OF INTEREST ARE SHOWN IN BOLD)
(1)
The Hoyt Aquarium and Zoo Park had a special exhibition featuring tropical lives. The hoot of the great horned owl could be heard meters away. Flowers of different colors surrounded the information booth where maps and guides could be picked up. In the garden with these flowers, a gardener hoed the soil to make it loose and good for new growth. A hut around the entrance marked the beginning of an adventure. Many children jumped up and down in excitement. There was a head of a fake King Kong on one of the man-made hills where monkeys rested and watched people. Next to the monkeys was the famous red panda. The panda had an itch on its leg and was rubbing against a small bush. It would hide around the bush, however, when too many people stood around. There were also a wide variety of sea creatures at the exhibition. The aquarium was home to thirteen sharks along with other smaller fish. The aquarium keeper explained the habitat of sharks to everyone. One shark hid behind some seaweed and devoured the food. A child asked, “How did the shark eat so fast?” Following the heat to the north side of the exhibition, one could find the “Paradise of Birds.” The hot air was appealing to the tropical birds, said the self-guided tour. The tour notes said that it could be as hot as being under the hood of a running car and the birds would still like it. A couple of stunning toucan birds flew across the palm trees several times. “They hate to be watched closely,” said the bird specialist. The wide variety of items at the exhibition, not just the featured tropical animals, but also the information sites and games for the youngsters attracted visitors of all ages. It was definitely a fun and educational day at the park for all. (314 words)
(2)
Meg just had a baby boy a month ago. She and her husband, Joe, held a small gathering for family and friends to celebrate the newborn. They had to change the date a couple of times because Joe had an unexpected business trip. He was in England near the River Goyt for a week and then traveled to Germany and France for another five days. Joe tied some baby blue ribbons around the window panel for decoration. Meg made a kite of rainbow colors to put on the ceiling in the baby room. Their two-and-half-year-old daughter, Kiara, was more excited about the chocolate cake than anything. She had trouble saying the sound “k,” so she said, “I want ‘tate,’” as she walked and skipped around the kitchen counter. “We will have the cake when our guests are here,” said Meg. Kiara was not happy. “Don't pout like that, sweetie; see, Simba is ready to play,” Meg picked up her stuffed animal lion and tickled her. Kiara got this unique pied lion as a Christmas gift from her grandparents. The guests arrived soon. They brought goat cheese, crackers, fruit salad, and many more to share. Everyone cooed over the baby pictures when Ryan was still napping. The kid from next door came over to play with Kiara and her cousins. Joe and a couple of friends put the movie “Tit for Tat” on for some laughter. It featured the tit-for-tat fighting routines of Stan Laurel and Oliver Hardy. The cake was a big hit. Kiara's first bite of the cake was so big that she barely managed to keep her mouth closed. She looked as if she had a coat of frosting on her cheeks because it got all over her face. Kiara laughed, and so did the adults. Her cousin could not finish his piece and asked his mom to help. It was a fun time, and Ryan seemed to have enjoyed the celebration for him too. (333 words)
(3)
Kate went out for some leisure shopping early in the morning. The Doyd Craft and Fabric store had a special weekend sale. The parking lot was already crowded when she arrived. A cut-by-the-yard cotton and patterned fabric was 60% off. Kate got some dress fabric with the bargain deal. One piece had Polka dot pattern in bright red and white and looked very festive and fun. She paid for some sewing and jewelry-making supplies also. Kate went home and toyed with some small pieces of fabric and buttons. She had a bit of time before a tea party with friends. She worked on a tote that was falling apart at the bottom and on the edges. She sewed a cotton pad on the inside of the tote to reinforce it. She then dyed the bag into indigo with a natural fabric dye mix. The dye mix was boiled in a pot the night before and allowed to cool for use on the next day. While waiting for the tote to dry, Kate worked on restyling a dollar-store metal hair accessory. She glued a ceramic bead of a bright yellow color onto the ribbon bow. It looked like a rose bud that was ready to bloom into a gorgeous flower. The addition did the trick and accentuated the piece. It looked good with the light yellow summer dress Kate was wearing. Kate also started a crochet project. It was a crocheted robot that crafters called a “cro-bot” for the new crochet class she just began going to. It was quite a fun and productive Sunday morning. For a crafter like Kate, nothing is better than exercising creativity in various functional ways. (286 words)
(4)
It started when Rob noticed swelling and pain in his right big toe one night when he was on vacation in Hawaii. His right foot would not fit in a boot that had a wide front and relatively soft insole made of memory foam material. Rob rested in bed because the great pain and swelling prevented him from standing on two feet. He had a big poi dinner several hours ago and wondered if it had to do with the amount of poi and fish he had. The feast was put together by a Russian chef. A one-pood salmon was first dipped in Russian vodka and then grilled with banana leave wraps. It was served with a special Russian-Hawaiian white sauce. Rob had more than three pieces of salmon. He took some pain reliever earlier, but it was not very helpful. He was worried that his toe got infected when he went to the farm the previous day. He saw a sheep with what is called a ked itch caused by a type of sheep fly. Rob got his note book to jot some notes down about the pain and swelling and some other things too. The old pen by the bed was a dud with a broken cap and little ink left, but Rob managed to use it. Fred was Rob's geek friend who came to see how he was doing and brought him some water. The other guys were playing the Wii game, “The World of Goo,” together on a big screen high definition TV in the game and lounge room. Fred did not care for the game. He made a gesture as if he was going to poke on Rob's toe and jokingly said, “Maybe you had too much to drink tonight, pal.” “I'll just rest for a while. Have fun at the gig on the east end of the beach tonight and don't drink too much yourself,” Rob responded. They were going to a performance by the Dould Family Folk and Rock Band that only played every other Thursday night. Their tour guide came to check in with Rob and told Fred the group was heading toward the beach. Later, Rob learned that he had Gout, a disease known as “the Disease of Kings” or the “rich man's disease” that was dated as far back as 2600 B.C. (393 words)
(5)
Pete started playing golf during his sophomore year in college. At the time his friends cowed him into playing golf instead of rugby because Pete was never the aggressive type. Pete did very well when he first gave golf a try. His gut feeling was that he could become very good at it. “The dude was right,” said his long-time coach who became his best friend. Mr. Coyd said of the many amateur golf players he had worked with, Pete was one of the best. The entrance to Pete's golf club had a solar-powered gate that was decorated with seasonal flowers. The course had a tradition to toot a brass horn during a tournament, big or small, for the player who won each hole. Pete teed off in front of a crowd mostly of family and friends when he played the finishing hole during a community-based competition. He was leading. The player behind him put in some time trying to get a “fried egg” out of the sand bunker. The crowd would dote on Pete's techniques and his ability to make birdies throughout the competition. Pete won the competition to get a special prize from the golf club. Pete did a good deed and gave half the money he won to a local charity center. His dad was very proud of him and gave him a big hug. “I never had doubt in Pete about anything because he is a hard worker who does not quit,” said Pete's father to those around. Pete bowed with a big smile in response to his family and friends' applause. A local news reporter asked Pete to name the strategy that keyed his victory during a small press conference arranged by the golf club. Pete said it was a genuine interest in the sport. As enthusiastic and hard-working as Pete was, he probably would have been very good at whatever he chose to participate in. (325 words)
(6)
The Boyd brothers go on a summer fishing trip every year. It is a family tradition to them. Ted is the oldest and tallest of the three who always does the driving. Tad is the middle child and the quiet one who gets the big fish most of the time. Todd is the little brother who got all the attention growing up because of his red hair and his loud laughs. This year they were bringing Ted's older son, five-year-old Kevin, for his first fishing trip. Cod is Kevin's favorite fish to eat, and he already went fishing a couple times with his dad. The weather was bad when they left for the cottage that they have rented for years. It cleared up nicely once they got there. The Putt-putt miniature golf course about five minutes down the road from their cottage had been around since the brothers' first trip. They played a card game called “pit” at the dining table that night. They took the boat out on the lake early the next morning and had some success. A fish took the bait and tried to swim off by going underneath the motorboat. Ted quickly pulled it up. A dead bug fell out of the fish's mouth when they put it into the water bucket. Ted encouraged Kevin to tout his fishing ability to anyone who would listen. Kevin's dad and uncles were happy to see that he was enthusiastic about it. Buck the pet dog was sitting by the edge of the boat, enjoying both the sunshine and the light breeze. Tad asked Todd, “Have you seen the King Tut exhibition?” being reminded of a trip he had taken some years ago down the Nile River in Egypt. Todd responded, “Gad no, but I'd really like to see it. I understand the boy king was somewhat of a cad but otherwise was a good leader.” The brothers exchanged some more traveling stories. Ted looked around and checked his compass to make sure the wind hadn't blown them too far from the fishing water. (348 words)
APPENDIX B: WORD SELECTION CRITERIA FOR CONVERSATIONAL SPEECH SAMPLES
(1) Words containing vowels of interest had priority for selection.
(2) Monosyllabic words were preferred over multisyllabic words.
(3) Consonant contexts resembling those described in Table I had priority for selection.
(4) Words containing liquids, glides, and nasals were avoided to the extent possible.
Pilot findings of this work were presented at the 158th meeting of the Acoustical Society of America, San Antonio, TX, U.S.A., November, 2009.
References
- 1. Blake, R., and Josey, M. (2003). “The /ay/ diphthong in a Martha's Vineyard community: What can we say 40 years after Labov?,” Lang. Soc. 32(4), 451–485. doi:10.1017/S0047404503324017
- 2. Bradlow, A. R., Torretta, G. M., and Pisoni, D. B. (1996). “Intelligibility of normal speech I: Global and fine-grained acoustic-phonetic talker characteristics,” Speech Commun. 20(3), 255–272. doi:10.1016/S0167-6393(96)00063-5
- 3. Bunton, K., and Leddy, M. (2011). “An evaluation of articulatory working space area in vowel production of adults with Down syndrome,” Clin. Linguist. Phon. 25(4), 321–334. doi:10.3109/02699206.2010.535647
- 4. Byrd, D. (1994). “Relations of sex and dialect to reduction,” Speech Commun. 15, 39–54. doi:10.1016/0167-6393(94)90039-6
- 5. Chung, H., Kong, E. J., Edwards, J., Weismer, G., and Fourakis, M. (2012). “Cross-linguistic studies of children's and adults' vowel spaces,” J. Acoust. Soc. Am. 131(1), 442–454. doi:10.1121/1.3651823
- 6. Clopper, C. G., Pisoni, D. B., and de Jong, K. (2005). “Acoustic characteristics of the vowel systems of six regional varieties of American English,” J. Acoust. Soc. Am. 118(3), 1661–1676. doi:10.1121/1.2000774
- 7. Cnaan, A., Laird, N. M., and Slasor, P. (1997). “Tutorial in biostatistics: Using the general linear mixed model to analyse unbalanced repeated measures and longitudinal data,” Stat. Med. 16, 2349–2380.
- 8. Crystal, T. H., and House, A. S. (1982). “Segmental durations in connected speech signals: Preliminary results,” J. Acoust. Soc. Am. 72(3), 705–716. doi:10.1121/1.388251
- 9. Duffy, J. R. (2013). Motor Speech Disorders: Substrates, Differential Diagnosis, and Management, 3rd ed. (Mosby, St. Louis, Missouri), pp. 1–512.
- 10. Ferguson, S. H., and Kewley-Port, D. (2007). “Talker differences in clear and conversational speech: Acoustic characteristics of vowels,” J. Speech Lang. Hear. Res. 50, 1241–1255. doi:10.1044/1092-4388(2007/087)
- 11. Ferguson, S. H., Poore, M. A., and Shrivastav, R. (2010). “Acoustic correlates of reported clear speech strategies,” J. Acad. Rehab. Audiol. 43, 45–64.
- 12. Flipsen, P., and Lee, S. (2012). “Reference data for the American English acoustic vowel space,” Clin. Linguist. Phon. 26(11–12), 926–933. doi:10.3109/02699206.2012.720634
- 13. Fourakis, M. (1991). “Tempo, stress, and vowel reduction in American English,” J. Acoust. Soc. Am. 90(4), 1816–1827. doi:10.1121/1.401662
- 14. Gay, T. (1978). “Effect of speaking rate on vowel formant movements,” J. Acoust. Soc. Am. 63(6), 223–230. doi:10.1121/1.381717
- 15. Guenther, F. H., Hampson, M., and Johnson, D. (1998). “A theoretical investigation of reference frames for the planning of speech movements,” Psychol. Rev. 105(4), 611–633. doi:10.1037/0033-295X.105.4.611-633
- 16. Hillenbrand, J., Getty, L. A., Clark, M. J., and Wheeler, K. (1995). “Acoustic characteristics of American English vowels,” J. Acoust. Soc. Am. 97(5), 3099–3111. doi:10.1121/1.411872
- 17. Hillenbrand, J. M., Clark, M. J., and Nearey, T. M. (2001). “Effects of consonant environment on vowel formant patterns,” J. Acoust. Soc. Am. 109(2), 748–763. doi:10.1121/1.1337959
- 18. Jacewicz, E., Fox, R. A., and Salmons, J. (2011). “Vowel change across three age groups of speakers in three regional varieties of American English,” J. Phonetics 39(4), 683–693. doi:10.1016/j.wocn.2011.07.003
- 19. Johnson, K., Flemming, E., and Wright, R. (1993). “The hyperspace effect: Phonetic targets are hyperarticulated,” Language 69(3), 505–528. doi:10.2307/416697
- 20. Karlsson, F. , and van Doorn, J. (2012). “ J. Vowel formant dispersion as a measure of articulation proficiency,” J. Acoust. Soc. Am. 132(4), 2633–2641. 10.1121/1.4746025 [DOI] [PubMed] [Google Scholar]
- 21. Kent, R. D. , and Kim Y. J. (2003). “ Toward an acoustic typology of motor speech disorders,” Clin. Linguist. Phon. 17(6), 427–445. 10.1080/0269920031000086248 [DOI] [PubMed] [Google Scholar]
- 22. Klatt, D. (1975). “ Voice onset time, frication, and aspiration in word-initial consonant clusters,” J. Speech Hear. Res. 18, 686–706. 10.1044/jshr.1804.686 [DOI] [PubMed] [Google Scholar]
- 23. Koopmans-van Beinum, F. J. (1980). “ Vowel Contrast Reduction, an acoustic and perceptual study of Dutch vowels in various speech conditions,” Ph.D. thesis, University of Amsterdam, The Netherlands, Academische pers B. V., Amsterdam, pp. 1–154. [Google Scholar]
- 24. Krause, J. C. , and Braida, L. D. (2004). “ Acoustic properties of naturally produced clear speech at normal speaking rates,” J. Acoust. Soc. Am. 115(1), 362–378. 10.1121/1.1635842 [DOI] [PubMed] [Google Scholar]
- 25. Labov, W. (1972). “ The study of language in its social context,” in Sociolinguistic Patterns, edited by Labov W. ( University of Pennsylvania Press, Philadelphia, PA: ), pp. 183–259. [Google Scholar]
- 26. Ladd, R. D. (1988). “ Declination‘reset’ and the hierarchical organization of utterances,” J. Acoust. Soc. Am. 84(2), 530–544. 10.1121/1.396830 [DOI] [Google Scholar]
- 27. Lansford, K. L., and Liss, J. M. (2014). “Vowel acoustics in dysarthria: Mapping to perception,” J. Speech Lang. Hear. Res. 57(1), 68–80. doi:10.1044/1092-4388(2013/12-0263)
- 28. Lindblom, B. (1963). “Spectrographic study of vowel reduction,” J. Acoust. Soc. Am. 35(11), 1773–1781. doi:10.1121/1.1918816
- 29. Lindblom, B. (1990). “Explaining phonetic variation: A sketch of the H&H theory,” in Speech Production and Speech Modeling, edited by W. J. Hardcastle and A. Marchal (Kluwer Academic Press, The Netherlands), pp. 403–439.
- 30. Lindgren, R., and Lindblom, B. (1996). “Reduction of vowel chaos,” TMH-Quart. Prog. Status Rpt. 37(2), 1–4.
- 31. Milenkovic, P. (2000). Time-frequency analysis for 32-bit Windows (Version revised July 26, 2001) [computer software].
- 32. Monsen, R. B., and Engebretson, A. M. (1983). “The accuracy of formant frequency measurements: A comparison of spectrographic analysis and linear prediction,” J. Speech Lang. Hear. Res. 26, 89–97. doi:10.1044/jshr.2601.89
- 33. Moon, S.-J., and Lindblom, B. (1994). “Interaction between duration, context, and speaking style in English stressed vowels,” J. Acoust. Soc. Am. 96(1), 40–55. doi:10.1121/1.410492
- 34. Perkell, J. S., Guenther, F. H., Lane, H., Matthies, M. L., Stockmann, E., Tiede, M., and Zandipour, M. (2004). “The distinctness of speakers' productions of vowel contrasts is related to their discrimination of the contrasts,” J. Acoust. Soc. Am. 116(4 Pt 1), 2338–2344. doi:10.1121/1.1787524
- 35. Peterson, G. E., and Barney, H. L. (1952). “Control methods used in a study of the vowels,” J. Acoust. Soc. Am. 24(2), 175–184. doi:10.1121/1.1906875
- 36. Picheny, M., Durlach, N., and Braida, L. (1986). “Speaking clearly for the hard of hearing II: Acoustic characteristics of clear and conversational speech,” J. Speech Hear. Res. 29, 434–446. doi:10.1044/jshr.2904.434
- 37. Sapir, S., Ramig, L. O., Spielman, J. L., and Fox, C. (2010). “Formant centralization ratio: A proposal for a new acoustic measure of dysarthric speech,” J. Speech Lang. Hear. Res. 53(1), 114–125. doi:10.1044/1092-4388(2009/08-0184)
- 38. Skodda, S., Grönheit, W., and Schlegel, U. (2012). “Impairment of vowel articulation as a possible marker of disease progression in Parkinson's disease,” PLoS One 7(2), e32132. doi:10.1371/journal.pone.0032132
- 39. Skodda, S., Visser, W., and Schlegel, U. (2011). “Vowel articulation in Parkinson's disease,” J. Voice 25(4), 467–472. doi:10.1016/j.jvoice.2010.01.009
- 40. Stevens, K. N. (2000). Acoustic Phonetics (MIT Press, Cambridge, MA), pp. 1–624.
- 41. Stevens, K. N., and House, A. S. (1961). “An acoustical theory of vowel production and some of its implications,” J. Speech Hear. Res. 4, 303–320. doi:10.1044/jshr.0404.303
- 42. Stevens, K. N., and House, A. S. (1963). “Perturbation of vowel articulations by consonantal context: An acoustical study,” J. Speech Hear. Res. 6(2), 111–128. doi:10.1044/jshr.0602.111
- 43. Stevens, K. N., House, A. S., and Paul, A. P. (1966). “Acoustical description of syllabic nuclei: An interpretation in terms of a dynamic model of articulation,” J. Acoust. Soc. Am. 40(1), 123–132. doi:10.1121/1.1910027
- 44. Story, B. H. (2009). “Vowel and consonant contributions to vocal tract shape,” J. Acoust. Soc. Am. 126(2), 825–836. doi:10.1121/1.3158816
- 45. Turner, G., Tjaden, K., and Weismer, G. (1995). “The influence of speaking rate on vowel space and speech intelligibility for individuals with Amyotrophic Lateral Sclerosis,” J. Speech Hear. Res. 38(5), 1001–1013. doi:10.1044/jshr.3805.1001
- 46. Van Son, R. J. J. H., and Pols, L. C. W. (1990). “Formant frequencies of Dutch vowels in a text, read at normal and fast rate,” J. Acoust. Soc. Am. 88(4), 1683–1693. doi:10.1121/1.400243
- 47. Vayra, M., and Fowler, C. A. (1992). “Declination of supralaryngeal gestures in spoken Italian,” Phonetica 49, 48–60. doi:10.1159/000261902
- 48. Weismer, G., Jeng, J. Y., Laures, J. S., Kent, R. D., and Kent, J. F. (2001). “Acoustic and intelligibility characteristics of sentence production in neurogenic speech disorders,” Folia Phoniatr. Logop. 53(1), 1–18. doi:10.1159/000052649






