Abstract
Speech nasalization is achieved primarily through the opening and closing of the velopharyngeal port. However, the resultant acoustic features can also be influenced by tongue configuration. Although vowel nasalization is not contrastive in English, two previous studies have found possible differences in the oral articulation of nasal and oral vowel productions, albeit with inconsistent results. In an attempt to further understand the conflicting findings, we evaluated the oral kinematics of nasalized and non-nasalized vowels in a cohort of both male and female American English speakers via electromagnetic articulography. Tongue body and lip positions were captured during vowels produced in nasal and oral contexts (e.g., /mɑm/, /bɑb/). Large contrasts were seen in all participants between tongue positions of /æ/ in oral and nasal contexts, with tongue positions higher and more forward during /mæm/ than /bæb/. Lip aperture was smaller in a nasal context for /æ/. Lip protrusion did not differ between vowels in oral and nasal contexts. Smaller contrasts in tongue and lip position were seen for the vowels /ɑ, i, u/; this is consistent with biomechanical accounts of vowel production suggesting that /i, u/ are particularly constrained, whereas /æ/ has fewer biomechanical constraints, allowing more flexibility for articulatory differences across contexts. We therefore conclude that speakers of American English do indeed use different oral configurations for vowels in nasal and oral contexts, despite vowel nasalization being non-contrastive. This effect was consistent across speakers for only one vowel, perhaps accounting for previously conflicting results.
Keywords: vowel nasalization, oral configuration, labial, lingual, American English, electromagnetic articulography
Introduction
Appropriate nasalization is important for intelligible speech production and can be affected by a variety of disorders, including hearing impairment, neurological diseases, and structural issues such as cleft palate. Many languages use nasalization as a contrastive marker between phonemes. English contrasts nasal and non-nasal consonants (e.g., /m/ vs. /b/), but vowels may or may not be nasalized depending on coarticulation: vowels in nasal contexts (i.e., near nasal consonants) are nasalized, and vowels in non-nasal contexts are not. Some languages, including French, Hindi, and Portuguese, use vowel nasalization contrastively.
Production of Nasalized Phonemes
Nasalization is achieved primarily through coupling the nasal and oral cavities by opening the velopharyngeal (VP) port, which introduces a variety of acoustic features (Chen, 1997; Fant, 1971; House and Stevens, 1956). In the production of nasal consonants, this action results in a characteristic “nasal murmur,” manifested as a low first formant (F1) and the introduction of anti-resonances that produce regions of reduced power at frequencies that vary with the place of oral articulation. During nasalized vowels, the modifications due to nasal coupling can include the introduction of an additional low-frequency spectral peak and a reduction in amplitude and/or a change in location of the F1 spectral peak. This acoustic result can also be influenced by the configurations of oral articulators such as the tongue and lips; this articulatory-to-acoustic “trading relationship” is an example of the many-to-one mapping of motor equivalence, in which many different articulator configurations can produce very similar acoustic results (Maeda, 1990; Perkell et al., 1993; Savariaux et al., 1999). Although we generally think of velar control as binary (i.e., open or closed), there are in fact degrees of oral-nasal coupling, and these correlate non-linearly with listener perceptions of nasalization (Kummer et al., 2003, 1992).
Oral Configuration in Languages with Nasal Vowel Contrasts
Oral articulations (i.e., tongue and lip positions) have been shown to change systematically for vowel nasalization in French (Engwall et al., 2006), Hindi (Shosted et al., 2012), Brazilian and European Portuguese (Barlaz et al., 2018; Cunha et al., 2019; Oliveira et al., 2012), and other languages with contrastive vowel nasalization (Comivi Alowonou et al., 2019). Analyzing magnetic resonance images of French speakers, Engwall and colleagues (2006) found that two speakers showed large differences in oral articulation between nasal and non-nasal vowels, whereas two other speakers did not. The oral articulation changes involved retracting, and sometimes raising, the tongue for nasal vowels. Shosted and colleagues (2012) measured tongue position using electromagnetic articulography (EMA), in which sensors are glued to lingual landmarks and tracked in three dimensions in real time as participants produce natural speech. They found significant differences in sensor positions for nasal/oral vowel contrasts across four Hindi speakers, with results depending on the dimension of tongue position (height or forwardness), the tongue sensor location (tip, midpoint, back), and the vowel (ten oral vowels, each with a matched nasal vowel). In general, back vowels had lower tongue positions during nasalized vowels, low vowels had more forward positions, and front vowels had higher tongue positions. Studies of Portuguese (Brazilian and European) suggest that oral configurations similarly change between nasal and oral vowels. Two MRI studies showed that lingual changes were most evident in the vowels /a/ and /o/ in both Brazilian and European Portuguese (Barlaz et al., 2018; Oliveira et al., 2012). They found that the tongue blade was higher during the nasal version of the vowel /a/, whereas changes for /i/ and /u/ were much more subtle.
Authors across these languages have suggested that oral configuration differences may be used to enhance or attenuate nasal contrasts, particularly to maintain perceptual distinctiveness (i.e., Dispersion Theory; Barlaz et al., 2018; Engwall et al., 2006; Liljencrants and Lindblom, 1972; Shosted et al., 2012).
Oral Configuration Changes in American English
Vowel nasalization is not contrastive in English, and as such English vowels are typically nasalized based on coarticulation (that is, they are nasalized when near nasal consonants and non-nasalized when near non-nasal consonants). Conventional accounts of nasalization in English would deem it unlikely for English speakers to systematically change their oral articulation to modulate the contrast of nasality in vowels in nasal versus non-nasal contexts. It has been theorized, however, that the degree of anticipatory nasalization in American English suggests that it is “not an unintended coarticulatory effect but an intrinsic property of the vowel” (Solé, 1992). Similarly, evidence of fluctuations in vowel nasalization over time may suggest that this subphonemic variation is not determined by physical properties of the vocal tract alone, and instead is learned (Zellou and Tamminga, 2014).
Two previous studies have indeed found differences in the oral articulation of nasal and non-nasal vowel productions in English (Arai, 2005; Carignan et al., 2011). Arai (2005) evaluated tongue position using EMA in one male speaker of American English. The speaker produced /i, ɪ, ɛ, ʌ, æ, ɑ/ in /bVb/ and /bVm/ contexts. Arai found no difference in tongue position during /i, æ, ʌ/. During /ɪ, ɛ/, the tongue was more forward in nasal contexts. Finally, the speaker’s tongue was lower during /ɑ/ in a nasal context. When the tongue position was advanced, it was advanced on the order of 2–3 mm. Carignan and colleagues (2011) measured tongue height with EMA in four speakers of American English, all male. The speakers produced two vowels (/i/ and /ɑ/) in CVCs in which the first C was always oral and the final C was either oral or nasal. They found that tongue height was unchanged in nasalized /ɑ/ and higher in nasalized /i/, raised by 0.01–0.59 mm. The authors noted that the median error of the equipment used in that study, the Carstens AG500 electromagnetic articulograph, is less than 0.5 mm. The results reached statistical significance only for /i/, likely owing to the large number of repetitions (380 per speaker). Results from these studies are mixed (summarized in Table 1), perhaps due to small sample sizes in American English (1–4 male participants) and differing analysis methods.
Table 1.
Results of selected previous studies of oral configuration of nasalization

Languages with contrastive vowel nasalization

| Study | Participants | Language | Measured | Results |
|---|---|---|---|---|
| (Engwall et al., 2006) | 2M, 2F | Belgian French | Tongue dynamics with MRI | 2 participants show oral configuration contrasts, 2 do not |
| (Shosted et al., 2012) | 1M, 3F | Hindi | Tongue position with EMA | Back vowels lower in nasals; low vowels more forward in nasals; front vowels higher in nasals |
| (Oliveira et al., 2012) | 2M, 4F | European Portuguese | Static MRI; real-time MRI | Lingual changes most evident in /a/ and /o/; less so for /i/ and /u/ |
| (Barlaz et al., 2018) | 7M, 5F | Brazilian Portuguese | Real-time MRI | Higher tongue in nasal /a/; smaller changes in /u/ and /i/ (lower tongue) |

American English (language without contrastive vowel nasalization)

| Study | Participants | Language | Measured | Results |
|---|---|---|---|---|
| (Arai, 2005) | 1M | American English | Tongue height/forwardness with EMA | /i, æ, ʌ/ no difference; /ɪ, ɛ/ tongue more forward; /ɑ/ tongue lower (no statistical tests) |
| (Carignan et al., 2011) | 4M | American English | Tongue height with EMA | /ɑ/ no difference; /i/ higher in nasals |
| (Rong and Kuehn, 2010) | Computational model of 1M vocal tract | American English | Vocal tract shape | Tongue and lip position could be used to alter perception of nasalization in /i/ |
Acoustic and Perceptual Consequences of Oral Articulation Changes
Opening the velopharyngeal port introduces spectral poles and zeros into the acoustic signal, including a peak of energy around 1000 Hz (Chen, 1997; House and Stevens, 1956). One resulting change is a broadening of the formant bandwidths (Styler, 2017). This broadening can make precise formant extraction impossible, which is one argument for including kinematic or aerodynamic measurements when studying nasalization. Articulatory changes may exaggerate or attenuate these acoustic changes. Articulatory changes that enhance the perception of nasality may include raising the tongue for low vowels and lowering the tongue for other vowels (Shosted et al., 2012). Lip rounding could also affect the perception of nasalization, particularly in low vowels (Stevens, 1998, p. 290).
To assess whether oral articulation changes could effectively nullify an open velum in the case of hypernasality, Rong and Kuehn (2012, 2010) simulated a vocal tract producing an /i/ with three variations: (1) a closed velopharyngeal port, (2) an open velopharyngeal port with a standard oral configuration, and (3) an open velopharyngeal port with oral configurations tuned to attenuate the acoustic effects of the open velopharyngeal port. They indeed found that specific oral configurations could produce acoustic samples that were perceptually non-nasal, despite the velopharyngeal port being open. Specifically, they found that a downward and forward movement of the tongue dorsum attenuated nasal acoustic features of /i/ (Rong and Kuehn, 2012). They also found that lip shape changes could affect the perception of nasality: lip constriction and rounding could both make the /i/ sound more nasal (see also Perkell et al., 2000). The authors thus suggest that specific oral articulations could be used to compensate for velopharyngeal dysfunction (Rong and Kuehn, 2012).
It has been shown that in typical speakers of American English, coarticulatory vowel nasalization (measured acoustically) can vary systematically based on lexical or phonological contrast, or on word position in the phrase (Cho et al., 2017). It is assumed that these differences a) may be associated with systematic articulatory differences, though not necessarily, and b) may serve a functional purpose to enhance or attenuate nasal contrasts. However, this latter assumption is not necessarily true: systematic differences could be due merely to structural or vowel-space constraints (see Discussion > Possible Mechanisms of Changes for further treatment of this subject). To resolve this question, we must first determine whether speakers do indeed make systematic changes in their oral configurations for vowels in oral and nasal contexts.
Study Aims
Previous evidence in languages with and without contrastive vowel nasalization suggests that there are systematic changes in oral configurations between oral and nasal vowels. However, there is still only scant evidence on whether typical speakers of American English change their oral articulation in nasal and non-nasal contexts. Existing studies have evaluated primarily tongue position in a small number of speakers, and their conclusions thus far are contradictory. In the current study, we measured tongue and lip positions during vowels produced in nasalized and non-nasalized (oral) contexts in a larger cohort of both male and female American English speakers. We present descriptive analyses of these oral configuration differences, augmented by statistical analyses to determine whether changes are consistent across all vowels or whether they are vowel-specific. Consistent oral configuration changes across participants with typical speech would add complexity to our understanding of the articulation of contrastive and non-contrastive vowel nasalization; they would further support the development of behavioral therapies targeting articulatory compensation for hypernasal speech.
Methods
Participants
Ten healthy individuals participated (age range: 20–33 years; mean: 23.2 years; gender: four women, five men, one non-binary person; sex: five female, five male). All participants were native speakers of American English and reported no history of speech, language, or hearing impairments. Participants provided written consent in compliance with the Boston University Institutional Review Board.
Data Collection
An NDI Wave Speech Research System (Northern Digital Inc., Waterloo, Ontario, Canada) was used to record articulator position data and speech acoustics simultaneously. Kinematic data were sampled at 100 Hz and acoustic data at 22,050 Hz. Three sensors were used for head correction (on the gingiva of the upper incisors and on the left and right mastoid processes). Each participant’s maxillary occlusal plane was measured relative to the reference sensors using a plastic mouth guard with three attached sensors, positioned under the back molars and beneath the diastema of the front teeth. After the maxillary occlusal plane was measured, the tongue sensor was attached. The tongue body (TB) sensor was placed as far back as was feasible for each individual, at least 2.5 cm from the anterior tip of the tongue. Lip sensors were attached at the center of each lip. The reference sensor on the upper gingiva and the articulatory sensors were attached with dental adhesive (high-viscosity PeriAcryl™), whereas the two sensors on the mastoid processes were attached with double-sided tape and medical tape.
Participants completed data collection for two experimental protocols sequentially (the other experiment is reported in Cler et al., 2017). As a result, five sensors in total were mounted on the tongue and lips, including a tongue tip sensor. Tongue tip results were similar to those of the tongue body, so for simplicity we discuss only the tongue body and lips here.
Participants were given a list of nasal and oral utterances to produce. They produced the vowels /ɑ, i, æ, u/ in nasal and oral consonant-vowel-consonant (CVC) contexts with a labial place of consonant articulation. Each CVC was repeated three times (e.g., “/mɑm mɑm mɑm; bɑb bɑb bɑb/”). Participants were instructed to pause briefly between words (that is, /bɑb bɑb bɑb/ rather than /bɑbɑbɑb/), but no explicit rate or stress instructions were given. Participants also produced words with an alveolar place of articulation. However, several participants mispronounced one of the transliterations: “Nan” (as in a name for a grandmother) was chosen as a transliteration of /næn/, but some participants pronounced it /nɑn/ (as in the bread, naan), identically to the transliteration “non”. We therefore did not analyze the alveolar (/n/ and /d/ context) tokens, and the total number of utterances analyzed per speaker was three repetitions each of eight words/non-words.
Data Preprocessing
Re-referencing and Filtering
EMA data were exported from the NDI WaveFront software and imported into MATLAB. Custom MATLAB (Mathworks, Natick, MA) software was used to low-pass filter the data with a third-order Butterworth filter, with a 5 Hz cutoff for the reference sensors and a 20 Hz cutoff for the articulatory sensors (Tiede et al., 2010). To correct for head motion, the kinematic data were re-referenced to each individual’s articulatory space: the origin was redefined as the midpoint of the line connecting the left and right rear molars, directly behind the diastema of the upper central incisors, and the horizontal (posterior/anterior) axis was defined as the line connecting the origin and the diastema. We reduced the dimensionality of the data by limiting our analyses to the midsagittal (posterior/anterior and superior/inferior) axes, excluding the less relevant left-right dimension.
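The two preprocessing steps above can be sketched in plain Python. This is a minimal illustration, not the authors’ MATLAB code. The helper names (`occlusal_frame`, `to_midsagittal`, `butter3_lowpass`) are hypothetical; the input positions are assumed to be already head-corrected and expressed in a right-handed 3D frame; the positive horizontal axis is assumed to point posteriorly, consistent with the sign conventions of Figures 2 and 3 (more forward is negative); and zero-phase forward-backward filtering with edge padding is an assumption, since the text specifies only the filter order and cutoffs.

```python
import math

# ---- Occlusal-plane coordinate frame (hypothetical helpers) ----
def _sub(a, b):
    return [ai - bi for ai, bi in zip(a, b)]

def _dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def _unit(a):
    norm = math.sqrt(_dot(a, a))
    return [ai / norm for ai in a]

def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def occlusal_frame(left_molar, right_molar, diastema):
    """Origin at the molar midpoint; returns (origin, posterior axis, superior axis)."""
    origin = [(l + r) / 2 for l, r in zip(left_molar, right_molar)]
    posterior = _unit(_sub(origin, diastema))   # +x is posterior, so forward is negative
    lat = _sub(left_molar, right_molar)          # points toward the speaker's left
    # Remove any component along the posterior axis so the axes are orthogonal.
    lat = _unit([v - _dot(lat, posterior) * p for v, p in zip(lat, posterior)])
    up = _cross(lat, posterior)                  # superior, assuming a right-handed input frame
    return origin, posterior, up

def to_midsagittal(point, frame):
    """Drop the left-right dimension: return (x, y) in mm in the midsagittal plane."""
    origin, posterior, up = frame
    d = _sub(point, origin)
    return _dot(d, posterior), _dot(d, up)

# ---- Third-order Butterworth low-pass, applied forward and backward ----
def butter3_lowpass(signal, cutoff_hz, fs_hz, pad=100):
    k = math.tan(math.pi * cutoff_hz / fs_hz)   # pre-warped cutoff (bilinear transform)
    g = 1.0 / (1 + k + k * k)                    # normaliser for the Q = 1 section
    sections = [
        # first-order section: the real pole of the 3rd-order Butterworth
        ([k / (1 + k), k / (1 + k)], [1.0, (k - 1) / (k + 1)]),
        # second-order section: the complex-conjugate pole pair (Q = 1)
        ([k * k * g, 2 * k * k * g, k * k * g],
         [1.0, 2 * (k * k - 1) * g, (1 - k + k * k) * g]),
    ]

    def cascade(x):
        for b, a in sections:
            y = []
            for n in range(len(x)):
                acc = sum(b[i] * x[n - i] for i in range(len(b)) if n >= i)
                acc -= sum(a[j] * y[n - j] for j in range(1, len(a)) if n >= j)
                y.append(acc)
            x = y
        return x

    # Pad with copies of the end samples so start-up transients decay before the data.
    padded = [signal[0]] * pad + list(signal) + [signal[-1]] * pad
    smoothed = cascade(cascade(padded)[::-1])[::-1]  # forward pass, then backward pass
    return smoothed[pad:pad + len(signal)]
```

For example, `butter3_lowpass(tongue_x, 20, 100)` would smooth a hypothetical tongue-sensor trace `tongue_x` with the 20 Hz articulatory cutoff at the 100 Hz kinematic rate. In practice, a library routine (for instance SciPy’s `butter` combined with `filtfilt`) would usually be preferred over a hand-written filter.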
Segmentation
The speech analysis software Praat (Boersma and Weenink, 2015) was used to segment the CVC syllables from the acoustic signal recorded by the NDI system. Onsets and offsets of each consonant and vowel were marked and their times extracted; all markings were completed by the first author and reviewed by the senior author. To capture the steady-state portion of the vowel, analyses used the average position over the middle 50% of the marked vowel. This window was chosen so that potential differences in the articulation of the consonants, or in the transitions into and out of the consonants, were not considered: the consonants are by definition contrastive, but our interest here is specifically in the vowels. For each participant and vowel, position data were extracted and averaged across this middle portion of the vowel in each utterance (e.g., an average of 12 samples if the middle portion lasted 120 ms at the 100 Hz kinematic sampling rate).
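The windowing rule above can be sketched as follows. This is a minimal, hypothetical helper (`middle_half_mean` is not part of the original pipeline); it assumes that kinematic sample i falls at time i / fs on the same clock as the vowel onset and offset times marked in Praat, and that both boundary samples are included in the window.

```python
import math

def middle_half_mean(samples, fs_hz, vowel_on_s, vowel_off_s):
    """Average (x, y) positions over the middle 50% of a vowel.

    Assumes sample i of `samples` was recorded at time i / fs_hz on the same
    clock as the vowel onset/offset times (in seconds) marked in Praat.
    """
    quarter = 0.25 * (vowel_off_s - vowel_on_s)
    # First and last sample indices inside [onset + 25%, offset - 25%], both included;
    # the 1e-9 guards against floating-point error at exact boundaries.
    first = math.ceil((vowel_on_s + quarter) * fs_hz - 1e-9)
    last = math.floor((vowel_off_s - quarter) * fs_hz + 1e-9)
    window = samples[first:last + 1]
    n = len(window)
    return sum(p[0] for p in window) / n, sum(p[1] for p in window) / n
```

With synthetic samples whose x value equals the sample index, a vowel marked from 0.205 s to 0.605 s at 100 Hz keeps samples 31 through 50, so the returned x is 40.5.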
Data Analysis
The main analysis of differences between the conditions (nasal and oral contexts) was a vector-based representation of 2D position change in tongue configuration. Lip aperture and lip protrusion were also evaluated descriptively. To complement these descriptive results, we used ANOVAs to test the hypothesis that any tongue or lip position changes were vowel-specific.
Tongue Position Change Vectors
For tongue position analyses, position was averaged across the three produced repetitions in the nasal and oral contexts, respectively. Two-dimensional vectors in the midsagittal plane were calculated and plotted (with respect to the superior/inferior, y, and posterior/anterior, x, axes) to indicate the difference in mean tongue position between vowels in nasal and oral contexts. Our main analysis approach is schematized in Figure 1: the left panel shows tongue sensors for one production each of /æ/ in an oral context (/bæb/, in blue) and in a nasal context (/mæm/, in red). The center panel shows tongue sensor positions for three repetitions of /bæb/ and /mæm/ and the corresponding centroids of tongue sensor position. Vectors are drawn from the mean position during the vowel in an oral context to the mean position during the vowel in a nasal context. The right panel shows the same vectors with a normalized origin in order to compare across participants.
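A minimal sketch of the change-vector computation, assuming each repetition has already been reduced to one midsagittal (x, y) position in mm. The function names are hypothetical, and the sign convention (x increasing posteriorly, y increasing superiorly) is an assumption consistent with Figures 2 and 3, where a more forward tongue yields a negative x change. The numbers in the usage line are invented for illustration and are not study data.

```python
import math
from statistics import fmean

def centroid(points):
    """Mean (x, y) of a list of 2D positions, e.g., three repetitions."""
    return fmean(p[0] for p in points), fmean(p[1] for p in points)

def change_vector(oral_reps, nasal_reps):
    """Vector from the oral-context centroid to the nasal-context centroid (mm).

    Returns (dx, dy, magnitude); negative dx means more forward in nasal contexts,
    positive dy means higher in nasal contexts (assumed sign convention).
    """
    ox, oy = centroid(oral_reps)
    nx, ny = centroid(nasal_reps)
    dx, dy = nx - ox, ny - oy
    return dx, dy, math.hypot(dx, dy)
```

For instance, `change_vector([(10, 5), (11, 5), (12, 5)], [(7, 9), (8, 9), (9, 9)])` returns `(-3.0, 4.0, 5.0)`: the tongue is 3 mm more forward and 4 mm higher in the nasal context, a 5 mm change overall.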
Figure 1.
Schematized analysis. The left panel shows tongue and tongue sensor positions during one production by one participant of /bæb/ (blue) and /mæm/ (red). Orientations of the x and y dimensions are shown, but note that the origin lies in the midsagittal plane, midway between the back two molars. The center panel shows an overlay of tongue positions for three productions each of /bæb/ and /mæm/ by one participant. Average positions are calculated and a vector is plotted from the mean center of oral productions to the mean center of nasal productions. The right panel schematizes the average of vectors across participants shifted to a common origin, with vector lengths still in mm. The dashed circles in the right panel have a radius of 0.88 mm, which is the error inherent in the NDI Wave system (Berry, 2011).
Lip Aperture and Protrusion
Lip aperture was calculated as the Euclidean distance between the upper lip sensor and the lower lip sensor. Lip protrusion was quantified as the position of the lower lip sensor along the anterior/posterior axis, so that lip rounding would appear as anterior displacement of the sensor. Lip aperture and protrusion measures were each averaged across the three repetitions in oral and nasal contexts for each vowel.
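The two lip measures can be sketched in the same style. This assumes that each repetition contributes one vowel-averaged upper-lip and lower-lip position in the midsagittal plane, and that aperture is computed per repetition before averaging (the text does not specify the order); `lip_measures` and `lip_changes` are hypothetical names. With x assumed to increase posteriorly, a negative protrusion change means more protrusion in nasal contexts, matching the sign convention of Figure 3D.

```python
import math
from statistics import fmean

def lip_measures(upper_reps, lower_reps):
    """Mean aperture (UL-LL Euclidean distance) and mean lower-lip x over repetitions."""
    aperture = fmean(math.dist(ul, ll) for ul, ll in zip(upper_reps, lower_reps))
    lower_lip_x = fmean(ll[0] for ll in lower_reps)
    return aperture, lower_lip_x

def lip_changes(oral, nasal):
    """Nasal-minus-oral (aperture change, protrusion change) in mm.

    `oral` and `nasal` are (upper_lip_reps, lower_lip_reps) pairs of (x, y) positions.
    """
    ap_oral, x_oral = lip_measures(*oral)
    ap_nasal, x_nasal = lip_measures(*nasal)
    return ap_nasal - ap_oral, x_nasal - x_oral
```

With invented positions, `lip_changes(([(0.0, 10.0)] * 3, [(0.0, 0.0)] * 3), ([(-1.0, 9.0)] * 3, [(-1.0, 1.0)] * 3))` returns `(-2.0, -1.0)`: the lips are 2 mm less open and 1 mm more protruded in the nasal context.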
Acoustic Analysis of /æ/
A post hoc acoustic analysis of the formants of /æ/ was also performed in Praat. The first two formants (F1 and F2) were extracted from each /æ/ over the same portion of the vowel as used for the kinematic analyses, and averaged over the three oral context and three nasal context productions, respectively. They were plotted as change vectors in formant space to facilitate comparison to the change vectors of the tongue position analyses.
Statistical Analysis
We assessed the vowel-specificity of the tongue position and lip changes with analyses of variance (ANOVAs). For all ANOVAs, the predictor was vowel (/ɑ, i, æ, u/) and the outcome variable was the difference in position between nasal and oral contexts. Separate ANOVAs assessed differences in tongue forwardness (posterior/anterior axis), tongue height (superior/inferior axis), lip aperture, and lip protrusion.
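The degrees of freedom reported in the Results, F(3,36) for the tongue measures (10 participants) and F(3,32) for the lip measures (9 participants), are what a one-way between-groups ANOVA gives with one difference score per participant per vowel: 4 × 10 − 4 = 36 and 4 × 9 − 4 = 32. The sketch below computes only the F statistic and degrees of freedom; `one_way_anova` is a hypothetical name, and p-values and the Tukey post hoc tests are not reproduced here (a statistics package such as SciPy’s `scipy.stats.f_oneway` would normally be used).

```python
from statistics import fmean

def one_way_anova(groups):
    """One-way between-groups ANOVA: returns (F, df_between, df_within).

    `groups` is a list of lists, e.g. one list of nasal-minus-oral differences
    (one value per participant) for each vowel.
    """
    observations = [x for g in groups for x in g]
    grand_mean = fmean(observations)
    k, n = len(groups), len(observations)
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```

As a quick check, `one_way_anova([[1, 2, 3], [4, 5, 6], [7, 8, 9]])` returns `(27.0, 2, 6)`.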
Results
All participants accurately produced three repetitions of each of the labial tokens. Analysis was completed on 238 tokens, with 2 tongue position tokens excluded due to kinematic tracking errors. One participant had no usable lip data due to tracking errors from one of the lip sensors; analyses of lip aperture and protrusion were therefore completed over 9 participants and 214 tokens.
Tongue Position Change Vectors
Change vectors of tongue position for each participant are shown in Figure 2, in which each plot shows the difference between tongue positions in nasal and oral contexts for a different vowel (/ɑ, æ, i, u/). Each vector represents the change in tongue position for one participant (as schematized in Figure 1). The origin of the vector represents the position of the tongue during vowels in oral contexts, translated to a common origin across participants. The length and direction of the vector show the change from vowels in oral contexts to vowels in nasal contexts. A vector pointing to the left thus indicates that a given participant had a more forward tongue position in nasal contexts than in oral contexts; similarly, a vector pointing down indicates a lower tongue position in nasal contexts. The dotted circles have a radius of 0.88 mm, which represents an average maximum error across participants using an NDI Wave system (Berry, 2011). When averaging across all participants, the resultant vectors for /ɑ, i, u/ are small, with magnitudes of 1.4 mm, 1.1 mm, and 1.4 mm, respectively. The resultant vector for /æ/ is larger, with a magnitude of 8.2 mm. Although the vectors are shown together in Figure 2 for comparison across speakers, they are also presented embedded in each speaker’s articulatory vowel space in Supplementary Figure S1. Group differences in tongue height and tongue forwardness are also shown separately in Figure 3A and B to allow comparison with other studies that address only one of these dimensions.
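The group-level resultant can be sketched as the component-wise mean of the per-participant change vectors. This assumes “resultant” means that mean (the text says the vectors were averaged across participants); `resultant` and `mean_magnitude` are hypothetical names. By the triangle inequality, the magnitude of the mean vector never exceeds the mean of the individual magnitudes, and it shrinks when participants’ vectors point in different directions, so a small resultant does not by itself show that every participant changed little.

```python
import math
from statistics import fmean

def resultant(vectors):
    """Component-wise mean of (dx, dy) change vectors and the magnitude of that mean."""
    mean_dx = fmean(v[0] for v in vectors)
    mean_dy = fmean(v[1] for v in vectors)
    return mean_dx, mean_dy, math.hypot(mean_dx, mean_dy)

def mean_magnitude(vectors):
    """Average length of the individual vectors, for comparison with the resultant."""
    return fmean(math.hypot(dx, dy) for dx, dy in vectors)
```

For two invented, opposite vectors `[(3, 4), (-3, -4)]`, `resultant` gives a magnitude of 0.0 while `mean_magnitude` gives 5.0.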
Figure 2.

Tongue position change vectors for vowels /ɑ, æ, i, u/. Each arrow represents data from one participant (P1, P2, P3, etc.). The origin of each arrow represents the participant’s tongue position during vowels in oral contexts; these origins have been aligned to compare across participants. The magnitude (in mm; length of arrow) and direction of the arrow represent the change in tongue position between vowels in oral contexts (origin) and vowels in nasal contexts (end of each arrow). Arrow directions are interpretable as in the schematic in Figure 1; that is, an arrow pointing to the left indicates that tongue positions are more forward in nasal contexts, and an arrow pointing down indicates that tongue positions are lower in nasal contexts. Colors are assigned per participant and are consistent across plots as shown on the bottom right.
Figure 3.

Differences in tongue position and lip position between vowels in nasal and non-nasal contexts. Each dot represents one participant. The gray box shows the standard deviation and the black line shows the group mean. (A) shows differences in tongue forwardness, in which negative numbers mean the tongue is more forward in nasals. (B) shows differences in tongue height, in which positive numbers mean the tongue is higher in nasals. (C) shows differences in lip aperture, in which negative numbers mean the lips are more closed in nasals. (D) shows differences in lip protrusion, in which negative numbers mean the lips are more protruded in nasals. Values are in mm along the respective axes (e.g., tongue height is a difference along the superior/inferior axis).
Lip Aperture Change
Lip aperture changes also varied by vowel (see Figure 3C). A positive aperture change indicates that the lips are more open during vowels in a nasal context than during vowels in an oral context. For the vowel /ɑ/, the mean change in aperture was −0.5 mm (range: −1.9 to 2.4 mm), and for /u/ it was 0.5 mm (range: −2.2 to 2.0 mm). The mean difference in aperture for /i/ was 1.1 mm (range: −0.1 to 2.4 mm). As with tongue position changes, the largest contrast in aperture was for /æ/, for which the mean difference was −1.8 mm (range: −5.1 to 0.2 mm). This negative aperture change indicates that the lips were more open during oral productions of /æ/ than during nasal productions of /æ/.
Lip Protrusion Change
Lip protrusion change was more similar across vowels (see Figure 3D). A negative lip protrusion change indicates that the lips are more protruded in nasal contexts than in oral contexts. For the vowel /ɑ/, the mean change in protrusion was −0.5 mm (range: −1.3 to 0.4 mm), and for /u/ it was −0.6 mm (range: −2.3 to 0.4 mm). The mean difference in protrusion for /i/ was −0.1 mm (range: −2.9 to 1.1 mm). As with tongue position changes, the largest contrast in lip protrusion was for /æ/, for which the mean difference was −1.0 mm (range: −3.7 to 0.6 mm). For every vowel, the range of one standard deviation around the mean change in lip protrusion included 0, suggesting that at the group level, changes were not consistent.
Statistical Analysis of Vowel Effect on Tongue and Lip Position
The change in tongue forwardness differed significantly between vowels, as determined by one-way ANOVA (F(3,36) = 30.61, p < .0001). Similarly, the change in tongue height showed a main effect of vowel (F(3,36) = 18.53, p < .0001). For both tongue measures, post hoc analyses (Tukey test) indicated that /æ/ was significantly different from the other vowels. For lip aperture, a one-way ANOVA showed a main effect of vowel (F(3,32) = 8.53, p = .0003). Post hoc testing revealed that lip aperture changes for /æ/ were significantly different from those for /i, u/ but not from those for /ɑ/. For lip protrusion, a one-way ANOVA showed no effect of vowel (F(3,32) = 1.23, p = .31).
Examining the tongue position data (Figure 3A–B) also shows that only /æ/ has error bars (±1 SD) that do not overlap with 0 ± 0.88 mm, the error inherent in the tracking system (Berry, 2011). For lip aperture and lip protrusion, the error bars for all of the vowels overlap this range (Figure 3C–D). This suggests that, across participants, only tongue position changes systematically, and only during /æ/.
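The overlap criterion above can be written as a small check. This is a sketch that assumes the error bars are ±1 sample standard deviation around the group mean (as in the Figure 3 caption) and uses the 0.88 mm value cited from Berry (2011); `clear_of_error_band` is a hypothetical name.

```python
from statistics import fmean, stdev

NDI_WAVE_ERROR_MM = 0.88  # tracking error cited from Berry (2011) in the text

def clear_of_error_band(diffs_mm, band=NDI_WAVE_ERROR_MM):
    """True if mean ± 1 SD of the nasal-minus-oral differences lies entirely outside ±band."""
    m, sd = fmean(diffs_mm), stdev(diffs_mm)  # stdev is the sample (n - 1) SD
    low, high = m - sd, m + sd
    return low > band or high < -band
```

With invented differences, `[5, 6, 7]` (mean 6, SD 1) clears the band, whereas `[0, 1, 2]` (mean 1, SD 1) does not.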
Finally, our post hoc acoustic analysis of formant changes of /æ/ is shown in Figure 4. As in the tongue position analysis in Figure 2, the beginning of each arrow represents the average formant values of the three productions of /æ/ in an oral context and the head of the arrow points to the average formant values in the nasal context.
Figure 4.

Formant changes per participant. F1 and F2 are shown as in a typical acoustic vowel space, with decreasing F2 on the x axis and increasing F1 on the y axis. This orientation mirrors the tongue position plots, because raising the tongue lowers F1 and moving the tongue forward raises F2. The common nasal resonance around 1000 Hz is marked in grey.
Discussion
We measured tongue and lip position changes between vowels produced in nasal and non-nasal contexts. We found that there were vowel-specific changes in tongue height, tongue forwardness, and lip aperture during the vowel /æ/. Lip protrusion did not appear to change consistently across participants.
In /æ/, speakers moved the tongue up and forward during nasal contexts in which the velum was presumably lowered. This change in tongue position narrows the oral passage, thus increasing the oral impedance and shunting more sound through the nasal cavities (Hajek, 1997). A few speakers shifted /ɑ/ in the same direction; nine participants (of ten) moved their tongue forward, but some moved their tongue forward and down, and one moved it back. During the high vowels /i/ and /u/, speakers moved their tongue less and perhaps less consistently. In both cases, only one participant moved the tongue body up. Lip aperture decreased during /æ/ in a nasal context, which would similarly increase oral impedance.
Comparisons to Previous Literature
These results may help explain the conflicting data published previously (Arai, 2005; Carignan et al., 2011), which came from only a small number of speakers and assessed different vowels (summarized in Table 1). Our results indicate primarily that different speakers may or may not alter their oral configurations (consistent with Engwall et al., 2006 in French speakers), and to different degrees. For example, the speaker in Arai (2005) had a tongue position that was lower and back by around 2–3 mm during nasal /ɑ/; this is consistent with the speaker labeled P5 in Figure 2. However, other speakers in our study produced different contrasts.
The magnitudes of change here (Figure 3) are similar to those in Arai (2005), which were on the order of 2–3 mm, but much larger than those in Carignan et al. (2011), which ranged from 0.01 to 0.59 mm. However, the magnitude in Figure 3 incorporates both height and forwardness. When considering only the vertical component of the vectors in Figure 2 (i.e., tongue height contrasts), the height changes for /i/ in our study (range: −2.1 to 0.73 mm; SD: 0.96 mm) are still larger than those in Carignan et al. (2011). This could be due to different samples (4 men aged 22–65 years versus our mixed-gender sample of 10 young adults aged 20–33 years) or to different stimuli. The stimuli used in this study incorporated both anticipatory and carryover nasalization, to maximize possible changes. The literature suggests that carryover nasalization is stronger than anticipatory nasalization in French (Delvaux et al., 2008), but the opposite may be true in American English (Moll, 1962; Solé, 1992; Zellou and Tamminga, 2014). Previous studies in English used only anticipatory nasalization (e.g., bVm in both Arai (2005) and Carignan et al. (2011)), which could explain the discrepancy between their results and ours.
Possible Mechanisms of Changes
These differences in tongue position for vowels produced in oral versus nasal contexts could stem from two basic sources: functional and structural. Thus far we have considered only functional causes; that is, if there are consistent changes, they must have acoustic consequences that are desirable to the speaker, perhaps to emphasize a contrast (Solé, 1992). Indeed, in American English, acoustic correlates of coarticulatory vowel nasalization vary systematically with lexical or phonological contrast, or with word position in the phrase (Cho et al., 2017). We have assumed that these acoustic differences are associated with systematic articulatory differences, although this has not been shown directly. We have also assumed that because these differences are systematic, they may serve a functional purpose to enhance or attenuate nasal contrasts. However, this assumption is not necessarily true: systematic differences could arise merely from structural or vowel-space constraints. For example, the palatoglossus muscle attaches to both the posterior tongue and the velum (Hixon et al., 2018), which suggests a complex biomechanical coupling of these structures. The velum is also controlled by other muscles (levator veli palatini, tensor veli palatini, palatopharyngeus, musculus uvulae), as is the tongue (genioglossus, hyoglossus, styloglossus); together, these muscles allow the two structures to move quasi-independently. The exact biomechanical consequences of the anatomical coupling of these structures by the palatoglossus are unknown, and the entire system is known to be complex (Hixon et al., 2018).
It is likely that the underlying cause of these tongue positions is a combination of structural and functional (acoustic/perceptual) factors that vary across vowels and across speakers, depending partly on individual differences in anatomy and speaking habits. For example, the vertical position for /i/ is naturally constrained by contact between the sides of the tongue and the hard palate; in this dimension, speakers therefore have only limited freedom to change their oral configurations in different contexts. Broadly, the vowel targets for /i/, /u/, and /ɑ/ are each defined by a combination of quantal acoustic effects, biomechanical saturation effects, and (at least for /i/ and /u/) patterns of contact between the lateral edges of the tongue and the sides of the hard palate and teeth (Gick et al., 2017; Perkell, 1996, 1979; Stone, 1990). These factors, together with the fact that /i/, /ɑ/, and /u/ define corners of the acoustic and articulatory vowel spaces, contribute to more stable productions than for /æ/, which is near the center of the vowel space, is more often produced intra-syllabically during tongue movement between two consonants, and has a less well-defined articulatory target (Perkell, 1996, 1979). The fewer putative constraints on /æ/ give speakers more freedom to change their oral configuration in different contexts, either as a consequence of biomechanical (structural) coupling with the velum or possibly to enhance nasal contrasts.
Our acoustic analysis of /æ/ formants (Figure 4) complements our oral configuration findings. When moving from /æ/ in an oral context to a nasal context, participants shifted their tongues up and forward and decreased their lip aperture. Theoretically, these changes should combine to lower F1 and raise F2, as is in fact shown in Figure 4. One possible explanation is that speakers attempt to shift formant values away from a spectral peak (nasal resonance) near 1000 Hz (House and Stevens, 1956) to lessen its effect and preserve vowel identity. Note, however, that in this set of speakers the formants of /æ/ in the oral context are not near 1000 Hz. This suggests that the purpose of the shift may not be to attenuate the effects of the nasal resonance, but rather to create a different perceptual contrast, or that the shift is due to biomechanical constraints. A foundational study on vowel nasalization, House and Stevens (1956), found that listeners judged a standard (non-nasalized) synthesized /æ/ to sound nasal. This may help explain why /æ/ differs from the other vowels studied here: if it is intrinsically perceived as nasalized, a larger contrast could be needed to differentiate it when it actually occurs in a nasal context (if speakers of American English indeed make such contrasts).
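The formant reasoning above reduces to simple per-speaker arithmetic: the sign of each formant shift, and how far the oral-context formants lie from the nasal peak. The sketch assumes per-speaker mean F1 and F2 values have already been measured (for example, in Praat), takes the approximately 1000 Hz peak from House and Stevens (1956) as a fixed reference, and uses invented numbers; the function names are hypothetical.

```python
# Approximate nasal spectral peak (House and Stevens, 1956); treated as fixed here.
NASAL_PEAK_HZ = 1000.0

def formant_shift(oral, nasal):
    """(dF1, dF2) in Hz: nasal-context means minus oral-context means."""
    return nasal[0] - oral[0], nasal[1] - oral[1]

def matches_prediction(d_f1, d_f2):
    """True if F1 fell and F2 rose, the direction predicted in the text."""
    return d_f1 < 0 and d_f2 > 0

def distance_from_peak(formants):
    """Absolute distance (Hz) of F1 and F2 from the assumed nasal peak."""
    return tuple(abs(f - NASAL_PEAK_HZ) for f in formants)

# Hypothetical speaker means (F1, F2) in Hz for /æ/; not study data.
oral, nasal = (700.0, 1700.0), (620.0, 1850.0)
d_f1, d_f2 = formant_shift(oral, nasal)
print(d_f1, d_f2, matches_prediction(d_f1, d_f2))  # -80.0 150.0 True
print(distance_from_peak(oral))                    # (300.0, 700.0)
```

Running the same check for each speaker separates the question of whether the shift follows the predicted direction from the question of whether the oral-context formants were close enough to the nasal peak for avoidance to be a plausible motive.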
Another way to conceptualize these vowel differences is through coarticulation resistance (Bladon and Al-Bamerni, 1976). Speakers generally coarticulate neighboring phonemes, but they modulate the degree of coarticulation so that their acoustic goals are still met. In a language like English, in which vowel nasalization is not contrastive, we could make two predictions: (1) we should see no coarticulation resistance (that is, all speakers should have changes around 0 in Figure 3A–D, indicating identical tongue and lip positions for vowels independent of context), because vowel perception is not affected by nasalization; or (2) we may see some positional changes. In the latter case, however, the source of those changes is unclear. They could counteract the effect of nasalization; that is, the movements could serve to resist coarticulation, and could be vowel-specific, based either on each vowel’s constraints (as above) or on how likely that vowel is to be affected by nasalization. While our experiment did not explicitly test whether these articulatory differences have functional or structural causes, these possible mechanisms should inform future work into why speakers might make these changes for some vowels.
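Prediction (1) implies that, across speakers, positive and negative changes along any one dimension should be about equally likely. One simple way to check directional consistency is an exact two-sided sign test on the number of speakers whose change went in each direction. This is offered only as an illustrative sketch under that assumption, not as the analysis used in this study; the function name and counts are hypothetical.

```python
from math import comb

def sign_test_p(n_positive, n_negative):
    """Exact two-sided sign test under H0: P(positive change) = 0.5.

    Speakers with exactly zero change should be excluded before calling.
    """
    n = n_positive + n_negative
    if n == 0:
        raise ValueError("no non-zero changes to test")
    k = min(n_positive, n_negative)
    one_tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * one_tail)

# Hypothetical counts for ten speakers (not study data).
print(sign_test_p(10, 0))  # 0.001953125
print(sign_test_p(5, 5))   # 1.0
```

With ten speakers, unanimous agreement in direction gives p ≈ 0.002, whereas a 7–3 split gives p ≈ 0.34, so only strongly consistent patterns across speakers would argue against prediction (1) with a sample of this size.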
These results are consistent with previous research on vowel-specific effects of nasalization. Previous work in languages with contrastive nasal vowels (French, Hindi, and Portuguese) showed different effects for different vowels (Barlaz et al., 2018; Engwall et al., 2006; Oliveira et al., 2012; Shosted et al., 2012). In addition, oral-nasal coupling affects formants very differently depending on vowel type (Stevens et al., 1987). The frequencies of the antiresonances introduced by coupling to the nasal cavities can overlap with what would otherwise be identified as F1 or F2. Vowels with larger oral-to-nasal F1 or F2 shifts might be more likely to show compensatory tongue movements that keep the vowels within category even when nasalized. Finally, while the large positional changes during /æ/ observed here are inconsistent with the one speaker previously examined in Arai (2005), several features of /æ/ may explain the present results. This vowel is typically considered lax, whereas the other vowels studied here are all tense. It is known to vary across dialects of American English (perhaps suggesting a larger category size), and it has been noted elsewhere to be higher when preceding a nasal (Dinkin, 2011; Duncan, 2016).
Limitations and Future Directions
Methodological Considerations
One difference among studies examining nasalization relates to the trade-off between needing consistent productions and the lexical status of the CVCs used. For example, in this study, /mɑm/ and /bɑb/ are both utterances that speakers have likely pronounced before, but /mæm/ (“ma’am”) is a familiar word while /bæb/ is not. In other studies, speakers may produce a carrier phrase or many different combinations of consonants and vowels. These and other methodological choices may affect results, as prosody can produce hyperarticulation that may enhance or attenuate nasalization markers (Cho et al., 2017).
Using EMA to capture oral positions provides both advantages and disadvantages compared to other methods (e.g., ultrasound, MRI). Its sampling rate, 100 Hz here, is higher than some real-time MRI paradigms (e.g., 14 Hz in Oliveira et al., 2012), although others can reach 100 Hz (Barlaz et al., 2018). However, instead of capturing the entire tongue or vocal tract boundaries, EMA can only capture a limited number of fleshpoints. EMA and MRI both affect the speaker such that their productions are not completely naturalistic. EMA enables speakers to sit upright and speak normally, but the glue and sensors affect somatosensation and speakers are unable to completely adapt their speech to the sensors (Hunter, 2016). MRI typically requires speakers to lie on their backs rather than sit upright, and the noise of the machine cannot be completely attenuated and thus will affect the speaker. Ideally, future research with both modalities will converge.
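The practical meaning of the sampling-rate comparison above is easiest to see as frame counts. The sketch assumes a hypothetical 150 ms vowel (not a duration measured in this study) and counts only whole frame intervals, ignoring where the first frame happens to fall relative to vowel onset; the function names are hypothetical.

```python
import math

def frame_interval_ms(rate_hz):
    """Time between successive frames, in milliseconds."""
    return 1000.0 / rate_hz

def whole_frames(duration_ms, rate_hz):
    """Whole frame intervals that fit inside a segment of the given duration."""
    return math.floor(duration_ms * rate_hz / 1000.0)

for label, rate in [("EMA (this study)", 100), ("rtMRI (Oliveira et al., 2012)", 14)]:
    print(label, round(frame_interval_ms(rate), 1), "ms/frame,",
          whole_frames(150, rate), "frames in a 150 ms vowel")
```

At 100 Hz a 150 ms vowel spans roughly 15 frames at 10 ms intervals; at 14 Hz it spans only about 2 frames at roughly 71 ms intervals, which limits how precisely a vowel midpoint or target can be located.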
Sources of Variability between Participants
This study adds to the literature additional evidence for oral configuration changes during nasal productions in American English. Although we did include more participants than some previous research – and included both men and women – there are a variety of factors that we did not control for that might account for some variability. For example, we required only that participants spoke American English as their first language. We did not gather information about their particular dialects, which would affect vowel categories and articulation. We similarly asked participants about their history of speech, language, hearing, or voice disorder, and only enrolled participants with no history of any concerns, but we did not specifically ask whether they had a history of tonsillectomy or oral surgeries, nor did we directly test their hearing or speech articulation. Finally, although our participants are typically college students, this is not a requirement and we did not collect their level of education. Future research could investigate these factors directly to attempt to assess how specific and consistent oral configuration changes may be.
Previous studies indicate that, while gender does not appear to affect acoustic measures of nasalization (Leeper et al., 1992; Mayo et al., 1996), there could be gender-related differences in strategies for nasalization, as speculated by Engwall et al. (2006). Specifically, findings could be affected by between-speaker differences in body size (and thus, presumably, in vocal-tract, pharyngeal, and velar size). There may also be socio-cultural factors related to the degree of nasal coarticulation, particularly age and gender, in different dialects of American English (Tamminga and Zellou, 2015). Although gender did not appear to have an effect in our study (see supplemental Figure S2 for /æ/ changes, colored to indicate speaker gender), this could be explored further, as gender is relevant for oral configurations in languages with a nasal vowel contrast (Delvaux et al., 2002; Engwall et al., 2006), and previous kinematic studies in American English did not include women as participants (Arai, 2005; Carignan et al., 2011).
Further, we did not require participants to produce the CVCs at a particular rate or for a particular duration. One possible account for differences in tongue or lip movement between participants could be vowel duration: very short vowels could lead to less stable productions, whereas very long vowels could indicate more emphasis, possibly leading to more contrast between conditions via hyperarticulation (Cho et al., 2017). However, our current data do not support these hypotheses. Although vowel durations varied somewhat across participants, vowel duration was not related to the magnitude of change for any of the vowels (see supplementary Figure S3).
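The check described above, whether vowel duration relates to the magnitude of the oral-nasal contrast, amounts to a correlation across speakers. The manuscript does not state which statistic underlies Figure S3, so the following is only a sketch: a Pearson correlation written out explicitly and applied to invented per-speaker values, with a hypothetical function name. With about ten speakers, a rank-based (Spearman) correlation would be a reasonable alternative.

```python
import math
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation between paired per-speaker values."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need at least three paired values")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical per-speaker mean vowel durations (ms) and contrast magnitudes (mm).
durations = [120, 150, 180, 210, 240]
magnitudes = [1.8, 1.1, 2.0, 1.4, 1.7]
print(round(pearson_r(durations, magnitudes), 2))
```

An r near zero, as these invented values produce, would correspond to the absence of a duration effect described in the text; a real analysis would also report a p-value or confidence interval.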
Finally, we did not directly measure nasalance, which correlates with listener perception of nasality, or each speaker’s perceptual acuity for nasalization. Future studies should measure these factors to determine the effect of perceptual differences on contrasts, and the effect of individual anatomy on the necessity and possibility of using oral configurations to enhance contrasts (as suggested in Engwall et al., 2006). These measures would be necessary to translate these findings into a therapeutic context. Specifically, we found that typical speakers do consistently change their oral configurations for at least one vowel. To target therapy for an individual speaker with hypernasality, specific articulatory targets for each vowel may need to be determined, as suggested in Rong and Kuehn (2012).
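Nasalance is commonly computed as the nasal share of the combined nasal and oral acoustic amplitude, expressed as a percentage. The sketch below assumes two time-aligned channels from separate nasal and oral microphones and uses RMS amplitude; commercial instruments also filter the signals before this step, which the sketch omits, so it illustrates the ratio rather than replacing a calibrated device. The function names are hypothetical.

```python
import math

def rms(samples):
    """Root-mean-square amplitude of a sequence of samples."""
    if not samples:
        raise ValueError("empty signal")
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def nasalance_percent(nasal, oral):
    """Nasal amplitude as a percentage of nasal + oral amplitude."""
    n, o = rms(nasal), rms(oral)
    if n + o == 0:
        raise ValueError("both channels are silent")
    return 100.0 * n / (n + o)

# Toy signals, not recordings: the oral channel is three times louder.
print(nasalance_percent([1, -1, 1, -1], [3, -3, 3, -3]))  # 25.0
```

Computed per token, a measure like this could be paired with the articulatory contrasts to ask whether speakers with larger oral-configuration changes also show different nasalance.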
Conclusions
Overall, we conclude that typical speakers may use oral configurations that differ habitually between nasal and non-nasal contexts, particularly for the vowel /æ/. However, this effect is otherwise not consistent across speakers or vowels, perhaps accounting for previously conflicting results. Although velar control is often conceptualized as binary (open or closed), there are in fact degrees of nasal-oral coupling, and the degree of coupling correlates non-linearly with listener perceptions of nasalization (Kummer et al., 2003, 1992). Some previous acoustic and modelling work has suggested that speakers may make consistent changes in their oral articulation, but previous kinematic studies had inconsistent results and used just a few speakers.
This result is consistent with biomechanical accounts of vowel production that suggest that /æ/ has a larger range of acceptable tongue positions than more peripheral vowels. The fact that all participants moved their tongues in the same direction and largely to a similar extent suggests that this perceptual/articulatory allowance for variability does enable participants to consistently modulate their oral configurations. It is not yet clear what mechanisms might underlie these modulations, but future work can disentangle these effects, particularly in more central vowels, and further elucidate the complexity of oral-velar motor control.
Supplementary Material
Highlights.
Vowel nasalization is not contrastive in English
English speakers do not make consistent changes between nasal and oral corner vowels /i, ɑ, u/
However, they do change tongue position between nasal and oral /æ/ (e.g., man vs. pat)
Acknowledgments
The authors wish to thank Talia Mittelman, Jackson Lee, and Jay Bohland for their assistance setting up and running this experiment. We further wish to thank Mark Tiede for sharing his scripts for filtering and re-referencing electromagnetic articulography data. This research was supported in part by National Institutes of Health – National Institute on Deafness and Other Communication Disorders via grants: F31 DC014872 and F32 DC017637 (G. Cler), R01 DC016270 (C. Stepp), P50 DC015446 (R. Hillman), and R01 DC016270 (C. Stepp and F. Guenther).
Footnotes
Declaration of interests
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
References
- Arai T, 2005. Comparing tongue positions of vowels in oral and nasal contexts, in: Interspeech 2005. Lisbon, Portugal, pp. 1033–1036.
- Barlaz M, Shosted R, Fu M, Sutton B, 2018. Oropharygneal articulation of phonemic and phonetic nasalization in Brazilian Portuguese. J. Phon. 71, 81–97. 10.1016/j.wocn.2018.07.009
- Berry JJ, 2011. Accuracy of the NDI Wave Speech Research System. J. Speech Lang. Hear. Res. 54, 1295. 10.1044/1092-4388(2011/10-0226)
- Bladon RAW, Al-Bamerni A, 1976. Coarticulation resistance in English /l/. J. Phon. 4, 137–150. 10.1016/s0095-4470(19)31234-3
- Boersma P, Weenink D, 2015. Praat: doing phonetics by computer.
- Carignan C, Shosted R, Shih C, Rong P, 2011. Compensatory articulation in American English nasalized vowels. J. Phon. 39, 668–682. 10.1016/j.wocn.2011.07.005
- Chen MY, 1997. Acoustic correlates of English and French nasalized vowels. J. Acoust. Soc. Am. 102, 2360–2370. 10.1121/1.419620
- Cho T, Kim D, Kim S, 2017. Prosodically-conditioned fine-tuning of coarticulatory vowel nasalization in English. J. Phon. 64, 71–89. 10.1016/j.wocn.2016.12.003
- Cler GJ, Lee JC, Mittelman T, Stepp CE, Bohland JW, 2017. Kinematic analysis of speech sound sequencing errors induced by delayed auditory feedback. J. Speech Lang. Hear. Res. 60, 1695–1711. 10.1044/2017_JSLHR-S-16-0234
- Comivi Alowonou K, Liu Z, Wei J, Honda K, Lu W, Dang J, 2019. Lingual and Acoustic Differences in EWE Oral and Nasal Vowels. 10.1145/3316551.3316569
- Cunha C, Silva S, Oliveira C, Teixeira A, Martins P, Joseph A, Frahm J, 2019. On the Role of Oral Configurations in European Portuguese Nasal Vowels. 10.21437/Interspeech.2019-2232
- Delvaux V, Demolin D, Harmegnies B, Soquet A, 2008. The aerodynamics of nasalization in French. J. Phon. 36, 578–606. 10.1016/j.wocn.2008.02.002
- Delvaux V, Metens T, Soquet A, 2002. French nasal vowels: articulatory and acoustic properties, in: 7th International Conference on Spoken Language Processing. pp. 53–56.
- Dinkin A, 2011. Nasal short-a systems vs. the northern cities shift. Univ. Pennsylvania Work. Pap. Linguist. 17.
- Duncan D, 2016. “Tense” /æ/ is still lax: A phonotactics study. Proc. Annu. Meet. Phonol. 3. 10.3765/amp.v3i0.3653
- Engwall O, Delvaux V, Metens T, 2006. Interspeaker variation in the articulation of nasal vowels. Proc. 7th ISSP 3–10.
- Fant G, 1971. Acoustic theory of speech production: with calculations based on X-ray studies of Russian articulations. Walter de Gruyter.
- Gick B, Allen B, Roewer-Després F, Stavness I, 2017. Speaking tongues are actively braced. J. Speech Lang. Hear. Res. 60, 494–506. 10.1044/2016_JSLHR-S-15-0141
- Hajek J, 1997. Universals of Sound Change in Nasalization. Blackwell Publishers, Oxford.
- Hixon TJ, Weismer G, Hoit JD, 2018. Preclinical speech science: Anatomy, physiology, acoustics, perception, 3rd ed. Plural Publishing.
- House AS, Stevens KN, 1956. Analog studies of the nasalization of vowels. J. Speech Hear. Disord. 21, 218–232.
- Hunter E, 2016. Speech Adaptation to Kinematic Recording Sensors. Theses Diss. Brigham Young University, Provo.
- Kummer AW, Briggs M, Lee L, 2003. The relationship between the characteristics of speech and velopharyngeal gap size. Cleft Palate-Craniofacial J. 40, 590–596.
- Kummer AW, Curtis C, Wiggs M, Lee L, Strife JL, 1992. Comparison of velopharyngeal gap size in patients with hypernasality, hypernasality and nasal emission, or nasal turbulence (rustle) as the primary speech characteristic. Cleft Palate-Craniofacial J. 29, 152–156.
- Leeper HA, Rochet AP, MacKay IRA, 1992. Characteristics of nasalance in Canadian speakers of English and French, in: Second International Conference on Spoken Language Processing (ICSLP’92). Banff, Alberta, Canada, pp. 49–52.
- Liljencrants J, Lindblom B, 1972. Numerical Simulation of Vowel Quality Systems: The Role of Perceptual Contrast. Language 48, 839–862. 10.2307/411991
- Maeda S, 1990. Compensatory Articulation During Speech: Evidence from the Analysis and Synthesis of Vocal-Tract Shapes Using an Articulatory Model, in: Speech Production and Speech Modelling. Springer Netherlands, pp. 131–149. 10.1007/978-94-009-2037-8_6
- Mayo R, Floyd LA, Warren DW, Dalston RM, Mayo CM, 1996. Nasalance and nasal area values: Cross-racial study. Cleft Palate-Craniofacial J. 33, 143–149. 10.1597/1545-1569_1996_033_0143_nanavc_2.3.co_2
- Moll KL, 1962. Velopharyngeal Closure on Vowels. J. Speech Hear. Res. 5, 30–37. 10.1044/jshr.0501.30
- Oliveira C, Martins P, Silva S, Teixeira A, 2012. An MRI study of the oral articulation of European Portuguese nasal vowels, in: INTERSPEECH 2012. Portland, OR, pp. 2690–2693.
- Perkell JS, 1996. Properties of the tongue help to define vowel categories: Hypotheses based on physiologically-oriented modeling. J. Phon. 24, 3–22. 10.1006/jpho.1996.0002
- Perkell JS, 1979. On the nature of distinctive features: Implications of a preliminary vowel production study. Front. Speech Commun. 365–380.
- Perkell JS, Guenther FH, Lane H, Matthies ML, Perrier P, Vick J, Wilhelms-Tricarico R, Zandipour M, 2000. A theory of speech motor control and supporting data from speakers with normal hearing and with profound hearing loss. J. Phon. 28, 233–272. 10.1006/jpho.2000.0116
- Perkell JS, Matthies ML, Svirsky MA, Jordan MI, 1993. Trading relations between tongue-body raising and lip rounding in production of the vowel /u/: A pilot “motor equivalence” study. J. Acoust. Soc. Am. 93, 2948–2961. 10.1121/1.405814
- Rong P, Kuehn D, 2012. The effect of articulatory adjustment on reducing hypernasality. J. Speech Lang. Hear. Res. 55, 1438. 10.1044/1092-4388(2012/11-0142)
- Rong P, Kuehn DP, 2010. The effect of oral articulation on the acoustic characteristics of nasalized vowels. J. Acoust. Soc. Am. 127, 2543–2553. 10.1121/1.3294486
- Savariaux C, Perrier P, Orliaguet J-P, Schwartz J-L, 1999. Compensation strategies for the perturbation of French [u] using a lip tube. II. Perceptual analysis. J. Acoust. Soc. Am. 106, 381–393. 10.1121/1.427063
- Shosted R, Carignan C, Rong P, 2012. Managing the distinctiveness of phonemic nasal vowels: Articulatory evidence from Hindi. J. Acoust. Soc. Am. 131, 455–465. 10.1121/1.3665998
- Solé M-J, 1992. Phonetic and Phonological Processes: The Case of Nasalization. Lang. Speech 35, 29–43. 10.1177/002383099203500204
- Stevens KN, 1998. Acoustic Phonetics. MIT Press, Cambridge, MA. 10.7551/mitpress/1072.001.0001
- Stevens KN, Fant G, Hawkins S, 1987. Some acoustic and perceptual correlates of nasal vowels, in: Channon R, Shockey L (Eds.), In Honor of Ilse Lehiste: Ilse Lehiste Pühendusteos. Foris Publications Holland, Dordrecht, Holland, pp. 241–254.
- Stone M, 1990. A three-dimensional model of tongue movement based on ultrasound and x-ray microbeam data. J. Acoust. Soc. Am. 87, 2207–2217. 10.1121/1.399188
- Styler W, 2017. On the acoustical features of vowel nasality in English and French. J. Acoust. Soc. Am. 142, 2469–2482. 10.1121/1.5008854
- Tamminga M, Zellou G, 2015. Cross-dialectal differences in nasal coarticulation in American English, in: 18th International Congress of Phonetic Sciences. Glasgow, Scotland.
- Tiede M, Bundgaard-Nielsen R, Kroos C, Gibert G, Attina V, Kasisopa B, Vatikiotis-Bateson E, Best C, 2010. Speech articulator movements recorded from facing talkers using two electromagnetic articulometer systems simultaneously. J. Acoust. Soc. Am. 128, 2459. 10.1121/1.3508805
- Zellou G, Tamminga M, 2014. Nasal coarticulation changes over time in Philadelphia English. J. Phon. 47, 18–35. 10.1016/j.wocn.2014.09.002