Nonlinear source–filter coupling in phonation: Vocal exercises

Ingo Titze; Tobias Riede; Peter Popolo

doi:10.1121/1.2832339

2008 Apr;123(4):1902–1915. doi: 10.1121/1.2832339

PMCID: PMC2677316 PMID: 18396999

Abstract

Nonlinear source–filter coupling has been demonstrated in computer simulations, in excised larynx experiments, and in physical models, but not in a consistent and unequivocal way in natural human phonations. Eighteen subjects (nine adult males and nine adult females) performed three vocal exercises that represented a combination of various fundamental frequency and formant glides. The goal of this study was to pinpoint the proportion of source instabilities that are due to nonlinear source–tract coupling. It was hypothesized that vocal fold vibration is maximally destabilized when F₀ crosses F₁, where the acoustic load changes dramatically. A companion paper provides the theoretical underpinnings. Expected manifestations of a source–filter interaction were sudden frequency jumps, subharmonic generation, or chaotic vocal fold vibrations that coincide with F₀–F₁ crossovers. Results indicated that the bifurcations occur more often in phonations with F₀–F₁ crossovers, suggesting that nonlinear source–filter coupling is partly responsible for source instabilities. Furthermore, it was observed that male subjects show more bifurcations in phonations with F₀–F₁ crossovers, presumably because in normal speech they encounter these crossovers less often than females and hence have less practice in suppressing unwanted instabilities.

INTRODUCTION

A hypothesis is being pursued that humans can engage their sound source in the larynx and their vocal tract airways (the filter) in two fundamentally different ways. The first is linear source–filter coupling, where the source frequencies are produced independently of the acoustic pressures in the airways. The glottal airflow in the larynx is produced aerodynamically, with a quasisteady transglottal pressure and a flow pulse that mirrors the time-varying glottal area. The second is nonlinear coupling, where the acoustic airway pressures contribute to the production of frequencies at the source. In the nonlinear case, the transglottal pressure includes a strong acoustic component, much like in woodwind instruments where the airflow through the reed is driven by acoustic pressures of the instrument bore, or in brass instrument playing, where the lip flow is driven by the acoustic pressures in the brass tube (Fletcher, 1979). The major parameter in nonlinear coupling for voiced speech appears to be related to the diameter of the epilaryngeal tube (also known as laryngeal vestibule), which serves to either match or mismatch the output impedance of the glottis to the input impedance of the vocal tract. Weak coupling is obtained when the glottal impedance is high and the epilarynx tube input impedance is low, whereas strong coupling (nonlinear interaction) is obtained when the impedances are comparable.

Some evidence of nonlinear source–filter coupling comes from earlier voice source analysis (Rothenberg, 1981; Fant, 1986), excised larynx experiments (Alipour et al., 2001), and physical model experiments (Chan and Titze, 2006; Zhang et al., 2006). A more extensive discussion and bibliography are given in the companion paper (Titze, 2008). The investigations demonstrated that the addition of a vocal tract filter to the isolated larynx or a vocal fold model lowers phonation threshold pressure and thereby eases the onset of phonation. Analytical calculations and computational simulations are a second source of evidence (Ishizaka and Flanagan, 1972; Titze, 1988; Titze and Story, 1995; Titze, 2004; Chan and Titze, 2006; Zañartu et al., 2007). Those simulations showed that an acoustically inertive supraglottal tract facilitates vocal fold vibration and lowers F₀. By contrast, an acoustically compliant supraglottal tract hinders vocal fold oscillation (sometimes squelching it entirely) and raises F₀ (Titze, 2006a, Chap. 7). A third source of evidence is experiments in which human subjects phonate into tubes, artificially elongating the vocal tract (e.g., Story et al., 2000; Hatzikirou et al., 2006). In those experiments it was shown that instabilities are more likely to occur when F₀ and F₁ cross. What is currently missing is an investigation with a sufficient sample of real human voice production on a variety of vowels. In the current investigation we demonstrate that F₀–F₁ crossovers can occur naturally in the human voice and that instabilities are more likely to occur near such crossovers.

But why this duality of source–filter coupling? The advantage of linear coupling appears to be greater source stability when vowel and F₀ need perceptual clarity. Modes of vibration of the vocal fold tissues are not disturbed by articulatory adjustments, an obvious advantage for speech. Self-sustained oscillation is then based on a mucosal wave that propagates on the vocal fold surface and aerodynamic pressures in the glottis that are in synchrony with the tissue velocity of the vocal folds (Titze, 1988). The vocalist needs only to control the laryngeal configuration and lung pressure to produce the sound (Sundberg et al., 1993; Sundberg and Hogset, 2001; Henrich et al., 2005). Articulation is then merely a modulation of the source harmonic amplitudes. This has been the fundamental assumption in the linear source–filter theory of vowel and voiced consonant production (e.g., Fant, 1960; Stevens, 1998; Schutte and Miller, 1993).

The advantage of nonlinear coupling may be that more output power can be produced because stored energy in the vocal tract is fed back to the source to increase the glottal flow energy. But this may be at the expense of less stability at the source. In some forms of vocal communication, this may not matter. Lower stability leads to a greater variety of source qualities, including cultivated frequency jumps as in a yodel, subharmonics, low frequency modulations at the source, and chaotic vibration. Some of these instabilities may be advantageous in an artistic context (Neubauer et al., 2004), or for survival as in an infant cry (Mende et al., 1990), but they may be considered pathological in a speech context (Hirano, 1981). Source instability due to nonlinear source–filter coupling may be greatly exaggerated when there is a vocal pathology. Asymmetry in the larynx, nodules and polyps, paralysis, and other voice disorders affect the normal modes of vibration of the tissue, which can easily be desynchronized by additional nonlinear coupling to the vocal tract.

Historically, clinicians have used a battery of test utterances for assessment of voice disorders that progress from vowels to isolated syllables or words and then to complete sentences or paragraphs. Test utterances are also useful for monitoring the effectiveness of vocal training. Almost everyone agrees that the tasks must reveal control of fundamental frequency, loudness, and some aspect of vocal quality. But the interactions among respiratory, phonatory, and articulatory components of speech have not been specifically targeted as important components of assessment. Although a collection of vowels and voiced consonants may be part of the test material, there is generally no hypothesis about whether the voice disorder is more affected by one vowel shape than by another.

It is generally thought that steady vowels alone are insufficient to provide a diagnostic “treadmill” for vocalization. They test the stability (or steadiness) of a vocal and articulatory posture, but allow little to be said about interactivity. Such interactivity becomes evident when either source or filter is dynamically changing. Dynamic testing has been proposed by Kent et al. (1987) for speech articulation and by Freund and Büdingen (1978) and Schmidt and Lee (1989) for limb movement, but little has been implemented for voice diagnostics.

To maximize the diagnostic value of test utterances for vocal control, it is suggested here that source–filter interaction exercises may become part of a diagnostic battery. A variety of voice disorders may manifest themselves in the lack of voice control when source harmonics and formant frequencies are forced to interfere with each other. In particular, sudden frequency jumps occur when specific formants and harmonics cross (see the companion paper for theoretical explanation). Often, bifurcations in the vibratory patterns of the vocal folds occur involuntarily at these locations.

The purpose of this study was to test three F₀–F₁ crossover exercises, (1) a fundamental frequency glide at a constant vowel, (2) a vowel glide at a constant fundamental frequency, and (3) a combination fundamental frequency–vowel glide. Fundamental frequency and vowels were chosen such that maximum interaction would likely take place.

METHODS

Subjects

Eighteen volunteers participated in the study, nine females (ranging in age from 25 to 50 with an average of 31) and nine males (ranging in age from 25 to 44 with an average of 31.6). All subjects reported no vocal pathologies. Several claimed that they sing as amateurs, but none had extensive vocal training. Two certified speech-language pathologists assessed their voices as normal, not containing any dysphonia. Experiments were in compliance with guidelines of the NIH and were reviewed and approved by the institutional review boards.

Three vocal exercises

As a first exercise, subjects were asked to produce fundamental frequency (F₀) glides. The pattern was high to low, then low to high, with an intermediate vocal fry. This exercise was produced on four vowels (∕α∕, ∕æ∕, ∕i∕, ∕u∕), with two different starting fundamental frequencies per vowel and two different vocal efforts (soft and loud). Table 1 lists all three exercises and Fig. 1 shows the F₀ glides in musical notation. The vocal fry utterance was elicited between the fundamental frequency glides to estimate the formant frequencies and bandwidths of the vowels, since both measures are most reliably extracted from low F₀ phonations.

Table 1.

Three vocal exercises.


Exercise 1. Pitch glides and reversals (at least two octaves with vocal fry included)
1. C5 to F3, vocal fry, F3 to C5 with steady vowels ∕i∕, ∕u∕, ∕α∕, and ∕æ∕, soft and loud, males and females
2. Repeat with C6 to F4 for females, C4 to F2 for males, all else the same
Exercise 2. Vowel glides and reversals
1. C5, ∕i∕-∕æ∕-∕i∕ and ∕u∕-∕α∕-∕u∕, soft and loud, males and females
2. Repeat with C6 for females, C4 for males, all else the same
Exercise 3. Simultaneous vowel and pitch glides
1. C5 to F3, vocal fry, F3 to C5 while vowels change in the sequence ∕i∕-∕æ∕-∕i∕ and ∕u∕-∕α∕-∕u∕, soft and loud, males and females
2. Repeat with C6 to F4 for females, C4 to F2 for males, all else the same

Fig. 1. Musical notation of F₀ glides used in Exercises 1 and 3, and vowel changes (far right) drawn at an approximate height so that F₁ corresponds to fundamental frequency on the left.

Females phonated the two higher fundamental frequency glides and males the two lower fundamental frequency glides, such that the middle glide was common to both genders. Subjects were prompted with computer simulated signals that had no source–filter coupling (see the companion paper, Titze, 2008, for the computer model). A spectrogram of the prompts is shown in Fig. 2, with Fig. 2A representing the prompt for the first exercise. The first formant frequency location is represented by the gray dots in Fig. 2 and the sloping lines are the harmonics.

Fig. 2. Spectrograms of computer-generated stimuli that were used to prompt the subjects. (A) Exercise 1, (B) Exercise 2, and (C) Exercise 3. First formant (F₁) is indicated by gray dots.

The second exercise consisted of two vowel glides and their returns (from ∕i∕ to ∕æ∕ and back to ∕i∕; and from ∕u∕ to ∕α∕ and back to ∕u∕). These vowel glides were phonated in succession on two constant fundamental frequencies (C5 and C6 for females and C5 and C4 for males). Returning to the musical notation of Fig. 1, this would be one sustained note (e.g., C5, second note from the top) while vowel formant frequencies are changing upward as shown on the right side of the graph. Two vocal efforts were used (soft and loud) for all exercises. Figure 2B illustrates a spectrographic version of a computer simulation that served to prompt a subject. Note that the harmonics remain constant while F₁ follows a low-high-low trajectory.

The third exercise consisted of simultaneous vowel and fundamental frequency changes. In Fig. 1, the fundamental frequency glides (glissandi) were again used, but this time with the simultaneous vowel changes as shown to the right. The spectrographic version of the prompt is shown in Fig. 2C. F₀ and F₁ were moved in opposite directions and were forced to cross. Subjects were instructed to start with an ∕i∕ vowel (F₁≈300 Hz) and change to an ∕æ∕ vowel (F₁≈800 Hz) while gliding fundamental frequency downward, as in Exercise 1, then change back to an ∕i∕ vowel while gliding fundamental frequency upward. Intermediate vocal fry was also elicited. Starting fundamental frequencies were C5 (523 Hz) and C6 (1047 Hz) for females and C5 (523 Hz) and C4 (262 Hz) for males. This exercise was repeated for the ∕u∕-∕α∕-∕u∕ vowel transition. Each phonation was produced at two different vocal efforts (soft and loud).
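
The frequencies quoted for the starting notes follow from twelve-tone equal temperament. The sketch below reproduces them, assuming a tuning reference of A4 = 440 Hz (the text does not state the reference; the function name `note_to_hz` is illustrative).

```python
# Semitone offsets of the natural notes within an octave (C = 0).
SEMITONE = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}

def note_to_hz(name: str, a4: float = 440.0) -> float:
    """Equal-tempered frequency of a natural note such as 'C5'.

    Assumes A4 = 440 Hz unless another reference is passed in.
    """
    letter, octave = name[0], int(name[1:])
    midi = 12 * (octave + 1) + SEMITONE[letter]   # MIDI note number, C4 = 60
    return a4 * 2.0 ** ((midi - 69) / 12)          # A4 is MIDI note 69

# Start and end notes of the glides listed in Table 1.
for note in ("C6", "C5", "C4", "F4", "F3", "F2"):
    print(f"{note}: {note_to_hz(note):.0f} Hz")
```

Under this assumption C4, C5, and C6 round to 262, 523, and 1047 Hz, the values given in the text.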

For each of the exercises described, subjects were asked to produce three tokens for statistical power; however, some subjects were only able to produce one or two tokens. Actual sample size is given in Table 2.
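
The number of phonations expected from a subject who produced all three tokens in every condition can be tallied from the design described above. The following is a sketch only; the condition labels are ASCII stand-ins chosen for illustration.

```python
from itertools import product

# ASCII stand-ins for the vowel symbols; labels are illustrative only.
STEADY_VOWELS = ["a", "ae", "i", "u"]
VOWEL_GLIDES = ["i-ae-i", "u-a-u"]
STARTING_PITCHES = ["first", "second"]   # two starting F0 values per subject
EFFORTS = ["soft", "loud"]
TOKENS_PER_CONDITION = 3

def planned_tokens() -> dict:
    """Phonations per subject if every condition yields three tokens."""
    e1 = list(product(STEADY_VOWELS, STARTING_PITCHES, EFFORTS))
    e2 = list(product(VOWEL_GLIDES, STARTING_PITCHES, EFFORTS))
    e3 = list(product(VOWEL_GLIDES, STARTING_PITCHES, EFFORTS))
    return {name: len(conds) * TOKENS_PER_CONDITION
            for name, conds in (("E1", e1), ("E2", e2), ("E3", e3))}

print(planned_tokens())
```

The totals of 48, 24, and 24 equal the largest entries in Table 2; smaller entries belong to subjects who produced fewer tokens.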

Table 2.

Information and sample sizes for each subject for Exercises E1, E2, E3.

Subject	Sex	E1	E2	E3
1	M	48	24	24
2	M	47	24	24
3	M	32	15	17
4	F	36	14	16
5	F	27	13	17
6	F	32	16	18
7	M	48	24	24
8	F	31	16	16
9	F	32	24	24
10	F	48	24	24
11	M	48	24	24
12	F	48	24	24
13	F	48	24	24
14	M	48	24	24
15	M	47	24	24
16	F	48	24	24
17	M	48	24	24
18	M	34	16	18

Recordings

Recordings were conducted in a single-wall IAC sound isolation booth. Subjects wore a head-mounted microphone (Countryman Associates omnidirectional B3 Lavalier; CSL Model 4400 pre-amp) mounted on a wire boom attached to a plastic frame, worn like a pair of eyeglasses. The microphone element was about 5 cm from the mouth and slightly to the side, out of the airstream.

The microphone signal was recorded with CUBASE SE software (version 3.0.3) on a PC. The recording level was adjusted to achieve the maximum signal strength and to avoid clipping. All phonations were digitized at a 44.1 kHz sampling rate and 16 bit quantization.

A Brüel & Kjaer 2238 sound level meter, set to linear frequency weighting, was positioned at a distance of 30 cm from the mouth. The sound level meter was used to visually obtain a sound pressure level reading at the outset of the recording session, while the subject phonated on ∕α∕ at a high and low fundamental frequency and loud and soft intensity, for the purpose of calibrating the microphone signal to SPL at 30 cm. (Sound pressure levels are not discussed in this paper, however.)

The modeled vocalizations (Fig. 2) were generated with the SPEAK program (Titze, 2006a, Chap. 5) and were played back over a loudspeaker in the booth prior to the subjects performing each task, as a first auditory cue for the desired smoothness of fundamental and formant frequency change. In addition, the investigator was present in the sound booth during the vocal tasks to help the subject find the proper vowels and starting and ending fundamental frequency, if necessary. The vowels ∕i∕, ∕æ∕, ∕α∕, and ∕u∕ were announced (speech-like) by the investigator prior to each task. An electronic keyboard (Casio® Casiotone MT-35) was used to give the starting fundamental frequency as often as necessary for repeat tokens. The actual starting fundamental frequency did vary within and between subjects for particular exercises. No subject was specifically forced to phonate at the instructed starting fundamental frequency. The instructions were given only before the start of each token of the exercises. No corrections were attempted during the exercise.

Data analysis

Three bifurcations of vocal fold vibration were considered in this work, namely frequency jumps, subharmonics, and deterministic chaos (Fig. 3 shows stylistic sketches for two harmonics in a spectrogram). Biphonation, a fourth nonlinear phenomenon, was not found in any phonations. Each phonation was examined for the occurrence of those phenomena through visual inspection of narrow-band spectrograms (512-point Hanning window) and associated Fourier frequency spectra. Frequency jumps are sudden F₀ changes in which vibration rate moves up or down abruptly and discontinuously, and are qualitatively different from continuous, smooth F₀ changes (Fig. 3, example I). Subharmonics are additional spectral components that can suddenly appear at integer fractional values of an identifiable F₀ (e.g., F₀∕2, F₀∕3, and so on) and as harmonics of these values. The result is that energy can appear at evenly spaced intervals below F₀ and between adjacent harmonics throughout the frequency spectrum (Fig. 3, example II). While the vibration pattern of the vocal folds is still regular in these cases, it is characterized by periods that are multiples of the F₀ period.
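
The spacing of subharmonic components described above can be illustrated numerically. The sketch below lists the spectral lines of a period-n regime and picks out those that are not harmonics of F₀; the function names and the example value of 300 Hz are illustrative assumptions, not measurements from this study.

```python
def subharmonic_components(f0: float, n: int, f_max: float) -> list:
    """All spectral lines k * f0 / n up to f_max for a period-n regime."""
    step = f0 / n
    count = int(f_max // step)
    return [k * step for k in range(1, count + 1)]

def new_components(f0: float, n: int, f_max: float) -> list:
    """Lines of the period-n regime that fall between the harmonics of f0."""
    step = f0 / n
    count = int(f_max // step)
    return [k * step for k in range(1, count + 1) if k % n != 0]

# Period doubling (F0/2) at a hypothetical F0 of 300 Hz, up to the third harmonic.
print(new_components(300.0, 2, 900.0))
```

For period doubling the new lines sit halfway between adjacent harmonics, and for F₀∕3 two new lines appear in each gap.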

Fig. 3. Sketches of narrow-band spectrograms of the three bifurcations considered in this study. In each example two harmonics (F₀ and 2F₀) are indicated. I: Two subsequent frequency jumps. II: Subharmonics. III: Deterministic chaos.

Deterministic chaos refers to episodes of nonrandom noise. This chaotic noise is technically distinguishable from random noise by the number of dimensions needed to describe it (Tokuda et al., 2002). An alternative way to decide whether a noisy segment can be considered deterministic chaos is to evaluate characteristics visible in narrow-band spectrograms (Herzel, 1998), including sudden onset and offset, preceding or following subharmonics, and harmonic “windows” occurring in otherwise noisy segments (Fig. 3, example III).

The following parameters were measured in each phonation: maximum and minimum fundamental frequency at the beginning, the middle, and the end of the phonation; first and second formant frequency. The measurement of those parameters allowed the decision whether or not an F₀–F₁ crossover was present (Fig. 4). Formant bandwidth was measured in the middle of the phonations of Exercise 1 (the vocal fry portion). Because energy loss to the subglottal system is minimal in vocal fry (which has a long glottal closure), the bandwidths measured there were expected to underestimate those of the glides.

Fig. 4. Schematics of fundamental frequency contours around a steady formant. Three relationships between F₀ and F₁ were found in phonations of Exercise 1. (A) F₀ and F₁ crossed, (B) F₀ and F₁ came within 100 Hz of each other at some point during the phonation, (C) F₀ and F₁ were never less than 101 Hz apart at any point during the phonation. Only (A) and (B) counted as “crossover present.”
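
The three-way classification in this caption can be written as a simple rule. The sketch below assumes a sampled F₀ contour in Hz and a single steady F₁ value; the function names are illustrative, and a contour that merely touches F₁ is counted here as a crossing, which is an assumption the caption does not address.

```python
def classify_crossover(f0_contour, f1, margin_hz=100.0):
    """Return 'A' (F0 crossed F1), 'B' (came within margin_hz), or 'C' (neither)."""
    diffs = [f0 - f1 for f0 in f0_contour]
    # A crossing means F0 lies above F1 at some samples and below it at others.
    crossed = any(d == 0 for d in diffs) or (min(diffs) < 0 < max(diffs))
    if crossed:
        return "A"
    if min(abs(d) for d in diffs) <= margin_hz:
        return "B"
    return "C"

def crossover_present(f0_contour, f1):
    """Cases A and B both count as 'crossover present'."""
    return classify_crossover(f0_contour, f1) in ("A", "B")

# Hypothetical descending glide past a steady F1 of 350 Hz.
print(classify_crossover([500.0, 400.0, 300.0, 200.0], 350.0))
```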

Additionally, we measured the higher and lower fundamental frequencies of a frequency jump; the fundamental frequency before the onset of a subharmonic; the onset of a chaotic segment; and we noted the type of subharmonic event (F₀∕2, F₀∕3, and so on).

All measurements were performed using the sound analysis software PRAAT (Boersma & Weenink, 2007). Linear predictive coding (autocorrelation procedure) was used to track formants. Formant bandwidth is the difference in frequency between the two points on either side of the peak (the frequency with peak amplitude $A$) at which the amplitude is $A/\sqrt{2}$ (corresponding to 3 dB down from the peak).
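
The bandwidth definition above can be applied directly to a sampled amplitude spectrum. The following sketch is not the PRAAT procedure; it assumes a single dominant peak, amplitudes on a linear scale, and linear interpolation between samples, and the name `half_power_bandwidth` is illustrative.

```python
import math

def half_power_bandwidth(freqs, amps):
    """Width in Hz between the two points around the largest peak where the
    amplitude has fallen to peak / sqrt(2), found by linear interpolation."""
    peak = max(range(len(amps)), key=lambda i: amps[i])
    target = amps[peak] / math.sqrt(2.0)

    def crossing(step):
        i = peak
        # Walk away from the peak while the next sample is still above target.
        while 0 <= i + step < len(amps) and amps[i + step] > target:
            i += step
        j = i + step
        if not 0 <= j < len(amps):
            raise ValueError("spectrum does not fall 3 dB below the peak")
        # Interpolate between the last sample above and the first at or below target.
        t = (amps[i] - target) / (amps[i] - amps[j])
        return freqs[i] + t * (freqs[j] - freqs[i])

    return crossing(+1) - crossing(-1)

# Synthetic resonance centred at 500 Hz with a 100 Hz bandwidth (test data only).
freqs = [float(f) for f in range(200, 801)]
amps = [1.0 / math.sqrt(1.0 + ((f - 500.0) / 50.0) ** 2) for f in freqs]
print(half_power_bandwidth(freqs, amps))
```

The factor $1/\sqrt{2}$ in amplitude corresponds to $20\log_{10}(1/\sqrt{2}) \approx -3.01$ dB, which is why this width is called the 3 dB bandwidth.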

Statistics

Source instabilities during a fundamental frequency or vowel glide can occur either as a result of mode of vibration changes in the sound source (e.g., a register change induced by changes in muscle activation), or as a result of nonlinear source–tract coupling. In a nonlinear source–filter system, source instabilities are expected when F₀ and F₁ cross, and therefore source instabilities can serve as indicators of nonlinear source–tract coupling. To sort out the instabilities that result from nonlinear source–tract coupling, we statistically compared the occurrences of instabilities in phonations without F₀–F₁ crossovers to those with F₀–F₁ crossovers. If instabilities were to result only from a source-specific mode change, we would expect no differences between the two samples. However, if instabilities do result from a nonlinear source–filter interaction, we would expect more source instabilities in phonations with F₀–F₁ crossovers. Nonparametric tests were used for comparison of averages of matched (Wilcoxon test) or unmatched (Mann–Whitney test) samples.
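
For readers unfamiliar with the two tests, the sketch below computes their rank statistics in plain Python. It illustrates the standard definitions only and is not the software used in this study; it stops at the statistics and does not derive Z or P values, and the sample values in the example are made up.

```python
def wilcoxon_signed_rank(a, b):
    """Return (W_plus, W_minus) for matched samples, dropping zero differences."""
    diffs = [x - y for x, y in zip(a, b) if x != y]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied absolute differences.
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg = (i + j) / 2.0 + 1.0          # average rank shared by the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus

def mann_whitney_u(x, y):
    """Smaller of the two U statistics for unmatched samples x and y."""
    u_x = 0.0
    for a in x:
        for b in y:
            if a > b:
                u_x += 1.0
            elif a == b:
                u_x += 0.5                 # ties count as half
    return min(u_x, len(x) * len(y) - u_x)

# Made-up per-subject percentages, for illustration only.
with_crossover = [50.0, 70.0, 90.0]
without_crossover = [40.0, 90.0, 60.0]
print(wilcoxon_signed_rank(with_crossover, without_crossover))
```

In the matched case each subject contributes one pair (with and without crossover); in the unmatched case the male and female groups are compared as separate samples.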

RESULTS

Exercise 1: Fundamental frequency glide on steady vowels

Frequency jumps were the most commonly observed bifurcation type. They were found in 21% of all phonations across all individuals (s.d.=14%; range: 0–42%; N=18 subjects). The majority of frequency jumps were downward on the descending F₀ glide and upward on the ascending F₀ glide (161 cases out of 167). Examples are seen in Figs. 5A, 5C at the first and third arrows. Frequency jumps from all 18 subjects showed a mean frequency change of 31 Hz (s.d.=20 Hz; range: 0–79 Hz), or about 2 semitones, for the descending fundamental frequency. Subharmonics were found in 14% of all phonations across individuals (s.d.=9%; range: 2%–35%; N=18 subjects). Examples are seen at the second arrow in Fig. 5A and at the second arrow in Fig. 5B. Chaotic segments were found in 3% of all phonations across individuals (s.d.=5%; range: 0–15%; N=18 subjects).
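
The size in semitones of a jump given in Hz depends on the frequency at which the jump occurs. The sketch below shows the conversion; the starting frequency of 280 Hz in the example is a hypothetical value chosen for illustration, not a figure reported here.

```python
import math

def semitones(f_from: float, f_to: float) -> float:
    """Signed interval in equal-tempered semitones from f_from to f_to."""
    return 12.0 * math.log2(f_to / f_from)

# A 31 Hz downward jump from a hypothetical 280 Hz.
print(round(semitones(280.0, 249.0), 2))
```

At this assumed starting frequency the jump comes out close to two semitones downward; the same 31 Hz step would be a larger interval at a lower F₀ and a smaller one at a higher F₀.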

Fig. 5. Examples of bifurcations in Exercise 1. Time axes are slightly variable. Location of the first formant (F₁) is indicated by a horizontal line overlaid on the spectrograms. (A) Phonation of a male subject. Two frequency jumps are noted (arrows 1 and 3). A short subharmonic regime starts at arrow 2. (B) Phonation of an additional male subject. Source instabilities are not frequency jumps but only a slight perturbation of the descending F₀ trajectory (arrow). (C), (D) Phonations of two female subjects. Instabilities are indicated by arrows.

Crossovers occurred predominantly with ∕i∕ and ∕u∕ vowels because their lower F₁ was more likely to lie in the path of the gliding F₀. The proportions of crossovers for each vowel (mean ± s.d.) in N=9 women were: ∕α∕: 11.0±2.5%; ∕æ∕: 9.6±5.6%; ∕i∕: 24.7±1%; ∕u∕: 24.9±1%. For N=9 men they were: ∕α∕: 1.7±2.9%; ∕æ∕: 2.0±4.2%; ∕i∕: 20.0±3%; ∕u∕: 18.3±4%. The smaller percentage of crossovers in men comes from the fact that men started half of the F₀ glides an octave lower (C4; 262 Hz) while females started half of the glides an octave higher (C5; 523 Hz). All glides ended in vocal fry, which is below all formants. Hence, there was a greater likelihood that females always crossed F₁, while many males did not have fundamental frequencies above or near F₁ for the high-F₁ vowels ∕α∕ and ∕æ∕. In women, F₀–F₁ crossovers occurred in 70% of all phonations (s.d.=7%; range: 58%–78%; N=9). In men, F₀–F₁ crossovers occurred in only 42% of all phonations (s.d.=10%; range: 31%–66%; N=9).

Independent of loudness, the overall mean proportion of instabilities in phonations with F₀–F₁ crossover was 54% across male subjects (s.d.=17; range: 23%–78%; N=9), whereas without F₀–F₁ crossover it was 35% (s.d.=15; range: 7%–59%; N=9). This difference was statistically significant (Wilcoxon; Z=2.19; P<0.05; N=9). Across females, on the other hand, the overall mean proportion of source instabilities in phonations with F₀–F₁ crossover was 27% (s.d.=14; range: 11%–52%; N=9), whereas without F₀–F₁ crossover it was 24% (s.d.=30; range: 0%–90%; N=9). This difference is not significant (Wilcoxon; Z=0.59; P=0.55; N=9). The male–female difference in instabilities with F₀–F₁ crossover (54% vs 27%) was significant (Mann–Whitney; U=9; P<0.01; N_F, N_M=9).

Effect of loudness

A sufficient sample (N=75) of crossover and noncrossover phonations within the two categories “loud” and “soft” was available in male high glide and female low glide phonations. The overall mean proportion of source instabilities in loud phonations with F₀–F₁ crossover was 77% in males (s.d.=28; range: 25%–100%; N=9), whereas without F₀–F₁ crossover it was 65% (s.d.=41; range: 0%–100%; N=9). This difference was not statistically significant (Wilcoxon; Z=1.12; P=0.13; N=9). The overall mean proportion of source instabilities in soft phonations with F₀–F₁ crossover was 64% (s.d.=23; range: 37%–100%; N=9), whereas without F₀–F₁ crossover it was 38% (s.d.=36; range: 0%–100%; N=9). This difference in males for soft phonation was significant (Wilcoxon; Z=1.84; P<0.05; N=9).

For females, the overall mean proportion of source instabilities in loud phonations with F₀–F₁ crossover was 35% (s.d.=29; range: 0%–100%; N=9), whereas without F₀–F₁ crossover it was 23% (s.d.=25; range: 0%–75%; N=9). This difference was not significant (Wilcoxon; Z=0.59; P=0.23; N=9). The overall mean proportion of instabilities in soft phonations with F₀–F₁ crossover was 29% (s.d.=28; range: 0%–80%; N=9), whereas without F₀–F₁ crossover it was 20% (s.d.=33; range: 0%–100%; N=9), again not significantly different (Wilcoxon; Z=1.19; P=0.11; N=9). Results did not change when only frequency jumps were considered instead of all three observed instabilities together (frequency jumps, subharmonics, and deterministic chaos).

F₀–F₁ vicinity

In 31 of 167 cases of frequency jumps (18%), the first formant frequency was in a 50 Hz vicinity of the fundamental frequency. In 21 additional cases, F₁ was in a 100 Hz vicinity of F₀. In 23 additional cases, F₁ was in a 200 Hz vicinity of F₀. In the remaining 92 cases, F₁ was more than 200 Hz away from F₀. For an average formant bandwidth of about 100 Hz in vocal fry (which is likely to be an underestimate for the glide phonations) it appears that at least 30% of instabilities occurred inside a formant bandwidth. But even if they occurred outside the bandwidth, the inertive reactance of the vocal tract may still have been large enough to trigger an F₀ change (see the companion paper, Titze, 2008).
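
The counts in this paragraph can be accumulated to see what share of frequency jumps fell within each distance of F₁. The sketch below uses only the counts stated above; the variable and function names are illustrative.

```python
# (upper limit of |F1 - F0| in Hz, number of frequency jumps); None = beyond 200 Hz.
JUMPS_BY_VICINITY = [(50, 31), (100, 21), (200, 23), (None, 92)]

def cumulative_shares(bins):
    """Return the total count and, per bin, (limit, running count, running percent)."""
    total = sum(count for _, count in bins)
    running = 0
    shares = []
    for limit, count in bins:
        running += count
        shares.append((limit, running, 100.0 * running / total))
    return total, shares

total, shares = cumulative_shares(JUMPS_BY_VICINITY)
for limit, running, percent in shares:
    label = f"within {limit} Hz" if limit is not None else "all jumps"
    print(f"{label}: {running} of {total} ({percent:.1f}%)")
```

The four bins sum to the 167 observed jumps, and the 52 jumps within 100 Hz of F₁ are about 31% of the total, which is the basis for the "at least 30%" estimate.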

Discussion of Exercise 1

Exercise 1 delivered at least three new findings. First, source instabilities occur more often in phonations in which F₀–F₁ crossovers are present. This is significant for male phonations. Second, instabilities occur more often in soft voice than loud voice, again primarily among males. Third, when F₀ jumps occur, they are mostly downward on a downgliding F₀ and upward on an upgliding F₀.

Consider the following explanations. When an instability in F₀ occurs near F₁, we expect the proximity of F₀ and F₁ to be on the order of the formant bandwidth, because most of the vocal tract acoustic reactance change occurs in this frequency interval. Figure 6 shows an impedance calculation for a vocal tract in the shape of the vowel ∕u∕. The top panel shows an outline of the vocal tract radius across length, and the bottom panel shows the supraglottal impedance curves in the vicinity of formants F₁ and F₂. (For a detailed discussion of the impedance curves, see the companion paper.) The thick solid curve is the supraglottal reactance, the thin solid line is the resistance, and the dashed curve is the magnitude of the impedance. The formant frequency is where the resistance has its maximum. This is where the reactance is midway between its positive and its negative peak, which is above the zero line because the laryngeal vestibule (epilarynx tube) adds a linear component with a positive slope to the reactance. Reactance above zero is inertive and reactance below zero is compliant. Only the 400–500 and 800–1000 Hz regions have compliant reactance.

Fig. 6. (Color online) Calculation of reactance, resistance, and impedance magnitude (bottom) for a vocal tract shape resembling a ∕u∕ vowel (top).

The bandwidth of the formant is roughly the frequency distance between the peak and the trough in the reactance curve. But note that reactance can still be high (both positive and negative) a considerable distance outside the bandwidth. In our first data set, 31% of the frequency jumps we found occurred when F₀ was in the 100 Hz vicinity of F₁. Estimates of bandwidth values for vowels from this study, and two other studies, are given in Table 3. The wide range in bandwidths across these studies stems from the differences in the methods by which they were obtained. The Fujimura and Lindqvist-Gauffin (1971) values were obtained from a vocal tract transfer function measured with a sweep tone from a transducer applied to the surface of the neck, with the glottis tightly closed, which leads to less energy loss to the subglottic system and thus would account for the lower bandwidth values. The Childers and Wu (1991) values were obtained from a weighted recursive least-squares computation of the vocal tract filter function from the acoustic speech signal, similar to the method of linear prediction coefficients, which could include glottal leakage. Our measurements lie between the values from these two other studies because they were obtained during the vocal fry portion of the phonation. Vocal fry has a relatively long closed phase, but the glottis is not completely closed. On the F₀ glides, bandwidths are expected to be higher because less glottal closure occurs at high F₀, where the phonation register is often falsetto-like. Thus, the assumption of an average 100–200 Hz bandwidth for both males and females at a wide range of F₀ is reasonable. This means that most of the frequency jumps were likely to be triggered by reactance changes.
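
The link between bandwidth and the peak-to-trough distance of the reactance can be checked on a single idealized resonance. The sketch below uses a generic parallel-resonance impedance as a stand-in; it is not the vocal tract calculation of Fig. 6 or of the companion paper, it omits the epilarynx tube contribution, and the resonance frequency and Q value are hypothetical.

```python
def resonance_impedance(f, f_res, q, r_peak=1.0):
    """Resistance and reactance of an idealized parallel resonance at frequency f."""
    delta = q * (f / f_res - f_res / f)        # normalized detuning from resonance
    z = r_peak / complex(1.0, delta)
    return z.real, z.imag

def reactance_extrema(f_res, q, f_lo, f_hi, step=1.0):
    """Frequencies of the reactance peak and trough on a grid from f_lo to f_hi."""
    n = int((f_hi - f_lo) / step) + 1
    freqs = [f_lo + i * step for i in range(n)]
    reactances = [resonance_impedance(f, f_res, q)[1] for f in freqs]
    f_peak = freqs[max(range(n), key=lambda i: reactances[i])]
    f_trough = freqs[min(range(n), key=lambda i: reactances[i])]
    return f_peak, f_trough

# Hypothetical formant at 300 Hz with Q = 3, i.e. a bandwidth of 300 / 3 = 100 Hz.
f_peak, f_trough = reactance_extrema(300.0, 3.0, 100.0, 600.0)
print(f_peak, f_trough, f_trough - f_peak)
```

In this idealization the reactance is positive (inertive) below the resonance and negative (compliant) above it, and its peak and trough fall at the half-power frequencies, so their separation equals the bandwidth.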

Table 3.

First formant frequencies (F1) and bandwidths (B1) from phonations of Exercise 1 of our study and from three other studies (PB: Peterson and Barney, 1952; CW: Childers and Wu, 1991; FL: Fujimura and Lindqvist-Gauffin, 1971).

Vowel	F1 Measured (Hz)	F1 PB (Hz)	F1 CW (Hz)	B1 Measured (Hz)	B1 FL (Hz)	B1 CW (Hz)
α—female	839	850	838	138	50	272
α—male	657	730	673	96	41	154
æ—female	840	860	842	128	50	221
æ—male	688	660	645	81	40	145
i—female	407	310	379	79	76	144
i—male	343	270	303	66	59	134
u—female	428	370	410	77	64	132
u—male	374	300	342	91	54	134

Further evidence for this assertion comes from the directions of the frequency change. Inertive reactance lowers F₀ because it effectively adds mass to the oscillating system (vocal folds plus a moving air column). The companion paper (Titze, 2008) gives calculations showing an F₀ drop of about 50 Hz. Compliant reactance raises F₀ because it effectively adds stiffness to the oscillating system. As F₀ moves through the formant in a downward glide, one might first see a small increase in F₀ (due to a small amount of compliant reactance) followed by a sudden, much larger decrease in F₀ because inertive reactance then dominates.
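
The direction of these shifts can be illustrated with a lumped mass-spring analogy, in which the natural frequency scales with the square root of stiffness over mass. This is a simplified sketch and not the calculation in the companion paper; the function name and the 10% load values are hypothetical.

```python
import math

def loaded_hz(f_unloaded, mass_ratio=0.0, stiffness_ratio=0.0):
    """Natural frequency after adding fractional mass and/or stiffness.

    mass_ratio and stiffness_ratio are the added amounts relative to the
    unloaded oscillator (hypothetical values, for illustration only).
    """
    return f_unloaded * math.sqrt((1.0 + stiffness_ratio) / (1.0 + mass_ratio))

f0 = 300.0
print(loaded_hz(f0, mass_ratio=0.10))        # inertive load: effective mass added
print(loaded_hz(f0, stiffness_ratio=0.10))   # compliant load: effective stiffness added
```

Added effective mass lowers the frequency and added effective stiffness raises it, which matches the directions described for inertive and compliant reactance.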

As a second point of discussion, males experience more source instabilities than females in Exercise 1. Anatomically, the most important difference in the vocal system of males is a 60% greater vibrating vocal fold length, but only a 10%–20% greater vocal tract length. This leads to an overall greater difference between fundamental frequencies and lower formant frequencies in male phonations. Hence F₀–F₁ crossovers are generally less likely to occur in male normal speech. We hypothesize that this lower probability in male phonation may have led to fewer adaptive mechanisms to the destabilizing effects of F₀–F₁ crossovers.

A second factor is registration. Males phonate predominantly in modal register, whereas females have cultivated a more mixed register phonation. The second harmonic is of primary importance in modal register (male phonation) but less so in mixed register. It characterizes the closed portion of the glottal flow waveform (Titze, 2000, Chap. 5, Fig. 5.4), which is more important in male phonation than female phonation. Disturbance of the second harmonic by an additional loading effect (for instance during F₁–2F₀ crossovers) is more likely in males than in females because of the lower F₀. This can lead to larger source instabilities in males.

Exercise 2: Vowel transition on steady fundamental frequency

In this exercise, frequency jumps were found in 15% of all phonations across all subjects (s.d.=13%; range: 0%–41%; N=18). Six individuals showed no frequency jumps. Frequency jumps showed a mean frequency change of 20 Hz (s.d.=9.9 Hz; range: 10–42 Hz, N=12), which amounts to 1–2 semitones. Subharmonics were found in 21% of all phonations across 18 subjects (s.d.=15%; range: 0%–53%; N=18). Chaotic segments were found in 5% of all phonations across all subjects (s.d.=7%; range: 0%–25%; N=18). F₀–F₁ crossovers occurred in 89% of all phonations across all subjects (s.d.=16%; range 50%–100%; N=18).
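The conversion from a jump in hertz to semitones depends on the starting F₀, which is not listed per jump here. The sketch below uses the standard equal-tempered relation of 12 semitones per octave; the starting frequencies of 200 Hz and 300 Hz are assumed for illustration.

```python
import math


def jump_in_semitones(f_before_hz, f_after_hz):
    """Size of a frequency jump in equal-tempered semitones."""
    return 12.0 * math.log2(f_after_hz / f_before_hz)


# A 20 Hz jump from two assumed starting frequencies.
for start_hz in (200.0, 300.0):
    semitones = jump_in_semitones(start_hz, start_hz + 20.0)
    print(f"{start_hz:.0f} Hz -> {start_hz + 20.0:.0f} Hz: {semitones:.2f} semitones")
```

For both assumed starting frequencies a 20 Hz jump falls between 1 and 2 semitones.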

Figures 7A, 7B, 7C are examples of F₀–F₁ crossovers. In Fig. 7A there is an aphonic segment, in Fig. 7B a lowering of F₀ in the middle vowel portion combined with a chaotic segment, and in Fig. 7C an F₀ lowering combined with a period 2 and a period 3 subharmonic segment. In Fig. 7D, F₁ crossed the second harmonic (2F₀), which revealed a small period 2 subharmonic segment near the end of the return crossover. 2F₀–F₁ crossovers represented 11% of the cases.
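One simple way to locate such crossovers in sampled contours is to look for sign changes of the difference between a harmonic of F₀ and F₁. The sketch below is an assumed procedure, not necessarily the one used in this study; the function name and the example contours are hypothetical.

```python
def find_crossovers(f0_hz, f1_hz, harmonic=1):
    """Return frame indices i where harmonic*F0 - F1 changes sign between i and i+1."""
    diffs = [harmonic * f0 - f1 for f0, f1 in zip(f0_hz, f1_hz)]
    crossings = []
    for i in range(len(diffs) - 1):
        if diffs[i] * diffs[i + 1] < 0.0:
            crossings.append(i)
    return crossings


# Hypothetical contours: steady F0 while F1 glides up and back down.
f0_contour = [400.0] * 8
f1_contour = [300.0, 350.0, 450.0, 600.0, 600.0, 450.0, 350.0, 300.0]

print(find_crossovers(f0_contour, f1_contour))              # F0-F1 crossovers
print(find_crossovers(f0_contour, f1_contour, harmonic=2))  # 2F0-F1 crossovers
```

In this made-up example F₀ crosses F₁ twice, once on each formant transition, whereas 2F₀ stays above F₁ throughout.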

Waveform and spectrogram examples of source instabilities in Exercise 2. Time and frequency axes are slightly variable across the four spectrograms. Trajectory of the first formant (F₁) is indicated by a thin solid line overlaid on the spectrograms. (A) Phonation of a female subject. Near the upgliding formant transition there is a short break (arrow 1) and a frequency perturbation (arrow 2). (B) Phonation of a female subject. There is a break near the upgliding formant transition and a frequency jump near the downgliding formant transition. Note that the harmonics become fainter against a background of chaotic noise in the high formant vowel. (C) Phonation of a male subject. F₀∕2 subharmonics start at arrow 1 and F₀∕3 subharmonics start at arrow 2. (D) Phonation of a male subject. A short subharmonic regime starts at arrow 1. This location is the same as the downgliding formant transition. Arrows 2 and 3 in the waveform envelope above point to a sudden amplitude increase near the formant transition, a pattern common in many subjects.

Independent of loudness, the overall mean proportion of source instabilities in phonations with F₀–F₁ crossover was 54% across male subjects (s.d.=28; range: 3%–89%; N=9). Five male subjects had five or more phonations without crossover that served for comparison. Without F₀–F₁ crossovers, the overall mean proportion of source instabilities in phonations was 23% (s.d.=23; range: 0%–43%; N=5). The difference was statistically significant (Wilcoxon; Z=2.02; P<0.05; N=5). For females, the overall mean proportion of source instabilities in phonations with F₀–F₁ crossover was 34% (s.d.=18; range: 8%–61%; N=9). Unfortunately, the female subjects did not produce enough phonations without crossover for statistical comparison. Nevertheless, the proportion of source instabilities in phonations with F₀–F₁ crossover was significantly lower in women (34%) than in men (54%) (Mann–Whitney; U=21.5; P<0.05; N_F, N_M=9).
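The reported Z=2.02 for N=5 can be reproduced arithmetically. The sketch below assumes the Wilcoxon signed-rank statistic was converted to Z with the usual normal approximation (no continuity or tie correction) and that all five paired differences pointed in the same direction; neither assumption is stated in the text.

```python
import math


def wilcoxon_z(n_pairs, positive_rank_sum):
    """Normal-approximation Z for the Wilcoxon signed-rank statistic (no corrections)."""
    mean_w = n_pairs * (n_pairs + 1) / 4.0
    sd_w = math.sqrt(n_pairs * (n_pairs + 1) * (2 * n_pairs + 1) / 24.0)
    return (positive_rank_sum - mean_w) / sd_w


n = 5
max_rank_sum = n * (n + 1) // 2   # all five differences positive: 1+2+3+4+5
z_all_same_direction = wilcoxon_z(n, max_rank_sum)
print(round(z_all_same_direction, 2))  # 2.02
```

Under these assumptions, Z=2.02 is the largest value attainable with five pairs, which would mean that every one of the five subjects showed more instabilities with crossovers than without.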

Effect of loudness

A sufficient sample of noncrossover phonations within the two categories “loud” and “soft” was not available. When F₀–F₁ crossover occurred, however, the overall mean proportion of source instabilities in loud phonations was 59% in all males (s.d.=25; range: 9%–100%; N=9), whereas in soft phonations with F₀–F₁ crossover it was 65% (s.d.=22; range: 33%–100%; N=9). This difference was statistically not significant (Wilcoxon; Z=0.77; P=0.77; N=9). In females, the overall mean proportion of source instabilities in loud phonations with F₀–F₁ crossover was 23% across all subjects (s.d.=15; range: 11%–73%; N=9), whereas in soft phonations with F₀–F₁ crossover it was 43% (s.d.=25; range: 0%–100%; N=9). This difference was statistically significant (Wilcoxon; Z=−2.01; P<0.05; N=9), indicating that soft phonations are more prone to instability than loud phonations.

The location of the instability relative to the position of the crossover was investigated. Exercise 2 was designed to provoke two crossover points, one at the transition from the first vowel to the second vowel and one at the transition back to the first vowel. Figure 8 indicates that instabilities were much more common near the two F₀–F₁ crossovers (“near” means within a 500 ms vicinity of the midpoint of the F₀–F₁ crossover) than in the steady portions of the exercise.
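The "near" criterion can be written as a simple time-window test. The sketch below assumes that the 500 ms vicinity means within 0.5 s on either side of the crossover midpoint, which is one possible reading of the definition; the function name and the example times are hypothetical.

```python
def is_near_crossover(event_time_s, crossover_times_s, window_s=0.5):
    """True if the event lies within window_s seconds of any crossover midpoint."""
    return any(abs(event_time_s - t) <= window_s for t in crossover_times_s)


# Hypothetical crossover midpoints for one phonation of Exercise 2.
crossover_midpoints_s = [2.0, 5.0]

print(is_near_crossover(2.3, crossover_midpoints_s))  # instability close to first crossover
print(is_near_crossover(3.5, crossover_midpoints_s))  # instability in the steady middle vowel
```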

Relative occurrences of frequency jumps at five locations in Exercise 2 across 18 individuals. We tested if the instability occurred either in the steady segments or in a 500 ms vicinity of the gliding formant transitions.

F₀ Symmetry at the F₀–F₁ crossover

Near the F₀–F₁ crossovers we often observed F₀ perturbations that were not sudden fundamental frequency jumps or voice breaks, but rather a dip with a recovery (Fig. 9). A similar phenomenon appeared earlier for Exercise 1 (Figs. 5B, 5D at arrow 1). These dips and recoveries often showed either symmetric or antisymmetric patterns at the two formant transitions of Exercise 2. For example, note the zoomed-in contours in the lowest panels of Fig. 9A. The two arrows indicate symmetric or antisymmetric fundamental frequency perturbations. Figure 9B shows upward (symmetric) perturbations at the transitions, opposite to what was seen in Fig. 9A. In addition, F₀ lowered in the middle portion as reactance changed [Figs. 9A, 9B, 9D]. Figure 9C shows a reduction in vibrato at the vowel transitions. Figure 9D shows a general fundamental frequency lowering in the middle vowel, with a period 2 subharmonic.

Waveforms, spectrograms, and F₀ contours for Exercise 2. Trajectory of the first formant (F₁) is indicated by a thin solid line overlaid on the spectrograms. (A) Phonation of a male. Note that F₀ contours show a symmetry and an antisymmetry pattern near the formant transitions (arrows). (B) Female phonation. Note the increase of F₀ near both vowel transitions, with a phonation break near the second vowel transition. The F₀ increase near both vowel transitions is associated with sudden and short amplitude increases (see wave envelope above spectrum). However, both sustained vowels are similar in amplitude. (C) Male phonation. Note the strong vibrato during sustained vowel phonation, its offset during the vowel transition, as well as the onset of subharmonics at the second vowel transition (arrow). (D) Male phonation. Note the subharmonic onset and offset near the formant transitions (arrows) and the overall F₀ drop throughout the high F₁ (∕æ∕) vowel (between the two arrows).

Discussion of Exercise 2

Exercise 2 delivered at least three findings, which confirmed findings from Exercise 1. First, source instabilities occur more often in phonations in which F₀–F₁ crossovers are present. Second, source instabilities occur more often in male phonations than in female phonations when there are F₀–F₁ crossovers. Third, there are more source instabilities in soft phonations than in loud phonations when there are F₀–F₁ crossovers. The effect of loudness is not consistent across exercises, however: in Exercise 1, males phonating softly were troubled profoundly by F₀–F₁ crossovers, whereas in Exercise 2 it was females who produced more irregularities in soft utterances with F₀–F₁ crossovers.

Although the exercise was designed to keep fundamental frequency constant, many subjects failed to do so. In F₀–F₁ crossover utterances, F₀ often decreased by up to 50 Hz, when F₀ was on the reactive side of F₁, suggesting a direct effect on F₀ during strong nonlinear coupling.

We have little explanation to offer for the sometimes opposite behavior of F₀ perturbation in the middle part (during onset and offset of the second vowel) between subjects [Figs. 9A, 9B], except that there may exist individual-specific correction patterns in reaction to the disturbance when F₀ and F₁ cross. Whether such individual specificity relates to vocal fold morphology or to the motor pattern of the intrinsic laryngeal muscles remains speculation at this stage.

Exercise 3: Simultaneous vowel and fundamental frequency transitions

Frequency jumps were found in 15% of all phonations of Exercise 3 across 18 subjects (s.d.=15%; range: 0%–46%; N=18). Examples are shown in Figs. 10A, 10D. By contrast, Fig. 10B shows an example of no frequency jumps when F₀ crosses F₁. Three individuals showed no frequency jumps, or only an F₀ perturbation without bifurcation, as in Fig. 10C. Frequency jumps from 15 subjects show a mean frequency change of 61 Hz (s.d.=38 Hz; range: 11–127 Hz; N=15), which amounts to about 2–3 semitones. Subharmonics were found in 20% of all phonations across 18 subjects (s.d.=15%; range: 0%–53%; N=18). Chaotic segments were found in 5% of all phonations across 18 subjects (s.d.=7%; range: 0%–25%; N=18). F₀–F₁ crossovers occurred in 89% of all phonations (s.d.=16%; range: 50%–100%; N=18 subjects), in women more often (99%) than in men (78%) (Mann–Whitney; U=5.5; P<0.01; N_F, N_M=9), for reasons given earlier.

Waveforms and spectrogram examples of source instabilities in Exercise 3. Trajectory of the first formant (F₁) is indicated by a horizontal line overlaid on the spectrograms. Time and frequency axes are scaled variably. (A) Phonation of a male subject. A frequency jump is indicated at arrow 1. (B) Phonation of a female subject. In the upgliding F₀ and downgliding formant transition, a subharmonic segment starts when F₁ and F₀ cross. (C) Phonation of a male subject. There is a fundamental frequency perturbation without a jump or a break (arrow 1). (D) Phonation of a female subject. Frequency jumps are present at arrow 1 and arrow 2.

Independent of loudness, the overall mean proportion of source instabilities in phonations with F₀–F₁ crossover was 62% across the male subjects (s.d.=20; range: 26%–89%;N=9). Five male subjects had five or more phonations without crossover that served for comparison. Without F₀–F₁ crossover, the overall mean proportion of source instabilities was 21% across individuals (s.d.=24; range: 0%–50%; N=5). The difference in the proportion of source instabilities in phonations with and without F₀–F₁ crossovers was statistically significant (Wilcoxon; Z=2.02; P<0.05; N=5). For females, the overall mean proportion of source instabilities in phonations with F₀–F₁ crossover was 30% (s.d.=16; range: 8%–59%; N=9). Female subjects did not produce enough phonations without crossover to make a statistical comparison. The proportion of source instabilities in phonations with F₀–F₁ crossover was significantly less in women (30%) than men (62%) (Mann–Whitney; U=8.5; P<0.01; N_F, N_M=9).

Effect of loudness

A sufficient sample of noncrossover phonations within the two categories “loud” and “soft” was not available. For crossovers, males showed an overall mean proportion of source instabilities in loud phonations of 59% (s.d.=25; range: 9%–100%; N=9), whereas in soft phonations it was 63% (s.d.=21; range: 33%–100%; N=9). The difference was statistically not significant (Wilcoxon; Z=−0.47; P=0.63; N=9). For females the overall mean proportion of source instabilities in loud phonations with F₀–F₁ crossover was 23% (s.d.=16; range: 0%–50%; N=9), whereas in soft phonations it was 45% (s.d.=27; range: 0%–77%; N=9). This difference was statistically significant (Wilcoxon; Z=2.25; P<0.05; N=9).

Discussion of Exercise 3, and comparative data

The design of the exercise did not allow the comparison between crossover and noncrossover phonations. However, results of Exercise 3 suggest that males seem more susceptible to producing source instabilities in crossover phonation, confirming findings from Exercises 1 and 2. Exercise 3 also confirmed that soft phonations are more susceptible to instabilities than loud phonations when F₀ and F₁ cross.

Frequency jumps, which were the most frequent instability, were larger in Exercise 3 than in either Exercise 1 or Exercise 2. Specifically, the frequency jumps of 2–3 semitones were more than twice as large as those of Exercise 2, where vowel changes alone were targeted. Exercise 2 was presumably produced with constant laryngeal muscle activations to keep F₀ constant, thereby resisting F₀ changes. The larger frequency jumps in Exercise 3 may be attributable to two motor patterns (intrinsic laryngeal muscles and vocal tract configuration) changing simultaneously. At the F₀–F₁ crossing, the intrinsic laryngeal muscles are programmed to continuously change F₀, but the vocal tract impedance is disturbing the normal vocal fold vibrations. Somato-sensory feedback in the vocal fold muscles has not been consistently found (Loucks et al., 2005). Whether or not a feedback mechanism is responsible for differences in the size of frequency jumps (via mucosal mechanoreceptors), as opposed to a passive biomechanical mechanism, remains to be investigated.

General discussion

The exercises were designed to control for either vocal tract changes (Exercise 1) or for source changes (Exercise 2), or both (Exercise 3). With human subjects, however, the source and vocal tract changes never occur completely in isolation because supraglottal tissues and laryngeal tissues are connected and often influence one another, even if the attempt is to keep one or the other unchanged. For example, fundamental frequency changes can be associated with tongue-hyoid movement or with larynx height (Shipp et al., 1984; Maurer et al., 1991). Articulations (vowel transitions) are associated with F₀ changes (Whalen and Levitt, 1995; Whalen et al., 1998). These interdependencies must be taken into account because they may contribute to a higher incidence of F₀ instabilities. Biomechanical changes associated with articulation may reduce control over vocal fold adduction and thereby predispose the vocal fold vibration patterns to bifurcate. Nevertheless, the higher incidence of source instabilities with F₀–F₁ crossover supports the hypothesis that nonlinear source–filter coupling is at work, independent of whether or not the vibrating source is predisposed to instabilities for additional reasons. In the companion paper (Titze, 2008), where a purely theoretical analysis was performed with single parameter variations, similar bifurcations were observed by contrasting nonlinear versus linear coupling.

As an unexpected and untargeted result, we observed sudden dramatic amplitude increases near the F₀–F₁ crossovers (Fig. 11). Across individuals, the amplitude surges could be up to 15 dB [for example, Fig. 11D]. This phenomenon was mostly observed in Exercise 2 but did also occur in Exercises 1 and 3, although to a much lesser extent. These sudden and very short-term amplitude increases were synchronized with F₀–F₁ crossovers and never occurred in phonations without crossovers.

Waveforms and spectrograms of phonations showing sudden amplitude bursts near the F₀–F₁ crossover. Trajectory of the first formant (F₁) is indicated by a thin solid line overlaid on the spectrograms. (A) Male phonation. Amplitude increase of 4.5 dB. (B) Female phonation. Amplitude burst of 10 dB. (C) Female phonation. Amplitude burst of 12 dB. (D) Same female as in (C) but on Exercise 3. Amplitude burst of 15 dB. None of the phonations shows a strong bifurcation.
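To give a sense of scale for these bursts, the decibel values can be converted to linear amplitude ratios. The sketch below assumes the levels are amplitude (pressure) levels, so that the ratio is 10 raised to dB/20; the function name is hypothetical.

```python
def db_to_amplitude_ratio(level_db):
    """Linear amplitude ratio for a level difference in dB (amplitude convention)."""
    return 10.0 ** (level_db / 20.0)


# Burst sizes quoted in the figure caption.
for burst_db in (4.5, 10.0, 12.0, 15.0):
    ratio = db_to_amplitude_ratio(burst_db)
    print(f"{burst_db:4.1f} dB -> amplitude x {ratio:.2f}")
```

On this convention a 15 dB burst corresponds to a more than fivefold increase in waveform amplitude.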

The sudden amplitude surges could be explained by linear source–filter theory in terms of rapid pressure changes in a dynamically changing vocal tract, especially when vocal tract constrictions are suddenly made or released. Alternatively, nonlinear source–tract coupling could cause a sudden change in the vocal fold vibration amplitude that results in an increase of the power output of the source signal. Glottal source power output varies with open quotient and maximum flow declination rate (Titze, 2006b; companion paper, Titze, 2008). There is the distinct possibility that when F₀ first traverses the compliant reactance range of F₁ and then suddenly enters the inertive reactance range (or vice versa), the maximum flow declination rate can fluctuate greatly.

In future studies, Exercises 2 and 3 might be individual specific in design so that more noncrossover phonations are produced, for comparative purposes. A subject’s first formant range for ∕α∕ and ∕æ∕ vowels might determine the starting F₀ and the range of ∕i∕ and ∕u∕ might decide the ending F₀ for the respective exercise. Although Exercise 3 shows the most dramatic effects, it may not be ideal for diagnostic purposes because of the difficulty of pinpointing the crossover point. Measurement of F₀ and F₁ is more difficult and contains a number of possible errors (more than for Exercises 1 and 2). Keeping either source frequency (F₀) or vocal tract frequency (F₁) constant allows a relatively reliable measurement, even in high F₀ phonations (if they are combined with vocal fry phonation).

There may be an exercise-specific bias for certain nonlinear phenomena. For instance, the greatest number of frequency jumps occurred in Exercise 1. One might test the generality of this in future studies with computational models.

There is also an individual-specific pattern of nonlinear phenomena occurring in crossover phonations. In our data set, two males and one female subject showed dramatic differences in the ratio of source instabilities between phonations with and without F₀–F₁ crossover (100% in phonations with and 0% in phonations without crossovers). Some subjects seem to show a bias in their productions toward one or another nonlinear phenomenon. An account for an individual-specific patterning of nonlinear phenomena has been given in several nonhumans (Riede et al., 1997, 2000, 2007). This brings us back to the original hypothesis that humans (and perhaps other species) have some flexibility in operating their source–filter combination with either linear or nonlinear coupling. With human subject experiments, the nonlinear coupling parameter (the diameter of the epilarynx tube) was not controlled. Greater detail, with specific parameters identified and controlled for this nonlinear coupling, is given in the companion paper.

It is perhaps a little premature to make specific recommendations for clinical or pedagogical use of the exercises investigated here. Voice disorders resulting from lesions (nodules or polyps) create mode-of-vibration instabilities. Bilateral asymmetries cause difficulties with synchronization between left and right vocal fold movement. A rapidly changing acoustic load, as proposed in these exercises, may exacerbate these instabilities, thereby lowering the threshold for detection of a disorder. It is our belief that in the near future the traditional reliance on vowel utterances at comfortable fundamental frequency and loudness will be replaced with exercises that are a bit more out of the comfort zone. The exercises designed here were not easy for some subjects. Much like running and jumping may be more telling about problems with locomotion than easy walking, vocal fold disorders may be more detectable when the vibrations are destabilized with more challenging acoustic loads. Singers who want to avoid these instabilities could possibly benefit from structured practice in the instability region, with the intent of developing muscle patterns that counteract the instabilities.

ACKNOWLEDGMENTS

This work was supported by the National Institutes of Health Grant No. 5R01 DC004224-08 from the National Institute on Deafness and Other Communication Disorders. T.R. was supported by a fellowship of the “Deutsche Akademie der Naturforscher, Leopoldina” (BMBF-LPD 9901∕8-127).

References

Alipour, F., Montequin, D., and Tayama, N. (2001). “Aerodynamic profiles of a hemilarynx with a vocal tract,” Ann. Otol. Rhinol. Laryngol. 110, 550–555.
Boersma & Weenink (2007). “Praat: Doing phonetics by computer,” retrieved from www.praat.org 29 October and 4 December.
Chan, R. W., and Titze, I. R. (2006). “Dependence of phonation threshold pressure on vocal tract acoustics and vocal fold tissue mechanics,” J. Acoust. Soc. Am. 119, 2351–2362. doi:10.1121/1.2173516
Childers, D., and Wu, K. (1991). “Gender recognition from speech. II. Fine analysis,” J. Acoust. Soc. Am. 4, 1841–1856. doi:10.1121/1.401664
Fant, G. (1960). Acoustic Theory of Speech Production, 2nd ed. (Mouton, The Hague, The Netherlands).
Fant, G. (1986). “Glottal flow: Models and interaction,” J. Phonetics 14, 393–399.
Fletcher, N. H. (1979). “Excitation mechanisms in woodwind and brass instruments,” Acustica 43, 63–72.
Freund, H. J., and Büdingen, H. J. (1978). “The relationship between speed and amplitude of the fastest voluntary contractions of human arm muscles,” Exp. Brain Res. 31, 1–12. doi:10.1007/BF00235800
Fujimura, O., and Lindqvist-Gauffin, J. (1971). “Sweep-tone measurements of vocal tract characteristics,” J. Acoust. Soc. Am. 49, 541–558. doi:10.1121/1.1912385
Hatzikirou, H., Fitch, W. T., and Herzel, H. (2006). “Voice instabilities due to source-tract interactions,” Acta. Acust. Acust. 92, 468–475.
Henrich, N., d’Alessandro, C., Castellengo, M., and Doval, B. (2005). “Glottal open quotient in singing: Measurements and correlation with laryngeal mechanisms, vocal intensity, and fundamental frequency,” J. Acoust. Soc. Am. 117, 1417–1430. doi:10.1121/1.1850031
Herzel, H. P. (1998). “Nonlinear dynamics of the voice: Time series analysis, modeling and experiments,” Curr. Top. Acoust. Res. 2, 17–30.
Hirano, M. (1981). Clinical Examination of Voice (Springer, Vienna).
Ishizaka, K., and Flanagan, J. L. (1972). “Synthesis of voiced source sounds from a two-mass model of the vocal cords,” Bell Syst. Tech. J. 51, 1233–1268.
Kent, R., Kent, J., and Rosenbek, J. (1987). “Maximum performance tests of speech production,” J. Speech Hear Disord. 52, 367–387.
Loucks, T. M. J., Poletto, C. J., Saxon, K. G., and Ludlow, C. L. (2005). “Laryngeal muscle responses to mechanical displacement of the thyroid cartilage in humans,” J. Appl. Physiol. 99, 922–930. doi:10.1152/japplphysiol.00402.2004
Maurer, D., Landis, T., and d’Heureuse, C. (1991). “Formant movement and formant number alteration with rising F0 in real vocalizations of the German vowels [u], [o] and [a],” Int. J. Neurosci. 57, 25–38.
Mende, W., Herzel, H., and Wermke, K. (1990). “Bifurcations and chaos in newborn infant cries,” Phys. Lett. A 145, 418–424. doi:10.1016/0375-9601(90)90305-8
Neubauer, J., Edgerton, M., and Herzel, H. (2004). “Nonlinear phenomena in contemporary vocal music,” J. Voice 18, 1–12. doi:10.1016/S0892-1997(03)00073-0
Peterson, G. E., and Barney, H. L. (1952). “Control methods used in a study of the vowels,” J. Acoust. Soc. Am. 24, 175–184. doi:10.1121/1.1906875
Riede, T., Arcadi, A. C., and Owren, M. J. (2007). “Nonlinear acoustics in pant hoots of common chimpanzees (Pan troglodytes): Vocalizing at the edge,” J. Acoust. Soc. Am. 121, 1758–1767. doi:10.1121/1.2427115
Riede, T., Herzel, H., Mehwald, D., Seidner, W., Trumler, E., Böhme, G., and Tembrock, G. (2000). “Nonlinear phenomena and their anatomical basis in the natural howling of a female dog-wolf breed,” J. Acoust. Soc. Am. 108, 1435–1442. doi:10.1121/1.1289208
Riede, T., Wilden, I., and Tembrock, G. (1997). “Subharmonics, biphonations, and frequency jumps–Common components of mammalian vocalization or indicators for disorders?,” Z. Säugetierkunde 62, 198–203.
Rothenberg, M. (1981). “Acoustic interaction between the glottal source and the vocal tract,” in Vocal Fold Physiology, edited by Stevens K. N. and Hirano M. (University of Tokyo Press, Tokyo), pp. 305–328.
Schmidt, R. A., and Lee, T. D. (1989). Motor Control and Learning: A Behavioral Emphasis, 3rd ed. (Human Kinetics, Champaign, IL).
Schutte, H. K., and Miller, D. G. (1993). “Belting and pop, nonclassical approaches to the female middle voice: Some preliminary considerations,” J. Voice 7, 142–150. doi:10.1016/S0892-1997(05)80344-3
Shipp, T. (1984). “Effects of vocal frequency and effort on vertical laryngeal position,” Journal of Research in Singing 7, 1–5.
Stevens, K. (1998). Acoustic Phonetics (Current Studies in Linguistics) (MIT, Cambridge, MA).
Story, B. H., Laukkanen, A. M., and Titze, I. R. (2000). “Acoustic impedance of an artificially lengthened and constricted vocal tract,” J. Voice 14, 455–469. doi:10.1016/S0892-1997(00)80003-X
Sundberg, J., Gramming, P., and Lovetri, J. (1993). “Comparisons of pharynx, source, formant, and pressure characteristics in operatic and musical theatre singing,” J. Voice 7, 301–310. doi:10.1016/S0892-1997(05)80118-3
Sundberg, J., and Hogset, C. (2001). “Voice source differences between falsetto and modal registers in counter tenors, tenors and baritones,” Logoped. Phoniatr. Vocol. 26, 26–36.
Titze, I. R. (1988). “The physics of small-amplitude oscillation of the vocal folds,” J. Acoust. Soc. Am. 83, 1536–1552. doi:10.1121/1.395910
Titze, I. R. (2000). Principles of Voice Production, 2nd ed. (National Center for Voice and Speech, Denver, CO).
Titze, I. R. (2004). “Theory of glottal airflow and source-filter interaction in speaking and singing,” Acta. Acust. Acust. 90, 641–648.
Titze, I. R. (2006a). The Myoelastic-Aerodynamic Theory of Phonation (National Center for Voice and Speech, Denver, CO).
Titze, I. R. (2006b). “Theory of maximum flow declination rate vs. maximum area declination rate in phonation,” J. Speech Lang. Hear. Res. 49, 439–447.
Titze, I. R. (2008). “Nonlinear source-filter coupling in phonation: Theory,” J. Acoust. Soc. Am. 123(3), XXX-XXX.
Titze, I. R., and Story, B. H. (1995). “Acoustic interaction of the voice source with the lower vocal tract,” J. Acoust. Soc. Am. 101, 2234–2243. doi:10.1121/1.418246
Tokuda, I., Riede, T., Neubauer, J., Owren, M. J., and Herzel, H. (2002). “Nonlinear analysis of irregular animal vocalizations,” J. Acoust. Soc. Am. 111, 2908–2919. doi:10.1121/1.1474440
Whalen, D. H., Gick, B., Kumada, M., and Honda, K. (1998). “Cricothyroid activity in high and low vowels: Exploring the automaticity of intrinsic F0,” J. Phonetics 27, 125–142. doi:10.1006/jpho.1999.0091
Whalen, D. H., and Levitt, A. G. (1995). “The universality of intrinsic F0 of vowels,” J. Phonetics 23, 249–366. doi:10.1016/S0095-4470(95)80165-0
Zañartu, M., Mongeau, L., and Wodicka, G. R. (2007). “Influence of acoustic loading on an effective single mass model of the vocal folds,” J. Acoust. Soc. Am. 121, 1119–1129. doi:10.1121/1.2409491
Zhang, Z., Neubauer, J., and Berry, D. A. (2006). “The influence of subglottal acoustics on laboratory models of phonation,” J. Acoust. Soc. Am. 120, 1558–1569. doi:10.1121/1.2225682

[c1] Alipour, F., Montequin, D., and Tayama, N. (2001). “Aerodynamic profiles of a hemilarynx with a vocal tract,” Ann. Otol. Rhinol. Laryngol. 110, 550–555. [DOI] [PubMed] [Google Scholar]

[c2] Boersma & Weenick (2007). “Praat: Doing phonetics by computer,” retrieved from www.praat.org 29 October and 4 December.

[c4] Chan, R. W., and Titze, I. R. (2006). “Dependence of phonation threshold pressure on vocal tract acoustics and vocal fold tissue mechanics,” J. Acoust. Soc. Am. 10.1121/1.2173516 119, 2351–2362. [DOI] [PubMed] [Google Scholar]

[c5] Childers, D., and Wu, K. (1991). “Gender recognition from speech. II. Fine analysis,” J. Acoust. Soc. Am. 10.1121/1.401664 4, 1841–1856. [DOI] [PubMed] [Google Scholar]

[c6] Fant, G. (1960). Acoustic Theory of Speech Production, 2nd ed. (Mouton, The Hague, The Netherlands). [Google Scholar]

[c7] Fant, G. (1986). “Glottal flow: Models and interaction,” J. Phonetics 14, 393–399. [Google Scholar]

[c8] Fletcher, N. H. (1979). “Excitation mechanisms in woodwind and brass instruments,” Acustica 43, 63–72. [Google Scholar]

[c9] Freund, H. J., and Büdingen, H. J. (1978). “The relationship between speed and amplitude of the fastest voluntary contractions of human arm muscles,” Exp. Brain Res. 10.1007/BF00235800 31, 1–12. [DOI] [PubMed] [Google Scholar]

[c10] Fujimura, O., and Lindqvist-Gauffin, J. (1971). “Sweep-tone measurements of vocal tract characteristics,” J. Acoust. Soc. Am. 10.1121/1.1912385 49, 541–558. [DOI] [PubMed] [Google Scholar]

[c11] Hatzikirou, H., Fitch, W. T., and Herzel, H. (2006). “Voice instabilities due to source-tract interactions,” Acta. Acust. Acust. 92, 468–475. [Google Scholar]

[c12] Henrich, N., d’Alessandro, C., Castellengo, M., and Doval, B. (2005). “Glottal open quotient in singing: Measurements and correlation with laryngeal mechanisms, vocal intensity, and fundamental frequency,” J. Acoust. Soc. Am. 10.1121/1.1850031 117, 1417–1430. [DOI] [PubMed] [Google Scholar]

[c13] Herzel, H. P. (1998). “Nonlinear dynamics of the voice: Time series analysis, modeling and experiments,” Curr. Top. Acoust. Res. 2, 17–30. [Google Scholar]

[c14] Hirano, M. (1981). Clinical Examination of Voice (Springer, Vienna). [Google Scholar]

[c15] Ishizaka, K., and Flanagan, J. L. (1972). “Synthesis of voiced source sounds from a two-mass model of the vocal cords,” Bell Syst. Tech. J. 51, 1233–1268. [Google Scholar]

[c16] Kent, R., Kent, J., and Rosenbek, J. (1987). “Maximum performance tests of speech production,” J. Speech Hear Disord. 52, 367–387. [DOI] [PubMed] [Google Scholar]

[c17] Loucks, T. M. J., Poletto, C. J., Saxon, K. G., and Ludlow, C. L. (2005). “Laryngeal muscle responses to mechanical displacement of the thyroid cartilage in humans,” J. Appl. Physiol. 10.1152/japplphysiol.00402.2004 99, 922–930. [DOI] [PMC free article] [PubMed] [Google Scholar]

[c18] Maurer, D., Landis, T., and d’Heureuse, C. (1991). “Formant movement and formant number alteration with rising F0 in real vocalizations of the German vowels [u], [o] and [a],” Int. J. Neurosci. 57, 25–38. [DOI] [PubMed] [Google Scholar]

[c19] Mende, W., Herzel, H., and Wermeke, K. (1990). “Bifurcations and chaos in newborn infant cries,” Phys. Lett. A 10.1016/0375-9601(90)90305-8 145, 418–424. [DOI] [Google Scholar]

[c20] Neubauer, J., Edgerton, M., and Herzel, H. (2004). “Nonlinear phenomena in contemporary vocal music,” J. Voice 10.1016/S0892-1997(03)00073-0 18, 1–12. [DOI] [PubMed] [Google Scholar]

[c21] Peterson, G. E., and Barney, H. L. (1952). “Control methods used in a study of the vowels,” J. Acoust. Soc. Am. 10.1121/1.1906875 24, 175–184. [DOI] [Google Scholar]

[c22] Riede, T., Arcadi, A. C., and Owren, M. J. (2007). “Nonlinear acoustics in pant hoots of common chimpanzees (Pan troglodytes): Vocalizing at the edge.” J. Acoust. Soc. Am. 10.1121/1.2427115 121, 1758–1767. [DOI] [PubMed] [Google Scholar]

[c23] Riede, T., Herzel, H., Mehwald, D., Seidner, W., Trumler, E., Böhme, G., and Tembrock, G. (2000). “Nonlinear phenomena and their anatomical basis in the natural howling of a female dog-wolf breed,” J. Acoust. Soc. Am. 10.1121/1.1289208 108, 1435–1442. [DOI] [PubMed] [Google Scholar]

[c24] Riede, T., Wilden, I., and Tembrock, G. (1997). “Subharmonics, biphonations, and frequency jumps–Common components of mammalian vocalization or indicators for disorders?,” Z. Säugetierkunde 62, 198–203. [Google Scholar]

[c25] Rothenberg, M. (1981). “Acoustic interaction between the glottal source and the vocal tract,” in Vocal Fold Physiology, edited by K. N. Stevens and M. Hirano (University of Tokyo Press, Tokyo), pp. 305–328.

[c26] Schmidt, R. A., and Lee, T. D. (1989). Motor Control and Learning: A Behavioral Emphasis, 3rd ed. (Human Kinetics, Champaign, IL).

[c27] Schutte, H. K., and Miller, D. G. (1993). “Belting and pop, nonclassical approaches to the female middle voice: Some preliminary considerations,” J. Voice 7, 142–150. doi:10.1016/S0892-1997(05)80344-3

[c28] Shipp, T. (1984). “Effects of vocal frequency and effort on vertical laryngeal position,” Journal of Research in Singing 7, 1–5.

[c29] Stevens, K. (1998). Acoustic Phonetics (Current Studies in Linguistics) (MIT, Cambridge, MA).

[c30] Story, B. H., Laukkanen, A. M., and Titze, I. R. (2000). “Acoustic impedance of an artificially lengthened and constricted vocal tract,” J. Voice 14, 455–469. doi:10.1016/S0892-1997(00)80003-X

[c31] Sundberg, J., Gramming, P., and Lovetri, J. (1993). “Comparisons of pharynx, source, formant, and pressure characteristics in operatic and musical theatre singing,” J. Voice 7, 301–310. doi:10.1016/S0892-1997(05)80118-3

[c32] Sundberg, J., and Hogset, C. (2001). “Voice source differences between falsetto and modal registers in counter tenors, tenors and baritones,” Logoped. Phoniatr. Vocol. 26, 26–36.

[c33] Titze, I. R. (1988). “The physics of small-amplitude oscillation of the vocal folds,” J. Acoust. Soc. Am. 83, 1536–1552. doi:10.1121/1.395910

[c34] Titze, I. R. (2000). Principles of Voice Production, 2nd ed. (National Center for Voice and Speech, Denver, CO).

[c35] Titze, I. R. (2004). “Theory of glottal airflow and source-filter interaction in speaking and singing,” Acta Acust. Acust. 90, 641–648.

[c36] Titze, I. R. (2006a). The Myoelastic-Aerodynamic Theory of Phonation (National Center for Voice and Speech, Denver, CO).

[c37] Titze, I. R. (2006b). “Theory of maximum flow declination rate vs. maximum area declination rate in phonation,” J. Speech Lang. Hear. Res. 49, 439–447.

[c37a] Titze, I. R. (2008). “Nonlinear source-filter coupling in phonation: Theory,” J. Acoust. Soc. Am. 123(3), XXX-XXX.

[c38] Titze, I. R., and Story, B. H. (1995). “Acoustic interaction of the voice source with the lower vocal tract,” J. Acoust. Soc. Am. 101, 2234–2243. doi:10.1121/1.418246

[c39] Tokuda, I., Riede, T., Neubauer, J., Owren, M. J., and Herzel, H. (2002). “Nonlinear analysis of irregular animal vocalizations,” J. Acoust. Soc. Am. 111, 2908–2919. doi:10.1121/1.1474440

[c40] Whalen, D. H., Gick, B., Kumada, M., and Honda, K. (1998). “Cricothyroid activity in high and low vowels: Exploring the automaticity of intrinsic F0,” J. Phonetics 27, 125–142. doi:10.1006/jpho.1999.0091

[c41] Whalen, D. H., and Levitt, A. G. (1995). “The universality of intrinsic F0 of vowels,” J. Phonetics 23, 349–366. doi:10.1016/S0095-4470(95)80165-0

[c42] Zañartu, M., Mongeau, L., and Wodicka, G. R. (2007). “Influence of acoustic loading on an effective single mass model of the vocal folds,” J. Acoust. Soc. Am. 121, 1119–1129. doi:10.1121/1.2409491

[c43] Zhang, Z., Neubauer, J., and Berry, D. A. (2006). “The influence of subglottal acoustics on laboratory models of phonation,” J. Acoust. Soc. Am. 120, 1558–1569. doi:10.1121/1.2225682

Subject	Sex	E1	E2	E3
1	M	48	24	24
2	M	47	24	24
3	M	32	15	17
4	F	36	14	16
5	F	27	13	17
6	F	32	16	18
7	M	48	24	24
8	F	31	16	16
9	F	32	24	24
10	F	48	24	24
11	M	48	24	24
12	F	48	24	24
13	F	48	24	24
14	M	48	24	24
15	M	47	24	24
16	F	48	24	24
17	M	48	24	24
18	M	34	16	18


