The Journal of the Acoustical Society of America. 2012 Apr;131(4):2999–3016. doi: 10.1121/1.3685824

Source-tract interaction with prescribed vocal fold motion

Richard S McGowan 1,a), Michael S Howe 2
PMCID: PMC3341965  PMID: 22501076

Abstract

An equation describing the time-evolution of glottal volume velocity with specified vocal fold motion is derived when the sub- and supra-glottal vocal tracts are present. The derivation of this Fant equation employs a property explicated in Howe and McGowan [(2011) J. Fluid Mech. 672, 428–450] that the Fant equation is the adjoint to the equation characterizing the matching conditions of sub- and supra-glottal Green’s function segments with the glottal segment. The present aeroacoustic development shows that measurable quantities, such as input impedances at the glottis, provide the coefficients for the Fant equation when source-tract interaction is included in the development. Explicit expressions for the Green’s function are not required. With the poles and zeros of the input impedance functions specified, the Fant equation can be solved. After the general derivation of the Fant equation, the specific cases where plane wave acoustic propagation is described either by a Sturm-Liouville problem or by concatenated cylindrical tubes are considered. Simulations show the expected skewing of the glottal volume velocity pulses depending on whether the fundamental frequency is below or above a sub- or supra-glottal formant. More complex glottal wave forms result when both the first supra-glottal fundamental frequencies are high and close to the first sub-glottal formant.

INTRODUCTION

Source-filter theory was formulated as a first approximation describing acoustic air motion in the vocal tract, and it was a particularly compelling model for voiced vowel production. At the time of Fant’s 1960 monograph it was known that the sub-glottal and supra-glottal vocal tracts’ fluid mechanics would have an effect on the volume velocity pulse emanating from the glottis (Fant, 1960). For instance, it was known that the effect of the supra-glottal vocal tract is particularly strong during transition to or from obstruent consonants due to strong flow resistance. Glottal flow is affected by the fluid mechanics of the surrounding vocal tract because glottal flow depends on the pressures upstream and downstream of the glottis. The equation for the volume flow through the glottis, Q = Q(t), derived by Fant did not account for the surrounding vocal tract. This equation is

$$L\frac{dQ}{dt}+K\rho_0\frac{Q^2}{2A_g^2}+R'Q=p_{\mathrm{sub}}, \tag{1}$$

where $L$ and $A_g\equiv A_g(t)$ are, respectively, the “inductance” and the cross-sectional area of the glottis at its narrowest section, $\rho_0$ is the mean air density, $K$ is an order 1 constant, the coefficient $R'$ represents the effects of viscous losses, and $p_{\mathrm{sub}}$ is the pressure just upstream of the glottis (Fant, 1960). This equation assumes that the supra-glottal pressure is zero; when it is not zero, the sub-glottal pressure, $p_{\mathrm{sub}}$, can be replaced by the difference between the sub- and supra-glottal pressures, i.e., the trans-glottal pressure.
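To make the behavior of Eq. 1 concrete, the following is a minimal numerical sketch, not the authors' simulation. It integrates Eq. 1 for a time-symmetric glottal area pulse with a semi-implicit Euler step. All parameter values, the constant inductance `L_IND`, the leak area `A_MIN`, and the pulse shape in `glottal_area` are illustrative assumptions that do not come from the paper.

```python
import math

RHO0 = 1.2        # mean air density, kg/m^3
K = 1.0           # order-one constant in Eq. 1 (assumed value)
L_IND = 700.0     # constant "inductance" L, kg/m^4 (assumed value)
R_VISC = 1.0e4    # viscous coefficient R', Pa s/m^3 (assumed value)
P_SUB = 800.0     # sub-glottal pressure, Pa (assumed value)
T_OPEN = 5.0e-3   # duration of the open phase, s (assumed value)
A_MAX = 1.0e-5    # peak glottal area, m^2 (assumed value)
A_MIN = 1.0e-7    # small leak area that keeps A_g > 0, m^2 (assumed value)


def glottal_area(t):
    """Hypothetical time-symmetric area pulse A_g(t) with a small leak."""
    return A_MIN + A_MAX * math.sin(math.pi * t / T_OPEN) ** 2


def integrate_eq1(dt=1.0e-6):
    """Semi-implicit Euler for L dQ/dt + K rho0 Q^2/(2 A_g^2) + R' Q = p_sub."""
    n_steps = int(round(T_OPEN / dt))
    times, flows = [0.0], [0.0]
    q = 0.0
    for n in range(1, n_steps + 1):
        t = n * dt
        ag = glottal_area(t)
        # Linearize the quadratic term as (K rho0 |Q_old| / (2 A_g^2)) * Q_new,
        # which keeps the update stable when A_g is small.
        resistance = K * RHO0 * abs(q) / (2.0 * ag * ag) + R_VISC
        q = (q + dt * P_SUB / L_IND) / (1.0 + dt * resistance / L_IND)
        times.append(t)
        flows.append(q)
    return times, flows


times, flows = integrate_eq1()
t_peak_flow = times[flows.index(max(flows))]
t_peak_area = T_OPEN / 2.0
print(f"peak area at {t_peak_area * 1e3:.2f} ms, "
      f"peak flow at {t_peak_flow * 1e3:.2f} ms")
```

With an inertive term present, the flow lags the area, so the flow peak falls after the area peak. The size of the lag depends entirely on the assumed parameter values.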

Equation 1 actually is an approximation for the time evolution of a jet through a time-varying orifice, Ag(t), into free space. But, for a jet exiting from a semi-infinite tube of cross-sectional area Asub into a semi-infinite tube of area Asup, there are pressures upstream and downstream of the glottis related to the glottal volume flow itself that need to be taken into account using impedances ρ0c0/Asub and ρ0c0/Asup, where c0 is the speed of sound, assuming that the tubes support only plane wave propagation. Titze showed that, under these conditions, a non-negligible term, Qρ0c0(1/Asub+1/Asup), should be added to the left-hand side of Eq. 1 (Titze, 1984).
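A rough sense of why this term is non-negligible can be had from the steady limit. The sketch below drops the inertial and viscous terms of Eq. 1 and solves the remaining quadratic for Q with and without the added term Qρ0c0(1/Asub+1/Asup). The function name `steady_glottal_flow` and all numerical values are illustrative assumptions, not taken from the paper or from Titze (1984).

```python
import math


def steady_glottal_flow(p_sub, a_g, a_sub, a_sup, rho0=1.2, c0=350.0, k=1.0,
                        include_tube_term=True):
    """Positive root of k*rho0*Q^2/(2*a_g^2) + b*Q = p_sub.

    b = rho0*c0*(1/a_sub + 1/a_sup) is the added tube term, or 0 without it.
    """
    a = k * rho0 / (2.0 * a_g ** 2)
    b = rho0 * c0 * (1.0 / a_sub + 1.0 / a_sup) if include_tube_term else 0.0
    # Cancellation-free form of the positive quadratic root.
    return 2.0 * p_sub / (b + math.sqrt(b * b + 4.0 * a * p_sub))


# Assumed values: 800 Pa, A_g = 0.1 cm^2, A_sub = 2.5 cm^2, A_sup = 3 cm^2.
q_free = steady_glottal_flow(800.0, 1.0e-5, 2.5e-4, 3.0e-4,
                             include_tube_term=False)
q_tube = steady_glottal_flow(800.0, 1.0e-5, 2.5e-4, 3.0e-4)
print(f"Q without tube term: {q_free:.3e} m^3/s")
print(f"Q with tube term:    {q_tube:.3e} m^3/s")
```

For these assumed values the tube term reduces the steady flow noticeably, which is the sense in which the term cannot be neglected.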

Realistic results for finite tubes should involve the inverse Fourier transforms of input impedance functions of the sub- and supra-glottal vocal tract resonators. Such effects have been studied numerically. Flanagan and Landgraf (1968) and Flanagan and Cherry (1968) added an electrical line analog for the supra-glottal vocal tract to a one-mass model of the vocal folds to account for feedback at the glottis provided by a finite-length supra-glottal tract. Thus, the effects of the supra-glottal impedances on the glottal volume velocity, which included changes in the vocal folds’ oscillation, were simulated by these authors. Ishizaka and Flanagan also modeled the effects of supra-glottal impedances in their time-domain numerical simulation of the two-mass model of vocal fold oscillation coupled with a vocal tract (Ishizaka and Flanagan, 1972).

The effect of the vocal tract on the voice source with prescribed vocal fold motion or glottal area time variation has been considered in order to simplify the problem for greater physical understanding. [Models of source-tract interaction with specified glottal area functions are termed level I models by Titze (2008).] According to Rothenberg (1981), when the fundamental frequency is well below the first formant, the supra-glottal vocal tract input impedance can be approximated by an inductive element. He showed that this has the effect of skewing the glottal pulse so that the glottal flow decreases faster than it increases for a glottal area pulse that is symmetric in time. This phenomenon was confirmed by Ananthapadmanabha and Fant (1982), who also studied the ripples on the volume velocity source caused by source-tract interaction (see also Fant, 1986). From their numerical work they concluded that it is necessary to include only the effects of the lowest formant in each of the sub-glottal and supra-glottal tracts to adequately account for source-tract interaction during speech. Titze has gone on to show how realistic sub- and supra-glottal resonant tube shapes affect the amplitudes of the harmonics of the glottal flow in level I interactions using time-domain simulations of acoustic propagation in the sub- and supra-glottal vocal tract, particularly when the fundamental frequency is close to a formant frequency (Titze, 2008).

The approach that we have adopted is to derive the equation for the volume flow through the glottis, Q(t), in the time domain (Howe and McGowan, 2007, 2011a), using integral solutions to the equations of air motion with Green’s function kernels. This has two advantages. The first advantage is that a closed form ordinary differential equation for Q is obtained with known degrees of approximation. The resulting differential equation for Q is termed a Fant equation, and is a generalization of Eq. 1. The relative sizes of terms in the Fant equation can be estimated, and the equation approximated after the full equation for Q is derived. In particular, the importance of source-tract interaction can be judged. The second advantage of our approach is that a time-domain numerical solution of the differential equation for Q is possible without simulating wave propagation, while still including the effects of the sub- and supra-glottal vocal tracts on Q.

In the previous work of Howe and McGowan (2007, 2011a) it was possible to obtain explicit expressions for the Green’s functions in infinitely long sub- and supra-glottal tracts, in the limit that these tracts allow only plane wave propagation. A Green’s function could also be derived for simplified sub-glottal systems (Howe and McGowan, 2009) and special supra-glottal vocal tracts, such as straight tubes (Howe and McGowan, 2011b). The Green’s functions from these previous works were derived using asymptotic matching procedures between neighboring regions: between sub-glottal and glottal regions and between supra-glottal and glottal regions. Flow in the sub- and/or supra-glottal tracts was assumed to be acoustic, and the flow was assumed incompressible in the glottal region under the acoustic compactness approximation. However, it is not possible to write an explicit Green’s function for human vocal tracts in the generality necessary for the great variety of speech sounds.

Mathematical progress has recently been made in understanding how to derive Fant equations that account for the effect of sub- and supra-glottal wave propagation on Q (Howe and McGowan, 2011b). As prelude to the discussion of this progress, an important fact is introduced: while the wave operator is a self-adjoint operator, the problem is not self-adjoint because of the moving boundaries, i.e., the moving vocal folds (Morse and Feshbach, 1953). Thus, it is appropriate to use the Green’s function for the adjoint problem, denoted G here, as the kernel in the integral solution (Morse and Feshbach, 1953; Howe and McGowan, 2009, 2011a). The mathematical progress is in showing that the integro-differential operator for the Fant equation is the adjoint to one derived from matching procedures for the Green’s function between neighboring regions for a Helmholtz resonator or a straight-tube resonator (Howe and McGowan, 2011b). This result is easily generalized here. Thus, an explicit form of the Green’s function need not be available; only the asymptotic matching conditions for the Green’s function G are needed to derive the operator that appears in the Fant equation for Q.

The first-order integro-differential equation that the Green’s function satisfies comes from considering its compact approximation, and matching segments of the Green’s function between incompressible and acoustic regions. In other words, the Green’s function is considered to be composed of segments valid in three different regions of the vocal tract: sub-glottal, glottal, and supra-glottal. These segments are related by asymptotic matching conditions, which provide for continuity of pressure and volume velocity. Matching the Green’s function segments between the sub-glottal region and the glottal region and between the supra-glottal region and the glottal region results in the desired first-order integro-differential equation.

In the following section a theory of source-tract interaction with prescribed vocal fold motion is presented. Initially, minimal assumptions are made about the nature of the Green’s function. However, it is necessary to make the nature of the problem more specific only to finalize the form of the Fant equation. Further, the Fant equation derived here will be such that only measured quantities appear as coefficients or integration kernels. At the end of this section, functional forms for the input impedances are derived assuming a Sturm-Liouville problem or a series of cylindrical concatenated tubes for the sub- and supra-glottal vocal tracts so that the Fant equation can be solved for a great variety of vocal tracts. Simulations based on the Fant equation will be presented in Sec. 3. Finally, a conclusion is provided in Sec. 4.

The Fant equation derived in this paper will be applied to the level I problem of source-tract interaction, where the glottal areas are specified. The application to the level I problems will show some of the expected trends of volume velocity pulse shape. However, these will not be realistic simulations of source-tract interaction, because fluid-structure interaction in the glottal region needs to be considered for realistic simulations. Those considerations are beyond the scope of this paper. Section 4 indicates how the same Fant equation can be applied in solving full source-tract interaction problems.

THEORY

Preliminaries

The supra-glottal vocal tract, the glottal air channel, and the sub-glottal tract are partially bounded by solid surfaces about a common Cartesian axis in the x1 direction (Fig. 1). The cross-sectional areas near the glottis are such that acoustic propagation occurs only along the x1 direction for the frequencies considered. The area of the sub-glottal tube adjoining the glottal region is Asub, the area of the supra-glottal vocal tract adjoining the glottal region is Asup, and the area of the narrowest part of the glottis is Ag = Ag(t). The coordinate system, (x1, x2, x3) has its origin along the vocal tract axis at the anterior end of the glottis.

Figure 1. Schematic of the vocal tract. The control volume is bounded by the solid walls of the vocal tract and the surfaces between the lips and through the bronchial tubes, denoted SL. The surfaces Ssub and Ssup are used to segment the control volume into three portions: the sub-glottal tract, glottal region, and supra-glottal tract.

In this work we are concerned with disturbances to the air in the vocal tract caused by vocal fold vibration: voiced speech. While the Mach number of fluid flow can change from zero to the order of 0.1 in the glottal region, the resulting fluctuating pressures are small compared to the ambient atmospheric pressure. For the low Mach number, homentropic flow considered here, pressure is a function of density alone, and the fluctuating quantity of interest is the total enthalpy, $B=\int dp/\rho+\tfrac{1}{2}v^2$, where $v=|\mathbf{v}|$ with $\mathbf{v}$ the air particle velocity. Further, dissipation is confined to the lateral boundaries with strong shear layers and wall vibration. For such flows the vortex sound equation can be written (Howe, 1998, 2002; Howe and McGowan, 2007)

$$\left(\frac{1}{c_0^2}\frac{\partial^2}{\partial t^2}-\frac{\partial^2}{\partial x_j^2}\right)B=\operatorname{div}(\boldsymbol{\omega}\wedge\mathbf{v}), \tag{2}$$

where $\boldsymbol{\omega}=\operatorname{curl}\mathbf{v}$ is vorticity, $t$ is time, and the repeated subscript $j$ implies summation.

A control volume V(τ), where τ is time, encompassing the supra-laryngeal vocal tract, glottal region, and a large portion of the sub-glottal vocal tract is now specified (Fig. 1). The control volume is bounded by the surface S(τ), which covers the opening at the lips and lines the vocal tract into the bronchial tree until a bronchial generation is encountered where acoustic disturbances created at the glottis become negligible due to viscous loss. The surface is completed with cross-sectional planes through each of the bronchial tubes in that generation. The part of S that spans the bronchial tubes is a disconnected surface denoted SL.

The boundary condition for B on the surface S(τ) in the glottal region is $\partial B/\partial y_n=-\partial v_n/\partial\tau$, with $y_n$ the normal direction to the solid surfaces (directed into the control volume) and $v_n$ the fluid particle velocity normal to the surface. It is assumed that a real-valued operator, $Z_{w+l}(\mathbf{y},\tau)$, exists such that

$$\frac{\partial v_n}{\partial\tau}(\mathbf{y},\tau)+\int_{-\infty}^{\infty}Z_{w+l}(\mathbf{y},\tau')B(\mathbf{y},\tau-\tau')\,d\tau'=0, \tag{3}$$

on the solid surfaces of the sub- and supra-glottal vocal tract walls and on the surface between the lips. Here $\mathbf{y}=(y_1,y_2,y_3)$ are Cartesian coordinates. The boundary condition on the surface SL that cuts through the bronchi will be determined by the rate of contraction of the lungs and the resulting flow of air through that surface. This boundary condition will account for the source of B from the lungs. Here, the conception of the bronchial tree with an upper conductive region and lower transitional and respiratory region is adopted, as in Ishizaka et al. (1976). The surface SL is supposed to be somewhere in the conductive region.

It is assumed that the Green’s function, G(x, t;y, τ), for the adjoint problem exists for x and y in the control volume and for the boundary conditions specified below. The Green’s function for the adjoint problem must be employed because the glottal configuration changes rapidly and nonlinearly with time (Howe and McGowan, 2011a,b). When the governing equations for B are in terms of the field variables x and t, the Green’s function, G, is a solution to the adjoint problem in variables y and τ. Thus, the Green’s function for the adjoint problem is the advanced potential Green’s function G(x, t; y, τ), that satisfies

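$$\left(\frac{1}{c_0^2}\frac{\partial^2}{\partial\tau^2}-\frac{\partial^2}{\partial y_j^2}\right)G=\delta(\mathbf{x}-\mathbf{y})\,\delta(t-\tau),\qquad G=0\ \text{for}\ \tau>t, \tag{4}$$

where δ(·) is the Dirac delta function. G is chosen to satisfy $\partial G/\partial y_n=0$ on two surfaces: the solid surface in the glottal region and the surface SL that cuts through the bronchi. Further, with consideration of the boundary conditions for B expressed by Eq. 3, it is required that

$$\frac{\partial G}{\partial y_n}(\mathbf{x},t;\mathbf{y},\tau)-\int_{-\infty}^{\infty}Z_{w+l}(\mathbf{y},\tau')\,G(\mathbf{x},t;\mathbf{y},\tau+\tau')\,d\tau'=0, \tag{5}$$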

on the solid surfaces of the sub- and supra-glottal vocal tract, and on the surface between the lips. G represents “incoming” waves as a function of y and τ that vanish after convergence onto the source δ(x − y)δ(t − τ) at τ=t. The Green’s function for the adjoint problem, G, is referred to here as the Green’s function.

Equations 2, 4 are combined (Morse and Feshbach, 1953; Howe and McGowan, 2007, 2011a; Howe, 2008) using Green’s theorem, the radiation condition and the momentum equation, to supply the causal solution of Eq. 2 at low Mach numbers in the form

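$$B(\mathbf{x},t)=\int\!\!\int_{S(\tau)}\left[G\frac{\partial\mathbf{v}}{\partial\tau}+B\frac{\partial G}{\partial\mathbf{y}}-\nu\frac{\partial G}{\partial\mathbf{y}}\wedge\boldsymbol{\omega}\right]\cdot d\mathbf{S}(\mathbf{y})\,d\tau-\int\!\!\int_{V(\tau)}\frac{\partial G}{\partial\mathbf{y}}\cdot\boldsymbol{\omega}\wedge\mathbf{v}\,d^3\mathbf{y}\,d\tau, \tag{6}$$

where $\mathbf{x}$ is in the control volume V(τ), $\nu$ is the kinematic viscosity of air, and $d\mathbf{S}(\mathbf{y})$ are vector surface elements of S(τ) directed into the control volume. The boundary conditions specified for B and G can be employed to simplify this formal representation of the B field. With the boundary conditions in Eqs. 3, 5 applied to the solid surfaces in the sub- and supra-glottal regions, denoted $S_{\mathrm{subwall}}(\tau)$ and $S_{\mathrm{supwall}}(\tau)$, respectively, the first two terms of the surface integral in Eq. 6 are zero. Because the surface segment between the lips is not solid and Eqs. 3, 5 apply to this section of surface, all three terms in the surface integral vanish at that surface segment. Applying the remaining boundary conditions on the solid surface in the glottal region and on SL, Eq. 6 reduces to

$$B(\mathbf{x},t)=\int\!\!\int_{S_g(\tau)+S_L}G\frac{\partial\mathbf{v}}{\partial\tau}\cdot d\mathbf{S}(\mathbf{y})\,d\tau-\int\!\!\int_{S_g(\tau)+S_{\mathrm{subwall}}+S_{\mathrm{supwall}}}\nu\frac{\partial G}{\partial\mathbf{y}}\wedge\boldsymbol{\omega}\cdot d\mathbf{S}(\mathbf{y})\,d\tau-\int\!\!\int_{V(\tau)}\frac{\partial G}{\partial\mathbf{y}}\cdot\boldsymbol{\omega}\wedge\mathbf{v}\,d^3\mathbf{y}\,d\tau, \tag{7}$$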

where Sg(τ) is the solid surface in the glottal region.

Subsequent discussion of the Green’s function is with the control volume divided into three contiguous regions: the sub-glottal region, the supra-glottal region, and the glottal region. This entails the introduction of two additional surfaces bounding these regions. These are portions of planes orthogonal to y1, one portion just below the glottis, Ssub, and another just above the glottis, Ssup (Fig. 1). Surface Ssub separates the sub-glottal and glottal regions, has its normal directed into the sub-glottal tract, and has area Asub, while surface Ssup separates the supra-glottal and glottal regions, has its normal directed into the supra-glottal tract, and has area Asup. Boundary conditions at these planes will result from matching conditions between adjoining regions. When the frequency is sufficiently low, the flow in the glottal region is approximately incompressible and the time derivatives in Eqs. 2, 4 can be neglected. This is the acoustic compactness approximation.

In Sec. 2B an equation analogous to Eq. 1, a Fant equation, but including the back-reaction of the sub- and supra-glottal vocal tracts, is derived. The subsequent development of the Fant equation does not require that an explicit Green’s function be derived. This is important for two reasons. First, there is an infinite number of supra-glottal vocal tract shapes possible in speech and singing, and only a few are amenable to analytic solution for the supra-glottal Green’s function segment. Also, the form of the sub-glottal segment of the Green’s function is unknown because of the complicated nature of the sub-glottal system (e.g., Habib et al., 1994). Instead of the explicit expressions for the Green’s functions, input impedances at the glottis are used to account for acoustic back-reaction in the glottal region.

The derivation of the Fant equation in Sec. 2B generalizes the development for the straight tube in the Appendix of Howe and McGowan (2011b). In Sec. 2C low-frequency approximations are applied in the sub- and supra-glottal vocal tracts in order to derive expressions for the input impedances in terms of their poles and zeros in the complex frequency plane. These expressions are used to compute the effect of the sub- and supra-glottal tracts on the voice source when glottal motion is specified in Sec. 3.

The Fant equation

Green’s function matching conditions

Under the acoustic compactness approximation with y in the glottal region,

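$$G(\mathbf{x},t;\mathbf{y},\tau)=\alpha(\tau)+\beta(\tau)\,Y(\mathbf{y},\tau), \tag{8}$$

where $\alpha(\tau)$, $\beta(\tau)$ are functions to be determined, $\nabla^2Y(\mathbf{y},\tau)=0$, and $Y(\mathbf{y},\tau)$ denotes the velocity potential of flow at unit speed within the duct just upstream of the glottis such that the normal derivative $\partial Y/\partial y_n=0$ on the instantaneous surface S(τ) of the glottis and the neighboring duct, so that

$$Y(\mathbf{y},\tau)\approx\begin{cases}y_1-\bar{\ell}(\tau), & |y_1|\gg\sqrt{A_{\mathrm{sub}}}\ \ \text{for}\ y_1<0,\\[4pt]\left(\dfrac{A_{\mathrm{sub}}}{A_{\mathrm{sup}}}\right)y_1, & |y_1|\gg\sqrt{A_{\mathrm{sup}}}\ \ \text{for}\ y_1>0.\end{cases} \tag{9}$$

The dependence of $\alpha$ and $\beta$ on the field point $\mathbf{x}$ and time $t$ is suppressed. The length $\bar{\ell}(\tau)$ is a time-dependent “end correction” (Howe, 1998; Howe and McGowan, 2007),

$$\bar{\ell}(\tau)=\int_{-\infty}^{\infty}\left(\frac{\partial Y}{\partial y_1}(\mathbf{y},\tau)-\frac{A_{\mathrm{sub}}}{A_{\mathrm{sup}}}H(y_1)-H(-y_1)\right)dy_1, \tag{10}$$

where H(·) is the Heaviside step function and the integration is along any path through the glottis. The asymptotic form of $Y(\mathbf{y},\tau)$ implies that the tubes neighboring the sub- and supra-glottal tracts are approximately straight along the y1 axis and do not support acoustic cross-modes.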
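The end-correction integral can be illustrated with a deliberately crude model. The sketch below assumes quasi-one-dimensional flow, so that the axial derivative of Y is approximated by Asub/A(y1) for a duct of local area A(y1); this approximation, the step-wise area function, and every numerical value are assumptions made only for illustration. For a glottis of length `L_G` and area `A_G` occupying −L_G < y1 < 0, the integral then reduces to L_G(Asub/A_G − 1).

```python
A_SUB = 2.5e-4   # sub-glottal duct area, m^2 (assumed value)
A_SUP = 3.0e-4   # supra-glottal duct area, m^2 (assumed value)
A_G = 1.0e-5     # glottal area, m^2 (assumed value)
L_G = 3.0e-3     # glottal length along y1, m (assumed value)


def area(y1):
    """Hypothetical step-wise duct area; glottis occupies -L_G < y1 < 0."""
    if y1 <= -L_G:
        return A_SUB
    if y1 < 0.0:
        return A_G
    return A_SUP


def end_correction(area_fn, a_sub, a_sup, y_min, y_max, n=20000):
    """Midpoint-rule estimate of Eq. 10 with dY/dy1 ~ a_sub / area_fn(y1)."""
    dy = (y_max - y_min) / n
    total = 0.0
    for i in range(n):
        y1 = y_min + (i + 0.5) * dy
        # Subtract the asymptotic slope of Y on each side of the origin.
        reference = a_sub / a_sup if y1 > 0.0 else 1.0
        total += (a_sub / area_fn(y1) - reference) * dy
    return total


ell_bar = end_correction(area, A_SUB, A_SUP, -0.02, 0.02)
ell_exact = L_G * (A_SUB / A_G - 1.0)
print(f"numerical end correction: {ell_bar * 100:.3f} cm")
print(f"closed form for the step model: {ell_exact * 100:.3f} cm")
```

In this toy model the end correction grows as the glottis narrows, which is why the coefficient of the time-derivative term in a Fant equation depends on time through the glottal area.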

Initially, we work in the frequency domain by taking Fourier transforms with respect to τ. For instance,

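$$\hat{G}(\mathbf{x},t;\mathbf{y},\omega)=\frac{1}{2\pi}\int_{-\infty}^{\infty}G(\mathbf{x},t;\mathbf{y},\tau)\,e^{-i\omega\tau}\,d\tau. \tag{11}$$

Matching the Green’s function segments between the sub-glottal and glottal regions, and between the supra-glottal and glottal regions, using Eqs. 8, 9, provides

$$\lim_{y_1\to-0}\hat{G}(\mathbf{x},t;\mathbf{y},\omega)\approx\lim_{\substack{\mathbf{y}\to0\\ y_1<0}}\hat{G}(\mathbf{x},t;\mathbf{y},\omega)\equiv\hat{G}(\mathbf{x},t;-0,\omega)=\hat{\alpha}(\omega)-(\hat{\bar{\ell}}*\hat{\beta})(\omega), \tag{12}$$

$$\lim_{y_1\to+0}\hat{G}(\mathbf{x},t;\mathbf{y},\omega)\approx\lim_{\substack{\mathbf{y}\to0\\ y_1>0}}\hat{G}(\mathbf{x},t;\mathbf{y},\omega)\equiv\hat{G}(\mathbf{x},t;+0,\omega)=\hat{\alpha}(\omega), \tag{13}$$

where $(\hat{\bar{\ell}}*\hat{\beta})(\omega)=\int_{-\infty}^{\infty}\hat{\bar{\ell}}(\omega')\,\hat{\beta}(\omega-\omega')\,d\omega'$. It follows from Eq. 9 that the variation in $\hat{G}$ with $y_2$ and $y_3$ as $y_1\to\pm0$ is small, and thus the first approximations made in Eqs. 12, 13 are valid. Matching the spatial derivatives gives the relations

$$\lim_{y_1\to-0}\frac{\partial\hat{G}}{\partial y_1}(\mathbf{x},t;\mathbf{y},\omega)\equiv\frac{\partial\hat{G}}{\partial y_1}(\mathbf{x},t;-0,\omega)=\hat{\beta}(\omega), \tag{14}$$

$$\lim_{y_1\to+0}\frac{\partial\hat{G}}{\partial y_1}(\mathbf{x},t;\mathbf{y},\omega)\equiv\frac{\partial\hat{G}}{\partial y_1}(\mathbf{x},t;+0,\omega)=\left(\frac{A_{\mathrm{sub}}}{A_{\mathrm{sup}}}\right)\hat{\beta}(\omega). \tag{15}$$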

Green’s function decomposition

The Green’s function segments in the sub- and supra-glottal vocal tracts can be written as a sum of two functions, $G_0$ and $G_M$. $G_0$ denotes the Green’s functions for a point source in either the sub- or supra-glottal regions [i.e., each $G_0$ satisfies Eq. 4] with boundary conditions specified below. The $G_0$ functions satisfy the same homogeneous boundary conditions that G satisfies on the control surface S. The $G_0$ functions also satisfy homogeneous matching conditions where their regions join with the glottal region. The particular matching condition is the “hard-wall” condition that is familiar in the source-filter theory of speech production,

$$\frac{\partial G_0}{\partial y_1}(\mathbf{x},t;\mathbf{y}=\pm0,\tau)=0. \tag{16}$$

The other functions, $G_M$, satisfy the wave equation [i.e., each $G_M$ satisfies Eq. 4, but with the delta function replaced by 0], but account for boundary inhomogeneities on Ssub and Ssup that result from matching Green’s function segments across neighboring regions [see Eqs. 20, 21 below]. Otherwise, $G_M$ satisfies the same homogeneous boundary conditions as G and $G_0$ on surface S. Thus,

$$G(\mathbf{x},t;\mathbf{y},\tau)=G_0(\mathbf{x},t;\mathbf{y},\tau)+G_M(y_M|\mathbf{x},t;\mathbf{y},\tau), \tag{17}$$

where yM denotes the region of matching: yM = −0 denotes the matching region between the sub-glottal tract and glottal region, and yM = +0 denotes the matching region between the supra-glottal tract and glottal region. GM(yM|x,t;y,τ) is used only with yM, x, and y in the same region. Each G0(x, t;y,τ) segment is defined in both the sub-glottal and supra-glottal regions as long as it is understood that these functions are zero for x and y in different regions. It should be clear in subsequent discussion which segment is being referred to.

The functions $G_0$ in both the sub- and supra-glottal vocal tracts are assumed to be homogeneous in time. In particular, the sub- and supra-glottal tracts are quasi-steady when compared to the time scale of vocal fold vibration. This would permit $G_0$ to be written as a function of $t-\tau$ instead of two separate arguments $t$ and $\tau$. However, to explicitly note that the variable $t$ remains in the Fourier transform of $G_0(\mathbf{x},t;\mathbf{y},\tau)$ with respect to $\tau$, albeit as part of a simple phase factor, there will be no change in the argument list to denote homogeneity in time. A closely related function, $\hat{g}_0=\hat{g}_0(\mathbf{x},\mathbf{y},\omega)$, can be defined without the phase factor involving the $t$ variable,

$$\hat{g}_0(\mathbf{x},\mathbf{y},\omega)=e^{i\omega t}\,\hat{G}_0(\mathbf{x},t;\mathbf{y},\omega). \tag{18}$$
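A short worked step may make the origin of the phase factor explicit. Assume time homogeneity in the form $G_0(\mathbf{x},t;\mathbf{y},\tau)=g_0(\mathbf{x},\mathbf{y},t-\tau)$, where the time-domain symbol $g_0$ is introduced here only for the illustration. Applying the transform convention of Eq. 11, $\hat{f}(\omega)=\frac{1}{2\pi}\int_{-\infty}^{\infty}f(\tau)e^{-i\omega\tau}d\tau$, and substituting $s=t-\tau$ gives

$$\begin{aligned}\hat{G}_0(\mathbf{x},t;\mathbf{y},\omega)&=\frac{1}{2\pi}\int_{-\infty}^{\infty}g_0(\mathbf{x},\mathbf{y},t-\tau)\,e^{-i\omega\tau}\,d\tau\\&=e^{-i\omega t}\,\frac{1}{2\pi}\int_{-\infty}^{\infty}g_0(\mathbf{x},\mathbf{y},s)\,e^{i\omega s}\,ds.\end{aligned}$$

Multiplying by $e^{i\omega t}$ therefore removes all dependence on $t$, which is what the definition of $\hat{g}_0$ accomplishes under this assumption.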

It follows from Eq. 4 that

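$$\left(\left(\frac{\omega}{c_0}\right)^2+\frac{\partial^2}{\partial y_j'^2}\right)\hat{g}_0(\mathbf{y},\mathbf{y}',\omega)=-\delta(\mathbf{y}-\mathbf{y}'). \tag{19}$$

The boundary conditions for the $\hat{g}_0$ segments are the Fourier transforms of the homogeneous boundary conditions for $G_0$ expressed in Eq. 5. Thus $\hat{g}_0$ is a Green’s function for the Helmholtz equation.

Returning to the matching conditions for the $\hat{G}_M$ functions on Ssub and Ssup, it follows from Eqs. 14–17 that

$$\hat{\beta}(\omega)=\frac{\partial\hat{G}}{\partial y_1}(\mathbf{x},t;-0,\omega)=\frac{\partial\hat{G}_M}{\partial y_1}(y_M=-0|\mathbf{x},t;-0,\omega), \tag{20}$$

$$\left(\frac{A_{\mathrm{sub}}}{A_{\mathrm{sup}}}\right)\hat{\beta}(\omega)=\frac{\partial\hat{G}_M}{\partial y_1}(y_M=+0|\mathbf{x},t;+0,\omega). \tag{21}$$

From Eqs. 20, 21 it is seen that $\hat{\beta}(\omega)$ and $(A_{\mathrm{sub}}/A_{\mathrm{sup}})\hat{\beta}(\omega)$ are “velocity sources” for $\hat{G}_M(y_M=-0|\mathbf{x},t;\mathbf{y},\omega)$ and $\hat{G}_M(y_M=+0|\mathbf{x},t;\mathbf{y},\omega)$ at the glottal boundaries of their respective regions. Because the functions $G_M$ satisfy the homogeneous wave equation, the functions $\hat{G}_M$ satisfy the Helmholtz equation

$$\left(\left(\frac{\omega}{c_0}\right)^2+\frac{\partial^2}{\partial y_j'^2}\right)\hat{G}_M(y_M'=\pm0|\mathbf{x},t;\mathbf{y}',\omega)=0, \tag{22}$$

where the $\hat{G}_M$ satisfy the same homogeneous boundary conditions that the $\hat{g}_0$ do, except on Ssub or Ssup, where they are given by Eqs. 20, 21. Multiplying Eq. 22 by $\hat{g}_0$ and Eq. 19 by $\hat{G}_M$, subtracting, integrating over either the sub- or supra-glottal region, and applying Green’s theorem, provides

$$\hat{G}_M(y_M=-0|\mathbf{x},t;\mathbf{y},\omega)=-\hat{g}_0(\mathbf{y},-0,\omega)\,\hat{\beta}(\omega)A_{\mathrm{sub}}=-\hat{g}_0(-0,\mathbf{y},\omega)\,\hat{\beta}(\omega)A_{\mathrm{sub}}, \tag{23}$$

$$\hat{G}_M(y_M=+0|\mathbf{x},t;\mathbf{y},\omega)=-\hat{g}_0(\mathbf{y},+0,\omega)\left(\frac{A_{\mathrm{sub}}}{A_{\mathrm{sup}}}\right)\hat{\beta}(\omega)A_{\mathrm{sup}}=-\hat{g}_0(\mathbf{y},+0,\omega)\,\hat{\beta}(\omega)A_{\mathrm{sub}}=-\hat{g}_0(+0,\mathbf{y},\omega)\,\hat{\beta}(\omega)A_{\mathrm{sub}}, \tag{24}$$

where the final equalities in Eqs. 23, 24 follow because the $\hat{g}_0$ are symmetric in the spatial variables (Morse and Feshbach, 1953).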

Input impedances and relations to Green’s function segments

Input impedance at the supra-glottal surface Ssup can be defined as the Fourier transform of the pressure signal at Ssup resulting from a delta function volume velocity pulse input to the supra-glottal tract at Ssup. Equivalently, let $\hat{\varphi}(\mathbf{y},\omega)$ be the velocity potential in a neighborhood of Ssup. Then the Fourier transform of a brief volume velocity input at Ssup is $A_{\mathrm{sup}}\,\partial\hat{\varphi}(+0,\omega)/\partial y_1$ and the resulting pressure at Ssup is $-i\rho_0\omega\hat{\varphi}(+0,\omega)$. The supra-glottal input impedance, $\hat{Z}_{\mathrm{sup}}(\omega)$, can be defined as the ratio of the pressure to the volume velocity at Ssup,

$$\hat{Z}_{\mathrm{sup}}(\omega)\equiv\frac{-i\omega\rho_0\,\hat{\varphi}(+0,\omega)}{A_{\mathrm{sup}}\left(\partial\hat{\varphi}(+0,\omega)/\partial y_1\right)}. \tag{25}$$

Here it is assumed that the flow in a neighborhood of Ssup is dominated by potential flow. Input impedance is not usually measured in this direct manner with brief volume velocity pulses, but indirectly with known impedances in series with the system (e.g., Ishizaka et al., 1976). A similar expression holds for the sub-glottal input impedance, $\hat{Z}_{\mathrm{sub}}$,

$$\hat{Z}_{\mathrm{sub}}(\omega)\equiv-\frac{-i\omega\rho_0\,\hat{\varphi}(-0,\omega)}{A_{\mathrm{sub}}\left(\partial\hat{\varphi}(-0,\omega)/\partial y_1\right)}. \tag{26}$$

The minus sign in Eq. 26 arises because the sub-glottal tract lies in the $-y_1$ direction with respect to its surface Ssub at $y_1=-0$.
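As a concrete instance of an input impedance, the sketch below evaluates the textbook result for a lossless uniform tube with an ideal pressure-release (open) far end, Z = i(ρ0c0/A)tan(ωL/c0), using an exp(+iωt) time dependence. This idealization, the helper names `tube_input_impedance` and `quarter_wave_formants`, and the numerical values are assumptions chosen for illustration; they are not the vocal-tract models used by the authors.

```python
import math


def tube_input_impedance(freq_hz, length, area, rho0=1.2, c0=350.0):
    """Input impedance of a lossless uniform tube with an ideal open far end."""
    omega = 2.0 * math.pi * freq_hz
    return complex(0.0, rho0 * c0 / area * math.tan(omega * length / c0))


def quarter_wave_formants(length, count, c0=350.0):
    """Pole frequencies (Hz) of the impedance above: odd quarter-wave resonances."""
    return [(2 * n - 1) * c0 / (4.0 * length) for n in range(1, count + 1)]


TUBE_LENGTH = 0.175   # m (assumed value)
TUBE_AREA = 3.0e-4    # m^2 (assumed value)

formants = quarter_wave_formants(TUBE_LENGTH, 3)
print("pole frequencies (Hz):", [round(f, 1) for f in formants])
for f in (100.0, 499.0, 1000.0):
    z = tube_input_impedance(f, TUBE_LENGTH, TUBE_AREA)
    print(f"{f:7.1f} Hz: |Z| = {abs(z):.3e} Pa s/m^3")
```

Well below the first pole the impedance has a positive imaginary part, i.e., it is inertive, which matches the low-frequency inductive approximation attributed to Rothenberg (1981) in the Introduction.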

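Let functions $\hat{h}_0$ be the adjoints for $\hat{g}_0$, so that the functions $\hat{h}_0$ are the Fourier transforms of the Green’s functions for the direct problems in the sub- and supra-glottal vocal tracts. By the properties of adjoint pairs (Morse and Feshbach, 1953), $\hat{h}_0(\mathbf{y},\mathbf{y}',\omega)=\hat{g}_0(\mathbf{y}',\mathbf{y},-\omega)$ and $h_0(\mathbf{y},\mathbf{y}',\tau)=g_0(\mathbf{y}',\mathbf{y},-\tau)$. It follows by taking the inverse Fourier transform of Eq. 19 that

$$\lim_{y_1\to-0}\frac{\partial h_0}{\partial y_1}(\mathbf{y},-0,\tau)=\lim_{y_1\to-0}\frac{\partial g_0}{\partial y_1}(-0,\mathbf{y},-\tau)=-\frac{1}{A_{\mathrm{sub}}}\delta(-\tau), \tag{27}$$

$$\lim_{y_1\to+0}\frac{\partial h_0}{\partial y_1}(\mathbf{y},+0,\tau)=\lim_{y_1\to+0}\frac{\partial g_0}{\partial y_1}(+0,\mathbf{y},-\tau)=-\frac{1}{A_{\mathrm{sup}}}\delta(-\tau), \tag{28}$$

so that $\lim_{y_1\to+0}\left(-A_{\mathrm{sup}}\,\partial h_0(\mathbf{y},+0,\tau)/\partial y_1\right)$ and $\lim_{y_1\to-0}\left(-A_{\mathrm{sub}}\,\partial h_0(\mathbf{y},-0,\tau)/\partial y_1\right)$ are delta function “volume velocity” inputs to the supra- and sub-glottal systems that have $\lim_{y_1\to+0}\left(\rho_0\,\partial h_0(\mathbf{y},+0,\tau)/\partial\tau\right)$ and $\lim_{y_1\to-0}\left(\rho_0\,\partial h_0(\mathbf{y},-0,\tau)/\partial\tau\right)$ as “pressure” outputs. It follows from these considerations and Eqs. 25–28 that

$$\hat{Z}_{\mathrm{sup}}(\omega)=i\omega\rho_0\lim_{y_1\to+0}\hat{h}_0(\mathbf{y},+0,\omega)=i\omega\rho_0\lim_{y_1\to+0}\hat{g}_0(+0,\mathbf{y},-\omega), \tag{29}$$

$$\hat{Z}_{\mathrm{sub}}(\omega)=-i\omega\rho_0\lim_{y_1\to-0}\hat{h}_0(\mathbf{y},-0,\omega)=-i\omega\rho_0\lim_{y_1\to-0}\hat{g}_0(-0,\mathbf{y},-\omega). \tag{30}$$

Note that both $\hat{Z}_{\mathrm{sup}}(\omega)=\hat{Z}_{\mathrm{sup}}^*(-\omega)$ and $\hat{Z}_{\mathrm{sub}}(\omega)=\hat{Z}_{\mathrm{sub}}^*(-\omega)$. These identities follow from the reality of the inverse Fourier transform of $\hat{g}_0(\mathbf{x},\mathbf{y},\omega)$. Combining Eqs. 23, 24, 29, 30 results in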
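The conjugate symmetry stated above can be checked numerically for any impedance whose impulse response is real. The sketch below uses a single damped resonance written as a rational function of s = iω with real coefficients; this functional form, the name `resonator_impedance`, and the parameter values are hypothetical and serve only to illustrate the identity, not to model a vocal tract.

```python
import math


def resonator_impedance(omega, omega0, bandwidth, gain):
    """gain * s / (s^2 + bandwidth * s + omega0^2) evaluated at s = i * omega."""
    s = complex(0.0, omega)
    return gain * s / (s * s + bandwidth * s + omega0 ** 2)


def is_conjugate_symmetric(omega, omega0, bandwidth, gain, rel_tol=1e-12):
    """Check Z(-omega) == conj(Z(omega)) to within a relative tolerance."""
    z_pos = resonator_impedance(omega, omega0, bandwidth, gain)
    z_neg = resonator_impedance(-omega, omega0, bandwidth, gain)
    return abs(z_neg - z_pos.conjugate()) <= rel_tol * abs(z_pos)


OMEGA0 = 2.0 * math.pi * 500.0     # resonance at 500 Hz (assumed value)
BANDWIDTH = 2.0 * math.pi * 50.0   # damping rate, rad/s (assumed value)
GAIN = 1.0e9                       # overall scale (assumed value)

checks = [is_conjugate_symmetric(2.0 * math.pi * f, OMEGA0, BANDWIDTH, GAIN)
          for f in (50.0, 500.0, 2000.0)]
print("Z(-omega) == conj(Z(omega)) at test frequencies:", checks)
```

Because the coefficients are real, poles and zeros occur in complex-conjugate pairs in the s-plane, which is the property that makes a pole-zero specification of the impedance compatible with a real time-domain kernel.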

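$$\hat{G}_M(y_M=+0|\mathbf{x},t;+0,\omega)=-\frac{\hat{Z}_{\mathrm{sup}}^*(\omega)}{i\omega\rho_0}\,\hat{\beta}(\omega)A_{\mathrm{sub}}, \tag{31}$$

$$\hat{G}_M(y_M=-0|\mathbf{x},t;-0,\omega)=\frac{\hat{Z}_{\mathrm{sub}}^*(\omega)}{i\omega\rho_0}\,\hat{\beta}(\omega)A_{\mathrm{sub}}. \tag{32}$$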

Combining results for the Fant equation

Subtracting Eq. 12 from Eq. 13, and using Eqs. 17, 31, 32 gives

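$$i\omega\,(\hat{\bar{\ell}}*\hat{\beta})(\omega)+\frac{A_{\mathrm{sub}}}{\rho_0}\hat{\beta}(\omega)\left(\hat{Z}_{\mathrm{sub}}^*(\omega)+\hat{Z}_{\mathrm{sup}}^*(\omega)\right)=i\omega\left(\hat{G}_0(\mathbf{x},t;+0,\omega)-\hat{G}_0(\mathbf{x},t;-0,\omega)\right). \tag{33}$$

Taking the inverse Fourier transform,

$$\frac{\partial}{\partial\tau}\left[\bar{\ell}(\tau)\beta(\tau)\right]+\frac{A_{\mathrm{sub}}}{2\pi\rho_0}\int_{-\infty}^{\infty}\!\!\int_{-\infty}^{\infty}\beta(\tau')\left(\hat{Z}_{\mathrm{sub}}^*(\omega)+\hat{Z}_{\mathrm{sup}}^*(\omega)\right)e^{i\omega(\tau-\tau')}\,d\omega\,d\tau'=\frac{\partial G_0}{\partial\tau}(\mathbf{x},t;+0,\tau)-\frac{\partial G_0}{\partial\tau}(\mathbf{x},t;-0,\tau). \tag{34}$$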

The Fant equation for the time evolution of glottal volume velocity Q is now derived from Eq. 34. First, it is assumed that the sum of all aeroacoustic source terms, F(τ), can be related to B(x,t) for x in the supra-glottal tract by an equation of the form,

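$$B(\mathbf{x},t)=\int_{-\infty}^{\infty}\beta(\tau)F(\tau)\,d\tau. \tag{35}$$

Equation 35 is shown to be true below in a discussion of the aeroacoustic sources in Sec. 2B5. We assert, along with the auxiliary condition $\lim_{\tau\to-\infty}\bar{Q}(\tau)=0$, that the equation

$$\bar{\ell}(\tau)\frac{\partial\bar{Q}}{\partial\tau}(\tau)-\frac{A_{\mathrm{sub}}}{2\pi\rho_0}\int_{-\infty}^{\infty}\!\!\int_{-\infty}^{\infty}\bar{Q}(\tau')\left(\hat{Z}_{\mathrm{sub}}(\omega)+\hat{Z}_{\mathrm{sup}}(\omega)\right)e^{i\omega(\tau-\tau')}\,d\omega\,d\tau'=F(\tau) \tag{36}$$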

is satisfied by the glottal volume velocity Q(τ), i.e., that Q¯(τ)=Q(τ), provided that Eq. 35 holds. Note that the operator on the left-hand side of Eq. 36 is the adjoint to the operator on the left-hand side of Eq. 34.
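The adjoint relation between the derivative parts of the two operators is an integration by parts, and it can be checked numerically. The sketch below assumes the derivative term acting on β has the form ∂(ℓ̄β)/∂τ and the one acting on Q̄ has the form ℓ̄ ∂Q̄/∂τ; it then verifies that the two weighted integrals cancel when the product ℓ̄βQ̄ vanishes at both ends of the time axis. The functions `ell_bar`, `beta`, and `q_bar` are arbitrary smooth test functions invented for the check, and the impedance (convolution) parts of the operators are not covered.

```python
import math


def ell_bar(tau):
    """Hypothetical positive, time-varying end correction."""
    return 1.0 + 0.5 * math.sin(tau)


def beta(tau):
    """Hypothetical test function that decays as |tau| grows."""
    return math.exp(-(tau - 1.0) ** 2)


def q_bar(tau):
    """Hypothetical test function that decays as |tau| grows."""
    return math.exp(-0.5 * (tau + 0.5) ** 2)


def paired_integrals(t_min=-12.0, t_max=12.0, n=4800):
    """Return (int q_bar d(ell_bar*beta)/dtau, int beta ell_bar dq_bar/dtau)."""
    h = (t_max - t_min) / n
    part_beta = 0.0
    part_q = 0.0
    for i in range(1, n):
        tau = t_min + i * h
        # Central differences for the two derivative terms.
        d_lb = (ell_bar(tau + h) * beta(tau + h)
                - ell_bar(tau - h) * beta(tau - h)) / (2.0 * h)
        d_q = (q_bar(tau + h) - q_bar(tau - h)) / (2.0 * h)
        part_beta += q_bar(tau) * d_lb * h
        part_q += beta(tau) * ell_bar(tau) * d_q * h
    return part_beta, part_q


part_beta, part_q = paired_integrals()
print(f"integral with d(ell*beta)/dtau: {part_beta:.6f}")
print(f"integral with ell*dQ/dtau:      {part_q:.6f}")
print(f"sum: {part_beta + part_q:.2e}")
```

Each integral is individually non-zero, but their sum vanishes to round-off, which is the cancellation exploited when the two equations are multiplied by Q̄ and β, added, and integrated over τ.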

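To prove that $\bar{Q}=Q$, multiply Eq. 34 by $\bar{Q}(\tau)$ and Eq. 36 by $\beta(\tau)$, add the results, and integrate with respect to $\tau$ from $-\infty$ to $\infty$. Using $\bar{Q}(\tau)\to0$ as $\tau\to-\infty$ and $\beta(\tau)\to0$ as $\tau\to\infty$, the symmetry properties of $\hat{Z}_{\mathrm{sup}}(\omega)$ and $\hat{Z}_{\mathrm{sub}}(\omega)$ noted after Eq. 30, and the fact that $G_0(\mathbf{x},t;-0,\tau)=0$ for $\mathbf{x}$ in the supra-glottal vocal tract, this results in

$$\int_{-\infty}^{\infty}\frac{\partial G_0}{\partial\tau}(\mathbf{x},t;+0,\tau)\,\bar{Q}(\tau)\,d\tau=-\int_{-\infty}^{\infty}\beta(\tau)F(\tau)\,d\tau. \tag{37}$$

Multiplying both sides of Eq. 37 by $A_{\mathrm{sup}}$, taking the derivative with respect to $x_1$, and taking the limit as $x_1\to+0$ gives

$$\lim_{x_1\to+0}A_{\mathrm{sup}}\int_{-\infty}^{\infty}\frac{\partial G_0}{\partial x_1}(\mathbf{x},t;+0,\tau)\,\frac{\partial\bar{Q}}{\partial\tau}(\tau)\,d\tau=\lim_{x_1\to+0}A_{\mathrm{sup}}\frac{\partial B}{\partial x_1}(\mathbf{x},t), \tag{38}$$

where integration by parts has been performed and Eq. 35 has been invoked. As before, let $\varphi$ be the velocity potential in a neighborhood of Ssup and use the definition of stagnation enthalpy, B (namely, $B=-\partial\varphi/\partial t$ in a region without vorticity). The right-hand side of Eq. 38 is

$$\lim_{x_1\to+0}A_{\mathrm{sup}}\frac{\partial B}{\partial x_1}(\mathbf{x},t)=-\lim_{x_1\to+0}A_{\mathrm{sup}}\frac{\partial^2\varphi}{\partial x_1\,\partial t}=-\frac{\partial Q}{\partial t}(t). \tag{39}$$

Because $G_0$ satisfies Eq. 4 and is symmetric in the spatial variables (Morse and Feshbach, 1953),

$$\lim_{x_1\to+0}A_{\mathrm{sup}}\frac{\partial G_0}{\partial x_1}(\mathbf{x},t;+0,\tau)=-\delta(t-\tau). \tag{40}$$

It follows that the left-hand side of Eq. 38 is $-\partial\bar{Q}(t)/\partial t$. Thus, with the condition $\lim_{\tau\to-\infty}Q(\tau)=0$, $\bar{Q}=Q$ if Eq. 35 is true. Equation 36 is the desired Fant equation under the same condition.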

Aeroacoustic sources

In order to show that Eq. 35 is true, it is necessary to examine Eq. 7. Initially the glottis is closed and the lungs are presumed to contract slowly, which means that lungs are a source of total enthalpy Bsub(y, τ) in the sub-glottal tract. As discussed previously in Sec. 2A, the active lung contraction occurs outside of the control volume V(τ), so that the effect of this source is taken into account by the flow of air passing through the surface SL inside the bronchial tubes, and the first term in Eq. 7 provides the mathematical expression for this source.

From Eq. 7 the enthalpy just below the glottis due to the lung source is

Bsub(0,t)=[SLG(0,t;y,τ)v(y,τ)τS(y)vSsubwallG(0,t;y,τ)yωdS(y)]dτVsubG(0,t;y,τ)yωvd3ydτ=[SLG(-0,t;y,τ)τv(y,τ)S(y)+vSsubwallG(0,t;y,τ)yωS(y)]dτVsubG(0,t;y,τ)yωvd3ydτ, (41)

where Vsub is the portion of the control volume in the sub-glottal tract. Thus the total enthalpy just below the glottis increases due to the first term in the integral, but it is reduced from what it otherwise would be at the glottis by viscous loss in the bronchial tubes and by vortex shedding at any sharp edges in the bronchial tree. The latter effect should be small compared to the viscous losses when the flow is slow enough so that the Reynolds numbers are small. Define psub(τ)=ρ0Bsub(-0,τ), which is approximately the slowly increasing pressure just below the glottis before it opens. When psub(τ = 0) is attained the vocal folds begin to vibrate, and it is assumed that the lungs continue to supply the same level of total enthalpy below the glottis.

When the glottis opens, the B at a field point x in the supra-glottal tract due to the sub-glottal enthalpy is

Bsub(x,t)=1ρ0-Ssuppsub(τ=0)G(x,t;0,τ)y·S(y)dτ=Asubρ0-psub(τ=0)β(τ)dτ, (42)

where continuity of Bsub(−0, τ) from just below the glottis to just above the acoustically compact glottis is assumed and Eq. 15 has been used for the second equality. Define Fsub(τ)=Asubρ0psub(τ)H(τ), and then Eq. 42 can be written

$$B_{sub}(\mathbf{x},t)=\int_{-\infty}^{\infty}F_{sub}(\tau)\,\beta(\tau)\,d\tau, \tag{43}$$

where x is in the supra-glottal tract and t>τ.

In this development, the volume changes in the vocal folds, as represented by the first term on the right-hand side in Eq. 7, are unknown and not considered here. They can be added in a later development when they become known. The viscous term, the third term on the right-hand side of Eq. 7, is small compared to the volume integral of the vorticity interaction term in the glottis, which is considered next, except when the glottis is nearly closed.

The volume integral on the right-hand side of Eq. 7 is a dipole source, and its main contribution is from the jet within distance $\sqrt{A_g(\tau)}$ downstream of the glottis. Its strength is modulated by the variations in the glottal area Ag(τ) produced by fold vibration (Howe and McGowan, 2007, 2011a). The field, Bω, resulting from the vortex source is written

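$$B_\omega(\mathbf{x},t)=-\int_{-\infty}^{\infty}\int_{V(\tau)}\frac{\partial G}{\partial\mathbf{y}}(\mathbf{x},\mathbf{y},t-\tau)\cdot\boldsymbol{\omega}\wedge\mathbf{v}(\mathbf{y},\tau)\,d^3\mathbf{y}\,d\tau\approx-\int_{-\infty}^{\infty}\beta(\tau)\int_{V(\tau)}\frac{\partial Y}{\partial\mathbf{y}}(\mathbf{y},\tau)\cdot\boldsymbol{\omega}\wedge\mathbf{v}(\mathbf{y},\tau)\,d^3\mathbf{y}\,d\tau, \tag{44}$$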

where Eq. 8 has been used. Define

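$$F(\tau)=F_{sub}(\tau)-\int_{V(\tau)}\frac{\partial Y}{\partial\mathbf{y}}(\mathbf{y},\tau)\cdot\boldsymbol{\omega}\wedge\mathbf{v}(\mathbf{y},\tau)\,d^3\mathbf{y}. \tag{45}$$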

With Eqs. 42–45, Eq. 7 reduces to the simple form of Eq. 35 for x in the supra-glottal vocal tract and t > τ > 0. Thus the Fant equation, Eq. 36, is true.

Final form of the Fant equation

The dependence on Q in Eq. 36 is made more explicit when the vortex sound source term, the second term on the right-hand side of Eq. 45, is evaluated in terms of Q(τ). This has been done previously in Howe and McGowan (2007) under the assumptions that the glottal jet velocity, Uσ(τ), does not vary substantially over a distance of the glottal length scale $\sqrt{A_g(\tau)}$, that the glottal area is small compared to the supra-glottal tract area, $A_g(\tau)\ll A_{sup}$, and that the Strouhal number $f_0\sqrt{A_g(\tau)}/U_\sigma(\tau)$ is small, where f0 is the fundamental frequency of voicing. The result is

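$$\int_{V(\tau)}\frac{\partial Y}{\partial\mathbf{y}}(\mathbf{y},\tau)\cdot\boldsymbol{\omega}\wedge\mathbf{v}(\mathbf{y},\tau)\,d^3\mathbf{y}\approx\frac{A_{sup}U_\sigma^2(\tau)}{2}=\frac{A_{sup}}{2A_g^2(\tau)}\frac{Q^2(\tau)}{\sigma^2(\tau)}, \tag{46}$$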

where σ(τ) is the jet contraction ratio. With Eqs. 45 and 46, Eq. 36 can be written

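$$\bar{\ell}(\tau)\frac{\partial Q(\tau)}{\partial\tau}+\frac{A_{sup}}{2A_g^2(\tau)}\frac{Q^2(\tau)}{\sigma^2(\tau)}+\frac{A_{sub}}{2\pi\rho_0}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}Q(\tau')\left(\hat{Z}_{sub}(\omega)+\hat{Z}_{sup}(\omega)\right)e^{-i\omega(\tau-\tau')}\,d\omega\,d\tau'=\frac{A_{sub}\,p_{sub}}{\rho_0} \tag{47}$$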

for τ > 0. Equation 47 is the desired Fant equation.
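The second equality in Eq. 46 can be checked in one line. As a sketch, assume (this step is not spelled out in the text above) that the jet leaves the glottis with cross-section σ(τ)Ag(τ), so that the jet velocity and the glottal volume velocity are related by Uσ = Q/(σAg):

```latex
U_\sigma(\tau) = \frac{Q(\tau)}{\sigma(\tau)\,A_g(\tau)}
\quad\Longrightarrow\quad
\frac{A_{sup}\,U_\sigma^2(\tau)}{2}
= \frac{A_{sup}}{2A_g^2(\tau)}\,\frac{Q^2(\tau)}{\sigma^2(\tau)} .
```

This is the quadratic term that appears as the second term on the left-hand side of Eq. 47.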

Low frequency impedance models

In order to examine the effects of the sub-glottal and supra-glottal tracts, the input impedances need to be specified. The functional forms of the input impedances Z^sub(ω) and Z^sup(ω) are now examined for a range of frequencies low enough that only plane wave propagation can occur in the sub- and supra-glottal vocal tracts. The case where loss is neglected is examined first, and then the effects of losses at solid surfaces and due to radiation into the atmosphere are accounted for as a perturbation to these results. Note that while losses caused by changes in entropy have not been included as part of the derivation thus far, their effect on propagation is small, so that the Fant equation is a valid approximation even when heat conduction losses at vocal tract walls are included as an empirical adjustment to the bandwidths of the input impedances, as occurs below.

The input impedances for the sub- and supra-glottal vocal tracts are examined in the case that the equation of motion in the frequency domain is described by the Fourier transform of Eq. 2, but with $\partial\hat{B}(\mathbf{x},\omega)/\partial x_n=0$ on solid surfaces. Because only plane wave propagation is assumed to occur, we can write $\hat{B}=\hat{B}(x_1,\omega)$. The boundary conditions at the ends of the sub- and supra-glottal tracts are now considered. For the supra-glottal vocal tract of length $\bar{L}_{sup}=\bar{L}_{sup}(\omega)$, including a frequency-dependent end correction (Howe and McGowan, 2011b), these conditions are

$$\frac{d\hat{B}(+0,\omega)}{dx_1}=0\quad\text{and}\quad\hat{B}(\bar{L}_{sup},\omega)=0. \tag{48}$$

For the sub-glottal tract the simplified model of Ishizaka et al. (1976) is used for low frequency sub-glottal propagation (see Fig. 6 of Ishizaka et al., 1976). The distance from the glottis to a large expansion in bronchial area is about 20 cm for male adults and is denoted L¯sub, which also includes a frequency dependent end correction. The boundary conditions in the sub-glottal tract are similar to those of the supra-glottal tract,

$$\frac{d\hat{B}(-0,\omega)}{dx_1}=0\quad\text{and}\quad\hat{B}(-\bar{L}_{sub},\omega)=0. \tag{49}$$

In the Appendix two models are considered for low-frequency propagation: (1) Sturm-Liouville problems, and (2) concatenated cylindrical tubes, a model commonly used to compute speech acoustics when cross-sectional vocal tract areas are specified. In both cases it is shown that the infinitely many poles and zeros of the input impedance are simple and real, and that they interleave one another. From these facts, it is possible to write the impedances explicitly as products of factors (Whittaker and Watson, 1927). For the supra-glottal input impedance

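$$\hat{Z}_{sup}(\omega)=\frac{-i\omega\rho_0\Lambda(\omega)}{2\pi A_{sup}}\prod_{n=1}^{\infty}\left(\frac{(1-\omega/\psi_n)(1+\omega/\psi_n)}{(1-\omega/\omega_n)(1+\omega/\omega_n)}\right), \tag{50}$$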

where $\omega_n,\psi_n>0$ and $\omega_n<\psi_n<\omega_{n+1}$. $\Lambda(\omega)=\bar{L}_{sup}$ is the length scale that is necessary in order that this input impedance agrees with the case of a tube of constant cross-sectional area of length $\bar{L}_{sup}$ in the Appendix of Howe and McGowan (2011b). The symmetry in the products about ω = 0 arises because the inverse Fourier transform of $\hat{Z}_{sup}(\omega)$, which is purely imaginary for real ω, is real. A similar expression defines the input impedance for the sub-glottal tract. In this form, the input impedances are defined on the complex ω plane with poles and zeros on the real axis.

Now that the behavior of the input impedances is understood for the lossless case, loss is included as a perturbation. This has the effect of moving the zeros and poles into the lower half of the complex ω plane in such a way that the ordering of the real parts of these points is preserved from the lossless case.1 Generalizing to the case where losses are present provides the following functional forms for the input impedances,

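$$\hat{Z}_{sup}(\omega)=\frac{-i\omega\rho_0\bar{L}_{sup}}{2\pi A_{sup}}\prod_{n=1}^{\infty}\left(\frac{(1-\omega/r_n)(1+\omega/r_n^*)}{(1-\omega/s_n)(1+\omega/s_n^*)}\right), \tag{51}$$

$$\hat{Z}_{sub}(\omega)=\frac{-i\omega\rho_0\bar{L}_{sub}}{2\pi A_{sub}}\prod_{n=1}^{\infty}\left(\frac{(1-\omega/R_n)(1+\omega/R_n^*)}{(1-\omega/S_n)(1+\omega/S_n^*)}\right), \tag{52}$$

where $r_n=\psi_n-i\xi_n$, $s_n=\omega_n-i\gamma_n$, $R_n=\Psi_n-i\Xi_n$, and $S_n=\Omega_n-i\Gamma_n$. Here $\Psi_n,\Omega_n,\gamma_n,\xi_n,\Gamma_n,\Xi_n>0$, and $\Omega_n<\Psi_n<\Omega_{n+1}$.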

Because this development is for frequencies for which there is little variation of B^(x1,ω) across the tracts’ cross-sections, this representation is useful only for a finite number of zeros and poles. Further, because we are interested in the interaction with the volume velocity at the glottis, which generally has most of its energy in a frequency range well below the third resonant frequencies of either the sub- or supra-glottal vocal tracts, the above representations will be restricted to the first two pole pairs and the first zero pair. (A pair consists of two points whose real parts are non-zero and of equal magnitude but opposite sign.)
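To make the truncated representation concrete, the following Python sketch evaluates Eq. 51 with only the first two pole pairs and the first zero pair. The function names are hypothetical, and the conversion from a formant frequency and bandwidth in hertz to a complex root is an assumption made for illustration: the decay rate is taken as π times the 3 dB bandwidth, which the text does not state explicitly. The example roots are the /i/ values quoted in the numerical section.

```python
import math

def complex_root(freq_hz, bandwidth_hz):
    """Root in the lower half-plane, e.g. s_n = omega_n - i*gamma_n (rad/s).

    Assumption: gamma_n = pi * (3 dB bandwidth in Hz).
    """
    return complex(2.0 * math.pi * freq_hz, -math.pi * bandwidth_hz)

def truncated_impedance(omega, poles, zeros, length, area, rho0=1.2):
    """Eq. 51 with the infinite product cut off at the supplied roots."""
    value = -1j * omega * rho0 * length / (2.0 * math.pi * area)
    for r in zeros:
        value *= (1.0 - omega / r) * (1.0 + omega / r.conjugate())
    for s in poles:
        value /= (1.0 - omega / s) * (1.0 + omega / s.conjugate())
    return value

# Supra-glottal values for /i/ quoted in the numerical section.
poles = [complex_root(280.0, 65.0), complex_root(2250.0, 98.0)]
zeros = [complex_root(1265.0, 69.0)]

# Well below the first formant the product is close to 1, so only the
# leading factor -i*omega*rho0*L/(2*pi*A) of Eq. 51 should remain.
omega_low = 2.0 * math.pi * 10.0
z_low = truncated_impedance(omega_low, poles, zeros, length=0.17, area=2e-4)
z_lead = -1j * omega_low * 1.2 * 0.17 / (2.0 * math.pi * 2e-4)
```

Pairing each root with its mirror image about the imaginary axis is what makes the impedance at −ω the complex conjugate of its value at ω, so that its inverse Fourier transform is real.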

The Fant equation, Eq. 47, can now be rewritten in the low frequency approximation. First, the integral with respect to ω in the third term on the left-hand side of Eq. 47 is evaluated. With all the poles in the lower half plane, this integral is zero for τ-τ'<0 because the contour can be closed in the upper-half of the complex frequency plane. For τ-τ'>0 the contour can be closed in the lower half-plane and evaluated by the method of residues. Let

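$$J_1=\frac{s_1(s_1-r_1)(s_1+r_1^*)\,C(s_1)}{(s_1+s_1^*)(s_1-s_2)(s_1+s_2^*)},\quad J_2=\frac{s_2(s_2-r_1)(s_2+r_1^*)\,C(s_2)}{(s_2+s_2^*)(s_2-s_1)(s_2+s_1^*)},\quad\mathcal{R}_n=\mathrm{Re}(J_n),\quad I_n=\mathrm{Im}(J_n), \tag{53}$$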

where C(ω) is a correction factor for higher-order zeros and poles. To a first approximation,

C(ω)=1+i2ω(γ3|r2|2-ξ2|s3|2). (54)

With these definitions and fixing the frequency dependent length, L¯sup,

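$$\int_{-\infty}^{\infty}\hat{Z}_{sup}(\omega)\,e^{-i\omega(\tau-\tau')}\,d\omega=\frac{2\rho_0\bar{L}_{sup}}{A_{sup}}H(\tau-\tau')\sum_{n=1}^{2}A_n\,e^{-\gamma_n(\tau-\tau')}\cos[\omega_n(\tau-\tau')-\phi_n] \tag{55}$$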

where

$$A_n=\left(\frac{|s_1||s_2|}{|r_1|}\right)^2\sqrt{\mathcal{R}_n^2+I_n^2}\quad\text{and}\quad\phi_n=\arctan(I_n/\mathcal{R}_n). \tag{56}$$

In the lossless case this reduces to,

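$$\int_{-\infty}^{\infty}\hat{Z}_{sup}(\omega)\,e^{-i\omega(\tau-\tau')}\,d\omega=\frac{2\rho_0\bar{L}_{sup}}{A_{sup}}\left(\frac{\omega_1\omega_2}{\psi_1}\right)^2H(\tau-\tau')\times\frac{1}{(\omega_2^2-\omega_1^2)}\left\{(\psi_1^2-\omega_1^2)\cos\omega_1(\tau-\tau')+(\omega_2^2-\psi_1^2)\cos\omega_2(\tau-\tau')\right\}. \tag{57}$$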
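The residue constants of Eq. 53 and the amplitudes and phases of Eq. 56 are straightforward to evaluate numerically. The sketch below is illustrative only: the function names are hypothetical, the correction factor C(ω) is set to 1 by default (an assumption that ignores the higher-order roots), and `atan2` is used in place of the arctangent of the ratio, which agrees with Eq. 56 whenever the real part of Jn is positive.

```python
import math

def residue_constants(s1, s2, r1, correction=lambda w: 1.0):
    """J_1 and J_2 of Eq. 53; `correction` plays the role of C(omega)."""
    def j(sa, sb):
        num = sa * (sa - r1) * (sa + r1.conjugate()) * correction(sa)
        den = (sa + sa.conjugate()) * (sa - sb) * (sa + sb.conjugate())
        return num / den
    return j(s1, s2), j(s2, s1)

def amplitudes_and_phases(s1, s2, r1):
    """(A_n, phi_n) pairs of Eq. 56 for n = 1, 2."""
    scale = (abs(s1) * abs(s2) / abs(r1)) ** 2
    return [(scale * abs(j), math.atan2(j.imag, j.real))
            for j in residue_constants(s1, s2, r1)]

# Lossless check: with real roots both phases should vanish.
w1, p1, w2 = (2.0 * math.pi * f for f in (280.0, 1265.0, 2250.0))
(a1, phi1), (a2, phi2) = amplitudes_and_phases(
    complex(w1), complex(w2), complex(p1))
```

In the lossless limit the constants Jn are real and positive, so the phases vanish and each pole contributes a pure cosine, as in Eq. 57.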

Amplitudes Bn and phases Φn can be defined for the sub-glottal tract in an analogous way to those defined in the supra-glottal tract by Eq. 56. In this low frequency approximation the Fant equation [Eq. 47] is written,

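$$\bar{\ell}(\tau)\frac{\partial Q(\tau)}{\partial\tau}+\frac{A_{sup}}{2A_g^2(\tau)}\frac{Q^2(\tau)}{\sigma^2(\tau)}+A_{sub}\int_{-\infty}^{\infty}H(\tau-\tau')\,Q(\tau')\times\left\{\frac{2\bar{L}_{sup}}{A_{sup}}\sum_{n=1}^{2}A_n\,e^{-\gamma_n(\tau-\tau')}\cos[\omega_n(\tau-\tau')-\phi_n]+\frac{2\bar{L}_{sub}}{A_{sub}}\sum_{n=1}^{2}B_n\,e^{-\Gamma_n(\tau-\tau')}\cos[\Omega_n(\tau-\tau')-\Phi_n]\right\}d\tau'=\frac{A_{sub}\,p_{sub}}{\rho_0} \tag{58}$$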

for τ>0.

NUMERICAL SIMULATIONS

Definitions

The Fant equation in the low frequency approximation of Eq. 58 is solved numerically, following the procedure introduced in Howe and McGowan (2011b). Define,

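$$K_{2n-1}(\tau)=\int_{-\infty}^{\tau}Q(\tau')\,e^{-\gamma_n(\tau-\tau')}\cos[\omega_n(\tau-\tau')-\phi_n]\,d\tau',\quad K_{2n}(\tau)=\int_{-\infty}^{\tau}Q(\tau')\,e^{-\gamma_n(\tau-\tau')}\sin[\omega_n(\tau-\tau')-\phi_n]\,d\tau' \tag{59}$$

and

$$J_{2n-1}(\tau)=\int_{-\infty}^{\tau}Q(\tau')\,e^{-\Gamma_n(\tau-\tau')}\cos[\Omega_n(\tau-\tau')-\Phi_n]\,d\tau',\quad J_{2n}(\tau)=\int_{-\infty}^{\tau}Q(\tau')\,e^{-\Gamma_n(\tau-\tau')}\sin[\Omega_n(\tau-\tau')-\Phi_n]\,d\tau' \tag{60}$$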

for n = 1, 2. With these definitions, the integro-differential equation is replaced by a system of first-order differential equations,

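$$\begin{aligned}
\frac{dQ}{d\tau}&=\left[\frac{A_{sub}\,p_{sub}}{\rho_0}-\frac{A_{sup}}{2A_g^2(\tau)}\frac{Q^2(\tau)}{\sigma^2(\tau)}-A_{sub}\sum_{n=1}^{2}\left(\frac{2\bar{L}_{sup}}{A_{sup}}A_nK_{2n-1}(\tau)+\frac{2\bar{L}_{sub}}{A_{sub}}B_nJ_{2n-1}(\tau)\right)\right]\Big/\bar{\ell}(\tau),\\
\frac{dK_{2n-1}}{d\tau}&=Q-\omega_nK_{2n}-\gamma_nK_{2n-1},\\
\frac{dK_{2n}}{d\tau}&=\omega_nK_{2n-1}-\gamma_nK_{2n},\\
\frac{dJ_{2n-1}}{d\tau}&=Q-\Omega_nJ_{2n}-\Gamma_nJ_{2n-1},\\
\frac{dJ_{2n}}{d\tau}&=\Omega_nJ_{2n-1}-\Gamma_nJ_{2n},
\end{aligned} \tag{61}$$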

with initial conditions Q = K2n−1 = K2n = J2n−1 = J2n = 0 for τ = 0 and n = 1, 2.
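As a check on how the auxiliary equations in Eq. 61 follow from Eqs. 59 and 60, the integrals can be differentiated with respect to their upper limit (Leibniz rule). The lines below are a reader's verification rather than part of the original derivation. They suggest that the forms written in Eq. 61 correspond to taking cos φn ≈ 1 and sin φn ≈ 0, which is exact in the lossless case, where the phases vanish.

```latex
\frac{dK_{2n-1}}{d\tau} = Q(\tau)\cos\phi_n - \omega_n K_{2n} - \gamma_n K_{2n-1},
\qquad
\frac{dK_{2n}}{d\tau} = -Q(\tau)\sin\phi_n + \omega_n K_{2n-1} - \gamma_n K_{2n}.
```

The same steps apply to J2n−1 and J2n with (Ωn, Γn, Φn) in place of (ωn, γn, φn).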

The glottis is assumed to be of length L in the x1 direction and to have rectangular cross-sections orthogonal to the x1 direction of constant spanwise dimension d (into the paper in Fig. 1). The width, w = w(x1, τ) (the vertical direction in Fig. 1), is time dependent so that the minimum glottal area Ag(τ) varies in time,

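$$\frac{A_g(\tau)}{A_{sub}}=a_0+a_1[1-\cos(\omega_0\tau)], \tag{62}$$

where f0 = ω0/2π is the voice fundamental frequency. The potential Y(y, τ) within the glottis and the Rayleigh length $\bar{\ell}(\tau)$ introduced in Eq. 10 can be approximated as

$$Y(\mathbf{y},\tau)\approx\frac{A_{sub}\,y_1}{A_g(\tau)}-\frac{\bar{\ell}(\tau)}{2},\quad-L/2<y_1<L/2,\qquad\text{where}\quad\bar{\ell}(\tau)\approx\frac{A_{sub}L}{A_g(\tau)}. \tag{63}$$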

Numerical results

The numerical simulations were performed with the following fixed parameter values,

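$$A_{sup}=A_{sub}=2\times10^{-4}\ \mathrm{m^2},\quad\bar{L}_{sup}=0.17\ \mathrm{m},\quad\bar{L}_{sub}=0.2\ \mathrm{m},\quad L=0.005\ \mathrm{m},$$
$$c_0=345\ \mathrm{m/s},\quad\rho_0=1.2\ \mathrm{kg/m^3},\quad p_{sub}=1600\ \mathrm{Pa},\quad a_0=0.001,\quad a_1=0.05.$$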
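With these values, the first-order system of Eq. 61 can be integrated with any standard ODE scheme. The Python sketch below uses a fixed-step classical Runge-Kutta method, which is a choice made here for illustration and not necessarily the procedure of Howe and McGowan (2011b). The dictionary keys and function names are hypothetical. The jet contraction ratio σ is set to 1 as a placeholder assumption because no value is quoted in this section, and the demonstration run sets the resonance amplitudes An and Bn to zero, so it shows only the inertial and quadratic terms; amplitudes, frequencies, and decay rates from Eq. 56 would be supplied in their place to include tract coupling.

```python
import math

def glottal_area(tau, p):
    """Minimum glottal area A_g(tau), Eq. 62."""
    phase = 2.0 * math.pi * p["f0"] * tau
    return p["A_sub"] * (p["a0"] + p["a1"] * (1.0 - math.cos(phase)))

def rhs(tau, y, p):
    """Right-hand side of Eq. 61 for y = [Q, K1, K2, K3, K4, J1, J2, J3, J4]."""
    q, k, j = y[0], y[1:5], y[5:9]
    a_g = glottal_area(tau, p)
    ell = p["A_sub"] * p["L"] / a_g  # Rayleigh length, Eq. 63
    coupling = 0.0
    for n in range(2):  # n = 0, 1 stands for the paper's n = 1, 2
        coupling += 2.0 * p["L_sup"] / p["A_sup"] * p["A"][n] * k[2 * n]
        coupling += 2.0 * p["L_sub"] / p["A_sub"] * p["B"][n] * j[2 * n]
    dq = (p["A_sub"] * p["p_sub"] / p["rho0"]
          - p["A_sup"] * q * q / (2.0 * a_g ** 2 * p["sigma"] ** 2)
          - p["A_sub"] * coupling) / ell
    dy = [dq]
    for state, freq, damp in ((k, p["omega"], p["gamma"]),
                              (j, p["Omega"], p["Gamma"])):
        for n in range(2):
            odd, even = state[2 * n], state[2 * n + 1]
            dy.append(q - freq[n] * even - damp[n] * odd)
            dy.append(freq[n] * odd - damp[n] * even)
    return dy

def rk4_step(f, tau, y, h, p):
    """One classical fourth-order Runge-Kutta step."""
    k1 = f(tau, y, p)
    k2 = f(tau + h / 2, [a + h / 2 * b for a, b in zip(y, k1)], p)
    k3 = f(tau + h / 2, [a + h / 2 * b for a, b in zip(y, k2)], p)
    k4 = f(tau + h, [a + h * b for a, b in zip(y, k3)], p)
    return [a + h / 6 * (b + 2 * c + 2 * d + e)
            for a, b, c, d, e in zip(y, k1, k2, k3, k4)]

def simulate(p, cycles=2.0, h=1e-5):
    """Integrate from rest (all state variables zero at tau = 0)."""
    y, tau, flow = [0.0] * 9, 0.0, []
    for _ in range(int(round(cycles / p["f0"] / h))):
        y = rk4_step(rhs, tau, y, h, p)
        tau += h
        flow.append(y[0])
    return flow

params = {
    "A_sup": 2e-4, "A_sub": 2e-4, "L_sup": 0.17, "L_sub": 0.2, "L": 0.005,
    "rho0": 1.2, "p_sub": 1600.0, "a0": 0.001, "a1": 0.05, "f0": 100.0,
    "sigma": 1.0,                       # assumed; not given in the text
    "A": [0.0, 0.0], "B": [0.0, 0.0],   # zero amplitudes: no tract coupling
    "omega": [0.0, 0.0], "gamma": [0.0, 0.0],
    "Omega": [0.0, 0.0], "Gamma": [0.0, 0.0],
}
flow = simulate(params)
```

Without coupling, the flow stays below the quasi-steady value σAg(2psub/ρ0)^(1/2) evaluated at maximum glottal area, which gives a simple sanity check on the integration before the resonance terms are switched on.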

The sub-glottal properties are set according to Ishizaka et al. (1976). The first three sub-glottal resonances, or formants, are at 640 Hz, 1400 Hz, and 2100 Hz, with approximate bandwidths of 256 Hz, 156 Hz, and 175 Hz, respectively.2 The zeros were assumed to be midway between the surrounding poles, with bandwidths derived from the same plot of Q-factors as used to derive the bandwidths of the resonances (Fig. 4 in Ishizaka et al., 1976). (Bandwidth is the width of the resonance peak 3 dB down from its power maximum.) This gives the first two zeros as 1020 Hz and 1750 Hz with bandwidths of 206 Hz and 165 Hz, respectively. The first simulation is for the case where there is no supra-glottal resonance, so that Z^sup=ρ0c0/Asup for a semi-infinite tube. The results of these simulations are shown in Fig. 2 for f0 = 100, 280, 400, and 800 Hz in terms of normalized glottal volume velocity, Qnorm(t)=(Q(t)ρ0c0)/(Asubpsub). The horizontal axis is time normalized by f0, which gives the number of glottal cycles. The dashed lines are the volume velocity that results when the sub-glottal tract is replaced by a semi-infinite tube, with Z^sub=ρ0c0/Asub.
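The placement of the zeros and the two normalizations used in the figures are simple enough to state as code. This is a minimal sketch with hypothetical function names; it covers only the zero frequencies, since the zero bandwidths are read from a plot in Ishizaka et al. (1976) rather than computed.

```python
def zeros_midway(pole_freqs_hz):
    """Place each zero at the mean of its two neighbouring pole frequencies."""
    return [0.5 * (lo + hi)
            for lo, hi in zip(pole_freqs_hz, pole_freqs_hz[1:])]

def normalized_flow(q, rho0=1.2, c0=345.0, a_sub=2e-4, p_sub=1600.0):
    """Q_norm = Q * rho0 * c0 / (A_sub * p_sub)."""
    return q * rho0 * c0 / (a_sub * p_sub)

def glottal_cycles(t, f0):
    """Time normalized by the fundamental period, i.e. f0 * t."""
    return f0 * t

# Sub-glottal formants quoted above, in Hz.
subglottal_zeros = zeros_midway([640.0, 1400.0, 2100.0])
```

For the three sub-glottal formants this reproduces the zero frequencies of 1020 Hz and 1750 Hz given in the text.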

Figure 2.

Normalized glottal volume velocity, Qnorm=(Q(t)ρ0c0)/(psubAsub), versus normalized time, f0t, at four different fundamental frequencies. The supra-glottal impedance is that for a semi-infinite tube, (ρ0c0)/Asup. The dark curves have sub-glottal input impedance specified in the text according to Ishizaka et al. (1976). Sub-glottal formant frequencies, with bandwidths in brackets are 640 Hz (256 Hz), 1400 Hz (156 Hz), and 2100 Hz (175 Hz). The dashed curve for comparison is when the sub-glottal tract is a semi-infinite tube with input impedance (ρ0c0)/Asub. (a) f0 = 100 Hz, (b) f0 =280 Hz, (c) f0 = 400 Hz, and (d) f0 = 800 Hz.

The amplitude of the glottal volume velocity is slightly enhanced by the presence of the sub-glottal resonances for all the fundamental frequencies f0. The enhancement is largest for the lowest fundamental frequency, f0 = 100 Hz. There is evidence of rightward skewing of the glottal pulse for f0 = 280 Hz, which increases at f0 = 400 Hz. As predicted by Rothenberg (1981), and confirmed by Ananthapadmanabha and Fant (1982) and Titze (2008), as the sub-glottal input impedance becomes more “inductive” (i.e., with a larger positive imaginary part) for the higher fundamental frequency, rightward skewing of the glottal pulse becomes greater. Note that the relatively large bandwidth of the first sub-glottal resonance means that this effect is seen well below the first resonance frequency. At f0 = 800 Hz the effect of being above the first resonance, yet still well below the first zero and higher poles of the input impedance, is seen to skew the volume velocity pulse to the left. This is what occurs with a “capacitive” load on the glottis (Titze, 2008).

The effects of supra-glottal resonances are now considered. Possible resonance frequencies, or formants, appropriate for an adult male American English production of /i/ are taken to be 280 Hz, 2250 Hz, and 2750 Hz (Olive et al., 1993). Bandwidths are estimated by interpolation of the data presented in Stevens (1998) from Fant (1962) and Fujimura and Lindqvist (1970) to be 65 Hz, 98 Hz, and 113 Hz, respectively. The first two zeros of the supra-glottal input impedance are placed at the mean frequencies of the surrounding resonances, at 1265 Hz and 2500 Hz, with bandwidths of 69 Hz and 105 Hz, respectively. The resulting Qnorm(t) is shown in Fig. 3 for f0 = 100, 260, 275, and 290 Hz. The dashed curve for comparison is the condition (base condition) discussed above with the sub-glottal resonances alone. The amplitude of the volume velocity pulse is enhanced substantially above the base condition for f0 = 100 Hz, but there is no change in the skewing of the pulse. The enhancement of the glottal pulsing is diminished when f0 changes from 100 Hz to frequencies in the neighborhood of the first supra-glottal formant. Note also that steady state takes two or three pulses to become established when the fundamental frequency is close to the first formant frequency. There is a slight increase in rightward skewing of the glottal volume velocity pulse at f0 = 260 Hz and 275 Hz compared to 100 Hz, and there is leftward skewing at f0 = 290 Hz. These results show that the supra-glottal input impedances supply “inductive” and “capacitive” perturbations to the sub-glottal input impedances for this low first supra-glottal formant.

Figure 3.

Normalized glottal volume velocity, Qnorm = (Q(t)ρ0c0)/(psubAsub), versus normalized time, f0t at four different fundamental frequencies. Sub-glottal resonances are as specified for the dark curve in Fig. 2. The dark curves are for supra-glottal formants appropriate for /i/, which are, with bandwidths in brackets, 280 Hz (65 Hz), 2250 Hz (98 Hz), and 2750 Hz (113 Hz). The dashed curves are when supra-glottal impedance is (ρ0c0)/Asup for a semi-infinite tube. (a) f0 = 100 Hz, (b) f0 =260 Hz, (c) f0 = 275 Hz, and (d) f0 = 290 Hz.

Howe and McGowan (2011b) find a strong effect of source-tract interaction when the fundamental frequency is very close to a Helmholtz resonance frequency (see Fig. 5 in Howe and McGowan, 2011b). This interaction has the effect of reducing the peak flow to the degree that there are two local maxima and a local minimum in a single area function pulse. Such behavior could not be induced here for /i/, but there is evidence of a change in curvature near the peak of the glottal pulse for fundamental frequencies close to the formant frequency. We suggest that the propagation losses in the present system are larger than in Howe and McGowan (2011b), so that this type of interaction does not appear for low formant frequencies.

While the first supra-glottal formant frequency of /i/ is low, and, thus, strong source-tract interaction might be expected with this vowel, its second supra-glottal formant frequency is high. What happens when both the first and second formant frequencies are relatively low, as for American English /r/? The first three formants for this phoneme in word initial position can be as low as 250 Hz, 700 Hz, and 1400 Hz for an adult male American English speaker according to Olive et al. (1993). The bandwidths are 69 Hz, 54 Hz, and 73 Hz, respectively, as computed from interpolated data. The interspersed zeros of the supra-glottal input impedance are 475 Hz and 1050 Hz with bandwidths of 51 Hz and 62 Hz, respectively. Figure 4 shows the results for f0 = 100, 240, 245, and 260 Hz. The results are not qualitatively different from the results for the vowel /i/ shown in Fig. 3, except that the pulses may have higher amplitude for /r/. The reason for the similarity between the results appears to be the existence of the zero of the input impedance between the formants, which tends to cancel the effect of the second formant moving closer to the first formant. This cancellation can easily be seen in the lossless case with reference to Eq. 57. The denominator of the interaction term decreases when the first and second formants move closer together, which by itself would increase the term, but the magnitudes of the coefficients of the two terms in brackets also decrease, because the zero frequency is between the two formant frequencies.
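The cancellation can be illustrated numerically from the lossless expression, Eq. 57. The sketch below (hypothetical function name) computes the two bracketed coefficients divided by the common denominator, with the zero placed midway between the formants as assumed in the simulations. The common prefactor (ω1ω2/ψ1)² is left out, and frequencies are kept in hertz because the ratios do not depend on the factor of 2π.

```python
def lossless_weights(f1, f2):
    """Coefficients of the two cosine terms in Eq. 57 over (f2^2 - f1^2).

    Assumes the zero lies midway between the two formants.
    """
    fz = 0.5 * (f1 + f2)
    denom = f2 ** 2 - f1 ** 2
    return (fz ** 2 - f1 ** 2) / denom, (f2 ** 2 - fz ** 2) / denom

weights_i = lossless_weights(280.0, 2250.0)   # /i/: F1, F2
weights_r = lossless_weights(250.0, 700.0)    # /r/: F1, F2
```

The weights come out near (0.31, 0.69) for /i/ and (0.38, 0.62) for /r/. They always sum to one, so although the denominator shrinks by more than a factor of ten between the two cases, the weights themselves change only modestly.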

Figure 4.

Normalized glottal volume velocity, Qnorm = (Q(t)ρ0c0)/(psubAsub), versus normalized time, f0t at four different fundamental frequencies. Sub-glottal resonances are as specified for the dark curve in Fig. 2. The dark curves are for supra-glottal formants appropriate for /r/, which are, with bandwidths in brackets, 250 Hz (69 Hz), 700 Hz (54 Hz), and 1400 Hz (73 Hz). The dashed curves are when supra-glottal impedance is (ρ0c0)/Asup for a semi-infinite tube. (a) f0 = 100 Hz, (b) f0  =240 Hz, (c) f0 = 245 Hz, and (d) f0 = 260 Hz.

The cases of /i/ and /r/ would apparently confirm the observation of Ananthapadmanabha and Fant (1982) that only one formant from each of the sub-glottal and supra-glottal tracts is necessary to capture the main effects of source-tract interaction, at least for conversational fundamental frequencies. However, we have assumed that the zeros of the input impedances are placed equidistant between neighboring poles. This is not necessarily the case, and in fact important phenomena can be missed if this assumption is always made. Titze and Story (1997) have shown that a narrow epilarynx tube has the effect of attracting the second, third, and fourth formants toward one another. Also, an examination of their Fig. 4 indicates that the zeros of the input impedance are adjusted in such a way that the magnitude of the input impedance is greatly enhanced in the frequency region of these formants. This would have the effect of making the glottal volume flow’s interaction with the higher supra-laryngeal formants much more important than shown here. In particular one could expect ripples to appear in the glottal volume velocity waveform when appropriate modifications are made to the input impedance (Titze, 2008).

The far field acoustic pressure can be considered within the present context. The transfer function between the volume velocity at the glottis, $\hat{Q}(\omega)$, and the volume velocity at the lips, $\hat{\mathcal{Q}}(\omega)$, is (Fant, 1960)

$$\frac{\hat{\mathcal{Q}}(\omega)}{\hat{Q}(\omega)}\approx\prod_{n=1}^{3}\left(\frac{1}{(1-\omega/s_n)(1+\omega/s_n^*)}\right), \tag{64}$$

where higher-order poles and radiation through the vocal tract walls are neglected. In the far field at a distance r from the lips, the acoustic pressure fluctuation, pf, is given by

$$p_f(r,t)=\frac{\rho_0}{4\pi r}\frac{\partial\mathcal{Q}(t-r/c_0)}{\partial t}, \tag{65}$$

where scattering by the body has been neglected and the area surrounded by the lips is considered a point source for the frequencies under consideration (< 1 kHz). Far field pressure can be computed as a function of time from the solution of the Fant equation, Q(t), by employing Eqs. 64 and 65.
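A minimal sketch of these two steps is given below, with hypothetical function names. The all-pole ratio of Eq. 64 is evaluated for complex poles supplied by the caller, and Eq. 65 is approximated by central differences of a uniformly sampled lip volume velocity. The input used here is a pure tone chosen only to exercise the code, not a simulated glottal pulse.

```python
import math

def lip_transfer(omega, poles):
    """All-pole ratio of lip to glottal volume velocity, Eq. 64."""
    value = 1.0 + 0.0j
    for s in poles:
        value /= (1.0 - omega / s) * (1.0 + omega / s.conjugate())
    return value

def far_field_pressure(q_lips, dt, r, rho0=1.2):
    """Eq. 65 by central differences of uniformly sampled lip flow.

    The retardation r/c0 only shifts the time axis, so it is not applied.
    Returns values at the interior sample points.
    """
    scale = rho0 / (4.0 * math.pi * r)
    return [scale * (q_lips[i + 1] - q_lips[i - 1]) / (2.0 * dt)
            for i in range(1, len(q_lips) - 1)]

# Illustrative input: a 100 Hz tone sampled over two periods.
dt, tone_hz, amp = 1e-5, 100.0, 1e-4
q_lips = [amp * math.sin(2.0 * math.pi * tone_hz * i * dt) for i in range(2001)]
p_far = far_field_pressure(q_lips, dt, r=4.0)
```

For a sinusoidal lip flow of amplitude `amp`, the peak pressure should approach ρ0·amp·2πf/(4πr), which provides a quick check on the differencing step.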

Figures 5 and 6 show normalized peak far field pressure, pfmax/psub, at a distance r = 4 m from the lips and normalized peak glottal volume velocity, (Qmaxρ0c0)/(psubAsub), as functions of fundamental frequency, f0, for the vowel /i/ discussed in Fig. 3 and the /r/ discussed in Fig. 4. Both normalized peak quantities are plotted on decibel scales. In both cases there are relatively small decreases in peak glottal volume velocity at either sub-glottal or supra-glottal formants. In particular, two local minima in the peak glottal volume velocity can be noted in the frequency neighborhood of the first sub-glottal formant at 640 Hz and the second supra-glottal formant nearby at 700 Hz for /r/. On the other hand, the filtering properties of the vocal tract make peak far field pressures increase at the supra-glottal formants, as seen by Titze (2008) and in Howe and McGowan (2011b). The normalized peak far field pressure can be considered a measure of voice efficiency (Howe and McGowan, 2011b).

Figure 5.

Peak amplitudes for /i/ versus fundamental frequency, f0 at a distance r = 400 cm from the lips. (a) Normalized far field pressure, 10log10(pfmax/psub) versus f0, and (b) normalized volume velocity 10log10(Qmaxρ0c0/psubAsub) versus f0.

Figure 6.

Peak amplitudes for /r/ versus fundamental frequency, f0 at a distance r = 400 cm from the lips. (a) Normalized far field pressure, 10log10(pfmax/psub) versus f0, and (b) normalized volume velocity 10log10(Qmaxρ0c0/psubAsub) versus f0.

A final example is provided by the low back vowel /a/ with its relatively high frequency first formant and low second formant frequency. The formant frequencies with their bandwidths are 750 Hz (bandwidth = 54 Hz), 1100 Hz (bandwidth = 64 Hz), and 2600 Hz (bandwidth = 108 Hz) (Olive et al., 1993). The first two zeros are 925 Hz (bandwidth = 59 Hz) and 1850 Hz (bandwidth = 86 Hz). The glottal volume velocity waveforms are shown in Fig. 7 for f0 = 100, 650, 745, and 800 Hz. While the final three fundamental frequencies are very high, they show a great amount of source-tract interaction in the form of two local maxima per glottal pulse for at least a 150 Hz range in fundamental frequency. Part of the reason for this could be the proximity of the first supra-glottal formant (750 Hz) to the first sub-glottal formant (640 Hz). Despite the complexity of the waveforms there is evidence of extra rightward pulse skewing for f0 = 650 Hz and some leftward skewing from the base condition for f0 = 745 and 800 Hz, after a couple of glottal pulses. The fundamental frequency f0 = 745 Hz is apparently above the actual first supra-glottal formant frequency when losses are taken into account. That is, damping decreases the actual formant frequency from the nominal specification of 750 Hz to one less than 745 Hz (see footnote 2). The complexity of the waveform seems to counteract any maxima in the far field peak pressure as a function of f0 that could appear around the first supra-glottal formant frequency (Fig. 8).

Figure 7.

Normalized glottal volume velocity, Qnorm = (Q(t)ρ0c0)/(psubAsub), versus normalized time, f0t at four different fundamental frequencies. Sub-glottal resonances are as specified for the dark curve in Fig. 2. The dark curves are for supra-glottal formants appropriate for /a/, which are, with bandwidths in brackets, 750 Hz (54 Hz), 1100 Hz (64 Hz), and 2600 Hz (108 Hz). The dashed curves are when supra-glottal impedance is (ρ0c0)/Asup for a semi-infinite tube. (a) f0 = 100 Hz, (b) f0 = 650 Hz, (c) f0 = 745 Hz, and (d) f0 = 800 Hz.

Figure 8.

Peak amplitudes for /a/ versus fundamental frequency, f0 at a distance r = 400 cm from the lips. (a) Normalized far field pressure, 10log10(pfmax/psub) versus f0, and (b) normalized volume velocity 10log10(Qmaxρ0c0/psubAsub) versus f0.

DISCUSSION AND CONCLUSION

A Fant equation has been derived for the vocal tract when the specific form of the Green’s function remains unknown. This is made possible by the mathematical observation in Howe and McGowan (2011b) that the Fant equation is the adjoint of an equation that results from satisfying matching conditions between segments of Green’s functions in the glottal region on one hand and Green’s functions segments in the sub-glottal and supra-glottal regions on the other hand. The derivation requires that the glottal region be acoustically compact and that the governing equations of motion for the air be ones appropriate for low Mach number, homentropic flow, which is the case for speech production. The only other assumption is that the fluid motion be essentially potential and without cross-modes in the vocal tract regions immediately outside the glottal region. The resulting Fant equation, Eq. 47, is written in terms of measurable quantities, namely the sub-glottal and supra-glottal input impedances.

One of the assumptions in the derivation of the Fant equation is that the fluid flow just upstream and downstream of the glottal region is governed by the one-dimensional wave equation, and there are no acoustic cross modes present there. This is in order that the input impedances be well-defined, but it does not preclude the existence of cross-modal propagation beyond the immediate neighborhood of the glottis. It is only when specific propagation models are used to find the functional form of the input impedances that it is convenient to impose one-dimensional propagation throughout the vocal tract. This is a common approximation in speech science, where there has been no, or very little, attention to the possibility of cross-mode propagation in the vocal tract.

Assuming that low frequency propagation is governed by a Sturm-Liouville problem or by a concatenated cylindrical tube model with one-dimensional propagation, the functional forms of the sub- and supra-glottal input impedances can be derived. These functions of complex frequency are completely specified by their simple poles and zeros. Simulations based on the first two poles, or formants, and estimated zeros show that the lossy sub-glottal system affects the skewing of the glottal pulse over a wide range of frequencies. Adding the formants of the supra-glottal system provides perturbations to the glottal pulse’s shape caused by the sub-glottal system. If zeros of the input impedance are presumed to fall equidistant in frequency between neighboring poles, it appears that the first sub-glottal and supra-glottal formants dominate the behavior of source-tract interaction at conversational fundamental frequencies. However, this assumption does not appear to be realistic when a relatively narrow epilarynx tube is taken into account (Titze and Story, 1997). This tube causes the higher formants to interact with the glottal volume velocity much more strongly and to cause ripples in the pulse. It would be best to use a concatenated tube model, as described in the second part of the Appendix, to derive the input impedance in most cases. These observations are made for source-tract interactions for which the glottal area is specified (level I), and thus do not represent the full source-tract problem in speech.

More practical and realistic source-tract interaction results will be obtained when the Fant equation, Eq. 47, derived in this work is applied to the full problem that includes the fluid-structure interaction between the air in the glottis and the vocal folds. This is known as a level II model (Titze, 2008). The past few years have seen theoretical, numerical simulation, and experimental work in the area of level II source-tract interaction (e.g., Švec et al., 2008; Titze, 2008; Titze et al., 2008; Titze and Worley, 2009; Tokuda et al., 2007; Zañartu et al., 2011). These studies show that source-tract interaction is an important factor when considering voice register changes in response to changes in laryngeal settings. These changes are not just gradual changes in volume velocity pulse skewing, as with the level I interactions examined here, but bifurcations with discontinuous jumps in fundamental frequency and vocal fold vibration mode.

To proceed to the full level II situation, a model of fluid-structure interaction in the glottal region would be included, such as the body-cover model of Story and Titze (1995). The Fant equation, Eq. 47, would be solved simultaneously for the volume flow Q(t) with the fluid-structure interaction model, because the Fant equation requires knowledge of the glottal area Ag(t) and the fluid-structure interaction model requires knowledge of Q(t). (Note that the time domain pressures upstream and downstream of the glottis can be obtained by convolving the volume velocity with the inverse Fourier transforms of the sub-glottal and supra-glottal impedances, respectively.)

The present work has concentrated on obtaining a Fant equation that can be used for both level I and level II source-tract interaction problems. Before applying this equation to level II problems, more research directed towards understanding the details of glottal fluid flow and the resulting fluid-structure interaction needs to be done. So far, recent research has indicated a role for suction, or Coanda, forces created with flow that changes direction (Howe and McGowan, 2009; McGowan and Howe, 2010). The behavior of the separation point during the glottal cycle is also an important consideration (e.g., Pelorson et al., 1994; Howe and McGowan, 2009). The Coanda effect, which brings asymmetry to the vocal fold oscillation problem and involves both Coanda forces and flow separation, also needs to be included eventually (Erath et al., 2011). Progress in that research will permit a consideration of full source-tract interaction in the future.

ACKNOWLEDGMENT

This work was supported by a subaward of Grant No. 1R01 DC009229 from the National Institute on Deafness and other Communication Disorders to the University of California, Los Angeles. We thank two anonymous reviewers for helping to improve this paper.

NOMENCLATURE

a0, a1

glottal area coefficients, Eq. 62

An

cross-sectional areas

Ag

glottis cross-sectional area

Asub

tract area upstream of glottis

Asup

tract area downstream of glottis

An,Bn

amplitudes, Eq. 56

B,B^

fluctuating total enthalpy

Bsub

total enthalpy due to lung

Bω

total enthalpy due to vorticity

c0

speed of sound

C

correction factor, Eq. 54

f0

vocal fold frequency

Fn

recursive function, Eq. A9

F

aeroacoustic source function

Fsub

sub-glottal source function

g^0

G^0 phase factor removed

G

Green’s function, Eq. 17

G0,G^0

G for homogeneous boundary conditions

GM,G^M

G for inhomogeneous boundary conditions

G

Sturm-Liouville Green’s function

h^0

adjoint of g^0

H,K

coefficients, Eq. A1

In

constants, Eq. 53

Jn,Kn

integration functions, Eqs. 59, 60

Jn

constants, Eq. 53

K

order 1 constant, Eq. 1

l

cylindrical tube section length

ℓ¯(τ)

glottal end correction, Eq. 10

L¯sup

supra-glottal length

L¯sub

sub-glottal length

L

glottal inductance, Eq. 1

p

pressure

pf

far-field fluctuating pressure

pfmax

maximum pf

psub

pressure upstream of the glottis

Q,Q^

glottal volume velocity

Q¯

dummy variable, Eq. 36

𝒬,𝒬^

volume velocity at the lips

Qnorm

normalized Q

Qmax

maximum Qnorm

r

distance from lips to far field point

rn

ψn-iξn zeros of Z^sup

Rn

Ψn-iΞn zeros of Z^sub

ℛn

constants, Eq. 53

R

coefficient for viscous loss, Eq. 1

S

control surface

SL

S through air of bronchial tubes

sn

ωn-iγn poles of Z^sup

Sn

Ωn-iΓn poles of Z^sub

Sg

glottal region surface

Ssub

control surface orthogonal to the tract axis below the glottis

Ssubwall

S along solid sub-glottal walls

Ssup

control surface orthogonal to the tract axis above the glottis

Ssupwall

S along solid supra-glottal walls

t

time

u^;υ^

Sturm-Liouville solutions

Uσ

glottal jet velocity

v

magnitude of v

v

velocity

vn

normal velocity to surfaces

V

control volume

W

Wronskian

x

(x1,x2,x3) Cartesian coordinates

y

(y1,y2,y3) Cartesian coordinates

yM

matching region

yn

normal to surfaces

Y

Kirchhoff vector, Eqs. 8, 9

Znin

input impedance

Zn

tube impedance

Z^sup

supra-glottal input impedance

Z^sub

sub-glottal input impedance

Zw+l

boundary condition operator, Eq. 3

α,α^

Green’s function coefficients, Eq. 8

β,β^

Green’s function coefficients, Eq. 8

κ

Sturm-Liouville parameter

λ

eigenvalues for Sturm-Liouville

λn

parameter, Eq. A1

Λ

length scale, Eq. 50

ρ

air density

ρ0

mean ρ

σ

jet contraction ratio

τ

time

θ

independent variable

φ,φ^

velocity potential

φn

phase, Eq. 56

ω

radian frequency

ω

vorticity, curl v

APPENDIX

The functional forms of Z^sup(ω) and Z^sub(ω) are examined in models of lossless plane wave propagation. One model is provided by Sturm-Liouville problems, and the other is provided by concatenated cylindrical tubes. The characterizations of the functional form of the input impedance will be derived for the supra-glottal vocal tract, as the derivations for the sub-glottal tract are completely analogous.

Sturm-Liouville problems

First, consider the case in which plane wave propagation in the sub- or supra-glottal vocal tract can be described by a lossless Sturm-Liouville equation with homogeneous boundary conditions (a Sturm-Liouville problem). An example of such a lossless equation is the Webster horn equation, which is valid when the cross-sectional area does not change too rapidly with axial position y1 in relation to the ratio of the speed of sound, c0, to a characteristic frequency (Pierce, 1989). However, this equation does not exhaust the possible ways that the low-frequency propagation problem could be described by a Sturm-Liouville equation. It is assumed that, in the absence of losses and sources in the sub- and supra-glottal tracts, the Fourier transform of Eq. 2 can be written in the form

$$\frac{d}{dx_1}\left(K(x_1)\frac{d\hat{B}(x_1,\omega)}{dx_1}\right)-\bigl(H(x_1)-\lambda\bigr)\hat{B}(x_1,\omega)=0, \tag{A1}$$

where λ = ω²/c0², and K(x1) > 0 and H(x1) are real-valued functions (Ince, 1956). Coupling Eq. A1 with Eq. 48 constitutes a Sturm-Liouville problem. The minimal conditions that permit a second-order differential equation to be written in this self-adjoint operator form in the sub- and supra-glottal regions, such as the continuity of dK(x1)/dx1 and H(x1), have been presumed (Ince, 1956). Because of the strictly monotonic relation between λ and ω for ω > 0, a function of one of these variables can be considered a function of the other.

We draw on the classical theory of Sturm-Liouville problems and their Green’s functions to characterize the functional form of Z^sup(ω) (Courant and Hilbert, 1953; Ince, 1956; Morse and Feshbach, 1953). The Green’s function, which is self-adjoint, G(x1,y1,λ) satisfies

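$$\begin{gathered}\left[\frac{d}{dx_1}\left(K(x_1)\frac{d}{dx_1}\right)-\bigl(H(x_1)-\lambda\bigr)\right]G(x_1,y_1,\lambda)=-\delta(x_1-y_1),\\ \frac{dG(+0,y_1,\lambda)}{dx_1}=\frac{dG(x_1,+0,\lambda)}{dy_1}=0,\quad\text{and}\quad G(\bar{L}_{\mathrm{sup}},y_1,\lambda)=G(x_1,\bar{L}_{\mathrm{sup}},\lambda)=0.\end{gathered} \tag{A2}$$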

It can be shown that G(x1,y1,λ) with y1 > x1 ≥ +0 can be written (Courant and Hilbert, 1953),

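$$G(x_1,y_1,\lambda)=-H(y_1-x_1)\,\frac{\hat{u}(x_1,\lambda)\,\hat{\nu}(y_1,\lambda)}{W(\lambda)},\quad\text{with}\quad W(\lambda)=K(x_1)\left[\hat{u}(x_1,\lambda)\frac{\partial\hat{\nu}(x_1,\lambda)}{\partial x_1}-\frac{\partial\hat{u}(x_1,\lambda)}{\partial x_1}\hat{\nu}(x_1,\lambda)\right]. \tag{A3}$$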

Both u^ and ν^ are solutions to Eq. A1, and each satisfies one of the boundary conditions in Eq. 48: ∂u^(+0,λ)/∂x1 = ν^(L¯sup,λ) = 0. The Wronskian, W(λ), is not a function of the spatial variable x1, and it will be zero at certain values of λ, namely those for which the functions u^ and ν^ are linearly dependent and satisfy both boundary conditions. In this case, the particular value of λ is an eigenvalue and these functions are the same eigenfunction (to within a constant multiple).

If x1 is set to +0 in Eq. A3, then in the limit as y1 → +0,

$$\lim_{y_1\to+0}G(+0,y_1,\lambda)=\left(\frac{-1}{K(+0)}\right)\frac{\hat{\nu}(+0,\lambda)}{\partial\hat{\nu}(+0,\lambda)/\partial x_1}. \tag{A4}$$

With Eqs. A2, A4, the supra-glottal input impedance, defined in complete analogy with Eq. 29, becomes

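$$\hat{Z}_{\mathrm{sup}}(\omega)\equiv i\omega\rho_0\lim_{y_1\to+0}G(+0,y_1,\lambda)=\frac{-i\omega\rho_0}{K(+0)}\,\frac{\hat{\nu}(+0,\lambda)}{\partial\hat{\nu}(+0,\lambda)/\partial x_1}. \tag{A5}$$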

The zeros of Z^sup(ω) correspond to the zeros of ν^(+0,λ) and the poles of Z^sup(ω) to the zeros of ∂ν^(+0,λ)/∂x1.
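As a check on Eq. A5, the simplest special case can be worked out by hand. The choice K ≡ 1 and H ≡ 0 (a uniform, lossless tract) is an assumption made here purely for illustration and is not a case singled out in the text.

```latex
% Assumed special case: K(x_1) = 1, H(x_1) = 0, so Eq. A1 reads
%   d^2 \hat{B}/dx_1^2 + \lambda \hat{B} = 0,  with \sqrt{\lambda} = \omega/c_0 .
\begin{align*}
\hat{\nu}(x_1,\lambda) &= \sin\!\bigl(\sqrt{\lambda}\,(\bar{L}_{\mathrm{sup}}-x_1)\bigr),
  \qquad \hat{\nu}(\bar{L}_{\mathrm{sup}},\lambda)=0,\\
\hat{\nu}(+0,\lambda) &= \sin\!\bigl(\sqrt{\lambda}\,\bar{L}_{\mathrm{sup}}\bigr),
  \qquad
\frac{\partial\hat{\nu}(+0,\lambda)}{\partial x_1}
  = -\sqrt{\lambda}\,\cos\!\bigl(\sqrt{\lambda}\,\bar{L}_{\mathrm{sup}}\bigr),\\
\hat{Z}_{\mathrm{sup}}(\omega)
  &= -i\omega\rho_0\,
     \frac{\sin(\sqrt{\lambda}\,\bar{L}_{\mathrm{sup}})}
          {-\sqrt{\lambda}\,\cos(\sqrt{\lambda}\,\bar{L}_{\mathrm{sup}})}
   = i\rho_0 c_0 \tan\!\left(\frac{\omega\bar{L}_{\mathrm{sup}}}{c_0}\right).
\end{align*}
% Zeros: \omega \bar{L}_{sup}/c_0 = m\pi;  poles: \omega \bar{L}_{sup}/c_0 = (m + 1/2)\pi,
% for m = 0, 1, 2, ...  The zeros and poles are real, simple, and interleaved.
```

Under these assumptions the result has the same tangent form as the single-tube impedance of Eq. A8, with poles at the familiar quarter-wavelength resonances.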

The facts known about the ν^(x1,λ) are usually stated in terms of ν^(x1,λ) as a function of x1 at the eigenvalues λ. First, for Sturm-Liouville problems there is an infinite sequence of positive eigenvalues λ0, λ1, … (without a finite limit point), i.e., those values λn for which ∂ν^(+0,λn)/∂x1 = 0, so that the ν^(x1,λn) are eigenfunctions. For each eigenvalue λn, all the zeros of ν^(x1,λn) and ∂ν^(+0,λn)/∂x1 are simple and real. Further, the zeros of ν^(x1,λn) interleave those of ∂ν^(+0,λn)/∂x1, and there are exactly n zeros of ν^(x1,λn) for 0 < x1 < L¯sup.

These properties are also true of ν^(+0,λ) and ∂ν^(+0,λn)/∂x1, considered as functions of λ. This follows from the following relationship between x1 and λ for Sturm-Liouville problems. Let κ(λ') = √[(λ' − H(+0))/K(+0)], with λ' > 0 and λ' − H(+0) > 0. Let θ = κ(x1 − L¯sup), so that θ = θ(x1,κ). It can be shown that solutions to d²V(θ)/dθ² + V(θ) = 0 and their first- and second-order derivatives can be made arbitrarily close to ν^(x1,λ) and its corresponding derivatives for x1 sufficiently close to +0 and λ sufficiently close to λ'. Since dθ = [∂θ(+0,λ)/∂x1] dx1 + [∂θ(+0,λ)/∂λ] dλ, the condition dθ = 0 requires that

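$$\left.\frac{\partial\lambda(\theta,x_1)}{\partial x_1}\right|_{x_1=+0}=-\frac{\partial\theta(+0,\lambda)/\partial x_1}{\partial\theta(+0,\lambda)/\partial\lambda}>0, \tag{A6}$$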

with the inequality following from the definition of θ. Therefore, λ is a strictly monotonic function of x1 for constant phase θ. So the pattern of maxima, minima, and zeros of ν^(x1,λ) as a function of x1 for 0 < x1 < L¯sup discovered in the classical theory of Sturm-Liouville problems is the result of the same pattern found passing from negative x1 to positive x1 through x1 = +0 as λ increases: namely, interleaving simple, real zeros of ν^(+0,λ) and ∂ν^(+0,λ)/∂x1, as λ increases from 0 to ∞. Therefore, Z^sup(ω) possesses an infinity of interleaving zeros and poles as ω increases from 0 to ∞.
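The interleaving can be checked numerically with a minimal shooting sketch. The coefficient functions `K` and `H`, the nondimensional length `L`, and the scan range below are hypothetical choices made only for illustration. The script integrates Eq. A1 from x1 = L¯sup (where ν^ = 0) back to x1 = +0 and locates the sign changes of ν^(+0,λ) (zeros of Z^sup) and of ∂ν^(+0,λ)/∂x1 (poles of Z^sup) as λ varies.

```python
import math

L = 1.0  # nondimensional tract length (assumption for illustration)

def K(x):
    """Hypothetical coefficient K(x1) > 0 of Eq. A1."""
    return 1.0 + 0.3 * x / L

def H(x):
    """Hypothetical real coefficient H(x1) of Eq. A1."""
    return 2.0 * math.sin(math.pi * x / L) ** 2

def shoot(lam, steps=200):
    """RK4 integration of (K v')' - (H - lam) v = 0 from x1 = L to x1 = 0.

    Uses the first-order system v' = w/K, w' = (H - lam) v with
    v(L) = 0, w(L) = 1, and returns (v(0), v'(0))."""
    def f(x, v, w):
        return w / K(x), (H(x) - lam) * v
    h = -L / steps
    x, v, w = L, 0.0, 1.0
    for _ in range(steps):
        k1v, k1w = f(x, v, w)
        k2v, k2w = f(x + h / 2, v + h / 2 * k1v, w + h / 2 * k1w)
        k3v, k3w = f(x + h / 2, v + h / 2 * k2v, w + h / 2 * k2w)
        k4v, k4w = f(x + h, v + h * k3v, w + h * k3w)
        v += h / 6 * (k1v + 2 * k2v + 2 * k3v + k4v)
        w += h / 6 * (k1w + 2 * k2w + 2 * k3w + k4w)
        x += h
    return v, w / K(0.0)

def roots(g, lo, hi, n=600):
    """Bracket sign changes of g on a uniform grid, then refine by bisection."""
    found = []
    step = (hi - lo) / n
    a, ga = lo, g(lo)
    for i in range(1, n + 1):
        b = lo + i * step
        gb = g(b)
        if ga * gb < 0.0:
            left, right, gl = a, b, ga
            for _ in range(50):
                mid = 0.5 * (left + right)
                gm = g(mid)
                if gl * gm <= 0.0:
                    right = mid
                else:
                    left, gl = mid, gm
            found.append(0.5 * (left + right))
        a, ga = b, gb
    return found

# Zeros of Z^sup: v(+0, lam) = 0.  Poles of Z^sup: dv/dx1(+0, lam) = 0.
zeros = roots(lambda lam: shoot(lam)[0], 0.01, 60.0)
poles = roots(lambda lam: shoot(lam)[1], 0.01, 60.0)

merged = sorted([(lam, "pole") for lam in poles] + [(lam, "zero") for lam in zeros])
labels = [kind for _, kind in merged]
alternating = all(a != b for a, b in zip(labels, labels[1:]))
print(merged)
print("poles and zeros interleave:", alternating)
```

Since ω = c0√λ is strictly monotonic for ω > 0, an alternating pattern in λ implies the same pattern in ω. Replacing `K` and `H` with other admissible coefficient functions should leave the alternation intact, although the scan range and grid density may need adjusting.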

Concatenated cylindrical tubes

The low-frequency approximation to the supra-glottal input impedance is now considered for the case in which the supra-glottal vocal tract is approximated by N cylindrical tube sections of equal length, l, so that L¯sup = Nl. The tubes are numbered so that the tube ending at the lips is the first tube. The nth tube possesses cross-sectional area An, and Zn = ρ0c0/An. Continuity of pressure and volume velocity at the junctions between the tubes is assumed. With these continuity assumptions, the input impedance looking toward the lips from the end of the nth tube away from the lips is (Lighthill, 1978)

$$Z_n^{\mathrm{in}}=Z_n\,\frac{Z_{n-1}^{\mathrm{in}}+iZ_n\tan(\theta)}{Z_n+iZ_{n-1}^{\mathrm{in}}\tan(\theta)}, \tag{A7}$$

where θ=ωl/c0 and the impedance at the lips can be denoted Z0in, with Z0in=0. Thus,

$$Z_1^{\mathrm{in}}=iZ_1\tan\theta. \tag{A8}$$

Here it is claimed that for all n, N ≥ n ≥ 1,

$$Z_n^{\mathrm{in}}=iZ_nF_n(\theta), \tag{A9}$$

where the poles and zeros of the real-valued Fn(θ) are real and simple. Also, dFn(θ)/dθ > 0, except at poles of Fn(θ), which is equivalent to requiring that neighboring poles be separated by a zero, and that neighboring zeros be separated by a pole, when the poles and zeros are real and simple. Further, Fn(mπ) = 0 for all integers m.

That such Fn(θ) exist for all n, N ≥ n ≥ 1 can be shown by induction. F1(θ) = tan(θ) satisfies Eq. A9 with all the required properties following that equation. Suppose that Fn−1(θ) exists that satisfies Eq. A9 and all the properties following that equation. From the induction hypothesis and Eq. A7

$$Z_n^{\mathrm{in}}=iZ_n\,\frac{Z_{n-1}F_{n-1}(\theta)+Z_n\tan(\theta)}{Z_n-Z_{n-1}F_{n-1}(\theta)\tan(\theta)}. \tag{A10}$$

If Fn(θ) is identified with the real-valued fraction in Eq. A10, then Eq. A9 is satisfied and Fn(mπ) = 0 for all integers m. The poles of Fn(θ) are the θ for which

$$\cot(\theta)=\frac{Z_{n-1}}{Z_n}F_{n-1}(\theta), \tag{A11}$$

and the zeros of Fn(θ) are the θ for which

$$-\tan(\theta)=\frac{Z_{n-1}}{Z_n}F_{n-1}(\theta). \tag{A12}$$

Both cot(θ) and −tan(θ) are strictly monotone decreasing functions of θ that do not intersect for any θ. Under the induction hypothesis, Fn−1(θ) is a strictly monotone increasing function. Thus the poles and zeros of Fn(θ) are real and simple. [Note that there is the possibility that either Eq. A11 or Eq. A12 could be satisfied for values of θ for which the functions attain ±∞.] Calculus shows that dFn(θ)/dθ > 0 for values of θ that are not poles of Fn(θ). Thus the countable infinity of poles and zeros of Znin are real and simple, and they separate one another for all n, N ≥ n ≥ 1. In particular, this is true for Z^sup(ω) = ZNin.
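The induction can be illustrated numerically with a short sketch of the recursion in Eq. A10. To avoid dividing by zero at poles, the code carries Fn = Pn/Qn as a numerator and denominator pair; this is an algebraically equivalent rewriting of Eq. A10 introduced here for convenience, not a form given in the text. The area values, air density, sound speed, and tract length are hypothetical.

```python
import math

RHO0 = 1.2   # assumed mean air density (kg/m^3)
C0 = 350.0   # assumed speed of sound (m/s)

def pq(theta, areas):
    """Numerator P_N and denominator Q_N of F_N(theta) = P_N / Q_N.

    Eq. A10 is applied with F_{n-1} = P_{n-1}/Q_{n-1} and tan = sin/cos,
    cleared of fractions and divided through by Z_n, so both outputs stay
    finite at the poles. areas[0] belongs to the tube ending at the lips."""
    s, c = math.sin(theta), math.cos(theta)
    z = [RHO0 * C0 / a for a in areas]   # Z_n = rho0 * c0 / A_n
    p, q = s, c                          # F_1 = tan(theta), Eq. A8
    for n in range(1, len(z)):
        r = z[n - 1] / z[n]              # Z_{n-1} / Z_n
        p, q = r * c * p + s * q, c * q - r * s * p
    return p, q

def roots(g, lo, hi, n=2000):
    """Bracket sign changes of g on a uniform grid, then refine by bisection."""
    found = []
    step = (hi - lo) / n
    a, ga = lo, g(lo)
    for i in range(1, n + 1):
        b = lo + i * step
        gb = g(b)
        if ga * gb < 0.0:
            left, right, gl = a, b, ga
            for _ in range(60):
                mid = 0.5 * (left + right)
                gm = g(mid)
                if gl * gm <= 0.0:
                    right = mid
                else:
                    left, gl = mid, gm
            found.append(0.5 * (left + right))
        a, ga = b, gb
    return found

def poles_and_zeros(areas):
    """Poles (Q_N = 0) and zeros (P_N = 0) of F_N on the open interval (0, pi)."""
    eps = 1e-9
    zeros = roots(lambda th: pq(th, areas)[0], eps, math.pi - eps)
    poles = roots(lambda th: pq(th, areas)[1], eps, math.pi - eps)
    return poles, zeros

areas = [4.0e-4, 1.5e-4, 3.0e-4, 1.0e-4]  # hypothetical A_n (m^2), lips first
poles, zeros = poles_and_zeros(areas)
merged = sorted([(th, "pole") for th in poles] + [(th, "zero") for th in zeros])
labels = [kind for _, kind in merged]
alternating = all(a != b for a, b in zip(labels, labels[1:]))

def slope_sign_ok(theta, h=1e-6):
    """P(th+h) Q(th) - P(th) Q(th+h) > 0 is equivalent to dF_N/dtheta > 0."""
    p0, q0 = pq(theta, areas)
    p1, q1 = pq(theta + h, areas)
    return p1 * q0 - p0 * q1 > 0.0

increasing = all(slope_sign_ok(0.01 + 0.03 * k) for k in range(100))

section_length = 0.17 / len(areas)   # l for an assumed 0.17 m tract
pole_freqs_hz = [th * C0 / (2.0 * math.pi * section_length) for th in poles]
print(labels, alternating, increasing)
print([round(f) for f in pole_freqs_hz])
```

Because θ = ωl/c0, each pole θ maps to a frequency θc0/(2πl). For equal areas the recursion collapses to FN(θ) = tan(Nθ), which gives the quarter-wavelength resonances of a uniform tube and offers a simple sanity check on the sketch.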

Footnotes

1

When wall distensibility is accounted for in realistic boundary conditions, as it would be in Eqs. 3, 5, the speed of sound becomes a function of frequency, c(ω). It can be assumed that the real part of c(ω) is a monotonically increasing function of ω for the frequency range of interest, and that it approaches c0 as frequency increases (Lighthill, 1978; Davies et al., 1993). Further, the increase of the real part of c(ω) is presumed to be such that the ordering of the real parts of the poles and zeros of the input impedance functions remains unchanged from the hard-wall case.

2

The formant frequencies specified in this section will refer to the real part of the impedance poles. Measured formant frequencies will be perturbed to a value less than these nominal specifications because of losses.

References

  1. Ananthapadmanabha, T. V., and Fant, G. (1982). “Calculation of true glottal flow and its components,” Speech Commun. 1, 167–184. doi: 10.1016/0167-6393(82)90015-2
  2. Courant, R., and Hilbert, D. (1953). Methods of Mathematical Physics (Interscience Publishers, Inc., New York), Vol. 1, pp. 291–295, 351–358.
  3. Davies, P. O. A. L., McGowan, R. S., and Shadle, C. H. (1993). “Practical flow-duct acoustics applied to the vocal tract,” in Vocal Fold Physiology: Frontiers in Basic Science, edited by I. R. Titze (Singular Publishing Group, San Diego, CA), pp. 93–142.
  4. Erath, B. D., Peterson, S. D., Zañartu, M., Wodicka, G. R., and Plesniak, M. W. (2011). “A theoretical model of the pressure field arising from asymmetric intraglottal flows applied to a two-mass model of the vocal folds,” J. Acoust. Soc. Am. 130, 389–403. doi: 10.1121/1.3586785
  5. Fant, G. (1960). Acoustic Theory of Speech Production (Mouton, Hague, Netherlands), pp. 42, 265–272.
  6. Fant, G. (1962). “Formant bandwidth data,” Speech Transmission Laboratory Quarterly Progress and Status Report 1 (Royal Institute of Technology, Stockholm), pp. 1–3.
  7. Fant, G. (1986). “Glottal flow: models and interaction,” J. Phonetics 14, 393–399.
  8. Flanagan, J. L., and Cherry, L. (1968). “Excitation of vocal-tract synthesizers,” J. Acoust. Soc. Am. 45, 764–769. doi: 10.1121/1.1911461
  9. Flanagan, J. L., and Landgraf, L. L. (1968). “Self-oscillating sources for vocal-tract synthesizers,” IEEE Trans. Audio Electroacoust. AU-16, 57–64. doi: 10.1109/TAU.1968.1161949
  10. Fujimura, O., and Lindqvist, J. (1970). “Sweep-tone measurements of vocal-tract characteristics,” J. Acoust. Soc. Am. 49, 541–558. doi: 10.1121/1.1912385
  11. Habib, R. H., Chalker, R. B., Suki, B., and Jackson, A. C. (1994). “Airway geometry and wall mechanical properties estimated from subglottal input impedances in humans,” J. Appl. Physiol. 77, 441–451.
  12. Howe, M. S. (1998). Acoustics of Fluid-Structure Interactions (Cambridge University Press, Cambridge, U.K.), pp. 1–560.
  13. Howe, M. S. (2002). Theory of Vortex Sound (Cambridge University Press, Cambridge, U.K.), pp. 1–216.
  14. Howe, M. S. (2008). “Rayleigh Lecture 2007: Flow-surface interaction noise,” J. Sound Vib. 314, 113–146. doi: 10.1016/j.jsv.2007.12.035
  15. Howe, M. S., and McGowan, R. S. (2007). “Sound generated by aerodynamic sources near a deformable body, with application to voiced speech,” J. Fluid Mech. 592, 367–392. doi: 10.1017/S0022112007008488
  16. Howe, M. S., and McGowan, R. S. (2009). “Nonlinear flow-structure coupling in a mechanical model of the vocal folds and subglottal system,” J. Fluids Struct. 25, 1299–1317. doi: 10.1016/j.jfluidstructs.2009.08.002
  17. Howe, M. S., and McGowan, R. S. (2011a). “On the generalized Fant equation,” J. Sound Vib. 330, 3123–3140. doi: 10.1016/j.jsv.2011.01.017
  18. Howe, M. S., and McGowan, R. S. (2011b). “Production of sound by unsteady throttling of flow into a resonant cavity, with application to voiced speech,” J. Fluid Mech. 672, 428–450. doi: 10.1017/S0022112010006117
  19. Ince, E. L. (1956). Ordinary Differential Equations (Dover Publications, New York), pp. 210, 215, 223–235.
  20. Ishizaka, K., and Flanagan, J. L. (1972). “Synthesis of voiced sounds from a two-mass model of the vocal cords,” Bell Syst. Tech. J. 51, 1233–1267.
  21. Ishizaka, K., Matsudaira, M., and Kaneko, T. (1976). “Input acoustic-impedance measurement of the subglottal system,” J. Acoust. Soc. Am. 60, 190–197. doi: 10.1121/1.381064
  22. Lighthill, J. (1978). Waves in Fluids (Cambridge University Press, Cambridge, U.K.), pp. 89–120.
  23. McGowan, R. S., and Howe, M. S. (2010). “Comments on single-mass models of vocal fold vibration,” J. Acoust. Soc. Am. 127, EL215–EL222.
  24. Morse, P. M., and Feshbach, H. (1953). Methods of Theoretical Physics (McGraw-Hill, New York), pp. 719–729, 804–807, 869–874.
  25. Olive, J. P., Greenwood, A., and Coleman, J. (1993). Acoustics of American English Speech: A Dynamic Approach (Springer-Verlag, New York), pp. 104–208.
  26. Pelorson, X., Hirschberg, A., van Hassel, R. R., and Wijnands, A. P. J. (1994). “Theoretical and experimental study of quasisteady-flow separation within the glottis during phonation. Application to a modified two-mass model,” J. Acoust. Soc. Am. 96, 3416–3431. doi: 10.1121/1.411449
  27. Pierce, A. D. (1989). Acoustics: An Introduction to its Physical Principles and Applications (Acoustical Society of America, Woodbury, NY), pp. 360–361.
  28. Rothenberg, M. (1981). “Acoustic interaction between the glottal source and the vocal tract,” in Vocal Fold Physiology, edited by K. N. Stevens and M. Hirano (University of Tokyo Press, Tokyo), pp. 305–328.
  29. Stevens, K. N. (1998). Acoustic Phonetics (MIT Press, Cambridge, MA), pp. 258–260.
  30. Story, B. H., and Titze, I. R. (1995). “Voice simulation with a body-cover model of the vocal folds,” J. Acoust. Soc. Am. 97, 1249–1260. doi: 10.1121/1.412234
  31. Svec, J. G., Sundberg, J., and Hertegard, S. (2008). “Three registers in an untrained female singer analyzed by videokymography, strobolaryngoscopy, and sound spectrography,” J. Acoust. Soc. Am. 123, 347–353. doi: 10.1121/1.2804939
  32. Titze, I. R. (1984). “Parameterization of the glottal area, glottal flow, and vocal contact area,” J. Acoust. Soc. Am. 75, 570–580. doi: 10.1121/1.390530
  33. Titze, I. R. (2008). “Nonlinear source-filter coupling in phonation: Theory,” J. Acoust. Soc. Am. 123, 2733–2749. doi: 10.1121/1.2832337
  34. Titze, I. R., Riede, T., and Popolo, P. (2008). “Nonlinear source-filter coupling in phonation: Vocal exercises,” J. Acoust. Soc. Am. 123, 1902–1915. doi: 10.1121/1.2832339
  35. Titze, I. R., and Story, B. H. (1997). “Acoustic interactions of the voice source with the lower vocal tract,” J. Acoust. Soc. Am. 101, 2234–2243. doi: 10.1121/1.418246
  36. Titze, I. R., and Worley, A. S. (2009). “Modeling source-filter interaction in belting and high-pitched operatic singing,” J. Acoust. Soc. Am. 126, 1530–1540. doi: 10.1121/1.3160296
  37. Tokuda, I. T., Zemke, M., Kob, M., and Herzel, H. (2007). “Biomechanical modeling of register transitions and the role of vocal tract resonators,” J. Acoust. Soc. Am. 122, 519–531. doi: 10.1121/1.2741210
  38. Whittaker, E. T., and Watson, G. N. (1927). A Course in Modern Analysis (Cambridge University Press, Cambridge, U.K.), pp. 136–137.
  39. Zañartu, M., Mehta, D. D., Ho, J. C., Wodicka, G. R., and Hillman, R. E. (2011). “Observation and analysis of in vivo vocal fold tissue instabilities produced by nonlinear source-filter coupling: A case study,” J. Acoust. Soc. Am. 129, 326–339. doi: 10.1121/1.3514536

Articles from The Journal of the Acoustical Society of America are provided here courtesy of Acoustical Society of America
