Author manuscript; available in PMC: 2025 Sep 1.
Published in final edited form as: Biomed Signal Process Control. 2024 May 3;95(Pt A):106394. doi: 10.1016/j.bspc.2024.106394

Continuous-Time Model Identification of the Subglottal System

Javier G Fontanet a, Juan I Yuz a, Hugues Garnier b, Arturo Morales a, Juan Pablo Cortés a, Matías Zañartu a
PMCID: PMC11113079  NIHMSID: NIHMS1991643  PMID: 38799405

Abstract

Mathematical models that accurately simulate the physiological systems of the human body serve as cornerstone instruments for advancing medical science and facilitating innovative clinical interventions. One application is the modeling of the subglottal tract and neck skin properties for its use in the ambulatory assessment of vocal function, by enabling non-invasive monitoring of glottal airflow via a neck surface accelerometer. For the technique to be effective, the development of an accurate building block model for the subglottal tract is required. Such a model is expected to utilize glottal volume velocity as the input parameter and yield neck skin acceleration as the corresponding output. In contrast to preceding efforts that employed frequency-domain methods, the present paper leverages system identification techniques to derive a parsimonious continuous-time model of the subglottal tract using time-domain data samples. Additionally, an examination of the model order is conducted through the application of various information criteria. Once a low-order model is successfully fitted, an inverse filter based on a Kalman smoother is utilized for the estimation of glottal volume velocity and related aerodynamic metrics, thereby constituting the most efficient execution of these estimates thus far. Anticipated reductions in computational time and complexity due to the lower order of the subglottal model hold particular relevance for real-time monitoring. Simultaneously, the methodology proves efficient in generating a spectrum of aerodynamic features essential for ambulatory vocal function assessment.

Keywords: Subglottal System, System Identification, Instrumental Variables, Output Error, Prediction Error Method, Likelihood, Kalman Smoother

1. Introduction

Voice production is a product of the intricate interplay between airflow and the various structures of the phonatory system. For voiced sounds, the pressure from the lungs provides sufficient energy to induce self-sustained oscillations of the vocal folds (VFs) that result in the main aeroacoustic sound source at the glottis. These sound waves are propagated above and below the glottis through the supraglottal and subglottal systems, respectively. The supraglottal system, also commonly referred to as the vocal tract, provides the main filtering effects that are associated with speech articulation (Fant (1971); Stevens (2000)), while the role of the subglottal system is less well understood. Previous efforts have found that the subglottal system plays a key role in the complex interactions between the resulting sound waves and the airflow at the glottis that alter voice quality (Lulich, Bachrach and Malyska (2007); Titze (2008); Titze, Riede and Popolo (2008); Ho, Zañartu and Wodicka (2011)), trigger bifurcations and voice breaks (Zañartu, Mongeau and Wodicka (2007); Zhang, Neubauer and Berry (2006)), and yield natural quantal differences for the vowel space (Stevens (2000); Chi and Sonderegger (2007); Lulich (2010)).

The study of voice aerodynamics, particularly its interaction with the supraglottal and subglottal systems, has been crucial in advancing our understanding of clinical issues, such as assessing vocal hyperfunction. Vocal hyperfunction (VH) is a type of voice disorder that is associated with abuse and misuse of voicing (Verdolini, Rosen, Branski and others (2006)), and affects approximately 6.6% of the adult population with a lifetime prevalence of 30% (Roy, Merrill, Gray and Smith (2005); Bhattacharyya (2014)). Prior studies have shown significant differences between patients with VH and speakers with normal vocal function in aerodynamic measures extracted from the glottal airflow signal, also referred to as glottal volume velocity (GVV) (Hillman, Holmberg, Perkell, Walsh and Vaughan (1989); Espinoza, Zañartu, Van Stan, Mehta and Hillman (2017)), which is estimated from recordings of oral airflow (or oral volume velocity, OVV) obtained with a pneumotachograph mask (Rothenberg (1970)). Notably, the GVV signal can also be obtained from a neck-skin accelerometer (ACC) signal through subglottal inverse filtering, which relies on models of the subglottal system (Cheyne (2006); Zañartu, Ho, Mehta, Hillman and Wodicka (2013)). The non-invasive, noise-robust, and portable nature of the ACC signal has resulted in significant interest in subglottal inverse filtering for studying VH (Espinoza et al. (2017); Mehta, Van Stan, Zañartu, Ghassemi, Guttag, Espinoza, Cortés, Cheyne and Hillman (2015); Cortés, Espinoza, Ghassemi, Mehta, Van Stan, Hillman, Guttag and Zañartu (2018)) through ambulatory voice monitoring devices (e.g., Mehta, Zañartu, Feng, Cheyne and Hillman (2012)).

Mathematical representations of the subglottal system (Wodicka, Stevens, Golub, Cravalho and Shannon (1989); Harper, Kraman, Pasterkamp and Wodicka (2001); Harper, Pasterkamp, Kiyokawa and Wodicka (2003)) have played an important role in the development of the signal processing components needed for subglottal inverse filtering. A transmission line model that represents the subglottal system was proposed by Zañartu et al. (2013) to physiologically relate the ACC and GVV signals, in an approach referred to as impedance-based inverse filtering (IBIF).

As a result, aerodynamic features can be obtained from GVV such as peak-to-peak AC flow, open quotient, and maximum flow declination rate (Perkell, Hillman and Holmberg (1994); Alku (2011); Drugman, Alku, Alwan and Yegnanarayana (2014)). These features can then be used to identify phonatory mechanisms associated with VH, which are supported by glottal aerodynamic measures of subglottal air pressure and glottal airflow (Espinoza et al. (2017)). To achieve this, researchers have utilized a transfer function impedance-based model to obtain aerodynamic parameters using a neck surface accelerometer for evaluating, monitoring, and comparing differences between VH and healthy controls (Zañartu et al. (2013); Mehta et al. (2015); Cortés et al. (2018)). However, a significant challenge in these studies is how to accurately estimate subject-specific parameters related to the subglottal system and neck-surface mechanical properties.

The study of voice pathologies has been grounded in mathematical models derived from first principles (Harper et al. (2001); Henry and Royston (2018); Hanna, Smith and Wolfe (2018)). Alternatively, a data-driven approach may be followed when only partial or no knowledge of the physical principles acting upon the system is used. In this case, a “grey box” approach can be applied when only certain parameters are unknown or physical knowledge is used to define the model structure (whose parameters are then estimated). Also, a “black box” approach can be applied, where both the structure and the parameters are chosen based solely on the available data (Söderström and Stoica (1989); Ljung (1998); Garnier and Wang (2008)). In this paper, GVV and ACC are respectively defined as the input and output signals of the system to be identified, and a “black box” system identification approach is applied.

System identification aims to derive a mathematical model, whether linear or nonlinear, that effectively captures the dynamic behavior of a system and exhibits a high level of accuracy. This identification process is conducted using input-output data from the system. In previous studies, the subglottal system has been represented using a linear model, and the impulse response has been obtained by inverse Fourier transformation of the frequency domain response (Cortés et al. (2018); Zañartu et al. (2013)). The linearity assumption for the subglottal system is justified since the nonlinearity associated with frequency-dependent resistances has a minor overall impact on the system behavior (Zañartu et al. (2013)). Furthermore, in Fant (1971), the cavities responsible for transforming the glottal source were presented as linear filters.

In this paper we propose the use of linear time-invariant (LTI) models to describe the subglottal system in continuous time, and the use of the simplified refined instrumental variable method for continuous-time systems (SRIVC) to fit these models using real input-output data.

SRIVC is widely recognized as one of the most successful direct methods for continuous-time model identification from sampled data (Garnier, Gilson, Young and Huselstein (2007)). This method has been extensively applied in different fields, including chemical processes, electronic circuits, and biological systems modeling (Garnier and Young (2014); Garnier (2015)).

Additionally, in this paper we study the model order selection problem. Once the candidate continuous-time models for the subglottal system have been estimated, the Akaike, Bayes, and Young information criteria methods can be employed to determine the optimal order and structure among them. These criteria methods play a crucial role in the trade-off between accuracy of model fitting and complexity. The information criteria enable a systematic and data-driven approach in selecting the most appropriate model order (Burnham and Anderson (2004)).

The remainder of the paper is structured as follows: Section 2 presents the subglottal system description. Then, Section 3 presents SRIVC as the system identification method and the information criteria to be used to select the best model order and structure. In Section 4, the data processing required before performing the estimation for the set of candidate models is explained, and a linearity test is also conducted. Later in Section 5, the results of the model identification are presented together with the discussion of model order selection. Section 6 presents the application of the obtained model for inverse filtering using a Kalman Smoother to obtain estimates of the glottal volume velocity airflow and the associated aerodynamic features. Finally, in Section 7 conclusions are presented.

2. System Description

Several works have proposed to measure the acceleration on the neck skin surface generated by the airflow in the glottis to study VH. The acceleration data has been used to estimate certain parameters of an impedance-based (IB) model, which is a mechano-acoustic representation of a physiologically-based transmission line (Zañartu et al. (2013)). Inverse filtering has then been applied to the IB model to obtain an accurate estimation of the aerodynamic source of voice sounds at the glottis (Zañartu et al. (2013); Mehta et al. (2015); Cortés et al. (2018)).

The skin acceleration is measured by attaching an accelerometer to the neck surface between the thyroid prominence and the suprasternal notch (Cheyne (2006)). The main advantage of measuring acceleration is that it is a non-invasive method of studying the health of the speech system. Additionally, it is also immune to noise, making it particularly suitable for ambulatory studies (Popolo, Švec and Titze (2005); Mehta et al. (2012)).

We consider the subglottal system as represented in the block diagram shown in Figure 1, where u(t) is the input signal GVV, y(t) is the measured signal ACC, and w(t) is assumed to be white measurement noise.

Figure 1: Block diagram representation of the phonatory system

As mentioned in the introduction, the input GVV is not directly measured; instead, it is obtained from OVV using a linear prediction (LP) filter (Alku, Magi, Yrttiaho, Bäckström and Story (2009); Kafentzis, Stylianou and Alku (2011)).

The subglottal system in Figure 1 can be expressed as the following continuous-time model

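$$y(t) = G(p,\theta)\,u(t) + H(p)\,w(t) \tag{1}$$

where $p$ represents the differential operator $p = \frac{d}{dt}$, $H(p) = 1$ is assumed as the noise filter, and the subglottal system is then given by

$$G(p,\theta) = \frac{B(p,\theta)}{A(p,\theta)} = \frac{b_m p^m + b_{m-1} p^{m-1} + \cdots + b_0}{p^n + a_{n-1} p^{n-1} + \cdots + a_0} \tag{2}$$

where $G(p,\theta)$ is assumed to be proper ($n \geq m$) and the parameter vector is $\theta = [a_{n-1} \; \cdots \; a_0 \;\; b_m \; \cdots \; b_0]^T \in \mathbb{R}^{n+m+1}$.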
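To make the parameterization in (2) concrete, the following minimal Python sketch evaluates the transfer function at $p = j\omega$ for a given set of coefficients. The function name `freq_response` and the first-order example coefficients are illustrative assumptions; they are not values estimated in this work.

```python
import math

def freq_response(b, a, omega):
    """Evaluate G(j*omega) = B(j*omega) / A(j*omega).

    b: numerator coefficients [b_m, ..., b_0]
    a: denominator coefficients [a_{n-1}, ..., a_0]; the leading
       coefficient of A(p) is fixed to 1 (monic), as in (2).
    """
    s = 1j * omega
    num = 0j
    for coeff in b:            # Horner's rule for B(s)
        num = num * s + coeff
    den = 1 + 0j               # monic leading coefficient of A(s)
    for coeff in a:            # Horner's rule for A(s)
        den = den * s + coeff
    return num / den

# Hypothetical first-order example: G(p) = 1 / (p + 1), evaluated at 1 rad/s
g = freq_response([1.0], [1.0], omega=1.0)
gain_db = 20.0 * math.log10(abs(g))
```

Sweeping `omega` over a frequency grid with such a routine would give the kind of magnitude response that a Bode plot displays.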

Our interest in this paper is to estimate the parameter vector $\theta$ in (2), for a given model structure (i.e., $n$ and $m$), based on input and output sampled data: $u(t_k)$ corresponding to GVV and $y(t_k)$ corresponding to ACC, for a regular sampling interval $T_s$. The sampling instants are $t_k = kT_s$, for $k = 1, \ldots, N$, where $N$ is the number of available synchronized data points. Thus, for identification purposes we consider the following hybrid model

$$\begin{aligned} y_u(t) &= G(p,\theta)\,u(t) \\ y(t_k) &= y_u(t_k) + w(t_k) \end{aligned} \tag{3}$$

where $y_u(t)$ is the noise-free output of the subglottal system, and $w(t_k)$ is assumed to be a discrete-time Gaussian distributed white measurement noise, $w(t_k) \sim \mathcal{N}(0, \sigma_w^2)$.
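The hybrid structure in (3), a continuous-time system observed at discrete instants with additive white noise, can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: a hypothetical first-order system $G(p) = b_0/(p + a_0)$, an input held constant between samples (zero-order hold), and made-up coefficient values. The function name `simulate_sampled_output` is likewise hypothetical.

```python
import math
import random

def simulate_sampled_output(u, a0, b0, ts, noise_std, seed=0):
    """Sample y(t_k) = y_u(t_k) + w(t_k) for G(p) = b0 / (p + a0).

    Assumes the input is held constant between samples (zero-order hold),
    so the noise-free response y_u can be propagated exactly.
    """
    rng = random.Random(seed)
    phi = math.exp(-a0 * ts)             # state transition over one sample
    gamma = b0 * (1.0 - phi) / a0        # input gain over one sample
    y, yu = [], 0.0
    for uk in u:
        y.append(yu + rng.gauss(0.0, noise_std))   # noisy measurement at t_k
        yu = phi * yu + gamma * uk                 # advance y_u to t_{k+1}
    return y

ts = 1.0 / 8192.0                        # sampling interval after resampling
u = [1.0] * 5000                         # unit step input (illustrative)
y = simulate_sampled_output(u, a0=50.0, b0=100.0, ts=ts, noise_std=0.0)
```

With the noise switched off, the sampled output settles at the static gain $b_0/a_0$, which is a quick sanity check on the discretization.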

3. Parameter estimation and information criteria

This section presents the parameter estimation method used to perform identification of the subglottal system from synchronized input-output data. The input signal is the glottal flow GVV and the output is the accelerometer signal ACC.

Furthermore, this section also presents the information criteria employed for model order selection from a set of candidate models. The criteria used are the Akaike Information Criterion (AIC), the Bayesian Information Criterion (BIC) and the Young Information Criteria (YIC), which are commonly used to select the best model order from a set of candidate models.

3.1. Continuous-Time Transfer Function Estimation Using Time-Domain Data

When the theory of system identification first appeared, it was mainly aimed at the estimation of continuous-time LTI models. However, with the advance of digital computers and data acquisition systems, the estimation of discrete-time models gained prominence. In the last decades, the increase both in computation power and sampling frequency has led to a resurgence of the estimation of continuous-time models from sampled data (Garnier, Mensler and Richard (2003); Garnier and Wang (2008)).

Some of the difficulties that appear in discrete-time identification are related to the sampling time, which may lead to loss of information when the sampling period Ts is small. Numerical issues may appear in discrete-time since all the poles of the transfer function approach the point z=1 (in the z-plane) for fast sampling rates (Garnier et al. (2003); Pascu, Garnier, Ljung and Janot (2019)).
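The clustering of discrete-time poles near $z = 1$ follows from the mapping $z = e^{pT_s}$ between a continuous-time pole $p$ and its sampled counterpart. The short sketch below evaluates this mapping for a hypothetical real pole at 500 Hz and three sampling rates. The pole location and the 1 MHz rate are assumptions chosen only for illustration.

```python
import math

def discrete_pole(p_real, fs):
    """Map a real continuous-time pole p to z = exp(p * Ts), with Ts = 1 / fs."""
    return math.exp(p_real / fs)

pole = -2.0 * math.pi * 500.0            # hypothetical pole at 500 Hz, in rad/s
sampling_rates = (8192.0, 20000.0, 1.0e6)
z_values = [discrete_pole(pole, fs) for fs in sampling_rates]
for fs, z in zip(sampling_rates, z_values):
    print(f"fs = {fs:>9.0f} Hz  ->  z = {z:.4f}")
```

As the sampling rate grows, the mapped pole moves monotonically towards 1, so distinct continuous-time poles become harder to tell apart in the $z$-plane.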

The estimation of continuous-time models from a sampled dataset has been shown to have advantages over discrete-time methods when using real data (Garnier and Young (2014)). Moreover, the obtained continuous-time models can be discretized for different sampling rates if needed (Garnier and Wang (2008)).

Simplified refined instrumental variable method for continuous-time (SRIVC) modeling is one of the most applied continuous-time identification methods in the literature. It is an iterative instrumental variable method that generates filters (that depend on the system parameters) that are applied to the available input-output data at each iteration. It has been shown that SRIVC provides a consistent and optimal estimator for additive white noise (Young (2012); Pan, González, Welsh and Rojas (2020)).

One might consider applying the Refined Instrumental Variables for Continuous-Time Model (RIVC) method for identification, given the likelihood of colored noise in practical scenarios. However, the results that we obtained with that approach closely resemble those obtained with SRIVC. This is common in applications where the noise level is relatively small (Garnier and Wang (2008)).

For SRIVC the error function is given by the output error

$$\varepsilon(t_k,\theta) = y(t_k) - \frac{B(p,\theta)}{A(p,\theta)}\,u(t_k) \tag{4}$$

The input-output signals in (4) are filtered by the following parameter-dependent filter

$$L(p,\theta) = \frac{1}{A(p,\theta)} \tag{5}$$

Then, from (4), we obtain

$$\varepsilon(t_k,\theta) = A(p,\theta)\,\tilde{y}(t_k) - B(p,\theta)\,\tilde{u}(t_k) \tag{6}$$

where $\tilde{y}(t_k)$ and $\tilde{u}(t_k)$ are the variables pre-filtered by $L(p,\theta)$.

A drawback of the previous formulation is that the polynomial in the denominator of the transfer function $G(p,\theta)$ and its parameters are unknown, and, as a consequence, an initial parameter estimate is required (Garnier et al. (2007)). Then, the error in (6) can be iteratively optimized by solving in every iteration the following recursion

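$$\tilde{y}^{(n)}(t_k,\hat{\theta}_j) = \tilde{\phi}^T(t_k,\hat{\theta}_j)\,\theta_{j+1} + \varepsilon(t_k,\hat{\theta}_j) \tag{7}$$

where the regressor vector, ordered consistently with the parameter vector $\theta$ in (2), is given by

$$\tilde{\phi}^T(t_k,\hat{\theta}_j) = \left[-\tilde{y}^{(n-1)}(t_k,\hat{\theta}_j), \ldots, -\tilde{y}(t_k,\hat{\theta}_j),\; \tilde{u}^{(m)}(t_k,\hat{\theta}_j), \ldots, \tilde{u}(t_k,\hat{\theta}_j)\right] \tag{8}$$

where $\hat{\theta}_j$ and $\hat{\theta}_{j+1}$ are the estimated parameters at the $j$th and $(j+1)$th iterations, respectively, and the superscripts $(n)$, $(m)$, and $(n-1)$ correspond to (approximate) time derivatives. The SRIVC estimator is then given by

$$\hat{\theta}^{SRIVC}_{j+1} = \left[\frac{1}{N}\sum_{k=1}^{N} \tilde{Z}(t_k,\hat{\theta}_j)\,\tilde{\phi}^T(t_k,\hat{\theta}_j)\right]^{-1} \left[\frac{1}{N}\sum_{k=1}^{N} \tilde{Z}(t_k,\hat{\theta}_j)\,\tilde{y}^{(n)}(t_k,\hat{\theta}_j)\right] \tag{9}$$

where $\tilde{Z}(t_k)$ denotes the filtered instrument and is given by

$$\tilde{Z}_i(t_k,\hat{\theta}_j) = \frac{p^i}{A(p,\hat{\theta}_j)}\,\tilde{\phi}(t_k,\hat{\theta}_j) \tag{10}$$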

More details about the SRIVC algorithm and its applications can be found, for example, in Young and Jakeman (1980); Young (2012); Garnier (2015); González, Rojas, Pan and Welsh (2023b).
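The iteration described above can be sketched for the simplest possible case. The Python example below is a minimal illustration, not the CONTSID implementation nor the code used in this work: it assumes a hypothetical first-order system $G(p) = b_0/(p + a_0)$, synthetic data, and filters discretized by holding their input constant between samples. The instrument is built from the simulated noise-free model output, which is a common construction for SRIVC. All function and variable names are hypothetical.

```python
import math
import random

def lowpass(x, a, ts):
    """Samples of x filtered by 1 / (p + a), holding x constant between samples."""
    phi = math.exp(-a * ts)
    gam = (1.0 - phi) / a
    out, state = [], 0.0
    for xk in x:
        out.append(state)
        state = phi * state + gam * xk
    return out

def srivc_first_order(u, y, ts, a_init, n_iter=10):
    """Iterate the instrumental-variable estimate for G(p) = b0 / (p + a0)."""
    a_hat, b_hat = a_init, 1.0
    for _ in range(n_iter):
        uf = lowpass(u, a_hat, ts)                           # u filtered by 1/A
        yf = lowpass(y, a_hat, ts)                           # y filtered by 1/A
        dyf = [yk - a_hat * v for yk, v in zip(y, yf)]       # p/A applied to y
        xf = lowpass([b_hat * v for v in uf], a_hat, ts)     # filtered model output
        # Normal equations with regressor [-yf, uf] and instrument [-xf, uf]
        m11 = sum(x * v for x, v in zip(xf, yf))
        m12 = -sum(x * v for x, v in zip(xf, uf))
        m21 = -sum(v * w for v, w in zip(uf, yf))
        m22 = sum(v * v for v in uf)
        r1 = -sum(x * d for x, d in zip(xf, dyf))
        r2 = sum(v * d for v, d in zip(uf, dyf))
        det = m11 * m22 - m12 * m21
        a_hat = (r1 * m22 - m12 * r2) / det                  # Cramer's rule
        b_hat = (m11 * r2 - m21 * r1) / det
    return a_hat, b_hat

# Synthetic experiment (all values are made up for illustration)
rng = random.Random(1)
ts, a_true, b_true = 0.01, 2.0, 3.0
u = []
for _ in range(40):                      # 40 random levels, 50 samples each
    u += [rng.choice((-1.0, 1.0))] * 50
y_clean = [b_true * v for v in lowpass(u, a_true, ts)]
y = [v + rng.gauss(0.0, 0.05) for v in y_clean]
a_hat, b_hat = srivc_first_order(u, y, ts, a_init=5.0)
```

Each pass re-filters the data with the current denominator estimate, which is why an initial guess (`a_init` here) is needed. Higher-order models follow the same pattern with longer regressor and instrument vectors.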

3.2. Information criteria

Model order selection is an important task in the analysis of time series, signal processing, and system identification (Liavas and Regalia (2001)). To determine the best order and structure of the subglottal LTI model, information criteria such as Akaike, Bayes and Young’s Information Criteria (AIC, BIC and YIC respectively) can be used.

Akaike Information Criterion

AIC is a Kullback-Leibler (KL) cross validation approach, in which the model order is chosen such that it minimizes the KL discrepancy between the true probability distribution function and the likelihood of the model (Stoica and Selén (2004)). The AIC is given by

$$\mathrm{AIC} = -2\ln p_{n_\theta}(y,\hat{\theta}_\eta) + 2n_\theta \tag{11}$$

where $y$ is the vector of available output data of size $N$, $n_\theta$ is the number of parameters, $\hat{\theta}_\eta$ is the estimated parameter vector, and $p_{n_\theta}(y,\hat{\theta}_\eta)$ is the probability density function of the data vector. The value of the above expression is obtained for a set of candidate models, among which the one that yields the minimum value is chosen. In this way, the goodness of fit is rewarded and over-fitting is penalized (Akaike (1974)).

Bayes Information Criterion

BIC is a second information criterion used to select the model order. It is given by

$$\mathrm{BIC} = -2\ln p_{n_\theta}(y,\hat{\theta}_\eta) + n_\theta \ln N \tag{12}$$

Equation (12) has a structure similar to that of the AIC. However, it differs in the last (penalty) term, which is associated with the assumption that $p(\theta)$ is independent of $N$, resulting in an increasing ratio of estimation to validation samples as the number of data points $N$ grows (Stoica and Selén (2004)).

Similar to the case of AIC, BIC is determined for a set of candidate models, taking into account model fit and penalizing over-fitting, and the model order that minimizes the BIC in (12) is then selected.
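For Gaussian white residuals, both criteria can be computed directly from the residual sequence. The sketch below assumes that the noise variance is replaced by its maximum-likelihood estimate, so that $-2\ln L = N\left(\ln(2\pi\,\mathrm{RSS}/N) + 1\right)$, where RSS is the residual sum of squares. The function name and the residual values are hypothetical, and whether the noise variance is counted in $n_\theta$ is left to the caller.

```python
import math

def gaussian_aic_bic(residuals, n_params):
    """AIC and BIC for a model with Gaussian white residuals.

    The noise variance is replaced by its maximum-likelihood estimate
    RSS / N, which gives -2 ln L = N * (ln(2*pi*RSS/N) + 1).
    """
    n = len(residuals)
    rss = sum(e * e for e in residuals)
    neg2_loglik = n * (math.log(2.0 * math.pi * rss / n) + 1.0)
    aic = neg2_loglik + 2 * n_params
    bic = neg2_loglik + n_params * math.log(n)
    return aic, bic

# Hypothetical residuals of two candidate structures fitted to the same data
res_small = [0.10, -0.12, 0.09, -0.11, 0.10, -0.08, 0.12, -0.10, 0.09, -0.11]
res_large = [0.10, -0.11, 0.09, -0.11, 0.10, -0.08, 0.11, -0.10, 0.09, -0.10]
aic_small, bic_small = gaussian_aic_bic(res_small, n_params=3)
aic_large, bic_large = gaussian_aic_bic(res_large, n_params=9)
```

In this made-up example the larger structure reduces the residuals only slightly, so the penalty term dominates and both criteria favor the smaller structure.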

Young Information Criterion

YIC is a third model order selection criterion that, unlike AIC and BIC, uses the covariance matrix estimate associated with the estimated parameters (Young (1989); Garnier and Wang (2008); Laurain, Gilson, Payraudeau, Grégoire and Garnier (2010)).

In fact, YIC involves the coefficient of determination $R_T^2$ (13) and the variance-error norm to define the model that best fits the data and best estimates the parameters (Young (1989)).

The coefficient of determination $R_T^2$ is a metric that assesses how accurately the identified model represents the actual system (Young and Garnier (2006); Garnier et al. (2007)), by considering the simulation error. The $R_T^2$ values range between 0 and 1, indicating the relation between the identified model and the real-world observations. The $R_T^2$ is defined by (Garnier (2015))

$$R_T^2 = 1 - \frac{\sigma_{\hat{\varepsilon}}^2}{\sigma_y^2} \tag{13}$$

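where $\sigma_{\hat{\varepsilon}}^2$ is the variance of the estimated error, and $\sigma_y^2$ is the variance of the measured output, which are respectively given by

$$\sigma_{\hat{\varepsilon}}^2 = \frac{1}{N}\sum_{k=1}^{N}\left(\hat{\varepsilon}(t_k) - \bar{\varepsilon}\right)^2 \tag{14}$$

$$\sigma_y^2 = \frac{1}{N}\sum_{k=1}^{N}\left(y(t_k) - \bar{y}\right)^2 \tag{15}$$

$$\hat{\varepsilon}(t_k) = y(t_k) - \hat{y}(t_k,\hat{\theta}_\eta) \tag{16}$$

where $\bar{\varepsilon}$ and $\bar{y}$ are the mean values of the estimated noise and the measured output signal of the system, $\hat{y}(t_k,\hat{\theta}_\eta)$ is the estimated output signal, and $\hat{\theta}_\eta$ is the estimated parameter vector given by the SRIVC algorithm.

The YIC is then given by

$$\mathrm{YIC} = \ln\frac{\sigma_{\hat{\varepsilon}}^2}{\sigma_y^2} + \ln\left(\frac{1}{n_\theta}\sum_{i=1}^{n_\theta}\frac{\hat{\pi}_{ii}}{\hat{\theta}_i^2}\right) \tag{17}$$

where $n_\theta$ is the number of estimated parameters and $\hat{\pi}_{ii}$ is the $i$th diagonal element of the covariance matrix of the estimated parameter vector $\hat{\theta}_\eta$ (Garnier and Wang (2008)).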
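The quantities in (13) to (17) translate into a few lines of code. The sketch below is a minimal illustration with made-up numbers. The function name `rt2_and_yic`, the example output sequences, the parameter values, and the covariance diagonal are all hypothetical.

```python
import math

def rt2_and_yic(y, y_hat, theta, cov_diag):
    """Coefficient of determination (13) and YIC (17) from simulated output."""
    n = len(y)
    err = [yk - yh for yk, yh in zip(y, y_hat)]               # (16)
    err_mean = sum(err) / n
    y_mean = sum(y) / n
    var_err = sum((e - err_mean) ** 2 for e in err) / n       # (14)
    var_y = sum((yk - y_mean) ** 2 for yk in y) / n           # (15)
    rt2 = 1.0 - var_err / var_y                               # (13)
    # Normalized parameter error variance: mean of pi_ii / theta_i^2
    nevn = sum(p / (t * t) for p, t in zip(cov_diag, theta)) / len(theta)
    yic = math.log(var_err / var_y) + math.log(nevn)          # (17)
    return rt2, yic

rt2, yic = rt2_and_yic(
    y=[1.0, 2.0, 3.0, 4.0],
    y_hat=[1.1, 1.9, 3.1, 3.9],
    theta=[2.0, 4.0],
    cov_diag=[0.04, 0.16],
)
```

The first term of the YIC becomes more negative as the fit improves, and the second becomes more negative as the parameters are estimated with smaller relative variance, so lower values are preferred.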

4. Data pre-processing

To estimate the parameters of the candidate models, we use an input-output dataset comprising 262165 samples, acquired at 20 kHz, corresponding to five repetitions of the vowel /a/ separated in time. Processing signals at a high sampling rate can be computationally expensive and, in some cases, unnecessary for accurate system identification. Therefore, down-sampling the signals to a lower sampling rate can be beneficial (Rout, Das and Panda (2015)).

Considering the impact of high-frequency data on parametric estimation, we resample the ACC and GVV signals to a frequency of $F_{sub} = 8192$ Hz. A similar down-sampling strategy has been applied, for example, in Espinoza et al. (2017). The power spectrum of the resampled data is shown in Figure 2. The analysis of the power spectrum is crucial for understanding signal characteristics such as dominant frequencies, energy distribution, and gain. Moreover, it may enhance experimental identification results.
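The change of rate from 20 kHz to 8192 Hz is a non-integer ratio, so a rational (up/down) resampler is a natural fit. The arithmetic below works out that ratio and the approximate length of the resampled record. It assumes a polyphase-style resampler. The source does not state which resampling routine was used, so the output length is only indicative.

```python
from fractions import Fraction

fs_orig = 20000          # original sampling rate in Hz
fs_sub = 8192            # target sampling rate in Hz
n_orig = 262165          # number of samples in the original recording

ratio = Fraction(fs_sub, fs_orig)        # reduces to up / down
up, down = ratio.numerator, ratio.denominator
n_sub = -(-n_orig * up // down)          # ceil(n_orig * up / down)
duration_s = n_orig / fs_orig            # length of the recording in seconds
```

The reduced ratio gives the interpolation and decimation factors that a rational resampler would need.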

Figure 2: Input-output resampled dataset power spectrum

Furthermore, we opted for the second vocalization from the resampled dataset, comprising 2000 time data points, with the assumption that the first vocalization may exhibit a greater transient effect. This choice was made to enhance the accuracy and fit of our experimental identification results.

The resampled GVV and ACC signals are shown in Figures 3 and 4, respectively. In these figures, the data points in red belong to the down-sampled signal, and the blue lines represent the original signal sampled at high frequency (20 kHz).

Figure 3: Glottal volume velocity (GVV) original and resampled signal.

Figure 4: Acceleration (ACC) signal original and resampled signal.

Previous studies in the literature assume a linear model for the subglottal system, neglecting the impact of frequency-dependent resistances in the physically based model since they do not significantly impact the overall behavior of the system (Zañartu et al. (2013)). In this section, we test this assumption using the coherence method. The analysis shows that the assumption of linearity between the measurements of ACC and GVV could, in fact, be valid in the frequency bandwidth of interest.

Spectral coherence is a powerful signal processing tool that has been applied to analyze the relationship between two random signals or processes (Gómez González, Rodríguez, Sagartzazu, Schuhmacher and Isasa (2010); Klein, Sauer, Jedynak and Skrandies (2006)), and has been extensively used in diverse fields, including communication systems, acoustics, and biomedical engineering (Stoica and Moses (2005)). Spectral coherence measures the degree of linear correlation between two given signals in the frequency domain, providing a coherence value spanning from 0 (indicating no correlation) to 1 (indicating perfect correlation).

The coherence between two signals $x$ and $y$ is defined by the following quotient

$$C_{xy}(f) = \frac{\left|P_{xy}(f)\right|^2}{P_{xx}(f)\,P_{yy}(f)} \in [0,1] \tag{18}$$

where $P_{xx}(f)$ and $P_{yy}(f)$ are the power spectral densities of each signal, and $P_{xy}(f)$ is the cross power spectral density between the two signals.

The coherence function provides a direct and independent measure of system excitation, data quality, and system response linearity (Johansson (1993)). A poor coherence value can be attributed either to a poor signal-to-noise ratio (which is not the case in our application) or to nonlinear effects in the dynamics. Therefore, good coherence is important for supporting the linearity assumption of the system. The input-output data can then be used to estimate the parameters of the selected linear model.

In our case, the signals utilized for coherence analysis are the accelerometer measurements ACC and the glottal airflow GVV, which are the same signals used to perform the model parameter estimation. The results are presented in Figure 5, which shows the coherence between the two signals as a function of frequency. It can be noticed that, in the frequency range from approximately 220Hz to 1kHz, the average coherence value is higher than 0.7. This suggests that the signals are highly correlated in that frequency range, and a linear model may be appropriate. This result is closely related to the graph of the power spectrum of the signals shown in Figure 2, where the peaks appear within the same frequency bandwidth.
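The coherence estimate in (18) requires averaged spectra, since a single-segment estimate is identically 1. The dependency-free sketch below averages Hann-windowed, 50%-overlapping segments in the style of Welch's method and applies it to synthetic signals. The segment length, the test signals, and the function names are assumptions for illustration, and the source does not specify the estimator settings that were used.

```python
import cmath
import math
import random

def half_dft(seg):
    """DFT bins 0..n/2 of a real sequence (direct O(n^2) evaluation)."""
    n = len(seg)
    return [sum(v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(seg))
            for k in range(n // 2 + 1)]

def coherence(x, y, seg_len=64):
    """Magnitude-squared coherence from Hann-windowed, 50%-overlapping segments."""
    win = [0.5 - 0.5 * math.cos(2.0 * math.pi * i / seg_len) for i in range(seg_len)]
    nbins = seg_len // 2 + 1
    pxx, pyy, pxy = [0.0] * nbins, [0.0] * nbins, [0j] * nbins
    for start in range(0, len(x) - seg_len + 1, seg_len // 2):
        xs = half_dft([w * x[start + i] for i, w in enumerate(win)])
        ys = half_dft([w * y[start + i] for i, w in enumerate(win)])
        for k in range(nbins):
            pxx[k] += abs(xs[k]) ** 2                  # auto-spectrum of x
            pyy[k] += abs(ys[k]) ** 2                  # auto-spectrum of y
            pxy[k] += xs[k].conjugate() * ys[k]        # cross-spectrum
    return [abs(pxy[k]) ** 2 / (pxx[k] * pyy[k]) for k in range(nbins)]

rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(1024)]
noise = [rng.gauss(0.0, 1.0) for _ in range(1024)]
c_linear = coherence(x, [2.0 * v for v in x])      # linear, noise-free relation
c_unrelated = coherence(x, noise)                  # independent signals
```

A noise-free linear relation yields coherence close to 1 in every bin, whereas independent signals yield values that shrink as more segments are averaged.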

Figure 5: Coherence between ACC and GVV

5. Continuous time model identification

In this section we present the identification results obtained for the subglottal system using the SRIVC estimator.

For the system identification procedure, the input-output data are the glottal airflow signal GVV and the accelerometer signal ACC, both already resampled as explained in Section 4.

In this study, system identification is conducted for a range of candidate models, for different model orders and relative degrees. Then we have to consider the trade-off between the flexibility of the estimated model that may overfit the training data and having a parsimonious model that may offer a more concise representation (González, Rojas, Pan and Welsh (2023a)). Parameter estimation is performed for all possible combinations of candidate models, using coefficients ranging from 1 to 10 in both the denominator and numerator. Then the SRIVC algorithm is applied for each of them to obtain parameter estimates.

The AIC, BIC, and YIC information criteria are utilized to compare the performance of each candidate model. These criteria provide a systematic approach to assess the trade-off between model complexity and goodness of fit, allowing for the selection of the best model among the candidates. Figures 6, 7, 8 show the AIC, BIC, and YIC matrices, respectively, obtained when continuous-time system identification is performed using SRIVC.

Figure 6: Heatmap of AIC coefficients

Figure 7: Heatmap of BIC coefficients

Figure 8: Heatmap of YIC coefficients

From the information criteria shown in Figures 6, 7 and 8 it can be noticed that the three information criteria suggest that the best model structure is the one having 5 poles and 4 zeros.

In order to complement the analysis provided by the information criteria, we use the root mean square error (RMSE)

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{k=1}^{N}\left(y(t_k) - \hat{y}(t_k,\hat{\theta}_\eta)\right)^2} \tag{19}$$

where N is the number of samples.

The RMSE between the measured ACC signal and the simulated ACC signal is computed for the group of candidate models (see Figure 9). In this figure it can be seen that the structure with 5 poles and 4 zeros has the lowest RMSE value, which corresponds to the same structure selected by the three information criteria.

Figure 9: Root mean squared error of the candidate models

Moreover, Figure 10 shows the coefficient of determination obtained for the estimated models. It can be noticed that the highest value of the coefficient, $R_T^2 = 1$, is obtained for the same model structure selected by the information criteria, i.e., the model having 5 poles and 4 zeros.

Figure 10: Heatmap of $R_T^2$ coefficients

From the analysis above, the choice of the best model structure is clear since the information criteria, RMSE, and coefficient of determination provide the same result, namely, an LTI model for the subglottal system having 5 poles and 4 zeros. Moreover, the analysis above also shows that the model structure with 8 poles and 7 zeros could also be considered, although at the expense of more complexity and possible overfitting.

Additionally, Figures 11 and 12 show the pole/zero map and Bode magnitude plot of the selected model (with 5 poles and 4 zeros), and Figure 13 shows the comparison between the simulated response corresponding to the selected model and the measured output (i.e. the accelerometer signal ACC). In that figure, the comparison between the measured acceleration validation data and the simulated acceleration data results in a model fit of 93.9% computed by:

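$$\mathrm{FIT} = \left(1 - \frac{\left\|y(t_k) - \hat{y}(t_k,\hat{\theta}_\eta)\right\|}{\left\|y(t_k) - \bar{y}\right\|}\right) \cdot 100\% \tag{20}$$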
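The RMSE in (19) and the FIT in (20) can be computed from the measured and simulated outputs as sketched below, using Euclidean norms and made-up data. The last helper relies on an assumption that the source does not state: when the simulation error has zero mean and both metrics are computed on the same data, $R_T^2 = 1 - (1 - \mathrm{FIT}/100)^2$. All function names are hypothetical.

```python
import math

def rmse(y, y_hat):
    """Root mean square error, as in (19)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))

def fit_percent(y, y_hat):
    """Normalized fit in percent, as in (20), using Euclidean norms."""
    y_mean = sum(y) / len(y)
    num = math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)))
    den = math.sqrt(sum((a - y_mean) ** 2 for a in y))
    return 100.0 * (1.0 - num / den)

def rt2_from_fit(fit):
    """R_T^2 implied by a FIT value, assuming a zero-mean simulation error."""
    return 1.0 - (1.0 - fit / 100.0) ** 2

# Made-up measured and simulated outputs
y = [1.0, 2.0, 3.0, 4.0]
y_hat = [1.1, 1.9, 3.1, 3.9]
example_rmse = rmse(y, y_hat)
example_fit = fit_percent(y, y_hat)
implied_rt2 = rt2_from_fit(93.9)
```

Under that assumption, a fit of 93.9% corresponds to a coefficient of determination of about 0.996, which is consistent with a reported value that rounds to 1.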

Figure 11: Poles and zeros map of the selected model estimated using SRIVC.

Figure 12: Bode diagram of the selected model estimated using SRIVC.

Figure 13: Comparison of the estimated model with SRIVC and resampled data simulation against validation data

6. Inverse filtering

In the previous section, system identification was performed applying SRIVC and information criteria were used to select a model order and structure that provides a good fit between simulated and observed acceleration. However, a key objective of obtaining a model for the subglottal system is to apply it for inverse filtering, i.e., to estimate the glottal airflow (GVV, the input signal) from accelerometer measurements (ACC, the output signal). In fact, in ambulatory studies such as Zañartu et al. (2013), the estimated glottal airflow is used to obtain indicators of the speech health of the patients.

In this work, inverse filtering is performed applying a Kalman smoother strategy as presented in Morales, Yuz, Cortés, Fontanet and Zañartu (2023). A key difference with that work is that the linear models obtained in that paper for the subglottal system were based on the frequency response of an impedance-based model whose parameters were previously fitted by particle swarm optimization.

The results of the Kalman smoother estimation of the glottal airflow (using the optimal model obtained by SRIVC) are shown in Figure 14. In this figure, the GVV signal derived from OVV and the Kalman-filtered GVV signal from the SRIVC estimated model can be compared. The associated RMSE obtained is $10.62 \times 10^{-3}$ mL/s.

Figure 14: GVV estimation with Kalman Smoother and SRIVC estimated model

To assess the validity of the estimated glottal airflow, acoustic and aerodynamic measures obtained from GVV are used. These measures are the amplitude difference between the first two harmonics (H1-H2), the harmonic richness factor (HRF), the maximum flow declination rate (MFDR), the AC flow (ACFL), and the normalized amplitude quotient (NAQ) (Holmberg, Doyle, Perkell, Hammarberg and Hillman (2003)).

Table 1 shows the aerodynamic features computed using the measured GVV (that is actually derived from OVV) and the aerodynamic features computed using the estimated GVV obtained from ACC measurements by Kalman filtering. In the table, these two sets of features are labeled GVV_OVV and GVV_ACC, respectively.

Table 1. Aerodynamic metrics computed from GVV

Measures          Model
                  GVV_ACC           GVV_OVV
H1-H2 (dB)        8.43 ± 2.5        9.82 ± 3.7
HRF (dB)          7.19 ± 3.3        7.98 ± 4.4
MFDR (L/s²)       142.1 ± 83.8      178.37 ± 107.53
ACFL (mL/s)       97.99 ± 56.2      112.07 ± 65.7
NAQ               0.15 ± 0.03       0.15 ± 0.04

Moreover, the aerodynamic features shown in Table 1 obtained from ACC are similar to previous results in the literature when applying Kalman filtering, such as, for example, Table 2 in Cortés, Alzamendi, Weinstein, Yuz, Espinoza, Mehta, Hillman and Zañartu (2022) and Table 4 in Morales et al. (2023). However, compared to those previous results, in this study we have used a lower-order model (5 poles and 4 zeros) estimated directly from time-domain data. This implies a reduction in model order of 77% (order 5 instead of 22) against recent efforts (Morales et al. (2023)) and 98.6% (order 5 instead of 350) against previous ones (Cortés et al. (2022)). The clear reduction in the model order for the subglottal system certainly leads to an important reduction in the computational time and complexity, which are key issues for real-time monitoring applications.
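The quoted reductions follow from the ratio of model orders, as the short calculation below shows. The function name is hypothetical, and the orders are those stated in the text.

```python
def order_reduction_percent(new_order, old_order):
    """Relative reduction in model order, in percent."""
    return 100.0 * (1.0 - new_order / old_order)

vs_order_22 = order_reduction_percent(5, 22)      # against the 22nd-order model
vs_order_350 = order_reduction_percent(5, 350)    # against the 350th-order model
```

Rounded, these evaluate to roughly 77% and 98.6%, matching the figures given above.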

7. Conclusions

In this paper, a continuous-time, linear time-invariant model was obtained for the subglottal system. The identification process employed sampled data of both neck skin acceleration and glottal airflow. Using the SRIVC estimator, system identification was performed for a range of candidate models. The optimal parsimonious model was selected using information criteria and further verified by RMSE and coefficient of determination analysis. Our results indicate that the SRIVC algorithm effectively estimates a low-order model with a high level of fit. Moreover, this model was applied in a Kalman smoother for inverse filtering, enabling the extraction of glottal airflow estimates from neck skin acceleration measurements. The approach allows us to estimate glottal aerodynamic features using a lower-order model than previous studies in the literature, offering a novel alternative for real-time ambulatory assessment of vocal function.

Acknowledgments

This work has been supported by ANID (through grants Advanced Center for Electrical and Electronic Engineering FB0008, ECOS 210008, FONDECYT 1230623 and Doctorado Nacional 21202402 scholarship), Universidad Técnica Federico Santa María (through grant PIIC 015/2021), and the National Institutes of Health and National Institute on Deafness and Other Communication Disorders (through grant NIH P50DC015446). The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH.

Footnotes

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Declaration of interests
  • Javier G. Fontanet reports a relationship with National Agency for Research and Development that includes: funding grants. Javier G. Fontanet reports a relationship with Federico Santa Maria Technical University that includes: funding grants. If there are other authors, they declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
  • Juan I. Yuz reports a relationship with ANID that includes: funding grants. If there are other authors, they declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
  • Juan Pablo Cortes reports financial support was provided by Lanek SPA. Matias Zanartu reports financial support was provided by Lanek SPA. Juan Pablo Cortes reports a relationship with Lanek SPA that includes: employment. Matias Zanartu reports a relationship with Lanek SPA that includes: equity or stocks. If there are other authors, they declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Declaration of interests
  • Hugues Garnier.
  • Arturo Morales.

References

  1. Akaike H, 1974. A new look at the statistical model identification. IEEE Transactions on Automatic Control 19, 716–723. doi: 10.1109/TAC.1974.1100705.
  2. Alku P, 2011. Glottal inverse filtering analysis of human voice production: A review of estimation and parameterization methods of the glottal excitation and their applications. Sadhana 36, 623–650.
  3. Alku P, Magi C, Yrttiaho S, Bäckström T, Story B, 2009. Closed phase covariance analysis based on constrained linear prediction for glottal inverse filtering. The Journal of the Acoustical Society of America 125. doi: 10.1121/1.3095801.
  4. Bhattacharyya N, 2014. The prevalence of voice problems among adults in the United States. Laryngoscope 124, 2359–2362. doi: 10.1002/lary.24740.
  5. Burnham KP, Anderson DR, 2004. Multimodel inference: A Practical Information-Theoretic Approach.
  6. Cheyne HA, 2006. Estimating glottal voicing source characteristics by measuring and modeling the acceleration of the skin on the neck. Proceedings of the 3rd IEEE-EMBS International Summer School and Symposium on Medical Devices and Biosensors, ISSS-MDBS 2006, 118–121. doi: 10.1109/ISSMDBS.2006.360113.
  7. Chi X, Sonderegger M, 2007. Subglottal coupling and its influence on vowel formants. The Journal of the Acoustical Society of America 122. doi: 10.1121/1.2756793.
  8. Cortés JP, Alzamendi GA, Weinstein AJ, Yuz JI, Espinoza VM, Mehta DD, Hillman RE, Zañartu M, 2022. Kalman Filter Implementation of Subglottal Impedance-Based Inverse Filtering to Estimate Glottal Airflow during Phonation. Applied Sciences (Switzerland) 12, 1–20. doi: 10.3390/app12010401.
  9. Cortés JP, Espinoza VM, Ghassemi M, Mehta DD, Van Stan JH, Hillman RE, Guttag JV, Zañartu M, 2018. Ambulatory assessment of phonotraumatic vocal hyperfunction using glottal airflow measures estimated from neck-surface acceleration. PLoS ONE 13, 1–22. doi: 10.1371/journal.pone.0209017.
  10. Drugman T, Alku P, Alwan A, Yegnanarayana B, 2014. Glottal source processing: From analysis to applications. Computer Speech & Language 28, 1117–1138.
  11. Espinoza VM, Zañartu M, Van Stan JH, Mehta DD, Hillman RE, 2017. Glottal aerodynamic measures in women with phono-traumatic and nonphonotraumatic vocal hyperfunction. Journal of Speech, Language, and Hearing Research 60, 2159–2169. doi: 10.1044/2017-JSLHR-S-16-0337.
  12. Fant G, 1971. Acoustic Theory of Speech Production. doi: 10.1515/9783110873429.
  13. Garnier H, 2015. Direct continuous-time approaches to system identification. Overview and benefits for practical applications, in: European Journal of Control, pp. 50–62. doi: 10.1016/j.ejcon.2015.04.003.
  14. Garnier H, Gilson M, Young PC, Huselstein E, 2007. An optimal IV technique for identifying continuous-time transfer function model of multiple input systems. Control Engineering Practice 15, 471–486. doi: 10.1016/j.conengprac.2006.09.004.
  15. Garnier H, Mensler M, Richard A, 2003. Continuous-time model identification from sampled data: Implementation issues and performance evaluation. International Journal of Control 76, 1337–1357. doi: 10.1080/0020717031000149636.
  16. Garnier H, Wang L, 2008. Identification of Continuous-time Models from Sampled Data. Springer.
  17. Garnier H, Young PC, 2014. The advantages of directly identifying continuous-time transfer function models in practical applications. International Journal of Control 87, 1319–1338. doi: 10.1080/00207179.2013.840053.
  18. Gómez González A, Rodríguez J, Sagartzazu X, Schuhmacher A, Isasa I, 2010. Multiple coherence method in time domain for the analysis of the transmission paths of noise and vibrations with non stationary signals, in: Proceedings of ISMA 2010 - International Conference on Noise and Vibration Engineering, including USD 2010, pp. 3927–3942.
  19. González RA, Rojas C, Pan S, Welsh JS, 2023a. Parsimonious Identification of Continuous-Time Systems: A Block-Coordinate Descent Approach. arXiv preprint arXiv:2304.03259.
  20. González RA, Rojas CR, Pan S, Welsh JS, 2023b. On the Relation Between Discrete and Continuous-Time Refined Instrumental Variable Methods. IEEE Control Systems Letters 7, 2233–2238. URL: https://ieeexplore.ieee.org/document/10143357/, doi: 10.1109/LCSYS.2023.3282445.
  21. Hanna N, Smith J, Wolfe J, 2018. How the acoustic resonances of the subglottal tract affect the impedance spectrum measured through the lips. The Journal of the Acoustical Society of America 143. doi: 10.1121/1.5033330.
  22. Harper P, Kraman SS, Pasterkamp H, Wodicka GR, 2001. An acoustic model of the respiratory tract. IEEE Transactions on Biomedical Engineering 48. doi: 10.1109/10.918593.
  23. Harper VP, Pasterkamp H, Kiyokawa H, Wodicka GR, 2003. Modeling and measurement of flow effects on tracheal sounds. IEEE Transactions on Biomedical Engineering 50. doi: 10.1109/TBME.2002.807327.
  24. Henry B, Royston TJ, 2018. A multiscale analytical model of bronchial airway acoustics. The Journal of the Acoustical Society of America 143. doi: 10.1121/1.5027239.
  25. Hillman RE, Holmberg EB, Perkell JS, Walsh M, Vaughan C, 1989. Objective assessment of vocal hyperfunction: An experimental framework and initial results. Journal of Speech, Language, and Hearing Research 32, 373–392.
  26. Ho JC, Zañartu M, Wodicka GR, 2011. An anatomically based, time-domain acoustic model of the subglottal system for speech production. The Journal of the Acoustical Society of America 129, 1531–1547. doi: 10.1121/1.3543971.
  27. Holmberg EB, Doyle P, Perkell JS, Hammarberg B, Hillman RE, 2003. Aerodynamic and acoustic voice measurements of patients with vocal nodules: Variation in baseline and changes across voice therapy. Journal of Voice 17. doi: 10.1067/S0892-1997(03)00076-6.
  28. Johansson R, 1993. System modeling and identification. Prentice-Hall.
  29. Kafentzis GP, Stylianou Y, Alku P, 2011. Glottal inverse filtering using stabilised weighted linear prediction, in: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, pp. 5408–5411. doi: 10.1109/ICASSP.2011.5947581.
  30. Klein A, Sauer T, Jedynak A, Skrandies W, 2006. Conventional and wavelet coherence applied to sensory-evoked electrical brain activity. IEEE Transactions on Biomedical Engineering 53. doi: 10.1109/TBME.2005.862535.
  31. Laurain V, Gilson M, Payraudeau S, Grégoire C, Garnier H, 2010. A new data-based modelling method for identifying parsimonious nonlinear rainfall/flow models. Modelling for Environment’s Sake: Proceedings of the 5th Biennial Conference of the International Environmental Modelling and Software Society, iEMSs 2010 3, 2044–2052.
  32. Liavas AP, Regalia PA, 2001. On the behavior of information theoretic criteria for model order selection. IEEE Transactions on Signal Processing 49, 1689–1695.
  33. Ljung L, 1998. System identification: Theory for the user. 2nd ed., Pearson.
  34. Lulich SM, 2010. Subglottal resonances and distinctive features. Journal of Phonetics 38, 20–32. doi: 10.1016/j.wocn.2008.10.006.
  35. Lulich SM, Bachrach A, Malyska N, 2007. A role for the second subglottal resonance in lexical access. The Journal of the Acoustical Society of America 122. doi: 10.1121/1.2772227.
  36. Mehta DD, Van Stan JH, Zañartu M, Ghassemi M, Guttag JV, Espinoza VM, Cortés JP, Cheyne HA, Hillman RE, 2015. Using ambulatory voice monitoring to investigate common voice disorders: Research update. Frontiers in Bioengineering and Biotechnology 3, 1–14. doi: 10.3389/fbioe.2015.00155.
  37. Mehta DD, Zañartu M, Feng SW, Cheyne HA, Hillman RE, 2012. Mobile voice health monitoring using a wearable accelerometer sensor and a smartphone platform. IEEE Transactions on Biomedical Engineering 59, 3090–3096. doi: 10.1109/TBME.2012.2207896.
  38. Morales A, Yuz JI, Cortés JP, Fontanet JG, Zañartu M, 2023. Glottal Airflow Estimation using Neck Surface Acceleration and Low-Order Kalman Smoothing. IEEE/ACM Transactions on Audio, Speech, and Language Processing 31, 2055–2066. doi: 10.1109/TASLP.2023.3277269.
  39. Pan S, González RA, Welsh JS, Rojas CR, 2020. Consistency analysis of the Simplified Refined Instrumental Variable method for Continuous-time systems. doi: 10.1016/j.automatica.2019.108767.
  40. Pascu V, Garnier H, Ljung L, Janot A, 2019. Benchmark problems for continuous-time model identification: Design aspects, results and perspectives. Automatica 107. doi: 10.1016/j.automatica.2019.06.011.
  41. Perkell JS, Hillman RE, Holmberg EB, 1994. Group differences in measures of voice production and revised values of maximum airflow declination rate. The Journal of the Acoustical Society of America 96, 695–698.
  42. Popolo PS, Švec JG, Titze IR, 2005. Adaptation of a Pocket PC for use as a wearable voice dosimeter. Journal of Speech, Language, and Hearing Research 48, 780–791. doi: 10.1044/1092-4388(2005/054).
  43. Rothenberg M, 1970. New Inverse-Filtering Technique for Deriving the Glottal Air Flow Waveform during Voicing. The Journal of the Acoustical Society of America 48. doi: 10.1121/1.1975066.
  44. Rout NK, Das DP, Panda G, 2015. Computationally efficient algorithm for high sampling-frequency operation of active noise control. Mechanical Systems and Signal Processing 56. doi: 10.1016/j.ymssp.2014.10.009.
  45. Roy N, Merrill RM, Gray SD, Smith EM, 2005. Voice disorders in the general population: prevalence, risk factors, and occupational impact. The Laryngoscope 115, 1988–1995.
  46. Söderström T, Stoica P, 1989. System Identification. Prentice Hall International.
  47. Stevens KN, 2000. Acoustic Phonetics. doi: 10.7551/mitpress/1072.001.0001.
  48. Stoica P, Moses RL, 2005. Spectral analysis of signals. Pearson/Prentice Hall.
  49. Stoica P, Selén Y, 2004. Model-order selection: a review of information criterion rules. IEEE Signal Processing Magazine 21, 36–47.
  50. Titze I, Riede T, Popolo P, 2008. Nonlinear source-filter coupling in phonation: Vocal exercises. The Journal of the Acoustical Society of America 123. doi: 10.1121/1.2832339.
  51. Titze IR, 2008. Nonlinear source-filter coupling in phonation: Theory. The Journal of the Acoustical Society of America 123. doi: 10.1121/1.2832337.
  52. Verdolini K, Rosen C, Branski RC, others, 2006. Classification manual for voice disorders-I. Psychology Press.
  53. Wodicka GR, Stevens KN, Golub HL, Cravalho EG, Shannon DC, 1989. A Model of Acoustic Transmission in the Respiratory System. IEEE Transactions on Biomedical Engineering 36. doi: 10.1109/10.35301.
  54. Young P, 1989. Recursive Estimation, Forecasting, and Adaptive Control. Control and Dynamic Systems 30, 119–165. doi: 10.1016/B978-0-12-012730-6.50011-0.
  55. Young P, Garnier H, 2006. Identification and estimation of continuous-time rainfall-flow models. IFAC Proceedings Volumes 39. doi: 10.3182/20060329-3-au-2901.00206.
  56. Young P, Jakeman A, 1980. Refined instrumental variable methods of recursive time-series analysis: Multivariable systems. International Journal of Control 31, 741–764. doi: 10.1080/00207177908922724.
  57. Young PC, 2012. Recursive Estimation and Time-Series Analysis. 2nd ed., Springer Science & Business Media, London. doi: 10.1007/978-3-642-21981-8.
  58. Zañartu M, Ho JC, Mehta DD, Hillman RE, Wodicka GR, 2013. Subglottal impedance-based inverse filtering of voiced sounds using neck surface acceleration. IEEE Transactions on Audio, Speech, and Language Processing 21, 1929–1939.
  59. Zañartu M, Mongeau L, Wodicka GR, 2007. Influence of acoustic loading on an effective single mass model of the vocal folds. The Journal of the Acoustical Society of America 121. doi: 10.1121/1.2409491.
  60. Zhang Z, Neubauer J, Berry DA, 2006. The influence of subglottal acoustics on laboratory models of phonation. The Journal of the Acoustical Society of America 120. doi: 10.1121/1.2225682.
