Author manuscript; available in PMC: 2022 Aug 5.
Published in final edited form as: Perspect ASHA Spec Interest Groups. 2020 Dec 14;5(6):1647–1656. doi: 10.1044/2020_persp-20-00073

Automatic Estimation of Laryngeal Vestibule Closure Duration Using High-Resolution Cervical Auscultation Signals

Aliaa Sabry a,b, Amanda S Mahoney a, Shitong Mao c, Yassin Khalifa c, Ervin Sejdić c,d,e, James L Coyle a
PMCID: PMC9355454  NIHMSID: NIHMS1772334  PMID: 35937555

Abstract

Purpose:

Safe swallowing requires adequate protection of the airway to prevent swallowed materials from entering the trachea or lungs (i.e., aspiration). Laryngeal vestibule closure (LVC) is the first line of defense against swallowed materials entering the airway. Absent LVC or mistimed/shortened closure duration can lead to aspiration, adverse medical consequences, and even death. LVC is commonly judged using the videofluoroscopic swallowing study; however, this instrumentation exposes patients to radiation and is not available or acceptable to all patients. There is growing interest in noninvasive methods to assess and monitor swallow physiology. In this study, we hypothesized that our noninvasive sensor-based system, which has been shown to accurately track hyoid displacement and upper esophageal sphincter opening duration during swallowing, could predict laryngeal vestibule status, including the onset of LVC and the onset of laryngeal vestibule reopening, in real time and estimate the closure duration with a degree of accuracy comparable to that of trained human raters.

Method:

The sensor-based system used in this study is high-resolution cervical auscultation (HRCA). Advanced machine learning techniques enable HRCA signal analysis through feature extraction and complex algorithms. A deep learning model was developed with a data set of 588 swallows from 120 patients with suspected dysphagia and further tested on 45 swallows from 16 healthy participants.

Results:

The new technique achieved an overall mean accuracy of 74.90% and 75.48% for the two data sets, respectively, in distinguishing LVC status. Closure duration ratios between automated and gold-standard human judgment of LVC duration were 1.13 for the patient data set and 0.93 for the healthy participant data set.

Conclusions:

This study found that HRCA signal analysis using advanced machine learning techniques can effectively predict laryngeal vestibule status (closure or opening) and further estimate LVC duration. HRCA is potentially a noninvasive tool to estimate LVC duration for diagnostic and biofeedback purposes without X-ray imaging.


Swallowing is a complex neuromuscular process involving the integration of two distinct but related functions: airway protection and bolus transport. This process involves volitional and reflexive neural activities paired with the coordinated contraction of many paired muscle groups. The result is a set of specific biomechanical events, executed in a sequential temporal order to ensure safe and efficient swallowing. Although there is variability within and among individuals, any disturbance of these biomechanical events caused by disease can lead to swallowing disorders, known as dysphagia.

Entrance of food or liquid into the airway during the pharyngeal stage of swallowing is known as aspiration. Aspiration is generally considered the most concerning component of swallowing dysfunction and may lead to potentially fatal pulmonary consequences, especially for individuals with neurologic and neurodegenerative diseases (Cabib et al., 2016) or already compromised respiratory systems. Laryngeal vestibule closure (LVC) is usually considered the primary and most critical aspect of laryngeal function during swallowing, protecting the airway against the entrance of swallowed materials. LVC is defined as the collapse of the laryngeal inlet via arytenoid adduction and approximation of the arytenoids to the epiglottis during epiglottic inversion (Logemann et al., 1992). The laryngeal airway closes in a peristaltic-like, caudal-to-rostral compression while the larynx shortens, facilitating approximation of the epiglottis to the laryngeal inlet. This pattern of closure, which is observable in videofluoroscopic studies (VFSs) of swallowing function, prevents airway invasion by closing off the airway while squeezing aberrant swallowed material out of the laryngeal vestibule (LV; Ekberg, 1982; Ekberg & Nylander, 1982).

Timely and complete LVC is vital to safe and successful swallowing. Incomplete closure or shortened LVC duration may cause laryngeal penetration, in which swallowed material that enters the LV remains above the level of the vocal folds, and/or tracheal aspiration of swallowed materials (Mann et al., 1999; Robbins et al., 1993). Shortened LVC duration is significantly associated with an increased incidence of aspiration (Cabib et al., 2016). In fact, shortened LVC duration is the primary impairment for predicting aspiration in patients following stroke (Power et al., 2007).

The published literature reports a wide range of LVC durations, with mean values from 0.31 to 1.07 s, depending on the presence or absence of certain factors (Humbert et al., 2018; Logemann et al., 1992, 2000; Logemann et al., 2002; Molfenter & Steele, 2012; Ohmae et al., 1996; Ohmae et al., 1995; Park et al., 2010). Prolonged LVC duration has been observed with increasing bolus volumes, with longer pharyngeal transit durations (Kang et al., 2010; Kendall et al., 2003; S. J. Kim et al., 2010; Y. Kim et al., 2005; Martin-Harris et al., 2003; Rofes et al., 2010; Rosenbek et al., 1996), and during the performance of swallow maneuvers such as the effortful swallow and the chin-down posture (Hind et al., 2001; Macrae et al., 2014; Young et al., 2015). Intentionally increasing LVC duration during swallowing in patients with shortened LVC duration has been investigated for decades as a method of improving airway protection. The supraglottic swallow maneuver, described in 1993, was designed to volitionally close the upper airway before swallowing in patients with a supraglottic laryngectomy whose epiglottis had been resected (Mendelsohn & Martin, 1993). This maneuver, as well as its sibling, the super-supraglottic swallow, which exaggerates contact between the arytenoids and the epiglottic base in nonresected patients, has been adapted for use in patients with dysphagia whose laryngeal anatomy remains intact and is a mainstay of compensatory dysphagia management for many patients (Lazarus et al., 1993). Several studies have demonstrated that healthy individuals and individuals with dysphagia due to stroke can volitionally prolong LVC after training (Azola et al., 2015; Lazarus et al., 1993; Macrae et al., 2014; Mendelsohn & Martin, 1993; Young et al., 2015). Direct volitional control of the timing and duration of LVC therefore has considerable rehabilitation potential for individuals with dysphagia.

VFS, a real-time, dynamic X-ray technique, is the only standard instrumental assessment that can visualize LVC and determine LVC duration during swallowing (Martin-Harris & Jones, 2008). LVC duration is the measure of how long the LV remains completely closed. In VFS images, complete LVC is defined as no visible airspace or barium contrast in the LV, given complete contact of the arytenoids with the base of the epiglottis and full epiglottic inversion over the base of the arytenoids (Logemann et al., 1992). VFS can be used to train volitional prolongation of LVC by providing patients with kinematic visual biofeedback. However, VFS has inherent drawbacks, such as patients’ exposure to radiation. Radiation safety standards limit exposure time during VFS; thus, data collection opportunities are time sensitive, and despite its superior visualization of the entire aerodigestive mechanism during swallowing, VFS cannot practically be used for visual biofeedback during treatment aimed at acquiring compensatory volitional augmentation of LVC. VFS may not be feasible in facilities without X-ray departments, and facilities may not have qualified clinicians to perform and interpret the VFS images. Additionally, some patients may refuse X-ray testing or have other conditions limiting its accessibility or feasibility (Bonilha et al., 2013; Nierengarten, 2009; Steele et al., 2007; Zammit-Maempel et al., 2007).

Although temporal measurements of LVC duration would be invaluable when managing many patients with dysphagia, they are rarely obtained during imaging studies of swallowing function. During VFSs, LVC is typically judged as present, absent, or incomplete, but temporal measurements are not assessed.

Several limitations in typical clinical settings prevent frequent temporal measurement of LVC and result in these broad categorical judgments. Swallow kinematic analysis using frame-by-frame review of VFS images is not typically performed by clinicians because very few have the required training or confirmation of their judgment reliability. Some clinicians may be unable to record VFS images for secondary review due to lack of equipment or limited access to archived materials. Additionally, a minimum temporal resolution of 30 frames per second (fps) is required to properly assess LVC duration. Recording at reduced frame rates (e.g., 7.5 or 15 fps), a common practice, is inadequate for accurately capturing LVC timing because of its short duration (Bonilha et al., 2013).
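The frame-rate argument is simple arithmetic: the frame period is 1/fps, and an onset or offset can be misplaced by up to one period. The sketch below, which assumes the 0.31 to 1.07 range of mean LVC durations cited earlier is in seconds, shows how few frames a short closure occupies at reduced recording rates. The function names are illustrative only.

```python
# Worked arithmetic: frame period and how many frames a short LVC occupies.
# Assumption: the 0.31-1.07 LVC range cited earlier is in seconds.


def frame_period_ms(fps: float) -> float:
    """Time between consecutive video frames, in milliseconds."""
    return 1000.0 / fps


def frames_spanned(duration_s: float, fps: float) -> float:
    """Number of frame periods covered by an event lasting duration_s seconds."""
    return duration_s * fps


if __name__ == "__main__":
    shortest_mean_lvc_s = 0.31  # shortest mean LVC duration in the cited range
    for fps in (7.5, 15.0, 30.0):
        print(
            f"{fps:>4} fps: frame period {frame_period_ms(fps):.1f} ms, "
            f"{shortest_mean_lvc_s} s closure spans about "
            f"{frames_spanned(shortest_mean_lvc_s, fps):.1f} frames"
        )
```

At 7.5 fps, a 0.31 s closure covers only about 2.3 frame periods, so a single-frame error (about 133 ms) shifts the measured duration by roughly 40%; at 30 fps, one frame is about 33 ms, or roughly 11% of the same closure.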

Adding temporal measures to the evaluation of LVC could provide clinicians with objective swallowing kinematic data, which could be compared with published normative data and provide clinical evidence of increased risk of airway compromise (Humbert et al., 2018; Molfenter & Steele, 2012). Achieving this goal would support timely diagnosis and help initiate appropriate compensatory interventions to reduce dysphagia complications. The benefits of objective LVC data and the limitations of VFS indicate that clinicians would benefit from a noninvasive, alternative method to estimate LVC duration. The ability to obtain LVC information noninvasively could substantially advance efforts to stabilize or improve LVC timing and duration in people with dysphagia.

One potential noninvasive alternative for quantifying LV temporal measures is high-resolution cervical auscultation (HRCA). Traditional cervical auscultation (CA) is a method by which a clinician places a stethoscope on a patient’s throat to assess swallowing and airway sounds. The cardiac analogy hypothesis suggests that CA acoustic signals are generated by vibrations caused by valve and pump systems within the upper aerodigestive tract. Just as heart valves open and close during the cardiac cycle, valves in the upper aerodigestive tract produce characteristic acoustic signals during different stages of swallowing (Cichero & Murdoch, 1998). However, the transmission of swallow information may be incomplete because of the limited receiving bandwidth of a stethoscope, and the interpretation of these sounds by judges listening through a stethoscope is bounded by the frequency range of human hearing. Moreover, numerous well-designed studies have confirmed very low interjudge agreement for CA sounds, rendering it a relatively weak diagnostic method (Leslie et al., 2004). Therefore, CA cannot be considered a valid and reliable screening or assessment tool for swallowing function because of the imprecise and incomplete interpretation of these signals (Sejdić et al., 2018).

HRCA allows more objective and reliable interpretation than conventional CA assessment. HRCA uses high-resolution accelerometers and microphones, attached to patients’ necks, to record vibratory and acoustic signals during swallowing (Dudik et al., 2015; Movahedi et al., 2016). In line with the cardiac analogy hypothesis, the striking of the epiglottis and arytenoids may be the valve activity that generates swallowing sounds and vibrations during LVC, which can be recorded with HRCA.

HRCA is an easily portable, noninvasive tool that is suitable for daily monitoring of swallow function. Advanced technology using artificial intelligence through machine learning techniques enables HRCA signal analysis using feature extraction and complex algorithms. HRCA has recently shown promise in the autonomous detection of many swallow kinematic events. HRCA signals have been found to be associated with hyoid bone displacement (He et al., 2019), LVC, and the contact of the base of the tongue with the posterior pharyngeal wall (Kurosu et al., 2019). Furthermore, HRCA successfully detected vertical and horizontal displacements of the hyoid bone (Rebrion et al., 2018) and the diameter of upper esophageal sphincter maximal opening (Shu, 2019). Given recent advances in signal processing algorithms, HRCA could provide a fundamental contribution to dysphagia management.

In this study, we investigated the ability of advanced machine learning techniques to predict LVC and laryngeal vestibule reopening (LVO) through HRCA signal analysis, thus allowing estimation of LVC duration. We hypothesized that, by analyzing HRCA signals using machine learning techniques, we could predict LVC and LVO status in real time and estimate the duration of LVC with a degree of accuracy comparable to that of trained human raters. Successfully achieving this aim would make LVC duration estimation more automatic and objective.

Method

Data Collection and Equipment

Two data sets were collected: the first comprised 588 swallows from 120 enrolled patients with various diagnoses and etiologies of dysphagia, and the second comprised 45 swallows from 16 healthy community dwellers. Patient and healthy participant characteristics are given in Table 1.

Table 1.

Demographic data of the participants

Variable | First data set (patients with dysphagia) | Second data set (healthy participants)

Participants | 68 men, 52 women | 9 men, 7 women
Age | M = 64 years (range: 19–94) | M = 64 years (range: 55–75)
Swallows (n) | 588 | 45

All patients and healthy participants underwent VFS at the University of Pittsburgh Medical Center Presbyterian Hospital. Because the aim of this study was to investigate whether our system can predict LVC regardless of other variables, we intentionally did not control for patient variables, including the patient’s diagnosis or the characteristics of swallowed materials. Patient data were collected during routine clinical VFSs, which resulted in various volumes and consistencies of swallowed material. Healthy participants swallowed only thin liquids of various volumes. All patients and healthy participants in this study signed informed consent forms, and the data collection protocol was approved by the institutional review board of the University of Pittsburgh.

VFSs for patients were conducted in the lateral plane using an X-ray machine (Ultimax system, Toshiba) with a pulse rate of 30 fps. Healthy participant data were collected in the lateral plane with a Precision 500D X-ray system (GE Healthcare) with a pulse rate of 30 fps. To ensure that the different resolutions of the two systems did not affect judgment of kinematic events, we resampled a subset of the original VFS data to match the sample rate of the new machine. Five judges labeled nine swallowing kinematic events, including LVC and LVO, using native and resampled resolutions. The level of agreement between human labels at the different resolutions was excellent for all measures, with interjudge intraclass correlation coefficients (ICCs) at or above .99. VFS videos were captured on an AccuStream Express HD video card (Foresight Imaging), digitized at a sampling rate of 60 fps, and then saved to a hard disk using LabView’s SignalExpress (National Instruments).

The sensor signals were collected concurrently with VFS examinations using a triaxial accelerometer neck sensor and a contact microphone. The accelerometer (ADXL327, Analog Devices) was attached with surgical tape to the midline of the participant’s anterior neck at the level of the cricoid cartilage to obtain the best contact (Takahashi et al., 1994). The sensor’s axes were aligned to the anatomical directions of anterior–posterior, superior–inferior, and medial–lateral, respectively. The sensor was powered by a power supply (Model 1504, B&K Precision) with a 3 V output, and the resulting signals were bandpass filtered from 0.1 to 3000 Hz and amplified tenfold (Model P55, Grass Technologies). The microphone (Model C411L, AKG), which was powered by a power supply (Model B29L, AKG), was placed below the accelerometer and slightly toward the right lateral side of the trachea. This location has previously been described as appropriate for collecting swallowing sound signals without interfering with visualization of the proximal trachea or larynx (Cichero & Murdoch, 2002; Takahashi et al., 1994). All signals acquired by the accelerometer and microphone were fed into a National Instruments 6210 data acquisition device and recorded at 20 kHz by the LabView program (SignalExpress, National Instruments). This setup has been shown to be effective at detecting swallowing activity in previous studies (Dudik et al., 2016; Lee et al., 2010).

Data Labeling

All videos were segmented into individual swallows. Swallow duration was defined from the frame in which the head of the bolus reached the ramus of the mandible (onset) to the frame in which the hyoid returned to its lowest position following clearance of the bolus from the pharynx (offset). The corresponding HRCA signals were segmented according to the same onset and offset frames. Reliability of segmentation was established on 10% of the videos with ICCs of over .99, and intrarater reliability was maintained throughout testing to avoid judgment drift.
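Cutting the HRCA recording to a swallow's onset and offset frames amounts to converting frame indices into sample indices. The sketch below is a minimal illustration under stated assumptions: the helper names are hypothetical, the sensor and video recordings are assumed to start at the same instant (frame 0 corresponds to sample 0), and a single channel stands in for the three accelerometer axes and the microphone, each of which would be cut the same way. The video rate is left as a parameter because the X-ray pulse rate (30 fps) and the digitization rate (60 fps) differ.

```python
# Hypothetical helpers: cut the HRCA recording to the same window as a VFS swallow.
# Assumes sensor and video start together (frame 0 <-> sample 0) and a single
# channel held in a list; each real channel would be segmented identically.

HRCA_FS_HZ = 20_000  # HRCA sampling rate reported in the Method section


def frame_to_sample(frame: int, video_fps: float, fs: float = HRCA_FS_HZ) -> int:
    """Index of the first HRCA sample recorded during the given video frame."""
    return round(frame * fs / video_fps)


def segment_swallow(signal, onset_frame: int, offset_frame: int,
                    video_fps: float, fs: float = HRCA_FS_HZ):
    """Samples from the start of the onset frame through the end of the offset frame."""
    start = frame_to_sample(onset_frame, video_fps, fs)
    stop = frame_to_sample(offset_frame + 1, video_fps, fs)  # offset frame is inclusive
    return signal[start:stop]


if __name__ == "__main__":
    signal = [0.0] * (2 * HRCA_FS_HZ)           # 2 s of placeholder samples
    seg = segment_swallow(signal, 0, 59, 60.0)  # frames 0-59 at 60 fps = 1 s
    print(len(seg))
```

Run as a script, this prints 20000: sixty frames at 60 fps span one second, or 20,000 samples at 20 kHz, so each video frame maps to roughly 333 sensor samples.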

Two trained raters labeled the first closure and the first reopening of the LV from the VFS X-ray videos for each swallow sample (see Figure 1). Reliability was established on 10% of the videos with ICCs of over .99, and intrarater reliability was maintained throughout testing to avoid judgment drift. The criteria for judging LV status are listed in Table 2.

Figure 3.


The accuracy levels for the laryngeal vestibule status prediction across the 10 validation groups.

Table 2.

Definitions of swallow kinematic measures.

Measure | Definition

Onset of LVC | The first frame in which no air or barium contrast is seen in the collapsed LV (between the arytenoids and the base of the epiglottis).
Onset of LVO | The first frame in which the LV reopens, that is, the frame of the first obvious reappearance of airspace within the LV.

Note. LVC = laryngeal vestibule closure; LV = laryngeal vestibule; LVO = laryngeal vestibule reopening.

Once the LVC and LVO onset frames were recorded by the judges, the data were entered into machine learning routines to train the algorithms and test their accuracy.

Deep Neural Network Architecture, Training, and Testing

An advanced hybrid deep neural network combining a convolutional neural network and a recurrent neural network, called a convolutional recurrent neural network (CRNN), was used to model the relationship between the HRCA signals and LVC duration by predicting LVC and LVO status. Artificial neural networks are loosely based on the neuronal networks in humans. They are typically organized in “layers” and contain “learning rules,” which allow the network to recognize underlying patterns between input and output. The network is repeatedly trained on observed data until it recognizes the patterns, and the model is then tested on a novel or “unseen” data set to evaluate the model fit, that is, how well the network has “learned.”
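The sketch below is a toy, untrained forward pass meant only to show how data might flow through a CRNN of this general kind; it is not the authors' network, and the window length, kernel sizes, mean pooling, hidden size, and the class name ToyCRNN are illustrative assumptions. A convolutional stage turns each video frame's window of HRCA samples into a few features, a recurrent stage carries context from frame to frame, and a sigmoid output gives a per-frame probability that the LV is closed. Training, which would fit the random weights to the human labels, is omitted.

```python
import math
import random


def conv1d(x, kernel, stride):
    """'Valid' 1-D cross-correlation of list x with kernel, as used in CNN layers."""
    k = len(kernel)
    return [sum(x[i + j] * kernel[j] for j in range(k))
            for i in range(0, len(x) - k + 1, stride)]


class ToyCRNN:
    """Untrained, illustrative CRNN: conv features per frame -> RNN -> P(closed)."""

    def __init__(self, n_kernels=4, kernel_len=8, hidden=6, seed=0):
        rng = random.Random(seed)

        def w():
            return rng.gauss(0.0, 0.3)  # small random weights; training would fit these

        self.kernels = [[w() for _ in range(kernel_len)] for _ in range(n_kernels)]
        self.w_in = [[w() for _ in range(n_kernels)] for _ in range(hidden)]
        self.w_rec = [[w() for _ in range(hidden)] for _ in range(hidden)]
        self.w_out = [w() for _ in range(hidden)]

    def frame_features(self, window):
        """Convolutional stage: ReLU feature map per kernel, mean-pooled to one value."""
        feats = []
        for kernel in self.kernels:
            fmap = [max(0.0, v) for v in conv1d(window, kernel, stride=4)]
            feats.append(sum(fmap) / len(fmap))
        return feats

    def forward(self, frames):
        """frames: one window of HRCA samples per video frame; returns P(closed) per frame."""
        h = [0.0] * len(self.w_out)
        probs = []
        for window in frames:
            x = self.frame_features(window)
            # Recurrent stage (Elman-style): new state depends on this frame and the last state.
            h = [math.tanh(sum(a * b for a, b in zip(self.w_in[k], x))
                           + sum(a * b for a, b in zip(self.w_rec[k], h)))
                 for k in range(len(h))]
            z = sum(a * b for a, b in zip(self.w_out, h))
            probs.append(1.0 / (1.0 + math.exp(-z)))  # sigmoid output per frame
        return probs


if __name__ == "__main__":
    rng = random.Random(1)
    frames = [[rng.gauss(0.0, 1.0) for _ in range(64)] for _ in range(10)]
    probs = ToyCRNN().forward(frames)
    print([1 if p >= 0.5 else 0 for p in probs])  # thresholded LV status per frame
```

The split of roles is the point of the hybrid design: convolution summarizes short-time signal shape within a frame, while recurrence lets a frame's prediction depend on what came before, which matters for an event such as closure that persists over several frames.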

In this study, the two LV statuses (open and closed) were coded as “0” and “1,” respectively. The human-labeled LV statuses were translated into this binary sequence for the computer program (see Figure 1). The CRNN model was given the binary sequence for each swallow frame series (i.e., from the first frame through the last frame of the swallow), along with the corresponding HRCA signal segments. The CRNN was trained to mathematically model the relationship between the HRCA signals and the LV statuses.

Figure 1.


An illustration of the use of the temporal binary classification method to train the convolutional recurrent neural network architecture. The events of laryngeal vestibule (LV) closure and LV reopening were labeled by an experienced rater in kinematic analysis of videofluoroscopic swallowing videos. The numbers “0” and “1” represent the opening and closure of the LV, respectively. VFSS = videofluoroscopic swallowing study.
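The binary coding described above can be sketched as a pair of helpers: one turns the two labeled onset frames into a per-frame 0/1 sequence, and one recovers onsets and a closed-frame count from such a sequence. The function names are hypothetical, and the decoding rule (first frame coded 1 as LVC onset, first later frame coded 0 as LVO onset) is an assumption chosen to mirror the "first closure, first reopening" labeling; a real pipeline might also smooth flickering model output before decoding.

```python
def encode_lv_status(n_frames, lvc_onset, lvo_onset):
    """Per-frame labels: 1 (closed) from the LVC onset frame up to, but not
    including, the LVO onset frame; 0 (open) everywhere else."""
    if not 0 <= lvc_onset < lvo_onset <= n_frames:
        raise ValueError("expected 0 <= LVC onset < LVO onset <= n_frames")
    return [1 if lvc_onset <= f < lvo_onset else 0 for f in range(n_frames)]


def decode_lv_status(labels):
    """Return (LVC onset, LVO onset, closed frames in the first closure run).

    LVC onset is the first frame coded 1; LVO onset is the first later frame
    coded 0. An onset that never occurs is returned as None.
    """
    lvc = next((i for i, v in enumerate(labels) if v == 1), None)
    if lvc is None:
        return None, None, 0
    lvo = next((i for i in range(lvc, len(labels)) if labels[i] == 0), None)
    closed = (lvo if lvo is not None else len(labels)) - lvc
    return lvc, lvo, closed


if __name__ == "__main__":
    y = encode_lv_status(12, lvc_onset=4, lvo_onset=9)
    print(y)                    # 0/1 sequence given to the model for this swallow
    print(decode_lv_status(y))  # onsets and closed-frame count recovered from it
```

For a 12-frame swallow with closure labeled at frame 4 and reopening at frame 9, the sequence is four 0s, five 1s, and three 0s, and decoding returns (4, 9, 5).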

A tenfold cross-validation technique was used to develop the CRNN model. In tenfold cross-validation, all samples are divided into 10 nonoverlapping groups. During training, nine of the 10 groups are used to “train” the model by providing feedback that helps the model predict the human labels from the signals alone. The remaining group is used as a validation set to evaluate the trained model on data it was not trained on and to help identify model parameters that may not have been determined during training with the initial nine groups. This process is repeated a total of 10 times, with each group used as the validation set once.

For this study, the 588 patient swallowing samples were randomly divided into 10 patient-specific groups; that is, all of an individual patient’s swallows were contained within one group and were not spread across any of the remaining nine groups. The groups were used for training and validating the CRNN to predict LVC and LVO from HRCA signals alone. Once the tenfold validation was completed, the “unseen” data set of 45 healthy participant swallows was used as a testing set to evaluate the final model fit (i.e., to determine how well the model could predict LVC and LVO using HRCA signals from data it had never “seen”) and thus how well the model generalized to new data.
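Patient-specific folds can be built by shuffling patients rather than swallows and then assigning every swallow of a patient to that patient's fold. The sketch below assumes a per-swallow list of patient identifiers and deals shuffled patients round-robin into folds; the paper states only that the division was random and patient specific, so the dealing rule and the function names are illustrative. Libraries such as scikit-learn offer a comparable grouped splitter (GroupKFold).

```python
import random
from collections import defaultdict


def patient_grouped_folds(patient_ids, k=10, seed=0):
    """Split swallow indices into k folds so that each patient's swallows share one fold.

    patient_ids[i] is the (hypothetical) identifier of the patient who produced swallow i.
    Patients, not swallows, are shuffled and dealt round-robin into folds, so folds can
    differ in their number of swallows.
    """
    by_patient = defaultdict(list)
    for idx, pid in enumerate(patient_ids):
        by_patient[pid].append(idx)
    patients = sorted(by_patient)  # deterministic order before shuffling
    random.Random(seed).shuffle(patients)
    folds = [[] for _ in range(k)]
    for n, pid in enumerate(patients):
        folds[n % k].extend(by_patient[pid])
    return folds


def cross_validation_splits(folds):
    """Yield (train_indices, validation_indices); each fold is held out exactly once."""
    for i, validation in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


if __name__ == "__main__":
    rng = random.Random(42)
    patient_of_swallow = [rng.randrange(120) for _ in range(588)]  # synthetic IDs
    folds = patient_grouped_folds(patient_of_swallow)
    print([len(f) for f in folds])  # swallows per fold
```

Grouping by patient keeps swallows from the same person out of both the training and validation sides of a split, so validation accuracy is not inflated by the model recognizing an individual's signal characteristics.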

Results

The following results describe the accuracy of the CRNN model. We use the term accuracy to denote the percentage of the frames that were correctly predicted, as compared with the human labels. First, for the patient data set, the accuracy of the model in predicting the frame number of the onset of LVC (within ±3 frames of the human label; Lof & Robbins, 1990) was 62.07% (mean error = 0.19 ± 4.5 frames), and its accuracy in predicting the frame number of the onset of LVO was 60.03% (mean error = 0.08 ± 4.9 frames). For the healthy participant data set, whose data were not included in the training process, the accuracy of model prediction for the frame number of the onset of LVC (within ±3 frames of the human label) was 66.22% (mean error = 0.73 ± 5.2 frames), and that for the frame number of the onset of LVO was 64.44% (mean error = 0.73 ± 5.2 frames). Figure 2 illustrates the frame error distribution for the validation sets and the testing set.
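The onset measures combine a tolerance check with simple error statistics. The sketch below computes, for a list of predicted and human-labeled onset frames, the share of swallows within ±3 frames and the mean and standard deviation of the error. The function name is hypothetical, and the sign convention (predicted minus labeled) is an assumption, since the direction of the reported mean errors is not stated.

```python
from statistics import mean, stdev


def onset_agreement(predicted, labeled, tolerance=3):
    """Return (fraction within +/-tolerance frames, mean error, SD of error).

    Error is predicted frame minus human-labeled frame (sign convention assumed).
    """
    if len(predicted) != len(labeled) or len(labeled) < 2:
        raise ValueError("need matching lists of at least two onsets")
    errors = [p - h for p, h in zip(predicted, labeled)]
    within = sum(abs(e) <= tolerance for e in errors) / len(errors)
    return within, mean(errors), stdev(errors)


if __name__ == "__main__":
    # Toy onsets for four swallows (frame numbers); errors are 0, 4, -1, and 1.
    within, err_mean, err_sd = onset_agreement([10, 14, 20, 31], [10, 10, 21, 30])
    print(f"within +/-3 frames: {within:.0%}, error {err_mean:.2f} +/- {err_sd:.2f} frames")
```

At the 30 fps pulse rate, a ±3-frame tolerance corresponds to about ±100 ms. A near-zero mean error alongside a standard deviation of several frames, as reported above, indicates predictions that are roughly unbiased but scattered around the human label.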

Figure 2.


The frame error distribution for the validation results. The red bars represent an error no larger than 3 frames. Panels (a) and (b) show the distribution of the onset of laryngeal vestibule closure and the onset of laryngeal vestibule reopening, respectively, for the tenfold validation data set, which contained 588 swallowing samples. Panels (c) and (d) show the distribution of the onset of laryngeal vestibule closure and the onset of laryngeal vestibule reopening, respectively, for the testing data set, which contained 45 unseen swallowing samples.

Mean overall accuracy is the ratio of the number of frames correctly predicted by the algorithm (whether the LV was open or closed) to the total number of frames for all swallows. The model’s mean overall accuracy for predicting LV status (whether the LV was open or closed) across the 10 groups from the training set of patient swallows was 74.90%. The accuracy levels of the 10 validation groups for LV status prediction are shown in Figure 3. The mean overall accuracy for distinguishing LV status (open vs. closed) in the testing data set of 45 healthy participant swallows was 75.48%.
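Following the definition above, overall accuracy pools frames across swallows before dividing, so longer swallows carry more weight than shorter ones. A minimal sketch, assuming predictions have already been thresholded to 0/1 per frame (the function name is hypothetical):

```python
def overall_frame_accuracy(predicted_seqs, labeled_seqs):
    """Correctly classified frames divided by total frames, pooled over all swallows."""
    correct = total = 0
    for pred, lab in zip(predicted_seqs, labeled_seqs):
        if len(pred) != len(lab):
            raise ValueError("each prediction must have one status per labeled frame")
        correct += sum(p == h for p, h in zip(pred, lab))
        total += len(lab)
    return correct / total


if __name__ == "__main__":
    predicted = [[0, 1, 1, 0], [0, 0, 1]]
    labeled = [[0, 1, 1, 1], [0, 1, 1]]
    print(overall_frame_accuracy(predicted, labeled))  # 5 of 7 frames agree
```

Averaging per-swallow accuracies instead would give each swallow equal weight and could yield a different figure; the pooled form is the one the definition describes.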

Finally, to evaluate the model’s predictive ability for LVC duration, we used a duration ratio, calculated as the number of frames for which the model predicted the LV to be closed divided by the number of frames for which the human label indicated that the LV was closed. The closer the ratio is to 1, the closer the model’s prediction is to the human-calculated duration. The duration ratios for the 10 patient validation groups are listed in Table 3. The overall mean duration ratio for the patient data set was 1.13, indicating that the model slightly overestimated the number of frames in which the LV was closed. The overall mean duration ratio for the healthy participant data set was 0.93, indicating that the model slightly underestimated the number of frames in which the LV was closed.

Table 3.

Laryngeal vestibule closure duration ratio across the 10 validation groups.

Variable | Group 1 | Group 2 | Group 3 | Group 4 | Group 5
Duration ratio | 1.15 | 0.86 | 0.94 | 1.05 | 1.17
Variable | Group 6 | Group 7 | Group 8 | Group 9 | Group 10
Duration ratio | 1.06 | 1.062 | 1.25 | 1.64 | 1.11
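The duration ratio is a ratio of closed-frame counts, which with 0/1 per-frame sequences reduces to a ratio of sums. The sketch below (function name hypothetical) also checks the table arithmetically: the unweighted mean of the ten group ratios in Table 3 is about 1.13, consistent with the reported overall value, although the paper does not state whether its overall mean was averaged across groups or across swallows.

```python
from statistics import mean


def duration_ratio(predicted, labeled):
    """Predicted closed-frame count divided by the human-labeled closed-frame count."""
    human = sum(labeled)
    if human == 0:
        raise ValueError("human label has no closed frames; ratio undefined")
    return sum(predicted) / human


# Per-group duration ratios as reported in Table 3.
TABLE3_RATIOS = [1.15, 0.86, 0.94, 1.05, 1.17, 1.06, 1.062, 1.25, 1.64, 1.11]

if __name__ == "__main__":
    print(duration_ratio([0, 1, 1, 1, 1, 0], [0, 1, 1, 1, 0, 0]))  # 4 / 3: overestimate
    print(round(mean(TABLE3_RATIOS), 2))                            # prints 1.13
```

Because the numerator counts every frame predicted closed, a prediction with the right total but misplaced frames still scores 1.0; the onset and frame-accuracy measures above capture timing errors that this ratio does not.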

Discussion

The primary aim of this study was to determine the feasibility of using HRCA signals to predict LV status (open, closed) during swallowing with an advanced computer-aided approach and, thus, to noninvasively estimate LVC duration. We demonstrated that a highly complex and nonlinear relationship between LV status and HRCA signals can be modeled via advanced deep learning algorithms, such as the hybrid neural network proposed in this study.

At prediction time, the CRNN model estimated LV status from HRCA signal input alone, independent of the human judges’ manual analysis of the VFS videos, which served as the reference for assessing the model’s performance. Our experimental results revealed that the overall accuracy of the model in distinguishing LV status (open, closed) was around 75% for both the validation and testing data sets, suggesting that the CRNN algorithm is capable of distinguishing LV status (open, closed) based only on HRCA signals and, by extension, of estimating LVC duration.

The mean accuracies for machine-predicted LVC and LVO frames in the testing group of healthy participants’ “unseen data” were higher than those for the training and validation sets of patients’ “seen data,” which underscores the robustness of the CRNN model. It is unclear why the healthy participant testing data had larger mean error values than the patient data, but a possible explanation is differences between patient and healthy swallow kinematics: the algorithm was trained and validated only on disordered swallows but was tested on healthy swallows. Regardless, the higher accuracies in the testing set support the utility of the algorithm; however, the system is not yet ready for clinical implementation. This study established feasibility and illustrated the model’s promising performance in identifying events of very short duration from among all events occurring during a swallow sequence. We intend to improve the system’s precision in future investigations.

HRCA also has the potential to be used as a noninvasive biofeedback tool during swallowing rehabilitation. Dysphagia management is designed to target the underlying biomechanical impairment during swallowing, which can be achieved through behavioral modifications such as swallowing maneuvers. However, when training swallowing maneuvers, patients are expected to exert volitional control over laryngeal structures. This presents treatment challenges when imaging-based visual biofeedback is unavailable, because individuals with dysphagia may not be familiar with laryngeal function. Providing patients with extrinsic feedback could improve patient compliance, performance accuracy, and overall outcomes, as has been demonstrated with other signal-based biofeedback methods (Martin-Harris et al., 2017; Steele et al., 2012).

In clinical settings, combining a clinician’s verbal feedback with visual biofeedback (e.g., kinematic feedback such as videofluoroscopy or fiberoptic endoscopic evaluation of swallowing, or nonkinematic feedback such as signal waveforms, numerical data, or graphs) corresponding to the patient’s target movement can intensify the impact of extrinsic feedback (Crary & Groher, 2000; Humbert & Joel, 2012). Unlike limb movement, volitional control of the larynx is a relatively obscure act without externally observable activity upon which to base motor learning. The amplified effect of combined extrinsic feedback may augment the patient’s intrinsic feedback system, which monitors the movement of muscles and joints and general body position, thus allowing the patient to make more accurate approximations of targeted gross and fine movements (Abbruzzese et al., 2014; Gandevia et al., 2002) and, ultimately, supporting learning of the target task (Dayan & Cohen, 2011; Taubert et al., 2011).

HRCA can provide biofeedback by estimating LVC and LVO onsets, thereby reporting LVC duration to patients. Using HRCA in this way would limit radiation exposure and could improve patient accuracy for targets related to LVC and LVO onset and volitional LVC prolongation, thus promoting better airway protection.

Methods of improving skill acquisition, along with schedules for dosage and intensity as well as reinforcement and feedback, are important components of rehabilitation treatment taxonomies (Hart et al., 2019). Imagine, for example, an HRCA visual biofeedback device that provides the patient with a simple visual representation of laryngeal closure and opening (e.g., red [open] or green [closed] lights). Such a system could provide the clinician and the patient with LVC duration information and give the patient visual feedback during skill acquisition to help them achieve their therapy goal.
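The red/green device imagined above is hypothetical, and so is the sketch below; it shows one way per-frame status predictions could be turned into a light and a running closure duration compared against a clinician-set target. The function name, the 30 fps default, and the target parameter are assumptions for illustration, not features of the system studied here.

```python
def biofeedback_stream(frame_statuses, fps=30.0, target_ms=None):
    """Yield (light, closed_ms, target_met) for each predicted per-frame LV status.

    light is "green" while the LV is predicted closed (1) and "red" while open (0).
    closed_ms is the length of the current closure run and resets when the LV reopens.
    target_met is True once the current closure reaches target_ms (if a target is set).
    """
    frame_ms = 1000.0 / fps
    run = 0
    for status in frame_statuses:
        run = run + 1 if status == 1 else 0
        closed_ms = run * frame_ms
        met = target_ms is not None and closed_ms >= target_ms
        yield ("green" if status == 1 else "red"), closed_ms, met


if __name__ == "__main__":
    predicted = [0, 0, 1, 1, 1, 1, 0]  # e.g., thresholded per-frame model output
    for light, ms, met in biofeedback_stream(predicted, target_ms=90.0):
        print(f"{light:5} closed {ms:5.1f} ms, target reached: {met}")
```

Resetting the running duration on each reopening matches the goal of volitional prolongation, where what matters to the patient is how long a single closure is held.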

HRCA provides an objective tool to noninvasively analyze laryngeal behavior during swallowing, which can yield trackable outcome measures and help demonstrate and document the efficacy of interventions to reduce aspiration risk. The machine learning technique proposed here, using a CRNN model, enabled us to analyze HRCA signals associated with specific swallowing kinematic events (LVC, LVO) and aligns with other research in our lab demonstrating the association between HRCA signals and hyoid bone displacement (He et al., 2019), LVC, the contact of the base of the tongue with the posterior pharyngeal wall (Kurosu et al., 2019), and the diameter of upper esophageal sphincter maximal opening (Shu, 2019). This technique also has potential for the noninvasive examination of other swallowing kinematic events, such as tongue base retraction or epiglottic inversion, which previously could not be completely perceived or precisely analyzed.

The aim of this study was to determine the ability of the sensors and the CRNN to independently predict LV status regardless of age, gender, or diagnosis; however, these factors provide interesting directions for future research. Researchers could investigate systematic changes in model predictions of LVC and LVO with varying bolus volumes and consistencies, various patient characteristics (e.g., age, gender, diagnosis), and disease characteristics (e.g., disease/dysphagia severity, infarct location from stroke, and degenerative disease progression).

Future research could also explore machine learning factors such as model structure, learning algorithms, and hyperparameter tuning. Refining these factors may improve the accuracy of the CRNN model, helping it identify “safe” swallows and avoid over- or underestimating LV closure. Ideally, clinical trials should investigate the efficacy of HRCA as a noninvasive biofeedback tool to augment training in volitional laryngeal closure and establish its use as a swallowing intervention to reduce aspiration.

Limitations

One limitation of the current study is that the model was trained only on patient swallows and did not incorporate healthy swallows, which may have improved its performance. Machine learning algorithms of this kind perform more robustly when they are trained on heterogeneous exemplars (i.e., swallows) from the population under investigation. We also trained and tested the model with relatively small samples, whereas larger training samples are generally preferred in machine learning. A larger sample of swallows would have given the model more opportunity to characterize less common perturbations in swallow physiology, which would most likely improve its accuracy on novel test data. Our results are therefore preliminary and will likely improve as we increase the sample size and train the model with healthy swallows; nevertheless, this study demonstrates the feasibility of using HRCA to predict LV status and LVC duration.

Conclusions

This study found that HRCA signal analysis using an advanced machine learning technique can effectively predict LV status (opening or closure) and accurately estimate LVC duration. This provides a potential noninvasive tool to estimate LVC duration for diagnostic and biofeedback purposes in managing patients with dysphagia as an adjunct to X-ray imaging.

Acknowledgments

Disclosures

Financial: Aliaa Sabry has no financial interests to disclose. Amanda S. Mahoney has no financial interests to disclose. Shitong Mao has no financial interests to disclose. Yassin Khalifa has no financial interests to disclose. Research reported in this publication was supported by the Eunice Kennedy Shriver National Institute of Child Health and Human Development of the National Institutes of Health under Grant R01HD092239, awarded to Ervin Sejdić (Principal Investigator), whereas the data were collected under Grant R01HD074819, awarded to Ervin Sejdić and James L. Coyle (Co-Principal Investigators). This work was also supported by National Science Foundation CAREER Award 1652203, awarded to Ervin Sejdić (Principal Investigator).

Nonfinancial: Aliaa Sabry has no nonfinancial interests to disclose. Amanda S. Mahoney has no nonfinancial interests to disclose. Shitong Mao has no nonfinancial interests to disclose. Yassin Khalifa has no nonfinancial interests to disclose. Ervin Sejdić has no nonfinancial interests to disclose. James L. Coyle has no nonfinancial interests to disclose.

Footnotes

Disclaimer

The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health or the National Science Foundation.

References

  1. Abbruzzese G, Trompetto C, Mori L, & Pelosin E (2014). Proprioceptive rehabilitation of upper limb dysfunction in movement disorders: A clinical perspective. Frontiers in Human Neuroscience, 8, 1–8. 10.3389/fnhum.2014.00961
  2. Azola AM, Greene LR, Taylor-Kamara I, Macrae P, Anderson C, & Humbert IA (2015). The relationship between submental surface electromyography and hyo-laryngeal kinematic measures of Mendelsohn Maneuver duration. Journal of Speech, Language, and Hearing Research, 58(6), 1627–1636. 10.1044/2015_JSLHR-S-14-0203
  3. Bonilha HS, Blair J, Carnes B, Huda W, Humphries K, McGrattan K, Michel Y, & Martin-Harris B (2013). Preliminary investigation of the effect of pulse rate on judgments of swallowing impairment and treatment recommendations. Dysphagia, 28(4), 528–538. 10.1007/s00455-013-9463-z
  4. Bonilha HS, Humphries K, Blair J, Hill EG, McGrattan K, Carnes B, Huda W, & Martin-Harris B (2013). Radiation exposure time during MBSS: Influence of swallowing impairment severity, medical diagnosis, clinician experience, and standardized protocol use. Dysphagia, 28(1), 77–85. 10.1007/s00455-012-9415-z
  5. Cabib C, Ortega O, Kumru H, Palomeras E, Vilardell N, Alvarez-Berdugo D, Muriana D, Rofes L, Terré R, Mearin F, & Clavé P (2016). Neurorehabilitation strategies for post-stroke oropharyngeal dysphagia: From compensation to the recovery of swallowing function. Annals of the New York Academy of Sciences, 1380(1), 121–138. 10.1111/nyas.13135
  6. Cichero JA, & Murdoch BE (1998). The physiologic cause of swallowing sounds: Answers from heart sounds and vocal tract acoustics. Dysphagia, 13(1), 39–52. 10.1007/PL00009548
  7. Cichero JA, & Murdoch BE (2002). Detection of swallowing sounds: Methodology revisited. Dysphagia, 17(1), 40–49. 10.1007/s00455-001-0100-x
  8. Crary MA, & Groher ME (2000). Basic concepts of surface electromyographic biofeedback in the treatment of dysphagia: A tutorial. American Journal of Speech-Language Pathology, 9(2), 116–125. 10.1044/1058-0360.0902.116
  9. Dayan E, & Cohen LG (2011). Neuroplasticity subserving motor skill learning. Neuron, 72(3), 443–454. 10.1016/j.neuron.2011.10.008
  10. Dudik JM, Jestrović I, Luan B, Coyle JL, & Sejdić E (2015). Characteristics of dry chin-tuck swallowing vibrations and sounds. IEEE Transactions on Biomedical Engineering, 62(10), 2456–2464. 10.1109/TBME.2015.2431999
  11. Dudik JM, Kurosu A, Coyle JL, & Sejdić E (2016). A statistical analysis of cervical auscultation signals from adults with unsafe airway protection. Journal of NeuroEngineering and Rehabilitation, 13(1), Article 7. 10.1186/s12984-015-0110-9
  12. Ekberg O (1982). Closure of the laryngeal vestibule during deglutition. Acta Oto-Laryngologica, 93(1–6), 123–129. 10.3109/00016488209130862
  13. Ekberg O, & Nylander G (1982). Cineradiography of the pharyngeal stage of deglutition in 150 individuals without dysphagia. The British Journal of Radiology, 55(652), 253–257. 10.1259/0007-1285-55-652-253
  14. Gandevia SC, Refshauge KM, & Collins DF (2002). Proprioception: Peripheral inputs and perceptual interactions. In Gandevia SC, Proske U, & Stuart DG (Eds.), Sensorimotor control of movement and posture (pp. 61–68). Springer. 10.1007/978-1-4615-0713-0_8
  15. Hart T, Dijkers MP, Whyte J, Turkstra LS, Zanca JM, Packel A, Van Stan JH, Ferraro M, & Chen C (2019). A theory-driven system for the specification of rehabilitation treatments. Archives of Physical Medicine and Rehabilitation, 100(1), 172–180. 10.1016/j.apmr.2018.09.109
  16. He Q, Perera S, Khalifa Y, Zhang Z, Mahoney AS, Sabry A, Donohue C, Coyle J, & Sejdić E (2019). The association of high resolution cervical auscultation signal features with hyoid bone displacement during swallowing. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 27(9), 1810–1816. 10.1109/TNSRE.2019.2935302
  17. Hind JA, Nicosia MA, Roecker EB, Carnes ML, & Robbins J (2001). Comparison of effortful and noneffortful swallows in healthy middle-aged and older adults. Archives of Physical Medicine and Rehabilitation, 82(12), 1661–1665. 10.1053/apmr.2001.28006
  18. Humbert IA, & Joel S (2012). Tactile, gustatory, and visual biofeedback stimuli modulate neural substrates of deglutition. NeuroImage, 59(2), 1485–1490. 10.1016/j.neuroimage.2011.08.022
  19. Humbert IA, Sunday KL, Karagiorgos E, Vose AK, Gould F, Greene L, Azola A, Tolar A, & Rivet A (2018). Swallowing kinematic differences across frozen, mixed, and ultrathin liquid boluses in healthy adults: Age, sex, and normal variability. Journal of Speech, Language, and Hearing Research, 61(7), 1544–1559. 10.1044/2018_JSLHR-S-17-0417
  20. Kang BS, Oh B-M, Kim IS, Chung SG, Kim SJ, & Han TR (2010). Influence of aging on movement of the hyoid bone and epiglottis during normal swallowing: A motion analysis. Gerontology, 56(5), 474–482. 10.1159/000274517
  21. Kendall KA, Leonard RJ, & McKenzie SW (2003). Sequence variability during hypopharyngeal bolus transit. Dysphagia, 18(2), 85–91. 10.1007/s00455-002-0086-z
  22. Kim SJ, Han TR, & Kwon TK (2010). Kinematic analysis of hyolaryngeal complex movement in patients with dysphagia development after pneumonectomy. The Thoracic and Cardiovascular Surgeon, 58(02), 108–112. 10.1055/s-0029-1186278
  23. Kim Y, McCullough GH, & Asp CW (2005). Temporal measurements of pharyngeal swallowing in normal populations. Dysphagia, 20(4), 290–296. 10.1007/s00455-005-0029-6
  24. Kurosu A, Coyle JL, Dudik JM, & Sejdić E (2019). Detection of swallow kinematic events from acoustic high-resolution cervical auscultation signals in patients with stroke. Archives of Physical Medicine and Rehabilitation, 100(3), 501–508. 10.1016/j.apmr.2018.05.038
  25. Lazarus C, Logemann JA, & Gibbons P (1993). Effects of maneuvers on swallowing function in a dysphagic oral cancer patient. Head & Neck, 15(5), 419–424. 10.1002/hed.2880150509
  26. Lee J, Sejdić E, Steele CM, & Chau T (2010). Effects of liquid stimuli on dual-axis swallowing accelerometry signals in a healthy population. BioMedical Engineering OnLine, 9(1), Article 7. 10.1186/1475-925X-9-7
  27. Leslie P, Drinnan MJ, Finn P, Ford GA, & Wilson JA (2004). Reliability and validity of cervical auscultation: A controlled comparison using videofluoroscopy. Dysphagia, 19(4), 231–240. 10.1007/s00455-004-0007-4
  28. Lof GL, & Robbins J (1990). Test–retest variability in normal swallowing. Dysphagia, 4(4), 236–242. 10.1007/BF02407271
  29. Logemann JA, Kahrilas PJ, Cheng J, Pauloski BR, Gibbons PJ, Rademaker AW, & Lin S (1992). Closure mechanisms of laryngeal vestibule during swallow. American Journal of Physiology-Gastrointestinal and Liver Physiology, 262(2), G338–G344. 10.1152/ajpgi.1992.262.2.G338
  30. Logemann JA, Pauloski BR, Rademaker AW, Colangelo LA, Kahrilas PJ, & Smith CH (2000). Temporal and biomechanical characteristics of oropharyngeal swallow in younger and older men. Journal of Speech, Language, and Hearing Research, 43(5), 1264–1274. 10.1044/jslhr.4305.1264
  31. Logemann JA, Pauloski BR, Rademaker AW, & Kahrilas PJ (2002). Oropharyngeal swallow in younger and older women. Journal of Speech, Language, and Hearing Research, 45(3), 434–445. 10.1044/1092-4388(2002/034)
  32. Macrae P, Anderson C, Taylor-Kamara I, & Humbert I (2014). The effects of feedback on volitional manipulation of airway protection during swallowing. Journal of Motor Behavior, 46(2), 133–139. 10.1080/00222895.2013.878303
  33. Mann G, Hankey GJ, & Cameron D (1999). Swallowing function after stroke: Prognosis and prognostic factors at 6 months. Stroke, 30(4), 744–748. 10.1161/01.STR.30.4.744
  34. Martin-Harris B, Brodsky MB, Price CC, Michel Y, & Walters B (2003). Temporal coordination of pharyngeal and laryngeal dynamics with breathing during swallowing: Single liquid swallows. Journal of Applied Physiology, 94(5), 1735–1743. 10.1152/japplphysiol.00806.2002
  35. Martin-Harris B, Garand KL, & McFarland D (2017). Optimizing respiratory-swallowing coordination in patients with oropharyngeal head and neck cancer. Perspectives of the ASHA Special Interest Groups, 2(13), 103–110. 10.1044/persp2.SIG13.103
  36. Martin-Harris B, & Jones B (2008). The videofluorographic swallowing study. Physical Medicine and Rehabilitation Clinics of North America, 19(4), 769–785. 10.1016/j.pmr.2008.06.004
  37. Mendelsohn MS, & Martin RE (1993). Airway protection during breath-holding. Annals of Otology, Rhinology & Laryngology, 102(12), 941–944. 10.1177/000348949310201206
  38. Molfenter SM, & Steele CM (2012). Temporal variability in the deglutition literature. Dysphagia, 27(2), 162–177. 10.1007/s00455-012-9397-x
  39. Movahedi F, Kurosu A, Coyle JL, Perera S, & Sejdić E (2016). Anatomical directional dissimilarities in tri-axial swallowing accelerometry signals. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 25(5), 447–458. 10.1109/TNSRE.2016.2577882
  40. Nierengarten MB (2009). Evaluating dysphagia: Current approaches. Oncology Times, 31(14), 29–30. 10.1097/01.COT.0000358150.50765.f2
  41. Ohmae Y, Logemann JA, Hanson DG, Kaiser P, & Kahrilas PJ (1996). Effects of two breath-holding maneuvers on oropharyngeal swallow. Annals of Otology, Rhinology & Laryngology, 105(2), 123–131. 10.1177/000348949610500207
  42. Ohmae Y, Logemann JA, Kaiser P, Hanson DG, & Kahrilas PJ (1995). Timing of glottic closure during normal swallow. Head & Neck, 17(5), 394–402. 10.1002/hed.2880170506
  43. Park T, Kim Y, Ko D-H, & McCullough G (2010). Initiation and duration of laryngeal closure during the pharyngeal swallow in post-stroke patients. Dysphagia, 25(3), 177–182. 10.1007/s00455-009-9237-9
  44. Power ML, Hamdy S, Singh S, Tyrrell PJ, Turnbull I, & Thompson DG (2007). Deglutitive laryngeal closure in stroke patients. Journal of Neurology, Neurosurgery & Psychiatry, 78(2), 141–146. 10.1136/jnnp.2006.101857
  45. Rebrion C, Zhang Z, Khalifa Y, Ramadan M, Kurosu A, Coyle JL, Perera S, & Sejdić E (2018). High-resolution cervical auscultation signal features reflect vertical and horizontal displacements of the hyoid bone during swallowing. IEEE Journal of Translational Engineering in Health and Medicine, 7, 1–9. 10.1109/JTEHM.2018.2881468
  46. Robbins J, Levine RL, Maser A, Rosenbek JC, & Kempster GB (1993). Swallowing after unilateral stroke of the cerebral cortex. Archives of Physical Medicine and Rehabilitation, 74(12), 1295–1300. 10.1016/0003-9993(93)90082-L
  47. Rofes L, Arreola V, Romea M, Palomera E, Almirall J, Cabré M, Serra-Prat M, & Clavé P (2010). Pathophysiology of oropharyngeal dysphagia in the frail elderly. Neurogastroenterology & Motility, 22(8), 851–e230. 10.1111/j.1365-2982.2010.01521.x
  48. Rosenbek JC, Roecker EB, Wood JL, & Robbins J (1996). Thermal application reduces the duration of stage transition in dysphagia after stroke. Dysphagia, 11(4), 225–233. 10.1007/BF00265206
  49. Sejdić E, Malandraki GA, & Coyle JL (2018). Computational deglutition: Using signal- and image-processing methods to understand swallowing and associated disorders [life sciences]. IEEE Signal Processing Magazine, 36(1), 138–146. 10.1109/MSP.2018.2875863
  50. Shu K (2019). Association between diameter of upper esophageal sphincter maximal opening and high-resolution cervical auscultation signal features [Unpublished doctoral dissertation]. University of Pittsburgh.
  51. Steele C, Allen C, Barker J, Buen P, French R, Fedorak A, Irvine Day S, Lapointe J, Lewis L, MacKnight C, McNeil S, Valentine J, & Walsh L (2007). Dysphagia service delivery by speech-language pathologists in Canada: Results of a national survey. Canadian Journal of Speech-Language Pathology and Audiology, 31(4), 166–177.
  52. Steele C, Bennett JW, Chapman-Jay S, Cliffe Polacco R, Molfenter SM, & Oshalla M (2012). Electromyography as a biofeedback tool for rehabilitating swallowing muscle function. In Steele C (Ed.), Applications of EMG in clinical and sports medicine (pp. 311–328). InTech.
  53. Takahashi K, Groher ME, & Michi KI (1994). Methodology for detecting swallowing sounds. Dysphagia, 9(1), 54–62. 10.1007/BF00262760
  54. Taubert M, Lohmann G, Margulies DS, Villringer A, & Ragert P (2011). Long-term effects of motor training on resting-state networks and underlying brain structure. NeuroImage, 57(4), 1492–1498. 10.1016/j.neuroimage.2011.05.078
  55. Young JL, Macrae P, Anderson C, Taylor-Kamara I, & Humbert IA (2015). The sequence of swallowing events during the chin-down posture. American Journal of Speech-Language Pathology, 24(4), 659–670. 10.1044/2015_AJSLP-15-0004
  56. Zammit-Maempel I, Chapple C-L, & Leslie P (2007). Radiation dose in videofluoroscopic swallow studies. Dysphagia, 22(1), 13–15. 10.1007/s00455-006-9031-x
