Abstract
Objective:
The vibratory source for voicing in children with dysphonia is classified into three categories including a glottal vibratory source (GVS) observed in those with vocal lesions or hyperfunction; supraglottal vibratory sources (SGVS) observed secondary to laryngeal airway injuries, malformations, or reconstruction surgeries; and a combination of both glottal and supraglottal vibratory sources called mixed vibratory source (MVS). This study evaluated the effects of vibratory source on three primary dimensions of voice quality (breathiness, roughness, and strain) in children with GVS, SGVS, and MVS using single-variable matching tasks and computational measures obtained from bio-inspired auditory models.
Methods:
A total of 44 dysphonic voice samples from children aged 4 to 11 years were selected. Seven listeners rated breathiness, roughness, and strain of 1000-ms /ɑ/ samples using single-variable matching tasks. Computational estimates of pitch strength, amplitude modulation filterbank output, and sharpness were obtained through custom-designed MATLAB algorithms.
Results:
Perceived roughness and strain were significantly higher in children with SGVS and MVS compared to children with GVS. Among the computational measures, only the modulation filterbank output resulted in significant differences among vibratory sources; a post-hoc test revealed that children with SGVS had greater amplitude modulation than children with GVS, as expected from their rougher voice quality.
Conclusions:
The results indicate that the output of an auditory amplitude modulation filter bank model may capture characteristics of SGVS that are strongly related to the rough voice quality.
Keywords: Pediatric Voice Disorders, Auditory-Perceptual Evaluation of Voice, Acoustic Assessment, Voice Quality Post Airway Reconstruction, Supraglottic Vibratory Sources
Introduction
Pediatric voice disorders are relatively common with a reported incidence ranging from 6 to 19%.1,2 The vocal folds serve as the primary source of vibration for voice production and pediatric voice disorders often manifest as structural lesions (e.g., nodules, polyps, cysts) in the vocal folds. Such structural lesions are typically benign, and often cause incomplete closure of the vocal folds resulting in irregular vibration of the glottal vibratory source (GVS), which can be accompanied by excessive vocal effort.
Some children with dysphonia have been identified as using supraglottic vibratory sources (SGVS) or a combination of SGVS and GVS (mixed vibratory sources [MVS]).3 Children with SGVS or MVS are often born critically ill or prematurely, requiring frequent or prolonged tracheal intubation and ultimately laryngeal-tracheal reconstruction to treat subglottic stenosis and reestablish a patent, continuous upper/lower airway. Similarly, these children may have congenital laryngeal disorders such as glottal web and unilateral or bilateral vocal fold paralysis necessitating surgical intervention.4 These conditions and the requisite airway surgery can result in damage or dysfunction of the larynx, hampering proper vocal fold closure and limiting the function of the true vocal folds as a vibratory source for voice production. Instead, these children often use ventricular folds, or combinations of supraglottic structures (e.g., epiglottic petiole, arytenoids) as their primary source of voice production.5–8 Notably, some children with mixed vibratory sources have been observed to be able to switch from a predominantly SGVS to a GVS.9
Auditory-perceptual and acoustic assessments of voice quality from different vibratory sources
Voice quality has been more extensively studied in children with voice disorders who utilize GVS compared to those with SGVS, and it has been observed that children with GVS exhibit excessive breathiness, roughness, and strain,10–12 similar to adults. Acoustic changes associated with voice samples from children with vocal nodules, including increased jitter, shimmer, and harmonics-to-noise ratio, indicate increased aperiodicity and degradation in overall voice quality.12,13 Decreased cepstral peak prominence (CPP), a measure associated with overall severity of dysphonia and breathiness,14–16 also has been reported in children with vocal nodules.17 Szklanny et al.,18 however, reported no differences in CPP between children with and without vocal nodules.
The understanding of voice quality in children with vibratory sources other than the true vocal folds is limited due to the scarcity of robust measures of voice quality, despite its crucial role in the management of the pediatric population. Children with SGVS frequently have severe dysphonia characterized by aperiodicity, though most of the investigations with children have been descriptive and observational in nature.7,8 When comparing voice quality in children with GVS and SGVS using the Consensus Auditory-Perceptual Evaluation of Voice (CAPE-V),19 Krival et al.3 found that children with SGVS had a higher degree of overall severity, strain, and roughness compared to children with GVS. While the CAPE-V is commonly used in clinical and research settings and has good reliability in terms of overall dysphonia severity ratings, it has poor to moderate reliability for ratings of specific voice quality dimensions.4,20 Additionally, the use of visual analog scales as in CAPE-V, which rely on arbitrary assignment of numbers to perceptions, fails to provide precise measurement of the magnitude of voice quality for comparison between groups and over time.21,22 An additional complicating factor is that the severe aperiodicity in SGVS and MVS compared to GVS makes it difficult to use most conventional acoustic measures of voice, which require accurate fundamental frequency estimation.6,9 Although one study compared fundamental frequencies of SGVS and GVS groups,3 no studies have yet evaluated acoustic measures related to voice quality in the SGVS group. Thus, the exact impact of vibratory sources on pediatric voice quality remains unknown.
Matching tasks
Traditional auditory-perceptual rating tasks such as CAPE-V are subjective in their nature, have limited inter-rater reliability, and lack the ability to accurately capture small changes in voice quality over time.21–23 To overcome these limitations, matching tasks with reference sound stimuli have been proposed as a more effective alternative in laboratory use.24–26 With these tasks, a synthetic comparison sound with modifiable variables is presented, and listeners adjust the variables until the comparison sound matches the target sound quality. This type of comparison reduces the influence of internal biases and contextual biases.23,26,27 In single-variable matching tasks (SVMTs), a single independent variable is increased or decreased in magnitude until the perceived quality of the comparison sound matches the perceived quality of the voice. The resulting values establish a direct relationship between the physical value of the independent variable (and its associated units of measurement) and the perceived voice quality. By expressing perceived voice quality in terms of physical units, quantities such as the difference in magnitude of severity or the change due to some modification can be computed, while also providing an objective value to compare against acoustic analyses or computational modeling. SVMTs have been developed to evaluate breathiness,28 roughness,29 and strain30 in adult voices, but have yet to be evaluated in pediatric voices.
Bio-inspired computational measures using auditory models
In addition to acoustic analyses of the vocal signal, bio-inspired auditory models have been used to assess dysphonic voices and yield outputs that provide objective indices of voice quality in adult voices. This approach may also benefit the evaluation of pediatric dysphonia, particularly for pediatric voices with SGVS, which currently lack objective measures for assessment. Conventional acoustic measurements, such as jitter and shimmer, are usually computed directly from raw acoustic signals but typically fail to robustly index specific voice quality dimensions.31,32 To address this issue, previous studies have used bio-inspired auditory models to transform raw signals into internal auditory representations, capturing various stages of nonlinear auditory processing.33,34 Pitch strength, estimated from outputs of basilar membrane filtering, can represent the perceived degree of tonality and has been strongly correlated with perceived breathiness.35 Amplitude modulation filterbank output, which estimates the representation of temporal envelope modulation at the midbrain,36 has been strongly correlated with perceived roughness.37 Sharpness, estimated from the spectral envelope shape from an auditory loudness model (Moore, 1997), has been strongly correlated with perceived strain. These measures do not require accurate estimation of fundamental frequency, unlike many conventional acoustic measures, making them useful for evaluating severely aperiodic voices resulting from SGVS. All three measures have been used for adult voices, but none have yet been used with pediatric voices.
Purposes and hypotheses
The aim of this study was to advance the understanding and evaluation of voice quality in children with SGVS and MVS through the use of recent advancements in measurement tools for voice quality; specifically, SVMTs and bio-inspired computational measures of specific voice quality dimensions. Children with SGVS and MVS use supraglottal structures (e.g., ventricular folds, arytenoid-epiglottic petiole contact) for voicing, which may be more massive and less symmetrical than the true vocal folds. Based on these characteristics, it was hypothesized that children with SGVS and MVS would exhibit more irregular phonation, resulting in higher levels of perceived roughness and correspondingly higher values of amplitude modulation filterbank output compared to children using GVS. Some children with SGVS and MVS may experience more difficulty closing the airway completely due to a possible increased gap from damage, allowing more air escape during their voice production. Thus, it was hypothesized that children with SGVS and MVS would have higher levels of perceived breathiness, and correspondingly lower pitch strength estimates, compared to children with GVS. We also hypothesized that the use of supraglottal structures for voice production in children with SGVS and MVS may result in increased subglottal pressure and vocal effort to initiate phonation, leading to higher levels of perceived strain and correspondingly greater spectral sharpness compared to children with GVS.
This study also aimed to evaluate the applicability of recent voice quality measurement tools, SVMTs and bio-inspired computational measures, for pediatric voices. It was hypothesized that the SVMTs used to evaluate voice quality in adult voices would produce reliable perceptual judgments for pediatric voices, and that the perceptual outcomes would have similar correlations with associated computational measures as observed in adult voices.
Methods
Voice stimuli
Sustained /ɑ/ recordings of 44 children (16 GVS, 16 SGVS, 14 MVS) were selected from the Cincinnati Children’s Hospital Medical Center (CCHMC) disordered voice database. Prior to the selection of the 44 recordings, recordings available in the database were grouped into each vibratory source group through careful examinations of the acoustic signals associated with each voice sample and corresponding videoendoscopic images evaluated by three SLPs (SB, LK, and BW) and one otolaryngologist (AA). All four voice experts reached a consensus on the primary vibratory source used by the children in the database, and although the voice recordings we used for the experiment were not simultaneously recorded during videoendoscopy, the clinicians checked the recordings by listening to ensure that the recordings were grouped into the correct vibratory source group. Children with GVS (mean age = 7.7 years; range = 4 to 11 years) were diagnosed with nodules, muscle tension dysphonia, paralysis, hemorrhage, or vocal fold dysfunction. Children with SGVS (mean age = 7.6 years; range 4 to 11 years) and MVS (mean age = 8.0 years; range = 5 to 11 years) were not diagnosed with any vocal fold lesions but some were diagnosed with papilloma, web, or paralysis (see Supplemental Materials S1–S3 for the videoendoscopic clips corresponding to each vibratory source). Most of the children were reported to be prematurely born, had received laryngeal-tracheal reconstruction surgery, presented with restricted arytenoid, or combinations of these characteristics.
All recordings were cropped to the middle 1 second having relatively stable fo and intensity contours. To control loudness of the chosen samples, the amplitude of each stimulus was adjusted to produce a single loudness level (phon) calculated using a loudness model.38
Listeners
Seven listeners completed the study. Listeners were females between 20 and 24 years of age (mean age = 21.8 years), with American English as their native language and normal hearing in both ears (pure tone threshold < 25 dB HL at octave frequencies between 250 and 8000 Hz; ANSI, 2010). Listeners had no expertise in assessing voice quality. The number of listeners was determined by evaluation of the average absolute deviations of roughness matching values from the mean roughness matching values of 15 listeners in a previous study.37 Mean roughness matching values averaged by seven listeners differed only slightly (~ 5%) from mean roughness matching values rated by 15 participants, and thus a total of seven listeners were determined to be suitable for the study. Each listener provided informed consent following procedures approved by the University of South Florida Institutional Review Board (IRB Pro0012381).
Comparison stimuli
Seven listeners evaluated perceived breathiness, roughness, and strain of the voice stimuli via separate one-dimensional SVMT with quality-specific independent variables of comparison stimuli. The comparison stimulus consisted of a sawtooth waveform mixed with noise as in the previous studies.28,29 The sawtooth waveform and noise were low-pass filtered to have a spectral slope of −12 dB/octave for the comparison stimuli for breathiness and roughness, and a spectral slope of −18 dB/octave for the comparison stimuli for strain. The steeper spectral slope for the strain comparison stimuli was to include a wider range of spectral slopes of the comparison stimuli as the independent variable for strain voice quality is related to the spectral slope. The sawtooth-plus-noise had a fundamental frequency of 151 Hz and was set to a sawtooth-to-noise ratio of 20 dB to achieve a quality similar to natural voices.
The independent variable of the comparison stimulus for each voice quality is listed in Table 1. The independent variable for breathiness matching was the signal-to-noise ratio (SNR) adjusted by changing the amount of noise in the signal.28 The independent variable for roughness matching was amplitude modulation depth (dAM).29 Amplitude modulation was applied to the comparison stimulus using a sinusoidal function of the fourth power with a modulation frequency (fAM) of 25 Hz: $[1 + m \sin(2\pi f_{AM} t)]^4$. The modulation depth is expressed in dB as $20 \log_{10}(m)$. The independent variable for strain matching was the bandpass filter gain varied to result in equal steps of spectral sharpness (for details on the applied bandpass filter, see Park, et al.30). Spectral sharpness was calculated using a sharpness model of Fastl and Zwicker.40 Changes in these independent variables have been shown to produce a range of perceived quality wide enough to exceed the range needed to match adult dysphonic voices.28–30 Comparison stimuli for breathiness and roughness were RMS-normalized to yield an equal sound level across the SNR and dAM ranges. RMS normalization did not result in equal loudness across the filter gain range of the comparison stimuli for strain due to the increase in high-frequency spectral energy with bandpass filtering. Instead, the comparison stimuli for strain were normalized to a single loudness level (phon) using a loudness model.38
Table 1.
The independent variables of the comparison stimuli for each voice quality
| Voice Quality | Base Stimulus | Independent Variable | Step | Initial Values |
|---|---|---|---|---|
| Breathiness | Sawtooth + noise: fo = 151 Hz, slope = −12 dB/octave, SNR = 20 dB | SNR | 2 dB | Low = 0 dB, High = 30 dB |
| Roughness | Sawtooth + noise: fo = 151 Hz, slope = −12 dB/octave, SNR = 20 dB | Amplitude modulation depth (dAM) | 2 dB | Low = −30 dB, High = −5 dB |
| Strain | Sawtooth + noise: fo = 151 Hz, slope = −18 dB/octave, SNR = 20 dB | Bandpass filter gain | 0.09 acum (sharpness) | Low = 0.55 acum, High = 1.81 acum |
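As an illustration of the roughness comparison stimulus described above, the following Python sketch generates a sawtooth-plus-noise signal at a given SNR, applies the fourth-power sinusoidal modulator, and RMS-normalizes the result. It is a minimal sketch under stated assumptions and not the MATLAB implementation used in the study: the sampling rate, the function names, the white-noise source, and the target RMS are all hypothetical, and the low-pass filtering that sets the −12 dB/octave spectral slope is omitted.

```python
import math
import random

FS = 44100      # sampling rate in Hz (assumed; not stated in the text)
F0 = 151.0      # sawtooth fundamental frequency in Hz (from the text)
F_AM = 25.0     # modulation frequency in Hz (from the text)


def rms(x):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def depth_db_to_m(depth_db):
    """Invert depth_dB = 20 * log10(m) to recover the modulation index m."""
    return 10.0 ** (depth_db / 20.0)


def modulator(t, m, f_am=F_AM):
    """Fourth-power sinusoidal modulator [1 + m * sin(2*pi*f_AM*t)]^4."""
    return (1.0 + m * math.sin(2.0 * math.pi * f_am * t)) ** 4


def sawtooth_plus_noise(duration_s, snr_db, seed=0):
    """Naive sawtooth and white noise, with the noise scaled to snr_db."""
    rng = random.Random(seed)
    n = int(duration_s * FS)
    saw = [2.0 * ((F0 * i / FS) % 1.0) - 1.0 for i in range(n)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(n)]
    gain = rms(saw) / (rms(noise) * 10.0 ** (snr_db / 20.0))
    return saw, [gain * v for v in noise]


def roughness_comparison(duration_s, depth_db, snr_db=20.0, target_rms=0.1):
    """Modulated sawtooth-plus-noise, RMS-normalized to target_rms (assumed)."""
    m = depth_db_to_m(depth_db)
    saw, noise = sawtooth_plus_noise(duration_s, snr_db)
    modulated = [(s + z) * modulator(i / FS, m)
                 for i, (s, z) in enumerate(zip(saw, noise))]
    scale = target_rms / rms(modulated)
    return [scale * v for v in modulated]
```

For example, a modulation depth of −20 dB corresponds to m = 0.1, so the modulator swings between 0.9^4 ≈ 0.66 and 1.1^4 ≈ 1.46 over each 40-ms modulation cycle.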
Single-variable matching tasks
In each trial of the matching task, a voice stimulus was presented, followed by a 500-ms silent interval, and then the comparison stimulus (synthetic waveform described above). Stimulus presentation, the subject interface, and response collection were controlled by the SykofizX software application (Tucker-Davis Technologies, Inc., Alachua, FL). The desired digital acoustic stimuli were converted to analog signals via a TDT RZ6 multi-processor, routed to a TDT HB7 headphone buffer, and delivered to the listeners through a pair of Sennheiser HD-280 headphones at a level of 75 dB SPL (an average level of 75 dB SPL for the phon-scaled strain comparison stimuli).
The subject interface included a graphical user interface (GUI) that was displayed on a computer monitor in front of the listeners concurrent with the sound presentation. The GUI presented three buttons labeled “Increase Quality”, “Decrease Quality” and “Equal Quality,” where the text “Quality” was replaced with “breathiness” for the breathiness task, “fluctuation” for the roughness task, and “effort” for the strain task. If listeners perceived that the specific voice quality of the comparison stimulus being measured was lower than the perceived quality of the voice stimulus, then they were instructed to select the “Increase Quality” button with a mouse click. This resulted in manipulating the independent variable in the next trial to increase the quality of the comparison stimulus. When they perceived that the quality of the comparison stimulus was greater than the perceived quality of the voice stimulus, then they were instructed to select the “Decrease Quality” button. This decreased the quality of the comparison stimulus in subsequent trials. When they reached the point of a subjective perceptual match, they selected the “Equal Quality” button.
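The trial-by-trial adjustment described above can be sketched as a simple loop. This is an illustrative simulation only, not the SykofizX procedure itself: the function names, the simulated listener, and its tolerance are hypothetical. The example uses the roughness task, where "increase" raises the modulation depth; for breathiness, increasing the quality would instead lower the SNR.

```python
def make_simulated_listener(target, tolerance):
    """Return a judge that compares a comparison value with an internal target.

    Both `target` and `tolerance` are hypothetical stand-ins for a listener's
    percept; real listeners respond to sound, not to numbers.
    """
    def judge(value):
        if abs(value - target) <= tolerance:
            return "equal"
        return "increase" if value < target else "decrease"
    return judge


def run_match(initial_value, step, judge, max_trials=100):
    """Step the independent variable until the judge reports a match."""
    value = initial_value
    for _ in range(max_trials):
        response = judge(value)
        if response == "equal":
            return value
        # "increase" raises the variable by one step, "decrease" lowers it.
        value = value + step if response == "increase" else value - step
    return value  # no match within max_trials; return the last value tried


# Roughness example with the 2-dB step and the initial values from Table 1.
listener = make_simulated_listener(target=-15.0, tolerance=1.0)
match_from_low = run_match(-30.0, 2.0, listener)
match_from_high = run_match(-5.0, 2.0, listener)
```

With this simulated listener, the run starting at −30 dB stops at −16 dB and the run starting at −5 dB stops at −15 dB, showing how the starting direction alone can shift the final match by a step.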
Prior to the matching task of each session, the listeners were given definitions of the breathiness, roughness, and strain percepts based on the definitions associated with the CAPE-V assessment tool.19 They also were presented with example sounds and were instructed to focus on the specific voice quality measured in each session and ignore other voice qualities as well as pitch, loudness, and vowel identities. Listeners were trained with the SVMT procedure described above using two natural voice samples that were not included in the selected stimuli. For each voice quality dimension, the samples were chosen to be near the two ends of the dysphonia severity continuum. The experimenters provided feedback when the participant did not appropriately match the comparison stimulus to the expected matching values of the practice voice samples. The feedback included guidance on the direction in which the participant could adjust the comparison sound, and the amount of feedback depended on each participant’s performance on the practice samples.
To account for hysteresis in judgments during the adaptive matching task, for each voice stimulus, final matches were based on the average of three matches for which the initial independent variable was near maximum and three matches for which the initial value was near minimum, as described by Patel, et al.29 Table 1 lists the initial independent variable values used for each condition and the step size of the independent variable for each voice quality matching task. Within an experimental block, the listeners completed matching four different voice stimuli in two high initial conditions and two low initial conditions. Each block took approximately 10 to 15 minutes to complete. At the end of each block, participants were given a short break before starting the next block. The listeners completed an average of seven blocks in one session and completed matching for all voice stimuli over three to five sessions for each voice quality. After completing one voice quality matching task, the listeners proceeded to another voice quality matching task in the next session. The order of voice stimuli and the order of voice quality were pseudorandomized across participants. No session was longer than two hours of listening time.
The matching value obtained for each voice quality corresponds to the value of the independent variable of the comparison stimulus matched by listeners as being equal to the perceived voice quality of the voice stimulus. Matching values of six repetitions of each stimulus from low and high initial conditions were averaged within a listener. The matching values among the seven listeners were averaged to represent the average matching values for each natural voice stimulus by each voice quality.
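The two-stage averaging described above (six matches within a listener, then across listeners) can be written compactly. The sketch below assumes a hypothetical data layout in which each listener contributes one list of low-start matches and one list of high-start matches per stimulus; the function names are illustrative.

```python
from statistics import mean


def listener_match(low_start_matches, high_start_matches):
    """Average one listener's matches from low and high initial values."""
    return mean(list(low_start_matches) + list(high_start_matches))


def stimulus_match(per_listener_matches):
    """Average the within-listener means across listeners for one stimulus.

    `per_listener_matches` is a list of (low_start, high_start) pairs,
    one pair per listener (a hypothetical layout).
    """
    return mean(listener_match(low, high)
                for low, high in per_listener_matches)
```

For instance, a listener whose low-start matches were −16, −16, and −18 dB and whose high-start matches were −14, −15, and −14 dB would contribute a within-listener mean of −15.5 dB.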
Computational measures from auditory models
Computational measures of the voice stimuli were obtained using MATLAB (The MathWorks, Inc., Natick, MA). Pitch strength, associated with perceived breathiness,35 was estimated from a sawtooth waveform-inspired pitch estimator with an auditory front end.41 The envelope fluctuation measure from the amplitude modulation filterbank output, associated with perceived roughness,37 was calculated with an auditory processing model of temporal envelope modulation36 as detailed in Park, et al.37 Among the modulation filters, modulation filter 7 (center frequency = 64 Hz) was selected because the standard deviation of its output (EnvSD7) showed a large and significant correlation with the perceived roughness of adult voice samples. Additionally, it has been previously observed that as the fundamental frequency of a tone increases, the modulation frequency that causes the perceived roughness of that tone also increases.40 Thus, we speculated that a modulation filter with a higher center frequency may be needed to predict the roughness of children’s voices, which typically have higher fundamental frequencies than adult voices. As a result, modulation filter 8 (center frequency = 107 Hz) was additionally selected and the standard deviation of its output (EnvSD8) was calculated. Spectral sharpness, associated with perceived strain,42 was calculated using the sharpness model of Fastl and Zwicker.40
Statistical Analysis
Statistical analysis was performed in SPSS (Version 27.0, IBM Corp.). Intra-listener reliability of the SVMTs was calculated via two-way mixed effects intraclass correlation coefficients (ICC [2, k]) for absolute agreement and average measures. Inter-listener reliability of the SVMTs was calculated via two-way mixed effects ICC (2, k) for consistency. ICC (2, k) was used to evaluate the reliability because the mean values of seven listeners were used as the matching result, representing the stimuli rather than individual listeners’ matching values.43 One-way ANOVAs were performed to compare the effect of vibratory source on auditory-perceptual measures and computational measures. For measures showing a significant effect of vibratory source, Tukey’s post-hoc tests were performed to determine which vibratory source groups differed and which had the higher values. Effect sizes were calculated using partial eta squared (ηp²). Pearson correlation coefficients were computed among the obtained matching values to assess the relationships and interactions between auditory-perceptual measures. Pearson correlation coefficients were also computed to assess the linear relationships between auditory-perceptual and computational measures. Specifically, based on previous studies, correlations between breathiness and pitch strength,35 roughness and log-transformed envelope fluctuation measures (log10(EnvSD7) and log10(EnvSD8)),37 and strain and sharpness30,42 were investigated. A significance level of p < 0.05 was set a priori.
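For readers who wish to reproduce the reliability indices outside SPSS, average-measures ICCs can be computed from the mean squares of a two-way layout. The sketch below is an assumption about how such a computation could be done, not a description of the SPSS internals: it uses the standard mean-square formulas for the absolute-agreement and consistency forms, with rows as stimuli and columns as either repeated matches (intra-listener) or listeners (inter-listener). The function names are hypothetical.

```python
def two_way_mean_squares(ratings):
    """Mean squares for rows (stimuli), columns (raters), and residual error.

    `ratings` is a list of rows, one row per stimulus, one column per rater.
    """
    n = len(ratings)        # number of stimuli
    k = len(ratings[0])     # number of raters or repetitions
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return ms_rows, ms_cols, ms_error, n


def icc_average_absolute_agreement(ratings):
    """Average-measures ICC for absolute agreement."""
    ms_rows, ms_cols, ms_error, n = two_way_mean_squares(ratings)
    return (ms_rows - ms_error) / (ms_rows + (ms_cols - ms_error) / n)


def icc_average_consistency(ratings):
    """Average-measures ICC for consistency."""
    ms_rows, _, ms_error, _ = two_way_mean_squares(ratings)
    return (ms_rows - ms_error) / ms_rows
```

The consistency form ignores systematic offsets between raters, whereas the absolute-agreement form penalizes them, which is why the two can differ markedly for the same data.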
Results
Reliability of SVMT
Reliability measures are presented in Table 2. Intra-listener reliability of the SVMTs for all voice qualities was good, with mean intra-listener reliability (ICC [2, k], absolute agreement) of 0.85 or higher. Inter-listener reliability of breathiness (ICC [2, k], consistency = 0.88) and roughness (ICC [2, k], consistency = 0.93) matching was good, but that of strain matching was moderate (ICC [2, k], consistency = 0.69). The reliability of SVMT was similar to that observed in previous studies of SVMT (breathiness: intra-listener ICC = 0.98, inter-listener ICC = 0.95;44 roughness: intra-listener ICC = 0.87–0.99, inter-listener ICC = 0.92;37 strain: intra-listener ICC = 0.98–0.99, inter-listener ICC = 0.99;30) with the exception of the inter-listener reliability of strain matching.
Table 2.
Intra- and inter-listener reliability for each voice quality
| Voice Quality | Intra-listener ICCs (2, k), absolute agreement (Mean) | Inter-listener ICC (2, k), consistency (95% CI) |
|---|---|---|
| Breathiness | 0.80–0.92 (0.87) | 0.88 (0.82–0.93) |
| Roughness | 0.88–0.96 (0.92) | 0.93 (0.89–0.95) |
| Strain | 0.74–0.92 (0.85) | 0.69 (0.53–0.81) |
Auditory-perceptual measures
To demonstrate the effect of the vibratory source on auditory-perceptual measurements, the mean breathiness, roughness, and strain matching values are plotted as a function of vibratory source in Figure 1. The results showed a general trend that the SGVS group had the most severe voice quality, followed by the MVS group, as measured from the SVMTs. The one-way ANOVA of the mean roughness matching values showed a statistically significant effect of vibratory source with a large effect size (F(2, 41) = 8.543, p < 0.001, ηp² = 0.29). The post-hoc Tukey test indicated that the mean roughness matching values of the SGVS and MVS groups were significantly higher than that of the GVS group, as hypothesized (see the center panel of Figure 1). The one-way ANOVA of the mean strain matching values also showed a statistically significant effect of vibratory source with a large effect size (F(2, 41) = 4.77, p = 0.01, ηp² = 0.19). The post-hoc Tukey test revealed that the mean strain matching value of the SGVS group was significantly higher than that of the GVS group (see the right panel of Figure 1). However, the one-way ANOVA for mean breathiness matching values indicated no statistically significant differences between the vibratory source groups.
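The relationship between the reported F statistics and effect sizes can be checked directly. In a one-way design, partial eta squared equals eta squared and can be recovered from F and its degrees of freedom. The sketch below is illustrative only; the function names are hypothetical and the ANOVA helper is a plain textbook computation, not the SPSS routine.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of values."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                     for g in groups)
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


def partial_eta_squared(f_value, df_effect, df_error):
    """Effect size from F: (F * df_effect) / (F * df_effect + df_error)."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)
```

Applied to the roughness result, F(2, 41) = 8.543 gives a value of approximately 0.29, and the strain result, F(2, 41) = 4.77, gives approximately 0.19, in agreement with the effect sizes reported above.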
Figure 1.

Mean matching values of breathiness (left), roughness (center), and strain (right) as a function of vibratory source. Error bars represent the standard error of the mean.
Correlations between auditory-perceptual measures
When considered as a function of stimulus token, auditory-perceptual judgments of the three voice qualities were partly correlated with one another. The strongest correlation was between mean roughness and strain matching values (r = 0.64, p < 0.001). Mean breathiness and strain matching values were significantly but weakly correlated (r = 0.37, p < 0.01). There was no significant correlation between mean breathiness and roughness matching values (r = 0.16, p = 0.31). The correlational analysis of auditory-perceptual measures may be indicative of a stronger interaction between perceived strain and other co-varying voice qualities than between perceived breathiness and roughness in the samples chosen for this study.
Computational measures
To investigate the effect of the vibratory source on computational measures, the mean pitch strength, EnvSD8, and sharpness values are plotted as a function of vibratory source in Figure 2. The results of one-way ANOVAs revealed significant effects of vibratory source on both mean EnvSD7 and EnvSD8, with large effect sizes (EnvSD7: F(2, 41) = 5.00, p = 0.01, ηp² = 0.22; EnvSD8: F(2, 41) = 8.52, p < 0.001, ηp² = 0.29). Of these two measures, mean EnvSD8 showed the larger effect size (see Methods for a discussion of the two metrics), so it is shown in Figure 2. A post-hoc Tukey test revealed that the mean EnvSD8 of the SGVS group was significantly higher than that of the GVS group, as hypothesized (see the center panel of Figure 2). The results of the one-way ANOVA for the remaining measures showed no statistically significant differences between the vibratory source groups.
Figure 2.

Mean values of pitch strength (left), EnvSD8 (center), and computed sharpness (right) as a function of vibratory source. Error bars represent the standard error of the mean.
Correlations between auditory-perceptual and computational measures
Pearson’s correlation analysis of the relationships between auditory-perceptual and computational measures revealed statistically significant correlations. Figure 3 presents plots of breathiness, roughness, and strain matching values of all voice stimuli in all three vibratory source groups as functions of their pitch strength, log10(EnvSD8), and sharpness, respectively. The shaded areas in the plots represent the range of each voice quality observed in previous studies on adult voices for reference.30,35,37 There was a strong and highly significant correlation between pitch strength and breathiness matching values (r = 0.90, p < 0.001). Similarly, there was a strong and significant correlation between log10(EnvSD8) and roughness matching values (r = 0.68, p < 0.001; EnvSD7: r = 0.54, p < 0.001). A moderate and significant correlation between computed sharpness and strain matching values was obtained (r = 0.45, p = 0.002). To achieve our goal of evaluating the applicability of bio-inspired computational measures for pediatric voices, we compared the correlation values observed in this study with those from previous studies on adult voices. Table 3 provides a summary of the correlation values observed in our study compared to those in previous studies.
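The correlation between roughness matching values and the log-transformed envelope measure can be reproduced with a few lines of code. The sketch below is a generic Pearson computation with hypothetical function names; the input values in any usage would be the per-stimulus mean matching values and EnvSD estimates, which are not reproduced here.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


def roughness_vs_log_env(env_sd, roughness_db):
    """Correlate roughness matches (dB) with log10 of the envelope measure."""
    return pearson_r([math.log10(v) for v in env_sd], roughness_db)
```

The log transform matters because the matching values are expressed in dB, a logarithmic unit, so a linear relationship is expected with log10(EnvSD) rather than with EnvSD itself.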
Figure 3.

Top: Breathiness matching values as a function of pitch strength. Mid: Roughness matching values as a function of log10(EnvSD8). Bottom: Strain matching values as a function of computed sharpness. Shaded areas indicate the ranges observed in previous studies.30,35,37
Table 3.
Correlations between auditory-perceptual and computational measures observed in this study on pediatric voices and previous studies on adult voices
Discussion
The purpose of this study was to evaluate the impact of different vibratory sources on pediatric dysphonia. In order to address the limited understanding and measurement of voice quality in children with supraglottic vibratory sources (SGVS), we used recently developed measures, including the single-variable matching tasks (SVMTs) and bio-inspired computational measures, to assess specific dimensions of traditionally described voice quality. Our hypothesis was that children using SGVS would have higher perceived roughness resulting from their irregular phonation physiology and altered anatomy, and correspondingly higher values of amplitude modulation filterbank output compared to children using glottal vibratory source (GVS). Additionally, it was hypothesized that children with SGVS would have higher perceived breathiness with correspondingly lower pitch strength, and higher perceived strain with correspondingly greater sharpness, than children with GVS due to the use of supraglottal structures for voice production. The study also aimed to evaluate the applicability of the SVMT and computational measures to the voice quality of pediatric voices.
Effects of vibratory source on auditory-perceptual measures with SVMTs
The results of this study indicated that children with SGVS indeed had significantly higher perceived roughness and strain compared to children with GVS, as hypothesized. These results are in line with previous research that used the CAPE-V and observed increased roughness and strain in children,3 and with findings in adults with ventricular dysphonia, whose vocalizations primarily involve the use of the ventricular folds.45 Our results support the idea that the use of supraglottal structures such as the ventricular folds, arytenoids, and epiglottic petiole by children with SGVS and mixed vibratory sources (MVS) leads to irregular and asymmetrical physiology of vibrating laryngeal structures, which is predominantly perceived as increased vocal roughness. Use of the supraglottal structures may also require more vocal effort to produce voice than use of the true vocal folds, resulting in increased perceived strain.
Similarly, children with MVS also exhibited significantly higher perceived roughness compared to children with GVS. However, their perceived strain was not significantly different from that of either the GVS or the SGVS group. The general trend of the SVMT results indicated that children with MVS, who use both true vocal folds and supraglottal structures, tend to have voice quality that is less severely impacted than those who use SGVS alone. This suggests that a child’s ability to engage the true vocal folds after airway reconstruction surgery improves their perceived voice quality and vocal function. Moreover, techniques to ease strain during mixed voice production can be used during post-surgical voice therapy. Such techniques can include semi-occluded vocal tract (SOVT) exercises to reduce tension in the laryngeal area while encouraging engagement of the true vocal folds for sustained phonation if possible. If not SOVT specifically, similar techniques can be useful to reduce excess effort and strain during continuous voicing.46
Unlike roughness and strain matching values, breathiness matching values did not differ significantly among the vibratory source groups, contrary to our hypothesis. Breathiness is caused by the turbulence of air passing through a glottis that is not fully closed during phonation.15,19 We hypothesized that children with SGVS would have higher perceived breathiness, as some children with SGVS may have damage to their airway that leads to an increased gap, making it difficult for the supraglottic structures to fully adduct during vocalization. However, our study showed that children with SGVS had ranges of breathiness matching values similar to those of children with GVS. The higher strain observed in children with SGVS as compared to GVS indicates the effort involved in adduction and engagement of the ventricular folds and associated supraglottal structures for sustained voicing, which would limit air escape and result in perceived breathiness similar to that of children with GVS. Children with GVS who have voice disorders can also have increased breathiness due to glottal insufficiency caused by benign superficial vocal fold lesions. The size and severity of the lesion and resulting voice disorder can determine whether they exhibit breathiness similar to what is found in children with SGVS. In fact, the child with the highest breathiness score was in the GVS group, as shown in the top panel of Figure 3. Some children with SGVS and MVS showed lower perceived breathiness compared to those with GVS, indicating the possible use of these secondary vibratory sources to achieve an adequate level of airway closure for voice production. Thus, perceived breathiness alone may not accurately indicate SGVS voices.
The manner and degree to which a child who is post-airway reconstruction (or similar anatomic difference) can uniquely approximate, close, or even squeeze their supraglottic structures for sustained voicing make it difficult to draw definitive conclusions about how the broad percept of breathiness is associated with the voicing source.
The results of this study also support the potential use of SVMTs in pediatric voice assessment. The same comparison sounds used in previous studies evaluating adult voice quality28–30 were used in this study. High intra- and inter-rater reliability, similar to that in previous studies on adult voices,30,37,44 suggests that these tasks using newly developed comparison sounds can be used reliably for pediatric voice evaluation. Additionally, the results showed that breathiness and roughness matching values were not correlated, indicating that these qualities were captured independently of each other, which was also observed in a previous investigation using synthetic samples with covarying breathiness and roughness.47 However, strain matching values were correlated with breathiness and roughness matching values, and further investigation is needed to understand potential interactions between strain matching results and other covarying voice qualities. It is also worth noting that the SVMT method used in this study was primarily designed for laboratory measurements of voice quality, and it can take several minutes to achieve a perceptual match for a single voice sample. As real-time or rapid evaluation of voice quality is typically required in clinical settings to facilitate the diagnosis and treatment of voice disorders, modifications to the current method will be necessary to enable reliable and efficient voice quality assessment within a considerably shorter time frame. Additionally, integrating dimension-specific magnitude estimates with SVMT data can lead to the development of formal scales of voice quality with ratio-level data and corresponding physical units.48 In this method, a dysphonic voice will be evaluated against 1 voice quality unit while allowing the clinician to listen to multiple synthetic anchors and make a judgment.
Effects of vibratory source on computational measures
Among the bio-inspired computational measures used in this study, only the outputs of the amplitude modulation filterbank, EnvSD7 and EnvSD8, were significantly higher in children with SGVS compared to those with GVS. EnvSD7 and EnvSD8 estimate the representation in the auditory midbrain of amplitude modulation of signals at modulation frequencies around 64 and 107 Hz, respectively. Acoustic stimuli with amplitude modulation at relatively low modulation frequencies have been associated with perceived roughness of sound,40,49 and EnvSD7 was strongly correlated with perceived roughness of adult voices in a previous study.37 The higher EnvSD7 and EnvSD8 values in children with SGVS may reflect the increased vocal roughness observed in this group in the SVMTs. Acoustic assessment of voice quality from different vibratory sources has been challenging due to possible extreme aperiodicity, limiting the use of many existing acoustic measures.6,50 Measures from the amplitude modulation filterbank, however, can be obtained regardless of the periodicity of the signals, supporting their potential for the assessment and management of children’s voices, especially those with SGVS and MVS.
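To make the idea of an envelope-based measure concrete, the sketch below extracts a crude temporal envelope and measures how strongly it fluctuates at a single modulation frequency. This is a heavily simplified stand-in and not the auditory model used in this study: it has no peripheral filterbank or adaptation stage, and it projects the envelope onto one frequency rather than passing it through a modulation bandpass filter. All function names, the 150-Hz smoothing cutoff, and the synthetic amplitude-modulated tone are assumptions made for illustration only.

```python
import math

def am_tone(fs, dur, f_carrier, f_mod, depth):
    """Synthetic test signal: sinusoidal carrier with sinusoidal amplitude modulation."""
    n = int(fs * dur)
    return [(1.0 + depth * math.sin(2 * math.pi * f_mod * i / fs))
            * math.sin(2 * math.pi * f_carrier * i / fs) for i in range(n)]

def envelope(signal, fs, cutoff_hz=150.0):
    """Half-wave rectify, then smooth with a one-pole low-pass filter."""
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / fs)
    env, state = [], 0.0
    for s in signal:
        state += alpha * (max(s, 0.0) - state)
        env.append(state)
    return env

def env_fluctuation(env, fs, f_mod):
    """RMS of the envelope component at f_mod, relative to the envelope mean."""
    n = len(env)
    mean = sum(env) / n
    re = sum((e - mean) * math.cos(2 * math.pi * f_mod * i / fs)
             for i, e in enumerate(env))
    im = sum((e - mean) * math.sin(2 * math.pi * f_mod * i / fs)
             for i, e in enumerate(env))
    amplitude = 2.0 * math.hypot(re, im) / n  # peak amplitude at f_mod
    return (amplitude / math.sqrt(2.0)) / mean

fs = 8000  # Hz, assumed sampling rate for the synthetic 1000-ms signals
deep = envelope(am_tone(fs, 1.0, 1000.0, 107.0, 0.8), fs)
shallow = envelope(am_tone(fs, 1.0, 1000.0, 107.0, 0.1), fs)
print(env_fluctuation(deep, fs, 107.0), env_fluctuation(shallow, fs, 107.0))
print(env_fluctuation(deep, fs, 64.0))
```

In this toy case, the deeply modulated tone yields a larger value at 107 Hz than the shallowly modulated one, and a much smaller value at 64 Hz, where it carries no modulation. Because the computation never requires a pitch estimate, it can be applied to aperiodic signals, which is the property highlighted in the paragraph above.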
Relationships between auditory-perceptual and computational measures
The EnvSD7 and EnvSD8 metrics obtained from the amplitude modulation filterbank were also strongly and significantly correlated with perceived roughness assessed by SVMT, unlike the other measures used in this study. This supports the idea that amplitude modulation filterbank outputs may be effective in predicting vocal roughness in pediatric voices. One difference between this study and the previous study on adult vocal roughness and the amplitude modulation filterbank37 is the modulation filter that yielded the highest correlation with perceived roughness. In this study on pediatric voices, EnvSD8, obtained from modulation filter 8 with a center frequency of 107 Hz, showed greater statistical differences between vibratory source groups and a greater correlation with perceived roughness than EnvSD7 from modulation filter 7 (center frequency = 64 Hz), which strongly predicted roughness in adult voices.37 These results indicate that the overall modulation frequency associated with perceived roughness in pediatric voices may be higher than in adult voices. Pediatric voices generally have a higher fundamental frequency than adult voices, and previous studies of general sound roughness have observed that high-frequency tones require a higher modulation frequency to cause maximum perceived roughness compared to low-frequency tones.40,49 This is consistent with the greater correlation observed in this study between perceived roughness of pediatric voices and the output from an amplitude modulation filter with higher modulation frequencies. Therefore, for pediatric voice assessment, amplitude modulation filterbank output from higher modulation filters may be more suitable than that used for adult voice assessment.
Computed pitch strength, which represents the pitch salience of the signal, was highly correlated with perceived breathiness, as hypothesized and previously observed in studies of adult voices.35 Among the computational measures used in this study, pitch strength showed the strongest correlation (r = 0.91) with the perceived voice quality associated with each measure. While the pitch strength did not differ significantly across vibratory sources as shown in Figure 2, it was strongly correlated with perceived breathiness across all groups and showed a wide range (Figure 3). This relationship between pitch strength and breathiness has been attributed to the increased aperiodicity in breathy voices caused by turbulent noises.15 Pitch strength in this study did not correlate with the perceived roughness in pediatric voices (r = 0.11), similar to previous observations in studies of adult rough voices.37 Although aperiodicity has been shown to have some relationship with roughness,19,51–53 the aperiodicity estimated by pitch strength may not be sufficient to predict perceived roughness.
Sharpness was significantly but moderately correlated with perceived strain, as evaluated using the SVMT. Previous studies have investigated the relationship between sharpness and perceived strain in voice signals,30,42,54 as increased vocal effort has been observed to be linked with higher mid-to-high frequency energy.55,56 A strong correlation between computed sharpness and perceived strain has been reported for adult voices.30,42 The moderate correlation observed in this study, unlike previous studies on adult voices, may be attributed to the limited range of perceived strain in our samples. Although our voice samples were chosen to have a wide range of dysphonia severities, they may not have had a sufficient range of perceived strain. The range of strain matching values observed in this study (0.84 to 1.54 acum) was slightly reduced from those observed for adult voices selected to have a wide range of perceived strain (0.72 to 1.72 acum)30 as shown in Figure 3. An insufficient range of samples can result in lower correlation results. Future studies can evaluate the relationship between computed sharpness and perceived strain in pediatric voices using samples with a wide range of strained quality.
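The effect of a restricted range on a correlation coefficient can be illustrated with a small simulation. The sketch below is purely hypothetical: it draws synthetic pairs with an assumed underlying correlation of 0.8 and then recomputes the correlation after keeping only pairs whose x value falls in a narrow band. None of the values are data from this study, and the names `simulate` and `restrict` are illustrative.

```python
import math
import random

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def simulate(n=2000, rho=0.8, seed=0):
    """Draw n synthetic (x, y) pairs with population correlation rho."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = rho * x + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        xs.append(x)
        ys.append(y)
    return xs, ys

def restrict(xs, ys, lo, hi):
    """Keep only the pairs whose x value lies within [lo, hi]."""
    kept = [(x, y) for x, y in zip(xs, ys) if lo <= x <= hi]
    return [p[0] for p in kept], [p[1] for p in kept]

xs, ys = simulate()
r_full = pearson_r(xs, ys)
rx, ry = restrict(xs, ys, -0.5, 0.5)  # hypothetical narrow range
r_restricted = pearson_r(rx, ry)
print(round(r_full, 2), round(r_restricted, 2))
```

The restricted sample yields a noticeably lower coefficient than the full sample even though the underlying relationship is unchanged, which is the mechanism proposed above for the moderate sharpness and strain correlation.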
Another possible reason for the lower correlation between perceived strain and sharpness observed in our voice samples compared to previous studies could be the influence of other covarying voice qualities. A strain-matching study has previously used primarily strained voice samples with minimal other voice qualities, but this study included a variety of voice qualities. The correlations between strain matching values and roughness and breathiness matching values were significant. In particular, perceived roughness, which was higher in SGVS and MVS samples than in GVS samples, may have increased the perception of strain. In Figure 3, most samples with higher strain matching values than their computational estimates of strain (the calculated sharpness) were SGVS samples (yellow symbols). The influence of other covarying voice qualities on strain matching may have resulted in the lowest inter-rater reliability of strain among the three voice qualities in a one-dimensional, single-variable matching task, in which listeners match each quality individually. This effect may be reduced if multiple qualities are matched together simultaneously in a two- or three-variable matching task.
Limitations
This study was limited to the use of sustained vowel recordings as the voice samples. While these vowel samples reflect the voice characteristics of children, they may lack ecological validity. Voice evaluation in clinical settings includes both sustained vowels and continuous sentences.57 However, the measurement techniques used in this study, SVMT and bio-inspired computational measures, have not yet been validated for use with sentence recordings. Future research can explore the application of these techniques to sentence recordings, both in adult and pediatric speech samples. Despite this limitation, sustained vowel recordings still offer valuable information regarding the voice characteristics of children and were used effectively in this study to reveal the effects of different vibratory sources.
Conclusion
Among the SVMT results, the impact of vibratory source was greatest on perceived roughness; among the bio-inspired computational measures, it was greatest on the amplitude modulation filterbank outputs. Our results indicated that roughness is the most prominent quality distinguishing the vibratory sources in children. The use of amplitude modulation output measures related to perceived roughness can improve the objective assessment of pediatric voice quality in children with various vibratory sources. This study also supported the reliability and potential of SVMT and bio-inspired computational measures, which were developed for adult voice quality evaluation, when applied to pediatric voice quality evaluation. The computational measures exhibited significant correlations with their associated perceived quality in pediatric voices. Future research can explore the development of a matching task that simultaneously measures all three voice qualities to overcome the limitations of single-variable matching tasks, and can examine the relationship between strain and sharpness, which was observed to be moderate in this study, in a broader range of strained pediatric voice samples.
Supplementary Material
Acknowledgment
This work was supported by NIH R01 DC018008 (DAE, ADA, & RS). The authors would like to thank Anthony Consuegra and Sophia M. Gifford for help with recruitment and data collection.
Footnotes
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
Declarations of interest:
none
Amplitude fluctuation at low modulation frequencies has been associated with the perception of roughness.40 Previous studies on roughness matching tasks29,37,49 commonly employed the term “fluctuation” to describe this perceptual aspect of roughness for listeners. Consistent with the existing literature, the term “fluctuation” was displayed in the GUI used for the roughness matching task. However, it is important to note that participants were also provided with an explanation of “roughness” in terms of “fluctuation” during the task to ensure their understanding of the concept.
References
- 1. Carding PN, Roulstone S, Northstone K, Team AS. The prevalence of childhood dysphonia: a cross-sectional study. Journal of Voice. Dec 2006;20(4):623–30. doi: 10.1016/j.jvoice.2005.07.004
- 2. Johnson CM, Anderson DC, Brigger MT. Pediatric dysphonia: A cross-sectional survey of subspecialty and primary care clinics. Journal of Voice. Mar 2020;34(2):301.e1–301.e5. doi: 10.1016/j.jvoice.2018.08.017
- 3. Krival K, Kelchner LN, Weinrich B, et al. Vibratory source, vocal quality and fundamental frequency following pediatric laryngotracheal reconstruction. International Journal of Pediatric Otorhinolaryngology. Aug 2007;71(8):1261–9. doi: 10.1016/j.ijporl.2007.04.018
- 4. Kelchner LN, Baker Brehm S, Weinrich B, et al. Perceptual evaluation of severe pediatric voice disorders: rater reliability using the consensus auditory perceptual evaluation of voice. Journal of Voice. Jul 2010;24(4):441–9. doi: 10.1016/j.jvoice.2008.09.004
- 5. Weinrich B, Baker S, Kelchner L, et al. Examination of aerodynamic measures and strain by vibratory source. Otolaryngology-Head and Neck Surgery. Mar 2007;136(3):455–8. doi: 10.1016/j.otohns.2006.11.052
- 6. Baker S, Kelchner L, Weinrich B, et al. Pediatric laryngotracheal stenosis and airway reconstruction: a review of voice outcomes, assessment, and treatment issues. Journal of Voice. Dec 2006;20(4):631–41. doi: 10.1016/j.jvoice.2005.08.012
- 7. Clary RA, Pengilly A, Bailey M, et al. Analysis of voice outcomes in pediatric patients following surgical procedures for laryngotracheal stenosis. Archives of Otolaryngology-Head & Neck Surgery. Nov 1996;122(11):1189–94. doi: 10.1001/archotol.1996.01890230035008
- 8. Smith ME, Marsh JH, Cotton RT, Myer CM 3rd. Voice problems after pediatric laryngotracheal reconstruction: videolaryngostroboscopic, acoustic, and perceptual assessment. International Journal of Pediatric Otorhinolaryngology. Jan 1993;25(1–3):173–81. doi: 10.1016/0165-5876(93)90051-4
- 9. Kelchner LN, Weinrich B, Baker Brehm S, Tabangin ME, de Alarcon A. Characterization of supraglottic phonation in children after airway reconstruction. Annals of Otology, Rhinology & Laryngology. Jun 2010;119(6):383–90. doi: 10.1177/000348941011900604
- 10. Nuss RC, Ward J, Huang L, Volk M, Woodnorth GH. Correlation of vocal fold nodule size in children and perceptual assessment of voice quality. Annals of Otology, Rhinology & Laryngology. Oct 2010;119(10):651–5. doi: 10.1177/000348941011901001
- 11. Shah RK, Woodnorth GH, Glynn A, Nuss RC. Pediatric vocal nodules: correlation with perceptual voice analysis. International Journal of Pediatric Otorhinolaryngology. Jul 2005;69(7):903–9. doi: 10.1016/j.ijporl.2005.01.029
- 12. Gramuglia AC, Tavares EL, Rodrigues SA, Martins RH. Perceptual and acoustic parameters of vocal nodules in children. International Journal of Pediatric Otorhinolaryngology. Feb 2014;78(2):312–6. doi: 10.1016/j.ijporl.2013.11.032
- 13. Campisi P, Tewfik TL, Pelland-Blais E, Husein M, Sadeghi N. MultiDimensional Voice Program analysis in children with vocal cord nodules. The Journal of Otolaryngology. Oct 2000;29(5):302–8.
- 14. Maryn Y, Roy N, De Bodt M, Van Cauwenberge P, Corthals P. Acoustic measurement of overall voice quality: a meta-analysis. The Journal of the Acoustical Society of America. Nov 2009;126(5):2619–34. doi: 10.1121/1.3224706
- 15. Hillenbrand J, Cleveland RA, Erickson RL. Acoustic correlates of breathy vocal quality. Journal of Speech and Hearing Research. Aug 1994;37(4):769–78. doi: 10.1044/jshr.3704.769
- 16. Heman-Ackah YD, Heuer RJ, Michael DD, et al. Cepstral peak prominence: a more reliable measure of dysphonia. Annals of Otology, Rhinology & Laryngology. Apr 2003;112(4):324–33. doi: 10.1177/000348940311200406
- 17. Esen Aydinli F, Ozcebe E, Incebay O. Use of cepstral analysis for differentiating dysphonic from normal voices in children. International Journal of Pediatric Otorhinolaryngology. Jan 2019;116:107–113. doi: 10.1016/j.ijporl.2018.10.029
- 18. Szklanny K, Gubrynowicz R, Ratynska J, Chojnacka-Wadolowska D. Electroglottographic and acoustic analysis of voice in children with vocal nodules. International Journal of Pediatric Otorhinolaryngology. Jul 2019;122:82–88. doi: 10.1016/j.ijporl.2019.03.030
- 19. Kempster GB, Gerratt BR, Verdolini Abbott K, Barkmeier-Kraemer J, Hillman RE. Consensus auditory-perceptual evaluation of voice: development of a standardized clinical protocol. American Journal of Speech-Language Pathology. May 2009;18(2):124–32. doi: 10.1044/1058-0360(2008/08-0017)
- 20. Johnson K, Brehm SB, Weinrich B, Meinzen-Derr J, de Alarcon A. Comparison of the Pediatric Voice Handicap Index with perceptual voice analysis in pediatric patients with vocal fold lesions. Archives of Otolaryngology-Head & Neck Surgery. Dec 2011;137(12):1258–62. doi: 10.1001/archoto.2011.193
- 21. Shrivastav R, Sapienza CM, Nandur V. Application of psychometric theory to the measurement of voice quality using rating scales. Journal of Speech, Language, and Hearing Research. Apr 2005;48(2):323–35. doi: 10.1044/1092-4388(2005/022)
- 22. Nagle KF. Clinical use of the CAPE-V scales: agreement, reliability and notes on voice quality. Journal of Voice. Dec 19, in press. doi: 10.1016/j.jvoice.2022.11.014
- 23. Kreiman J, Gerratt BR. Validity of rating scale measures of voice quality. The Journal of the Acoustical Society of America. Sep 1998;104(3 Pt 1):1598–608. doi: 10.1121/1.424372
- 24. Gerratt BR, Kreiman J. Measuring vocal quality with speech synthesis. The Journal of the Acoustical Society of America. Nov 2001;110(5 Pt 1):2560–6. doi: 10.1121/1.1409969
- 25. Kreiman J, Gerratt BR. The perceptual structure of pathologic voice quality. The Journal of the Acoustical Society of America. Sep 1996;100(3):1787–95. doi: 10.1121/1.416074
- 26. Patel S, Shrivastav R, Eddins DA. Perceptual distances of breathy voice quality: a comparison of psychophysical methods. Journal of Voice. Mar 2010;24(2):168–77. doi: 10.1016/j.jvoice.2008.08.002
- 27. Kreiman J, Gerratt BR. Perception of aperiodicity in pathological voice. The Journal of the Acoustical Society of America. Apr 2005;117(4 Pt 1):2201–11. doi: 10.1121/1.1858351
- 28. Patel S, Shrivastav R, Eddins DA. Developing a single comparison stimulus for matching breathy voice quality. Journal of Speech, Language, and Hearing Research. Apr 2012;55(2):639–47. doi: 10.1044/1092-4388(2011/10-0337)
- 29. Patel S, Shrivastav R, Eddins DA. Identifying a comparison for matching rough voice quality. Journal of Speech, Language, and Hearing Research. Oct 2012;55(5):1407–22. doi: 10.1044/1092-4388(2012/11-0160)
- 30. Park Y, Anand S, Gifford SM, Shrivastav R, Eddins DA. Development and Validation of a Single-Variable Comparison Stimulus for Matching Strained Voice Quality Using a Psychoacoustic Framework. Journal of Speech, Language, and Hearing Research. Jan 12 2023;66(1):16–29. doi: 10.1044/2022_JSLHR-22-00280
- 31. Bhuta T, Patrick L, Garnett JD. Perceptual evaluation of voice quality and its correlation with acoustic measurements. Journal of Voice. Sep 2004;18(3):299–304. doi: 10.1016/j.jvoice.2003.12.004
- 32. Fujiki RB, Thibeault SL. The relationship between auditory-perceptual rating scales and objective voice measures in children with voice disorders. American Journal of Speech-Language Pathology. Jan 27 2021;30(1):228–238. doi: 10.1044/2020_AJSLP-20-00188
- 33. Shrivastav R, Camacho A. A computational model to predict changes in breathiness resulting from variations in aspiration noise level. Journal of Voice. Jul 2010;24(4):395–405. doi: 10.1016/j.jvoice.2008.12.001
- 34. Shrivastav R. The use of an auditory model in predicting perceptual ratings of breathy voice quality. Journal of Voice. Dec 2003;17(4):502–12. doi: 10.1067/s0892-1997(03)00077-8
- 35. Eddins DA, Anand S, Camacho A, Shrivastav R. Modeling of breathy voice quality using pitch-strength estimates. Journal of Voice. Nov 2016;30(6):774.e1–774.e7. doi: 10.1016/j.jvoice.2015.11.016
- 36. Dau T, Kollmeier B, Kohlrausch A. Modeling auditory processing of amplitude modulation. I. Detection and masking with narrow-band carriers. The Journal of the Acoustical Society of America. Nov 1997;102(5 Pt 1):2892–905. doi: 10.1121/1.420344
- 37. Park Y, Anand S, Ozmeral EJ, Shrivastav R, Eddins DA. Predicting perceived vocal roughness using a bio-inspired computational model of auditory temporal envelope processing. Journal of Speech, Language, and Hearing Research. Aug 17 2022;65(8):2748–2758. doi: 10.1044/2022_JSLHR-22-00101
- 38. Moore BCJ, Glasberg BR. A revision of Zwicker’s loudness model. Acustica. Mar-Apr 1996;82(2):335–345.
- 39. ANSI S3.21–2010 Methods for manual pure-tone threshold audiometry (American National Standards Institute) (2010).
- 40. Fastl H, Zwicker E. Psychoacoustics: Facts and Models. 3rd ed. Springer Series in Information Sciences. Springer; 2007.
- 41. Camacho A. On the use of auditory models’ elements to enhance a sawtooth waveform inspired pitch estimator on telephone-quality signals. 2012:1107–1112.
- 42. Anand S, Kopf LM, Shrivastav R, Eddins DA. Objective indices of perceived vocal strain. Journal of Voice. Nov 2019;33(6):838–845. doi: 10.1016/j.jvoice.2018.06.005
- 43. Shrout PE, Fleiss JL. Intraclass correlations: uses in assessing rater reliability. Psychological Bulletin. Mar 1979;86(2):420–8. doi: 10.1037//0033-2909.86.2.420
- 44. Anand S, Skowronski MD, Shrivastav R, Eddins DA. Perceptual and quantitative assessment of dysphonia across vowel categories. Journal of Voice. Jul 2019;33(4):473–481. doi: 10.1016/j.jvoice.2017.12.018
- 45. Maryn Y, De Bodt MS, Van Cauwenberge P. Ventricular dysphonia: clinical aspects and therapeutic options. Laryngoscope. May 2003;113(5):859–66. doi: 10.1097/00005537-200305000-00016
- 46. Kelchner L, Baker Brehm S, Weinrich B. Pediatric Voice: A Modern, Collaborative Approach to Care. Plural Publishing; 2014.
- 47. Park Y, Anand S, Kopf LM, Shrivastav R, Eddins DA. Interactions between breathy and rough voice qualities and their contributions to overall dysphonia severity. Journal of Speech, Language, and Hearing Research. 2022;65(11):4071–4084. doi: 10.1044/2022_JSLHR-22-00012
- 48. Eddins DA, Anand S, Lang A, Shrivastav R. Developing clinically relevant scales of breathy and rough voice quality. Journal of Voice. Jan 10 2020. doi: 10.1016/j.jvoice.2019.12.021
- 49. Eddins DA, Kopf LM, Shrivastav R. The psychophysics of roughness applied to dysphonic voice. The Journal of the Acoustical Society of America. Dec 2015;138(6):3820–5. doi: 10.1121/1.4937753
- 50. Mehta DD, Hillman RE. Voice assessment: updates on perceptual, acoustic, aerodynamic, and endoscopic imaging methods. Current Opinion in Otolaryngology & Head and Neck Surgery. Jun 2008;16(3):211–5. doi: 10.1097/MOO.0b013e3282fe96ce
- 51. Barsties v. Latoszek B, Maryn Y, Gerrits E, De Bodt M. A meta-analysis: acoustic measurement of roughness and breathiness. Journal of Speech, Language, and Hearing Research. Feb 15 2018;61(2):298–323. doi: 10.1044/2017_JSLHR-S-16-0188
- 52. de Krom G. Some spectral correlates of pathological breathy and rough voice quality for different types of vowel fragments. Journal of Speech and Hearing Research. Aug 1995;38(4):794–811. doi: 10.1044/jshr.3804.794
- 53. Heman-Ackah YD, Michael DD, Goding GS Jr. The relationship between cepstral peak prominence and selected parameters of dysphonia. Journal of Voice. Mar 2002;16(1):20–7. doi: 10.1016/s0892-1997(02)00067-x
- 54. Kopf LM, Shrivastav R, Eddins DA. Isolating the effects of strain on voice quality. Poster presented at: Pan-European Voice Conference; 2013; Prague, Czech Republic.
- 55. Lowell SY, Kelley RT, Awan SN, Colton RH, Chan NH. Spectral- and cepstral-based acoustic features of dysphonic, strained voice quality. Annals of Otology, Rhinology & Laryngology. Aug 2012;121(8):539–48. doi: 10.1177/000348941212100808
- 56. McKenna VS, Stepp CE. The relationship between acoustical and perceptual measures of vocal effort. The Journal of the Acoustical Society of America. Sep 2018;144(3):1643. doi: 10.1121/1.5055234
- 57. Patel RR, Awan SN, Barkmeier-Kraemer J, et al. Recommended Protocols for Instrumental Assessment of Voice: American Speech-Language-Hearing Association Expert Panel to Develop a Protocol for Instrumental Assessment of Vocal Function. American Journal of Speech-Language Pathology. Aug 6 2018;27(3):887–905. doi: 10.1044/2018_AJSLP-17-0009
