Abstract
Purpose:
Vocal vibrato is a singing technique that involves periodic modulation of fundamental frequency (fo) and intensity. The physiological sources of modulation within the speech mechanism and the interactions between the laryngeal source and vocal tract filter in vibrato are not fully understood. Therefore, the purpose of this study was to determine if differences in the rate and extent of fo and intensity modulation could be captured using simultaneously recorded signals from a neck-surface vibration sensor and a microphone, which represent features of the source before and after supraglottal vocal tract filtering.
Method:
Nine classically-trained singers produced sustained vowels with vibrato while simultaneous signals were recorded using a vibration sensor and a microphone. Acoustical analyses were performed to measure the rate and extent of fo and intensity modulation for each trial. Paired-samples sign tests were used to analyze differences between the rate and extent of fo and intensity modulation in the vibration sensor and microphone signals.
Results:
The rate and extent of fo modulation and the extent of intensity modulation were equivalent in the vibration sensor and microphone signals, but the rate of intensity modulation was significantly higher in the microphone signal than in the vibration sensor signal. Larger differences in the rate of intensity modulation were seen with vowels that typically have smaller differences between the first and second formant frequencies.
Conclusions:
This study demonstrated that the rate of intensity modulation at the source prior to supraglottal vocal tract filtering, as measured in neck-surface vibration sensor signals, was lower than the rate of intensity modulation after supraglottal vocal tract filtering, as measured in microphone signals. The difference in rate varied based on the vowel. These findings provide further support of the resonance-harmonics interaction in vocal vibrato. Further investigation is warranted to determine if differences in the physiological source(s) of vibrato account for inconsistent relationships between the extent of intensity modulation in neck-surface vibration sensor and microphone signals.
1. Introduction
Vocal vibrato is a singing technique that involves periodic modulation of fundamental frequency (fo) and intensity. These acoustical modulations can be described by the rate of modulation, or the number of cycles of modulation occurring per second, and the extent of modulation, or the magnitude of modulation. In classically-trained singers producing vibrato, the rate of fo modulation is 5-7 Hz, and the extent of fo modulation is 6-8%; while the rate of intensity modulation is about 5-10 Hz, and the extent of intensity modulation is 23-38% (Prame, 1994; Ramig & Shipp, 1987; Seidner, 1995; Shipp, Leanderson, & Sundberg, 1980; Sundberg, 1995). The higher rates and extents of intensity modulation compared to fo modulation may be related to the physiological sources of modulation within the speech mechanism or the interactions between the laryngeal source and vocal tract filter in vibrato.
The primary physiological source of fo modulation in vibrato is oscillation of vocal fold length (Hirano, Hibi, & Hagino, 1995; Niimi, Horiguchi, Kobayashi, & Yamada, 1988). However, there may be multiple laryngeal and vocal tract sources of intensity modulation that could include oscillation of vocal fold length, degree of vocal fold adduction, laryngeal height, or position of the mandible, tongue, velum, and pharyngeal walls (Hirano et al., 1995; Horii, 1989; Niimi et al., 1988; Rothenberg, Miller, & Molitor, 1988; Sundberg, 1995). In addition, although large subglottal pressure variations are not considered typical for Western classical singing, there may be small subglottal pressure modulations that could contribute to modulation of intensity (Nandamudi & Scherer, 2019; Rothenberg et al., 1988). A combination of these physiologic sources of modulation might alter the rate or extent of intensity modulation relative to fo modulation.
In addition, interactions between the laryngeal source and vocal tract filter may affect the rate and extent of intensity modulation. Based on the source-filter theory of voice production (Chiba & Kajiyama, 1958; Fant, 1960), the harmonics (i.e., the fo and integer multiples of the fo) are produced by vibration of the vocal folds and are a feature of the sound source. The sound propagates through the supraglottal vocal tract (i.e., the pharynx, oral cavity, and nasal cavity), which acts as a filter and determines the amplitude of the harmonics. When harmonics fall close to resonance peaks of the filter, their amplitudes are enhanced and they form peaks in the output spectrum, referred to as formant frequencies. In vibrato, modulation of the fo may cause harmonics to shift in and out of resonance peaks, resulting in modulation of intensity (Horii, 1989; Sundberg, 1995). This effect is referred to as the resonance-harmonics interaction (Horii, 1989). If high amplitude harmonics modulate in and out of resonance peaks during one cycle of vibrato, the rate of intensity modulation may double and the extent of intensity modulation may be altered (Horii, 1989; Sundberg, 1995). In addition to these linear source-filter interactions described by the source-filter theory, there are also nonlinear interactions of the source and filter that affect source features like fo stability and harmonic intensity, particularly in male speakers when the fo crosses over the first formant frequency (F1) (Maxfield, Palaparthi, & Titze, 2017; Titze, Riede, & Popolo, 2008; Titze, 2008). These complex physiologic and acoustic interactions are difficult to characterize using audio recordings collected with a microphone because they capture a combination of source and filter features.
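The doubling of the intensity modulation rate described above can be illustrated with a minimal numerical sketch. It is not the authors' analysis: the single idealized resonance curve, the function names, and all parameter values (200 Hz mean fo, 5 Hz vibrato rate, 3% extent, third harmonic, 100 Hz bandwidth) are hypothetical choices made only for illustration.

```python
import math

def resonance_gain(freq_hz, center_hz, bandwidth_hz=100.0):
    """Gain of one idealized resonance peak (1.0 at the center frequency)."""
    half_bw = bandwidth_hz / 2.0
    return 1.0 / math.sqrt(1.0 + ((freq_hz - center_hz) / half_bw) ** 2)

def harmonic_gain_track(center_hz, mean_fo=200.0, vibrato_rate=5.0, extent=0.03,
                        harmonic=3, duration_s=1.0, step_s=0.0005):
    """Gain applied to one harmonic while fo is modulated sinusoidally."""
    track = []
    for i in range(int(round(duration_s / step_s))):
        # A phase offset of pi/4 keeps every peak away from the window edges.
        phase = 2.0 * math.pi * vibrato_rate * i * step_s + math.pi / 4.0
        fo = mean_fo * (1.0 + extent * math.sin(phase))
        track.append(resonance_gain(harmonic * fo, center_hz))
    return track

def peaks_per_second(track, duration_s=1.0):
    """Count strict local maxima and express them as a rate in Hz."""
    peaks = sum(1 for i in range(1, len(track) - 1)
                if track[i - 1] < track[i] > track[i + 1])
    return peaks / duration_s

# Third harmonic (about 600 Hz) centered on the resonance peak.
rate_centered = peaks_per_second(harmonic_gain_track(center_hz=600.0))
# Same harmonic sitting on the lower skirt of a resonance at 700 Hz.
rate_on_skirt = peaks_per_second(harmonic_gain_track(center_hz=700.0))
print(rate_centered, rate_on_skirt)
```

Under these assumed settings, the harmonic that sweeps symmetrically across the peak is amplified twice per vibrato cycle (10 peaks per second), whereas the harmonic on the skirt is amplified once per cycle (5 peaks per second).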
Microphone recordings have been combined with electroglottography (EGG; Herbst, 2020) to investigate the physiologic sources of intensity modulation in previous studies of vibrato. Dromey, Reese, and Hopkin (2009) found that amplitude modulation was represented by the EGG signal, suggesting oscillation of the degree of vocal fold adduction during production of vibrato in classically-trained singers. When they compared the patterns in the EGG signal to acoustic patterns in the microphone signal, they found that, even when the amplitude modulation in the EGG measurements was low, intensity modulation was present in the microphone signal, providing evidence of the resonance-harmonics interaction with fo modulation. Nandamudi and Scherer (2019) also used EGG combined with acoustic and aerodynamic measures to investigate the interaction of intensity modulation, fo modulation, and airflow modulation in vibrato. This study did not demonstrate a relationship between modulation represented by EGG signal measurements and airflow modulation. However, the study revealed that the intensity modulation rate was up to two times higher than the fo and airflow modulation rates, which may provide further support for the resonance-harmonics interaction. While microphone recordings combined with EGG and aerodynamic analyses may provide information about the possible physiological patterns and acoustical interactions in vibrato, the equipment needed to perform EGG and aerodynamic assessments and the training needed to analyze the signals is not readily accessible across a range of teaching, research, and clinical settings.
Neck-surface accelerometry is another voice recording technique that is not yet readily accessible across a range of settings but can capture features of the source before they are filtered through the supraglottal vocal tract (Askenfelt, Gauffin, Sundberg, & Kitzing, 1980; Coleman, 1988; Hillman et al., 2006; Švec, Titze, & Popolo, 2005). Acoustical analyses comparing accelerometer signals and microphone signals have demonstrated consistency in estimating average fo (Coleman, 1988; Hillman et al., 2006; Mehta et al., 2016) and average sound pressure level (Švec et al., 2005), as well as spectral and cepstral features (Mehta et al., 2019; Mehta et al., 2016). Comparisons have not yet been made with vibrato, which may involve source modulations, filter modulations, or a combination of source and filter modulations similar to vocal tremor, a neurogenic voice disorder (Brown & Simonson, 1963; Hachinski, Thomsen, & Buch, 1975; Koda & Ludlow, 1992; Sulica & Louis, 2010). Although the accelerometers used in previous studies are not readily accessible for application to vibrato and vocal tremor due to the technical demands in configuring the instrumentation, a commercially-available vibration sensor was used as a neck-surface accelerometer to collect voice recordings in a recent study (Cler, McKenna, Dahl, & Stepp, 2020). Comparisons of neck-surface vibration sensor and microphone signals have the potential to clarify the physiological sources of modulation and characterize source-filter interactions in vibrato and vocal tremor, which could advance the understanding of vibrato and inform clinical assessment of vocal tremor.
Therefore, the purpose of this study was to determine if differences in the rate and extent of fo and intensity modulation could be captured using simultaneously recorded neck-surface vibration sensor and microphone signals. We hypothesized that the rate and extent of fo modulation would be equivalent in the vibration sensor and microphone signals because fo is a feature of the source that would not be largely affected by the supraglottal filter based on the source-filter theory. Although nonlinear source-filter interactions can affect fo stability, we did not anticipate that these interactions would have a significant effect on the rate or extent of fo modulation across male and female speakers producing a range of vowels at a comfortable pitch. Because the subglottal filter (i.e., the trachea) impacts glottal source features like fo and vibratory amplitude that would be captured in both the vibration sensor and microphone signals (Austin & Titze, 1997; Lehoux, Hampala, & Švec, 2021; Zhang, Neubauer, & Berry, 2006), we did not expect differential effects of subglottal resonances on the two signals. In contrast, we hypothesized that the rate and extent of intensity modulation would be higher in the microphone signal compared to the vibration sensor signal because there would be combined effects of the source and supraglottal filter represented in the microphone signal that would not be represented in the vibration sensor signal, including the resonance-harmonics interaction and oscillation of the supraglottal vocal tract. The findings of this study could inform future research on the physiology of vibrato and the development of clinically-feasible and accessible approaches for comprehensive assessment of vocal tremor.
2. Method
2.1. Participants
Nine classically-trained singers (five female, four male) aged 22-55 years (mean = 31 years) were included in this study. Eight of these singers also participated in a study on auditory-motor control of fo in vocal vibrato (Lester-Smith et al., 2021). Only singers who reported current or past classical singing training and denied current neurological, speech, language, cognitive, and voice disorders were eligible for these studies. Participants represented a range of voice types (i.e., soprano, mezzo-soprano, alto, counter-tenor, baritone, bass-baritone). Their range of singing experience was 4-30 years (mean = 14 years), and their range of singing training was 4-15 years (mean = 10 years). All participants passed a hearing screening as described in Lester-Smith et al. (2021).
2.2. Procedure
All study procedures were approved by the Northwestern University Institutional Review Board (NU IRB). Each participant completed the NU IRB informed consent process prior to initiating the study procedures. Participants were seated in a quiet clinical room for data collection. They wore an AKG C520 head-mounted condenser cardioid microphone positioned 4 cm from the corner of the mouth at an angle of approximately 45°. This microphone was selected to achieve adequate amplification of signals for the auditory perturbation experiment described by Lester-Smith et al. (2021). The microphone signal was routed to a MOTU Ultra-Lite-mk3 digital audio interface for digitization. The participants also wore a piezoelectric vibration sensor (Big Shot, K&K Sound Systems Inc.) taped to the anterior neck, superior to the sternal notch. The sensor has a diameter of 3/4 in. and a thickness of 1/32 in. This sensor was selected over the smaller vibration sensor (Hot Spot, K&K Sound Systems Inc.) used for ambulatory monitoring of voice in a previous study (Cler et al., 2020) based on preliminary testing with a classically-trained female singer producing sustained vowels with an average fo of 206 Hz. Visual inspection showed clear harmonic representation up to about 8200 Hz before the noise floor was reached for the Big Shot sensor and up to about 2800 Hz before the noise floor was reached for the Hot Spot sensor. Preliminary manual analyses revealed an 11.9 dB drop in the first octave, and an average 4.2 dB drop in the second octave, 7.7 dB drop in the third octave, 1.8 dB drop in the fourth octave, and 0.6 dB drop in the fifth octave for the Big Shot sensor. According to the manufacturer, the primary sensitivity of the sensors is in the anterior-posterior direction (i.e., perpendicular to the sensor), although some superior-inferior motion may be detected.
The digitized microphone signal and the vibration sensor signal were routed to an ADInstruments PowerLab 8SP ML 785 multi-channel data acquisition device. The signals were then routed to an Apple MacBook Pro A1278 laptop computer with LabChart software (ADInstruments, 2009, version 7.0.3) for simultaneous recordings with a 40 kHz sampling rate. Participants were asked to produce three repetitions of the vowels /i, æ, ʌ, ɑ, u/ with vibrato for 5 seconds per repetition. They were instructed to produce the vowels with a comfortable pitch and loudness. Example recordings are presented in Fig. 1.
Figure 1:

Waveform (upper panel) and narrowband spectrogram (lower panel) for simultaneously recorded normalized microphone (left panel) and vibration sensor signals (right panel) for /ɑ/ with vibrato.
2.3. Data Analysis
Based on inspection of the vibration sensor and microphone signals in Praat (Boersma & Weenink, 2019, version 6.1.08), 12 trials were excluded out of 135 total trials (9%) due to clipping of the microphone signal with loud productions and four trials were excluded out of 135 total trials (3%) due to extraneous speaking or throat clearing in between productions. The exclusion of clipped signals eliminated one production of /i/ for one participant, one production of /ɑ/ for two participants, and three productions each of /æ, ʌ, ɑ/ for one participant. Custom-written Praat scripts were used to identify the voice onset for each trial using the Annotate > To TextGrid (silences) function and to estimate the rate and extent of fo and intensity modulation for the first 4 s of each production. These automated analyses of the rate and extent of fo and intensity modulation were based on manual analyses described by Lester, Barkmeier-Kraemer, and Story (2013). A 10 Hz smoothing factor was applied to the “pitch object” prior to identifying the minimum and maximum fo peak times and values. Intensity peak values in dB were converted to Pascals using the following equation: $p_x = p_r \times 10^{\mathrm{dB}/20}$, where $p_x$ is the absolute pressure in Pascals, $p_r$ is the standard reference pressure of $2 \times 10^{-5}$ Pa, and dB is the intensity in dB. The fo and intensity modulation cycle period was determined by calculating the time difference between the peak of one cycle and the peak of the preceding cycle. The average modulation rate was then calculated using the formula $1/T$, where $T$ is the average cycle period for each trial. The extent of fo and intensity modulation was determined for each modulation cycle using the formula $(\max - \min) / (\max + \min) \times 100$. The average modulation extent was then calculated for each trial.
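The three formulas in this section can be sketched in a few lines of Python. This is a hedged illustration rather than the custom Praat scripts used in the study; the function names and the example values are hypothetical.

```python
REFERENCE_PRESSURE_PA = 2e-5  # standard reference pressure stated in the text

def db_to_pascals(level_db, reference_pa=REFERENCE_PRESSURE_PA):
    """Convert an intensity level in dB to absolute pressure in Pascals."""
    return reference_pa * 10 ** (level_db / 20.0)

def modulation_rate(peak_times_s):
    """Average modulation rate (Hz) as 1/T, where T is the mean peak-to-peak period."""
    periods = [later - earlier
               for earlier, later in zip(peak_times_s, peak_times_s[1:])]
    mean_period = sum(periods) / len(periods)
    return 1.0 / mean_period

def modulation_extent(cycle_max, cycle_min):
    """Extent (%) of one modulation cycle: (max - min) / (max + min) x 100."""
    return (cycle_max - cycle_min) / (cycle_max + cycle_min) * 100.0

def mean_extent(cycles):
    """Average extent (%) over a list of (max, min) pairs, one pair per cycle."""
    extents = [modulation_extent(mx, mn) for mx, mn in cycles]
    return sum(extents) / len(extents)

# Hypothetical example: peaks every 0.2 s, fo swinging between 194 and 206 Hz.
example_rate = modulation_rate([0.0, 0.2, 0.4, 0.6])
example_extent = mean_extent([(206.0, 194.0), (206.0, 194.0)])
print(example_rate, example_extent, db_to_pascals(94.0))
```

Because the extent is a ratio of pressures, any constant calibration gain cancels, which is consistent with the uncalibrated relative intensity measurements used here.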
Atypically high extents of fo modulation were measured for the first cycle of modulation in the vibration sensor signal for one trial and in the microphone signal for one trial out of the 119 included trials, and atypically high extents of intensity modulation were measured for the first modulation cycle in the vibration sensor signal for 98 trials and in the microphone signal for 83 trials out of the 119 included trials when the minimum value was obtained in the silent period prior to voice onset. Therefore, the first modulation cycle was excluded from the average fo modulation extent for the vibration sensor and microphone signals for two trials, and the first modulation cycle was excluded from the average intensity modulation extent for the vibration sensor and microphone signals for all trials. In addition, atypically high extents of fo modulation were measured for two modulation cycles in one microphone signal during a period of roughness that caused tracking of ½fo. Therefore, these two cycles were excluded from the average fo modulation extent for the vibration sensor and microphone signals. Because the intensity measurements estimated relative changes rather than absolute changes, the intensity measurements were not calibrated for these analyses. The relative intensity levels for the background noise and the vowel productions were estimated by manually measuring the intensity of the earliest 1 s period of silence and the middle 1 s period of voicing for the trial with the lowest average intensity in Praat for each microphone recording. The intensity of the silent period was subtracted from the intensity of the voicing period to determine the relative intensity difference.
2.4. Statistical Analysis
Paired-samples sign tests were used to analyze differences between the rate and extent of fo and intensity modulation in the vibration sensor and microphone signals. These tests were selected because: 1) the samples were related, and 2) the differences between the samples were not normally or symmetrically distributed for all comparisons. Paired-samples sign tests were performed in SPSS (IBM, 2019, version 26).
3. Results
The average rate of fo modulation was 4.8 Hz (SD = 0.3) in the vibration sensor signal and 4.8 Hz (SD = 0.3) in the microphone signal (Fig. 2). The average extent of fo modulation was 2.8% (SD = 0.8) in the vibration sensor signal and 2.8% (SD = 0.8) in the microphone signal (Fig. 3). The average rate of intensity modulation was 6.4 Hz (SD = 1.1) in the vibration sensor signal and 8.4 Hz (SD = 1.3) in the microphone signal (Fig. 4). The average extent of intensity modulation was 5.2% (SD = 3.2) in the vibration sensor signal and 4.3% (SD = 2.0) in the microphone signal (Fig. 5). Each participant’s average rate and extent of fo and intensity modulation are presented in Table 1, along with their average fo. The average difference between the signal and noise intensity was 38.5 dB (range 30.8-46.3 dB), which exceeded the recommended 10 dB difference (Patel et al., 2018; Šrámková, Granqvist, Herbst, & Švec, 2015).
Figure 2:
Results of the rate of fundamental frequency (fo) modulation analysis, with the microphone signal on the x-axis and the vibration sensor signal on the y-axis. The orange 1:1 line represents matched rates of fo modulation in the microphone and vibration sensor signals.
Figure 3:
Results of the extent of fundamental frequency (fo) modulation analysis, with the microphone signal on the x-axis and the vibration sensor signal on the y-axis. The orange 1:1 line represents matched extents of fo modulation in the microphone and vibration sensor signals.
Figure 4:
Results of the rate of intensity modulation analysis, with the microphone signal on the x-axis and the vibration sensor signal on the y-axis. The orange 1:1 line represents matched rates of intensity modulation in the microphone and vibration sensor signals.
Figure 5:
Results of the extent of intensity modulation analysis, with the microphone signal on the x-axis and the vibration sensor signal on the y-axis. The orange 1:1 line represents matched extents of intensity modulation in the microphone and vibration sensor signals.
Table 1:
Results of acoustical analyses for the vibration sensor (sens) and microphone (mic) signals from each participant.
| Participant Number | Mean fo (Hz): Sens | Mean fo (Hz): Mic | fo modulation rate (Hz): Sens | fo modulation rate (Hz): Mic | fo modulation extent (%): Sens | fo modulation extent (%): Mic | fo modulation extent (cents): Sens | fo modulation extent (cents): Mic | Intensity modulation rate (Hz): Sens | Intensity modulation rate (Hz): Mic | Intensity modulation extent (%): Sens | Intensity modulation extent (%): Mic |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| P1 | 199.9 | 199.9 | 5.4 | 5.4 | 3.1 | 3.1 | 53.4 | 53.8 | 7.8 | 9.8 | 5.6 | 1.9 |
| P2 | 334.2 | 334.1 | 4.3 | 4.3 | 1.6 | 1.6 | 28.1 | 28.6 | 5.6 | 8.2 | 2.5 | 1.7 |
| P3 | 121.7 | 121.8 | 4.6 | 4.6 | 3.8 | 3.8 | 66.0 | 66.7 | 6.4 | 7.7 | 5.5 | 5.8 |
| P4 | 328.0 | 327.7 | 5.2 | 5.2 | 1.3 | 1.3 | 22.5 | 22.5 | 6.5 | 8.6 | 1.8 | 2.6 |
| P5 | 218.4 | 218.4 | 5.0 | 5.0 | 3.3 | 3.2 | 56.0 | 55.7 | 5.8 | 9.2 | 12.7 | 5.2 |
| P6 | 311.0 | 311.0 | 4.7 | 4.7 | 2.9 | 2.9 | 50.8 | 51.1 | 5.3 | 6.6 | 5.3 | 7.7 |
| P7 | 236.4 | 236.4 | 4.7 | 4.7 | 2.5 | 2.5 | 43.9 | 43.6 | 5.6 | 6.7 | 5.5 | 5.3 |
| P8 | 165.0 | 165.0 | 4.6 | 4.6 | 3.1 | 3.2 | 55.9 | 56.5 | 8.6 | 10.2 | 3.1 | 3.4 |
| P9 | 229.0 | 230.8 | 4.5 | 4.5 | 3.2 | 3.0 | 55.8 | 76.7 | 6.6 | 8.7 | 5.1 | 5.1 |
The difference in the rate of intensity modulation in the vibration sensor and microphone signals was significant (p = 0.004). Larger differences in the rate of intensity modulation were seen with vowels that typically have smaller differences between the first formant frequency (F1) and second formant frequency (F2) based on previous studies of speech (Kent & Vorperian, 2018; Peterson & Barney, 1952). The relationship between the difference in the rate of intensity modulation in the vibration sensor and microphone signals and the difference in F1 and F2 is represented in Fig. 6.
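The reported p value can be checked with a short exact sign test. This sketch assumes that the test was applied to the nine participant-level averages in Table 1 and that ties are dropped; it is not a reproduction of the SPSS procedure, and the function name is hypothetical.

```python
from math import comb

def sign_test_two_sided(first, second):
    """Exact two-sided paired-samples sign test; tied pairs are discarded."""
    diffs = [b - a for a, b in zip(first, second) if b != a]
    n = len(diffs)
    n_positive = sum(1 for d in diffs if d > 0)
    k = min(n_positive, n - n_positive)
    # Probability of k or fewer signs in the rarer direction under p = 0.5.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail), n, n_positive

# Intensity modulation rates (Hz) for P1-P9, copied from Table 1.
rate_sensor = [7.8, 5.6, 6.4, 6.5, 5.8, 5.3, 5.6, 8.6, 6.6]
rate_microphone = [9.8, 8.2, 7.7, 8.6, 9.2, 6.6, 6.7, 10.2, 8.7]

p_value, n_pairs, n_mic_higher = sign_test_two_sided(rate_sensor, rate_microphone)
print(round(p_value, 3), n_pairs, n_mic_higher)
```

With all nine participants showing a higher rate in the microphone signal, the exact two-sided probability is 2 x (1/2)^9, or about 0.004, which agrees with the value reported above.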
Figure 6:
Illustration of the relationship between the difference in the rate of intensity modulation in the microphone and vibration sensor signals and the typical difference in F1 and F2 based on Peterson & Barney (1952).
4. Discussion
The purpose of the current study was to determine if differences in the rate and extent of fo and intensity modulation could be captured using simultaneously recorded signals from a neck-surface vibration sensor and a microphone, which represent features of the source before and after supraglottal vocal tract filtering. The study revealed that the rate and extent of fo modulation and the extent of intensity modulation were equivalent in the vibration sensor and microphone signals, but that the rate of intensity modulation was significantly higher in the microphone signal than in the vibration sensor signal. The consistency in the rate and extent of fo modulation in the vibration sensor and microphone signals supported our hypothesis and indicated that these features of the source were not significantly affected by the filter. Although nonlinear source-filter interactions may occur during speaking and singing, particularly when the fo crosses over the F1, these interactions did not appear to influence the extent or rate of fo modulation in the current study based on the consistency of these measures in the vibration sensor and microphone signals. The extents of fo modulation in the vibration sensor and microphone signals remained consistent for 8/9 participants when fo values were converted from Hertz to cents based on Herbst, Hertegard, Zangger-Borch, and Lindestad (2017) and Sundberg (1995) using the following formula: $1200 \times \log_2(f_2/f_1)$, where $f_1$ was the average fo for the trial and $f_2$ was the minimum or maximum fo for each modulation cycle. The fo modulation extents are presented in cents for each participant in Table 1.
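The Hertz-to-cents conversion above is a one-line calculation, sketched here with hypothetical fo values. How the minimum and maximum deviations were combined into a single extent per cycle is not specified in the text, so the example only shows the conversion itself.

```python
import math

def hz_to_cents(f2_hz, f1_hz):
    """Interval from f1 to f2 in cents: 1200 x log2(f2 / f1)."""
    return 1200.0 * math.log2(f2_hz / f1_hz)

# Hypothetical trial: average fo of 200 Hz, one cycle peaking at 206 Hz
# and dipping to 194 Hz.
mean_fo = 200.0
cents_above = hz_to_cents(206.0, mean_fo)  # positive: above the trial mean
cents_below = hz_to_cents(194.0, mean_fo)  # negative: below the trial mean
print(cents_above, cents_below)
```

Because cents are logarithmic, equal excursions in Hertz above and below the mean give slightly unequal values in cents, which is one reason a percentage extent and a cents extent need not track each other exactly.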
The consistency in the extent of intensity modulation in the vibration sensor and microphone signals did not support our hypothesis that the extent of intensity modulation would be higher in the microphone signal than in the vibration sensor signal. Although the extent of intensity modulation was consistent across the vibration sensor and microphone signals for most participants, one participant had a considerably higher extent in the microphone signal than in the vibration sensor signal, and two participants had considerably higher extents in the vibration sensor signal than in the microphone signal. There were no apparent patterns of difference in participants’ singing training or experience that accounted for these findings. However, the two participants with higher extents of intensity modulation in the vibration sensor signal than in the microphone signal were both male singers who represented the lowest voice types in the study (i.e., baritone, bass-baritone). Because Nandamudi and Scherer (2019) found lower extents of airflow modulation in male singers than in female singers, sex-based and voice type-based differences in production of vibrato require further investigation. In addition, further investigation is needed to understand nonlinear source-filter interactions in vocal vibrato that may influence the extent of intensity modulation differently in male and female singers.
The intensity modulation extents measured in the microphone signal in the current study were considerably lower than the extents seen in previous studies of vibrato. That is, we measured participant-average intensity modulation extents of approximately 2-8% in the current study (Table 1); whereas, extents of 23-38% were measured in previous studies (Ramig & Shipp, 1987; Seidner, 1995). This difference is likely related to the method of calculating the extent of intensity modulation. Seidner (1995) appeared to use the formula min / max x 100, and Ramig and Shipp (1987) used the formula (max – min) / (max) x 100 to calculate the extent of intensity modulation; whereas, we used the formula (max – min) / (max + min) x 100, consistent with previous studies measuring the extent of intensity modulation in vocal tremor (Barkmeier-Kraemer, Lato, & Wiley, 2011; Lester et al., 2013). Using the same formula for calculating the extent of fo modulation in the current study, we found that the participant-average extents of fo modulation of approximately 1-4% were still lower than the extents of intensity modulation of approximately 2-8%. Further investigation is needed to determine the physiological and acoustical bases for the difference in the extents of fo and intensity modulation.
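The effect of the formula on the reported extent can be seen by applying the three definitions named above to a single modulation cycle. The function names and the example pressures are hypothetical, and the formula attributed to Seidner (1995) is only what that study appeared to use.

```python
def extent_sum_normalized(cycle_max, cycle_min):
    """(max - min) / (max + min) x 100, the formula used in the current study."""
    return (cycle_max - cycle_min) / (cycle_max + cycle_min) * 100.0

def extent_max_normalized(cycle_max, cycle_min):
    """(max - min) / max x 100, the formula attributed to Ramig and Shipp (1987)."""
    return (cycle_max - cycle_min) / cycle_max * 100.0

def extent_min_over_max(cycle_max, cycle_min):
    """min / max x 100, the formula Seidner (1995) appeared to use."""
    return cycle_min / cycle_max * 100.0

# One hypothetical cycle with pressure peaks of 1.0 and troughs of 0.8
# (arbitrary linear units).
cycle = (1.0, 0.8)
print(extent_sum_normalized(*cycle),
      extent_max_normalized(*cycle),
      extent_min_over_max(*cycle))
```

For this one cycle the three definitions give roughly 11%, 20%, and 80%, so extents computed with different formulas cannot be compared directly across studies.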
The finding that the rate of intensity modulation was significantly higher in the microphone signal than the vibration sensor signal supported our hypothesis and may represent the resonance-harmonics interaction in the microphone signal. Analysis of vowel-based patterns revealed that the difference in rate of intensity modulation in the vibration sensor and microphone signals may have been driven by vowels with smaller differences between F1 and F2 based on previous studies of speech (Kent & Vorperian, 2018; Peterson & Barney, 1952). Horii (1989) and Sundberg (1995) discussed a doubling of the rate of intensity modulation relative to fo modulation that may occur when a harmonic modulates symmetrically around an F1 maximum or minimum during one cycle of fo modulation. The results of the current study indicate that this effect may be magnified when F1 and F2 are closer in frequency. Further investigation is needed with a larger sample of speakers and repeated productions of all vowels to confirm this pattern. In addition, future studies should investigate how source-filter interactions may have influenced these vowel-based patterns in the rate of intensity modulation. While vowels with a low F1 like /i/ and /u/ may be more likely to elicit source-filter interactions because the fo could cross over the F1, the patterns in the rate of intensity modulation were inconsistent for /i/ and /u/ in the current study and did not help clarify these interactions. Although we did not see a relationship between the fo and the vowel-based patterns in the current study, it is possible that participants may tune the fo with subglottal and supraglottal resonances. Future studies should also confirm that the vowel-based differences observed in the current study were not related to differences in the overall amplitude of the vibration sensor and microphone signals or differences in the amplitude of the higher harmonics due to filtering of the vibration sensor signal through the neck.
Finally, future studies should use an omnidirectional microphone with a 5-10 cm mouth-to-microphone distance to prevent microphone proximity effects that affect estimation of the sound pressure level of lower frequencies in particular (Švec & Granqvist, 2010). Although an omnidirectional microphone was not required for the current study to accurately estimate fo or overall signal intensity, it would facilitate investigation of the contribution of individual harmonics to modulation of the microphone signal intensity.
With regard to the relationship between the rate of fo and intensity modulation in the microphone signal, the rate of intensity modulation was 1.4-2.2 times higher (mean = 1.8) than the rate of fo modulation in the current study, which is consistent with previous studies (Horii, 1989; Nandamudi & Scherer, 2019; Sundberg, 1995). Because the “pitch object” was smoothed but the “intensity object” was not smoothed in the same way for the current study analyses, manual analysis of the rate of intensity modulation was performed based on previous manual analysis methods (Barkmeier-Kraemer et al., 2011; Lester et al., 2013) in order to confirm this relationship. The rates of fo and intensity modulation were equivalent for two participants, but the rate of intensity modulation was 1.1-1.6 times higher (mean = 1.3) than the rate of fo modulation for the remaining participants.
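The range and mean of the automated rate ratio can be recomputed from the microphone columns of Table 1. This sketch assumes the reported ratios were derived from those participant-level averages; it does not cover the separate manual analysis.

```python
# Microphone-signal modulation rates (Hz) for P1-P9, copied from Table 1.
fo_rate_mic = [5.4, 4.3, 4.6, 5.2, 5.0, 4.7, 4.7, 4.6, 4.5]
intensity_rate_mic = [9.8, 8.2, 7.7, 8.6, 9.2, 6.6, 6.7, 10.2, 8.7]

# Ratio of intensity modulation rate to fo modulation rate for each participant.
ratios = [intensity / fo for fo, intensity in zip(fo_rate_mic, intensity_rate_mic)]

lowest = min(ratios)
highest = max(ratios)
mean_ratio = sum(ratios) / len(ratios)
print(round(lowest, 1), round(highest, 1), round(mean_ratio, 1))
```

The computed values round to 1.4, 2.2, and 1.8, matching the range and mean stated above; no participant's ratio falls below 1, so the intensity modulation rate in the microphone signal exceeded the fo modulation rate for every singer.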
Future studies should use a combination of physiological and acoustical approaches to determine if differences in the physiological source of modulation could account for the acoustical patterns in the current study. For example, further investigation is warranted to determine if physiological differences may have contributed to the inconsistent relationship between the extent of intensity modulation in the vibration sensor and microphone signals across participants. It is possible that singers with vocal tract oscillation might produce higher extents of intensity modulation in the microphone signal than in the vibration sensor signal; whereas, singers with oscillation of the degree of vocal fold adduction or subglottal pressure might produce higher extents of intensity modulation in the vibration sensor signal than in the microphone signal. There is also a possibility that pathophysiological differences in singers might account for inconsistent relationships between the extent of intensity modulation in the microphone and vibration sensor signals, as participants’ vocal health screenings were based on participant self-report. Physiological and acoustical relationships could also be examined using analysis-by-synthesis with a computational model to simulate vibrato, similar to an approach used to clarify physiological and acoustical patterns in a speaker with vocal tremor (Lester et al., 2013). Finally, future studies should investigate differences between the K&K Sound vibration sensor used in the current study and the Knowles BU-27135 accelerometer used most commonly in recent studies (Marks et al., 2020; Mehta et al., 2019; Mehta et al., 2016) to confirm that detection of acoustical patterns was not influenced by properties of the vibration sensor. 
For example, while the Knowles BU accelerometer has unidirectional sensitivity in the axial direction (Mehta et al., 2016), the K&K Sound vibration sensor has some bidirectional sensitivity, which may allow vertical laryngeal oscillation to affect signal intensity.
Based on the results of this study, future comparisons of vibration sensor and microphone signals are also warranted in speakers with vocal tremor. As with vibrato, the respiratory system, larynx, and vocal tract may serve as physiological sources of modulation in vocal tremor (Brown & Simonson, 1963; Hachinski et al., 1975; Koda & Ludlow, 1992; Sulica & Louis, 2010). Current medical management of vocal tremor often involves injection of botulinum toxin into the laryngeal musculature (Adler et al., 2004; Gurey, Sinclair, & Blitzer, 2013), even when the source of tremor is not isolated to the larynx or cannot be identified using endoscopic examinations (Bové et al., 2006). For example, identification of respiratory tremor requires systematic visual and tactile assessments (Hemmerich, Finnegan, & Hoffman, 2017; Hixon & Hoit, 1999, 2000), aerodynamic assessments (Hixon & Hoit, 2006), or kinematic assessments like respiratory inductive plethysmography (Hemmerich et al., 2017; Koda & Ludlow, 1992). The time and instrumentation required to perform these procedures may limit their clinical application and the feasibility of identifying all physiologic sources of vocal tremor.
The addition of subglottal pressure estimates to the current study procedures could advance diagnosis and characterization of vocal tremor and support individualized, physiologically based treatment. Recent advances in neck-surface accelerometry have allowed for estimation of subglottal pressure in speakers with voice disorders, including phonotraumatic and nonphonotraumatic vocal hyperfunction and unilateral vocal fold paralysis (Marks et al., 2020). The authors’ reported goal was to optimize estimation of subglottal pressure using neck-surface accelerometry across a variety of speaking tasks and communication settings. Application of these procedures to the current study methods could inform future studies of vocal tremor and vibrato.
5. Conclusions
The results of this study demonstrated that the rate of intensity modulation at the source prior to vocal tract filtering, as measured in neck-surface vibration sensor signals, was lower than the rate of intensity modulation after vocal tract filtering, as measured in microphone signals. The difference in rate was larger for vowels that typically have smaller differences between the first and second formant frequencies. These findings provide further support for the resonance-harmonics interaction in vocal vibrato. The results also demonstrated that the extent of intensity modulation before and after vocal tract filtering was consistent for most participants; however, one participant had a higher extent of intensity modulation in the microphone signal, and two participants had higher extents of intensity modulation in the vibration sensor signal. Further investigation is warranted to determine if differences in the physiological source(s) of vibrato account for inconsistent relationships between the extent of intensity modulation in vibration sensor and microphone signals.
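As an illustration of how a resonance-harmonics interaction could raise the rate of intensity modulation after vocal tract filtering, the following minimal Python sketch passes a single harmonic of a frequency-modulated fo through one simplified resonance peak. It is not the analysis used in this study. The function names, the resonance shape, and every numerical value (fo of 220 Hz, third harmonic, 0.5-semitone extent, 5 Hz rate, 100 Hz bandwidth, resonance center frequencies) are assumptions chosen only for illustration.

```python
import math

def resonance_gain(freq_hz, center_hz, bandwidth_hz):
    """Gain of a simplified single resonance peak (1.0 at the center frequency)."""
    x = (freq_hz - center_hz) / (bandwidth_hz / 2.0)
    return 1.0 / math.sqrt(1.0 + x * x)

def harmonic_envelope(center_hz, fo_mean=220.0, harmonic=3, extent_st=0.5,
                      rate_hz=5.0, bandwidth_hz=100.0, fs=200, dur_s=2.0):
    """Amplitude of one harmonic after filtering while fo is sinusoidally modulated."""
    env = []
    for i in range(int(fs * dur_s)):
        t = i / fs
        # fo modulated by +/- extent_st semitones at rate_hz (all values assumed)
        fo = fo_mean * 2.0 ** (extent_st / 12.0 * math.sin(2.0 * math.pi * rate_hz * t))
        env.append(resonance_gain(harmonic * fo, center_hz, bandwidth_hz))
    return env

def dominant_rate(env, fs=200, candidates=(2.5, 5.0, 7.5, 10.0, 12.5, 15.0)):
    """Candidate modulation rate (Hz) carrying the most energy in the envelope."""
    mean = sum(env) / len(env)

    def power(f):
        re = sum((x - mean) * math.cos(2.0 * math.pi * f * i / fs) for i, x in enumerate(env))
        im = sum((x - mean) * math.sin(2.0 * math.pi * f * i / fs) for i, x in enumerate(env))
        return re * re + im * im

    return max(candidates, key=power)

# Harmonic sweeps along the lower skirt of the resonance (center above the harmonic).
rate_on_skirt = dominant_rate(harmonic_envelope(center_hz=760.0))
# Harmonic sweeps back and forth across the resonance peak (center at the mean harmonic).
rate_on_peak = dominant_rate(harmonic_envelope(center_hz=660.0))
print(rate_on_skirt, rate_on_peak)
```

Under these assumed values, the harmonic on the skirt of the resonance is modulated in amplitude at the fo modulation rate, whereas the harmonic centered on the resonance peak passes through the peak twice per vibrato cycle and is modulated at twice that rate. A full analysis-by-synthesis model would need to include all harmonics, multiple formants, and source-level intensity modulation.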
6. Acknowledgements
We thank Dr. Brad H. Story and Chun Liang Chan for their previous contributions to the data analysis scripts. This research was funded by the National Institute on Disability, Independent Living, and Rehabilitation Research Advanced Rehabilitation Research Training Grant 90AR5015 (PI L.R. Cherney), the National Institute on Deafness and Other Communication Disorders Early Career Research Award R21 DC017001 (PI R.A. Lester-Smith), and research funding provided by the Moody College of Communication at The University of Texas at Austin (PI R.A. Lester-Smith).
Footnotes
Declarations of Interest: None
7. References
- Adler CH, Bansberg SF, Hentz JG, Ramig LO, Buder EH, Witt K, … Caviness JN (2004). Botulinum toxin type A for treating voice tremor. Archives of Neurology, 61(9), 1416–1420.
- Askenfelt A, Gauffin J, Sundberg J, & Kitzing P (1980). A comparison of contact microphone and electroglottograph for the measurement of vocal fundamental frequency. Journal of Speech, Language, and Hearing Research, 23(2), 258–273.
- Austin SF, & Titze IR (1997). The effect of subglottal resonance upon vocal fold vibration. Journal of Voice, 11(4), 391–402.
- Barkmeier-Kraemer J, Lato A, & Wiley K (2011). Development of a speech treatment program for a client with essential vocal tremor. Seminars in Speech and Language, 32(1), 43–57.
- Bové M, Daamen N, Rosen C, Wang CC, Sulica L, & Gartner-Schmidt J (2006). Development and validation of the vocal tremor scoring system. The Laryngoscope, 116(9), 1662–1667.
- Brown JR, & Simonson J (1963). Organic voice tremor: A tremor of phonation. Neurology, 13(6), 520–525.
- Chiba T, & Kajiyama M (1958). The vowel: Its nature and structure. Tokyo, Japan: Phonetic Society of Japan.
- Cler GJ, McKenna VS, Dahl KL, & Stepp CE (2020). Longitudinal case study of transgender voice changes under testosterone hormone therapy. Journal of Voice, 34(5), 748–762.
- Coleman RF (1988). Comparison of microphone and neck-mounted accelerometer monitoring of the performing voice. Journal of Voice, 2(3), 200–205.
- Dromey C, Reese L, & Hopkin JA (2009). Laryngeal-level amplitude modulation in vibrato. Journal of Voice, 23(2), 156–163.
- Fant G (1960). The acoustic theory of speech production. The Hague: Mouton.
- Gurey LE, Sinclair CF, & Blitzer A (2013). A new paradigm for the management of essential vocal tremor with botulinum toxin. The Laryngoscope, 123(10), 2497–2501.
- Hachinski VC, Thomsen IV, & Buch NH (1975). The nature of primary vocal tremor. The Canadian Journal of Neurological Sciences, 2(3), 195–197.
- Hemmerich AL, Finnegan EM, & Hoffman HT (2017). The distribution and severity of tremor in speech structures of persons with vocal tremor. Journal of Voice, 31(3), 366–377.
- Herbst CT (2020). Electroglottography–an update. Journal of Voice, 34(4), 503–526.
- Herbst CT, Hertegard S, Zangger-Borch D, & Lindestad P-Å (2017). Freddie Mercury—acoustic analysis of speaking fundamental frequency, vibrato, and subharmonics. Logopedics Phoniatrics Vocology, 42(1), 29–38.
- Hillman RE, Heaton JT, Masaki A, Zeitels SM, & Cheyne HA (2006). Ambulatory monitoring of disordered voices. Annals of Otology, Rhinology & Laryngology, 115(11), 795–801.
- Hirano M, Hibi S, & Hagino S (1995). Physiological aspects of vibrato. In Dejonckere PH, Hirano M, & Sundberg J (Eds.), Vibrato (pp. 9–33). San Diego, CA: Singular Publishing Group, Inc.
- Hixon TJ, & Hoit JD (1999). Physical examination of the abdominal wall by the speech-language pathologist. American Journal of Speech-Language Pathology, 8(4), 335–346.
- Hixon TJ, & Hoit JD (2000). Physical examination of the rib cage wall by the speech-language pathologist. American Journal of Speech-Language Pathology, 9(3), 179–196.
- Hixon TJ, & Hoit JD (2006). A clinical method for the detection and quantification of quick respiratory hyperkinesia.
- Horii Y (1989). Acoustic analysis of vocal vibrato: A theoretical interpretation of data. Journal of Voice, 3(1), 36–43.
- Kent RD, & Vorperian HK (2018). Static measurements of vowel formant frequencies and bandwidths: A review. Journal of Communication Disorders, 74, 74–97.
- Koda J, & Ludlow CL (1992). An evaluation of laryngeal muscle activation in patients with voice tremor. Otolaryngology--Head and Neck Surgery, 107(5), 684–696.
- Lehoux H, Hampala V, & Švec JG (2021). Subglottal pressure oscillations in anechoic and resonant conditions and their influence on excised larynx phonations. Scientific Reports, 11(1), 1–14.
- Lester-Smith RA, Kim JH, Hilger A, Chan C-L, & Larson CR (2021). Auditory-motor control of fundamental frequency in vocal vibrato. Journal of Voice.
- Lester RA, Barkmeier-Kraemer J, & Story BH (2013). Physiologic and acoustic patterns of essential vocal tremor. Journal of Voice, 27(4), 422–432.
- Marks KL, Lin JZ, Burns JA, Hron TA, Hillman RE, & Mehta DD (2020). Estimation of subglottal pressure from neck surface vibration in patients with voice disorders. Journal of Speech, Language, and Hearing Research, 63(7), 2202–2218.
- Maxfield L, Palaparthi A, & Titze I (2017). New evidence that nonlinear source-filter coupling affects harmonic intensity and fo stability during instances of harmonics crossing formants. Journal of Voice, 31(2), 149–156.
- Mehta DD, Espinoza VM, Van Stan JH, Zañartu M, & Hillman RE (2019). The difference between first and second harmonic amplitudes correlates between glottal airflow and neck-surface accelerometer signals during phonation. The Journal of the Acoustical Society of America, 145(5), EL386–EL392.
- Mehta DD, Van Stan JH, & Hillman RE (2016). Relationships between vocal function measures derived from an acoustic microphone and a subglottal neck-surface accelerometer. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 24(4), 659–668.
- Nandamudi S, & Scherer RC (2019). Airflow vibrato: Dependence on pitch and loudness. Journal of Voice, 33(6), 815–830.
- Niimi S, Horiguchi S, Kobayashi N, & Yamada M (1988). Electromyographic study of vibrato and tremolo in singing. Voice Production, Mechanisms and Functions, 403–414.
- Patel RR, Awan SN, Barkmeier-Kraemer J, Courey M, Deliyski D, Eadie T, … Hillman R (2018). Recommended protocols for instrumental assessment of voice: American Speech-Language-Hearing Association expert panel to develop a protocol for instrumental assessment of vocal function. American Journal of Speech-Language Pathology, 27(3), 887–905.
- Peterson GE, & Barney HL (1952). Control methods used in a study of the vowels. The Journal of the Acoustical Society of America, 24(2), 175–184.
- Prame E (1994). Measurements of the vibrato rate of ten singers. The Journal of the Acoustical Society of America, 96(4), 1979–1984.
- Ramig LA, & Shipp T (1987). Comparative measures of vocal tremor and vocal vibrato. Journal of Voice, 1(2), 162–167.
- Rothenberg M, Miller D, & Molitor R (1988). Aerodynamic investigation of sources of vibrato. Folia Phoniatrica et Logopaedica, 40(5), 244–260.
- Seidner W, Nawka T, & Cebulla M (1995). Dependence of the vibrato on pitch, musical intensity, and vowel in different voice classes. In Dejonckere PH, Hirano M, & Sundberg J (Eds.), Vibrato (pp. 63–82). San Diego, CA: Singular Publishing Group, Inc.
- Shipp T, Leanderson R, & Sundberg J (1980). Some acoustic characteristics of vocal vibrato. Journal of Research in Singing, 4, 18–25.
- Šrámková H, Granqvist S, Herbst CT, & Švec JG (2015). The softest sound levels of the human voice in normal subjects. The Journal of the Acoustical Society of America, 137(1), 407–418.
- Sulica L, & Louis ED (2010). Clinical characteristics of essential voice tremor: A study of 34 cases. The Laryngoscope, 120(3), 516–528.
- Sundberg J (1995). Acoustic and psychoacoustic aspects of vocal vibrato. In Dejonckere PH, Hirano M, & Sundberg J (Eds.), Vibrato (pp. 35–62). San Diego, CA: Singular Publishing Group, Inc.
- Švec JG, & Granqvist S (2010). Guidelines for selecting microphones for human voice production research.
- Švec JG, Titze IR, & Popolo PS (2005). Estimation of sound pressure levels of voiced speech from skin vibration of the neck. The Journal of the Acoustical Society of America, 117(3), 1386–1394.
- Titze I, Riede T, & Popolo P (2008). Nonlinear source–filter coupling in phonation: Vocal exercises. The Journal of the Acoustical Society of America, 123(4), 1902–1915.
- Titze IR (2008). Nonlinear source–filter coupling in phonation: Theory. The Journal of the Acoustical Society of America, 123(5), 2733–2749.
- Zhang Z, Neubauer J, & Berry DA (2006). The influence of subglottal acoustics on laboratory models of phonation. The Journal of the Acoustical Society of America, 120(3), 1558–1569.





