Abstract
Purpose:
Although relatively precise control over the extent and rate of fundamental frequency (fo) modulation may be needed for optimal production of vibrato, the role of auditory feedback in controlling vibrato is not well understood. Previous studies altered the gain and timing of auditory feedback in singers producing vibrato and showed inconsistent effects on the extent and rate of fo modulation, which may have been related to small sample sizes or limited analyses. Therefore, the purpose of this study was to further investigate whether the gain or timing of auditory feedback impacts control of vibrato in a larger sample of singers and with advanced statistical analyses.
Method:
Ten classically-trained singers produced sustained vowels with vibrato while their auditory feedback was masked with pink noise or multi-talker babble to reduce the gain of their auditory feedback and while their auditory feedback was delayed by about 200 or 300 ms to alter the timing of their auditory feedback. Acoustical analyses measured changes in the extent and rate of fo modulation in the masked and delayed trials relative to control trials. Bayesian modeling was used to analyze the effects of noise-masked, babble-masked, and delayed auditory feedback on the extent and rate of fo modulation.
Results:
There was compelling evidence that noise masking increased the extent of fo modulation, and babble masking increased the variability in the rate of fo modulation (i.e., jitter of fo modulation). Masked auditory feedback did not affect the average rate of fo modulation. Delayed auditory feedback did not affect the extent, rate, or jitter of fo modulation.
Conclusions:
The current study demonstrated that reducing the gain of the auditory feedback with noise masking increased the extent of fo modulation but did not affect the average rate of fo modulation in classically-trained singers producing vibrato. Reducing the gain of the auditory feedback with babble masking and altering the timing of auditory feedback with imposed delays did not affect the average extent or rate of fo modulation. However, babble masking increased the jitter of fo modulation rate, which suggests that modulated auditory feedback may affect the periodicity of fo modulation from one modulation cycle to the next. These findings clarify the role of auditory feedback in controlling vibrato and may inform the current reflex-resonance models of vibrato.
Keywords: auditory feedback, sensorimotor control, vibrato
Introduction
Vocal vibrato is often used in classical singing and involves modulation of the frequency and intensity of voice [1,2]. These acoustical modulations are characterized by: 1) the extent, or the range of modulation, and 2) the rate of modulation, or the number of cycles of modulation occurring in one second. The average extent of fundamental frequency (fo) modulation in typical vibrato is 6–8%, or about 1 semitone above and below the average fo [2–5], and the average rate of fo modulation in typical vibrato is 5–7 Hz [1,2,5]. Within these ranges, the fo modulation contributes to a “richness” of tone [4]. However, extents outside the range of ±1 semitone from the average fo are undesirable and thought to be associated with older age or physical deconditioning [3,6]. Additionally, vibrato rates slower than 5 Hz are perceived as “unacceptably slow” and rates faster than 8 Hz are perceived as “nervous” [3]. Therefore, relatively precise control over the extent and rate of fo modulation may be needed for optimal production of vibrato.
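As a rough check on how the percent and semitone units above relate, the following sketch converts between them. It assumes the equal-tempered scale (12 semitones per octave); the function names are illustrative and do not come from the study.

```python
import math

def semitones_to_percent(semitones: float) -> float:
    """Percent change in frequency for a shift of `semitones`
    (assumes 12 equal-tempered semitones per octave)."""
    return (2.0 ** (semitones / 12.0) - 1.0) * 100.0

def percent_to_semitones(percent: float) -> float:
    """Inverse conversion: percent change in frequency to semitones."""
    return 12.0 * math.log2(1.0 + percent / 100.0)

# One semitone above the average fo is a change of roughly 5.9 %.
one_semitone_up_pct = semitones_to_percent(1.0)
# One semitone below is a slightly smaller change in percent terms.
one_semitone_down_pct = semitones_to_percent(-1.0)
```

Under this assumption, an extent of about 6% sits close to one semitone, while 8% is somewhat more than one semitone.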
Although the exact mechanisms for neural control of vibrato are not well understood, models of speech motor control such as the Directions into the Velocities of the Articulators (DIVA) model [7,8] could have implications for understanding the mechanisms involved in controlling vibrato. In the DIVA model, speech is controlled by feedforward and feedback systems. The feedforward control system creates a motor program for the intended speech that is relayed to muscles of the respiratory, laryngeal, pharyngeal-oral, and velopharyngeal-nasal subsystems for motor execution and speech production. If the sensory system (i.e., auditory and somatosensory systems) receives information about the produced speech that does not match the intended speech, the feedback system detects this error and responds by immediately sending a corrective command to the muscles. In addition, the feedback system informs the feedforward system of the produced error to update motor programs and prevent future errors.
The feedforward and feedback systems are both implicated in controlling fo during production of steady, sustained vowels in typical speakers [9–11] and singers [12–14]. Titze, Story, Smith, and Long [6] proposed that the feedforward and feedback systems are also involved in controlling fo during production of vibrato. In their reflex-resonance model of vocal vibrato, the feedforward system generates a motor program that signals a constant level of muscle activation for the intended fo. Central oscillators are also activated during production of vibrato and modulate the constant signal. The modulated signal is then transmitted to the laryngeal muscles including the cricothyroid and thyroarytenoid muscles that are involved in controlling fo. Modulated activation of these muscles produces oscillation of vocal fold length and stiffness, which causes modulation of the fo. The feedback system is then involved in maintaining the oscillation of vocal fold length and modulation of fo through a sensorimotor reflex, according to the reflex-resonance model. Specifically, when the cricothyroid muscle contracts, peripheral sensory receptors (i.e., muscle spindles) detect the change in vocal fold length. This somatosensory information is immediately relayed to the brainstem, which initiates a counteractive motor response that activates the opposing thyroarytenoid muscle. Contraction of the thyroarytenoid muscle changes vocal fold length, which is detected by the somatosensory system and initiates a subsequent motor response. This sensorimotor reflex alternately produces contraction of the cricothyroid and thyroarytenoid muscles, which maintains modulation of vocal fold length and fo. The gain of the alternating somatosensory responses determines the level of opposing muscle activation, and the delay in the sensorimotor system determines the timing of the muscle activation. 
In a typical system, the period of one full cycle (peak-to-peak cricothyroid activation) is about 170 ms, which corresponds to a rate of about 6 Hz [6] and is consistent with the average rate of fo modulation in vibrato [1,2].
The gains and delays of the proposed sensorimotor reflex were tested by Titze, Story, Smith, and Long [6] using computational modeling, as well as mechanical perturbation of the larynx in healthy speakers producing sustained vowels and electrical stimulation of the intrinsic laryngeal muscles in healthy singers producing “vibrato-free” sustained vowels. These studies focused on the role of somatosensory feedback in controlling fo. However, the speakers and singers received normal voice auditory feedback during the experiments, and Titze, Story, Smith, and Long [6] suggested that auditory feedback might also contribute to reflexive control of vibrato. Furthermore, Leydon, Bauer, and Larson [15] demonstrated that modulated auditory feedback induced modulation of fo in healthy speakers. In their experiment, participants produced sustained vowels while their auditory feedback was sinusoidally modulated at a rate of 1–10 Hz and with an extent of ±25 cents (peak-to-peak modulation of 50 cents). The resultant fo modulation in the voice output was greatest when the modulation rates were between 4 and 7 Hz, suggesting that reflexive auditory-motor control may contribute to the 5–7 Hz fo modulation associated with vibrato.
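To make the cents-based description concrete, the sketch below generates a sinusoidally modulated fo contour with an extent of ±25 cents and confirms that the peak-to-peak excursion is 50 cents. The 220 Hz base fo, the 5 Hz rate, and the 1 kHz contour sampling rate are illustrative assumptions, not parameters reported by Leydon, Bauer, and Larson [15].

```python
import numpy as np

def modulated_fo(base_fo_hz, rate_hz, extent_cents, duration_s=1.0, sample_rate=1000):
    """Return time points and an fo contour modulated sinusoidally by +/- extent_cents."""
    t = np.arange(int(duration_s * sample_rate)) / sample_rate
    cents = extent_cents * np.sin(2.0 * np.pi * rate_hz * t)
    # 1200 cents per octave, so a shift of c cents scales frequency by 2 ** (c / 1200).
    return t, base_fo_hz * 2.0 ** (cents / 1200.0)

# Illustrative values (assumptions): 220 Hz base fo, 5 Hz modulation rate.
t, fo = modulated_fo(base_fo_hz=220.0, rate_hz=5.0, extent_cents=25.0)
peak_to_peak_cents = 1200.0 * np.log2(fo.max() / fo.min())
```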
More recently, Brajot and Lawrence [16] studied the influence of auditory feedback on voice and speech modulation in typical speakers by delaying the presentation of auditory feedback during sustained vowel production. They found that, with typical auditory feedback, speakers exhibited a 3 Hz modulation of fo. As the delay in auditory feedback increased from 100 to 600 ms in 100 ms increments, the extent and rate of fo modulation increased linearly. Delayed auditory feedback did not significantly affect modulation of the first or second formants in this study. Subsequently, Brajot and Neiman [17] expanded the reflex-resonance model to incorporate an auditory feedback loop. The model was evaluated by altering the auditory feedback gains (i.e., the amplitude of the auditory feedback) and delays in a young, healthy speaker and in an individual with intermittent vocal tremor related to multiple sclerosis. This study revealed that the standard deviation of fo, which represented the fo modulation extent, increased as the gain increased, whereas the rate of fo modulation decreased as the feedback delay increased for both participants.
Previous studies also altered the gain and timing of auditory feedback to study auditory-motor control of vocal vibrato. The gain of auditory feedback was altered in two studies using noise masking. Schultz-Coloun and Battmer [18], as cited by Shipp, Sundberg, and Doherty [19], masked auditory feedback with “high-level noise” in one singer. They found that the singer’s extent of fo modulation became more variable, although the rate of fo modulation remained the same. Ward and Burns [20] masked auditory feedback using random low-pass filtered noise in one singer producing vibrato. These authors found that the extent of fo modulation declined more when the singer was instructed to inhibit vibrato in the presence of masking noise than when they were instructed to inhibit vibrato with normal auditory feedback. However, the presence of masking noise without the instruction to inhibit vibrato did not affect the extent of fo modulation. Differences in the results of Schultz-Coloun and Battmer [18] and Ward and Burns [20] could have been related to individual differences between the two singers, the procedures used for masking, or the methods used to estimate fo modulation.
Two other studies altered the timing of auditory feedback by delaying the presentation of the auditory feedback during production of vibrato. Deutsch and Clarkson [21] delayed auditory feedback by 91, 197, 366, and 548 ms in 13 singers and reported that the extent and rate of fo modulation increased as the delay increased. Similarly, Shipp, Sundberg, and Doherty [19] delayed auditory feedback in three singers and found that fo modulation rate increased when auditory feedback was delayed by 120, 300, and 500 ms. However, they found that fo modulation rate did not change when the auditory feedback was delayed by 200 or 400 ms, and they did not find a significant difference in fo modulation extent or fo modulation rate variability (i.e., jitter of fo modulation) for any of the delays. The authors suggested that delays of 200 and 400 ms may not have affected fo modulation extent or rate because the period of the vibrato would have aligned with the duration of the delay. Differences between the findings of Deutsch and Clarkson [21] and Shipp, Sundberg, and Doherty [19] could have been related to the durations of delay, instrumentation used to process and record signals, sample size, analysis approach, or participants’ singing training and experience. These conflicting findings from previous studies suggest that additional work using expanded data collection and analysis procedures is needed to clarify the role of feedback systems in controlling the extent and rate of fo modulation in vibrato.
The purpose of the present study was to further investigate whether the gain or timing of auditory feedback impacts control of vibrato. Specifically, this study addressed the following question: does masked or delayed auditory feedback affect the extent or rate of fo modulation in classically-trained singers producing vibrato? Based on the reflex-resonance model [6], masking somatosensory feedback would be expected to reduce the gain of the somatosensory response, thereby reducing the magnitude of reflexive motor response and the extent of fo modulation. The recently expanded reflex-resonance model that incorporated an auditory feedback loop [17] would suggest the same. However, we predicted that the somatosensory control mechanisms studied by Titze, Story, Smith, and Long [6] and the auditory-motor control mechanisms studied by Brajot and Neiman [17] during steady, sustained vowel production might differ from auditory-motor control mechanisms during production of vibrato. We hypothesized that singers have an intended extent of fo modulation that they expect to hear in their auditory feedback, and that masking their auditory feedback would reduce their ability to detect the intended extent of fo modulation, leading to an increase in the produced extent of fo modulation. As a secondary question, we investigated whether masking auditory feedback with pink noise or multi-talker babble would have differential effects on the extent of fo modulation. Because previous studies have indicated that attention to auditory feedback influences auditory-motor control of fo [22,23], we hypothesized that the effect would be larger in the babble masking condition than the noise masking condition, as babble would not only mask the singers’ air-conducted feedback like noise but might also distract them from attending to their auditory, somatosensory, and bone-conducted feedback. 
We hypothesized that delaying the auditory feedback would alter the timing of the auditory-motor response, thereby changing the timing of the reflexive motor response and the rate of fo modulation. We further hypothesized, based on the findings of Shipp, Sundberg, and Doherty [19], that a delay maintaining an in-phase relationship between the voice output and auditory feedback would not affect vibrato rate, whereas a delay causing an out-of-phase relationship between the voice output and auditory feedback would reduce the regularity of the rate of fo modulation. Findings of the present study could improve the understanding of the role of the auditory feedback system in control of vibrato and inform current models of sensorimotor control of voice. Additionally, the methods for altering the gain and timing of auditory feedback developed for the current study could be applied in future studies to investigate essential vocal tremor, a neurogenic voice disorder that has both acoustical and physiological similarities to vibrato [24,25] and may involve impaired speech motor control.
Method
Participants
Ten healthy classically-trained singers (six female and four male; ages 22 to 53 years) participated in this study. The same participants completed the fo perturbation experiments described by Lester-Smith, Kim, Hilger, Chan, and Larson [26]. Participants denied current neurological, speech, language, cognitive and voice disorders. All participants reported at least 4 years of classical singing training and experience. Further details about participant characteristics are reported in Lester-Smith, Kim, Hilger, Chan, and Larson [26]. This study was approved by the Northwestern University Institutional Review Board (NU IRB).
Procedure
All participants completed the informed consent process according to NU IRB guidelines. Participants were informed that the study purpose was “to understand how speakers use what they hear to control their voice.”
Hearing Threshold Test.
All participants completed hearing threshold testing and had bilateral hearing thresholds ≤ 25 dB HL at octave intervals from 250 to 4000 Hz.
Data Collection.
Procedures and equipment for data collection were similar to those described by Lester-Smith, Kim, Hilger, Chan, and Larson [26]. Data were collected in a quiet clinical room. Each participant was seated in front of a laptop and wore an AKG C520 head-mounted condenser cardioid microphone positioned 4 cm from the corner of the mouth at about a 45-degree angle. The microphone signal was digitized (MOTU UltraLite-mk3) and routed to a multi-channel data acquisition device (ADInstruments PowerLab 8SP ML 785) connected to a laptop computer (Apple MacBook Pro A1278) with LabChart software (ADInstruments, 2009, Version 7.0.3) for signal recording. A 10 kHz sampling rate was used to capture fo while reducing the file size for each experimental recording. The digitized microphone signal was also routed to a second laptop computer (Apple MacBook Pro A1278) with CueMix Fx software (MOTU, 2017, version 1.6 7322) and Max software (Cycling '74, 2017, Version 7) to control the timing of the experimental trials, visual cues, and auditory feedback. Auditory feedback from Max was sent via the MOTU UltraLite-mk3 to an earphone amplifier (Aphex HeadPod 4) and then to the insert earphones (Etymotic ER-2) as well as the PowerLab for recording of the auditory feedback signal. The ER-2 foam ear tips were inserted deeply to reduce air-conducted and bone-conducted feedback of the produced voice. In addition, auditory feedback was calibrated to be 10 dB louder than the microphone input to mask air-conducted feedback of the produced voice. Levels were calibrated with a Brüel & Kjær Type 2250 sound level meter, 2 cc coupler, and 1000 Hz pure tone presented with a handheld recorder (Olympus VN-541PC) 4 cm from the microphone.
Participants were asked to repeatedly produce the sustained vowel /ɑ/ with vibrato for 3 s using a comfortable pitch that they could maintain in each of the three experiments described below. They were instructed to produce the vowel /ɑ/ when a visual cue “aaah” appeared on the laptop screen and to maintain a target loudness based on a sound level meter presented on the same screen. The target intensity was approximately 70 dB SPL at 4 cm from the corner of the mouth. Participants were asked to take a breath and prepare for the next trial when the visual cue “breathe” appeared on the screen for 2–4 seconds (randomly jittered). Participants completed six practice trials before the experimental trials in the three conditions described below. The condition order was pseudorandomized and counterbalanced across participants.
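The trial structure described above (a 3 s vowel followed by a breath cue lasting 2–4 s, with trials presented in random order) can be sketched as follows. This is only an illustration: the study used Max software, the uniform distribution for the breath interval is an assumption, and the condition labels and trial counts are placeholders.

```python
import random

def build_trial_schedule(conditions, n_per_condition, seed=0):
    """Return a randomly ordered list of trials with jittered breath intervals."""
    rng = random.Random(seed)
    order = [c for c in conditions for _ in range(n_per_condition)]
    rng.shuffle(order)  # random order of conditions across trials
    return [
        {
            "condition": condition,
            "vowel_s": 3.0,                     # sustained vowel duration
            "breath_s": rng.uniform(2.0, 4.0),  # assumption: uniform jitter in 2-4 s
        }
        for condition in order
    ]

# Placeholder labels and counts, chosen only for illustration.
schedule = build_trial_schedule(("control", "masked"), n_per_condition=20)
```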
Noise-Masked Auditory Feedback.
Participants were informed that they would hear their voice through the earphones for some trials and noise in the earphones for other trials. For the six practice trials, participants were presented with voice auditory feedback for three trials and pink noise to mask their auditory feedback for three trials. These trials were completed in random order. The onset of the noise occurred at the start of the trial prior to voice onset, and a 500 ms ramp in amplitude was applied to prevent a startle response to the 80 dB SPL noise. The noise was presented continuously for 4.5 s to ensure that the full 3 s vowel production was masked. For the experimental trials, participants received voice auditory feedback for 20 trials (control trials) and pink noise for 20 trials, with the order randomly determined. Example waveforms of the microphone signal and headphone amplifier output signal for a control trial and a noise-masked trial are shown in Fig. 1.
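A minimal sketch of a 4.5 s pink-noise masker with a 500 ms onset ramp is shown below. It is not the Max patch used in the study: the spectral-shaping method, the linear ramp shape, and the 44.1 kHz sampling rate are assumptions, and calibration to 80 dB SPL depends on the playback hardware and is not covered.

```python
import numpy as np

def pink_noise(n_samples, rng):
    """Approximate pink noise by shaping white noise so that power falls as 1/f."""
    spectrum = np.fft.rfft(rng.standard_normal(n_samples))
    freqs = np.fft.rfftfreq(n_samples)
    scale = np.zeros_like(freqs)             # the DC component is removed
    scale[1:] = 1.0 / np.sqrt(freqs[1:])     # amplitude ~ 1/sqrt(f), power ~ 1/f
    noise = np.fft.irfft(spectrum * scale, n=n_samples)
    return noise / np.max(np.abs(noise))     # normalise to a peak of 1

def apply_onset_ramp(signal, sample_rate, ramp_s=0.5):
    """Fade the signal in linearly over ramp_s seconds (ramp shape is an assumption)."""
    n_ramp = int(round(ramp_s * sample_rate))
    envelope = np.ones(len(signal))
    envelope[:n_ramp] = np.linspace(0.0, 1.0, n_ramp)
    return signal * envelope

SAMPLE_RATE = 44100  # assumption; the playback rate is not stated in the text
rng = np.random.default_rng(0)
masker = apply_onset_ramp(pink_noise(int(4.5 * SAMPLE_RATE), rng), SAMPLE_RATE)
```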
Figure 1:

Waveforms representing the microphone signal (black) and headphone amplifier output signal (orange) for a control trial (left), noise-masked trial (middle), and babble-masked trial (right) for one participant producing a sustained vowel /ɑ/ with vibrato.
Babble-Masked Auditory Feedback.
Participants were informed that they would hear their voice in the earphones for some trials and people talking in the earphones for other trials. The practice and experimental trials were identical to those in the noise-masked condition, except that multi-talker babble was presented instead of pink noise. The multi-talker babble consisted of three male and three female speakers producing sentences, with the signal amplitude fluctuating between about 76 and 84 dB SPL. Example waveforms of the microphone signal and headphone amplifier output signal for a babble-masked trial are shown in Fig. 1.
Delayed Auditory Feedback.
Participants were informed that they would hear their voice in the earphones. For the six practice trials, participants were presented with immediate voice auditory feedback for two trials, auditory feedback delayed by about 200 ms for two trials, and auditory feedback delayed by about 300 ms for two trials. The ~200 ms delay was expected to be in phase with the participants’ fo modulation rate, while the ~300 ms delay was expected to be out of phase with the participants’ fo modulation rate based on Shipp, Sundberg, and Doherty [19]. For the experimental trials, participants received voice auditory feedback for 20 trials (control trials), voice auditory feedback delayed by ~200 ms for 20 trials, and voice auditory feedback delayed by ~300 ms for 20 trials, with the order randomly determined. Example fo contours of the microphone signal and headphone amplifier output signal for a control trial, ~200 ms delayed trial, and ~300 ms delayed trial are shown in Fig. 2.
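The in-phase and out-of-phase expectation can be illustrated by expressing a feedback delay as a phase offset within one vibrato cycle. The sketch assumes a vibrato rate of 5 Hz (a 200 ms period) purely for illustration; individual singers' rates differ, so the actual phase relationship varies by participant.

```python
def delay_phase_degrees(delay_ms: float, vibrato_rate_hz: float) -> float:
    """Phase offset (0-360 degrees) of delayed feedback relative to the vibrato cycle."""
    period_ms = 1000.0 / vibrato_rate_hz
    return (delay_ms % period_ms) / period_ms * 360.0

# Illustrative rate (assumption): 5 Hz, i.e. one cycle every 200 ms.
phase_200 = delay_phase_degrees(200.0, 5.0)  # a full cycle late: in phase
phase_300 = delay_phase_degrees(300.0, 5.0)  # one and a half cycles late: out of phase
```

With this assumed rate, a 200 ms delay returns the feedback exactly one cycle late, whereas a 300 ms delay places it half a cycle out of step with the voice output.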
Figure 2:

Fundamental frequency (fo) contours representing the microphone signal (black) and headphone amplifier output signal (orange) for a control trial (left), ~200 ms delayed trial (middle), and ~300 ms delayed trial (right) for one participant producing a sustained vowel /ɑ/ with an average vibrato rate of 4.5 Hz in the control trials.
Data Analysis
Audio recordings were visually inspected in Praat (Boersma & Weenink, 2019–2020, versions 6.0.50–6.1.16) for the accuracy of voice onset identification and fo tracking for each trial. Timing pulses were generated by Max software during data collection to designate voice onset. When the pulses occurred more than 200 ms before or after the onset of the fo trace, voice onset was re-identified using a custom-written Praat script that created a text grid and marked periods of sound and silence. The onset of each sound period was then used to designate voice onset. When fo tracking appeared to be inconsistent or inaccurate (e.g., during instances of glottal fry or aperiodicity), the default pitch range of 75–500 Hz was adjusted. The pitch range was incrementally adjusted around the participant’s mean fo until the fo trace appeared to be consistent with the fo represented by the narrowband spectrogram. When the default settings were changed for a participant in one experimental condition, the same pitch range was used for all three conditions for a given participant to ensure consistency in the analyses.
Estimates of the fo for the first 2 s of each trial were then obtained using custom-written Praat scripts, which created a pitch object, smoothed the pitch object with a 10 Hz bandwidth, and identified the minimum and maximum fo peak values and times. Extent was calculated for each modulation cycle in the 2 s window using the formula: (fmax - fmin) / (fmax + fmin) × 100. The average fo modulation extent was then determined for each trial. The cycle period was calculated as the time difference between the peak of one cycle and the peak of the preceding cycle. The average fo modulation rate was determined for each trial using the formula 1/T, where T was the average cycle period for the trial. Jitter of fo modulation was also calculated to estimate variability in the rate of fo modulation from one cycle of modulation to the next, based on Shipp, Sundberg, and Doherty [19]. The difference between the period of each cycle of fo modulation and the preceding cycle of modulation was determined. An average period difference was then calculated for each trial and converted to a percentage change. In order to determine the actual delay induced by Max software during the delayed auditory feedback experiments, cross-correlation analyses of the recorded microphone signal and the recorded headphone amplifier output signal were completed in Matlab (The Mathworks, Inc., 2018, version R2018a).
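The measures described above can be sketched in Python as follows. This is not the authors' Praat or Matlab code. It assumes that peak i and trough i belong to the same modulation cycle, that jitter is the mean absolute difference between consecutive periods expressed as a percentage of the mean period, and that the feedback delay is the lag at the cross-correlation maximum; the example values are synthetic.

```python
import numpy as np

def vibrato_measures(peak_times_s, peak_fo_hz, trough_fo_hz):
    """Average extent (%), rate (Hz), and jitter (%) of fo modulation for one trial."""
    peak_times_s = np.asarray(peak_times_s, dtype=float)
    fmax = np.asarray(peak_fo_hz, dtype=float)
    fmin = np.asarray(trough_fo_hz, dtype=float)
    # Extent per cycle: (fmax - fmin) / (fmax + fmin) x 100, averaged over cycles.
    extent_pct = np.mean((fmax - fmin) / (fmax + fmin) * 100.0)
    # Cycle period: time between each peak and the preceding peak.
    periods_s = np.diff(peak_times_s)
    rate_hz = 1.0 / periods_s.mean()
    # Assumption: jitter normalised by the mean period of the trial.
    jitter_pct = np.abs(np.diff(periods_s)).mean() / periods_s.mean() * 100.0
    return extent_pct, rate_hz, jitter_pct

def estimate_delay_ms(mic, feedback, sample_rate):
    """Lag (ms) at which the feedback signal best matches the microphone signal."""
    mic = np.asarray(mic, dtype=float) - np.mean(mic)
    feedback = np.asarray(feedback, dtype=float) - np.mean(feedback)
    xcorr = np.correlate(feedback, mic, mode="full")
    lags = np.arange(-(len(mic) - 1), len(feedback))
    # A positive lag means the feedback trails the microphone signal.
    return lags[np.argmax(xcorr)] * 1000.0 / sample_rate

# Synthetic example: four fo peaks with slightly irregular spacing.
extent, rate, jitter = vibrato_measures(
    [0.00, 0.20, 0.42, 0.62], [204.0] * 4, [196.0] * 4
)

# Synthetic example: feedback is the microphone signal delayed by 200 samples at 1 kHz.
rng = np.random.default_rng(1)
mic_signal = rng.standard_normal(1000)
feedback_signal = np.concatenate([np.zeros(200), mic_signal])
example_delay_ms = estimate_delay_ms(mic_signal, feedback_signal, sample_rate=1000)
```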
Statistical Analysis
Statistical analyses were conducted with R [27, version 4.0.5] using RStudio [28, version 1.4.1103]. Measures of fo modulation extent, rate, and jitter were submitted to Bayesian hierarchical models using the Stan modeling language [29] and the R package brms [30]. A detailed description of Bayesian statistics is beyond the scope of this paper; however, please refer to Nalborczyk, Batailler, Lœvenbruck, Vilain, and Bürkner [31] for a tutorial on applying Bayesian statistics to speech acoustics research. Bayesian modeling was chosen over frequentist modeling because of the flexibility in defining hierarchical models that include the maximal random-effect structure recommended by Barr, Levy, Scheepers, and Tily [32].
To assess the effects of masked and delayed auditory feedback on fo modulation extent, rate, and jitter, we fit two separate Bayesian hierarchical regression models (BHRM). The first model was fit to fo modulation extent, rate, and jitter predicted by masked auditory feedback (three conditions: control trials, noise-masked trials, and babble-masked trials). The second model was fit to the same dependent variables (fo modulation extent, rate, and jitter) predicted by delayed auditory feedback (three conditions: control trials, ~200 ms delayed trials, and ~300 ms delayed trials). Both models included maximal random-effect structures with a random intercept for participants and random slopes that allowed the effects of masked and delayed auditory feedback to vary by participant.
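One way to write the structure described above for a single response measure is sketched below. The notation is ours rather than the authors': y_ij is the measure on trial i for participant j, x_1 and x_2 are indicator variables for the two non-control conditions (dummy coding is an assumption), and the Gaussian residual is also an assumption, since the response distribution is not stated.

```latex
\begin{aligned}
y_{ij} &= (\beta_0 + u_{0j}) + (\beta_1 + u_{1j})\,x_{1,ij} + (\beta_2 + u_{2j})\,x_{2,ij} + \varepsilon_{ij}, \\
\varepsilon_{ij} &\sim \mathcal{N}(0, \sigma^2), \\
(u_{0j}, u_{1j}, u_{2j})^{\top} &\sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma})
\end{aligned}
```

Here u_0j is the random intercept for participant j, and u_1j and u_2j are the random slopes that let each condition effect vary by participant.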
Weakly informative priors were specified for all model parameters. For the model predictors (masked and delayed auditory feedback), we used regularizing Gaussian priors (µ = 0, σ = 2), signifying that we assumed no effect of masked or delayed auditory feedback on the dependent variables. For the random effects, a half-Cauchy distribution (µ = 0, σ = 0.1) was used for the standard deviation and an LKJ(2) distribution for the correlation. For the residual standard deviation, a half-Cauchy distribution was used (µ = 0, σ = 1). Four sampling chains with 2,000 iterations were run for each model, with a warm-up period of 1,000 iterations. The 95% credible intervals (CI) and the posterior probability that the masked or delayed auditory feedback coefficient was smaller or larger than zero, Pr(β < 0) or Pr(β > 0), are reported below. The 95% CI indicated 95% certainty that the true value lay within the specified interval. When the 95% CI did not overlap with zero and when Pr(β < 0) or Pr(β > 0) was close to zero, we concluded that there was compelling evidence for an effect.
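The reported summaries can be illustrated with a short sketch that computes a 95% credible interval and the posterior probabilities Pr(β < 0) and Pr(β > 0) from posterior draws. The draws here are synthetic (a normal distribution with arbitrary mean and spread) and are not the study's posterior; the equal-tailed interval is an assumption about how the interval was formed.

```python
import numpy as np

def summarize_posterior(draws, ci=0.95):
    """Mean, equal-tailed credible interval, and Pr(beta < 0), Pr(beta > 0)."""
    draws = np.asarray(draws, dtype=float)
    tail_pct = (1.0 - ci) / 2.0 * 100.0
    lower, upper = np.percentile(draws, [tail_pct, 100.0 - tail_pct])
    return {
        "estimate": draws.mean(),
        "lower": lower,
        "upper": upper,
        "pr_lt_0": np.mean(draws < 0),  # share of draws below zero
        "pr_gt_0": np.mean(draws > 0),  # share of draws above zero
    }

# Synthetic draws: 4 chains x 1,000 post-warm-up iterations = 4,000 samples.
rng = np.random.default_rng(0)
synthetic_draws = rng.normal(loc=-0.28, scale=0.14, size=4000)
summary = summarize_posterior(synthetic_draws)
```

When nearly all draws fall on one side of zero, one of the two probabilities is close to zero, which is the pattern treated as compelling evidence in the criterion above.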
Results
The results of the fo modulation extent, rate, and jitter analyses for the masked and delayed auditory feedback experiments are reported for each participant in Tables 1 and 2, respectively. Average results and statistical analyses are presented below.
Table 1:
Average results of the acoustical analyses for each participant in the masked auditory feedback experiments, presented as mean (standard deviation).
| Participant | Experiment | fo mod extent (%), control | fo mod extent (%), experimental | fo mod rate (Hz), control | fo mod rate (Hz), experimental | fo mod jitter (%), control | fo mod jitter (%), experimental |
|---|---|---|---|---|---|---|---|
| P1 | Noise | 4.0 (0.5) | 4.1 (0.6) | 5.0 (0.2) | 5.5 (0.2) | 2.1 (1.1) | 1.8 (0.6) |
| P1 | Babble | 3.4 (0.3) | 3.5 (0.6) | 5.1 (0.4) | 5.5 (0.2) | 2.6 (1.2) | 2.5 (1.7) |
| P2 | Noise | 1.3 (0.4) | 1.5 (0.4) | 4.3 (0.4) | 4.3 (0.2) | 3.7 (1.3) | 4.5 (2.1) |
| P2 | Babble | 1.2 (0.2) | 0.9 (0.3) | 4.3 (0.4) | 4.3 (0.4) | 4.5 (3.2) | 6.4 (4.0) |
| P3 | Noise | 1.4 (0.2) | 1.8 (0.3) | 5.2 (0.1) | 5.1 (0.1) | 1.5 (0.5) | 1.6 (1.1) |
| P3 | Babble | 1.2 (0.2) | 1.5 (0.3) | 5.2 (0.3) | 5.3 (0.2) | 3.1 (1.1) | 2.7 (2.5) |
| P4 | Noise | 1.5 (0.5) | 2.0 (0.7) | 6.0 (0.5) | 5.3 (0.4) | 2.9 (1.0) | 3.5 (1.1) |
| P4 | Babble | 2.1 (0.5) | 2.5 (0.6) | 5.7 (0.4) | 5.2 (0.3) | 2.8 (0.9) | 2.8 (1.1) |
| P5 | Noise | 3.6 (0.6) | 3.7 (0.5) | 4.6 (0.2) | 4.6 (0.2) | 1.0 (0.3) | 1.4 (0.7) |
| P5 | Babble | 3.4 (0.7) | 3.6 (0.5) | 4.5 (0.2) | 4.5 (0.1) | 1.4 (0.6) | 1.1 (0.3) |
| P6 | Noise | 1.2 (0.2) | 1.3 (0.3) | 3.6 (0.2) | 3.7 (0.2) | 3.1 (1.4) | 3.9 (2.8) |
| P6 | Babble | 1.1 (0.3) | 1.3 (0.3) | 3.7 (0.2) | 3.8 (0.4) | 4.0 (3.3) | 4.9 (2.3) |
| P7 | Noise | 2.9 (0.3) | 2.8 (0.5) | 4.6 (0.1) | 4.7 (0.1) | 1.1 (0.4) | 1.7 (0.6) |
| P7 | Babble | 2.7 (0.3) | 2.7 (0.3) | 4.5 (0.2) | 4.7 (0.1) | 1.3 (0.5) | 1.8 (0.9) |
| P8 | Noise | 2.6 (0.4) | 2.1 (0.2) | 4.3 (0.3) | 4.3 (0.2) | 2.6 (0.9) | 3.0 (1.2) |
| P8 | Babble | 2.6 (0.3) | 2.1 (0.3) | 4.5 (0.3) | 4.5 (0.3) | 2.3 (1.2) | 3.4 (1.8) |
| P9 | Noise | 1.4 (0.4) | 1.8 (0.4) | 5.3 (0.7) | 5.0 (0.4) | 5.3 (1.8) | 3.3 (1.2) |
| P9 | Babble | 0.9 (0.4) | 0.9 (0.3) | 5.2 (0.7) | 5.3 (0.7) | 6.9 (2.4) | 7.6 (4.6) |
| P10 | Noise | 2.1 (0.1) | 2.6 (0.2) | 4.6 (0.2) | 4.7 (0.1) | 1.7 (0.6) | 1.6 (0.8) |
| P10 | Babble | 1.9 (0.3) | 2.3 (0.3) | 4.7 (0.2) | 4.7 (0.1) | 1.8 (1.7) | 1.6 (0.6) |
Table 2:
Average results of the acoustical analyses for each participant in the delayed auditory feedback experiment, presented as mean (standard deviation).
| Participant | fo mod extent (%), control | fo mod extent (%), ~200 ms | fo mod extent (%), ~300 ms | fo mod rate (Hz), control | fo mod rate (Hz), ~200 ms | fo mod rate (Hz), ~300 ms | fo mod jitter (%), control | fo mod jitter (%), ~200 ms | fo mod jitter (%), ~300 ms |
|---|---|---|---|---|---|---|---|---|---|
| P1 | 4.1 (0.5) | 3.6 (0.6) | 3.6 (0.7) | 5.1 (0.3) | 5.1 (0.3) | 5.2 (0.6) | 2.1 (1.2) | 3.3 (2.0) | 2.2 (0.9) |
| P2 | 0.7 (0.4) | 0.7 (0.3) | 0.8 (0.4) | 4.3 (0.5) | 4.6 (0.8) | 4.5 (0.5) | 8.6 (5.1) | 7.7 (6.8) | 8.5 (5.1) |
| P3 | 1.3 (0.3) | 1.3 (0.3) | 1.4 (0.3) | 5.2 (0.2) | 5.3 (0.2) | 5.3 (0.1) | 2.2 (1.0) | 1.7 (0.8) | 1.8 (0.6) |
| P4 | 1.4 (0.4) | 1.5 (0.3) | 1.5 (0.5) | 6.0 (0.6) | 6.1 (0.4) | 5.9 (0.6) | 2.9 (0.9) | 2.2 (0.8) | 2.6 (1.8) |
| P5 | 3.3 (0.5) | 3.8 (0.5) | 2.9 (0.4) | 4.5 (0.2) | 4.4 (0.1) | 4.6 (0.3) | 1.3 (0.5) | 1.0 (0.4) | 1.1 (0.4) |
| P6 | 0.9 (0.2) | 1.0 (0.2) | 1.0 (0.2) | 3.8 (0.5) | 4.0 (0.3) | 3.6 (0.4) | 4.2 (2.2) | 4.2 (3.4) | 6.2 (2.7) |
| P7 | 2.8 (0.3) | 2.7 (0.3) | 2.1 (0.3) | 4.5 (0.1) | 4.6 (0.1) | 5.0 (0.2) | 1.1 (0.5) | 1.5 (0.6) | 1.7 (0.8) |
| P8 | 2.6 (0.4) | 2.5 (0.5) | 2.2 (0.4) | 4.5 (0.3) | 4.4 (0.4) | 4.2 (0.4) | 2.7 (1.4) | 3.0 (1.7) | 4.2 (1.7) |
| P9 | 0.7 (0.1) | 0.7 (0.2) | 0.6 (0.2) | 5.2 (0.6) | 5.3 (0.7) | 5.3 (0.7) | 8.2 (2.8) | 6.5 (2.1) | 8.0 (3.6) |
| P10 | 1.8 (0.2) | 1.7 (0.2) | 1.9 (0.3) | 4.6 (0.2) | 4.6 (0.2) | 4.8 (0.1) | 1.7 (0.7) | 1.6 (0.7) | 1.4 (0.5) |
Masked Auditory Feedback Experiments
The average fo modulation extent was 2.1% (SD = 1.0) in the control trials, 2.4% (SD = 0.9) in the noise-masked trials, and 2.1% (SD = 1.0) in the babble-masked trials. The average fo modulation rate was 4.7 Hz (SD = 0.6) in the control trials, 4.7 Hz (SD = 0.5) in the noise-masked trials, and 4.8 Hz (SD = 0.5) in the babble-masked trials. The average fo modulation jitter was 2.8% (SD = 1.5) in the control trials, 2.6% (SD = 1.1) in the noise-masked trials, and 3.5% (SD = 2.2) in the babble-masked trials.
The 95% credible intervals and mean estimates for fo modulation extent, rate, and jitter by condition are shown in Fig. 3 and Table 3. Contingent on the data and model, there was compelling evidence that, compared with control trials, noise masking increased fo modulation extent (β = 0.28, 95% CI = [0.00, 0.56]; Pr(β > 0) = 0.02) (Fig. 4), and babble masking increased fo modulation jitter (β = 0.36, 95% CI = [0.01, 0.77]; Pr(β > 0) = 0.02) (Fig. 5). There was also evidence, though not compelling, that noise masking increased fo modulation extent more than babble masking (β = 0.30, 95% CI = [−0.05, 0.68]; Pr(β < 0) = 0.05) (Fig. 6), and babble masking increased fo modulation jitter more than noise masking (β = 0.39, 95% CI = [−0.15, 0.89]; Pr(β < 0) = 0.06) (Fig. 7).
Figure 3:

Mean estimate and 95% credible interval for fo modulation extent (upper panel), fo modulation rate (middle panel), and fo modulation jitter (lower panel) for the masked auditory feedback experiments. Contrasts between each condition are listed in each panel. For the contrasts, an overlap with the zero line indicated a lack of compelling evidence for an effect.
Table 3:
Model output for the masked auditory feedback experiments. Mean estimate and 95% credible interval are presented for the BHRM on the effect of masked auditory feedback on each measure of vocal vibrato. Rhat is reported as an indication of model convergence (at convergence, Rhat is around 1.00). The posterior probability that the contrast coefficients are less than or greater than zero is also presented. The contrasts with compelling evidence for an effect are in bold.
| Response | Term | Estimate | Lower | Upper | Rhat | Pr(β<0), Pr(β>0) |
|---|---|---|---|---|---|---|
| fo modulation extent | Control - Noise | −0.28 | −0.56 | 0.01 | 1.00 | (0.98, 0.02) |
| fo modulation extent | Control - Babble | 0.02 | −0.27 | 0.32 | 1.00 | (0.43, 0.57) |
| fo modulation extent | Noise - Babble | 0.30 | −0.05 | 0.68 | 1.00 | (0.05, 0.95) |
| fo modulation rate | Control - Noise | 0.00 | −0.18 | 0.19 | 1.00 | (0.50, 0.50) |
| fo modulation rate | Control - Babble | −0.04 | −0.22 | 0.15 | 1.00 | (0.66, 0.34) |
| fo modulation rate | Noise - Babble | −0.04 | −0.14 | 0.06 | 1.00 | (0.78, 0.22) |
| fo modulation jitter | Control - Noise | 0.03 | −0.41 | 0.48 | 1.00 | (0.43, 0.57) |
| fo modulation jitter | Control - Babble | −0.36 | −0.77 | −0.01 | 1.00 | (0.98, 0.02) |
| fo modulation jitter | Noise - Babble | −0.39 | −0.89 | 0.15 | 1.00 | (0.94, 0.06) |
Figure 4:

Observed differences in fo modulation extent between control trials and noise-masked trials. Each dot represents a participant. Grey dots indicate values with higher fo modulation extent for noise-masked trials compared with control trials. The overall negative difference reflects an increase in fo modulation extent for noise-masked trials. The grey density shape represents the probability density along the measurement.
Figure 5:

Observed differences in fo modulation jitter between control trials and babble-masked trials. Each dot represents a participant. Grey dots indicate values with higher fo modulation jitter for babble-masked trials compared with control trials. The overall negative difference reflects an increase in fo modulation jitter for babble-masked trials. The grey density shape represents the probability density along the measurement.
Figure 6:

Observed differences in fo modulation extent between noise-masked trials and babble-masked trials. Each dot represents a participant. Orange dots indicate values with higher fo modulation extent for the noise-masked trials compared with the babble-masked trials. The overall positive difference reflects an increase in fo modulation extent for noise-masked trials. The grey density shape represents the probability density along the measurement.
Figure 7:

Observed differences in fo modulation jitter between noise-masked trials and babble-masked trials. Each dot represents a participant. Grey dots indicate values with lower fo modulation jitter for the noise-masked trials compared with the babble-masked trials. The overall negative difference reflects an increase in fo modulation jitter for babble-masked trials. The grey density shape represents the probability density along the measurement.
Delayed Auditory Feedback Experiment
The average fo modulation extent was 2.0% (SD = 1.2) in the control trials, 2.0% (SD = 1.1) in the ~200 ms delay trials, and 1.8% (SD = 1.0) in the ~300 ms delay trials. The average fo modulation rate was 4.8 Hz (SD = 0.6) in the control trials, 4.8 Hz (SD = 0.6) in the ~200 ms delay trials, and 4.8 Hz (SD = 0.6) in the ~300 ms delay trials. The average fo modulation jitter was 3.5% (SD = 2.7) in the control trials, 3.2% (SD = 2.2) in the ~200 ms delay trials, and 3.8% (SD = 2.8) in the ~300 ms delay trials.
The cross-correlation analyses revealed that the measured timing difference between the microphone signal and the recorded headphone amplifier output signal was 24 ms for the control trials and 240 and 340 ms for the delayed trials for nine of the 10 participants. For the first participant in the experiment, the measured timing difference was 24 ms for the control trials and 224 and 324 ms for the delayed trials. The difference in the delay for this participant’s trials was found to be related to a difference in the delay coded in the experimental scripts.
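The idea behind a cross-correlation delay measurement can be illustrated with a minimal sketch: the lag at which the headphone signal best matches a shifted copy of the microphone signal is taken as the timing difference. The signals, the 1000 Hz sampling rate, and the function name `estimate_delay_ms` are assumptions made for illustration and do not reproduce the study's analysis scripts.

```python
import random


def estimate_delay_ms(mic, headphone, sample_rate_hz, max_delay_ms):
    """Return the lag (in ms) that maximizes the cross-correlation."""
    max_lag = int(max_delay_ms * sample_rate_hz / 1000)
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        # Correlate mic[i] with the headphone sample `lag` samples later.
        n = len(mic) - lag
        score = sum(mic[i] * headphone[i + lag] for i in range(n))
        if score > best_score:
            best_lag, best_score = lag, score
    return 1000 * best_lag / sample_rate_hz


# Assumption: white noise stands in for the recorded microphone signal.
rng = random.Random(0)
fs = 1000  # hypothetical sampling rate in Hz
mic = [rng.gauss(0, 1) for _ in range(2000)]

# Simulate a headphone signal that is the mic signal delayed by 240 samples.
true_lag = 240
headphone = [0.0] * true_lag + mic[:-true_lag]

delay = estimate_delay_ms(mic, headphone, fs, max_delay_ms=400)
print(f"Estimated delay: {delay:.0f} ms")
```

A brute-force search over lags is used here for clarity; an FFT-based correlation would be the usual choice for full-length recordings.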
The 95% credible intervals and mean estimates for fo modulation extent, rate, and jitter by condition are presented in Fig. 8 and Table 4. Contingent on the data and model, there was no compelling evidence that delayed auditory feedback affected the acoustical measures of vibrato.
Figure 8:

Mean estimate and 95% credible interval for fo modulation extent (upper panel), fo modulation rate (middle panel), and fo modulation jitter (lower panel) for the delayed auditory feedback experiment. Contrasts between each condition are listed in each panel. For the contrasts, an overlap with the zero line indicates a lack of compelling evidence for an effect.
Table 4:
Model output for the delayed auditory feedback experiment. Mean estimate and 95% credible interval are presented for the BHRM on the effect of delayed auditory feedback on each measure of vocal vibrato. Rhat is reported as an indication of model convergence (at convergence, Rhat is around 1.00). The posterior probability that the contrast coefficients are less than or greater than zero is also presented. There were no contrasts with compelling evidence for an effect.
| Response | Term | Estimate | Lower | Upper | Rhat | Pr(β<0), Pr(β>0) |
|---|---|---|---|---|---|---|
| fo modulation extent | Control – 200ms | −0.04 | −0.16 | 0.09 | 1.00 | (0.75, 0.25) |
| fo modulation extent | Control – 300ms | 0.06 | −0.08 | 0.22 | 1.00 | (0.16, 0.84) |
| fo modulation extent | 200ms – 300ms | 0.10 | −0.04 | 0.25 | 1.00 | (0.07, 0.93) |
| fo modulation rate | Control – 200ms | −0.06 | −0.18 | 0.05 | 1.00 | (0.87, 0.13) |
| fo modulation rate | Control – 300ms | −0.06 | −0.23 | 0.12 | 1.00 | (0.76, 0.24) |
| fo modulation rate | 200ms – 300ms | 0.01 | −0.18 | 0.19 | 1.00 | (0.47, 0.53) |
| fo modulation jitter | Control – 200ms | 0.13 | −0.25 | 0.55 | 1.00 | (0.23, 0.77) |
| fo modulation jitter | Control – 300ms | −0.14 | −0.61 | 0.36 | 1.00 | (0.73, 0.27) |
| fo modulation jitter | 200ms – 300ms | −0.28 | −0.75 | 0.10 | 1.00 | (0.93, 0.07) |
Discussion
The purpose of this study was to further investigate the effects of masked and delayed auditory feedback on the extent and rate of fo modulation in classically-trained singers producing vibrato. This investigation was needed to clarify the role of auditory feedback in controlling vibrato due to the inconsistent findings across previous studies with small samples of participants and limited analyses. Bayesian modeling with data from ten classically-trained singers revealed that masking auditory feedback with pink noise increased the extent of fo modulation relative to control trials with unmasked auditory feedback. This finding was consistent with our hypothesis and may indicate that, when singers cannot hear their intended extent of fo modulation in their auditory feedback, they increase the extent of fo modulation in an attempt to achieve the desired extent. This finding was inconsistent with the lack of effect of noise masking on the extent of fo modulation in the singer studied by Ward and Burns [20] and with the predicted effect of reduced sensory feedback gain in the original reflex-resonance model [6] and the expanded reflex-resonance model [17].
The inconsistency between the observed effect of noise masking and the predicted effect of reducing the gain of the sensory response may be related to the reflex-resonance model being based primarily on studies of somatosensory feedback, which used mechanical perturbation of the larynx with normal voice auditory feedback and computational modeling. Larson, Altman, Liu, and Hain [33] suggested that there may be linear or non-linear interactions of somatosensory and auditory feedback, wherein alteration of somatosensory feedback may oppose auditory feedback responses or adjust the gain of responses to auditory feedback. As such, it is possible that the combination of altered somatosensory feedback and typical auditory feedback in previous studies affected control of fo in a different way than typical somatosensory feedback and altered auditory feedback in the current experiment. In addition, the findings may have been inconsistent with the expanded reflex-resonance model because the model was based on a typical speaker and a speaker with vocal tremor related to multiple sclerosis. Alternatively, because noise masking would have reduced the gain of auditory feedback in experimental trials, while amplifying voice auditory feedback in control trials would have increased the gain of auditory feedback, the current study may have produced a different effect than Brajot and Neiman [17] found with amplified auditory feedback only.
Although masking auditory feedback with pink noise increased the extent of fo modulation in the current study, there was no compelling evidence that masking auditory feedback with multi-talker babble affected the extent of fo modulation. This contradicted our hypothesis that babble masking would have a larger effect on fo modulation because it would not only mask the air-conducted feedback but might also distract participants from their sensory feedback. It is possible that babble did not adequately mask auditory feedback because the intensity of the babble masking varied between 76 and 84 dB SPL, while the intensity of the noise was consistently 80 dB SPL. It is also possible that participants habituated to the babble across trials because the same multi-talker recording was repeated for all experimental trials.
The current study revealed that there was no effect of masking auditory feedback on the rate of fo modulation. This finding was consistent with our hypotheses, the findings of Schultz-Coloun and Battmer [18], as cited by Shipp, Sundberg, and Doherty [19], and predictions of the reflex-resonance models. However, babble masking did increase the jitter of fo modulation rate, indicating that there was an increase in the variability of fo modulation rate from one cycle of modulation to the next. Because the amplitude envelope of speech has a dominant modulation rate between 4 and 5 Hz [see 34 for review], and three speakers were talking simultaneously in the multi-talker babble recordings, the increase in jitter of fo modulation may indicate that an irregular modulation of auditory feedback affected cycle-to-cycle periodicity of fo modulation.
The current study also revealed that there was no effect of delayed auditory feedback on the extent of fo modulation, consistent with our hypotheses, the findings of Shipp, Sundberg, and Doherty [19], and the predictions of the reflex-resonance models. This finding was inconsistent with the findings of Deutsch and Clarkson [21], who showed that the extent of fo modulation increased as the delay in auditory feedback increased. The inconsistent findings may be related to differences in the imposed delays. That is, Deutsch and Clarkson [21] induced delays of 91, 197, 366, and 548 ms, which differed from the induced delays in the current study and in Shipp, Sundberg, and Doherty [19].
Finally, the current study revealed that there was no effect of delayed auditory feedback on the rate of fo modulation. This finding was inconsistent with our hypothesis that delaying the auditory feedback would alter the timing of the auditory-motor response, thereby changing the timing of the reflexive motor response and the rate of fo modulation. The finding was also inconsistent with the findings of Deutsch and Clarkson [21], who saw that increasing delays increased the rate of fo modulation in 13 singers, and Shipp, Sundberg, and Doherty [19], who saw that delays of 120, 300, and 500 ms increased the rate of fo modulation in three singers, while delays of 200 and 400 ms did not affect the rate of fo modulation. Although Shipp, Sundberg, and Doherty [19] suggested that delays of 200 and 400 ms did not affect the rate of fo modulation because they aligned with the singers’ typical rates of fo modulation, the singers reportedly had fo modulation rates of 5.3, 5.7, and 5.8 Hz in the control trials. Therefore, delays of 189, 175, and 172 ms, respectively (or integer multiples of these delays), would have been required to maintain an in-phase relationship with the participants’ rates of fo modulation.
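The in-phase delays quoted above follow from the modulation period, which is the reciprocal of the modulation rate. A minimal sketch of that arithmetic is given below; the function name `in_phase_delays_ms` and the choice to list the first three integer multiples are illustrative assumptions.

```python
def in_phase_delays_ms(rate_hz, max_multiple=3):
    """Delays (ms) equal to whole numbers of modulation cycles."""
    period_ms = 1000.0 / rate_hz  # one modulation cycle in milliseconds
    return [round(k * period_ms) for k in range(1, max_multiple + 1)]


# Control-trial modulation rates reported for the three singers in [19].
rates_hz = [5.3, 5.7, 5.8]
delays = {rate: in_phase_delays_ms(rate) for rate in rates_hz}

for rate, values in delays.items():
    print(f"{rate} Hz -> in-phase delays of {values} ms")
```

The first entry for each rate is the single-cycle delay (189, 175, and 172 ms), none of which matches the 200 ms delay.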
The cross-correlation analyses in the current study revealed that the measured timing difference between the microphone signal and the headphone amplifier output signal was 224–240 and 324–340 ms for the delayed trials. It should be noted that the measured timing difference between the recorded microphone signal and the recorded headphone amplifier output signal did not capture the additional input hardware delay, which was probably less than 10 ms based on Kim, Wang, and Max [35]. Therefore, the induced delays were likely closer to 234–250 ms and 334–350 ms, which would correspond to fo modulation rates of approximately 4 Hz and 3 Hz, respectively. With two participants having an fo modulation rate close to 4 Hz, four participants having an fo modulation rate close to 4.5 Hz, three participants having an fo modulation rate close to 5 Hz, and one participant having an fo modulation rate close to 6 Hz, delays of about 250 ms, 225 ms, 200 ms, and 165 ms would have been required to align the phase of modulation in the microphone signal and headphone signal for all participants. For future studies, the delay between the voice output and the auditory input should be measured using the procedures described by Kim, Wang, and Max [35], and the duration of the induced delay should be aligned with each participant’s typical rate of fo modulation. Furthermore, because experimental hardware and software also induced delays in the control trials, future experiments should use normal auditory feedback for the control trials.
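To illustrate why a single fixed delay cannot be in phase for all participants, the following sketch expresses a delay as a phase offset within one modulation cycle. The 250 ms delay is taken from the upper end of the estimated range above, the four rates are the approximate participant groupings described in the text, and the function name `phase_offset_deg` is hypothetical.

```python
def phase_offset_deg(delay_ms, rate_hz):
    """Phase lag (degrees) of delayed feedback within one modulation cycle."""
    period_ms = 1000.0 / rate_hz
    # The remainder after whole cycles, as a fraction of a full cycle.
    return (delay_ms % period_ms) / period_ms * 360.0


delay_ms = 250.0  # assumed total delay, illustrative only
rates_hz = [4.0, 4.5, 5.0, 6.0]
offsets = {rate: phase_offset_deg(delay_ms, rate) for rate in rates_hz}

for rate, offset in offsets.items():
    print(f"{rate} Hz: feedback lags by {offset:.0f} degrees")
```

Under these assumptions, only a participant modulating at 4 Hz would hear feedback in phase with production; the other rates give offsets of roughly an eighth, a quarter, and half of a cycle.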
Conclusions
Bayesian modeling with data from ten classically-trained singers producing vibrato revealed that reducing the gain of auditory feedback with pink noise increased the extent of fo modulation, and reducing the gain of auditory feedback with multi-talker babble increased the variability of the fo modulation rate (i.e., jitter of fo modulation). Reducing the gain of auditory feedback did not affect the average rate of fo modulation. Altering the timing of auditory feedback with imposed delays did not affect the average extent or rate of fo modulation. These findings have implications for current reflex-resonance models of vocal vibrato and indicate that control of vibrato is affected by the gain of auditory feedback but may not be affected by the timing of auditory feedback.
Acknowledgements
We would like to thank Brad Story, PhD, for his previous assistance with the data analysis scripts and Melanie Looper and Elaina Derrick for their assistance with data analysis. This research was funded by the National Institute on Disability, Independent Living, and Rehabilitation Research Advanced Rehabilitation Research Training Grant 90AR5015 (PI L.R. Cherney); the National Institute on Deafness and Other Communication Disorders Early Career Research Award R21 DC017001 (PI R.A. Lester-Smith); and research funding provided by the Moody College of Communication at The University of Texas at Austin (R.A. Lester-Smith).
References
- [1] Prame E. Measurements of the vibrato rate of ten singers. The Journal of the Acoustical Society of America (1994) 96(4), 1979–1984.
- [2] Shipp T, Leanderson R, Sundberg J. Some acoustic characteristics of vocal vibrato. J Res Sing (1980) 4, 18–25.
- [3] Sundberg J. Acoustic and psychoacoustic aspects of vocal vibrato. In: Dejonckere PH, Hirano M, Sundberg J, eds. Vibrato. San Diego, CA: Singular Publishing Group, Inc.; 1995:35–62.
- [4] Seashore CE. The vibrato. In: University of Iowa studies in the psychology of music. Vol 1. Iowa City, IA: University Press; 1932.
- [5] Nix J, Perna N, James K, Allen S. Vibrato rate and extent in college music majors: a multicenter study. Journal of Voice (2016) 30(6), 756.e731–756.e741.
- [6] Titze IR, Story B, Smith M, Long R. A reflex resonance model of vocal vibrato. The Journal of the Acoustical Society of America (2002) 111(5), 2272–2282.
- [7] Guenther FH. A neural network model of speech acquisition and motor equivalent speech production. Biol Cybern (1994) 72(1), 43–53. http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=7880914. Published 1994/01/01.
- [8] Guenther FH, Ghosh SS, Tourville JA. Neural modeling and imaging of the cortical interactions underlying syllable production. Brain and language (2006) 96(3), 280–301.
- [9] Lester-Smith RA, Daliri A, Enos N, et al. The Relation of Articulatory and Vocal Auditory–Motor Control in Typical Speakers. Journal of Speech, Language, and Hearing Research (2020) 63(11), 3628–3642.
- [10] Burnett TA, Freedland MB, Larson CR, Hain TC. Voice F0 responses to manipulations in pitch feedback. The Journal of the Acoustical Society of America (1998) 103(6), 3153–3161.
- [11] Jones JA, Munhall KG. Perceptual calibration of F0 production: Evidence from feedback perturbation. Journal of the Acoustical Society of America (2000) 108(3), 1246–1251.
- [12] Zarate JM, Zatorre RJ. Neural substrates governing audiovocal integration for vocal pitch regulation in singing. Annals of The New York Academy of Sciences (2005) 1060, 404–408. doi:10.1196/annals.1360.058
- [13] Zarate JM, Zatorre RJ. Experience-dependent neural substrates involved in vocal pitch regulation during singing. NeuroImage (2008) 40(4), 1871–1887. doi:10.1016/j.neuroimage.2008.01.026
- [14] Jones JA, Keough D. Auditory-motor mapping for pitch control in singers and nonsingers. Experimental Brain Research (2008) 190(3), 279–287. doi:10.1007/s00221-008-1473-y
- [15] Leydon C, Bauer JJ, Larson CR. The role of auditory feedback in sustaining vocal vibrato. The Journal of the Acoustical Society of America (2003) 114(3), 1575–1581.
- [16] Brajot F-X, Lawrence D. Delay-induced low-frequency modulation of the voice during sustained phonation. The Journal of the Acoustical Society of America (2018) 144(1), 282–291.
- [17] Brajot F-X, Neiman AB. Vocal wow in an adapted reflex resonance model. The Journal of the Acoustical Society of America (2020) 147(3), 1822–1833.
- [18] Schultz-Coloun H, Battmer R. Quantitative evaluation of vibrato in singers. Folia Phoniatr (1981) 33, 1–14.
- [19] Shipp T, Sundberg J, Doherty ET. The effect of delayed auditory feedback on vocal vibrato. Journal of Voice (1988) 2(3), 195–199.
- [20] Ward D, Burns E. Singing without auditory feedback. J Res Sing (1978) 1(2), 4–44.
- [21] Deutsch JA, Clarkson JK. Nature of the vibrato and the control loop in singing. Nature (1959) 183, 167–168.
- [22] Scheerer NE, Tumber AK, Jones JA. Attentional demands modulate sensorimotor learning induced by persistent exposure to changes in auditory feedback. J neurophysiol (2016) 115(2), 826–832.
- [23] Tumber AK, Scheerer NE, Jones JA. Attentional demands influence vocal compensations to pitch errors heard in auditory feedback. PLoS One (2014) 9(10), e109968.
- [24] Ramig LA, Shipp T. Comparative measures of vocal tremor and vocal vibrato. Journal of Voice (1987) 1(2), 162–167.
- [25] Koda J, Ludlow CL. An evaluation of laryngeal muscle activation in patients with voice tremor. Otolaryngology–Head and Neck Surgery (1992) 107(5), 684–696.
- [26] Lester-Smith RA, Kim JH, Hilger A, Chan C-L, Larson CR. Auditory-Motor Control of Fundamental Frequency in Vocal Vibrato. Journal of Voice (2021).
- [27] R: A language and environment for statistical computing [computer program]. Version 4.0.5. Vienna, Austria: R Foundation for Statistical Computing; 2021.
- [28] RStudio: Integrated Development Environment for R [computer program]. Boston, MA: RStudio, PBC; 2021.
- [29] Carpenter B, Gelman A, Hoffman MD, et al. Stan: A probabilistic programming language. Journal of statistical software (2017) 76(1), 1–32.
- [30] Bürkner P. Advanced Bayesian multilevel modeling with the R package brms. The R Journal (2018) 10(1), 395–411. doi:10.32614/RJ-2018-017
- [31] Nalborczyk L, Batailler C, Lœvenbruck H, Vilain A, Bürkner P-C. An introduction to Bayesian multilevel models using brms: A case study of gender effects on vowel variability in standard Indonesian. Journal of Speech, Language, and Hearing Research (2019) 62(5), 1225–1242.
- [32] Barr DJ, Levy R, Scheepers C, Tily HJ. Random effects structure for confirmatory hypothesis testing: Keep it maximal. Journal of memory and language (2013) 68(3), 255–278.
- [33] Larson CR, Altman KW, Liu H, Hain TC. Interactions between auditory and somatosensory feedback for voice F0 control. Experimental Brain Research (2008) 187(4), 613–621.
- [34] Poeppel D, Assaneo MF. Speech rhythms and their neural foundations. Nature Reviews Neuroscience (2020) 21(6), 322–334.
- [35] Kim KS, Wang H, Max L. It’s about time: Minimizing hardware and software latencies in speech research with real-time auditory feedback. Journal of Speech, Language, and Hearing Research (2020) 63(8), 2522–2534.
