Journal of Speech, Language, and Hearing Research (JSLHR)
2020 Sep 24;63(10):3392–3407. doi: 10.1044/2020_JSLHR-19-00422

Compensatory Responses to Formant Perturbations Proportionally Decrease as Perturbations Increase

Ayoub Daliri, Sara-Ching Chao, Lacee C. Fitzgerald
PMCID: PMC8060011  PMID: 32976078

Abstract

Purpose

We continuously monitor our speech output to detect potential errors in our productions. When we encounter errors, we rapidly change our speech output to compensate for the errors. However, it remains unclear whether we adjust the magnitude of our compensatory responses based on the characteristics of errors.

Method

Participants (N = 30 adults) produced monosyllabic words containing /ɛ/ (/hɛp/, /hɛd/, /hɛk/) while receiving perturbed or unperturbed auditory feedback. In the perturbed trials, we applied two different types of formant perturbations: (a) the F1 shift, in which the first formant of /ɛ/ was increased, and (b) the F1–F2 shift, in which the first formant was increased and the second formant was decreased to make a participant's /ɛ/ sound like his or her /æ/. In each perturbation condition, we applied three participant-specific perturbation magnitudes (0.5, 1.0, and 1.5 ɛ–æ distance).

Results

Compensatory responses to perturbations with the magnitude of 1.5 ɛ–æ were proportionally smaller than responses to perturbation magnitudes of 0.5 ɛ–æ. Responses to the F1–F2 shift were larger than responses to the F1 shift regardless of the perturbation magnitude. Additionally, compensatory responses for /hɛd/ were smaller than responses for /hɛp/ and /hɛk/.

Conclusions

Overall, these results suggest that the brain uses its error evaluation to determine the extent of compensatory responses. The brain may also consider categorical errors and phonemic environments (e.g., articulatory configurations of the following phoneme) to determine the magnitude of its compensatory responses to auditory errors.


Speech production is dependent on the precise and coordinated movements of speech articulators. The brain uses a combination of processes to ensure the precision of speech movements. During planning, as the brain prepares appropriate control signals to generate speech movements, it also predicts the sensory consequences of the movements (Guenther, 2016; Houde & Nagarajan, 2011). It has been suggested that the brain uses efference copies of its control signals and internal forward models to predict sensory consequences of its movements (Houde & Nagarajan, 2011; Krakauer et al., 2019; Wolpert & Flanagan, 2016). Then, during speech production, the brain compares sensory feedback of the movements with its sensory prediction of the movements (e.g., Krakauer et al., 2019). One approach for examining these mechanisms in the laboratory setting is to generate discrepancies between participants' predictions and sensory feedback. For example, in an auditory feedback perturbation paradigm, participants produce target words while receiving a perturbed version of their auditory feedback that is created by systematically manipulating acoustic parameters of their speech (e.g., Houde & Jordan, 1998). Therefore, by examining how participants evaluate and respond to experimentally generated errors, one can elucidate sensorimotor processes that support accurate speech production.

Auditory feedback perturbations have been used in two different paradigms: adaptation and compensation (for a review, see Fuchs et al., 2019, pp. 16–76). In an adaptation paradigm, auditory perturbations are applied in consecutive trials. After several trials, participants develop adaptive responses to reduce auditory errors, and participants maintain the acquired adaptive responses (Abur et al., 2018; Ballard et al., 2018; Daliri & Dittman, 2019; Daliri & Max, 2018; Daliri et al., 2017; Houde & Jordan, 1998; Stepp et al., 2017). Thus, using this paradigm, one can examine how participants use errors in current and past trials to modify their feedforward motor commands for future productions. In a compensation paradigm, auditory perturbations are applied in a subset of trials that are randomly distributed such that consecutive trials are not necessarily perturbed (Burnett et al., 1997; Liu & Larson, 2007; Purcell & Munhall, 2006; Tourville et al., 2008). During a perturbed trial, participants generate online compensatory responses. Typically, compensatory responses are in the opposite direction of the perturbations and start within 100–200 ms after the onset of the perturbations. Compensatory responses are often interpreted as reactive or reflexive responses that are generated to reduce or eliminate the perceived auditory errors (Burnett et al., 1997; Cai et al., 2012; Guenther, 2016; Perkell, 2012). Therefore, the compensation paradigm is suitable for elucidating sensorimotor processes involved in feedback control mechanisms of speech production (i.e., error detection, evaluation, and correction during speaking). Overall, auditory feedback perturbations are powerful techniques that can be used to elucidate feedforward or feedback control mechanisms underlying speech production.

By combining these behavioral paradigms with neural recording techniques, many studies have examined neural substrates and mechanisms that support error-driven changes during voice and speech production. For example, electrophysiological studies have shown that brain responses evoked by self-produced speech sounds are attenuated in comparison with responses evoked by listening to the speech sounds (Houde et al., 2002; Niziolek et al., 2013). Importantly, experimentally inducing auditory errors by manipulating auditory feedback results in a reduction (and even elimination) of the auditory attenuation (Behroozmand & Larson, 2011; Guo et al., 2017; Heinks-Maldonado et al., 2006; Liu et al., 2011). Typically, such results are interpreted as evidence for efference copy–mediated modulation of the auditory system during production, that is, direct effects of the motor system on the auditory system (for a review, see Max & Daliri, 2019). Neuroimaging studies have further examined auditory–motor integration by elucidating neural substrates involved in error detection and correction during production (Guenther, 2016). For example, the primary auditory cortex, superior temporal gyrus, ventral sensorimotor cortex, and inferior frontal gyrus are typically considered among the core neural networks that support auditory error detection and correction (Ballard et al., 2018; Daliri et al., 2020; Tourville et al., 2008). Together, using behavioral and neural measures, studies have shown that the brain monitors its output to detect potential errors during production (such as errors induced by auditory perturbations); when it detects errors, the brain (a) learns from the errors by adapting its movements to reduce auditory errors and (b) uses the errors to generate compensatory responses to minimize the perceived errors during speech production.

In a recent study, we used an adaptation paradigm to examine error evaluation processes (Daliri & Dittman, 2019). We found that the brain evaluates auditory errors and responds to auditory errors based on the relevance of the errors to the current task. Additionally, MacDonald et al. (2010) have reported that adaptive responses decrease when perturbations are substantial. Studies of limb motor control have also reported similar relationships between the magnitude of sensory perturbations and the extent of adaptive responses (for a review, see Krakauer et al., 2019). It has been suggested that, when errors become too large or task irrelevant, the brain may assign them to external sources and consider them with a lower weight; the brain becomes less responsive to those errors (Wolpert & Flanagan, 2016). In an adaptation paradigm, the brain is exposed to perturbations in several trials; therefore, the brain gathers information (from several trials) to evaluate the errors and to gradually develop its adaptive responses to the errors. However, it is not clear if the brain uses similar error evaluation processes during online monitoring (e.g., in a compensation paradigm) when the brain needs to generate an online (within-trial) corrective response to compensate for the errors. There is very limited evidence that the brain may use error evaluation processes to generate compensatory responses (Behroozmand & Larson, 2011; Korzyukov et al., 2017; Liu & Larson, 2007). For example, Liu and Larson (2007) used a “pitch-reflex” paradigm (which has a similar design to the compensation paradigm) and examined compensatory responses to perturbations of the fundamental frequency with different magnitudes during sustained phonations. They reported that participants responded proportionally less to larger perturbations in comparison with smaller perturbations. However, sensorimotor processes underlying the control of suprasegmental features (e.g., fundamental frequency) may differ from processes underlying the control of segmental features (e.g., formant frequencies). Generally, suprasegmental features are more susceptible to change in response to perturbations (for a review, see Perkell, 2012). For example, suprasegmental features quickly change in response to change in background noise or to the removal of auditory feedback (e.g., turning off a hearing aid device for individuals with hearing impairments); however, this is not the case for segmental features (Perkell, 2012; Pittman et al., 2018). Therefore, in this study, we investigated error evaluation processes involved in the control of formants (i.e., segmental features) by examining compensatory responses to formant perturbations with different magnitudes.

While many studies have examined adaptive responses to formant perturbations (for a review, see Fuchs et al., 2019, pp. 16–76), very few studies have examined compensatory responses to formant perturbations (Cai et al., 2014, 2012, 2011; Niziolek & Guenther, 2013; Parrell et al., 2017; Purcell & Munhall, 2006; Reilly & Dougherty, 2013; Tourville et al., 2008). These studies have used perturbations of the first formant (F1) or the second formant (F2) alone (Cai et al., 2014, 2012, 2011; Parrell et al., 2017; Purcell & Munhall, 2006; Reilly & Dougherty, 2013; Tourville et al., 2008), or simultaneous perturbations of both F1 and F2 (Niziolek & Guenther, 2013). One of the rationales for perturbing both F1 and F2 is to generate categorical errors (e.g., perturbing /ɛ/ to sound like /æ/). Given that previous studies have not compared compensatory responses to perturbations of a single formant with responses to simultaneous perturbations of F1 and F2, it is not clear whether participants respond differently to auditory errors generated by these two types of perturbations. Overall, to address these knowledge gaps, we used a compensation paradigm in which we perturbed each participant's /ɛ/ in two different directions by manipulating F1 only (F1 shift) or both F1 and F2 (F1–F2 shift). In each perturbation direction, we applied three different perturbation magnitudes that were based on the participant's ɛ–æ distance (0.5, 1.0, and 1.5 ɛ–æ). In the F1 shift, F1 was increased; in the F1–F2 shift, a concurrent increase in F1 and decrease in F2 was applied to generate a categorical error (/ɛ/ was perturbed to sound like /æ/). We hypothesized that (a) if the sensorimotor processes of the auditory–motor compensation evaluate errors, then the extent of compensatory responses to larger auditory errors would be proportionally less than the extent of responses to smaller errors (as larger errors are more likely to be related to external sources than the speech output) and that (b) if mechanisms of the auditory–motor compensation calculate categorical errors to guide the speech output, then the extent of compensatory responses to the F1–F2 shift would be larger than the extent of responses to the F1 shift (as the F1–F2 shift is more likely to produce a categorical error). Additionally, previous studies have suggested that a participant's compensatory response is related to her perceptual acuity. Therefore, we used two methods (categorical perception and just noticeable difference [JND]) to measure each participant's perceptual acuity. We used correlation coefficients to examine potential relationships between compensatory responses and perceptual acuity.

Method

Participants

We recruited 30 adult participants for this study (11 men; age range: 19–43.81 years, M = 23.44, SD = 4.98). Four participants were left-handed, based on participants' self-reports of handedness. Participants were screened to ensure they met the following inclusion criteria: (a) no history of neurological, psychological, or speech-language disorders; (b) not taking any medications with direct effects or side effects on the sensorimotor system; (c) being a native speaker of American English; and (d) having normal binaural hearing. For hearing screening, we used an audiometer (Madsen Itera II, Natus) and followed the standard pure-tone hearing screening procedures recommended by the American Speech-Language-Hearing Association (1997). All participants had pure-tone hearing thresholds of less than 20 dB HL at all octave frequencies from 250 to 8000 Hz. All participants signed a written consent form before the experiment. The institutional review board at Arizona State University approved all study protocols.

Apparatus

The study was conducted inside a double-walled, sound-attenuating booth. Participants were comfortably seated in a chair in front of a 24-in. LCD monitor (with a refresh rate of 60 Hz). A cardioid dynamic microphone (SM58, Shure) was placed approximately 15 cm away from the participant's mouth (at an approximately 45° angle). The microphone signal was amplified (TubeOpto 8, ART) and transmitted to a computer via an external audio interface (UltraLite-mk3 Hybrid, MOTU). The computer was used (a) to manipulate auditory feedback, (b) to record the input (microphone) and output (auditory feedback) signals, and (c) to present target words and visual feedback on the monitor (see Figure 1A). The output of the audio interface was amplified (S-phone, Samson Technologies) and binaurally played back to the participant via insert earphones (ER-1, Etymotic Research). Before each experiment, the input–output level (microphone to insert earphones) was calibrated, such that the output signal was played at 5 dB higher than the input signal (Abur et al., 2018; Daliri & Max, 2015, 2016).

Figure 1.

(A) Participants produced monosyllabic words containing /ɛ/ while receiving perturbed or unperturbed auditory feedback. (B) Each participant completed a compensation task in which we perturbed the first formant (F1) and the second formant (F2). A given participant's formants of /ɛ/ were perturbed in two directions: F1 shift and F1–F2 shift. In the F1 shift, F1 was increased. In the F1–F2 shift, perturbations were applied along the participant-specific ε–æ line such that a produced /ε/ for the participant would sound like her /æ/ (increase in F1 and decrease in F2). For each perturbation direction, we applied three perturbation magnitudes that were calculated based on the participant-specific ε–æ distance: 0.5, 1.0, and 1.5 ε–æ distance. In other words, the same perturbation magnitudes were applied in the F1 shift and the F1–F2 shift.

For formant tracking and real-time formant manipulation (approximately 18 ms of input–output delay), we used Audapter (Cai, 2015). Audapter's source code is written in C++ and can be invoked from MATLAB (MathWorks). For formant tracking, Audapter uses linear predictive coding analysis in combination with dynamic programming. Similar to our previous studies (Chao et al., 2019; Daliri & Dittman, 2019), we used linear predictive coding orders of 17 and 15 for male and female participants, respectively. We also used a sampling rate of 48 kHz and a downsampling factor of 3.
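
The paper does not include analysis code; as a rough illustration of the kind of LPC-based formant estimation Audapter performs (not Audapter's actual implementation), the core root-finding step can be sketched in MATLAB. Here, `frame` is an assumed windowed vowel segment, and the pre-emphasis coefficient is a typical value.

```matlab
% Illustrative sketch (not Audapter's actual code): estimating formants
% from a single vowel frame via LPC root-finding, the same family of
% analysis Audapter performs in real time.
fs = 48000 / 3;                        % effective rate after the downsampling factor of 3
p  = 17;                               % LPC order for male participants (15 for female)
x  = frame(:);                         % "frame": an assumed vowel segment (column vector)
x  = filter([1, -0.98], 1, x);         % typical pre-emphasis
a  = lpc(x .* hamming(numel(x)), p);   % LPC coefficients of the windowed frame
r  = roots(a);
r  = r(imag(r) > 0);                   % keep one root per conjugate pair
F  = sort(angle(r) * fs / (2 * pi));   % root angles -> frequencies (Hz)
% After discarding spurious low-frequency/wide-bandwidth roots,
% F(1) and F(2) approximate F1 and F2.
```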

Procedure

The experiment was completed in one session that took less than 2 hr. Participants completed several blocks with short breaks between consecutive blocks: one block of a training task, one block of a pretest task, 10 blocks of a compensation task, four blocks of a vowel discrimination task, and one block of a vowel categorization task.

Training Task

The purposes of the training task were (a) to familiarize participants with the experimental setup (i.e., speaking into the microphone and hearing their speech output via the insert earphones) and (b) to train participants to produce words with the duration and intensity within desired ranges (450–700 ms; 72–82 dB SPL). In each trial of the training task, a consonant–vowel–consonant word containing /ɛ/ (/hɛp/, /hɛd/, and /hɛk/) appeared on the monitor. Each trial lasted 2.5 s, with a break (1–1.5 s) between two consecutive trials. The training task consisted of 30 trials (10 repetitions of each of the three target words), and the order of the presentation of target words was randomized. After the completion of each trial, participants received visual feedback about the duration and intensity of the produced word.
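
A minimal sketch of the per-trial duration and intensity check is given below; this is an assumed implementation (the paper specifies only the target ranges), with `sig` a segmented production and `pRef` a hypothetical calibration constant mapping RMS amplitude to dB SPL.

```matlab
% Assumed implementation of the per-trial checks against the
% target ranges (450-700 ms; 72-82 dB SPL).
durMs = 1000 * numel(sig) / fs;              % production duration (ms)
level = 20 * log10(rms(sig) / pRef);         % intensity (dB SPL, via calibration)
durOK = durMs >= 450 && durMs <= 700;
lvlOK = level >= 72 && level <= 82;
% After each trial, visual feedback flagged out-of-range duration/intensity.
```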

Pretest Task

The purpose of the pretest task was to determine participant-specific vowel centroids for the front vowels /ɪ/, /ε/, and /æ/. This task consisted of one block of 75 trials (25 repetitions of each vowel in the context of /hVp/). The overall design of this task was like the design of the training task except that participants received visual feedback only if their productions were outside the desired intensity range or duration range. Immediately after the completion of the pretest task, we used a MATLAB script to automatically extract formant trajectories (estimated by Audapter) for each production. We then calculated the F1 and F2 averages for each production. Using the extracted formant values from all productions, we estimated vowel centroids (i.e., the center of a vowel distribution in the F1–F2 coordinates) for /ɪ/, /ε/, and /æ/. We ensured the accuracy of the automatically estimated vowel centroids by plotting and visually inspecting the extracted formant values for all productions, as well as the calculated vowel centroids. We also determined the /hεp/ production that was closest to the centroid of /ε/ (hereafter called “median production”). The median production was used in the vowel discrimination task and the vowel categorization task. We used the vowel centroids (a) to estimate a participant-specific vowel configuration (i.e., the distance between /ε/ and /æ/ centroids as well as the angle of the line connecting /ε/ and /æ/ centroids) that was used to determine participant-specific formant perturbations in the compensation task and (b) to improve the precision of Audapter's formant tracking by entering F1 and F2 of centroids as participant-specific initial values for F1 and F2 tracking (Daliri & Dittman, 2019).
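
The geometric quantities derived from the pretest (vowel centroids, the ε–æ distance and angle, and the median production) reduce to simple vector arithmetic; the sketch below uses illustrative variable names, not the authors' code.

```matlab
% Illustrative sketch of the participant-specific vowel-space geometry.
% F1eps/F2eps are per-trial average formants of the /ɛ/ productions;
% F1ae/F2ae are the corresponding values for /æ/.
cEps = [mean(F1eps), mean(F2eps)];           % /ɛ/ centroid in F1-F2 space (Hz)
cAe  = [mean(F1ae),  mean(F2ae)];            % /æ/ centroid
v    = cAe - cEps;                           % vector from /ɛ/ to /æ/
epsAeDist  = norm(v);                        % ɛ-æ distance (Hz)
epsAeAngle = atan2(v(2), v(1));              % angle of the ɛ-æ line (rad)
% "Median production": the /hɛp/ token closest to the /ɛ/ centroid.
[~, iMedian] = min(hypot(F1eps - cEps(1), F2eps - cEps(2)));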

Compensation Task

The purpose of this task was to measure participants' compensatory responses to distinct formant perturbations. This task consisted of 10 blocks of 77 trials. Each block consisted of three types of trials: unperturbed word-reading trials, perturbed word-reading trials, and sentence-reading trials. In the unperturbed and perturbed word-reading trials, participants produced monosyllabic words that contained /ε/ (/hɛp/, /hɛd/, and /hɛk/). The order of words was randomized in each block. In the perturbed word-reading trials (21 trials per block), formant perturbations were applied, and no formant perturbations were applied in the unperturbed trials (42 trials per block). The perturbed trials were distributed such that there were one to four unperturbed trials before each perturbed trial. The rationale for this design was to reduce motor learning effects due to exposure to perturbations. In the sentence-reading trials (14 trials per block), participants produced a sentence while receiving unperturbed auditory feedback. The sentence in each trial was randomly selected from a list of sentences adopted from the Harvard word list. The sentence-reading trials were randomly distributed throughout the block. The rationale for including sentence-reading trials was to ensure that participants do not predict perturbed trials and to reduce the effects of exposure to perturbations in previous perturbed trials (motor learning).
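
One way to generate a block satisfying the stated constraints (one to four unperturbed trials before each perturbed trial, sentence trials interspersed at random) is sketched below; the paper does not describe its randomization code, so this is an assumed implementation.

```matlab
% Assumed block builder: 21 perturbed and 42 unperturbed word-reading
% trials, 1-4 unperturbed trials before each perturbed trial, plus 14
% sentence-reading trials at random positions (77 trials total).
nPert = 21; nUnpert = 42; nSent = 14;
gaps  = ones(1, nPert);                      % >= 1 unperturbed trial before each perturbed one
extra = nUnpert - nPert;                     % remaining unperturbed trials to distribute
while extra > 0
    k = randi(nPert);
    if gaps(k) < 4, gaps(k) = gaps(k) + 1; extra = extra - 1; end
end
seq = {};
for k = 1:nPert
    seq = [seq, repmat({'unpert'}, 1, gaps(k)), {'pert'}]; %#ok<AGROW>
end
for k = 1:nSent                              % intersperse sentence trials
    pos = randi(numel(seq) + 1);
    seq = [seq(1:pos-1), {'sentence'}, seq(pos:end)];
end
```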

To design the formant perturbations, we used each participant's ε–æ distance (in hertz) and ε–æ line, which were calculated based on the participant's pretest data. We calculated three perturbation magnitudes for each participant: 0.5, 1.0, and 1.5 times the participant-specific ε–æ distance. We then used these perturbation magnitudes in two types of formant perturbations: F1 shift and F1–F2 shift. In the F1 shift, F1 was increased by 0.5, 1.0, and 1.5 ε–æ distance, and F2 was not perturbed. In the F1–F2 shift, concurrent F1 and F2 perturbations were applied along the participant-specific ε–æ line. The perturbation magnitudes for F1 and F2 were designed such that the overall perturbation would shift the participant's ε-centroid to points in the F1–F2 coordinates that were at 0.5, 1.0, or 1.5 ε–æ distance away from the centroid of /ε/. In other words, the F1–F2 shift perturbed the participant's /ε/ toward her /æ/ (F1 was increased, and F2 was decreased) by 0.5, 1.0, and 1.5 ε–æ distance. Thus, as shown in Figure 1B, participants experienced perturbations with similar magnitudes but in two different directions (F1 shift and F1–F2 shift). Five out of 10 compensation blocks (randomly selected) included the F1 shift, and the remaining five blocks included the F1–F2 shift. These two perturbation directions were selected because previous studies have used either F1 perturbation (Cai et al., 2012; Parrell et al., 2017; Purcell & Munhall, 2006; Reilly & Dougherty, 2013; Tourville et al., 2008) or simultaneous perturbations of both F1 and F2 (Niziolek & Guenther, 2013). Given that previous studies have not compared compensatory responses to these two perturbation directions, we examined participants' responses to the two perturbation directions. However, because we designed the F1–F2 shift based on participant-specific ε–æ angle and ε–æ distance, an inherent limitation of this technique is that the F1 shift and the F1–F2 shift could become very similar for participants with a small ε–æ angle (see Figure 1B).
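
Under these definitions, the two perturbation directions amount to mapping the same three magnitudes onto different (ΔF1, ΔF2) pairs in the F1–F2 plane; a sketch reusing `epsAeDist` and `epsAeAngle` from the pretest sketch above (illustrative names):

```matlab
% Sketch of the two perturbation directions with participant-specific
% magnitudes (same magnitudes, different directions in the F1-F2 plane).
mags = [0.5, 1.0, 1.5] * epsAeDist;          % perturbation magnitudes (Hz)
for m = mags
    dF1_F1shift   = m;                       % F1 shift: F1 up, F2 untouched
    dF2_F1shift   = 0;
    dF1_F1F2shift = m * cos(epsAeAngle);     % F1-F2 shift: along the ɛ-æ line,
    dF2_F1F2shift = m * sin(epsAeAngle);     %   so F1 increases and F2 decreases
end
```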

Vowel Discrimination Task

The purpose of the vowel discrimination task was to estimate the smallest formant perturbation that each participant can perceive (i.e., JND). This task consisted of four blocks of 35 trials. In each trial, we played the participant-specific median production (at 75 dB SPL), followed by a formant-perturbed version of the same token (with a 1-s interstimulus interval). Then, participants were asked to indicate (using a keyboard) whether or not the two stimuli sounded different (i.e., whether or not they perceived the perturbation). We used a weighted one-up, two-down staircase method (Kingdom & Prins, 2016) to determine the magnitude of the formant perturbation applied in each trial. The downward step size was equal to 5% of the participant-specific ε–æ distance, and the upward step size was equivalent to 9.1% of the participant-specific ε–æ distance. The one-up, two-down procedure with this ratio of downward step size to upward step size (0.5488) would converge to the 80% threshold; that is, participants would perceive the perturbation on 80% of trials (Kingdom & Prins, 2016).
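
A minimal sketch of the weighted one-up, two-down staircase, with the step sizes from the text expressed as percentages of the ε–æ distance; `runTrial` is a hypothetical helper that plays the stimulus pair and returns whether the participant reported a difference.

```matlab
% Weighted one-up, two-down staircase (minimal sketch, assumed bookkeeping).
down = 5.0; up = 9.1;                % down/up ratio ~0.5488 -> ~80% convergence
pert = 100;                          % starting magnitude (0% in the other two blocks)
nDiff = 0;
for t = 1:35
    if runTrial(pert)                % "different" response
        nDiff = nDiff + 1;
        if nDiff == 2                % two consecutive "different" -> step down
            pert  = max(pert - down, 0);
            nDiff = 0;
        end
    else                             % "same" response -> step up
        pert  = pert + up;
        nDiff = 0;
    end
end
```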

Because formant perturbations in two directions (F1 shift and F1–F2 shift) were used in the compensation task, we estimated participants' JND values for formant perturbations that involved the F1 shift and the F1–F2 shift. In two out of the four blocks of the discrimination task (one block of the F1 shift and one block of the F1–F2 shift), formant perturbation magnitude started from 100% to reduce perceptual bias; in the other two blocks, formant perturbation magnitude started from 0% (i.e., no perturbation was applied in the first trial). After every two blocks of the compensation task, there was one block of the discrimination task, and the order of the blocks of the discrimination task was randomized for each participant. All participants completed a few trials of the discrimination task to become familiar with the overall procedure of the task.

Vowel Categorization Task

The purpose of this task was to estimate the categorical boundary between /ε/ and /æ/ for each participant. This task consisted of one block (60 trials) that was conducted after the completion of all blocks of the compensation task. Similar to our previous studies (Chao et al., 2019; Daliri & Dittman, 2019), we used the median production and shifted its first and second formant frequencies in five equal increments along the participant-specific ε–æ line, resulting in six auditory stimuli (including the median production). In each trial, one of the generated auditory stimuli was binaurally presented (at 75 dB SPL), and participants were instructed to indicate (using a keyboard) whether they heard /hεp/ or /hæp/. The order of the auditory stimuli was randomized, and each stimulus was presented 10 times.

Data Analysis

Compensation Task

We inspected all trials of this task to exclude trials with production errors (e.g., mispronunciations) or formant tracking errors. Using the spectrogram of the speech signal, we manually defined the onset and offset of the vowel for each production. Then, we extracted the trajectories of F1 and F2 from the initial 400 ms of the vowel. This duration was selected to be consistent with previous studies (Niziolek & Guenther, 2013; Parrell et al., 2017). It should be noted that, in the training task, participants were trained to produce vowels with the duration in the range of 450–700 ms. We excluded trials with a duration less than 400 ms (M = 7.62% of all trials, SD = 7.18%). To ensure that our results were not biased by the number of trials in each condition, we conducted several Wilcoxon signed-ranks tests to compare the number of excluded trials in all conditions. All the statistical comparisons were nonsignificant (p > .149 in all cases). Given that we were interested in change in formants in response to the perturbations, the average trajectory of all unperturbed trials for each target word was subtracted from the trajectory of perturbed trials for that word to calculate formant change (Niziolek & Guenther, 2013; Parrell et al., 2017). Figure 2 shows the group-average F1-change (solid lines) and F2-change (dashed lines) trajectories (averaged over all perturbation magnitudes and target words) for the F1 shift (Panel A) and the F1–F2 shift (Panel B). This procedure (i.e., subtraction of the average of unperturbed trials from perturbed trials) was similar to the procedure that was used in all previous studies (Cai et al., 2014, 2012, 2011; Niziolek & Guenther, 2013; Parrell et al., 2017; Purcell & Munhall, 2006; Reilly & Dougherty, 2013; Tourville et al., 2008). To ensure that participants' responses in unperturbed trials were not influenced by exposure to perturbations in previous perturbed trials, we compared formant values of unperturbed trials immediately before a perturbed trial with formant values of unperturbed trials immediately after a perturbed trial (note that there were one to four unperturbed trials before each perturbed trial). If exposure to a perturbation would result in trial-to-trial learning, then the formants of the unperturbed trials after a perturbation would be different from formants of the unperturbed trials before the perturbation. For this purpose, we used F1 and F2 produced in the first 100 ms of the vowel, as formant values in this window are influenced by changes in feedforward motor commands more than feedback-driven changes (Niziolek et al., 2013; Parrell et al., 2017). However, formant values of trials before and after perturbations were not statistically significantly different: F1 values, t(29) = 1.815, p = .080; F2 values, t(29) = 0.434, p = .668. These results suggested that the exposure to perturbations in the perturbed trials did not result in a measurable change in feedforward motor commands, and thus, it was valid to subtract the average of unperturbed trials from the perturbed trials to estimate compensatory responses to the perturbations.
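
The core normalization step (subtracting the word-specific average unperturbed trajectory from each perturbed trial) is a one-line operation once trajectories are stacked into matrices; the array shapes below are assumed.

```matlab
% Sketch of the formant-change computation for one target word.
% Fpert:   nPertTrials x nSamples matrix of formant trajectories
%          (first 400 ms of the vowel); Funpert: unperturbed trials.
meanUnpert    = mean(Funpert, 1);            % average unperturbed trajectory
formantChange = Fpert - meanUnpert;          % implicit expansion (MATLAB R2016b+)
```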

Figure 2.

The average trajectory of unperturbed trials was subtracted from the trajectory of perturbed trials to calculate formant change in response to perturbations. Panels A and B show the group-average F1-change (solid lines) and F2-change (dashed lines) trajectories for the F1 shift and the F1–F2 shift, respectively (colored shaded areas correspond to standard errors). For each participant, we calculated the difference between average formant changes in the last 100-ms window and the first 100-ms window (gray shaded areas in Panels A and B). Polar plots in Panels C and D show the average formant change across all perturbation magnitudes and all target words for the F1 shift and the F1–F2 shift, respectively. These polar plots show that, regardless of the perturbation type, participants responded to the perturbations by changing both F1 (first formant) and F2 (second formant) toward their /ɪ/. Panel E shows the configuration of the front vowels /ɪ/ and /æ/ (relative to /ε/) for all participants (based on the data from the pretest task). Given the similarity of responses to both formant perturbation directions, we projected formant-change trajectories to the participant-specific ε–ɪ line (Panel F). This procedure decomposed formant-change trajectories into two components: compensation response and deviation response. Negative values of the compensation response indicated responses in the direction toward /ɪ/. Negative values of the deviation response indicated responses toward the inside of the vowel space.

After subtracting the average of unperturbed trials from perturbed trials, we calculated the difference between formant changes in the last 100-ms window (300–400 ms) and the first 100-ms window (0–100 ms) for each perturbed trial. Polar plots in Figures 2C and 2D show the average formant change (across all perturbation magnitudes and target words) in the F1 shift and the F1–F2 shift, respectively. Initially, we planned to examine responses in the direction opposite the perturbations, that is, examining F1 for the F1 shift and examining the projection of formant changes to the participant-specific ε–æ line for the F1–F2 shift. However, visual inspections of these results revealed that, regardless of the perturbation direction (F1 shift or F1–F2 shift), participants responded to the perturbations by changing both F1 and F2 such that they would produce a vowel that was toward their /ɪ/. Figure 2E shows the configuration of the front vowels /ɪ/ and /æ/ (relative to /ε/) for all participants (based on the data from the pretest task). Given the similarity in the patterns of F1 and F2 changes in response to the F1 shift and F1–F2 shift (see Figures 2C and 2D), we projected formant-change trajectories to the participant-specific ε–ɪ line (see Figure 2F). This procedure decomposed the responses (vectors consisting of F1 and F2 trajectories) into two components: The first component was parallel to the participant-specific ε–ɪ line (hereafter called "compensation response"; measured in hertz), and the second component was orthogonal to the ε–ɪ line (hereafter called "deviation response"; measured in hertz). Negative values of the compensation response indicated responses in the direction toward /ɪ/. Negative values of the deviation response indicated responses toward the inside of the vowel space (see Figure 2E). Note that compensation responses are more appropriate for addressing our hypotheses; however, we also analyzed deviation responses for consistency and clarity purposes. Finally, to be able to meaningfully compare compensation responses in different perturbation magnitudes and across participants, we normalized compensation responses by the magnitude of the formant perturbation in each of the conditions and for each participant. Overall, as dependent variables, we calculated the difference between the last 100-ms window and the first 100-ms window of the trajectories of the (a) compensation responses, (b) normalized compensation responses, (c) deviation responses, and (d) normalized deviation responses.
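
The projection that decomposes each formant-change sample into compensation and deviation components can be sketched as follows (illustrative names; signs oriented to match the paper's convention, with toward-/ɪ/ compensation negative).

```matlab
% Decomposition onto the participant-specific ɛ-ɪ line. dF1 and dF2 are
% formant-change trajectories; cI and cEps are the /ɪ/ and /ɛ/ centroids.
u = (cI - cEps) / norm(cI - cEps);           % unit vector from /ɛ/ toward /ɪ/
w = [-u(2), u(1)];                           % orthogonal unit vector
comp = -(dF1 * u(1) + dF2 * u(2));           % compensation (Hz); negative = toward /ɪ/
dev  =  (dF1 * w(1) + dF2 * w(2));           % deviation (Hz); sign oriented so that
                                             %   negative points into the vowel space
compNorm = 100 * comp / pertMag;             % normalized by perturbation magnitude (%)
```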

Lastly, we defined response latency as the time point at which a given participant started to change her formants in response to the perturbations. To identify response latency for the compensation responses and normalized compensation responses, we used a MATLAB function ("findchangepts") to find one change point for each response trajectory. We used this function to apply a two-piece linear regression model to the compensation responses and normalized compensation responses. This function divides a trajectory into two segments, fits a regression line to each segment, and calculates the total residual error for the fitted lines. It repeats this procedure for different division points and uses dynamic programming to find a division point at which the fitted lines of the two segments are statistically different and the total residual error is minimal. In other words, this procedure finds a point at which the linear trend of the trajectory substantially changes. If the two lines are not different, the function does not return a change point. The function did not find a change point for approximately 8% of all trials. We used the returned change point as the response latency. This method is similar to the method used in previous studies (Cai et al., 2012; Parrell et al., 2017). To ensure the accuracy of the calculated change points, (a) we visually inspected the fitted lines, and (b) we calculated the goodness of fit for all fitted lines. Our analysis of the goodness of fit showed a satisfactory fit to the data for all participants (R²: M = 0.82, SD = 0.03, range: 0.77–0.90). Given that the normalized compensation response was an amplitude-scaled version of the compensation response, its response latency was identical to that of the compensation response. As a dependent variable, we entered the response latency in our statistical analyses.
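
A minimal call consistent with this description uses findchangepts (Signal Processing Toolbox) with a two-piece linear model and at most one change point; `frameRate` (formant frames per second) is an assumption for converting the returned index to milliseconds.

```matlab
% Latency estimation via a single linear-trend change point.
ipt = findchangepts(compTraj, 'Statistic', 'linear', 'MaxNumChanges', 1);
if ~isempty(ipt)                             % empty if no change point is found
    latencyMs = 1000 * ipt / frameRate;      % change-point index -> latency (ms)
end
```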

Vowel Discrimination Task

To calculate JND for each block of this task, we averaged perturbation magnitudes applied in the last 20 trials. We then averaged the JND values of the two blocks of each perturbation type (F1 shift and F1–F2 shift), resulting in two JND values for each participant (corresponding to each of the perturbation types).

Vowel Categorization Task

First, we calculated each participant's proportion of ε-responses for each of the six auditory stimuli. Then, using a maximum likelihood criterion, we fitted a logistic psychometric function to each participant's data (R² > .85 for all participants). Based on the fitted psychometric functions, we calculated each participant's perceptual boundary at 50% (i.e., a stimulus that the participant perceived as /ε/ on 50% of trials). As the final value of the perceptual boundary, we then calculated the Euclidean distance of the categorical boundary's formant frequencies from those of the ε-centroid and normalized the distance based on the participant-specific ε–æ distance. In other words, similar to JND values, the categorical boundary was expressed as a percentage of the participant-specific ε–æ distance.
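
The paper states a maximum-likelihood logistic fit without naming the routine; one standard MATLAB route to the 50% boundary is glmfit with a binomial model (a sketch under that assumption).

```matlab
% Logistic psychometric fit and 50% boundary (assumed use of glmfit).
% x: the six stimulus positions along the ɛ-æ line (% of ɛ-æ distance);
% nEps: number of /hɛp/ responses out of 10 presentations per stimulus.
b = glmfit(x(:), [nEps(:), 10 * ones(6, 1)], 'binomial', 'link', 'logit');
boundary = -b(1) / b(2);                     % 50% point: P(/ɛ/ response) = 0.5
```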

Statistical Analysis

All participants completed six conditions of the perturbation task in which they produced three target words in a 2 × 3 × 3 design (two perturbation directions, three perturbation magnitudes, and three target words). We entered five dependent variables into the statistical analyses: compensation response, deviation response, normalized compensation response, normalized deviation response, and compensation latency. Because there were four left-handed participants, we examined their responses in comparison with the right-handed participants' responses. Our visual inspection showed that the left-handed participants' responses were well within the range of the responses of the right-handed participants. Thus, we entered all participants into the statistical analyses. We fitted a linear mixed-effects model to the data for each dependent variable using the lme4 package (Bates et al., 2015). For each model, we used perturbation direction (two levels; F1 shift and F1–F2 shift), perturbation magnitude (three levels; 0.5, 1.0, and 1.5 ε–æ distance), and target word (three levels; /hɛp/, /hɛd/, and /hɛk/) as fixed effects and participant as a random effect (random intercept). To determine the statistical significance of the fixed effects, we used the lmerTest R package, with Satterthwaite's method for determining the degrees of freedom (Kuznetsova et al., 2017). To assess statistically significant effects, we conducted post hoc pairwise comparisons using the emmeans package with Tukey's method for multiple-comparison correction (Lenth, 2019). We used the Kenward–Roger method to determine the degrees of freedom of the post hoc tests. Given that this analysis examined between-condition differences, we conducted a series of a priori planned comparisons (using uncorrected one-sample t tests) to determine if compensation responses and normalized responses in each of the conditions and for each of the target words were statistically different from zero (i.e., participants compensated in response to formant perturbations).
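
The models were fitted in R with lme4/lmerTest and emmeans; for readers working in MATLAB, a rough near-equivalent with fitlme is sketched below (Tukey-adjusted post hoc contrasts would need additional handling).

```matlab
% Near-equivalent of the R analysis using fitlme (Statistics and Machine
% Learning Toolbox). tbl: table with columns response, direction,
% magnitude, word, participant.
tbl.direction   = categorical(tbl.direction);    % F1 shift vs. F1-F2 shift
tbl.magnitude   = categorical(tbl.magnitude);    % 0.5, 1.0, 1.5 ɛ-æ
tbl.word        = categorical(tbl.word);         % hɛp, hɛd, hɛk
tbl.participant = categorical(tbl.participant);
mdl = fitlme(tbl, 'response ~ direction*magnitude*word + (1|participant)');
anova(mdl, 'DFMethod', 'satterthwaite')          % F tests for the fixed effects
```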

To assess the potential effects of the speech perception system on the speech production system, we calculated Pearson correlation coefficients between perceptual measures (JND and categorical boundary) and compensatory measures (compensation and normalized compensation responses) using the psych package (Revelle, 2018). We used R Version 4.0.0 to conduct all statistical analyses.

Results

Response Analysis

As shown in Figures 2A–2D, regardless of the direction of the formant perturbation, participants responded to the perturbations by changing both F1 and F2 toward their /ɪ/. Therefore, we decomposed formant changes into two components (compensation and deviation responses, measured in hertz) by projecting formant-change trajectories to the participant-specific ε–ɪ line. The compensation response was parallel to the participant-specific ε–ɪ line, with negative values in the direction toward /ɪ/. The deviation response was orthogonal to the ε–ɪ line, with negative values toward the inside of the vowel space. The compensation response is more appropriate for addressing our hypotheses; however, we also analyzed the deviation response for consistency and clarity purposes. For statistical analysis of the responses (compensation, normalized compensation, deviation, and normalized deviation), we calculated the difference between average responses in the last 100-ms window (300–400 ms) and the first 100-ms window (0–100 ms; gray shaded areas in Figures 3A–3D and 4A–4D). Table 1 contains the averages and standard deviations for each of the dependent measures.

Figure 3.

The left panels show the analysis of perturbation direction by magnitude, and the right panels show the analysis of perturbation direction by target words. Group-average trajectories of compensation responses for the F1 shift and the F1–F2 shift are shown in Panels A–D. Colored shaded areas in Panels A–D correspond to standard errors. We calculated the change in compensation responses in the last 100-ms window relative to the first 100-ms window (gray shaded areas in Panels A–D). The group-average and individual participant data for the change in compensation responses are shown in Panels E and F. We found statistically significant main effects of perturbation direction (p < .001), perturbation magnitude (p < .001), and target word (p < .001). We also found a statistically significant Direction × Word interaction (p = .024). The overall compensation responses to the F1 shift were smaller than responses to the F1–F2 shift. Compensation responses gradually increased as the magnitude of perturbation increased from 0.5 to 1.5 ε–æ (Panel E). The Direction × Word interaction indicated smaller responses to the F1 shift in comparison with responses to the F1–F2 shift for “hep” and “heck.” However, this was not the case for “head” (Panel F). Error bars in Panels E and F correspond to ±1 standard error. F1 = first formant; F2 = second formant.

Figure 4.

Given that (a) we used perturbations with different magnitudes and (b) perturbations were based on participant-specific ε–æ distance, we normalized the compensation responses based on the perturbation magnitude in each condition and for each participant. The left panels show the analysis of perturbation direction by magnitude, and the right panels show the analysis of perturbation direction by target words. Group-average trajectories for the normalized compensation responses for all perturbation conditions and target words are shown in Panels A–D. We found statistically significant main effects of perturbation direction (p < .001), perturbation magnitude (p < .001), and target word (p < .001). Normalized responses to the F1–F2 shift were larger than responses to the F1 shift, and the normalized responses gradually decreased as the magnitude of the perturbation increased from 0.5 to 1.5 ε–æ (Panel E). Normalized responses were smaller for “head” in comparison with responses for “hep” and “heck” (Panel F). Error bars in Panels E and F correspond to ±1 standard error. F1 = first formant; F2 = second formant.

Table 1.

Group averages and standard deviations (in parentheses) for all dependent measures in all conditions.


| Measure | Word | F1 shift, 0.5 ε–æ | F1 shift, 1.0 ε–æ | F1 shift, 1.5 ε–æ | F1–F2 shift, 0.5 ε–æ | F1–F2 shift, 1.0 ε–æ | F1–F2 shift, 1.5 ε–æ |
|---|---|---|---|---|---|---|---|
| Compensation response (Hz) | hɛp | −15.0 (17.1) | −25.3 (25.8) | −30.6 (24.9) | −19.7 (19.1) | −36.8 (33.6) | −43.5 (31.2) |
| | hɛd | −12.2 (15.5) | −24.2 (22.3) | −18.0 (20.6) | −11.7 (22.0) | −23.6 (26.7) | −30.7 (33.8) |
| | hɛk | −13.3 (17.7) | −24.6 (21.7) | −25.4 (26.6) | −25.4 (23.2) | −44.6 (30.5) | −42.0 (29.1) |
| Normalized compensation response (%) | hɛp | −16.3 (21.5) | −10.3 (14.6) | −9.8 (8.0) | −19.9 (23.0) | −17.4 (15.3) | −13.9 (10.4) |
| | hɛd | −11.8 (14.5) | −11.4 (10.4) | −5.8 (7.4) | −9.1 (29.1) | −10.6 (16.1) | −9.3 (11.8) |
| | hɛk | −13.4 (18.4) | −12.8 (13.1) | −8.1 (9.6) | −21.9 (20.8) | −22.5 (16.5) | −14.8 (13.7) |
| Deviation response (Hz) | hɛp | 3.3 (15.6) | 0.7 (23.2) | 0.7 (19.6) | 4.6 (15.7) | 7.0 (20.6) | 13.5 (26.1) |
| | hɛd | −2.1 (14.2) | −5.4 (16.0) | 0.8 (16.1) | 3.2 (15.3) | 4.6 (16.9) | 5.4 (20.7) |
| | hɛk | 2.8 (14.2) | 1.4 (14.0) | −0.0 (17.2) | 8.7 (14.6) | 11.9 (19.4) | 8.4 (18.7) |
| Normalized deviation response (%) | hɛp | 4.0 (15.3) | −0.2 (11.9) | −0.5 (7.8) | 5.5 (18.7) | 3.8 (9.4) | 5.5 (10.3) |
| | hɛd | −2.7 (15.5) | −2.4 (7.7) | 0.2 (6.8) | 1.3 (19.8) | 1.9 (10.4) | 0.9 (8.9) |
| | hɛk | 2.9 (16.3) | 0.9 (7.5) | −0.2 (7.4) | 9.8 (20.3) | 6.6 (9.9) | 3.3 (6.5) |
| Compensation latency (ms) | hɛp | 170.8 (45.6) | 181.7 (40.4) | 168.7 (31.7) | 185.8 (39.8) | 180.2 (33.3) | 185.3 (31.9) |
| | hɛd | 177.2 (38.6) | 170.1 (38.3) | 169.8 (32.0) | 169.8 (33.2) | 170.7 (35.0) | 172.6 (32.8) |
| | hɛk | 175.6 (44.1) | 179.4 (42.4) | 180.7 (43.1) | 187.1 (31.9) | 190.7 (43.7) | 185.9 (49.4) |

Note. F1 = first formant; F2 = second formant.

Compensation Response

Figure 3 shows the group-average trajectories of compensation responses to the F1 shift and the F1–F2 shift, averaged over target words (Panels A and C) and averaged over perturbation magnitudes (Panels B and D). To ensure that participants compensated for the formant perturbations, we conducted a series of a priori one-sample t tests. Our analyses showed that compensation responses were statistically significantly different from zero (p < .007 in all cases). Examining compensation responses for between-condition differences, we found statistically significant main effects of perturbation direction, F(1, 5512.6) = 38.987, p < .001; perturbation magnitude, F(2, 5512.5) = 52.002, p < .001; and target word, F(2, 5515.9) = 14.189, p < .001. We also found a statistically significant Direction × Word interaction, F(2, 5514.1) = 3.726, p = .024. We did not find statistically significant Direction × Magnitude, F(2, 5512.2) = 1.176, p = .308; Magnitude × Word, F(4, 5514.1) = 1.099, p = .355; and Direction × Magnitude × Word, F(4, 5513.9) = 0.640, p = .634, interactions. As shown in Figure 3E, the overall compensation responses to perturbation magnitudes of 0.5 ε–æ were smaller than responses to perturbation magnitudes of 1.0 ε–æ, t(5512) = 7.244, p < .001, and perturbation magnitudes of 1.5 ε–æ, t(5513) = 9.846, p < .001. Compensation responses to perturbation magnitudes of 1.0 ε–æ were also smaller than responses to perturbation magnitudes of 1.5 ε–æ, t(5512) = 2.620, p = .024. The overall compensation responses to the F1 shift were smaller than responses to the F1–F2 shift (main effect of direction). Additionally, the overall responses for “head” were smaller than responses for “hep,” t(5515) = −4.221, p < .001, and “heck,” t(5517) = 4.881, p < .001. As shown in Figure 3F, the Direction × Word interaction indicated smaller responses to the F1 shift in comparison with responses to the F1–F2 shift for “hep,” t(5513) = 3.615, p = .004, and “heck,” t(5514) = 5.434, p < .001. However, this was not the case for “head,” t(5514) = 1.693, p = .537.

Normalized Compensation Response

Given that (a) we used perturbations with three different magnitudes (0.5, 1.0, and 1.5 times the ε–æ distance) and (b) perturbations were based on participant-specific ε–æ distance, we normalized the compensation responses based on the perturbation magnitude. Figures 4A–4D show the group-average trajectories of normalized compensation responses for all perturbation conditions and target words. The planned a priori one-sample t tests showed that the normalized compensation responses in all conditions and for all words were statistically significantly different from zero (p < .002 in all cases). We found statistically significant main effects of perturbation direction, F(1, 5512.8) = 35.890, p < .001; perturbation magnitude, F(2, 5513.1) = 18.349, p < .001; and target word, F(2, 5516.1) = 11.127, p < .001. However, unlike the analysis of compensation responses, the analysis of normalized compensation responses did not reveal a statistically significant Direction × Word interaction, F(2, 5514.2) = 2.840, p = .059. We did not find statistically significant Direction × Magnitude, F(2, 5512.3) = 0.016, p = .884; Magnitude × Word, F(4, 5514.4) = 0.615, p = .652; and Direction × Magnitude × Word, F(4, 5514.0) = 0.855, p = .491, interactions. As shown in Figure 4E, normalized responses to perturbations with magnitudes of 0.5 ε–æ, t(5514) = −5.751, p < .001, and 1.0 ε–æ, t(5512) = −4.443, p < .001, were larger than responses to perturbations with 1.5 ε–æ magnitude. However, responses to perturbations with 0.5 and 1.0 ε–æ magnitudes were similar, t(5513) = −1.442, p = .319. Figure 4F shows that the responses were smaller for "head" in comparison with responses for "hep," t(5515) = −3.020, p = .007, and "heck," t(5517) = 4.633, p < .001; however, responses were similar for "hep" and "heck," t(5516) = 1.587, p = .251.

Compensation Latency

We calculated the latency of compensation responses in all conditions and for each target word. Given that the normalized compensation response was an amplitude-scaled version of the compensation response, its response latency was identical to the response latency of the compensation response. Our analysis of compensation latency did not reveal statistically significant main effects of perturbation direction, F(1, 5113.2) = 3.254, p = .071; perturbation magnitude, F(2, 5112.3) = 0.820, p = .441; and target word, F(2, 5122.0) = 1.598, p = .202. We did not find statistically significant Direction × Word, F(2, 5116.8) = 0.826, p = .438; Direction × Magnitude, F(2, 5112.2) = 1.051, p = .350; Magnitude × Word, F(4, 5117.9) = 0.123, p = .974; and Direction × Magnitude × Word, F(4, 5116.3) = 0.954, p = .432, interactions. The overall group-average compensation latency was 177.901 ms (SD = 22.694 ms).

Deviation Response

Examining deviation responses, we found a statistically significant main effect of perturbation direction, F(1, 5513.3) = 36.836, p < .001, with larger deviation responses to the F1–F2 shift. We did not find statistically significant main effects of perturbation magnitude, F(2, 5512.7) = 0.211, p = .809, and target word, F(2, 5519.1) = 1.582, p = .206. Similarly, we did not find statistically significant Direction × Word, F(2, 5516.4) = 0.187, p = .829; Direction × Magnitude, F(2, 5112.7) = 1.887, p = .152; Magnitude × Word, F(4, 5516.3) = 0.828, p = .507; and Direction × Magnitude × Word, F(4, 5515.5) = 1.225, p = .298, interactions.

Normalized Deviation Response

Results of the analysis of normalized deviation responses were similar to the results for deviation responses. The main effect of perturbation direction, F(1, 5512.8) = 27.609, p < .001, was statistically significant (larger deviation responses to the F1–F2 shift). We did not find statistically significant main effects of perturbation magnitude, F(2, 5515.2) = 0.873, p = .418, and target word, F(2, 5523.0) = 2.310, p = .099. We also did not find statistically significant interactions of Direction × Word, F(2, 5519.6) = 0.308, p = .735; Direction × Magnitude, F(2, 5512.9) = 1.096, p = .334; Magnitude × Word, F(4, 5519.0) = 0.893, p = .467; and Direction × Magnitude × Word, F(4, 5517.7) = 0.944, p = .437.

Correlational Analysis

We calculated Pearson correlation coefficients to examine potential relationships between participants' responses in different conditions with their perceptual thresholds. However, we did not find statistically significant correlations between JND values and compensation responses or normalized compensation responses (p > .533 in all cases). Similarly, we did not find statistically significant correlations between categorical perception and compensation responses or normalized compensation responses (p > .106 in all cases).

Discussion

The brain uses a combination of processes to ensure the accuracy of speech production. During speaking, the brain monitors its speech output and compares incoming auditory feedback with its prediction to estimate errors in its output. When it detects an error in its output, the brain uses the error to generate compensatory responses to reduce the error (Guenther, 2016; Houde & Nagarajan, 2011). In a previous study, using an adaptation paradigm, we showed that the brain evaluates auditory errors and responds to auditory errors based on its evaluation of the relevance of the errors (Daliri & Dittman, 2019). However, it remained unclear whether the brain uses similar error evaluation processes during online monitoring when the brain needs to generate online corrective responses to reduce auditory errors. Therefore, in this study, we used a compensation paradigm in which we applied formant perturbations with three different magnitudes that were calculated based on each participant's ɛ–æ distance in the F1–F2 coordinates. Because previous studies have shifted formants of the vowel /ɛ/ in different directions (toward the outside of the vowel space by increasing F1 or toward another vowel by simultaneous F1–F2 perturbations), it remained unclear whether compensatory responses to F1 perturbations are different from responses to simultaneous F1–F2 perturbations. Thus, we compared compensatory responses to the two types of perturbations by shifting formants of the vowel /ɛ/ in two different directions: F1 shift and F1–F2 shift. In the F1 shift, F1 of the vowel /ɛ/ was increased by 0.5, 1.0, and 1.5 ɛ–æ distance; in the F1–F2 shift, a participant's vowel /ɛ/ was shifted toward her vowel /æ/ (increase in F1 and decrease in F2) by 0.5, 1.0, and 1.5 ɛ–æ distance. We hypothesized that (a) compensatory responses to larger auditory perturbations would be proportionally less than responses to smaller perturbations and that (b) compensatory responses to the F1–F2 shift would be larger than responses to the F1 shift.

To address our first hypothesis, we conducted two analyses. In the first analysis, we compared the extent of compensatory responses (in hertz) in all conditions. We found that, regardless of the perturbation direction, compensatory responses to smaller perturbations (0.5 ɛ–æ) were smaller than compensatory responses to large perturbations (1.5 ɛ–æ). Because perturbations with different magnitudes were used and perturbations were based on participant-specific ɛ–æ distance, we normalized the compensation responses based on the perturbation magnitude for each participant. This analysis revealed that normalized responses (in percentages) to small perturbations (0.5 ɛ–æ) were larger than responses to large perturbations (1.5 ɛ–æ). Together, these results suggest that, even for generating within-trial compensatory responses, the brain evaluates errors and uses its evaluation to determine the extent of its response; the brain does not respond to all auditory errors in the same way. These results are consistent with the results of a previous study that examined adaptive responses to formant perturbations with different magnitudes (MacDonald et al., 2010). MacDonald et al. (2010) reported that adaptive responses increased with an increase in formant perturbations until the adaptive responses reached a plateau at large perturbations and started to decrease. Our results are also consistent with the results of a previous study that examined compensatory responses to perturbations of the fundamental frequency with different magnitudes (Liu & Larson, 2007). Liu and Larson (2007) reported that participants responded proportionally less to larger perturbations in comparison with smaller perturbations. The similarity in results suggests that the brain relies on error evaluation processes to generate both adaptive and compensatory responses, and the brain uses such processes for controlling both segmental (e.g., formant frequencies) and suprasegmental (e.g., fundamental frequency) features of speech.

The normalized compensatory responses in all conditions were statistically significantly different from zero; however, regardless of the perturbation direction and target word, participants compensated for a relatively small fraction of the perturbations (0.5 ɛ–æ: −15.4%, 1.0 ɛ–æ: −14.2%, 1.5 ɛ–æ: −10.3%). Two conceptually related explanations can be offered for this relatively small compensation. The first explanation is that, during its error evaluation, the brain determines the probability that the errors are related to its output and not generated by external sources (Wolpert & Flanagan, 2016). Based on this explanation, small errors are more likely to be related to inaccuracy in movements, and larger errors are more likely to be generated by external sources. Therefore, the brain may assign a higher weight to smaller errors and a lower weight to larger errors when it determines the magnitude of its compensatory responses to auditory errors (i.e., proportionally larger response to smaller errors). The second explanation stems from the fact that the brain uses both auditory and somatosensory feedback to control its speech movements (Guenther, 2016; Perkell, 2012). For example, if a participant produces “head” but hears “had,” the participant needs to produce a word that is closer to “hid” to compensate for the auditory error successfully; however, generating this compensatory response would require the participant to position her articulators in the position for “hid,” causing a large somatosensory error (note that the participant expects to receive somatosensory feedback related to “head”). Thus, complete compensations for auditory errors could generate large somatosensory errors. Therefore, the brain may optimize its compensatory responses to errors by minimizing the total errors that it receives from both sensory modalities. Overall, the relatively small magnitude of compensatory responses can be attributed to these two potential explanations, and future studies need to determine the contributions of each explanation.

To address our second hypothesis, we compared the magnitude of compensatory responses to the F1 shift and F1–F2 shift. Both the compensatory responses and normalized compensatory responses were larger for the F1–F2 shift (−30.9 Hz and −15.5%) than those for the F1 shift (−20.9 Hz and −11.1%). These results suggest that the brain considers higher level constraints such as categorical errors when it determines the magnitude of its compensatory responses to auditory errors (Chao et al., 2019; Daliri & Dittman, 2019; Niziolek & Guenther, 2013). In the F1–F2 shift, the centroid of /ɛ/ was shifted toward the centroid of /æ/, whereas in the F1 shift, the centroid of /ɛ/ was shifted along the F1 direction. For the F1 shift, depending on the vowel configuration of a given participant (see Figure 1B), the shift could be toward the outside of the vowel space (for large ɛ–æ angles) or toward the edge of the vowel category of /æ/ (for small ɛ–æ angles). Therefore, it is possible that, even in the F1 shift, there are some categorical errors (although less pronounced than the categorical errors generated by the F1–F2 shift). Overall, the F1–F2 shift is more likely to generate a categorical error, and thus, the brain may respond differently to errors generated by the F1–F2 shift. An alternative explanation is that the F1 shift alone generates errors that are unnatural, whereas the F1–F2 shift generates natural errors (e.g., one could mistakenly say “had” instead of “head”; however, it is highly unlikely to produce “head” with a shifted F1 mistakenly). Thus, it is possible that the brain assigns errors generated by the F1 shift to external sources and, therefore, responds less to the errors. As mentioned above, it is important to note that an inherent limitation of our design could influence the applicability of these explanations. We perturbed the vowel /ɛ/ in two directions, primarily based on the previous literature on compensatory responses to formant perturbations. However, the difference between the two perturbations depends on the configuration of the vowel space for each participant. For example, if the two vowels of /ɛ/ and /æ/ are oriented in the F1–F2 coordinates in such a way that the ɛ–æ line is closer to a horizontal line (see Figure 1B), then the difference between the two perturbation directions is minimal (the F1–F2 shift is essentially the same as the F1 shift). To more systematically test these explanations, future studies could compare compensatory responses to F1–F2 perturbations that shift a participant's /ɛ/ toward her /æ/ (errors that are more plausible and natural) with responses to perturbations that are toward the outside of the vowel space along the orthogonal direction to the participant's ɛ–æ line (errors that are impossible to make).

In this study, participants produced three target words that contained the vowel /ɛ/ in the context of /hɛC/ (/hɛp/, /hɛd/, and /hɛk/). We found that both compensatory responses and normalized compensatory responses were smaller for "head" than for "hep" and "heck." This effect was modulated by the perturbation direction for compensatory responses (but not normalized compensatory responses): Compensatory responses to the F1 shift and the F1–F2 shift were similar for "head," but this was not the case for "hep" and "heck." The consonants in this study differed in two ways: place of articulation (bilabial: /p/, alveolar: /d/, and velar: /k/) and voicing (voiced: /d/; voiceless: /p, k/). One potential explanation for these results is that the articulatory configuration of the following consonant adds constraints on the magnitude of compensatory responses to errors during the vowel. Another potential explanation is related to voicing. It is known that vowels followed by a voiced consonant are typically longer (e.g., Stevens, 1999); indeed, vowel duration was longer for "head" (564 ms) than for "hep" (514 ms) and "heck" (508 ms). One could argue that the brain adjusts the dynamics of its responses to errors based on its prediction of the duration of the vowel, responding to errors more slowly in longer vowels. In other words, it may take a longer time for the compensatory responses in long vowels to reach the level of responses in short vowels (vowels followed by a voiceless consonant). Because we examined the first 400 ms of the vowel for all words, we may have captured only the early portion of the response for the longer vowel, where the response had not yet reached its maximum. Overall, these results suggest that constraints of consonantal environments may influence how the brain responds to auditory errors. However, we did not design this study to examine the potential effects of consonantal environments on compensatory responses; future studies can determine such contributions by controlling for the voicing and articulatory configuration of target words.
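
The timing account can be illustrated with a simple first-order model: if the compensatory response rises exponentially toward the same asymptote but with a slower time constant in longer vowels, then averaging over a fixed 400-ms window yields a smaller measured response for the longer vowel. The time constants and asymptote below are assumptions chosen for illustration, not values fitted to our data.

```python
import numpy as np

# Illustration of the timing account: the response rises exponentially toward
# a common asymptote, but more slowly for longer vowels, so a fixed 400-ms
# analysis window captures less of it.
def mean_response(tau_ms, asymptote=-30.0, window_ms=400):
    t = np.arange(window_ms)  # 1-ms steps across the analysis window
    response = asymptote * (1 - np.exp(-t / tau_ms))
    return response.mean()    # average compensation within the window (Hz)

print(mean_response(tau_ms=150))  # faster rise (short vowel, e.g., "hep"): about -19.5 Hz
print(mean_response(tau_ms=250))  # slower rise (long vowel, e.g., "head"): about -15.0 Hz
```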

Examining the patterns of formant change in response to the F1 shift and the F1–F2 shift, we found that participants changed both their F1 and F2 to compensate for the perturbations; on average, participants changed their formants toward /ɪ/. Typically, when participants receive auditory perturbations, they generate compensatory responses in the opposite direction of the perturbations. In the F1–F2 shift, it was expected that participants would change both F1 and F2 trajectories (as both F1 and F2 were perturbed). However, we found that participants changed both F1 and F2 trajectories even in response to the F1 shift (where only F1 was perturbed). This unexpected finding contrasts with the findings of previous studies (but see Tang et al., 2019, who reported similarly coupled formant changes). For example, MacDonald et al. (2011) applied separate F1 or F2 perturbations and examined participants' adaptive responses to the perturbations. They found that, on average, participants adapted to the perturbations by changing only the formant that they perceived as perturbed. However, there are important methodological differences between our study and MacDonald et al.'s (2011) study. First, their study was an adaptation study, whereas ours is a compensation study. Second, the perturbation in their study was not participant specific, whereas we designed the formant perturbations based on the vowel configuration of each participant. This participant-specific procedure may have reduced the between-participant variability in our study, resulting in statistically significant changes in both F1 and F2. Overall, it remains unclear why participants changed both F1 and F2 trajectories in response to the F1 shift. One explanation for this pattern of response is based on the biomechanical constraints imposed by the shape of the vocal tract. To generate a compensatory response, participants need to change their articulatory configurations (i.e., change the vocal tract shape). However, any change in the vocal tract shape could influence all formant frequencies (Guenther, 2016; Perkell, 2012; Stevens, 1999), and it is unlikely that naïve participants would position their articulators to change one formant independently of the others. Therefore, it is possible that when participants respond to a given perturbation by changing their vocal tract configuration, the manifestations of the new configuration can be measured in multiple formant frequencies. A second explanation is that the two perturbation types may have had similar perceptual effects (i.e., in both cases, the formants of /ɛ/ were shifted toward /æ/, either by shifting F1 alone or by shifting both F1 and F2). Because of this perceptual similarity, participants may have perceived similar errors and generated compensatory responses (although with different magnitudes) in the same direction, toward /ɪ/. Overall, these results indicate that future studies should examine both formants regardless of which formants are perturbed.
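
One way future analyses could quantify such coupled responses is to treat each response as a vector in F1–F2 space and measure its angle relative to the direction that directly opposes the perturbation: an angle near zero indicates a purely opposing response, whereas a response rotated toward another vowel (e.g., /ɪ/) yields a larger angle. The vectors in this sketch are hypothetical values chosen for illustration.

```python
import numpy as np

# Quantifying how far an observed formant response deviates from the
# direction that directly opposes the perturbation. Values are hypothetical (Hz).
perturbation = np.array([100.0, 0.0])  # F1 shift: F1 raised, F2 unchanged
response = np.array([-20.0, 15.0])     # observed change: F1 down, F2 up (toward /ih/)

def angle_between(u, v):
    cosine = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.degrees(np.arccos(np.clip(cosine, -1.0, 1.0)))

# 0 deg = purely opposing the perturbation; larger values = rotated response.
print(angle_between(response, -perturbation))  # about 36.9 deg for these vectors
```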

This study's findings have immediate theoretical and clinical implications. For example, in the Directions Into Velocities of Articulators (DIVA) model of speech, auditory feedback is compared with auditory targets to determine potential errors in production (Guenther, 2016). The calculated errors are then multiplied by a feedback gain to determine the magnitude of the feedback-driven compensatory responses, that is, a linear relationship between errors and the magnitude of the compensatory responses. However, we found that the relationship between the magnitude of errors (perturbations) and the magnitude of compensatory responses is not linear; as the errors increase, the magnitude of compensatory responses reaches a plateau (i.e., a nonlinear relationship). Thus, the feedback control mechanism of the DIVA model could be slightly modified to incorporate this nonlinear relationship between errors and compensatory responses (a minimal sketch of such a modification follows this paragraph). Additionally, previous studies have shown that disorders of speech production, such as stuttering (Cai et al., 2012; Daliri et al., 2013, 2014, 2017), Parkinson's disease (Abur et al., 2018; Huang et al., 2016), and acquired apraxia of speech (Ballard et al., 2018), may involve impaired feedback control mechanisms. For example, Cai et al. (2012) reported that stuttering adults' compensatory responses to formant perturbations are smaller than those of nonstuttering adults. Given our finding of a nonlinear relationship between perturbations and compensatory responses, one could speculate that adults who stutter respond less to perturbations because they evaluate the magnitude of the perturbations differently (e.g., they may evaluate the perturbations as larger). In other words, the smaller responses of adults who stutter may be due not to difficulties in generating the responses but, rather, to difficulties in evaluating the magnitude of errors. However, it should be noted that deficits in other mechanisms could also result in reduced compensatory responses (e.g., deficits in the processes that generate sensory predictions to estimate errors, or deficits in the sensory-to-motor processes that translate errors into corrective responses). Overall, the paradigm that we developed in this study, along with our findings, can be used to shed light on mechanisms underlying speech production and its disorders.
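
As a minimal sketch of this proposed modification, the code below contrasts a linear feedback law (error times a fixed gain, as in the DIVA model's feedback controller) with a saturating variant consistent with the plateau reported here. The tanh form and its parameters are our illustrative assumptions, not part of the published model.

```python
import numpy as np

def linear_response(error, gain=0.15):
    # Fixed-gain feedback law: response grows linearly with the error.
    return -gain * error

def saturating_response(error, gain=0.15, plateau=0.2):
    # Hypothetical modification: same slope for small errors, but the
    # response saturates at the plateau value for large errors.
    return -plateau * np.tanh(gain * error / plateau)

# Perturbation magnitudes in units of the e-ae distance.
for e in (0.5, 1.0, 1.5):
    print(e, round(linear_response(e), 3), round(saturating_response(e), 3))
```

With these illustrative parameters, the saturating law yields normalized responses of roughly −14%, −13%, and −11% across the three perturbation magnitudes, qualitatively reproducing the decreasing proportional compensation we observed, whereas the linear law holds the proportion constant.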

In summary, we examined auditory–motor processes that the brain uses to respond to auditory errors during online monitoring. We used a compensation paradigm in which we applied formant perturbations in two directions: the F1 shift, where the first formant was increased, and the F1–F2 shift, where concurrent F1 and F2 perturbations were applied to generate a categorical error. In each perturbation direction, we applied three different perturbation magnitudes (0.5, 1.0, and 1.5 ɛ–æ distance). We found that responses to small perturbations were proportionally larger than responses to large perturbations. Additionally, responses were larger when participants received a concurrent F1–F2 perturbation than when they received an F1 perturbation. Finally, we found that the subsequent consonant influences the magnitude of compensatory responses. Overall, these results suggest that the brain uses its error evaluation to modulate compensatory responses to errors. The brain may also consider categorical errors and phonemic environments (e.g., configurations of the following phoneme) to determine the magnitude of its compensatory responses.

Acknowledgments

This work was supported by National Institute on Deafness and Other Communication Disorders Grant R21 DC017563, awarded to Ayoub Daliri. The authors thank Damaris Ochoa for her contribution to participant recruitment for this project.

References

1. Abur, D., Lester-Smith, R. A., Daliri, A., Lupiani, A. A., Guenther, F. H., & Stepp, C. E. (2018). Sensorimotor adaptation of voice fundamental frequency in Parkinson's disease. PLOS ONE, 13(1), Article e0191839. https://doi.org/10.1371/journal.pone.0191839
2. American Speech-Language-Hearing Association. (1997). Guidelines for audiologic screening [Guidelines]. https://www.asha.org/policy
3. Ballard, K. J., Halaki, M., Sowman, P., Kha, A., Daliri, A., Robin, D. A., Tourville, J. A., & Guenther, F. H. (2018). An investigation of compensation and adaptation to auditory perturbations in individuals with acquired apraxia of speech. Frontiers in Human Neuroscience, 12, 510. https://doi.org/10.3389/fnhum.2018.00510
4. Bates, D., Mächler, M., Bolker, B. M., & Walker, S. C. (2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67(1), 1–48. https://doi.org/10.18637/jss.v067.i01
5. Behroozmand, R., & Larson, C. R. (2011). Error-dependent modulation of speech-induced auditory suppression for pitch-shifted voice feedback. BMC Neuroscience, 12(1), Article 54. https://doi.org/10.1186/1471-2202-12-54
6. Burnett, T. A., Senner, J. E., & Larson, C. R. (1997). Voice F0 responses to pitch-shifted auditory feedback: A preliminary study. Journal of Voice, 11(2), 202–211. https://doi.org/10.1016/S0892-1997(97)80079-3
7. Cai, S. (2015). Audapter. GitHub. https://github.com/shanqing-cai/audapter_matlab
8. Cai, S., Beal, D. S., Ghosh, S. S., Guenther, F. H., & Perkell, J. S. (2014). Impaired timing adjustments in response to time-varying auditory perturbation during connected speech production in persons who stutter. Brain and Language, 129(1), 24–29. https://doi.org/10.1016/j.bandl.2014.01.002
9. Cai, S., Beal, D. S., Ghosh, S. S., Tiede, M. K., Guenther, F. H., & Perkell, J. S. (2012). Weak responses to auditory feedback perturbation during articulation in persons who stutter: Evidence for abnormal auditory-motor transformation. PLOS ONE, 7(7), Article e41830. https://doi.org/10.1371/journal.pone.0041830
10. Cai, S., Ghosh, S. S., Guenther, F. H., & Perkell, J. S. (2011). Focal manipulations of formant trajectories reveal a role of auditory feedback in the online control of both within-syllable and between-syllable speech timing. The Journal of Neuroscience, 31(45), 16483–16490. https://doi.org/10.1523/JNEUROSCI.3653-11.2011
11. Chao, S.-C., Ochoa, D., & Daliri, A. (2019). Production variability and categorical perception of vowels are strongly linked. Frontiers in Human Neuroscience, 13, 96. https://doi.org/10.3389/fnhum.2019.00096
12. Daliri, A., & Dittman, J. (2019). Successful auditory motor adaptation requires task-relevant auditory errors. Journal of Neurophysiology, 122(2), 552–562. https://doi.org/10.1152/jn.00662.2018
13. Daliri, A., & Max, L. (2015). Electrophysiological evidence for a general auditory prediction deficit in adults who stutter. Brain and Language, 150, 37–44. https://doi.org/10.1016/j.bandl.2015.08.008
14. Daliri, A., & Max, L. (2016). Modulation of auditory responses to speech vs. nonspeech stimuli during speech movement planning. Frontiers in Human Neuroscience, 10, 1–9. https://doi.org/10.3389/fnhum.2016.00234
15. Daliri, A., & Max, L. (2018). Stuttering adults' lack of pre-speech auditory modulation normalizes when speaking with delayed auditory feedback. Cortex, 99, 55–68. https://doi.org/10.1016/j.cortex.2017.10.019
16. Daliri, A., Murray, S. H., Blood, A. J., Burns, J., Noordzij, J. P., Nieto-Castanon, A., Tourville, J. A., & Guenther, F. H. (2020). Auditory feedback control mechanisms do not contribute to cortical hyperactivity within the voice production network in adductor spasmodic dysphonia. Journal of Speech, Language, and Hearing Research, 63(2), 421–432. https://doi.org/10.1044/2019_JSLHR-19-00325
17. Daliri, A., Prokopenko, R. A., Flanagan, J. R., & Max, L. (2014). Control and prediction components of movement planning in stuttering versus nonstuttering adults. Journal of Speech, Language, and Hearing Research, 57(6), 2131–2141. https://doi.org/10.1044/2014_JSLHR-S-13-0333
18. Daliri, A., Prokopenko, R. A., & Max, L. (2013). Afferent and efferent aspects of mandibular sensorimotor control in adults who stutter. Journal of Speech, Language, and Hearing Research, 56(6), 1774–1778. https://doi.org/10.1044/1092-4388(2013/12-0134)
19. Daliri, A., Wieland, E. A., Cai, S., Guenther, F. H., & Chang, S.-E. (2017). Auditory-motor adaptation is reduced in adults who stutter but not in children who stutter. Developmental Science, 21(2), Article e12521. https://doi.org/10.1111/desc.12521
20. Fuchs, S., Cleland, J., & Rochet-Capellan, A. (Eds.). (2019). Speech production and perception: Learning and memory. Peter Lang. https://doi.org/10.3726/b15982
21. Guenther, F. H. (2016). Neural control of speech. MIT Press. https://doi.org/10.7551/mitpress/10471.001.0001
22. Guo, Z., Wu, X., Li, W., Jones, J. A., Yan, N., Sheft, S., Liu, P., & Liu, H. (2017). Top-down modulation of auditory-motor integration during speech production: The role of working memory. The Journal of Neuroscience, 37(43), 10323–10333. https://doi.org/10.1523/JNEUROSCI.1329-17.2017
23. Heinks-Maldonado, T. H., Nagarajan, S. S., & Houde, J. F. (2006). Magnetoencephalographic evidence for a precise forward model in speech production. NeuroReport, 17(13), 1375–1379. https://doi.org/10.1097/01.wnr.0000233102.43526.e9
24. Houde, J. F., & Jordan, M. I. (1998). Sensorimotor adaptation in speech production. Science, 279(5354), 1213–1216. https://doi.org/10.1126/science.279.5354.1213
25. Houde, J. F., & Nagarajan, S. S. (2011). Speech production as state feedback control. Frontiers in Human Neuroscience, 5, 82. https://doi.org/10.3389/fnhum.2011.00082
26. Houde, J. F., Nagarajan, S. S., Sekihara, K., & Merzenich, M. M. (2002). Modulation of the auditory cortex during speech: An MEG study. Journal of Cognitive Neuroscience, 14(8), 1125–1138. https://doi.org/10.1162/089892902760807140
27. Huang, X., Chen, X., Yan, N., Jones, J. A., Wang, E. Q., Chen, L., Guo, Z., Li, W., Liu, P., & Liu, H. (2016). The impact of Parkinson's disease on the cortical mechanisms that support auditory–motor integration for voice control. Human Brain Mapping, 37(12), 4248–4261. https://doi.org/10.1002/hbm.23306
28. Kingdom, F. A. A., & Prins, N. (2016). Psychophysics: A practical introduction (2nd ed.). Academic Press. https://doi.org/10.1016/B978-0-12-407156-8.00001-3
29. Korzyukov, O., Bronder, A., Lee, Y., Patel, S., & Larson, C. R. (2017). Bioelectrical brain effects of one's own voice identification in pitch of voice auditory feedback. Neuropsychologia, 101, 106–114. https://doi.org/10.1016/j.neuropsychologia.2017.04.035
30. Krakauer, J. W., Hadjiosif, A. M., Xu, J., Wong, A. L., & Haith, A. M. (2019). Motor learning. Comprehensive Physiology, 9(2), 613–663. https://doi.org/10.1002/cphy.c170043
31. Kuznetsova, A., Brockhoff, P. B., & Christensen, R. H. B. (2017). lmerTest package: Tests in linear mixed effects models. Journal of Statistical Software, 82(13), 1–26. https://doi.org/10.18637/jss.v082.i13
32. Lenth, R. (2019). emmeans: Estimated marginal means (Version 1.3.3) [Computer software]. https://cran.r-project.org/package=emmeans
33. Liu, H., & Larson, C. R. (2007). Effects of perturbation magnitude and voice F0 level on the pitch-shift reflex. The Journal of the Acoustical Society of America, 122(6), 3671–3677. https://doi.org/10.1121/1.2800254
34. Liu, H., Meshman, M., Behroozmand, R., & Larson, C. R. (2011). Differential effects of perturbation direction and magnitude on the neural processing of voice pitch feedback. Clinical Neurophysiology, 122(5), 951–957. https://doi.org/10.1016/j.clinph.2010.08.010
35. MacDonald, E. N., Goldberg, R., & Munhall, K. G. (2010). Compensations in response to real-time formant perturbations of different magnitudes. The Journal of the Acoustical Society of America, 127(2), 1059–1068. https://doi.org/10.1121/1.3278606
36. MacDonald, E. N., Purcell, D. W., & Munhall, K. G. (2011). Probing the independence of formant control using altered auditory feedback. The Journal of the Acoustical Society of America, 129(2), 955–965. https://doi.org/10.1121/1.3531932
37. Max, L., & Daliri, A. (2019). Limited pre-speech auditory modulation in individuals who stutter: Data and hypotheses. Journal of Speech, Language, and Hearing Research, 62(8S), 3071–3084. https://doi.org/10.1044/2019_JSLHR-S-CSMC7-18-0358
38. Niziolek, C. A., & Guenther, F. H. (2013). Vowel category boundaries enhance cortical and behavioral responses to speech feedback alterations. The Journal of Neuroscience, 33(29), 12090–12098. https://doi.org/10.1523/JNEUROSCI.1008-13.2013
39. Niziolek, C. A., Nagarajan, S. S., & Houde, J. F. (2013). What does motor efference copy represent? Evidence from speech production. The Journal of Neuroscience, 33(41), 16110–16116. https://doi.org/10.1523/JNEUROSCI.2137-13.2013
40. Parrell, B., Agnew, Z. K., Nagarajan, S. S., Houde, J. F., & Ivry, R. B. (2017). Impaired feedforward control and enhanced feedback control of speech in patients with cerebellar degeneration. The Journal of Neuroscience, 37(38), 9249–9258. https://doi.org/10.1523/JNEUROSCI.3363-16.2017
41. Perkell, J. S. (2012). Movement goals and feedback and feedforward control mechanisms in speech production. Journal of Neurolinguistics, 25(5), 382–407. https://doi.org/10.1016/j.jneuroling.2010.02.011
42. Pittman, A. L., Daliri, A., & Meadows, L. (2018). Vocal biomarkers of mild-to-moderate hearing loss in children and adults: Voiceless sibilants. Journal of Speech, Language, and Hearing Research, 61(11), 2814–2826. https://doi.org/10.1044/2018_JSLHR-H-17-0460
43. Purcell, D. W., & Munhall, K. G. (2006). Compensation following real-time manipulation of formants in isolated vowels. The Journal of the Acoustical Society of America, 119(4), 2288–2297. https://doi.org/10.1121/1.2173514
44. Reilly, K. J., & Dougherty, K. E. (2013). The role of vowel perceptual cues in compensatory responses to perturbations of speech auditory feedback. The Journal of the Acoustical Society of America, 134(2), 1314–1323. https://doi.org/10.1121/1.4812763
45. Revelle, W. (2018). psych: Procedures for psychological, psychometric, and personality research (Version 1.8.12) [Computer software]. Northwestern University. https://cran.r-project.org/package=psych
46. Stepp, C. E., Lester-Smith, R. A., Abur, D., Daliri, A., Noordzij, P. J., & Lupiani, A. A. (2017). Evidence for auditory-motor impairment in individuals with hyperfunctional voice disorders. Journal of Speech, Language, and Hearing Research, 60(6), 1545–1550. https://doi.org/10.1044/2017_JSLHR-S-16-0282
47. Stevens, K. N. (1999). Acoustic phonetics. MIT Press. https://doi.org/10.7551/mitpress/1072.001.0001
48. Tang, D., Lametti, D. R., & Watkins, K. E. (2019). Altered auditory feedback induces coupled changes in formant frequencies during speech production. bioRxiv. https://doi.org/10.1101/733121
49. Tourville, J. A., Reilly, K. J., & Guenther, F. H. (2008). Neural mechanisms underlying auditory feedback control of speech. NeuroImage, 39(3), 1429–1443. https://doi.org/10.1016/j.neuroimage.2007.09.054
50. Wolpert, D. M., & Flanagan, J. R. (2016). Computations underlying sensorimotor learning. Current Opinion in Neurobiology, 37, 7–11. https://doi.org/10.1016/j.conb.2015.12.003
