PLOS ONE. 2023 Jan 20;18(1):e0269326. doi: 10.1371/journal.pone.0269326

Effects of sensorimotor voice training on event-related potentials to pitch-shifted auditory feedback

Sona Patel 1,2,*, Karen Hebert 3, Oleg Korzyukov 1,4, Charles R Larson 1
Editor: Li-Hsin Ning5
PMCID: PMC9858400  PMID: 36662730

Abstract

The pitch perturbation technique is a validated technique that has been used for over 30 years to understand how people control their voice. This technique involves altering a person’s voice pitch in real-time while they produce a vowel (commonly, a prolonged /a/ sound). Although post-task changes in the voice have been observed in several studies (e.g., a change in mean fo across the duration of the experiment), the potential for using the pitch perturbation technique as a training tool for voice pitch regulation and/or modification has not been explored. The present study examined changes in event related potentials (ERPs) and voice pitch in three groups of subjects due to altered voice auditory feedback following a brief, four-day training period. Participants in the opposing group were trained to change their voice fo in the opposite direction of a pitch perturbation stimulus. Participants in the following group were trained to change their voice fo in the same direction as the pitch perturbation stimulus. Participants in the non-varying group did not voluntarily change their pitch, but instead were asked to hold their voice constant when they heard pitch perturbations. Results showed that all three types of training affected the ERPs and the voice pitch-shift response from pre-training to post-training (i.e., “hold your voice pitch steady” task; an indicator of voice pitch regulation). Across all training tasks, the N1 and P2 components of the ERPs occurred earlier, and the P2 component of the ERPs occurred with larger amplitude post-training. The voice responses also occurred earlier but with a smaller amplitude following training. These results demonstrate that participation in pitch-shifted auditory feedback tasks even for brief periods of time can modulate the automatic tendency to compensate for alterations in voice pitch feedback and has therapeutic potential.

Introduction

Sensorimotor control is important for achieving accuracy of goal-directed movements and involves active integration of sensory feedback and motor commands during an ongoing movement [1]. Speech-motor control relies on integration of auditory feedback information in order to adjust motor commands to correct for deviations from the intended production and produce clear and fluent speech [2]. To examine the sensorimotor integration process for voice and speech, altered auditory feedback can be used [3, 4]. The pitch perturbation technique or “pitch-shift task” is an established method for examining the sensorimotor system for voice control [3, 5–9]. In this technique the auditory feedback is manipulated by changing the voice pitch while a person is speaking, which results in a perceived mismatch between the intended and perceived vocalizations. Deviations in the auditory feedback from the intended voice pitch result in predictable modifications to the voice. These modifications are typically compensatory and in the opposite direction of the pitch shift (an “opposing” response), although voice changes in the same direction as the shift (a “following” response) do occur in both neurologically healthy and impaired individuals [3, 10–12].

Although altered auditory feedback techniques have typically been used to examine sensorimotor control of voice in healthy individuals, research has shown that these techniques can be clinically useful. Delayed auditory feedback is a common therapeutic technique for improving fluency in individuals who stutter and in individuals with Parkinson’s disease [13, 14]. Laukkanen [15] demonstrated that by shifting the pitch of auditory feedback while participants read a text aloud, it was possible to change one’s habitual pitch. The authors concluded that speaking repeatedly under the influence of auditory feedback changes a person’s voice, and as a result, this technique might be useful in voice training and therapy. A number of studies have demonstrated that voice training can affect voice control [16]. For example, vocal training for singing has been shown to affect voice (pitch) control [17–19]. These changes are not only seen at the behavioral level during singing but also at the neurological level [20, 21]. Zarate and Zatorre [21] argue that activities involving activation of sensorimotor and auditory areas are associated with changes in cortical regions as a result of musical practice.

The exact mechanism by which training paradigms that utilize altered auditory feedback modify speech production patterns long-term remains unclear. One possibility is that these paradigms involve sustained attention to the auditory feedback, resulting in improved attentional control during sensorimotor integration and thus speech production. Tumber and colleagues [22] demonstrated the role of attention on vocal compensations to pitch-shift modulations using a dual-task paradigm. Compensatory voice responses were smaller under a dual-task condition in which individuals had to monitor a visual stream of information for target letters while vocalizing during sudden downward pitch shifts of one half of a semitone. This suggests that when less attention was available for the pitch-shift task, individuals were less able to utilize the auditory feedback to change speech production. Additionally, a study by Li and colleagues [23] found that working memory training modified brain activation during a pitch-shift task. In this study participants trained on an adaptive backwards digit span task for 12 days, and brain activity in response to auditory stimuli (event related potentials or ERPs) were measured before and after training during a standard pitch-shift task. Their results showed modifications in the resulting auditory evoked potentials, namely decreased N100 (“N1” component; activity around the 100 ms region of the auditory ERP response) and increased P200 (“P2” component; activity around the 200 ms region of the auditory ERP response) amplitudes recorded during pitch-shift perturbations following training. The working memory training paradigm used in this study involved sustained attention to the auditory domain over multiple training sessions.

Taken together, these studies suggest that training paradigms may improve voice control and related brain processes by causing the individual to engage additional attentional control mechanisms following a sustained focus on auditory attention processing. However, no evidence exists on whether volitional changes to pitch-shifted feedback impact the automatic error correction processes of voice control. Furthermore, no evidence exists to explain whether simply holding one’s voice constant under pitch-shifted auditory feedback passively directs attention to the auditory feedback, thereby impacting voice control and the related neural mechanisms. These factors are important for identifying the minimum parameters under which training paradigms are expected to operate, which in turn affects the clinical feasibility of related interventions. Thus, the purpose of this study was to determine whether the neurological mechanisms for voice control are modified due to brief training under pitch-shifted auditory feedback. To consistently change one’s voice pattern, it is necessary to go through a period of training that should lead to a stage when the behavior becomes automatic. The automatic nature of this or any movement as a function of training is likely to be reflected in the neural mechanisms underlying that movement [21] and can be examined using event related potentials (ERPs) from electroencephalogram (EEG) recordings. In this study the ERPs were recorded in response to shifts in the voice pitch of one’s auditory feedback while vocalizing to assess whether the altered auditory feedback resulted in changes to the automatic compensatory response to alterations in voice pitch feedback. The pattern of auditory-evoked ERPs (i.e., the P50-N1-P2 ERP complex) obtained while speaking under altered auditory feedback has been shown to be consistent across studies [24–27] and has been reported to reflect the neural processing of voice pitch feedback perturbations during vocalization [28].

In two variants of the vocal training task implemented in the present study, participants volitionally changed their voice fundamental frequency (fo) during the production of a steady vowel sound. In the third variant, participants did not intentionally vary their voice pitch during vowel production, but instead, were instructed to keep their voice pitch constant. These tasks mirrored those implemented by Hain and colleagues [10] in which participants were also asked to oppose the direction of the shift, follow the direction of the shift, or ignore the shift and maintain a steady pitch. However, we implemented a between-groups design where each group performed a single task: oppose the shift (the “opposing group”), follow the shift (the “following group”), and ignore the shift and maintain a steady pitch (the “non-varying group”). The opposing and following tasks were performed by different groups because differences in the vocal response during each task were observed by Hain and colleagues [10] and also because a multiple baseline approach would not be practical (3 tasks would require 6 EEG sessions, resulting in 15 days of testing or 27 hours). In addition, both voice responses and auditory-motor ERPs were measured during a baseline “maintain a steady pitch” pitch-shift task in a pretest-posttest design, specifically, before and after four training sessions. This allowed us to examine the effects of short training intervals (a few sessions) as would be encountered in typical therapy sessions on voice pitch regulation in typical individuals.

The specific aims of this study were to examine the impact of three brief volitional training paradigms on 1) auditory-motor ERPs (the N1-P2 complex) and 2) voice responses in a pitch-shift task. We predict 1) shorter latencies in the N1 and P2 auditory motor response following volitional training; 2) larger amplitudes in the N1 and P2 auditory motor response following volitional training; and 3) shorter latencies and smaller amplitudes in the voice response during a pitch-shift task following training.

Methods

Participants

Thirty-eight participants were recruited from Northwestern University. All participants were native speakers of American English and self-reported being right-hand dominant. They all had normal hearing at octave intervals from 250 Hz to 8000 Hz at 20 dB HL [29] and passed tests of central auditory processing (“CAP”; the Duration Pattern Sequence test and Pitch Pattern Sequence test [30, 31]) with a score of at least 90% (18 of 20) correct. Participants reported having no history of neurological, speech, or language disorders and minimal vocal training (defined as less than three years of vocal training) and that they did not regularly sing in a group (two times per week or less). Participant recruitment and testing procedures were approved by the Northwestern University Institutional Review Board.

All participants were randomly assigned to one of three training groups: opposing, following, or non-varying (described in the next section). Of the participants recruited, one did not complete the study, two were dropped as a result of not being able to perform the training task, two were dropped due to technical errors in data collection, and four were dropped due to artifacts in the EEG signals (mainly due to movements and sleepiness). As a result, a total of 29 participants remained: 10 participants in the opposing group (3 males, 7 females; mean 19.8 years), 9 participants in the following group (3 males, 6 females; mean 21.4 years), and 10 participants in the non-varying group (5 males, 5 females; mean 21.0 years).

Procedures

All testing took place in a double-walled, sound-treated booth. A visual display was presented on the computer screen instructing the participant to vocalize an /a/ vowel for 5 seconds. A progress bar indicated the length of time to either “Get ready” or “Say aah”. Participant vocalizations were recorded using an AKG boomset microphone (model C420; AKG, Vienna, Austria). The voice was amplified with a 10 dB gain using a Mackie mixer (model 1202; Loud Technologies, Woodinville, IL) and presented as real-time feedback using a Sennheiser headset (Sennheiser Electronic Corporation, Old Lyme, CT) during training and ER-2 ear inserts (Etymotic Research, Inc., Elk Grove Village, IL) during the baseline task pre- and post-training. During the vocalization the participant’s voice pitch was shifted upward or downward by 100 cents (100 cents = 1 semitone) using an Eventide Eclipse Harmonizer (Eventide, Little Ferry, NJ), creating perturbations in the real-time auditory feedback. MIDI software (Max/MSP v. 5.0) was used to present the display and to control the characteristics of the pitch shift (direction randomization, timing, and magnitude). The vocalizations, modified voice feedback signal, and control pulses (used as an indicator of the direction of the pitch shift) were digitized at 10 kHz, low-pass filtered at 5 kHz, and recorded using LabChart Pro software (AD Instruments, Colorado Springs, CO).
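For readers less familiar with the cents scale, the relationship between a shift expressed in cents and the resulting frequency can be sketched as below. This is an illustrative calculation only and does not represent the processing performed by the Eventide hardware; the function names and the 220 Hz example voice are assumptions introduced here.

```python
import math


def cents_to_ratio(cents: float) -> float:
    """Frequency ratio for a pitch interval in cents (1200 cents = 1 octave)."""
    return 2.0 ** (cents / 1200.0)


def shift_frequency(f_hz: float, cents: float) -> float:
    """Frequency heard after shifting f_hz by the given number of cents."""
    return f_hz * cents_to_ratio(cents)


def hz_to_cents(f_hz: float, ref_hz: float) -> float:
    """Distance in cents of f_hz from a reference frequency."""
    return 1200.0 * math.log2(f_hz / ref_hz)


if __name__ == "__main__":
    voice_fo = 220.0  # hypothetical speaking fo in Hz, not a value from the study
    for shift in (+100.0, -100.0):  # the two perturbation directions
        heard = shift_frequency(voice_fo, shift)
        print(f"{shift:+.0f} cents: {voice_fo:.1f} Hz is heard as {heard:.2f} Hz")
```

A 100-cent shift corresponds to a frequency ratio of 2^(1/12), roughly a 6% change in fo in either direction.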

To investigate the effects of volitional voice training on a person’s involuntary pitch-shift response, a pretest-posttest design was used. Participants underwent a specific task (referred to as the “baseline task”) before and after a training period (Days 1 and 5). Vocal training was performed in four sessions, each on a different day within a two-week period (Days 2–5). The total test time for this experiment was 5.5 hours, with no longer than 1.5 hours per session. All participants were monetarily compensated for their participation.

Training task

During the training task a single 1000-ms long shift in pitch occurred during each vocalization (either 100 cents down or up) with a random onset between 500 ms and 1000 ms after voice onset. Participants were asked to dynamically change their pitch to either volitionally oppose (the “opposing” group) or follow (the “following” group) the direction of the actual shift depending on the group they were assigned to and maintain the new pitch level for the remainder of their breath. Participants in a third group (the “non-varying” group) were simply asked to ignore the changes in their auditory feedback and maintain a constant pitch and loudness level (i.e., hold your voice steady). Thus, the non-varying group did not volitionally change their voice in response to the stimuli. Participants performed a short practice session of 10 trials before testing. The instructions in the practice session were the same as the main task. Each training session consisted of 4 blocks of 52 vocalizations.

Baseline task

For the baseline task (Days 1 and 5), all participants were first fitted with a 32-channel Brain Products actiCAP active electrode cap that was connected to the actiCHamp amplifier (Brain Products GmbH, Germany) for EEG recordings. In addition to recording voice samples, event-related potentials were recorded using BrainVision Recorder software (Brain Products, GmbH, Germany) at a sampling rate of 5 kHz and then low-pass filtered at 400 Hz.

Participants vocalized a steady “aah” sound while their pitch was shifted for five, 200-ms segments within each vocalization. The first shift occurred randomly between 700 ms and 1000 ms after vocalization onset, and each successive pitch-shift occurred randomly between 700 and 900 ms after the onset of the previous shift. The “baseline” task used in the pre- and post-testing is commonly used to assess the pitch-shift response, which occurs automatically in response to a brief change in pitch. Because the training task involved volitional modification of voice pitch, a longer time interval was used to reduce the additional memory demands needed to produce the volitional changes in vocalization. In this task participants in all three groups were instructed to ignore changes to their voice and continue to say “aah” at a constant pitch and comfortable level for the length of the progress bar. A total of 52 test vocalizations were recorded before training and after training, which resulted in 260 trials for each measurement (52 vocalizations x 5 pitch shifts per vocalization) with an approximately equal number of upward and downward pitch shifts.
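The timing rules for the baseline task can be sketched as a small schedule generator. This is a minimal sketch under stated assumptions: the study controlled the stimuli in Max/MSP, and the uniform sampling, the independent per-shift choice of direction, and the function names `baseline_shift_onsets` and `baseline_session` are all hypothetical choices made here for illustration.

```python
import random


def baseline_shift_onsets(rng, n_shifts=5,
                          first_window=(0.700, 1.000),
                          next_window=(0.700, 0.900)):
    """Onset times in seconds (relative to voice onset) for one vocalization.

    The first shift falls 700-1000 ms after voice onset; each later shift
    falls 700-900 ms after the onset of the previous one.
    """
    onsets = [rng.uniform(*first_window)]
    for _ in range(n_shifts - 1):
        onsets.append(onsets[-1] + rng.uniform(*next_window))
    return onsets


def baseline_session(rng, n_vocalizations=52, n_shifts=5):
    """List of (vocalization index, onset in s, shift in cents) for one session."""
    trials = []
    for voc in range(n_vocalizations):
        for onset in baseline_shift_onsets(rng, n_shifts):
            # Assumption: direction is drawn independently with equal odds,
            # giving an approximately equal number of up and down shifts.
            direction = rng.choice((+100, -100))
            trials.append((voc, onset, direction))
    return trials


if __name__ == "__main__":
    session = baseline_session(random.Random(0))
    n_up = sum(1 for _, _, d in session if d > 0)
    print(len(session), "trials;", n_up, "upward;", len(session) - n_up, "downward")
```

Under these rules the latest possible onset of the fifth shift is 1.0 + 4 × 0.9 = 4.6 s, so a 200-ms shift always ends within the 5-second vocalization.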

Data analysis

Since we were interested in the effects of training on the baseline task (the pitch-shift reflex, an indicator of voice control), data analysis was performed for both ERP and voice data during the baseline task on Day 1 (pre-training) and Day 5 (post-training). Data analysis was not performed for the training task, as these results are reported elsewhere [8].

EEG analysis

The ERPs were obtained by averaging the recorded EEG signals using Brain Products’ Analyzer software, synchronized to the onset of the pitch-shift stimulus. Standard preprocessing of the data was performed including filtering (1–50 Hz), segmentation (500 ms segments were selected; 100 ms pre-shift and 400 ms post-shift), artifact rejection (on the frontal channels and those epochs with amplitudes exceeding ±50 μV), normalization of the mean value to 0, and averaging across all trials. In addition, the data were re-referenced to the common reference instead of using the reference electrode, since preliminary results showed high activation at the region of the reference electrode (FCz). This methodology allowed us to make use of the electrode at the FCz location. The N1 and P2 peak amplitudes and latencies were extracted for a subset of the channels showing maximal negative and maximal positive responses, respectively. An automatic search was performed to identify the global minima (N1 peak) and maxima (P2 peak), during the time window of maximal activation.
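The averaging and peak-search steps can be sketched for a single channel as follows. This is not the Brain Products Analyzer pipeline; it is a simplified illustration that omits filtering and re-referencing. Treating “normalization of the mean value to 0” as subtraction of the pre-shift mean is an assumption, as are the function names and the example search windows.

```python
from statistics import fmean


def average_epochs(epochs_uv, n_pre, reject_uv=50.0):
    """Average single-channel epochs (microvolts) time-locked to shift onset.

    Epochs with any sample beyond +/- reject_uv are discarded. The mean of
    the first n_pre (pre-shift) samples is then subtracted from each kept
    epoch (an assumed reading of the normalization step).
    """
    kept = []
    for epoch in epochs_uv:
        if max(abs(x) for x in epoch) > reject_uv:
            continue  # artifact rejection
        base = fmean(epoch[:n_pre])
        kept.append([x - base for x in epoch])
    if not kept:
        raise ValueError("every epoch was rejected")
    # Average across trials, sample by sample.
    return [fmean(samples) for samples in zip(*kept)]


def find_peak(erp_uv, times_ms, window_ms, polarity):
    """Return (latency in ms, amplitude in uV) of the extreme value in a window.

    polarity="negative" finds the minimum (e.g., N1); any other value
    finds the maximum (e.g., P2).
    """
    lo, hi = window_ms
    in_window = [(v, t) for v, t in zip(erp_uv, times_ms) if lo <= t <= hi]
    if not in_window:
        raise ValueError("search window contains no samples")
    amp, lat = min(in_window) if polarity == "negative" else max(in_window)
    return lat, amp
```

For example, with 1-ms sampling from -100 to 399 ms, `find_peak(erp, times, (130, 160), "negative")` would search a pre-training N1 window and `find_peak(erp, times, (210, 250), "positive")` a P2 window.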

All statistical analyses were performed in SPSS (v.17; SPSS Inc., Chicago, IL) to compare the ERPs (N1 peak amplitude, N1 peak latency, P2 peak amplitude, P2 peak latency) obtained before and after training (within-subject factor of time: pre-training vs post-training), and whether this difference was affected by the stimulus direction (up, down), electrode (Fz, FCz, Cz), and the between-subjects factor of group (opposing, following, non-varying). For this, four separate linear mixed models were conducted. Because Mauchly’s test indicated that the assumption of sphericity was violated (p < 0.05), Greenhouse-Geisser corrected estimates were used. We were primarily interested in the main effect of time to determine overall changes due to training. Follow-up paired samples t-tests were also performed within each group to confirm differences in time (from pre-training to post-training) for each electrode and stimulus direction.
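The follow-up paired-samples comparison reduces to a t statistic on the within-subject differences. The sketch below only illustrates that computation; the analyses in this study were run in SPSS, the function name is hypothetical, and the latencies in the usage example are invented values, not study data. Obtaining a p-value would require a t distribution, which is outside the Python standard library.

```python
import math
from statistics import fmean, stdev


def paired_t(pre, post):
    """Paired-samples t statistic and degrees of freedom for post minus pre."""
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    diffs = [b - a for a, b in zip(pre, post)]
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("differences have zero variance")
    n = len(diffs)
    t = fmean(diffs) / (sd / math.sqrt(n))
    return t, n - 1


if __name__ == "__main__":
    # Invented N1 latencies (ms) for five hypothetical participants.
    pre_ms = [150.0, 148.0, 152.0, 149.0, 151.0]
    post_ms = [130.0, 129.0, 131.0, 128.0, 133.0]
    t, df = paired_t(pre_ms, post_ms)
    print(f"t({df}) = {t:.3f}")
```

A negative t here indicates that the post-training values are lower (earlier) than the pre-training values.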

Vocal analysis

Voice samples were analyzed in Igor Pro (Wavemetrics, Inc., Lake Oswego, OR), which called upon Praat [32] for fo detection. Praat was used to develop a wave representing the voice fo contour, which was used in further analyses. These fo contours were first segmented into individual trials of 1100 ms duration (400 ms prior to the pitch-shift onset and 700 ms following the pitch-shift onset). Outliers were then removed from each trial in several steps. First, each trial was normalized by setting the mean baseline voice pitch to 0 cents. Second, extreme values (e.g., from extraneous background noise) in the vocalization wave prior to the pitch shift were rejected using a threshold of 30 cents (values above +30 cents or below -30 cents). Third, the whole wave was rejected if any value over the entire duration of the trial exceeded a threshold of 1000 cents (above +1000 cents or below -1000 cents). Only responses that opposed the direction of the pitch shift were used. Finally, the trials were averaged within a participant for each condition (+100-cent shifts, -100-cent shifts). The magnitude of the largest upward or downward compensatory peak (“response magnitude”) and the time at which the peak reached maximum amplitude (“response latency”) were measured for each subject and submitted to statistical testing using general linear mixed models.
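The trial-level steps described above can be sketched as follows. This is a simplified illustration, not the Igor Pro and Praat routines used in the study. It assumes contours already expressed in cents, rejects a whole trial when the pre-shift threshold is exceeded (the handling of pre-shift outliers may have differed), and measures the opposing peak on the averaged contour; all function names are hypothetical.

```python
from statistics import fmean


def process_trial(contour_cents, n_pre, pre_thresh=30.0, full_thresh=1000.0):
    """Normalize one fo contour so the pre-shift mean is 0 cents.

    Returns the normalized contour, or None when the pre-shift portion
    exceeds +/- pre_thresh or any sample exceeds +/- full_thresh.
    """
    base = fmean(contour_cents[:n_pre])
    norm = [c - base for c in contour_cents]
    if max(abs(c) for c in norm[:n_pre]) > pre_thresh:
        return None  # unstable baseline (assumed whole-trial rejection)
    if max(abs(c) for c in norm) > full_thresh:
        return None  # extreme value somewhere in the trial
    return norm


def average_trials(trials):
    """Sample-by-sample average of the trials that survived rejection."""
    kept = [t for t in trials if t is not None]
    if not kept:
        raise ValueError("no trials left to average")
    return [fmean(samples) for samples in zip(*kept)]


def opposing_peak(avg_contour, times_s, shift_cents):
    """Largest post-onset deviation opposing the shift.

    Returns (magnitude in cents, latency in s). For an upward shift the
    opposing response is a downward fo change, and vice versa.
    """
    sign = -1.0 if shift_cents > 0 else 1.0
    post = [(sign * c, t) for c, t in zip(avg_contour, times_s) if t >= 0]
    magnitude, latency = max(post)
    return magnitude, latency
```

With a 1100-ms trial sampled every 10 ms, `n_pre` would be 40 samples (the 400 ms before shift onset), and `times_s` would run from -0.40 s to 0.69 s.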

Differences in the voice responses were examined between groups for two measures: 1) voice response latency and 2) magnitude of the largest upward or downward compensatory peak. A log-transformation was performed to achieve homogeneity of variance for voice response latency values but was not needed for peak magnitude responses. Linear mixed models were used to test differences in voice response latency and the absolute values of the peak magnitude with the between-subjects factor of training group (opposing, following, non-varying) and within-subjects factors of time (pre-training, post-training) and direction (up, down). Since direction was not a significant factor (F(1,71) = 0.890, p = 0.349) for voice amplitude, the up and down responses were aggregated, resulting in fixed factors of group (opposing, following, non-varying) and time (pre-training, post-training).

Results

ERP results

The ERPs showed that the maximal negative response occurred between 130–160 ms pre-training and 115–135 ms post-training at the following frontal-central electrodes: Cz, FCz, Fz (shown in Fig 1). The maximal positive response occurred between 210–250 ms pre-training and post-training at the same three electrodes (shown in Fig 2). Each subject’s N1 and P2 peak information was extracted for these three channels using the above time windows. The grand averaged ERPs for each group are shown for three electrode sites (Cz, FCz, and Fz) during the pre-training and post-training phases in Fig 3.

Fig 1. Mapping view of the N1 response.

Mapping view of the grand averaged ERPs from 130–160 ms pre-training (top row) and 115–135 ms post-training (bottom row) for the following, opposing, and non-varying groups.

Fig 2. Mapping view of the P2 response.

Mapping view of the grand averaged ERPs from 210–250 ms pre-training (top row) and 210–250 ms post-training (bottom row) for the opposing, following, and non-varying groups.

Fig 3. Grand-averaged ERPs by group at pretest and post-test.

Grand averages of the ERPs at three electrode sites (Cz, FCz, and Fz) for all three groups: opposing (pre: red dashed line; post: red solid line), following (pre: blue dashed line; post: blue solid line), and non-varying (pre: black dotted line; post: black solid line).

Linear mixed models for N1 peak latency and N1 peak amplitude show a significant effect of time on N1 peak latency (F(1,30) = 32.002, p < 0.05), but not on N1 peak amplitude (F(1,30) = 0.179, p = 0.675). The main effect of group was not significant for the N1 peak latency (F(2,30) = 0.235, p = 0.79) or the N1 peak amplitude (F(2,30) = 1.04, p = 0.36). Additionally, none of the interaction terms were significant for N1 peak latency (group x time: F(2,30) = 1.685, p = 0.20; group x direction: F(2,30) = 1.604, p = 0.097; group x electrode: F(4,60) = 2.341, p = 0.079; time x direction: F(1,30) = 0.016, p = 0.902; time x electrode: F(2,60) = 0.169, p = 0.805) or for N1 peak amplitude (group x time: F(2,30) = 0.234, p = 0.793; group x direction: F(2,30) = 0.267, p = 0.768; group x electrode: F(4,60) = 0.183, p = 0.946; time x direction: F(1,30) = 0.022, p = 0.884; time x electrode: F(2,60) = 1.983, p = 0.147). Thus, the N1 peak occurred earlier post-training (M: 129.32 ms; SD: 21.2) compared to pre-training (M: 148.94 ms; SD: 20.5) across all groups. The N1 peak latency and N1 peak amplitude are shown (at the Cz electrode) for all three groups in Fig 4.

Fig 4. N1 latency and amplitude.

The a) mean N1 peak latency and b) mean N1 peak amplitude from the ERPs at Cz for the opposing (Opp), following (Fol), and non-varying (Non) groups pre-training (solid bars) compared to post-training (slanted lines). Bars represent standard error of the mean.

A second set of tests was performed to examine the effects of the same four factors (time, direction, electrode, group) on P2 peak latency and P2 amplitude. Greenhouse-Geisser corrected comparisons on P2 peak latency show a significant effect of time (pre-training vs post-training; F(1,30) = 25.34, p < 0.05), but not a main effect of group (F(2,30) = 0.007, p = 0.9). Specifically, the P2 peak occurred earlier post-training (M: 230.87 ms; SD: 30.4) compared to pre-training (M: 250.05 ms; SD: 28.6). There was no significant main effect of electrode (F(2,60) = 1.49, p = 0.23), although the electrode by group interaction was significant (F(4,60) = 3.114, p < 0.05). In other words, the amplitude of the response occurred in different, albeit nearby, electrode sites. There were no significant interactions for P2 peak latency for group x time (F(2,30) = 1.64, p = 0.21), group x direction (F(2,30) = 0.007, p = 0.993), time x direction (F(1,30) = 1.34, p = 0.25), or time x electrode (F(2,60) = 0.098, p = 0.9). P2 peak amplitude had a main effect of time (F(1,30) = 25.55, p < 0.05) with greater amplitude post-training (M: 1.890 μV; SD: 0.8 μV) compared to pre-training (M: 1.448 μV; SD: 0.8 μV). No main effect of group was found for P2 amplitude (F(2,30) = 2.12, p = 0.13). A significant main effect of electrode was found for P2 amplitude (F(2,60) = 4.04, p < 0.05) with the amplitude highest at the Cz electrode. The electrode by group interaction was significant (F(4,60) = 4.9, p < 0.05). There were no significant interactions for P2 peak amplitude for group x time (F(2,30) = 0.67, p = 0.52), group x direction (F(2,30) = 0.056, p = 0.94), time x direction (F(1,30) = 3.26, p = 0.08), or time x electrode (F(2,60) = 1.14, p = 0.31). Because the P2 peak amplitude was largest at Cz, this electrode location is plotted for all three groups in Fig 5.

Fig 5. P2 latency and amplitude.

The a) mean P2 peak latency and b) mean P2 peak amplitude from the ERPs at Cz for the opposing (Opp), following (Fol), and non-varying (Non) groups pre-training (solid bars) compared to post-training (slanted lines). Bars represent standard error of the mean.

Taken together, the findings of a significant effect of time and no time by group interaction demonstrate a consistent change in ERPs following exposure to training. Specifically, the N1 peak latency and P2 peak latency were reduced, and P2 amplitude was increased from pre-training to post-training across all groups.

Voice responses

The grand averages of the voice fo response contours pre-training and post-training are shown for each group in Fig 6 (Panel A: opposing, Panel B: following, and Panel C: non-varying). The voice responses to upward pitch-shifts and downward pitch-shifts are displayed in separate graphs within each panel. The dashed vertical line is the onset time of the pitch-shift stimulus. All groups demonstrate changes in the magnitude of the voice pitch responses from pre-training to post-training.

Fig 6. Grand-averaged vocal responses by group.

Grand-averaged vocal responses to upward (top) and downward (bottom) pitch shifts for the a) opposing, b) following, and c) non-varying groups (blue line represents pre-training responses, red line represents post-training responses).

Linear mixed models for voice response magnitude showed a main effect of time (F(1,23) = 7.651, p < 0.05) but not group (F(2,46) = 2.195, p = 0.126), and no time by group interaction (F(2,46) = 1.675, p = 0.199), which suggests that response magnitudes for all groups were similarly reduced from the pre-training (M = 24.25, SD = 7.79) to the post-training (M = 20.46, SD = 7.02) period. Next, differences in the latency of responses were examined pre- and post-training for each group using a linear mixed model. Results showed main effects of direction (F(1,11) = 5.541, p < 0.05) and time (F(1,11) = 5.262, p < 0.05) but not group (F(2,22) = 0.249, p = 0.782). The interactions between direction and time (F(1,11) = 3.938, p = 0.073), direction and group (F(2,22) = 3.501, p = 0.071), and time and group (F(2,22) = 1.380, p = 0.273) were not significant. This finding indicates that the latency of the response to the pitch-shift stimulus was similarly reduced for all groups from the pre-training (M = 0.35, SD = 0.12) to the post-training (M = 0.32, SD = 0.17) period.

Discussion

The purpose of the present study was to determine whether short auditory feedback training intervals could modify voice pitch regulation and affect the corresponding ERPs. Changes in one’s auditory feedback while speaking are perceived as errors in production (e.g., [9]). Subjects were trained over a short, four-day period to respond to unpredictable perturbations in the pitch of their auditory feedback as they produced a prolonged vowel sound (/a/). Research has shown that individuals modify their voice in response to errors or simulated changes in their auditory feedback by opposing or following the direction of the change. To better understand the neurological basis of these response types, we trained individuals to respond in one of three ways: (a) changing their voice fo in the opposite direction of the pitch-shifted auditory feedback stimuli (opposing the shift), (b) changing their voice fo in the same direction as the pitch-shifted auditory feedback stimuli (following the shift), or (c) ignoring all pitch shifts and maintaining a steady voice pitch (non-varying). A pretest-posttest design was used to assess changes in voice control before and after the training period. The outcome measures included the magnitude and latency of the compensatory voice response and the magnitude and timing of the corresponding ERPs to the baseline task, i.e., an involuntary pitch-shift task, where participants were asked to hold a constant pitch and loudness (similar to the non-varying task). The resulting involuntary pitch-shift response is an indicator of voice control.

Our results show differences in the voice responses and the corresponding ERPs during the baseline task as a result of training. Differences in the ERPs were seen in N1 peak latency, P2 peak latency, and P2 peak amplitude. Specifically, both the N1 and P2 peaks occurred earlier post-training compared to pre-training, and the P2 peak magnitude was enhanced post-training compared to pre-training. These results are consistent with the findings of Li and colleagues [23] who report an increase in P2 magnitude post-training. While Li and colleagues [23] found a decrease in the N1 amplitude, we did not find changes to the N1 amplitude following training in our study, potentially due to differences in the training task. Other research has shown an N1 suppression (vocalization compared to listening) for pitch shifts that occur at voice onset but a P2 enhancement for pitch shifts that occur mid-vocalization [24, 25]. Behroozmand et al. [24] suggest that this enhancement in the middle of vocalization may reflect an increased sensitivity or responsiveness to auditory feedback during the resolution of mismatches between the intended vocalization and its feedback. The finding of systematic changes in the neural response suggests that the trained motor behavior (after practice with any of the three instructed conditions of opposing, following, and holding the voice steady) may have become more automatic [21] and the processing of auditory information has become more efficient [23]. These results were complemented by the voice changes, which revealed significant changes in response latency and magnitude in that the peak responses occurred earlier and with a reduced amplitude post-training compared to pre-training. We suspect the reduction in amplitude of the corrective response to pitch shifts indicates a greater control of the voice, as others have shown larger amplitudes of vocal responses in pathological conditions such as Parkinson’s disease, where vocal control is abnormal [33].

The present results confirmed our predictions that ERPs are modified following the two dynamic-response training tasks. Surprisingly, a similar pattern of ERPs was observed for the hold-your-voice-steady (“non-varying”) task. Some studies have shown that voluntary responses to the perturbation paradigm engage different mechanisms than involuntary responses, in that voluntary responses to pitch shifts can have both involuntary and voluntary components [8, 10, 16]. The involuntary component of the response often has a shorter latency (~100–150 ms) than the voluntary component (~250–600 ms) and is thought to reflect automatic neural processing used by the audio-vocal system to correct for any errors.
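The latency ranges quoted above can be expressed as a simple rule for labeling a measured response latency. The sketch below is illustrative only: the ranges are the approximate values given in the text, treating them as hard boundaries is an assumption, and the function name and labels are hypothetical.

```python
# Approximate latency ranges (ms) quoted in the text; using them as hard
# boundaries is an assumption made for illustration.
INVOLUNTARY_MS = (100, 150)
VOLUNTARY_MS = (250, 600)


def classify_latency(latency_ms):
    """Label a response latency as involuntary, voluntary, or unclassified."""
    if INVOLUNTARY_MS[0] <= latency_ms <= INVOLUNTARY_MS[1]:
        return "involuntary"
    if VOLUNTARY_MS[0] <= latency_ms <= VOLUNTARY_MS[1]:
        return "voluntary"
    # Latencies outside both ranges are left unlabeled rather than forced into a class.
    return "unclassified"
```

A latency of 120 ms would be labeled as involuntary and one of 400 ms as voluntary, while a latency of 200 ms falls between the two ranges and is left unclassified.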

On the other hand, the voluntary component is thought to represent higher cognitive mechanisms used at a more conscious level to control voice fo, such as in speaking and singing tasks [8, 10, 16]. In the present study, we investigated involuntary and voluntary responses as individual conditions, rather than as components of the voluntary response. Results show that practicing to maintain a steady pitch also produced differences in ERP/voice responses, potentially because the non-varying task invoked similar cognitive processes used for voice error detection and correction as the other dynamic or volitional tasks. In other words, both the dynamic-response and hold-your-voice-steady tasks resulted in training an underlying process that positively affected voice pitch regulation. Further, this task activated strong enough processes to produce a change in the neurological mechanism thought to be involved in controlling the voice [5, 23].

Limitations of this study include the relatively small sample per training group. Groups in this study were not balanced by sex, although previous work has found that males produce slower and larger vocal responses to pitch-shift changes than females [34]. N1 and P2 amplitudes during pitch-shift alterations have also been found to vary by gender. The proportions of following vs. opposing responses could not be examined and may have changed post-training; future work should examine the change in response types due to training. Although this study found that all three training types induced a similar change in vocal responses and ERPs to pitch-shift stimuli, future work should also consider examining whether training effects vary systematically by gender or age [35].

The results of this study can be used to support the development of brief training interventions for voice modulation. Popular voice therapy programs such as the LSVT® have been shown to be effective in helping individuals with Parkinson’s disease (PD) to improve their vocal communication by raising voice loudness [36, 37]. However, the standard initial treatment program for LSVT® requires a minimum of 16 sessions over a four-week period. Similar to the LSVT®, the training tasks described in the present study also required subjects to monitor their vocal output and modify their voice fo based on their auditory feedback. This study found changes in voice control and in the underlying brain mechanisms supporting speech production after only four brief training sessions. Our findings of behavioral and brain changes due to training suggest that brief voice control paradigms modulate the neurological processes for voice production and may be valuable in applications for individuals with neurological voice disorders, such as patients with PD.

Conclusions

In the present study we hypothesized that training individuals to produce a vocal-motor behavior in response to changes in auditory-sensory feedback would affect the ERPs and voice pitch regulation (assessed as the pitch-shift response). Three types of training were implemented: while vocalizing a prolonged /a/ vowel, subjects changed their pitch in the opposite direction to a shift in their auditory feedback, changed their pitch in the same direction as the shift, or maintained a steady voice pitch with no volitional intent to change fo. Effectiveness of training was evaluated by comparing the voice and ERP responses during the baseline task before and after 4 days of training. Results revealed differences in both the ERPs and voice responses after training for all training tasks. Differences in the ERPs were seen in N1 peak latency, P2 peak latency, and P2 peak amplitude, and voice changes were seen in response latency and magnitude. Specifically, the peak ERP and voice responses occurred earlier, the P2 peak amplitude was enhanced, and the peak voice response amplitude was reduced post-training compared to pre-training. These results suggest that active participation in a vocal task involving the use of altered auditory feedback, even for brief periods of time, can result in changes in neural activation patterns.

Acknowledgments

The authors would like to thank Chun Liang Chan for his help with computer programming. The authors have no conflicts of interest to disclose. Requests for reprints can be directed to Sona Patel, Department of Speech-Language Pathology, 123 Metro Blvd, Nutley, NJ 07110.

Data Availability

All subject files are available from the Open Science Framework database at: https://osf.io/ef86u/.

Funding Statement

This research was supported by grants from the National Institutes of Health, including NIDCD R01DC006243 to CL and R03DC013883 to SP (https://www.nidcd.nih.gov/). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. Riemann BL, Lephart SM. The sensorimotor system, part I: the physiologic basis of functional joint stability. J Athl Train. 2002;37(1):71–79.
  • 2. Gracco VL. Sensorimotor mechanisms in speech motor control. In: Speech Motor Control and Stuttering. Amsterdam: North Holland: Elsevier; 1991. pp. 53–78.
  • 3. Burnett TA, Freedland MB, Larson CR, Hain TC. Voice F0 Responses to Manipulations in Pitch Feedback. J Acoust Soc Am. 1998;103(6):3153–3161. doi: 10.1121/1.423073
  • 4. Donath TM, Natke U, Kalveram KT. Effects of frequency-shifted auditory feedback on voice F0 contours in syllables. J Acoust Soc Am. 2002;111(1):357–366. doi: 10.1121/1.1424870
  • 5. Behroozmand R, Korzyukov O, Sattler L, Larson CR. Opposing and following vocal responses to pitch-shifted auditory feedback: evidence for different mechanisms of voice pitch control. J Acoust Soc Am. 2012;132(4):2468–2477. doi: 10.1121/1.4746984
  • 6. Kawahara H. Interactions between speech production and perception under auditory feedback perturbations on fundamental frequencies. Journal of the Acoustical Society of Japan. 1994;15(3):201–202. doi: 10.1250/ast.15.201
  • 7. Larson CR, Burnett TA, Bauer JJ, Kiran S, Hain TC. Comparisons of voice F0 responses to pitch-shift onset and offset conditions. J Acoust Soc Am. 2001;110(6):2845–2848. doi: 10.1121/1.1417527
  • 8. Patel S, Gao L, Wang S, Gou C, Manes J, Robin DA, et al. Comparison of volitional opposing and following responses across speakers with different vocal histories. J Acoust Soc Am. 2019;146(6):4244–4254. doi: 10.1121/1.5134769
  • 9. Scheerer NE, Jones JA. The relationship between vocal accuracy and variability to the level of compensation to altered auditory feedback. Neurosci Lett. 2012;529(2):128–132. doi: 10.1016/j.neulet.2012.09.012
  • 10. Hain TC, Burnett TA, Kiran S, Larson CR, Singh S, Kenney MK. Instructing subjects to make a voluntary response reveals the presence of two components to the audio-vocal reflex. Exp Brain Res. 2000;130(2):133–141. doi: 10.1007/s002219900237
  • 11. Bauer JJ, Larson CR. Audio-vocal responses to repetitive pitch-shift stimulation during a sustained vocalization: Improvements in methodology for the pitch-shifting technique. J Acoust Soc Am. 2003;114(2):1048–1054. doi: 10.1121/1.1592161
  • 12. Franken MK, Acheson DJ, McQueen JM, Hagoort P, Eisner F. Opposing and following responses in sensorimotor speech control: Why responses go both ways. Psychonomic Bulletin & Review. 2018;25(4):1458–1467.
  • 13. Brendel B, Lowit A, Howell P. The effects of delayed and frequency shifted feedback on speakers with Parkinson disease. J Med Speech Lang Pathol. 2004;12(4):131–138.
  • 14. Soderberg GA. Delayed auditory feedback and the speech of stutterers: A review of studies. J Speech Hear Disord. 1969;34(1):20–29. doi: 10.1044/jshd.3401.20
  • 15. Laukkanen AM. Artificial Pitch Changing in Auditory Feedback as a Possible Method in Voice Training and Therapy. Folia Phoniatrica et Logopaedica. 1994;46(2):86–96. doi: 10.1159/000266297
  • 16. Larson CR. Cross-modality influences in speech motor control: The use of pitch shifting for the study of F0 control. J Commun Disord. 1998;31(6):489–503. doi: 10.1016/s0021-9924(98)00021-5
  • 17. Jones JA, Keough D. Auditory-motor mapping for pitch control in singers and nonsingers. Exp Brain Res. 2008;190(3):279–287. doi: 10.1007/s00221-008-1473-y
  • 18. Natke U, Donath TM, Kalveram KT. Control of voice fundamental frequency in speaking versus singing. J Acoust Soc Am. 2003;113:1587–1593. doi: 10.1121/1.1543928
  • 19. Sundberg J. The Science of the Singing Voice. Dekalb, IL: Northern Illinois University Press; 1987.
  • 20. Zarate JM. The neural control of singing. Front Hum Neurosci. 2013;7:237. doi: 10.3389/fnhum.2013.00237
  • 21. Zarate JM, Zatorre RJ. Experience-dependent neural substrates involved in vocal pitch regulation during singing. NeuroImage. 2008;40(4):1871–1887. doi: 10.1016/j.neuroimage.2008.01.026
  • 22. Tumber AK, Scheerer NE, Jones JA. Attentional demands influence vocal compensations to pitch errors heard in auditory feedback. PLoS One. 2014;9(10):e109968. doi: 10.1371/journal.pone.0109968
  • 23. Li W, Guo Z, Jones JA, Huang X, Chen X, Liu P, et al. Training of working memory impacts neural processing of vocal pitch regulation. Sci Rep. 2015;5:16562. doi: 10.1038/srep16562
  • 24. Behroozmand R, Karvelis L, Liu H, Larson CR. Vocalization-induced enhancement of the auditory cortex responsiveness during voice F0 feedback perturbation. Clin Neurophysiol. 2009;120(7):1303–1312. doi: 10.1016/j.clinph.2009.04.022
  • 25. Behroozmand R, Larson CR. Error-dependent modulation of speech-induced auditory suppression for pitch-shifted voice feedback. BMC Neuroscience. 2011;12:54. doi: 10.1186/1471-2202-12-54
  • 26. Korzyukov O, Karvelis L, Behroozmand R, Larson CR. ERP correlates of auditory processing during automatic correction of unexpected perturbations in voice auditory feedback. Int J Psychophysiol. 2012;83(1):71–78. doi: 10.1016/j.ijpsycho.2011.10.006
  • 27. Korzyukov O, Tapaskar N, Pflieger ME, Behroozmand R, Lodhavia A, Patel S, et al. Event related potentials study of aberrations in voice control mechanisms in adults with attention deficit hyperactivity disorder. Clin Neurophysiol. 2015;126(6):1159–1170. doi: 10.1016/j.clinph.2014.09.016
  • 28. Liu H, Meshman M, Behroozmand R, Larson CR. Differential effects of perturbation direction and magnitude on the neural processing of voice pitch feedback. Clin Neurophysiol. 2011;122(5):951–957. doi: 10.1016/j.clinph.2010.08.010
  • 29. ANSI. Methods for manual pure-tone threshold audiometry (ANSI S3.21–2004). New York, NY: American National Standards Institute; 2004.
  • 30. Musiek FE, Baran JA, Pinheiro ML. Duration pattern recognition in normal subjects and patients with cerebral and cochlear lesions. Audiology. 1990;29(6):304–313. doi: 10.3109/00206099009072861
  • 31. Musiek FE, Pinheiro ML. Frequency patterns in cochlear, brainstem, and cerebral lesions. Audiology. 1987;26(2):79–88.
  • 32. Boersma P, Weenink D. Praat, a system for doing phonetics by computer. Glot International. 2001;5(9/10):341–345.
  • 33. Liu H, Wang EQ, Metman LV, Larson CR. Vocal responses to perturbations in voice auditory feedback in individuals with Parkinson’s disease. PLoS One. 2012;7(3):e33629. doi: 10.1371/journal.pone.0033629
  • 34. Chen Z, Liu P, Jones JA, Huang D, Liu H. Sex-related differences in vocal responses to pitch feedback perturbations during sustained vocalization. J Acoust Soc Am. 2010;128(6):EL355–EL360. doi: 10.1121/1.3509124
  • 35. Li J, Hu H, Chen N, Jones JA, Wu D, Liu P, et al. Aging and Sex Influence Cortical Auditory-Motor Integration for Speech Control. Front Neurosci. 2018;12:749. doi: 10.3389/fnins.2018.00749
  • 36. Ramig LO, Fox C, Sapir S. Parkinson’s disease: speech and voice disorders and their treatment with the Lee Silverman Voice Treatment. Semin Speech Lang. 2004;25(2):169–180. doi: 10.1055/s-2004-825653
  • 37. Sapir S, Ramig LO, Fox CM. Intensive voice treatment in Parkinson’s disease: Lee Silverman Voice Treatment. Expert Rev Neurother. 2011;11(6):815–830. doi: 10.1586/ern.11.43

Decision Letter 0

Jamie Males

4 Aug 2022

PONE-D-22-14364

Effects of Sensorimotor Voice Training on Event-Related Potentials to Pitch-Shifted Auditory Feedback

PLOS ONE

Dear Dr. Patel,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Two peer reviewers have evaluated your manuscript and have raised a number of queries that need to be carefully addressed in a revision. Please pay particular attention to clarifying the aspects of the study design and methods that the reviewers have identified as needing further explanation.

Please submit your revised manuscript by Sep 18 2022 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Jamie Males

Editorial Office

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. In your Data Availability statement, you have not specified where the minimal data set underlying the results described in your manuscript can be found. PLOS defines a study's minimal data set as the underlying data used to reach the conclusions drawn in the manuscript and any additional data required to replicate the reported study findings in their entirety. All PLOS journals require that the minimal data set be made fully available. For more information about our data policy, please see http://journals.plos.org/plosone/s/data-availability.

Upon re-submitting your revised manuscript, please upload your study’s minimal underlying data set as either Supporting Information files or to a stable, public repository and include the relevant URLs, DOIs, or accession numbers within your revised cover letter. For a list of acceptable repositories, please see http://journals.plos.org/plosone/s/data-availability#loc-recommended-repositories. Any potentially identifying patient information must be fully anonymized.

Important: If there are ethical or legal restrictions to sharing your data publicly, please explain these restrictions in detail. Please see our guidelines for more information on what we consider unacceptable restrictions to publicly sharing data: http://journals.plos.org/plosone/s/data-availability#loc-unacceptable-data-access-restrictions. Note that it is not acceptable for the authors to be the sole named individuals responsible for ensuring data access.

We will update your Data Availability statement to reflect the information you provide in your cover letter.

3. We note that you have stated that you will provide repository information for your data at acceptance. Should your manuscript be accepted for publication, we will hold it until you provide the relevant accession numbers or DOIs necessary to access your data. If you wish to make changes to your Data Availability statement, please describe these changes in your cover letter and we will update your Data Availability statement to reflect the information you provide.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Partly

Reviewer #2: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: I Don't Know

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: No

Reviewer #2: No

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: This paper used a pitch-shift paradigm to investigate the effect of training on neuronal and behavioral responses. The experiment had a pre-training baseline and a post-training baseline. In between, four training sessions took place on four different days, in which the participants were given different instructions (to oppose, to follow, or to ignore) for responding to the pitch perturbation. The authors concluded that there was a significant difference between the post-training and the pre-training baselines. The topic is interesting. However, I found that the results for the group effect were missing from the report and discussion, while their figures clearly showed a difference among the three groups. Their discussion was not completely consistent with their findings, either. My comments and suggestions can be found below.

Ln 42:

“magnitude of the voice pitch shift response in the baseline” -> Since you have the pre-training and post-training baseline tasks, it would be better to specify them here.

Ln 152:

“vocalize an /a/ vowel” -> prolong for how many seconds?

Ln 171:

I wonder why each session would take up to 1.5 hours. Assume that you asked the participants to vocalize /a/ for 3 or 5 seconds. Each training session had 4 blocks of 52 vocalizations, meaning that they had to say /a/ 208 times. Then, the total time needed would be less than half an hour or less than an hour. How long was the inter-vocalization delay? How long was the break between each block?

Ln 182:

“a short practice session of 10 trials before testing” -> What was the instruction for the practice session? Was it the same as the baseline task (to ignore the pitch-shift stimuli)?

Ln 190-191:

Please explain the rationale for using different pitch-shift stimuli in the baseline and the training tasks. In the training tasks, a single 1000-ms long shift per vocalization was used (see Ln 174). However, 5 shifts (each 200 ms-long) per vocalization were used in the baseline tasks. I wonder why.

Ln 220-221:

“We were primarily interested in the main effect of time and the interaction of group x time…” -> It seems that the author(s) preferred a pre-planned comparison. Then, why did the author(s) include “electrode” and “stimulus direction” as fixed effects in the model? The main effects of electrode and stimulus direction were all missing from the results section. If these factors are not important, I would suggest removing them from the analysis. If they should be included in the regression model, then I would expect to see the statistics reported in the results section (no matter whether they were significant).

Ln 235:

Why was the 1000 cents threshold used for the entire duration, while the 30 cents threshold was used in the pre-shift period?

Ln 236:

“Only responses that opposed the direction of the pitch shift were used.” -> Can you provide the percentage of opposing responses for each group (opposing, following, non-varying)? It would be interesting to see if the participants in the “following” group had fewer opposing responses than those in the “opposing” group. I think it is worth discussing whether the training method (i.e., the group effect) played an important role in their responses.

Ln 272 (Fig 4):

Figure 4 shows that there should be a main effect of group for the N1 peak amplitude (as you can see, the non-varying group had a smaller N1 than the “following” group, and the “following” group a smaller N1 than the “opposing” group). In your report (from Ln 260 to Ln 272), you did not mention whether the main effect of group was significant or not. As I suggested above, if all four fixed factors (time, stimulus direction, electrode, group) were included, all the statistics (both significant and non-significant) should be reported here. Currently, many statistics (main effects and interactions) are missing from this paragraph.

For your Figure 4, why did you choose to plot the data for the Cz electrode? What happened to the other electrodes?

Ln 282-283:

“…the electrode by group interaction was significant for both P2 peak latency …and amplitude” -> What are the results of simple main effects analyses? Only significant at the Cz electrode? (Fig5) What happened to the other electrodes?

Ln 285-288:

I feel it’s a bit awkward to report the N1 peak latency effect in this paragraph, as you mainly focused on the statistical results for P2 in this paragraph. A short summary for both N1 and P2 can be placed in a separate paragraph below.

Ln 289:

Your Figures 1, 2, 4 (N1 peak amplitude), 5 (P2 peak amplitude) all showed a main effect of group. However, the statistics were all missing in the text. I think the manipulation of group (i.e., different instructions for the participants) was an important factor in the present study. I would expect a considerable portion of discussion on this issue.

Ln 310:

What was the result of Levene’s test for the log-transformed data? Please provide the statistics.

Ln 313:

“…changed from the pre- to post-training period …” -> I think the word “change” was a bit vague. It would be better to specify the tendency: larger or smaller. Same for the sentence in Ln 322.

Ln 342:

“The results are consistent with the findings of Li and colleagues [23] who report a decrease in N1 and an increase in P2 magnitude post-training.” -> But you had no significant effect of time in your N1 amplitude. How can you say they are consistent? Or can you explain why the effect of P2 amplitude was present and why the effect of N1 amplitude was absent in your study?

Ln 345-346:

“of the three training tasks” -> What are the three? (opposing, following, non-varying?)

“automatic [21]…efficient [23]” -> Do these two terms refer to peak latency? If yes, then it should be moved to Ln 342, where you discussed your N1 and P2 peak latencies.

Ln 348:

“…with a reduced amplitude post-training compared to pre-training” -> But your results show that P2 peak amplitude was enhanced rather than reduced in the post-training task compared to pre-training. Your discussion here does not match what you found. Please revise.

Ln 363-366:

“Results show that practicing to ignore auditory changes in pitch and hold your voice pitch steady also produced differences in ERP/voice responses, potentially because the non-varying task invoked similar cognitive processes used for voice error detection and correction as the other dynamic or volitional tasks.” -> It seems that the authors try to compare the similarity between pre-training and post-training. However, a more interesting comparison would be whether the instruction during the training session would affect how they respond in the post-training task (i.e., the non-varying task). Please elaborate more on this.

Ln 393-394:

“in both ERPs and voice responses, the peak responses occurred earlier and with a reduced amplitude post-training compared to pre-training.” -> Your N1 was reduced but P2 was enhanced. So this conclusion should be revised, not “both” ERPs.

Reviewer #2: The authors report an investigation of the nature of the pitch-shift reflex before and after a training period to inform whether changes are observed in ERPs and vocal pitch. The study examines an important and relevant question to the field. The motivation and methodological choices for the study require additional information to fully evaluate the contributions. The authors suggest in the introduction and abstract that the current investigation has therapeutic potential, but this notion is not clear from the manuscript in its current form. More information in the motivation and interpretation of the study is required to support this claim throughout.

The introduction requires more detail to help the reader understand the motivation of the current study. In addition, the specific study aims and hypotheses that correspond to the methods are absent from the introduction. This information is required to fully evaluate the appropriateness of the study methods, analysis, and interpretation of the results. Detailed comments below.

Line 59 – 66. It is not clear why understanding the nature of this response would be useful for voice rehabilitation at this point in the manuscript. A discussion of how individuals seeking voice rehabilitation have differing responses to these paradigms is needed.

Line 75 - 76. Please define what the authors intend by ‘voice control’ and ‘pitch control’. If these are referring to the same thing, it would also be helpful to keep terminology consistent as well.

Line 84. Pitch-shift task is not defined. These tasks are variable in the literature, including timing, shift amount, frequency of shifts, duration of shifts, etc. Please include more detail about the paradigms throughout the introduction and review of prior work for clarity.

Line 91. ERP is not defined or explained at this point. The reader needs more information to evaluate what the contribution of this study is to the current study motivation.

Line 92. N1 and P2 are not defined or explained at this point.

Line 95 – 100. The authors motivate the study purpose by saying that no evidence of this yet exists. This could be further strengthened by the addition of why these questions, specifically, need to be answered.

Line 111. This is the first mention of auditory-motor ERPs. Please provide more information here or earlier in the introduction.

Line 116. The authors use inconsistent nomenclature for “f0”. Here, it is f0 and elsewhere in the manuscript the ‘f’ is italicized. It would be helpful to use the notation recently proposed as a consensus; a lower-case italic f and subscript of an ‘o’ for oscillation (see Titze, I. R., Baken, R. J., Bozeman, K. W., Granqvist, S., Henrich, N., Herbst, C. T., ... & Wolfe, J. (2015). Toward a consensus on symbolic notation of harmonics, resonances, and formants in vocalization. The Journal of the Acoustical Society of America, 137(5), 3005-3007.)

Line 116 – 130. It is nicely outlined how the authors are building on the study by Hain and colleagues, however the reason for this additional investigation could be better explained. Specific research aims of the current work are not described. Study hypotheses are also absent. Please include the specific study aims and hypotheses for this work.

Line 147 – 149. The groups are not balanced by sex, which could have impacted the results. Pitch-shift responses have been shown to be impacted by speaker sex (Chen, Z., Liu, P., Jones, J. A., Huang, D., & Liu, H. (2010). Sex-related differences in vocal responses to pitch feedback perturbations during sustained vocalization. The Journal of the Acoustical Society of America, 128(6), EL355-EL360.). This should be acknowledged, along with other study limitations, with references to the specific directional effects that might be expected.

Experimental methods are well described.

For the analysis section, more information and justification are needed for data that were excluded. This information is required to fully evaluate the analysis methods and interpretation of the study. Comments below.

Line 231 – 236. More descriptive statistics are needed for the removed data. E.g., How many trials were removed for each speaker on average, and how many speakers required trials to be removed? A response of greater or less than 10 semitones is very large, how many speakers demonstrated a response outside this threshold? This information informs future work and is important to clarify.

Line 236. Why were only responses that opposed the direction of the pitch shift used? This requires justification. Recent work supports that following responses are common, with one study observing that all of their participants had a proportion of following responses to pitch-shifts (Franken, M. K., Acheson, D. J., McQueen, J. M., Hagoort, P., & Eisner, F. (2018). Opposing and following responses in sensorimotor speech control: Why responses go both ways. Psychonomic Bulletin & Review, 25(4), 1458-1467.). It is not clear why following responses were removed from the current sample when evidence shows it is a consistently observed behavior that is relevant to understanding the nature of the pitch-shift response.

Line 236. It would also be interesting to include if there were differing percentages of following responses by group (opposing, following, and non-varying).

Line 238 – 240. More information is required on the statistical testing. Please include descriptive information about each statistical test (e.g., method, software used, input variables, outcome variables), any corrections that were applied, and the corresponding study aim/purpose for the test.

Line 260. Please include statistical methods and software in the methods section.

Results and discussion – The results appear to be thoroughly described, but the results and interpretation cannot be fully evaluated given earlier comments regarding study hypothesis and analysis methods.

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Li-Hsin Ning

Reviewer #2: No

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2023 Jan 20;18(1):e0269326. doi: 10.1371/journal.pone.0269326.r002

Author response to Decision Letter 0


19 Sep 2022

We appreciate the editor’s and reviewers’ time in providing helpful comments and suggestions which we believe have improved our manuscript (PONE-D-22-14364). We have carefully considered each of the comments identified and hopefully have improved the clarity of our manuscript. Below we list our responses to the suggestions and criticisms and discuss how we have modified our manuscript based on the reviewer feedback.

Reviewer 1:

1. Ln 42: “magnitude of the voice pitch shift response in the baseline” -> Since you have the pre-training and post-training baseline tasks, it would be better to specify them here.

RESPONSE: We now clarify the wording as below:

Results showed that all three types of training affected the ERPs (N1 peak latency, P2 peak latency, and P2 peak amplitude) and the response latency and magnitude of the voice pitch shift response in the pre-training and post-training task (i.e., “hold your voice pitch steady” task; an indicator of voice pitch regulation).

2. Ln 152: “vocalize an /a/ vowel” -> prolong for how many seconds?

RESPONSE: We now note that participants vocalized for 5 sec.

3. Ln 171: I wonder why each session would take up to 1.5 hours. Assume that you asked the participants to vocalize /a/ for 3 or 5 seconds. Each training session had 4 blocks of 52 vocalizations, meaning that they had to say /a/ 208 times. Then, the total time that should be used would be less than an half hour or less than an hour. How long was the inter-vocalization delay? How long was the break between each block?

RESPONSE: The pre and post sessions with EEG took up to 1.5 hours (a minimum of 30-45 minutes are needed to put on and calibrate the EEG system, with additional time needed to remove the cap and wash the hair), but the individual training sessions without EEG took around 30 minutes, resulting in a total experiment time of 5.5 hours. The ISI was 700-900ms as outlined in the “Baseline Task”.

4. Ln 182: “a short practice session of 10 trials before testing” -> What was the instruction for the practice session? Was it the same as the baseline task (to ignore the pitch-shift stimuli)?

RESPONSE: The instruction was the same as presented during the training task as noted in this section. The goal of this practice session was to familiarize the subject with the training task.

5. Ln 190-191: Please explain the rationale for using different pitch-shift stimuli in the baseline and the training tasks. In the training tasks, a single 1000-ms long shift per vocalization was used (see Ln 174). However, 5 shifts (each 200 ms-long) per vocalization were used in the baseline tasks. I wonder why.

RESPONSE: The “baseline” task used in the pre- and post-testing is commonly used to assess the pitch-shift response, which occurs automatically in response to a brief change in pitch. Because the training task involved volitional modification of voice pitch, a longer time interval was used to reduce the additional memory demands needed to produce the volitional changes in vocalization. In addition, one of the aims of the study was to see if the volitional responses produced during the training task generalized to the typical pitch-shift response.

6. Ln 220-221: “We were primarily interested in the main effect of time and the interaction of group x time…” -> It seems that the author(s) preferred a pre-planned comparison. Then, why did the author(s) include “electrode” and “stimulus direction” as the fixed effects in the model? The main effects of electrode and stimulus direction were all missing in the results section. If these factors are not important, I would suggest to remove them from the analysis. If they should be included in the regression model, then I would expect to see the statistics report in the results section (no matter whether they were significant).

RESPONSE: We believe it is necessary to examine these variables. All of the nonsignificant statistics are now included.

7. Ln 235: Why was the 1000 cents threshold used for the entire duration, while the 30 cents threshold was used in the pre-shift period?

RESPONSE: The trial had vocalization while the pre-shift period was not expected to have vocalization. Therefore, the outlier removal identification threshold needed to be adjusted to the signals expected during each time period. We clarified this in text:

“Then outliers were removed from each trial using several processes including normalization by setting the mean baseline voice pitch to 0 cents and removal of extreme values (e.g., extraneous background noise) in the vocalization wave prior to the pitch-shift (for threshold = 30 cents, where max cents > threshold, and min cents < -threshold were rejected) and in the entire duration of each trial when vocalization was occurring (for threshold = 1000 cents, where the whole wave was rejected if max cents > threshold or min cents < -threshold).”
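The two thresholds quoted above can be sketched in code. This is a minimal illustration only, not the authors' custom analysis software: the function names, the use of the geometric mean of the pre-shift samples as the 0-cent reference, and the choice to mark rejected pre-shift samples as `None` are all assumptions made for the example.

```python
import math

PRE_SHIFT_THRESHOLD_CENTS = 30.0   # threshold quoted for the pre-shift window
TRIAL_THRESHOLD_CENTS = 1000.0     # threshold quoted for the whole trial


def to_cents(fo_hz, n_pre_shift):
    """Convert an fo contour (Hz) to cents so the pre-shift mean is 0 cents.

    Assumption: the reference is the mean of log2(fo) over the pre-shift
    samples, which makes the mean pre-shift value exactly 0 cents.
    """
    ref_log = sum(math.log2(f) for f in fo_hz[:n_pre_shift]) / n_pre_shift
    return [1200.0 * (math.log2(f) - ref_log) for f in fo_hz]


def clean_trial(fo_hz, n_pre_shift):
    """Return the cleaned cents contour, or None if the whole trial is rejected."""
    cents = to_cents(fo_hz, n_pre_shift)
    # Whole-trial rejection: any sample beyond +/-1000 cents.
    if max(cents) > TRIAL_THRESHOLD_CENTS or min(cents) < -TRIAL_THRESHOLD_CENTS:
        return None
    # Pre-shift window: drop individual samples beyond +/-30 cents.
    return [
        None if i < n_pre_shift and abs(c) > PRE_SHIFT_THRESHOLD_CENTS else c
        for i, c in enumerate(cents)
    ]
```

For example, a hypothetical octave error in pitch extraction (200 Hz tracked as 400 Hz) corresponds to 1200 cents and would cause the whole trial to be rejected under this sketch.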

8. Ln 236: “Only responses that opposed the direction of the pitch shift were used.” -> Can you provide the percentage of opposing responses for each group (opposing, following, non-varying)? It would be interesting to see if the participants in the “following” group would have less opposing responses than those in the “opposing” group. I think it is worth discussing whether the training method (i.e., the group effect) played an important role in their responses.

RESPONSE: This is a great point. The purpose of this paper was to focus on the neurological responses to the volitional training paradigm, which hasn’t been done before. Because work has shown that both types of responses occur (following and opposing; see Franken, M. K., Acheson, D. J., McQueen, J. M., Hagoort, P., & Eisner, F. (2018). Opposing and following responses in sensorimotor speech control: Why responses go both ways. Psychonomic Bulletin & Review, 25(4), 1458-1467), and there is the potential for brain activation based on these differential responses to cancel each other out, we only analyzed responses that opposed the direction of the pitch shift to increase the homogeneity of the anticipated brain activation and focus on the aspect of responding to a pitch shift that has typically been utilized in the clinical training literature.

9. Ln 272 (Fig 4): Figure 4 shows that there should be a main effect of group for the N1 peak amplitude (as you can see the non-varying group had smaller N1 than the “following” group and the “following” group smaller than the “opposing” group). In your report (from Ln 260 to Ln 272), you did not mention whether the main effect of group was significant or insignificant. As I suggested above, if all the four fixed factors (time, stimulus direction, electrode, group) were included, all the statistics (both significance and insignificance) should be reported here. Currently, many statistics (main effects and interactions) were missing in this paragraph.

RESPONSE: The main effect of group was not significant for any of the ERPs (N1 latency, N1 amplitude, P2 latency, P2 amplitude). These non-significant main effects and interactions are now included in the results section.

10. For your Figure 4, why did you choose to plot the data for the Cz electrode? What happened to the other electrodes?

RESPONSE: We chose to show an illustration of the data at only the Cz electrode for clarity because it is the site of maximum activity. The N1-P2 complex is traditionally recorded from the midline electrode locations and is frequently found to be the largest and clearest at electrode Cz in speech training paradigms (Tremblay, K., Kraus, N., McGee, T., Ponton, C., & Otis, B. (2001). Central auditory plasticity: changes in the N1-P2 complex after speech-sound training. Ear and hearing, 22(2), 79-90; Tremblay, K. L., Billings, C., & Rohila, N. (2004). Speech evoked cortical potentials: effects of age and stimulus presentation rate. Journal of the American Academy of Audiology, 15(03), 226-237).

11. Ln 282-283: “…the electrode by group interaction was significant for both P2 peak latency …and amplitude” -> What are the results of simple main effects analyses? Only significant at the Cz electrode? (Fig5) What happened to the other electrodes?

RESPONSE: We now report the statistics of the main effect of electrode for the P2 amplitude and lack of main effect of electrode for the P2 latency in the results section. We now clarify that for brevity and clarity, Figure 5 is plotted only at the Cz electrode because it was the site of the maximal response.

12. Ln 285-288: I feel it’s a bit awkward to say the N1 peak latency effect in this paragraph, as you mainly focused on the statistical results of P2 in this paragraph. A short summary for both N1 and P2 can be placed in a separate paragraph below.

RESPONSE: We have moved the final line summarizing the N1 peak latency, P2 peak latency, and P2 peak amplitude into a short summary paragraph below. This paragraph now reads:

“Taken together the findings of a significant effect of time and not a time by group interaction demonstrates a consistent change in ERPs following exposure to training. Specifically, the N1 peak latency, P2 peak latency, and P2 amplitude were modulated from pre-training to post-training across all groups.”

13. Ln 289: Your Figures 1, 2, 4 (N1 peak amplitude), 5 (P2 peak amplitude) all showed a main effect of group. However, the statistics were all missing in the text. I think the manipulation of group (i.e., different instructions for the participants) was an important factor in the present study. I would expect a considerable portion of discussion on this issue.

RESPONSE: The main effect of group was not significant for any of the ERP measures. We now include the nonsignificant main effect of group in the results section. However, the main effect of group was not of theoretical interest in this study. It is the group x time interaction that would reflect potential changes in voice and ERPs following different instructions (to oppose, follow, or hold voice neutral). This interaction was also not significant, indicating that the instruction type presented did not result in differential ERP responses following training. Our conclusion regarding the lack of interaction is that all 3 groups experienced training effects to a similar degree regardless of the instruction provided. This finding and conclusion are discussed in the discussion section of the manuscript.

14. Ln 310: What was the Levene’s test result of log-transformed data? Please provide the statistics.

RESPONSE: The statistics section has now been updated and re-organized.

15. Ln 313: “…changed from the pre- to post-training period …” -> I think the word “change” was a bit vague. It would be better to specify the tendency: larger or smaller. Same for the sentence in Ln 322.

RESPONSE: We now specify the changes in latency and amplitude in the voice responses following the training period. Specifically, we specify the reduction in latency and the reduced peak amplitude in the voice response following training.

16. Ln 342: “The results are consistent with the findings of Li and colleagues [23] who report a decrease in N1 and an increase in P2 magnitude post-training.” -> But you had no significant effect of time in your N1 amplitude. How can you say they are consistent? Or can you explain why the effect of P2 amplitude was present and why the effect of N1 amplitude was absent in your study?

RESPONSE: We expanded on the findings and clarify where we are and are not consistent with the ERP findings between the Li et al. study and our study. The modified information is below:

These results are consistent with the findings of Li and colleagues [23] who report an increase in P2 magnitude post-training. While Li and colleagues [23] found a decrease in the N1 amplitude, we did not find changes to the N1 following training in our study, potentially due to differences in the training task. Other research has shown a N1 suppression (vocalization compared to listening) for pitch-shifts that occur at voice onset cite but a P2 enhancement for pitch shifts that occur mid-vocalization [24, 25]. Behroozmand et al. [24] suggest that this enhancement in the middle of vocalization may reflect an increased sensitivity or responsiveness to auditory feedback during the resolution of mismatches between the intended vocalization and its feedback.

17. Ln 345-346: “of the three training tasks” -> What are the three? (opposing, following, non-varying?)

“automatic [21]…efficient [23]” -> Do these two terms refer to peak latency? If yes, then it should be moved to Ln 342, where you discussed your N1 and P2 peak latencies.

RESPONSE: We now clarify that this trained motor behavior is based on practice with the 3 instructed conditions of opposing, following, and holding the voice steady. The included statement ‘trained motor behavior (after practice with the 3 instructed conditions of opposing, following, and holding the voice steady) may have become more automatic [21] and the processing of auditory information has become more efficient [23]’ is not referencing the peak latency specifically but more broadly is in agreement with previous interpretations about the effects of training.

18. Ln 348: “…with a reduced amplitude post-training compared to pre-training” -> But your results show that P2 peak amplitude was enhanced rather than reduced in the post-training task compared to pre-training. Your discussion here does not match what you found. Please revise.

RESPONSE: The full line reads “These results were complemented by the voice changes, which revealed significant changes in response latency and magnitude in that the peak responses occurred earlier and with a reduced amplitude post-training compared to pre-training”. This discussion is about the changes in voice amplitude following training and not the ERPs following training. We now put this information related to voice changes in a separate paragraph to make clear that the discussion has shifted from ERPs to voice responses.

19. Ln 363-266: “Results show that practicing to ignore auditory changes in pitch and hold your voice pitch steady also produced differences in ERP/voice responses, potentially because the non-varying task invoked similar cognitive processes used for voice error detection and correction as the other dynamic or volitional tasks.” -> It seems that the authors try to compare the similarity between pre-training and post-training. However, a more interesting comparison would be whether the instruction during the training session would affect how they respond in the post-training task (i.e., the non-varying task). Please elaborate more on this.

RESPONSE: We agree that this question is of importance and was in fact one of the main motivators of this study (what is the impact of study instructions on post-training performance). However, no time x group effect was found in this study, indicating that all groups had a similar response to the training intervention. We interpret this lack of effect of instruction as outlined in the manuscript that the holding steady condition required the participants to volitionally pay attention to their auditory feedback and that the cognitive processes involved in this act of attending to the feedback which were present in all 3 groups were primarily responsible for the impact of training on the ERP/voice response.

20. Ln 393-394: “in both ERPs and voice responses, the peak responses occurred earlier and with a reduced amplitude post-training compared to pre-training.” -> Your N1 was reduced but P2 was enhanced. So this conclusion should be revised, not “both” ERPs.

RESPONSE: We were not intending to imply that the same result was observed for both N1 and P2, but rather for voice and ERP N1s. The text has been modified to clarify this:

Changes were seen in ERP responses and voice responses, whereby the peak responses occurred earlier and the peak amplitude was modified post-training compared to pre-training.

Reviewer 2:

1. The motivation and methodological choices for the study require additional information to fully evaluate the contributions. The authors suggest in the introduction and abstract that the current investigation has therapeutic potential, but this notion is not clear from the manuscript in its current form. More information in the motivation and interpretation of the study is required to support this claim throughout. In addition, the specific study aims and hypotheses that correspond to the methods are absent from the introduction. This information is required to fully evaluate the appropriateness of the study methods, analysis, and interpretation of the results. Detailed comments below.

Line 59 – 66. It is not clear why understanding the nature of this response would be useful for voice rehabilitation at this point in the manuscript. A discussion of how individuals seeking voice rehabilitation have differing responses to these paradigms is needed.

RESPONSE: This is not the focus of the paper, but it is a possible avenue. We decided to reword and delete some of this content.

2. Line 75 - 76. Please define what the authors intend by ‘voice control’ and ‘pitch control’. If these are referring to the same thing, it would also be helpful to keep terminology consistent as well.

RESPONSE: Yes, these refer to the same thing. We now use “voice control” throughout.

3. Line 84. Pitch-shift task is not defined. These tasks are variable in the literature, including timing, shift amount, frequency of shifts, duration of shifts, etc. Please include more detail about the paradigms throughout the introduction and review of prior work for clarity.

RESPONSE: We added a definition earlier in the text. We also updated this line to reflect the basic paradigm information about pitch shift timing and amounts presented in the Tumber manuscript.

4. Line 91. ERP is not defined or explained at this point. The reader needs more information to evaluate what the contribution of this study is to the current study motivation.

RESPONSE: We now introduce the terminology of ERP at this earlier point in the manuscript and clarify that event related potentials to auditory stimuli are measured during the pre and post training conditions.

5. Line 92. N1 and P2 are not defined or explained at this point.

RESPONSE: We now clarify that the N1 and P2 are auditory evoked potentials that are recorded during electroencephalographic measurement in the study that is described. Note that it is not standard to define N1 and P2 beyond the extent that we provide; doing so would require significantly more text to give the reader a basic understanding of event-related potentials. See our cited literature, for example:

23. Li W, Guo Z, Jones JA, Huang X, Chen X, Liu P, et al. Training of working memory impacts neural processing of vocal pitch regulation. Sci Rep. 2015;5:16562. doi: 10.1038/srep16562.

24. Behroozmand R, Karvelis L, Liu H, Larson CR. Vocalization-induced enhancement of the auditory cortex responsiveness during voice F0 feedback perturbation. Clin Neurophysiol. 2009;120(7): 1303-1312. doi: 10.1016/j.clinph.2009.04.022.

6. Line 95 – 100. The authors motivate the study purpose by saying that no evidence of this yet exists. This could be further strengthened by the addition of why these questions, specifically, need to be answered.

RESPONSE: The importance of identifying this evidence as minimum parameters under which training paradigms operate and its impact on the clinical feasibility of interventions is now identified.

7. Line 111. This is the first mention of auditory-motor ERPs. Please provide more information here or earlier in the introduction.

RESPONSE: For clarity we now describe these ERPs as auditory-evoked measures. Details about the auditory evoked N1-P2 are introduced earlier (see #5 above).

8. Line 116. The authors use inconsistent nomenclature for “f0”. Here, it is f0 and elsewhere in the manuscript the ‘f’ is italicized. It would be helpful to use the notation recently proposed as a consensus; a lower-case italic f and subscript of an ‘o’ for oscillation (see Titze, I. R., Baken, R. J., Bozeman, K. W., Granqvist, S., Henrich, N., Herbst, C. T., ... & Wolfe, J. (2015). Toward a consensus on symbolic notation of harmonics, resonances, and formants in vocalization. The Journal of the Acoustical Society of America, 137(5), 3005-3007.)

RESPONSE: We now use consistent nomenclature following the consensus panel throughout the manuscript.

9. Line 116 – 130. It is nicely outlined how the authors are building on the study by Hain and colleagues, however the reason for this additional investigation could be better explained. Specific research aims of the current work are not described. Study hypotheses are also absent. Please include the specific study aims and hypotheses for this work.

RESPONSE: We now clarify the specific aims and hypotheses as follows.

The specific aims of this study were to examine the impact of three brief volitional training paradigms on 1) auditory-motor ERPs (the N1-P2 complex) and 2) voice responses in a pitch-shift task. We predict 1) shorter latencies in the N1 and P2 auditory motor response following volitional training; 2) larger amplitudes in the N1 and P2 auditory motor response following volitional training; and 3) shorter latencies and smaller amplitudes in the voice response during a pitch-shift task following training.

10. Line 147 – 149. The groups are not balanced by sex, which could have impacted the results. Pitch-shift responses have been shown to be impacted by speaker sex (Chen, Z., Liu, P., Jones, J. A., Huang, D., & Liu, H. (2010). Sex-related differences in vocal responses to pitch feedback perturbations during sustained vocalization. The Journal of the Acoustical Society of America, 128(6), EL355-EL360.). This should be acknowledged, along with other study limitations, with references to the specific directional effects that might be expected.

RESPONSE: Limitations of the study including the potential implications of gender effects are now presented in the discussion section.

11. Line 231 – 236. More descriptive statistics are needed for the removed data. E.g., How many trials were removed for each speaker on average, and how many speakers required trials to be removed? A response of greater or less than 10 semitones is very large, how many speakers demonstrated a response outside this threshold? This information informs future work and is important to clarify.

RESPONSE: The outlier removal process occurred within a trial. The process described did not pertain to trial removal. Pitch extraction is known to produce errors in computation. The steps taken were simply to reduce invalid data points due to errors in pitch extraction (see below). The last line indicates the trial rejection criteria; however, rejection is rare, occurring in less than 5% of trials:

Then outliers were removed from each trial using several processes including normalization by setting the mean baseline voice pitch to 0 cents and removal of extreme values in the vocalization wave prior to the pitch-shift (for threshold = 30 cents, where max cents > threshold, and min cents < -threshold were rejected) and in the entire duration of each trial (for threshold = 1000 cents, where the whole wave was rejected if max cents > threshold or min cents < -threshold).

12. Line 236. Why were only responses that opposed the direction of the pitch shift used? This requires justification. Recent work supports that following responses are common, with one study observing that all of their participants had a proportion of following responses to pitch-shifts (Franken, M. K., Acheson, D. J., McQueen, J. M., Hagoort, P., & Eisner, F. (2018). Opposing and following responses in sensorimotor speech control: Why responses go both ways. Psychonomic Bulletin & Review, 25(4), 1458-1467.). It is not clear why following responses were removed from the current sample when evidence shows it is a consistently observed behavior that is relevant to understanding the nature of the pitch-shift response.

RESPONSE: We agree that following responses are a consistently observed behavior. The presence of both opposing and following responses in fact motivated this study. However, we analyzed the compensatory responses here because they are present in a majority of trials, at least in the studies conducted in CL’s lab (observationally). Because following and compensatory responses oppose one another in direction, aggregating them would simply cancel out the compensatory response, which was of primary interest in this study.
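The cancellation argument can be illustrated with a toy calculation. The sketch below is an assumption-laden illustration, not the study's analysis: the response contours are invented numbers in cents, and `mean_response` is a hypothetical helper that averages trials sample by sample.

```python
def mean_response(trials):
    """Average a list of equal-length response contours sample by sample."""
    n = len(trials)
    return [sum(trial[i] for trial in trials) / n for i in range(len(trials[0]))]


# Hypothetical opposing response (cents) to a downward pitch shift.
opposing = [0.0, 5.0, 10.0, 5.0]
# A following response of the same size moves in the other direction.
following = [-value for value in opposing]

# Equal numbers of opposing and following trials average to a flat line.
mixed_equal = mean_response([opposing, following])

# Three opposing trials and one following trial halve the averaged peak.
mixed_majority = mean_response([opposing, opposing, opposing, following])
```

In this toy case the equal mix averages to zero at every sample, and the 3:1 mix shows a peak of 5 cents rather than 10, which is why mixing the two response types can understate or erase the compensatory response in an average.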

13. Line 236. It would also be interesting to include if there were differing percentages of following responses by group (opposing, following, and non-varying).

RESPONSE: We agree, although it was not feasible to obtain these percentages at the time of writing, due to updates in the analysis software as this was custom-designed MIDI software for data collection.

14. Line 238 – 240. More information is required on the statistical testing. Please include descriptive information about each statistical test (e.g., method, software used, input variables, outcome variables), any corrections that were applied, and the corresponding study aim/purpose for the test.

RESPONSE: The methods and data analysis section have been reorganized to include and clarify information on the statistical testing (including software used and variables assessed) in the methods section of the manuscript.

15. Line 260. Please include statistical methods and software in the methods section.

RESPONSE: This information was previously provided in text in the results section of the manuscript. All statistical information was reorganized as suggested and moved into the methods and data analyses sections for clarity.

Attachment

Submitted filename: Responses to reviewers.docx

Decision Letter 1

Li-Hsin Ning

2 Oct 2022

PONE-D-22-14364R1

Effects of Sensorimotor Voice Training on Event-Related Potentials to Pitch-Shifted Auditory Feedback

PLOS ONE

Dear Dr. Patel,

Thank you for submitting your manuscript to PLOS ONE. I was one of the reviewers in the first-round review process and now take on the role of Guest Academic Editor for this manuscript. After reading your revised manuscript, I feel it is much improved. I have a few minor comments which you can find below in this email. I invite you to submit a revised version of the manuscript that addresses the points raised here. 

Please submit your revised manuscript by Nov 16 2022 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Li-Hsin Ning

Guest Editor

PLOS ONE

Journal Requirements:

Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

Additional Editor Comments:

Line 40: It would be better to specify how the training methods influenced the brain and vocal responses in the abstract. Did they become larger or smaller after training?

Line 42: "...in the pre-training..." How did the training methods affect the pre-training task? I think only the post-training task can be affected by the training methods.

Line 91: "...modifications..." Can you specify the change of N1 and P2? Were they reduced or enhanced?

Line 262: You mentioned that up and down responses were aggregated so that the main effect of direction was not tested. However, in Line 337, you reported the main effect of direction (F(1,11) = 5.541, p < 0.05). I wonder which way is correct.

Lines 293-294: It would be better to add standard deviations to the mean values (129.32 ms and 148.94 ms). The same applies to Line 304 and Line 310.

Line 305 and Line 313: There was a significant interaction between electrode and group on P2 latency and P2 amplitude. Please elaborate on this. What is the implication of this interaction?

Line 320: "...modulated..." Can you specify the change of N1 and P2? Were they reduced or enhanced?

Line 347: "...(cite)." Citations should be added here.

Line 432: "...modified..." reduced or enhanced?

Additional comments:

1. Both reviewers questioned the proportions of following responses in the pre- and post-training tasks. If this information cannot be retrieved, that should at least be mentioned or discussed in the limitations.

2. The response to Reviewer 1's fifth comment (the rationale for using 1000-ms and 200-ms stimuli) can be added to the main text.

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

Decision Letter 2

Li-Hsin Ning

21 Nov 2022

Effects of Sensorimotor Voice Training on Event-Related Potentials to Pitch-Shifted Auditory Feedback

PONE-D-22-14364R2

Dear Dr. Patel,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Li-Hsin Ning

Guest Editor

PLOS ONE

Additional Editor Comments (optional):

I appreciate the great effort that the authors have made in response to my questions and concerns. I have one more suggestion: In Line 343, the keyword "response latency" should be added to the sentence to make clear that the statistical report refers to the results for response latency, not response amplitude.

Reviewers' comments:

Acceptance letter

Li-Hsin Ning

10 Jan 2023

PONE-D-22-14364R2

Effects of Sensorimotor Voice Training on Event-Related Potentials to Pitch-Shifted Auditory Feedback

Dear Dr. Patel:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Li-Hsin Ning

Guest Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    Attachment

    Submitted filename: Responses to reviewers.docx

    Data Availability Statement

    All subject files are available from the Open Science Framework database at: https://osf.io/ef86u/.


    Articles from PLOS ONE are provided here courtesy of PLOS
