Abstract
Objective:
Hearing plays an important role in our ability to control voice, and perturbations in auditory feedback result in compensatory changes in vocal production. The auditory cortex (AC) has been proposed as an important mediator of this behavior, but causal evidence is lacking. We tested this in an animal model, hypothesizing that AC is necessary for vocal self-monitoring and feedback-dependent control, and that altering activity in AC during vocalization will interfere with vocal control.
Methods:
We implanted two marmoset monkeys (Callithrix jacchus) with bilateral AC electrode arrays. Acoustic signals were recorded from vocalizing marmosets while altering vocal feedback or electrically stimulating AC during random subsets of vocalizations. Feedback was altered by real-time frequency shifts and presented through headphones, and electrical stimulation was delivered to individual electrodes. We analyzed recordings to measure changes in vocal acoustics during shifted feedback and stimulation, and to determine their interaction. Results were correlated with the location and frequency tuning of stimulation sites.
Results:
Consistent with previous results, we found electrical stimulation alone evoked changes in vocal production. Results were stronger in the right hemisphere, but decreased with lower currents or repeated stimulation. Simultaneous stimulation and shifted feedback significantly altered vocal control for a subset of sites, decreasing feedback compensation at some and increasing it at others. Inhibited compensation was more likely at sites closer to vocal frequencies.
Conclusions:
Results provide causal evidence that the AC is involved in feedback-dependent vocal control, and that it is sufficient and may also be necessary to drive changes in vocal production.
Keywords: Auditory cortex, marmoset, vocalization, vocal production, sensory-motor
Introduction
Vocal communication plays an important role in our day-to-day lives. In order to ensure accurate communication, however, we must continuously monitor our vocal output to detect and correct any perceived changes or errors in our vocal production.1 This vocal self-monitoring is a sensory-motor process, involving sensory feedback from both our auditory and other sensory systems, that allows us to rapidly correct many aspects of our speech and voice, including loudness, timing, pitch, and formant frequencies.2–5 However, these self-monitoring mechanisms during vocal control have received limited attention and remain poorly understood.
The importance of hearing to speech and voice is well appreciated. Individuals with congenital deafness have difficulty in acquiring and maintaining normal speech production.6 Even those with hearing loss acquired later in life exhibit mild degradation in their speech, though more subtly so than for those with pre-lingual impairments.7 Unfortunately, although hearing restoration, including cochlear implants, improves vocal control and clarity of speech production, the communication abilities of these patients still lag behind those of normal hearing individuals.8 Recent results have suggested that implant recipients, for example, do have some ability to monitor their auditory feedback, but that these abilities lack the fine control seen in normal individuals.9 Additionally, dysfunction of vocal self-monitoring and resulting feedback-dependent vocal control has also been seen in a variety of primary vocal disorders and neurologic disorders with vocal symptoms, including spasmodic dysphonia,10 vocal cord paralysis,11 Parkinson’s disease,12 and cerebellar degeneration.13 A better understanding of the underlying neural and behavioral mechanisms may lead to better diagnostic and rehabilitation tools to improve patient communication deficits.
Current models have suggested the auditory cortex (AC), an important brain area involved in sound perception, as a possible site for auditory self-monitoring during vocal production.14 This structure may allow the sensory-motor comparison between motor predictions of expected vocal sounds and feedback of the sounds actually made, the output of which could be relayed to motor areas to drive compensatory changes in vocal production. Recent work in human subjects has demonstrated a reduction of activity in the AC during speech and vocal production, as compared to passive listening.15–19 Importantly, this change in neural activity is sensitive to artificial changes in vocal feedback and correlates with vocal control.18,20 However, due to inherent methodologic limitations when working with humans, the specific, causal role for the AC in feedback-dependent vocal control and mechanistic details are unknown.
Paralleling findings in humans, work in non-human primates has demonstrated suppression of individual neurons in the AC during vocal production.21,22 Despite this inhibition, AC neurons remain sensitive to vocal feedback changes and are, in fact, more sensitive to this feedback distortion than during passive listening.23 Non-human primate studies have also shown that these animals, like humans, exhibit feedback-dependent control of their vocal pitch and loudness, and that activity in AC neurons can predict this vocal control.24,25 Importantly, recent electrical microstimulation studies in primates have shown that AC stimulation can evoke changes in the acoustics of vocal production,24 evidence that AC activity is sufficient to drive vocal control. However, it is still unclear if processing in the AC is actually necessary.
In this study we sought to better examine the causal role of the AC in feedback-dependent vocal control behaviors. We hypothesized that the AC is necessary for vocal self-monitoring and feedback control, and that altering activity in the AC during vocal production will interfere with this vocal control. We approached this by combining electrical microstimulation of the AC with real-time frequency-shifted auditory feedback in vocalizing marmoset monkeys (Callithrix jacchus).
Materials and Methods
Animal Subjects
Two adult marmosets were used in these experiments, one male and one female. Over the past few years, marmosets have emerged as an important non-human primate model for the study of auditory perception and vocal behavior.26 Prior to experiments, marmosets were adapted for handling, but underwent no specific behavioral training. Our general approach is to exploit natural vocal communication in these animals, which they do prolifically even in captive housing environments. Data from these two marmosets were used in our previous work;24 however, those results were limited to electrical stimulation without shifted feedback. Here, we include elements of this previous stimulation data in our analyses for comparison between different stimulation parameters, and now include novel data and analyses for the effects of stimulation during the shifted feedback. All experiments were conducted under guidelines and protocols approved by the University of Pennsylvania Institutional Animal Care and Use Committee, and consistent with the guidelines of the US National Research Council’s Guide for the Care and Use of Laboratory Animals and the US Public Health Service’s Policy on Humane Care and Use of Laboratory Animals.
Vocal Recordings
We recorded acoustic signals while marmosets made self-initiated vocalizations. Experiments were conducted with the marmosets in their housing colony, allowing free visual and vocal interactions with the other animals. Subjects were placed in a small cage within a custom three-walled booth lined with anechoic foam to improve vocal recording quality. Recordings were performed with the animals tethered, but free roaming, within the cage and otherwise unrestrained. Vocalizations were recorded using a directional microphone (Sennheiser ME66) placed ~20 cm in front of the marmoset, amplified, and digitized at 48.8 kHz sampling rate (TDT RX-8, Tucker-Davis Technologies, Alachua FL). We used multiple microphones to monitor the vocalizations produced by the experimental animals as well as sounds from the rest of the colony. We extracted vocalizations from the recorded signals and classified them into established marmoset call types27 based upon their spectrograms using a semi-automated system. All major call types were produced by marmosets in this context; however, only trill vocalizations were made in sufficient numbers to allow comparisons between different testing conditions.
Frequency-Shifted Feedback
Frequency (pitch) shifted feedback has emerged as a common and important method to measure feedback-dependent vocal control in both humans2,28 and animals.23,24 When faced with unexpected changes in the frequency content of their vocal feedback, subjects will compensate by changing their vocal production in the opposite direction. Here, we altered animals’ auditory feedback in real-time by passing microphone signals through a commercial effects processor (Eventide Eclipse V4) and modifying the vocal signal to increase or decrease the frequency by ±2 semitones (ST; 1 ST = 1/12th of an octave), a shift magnitude chosen based upon previous work in marmosets and humans. Feedback was presented back to the subjects through earbud-style headphones (Sony MDR-EX10LP), modified to attach to the animal’s headcap,23 and amplified +10 dB SPL to overcome the sound of direct un-altered feedback of the animals’ vocalization. Due to the limited number of vocalizations produced at a time, only one direction of feedback was used in any given session. In order to prevent anticipation, and to allow comparisons of effects with and without electrical stimulation, feedback was shifted on ~50% of trials, triggered by a computer to begin ~100 msec after the onset of vocalization, and lasted 1000 msec. Trigger and feedback signals were digitized to allow for event timing.
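To make the semitone arithmetic concrete, the following is a minimal sketch of the frequency ratio implied by a shift of a given number of semitones, using only the definition above (1 ST = 1/12th of an octave). The function names are hypothetical and are not part of the effects processor or of our analysis software; the 7 kHz value is the approximate trill fundamental and is used only for illustration.

```python
import math


def semitone_ratio(semitones: float) -> float:
    """Frequency ratio for a shift of `semitones` (12 semitones = 1 octave)."""
    return 2.0 ** (semitones / 12.0)


def shift_frequency(freq_hz: float, semitones: float) -> float:
    """Frequency heard after shifting `freq_hz` by `semitones`."""
    return freq_hz * semitone_ratio(semitones)


def semitone_distance(freq_hz: float, reference_hz: float) -> float:
    """Signed distance of `freq_hz` from `reference_hz`, in semitones."""
    return 12.0 * math.log2(freq_hz / reference_hz)


if __name__ == "__main__":
    f0 = 7000.0  # illustrative trill fundamental (Hz), an assumption
    print(round(shift_frequency(f0, +2), 1))  # roughly 7857 Hz
    print(round(shift_frequency(f0, -2), 1))  # roughly 6236 Hz
```

Under this definition a ±2 ST shift corresponds to a change of roughly 11 to 12% in frequency, so the upward and downward shifts are symmetric on a logarithmic axis but not in Hz.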
Electrode Implants and Surgery
Both marmosets were implanted bilaterally with multi-electrode arrays (Warp 16, Neuralynx, Bozeman MT) containing tungsten microelectrodes (4MΩ, FHC, Bowdoinham, ME). Following established procedures for marmoset implants,29 we first performed a headcap surgery, following which single electrode recording techniques were used to localize the AC. We then performed bilateral craniotomies to expose the temporal lobe and placed the electrode arrays, secured with dental acrylic. One animal had a right hemisphere array that was tested, and then replaced at the time of the left array placement.
Following array placement, but prior to feedback and stimulation testing, we measured the auditory (sensory) responses from the electrodes. Marmosets were seated in a custom primate chair within a soundproof chamber (Industrial Acoustics, Bronx NY). Auditory stimuli were digitally generated at 97.6 kHz sampling rate and delivered using TDT hardware (System III) in free-field through a speaker (B&W 686 S2) located ~1m in front of the animal. Stimuli included tones (1–32 kHz, 10/octave; −10 to 80 dB SPL by 10 dB) and bandpass noise (1–32 kHz, 5/octave, 1 octave bandwidth) to measure frequency tuning curves, as well as wide-band noise stimuli. Multi-unit neural responses were sorted, and the center frequency (CF) tuning was determined by the pure tone stimulus with the highest firing rate response, or from bandpass noise when no tone response was present. Based on the relative strength of responses to tone and noise stimuli, electrodes were judged to likely span both primary AC (A1) for more medial electrodes, and non-primary (lateral belt) AC more laterally.30 Due to ongoing experiments, detailed histologic localization was not available for these animals.
Electrical Microstimulation
Similar to the shifted feedback manipulation, stimulation was triggered to occur ~100 msec following vocal onset, and delivered on 50% of trials. The triggering of shifted feedback and stimulation were randomized so that ~25% had stimulation alone, 25% feedback alone, 25% had both, and 25% had neither (normal condition). Electrical stimulation was delivered through the implanted electrodes using a current source generator (MultiStim Model 3800 and SIU 3820 isolator, A-M Systems, Carlsborg, WA). Current delivered was 50 or 100 μA using 0.25 msec biphasic square pulses, 300 Hz, 1000 msec duration. Stimulation parameters were chosen based upon previous experiments in marmoset motor cortex,31 where stimulation resulted in movement that was presumably a result of cortical activation, and recent studies in marmoset AC.24 Current delivery did not appear to interrupt or prevent ongoing vocal production, and the animals exhibited no abnormal behavior, beyond changes in their vocal acoustics, to indicate a conscious perception. We only tested one electrode and current per session; however, many electrodes were tested with different currents or feedbacks during different sessions.
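The 2×2 trial design described above can be illustrated with a short simulation. This is only a sketch under the assumption that stimulation and shifted feedback are each triggered by an independent 50% decision per vocalization; the actual triggering software is not described here, and all names below are hypothetical.

```python
import random
from collections import Counter

CONDITIONS = ("normal", "stimulation", "feedback", "both")


def draw_condition(rng: random.Random) -> str:
    """Pick one condition using two independent 50% decisions (assumed)."""
    stimulate = rng.random() < 0.5  # stimulation on ~50% of trials
    shift = rng.random() < 0.5      # shifted feedback on ~50% of trials
    if stimulate and shift:
        return "both"
    if stimulate:
        return "stimulation"
    if shift:
        return "feedback"
    return "normal"


def simulate_session(n_vocalizations: int, seed: int = 0) -> Counter:
    """Count how many vocalizations fall into each condition."""
    rng = random.Random(seed)
    return Counter(draw_condition(rng) for _ in range(n_vocalizations))


if __name__ == "__main__":
    n = 4000
    counts = simulate_session(n)
    for name in CONDITIONS:
        print(name, counts[name] / n)  # each fraction should be near 0.25
```

With independent decisions, each of the four conditions is expected on about a quarter of vocalizations, although the proportions in any single session with few calls can deviate noticeably from 25%.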
Data Analysis
We analyzed our vocal recordings to determine changes induced by shifted feedback and microstimulation. We first extracted the fundamental frequency contour for each vocalization. Following previous methods, spectrograms were calculated, low-frequency (<2 kHz) background noise was removed, and the frequency with the maximal power was measured to determine the vocal fundamental frequency at each time point in the vocalization.24,27 These frequency traces were then smoothed to remove the natural frequency oscillations seen in trill calls by low pass filtering. Traces were then aligned to the feedback/stimulation onset trigger. Because normal/reference trials (no stimulation or feedback shift) lack a trigger time for alignment, we used the mean trigger time from feedback/stimulation trials (100 msec). To determine frequency changes attributable to manipulations, and not natural call-to-call variability, we subtracted the mean frequency in a window around the trigger time (±50 msec) for each vocalization. We only tested vocalizations that continued for at least 250 msec after trigger onset, and the short duration of many trill vocalizations limited population analysis of longer time windows. We focused analysis of stimulation effects to a window 50–200 msec after stimulation onset, and analysis of shifted feedback effects focused on a time window beginning 200 msec after feedback onset, chosen based upon the timing of marmoset vocal compensation.24 Individual vocalizations were averaged over these time windows prior to comparison across trials and conditions.
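The baseline subtraction and window averaging steps can be sketched as follows. This illustration assumes a fundamental frequency trace that has already been extracted, smoothed, and aligned so that time zero is the trigger; the function names and the toy trace are hypothetical, and only the window limits are taken from the description above.

```python
from statistics import fmean


def baseline_correct(times_ms, freqs_hz, half_width_ms=50.0):
    """Subtract the mean frequency within +/- half_width_ms of the trigger."""
    baseline = fmean(
        f for t, f in zip(times_ms, freqs_hz) if abs(t) <= half_width_ms
    )
    return [f - baseline for f in freqs_hz]


def window_mean(times_ms, values, start_ms, stop_ms=float("inf")):
    """Average `values` over start_ms <= t <= stop_ms (trigger at t = 0)."""
    return fmean(v for t, v in zip(times_ms, values) if start_ms <= t <= stop_ms)


if __name__ == "__main__":
    # Toy trace: 7000 Hz up to 50 msec after the trigger, then a 50 Hz drop.
    times = list(range(-100, 310, 10))
    freqs = [7000.0 if t <= 50 else 6950.0 for t in times]

    corrected = baseline_correct(times, freqs)
    print(window_mean(times, corrected, 50, 200))  # stimulation window
    print(window_mean(times, corrected, 200))      # feedback window
```

Each vocalization then contributes a single number per window, which is the quantity compared across trials and conditions.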
In order to determine the interactions between stimulation and shifted feedback, we created a multi-variate linear regression model for each electrode/session, extracting the effects of microstimulation alone, feedback alone, and, importantly, the interaction term. Individual electrode results for stimulation, feedback, or interaction were considered significant for p<0.05. For sample vocal time plots, we compared frequency curves during manipulation to reference (first subtracting the mean of the normal vocalizations for display purposes) using an ANOVA with post-hoc Bonferroni corrections at each time point, followed by False Discovery Rate (FDR) correction for multiple time bin testing. To better visualize the relationship between vocal responses and electrode CF, we performed a smoothing procedure, filtering each data point with a Gaussian curve, σ = 2 ST (chosen based on shifted feedback). We calculated shuffle-corrected confidence intervals for this CF comparison by randomizing the CF-response pairings 1000 times, and measuring the intervals containing 95% of the shuffled results. When comparing across different groups of electrodes or different stimulation parameters, we utilized t-tests for categorical comparisons, and Pearson correlation coefficients for continuous data. Multi-variate linear regression was also used to examine anatomic differences between sites/electrodes and stimulation parameters correlating with vocal changes. P-values <0.05 were considered statistically significant.
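The smoothing and shuffle procedure can be sketched as below. This is an illustration rather than the analysis code: it assumes that CFs are expressed in semitones relative to a reference frequency, that the Gaussian filter is applied as a normalized kernel-weighted average, and that the 95% interval is taken pointwise from the sorted shuffled curves. All function names are hypothetical.

```python
import math
import random


def gaussian_smooth(cf_st, responses, grid_st, sigma_st=2.0):
    """Kernel-weighted average of responses at each grid position (in ST)."""
    curve = []
    for x in grid_st:
        weights = [math.exp(-0.5 * ((x - c) / sigma_st) ** 2) for c in cf_st]
        total = sum(weights)  # can underflow to 0 far from every CF
        curve.append(sum(w * r for w, r in zip(weights, responses)) / total)
    return curve


def shuffle_interval(cf_st, responses, grid_st,
                     n_shuffles=1000, coverage=0.95, seed=0):
    """Pointwise interval of smoothed curves under random CF-response pairing."""
    rng = random.Random(seed)
    shuffled = list(responses)
    curves = []
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # break the CF-response association
        curves.append(gaussian_smooth(cf_st, shuffled, grid_st))
    lo_idx = int(round((1.0 - coverage) / 2.0 * (n_shuffles - 1)))
    hi_idx = (n_shuffles - 1) - lo_idx
    lower, upper = [], []
    for j in range(len(grid_st)):
        column = sorted(curve[j] for curve in curves)
        lower.append(column[lo_idx])
        upper.append(column[hi_idx])
    return lower, upper


if __name__ == "__main__":
    cf = [-6.0, -3.0, 0.0, 3.0, 6.0]          # made-up CFs (ST from reference)
    resp = [10.0, -5.0, -40.0, 20.0, 5.0]     # made-up responses (Hz)
    grid = [-4.0, 0.0, 4.0]
    print(gaussian_smooth(cf, resp, grid))
    print(shuffle_interval(cf, resp, grid, n_shuffles=200))
```

A smoothed value falling outside the shuffled interval at a given CF indicates a response larger than expected if CF and response were unrelated. Grid positions should stay within the range of measured CFs, since the weights vanish far from the data.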
Results
Effects of AC Electrical Stimulation on Vocal Acoustics
Figure 1 shows sample recordings of marmoset trill vocalizations during electrical microstimulation of AC. Shortly following the onset of stimulation, there were often abrupt changes in vocal acoustics, typically a rapid decrease in frequency contours, or sometimes a brief increase in frequency followed by a sustained decrease (Fig. 1A). These patterns of vocal frequency change were atypical for normal marmoset vocalization, and not seen in the absence of electrical stimulation. We quantified these vocal changes by measuring the fundamental frequency over time and comparing results during manipulation and normal vocalizations. Figure 1B illustrates these effects for a different stimulation site, comparing the effects of stimulation, frequency-shifted feedback, and the combination upon vocal frequencies. Stimulation alone (orange) evoked a rapid decrease in vocal frequency that significantly deviated from normal vocalizations beginning 54 msec after stimulation onset. In contrast, −2 ST feedback shifts (dark green) evoked a significant increase in frequency that began 129 msec after feedback onset. When both stimulation and shifted feedback were combined (light green), vocal frequency initially followed the stimulation alone condition (onset 69 msec), but later deviated upward. These results suggest that stimulation effects dominated early on, consistent with a shorter latency, and then were modified by the slower effect of vocal feedback.
Figure 1.

Vocal changes during cortical electrical stimulation and frequency-shifted feedback. A, Sample frequency-time spectrograms of marmoset ‘trill’ vocalizations showing abrupt changes following onset of stimulation, a decrease (left) or a transient increase followed by a drop (right). Vertical dashed lines indicate onset of stimulation. The +2 semitone (ST) shift is visible in the right spectrogram. B, Frequency tracing showing changes during vocalization from another stimulation site. Changes are normalized relative to frequency at feedback/stimulation onset and corrected (subtracted) for the average normal vocalization for display purposes. Mean and SEM (shaded) results are shown. Hashed symbols mark times of significant difference from normal (p<0.05).
Although the effects of AC microstimulation on vocal production have been previously described,24 we sought to quantify the resulting changes in vocal frequency, and to better measure the effects of different stimulation parameters. To better compare parameters, the present analyses include these previous results, which had been limited to a single current. Across all sites and conditions, stimulation evoked a significant net decrease in frequency (Fig 2A; mean±s.d.: −34.0±72.1 Hz; t-test: df=196, t=−6.62, p<0.001). There was considerable variability, with a few electrodes showing increases in frequency. Overall, 16.8% of stimulation sites exhibited a significant effect of stimulation (p<0.05). We tested whether the stimulation current used had effects on the results, comparing 50 and 100 μA, and found significantly stronger effects for the higher stimulation current (t=21.01, p<0.001; t-test), where 20.3% of sites were individually significant. When comparing which brain hemisphere a stimulating electrode was located in, we found a significant bias towards larger effects in the right hemisphere over the left (Fig 2B; t=6.07, p=0.015).
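As an arithmetic check on the population statistic above, the reported t value can be reproduced from the summary numbers alone. The sketch assumes a one-sample t-test of the mean frequency change against zero, with n = df + 1 = 197 observations; the function name is hypothetical.

```python
import math


def one_sample_t(mean: float, sd: float, n: int) -> float:
    """t statistic for testing a sample mean against zero."""
    standard_error = sd / math.sqrt(n)
    return mean / standard_error


if __name__ == "__main__":
    # Reported: mean -34.0 Hz, s.d. 72.1 Hz, df = 196 (so n = 197, assumed)
    t_stim = one_sample_t(-34.0, 72.1, 197)
    print(round(t_stim, 2))  # agrees with the reported t = -6.62
```

The agreement indicates that the reported mean, standard deviation, and degrees of freedom are mutually consistent under this assumption.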
Figure 2.

Effects of electrical stimulation on vocal frequency. A, Histogram showing the net reduction in vocal frequency during AC electrical stimulation, comparing 50 μA (grey) and 100 μA (black) current. Mean±std are indicated, as are p-values for individual distributions, and fraction of tested sites showing significant stimulation effects (fx sig; shaded bars). P-values comparing distributions are indicated. B, Histogram comparing left (blue; LH) and right (red; RH) hemisphere stimulation.
We further examined whether other anatomic variables and stimulation parameters had an effect on vocal frequency changes, including hemisphere, electrode row (medial-lateral position, corresponding to A1 vs. lateral belt), electrode column (anterior-posterior position), repeating stimulation at the same site on different sessions, and the direction of feedback shift tested that day (Table I). Multi-variate linear regression revealed significant effects (p<0.05) for current, repeated stimulation, and hemisphere, but not for the other variables. These results suggested that using a lower current, and repeating stimulation at the same electrode on a different session, decreased the effects of stimulation. If we restricted our analysis to only 100 μA and non-repeated sessions, the fraction of stimulation sites that evoked significant vocal changes increased to 29.5% of the sites tested. Based on these results, we limited our further analysis of the stimulation-feedback interaction to sessions tested with 100 μA.
Table I.
Predictors of vocal responses to electrical stimulation
| Predictor | Regression coefficient (Hz) | Std. Error | t-statistic | p-value |
|---|---|---|---|---|
| Hemisphere (right) | −25.3 | 11.5 | −2.20 | 0.029 |
| Electrode Row | 8.3 | 4.8 | 1.73 | 0.085 |
| Electrode Column | 3.9 | 4.5 | 0.87 | 0.39 |
| Current (50 vs 100) | −37.7 | 11.4 | −3.31 | 0.001 |
| Stim Repetition | 23.0 | 11.2 | 2.06 | 0.04 |
| Feedback Direction | −2.6 | 8.7 | −0.30 | 0.77 |
Interaction Between AC Stimulation and Feedback Responses
As illustrated in Figure 1B, analysis of simultaneous electrical stimulation and frequency-shifted feedback is complicated by the fact that both manipulations can evoke different changes to vocal production. In order to determine whether AC stimulation could affect the feedback response, we calculated multi-variate linear regression models for each electrode/stimulation site that included both stimulation and feedback variables, as well as an interaction term. We first sought to determine the best time window for the comparison, calculating regressions for different time bins relative to the onset of manipulation (Fig. 3). Results were calculated for individual sites and then averaged. Because the expected effect of shifted feedback is opposite for +2 and −2 ST shifts, we rectified (flipped) the sign of results for +2 shifts (so that appropriate compensation for feedback is positive), and similarly for the interaction result. We found significant effects of stimulation beginning at 60 msec, feedback at 160 msec, and interactions at 220 msec. The interaction effects actually appeared to begin much earlier, closer to 180 msec, but were likely not significant due to greater variability than for the other results. We therefore focused our further analysis of the interaction to a time window beginning 200 msec after the onset of manipulations.
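The structure of the per-site model and the sign rectification can be sketched as follows. With two binary factors and their interaction, the least-squares coefficients reduce to differences of the four condition means, which is what this illustration computes; it does not reproduce the significance testing, and the function names and example values are hypothetical.

```python
from statistics import fmean


def interaction_model(normal, stim_only, feedback_only, both):
    """Coefficients of y = b0 + b1*stim + b2*feedback + b3*stim*feedback.

    For this saturated 2x2 design the least-squares solution equals
    simple differences of the condition means.
    """
    m00, m10, m01, m11 = map(fmean, (normal, stim_only, feedback_only, both))
    return {
        "intercept": m00,
        "stimulation": m10 - m00,
        "feedback": m01 - m00,
        "interaction": m11 - m10 - m01 + m00,
    }


def rectify(coefs, shift_st):
    """Flip feedback and interaction signs for upward shifts, so that
    compensation (opposing the shift) is positive for either direction."""
    sign = -1.0 if shift_st > 0 else 1.0
    return {
        **coefs,
        "feedback": sign * coefs["feedback"],
        "interaction": sign * coefs["interaction"],
    }


if __name__ == "__main__":
    # Made-up window-averaged frequency changes (Hz) for a +2 ST session
    coefs = interaction_model(
        normal=[0.0, 0.0],
        stim_only=[-40.0, -20.0],
        feedback_only=[-20.0, -10.0],
        both=[-70.0, -50.0],
    )
    print(coefs)
    print(rectify(coefs, shift_st=+2))
```

In this toy example the combined condition falls 15 Hz further than the sum of the two separate effects, so the rectified interaction is positive, corresponding to enhanced compensation.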
Figure 3.

Feedback and stimulation results over time. Population averages are shown for linear regression coefficients calculated for individual testing sites and then averaged for different time points (bins 20 msec) relative to perturbation onset. Results are shown separately for feedback (red), stimulation (orange), and their interaction (black). Mean ± SEM (shaded) are shown. Filled symbols indicate time points of significant changes (p<0.05). Results for +2 ST feedback and interactions have been rectified so that positive changes indicate vocal compensation regardless of feedback direction.
We measured the magnitude of the stimulation-feedback interactions for individual electrodes and found significant effects for both +2 and −2 ST shifted feedback (Fig. 4A). For +2 ST feedback, stimulation evoked a net decrease in vocal frequency, i.e. in the same direction as vocal compensation (−33.8±131.3 Hz; p=0.035, t-test), with only 6.4% of sites showing individually significant effects. Similarly, for −2 ST feedback, stimulation evoked an average increase in vocal frequency (37.7±149.9 Hz; p=0.01; 12.9% individually significant). We combined these two sets, rectifying the sign of +2 ST results so that a positive interaction indicated an increase or enhancement in the feedback compensation, and a negative interaction indicated an inhibition or blocking of feedback compensation (Fig. 4B). Overall, we found an average bias towards positive interactions, or enhanced compensation (36.1±142.1 Hz; df=116, t=2.75, p=0.007, t-test), with 10.3% of sites individually significant. If we only looked at the optimal conditions, 100 μA and no repetitions, this increased to 16.1% of sites with significant interactions. Interestingly, while there was an average positive result, the negative tail of the distribution (inhibited compensation) extended further than the positive tail, suggesting a stronger effect when present. There were no significant differences in the magnitude between +2 and −2 ST (p=0.89).
Figure 4.

Distribution of stimulation-feedback interactions. A, Results are shown separately for +2 ST (red) and −2 ST (green) feedback conditions. Mean ± std are indicated, as are p-values and the fraction of sites individually significant (shaded bars). B, Distributions from A are combined after rectifying +2 ST feedback, so that a positive interaction indicates an increase or enhancement of compensation, while a negative interaction indicates a decrease or inhibition of compensation due to stimulation.
As a control, we also examined the degree to which variability in the stimulation-feedback interaction may have been related to day-to-day variations in feedback vocal control alone. While stimulation of individual electrodes would reasonably be expected to have different outcomes at different brain locations, frequency-shifted feedback effects might not be expected to vary from session-to-session beyond random chance. To examine this, we adjusted the responses of feedback-only trials so that the mean for each session matched that of the global mean across all sessions (for that feedback direction), preserving the number of trials and trial-to-trial variability of that session, and then re-calculated the interaction effects. We found a generally preserved trend towards positive interactions (18.3±32.4, p=0.045, t-test). Interestingly, however, there was a reduction in the fraction of sessions that were significant, 6% down from 10.3%, but the number of significant negative/inhibited interactions was preserved (4 sites in both analyses), while the number of significant positive interactions decreased (from 8 down to 2). The results suggest that the inhibition of compensation was likely a more robust effect than apparent enhancement.
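The session-mean adjustment used in this control can be sketched as a simple shift of each session's feedback-only trials. The illustration assumes that the global mean is computed over all feedback-only trials pooled across sessions of one feedback direction; an average of session means would be an equally plausible reading. Names and values are hypothetical.

```python
from statistics import fmean


def match_session_means(sessions):
    """Shift each session's trials so its mean equals the pooled global mean.

    `sessions` maps a session label to a list of feedback-only responses (Hz).
    Trial counts and within-session deviations are left unchanged.
    """
    pooled = [value for trials in sessions.values() for value in trials]
    global_mean = fmean(pooled)  # assumption: pooled over all trials
    adjusted = {}
    for label, trials in sessions.items():
        offset = global_mean - fmean(trials)
        adjusted[label] = [value + offset for value in trials]
    return adjusted


if __name__ == "__main__":
    sessions = {"a": [10.0, 20.0, 30.0], "b": [40.0, 60.0]}
    print(match_session_means(sessions))
```

Because every trial within a session is shifted by the same offset, the trial-to-trial variance entering the regression is unchanged, and only the between-session differences in the feedback response are removed.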
Predictors of Stimulation-Feedback Interactions
To better understand the origins of the interaction between AC stimulation and feedback vocal responses, we next compared interaction coefficients to several anatomic and stimulation factors. We primarily focused on the frequency tuning of the stimulation sites, as we might a priori think that sites near vocal frequencies (fundamental ~ 7kHz) should have a greater effect than stimulation more distant. We measured sites’ center frequencies (CFs) using tone and noise stimuli, comparing this tuning to interaction effects as well as stimulation alone (Fig 5A). During isolated stimulation (orange), we observed a biphasic effect, where stimulating just above the vocal frequency range resulted in a decrease of the produced frequencies, and an increase when stimulating just below (r=−0.35, p=0.008; Fig 5B). The effects of hemispheric asymmetry on stimulation were preserved, with stronger frequency dependence in the right hemisphere.
Figure 5.

Comparison of stimulation-feedback interactions to frequency tuning. A, Results are shown demonstrating variations in interactions based on frequency tuning (CF), separately for +2 and −2 ST, and for stimulation alone. Curves are a Gaussian smoothing of the results (see Methods). Individual data points are shown only for significant sites (p<0.05). Shaded areas are 95% confidence intervals calculated for a random association between CF and interaction coefficients (pink: +2 ST, grey: −2 ST). Vertical lines indicate vocal mean frequency (7 kHz), its harmonic (14 kHz), and the surrounding ±2 ST (dotted lines). B, Scatter plot comparing stimulation-evoked frequency changes to site CF. CFs are shown for an octave range centered around vocal mean and harmonic frequencies. Filled symbols are sites that were individually significant (p<0.05). Linear regression (line) and 95% confidence intervals (shaded) are shown, as are correlation coefficients for left (LH, blue), right hemisphere (RH, red), and all data. * p<0.05, ** p<0.001, ns: non-significant. C, Scatter plot comparing ±2 ST interactions and CF distance from vocal mean/harmonic frequency. The feedback range is indicated (±2 ST, dotted lines). D, Scatter plot comparing ±2 ST interactions, with +2 ST rectified, as in Fig 4, and expressed as the absolute value of the CF distance. Feedback range is indicated (dotted line). Regression and correlation coefficient are shown.
The frequency dependence of stimulation-feedback interaction was less straight-forward (Fig 5 A,C). As we might expect, the effects of stimulation were less prominent outside the vocal frequency range (i.e. <3 and >16 kHz) where little vocal acoustics were present. The interaction effects for +2 and −2 ST were often in opposite directions, particularly at frequencies just below the 7 kHz vocal mean frequency, but in the same direction as would be expected for vocal changes during shifted feedback alone, increased for −2 ST and decreased for +2 ST. Closer to 7 kHz (vertical dashed line), the signs of the effect were flipped, suggesting a decrease in the feedback vocal response. Interestingly, while the interaction was in the same direction as stimulation alone (orange) for −2 ST feedback (green) near the 7 kHz vocal mean frequency, the opposite was true for +2 ST feedback (red). Replotting the interaction results after rectifying the +2 ST response so that enhanced interactions were positive and inhibited negative, as before, against the CF distance from vocalization showed a significant positive correlation (Fig. 5D; r=0.33, p=0.018). These results suggest that stimulation of neurons closer to the frequency range of vocal acoustics (within the 2 ST range) was more likely to inhibit or block feedback-dependent vocal changes, while stimulating further away was more likely to enhance them. Interestingly, the regression of interaction vs. CF crosses from inhibiting to enhancing feedback very close to the 2 ST point (Fig. 5D). We also noted a few sites with significant interaction effects for −2 ST at frequencies much less than that of vocalization, closer to 3.5–4 kHz (Fig 5A). Since these sites were tuned for ~½ vocal frequency, it is possible that they actually had multi-peaked frequency tuning that included the vocal range. Multi-peaked neurons have been previously seen in AC, and typically exhibit harmonic relationships between their peaks.32
We also examined whether there were differences between the anatomic location and interaction effects, notably hemisphere and position within the electrode arrays. Based on sensory-response properties and frequency tuning (tonotopy),30 electrodes were estimated to span between primary AC (A1) and the adjacent lateral belt (Fig. 6A). We did not see any clear pattern based on electrode location or hemisphere (Fig. 6B).
Figure 6.

Anatomic distribution of interactions. A, Illustration of estimated electrode locations within auditory cortex. A1: primary AC, Belt: lateral belt. B, Plot showing spatial distribution of median interaction effects for each electrode by location within the electrode arrays, 2 left hemisphere and 3 right. Electrode locations are approximate, anterior (A), and dorsal (D) orientations are indicated.
We performed further analysis of other variables that might have affected the interaction between AC stimulation and feedback sensitivity, including electrode location (hemisphere, row, column), current (50 vs 100 μA), repeated stimulation, and CF, using multi-variate linear regression (Table II). While the CF dependence was preserved, the only other variable showing a significant effect was repeated stimulation, where repetition increased the likelihood of getting a positive/enhanced interaction. We did not see any significant effects of electrode location. We were unable to include the independent effects of stimulation alone in our regression model, as the interaction coefficient was derived from the stimulation data (for the individual site regression calculations) and therefore inherently correlated with the interaction.
Table II.
Predictors of stimulation-feedback interaction
| Predictor | Regression coefficient (Hz) | Std. Error | t-statistic | p-value |
|---|---|---|---|---|
| Hemisphere (right) | −16.5 | 33.9 | −0.49 | 0.63 |
| Electrode Row | 1.1 | 18.2 | 0.06 | 0.95 |
| Electrode Column | 23.2 | 16.2 | 1.43 | 0.16 |
| Current (50 vs 100) | 46.8 | 34.0 | 1.38 | 0.17 |
| Stim Repetition | 80.7 | 39.8 | 2.03 | 0.046 |
| CF Distance (kHz) | 321.7 | 132.8 | 2.42 | 0.018 |
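The columns of Table II are related in the usual way for a linear model: each t-statistic is the regression coefficient divided by its standard error. The short sketch below re-derives the t-statistics from the tabulated values as a consistency check. The dictionary and function names are illustrative, and p-values are not recomputed because the residual degrees of freedom are not given here.

```python
# (coefficient, standard error, reported t-statistic) copied from Table II.
TABLE_II = {
    "Hemisphere (right)": (-16.5, 33.9, -0.49),
    "Electrode Row": (1.1, 18.2, 0.06),
    "Electrode Column": (23.2, 16.2, 1.43),
    "Current (50 vs 100)": (46.8, 34.0, 1.38),
    "Stim Repetition": (80.7, 39.8, 2.03),
    "CF Distance (kHz)": (321.7, 132.8, 2.42),
}


def t_statistic(coefficient, std_error):
    """t-statistic of a regression coefficient."""
    return coefficient / std_error


# A row is flagged if the recomputed value differs from the reported one
# by more than the rounding precision of the table (two decimals).
mismatches = [
    name
    for name, (coef, se, reported_t) in TABLE_II.items()
    if abs(t_statistic(coef, se) - reported_t) > 0.005
]
print(mismatches)
```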
Discussion
In this study, we examined the role of the auditory cortex in feedback-dependent vocal control. Combining electrical microstimulation of the AC with frequency-shifted vocal feedback, we found that stimulation of a subset of AC sites could either inhibit or enhance the changes in vocal production brought on by shifted feedback. This selectivity was primarily dependent upon the frequency tuning of neurons at the stimulation site, with stimulation closer to vocal frequencies blocking feedback responses and more distant stimulation enhancing them.
Evidence for the role of the auditory cortex in vocal control
Recent work in both humans and animals has implicated the AC as a critical site for feedback-dependent vocal control. Building upon computational and empiric models of general motor control, the AC is thought to be the primary site of sensory-motor integration during vocal production (Fig. 7). These models suggest that frontal motor control areas, including pre-frontal and pre-motor areas such as Broca’s area, make sensory predictions about the expected sounds of our voice, termed a forward model or efference copy.14,33 This signal is relayed to the AC where it is compared to the actual sensory feedback, allowing the calculation of feedback error, which can then be used by motor areas to help control vocal production. The neural pathways that mediate this audio-motor projection are unclear. In humans, the arcuate fasciculus connects the superior temporal gyrus, and higher order AC, with pre-motor areas involved in speech.34 Similar neural pathways appear to exist in non-human primates, including marmosets,35,36 though their function is an open question.
Figure 7.

Illustration of vocal motor and feedback pathways overlaying a marmoset brain. Green arrows indicate ascending auditory/feedback information, blue arrows descending motor control, and orange arrows efference copy predictions projected from motor to auditory areas.
Evidence that the AC is the site of sensory-motor comparisons during vocal production has been largely based upon physiologic responses. Because the vocal activity of the AC in both humans and non-human primates is vastly different than during sensory perception, suppressed as opposed to excited, it has been suggested that this inhibition plays a role in feedback processing and may be some form of feedback error signal.14,37–39 Supporting this conceptual model is the hypersensitivity of suppressed AC neurons to perturbed vocal feedback20,23 and the correlation of neural signals in AC with changes in vocal production.18,20,24 Under such a model, a neuron is maximally suppressed when there is no feedback error, and perturbed feedback results in increased neural activity that reflects the degree of the error. The exact mechanism of vocalization-induced suppression of AC is still unclear, as is whether feedback sensitivity in AC meets all the expected characteristics of such an error signal.
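The error-signal idea in this conceptual model can be made concrete with a deliberately simplified sketch. Everything here is an assumption chosen for illustration: the error is expressed in semitones between heard and predicted frequency, and the linear rate function, its baseline, and its gain are hypothetical and are not fitted to any neural data.

```python
import math


def feedback_error_st(heard_hz, predicted_hz):
    """Signed mismatch between heard and predicted frequency, in semitones."""
    return 12.0 * math.log2(heard_hz / predicted_hz)


def toy_ac_rate(error_st, suppressed_rate=2.0, gain=4.0):
    """Hypothetical firing rate of an error-coding neuron.

    The rate is lowest (maximally suppressed) when the error is zero
    and grows with the size of the error in either direction.
    """
    return suppressed_rate + gain * abs(error_st)


predicted_hz = 7000.0                        # efference copy prediction
heard_hz = predicted_hz * 2.0 ** (2.0 / 12)  # feedback shifted up by 2 ST
error_st = feedback_error_st(heard_hz, predicted_hz)
print(error_st, toy_ac_rate(error_st), toy_ac_rate(0.0))
```

Under these assumptions, unshifted feedback yields the baseline suppressed rate, while shifts of either sign increase activity in proportion to their magnitude.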
Despite this evidence for specialized processing in the AC during vocalization, evidence that the AC is actually involved in vocal control is still very limited. Results showing AC activity predicts subsequent vocal production are suggestive,24 but still only a correlation. More definitive evidence requires causal inference based upon manipulations of the neural circuit. Recent work has found that stimulation of marmoset AC can rapidly induce vocal changes, causal evidence that AC activity is sufficient to drive vocal production changes.24 These stimulation effects occurred on time-scales similar to that of feedback vocal control, in contrast to earlier work in songbirds where vocal changes could be induced with stimulation, but over time scales of days.40 The present work complements the evidence for a causal role of AC by demonstrating that it is also possible to block or modify normal feedback-dependent vocal control. Importantly, these effects were limited to a small subset of stimulation sites, primarily those near vocal frequencies and therefore most likely to be involved in processing frequency shifts in the feedback. These results support a model of vocal production where the AC is not only sufficient, but also necessary for feedback processing and vocal control.
Site specificity of stimulation effects
Another interesting observation in our results was that, while stimulation near vocal frequencies appeared to inhibit feedback control, stimulation more distant actually enhanced the effects of shifted feedback. The origins of this effect are unclear. In the absence of feedback, stimulation evokes differential changes in vocal production depending on frequency tuning, decreasing vocal frequency for sites just above vocal ranges, and increasing for just below. This frequency-dependence is distinctly different than the effects stimulation had on feedback control. One possible explanation is that electrical microstimulation can have a variety of effects on cortical circuits, including activation as well as inhibition.41 It is possible that the effects of isolated stimulation to increase vocal frequency were due to a generalized activation of these neurons, which could have a secondary effect of increasing their sensitivity/activation during shifted feedback. In contrast, the stimulation might have blocked other neurons, resulting in reduced responses to feedback, and therefore decreased vocal compensation. Biphasic effects of stimulation are also possible, as evidenced by vocalizations exhibiting transiently increased frequency followed by decreases, perhaps reflecting early neural activation by stimulation followed by inhibition. Resolving this question will require different methods to manipulate neural circuits and induce more specific changes in activity, such as pharmacologic inhibition or optogenetic methods.
We also found that location of stimulation had a significant effect upon the resulting vocal changes. Stimulation in the right AC evoked stronger changes than the left. While cortical asymmetry for either vocal production or perception in non-human primates is unclear, these results are consistent with recent evidence of right hemispheric asymmetry in human motor cortex control of vocal pitch,42,43 as well as older studies suggesting specialization of the right AC for pitch and prosody.44 We did not see a significant effect of hemisphere on stimulation-feedback interactions, though right hemisphere stimulation did show a tendency towards inhibiting feedback rather than enhancing it. We also did not see any meaningful changes based upon electrode position within the AC. Because anatomic connections between frontal and auditory areas are primarily to higher order AC (lateral belt, parabelt),45 it might be expected that electrodes in more lateral rows would exhibit greater effects; however, this was not the case. In humans, posterior areas of the temporal lobe, notably planum temporale, have been suggested to be involved in the feedback monitoring process.14 Unfortunately, our electrode arrays were not positioned posteriorly enough to cover the equivalent location for marmosets. It should also be noted that our estimates of electrode locations were based on sensory responses, as histologic verification was not available, introducing a degree of uncertainty in precise electrode positions within AC.
It is also possible that the lack of differences between A1 and more lateral areas was due to the broader frequency tuning of neurons in higher order AC,46 which may somehow have interfered with the frequency-specific changes needed for vocal control. On the other hand, A1 has more narrowly frequency-tuned receptive fields, which might more effectively process small changes in feedback frequency. However, we did not see stronger effects of stimulation to suppress feedback in the more medial electrodes. Lateral belt neurons also often have more complex sensory receptive fields that could potentially be specific to individual types of calls or have multiple frequency preferences, which may result in vocal effects that are not clearly frequency-specific. Future work will need to evaluate what role such complex receptive fields may play.
Pitfalls of electrical stimulation
Several other limitations of the current study also stem from the use of electrical microstimulation. While this approach allows spatially and temporally precise manipulation, it also introduces several potential confounds. One of these is the difficulty of repeating testing at the same stimulation site. We used high impedance electrodes in order to allow physiologic recordings from the same electrode. However, this may have resulted in the observation of a weaker effect of stimulation when repeated on the same electrode during subsequent sessions. The high current stimulation delivered through a high impedance electrode may have resulted in damage to the electrode insulation or to the local tissue itself. We attempted to mitigate this by testing a lower current during initial stimulation for many electrodes (50 vs 100 μA), but found weaker effects. Because repeated testing at the same site is desirable, to average out variability of marmoset vocalizations, alternate approaches will be needed in the future. This could potentially include use of lower impedance electrodes, forgoing the ability to record high-quality neural activity at the same location, or newer silicon probes that mix separate recording and stimulation contacts. Alternatively, pharmacologic methods such as muscimol, an inhibitory GABA-ergic agonist, or optogenetic techniques may better facilitate repeated manipulation of the same cortical site. Both of these may also allow inhibition of a broader area of cortex, as electrical stimulation may have been too spatially focused to disrupt all the neurons involved in feedback processing. The potential trade-off would be poorer temporal specificity, at least in the case of muscimol, making it difficult to compare the same vocalization with and without inhibition. Finally, it is possible that effects of electrical stimulation may have been mediated by indirect activation of other brain structures, such as thalamocortical and other descending projections.
Although vocal suppression has primarily been seen in cortex, we cannot rule out the possible contributions of subcortical structures, including the basal ganglia and cerebellum. Use of an inhibitory manipulation may complement the current results by blocking AC processing with less confounding of excitatory stimulation effects.
Clinical implications
Self-monitoring during our speech plays an important role in our ability to maintain accurate vocal production. This mechanism not only allows us to correct any errors we might make, but also allows us to adapt to changing and noisy acoustical environments that may interfere with our communication.47 Similar processes may also be involved in the sensorimotor learning of speech during infancy.48 Dysfunction of these mechanisms has been seen in a variety of clinical disorders, including hearing loss,8 neurologic diseases with associated vocal dysfunction like cerebellar disorders and Parkinson’s,12,13 and primary vocal disorders such as vocal cord paralysis and spasmodic dysphonia.10,11 Interestingly, dysfunction of this mechanism has also been suggested to be a possible cause for the auditory hallucinations of schizophrenia.49 Better understanding of these mechanisms, and the role of AC, may open several novel therapeutic avenues. This might include something as simple as different approaches to speech therapy based upon self-monitoring. For patients with hearing loss, it may be possible to optimize hearing rehabilitation devices, like cochlear implants, to maximize vocal feedback information. In vocal disorders where abnormal sensory-motor processing may be a primary culprit, such as in spasmodic dysphonia,50 it may be possible to manipulate or block sensory feedback to induce vocal stability by forcing the vocal motor system to completely ignore such feedback. A similar approach has shown benefit in some patients with stuttering.51 Finally, establishing which neural structures are specifically involved in feedback vocal control may lead to novel approaches using neural stimulation or modulation, either non-invasive or implanted, to improve patients’ control of voice.
Conclusion
Hearing plays an important role in vocal self-monitoring and feedback-dependent control of the voice. Using non-human primates, we show that electrical stimulation of the auditory cortex both evokes changes in vocal production and can interact with shifted vocal feedback in a frequency-dependent manner. These results provide causal evidence for the role of the auditory cortex in vocal control, and may allow the development of novel therapies for primary and secondary vocal dysfunction.
Acknowledgements:
The authors would like to thank T. Coleman and P. Sayde for assistance with animal training and care.
Funding:
This work was supported by NIH grants K08-DC014299 and R01-DC018525 (SJE), and funding from a Triological Society Clinician-Scientist Development Award (SJE).
Footnotes
Conflicts of Interest: The authors have no financial relationships or conflicts of interest to disclose.
Meeting Information
This is a TRIOLOGICAL THESIS (2022-07) and was a recipient of the Edmund Prince Fowler Award for Basic Sciences. Presented at COSM, Dallas, TX, April 2022.
Level of Evidence: N/A
References
- 1. Levelt WJ. Monitoring and self-repair in speech. Cognition 1983; 14:41–104.
- 2. Burnett TA, Freedland MB, Larson CR, Hain TC. Voice F0 responses to manipulations in pitch feedback. J Acoust Soc Am 1998; 103:3153–3161.
- 3. Houde JF, Jordan MI. Sensorimotor adaptation in speech production. Science 1998; 279:1213–1216.
- 4. Bauer JJ, Mittal J, Larson CR, Hain TC. Vocal responses to unanticipated perturbations in voice loudness feedback: An automatic mechanism for stabilizing voice amplitude. J Acoust Soc Am 2006; 119:2363–2371.
- 5. Lee BS. Effects of delayed speech feedback. J Acoust Soc Am 1950; 22:824–826.
- 6. Smith CR. Residual hearing and speech production in deaf children. J Speech Hear Res 1975; 18:795–811.
- 7. Lane H, Webster JW. Speech deterioration in postlingually deafened adults. J Acoust Soc Am 1991; 89:859–866.
- 8. Gautam A, Naples JG, Eliades SJ. Control of speech and voice in cochlear implant patients. Laryngoscope 2019; 129:2158–2163.
- 9. Gautam A, Brant JA, Ruckenstein MJ, Eliades SJ. Real-time feedback control of voice in cochlear implant recipients. Laryngoscope investigative otolaryngology 2020; 5:1156–1162.
- 10. Thomas A, Mirza N, Eliades SJ. Auditory feedback control of vocal pitch in spasmodic dysphonia. Laryngoscope 2020.
- 11. Naunheim ML, Yung KC, Schneider SL et al. Vocal motor control and central auditory impairments in unilateral vocal fold paralysis. Laryngoscope 2019; 129:2112–2117.
- 12. Liu H, Wang EQ, Metman LV, Larson CR. Vocal responses to perturbations in voice auditory feedback in individuals with Parkinson’s disease. PLoS One 2012; 7:e33629.
- 13. Houde JF, Gill JS, Agnew Z et al. Abnormally increased vocal responses to pitch feedback perturbations in patients with cerebellar degeneration. J Acoust Soc Am 2019; 145:EL372.
- 14. Hickok G, Houde J, Rong F. Sensorimotor integration in speech processing: computational basis and neural organization. Neuron 2011; 69:407–422.
- 15. Paus T, Perry DW, Zatorre RJ, Worsley KJ, Evans AC. Modulation of cerebral blood flow in the human auditory cortex during speech: role of motor-to-sensory discharges. Eur J Neurosci 1996; 8:2236–2246.
- 16. Crone NE, Hao L, Hart J Jr. et al. Electrocorticographic gamma activity during word production in spoken and sign language. Neurology 2001; 57:2045–2053.
- 17. Greenlee JD, Jackson AW, Chen F et al. Human auditory cortical activation during self-vocalization. PLoS One 2011; 6:e14744.
- 18. Chang EF, Niziolek CA, Knight RT, Nagarajan SS, Houde JF. Human cortical sensorimotor network underlying feedback control of vocal pitch. Proc Natl Acad Sci U S A 2013; 110:2653–2658.
- 19. Curio G, Neuloh G, Numminen J, Jousmaki V, Hari R. Speaking modifies voice-evoked activity in the human auditory cortex. Hum Brain Mapp 2000; 9:183–191.
- 20. Behroozmand R, Oya H, Nourski KV et al. Neural correlates of vocal production and motor control in human Heschl’s Gyrus. J Neurosci 2016; 36:2302–2315.
- 21. Muller-Preuss P, Ploog D. Inhibition of auditory cortical neurons during phonation. Brain Res 1981; 215:61–76.
- 22. Eliades SJ, Wang X. Sensory-motor interaction in the primate auditory cortex during self-initiated vocalizations. J Neurophysiol 2003; 89:2194–2207.
- 23. Eliades SJ, Wang X. Neural substrates of vocalization feedback monitoring in primate auditory cortex. Nature 2008; 453:1102–1106.
- 24. Eliades SJ, Tsunada J. Auditory cortical activity drives feedback-dependent vocal control in marmosets. Nature communications 2018; 9:2540.
- 25. Eliades SJ, Wang X. Neural correlates of the lombard effect in primate auditory cortex. J Neurosci 2012; 32:10737–10748.
- 26. Eliades SJ, Miller CT. Marmoset vocal communication: Behavior and neurobiology. Developmental neurobiology 2017; 77:286–299.
- 27. Agamaite JA, Chang CJ, Osmanski MS, Wang X. A quantitative acoustic analysis of the vocal repertoire of the common marmoset (Callithrix jacchus). J Acoust Soc Am 2015; 138:2906–2928.
- 28. Burnett TA, Larson CR. Early pitch-shift response is active in both steady and dynamic voice pitch control. J Acoust Soc Am 2002; 112:1058–1063.
- 29. Eliades SJ, Wang X. Chronic multi-electrode neural recording in free-roaming monkeys. J Neurosci Methods 2008; 172:201–214.
- 30. Rauschecker JP, Tian B, Hauser M. Processing of complex sounds in the macaque nonprimary auditory cortex. Science 1995; 268:111–114.
- 31. Burish MJ, Stepniewska I, Kaas JH. Microstimulation and architectonics of frontoparietal cortex in common marmosets (Callithrix jacchus). J Comp Neurol 2008; 507:1151–1168.
- 32. Kadia SC, Wang X. Spectral integration in A1 of awake primates: neurons with single- and multipeaked tuning characteristics. J Neurophysiol 2003; 89:1603–1622.
- 33. Hage SR, Nieder A. Dual neural network model for the evolution of speech and language. Trends Neurosci 2016; 39:813–829.
- 34. Glasser MF, Rilling JK. DTI tractography of the human brain’s language pathways. Cereb Cortex 2008; 18:2471–2482.
- 35. Reser DH, Burman KJ, Yu HH et al. Contrasting patterns of cortical input to architectural subdivisions of the area 8 complex: a retrograde tracing study in marmoset monkeys. Cereb Cortex 2013; 23:1901–1922.
- 36. Romanski LM, Bates JF, Goldman-Rakic PS. Auditory belt and parabelt projections to the prefrontal cortex in the rhesus monkey. J Comp Neurol 1999; 403:141–157.
- 37. Eliades SJ, Wang X. Corollary discharge mechanisms during vocal production in marmoset monkeys. Biological psychiatry Cognitive neuroscience and neuroimaging 2019; 4:805–812.
- 38. Niziolek CA, Nagarajan SS, Houde JF. What does motor efference copy represent? Evidence from speech production. J Neurosci 2013; 33:16110–16116.
- 39. Behroozmand R, Larson CR. Error-dependent modulation of speech-induced auditory suppression for pitch-shifted voice feedback. BMC Neurosci 2011; 12:54.
- 40. Lei H, Mooney R. Manipulation of a central auditory representation shapes learned vocal output. Neuron 2010; 65:122–134.
- 41. Logothetis NK, Augath M, Murayama Y et al. The effects of electrical microstimulation on cortical signal propagation. Nat Neurosci 2010; 13:1283–1291.
- 42. Kort NS, Nagarajan SS, Houde JF. A bilateral cortical network responds to pitch perturbations in speech feedback. Neuroimage 2014; 86:525–535.
- 43. Dichter BK, Breshears JD, Leonard MK, Chang EF. The control of vocal pitch in human laryngeal motor cortex. Cell 2018; 174:21–31 e29.
- 44. Zatorre RJ, Belin P. Spectral and temporal processing in human auditory cortex. Cereb Cortex 2001; 11:946–953.
- 45. de la Mothe LA, Blumell S, Kajikawa Y, Hackett TA. Cortical connections of auditory cortex in marmoset monkeys: lateral belt and parabelt regions. Anatomical record 2012; 295:800–821.
- 46. Rauschecker JP, Tian B. Processing of band-passed noise in the lateral auditory belt cortex of the rhesus monkey. J Neurophysiol 2004; 91:2578–2589.
- 47. Luo J, Hage SR, Moss CF. The Lombard Effect: From acoustics to neural mechanisms. Trends Neurosci 2018; 41:938–949.
- 48. Doupe AJ, Kuhl PK. Birdsong and human speech: common themes and mechanisms. Annu Rev Neurosci 1999; 22:567–631.
- 49. Ford JM, Mathalon DH, Heinks T, Kalba S, Faustman WO, Roth WT. Neurophysiological evidence of corollary discharge dysfunction in schizophrenia. Am J Psychiatry 2001; 158:2069–2071.
- 50. Mor N, Simonyan K, Blitzer A. Central voice production and pathophysiology of spasmodic dysphonia. Laryngoscope 2018; 128:177–183.
- 51. Foundas AL, Mock JR, Corey DM, Golob EJ, Conture EG. The SpeechEasy device in stuttering and nonstuttering adults: fluency effects while speaking and reading. Brain Lang 2013; 126:141–150.
