Author manuscript; available in PMC: 2018 May 1.
Published in final edited form as: Hear Res. 2017 Mar 9;348:98–111. doi: 10.1016/j.heares.2017.03.001

Contributions of Sensory Tuning to Auditory-Vocal Interactions in Marmoset Auditory Cortex

Steven J Eliades 1, Xiaoqin Wang 2
PMCID: PMC5392437  NIHMSID: NIHMS858501  PMID: 28284736

Abstract

During speech, humans continuously listen to their own vocal output to ensure accurate communication. Such self-monitoring is thought to require the integration of information about the feedback of vocal acoustics with internal motor control signals. The neural mechanism of this auditory-vocal interaction remains largely unknown at the cellular level. Previous studies in naturally vocalizing marmosets have demonstrated diverse neural activities in auditory cortex during vocalization, dominated by a vocalization-induced suppression of neural firing. How underlying auditory tuning properties of these neurons might contribute to this sensory-motor processing is unknown. In the present study, we quantitatively compared marmoset auditory cortex neural activities during vocal production with those during passive listening. We found that neurons excited during vocalization were readily driven by passive playback of vocalizations and other acoustic stimuli. In contrast, neurons suppressed during vocalization exhibited more diverse playback responses, including responses that were not predictable by auditory tuning properties. These results suggest that vocalization-related excitation in auditory cortex is largely a sensory-driven response. In contrast, vocalization-induced suppression is not well predicted by a neuron's auditory responses, supporting the prevailing theory that internal motor-related signals contribute to the auditory-vocal interaction observed in auditory cortex.

Keywords: Auditory Cortex, Auditory-Vocal Interaction, Marmoset, Vocalization, Sensory-Motor

1. Introduction

Recent investigations in both humans and non-human primates have begun to reveal the role of the central auditory system, and in particular the auditory cortex, in representing the sound of an animal's own vocalizations during vocal production. During vocal communication, vocalized sounds are heard by both the intended recipients and the individual producing them (Békésy 1949). Neural encoding of this vocal feedback is thought to be crucial for monitoring one's own voice (Hickok et al. 2011; Houde and Nagarajan 2011; Levelt 1983), and may play a role in feedback-dependent control of vocalization in both animals (Brumm et al. 2004; Leonardo and Konishi 1999; Schuller et al. 1974; Sinnott et al. 1975) and humans (Burnett et al. 1998; Houde and Jordan 1998; Lane and Tranel 1971; Lee 1950).

Single neuron recordings in the auditory cortex of the marmoset (Callithrix jacchus), a highly vocal New World primate, have demonstrated the presence of two types of responses during vocal production, vocalization-induced suppression and vocalization-related excitation (Eliades and Wang 2003). Vocalization-induced suppression affects approximately 70% of neurons in marmoset auditory cortex (Eliades and Wang 2013), is observed across different types of vocalizations, and is thought to be caused by inhibitory signals originating from brain regions that initiate and control vocal production. Moreover, neurons showing vocalization-induced suppression exhibit an increased sensitivity to alterations in auditory feedback during vocalization and may play a role in self-monitoring (Eliades and Wang 2008a, 2012). In contrast, neurons showing vocalization-related excitation, representing a small proportion of auditory cortex neurons, tend to respond during a more limited set of vocalization types (Eliades and Wang 2013) and are less sensitive to altered auditory feedback (Eliades and Wang 2008a). The origin of the differences between these two groups of neurons is not clear.

Several recent parallel human investigations have addressed the suppression of human auditory cortex during speech (Crone et al. 2001; Curio et al. 2000; Christoffels et al. 2007; Flinker et al. 2010; Greenlee et al. 2011; Heinks-Maldonado et al. 2006; Houde et al. 2002). These studies demonstrated that auditory cortex is activated during both speech production and perception, but with reduced responses during speaking, termed speech-induced suppression. Human studies have also demonstrated vocal feedback sensitivity similar to that observed in marmosets (Behroozmand et al. 2011, 2016; Chang et al. 2013). However, a lack of spatial resolution has prevented a more accurate characterization of the auditory component of speech production-related activity in human auditory cortex.

More recent work in rodents has begun to reveal possible neural circuits underlying such suppression. These experiments have revealed a direct suppression of auditory cortex from connections originating in M2, a putative equivalent of pre-motor cortex (Nelson et al., 2013; Schneider et al., 2014; Schneider and Mooney, 2015a). When paired with a predictable motor-triggered tone, there is a suppression of the tone-evoked sensory response in auditory cortex (Schneider and Mooney, 2015b), similar to what has been described in human subjects (Martikainen et al., 2005; Agnew et al., 2013). Although this suppression of self-generated sensory responses is thought to have a generally similar mechanism to vocalization- and speech-induced suppression, the extent of the mechanistic overlap remains an open question.

A better understanding of these auditory-vocal interactions and their underlying mechanisms requires a more thorough characterization of the contributions of sensory inputs. However, our previous efforts to examine these integration mechanisms have not revealed meaningful differences in auditory tuning between vocalization-suppressed and excited auditory cortex neurons (Eliades and Wang 2003, 2008a). Here we conducted further analyses of single neuron recordings obtained from the auditory cortex of two naturally vocalizing marmosets (Eliades and Wang 2013) in order to more specifically compare the auditory and vocal responses of each neuron. We expand on our previous results by examining responses to previously un-analyzed auditory control stimuli. In contrast to our previous findings, in which the auditory tuning of suppressed and excited neurons was found to be similar, this new analysis demonstrates that vocalization-related excitation is highly predictable based on a neuron's passive auditory responses, whereas neurons exhibiting vocalization-induced suppression have more diverse auditory tuning properties, including vocal responses that could not be predicted based upon passive auditory responses. Given the scarcity of single neuron data obtained from naturally vocalizing monkeys, these results add valuable contributions to our understanding of auditory-vocal interaction mechanisms in the primate brain.

2. Materials and Methods

All experiments were conducted under the guidelines and protocols approved by the Johns Hopkins University Animal Care and Use Committee. The neural data analyzed in this report were obtained from the same animals studied in our previous work (Eliades and Wang, 2013). In these chronic recording experiments, we typically collected a large amount of data under multiple experimental conditions from each neuron. In the Eliades and Wang (2013) study, we focused on comparing vocal responses in auditory cortex of marmosets between different classes of marmoset vocalization. This previous publication, however, only included responses from a limited subset of the auditory control stimuli tested. The present study includes analyses of previously unpublished neural responses to auditory control stimuli and additional analyses including modeling of vocal responses, described further below. Details of the neural recording experiments can be found in our previous publication (Eliades and Wang, 2013) and are only briefly described below.

2.1 Electrophysiological recordings

Two marmoset monkeys (Callithrix jacchus) were each implanted bilaterally with Warp-16 multielectrode arrays (Neuralynx, Bozeman, MT). Each array contained 16 individually moveable microelectrodes (2-4 MOhm impedances). Details on the electrode arrays and recordings, as well as spike sorting procedures, have been previously described (Eliades and Wang 2008a,b). Auditory cortex was located with standard single-electrode methods prior to array placement (Lu et al. 2001). Both hemispheres were recorded for each animal, starting first in the left hemisphere and subsequently in both simultaneously. Histological examination showed arrays spanning primary auditory cortex, lateral belt and possibly a portion of parabelt fields (Eliades and Wang 2008b).

2.2 Auditory response characterization

Auditory responses were measured within a soundproof chamber (Industrial Acoustics, Bronx, NY), with the animal seated and head restrained in a custom primate chair. Auditory stimuli were presented free-field by a speaker (B&W DM601) located 1 m in front of the animal. Stimuli included both tone- and noise-based sounds to assess frequency tuning and rate-level responses. Tone-based stimuli consisted of randomly ordered 100 ms pips at 500 ms inter-stimulus intervals, with frequencies spanning 1-32 kHz (5 octaves) at 1/10th octave steps. During most sessions, frequency tuning was measured at 3 sound pressure levels (30, 50, 70 dB SPL); a subset of sessions used a more extensive SPL range (-10 to 80 dB in 10 or 20 dB intervals) to measure the full frequency response area (FRA) map. Band-pass noise stimuli were presented similarly to tones, but were 200 ms in duration, 0.5 octave in bandwidth, and the center frequency varied at 1/5th octave steps. Selected tone and bandpass frequencies were tested more extensively at multiple SPLs (-10 to 90 dB in 10 dB intervals) to assess rate-level tuning. Rate-level functions using wideband (white) noise stimuli were also collected from all units.
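For concreteness, the tone and band-pass frequency grids described above translate directly into code. The following is a minimal Python sketch; the synthesis sample rate, the 5 ms onset/offset ramps, and all variable names are illustrative assumptions, not parameters stated in the text:

```python
import numpy as np

fs = 50_000  # assumed synthesis sample rate (Hz)

# Tone pips: 1-32 kHz (5 octaves) at 1/10-octave steps, 100 ms duration
tone_freqs = 1000.0 * 2.0 ** np.arange(0, 5.0 + 1e-9, 0.1)   # 51 frequencies
tone_levels_db = [30, 50, 70]          # SPLs used during most sessions

def tone_pip(freq, dur=0.1, ramp=0.005):
    """100 ms tone pip with brief cosine on/off ramps (ramp length assumed)."""
    t = np.arange(int(dur * fs)) / fs
    y = np.sin(2 * np.pi * freq * t)
    n = int(ramp * fs)
    env = np.ones_like(y)
    env[:n] = 0.5 * (1 - np.cos(np.pi * np.arange(n) / n))   # rising ramp
    env[-n:] = env[:n][::-1]                                 # falling ramp
    return y * env

# Band-pass noise: 0.5-octave bands, center frequencies at 1/5-octave steps
bp_centers = 1000.0 * 2.0 ** np.arange(0, 5.0 + 1e-9, 0.2)
```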

In addition, multiple examples of recorded vocalizations were played at different sound levels (“playback”). These included samples of the animal's own vocalizations (previously recorded from that animal) and conspecific vocalization samples (from other animals living in the marmoset colony), with multiple exemplars (6-10) from each of the four major marmoset vocalization classes: phee, trillphee, trill, and twitter (Agamaite et al. 2015; Epple 1968). Based upon the responses to these vocalization stimuli, one or two samples of each call type were selected and presented at multiple SPLs (0 to 90 dB in 10 dB steps) to measure vocal rate-level tuning. All vocalization samples were previously recorded at a 50 kHz sampling rate, filtered to exclude low-frequency (<1 kHz) background noise, and normalized to have equal stimulus power. A subset of vocalization stimuli were also presented with a parametrically varying mean frequency, computed using a heterodyning technique (Schuller et al., 1974). This technique involves serial multiplication of a vocal signal by cosines of different frequencies (frequency mixing) and results in a linear frequency shift of the desired magnitude. Samples were first up-sampled (3×), shifted in frequency by multiplication with a 25 kHz cosine, high-pass filtered to remove the lower-sideband image, multiplied by a second cosine of 25−f kHz (where f is the desired frequency shift), low-pass filtered, and finally down-sampled back to the original sample rate. The responses from these additional vocalization stimuli, including parametric changes in loudness and mean frequency, were not included in previous analyses.
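The heterodyning steps can be summarized in code. Below is a minimal sketch under stated assumptions: the SciPy-based implementation, the 8th-order Butterworth filters, and the placement of cutoffs at the carrier frequency are our choices, not the published pipeline:

```python
import numpy as np
from scipy.signal import resample_poly, butter, sosfiltfilt

def heterodyne_shift(x, fs=50_000, f_shift=500.0, f_carrier=25_000.0):
    """Linearly shift all frequencies in x upward by f_shift Hz (sketch)."""
    up = 3
    y = resample_poly(np.asarray(x, float), up, 1)  # 1) up-sample 3x
    fs_up = fs * up
    t = np.arange(len(y)) / fs_up
    # 2) mix with the 25 kHz carrier (creates sum and difference bands)
    y = y * np.cos(2 * np.pi * f_carrier * t)
    # 3) high-pass at the carrier to keep only the upper sideband
    sos_hp = butter(8, f_carrier, 'highpass', fs=fs_up, output='sos')
    y = sosfiltfilt(sos_hp, y)
    # 4) mix with a (25 - f_shift) kHz carrier; the difference band is the
    #    original spectrum shifted up by f_shift
    y = y * np.cos(2 * np.pi * (f_carrier - f_shift) * t)
    # 5) low-pass to remove the high-frequency image
    sos_lp = butter(8, f_carrier, 'lowpass', fs=fs_up, output='sos')
    y = sosfiltfilt(sos_lp, y)
    return resample_poly(y, 1, up)  # 6) back to the original sample rate
```

With marmoset call energy near 5-10 kHz, mixing with 25 kHz places the upper sideband at 30-35 kHz, well below the 75 kHz Nyquist frequency after 3× up-sampling, which is the reason for the up-sampling step.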

2.3 Vocal recordings

Simultaneous vocal and neural recordings were performed following auditory testing. These were performed either in the marmoset colony (Eliades and Wang 2008a,b), allowing the subject to vocally interact with other animals, or in the laboratory, where the animal engaged in antiphonal calling with computer-controlled playback of conspecific vocalizations (Miller and Wang 2006). Vocal production was recorded using a directional microphone (AKG C1000S) placed ∼20 cm in front of the animal, digitized at a 50 kHz sampling rate (National Instruments PCI-6052E), and synchronized with neural recordings. Individual vocalizations were extracted from the microphone recording and manually classified into established marmoset vocalization types (Agamaite et al. 2015) based upon visual inspection of their spectrograms.

2.4 Data analysis

Neural responses to individual vocalizations were calculated by comparing the firing rate during vocalization to spontaneous activity before vocal onset. Individual vocalization responses were quantified with a normalized metric, the response modulation index (RMI), to correct for firing rate differences between units (Eliades and Wang 2003):

$$\mathrm{RMI} = \frac{R_{vocal} - R_{prevocal}}{R_{vocal} + R_{prevocal}}$$

where R_vocal is the firing rate during vocalization and R_prevocal is the average rate before vocalization. Negative RMIs indicate suppression during vocalization and positive values indicate driven activity. The overall response of a neuron to a given vocalization type was measured by averaging the RMI from multiple individual vocalizations. Only units with sufficient samples of a given vocalization type (≥4) were included for analysis. Responses to playback of vocalization stimuli were similarly quantified. To differentiate the two, RMIs measured during vocal production are referred to as ‘Vocal RMI’ and those measured during playback of vocal stimuli as ‘Auditory RMI’. Only vocal stimuli presented at SPLs similar to those of vocal production (determined post hoc separately for each class of vocalization and for each session) were included in the Auditory RMI calculation.
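As an illustration, the RMI for a single vocalization might be computed as in the sketch below; the one-second baseline window and the function name are assumptions for the example, not the published analysis parameters:

```python
import numpy as np

def response_modulation_index(spike_times, voc_on, voc_off, pre_window=1.0):
    """RMI = (R_vocal - R_prevocal) / (R_vocal + R_prevocal) for one call.
    spike_times: spike times (s) for one unit; voc_on/voc_off: vocal onset
    and offset (s). pre_window is an assumed pre-vocal baseline duration."""
    spikes = np.asarray(spike_times)
    r_vocal = np.sum((spikes >= voc_on) & (spikes < voc_off)) / (voc_off - voc_on)
    r_pre = np.sum((spikes >= voc_on - pre_window) & (spikes < voc_on)) / pre_window
    if r_vocal + r_pre == 0:
        return 0.0  # no spikes in either window; treat as no modulation
    return (r_vocal - r_pre) / (r_vocal + r_pre)
```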

Auditory tuning properties, including center frequency (CF) and rate-level tuning, were measured from responses to tone, bandpass, and wideband noise stimuli. CF was defined as the frequency evoking the highest firing rate response across all SPLs tested. For secondary analyses, a separate measurement of CF was performed using only those tones matching the loudness of the vocalizations actually produced by the animal (typically ≥70 dB SPL). In cases where there was a response to both tone and bandpass stimuli, the CF was chosen from the stimulus with the stronger response. Rate-level responses were measured for both simple stimuli and vocal playback stimuli; however, the two were highly correlated, and rate-level analysis is therefore presented only for vocal playback responses. A Monotonicity Index (MI) was measured for each rate-level response, defined as the firing rate to the loudest stimulus divided by the strongest response at any level (Sadagopan and Wang 2008). An MI >0.5 indicates a monotonically increasing or saturating rate-level function, while an MI <0.5 indicates a non-monotonic (peaked) function.
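The CF and MI definitions translate directly into code. A minimal sketch follows; the array layout and function names are assumptions, and a nonzero maximum rate is assumed for the MI:

```python
import numpy as np

def center_frequency(freqs, rates):
    """CF: the frequency evoking the highest firing rate across all SPLs.
    rates: (n_freqs, n_levels) array of mean driven firing rates."""
    f_idx, _ = np.unravel_index(np.argmax(rates), np.asarray(rates).shape)
    return freqs[f_idx]

def monotonicity_index(levels_db, rates):
    """MI = rate at the loudest stimulus / strongest rate at any level
    (Sadagopan and Wang 2008). MI > 0.5: monotonic/saturating; < 0.5: peaked."""
    rates = np.asarray(rates, dtype=float)
    loudest = rates[np.argmax(levels_db)]
    return loudest / rates.max()   # assumes rates.max() > 0
```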

Statistical significance of differences between vocalization and playback responses (RMIs) was determined for individual units using Wilcoxon signed-rank testing. Trends across neural populations were tested using correlation coefficients and Kruskal-Wallis non-parametric ANOVAs. P values <0.05 were considered statistically significant.
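These are standard tests; for reference, a toy SciPy illustration (all data here are random placeholders, not the study's measurements):

```python
import numpy as np
from scipy.stats import wilcoxon, kruskal

rng = np.random.default_rng(0)
vocal_rmi = rng.uniform(-1, 0, 50)        # toy per-unit vocal RMIs
auditory_rmi = rng.uniform(-0.2, 1, 50)   # toy per-unit auditory RMIs

stat, p = wilcoxon(vocal_rmi, auditory_rmi)   # paired signed-rank test
groups = [auditory_rmi[:20], auditory_rmi[20:35], auditory_rmi[35:]]
h, p_kw = kruskal(*groups)                    # Kruskal-Wallis non-parametric ANOVA
print(f"signed-rank p={p:.3g}, Kruskal-Wallis p={p_kw:.3g}")
```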

2.5 Vocal response model

In order to better characterize the contribution of auditory tuning to vocal responses, a simple linear model was created, similar to that of Bar-Yosef et al. (2002). Similar models have also been used successfully to explain responses to complex stimuli in sub-cortical auditory brain areas (Bauer et al., 2002; Holmstrom et al., 2010). First, the acoustic frequency spectrum of each vocalization was measured using a power-spectral density function. Because only the four major marmoset vocalization types were used, none of which contain low frequency spectral information, frequencies below 2 kHz were discarded. The power-spectral density function was then used to select matching frequency-level bins from the tone FRA, and the firing rate of these bins averaged according to:

$$R_{vocal} = \frac{1}{N}\sum_{f=1}^{N} R_{tone}\left(f, A\{f\}\right)$$

where Rtone is the tone-based FRA firing rate, and A{f} is the power spectrum of a given vocal sample. Because of the higher sampling density of the vocal power-spectral function, the FRA was spline-interpolated to increase density by 10×. Only units with full FRAs (those with at least 5 sound levels tested) were included. This process was repeated for each vocalization produced, and for all vocal playback samples. Model prediction results were measured at the population level by the correlation coefficient between predicted and measured unit mean firing rates for vocalization and for playback. Unsurprisingly, predictions within individual units (i.e. predictions based upon responses to different vocalizations/samples) were found to be weak. Therefore only the prediction of the unit average response was used, and prediction accuracy was calculated at the population level (i.e. predicting which units would be more and which units less responsive to vocal production and playback). All calculations were performed separately for each class of vocalization.
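A sketch of this FRA-based prediction is given below, with assumptions flagged: the dB calibration offset, the Welch spectral-estimation parameters, and the clipping of spectrum levels to the tested SPL range are our illustrative choices, not the published pipeline. The atten_db argument anticipates the subcortical attenuation sweep in Section 3.6:

```python
import numpy as np
from scipy.interpolate import RectBivariateSpline
from scipy.signal import welch

def predict_vocal_rate(fra_rates, fra_freqs, fra_levels, voc,
                       fs=50_000, atten_db=0.0):
    """Average the FRA firing rates at the (frequency, level) bins traced out
    by a vocalization's power spectrum. fra_rates: (n_freqs, n_levels) array
    of tone-evoked rates; voc: vocalization waveform. The +80 dB calibration
    offset is an assumption."""
    f, psd = welch(voc, fs=fs, nperseg=1024)
    keep = f >= 2000                        # discard energy below 2 kHz
    f, psd = f[keep], psd[keep]
    level_db = 10 * np.log10(psd / psd.max()) + 80 - atten_db
    level_db = np.clip(level_db, min(fra_levels), max(fra_levels))
    # smooth interpolation of the FRA (the text increases density 10x)
    spline = RectBivariateSpline(np.asarray(fra_freqs), np.asarray(fra_levels),
                                 np.asarray(fra_rates, float))
    return float(np.mean(spline.ev(f, level_db)))
```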

3. Results

We recorded neural activities from 1603 single-units in the bilateral auditory cortices of two marmoset monkeys (Eliades and Wang, 2013). Of these units, 66% were collected from the first marmoset and the remaining 34% from the second, which was recorded over a shorter time period due to other constraints. All units were studied both during self-initiated vocal production and during auditory testing (passive playback) to measure receptive field properties and responses to previously recorded vocal stimuli. Based on our previous observations (Eliades and Wang 2003), we broadly classified responses during vocal production as either “suppressed” (RMI ≤ -0.2) or “excited” (RMI ≥ 0.2), but also examined vocal responses along a continuous axis from strongly suppressed (RMI -1) to strongly driven (RMI +1).

3.1. Comparison of responses during vocal production and playback

Each unit recorded during vocal production was also tested to determine its responses to passive playback of a library of vocalizations previously recorded from the same animal. The neural activities for a given type of marmoset vocalization were then compared for each unit to determine what components of vocalization-related modulation (suppression or excitation) might be explained by the passive auditory responses to vocal playback stimuli.

Figure 1 illustrates one example unit's responses to trill vocalizations. This unit was excited during vocal production (mean vocal RMI 0.52±0.36) with a strong onset response followed by sustained activity for the duration of the trill vocalizations (Fig. 1B, D). Playback of previously recorded trill vocalizations also resulted in strongly driven auditory responses (mean auditory RMI 0.84±0.2), but with considerable variability between different exemplars tested (Fig. 1C, D). This pattern of responses, excited during both vocal production and playback, was characteristic for excited units.

Fig. 1.

Sample unit with excitatory responses during both vocal production and playback. A: Spectrogram of sample trill vocalization. B: Raster plot of unit response to produced trill vocalizations, aligned by vocal onset. Shaded: duration of vocalization. C: Raster plot of unit response to playback of trills, including phase locking to vocal oscillations for some samples. D: Peri-stimulus time histograms (PSTHs) for trill vocalization production (blue) and playback (black) aligned by vocal onset. This unit showed similar response to both production and playback, including onset and sustained responses.

In contrast to excited units, units suppressed during vocal production had more variable responses to playback vocalizations (Fig. 2). Some suppressed units also exhibited suppression during playback, such as the example unit in Figure 2A-D. This unit was suppressed during trillphee vocalizations (Fig. 2B, RMI -0.79±0.16) and during playback of trillphees (Fig. 2C, RMI -0.38±0.37). Interestingly, the suppression during playback did not develop until later in the stimuli (Fig. 2D). Another suppressed unit (Fig. 2E-H) was suppressed during trill vocalizations (Fig. 2F, -0.46±0.53), but strongly driven by playback of recorded trills (Fig. 2G, 0.46±0.40).

Fig. 2.

Sample units suppressed during vocalization, but with different playback responses. One unit (A-D) was suppressed during trillphee vocal production as well as during playback (though with some delay). The second unit (E-H) was suppressed during trill production, but strongly driven during playback. The second type of unit was more commonly encountered than the first.

Examining the relative prevalence of these unit populations reveals that vocalization excited units account for only 8.7% of the total samples (Table 1). Of these excited units, however, <5% were suppressed by playback vocalizations, suggesting that vocalization-related excitation is primarily an auditory response. In contrast, vocalization suppressed units made up 55% of all neurons recorded. Of these suppressed units, only 10.7% were also suppressed by playback vocalizations. Only in this small set of units might vocal suppression be a direct product of auditory tuning. About 45% of the suppressed units were driven by playback vocalizations, suggesting that vocalization-related suppression was likely induced by sources other than the ascending auditory inputs.

Table 1. Distribution of playback responses in suppressed and excited units.

Auditory RMI      Suppressed (Vocal RMI ≤ -0.2)   Intermediate (-0.2 to 0.2)   Excited (Vocal RMI ≥ 0.2)   Total
≥ 0.2             24.8%                            18.7%                        5.9%                        49.4%
-0.2 to 0.2       24.3%                            16.6%                        2.4%                        43.3%
≤ -0.2            5.9%                             1.1%                         0.4%                        7.3%
Total             55.0%                            36.3%                        8.7%

3.2 Population comparison of vocal production and playback responses

Figure 3 compares vocal and auditory RMIs for all tested units and each marmoset vocalization type. The results show that excited units were consistently excited by both vocal production and playback vocalizations (Fig. 3). Units with vocal RMI values near 0 had more diverse responses to playback vocalizations, but were generally biased towards driven responses. Suppressed units exhibited a greater variety of playback responses, including both driven and suppressed responses. The majority of playback responses had positive auditory RMI values, indicating driven activities, regardless of the corresponding vocal responses. Overall, only 7.3% of units showed suppression during playback vocalizations. There was a weak, but statistically significant, correlation between vocal and auditory RMIs (phee r=0.13, p<0.001; trillphee r=0.17, p<0.001; trill r=0.16, p<0.001; twitter r=0.10, p<0.05).

Fig. 3.

Population comparison of vocalization and playback responses. Vocalization (“Vocal RMI”) and playback (“Auditory RMI”) responses were quantified by a normalized RMI measure and averaged for each unit. Comparisons are plotted individually for the four most common marmoset vocalization classes: phee (blue, top left), trillphee (black, top right), trill (green, bottom left), and twitter (red, bottom right). Plotted curves indicate mean auditory RMI for units binned by their vocal RMI. Vocal RMIs <0 indicate suppression during vocalization, and RMIs >0 indicate excitation. Auditory responses were distributed around zero for suppressed units and increased with vocal responses. Error bars: bootstrapped 95% confidence intervals. Filled symbols: statistically significant deviations from 0 (p<0.01, signed-rank).

In Figure 4, we plot further analyses comparing vocal production and playback responses. For all types of vocalizations, the vocal-auditory RMI difference was biased towards negative values, indicating more suppression during vocal production compared to playback (Fig. 4A, shaded bars indicating statistically significant units). Difference values of zero indicate units with identical responses to vocal production and playback. For phee vocalizations, the average RMI difference was -0.46±0.43 (p<0.001, signed-rank). For trillphees, trills, and twitters, this difference was -0.52±0.43, -0.45±0.42, and -0.50±0.45, respectively (p<0.001, for all). Units with positive differences, indicating stronger excitation during vocalization than playback, were uncommon, particularly units with statistically significant increases (Fig. 4A, shaded).

Fig. 4.

Distribution of vocal-auditory differences. A: Histograms are plotted showing the distribution of RMI differences between vocalization and playback (vocal - auditory) for each unit. Most units showed large shifts towards negative values, indicating suppression. Shaded bars: units with statistically significant differences between vocal production and playback (p<0.05, ranksum). B: Plot of mean vocal-auditory differences for units binned by vocal RMI. Differences were nearly zero for excited units, indicating matched vocal and auditory responses. Differences were negative for units with vocal RMI near zero, indicating that these vocally “unresponsive” units were actually suppressed relative to playback. Colors indicate vocalization types as in A. Grey: average response including all vocalization types. Error bars: bootstrapped 95% confidence intervals. Filled symbols: statistically significant deviations from 0 (p<0.01, signed-rank).

Analysis of overall population vocal-auditory response differences as a function of vocal RMI shows that the largest differences were for the most strongly suppressed units, with decreasing differences in less suppressed units (Fig. 4B). This trend was present for all vocalization types independently (p<0.001, Kruskal-Wallis ANOVA) and collectively as a group. Interestingly, excited units were the ones in which vocal and auditory responses matched most closely (difference close to zero). Another important observation was that neurons unresponsive during vocal production (vocal RMI ∼0) also had negative vocal-auditory differences (p<0.01, signed-rank), indicating decreased vocal production responses when compared to vocal playback (relative suppression).

3.3 Vocalization responses and sound level tuning

One possible explanation for the differences between vocalization-suppressed and excited units is lower-level auditory tuning properties that are not fully captured by the responses to the playback of recorded vocalization stimuli. We therefore also examined basic auditory response properties of these units and compared results to vocalization-related activity. We first measured rate-level functions for multiple classes of stimuli, including tones, bandpass noise, wideband noise, and vocalizations. To illustrate the dependency of vocal and playback responses on sound level, we examined the relationship between vocal and auditory RMIs and the degree of rate-level monotonicity (Fig. 5). A monotonicity index (MI, see Methods) was calculated for each unit based on the response to vocal playback stimuli of varying SPL. Since playback vocalization stimuli were presented at sound levels matched to those of vocal production (generally >60 dB SPL), it was not surprising that units excited by playback stimuli (auditory RMI > 0) tended to be monotonic (MI > 0.5), whereas those units suppressed by the playback stimuli (auditory RMI <0) tended to be non-monotonic (MI <0.5), and therefore less responsive to the loud vocal playback stimuli (Fig. 5C).

Fig. 5.

Effect of rate-level tuning on vocalization and playback responses. A: Average monotonicity index (MI) for units grouped by vocal RMI, showing roughly equal proportions of monotonic and nonmonotonic units among suppressed units and increasing MI with vocal excitation. Error bars: 95% confidence intervals. B: Two-dimensional plot of mean MI grouped by both vocal and auditory RMI, showing the largest MIs for units excited by both vocalization and playback and the smallest MIs for units suppressed by both. Color bar (right) indicates the MI scale. C: Average MI grouped by auditory RMI.

An examination of the relationship between vocal production responses (vocal RMI) and monotonicity revealed a more complex relationship (Fig. 5A). As with auditory responses, units with positive vocal RMIs were biased towards monotonic rate-level functions. Units with negative vocal RMIs, those suppressed by vocal production, exhibited more variable MIs, with both monotonic and non-monotonic playback responses. Further analysis of the interactions between vocal production and playback responses revealed that the variability of MI with vocal RMI strength was highly dependent upon the auditory response (Vocal: F=3.06, df=5, p<0.05; Auditory: F=8.69, df=5, p<0.001; Interaction: F=1.81, df=68, p<0.001, Kruskal-Wallis). Specifically, units with suppressed vocal responses (especially those with vocal RMI near -1) tended to be monotonic if they had excitatory vocal playback responses (positive auditory RMI), and tended to be nonmonotonic if they had suppressed playback responses (Fig. 5B). These observations are consistent with the hypothesis that, while vocalization-related excitation is a product of auditory sensory tuning, vocalization-induced suppression is not purely due to the auditory response properties of the neurons, but rather related to the act of vocal production.

3.4 Vocalization responses and frequency tuning

We next examined auditory frequency tuning, measured with either tones or bandpass noise, to determine if frequency selectivity might account for differences in vocalization responses. A few units were found with clear correlations between vocal and auditory responses, such as the unit illustrated in Figure 6, a multi-peaked unit of the type previously described (Kadia and Wang 2003). One frequency peak overlaps the vocalization frequency range (Fig. 6B), and another overlaps the first harmonic of the vocalization frequency. We tested this unit with two trill vocalization exemplars that were shifted in mean frequency using a heterodyning technique (Schuller et al. 1974). This unit's responses to these playback stimuli showed a spectral sensitivity profile (Fig. 6C) similar to the frequency tuning measured with tones (Fig. 6B) (r=-0.74, p<0.001, between 5-8.5 kHz). The unit's responses during self-produced trill vocalizations (Fig. 6D) also exhibited a frequency dependence similar to the tone tuning (r=-0.65, p<0.001). Such units would appear to have vocalization preferences arising from their auditory tuning; however, units with such clear correlations were uncommon in our sampled neural population.

Fig. 6.

Sample unit frequency tuning and vocal responses. A: Tone frequency tuning curve exhibiting multi-peaked frequency tuning. Shaded: range of fundamental frequency and first harmonic for trill vocalizations recorded during the same testing session. B: Expansion of the frequency tuning curve focusing on the vocal range. C: Responses to individual samples (circles) and mean response (line) to the playback of two trill exemplars (orange, black) acoustically modified to sample a range of mean frequencies. Trill frequency tuning qualitatively reflects tone-based tuning in B. D: Firing rates are plotted against the mean frequency of self-produced trill vocalizations. Correlation coefficients of responses with mean vocal frequency are indicated.

We compared vocal responses and frequency tuning, measured by center frequency (CF), over the whole population of tested units (Fig. 7). There was no clear relationship between CF and vocal RMI. Both vocalization-suppressed and excited units were found at CFs near or distant from the frequency range typically occupied by the first two spectral components of marmoset vocalizations (marked by grey bars).

Fig. 7.

Comparison of vocal response and unit center frequency. Scatter plots show unit mean vocal production response (RMI) against CF measured from either tone or bandpass noise tuning. No clear relationship is evident. Shaded: range of vocalization mean fundamental frequency and first harmonic.

It is likely, however, that using CF alone under-represents the complexity of spectral tuning of auditory cortex units. Units varied widely in tuning width around the CF peak, and often had multiple frequency peaks (e.g., Fig. 6), both of which could affect the response to vocalizations. Marmoset vocalizations also typically contain at least two harmonics of the fundamental frequency, any one of which could interact with a unit's frequency receptive field, possibly contributing to these inconsistencies. To explore these factors, we computed the fraction of units with firing rates ≥80% of the maximum frequency tuning peak within 1/2 octave of the vocalization mean frequency or one of its harmonics (Fig. 8; see the sketch below). Such units would be expected to have significant overlap between vocal spectral energy and the tone/noise tuning curve. In general, at least half of auditory cortex units met this criterion. Such overlaps were more prevalent amongst units excited by vocal production (Fig. 8A-B), although, again, the relationship to vocal playback was less clear (Fig. 8C). Even amongst units suppressed during vocalization, about half had this proximity between frequency tuning peaks and vocal mean frequency, suggesting that frequency tuning also cannot fully account for vocalization-induced suppression, except in a subset of units.
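A minimal sketch of this overlap criterion follows; the number of harmonics checked and the function name are assumptions for illustration:

```python
import numpy as np

def tuning_overlaps_vocal(freqs, rates, voc_f0, n_harmonics=2, frac=0.8):
    """True if any frequency with a rate >= 80% of the tuning-curve maximum
    lies within 0.5 octave of the vocal mean frequency or one of its
    harmonics. freqs/rates: one unit's frequency tuning curve (Hz, spk/s)."""
    freqs = np.asarray(freqs, float)
    rates = np.asarray(rates, float)
    near_peak = freqs[rates >= frac * rates.max()]       # near-maximal peaks
    targets = voc_f0 * np.arange(1, n_harmonics + 2)     # f0 and harmonics
    # octave distance between each near-peak frequency and each target
    dist_oct = np.abs(np.log2(near_peak[:, None] / targets[None, :]))
    return bool((dist_oct <= 0.5).any())
```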

Fig. 8.

Effect of frequency tuning on vocalization and playback responses. The distance between vocal frequency (either fundamental or any harmonic) and the nearest frequency tuning peak was measured. A: Fraction of units with a peak-vocal frequency difference <0.5 octaves, plotted against vocal RMI. The fraction of units with close proximity was relatively constant, except for units excited by vocalization, where it was increased. B: Two-dimensional histogram of unit fraction grouped by both vocal and auditory RMI, showing the largest overlapping fraction for units excited by both vocalization and playback. Colorbar (right) indicates the unit fraction scale. C: Unit fraction grouped by auditory RMI.

3.5 Model prediction of vocal responses

Because marmoset vocalizations contain multiple acoustic components, and receptive fields can be quite complex, explanations of vocal responses based upon sound level or frequency tuning in isolation are likely to be poor estimates. We therefore constructed a simple linear model (further details in Methods) to predict both vocalization and playback responses, based on the approach of a previous study (Bar-Yosef et al. 2002). We measured tone-based frequency response areas (FRAs) for each of 334 units. The spectral content of each vocal sample was determined using power-spectral density calculations and projected onto the FRA response (Fig. 9A). The firing rates of congruent frequency-level bins were then averaged to estimate the mean rate response to the vocalization. This was repeated for all recorded self-produced vocalizations and playback vocal stimuli (the same samples and matched loudness used above to estimate the auditory RMI). Comparison of mean unit responses and model predictions provides an estimate of the model's ability to predict the average degree of vocal suppression or excitation for each unit.

Fig. 9.

Linear frequency response area model of mean vocalization responses. A: Illustration of the FRA-based model. A smoothed tone-measured FRA is shown overlaid with the power-spectral density function from a sample phee vocalization. The firing rate response from the overlapping bins was averaged to calculate the model prediction. B: Scatter plots comparing measured unit mean firing rates during vocal production and model predictions for all four major vocalization types (colored). Comparisons between units' mean playback responses and model predictions are shown in grey. Model predictions were better for playback than during vocal production, particularly for phee and trillphee vocalizations. (** p<0.001).

Overall, this model provided a reasonable population-level prediction (r=0.55, p<0.001) of mean unit responses to vocal playback (Fig. 9B). The prediction for vocal production was much poorer (r=0.18, p<0.001). Further examination of model predictions for different units showed that the model rarely predicted strong suppression or inhibition, for either vocalization or playback, a likely source of the poor prediction of vocal production results. The model tended to underestimate the playback responses to trills and twitters (linear regression slopes 0.35 [95% CI: 0.31, 0.38] and 0.31 [0.27, 0.34]) more so than phees and trillphees (0.56 [0.45, 0.67] and 0.43 [0.32, 0.53]), particularly for strongly driven activity. Overall, however, the accuracy of the prediction was surprisingly good for playback of individual vocal types (phee r=0.49, trillphee r=0.41, trill r=0.74, twitter r=0.75; all p<0.001). Predictions were poorer during vocal production, as expected given the rarity of predicted suppressed responses (phee r=-0.08, trillphee r=-0.08, p>0.05; trill r=0.31, twitter r=0.47, p<0.001).

We further measured model predictions by grouping units according to their vocal and playback responses and examining the predictions for each sub-group (Fig. 10). Model playback predictions were strongest for units with strongly driven responses (auditory RMI near +1), and weaker for units with less driven auditory responses (Fig. 10A). Interestingly, the model did a reasonable job for some suppressed units, particularly those with matched auditory and vocal RMIs, suggesting the model could account for some of the sensory-related inhibition during vocal playback. Multiple linear regression confirmed improved predictions with excitation (r=0.57, F=3.88, p<0.05), with a stronger dependence on playback response strength (coefficient 0.67 [95% CI: 0.15, 1.20]) than on production (-0.26 [-0.69, 0.18]).

Fig. 10.

Comparison of model prediction accuracy with vocalization and playback responses. Model predictions of unit mean responses were grouped by unit vocal and auditory RMIs, and the prediction accuracy for each group was measured by a correlation coefficient. Two-dimensional plots of the accuracy are shown separately for predictions of vocal playback (A) and production (B) responses. Colorbar (right) indicates the correlation scale. Predictions were better for playback than vocalization. Predictions were also stronger for units excited by playback and/or vocal production, and weaker for vocalization-suppressed units.

Model predictions of vocal production responses (Fig. 10B) showed poor (negative) correlations for suppressed units, but good predictions (r>0.5) for most excited units. Linear regression again confirmed improved prediction with excitation (r=0.63, F=5.38, p<0.05), with a similar dependence on playback responses (0.67 [0.18, 1.17]) and a modest dependence upon vocal responses (0.16 [-0.24, 0.57]). This close match between predictions for both playback and production in excited units again suggests that such responses were a result of tuned ascending auditory inputs. In contrast, predictions for suppressed units remained significantly poorer, and even inversely correlated.

We also examined the performance of the model in predicting responses to individual vocalization samples for each unit. For playback responses, prediction correlation coefficients varied widely from -1 to 1, but the average correlation across units was weak (0.06±0.34; p<0.01 signed-rank). The predictions for vocal production responses were even poorer (0±0.4; p>0.05). These results suggest that, while a simple model can reasonably predict which units will be excited or suppressed by playback vocalizations, and to a lesser extent vocal production, the model cannot predict how a unit will respond to the varying acoustics of individual vocalizations.

3.6 Subcortical contribution to model prediction of playback-production differences

Previous work has suggested that subcortical attenuation, in particular a combination of middle ear reflexes (Carmel and Starr 1963; Henson 1965; Salomon and Starr 1963; Suga and Jen 1975) and brainstem-level attenuation (Papanicolaou et al. 1986; Suga and Schlegel 1972; Suga and Shimozawa 1974) may also be present during vocal production, but not during passive listening. Such attenuation has been estimated between 20-40 dB SPL (Suga and Shimozawa 1974) and might bias model estimates of vocalization responses. We therefore repeated model calculations, factoring in varying degrees of sub-cortical loudness attenuation (0-60 dB). Overall model performance decreased with increasing attenuation, from r=0.18 for un-attenuated, to 0.06 and -0.07 for 20 and 40 dB, respectively. Specific examination of excited units (RMI ≥ 0.2), those with the best model accuracy, also showed reduced performance from r=0.71 to 0.52 and 0.12. Suppressed units did not exhibit any changes in performance (r= -0.41, -0.40, and -0.34). These results suggest that presumed sub-cortical sources of attenuation do not account for differences between vocal production-related activity and model estimates from sensory receptive fields.
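This attenuation sweep amounts to lowering the effective stimulus level before the FRA lookup. A toy illustration, reusing the hypothetical predict_vocal_rate sketch from Section 2.5 (all data below are random placeholders, not recorded responses):

```python
import numpy as np

rng = np.random.default_rng(1)
fra = rng.poisson(5.0, size=(51, 8)).astype(float)   # toy FRA (freq x level)
freqs = 1000.0 * 2.0 ** np.linspace(0, 5, 51)        # 1-32 kHz tone grid
levels = np.arange(-10, 70, 10)                      # -10 to 60 dB SPL
voc = rng.standard_normal(25_000)                    # toy 0.5 s "vocalization"

# Sweep the assumed subcortical attenuation; in the actual analysis, each
# prediction set would be correlated against measured vocal firing rates.
for atten_db in (0, 20, 40, 60):
    pred = predict_vocal_rate(fra, freqs, levels, voc, atten_db=atten_db)
    print(f"attenuation {atten_db} dB -> predicted rate {pred:.2f} spk/s")
```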

3.7 Influence of cortical location on vocal responses

We further examined the effects of recording location on neural responses during vocal production. The recording arrays contained four rows of electrodes, with the medial two rows generally falling within primary auditory cortex (A1), the third row on lateral belt (LB), and the fourth row on parabelt (PB) areas. We first compared the prevalence of suppression and excitation by electrode row, and found generally similar proportions of suppressed (medial to lateral: 65.0%, 59.1%, 46.2%, and 52.5%; overall: 55%) and excited (8.6%, 6.6%, 11.4%, 10.1%; overall: 8.7%) units. There was a general trend towards more suppressed units in medial (A1) electrodes, and more excited units in lateral (LB/PB) electrodes. We further examined the overall magnitude of the vocal RMI by electrode row; these distributions were highly overlapping, but with a pattern of increased suppression in medial over lateral rows (mean RMI: -0.38±0.38, -0.32±0.29, -0.21±0.27, -0.26±0.30) that was statistically significant (p<0.001, Kruskal-Wallis).

Examination of all units' playback responses also revealed stronger auditory RMIs in more medial electrodes (medial to lateral: 0.25±0.34, 0.15±0.26, 0.15±0.24, 0.17±0.26; p<0.001, Kruskal-Wallis). However, when only vocally suppressed units (vocal RMI <-0.2) were examined, average auditory RMIs did not differ between electrode rows (0.16±0.37, 0.11±0.30, 0.15±0.27, 0.15±0.30; p=0.36, Kruskal-Wallis). These results raise the possibility that some of the apparent differences in vocal responses between electrodes may have been due to differences in their passive auditory responses. Given that different cortical areas were not sampled at matched positions along the tonotopic axis in these experiments, it is difficult to disambiguate this confound or make strong claims about the role of different auditory cortical areas in vocal responses with the present data set.

4. Discussion

We examined the activities of a large number of single-units in the marmoset auditory cortex and compared the responses during self-produced vocalizations, basic auditory tuning, and responses to playback of recorded vocal sounds. We found that (1) neurons excited during vocal production were almost always excited by playback vocalizations, while (2) neurons suppressed by vocal production had more diverse playback responses, though generally favoring playback excitation. (3) Neurons excited by either playback or vocal production tended to have monotonic rate-level functions, while neurons suppressed by both tended to be non-monotonic. (4) Frequency tuning and frequency-based models predict playback responses more accurately than responses during vocal production, but generally fail to explain vocalization-induced suppression. These findings further our understanding of auditory-vocal mechanisms in the auditory cortex, and begin to explain some of the diverse neural activities that have been observed during vocal production.

4.1 Comparison with previous results

In our previous investigations, we failed to find a relationship between vocal production-related neural responses and sensory tuning of auditory cortex neurons. In particular, we noted that CF, threshold, and monotonicity did not predict the behavior of an auditory cortex neuron during vocalization and that many suppressed neurons would respond to playback of previously recorded (conspecific) vocalizations (Eliades and Wang 2003, 2013). We also noted that the variation of vocalization responses with vocal acoustics (Eliades and Wang 2005) or altered feedback (Eliades and Wang 2008a, 2012) was seemingly unrelated to auditory tuning. These previous studies were limited, however, by examining only simple frequency tuning parameters such as CF and the responses to a limited set of vocal playback stimuli.

The new analyses conducted in the present work, as well as the inclusion of additional auditory responses, provide new insights beyond our previous work. In contrast to our previous findings, here we demonstrate that neurons with vocalization-related excitation are nearly universally responsive to vocal playback (Figs. 1, 3) and have mostly monotonic rate-level tuning to vocal playback (Fig. 5). Additionally, we examined frequency tuning properties beyond CF, taking into account tuning bandwidth, multiple frequency peaks, and vocal harmonics; this revealed a large overlap between vocal acoustics and frequency tuning (Fig. 8), as well as a high degree of predictability of responses for both production and playback based upon a frequency-response area model (Figs. 9-10). These results suggest that vocalization-related excitation during vocal production is largely, if not entirely, a sensory phenomenon. Since such neurons do not appear to be biased by vocal production, they may provide a mechanism for encoding outside sounds during vocalization.

In contrast, the results of the present study confirm our earlier finding that vocalization-induced suppression cannot be predicted based upon a neuron's auditory responses (Eliades and Wang, 2003) (Figs. 2-3). This finding is consistent with the prevailing theory of vocal suppression arising from internal modulatory signals, as we further discuss below. One interesting exception is the presence of a small subset of suppressed neurons that were also suppressed by playback stimuli (∼10%), suggesting that the suppression was not entirely caused by motor signals in these neurons (Fig. 3). Such neurons may have been a significant contaminant in our previous analyses. For example, our previous results showing sensitivity to altered vocal feedback in suppressed neurons also found decreased sensitivity for the most strongly suppressed neurons (Eliades and Wang, 2008a). If many of these maximally suppressed neurons were driven by sensory instead of sensory-motor processes, this could explain why our previous work did not observe a relationship between vocal production-related neural responses and sensory tuning of auditory cortex neurons.

Another novel finding in these analyses was a significant number of neurons whose vocal responses were reduced compared to vocal playback responses, but not significantly below spontaneous activity (Fig. 4). Under our previous definition of vocalization-induced suppression, these neurons were not classified as suppressed. However, previous human studies have used a similar production-playback comparison measure to establish speech-induced suppression (e.g., Houde et al., 2002; Chang et al., 2013). Some differences in results (e.g., the relationship between suppression and altered feedback effects) between marmoset and human experiments may be partly attributable to these differing definitions of vocalization-induced suppression. Further work will need to take into account both possible definitions of vocal suppression in order to better reconcile results based on single-neuron recording in animals with those based on surface potential recordings and imaging in humans.

4.2 Modeling results

As part of these analyses, we also constructed a simple linear model to predict responses during both vocal production and auditory playback based upon pure-tone FRA responses. This simple model, based upon Bar-Yosef et al. (2002), is appealing in its ability to simultaneously integrate multiple aspects of a unit's receptive field (center frequency, bandwidth, multiple frequency peaks, amplitude tuning) as well as the multiple harmonic components of marmoset vocalizations. Given the simplicity of the model, the degree to which it was able to predict auditory playback responses (correlation coefficients of 0.41 to 0.75) was surprising, although it did not perform well in predicting vocal production responses. One limitation of this approach is that such predictive models are known to be highly dependent on the types of stimuli used to make the prediction, such as artificial vs. natural stimuli (Laudanski et al., 2012). Additionally, linear models often fail to fully capture important non-linear interactions between frequency components in auditory receptive fields (Young et al. 2005). Our model also fails to capture any sensitivity to temporal or spectro-temporal information, which may also be important (Theunissen et al., 2000). Despite these limitations, the observation of significantly better model predictions for auditory playback responses than for responses during the production of acoustically similar vocalizations is consistent with the notion that non-auditory inputs contribute to vocalization-induced suppression.

4.3 Cortical location and auditory-vocal interaction

We examined the strength and prevalence of vocalization-induced suppression in different auditory cortical fields. Marmoset auditory cortex is structured with a core-belt-parabelt organization common to non-human primates (de la Mothe et al., 2006; Hackett et al., 2001). Vocal response distributions were largely overlapping between more medial electrodes (A1) and more lateral ones (presumed lateral belt and parabelt areas). There were, however, statistically significant trends towards both stronger playback responses and stronger vocal suppression in more medial electrodes. These observations are possibly confounded by variations in the tonotopic locations of electrodes between different cortical areas. Further work with more extensive and matched spatial/spectral sampling of units will be needed in order to reveal the role of different auditory cortical areas in vocal suppression and auditory-vocal interaction.

4.4 Mechanism of auditory-vocal integration

One important question that remains unanswered is the mechanism by which a suppressive vocal motor signal is combined with the auditory feedback signal at the level of an individual neuron in auditory cortex. The absence of correlation between passive auditory tuning and vocal responses for suppressed neurons shown in the present study suggests that vocalization-induced suppression is more complicated than a simple linear additive process (e.g., an excitatory vocal feedback response added to a static vocal motor inhibition). This is in contrast to the observed linear prediction of vocalization-related excitation based upon both vocal playback and pure tone responses. Several competing models can be posited to potentially explain the incongruent responses seen during vocal suppression.

The first is that vocalization-induced suppression represents an error signal (Behroozmand and Larson 2011; Houde and Nagarajan 2011; Niziolek et al. 2013). In this model, suppressed vocal responses reflect a direct subtraction of expected (efference copy) sensory input, with maximal suppression resulting from a perfect match between vocal feedback and the expected signal (i.e. no feedback error). The effects of such efference signals, also termed corollary discharges, have long been studied in various model systems (Crapse and Sommer 2008; Sperry 1950; von Holst and Mittelstaedt 1950). Recent results using human MEG studies are consistent with this model, where natural speech fluctuations in vowel formant frequencies were found to evoke increased auditory cortical activity compared to vowels closer to the mean (Niziolek et al. 2013).

Additionally, recent work on motor efference in non-vocalizing rodents may also be consistent with this model. Optogenetic techniques have demonstrated a direct neural pathway for motor-induced suppression of auditory cortex during locomotion (Nelson et al., 2013; Schneider et al., 2014; Schneider and Mooney, 2015a). This pathway appears to provide multiple inputs to the auditory cortex, including from both motor cortex and the basal forebrain. When locomotion is paired with an expected sound, there is a suppression of stimulus-evoked activity to similar tone frequencies, but not to tones of more distant frequencies, suggesting a subtractive error comparison (Schneider and Mooney, 2015b; Nelson and Mooney, 2016). Whether or not a similar mechanism is active during vocal production remains an open question.

A second possible explanatory model is that efference copy signals bias the receptive fields of auditory cortex neurons to better encode vocalization feedback. A selective scaling model of sensory tuning, as has been described for attention (Fritz et al. 2007), is one possibility. Another is a wholesale shift in receptive fields, as has been described in parietal cortex during saccades (Duhamel et al. 1992). Some recent evidence has emerged that auditory cortex receptive fields can change dynamically with behavioral tasks (Fritz et al. 2005), and such changes are likely under the control of frontal cortex (Fritz et al. 2010). Which of these models might best explain the auditory-vocal integration observed in primate auditory cortex remains unanswered. However, it should also be noted that they are not necessarily mutually exclusive. Future work will more directly test these models to determine the functional mechanism of auditory-vocal interaction and integration.

Highlights.

  • We compare responses to vocal production and listening in marmoset auditory cortex

  • Vocalization-related excitation is predicted from passive auditory responses

  • Vocalization-induced suppression is not clearly related to auditory tuning of cortical neurons

Acknowledgments

The authors thank A. Pistorio for assistance in animal care and training, and C. Miller for helpful feedback on this manuscript.

Grants: This work was supported by National Institute on Deafness and Other Communication Disorders Grants DC-005808 (X.W.) and DC-014299 (S.J.E).

Footnotes

Disclosures: No conflicts of interest, financial or otherwise, are declared by the authors.

Author Contributions: S.J.E. and X.W. shared conception and design of research, interpretation of results, manuscript editing and revision, and approval of the final version of this manuscript; S.J.E. performed experiments, analyzed data, prepared figures, and drafted the manuscript.

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

References

  1. Agamaite JA, Chang CJ, Osmanski MS, Wang X. A quantitative acoustic analysis of the vocal repertoire of the common marmoset (Callithrix jacchus). J Acoust Soc Am. 2015;138:2906–2928. doi: 10.1121/1.4934268.
  2. Agnew ZK, McGettigan C, Banks B, Scott SK. Articulatory movements modulate auditory responses to speech. Neuroimage. 2013;73:191–199. doi: 10.1016/j.neuroimage.2012.08.020.
  3. Bar-Yosef O, Rotman Y, Nelken I. Responses of neurons in cat primary auditory cortex to bird chirps: effects of temporal and spectral context. J Neurosci. 2002;22:8619–8632. doi: 10.1523/JNEUROSCI.22-19-08619.2002.
  4. Bauer EE, Klug A, Pollak GD. Spectral determination of responses to species-specific calls in the dorsal nucleus of the lateral lemniscus. J Neurophysiol. 2002;88:1955–1967. doi: 10.1152/jn.2002.88.4.1955.
  5. Behroozmand R, Larson CR. Error-dependent modulation of speech-induced auditory suppression for pitch-shifted voice feedback. BMC Neurosci. 2011;12:54. doi: 10.1186/1471-2202-12-54.
  6. Behroozmand R, Oya H, Nourski KV, Kawasaki H, Larson CR, Brugge JF, Howard MA 3rd, Greenlee D. Neural correlates of vocal production and motor control in human Heschl's gyrus. J Neurosci. 2016;36:2302–2315. doi: 10.1523/JNEUROSCI.3305-14.2016.
  7. von Békésy G. The structure of the middle ear and the hearing of one's own voice by bone conduction. J Acoust Soc Am. 1949;21:217–232.
  8. Brumm H, Voss K, Kollmer I, Todt D. Acoustic communication in noise: regulation of call characteristics in a New World monkey. J Exp Biol. 2004;207:443–448. doi: 10.1242/jeb.00768.
  9. Burnett TA, Freedland MB, Larson CR, Hain TC. Voice F0 responses to manipulations in pitch feedback. J Acoust Soc Am. 1998;103:3153–3161. doi: 10.1121/1.423073.
  10. Carmel PW, Starr A. Acoustic and nonacoustic factors modifying middle ear muscle activity in waking cats. J Neurophysiol. 1963;26:598–616. doi: 10.1152/jn.1963.26.4.598.
  11. Chang EF, Niziolek CA, Knight RT, Nagarajan SS, Houde JF. Human cortical sensorimotor network underlying feedback control of vocal pitch. Proc Natl Acad Sci USA. 2013;110:2653–2658. doi: 10.1073/pnas.1216827110.
  12. Christoffels IK, Formisano E, Schiller NO. Neural correlates of verbal feedback processing: an fMRI study employing overt speech. Hum Brain Mapp. 2007;28:868–879. doi: 10.1002/hbm.20315.
  13. Crapse TB, Sommer MA. Corollary discharge across the animal kingdom. Nat Rev Neurosci. 2008;9:587–600. doi: 10.1038/nrn2457.
  14. Crone NE, Hao L, Hart J Jr, Boatman D, Lesser RP, Irizarry R, Gordon B. Electrocorticographic gamma activity during word production in spoken and sign language. Neurology. 2001;57:2045–2053. doi: 10.1212/wnl.57.11.2045.
  15. Curio G, Neuloh G, Numminen J, Jousmaki V, Hari R. Speaking modifies voice-evoked activity in the human auditory cortex. Hum Brain Mapp. 2000;9:183–191. doi: 10.1002/(SICI)1097-0193(200004)9:4<183::AID-HBM1>3.0.CO;2-Z.
  16. de la Mothe LA, Blumell S, Kajikawa Y, Hackett TA. Cortical connections of the auditory cortex in marmoset monkeys: core and medial belt regions. J Comp Neurol. 2006;496:27–71. doi: 10.1002/cne.20923.
  17. Duhamel JR, Colby CL, Goldberg ME. The updating of the representation of visual space in parietal cortex by intended eye movements. Science. 1992;255:90–92. doi: 10.1126/science.1553535.
  18. Eliades SJ, Wang X. Sensory-motor interaction in the primate auditory cortex during self-initiated vocalizations. J Neurophysiol. 2003;89:2194–2207. doi: 10.1152/jn.00627.2002.
  19. Eliades SJ, Wang X. Dynamics of auditory-vocal interaction in monkey auditory cortex. Cereb Cortex. 2005;15:1510–1523. doi: 10.1093/cercor/bhi030.
  20. Eliades SJ, Wang X. Neural substrates of vocalization feedback monitoring in primate auditory cortex. Nature. 2008;453:1102–1106. doi: 10.1038/nature06910.
  21. Eliades SJ, Wang X. Chronic multi-electrode neural recording in free-roaming monkeys. J Neurosci Methods. 2008;172:201–214. doi: 10.1016/j.jneumeth.2008.04.029.
  22. Eliades SJ, Wang X. Neural correlates of the Lombard effect in primate auditory cortex. J Neurosci. 2012;32:10737–10748. doi: 10.1523/JNEUROSCI.3448-11.2012.
  23. Eliades SJ, Wang X. Comparison of auditory-vocal interactions across multiple types of vocalizations in marmoset auditory cortex. J Neurophysiol. 2013;109:1638–1657. doi: 10.1152/jn.00698.2012.
  24. Flinker A, Chang EF, Kirsch HE, Barbaro NM, Crone NE, Knight RT. Single-trial speech suppression of auditory cortex activity in humans. J Neurosci. 2010;30:16643–16650. doi: 10.1523/JNEUROSCI.1809-10.2010.
  25. Fritz JB, Elhilali M, Shamma SA. Differential dynamic plasticity of A1 receptive fields during multiple spectral tasks. J Neurosci. 2005;25:7623–7635. doi: 10.1523/JNEUROSCI.1318-05.2005.
  26. Fritz JB, Elhilali M, David SV, Shamma SA. Does attention play a role in dynamic receptive field adaptation to changing acoustic salience in A1? Hear Res. 2007;229:186–203. doi: 10.1016/j.heares.2007.01.009.
  27. Fritz JB, David SV, Radtke-Schuller S, Yin P, Shamma SA. Adaptive, behaviorally gated, persistent encoding of task-relevant auditory information in ferret frontal cortex. Nat Neurosci. 2010;13:1011–1019. doi: 10.1038/nn.2598.
  28. Greenlee JD, Jackson AW, Chen F, Larson CR, Oya H, Kawasaki H, Chen H, Howard MA. Human auditory cortical activation during self-vocalization. PLoS One. 2011;6:e14744. doi: 10.1371/journal.pone.0014744.
  29. Hackett TA, Preuss TM, Kaas JH. Architectonic identification of the core region in auditory cortex of macaques, chimpanzees, and humans. J Comp Neurol. 2001;441:197–222. doi: 10.1002/cne.1407.
  30. Heinks-Maldonado TH, Mathalon DH, Gray M, Ford JM. Fine-tuning of auditory cortex during speech production. Psychophysiology. 2005;42:180–190. doi: 10.1111/j.1469-8986.2005.00272.x.
  31. Henson OW Jr. The activity and function of the middle-ear muscles in echo-locating bats. J Physiol. 1965;180:871–887. doi: 10.1113/jphysiol.1965.sp007737.
  32. Hickok G, Houde J, Rong F. Sensorimotor integration in speech processing: computational basis and neural organization. Neuron. 2011;69:407–422. doi: 10.1016/j.neuron.2011.01.019.
  33. Holmstrom LA, Eeuwes LB, Roberts PD, Portfors CV. Efficient encoding of vocalizations in the auditory midbrain. J Neurosci. 2010;30:802–819. doi: 10.1523/JNEUROSCI.1964-09.2010.
  34. Houde JF, Jordan MI. Sensorimotor adaptation in speech production. Science. 1998;279:1213–1216. doi: 10.1126/science.279.5354.1213.
  35. Houde JF, Nagarajan SS. Speech production as state feedback control. Front Hum Neurosci. 2011;5:82. doi: 10.3389/fnhum.2011.00082.
  36. Houde JF, Nagarajan SS, Sekihara K, Merzenich MM. Modulation of the auditory cortex during speech: an MEG study. J Cogn Neurosci. 2002;14:1125–1138. doi: 10.1162/089892902760807140.
  37. Kadia SC, Wang X. Spectral integration in A1 of awake primates: neurons with single- and multipeaked tuning characteristics. J Neurophysiol. 2003;89:1603–1622. doi: 10.1152/jn.00271.2001.
  38. Lane H, Tranel B. The Lombard sign and the role of hearing in speech. J Speech Hear Res. 1971;14:677–709.
  39. Laudanski J, Edeline JM, Huetz C. Differences between spectro-temporal receptive fields derived from artificial and natural stimuli in the auditory cortex. PLoS One. 2012;7:e50539. doi: 10.1371/journal.pone.0050539.
  40. Lee BS. Effects of delayed speech feedback. J Acoust Soc Am. 1950;22:824–826.
  41. Levelt WJ. Monitoring and self-repair in speech. Cognition. 1983;14:41–104. doi: 10.1016/0010-0277(83)90026-4.
  42. Lu T, Liang L, Wang X. Neural representations of temporally asymmetric stimuli in the auditory cortex of awake primates. J Neurophysiol. 2001;85:2364–2380. doi: 10.1152/jn.2001.85.6.2364.
  43. Martikainen MH, Kaneko KI, Hari R. Suppressed responses to self-triggered sounds in the human auditory cortex. Cereb Cortex. 2005;15:299–302. doi: 10.1093/cercor/bhh131.
  44. Miller CT, Wang X. Sensory-motor interactions modulate a primate vocal behavior: antiphonal calling in common marmosets. J Comp Physiol A Neuroethol Sens Neural Behav Physiol. 2006;192:27–38. doi: 10.1007/s00359-005-0043-z.
  45. Nelson A, Mooney R. The basal forebrain and motor cortex provide convergent yet distinct movement-related inputs to the auditory cortex. Neuron. 2016;90:635–648. doi: 10.1016/j.neuron.2016.03.031.
  46. Nelson A, Schneider DM, Takatoh J, Sakurai K, Wang F, Mooney R. A circuit for motor cortical modulation of auditory cortical activity. J Neurosci. 2013;33:14342–14353. doi: 10.1523/JNEUROSCI.2275-13.2013.
  47. Niziolek CA, Nagarajan SS, Houde JF. What does motor efference copy represent? Evidence from speech production. J Neurosci. 2013;33:16110–16116. doi: 10.1523/JNEUROSCI.2137-13.2013.
  48. Papanicolaou AC, Raz N, Loring DW, Eisenberg HM. Brain stem evoked response suppression during speech production. Brain Lang. 1986;27:50–55. doi: 10.1016/0093-934x(86)90004-0.
  49. Sadagopan S, Wang X. Level invariant representation of sounds by populations of neurons in primary auditory cortex. J Neurosci. 2008;28:3415–3426. doi: 10.1523/JNEUROSCI.2743-07.2008.
  50. Salomon B, Starr A. Electromyography of middle ear muscles in man during motor activities. Acta Neurol Scand. 1963;39:161–168. doi: 10.1111/j.1600-0404.1963.tb05317.x.
  51. Schneider DM, Mooney R. Motor-related signals in the auditory system for listening and learning. Curr Opin Neurobiol. 2015;33:78–84. doi: 10.1016/j.conb.2015.03.004.
  52. Schneider DM, Mooney R. Neural coding of self-generated sounds in mouse auditory cortex. Society for Neuroscience Abstracts. 2015.
  53. Schneider DM, Nelson A, Mooney R. A synaptic and circuit basis for corollary discharge in the auditory cortex. Nature. 2014;513:189–194. doi: 10.1038/nature13724.
  54. Schuller G, Beuter K, Schnitzler HU. Response to frequency shifted artificial echoes in the bat Rhinolophus ferrumequinum. J Comp Physiol. 1974;89:275–286.
  55. Sinnott JM, Stebbins WC, Moody DB. Regulation of voice amplitude by the monkey. J Acoust Soc Am. 1975;58:412–414. doi: 10.1121/1.380685.
  56. Sperry RW. Neural basis of the spontaneous optokinetic responses produced by visual inversion. J Comp Physiol Psychol. 1950;43:482–489. doi: 10.1037/h0055479.
  57. Suga N, Schlegel P. Neural attenuation of responses to emitted sounds in echolocating bats. Science. 1972;177:82–84. doi: 10.1126/science.177.4043.82.
  58. Suga N, Shimozawa T. Site of neural attenuation of responses to self-vocalized sounds in echolocating bats. Science. 1974;183:1211–1213. doi: 10.1126/science.183.4130.1211.
  59. Suga N, Jen PH. Peripheral control of acoustic signals in the auditory system of echolocating bats. J Exp Biol. 1975;62:277–311. doi: 10.1242/jeb.62.2.277.
  60. Theunissen FE, Sen K, Doupe AJ. Spectral-temporal receptive fields of nonlinear auditory neurons obtained using natural sounds. J Neurosci. 2000;20:2315–2331. doi: 10.1523/JNEUROSCI.20-06-02315.2000.
  61. von Holst E, Mittelstaedt H. Das Reafferenzprinzip: Wechselwirkungen zwischen Zentralnervensystem und Peripherie. Naturwissenschaften. 1950;37:464–476.
  62. Young ED, Yu JJ, Reiss LA. Non-linearities and the representation of auditory spectra. Int Rev Neurobiol. 2005;70:135–168. doi: 10.1016/S0074-7742(05)70005-2.