eLife. 2021 Nov 18;10:e65566. doi: 10.7554/eLife.65566

Figure 4. Testing the importance of ecological relevance.

Experiment II measured responses to a larger number of ferret vocalizations (30, compared with 4 in experiment I), as well as speech (14) and music (16) sounds. (A) Map showing the dissimilarity between natural and spectrotemporally matched synthetic sounds from experiment II for each recorded hemisphere, measured using the noise-corrected normalized squared error (NSE). NSE values were low across auditory cortex, replicating experiment I. (B) Maps showing the average difference between responses to natural and synthetic sounds for vocalizations, speech, music, and other sounds, normalized for each voxel by the standard deviation across all sounds. Results are shown for the left hemisphere of ferret T for both experiments I and II (see Figure 4—figure supplement 1C for all hemispheres). For comparison, the same difference maps are shown for the human subjects, who were only tested in experiment I. (C) NSE for different sound categories, plotted as a function of distance to primary auditory cortex (PAC) (binned as in Figure 2F). Shaded area represents 1 standard error of the mean across sounds within each category (Figure 4—figure supplement 1D plots NSEs for individual sounds). (D) Same as panel (C) but showing results from experiment II. (E) The slope of the NSE vs. distance-to-PAC curves for individual ferrets and human subjects. Ferret slopes were measured separately for ferret vocalizations (black lines) and speech (gray lines) (animal indicated by line style). For comparison, human slopes are plotted for speech (each yellow line corresponds to a different human subject).
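As an illustration of the NSE metric, below is a minimal Python sketch of the uncorrected normalized squared error between a natural and a synthetic response vector; the noise-corrected variant used in the paper additionally uses test-retest repeats to discount measurement noise, which is omitted here. All names and the example slope fit are illustrative assumptions, not the paper's code.

```python
import numpy as np

def normalized_squared_error(x, y):
    """Plain (not noise-corrected) NSE between two response vectors.

    0 = identical responses; 1 = the value expected for unrelated
    responses with matched means and variances.
    """
    num = np.mean((x - y) ** 2)
    den = np.mean(x ** 2) + np.mean(y ** 2) - 2 * np.mean(x) * np.mean(y)
    return num / den

# Hypothetical slope of NSE vs. distance-to-PAC, as in panel (E):
distance_mm = np.array([0.5, 1.5, 2.5, 3.5, 4.5])   # bin centers (assumed)
nse_per_bin = np.array([0.10, 0.13, 0.15, 0.20, 0.24])  # illustrative values
slope, intercept = np.polyfit(distance_mm, nse_per_bin, 1)
```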


Figure 4—figure supplement 1. Results of experiment II from other hemispheres.


(A) The animal’s spontaneous movements were monitored with a video recording of its face. Motion was measured as the mean absolute deviation between adjacent video frames, averaged across pixels. (B) Average evoked movement amplitude for natural (shaded) and synthetic (unshaded) sounds, broken down by category. Each dot represents one sound. Significant differences between natural and synthetic sounds, and between categories of natural sounds, are plotted (Wilcoxon signed-rank test, *p<0.05; **p<0.01; ***p<0.001). Evoked movement amplitude was normalized by the standard deviation across sounds for each recording session prior to averaging across sound category (necessary because absolute pixel deviations cannot be meaningfully compared across sessions). Movement amplitude is shown for each animal separately. (C) Same format as Figure 4B but showing results from additional hemispheres/animals. (D) Distribution of normalized squared error (NSE) values for all pairs of natural and synthetic sounds (median across all voxels; averaged across subjects for humans), grouped by category. Dots show individual sound pairs, and boxplots show the median, central 50%, and central 92% (whiskers) of the distribution. Humans were only tested in experiment I. Note that the two outliers in the human music plot are sound clips that contain singing, and are thus a mixture of speech and music, which likely explains their particularly divergent responses.
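The following is a minimal sketch of how the movement measure in panel (A) and the paired comparison in panel (B) could be computed. The array shapes, random example data, and session-level normalization are assumptions for illustration only.

```python
import numpy as np
from scipy.stats import wilcoxon

def frame_motion(frames):
    """Mean absolute deviation between adjacent video frames,
    averaged across pixels (one value per frame transition).

    frames: array of shape (n_frames, height, width).
    """
    return np.abs(np.diff(frames.astype(float), axis=0)).mean(axis=(1, 2))

# Hypothetical per-sound movement amplitudes for one recording session.
rng = np.random.default_rng(0)
motion_natural = rng.random(30)    # e.g., 30 ferret vocalizations
motion_synthetic = rng.random(30)  # their spectrotemporally matched versions

# Normalize by the standard deviation across all sounds in the session,
# so sessions with different absolute pixel scales can be pooled.
scale = np.std(np.concatenate([motion_natural, motion_synthetic]))
motion_natural /= scale
motion_synthetic /= scale

# Paired natural vs. synthetic comparison, as in panel (B).
stat, p = wilcoxon(motion_natural, motion_synthetic)
```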
Figure 4—figure supplement 2. Components from experiment II.


The components derived from experiment II were similar to those from experiment I, shown in Figure 3. This figure plots similar low-frequency, high-frequency, and speech-preferring components from experiment II. (A) For reference with the weight maps in panel (B), a tonotopic map is shown, measured using pure tones. The map is from one hemisphere of one animal (ferret T, left). (B) Voxel weight maps. (C) Component responses to natural and spectrotemporally matched synthetic sounds, colored by category (labels shown at the bottom left of the figure). (D) Correlation of component responses with energy at different audio frequencies, measured from a cochleagram. The inset for f3 shows the correlation pattern that would be expected from a perfectly speech-selective response (i.e., 1 for speech, 0 for all other sounds). (E) Correlations with modulation energy at different temporal and spectral rates. The inset shows the correlation pattern that would be expected for a speech-selective response.
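Below is a sketch of the frequency-correlation analysis in panel (D), assuming that each sound's cochleagram is summarized by its time-averaged energy per frequency channel; the function name and array shapes are hypothetical, not taken from the paper's code.

```python
import numpy as np

def frequency_correlations(component_response, cochleagrams):
    """Correlate one component's response across sounds with the
    energy in each cochleagram frequency channel.

    component_response: (n_sounds,) response of one component.
    cochleagrams: (n_sounds, n_freqs, n_times) cochleagram per sound.
    Returns an (n_freqs,) correlation profile, as in panel (D).
    """
    energy = cochleagrams.mean(axis=2)  # time-averaged energy per channel
    return np.array([
        np.corrcoef(component_response, energy[:, f])[0, 1]
        for f in range(energy.shape[1])
    ])
```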
Figure 4—figure supplement 3. The effect of removing outside-of-cortex components on motion correlations.


Voxel responses were denoised by removing components from outside of cortex, which likely reflect artifacts such as motion (see ‘Denoising part I’ in Materials and methods). (A) Effect of removing out-of-cortex components on correlations with movement. We measured the correlation of each voxel’s response with movement, measured from a video recording of the animal’s face (absolute deviation between adjacent frames). Each line shows the average absolute correlation across voxels for a single recording session/slice, plotted as a function of the number of removed components. Motion correlations were substantially reduced by removing the top 20 components (vertical dotted line). (B) The average difference between responses to natural and synthetic sounds for an example slice (ferret A), before and after removing the top 20 out-of-cortex components. Motion induces a stereotyped ‘striping’ pattern through its effect on blood vessels, which is evident in the map computed from raw data, likely because this ferret moved substantially more during natural than during synthetic sounds (in particular for ferret vocalizations; Figure 4—figure supplement 1). The striping pattern is unlikely to reflect genuine neural activity and is largely removed by the denoising procedure.
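One way such out-of-cortex denoising could be implemented is sketched below: take the top temporal components (via SVD) of voxels outside cortex and regress them out of every in-cortex voxel time course. The function name, array shapes, and use of plain SVD here are assumptions under this reading, not the paper's exact pipeline.

```python
import numpy as np

def remove_outside_components(in_cortex, out_cortex, n_remove=20):
    """Project out of the in-cortex data the top principal components
    of the out-of-cortex voxels (putative motion/artifact components).

    in_cortex:  (n_timepoints, n_voxels_in) responses inside cortex.
    out_cortex: (n_timepoints, n_voxels_out) responses outside cortex.
    """
    # Top temporal components of the out-of-cortex voxels.
    out_centered = out_cortex - out_cortex.mean(axis=0)
    U, S, Vt = np.linalg.svd(out_centered, full_matrices=False)
    T = U[:, :n_remove]  # (n_timepoints, n_remove) artifact time courses

    # Regress these time courses out of every in-cortex voxel.
    beta, *_ = np.linalg.lstsq(T, in_cortex, rcond=None)
    return in_cortex - T @ beta
```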