Cerebral Cortex (New York, NY). 2022 Nov 6;33(9):5395–5408. doi: 10.1093/cercor/bhac427

Selective attention sharpens population receptive fields in human auditory cortex

Agustin Lage-Castellanos 1,2,3, Federico De Martino 4,5,6, Geoffrey M Ghose 7, Omer Faruk Gulban 8, Michelle Moerel 9,10,11,
PMCID: PMC10152083  PMID: 36336333

Abstract

Selective attention enables the preferential processing of relevant stimulus aspects. Invasive animal studies have shown that attending a sound feature rapidly modifies neuronal tuning throughout the auditory cortex. Human neuroimaging studies have reported enhanced auditory cortical responses with selective attention. To date, it remains unclear how the results obtained with functional magnetic resonance imaging (fMRI) in humans relate to the electrophysiological findings in animal models. Here we aim to narrow the gap between animal and human research by combining a selective attention task similar in design to those used in animal electrophysiology with high spatial resolution ultra-high field fMRI at 7 Tesla. Specifically, human participants perform a detection task in which the probability of target occurrence varies with sound frequency. Contrary to previous fMRI studies, we show that selective attention resulted in population receptive field sharpening, and consequently reduced responses, at the attended sound frequencies. The difference between our results and those of previous fMRI studies supports the notion that the influence of selective attention on auditory cortex is diverse and may depend on context, stimulus, and task.

Keywords: frequency tuning, human auditory cortex, pRF modeling, selective attention, ultra-high field fMRI

Introduction

Selective attention highlights currently relevant information through the preferential processing of stimulus features (Maunsell and Treue 2006; Carrasco 2011). Selective attention results in a behavioral advantage (e.g. increased detection performance, faster reaction times) for the attended feature, thereby allowing us to efficiently handle the potentially overwhelming amount of sensory information in our environment.

The neural correlates of selective attention have been extensively investigated throughout stimulus modalities (e.g. in the visual system [Martinez-Trujillo and Treue 2004]; somatosensory system [Schweisfurth et al. 2014]; and auditory system [Fritz et al. 2003; Lee and Middlebrooks 2011]). In the auditory system, selective attention has also been shown to substantially influence neuronal processing. However, while across-species evidence from the visual system almost uniformly supports a gain model of selective attention (Tootell et al. 1998; Saenz et al. 2002; Hopf et al. 2004; Warren et al. 2014), results from the auditory cortex are diverse. Invasive electrophysiological studies in animals have shown that selectively attending a specific sound feature (e.g. frequency or spatial location) induces rapid changes in the preference and selectivity of auditory neurons (Fritz et al. 2003, 2007; Fritz 2005; Lee and Middlebrooks 2011; Lakatos et al. 2013; O’Connell et al. 2014). These changes are stronger in secondary and tertiary regions (i.e. belt and parabelt, respectively) than in the primary auditory cortex (Atiani et al. 2014) and are strongest in neurons whose preference matches the attended feature (Atiani et al. 2014). However, these changes are highly task dependent. For example, decreased instead of increased responsiveness was observed when animals were rewarded for target detection rather than punished for missing a target (David et al. 2012). These results support a matched filter model of selective attention in which auditory neurons change tuning to optimally process attended features (Fritz et al. 2007), but for which the exact changes are dependent on task structure, difficulty, and associated reward (Atiani et al. 2009; David et al. 2012).

By contrast, human functional magnetic resonance imaging (fMRI) studies of auditory selective attention consistently support a gain model in which overall responsiveness (Paltoglou et al. 2009; Da Costa et al. 2013; Riecke et al. 2017), but not tuning (Dick et al. 2017; Riecke et al. 2018, but see Kikuchi et al. 2019), changes. This discrepancy with results from animal electrophysiology could in part be due to the limited spatial resolution of fMRI. While changes in voxel tuning can be reliably assessed with fMRI (Kay et al. 2015), the voxel response could obscure changes at the neuronal level (Sadil et al. 2022). Conflicting findings between human fMRI and nonhuman invasive studies may also have been caused by differences in experimental design. While target and distractor sounds were presented sequentially in animal electrophysiology studies, in human fMRI studies target and distractor sounds were always presented simultaneously, i.e. in an auditory scene (note that while Da Costa et al. (2013) included a “single stream” condition, they did not evaluate the effect of attention within this condition). Finally, the fMRI studies evaluated frequency tuning with limited spectral resolution (Dick et al. 2017; Riecke et al. 2018) and therefore might be less sensitive to subtle changes. As a result, to date it is still largely unclear how the results obtained with fMRI in humans relate to the electrophysiological findings in animal models.

Here we employed a selective attention task where participants alternately directed their attention to low or high frequencies. By presenting target and distractor sounds sequentially and using ultra-high field fMRI, we aimed to narrow the gap between animal and human research. We measured responses to natural sounds to estimate population receptive field (pRF) changes with high spectral resolution. Contrary to previous fMRI studies, we observed an attention-induced sharpening in voxel frequency tuning, resulting in a response reduction to attended sounds. Our results, and their difference from the results of previous human fMRI investigations, thereby support the notion arising from animal studies that, depending on context, stimulus, and task, diverse mechanisms underlie auditory selective attention.

Materials and methods

Ethics

The experimental procedures were approved by the Ethics Review Committee of the Faculty of Psychology and Neuroscience at Maastricht University (#167_09_05_2016_S2). The experiment was performed in accordance with the approved guidelines and the Declaration of Helsinki. Informed consent was obtained from each participant before starting the measurements. Participants received course credit or gift vouchers for their participation.

Participants

Eight healthy volunteers, without history of hearing disorder or neurological disease, participated in this study (mean age [SD] = 28.1 [3.2]; 4 males and 4 females). A pure-tone audiogram (with a 25 dB hearing level threshold) was conducted to ensure the participants did not have hearing loss.

Experimental design and statistical analysis

Noise burst detection task

Selective attention was manipulated through a noise-burst detection task on artificial ripple sounds. Specifically, participants were presented with ripples centered at a low or high sound frequency (2 center frequencies [300 Hz; 4 kHz] × 2 temporal modulation rates [3 Hz; 10 Hz]; ripple bandwidth = 1 octave; modulation depth = 0.6; duration = 1 s, with 50-ms onset and offset linearly ramped) and were instructed to press a button when they detected a short white noise burst (duration = 112 ms, with 6-ms onset and offset linearly ramped) in the ripple sounds. The noise burst occurred 500–800 ms after ripple onset and always started at a peak intensity of the ripple envelope. The intensity of the noise burst was individually calibrated to ensure equal detection difficulty across ripple center frequencies and equal performance across participants (see below for details). Participants received feedback about their performance by a change in the color of the fixation cross (from black to either green [correct] or red [incorrect]) upon button press. By manipulating the probability of noise burst occurrence, it was made advantageous to either attend low-frequency (300 Hz) or high-frequency (4 kHz) ripple sounds. Specifically, in the “Attend Low” condition, 70% of the ripples with a center frequency of 300 Hz contained a noise burst, whereas the noise burst was present in only 30% of the ripples with a center frequency of 4 kHz. In the “Attend High” condition, the percentage of ripples containing a noise burst was reversed (30% of the 300 Hz ripples and 70% of the 4 kHz ripples). The attentional condition was kept equal throughout a run of several minutes (see below for details), and runs were presented in a randomized order. At the start of each run, participants were informed regarding the sound frequency for which noise bursts were most common. All ripple sounds, with and without noise burst, were generated with MATLAB (The MathWorks, Inc.) and presented to the participants using the Psychophysics Toolbox Version 3 (Psychtoolbox-3).

Calibration of noise burst intensity

The intensity of the noise burst was calibrated using the method of constant stimuli. Specifically, ripples with noise burst intensities near and just above detection threshold (ranging from 0.0005 to 0.04 × intensity of the ripple in 11 steps) were generated and presented, along with ripple sounds without noise burst, in random order. Each of the 4 ripple sounds (2 center frequencies × 2 temporal modulation rates) was presented 12 times without a noise burst and 4 times with each of the 11 burst intensities, resulting in 56 trials per ripple. Participants were instructed to press a button if they heard a noise burst in the ripple sound. For each ripple center frequency, combined across the two temporal modulation rates (112 trials in total), burst intensity was plotted against percentage hit (i.e. the percentage of trials on which the noise burst was correctly detected) and this relationship was fitted with a sigmoidal function. This allowed deriving the intensity at which the noise burst was detected with an accuracy of 60%, separately for 300 Hz and 4 kHz ripple sounds.
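The calibration step can be illustrated with a minimal Python sketch. It is not the authors' MATLAB code: the logistic form of the sigmoid, the log10 intensity axis, the coarse grid-search fit, and all function names are assumptions made for illustration only.

```python
import math

def logistic(x, midpoint, slope):
    """Assumed psychometric function: hit probability at log10 relative intensity x."""
    return 1.0 / (1.0 + math.exp(-slope * (x - midpoint)))

def fit_logistic(log_intensities, hit_rates):
    """Coarse least-squares grid search over midpoint and slope (illustrative only)."""
    lo, hi = min(log_intensities), max(log_intensities)
    best = None
    for i in range(201):              # candidate midpoints across the tested range
        midpoint = lo + (hi - lo) * i / 200
        for j in range(1, 201):       # candidate slopes from 0.1 to 20
            slope = 0.1 * j
            sse = sum((logistic(x, midpoint, slope) - h) ** 2
                      for x, h in zip(log_intensities, hit_rates))
            if best is None or sse < best[0]:
                best = (sse, midpoint, slope)
    return best[1], best[2]

def intensity_at(p, midpoint, slope):
    """Invert the fitted logistic: relative intensity detected with probability p."""
    return 10 ** (midpoint + math.log(p / (1.0 - p)) / slope)
```

Under these assumptions, `intensity_at(0.6, *fit_logistic(levels, rates))` would give the relative burst intensity corresponding to 60% detection for one ripple center frequency.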

Behavioral data

Each participant completed a behavioral session that took place in a soundproof booth. In this session, we conducted the pure-tone audiogram and then calibrated the noise burst intensity. Next, each participant performed the noise burst detection task for the dual purpose of familiarizing the participants with the task before they entered the scanner and behaviorally validating the task. Specifically, participants were instructed to press a button whenever they heard a noise burst (whose intensity was set at 60% detection threshold) in the presented ripple sounds. Each of the 4 ripple sounds was presented 60 times per attentional condition, resulting in 240 trials per condition in total. These trials were divided into runs of 120 trials (i.e. 2 runs per condition). Intertrial interval was equal to 1.7 s, resulting in a run duration of ~4 min (and ~16 min to complete this part of the behavioral data collection).

We analyzed the hit rate and d’ of all trials as well as the reaction time of all trials that resulted in a hit (i.e. where the noise burst was correctly identified), with 3 separate 2-way repeated-measures analyses of variance (ANOVAs; with factors “Sound Frequency” [300 Hz, 4 kHz] and “Condition” [Attend Low, Attend High]). Significant interactions were further explored through a paired t-test per level of the factor “Condition.”
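For readers unfamiliar with d', the following Python sketch shows one common way to compute it from trial counts. The log-linear correction for extreme rates and the function name are assumptions; the text does not state how hit or false-alarm rates of 0 or 1 were handled.

```python
from statistics import NormalDist

def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate).

    A log-linear correction (add 0.5 to each count) keeps the rates away
    from 0 and 1; this correction is an assumption, not taken from the paper.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)
```

Here hits and misses would be counted over ripples containing a noise burst, and false alarms and correct rejections over ripples without one.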

MRI data

All measurements were performed on a 7 Tesla Siemens MAGNETOM scanner (Siemens Medical Solutions, Erlangen, Germany) using a single transmit 32-channel head coil (Nova Medical) at Scannexus (Maastricht, the Netherlands). Six of our 8 participants took part in an earlier study (Sitek et al. 2019) consisting of 3 MRI measurement sessions (referred to below as MRI sessions 1–3). For these participants, only MRI sessions 4 and 5 were collected for this study and were added to the existing data. For the remaining 2 participants, data for all 5 MRI sessions were collected for this study.

In the first session, T1-weighted (T1w) and proton density-weighted (PDw) data were collected at a voxel size of 0.7-mm isotropic. The T1w scan was acquired using a magnetization-prepared rapid gradient-echo (3-D MPRAGE) sequence (repetition time [TR] = 3100 ms; time to inversion [TI] = 1500 ms; echo time [TE] = 2.42 ms; flip angle = 5°; generalized autocalibrating partially parallel acquisition [GRAPPA] = 3; matrix size = 320 × 320; 256 slices). PDw images were acquired with the same 3-D MPRAGE as the T1w image, but without the inversion pulse (TR = 1380 ms; TE = 2.42 ms; flip angle = 5°; GRAPPA = 3; matrix size = 320 × 320; 256 slices; pixel bandwidth = 200 Hz/pixel). Dielectric pads were used to improve transmit efficiency in temporal areas when acquiring these anatomical images (Teeuwisse et al. 2012). Acquisition time for the T1w and PDw datasets was ~9.5 min each. In this first session, 2 additional anatomical datasets (a T2*-weighted dataset and a T1-weighted dataset with a short inversion time [Tourdias et al. 2014]) and diffusion-weighted MRI data were also collected; see Sitek et al. (2019) for acquisition parameters. These datasets were not used in this study.

In sessions 2–5, fMRI data were collected while participants listened to sounds. In sessions 2–3, 168 natural sounds (sound duration = 1 s) covering 7 semantic categories (speech, voice, nature, tools, music, animals, and monkey calls) were presented. Subjects performed a 1-back task on the sounds and pressed a button if consecutive sounds were repeated (trials with a repetition were excluded from the analysis). Following a rapid event-related design, sounds were presented in silent gaps in between fMRI acquisitions with an intertrial interval of 2, 3, or 4 TRs. Sessions 2 and 3 were identical to each other. The 168 sounds were divided into 4 nonoverlapping cross-validation sets of 42 sounds (each containing 6 samples per category). All sounds of a cross-validation set were presented once per run. As a session comprised 12 runs, each natural sound was repeated 3 times per session and 6 times throughout sessions 2–3 combined.

In sessions 4–5, participants listened to ripple sounds (sound duration = 1 s) with and without noise burst and performed the noise burst detection task. Noise burst intensity was set at 80%, as opposed to 60% for the behavioral data collected in the sound booth, as in pilot measurements it was observed that the task otherwise was too difficult to perform in the noisy scanner environment. The 80% noise burst detection threshold was determined by repeating a short version of the noise burst intensity calibration procedure inside the scanner. The procedure was kept the same as described above, except that each of the ripple sounds was presented 9 times (instead of 12) without a noise burst and repeated 3 times (instead of 4) with each of the 11 burst sound intensities resulting in 42 trials per ripple. The noise burst detection task was kept the same as in the behavioral part of the experiment as well. Participants had 1.7 s to respond to each sound. Upon button press, feedback was provided by a change in the color of the fixation cross (from black to green [correct] or red [incorrect]). Attentional condition was kept the same throughout a run, and attentional conditions were randomized across sessions 4 and 5.

In both attentional conditions, 14 low-frequency (center frequency of 300 Hz) and 14 high-frequency (center frequency of 4 kHz) ripples were presented per run. Sound order was equal across conditions. The only difference between conditions was the occurrence of noise bursts within the ripple sounds. In “Attend Low” runs, 10 out of 14 low-frequency ripples contained a noise burst (71.4%), whereas 4 out of 14 of the low-frequency ripples did not contain a noise burst (Fig. 1). For the ripples with a center frequency of 4 kHz, the noise burst was present in only 4 out of 14 ripples (28.6%) and absent in the remaining 10 out of 14 ripples. This occurrence of noise bursts was reversed for the “Attend High” condition.
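The per-run composition of ripple trials can be sketched in Python as follows. This is an illustration only: the function name, the seed-based shuffling, and the way bursts are assigned to individual ripples are assumptions, while the counts (14 ripples per center frequency, 10 versus 4 containing a burst) come from the text.

```python
import random

def build_run(condition, n_per_frequency=14, n_burst_attended=10, seed=0):
    """Return a list of (center_frequency_hz, has_noise_burst) ripple trials."""
    if condition not in ("Attend Low", "Attend High"):
        raise ValueError("condition must be 'Attend Low' or 'Attend High'")
    rng = random.Random(seed)
    # The frequency order depends only on the seed, so it is identical across conditions.
    order = [300] * n_per_frequency + [4000] * n_per_frequency
    rng.shuffle(order)
    attended = 300 if condition == "Attend Low" else 4000
    flags = {}
    for freq in (300, 4000):
        n_burst = n_burst_attended if freq == attended else n_per_frequency - n_burst_attended
        freq_flags = [True] * n_burst + [False] * (n_per_frequency - n_burst)
        rng.shuffle(freq_flags)  # which ripples carry the burst (assumed random)
        flags[freq] = freq_flags
    return [(freq, flags[freq].pop()) for freq in order]
```

Calling `build_run("Attend Low")` and `build_run("Attend High")` with the same seed yields the same sequence of center frequencies, with only the placement of noise bursts differing between conditions.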

Fig. 1.

Sound stimulation protocol. Sound stimulation in part of two runs where participants either attended low frequencies (left) or high frequencies (right). Ripple sounds (gray and black squares) were presented interspersed with natural sounds (white squares). Across attentional conditions, exactly the same sounds were presented in the same order. The only difference across conditions was that the target, a short white noise burst (shown as an asterisk), occurred more often in ripples with an attended center frequency (e.g. 300 Hz in the “Attend Low” condition) than in ripples with a nonattended center frequency (e.g. 4 kHz in the “Attend Low” condition).

In addition to the ripple sounds, natural sounds (sound duration = 1 s, with 10-ms onset and offset linearly ramped) were presented interspersed with the ripple sounds for the purpose of estimating pRFs per attentional condition (Fig. 1). Noise bursts did not occur in the natural sounds, and participants were informed accordingly. A total of 96 natural sounds was presented, comprising the following 6 sound categories: speech, voice, music, tools, animals, and nature sounds. The 96 natural sounds were divided into 4 nonoverlapping cross-validation sets of 24 sounds (each containing 4 samples per category). Half of the sounds of a cross-validation set were presented once per run, and the other half was presented twice (resulting in 36 natural sound trials/run). Across the 2 sessions, which comprised 16 runs (8 runs per session), each natural sound was repeated 3 times per attentional condition. Each run comprised 68 trials (28 ripples, 36 natural sounds, and 4 trials where no sound was presented). As all sounds were presented following the same rapid event related design as employed in sessions 2–3 (including the sound presentation during the silent gap between acquisitions), this resulted in a run duration just below 9 min. All sounds, both ripples and natural sounds, were sampled at 16 kHz. Sound energy was equalized using MATLAB. Additionally, before onset of the scans but with earbuds in place, the loudness of the 300 Hz and 4 kHz ripple sounds was adjusted for each participant to match the loudness of the ripples to each other and to the loudness of the natural sounds. Sounds were presented to the participants in the MRI scanner using the MRI-compatible S14 model earbuds of the Sensimetrics Corporation (www.sens.com).

Functional MRI data throughout sessions 2–5 were acquired with a 2-D Multi-Band Echo Planar Imaging (2D-MB EPI) sequence (Moeller et al. 2010; Setsompop et al. 2012) (TR = 2600 ms; silent gap = 1400 ms; TE = 20 ms; flip angle = 80°; GRAPPA = 3; Multi-Band = 2; matrix size = 188 × 188; 46 slices; 1.1 mm isotropic voxels; phase encode direction inferior to superior). Acquisitions with reversed-phase encode polarity were used for distortion correction. Slices were oriented coronally oblique to cover the complete ascending auditory pathway (including the auditory brainstem structures, auditory thalamus, and auditory cortex).

MRI data analysis

The anatomical data analysis started by taking the ratio between the T1w and PDw images to minimize receive coil inhomogeneities in the T1w images (Van de Moortele et al. 2009). The resulting dataset was corrected for residual inhomogeneities, upsampled to 0.5-mm isotropic resolution, and brought to Talairach space. The white matter (WM)—gray matter (GM) boundary and the GM-cerebrospinal fluid boundary were detected using the automatic tools of BrainVoyager QX and then manually corrected. The WM–GM boundary was used for the cortical surface reconstruction of individual hemispheres, which were used for defining the following regions of interest based on macroanatomy, following the criteria outlined in Kim et al. (2000): core (Heschl’s gyrus [HG]), belt (planum temporale and planum polare), and parabelt (superior temporal gyrus). Furthermore, separate for the left and right hemisphere, we brought each hemisphere across participants to cortex-based aligned (CBA; Goebel et al. 2006) space for the purpose of group analysis.

We used BrainVoyager QX, FSL, and custom MATLAB code (The MathWorks, Inc., Natick, MA, USA) to analyze the functional data. Preprocessing consisted of slice scan-time correction (with sinc interpolation), 3-dimensional motion correction, and temporal high-pass filtering (6 sines/cosines). FSL-FLIRT was used to align the functional data of all sessions to those collected in the first run of session 2, while employing a mask that included the brainstem, thalamus, and auditory cortex. The functional images were then distortion corrected using FSL-TOPUP based on the opposite phase encoding direction images collected in session 2. The functional data were then projected into Talairach space while resampling at a spatial resolution of 1 mm isotropic.

Responses to ripple sounds

We analyzed the functional data within a bilateral anatomical mask that covered the superior half of the temporal lobe, which includes the auditory cortex. A General Linear Model (GLM) analysis with a canonical hemodynamic response function (HRF) was used to estimate the overall BOLD response to the ripple and natural sounds in sessions 4–5, and thereby determine the auditory responsive regions in each participant. Next, we used a fixed-effects GLM analysis to evaluate the effect of selective attention on the BOLD responses to ripple sounds. In this second GLM analysis, we equalized the number of ripples with and without a noise burst across center frequencies within a run. That is, ripple sounds whose center frequencies matched the attended frequency were more often presented with noise burst (5 out of 7 ripples) than ripple sounds for which the center frequency did not match the attended frequency (2 out of 7 ripples). By excluding, per run, 3 ripples with(out) noise burst for ripples of the attended and nonattended frequency, respectively, we matched the number of presented noise bursts across ripple sounds. The effect of selective attention on the auditory cortical response to ripple sounds was first evaluated throughout the auditory cortex as a whole, by sampling the voxel timecourses to the reconstructed surfaces [spatial smoothing = 3 voxels], aligning them across participants in CBA space, and contrasting [attended ripple − nonattended ripple]. Second, we evaluated the effect of selective attention as a function of the difference between the attended frequency and the voxel’s best frequency [BF]. For this analysis, only voxels that showed a significant response to the sounds (false discovery rate [FDR]-corrected, q < 0.01) and a positive response (in percent signal change [PSC]) to ripple sounds of both 300 Hz and 4 kHz in sessions 4–5 were included. 
We computed the change in response to each ripple sound when that sound was attended compared with nonattended as:

[Equation: response suppression, computed from the β estimates for the attended and nonattended conditions]

where β is the estimated response to the sound in PSC. Response suppression thus measures the effect of attention on the sound response, relative to the overall strength of the response. Response suppression was computed separately for voxels whose BF (averaged across attentional conditions) ranged from 0 to 1.5 octaves from the attended frequency (in 6 linearly spaced bins). A linear regression analysis was used to test if response suppression (averaged across ripple sounds) varied with distance in octaves from the attended frequency (i.e. testing if the slope of the regression line was different from 0).

pRF mapping

Responses to the natural sounds were used to estimate pRFs (Dumoulin and Wandell 2008; Thomas et al. 2015), separately for sessions 2–3 (“Baseline”) and the 2 attentional conditions of sessions 4–5 (“Attend Low” and “Attend High”). The response to the individual natural sounds was estimated following the procedure outlined in previous work (Moerel et al. 2012, 2015; Sitek et al. 2019). In short, separate for each cross-validation, GLM-denoise was used to denoise the data (http://kendrickkay.net/GLMdenoise/) (Kay et al. 2013). We then used a deconvolution GLM (9 stick predictors) to estimate the HRF separately for each voxel but common to all sounds (Kay et al. 2008). The resulting HRF and noise regressors, estimated on the training data, were used to compute a response estimate (beta weight) to the training and testing sounds in each voxel.

Following the methodology employed in pRF modeling of visual cortex (Dumoulin and Wandell 2008; Kay et al. 2015), we used a 2-step procedure to estimate the voxel-wise pRF based on its response estimate. First we estimated the BF and selectivity per voxel, and in a second step we estimated voxel gain. Through Fourier transform, we extracted the representation of each sound in the frequency space (logarithmically ranging from 180 to 8000 Hz in 2048 bins; Supplementary Fig. S1A). The response to each sound was predicted by assuming a one dimensional Gaussian receptive field for each voxel, defined by the BF (mean μ of the Gaussian) and selectivity (size σ of the Gaussian). The possible parameter ranges of the 1-dimensional Gaussian were defined by creating a regular grid of 205 seeds in the frequency space (logarithmically ranging from 180 to 8000 Hz) and 10 seeds in the selectivity space (ranging from 0.27 to 2.67 octaves; Supplementary Fig. S1). For each point in this grid, a predicted sound response (in beta weights) was generated by multiplying the sound representation in frequency space with the frequency response curve defined by that grid point. For each voxel and grid point, we then computed a correlation coefficient between the observed and predicted beta responses.
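The grid search described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the Gaussian is defined on a log2-frequency axis with σ expressed in octaves, and the spacing of the selectivity seeds is left to the caller because the text does not specify it.

```python
import math

def log_spaced(start, stop, n):
    """n values from start to stop, evenly spaced on a log2 axis."""
    step = (math.log2(stop) - math.log2(start)) / (n - 1)
    return [2 ** (math.log2(start) + i * step) for i in range(n)]

def gaussian_curve(freqs_hz, bf_hz, sigma_octaves):
    """Gaussian frequency response curve centered on bf_hz (assumed log2 axis)."""
    return [math.exp(-0.5 * ((math.log2(f) - math.log2(bf_hz)) / sigma_octaves) ** 2)
            for f in freqs_hz]

def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def grid_search(spectra, betas, freqs_hz, bf_seeds, sigma_seeds):
    """Return (r, bf, sigma) for the grid point whose prediction best matches betas."""
    best = (-2.0, None, None)
    for bf in bf_seeds:
        for sigma in sigma_seeds:
            curve = gaussian_curve(freqs_hz, bf, sigma)
            # Predicted response: sound spectrum weighted by the response curve.
            predicted = [sum(s * c for s, c in zip(spec, curve)) for spec in spectra]
            r = pearson(predicted, betas)
            if r > best[0]:
                best = (r, bf, sigma)
    return best
```

With the parameters given in the text, `bf_seeds` would be `log_spaced(180, 8000, 205)` and `sigma_seeds` would hold 10 values between 0.27 and 2.67 octaves. This sketch selects the seed by maximum correlation, which is the traditional criterion rather than the permutation-based one used in the study.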

As we have shown before (Lage-Castellanos et al. 2020), the distribution of the resulting cost function under the null hypothesis varies with the pRF selectivity. As a result, the grid point generating the most correlated prediction with the observed data is not necessarily the least probable one to occur by chance. We therefore followed the Permutation Based Model Grid-Search for Separable Betas Design (PermGS) procedure, described in detail in (Lage-Castellanos et al. 2020). This is a refinement of the grid search pRF-estimation procedure where the selection criterion varies from the traditional cost function (correlation coefficient) to a criterion based on the probability of selecting a particular seed under the null (−log[P value]). While similar to the standard grid search algorithm, this variant prevents the bias toward high-selectivity pRFs (small pRF size) in voxels with low SNR. A total of 1000 permutations were implemented, where in each permutation the sound order (i.e. the vector of beta responses) was randomized. The distribution of the correlation coefficient between observed and predicted beta response was computed for each seed at each randomization. The best seed for the observed data was selected as the one with the lowest probability of occurrence under the null hypothesis across all the null distributions for each seed in the grid. As shown in (Lage-Castellanos et al. 2020), this change in the selection criterion does not modify the pRF parameters in voxels with a good SNR but increases the reliability of the estimated pRFs in voxels with a low SNR.
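The idea of selecting a seed by its probability under the null, rather than by its raw correlation, can be sketched as below. This is a simplified illustration and not the published PermGS procedure: the empirical per-seed P value, the tie-breaking rule, and all names are assumptions.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def permutation_grid_search(predictions, betas, n_permutations=1000, seed=0):
    """predictions maps a seed label to its list of predicted responses.

    Returns the label with the lowest empirical P value, plus all P values.
    """
    rng = random.Random(seed)
    observed = {k: pearson(p, betas) for k, p in predictions.items()}
    exceed = {k: 0 for k in predictions}
    shuffled = list(betas)
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # randomize the sound order of the observed betas
        for k, p in predictions.items():
            if pearson(p, shuffled) >= observed[k]:
                exceed[k] += 1
    p_values = {k: (exceed[k] + 1) / (n_permutations + 1) for k in predictions}
    # Assumed tie-break: among equal P values, prefer the higher observed correlation.
    best = min(predictions, key=lambda k: (p_values[k], -observed[k]))
    return best, p_values
```

Because each seed is compared against its own null distribution, a narrow pRF that reaches a high correlation by chance is not automatically preferred over a broader one.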

The pRF fitting procedure was performed per dataset (separate for Baseline, Attend Low, and Attend High). Each dataset was designed to contain 4 nonoverlapping sets of sounds presented in separate runs. Fitting was performed on training data in 4-fold cross-validation, where each training dataset consisted of 3 out of 4 sound sets. Prediction accuracy was assessed by correlating the predicted and observed voxels’ responses in independent testing data (i.e. the left out sound set). The final prediction accuracy was computed as the average accuracy on the testing data across the 4-folds.

After selecting the BF μ and selectivity σ per voxel, we estimated the voxel’s gain (in PSC) per attentional condition as the beta weight resulting from the linear regression between the observed and predicted voxel’s time series. In this regression analysis, the predicted voxel’s time series was based on the frequency content of the presented sound, the attentional condition, and the condition- and voxel-specific pRF estimate. Noise covariates (as estimated through GLM-denoise) were included as predictors in the regression analysis. Gain was computed in 4-fold cross validation.

The cortical maps of frequency preference and selectivity were created by color-coding each voxel according to the mean (BF) and size (tuning width [TW]) of the best-fitting Gaussian, averaged over cross-validations. A red–yellow–green–blue color scale was used to create BF maps, where preference for low and high frequencies was assigned to red and blue colors, respectively. A yellow–green–blue–purple color scale was used for the TW maps, where broad and narrow TW were assigned with yellow and purple colors, respectively. Maps were restricted to those voxels that showed a significant response to the sounds (FDR-corrected, q < 0.01) in sessions 4–5. Group maps were created by sampling the individual maps to CBA space, smoothing each map at individual participant level (full width at half maximum [FWHM] = 2.4 mm) and averaging the resulting maps at all locations for which data of at least 5 out of 8 participants was available. To evaluate the existence of a relation between maps of BF and TW, we computed the Pearson’s correlation between these maps in each individual participant and per attentional condition. Per attentional condition, a 1-sample t-test was used to test the statistical significance of the observed Pearson’s correlation between BF and TW maps across the 8 participants.

Evaluation of attention-induced pRF changes

In addition to limiting the analysis to those voxels with a significant response to the sounds, for all following analyses we furthermore limited the voxels to those that in the Baseline dataset (scanning sessions 2–3; averaged across the 4 cross-validations) showed a Pearson correlation between predicted and observed responses to testing sounds >0.18 (which corresponds to P < 0.01, determined through permutation testing). Importantly, the Attend Low and Attend High datasets (scanning sessions 4–5) were not used for voxel selection. We reasoned that if pRFs changed across attentional conditions, the prediction of responses to test sounds should be more accurate if based on data of the same condition (within-condition prediction accuracy) than on data of a different condition (across-condition prediction accuracy). To test for the presence of pRF changes with attention, we therefore computed 4 maps of prediction accuracy per participant, where pRFs were computed on either training dataset (“Attend Low” or “Attend High”) and evaluated on either testing dataset (“Attend Low” or “Attend High”). We then subtracted maps of across-condition prediction accuracy from maps of within-condition prediction accuracy (i.e. “Low-to-Low” − “High-to-Low,” and “High-to-High” − “Low-to-High”), projected these individual maps to the surface and brought them to CBA space, and averaged the maps across conditions and participants. Cluster-size permutation thresholding, performed separately for the left and right hemisphere, was used to determine the statistical significance of the activation clusters observed in the resulting group maps.
Specifically, by performing all possible sign inversions of the 8 participant maps (2^8 = 256 permutations), thresholding the maps (at 7 different thresholds that ranged from 0.03 to 0.09 in linearly spaced bins) and recording the maximum cluster sizes occurring for each permuted group map (using the SurfStat toolbox; https://www.math.mcgill.ca/keith/surfstat/), we constructed the null distribution of the maximum cluster size. The 95th percentile of this distribution was used as the cluster size threshold for determining significance of clusters in the observed data. This imposes a family wise error rate (FWER) control at the level of 0.05.
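The exhaustive sign-flip scheme can be illustrated with a short Python sketch. It does not use SurfStat: clusters are approximated as runs of consecutive supra-threshold values in a 1-D list, which is a simplifying assumption standing in for connected clusters on the cortical surface, and all names are hypothetical.

```python
from itertools import product

def max_cluster_size(values, threshold):
    """Longest run of consecutive values above threshold (1-D stand-in for surface clusters)."""
    best = run = 0
    for v in values:
        run = run + 1 if v > threshold else 0
        best = max(best, run)
    return best

def sign_flip_null(participant_maps, threshold):
    """Null distribution of the maximum cluster size over all 2^n sign inversions."""
    n = len(participant_maps)
    n_vertices = len(participant_maps[0])
    null = []
    for signs in product((1, -1), repeat=n):
        group = [sum(s * m[v] for s, m in zip(signs, participant_maps)) / n
                 for v in range(n_vertices)]
        null.append(max_cluster_size(group, threshold))
    return null

def percentile_95(values):
    """95th percentile by rank, using integer arithmetic for the ceiling."""
    ordered = sorted(values)
    index = -(-95 * len(ordered) // 100) - 1  # ceil(0.95 * n) - 1
    return ordered[index]
```

For 8 participants, `sign_flip_null` enumerates 256 permuted group maps. An observed cluster would then be compared against `percentile_95` of the returned null distribution at the same threshold.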

We visualized the distribution of BF, TW, and gain, separately for the core (HG), belt (planum temporale and planum polare), and parabelt (superior temporal gyrus), through histograms (10 linearly spaced bins). These histograms were computed per participant, and normalized for the total number of included voxels at individual subject level. Per region of interest, paired t-tests were used to test for differences in average BF, TW, and gain across attentional conditions.
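As an illustration of the two steps in this paragraph, the sketch below normalizes a 10-bin histogram by the number of included voxels and computes a paired t statistic across participants. The helper names and all numbers are hypothetical and serve only to show the computation.

```python
import numpy as np

def normalized_histogram(values, value_range, n_bins=10):
    """Histogram with linearly spaced bins, normalized by the number of voxels."""
    counts, edges = np.histogram(values, bins=n_bins, range=value_range)
    return counts / counts.sum(), edges

def paired_t(a, b):
    """Paired t statistic and degrees of freedom for two matched samples."""
    d = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    n = d.size
    t = d.mean() / (d.std(ddof=1) / np.sqrt(n))
    return t, n - 1

# Hypothetical TW values (octaves) for the voxels of one participant.
rng = np.random.default_rng(1)
tw_voxels = rng.uniform(0.5, 3.0, size=500)
proportions, edges = normalized_histogram(tw_voxels, value_range=(0.5, 3.0))

# Hypothetical per-participant average TW in two attentional conditions.
tw_attend_low = np.array([1.52, 1.61, 1.48, 1.70, 1.55, 1.63, 1.58, 1.49])
tw_attend_high = np.array([1.50, 1.64, 1.47, 1.68, 1.57, 1.60, 1.59, 1.51])
t_value, dof = paired_t(tw_attend_low, tw_attend_high)
print(f"t({dof}) = {t_value:.3f}")
```

Normalizing per participant keeps participants with many included voxels from dominating the group histogram.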

We also evaluated the effect of selective attention on TW and gain as a function of the difference between the attended frequency and the voxel’s BF. We computed the average TW and gain in bins relative to the attended frequency (ranging from 0 to 1.5 octaves in 6 linearly spaced bins), separately for when that frequency was attended and when it was not attended. Separately for TW and gain, a two-way repeated measures ANOVA per region of interest with factors “Attention” (2 levels: attended vs. nonattended) and “Distance to target frequency” (6 levels: 0–0.25 octaves, 0.25–0.5 octaves, 0.5–0.75 octaves, 0.75–1 octaves, 1–1.25 octaves, and 1.25–1.5 octaves) was used to test for differences with selective attention. Significant interactions were further explored through a paired t-test per level of the factor “Distance to target frequency”.
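The binning described above can be sketched as follows, assuming that distance is measured as the absolute log2 ratio between a voxel’s BF and the attended frequency. The function names and the example values are hypothetical.

```python
import numpy as np

def octave_distance(bf_hz, attended_hz):
    """Absolute distance in octaves between each BF and the attended frequency."""
    return np.abs(np.log2(np.asarray(bf_hz, dtype=float) / attended_hz))

def binned_mean(values, distance, edges):
    """Average of values per distance bin; NaN where a bin holds no voxels."""
    values = np.asarray(values, dtype=float)
    # np.digitize returns 1 for the first bin; shift to 0-based bin indices.
    idx = np.digitize(distance, edges) - 1
    means = np.full(len(edges) - 1, np.nan)
    for b in range(len(edges) - 1):
        in_bin = idx == b
        if in_bin.any():
            means[b] = values[in_bin].mean()
    return means

edges = np.linspace(0.0, 1.5, 7)  # 6 bins of 0.25 octaves each

# Hypothetical voxels: BF in Hz and TW in octaves, with 300 Hz attended.
bf = np.array([300.0, 330.0, 450.0, 650.0, 4000.0])
tw = np.array([1.0, 1.2, 1.4, 1.6, 2.0])
dist = octave_distance(bf, attended_hz=300.0)
tw_per_bin = binned_mean(tw, dist, edges)
print(tw_per_bin)
```

Voxels farther than 1.5 octaves from the attended frequency (here the 4 kHz voxel) fall outside all bins and are left out of the averages.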

Results

Selective attention sped up the reaction time to noise bursts in attended compared with unattended ripples (Fig. 2A and B). This effect was significant when probed outside the scanner (Fig. 2A; significant interaction “Sound Frequency” × “Condition”; F(1,6) = 11.73; P = 0.011). Follow-up tests per level of “Sound Frequency” showed a significantly faster response to noise bursts in 300 Hz ripples when attended (Bonferroni-corrected P = 0.002). The reaction time to noise bursts in 4 kHz ripple sounds did not significantly differ between attentional conditions. While inside the scanner (i.e. during MRI data collection) reaction times qualitatively followed the same pattern as in the soundproof booth (Fig. 2B), we did not observe a significant interaction between “Sound Frequency” and “Condition” nor any significant main effects.

Fig. 2.

Effect of attention on behavioral responses. Selective attention sped up the reaction time to noise bursts that occurred in ripples of the attended frequency. This effect was present when probed outside and inside the scanner (A–B, respectively), but only significant when tested outside the scanner. C–D) Hit rate increased with attention, both outside and inside the scanner (in the left and right panel, respectively). E–F) Attention did not affect detectability index d’, but we did observe an overall higher detectability for low compared with high-frequency ripples. This pattern was present both outside and inside the scanner (in the left and right panel, respectively). Throughout figure panels, error bars show the standard error across participants. Single and double asterisks indicate statistically significant differences between conditions at P < 0.05 and <0.01, respectively.

We also observed a facilitatory effect of selective attention on the hit rate (sound booth: significant interaction, F(1,6) = 32.78; P = 7.2 × 10^−4; significant difference between levels of “Condition” for 300 Hz sounds, Bonferroni-corrected P = 0.0063; inside the scanner: significant interaction, F(1,6) = 8.53; P = 0.022; significant difference between levels of “Condition” for 4 kHz sounds, Bonferroni-corrected P = 0.040; Fig. 2C and D). However, this was accompanied by an increase in the number of false alarms for attended ripples and was therefore likely due to a response bias. Indeed, we did not observe an effect of selective attention on d’ (Fig. 2E and F). There was, however, a main effect of “Sound Frequency” on d’ (sound booth: F(1,6) = 7.51; P = 0.029; inside the scanner: F(1,6) = 6.01; P = 0.044), indicating a higher detectability of noise bursts in ripples of 300 Hz compared with 4 kHz despite the noise burst intensity calibration.
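To make the response-bias argument concrete, the sketch below uses the standard signal detection definitions, d’ = z(hit rate) − z(false alarm rate) and criterion c = −0.5 × [z(hit rate) + z(false alarm rate)]. The hit and false alarm rates are hypothetical values chosen for illustration, not data from this study.

```python
from statistics import NormalDist

def d_prime(hit_rate, false_alarm_rate):
    """Sensitivity index: z(hit rate) - z(false alarm rate)."""
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(false_alarm_rate)

def criterion(hit_rate, false_alarm_rate):
    """Response criterion c; lower values mean a more liberal observer."""
    z = NormalDist().inv_cdf
    return -0.5 * (z(hit_rate) + z(false_alarm_rate))

# Hypothetical rates: attention raises hits and false alarms together.
unattended = {"hit": 0.69, "fa": 0.16}
attended = {"hit": 0.84, "fa": 0.31}

for name, rates in (("unattended", unattended), ("attended", attended)):
    dp = d_prime(rates["hit"], rates["fa"])
    c = criterion(rates["hit"], rates["fa"])
    print(f"{name}: d' = {dp:.3f}, criterion = {c:.3f}")
```

In this example the hit rate rises with attention while d’ stays the same and only the criterion shifts, which is the pattern expected from a response bias. Rates of exactly 0 or 1 must be adjusted before use, because the inverse normal is undefined there.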

We observed significant responses (FDR-corrected, q < 0.05) to the sounds throughout the auditory cortex. The responses to ripple sounds were weaker when these sounds were attended compared with unattended. This effect was observed throughout the auditory cortex (Fig. 3A), especially in the belt and parabelt regions of the right hemisphere. In order to further examine the weaker response to attended sounds, we assessed the voxels’ frequency preference (BF), frequency selectivity (TW), and gain through pRF mapping based on responses to the natural sounds. We estimated separate pRFs for the 3 conditions (“Baseline,” “Attend Low,” and “Attend High”). pRFs could be reliably estimated across individual participants and conditions (see Supplementary Fig. S2 for pRF fits in example voxels), showing a prediction accuracy in testing that was consistently above zero (see Fig. 4 for within-condition prediction accuracy and Supplementary Fig. S3 for across-condition prediction accuracy). When analyzing attention-induced suppression of responses to ripple sounds as a function of distance between the voxel’s BF and the attended sound frequency, we observed a stronger suppression in those voxels whose BF most closely matched the attended sound frequency. Response suppression for both ripple frequencies decreased with increasing distance between the voxels’ BF and the attended sound frequency (Fig. 3B). Statistical testing on the average response suppression across ripple frequencies showed a significant linear trend (P = 0.011).

Fig. 3.

Effect of selective attention on response to ripples. A) At group level, the response to attended ripple sounds was lower than when the same sounds were nonattended (the map shows [response to attended − response to nonattended sounds]). This effect could be observed throughout the auditory cortex (P < 0.05 uncorrected), but was strongest in belt and parabelt regions of the right hemisphere (FDR-corrected, q < 0.05 shown in black outlines). The white dashed line outlines HG. B) Response suppression (in %) to ripple sounds when attended compared with nonattended as a function of the distance (in octaves) between the voxel’s BF and the attended (i.e. target) frequency. Suppression of responses to ripple sounds at 300 Hz, 4 kHz, and their average is shown. Responses to attended ripples were suppressed compared with when the same ripple sounds were not attended, and this effect was stronger in voxels with a BF that closely matched the attended frequency. C) Modeled response suppression (%) to attended compared with nonattended ripple sounds as a function of the distance (in octaves) between the voxel’s BF and the attended (i.e. target) frequency. Modeled response suppression to ripple sounds at 300 Hz, 4 kHz, and their average is shown. Model responses to attended ripples in voxels with a BF that closely matched the attended frequency were weaker than when the same ripple sounds were not attended.

Fig. 4.

Prediction accuracy. Prediction accuracy was computed as the Pearson correlation coefficient between predicted and observed responses to test sounds, averaged across voxels and cross-validations, in each participant. All voxels with a significant response to sounds and a prediction accuracy >0.18 (corresponding to P < 0.01) in the “Baseline” dataset were included. Gray and black bars show results from the “Attend Low” and “Attend High” condition, respectively. Error bars represent the standard deviation across cross-validations.

To examine the processing changes that may have caused this weaker response to the attended ripple sounds, we next explored the voxels’ frequency preference, frequency selectivity, and gain, as well as changes in voxel tuning with attention. The BF maps (i.e. tonotopic maps) were in accordance with previous reports (Humphries et al. 2010; Da Costa et al. 2011; Striem-Amit et al. 2011; Langers and van Dijk 2012; Moerel et al. 2012). Prediction accuracy varied with BF, such that voxels with a BF around 2 kHz had the lowest prediction accuracy (Supplementary Fig. S3). The lower prediction accuracy in the middle frequency range likely resulted from its coincidence with the spectrum of the scanner noise. Maps of TW followed previous reports as well (Kajikawa 2004; Rauschecker and Tian 2004; Kusmierek and Rauschecker 2009; Moerel et al. 2012) and showed high correspondence across conditions (see Fig. 5 for maps of individual participant S01 and for the group maps). In accordance with previous findings (Cheung et al. 2001; Imaizumi et al. 2004; Moerel et al. 2012), BF (in Hz) was negatively correlated with TW (in octaves), such that a higher BF corresponded to more narrow tuning in the “Baseline” maps (mean [SE] correlation across subjects = −0.07 [0.02]; 1-sample t-test, t(7) = −3.16, P = 0.016). For maps in conditions “Attend Low” and “Attend High” we did not observe a significant correlation between BF and TW (mean [SE] correlation across subjects = −0.03 [0.02]; 1-sample t-test, t(7) = −1.40, P = 0.203 for “Attend Low,” and mean [SE] correlation across subjects = −0.04 [0.02]; 1-sample t-test, t(7) = −2.24, P = 0.060 for “Attend High”).

Fig. 5.

Maps of frequency preference and selectivity. Tonotopic maps (i.e. BF) and maps of TW per hemisphere, in a single participant (S01; left) and at group level (right). Maps showed high correspondence across conditions. The white dashed line outlines HG.

As a next step, we examined if pRFs changed across attentional conditions. We reasoned that if pRFs changed, the prediction accuracy when training and testing on natural sound responses originating from within the same attentional condition should be higher than if training and testing sounds originated from different attentional conditions. Indeed, group maps of cross-modal difference (computed as [within-condition accuracy − across-condition accuracy]) were overall positive, indicating higher within-condition prediction accuracy than across-condition prediction accuracy (Fig. 6A). Cluster-size permutation thresholding showed that the size of the positive clusters in the map (i.e. where within-condition accuracy > across-condition accuracy) across a range of map thresholds were greater than expected by chance (Fig. 6B). This was not the case for the negative clusters in the map (i.e. where within-condition accuracy < across-condition accuracy; Fig. 6B). The same results were observed when analyzing the cross-modal differences separately per condition (i.e. when model training was either performed on data of the Attend Low condition or on data of the Attend High condition; Supplementary Fig. S4). These results confirm that in parts of the auditory cortex the within-condition prediction accuracy was indeed greater than the across-condition accuracy, indicating the presence of pRF changes across attentional conditions.
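The within- versus across-condition comparison can be sketched as follows. This is an illustrative reconstruction under simplifying assumptions: predictions are taken as given rather than derived from fitted pRFs, the data are synthetic, and the function names are hypothetical.

```python
import numpy as np

def prediction_accuracy(predicted, observed):
    """Pearson correlation per voxel; inputs have shape (n_sounds, n_voxels)."""
    p = predicted - predicted.mean(axis=0)
    o = observed - observed.mean(axis=0)
    return (p * o).sum(axis=0) / np.sqrt((p ** 2).sum(axis=0) * (o ** 2).sum(axis=0))

def within_minus_across(pred, obs):
    """Average of (Low-to-Low - High-to-Low) and (High-to-High - Low-to-High).

    pred[c]: predictions from pRFs trained on condition c.
    obs[c]:  observed responses to test sounds in condition c.
    """
    low = prediction_accuracy(pred["low"], obs["low"]) - prediction_accuracy(pred["high"], obs["low"])
    high = prediction_accuracy(pred["high"], obs["high"]) - prediction_accuracy(pred["low"], obs["high"])
    return (low + high) / 2

# Synthetic data in which tuning differs between the two conditions.
rng = np.random.default_rng(2)
n_sounds, n_voxels = 50, 20
pred = {c: rng.normal(size=(n_sounds, n_voxels)) for c in ("low", "high")}
obs = {c: pred[c] + 0.5 * rng.normal(size=(n_sounds, n_voxels)) for c in ("low", "high")}

difference_map = within_minus_across(pred, obs)
print(f"mean within-minus-across difference: {difference_map.mean():.3f}")
```

If tuning did not change between conditions, the two predictions would be interchangeable and the difference map would scatter around zero; positive values indicate condition-specific pRFs.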

Fig. 6.

Within- compared with across-condition prediction accuracy. A) The cross modal difference (computed as [within-condition − across-condition prediction accuracy]) maps show positive values, indicating that prediction accuracy was higher when probed within- compared with across-attentional conditions. The white dashed line outlines HG. Maps are cluster size corrected at a FWER = 0.05. B) Across hemispheres, the largest positive cluster size observed was greater than expected by chance. The largest negative cluster size observed did not exceed the cluster size expected by chance. This was true independent of the “cross modal difference” threshold. Note that for higher “cross modal difference” thresholds, no negative clusters were present in the maps and correspondingly no line is shown at these thresholds.

To examine pRF changes with attention, we created BF, TW, and gain histograms across conditions, separately for the core, belt, and parabelt (see Supplementary Fig. S5 for regions of interest) for all pRFs in these regions. No major differences between conditions in BF, TW, or gain were apparent (Fig. 7A–C). Indeed, when comparing the pRF parameters across the 2 attentional conditions, we did not observe a significant difference in any of the regions of interest (core: 2-sided paired t-test on BF: t(7) = −0.134, P = 0.897; 2-sided paired t-test on TW: t(7) = 1.046, P = 0.331; 2-sided paired t-test on gain: t(7) = −0.260, P = 0.802; belt: 2-sided paired t-test on BF: t(7) = −0.795, P = 0.453; 2-sided paired t-test on TW: t(7) = −1.128, P = 0.297; 2-sided paired t-test on gain: t(7) = −0.163, P = 0.875; parabelt: 2-sided paired t-test on BF: t(7) = 0.747, P = 0.479; 2-sided paired t-test on TW: t(7) = −0.584, P = 0.578; 2-sided paired t-test on gain: t(7) = −0.315, P = 0.762).

Fig. 7.

Distribution of frequency preference, frequency selectivity, and gain. Distribution of frequency preference (BF, in kHz), frequency selectivity (TW, in octaves), and gain (in PSC) are shown from top to bottom, across core, belt, and parabelt shown from left to right. In A, the attended frequencies are indicated by black vertical lines. No differences between “Attend Low” (gray line) and “Attend High” (black line) were apparent.

Following our observation that the largest changes in ripple responses occurred in voxels whose BF was close to the attended frequency, we then explored TW and gain changes as a function of voxel BF. This analysis showed a TW change with attention in the auditory parabelt. Specifically, while no significant attention-induced TW differences were observed in the core (2-way repeated measures ANOVA with factors “Attention” and “Distance to target frequency”; no significant interaction or main effect of “Attention”; significant main effect of “Distance to target frequency”, F(1,5) = 35.426; P = 9.57 × 10^−13) or belt (2-way repeated measures ANOVA with factors “Attention” and “Distance to target frequency”; no significant interaction or main effect of “Attention”; significant main effect of “Distance to target frequency”, F(1,5) = 41.591; P = 9.23 × 10^−14), TW narrowed in parabelt voxels with attention (Fig. 8A). This effect was not driven by a sampling bias, as TW narrowing in the auditory parabelt was observed both when the target frequency was 300 Hz and when it was 4 kHz (Supplementary Fig. S6). The effect of attention on TW was stronger in voxels with a BF close to the attended frequency (significant interaction between factors “Attention” and “Distance to target frequency”; F(1,5) = 3.522; P = 0.011), specifically in voxels with a BF up to 0.5 octaves distance from the attended frequency (follow-up paired t-tests per level of factor “Distance to target frequency”; significant difference in bin 0–0.25 octaves before but not after multiple comparison correction [t = −2.606; uncorrected P = 0.018; corrected P = 0.105]; significant difference in bin 0.25–0.5 octaves [t = −3.669; uncorrected P = 0.004; corrected P = 0.024]; no trends toward significant differences in the other bins).
No significant attention-induced gain changes were observed in the core (2-way repeated measures ANOVA with factors “Attention” and “Distance to target frequency”; no significant interaction or main effect of “Attention”; significant main effect of “Distance to target frequency”, F(1,5) = 3.414; P = 0.013), belt (2-way repeated measures ANOVA with factors “Attention” and “Distance to target frequency”; no significant interaction or main effect of “Attention”; significant main effect of “Distance to target frequency”, F(1,5) = 13.192; P = 2.99 × 10^−7), or parabelt (2-way repeated measures ANOVA with factors “Attention” and “Distance to target frequency”; no significant interaction, main effect of “Attention” or “Distance to target frequency”; Fig. 8B).

Fig. 8.

Effect of attention on frequency selectivity and gain. A) TW as a function of distance (in octaves) between BF and the attended target frequency. In the parabelt, TW was more narrow when the BF was attended (dashed black line) compared with when it was not attended (solid black line). This effect was strongest in those voxels whose BF was closest to the attended sound frequency. B) Gain as a function of distance (in octaves) between BF and the attended target frequency. No attention-dependent differences in gain were observed.

As a final step, we used computational modeling to explore how the pRF changes observed in our data related to the observed fMRI responses to ripple sounds. To this end, we modeled the auditory cortex as a set of voxels whose response was fully characterized by their frequency preference and selectivity. BF was set to the frequency preference (averaged over attentional conditions) observed in the left hemisphere of a representative participant (S01 reported in Fig. 7A). While BF was the same across attentional conditions, TW was modeled to sharpen with attention in voxels with a BF close to the attended one. TW parameters across attentional conditions were set to match the results reported in Fig. 8A. Gain was kept the same across voxels, as no attention-dependent effect on gain was observed. Responses to ripple sounds per attentional condition were modeled as the multiplication of their sound spectrum with the condition-specific pRF of each voxel and normalized between 1% and 2% signal change to match the response strength observed in our dataset. We then explored TW changes as a function of the difference between the attended frequency and the voxel’s BF following the same procedure as used on our data. That is, we computed response suppression with attention in voxel bins defined relative to the target frequency (ranging from 0 to 1.5 octave in 6 linearly spaced bins). In accordance with our data (reported in Fig. 3B), modeled responses to attended ripples in voxels with a BF that closely matched the attended frequency were weaker than when the same ripple sounds were not attended (Fig. 3C). While model output suggested that TW narrowing could account for ~1.5% suppression in responses to ripple sounds, our data showed up to 5% response suppression (compare panels B and C in Fig. 3). This indicates that the observed weaker response to attended ripple sounds in our dataset could have been at least in part driven by pRF narrowing.
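The core idea of this model, that a narrower pRF at fixed gain yields a weaker response to a sound with finite bandwidth, can be illustrated with a minimal sketch. Several choices here are assumptions made only for illustration and are not taken from the study: TW is treated as the full width at half maximum of a Gaussian on a log2-frequency axis, the ripple is approximated by a flat 1-octave band, the TW values are arbitrary, and the normalization to 1%–2% signal change is omitted.

```python
import numpy as np

def gaussian_prf(log2_freq, bf_hz, tw_octaves, gain=1.0):
    """Gaussian pRF on a log2-frequency axis.

    tw_octaves is treated as full width at half maximum (assumption).
    """
    sigma = tw_octaves / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    return gain * np.exp(-0.5 * ((log2_freq - np.log2(bf_hz)) / sigma) ** 2)

def response(spectrum, prf):
    """Modeled response: sound spectrum multiplied by the pRF, summed over frequency."""
    return float(np.sum(spectrum * prf))

# Frequency axis from 100 Hz to 8 kHz in log2 units.
log2_freq = np.linspace(np.log2(100.0), np.log2(8000.0), 512)

# Hypothetical ripple: flat band, 1 octave wide, centered on 300 Hz.
ripple = (np.abs(log2_freq - np.log2(300.0)) <= 0.5).astype(float)

# Voxel with BF at the attended frequency: TW sharpens with attention, gain is fixed.
unattended = response(ripple, gaussian_prf(log2_freq, 300.0, tw_octaves=1.0))
attended = response(ripple, gaussian_prf(log2_freq, 300.0, tw_octaves=0.9))
suppression_pct = 100.0 * (unattended - attended) / unattended
print(f"modeled suppression: {suppression_pct:.2f}%")
```

Because the peak height is fixed, a narrower Gaussian is nowhere larger than the broader one, so the summed response to the band can only decrease; the size of the decrease depends on the assumed bandwidth and TW values.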

Discussion

Our selective attention task resulted in a faster response to targets in attended compared with nonattended sounds outside the scanner. While reaction times qualitatively followed the same pattern, the effect of attention on behavior was not statistically significant inside the scanner. Possibly the level of task compliance (i.e. attentional engagement) varied over time while participants were inside the scanner, weakening the overall effect of attention on behavior. It would be of interest to evaluate this in future research. However, as behavioral results on average followed the expected pattern, we likely tapped into the same mechanism inside and outside the scanner.

Task performance was accompanied by sharpening of frequency tuning in voxels whose frequency preference closely matched the attended sound frequency, resulting in a lower response in auditory cortex to attended sounds. This observation was surprising, as previous fMRI studies of human auditory cortex reported increased BOLD responses, interpreted as increased gain, with attention (Paltoglou et al. 2009; Da Costa et al. 2013; Dick et al. 2017; Riecke et al. 2017, 2018). However, our findings are in line with results of previous animal electrophysiology studies reporting attention-induced receptive field changes (Fritz et al. 2003, 2007; Lee and Middlebrooks 2011) and reduced responses (David et al. 2012). Our results refute increased gain at voxel level as an exclusive mechanism of selective attention in the auditory cortex, and instead support the notion that the effect of attention on auditory cortex may be described as an adaptive filter optimized to improve discriminability between relevant and irrelevant sounds (Fritz et al. 2007).

Conceptualizing the effect of attention on auditory cortex as an adaptive filter could reconcile our results with previous fMRI studies. While gain modulation could be optimal for the subset of stimuli and task settings employed by previous fMRI studies, we may have tapped into a combination of experimental settings that instead required the sharpening of frequency tuning. A main difference of our study compared with previous work is the experimental design. Previous fMRI studies simultaneously presented the target (attended) and reference (nonattended) sounds in a scene, whereas our stimulus presentation was sequential (i.e. consecutive presentation of target and reference sounds). While simultaneous presentation requires enhancing attended sounds compared with background noise, such a strategy may not be needed when only 1 sound stream is present. In sequential presentation, inhibiting responses to irrelevant stimuli (as operationalized through increased frequency selectivity) may be more beneficial than increasing responses to relevant ones. Additionally, stimulus properties may have contributed to the attentional mechanism that we observed. Indeed, increased selectivity with attention has been observed in neuroimaging studies across sensory modalities (Murray and Wojciulik 2004; Ahveninen et al. 2011; Kikuchi et al. 2019), but only when attention was directed to features processed in largely overlapping neuronal populations (Kikuchi et al. 2019). While the frequency content of the sounds competing for attention in our study should have driven different neuronal populations in the tonotopic map in early auditory cortex where neuronal (and voxel) tuning is rather simple, in higher order auditory cortex (i.e. the parabelt) attended and nonattended acoustic features may have fallen within a neuron’s broad and complex tuning profile.
As a result, the ripple sounds of the competing attentional conditions may have driven overlapping neuronal populations, making simplification of neuronal tuning through sharpening the computationally favorable attentional mechanism.

Based on responses of a simple computational model of auditory cortex, we interpreted the weaker response to attended ripple sounds in our dataset as at least in part driven by pRF narrowing. Model output suggested that TW narrowing could account for ~1.5% suppression in responses to ripple sounds. Instead, our data showed up to 5% response suppression. We currently entertain 2 possible explanations for this discrepancy. First, the simplicity of and assumptions behind our modeling approach could be responsible. We characterized the voxel’s pRF by a 1-dimensional Gaussian function. This represents a simplification of voxel tuning, as a 1-dimensional Gaussian cannot characterize complex, multi-peaked frequency preferences that are known to be present in primary and especially higher-order auditory regions (Kadia 2002; Sadagopan and Wang 2009; Moerel et al. 2013; Kikuchi et al. 2014). It is possible that we failed to pick up attention-induced changes beyond the main frequency preference, and that these changes account for the part of the observed response reduction that currently remains unexplained. The employment of fMRI encoding techniques (Moerel et al. 2013; Santoro et al. 2014) as opposed to pRF mapping would have allowed assessment of complex, multi-peaked voxel tuning. However, compared with fMRI encoding, our current approach is preferable for reliably assessing attention-induced changes in the voxel’s pRF (Lage-Castellanos et al. 2020). Second, attentional changes could be weaker on the natural sounds than on the ripples, because task instructions clearly specified that targets could only occur in ripple sounds. Thus, participants may have partly “disengaged” attention when listening to a natural sound. As a result, the pRF sharpening as estimated based on natural sounds may underestimate the pRF sharpening present when participants process a ripple sound.

In addition to the attention-induced narrowing of frequency tuning, we observed an overall (i.e. attention-independent) sharpening of TW with increasing distance between the voxel’s preferred frequency and the attended frequency. This could reflect a true neurobiological finding, i.e. a condition-independent narrowing of TW for the middle frequency range. Alternatively, the sharpening of TW could be an artifact of the pRF estimation. The accuracy of pRF estimation was lowest for voxels in the middle frequency range (~2 kHz; Supplementary Fig. S3B). The lower prediction accuracy in the 2 kHz range is in accordance with our previous observations (Santoro et al. 2017) and likely results from the fMRI noise. That is, the BOLD response may be partially saturated in those neuronal populations whose frequency preference matches the spectrum of the scanner noise. pRF estimation is biased toward selecting more narrowly tuned Gaussian profiles in cases of low SNR (Lage-Castellanos et al. 2020). While the pRF estimation approach was optimized to address this bias, we cannot exclude that it was still partially present for voxels with the lowest SNR. As the affected frequency region is far away from the attended regions, we do not expect this to affect our main conclusions.

In conclusion, our results argue for the existence of diverse attention-induced effects in auditory cortex that may depend on task, stimulus, and setting. These observations call for future work that directly compares tasks with regard to their influence on sensory representation throughout auditory cortex. It would furthermore be of interest to explore the influence of the stimulus characteristics embedding the attended feature and the effect of switching features (e.g. attending specific spectral or temporal modulations instead of a specific frequency). Furthermore, studying the relationship between which feature is attended and how stably this feature is encoded throughout cortical depth (O’Connell et al. 2014; De Martino et al. 2015) may shed light on the computational relevance of the cortical depth-dependent organization of the auditory cortex (Moerel et al. 2018, 2019). The approach presented here can be followed for such future endeavors.

Supplementary Material

SupplementaryMaterial_Moerel_R1_bhac427

Acknowledgements

The authors declare no financial or non-financial competing interests. MM, FDM, and GG designed the research. MM, FDM, and OFG performed the research. MM and ALC analyzed the data. All authors contributed to writing the paper.

Contributor Information

Agustin Lage-Castellanos, Department of Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Maastricht University, 6200 MD, Maastricht, The Netherlands; Maastricht Brain Imaging Center (MBIC), 6200 MD, Maastricht, The Netherlands; Department of NeuroInformatics, Cuban Neuroscience Center, Havana City 11600, Cuba.

Federico De Martino, Department of Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Maastricht University, 6200 MD, Maastricht, The Netherlands; Maastricht Brain Imaging Center (MBIC), 6200 MD, Maastricht, The Netherlands; Center for Magnetic Resonance Research, Department of Radiology, University of Minnesota, Minneapolis, MN 55455, United States.

Geoffrey M Ghose, Center for Magnetic Resonance Research, Department of Radiology, University of Minnesota, Minneapolis, MN 55455, United States.

Omer Faruk Gulban, Brain Innovation B.V, 6229 EV, Maastricht, The Netherlands.

Michelle Moerel, Department of Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Maastricht University, 6200 MD, Maastricht, The Netherlands; Maastricht Brain Imaging Center (MBIC), 6200 MD, Maastricht, The Netherlands; Maastricht Centre for Systems Biology, Maastricht University, 6200 MD, Maastricht, The Netherlands.

Funding

This work was supported by the Dutch Research Council (NWO; grant 451-15-012 to MM and grant 864-13-012 to FDM) and the European Research Council (ERC) under the European Union’s Horizon 2020 Research and Innovation Programme (grant ERC-CoG-2020-101001270 to FDM).

Conflict of interest statement: None declared.

References

  1. Ahveninen J, Hamalainen M, Jaaskelainen IP, Ahlfors SP, Huang S, Lin F-H, Raij T, Sams M, Vasios CE, Belliveau JW. Attention-driven auditory cortex short-term plasticity helps segregate relevant sounds from noise. Proc Natl Acad Sci. 2011:108(10):4182–4187. [DOI] [PMC free article] [PubMed] [Google Scholar]
  2. Atiani S, Elhilali M, David SV, Fritz JB, Shamma SA. Task difficulty and performance induce diverse adaptive patterns in gain and shape of primary auditory cortical receptive fields. Neuron. 2009:61(3):467–480. [DOI] [PMC free article] [PubMed] [Google Scholar]
  3. Atiani S, David S, Elgueda D, Locastro M, Radtke-Schuller S, Shamma S, Fritz J. Emergent selectivity for task-relevant stimuli in higher-order auditory cortex. Neuron. 2014:82(2):486–499. [DOI] [PMC free article] [PubMed] [Google Scholar]
  4. Carrasco M. Visual attention: the past 25 years. Vis Res. 2011:51(13):1484–1525. [DOI] [PMC free article] [PubMed] [Google Scholar]
  5. Cheung SW, Bedenbaugh PH, Nagarajan SS, Schreiner CE. Functional organization of squirrel monkey primary auditory cortex: responses to pure tones. J Neurophysiol. 2001:85(4):1732–1749. [DOI] [PubMed] [Google Scholar]
  6. Da Costa S, Zwaag W, Marques JP, Frackowiak RSJ, Clarke S, Saenz M. Human primary auditory cortex follows the shape of Heschl’s gyrus. J Neurosci. 2011:31(40):14067–14075. [DOI] [PMC free article] [PubMed] [Google Scholar]
  7. Da Costa S, Zwaag W, Miller LM, Clarke S, Saenz M. Tuning in to sound: frequency-selective attentional filter in human primary auditory cortex. J Neurosci. 2013:33(5):1858–1863. [DOI] [PMC free article] [PubMed] [Google Scholar]
  8. David SV, Fritz JB, Shamma SA. Task reward structure shapes rapid receptive field plasticity in auditory cortex. Proc Natl Acad Sci. 2012:109(6):2144–2149. [DOI] [PMC free article] [PubMed] [Google Scholar]
  9. De Martino F, Moerel M, Ugurbil K, Goebel R, Yacoub E, Formisano E. Frequency preference and attention effects across cortical depths in the human primary auditory cortex. Proc Natl Acad Sci. 2015:112(52):16036–16041. [DOI] [PMC free article] [PubMed] [Google Scholar]
  10. Dick FK, Lehet MI, Callaghan MF, Keller TA, Sereno MI, Holt LL. Extensive tonotopic mapping across auditory cortex is recapitulated by spectrally directed attention and systematically related to cortical myeloarchitecture. J Neurosci. 2017:37(50):12187–12201. [DOI] [PMC free article] [PubMed] [Google Scholar]
  11. Dumoulin SO, Wandell BA. Population receptive field estimates in human visual cortex. NeuroImage. 2008:39(2):647–660. [DOI] [PMC free article] [PubMed] [Google Scholar]
  12. Fritz JB. Differential dynamic plasticity of A1 receptive fields during multiple spectral tasks. J Neurosci. 2005:25(33):7623–7635. [DOI] [PMC free article] [PubMed] [Google Scholar]
  13. Fritz J, Shamma S, Elhilali M, Klein D. Rapid task-related plasticity of spectrotemporal receptive fields in primary auditory cortex. Nat Neurosci. 2003:6(11):1216–1223. [DOI] [PubMed] [Google Scholar]
  14. Fritz J, Elhilali M, David S, Shamma S. Auditory attention--focusing the searchlight on sound. Curr Opin Neurobiol. 2007:17(4):437–455. [DOI] [PubMed] [Google Scholar]
  15. Goebel R, Esposito F, Formisano E. Analysis of functional image analysis contest (FIAC) data with brainvoyager QX: from single-subject to cortically aligned group general linear model analysis and self-organizing group independent component analysis. Hum Brain Mapp. 2006:27(5):392–401. [DOI] [PMC free article] [PubMed] [Google Scholar]
  16. Hopf J, Boelmans K, Schoenfeld M, Luck S, Heinze H. Attention to features precedes attention to locations in visual search: evidence from electromagnetic brain responses in humans. J Neurosci. 2004:24(8):1822–1832.
  17. Humphries C, Liebenthal E, Binder JR. Tonotopic organization of human auditory cortex. NeuroImage. 2010:50(3):1202–1211.
  18. Imaizumi K, Priebe NJ, Crum PAC, Bedenbaugh PH, Cheung SW, Schreiner CE. Modular functional organization of cat anterior auditory field. J Neurophysiol. 2004:92(1):444–457.
  19. Kadia SC. Spectral integration in A1 of awake primates: neurons with single- and multipeaked tuning characteristics. J Neurophysiol. 2002:89(3):1603–1622.
  20. Kajikawa Y. A comparison of neuron response properties in areas A1 and CM of the marmoset monkey auditory cortex: tones and broadband noise. J Neurophysiol. 2004:93(1):22–34.
  21. Kay KN, David SV, Prenger RJ, Hansen KA, Gallant JL. Modeling low-frequency fluctuation and hemodynamic response timecourse in event-related fMRI. Hum Brain Mapp. 2008:29(2):142–156.
  22. Kay KN, Rokem A, Winawer J, Dougherty RF, Wandell BA. GLMdenoise: a fast, automated technique for denoising task-based fMRI data. Front Neurosci. 2013:7(247):1–15.
  23. Kay K, Weiner K, Grill-Spector K. Attention reduces spatial uncertainty in human ventral temporal cortex. Curr Biol. 2015:25(5):595–600.
  24. Kikuchi Y, Horwitz B, Mishkin M, Rauschecker J. Processing of harmonics in the lateral belt of macaque auditory cortex. Front Neurosci. 2014:8(204):1–13.
  25. Kikuchi Y, Ip J, Lagier G, Mossom JC, Kumar S, Petkov CI, Barraclough NE, Vuong QC. Interactions between conscious and subconscious signals: selective attention under feature-based competition increases neural selectivity during brain adaptation. J Neurosci. 2019:39(28):5506–5516.
  26. Kim JJ, Crespo-Facorro B, Andreasen NC, O’Leary DS, Zhang B, Harris G, Magnotta VA. An MRI-based parcellation method for the temporal lobe. NeuroImage. 2000:11(4):271–288.
  27. Kusmierek P, Rauschecker JP. Functional specialization of medial auditory belt cortex in the alert rhesus monkey. J Neurophysiol. 2009:102(3):1606–1622.
  28. Lage-Castellanos A, Valente G, Senden M, De Martino F. Investigating the reliability of population receptive field size estimates using fMRI. Front Neurosci. 2020:14(825):1–17.
  29. Lakatos P, Musacchia G, O’Connell MN, Falchier AY, Javitt DC, Schroeder CE. The spectrotemporal filter mechanism of auditory selective attention. Neuron. 2013:77(4):750–761.
  30. Langers DRM, van Dijk P. Mapping the tonotopic organization in human auditory cortex with minimally salient acoustic stimulation. Cereb Cortex. 2012:22(9):2024–2038.
  31. Lee C, Middlebrooks J. Auditory cortex spatial sensitivity sharpens during task performance. Nat Neurosci. 2011:14(1):108–116.
  32. Martinez-Trujillo J, Treue S. Feature-based attention increases the selectivity of population responses in primate visual cortex. Curr Biol. 2004:14(9):744–751.
  33. Maunsell J, Treue S. Feature-based attention in visual cortex. Trends Neurosci. 2006:29(6):317–322.
  34. Moeller S, Yacoub E, Olman CA, Auerbach E, Strupp J, Harel N, Uğurbil K. Multiband multislice GE-EPI at 7 tesla, with 16-fold acceleration using partial parallel imaging with application to high spatial and temporal whole-brain fMRI. Magn Reson Med. 2010:63(5):1144–1153.
  35. Moerel M, De Martino F, Formisano E. Processing of natural sounds in human auditory cortex: tonotopy, spectral tuning, and relation to voice sensitivity. J Neurosci. 2012:32(41):14205–14216.
  36. Moerel M, De Martino F, Santoro R, Ugurbil K, Goebel R, Yacoub E, Formisano E. Processing of natural sounds: characterization of multipeak spectral tuning in human auditory cortex. J Neurosci. 2013:33(29):11888–11898.
  37. Moerel M, De Martino F, Ugurbil K, Yacoub E, Formisano E. Processing of frequency and location in human subcortical auditory structures. Sci Rep. 2015:5(1):1–15.
  38. Moerel M, De Martino F, Uğurbil K, Formisano E, Yacoub E. Evaluating the columnar stability of acoustic processing in the human auditory cortex. J Neurosci. 2018:38(36):7822–7832.
  39. Moerel M, De Martino F, Uğurbil K, Yacoub E, Formisano E. Processing complexity increases in superficial layers of human primary auditory cortex. Sci Rep. 2019:9(1):1–9.
  40. Murray SO, Wojciulik E. Attention increases neural selectivity in the human lateral occipital complex. Nat Neurosci. 2004:7(1):70–74.
  41. O’Connell MN, Barczak A, Schroeder CE, Lakatos P. Layer specific sharpening of frequency tuning by selective attention in primary auditory cortex. J Neurosci. 2014:34(49):16496–16508.
  42. Paltoglou AE, Sumner CJ, Hall DA. Examining the role of frequency specificity in the enhancement and suppression of human cortical activity by auditory selective attention. Hear Res. 2009:257(1-2):106–118.
  43. Rauschecker JP, Tian B. Processing of band-passed noise in the lateral auditory belt cortex of the rhesus monkey. J Neurophysiol. 2004:91(6):2578–2589.
  44. Riecke L, Peters JC, Valente G, Kemper VG, Formisano E, Sorger B. Frequency-selective attention in auditory scenes recruits frequency representations throughout human superior temporal cortex. Cereb Cortex. 2017:27(5):3002–3014.
  45. Riecke L, Peters JC, Valente G, Poser BA, Kemper VG, Formisano E, Sorger B. Frequency-specific attentional modulation in human primary auditory cortex and midbrain. NeuroImage. 2018:174:274–287.
  46. Sadagopan S, Wang X. Nonlinear spectrotemporal interactions underlying selectivity for complex sounds in auditory cortex. J Neurosci. 2009:29(36):11192–11202.
  47. Sadil P, Cowell RA, Huber DE. A modeling framework for determining modulation of neural-level tuning from non-invasive human fMRI data. 2022: bioRxiv. 2021.03.04.433362.
  48. Saenz M, Buracas G, Boynton G. Global effects of feature-based attention in human visual cortex. Nat Neurosci. 2002:5(7):631–632.
  49. Santoro R, Moerel M, De Martino F, Goebel R, Ugurbil K, Yacoub E, Formisano E. Encoding of natural sounds at multiple spectral and temporal resolutions in the human auditory cortex. PLoS Comput Biol. 2014:10(1):e1003412.
  50. Santoro R, Moerel M, De Martino F, Valente G, Ugurbil K, Yacoub E, Formisano E. Reconstructing the spectrotemporal modulations of real-life sounds from fMRI response patterns. Proc Natl Acad Sci U S A. 2017:114(18):4799–4804.
  51. Schweisfurth M, Schweizer R, Treue S. Feature-based attentional modulation of orientation perception in somatosensation. Front Hum Neurosci. 2014:8(519):1–8.
  52. Setsompop K, Gagoski BA, Polimeni JR, Witzel T, Wedeen VJ, Wald LL. Blipped-controlled aliasing in parallel imaging for simultaneous multislice echo planar imaging with reduced g-factor penalty. Magn Reson Med. 2012:67(5):1210–1224.
  53. Sitek K, Gulban O, Calabrese E, Johnson G, Lage-Castellanos A, Moerel M, Ghosh S, De Martino F. Mapping the human subcortical auditory system using histology, postmortem MRI and in vivo MRI at 7T. eLife. 2019:8:e48932.
  54. Striem-Amit E, Hertz U, Amedi A. Extensive cochleotopic mapping of human auditory cortical fields obtained with phase-encoding fMRI. PLoS One. 2011:6(3):e17832.
  55. Teeuwisse W, Brink W, Webb A. Quantitative assessment of the effects of high-permittivity pads in 7 Tesla MRI of the brain. Magn Reson Med. 2012:67(5):1285–1293.
  56. Thomas JM, Huber E, Stecker GC, Boynton GM, Saenz M, Fine I. Population receptive field estimates of human auditory cortex. NeuroImage. 2015:105:428–439.
  57. Tootell RBH, Hadjikhani N, Hall EK, Marrett S, Vanduffel W, Vaughan JT, Dale AM. The retinotopy of visual spatial attention. Neuron. 1998:21(6):1409–1422.
  58. Tourdias T, Levesque IR, Su J, Rutt BK, Saranathan M. Visualization of intra-thalamic nuclei with optimized white-matter-nulled MPRAGE at 7T. NeuroImage. 2014:84:534–545.
  59. Van de Moortele P-F, Auerbach EJ, Olman C, Yacoub E, Uğurbil K, Moeller S. T1 weighted brain images at 7 Tesla unbiased for proton density, T2* contrast and RF coil receive B1 sensitivity with simultaneous vessel visualization. NeuroImage. 2009:46(2):432–446.
  60. Warren S, Yacoub E, Ghose G. Featural and temporal attention selectively enhance task-appropriate representations in human primary visual cortex. Nat Commun. 2014:5(1):1–12.

Associated Data

Supplementary Materials

SupplementaryMaterial_Moerel_R1_bhac427

Articles from Cerebral Cortex (New York, NY) are provided here courtesy of Oxford University Press