Abstract
Despite the multidimensional and temporally fleeting nature of auditory signals, we quickly learn to assign novel sounds to behaviorally relevant categories. The neural systems underlying the learning and representation of novel auditory categories are far from understood. Current models argue for a rigid specialization of hierarchically organized core regions that are fine-tuned to extracting and mapping relevant auditory dimensions to meaningful categories. Scaffolded within a dual-learning systems approach, we test a competing hypothesis: the spatial and temporal dynamics of emerging auditory-category representations are not driven by the underlying dimensions but are constrained by category structure and learning strategies. To test these competing models, we used functional magnetic resonance imaging (fMRI) to assess representational dynamics during the feedback-based acquisition of novel non-speech auditory categories with identical dimensions but differing category structures: rule-based (RB) categories, hypothesized to involve an explicit sound-to-rule mapping network, and information-integration (II) categories, involving pre-decisional integration of dimensions via a procedural-based sound-to-reward mapping network. Adults were assigned to either the RB (n = 30, 19 females) or II (n = 30, 22 females) learning task. Despite similar behavioral learning accuracies, learning strategies derived from computational modeling and the involvement of corticostriatal systems during feedback processing differed across tasks. Spatiotemporal multivariate representational similarity analysis revealed an emerging representation within an auditory sensory-motor pathway exclusively for the II learning task, prominently involving the superior temporal gyrus (STG), inferior frontal gyrus (IFG), and posterior precentral gyrus.
In contrast, the RB learning task yielded distributed neural representations within regions involved in cognitive-control and attentional processes that emerged at different time points of learning. Our results unequivocally demonstrate that auditory learners’ neural systems are highly flexible and show distinct spatial and temporal patterns that are not dimension-specific but reflect underlying category structures and learning strategies.
Keywords: auditory category learning, category structure, neural representation, spatiotemporal dynamics, computational modeling, MVPA
Introduction
Learning to map continuous acoustic information to meaningful, behaviorally relevant auditory categories is critical to making sense of complex soundscapes. Behaviorally relevant sounds like speech and music contain multiple acoustic dimensions that are perceived differentially depending on listening experience. Listeners need to extract relevant dimensions from continuously varying sound inputs and map the extracted structure to behaviorally relevant equivalence classes, or categories (Diehl et al., 2004). The successful acquisition of auditory category mapping requires our brain to efficiently reorganize to encode category-relevant acoustic signals for the formation of task-relevant neural representations and categorization decisions (Ashby and Maddox, 2005; Golestani and Zatorre, 2004; Ohl and Scheich, 2005; Schultz et al., 1998; Tricomi et al., 2006). Despite decades of behavioral, computational, and neuroimaging research examining auditory categorization, the neural dynamics underlying different phases of the category learning process are poorly understood. Our objective here is to examine the block-by-block spatiotemporal patterns in neural representations of two types of auditory category structures as participants learn to categorize sounds with feedback. The two category structures are composed of identical dimensions: a complex rule-based (RB) category structure, hypothesized to involve effortful, hypothesis-driven sound-to-rule mapping (Ashby et al., 1998), and an information-integration (II) category structure, hypothesized to involve pre-decisional, procedural-based integration of dimensions (Ashby et al., 1998; Ashby and Gott, 1988). Categories with II structure are ubiquitous in the auditory world; they are not optimally learned by rules because the category structure is composed of acoustic dimensions with incommensurable units that must be integrated prior to any decision process.
For example, some speech and auditory category structures require pre-decisional integration of different dimensions that is difficult to verbalize and can be learned without explicit attention (Lim et al., 2019; Yi et al., 2016). In contrast, for the RB structure, listeners can apply complex rules to disambiguate sound categories for successful learning (e.g., a physician listening to auscultation sounds and making rule-based decisions).
One view of category learning posits a two-stage hierarchical model that does not differentiate between the learning of different category structures (Freedman et al., 2003; Jiang et al., 2007; Jiang et al., 2018). Specifically, current two-stage models involve a perceptual learning phase in the sensory cortex (e.g., auditory cortex), followed by a higher-level learning phase in the prefrontal cortex, which can learn to identify, discriminate, or categorize stimuli. Per the two-stage model, auditory category learning involves increased selectivity in the representation of relevant auditory features within the auditory cortex (e.g., sharpening neuronal populations in auditory areas to form a task-independent reduction of the sensory input), which in turn serves as the input to prefrontal regions that implement categorical decisions (Jiang et al., 2018; Myers et al., 2009). The prefrontal cortex is not considered the site of category sound representation per se but plays an important role in facilitating the formation of representations in the auditory cortex during learning (Myers, 2014). The frontal-temporal regions are structurally and functionally connected via dorsal and ventral auditory streams (Kelly et al., 2010; Sheppard et al., 2012; Wong et al., 2011), providing the substrate for general sound-to-category mapping.
In contrast to the two-stage hierarchical model, the Dual Learning System (DLS) model posits that distinct corticostriatal learning systems enable different forms of auditory category learning (Chandrasekaran et al., 2014a; Chandrasekaran et al., 2014b). The neurobiological mechanisms proposed in the DLS model are derived from theoretical frameworks in vision (Ashby et al., 1998; Ashby and Maddox, 2005; Helie et al., 2015; Seger, 2008; Seger and Miller, 2010). Emerging literature has been examining multiple or dual learning systems across various domains, including automaticity (Ashby and Crossley, 2012; Ashby and Maddox, 2005, 2011), visual category learning (Carpenter et al., 2016; Nomura et al., 2007; Nomura and Reber, 2012; Seger, 2008; Seger and Miller, 2010) and speech category learning (Feng et al., 2019; Myers, 2014; Myers and Swan, 2012; Perrachione et al., 2011; Yi et al., 2016). In the auditory domain (Chandrasekaran et al., 2014a; Chandrasekaran et al., 2014b), the DLS model proposes two distinct and parallel corticostriatal circuits that underlie category learning: an explicit sound-to-rule stream that involves a cortico-thalamic-basal ganglia network composed of the prefrontal cortex (PFC), hippocampus, thalamus, head of caudate nucleus, and anterior cingulate cortex (ACC), and a procedural-based sound-to-reward stream that involves the auditory-motor network in coordination with the reward-based striatal circuitry. Per the DLS model, during sound-to-rule mapping, a rule is generated within the auditory network and maintained in the prefrontal cortex (PFC)-hippocampus-thalamus circuit. This thalamocortical circuit receives inhibitory projections from the globus pallidus (GP), which in turn receives inhibitory projections from the caudate. 
Excitation of the caudate by the PFC results in excitation of the PFC by the thalamus, providing a neural mechanism through which the pre-existing rule can be maintained or updated with external trial-by-trial corrective feedback signals. In contrast, the sound-to-reward system implicitly maps sounds to behavioral responses via the reward-sensitive striatal circuitry. This is achieved via many-to-one corticostriatal projections from the superior temporal cortex to the putamen. A closed-loop projection back to the STG allows the basal ganglia to fine-tune STG function. The sound-to-reward system is not consciously penetrable; it associates perception with motor actions that lead to reinforcement via feedback. Mechanistically, during sound-to-reward mapping, a single medium-spiny neuron in the putamen implicitly associates an abstract motor response with a cluster of sensory cells. Corticostriatal synaptic learning is facilitated by a reinforcement signal (e.g., corrective feedback) from the ventral striatum (e.g., nucleus accumbens, NAC). Both systems require supervised feedback signals and involvement of the reward-related striatal system to successfully map sounds onto categories. However, the DLS model predicts that the two systems utilize reward signals differently depending on the category structure and learning strategy (e.g., Yi and Chandrasekaran, 2016) and therefore involve different corticostriatal systems. When the auditory category structure is discernable by rules, the sound-to-rule system dominates learning; however, when the category structure requires implicit information integration across multiple dimensions, the sound-to-reward system is more optimal and may take over learning to ensure high categorization accuracy.
Here we generated novel non-speech auditory category structures with the same underlying dimensions: spectral and temporal modulation. These dimensions are the building blocks of complex signals like speech and music and are robustly represented in the auditory cortex (Schonwiesner and Zatorre, 2009). We used these dimensions to create two category structures: a rule-based (RB) category structure that allows categorization using an explicit decisional criterion on the spectral modulation dimension (high vs. low) and another on the temporal modulation dimension (fast vs. slow), and an information-integration (II) category structure created by rotating the separable RB structure by 45° counterclockwise. Optimal learning of the II categories requires pre-decisional integration of the two dimensions to determine categories. Using non-speech auditory categories allows us to control for the learners' prior auditory experiences with speech and music. Critically, it provides a flexible way to manipulate the category structures using the same dimensions while keeping behavioral learning performance comparable across the two structures, overcoming the limitation of unbalanced performance or stimulus design in previous studies. We recruited 60 participants who were randomly assigned to either the II (n = 30) or RB (n = 30) group in a between-subjects design. All participants in a group were actively trained on the same structure using a feedback-based sound-to-category training paradigm during fMRI scanning. We examined the temporal dynamics in neural representations of category-relevant perceptual information over a single session of categorization training (i.e., six blocks, 240 trials in total). Prior studies demonstrate that this initial session is critical, with individual differences in its learning success relating to overall long-term learning success (Reetzke et al., 2018).
We focused on examining group differences in sound-induced multivoxel pattern representations and feedback-related brain activations across training blocks (i.e., group-by-block interaction effect). Per the two-stage perceptual hierarchy model, irrespective of category structure (RB vs. II), we expect neural representations to emerge within superior temporal gyrus (STG) and inferior frontal gyrus (IFG) ensembles at different time-points in learning. In contrast, per the DLS model, we predict that representations are task and category-structure contingent such that distinct emerging patterns in neural representations would be observed within the auditory-motor circuitry for II category structures and within the distributed rule-based circuitry for the RB category structure.
Materials and Methods
Participants.
Young adult native Mandarin speakers (n = 60; mean age: 21.2 years; SD = 2.1; age range: 18-28 years; all right-handed) with no hearing problems (based on a self-report questionnaire) were recruited from the South China Normal University community. In addition to standard Mandarin, some participants spoke regional dialects to varying degrees. Participants were excluded if they self-reported any major psychiatric conditions, neurological disorders, hearing disorders, head trauma, or use of psychoactive drugs or psychotropic medication. All participants had normal or corrected-to-normal vision and minimal formal music training experience (< 1 year). Participants were randomly assigned to either the RB or II group (n = 30 each). Post hoc analyses confirmed that the two groups were highly similar in age (t(58) = 0.249, P = 0.805), gender (II: 22 females; RB: 19 females; Chi-square = 0.693, P = 0.405), and years of musical training (t(58) = 1.644, P = 0.106). The participants received monetary compensation for their participation. The recruitment, consent, testing, and compensation procedures were approved by the South China Normal University Institutional Review Board and The Joint Chinese University of Hong Kong – New Territories East Cluster Clinical Research Ethics Committee (The Joint CUHK-NTEC CREC).
Experimental design and Stimuli.
The experimental procedure and design are consistent with previous studies (Feng et al., 2019; Yi and Chandrasekaran, 2016) except for the sound stimuli. For the current study, forty sounds per category structure were generated by modulating a white-noise stimulus (duration = 0.5 sec; digital sampling rate = 44.1 kHz; low-pass filtered at 4.8 kHz). Spectral modulation frequency ranged from 0.25 to 1.33 cyc/oct. Temporal modulation frequency ranged from 4 to 10 Hz (Fig. 1A). These modulation frequencies were selected because they are strongly represented in the human auditory cortex (Schonwiesner and Zatorre, 2009), and speech intelligibility is significantly degraded if these modulation components are removed from the auditory signal (Elliott and Theunissen, 2009). The modulation depth was 30 dB. All sounds were RMS-amplitude normalized to 80 dB. To create the RB category structure, 40 coordinates (10 per category) were first generated in an abstract normalized two-dimensional space (minimum value of 0, maximum value of 1) from four bivariate normal distributions centered at (0.33, 0.33), (0.33, 0.68), (0.68, 0.33), and (0.68, 0.68), each with a standard deviation of 0.1 on both dimensions. Values along each dimension were logarithmically mapped onto spectral and temporal modulation frequencies. Thus, for the RB category structure, the optimal decision boundaries reflect the placement of a decision criterion along the spectral modulation dimension at 0.58 cyc/oct and along the temporal modulation dimension at 6.32 Hz (Fig. 1A, left panel, dashed lines). The II category structure was created by rotating the RB structure by 45° counterclockwise. Thus, the optimal decision boundaries for the II category structure are not easy to describe verbally and rely on both spectral and temporal modulation dimensions (Fig. 1A, right panel, dashed lines).
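The stimulus-space construction described above can be sketched as follows. The random seed and the choice of rotation center (the midpoint of the normalized space) are assumptions not stated in the text, and the logarithmic mapping shown is one plausible implementation of the mapping described:

```python
import numpy as np

rng = np.random.default_rng(0)  # seed is arbitrary (assumption)

# 40 coordinates (10 per category) from four bivariate normal distributions
# in the normalized [0, 1] x [0, 1] space, SD = 0.1 on both dimensions.
centers = [(0.33, 0.33), (0.33, 0.68), (0.68, 0.33), (0.68, 0.68)]
rb = np.vstack([rng.normal(loc=c, scale=0.1, size=(10, 2)) for c in centers])

# II structure: rotate the RB structure 45 degrees counterclockwise.
# Rotating about the midpoint of the space is an assumption.
theta = np.deg2rad(45)
rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])
ii = (rb - 0.5) @ rot.T + 0.5

def to_physical(x, lo, hi):
    """Logarithmically map a normalized value in [0, 1] onto [lo, hi]."""
    return lo * (hi / lo) ** x

spectral = to_physical(rb[:, 0], 0.25, 1.33)  # cyc/oct
temporal = to_physical(rb[:, 1], 4.0, 10.0)   # Hz
```

Because rotation is distance-preserving, the II structure keeps exactly the within- and between-stimulus distances of the RB structure, which is what makes the two structures comparable.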
Fig. 1.
Auditory dimensions, category structures, training paradigm, and behavioral learning performance. A, two category structures and spectrograms of sample stimuli. Left panel: forty sounds of the RB structure. Four categories (C1 - 4) of sounds are plotted in different shapes and colors in a two-dimensional perceptual space (i.e., spectral and temporal modulation dimensions). Dashed lines represent optimal decision boundaries between the four categories. Right panel: forty sounds of the II structure. Four categories (C1 - 4) of sounds are plotted in the same perceptual space. B, the feedback-based sound-to-category training procedure used in the fMRI experiment. TR = repetition time (2.5 sec; acquisition time = 1.7 sec). Each sound (duration = 0.5 sec) is presented within the silence gap (0.8 sec) of a TR. Each training trial consisted of three TRs. C, categorization accuracies increased over training blocks comparably for both groups. No significant difference was found between groups across blocks.
Sound-to-Category Training Procedures.
Sounds were presented via MRI-compatible circumaural headphones in an MR scanner. Visual stimuli (e.g., feedback) were presented using an in-scanner projector visible via a mirror attached to the head coil. Participants were equipped with a 2-button response box in each hand. The experiment consisted of 6 contiguous scan runs or "training blocks." Before each training block, participants were instructed to attend to the fixation cross on the screen (10 sec). On each trial, a sound was presented during the 800-ms silence gap following the 1700-ms image acquisition (Fig. 1B; also see Imaging acquisition section for the customized sparse-sampling imaging parameters). Participants were instructed to categorize the sounds into one of four categories. Informational corrective feedback (e.g., "正确, 这是类别4" ["CORRECT, that was a 4."] or "错误, 这是类别4" ["WRONG, that was a 4."]) was displayed for 750 ms after each response. If the participant failed to respond within a two-sec window, cautionary feedback was presented (i.e., "没反应" ["NO RESPONSE"]). After feedback presentation, a fixation cross was displayed until the onset of the next imaging acquisition. To maximize the accuracy in estimating stimulus- and feedback-evoked blood-oxygen-level-dependent (BOLD) responses, the stimulus-to-feedback interval was drawn from a uniform distribution between 2 and 4 sec. The jitter after feedback presentation was drawn from a distribution spanning 0.45 to 5 sec. Each trial lasted three TRs (7.5 sec in total). Null trials (i.e., silence trials; n = 10 per block) with a fixation cross were randomly inserted between sound trials to jitter the inter-trial intervals for better estimation of single-trial activations. Each stimulus was presented once within a block. The presentation order of the stimuli was randomized for all participants and blocks. Participants completed a short practice of the finger-button mapping before the fMRI training session.
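The two jittered intervals per trial can be sketched as below. Treating the post-feedback jitter as uniform is an assumption, since the text specifies only its range; the constraint that each trial totals exactly three TRs is likewise not modeled here:

```python
import numpy as np

rng = np.random.default_rng()

def trial_intervals():
    """Draw the two jittered intervals for one training trial (in seconds).
    Stimulus-to-feedback is uniform on [2, 4] per the text; sampling the
    post-feedback jitter uniformly on [0.45, 5] is an assumption, as only
    the range is specified."""
    stim_to_feedback = rng.uniform(2.0, 4.0)
    post_feedback = rng.uniform(0.45, 5.0)
    return stim_to_feedback, post_feedback
```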
Imaging acquisition.
MRI data were acquired using a Siemens 3T Tim Trio MRI system with a 12-channel head coil in the Brain Imaging Center at South China Normal University. Functional images were acquired using a sparse-sampling T2*-weighted gradient echo-planar imaging (EPI) pulse sequence [repetition time (TR) = 2,500 ms with an 800-ms silence gap, TE = 30 ms, flip angle = 90°, 31 slices, field of view = 224 × 224 mm², in-plane resolution = 3.5 × 3.5 mm², slice thickness = 3.5 mm with 1.1-mm gap]. T1-weighted high-resolution structural images were acquired with a magnetization-prepared rapid acquisition gradient echo sequence (176 slices, TR = 1,900 ms, TE = 2.53 ms, flip angle = 9°, voxel size = 1 × 1 × 1 mm³).
Statistical analyses
Behavioral data modeling.
Participants’ trial-by-trial accuracy and reaction times were analyzed using linear mixed-effects (LME) regression to assess the main effects of group and training block as well as their interaction. The fixed effects of interest were group (II = 0 and RB = 1) and training block (1-6; mean-centered to 0). Analysis of variance on the LME models was conducted to assess the statistical significance of the fixed factors and their interaction.
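A minimal sketch of the trial-level LME, assuming a long-format table with illustrative column names (`accuracy`, `group`, `block`, `subject`) and a by-subject random intercept; the paper does not specify the random-effects structure, so that choice is an assumption:

```python
# Assumes a long-format DataFrame with one row per observation; the column
# names used here are illustrative, not taken from the paper.
import pandas as pd
import statsmodels.formula.api as smf

def fit_lme(df: pd.DataFrame):
    """Fit accuracy ~ group * block (block mean-centered, as in the text)
    with a by-subject random intercept (assumed random-effects structure)."""
    df = df.copy()
    df["block_c"] = df["block"] - df["block"].mean()  # mean-center block
    model = smf.mixedlm("accuracy ~ group * block_c",
                        data=df, groups=df["subject"])
    return model.fit()
```

The `group:block_c` coefficient in the fitted model corresponds to the group-by-block interaction of interest.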
In addition, we used a behavioral representational similarity analysis (bRSA) to examine the extent to which category-relevant perceptual information emerges over training blocks by modeling learners’ block-by-block categorization responses. We first converted each participant’s responses into a response confusion matrix (i.e., a 40 × 40 pairwise matrix; same response = 0 and different responses = 1) for each block. We then correlated a predefined “perceptual” representational dissimilarity matrix (RDM) with the response confusion matrices. The perceptual RDM was constructed by calculating the standardized Euclidean distance between each pair of sounds in the hypothesized perceptual space (see Fig. S1 for the construction procedure). To construct this space, we first generated the sound structures in normalized space. Each point in this space, representing a sound stimulus, was then mapped into the physical spectral and temporal modulation-frequency space. To calculate the perceptual distance between each pair of sounds (i.e., dij in Fig. S1) for the RDMs, we log-transformed the points on the physical scale. We then calculated the RDMs based on the standardized Euclidean distance between each pair of sounds in the hypothesized perceptual space (see the RDMs in Fig. 2A). Because the sounds from both structures are drawn from the same distribution in standardized space, the correlation between the II and RB RDMs is, as expected, nearly at ceiling (r = 0.97, P < 0.001). We used Spearman’s rank correlation to calculate the similarity between the perceptual RDMs and participants’ response confusion matrices (see Fig. S2A for the graphical analysis procedure). Higher similarity between the perceptual RDM and the response confusion matrix indicates a more robust representation of category-perceptual information.
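The bRSA pipeline can be sketched as follows. SciPy's `seuclidean` metric implements the standardized Euclidean distance, and correlating only the upper-triangular (off-diagonal) entries is a standard RSA convention assumed here:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.stats import spearmanr

def perceptual_rdm(points):
    """Pairwise standardized Euclidean distances between stimuli (40 x 40)."""
    return squareform(pdist(points, metric="seuclidean"))

def response_confusion(responses):
    """0 where two stimuli received the same category response, 1 otherwise."""
    r = np.asarray(responses)
    return (r[:, None] != r[None, :]).astype(float)

def brsa_score(rdm, confusion):
    """Spearman correlation over the upper-triangular entries of both matrices."""
    iu = np.triu_indices_from(rdm, k=1)
    rho, _ = spearmanr(rdm[iu], confusion[iu])
    return rho
```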
Fig. 2.
Multivariate spatiotemporal searchlight-based representational similarity analysis (RSA) procedure and putative temporal emerging patterns in neural representations. A, perceptual representational dissimilarity matrices (RDMs) for the II and RB structures. The RDMs were calculated based on the standardized Euclidean distance of each pair of sounds in a two-dimensional perceptual space. Within-category pairs are numbered and highlighted. PD = perceptual distance. B, the spatiotemporal searchlight-based RSA procedure. RSA was conducted across the whole brain by predicting the local neural RDMs with perceptual RDMs across training blocks for each group. Data from two consecutive blocks were combined (temporal searchlight time window = 2 blocks). C, conceptual diagrams of three hypothesized patterns in training-induced neural representations of category information. The neural representations are assumed to be formed and updated following training. Three patterns (decreasing, inverted-U, and increasing) are illustrated with line graphs. Red circles represent the most robust representations in a specific training phase.
Model-Based Analyses.
To obtain a more detailed description of how participants categorized the stimuli, a number of different decision-bound models (Ashby, 1992a; Maddox and Ashby, 1993) were fit separately to each learner's data for each block. Decision-bound models are derived from general recognition theory (Ashby and Townsend, 1986), a multivariate generalization of signal detection theory (Green and Swets, 1966). It is assumed that, on each trial, the percept can be represented as a point in a multidimensional psychological space and that each participant constructs a decision bound to partition the perceptual space into response regions. The participant determines which region the percept is in, and then makes the corresponding response. While this decision strategy is deterministic, decision-bound models predict probabilistic responding because of trial-by-trial perceptual and criterial noise (Ashby and Lee, 1993). Below we briefly describe the decision-bound models. For more details, see Ashby (1992a) or Maddox and Ashby (1993). The classification of these models as either rule-based or information-integration models is designed to reflect current theories of how these categories are learned (e.g., Ashby et al., 1998) and has received considerable empirical support (see Ashby and Valentin, 2017 for a review).
Rule-Based Models
Unidimensional Classifier (UC).
This model assumes that the stimulus space is partitioned into four regions by setting three criteria on one of the stimulus dimensions (see Fig. S3A for the visualization of the models). Two versions of the UC were used to fit these data: one assumes that participants attended selectively to spectral modulation, and the other assumes that participants attended selectively to temporal modulation. The UC has four free parameters: three corresponding to the decision criteria on the attended dimension and one corresponding to the variance of internal (perceptual and criterial) noise.
Conjunctive Classifier (CC).
A more appropriate rule-based strategy given the current category structures (Fig. 1A) is a conjunction rule involving separate decisions about the stimulus values on the two dimensions, with the response assignment based on the outcome of these two decisions (Ashby and Gott, 1988). All versions of the CC assume that the participant partitions the stimulus space into four regions (i.e., if low on temporal modulation and high on spectral modulation, respond 1; if high on temporal modulation and high on spectral modulation, respond 2; if low on temporal modulation and low on spectral modulation, respond 3; if high on temporal modulation and low on spectral modulation, respond 4; see Fig. S3A). The CC has three free parameters: the decision criteria on the two dimensions and a common value of internal noise for the two dimensions. A special case of the CC, the optimal rule-based classifier, assumes that participants use the CC that maximizes accuracy (i.e., the dashed boundary plotted with the RB structure in Fig. 1A). This special case has one free parameter (internal noise).
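A minimal sketch of the conjunctive rule's response mapping described above; parameter estimation is omitted, and the additive Gaussian noise term is a simplified stand-in for the model's internal (perceptual and criterial) noise:

```python
import numpy as np

def conjunctive_classify(spectral, temporal, c_spec, c_temp,
                         noise=0.0, rng=None):
    """Conjunctive-rule response for one stimulus.
    c_spec / c_temp are the decision criteria on each dimension; `noise`
    is a simplified stand-in for the model's internal noise."""
    if noise:
        rng = rng or np.random.default_rng()
        spectral = spectral + rng.normal(0.0, noise)
        temporal = temporal + rng.normal(0.0, noise)
    high_spec = spectral > c_spec
    high_temp = temporal > c_temp
    if high_spec:
        return 2 if high_temp else 1  # high spectral: 2 if fast, 1 if slow
    return 4 if high_temp else 3      # low spectral: 4 if fast, 3 if slow
```

With the optimal criteria from Fig. 1A (0.58 cyc/oct, 6.32 Hz) and noise = 0, this reproduces the four-region partition of the optimal rule-based classifier.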
Conjunctive* Classifier (CC*).
This class of models is similar to the CC with the exception that they assume two criteria on either temporal modulation or spectral modulation (see Fig. S3A). The first CC* model assumes that the temporal modulation dimension is partitioned into three regions and that a criterion on spectral modulation is used for stimuli intermediate in temporal modulation, resulting in the following rule: respond 1 if temporal modulation is low; respond 4 if temporal modulation is high; respond 2 if temporal modulation is intermediate and spectral modulation is high; respond 3 if temporal modulation is intermediate and spectral modulation is low. The second CC* model assumes that the spectral modulation dimension is partitioned into three regions and that a criterion on temporal modulation is used for stimuli intermediate in spectral modulation, resulting in the following rule: respond 3 if spectral modulation is low; respond 2 if spectral modulation is high; respond 4 if spectral modulation is intermediate and temporal modulation is high; respond 1 if spectral modulation is intermediate and temporal modulation is low. The CC* model has four free parameters (two criteria on temporal/spectral modulation, one criterion on spectral/temporal modulation, and one for internal noise).
Information-Integration Models
Linear Classifier (LC).
This model assumes that two linear decision boundaries partition the stimulus space into four regions (see the dashed boundary plotted with the information-integration structure in Fig. 1A for an example; also see Fig. S3A for hypothetical response patterns and decision bounds). The LC differs from the CC in that the LC does not assume decisional selective attention (Ashby and Townsend, 1986). This produces an information-integration decision strategy because it requires linear integration of the perceived values on the stimulus dimensions prior to invoking any decision processes. The LC assumes two linear decision bounds of opposite slope (five free parameters: the slope and intercept of each linear bound and a common value of internal noise). A special case of the LC, the optimal information-integration classifier, assumes that participants use the LC that maximizes accuracy (i.e., the dashed boundary plotted with the II structure in Fig. 1A). This special case has one free parameter (perceptual and criterial noise).
Minimum Distance Classifier (MDC).
This model assumes that there are four units (one associated with each category) representing a low-resolution map of the stimulus space (Ashby and Waldron, 1999; Ashby et al., 2001; Maddox et al., 2004). On each trial, the participant determines which unit is closest to the perceived stimulus and produces the associated response. Because the location of one of the units can be fixed, and because a uniform expansion or contraction of the space will not affect the location of the minimum-distance decision bounds, the MDC has six free parameters (five determining the location of the units and one for perceptual and criterial noise).
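The MDC's response rule can be sketched as follows (parameter estimation is omitted, and the row-to-category assignment of the units is arbitrary here):

```python
import numpy as np

def mdc_classify(stimulus, units):
    """Minimum-distance classifier: respond with the category of the unit
    nearest to the perceived stimulus. `units` is a (4, 2) array of unit
    locations, one row per category (labeled 1-4 by row)."""
    d = np.linalg.norm(np.asarray(units, dtype=float)
                       - np.asarray(stimulus, dtype=float), axis=1)
    return int(np.argmin(d)) + 1
```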
Random Responder Models
Equal Response Frequency (ERF).
This model assumes that participants randomly assign stimuli to the four categories in a manner that preserves the category base rates (i.e., 25% of the stimuli in each category). This model has no free parameters.
Biased Response Frequency (BRF).
This model assumes that participants randomly assign stimuli to the four categories in a manner that matches the participant’s categorization response frequencies. This model has three free parameters: the proportions of responses in categories 1, 2, and 3. Although the ERF and BRF are assumed to be consistent with guessing, these models would also likely provide the best account of data from participants who frequently shifted between very different strategies.
Model Fitting
The model parameters were estimated using maximum likelihood (Ashby, 1992b; Wickens, 1982), and the goodness-of-fit statistic was BIC = r lnN − 2lnL,
where N is the sample size, r is the number of free parameters, and L is the likelihood of the model given the data (Schwarz, 1978). The BIC statistic penalizes a model for poor fit and extra free parameters. To find the best model among a set of competitors, one simply computes a BIC value for each model, and then chooses the model with the smallest BIC.
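The BIC comparison described above can be sketched as follows, working from the negative log-likelihoods returned by a maximum-likelihood fit (the model names below are illustrative):

```python
import numpy as np

def bic(neg_log_likelihood, n_params, n_trials):
    """BIC = r*ln(N) - 2*ln(L), written in terms of -ln(L); smaller is better."""
    return n_params * np.log(n_trials) + 2.0 * neg_log_likelihood

def best_model(fits, n_trials):
    """fits maps model name -> (negative log-likelihood, number of free
    parameters); returns the winning model name and all BIC scores."""
    scores = {name: bic(nll, r, n_trials) for name, (nll, r) in fits.items()}
    return min(scores, key=scores.get), scores
```

Note how the penalty term r·ln(N) lets a three-parameter model beat a four-parameter model with only a modestly better likelihood.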
Neuroimaging data analyses
Univariate activation analysis.
All functional imaging data were preprocessed using SPM12 (Wellcome Department of Imaging Neuroscience, London, UK; www.fil.ion.ucl.ac.uk/spm/), closely following the pipeline described in previous studies (Feng et al., 2021a; Feng et al., 2018; Feng et al., 2019). Here, we provide brief descriptions of the processing procedure. The T2*-weighted functional images were corrected for head movement. The high-resolution T1-weighted image was co-registered to the mean functional image and then normalized to the Montreal Neurological Institute (MNI) space using a segmentation-normalization procedure for estimation of the normalization parameters. The realigned functional images were spatially smoothed using a Gaussian kernel (FWHM = 6 mm).
Voxel-wise univariate activation analysis was employed to examine brain activity induced by sound categorization and feedback processing, respectively, with the general linear model (GLM). For the subject-level analysis, a GLM with a design matrix including three regressors of interest (i.e., sound categorization, correct feedback, and incorrect feedback presentations) was constructed for each participant. The regressors corresponding to the onsets of trials were convolved with a canonical hemodynamic response function (without modeling temporal derivatives). Low-frequency drifts were removed using a temporal high-pass filter (cutoff at 128 sec). The AR(1) approach was used for autocorrelation correction. We identified outlier time points as volumes whose global mean intensity fell outside three standard deviations (SD) of the run mean or whose composite head movement exceeded 1 mm (overall mean percentage of outliers = 1.75 ± 1.05%; II learners = 1.93 ± 1.06%, RB learners = 1.57 ± 1.06%; two-sample t-test: t(58) = 1.32, P = 0.192). The outlier time points (1’s for outlier time points and 0’s for other time points), together with the six head-movement parameters and the session mean, were added to the GLM as nuisance regressors. The gray-matter image generated from the segmentation step was converted to a binary inclusive mask for each subject to define voxels of interest. For the group-level analysis, all reported brain regions were corrected at the cluster level (P = 0.05) with the family-wise error rate (FWER) approach as implemented in the SPM package.
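The outlier-flagging step can be sketched as below. The exact composite head-movement measure is not specified in the text, so the frame-to-frame translation displacement used here is one plausible definition, labeled as such:

```python
import numpy as np

def flag_outliers(global_means, motion_params, z_thresh=3.0, motion_thresh=1.0):
    """Boolean mask over time points: True where the volume's global mean
    intensity is > 3 SD from the run mean, or composite head movement > 1 mm.
    `motion_params` is (T, 6); the composite movement used here is the
    Euclidean norm of the frame-to-frame change in the three translation
    parameters (an assumed definition -- the paper does not specify one)."""
    g = np.asarray(global_means, dtype=float)
    z = np.abs(g - g.mean()) / g.std()
    trans = np.asarray(motion_params, dtype=float)[:, :3]
    disp = np.r_[0.0, np.linalg.norm(np.diff(trans, axis=0), axis=1)]
    return (z > z_thresh) | (disp > motion_thresh)
```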
ROI-based univariate-activation analyses were conducted in 12 pre-defined anatomical ROIs, including attention and cognitive-control related frontoparietal regions (i.e., the bilateral IFG and IPL), striatal areas (i.e., the bilateral putamen and caudate nucleus, related to reward and feedback processes), and sensorimotor regions (i.e., the bilateral STG and PreCG). These regions were selected because they have been proposed to be involved in category learning and the representation of newly acquired categories (Ashby and Maddox, 2005, 2011; Ashby and Valentin, 2017; Feng et al., 2018; Feng et al., 2019; Seger and Miller, 2010; Seger and Peterson, 2013; Yi et al., 2016). These ROIs were derived from the automated anatomical labeling (AAL) atlas (Rolls et al., 2015) to ensure the independence of ROI definition and data analysis. The univariate activation estimates for each block and subject were extracted and averaged across voxels within each ROI for further group-level analyses.
Spatiotemporal representational similarity analysis (stRSA).
To examine how multivoxel activation patterns represent category-related information during training, we conducted RSA with the searchlight approach (Kriegeskorte and Kievit, 2013; Kriegeskorte et al., 2008) along both the spatial and temporal dimensions. For the spatial dimension, we used a whole-brain searchlight to identify regions showing significant representations of category-perceptual information; for the temporal dimension, we applied the searchlight approach across training blocks with a searchlight time window of two blocks (see Fig. 2B for the analysis procedure) to identify the temporally emerging patterns of neural representations (Fig. 2C).
To conduct stRSA, we estimated single-trial brain responses using the least-squares single (LSS) approach (Mumford et al., 2014; Mumford et al., 2012). The realigned unsmoothed functional images in each participant’s native space (without spatial normalization) were analyzed with the subject-level GLM. Specifically, a GLM with a design matrix for each trial was constructed and estimated separately. Each design matrix consisted of a stimulus regressor of interest for that trial during the sound presentation; a regressor of no interest consisting of all other events (i.e., the feedback presentation for that trial, and the sound and feedback presentations for the remaining trials in the same block); outlier regressors; six head-movement regressors; and a session-mean regressor for each block individually. Therefore, 240 subject-level GLMs were constructed and estimated for each participant. Whole-brain t-statistic maps were calculated for each trial and used for stRSA (Misaki et al., 2010).
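The LSS logic (Mumford et al., 2012) can be sketched for a single voxel as follows. This is a toy illustration under simplifying assumptions: a single-gamma HRF stands in for SPM's canonical HRF, nuisance regressors are reduced to a session mean, and the function and helper names are hypothetical.

```python
import numpy as np

def lss_betas(y, onsets, tr=2.0):
    """Least-squares single (LSS): one GLM per trial, whose design has
    (1) a regressor for that trial, (2) a single regressor lumping all
    other trials, and (3) a session mean.  `y` is one voxel's time
    series; `onsets` are trial onsets in seconds."""
    n_scans = len(y)
    t = np.arange(0.0, 12.0, tr)
    h = (t / 4.0) ** 2 * np.exp(-t / 2.0)  # toy single-gamma HRF (assumption)
    h /= h.sum()

    def regressor(sub_onsets):
        stick = np.zeros(n_scans)
        stick[(np.asarray(sub_onsets) / tr).astype(int)] = 1.0
        return np.convolve(stick, h)[:n_scans]

    betas = []
    for i in range(len(onsets)):
        X = np.column_stack([
            regressor([onsets[i]]),                  # trial of interest
            regressor(onsets[:i] + onsets[i + 1:]),  # all remaining trials
            np.ones(n_scans),                        # session mean
        ])
        betas.append(np.linalg.lstsq(X, y, rcond=None)[0][0])
    return np.array(betas)
```

Looping this over voxels yields one whole-brain response map per trial; the paper uses per-trial t-statistic maps rather than raw betas.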
For the stRSA, we defined a perceptual category model (i.e., RDM; see Fig. 2A) for each category structure by calculating perceptual distances for each pair of sounds within the spectral and temporal modulation dimensions. The model RDM was correlated with the neural RDM (nRDM) derived from a spherical brain area (for the searchlight-based RSA) or a region of interest (for the ROI-based RSA) using Spearman’s rank correlation (see Fig. 2B for the graphical analysis procedure). Whole-brain correlation maps were calculated using the searchlight algorithm (Kriegeskorte et al., 2006) to identify brain regions that represent the category-relevant perceptual information across training blocks. To identify which training phase shows the most robust neural representations, we applied the RSA with a temporal searchlight approach in which the searchlight time window was set to two blocks. That is, single-trial data from two consecutive blocks (see Fig. 2B, right panel) were combined to generate nRDMs. The multivoxel activation patterns based on t-statistic values derived from each searchlight sphere (average number of voxels per sphere = 90) in each time window were used to calculate dissimilarities between each pair of sounds for the nRDMs (i.e., 1 − Pearson’s correlation). The nRDMs were then correlated with the predefined perceptual-distance RDM using Spearman’s rank correlation. The correlation value for each sphere and time window was standardized using Fisher’s r-to-z transformation and projected back to the center voxel to generate stRSA maps. For the group-level analysis, the searchlight RSA maps were first normalized to MNI space and then entered into a flexible factorial-design analysis of variance (ANOVA) model as dependent variables to assess the main effects of group and training block as well as their interaction in multivariate representations.
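The core RDM comparison for one sphere and time window can be sketched as follows; the function name is hypothetical, and real searchlight code would loop this computation over all spheres and time windows.

```python
import numpy as np
from scipy.stats import spearmanr

def rsa_fit(patterns, model_rdm):
    """Correlate a neural RDM with a model RDM, as in the stRSA.
    `patterns`: (n_stimuli, n_voxels) activation patterns from one
    sphere and time window; `model_rdm`: (n_stimuli, n_stimuli)
    perceptual-distance matrix.  Returns the Fisher z-transformed
    Spearman correlation of the two RDMs' lower triangles."""
    nrdm = 1.0 - np.corrcoef(patterns)           # 1 - Pearson correlation
    tri = np.tril_indices(len(patterns), k=-1)   # off-diagonal entries only
    rho, _ = spearmanr(nrdm[tri], model_rdm[tri])
    return np.arctanh(rho)                       # Fisher r-to-z
```

Only the off-diagonal triangle is compared because the diagonal is zero by construction and each pair appears twice in a full symmetric RDM.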
We also used a model-based approach to identify regions that fit different changing patterns of learning-related representational dynamics (see below for the description of three types of dynamic patterns). All group statistical maps from the multivariate analyses were thresholded at cluster-level FWER of 0.05.
We hypothesized that the neural representations of category information change as a function of training block, with three types of dynamic patterns: increasing, decreasing, and inverted-U (see Fig. 2C for the diagram). These patterns assume that neural representations of category-relevant information emerge at different time points of training for different regions. This assumption is supported by various learning studies demonstrating that brain representations of the learned stimuli (mostly reflected in univariate activations) change after training relative to pre-training or early-training sessions, especially for the decreasing and increasing patterns (Karuza et al., 2014; Ley et al., 2012; Myers, 2014; Ohl and Scheich, 2005; Pasupathy and Miller, 2005; Tagarelli et al., 2019; Wang et al., 2003; Wong et al., 2007). The inverted-U pattern is motivated by learning studies of the hippocampal memory system showing that the hippocampus is transiently engaged during learning to rapidly extract learning-relevant contextual information, mediate memory consolidation, and facilitate the transfer of newly acquired knowledge into the cortex (Baldassano et al., 2017; Davis and Gaskell, 2009; Schapiro et al., 2017; Takashima et al., 2014). Based on these observations, we hypothesized that the hippocampal memory system could transiently encode category-relevant information during learning. An inverted-U pattern would therefore reflect representations that are most robust soon after training onset and decrease in subsequent sessions. We are aware that the time course of an inverted-U pattern could range from several minutes to days of training depending on the nature of the learning task. Therefore, examining the inverted-U pattern of the representations in the current experiment is considered exploratory.
Each of these three patterns was proposed and modeled in the group-level analyses. The increasing pattern was defined as the neural representation increasing across training blocks (Fig. 2C, right panel). We predicted that this increasing pattern could closely follow behavioral learning performance, since the neural representations are hypothesized to subserve behavioral categorization. Therefore, an increasing curve function was created based on the group-level mean-centered behavioral learning curve (i.e., II = [−0.07, 0.00, 0.02, 0.02, 0.03]; RB = [−0.07, −0.02, 0.01, 0.03, 0.05]; each number denotes the weight of a searchlight time window [i.e., blocks 1-2, 2-3, 3-4, 4-5, and 5-6]). The decreasing pattern was defined as the neural representation decreasing across training blocks (Fig. 2C, left panel); the decreasing curve function was created by inverting the increasing function. The inverted-U pattern was defined as the neural representation emerging temporarily in the middle of training and diminishing in the late training phase (Fig. 2C, middle panel). The inverted-U function was created by assigning the highest weight to the “middle” blocks (e.g., blocks 3-4) and decreasing weights to the early and late blocks (i.e., [−0.8, 0.2, 1.2, 0.2, −0.8]). Because the definition of the “middle” blocks is somewhat arbitrary, we also generated other variants of the inverted-U function (see Fig. S6A for the other two variants), assigning the highest weight to blocks 2-3 (i.e., [−0.8, 1.2, 0.4, 0, −0.8]) or blocks 4-5 (i.e., [−0.8, 0, 0.4, 1.2, −0.8]). These curve functions were used to weight each block pair’s RSA model fit for each voxel across the whole brain, assessing the extent to which a given region’s neural representation changes according to a specific function.
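The curve-weighting step amounts to a dot product between a voxel's five block-pair RSA fits and a weight vector. A minimal sketch using the weight vectors quoted above (the function name is assumed):

```python
import numpy as np

# Hypothesized dynamic patterns as weight vectors over the five
# searchlight time windows (blocks 1-2 ... 5-6), taken from the text.
increasing_ii = np.array([-0.07, 0.00, 0.02, 0.02, 0.03])  # mean-centered II learning curve
inverted_u    = np.array([-0.8, 0.2, 1.2, 0.2, -0.8])

def pattern_score(rsa_fits, weights):
    """Weight a voxel's five block-pair RSA model fits by a curve
    function; a large positive score means the representation follows
    that dynamic pattern."""
    return float(np.dot(rsa_fits, weights))
```

Note that the symmetric inverted-U weights sum to zero and cancel any linear trend, so a purely increasing or decreasing voxel scores zero on the inverted-U contrast.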
Results
Behavioral response patterns and learning strategies
Categorization accuracy significantly increased over training blocks for both groups (II learners: mean accuracy of the first block = 0.31 and last block = 0.46 [paired t-test: t(29) = 5.38, P < 0.001]; RB learners: mean accuracy of the first block = 0.28 and last block = 0.45 [paired t-test: t(29) = 6.60, P < 0.001]; Fig. 1C). The linear mixed-effects (LME) regression analysis confirmed that the main effect of block on accuracy was significant (F(5,348) = 26.64, P < 0.001) but the main effect of group was not (F(1,348) = 1.43, P = 0.23). No significant group-by-block interaction in accuracy was found (F(5,348) = 0.716, P = 0.612). For categorization response time (RT), the LME analysis revealed significant main effects of both block and group (block: F(5,348) = 3.41, P = 0.005; group: F(1,348) = 4.58, P = 0.033). RT generally increased over blocks (mean RT in block 1 = 1146 ms and block 6 = 1200 ms), with RB learners responding more slowly than II learners (mean RT in the RB task = 1219 ms and in the II task = 1155 ms). The group-by-block interaction effect was not significant for RT (Ps > 0.1). The behavioral representational similarity analysis (bRSA) further showed that the category-perceptual distance RDMs were increasingly correlated with learners’ response confusion patterns for both groups (LME, main effect of block: F(5,348) = 24.212, P < 0.001; see Fig. S2B for the block-by-block bRSA model fits), but group differences were not significant (main effect of group: F(1,348) = 0.070, P = 0.792). No significant group-by-block interaction effect was found (F(5,348) = 0.947, P = 0.451). These bRSA results demonstrate that learners’ confusion patterns became increasingly similar to the category-perceptual distances between sounds for both groups.
We used computational modeling to assess each learner’s best-fitting strategy for each block. We coded the strategies into three categories: II, RB, and random responder (Fig. 3A; also see Fig. S3B for detailed graphs of each best-fitting strategy). Participants who performed the II task (i.e., II learners) were more likely to use II strategies than other strategies (χ2 = 25.733, P < 0.001). Similarly, participants who performed the RB task (i.e., RB learners) were more likely to use RB strategies (including unidimensional and conjunctive models) than other strategies (χ2 = 70.533, P < 0.001). Using multinomial logistic regression, we further confirmed significant group differences (II vs. RB learners) in the proportions of the strategies used (percentage of II vs. RB strategy: b = 2.718, SE = 0.337, t = 8.067, P < 0.001; II vs. random strategy: b = 2.239, SE = 0.391, t = 6.391, P < 0.001; RB vs. random strategy: b = −0.478, SE = 0.268, t = −1.786, P = 0.074). These group differences demonstrate that II learners were more likely to use II strategies, whereas RB learners tended to use RB strategies to learn to categorize the sounds.
Fig. 3.
Learning strategy modeling results and group-by-block interaction effects in feedback-processing-related brain activations. A, the proportion of each learning strategy used in each block and learner group as revealed by neurocomputational decision-bound modeling. Strategy: II = optimal II or MDC responders; Rd = random responders; RB = rule-based UC or CC responders. B, group-by-block interaction effects were found in a distributed corticostriatal network. The brain map was thresholded at the cluster-level FWER = 0.05. The frontoparietal and striatal regions are labeled. L, left hemisphere. C, ROI-based analysis shows that the feedback-related activations (feedback presentation vs. baseline) changed differently over training blocks between the two groups in anatomically defined corticostriatal ROIs. ROIs from the two hemispheres were combined. Brain region abbreviations: PreCG, precentral gyrus; IPL, inferior parietal lobule; MFG, middle frontal gyrus; MFGd, dorsal middle frontal gyrus; SFGm, medial superior frontal gyrus; ACC, anterior cingulate cortex; IFG, inferior frontal gyrus; STG, superior temporal gyrus.
Feedback-related corticostriatal activations changed differently following training between II and RB learners
We focused on examining the neural dynamics of feedback-related brain activations and sound-related multivoxel representations over training blocks for each learning group, and on revealing group differences in these neural dynamics. Both univariate activation and multivariate representation measures were calculated and examined. For univariate activations, feedback processing (i.e., feedback vs. baseline) yielded brain activations in the bilateral frontoparietal areas (bilateral inferior frontal gyrus [IFG]/middle frontal gyrus [MFG] and inferior parietal lobule [IPL]), insula, the left middle temporal gyrus (MTG), and occipital cortices (Fig. S4A). Although a larger extent of the frontoparietal areas was engaged for the RB learners than for the II learners, no region showed significant group differences. We also identified regions sensitive to feedback valence (i.e., correct vs. incorrect) in the bilateral putamen and head of the caudate nucleus, which showed greater activations for positive than negative (i.e., correct > incorrect) feedback for both groups (Fig. S4B), consistent with previous findings in the context of speech category learning (Feng et al., 2019; Yi et al., 2016). Direct comparisons between II and RB learners in feedback-valence-related activations (i.e., II [correct − incorrect] vs. RB [correct − incorrect]) did not reveal any significant regions.
To examine group differences in feedback-related activation across training blocks, we constructed a second-level flexible factorial-design analysis of variance (ANOVA) with the voxel-wise feedback-related activations (i.e., feedback vs. baseline and correct-incorrect feedback) as the dependent variable, group as a between-subject factor, and training block (i.e., block 1 to 6) as a within-subject factor (i.e., group-by-block two-way ANOVA). The group differences in dynamic changing patterns are summarized by the voxel-wise group-by-block interaction effect. A corticostriatal network was identified showing significant interaction effects (Fig. 3B), including the left precentral gyrus (L.PreCG), right ventral and dorsal MFG, bilateral IPL, bilateral medial superior frontal gyrus (mSFG) extending to the anterior cingulate cortex (ACC), bilateral occipital cortex, bilateral putamen, and head of caudate nucleus.
To further break down the group-by-block interaction effect, we performed ROI-based univariate-activation analyses in 12 pre-defined anatomical ROIs, including attention and cognitive-control related frontoparietal regions (the bilateral IFG and IPL), feedback- and reward-related striatal areas (the bilateral putamen and caudate nucleus), and sensorimotor regions (the bilateral STG and PreCG). The bilateral ROIs of the same region were combined since their activation patterns were similar across training blocks. The group-by-block activation profiles are shown in Fig. 3C. Significant main effects of block were found in the bilateral STG (F(5,348) = 9.41, FDR-corrected q < 0.001), IPL (F(5,348) = 4.90, FDR-corrected q = 0.001), and PreCG (F(5,348) = 3.58, FDR-corrected q = 0.011), showing decreasing trends over blocks for both groups. Significant group-by-block interaction effects were found in the bilateral caudate nucleus (F(5,348) = 3.58, FDR-corrected q = 0.011), IFG (F(5,348) = 2.94, FDR-corrected q = 0.033), IPL (F(5,348) = 5.15, FDR-corrected q = 0.001), and PreCG (F(5,348) = 4.76, FDR-corrected q = 0.001), with a marginal interaction effect in the putamen (F(5,348) = 2.51, FDR-corrected q = 0.067). We did not find any ROI showing a significant main effect of group (Ps > 0.1). These ROI-based results are consistent with the whole-brain voxel-wise results shown in Fig. 3B and reveal dynamic changing patterns of the feedback-related activations across blocks. These results further confirmed that a corticostriatal network, including the frontoparietal and striatal regions, was increasingly involved in the acquisition of RB categories, whereas the feedback-related activations in these regions decreased or remained stable during the learning of II categories.
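The FDR-corrected q-values reported here correspond to the standard Benjamini-Hochberg procedure; a minimal sketch (not the authors' exact implementation) that converts a set of p-values to q-values:

```python
import numpy as np

def fdr_bh(pvals):
    """Benjamini-Hochberg FDR: adjust p-values to q-values.  Sort the
    p-values, scale each by m/rank, then enforce monotonicity from the
    largest p-value downward before mapping back to the input order."""
    p = np.asarray(pvals, dtype=float)
    order = np.argsort(p)
    m = len(p)
    ranked = p[order] * m / (np.arange(m) + 1)
    # step-up: each q-value is the minimum of all adjusted values at or above it
    q_sorted = np.minimum.accumulate(ranked[::-1])[::-1]
    out = np.empty_like(q_sorted)
    out[order] = np.clip(q_sorted, 0.0, 1.0)
    return out
```

An effect is then declared significant at, e.g., q < 0.05 if its adjusted value falls below that threshold.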
In addition to the feedback-related univariate activations, sound categorization (compared with baseline) induced distributed fronto-temporoparietal activations for the II learning task, consistent with previous findings (Feng et al., 2019; Yi et al., 2016). This categorization network consists of the bilateral IFG, insula, PreCG and postcentral gyrus (PostCG), IPL, auditory cortices (e.g., bilateral Heschl’s gyrus [HG] and STG), and striatal areas (Fig. S5A). We also found a significant main effect of training block in a dorsal auditory pathway (Fig. S5B), where sound-categorization-related activations decreased with training, which may reflect stimulus-related repetition adaptation effects (Henson, 2003; Larsson and Smith, 2011; Summerfield et al., 2008). No region showed a significant main effect of group or a group-by-block interaction effect. For the RB learning task, categorization-related activations were similar to those of the II task (Fig. S5C). The decreasing activations were restricted to the dorsal medial prefrontal cortex, supramarginal gyrus (SMG), and middle temporal regions (Fig. S5D). No region showed a significant main effect of group or a group-by-block interaction effect.
Group differences in spatiotemporal dynamics of neural representations
The multivariate stRSA was used to examine the emerging neural representations of category-perceptual information across training blocks for each group. We focused on analyzing the sound-related multivoxel patterns to reveal the extent to which local activation patterns reflect the auditory category-related perceptual representations. The whole-brain group-by-block analysis of variance (ANOVA) on the stRSA brain maps revealed a significant main effect of training block in the bilateral frontoparietal regions (Fig. 4A, top panel), including the left lateral IFG, left MFG, right IFG, bilateral posterior PreCG and IPL, the supplementary motor area (SMA), and posterior cingulate cortex (PCC). Main effects of group were found in the left dorsal PreCG and left anterior IPL (Fig. 4A, middle panel); significantly more robust representations emerged for the II learners than for the RB learners in these two regions across training blocks (Fig. 4B, lower panel). Critically, we found two additional regions, the left STG (L.STG; peak coordinate: x = −60, y = −37, z = 13) and left triangular IFG (L.IFGtri; peak coordinate: x = −48, y = 14, z = 7), that showed significant group-by-block interaction effects (Fig. 4A, bottom panel). The detailed interaction effects for the two regions are further displayed in the line graphs, which show that the representations increased with training for the II learners whereas they decreased for the RB learners (Fig. 4B, upper panel).
Fig. 4.
Whole-brain group-by-block ANOVA on the neural representations of perceptual category distance. A, the group-by-block ANOVA revealed a significant main effect of block (upper panel), a main effect of group (middle panel), and group-by-block interaction effects (bottom panel) in multi-voxel representations. Brain maps were thresholded at cluster-level corrected FWER = 0.05. B, emerging profiles in neural representations for the two interaction-effect regions (L.STG and L.IFGtri) and the two regions showing the main effect of group (L.PreCGd and L.IPLa). These line graphs are for post hoc visualization purposes. C, individual differences in the robustness of neural representations were significantly correlated with individual differences in learning performance (i.e., Cat. Acc) exclusively for the II learners. The lightness of dots and lines denotes training blocks.
To quantitatively compare the robustness of the emerging patterns between the two groups with a model-free approach that does not assume an a priori curve function across training blocks, we used a linear regression model to estimate the ‘emerging rate’ (i.e., slope) of the representations for each group separately. The independent variable was the block pair (i.e., blocks 1-2, 2-3, 3-4, etc.) and the dependent variable was the RSA model fit. The regression slope is an indicator of the emerging rate, with a higher slope indicating faster emergence. A two-sample t-test revealed that the representations in the L.STG (extending to the supramarginal gyrus; peak coordinate: x = −63, y = −34, z = 10; cluster size = 1,359 mm3) and L.IFG (peak coordinate: x = −42, y = 26, z = 1; cluster size = 738 mm3) increased significantly faster (i.e., higher slope) for II learners than for RB learners. This result converges with the ANOVA findings showing significant group-by-block interaction effects in the two regions.
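The "emerging rate" reduces to the slope of a least-squares line fit through a region's five block-pair RSA fits; a minimal sketch (the function name is assumed):

```python
import numpy as np

def emerging_rate(rsa_fits):
    """Model-free 'emerging rate': slope of a linear regression of a
    region's RSA model fits on the block-pair index (1-2, 2-3, ...).
    A larger positive slope indicates faster-emerging representations."""
    x = np.arange(len(rsa_fits))
    slope, _intercept = np.polyfit(x, rsa_fits, 1)
    return slope
```

Computing this slope per participant yields one value per subject and region, which can then be compared between groups with a two-sample t-test.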
To examine the extent to which the emerging neural representations in the L.STG and L.IFGtri are behaviorally relevant, we conducted an ROI-based individual-difference correlation analysis between the robustness of the neural representations and behavioral categorization accuracies across training blocks for each group separately. Individual differences in the robustness of neural representations were significantly correlated with individual learning success (i.e., categorization accuracies) among II learners in the L.STG (blocks 1-2: r = 0.31, P = 0.09; blocks 3-4: r = 0.52, P = 0.003; blocks 5-6: r = 0.54, P = 0.002) and the L.IFGtri (blocks 1-2: r = 0.27, P = 0.14; blocks 3-4: r = 0.62, P < 0.001; blocks 5-6: r = 0.44, P = 0.015) (see Fig. 4C, upper panel), but not among RB learners in the L.STG (blocks 1-2: r = −0.26, P = 0.17; blocks 3-4: r = 0.01, P = 0.99; blocks 5-6: r = −0.06, P = 0.73) or L.IFGtri (blocks 1-2: r = 0.11, P = 0.55; blocks 3-4: r = 0.23, P = 0.21; blocks 5-6: r = −0.09, P = 0.62) (see Fig. 4C, lower panel). The behavior-neural correlations for the II learners increased over training blocks (Fig. 4C, upper panel), indicating that the learners’ emerging neural representations became increasingly associated with learning performance.
II learners: The emerging neural representations of category-perceptual distance
We hypothesized that the neural representations of sounds change with training along different trajectories (see Fig. 2C) when learning different structures. The three hypothesized emerging patterns (i.e., increasing, decreasing, and inverted-U) were applied to reveal the dynamic patterns in multivariate neural representations across blocks. With the searchlight-based stRSA, we identified neural representations of perceptual category distance in a fronto-temporoparietal network showing a significantly increasing pattern that mimicked the behavioral learning performance of the II learners (Fig. 5A). The regions involved included the bilateral IFG, left PreCG, left PostCG, left IPL, left supramarginal gyrus (L.SMG), left middle section of the superior temporal gyrus (L.STGm), right MFG, MTG, and PCC (see Table 1 for the region details). The increasing patterns are displayed in the line graphs for four representative regions in Fig. 5B. For visual comparison purposes, the dynamic changing patterns derived from the RB learners are also displayed for the same regions. No region showed a significant decreasing or inverted-U pattern.
Fig. 5.
Changes in multi-voxel representations of category distance during learning for the II learners. A, a fronto-temporoparietal network showed significantly increasing neural representations of category distance for the II group. No significantly decreasing or inverted-U pattern was found. Brain maps were thresholded at the voxel-level P = 0.005 and cluster-level FWER = 0.05. B, post hoc ROI-based RSA analysis for four representative ROIs (highlighted in panel A). RB learners’ RSA correlations (i.e., model fits) were also displayed for the same regions for visual comparison purposes.
Table 1:
Brain clusters that showed different emerging patterns in neural representations of category-perceptual distance.
| Group, pattern | Region | Peak MNI coordinates (x, y, z; mm) | Cluster size (# voxels) | Peak t value |
|---|---|---|---|---|
| II, Increase | L.IFG | −54, 35, 7 | 512 | 4.14 |
| | L.PreCG | −30, −1, 64 | 271 | 4.75 |
| | L.SMG/IPL | −39, −40, 13 | 123 | 4.09 |
| | L.STGm | −60, −19, 7 | 96 | 3.91 |
| | L.Precuneus/IPL | −12, −55, 40 | 512 | 5.04 |
| | R.IFG/MFG | 39, 20, 43 | 313 | 3.75 |
| | R.MTG | 48, −76, 22 | 104 | 4.53 |
| RB, Increase | SMA | −6, 32, 43 | 306 | 4.25 |
| | R.IPL | 36, −37, 43 | 131 | 4.22 |
| | R.PreCGd | 12, −10, 67 | 205 | 4.39 |
| RB, Decrease | L.STGp | −60, −37, 13 | 225 | 4.46 |
| | L.MOG | −30, −85, 13 | 140 | 3.99 |
| | R.STG | 63, −34, 19 | 83 | 3.71 |
| RB, Inverted-U | R.PreCGv | 54, −1, 49 | 277 | 4.51 |
| | R.IPL/SPL | 39, −52, 64 | 140 | 4.45 |
| | R.ACC | 6, 20, 25 | 100 | 3.94 |
| | R.Precuneus | 9, −70, 61 | 572 | 4.85 |
| | R.Hipp/Parahipp | 24, −22, −17 | 192 | 4.59 |
RB learners: The emerging neural representations of category-perceptual distance
We identified regions showing all three types of dynamic patterns for the RB group. For the decreasing pattern, we identified three regions: the bilateral STG and the left middle occipital cortex (L.MOG) (Fig. 6A). Intriguingly, we also identified extended brain areas showing the inverted-U pattern, including the R.PreCG, R.IPL/SPL, right inferior ACC (iACC), right precuneus, right hippocampus (R.Hipp), and parahippocampus (Fig. 6B; also see Fig. S6B for brain regions revealed by the other variants of the inverted-U functions). This finding indicates that these regions encode category-perceptual distance prominently in the middle of training (i.e., blocks 3-4) and that these representations decrease in the late stage of training. For the increasing pattern, we identified three significant regions (Fig. 6C): the SMA, right dorsal PreCG, and IPL. These dynamic emerging patterns for the RB learners are schematically summarized in Fig. 6D (see Table 1 for detailed regional statistics).
Fig. 6.
Dynamic changing patterns in multi-voxel representations of perceptual category distance for RB learners. A, decreasing neural representations were found in the bilateral STG and left middle occipital gyrus. Line graphs in the lower panel show the changing patterns across blocks. The neural representations for the II learners are also shown in the same regions for visual comparison. B, the inverted-U pattern was found in the frontoparietal regions, ACC, and hippocampus. C, increasing neural representations were found in the right IPL, dorsal PreCG, and SMA. All searchlight maps were thresholded at the cluster-level corrected FWER = 0.05. D, a schematic summary of the spatiotemporal representation dynamics. The region positions are derived approximately from panels A to C. The lightness denotes the three pairs of training blocks (i.e., early [blocks 1-2], middle [blocks 3-4], and late [blocks 5-6]). ROI abbreviations: L/R.STG, left/right superior temporal gyrus; L.MOG, left middle occipital gyrus; SMA, supplementary motor area; R.Hipp, right hippocampus; L.STGp, left posterior portion of the superior temporal gyrus; R.IPL, right inferior parietal lobule; R.SPL, right superior parietal lobule; ACC, the middle section of the anterior cingulate cortex; R.PreCGd, right dorsal precentral gyrus; R.PreCGv, right ventral precentral gyrus.
In summary, the bilateral STG represented category information in the early phase of training, with representations decreasing thereafter, whereas the right frontoparietal regions, ACC, and hippocampus encoded the category information predominantly in the middle of training. The right IPL, PreCG, and SMA represented the category-related information predominantly in the late phase of training. These dynamic patterns indicate that RB learners’ neural representations did not emerge linearly in specific regions; instead, the representations might be constantly updated to categorize sounds optimally and efficiently.
Discussion
We examined the neural dynamics underlying the acquisition of two different types of auditory category structures using a feedback-based sound-to-category training paradigm with fMRI and spatiotemporal multivariate pattern analyses. The same underlying dimensions, spectral and temporal modulations, considered the building blocks of speech and music, were common to both category structures. For the RB category structure, optimal performance during learning is assumed to be obtained by making separate decisions about the perceptual information on each dimension and by developing and validating combinatorial rules that map sounds to categories. In contrast, optimal categorization in the II learning task required integration of the spectral and temporal modulation dimensions prior to any decision process, which may rely on a procedural-based learning strategy to associate auditory signals with category-motor outputs. The sound-to-category feedback-based training procedure was identical for the RB and II tasks. Our design allowed us to test competing models: do representations of the auditory dimensions emerge within a processing hierarchy (the STG-IFG pathway) that is functionally specialized in mapping acoustic signals to behaviorally relevant categories, irrespective of stimulus structure? Or do the emerging representations reflect the task at hand, varying dynamically with the category structure and underlying learning strategies? Our results point to the latter. Emerging representations do not follow a strict hierarchy, and the extent to which representations emerge within the auditory-motor regions versus regions involved in cognitive control is determined by the category learning task (RB vs. II).
Despite large individual differences in learning performance, participants learned the RB and II tasks to similar extents. Notably, though, learners tended to use different decisional strategies, with the II task yielding more procedural-based strategies and the RB task yielding more rule-based strategies. Importantly, corrective feedback was also differentially processed across tasks as a function of the time course of learning within a corticostriatal network. The putamen and caudate nucleus were increasingly sensitive to feedback processing in the RB task, whereas distributed fronto-temporoparietal regions showed decreased activations in response to feedback for II learners. Spatiotemporal representational similarity analysis (stRSA) revealed a dorsal sensorimotor pathway, especially the left superior temporal gyrus (STG) and left triangular section of the inferior frontal gyrus (IFGtri), showing increasing representations of category-perceptual distance over training blocks in the II task. These emerging neural representations relate to both training-induced behavioral improvements and individual differences in learning success. In contrast, the representations in this dorsal pathway decreased over training blocks in the RB task. This suggests that representations within these regions are not key to the RB learning task; instead, emerging representations within a large, distributed frontoparietal-hippocampal network involved in cognitive control or attention are key to learning RB categories. These novel findings are consistent with the dual-learning systems model for auditory category learning and its predictions about the dynamics of multivoxel neural representations and feedback-related corticostriatal involvement as a function of the category structure being learned (Chandrasekaran et al., 2014a; Chandrasekaran et al., 2014b; Maddox et al., 2013).
For the II structure, pre-decisional integration and procedural-based learning are proposed to be key for optimal learning. We found that a dorsal auditory-motor pathway consisting of the left superior temporal gyrus (STG), supramarginal gyrus (SMG), anterior inferior parietal lobule (IPLa), and precentral gyrus (PreCG) shows increasing representations of category-related perceptual information. This auditory category representational network is consistent with the broad speech processing network spanning the frontal, temporal, and parietal lobes (Feng et al., 2021a; Giraud and Poeppel, 2012; Hickok and Poeppel, 2007; Rauschecker and Scott, 2009). Among these brain regions, the STG has been demonstrated to encode multidimensional acoustic signals (including spectral and temporal modulations) that differentiate native speech categories (Arsenault and Buchsbaum, 2015; Bonte et al., 2014; Feng et al., 2016; Feng et al., 2018; Formisano et al., 2008; Mesgarani et al., 2014), a sensitivity presumably acquired slowly over the course of native-language development. Auditory associative regions within the STG are hypothesized to be functionally specialized in representing familiar acoustic signals, especially speech (Arsenault and Buchsbaum, 2015; Chevillet et al., 2013; Desai et al., 2008; Feng et al., 2021a; Feng et al., 2018; Feng et al., 2021b; Yi et al., 2019). Previous studies have found that left fronto-temporo-parietal regions vary in their sensitivity to speech relative to non-speech stimuli (Binder et al., 2000). Speech contexts, relative to non-speech, significantly enhance the robustness of the neural representations of auditory categories in bilateral frontal, temporal, and precentral regions (Feng et al., 2021a).
However, prior studies have also shown that the processing of non-speech stimuli is similar to that of speech in fronto-STG neural patterns when the stimuli share critical acoustic properties that differentiate categories (Feng et al., 2021a; Leech et al., 2009) or after training (Karuza et al., 2014; Ley et al., 2012). Our findings, together with previous observations that short-term training on non-native speech and non-speech auditory categories is associated with increased neural recruitment of the STG, suggest that STG plasticity occurs rapidly for adult speech learners (Callan et al., 2003; Desai et al., 2008; Feng et al., 2019; Leech et al., 2009; Wang et al., 2003; Zhang et al., 2009). We posit that multivoxel representations in the STG may increasingly encode perceptual boundaries among the four categories of the II structure, but not the RB structure, by increasing sensitivity to between-category sounds while decreasing sensitivity to within-category items. This trend toward categorical perception is demonstrated by the finding that RSA model fits in the left STG gradually increased over training.
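The block-wise RSA model fits discussed here are, in general form, rank correlations between a neural representational dissimilarity matrix (RDM) and a model RDM of category-perceptual distance, computed separately for each training block (Kriegeskorte et al., 2008). The following numpy sketch illustrates that general recipe; the correlation-distance neural RDM and Spearman fit are standard RSA choices, not necessarily the exact pipeline used in this study.

```python
import numpy as np

def rank(x):
    # simple ranking (ties broken arbitrarily; adequate for a sketch)
    order = np.argsort(x)
    r = np.empty(len(x))
    r[order] = np.arange(len(x))
    return r

def spearman(a, b):
    # Spearman correlation = Pearson correlation of the ranks
    ra = rank(a) - (len(a) - 1) / 2.0  # ranks 0..n-1 centered at their mean
    rb = rank(b) - (len(b) - 1) / 2.0
    return float(ra @ rb / np.sqrt((ra @ ra) * (rb @ rb)))

def rsa_fit(patterns, model_rdm):
    # patterns: stimuli x voxels activity estimates for one training block;
    # neural RDM: 1 - Pearson correlation between multivoxel patterns
    neural_rdm = 1.0 - np.corrcoef(patterns)
    iu = np.triu_indices_from(neural_rdm, k=1)  # unique stimulus pairs only
    return spearman(neural_rdm[iu], model_rdm[iu])
```

Computing `rsa_fit` once per training block yields a block-wise fit trajectory; an increase across blocks is the signature of an emerging category representation in a given region.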
We do not argue that these emergent neural representations are abstract or categorical (as yet), given that participants were only assayed at an early (but important) stage of learning. Instead, the emergent representations (across six blocks of sound-to-category training) are sensitive to category-related perceptual similarities between sounds. We speculate that the neural representations could become abstract or categorical with long-term or extensive training, once participants reach a high degree of proficiency (e.g., >90%, native-like) (Reetzke et al., 2018). It is also worth noting that all analyses were performed on stimulus processing during active categorization and learning. Thus, we cannot determine the extent to which the patterns of activity that developed across training were due to changes in perceptual tuning within the STG or to top-down modulations interacting with STG representations (e.g., from frontoparietal regions).
Besides the STG, we identified the triangular section of the left inferior frontal gyrus (L.IFGtri) as showing category-structure-specific emerging representations over training sessions. The L.IFGtri is located within the ventral IFG. Previous studies have found that the IFG is involved in the acquisition of auditory or non-native speech categories (Lee et al., 2012; Myers, 2014; Myers and Swan, 2012), with activation patterns in this region showing greater sensitivity to between-category changes and reduced sensitivity to within-category changes after training. To achieve successful learning, the IFG is hypothesized to work dynamically with the temporal cortices (e.g., STG) to form categorical representations and make overt categorization decisions (Feng et al., 2021a; Myers, 2014; Myers et al., 2009; Myers and Swan, 2012). Here, we also demonstrate that different sub-regions of the IFG showed distinct profiles of plasticity. While the L.IFGtri showed increasing representations only for the II learners (Fig. 4B), the lateral IFG in both hemispheres yielded similar increasing representations for both groups (Fig. 4A, the main effect of block). These findings suggest that the L.IFGtri may be related to the acquisition of task-specific (procedural-based) category structures, while the lateral IFG may be related to task-general sound-to-category learning and categorization decisions.
In contrast to the monotonically increasing representations in the II task, we found a more complex dynamic pattern for the RB task, including decreasing, inverted-U, and increasing dynamics. First, decreasing neural representations of category-perceptual distance were found in the bilateral STG, which suggests that neural sensitivity to the perceptual similarity between sounds was reduced as a function of training. This finding suggests that as learners adopt more rule-based learning strategies (Fig. 3A), categorization decisions are less driven by perceptual information encoded in the STG. Instead, the emergence of higher-order abstract rules may guide categorization decisions. Second, we found a distributed frontoparietal-medial temporal network showing an inverted-U pattern: this network represented the perceptual information most strongly in the middle of training, with representations declining in the late phase. The network consists of frontoparietal regions, the hippocampus, and the anterior cingulate cortex (ACC), which have been associated with working memory, knowledge-based learning, and memory formation. The frontoparietal regions may support the encoding of stimulus-derived perceptual distance for task-relevant functions, e.g., rule generation, testing, and maintenance, which are highly related to working memory and executive control (Braver, 2012; Feng et al., 2021a). This emergent representational network is consistent with predictions from multiple-learning systems models, e.g., the Competition between Verbal and Implicit Systems (COVIS) model (Ashby et al., 1998) and the dual learning system (DLS) model (Chandrasekaran et al., 2014a; Chandrasekaran et al., 2014b). The frontoparietal regions overlap with the executive brain network, which includes the bilateral prefrontal cortices, IPL, and ACC.
These regions have been proposed to be integral to hypothesis testing (Ashby et al., 1998; Ashby and Maddox, 2005; Ashby and Valentin, 2017; Chandrasekaran et al., 2014b). The frontoparietal representations during learning may support domain-general working memory and executive control processes, which enable the transformation of perceptual information into testable rules that are stored temporarily in the hippocampus (Radulescu et al., 2019).
Another possible explanation for the inverted-U pattern in the RB group is that cognitive processes other than learning were differentially engaged in the RB learning task. Previous research has suggested that attentional mechanisms may be differentially involved during learning depending on stimulus predictability and outcome prediction (i.e., prediction error) (Diederen et al., 2016). Engagement of attention is proposed to be optimal at intermediate levels of predictability (Kidd et al., 2012). For the RB learning task, corrective feedback is likely in the ‘sweet spot’ of predictability in the middle phase of training, given internally generated rules (based on the computational modeling of the learning strategies). Therefore, the most robust representations emerging in the middle phase of training in the frontal, hippocampal, and ACC regions might be related to attentional mechanisms and prediction processes. Future studies should examine these inverted-U effects during auditory category learning more deeply while measuring or controlling for attentional processes.
Lastly, an increasing pattern of neural representations of category distance was located in the right dorsal motor system, including the right parietal, precentral, and supplementary motor areas (SMA). Previous studies have found that these dorsal motor regions are engaged during the active categorization of speech (Feng et al., 2018). Local multivoxel representations of speech categories have been found to be robust in these regions for native speakers in both speech and non-speech contexts (Feng et al., 2021a). The emerging neural representations found in these motor regions are proposed to reflect a certain degree of categorical representation beyond motor responses (Feng et al., 2021b). Therefore, for the current training task, the late-emerging neural representations in these dorsal motor regions may support the emerging representations of rules or rule-based category decision processes. Altogether, we demonstrate that the neural representations of category distance for the RB learners undergo a dynamic representational transformation across regions and training blocks: decreasing dependence on perceptual information in the STG, transient representation of perceptual information in a frontoparietal-medial temporal network, and increasing representation in the motor network.
Some limitations of the current study should be considered alongside the key results. First, although the behavioral performances in the two learning tasks were comparable, the learners only achieved low-to-moderate categorization proficiency overall in a single sound-to-category training session. The learners are very much novices in this categorization task. Therefore, the spatiotemporal dynamics in neural representations described above reflect only the very early stage of learning. Future work should probe neural changes over a longer period of training, once learners achieve high proficiency. Second, some participants used non-II strategies in the II learning task and non-RB strategies in the RB learning task, an inevitable phenomenon given individual differences in the learning strategies formed during the learning process (Roark et al., 2021). Although the proportion of these participants is comparable between the two groups, this fact may compromise the group-level findings. Excluding those participants would significantly reduce the sample sizes and therefore statistical power. Future studies could consider expanding the sample size so as to include only participants with a pure learning strategy.
Summary and conclusion
Using decision-bound computational modeling, univariate activation analysis, and spatiotemporal multivariate representational similarity analyses, we demonstrate partially dissociated neural dynamics underlying the learning of information-integration (II) and rule-based (RB) category structures. A sensory-motor pathway, prominently involving the superior temporal and inferior frontal regions and the dorsal precentral gyrus, showed increasing representations of category-perceptual information during II category learning, while a frontoparietal-hippocampus network showed a complex dynamic representational pattern during RB category learning. Moreover, univariate analysis showed that feedback-related activations of a frontostriatal network increased for RB learners whereas they decreased or remained unchanged for II learners. These findings demonstrate that learners’ corticostriatal systems are highly plastic and sensitive to the type of category learning task. Emerging auditory category-related representations are not strictly restricted to regions (e.g., STG) that are functionally specialized to process key perceptual dimensions. Instead, consistent with recent neural models, learning to categorize is flexibly achieved by strategically learning efficient and task-dependent representations (task-state representations). The composition of category structures, and consequently the dynamics of decisional strategies (II vs. RB mapping) during learning, are key determinants of the emergent representational dynamics underlying auditory category learning.
Highlights.
Distinct brain networks subserve learning of different auditory category structures.
Learning an information-integration-based structure recruits a sensory-motor network.
Learning a rule-based structure recruits a frontoparietal-ACC-hippocampus-PreCG network.
The two networks represent category-structure information in distinct temporal patterns.
Acknowledgments
This work was supported by grants from the National Institute on Deafness and Other Communication Disorders of the National Institutes of Health under Award No. R01DC013315 (to B.C.) and by the General Research Fund (Ref. No. 14619518, to G.F.) of the Research Grants Council of Hong Kong.
Footnotes
Conflict of interest statement
Patrick C. M. Wong is a founder of a company in Hong Kong supported by a Hong Kong SAR government startup scheme for universities.
References
- Arsenault JS, Buchsbaum BR, 2015. Distributed neural representations of phonological features during speech perception. Journal of Neuroscience 35, 634–642. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Ashby FG, 1992a. Multidimensional models of categorization. Multidimensional models of perception and cognition. Lawrence Erlbaum Associates, Inc, Hillsdale, NJ, US, pp. 449–483. [Google Scholar]
- Ashby FG, 1992b. Multivariate probability distributions. Multidimensional models of perception and cognition. Lawrence Erlbaum Associates, Inc, Hillsdale, NJ, US, pp. 1–34. [Google Scholar]
- Ashby FG, Alfonso-Reese LA, Turken AU, Waldron EM, 1998. A neuropsychological theory of multiple systems in category learning. Psychological Review 105, 442–481. [DOI] [PubMed] [Google Scholar]
- Ashby FG, Crossley MJ, 2012. Automaticity and multiple memory systems. Wiley Interdisciplinary Reviews: Cognitive Science 3, 363–376. [DOI] [PubMed] [Google Scholar]
- Ashby FG, Gott RE, 1988. Decision rules in the perception and categorization of multidimensional stimuli. Journal of Experimental Psychology: Learning, Memory, and Cognition 14, 33–53. [DOI] [PubMed] [Google Scholar]
- Ashby FG, Lee WW, 1993. Perceptual variability as a fundamental axiom of perceptual science. Advances in psychology. Elsevier, pp. 369–399. [Google Scholar]
- Ashby FG, Maddox WT, 2005. Human category learning. Annual Review of Psychology 56, 149–178. [DOI] [PubMed] [Google Scholar]
- Ashby FG, Maddox WT, 2011. Human category learning 2.0. Annals of the New York Academy of Sciences 1224, 147–161. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Ashby FG, Townsend JT, 1986. Varieties of perceptual independence. Psychological Review 93, 154–179. [PubMed] [Google Scholar]
- Ashby FG, Valentin VV, 2017. Multiple Systems of Perceptual Category Learning: Theory and Cognitive Tests. Handbook of Categorization in Cognitive Science, 2nd Edition, 157–188. [Google Scholar]
- Ashby FG, Waldron EM, 1999. On the nature of implicit categorization. Psychonomic Bulletin & Review 6, 363–378. [DOI] [PubMed] [Google Scholar]
- Ashby FG, Waldron EM, Lee WW, Berkman A, 2001. Suboptimality in human categorization and identification. Journal of Experimental Psychology: General 130, 77. [DOI] [PubMed] [Google Scholar]
- Baldassano C, Chen J, Zadbood A, Pillow JW, Hasson U, Norman KA, 2017. Discovering Event Structure in Continuous Narrative Perception and Memory. Neuron 95, 709–721 e705. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Binder JR, Frost JA, Hammeke TA, Bellgowan PSF, Springer JA, Kaufman JN, Possing ET, 2000. Human temporal lobe activation by speech and nonspeech sounds. Cerebral Cortex 10, 512–528. [DOI] [PubMed] [Google Scholar]
- Bonte M, Hausfeld L, Scharke W, Valente G, Formisano E, 2014. Task-dependent decoding of speaker and vowel identity from auditory cortical response patterns. J Neurosci 34, 4548–4557. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Braver TS, 2012. The variable nature of cognitive control: a dual mechanisms framework. Trends in Cognitive Sciences 16, 106–113. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Callan DE, Tajima K, Callan AM, Kubo R, Masaki S, Akahane-Yamada R, 2003. Learning-induced neural plasticity associated with improved identification performance after training of a difficult second-language phonetic contrast. Neuroimage 19, 113–124. [DOI] [PubMed] [Google Scholar]
- Carpenter KL, Wills AJ, Benattayallah A, Milton F, 2016. A Comparison of the neural correlates that underlie rule-based and information-integration category learning. Human Brain Mapping 37, 3557–3574. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Chandrasekaran B, Koslov SR, Maddox WT, 2014a. Toward a dual-learning systems model of speech category learning. Frontiers in Psychology 5, 825. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Chandrasekaran B, Yi HG, Maddox WT, 2014b. Dual-learning systems during speech category learning. Psychon Bull Rev 21, 488–495. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Chevillet MA, Jiang X, Rauschecker JP, Riesenhuber M, 2013. Automatic phoneme category selectivity in the dorsal auditory stream. Journal of Neuroscience 33, 5208–5215. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Davis MH, Gaskell MG, 2009. A complementary systems account of word learning: neural and behavioural evidence. Philosophical Transactions of the Royal Society of London. Series B: Biological Sciences 364, 3773–3800. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Desai R, Liebenthal E, Waldron E, Binder JR, 2008. Left posterior temporal regions are sensitive to auditory categorization. Journal of Cognitive Neuroscience 20, 1174–1188. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Diederen KM, Spencer T, Vestergaard MD, Fletcher PC, Schultz W, 2016. Adaptive Prediction Error Coding in the Human Midbrain and Striatum Facilitates Behavioral Adaptation and Learning Efficiency. Neuron 90, 1127–1138. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Diehl RL, Lotto AJ, Holt LL, 2004. Speech perception. Annual Review of Psychology 55, 149–179. [DOI] [PubMed] [Google Scholar]
- Elliott TM, Theunissen FE, 2009. The modulation transfer function for speech intelligibility. PLoS Computational Biology 5, e1000302. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Feng G, Chen Q, Zhu Z, Wang S, 2016. Separate Brain Circuits Support Integrative and Semantic Priming in the Human Language System. Cerebral Cortex 26, 3169–3182. [DOI] [PubMed] [Google Scholar]
- Feng G, Gan Z, Llanos F, Meng D, Wang S, Wong PCM, Chandrasekaran B, 2021a. A distributed dynamic brain network mediates linguistic tone representation and categorization. Neuroimage 224, 117410. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Feng G, Gan Z, Wang S, Wong PCM, Chandrasekaran B, 2018. Task-general and acoustic-invariant neural representation of speech categories in the human brain. Cerebral Cortex 28, 3241–3254. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Feng G, Li Y, Hsu S-M, Wong PCM, Chou T-L, Chandrasekaran B, 2021b. Emerging Native-Similar Neural Representations Underlie Non-Native Speech Category Learning Success. Neurobiology of Language 2, 280–307. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Feng G, Yi HG, Chandrasekaran B, 2019. The role of the human auditory corticostriatal network in speech learning. Cerebral Cortex 29, 4077–4089. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Formisano E, De Martino F, Bonte M, Goebel R, 2008. "Who" is saying "what"? Brain-based decoding of human voice and speech. Science 322, 970–973. [DOI] [PubMed] [Google Scholar]
- Freedman DJ, Riesenhuber M, Poggio T, Miller EK, 2003. A comparison of primate prefrontal and inferior temporal cortices during visual categorization. Journal of Neuroscience 23, 5235–5246. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Giraud A-L, Poeppel D, 2012. Cortical oscillations and speech processing: emerging computational principles and operations. Nature Neuroscience 15, 511–517. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Golestani N, Zatorre RJ, 2004. Learning new sounds of speech: reallocation of neural substrates. Neuroimage 21, 494–506. [DOI] [PubMed] [Google Scholar]
- Green DM, Swets JA, 1966. Signal detection theory and psychophysics. Wiley; New York. [Google Scholar]
- Helie S, Ell SW, Ashby FG, 2015. Learning robust cortico-cortical associations with the basal ganglia: an integrative review. Cortex 64, 123–135. [DOI] [PubMed] [Google Scholar]
- Henson RNA, 2003. Neuroimaging studies of priming. Progress in Neurobiology 70, 53–81. [DOI] [PubMed] [Google Scholar]
- Hickok G, Poeppel D, 2007. The cortical organization of speech processing. Nature Reviews: Neuroscience 8, 393–402. [DOI] [PubMed] [Google Scholar]
- Jiang X, Bradley E, Rini RA, Zeffiro T, Vanmeter J, Riesenhuber M, 2007. Categorization training results in shape- and category-selective human neural plasticity. Neuron 53, 891–903. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Jiang X, Chevillet MA, Rauschecker JP, Riesenhuber M, 2018. Training Humans to Categorize Monkey Calls: Auditory Feature- and Category-Selective Neural Tuning Changes. Neuron 98, 405–416 e404. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Karuza EA, Emberson LL, Aslin RN, 2014. Combining fMRI and behavioral measures to examine the process of human learning. Neurobiology of Learning and Memory 109, 193–206. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kelly C, Uddin LQ, Shehzad Z, Margulies DS, Castellanos FX, Milham MP, Petrides M, 2010. Broca's region: linking human brain functional connectivity data and non-human primate tracing anatomy studies. European Journal of Neuroscience 32, 383–398. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kidd C, Piantadosi ST, Aslin RN, 2012. The Goldilocks Effect: Human Infants Allocate Attention to Visual Sequences That Are Neither Too Simple Nor Too Complex. PloS One 7. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kriegeskorte N, Goebel R, Bandettini P, 2006. Information-based functional brain mapping. Proceedings of the National Academy of Sciences of the United States of America 103, 3863–3868. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kriegeskorte N, Kievit RA, 2013. Representational geometry: integrating cognition, computation, and the brain. Trends in Cognitive Sciences 17, 401–412. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kriegeskorte N, Mur M, Bandettini P, 2008. Representational similarity analysis - connecting the branches of systems neuroscience. Frontiers in Systems Neuroscience 2, 1–28. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Larsson J, Smith AT, 2011. fMRI Repetition Suppression: Neuronal Adaptation or Stimulus Expectation? Cerebral Cortex. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Lee YS, Turkeltaub P, Granger R, Raizada RD, 2012. Categorical speech processing in Broca's area: an fMRI study using multivariate pattern-based analysis. Journal of Neuroscience 32, 3942–3948. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Leech R, Holt LL, Devlin JT, Dick F, 2009. Expertise with artificial nonspeech sounds recruits speech-sensitive cortical regions. Journal of Neuroscience 29, 5234–5239. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Ley A, Vroomen J, Hausfeld L, Valente G, De Weerd P, Formisano E, 2012. Learning of new sound categories shapes neural response patterns in human auditory cortex. Journal of Neuroscience 32, 13273–13280. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Lim SJ, Fiez JA, Holt LL, 2019. Role of the striatum in incidental learning of sound categories. Proceedings of the National Academy of Sciences of the United States of America 116, 4671–4680. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Maddox WT, Ashby FG, 1993. Comparing decision bound and exemplar models of categorization. Perception and Psychophysics 53, 49–70. [DOI] [PubMed] [Google Scholar]
- Maddox WT, Chandrasekaran B, Smayda K, Yi HG, 2013. Dual systems of speech category learning across the lifespan. Psychology and Aging 28, 1042–1056. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Maddox WT, Filoteo JV, Hejl KD, David A, 2004. Category number impacts rule-based but not information-integration category learning: further evidence for dissociable category-learning systems. Journal of Experimental Psychology: Learning, Memory, and Cognition 30, 227. [DOI] [PubMed] [Google Scholar]
- Mesgarani N, Cheung C, Johnson K, Chang EF, 2014. Phonetic feature encoding in human superior temporal gyrus. Science 343, 1006–1010. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Misaki M, Kim Y, Bandettini PA, Kriegeskorte N, 2010. Comparison of multivariate classifiers and response normalizations for pattern-information fMRI. Neuroimage 53, 103–118. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Mumford JA, Davis T, Poldrack RA, 2014. The impact of study design on pattern estimation for single-trial multivariate pattern analysis. Neuroimage 103, 130–138. [DOI] [PubMed] [Google Scholar]
- Mumford JA, Turner BO, Ashby FG, Poldrack RA, 2012. Deconvolving BOLD activation in event-related designs for multivoxel pattern classification analyses. Neuroimage 59, 2636–2643. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Myers EB, 2014. Emergence of category-level sensitivities in non-native speech sound learning. Frontiers in Neuroscience 8, 238. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Myers EB, Blumstein SE, Walsh E, Eliassen J, 2009. Inferior frontal regions underlie the perception of phonetic category invariance. Psychological Science 20, 895–903. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Myers EB, Swan K, 2012. Effects of category learning on neural sensitivity to non-native phonetic categories. Journal of Cognitive Neuroscience 24, 1695–1708. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Nomura EM, Maddox WT, Filoteo JV, Ing AD, Gitelman DR, Parrish TB, Mesulam MM, Reber PJ, 2007. Neural correlates of rule-based and information-integration visual category learning. Cerebral Cortex 17, 37–43. [DOI] [PubMed] [Google Scholar]
- Nomura EM, Reber PJ, 2012. Combining computational modeling and neuroimaging to examine multiple category learning systems in the brain. Brain Sci 2, 176–202. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Ohl FW, Scheich H, 2005. Learning-induced plasticity in animal and human auditory cortex. Current Opinion in Neurobiology 15, 470–477. [DOI] [PubMed] [Google Scholar]
- Pasupathy A, Miller EK, 2005. Different time courses of learning-related activity in the prefrontal cortex and striatum. Nature 433, 873–876. [DOI] [PubMed] [Google Scholar]
- Perrachione TK, Lee J, Ha LY, Wong PC, 2011. Learning a novel phonological contrast depends on interactions between individual differences and training paradigm design. Journal of the Acoustical Society of America 130, 461–472. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Radulescu A, Niv Y, Ballard I, 2019. Holistic Reinforcement Learning: The Role of Structure and Attention. Trends in Cognitive Sciences 23, 278–292. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Rauschecker JP, Scott SK, 2009. Maps and streams in the auditory cortex: nonhuman primates illuminate human speech processing. Nature Neuroscience 12, 718–724. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Reetzke R, Xie Z, Llanos F, Chandrasekaran B, 2018. Tracing the trajectory of sensory plasticity across different stages of speech learning in adulthood. Current Biology 28, 1419–1427 e1414. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Roark CL, Smayda KE, Chandrasekaran B, 2021. Auditory and visual category learning in musicians and nonmusicians. Journal of Experimental Psychology: General. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Rolls ET, Joliot M, Tzourio-Mazoyer N, 2015. Implementation of a new parcellation of the orbitofrontal cortex in the automated anatomical labeling atlas. Neuroimage 122, 1–5. [DOI] [PubMed] [Google Scholar]
- Schapiro AC, Turk-Browne NB, Botvinick MM, Norman KA, 2017. Complementary learning systems within the hippocampus: a neural network modelling approach to reconciling episodic memory with statistical learning. Philosophical Transactions of the Royal Society of London. Series B: Biological Sciences 372. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Schonwiesner M, Zatorre RJ, 2009. Spectro-temporal modulation transfer function of single voxels in the human auditory cortex measured with high-resolution fMRI. Proceedings of the National Academy of Sciences of the United States of America 106, 14611–14616. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Schultz W, Tremblay L, Hollerman JR, 1998. Reward prediction in primate basal ganglia and frontal cortex. Neuropharmacology 37, 421–429. [DOI] [PubMed] [Google Scholar]
- Schwarz G, 1978. Estimating the dimension of a model. The annals of statistics 6, 461–464. [Google Scholar]
- Seger CA, 2008. How do the basal ganglia contribute to categorization? Their roles in generalization, response selection, and learning via feedback. Neuroscience and Biobehavioral Reviews 32, 265–278. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Seger CA, Miller EK, 2010. Category learning in the brain. Annual Review of Neuroscience 33, 203–219. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Seger CA, Peterson EJ, 2013. Categorization = decision making + generalization. Neuroscience and Biobehavioral Reviews 37, 1187–1200. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Sheppard JP, Wang JP, Wong PC, 2012. Large-scale cortical network properties predict future sound-to-word learning success. Journal of Cognitive Neuroscience 24, 1087–1103. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Summerfield C, Trittschuh EH, Monti JM, Mesulam MM, Egner T, 2008. Neural repetition suppression reflects fulfilled perceptual expectations. Nature Neuroscience 11, 1004–1006. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Tagarelli KM, Shattuck KF, Turkeltaub PE, Ullman MT, 2019. Language learning in the adult brain: A neuroanatomical meta-analysis of lexical and grammatical learning. Neuroimage 193, 178–200. [DOI] [PubMed] [Google Scholar]
- Takashima A, Bakker I, van Hell JG, Janzen G, McQueen JM, 2014. Richness of information about novel words influences how episodic and semantic memory networks interact during lexicalization. Neuroimage 84, 265–278. [DOI] [PubMed] [Google Scholar]
- Tricomi E, Delgado MR, McCandliss BD, McClelland JL, Fiez JA, 2006. Performance Feedback Drives Caudate Activation in a Phonological Learning Task. Journal of Cognitive Neuroscience 18, 1029–1043. [DOI] [PubMed] [Google Scholar]
- Wang Y, Sereno JA, Jongman A, Hirsch J, 2003. fMRI evidence for cortical modification during learning of Mandarin lexical tone. Journal of Cognitive Neuroscience 15, 1019–1027. [DOI] [PubMed] [Google Scholar]
- Wickens TD, 1982. Models for Behavior: Stochastic Processes in Psychology. W.H. Freeman. [Google Scholar]
- Wong FC, Chandrasekaran B, Garibaldi K, Wong PC, 2011. White matter anisotropy in the ventral language pathway predicts sound-to-word learning success. Journal of Neuroscience 31, 8780–8785. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Wong PC, Perrachione TK, Parrish TB, 2007. Neural characteristics of successful and less successful speech and word learning in adults. Human Brain Mapping 28, 995–1006. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Yi HG, Chandrasekaran B, 2016. Auditory categories with separable decision boundaries are learned faster with full feedback than with minimal feedback (L). Journal of the Acoustical Society of America 140, 1332–1335. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Yi HG, Leonard MK, Chang EF, 2019. The encoding of speech sounds in the superior temporal gyrus. Neuron 102, 1096–1110. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Yi HG, Maddox WT, Mumford JA, Chandrasekaran B, 2016. The role of corticostriatal systems in speech category learning. Cerebral Cortex 26, 1409–1420. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Zhang Y, Kuhl PK, Imada T, Iverson P, Pruitt J, Stevens EB, Kawakatsu M, Tohkura Y, Nemoto I, 2009. Neural signatures of phonetic learning in adulthood: a magnetoencephalography study. Neuroimage 46, 226–240. [DOI] [PMC free article] [PubMed] [Google Scholar]