Abstract
Categorization is fundamental to our perception and understanding of the environment. However, little is known about the neural bases underlying the categorization of sounds. Using human functional magnetic resonance imaging (fMRI), we compared brain responses during a category discrimination task with those during an auditory discrimination task using identical sets of sounds. Our stimuli differed along two dimensions: a speech–nonspeech dimension and a fast–slow temporal dynamics dimension. All stimuli activated regions in the primary and nonprimary auditory cortices in the temporal cortex and in the parietal and frontal cortices for the two tasks. When comparing the activation patterns for the category discrimination task to those for the auditory discrimination task, the results show that a core group of regions beyond the auditory cortices, including the inferior and middle frontal gyri, the dorsomedial frontal gyrus, and the intraparietal sulcus, was preferentially activated for familiar speech categories and for novel nonspeech categories. A number of studies have shown that these regions play a role in working memory tasks. Additionally, the categorization of nonspeech sounds activated the left middle frontal gyrus and the right parietal cortex to a greater extent than did the categorization of speech sounds. Processing the temporal aspects of the stimuli had a greater impact on the left lateralization of the categorization network than did other factors, particularly in the inferior frontal gyrus, suggesting that there is no inherent left hemisphere advantage in the categorical processing of speech stimuli, or for the categorization task itself. Hum Brain Mapp, 2005. © 2005 Wiley‐Liss, Inc.
Keywords: brain, human, functional magnetic resonance imaging, phonemes, auditory, hemispheric asymmetry
INTRODUCTION
Categorization is basic to cognition, and an understanding of how we categorize is central to understanding how we think and function [Lakoff,1987]. Categorization stands at the transition between sensory processing and higher‐level cognition [Freedman et al.,2001] and allows us to reduce incoming sensory complexity into tractable concepts by focusing on salient and invariant aspects of the stimuli. Some researchers [e.g., Lotto,2000] have suggested that language acquisition itself can be seen as complex category formation. Categorization is a broad term encompassing a range of mental faculties involved in recognizing and labeling an incoming stimulus as belonging to a group and using this identification in further mental processing. With such a broad definition of categorization, many studies can be said to have used the categorization paradigm; several tasks involving short‐term memory that are concerned with groups of stimuli can also be interpreted as categorization tasks [e.g., Haxby et al.,1991; Ishai et al.,2002; Shah et al.,2001]. In the present study, we investigate the categorization of speech and nonspeech sounds employing two delayed match‐to‐sample (DMS) tasks, one of which explicitly makes use of category knowledge.
Although the categorization of speech sounds has been studied extensively [Liberman,1996; Repp,1984; Stevens,1980], there have been few neuroimaging studies investigating the neural basis of the categorization of speech and nonspeech sounds. Categorization of speech sounds (phonemes) is learned early in life and is automatic. Other sound categories can be learned later in life, but may require more conscious effort when making decisions about category membership. For instance, students of music are taught to differentiate between a crescendo and a diminuendo. Whereas some researchers have argued for specialized processing of speech and its concomitant categorization [Liberman et al.,1961; Mattingly,1972], others have suggested that speech processing utilizes a general framework of neural resources used by other auditory processing functions [Joanisse and Gati,2003; Massaro,1997]. We used functional magnetic resonance imaging (fMRI) to investigate whether the neural circuitry involved in mediating the processing of familiar phonemic categories also mediates the processing of relatively novel categories of nonspeech sounds. We studied the neural mechanisms that mediate categorization after the acquisition of the categories, not the process of learning these categories.
As a matter of clarification, we want to distinguish categorization from categorical perception. Categorical perception is a special case of categorization and is limited to a special set of stimuli. It is characterized by a loss of discrimination ability for exemplars within a category and enhanced discrimination ability at the boundaries between two categories. It is also characterized by identification performance at ceiling for exemplars within a category and at chance at category boundaries. We are not studying categorical perception as it is so defined; rather, we are investigating one aspect of categorization, wherein the input stimulus is recognized and identified as belonging to a group. This label is then used to compare the stimulus with another labeled in a similar manner. Furthermore, our discrimination tasks, described below, differ from the discrimination task generally used in categorical perception studies primarily in the instructions given to the observers regarding the discrimination decision.
In this fMRI study, we contrasted an explicit category discrimination task with an auditory discrimination task to determine the extra neural resources utilized by categorization. Both these tasks involved discrimination of pairs of stimuli in a DMS task and both tasks featured the same set of sounds to account for stimulus‐specific factors. In the category discrimination task, the listeners distinguished the two sounds based on the categories to which they belonged, whereas in the auditory discrimination task, the listeners discriminated between the two sounds based on their acoustic features.
The sound categories differed not only along the speech–nonspeech dimension, but also differed in having relatively faster and slower (in our case, steady‐state) acoustic dynamics. We included stimuli with different temporal dynamics to investigate the effect of such dynamics on the categorization of sounds. Several researchers [Hickok and Poeppel,2000; Tallal et al.,1993; Zatorre et al.,2002] have suggested that the left hemisphere is better suited for processing of fast temporal information compared to the right hemisphere; such preference may influence the network of brain regions mediating categorization of fast speech and nonspeech sounds. The stimuli were pure tones (nonspeech, slow dynamics), tonal contours (sequences of up‐ and down‐FM (frequency modulated) sweeps interspersed with a steady‐state portion; nonspeech, fast dynamics), vowels (speech, slow dynamics), and consonant–vowel syllables (speech, fast dynamics). For the sake of brevity, we describe the stimuli as having the properties of speech/nonspeech and fast/slow dynamics; they are described in greater detail below. Each stimulus type was divided into two categories, with five exemplars in each category. For tones, the categories were low‐frequency and high‐frequency tones; for tonal contours the categories were up‐down sweeps and down‐up sweeps; for vowels they were /a/ and /i/; and for syllables they were /ba/ and /pa/ sounds. The subject heard an identical set of sounds in both discrimination tasks. Subjects were trained before scanning to familiarize them with the tasks and the novel categories of low‐ and high‐frequency tones and the up‐down and down‐up sweeps. We have shown previously [Guenther et al.,1999] that relatively brief training can result in induction of categories for novel classes of sounds. Subjects were also trained on the speech categories, although they were familiar with these since early childhood.
We wanted to determine the set of brain regions activated in the category discrimination task when contrasted with an auditory discrimination task. As both of these tasks involve discrimination of two stimuli, we can subtract out the task factors related to the discrimination. Sounds belonging to the familiar speech categories may be implicitly categorized regardless of the task, and therefore the subtraction may reveal differences between overt and covert category discrimination. We hypothesize that the category‐discrimination task will involve an additional mental step of assigning the stimulus to a category that the auditory discrimination task does not involve. This should result in longer reaction times in the category discrimination task.
SUBJECTS AND METHODS
Subjects
Twelve right‐handed, normal volunteer subjects (5 women, 7 men; age range 20–45 years) participated in the study. All subjects underwent neurological and audiological tests before scanning. Individuals with any condition that may have compromised their hearing ability in the scanner were excluded from the study. Subjects gave their informed consent to the study, which was approved by the NIDCD‐NINDS IRB (protocol NIH 92‐DC‐0178), and were suitably compensated for their participation in the study.
Stimuli
There were four types of stimuli that differed from each other primarily along two dimensions: in terms of dynamics (steady state to rapidly changing), and in terms of their speech‐like properties (see Fig. 1). The stimuli were pure tones, tonal contours, vowels, and consonant–vowel (CV) syllables. The nonspeech stimuli were also less complex than the speech‐like stimuli. The stimulus types were divided into two categories, with five exemplars in each category. The pure tones consisted of five low tones (500, 550, 600, 650, and 700 Hz) and five high tones (1,500, 1,600, 1,700, 1,800, and 1,900 Hz). The tonal contours were a combination of up‐ and down‐sweeps (125 ms) interspersed with a 1,000‐Hz steady‐state pure tone (100 ms). There were five up‐and‐down and five down‐and‐up tonal contours. In each tonal contour, the starting and ending frequencies were identical. The five up‐down tonal contours had starting and ending frequencies of 125, 189, 287, 435, and 659 Hz. The down‐up tonal contours had starting and ending frequencies of 1,246, 1,552, 1,933, 2,408, and 3,000 Hz. The frequencies were chosen so as to be equispaced on a log‐frequency (Mel) scale. The tones and tonal contours were generated using LabVIEW (National Instruments Corporation, Austin, TX). The vowels and syllables were generated using HLsyn software (Sensimetrics Corporation, Cambridge, MA). The vowels consisted of five examples of /a/ and five examples of /i/, and the CV syllables consisted of five examples of /ba/ and five examples of /pa/ sounds. The vowels and the vowel portions of the CVs were variations of the male, female, and child formant values of these vowels as reported in Peterson and Barney [1952], thus varying primarily in the pitch of the stimuli. The /a/ vowel tokens were, in the format of (f0, f1, f2, f3): (100, 720, 1,100, 2,500); (130, 720, 1,200, 2,810); (210, 800, 1,300, 2,810); (220, 830, 1,350, 2,900); and (256, 1,000, 1,500, 3,170).
The /i/ vowel tokens were: (136, 270, 2,290, 3,010); (180, 290, 2,400, 3,100); (232, 310, 2,600, 3,200); (255, 340, 2,700, 3,300); and (272, 370, 3,200, 3,730). The consonant portions of the /ba/ and /pa/ tokens were synthesized based on the work of Hanson and Stevens [2000,2002]. As an example, one /ba/ token had f0 rising from 105 to 110 Hz between 0 and 55 ms, remaining constant until 70 ms, then rising to 130 Hz at 150 ms, and finally dropping gradually to 80 Hz at 350 ms. Voicing began 50 ms after stimulus onset for the /ba/ stimuli and 90 ms after onset for the /pa/ stimuli.
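The endpoint frequencies of the tonal contours listed above are equispaced on a log-frequency axis, i.e., they form geometric series. A minimal sketch (the helper name `log_spaced` is ours, not from the study) reproduces the listed values after rounding:

```python
def log_spaced(f_start, f_end, n):
    """Return n frequencies equally spaced on a log-frequency axis
    between f_start and f_end (a geometric progression)."""
    ratio = (f_end / f_start) ** (1.0 / (n - 1))
    return [f_start * ratio ** i for i in range(n)]

# Endpoint frequencies (Hz) of the five up-down tonal contours
up_down = [round(f) for f in log_spaced(125, 659, 5)]

# Endpoint frequencies (Hz) of the five down-up tonal contours
down_up = [round(f) for f in log_spaced(1246, 3000, 5)]
```

Rounding the geometric progressions between the stated outer frequencies yields exactly the intermediate values reported in the text.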
Figure 1.

Types of stimuli used in the experiment, represented here by example spectrograms. The stimulus types varied along the dimension of speech‐like to nonspeech‐like, and the dimension of having slow to fast transients in the stimuli. Each of the four stimulus types (Tones, Vowels, Tonal Contours, and Syllables) is divided into two categories, with five exemplars within each category. Depicted in the figure are examples of the different categories. (Exact values of the stimuli along the frequency–time dimensions are listed in Subjects and Methods.)
All stimuli were 350 ms in duration and matched for loudness. First, the amplitude of the stimuli was adjusted so that all sounds had the same average root‐mean‐square (RMS) power. Second, in a pilot study, subjective ratings were obtained from four volunteers for the relative loudness of all stimuli within the scanner, with the parameters of the scanning procedure identical to those of the experiment. The sounds were adjusted to reflect the average of these ratings. At the beginning of each scanning session, subjects were able to adjust the loudness of the sounds they heard, and sounds were played at this maximum comfort level (MCL). Postexperiment tests revealed that the MCL was 12–18 dB above the scanner noise.
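The first step of the loudness matching, equalizing average RMS power across waveforms, can be sketched as follows (the function name and target level are illustrative; the study's actual target level is not reported):

```python
import math

def equalize_rms(signals, target_rms=0.1):
    """Scale each waveform so that all have the same average RMS power.

    `signals` is a list of sample sequences; returns scaled copies.
    The target_rms value is an illustrative placeholder.
    """
    out = []
    for s in signals:
        rms = math.sqrt(sum(x * x for x in s) / len(s))
        gain = target_rms / rms
        out.append([x * gain for x in s])
    return out
```

After this step, every stimulus has identical RMS power; the subsequent perceptual adjustment described above then compensates for loudness differences that RMS equalization alone does not capture.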
Tasks
Subjects carried out a DMS task (see Fig. 2). The subjects listened to two sounds, separated by an interval, and decided if the two sounds were the same or different. During separate blocks, the decision was based on one of two criteria: (1) are the two sounds exactly the same? or (2) do the two sounds belong to the same category? The first type of task is referred to as auditory discrimination (AUD) and the second type as category discrimination (CAT). Each trial was 3.7 s in duration, with two stimuli of 350 ms each, an interstimulus interval of 1 s, and 2 s of subject‐response time. The response time was fixed, regardless of the time taken by the subject to indicate the response.
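As an arithmetic check on the trial structure, the stated components sum to the 3.7-s trial duration:

```python
# Components of one DMS trial, in seconds (values from the text)
STIM = 0.350       # duration of each of the two stimuli
ISI = 1.0          # interstimulus interval
RESPONSE = 2.0     # fixed response window

TRIAL = STIM + ISI + STIM + RESPONSE  # total trial duration, 3.7 s
```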
Figure 2.

Timeline of an fMRI trial. Each trial is a delayed‐match‐to‐sample task, where pairs of sounds are played interspersed with a delay and the subject responds after hearing the second stimulus.
Training
Subjects were trained outside the magnet before being scanned. Training generally lasted half an hour, with training for each stimulus type taking about 7 min. During each mini‐training session for a particular stimulus type, subjects listened passively to the five sounds from each of the two categories, and then participated in tasks to discriminate between pairs of sounds based on either criterion (1) or (2). The categories were identified only as “A” and “B” so as to be uniform (not all stimulus types had easily identifiable categories). To the subjects, category A was simply the set of five sounds they were told belonged to category A; they were not given labels like “up‐down glides” (the stimulus categories are given names in Fig. 1 for the reader's convenience). Although there were separate training sessions for each stimulus type, the instructions and training were identical in each case. Subjects were given feedback during the tasks and repeated a specific training session until their performance exceeded a threshold of 85% (62 correct/72 trials) for that stimulus type. During scanning, before each test run, subjects were familiarized with the type of sounds that they would discriminate in the following run by listening passively to examples of the two categories of that sound type.
Scanning
Subjects were scanned using an echo planar imaging (EPI) technique in a block design paradigm in which 22 axial slices were collected on a 1.5‐Tesla Signa scanner (General Electric, Waukesha, WI). Stimuli were presented using Presentation software (Neurobehavioral Systems, Albany, CA) on a PC laptop computer. Subjects listened to the sound stimuli through headphones (Commander XG; Resonance Technology, Northridge, CA) that were attached to an air conductance sound delivery system. There were four separate functional EPI runs, one for each type of stimulus. Within an EPI run, subjects underwent seven rest blocks and six test blocks. Within a given test block, there were 12 DMS trials, all of the same type, either AUD or CAT. The instructions as to which criterion to use for the DMS tasks within a given block were given visually 10 s before the beginning of the test block (during the preceding rest block) and during the entire period of the block. The first rest block (fixating on an asterisk on screen) was 42 s long; subsequent rest blocks were 30 s long. Each test block was 44.5 or 45 s long. The last rest block was 33 s long. The total time was 8 min 12 s (164 repetitions of a 3‐s TR) for each run. The first four and the last volumes were discarded. Sagittal localizer and anatomical scans were run before the T2‐weighted functional EPI (epilx1) scans. The functional scans consisted of 22 interleaved axial slices that were 5 mm thick with 3.75 × 3.75 mm in‐plane resolution (echo time [TE] = 40 ms, matrix size 64 × 64, field of view [FOV] = 240 mm, 90‐degree flip angle).
Analysis
The analysis was carried out using the statistical parametric mapping (SPM) software (SPM99; Wellcome Dept. of Cognitive Neurology, London, UK; online at http://www.fil.ion.ucl.ac.uk/spm). The image volumes were corrected for slice time differences, realigned, normalized into standard stereotactic space (using the Montreal Neurological Institute [MNI] EPI template provided with the SPM99 software), and smoothed with a Gaussian kernel (8 mm full‐width half‐maximum). The normalized and smoothed fMRI time series of the 12 volunteers were subjected to group analysis. The data were rescaled for variations in global signal intensity and high‐pass filtered to remove the effect of any low‐frequency drift. Using a general linear model [Friston et al.,1995], a mixed‐effects analysis was conducted at two levels. First, a fixed‐effects analysis was carried out for individual subjects (P < 0.001, uncorrected) and contrast images were created for the different task conditions. Next, the contrast images from the first level were used to carry out a t‐test (random‐effects analysis) that determined the significantly activated voxels.
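At the second level, the random-effects analysis amounts to a one-sample t-test across the per-subject contrast values at each voxel. A toy single-voxel sketch (the contrast values here are invented for illustration, and the helper name is ours):

```python
import math

def one_sample_t(values):
    """One-sample t statistic against zero, as applied at the second
    (random-effects) level to per-subject contrast values for one voxel."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)  # sample variance
    return mean / math.sqrt(var / n)

# Hypothetical CAT - AUD contrast values for one voxel across 12 subjects
contrast = [0.8, 1.1, 0.5, 0.9, 1.3, 0.7, 1.0, 0.6, 1.2, 0.9, 0.8, 1.0]
t = one_sample_t(contrast)  # df = 11; compared against the chosen threshold
```

Because the test is computed on one contrast value per subject, between-subject variability enters the error term, which is what licenses inference to the population.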
Our a priori hypothesis is that a distributed network is involved in the categorization of sounds. To ensure that all nodes of this network are included, it is important to strike an adequate balance between type I and type II errors. Therefore, to avoid the inadvertent exclusion of critical nodes in the network, a slightly relaxed statistical threshold was used (in most cases either P < 0.001 or P < 0.005, uncorrected).
RESULTS
Behavioral Data
The stimulus presentation program recorded the responses of the subjects to the DMS task via button presses. Both reaction time and accuracy of responses were analyzed for 11 subjects (one subject whose behavioral responses were inadvertently not recorded was excluded from the behavioral analysis) using two‐way repeated‐measures analysis of variance (ANOVA). The analysis of the reaction time data showed a main effect of task type (F[1,10] = 4.75, P = 0.03) and of stimulus type (F[3,30] = 2.72, P = 0.0497). There was no significant interaction of stimulus and task types, although the 11 subjects showed the following trend: within a stimulus type, reaction times were longer for the CAT than for the AUD tasks, as shown in Figure 3a. The analysis of the correct responses revealed a significant main effect of task type (F[1,10] = 4.29, P = 0.04); the interaction of stimulus and task types was also significant (F[3,30] = 3.06, P = 0.03). Subjects performed at greater than 90% accuracy on both tasks for all stimulus types; nevertheless, their accuracy was significantly better (two‐tailed t‐test, P < 0.05) for the AUD task than for the CAT task for the Tones and Syllables stimuli (see Fig. 3b).
Figure 3.

a: Subjects' reaction times during the different tasks (auditory or category discrimination) for all the four stimulus types. b: Subjects' responses to the different tasks (auditory or category discrimination) for all the four stimulus types. *Significance in t‐tests (two‐tailed, P < 0.05).
Imaging Data
As a first step in the analysis of the imaging data, we determined the active voxels for each stimulus type separately, for both the CAT and the AUD conditions when compared to a baseline Rest task. For each stimulus type, the contrast revealed activated voxels bilaterally in primary and secondary auditory areas in the temporal cortex, in areas in the frontal and parietal lobes, and in the pre‐ and postcentral gyri. As an example, the activation patterns for tonal contours (TCs) for the contrasts AUD‐Rest and CAT‐Rest are depicted in Figure 4.
Figure 4.

Activation from the SPM z‐maps projected onto surface of a single‐subject template brain. The contrasts depicted are for the Tonal Contours as an example of the subtractions Category discrimination − Rest and Auditory discrimination − Rest for the random‐effects analysis thresholded at P < 0.005, uncorrected.
To determine the areas involved in categorization, we contrasted the CAT and AUD conditions (with Rest implicitly subtracted from each). Figure 5 (see also Table I) shows the areas with significantly (P < 0.005, uncorrected, cluster size >20 voxels) more activity in the CAT than in the AUD conditions for the different stimulus types. As expected, the primary and secondary auditory areas and the supplementary motor areas, activated relative to Rest in both the AUD and CAT conditions, were absent in these comparison maps. The Tones CAT‐AUD contrast revealed significantly activated voxels in three main clusters: in the left and right inferior frontal gyri (more voxels in the left hemisphere) and in the left superior parietal lobule. Additionally, the right middle frontal gyrus, inferior parietal lobule, and precuneus were activated to a smaller extent. The Tonal Contours (TCs) CAT‐AUD contrast showed large clusters of activity in the inferior frontal gyrus, the inferior parietal lobule, and the middle temporal gyrus, all in the left hemisphere. There were smaller activation clusters in the left dorsomedial frontal gyrus and the right occipital lobe. The Vowels condition demonstrated higher activity for CAT than for AUD predominantly in the bilateral inferior frontal gyrus, intraparietal sulcus, and putamen (in all cases the left hemisphere was more active than the right) and the right dorsomedial frontal gyrus. There were smaller clusters in the right middle frontal gyrus, caudate nucleus, putamen, and cerebellum. Compared to the other stimulus conditions, the Syllables condition exhibited the fewest significantly activated voxels in the CAT‐AUD contrast. In Figure 5, the Syllables condition is therefore shown at two thresholds, P < 0.005 and P < 0.01. In Table I, the Syllables CAT‐AUD contrast is reported with a more relaxed threshold of cluster size greater than 10 voxels (P < 0.005).
The significantly activated voxels for the CAT‐AUD contrast were located in the bilateral middle frontal gyrus and in the left inferior frontal gyrus, thalamus, inferior parietal lobule, and dorsomedial frontal gyrus.
Figure 5.

Activation from the SPM z‐maps in the axial view. The contrasts depicted are for the four stimulus types for the subtraction Category discrimination − Auditory discrimination (or CAT − AUD) for the random‐effects analysis rendered on a single‐subject T1‐weighted structural template, thresholded at P < 0.005, uncorrected. In addition, the Syllables conditions contrast is shown for P < 0.01.
Table I.
Locations of the local maxima for the contrast category discrimination – auditory discrimination for the four stimulus types using a random‐effects analysis
| Stimulus | MNI coordinates (x, y, z) | Z score | Cluster size | Gyrus (Brodmann area) |
|---|---|---|---|---|
| Tones | −45, 33, 24 | 4.43 | 273 | IFG (45/46) |
| 48, 15, 30 | 3.79 | 167 | IFG (44/46) | |
| −27, −66, 54 | 3.71 | 120 | SPL (7) | |
| 30, −63, 45 | 3.48 | 21 | IPL (7) | |
| 33, 48, 18 | 3.34 | 29 | MFG (46) | |
| 6, −69, 45 | 3.30 | 24 | Precuneus (7) | |
| Tonal contours | −54, −51, −12 | 3.95 | 89 | MTG (37) |
| 0, 18, 57 | 3.66 | 24 | DMFG (6,8) | |
| −39, −57, 51 | 3.56 | 162 | IPL (40) | |
| 6, −78, −15 | 3.46 | 29 | Occipital lobe, lingual G (18) | |
| −51, 33, 18 | 3.40 | 147 | IFG, MFG (45, 46) | |
| Vowels | −27, −54, 48 | 4.28 | 106 | SPL (7) |
| −24, 18, 0 | 4.09 | 104 | Putamen | |
| −45, 18, 27 | 4.04 | 340 | IFG (45, 46) | |
| 6, −27, 0 | 3.74 | 43 | Midbrain | |
| 9, 33, 42 | 3.67 | 124 | DMFG (8) | |
| 12, −3, 21 | 3.66 | 21 | Caudate nucleus | |
| −42, −42, 48 | 3.61 | 47 | IPL (40) | |
| 18, 9, 3 | 3.59 | 61 | Putamen | |
| 33, 51, 18 | 3.50 | 21 | MFG (46) | |
| 54, 18, 33 | 3.33 | 66 | MFG, IFG (9, 44) | |
| 12, −45, −27 | 3.31 | 45 | Cerebellum, culmen | |
| 45, 3, 42 | 3.20 | 28 | MFG (9) | |
| Syllables | −3, −15, 0 | 4.18 | 28 | Thalamus |
| −48, −48, 48 | 3.97 | 30 | IPL (40) | |
| 0, 24, 54 | 3.91 | 50 | DMFG (8) | |
| −30, 27, 0 | 3.71 | 16 | IFG (47) | |
| −36, 54, 15 | 3.68 | 33 | MFG (10) | |
| −51, 33, 15 | 3.62 | 13 | IFG (46) | |
| −36, −63, 45 | 3.42 | 16 | IPL (7) | |
| 39, 51, 12 | 3.36 | 27 | MFG (10) |
Locations are listed in Montreal Neurological Institute (MNI) coordinates and in terms of Brodmann areas (the MNI coordinates were converted into Talairach system before determining the corresponding Brodmann area). Cluster sizes for these activations are >20 voxels for all stimuli, except Syllables, which are >10 voxels.
P < 0.005, uncorrected for random effects analysis.
I, inferior; F, frontal; G, gyrus; S, superior; P, parietal; L, lobule; M, middle; DM, dorsomedial.
We also compared the AUD with the CAT conditions. Whereas TCs and Vowels showed few significant clusters of activated voxels (compared to their CAT‐AUD contrast images), the Tones and Syllables conditions exhibited several clusters located predominantly in left posterior cingulate, right parahippocampal gyrus, and dorsomedial frontal gyrus. There were additional clusters in the right cerebellum, right posterior cingulate, and left middle temporal gyrus for the Syllables condition and in the right superior/middle temporal gyrus for the Tones condition.
To determine whether areas activated in the CAT‐AUD contrasts were due to deactivation in the AUD condition, we inclusively masked the CAT‐AUD contrast with the CAT‐Rest contrast. This was carried out by creating a group CAT‐Rest image using random effects with a generous threshold (P < 0.05, uncorrected) and using this image as an explicit mask (logical AND) on the CAT‐AUD images at the second level of random effects (P < 0.005, uncorrected), thereby removing the effects of deactivations in the AUD‐Rest condition. The results of the masked CAT‐AUD contrast were not substantially different from those of the nonmasked CAT‐AUD contrast; the difference lay primarily in cluster size, with smaller clusters being activated in the masked contrast conditions.
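The masking step reduces to a voxelwise logical AND of two thresholded maps: a voxel is retained only if it survives the CAT-AUD contrast and is also active in CAT relative to Rest. A toy sketch with hypothetical p-values:

```python
# Toy 1-D "maps" of voxel p-values (values invented for illustration)
cat_aud_p = [0.001, 0.004, 0.20, 0.002]   # CAT - AUD contrast
cat_rest_p = [0.01, 0.30, 0.01, 0.04]     # CAT - Rest contrast (the mask)

survives_contrast = [p < 0.005 for p in cat_aud_p]   # contrast threshold
in_mask = [p < 0.05 for p in cat_rest_p]             # generous mask threshold

# Logical AND: keep only CAT-AUD voxels that are also active vs. Rest,
# excluding voxels whose CAT-AUD difference is driven by AUD deactivation
masked = [a and m for a, m in zip(survives_contrast, in_mask)]
```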
Because reaction time in the CAT task was always longer than that in the AUD task (although this difference was significant only for the Tones condition), some might argue that the categorization task reflects the same process as the auditory discrimination task, only involving more effort. To assess the effect of performance, we used performance (reaction time for CAT − reaction time for AUD, one mean value per subject) as a nuisance covariate and carried out a regression at the second level of the random‐effects analysis for the CAT‐AUD contrast. The brain region most correlated with performance was the dorsomedial frontal gyrus, which was especially prominent in the Tones condition but absent in the Syllables condition.
Conjunction Analysis
To investigate the underlying categorization network common to different dimensions of the stimulus types, we carried out conjunction analysis [Friston et al.,1999; Price and Friston,1997; and for an example see Janata et al.,2002] for pairs of stimulus types for the CAT‐AUD contrast. The results are shown in Figure 6 and Table II (P < 0.001, uncorrected). Although conjunction analysis has its problems [Caplan and Moo,2004], it is applicable to our experimental design because the CAT‐AUD subtraction is narrowly cognitively constrained, as the two tasks differ only in the decision criterion, and a conjunction of the two subtractions (differing only in the stimulus type) can yield important information. The conjunction analysis was carried out using two‐sample t‐tests at the second level of random‐effects analysis, without inclusion of the constant factor to account for nonsphericity.
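Conjunction analyses of this era were commonly based on the minimum statistic: a voxel survives the conjunction of two contrasts only if it is significant in both, i.e., if the smaller of the two statistics exceeds the threshold. A toy voxelwise sketch of that logic (values invented; this illustrates the general idea rather than the exact SPM99 implementation used here):

```python
# Toy z-maps for the CAT - AUD contrast of two stimulus types
z_tones = [4.2, 1.1, 3.6, 0.4]
z_vowels = [3.8, 3.5, 1.0, 0.2]

Z_THRESH = 3.09  # roughly P < 0.001, one-tailed

# Minimum-statistic conjunction: both contrasts must exceed threshold,
# which is equivalent to thresholding the voxelwise minimum
conjunction = [min(a, b) > Z_THRESH for a, b in zip(z_tones, z_vowels)]
```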
Figure 6.

Activation from the SPM z‐maps in the axial view. The contrasts depicted are for pooled random‐effects analysis of all the four stimulus types for the subtraction Category discrimination (CAT) − Auditory discrimination (AUD) (P < 0.05, corrected) and for the conjunctions of the following stimulus conditions Tones + Tonal Contours, Vowels + Syllables, Tones + Vowels, Tonal Contours + Syllables (P < 0.001, uncorrected).
Table II.
Locations of the local maxima for the contrast category discrimination – auditory discrimination for the four stimulus types using conjunction and pooled random‐effects analysis
| Stimulus | MNI coordinates (x, y, z) | Z score | Cluster size | Gyrus (Brodmann area) |
|---|---|---|---|---|
| All (pooled) | −33, −60, 48 | 5.99 | 109 | IPL, SPL (7,40) |
| −45, 33, 21 | 5.89 | 66 | IFG, MFG (46, 45) | |
| −30, 27, 0 | 5.55 | 9 | IFG (47) | |
| 36, 51, 15 | 5.44 | 11 | MFG (10) | |
| −3, 24, 51 | 5.36 | 24 | DMFG (6,8) | |
| 51, 15, 30 | 5.24 | 11 | IFG (44,9) | |
| Tones * Vowels | −48, 18, 24 | 5.74 | 325 | IFG, MFG (44, 46) |
| 54, 15, 33 | 5.22 | 250 | IFG, MFG (46, 44, 45) | |
| −30, −57, 48 | 5.19 | 320 | IPL, SPL (7, 40) | |
| 33, 51, 18 | 5.19 | 62 | MFG (46) | |
| 42, 3, 54 | 4.24 | 45 | MFG (6) | |
| 30, −63, 48 | 4.14 | 48 | IPL (7) | |
| −36, 3, 54 | 4.12 | 43 | MFG (6) | |
| −6, 27, 48 | 3.66 | 67 | DMFG (8) | |
| TCs * Syllables | −45, −51, 51 | 5.62 | 107 | IPL (40, 7) |
| 0, 24, 54 | 5.12 | 47 | DMFG (8) | |
| 33, −93, −12 | 5.11 | 30 | IOG (18, 17) | |
| −51, 33, 15 | 4.85 | 146 | IFG (45, 44, 46) | |
| 39, 51, 12 | 4.67 | 49 | MFG, SFG (46, 10) | |
| −51, 15, 27 | 4.04 | 30 | IFG (44, 46) | |
| 36, −72, 51 | 3.93 | 29 | SPL (7) | |
| Tones * TCs | −39, −57, 51 | 5.83 | 286 | IPL, SPL (40, 7) |
| −42, 36, 24 | 5.63 | 393 | IFG, MFG (44, 46) | |
| 39, 3, 54 | 4.72 | 25 | MFG (6) | |
| −33, 30, −3 | 4.60 | 26 | IFG (47) | |
| 30, 48, 24 | 4.42 | 169 | IFG, SFG (44, 46, 10) | |
| −27, 12, 54 | 4.15 | 22 | MFG (8) | |
| −3, 21, 54 | 3.99 | 47 | DMFG (8) | |
| −39, 39, 0 | 3.98 | 38 | MFG (10) | |
| 42, −45, 51 | 3.96 | 27 | IPL, SPL (40, 7) | |
| 33, −63, 42 | 3.91 | 55 | IPL, SPL (40, 7) | |
| Vowels * Syllables | −30, 30, 0 | 5.58 | 31 | IFG (47) |
| −6, −18, 0 | 5.10 | 64 | Thalamus | |
| 36, 51, 15 | 4.72 | 56 | MFG (10) | |
| −45, −51, 48 | 4.67 | 99 | IPL, SPL (40, 7) | |
| 0, 27, 51 | 4.60 | 64 | DMFG (8) | |
| 57, 12, 27 | 4.58 | 41 | IFG (44) | |
| −51, 15, 27 | 4.35 | 22 | IFG (44) |
P < 0.001, uncorrected for conjunction random‐effects analysis; P < 0.05, corrected for pooled random‐effects analysis.
Locations are listed in Montreal Neurological Institute (MNI) coordinates and in terms of Brodmann areas (the MNI coordinates were converted into Talairach system before determining the corresponding Brodmann area). Cluster sizes for these activations are >20 voxels for conjunction analysis and >10 voxels for pooled analysis.
TCs, tonal contours; I, inferior; P, parietal; L, lobule; S, superior; M, middle; F, frontal; G, gyrus; DM, dorsomedial; O, occipital.
Nonspeech stimuli (the conjunction of Tones and TCs) demonstrated more activity for CAT than for AUD in bilateral middle and inferior frontal gyri, and inferior and superior parietal lobules, with the left hemisphere activations greater than the right hemisphere activations. Additionally, the nonspeech stimuli exhibited activation in the dorsomedial frontal gyrus. Speech stimuli (Vowels and Syllables) together activated bilateral inferior frontal gyrus, right middle frontal gyrus, left inferior and superior parietal lobules, left thalamus, and dorsomedial frontal gyrus. There was no activation in the left middle frontal gyrus and right posterior parietal areas for this conjunction.
Stimuli having relatively slow temporal dynamics (the conjunction of Tones and Vowels) exhibited greater activity for CAT compared to that for AUD in bilateral inferior and middle frontal gyri (both comparable in number of voxels), left posterior (minor activation in right posterior) parietal cortex, and left dorsomedial frontal gyrus. Stimuli with fast temporal dynamics (TCs and Syllables) activated left posterior parietal cortex and left inferior frontal gyrus. There were minor activations in right middle frontal gyrus, dorsomedial frontal gyrus, right superior parietal lobule, and right occipital gyrus. There was no activation in right inferior frontal gyrus for this last conjunction.
To determine the common areas activated for all stimulus types, we pooled the data from the four conditions together and carried out a random‐effects analysis (i.e., used the contrast images from the first level analysis of all stimulus conditions for a one‐sample t‐test). We elected to carry out a pooled analysis because SPM99 does not permit conjunction of four conditions (or conjunction of a conjunction) in a straightforward manner. Earlier we had verified that for two conditions, the results of a conjunction analysis and a pooled random‐effects analysis did not substantially differ, i.e., the samples did not differ markedly in their variance. The pooled analysis (P < 0.05, corrected; Fig. 6, Table II) showed that left superior and inferior parietal lobules, left inferior and middle frontal gyri, and left dorsomedial frontal gyrus had more activity for all stimulus types in the category discrimination condition than during the auditory discrimination condition. There were minor activations in the right inferior and middle frontal gyri.
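The pooled second-level analysis amounts to a voxelwise one-sample t-test against zero over the first-level contrast images from all subjects and all four stimulus conditions. A toy sketch with simulated contrast values (the array sizes, effect sizes, and "active" voxel set below are hypothetical, not the study's data):

```python
import numpy as np
from scipy import stats

# Simulated first-level CAT-AUD contrast images: 12 subjects x 4 stimulus
# conditions, each flattened to n_vox voxels.
rng = np.random.default_rng(0)
n_subj, n_cond, n_vox = 12, 4, 1000
contrasts = rng.normal(0.0, 1.0, size=(n_subj, n_cond, n_vox))
contrasts[..., :50] += 0.8          # simulate a set of truly active voxels

# Pool all conditions' contrast images into one sample, then run a
# voxelwise one-sample t-test against zero (the pooled random-effects test).
pooled = contrasts.reshape(n_subj * n_cond, n_vox)
t, p = stats.ttest_1samp(pooled, popmean=0.0, axis=0)

active = p < 0.05                    # uncorrected threshold, illustration only
print(active[:50].mean(), active[50:].mean())
```

In practice the threshold would be corrected for multiple comparisons (P < 0.05, corrected, as in the text) and a cluster-extent criterion applied.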
Lateralization Analysis
To verify that the lateralization effects discussed above were not due to the threshold that we set, we carried out the following analysis, which compares homologous left-right differences directly. First, the initial processing of the fMRI time series of the 12 volunteers was carried out as described above. The normalized and smoothed fMRI time series were then right-left flipped about the y‐axis. Using a general linear model (fixed effects), significantly activated voxels were determined for each subject, yielding CAT′ and AUD′ conditions for the flipped data. We then tested for significant effects of the contrast ([CAT − AUD] − [CAT′ − AUD′]) by carrying out a random‐effects analysis at the second level for all subjects. The significant voxels (P < 0.05, uncorrected) in this contrast, for all four stimulus conditions, are depicted in Figure 7. This lenient threshold was chosen deliberately so that even small hemispheric differences could reach significance. For the Tones and the Vowels conditions, the significant voxels were located primarily in the right frontal areas and the left posterior areas. Additionally, there was an absence of significant voxels in the inferior frontal gyri for these conditions, indicating bilaterally symmetric activation there. In the TCs and Syllables conditions, the significant voxels were found in the left frontal areas and the right posterior areas; the Syllables condition, however, also showed left posterior activations. These results demonstrate that the lateralization effects found earlier, especially in the frontal areas, are robust and not due to the threshold that was used. Furthermore, they indicate that, for the CAT‐AUD contrast, the lateralization effects of the stimulus types are greater along the slow‐to‐fast temporal dynamics dimension than along the speech‐to‐nonspeech dimension.
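The flip analysis above can be sketched as follows: mirror each subject's contrast volume about the midsagittal plane, form the voxelwise difference ([CAT − AUD] − [CAT′ − AUD′]), and test it against zero across subjects. The volumes below are simulated toy data with an assumed left-hemisphere effect, not the study's images:

```python
import numpy as np
from scipy import stats

# Toy per-subject CAT-AUD contrast volumes (12 subjects, tiny 8x8x8 grids).
rng = np.random.default_rng(1)
n_subj, shape = 12, (8, 8, 8)
cat_aud = rng.normal(size=(n_subj,) + shape)
cat_aud[:, :4, :, :] += 1.5        # assumed stronger effect in the "left" half

# Mirroring about the midsagittal plane gives the CAT'-AUD' volumes; the
# difference asks whether each voxel's effect exceeds that of its homolog.
flipped = cat_aud[:, ::-1, :, :]
asym = cat_aud - flipped           # ([CAT-AUD] - [CAT'-AUD'])

# Second-level random-effects test on the asymmetry, voxel by voxel.
t, p = stats.ttest_1samp(asym.reshape(n_subj, -1), 0.0, axis=0)
lateralized = (p < 0.05) & (t > 0)  # voxels stronger than their homologs
print(int(lateralized.sum()))
```

A significant cluster in one hemisphere under this test indicates genuinely lateralized activation rather than a thresholding artifact.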
Figure 7.

Activation from the SPM z‐maps in the axial view. The contrasts depicted are for the lateralization analysis of the four stimulus types, ([CAT − AUD] − [CAT′ − AUD′]), where Category discrimination (CAT) and Auditory discrimination (AUD) refer to contrasts of the original images and CAT′ and AUD′ refer to contrasts of the flipped images (P < 0.05, uncorrected, cluster size >20 voxels).
DISCUSSION
Using fMRI, we investigated the neural basis of the categorization of simple auditory objects. Subjects were trained to distinguish between pairs of sounds from four classes (tones, tonal contours, vowels, and syllables), based on either their auditory features (AUD) or their categorical nature (CAT). Whereas subjects had long‐term familiarity with the categories of vowels and syllables used in the experiment, they became familiar with the nonspeech categories only during the training session before scanning. To determine the brain regions involved in the additional mental computation needed for categorization, we contrasted the brain activation for the CAT task with that for the AUD task for individual stimulus types. The results of the CAT‐AUD contrast (further corroborated by a combined pooled analysis of all the stimulus types) demonstrated that a core set of regions crucial for the processing of categories was activated by all the stimulus types: areas in predominantly left inferior and middle frontal gyri (with minor activations in the corresponding right hemisphere regions), left inferior parietal lobule, and left dorsomedial frontal gyrus.
The finding of a common set of brain regions for the categorization of speech and nonspeech stimuli has implications for theories of speech perception. One prominent theory holds that categorical speech perception is special and utilizes a different cognitive architecture from that employed for the perception of nonspeech categories [e.g., Liberman et al.,1961; Mattingly,1972]. Our results do not support this proposal; rather, they are more in keeping with previous work with animals and infants [Kuhl and Padden,1982; Kuhl,1987,1991] suggesting that the categorization of speech sounds draws upon a more universal set of neural mechanisms used for categorizing sounds. However, the specialness of speech may be found at a finer spatial and temporal resolution than that revealed by fMRI as employed in our experiments. In addition, because of the close link between perception and production, cognitive processing unique to speech may emerge with more complex stimuli and tasks [Horwitz and Braun,2004].
Core Regions
The inferior and middle frontal gyri have been implicated in the processing of phonemes [Burton et al.,2000; Joanisse and Gati,2003] and sounds with fast acoustic transients [Griffiths et al.,2000b; Joanisse and Gati,2003; Johnsrude et al.,1997; Poldrack et al.,2001]. The inferior frontal gyrus has also been shown to be active during visual and auditory semantic categorization [Adams and Janata,2002; Saykin et al.,1999] and the middle frontal gyrus in semantic categorization [Hugdahl et al.,1999; Saykin et al.,1999], suggesting their roles as polymodal integrators of categories and concepts.
The parietal lobule has been implicated in the phonological storage component of working memory [D'Esposito et al.,1998; Fiez,2001; Honey et al.,2000; Paulesu et al.,1993] and in selective attention tasks [Shaywitz et al.,1998,2001]. Vogels et al. [2002], using positron emission tomography (PET), showed that a categorization task using random‐dot visual patterns activated the prefrontal and parietal cortical areas to a greater degree than did a noncategorization task.
The dorsomedial frontal gyrus, or the supplementary motor area (SMA), has been shown to be active in tasks involving verbal working memory [Grasby et al.,1994; Schumacher et al.,1996] and demanding tasks that involve paying attention to temporal synchrony of the stimulus features [Lux et al.,2003], or speech in noise conditions [Salvi et al.,2002]. Kojima et al. [1998] have suggested that one role of SMA is in categorizing the input sound by analyzing the pattern of vocalization action necessary to generate that sound; such analysis becomes especially necessary in case of encounters with unfamiliar stimuli such as pseudowords.
Our results lead us to hypothesize that the categorization process, as expressed in our study, engages working memory (WM) processing, bridging the gap between perception and higher‐order cognition. We propose that the categorization task, compared to the auditory discrimination task, requires additional neural resources to label the incoming stimulus with the correct category, remember this label, and compare it with the label of the subsequent stimulus. This is supported by the CAT‐AUD subtraction maps for the different stimuli, which showed greater activation in areas associated with WM tasks. The three core regions appear in numerous working memory studies, especially those involving manipulation [e.g., Bullmore et al.,2000; Fiez,2001]. Behavioral studies [Oakhill and Kyle,2000] have shown a close link between performance on a phonological sound categorization task and a short‐term WM task comprising both storage and manipulation requirements; a similar link could not be established between categorization and a short‐term memory task involving only a storage component. Manipulation is one type of “central executive” function as described in the WM model of Baddeley [1992,2003]. Although our study employed WM tasks to investigate categorization, the subtraction of the AUD task from the CAT task minimized the contribution of the short‐term memory component to the contrast and thus permitted the neural resources needed for the additional computational step of categorization to be assessed.
Support for the idea of the discrimination tasks as WM tasks comes from our combined modeling‐experimental study [Husain et al.,2004]. We used part of the results of the present fMRI study to validate a large‐scale neurobiologically realistic model of auditory processing. The model consists of a set of modules representing regions from the primary auditory to the prefrontal cortex. The electrical activities of the neuronal units of the model were constrained to agree with data from the neurophysiological literature regarding the perception of frequency‐modulated sweeps. The model performs a DMS task, similar to the auditory discrimination task (AUD) in the present study, using simulated TCs and Tones. In this modeling framework, the discrimination task is processed by a set of regions in the prefrontal cortex that maintain the first stimulus during the interstimulus delay, compare it with the second stimulus, and indicate whether the two stimuli match. The task is thus realized in the model as a short‐term WM task mediated by frontal areas. As shown in Figure 4 and noted above, there was activation in the prefrontal areas for the AUD‐Rest contrast for the TCs and the other stimuli. We used data from the AUD task for Tones and TCs from the experiment and compared them to the simulated BOLD fMRI data that were derived from the integrated synaptic activity of the neuronal units in each region of the model. There was general agreement between the simulated fMRI data and the experimentally observed fMRI data in the brain areas corresponding to the regions of the model. As shown in Figure 5 and Table I, there was additional activation in the frontal areas for the CAT task compared to that for the AUD task for all stimulus types, suggesting that the categorization task involved neural resources in the same locations as those mediating short‐term WM tasks.
A recent article by Hickok and Poeppel [2004] provides further support for our hypothesis of categorization as a verbal WM process. Hickok and Poeppel [2004] propose a framework for understanding the neural basis of speech perception within which the early stages of speech perception involve auditory areas in the superior temporal gyrus. Beyond this early processing, two divergent streams of processing emerge: a ventral stream that mediates mapping sound to meaning and a dorsal stream that concerns mapping sound onto articulatory‐based representations. The dorsal stream involves a distributed network, consisting of regions in the parietal–temporal boundary and the frontal cortex, that carries out auditory–motor integration. In this framework, verbal WM is conceived of as a special case of auditory–motor integration requiring articulatory rehearsal in the frontal systems to keep sensory‐based representations in the superior temporal gyrus active. Hickok and Poeppel [2004] hold that the posterior part of the inferior frontal gyrus is activated during sublexical tasks, such as categorization, due to conversion of speech inputs to articulatory codes needed to maintain the stimuli in WM. In our results, the inferior frontal gyrus is activated to a greater extent during the categorization task than it is during the auditory discrimination task. During the categorization task for speech‐like stimuli (Vowels, Syllables), the additional mental steps needed to identify and remember the category labels therefore may involve conversion to articulatory codes. A similar process, without explicit translation to articulatory codes, may be involved in categorization of nonspeech stimuli like Tones and TCs.
Speech versus Nonspeech
The newly learned nonspeech categories (conjunction of the Tones and TCs conditions) showed activation patterns in the CAT‐AUD contrast similar to those seen for the familiar speech categories (conjunction of the Vowels and Syllables conditions). Nevertheless, there were some important differences: the nonspeech stimuli also activated the left middle frontal gyrus and the right parietal lobule, whereas the speech stimuli activated the left thalamus. We interpret the similarity between the nonspeech and speech patterns of activation to mean that the processing of relatively novel and familiar sound categories utilizes a common set of brain regions. We interpret the differences to mean that the categorization of novel sound classes recruits additional regions (hemispheric homologs of core areas) because of its greater processing demands.
The activation of left middle frontal gyrus has been noted in demanding tasks of working memory [Yoo et al.,2004] and of speech segmentation [Gandour et al.,2003]. Benson et al. [2001] used fMRI to investigate the dissociation of speech and nonspeech stimuli with varying degrees of complexity. Apart from regions in the temporal cortex, these researchers noted increased activation of left middle frontal gyrus for speech versus nonspeech stimuli; however, this distinction was for passive listening to natural speech sounds as compared to synthetic speech sounds. The natural speech sounds differed from synthetic speech sounds in that the former had energy for frequencies greater than 3,500 Hz. The fact that we used a short‐term memory task unlike the passive listening task of Benson et al. [2001] and that our speech stimuli were synthetic may partly explain the lack of activation in left middle frontal gyrus. Similarly, the activation of right parietal cortex has been noted in demanding sentence comprehension tasks by older adults but not young normal subjects [Grossman et al.,2002] and in analysis of sound movements [Griffiths et al.,2000a]. The dissociation in our results between the speech and nonspeech conditions is also not consistent with the results of Joanisse and Gati [2003].
Stop consonants, such as the initial parts of the /ba/ and /pa/ syllables used in our study, are said to be highly categorizable [Schouten and van Hessen,1992] and relatively impervious to task demands [Pisoni,1973]. Regardless of the task, such consonants (and the syllables containing them) therefore will always be categorized. This partly explains why fewer activation clusters were seen in the CAT‐AUD contrast condition for the Syllables (see Fig. 5) than for the other stimuli. Vowel segments, although speech sounds, are much less categorizable [Schouten and van Hessen,1992]; therefore, more clusters were seen in the CAT‐AUD contrasts for the Vowels than for the Syllables conditions. Additionally, the conjunction of the Vowels and Syllables CAT‐AUD conditions exhibited fewer activated voxels than any other conjunction.
Fast versus Slow Temporal Dynamics
Contrasting the CAT‐AUD tasks for sounds with fast and slow temporal dynamics resulted in different patterns of activation, primarily in the inferior frontal gyrus and middle frontal gyrus. We consider the TCs and Syllables to have fast dynamics relative to the slower Tones and Vowels stimuli. The transitions in the TCs and Syllables are not identical, however, with the former having slower (and fewer) transitions. Both TCs and Syllables (fast dynamics) showed only left hemisphere activation in the inferior frontal gyrus and only right hemisphere activation in the middle frontal gyrus, whereas the Tones and Vowels (slow dynamics) showed more bilateral activation in these areas; there were no major differences in other areas of the core network.
Our results regarding the dissimilarity in the processing of sounds with fast and slow temporal dynamics are partially consistent with the findings of Joanisse and Gati [2003], who investigated the processing of syllables and tones. They used fMRI and employed a short‐term memory task similar to our AUD discrimination task (they did not have a CAT discrimination task). The syllables differed in either the consonant or the vowel portion of the stimulus whereas the tones varied along either a dynamic or a spectral dimension. They found that regions in bilateral superior temporal gyrus and left inferior frontal gyrus showed greater activation for stimuli with rapid dynamics compared to that for the steady‐state stimuli. Unlike our results, these investigators did not find any region that had differential activation for the speech and the nonspeech conditions; both conditions activated regions bilaterally in the superior temporal plane and the left inferior frontal gyrus to the same extent. Further, no parietal lobe activations were found by Joanisse and Gati [2003]. Although not all of the parietal lobe was imaged by these researchers, we think that the lack of an overt categorization task accounts for the absence of parietal involvement: we found a difference between speech and nonspeech conditions in the differential activation of the intraparietal sulcal areas for the CAT‐AUD contrast.
Lateralized Activation
We noted an apparent lateralization of the activation patterns in the CAT‐AUD subtraction (Fig. 5, Table I) and in the conjunction analyses (Fig. 6, Table II). To determine if this lateralization is significant or is dependent on the threshold used, we carried out an additional lateralization analysis within the context of categorization (see Fig. 7). In this last analysis, a significantly activated cluster of voxels in a particular hemisphere would indicate significantly lateralized activation for that hemisphere. In the discussion of the results that follows, we use the term “left lateralization” to mean that the activation pattern is completely left lateralized for a particular region and there is no significant activation in the right homolog of that region.
Stimuli with the faster temporal dynamics (TCs and Syllables) exhibited left‐lateralized activation in the inferior and middle frontal gyri, whereas stimuli with slower temporal dynamics (Tones and Vowels) showed a right‐lateralized activation at a small locus restricted to the middle frontal gyrus. Possibly due to the nature of the task (active memory task) and the nature of the stimuli (brief), the categorization of the Tones or the Vowels did not result in a more right‐lateralized activation pattern. Even at the level of the auditory cortex, lateralization is susceptible to small changes in acoustic features or familiarity [Tervaniemi and Hugdahl,2003]. No consistent patterns of lateralization differences were discerned when stimulus conditions were grouped along the speech–nonspeech dimension.
Left‐lateralized processing of speech and language stimuli, primarily in the temporal cortex, has been noted for many years [Belin et al.,1998; Coney,2002; Frost et al.,1999; Tervaniemi et al.,2000; Zatorre et al.,1992]. We did not find any common left‐lateralized patterns for the speech (Vowels, Syllables) stimuli for the CAT task compared to that for the AUD task (Fig. 6). The conjunction of speech‐like stimuli (Fig. 6) showed a right‐lateralized activation of the middle frontal gyrus and a left‐lateralized activation of the posterior cortex. The nature of the speech‐like stimuli (isolated, synthetic tokens of simple sounds) and the tasks (active decision making) used in the present study may have contributed to the absence of left‐lateralized patterns of activation.
Although our results are partly consistent with the lateralization hypotheses proposed by several researchers [Hickok and Poeppel,2000; Poeppel,2001; Tallal et al.,1993; Zatorre and Belin,2001; Zatorre et al.,2002], they suggest that the differential involvement of the two hemispheres in auditory processing is more complicated than these theories indicate.
Absence of Activation in Temporal Regions
Whereas auditory processing areas in the superior temporal gyrus were activated during the CAT‐Rest and AUD‐Rest contrast conditions, they were not present in the CAT‐AUD subtractions (with Rest subtracted from each). One should not conclude from this that the superior temporal cortex is not involved in categorization. It may be that the temporal resolution of fMRI, the nature of the tasks, and the block‐design experimental paradigm did not suitably reveal the involvement of the temporal regions in categorization. A recent fMRI study [Guenther et al.,2004] found that the auditory cortex is involved in the representation of sound categories, with fewer neural cells representing the category center than the category boundaries. Studies of aphasia [Baum and Ryan,1993; Monoi et al.,1983] have indicated a role for the superior temporal gyrus (extending into the planum temporale, or Wernicke's area) in the categorization of consonants and vowels. To obtain greater temporal detail about categorization and the involvement of the temporal cortex in it, we have conducted a magnetoencephalography (MEG) study using stimuli and tasks identical to those used in the present study. The results of the MEG study [Luo et al.,2005] indicate greater involvement of the temporal lobe in the categorical discrimination task than in the auditory discrimination task during part of the delay period between stimulus presentations; this difference was not found for any other time period and occurred only for the nonspeech stimuli. These results further imply that the temporal cortex can be a locus for the dissociation between the categorization of speech and nonspeech stimuli.
Future Studies
The stimuli used in our experiment were low‐level speech and nonspeech sounds. For example, the syllables and vowels used were isolated synthesized tokens and their processing may not involve many of the brain areas that are concerned with linguistic processing. Future studies employing more complex stimuli will be necessary to determine how factors such as context, long‐term memory, and higher‐order linguistic processing interact with categorization.
In the present study, we chose to present stimuli in the presence of background scanner noise, rather than during quiet intervals in a “sparse” sampling regimen [Hall et al.,1999]. The main disadvantage of the sparse sampling regimen is that the fewer trials and volumes collected decrease statistical power, especially because our paradigm uses 40 different sounds. Nevertheless, in future extensions of the present study, we plan to use a sparse sampling regimen with fewer different sounds.
CONCLUSIONS
Our study demonstrated that the categorization of sounds utilizes a distributed neural system. For all stimulus types, this system involved key regions in the frontal lobe (inferior and middle frontal gyri and dorsomedial cortex) and parietal lobe (intraparietal sulcus). These regions also play a role in WM processing, thus suggesting a close connection between categorization and working memory. Additional brain regions exhibited minor activations depending on the stimulus type. We found that neither the categorization process itself nor the aspect of being a speech sound was sufficient to invoke a left‐lateralized activation pattern. However, we did find that during categorization (relative to auditory discrimination), the activation patterns for sounds with relatively fast temporal dynamics were left lateralized in the inferior frontal cortex as compared to sounds with relatively slow temporal dynamics. Consequently, it was the fast–slow temporal dimension of the sounds rather than the speech–nonspeech dimension that affected the left lateralization of the activity pattern during the categorization.
Categorization, being at the junction between perceptual and cognitive processing, can be studied in humans [e.g., Lakoff,1987; Phillips et al.,2000], in animals [e.g., Freedman et al.,2001; Kuhl and Padden,1982,1983; Molfese et al.,1986; Morse et al.,1987], and across different sensory modalities [e.g., Adams and Janata,2002; Gauthier et al.,1999]. It therefore lends itself to large‐scale neurobiologically realistic modeling, along the lines of Tagamets and Horwitz [1998] and Husain et al. [2004]. Within such a modeling framework, one can combine data from monkey electrophysiological studies, human behavioral and neuroimaging findings, and from multiple sensory modalities. Our results thus provide crucial information about the neural substrates of categorical processing that such a modeling‐experimental paradigm requires.
Acknowledgements
Supported by the NIDCD intramural program of NIH.
We thank Helen M. Hanson of Sensimetrics, Inc. and audiologists Mary Ann Mastroiani and Yvonne Szymko‐Bennett for their assistance.
REFERENCES
- Adams RB, Janata P (2002): A comparison of neural circuits underlying auditory and visual object categorization. Neuroimage 16: 361–377. [DOI] [PubMed] [Google Scholar]
- Baddeley A (1992): Working memory. Science 255: 556–559. [DOI] [PubMed] [Google Scholar]
- Baddeley A (2003): Working memory and language: an overview. J Commun Disord 36: 189–208. [DOI] [PubMed] [Google Scholar]
- Baum SR, Ryan L (1993): Rate of speech effects in aphasia: voice onset time. Brain Lang 44: 431–445. [DOI] [PubMed] [Google Scholar]
- Belin P, Zilbovicius M, Crozier S, Thivard L, Fontaine A, Masure MC, Samson Y (1998): Lateralization of speech and auditory temporal processing. J Cogn Neurosci 10: 536–540. [DOI] [PubMed] [Google Scholar]
- Benson RR, Whalen DH, Richardson M, Swainson B, Clark VP, Lai S, Liberman AM (2001): Parametrically dissociating speech and nonspeech perception in the brain using fMRI. Brain Lang 78: 364–396. [DOI] [PubMed] [Google Scholar]
- Bullmore E, Horwitz B, Honey G, Brammer M, Williams S, Sharma T (2000): How good is good enough in path analysis of fMRI data? Neuroimage 11: 289–301. [DOI] [PubMed] [Google Scholar]
- Burton MW, Small SL, Blumstein SE (2000): The role of segmentation in phonological processing: an fMRI investigation. J Cogn Neurosci 12: 679–690. [DOI] [PubMed] [Google Scholar]
- Caplan D, Moo L (2004): Cognitive conjunction and cognitive functions. Neuroimage 21: 751–756. [DOI] [PubMed] [Google Scholar]
- Coney J (2002): Lateral asymmetry in phonological processing: relating behavioral measures to neuroimaged structures. Brain Lang 80: 355–365. [DOI] [PubMed] [Google Scholar]
- D'Esposito M, Aguirre GK, Zarahn E, Ballard D, Shin RK, Lease J (1998): Functional MRI studies of spatial and nonspatial working memory. Brain Res Cogn Brain Res 7: 1–13. [DOI] [PubMed] [Google Scholar]
- Fiez JA (2001): Bridging the gap between neuroimaging and neuropsychology: using working memory as a case‐study. J Clin Exp Neuropsychol 23: 19–31. [DOI] [PubMed] [Google Scholar]
- Freedman DJ, Riesenhuber M, Poggio T, Miller EK (2001): Categorical representation of visual stimuli in the primate prefrontal cortex. Science 291: 312–316. [DOI] [PubMed] [Google Scholar]
- Friston KJ, Holmes AP, Price CJ, Buchel C, Worsley KJ (1999): Multisubject fMRI studies and conjunction analyses. Neuroimage 10: 385–396. [DOI] [PubMed] [Google Scholar]
- Friston KJ, Holmes AP, Worsley KJ, Poline JP, Frith AD, Frackowiak RSJ (1995): Statistical parametric maps in functional imaging: a general linear approach. Hum Brain Mapp 2: 189–210. [Google Scholar]
- Frost JA, Binder JR, Springer JA, Hammeke TA, Bellgowan PS, Rao SM, Cox RW (1999): Language processing is strongly left lateralized in both sexes. Evidence from functional MRI. Brain 122: 199–208. [DOI] [PubMed] [Google Scholar]
- Gandour J, Xu Y, Wong D, Dzemidzic M, Lowe M, Li X, Tong Y (2003): Neural correlates of segmental and tonal information in speech perception. Hum Brain Mapp 20: 185–200. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Gauthier I, Tarr MJ, Anderson AW, Skudlarski P, Gore JC (1999): Activation of the middle fusiform “face area” increases with expertise in recognizing novel objects. Nat Neurosci 2: 568–573. [DOI] [PubMed] [Google Scholar]
- Grasby PM, Frith CD, Friston KJ, Simpson J, Fletcher PC, Frackowiak RS, Dolan RJ (1994): A graded task approach to the functional mapping of brain areas implicated in auditory‐verbal memory. Brain 117: 1271–1282. [DOI] [PubMed] [Google Scholar]
- Griffiths TD, Green GG, Rees A, Rees G (2000a): Human brain areas involved in the analysis of auditory movement. Hum Brain Mapp 9: 72–80. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Griffiths TD, Penhune V, Peretz I, Dean JL, Patterson RD, Green GG (2000b): Frontal processing and auditory perception. Neuroreport 11: 919–922. [DOI] [PubMed] [Google Scholar]
- Grossman M, Cooke A, DeVita C, Alsop D, Detre J, Chen W, Gee J (2002): Age‐related changes in working memory during sentence comprehension: an fMRI study. Neuroimage 15: 302–317. [DOI] [PubMed] [Google Scholar]
- Guenther FH, Husain FT, Cohen MA, Shinn‐Cunningham BG (1999): Effects of categorization and discrimination training on auditory perceptual space. J Acoust Soc Am 106: 2900–2912. [DOI] [PubMed] [Google Scholar]
- Guenther FH, Nieto‐Castanon A, Ghosh SS, Tourville JA (2004): Representation of sound categories in auditory cortical maps. J Speech Lang Hear Res 47: 46–57. [DOI] [PubMed] [Google Scholar]
- Hall DA, Haggard MP, Akeroyd MA, Palmer AR, Summerfield AQ, Elliott MR, Gurney EM, Bowtell RW (1999): “Sparse” temporal sampling in auditory fMRI. Hum Brain Mapp 7: 213–223. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hanson HM, Stevens KN (2000): Modeling stop‐consonant releases for synthesis. J Acoust Soc Am 107: 2907. [Google Scholar]
- Hanson HM, Stevens KN (2002): A quasiarticulatory approach to controlling acoustic source parameters in a Klatt‐type formant synthesizer using HLsyn. J Acoust Soc Am 112: 1158–1182. [DOI] [PubMed] [Google Scholar]
- Haxby JV, Grady CL, Horwitz B, Ungerleider LG, Mishkin M, Carson RE, Herscovitch P, Schapiro MB, Rapoport SI (1991): Dissociation of object and spatial visual processing pathways in human extrastriate cortex. Proc Natl Acad Sci USA 88: 1621–1625. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hickok G, Poeppel D (2000): Towards a functional neuroanatomy of speech perception. Trends Cogn Sci 4: 131–138. [DOI] [PubMed] [Google Scholar]
- Hickok G, Poeppel D (2004): Dorsal and ventral streams: a framework for understanding aspects of the functional anatomy of language. Cognition 92: 67–99. [DOI] [PubMed] [Google Scholar]
- Honey GD, Bullmore ET, Sharma T (2000): Prolonged reaction time to a verbal working memory task predicts increased power of posterior parietal cortical activation. Neuroimage 12: 495–503. [DOI] [PubMed] [Google Scholar]
- Horwitz B, Braun AR (2004): Brain network interactions in auditory, visual and linguistic processing. Brain Lang 89: 377–384. [DOI] [PubMed] [Google Scholar]
- Hugdahl K, Lundervold A, Ersland L, Smievoll AI, Sundberg H, Barndon R, Roscher BE (1999): Left frontal activation during a semantic categorization task: an fMRI study. Int J Neurosci 99: 49–58. [DOI] [PubMed] [Google Scholar]
- Husain FT, Tagamets MA, Fromm SJ, Braun AR, Horwitz B (2004): Relating neuronal dynamics for auditory object processing to neuroimaging activity: a computational modeling and an fMRI study. Neuroimage 21: 1701–1720. [DOI] [PubMed] [Google Scholar]
- Ishai A, Haxby JV, Ungerleider LG (2002): Visual imagery of famous faces: effects of memory and attention revealed by fMRI. Neuroimage 17: 1729–1741. [DOI] [PubMed] [Google Scholar]
- Janata P, Tillmann B, Bharucha JJ (2002): Listening to polyphonic music recruits domain‐general attention and working memory circuits. Cogn Affect Behav Neurosci 2: 121–140. [DOI] [PubMed] [Google Scholar]
- Joanisse MF, Gati JS (2003): Overlapping neural regions for processing rapid temporal cues in speech and nonspeech signals. Neuroimage 19: 64–79. [DOI] [PubMed] [Google Scholar]
- Johnsrude IS, Zatorre RJ, Milner BA, Evans AC (1997): Left‐hemisphere specialization for the processing of acoustic transients. Neuroreport 8: 1761–1765. [DOI] [PubMed] [Google Scholar]
- Kojima H, Hirano S, Naito Y, Honjo I (1998): Left hemispheric dominance and the role of verbal motor‐related region in language cognition. Clin Positron Imaging 1: 223–228. [DOI] [PubMed] [Google Scholar]
- Kuhl PK (1987): The special‐mechanism debate in speech research: categorization tests on animals and infants In: Harnad S, editor. Categorical perception: the groundwork of cognition. Cambridge, UK: Cambridge University Press; p 355–386. [Google Scholar]
- Kuhl PK (1991): Human adults and human infants show a “perceptual magnet effect” for the prototypes of speech categories, monkeys do not. Percept Psychophys 50: 93–107. [DOI] [PubMed] [Google Scholar]
- Kuhl PK, Padden DM (1982): Enhanced discriminability at the phonetic boundaries for the voicing feature in macaques. Percept Psychophys 32: 542–550. [DOI] [PubMed] [Google Scholar]
- Kuhl PK, Padden DM (1983): Enhanced discriminability at the phonetic boundaries for the place feature in macaques. J Acoust Soc Am 73: 1003–1010. [DOI] [PubMed] [Google Scholar]
- Lakoff G (1987): Women, fire and dangerous things: what categories reveal about the mind. Chicago: University of Chicago Press; 614 p. [Google Scholar]
- Liberman AM (1996): Speech: a special code. Cambridge, MA: MIT Press; 504 p. [Google Scholar]
- Liberman AM, Harris KS, Kinney JA, Lane HL (1961): The discrimination of relative‐onset time of the components of certain speech and non‐speech patterns. J Exp Psychol 61: 379–388. [DOI] [PubMed] [Google Scholar]
- Lotto AJ (2000): Language acquisition as complex category formation. Phonetica 57: 189–196. [DOI] [PubMed] [Google Scholar]
- Luo H, Husain FT, Horwitz B, Poeppel D (2005): Discrimination and categorization of speech and non‐speech sounds in an MEG delayed‐match‐to‐sample study. Neuroimage. doi:10.1016/j.neuroimage.2005.05.040. [DOI] [PubMed] [Google Scholar]
- Lux S, Marshall JC, Ritzl A, Zilles K, Fink GR (2003): Neural mechanisms associated with attention to temporal synchrony versus spatial orientation: an fMRI study. Neuroimage 20(Suppl): 58–65. [DOI] [PubMed] [Google Scholar]
- Massaro D (1997): Categorical partition: a fuzzy‐logic model of categorization behavior In: Harnad S, editor. Categorical perception: the groundwork of cognition. Cambridge, UK: Cambridge University Press; p 254–283. [Google Scholar]
- Mattingly IG (1972): Speech cues and sign stimuli. Am Sci 60: 327–337. [PubMed] [Google Scholar]
- Molfese DL, Laughlin NK, Morse PA, Linnville SE, Wetzel WF, Erwin RJ (1986): Neuroelectrical correlates of categorical perception for place of articulation in normal and lead‐treated rhesus monkeys. J Clin Exp Neuropsychol 8: 680–696. [DOI] [PubMed] [Google Scholar]
- Monoi H, Fukusako Y, Itoh M, Sasanuma S (1983): Speech sound errors in patients with conduction and Broca's aphasia. Brain Lang 20: 175–194. [DOI] [PubMed] [Google Scholar]
- Morse PA, Molfese D, Laughlin NK, Linnville S, Wetzel F (1987): Categorical perception for voicing contrasts in normal and lead‐treated rhesus monkeys: electrophysiological indices. Brain Lang 30: 63–80. [DOI] [PubMed] [Google Scholar]
- Oakhill J, Kyle F (2000): The relation between phonological awareness and working memory. J Exp Child Psychol 75: 152–164. [DOI] [PubMed] [Google Scholar]
- Paulesu E, Frith CD, Frackowiak RS (1993): The neural correlates of the verbal component of working memory. Nature 362: 342–345. [DOI] [PubMed] [Google Scholar]
- Peterson G, Barney H (1952): Control methods used in study of vowels. J Acoust Soc Am 24: 175–184. [Google Scholar]
- Phillips C, Pellathy T, Marantz A, Yellin E, Wexler K, Poeppel D, McGinnis M, Roberts T (2000): Auditory cortex accesses phonological categories: an MEG mismatch study. J Cogn Neurosci 12: 1038–1055. [DOI] [PubMed] [Google Scholar]
- Pisoni DB (1973): Auditory and phonetic memory codes in the discrimination of consonants and vowels. Percept Psychophys 13: 253–260. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Poeppel D (2001): Pure word deafness and the bilateral processing of the speech code. Cogn Sci 25: 679–693. [Google Scholar]
- Poldrack RA, Temple E, Protopapas A, Nagarajan S, Tallal P, Merzenich M, Gabrieli JD (2001): Relations between the neural bases of dynamic auditory processing and phonological processing: evidence from fMRI. J Cogn Neurosci 13: 687–697. [DOI] [PubMed] [Google Scholar]
- Price CJ, Friston KJ (1997): Cognitive conjunction: a new approach to brain activation experiments. Neuroimage 5: 261–270. [DOI] [PubMed] [Google Scholar]
- Repp BH (1984): Categorical perception: issues, methods and findings In: Lass NJ, editor. Speech and language: advances in basic research and practice, Vol 10 New York, NY: Academic Press; p 243–335. [Google Scholar]
- Salvi RJ, Lockwood AH, Frisina RD, Coad ML, Wack DS, Frisina DR (2002): PET imaging of the normal human auditory system: responses to speech in quiet and in background noise. Hear Res 170: 96–106. [DOI] [PubMed] [Google Scholar]
- Saykin AJ, Flashman LA, Frutiger SA, Johnson SC, Mamourian AC, Moritz CH, O'Jile JR, Riordan HJ, Santulli RB, Smith CA, Weaver JB (1999): Neuroanatomic substrates of semantic memory impairment in Alzheimer's disease: patterns of functional MRI activation. J Int Neuropsychol Soc 5: 377–392. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Schouten ME, van Hessen AJ (1992): Modeling phoneme perception. I: Categorical perception. J Acoust Soc Am 92: 1841–1855. [DOI] [PubMed] [Google Scholar]
- Schumacher EH, Lauber E, Awh E, Jonides J, Smith EE, Koeppe RA (1996): PET evidence for an amodal verbal working memory system. Neuroimage 3: 79–88. [DOI] [PubMed] [Google Scholar]
- Shah NJ, Marshall JC, Zafiris O, Schwab A, Zilles K, Markowitsch HJ, Fink GR (2001): The neural correlates of person familiarity. A functional magnetic resonance imaging study with clinical implications. Brain 124: 804–815. [DOI] [PubMed] [Google Scholar]
- Shaywitz BA, Shaywitz SE, Pugh KR, Fulbright RK, Skudlarski P, Mencl WE, Constable RT, Marchione KE, Fletcher JM, Klorman R, Lacadie C, Gore JC (2001): The functional neural architecture of components of attention in language‐processing tasks. Neuroimage 13: 601–612. [DOI] [PubMed] [Google Scholar]
- Shaywitz SE, Shaywitz BA, Pugh KR, Fulbright RK, Constable RT, Mencl WE, Shankweiler DP, Liberman AM, Skudlarski P, Fletcher JM, Katz L, Marchione KE, Lacadie C, Gatenby C, Gore JC (1998): Functional disruption in the organization of the brain for reading in dyslexia. Proc Natl Acad Sci USA 95: 2636–2641. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Stevens KN (1980): Acoustic correlates of some phonetic categories. J Acoust Soc Am 68: 836–842. [DOI] [PubMed] [Google Scholar]
- Tagamets MA, Horwitz B (1998): Integrating electrophysiological and anatomical experimental data to create a large‐scale model that simulates a delayed match‐to‐sample human brain imaging study. Cereb Cortex 8: 310–320. [DOI] [PubMed] [Google Scholar]
- Tallal P, Miller S, Fitch RH (1993): Neurobiological basis of speech: a case for the preeminence of temporal processing. Ann N Y Acad Sci 682: 27–47. [DOI] [PubMed] [Google Scholar]
- Tervaniemi M, Hugdahl K (2003): Lateralization of auditory‐cortex functions. Brain Res Brain Res Rev 43: 231–246. [DOI] [PubMed] [Google Scholar]
- Tervaniemi M, Medvedev SV, Alho K, Pakhomov SV, Roudas MS, Van Zuijen TL, Naatanen R (2000): Lateralized automatic auditory processing of phonetic versus musical information: a PET study. Hum Brain Mapp 10: 74–79. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Vogels R, Sary G, Dupont P, Orban GA (2002): Human brain regions involved in visual categorization. Neuroimage 16: 401–414. [DOI] [PubMed] [Google Scholar]
- Yoo SS, Paralkar G, Panych LP (2004): Neural substrates associated with the concurrent performance of dual working memory tasks. Int J Neurosci 114: 613–631. [DOI] [PubMed] [Google Scholar]
- Zatorre RJ, Belin P (2001): Spectral and temporal processing in human auditory cortex. Cereb Cortex 11: 946–953. [DOI] [PubMed] [Google Scholar]
- Zatorre RJ, Belin P, Penhune VB (2002): Structure and function of auditory cortex: music and speech. Trends Cogn Sci 6: 37–46. [DOI] [PubMed] [Google Scholar]
- Zatorre RJ, Evans AC, Meyer E, Gjedde A (1992): Lateralization of phonetic and pitch discrimination in speech processing. Science 256: 846–849. [DOI] [PubMed] [Google Scholar]
