Human Brain Mapping. 2008 Mar 10;30(3):859–873. doi: 10.1002/hbm.20550

Functional architecture of verbal and tonal working memory: An FMRI study

Stefan Koelsch 1,2, Katrin Schulze 1, Daniela Sammler 1, Thomas Fritz 1, Karsten Müller 1, Oliver Gruber 3
PMCID: PMC6871123  PMID: 18330870

Abstract

This study investigates the functional architecture of working memory (WM) for verbal and tonal information during rehearsal and articulatory suppression. Participants were presented with strings of four sung syllables with the task to remember either the pitches (tonal information) or the syllables (verbal information). Rehearsal of verbal, as well as of tonal information activated a network comprising ventrolateral premotor cortex (encroaching Broca's area), dorsal premotor cortex, the planum temporale, inferior parietal lobe, the anterior insula, subcortical structures (basal ganglia and thalamus), as well as the cerebellum. The topography of activations was virtually identical for the rehearsal of syllables and pitches, showing a remarkable overlap of the WM components for the rehearsal of verbal and tonal information. When the WM task was performed under articulatory suppression, activations in those areas decreased, while additional activations arose in anterior prefrontal areas. These prefrontal areas might contain additional storage components of verbal and tonal WM that are activated when auditory information cannot be rehearsed. As in the rehearsal conditions, the topography of activations under articulatory suppression was nearly identical for the verbal as compared to the tonal task. Results indicate that both the rehearsal of verbal and tonal information, as well as storage of verbal and tonal information relies on strongly overlapping neuronal networks. These networks appear to partly consist of sensorimotor‐related circuits which provide resources for the representation and maintenance of information, and which are remarkably similar for the production of speech and song. Hum Brain Mapp, 2009. © 2008 Wiley‐Liss, Inc.

Keywords: working memory, pitch, verbal, music

INTRODUCTION

Working memory (WM) refers to a brain system of linked and interacting information‐processing components for temporal storage and simultaneous manipulation of information [Baddeley,1992,2003]. This brain system is critical for higher cognitive functions such as language, music, planning, problem solving, and reasoning. One of the most influential WM models was developed by Baddeley and Hitch more than three decades ago [Baddeley,1992; Baddeley and Hitch,1974]. According to this model, WM consists of an attentional control system (the “central executive”) that operates in conjunction with two “slave systems” that serve to maintain representations of information of different modalities: the visuospatial sketchpad and the phonological loop. The visuospatial sketchpad is concerned with the processing and storage of visual and spatial information. The phonological loop represents verbal short‐term memory, and is thought to consist on the one hand of a phonological store that holds auditory information for a few seconds, and on the other hand of a phonological rehearsal mechanism that is analogous to subvocal speech [Baddeley,2003].

So far, the functional neuroarchitecture of the phonological loop has mainly been investigated with respect to language. Both neuropsychological and functional imaging studies indicate that Broca's area and premotor areas (pre‐SMA, SMA, vlPMC, and dPMC) play a crucial role during the phonological rehearsal process [Awh et al.,1996; Fiez et al.,1996; Gruber and von Cramon,2003; Paulesu et al.,1993; Ravizza et al.,2004]. In addition, both the insular cortex [Bamiou et al.,2003; Chein et al.,2002; Paulesu et al.,1993] and the cerebellum [Chen and Desmond,2005; Gruber,2001; Kirschen et al.,2005; Ravizza et al.,2004] have been reported to be involved in phonological rehearsal. The phonological store has been suggested to be located in parietal areas, particularly the inferior parietal lobe [Awh et al.,1996; Chen and Desmond,2005; Crottaz‐Herbette et al.,2004; Gruber,2001; Gruber and von Cramon,2003; Henson et al.,2000; Jonides et al.,1998; Kirschen et al.,2005; Paulesu et al.,1993], but also in the superior parietal lobe [Awh et al.,1996; Chen and Desmond,2005; Crottaz‐Herbette et al.,2004; Henson et al.,2000; Ravizza et al.,2004]. However, the localization of the phonological store in the parietal lobe is partly controversial [Fiez et al.,1996; Hickok et al.,2003], because, e.g., neural activity in this area might also reflect increased engagement of attentional resources [for an overview see: Cabeza and Nyberg,2000; Corbetta and Shulman,2002, see also Jones et al.,2004]. Moreover, a series of recent functional neuroimaging and experimental neuropsychological studies has provided evidence that phonological storage is not a purely parietal brain function, but relies on a broader network of inferior parietal and anterior prefrontal brain regions supporting the nonarticulatory maintenance of phonological information [Gruber,2001; Gruber and Goschke,2004; Gruber and von Cramon,2001,2003; Gruber et al.,2005].

Phonological information is not the only important auditory information in everyday life; other relevant information includes speech prosody and music. A number of behavioral studies have investigated whether the phonological loop also serves the processing of such nonphonological information, or whether different subsystems [like a “tonal loop,” see Pechmann and Moor,1992] exist in addition to the phonological loop. However, these studies do not yet provide a consistent picture. Deutsch [1970] reported that intervening tones interfered more strongly than phonemes with a pitch memory task, and this finding was taken as evidence for a specialized tonal WM system. Salame and Baddeley [1989] showed that vocal music interfered more strongly with phonological short‐term memory than instrumental music, supporting the assumption of two independent WM systems for verbal and tonal stimuli. On the other hand, results by Semal et al. [1996] suggest that the pitch of speech sounds is not stored differently from the pitch of nonspeech sounds in WM. In addition, Iwanaga and Itoh [2002] reported that instrumental as well as vocal music interfered with a verbal WM task, and Chan et al. [1998] observed that musical training increases performance in a verbal WM task, suggesting overlapping rather than independent neural resources for verbal and tonal WM. Considering these contradictory results, it remains unclear whether cognitive (and neural) resources of tonal and verbal WM overlap. Thus, knowledge about the neural organization of the phonological loop cannot simply be generalized to nonphonological auditory WM.

Possible differences or similarities between the neuronal networks underlying WM for tonal and verbal stimuli have so far only sparsely been addressed. Using fMRI, Gaab et al. [2003] showed involvement of the supramarginal gyrus (SMG) extending into the intraparietal sulcus (IPS), planum temporale, ventrolateral premotor regions encroaching Broca's area, dorsolateral premotor regions, and dorsolateral cerebellar regions during a pitch memory task. This network is surprisingly reminiscent of the network implicated in the phonological loop (see above). A similar network (including the inferior frontal and insular cortex, the planum temporale, and the SMG) had previously been shown with PET for the active retention of pitch [Zatorre et al.,1994]. Hickok et al. [2003] showed with fMRI that (subvocal) articulatory rehearsal of verbal as well as of musical information activated ventrolateral premotor regions encroaching Broca's area, dorsolateral premotor regions, the planum temporale (referred to by the authors as area Spt), and (with lowered statistical threshold) the SMG/IPS. The regions activated in that study were, thus, very similar to those observed by Gaab et al. [2003], and they served both verbal and musical rehearsal.

In the present study, we investigated similarities and differences between the neural components underlying WM for verbal (syllables) and tonal (pitch) material both during rehearsal and under articulatory suppression. The stimuli for the tonal and verbal WM tasks were identical, consisting of sequences of four sung syllables. To investigate the articulatory rehearsal component, participants were required to rehearse subvocally either the pitches or the syllables after the presentation of a stimulus sequence, and to respond subsequently to a probe sequence with a button press. In addition, suppression conditions were employed to assess the neural correlates of the nonarticulatory storage component. Articulatory suppression is known to prevent articulatory rehearsal, and therefore participants have to rely more strongly on the information represented in the phonological store to perform the task [Gruber,2001]. During the suppression condition in the present experiment, participants were asked to remember either the pitches or the syllables of a presented sequence, while singing a well known children's song after the presentation of the sequence. Only this combined articulatory and musical (tonal) suppression task was able to prevent the subjects from using a tonal rehearsal strategy, i.e. to subvocally repeat the pitches while performing (purely) articulatory suppression. After the suppression, participants were asked to respond subsequently to a probe sequence with a button press (as in the rehearsal conditions).

On the basis of the literature reported earlier, we hypothesized that articulatory rehearsal would activate frontal speech areas (ventral premotor cortex and Broca's area), parietal regions (SMG/IPS), and the planum temporale. Region of interest analyses were planned to investigate possible differences in the topography of the activated networks for tonal and verbal rehearsal, as well as hemispheric differences and differences in strength of activation. In the suppression conditions, additional activations were expected in anterior prefrontal (intermediate frontal sulcus) and inferior parietal areas.

METHODS

Participants

Twelve right‐handed nonmusicians (25–30 years, M = 26.7 years, 7 females) with normal hearing took part in the experiment. None of the participants had any musical training beyond general school education. All participants were students of the University of Leipzig (except one, who was a chef in a restaurant). The mean lateralization quotient was 95.8% according to the Edinburgh Handedness Inventory [Oldfield,1971], and reading span scores ranged from 2.6 to 6 [M = 3.7, SD = 1.1; scores were assessed with a German version of the reading span test from Daneman and Carpenter,1980].

Stimuli

Stimuli were sung syllables, thus containing both verbal (syllable) and tonal (pitch) information (no spoken syllables, and no pure tones or instrumental tones were presented). There were eight syllables (taken from the German alphabet) which were acoustically well distinguishable (b [be;], f [Ef], j [jOt], k [ka;], o [o;], v [fAu], x [Iks], and z [tsEt]). Each of these eight syllables was sung by a female singer on eight different pitches (these eight pitches corresponded to the pitches of a major scale), resulting in a total of 64 sung stimuli (8 syllables × 8 pitches = 64). The pitches of the stimuli were electronically adjusted using Cool Edit Pro (Syntrillium Corp., Phoenix, AZ) within a range from 200 to 400 Hz (corresponding to one octave) with interval ratios exactly corresponding to tempered intonation. Length of stimuli was adjusted to 400 ± 2 ms by shortening vowels only (thus without reducing intelligibility of the syllables). To construct control conditions, each stimulus was also recorded backwards (see also below). Subsequently, stimuli were grouped into 216 sequences, each comprising four stimuli (see Fig. 1). Syllables did not form meaningful words (such as “fox”) and pitches of consecutive stimuli were at least five, and not more than nine, semitones apart from each other. Silence periods of 150 ms were inserted between stimuli, and a 100 ms pause was added after the last stimulus.
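The pitch set and sequence timing described above can be checked with a short sketch. It assumes that the lowest pitch sits exactly at 200 Hz (the text only gives the 200 to 400 Hz range) and that "tempered intonation" means twelve-tone equal temperament; the function and constant names are illustrative and not taken from the study.

```python
# Semitone offsets of a major scale spanning one octave (eight pitches).
MAJOR_SCALE_SEMITONES = [0, 2, 4, 5, 7, 9, 11, 12]


def equal_tempered_hz(base_hz, semitones):
    """Frequency `semitones` above `base_hz` in 12-tone equal temperament."""
    return base_hz * 2 ** (semitones / 12)


# Assumption: the scale starts at 200 Hz, so the octave ends at 400 Hz.
pitches_hz = [equal_tempered_hz(200.0, s) for s in MAJOR_SCALE_SEMITONES]


def interval_ok(semitone_a, semitone_b, lo=5, hi=9):
    """True if two consecutive pitches are 5 to 9 semitones apart."""
    return lo <= abs(semitone_a - semitone_b) <= hi


def sequence_duration_ms(n_stimuli=4, stim_ms=400, gap_ms=150, tail_ms=100):
    """Stimuli plus the silences between them plus the final pause."""
    return n_stimuli * stim_ms + (n_stimuli - 1) * gap_ms + tail_ms


print([round(f, 1) for f in pitches_hz])
print(sequence_duration_ms())  # 2150
```

Under these assumptions a four-stimulus sequence lasts 2,150 ms, which agrees with the sequence duration stated in the Procedure.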

Figure 1.


Experimental design. The six example trials illustrate the six different experimental conditions. Each trial had a duration of 13 s and began with a visual cue (V R = verbal (syllable) rehearsal, V S = verbal (syllable) suppression, T R = tonal (pitch) rehearsal, T S = tonal (pitch) suppression, N 0 = not memorize anything without rehearsing or singing a song, N S = not memorize anything and sing the song). The cue was followed by the presentation of the stimulus sequence. In the subsequent rehearsal conditions, subjects covertly rehearsed either the syllables (V R) or the pitches (T R). During the suppression conditions (V S, T S), subjects covertly sang a children's song while trying to maintain either the verbal (V S) or the tonal information (T S) in their memory. Then, a probe sequence was presented, followed by a silence period of 2.35 s during which participants had to indicate whether the probe sequence was identical to the initial sequence (verbal and tonal conditions). In the control conditions (N 0, N S), in which participants did not have to memorize the initial sequence, participants had to indicate whether each of the syllables was played forward (forward speech) or backward (backward speech).

Procedure

There were six experimental conditions (see also Fig. 1): (1) memorize pitches while rehearsing the pitches, (2) memorize pitches during articulatory suppression (singing a children's song, see also below), (3) memorize syllables while rehearsing the syllables, and (4) memorize syllables during articulatory suppression (singing the children's song). In addition to these memory conditions, there were two control conditions: (5) memorize nothing (without rehearsal or singing) and (6) memorize nothing and sing the children's song.

Each experimental trial started with a visual cue consisting of two simultaneously presented capital letters (2,350 ms). The first letter indicated what to memorize (see green letters on the left of Fig. 1): either only syllables (i.e., the verbal information, “V”), or only pitches (i.e., the tonal information, “T”), or nothing (“N”). The second letter indicated the articulatory action to be performed after the presentation of a stimulus sequence (see red letters on the left of Fig. 1): either rehearsal (of pitches or syllables, “R”), or singing the children's song (“S”, this task also served as articulatory suppression, see below), or neither sing nor rehearse (“0”). For the singing condition, participants were instructed to subvocally sing a well‐known German children's song (“Hänschen klein”). Importantly, during trials in which pitches or syllables had to be memorized, the singing condition represented a combined articulatory and musical (tonal) suppression because it prevented both tonal and verbal rehearsal (note that the material used for articulatory suppression was, thus, identical for the verbal and the tonal condition). For the rehearsal conditions, participants were instructed to subvocally rehearse the syllables (without melody) in the verbal condition, and in the tonal condition to subvocally rehearse the pitches (without articulating the syllables of the stimulus sequence) using the syllable [hm]. That is, the rehearsal task was designed such that participants only rehearsed pitches (without syllables), or only rehearsed syllables (without pitches). We used covert production to avoid auditory feedback of the subject's own voice (and corresponding activations of the auditory cortex), and to avoid motion artifacts in the fMRI signal that are likely to occur during overt oral production [see also Callan et al.,2006]. 
Moreover, the potential interaction between the degree of susceptibility artifact related to changes in the oral cavity during scanning of vocal articulation on the one side, and the type of production task (verbal or tonal rehearsal) on the other, is likely to produce false results [see also Callan et al.,2006]. However, participants were thoroughly trained with the tasks in a separate training session with both overt and covert production (see also below).

The initial cue of each trial was followed by a four‐stimulus sequence that had a duration of 2,150 ms, and by a silence period (4,000 ms) for subvocal rehearsal or singing/suppression. After this period, participants were presented with a probe sequence that consisted, as the initial sequence, of four sung syllables (2,150 ms). Then, subjects were asked to indicate via a button press whether the pitches of that sequence (in the tonal condition) or the syllables of that sequence (in the verbal condition) were the same as those of the initial sequence. As mentioned earlier, participants did not have to memorize syllables or tones in the two control conditions (N 0, N S). In these conditions, stimuli of the probe sequence were played with the same pitches, but each of the syllables was either played forward or backward, and subjects had to indicate via a button press whether they heard forward speech or backward speech (see outermost right of the two bottom panels of Fig. 1). This task was easy (correct responses were >98%, see Results), and although participants had nothing to remember, they still anticipated a control sequence, made a binary (yes/no) decision, and performed a motor response.

In the other conditions, four different types of probe sequences were used (see the two outermost right panels in Fig. 1): (a) verbal and tonal information of the sequence was correct (same syllables, same pitches), (b) only the tonal, or (c) only the verbal sequence was correct, or (d) neither the tonal nor the verbal sequence was correct. Incongruency was obtained by exchanging the positions of two elements, that is, either of two syllables, or of two pitches, or of two pitch‐syllable elements (see right of Fig. 1). Each probe sequence type occurred equiprobably in the four conditions (V R, V S, T R, T S).
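The four probe types can be illustrated with a small sketch that builds a probe from an initial sequence by exchanging two positions. How the two positions were chosen, and whether swapped pitches still had to satisfy the interval constraint, is not stated in the text, so the random choice below is an assumption; `make_probe` is a hypothetical helper name.

```python
import random


def make_probe(sequence, swap_syllables, swap_pitches, rng):
    """Return a probe for `sequence`, a list of (syllable, pitch) pairs.

    Two positions are drawn at random (an assumption) and the syllables
    and/or pitches at these positions are exchanged. Swapping both at
    the same two positions exchanges two whole pitch-syllable elements.
    """
    syllables = [s for s, _ in sequence]
    pitches = [p for _, p in sequence]
    i, j = rng.sample(range(len(sequence)), 2)
    if swap_syllables:
        syllables[i], syllables[j] = syllables[j], syllables[i]
    if swap_pitches:
        pitches[i], pitches[j] = pitches[j], pitches[i]
    return list(zip(syllables, pitches))


# Pitches are given as semitone offsets above the lowest scale tone.
initial = [("b", 0), ("f", 7), ("k", 2), ("x", 9)]
rng = random.Random(1)
probe_a = make_probe(initial, False, False, rng)  # (a) both correct
probe_b = make_probe(initial, True, False, rng)   # (b) only tonal correct
probe_c = make_probe(initial, False, True, rng)   # (c) only verbal correct
probe_d = make_probe(initial, True, True, rng)    # (d) neither correct
```

Because every syllable and pitch in the example sequence is distinct, any swap is guaranteed to yield a sequence that differs from the initial one in the swapped dimension.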

Participants had two response buttons (correct/incorrect) which they pressed with their left and right index finger. Key assignment was counterbalanced across participants. The fMRI experiment comprised 36 trials in each of the six conditions (T S, T R, V S, V R, N S, N 0), resulting in a total of 216 trials (with 216 different initial stimulus sequences), corresponding to a duration of approximately 50 min. During the experiment, trials of all six conditions were pseudorandomly intermixed.
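As a consistency check on the trial structure, the phase durations given in the Procedure and in Figure 1 can be summed and scaled to the full session. The sketch below also shows one simple way of intermixing conditions; the study only states that trials were pseudorandomly intermixed, so the plain seeded shuffle (and the phase labels) are assumptions, not the authors' randomization scheme.

```python
import random

# Phase durations in ms as stated in the text and the Figure 1 caption.
PHASES_MS = {
    "cue": 2350,
    "sequence": 2150,
    "retention": 4000,  # silent rehearsal or singing/suppression
    "probe": 2150,
    "response": 2350,
}
CONDITIONS = ["TS", "TR", "VS", "VR", "NS", "N0"]
TRIALS_PER_CONDITION = 36

trial_ms = sum(PHASES_MS.values())                    # 13000
n_trials = len(CONDITIONS) * TRIALS_PER_CONDITION     # 216
session_min = n_trials * trial_ms / 60000             # pure trial time


def shuffled_trial_order(seed=0):
    """Assumed scheme: unconstrained shuffle of all trials, fixed seed."""
    rng = random.Random(seed)
    order = [c for c in CONDITIONS for _ in range(TRIALS_PER_CONDITION)]
    rng.shuffle(order)
    return order


print(trial_ms, n_trials, session_min)
```

The pure trial time comes to just under 47 min, which is compatible with the reported duration of approximately 50 min.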

Participants were trained in a separate session of about 1 h duration on a separate day within the week prior to the fMRI measurement. In the training session they performed all tasks both covertly and overtly. This enabled us to verify that participants actually rehearsed only either the pitches or the syllables in the rehearsal conditions, and that they actually sang the children's song during the suppression condition. Moreover, it allowed us to collect behavioral data for overt rehearsal and suppression, and to compare these data with the behavioral data obtained in the fMRI session.

fMRI Scanning Procedure

Scanning was performed on a 3‐T scanner (Medspec 30/100, Bruker, Ettlingen). Prior to the functional recordings, anatomical slices were acquired. The anatomical slices had the same geometric orientation as the functional slices. Before each functional session, a high‐resolution anatomical reference data set (T1‐weighted) was acquired for each participant, which was standardized to the Talairach stereotactic space [Talairach and Tournoux,1988]. A spin‐echo EPI sequence was used with a TE of 75 ms, a TR of 2,000 ms, and an acquisition bandwidth of 100 kHz. Acquisition of the slices was arranged uniformly within the TR interval. The matrix acquired was 64 × 64 with a FOV of 19.2 cm, resulting in an in‐plane resolution of 3 mm × 3 mm. Slice thickness was 5 mm with an interslice gap of 1 mm (14 slices were acquired, nine above the AC‐PC plane). In the present study, we did not choose a sparse temporal scanning design because our primary interest was not to investigate perceptual mechanisms within the auditory cortex (and the larger number of acquisitions may increase the signal‐to‐noise ratio in nonauditory regions). However, we are currently investigating whether the continuous scanning interferes more strongly with the maintenance of tonal than with the maintenance of verbal information during articulatory suppression (unpublished data).
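The acquisition geometry follows from simple arithmetic on the reported parameters. The sketch below assumes that the 1 mm gap applies only between adjacent slices; the variable names are illustrative.

```python
FOV_MM = 192.0            # 19.2 cm field of view
MATRIX = 64               # 64 x 64 acquisition matrix
SLICE_THICKNESS_MM = 5.0
SLICE_GAP_MM = 1.0
N_SLICES = 14
TR_S = 2.0
TRIAL_S = 13.0

# In-plane voxel edge length.
in_plane_mm = FOV_MM / MATRIX  # 3.0

# Extent covered along the slice direction
# (assumption: gaps occur only between neighbouring slices).
slab_mm = N_SLICES * SLICE_THICKNESS_MM + (N_SLICES - 1) * SLICE_GAP_MM  # 83.0

# Number of volumes acquired per 13 s trial at a TR of 2 s.
volumes_per_trial = TRIAL_S / TR_S  # 6.5

print(in_plane_mm, slab_mm, volumes_per_trial)
```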

fMRI Data Analysis

fMRI data were processed using the software package LIPSIA [Lohmann et al.,2001]. Functional data were corrected for motion using a matching metric based on linear correlation. To correct for the temporal offset between the slices acquired in one scan, a cubic‐spline interpolation was applied. A temporal highpass filter with a cutoff frequency of 1/72 Hz was used for baseline correction of the signal and a spatial Gaussian filter with 5.65 mm FWHM was applied. Functional data were linearly registered with the Talairach stereotactic coordinate system [Talairach and Tournoux,1988]. The rotational and translational parameters were subsequently transformed by linear scaling to the standard size. The resulting parameters were then used to transform the functional slices using trilinear interpolation, so that the resulting functional slices were aligned with the Talairach coordinate system.
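To relate the reported smoothing width to the voxel grid, the FWHM can be converted to the standard deviation of the Gaussian. The sketch below uses the standard relation FWHM = 2·sqrt(2·ln 2)·sigma and builds a normalized one-dimensional kernel on a 3 mm grid. It is an illustration only and does not reproduce LIPSIA's implementation; the three-sigma truncation is an assumption.

```python
import math


def fwhm_to_sigma(fwhm):
    """Standard deviation of a Gaussian with the given FWHM."""
    return fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))


def gaussian_kernel_1d(sigma_mm, voxel_mm, radius_sigmas=3.0):
    """Normalized 1-D Gaussian kernel sampled on the voxel grid.

    Assumption: the kernel is truncated at `radius_sigmas` standard
    deviations (rounded up to whole voxels).
    """
    sigma_vox = sigma_mm / voxel_mm
    half = int(math.ceil(radius_sigmas * sigma_vox))
    weights = [math.exp(-0.5 * (i / sigma_vox) ** 2)
               for i in range(-half, half + 1)]
    total = sum(weights)
    return [w / total for w in weights]


sigma_mm = fwhm_to_sigma(5.65)            # about 2.4 mm
kernel = gaussian_kernel_1d(sigma_mm, 3.0)
print(round(sigma_mm, 3), [round(w, 3) for w in kernel])
```

A FWHM of 5.65 mm thus corresponds to a sigma of roughly 2.4 mm, or about 0.8 voxels in-plane.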

The statistical evaluation was based on a least‐squares estimation using the general linear model for serially autocorrelated observations [see also Friston,1994; Worsley and Friston,1995]. The design matrix was generated using a box‐car function. The design matrix, the acquired data, and the error term were convolved with a Gaussian kernel of 4 s (to deal with the temporal autocorrelation). Subsequently, contrast‐images were calculated for each participant, and entered into a second‐level random effects analysis. One‐sample t‐tests were performed to evaluate whether observed differences were significantly different from zero (t‐values were transformed into z‐values). The results were corrected for multiple comparisons using cluster‐size and cluster‐value thresholds obtained by Monte‐Carlo simulations using a significance level of P < 0.05 (clusters in the resulting maps were obtained using a z‐value threshold of 2.58).
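The second-level logic can be sketched with the Python standard library: a one-sample t statistic over per-participant contrast values, and the tail probability that corresponds to the voxel-level z threshold of 2.58. The contrast values below are made up for illustration, and the conversion from t to z is omitted because it needs a t-distribution CDF that the standard library does not provide; the cluster-level Monte-Carlo correction is likewise not reproduced.

```python
import math
from statistics import NormalDist, mean, stdev


def one_sample_t(values):
    """t statistic for H0: population mean of `values` is zero."""
    n = len(values)
    return mean(values) / (stdev(values) / math.sqrt(n))


def one_sided_p_from_z(z):
    """Upper-tail probability of a standard normal z value."""
    return 1.0 - NormalDist().cdf(z)


# Hypothetical contrast values for 12 participants at one voxel.
contrasts = [0.8, 1.1, 0.4, 0.9, 1.3, 0.2, 0.7, 1.0, 0.6, 1.2, 0.5, 0.9]
t_value = one_sample_t(contrasts)   # df = len(contrasts) - 1 = 11

p_voxel = one_sided_p_from_z(2.58)  # roughly 0.005 one-sided
print(round(t_value, 2), round(p_voxel, 4))
```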

Region of Interest Analysis

Cortical areas that were significantly activated in the SPMs in either hemisphere, and in either the verbal or the tonal rehearsal condition, were subjected to a further post hoc analysis. It was tested whether the activation strength in regions of interest (ROIs) differed between hemispheres and/or conditions (for comparisons between hemispheres, or conditions, some areas were investigated with ROI analyses, even if those areas were not significantly activated in the SPMs with the applied statistical thresholds, see Table I). For each subject, five ROIs were defined in each hemisphere and for each condition as single voxels. These ROIs were as follows: (1) ventrolateral premotor cortex (vlPMC), (2) dorsolateral premotor cortex (dlPMC), (3) supramarginal gyrus/intraparietal sulcus (SMG/IPS), (4) planum temporale, and (5) the anterior superior insula. An additional ROI was defined as a voxel in the pre‐SMA. The ROI coordinates were determined separately for each subject (using for each subject the individual z‐maps and the individual high‐resolution anatomical scan). Such individually adjusted ROIs were computed because of the interindividual variability of brain morphology, to make the statistical comparisons as accurate as possible (see Fig. 3D for illustration of individual ROIs for verbal rehearsal). For the determination of ROI coordinates, SPMs were scaled to 1 mm × 1 mm × 1 mm using trilinear interpolation. The coordinate of each ROI was defined as the voxel with the highest z‐value in the interpolated single‐subject SPM within a search radius of 9 mm around the local signal maximum in the group contrast, but within the anatomical boundaries of the respective structure (regardless of the statistical significance of the z‐value; coordinates were determined by S.K. and T.F.). These coordinates were local maxima in approximately 95% of all cases. Then, for each subject, contrast values were computed for each contrast for the voxel containing the respective coordinate. These values were subsequently entered into repeated measures ANOVAs with factors condition (verbal rehearsal, tonal rehearsal) and hemisphere [Bosch,2000]. In addition to comparing the hemodynamic responses in the ROIs, coordinates of ROIs were compared between verbal and tonal rehearsal to test for possible differences in the topography of activations between these two conditions. To this end, x‐, y‐, and z‐coordinates were compared by paired two‐sided t‐tests.
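The individual ROI selection can be sketched as a peak search: among all voxels of a single-subject z-map that lie within 9 mm of the group maximum and inside the anatomical structure, take the one with the highest z-value. In the study the anatomical boundaries were judged by two of the authors on the individual scans; the optional `mask` set below is a hypothetical stand-in for that judgment, and the dictionary representation of the 1 mm z-map is chosen only to keep the example self-contained.

```python
def peak_within_radius(zmap, center, radius_mm=9.0, mask=None):
    """Return (coordinate, z) of the highest z-value near `center`.

    zmap:   dict mapping (x, y, z) coordinates in mm to z-values
    center: group-level local maximum (x, y, z) in mm
    mask:   optional set of coordinates inside the anatomical structure
            (hypothetical stand-in for the manual anatomical check)
    """
    best_coord, best_z = None, float("-inf")
    radius_sq = radius_mm ** 2
    for coord, z in zmap.items():
        dist_sq = sum((a - b) ** 2 for a, b in zip(coord, center))
        if dist_sq > radius_sq:
            continue  # outside the search sphere
        if mask is not None and coord not in mask:
            continue  # outside the anatomical structure
        if z > best_z:
            best_coord, best_z = coord, z
    return best_coord, best_z


# Toy z-map with three voxels; values are invented for illustration.
toy_zmap = {(-44, 1, 24): 2.0, (-46, 3, 22): 3.5, (-60, 1, 24): 9.0}
coord, z = peak_within_radius(toy_zmap, center=(-44, 1, 24))
print(coord, z)
```

The voxel at (-60, 1, 24) has the largest z-value in the toy map but lies 16 mm from the centre, so it is ignored; note that, as in the text, the selected voxel need not be statistically significant.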

Table I.

Activations elicited during rehearsal conditions (contrasted to the control condition in which subjects neither sang nor memorized)

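| Anatomical structure | BA | Left: Talairach coord. (SPM) | Left: z‐value (SPM) | Left: mm³ | Left: Talairach coord. (ROI) | Left: P‐value (ROI) | Right: Talairach coord. (SPM) | Right: z‐value (SPM) | Right: mm³ | Right: Talairach coord. (ROI) | Right: P‐value (ROI) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Verbal rehearsal | | | | | | | | | | | |
| vlPMC | 6 | −44 1 24 | 4.6 | 2,511 | −46 3 22 | 0.005 | | | | 50 9 22 | 0.007 |
| dlPMC | 4/6 | −50 −8 42 | 8.27 | 1,350 | −48 −6 44 | 0.0001 | 49 −8 39 | 5.35 | 378 | 49 −6 41 | 0.0001 |
| IPS/SMG | 40 | −38 −38 39 | 4.83 | 1,161 | −41 −37 40 | 0.0002 | | | | 41 −37 43 | 0.0004 |
| Planum temporale | 22 | −44 −38 21 | 6.88 | 1,269 | −47 −40 22 | 0.0001 | | | | 52 −34 23 | 0.0001 |
| Ant. sup. insula | | | | | −32 15 4 | n.s. | | | | 34 15 0 | n.s. |
| IFG/pars triangularis | 45/46 | | | | −45 29 9 | n.s. | | | | 45 32 8 | 0.005 |
| Pre‐SMA | 6 | −5 3 54 | 5.11 | 243 | −6 6 55 | 0.005 | | | | | |
| Subcentral gyrus (a) | 43 | −59 −5 15 | 5.94 | | | | | | | | |
| Putamen | | −17 0 15 | 4.10 | 2,511 | | | | | | | |
| Caudate nucleus | | | | | | | 16 10 15 | 4.01 | 594 | | |
| Thalamus (b) | | −17 −18 15 | 4.25 | | | | | | | | |
| Cerebellum | | | | | | | 25 −62 −15 | 5.68 | 1,269 | | |
| Tonal rehearsal | | | | | | | | | | | |
| vlPMC | 6 | −50 4 24 | 12.97 | 8,883 | −49 6 22 | 0.0001 | 49 7 21 | 9.43 | 2,754 | 47 7 20 | 0.0001 |
| dlPMC | 4/6 | −47 −8 42 | 9.20 | 1,080 | −47 −5 43 | 0.0001 | | | | 50 −7 42 | 0.0001 |
| IPS/SMG | 40 | −47 −35 39 | 10.95 | 5,238 | −41 −38 40 | 0.0001 | 34 −38 42 | 10.51 | 2,403 | 40 −39 44 | 0.0002 |
| Planum temporale (c) | 22 | −47 −42 24 | 8.07 | | −48 −41 26 | 0.0001 | | | | 51 −34 23 | 0.0004 |
| Ant. sup. insula (d) | | −32 19 3 | 9.83 | | −31 18 5 | 0.02 | | | | 35 16 0 | n.s. |
| IFG/pars triangularis | 45/46 | −44 31 9 | 7.67 | 486 | −43 30 10 | 0.005 | | | | 45 33 9 | 0.005 |
| Pre‐SMA | 6 | −5 7 54 | 10.85 | 3,132 | −5 7 55 | 0.0005 | | | | | |
| Caudate nucleus | | | | | | | 16 9 15 | 4.01 | 189 | | |
| Pallidum | | −17 −3 3 | 7.24 | 189 | | | | | | | |
| Thalamus | | −14 −14 15 | 7.31 | 918 | | | 13 −15 15 | 6.27 | 135 | | |
| Cerebellum | | −29 −62 −18 | 9.62 | 864 | | | 31 −56 −18 | 8.47 | 1,863 | | |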

The table shows the results of the cluster analysis of statistical parametric maps (P < 0.05 corrected for multiple comparisons) and ROI analyses (see “Methods” for details).

(a) The cluster in the left vlPMC had another local maximum in the subcentral gyrus.

(b) The cluster in the left striatum had another local maximum in the thalamus.

(c) The cluster in the left IPS/SMG had another local maximum in the planum temporale.

(d) The cluster in the left vlPMC had another local maximum in the anterior superior insula.

Figure 3.


Activations during verbal (A) and tonal (B) rehearsal (contrasted to the control condition in which subjects neither sang nor memorized; P < 0.05 corrected for multiple comparisons). Both tasks activated a network comprising the ventrolateral premotor cortex (vlPMC), the dorsal precentral gyrus, the intraparietal sulcus (IPS) extending into the supramarginal gyrus (SMG), and the planum temporale (p.t.). In the left hemisphere, the pars triangularis of the IFG was activated only during the tonal rehearsal. (C) Areas that were significantly activated during both verbal and tonal rehearsal. (D) Illustration of individually adjusted ROIs (for the verbal rehearsal condition). In each cluster, each circle represents the ROI coordinate of one participant.

The analogous procedure was applied for the maintenance of verbal, and the maintenance of tonal information during articulatory suppression with the following ROIs: (1) intermediate frontal sulcus, (2) IFG/pars triangularis, (3) vlPMC, (4) anterior superior insula, and (5) pre‐SMA. To test for hemispheric differences, this procedure was also applied for the singing condition with the following ROIs: dlPMC, Rolandic operculum, planum temporale/supramarginal gyrus, IPL/angular gyrus, precuneus, and posterior cingulate cortex (PCC).

RESULTS

Behavioral Data

The behavioral data for both verbal (syllable) and tonal (pitch) tasks are summarized in Figure 2. During the verbal rehearsal, participants had on average 97.25% (SEM = 0.78%) correct responses. Memory performance in the verbal task clearly dropped during articulatory suppression (87.08%, SEM = 2.60%). During the tonal rehearsal, participants had on average 63.83% (SEM = 2.82%) correct responses. As in the verbal task, performance in the tonal task was less accurate under articulatory suppression (60.08%, SEM = 2.82%). To fulfill the requirements of normal distribution and equality of variances for an ANOVA, behavioral data for both verbal and tonal tasks were transformed with 2 * arcsin (sqrt(x)) [and 1 − (1/2n) for x = 100% correct responses, n being the number of trials, see Kirk,1995]. A subsequent ANOVA with factors memory‐type (verbal, tonal) and suppression (with, without articulatory suppression) on the hit rates showed a main effect of memory‐type (F(1,11) = 131.78, P < 0.0001, reflecting that participants' performance was better in the verbal than in the tonal tasks), a main effect of suppression (F(1,11) = 15.82, P = 0.002, reflecting that performance was better during rehearsal than during articulatory suppression), and a two‐way interaction (F(1,11) = 17.37, P = 0.002, reflecting that the effect of articulatory suppression on the memory task was stronger during the verbal than during the tonal task). Importantly, performance dropped significantly during suppression (compared to rehearsal) in both the verbal (t(11) = 4.27, P < 0.001) and the tonal task (t(11) = 2.22, P < 0.05), providing assurance that participants actually performed the articulatory suppression during both tasks. 
In the control task in which subjects only had to sing subvocally, without memorizing pitches or syllables, they classified 98.08% (SEM = 0.79) of the probe stimuli correctly as played forward/backward, and 99% (SEM = 0.56) in the control task in which they neither sang nor memorized the stimuli.
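The variance-stabilizing transform applied to the hit rates can be written out as follows. The sketch reads the text as meaning that a proportion of exactly 1 is first replaced by 1 − 1/(2n) and then passed through the same arcsine transform; this reading, and the use of n = 36 trials per condition, are assumptions based on the description above.

```python
import math


def arcsine_transform(proportion, n_trials):
    """2 * arcsin(sqrt(x)), with x = 1 replaced by 1 - 1/(2n).

    Assumption: the corrected proportion is transformed in the same
    way as all other values.
    """
    if proportion >= 1.0:
        proportion = 1.0 - 1.0 / (2.0 * n_trials)
    return 2.0 * math.asin(math.sqrt(proportion))


# Hypothetical hit rates of one participant (36 trials per condition).
for hit_rate in (36 / 36, 35 / 36, 23 / 36):
    print(round(hit_rate, 3), round(arcsine_transform(hit_rate, 36), 3))
```

Without the correction a perfect score would map onto the upper bound π of the transform; the substitution keeps it slightly below that bound.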

Figure 2.


Behavioral data of verbal and tonal WM during rehearsal (shaded bars), and of verbal and tonal WM under simultaneous articulatory suppression (nonshaded bars). Participants performed better in the verbal than in the tonal conditions. Note the significant drop in performance (compared to the rehearsal conditions) during maintenance of both verbal and tonal information under articulatory suppression.

During the rehearsal tasks, and during the verbal WM task under suppression, performance during the fMRI session was similar to the performance during the training session in which both rehearsal and suppression were also performed overtly (and could, thus, be controlled by the experimenter): Correct responses for verbal rehearsal were 96.64%, for maintenance of verbal information under articulatory suppression 83.35%, and for tonal rehearsal 70.04%. Paired t‐tests showed that the differences between training and fMRI session were not statistically significant (verbal rehearsal: P > 0.7, maintenance of verbal information under suppression: P > 0.2, tonal rehearsal: P > 0.1). This provides some assurance that participants followed the instructions correctly. For the maintenance of tonal information during articulatory suppression, performance was significantly better during the training session (69.09% correct responses, P < 0.05), perhaps because the scanner noise made the tonal task more difficult. However, the fact that performance dropped during the fMRI experiment corroborates that the participants followed the instructions correctly.

fMRI Data: Rehearsal Conditions

Table I summarizes activations elicited by the verbal and the tonal rehearsal (both contrasted to nonrehearsal, see also Fig. 3A,B). The topography of local maxima was remarkably similar for both rehearsal conditions: Both the verbal and the tonal rehearsal activated a cortical network comprising (a) ventrolateral premotor cortex (vlPMC, this activation extended along the precentral sulcus into the posterior wall of the pars opercularis/Broca's area), (b) dorsolateral premotor cortex (dlPMC), (c) the intraparietal sulcus (IPS) extending into the supramarginal gyrus (SMG), (d) the planum temporale, (e) the anterior superior insula, (f) the pars triangularis of the IFG (BA 45/46), although during verbal rehearsal only in the right hemisphere, and (g) the pre‐SMA. Local maxima in the planum temporale lay within the 26–45% probability region for the planum temporale according to the probability maps of Westbury et al. [1999]. This network clearly resembles the functional architecture of articulatory rehearsal reported in previous studies (see Introduction).

The conjunction analysis showed that vlPMC, dlPMC, and SMG/IPS (all bilaterally), and the left planum temporale were significantly activated during both verbal and tonal rehearsal (Fig. 3C). Moreover, activations for both conditions were also indicated in the pre‐SMA, the cerebellum bilaterally, the left Rolandic operculum, the putamen, the pallidum, and the thalamus, as well as the right caudate nucleus (not shown in Fig. 3C).

For the verbal rehearsal, an additional activation was indicated within the subcentral gyrus (Rolandic operculum, BA43). No such activation was observed for the tonal rehearsal in the corrected SPMs. However, a local maximum within this structure was also indicated for the tonal rehearsal in the uncorrected SPMs (z = 5.45; coordinate of this local maximum was −52, −14, 14), strongly suggesting that the Rolandic operculum was not only activated during the verbal, but also during the tonal rehearsal.

To investigate hemispheric differences, and to compare both activation patterns in more detail, ROI analyses were performed using individually adjusted ROIs (see Table I for results, see Fig. 3D for illustration of individual ROIs): For each participant and each analyzed structure, a ROI‐coordinate was determined as local maximum of activation within the anatomical boundaries of the respective structure (vlPMC, dlPMC, planum temporale, SMG/IPS, and pre‐SMA, see Methods for details).

In a first step, we investigated if these ROI coordinates differed between verbal and tonal rehearsal. Therefore, individual ROI coordinates were grand‐averaged separately for each structure (grand‐averaged ROI‐coordinates are provided in Table I). In each of the analyzed structures (except the planum temporale) grand‐averaged coordinates of local maxima of the verbal rehearsal were located within a 3 mm range of the respective coordinates of the tonal rehearsal (in the planum temporale, grand‐averaged coordinates were within a 4 mm range). That is, given the spatial resolution of our fMRI data, the local maxima of ROI coordinates were virtually identical for both verbal and tonal rehearsal (located within the same, or the directly adjacent voxel). These observations were confirmed by statistical analyses: Paired t‐tests on the x‐, y‐, and z‐coordinates of individual ROI coordinates were computed for each structure to test if the coordinates of activations during tonal rehearsal differed from coordinates of activations during verbal rehearsal. These t‐tests did not indicate any difference between verbal and tonal rehearsal (P was between 0.1 and 0.2 in four tests, between 0.21 and 0.89 in 33 tests, and > 0.9 in two tests). This indicates that the functional architecture of verbal and tonal rehearsal does not differ, at least when applying the task used in the present study.

To test differences in activation strength between conditions, and lateralization of activations, ANOVAs with factors condition (tonal rehearsal, verbal rehearsal) and hemisphere were carried out for each ROI, indicating significant effects of hemisphere for the vlPMC (P < 0.05), and a marginally significant effect of hemisphere for the SMG/IPS (P < 0.07) as well as for the planum temporale (P < 0.07). Significant effects of condition were indicated for BA 45/46 (P < 0.02), and for the vlPMC (P < 0.05). A t‐test comparing left and right BA46 for the tonal condition only indicated a significant difference between hemispheres (P < 0.05).

fMRI Data: Suppression Conditions

Figure 4A,B shows the activations during verbal and tonal WM under articulatory suppression (i.e., singing the children's song while maintaining the pitches or the syllables in WM) contrasted to the control condition (singing the children's song without keeping the pitches or syllables in memory). Significant activations were observed for both verbal and tonal conditions within the left vlPMC (extending into the pars opercularis/Broca's area), the anterior insula, the right cerebellum, and the right striatum (see Fig. 4, and Table II). Moreover, activations were present in the IFG (pars triangularis, BA 45/46) during the tonal condition, and during both tonal and verbal conditions in the inferior frontal sulcus (IFS, see also Table II). The latter activation extended anteriorly along the upper bank of the IFS into the frontomarginal/anterior intermediate frontal sulcus (see inset in Fig. 4). The conjunction analysis showed that, in the left hemisphere, vlPMC, and anterior prefrontal areas were significantly activated during maintenance of both verbal and tonal information in the face of simultaneous suppression (Fig. 4C). Moreover, activations for both conditions were observed in the anterior insula bilaterally, as well as in the right putamen and the right cerebellum.

Figure 4.

Activations during maintenance of verbal (A) and maintenance of tonal (B) information under articulatory suppression (contrasted to the control condition in which subjects covertly sang, but did not memorize; P < 0.05 corrected for multiple comparisons). During both verbal and tonal conditions, activations were observed in the vlPMC (extending into the pars opercularis/Broca's area), the anterior insula, the right cerebellum, and the right ventral striatum (not shown). Additional activations were indicated in the pars triangularis, and in the inferior frontal sulcus. The inset in (B) shows that the latter activation extended anteriorly along the upper bank of the IFS into the frontomarginal/intermediate frontal sulcus (P < 0.05 uncorrected). (C) Shows areas that were significantly activated during both conditions.

Table II.

Activations elicited during maintenance of verbal, and maintenance of tonal information under articulatory suppression (contrasted to the control condition in which subjects covertly sang, but did not memorize)

L = left hemisphere, R = right hemisphere; coordinates are Talairach coordinates.

| Anatomical structure | BA | L coord. (SPM) | L z‐value (SPM) | L mm³ | L coord. (ROI) | L P‐value (ROI) | R coord. (SPM) | R z‐value (SPM) | R mm³ | R coord. (ROI) | R P‐value (ROI) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Verbal WM (during suppression) | | | | | | | | | | | |
| Intermediate frontal sulcus | | −35 40 24 | 3.13 | 111 | −34 38 23 | 0.05 | | | | 34 40 11 | n.s. |
| IFG/pars triangularis | 45/46 | | | | −44 27 3 | n.s. | | | | 48 30 1 | n.s. |
| vlPMC | 6 | −53 7 15 | 4.25 | 648 | −48 8 16 | 0.01 | | | | 47 5 18 | 0.05 |
| IPS/SMG | 40 | | | | −44 −36 41 | 0.05 | | | | 40 −40 41 | 0.08 |
| Ant. sup. Insula | | −29 19 3 | 5.09 | 1,269 | −29 20 3 | 0.0005 | 37 16 3 | 3.98 | 729 | 33 16 2 | 0.001 |
| Pre‐SMA | 6 | −5 16 51 | 5.05 | 2,511 | −5 14 52 | 0.0001 | | | | | |
| Putamen | | | | | | | 22 16 −3 | 4.03 | 270 | | |
| Pallidum | | −14 −6 0 | 3.79 | 162 | | | | | | | |
| Cerebellum | | −38 −65 −24 | 4.75 | 432 | | | 25 −59 −18 | 3.82 | 891 | | |
| Tonal WM (during suppression) | | | | | | | | | | | |
| Intermediate frontal sulcus | | −35 40 21 | 3.99 | 648 | −33 39 21 | 0.05 | | | | 35 43 14 | 0.08 |
| IFG/pars triangularis | 45/46 | −41 37 6 | 3.86 | 270 | −44 29 3 | 0.05 | | | | 47 30 2 | n.s. |
| vlPMC (a) | 6 | −44 3 24 | 5.42 | | −44 3 23 | 0.005 | | | | 48 5 24 | 0.05 |
| IPS/SMG | 40 | −50 −35 48 | 3.76 | 243 | −46 −37 44 | 0.05 | | | | 41 −41 44 | 0.05 |
| Ant. sup. Insula | | −29 19 6 | 7.14 | 5,751 | −28 19 3 | 0.001 | 34 19 6 | 4.52 | 3,159 | 31 17 3 | 0.01 |
| Pre‐SMA | 6 | −5 25 45 | 6.36 | 5,940 | −3 14 51 | 0.0001 | | | | | |
| Cerebellum | | | | | | | 31 −56 −24 | 4.05 | 216 | | |

The table shows the results of the cluster analysis of statistical parametric maps (P < 0.05 corrected for multiple comparisons) and ROI analyses.

(a) The cluster in the insula had another local maximum in the vlPMC.

In contrast to the rehearsal condition, no significant activations were indicated for the dlPMC, or the planum temporale (the IPL was activated only during the tonal condition, and only in the left hemisphere). Because the absence of significant activations in the SPMs does not indicate that these structures were completely inactive, each coordinate of the network observed under verbal and tonal rehearsal was examined by searching for the nearest local maximum in the z‐maps of the suppression contrasts (only activations reaching a threshold of P < 0.05, uncorrected, were regarded as local maxima). In all structures of the left hemisphere, except the planum temporale, local maxima were found within the same, or the adjacent, voxel as in the rehearsal conditions. These findings were supported by ROI analyses (using the coordinates determined for the rehearsal conditions): All mentioned structures (vlPMC, dlPMC, SMG/IPS, and insula), but not the planum temporale, were also significantly activated (all P < 0.05) during the suppression conditions. That is, the activity of the network observed under articulatory rehearsal was not completely abolished, although strongly reduced during articulatory suppression.

In addition to this ROI analysis (which used ROI coordinates obtained for the rehearsal conditions), we also obtained the individual coordinates of activations during the suppression conditions (grand‐averaged ROI‐coordinates are provided in Table II). As in the rehearsal conditions, in each of the analyzed structures (except the vlPMC) grand‐averaged coordinates of local maxima of the verbal and the tonal conditions were located within the same, or the directly adjacent voxel. In the vlPMC, the ROI coordinates differed between the verbal and the tonal condition (with regard to x‐, y‐, and z‐coordinates, P < 0.05 in all three paired t‐tests). No such differences between conditions were indicated for any other structure (neither in x‐, y‐, nor z‐direction, P > 0.2 in each test). To test for differences in activation strength between conditions, and for lateralization of activations, ANOVAs with factors condition (maintenance of tonal, and maintenance of verbal information, both during articulatory suppression) and hemisphere were carried out for each ROI, but no main effects or interactions were indicated.
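As a rough check on the coordinate comparison above, the separation between the verbal and tonal grand‐averaged left‐hemisphere ROI coordinates listed in Table II can be computed directly. The sketch below assumes Euclidean distance in Talairach space, which is an assumption of this illustration (the text does not state which distance measure underlies "the same, or the directly adjacent voxel"); the dictionary and function names are hypothetical.

```python
import math

# Left-hemisphere grand-averaged ROI coordinates (Talairach, mm) from Table II,
# given as (verbal under suppression, tonal under suppression).
ROI_COORDS = {
    "Intermediate frontal sulcus": ((-34, 38, 23), (-33, 39, 21)),
    "IFG/pars triangularis": ((-44, 27, 3), (-44, 29, 3)),
    "vlPMC": ((-48, 8, 16), (-44, 3, 23)),
    "IPS/SMG": ((-44, -36, 41), (-46, -37, 44)),
    "Ant. sup. insula": ((-29, 20, 3), (-28, 19, 3)),
    "Pre-SMA": ((-5, 14, 52), (-3, 14, 51)),
}


def euclidean_mm(a, b):
    """Euclidean distance between two 3-D coordinates given in mm."""
    return math.dist(a, b)


# Distance between the verbal and tonal coordinate for each structure.
distances = {
    name: euclidean_mm(verbal, tonal)
    for name, (verbal, tonal) in ROI_COORDS.items()
}

# Print the structures from smallest to largest separation.
for name, d in sorted(distances.items(), key=lambda item: item[1]):
    print(f"{name}: {d:.2f} mm")
```

Under this assumption the vlPMC coordinates are separated by about 9.5 mm, clearly more than any other structure in the list, which matches the vlPMC being the only structure for which the paired t‐tests indicated a difference between conditions.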

Figure 5 shows activations of the covert singing (contrast: singing vs. not singing, without memorizing pitches or syllables in both conditions, see also Table III). Marked activations were found within the planum temporale bilaterally (in the left hemisphere extending into the supramarginal gyrus), the Rolandic operculum bilaterally, and the dlPMC bilaterally. Notably, in contrast to the rehearsal and suppression conditions, activations within the vlPMC or Broca's area were not significant with the applied statistical threshold.

Figure 5.

Activations elicited during covert singing (contrasted to the control condition in which subjects did not sing; P < 0.05 corrected for multiple comparisons).

Table III.

Activations of singing (contrasted to the control condition in which subjects did not sing)

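L = left hemisphere, R = right hemisphere; coordinates are Talairach coordinates.

| Anatomical structure | BA | L coord. (SPM) | L z‐value (SPM) | L mm³ | L P‐value (ROI) | R coord. (SPM) | R z‐value (SPM) | R mm³ | R P‐value (ROI) |
|---|---|---|---|---|---|---|---|---|---|
| Singing | | | | | | | | | |
| dlPMC | 4/6 | −50 −8 42 | 6.27 | 810 | 0.005 | 49 −8 39 | 6.03 | 459 | 0.001 |
| Rol. operc. | 43 | −59 −8 12 | 5.47 | 1,134 | 0.001 | | | | 0.01 |
| p.t./SMG | | −47 −38 21 | 7.38 | 5,643 | 0.0001 | 49 −29 10 | 6.83 | 1,593 | 0.0005 |
| IPL/ang. gyrus | | −35 −68 39 | 5.10 | 1,377 | 0.05 | | | | n.s. |
| IPS/SPL | | | | | n.s. | 43 −53 48 | 4.59 | 1,053 | 0.05 |
| Precuneus | | | | | | 1 −59 39 | 4.99 | 2,052 | 0.001 |
| PCC | | | | | | 1 −38 21 | 5.06 | 2,511 | 0.001 |
| Thalamus | | −17 −17 15 | 5.59 | 2,150 | | 16 −15 18 | 4.96 | 1,650 | |
| Cerebellum | | −29 −59 −18 | 6.37 | 2,700 | | 22 −62 −15 | 6.58 | 3,861 | |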

The table shows the results of the cluster analysis of statistical parametric maps (P < 0.05 corrected for multiple comparisons) and ROI analyses.

DISCUSSION

Rehearsal

During the verbal rehearsal, a neural network including the vlPMC and dlPMC, the anterior insula, the SMG/IPS, the planum temporale, the IFG, pre‐SMA, and the cerebellum was activated. This network has been described in previous studies on verbal WM with auditory [Hickok et al.,2003], and visual stimuli [with the exception of the planum temporale; Awh et al.,1996; Chen and Desmond,2005; Gruber,2001; Gruber and von Cramon,2001,2003; Kirschen et al.,2005; Paulesu et al.,1993]. Importantly, virtually the same network as during verbal rehearsal was also found to be activated during the tonal rehearsal: the coordinates of the above mentioned activations did not differ within subjects between the verbal and the tonal rehearsal, and the conjunction analysis showed that the mentioned structures (in concert with the left putamen, pallidum, and thalamus, as well as the right caudate nucleus) were significantly activated in both conditions. Thus, the present data show a remarkable overlap of neocortical, subcortical, and cerebellar neuronal resources underlying the rehearsal of verbal and tonal stimuli, indicating that phonological rehearsal mechanisms are less specialized for language than usually believed. This finding is in agreement with previous studies on tonal WM [Gaab et al.,2003; Hickok et al.,2003; Zatorre et al.,1994] which reported activations for tonal WM that were similar to those observed in studies on verbal WM (see above). Our data are also in line with previous findings from Hickok et al. [2003], who observed activations of the vlPMC, the IPS/SMG, the planum temporale (referred to as area Spt by the authors), and the dlPMC for the rehearsal of both melodies and sentences.

Differences between conditions were activations of the triangular part of the left inferior frontal gyrus, and of the left anterior superior insula during the tonal, but not during the verbal rehearsal task. However, it is unlikely that these regions play a role only for tonal WM: Both the left anterior insula [Bamiou et al.,2003; Chein et al.,2002; Paulesu et al.,1993] as well as the triangular part of the left inferior frontal gyrus [e.g., Cabeza and Nyberg,2000] have also been reported to be involved in verbal WM tasks [as well as in other tasks such as speech production and action observation, e.g. Augustine,1996; Binkofski et al.,1999; Buccino et al.,2001]. During WM tasks, the latter region appears to become particularly involved when strategic processes come into play, such as organizing of WM contents into higher level chunks [see Bor et al.,2003; that study used a visual‐spatial task]. Because the tonal WM task was more difficult than the verbal task (as reflected in the behavioral data), it is likely that participants engaged strategic processes (especially chunking of the pitches of a sequence into melodic segments) more strongly during tonal rehearsal than during verbal rehearsal, and that this engagement may be reflected by the activation of BA 45/46 during the tonal rehearsal. The greater difficulty of the tonal task presumably also explains why activations were stronger during the tonal than during the verbal rehearsal. Taken together, the present data thus indicate that rehearsal of tonal information (i.e., a “tonal loop”) relies on neural resources that strongly overlap in their topography with those involved in the rehearsal of verbal information.

It is unlikely that the similar topography of activations for verbal (syllable) and tonal (pitch) rehearsal is simply due to noncompliant behavior on the part of the subjects (i.e., simultaneous rehearsal of both syllables and pitches): First, the behavioral data recorded during the training‐session (in which participants rehearsed overtly, allowing the experimenter to control that they were rehearsing either only the tones, or only the syllables, but not both simultaneously) were very similar to those recorded during the covert rehearsal of the fMRI session. Second, after a little training, it is considerably easier, and more convenient, to rehearse either the tones or the syllables instead of both (this was also mentioned by our participants when asked after the experiment). Third, activations were in some structures significantly stronger during the tonal rehearsal (compared to the verbal rehearsal), which is not to be expected if participants performed the same rehearsal (i.e., simultaneous rehearsal of syllables and pitches) in both conditions. Finally, fMRI research experience tells us that subjects are generally highly compliant in covert tasks [see also Callan et al.,2006].

Nonspecificity of Activations During Rehearsal

During subvocal articulatory rehearsal, strong activations of the vlPMC, extending anteriorly into the precentral sulcus, and Broca's area were observed. Such activations were not observed during the subvocal singing, suggesting that vlPMC/Broca's area plays a more specific role for verbal and tonal WM. The present data, thus, point to the particular importance of vlPMC as an active rehearsal component (which is a substantial part of the articulatory loop). However, it should also be noted that the vlPMC is not only involved in WM functions: Previous studies have also shown involvement of this region in a number of other functions such as action planning and understanding, serial prediction, and analysis as well as recognition of sequential information [Buccino et al.,2001; Conway and Christiansen,2001; Huettel et al.,2002; Meyer and Jancke,2006; Rizzolatti and Craighero,2004; Schubotz and von Cramon,2002; see also below].

Although the SMG and the IPS were also active during the singing condition, activations of these areas appeared to be considerably stronger during the rehearsal conditions. Because these structures have previously been reported to play an important role for WM [e.g., Awh et al.,1996; Crottaz‐Herbette et al.,2004; Gruber,2001; Gruber and von Cramon,2003], and because singing also involves WM operations, it is likely that these areas serve WM processes, rather than simply articulatory processes. For example, following a suggestion by Cohen et al. [1997], these inferior parietal areas may store phonological long‐term information that may be actively accessed via item‐specific functional connections to the anterior prefrontal cortex, which has been shown to play a major role both in verbal WM [Gruber,2001; Gruber and von Cramon,2003] and in memory retrieval [see, for example, Buckner and Koutstaal,1998]. Nevertheless, we already noted in the Introduction that regions along the IPS are also involved in attentional mechanisms (and in a number of other functions such as spatial localization, reaching and grasping, as well as task switching, for overviews see Cabeza and Nyberg,2000; Culham and Kanwisher,2001; Corbetta and Shulman,2002). Hence, further studies are needed to clarify the particular role that this region plays for WM.

By contrast, the planum temporale was not only activated during the subvocal rehearsal, but activated even more strongly during the covert singing condition. This suggests that the planum temporale plays a role for mechanisms that are not directly dependent on WM processes, such as the formation of auditory images during rehearsal or singing [Halpern and Zatorre,1999], transformation of such images into motor codes [Buchsbaum et al.,2005; Callan et al.,2006; Hickok et al.,2003; Warren et al.,2005], segregation and analysis of the spectrotemporal structure of sounds [Binder et al.,2000; Griffiths and Warren,2002; Jäncke et al.,2002], as well as matching of spectrotemporal patterns with learned spectrotemporal representations [Griffiths and Warren,2002].

Like the planum temporale, the dorsal precentral gyrus was not only activated during rehearsal, but also during subvocal singing. The coordinates of the dlPMC activations were virtually identical between the singing and rehearsal conditions, and also highly similar to the coordinates reported for monotonic vocalizations of tones in a previous study [Brown et al.,2004; in that study, the coordinate reported for monotonic vocalization was x = −48, y = −10, z = 44]. This indicates that this region of the dorsolateral premotor cortex serves articulatory processes independent of WM operations.

Suppression

Activations in the dlPMC, the planum temporale, and the IPL were considerably smaller during the suppression conditions (i.e., singing with the additional task of remembering the syllables and the pitches) compared to the rehearsal conditions. This indicates that the articulatory suppression impaired the phonological loop, which is also reflected in the behavioral data. However, activations within these regions were not completely abolished. The possible reasons for the residual activations of these areas during suppression are discussed in the next section.

Interestingly, particularly during the maintenance of tonal information under articulatory suppression, activations within the IFG/IFS extended into the frontomarginal sulcus/anterior intermediate frontal sulcus. This further supports previous findings suggesting that these anterior prefrontal areas constitute an important component of a (bilateral) prefrontal‐parietal network that becomes activated whenever the speech‐based rehearsal mechanism is not available, or not sufficient, to solve a memory task by itself [Gruber,2001; Gruber and Goschke,2004; Gruber and von Cramon,2001,2003; Gruber et al.,2005, 2007]. Therefore, these areas might contain additional storage components of WM that are activated when auditory information cannot be rehearsed [Gruber and von Cramon,2003]. Note that the latter study [Gruber and von Cramon,2003] also showed that these regions are selectively activated in a (phonological) WM task under articulatory suppression, but not under similar conditions of conflict in the visuospatial domain (i.e. during visuospatial WM under visuospatial suppression), providing evidence that these regions are not simply involved in general executive control that supports other WM areas in situations of conflict, but rather subserve domain‐specific processes related to phonological WM. Because these areas were activated during both verbal and tonal WM under suppression conditions, our data indicate that this additional WM component is important for the storage of both verbal and tonal information.

Rehearsal and Suppression: Sensorimotor Codes

Previous work has suggested that sensorimotor processes may assist with the representation and manipulation of information, and that sensorimotor coding plays an important role for WM processes [for a review see Wilson,2001]. Sensorimotor codes provide resources for the representation and maintenance of information (in the present study verbal and tonal information), and it is highly plausible that such resources were used by the participants to perform the WM tasks. This assumption is supported by the strong activation of lateral premotor areas along with parietal areas, cerebellar, and subcortical regions during the rehearsal conditions (and, although to a lesser extent, also during the suppression conditions). Numerous neurons in these regions are also involved in cortico‐basal ganglia thalamo‐cortical and cerebellar loops that serve voluntary motor control, and contribute to the programming, initiation, and execution of movements [Hoover and Strick,1999; Leblois et al.,2006; Middleton and Strick,2000; Parent and Hazrati,1995].

Also note that motor actions are not only coded by premotor, but also by parietal areas and that, in addition, parietal areas translate sensory input into information appropriate for action, and provide representations of these actions with specific sensory information [e.g., Fogassi and Luppino,2005; Fogassi et al.,2005]. In the present study, the parietal (SMG/IPS) and ventrolateral as well as dorsolateral premotor areas observed to be active during the rehearsal tasks (along with subcortical and cerebellar structures) might thus represent neural circuits involved in the formation and maintenance of sensorimotor codes serving the rehearsal of the tonal and verbal information. Because sensorimotor coding is involved in a number of different tasks (such as observing, performing, or recognizing actions), parts of the network observed in the present study (particularly the premotor and parietal regions) have also been reported in a number of previous studies that did not focus on WM [for an overview see Rizzolatti and Craighero,2004, see also Janata et al.,2002]. Likewise, one reason for the strong overlap of neuronal networks involved in verbal and tonal WM functions is presumably that WM for phonemes and for pitches relies to a considerable amount on sensorimotor‐related circuits which are similar for speech and song [see also Callan et al.,2006; that study showed a remarkable overlap of the brain structures involved in covert singing and covert speech, among them dlPMC and the planum temporale].

The residual activations of these areas during suppression are possibly due to the formation of motor representations during the presentation of stimuli in the suppression trials, which were not instantly erased with the onset of the suppression, but which probably decayed during the suppression, and were, thus, still residually observable. That is, it appears likely that the verbal and tonal information was encoded in sensorimotor representations, that these codes were held active during rehearsal, but decayed during suppression. On the other hand, it cannot be excluded that the activations of SMG/IPS and vlPMC during the suppression conditions were simply due to erroneous rehearsal in some trials by some participants. Nevertheless, the strong decrease of performance during the suppression condition (as reflected in the behavioral data) indicates that participants mainly followed the instructions correctly. During articulatory suppression, the local maximum of activation in the vlPMC appeared to differ between tonal and verbal condition (as indicated by the ROI coordinates). However, because no such difference was found in the rehearsal conditions, and because the conjunction analysis showed a clear overlap of both tonal and verbal WM under suppression, we suggest awaiting replication of this effect in future studies before interpreting it.

Singing

An additional finding of interest was the activation of the Rolandic operculum during the singing condition (as well as during rehearsal). Similar activations have been reported in previous functional imaging studies on both overt and covert singing [Jeffries et al.,2003; Riecker et al.,2000; Wildgruber et al.,1996]. The Rolandic operculum has been proposed to contain the representation of the larynx (and the pharynx), that is, of a vocal tract articulator crucially involved in the production of melody [Koelsch et al.,2006]. The present results support this assumption, and highlight the importance of this area for the production of frequency‐modulated vocal signals.

In conclusion, our data show that the topography of neocortical, subcortical, and cerebellar WM components is strongly overlapping for the rehearsal of verbal and tonal information, as well as for the maintenance of such information during articulatory suppression. This indicates that the functional architecture of verbal and tonal WM is remarkably similar. Articulatory rehearsal of verbal and tonal information involved mainly motor‐related areas (along with basal ganglia and thalamic nuclei, as well as the cerebellum), whereas maintenance of information during articulatory suppression additionally involved anterior prefrontal areas which might contain additional storage components of WM that are activated when auditory information cannot be rehearsed. The overlap of the neuronal networks underlying verbal and tonal WM and the involvement of brain structures implicated in sensorimotor processing suggest that WM for phonemes and for pitches relies considerably on sensorimotor‐related circuits which are similar (and partly identical) for speech and song. Because of such sensorimotor coding, some WM circuits also overlap with circuits involved in other cognitive tasks which do not involve WM, but require activity of sensorimotor‐related processes.

Acknowledgements

This study was supported by a grant from the German Research Foundation (Deutsche Forschungsgemeinschaft) awarded to S.K. (KO 2266/4–1).

REFERENCES

  1. Augustine JR (1996): Circuitry and functional aspects of the insular lobe in primates including humans. Brain Res Rev 22: 229–244.
  2. Awh E, Jonides J, Smith EE, Schumacher EH, Koeppe RA, Katz S (1996): Dissociation of storage and rehearsal in verbal working memory: Evidence from positron emission tomography. Psychol Sci 7: 25–31.
  3. Baddeley AD (1992): Working memory. Science 255: 556–559.
  4. Baddeley AD (2003): Working memory: Looking back and looking forward. Nat Rev Neurosci 4: 829–839.
  5. Baddeley AD, Hitch GJ (1974): Working memory. In: Bower GA, editor. Recent Advances in Learning and Motivation, Vol. VIII. New York: Academic Press; pp 47–89.
  6. Bamiou DE, Musiek FE, Luxon LM (2003): The insula (Island of Reil) and its role in auditory processing. Brain Res Rev 42: 143–154.
  7. Binder JR, Frost JA, Hammeke TA, Bellgowan PS, Springer JA, Kaufman JN, Possing ET (2000): Human temporal lobe activation by speech and nonspeech sounds. Cereb Cortex 10: 512–528.
  8. Binkofski F, Buccino G, Posse S, Seitz RJ, Rizzolatti G, Freund H (1999): A fronto‐parietal circuit for object manipulation in man: Evidence from an fMRI‐study. Eur J Neurosci 11: 3276–3286.
  9. Bor D, Duncan J, Wiseman RJ, Owen AM (2003): Encoding strategies dissociate prefrontal activity from working memory demand. Neuron 37: 361–367.
  10. Bosch V (2000): Statistical analysis of multi‐subject fMRI data: The assessment of focal activations. J Magn Reson Imaging 11: 61–64.
  11. Brown S, Martinez MJ, Hodges DA, Fox PT, Parsons LM (2004): The song system of the human brain. Brain Res Cogn Brain Res 20: 363–375.
  12. Buccino G, Binkofski F, Fink GR, Fadiga L, Fogassi L, Gallese V, Seitz RJ, Zilles K, Rizzolatti G, Freund H‐J (2001): Action observation activates premotor and parietal areas in a somatotopic manner: An fMRI study. Eur J Neurosci 13: 400–404.
  13. Buchsbaum BR, Olsen RK, Koch PF, Kohn P, Kippenhan JS, Berman KF (2005): Reading, hearing, and the planum temporale. Neuroimage 24: 444–454.
  14. Buckner RL, Koutstaal W (1998): Functional neuroimaging studies of encoding, priming, and explicit memory retrieval. Proc Natl Acad Sci USA 95: 891–898.
  15. Cabeza R, Nyberg L (2000): Imaging cognition II: An empirical review of 275 PET and fMRI studies. J Cogn Neurosci 12: 1–47.
  16. Callan DE, Tsytsarev V, Hanakawa T, Callan AM, Katsuhara M, Fukuyama H, Turner B (2006): Song and speech: Brain regions involved with perception and covert production. NeuroImage 31: 1327–1342.
  17. Chan AS, Ho YC, Cheung MC (1998): Music training improves verbal memory. Nature 396: 128.
  18. Chein JM, Fissell K, Jacobs S, Fiez JA (2002): Functional heterogeneity within Broca's area during verbal working memory. Physiol Behav 77: 635–639.
  19. Chen SH, Desmond JE (2005): Cerebrocerebellar networks during articulatory rehearsal and verbal working memory tasks. Neuroimage 24: 332–338.
  20. Cohen JD, Perlstein WM, Braver TS, Nystrom LE, Noll DC, Jonides J, Smith EE (1997): Temporal dynamics of brain activation during a working memory task. Nature 386: 604–608.
  21. Conway C, Christiansen M (2001): Sequential learning in non‐human primates. Trends Cogn Sci 5: 539–546.
  22. Corbetta M, Shulman GL (2002): Control of goal‐directed and stimulus‐driven attention in the brain. Nat Rev Neurosci 3: 201–215.
  23. Crottaz‐Herbette S, Anagnoson RT, Menon V (2004): Modality effects in verbal working memory: Differential prefrontal and parietal responses to auditory and visual stimuli. Neuroimage 21: 340–351.
  24. Culham JC, Kanwisher NG (2001): Neuroimaging of cognitive functions in human parietal cortex. Curr Opin Neurobiol 11: 157–163.
  25. Daneman M, Carpenter PA (1980): Individual differences in working memory and reading. J Verbal Learn Verbal Behav 19: 450–466.
  26. Deutsch D (1970): Tones and numbers: Specificity of interference in immediate memory. Science 168: 1604–1605.
  27. Fiez JA, Raife EA, Balota DA, Schwarz JP, Raichle ME, Petersen SE (1996): A positron emission tomography study of the short‐term maintenance of verbal information. J Neurosci 16: 808–822.
  28. Fogassi L, Luppino G (2005): Motor functions of the parietal lobe. Curr Opin Neurobiol 15: 626–631.
  29. Fogassi L, Ferrari PF, Gesierich B, Rozzi S, Chersi F, Rizzolatti G (2005): Parietal lobe: From action organization to intention understanding. Science 308: 644–645.
  30. Friston K (1994): Statistical parametric maps in functional imaging: A general linear approach. Hum Brain Mapp 2: 189–210.
  31. Gaab N, Gaser C, Zaehle T, Jancke L, Schlaug G (2003): Functional anatomy of pitch memory—An fMRI study with sparse temporal sampling. Neuroimage 19: 1417–1426. [DOI] [PubMed] [Google Scholar]
  32. Griffiths TD, Warren JD (2002): The planum temporale as a computational hub. Trends Neurosci 25: 348–353. [DOI] [PubMed] [Google Scholar]
  33. Gruber O (2001): Effects of domain‐specific interference on brain activation associated with verbal working memory task performance. Cereb Cortex 11: 1047–1055. [DOI] [PubMed] [Google Scholar]
  34. Gruber O, Goschke T (2004): Executive control emerging from dynamic interactions between brain systems mediating language, working memory and attentional processes. Acta Psychol 115: 105–121. [DOI] [PubMed] [Google Scholar]
  35. Gruber O, von Cramon DY (2001): Domain‐specific distribution of working memory processes along human prefrontal and parietal cortices: A functional magnetic resonance imaging study. Neurosci Lett 297: 29–32. [DOI] [PubMed] [Google Scholar]
  36. Gruber O, von Cramon DY (2003): The functional neuroanatomy of human working memory revisited. Evidence from 3‐T fMRI studies using classical domain‐specific interference tasks. Neuroimage 19: 797–809. [DOI] [PubMed] [Google Scholar]
  37. Gruber O, Gruber E, Falkai P (2005): Neural correlates of working memory deficits in schizophrenic patients. Ways to establish neurocognitive endophenotypes of psychiatric disorders. Radiologe 45: 153–160. [DOI] [PubMed] [Google Scholar]
  38. Gruber O, Müller T, Falkai P (2007): Dynamic interactions between brain systems underlying different components of verbal working memory. J Neural Trans 114: 1047–1050. [DOI] [PubMed] [Google Scholar]
  39. Halpern AR, Zatorre RJ (1999): When that tune runs through your head: A PET investigation of auditory imagery for familiar melodies. Cereb Cortex 9: 697–704. [DOI] [PubMed] [Google Scholar]
  40. Henson RN, Burgess N, Frith CD (2000): Recoding, storage, rehearsal and grouping in verbal short‐term memory: An fMRI study. Neuropsychologia 38: 426–440. [DOI] [PubMed] [Google Scholar]
  41. Hickok G, Buchsbaum B, Humphries C, Muftuler T (2003): Auditory‐motor interaction revealed by fMRI: Speech, music, and working memory in area Spt. J Cogn Neurosci 15: 673–682. [DOI] [PubMed] [Google Scholar]
  42. Hoover JE, Strick PL (1999): The organization of cerebellar and basal ganglia outputs to primary motor cortex as revealed by retrograde transneuronal transport of herpes simplex virus type 1. J Neurosci 19: 1446–1463. [DOI] [PMC free article] [PubMed] [Google Scholar]
  43. Huettel S, Mack P, McCarthy G (2002): Perceiving patterns in random series: Dynamic processing of sequence in prefrontal cortex. Nat Neurosci 5: 485–490. [DOI] [PubMed] [Google Scholar]
  44. Iwanaga M, Ito T (2002): Disturbance effect of music on processing of verbal and spatial memories. Percept Mot Skills 94(3, Part 2): 1251–1258. [DOI] [PubMed] [Google Scholar]
  45. Janata P, Tillmann B, Bharucha JJ (2002): Listening to polyphonic music recruits domain‐general attention and working memory circuits. Cogn Affect Behav Neurosci 2: 121–140. [DOI] [PubMed] [Google Scholar]
  46. Jäncke L, Wüstenberg T, Scheich H, Heinze H‐J (2002): Phonetic perception and the temporal cortex. NeuroImage 15: 733–746. [DOI] [PubMed] [Google Scholar]
  47. Jeffries KJ, Fritz JB, Braun AR (2003): Words in melody: An H(2)15O PET study of brain activation during singing and speaking. Neuroreport 14: 749–754. [DOI] [PubMed] [Google Scholar]
  48. Jones DM, Macken WJ, Nicholls AP (2004): The phonological store of working memory: Is it phonological and is it a store? J Exp Psychol Learn Mem Cogn 30: 656–674. [DOI] [PubMed] [Google Scholar]
  49. Jonides J, Schumacher EH, Smith EE, Koeppe RA, Awh E, Reuter‐Lorenz PA, Marshuetz C, Willis CR (1998): The role of parietal cortex in verbal working memory. J Neurosci 18: 5026–5034. [DOI] [PMC free article] [PubMed] [Google Scholar]
  50. Kirk RE (1995): Experimental Design. Pacific Grove, CA: ITP. [Google Scholar]
  51. Kirschen MP, Chen SH, Schraedley‐Desmond P, Desmond JE (2005): Load‐ and practice‐dependent increases in cerebro‐cerebellar activation in verbal working memory: An fMRI study. Neuroimage 24: 462–472. [DOI] [PubMed] [Google Scholar]
  52. Koelsch S, Fritz T, von Cramon DY, Muller K, Friederici AD (2006): Investigating emotion with music: An fMRI study. Hum Brain Mapp 27: 329–350. [DOI] [PMC free article] [PubMed] [Google Scholar]
  53. Leblois A, Boraud T, Meissner W, Bergman H, Hansel D (2006): Competition between feedback loops underlies normal and pathological dynamics in the basal ganglia. J Neurosci 26: 7317–7318. [DOI] [PMC free article] [PubMed] [Google Scholar]
  54. Lohmann G, Muller K, Bosch V, Mentzel H, Hessler S, Chen L, Zysset S, von Cramon DY (2001): LIPSIA—A new software system for the evaluation of functional magnetic resonance images of the human brain. Comput Med Imaging Graph 25: 449–457. [DOI] [PubMed] [Google Scholar]
  55. Meyer M, Jancke L (2006): Involvement of left and right frontal operculum in speech and nonspeech perception and production. In: Grodzinsky Y, Amunts K, editors. Broca's Region. New York: Oxford University Press; pp 218–241. [Google Scholar]
  56. Middleton FA, Strick PL (2000): Basal ganglia and cerebellar loops: Motor and cognitive circuits. Brain Res Brain Res Rev 31: 236–250. [DOI] [PubMed] [Google Scholar]
  57. Oldfield RC (1971): The assessment and analysis of handedness: The Edinburgh inventory. Neuropsychologia 9: 97–113. [DOI] [PubMed] [Google Scholar]
  58. Parent A, Hazrati LN (1995): Functional anatomy of the basal ganglia. I. The cortico‐basal ganglia‐thalamo‐cortical loop. Brain Res Brain Res Rev 20: 91–127. [DOI] [PubMed] [Google Scholar]
  59. Paulesu E, Frith CD, Frackowiak RS (1993): The neural correlates of the verbal component of working memory. Nature 362: 342–345. [DOI] [PubMed] [Google Scholar]
  60. Pechmann T, Mohr G (1992): Interference in memory for tonal pitch: Implications for a working‐memory model. Mem Cognit 20: 314–320. [DOI] [PubMed] [Google Scholar]
  61. Ravizza SM, Delgado MR, Chein JM, Becker JT, Fiez JA (2004): Functional dissociations within the inferior parietal cortex in verbal working memory. Neuroimage 22: 562–573. [DOI] [PubMed] [Google Scholar]
  62. Riecker A, Ackermann H, Wildgruber D, Dogil G, Grodd W (2000): Opposite hemispheric lateralization effects during speaking and singing at motor cortex, insula and cerebellum. Neuroreport 11: 1997–2000. [DOI] [PubMed] [Google Scholar]
  63. Rizzolatti G, Craighero L (2004): The mirror‐neuron system. Annu Rev Neurosci 27: 169–192. [DOI] [PubMed] [Google Scholar]
  64. Schubotz RI, von Cramon DY (2002): Predicting perceptual events activates corresponding motor schemes in lateral premotor cortex: An fMRI study. NeuroImage 15: 787–796. [DOI] [PubMed] [Google Scholar]
  65. Salame P, Baddeley AD (1989): Effects of background music on phonological short‐term memory. Q J Exp Psychol A 41: 107–122. [Google Scholar]
  66. Semal C, Demany L, Ueda K, Halle PA (1996): Speech versus nonspeech in pitch memory. J Acoust Soc Am 100(2, Part 1): 1132–1140. [DOI] [PubMed] [Google Scholar]
  67. Talairach J, Tournoux P (1988): Co‐Planar Stereotaxic Atlas of the Human Brain. Stuttgart: Thieme. [Google Scholar]
  68. Warren JE, Wise RJ, Warren JD (2005): Sounds do‐able: Auditory‐motor transformations and the posterior temporal plane. Trends Neurosci 28: 636–643. [DOI] [PubMed] [Google Scholar]
  69. Westbury CF, Zatorre RJ, Evans AC (1999): Quantifying variability in the planum temporale: A probability map. Cereb Cortex 9: 392–405. [DOI] [PubMed] [Google Scholar]
  70. Worsley KJ, Friston KJ (1995): Analysis of fMRI time‐series revisited—again. Neuroimage 2: 173–181. [DOI] [PubMed] [Google Scholar]
  71. Wildgruber D, Ackermann H, Klose U, Kardatzki B, Grodd W (1996): Functional lateralization of speech production at primary motor cortex: A fMRI study. Neuroreport 7: 2791–2795. [DOI] [PubMed] [Google Scholar]
  72. Wilson M (2001): The case for sensorimotor coding in working memory. Psychonomic Bull Rev 8: 44–57. [DOI] [PubMed] [Google Scholar]
  73. Zatorre RJ, Evans AC, Meyer E (1994): Neural mechanisms underlying melodic perception and memory for pitch. J Neurosci 14: 1908–1919. [DOI] [PMC free article] [PubMed] [Google Scholar]

Articles from Human Brain Mapping are provided here courtesy of Wiley
