Abstract
A foreign language (L2) learned after childhood results in an accent. This functional neuroimaging study investigated speech in L2 as a sensory-motor skill. The hypothesis was that there would be an altered response in auditory and somatosensory association cortex, specifically the planum temporale and parietal operculum, respectively, when speaking in L2 relative to L1, independent of rate of speaking. These regions were selected for three reasons. First, an influential computational model proposes that these cortices integrate predictive feedforward and postarticulatory sensory feedback signals during articulation. Second, these adjacent regions (known as Spt) have been identified as a “sensory-motor interface” for speech production. Third, probabilistic anatomical atlases exist for these regions, to ensure the analyses are confined to sensory-motor differences between L2 and L1. The study used functional magnetic resonance imaging (fMRI), and participants produced connected overt speech. The first hypothesis was that there would be greater activity in the planum temporale and the parietal operculum when subjects spoke in L2 compared with L1, one interpretation being that there is less efficient postarticulatory sensory monitoring when speaking in the less familiar L2. The second hypothesis was that this effect would be observed in both cerebral hemispheres. Although Spt is considered to be left-lateralized, this is based on studies of covert speech, whereas overt speech is accompanied by sensory feedback to bilateral auditory and somatosensory cortices. Both hypotheses were confirmed by the results. These findings provide the basis for future investigations of sensory-motor aspects of language learning using serial fMRI studies.
Keywords: bilingualism, functional magnetic resonance imaging, language
The majority of functional imaging research on bilingualism has been concerned with identifying similarities and differences in the linguistic and cognitive processing of native (L1) and nonnative (L2, L3, etc.) languages (Abutalebi et al. 2008; Crinion et al. 2006; Illes et al. 1999; Kim et al. 1997; Klein et al. 1995; Price et al. 1999). However, what is apparent to a native listener of L1 is the accent of the person to whom that language is L2, particularly when it was acquired in adulthood (Flege et al. 1995). The few studies that investigated the consequences of overtly producing an L2 acquired after early childhood used tasks that required only the utterance of single words rather than connected speech. They demonstrated differential activity confined to the basal ganglia (Frenck-Mestre et al. 2005; Klein et al. 1994), and functional differences in cortical regions were not reported. This contrasts with modern computational models of speech production that emphasize interactions between premotor, motor, and sensory cortical systems in the online control of articulatory movements (Guenther et al. 2006; Ventura et al. 2009).
It is proposed that, during speech production, fast neural pathways integrate feedforward discharges from premotor or motor cortex encoding articulatory gestures with signals from auditory and somatosensory feedback. These match expectation with outcome to monitor online for articulatory errors (Golfinopoulos et al. 2010; Guenther et al. 2006; Ventura et al. 2009). When speakers use L1, it has been shown that there is paradoxical suppression of neural activity (“sensory gating”) in auditory and somatosensory association cortex (Dhanjal et al. 2008; Ventura et al. 2009). This does not necessarily reflect a relative absence of online sensory-motor monitoring during highly automatic speech production in L1. Instead, it may indicate efficiency, whereby overall local suppression during vocalization actually increases the sensitivity of a subset of responsive neurons to sensory feedback (Eliades and Wang 2008).
The junction of temporal and parietal cortex within the posterior end of the lateral sulcus has been proposed as the site for sensory-motor integration during articulation (Hickok et al. 2003, 2009). This region incorporates the planum temporale and the dorsally adjacent parietal operculum. Although these regions are normally considered to be unimodal auditory and somatosensory cortices, respectively, it is now apparent that cross-modal processing occurs early in “unimodal” sensory streams (Smiley et al. 2007). The main hypothesis that motivated the present study was that activity within what has become known as area Spt would be modulated by the language spoken by a bilingual speaker. Subjects were chosen who had acquired their second language (L2) after early childhood and spoke it with an accent. The 18 nonnative English speakers were studied with functional magnetic resonance imaging (fMRI), contrasting the hemodynamic response in motor and sensory (auditory and somatosensory) cortices as the subjects used connected overt speech during the production of L1 (which varied across subjects) and L2 (English). Speech rate was recorded in both languages to ensure that this was not a confound when analyzing the imaging data.
MATERIALS AND METHODS
Subjects
Eighteen proficient nonnative English speakers (9 female; mean age, 27 yr) participated after giving informed written consent. The study was approved by the local research ethics committee. All participants had learned English as a nonnative language (mean age of starting to use English, 12 yr). They had considerable exposure to L2, since they were all living and working in England (mean time in England, 1 yr, 10 mo; range, 4 mo-6 yr). All had passed English language proficiency examinations for United Kingdom university entrance or visa applications and were able to speak at a normal conversational rate in English. Very early bilinguals dominant in both languages and bilinguals who had more basic L2 abilities or did not speak English every day were not asked to take part. There was a range of first languages: two Greek, two Italian, one German, one Dutch, three French, three Spanish, two Polish, one Russian, two Chinese (Mandarin), and one Indonesian.
Five additional subjects were excluded from analyses using functionally defined regions of interest, due to absence of significant activity for functional localizers within those regions. One subject was excluded from the analyses using individual anatomical regions of interest, due to a technical error.
Behavioral Assessment of Language Proficiency
Proficiency in English was assessed using standardized language tests. There were three measures of spoken English: age of learning English, amount of English used daily, and the speaking component of the Certificate of Proficiency in English (CPE) exam from the University of Cambridge ESOL examinations. There were two additional scores: knowledge of English vocabulary [the picture-naming component of the Bilingual Verbal Ability Tests (BVAT) (Munoz-Sandoval 1998)] and proficiency at reading, in which subjects were recorded reading a text aloud, thereby assessing accent proficiency. These were administered and assessed by a trained foreign language teacher (A. J. Simmonds).
Functional MRI Acquisition and Analysis
MRI data were obtained on a Philips Intera 3.0 Tesla scanner, using dual gradients, a phased array head coil, and sensitivity encoding with an undersampling factor of 2. Thirty-two axial slices with a slice thickness of 3.25 mm and an interslice gap of 0.75 mm were acquired in ascending order (resolution, 2.19 × 2.19 × 4.00 mm; field of view, 280 × 224 × 128 mm). Quadratic shim gradients were used to correct for magnetic field inhomogeneities within the anatomy of interest. There were two runs, each of 75 volumes. Functional MR images were obtained using a T2*-weighted, gradient-echo, echoplanar imaging (EPI) sequence with whole brain coverage (TR = 10 s; acquisition time = 2 s, giving 8 s for the subjects to speak in silence; echo time = 30 ms; flip angle, 90°). A "sparse" fMRI design was used to minimize the movement- and respiratory-related artifacts associated with speech studies (Hall et al. 1999), as well as to minimize auditory masking. High-resolution, T1-weighted images were also acquired for structural reference. Stimuli were presented visually using E-Prime software (Psychology Software Tools) run on an IFIS-SA system (In Vivo).
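To make the sparse timing concrete, the sketch below (our illustration, not the authors' stimulus code) lays out the schedule implied by these parameters; whether the 2-s acquisition precedes or follows the 8-s speech window within each TR is an assumption here, not stated above.

```python
# Illustrative sketch of the "sparse" fMRI timing: each 2-s volume
# acquisition is paired with 8 s of scanner silence in which the
# subject speaks (TR = 10 s, 75 volumes per run).

TR = 10.0          # s, repetition time
T_ACQ = 2.0        # s, image acquisition (scanner noise)
N_VOLUMES = 75     # volumes per run

def sparse_schedule(tr=TR, t_acq=T_ACQ, n_volumes=N_VOLUMES):
    """Return (acquisition_onset, speech_onset, speech_end) per volume,
    in seconds from the start of the run. Acquisition-first ordering
    within the TR is assumed for illustration."""
    schedule = []
    for i in range(n_volumes):
        acq_on = i * tr
        speak_on = acq_on + t_acq   # speech begins when the gradients stop
        speak_off = (i + 1) * tr    # and ends at the next acquisition
        schedule.append((acq_on, speak_on, speak_off))
    return schedule

if __name__ == "__main__":
    for acq_on, speak_on, speak_off in sparse_schedule()[:3]:
        print(f"acquire at {acq_on:5.1f} s, speak {speak_on:5.1f}-{speak_off:5.1f} s")
    # One run lasts 75 * 10 s = 750 s (12.5 min), with 8 s of silence per trial.
```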
fMRI data were analyzed using FEAT (FMRI Expert Analysis Tool) version 5.98, part of FSL. Preprocessing included motion correction using MCFLIRT (Jenkinson et al. 2002), nonbrain removal using BET (Brain Extraction Tool) (Smith 2002), spatial smoothing using a Gaussian kernel of 8-mm full width at half maximum, grand-mean intensity normalization of the entire four-dimensional (4-D) data set by a single multiplicative factor, and high-pass temporal filtering (Gaussian-weighted least-squares straight line fitting, with σ = 50.0 s). In addition, motion outliers were included in the model as additional confound variables. Time-series statistical analysis was performed using FILM (FMRIB's Improved Linear Modelling) with local autocorrelation correction (Woolrich et al. 2001). Z (Gaussianized T/F)-statistic images were thresholded using clusters determined by Z > 2.3 and a corrected cluster significance threshold of P = 0.05. Registration to high-resolution structural and standard space images was carried out using FLIRT (FMRIB's Linear Image Registration Tool) (Jenkinson et al. 2002; Jenkinson and Smith 2001). Higher level group analysis was carried out using FLAME (FMRIB's Local Analysis of Mixed Effects) stage 1 (Beckmann et al. 2003; Woolrich et al. 2004).
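As a rough illustration of the preprocessing steps listed above, the following sketch drives the corresponding FSL command-line tools from Python. It is not the authors' pipeline (FEAT wraps these steps itself, together with grand-mean scaling and FILM prewhitening); the filenames are hypothetical and the 6-dof registration choice is an assumption.

```python
# Minimal sketch of FEAT-style preprocessing via FSL CLI tools.
# Assumes FSL is installed and on PATH; func.nii.gz and struct.nii.gz
# are hypothetical input files.
import subprocess

TR = 10.0           # s (sparse acquisition)
FWHM_MM = 8.0       # spatial smoothing kernel
HP_SIGMA_S = 50.0   # high-pass Gaussian sigma, in seconds

def run(cmd):
    print(" ".join(cmd))
    subprocess.run(cmd, check=True)

# 1. Motion correction (MCFLIRT)
run(["mcflirt", "-in", "func", "-out", "func_mc", "-plots"])

# 2. Brain extraction of the structural image (BET)
run(["bet", "struct", "struct_brain"])

# 3. Spatial smoothing: fslmaths expects a Gaussian sigma in mm,
#    sigma = FWHM / (2 * sqrt(2 * ln 2)) ~= FWHM / 2.355
sigma_mm = FWHM_MM / 2.355
run(["fslmaths", "func_mc", "-s", f"{sigma_mm:.3f}", "func_smooth"])

# 4. High-pass temporal filtering: -bptf takes sigmas in volumes
hp_sigma_vols = HP_SIGMA_S / TR
run(["fslmaths", "func_smooth", "-bptf", f"{hp_sigma_vols:.1f}", "-1",
     "func_filt"])

# 5. Registration of functional to structural space (FLIRT)
run(["flirt", "-in", "func_mc", "-ref", "struct_brain",
     "-omat", "func2struct.mat", "-dof", "6"])
```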
Functional MRI Experimental Tasks
In the fMRI scanning session, subjects produced overt propositional speech in both their native language (L1) and English (L2). Rest was included as the baseline condition. During speech trials, subjects were instructed to give definitions of visually presented pictures. For example, when seeing a picture of an apple, the subject might respond, "This is a round fruit, it grows on a tree, it tastes nice." Each picture appeared twice, once for each language (country flags indicated whether L1 or L2 should be spoken). Across the two runs, there were 60 trials for each language and 30 rest trials. The pictures and the cue for speaking in L1 or L2 were kept constant across subjects to avoid the confound of repeated language switching, which would have occurred with single-trial randomization. The subjects performed 6–12 trials in one language, followed by 3 rest trials, and then 6–12 trials in the second language. Each picture appeared once in each run, and the language required for a specific picture was switched between runs.
Somatosensory and auditory functional localizers were also conducted (TR = 5 s, 96 volumes). The first condition involved silently moving the tongue from the floor of the mouth to the upper ridge of the hard palate, with the jaw open and still (Dhanjal et al. 2008). The second involved listening to filtered (either 0–1 or 1–2 kHz) amplitude-modulated (either 8 or 32 Hz) white noise to activate auditory cortex in the supratemporal plane (Warren and Griffiths 2003).
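A minimal sketch of how such auditory localizer stimuli could be synthesized, assuming a 44.1-kHz sample rate, a 4th-order Butterworth filter, and sinusoidal amplitude modulation (none of these implementation details are specified above):

```python
# Illustrative synthesis (not the authors' stimulus code) of band-limited,
# amplitude-modulated white noise: {0-1, 1-2 kHz} x {8, 32 Hz} conditions.
import numpy as np
from scipy.signal import butter, sosfiltfilt

FS = 44100  # Hz, assumed audio sample rate

def am_filtered_noise(duration_s, band_hz, mod_hz, fs=FS):
    """Band-pass filtered white noise with a sinusoidal amplitude envelope."""
    t = np.arange(int(duration_s * fs)) / fs
    noise = np.random.randn(t.size)
    lo, hi = band_hz
    if lo <= 0:
        sos = butter(4, hi, btype="lowpass", fs=fs, output="sos")
    else:
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
    filtered = sosfiltfilt(sos, noise)
    envelope = 0.5 * (1.0 + np.sin(2.0 * np.pi * mod_hz * t))  # 0..1
    stim = filtered * envelope
    return stim / np.max(np.abs(stim))  # normalize to +/-1

# The four localizer conditions described above:
examples = [am_filtered_noise(2.0, band, mod)
            for band in [(0, 1000), (1000, 2000)]
            for mod in [8, 32]]
```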
Materials
The picture stimuli consisted of 60 black-and-white line drawings of objects from the International Picture Naming Project (University of California, San Diego). The stimuli had been normed and matched for a range of linguistic and visual variables (see Table 1). They were ranked by familiarity (speed of response in picture-naming tasks) and grouped into objects with high and low familiarity. There were 30 pictures for each category (30 high familiarity and 30 low familiarity), and each picture appeared twice, once for L1 and once for L2. The 30 stimuli for the rest condition consisted of black-and-white random figures not depicting any clear object, to provide a high-level baseline.
Table 1.
Details of stimuli used from the International Picture Naming Project
| Stimuli | RT Target Mean | Familiarity | Syllables | Characters | Log Frequency CELEX | AOA | Visual Complexity |
|---|---|---|---|---|---|---|---|
| High familiarity | |||||||
| Book | 656 | 1 | 1 | 4 | 6.075 | 1 | 8,619 |
| Eye | 700 | 0.98 | 1 | 3 | 6.261 | 1 | 9,104 |
| Balloon | 702 | 1 | 2 | 7 | 1.946 | 1 | 8,015 |
| Dog | 702 | 1 | 1 | 3 | 4.754 | 1 | 12,012 |
| Pencil | 702 | 1 | 2 | 6 | 2.996 | 2 | 7,899 |
| Bell | 703 | 1 | 1 | 4 | 3.332 | 3 | 11,109 |
| Sock | 712 | 1 | 1 | 4 | 2.944 | 1 | 8,316 |
| Camera | 725 | 1 | 3 | 6 | 3.611 | 2 | 16,408 |
| Turtle | 734 | 1 | 2 | 6 | 1.609 | 1 | 14,768 |
| House | 745 | 0.98 | 1 | 5 | 6.409 | 1 | 18,069 |
| Frog | 751 | 1 | 1 | 4 | 2.303 | 1 | 14,773 |
| Sun | 762 | 1 | 1 | 3 | 5.03 | 1 | 18,102 |
| Cat | 766 | 0.96 | 1 | 3 | 4.22 | 1 | 9,894 |
| Finger | 775 | 0.98 | 2 | 6 | 4.82 | 1 | 5,370 |
| Spoon | 777 | 1 | 1 | 5 | 2.773 | 1 | 7,344 |
| Airplane | 778 | 0.7 | 2 | 8 | 1.946 | 1 | 16,810 |
| Ring | 785 | 1 | 1 | 4 | 1.386 | 3 | 7,652 |
| Television | 786 | 0.61 | 2 | 2 | 0 | 1 | 18,950 |
| Helicopter | 793 | 1 | 4 | 10 | 2.833 | 2 | 18,241 |
| Tree | 796 | 1 | 1 | 4 | 5.257 | 1 | 26,074 |
| Clown | 804 | 1 | 1 | 5 | 1.609 | 2 | 21,244 |
| Moon | 804 | 1 | 1 | 4 | 4.094 | 1 | 3,730 |
| Carrot | 806 | 1 | 2 | 6 | 2.197 | 1 | 13,201 |
| Banana | 808 | 1 | 3 | 6 | 2.197 | 1 | 8,767 |
| Apple | 810 | 1 | 2 | 5 | 3.434 | 1 | 8,241 |
| Knife | 816 | 1 | 1 | 5 | 3.807 | 2 | 8,773 |
| Broom | 821 | 1 | 1 | 5 | 2.197 | 1 | 11,261 |
| Glove | 848 | 1 | 1 | 5 | 2.996 | 3 | 11,509 |
| Leaf | 848 | 1 | 1 | 4 | 4.407 | 3 | 26,600 |
| Table | 852 | 0.98 | 2 | 5 | 5.464 | 1 | 12,010 |
| Low familiarity | |||||||
| Fire | 854 | 0.96 | 2 | 4 | 5.094 | 3 | 52,543 |
| Kangaroo | 856 | 1 | 3 | 8 | 1.386 | 3 | 14,555 |
| Web | 869 | 0.68 | 3 | 9 | 0 | 3 | 14,705 |
| Alligator | 881 | 0.9 | 4 | 9 | 1.099 | 2 | 14,874 |
| Palm tree | 908 | 0.86 | 2 | 8 | 0 | 3 | 18,577 |
| Parrot | 910 | 0.79 | 2 | 6 | 1.609 | 3 | 18,115 |
| Gorilla | 944 | 0.7 | 3 | 7 | 1.386 | 3 | 17,084 |
| Piggy bank | 965 | 0.94 | 3 | 9 | 0 | 3 | 24,489 |
| Sink | 984 | 0.96 | 1 | 4 | 2.773 | 1 | 26,560 |
| Rhinoceros | 998 | 0.77 | 4 | 10 | 1.099 | 3 | 18,320 |
| Stairs | 1011 | 0.74 | 1 | 6 | 3.807 | 1 | 27,602 |
| Swan | 1049 | 0.74 | 1 | 4 | 2.079 | 3 | 12,465 |
| Violin | 1051 | 0.82 | 3 | 6 | 1.946 | 3 | 8,571 |
| Trumpet | 1053 | 0.69 | 2 | 7 | 2.197 | 3 | 13,615 |
| Nest | 1059 | 0.73 | 1 | 4 | 2.89 | 3 | 12,296 |
| Saxophone | 1061 | 0.81 | 3 | 9 | 0.693 | 3 | 8,795 |
| Panda | 1071 | 0.38 | 2 | 5 | 0.693 | 3 | 29,117 |
| Teapot | 1085 | 0.44 | 2 | 6 | 1.609 | 3 | 17,625 |
| Ironing board | 1105 | 0.9 | 4 | 12 | 0 | 3 | 12,848 |
| Sweater | 1122 | 0.55 | 2 | 7 | 2.773 | 1 | 11,622 |
| Moose | 1158 | 0.76 | 1 | 5 | 0.693 | 2 | 23,330 |
| Dresser | 1163 | 0.48 | 2 | 7 | 1.792 | 3 | 21,173 |
| Canoe | 1164 | 0.62 | 2 | 5 | 1.946 | 3 | 27,029 |
| Lawn mower | 1166 | 0.96 | 3 | 9 | 0 | 2 | 18,238 |
| Vase | 1171 | 0.94 | 1 | 4 | 2.079 | 3 | 20,221 |
| Leopard | 1194 | 0.54 | 2 | 7 | 2.197 | 3 | 23,203 |
| Stethoscope | 1209 | 0.93 | 3 | 11 | 0.693 | 3 | 13,841 |
| Tractor | 1216 | 0.87 | 2 | 7 | 2.485 | 2 | 9,518 |
| Grasshopper | 1234 | 0.67 | 3 | 11 | 1.386 | 3 | 13,119 |
| Paper clip | 1262 | 0.81 | 3 | 9 | 0 | 3 | 21,555 |
Speech Rate
Syllables were counted from recordings of the speech trials using Praat (Boersma and Weenink 2007) with a validated automatic algorithm that detects syllable nuclei (De Jong and Wempe 2009), so that speech rate could be measured without manual transcription. Due to technical failure, speech output from six subjects was not recorded, and these subjects were excluded from subsequent analyses of speech rate. Trials were split into high and low speech rate, based on the median for each individual, and entered into a 2 × 2 factorial ANOVA design with the factors "language" and "speech rate."
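The sketch below illustrates this analysis path on a hypothetical long-format table of per-trial syllable counts (in the study these came from the Praat script); the within-subject median split and the paired language comparison follow the description above, with simulated counts standing in for the real recordings.

```python
# Sketch of the speech-rate analysis on simulated per-trial syllable counts.
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical long-format table: one row per speech trial,
# 12 subjects (as retained in the study) x 10 trials per language.
df = pd.DataFrame({
    "subject":   np.repeat(np.arange(12), 20),
    "language":  np.tile(["L1"] * 10 + ["L2"] * 10, 12),
    "syllables": np.random.poisson(8, 240).astype(float),
})

# Median split within each subject: high vs. low speech rate
med = df.groupby("subject")["syllables"].transform("median")
df["rate"] = np.where(df["syllables"] >= med, "high", "low")

# Cell means per subject for the 2 x 2 (language x rate) design
cells = (df.groupby(["subject", "language", "rate"])["syllables"]
           .mean().unstack(["language", "rate"]))
print(cells.round(1).head())

# Paired comparison of overall speech rate between languages
per_subj = df.groupby(["subject", "language"])["syllables"].mean().unstack()
t, p = stats.ttest_rel(per_subj["L1"], per_subj["L2"])
print(f"paired t-test L1 vs L2: t = {t:.3f}, p = {p:.3f}")
```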
Regions of Interest
Posterior auditory and somatosensory association cortices are anatomically variable (Ono et al. 1990; Penhune et al. 1996) and closely adjacent in posterior perisylvian cortex. Theoretically motivated regions of interest (ROI) were defined using two converging methodologies, one sensitive to functional individual variability and the other sensitive to anatomical variability. First, ROIs were defined by combining probabilistic anatomical masks from the FSL Harvard-Oxford Cortical Structural Atlas with results from individual functional localizers. Each subject's individual somatosensory and auditory functional localizers were multiplied with either parietal operculum or planum temporale probabilistic anatomical masks. These ROIs were then investigated in each hemisphere separately. To complement this functional analysis, parietal operculum and adjacent posterior planum temporale were defined anatomically on an individual basis using gyral and sulcal landmarks. The planum temporale was labeled using automatic parcellation within Freesurfer (Dale et al. 1999; Fischl et al. 2002). In the absence of a defined parietal operculum within Freesurfer, we employed boundaries from Eickhoff et al. (2006). In brief, the parietal operculum comprises dorsal cortex within the posterior lateral sulcus, with the anterior border defined by the postcentral sulcus and the medial border by the circular sulcus of the insula. The cortical surface was reconstructed from each subject's high-resolution T1 scan using Freesurfer; parietal operculum and posterior planum temporale were then automatically defined for each individual's reconstructed cortical surface. This approach has been shown to be comparable in accuracy to manual labeling of brain regions (Fischl et al. 2002). Within each ROI, mean effect sizes for L1 and L2 speech conditions, relative to rest, were calculated for each individual. For the anatomically defined ROIs, functional data were not spatially smoothed before averaging, to avoid any possibility of blurring of activation across the Sylvian fissure between temporal and parietal lobes.
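As an illustration of the first (functional) ROI definition, the sketch below intersects a subject's thresholded localizer map with a probabilistic anatomical mask and averages an effect-size image within the result. The filenames and the probability threshold are assumptions, and the localizer threshold of Z = 2.3 is borrowed from the cluster-forming threshold reported above rather than stated for the localizers themselves.

```python
# Sketch of the functional ROI definition and mean effect-size extraction.
# Assumes all images are coregistered in the same space.
import numpy as np
import nibabel as nib

def roi_mean_effect(localizer_z_path, prob_mask_path, cope_path,
                    z_thresh=2.3, prob_thresh=0.25):
    zmap = nib.load(localizer_z_path).get_fdata()
    prob = nib.load(prob_mask_path).get_fdata()   # probabilistic atlas values
    cope = nib.load(cope_path).get_fdata()        # effect size (e.g., FEAT cope)

    # ROI: voxels active in the individual localizer AND likely to lie
    # within the anatomical region (planum temporale or parietal operculum)
    roi = (zmap > z_thresh) & (prob > prob_thresh * prob.max())
    if not roi.any():
        return np.nan   # no localizer activity in the region: subject excluded,
                        # as happened for 5 subjects in this study
    return cope[roi].mean()

# e.g., left planum temporale response to L2 speech for one subject
# (hypothetical filenames):
# roi_mean_effect("auditory_localizer_z.nii.gz",
#                 "harvardoxford_planum_temporale_L.nii.gz",
#                 "cope_L2_vs_rest.nii.gz")
```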
RESULTS
Language Proficiency
All behavioral measures showed a high level of English proficiency and confirmed that the participants used English daily for work. The CPE speaking assessment demonstrated that all subjects had "fluent, spontaneous expression in clear, well-structured speech" (see Table 2). These scores, together with knowledge of English vocabulary (BVAT difference between native language and English: 10.5 ± 1.3, mean ± SE) and reading (202.5 ± 1.0 correct words out of 209), were included as regressors in the ROI analyses of the imaging data.
Table 2.
Behavioral proficiency scores
| Subject | Native Language (L1) | Age at Time of Scanning, yr (mo) | Age Started English, yr | English Experience, yr | Use of English* (maximum score, 21) | CPE Speaking Grade† |
|---|---|---|---|---|---|---|
| 1 | Greek | 24 (10) | 12 | 12 | 18 | C1+ |
| 2 | Italian | 25 (2) | 7 | 18 | 14 | C2 |
| 3 | German | 25 (5) | 11 | 14 | 18 | nd |
| 4 | Dutch | 32 (7) | 11 | 21 | 12 | C2 |
| 5 | French | 24 (2) | 14 | 10 | 21 | nd |
| 6 | French | 26 (7) | 12 | 14 | 14 | nd |
| 7 | Spanish | 27 (7) | 20 | 7 | 11 | C1 |
| 8 | Chinese | 26 (10) | 11 | 15 | 13 | nd |
| 9 | Chinese | 26 (4) | 12 | 14 | 13 | C1+ |
| 10 | Spanish | 26 (5) | 8 | 18 | 17 | B2+ |
| 11 | Indonesian | 23 (6) | 17 | 6 | 21 | B2+ |
| 12 | Polish | 31 (6) | 13 | 18 | 13 | C1 |
| 13 | Russian | 26 (8) | 11 | 15 | 17 | B2+ |
| 14 | Polish | 25 (11) | 12 | 13 | 18 | C1+ |
| 15 | Spanish | 32 (11) | 5 | 27 | 15 | C1 |
| 16 | Italian | 28 (5) | 13 | 15 | 19 | B2+ |
| 17 | Greek | 20 (11) | 9 | 11 | 9 | C1 |
| 18 | French | 35 (10) | 21 | 14 | 12 | nd |
*Subjects were asked to describe their current use of English on a sliding scale from 1 to 7, with native language only as 1 and English only as 7, for 3 categories: at work, at home, and other places. A score of 21 would show that the subject uses only English, and a score of 3 would mean that they use only their native language.
†There are 11 grades in the Cambridge ESOL Examinations assessment scales for the Certificate of Proficiency in English (CPE) (from low to high: A1, A1+, A2, A2+, B1, B1+, B2, B2+, C1, C1+, C2). nd, data not available for these subjects.
Functional MRI Analyses
Whole brain analyses.
Speech (L1 + L2) contrasted with a nonspeech "rest" baseline condition demonstrated activity that accorded with previous studies of propositional speech production (Blank et al. 2002; Dhanjal et al. 2008). This included bilateral primary sensorimotor and auditory cortical regions, the left pars opercularis, and the paravermal cerebellum (Fig. 1). At a more lenient statistical threshold (P < 0.01, uncorrected), activity was evident in midline premotor cortex (PMC), both the supplementary motor area (SMA) proper and the pre-SMA, lateral PMC, the bodies of both caudate nuclei, left and right globus pallidum, and both thalami (Fig. 1).
Fig. 1.
Speech (L1 + L2) contrasted against rest, where L1 is native language and L2 is nonnative language. Cluster-corrected activity (P < 0.05; orange) is shown in bilateral primary somatosensory (1), primary auditory (2), primary motor (3), and secondary somatosensory cortical regions (4), left pars opercularis (5), and midline vermal cerebellum (6). Additional uncorrected activity (Z = 2.3; red) is shown in midline premotor cortex (7), pallidum (8), thalamus (9), and caudate nucleus (10).
Directly contrasting production of L2 with L1 demonstrated greater activity in a limited number of cerebral and cerebellar cortical regions (P < 0.05, corrected): left pars opercularis, left anterior superior temporal gyrus, left lateral PMC, medial PMC (both the SMA and pre-SMA), left temporoparietal cortex (both parietal operculum and planum temporale), and midline vermal cerebellum (Fig. 2). In addition, there was significantly greater activity in a number of bilateral subcortical regions: the basal ganglia (left and right globus pallidum) and the lateral thalami. The increased cognitive demands of producing L2 during picture description were reflected in increased activity in anterior medial prefrontal cortex.
Fig. 2.
Direct contrast of L2 against L1 in the whole brain. Sagittal (top) and coronal (bottom) slices show increased activity in motor feedforward and sensory feedback systems, as well as midline cerebellum, for L2. Greater activity is shown in left temporoparietal cortex (1), premotor cortex (2), left pars opercularis (3), midline vermal cerebellum (4), thalami (5), and anterior medial prefrontal cortex (6).
ROI analyses.
Activity during production of L1 and L2 within ROIs confirmed and extended the impression from the whole brain analyses (Fig. 3). Within the auditory functional ROI, both speech production conditions resulted in increased activity bilaterally relative to rest, although activity was marginally less during production of L1 relative to L2 (Fig. 3). In both left and right somatosensory functional ROIs, there was significant activation in response to production of L2, relative to both production of L1 and the baseline rest condition, whereas activity during production of L1 was no different from rest. The whole brain analysis at the statistical threshold chosen (P < 0.05, cluster corrected) had shown the difference in activity only on the left, but this apparent lateralization on thresholded images was not confirmed by the ROI analyses (Jernigan et al. 2003).
Fig. 3.
A: percent signal changes for L1 and L2 in the functionally defined regions of interest (ROIs) on smoothed data, where there was overlap of the ROIs (auditory and somatosensory). Within the auditory region, both L1 and L2 were active relative to baseline, with L1 marginally suppressed relative to L2. Within the somatosensory region, L1 showed negative activation (not significantly different from the nonspeech baseline), whereas L2 showed significant positive activation. B: auditory (blue) and somatosensory (green) masks. C: percent signal changes for L1 and L2 in the individual anatomically defined ROIs (lateral posterior planum temporale and parietal operculum) on unsmoothed data, where overlap had been avoided. Within both regions, L2 showed significantly greater activation relative to L1, and the percent signal change was equivalent in the 2 regions. *P < 0.05; **P < 0.001.
Using anatomically defined ROIs in the left temporoparietal cortex, we observed a similar pattern of results, with L2 associated with significantly greater activity than L1 in both the posterior planum temporale and parietal operculum. However, in contrast to whole brain analyses and functionally defined ROI analyses, the anatomical ROIs demonstrated no significant activity during L1 speech production contrasted with rest.
In addition to the group differences considered in the neuroimaging analyses, all measures of English proficiency were included as covariates in the ROI analyses. None of these covariates showed a significant correlation with activity within the ROIs, although such correlations were not a primary aim of the current study. Different native languages had no demonstrable influence on differences in articulating in L1 relative to the common L2 of English. We did not find evidence that the motor-sensory consequences of speaking in L2 depend on L1 or relate to, for example, differences in stress patterns between languages, but this will require confirmation in future studies that include larger groups of subjects with a specific L1.
One potential confound with using volitional propositional speech is that the number of utterances produced is unconstrained and can differ between languages. Analyses of recordings of 12 participants' speech demonstrated approximately equivalent numbers of syllables spoken in L1 relative to L2 [mean syllables per trial: L1 = 8.2, L2 = 7.8; t(21) = 1.437, P = 0.165, paired t-test]. There were no significant differences in speech rate across the two runs (mean difference of 0.1 syllables). Figure 4 shows the online rate of speaking in L1 and L2 at the time of scanning; although there was considerable interindividual variability of speech rate across both scanning runs, within-individual rates were very closely matched between L1 and L2. In addition, a whole brain ANOVA testing for effects of speech rate and L1/L2 language activation revealed no significant effects of speech rate or interaction with language (even at a liberal statistical threshold), while revealing a significant main effect of language in cortical motor-sensory feedback systems, in particular temporoparietal cortex (Fig. 5). It should be stressed that although interindividual speech rate was variable, intraindividual rate was relatively constant; therefore, this analysis will not have demonstrated the specific effect of speech rate. That effect has been investigated previously using a design that varies speech rate systematically within each individual, for example by varying the rate of repetition (Wise et al. 1999). Speech rate was not intended as a marker of proficiency; proficiency was independently measured using more sensitive out-of-scanner assessments.
Fig. 4.
Speech rate measured in syllables produced per trial for L1 and L2. Circles represent trials with lower speech rates (less than the median), and crosses represent trials with higher speech rates (median or greater).
Fig. 5.
Results from ANOVA testing for effects of speech rate and L1/L2 language activation revealed no significant effects of speech rate or interaction with language. A significant main effect of language was revealed in cortical sensorimotor feedback systems, particularly in the parietal operculum. Coronal (y = −34) and sagittal (x = −51) slices show the main effect of language.
DISCUSSION
This study investigated whether the response of temporoparietal cortex is altered when bilinguals switch from speaking in L1, without an accent, to L2, with an accent. The specific hypothesis was that speaking in L2, relative to L1, would be associated with increased activity in both planum temporale and parietal operculum. Different analysis approaches (including whole brain voxelwise analysis as well as ROIs defined both on individual anatomy and using individual functional localizers) provided converging evidence of increased activation for L2 relative to L1 within temporoparietal cortex. Given previous work investigating native speech production, as well as work delineating the more basic sensory properties and connectivity of these regions in humans and nonhuman primates, temporoparietal cortex is highly likely to be involved in heteromodal auditory and somatosensory feedback control.
One explanation of our findings is that production of L1 is highly automatic with few motor errors and requires much less online sensory monitoring. In contrast, production of L2, less automatic and more prone to error, engages much closer sensory monitoring of any discrepancies between the predicted and actual sensory outcome of movements. An alternative interpretation, inferred from single-cell recordings in nonhuman primate auditory cortex during vocalizations (Eliades and Wang 2008), is that feedback monitoring, both auditory and somatosensory, during speech production of L1 engages a subset of sensory association neurons distributed within temporoparietal cortex, with suppression of many other neurons not engaged by speech production monitoring. The net change in blood oxygen level-dependent (BOLD) response may not be detectable at the resolution of fMRI or may even be reduced relative to a rest condition. The conclusion from this interpretation is that a reduced or absent signal in secondary sensory cortex during native speech indicates increased feedback efficiency by a limited number of neurons, tuned by experience from an early age. Viewed in this way, increased signal in temporoparietal cortex during nonnative speech production indicates feedback processing that has never become as optimally efficient as native language acquisition during early childhood. The consequence would be a less reliable feedforward copy of premotor or motor articulatory commands sent to or processed in sensory cortex and a consequent inability to appropriately suppress neurons involved in feedback. These two explanations are, of course, not mutually exclusive, and the signal we observed in sensory cortices may represent a combination of the two effects.
Evidence to support the claim that the observed difference between L1 and L2 is due to sensory-motor changes between first and second languages comes from the precise anatomical locations of this change. ROIs were placed on the planum temporale and parietal operculum by using objective anatomical criteria on unsmoothed data from each subject (thereby minimizing the inaccurate parcellation of cortical regions that may accompany ROIs placed on normalized smoothed images). The increased activity in temporoparietal cortex when speaking in L2 is, by its very location, related to sensory feedback; this was corroborated further by the activation differences between L1 and L2 in regions defined by our nonlinguistic functional localizers. In addition to sensory-motor differences between L1 and L2, there were inevitably also differences in linguistic and cognitive processing during speech production. Such factors (e.g., phonological processing differences, differences in attentional control, or translation influences from L1 into L2) plausibly explain the pattern of activation seen in a number of the regions reported, such as inferior and superior frontal regions. However, the regions analyzed with ROIs, the planum temporale and the parietal operculum, are sensory areas and are not implicated in cognitive or linguistic processing.
Midline cerebellar cortex has also been proposed as a component of the pathway for feedforward commands for speech production (Guenther et al. 2006). We observed increased activity in response to L2 relative to L1 within these cerebellar regions. This mirrors the response observed in the study of Tourville et al. (2008). As such, increased midline cerebellar activation provides converging evidence for sensorimotor processing differences between L1 and L2.
Tourville et al. (2008) showed that sudden, unexpected perturbations of auditory feedback during single-word reading in L1 result in an event-related increase in activity in posterior auditory association cortex (including planum temporale) and in parietal operculum. This was associated with an online compensatory alteration in articulation to counter the perceived perturbation, despite subjects' lack of awareness of the mismatch between production and auditory feedback. This automatic compensation offers an explanation for why, in the present study, the shift in activity within temporoparietal cortex accompanying the change from L1 to L2 was not sensitive to measures of L2 proficiency and daily use. The present study was performed on highly educated subjects working in England and continually using English in their professional lives; this deliberate choice of a homogeneous population explains why imaging-behavioral correlations were unlikely. The population was chosen to demonstrate altered function in temporoparietal cortex even in subjects who were proficient and practiced in L2. Furthermore, it is not clear that there should be a linear relationship between proficiency and the fMRI signal. The increased signal for L2 may occur only in late bilinguals, whereas bilinguals who learned a second language before a critical period in the first decade of life would show no differences between L1 and L2. In that case, no matter how proficient a late bilingual becomes, L2 may still result in an increased signal. Of greater interest will be future studies following subjects as they first acquire L2, which would manipulate proficiency and allow a systematic investigation of relationships between variation in language ability and activity in sensory-motor regions.
A number of functional neuroimaging studies have reported activity within left temporoparietal cortex during the covert production of both speech and nonspeech vocalizations (Hickok et al. 2009; Hickok and Poeppel 2000; Pa and Hickok 2008). The location of this region, named Spt, was within the planum temporale and the adjacent parietal operculum (Pa and Hickok 2008). Clearly, the studies by Hickok and colleagues did not examine direct interactions between feedforward and feedback vocalization pathways. Because the ventral premotor cortex is left-lateralized for speech production and is activated by covert speech when motor cortex is not (Basho et al. 2007), the data from Hickok and colleagues indicate the existence of a left-lateralized "prearticulatory sensory" area that is active during covert vocalization when auditory and somatosensory association areas remain inactive. It is proposed that left ventral premotor cortex and Spt are strongly connected, both anatomically and functionally (Hickok and Poeppel 2007). The present study could not confirm whether area Spt responds in the same manner to both covert and overt speech production; this will require future studies that investigate associations and dissociations of activity during covert and overt speech production in a single experimental design.
What is not established from this study is the relative importance of auditory compared with somatosensory feedback processing when speaking L2. Heteromodal processing of sensory information occurs early in unimodal sensory association cortices, and there is merging of auditory and somatosensory processing in posterior supratemporal plane and parietal operculum (Bremmer et al. 2001; Foxe et al. 2002; Galaburda and Sanides 1980; Smiley et al. 2007). Further studies are required to separate the relative contributions of auditory and somatosensory feedback to the signal within temporoparietal cortex, a line of enquiry that may be assisted by use of multivariate pattern analysis of human fMRI studies (see Okada et al. 2010) to complement single-cell recordings in nonhuman primates.
Now that we have established the sensitivity of this technique at detecting sensory-motor differences when subjects speak in their native and nonnative languages, further studies are planned to investigate changes in this system over time (with serial fMRI studies) as naive subjects acquire a second language through training.
GRANTS
This work was supported by the Medical Research Council and the Research Councils UK.
DISCLOSURES
No conflicts of interest, financial or otherwise, are declared by the author(s).
REFERENCES
- Abutalebi J, Annoni JM, Zimine I, Pegna AJ, Seghier ML, Lee-Jahnke H, Lazeyras F, Cappa SF, Khateb A. Language control and lexical competition in bilinguals: an event-related fMRI study. Cereb Cortex 18: 1496–1505, 2008 [DOI] [PubMed] [Google Scholar]
- Basho S, Palmer ED, Rubio MA, Wulfeck B, Muller RA. Effects of generation mode in fMRI adaptations of semantic fluency: paced production and overt speech. Neuropsychologia 45: 1697–1706, 2007 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Beckmann CF, Jenkinson M, Smith SM. General multilevel linear modeling for group analysis in fMRI. Neuroimage 20: 1052–1063, 2003 [DOI] [PubMed] [Google Scholar]
- Blank SC, Scott SK, Murphy K, Warburton EA, Wise RJS. Speech production: Wernicke, Broca and beyond. Brain 125: 1829–1838, 2002 [DOI] [PubMed] [Google Scholar]
- Boersma P, Weenink D. Praat: Doing Phonetics by Computer (version 4.5.25) [Software]. Available from www.praat.org, 2007 [Google Scholar]
- Bremmer F, Schlack A, Shah NJ, Zafiris O, Kubischik M, Hoffmann KP, Zilles K, Fink GR. Polymodal motion processing in posterior parietal and premotor cortex: a human fMRI study strongly implies equivalencies between humans and monkeys. Neuron 29: 287–296, 2001 [DOI] [PubMed] [Google Scholar]
- Crinion J, Turner R, Grogan A, Hanakawa T, Noppeney U, Devlin JT, Aso T, Urayama S, Fukuyama H, Stockton K, Usui K, Green DW, Price CJ. Language control in the bilingual brain. Science 312: 1537–1540, 2006 [DOI] [PubMed] [Google Scholar]
- Dale AM, Fischl B, Sereno MI. Cortical surface-based analysis. Neuroimage 9: 179–194, 1999 [DOI] [PubMed] [Google Scholar]
- De Jong NH, Wempe T. Praat script to detect syllable nuclei and measure speech rate automatically. Behav Res Methods 41: 385–390, 2009 [DOI] [PubMed] [Google Scholar]
- Dhanjal NS, Handunnetthi L, Patel MC, Wise RJS. Perceptual systems controlling speech production. J Neurosci 28: 9969–9975, 2008 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Eickhoff SB, Schleicher A, Zilles K, Amunts K. The human parietal operculum. I. Cytoarchitectonic mapping of subdivisions. Cereb Cortex 16: 254–267, 2006 [DOI] [PubMed] [Google Scholar]
- Eliades SJ, Wang X. Neural substrates of vocalization feedback monitoring in primate auditory cortex. Nature 453: 1102–1107, 2008 [DOI] [PubMed] [Google Scholar]
- Fischl B, Salat DH, Busa E, Albert M, Dieterich M, Haselgrove C, van der Kouwe A, Killiany R, Kennedy D, Klaveness S, Montillo A, Makris N, Rosen B, Dale AM. Whole brain segmentation: automated labeling of neuroanatomical structures in the human brain. Neuron 33: 341–355, 2002 [DOI] [PubMed] [Google Scholar]
- Flege JE, Munro MJ, MacKay IRA. Factors affecting strength of perceived foreign accent in a second language. J Acoust Soc Am 97: 3125–3134, 1995 [DOI] [PubMed] [Google Scholar]
- Foxe JJ, Wylie GR, Martinez A, Schroeder CE, Javitt DC, Guilfoyle D, Ritter W, Murray MM. Auditory-somatosensory multisensory processing in auditory association cortex: an fMRI study. J Neurophysiol 88: 540–543, 2002 [DOI] [PubMed] [Google Scholar]
- Frenck-Mestre C, Anton JL, Roth M, Vaid J, Viallet F. Articulation in early and late bilinguals' two languages: evidence from functional magnetic resonance imaging. Neuroreport 16: 761–765, 2005 [DOI] [PubMed] [Google Scholar]
- Galaburda A, Sanides F. Cytoarchitectonic organization of the human auditory cortex. J Comp Neurol 190: 597–610, 1980 [DOI] [PubMed] [Google Scholar]
- Golfinopoulos E, Tourville JA, Guenther FH. The integration of large-scale neural network modeling and functional brain imaging in speech motor control. Neuroimage 52: 862–874, 2010 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Guenther FH, Ghosh SS, Tourville JA. Neural modeling and imaging of the cortical interactions underlying syllable production. Brain Lang 96: 280–301, 2006 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hickok G, Buchsbaum B, Humphries C, Muftuler T. Auditory-motor interaction revealed by fMRI: speech, music, and working memory in area Spt. J Cogn Neurosci 15: 673–682, 2003 [DOI] [PubMed] [Google Scholar]
- Hickok G, Okada K, Serences JT. Area Spt in the human planum temporale supports sensory-motor integration for speech processing. J Neurophysiol 101: 2725–2732, 2009 [DOI] [PubMed] [Google Scholar]
- Hickok G, Poeppel D. The cortical organization of speech processing. Nat Rev Neurosci 8: 393–402, 2007 [DOI] [PubMed] [Google Scholar]
- Hickok G, Poeppel D. Towards a functional neuroanatomy of speech perception. Trends Cogn Sci 4: 131–138, 2000 [DOI] [PubMed] [Google Scholar]
- Illes J, Francis WS, Desmond JE, Gabrieli JDE, Glover GH, Poldrack R, Lee CJ, Wagner AD. Convergent cortical representation of semantic processing in bilinguals. Brain Lang 70: 347–363, 1999 [DOI] [PubMed] [Google Scholar]
- Jenkinson M, Bannister P, Brady M, Smith S. Improved optimization for the robust and accurate linear registration and motion correction of brain images. Neuroimage 17: 825–841, 2002 [DOI] [PubMed] [Google Scholar]
- Jenkinson M, Smith S. A global optimisation method for robust affine registration of brain images. Med Image Anal 5: 143–156, 2001 [DOI] [PubMed] [Google Scholar]
- Jernigan TL, Gamst AC, Fennema-Notestine C, Ostergaard AL. More “mapping” in brain mapping: statistical comparison of effects. Hum Brain Mapp 19: 90–95, 2003 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kim KHS, Relkin NR, Lee KM, Hirsch J. Distinct cortical areas associated with native and second languages. Nature 388: 171–174, 1997 [DOI] [PubMed] [Google Scholar]
- Klein D, Milner B, Zatorre RJ, Meyer E, Evans AC. The neural substrates underlying word generation: a bilingual functional-imaging study. Proc Natl Acad Sci USA 92: 2899–2903, 1995 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Klein D, Zatorre RJ, Milner B, Meyer E, Evans AC. Left putaminal activation when speaking a second language: evidence from PET. Neuroreport 5: 2295–2297, 1994 [DOI] [PubMed] [Google Scholar]
- Munoz-Sandoval AF, Cummins J, Alvarado CG, Ruef ML. Bilingual Verbal Abilities Test: Comprehensive Manual. Itasca, IL: Riverside, 1998 [Google Scholar]
- Okada K, Rong F, Venezia J, Matchin W, Hsieh IH, Saberi K, Serences JT, Hickok G. Hierarchical organization of human auditory cortex: evidence from acoustic invariance in the response to intelligible speech. Cereb Cortex 20: 2486–2495, 2010 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Ono M, Kubik S, Abernathey CD. Atlas of the Cerebral Sulci. Stuttgart, Germany: Georg Thieme, 1990, p. 218 [Google Scholar]
- Pa J, Hickok G. A parietal-temporal sensory-motor integration area for the human vocal tract: evidence from an fMRI study of skilled musicians. Neuropsychologia 46: 362–368, 2008 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Penhune VB, Zatorre RJ, MacDonald JD, Evans AC. Interhemispheric anatomical differences in human primary auditory cortex: probabilistic mapping and volume measurement from magnetic resonance scans. Cereb Cortex 6: 661–672, 1996 [DOI] [PubMed] [Google Scholar]
- Price CJ, Green DW, von Studnitz R. A functional imaging study of translation and language switching. Brain 122: 2221–2235, 1999 [DOI] [PubMed] [Google Scholar]
- Smiley JF, Hackett TA, Ulbert I, Karmos G, Lakatos P, Javitt DC, Schroeder CE. Multisensory convergence in auditory cortex. I. Cortical connections of the caudal superior temporal plane in macaque monkeys. J Comp Neurol 502: 894–923, 2007 [DOI] [PubMed] [Google Scholar]
- Smith SM. Fast robust automated brain extraction. Hum Brain Mapp 17: 142–155, 2002 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Tourville JA, Reilly KJ, Guenther FH. Neural mechanisms underlying auditory feedback control of speech. Neuroimage 39: 1429–1443, 2008 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Ventura MI, Nagarajan SS, Houde JF. Speech target modulates speaking induced suppression in auditory cortex. BMC Neurosci 10: 58–69, 2009 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Warren JD, Griffiths TD. Distinct mechanisms for processing spatial sequences and pitch sequences in the human auditory brain. J Neurosci 23: 5799–5804, 2003 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Wise RJS, Greene J, Buchel C, Scott SK. Brain regions involved in articulation. Lancet 353: 1057–1061, 1999 [DOI] [PubMed] [Google Scholar]
- Woolrich MW, Behrens TEJ, Beckmann CF, Jenkinson M, Smith SM. Multilevel linear modelling for FMRI group analysis using Bayesian inference. Neuroimage 21: 1732–1747, 2004 [DOI] [PubMed] [Google Scholar]
- Woolrich MW, Ripley BD, Brady M, Smith SM. Temporal autocorrelation in univariate linear modeling of FMRI data. Neuroimage 14: 1370–1386, 2001 [DOI] [PubMed] [Google Scholar]