Author manuscript; available in PMC: 2015 Mar 31.
Published in final edited form as: Restor Neurol Neurosci. 2007;25(0):195–210.

Neuroplasticity in the processing of pitch dimensions: A multidimensional scaling analysis of the mismatch negativity

Bharath Chandrasekaran 1, Jackson T Gandour 1,*, Ananthanarayan Krishnan 1
PMCID: PMC4380289  NIHMSID: NIHMS672708  PMID: 17942999

Abstract

Purpose

An auditory electrophysiological study was conducted to explore the influence of language experience on the saliency of dimensions underlying cortical pitch processing.

Methods

Mismatch negativity (MMN) responses to Mandarin tones were recorded in Chinese and English participants (n = 10 per group) using a passive oddball paradigm. Stimuli consisted of three tones (T1: high level; T2: high rising; T3: low falling-rising). There were three oddball conditions (standard/deviant): T1/T2, T1/T3, T2/T3. In the T1/T2 and T1/T3 conditions, each tonal pair represented a contrast between a level and a contour tone; the T2/T3 condition, a contrast between two contour tones. Twenty dissimilarity matrices were created using the MMN mean amplitude measured from the Fz location for each condition per participant, and analyzed by an individual differences multidimensional scaling model.

Results

Two pitch dimensions were revealed, interpretively labeled as ‘height’ and ‘contour’. The latter was found to be more important for Chinese than English subjects. Using individual weights on the contour dimension, a discriminant function showed that 17 out of 20 participants were correctly classified into their respective language groups.

Conclusions

The MMN can serve as an index of pitch features that are differentially weighted depending on a listener’s experience with lexical tones and their acoustic correlates within a particular tone space.

Keywords: Mismatch negativity, pitch, multidimensional scaling, experience-dependent plasticity, lexical tone, Mandarin Chinese

1. Introduction

Using the mismatch negativity (MMN) component of the auditory event-related potential (ERP), it is well established that language experience influences the automatic involuntary processing of speech sounds (N. Kraus & Cheour, 2000; Naatanen, 2001). Early work focused on segmental information, for example, the effects of the presence or absence of consonants or vowels in a phonemic inventory (Naatanen et al., 1997; Sharma & Dorman, 2000; Winkler et al., 1999) as well as other segmental aspects of phonology, including phonotactics (Dehaene-Lambertz, Dupoux, & Gout, 2000) and phoneme boundaries (Sharma & Dorman, 1999; Ylinen, Shestakova, Huotilainen, Alku, & Naatanen, 2006). More recently, it has been demonstrated that the MMN response is also sensitive to suprasegmental information in the speech signal, including changes in loudness (Hungarian word stress, Honbolygo, Csepe, & Rago, 2004), duration (Finnish vowel length, Nenonen, Shestakova, Huotilainen, & Naatanen, 2003), and pitch (Mandarin lexical tones, Chandrasekaran, Krishnan, & Gandour, 2007).

The MMN component of the ERP has also been used to study the representation of auditory attributes in sensory memory (Caclin et al., 2006, pp. 1960–1961, review). A number of studies support the view that basic perceptual attributes (e.g., frequency, duration, intensity) are represented separately in auditory sensory memory. Of special interest are perceptual attributes that rely on several acoustic dimensions. A recent study of timbre, a multidimensional perceptual attribute, provides support for the notion that its dimensions are represented separately in auditory sensory memory (Caclin et al., 2006; Caclin, McAdams, Smith, & Winsberg, 2005).

Pitch is a multidimensional perceptual attribute that similarly relies on several acoustic dimensions (e.g., height, slope, direction). Psychophysical evidence for the multidimensional nature of pitch perception in tone languages comes primarily from crosslanguage multidimensional scaling studies of dissimilarity ratings. Based on a sample of tone languages from the Far East (Thai, Cantonese, Mandarin, Taiwanese) and West Africa (Yoruba) and a non-tone language (English), it appears that three dimensions underlie a common perceptual space: average F0 height, direction of F0 movement, magnitude of F0 slope or level vs. contour (Gandour, 1983; Gandour & Harshman, 1978). The relative importance of these dimensions, however, varies depending on a listener’s familiarity with specific types of pitch patterns that occur in their native language. For example, the perceptual saliency of the contour dimension is greater for native speakers of tone languages than for speakers of English, while English listeners give greater weight to the height dimension than do tone language speakers. Such differences in perceptual saliency suggest that long-term experience enhances listeners’ attention to pitch dimensions that are phonetically relevant in a particular language.

As a tone language, Mandarin Chinese has four contrastive lexical tones (ma1 ‘mother’ [T1], ma2 ‘hemp’ [T2], ma3 ‘horse’ [T3], ma4 ‘scold’ [T4]). Tones 1 to 4 can be described phonetically as high level, high rising, low falling rising, and high falling, respectively (Howie, 1976). In a recent crosslanguage study of the MMN elicited in response to Mandarin tones (Chandrasekaran et al., 2007), two oddball conditions were constructed with a common deviant, a low falling rising contour tone (T3). One condition consisted of two tones that are acoustically similar to one another (T2/T3: T2, high rising contour = standard). The other condition consisted of two tones that are acoustically dissimilar to one another (T1/T3: T1, high level = standard). The native group (Chinese) exhibited larger MMN responses to the high dissimilarity condition (T1/T3) than the non-native (English). In the case of the low dissimilarity condition (T2/T3), MMN responses were similar for both groups. Interestingly, MMN responses were larger in the T1/T3 condition than the T2/T3 condition for native Chinese listeners only. By virtue of these group differences, Chandrasekaran et al. (2007) inferred that early cortical processing of pitch may be shaped by the relative saliency of acoustic dimensions underlying the pitch patterns of a particular language. Yet they were unable to test this hypothesis due to limitations of the stimulus set.

In order to directly assess the number and nature of pitch dimensions underlying these MMN responses, as well as their relative importance for individual subjects, we have chosen to apply INDSCAL (INDividual Differences SCALing) (Carroll & Chang, 1970; cf. Harshman, 1970), a multidimensional scaling model that can be applied to two-mode three-way data, i.e., two or more square symmetric data matrices for pairs of stimuli from two or more subjects or other data sources. Instead of a single data matrix representing the average of a group of subjects, INDSCAL performs a metric multidimensional scaling analysis using a separate matrix for each individual in the group. The model assumes that the set of dimensions is common to all subjects. For example, when subjects or groups of subjects differ in their MMN responses, the model assumes that they do so either by assigning different relative importance to the dimensions or by using different subsets of the total number of dimensions. On the basis of subjects’ pairwise dissimilarities data, the output from INDSCAL consists of two matrices. The first matrix contains the coordinates of the stimulus objects on n dimensions in Euclidean space (group stimulus space). The distance between points in this space is represented by a weighted Euclidean function that reflects the relative importance or salience of different dimensions per subject. The dimensions can then be interpreted based on the distribution of the objects in Euclidean space. The second matrix contains each subject’s weights on the different dimensions (subject space). An important property of the INDSCAL individually-weighted solution is that the axes of the group stimulus space are oriented in such a way that expansion or contraction results in a mathematically unique orientation of axes, i.e., an orientation that maximally accounts for individual differences. The axes cannot be rotated without worsening the overall fit of the model.
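The weighted Euclidean function described above can be illustrated with a minimal sketch. The coordinates and weights below are hypothetical and chosen only for illustration; they are not values from this study, and the function name is our own.

```python
import math

def weighted_distance(x_i, x_j, weights):
    """Weighted Euclidean distance between two stimuli for one subject.

    Each squared coordinate difference is scaled by that subject's weight
    on the corresponding dimension before summing.
    """
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, x_i, x_j)))

# Hypothetical group stimulus space: (height, contour) coordinates per tone.
group_space = {"T1": (0.4, 0.8), "T2": (0.4, -0.4), "T3": (-0.8, -0.4)}

# Hypothetical subject weights on (height, contour).
contour_listener = (0.2, 0.9)
height_listener = (0.9, 0.2)

# T1 and T2 differ only on the contour axis in this toy space, so the
# listener who weights contour heavily "sees" them as farther apart.
d_contour = weighted_distance(group_space["T1"], group_space["T2"], contour_listener)
d_height = weighted_distance(group_space["T1"], group_space["T2"], height_listener)
```

In this toy case the same pair of tones is separated by a larger distance for the listener with the larger contour weight, which is the sense in which subject weights express the salience of a dimension.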

The aim of the present study is to directly test the hypothesis of separate neural processing of pitch dimensions by applying INDSCAL analysis to the MMN responses. As such, it represents an extension of the observations reported in Chandrasekaran et al. (2007). The specific goals are (i) to determine the number and nature of pitch dimensions elicited by MMN responses to Mandarin tones and (ii) to assess to what extent subjects’ differences in the relative importance attached to particular dimensions are influenced by their language experience. Since INDSCAL analysis requires square symmetric data matrices for stimulus pairs for all subjects, it was necessary to obtain an additional condition (T1/T2) from those who participated in the companion study (Chandrasekaran et al., 2007). By including T1/T2, which involves a comparison of a level tone (T1) with a high rising tone (T2), we satisfy INDSCAL’s input requirements. All objects (T1, T2, T3) can now be compared pairwise to one another, resulting in three conditions (T1/T2, T1/T3, T2/T3) for each of the 20 participants (Chinese, English). Based on earlier crosslanguage multidimensional scaling studies of tone perception (Gandour, 1983; Gandour & Harshman, 1978), in addition to the findings of Chandrasekaran et al. (2007), we expect that two pitch dimensions (‘height’, ‘contour’) will best account for pairwise dissimilarities in mean MMN amplitude between the three pairs of tones. By including two language groups, one native (Chinese), the other nonnative (English), we are able to determine the extent to which modulation of the MMN can be attributed to a listener’s familiarity with the pitch dimensions underlying the Mandarin tone space. Since pitch contour is a critical phonetic feature of the Mandarin tone space, the contour dimension is expected to be more heavily weighted by Chinese subjects than by English. 
Discriminant analysis is employed to determine whether individual subjects can be classified into their respective language group on the basis of their dimension weights. To the extent that a discriminant function is highly successful in separating the two language groups, we can infer that experience-dependent plasticity, as reflected in the MMN, extends to the extraction of pitch dimensions. These predicted outcomes would be consistent with the view that there are separate representations of pitch dimensions in auditory sensory memory (cf. timbre dimensions (Caclin et al., 2006)).

2. Materials and methods

2.1. Participants

Ten adult, native speakers of Mandarin Chinese (5 male; 5 female) and ten adult, native speakers of American English (5 male; 5 female) participated in the ERP experiment. Both groups were closely matched with respect to age (Chinese: M = 23.2; English: M = 25.2) and years of formal education (Chinese: M = 17.2; English: M = 18.2). They gave informed consent in compliance with a protocol approved by the Institutional Review Board of Purdue University. See Chandrasekaran et al. (2007) for further details about language background, musical experience, and audiometric profile of the participants.

2.2. Stimuli

Stimuli consisted of a set of three Mandarin Chinese words that are distinguished minimally by tonal contour (pinyin Roman transliteration): yi1 ‘clothing’ [T1]; yi2 ‘aunt’ [T2]; yi3 ‘chair’ [T3]. Using a cascade/parallel formant synthesizer (Klatt, 1980; Klatt & Klatt, 1990), synthetic versions were created so that all three syllables were identical except for their tonal contours. Synthesis parameters for voice fundamental frequency (F0) and duration were obtained from words produced in citation form by an adult male speaker (Y. Xu, 1997). F0 contours of the stimuli are displayed in Fig. 1. Vowel formant frequencies were steady-state, and were held constant across the three syllables (in Hz): F1 = 300; F2 = 2500; and F3 = 3500 (Howie, 1976). Tonal durations were normalized to 250 ms. Voice amplitude was kept constant at 60 dB.

Fig. 1.


Average fundamental frequency contours of time-normalized Mandarin Chinese tonal stimuli (T1, T2, T3) adapted from Xu (1997). Superscript numbers (1–3) denote three Mandarin lexical tones: yi1 ‘clothing’; yi2 ‘aunt’; yi3 ‘chair’.

These three tones differed in F0 height as measured by onset, offset, and average F0, and in F0 contour and/or direction as measured by slope from tonal onset to offset, from onset to turning point, and from turning point to offset (Table 1). The locations of the turning points (in ms) for T2 and T3 were 106 and 144, respectively. T1 may be described as a high level tone; T2 and T3 as high rising and low falling rising contour tones, respectively. With regard to F0 contour and direction, T2 is more similar to T3 than to T1. In terms of F0 height, T1 is more similar to T2 than to T3. See Chandrasekaran et al. (2007) for further information about the rationale underlying the selection of T1, T2, and T3.

Table 1.

Acoustic characteristics of experimental stimuli

F0 parameters
Tone | Onset | Offset | Average | Slope(a) Onset-Offset | Slope(b) Onset-TP | Slope(c) TP-Offset | TP | Δ F0
T1 | 129 | 128 | 129 | 0.00 | 0.02 | −0.02 | 125 | 2
T2 | 109 | 136 | 117 | 0.11 | −0.04 | 0.17 | 71 | −4
T3 | 104 | 109 | 96 | 0.02 | −0.13 | 0.19 | 133 | −18

Note. Onset, offset, and average F0 values are expressed in Hertz (Hz). All slope values are expressed in Hz/ms. T1, T2, and T3 refer to the Mandarin high level, high rising, and low falling rising tones, respectively. TP, expressed in milliseconds, refers to turning point, i.e., time at which the contour changed direction. F0 = voice fundamental frequency; Δ F0 = change in Hz from onset to turning point.

a

Overall slope, measured from pitch onset to offset.

b

Slope from the onset to TP. Since the level tone T1 has no clear turning point, slope was measured from onset to 125 ms (50% duration). Both T2 and T3 have negative slopes (i.e. falling F0 contour).

c

Slope from the TP to offset. Since the level tone T1 has no clear turning point, slope was measured from 125 ms (50% duration) to offset. Both T2 and T3 have positive slopes (i.e. rising F0 contour).
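As a worked check on the overall slope values in Table 1, the sketch below assumes that the onset-to-offset slope is simply the F0 difference between offset and onset divided by the normalized 250 ms duration. That formula is our assumption; the source does not spell out the computation.

```python
DURATION_MS = 250  # tonal durations were normalized to 250 ms

def overall_slope(onset_hz, offset_hz, duration_ms=DURATION_MS):
    """Onset-to-offset slope in Hz/ms (assumed definition)."""
    return (offset_hz - onset_hz) / duration_ms

# (onset, offset) F0 values in Hz, taken from Table 1.
tones = {"T1": (129, 128), "T2": (109, 136), "T3": (104, 109)}

# abs() is applied only so that a rounded -0.0 compares cleanly with 0.0;
# all three rounded slopes are non-negative in Table 1.
slopes = {tone: abs(round(overall_slope(on, off), 2)) for tone, (on, off) in tones.items()}
```

Under this assumption the rounded values agree with the overall slopes reported in the table for all three tones.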

The experiment consisted of three oddball conditions (standard/deviant): T1/T2, T1/T3, T2/T3 (Fig. 2). In the T1/T2 and T1/T3 conditions, each tonal pair represented a dissimilar contrast between a level (T1) and a contour tone (T2, T3). In the T2/T3 condition, the tonal pair represented a similar contrast between two rising contour tones. By pairing T1 with T2, we were able to contrast a high level tone (T1) with a high rising contour tone (T2). By pairing T1 with T3, we were able to contrast a high level tone (T1) with a low rising contour tone (T3) and, by pairing T2 with T3, a high rising (T2) with a low rising contour tone (T3). These three conditions ensured that all three tones (T1, T2, T3) were compared to one another, a prerequisite for multidimensional scaling analysis.

Fig. 2.


Three oddball conditions used to elicit MMN responses: T3 presented in the context of T1 (T1/T3, left panel) and T2 (T2/T3, right panel); T2 presented in the context of T1 (T1/T2, middle panel). The frequent (p = 0.85) and rare (p = 0.15) stimuli are represented with dashed and solid lines, respectively.

2.3. Experimental protocol

Subjects were seated in a recliner in an acoustically and electrically shielded booth facing a video monitor. They were instructed to ignore the auditory stimuli presented via earphones and to focus their attention exclusively on a self-selected, closed-caption silent movie. Five different stimulus sequences with an interstimulus onset-to-onset interval of 667 ms were presented in random order. Three were oddball conditions made up of a frequent stimulus or standard (p = 0.85) and an infrequent stimulus or deviant (p = 0.15). T3 was used as the deviant in two of the oddball conditions (T1/T3, T2/T3); T2 as the deviant in the third condition (T1/T2). Within the oddball sequences, the order of presentation of stimuli was pseudo-random, i.e., at least one standard stimulus preceded each deviant. The remaining two conditions consisted of the deviants T3 and T2 presented alone (p = 1.00). See Chandrasekaran et al. (2007) for further information about data acquisition procedures.
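The pseudo-random constraint on the oddball sequences can be sketched as follows. This is one possible way to satisfy the stated constraint, not the procedure actually used; the function name, the trial count in the usage line, and the seed are hypothetical.

```python
import random

def oddball_sequence(n_trials, p_deviant=0.15, seed=0):
    """Build a standard/deviant sequence in which every deviant ('D')
    is immediately preceded by a standard ('S')."""
    rng = random.Random(seed)
    n_dev = round(n_trials * p_deviant)
    n_std = n_trials - n_dev
    if n_dev > n_std:
        raise ValueError("too many deviants to precede each one with a standard")
    # Pick which standards are followed by a deviant; at most one deviant
    # per standard, so two deviants can never be adjacent.
    followed = set(rng.sample(range(n_std), n_dev))
    sequence = []
    for i in range(n_std):
        sequence.append("S")
        if i in followed:
            sequence.append("D")
    return sequence

seq = oddball_sequence(1000)  # hypothetical number of trials
```

Because each deviant is attached to a distinct standard, the deviant proportion is fixed at the requested value while the positions still vary with the seed.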

2.4. Evoked potential recording

For each subject, silver-chloride electrodes were mounted on frontal midline (Fz), left frontal (F3), and right frontal (F4) sites according to the 10–20 location system. MMN responses are known to be robust at these frontal electrodes (Naatanen et al., 1997). In our previous study, Mandarin tones elicited consistent and robust responses at these three electrode sites (Chandrasekaran et al., 2007). In this experiment we did not seek to answer questions on source localization and/or lateralization of MMN responses. Therefore, a three-electrode configuration (Fz, F3, F4) was sufficient. The Fz electrode was chosen for statistical analysis because the MMN response is known to be most robust and stable at this location. Also, Chandrasekaran et al. (2007) reported no effect of electrode location (Fz, F3, F4) on the mean amplitude and peak latency of the mismatch negativity responses. See Chandrasekaran et al. (2007) for further information about recording procedures.

2.5. Pre-processing and statistical analysis of MMN data

The baseline for the grand averaged waveforms was defined as the average amplitude between −100 ms and 0 ms (onset of stimuli). To calculate the MMN, the waveform elicited by the deviant presented alone in the 100% probability condition was subtracted from the waveform elicited by the same deviant in the oddball paradigm. This subtraction process, also called an ‘identity MMN’, effectively controls for any acoustical differences between stimuli (Kraus et al., 1995; Sharma & Dorman, 2000; Ylinen et al., 2006). MMN amplitude was measured as the mean voltage from a 100 ms window centered on the grand average MMN peak latency obtained from each difference waveform, within the predefined MMN window. See Chandrasekaran et al. (2007) for further details about measures of MMN amplitude and peak latency.
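A minimal sketch of the difference-wave and mean-amplitude computation is given below, using synthetic waveforms rather than recorded data. The sign convention (oddball deviant minus the same tone presented alone, so that the MMN is negative-going), the 1 ms sampling, and the 225 ms window center are assumptions made for illustration.

```python
def difference_wave(oddball_deviant, same_tone_alone):
    """Point-by-point difference between two equally sampled waveforms."""
    return [d - c for d, c in zip(oddball_deviant, same_tone_alone)]

def mean_amplitude(wave, times_ms, center_ms, width_ms=100):
    """Mean voltage in a window of width_ms centered on center_ms."""
    half = width_ms / 2
    window = [v for v, t in zip(wave, times_ms)
              if center_ms - half <= t <= center_ms + half]
    return sum(window) / len(window)

# Synthetic example: 1 ms sampling from -100 to 500 ms (hypothetical).
times_ms = list(range(-100, 501))
alone = [0.0 for _ in times_ms]
# Toy deviant response: a flat -2 microvolt deflection between 150 and 300 ms.
oddball = [-2.0 if 150 <= t <= 300 else 0.0 for t in times_ms]

mmn = difference_wave(oddball, alone)
amp = mean_amplitude(mmn, times_ms, center_ms=225)  # hypothetical peak latency
```

Since the same tone serves as deviant and as control, any acoustic difference between stimuli cancels in the subtraction, leaving only the effect of the oddball context.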

MMN mean amplitudes and peak latencies were analyzed using a two-way mixed model ANOVA (subject as random effect) for the effects of language group (Chinese, English) and tonal condition (T1/T3, T1/T2, T2/T3). Partial and generalized eta-squared values were measured for all effects to determine effect size (Olejnik & Algina, 2003).

2.6. INDSCAL analysis

For the current application, the input consisted of 20 (subjects) × 3 (stimulus tones) × 3 (stimulus tones) symmetric data matrices. Each data matrix contained distance estimates, i.e., the normalized MMN mean amplitude, for each paired comparison of the three stimulus tones (T1 vs. T2, T1 vs. T3, T2 vs. T3). INDSCAL (Carroll & Chang, 1970) analyses of these 20 dissimilarity matrices were performed at n (where n = 1, 2) dimensionalities in order to determine the appropriate number of dimensions underlying the distances among the three tones or objects in an auditory electrophysiological space. The output consisted of two matrices, a 3 (stimulus tones) × n dimensions matrix of coordinates of the three stimulus tones on n dimensions (represented visually in a ‘group stimulus space’), and a 20 × n matrix of weights of each of the 20 individual subjects (represented visually in a ‘subject space’).
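The structure of one subject's input matrix can be sketched as below. The three pairwise values are hypothetical, and the normalization applied to the MMN mean amplitudes is not reproduced here because the text does not specify it.

```python
TONES = ("T1", "T2", "T3")

def dissimilarity_matrix(pair_values):
    """Build a symmetric 3 x 3 matrix with a zero diagonal from a dict
    mapping tone pairs to dissimilarity estimates."""
    n = len(TONES)
    matrix = [[0.0] * n for _ in range(n)]
    for (a, b), value in pair_values.items():
        i, j = TONES.index(a), TONES.index(b)
        matrix[i][j] = matrix[j][i] = value  # mirror across the diagonal
    return matrix

# Hypothetical normalized MMN mean amplitudes for one subject.
subject_pairs = {("T1", "T2"): 0.8, ("T1", "T3"): 1.0, ("T2", "T3"): 0.3}
subject_matrix = dissimilarity_matrix(subject_pairs)
```

Stacking twenty such matrices, one per participant, gives the 20 × 3 × 3 array that serves as the three-way input.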

Several criteria were used to determine the number of dimensions underlying the auditory electrophysiological space. First, solutions obtained from the INDSCAL iterative process were considered to have converged when they met the program’s internal criterion for convergence (< 0.01). In this application, convergence implies that the n dimensions were common to the MMN responses of all subjects irrespective of language group. Only one- and two-dimensional solutions were evaluated because the maximum number of dimensions possible is one less than the number of objects (T1, T2, T3). Second, the variance accounted for (VAF) was used as an index of overall fit, i.e., a measure of the proportion of variance in the scaled data that is accounted for by the INDSCAL procedure, for both one-dimensional and two-dimensional solutions. Based on the VAF, either a one-dimensional or a two-dimensional solution was chosen for further analysis. Third, the mean correlation (r) between the interstimulus distances from the two-dimensional solution and the original dissimilarity data obtained from the scaled MMN was used to determine how well the one or two-dimensional solutions reflect the original MMN scaled mean amplitude data. Fourth, to assess the statistical stability of the INDSCAL output, crossvalidated analyses were performed for one- or two-dimensional spaces, each based on a random split-half of the total subject sample. If the INDSCAL solution for the total sample was truly unique, similar stimulus configurations should result from INDSCAL analyses of the random split-half samples. Further, to assess the stability of the INDSCAL solution, the group space of the INDSCAL solution was rotated by 45 degrees. From this starting position, the program was allowed to converge. 
If the VAF is drastically reduced, relative to the original VAF, when the stimulus configuration is rotated, and converges back to the original configuration when iteration is allowed, the INDSCAL solution can be regarded as both unique and stable (J.D. Carroll, personal communication, October 18, 2006).
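Why a rotation should degrade the fit can be seen in a toy calculation. The sketch below does not rerun INDSCAL; it only shows, with hypothetical coordinates and weights, that rotating the axes leaves distances unchanged when a subject weights both dimensions equally but changes them when the weights differ.

```python
import math

def rotate(point, degrees):
    """Rotate a 2-D point counterclockwise about the origin."""
    theta = math.radians(degrees)
    x, y = point
    return (x * math.cos(theta) - y * math.sin(theta),
            x * math.sin(theta) + y * math.cos(theta))

def weighted_distance(p, q, weights):
    """Weighted Euclidean distance for one subject."""
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, p, q)))

# Hypothetical group-space coordinates for two tones.
p, q = (0.4, 0.8), (0.4, -0.4)
p_rot, q_rot = rotate(p, 45), rotate(q, 45)

equal_w = (0.5, 0.5)    # subject with equal weights
unequal_w = (0.2, 0.9)  # subject with unequal weights

equal_before = weighted_distance(p, q, equal_w)
equal_after = weighted_distance(p_rot, q_rot, equal_w)
unequal_before = weighted_distance(p, q, unequal_w)
unequal_after = weighted_distance(p_rot, q_rot, unequal_w)
```

With unequal weights the rotated configuration no longer reproduces the same subject-specific distances, which is why the axes of an individually weighted solution are not free to rotate.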

Discriminant analysis was performed for the n-dimensional solution to determine whether individual subjects can be classified into their respective language group on the basis of a weighted linear combination of the n dimension weights. The discriminant function was crossvalidated using leave-one-out crossvalidation (k-fold with k = 20), i.e., dimension weights from each subject in turn were used as validation data for the function created from the dimension weights of the remaining 19 subjects. This was repeated until every one of the twenty subjects was used for validation. Finally, comparisons of the mean subject weights per language group on each of the n dimensions were performed by means of a one-way ANOVA.
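The crossvalidation scheme can be sketched for a single dimension weight. As a simplification, the classifier below assigns the held-out subject to the group with the nearer training mean, which is what a two-group linear discriminant on one predictor reduces to under equal priors and a pooled variance. The weights are hypothetical, and this is not the software used in the study.

```python
def leave_one_out_accuracy(weights, labels):
    """Hold out each subject in turn, fit group means on the rest,
    and classify the held-out subject by the nearest group mean."""
    correct = 0
    for i, (held_out, true_label) in enumerate(zip(weights, labels)):
        training = [(w, g) for j, (w, g) in enumerate(zip(weights, labels)) if j != i]
        means = {}
        for group in set(g for _, g in training):
            values = [w for w, g in training if g == group]
            means[group] = sum(values) / len(values)
        predicted = min(means, key=lambda g: abs(held_out - means[g]))
        correct += predicted == true_label
    return correct / len(weights)

# Hypothetical contour-dimension weights for a few subjects per group.
weights = [0.7, 0.8, 0.9, 0.1, 0.2, 0.3]
labels = ["Chinese", "Chinese", "Chinese", "English", "English", "English"]
accuracy = leave_one_out_accuracy(weights, labels)
```

Each subject is classified by a function that never saw that subject's weight, so the resulting accuracy is not inflated by fitting and testing on the same data.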

3. Results

3.1. Mismatch negativity

Chinese and English grand-average MMN standard and deviant waveforms for the three conditions (T1/T2, T1/T3, T2/T3) at the Fz electrode site are shown in Fig. 3. For all three conditions across groups, tones elicited robust MMN responses when presented in the low probability (p = 0.15) conditions relative to the 100% probability sequences. A two-way [group X condition] repeated measures ANOVA conducted on the mean MMN amplitude yielded a significant group (F1,18 = 12.94, p < 0.01, partial-η2 = 0.42, generalized-η2 = 0.26), condition (F2,36 = 9.19, p < 0.01, partial-η2 = 0.34, generalized-η2 = 0.15), and interaction effect between group and condition (F2,36 = 6.20, p < 0.01, partial-η2 = 0.26, generalized-η2 = 0.12). Post hoc Tukey-adjusted multiple comparisons revealed that the MMN mean amplitude of the T1/T3 (CT1/T3 > ET1/T3, p < 0.01) and T1/T2 (CT1/T2 > ET1/T2, p = 0.02) conditions were larger for the Chinese group relative to the English. The mean MMN amplitude of the T2/T3 condition, however, did not differ significantly between language groups (p = 1.00). Within the Chinese group, the mean MMN amplitude of the T2/T3 condition was significantly less than either the T1/T3 (CT2/T3 < CT1/T3, p < 0.01) or T1/T2 (CT2/T3 < CT1/T2, p < 0.01) condition. No significant difference was found in the mean MMN amplitude between the T1/T2 and T1/T3 conditions (p = 0.95). In contrast, within the English group, no significant differences in mean MMN amplitude were observed between any of the three tonal conditions.

Fig. 3.


Grand-averaged standard (high probability, p = 1.00) and deviant waveforms (low probability, p = 0.15) obtained from the Fz electrode location are displayed for the two language groups (Chinese, top panel; English, bottom panel) per experimental condition (T1/T2, left; T1/T3, middle; T2/T3, right). The standard (‘identity’) waveforms were elicited by presenting T2 and T3 alone; the deviant waveforms were elicited by presenting T3 in the context of either T1 (T1/T3) or T2 (T2/T3), T2 in the context of T1 (T1/T2). Both groups show a larger MMN for the deviant relative to the standard, and a peak latency that occurs between 150–300 ms post stimulus onset. The T1/T2 and the T1/T3 conditions, which involve a contrast between level and contour tones, elicit larger MMNs in the Chinese group relative to the English (cf. Table 3). However, no group differences are observed in the magnitude of the MMN for the T2/T3 condition, which contrasts two similarly rising contour tones.

With respect to peak latency, a two-way [group X condition] repeated measures ANOVA yielded a significant condition effect (F2,36 = 7.27, p < 0.01, partial-η2 = 0.29, generalized-η2 = 0.26). Neither group (F1,18 = 0.42, p = 0.52, partial-η2 = 0.02, generalized-η2 = 0.01) nor interaction effects (F2,36 = 0.76, p = 0.47, partial-η2 = 0.04, generalized-η2 = 0.02) was significant. Post-hoc Tukey-adjusted multiple comparisons (p = 0.01) revealed that, irrespective of group, the peak negativity for the T2/T3 condition occurred later than either the T1/T2 or T1/T3.
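The partial eta-squared values reported with each F ratio can be recovered from the F value and its degrees of freedom using the standard identity partial-η2 = (F × df_effect) / (F × df_effect + df_error). The short sketch below applies it to the mean amplitude effects; the function name is our own, and generalized eta-squared cannot be recovered this way because it requires the full sums of squares.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta-squared recovered from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

# F ratios and degrees of freedom for MMN mean amplitude, as reported in the text.
group_effect = round(partial_eta_squared(12.94, 1, 18), 2)
condition_effect = round(partial_eta_squared(9.19, 2, 36), 2)
interaction_effect = round(partial_eta_squared(6.20, 2, 36), 2)
```

The rounded results agree with the partial-η2 values reported for the group, condition, and interaction effects.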

3.2. Multidimensional scaling of the mismatch negativity

3.2.1. Number of dimensions

A comparison of the badness-of-fit of the INDSCAL solutions for one- and two-dimensional spaces indicated that two dimensions best describe the MMN data. The cumulative percentage of VAF by the two-dimensional INDSCAL model was 74%, a 14% increase over the one-dimensional solution (60%). These results indicate that two dimensions are necessary to characterize the subjects’ MMN responses to T1, T2, and T3.

The mean correlation between the interstimulus distances from the two-dimensional solution and the original dissimilarity data obtained from the scaled MMN mean amplitude measures, across all twenty subjects, was r = 0.87. For the Chinese group, the mean correlation was r = 0.92; for the English, r = 0.82. Thus, the INDSCAL model indicates that the data from both language groups can be described very well by this common set of two dimensions.

3.2.2. Interpretation of dimensions

The group stimulus space of the two-dimensional INDSCAL solution for all 20 subjects’ dissimilarities data (left panel) and two cross-validated two-dimensional solutions (right panel), each (a, b) based on a random half of the total subject sample, are shown in Fig. 4. For all three solutions, Dim 1 is interpretively labeled ‘height’, as measured by average F0 and F0 offset (Table 1) of each tone (T1, T2, T3). T3, the low falling-rising contour tone, is positioned toward one end of the axis, whereas T2, the high rising contour tone, and T1, the high level tone, are positioned toward the opposite end. For all three solutions, Dim 2 is interpretively labeled ‘contour’, as measured by changes in the magnitude of slope of each tone throughout its duration, especially from turning point to offset (Table 1). T1, the high level tone, is positioned toward one end of the axis, whereas the two rising contour tones (T2, T3), are positioned toward the opposite end. In this group stimulus space, it is further observed that the distance between T1 and T3 is much larger relative to that between T1 and T2, meaning that the mismatch negativity is especially sensitive to pitch height regardless of language group.

Fig. 4.


Dimensions 1 (height) and 2 (contour) of the two-dimensional INDSCAL stimulus space from the combined group of 20 Chinese and English participants (left panel), and from two (a, b) random split-halves (right panel). Each dimension is normalized so that the mean of the coordinate values equals zero and the sum of squared coordinates equals 1.00. DIM = dimension. Similarity between the configurations of the crossvalidated and original INDSCAL solutions confirms the stability and validity of the group stimulus space.

Although Pearson product-moment correlations could not be computed due to the small number of objects (n = 3), the configuration of the group space from the two split-half solutions (Fig. 4, right panel) is similar to the solution obtained from all 20 subjects (Fig. 4, left panel). This result indicates that the orientation of the axes of the two-dimensional space is unique, and that the dimensions can be interpreted without rotation. As a further test of the stability of the solution, the axes of the group space were rotated by 45 degrees and the rotated stimulus configuration was used as the starting point for a new INDSCAL analysis with zero iterations. The VAF dropped to 0.07, indicating that the original INDSCAL group configuration was genuinely unique. Further, when the rotated solution was used as the starting configuration, and the program was allowed to iterate, the program converged on the original solution, confirming the stability of the INDSCAL group configuration.

3.2.3. Dimension weights for subjects

The subject space for the two-dimensional INDSCAL solution for all 20 subjects is shown in Fig. 5. Subjects’ weights on the two dimensions are shown by the projection of symbols on the coordinate axes. If subjects assigned equal weights to the two dimensions, their symbols would be located at a 45° angle (equidistant) between the two axes of the subject space. Instead, subjects’ weights appear to project off the 45° angle, and moreover, are clustered according to language group. All but one of the Chinese subjects project higher on Dim 2 contour when compared to the majority of English subjects (9/10). On Dim 1 height, however, subjects’ projections do not appear to cluster by language group. Results of a one-way ANOVA of the mean subject weights per language group confirmed that pitch contour had a greater influence on the mismatch negativity of the Chinese group relative to the English (F1,18 = 20.81, p < 0.01, η2 = 0.54), whereas pitch height did not yield a language group effect (F1,18 = 0.35, p = 0.55, η2 = 0.02).

Fig. 5.


Subject space of the two-dimensional INDSCAL configuration for the Chinese (●) and English (∇) language groups. DIM = dimension. Only three (dotted-line rectangles) of 20 subjects are incorrectly classified with the other language group by means of discriminant analysis. The distribution of subjects reveals that MMN responses from the Chinese language group are more heavily weighted on the pitch contour dimension as compared to the English group.

3.3. Discriminant analysis of subject weights

Discriminant functions were used to determine the extent to which individual subjects can be classified into their respective language groups based on Dim 1 weights, Dim 2 weights, or a weighted linear combination of the two dimension weights. The classification matrices from these three discriminant analyses show that Dim 2 contour yielded the highest correct classification of subjects into their respective language groups (Table 2). Dim 2 was very effective in discriminating individuals by language group; 90% and 80% correct for Chinese and English subjects, respectively (canonical R2 = 0.73). Because we expect only 50% correct classification by chance, an overall 85% accuracy rate suggests that Dim 2 is a useful predictor in separating the two groups. A linear-weighted combination of the two dimensions yielded a slightly lower overall 75% accuracy rate, 80% and 70% correct for Chinese and English subjects, respectively (canonical R2 = 0.75). Dim 1 height, on the other hand, was not a useful predictor of language group membership; 50% chance level for both groups (canonical R2 = 0.13).

Table 2.

Classification matrix from discriminant analysis of Chinese and English language groups for separate and combined dimensions from the INDSCAL group stimulus space

                          Predicted group
Original group       n    Chinese   English
Dimension 1 (Height)
  Chinese           10    5         5
  English           10    5         5
Dimension 2 (Contour)
  Chinese           10    9         1
  English           10    2         8
Dimensions 1 + 2 (Height + Contour)
  Chinese           10    8         2
  English           10    3         7
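The per-group and overall accuracy rates quoted in the text follow directly from the classification matrices. A minimal sketch is given below; the counts are entered so that each row sums to n = 10 and matches the percentages reported in the text, and the helper name `accuracy` is introduced here for illustration.

```python
# Rows: original group (Chinese, English); columns: predicted group (Chinese, English).
matrices = {
    "Dim 1 (height)": [[5, 5], [5, 5]],
    "Dim 2 (contour)": [[9, 1], [2, 8]],
    "Dim 1 + 2": [[8, 2], [3, 7]],
}


def accuracy(matrix):
    """Return (per-group accuracies, overall accuracy) for a square classification matrix."""
    correct = [row[i] for i, row in enumerate(matrix)]
    per_group = [c / sum(row) for c, row in zip(correct, matrix)]
    overall = sum(correct) / sum(sum(row) for row in matrix)
    return per_group, overall


results = {name: accuracy(m) for name, m in matrices.items()}
for name, (per_group, overall) in results.items():
    print(name, per_group, overall)
```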

4. Discussion

4.1. Influence of language experience on the MMN response: categories or dimensions?

With the inclusion of MMN responses to a third condition (T1/T2), in addition to the two conditions (T1/T3, T2/T3) from Chandrasekaran et al. (2007), it can be seen that the native Chinese group exhibits larger MMN responses than the English group to the T1/T2 and T1/T3 conditions, but not to the T2/T3 (Fig. 3). Whereas the MMN responses to the T1/T2 and T1/T3 conditions are larger than that to the T2/T3 condition for the Chinese group, no condition effects are evident for the English group. By focusing on individual tones, however, we miss the phonetic generalization that both T1/T2 and T1/T3 involve a comparison between a level and contour tone, in contrast to T2/T3, which involves a comparison between two tones with similar contours. By applying an individual differences model of multidimensional scaling (Carroll & Chang, 1970; Harshman, 1970), we are now able to quantify not only the number and nature of the pitch dimensions underlying the auditory electrophysiological responses, but also the relative weighting of these dimensions as a function of language experience.

4.2. Underlying dimensions of preattentive tone processing

INDSCAL analysis reveals that two dimensions underlie the scaled MMN dissimilarities data across Chinese and English subjects. As measured by average F0 or F0 offset (Table 1), Dim 1 appears to be organized on the basis of pitch height (Fig. 4). Prima facie, Dim 2, on the other hand, appears to reflect measures related to F0 slope (onset to turning point; turning point to offset) and/or another measure of pitch height (F0 onset). Instead we adduce both internal and external evidence to support our interpretation that Dim 2 is organized primarily on the basis of pitch contour. Because of the limited number of stimulus objects in this study, the question remains as to how many such pitch dimensions are represented separately in auditory sensory memory.

In the current study, irrespective of group, the peak latency of the MMN occurs later for the T2/T3 condition (200 ms) relative to T1/T2 or T1/T3 (170–180 ms). If the MMN is responding primarily to pitch onset, we would not expect to observe a condition effect for peak latency. Pitch onset is a static cue that occurs at a single point in time. Two conditions (T1/T2, T1/T3) involve a comparison between a level tone and a contour tone; the other condition, T2/T3, a comparison between two rising contour tones. The later MMN for the T2/T3 condition presumably reflects the greater time that is required to disambiguate these two tones based on a dynamic cue, i.e., pitch contour. Moreover, it is well known that the MMN occurs between 100 and 150 ms post onset for phonetic stimuli (Rinne et al., 1999). In this study, the relatively later range of peak latency (170–200 ms) reinforces the notion that the MMN response is integrated over time rather than being elicited by a single event at the onset of the stimulus.

The MMN peak latency is a function of the magnitude of acoustic deviance between standard and deviant (Naatanen et al., 1997). Besides acoustic differences between stimuli, the MMN mean amplitude can also be shaped by language experience (Naatanen, 2001; Naatanen, Tervaniemi, Sussman, Paavilainen, & Winkler, 2001). Therefore, it is possible that the later occurring negativity for T2/T3 simply reflects greater acoustic similarity between T2 and T3 relative to T1/T2 and T1/T3 (Table 1, Fig. 4). Based on psychoacoustics of pitch onset alone, we would predict the mean MMN amplitude for the English group to be larger in the T1/T3 and T1/T2 conditions relative to T2/T3. If pitch onset is the primary cue for Dim 2, we would further expect both groups to show larger MMN responses for the T1/T3 condition compared to T1/T2. These predictions are not borne out. MMN mean amplitude does not differ between any of the three conditions in the English group; nor does it differ between T1/T2 and T1/T3 across groups. Instead, our findings suggest that the MMN peak latency reflects multiple pitch dimensions, differentially weighted by language experience, and integrated over time. The presence of multiple dimensions in our speech stimuli, as compared to those that vary along a single dimension only, results in a later negativity. That is, pitch onset per se, a static cue based on a single point in time, cannot account for the longer time frame of the MMN.

External behavioral data also support the labeling of Dim 2 as a pitch contour dimension. Based on earlier multidimensional scaling investigations of tone perception across five tone language groups (Cantonese, Mandarin, Taiwanese, Thai, Yoruba) and a non-tone language group (English), three dimensions emerge in the group stimulus space: height, direction, and contour (Gandour, 1983; Gandour & Harshman, 1978). Crosslanguage differences in perceptual saliency of dimensions can be related to the presence of specific types of lexical tones (e.g., level, contour) as well as to tone rules in the listeners’ phonological system (Huang, 2004; Hume & Johnson, 2001). Of relevance to Dim 2, the height dimension is found to be important to all language groups regardless of typology, whereas the direction and contour dimensions are relatively more important to speakers of tone languages. In the case of Mandarin Chinese vs. English (Gandour, 1983), the English group gave more weight to the height dimension than the Mandarin group, whereas the reverse group effect occurred on the direction dimension.

Previous MDS investigations of tone perception constructed dissimilarity matrices from direct comparisons of tonal pairs (Gandour, 1983; Gandour & Harshman, 1978) or from an inverse of reaction time to tonal pairs (Huang, 2004). In the current study, the dissimilarity matrices for subjects were constructed from the differences in ERP responses obtained from passive listening conditions. Despite methodological differences in stimuli, task, and subjects, the two dimensions obtained from the ERPs (pitch height, pitch contour) are consistent with those described in the earlier literature. This convergence of behavioral and electrophysiological data bolsters the view that these two pitch dimensions are crucial for understanding language-dependent effects on tonal processing in the brain.

Choosing a higher-dimensional solution may lead to an over-inflation of the variance accounted for (Hair, Anderson, Tatham, & Black, 1998). In this study, however, the selection of a two-dimensional solution was essential for revealing crosslanguage differences in MMN sensitivity to Chinese tones. Previous MDS studies have consistently shown that at least two dimensions are used in the processing of tone, and that these two dimensions are most likely to reflect pitch height and pitch direction or contour. The one-dimensional solution accounted for much less variance than the two-dimensional solution, which indicates that two dimensions were necessary to account for the dissimilarities data. Taken together, the interpretability of the solutions, the variance accounted for, the distribution of subject weights by language group, and the discriminant analysis of subject weights all point to the two-dimensional solution as offering the best insights into preattentive processing of lexical tone.

A truly one-dimensional solution, on the other hand, “constitutes a fairly uninteresting special case for the INDSCAL model” (Arabie, Carroll, & DeSarbo, 1987, p. 35). In that case, any group differences in individual weights would be nothing more than a scaling factor. A two-dimensional solution, by contrast, gives us an opportunity to evaluate the influence of language experience on early cortical processing of different pitch dimensions in the speech signal.

4.3. Influence of language experience on the dimensions of preattentive tone processing

Our findings on the relative weighting of Dim 2 show that MMN responses of the Chinese group are more sensitive to pitch contour than those of the English group. A discriminant function using this pitch contour dimension is highly successful in separating the two language groups. Consistent with earlier MDS data (Gandour, 1983; Gandour & Harshman, 1978), the greater sensitivity of the MMN to Dim 2 in the Chinese group, relative to English, provides a neurobiological signature of the relevance of the pitch contour in early cortical stages of tonal processing. Such crosslanguage differences in the effects of the pitch contour dimension on the mismatch negativity are compatible with its role in language typology. In Pike’s dichotomy (Pike, 1948), there are register-tone and contour-tone languages. The former refers to tonal systems in which tones are best described in terms of single points within a pitch range, the latter to those in which tones are better described on the basis of gliding pitch movements. Mandarin Chinese is classified as a contour-tone language. Since pitch contour is a critical phonetic feature of the Mandarin tone space, native speakers place more emphasis on it, relative to nonnative speakers, in early stages of pitch extraction from the auditory signal.

The relative importance of Dim 1, on the other hand, is the same for Chinese and English speakers, indicating that this dimension is shared regardless of language typology. A discriminant function based on pitch height can achieve no better than chance rates of classifying Chinese and English subjects into their respective language groups. Taken together, these findings demonstrate that the effect of language experience on electrophysiological responses varies depending on specific pitch dimensions that underlie the tonal categories rather than the categories themselves.

4.4. Comparison to crosslanguage studies of tone processing at the level of the cortex and the auditory brainstem

Our finding that language experience influences preattentive processing of pitch information in the cerebral cortex is compatible with recent crosslanguage investigations of pitch processing at the brainstem level. As measured in the human frequency following response (FFR), pitch strength and pitch tracking accuracy have been reported to be greater for Chinese listeners than for English across all four Mandarin tones (Krishnan, Xu, Gandour, & Cariani, 2005). Our MMN data notwithstanding, their brainstem data suggest that there may even be enhancement or priming of linguistically-relevant pitch information well before the auditory signal reaches the cerebral cortex. Their stimuli were identical to those in this study, i.e., prototypical, curvilinear F0 contours of Mandarin tones. In a subsequent FFR study (Xu, Krishnan, & Gandour, 2006), pitch strength and tracking accuracy were examined in linear rising and falling F0 ramps representative of Mandarin T2 and T4. Interestingly, no crosslanguage differences were observed in pitch strength or accuracy for either tone, indicating that stimuli with linear rising/falling ramps elicit homogeneous pitch representations at the level of the brainstem. In the case of Mandarin T2 and T4, native listeners’ long-term learning experience has improved their ability to rapidly track nonlinear changes in pitch movement at the syllable level with a high degree of accuracy. No language-dependent effects, however, are observed in response to linear rising or falling F0 ramps because they are not part of native Chinese listeners’ experience. Taken together, these findings lead us to conclude that pitch extraction at the brainstem level is critically dependent on specific dimensions of pitch contours that native speakers have been exposed to in natural speech contexts. Both local reorganization (Krishnan et al., 2005) and corticofugal modulation of brainstem neural activity appear to be implicated in this experience-dependent effect. It remains to be determined how fine-grained the degree of linguistic specificity for pitch extraction is at both cortical (MMN) and subcortical (FFR) stages of processing.

Our crosslanguage effects are also compatible with PET (positron emission tomography) and fMRI (functional magnetic resonance imaging) investigations of pitch processing in native vs. nonnative speakers of tone languages at the level of the cerebral cortex (see Gandour, 2006a; 2006b, for reviews). A major finding is that the left hemisphere (LH) is implicated in pitch processing only when the pitch contours are of linguistic relevance to the listener. For example, when asked to discriminate Thai tones, Chinese listeners fail to show activation of LH regions (Gandour et al., 2000; Gandour et al., 2002), indicating that the influence of experience-dependent parameters of the auditory signal is specific to a particular language. This influence of categorical representations on pitch processing has been further demonstrated using hybrid stimuli created by superimposing Thai tones onto Mandarin syllables (tonal chimeras) and Mandarin tones onto the same syllables (real words) (Xu et al., 2006). In the left planum temporale, a double dissociation occurs between language experience (Chinese, Thai) and neural representation of pitch, such that stronger activity is elicited in response to native as compared to non-native tones. These brain imaging experiments, however, all used speeded response discrimination tasks that carry considerable attention and memory demands. In this experiment, the task did not involve selective attention or working memory. Our ERP data are seen to complement those of PET and fMRI. Given the superior temporal resolution of ERPs, our data show not only that tonal categories modulate early, automatic, preattentive cortical activity, but also that neural mechanisms operating on pitch information at this very early stage of processing may be based primarily on phonetic dimensions underlying these tonal categories. Future research should be directed to the time course of cortical dynamics of tonal processing and its spatially-distributed neural circuitry in the cerebral cortex as well as in subcortical areas.

4.5. Applying the INDSCAL model to crosslanguage comparisons of MMN sensitivity to speech stimuli

A crucial assumption of the INDSCAL model is that subjects use a common set of dimensions, though they may differ in the relative weighting of each dimension. Previous INDSCAL applications have relied on subjective perceptions of similarities or dissimilarities between objects. Similarity and dissimilarity matrices have typically been constructed using reaction time information, paired comparisons, preference scales, or direct rankings. All these behavioral techniques encounter issues of high intra-subject as well as inter-subject variability due in large part to working memory and selective attention demands related to the task. In the current study, we minimize intra-subject variability by applying the INDSCAL model directly to auditory electrophysiological responses elicited from stimuli presented in a passive oddball paradigm. The task is passive in the sense that subjects were instructed to ignore the auditory stimuli presented via earphones; to refrain from extraneous body movements; and to focus their attention exclusively on a self-selected, closed-caption silent movie. In such tasks, the MMN is considered preattentive because the task requires no voluntary attention or working memory.
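The assumption of a common set of dimensions with subject-specific weights corresponds to the weighted Euclidean distance of the INDSCAL model (Carroll & Chang, 1970): for subject k, the squared distance between stimuli i and j is the sum over dimensions r of w_kr (x_ir − x_jr)². The sketch below illustrates this with hypothetical coordinates and weights, which are not values from the study.

```python
import math


def indscal_distance(x_i, x_j, weights):
    """Weighted Euclidean distance between two stimuli for one subject."""
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, x_i, x_j)))


# Hypothetical group-space coordinates (Dim 1, Dim 2): the two stimuli differ only on Dim 2.
stim_p = (0.0, 0.0)
stim_q = (0.0, 2.0)

# Two hypothetical subjects who share the dimensions but weight them differently.
d_dim2_heavy = indscal_distance(stim_p, stim_q, weights=(0.25, 0.64))
d_dim1_heavy = indscal_distance(stim_p, stim_q, weights=(0.64, 0.25))

print(d_dim2_heavy, d_dim1_heavy)
```

The same pair of stimuli is farther apart for the subject who weights Dim 2 more heavily, which is how the model lets a single group stimulus space account for individual differences in dissimilarity.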

The majority of MMN studies of speech stimuli have been designed so that only a single dimension distinguishes the deviant from its standard. For example, in crosslanguage studies of vowel perception (Cheour et al., 1998; Naatanen et al., 1997), only the second formant was varied across stimuli. Other formants and voice fundamental frequency were held constant. Similarly, in a crosslanguage study of the perception of stop consonants (Sharma & Dorman, 1999), voice onset time (VOT) was the only variable that distinguished the standards and deviants. The perceptual saliency of these particular cues (F2/F1 ratio, VOT) notwithstanding, it is well known that speakers utilize multiple cues in processing speech stimuli. Moreover, these cues may be more or less important depending on native language experience, or even second language (L2) learning. In a perceptual training study of learning the three-way voicing contrast in Korean stop consonants (Francis & Nusbaum, 2002), MDS was used to show that after training, L2 listeners’ perceptual space undergoes restructuring in terms of the relative weighting of acoustic-phonetic dimensions. Of relevance to the issue of multiple cues, phonetic learning resulted in L2 listeners being able to direct their attention to a previously unattended dimension of phonetic contrast that has no analog in their native language (English). Using the INDSCAL model, we can directly assess subjects’ relative weighting of multiple dimensions, and moreover, determine the extent to which any differential effects can be attributed to language experience.

In the current study, the stimulus set consisted of three lexical tones that are distinguished by time-varying F0 trajectories (Fig. 1). There are multiple spectral and temporal cues that can possibly separate the three tones (Table 1). If MMN responses to each condition are examined separately (T1/T2, T1/T3, T2/T3), we are then unable to generalize beyond the level of a tonal category. As a consequence, any similarities in MMN responses between stimulus pairs are seen to be accidental. It is only by applying INDSCAL to MMN mean amplitudes across all three conditions that we are able to extract the two underlying dimensions: pitch height and pitch contour. Though their relative weighting varies as a function of native vs. nonnative language experience, it is especially noteworthy that the pitch contour dimension is the one that clearly separates the two language groups. Not all dimensions are created equal when it comes to determining what language group a listener belongs to. Such differences in their relative importance can be attributed to abstract structural properties of a listener’s phonological system (Gandour, 1983). Such insights are only possible by applying statistical methods (e.g., INDSCAL) that permit us to examine the relative weighting of features or dimensions that make up the speech categories.
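The input to INDSCAL is one dissimilarity matrix per subject, built from the MMN mean amplitude in each of the three conditions. The sketch below shows the shape of such a matrix for a single subject. The amplitudes are hypothetical, and taking the absolute amplitude as the dissimilarity is an assumption made here for illustration; the text does not specify the transformation.

```python
TONES = ("T1", "T2", "T3")


def dissimilarity_matrix(pair_values):
    """Build a symmetric matrix with a zero diagonal from one value per tone pair."""
    index = {tone: i for i, tone in enumerate(TONES)}
    matrix = [[0.0] * len(TONES) for _ in TONES]
    for (a, b), value in pair_values.items():
        i, j = index[a], index[b]
        matrix[i][j] = matrix[j][i] = value
    return matrix


# Hypothetical MMN mean amplitudes in microvolts (negative-going), one per condition.
mmn_amplitude = {("T1", "T2"): -2.0, ("T1", "T3"): -2.5, ("T2", "T3"): -1.0}

# Assumption: a larger (more negative) MMN is treated as a greater dissimilarity.
one_subject = dissimilarity_matrix({pair: abs(v) for pair, v in mmn_amplitude.items()})

for row in one_subject:
    print(row)
```

With 20 participants, 20 such 3 × 3 matrices would be passed to the scaling procedure.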

4.6. Experience-dependent plasticity

4.6.1. Second language learning

The relative weighting of acoustic features is critical to understanding difficulties faced by learners of a second language. In a crosslanguage (English, German, Japanese) investigation of the perception of English /r/ vs. /l/, Iverson et al. (2003) applied MDS analysis to perceptual judgments of stimuli varying in F2 and F3. Of these two formants, variation in F3 is the preeminent acoustic cue for distinguishing /r/ from /l/ (Goto, 1971; Miyawaki et al., 1975). In English and German, this segmental contrast is phonemic; it is non-phonemic in Japanese. English and German listeners placed relatively more emphasis on the F3 dimension. Japanese listeners, on the other hand, directed their attention primarily to F2. Their difficulties in the acquisition of this segmental contrast may be attributed to their reliance (or weighting) on a dimension (F2) that is irrelevant in differentiating English /r/ vs. /l/.

Similar difficulties in the L2 acquisition of a suprasegmental contrast are to be expected for nonnative learners of a tone language (Kiriloff, 1969; Shen, 1989). In this study, MMN responses of the Chinese group are more sensitive to pitch contour than those of the English group. Such findings agree with previous MDS studies showing that native Mandarin speakers rely on changes in F0 contours more than height in distinguishing tones, whereas speakers of non-tone languages (English) tend to attach more importance to height (Gandour, 1983; Gandour, 1984). Since English has no contrastive tones, English listeners might be expected to focus their attention more on pitch height. Their difficulties in the acquisition of this suprasegmental contrast may be attributed to their reliance on a dimension (pitch height) that is of secondary importance in the Mandarin tone space.

Interestingly, the perception of Mandarin tones by adult English L2 learners can be improved with auditory training (Wang, Spence, Jongman, & Sereno, 1999). Using multiple talkers, both male and female, L2 learners were able to achieve a 21% increase in the accuracy of tone perception after training. Wang et al. infer that this improvement results from trainees’ ability to focus more attention on pitch contour than height. At the cortical level, L2 acquisition of Chinese tones, as measured by tonal identification, involves both the expansion of left language-related areas and the recruitment of right hemisphere regions specialized for functions related to pitch (Wang, Sereno, Jongman, & Hirsch, 2003). Using INDSCAL, it is now possible to investigate whether the mismatch negativity, a measure of early preattentive cortical processing, is responsive to changes in the relative weighting of pitch dimensions that result from auditory training.

4.6.2. Speech specific or domain general?

Our experimental design did not include comparisons of homologous non-speech stimuli with time-varying pitch. Hence we cannot determine conclusively whether the experience-dependent plasticity of the MMN is specific to speech or whether it is domain general.

In a crosslanguage (Chinese, English) study of categorical perception (CP) of speech and nonspeech continua ranging from level (T1) to rising (T2) (Xu et al., 2006), results show evidence of strong categorical perception of speech stimuli for Chinese but not English listeners. CP of nonspeech stimuli was comparable to that for speech stimuli for Chinese but weaker for English listeners. Their findings suggest that the perceptual salience of the pitch direction or contour dimension is greater for native speakers of Mandarin Chinese than for speakers of non-tone languages. Linguistic experience directs attention to linguistically-relevant properties of the auditory signal. We therefore conclude that Chinese listeners’ nonspeech performance must derive at least in part from their experience with listening to Chinese pitch patterns. These data lead us to infer that Chinese listeners’ native language experience has changed the way they process pitch patterns regardless of the stimulus context in which these patterns are embedded. Although the basis for cross-language differences in CP may emerge from linguistic experience, the effects of such experience are not specific to speech perception. More generally, we predict CP effects whenever listeners are asked to judge auditory features that are similar to linguistically-relevant speech parameters in their native language no matter whether they are presented in the context of natural speech or not.

With respect to the mismatch negativity, language experience can modify the magnitude of the MMN responses to linguistically-relevant cues in nonspeech contexts (Tervaniemi et al., 2006). Native speakers of Finnish, for whom the length of a consonant or vowel conveys differences in word meaning (e.g., /tuli/ ‘fire’, /tuuli/ ‘wind’, /tulli/ ‘customs’), showed larger MMN responses to nonspeech stimuli that differ in duration, relative to German speakers, for whom duration cues are not phonemic. Interestingly, the two language groups did not differ on nonspeech stimuli differing in frequency, presumably because pitch is not phonemic for either language group. Thus, it appears that preattentive cortical processing can be selectively tuned to those sound features of the auditory signal that are of phonological relevance in a particular language even in nonspeech sounds (Tervaniemi et al., 2006, p. 2541). In the case of voice fundamental frequency (F0), the next step is to determine whether selective tuning extends to frequency as well as duration in nonspeech sounds.

Musical experience can influence the preattentive processing of several dimensions of the auditory signal, including intensity (Tervaniemi, Castaneda, Knoll, & Uther, 2006), spatial location (Tervaniemi, Castaneda et al., 2006; Nager, Kohlmetz, Altenmuller, Rodriguez-Fornells, & Munte, 2003), and pitch (Fujioka, Trainor, Ross, Kakigi, & Pantev, 2004; Koelsch, Schroger, & Tervaniemi, 1999; Tervaniemi, Rytkonen, Schroger, Ilmoniemi, & Naatanen, 2001). Pop and jazz musicians, who predominantly ‘learned by ear’, showed a greater ability to process musical contours than classical musicians, who relied on musical scores (Tervaniemi, Rytkonen, Schroger, Ilmoniemi, & Naatanen, 2001). In terms of the MMNm (magnetic version of the electrical MMN), musicians showed a larger MMNm in response to musical contours and intervals when compared to non-musicians (Fujioka, Trainor, Ross, Kakigi, & Pantev, 2004). A recent model of music processing, based on data from individuals with neurological impairments, incorporates functionally dissociable components for tonal encoding, contour, and interval analysis (Peretz & Coltheart, 2003). Tonal encoding is argued to be specific to music because it is based on a hierarchical organization of scale tones to one another (Peretz, 2006; Peretz & Coltheart, 2003). Our data suggest it is possible that lexical tones may be similarly organized in a hierarchical fashion depending on their acoustic-phonetic dimensions in a language-specific tone space. Whether tonal encoding in music is comparable to that in speech is a matter for further empirical investigation.

4.7. Separate processing of pitch dimensions in auditory sensory memory

On the basis of additivity of the MMN response and neurally separate generators for different dimensions of musical timbre, Caclin et al. (2006) suggest that the auditory sensory memory represents different features of the stimuli separately. Their results, however, do not completely preclude holistic processing of stimulus features. They allow that the MMN may reflect both separate and holistic analysis of auditory attributes, perhaps at different stages of processing. Our findings on pitch are compatible with theirs on timbre insomuch as we show that two separate pitch dimensions (height, contour) can be extracted from the MMN signal. By virtue of language-dependent enhancement of one of the pitch dimensions (contour) for native (Chinese) relative to non-native (English) speakers, we similarly argue that features of the signal are indeed processed separately in auditory sensory memory. Since the experiment was not designed to tease apart holistic from separate feature processing, we cannot rule out holistic processing at a later processing stage.

5. Conclusion

By applying individual differences multidimensional scaling (INDSCAL) analysis to the MMN mean amplitude, we are able to extract dimensions underlying the preattentive processing of linguistically-relevant time-varying pitch patterns. For at least some of the processes indexed by the MMN, the relevant entities are pitch dimensions and not pitch as a whole. Two pitch dimensions, height and contour, underlie Chinese and English subjects’ processing of the three Mandarin tones (T1, T2, T3). Discriminant analysis indicates that pitch contour, not height, effectively separates native speakers of Chinese from native speakers of English. Thus, not only can we extract the dimensions underlying the processing of standards and deviants that vary in more than one dimension, but we can also measure the extent to which individual differences in pitch processing are influenced by language experience, as reflected in the mismatch negativity. More broadly, INDSCAL gives us a tool for exploring the number and nature of neural dimensions underlying the MMN and their relative importance to one another as a function of one’s auditory experience.

Acknowledgments

Research supported in part by the Purdue Research Foundation (J. G.) and the College of Liberal Arts (A. K.). Thanks to Eunjung Lim, Bruce Craig, and Yagna Kalyanaraman for their assistance with statistical analysis. B. C. is currently a predoctoral student in the Purdue University Life Sciences Integrative Neuroscience Program.

References

  1. Arabie P, Carroll JD, DeSarbo WS. Three-way scaling and clustering. Newbury Park, CA: Sage; 1987. [Google Scholar]
  2. Caclin A, Brattico E, Tervaniemi M, Naatanen R, Morlet D, Giard MH, et al. Separate neural processing of timbre dimensions in auditory sensory memory. Journal of Cognitive Neuroscience. 2006;18(12):1959–1972. doi: 10.1162/jocn.2006.18.12.1959. [DOI] [PubMed] [Google Scholar]
  3. Caclin A, McAdams S, Smith BK, Winsberg S. Acoustic correlates of timbre space dimensions: A confirmatory study using synthetic tones. The Journal of the Acoustical Society of America. 2005;118(1):471–482. doi: 10.1121/1.1929229. [DOI] [PubMed] [Google Scholar]
  4. Carroll JD, Chang JJ. Analysis of individual differences in multidimensional scaling via an N-way generalization of “Eckart-Young” decomposition. Psychometrika. 1970;35:283–319. [Google Scholar]
  5. Chandrasekaran B, Krishnan A, Gandour JT. Mismatch negativity to pitch contours is influenced by language experience. Brain Res. 2007;1128(1):148–156. doi: 10.1016/j.brainres.2006.10.064. [DOI] [PMC free article] [PubMed] [Google Scholar]
  6. Cheour M, Ceponiene R, Lehtokoski A, Luuk A, Allik J, Alho K, et al. Development of language-specific phoneme representations in the infant brain. Nat Neurosci. 1998;1(5):351–353. doi: 10.1038/1561. [DOI] [PubMed] [Google Scholar]
  7. Dehaene-Lambertz G, Dupoux E, Gout A. Electrophysiological correlates of phonological processing: a cross-linguistic study. J Cogn Neurosci. 2000;12(4):635–647. doi: 10.1162/089892900562390. [DOI] [PubMed] [Google Scholar]
  8. Francis AL, Nusbaum HC. Selective attention and the acquisition of new phonetic categories. J Exp Psychol Hum Percept Perform. 2002;28(2):349–366. doi: 10.1037//0096-1523.28.2.349. [DOI] [PubMed] [Google Scholar]
  9. Fujioka T, Trainor LJ, Ross B, Kakigi R, Pantev C. Musical training enhances automatic encoding of melodic contour and interval structure. J Cogn Neurosci. 2004;16(6):1010–1021. doi: 10.1162/0898929041502706. [DOI] [PubMed] [Google Scholar]
  10. Gandour J. Tone perception in Far Eastern languages. Journal of Phonetics. 1983;11:149–175. [Google Scholar]
  11. Gandour J. Tone dissimilarity judgments by Chinese listeners. Journal of Chinese Linguistics. 1984;12:235–261. [Google Scholar]
  12. Gandour J. Brain mapping of Chinese speech prosody. In: Li P, Tan LH, Bates E, Tzeng OJL, editors. Handbook of East Asian psycholinguistics: Chinese. Vol. 1. New York: Cambridge University Press; 2006a. pp. 308–319. [Google Scholar]
  13. Gandour J. Tone: Neurophonetics. In: Brown K, editor. Encyclopedia of language and linguistics. 2. Vol. 12. Oxford, UK: Elsevier; 2006b. pp. 751–760. [Google Scholar]
  14. Gandour J, Harshman R. Crosslanguage differences in tone perception: a multidimensional scaling investigation. Language and Speech. 1978;21:1–33. doi: 10.1177/002383097802100101.
  15. Gandour J, Wong D, Hsieh L, Weinzapfel B, Van Lancker D, Hutchins GD. A crosslinguistic PET study of tone perception. J Cogn Neurosci. 2000;12(1):207–222. doi: 10.1162/089892900561841.
  16. Gandour J, Wong D, Lowe M, Dzemidzic M, Satthamnuwong N, Tong Y, et al. A cross-linguistic FMRI study of spectral and temporal cues underlying phonological processing. J Cogn Neurosci. 2002;14(7):1076–1087. doi: 10.1162/089892902320474526.
  17. Goto H. Auditory perception by normal Japanese adults of sounds L and R. Neuropsychologia. 1971;9(3):317–323. doi: 10.1016/0028-3932(71)90027-3.
  18. Hair JF, Anderson RE, Tatham RL, Black WC. Multivariate data analysis. New Jersey: Prentice Hall; 1998.
  19. Harshman R. Foundations of the PARAFAC procedure: Models and conditions for an “explanatory” multi-modal factor analysis. Ann Arbor: University Microfilms; 1970. No. 10,085.
  20. Honbolygo F, Csepe V, Rago A. Suprasegmental speech cues are automatically processed by the human brain: a mismatch negativity study. Neurosci Lett. 2004;363(1):84–88. doi: 10.1016/j.neulet.2004.03.057.
  21. Howie J. Acoustical studies of Mandarin vowels and tones. Cambridge, UK: Cambridge University Press; 1976.
  22. Huang T. Language-specificity in auditory perception of Chinese tones. Unpublished dissertation. Columbus, OH: Ohio State University; 2004.
  23. Hume E, Johnson K. A model of the interplay of speech perception and phonology. In: Hume E, Johnson K, editors. The role of speech perception in phonology. New York: Academic Press; 2001. pp. 3–25.
  24. Iverson P, Kuhl PK, Akahane-Yamada R, Diesch E, Tohkura Y, Kettermann A, et al. A perceptual interference account of acquisition difficulties for non-native phonemes. Cognition. 2003;87(1):47–57. doi: 10.1016/s0010-0277(02)00198-1.
  25. Kiriloff C. On the auditory discrimination of tones in Mandarin. Phonetica. 1969;20:63–67.
  26. Klatt DH. Software for a cascade/parallel formant synthesizer. Journal of the Acoustical Society of America. 1980;67(3):971–995.
  27. Klatt DH, Klatt LC. Analysis, synthesis, and perception of voice quality variations among female and male talkers. Journal of the Acoustical Society of America. 1990;87(2):820–857. doi: 10.1121/1.398894.
  28. Koelsch S, Schroger E, Tervaniemi M. Superior pre-attentive auditory processing in musicians. Neuroreport. 1999;10(6):1309–1313. doi: 10.1097/00001756-199904260-00029.
  29. Kraus N, Cheour M. Speech sound representation in the brain. Audiol Neurootol. 2000;5(3–4):140–150. doi: 10.1159/000013876.
  30. Kraus N, McGee T, Carrell T, King C, Tremblay K, Nicol T. Central auditory system plasticity associated with speech discrimination training. Journal of Cognitive Neuroscience. 1995;7:25–32. doi: 10.1162/jocn.1995.7.1.25.
  31. Krishnan A, Xu Y, Gandour J, Cariani P. Encoding of pitch in the human brainstem is sensitive to language experience. Brain Res Cogn Brain Res. 2005;25(1):161–168. doi: 10.1016/j.cogbrainres.2005.05.004.
  32. Miyawaki K, Strange W, Verbrugge R, Liberman AM, Jenkins JJ, et al. Effect of linguistic experience – Discrimination of [R] and [L] by native speakers of Japanese and English. Perception & Psychophysics. 1975;18(5):331–340.
  33. Naatanen R. The perception of speech sounds by the human brain as reflected by the mismatch negativity (MMN) and its magnetic equivalent (MMNm). Psychophysiology. 2001;38(1):1–21. doi: 10.1017/s0048577201000208.
  34. Naatanen R, Lehtokoski A, Lennes M, Cheour M, Huotilainen M, Iivonen A, et al. Language-specific phoneme representations revealed by electric and magnetic brain responses. Nature. 1997;385(6615):432–434. doi: 10.1038/385432a0.
  35. Naatanen R, Tervaniemi M, Sussman E, Paavilainen P, Winkler I. “Primitive intelligence” in the auditory cortex. Trends Neurosci. 2001;24(5):283–288. doi: 10.1016/s0166-2236(00)01790-2.
  36. Nager W, Kohlmetz C, Altenmuller E, Rodriguez-Fornells A, Munte TF. The fate of sounds in conductors’ brains: an ERP study. Brain Res Cogn Brain Res. 2003;17(1):83–93. doi: 10.1016/s0926-6410(03)00083-1.
  37. Nenonen S, Shestakova A, Huotilainen M, Naatanen R. Linguistic relevance of duration within the native language determines the accuracy of speech-sound duration processing. Brain Res Cogn Brain Res. 2003;16(3):492–495. doi: 10.1016/s0926-6410(03)00055-7.
  38. Olejnik S, Algina J. Generalized eta and omega squared statistics: measures of effect size for some common research designs. Psychol Methods. 2003;8(4):434–447. doi: 10.1037/1082-989X.8.4.434.
  39. Peretz I. The nature of music from a biological perspective. Cognition. 2006;100(1):1–32. doi: 10.1016/j.cognition.2005.11.004.
  40. Peretz I, Coltheart M. Modularity of music processing. Nat Neurosci. 2003;6(7):688–691. doi: 10.1038/nn1083.
  41. Pike KL. Tone languages. Ann Arbor, MI: University of Michigan Press; 1948.
  42. Rinne T, Alho K, Alku P, Holi M, Sinkkonen J, Virtanen J, et al. Analysis of speech sounds is left-hemisphere predominant at 100–150 ms after sound onset. Neuroreport. 1999;10(5):1113–1117. doi: 10.1097/00001756-199904060-00038.
  43. Sharma A, Dorman MF. Cortical auditory evoked potential correlates of categorical perception of voice-onset time. J Acoust Soc Am. 1999;106(2):1078–1083. doi: 10.1121/1.428048.
  44. Sharma A, Dorman MF. Neurophysiologic correlates of cross-language phonetic perception. J Acoust Soc Am. 2000;107(5 Pt 1):2697–2703. doi: 10.1121/1.428655.
  45. Shen XS. Toward a register approach in teaching Mandarin tones. Journal of Chinese Language Teachers Association. 1989;24:27–47.
  46. Tervaniemi M, Castaneda A, Knoll M, Uther M. Sound processing in amateur musicians and nonmusicians: event-related potential and behavioral indices. Neuroreport. 2006;17(11):1225–1228. doi: 10.1097/01.wnr.0000230510.55596.8b.
  47. Tervaniemi M, Jacobsen T, Rottger S, Kujala T, Widmann A, Vainio M, et al. Selective tuning of cortical sound-feature processing by language experience. Eur J Neurosci. 2006;23(9):2538–2541. doi: 10.1111/j.1460-9568.2006.04752.x.
  48. Tervaniemi M, Rytkonen M, Schroger E, Ilmoniemi RJ, Naatanen R. Superior formation of cortical memory traces for melodic patterns in musicians. Learning & Memory. 2001;8(5):295–300. doi: 10.1101/lm.39501.
  49. Wang Y, Sereno JA, Jongman A, Hirsch J. fMRI evidence for cortical modification during learning of Mandarin lexical tone. J Cogn Neurosci. 2003;15(7):1019–1027. doi: 10.1162/089892903770007407.
  50. Wang Y, Spence MM, Jongman A, Sereno JA. Training American listeners to perceive Mandarin tones. The Journal of the Acoustical Society of America. 1999;106(6):3649–3658. doi: 10.1121/1.428217.
  51. Winkler I, Kujala T, Tiitinen H, Sivonen P, Alku P, Lehtokoski A, et al. Brain responses reveal the learning of foreign language phonemes. Psychophysiology. 1999;36(5):638–642.
  52. Xu Y. Contextual tonal variations in Mandarin. Journal of Phonetics. 1997;25:61–83.
  53. Xu Y, Gandour J, Talavage T, Wong D, Dzemidzic M, Tong Y, et al. Activation of the left planum temporale in pitch processing is shaped by language experience. Human Brain Mapping. 2006;27(2):173–183. doi: 10.1002/hbm.20176.
  54. Xu Y, Krishnan A, Gandour JT. Specificity of experience-dependent pitch representation in the brainstem. Neuroreport. 2006;17(15):1601–1605. doi: 10.1097/01.wnr.0000236865.31705.3a.
  55. Ylinen S, Shestakova A, Huotilainen M, Alku P, Naatanen R. Mismatch negativity (MMN) elicited by changes in phoneme length: a cross-linguistic study. Brain Res. 2006;1072(1):175–185. doi: 10.1016/j.brainres.2005.12.004.
