Abstract
Voice pitch is an important information-bearing component of language that is subject to experience-dependent plasticity at both early cortical and subcortical stages of processing. We have already demonstrated that the pitch onset component (Na) of the cortical pitch response (CPR) is sensitive to flat pitch and its salience. With regard to dynamic pitch, we do not yet know whether the multiple pitch-related transient components of the CPR reflect specific temporal attributes of such stimuli. Here we examine the sensitivity of the multiple transient components of the CPR to changes in pitch acceleration associated with the Mandarin high rising lexical tone. CPR responses from Chinese listeners were elicited by three citation forms varying in pitch acceleration and duration. Results showed that the pitch onset component (Na) was invariant to changes in acceleration. In contrast, Na-Pb and Pb-Nb showed a systematic increase in the interpeak latency and decrease in amplitude with increase in pitch acceleration that followed the time course of pitch change across the three stimuli. A strong correlation with pitch acceleration was observed for these two components only – a putative index of pitch-relevant neural activity associated with the more rapidly-changing portions of the pitch contour. Pc-Nc marks unambiguously the stimulus offset. We therefore propose that in the early stages of cortical sensory processing, a series of neural markers flag different temporal attributes of a dynamic pitch contour: onset of temporal regularity (Na); changes in temporal regularity between onset and offset (Na-Pb, Pb-Nb); and offset of temporal regularity (Pc-Nc). At the temporal electrode sites, the stimulus with the most gradual change in pitch acceleration evoked a rightward asymmetry. Yet within the left hemisphere, stimuli with more gradual change were indistinguishable. These findings highlight the emergence of early hemispheric preferences and their functional roles as related to sensory and cognitive properties of the stimulus.
Keywords: auditory, pitch encoding, iterated rippled noise, cortical pitch response, pitch onset response, hemispheric laterality
1. Introduction
Pitch is an information-bearing perceptual attribute that plays an important role in the perception of language and music. There is considerable interest therefore in how this pitch-relevant information is extracted from speech and nonspeech sounds at both subcortical (Cariani & Delgutte, 1996a, 1996b; Cedolin & Delgutte, 2005; Meddis & O’Mard, 1997) and cortical levels (Walker, Bizley, King, & Schnupp, 2011). There is also growing interest in understanding the neural mechanisms that mediate experience-dependent shaping of pitch processing. Linguistic and musical pitch provide us with a window to evaluate how neural representations of pitch-relevant attributes emerge from early sensory levels of processing and interact with higher levels of cognitive processing in the human brain, and how language and music experience shapes these representations. Recent empirical data show that the neural representation of pitch is shaped by one’s experience with language and music at the level of the auditory brainstem as well as the cerebral cortex (Besson, Chobert, & Marie, 2011; Gandour & Krishnan, 2014; Koelsch, 2012; Kraus & Banai, 2007; Krishnan, Gandour, & Bidelman, 2012; Meyer, 2008; Munte, Altenmuller, & Jancke, 2002; Patel & Iversen, 2007; Tervaniemi et al., 2009; Zatorre, Belin, & Penhune, 2002; Zatorre & Gandour, 2008). But we have yet to achieve a precise characterization of the neural representation of pitch-relevant information associated with specific attributes of dynamic pitch contours that occur in natural speech.
At the cortical level, magnetoencephalography (MEG) has been used previously to study the sensitivity to periodicity, an essential requisite of pitch, by investigating the N100m component. However, a large proportion of the N100m is simply a response to the onset of sound energy, and not exclusively to pitch (Alku, Sivonen, Palomaki, & Tiitinen, 2001; Gutschalk, Patterson, Scherg, Uppenkamp, & Rupp, 2004; Lutkenhoner, Seither-Preisler, & Seither, 2006; Soeta & Nakagawa, 2008; Yrttiaho, Tiitinen, May, Leino, & Alku, 2008). In order to disentangle the pitch-specific response from the onset response, a novel stimulus paradigm was constructed with two segments: an initial segment of noise with no pitch to evoke the onset components only, followed by a pitch-eliciting segment of iterated rippled noise (IRN) matched in intensity and overall spectral profile (Krumbholz, Patterson, Seither-Preisler, Lammertmann, & Lutkenhoner, 2003). Interestingly, a transient pitch onset response (POR) was evoked from this noise-to-pitch transition only. The reverse stimulus transition from pitch to noise failed to produce a POR. It has been proposed that the human POR, as measured by MEG, reflects synchronized cortical neural activity specific to pitch (Chait, Poeppel, & Simon, 2006; Krumbholz et al., 2003; Ritter, Gunter Dosch, Specht, & Rupp, 2005; Seither-Preisler, Patterson, Krumbholz, Seither, & Lutkenhoner, 2006). POR latency and magnitude, for example, have been shown to depend on pitch salience. A more robust POR with shorter latency is observed for stimuli with stronger pitch salience as compared to those with weaker pitch salience.
Source analyses (Gutschalk, Patterson, Rupp, Uppenkamp, & Scherg, 2002; Gutschalk et al., 2004; Krumbholz et al., 2003), corroborated by human depth electrode recordings (Griffiths et al., 2010; Schonwiesner & Zatorre, 2008), indicate that the POR is localized to the anterolateral portion of Heschl’s gyrus, the putative site of pitch processing (Bendor & Wang, 2005; Griffiths, Buchel, Frackowiak, & Patterson, 1998; Johnsrude, Penhune, & Zatorre, 2000; Patterson, Uppenkamp, Johnsrude, & Griffiths, 2002; Penagos, Melcher, & Oxenham, 2004; Zatorre, 1988).
We recently adopted Krumbholz et al.’s (2003) pitch onset response paradigm to demonstrate that a human cortical pitch response (CPR) can be extracted from scalp-recorded electroencephalogram (EEG) (Krishnan, Bidelman, Smalt, Ananthakrishnan, & Gandour, 2012). Indeed, neural responses evoked by IRN steady-state pitch stimuli steadily increased in magnitude with increasing IRN stimulus temporal regularity. Behavioral pitch discrimination also improved with increasing stimulus temporal regularity. This change in response amplitude with increasing stimulus temporal regularity was strongly correlated with behavioral measures of change in pitch salience (perceived strength of pitch). Furthermore, a robust CPR was evoked from both weak and strong IRN pitch-eliciting stimuli, but not to “no-pitch” IRN. We therefore conclude that the CPR is specific to pitch rather than simply a neural response to IRN elicited by slow, spectrotemporal modulations unrelated to pitch (Barker, Plack, & Hall, 2012).
In MEG/EEG studies of pitch processing, the primary focus was to characterize a single, transient pitch onset response to stimuli exhibiting a steady-state pitch. Whether or not their findings can be extrapolated to neural mechanisms that are associated with dynamic pitch contours is an empirical question. In this study, we utilize linguistically-relevant pitch stimuli with dynamic, curvilinear patterns. They represent within-category variants of the high rising lexical tone of Mandarin Chinese.
Linguistic and musical pitch provide an analytic window to evaluate how neural representations of important pitch attributes of a sound undergo transformation from early sensory to later cognitive stages of processing in the human brain, and how pitch-relevant experience shapes these representations. In the music domain, MEG/EEG studies have shown enhanced auditory cortical representations (Pantev et al., 1998; Shahin, Bosnyak, Trainor, & Roberts, 2003) and superior detection of weak pitch incongruities (Magne, Schon, & Besson, 2006; Marie, Delogu, Lampis, Belardinelli, & Besson, 2011) in musicians compared with nonmusician controls. Though providing convincing evidence in support of experience-dependent neural plasticity, none of their evoked response components were pitch-dependent. In the language domain, tonal languages are especially advantageous for studying pitch processing because of its functional load at the level of the syllable. Almost all previous EEG studies of lexical tone have investigated mismatch negativity (MMN) responses by using dynamic pitch contours with a passive oddball paradigm (cf. Gu, Zhang, Hu, Zhao, & Zhang, 2013; steady-state pitch; Zheng, Minett, Peng, & Wang, 2012; active oddball, P300). As measured by MMN amplitude, early, preattentive processing of lexical tone, relative to consonants, was lateralized to the right hemisphere (RH) (Luo et al., 2006). This RH preference was also observed in an MMN study of categorical perception of lexical tone (Xi, Zhang, Shu, Zhang, & Li, 2010). In their nonspeech condition that differed only in spectral components from the speech condition, no difference was observed between within- and between-category deviants, relative to standard stimuli, in the RH. Of special relevance to this study, within-category deviants elicited larger MMN in the RH regardless of speech/nonspeech condition. 
Lexical tone is of further interest because its primary auditory correlate is based on variations in pitch, a multidimensional perceptual attribute that relies on several acoustic features. MMN studies of tone languages have revealed that pitch contour and pitch height are two important features used in early, preattentive lexical tone processing (Chandrasekaran, Gandour, & Krishnan, 2007; Chandrasekaran, Krishnan, & Gandour, 2007; Tsang, Jia, Huang, & Chen, 2011). These findings notwithstanding, any definitive interpretations about cortical pitch processing must be tempered by the fact that the MMN is comprised of both auditory and cognitive mechanisms of frequency change detection in auditory cortex (Maess, Jacobsen, Schroger, & Friederici, 2007). The MMN itself is not a pitch-specific response.
Our overall objective therefore is to fill this knowledge gap by examining cortical, pitch-specific responses that are elicited by linguistically-relevant, dynamic pitch contours exemplary of those that occur in natural speech. To our knowledge there are no published reports that tie specific transient components of the cortical pitch response to selected portions of dynamic pitch contours. The specific aim of this paper is to first identify the multiple transient components of the CPR and label them in relation to specific aspects of our dynamic, curvilinear pitch stimuli (e.g., pitch onset, pitch offset, pitch acceleration). Our hypothesis is that the cortical representation of pitch, as reflected by the CPR, will be differentially sensitive to different attributes or features of dynamic pitch. To eliminate any potential interference of higher-order phonological categories, all three of our dynamic pitch stimuli fall within the limits of citation forms of Mandarin Tone 2. To enable us to focus primarily on the effects of changes in rate of acceleration during the rising portion of Tone 2, pitch height is constant; i.e., pitch onset and offset and pitch range are identical across stimuli. With respect to duration, the three stimuli represent short-, medium-, and long-duration variants of isolated productions of Tone 2.
2. Materials and methods
2.1. Participants
Ten native speakers of Mandarin Chinese (5 male, 5 female) were recruited from the Purdue University student body to participate in the experiment. All exhibited normal hearing sensitivity at audiometric frequencies between 500 and 4000 Hz and reported no previous history of neurological or psychiatric illnesses. They were closely matched in age (24.50 ± 3.53 years), years of formal education (16.90 ± 2.88 years), and were strongly right handed (laterality index 93.10 ± 11.22%) as measured by the Edinburgh Handedness Inventory (Oldfield, 1971). All participants were born and raised in mainland China. None had received formal instruction in English before the age of nine (12 ± 1.24 years). As determined by a music history questionnaire (Wong & Perrachione, 2007), all participants, except for one, had less than three years of musical training (1.55 ± 3.00 years) on any combination of instruments. None had any training within the past five years. Each participant was paid and gave informed consent in compliance with a protocol approved by the Institutional Review Board of Purdue University.
2.2. Stimuli
Using an iterated rippled noise (IRN) algorithm, dynamic pitch contours approximating three isolated, citation variants of Mandarin Tone 2 were constructed: short (T2_150), intermediate (T2_200), and long (T2_250). Their durations were, in order, 150, 200, and 250 ms (Appendix A: audio, stimuli). Though infrequent, a short variant (T2_150) has been reported to occur in isolated productions of Tone 2 (Kratochvil, 1985). These stimuli differed in F0 rate of acceleration as well as duration (Fig. 1). These durations easily fall outside the range of temporal integration effects (≈80 ms) on pitch and its salience for stimuli with resolved harmonics (Plack, Carlyon, & Viemeister, 1995; Plack, Turgeon, Lancaster, Carlyon, & Gockel, 2011; Plack & White, 2000; White & Plack, 1998, 2003). It is therefore unlikely that temporal integration effects pose a potential confound for our evaluation of pitch acceleration-related effects. Rates of acceleration are displayed at 80 ms and from minimum to maximum (in the acceleration domain) per stimulus in Table 1. The maximum speed of pitch change within a speaker’s ability to produce a rapid shift in rising pitch over a 4 st interval is 61.3 st/s (Xu & Sun, 2002, p. 1407, Table VII). The average velocity rates (in st/s), calculated from the turning point to F0 offset, for T2_250 (25.6), T2_200 (32.1), and T2_150 (42.7) fall within the physiological limits of speed of rising pitch changes. As reflected by FFR responses in the brainstem (Krishnan, Gandour, Smalt, & Bidelman, 2010, p. 96, Figs. 2–3), a scaled variant of Tone 2 with a velocity rate of 51.94 st/s falls within the bounds of the normal voice range. F0 onset (100.88 Hz) and offset (131.72 Hz) were constant. Δ F0 from turning point to offset was fixed across stimuli at 30.84 Hz (4.6 st; 0.38 octaves). 
This Δ F0 value is comparable to that of an exemplary Tone 2 citation form (Krishnan et al., 2010) and is an effective cue for the perception of isolated Tone 2 (Moore & Jongman, 1997). The turning point was located at ≈26% of the duration of the F0 contour (40 ms, T2_150; 53 ms, T2_200; 66 ms, T2_250). The timing of these turning points relative to F0 onset is perceptually relevant in the identification of Tone 2 (cf. Moore & Jongman, 1997, p. 1870, Fig. 4). Based on these behavioral and neural data, we judged these stimuli to be ecologically valid representations of Tone 2 and moreover likely to elicit differential sensitivity to varying degrees of acceleration rates at the cortical level.
Table 1.
Stimulus | Instantaneous^a (Hz/s) | Instantaneous^a (st/s) | Minimum to Maximum^b (Hz/s) | Minimum to Maximum^b (st/s) |
---|---|---|---|---|
T2_150 | 346 | 56 | 559 | 85 |
T2_200 | 130 | 22 | 419 | 63 |
T2_250 | 40 | 7 | 336 | 51 |
Note. ^a Located at 80 ms within pitch segment; ^b average.
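The interval and velocity figures quoted for the stimuli can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, and it takes F0 at the turning point to equal the onset F0, which follows from the stated Δ F0 (131.72 − 30.84 = 100.88 Hz). The resulting velocities come out close to, but not identical with, the reported 42.7, 32.1, and 25.6 st/s, presumably because the turning-point times given in the text are rounded.

```python
import math

F0_ONSET_HZ = 100.88   # F0 onset, from the text
F0_OFFSET_HZ = 131.72  # F0 offset, from the text

def semitones(f_start_hz, f_end_hz):
    """Interval between two frequencies in semitones (12 per octave)."""
    return 12.0 * math.log2(f_end_hz / f_start_hz)

def mean_velocity_st_per_s(f_start_hz, f_end_hz, t_start_ms, t_end_ms):
    """Average rate of pitch change (st/s) over a time interval."""
    return semitones(f_start_hz, f_end_hz) / ((t_end_ms - t_start_ms) / 1000.0)

# Pitch excursion shared by all three stimuli.
delta_st = semitones(F0_ONSET_HZ, F0_OFFSET_HZ)
delta_oct = delta_st / 12.0

# Turning points and durations (ms), from the text.
TURNING_POINT_MS = {"T2_150": 40, "T2_200": 53, "T2_250": 66}
DURATION_MS = {"T2_150": 150, "T2_200": 200, "T2_250": 250}

# Assumption: F0 at the turning point equals the onset F0 (see lead-in).
velocity = {
    name: mean_velocity_st_per_s(
        F0_ONSET_HZ, F0_OFFSET_HZ, TURNING_POINT_MS[name], DURATION_MS[name]
    )
    for name in DURATION_MS
}

for name, v in velocity.items():
    print(f"{name}: {v:.1f} st/s")
```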
These three IRN stimuli with time-varying F0 contours were generated by applying a time-varying, delay-and-add algorithm (Appendix B: text, fourth-order polynomial equations) (Denham, 2005; Krishnan, Swaminathan, & Gandour, 2009; Sayles & Winter, 2007; Swaminathan, Krishnan, Gandour, & Xu, 2008). A high iteration step (n = 32) was chosen because pitch salience does not increase by any noticeable amount beyond this number of iteration steps. The gain was set to 1. By using IRN, we preserve dynamic variations in pitch of auditory stimuli that lack a waveform periodicity, formant structure, temporal envelope, and recognizable timbre characteristic of speech.
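For readers unfamiliar with IRN, a time-varying delay-and-add network can be sketched as follows. This is a minimal illustration under stated assumptions, not the stimulus-generation code used here: the function name is hypothetical, the delayed copy is obtained by linear interpolation, a noise pre-roll is added so that all 32 iterations are fully "filled" before the retained segment, and a placeholder linear F0 rise stands in for the fourth-order polynomial contours of Appendix B.

```python
import numpy as np

def irn_time_varying(f0_hz, fs, n_iter=32, gain=1.0, seed=0):
    """Iterated rippled noise with a time-varying delay d(t) = 1/F0(t).

    Each iteration adds a delayed, gain-scaled copy of the current signal
    to itself. Returns a peak-normalized array as long as f0_hz.
    """
    f0_hz = np.asarray(f0_hz, dtype=float)
    delay = fs / f0_hz                              # delay in samples
    # Pre-roll so the network has settled before the retained segment;
    # the initial delay is held constant during the pre-roll (assumption).
    pad = int(np.ceil(n_iter * delay.max()))
    delay = np.concatenate([np.full(pad, delay[0]), delay])

    rng = np.random.default_rng(seed)
    y = rng.standard_normal(delay.size)             # broadband noise input
    t = np.arange(delay.size, dtype=float)          # sample indices
    for _ in range(n_iter):
        # Signal value at t - d(t), by linear interpolation; zero before t = 0.
        delayed = np.interp(t - delay, t, y, left=0.0, right=0.0)
        y = y + gain * delayed
    y = y[pad:]                                     # drop the pre-roll
    return y / np.max(np.abs(y))

# Placeholder contour: a linear rise between the stated F0 onset and offset.
fs = 20000
n_samples = int(0.250 * fs)
f0_contour = np.linspace(100.88, 131.72, n_samples)
stimulus = irn_time_varying(f0_contour, fs)
```

With a constant F0 the output shows a strong autocorrelation peak at a lag of 1/F0, which is the temporal regularity that gives rise to the pitch percept.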
Each stimulus condition consisted of two segments (crossfaded with 5 ms cos² ramps): an initial 500 ms noise segment followed by a pitch segment, i.e., T2_150, T2_200, and T2_250 (Fig. 2). The overall RMS level of each segment was equated such that there was no discernible difference in intensity between initial and final segments. All stimuli were presented binaurally at 80 dB SPL through magnetically-shielded tubal insert earphones (ER-3A; Etymotic Research, Elk Grove Village, IL, USA) with a fixed onset polarity (rarefaction) and a repetition rate of 0.94/s. Stimulus presentation order was randomized both within and across participants. All stimuli were generated and played out using an auditory evoked potential system (SmartEP, Intelligent Hearing Systems; Miami, FL, USA).
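The noise-to-pitch construction can be sketched as below. The function names are hypothetical, and the sketch assumes that the two ramps overlap (so the joined stimulus is one ramp length shorter than the sum of the segments); the text does not specify whether the ramps overlapped or abutted.

```python
import numpy as np

def rms(x):
    """Root-mean-square level of a signal."""
    return float(np.sqrt(np.mean(np.square(x))))

def crossfade_segments(noise, pitch, fs, ramp_ms=5.0):
    """Join a noise precursor and a pitch segment with cos^2 ramps.

    The pitch segment is first scaled to the RMS level of the noise, then
    the last ramp_ms of noise is faded out while the first ramp_ms of the
    pitch segment is faded in (overlapping crossfade; an assumption).
    """
    noise = np.asarray(noise, dtype=float)
    pitch = np.asarray(pitch, dtype=float)
    pitch = pitch * (rms(noise) / rms(pitch))       # equate RMS levels

    n_ramp = int(round(fs * ramp_ms / 1000.0))
    phase = np.linspace(0.0, np.pi / 2.0, n_ramp)
    fade_out = np.cos(phase) ** 2                   # 1 -> 0
    fade_in = 1.0 - fade_out                        # sin^2, 0 -> 1

    overlap = noise[-n_ramp:] * fade_out + pitch[:n_ramp] * fade_in
    return np.concatenate([noise[:-n_ramp], overlap, pitch[n_ramp:]])
```

Because cos² and sin² sum to one, the summed ramp weights stay constant across the transition, which helps avoid an audible level change at the noise-to-pitch boundary.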
2.3. Cortical pitch response acquisition
Participants reclined comfortably in an electro-acoustically shielded booth to facilitate recording of neurophysiologic responses. They were instructed to relax and refrain from extraneous body movement (to minimize myogenic artifacts), ignore the sounds they heard, and were allowed to sleep throughout the duration of the recording procedure (≈ 75% fell asleep). The EEG was acquired continuously (5000 Hz sampling rate; 0.3 to 2500 Hz analog band-pass) using ASA-Lab EEG system (ANT Inc., The Netherlands) utilizing a 32-channel amplifier (REFA8-32, TMS International BV) and WaveGuard (ANT Inc., The Netherlands) electrode cap with 32-shielded sintered Ag/AgCl electrodes configured in the standard 10–20-montage system. The high sampling rate of 5 kHz was necessary to recover the brainstem frequency following responses in addition to the relatively slower cortical pitch components. Because the primary objective of this study was to characterize the cortical pitch components, the EEG acquisition electrode montage was limited to 9 electrode locations: Fpz, AFz, Fz, F3, F4, Cz, T7, T8, M1, M2 (Appendix C: figure, electrode montage). The AFz electrode served as the common ground and the common average of all connected unipolar electrode inputs served as default reference for the REFA8-32 amplifier. An additional bipolar channel with one electrode placed lateral to the outer canthi of the left eye and another electrode placed above the left eye was used to monitor artifacts introduced by ocular activity. Inter-electrode impedances were maintained below 10 kΩ. For each stimulus, EEGs were acquired in blocks of 1000 sweeps. The experimental protocol took about 2 hours to complete.
2.4. Extraction of the cortical pitch response (CPR)
CPR responses were extracted off-line from the EEG files. To extract the cortical pitch response components, EEG files were first downsampled from 5000 Hz to 2048 Hz. They were then digitally band-pass filtered (3–25 Hz) to enhance the transient components and minimize the sustained component. Sweeps containing electrical activity exceeding ± 50 μV were rejected automatically. Subsequently, averaging was performed on all 8 unipolar electrode locations using the common reference to allow comparison of CPR components at the right frontal (F4), left frontal (F3), right temporal (T8), and left temporal (T7) electrode sites to evaluate laterality effects. The re-referenced electrode site, Fz-linked T7/T8, was used to characterize the transient pitch response components. This electrode configuration was exploited to improve the signal-to-noise ratio of the CPR components by differentially amplifying (i) the non-inverted components recorded at Fz and (ii) the inverted components recorded at the temporal electrode sites (T7 and T8). This identical electrode configuration will also allow us to compare these CPR responses with brainstem responses in subsequent experiments. For both averaging procedures, the analysis epoch was 1200 ms including the 100 ms pre-stimulus baseline.
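The rejection, averaging, and re-referencing steps can be illustrated with a short sketch. It is not the acquisition software's implementation: the function names are hypothetical, rejection is applied per channel, "linked T7/T8" is taken to mean the average of the two temporal channels, and the downsampling and filtering steps are omitted.

```python
import numpy as np

def average_clean_sweeps(sweeps_uv, reject_uv=50.0):
    """Average the sweeps whose absolute amplitude never exceeds reject_uv.

    sweeps_uv: array of shape (n_sweeps, n_samples), in microvolts.
    Returns the averaged waveform and the number of sweeps retained.
    """
    sweeps_uv = np.asarray(sweeps_uv, dtype=float)
    keep = np.max(np.abs(sweeps_uv), axis=1) <= reject_uv
    return sweeps_uv[keep].mean(axis=0), int(keep.sum())

def fz_minus_linked_temporal(fz, t7, t8):
    """Re-reference Fz to the mean of T7 and T8 (assumed meaning of 'linked').

    Components that are positive at Fz and inverted at the temporal sites
    add constructively in this derivation.
    """
    return np.asarray(fz) - 0.5 * (np.asarray(t7) + np.asarray(t8))
```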
2.5. Analysis of CPR
The CPR is characterized by obligatory components (P1/N1) corresponding to the onset of energy in the precursor noise segment of the stimulus followed by several transient response components (Pa, Na, Pb, Nb, Pc, Nc) occurring after the onset of the pitch-eliciting segment of the stimulus. Because our primary objective in this study is to characterize what aspects of the dynamic pitch contours are being indexed by the components of the CPR (e.g., pitch onset, pitch acceleration, stimulus duration), only the latency and magnitude of the CPR components were evaluated. Both peak latencies (time interval between pitch-eliciting stimulus onset and a response peak of interest) of response components Pa, Na, Pb, Nb, and Pc, and interpeak latencies (time interval between response peaks) Pa-Na, Na-Pb, Pb-Nb, Nb-Pc, and Pa-Nc were measured to enable us to identify the components associated with pitch onset, pitch acceleration, stimulus duration, and stimulus offset. Peak-to-peak amplitude of Pa-Na, Na-Pb, Pb-Nb, and Nb-Pc was measured to determine if variations in amplitude were indexing specific aspects of the pitch contour (e.g., pitch acceleration). In addition, peak-to-peak amplitude of Na-Pb and Pb-Nb was measured separately at the frontal (F3/F4) and temporal (T7/T8) electrode sites to evaluate laterality effects. To enhance visualization of the laterality effects along a spectrotemporal dimension, joint time frequency analysis (using a continuous wavelet transform) was performed on the grand average waveforms derived from the frontal and temporal electrodes. Since the focus of this paper is on the characterization of the CPR components, the obligatory onset responses to the noise precursor were not analyzed.
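The three measures defined above (peak latency, interpeak latency, peak-to-peak amplitude) reduce to simple operations on an averaged waveform. The following sketch uses hypothetical function names and search windows, and a synthetic waveform in place of real data; the text does not describe how peaks were actually picked.

```python
import numpy as np

def pick_peak(wave_uv, times_ms, window_ms, polarity):
    """Return (latency_ms, amplitude_uv) of the extremum inside a window.

    polarity > 0 selects the maximum (P components);
    polarity < 0 selects the minimum (N components).
    times_ms is measured from the onset of the pitch-eliciting segment.
    """
    lo, hi = window_ms
    idx = np.flatnonzero((times_ms >= lo) & (times_ms <= hi))
    segment = wave_uv[idx]
    i = idx[np.argmax(segment) if polarity > 0 else np.argmin(segment)]
    return float(times_ms[i]), float(wave_uv[i])

def interpeak(first, second):
    """Interpeak latency (ms) and peak-to-peak amplitude (uV) of two peaks."""
    return second[0] - first[0], abs(second[1] - first[1])

# Synthetic waveform with one negative and one positive deflection.
times_ms = np.arange(400, dtype=float)
wave_uv = np.zeros(400)
wave_uv[130] = -2.0   # stands in for Na
wave_uv[205] = 1.5    # stands in for Pb

na = pick_peak(wave_uv, times_ms, (100, 160), polarity=-1)
pb = pick_peak(wave_uv, times_ms, (180, 230), polarity=+1)
na_pb_latency_ms, na_pb_amplitude_uv = interpeak(na, pb)
```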
2.6. Statistical analysis
Separate two-way, mixed model ANOVAs were conducted on peak latency, interpeak latency, and peak-to-peak amplitude derived from the Fz electrode site. Subjects were treated as a random factor, stimulus and component as within-subject factors. In the analysis of peak latency, there were five components (Pa, Na, Pb, Nb, Pc); in the analysis of interpeak latency and peak-to-peak amplitude, four components (Pa-Na, Na-Pb, Pb-Nb, Nb-Pc). By analyzing these components, we were able to assess the effects of pitch acceleration on latency and amplitude across stimuli (T2_150, T2_200, T2_250). Separate three-way mixed model ANOVAs were conducted on peak-to-peak amplitude derived from the T7/T8 and F3/F4 electrode sites. Subjects were treated as a random factor; stimulus (T2_150, T2_200, T2_250), component (Na-Pb, Pb-Nb), and side (left, right) as within-subject factors. By focusing on these two components, we were able to determine whether laterality effects at the frontal and temporal sites vary as a function of stimulus.
3. Results
3.1. Response morphology of CPR components
Grand averaged cortical evoked response components to the three stimuli are shown in Fig. 3. The top panel shows both the superimposed P1/N1 onset complex for the three stimuli (black) and the CPR component to T2_250 (green). As expected (Krishnan, Bidelman, et al., 2012), the obligatory P1/N1 complex, reflecting neural activity synchronized to the onset of the noise precursor (black), is very similar across the three stimulus conditions. The CPR components, elicited by the pitch-eliciting stimulus segment, are characterized by a series of successive biphasic components (in ms): Pa, 70–85; Na, 125–140; Pb, 700–715; Nb, 765–775; Pc, 800–815; and Nc, 840–855, for the T2_250 stimulus only. Na, Pb, and Nb were the most robust response components. The bottom panel in Fig. 3 shows only the CPR waveforms elicited by the three stimuli. It is clear from these waveform traces that the peak latencies of Pa and Na do not change appreciably across stimulus conditions (alignment of latency indicated by solid vertical lines). In contrast, response components Pb, Nb, Pc, and Nc all show a systematic increase in peak latency across stimulus conditions (indicated by dashed lines). Consistent with these observations, the interpeak latencies (Na-Pb, Pb-Nb, Nb-Pc) increase across stimulus conditions. Response amplitude for Na, Pb, and Nb also appears to increase across stimuli, with T2_150 showing the smallest amplitude. These systematic changes in response latency and amplitude are likely produced by the decrease in rate of pitch acceleration and the increase in stimulus duration across the three stimulus conditions.
3.2. Latency of CPR components
The stimulus effects on mean peak latency of the CPR components are displayed in Fig. 4 (Appendix D1: table, M, SE). Qualitatively, we observe large increases in peak latency across stimuli for components Pb, Nb, and Pc, with smaller increases in Na. An omnibus two-way ANOVA yielded significant (p < 0.001) main effects (stimulus, F2,18 = 1029.61; component, F4,45 = 3116.71), and an interaction between stimulus and component (F8,72 = 213.42). For Pb, Nb, and Pc separately, post hoc multiple comparisons of stimuli (αBonferroni = 0.05) revealed that peak latency of T2_250 was longer than T2_200 which, in turn, was longer than T2_150. For Na, T2_150 was shorter than either of the other two stimuli. For Pa, differences in peak latency between stimuli failed to reach significance. These results suggest that decreasing acceleration rate and increasing stimulus duration increase the peak latency of components Pb, Nb, and Pc, with a smaller change for Na, and no change for Pa. The invariant nature of Pa latency across stimuli suggests that it may be indexing the onset of the pitch-eliciting stimulus. The latency of Na is consistent with a pitch-onset response component. In contrast, the relatively large changes in latency for Pb, Nb, and Pc across stimuli are likely to represent changes in the more dynamic portions of the pitch contour and stimulus duration.
Fig. 5 displays mean interpeak latency effects of stimuli on pitch-related components: Pa-Na, Na-Pb, Pb-Nb, Nb-Pc (Appendix D2: table, M, SE). Na-Pb and Pb-Nb are strikingly similar in showing a systematic increase in their response patterns across stimuli. An omnibus two-way ANOVA on the pitch-related components yielded significant (p < 0.001) main effects (stimulus, F2,72 = 131.13; component, F3,36 = 40.09), and an interaction between stimulus and component (F6,72 = 31.33). For Na-Pb and Pb-Nb separately, post hoc paired comparisons of stimuli indicated that interpeak latency of T2_250 was longer than T2_200 which, in turn, was longer than T2_150. For Pa-Na, T2_150 was shorter than T2_200 and T2_250. For Nb-Pc, the pattern was somewhat irregular; T2_200 was shorter than T2_250 and T2_150. The large and systematic changes across stimuli for both Na-Pb and Pb-Nb suggest that they may be indexing the more rapidly-changing portions of the pitch contours.
To determine which response temporal interval best reflected stimulus duration, we conducted a two-way mixed model ANOVA on the interpeak latency of two response components (Pa-Pc, Pa-Nc) and three stimuli (T2_150, T2_200, T2_250). Results yielded significant (p < 0.001) main effects (stimulus, F2,18 = 637.33; component, F1,18 = 360.88), and an interaction between stimulus and component (F2,18 = 25.44). Pa-Nc exhibited longer interpeak latency than Pa-Pc, and corresponded best with stimulus duration irrespective of stimulus (Fig. 5; Appendix E: figure, Pa-Nc vs. Pa-Pc). The interaction effect was due to a larger relative difference between Pa-Nc and Pa-Pc for T2_150 (+1.36) than for either T2_200 (+1.13) or T2_250 (+1.16).
To confirm that Pc and Nc are indeed stimulus offset components (and not related to pitch), we recorded responses from five of the participants to a “noise-to-noise” stimulus. It consisted of the same noise precursor as T2_250, but the pitch segment was replaced by a 250 ms noise segment to eliminate pitch-evoked components. A comparison of the responses to the noise-to-pitch and noise-to-noise stimuli is displayed in Fig. 6. It clearly shows that responses to the noise-to-noise stimulus contained only Pc and Nc components, and moreover, that they overlapped temporally with the Pc and Nc components elicited by the noise-to-pitch stimulus. Thus it is clear that both Pc and Nc components reflect stimulus offset, and the Pa-Nc interval better approximates stimulus duration. The mean Nc latency observed herein is also consistent with offset latencies previously reported for the cortical pitch response (Gutschalk, Patterson, Scherg, Uppenkamp, & Rupp, 2007; Seither-Preisler, Krumbholz, Patterson, Seither, & Lutkenhoner, 2004; Seither-Preisler et al., 2006).
3.3. Amplitude of CPR components
The stimulus effects on peak-to-peak amplitude of the CPR components are displayed in Fig. 7 (Appendix D3: table; M, SE). Again, Na-Pb and Pb-Nb are strikingly similar in their response patterns across stimuli. An omnibus two-way ANOVA on the pitch-related components yielded significant (p < 0.001) main effects (stimulus, F2,18 = 34.87; component, F3,27 = 10.76), and an interaction between stimulus and component (F6,54 = 8.75). By Na-Pb and Pb-Nb separately, post hoc paired comparisons of stimuli indicated that peak-to-peak amplitude of T2_250 was greater than T2_200 which, in turn, was greater than T2_150. In the case of Nb-Pc, differences in peak-to-peak amplitude between stimuli failed to reach significance. Within Pa-Na, peak-to-peak amplitude was marginally higher in T2_250 than T2_150 (p = 0.042). The latency and amplitude changes specific to Na-Pb and Pb-Nb reinforce our view that these components reflect pitch-relevant neural activity that may be closely associated with the dynamic portions of the pitch contour.
To support this view, we measured the extent to which CPR components (Pa-Na, Na-Pb, Pb-Nb, Nb-Pc) were associated with pitch acceleration. Pearson correlations (r) were computed between each response measure (interpeak latency, peak-to-peak amplitude) and the average acceleration (Hz/s) from minimum to maximum across stimuli (T2_150, 559; T2_200, 419; T2_250, 336). Na-Pb and Pb-Nb showed a significantly stronger negative association with pitch acceleration relative to the other two components (Table 2). The negative correlation coefficient means that both interpeak latency and peak-to-peak amplitude increase with decreasing acceleration. Na-Pb and Pb-Nb similarly showed a stronger negative association compared to the other two components even when peak-to-peak amplitude was correlated with instantaneous acceleration (Fig. 1).
Table 2.
Component | Interpeak Latency | Peak to Peak Amplitude^a | Peak to Peak Amplitude^b |
---|---|---|---|
Pa-Na | −.407 (.0255) | −.696 (<.0001) | −.411 (.0240) |
Na-Pb | −.778 (<.0001) | −.954 (<.0001) | −.780 (<.0001) |
Pb-Nb | −.777 (<.0001) | −.932 (<.0001) | −.776 (<.0001) |
Nb-Pc | .053 (.7797) | .056 (.5621) | .056 (.7671) |
Note: Values in parentheses represent levels of significance. ^a Average acceleration from minimum to maximum; ^b Instantaneous acceleration at 80 ms after pitch onset.
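The form of the correlation analysis summarized in Table 2 can be sketched as follows. The acceleration values are those given in the text; the per-participant amplitudes are invented purely for illustration and are not data from this study, and the function name is hypothetical.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length vectors."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    return float(np.dot(xc, yc) / np.sqrt(np.dot(xc, xc) * np.dot(yc, yc)))

# Average acceleration (Hz/s) from minimum to maximum, per stimulus (from the text).
ACCELERATION = {"T2_150": 559.0, "T2_200": 419.0, "T2_250": 336.0}

# Hypothetical peak-to-peak amplitudes (uV) for three participants per stimulus.
synthetic_amplitude = {
    "T2_150": [1.0, 1.2, 0.9],
    "T2_200": [1.8, 2.0, 1.7],
    "T2_250": [2.4, 2.6, 2.2],
}

# One (acceleration, amplitude) pair per participant and stimulus.
x = [ACCELERATION[s] for s, vals in synthetic_amplitude.items() for _ in vals]
y = [v for vals in synthetic_amplitude.values() for v in vals]
r = pearson_r(x, y)
```

A strongly negative r in this layout corresponds to amplitude increasing as acceleration decreases, which is the direction of association reported for Na-Pb and Pb-Nb.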
3.4. Comparison of CPR components at frontal (F3/F4) and temporal (T7/T8) electrode sites to examine hemispheric laterality
Fig. 8 displays the grand average waveforms of F3/F4 and T7/T8 (two left columns) and their corresponding spectra (two right columns) for each of the three stimuli. The waveform data reveal that pitch-related components at frontal F3 (orange) and F4 (green) electrode sites essentially overlap for all three stimuli. In contrast, these same components at the right temporal electrode (T8: red) appear to be larger compared to the left temporal electrode (T7: blue), particularly for T2_250. Similar trends are evident in the spectrotemporal representations of the pitch-related components (enclosed within vertical dashed lines, 500–900 ms). Note the large rightward lateralization for T2_250.
Hemispheric laterality effects on peak-to-peak amplitude of the pitch-related components (Na-Pb, Pb-Nb) vary depending on electrode site (Fig. 9). These effects were evaluated by performing separate three-way mixed model ANOVAs (component × hemisphere × stimulus) on peak-to-peak amplitude at the frontal (F3/F4) and temporal (T7/T8) electrode sites. For F3/F4, results revealed significant main effects of component (F(1, 36) = 11.52, p = 0.0017) and stimulus (F(2, 36) = 44.03, p < 0.0001). Other effects failed to reach significance, including the interaction between stimulus and hemisphere. For T7/T8, on the other hand, results showed significant main effects (component: F(1, 34) = 8.50, p = 0.0062; stimulus: F(2, 34) = 43.55, p < 0.0001; hemisphere: F(2, 36) = 14.78, p = 0.0005) but also, and more importantly, an interaction between stimulus and hemisphere (F(2, 34) = 9.16, p = 0.0007). By stimulus, post hoc multiple comparisons (Bonferroni-adjusted α = 0.05) showed that a RH advantage (T8 > T7) was evident for T2_250 only. By hemisphere, all pairwise comparisons were significant (T2_250 > T2_200, T2_250 > T2_150, T2_200 > T2_150) except for T2_200 vs. T2_250 in the LH. These combined stimulus- and hemisphere-dependent simple main effects suggest that the hemispheres are differentially sensitive to graded approximations to a prototypical pitch representation of Mandarin Tone 2. The main effect of component was the same for both F3/F4 and T7/T8: Na-Pb exhibited greater peak-to-peak amplitude than Pb-Nb across the three stimuli.
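The Bonferroni adjustment used for the post hoc comparisons can be sketched as follows. The raw p-values, the choice of a three-comparison family (one T8 vs. T7 contrast per stimulus), and the function name `bonferroni` are assumptions made for illustration; the text does not report these values or the exact family size.

```python
def bonferroni(p_values, alpha=0.05):
    """Bonferroni-adjust a family of p-values.

    Returns {label: (adjusted_p, is_significant)}, where each raw p-value
    is multiplied by the number of comparisons and capped at 1.0.
    """
    m = len(p_values)
    results = {}
    for label, p in p_values.items():
        p_adj = min(1.0, p * m)
        results[label] = (p_adj, p_adj < alpha)
    return results

# HYPOTHETICAL raw p-values for the hemisphere contrast (T8 vs. T7) at each
# stimulus. These are invented for illustration, not values from the study.
raw_p = {"T2_150": 0.40, "T2_200": 0.09, "T2_250": 0.001}

adjusted = bonferroni(raw_p)
for stimulus, (p_adj, significant) in adjusted.items():
    print(f"{stimulus}: adjusted p = {p_adj:.3f}, significant = {significant}")
```

Under these made-up inputs only the T2_250 contrast survives correction, mirroring the pattern reported above. Multiplying each p-value by the family size is equivalent to testing each raw p-value against α divided by the number of comparisons.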
4. Discussion
Our objective is to identify cortical, pitch-specific responses that are elicited by linguistically-relevant, dynamic curvilinear pitch contours exemplary of those that occur in natural speech. Herein we examine the sensitivity of the multiple transient components of the CPR and relate them to specific temporal attributes associated with three, within-category variants of the Mandarin Tone 2. Our major findings show that the pitch onset component (Na) was invariant to changes in acceleration. In contrast, two components (Na-Pb, Pb-Nb) showed a strong correlation with pitch acceleration and a systematic increase in the interpeak latency and decrease in amplitude with increase in pitch acceleration that followed the time course of pitch change across the three stimuli. Pc-Nc marks unambiguously the stimulus offset. These data lead us to propose that in the early stages of cortical sensory processing of pitch, a series of neural markers flag different temporal attributes of a dynamic pitch contour: onset of temporal regularity (Na); changes in temporal regularity between onset and offset (Na-Pb, Pb-Nb); and offset of temporal regularity (Pc-Nc). A rightward asymmetry at a temporal site is observed for the tonal variant (T2_250) that most closely approximates a prototypical representation of Tone 2. These findings highlight the emergence of early hemispheric preferences and their functional roles as related to sensory and cognitive properties of the stimulus.
4.1. Pitch-relevant neural components of the CPR
4.1.1. Pitch onset (Na)
In this study the invariance of Na latency and amplitude across high-iteration-step IRN stimuli suggests that the Na component may be specific to pitch onset and its salience, and moreover, that it may not be sensitive to other varying attributes of the stimuli, viz., duration or pitch acceleration. Evidence adduced from earlier studies points to the lateral Heschl’s gyrus as a source for the pitch onset response component. This evidence includes dipole source analysis of the MEG pitch onset response (Chait et al., 2006; Gutschalk et al., 2004; Krumbholz et al., 2003; Ritter et al., 2005; Seither-Preisler et al., 2004) and localization of direct depth electrode recordings along the lateral to medial stretch of Heschl’s gyrus in humans (Schonwiesner & Zatorre, 2008). Previous studies using IRN stimuli show that the latency and amplitude of the pitch onset response vary systematically with pitch salience (EEG: Krishnan, Bidelman, et al., 2012; MEG: Krumbholz et al., 2003; Seither-Preisler et al., 2006; Soeta, Nakagawa, & Tonoike, 2005). Functional neuroimaging studies in humans (Griffiths et al., 1998; Griffiths, Uppenkamp, Johnsrude, Josephs, & Patterson, 2001) and intracranial electrode recordings in both primates (Bendor & Wang, 2005) and humans (Schonwiesner & Zatorre, 2008) demonstrate that activity of the primary auditory cortex increases as a function of the number of IRN iteration steps. These findings collectively suggest that the increase in pitch salience with increasing temporal regularity of the IRN stimulus is correlated with an increase in pitch-relevant neural activity and/or neural synchrony in cortical auditory neurons in the lateral Heschl’s gyrus. This inference about the CPR source, however, must await empirical validation by neuroimaging techniques with superior spatial resolution (e.g., MEG, fMRI).
4.1.2. Dynamic portions of pitch contour (Na-Pb, Pb-Nb)
In contrast to the onset (Na) and offset (Pc-Nc) components, Na-Pb and Pb-Nb show a systematic increase in the interpeak latency and decrease in amplitude with increase in pitch acceleration that parallels the time course of pitch change across the three stimuli. This finding suggests that these components may be indexing pitch-relevant neural activity associated with the more rapidly-changing portions of the pitch contour. The observation of a strong correlation with pitch acceleration for only these two components further reinforces our view that these components are indexing the dynamic portions of the pitch contour. While our current experimental design does not permit us to determine whether components Na-Pb and Pb-Nb are indexing different portions of the dynamic segment of the pitch contour, we hypothesize that Na-Pb (relatively longer latency and larger amplitude) indexes the increasing pitch acceleration between the turning point and the point of maximum acceleration in the stimulus; whereas Pb-Nb (shorter latency and smaller amplitude) indexes the shorter pitch deceleration segment between maximum acceleration and stimulus offset.
To reinforce our view that Na-Pb and Pb-Nb reflect neural activity associated with pitch acceleration, we need to rule out the possibility that changes in stimulus duration may have contributed to changes in latency and amplitude of the pitch-relevant information. Indeed, it is well known that pitch information is integrated over duration. Notwithstanding, it is unlikely that changes in latency and amplitude of Na-Pb and Pb-Nb can be attributed to differences in temporal integration for pitch. Compared to psychoacoustic data, our stimulus durations fall well outside the range of temporal integration effects on pitch (≈80 ms) and its salience for stimuli with resolved harmonics (Plack et al., 1995; Plack et al., 2011; Plack & White, 2000; White & Plack, 1998, 2003). Data from electrophysiologic studies indicate that the amplitude of the N1(m) response increases as a function of stimulus duration until reaching an asymptotic level with a temporal integration window between 20 and 50 ms (Alain, Woods, & Covarrubias, 1997; Gage & Roberts, 2000; Joutsiniemi, Hari, & Vilkman, 1989; Krumbholz et al., 2003; Onishi & Davis, 1968; Ostroff, McDonald, Schneider, & Alain, 2003; Ross et al., 2009). Using the cortical MEG component N1(m), a similar temporal integration window (30–50 ms) is found for the periodicity of speech sounds (Yrttiaho, Tiitinen, Alku, Miettinen, & May, 2010). Taken together, it is unlikely that changes in latency and amplitude of Na-Pb and Pb-Nb are confounded by temporal integration for pitch.
While it is generally agreed that lateral Heschl’s gyrus is the putative source for the pitch onset response (Na), the generator sources for the remaining transient, pitch-relevant components (Pb, Nb) are unknown, and cannot be determined from this study. From a hierarchical processing perspective, it is nevertheless plausible that these later components (Na-Pb, Pb-Nb) may reflect neural activity from spatially distinct generators that represent later stages of processing, relative to Na, along a pitch processing hierarchy. Using IRN stimuli and fMRI to evaluate pitch changes over the time course of a sound, Patterson et al. (2002) report that only when pitch changes are varied over a time course to produce a musical melody does pitch processing move beyond auditory cortex, with asymmetries emerging that favor the RH. In their experiment, the time course spans a sequence of discrete changes in flat pitch that constitute a musical unit. In the case of tonal languages, the time course spans a sequence of continuous changes in dynamic pitch within a single syllable that represent a linguistic unit.
4.1.3. Stimulus offset (Pc-Nc)
Our findings clearly show that the biphasic component Pc-Nc unambiguously marks the stimulus offset (Gutschalk et al., 2007; Hari et al., 1987; Seither-Preisler et al., 2004; Seither-Preisler et al., 2006). However, it has also been suggested that rather than a simple offset of sound energy, the offset response may in certain conditions represent cessation of more specific sound features (e.g., pitch; Seither-Preisler et al., 2006) or persistence of the response to stimulus regularity (Lutkenhoner, Seither-Preisler, Krumbholz, & Patterson, 2011). In this study the absence of significant changes in either amplitude or latency of the offset response across stimuli does not support the view that offset components may also carry pitch information. To the contrary, we observe essentially identical offset components both in latency and amplitude for the noise-to-pitch vs. noise-to-noise stimulus condition (Fig. 6). This finding strengthens our view that the offset response observed in this study is specific to stimulus (sound) offset and distinct from components that carry pitch-relevant information.
Viewing our CPR components as temporal regularity monitors (Lutkenhoner et al., 2011), we propose that the Na component marks the onset of stimulus regularity; the Pc-Nc component marks cessation of stimulus regularity (not pitch). The intervening transient components (Na-Pb, Pb-Nb) reflect additional temporal monitors in the time course of this stimulus regularity processing that may be indexing attributes of pitch change presented in a dynamic pitch contour. Thus, we propose a series of temporal regularity monitors that mark different temporal attributes of a dynamic pitch contour: 1) onset of temporal regularity; 2) changes in temporal regularity between onset and offset; and 3) offset of temporal regularity. It has already been proposed that temporal regularity monitors at onset and offset of stimulus regularity are embedded in the early stages of cortical processing (Lutkenhoner et al., 2011). By extension, we similarly propose that the temporal regularity monitor(s) that indexes attributes of dynamic pitch between the onset and offset are also embedded in the early stages of cortical sensory processing.
4.2. Hemispheric preferences for pitch processing
We first observe that a rightward asymmetry is confined to the temporal electrodes (T8 > T7) only. This asymmetry in the temporal lobe stands in stark contrast to the absence of hemispheric asymmetry in the frontal electrodes (F3/F4). Important to note is that our experimental protocol is free of task demands; stimuli are reduced to the pitch parameter only; and moreover, this differential pattern of hemispheric asymmetry is based on peak-to-peak amplitude responses extracted from the two putative, pitch-specific components (Na-Pb, Pb-Nb). Thus, it seems reasonable to infer that the temporal asymmetry reflects selective recruitment of pitch-specific mechanisms in right auditory cortex at an early stage of cortical processing. This finding converges with an extensive literature that attests to the greater role of the RH in the processing of pitch (Friederici & Alter, 2004; Meyer, 2008; Poeppel, Idsardi, & van Wassenhove, 2008; Wildgruber, Ackermann, Kreifelts, & Ethofer, 2006; Zatorre & Gandour, 2008, for reviews).
Our stimuli also crucially exhibit dynamic, curvilinear F0 trajectories that are representative of a Mandarin lexical tone. Steady-state or flat F0 patterns are of no functional relevance in the speech of any of the world’s languages, tonal or otherwise. Interestingly, MEG recordings fail to observe any hemispheric differences with regard to either latency or amplitude of the pitch-relevant cortical components elicited by stimuli with flat pitch (Gutschalk et al., 2004; Hari et al., 1987; Krumbholz et al., 2003; Lutkenhoner & Steinstrater, 1998; Seither-Preisler et al., 2006). This disparity in hemispheric asymmetry between dynamic and flat pitch patterns further emphasizes the role of temporal regularity monitors between onset and offset.
Even though all three stimuli represent variants of a single tonal category, we observe differences in peak-to-peak amplitude of Na-Pb and Pb-Nb related to an interaction between hemisphere and stimulus. By hemisphere, we observe a gradual, continuous effect across stimuli in the RH (T8), as reflected in peak-to-peak amplitude (T2_250 > T2_200 > T2_150). In the LH (T7), we do not find a continuous effect across all stimuli. As in the RH, T2_250 and T2_200 are greater in amplitude than T2_150, but T2_250 does not differ from T2_200. By stimulus, only T2_250 elicits a preference for the RH (T8 > T7). The question now is whether simple acoustic features are sufficient to explain this pattern of hemispheric specialization. Differences in neural responses between the left and right auditory cortices have been conceptualized as related to differences in the speed with which dynamically changing spectral information is processed. One theoretical approach emphasizes different temporal integration windows (Poeppel, 2003; see also Section 4.1.2); the other, a relative trade-off in temporal and spectral resolution (Zatorre et al., 2002). Both models can account for the observed stimulus continuum in the RH (Fig. 9). The RH preferentially extracts information from long integration windows (150–250 ms; Poeppel, 2003). The longer the sampling window, the better the encoding of pitch information and, as a consequence, the frequency representation (Zatorre et al., 2002). Only Zatorre et al.’s model, however, can explain why only T2_250 elicits a RH preference, whereas T2_200 and T2_150 do not. Because all three stimuli fall into Poeppel’s long integration window, it is unclear why the RH advantage is observed for T2_250 only. Neither model can adequately account for the absence of a stimulus effect between T2_200 and T2_250 in the LH.
It is incontrovertible that neural specializations are based on low-level features of the stimulus. More recent data from the extant literature, however, suggest that they can also be influenced by their functional status within one’s domain of pitch expertise (Gandour & Krishnan, 2014; Krishnan, Gandour, et al., 2012; Zatorre & Baum, 2012; Zatorre & Gandour, 2008, for reviews). Indeed, a complete account of pitch processing must allow for sensory and cognitive contributions that interact within the same time interval, as well as at different time intervals at different cortical levels of the brain. In this study, the time interval occurs at an early, preattentive stage of cortical sensory processing; all three stimuli fall within the bounds of a single tonal category. The stimulus-dependent asymmetry at the right temporal site suggests that it is sensitive to graded approximations to a prototypical pitch representation of Mandarin Tone 2. The stimulus with the most gradual change in pitch acceleration, T2_250, appears to cross the threshold that leads to a RH advantage at this temporal site (T8 > T7). Yet within the LH, T2_250 is indistinguishable from T2_200. By assuming that the LH is preferentially engaged for mediating the categorical status of pitch, this finding suggests that T2_200 as well as T2_250 are better candidates as representations of Mandarin Tone 2. Thus, within this early cortical time window we begin to see the emergence of hemispheric asymmetries and how the complementary roles of the two hemispheres reflect influences from sensory and cognitive properties of the stimulus.
Our findings are consistent with earlier ERP data that reveal the emergence of experience-dependent asymmetries at early cortical levels of processing. For example, in response to pure tones, trained musicians exhibit larger amplitudes of the early cortical components (N19m–P30m) within the right primary auditory cortex; non-musicians show no hemispheric asymmetry (Schneider et al., 2002). In response to musical stimuli, a right temporal advantage is seen in the cortical N1 component related to pitch transition (change-N1, ~100 ms latency) in trained musicians, but no hemispheric asymmetry is observed for the onset component (Itoh, Okumiya-Kanke, Nakayama, Kwee, & Nakada, 2012). Both aforementioned studies point to experience-dependent enhancement of pitch processing in the right auditory cortex related to the music domain. Both employ discrete, flat pitch stimuli that are of musical relevance. This study uses dynamic, curvilinear pitch stimuli that are relevant to the language domain. What is especially interesting is the discovery of ERP components related to changes in pitch regardless of domain (cf. Itoh et al., 2012). We acknowledge that a direct comparison of data from English listeners for the same three tonal stimuli must be completed in order to confirm an experience-dependent effect on our pitch-related components (in progress).
4.3. Conclusions
Our discovery of ERP components sensitive to specific attributes of dynamic, curvilinear pitch contours that are ecologically representative of natural speech opens up a new avenue for research into stages of pitch processing in the human brain. The experimental paradigm used in this study may also provide a new window to evaluate the online interplay between feedforward and feedback components in the processing of pitch-relevant information at the level of the brainstem and the auditory cortex (cf. Foxe & Schroeder, 2005), and how experience shapes pitch processing at cortical and subcortical levels. The selection of dynamic pitch contours representative of lexical tone will enable us to evaluate the influence of language experience on the latency and amplitude of these cortical pitch response components in subsequent experiments. Complementary studies using MEG will be crucial to determine the anatomical sources of these components in an effort to shed more light on specific cortical generators contributing to the hierarchical stages of pitch processing, and how experience may shape these processes.
Supplementary Material
- Cortical pitch response components index specific features of dynamic pitch
- Pa: sound onset; Na: pitch onset; Na-Pb/Pb-Nb: pitch change; Pc-Nc: sound offset
- Hemispheric roles emerge early depending upon sensory/cognitive stimulus properties
Acknowledgments
Research supported by NIH 5R01DC008549-06 (A.K.). Thanks to Jingyi Zhu and Longjie Cheng for their assistance with statistical analysis (Department of Statistics); Jilian Wendel and Chandan Hunsur Suresh for their help with data acquisition and graphics, respectively.
Appendices A–E
Supplementary material associated with this article can be found in the online version at doi:
Footnotes
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
Contributor Information
Ananthanarayan Krishnan, Email: rkrish@purdue.edu.
Jackson T. Gandour, Email: gandour@purdue.edu.
Saradha Ananthakrishnan, Email: sanantha@purdue.edu.
Venkatakrishnan Vijayaraghavan, Email: vvijayar@purdue.edu.
References
- Alain C, Woods DL, Covarrubias D. Activation of duration-sensitive auditory cortical fields in humans. Electroencephalography and Clinical Neurophysiology. 1997;104(6):531–539. doi: 10.1016/s0168-5597(97)00057-9.
- Alku P, Sivonen P, Palomaki K, Tiitinen H. The periodic structure of vowel sounds is reflected in human electromagnetic brain responses. Neuroscience Letters. 2001;298(1):25–28. doi: 10.1016/s0304-3940(00)01708-0.
- Barker D, Plack CJ, Hall DA. Reexamining the evidence for a pitch-sensitive region: a human fMRI study using iterated ripple noise. Cerebral Cortex. 2012;22(4):745–753. doi: 10.1093/cercor/bhr065.
- Bendor D, Wang X. The neuronal representation of pitch in primate auditory cortex. Nature. 2005;436(7054):1161–1165. doi: 10.1038/nature03867.
- Besson M, Chobert J, Marie C. Language and music in the musician brain. Language and Linguistics Compass. 2011;5(9):617–634. doi: 10.1111/j.1749-818x.2011.00302.
- Cariani PA, Delgutte B. Neural correlates of the pitch of complex tones. I. Pitch and pitch salience. Journal of Neurophysiology. 1996a;76(3):1698–1716. doi: 10.1152/jn.1996.76.3.1698.
- Cariani PA, Delgutte B. Neural correlates of the pitch of complex tones. II. Pitch shift, pitch ambiguity, phase invariance, pitch circularity, rate pitch, and the dominance region for pitch. Journal of Neurophysiology. 1996b;76(3):1717–1734. doi: 10.1152/jn.1996.76.3.1717.
- Cedolin L, Delgutte B. Pitch of complex tones: rate-place and interspike interval representations in the auditory nerve. Journal of Neurophysiology. 2005;94(1):347–362. doi: 10.1152/jn.01114.2004.
- Chait M, Poeppel D, Simon JZ. Neural response correlates of detection of monaurally and binaurally created pitches in humans. Cerebral Cortex. 2006;16(6):835–848. doi: 10.1093/cercor/bhj027.
- Chandrasekaran B, Gandour JT, Krishnan A. Neuroplasticity in the processing of pitch dimensions: A multidimensional scaling analysis of the mismatch negativity. Restorative Neurology and Neuroscience. 2007;25(3–4):195–210.
- Chandrasekaran B, Krishnan A, Gandour JT. Mismatch negativity to pitch contours is influenced by language experience. Brain Research. 2007;1128(1):148–156. doi: 10.1016/j.brainres.2006.10.064.
- Denham S. Pitch detection of dynamic iterated rippled noise by humans and a modified auditory model. Biosystems. 2005;79(1–3):199–206. doi: 10.1016/j.biosystems.2004.09.008.
- Foxe JJ, Schroeder CE. The case for feedforward multisensory convergence during early cortical processing. Neuroreport. 2005;16(5):419–423. doi: 10.1097/00001756-200504040-00001.
- Friederici AD, Alter K. Lateralization of auditory language functions: a dynamic dual pathway model. Brain and Language. 2004;89(2):267–276. doi: 10.1016/S0093-934X(03)00351-1.
- Gage NM, Roberts TP. Temporal integration: reflections in the M100 of the auditory evoked field. Neuroreport. 2000;11(12):2723–2726. doi: 10.1097/00001756-200008210-00023.
- Gandour JT, Krishnan A. Neural bases of lexical tone. In: Winskel H, Padakannaya P, editors. Handbook of South and Southeast Asian psycholinguistics. Cambridge, UK: Cambridge University Press; 2014. pp. 339–349.
- Griffiths TD, Buchel C, Frackowiak RS, Patterson RD. Analysis of temporal structure in sound by the human brain. Nature Neuroscience. 1998;1(5):422–427. doi: 10.1038/1637.
- Griffiths TD, Kumar S, Sedley W, Nourski KV, Kawasaki H, Oya H, Howard MA. Direct recordings of pitch responses from human auditory cortex. Current Biology. 2010;20(12):1128–1132. doi: 10.1016/j.cub.2010.04.044.
- Griffiths TD, Uppenkamp S, Johnsrude I, Josephs O, Patterson RD. Encoding of the temporal regularity of sound in the human brainstem. Nature Neuroscience. 2001;4(6):633–637. doi: 10.1038/88459.
- Gu F, Zhang C, Hu A, Zhao G, Zhang X. Left hemisphere lateralization for lexical and acoustic pitch processing in Cantonese speakers as revealed by mismatch negativity. Neuroimage. 2013. doi: 10.1016/j.neuroimage.2013.02.080.
- Gutschalk A, Patterson RD, Rupp A, Uppenkamp S, Scherg M. Sustained magnetic fields reveal separate sites for sound level and temporal regularity in human auditory cortex. Neuroimage. 2002;15(1):207–216. doi: 10.1006/nimg.2001.0949.
- Gutschalk A, Patterson RD, Scherg M, Uppenkamp S, Rupp A. Temporal dynamics of pitch in human auditory cortex. Neuroimage. 2004;22(2):755–766. doi: 10.1016/j.neuroimage.2004.01.025.
- Gutschalk A, Patterson RD, Scherg M, Uppenkamp S, Rupp A. The effect of temporal context on the sustained pitch response in human auditory cortex. Cerebral Cortex. 2007;17(3):552–561. doi: 10.1093/cercor/bhj180.
- Hari R, Pelizzone M, Makela JP, Hallstrom J, Leinonen L, Lounasmaa OV. Neuromagnetic responses of the human auditory cortex to on- and offsets of noise bursts. Audiology. 1987;26(1):31–43. doi: 10.3109/00206098709078405.
- Itoh K, Okumiya-Kanke Y, Nakayama Y, Kwee IL, Nakada T. Effects of musical training on the early auditory cortical representation of pitch transitions as indexed by change-N1. European Journal of Neuroscience. 2012;36(11):3580–3592. doi: 10.1111/j.1460-9568.2012.08278.x.
- Johnsrude IS, Penhune VB, Zatorre RJ. Functional specificity in the right human auditory cortex for perceiving pitch direction. Brain. 2000;123(Pt 1):155–163. doi: 10.1093/brain/123.1.155.
- Joutsiniemi SL, Hari R, Vilkman V. Cerebral magnetic responses to noise bursts and pauses of different durations. Audiology. 1989;28(6):325–333. doi: 10.3109/00206098909081639.
- Koelsch S. Brain & Music. Chichester, UK: Wiley-Blackwell; 2012.
- Kratochvil P. Variable norms of tones in Beijing prosody. Cahiers de Linguistique Asie Orientale. 1985;14(2):153–174.
- Kraus N, Banai K. Auditory-processing malleability: Focus on language and music. Current Directions in Psychological Science. 2007;16(2):105–110.
- Krishnan A, Bidelman GM, Smalt CJ, Ananthakrishnan S, Gandour JT. Relationship between brainstem, cortical and behavioral measures relevant to pitch salience in humans. Neuropsychologia. 2012;50(12):2849–2859. doi: 10.1016/j.neuropsychologia.2012.08.013.
- Krishnan A, Gandour JT, Bidelman GM. Experience-dependent plasticity in pitch encoding: from brainstem to auditory cortex. Neuroreport. 2012;23(8):498–502. doi: 10.1097/WNR.0b013e328353764d.
- Krishnan A, Gandour JT, Smalt CJ, Bidelman GM. Language-dependent pitch encoding advantage in the brainstem is not limited to acceleration rates that occur in natural speech. Brain and Language. 2010;114(3):193–198. doi: 10.1016/j.bandl.2010.05.004.
- Krishnan A, Swaminathan J, Gandour JT. Experience-dependent enhancement of linguistic pitch representation in the brainstem is not specific to a speech context. Journal of Cognitive Neuroscience. 2009;21(6):1092–1105. doi: 10.1162/jocn.2009.21077.
- Krumbholz K, Patterson RD, Seither-Preisler A, Lammertmann C, Lutkenhoner B. Neuromagnetic evidence for a pitch processing center in Heschl’s gyrus. Cerebral Cortex. 2003;13(7):765–772. doi: 10.1093/cercor/13.7.765.
- Luo H, Ni JT, Li ZH, Li XO, Zhang DR, Zeng FG, Chen L. Opposite patterns of hemisphere dominance for early auditory processing of lexical tones and consonants. Proceedings of the National Academy of Sciences of the United States of America. 2006;103(51):19558–19563. doi: 10.1073/pnas.0607065104.
- Lutkenhoner B, Seither-Preisler A, Krumbholz K, Patterson RD. Auditory cortex tracks the temporal regularity of sustained noisy sounds. Hearing Research. 2011;272(1–2):85–94. doi: 10.1016/j.heares.2010.10.013.
- Lutkenhoner B, Seither-Preisler A, Seither S. Piano tones evoke stronger magnetic fields than pure tones or noise, both in musicians and non-musicians. Neuroimage. 2006;30(3):927–937. doi: 10.1016/j.neuroimage.2005.10.034.
- Lutkenhoner B, Steinstrater O. High-precision neuromagnetic study of the functional organization of the human auditory cortex. Audiology and Neuro-Otology. 1998;3:191–213. doi: 10.1159/000013790.
- Maess B, Jacobsen T, Schroger E, Friederici AD. Localizing pre-attentive auditory memory-based comparison: magnetic mismatch negativity to pitch change. Neuroimage. 2007;37(2):561–571. doi: 10.1016/j.neuroimage.2007.05.040.
- Magne C, Schon D, Besson M. Musician children detect pitch violations in both music and language better than nonmusician children: behavioral and electrophysiological approaches. Journal of Cognitive Neuroscience. 2006;18(2):199–211. doi: 10.1162/089892906775783660.
- Marie C, Delogu F, Lampis G, Belardinelli MO, Besson M. Influence of musical expertise on segmental and tonal processing in Mandarin Chinese. Journal of Cognitive Neuroscience. 2011;23(10):2701–2715. doi: 10.1162/jocn.2010.21585.
- Meddis R, O’Mard L. A unitary model of pitch perception. Journal of the Acoustical Society of America. 1997;102(3):1811–1820. doi: 10.1121/1.420088.
- Meyer M. Functions of the left and right posterior temporal lobes during segmental and suprasegmental speech perception. Zeitschrift für Neuropsychologie. 2008;19(2):101–115.
- Moore CB, Jongman A. Speaker normalization in the perception of Mandarin Chinese tones. Journal of the Acoustical Society of America. 1997;102(3):1864–1877. doi: 10.1121/1.420092.
- Munte TF, Altenmuller E, Jancke L. The musician’s brain as a model of neuroplasticity. Nature Reviews: Neuroscience. 2002;3(6):473–478. doi: 10.1038/nrn843.
- Oldfield RC. The assessment and analysis of handedness: The Edinburgh inventory. Neuropsychologia. 1971;9:97–113. doi: 10.1016/0028-3932(71)90067-4.
- Onishi S, Davis H. Effects of duration and rise time of tone bursts on evoked V potentials. Journal of the Acoustical Society of America. 1968;44(2):582–591. doi: 10.1121/1.1911124.
- Ostroff JM, McDonald KL, Schneider BA, Alain C. Aging and the processing of sound duration in human auditory cortex. Hearing Research. 2003;181(1–2):1–7. doi: 10.1016/s0378-5955(03)00113-8.
- Pantev C, Oostenveld R, Engelien A, Ross B, Roberts LE, Hoke M. Increased auditory cortical representation in musicians. Nature. 1998;392(6678):811–814. doi: 10.1038/33918.
- Patel AD, Iversen JR. The linguistic benefits of musical abilities. Trends in Cognitive Sciences. 2007;11(9):369–372. doi: 10.1016/j.tics.2007.08.003.
- Patterson RD, Uppenkamp S, Johnsrude IS, Griffiths TD. The processing of temporal pitch and melody information in auditory cortex. Neuron. 2002;36(4):767–776. doi: 10.1016/s0896-6273(02)01060-7.
- Penagos H, Melcher JR, Oxenham AJ. A neural representation of pitch salience in nonprimary human auditory cortex revealed with functional magnetic resonance imaging. Journal of Neuroscience. 2004;24(30):6810–6815. doi: 10.1523/JNEUROSCI.0383-04.2004.
- Plack CJ, Carlyon RP, Viemeister NF. Intensity discrimination under forward and backward masking: role of referential coding. Journal of the Acoustical Society of America. 1995;97(2):1141–1149. doi: 10.1121/1.412227.
- Plack CJ, Turgeon M, Lancaster S, Carlyon RP, Gockel HE. Frequency discrimination duration effects for Huggins pitch and narrowband noise (L). Journal of the Acoustical Society of America. 2011;129(1):1–4. doi: 10.1121/1.3518745.
- Plack CJ, White LJ. Perceived continuity and pitch perception. Journal of the Acoustical Society of America. 2000;108(3 Pt 1):1162–1169. doi: 10.1121/1.1287022.
- Poeppel D. The analysis of speech in different temporal integration windows: Cerebral lateralization as ‘asymmetric sampling in time’. Speech Communication. 2003;41(1):245–255.
- Poeppel D, Idsardi WJ, van Wassenhove V. Speech perception at the interface of neurobiology and linguistics. Philosophical Transactions of the Royal Society of London Series B: Biological Sciences. 2008;363(1493):1071–1086. doi: 10.1098/rstb.2007.2160.
- Ritter S, Gunter Dosch H, Specht HJ, Rupp A. Neuromagnetic responses reflect the temporal pitch change of regular interval sounds. Neuroimage. 2005;27(3):533–543. doi: 10.1016/j.neuroimage.2005.05.003. [DOI] [PubMed] [Google Scholar]
- Ross B, Snyder JS, Aalto M, McDonald KL, Dyson BJ, Schneider B, Alain C. Neural encoding of sound duration persists in older adults. Neuroimage. 2009;47(2):678–687. doi: 10.1016/j.neuroimage.2009.04.051. [DOI] [PubMed] [Google Scholar]
- Sayles M, Winter IM. The temporal representation of the delay of dynamic iterated rippled noise with positive and negative gain by single units in the ventral cochlear nucleus. Brain Research. 2007;1171:52–66. doi: 10.1016/j.brainres.2007.06.098. doi: S0006-8993(07)01509-0 [pii] 10.1016/j.brainres.2007.06.098. [DOI] [PubMed] [Google Scholar]
- Schneider P, Scherg M, Dosch HG, Specht HJ, Gutschalk A, Rupp A. Morphology of Heschl’s gyrus reflects enhanced activation in the auditory cortex of musicians. Nature Neuroscience. 2002;5(7):688–694. doi: 10.1038/nn871. [DOI] [PubMed] [Google Scholar]
- Schonwiesner M, Zatorre RJ. Depth electrode recordings show double dissociation between pitch processing in lateral Heschl’s gyrus and sound onset processing in medial Heschl’s gyrus. Experimental Brain Research. 2008;187(1):97–105. doi: 10.1007/s00221-008-1286-z. [DOI] [PubMed] [Google Scholar]
- Seither-Preisler A, Krumbholz K, Patterson R, Seither S, Lutkenhoner B. Interaction between the neuromagnetic responses to sound energy onset and pitch onset suggests common generators. European Journal of Neuroscience. 2004;19(11):3073–3080. doi: 10.1111/J.1460-9568.2004.03423.X. [DOI] [PubMed] [Google Scholar]
- Seither-Preisler A, Patterson R, Krumbholz K, Seither S, Lutkenhoner B. Evidence of pitch processing in the N100m component of the auditory evoked field. Hearing Research. 2006;213(1–2):88–98. doi: 10.1016/j.heares.2006.01.003. [DOI] [PubMed] [Google Scholar]
- Shahin A, Bosnyak DJ, Trainor LJ, Roberts LE. Enhancement of neuroplastic P2 and N1c auditory evoked potentials in musicians. Journal of Neuroscience. 2003;23(13):5545–5552. doi: 10.1523/JNEUROSCI.23-13-05545.2003. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Soeta Y, Nakagawa S. The effects of pitch and pitch strength on an auditory-evoked N1m. Neuroreport. 2008;19(7):783–787. doi: 10.1097/WNR.0b013e3282fe2085. [DOI] [PubMed] [Google Scholar]
- Soeta Y, Nakagawa S, Tonoike M. Auditory evoked magnetic fields in relation to iterated rippled noise. Hearing Research. 2005;205(1–2):256–261. doi: 10.1016/j.heares.2005.03.026. [DOI] [PubMed] [Google Scholar]
- Swaminathan J, Krishnan A, Gandour JT, Xu Y. Applications of static and dynamic iterated rippled noise to evaluate pitch encoding in the human auditory brainstem. IEEE Transactions on Biomedical Engineering. 2008;55(1):281–287. doi: 10.1109/TBME.2007.896592. [DOI] [PubMed] [Google Scholar]
- Tervaniemi M, Kruck S, De Baene W, Schroger E, Alter K, Friederici AD. Top-down modulation of auditory processing: effects of sound context, musical expertise and attentional focus. The European journal of neuroscience. 2009;30(8):1636–1642. doi: 10.1111/j.1460-9568.2009.06955.x. [DOI] [PubMed] [Google Scholar]
- Tsang YK, Jia S, Huang J, Chen HC. ERP correlates of pre-attentive processing of Cantonese lexical tones: The effects of pitch contour and pitch height. Neuroscience Letters. 2011;487(3):268–272. doi: 10.1016/j.neulet.2010.10.035. doi: S0304-3940(10)01374-1 [pii] 10.1016/j.neulet.2010.10.035. [DOI] [PubMed] [Google Scholar]
- Walker KM, Bizley JK, King AJ, Schnupp JW. Cortical encoding of pitch: recent results and open questions. Hearing Research. 2011;271(1–2):74–87. doi: 10.1016/j.heares.2010.04.015. [DOI] [PMC free article] [PubMed] [Google Scholar]
- White LJ, Plack CJ. Temporal processing of the pitch of complex tones. Journal of the Acoustical Society of America. 1998;103(4):2051–2063. doi: 10.1121/1.421352. [DOI] [PubMed] [Google Scholar]
- White LJ, Plack CJ. Factors affecting the duration effect in pitch perception for unresolved complex tones. Journal of the Acoustical Society of America. 2003;114(6 Pt 1):3309–3316. doi: 10.1121/1.1621860. [DOI] [PubMed] [Google Scholar]
- Wildgruber D, Ackermann H, Kreifelts B, Ethofer T. Cerebral processing of linguistic and emotional prosody: fMRI studies. Progress in Brain Research. 2006;156:249–268. doi: 10.1016/S0079-6123(06)56013-3. doi: S0079-6123(06)56013-3 [pii] 10.1016/S0079-6123(06)56013-3. [DOI] [PubMed] [Google Scholar]
- Wong PC, Perrachione TK. Learning pitch patterns in lexical identification by native English-speaking adults. Applied Psycholinguistics. 2007;28(4):565–585. [Google Scholar]
- Xi J, Zhang L, Shu H, Zhang Y, Li P. Categorical perception of lexical tones in Chinese revealed by mismatch negativity. Neuroscience. 2010;170(1):223–231. doi: 10.1016/j.neuroscience.2010.06.077. doi: S0306-4522(10)00949-8 [pii] 10.1016/j.neuroscience.2010.06.077. [DOI] [PubMed] [Google Scholar]
- Xu Y, Sun X. Maximum speed of pitch change and how it may relate to speech. Journal of the Acoustical Society of America. 2002;111(3):1399–1413. doi: 10.1121/1.1445789. [DOI] [PubMed] [Google Scholar]
- Yrttiaho S, Tiitinen H, Alku P, Miettinen I, May PJ. Temporal integration of vowel periodicity in the auditory cortex. Journal of the Acoustical Society of America. 2010;128(1):224–234. doi: 10.1121/1.3397622. [DOI] [PubMed] [Google Scholar]
- Yrttiaho S, Tiitinen H, May PJ, Leino S, Alku P. Cortical sensitivity to periodicity of speech sounds. Journal of the Acoustical Society of America. 2008;123(4):2191–2199. doi: 10.1121/1.2888489. [DOI] [PubMed] [Google Scholar]
- Zatorre RJ. Pitch perception of complex tones and human temporal-lobe function. Journal of the Acoustical Society of America. 1988;84(2):566–572. doi: 10.1121/1.396834. [DOI] [PubMed] [Google Scholar]
- Zatorre RJ, Baum SR. Musical melody and speech intonation: Singing a different tune. PLoS Biology. 2012;10(7):e1001372. doi: 10.1371/journal.pbio.1001372. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Zatorre RJ, Belin P, Penhune VB. Structure and function of auditory cortex: music and speech. Trends in Cognitive Sciences. 2002;6(1):37–46. doi: 10.1016/s1364-6613(00)01816-7. [DOI] [PubMed] [Google Scholar]
- Zatorre RJ, Gandour JT. Neural specializations for speech and pitch: moving beyond the dichotomies. Philosophical Transactions of the Royal Society of London Series B: Biological Sciences. 2008;363(1493):1087–1104. doi: 10.1098/rstb.2007.2161. doi: J412P80575385013 [pii] 10.1098/rstb.2007.2161. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Zheng HY, Minett JW, Peng G, Wang WS-Y. The impact of tone systems on the categorical perception of lexical tones: An event-related potentials study. Language and Cognitive Processes. 2012;27(2):184–209. doi: 10.1080/01690965.2010.520493. [DOI] [Google Scholar]