PLOS One. 2021 Feb 25;16(2):e0247495. doi: 10.1371/journal.pone.0247495

Change detection of auditory tonal patterns defined by absolute versus relative pitch information. A combined behavioural and EEG study

Nina Coy 1,*, Maria Bader 1, Erich Schröger 1, Sabine Grimm 1
Editor: Qian-Jie Fu 2
PMCID: PMC7906474  PMID: 33630974

Abstract

The human auditory system often relies on relative pitch information to extract and identify auditory objects, such as when the same melody is played in different keys. The current study investigated the mental chronometry underlying the active discrimination of unfamiliar melodic six-tone patterns by measuring behavioural performance and event-related potentials (ERPs). In a roving standard paradigm, such patterns were either repeated identically within a stimulus train, carrying absolute frequency information about the pattern, or shifted in pitch (transposed) between repetitions, so only relative pitch information was available to extract the pattern identity. Results showed that participants were able to use relative pitch to detect when a new melodic pattern occurred. In the absence of absolute pitch cues, however, sensitivity decreased significantly and behavioural reaction times to pattern changes increased. Mismatch negativity (MMN), an ERP indicator of auditory deviance detection, was elicited at approximately 206 ms after stimulus onset at frontocentral electrodes, even when only relative pitch was available to inform pattern discrimination. A P3a was elicited in both conditions, comparable in amplitude and latency. Increased latencies but no differences in amplitudes of N2b and P3b suggest that processing at higher levels is affected when, in the absence of absolute pitch cues, relative pitch has to be extracted to inform pattern discrimination. Interestingly, the response delay of approximately 70 ms on the behavioural level already fully manifests at the level of N2b. This is in accordance with recent findings on implicit auditory learning processes and suggests that, in the absence of absolute pitch cues, a slowing of target selection rather than a slowing of the auditory pattern change detection process causes the deterioration in behavioural performance.

Introduction

Acoustic information arriving at the human ear is fleeting and complex–in fact, auditory events are temporally and spectrally highly variable signals requiring efficient processing [1]. And yet, we can easily understand the same words articulated by speakers in different registers or volume, and recognise a melody played in another key or tempo. The meaningful sensory representation (i.e., a verbal message, a melody) derived from the, oftentimes ambiguous, acoustic input is typically referred to as an “auditory object”. This term delineates “an acoustic experience that produces a two-dimensional image with frequency and time dimensions” [2]. Thus, despite being a general definition, it points out that the time course of spectral information provides salient cues about auditory events (e.g., a word, a short melody). Hence, it is not surprising that the human auditory system is finely attuned to the processing of spectral properties such as pitch at multiple levels of the central nervous system [3].

When a sound pattern occurs repeatedly, extracting and storing its specific acoustic structure allows the auditory system to compare new input against this representation. In fact, the extraction of invariances facilitates the formation as well as the stabilisation of auditory representations [4], and their segregation from complex auditory environments [5].

In the simplest case, a specific auditory event recurs exactly within a natural auditory environment. For instance, when playing the same piano note twice, the spectral information is identical for both keystrokes. However, relevant auditory information (particularly in a sequence of auditory events) is typically not contained in exact absolute spectral values but rather in their relative distances (e.g., pitch relations in music, formants in speech)–enabling the recognition of auditory objects despite large variation in spectral features [6–10]. As absolute spectral information provides only a limited basis for generalisation from prior listening experiences, it is of limited use outside highly familiar and rich contexts [11]. In fact, as absolute spectral information is often not available in learning situations, tolerance to absolute spectral variability must extend to the initial acquisition mechanisms. A first proof-of-principle was delivered by a study almost three decades ago, showing that not only first-order physical features of auditory stimuli are encoded in the brain, but also the relation between tone pairs can be derived from a series of varying physical events [12]. Since then, a rich body of research has emerged, enabling a better understanding of the processing of complex sounds [for review: 3, 13–16].

Indeed, a study published in 2017 found evidence that even in the absence of direct attention unfamiliar short melodic patterns are extracted when learning has to rely solely on relative pitch cues because absolute pitch information varies due to transpositions of these patterns [17]. In naïve listeners that performed a loudness change detection task, Bader and colleagues [17] observed a clear mismatch-negativity (MMN) in response to the introduction of a new pattern not only after previous exact (i.e. absolute) pattern repetitions but also when the previous pattern was transposed (i.e. relative) between repetitions. The MMN component of the event-related-potential (ERP) is typically observed as a frontocentral negative deflection in the deviant minus standard difference potential at 100–250 ms after deviation onset [18, 19]. MMN is considered to reflect the outcome of a process where a deviant event is found incongruent with “the predictions produced by the neural representations of regularities extracted from the acoustic environment” [16]. Bader et al. [17] observed that MMN was elicited after only three presentations of the preceding pattern and at a similar latency in both absolute and relative pitch contexts. Bader et al. [17] interpreted this as strong indication that the sensory memory trace forms automatically and violations of complex pattern regularities are extracted even without absolute pitch cues. This is in line with research showing that MMN is sensitive to higher order regularities [for review: 15, 19]. However, on the stimulus (not the difference) level deviant amplitudes increased significantly as a function of the number of preceding standard stimuli in the absolute pitch information context only. Bader et al. [17] suggested that this might be explained by a general attenuation of pattern change processing when reliant on relative pitch cues. 
Considering topographical differences between conditions, it also might indicate that different areas are involved in the processing of relative patterns.

Interestingly, P3a in response to the occurrence of new patterns was markedly affected by the absence of absolute pitch cues, showing a decreased amplitude and an increased latency. P3a is typically observed as a strong frontal/central positive deflection peaking at around 300–500 ms after deviation onset in response to task-irrelevant deviant stimuli [20–23]. Although it is not yet fully understood which processes underlie the elicitation of P3a [24, 25], it has traditionally been associated with the (involuntary) orienting of attention towards the deviant auditory event [22, 26] and, more recently, with its evaluation [25, 27–29] and with stimulus selection in working memory [24]. Bader et al. [17] interpreted the P3a modulation as a reflection of “the difficulty to distinguish implicitly between standard and deviant sound patterns in a relative pitch code context”. This was supported during additional behavioural testing: when the emergence of new patterns was made explicitly task relevant, such pattern changes were detected less often and substantially slower when based on relative compared to absolute pitch cues.

One might argue that the observed P3a effects result from a substantial impoverishment of stimulus discriminability in the absence of absolute pitch. Nonetheless, it should not be neglected that the comparison of new patterns arguably is computationally more complex when reliant on relative compared to absolute pitch information, as it is not sufficient to compare whether two pitches are the same but rather whether their relative distance is. For instance, it was found that previously heard melodies that were in the same key at exposure and test were recognized with greater accuracy than melodies that were transposed [30–33], and that the ability to process relative pitch information depends on experience [30, 32, 34–37]. In our case, when absolute pitch information is available, any change in pitch is indication enough that the listener hears a new pattern. Deviant relative patterns can only be identified as such by comparing relative distances between at least two pitches within a pattern in relation to the previous stimuli. Thus, the P3a latency effect might be attributable to differences in computational processing demands. This might also be consistent with different brain areas being involved, which fits well with the observations described above about the topography of the deviant response at the level of MMN. Nevertheless, the consistent latencies of MMN in the two contexts rather confirm that the auditory discrimination process involved in detecting a pattern change is not necessarily delayed in the absence of absolute pitch cues. Thus, the considerable P3a differences might be (partly) attributable to an uncertainty at higher levels about which auditory changes are actually relevant in a passive listening situation [24]. Although P3a might reflect stimulus evaluation and not mere orienting of attention toward the deviant stimulus, that does not mean it is independent from attentional modulation [22, 27, 38, 39]. Whereas a new pattern in a context of identical pattern repetitions constitutes a clear deviation from the preceding stimulation and elicits P3a [17, 40], the occurrence of a new pattern in a relative pitch code context is not necessarily inherently more relevant to the brain than a transposition of the preceding pattern–in both cases absolute pitch information has changed. For instance, when a word is spoken first in a low and then in a higher register it might indicate emotional arousal in a person (i.e., a change within the current source) just as well as it might signify that another speaker simply repeated what the first said (i.e., change to a new source). Also, it might not be adaptive to shift resources to every change (i.e., transpositions, pattern and loudness changes) within a high change environment (i.e., relative pitch context) which holds no task-relevant information for a listener. Therefore, the amplitude difference is possibly accounted for by bottom-up attentional modulation.

In sum, these findings indicate a striking divergence between the automatic change detection regardless of absolute pitch information at the level of MMN on the one hand, and the decreased further evaluation processing at the level of P3a on the other. This poses the question whether the P3a amplitude decrease and latency increase reported by Bader et al. [17] reflect a difference in stimulus discriminability, computational complexity, bottom-up relevance, or a combination of these factors as a function of availability of relative compared to absolute pitch information. Although Bader et al. [17] observed that the discrimination performance between pattern repetition and true pattern change was significantly decreased by lack of absolute pitch information, one should consider the following: even in the instances in which a new pattern based on relative pitch information was correctly identified as such, response latency was still prolonged. It therefore seems unlikely that the P3a effect is only a question of relevance. The aim of the present experiment was to investigate this temporal delay as well as a possible amplitude modulation on the electrophysiological level in a setting in which the occurrence of new patterns is top-down relevant. That is, participants are explicitly asked to detect pattern changes to ensure that they actively attend to deviant patterns.

Other studies employing active discrimination paradigms have shown that the components MMN, N2b and the P300 complex typically characterise the ERP in such situations [38, 41]. It has also been reported that in active paradigms MMN can be difficult to estimate due to a temporal and spatial superposition of N2b [18, 19, 42]. The generic N2b component usually shows at approximately 200–350 ms after stimulus onset with a modality specific topography–in response to auditory stimuli it manifests as a central negative deflection [43, 44]. It has been related to intentional higher-order processing of change (mismatch) and of task-relevant (match) stimulus characteristics [43, 45–47]. Dien, Spencer & Donchin (2004) have discussed N2 as a process operating on a stimulus identification stage, though Ritter et al. (1983) have argued that, despite reflecting a comparison process between the current stimulus and a voluntarily held target template, the presence of N2b alone should not simply be equated with the actual detection of a target. The aforementioned P3a often occurs within a broad positivity at around 300–500 ms after stimulus onset observed in stimulus discrimination contexts, termed P300 complex [22, 23, 48]. In response to target stimuli the P300 complex also includes the more posterior P3b [21–24, 48]. Estimation of components within the P300 range is often difficult when there is a temporal overlap of central P3a and parietal P3b, sometimes adding up to one big late positive potential [21, 24, 49, 50]. P3b is typically assumed to index the revision, or updating, of the current mental model of the stimulus environment in response to an unexpected task-relevant event [51, 52]. The postulation that P3b is largely independent from response selection and execution processes [51, 53, 54] has been challenged though [49, 55, 56], as has the notion that P300 elicitation warrants unexpectedness [22, 24].

In general terms, P300 is associated with (focal) attention and (working) memory operations [21–24, 57]. However, it is not entirely clear whether P3a represents the same underlying processes in passive and active listening situations [24].

To summarise, MMN is mainly associated with automatic (non-intentional) processing and P3a with semi-automatic processing, while N2b and P3b are related to intentional processing. Therefore, these four components and their relation to the behavioural output might offer insight into how, or when, auditory processing is affected when absolute information about abstract patterns is variable. Extrapolating from the findings by Bader et al. [17], we hypothesise an impairment of behavioural performance visible in decreased pattern discrimination accuracy as well as prolonged reaction times in the absence of absolute pitch cues. Furthermore, we hypothesise that automatic auditory pattern processing (MMN) remains relatively unaffected by variability of absolute pitch information, but that processes associated with direct attention (N2b, P3b) are more vulnerable to the absence of absolute pitch information. It is unclear whether this is also true for P3a, or whether the explicit relevance of true pattern changes can compensate for [24] the P3a impoverishments observed in the passive listening situation by Bader et al. [17] in a relative pitch context.

Methods

Participants

Data were collected at Leipzig University; protocol and procedures were in accordance with the Declaration of Helsinki and approved by the ethics committee of the Medical Faculty at Leipzig University (Az: 089-15-09032015). Participants either received credit points or were paid in compensation for their participation. Seventeen healthy people (18–31 years, 10 female) participated in the present experiment. Two additional participants were tested but excluded due to extensive artifacts. All participants reported normal hearing, and normal or corrected-to-normal vision. Self-reported musical expertise varied between no experience and 14 years of having played an instrument and/or sung in a choir (M = 5.79; SD = 4.88), but none of the participants were professional musicians.

Procedure and apparatus

Participants were seated comfortably in a soundproof chamber during the experiment. The auditory stimuli were created in MATLAB R2013a (MathWorks, 2013) and presented with Psychtoolbox [58] binaurally via headphones (HD25-I 70 Ω, Sennheiser GmbH & Co.KG, Germany) at approximately 73 dB sound pressure level. Participants were instructed to listen to melodic sound patterns. Before each block they were instructed via a computer screen, approximately 100 cm away from their eyes, to respond as quickly and as accurately as possible with a button press, whenever they noticed a pattern being genuinely different from the preceding one, i.e., not a mere transposition. Behavioural responses (button press) were registered via reaction time box (Response Time Box, Suzhou Litong Electronic Co., Ltd., China). Feedback of performance was given on the screen after each block. During the auditory stimulation, participants were asked to gaze at a white fixation cross against a black background at the centre of the screen. The whole experiment, including preparation and two breaks, took approximately three and a half hours.

Stimuli and design

Each auditory pattern was 300 ms in duration and composed of six seamlessly concatenated segments of 50 ms duration. Each segment was a compound of a fundamental tone, drawn randomly from the frequency range between 220 and 880 Hz (i.e., a range of two octaves), and several harmonics (decreasing in intensity with a linear slope and a cut-off at 6000 Hz). All segments included 5 ms rise and fall times, and intensity was root-mean-square normalised.
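The stimulus construction described above can be illustrated with a minimal sketch. It is not the authors' MATLAB code: the sampling rate, the exact shape of the linear harmonic slope, the linear ramp shape, the per-segment RMS target and the function names `make_segment` and `make_pattern` are all assumptions made for illustration.

```python
import math

FS = 44100  # Hz; assumed sampling rate (not stated in the text)

def make_segment(f0, dur_s=0.050, ramp_s=0.005, cutoff_hz=6000.0, target_rms=0.1):
    """One 50 ms harmonic complex with 5 ms linear on/off ramps, RMS normalised."""
    n = int(round(dur_s * FS))
    n_harm = int(cutoff_hz // f0)  # fundamental plus harmonics up to the cut-off
    # Hypothetical linear slope: amplitude 1 at the fundamental, falling towards the cut-off.
    amps = [1.0 - (k - 1) / n_harm for k in range(1, n_harm + 1)]
    seg = [
        sum(a * math.sin(2 * math.pi * k * f0 * i / FS)
            for k, a in enumerate(amps, start=1))
        for i in range(n)
    ]
    n_ramp = int(round(ramp_s * FS))
    for i in range(n_ramp):
        gain = i / n_ramp          # linear rise at the start, mirrored fall at the end
        seg[i] *= gain
        seg[n - 1 - i] *= gain
    rms = math.sqrt(sum(x * x for x in seg) / n)
    return [x * target_rms / rms for x in seg]

def make_pattern(fundamentals_hz):
    """Concatenate six segments seamlessly into one 300 ms pattern."""
    pattern = []
    for f0 in fundamentals_hz:
        pattern.extend(make_segment(f0))
    return pattern
```

Calling `make_pattern` with six fundamentals between 220 and 880 Hz yields 300 ms of samples at the assumed sampling rate.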

The pattern change detection task was conducted in the form of a roving standard paradigm [17, 59, 60]. A randomly generated pattern was presented either 2, 3, 5 or 8 times successively (stimulus train). The last respective presentation of a given pattern served as the standard stimulus, and the first pattern of the subsequent stimulus train, which was incongruent with the pattern from the previous stimulus train, as a deviant stimulus. Therefore, there was no constant standard or deviant stimulus throughout a block.

In the absolute condition (ABS) a pattern was repeated identically within its stimulus train, thus each repetition consisted of the exact same configuration of pitches. In the transposed condition (TRA), by contrast, only the relations between pitches of a sound pattern were repeated by shifting the whole pattern by at least one semitone up or down in pitch (i.e., transposition over 12 equidistant semitone steps)–absolute pitches were thus changed between pattern presentations within a stimulus train. For an exemplary illustration of a pattern sequence please refer to Fig 1 and for a listening example to S1 and S2 Files.

Fig 1. Exemplary sequence of auditory patterns presented in a roving standard paradigm.

Each pattern is composed of six seamlessly concatenated 50 ms segments. Each segment comprised a fundamental, randomly drawn from 220–880 Hz, and added harmonics with a slope until 6000 Hz. Note that for simplicity only fundamentals are depicted in the stimulus illustration. Sound patterns were presented for a certain number of times [2, 3, 5, 8] with a constant SOA (1100 ms), until a new pattern was introduced. The last pattern in each train served as standard (S), while the first presentation in each new pattern train was defined as a deviant (D) pattern (i.e., true pattern change). In the absolute condition (top, cf. S1 File) patterns within a stimulus train were repeated identically, carrying absolute frequency information about the pattern. In the transposed condition (bottom, cf. S2 File) patterns as a whole were shifted up or down in pitch (transposed) within a train, thus only relative frequency information was available to extract pattern identity. Participants were instructed to press a button whenever they detected a genuine pattern change (i.e., ignore transpositions).

There were 800 ms of silence between each pattern presentation. As at least two segments were necessary to discriminate a transposed pattern from the previous one (relation between the first two tones), the first segment of absolute patterns was fixed at 400 Hz so that congruently in this condition only the second segment indicated a potential pattern change. Each train length occurred ten times in a randomised order within each block. There were 10 blocks (400 deviants) of the absolute and 14 blocks (560 deviants) of the transposed condition, in order to compensate for the expected reduced hit rate of relative pattern deviants.
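The block structure of the roving standard paradigm can be sketched as follows. The train lengths, repetitions per block and SOA are taken from the text; the transposition range of up to ±6 semitones, the function names and the trial dictionary layout are hypothetical, and the first pattern of every train (including the first in a block) is counted as a deviant so that the count matches the 40 deviants per block implied above.

```python
import random

TRAIN_LENGTHS = (2, 3, 5, 8)   # presentations per stimulus train (from the text)
REPEATS_PER_BLOCK = 10         # each train length occurs ten times per block
SOA_MS = 1100                  # 300 ms pattern + 800 ms silence

def transpose(freqs_hz, semitones):
    """Shift all fundamentals by a number of equal-tempered semitones."""
    return [f * 2 ** (semitones / 12) for f in freqs_hz]

def make_block(condition, rng):
    """Return the trial list of one block; condition is 'ABS' or 'TRA'."""
    lengths = list(TRAIN_LENGTHS) * REPEATS_PER_BLOCK
    rng.shuffle(lengths)
    trials = []
    for train_id, n in enumerate(lengths):
        for pos in range(n):
            shift = 0
            if condition == "TRA" and pos > 0:
                # Hypothetical rule: non-zero shift of at most 6 semitones up or down.
                shift = rng.choice([s for s in range(-6, 7) if s != 0])
            if pos == 0:
                role = "deviant"       # first pattern of a new train
            elif pos == n - 1:
                role = "standard"      # last presentation of the train
            else:
                role = "repetition"
            trials.append({
                "train": train_id,
                "position": pos,
                "role": role,
                "semitone_shift": shift,
                "onset_ms": len(trials) * SOA_MS,
            })
    return trials
```

Under these assumptions each block contains 180 pattern presentations and 40 deviants, so 10 absolute blocks give 400 deviants and 14 transposed blocks give 560.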

Data acquisition

Reaction time (RT) was defined as the time between pattern onset and button press, as long as it did not exceed the inter-stimulus-interval. In accordance with signal detection theory, a button press in response to a deviant pattern (target) was treated as a hit, whereas a button press in reaction to a standard pattern (non-target) was registered as a false alarm. Discrimination sensitivity (d’) between standard and deviant patterns was estimated using the log-linear correction [61].
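As a sketch of the sensitivity estimate, the log-linear correction adds 0.5 to the hit and false-alarm counts and 1 to the respective trial counts before converting the rates to z scores. The function name and argument names below are illustrative and not taken from the authors' analysis code.

```python
from statistics import NormalDist

def dprime_loglinear(hits, n_targets, false_alarms, n_nontargets):
    """d' = z(hit rate) - z(false-alarm rate) with the log-linear correction."""
    hit_rate = (hits + 0.5) / (n_targets + 1)
    fa_rate = (false_alarms + 0.5) / (n_nontargets + 1)
    z = NormalDist().inv_cdf   # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)
```

The correction keeps d’ finite even when a participant has a perfect hit rate or no false alarms, which is why it is commonly preferred over replacing extreme rates after the fact.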

EEG data was recorded continuously at 64 Ag/AgCl active electrodes, mounted according to the 10–10 international system in a suitable head cap, and amplified with a BrainAmp DC (Brain Products GmbH, Germany) amplifier and digitised with a sampling rate of 500 Hz. Eye movements (EOG) were measured horizontally with two electrodes positioned at the outer canthi of the eyes and vertically with one electrode below the left eye which was bipolarised with Fp1. Impedances were kept below 5 kΩ at all electrodes.

Data processing

EEG data analysis was performed offline in MATLAB (MathWorks, R2018b) and with the EEGLAB toolbox [62]. Data were consecutively filtered [63, 64] using a 0.1 Hz high-pass filter (Kaiser windowed sinc FIR filter, order = 9056, beta = 5.653, transition bandwidth = 0.2 Hz) and a subsequent 35 Hz low-pass filter (Kaiser windowed sinc FIR filter, order = 364, beta = 5.653, transition bandwidth = 5 Hz). Noisy channels, except EOG channels, with a z-standardised standard deviation greater than 4 were removed from further analysis. The data were segmented into epochs of 900 ms in duration, time-locked to the stimulus onset and including a 100 ms baseline. Trials were excluded if the maximal peak-to-peak difference was greater than 750 μV. Independent component analysis (ICA), using an infomax algorithm implemented in the pop_runica function of EEGLAB, was applied to further correct the data for artifacts [65]. To improve decomposition, ICA was computed (exclusive of bad channels and trials) on the 1 Hz high-pass (Kaiser windowed sinc FIR filter, order = 3624, beta = 5.653, transition bandwidth = 0.5 Hz) and subsequently 35 Hz low-pass (see above) filtered raw data. To shorten computation time, the data were down-sampled to a 250 Hz sampling rate. ICA weights were then transferred onto the 0.1–35 Hz data. Artifactual components were semi-automatically identified [66] using the EEGLAB plugins SASICA, ADJUST [67] and FASTER [68], and subsequently removed from the data. Previously excluded channels were spherically spline interpolated [69] afterwards. Epochs were baseline corrected using the 100 ms before stimulus onset. Trials exceeding a 100 μV peak-to-peak difference were excluded from analysis. For each subject the remaining trials (epochs per cell: M = 291; SD = 108) were averaged for each stimulus type (standard and deviant) in each condition (absolute and transposed). Grand averages were computed from these subject-level ERPs, and difference waves by subtracting ERPs in response to standards from those to deviants.
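The relation between the reported filter orders, the Kaiser beta and the transition bandwidths can be checked with the textbook Kaiser design formulas. This is only a plausibility sketch: the 60 dB stopband attenuation is an assumption inferred from beta = 5.653, the 500 Hz sampling rate is the recording rate given above, and the function names are illustrative. The toolbox used by the authors may round differently, so the estimates are expected to be close to, not identical with, the reported orders.

```python
import math

def kaiser_beta(atten_db):
    """Kaiser's empirical formula for the window shape parameter."""
    if atten_db > 50:
        return 0.1102 * (atten_db - 8.7)
    if atten_db >= 21:
        return 0.5842 * (atten_db - 21) ** 0.4 + 0.07886 * (atten_db - 21)
    return 0.0

def kaiser_order(atten_db, transition_hz, fs_hz):
    """Kaiser's estimate of the FIR order for a given transition bandwidth."""
    delta_omega = 2 * math.pi * transition_hz / fs_hz   # transition width in rad/sample
    return math.ceil((atten_db - 7.95) / (2.285 * delta_omega))
```

With 60 dB attenuation, `kaiser_beta` reproduces the reported beta to three decimals, and `kaiser_order` lands within about 0.1% of the reported orders for the 0.2 Hz, 0.5 Hz and 5 Hz transition bandwidths.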

Amplitudes were extracted on the subject level by window-averaging around each component’s respective grand-average peak (window widths: MMN: 40 ms, N2b: 60 ms, P3a: 40 ms, P3b: 100 ms) for each stimulus in both conditions within either an anterior (MMN, N2b and P3a) or a posterior (P3b) 3x3 electrode cluster. The jackknife-scoring method [70] was used to estimate the time course of each ERP component at a respective electrode of largest activation (MMN: Fz, N2b: FCz, P3a: FCz, P3b: Pz). Specifically, the time point was estimated at which the amplitude of a particular component across leave-one-participant-out subsamples of the grand-averaged wave first reaches specific percentage values of the respective peak amplitude: slope (60%) and peak (100%). The 60% relative peak estimate was included firstly because relative latency estimates have been shown to be less noisy than peak latency estimates using the jackknife technique [70], and secondly to probe whether latency effects are already present in the build-up of a given component. The search windows for jackknife estimation were defined based on the grand-average to avoid overlap of components as follows: 100–230 ms for MMN, 180–500 ms for N2b, 250–550 ms for P3a, and 300–800 ms for P3b.
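The leave-one-participant-out latency estimate described above can be sketched as follows. This is an illustrative implementation, not the authors' code: waveforms are plain lists sampled at the time points in `times_ms`, the `polarity` argument (-1 for negative components such as MMN and N2b, +1 for P3a and P3b) and both function names are assumptions.

```python
def relative_latency(times_ms, wave, fraction, window, polarity=-1):
    """First time point in the search window at which the wave reaches
    the given fraction of its peak amplitude (fraction=1.0 gives the peak)."""
    idx = [i for i, t in enumerate(times_ms) if window[0] <= t <= window[1]]
    signed = [polarity * wave[i] for i in idx]   # make the component positive-going
    peak = max(signed)
    for i, value in zip(idx, signed):
        if value >= fraction * peak:
            return times_ms[i]
    return times_ms[idx[signed.index(peak)]]      # fallback: time of the peak itself

def jackknife_latencies(times_ms, subject_waves, fraction, window, polarity=-1):
    """One latency per leave-one-participant-out grand average."""
    n = len(subject_waves)
    total = [sum(w[k] for w in subject_waves) for k in range(len(times_ms))]
    latencies = []
    for w in subject_waves:
        loo_average = [(total[k] - w[k]) / (n - 1) for k in range(len(times_ms))]
        latencies.append(
            relative_latency(times_ms, loo_average, fraction, window, polarity)
        )
    return latencies
```

For example, `jackknife_latencies(times, waves, 0.6, (100, 230))` would give the 60% slope latencies for the MMN search window, one value per left-out participant.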

Statistical analysis

The statistical analysis was conducted in RStudio Desktop Open Source Edition Version 3.6.2 [71], with ez package [72], multtest package [73], and ggplot2 package [74].

Reaction times and sensitivity indices were tested between conditions (absolute vs. transposed) by means of a paired Student’s t test.

The reported t values of the jackknife latencies comparing between absolute and transposed condition, as well as corresponding standard errors were adjusted to correct for an artificial reduction in error variance induced by jackknifing [75, 76]. Differences in amplitudes were statistically analysed by means of a 2x2x3 repeated measures ANOVA for each component respectively with the within-subject factors condition (absolute vs. transposed) x stimulus type (standard vs. deviant) x frontality [MMN, N2b, P3a: anteriority (frontal, frontocentral, central) / P3b: posteriority (parietocentral, parietal, parietooccipital)]. As there were no meaningful effects of laterality, the lateral dimension was collapsed, in order to simplify the statistical analysis. Please note that the activity values along the midline (factor frontality) represent averaged values not only including the central electrode but also the respective lateral electrodes directly adjacent to the midline electrode; e.g., the factor level frontal is the average of Fz (middle), F3 (left) and F4 (right).
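The adjustment of jackknife-based t values mentioned at the start of this section can be sketched under the commonly used rule that a t value computed on leave-one-out scores is divided by n − 1, with the standard error scaled up by the same factor. Whether the authors applied exactly this form is an assumption, and the function name is hypothetical.

```python
import math
from statistics import mean, stdev

def jackknife_paired_t(latencies_a, latencies_b):
    """Paired t test on leave-one-out latencies, corrected for jackknifing.

    Leave-one-out scores vary far less than single-subject scores, so the
    raw t value is inflated; dividing by (n - 1) compensates for this.
    """
    diffs = [b - a for a, b in zip(latencies_a, latencies_b)]
    n = len(diffs)
    se_raw = stdev(diffs) / math.sqrt(n)
    t_raw = mean(diffs) / se_raw
    t_adj = t_raw / (n - 1)
    se_adj = se_raw * (n - 1)
    return t_adj, se_adj
```

The adjusted t value is then evaluated against a t distribution with n − 1 degrees of freedom, as in an ordinary paired test.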

In case of violations of the sphericity assumption, Greenhouse-Geisser corrections were applied, and corrected p values are labelled with “GG”. Post hoc analysis of significant interactions was conducted by means of within factor level repeated measures ANOVAs and post hoc paired Student’s t tests. To investigate the presence of the components of interest, deviants were compared against standards. To probe for condition effects on these components, the magnitudes of the deviant minus standard differences were compared between conditions by means of post hoc paired Student’s t tests. In multiple pairwise comparisons the two-step Benjamini-Hochberg procedure [77] was applied to control the false discovery rate at a level of 5%, which has been shown to yield a good trade-off with statistical power [78]; in these instances only the adjusted p values are reported.
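To illustrate how adjusted p values arise in false discovery rate control, the sketch below implements the classic one-stage Benjamini-Hochberg step-up adjustment. It is a simplification: the two-step variant used in the study additionally estimates the number of true null hypotheses and is not reproduced here. The function name is illustrative.

```python
def bh_adjust(pvalues):
    """Classic Benjamini-Hochberg adjusted p values (one-stage step-up)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])   # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A comparison is declared significant at a false discovery rate of 5% when its adjusted p value is below .05. The monotonicity step also explains why several comparisons can share the same adjusted p value.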

Data were deposited in the OSF repository. https://doi.org/10.17605/OSF.IO/PTGR3 [79].

Results

Behavioural results

As can be seen in Fig 2, participants detected true pattern changes well above chance even in the absence of absolute pitch. Nonetheless, discrimination performance is significantly decreased (difference in d’: M = -2.34; SD = 0.57, see Table 1) when only relative pitch cues are available, t(16) = -16.983, p < .001, d = -4.119. The available pitch information explains approximately 81% of the variance in the discrimination performance (η2 = 0.808). On average participants take 67 ms (SD = 23 ms) longer to press the button in response to a true pattern change in the transposed compared to the absolute condition, t(16) = 11.980, p < .001, d = 2.905. The available pitch information explains approximately 64% of the variance in the reaction times (η2 = 0.636).
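As a quick consistency check on the effect sizes, Cohen’s dz for a paired design can be recovered from the t value and the sample size as dz = t / √n. The sketch below assumes that the reported d values are dz (as stated in the note to Table 1); the function name is illustrative.

```python
import math

def cohens_dz_from_t(t_value, n):
    """Cohen's dz for paired data, derived from the paired t statistic."""
    return t_value / math.sqrt(n)

dz_sensitivity = cohens_dz_from_t(-16.983, 17)   # d' difference
dz_reaction_time = cohens_dz_from_t(11.980, 17)  # RT difference
```

With n = 17 this gives values that agree, to within rounding of the reported t values, with the effect sizes listed for d’ and RT in Table 1.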

Fig 2. Violin plots depict behavioural performance measures as a function of condition.

The plots show (n = 17 participants) sensitivity (d’) in discriminating true pattern changes from pattern repetitions (left panel), and averaged reaction times (RT) when correctly responding to true pattern changes (right panel). The absolute condition is depicted in blue, the transposed in red and the respective absolute–transposed difference in black. Small dots represent each participant’s mean performance, the bold square the group average and error bars ± 1 SEM. Participants detected true pattern changes well above chance even in the absence of absolute pitch. Nonetheless, sensitivity to true pattern changes based on relative compared to absolute pitch information is lower (d’: M = -2.34) and the behavioural response on average 67 ms slower respectively.

Table 1. Sensitivity index (d’), reaction times (RT in ms), jackknife latencies (ms) and window-averaged amplitudes (μV) of the deviant minus standard difference components MMN, N2b, P3a and P3b in the absolute and the transposed condition.

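(a) Sensitivity, reaction times and jackknife latencies

| Measure | EP | absolute | transposed | tra vs. abs: diff | tra vs. abs: d |
|---|---|---|---|---|---|
| d’ | | 3.90 | 1.55 | -2.34 | -4.119 |
| RT | | 566 | 633 | 67 | 2.906 |
| MMN latency, slope | Fz | 209 | 172 | -36 | -0.088 |
| MMN latency, peak | Fz | 230 | 206 | -24 | -0.098 |
| N2b latency, slope | FCz | 219 | 293 | 74 | 0.797 |
| N2b latency, peak | FCz | 288 | 339 | 51 | 1.703 |
| P3a latency, slope | FCz | 422 | 452 | 30 | 0.207 |
| P3a latency, peak | FCz | 489 | 492 | 3 | 0.030 |
| P3b latency, slope | Pz | 420 | 514 | 94 | 0.857 |
| P3b latency, peak | Pz | 630 | 670 | 40 | 0.323 |

(b) Window-averaged amplitudes

| Component | EP | absolute: std | absolute: dev | absolute: Δdev,std | absolute: d | transposed: std | transposed: dev | transposed: Δdev,std | transposed: d | tra vs. abs: diff | tra vs. abs: d |
|---|---|---|---|---|---|---|---|---|---|---|---|
| MMN | F | -1.54 | -2.77 | -1.22 | -0.97 | -0.40 | -0.92 | -0.52 | -0.55 | 0.94 | 0.493 |
| MMN | FC | -1.17 | -2.53 | -1.36 | -1.16 | -0.39 | -0.72 | -0.34 | 0.34 | 0.70 | 0.698 |
| MMN | C | -0.50 | -1.72 | -1.22 | -1.23 | -0.41 | -0.49 | -0.08 | -0.08 | 1.02 | 0.820 |
| N2b | F | -2.77 | -4.66 | -1.90 | -0.98 | -2.84 | -3.99 | -1.15 | -0.82 | 0.74 | -0.413 |
| N2b | FC | -2.36 | -4.18 | -1.82 | -0.84 | -2.68 | -3.70 | -1.01 | -0.66 | 0.80 | -0.402 |
| N2b | C | -1.60 | -2.74 | -1.14 | -0.61 | -2.48 | -2.83 | -0.35 | -0.20 | 0.79 | -0.433 |
| P3a | F | -2.77 | -1.11 | 1.66 | 0.43 | -2.95 | -1.57 | 1.39 | 0.43 | -0.28 | -0.091 |
| P3a | FC | -2.03 | 0.82 | 2.85 | 0.75 | -2.38 | -0.30 | 2.07 | 0.65 | -0.77 | -0.232 |
| P3a | C | -1.26 | 3.90 | 5.16 | 1.35 | -1.83 | 1.71 | 3.55 | 1.17 | -1.62 | -0.447 |
| P3b | PC | -0.31 | 9.10 | 9.40 | 2.17 | -0.84 | 8.17 | 9.01 | 2.17 | -0.39 | -0.098 |
| P3b | P | 0.03 | 10.73 | 10.70 | 2.34 | -0.44 | 10.22 | 10.66 | 2.40 | -0.04 | -0.010 |
| P3b | PO | 0.49 | 10.52 | 10.03 | 2.54 | 0.40 | 10.91 | 10.51 | 2.42 | 0.48 | 0.145 |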

Note. EP: electrode positions, F: frontal (F3, Fz, F4), FC: frontocentral (FC3, FCz, FC4), PC: parietocentral (PC3, PCz, PC4), P: parietal (P3, Pz, P4), PO: parietooccipital (PO3, POz, PO4). d: Cohen’s dz. Significant differences (p or padj < .05) are printed in bold.

EEG results

For a rough characterisation, grand-average peak latency values are reported, but please note that jackknife latencies will be used for statistical analysis. The grand-averaged deviant-minus-standard difference waveforms are characterised by an initial negative going deflection (Fig 3A). In the transposed condition there is a first negative difference at Fz which is maximal around 185 ms (MMN) and is followed by a second negative difference peaking at 342 ms (N2b). However, in the absolute condition the negative difference rises to one prominent peak at 286 ms, indicating a potential overlap of MMN and N2b activity. In both conditions the negativity of the deviants subsequently decreases, resulting in a positive difference (P3a) visible most clearly at anterior electrodes–maximal at FCz at 488 ms in the absolute and 490 ms in the transposed condition. Simultaneously, a strong posterior positivity (P3b) of the deviants starts to build up and is maximally different from the standards at 626 ms and 666 ms, respectively, at Pz. Principally, the ERP waves in response to both standard and deviant pattern stimuli appear morphologically quite similar between the absolute and transposed condition, though visually not identical with regard to the time course and magnitude of the deviant minus standard difference.

Fig 3. Grand averaged ERP waves.

Fig 3

(A) Grand-averaged nose-referenced ERP waves of n = 17 participants at midline electrodes Fz (top), FCz (middle) and Pz (bottom) as a function of condition (blue: absolute, red: transposed) and stimulus (left: standard and deviant response, right: deviant–standard difference). The shaded area represents ± 1 SEM around the difference potential. The ERP components are labelled where appropriate. The solid grey vertical line marks the pattern offset, while the dashed vertical lines denote the average response time in each condition. Please note that the point of deviation occurs at 50 ms, that is, with the start of the second segment of a pattern. (B) Topographies depict the window-averaged deviant–standard difference amplitude around each ERP component's peak for both conditions (left: absolute, right: transposed) respectively.

To test for effects of pitch information (condition) on topography (frontality) and magnitude of window averaged amplitudes associated with pattern repetitions and true pattern changes (stimulus), a repeated measures ANOVA was computed including the factors and their interactions (full model): condition (absolute vs. transposed) x stimulus type (standard vs. deviant) x frontality [MMN, N2b, P3a: anteriority (frontal, frontocentral, central) / P3b: posteriority (parietocentral, parietal, parietooccipital)].

MMN

Topographies of the averaged difference amplitude values within the MMN time window (Fig 3B) show that the initial negative difference between deviants and standards is mainly distributed at anterior sites with an inversion at more posterior sites, including the mastoid electrodes. In the transposed condition it is concentrated mostly at frontal electrodes, whereas in the absolute condition it shows a broader distribution centred around frontocentral electrodes with a slight right-hemispheric tendency.

Latencies. Jackknife latencies were extracted within a search window between 100 ms and 230 ms. In the transposed condition, 60 percent of the peak amplitude of that first negative component is reached at approximately 172 ms (SE = 77 ms) at electrode Fz (Fig 4A), peaking at 206 ms (SE = 66 ms), which is after the fourth segment of the pattern ended. The estimated jackknife latencies did not differ significantly from the absolute condition, slope: tadj(16) = 0.404, padj = .515, d = 0.098; peak: tadj(16) = 0.364, padj = .515, d = 0.089. However, in the absolute condition there is visually no clear peak in this time frame. In fact, the jackknife peak latency value in the absolute condition (M = 230 ms; SE = 0 ms) corresponds to the upper limit of the search window for the MMN peak. Thus, the slope latency in the absolute condition (M = 209 ms; SE = 38 ms) actually reflects the time point at which sixty percent of the amplitude at the upper search boundary is reached. In order to verify that this issue did not confound our results, we further compared the latencies at which 60% of the peak amplitude in the transposed condition was reached in both conditions respectively (absolute: M = 209 ms; SE = 5 ms; transposed: M = 238 ms; SE = 56 ms), again yielding no significant difference in the MMN slope, tadj(16) = -0.513, p = .615, d = -0.124. In sum, there is no indication of a delay in MMN build-up as a function of pitch information.
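The latency procedure described above can be sketched as follows. This is a minimal illustration of a jackknife latency analysis under stated assumptions, not the authors' implementation: waveforms are plain lists of amplitudes, the slope latency is taken as the first sample in the search window that reaches 60% of the local peak, and tadj is assumed to denote the jackknife-corrected t value (the conventional t divided by n − 1). All function names are hypothetical.

```python
from math import sqrt


def leave_one_out_averages(waves):
    """Return one subaverage per participant, each omitting that participant."""
    n = len(waves)
    totals = [sum(samples) for samples in zip(*waves)]
    return [[(total - wave[i]) / (n - 1) for i, total in enumerate(totals)]
            for wave in waves]


def fractional_latency(times, wave, window, fraction=0.6, negative=True):
    """First time within `window` at which `fraction` of the local peak is reached.

    With fraction=1.0 this returns the latency of the local peak itself.
    """
    lo, hi = window
    sign = -1.0 if negative else 1.0
    inside = [i for i, t in enumerate(times) if lo <= t <= hi]
    peak = max(inside, key=lambda i: sign * wave[i])
    threshold = fraction * sign * wave[peak]
    for i in inside:
        if sign * wave[i] >= threshold:
            return times[i]
    return times[peak]  # fallback if the "peak" has the wrong polarity


def jackknife_se(estimates):
    """Jackknife standard error of estimates obtained from leave-one-out subaverages."""
    n = len(estimates)
    mean = sum(estimates) / n
    return sqrt((n - 1) / n * sum((x - mean) ** 2 for x in estimates))


def jackknife_t(latencies_a, latencies_b):
    """Paired t on jackknife estimates (conventional t divided by n - 1).

    Raises ZeroDivisionError if all subaverage differences are identical.
    """
    diffs = [b - a for a, b in zip(latencies_a, latencies_b)]
    return (sum(diffs) / len(diffs)) / jackknife_se(diffs)


if __name__ == "__main__":
    # Toy difference waves for three hypothetical participants (arbitrary units).
    times = [0, 10, 20, 30, 40]
    waves = [[0, -1, -4, -5, -2], [0, -2, -3, -6, -1], [0, -1, -2, -4, -3]]
    subaverages = leave_one_out_averages(waves)
    slopes = [fractional_latency(times, w, (0, 40)) for w in subaverages]
    print(slopes, jackknife_se(slopes))
```

Because each subaverage contains almost all participants, the jackknife estimates vary much less than single-subject latencies, which is why the standard error and the t statistic need the correction shown in `jackknife_se`.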

Fig 4. Error bar plots of averaged ERP component latencies and amplitudes.

Fig 4

Condition is coded by colour (blue: absolute, red: transposed). Error bars represent ± 1 SEM. For exact values please refer to Table 1. (A) Jackknife latency estimates of slope (60% of the local peak) and peak (100% of local peak) in each condition for each ERP component respectively. Note that different electrodes were used for the components (y-axis label). Significant differences (padj ≤ .05) are marked with an asterisk. (B) For each component window-averaged amplitudes are depicted both on the stimulus as well as the deviant-standard difference level from anterior to posterior electrodes along the midline, though averaged across the lateral dimension within a 3x3 electrode array.

Amplitudes. Within the MMN time range, activity independent of stimulus generally increases in negativity from central towards frontal electrodes (Fig 4B), main effect of anteriority: F(2,32) = 7.031, pGG = .013, ηg2 = 0.009. Activity in response to the auditory stimuli is generally more negative in the absolute compared to the transposed condition, main effect of condition: F(1,16) = 27.035, p < .001, ηg2 = 0.042. Also, the distribution of activity within the MMN window differs between conditions, condition*anteriority interaction: F(2,32) = 24.870, pGG < .001, ηg2 = 0.004. Only in the absolute condition do negative amplitudes increase from central to frontal electrode sites (anteriorityabs: F(2,32) = 17.639, pGG < .001, ηg2 = 0.026), while this effect is absent in the transposed condition (anterioritytra: F(2,32) = 0.703, pGG = .432, ηg2 = 0.001).

More interestingly, amplitudes in response to deviant patterns are significantly more negative compared to amplitudes for standard patterns, main effect of stimulus: F(1,16) = 18.475, p < .001, ηg2 = 0.020. Firstly, this main effect is dependent on the condition, condition*stimulus interaction: F(1,16) = 8.019, p = .012, ηg2 = 0.008. Secondly, the stimulus main effect is further characterised by a condition*stimulus*anteriority interaction: F(2,32) = 5.879, pGG = .015, ηg2 = 0.0003. Post hoc analysis revealed that in the absolute condition there are significant additive but no interaction effects of stimulus and anteriority (stimulusabs: F(1,16) = 22.814, p < .001, ηg2 = 0.050; anteriorityabs: F(2,32) = 17.639, pGG < .001, ηg2 = 0.025; stimulus*anteriorityabs: F(2,32) = 0.600, pGG = .484, ηg2 = 0.0001). Consequently, a significant negative deviant minus standard difference was elicited in response to pattern changes independent of anterior position (Table 1) when absolute pitch information was available. In contrast, in the transposed condition there are no additive but only interactive effects of stimulus and anteriority (stimulustra: F(1,16) = 1.790, p = .200, ηg2 = 0.003; anterioritytra: F(2,32) = 0.703, pGG = .432, ηg2 = 0.001; stimulus*anterioritytra: F(2,32) = 10.990, p < .001, ηg2 = 0.001). Post hoc analysis of paired comparisons between transposed deviants and standards at the respective anterior positions (see Table 1) revealed that a significant MMN is elicited only at frontal and frontocentral (t(16) > 1.4; padj ≤ .033) but not at central electrodes (t(16) = 10.313; padj = .126). Furthermore, the amplitude of MMN is statistically significantly larger in the absolute than in the transposed condition at all anterior positions (t(16) ≤ -2.034; padj ≤ .033), increasing from medium to large effect size from frontal towards central electrodes (see Table 1). Taken together, MMN in the absolute compared to the transposed condition receives some contribution from more central sources.

N2b

Topographies of the averaged difference amplitude values within the respective N2b time windows of each condition (Fig 3B) show that the negative difference between deviants and standards is mainly distributed at frontocentral electrodes, though the field is broader and slightly more right-hemispheric in the absolute compared to the transposed condition.

Latencies. Jackknife latencies were extracted within a search window between 180 ms and 500 ms. The second negative difference between deviants and standards, N2b, on average peaks 51 ms (SE = 7 ms) later in the transposed than in the absolute condition, tadj(16) = 7.020, padj < .001, d = 1.703. Actually, the temporal lag between the conditions which is already present at the time point at which 60% of the respective peak amplitude is reached (N2b slope difference: M = 74 ms; SE = 22 ms), tadj(16) = 3.287, padj ≤ .008, d = 0.797, indicates that there is a positive temporal shift (i.e., increased latency) of the whole component in the absence of absolute pitch information.
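The effect sizes reported for the significant and near-significant latency comparisons are numerically consistent with the standard relation dz = t / √n for paired designs, here with n = 17 and the adjusted t values given in the text. The sketch below is only a plausibility check of that relation; the function name is illustrative, and it is an assumption that dz was computed in this way.

```python
from math import sqrt

N_PARTICIPANTS = 17  # sample size given in the Fig 3 caption


def dz_from_t(t_value, n):
    """Cohen's dz implied by a paired t value and the sample size."""
    return t_value / sqrt(n)


# (adjusted t, reported dz) pairs taken from the latency results in the text.
reported = {
    "N2b peak": (7.020, 1.703),
    "N2b slope": (3.287, 0.797),
    "P3b slope": (3.533, 0.857),
    "P3b peak": (1.330, 0.323),
}

if __name__ == "__main__":
    for label, (t_adj, dz) in reported.items():
        implied = round(dz_from_t(t_adj, N_PARTICIPANTS), 3)
        print(f"{label}: implied dz = {implied}, reported dz = {dz}")
```

For these four comparisons the implied and reported values agree to three decimals; for the smaller, non-significant effects they agree only approximately, presumably because the t values are themselves rounded.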

Amplitudes. Within the N2b time range, activity shows a typical N2 topography, as negativity increases from central towards frontal electrodes, main effect of anteriority: F(2,32) = 11.583, pGG < .001, ηg2 = 0.020. This distribution of activity slightly differs between conditions, condition*anteriority, F(2,32) = 24.211, pGG < .001, ηg2 = 0.002. The anterior increase of negativity is of bigger effect size in the absolute compared to the transposed condition (anteriorityabs: F(2,32) = 17.674, pGG < .001, ηg2 = 0.038; anterioritytra: F(2,32) = 5.389, pGG = .024, ηg2 = 0.010).

Activity within the N2b time range in response to deviants is generally more negative compared to standards, main effect of stimulus: F(1,16) = 11.720, p = .003, ηg2 = 0.032. This deviant minus standard negative difference is more pronounced at frontal than at central electrodes, stimulus*anteriority: F(2,32) = 16.332, pGG < .001, ηg2 = 0.003.

The combination of all these effects translates, as can be seen in Table 1, into the elicitation of a significant N2b regardless of pitch information. Although the negative difference between deviants and standards seems distributed more broadly in the absolute than in the transposed pitch information context (Fig 3B), there are no significant differences in N2b amplitudes between conditions within the 3x3 electrode array (condition*stimulus: F(1,16) = 3.112, pGG = .097, ηg2 = 0.003; condition*stimulus*anteriority: F(2,32) = 0.056, pGG = .945, ηg2 < 0.001).

P3a

The grand averaged difference waveforms show that after the N2b peak the deviants are less negative than the standards at anterior electrodes, resulting in a positive difference. Topographies of averaged amplitudes show that especially in the absolute condition a strong posterior positive component (P3b) overlaps with P3a activity, also visible in increased P3a amplitudes from frontal to central in the absolute compared to the transposed condition.

Latencies. Jackknife latencies were estimated within a search window from 250 ms to 550 ms. The positive difference between deviants and standards (P3a) at FCz succeeding the N2b peak reaches 60% of its respective final peak approximately 29 ms (SE = 34 ms) later in the transposed compared to the absolute condition. However, the difference is not significant, P3a slope: tadj(16) = 0.860, padj = .405, d = 0.207. The 3 ms (SE = 24 ms) peak latency difference between conditions is not significant either, P3a peak: tadj(16) = 0.130, padj = .564, d = 0.030. This indicates that P3a was elicited at a relatively similar time in both conditions.

Amplitudes. Overall, within the time window selected for P3a amplitude extraction, activity in response to both standards and deviants shows a decrease in negativity from frontal towards central electrodes, main effect of anteriority: F(2,32) = 31.735, pGG < .001, ηg2 = 0.081. There is no significant main effect of condition alone, F(1,16) = 3.494, p = .080, ηg2 = 0.011. However, amplitudes differ between conditions as a function of anteriority, condition*anteriority interaction: F(2,32) = 9.705, pGG = .005, ηg2 = 0.003. While averaged across stimulus types amplitudes significantly decrease in negativity from frontal to central in both conditions (anteriorityabs: F(2,32) = 40.792, pGG < .001, ηg2 = 0.135; anterioritytra: F(2,32) = 18.080, pGG < .001, ηg2 = 0.068), this trend is of bigger effect size in the absolute compared to the transposed condition.

Averaged across conditions, amplitudes within the P3a time range are significantly more positive in response to deviants than to standards, main effect of stimulus: F(1,16) = 15.607, p = .001, ηg2 = 0.120. This deviant-standard difference increases from frontal towards central electrodes, interaction stimulus*anteriority: F(2,32) = 21.738, pGG < .001, ηg2 = 0.024. Furthermore, the former effect is modulated by the factor condition, condition*stimulus*anteriority interaction: F(2,32) = 8.529, pGG < .05, ηg2 = 0.001, and there is also a slight interaction between stimulus, anteriority and laterality (F(4,64) = 3.878, pGG = .006, ηg2 = 0.001). The positive deviant-standard difference increases more strongly in the absolute compared to the transposed condition from frontal towards central electrodes (see Table 1). This converges with the already described posterior positive component (P3b), building up later in the transposed compared to the absolute condition, resulting in a stronger overlap at the time of the P3a peak in the latter condition. Post hoc analysis of paired comparisons of the difference amplitudes between conditions at the respective anterior positions did not reach significance though, t(16) < 1.8; padj > .060 (Table 1). Thus, there is no evidence that the P3a elicited in response to task relevant true pattern changes is reduced in the absence of absolute pitch cues.

P3b

Approximately at the time of the auditory pattern offset a strong and broad posterior positivity begins to develop in the grand averaged response to deviants relative to standards, though visually later in the transposed relative to the absolute condition. In both conditions the positive difference steadily rises, the temporal delay in the transposed condition already visible in the slope. The broad peak is visually in temporal proximity to the time point of the average behavioural response. The decline after the peak visually maintains the delay between conditions, indicating a temporal shift of the whole component in the absence of absolute pitch information.

Latencies. Jackknife latencies were extracted from a search window between 300 ms and 800 ms. At the time point at which 60% of the final difference peak is reached, there is a significant temporal delay of 94 ms (SE = 27 ms) between the transposed and the absolute condition, P3b slopetra vs abs: tadj(16) = 3.533, padj ≤ .007, d = 0.857. The temporal delay when the maximal positive difference between deviants and standards is reached amounts to 40 ms (SE = 30 ms), although this is not significant, P3b peaktra vs abs: tadj(16) = 1.330, padj = .253, d = 0.323. This is likely due to the broadness of the P3b peak, which makes peak latency estimation difficult.

Amplitudes. On average the amplitudes within the P3b time window significantly increase in positivity towards posterior, main effect of posteriority: F(2,32) = 23.046, pGG < .001, ηg2 = 0.044. There is no significant main effect of condition on activity within the P3b window, F(1,16) = 0.455, p = .510, ηg2 = 0.003. However, the distribution of the posterior activity is modulated by the condition, condition*posteriority interaction: F(2,32) = 27.379, p < .001, ηg2 = 0.004. Overall, deviants are significantly more positive than standards, main effect of stimulus: F(1,16) = 114.369, p < .001, ηg2 = 0.735. The condition*stimulus interaction is not significant, F(1,16) = 0.0003, p = .986, ηg2 < 0.0001. However, the distribution of activity along the midline within the P3b time window differs between standards and deviants, stimulus*posteriority: F(2,32) = 14.774, pGG < .001, ηg2 = 0.010, which in turn is modulated by experimental condition, condition*stimulus*posteriority, F(2,32) = 3.653, p = .004, ηg2 = 0.001. This means that P3b activity slightly differs between conditions with regard to its distribution along the midline (see also Fig 4B, bottom panel).

However, post hoc analysis further confirmed that a robust and large effect sized P3b (Table 1) is elicited in response to deviant patterns at all posterior positions (t(16) > 2.7; padj < .004), and there are no significant P3b amplitude differences as a function of pitch information (|t(16)| < 0.6; padj > .239).

Discussion

Although absolute spectral features are a dominant aspect in auditory processing and absolute pitch information typically is a salient feature of auditory events [9, 13, 80, 81], a certain tolerance to its variability is required, even in initial learning situations. In an indirect listening task Bader and colleagues [17] reported a striking divergence between the relatively untinged automatic change detection in the face of absolute pitch variability at the level of MMN on the one hand, and a prominent decrease in further evaluation processing at the level of P3a on the other. While MMN elicitation offers strong indication of sensory learning without absolute pitch cues, the reported P3a amplitude decrease and latency increase [17] are less clear in their meaning. They might reflect a difference in stimulus discriminability [17, 82], computational complexity [83, 84], bottom-up relevance [22, 27, 38, 39], or a combination of these factors depending on whether or not relative pitch distances have to be represented when absolute pitch alone is not sufficient to inform pattern discrimination.

Thus, the aim of the experiment at hand was to assess how pattern processing as a function of pitch is reflected on the electrophysiological level in a setting in which the occurrence of new patterns is top-down relevant. Within a roving standard paradigm, randomly generated six-tone patterns were either repeated identically within a stimulus train, carrying absolute pitch information about the pattern, or shifted in pitch (transposed) between repetitions, so only relative pitch information was available to extract the pattern identity. Importantly, participants were asked to indicate whenever they detected a true pattern change, that is, to ignore transpositions in the relative pitch context.

Behavioural performance

As hypothesised and congruent with the findings by Bader and colleagues [17], true pattern changes were detected well above chance regardless of pitch information. This clearly shows that, when explicitly relevant, invariant patterns can be extracted and discriminated from each other despite variable absolute pitch. This is in line with other findings that non-musicians adeptly recognise transposed melodies [79, 30, 31, 85]. There were some inter-individual differences, but the largest portion of performance variance was explained by whether absolute or relative pitch cues had to be extracted. Sensitivity to true pattern changes was significantly reduced when relative pitch had to be extracted from transposed pattern sequences. The average response to correctly identified true pattern changes was 67 ms slower when pattern discrimination had to rely on relative pitch extraction. Thus, although true pattern changes were explicitly relevant and transpositions explicitly irrelevant, there is a clear performance advantage of absolute over relative pitch information in pattern discrimination. Similar advantageous effects of absolute pitch (same key) over relative pitch (different key) in melody recognition tasks have been reported [79, 30, 31, 85]. This indicates that relevance alone does not provide a sufficient explanation for the P3a effects reported by Bader et al. [17], and implies that there are more fundamental differences in the processing of absolute and relative patterns. Together, the decreased accuracy and increased response times in the relative pitch context suggest differences in computational complexity, reflected in increased processing time and perhaps also in the reliance on additional or even different processes altogether.

EEG

As hypothesised and similar to other studies using active discrimination paradigms [38, 41], the ERP in response to true pattern changes compared to true pattern repetitions (difference ERP) was characterised by the components MMN, N2b and the P300 complex (P3a and P3b). In comparison to the data reported by Bader et al. [17] MMN and P3a are present in both paradigms, whereas N2b and P3b were only observed in the active listening task of the current study but not in the passive listening data (please refer to Fig 5 contrasting the results in active and passive settings).

Fig 5. Comparison of ERPs between active and passive listening setting.

Fig 5

Grand-averaged stimulus-level (standard and deviant pattern) ERP waves as a function of pitch information context (blue: absolute, red: transposed) are depicted in the active listening paradigm of the current study (left panel) and the passive listening paradigm by Bader et al. [17] (middle panel). Grand-averaged deviant minus standard difference ERPs are compared between active and passive listening setting (right panel). Please note that the stimulus-onset-asynchrony was shorter in the passive listening study (650 ms) than in the active listening study (1100 ms) and that train lengths differed slightly as well.

MMN

Activity within the time frame of MMN was significantly larger in the absolute compared to the relative pitch context. While this negative difference could reflect a true MMN amplitude difference, it could also be explained by differential N2b overlap between conditions [18, 19, 42]. In the following we will discuss both accounts.

A potential true MMN amplitude difference could indicate differences in regularity representation strength, as the variability of the auditory environment has been inversely linked to model precision [86]. To elaborate, it is suggested that precision is higher the more stable an auditory regularity (low variability), and when deviation occurs from the expectation under a high-confidence scenario (higher precision), it results in larger MMN [86]. Within such an interpretational framework, differences in MMN amplitude would reflect decreased model precision as a result of high environmental variability. However, there is some argument that MMN amplitude is not proportional to deviation magnitude, but that it is rather an all-or-none response [87]. This means that differences in the average MMN amplitude between conditions would reflect a differential consistency with which MMN was elicited respectively. Detection of a new pattern needs at least two segments in both conditions. However, there might have been some jitter in (perceptual) change onset between conditions, resulting in a temporally more variable transposed MMN, not as well captured by a window averaging approach.

However, in the context of absolute pitch there was only one prominent, quite broadly distributed frontocentral negative deflection with an early shoulder within the typical latency range of MMN but the main peak occurring in the time range of N2b, impeding a robust estimation of MMN amplitude and latency. Thus, a likely explanation for the observed amplitude differences as a function of pitch context is a differing degree of temporal and spatial superposition of MMN by N2b [18, 19, 42], especially considering the more central contributions to the negative deflection in the MMN window in the absolute compared to the relative pitch context. In fact, MMN and N2b are often dissociated by the somewhat more central distribution of the latter component compared to the former, and typically also by polarity inversion at mastoids for MMN but not N2b [18, 19, 42, 88, 89]. However, the processing of complex patterns generally tends to elicit a more central MMN topography [17] than prototypical classic oddball stimuli, complicating the first approach. Furthermore, our observations resemble those of Bader et al. [17] in that such patterns also appear to lack a pronounced and clear inverted peak at the mastoids.

One alternative approach to disambiguate MMN and N2b component overlap is to contrast a passive and an active oddball condition [90–92]. The direct comparison with the passive listening paradigm by Bader et al. [17] (see Fig 5) suggests that the morphology of MMN in response to deviant patterns in the transposed condition in our active paradigm is highly concordant with the passively elicited MMN–both with regard to latency and amplitude. In the absolute condition, the initial portion of the negativity shows a similar concordance, with a divergence starting at around 210 ms (that is 160 ms after deviance onset), which is well in line with typical slope latencies of N2b [46, 47, 89, 93]. Thus, MMN shows a similar pattern in both studies, whereas the relatively later negativity (N2b) is only observable in the active but not the passive listening condition.

Particularly in the context of relative pitch, MMN was distinctly identifiable from N2b. It peaked at approximately 206 ms after stimulus onset, that is, considering the 50 ms delay due to the fixed first segment in both experimental conditions, around 150 ms after the actual change in pattern identity started to occur. That is well within the typical MMN peak range [18, 19] and concords with other MMN studies on higher-order regularities [15, 41, 94–96]. These results are in line with the finding of Bader et al. [17] that a significant MMN is elicited in response to true pattern changes even when relative pitch information has to be extracted because absolute pitch varies. In accordance with common MMN interpretation [16, 97], these results support the inference by Bader et al. [17] that relative pitch information is sufficient for the formation of a sensory memory trace of a regular auditory pattern and its discrimination from other relative patterns, at least on the sensory level.
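The conversion between stimulus-locked and deviance-locked latencies used in this part of the discussion is a simple subtraction of the 50 ms at which the second pattern segment starts. A minimal sketch, with illustrative names that are not part of the original analysis:

```python
DEVIANCE_ONSET_MS = 50  # start of the second pattern segment, where a new pattern first deviates


def relative_to_deviance(latency_ms, onset_ms=DEVIANCE_ONSET_MS):
    """Convert a latency measured from pattern onset to one measured from deviance onset."""
    return latency_ms - onset_ms


if __name__ == "__main__":
    # MMN peak in the transposed condition (206 ms after pattern onset).
    print(relative_to_deviance(206))  # 156, i.e. roughly 150 ms after deviance onset
    # Divergence of active and passive ERPs in the absolute condition (about 210 ms).
    print(relative_to_deviance(210))  # 160
```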

N2b

N2b was elicited in response to true pattern changes in both absolute and relative pitch context. There was a descriptive, though not significant, attenuation of the amplitude for relative compared to absolute true pattern changes. N2b amplitude often is interpreted as an indicator of the strength of a voluntarily held stimulus template [45, 47], i.e. in this case how well the spectrotemporal pattern is represented based either on absolute or relative pitch information. The lack of evidence for an amplitude modulation might mean that a pattern template is established equally well in both instances. However, N2b amplitude is also taken to reflect the allocation of attentional resources. If relative compared to absolute patterns require more attentional resources at the level of N2b, that might have cancelled out any effects of representation strength. Notably, findings on N2b amplitude modulations are generally rather inconsistent: some studies report a more negative [98], others a less negative N2b amplitude [99] with increasing discrimination difficulty. Similarly, with progressing age both increases [38, 100, 101] as well as decreases [102] were observed. No evidence for N2b amplitude modulations was reported for age [103] and medication in ADHD patients [104]. Ultimately, N2b amplitude might simply be slightly smaller because it temporally somewhat overlapped more with the stimulus offset response [14] and the P3a in the relative compared to the absolute pitch context.

More noteworthy might be the significant temporal shift of the whole N2b component in the absence of absolute pitch information, which is in the order of magnitude of the respective delay on the level of reaction times. N2b latency has been observed to increase with discrimination demands [44, 46, 98, 105], and with age-related decline of cognitive resources [38, 100, 103]. Interestingly, N2b latency in children diagnosed with ADHD has been reported to be abnormally short compared to neurotypicals, and to not only be normalised (i.e., increased) under medication with methylphenidate but also to coincide with improved behavioural accuracy [104]. Taken together, it seems that N2b latency indexes not merely the speed but also the complexity, and perhaps accuracy, of the processing of deviating and/or target stimulus characteristics. In these terms, the increase in latency in the relative compared with the absolute pitch context might well reflect generally increased computational complexity when a pattern is defined by relative rather than absolute pitch. As Ritter et al. (1992) pointed out that the interpretation of N2b largely relies “on the experimental circumstances and the strategies used by the subjects”, one could argue even further: N2b latency differences as a function of pitch might not just reflect differences in mere processing time between absolute and relative patterns, but might actually indicate wholly different kinds of processes operating at the stage of stimulus comparison or categorisation respectively [100, 103].

P3a

In a passive listening situation Bader et al. [17] reported substantial P3a impoverishments for relative compared to absolute pitch context in the form of a delayed peak (approximately 130 ms) and diminished amplitude. One possible explanation for these effects is the question of relevance [22, 27, 38, 39]. Whereas a new pattern in a context of identical (absolute) pattern repetitions constitutes a clear deviation from the preceding stimulation, the occurrence of a new pattern in a relative pitch code context need not be inherently more relevant to the brain than if it were a transposition of the preceding pattern–in both cases absolute pitch information has changed. Thus, at least the P3a amplitude modulation reported by Bader et al. [17] might have reflected decreased allocation of attentional resources to the true pattern changes.

The current study aimed to assess the validity of that argument by making the true pattern changes but not the transpositions task relevant. There was no evidence that the P3a elicited in response to task relevant true pattern changes is reduced and none that P3a latency was affected in the absence of absolute pitch cues. Whatever differences exist in relative compared to absolute pattern processing in general, there was no evidence in the current study to suggest that they manifest in the processes operating at the stage of P3a when true pattern changes are explicitly relevant. In active listening the processes at P3a level seem to occur relatively independently of whether absolute or relative pitch information defines the patterns.

However, caution should be taken when pondering whether this is evidence that top-down task-relevance of true pattern changes compensates for lack of bottom-up salience [24] in passive listening. It is not entirely clear yet whether P3a represents the same underlying processes in passive and active listening situations [24]. Actually, in the direct comparison of passive and active listening task (Fig 5), it appears that P3a elicited by true pattern changes might be separable into two distinct subcomponents [106–110]–namely an early and a late P3a. When visually comparing both studies, a potential late P3a of similar latency and amplitude was elicited in both studies irrespective of pitch information. However, a potential early P3a was elicited only in response to absolute true pattern changes in passive listening. Thus, this early P3a activity could be a candidate to further elucidate differential pattern processing as a function of pitch in future studies. It might well be the case that processing of relative patterns depends to some degree on the investment of direct attentional resources, while absolute patterns can be processed relatively independently of attention. Then again, in passive listening neither transpositions nor true pattern changes hold behaviourally relevant information for the listener. Even though the human auditory system might be able to discriminate relative patterns from each other just as well as absolute patterns, it might actually be beneficial to ignore the constantly changing auditory environment in a relative pitch context. Whether learned relevance could compensate in passive listening needs to be addressed in future studies including a passive listening task following an active learning phase.

P3b

As expected, a prominent posterior P3b was elicited when a target (i.e. a true pattern change) was correctly identified [21–24, 48]. There was no evidence for an attenuation of P3b amplitude but a clear and pronounced delay in the absence of absolute pitch information.

Within the traditional framework [51, 52] the lack of evidence for modulation of P3b amplitude by pitch information implies that during the revision, or updating of the current mental model in response to the detection of a true pattern change, attentional resources are equally available [22, 38]. A more recent account suggests that P3b rather reflects reactivation of stimulus-response links, and that P3b amplitude increases as a function of response infrequency [111]. The absence of P3b amplitude modulation in the current study would thus be explained by the fact that true pattern changes occurred equiprobably in both absolute and relative pitch context, resulting in comparable response frequencies.

Within both contexts, the increase in P3b latency would signify increased stimulus evaluation time [22, 48, 112]. Indeed, the observed delay was already present at the N2b level, and did not notably increase at the level of P3b. The fact that P3b peaked after the correct behavioural response was made [24] implies that it reflects post-identification stimulus categorisation, response selection and execution processes [24, 48, 49, 55, 56, 111], which are not vulnerable to the absence of pitch information.
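The chronometric reasoning above (a delay that is already present at N2b and does not grow further at P3b) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `stage_delays` and all latency values are hypothetical placeholders chosen to mirror a delay of roughly 70 ms, and they are not the latencies measured in this study.

```python
# Illustrative sketch only: all latencies are hypothetical placeholders,
# NOT values reported in this study.

def stage_delays(absolute_ms, relative_ms):
    """Compare component latencies between two pitch contexts.

    Returns two dicts keyed by processing stage (in the order given):
    - delays: relative-minus-absolute latency difference at each stage
    - added:  the part of that delay newly added at this stage, i.e. the
              delay at this stage minus the delay at the preceding stage
    """
    delays = {stage: relative_ms[stage] - absolute_ms[stage]
              for stage in absolute_ms}
    added = {}
    previous = 0
    for stage in absolute_ms:  # dicts keep insertion order (Python 3.7+)
        added[stage] = delays[stage] - previous
        previous = delays[stage]
    return delays, added


# Hypothetical peak latencies (ms) per component, ordered by processing stage.
absolute_ms = {"MMN": 200, "N2b": 300, "P3b": 500}
relative_ms = {"MMN": 200, "N2b": 370, "P3b": 570}

delays, added = stage_delays(absolute_ms, relative_ms)
print(delays)
print(added)
```

With these placeholder values, the full delay is introduced at the N2b stage and the increment at the P3b stage is zero, which is the pattern the argument describes as a delay that is merely propagated to P3b.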

Conclusion

To conclude, even with specific instruction there is a clear advantage of absolute over relative pitch information in the active discrimination of unfamiliar melodic patterns. When relative pitch has to inform pattern learning and discrimination, the most notable electrophysiological correlates are increased latencies at the level of N2b and P3b. Interestingly, the response delay of approximately 70 ms at the behavioural level already fully manifests at the level of N2b and is merely propagated to the level of P3b. In contrast, MMN and P3a were elicited regardless of which pitch information was available to inform pattern discrimination. In sum, these findings strongly suggest that target selection, rather than a break-down of the auditory change detection process or of later working memory or response-related processes, is at the root of the deterioration in behavioural performance. Specifically, relative compared to absolute pitch processing during active pattern learning may differ in processing time, in its reliance on additional or even altogether different computational processes, or in both.

Supporting information

S1 File. Example of absolute pattern sequence depicted in Fig 1.

(WAV)

S2 File. Example of transposed pattern sequence depicted in Fig 1.

(WAV)

Data Availability

Data underlying the findings are deposited on the OSF repository: (10.17605/OSF.IO/PTGR3).

Funding Statement

The study was funded by the German Research Foundation (www.dfg.de) with the project number GR 3412/2-1. The funder had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1.Bregman AS. Auditory Scene Analysis. 1. MIT Pre. Cambridge, Mass. [u.a.]: MIT Press; 1994. Available: https://mitpress.mit.edu/books/auditory-scene-analysis [Google Scholar]
  • 2.Griffiths TD, Warren JD. What is an auditory object? Nat Rev Neurosci. 2004;5: 887–892. 10.1038/nrn1538 [DOI] [PubMed] [Google Scholar]
  • 3.Saffran JR, Griepentrog GJ. Absolute pitch in infant auditory learning: evidence for developmental reorganization. Dev Psychol. 2001;37: 74–85. 10.1037/0012-1649.37.1.74 [DOI] [PubMed] [Google Scholar]
  • 4.Bendixen A. Predictability effects in auditory scene analysis: A review. Front Neurosci. 2014;8: 1–16. 10.3389/fnins.2014.00001 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5.McDermott JH, Wrobleski D, Oxenham AJ. Recovering sound sources from embedded repetition. Proc Natl Acad Sci. 2011;108: 1188–1193. 10.1073/pnas.1004765108 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Schindler A, Herdener M, Bartels A. Coding of melodic gestalt in human auditory cortex. Cereb Cortex. 2013;23: 2987–93. 10.1093/cercor/bhs289 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Halpern AR, Bartlett JC, Dowling WJ. Perception of mode, rhythm, and contour in unfamiliar melodies: Effects of age and experience. Music Percept. 1998;15: 335–355. 10.2307/40300862 [DOI] [Google Scholar]
  • 8.Bartlett JC, Dowling WJ. Recognition of transposed melodies: A key-distance effect in developmental perspective. J Exp Psychol Hum Percept Perform. 1980;6: 501–515. 10.1037//0096-1523.6.3.501 [DOI] [PubMed] [Google Scholar]
  • 9.McDermott JH, Lehr AJ, Oxenham AJ. Is relative pitch specific to pitch? Psychol Sci. 2008;19: 1263–1271. 10.1111/j.1467-9280.2008.02235.x [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10.Trainor LJ, McDonald KL, Alain C. Automatic and controlled processing of melodic contour and interval information measured by electrical brain activity. J Cogn Neurosci. 2002;14: 430–442. 10.1162/089892902317361949 [DOI] [PubMed] [Google Scholar]
  • 11.Saffran JR. Absolute pitch in infancy and adulthood: the role of tonal structure. Dev Sci. 2003;6: 35–43. 10.1111/1467-7687.00250 [DOI] [Google Scholar]
  • 12.Saarinen J, Paavilainen P, Schröger E, Tervaniemi M, Näätänen R. Representation of abstract attributes of auditory stimuli in the human brain. NeuroReport. 1992. pp. 1149–1151. 10.1097/00001756-199212000-00030 [DOI] [PubMed] [Google Scholar]
  • 13.Peretz I, Zatorre RJ. Brain Organization for Music Processing. Annu Rev Psychol. 2005;56: 89–114. 10.1146/annurev.psych.56.091103.070225 [DOI] [PubMed] [Google Scholar]
  • 14.Picton TW, Alain C, Otten L, Ritter W, Achim A. Mismatch negativity: Different water in the same river. Audiology and Neuro-Otology. 2000. pp. 111–139. 10.1159/000013875 [DOI] [PubMed] [Google Scholar]
  • 15.Paavilainen P. The mismatch-negativity (MMN) component of the auditory event-related potential to violations of abstract regularities: A review. Int J Psychophysiol. 2013;88: 109–123. 10.1016/j.ijpsycho.2013.03.015 [DOI] [PubMed] [Google Scholar]
  • 16.Winkler I. Interpreting the mismatch negativity. J Psychophysiol. 2007;21: 147–163. 10.1027/0269-8803.21.34.147 [DOI] [Google Scholar]
  • 17.Bader M, Schröger E, Grimm S. How regularity representations of short sound patterns that are based on relative or absolute pitch information establish over time: An EEG study. Brattico E, editor. PLoS One. 2017;12: e0176981. 10.1371/journal.pone.0176981 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18.Schröger E. The mismatch negativity as a tool to study auditory processing. Acta Acust United with Acust. 2005;91: 490–501. [Google Scholar]
  • 19.Näätänen R, Paavilainen P, Rinne T, Alho K. The mismatch negativity (MMN) in basic research of central auditory processing: A review. Clin Neurophysiol. 2007;118: 2544–2590. 10.1016/j.clinph.2007.04.026 [DOI] [PubMed] [Google Scholar]
  • 20.Escera C, Alho K, Schröger E, Winkler I. Involuntary attention and distractibility as evaluated with event-related brain potentials. Audiol Neurootol. 2000;5: 151–166. 10.1159/000013877 [DOI] [PubMed] [Google Scholar]
  • 21.Polich J, Criado JR. Neuropsychology and neuropharmacology of P3a and P3b. Int J Psychophysiol. 2006;60: 172–185. 10.1016/j.ijpsycho.2005.12.012 [DOI] [PubMed] [Google Scholar]
  • 22.Polich J. Updating P300: An integrative theory of P3a and P3b. Clin Neurophysiol. 2007;118: 2128–2148. 10.1016/j.clinph.2007.04.019 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23.Linden DEJ. The P300: where in the brain is it produced and what does it tell us? Neurosci. 2005;11: 563–576. 10.1177/1073858405280524 [DOI] [PubMed] [Google Scholar]
  • 24.Dien J, Spencer KM, Donchin E. Parsing the late positive complex: Mental chronometry and the ERP components that inhabit the neighborhood of the P300. Psychophysiology. 2004;41: 665–678. 10.1111/j.1469-8986.2004.00193.x [DOI] [PubMed] [Google Scholar]
  • 25.Wetzel N, Schröger E, Widmann A. The dissociation between the P3a event-related potential and behavioral distraction. Psychophysiology. 2013;50: 920–930. 10.1111/psyp.12072 [DOI] [PubMed] [Google Scholar]
  • 26.Berti S, Roeber U, Schröger E. Bottom-up influences on working memory: Behavioral and electrophysiological distraction varies with distractor strength. Exp Psychol. 2004;51: 249–257. 10.1027/1618-3169.51.4.249 [DOI] [PubMed] [Google Scholar]
  • 27.Horváth J, Winkler I, Bendixen A. Do N1/MMN, P3a, and RON form a strongly coupled chain reflecting the three stages of auditory distraction? Biol Psychol. 2008;79: 139–147. 10.1016/j.biopsycho.2008.04.001 [DOI] [PubMed] [Google Scholar]
  • 28.Schröger E, Bendixen A, Denham SL, Mill RW, Bohm TM, Winkler I. Predictive regularity representations in violation detection and auditory stream segregation: From conceptual to computational models. Brain Topography. 2014. pp. 565–577. 10.1007/s10548-013-0334-6 [DOI] [PubMed] [Google Scholar]
  • 29.Friedman D, Cycowicz YM, Gaeta H. The novelty P3: an event-related brain potential (ERP) sign of the brain’s evaluation of novelty. Neurosci Biobehav Rev. 2001;25: 355–373. 10.1016/s0149-7634(01)00019-7 [DOI] [PubMed] [Google Scholar]
  • 30.Schellenberg EG, Poon J, Weiss MW. Memory for melody and key in childhood. PLoS One. 2017;12. 10.1371/journal.pone.0187115 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31.Schellenberg EG, Habashi P. Remembering the melody and timbre, forgetting the key and tempo. Mem Cogn. 2015;43: 1021–1031. 10.3758/s13421-015-0519-1 [DOI] [PubMed] [Google Scholar]
  • 32.Foster N, Zatorre RJ. A role for the intraparietal sulcus in transforming musical pitch information. Cereb Cortex. 2010;20: 1350–1359. 10.1093/cercor/bhp199 [DOI] [PubMed] [Google Scholar]
  • 33.Zatorre RJ, Delhommeau K, Zarate JM. Modulation of auditory cortex response to pitch variation following training with microtonal melodies. Front Psychol. 2012;3: 1–17. 10.3389/fpsyg.2012.00001 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34.Tervaniemi M, Rytkönen M, Schröger E, Ilmoniemi RJ, Näätänen R. Superior formation of cortical memory traces for melodic patterns in musicians. Learn Mem. 2001;8: 295–300. 10.1101/lm.39501 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35.Fujioka T, Trainor LJ, Ross B, Kakigi R, Pantev C. Musical training enhances automatic encoding of melodic contour and interval structure. J Cogn Neurosci. 2004;16: 1010–1021. 10.1162/0898929041502706 [DOI] [PubMed] [Google Scholar]
  • 36.He C, Hotson L, Trainor LJ. Development of infant mismatch responses to auditory pattern changes between 2 and 4 months old. Eur J Neurosci. 2009;29: 861–867. 10.1111/j.1460-9568.2009.06625.x [DOI] [PubMed] [Google Scholar]
  • 37.Seppänen M, Brattico E, Tervaniemi M. Practice strategies of musicians modulate neural processing and the learning of sound-patterns. Neurobiol Learn Mem. 2007;87: 236–247. 10.1016/j.nlm.2006.08.011 [DOI] [PubMed] [Google Scholar]
  • 38.Schiff S, Valenti P, Andrea P, Lot M, Bisiacchi P, Gatta A, et al. The effect of aging on auditory components of event-related brain potentials. Clin Neurophysiol. 2008;119: 1795–1802. 10.1016/j.clinph.2008.04.007 [DOI] [PubMed] [Google Scholar]
  • 39.Rosburg T, Weigl M, Thiel R, Mager R. The event-related potential component P3a is diminished by identical deviance repetition, but not by non-identical repetitions. Exp Brain Res. 2018;236: 1519–1530. 10.1007/s00221-018-5237-z [DOI] [PubMed] [Google Scholar]
  • 40.Tervaniemi M, Huotilainen M, Brattico E. Melodic multi-feature paradigm reveals auditory profiles in music-sound encoding. Front Hum Neurosci. 2014;8: 1–10. 10.3389/fnhum.2014.00001 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 41.Bendixen A, Schröger E. Memory trace formation for abstract auditory features and its consequences in different attentional contexts. Biol Psychol. 2008;78: 231–241. 10.1016/j.biopsycho.2008.03.005 [DOI] [PubMed] [Google Scholar]
  • 42.Sussman E, Chen S, Sussman-Fort J, Dinces E. The five myths of MMN: Redefining how to Use MMN in basic and clinical research. Brain Topogr. 2014;27: 553–564. 10.1007/s10548-013-0326-6 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 43.Näätänen R, Simpson M, Loveless NE. Stimulus deviance and evoked potentials. Biol Psychol. 1982;14: 53–98. 10.1016/0301-0511(82)90017-5 [DOI] [PubMed] [Google Scholar]
  • 44.Ritter W, Simson R, Vaughan HG. Event-related potential correlates of two stages of information processing in Physical and Semantic Discrimination Tasks. Psychophysiology. 1983;20: 168–179. 10.1111/j.1469-8986.1983.tb03283.x [DOI] [PubMed] [Google Scholar]
  • 45.Ritter W, Paavilainen P, Lavikainen J, Reinikainen K, Alho K, Sams M, et al. Event-related potentials to repetition and change of auditory stimuli. Electroencephalography Clin Neurophysiol. 1992. 10.1016/0013-4694(92)90090-5 [DOI] [PubMed] [Google Scholar]
  • 46.Breton F, Ritter W, Simson R, Vaughan HG. The N2 component elicited by stimulus matches and multiple targets. Biol Psychol. 1988;27: 23–44. 10.1016/0301-0511(88)90003-8 [DOI] [PubMed] [Google Scholar]
  • 47.Sams M, Alho K, Näätänen R. Sequential effects on the ERP in discriminating two stimuli. Biol Psychol. 1983;17: 41–58. 10.1016/0301-0511(83)90065-0 [DOI] [PubMed] [Google Scholar]
  • 48.Patel SH, Azzam PN. Characterization of N200 and P300: Selected studies of the Event-Related Potential. Int J Med Sci. 2005;2: 147–154. 10.7150/ijms.2.147 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 49.Falkenstein M, Hohnsbein J, Hoormann J. Effects of choice complexity on different subcomponents of the late positive complex of the event-related potential. Electroencephalogr Clin Neurophysiol Evoked Potentials. 1994;92: 148–160. 10.1016/0168-5597(94)90055-8 [DOI] [PubMed] [Google Scholar]
  • 50.Spencer K, Dien J, Donchin E. Spatiotemporal analysis of the late ERP to deviant stimuli. Psychophysiology. 2001;38: 343–358. 10.1111/1469-8986.3820343 [DOI] [PubMed] [Google Scholar]
  • 51.McCarthy G, Donchin E. A metric for thought: a comparison of P300 latency and reaction time. Science. 1981. pp. 77–80. 10.1126/science.7444452 [DOI] [PubMed] [Google Scholar]
  • 52.Donchin E. Surprise! … Surprise? Psychophysiology. 1981;18: 493–513. 10.1111/j.1469-8986.1981.tb01815.x [DOI] [PubMed] [Google Scholar]
  • 53.Magliero A, Bashore TR, Coles MGH, Donchin E. On the Dependence of P300 Latency on Stimulus Evaluation Processes. Psychophysiology. 1984. pp. 171–186. 10.1111/j.1469-8986.1984.tb00201.x [DOI] [PubMed] [Google Scholar]
  • 54.Duncan-Johnson C. P300 latency: A new metric for information processing. Psychophysiology. 1981;18: 207–215. 10.1111/j.1469-8986.1981.tb03020.x [DOI] [PubMed] [Google Scholar]
  • 55.Pfefferbaum A, Christensen C, Ford JM, Kopell BS. Apparent response incompatibility effects on P3 latency depend on the task. Electroencephalogr Clin Neurophysiol. 1986;64: 424–437. 10.1016/0013-4694(86)90076-3 [DOI] [PubMed] [Google Scholar]
  • 56.Verleger R. On the utility of P3 latency as an index of mental chronometry. Psychophysiology. 1997;34: 131–156. 10.1111/j.1469-8986.1997.tb02125.x [DOI] [PubMed] [Google Scholar]
  • 57.Fabiani M, Karis D, Donchin E. P300 and recall in an incidental memory paradigm. Psychophysiology. 1986;23: 298–308. 10.1111/j.1469-8986.1986.tb00636.x [DOI] [PubMed] [Google Scholar]
  • 58.Kleiner M, Brainard D, Pelli D. What’s new in Psychtoolbox 3. Perception. 2007;36. [Google Scholar]
  • 59.Cowan N, Winkler I, Teder W, Näätänen R. Memory prerequisites of mismatch negativity in the auditory event-related potential (ERP). J Exp Psychol Learn Mem Cogn. 1993;19: 909–921. 10.1037//0278-7393.19.4.909 [DOI] [PubMed] [Google Scholar]
  • 60.Baldeweg T, Klugman A, Gruzelier JH, Hirsch SR. Impairment in frontal but not temporal components of mismatch negativity in schizophrenia. Int J Psychophysiol. 2002;43: 111–122. 10.1016/s0167-8760(01)00183-0 [DOI] [PubMed] [Google Scholar]
  • 61.Hautus MJ, Lee A. Estimating sensitivity and bias in a yes/no task. Br J Math Stat Psychol. 2006;59: 257–273. 10.1348/000711005X65753 [DOI] [PubMed] [Google Scholar]
  • 62.Delorme A, Makeig S. EEGLAB: An open source toolbox for analysis of single-trial EEG dynamics including independent component analysis. J Neurosci Methods. 2004;134: 9–21. 10.1016/j.jneumeth.2003.10.009 [DOI] [PubMed] [Google Scholar]
  • 63.Widmann A, Schröger E. Filter Effects and Filter Artifacts in the Analysis of Electrophysiological Data. Front Psychol. 2012;3: 1–5. 10.3389/fpsyg.2012.00001 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 64.Widmann A, Schröger E, Maess B. Digital filter design for electrophysiological data—a practical approach. J Neurosci Methods. 2015;250: 34–46. 10.1016/j.jneumeth.2014.08.002 [DOI] [PubMed] [Google Scholar]
  • 65.Makeig S, Bell AJ, Jung T-P, Sejnowski T. Independent component analysis of electroencephalographic data. Advances in neural information processing systems. MIT Press. 1996;8: 145–151. [Google Scholar]
  • 66.Chaumon M, Bishop DVM, Busch NA. A practical guide to the selection of independent components of the electroencephalogram for artifact correction. J Neurosci Methods. 2015;250: 47–63. 10.1016/j.jneumeth.2015.02.025 [DOI] [PubMed] [Google Scholar]
  • 67.Mognon A, Jovicich J, Bruzzone L, Buiatti M. ADJUST: An automatic EEG artifact detector based on the joint use of spatial and temporal features. Psychophysiology. 2011;48: 229–240. 10.1111/j.1469-8986.2010.01061.x [DOI] [PubMed] [Google Scholar]
  • 68.Nolan H, Whelan R, Reilly RB. FASTER: Fully Automated Statistical Thresholding for EEG artifact Rejection. J Neurosci Methods. 2010;192: 152–162. 10.1016/j.jneumeth.2010.07.015 [DOI] [PubMed] [Google Scholar]
  • 69.Perrin F, Pernier J, Bertrand O. Spherical splines for scalp potential and current density mapping. Electroencephalogr Clin Neurophysiol. 1989;72: 184–187. 10.1016/0013-4694(89)90180-6 [DOI] [PubMed] [Google Scholar]
  • 70.Kiesel A, Miller J, Jolicoeur P, Brisson B. Measurement of ERP latency differences: A comparison of single-participant and jackknife-based scoring methods. Psychophysiology. 2008;45: 250–274. 10.1111/j.1469-8986.2007.00618.x [DOI] [PubMed] [Google Scholar]
  • 71.R Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing. Vienna, Austria: RStudio, Inc.; 2018. Available: http://www.r-project.org [Google Scholar]
  • 72.Lawrence MA. ez–Easy analysis and visualization of factorial experiments. 2012. Available: http://www.rdocumentation.org/packages/ez [Google Scholar]
  • 73.Pollard KS, Dudoit S, van der Laan MJ. Multiple Testing Procedures: the multtest Package and Applications to Genomics. In: Gentleman R, Carey VJ, Huber W, Irizarry RA, Dudoit S, editors. Bioinformatics and Computational Biology Solutions Using R and Bioconductor. New York, NY: Springer New York; 2005. pp. 249–271. 10.1007/0-387-29362-0_15 [DOI] [Google Scholar]
  • 74.Wickham H. ggplot2: Elegant Graphics for Data Analysis. 2nd ed. New York: Springer-Verlag New York; 2016. Available: https://ggplot2.tidyverse.org/authors.html [Google Scholar]
  • 75.Ulrich R, Miller J. Using the jackknife-based scoring method for measuring LRP onset effects in factorial designs. Psychophysiology. 2001;38: 816–827. 10.1111/1469-8986.3850816 [DOI] [PubMed] [Google Scholar]
  • 76.Karoly LA, Schroder C. Fast Methods for Jackknifing Inequality Indices. RAND Work Pap Ser WR-1017. 2013; 18. 10.2139/ssrn.2341310 [DOI]
  • 77.Benjamini Y, Krieger AM, Yekutieli D. Adaptive linear step-up procedures that control the false discovery rate. Biometrika. 2006;93: 491–507. 10.1093/biomet/93.3.491 [DOI] [Google Scholar]
  • 78.Stevens JR, Masud A Al, Suyundikov A. A comparison of multiple testing adjustment methods with block-correlation positively dependent tests. PLoS One. 2017;12. 10.1371/journal.pone.0176124 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 79.Coy N, Bader M, Schröger E, Grimm S. Change detection of auditory tonal patterns defined by absolute versus relative pitch information. A combined behavioural and EEG study. OSF; 2021. 10.17605/OSF.IO/PTGR3 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 80.Schmuckler MA. Tonality and Contour in Melodic Processing. 2nd ed. In: Hallam S, Cross I, Thaut M, editors. The Oxford Handbook of Music Psychology. 2nd ed. Oxford University Press; 2016. 10.3109/02699052.2016.1147077 [DOI] [Google Scholar]
  • 81.Dubus G, Bresin R. A systematic review of mapping strategies for the sonification of physical quantities. PLoS ONE. 2013. p. 82491. 10.1371/journal.pone.0082491 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 82.Katayama J, Polich J. Stimulus context determines P3a and P3b. Psychophysiology. 1998;35: 23–33. 10.1017/s0048577298961479 [DOI] [PubMed] [Google Scholar]
  • 83.Polich J. Task difficulty, probability, and inter-stimulus interval as determinants of P300 from auditory stimuli. Electroencephalogr Clin Neurophysiol Evoked Potentials. 1987;68: 311–320. 10.1016/0168-5597(87)90052-9 [DOI] [PubMed] [Google Scholar]
  • 84.Barkaszi I, Czigler I, Balázs L. Stimulus complexity effects on the event-related potentials to task-irrelevant stimuli. Biol Psychol. 2013;94: 82–89. 10.1016/j.biopsycho.2013.05.007 [DOI] [PubMed] [Google Scholar]
  • 85.Cuddy LL, Cohen A, Mewhort DJ. Perception of structure in short melodic sequences. J Exp Psychol Hum Percept Perform. 1981;7: 869–883. 10.1037//0096-1523.7.4.869 [DOI] [PubMed] [Google Scholar]
  • 86.Fitzgerald K, Todd J. Hierarchical timescales of statistical learning revealed by mismatch negativity to auditory pattern deviations. Neuropsychologia. 2018;120: 25–34. 10.1016/j.neuropsychologia.2018.09.015 [DOI] [PubMed] [Google Scholar]
  • 87.Horváth J, Czigler I, Jacobsen T, Maess B, Schröger E, Winkler I. MMN or no MMN: No magnitude of deviance effect on the MMN amplitude. Psychophysiology. 2008;45: 60–69. 10.1111/j.1469-8986.2007.00599.x [DOI] [PubMed] [Google Scholar]
  • 88.Fitzgerald K, Todd J. Making Sense of Mismatch Negativity. Front Psychiatry. 2020;11: 1–19. 10.3389/fpsyt.2020.00001 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 89.Näätänen R, Sussman E, Salisbury D, Shafer VL. Mismatch negativity (MMN) as an index of cognitive dysfunction. Brain Topogr. 2014;27: 451–466. 10.1007/s10548-014-0374-6 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 90.Sussman E. A new view on the MMN and attention debate: The role of context in processing auditory events. J Psychophysiol. 2007;21: 164–175. 10.1027/0269-8803.21.34.164 [DOI] [Google Scholar]
  • 91.Potts GF, Dien J, Hartry-Speiser AL, McDougal LM, Tucker DM. Dense sensor array topography of the event-related potential to task- relevant auditory stimuli. Electroencephalogr Clin Neurophysiol. 1998;106: 444–456. 10.1016/s0013-4694(97)00160-0 [DOI] [PubMed] [Google Scholar]
  • 92.Wei JH, Chan TC, Luo YJ. A modified oddball paradigm ‘cross-modal delayed response’ and the research on mismatch negativity. Brain Res Bull. 2002;57: 221–230. 10.1016/s0361-9230(01)00742-0 [DOI] [PubMed] [Google Scholar]
  • 93.Novak GP, Ritter W, Vaughan HG, Wiznitzer ML. Differentiation of Negative Event-Related Potentials in an Auditory-Discrimination Task. Electroencephalogr Clin Neurophysiol. 1990;75: 255–275. 10.1016/0013-4694(90)90105-s [DOI] [PubMed] [Google Scholar]
  • 94.Tervaniemi M, Schröger E, Saher M, Näätänen R. Effects of spectral complexity and sound duration on automatic complex-sound pitch processing in humans—A mismatch negativity study. Neurosci Lett. 2000;290: 66–70. 10.1016/s0304-3940(00)01290-8 [DOI] [PubMed] [Google Scholar]
  • 95.Alho K, Tervaniemi M, Huotilainen M, Lavikainen J, Tiitinen H, Ilmoniemi RJ, et al. Processing of complex sounds in the human auditory cortex as revealed by magnetic brain responses. Psychophysiology. 1996;33: 369–375. 10.1111/j.1469-8986.1996.tb01061.x [DOI] [PubMed] [Google Scholar]
  • 96.Alho K, Sinervo N. Pre-attentive processing of complex auditory information in the human brain. Neurosci Lett. 1997;233: 33–36. 10.1016/s0304-3940(97)00620-4 [DOI] [PubMed] [Google Scholar]
  • 97.Schröger E, Bendixen A, Trujillo-Barreto N, Roeber U. Processing of abstract rule violations in audition. PLoS One. 2007;2. 10.1371/journal.pone.0001131 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 98.Fitzgerald PG, Picton TW. Event-Related Potentials Recorded During The Discrimination of Improbable Stimuli. Biol Psychol. 1983;17: 241–276. 10.1016/0301-0511(83)90003-0 [DOI] [PubMed] [Google Scholar]
  • 99.Ford JM, Roth WT, Kopell BS. Auditory Evoked Potentials to Unpredictable Shifts in Pitch. Psychophysiology. 1976;13: 32–39. 10.1111/j.1469-8986.1976.tb03333.x [DOI] [PubMed] [Google Scholar]
  • 100.Iragui VJ, Kuta M, Mitchiner MR, Hillyard SA. Effects of aging on event-related brain potentials and reaction times in an auditory oddball task. Psychophysiology. 2007;30: 10–22. 10.1111/j.1469-8986.1993.tb03200.x [DOI] [PubMed] [Google Scholar]
  • 101.Pontifex MB, Hillman CH, Polich J. Age, physical fitness, and attention: P3a and P3b. Psychophysiology. 2009;46: 379–387. 10.1111/j.1469-8986.2008.00782.x [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 102.Gunter TC, Jackson JL, Mulder G. Focussing on aging: an electrophysiological exploration of spatial and attentional processing during reading. Biol Psychol. 1996;43: 103–145. Available: https://ac.els-cdn.com/0301051195051805/1-s2.0-0301051195051805-main.pdf?_tid=56d2bfa4-81e2-46ea-9092-219cc04b0749&acdnat=1547031395_c47fd6331232d52b1383c29755567cd2 10.1016/0301-0511(95)05180-5 [DOI] [PubMed] [Google Scholar]
  • 103.Amenedo E, Díaz F. Automatic and effortful processes in auditory memory reflected by event- related potentials. Age-related findings. Electroencephalogr Clin Neurophysiol—Evoked Potentials. 1998;108: 361–369. 10.1016/s0168-5597(98)00007-0 [DOI] [PubMed] [Google Scholar]
  • 104.Sunohara G, Malone MA, Rovet J, Humphries T, Roberts W, Taylor MJ. Effect of Methylphenidate on Attention in Children with Attention Deficit Hyperactivity Disorder (ADHD) ERP Evidence. Neuropsychopharmacology. 1999;21: 218–228. 10.1016/S0893-133X(99)00023-8 [DOI] [PubMed] [Google Scholar]
  • 105.Ritter W, Simson R, Vaughan H, Macht M. Manipulation of event-related potential manifestations of information processing stages. Science. 1982;218: 909–911. 10.1126/science.7134983 [DOI] [PubMed] [Google Scholar]
  • 106.Escera C, Alho K, Winkler I, Näätänen R. Neural Mechanisms of Involuntary Attention. J Cogn Neurosci. 1998;10: 590–604. 10.1162/089892998562997 [DOI] [PubMed] [Google Scholar]
  • 107.Masson R, Bidet-Caulet A. Fronto-central P3a to distracting sounds: An index of their arousing properties. Neuroimage. 2019;185: 164–180. 10.1016/j.neuroimage.2018.10.041 [DOI] [PubMed] [Google Scholar]
  • 108.Barry RJ, Steiner GZ, De Blasio FM. Reinstating the Novelty P3. Sci Rep. 2016;6: 1–13. 10.1038/s41598-016-0001-8 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 109.Bonmassar C, Widmann A, Wetzel N. The impact of novelty and emotion on attention-related neuronal and pupil responses in children. Dev Cogn Neurosci. 2020;42: 1878–9293. 10.1016/j.dcn.2020.100766 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 110.Yago E, Escera C, Alho K, Giard MH, Serra-Grabulosa JM. Spatiotemporal dynamics of the auditory novelty-P3 event-related brain potential. Cogn Brain Res. 2003;16: 383–390. 10.1016/s0926-6410(03)00052-1 [DOI] [PubMed] [Google Scholar]
  • 111.Verleger R. Effects of relevance and response frequency on P3b amplitudes: Review of findings and comparison of hypotheses about the process reflected by P3b. Psychophysiology. 2020;57: 1–22. 10.1111/psyp.13542 [DOI] [PubMed] [Google Scholar]
  • 112.Kutas M, McCarthy G, Donchin E. Augmenting mental chronometry: the P300 as a measure of stimulus evaluation time. Science. 1977;197: 792–795. 10.1126/science.887923 [DOI] [PubMed] [Google Scholar]

Decision Letter 0

Qian-Jie Fu

18 Nov 2020

PONE-D-20-25390

Change detection of auditory tonal patterns defined by absolute versus relative pitch information. A combined behavioural and EEG study.

PLOS ONE

Dear Dr. Coy,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Note both reviewers have major concerns about the data analysis. It is highly recommended to include the additional source space analysis or PCA analysis in the revision. Such analyses may be critical to support the conclusion in the present study. 

Please submit your revised manuscript by Jan 02 2021 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

We look forward to receiving your revised manuscript.

Kind regards,

Qian-Jie Fu, Ph.D.

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Partly

Reviewer #2: Partly

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: No

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: This study describes the behavioural and spatiotemporal EEG dynamics associated with detecting melodic oddball deviants that differ in either absolute pitch or relative pitch. A primary purpose was to confirm that complex auditory representations are “automatically” coded in memory even in the absence of absolute pitch information. A secondary purpose was to describe any behavioural (RT) and spatio-temporal EEG differences in so-called “automatic” (as indexed by the MMN and maybe the P3a) and “intentional” (as indexed by the N2b and P3b) processes involved in the active detection of auditory irregularities defined by either absolute pitch deviants or relative pitch deviants. The main results show that while MMN and P3a latencies are similar when detecting absolute and relative pitch deviants, N2b and P3b latencies are longer to relative pitch deviants compared to absolute pitch deviants. This suggests that complex auditory representations may be coded automatically even in the absence of absolute pitch information, while intentional processes are slower to respond to auditory representations in the absence of absolute pitch information.

In general I think this work is well done; the goals are properly-justified, the study design was well thought-out and executed, analyses were mostly clear and properly-justified (but see major revisions), and the results provide interesting insight into the temporal dissociation between automatic coding of auditory representations and intentional higher-order processing (attention, stimulus matching) (but again, see major revisions). I have one major concern and a few minor concerns. I am, therefore, requesting accept with Major Revisions, but I have every confidence the major revision will be addressed.

Major Revisions

Please explain in more detail how you determined that there is, in fact, both an MMN and an N2b in the absolute pitch condition. While I’m sympathetic to the argument that there “should” be both components present in the absolute condition, and that the difference wave in Figure 3 represents a likely spatiotemporal overlap between the two conditions, I am, however, concerned that the MMN and N2b scalp maps look very similar. My understanding is that the N2b, unlike the MMN, does not typically have an inversion at temporal sites (e.g. Näätänen & Gaillard, 1983; Sussman et al., 2002, see Fig 2), but the N2b scalp map shown in Figure 3 appears to have just as much source located around temporal/posterior sites as does the MMN scalp map. Since one of the main conclusions of the study is that automatic coding of auditory patterns occurs at a similar timecourse regardless of whether absolute pitch information is available (as indexed by the latency of the MMN), but that intentional processes are slower to detect relevant pitch deviants (as indexed by the latency of the N2b as well as the P3b), it’s vital to your argument that you make clear that both the MMN and N2b were, in fact, generated in each condition.

References:

Näätänen, R., & Gaillard, A. W. K. (1983). 5 The Orienting Reflex and the N2 Deflection of the Event-Related Potential (ERP). In A. W. K. Gaillard & W. Ritter (Eds.), Advances in Psychology (Vol. 10, pp. 119–141). North-Holland. https://doi.org/10.1016/S0166-4115(08)62036-1

Sussman, E., Winkler, I., Huotilainen, M., Ritter, W., & Näätänen, R. (2002). Top-down effects can modify the initially stimulus-driven auditory organization. Cognitive Brain Research, 13(3), 393–405. https://doi.org/10.1016/S0926-6410(01)00131-8

Similarly, please make clearer how you determined the latency of the N2b given the overlap between MMN and N2b. You explain that the MMN latency was defined as the point at which the initial negative deflection reached 60% of the peak amplitude, but it’s not clear how you determined the latency for the N2b. Which slope and which peak amplitude did you use to calculate the latency of the N2b in the absolute condition?

Finally, please make clearer from which channels you extracted the data for your statistical comparisons. From Table 1 it looks like you picked Fz for the MMN and FCz for the N2b. Why did you choose these channels? The table also seems to show that peak difference wave amplitude for the transposed condition, for which there is potentially a clear MMN, was somewhere in the FC region (this is also confusing, see minor revisions), while the N2b in that same condition peaked somewhere in the F region. There are similar discrepancies for P3a and P3b subcomponents.

Minor Revisions

General note: please make sure all your variables are italicized (namely d’s and η^2)

Methods section: if it’s possible and not too much trouble, I’d appreciate a rough estimation of when the deviants actually deviated from the preceding standards (i.e. which note within the deviant stimulus was unexpected relative to the previous standard). I understand difference waves weren’t calculated this way (deviant – immediately preceding standard), but I’d like to know approximately when the average deviation occurred. You state that in the absolute condition the first note was always fixed at 400 Hz, and then you imply that the second note consistently deviated from the previous standard, but I’m not sure if I understood you correctly. This will help provide some context for the ERP latencies you’re reporting, since presumably the latencies (particularly of the early components) reflect about when the auditory and attentional systems detected a violation of the ongoing auditory regularity. If that varied a lot deviant by deviant, and if one condition varied more than the other, then the meaning of the latency differences between conditions becomes harder to interpret. At the very least I think you should provide a range of which notes could have represented deviations from the preceding standards, and perhaps address this potential issue in the discussion.

Methods section: how many different tone patterns were there? Were they randomly generated or predefined?

Results section: please remind the reader of the timeframes you’re using to define each subcomponent in their subsequent “latency” section of the results

Table 1: Why are some of the channel labels in EP ambiguous (i.e. F instead of Fz or F1)? Why not state exactly which channels each row is referring to? Do the rows represent averaged data from multiple channels? If so which channels? Please clarify this in the caption of the table.

Lines 59-60: “…facilitating the recognition of auditory objects despite large variation in spectral features”. This statement should have a reference.

Line 69: please supply more references that speak to the “rich body of research”

Line 88: “statistically robustly”. I assume you mean that there was a strong positive correlation between deviant ERP amplitude and the number of preceding standard stimuli? If that’s the case then I don’t think you need to include “statistically robustly”, as it reads as a subjective evaluation of the correlation. It’s enough to say that the amplitude increased as a function of the number of preceding stimuli.

Lines 107-109: Please support this sentence with a reference.

Line 564: “the reported P3a amplitude decrease and latency increase…”. Fig 4a shows that P3a latencies were not different. Please clarify.

Lines 565-568: Please include references for how the P3a is associated with any of the listed stimulus features/cognitive demands

Line 586: “notably and significantly reduced…”. Please refrain from using subjective evaluations (notably) when statistical comparisons suffice (statistically).

Reviewer #2: This paper is a follow-up of a previous paper by Bader et al., in which, during a passive listening paradigm, no difference in MMN was found in response to pattern violations with absolute pitch or relative pitch changes. That paper did, however, find decreased amplitude and increased latency of P3a in the relative pitch compared to the absolute pitch condition. The present paper conducts a similar study, but now in an active paradigm in which participants are asked to indicate when a repeating stimulus pattern changes, in conditions involving absolute versus relative pitch cues. The basic findings were that P3a was now similar in amplitude and latency across the relative and absolute pitch conditions, and that N2b and P3b were both delayed in the relative pitch compared to the absolute pitch condition.

The paper is interesting in attempting to dissect the stages of processing in order to determine where relative and absolute pitch information in patterns are processed similarly and differently. However I have some concerns.

1. A general difficulty with the approach is that there is not wide consensus in the literature as to exactly what these 4 components represent, leaving the paper with considerable speculation as to the meaning of the results.

2. The literature review on MMN, P3a and relative pitch is not very complete. For example, it is stated “Since then, a rich body of research has emerged, enabling a better understanding of the processing of complex sounds”, with no references given. For example, the work of Trainor and colleagues showed MMN and P3a responses for relative pitch information in non-musicians (e.g., 2002, J Cog Neuro) and infants (e.g., He et al., 2009 Eur J Neurosci). As well, musical training may significantly modify these brain responses to relative pitch information (e.g., Fujioka et al., 2004; J Cog Neuro), as may exposure to a tonal language, so the generalizability of the results should be considered in the discussion.

2. Methodologically, I think several major issues need to be addressed as follows.

2a. Stimuli: the stimulus patterns are very brief with the tones comprising them only 50 ms. The whole patterns are only 300 ms. Given that the temporal window of integration is 150 – 200 ms, it is not clear that these are perceived as patterns in the sense of a sequence of events. Rather they are likely processed as single units.

2b. The 4 components of the ERPs (MMN, N2b, P3, P3b) are difficult to separate and extract using waveforms from surface electrodes. Especially given the different topographies of the components that overlap, and the differences between conditions, the analyses should be done after source localization of each component has been done. Right now, not only can the components not be cleanly separated, but the ANOVA statistical approach is complicated by many factors and complex interactions. This makes the paper very difficult to read and raises the problem of many statistical tests. All of this would be much simpler and cleaner if done in source space. Furthermore, if I understand correctly, single electrodes were analyzed, thus ignoring the rich information available from having collected 64 channels of data. Again, source space analysis would include all information across the scalp. Alternatively, some kind of PCA analysis might also be able to better separate and characterize the components by utilizing all of the data from all electrodes over time.

2c. The jackknife approach needs more explanation. It is not clear if it was applied across participants or whether an individual latency was calculated for each participant.

2d. The use of 60% and 100% of the peak of some of the components was not clear and seems arbitrary, apparently in order to argue that some effects are significant at 60% even though not at 100%. This needs to be justified or should be removed.

2e. There is a fundamental problem in separating and distinguishing the MMN and N2b components in the absolute pitch condition. Thus, without doing some kind of source localization, both the MMN and N2b results are unclear.

3. Results/interpretation

3a. MMN was actually found to be a bit larger in absolute condition compared to the relative condition, which is actually not in agreement with Bader et al. However, this is inconclusive as, with the current analysis methods, it was not really possible to separate MMN and N2b. The authors argue that because of this MMN is not really different across conditions, but I think this remains unknown.

3b. With respect to N2b it was found to be later in the relative than absolute pitch condition. However, this is not completely convincing as the MMN and N2b could not be completely separated. This delayed N2b for relative pitch is interpreted as increased discrimination demands, but it seems that other interpretations might be possible.

3c. The amplitude of the N2b was “descriptively” smaller in amplitude for the relative pitch case, but not statistically different. A somewhat convoluted argument is made that N2b might reflect both “representational strength” and “attention” and that the lack of significant amplitude difference might reflect some combination of reduced “representational strength” and increased “attention” in the relative pitch case. This seems overly speculative.

3d. No amplitude or latency differences were found for P3a for relative and absolute conditions. This is consistent with the idea that top-down processes (attention?) compensate for any increased difficulty of the relative pitch condition. But the authors go on to state that absolute and relative pitch information might just be processed “differently”. I couldn’t follow this argument.

3e. The P3b result was quite clear – no amplitude difference but a longer latency for the relative than the absolute condition.

3f. From the results, the N2b and P3b were both delayed in the relative compared to absolute condition. The authors interpret the N2b as representing “target selection” and the P3b as representing change detection, or working memory or response related processing. Therefore, they conclude that target selection is delayed for relative pitch, but that change detection, working memory and response-related processing are not of themselves delayed. This argument of course depends on accurate assessment of N2b latency.

3g. I find it curious that in the conclusions, no mention is made of MMN or P3a. There is no discussion of how these two components relate to N2b and P3b in a stages-of-processing account, or of why, for example, N2b can be delayed while the following P3a is not.

In sum, this is an interesting study, but many questions need to be addressed before the results can be interpreted with confidence.

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

PLoS One. 2021 Feb 25;16(2):e0247495. doi: 10.1371/journal.pone.0247495.r002

Author response to Decision Letter 0


23 Dec 2020

Dear Reviewers,

We are pleased to resubmit for publication the revised version of the manuscript entitled “Change detection of auditory tonal patterns defined by absolute versus relative pitch information. A combined behavioural and EEG study.” by Nina Coy, Maria Bader, Erich Schröger, and Sabine Grimm.

We appreciate your constructive comments. We will address the central concern regarding ERP component separation first. Each of your other concerns is addressed in the later sections.

Separation of ERP components, especially MMN and N2b

Reviewer #1: “Please explain in more detail how you determined that there is, in fact, both an MMN and an N2b in the absolute pitch condition. While I’m sympathetic to the argument that there “should” be both components present in the absolute condition, and that the difference wave in Figure 3 represents a likely spatiotemporal overlap between the two conditions, I am, however, concerned that the MMN and N2b scalp maps look very similar. My understanding is that the N2b, unlike the MMN, does not typically have an inversion at temporal sites (e.g. Näätänen & Gaillard, 1983; Sussman et al., 2002, see Fig 2), but the N2b scalp map shown in Figure 3 appears to have just as much source located around temporal/posterior sites as does the MMN scalp map. Since one of the main conclusions of the study is that automatic coding of auditory patterns occurs at a similar timecourse regardless of whether absolute pitch information is available (as indexed by the latency of the MMN), but that intentional processes are slower to detect relevant pitch deviants (as indexed by the latency of the N2b as well as the P3b), it’s vital to your argument that you make clear that both the MMN and N2b were, in fact, generated in each condition.”

Reviewer #2: “The 4 components of the ERPs (MMN, N2b, P3, P3b) are difficult to separate and extract using waveforms from surface electrodes. Especially given the different topographies of the components that overlap, and the differences between conditions, the analyses should be done after source localization of each component has been done. Right now, not only can the components not be cleanly separated, but the ANOVA statistical approach is complicated by many factors and complex interactions. This makes the paper very difficult to read and raises the problem of many statistical tests. All of this would be much simpler and cleaner if done in source space. Furthermore, if I understand correctly, single electrodes were analyzed, thus ignoring the rich information available from having collected 64 channels of data. Again, source space analysis would include all information across the scalp. Alternatively, some kind of PCA analysis might also be able to better separate and characterize the components by utilizing all of the data from all electrodes over time.”

We fully agree with the Editor and Reviewers that more detailed knowledge about the sources or the underlying component structure would help immensely when analysing (and interpreting) the effects of our experimental manipulations. Our research group has expertise in analysing EEG data using source separation methods such as PCA (e.g., Bonmassar, Widmann & Wetzel, 2020; Widmann, Schröger & Wetzel, 2018; Scharf & Nestler, 2018; Scharf & Nestler, 2019), and we made several attempts to apply a PCA to the data reported in the manuscript. The typical approach would be to decompose the data using a temporal PCA in a combined fashion (using standard and deviant ERPs from both the absolute and the transposed condition together as input). However, such combined approaches will usually not satisfactorily unmix the data unless the systematic condition-related latency differences are sufficiently large (in the range of 100 ms; see Barry et al., 2016). Please note that this restriction holds even for very prominent components such as the P3b. For small or medium latency differences between conditions, PCA often yields one latent component capturing the time course activated by both conditions and a second latent component capturing the difference between the two latencies. Latency shifts across individuals or conditions can thus result in “ghost” components resembling the time derivative of the underlying component (Mocks, 1986; Dien, 1998).
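The “ghost” component effect described above can be illustrated with a toy simulation. The sketch below is not the analysis pipeline used for the manuscript: it assumes a single Gaussian-shaped component whose amplitude and latency vary across observations, uses hypothetical values for peak latency, width and jitter, and applies an unrotated PCA via singular value decomposition (no geomin or varimax rotation). Under these assumptions the first principal component follows the component's shape, while the second closely follows its time derivative, even though only one true component was simulated.

```python
import numpy as np

def simulate_latency_jitter_pca(n_obs=500, jitter_sd_ms=15.0, seed=0):
    """Toy temporal PCA on ONE simulated component with latency jitter.

    All numeric settings are hypothetical and chosen for illustration only.
    """
    rng = np.random.default_rng(seed)
    times = np.arange(0.0, 800.0, 2.0)             # epoch in ms (assumed)
    peak_ms, width_ms = 400.0, 60.0                # component shape (assumed)
    amps = rng.uniform(0.5, 1.5, n_obs)            # amplitude varies per observation
    shifts = rng.normal(0.0, jitter_sd_ms, n_obs)  # latency varies per observation
    data = amps[:, None] * np.exp(
        -0.5 * ((times[None, :] - peak_ms - shifts[:, None]) / width_ms) ** 2
    )

    # Unrotated temporal PCA: time points are variables, rows are observations.
    centered = data - data.mean(axis=0)
    _, sing, vt = np.linalg.svd(centered, full_matrices=False)
    explained = sing**2 / np.sum(sing**2)

    # Compare the first two components with the true shape and its derivative.
    template = np.exp(-0.5 * ((times - peak_ms) / width_ms) ** 2)
    derivative = np.gradient(template, times)
    r_shape = abs(np.corrcoef(vt[0], template)[0, 1])
    r_ghost = abs(np.corrcoef(vt[1], derivative)[0, 1])
    return explained[:2], r_shape, r_ghost

explained, r_shape, r_ghost = simulate_latency_jitter_pca()
print("variance explained by PC1, PC2:", explained)
print("|r| of PC1 with component shape:", r_shape)
print("|r| of PC2 with time derivative:", r_ghost)
```

The second component here carries a substantial share of the variance although it reflects nothing but latency variation, which is why small latency differences between conditions make a combined PCA hard to interpret.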

A separate temporal PCA approach (running the PCA for the absolute and the transposed condition separately, Fig 1) is therefore more appropriate in our case. Yet, while PCA (i.e., finding orthogonal directions) might allow the separation of well-distinguishable components, a high overlap in time can prevent the algorithm from finding a satisfactory solution during unmixing (partly for reasons similar to those mentioned above). We see this in our results. For the absolute condition, a very broad negative component is found (including both peaks that we interpret as MMN and N2b). For the transposed condition, the PCA reveals two components in the time range of MMN and N2b: one that is again broad and two-peaked, and one that only includes the later peak of N2b (Fig. 1, bottom panel). This points somewhat in the right direction, yet it is difficult to interpret overall, because we know that the results of PCA are affected by the degree of overlap and because our separate PCAs show signs of problematic unmixing (component time courses are partly not monophasic and focal). We ran into similar problems when applying a spatial PCA (Fig 2): such an approach is more sensitive to the specific rotation method used, and while it is consistently able to distinguish between the parietal topography of P3b and a frontocentral negative component, we only see an inconsistent unmixing of frontal and central parts of the topographies of MMN and N2b (which probably overlap even more in the spatial dimension than in the temporal dimension). Spatial PCA generally suffers from overlap of components due to volume conduction (Dien, 2012), which means that typically many or even most electrodes have non-zero loadings on more than one factor (i.e., cross-loadings). This often results in worse component separation performance compared to temporal PCA.

Figure 1 Overview of principal temporal components (TCs): topography and time course retrieved from a temporal PCA (geomin, epsilon = 0.5) for each condition separately. TC1 in each condition is a late positivity with parieto-occipital distribution peaking at 658 ms in the absolute and at 700 ms in the transposed condition (reflecting P3b). TC2 in each condition is a broad, partly double-peaked negative component (with frontocentral distribution peaking at around 390 ms), likely reflecting the MMN/N2b complex. The bottom panel shows projected TCs whose time course falls into the range of MMN/N2b in comparison to the grand-averaged deviant-standard difference waves. In the absolute condition, one main component was found (TC2, explaining 26.4% of the variance in the data) showing a broad time course of frontocentral negativity (roughly from 150 ms to 550 ms). In the transposed condition, TC2 (explaining 20.5% of the variance in the data) has a similarly broad time course (roughly from 150 ms to 550 ms) and is double-peaked. An additional component, TC3, explains 3.5% of the variance and coincides mainly with the second peak of the grand-averaged ERP difference waveform (likely reflecting N2b). Please note that the temporal components are overall rather broad and in several instances biphasic. This likely reflects difficulties in unmixing the data due to high temporal and spatial overlap of MMN and N2b.

Figure 2 Top: Overview of spatial components (SCs): topography and time course retrieved from a combined spatial PCA (varimax rotated). The first SC is a component with a parieto-occipital distribution explaining 55% of the variance in the data (reflecting P3b). The second SC has a frontocentral distribution and is mainly active in a broad time window covering MMN/N2b. The third SC is a rather focused central component, also covering wide portions of the MMN/N2b and explaining 8% of the variance in the data. Bottom: Comparison of grand-average difference waveforms and PC projections for the first three SCs. The parieto-occipital component SC1 explains the P3b activity in the data to a high degree. SC2, with a frontocentral distribution, is active during the time course of MMN/N2b, but partly also during later time ranges. SC3, with a central distribution, contributes to an early portion of the negativity (covering MMN and N2b) in the absolute condition (left panel), but not so much in the transposed condition (right panel). Again, this component also contributes to activity in later time ranges. Please note that the orthogonal nature of PCA will not easily deal with high spatial overlap between components. Whether the central component SC3 is rather a “ghost” source or reflects a true centrally distributed activation contributing to MMN (since it peaks rather early) is unclear – yet MMN topography shifts to more central/parietal regions are sometimes seen for complex stimulus material.

Therefore, we remain highly doubtful whether the results of the PCA reflect the real situation better and add trustworthy, valuable information to the analysis of the ERPs.

So as not to leave the Reviewers’ justified concerns unaddressed, we finally chose a different approach. In the literature, the classical way of distinguishing MMN and N2b is a direct comparison between a passive and an active oddball condition (see Potts et al., 1998; Wei, Chan & Luo, 2002; Sussman, 2007). Since we have a very similar dataset from the study of Bader et al. (2017), where subjects listened to the same complex tonal patterns in a passive paradigm (only with a different SOA and a slightly different combination of possible train lengths), we decided to directly compare these conditions graphically. In comparison to the data reported by Bader et al. (2017), MMN and P3a are present in both paradigms, whereas N2b and P3b were only observed in the active listening task of the current study. Specifically, the comparison suggests that the morphology of MMN in response to deviant patterns in the transposed condition in our active paradigm is highly concordant with the passively elicited MMN, with regard to both latency and amplitude. In the absolute condition, the initial portion of the negativity shows a similar concordance, with a divergence between active and passive listening starting at around 210 ms (that is, 160 ms after deviance onset), which is well in line with typical slope latencies of N2b (Breton et al., 1988; Näätänen et al., 2014; Novak, Ritter, Vaughan, & Wiznitzer, 1990; Sams et al., 1983). Thus, MMN shows a similar pattern in both studies, whereas the relatively later negativity is only observable in the active but not the passive listening condition.

Additionally, this direct comparison yielded another potentially useful insight: it appears that P3a elicited by true pattern changes might be separable into two distinct subcomponents, namely an early and a late P3a. When visually comparing both studies, a potential late P3a of similar latency and amplitude was elicited in both studies irrespective of pitch information. However, a potential early P3a was elicited only in response to absolute true pattern changes in passive listening. Thus, this early P3a activity could be a candidate to further elucidate differential pattern processing as a function of pitch in future studies.

Figure 3 Grand average waves elicited in response to true pattern changes occurring in an absolute and relative (transposed) pitch context during active (left panel; current submission) and passive listening (middle panel; Bader et al., 2017). The right panel shows the grand averaged deviant-standard difference ERPs reported in both studies.

Additionally, in order to simplify the statistical analysis, we decided to drop the laterality factor from the ANOVA and collapse the cells accordingly. We have uploaded the revised statistical analysis (Rmd Files) to the OSF repository linked in the manuscript.

References

Barry, R. J., De Blasio, F. M., Fogarty, J. S., & Karamacoska, D. (2016). ERP Go/NoGo condition effects are better detected with separate PCAs. International Journal of Psychophysiology, 106, 50-64.

Breton F, Ritter W, Simson R, Vaughan HG. (1988) The N2 component elicited by stimulus matches and multiple targets. Biol Psychol;27:23–44.

Bonmassar C, Widmann A, Wetzel N. (2020) The impact of novelty and emotion on attention-related neuronal and pupil responses in children. Dev Cogn Neurosci;42:100766. https://doi.org/10.1016/j.dcn.2020.100766.

Dien, J. (1998). Addressing misallocation of variance in principal components analysis of event-related potentials. Brain topography, 11(1), 43-55.

Dien J. (2012) Applying principal components analysis to event-related potentials: A tutorial. Dev Neuropsychol;37:497–517.

Mocks, J. The influence of latency jitter in principal component analysis of event-related potentials. Psychophysiology, 1986, 23(4): 480-484.

Novak GP, Ritter W, Vaughan HG, Wiznitzer ML. Differentiation of Negative Event-Related Potentials in an Auditory-Discrimination Task. (1990) Electroencephalogr Clin Neurophysiol;75:255–75.

Näätänen R, Sussman ES, Salisbury D, Shafer VL. Mismatch negativity (MMN) as an index of cognitive dysfunction. (2014) Brain Topogr;27:451–66. https://doi.org/10.1007/s10548-014-0374-6.

Potts, G. F., Dien, J., Hartry-Speiser, A. L., McDougal, L. M., & Tucker, D. M. (1998). Dense sensor array topography of the event-related potential to task-relevant auditory stimuli. Electroencephalography and clinical neurophysiology, 106(5), 444-456.

Sams M, Alho K, Näätänen R. (1983) Sequential effects on the ERP in discriminating two stimuli. Biol Psychol;17:41–58. https://doi.org/10.1016/0301-0511(83)90065-0.

Scharf F, Nestler S. (2018) Principles behind variance misallocation in temporal exploratory factor analysis for ERP data: Insights from an inter-factor covariance decomposition. International Journal of Psychophysiology. 128:119–136.

Scharf F, Nestler S. (2019) A comparison of simple structure rotation criteria in temporal exploratory factor analysis for event-related potential data. Methodology. 15:43–60.

Sussman ES. A new view on the MMN and attention debate: The role of context in processing auditory events. J Psychophysiol 2007;21:164–75. https://doi.org/10.1027/0269-8803.21.34.164.

Wei JH, Chan TC, Luo YJ. A modified oddball paradigm ‘cross-modal delayed response’ and the research on mismatch negativity. Brain Res Bull 2002;57:221–30. https://doi.org/10.1016/S0361-9230(01)00742-0.

Widmann A, Schröger E, Wetzel N. (2018) Emotion lies in the eye of the listener: Emotional arousal to novel sounds is reflected in the sympathetic contribution to the pupil dilation response and the P3. Biol Psychol;133:10–7.

Revisions requested by Reviewer #1:

“Similarly, please make clearer how you determined the latency of the N2b given the overlap between MMN and N2b. You explain that the MMN latency was defined as the point at which the initial negative deflection reached 60% of the peak amplitude, but it’s not clear how you determined the latency for the N2b. Which slope and which peak amplitude did you use to calculate the latency of the N2b in the absolute condition?”

We extended the explanation of the jackknife approach in the methods section (cf. ll. 300-311) and added the search windows for each ERP component in both the methods and the respective results sections. We used different search windows for MMN and N2b.
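As a rough illustration of how a jackknife-based fractional peak latency can be computed, the sketch below forms leave-one-out grand averages and, for each, takes the first sample on the rising flank at which the deflection reaches 60% of its peak within a search window. It is a minimal sketch under stated assumptions, not the code from the OSF repository: the function names, the synthetic data and the search window of 150 to 450 ms are hypothetical, and the returned standard error uses the generic jackknife formula.

```python
import numpy as np

def fractional_peak_latency(times, wave, window, fraction=0.6, polarity=-1):
    """Latency at which `wave` first reaches `fraction` of its peak in `window`.

    polarity=-1 scores negative deflections (e.g. MMN, N2b), +1 positive ones.
    """
    idx = np.flatnonzero((times >= window[0]) & (times <= window[1]))
    seg = polarity * wave[idx]
    peak_pos = int(np.argmax(seg))
    if seg[peak_pos] <= 0:
        raise ValueError("no deflection of the requested polarity in window")
    threshold = fraction * seg[peak_pos]
    # Walk back from the peak: last sample below threshold marks the onset.
    below = np.flatnonzero(seg[: peak_pos + 1] < threshold)
    onset = below[-1] + 1 if below.size else 0
    return times[idx[onset]]

def jackknife_fractional_latency(times, subject_waves, window,
                                 fraction=0.6, polarity=-1):
    """Score latency on each leave-one-out grand average (rows = participants)."""
    n = subject_waves.shape[0]
    total = subject_waves.sum(axis=0)
    lats = np.array([
        fractional_peak_latency(times, (total - subject_waves[i]) / (n - 1),
                                window, fraction, polarity)
        for i in range(n)
    ])
    # Generic jackknife standard error of the latency estimate.
    se = np.sqrt((n - 1) / n * np.sum((lats - lats.mean()) ** 2))
    return lats, se

# Synthetic example: five "participants", same latency, different amplitudes.
times = np.arange(0.0, 600.0, 2.0)                       # ms
amps = np.array([1.0, 1.5, 2.0, 2.5, 3.0])
waves = -amps[:, None] * np.exp(-0.5 * ((times[None, :] - 300.0) / 40.0) ** 2)
lats, se = jackknife_fractional_latency(times, waves, window=(150.0, 450.0))
print(lats, se)
```

Because leave-one-out averages vary far less than individual waveforms, test statistics computed on such jackknifed latencies need to be corrected before they are interpreted.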

“Finally, please make clearer from which channels you extracted the data for your statistical comparisons. From Table 1 it looks like you picked Fz for the MMN and FCz for the N2b. Why did you choose these channels? The table also seems to show that peak difference wave amplitude for the transposed condition, for which there is potentially a clear MMN, was somewhere in the FC region (this is also confusing, see minor revisions), while the N2b in that same condition peaked somewhere in the F region. There are similar discrepancies for P3a and P3b subcomponents.”

AND

“Table 1: Why are some of the channel labels in EP ambiguous (i.e. F instead of Fz or F1)? Why not state exactly which channels each row is referring to? Do the rows represent averaged data from multiple channels? If so which channels? Please clarify this in the caption of the table.”

We added a more elaborate explanation both to the methods section: “As there were no meaningful effects of laterality, the lateral dimension was collapsed, in order to simplify the statistical analysis. Please note that the activity values along the midline (factor frontality) represent averaged values not only including the central electrode but also the respective lateral electrodes directly adjacent to the midline electrode respectively; e.g., the factor level frontal is the average of Fz (middle), F3 (left) and F4 (right).” (ll. 324-328) as well as a similar note in Table 1.
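The collapsing step described above can be illustrated with a minimal sketch. Only the frontal triple (Fz, F3, F4) is stated in the text; the other electrode groupings, the function name and the example amplitudes are assumptions made purely for illustration.

```python
from statistics import mean

# Each frontality level averages the midline electrode with its two
# directly adjacent lateral electrodes. Only the frontal triple is given
# in the text; the remaining triples are assumed by analogy.
FRONTALITY_LEVELS = {
    "frontal": ("F3", "Fz", "F4"),
    "frontocentral": ("FC3", "FCz", "FC4"),  # assumption
    "central": ("C3", "Cz", "C4"),           # assumption
}


def collapse_laterality(amplitudes):
    """Average per-electrode amplitudes (microvolts) within each frontality level.

    Levels for which any electrode is missing from the input are skipped.
    """
    return {
        level: mean(amplitudes[ch] for ch in channels)
        for level, channels in FRONTALITY_LEVELS.items()
        if all(ch in amplitudes for ch in channels)
    }


# Hypothetical amplitudes for one participant and one time window.
example = {"F3": -1.0, "Fz": -2.0, "F4": -3.0}
print(collapse_laterality(example))
```

With this layout, each row of a table such as Table 1 would correspond to one key of the returned dictionary rather than to a single electrode.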

“General note: please make sure all your variables are italicized (namely d’s and η^2)”

We changed the formatting accordingly.

“Methods section: if it’s possible and not too much trouble, I’d appreciate a rough estimation of when the deviants actually deviated from the preceding standards (i.e. which note within the deviant stimulus was unexpected relative to the previous standard). I understand difference waves weren’t calculated this way (deviant – immediately preceding standard), but I’d like to know approximately when the average deviation occurred. You state that in the absolute condition the first note was always fixed at 400 Hz, and then you imply that the second note consistently deviated from the previous standard, but I’m not sure if I understood you correctly. This will help provide some context for the ERP latencies you’re reporting, since presumably the latencies (particularly of the early components) reflect about when the auditory and attentional systems detected a violation of the ongoing auditory regularity. If that varied a lot deviant by deviant, and if one condition varied more than the other, then the meaning of the latency differences between conditions becomes harder to interpret. At the very least I think you should provide a range of which notes could have represented deviations from the preceding standards, and perhaps address this potential issue in the discussion.”

We added the following to the discussion: “Detection of a new pattern needs at least two segments in both conditions. However, there might have been some jitter in (perceptual) change onset between conditions, resulting in a temporally more variable transposed MMN, not as well captured by a window averaging approach.” (ll.639-642).

“Methods section: how many different tone patterns were there? Were they randomly generated or predefined?”

Patterns were randomly generated. (l. 225)
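As a rough illustration of the difference between absolute and transposed repetitions, the following sketch generates a random six-tone pattern and shifts it in pitch. The six-tone length and the 400 Hz starting tone come from the manuscript and Reviewer #1's summary; the interval range, the use of semitone steps and all function names are hypothetical and do not describe the actual stimulus generation.

```python
import random

N_TONES = 6            # six-tone patterns, as stated in the abstract
FIRST_TONE_HZ = 400.0  # first tone in the absolute condition (per Reviewer #1)


def random_pattern(rng, max_step_semitones=6):
    """Return six frequencies in Hz; the step sizes are an assumed choice."""
    steps = [s for s in range(-max_step_semitones, max_step_semitones + 1) if s != 0]
    freqs = [FIRST_TONE_HZ]
    for _ in range(N_TONES - 1):
        freqs.append(freqs[-1] * 2 ** (rng.choice(steps) / 12))
    return freqs


def transpose(pattern, semitones):
    """Shift every tone by the same interval, preserving all frequency ratios."""
    factor = 2 ** (semitones / 12)
    return [f * factor for f in pattern]


def intervals(pattern):
    """Successive frequency ratios, i.e. the relative pitch information."""
    return [b / a for a, b in zip(pattern, pattern[1:])]


rng = random.Random(0)
standard = random_pattern(rng)
shifted = transpose(standard, 3)
# Absolute frequencies differ, but the interval structure is unchanged.
print(standard[0], shifted[0])
print(all(abs(x - y) < 1e-9 for x, y in zip(intervals(standard), intervals(shifted))))
```

In the transposed condition only the output of `intervals` stays constant across repetitions, which is why pattern identity has to be extracted from relative pitch.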

“Results section: please remind the reader of the timeframes you’re using to define each subcomponent in their subsequent “latency” section of the results”

We added the search windows for each ERP component in the respective latency results sections. MMN: l. 407f., N2b: l. 473f., P3a: l. 505f., P3b: l. 548f.

“Lines 59-60: “…facilitating the recognition of auditory objects despite large variation in spectral features”. This statement should have a reference.”

We added several references: (Bartlett & Dowling, 1980; Halpern, Bartlett, & Dowling, 1998; McDermott, Lehr, & Oxenham, 2008; Schindler, Herdener, & Bartels, 2013; Trainor, McDonald, & Alain, 2002).

“Line 69: please supply more references that speak to the “rich body of research””

We referenced several reviews that summarise research on complex sounds (Paavilainen, 2013; Peretz & Zatorre, 2005; Picton, Alain, Otten, Ritter, & Achim, 2000; Saffran & Griepentrog, 2001; Winkler, 2007).

“Line 88: “statistically robustly”. I assume you mean that there was a strong positive correlation between deviant ERP amplitude and the number of preceding standard stimuli? If that’s the case then I don’t think you need to include “statistically robustly”, as it reads as a subjective evaluation of the correlation. It’s enough to say that the amplitude increased as a function of the number of preceding stimuli.”

We dropped “robustly”.

“Lines 107-109: Please support this sentence with a reference.”

We extended this sentence and added several references: “Nonetheless, it should not be neglected that the comparison of new patterns arguably is computationally more complex when reliant on relative compared to absolute pitch information, as it is not sufficient to compare whether two pitches are the same but rather whether their relative distance is. For instance, it was found that previously heard melodies that were in the same key at exposure and test were recognized with greater accuracy than melodies that were transposed [30–33], and that the ability to process relative pitch information depends on experience [30,32,34–37].” (ll.106-114)

“Line 564: “the reported P3a amplitude decrease and latency increase…”. Fig 4a shows that P3a latencies were not different. Please clarify.”

This sentence referred to the findings by Bader et al. (2017), not to the current study. We explicitly added the reference, so there is no confusion.

“Lines 565-568: Please include references for how the P3a is associated with any of the listed stimulus features/cognitive demands.”

Although the reasoning behind these features is extensively described in the introduction section, we added some references (now in ll. 582-583).

“Line 586: “notably and significantly reduced…”. Please refrain from using subjective evaluations (notably) when statistical comparisons suffice (statistically).”

We removed the “notably” from that sentence.

Revisions requested by Reviewer #2

“A general difficulty with the approach is that there is not wide consensus in the literature as to exactly what these 4 components represent, leaving the paper with considerable speculation as to the meaning of the results.”

We tried to point out explicitly the differing accounts of how to interpret modulations of these ERP components, and to contrast the different inferences that could be drawn from them. Although we may not be able to draw on empirically bullet-proof conceptions of these components, there is some consensus. For instance, while we may not know the exact meaning of P300 (P3b), there is considerable consensus that it reflects (i.e., covaries with) stimulus evaluation time, as we described in the discussion section on P3b.

“The literature review on MMN, P3a and relative pitch is not very complete. For example, it is stated “Since then, a rich body of research has emerged, enabling a better understanding of the processing of complex sounds”, with no references given. For example, the work of Trainor and colleagues showed MMN and P3a responses for relative pitch information in non-musicians (e.g., 2002, J Cog Neuro) and infants (e.g., He et al., 2009 Eur J Neurosci). As well, musical training may significantly modify these brain responses to relative pitch information (e.g., Fujioka et al., 2004; J Cog Neuro), as may exposure to a tonal language, so the generalizability of the results should be considered in the discussion.”

We added these references both to the introduction as well as the discussion.

“Stimuli: the stimulus patterns are very brief with the tones comprising them only 50 ms. The whole patterns are only 300 ms. Given that the temporal window of integration is 150 – 200 ms, it is not clear that these are perceived as patterns in the sense of a sequence of events. Rather they are likely processed as single units.”

We would argue that this question largely depends on the conceptualisation of what a pattern is. Our patterns are likely perceived as a consecutive stream of tone pips of different pitch (rather than as distinct notes) and in that sense probably rather as single units. Nevertheless, we know from previous studies that temporal order is well preserved in representations of such tone pip patterns (e.g. Weise, Grimm, Trujillo-Barreto & Schröger (2014) Timing matters: the processing of pitch relations, Frontiers in Neuroscience, 8:387), so the system is sensitive to the exact spectrotemporal composition of such pattern stimuli. We agree that it is an interesting question to explore whether such sound patterns are perceived as single or as concatenated units. However, in our opinion this question is beyond the scope of the current experiment. We do not think that either one of these possibilities regarding the quality of the percept is in principle incompatible with the finding that even though spectrotemporal patterns informed by relative pitch can be sufficiently represented to discriminate them well above chance from other patterns, they are processed differently compared to when absolute pitch is available.

“The jackknife approach needs more explanation. It is not clear if it was applied across participants or whether an individual latency was calculated for each participant.”

We changed the wording, to make the explanation clearer: “Specifically, the time point was estimated at which the amplitude of a particular component across leave-one-participant-out subsamples of the grand-averaged wave first reaches specific percentage values of the respective peak amplitude; slope (60%) and peak (100%).” (ll. 302-305)
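The procedure quoted above can be sketched as follows. This is a simplified illustration, not the analysis code used in the study: it works on the sampled time points without interpolation, and the function names, toy waveforms and search window are assumptions.

```python
def grand_average(waves):
    """Average a list of equally long waveforms sample by sample."""
    n = len(waves)
    return [sum(samples) / n for samples in zip(*waves)]


def relative_latency(times, wave, window, fraction, polarity=-1):
    """First time point in `window` at which `wave` reaches `fraction` of its peak.

    `polarity` is -1 for negative components (e.g. MMN, N2b) and +1 for
    positive ones (e.g. P3a, P3b). Returns None if there is no deflection
    of the requested polarity within the window.
    """
    lo, hi = window
    idx = [i for i, t in enumerate(times) if lo <= t <= hi]
    signed = [polarity * wave[i] for i in idx]
    peak = max(signed)
    if peak <= 0:
        return None
    for i, value in zip(idx, signed):
        if value >= fraction * peak:
            return times[i]


def jackknife_latencies(times, waves, window, fraction, polarity=-1):
    """One latency estimate per leave-one-participant-out grand average."""
    return [
        relative_latency(
            times, grand_average(waves[:k] + waves[k + 1:]), window, fraction, polarity
        )
        for k in range(len(waves))
    ]


# Toy data: three hypothetical participants with the same negative deflection
# (peak at 200 ms) at different overall amplitudes.
times = [0, 50, 100, 150, 200, 250, 300]
shape = [0.0, 0.0, -1.0, -3.0, -4.0, -2.0, 0.0]
waves = [[scale * v for v in shape] for scale in (1.0, 2.0, 3.0)]

print(jackknife_latencies(times, waves, (100, 300), 0.6))  # slope (60%) estimate
print(jackknife_latencies(times, waves, (100, 300), 1.0))  # peak (100%) estimate
```

Each entry of the returned list comes from a grand average that omits one participant, so the values are subsample estimates rather than individual participants' latencies.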

“The use of when 60% and 100% of the peak of some of the components was not clear and seems arbitrary in order to make arguments that some effects of significant at 60% even though not at 100%. This needs to be justified or should be removed.”

We added the following explanation to the method section: “The 60% relative peak estimate was included, firstly, because relative latency estimates have been shown to be less noisy than peak latency estimates using the jack-knifing technique [70], and secondly, to probe whether latency effects are already present in the build-up of a given component.”

“There is a fundamental problem in separating and distinguishing the MMN and N2b components in the absolute pitch condition. Thus, without doing some kind of source localization, both the MMN and N2b results are unclear.”

We tried to address this problem in the discussion section: “However, in the context of absolute pitch there was only one prominent, quite broadly distributed frontocentral negative deflection within the typical latency range of MMN and N2b, impeding a robust estimation of MMN amplitude and latency.” (ll. 643-646). As already described in the first section of this letter, we added a comparative figure to the supplements. This includes both the data from the current study as well as the data from Bader et al. (2017) to contrast the ERPs from active and passive listening. We argue that this figure shows that the large negative deflection is only present within the active listening task, and that there is a consistent time course for the initial negative deflection compatible with MMN elicitation in both active and passive listening and both absolute and relative pitch conditions respectively.

“MMN was actually found to be a bit larger in absolute condition compared to the relative condition, which is actually not in agreement with Bader et al. However, this is inconclusive as, with the current analysis methods, it was not really possible to separate MMN and N2b. The authors argue that because of this MMN is not really different across conditions, but I think this remains unknown.”

Given the more central contributions in the absolute compared to the transposed MMN, we argued that the likeliest explanation would be a spatial and temporal overlap by N2b in the absolute condition. Naturally, we cannot preclude that the significant difference in the MMN window reflects a true MMN amplitude difference. However, as already pointed out, the strong overlap with N2b did not allow a robust estimation of MMN amplitude (and latency). As it is well known that N2b overlaps with MMN in active paradigms (cf. introduction and discussion for references), we would still argue that this is the far likelier explanation, especially given that Bader et al. (2017) did not find a significant MMN amplitude difference in passive listening. Of course, absence of evidence is not evidence of absence. Therefore, we extended this part of the discussion to explain more explicitly how a potential true amplitude difference could be interpreted (ll. 629ff.).

“With respect to N2b it was found to be later in the relative than absolute pitch condition. However, this is not completely convincing as the MMN and N2b could not be completely separated. This delayed N2b for relative pitch is interpreted as increased discrimination demands, but it seems that other interpretations might be possible.”

We tried to discuss several different interpretations (ll. 699ff.): “Taken together, it seems that N2b latency not merely indexes the speed but also the complexity, and perhaps accuracy, of the processing of deviating and/or target stimulus characteristics. In these terms, the increase in latency in the relative compared with the absolute pitch context, might well reflect generally increased computational complexity when a pattern is defined by relative rather than absolute pitch. As Ritter et al. (1992) pointed out that the interpretation of N2b largely relies “on the experimental circumstances and the strategies used by the subjects”, one could argue even further: N2b latency differences as a function of pitch might not just reflect differences in mere processing time between absolute and relative patterns, but might actually indicate wholly different kinds of processes operating at the stage of stimulus comparison or categorisation respectively [97,100].”

“The amplitude of the N2b was “descriptively” smaller in amplitude for the relative pitch case, but not statistically different. A somewhat convoluted argument is made that N2b might reflect both “representational strength” and “attention” and that the lack of significant amplitude difference might reflect some combination of reduced “representational strength” and increases “attention in the relative pitch case. This seems overly speculative.”

We did not mean that N2b amplitude is necessarily modulated by both. As findings on N2b amplitude modulations are generally rather inconsistent, we aimed to stress that several different explanations could account for the reported findings. Some of them could relate to differences in the processing of absolute and relative pitch at the level of N2b, but such a descriptive difference could just as well be accounted for by differential overlap with P3a: as the N2b peaked later in the relative pitch context, it overlapped more in time with P3a (which did not show a significant latency effect of pitch context).

“No amplitude or latency differences were found for P3a for relative and absolute conditions. This is consistent with the idea that top-down processes (attention?) compensate for any increased difficult of the relative pitch condition. But the authors go on to state the absolute and relatively pitch information might just be processed “differently”. I couldn’t follow this argument.”

We meant to say that, in active listening, we found no evidence at the level of P3a that the processing of true pattern changes is affected by the absence of absolute pitch information. However, we would caution against inferring from this finding in active listening that the same holds for passive listening: top-down processes may compensate in active listening without doing so in passive listening. Such an inference is, in our opinion, beyond the scope of the current study. When comparing the current data with those of Bader et al. (2017), we noticed that taking the data from both studies together underlines this point even further. We added this to the discussion as follows: “Actually, in the direct comparison of passive and active listening task (supporting information S3), it appears that P3a elicited by true pattern changes might be separable into two distinct subcomponents [95–99] – namely an early and a late P3a. When visually comparing both studies a potential late P3a was elicited of similar latency and amplitude in both studies irrespective of pitch information. However, a potential early P3a was elicited only in response to absolute true pattern changes in passive listening. Thus, this early P3a activity could be a candidate to further elucidate differential pattern processing as a function of pitch in future studies.” Thus, in passive listening, the absence of absolute pitch information might not have increased P3a latency per se; rather, an early P3a seems to be elicited only in response to absolute, and not relative, true pattern changes. Such an early P3a was not observed in active listening, regardless of the type of pitch information. This points to a striking divergence between passive and active processing of absolute and relative pitch.

“From the results, the N2b and P3b were both delayed in the relative compared to absolute condition. The authors interpret the N2b as representing “target selection” and the P3b as representing change detection, or working memory or response related processing. Therefore, they conclude that target selection is delayed from relative pitch, but that change detection, working memory and response-related processing are not of themselves delayed. This argument of course depends on accurate assessment of N2b latency.”

To clarify, we would argue that P3b latency reflects stimulus evaluation time in the sense that it covaries with it, but that P3b does not itself represent change detection or working memory processes, which is in line with the literature (e.g. Dien, 2014; Verleger, 2020). This means that whatever time penalty relative pitch introduces compared with absolute pitch likely arises at earlier processing stages. We did not find evidence that MMN or (late) P3a were delayed. We revised the discussion section on P3b as follows to clarify: “Within the traditional framework [51,52] the lack of evidence for modulation of P3b amplitude by pitch information implies that during the revision, or updating of the current mental model in response to the detection of a true pattern change, attentional resources are equally available [22,38]. A more recent account suggests that P3b rather reflects reactivation of stimulus-response links, and that P3b amplitude increases as a function of response infrequency [108]. The absence of P3b amplitude modulation in the current study would thus be explained by the fact that true pattern changes occurred equiprobably in both absolute and relative pitch context, resulting in comparable response frequencies. Within both contexts, the increase in P3b latency would signify increased stimulus evaluation time [22,48,109]. Indeed, the observed delay was already present at the N2b level, and did not notably increase at the level of P3b. That P3b peaked after the correct behavioural response was made [24], implies that it reflects post-identification stimulus categorisation, response selection and execution processes [24,48,49,55,56,108] which are not vulnerable to the absence of pitch information.” (ll. 752-765). We agree that this inference rests on the reported N2b latency shift.

“I find it curious that in the conclusions, no mention is made of MMN or P3a. There is no discussion of how these two component relate to N2b and P3b in a stages of processing account. And why, for example, N2b can be delay while the following P3a is not delayed.”

We added a sentence on MMN and P3a to the conclusion section. In terms of processing stages, we connected MMN to sensory change detection, N2b to target selection, P3a to working memory, and P3b to (post-)response-related processes. We are not aware of literature providing a robust picture of how these components are linked and interact. We therefore decided against a speculative attempt to explain this.

Yours sincerely,

Nina Coy (on behalf of all co-authors)

Attachment

Submitted filename: SP2_responseReviewers.pdf

Decision Letter 1

Qian-Jie Fu

19 Jan 2021

PONE-D-20-25390R1

Change detection of auditory tonal patterns defined by absolute versus relative pitch information. A combined behavioural and EEG study.

PLOS ONE

Dear Dr. Coy,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Mar 05 2021 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

We look forward to receiving your revised manuscript.

Kind regards,

Qian-Jie Fu, Ph.D.

Academic Editor

PLOS ONE


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: (No Response)

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The authors have thoroughly addressed all my comments. I believe the concern about separation of MMN and N2b is most adequately addressed in Fig 3 of the response to reviewers, which directly compares the ERPs in active and passive paradigms. This figure, and some of the preceding discussion, should be included in the paper itself as further evidence of the temporal overlap between MMN and N2b components in the absolute condition. I would also recommend making clearer that the ERPs shown in column 3 are difference waves, and which waveforms are from the active and which are from the passive condition, in both the legend and the caption. I eventually figured it out but it took me a while. Otherwise I think the paper is much better, much clearer, and is ready for publication.

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2021 Feb 25;16(2):e0247495. doi: 10.1371/journal.pone.0247495.r004

Author response to Decision Letter 1


29 Jan 2021

Dear Reviewers,

We are pleased to resubmit for publication the revised version of the manuscript entitled “Change detection of auditory tonal patterns defined by absolute versus relative pitch information. A combined behavioural and EEG study.” by Nina Coy, Maria Bader, Erich Schröger, and Sabine Grimm.

We appreciate the positive response to our first revision. We have made the changes requested by Reviewer #1 as follows.

Reviewer #1: “The authors have thoroughly addressed all my comments. I believe the concern about separation of MMN and N2b is most adequately addressed in Fig 3 of the response to reviewers, which directly compares the ERPs in active and passive paradigms. This figure, and some of the preceding discussion, should be included in the paper itself as further evidence of the temporal overlap between MMN and N2b components in the absolute condition. I would also recommend making clearer that the ERPs shown in column 3 are difference waves, and which waveforms are from the active and which are from the passive condition, in both the legend and the caption. I eventually figured it out but it took me a while. Otherwise I think the paper is much better, much clearer, and is ready for publication.”

We have now moved the previously supplementary figure comparing the ERP data between active and passive listening into the main manuscript body (now Fig 5) and extended its discussion. We changed the legend and added more explicit information to improve its readability, especially concerning the distinction between stimulus-level and difference-level ERPs.

Yours sincerely,

Nina Coy (on behalf of all co-authors)

Attachment

Submitted filename: SP2_responseReviewers.pdf

Decision Letter 2

Qian-Jie Fu

9 Feb 2021

Change detection of auditory tonal patterns defined by absolute versus relative pitch information. A combined behavioural and EEG study.

PONE-D-20-25390R2

Dear Dr. Coy,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Qian-Jie Fu, Ph.D.

Academic Editor

PLOS ONE

Acceptance letter

Qian-Jie Fu

17 Feb 2021

PONE-D-20-25390R2

Change detection of auditory tonal patterns defined by absolute versus relative pitch information. A combined behavioural and EEG study.

Dear Dr. Coy:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Qian-Jie Fu

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 File. Example of absolute pattern sequence depicted in Fig 1.

    (WAV)

    S2 File. Example of transposed pattern sequence depicted in Fig 1.

    (WAV)

    Attachment

    Submitted filename: SP2_responseReviewers.pdf

    Data Availability Statement

    Data underlying the findings are deposited on the OSF repository: (10.17605/OSF.IO/PTGR3).


    Articles from PLoS ONE are provided here courtesy of PLOS
