Abstract
Separating a target sound from a mixture of sounds (auditory stream segregation) is crucial for perception in natural environments, a skill at which humans and animals excel. This study investigated the role of temporal coherence in auditory stream segregation in human newborns using high-density EEG recordings. Sleeping newborns were exposed to temporally coherent tone sequences embedded in random background tones, and their event-related responses were analyzed. The results indicate that newborns can segregate auditory streams based on temporal coherence, as evidenced by brain responses resembling the object-related negativity (ORN) event-related potential (ERP) component, suggesting that this form of stream segregation is driven by automatic mechanisms from birth. However, discriminating among different signal-to-noise ratios appears to require further fine-tuning, as evidenced by delayed latencies in neonates compared with adults. These findings indicate that temporal coherence aids the detection of, and orientation toward, salient stimuli, thereby laying the foundation for the development of abilities such as selective attention and speech perception.
Keywords: auditory stream segregation, event-related potentials, newborn infants, object-related negativity, temporal coherence
1. Introduction
In everyday life, we are often surrounded by multiple sounds occurring at once and competing for our attention, such as conversations in crowded spaces, traffic noise, and background music. Separating sounds from this mixture (auditory stream segregation) is crucial for auditory perception in natural environments, a skill at which both humans and animals excel (Bregman, 1994). This process enables us to navigate and interpret complex auditory environments efficiently and is crucial for effective communication, speech comprehension, and sustained and selective attention. There are two primary forms of auditory stream segregation: concurrent (spectral) and sequential stream segregation (Bregman, 1994). The former relies on spectral cues available in the set of concurrently presented sounds, such as harmonic relationships, without reference to previous sounds. The latter relies on cues extracted from the relationship between successive sounds, such as their similarity. There is, however, a cue that is based on integrating concurrent and sequential stimulus features: temporal coherence, i.e., the dynamic co-modulation of sounds, which helps the brain bind concurrent sounds together into perceptual streams and separate them from sounds that follow different patterns of temporal modulation (Shamma et al., 2011; Teki et al., 2013). The present study tested neonatal auditory stream segregation based on temporal coherence, which requires integrating concurrently presented frequency components over time to form a coherent auditory object.
Much research has been conducted to understand the neural mechanisms and cognitive processes underlying auditory stream segregation (for recent reviews, see Hermes, 2023; Winkler and Denham, 2024). However, less is known about newborn infants’ innate capabilities. The present study aims to investigate the innate aspects of auditory stream segregation with respect to temporal coherence, focusing on the developmental emergence and neural mechanisms underlying auditory scene analysis.
1.1. The temporal coherence theory of auditory stream segregation
Temporal coherence refers to the joint modulation of the amplitudes of concurrent spectral elements of the incoming sound over time. It represents a general feature of our physical environment: all parts of a sound originating from the same source usually fluctuate together in time (Lu et al., 2017). This principle was first observed by the Gestalt school of psychology (termed common fate; Köhler, 1947) and is an essential factor in auditory stream segregation, probably because, unlike their visual counterparts, auditory scenes do not include static elements that could be revisited to extract further information (Boncz et al., 2024; O'Sullivan et al., 2015; Shamma et al., 2011; Teki et al., 2013; Tóth et al., 2016). The brain of adult listeners detects this coherence (O'Sullivan et al., 2015) and uses it to group concurrent sounds with similar temporal modulation while separating them from concurrent sounds with different temporal modulation patterns. For example, temporal coherence helps us separate a speech stream from background noise by detecting consistent timing patterns in speech signals (Shamma et al., 2013).
Several results support the role of temporal coherence in listening under noisy conditions, as demonstrated by psychoacoustic and neuroimaging studies using stochastic figure-ground (SFG) stimuli: repeating tonal patterns embedded within clouds of randomly varying pure tones (Boncz et al., 2024; O'Sullivan et al., 2015; Shamma et al., 2011; Teki et al., 2013; Tóth et al., 2016). In these figure-ground segregation stimuli, the figure consists of a set of concurrently delivered tones that repeat the same frequencies across consecutive time frames, creating a temporally coherent pattern. In contrast, the background tones vary randomly in frequency from frame to frame (see Figure 1). The perceptual salience of the figure increases with the number of repeated (coherent) frequency components and the duration of the cohesive segment, supporting the notion that successful segregation relies on the integration of acoustic information over time and frequency. As in many natural listening environments (e.g., understanding speech in background noise), the figure cannot be detected based on spectral (concurrent) cues alone—it emerges only through temporally integrating information from successive sounds (sequential information) to detect the coherent spectral structure over time.
Figure 1.
Spectrograms of the four experimental conditions. Blue circles (left panels) and yellow rectangles (right panels) represent tones. The x-axis represents time, and the y-axis represents frequency. The signal’s onset and offset are marked by vertical lines in (b–d), and the tones in the signal are highlighted with red (left panels) or a white X (right panels). In the noise-only condition (a), there is no signal, only background tones.
The temporal coherence theory of auditory scene analysis suggests that stream formation relies on temporal coherence (coincidence operations) between neuronal responses encoding different acoustic features, leading to the binding of coincident responses into perceptual sources (Shamma et al., 2011; Teki et al., 2013). In this view, stream formation begins with cochlear frequency analysis and the extraction of spectral and temporal features, including pitch and location (Lu et al., 2017). Electrophysiological results demonstrate that attentive listening induces rapid modulation of the interactions among neurons, driven by the temporal synchrony of the auditory stimulation, as shown by changes in the receptive fields of neurons in the primary auditory cortex of ferrets, including their adaptation, response rates, and spiking correlations (Lu et al., 2017; Elhilali et al., 2009). Computational studies have successfully simulated auditory stream formation across various stimuli (Krishnan et al., 2014; Viswanathan et al., 2022). The results align with the assumed mechanisms of temporal coherence analysis, demonstrating that temporal coherence influences speech comprehension in noise; specifically, temporally coherent auditory masker elements interfere with target speech perception (Viswanathan et al., 2022).
1.2. Neural correlates of stream segregation by temporal coherence in the human brain
Functional magnetic resonance imaging (fMRI) and electroencephalography (EEG) studies have identified brain regions involved in processing figure-ground segregation in the SFG paradigm, including the auditory cortex and higher-order cognitive areas (superior temporal gyrus, planum temporale, and the intraparietal areas; Boncz et al., 2024; Molloy et al., 2019; O'Sullivan et al., 2015; Shamma et al., 2011; Teki et al., 2013; Tóth et al., 2016). In this paradigm, correct figure detection (separately perceiving the repeating set of tones within the tone cloud) elicits a centrally maximal negative event-related response between 200 and 300 ms from the onset of the tone sequence (Boncz et al., 2024; Tóth et al., 2016). This neural response (object-related negativity, ORN; Alain et al., 2001) was initially observed for sound segregation based on concurrent (spectral) cues alone, such as when a partial in a harmonic tone complex is mistuned, causing that partial to be heard as a separate sound from the rest of the harmonic complex. In the SFG paradigm, the amplitude of the ORN increases with both the coherence and the duration of the figure (Tóth et al., 2016), reflecting the increasing salience of a coherent auditory stream. When the figure is task-relevant, e.g., when participants judge the number of perceived sound sources, the ORN is followed by a P400 component (Alain et al., 2001, 2002). These two ERP components are assumed to reflect distinct stages in the formation of auditory object representations: the ORN indexes the brain's assessment of the likelihood that a coherent sound source (the figure) is present over the random background, based on the sensory evidence, while the P400 reflects the positive outcome of segregation, i.e., the decision that a separate concurrent sound (the target) has been detected (Alain et al., 2001).
1.3. Stream segregation abilities in neonates
Initially, research on the development of auditory scene analysis focused on sequential streaming during early infancy using habituation/dishabituation paradigms (see Leibold, 2011, for a comprehensive review). This research revealed that newborns and 3-month-old infants demonstrate sequential streaming, albeit with lower precision than adults (Demany, 1982; McAdams and Bertoncini, 1997; Smith and Trainor, 2011). Consistent with behavioral findings, subsequent research examined neural correlates of sequential stream segregation in infants. These studies suggested that infants also exhibit mismatch negativity (MMN) in response to a version of the “two-streams” paradigm (van Noorden, 1975): two interleaved sound sequences from different frequency ranges with infrequent deviant tones embedded within one or both of the frequency ranges. MMN elicitation in this paradigm by newborn infants suggested that the neural mechanisms for sequential stream segregation are operational already at birth (Winkler et al., 2003). Furthermore, 7-month-olds display an MMN when the deviant occurs within a chord component presented sequentially (Marie and Trainor, 2013).
Recent findings further support newborns’ ability to quickly form representations of auditory regularities, which is critical for processing auditory sequences. After minimal exposure, Tóth et al. (2023) demonstrated that the neonatal auditory system can detect repeating patterns embedded within random tone sequences. Neonates exhibited differential neural responses to regular versus random tone sequences after a single repetition of the pattern. This suggests an innate capacity to rapidly detect sequential auditory regularities, paving the way for future learning and environmental adaptation.
However, the question of whether newborns can segregate streams based on concurrent cues has received considerably less attention. Only two studies have explored the neural correlates of concurrent stream segregation within the first year of life, and they yielded conflicting results. Both studies employed similar paradigms featuring 500-ms complex tones with mistuned harmonics; half of the tones contained an 8% mistuned second harmonic, while the other half was entirely in tune. While newborns (Bendixen et al., 2015) and 4- to 12-month-old infants (Folland et al., 2015) displayed an ORN in response to the mistuned complex tones, 2-month-olds did not (Folland et al., 2015). However, these studies tested only isolated complex tones, thereby precluding the integration of evidence over extended periods. Therefore, in the present study, we tested the joint functionality of concurrent (spectral) and sequential (temporal) integration in neonatal auditory stream segregation by presenting temporally coherent tonal figures within a random tone cloud (Teki et al., 2013; see Figure 1).
1.4. Research questions and hypotheses
In the present study, high-density EEG was recorded while sleeping newborns were exposed to temporally coherent tonal elements embedded in randomly varying background tones. To investigate stream segregation processes under different signal-to-noise ratios (SNR), we employed the stochastic figure–ground (SFG) stimulus paradigm, adapted from previous adult and infant studies (Teki et al., 2011, 2013, 2016; Tóth et al., 2016; see Figure 1). Here, the SNR indicates the number of coherent, temporally repeating signal components embedded in a background of randomly varying tones. Each stimulus consisted of a sequence of 30 chords (50 ms per chord). In Signal trials, a subset of tones with fixed frequencies repeated coherently across successive chords for 800 ms, forming a perceptual “figure,” while the remaining tones varied randomly in frequency and constituted the background. Noise-only trials contained only randomly varying tones. To probe stream segregation across different levels of signal salience, Signal trials varied in the number of coherent tones (5, 10, or 20), with the number of background tones decreasing accordingly, yielding Low-SNR, High-SNR, and Signal-only conditions.
The primary research question was whether temporal coherence supports auditory stream segregation already at birth, as reflected in newborns’ event-related brain responses. Based on the temporal coherence theory of auditory scene analysis and prior adult EEG findings using the SFG paradigm, we formulated component-specific hypotheses targeting the object-related negativity (ORN) and the later positive ERP component often referred to as P400 in the adult literature.
First, we hypothesized that signal-containing trials would elicit an ORN-like response relative to Noise-only trials, reflecting the neural detection of a coherent auditory object embedded in a random background. In adults, the ORN indexes the brain’s early, largely automatic assessment of the presence of a distinct sound source (Alain et al., 2001; Tóth et al., 2016). We therefore predicted that, even in newborns, temporally coherent figures would elicit a more negative response than Noise-only trials, indicating an early, automatic form of stream segregation driven by temporal coherence.
Second, we tested whether differences in Signal conditions with varying signal-to-noise ratios would elicit sensitivity levels similar to those of adults. Although adult listeners show graded ORN amplitude modulation as a function of figure salience and coherence strength, we expected that newborns would show limited differentiation between Low-SNR and High-SNR conditions. This prediction reflects the ongoing postnatal maturation of functional networks that integrate acoustic information over time and frequency, which may constrain fine-grained sensitivity to differences in signal strength at birth.
Third, we explored whether a later positive ERP component, potentially corresponding to the P400 observed in awake adults, would be present in newborns. In adults, the P400 has been associated with a later stage of auditory object formation and perceptual evaluation, particularly when the figure is task-relevant (Alain et al., 2001, 2002). Although newborns cannot perform explicit perceptual judgments, we hypothesized that a delayed and possibly attenuated late positivity might nevertheless emerge following signal-containing trials, reflecting an early neural correlate of successful stream segregation. As newborns were tested during sleep and the paradigm involved passive listening, we did not expect this late response to show the same functional characteristics or topography as the adult P400, but rather to represent a developmental precursor of this processing stage.
EEG recordings were conducted during quiet sleep, a standard practice in neonatal research that provides a stable recording environment with minimal movement and reduced sensory interference. Importantly, previous studies have demonstrated that early sensory and perceptual ERP components in newborns—such as the mismatch negativity and other sound-related responses—are robustly elicited during sleep and show functional similarities to responses observed during wakefulness, albeit with longer latencies and developmental differences in scalp distribution (Winkler et al., 2003; Bendixen et al., 2015; Tóth et al., 2023). Because the present paradigm relied on passive auditory stimulation and automatic segregation mechanisms rather than attention or task engagement, quiet sleep was considered an appropriate and methodologically justified state for probing the neural bases of auditory stream segregation at birth.
2. Methods
2.1. Participants
Forty-seven healthy full-term newborn infants (0–4 days of age; APGAR scores 9/10 or above) were tested. They had a mean gestational age of 280.35 days (40 + 3 weeks; SD = 8.12 days) and a mean birth weight of 3,549 g (SD = 434.67 g). All newborns included in the study passed the routine universal newborn hearing screening conducted at the maternity ward (based on otoacoustic emissions and/or automated auditory brainstem responses, according to hospital protocol). Only infants with documented normal screening results and without known risk factors for hearing impairment were included. Technical failures in the data-acquisition equipment (the stimulus markers were not recorded) led to the exclusion of five newborns, while preprocessing steps (see Section 2.5. EEG preprocessing) led to the exclusion of another nine. Thus, the final sample included 33 infants (15 males, 18 females). Informed consent was obtained from one or both parents, and the infant’s mother could opt to be present during the recording. The study fully complied with the World Medical Association Declaration of Helsinki and all applicable national and international laws. The Scientific and Research Ethics Committee of the Medical Research Council of Hungary (ETT TUKEB) granted permission for the research.
2.2. Stimuli
Stochastic figure–ground (SFG) stimuli were adapted from previous studies (Teki et al., 2011; Tóth et al., 2016). SFG stimuli consisted of a background of randomly varying frequency pure tones (noise) and, in some conditions, an additional set of pure tones with fixed frequencies repeating across successive time windows, forming a temporally coherent figure (signal).
The primary experimental manipulation was the number of temporally coherent frequency components forming the figure. For convenience, we refer to this manipulation as the signal-to-noise ratio (SNR), defined here as the number of coherent (signal) components embedded in a background of randomly varying tones. Although coherence strength in SFG paradigms is more commonly described in terms of the number of repeating elements, the term SNR is used in the present study as a shorthand to denote the relative proportion of coherent components, rather than acoustic energy in the classical sense.
Four stimulus conditions were used: Noise-only, Low-SNR, High-SNR, and Signal-only (200 stimuli per condition). In the Noise-only condition, stimuli contained no coherent components (0 signal components). In the Low-SNR and High-SNR conditions, stimuli contained 5 and 10 coherent frequency components, respectively. In the Signal-only condition, all 20 components were coherent. The four stimulus conditions are illustrated in Figure 1A in time–frequency space.
Each stimulus had a total duration of 1,500 ms and consisted of 30 consecutive time windows, each 50 ms in duration. Each time window contained 20 pure tones of different frequencies, resulting in a 30 × 20 time–frequency grid (Figure 1A). In stimuli containing a signal (all except Noise-only), an 800-ms window was pseudo-randomly selected to include the coherent components, with the constraint that the first 300 ms of each stimulus contained only noise. This randomization ensured the figure’s onset was unpredictable.
In the Noise-only condition, all 20 frequency components varied randomly from time window to time window throughout the entire stimulus. In the Signal conditions, the selected 800-ms window contained 5, 10, or 20 fixed frequency components (Low-SNR, High-SNR, and Signal-only, respectively), while the remaining components (15, 10, or 0, respectively) varied randomly in frequency, as in the Noise-only condition. Thus, the signal differed from the noise solely in its temporal structure, namely the repetition of fixed frequencies across successive time windows, and could not be distinguished based on intensity or spectral cues alone. Each 50-ms time window contained 20 pure tones (starting at zero phase), with a 10-ms raised-cosine ramp applied at both the onset and the offset. The frequencies were randomly selected from a 64-step set, logarithmically spaced between 500 and 4,000 Hz. The 500-Hz lower limit was chosen because low-frequency sound transmission in newborns is strongly affected by immature middle-ear mechanics and residual middle-ear fluid, leading to increased attenuation and variability below approximately 400–500 Hz (Keefe et al., 1993; Keefe and Levi, 1996).
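The grid logic described above can be sketched as follows (a minimal Python illustration; the actual stimuli were generated in Matlab, and all function and variable names here are ours, not from the original code):

```python
import numpy as np

# Parameters taken from the Methods; names are illustrative
N_WINDOWS = 30     # 30 x 50 ms = 1,500 ms stimulus
N_TONES = 20       # pure tones per 50-ms window
N_FREQS = 64       # 64-step log-spaced frequency pool
SIG_WINDOWS = 16   # 800 ms / 50 ms = 16 coherent windows
MIN_ONSET = 6      # first 300 ms (6 windows) contain noise only

FREQS = np.logspace(np.log10(500), np.log10(4000), N_FREQS)

def make_sfg_grid(n_signal, rng):
    """Return a (N_WINDOWS, N_TONES) grid of tone frequencies in Hz.

    n_signal: number of temporally coherent components
              (0 = Noise-only, 5 = Low-SNR, 10 = High-SNR, 20 = Signal-only).
    """
    # Background: every window gets N_TONES freshly drawn random frequencies
    grid = FREQS[rng.integers(0, N_FREQS, size=(N_WINDOWS, N_TONES))]
    if n_signal > 0:
        # Pseudo-random signal onset, constrained to start after 300 ms
        onset = rng.integers(MIN_ONSET, N_WINDOWS - SIG_WINDOWS + 1)
        # Fixed frequencies repeated across all 16 signal windows
        sig_freqs = FREQS[rng.choice(N_FREQS, size=n_signal, replace=False)]
        grid[onset:onset + SIG_WINDOWS, :n_signal] = sig_freqs
    return grid

low_snr = make_sfg_grid(5, np.random.default_rng(0))  # 5 coherent + 15 random
```

Note that the total number of tones per window is always 20, so the figure is defined purely by the repetition of fixed frequencies across windows.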
Following the protocol of Teki et al. (2011), all pure-tone components had equal amplitudes, without controlling for perceived loudness. Because the assignment of frequency components to signal or noise was randomized across stimuli, perceived loudness and saliency could not serve as reliable cues for distinguishing the figure from the background. Stimuli were presented at a comfortable overall intensity of approximately 70 dB SPL.
All stimuli were generated with Matlab (R2017a, Mathworks; Natick, MA, USA) at a sampling rate of 44.1 kHz and 16-bit resolution and delivered using a Maya 22 USB external sound card and ER•2 Insert Earphones (Etymotic Research Inc., Elk Grove Village, IL, USA), inserted into the infants’ ears using ER2 Foam Infant Ear-tips.
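The rendering of each 50-ms chord into audio, as described above (equal-amplitude, zero-phase tones with 10-ms raised-cosine ramps), can be sketched in Python as a stand-in for the original Matlab generation (function names are ours):

```python
import numpy as np

FS = 44100    # sampling rate (Hz), as in the Methods
WIN_MS = 50   # chord duration (ms)
RAMP_MS = 10  # raised-cosine onset/offset ramp (ms)

def synth_window(freqs):
    """Render one 50-ms chord of equal-amplitude, zero-phase pure tones."""
    n = int(FS * WIN_MS / 1000)
    t = np.arange(n) / FS
    chord = sum(np.sin(2 * np.pi * f * t) for f in freqs) / len(freqs)
    ramp_n = int(FS * RAMP_MS / 1000)
    ramp = 0.5 * (1 - np.cos(np.pi * np.arange(ramp_n) / ramp_n))  # raised cosine
    chord[:ramp_n] *= ramp          # onset ramp
    chord[-ramp_n:] *= ramp[::-1]   # offset ramp
    return chord

def synth_stimulus(grid):
    """Concatenate the chords of a time-frequency grid into one waveform."""
    return np.concatenate([synth_window(row) for row in grid])
```

Concatenating the 30 rows of a stimulus grid yields the full 1,500-ms waveform.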
2.3. Procedure
EEG recordings were conducted in a dedicated experimental room at the Department of Obstetrics and Gynecology, Szent Imre Hospital, Budapest. The newborn participants were asleep during stimulus presentation, with sleep states classified according to the criteria established by Anders and Weinstein (1972). Only infants who remained in quiet sleep for the entire 45-min duration were included in the study. In addition to behavioral criteria, the EEG signal was visually examined to ensure muscle tension was stable, respiration was regular, and significant eye movements were absent.
A total of 800 trials (200 for each of the four conditions: Noise-only, Low-SNR, High-SNR, and Signal-only) were presented in a single experimental block, with the order of the conditions randomized. Stimulus presentation was controlled using Psychophysics Toolbox (Kleiner et al., 2007), and EEG activity was recorded continuously throughout the presentation. The silent inter-stimulus interval (ISI, offset to onset) was randomized between 800 and 1,200 ms, with values rounded to the nearest 100 ms. The experimental session lasted approximately 45 min.
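The ISI randomization amounts to drawing a multiple of 100 ms between 800 and 1,200 ms; a one-line Python sketch (the helper name is ours):

```python
import numpy as np

def draw_isi(rng):
    """Silent offset-to-onset ISI in ms: 800-1,200 ms in 100-ms steps."""
    return int(rng.integers(8, 13) * 100)  # 800, 900, 1000, 1100, or 1200
```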
2.4. EEG data acquisition
An actiCHamp Plus amplifier with a 64-channel sponge-based electrode system (R-Net; saltwater sponges and passive Ag/AgCl electrodes) and BrainVision Recorder software were used for EEG recording (all from Brain Products GmbH, Gilching, Germany). The sampling rate was 500 Hz, with a 100-Hz online low-pass filter applied for visualization purposes only. Electrodes were placed according to the international 10/20 system. The Cz electrode served as the reference, and the ground electrode was placed at the center of the forehead. Impedances were kept below 15 kΩ during the recording.
2.5. EEG preprocessing
EEG data were imported to MATLAB (Mathworks, Natick, MA, USA; ver. 2021a) and preprocessed in EEGLab (Delorme and Makeig, 2004) using the validated NEAR pipeline (Kumaravel et al., 2022). In the NEAR pipeline, data were first band-pass filtered between 0.1 and 100 Hz with a finite impulse response (FIR) filter. Cut-off frequencies were 0.05 and 100.05 Hz (−6 dB). Then, the Local Outlier Factor (LOF), a data-driven approach based on the squared Euclidean distance across electrodes, was applied to remove noisy channels (Kumaravel et al., 2022). Following the recommendations of Kumaravel et al. (2022) for newborns, we set the threshold at 2.5.
This was followed by Artifact Subspace Reconstruction (ASR; Mullen et al., 2015). Channels were deemed ‘bad’ and removed if they were flat for longer than 5 s, or based on the correlation between each channel and its RANdom SAmple Consensus (RANSAC) reconstruction, computed for each window; the correlation threshold for channel removal was set at 0.8, and the maximum number of channels rejected was 11. A Principal Component Analysis (PCA) decomposition was then applied to all recordings to identify artifactual components (defined by comparison with the cleanest parts of the data), reject them, and reconstruct activity from the remaining components. The ASR burst-rejection criterion was set at 20 SDs. After all ASR steps, the mean number of electrodes rejected was 6.51 (SD = 2.31). Participants with more than 11 noisy electrodes were excluded from further analysis; this threshold was chosen based on ranges typical of developmental samples and led to the exclusion of nine participants, as noted in Section 2.1. The removed electrodes were spherically interpolated using the full-rank electrode matrix, and the data were re-referenced to the average reference.
The data were then epoched into −200 to 1,000 ms segments relative to the onset of the signal in the signal conditions. In the Noise-only condition, trigger time points were selected pseudo-randomly by using the onset value of a trigger from one of the signal trials. The −200 to 1,000 ms range was chosen because target responses reportedly arise within the first 500 ms in newborns (Bendixen et al., 2015), and to minimize the number of epochs rejected due to late artifacts (beyond our focus of interest). After baseline correction, epochs containing abrupt amplitude changes were rejected using a threshold of ±75 μV. In addition, a joint-probability-based rejection was used to remove epochs exceeding 3 SDs from the mean activity across channels. The average number of remaining epochs was 135.44 (SD = 20.17) for the Noise-only condition, 136.29 (SD = 18.96) for the Low-SNR condition, 134.11 (SD = 20.6) for the High-SNR condition, and 137.62 (SD = 16.16) for the Signal-only condition (543.48, SD = 72.98 epochs in total per infant). The proportion of rejected epochs is similar to that reported in other neonatal studies; rejections are typically attributed to neonates moving during sleep, poor electrode adherence to the scalp, and occasional high-amplitude electromagnetic signals in the hospital environment. A series of t-tests assessing differences in epoch numbers across the four conditions revealed no significant differences (lowest p = 0.44).
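The two epoch-rejection criteria can be illustrated with a simplified Python sketch (the joint-probability step below is a stand-in for EEGLab's per-channel probability-based rejection, summarized here by a single global measure per epoch):

```python
import numpy as np

AMP_THRESH = 75.0  # +/- 75 microvolt absolute-amplitude criterion
Z_THRESH = 3.0     # epochs beyond 3 SDs of across-epoch activity

def reject_epochs(epochs):
    """epochs: array (n_epochs, n_channels, n_samples), in microvolts.

    Returns a boolean mask of epochs to KEEP.
    """
    # Criterion 1: reject epochs with any deflection beyond +/- 75 microvolts
    amp_ok = np.all(np.abs(epochs) <= AMP_THRESH, axis=(1, 2))
    # Criterion 2 (simplified): flag epochs whose global field power
    # deviates by more than 3 SDs from the across-epoch mean
    gfp = epochs.std(axis=1).mean(axis=1)  # one summary value per epoch
    z = (gfp - gfp.mean()) / gfp.std()
    prob_ok = np.abs(z) <= Z_THRESH
    return amp_ok & prob_ok
```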
2.6. ERP analysis
Before ERP analysis, the EEG data were low-pass filtered at 45 Hz. For each signal-containing condition (Low-SNR, High-SNR, and Signal-only), we computed difference waveforms by subtracting the ERP obtained in the Noise-only condition from the ERP of the given signal condition. Visual inspection of the difference waveforms revealed two distinct components with different scalp distributions: an early component, occurring between 300 and 600 ms, exhibited a temporo-parietal maximum, whereas a late component, occurring between 600 and 1,000 ms, showed a predominantly fronto-central distribution (Figure 2).
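The difference waveforms and window averages can be expressed compactly (a Python sketch; the analysis itself was run in Matlab, and the helper names are ours):

```python
import numpy as np

FS = 500   # EEG sampling rate (Hz)
T0 = -0.2  # epoch start relative to signal onset (s)

def difference_wave(signal_erp, noise_erp):
    """Signal-condition ERP minus the Noise-only ERP (n_channels, n_samples)."""
    return signal_erp - noise_erp

def window_mean(erp, t_start, t_stop):
    """Mean amplitude per channel within a latency window (in seconds)."""
    i0 = int(round((t_start - T0) * FS))
    i1 = int(round((t_stop - T0) * FS))
    return erp[:, i0:i1].mean(axis=1)
```

For example, `window_mean(diff, 0.3, 0.6)` would give the per-channel mean amplitude of the early window.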
Figure 2.
Group-average (N = 33) scalp distribution and ERP waveforms for the difference between the Signal conditions (Low-SNR, High-SNR, and Signal-only) and the Noise-only condition. Topoplots highlight the scalp distribution of the difference waveforms at the peak of the early response (top of the figure) and a characteristic latency of the late slow shift (bottom right of the figure); see the color scale for the difference amplitudes. The difference waveforms are shown for the means of channels Fp1, Fz, F3, F7, FC5, FC1, FC2, F4, Fp2, AF7, AF3, AFz, F1, F5, FT7, FC3, FC4, FT8, F6, F2, AF4, with color marking the condition. Confidence areas represent standard errors.
Based on these observed spatial and temporal characteristics, we restricted further statistical testing to these two time windows and to electrode groups corresponding to the visually identified topographies, frontal and temporo-parietal electrodes for the early window (Fz, F3, FC2, AFz, F1, F2, F7, FC5, FC3, F5, FT7, C3, C5, CP3, CP5, P7, P9, P5, T7, and TP7), and fronto-central electrodes for the late window (Fp1, Fz, F3, F7, FC5, FC1, C3, FC2, FC6, F10, F8, F4, Fp2, AF7, AF3, AFz, F1, F5, FT7, FC3, FC4, FT8, F6, F2, AF4, AF8, T7, CP5, P7, P9, T8, C1, C5, TP7, CP3, P5, C6, and C2). To ensure that the results were not driven by gender differences, we conducted independent-samples t-tests on the averaged values across the two time windows by gender. No significant effects were observed (all ps > 0.29). As an exploratory analysis, we examined correlations between maturational indices, defined as z-scores of maternal age, gestational age at birth, birth weight, birth length, and head circumference, and the mean values obtained from the early and late time windows.
Permutation-based t-tests were then used to assess (1) whether voltage values at individual time points were significantly higher than those observed in the Noise-only condition, analyzed separately for each signal condition, and (2) whether the signal conditions significantly differed from one another. For (1), one-tailed tests were used because the comparisons were made against the Noise-only condition, which represented our theoretical zero value, and our hypotheses predicted a specific direction of effect: signal-containing conditions were expected to produce greater responses than Noise-only. For (2), two-tailed tests were used. Permutation testing is a non-parametric statistical approach that builds the “null” distribution by repeatedly shuffling condition labels: condition labels were randomly permuted across subjects, and the test statistic (the t-value) was recalculated for each permutation. Repeating this procedure 1,000 times yielded an empirical null distribution, against which the observed statistic was compared to obtain a permutation-based p-value (p < 0.05). This method offers two main advantages: it avoids assumptions about the distribution of the data, and it mitigates multiple-comparison issues. Statistical analyses were conducted using the Brainstorm toolbox (Tadel et al., 2011) in Matlab (ver. 2024a).
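For a paired design such as this one, shuffling condition labels within subjects is equivalent to randomly flipping the sign of each subject's condition difference. A minimal Python sketch of this logic follows (the actual analysis used the Brainstorm toolbox; this is an illustration, not its implementation):

```python
import numpy as np

def permutation_t_test(a, b, n_perm=1000, tail="two", seed=0):
    """Paired permutation t-test on per-subject values a and b.

    The null distribution is built by randomly flipping the sign of each
    subject's difference, i.e., shuffling condition labels within subject.
    """
    rng = np.random.default_rng(seed)
    d = np.asarray(a) - np.asarray(b)
    n = len(d)
    t_obs = d.mean() / (d.std(ddof=1) / np.sqrt(n))
    null = np.empty(n_perm)
    for i in range(n_perm):
        dp = d * rng.choice([-1, 1], size=n)  # one label permutation
        null[i] = dp.mean() / (dp.std(ddof=1) / np.sqrt(n))
    if tail == "one":  # directional test against the Noise-only baseline
        p = (np.sum(null >= t_obs) + 1) / (n_perm + 1)
    else:              # two-tailed test for between-signal comparisons
        p = (np.sum(np.abs(null) >= abs(t_obs)) + 1) / (n_perm + 1)
    return t_obs, p
```

In practice, such tests are run per time point or per electrode, with appropriate control across the resulting family of tests.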
2.7. Transparency and openness
Data are available on request due to privacy and ethical restrictions.
3. Results
Brain activity significantly differed from the Noise-only condition in all three signal conditions (Low-SNR, High-SNR, and Signal-only). Moreover, the Low-SNR condition was significantly different from the Signal-only condition. No other significant differences were observed between the signal conditions (Low- vs. High-SNR; High-SNR vs. Signal-only).
In more detail, permutation-based t-tests confirmed that amplitudes during the early (300–600 ms) and late (600–1,000 ms) windows were significantly greater than zero over the temporo-parietal (early window) and fronto-central (late window) electrodes for all three signal conditions (p < 0.05; Figure 3). This pattern reflects a posterior-to-anterior progression of the neural response over time.
Figure 3.
Topography (scalp distribution) of t-values from permutation testing is shown for the early (300–600 ms) and late (600–1,000 ms) time windows (indicated in gray), highlighting differences between the signal conditions and the Noise-only condition (A–C). T-values are represented in color [see color scales in (A)]. The ERP waveforms are shown, for illustrative purposes, for selected channels (P9, Fz) that exhibited maximal differences in permutation testing. Shaded areas represent standard errors.
Regarding the between-condition comparisons, we observed a significant difference between the Low-SNR and the Signal-only conditions, with increased parietal activity in the Low-SNR condition during both the early and the late time windows (p < 0.05; Figure 4). In contrast to comparisons between the signal conditions and the Noise-only condition, the activity in the Low-SNR vs. Signal-only comparison remained focused over parietal areas across both windows. No other significant differences were observed between the remaining condition pairs.
Figure 4.
Topography (scalp distribution) of t-values from permutation testing is shown for the early (300–600 ms) and late (600–1,000 ms) time windows (indicated in gray), highlighting differences between the Low-SNR and the Signal-only condition. T-values are represented in color (see color scales). The ERP waveforms are shown for P9 for illustrative purposes. Shaded areas represent standard errors.
Although correlations with maturational indices were not statistically significant (all ps > 0.9), some patterns may warrant future investigation. Specifically, we observed a negative correlation between the early window and gestational age at birth (r = −0.17), a positive correlation between the late window and birth weight (r = 0.22), and a positive correlation between the early window and head circumference at birth (r = 0.26). Graphical representation of these relationships can be found in the Supplementary material.
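The exploratory correlations reported here are standard Pearson coefficients. For illustration, a self-contained sketch (the helper name is ours, not from the analysis pipeline):

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length arrays,
    e.g., a maturational index (gestational age, birth weight, head
    circumference) and a per-infant ERP measure."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xm, ym = x - x.mean(), y - y.mean()
    return float((xm * ym).sum() / np.sqrt((xm**2).sum() * (ym**2).sum()))
```

With the small samples typical of neonatal studies, coefficients of this magnitude (|r| ≈ 0.2) are well within what sampling noise alone produces, which is consistent with treating these patterns as exploratory.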
4. Discussion
The results provide evidence for the role of temporal coherence in organizing auditory scenes from birth. This study demonstrates that newborns can segregate auditory streams based on temporal coherence, indicating that the ability to segment sensory input into discrete objects (Winkler and Denham, 2024) is already present at birth. By calculating difference waveforms and examining their spatial distributions, we identified two temporally distinct ERP responses: an early response between approximately 300 and 600 ms with a temporo-parietal maximum, and a later response between 600 and 1,000 ms with a more fronto-central distribution. Both responses were observed in all three signal conditions (Low-SNR, High-SNR, and Signal-only) relative to the Noise-only condition.
In adult listeners, auditory figure–ground segregation is typically associated with the object-related negativity (ORN), followed by a later positive component such as the P400 (Alain et al., 2001; Johnson et al., 2007). In the present newborn sample, however, the early response differed from the canonical adult ORN in several respects, including its polarity, scalp distribution, and hemispheric lateralization. We therefore interpret this early activity not as a direct homolog of the adult ORN, but rather as an ORN-like or putative object-related response, potentially reflecting an early developmental stage of auditory object processing. Such deviations from adult ERP morphology are expected in neonates. Early auditory responses in infancy often differ in polarity, latency, and topography due to immature cortical layering, incomplete myelination, and developing long-range connectivity (Adibpour et al., 2020; Long et al., 2018; Polver et al., 2023). From this perspective, the early response observed here may represent a developmental precursor of later object-related components that become more focal, midline-centered, and negative in polarity over the course of maturation.
Visual inspection of the scalp distributions suggests that the ORN-like response in newborns may be relatively stronger over the right hemisphere. In adults, figure–ground segregation and temporal coherence–based auditory object formation have been shown to preferentially engage right auditory cortical regions, including the planum temporale and superior temporal gyrus, particularly for non-speech and spectrally complex stimuli (Teki et al., 2013; Molloy et al., 2019). Developmental neuroimaging studies indicate that hemispheric asymmetries in auditory processing are already detectable in early infancy, although they remain less specialized and more variable than in adults (Dehaene-Lambertz and Pena, 2001; Minagawa-Kawai et al., 2011). In newborn EEG, apparent lateralization effects may further be shaped by ongoing cortical maturation, skull conductivity, and individual anatomical variability. Consequently, while the observed right-leaning asymmetry may reflect an early bias in the neural mechanisms supporting auditory scene analysis, it should be interpreted cautiously. Longitudinal and source-level studies will be necessary to determine whether such asymmetries are stable across development and whether they relate to later specialization for speech, music, or other complex auditory functions.
Although newborns were tested during quiet sleep, which precluded explicit perceptual reports and a definitive identification of the P400, the fronto-central activity observed in the 600–1,000 ms window may reflect a late positivity that is a precursor to the P400 described in adults, with which it aligns closely. The presence of this late positivity suggests that higher-level evaluation of temporally coherent auditory input can occur even in the absence of attention or conscious awareness. This interpretation is consistent with evidence that newborns can learn and form auditory representations during sleep (Cheour et al., 2002; Fifer et al., 2010). Our findings partially diverge from previous work on auditory novelty detection, which reported novelty responses in preterm infants only during active sleep, whereas basic sensory perception and responses to standard tones were observed in both active and quiet sleep (Suppiej et al., 2010; Kushnerenko et al., 2013).
In contrast, our results indicate that even during quiet sleep, newborns are capable of processing auditory scenes, underscoring the automatic nature of these mechanisms. Both the early negative wave, emerging around 300 ms over central and temporo-parietal regions, and the subsequent fronto-central positivity resemble adult ERP components (Boncz et al., 2024; Tóth et al., 2016), suggesting continuity in the developmental trajectory of auditory processing. However, the longer latencies observed in newborns compared to adults are consistent with maturational delays in neonatal auditory processing, likely reflecting ongoing myelination and synaptic development in primary and secondary auditory areas, which support processes critical for temporal binding and auditory stream segregation (Adibpour et al., 2020; Long et al., 2018). Indeed, infants have been reported to exhibit longer response latencies earlier in development, with these differences gradually decreasing as they mature (Polver et al., 2023). Although our exploratory correlations between maturational indices (e.g., birth weight, gestational age, head circumference) and early and late response latencies were not significant, the observed trends suggest that early growth characteristics may still be relevant for understanding developmental changes in response timing. Future studies including multiple age groups will therefore be essential for mapping the developmental trajectory of response latencies and how figure salience influences them.
Taken together, our findings indicate that while the late P400-like response provides the clearest evidence for auditory stream segregation at birth, the earlier ORN-like activity may reflect an immature precursor of object-related processing. Future longitudinal work will be essential to determine how these early and late responses evolve into the canonical adult ERP components associated with auditory scene analysis.
In contrast to adult listeners, newborns did not show a monotonic increase in neural response amplitude with increasing figure coherence. Instead, the only reliable difference between signal conditions was observed between the Low-SNR and Signal-only conditions, with increased parietal activity in the Low-SNR condition. This pattern differs from adult findings, in which both ORN and P400 amplitudes scale positively with figure coherence and duration (Tóth et al., 2016). We therefore interpret the Low-SNR effect not as evidence for enhanced segregation at lower coherence levels, but as reflecting increased processing demands when temporally coherent elements are embedded in a more complex and acoustically variable background. In newborns, immature temporal integration mechanisms may lead to greater neural engagement under more challenging listening conditions, without yielding stronger or more stable perceptual object representations. The Signal-only condition, by contrast, may place lower demands on segregation mechanisms, resulting in reduced engagement of distributed parietal networks. The absence of systematic differences between the Low- and High-SNR conditions further suggests that newborn auditory processing does not yet exhibit the graded sensitivity to figure coherence observed in adults; adult-like scaling of auditory object representations likely emerges only later in development.
The use of permutation-based statistical testing enabled us to identify robust effects in both the early and late time windows without relying on assumptions of normality—an important consideration given the variability of neonatal EEG data. These tests confirmed that the topography of neural activity shifted significantly over time: from a temporo-parietal distribution in the early window to a more fronto-central pattern in the later window. This dynamic spatial progression may indicate a transition from initial sensory/perceptual processing to higher-level evaluation of the auditory input. Similar topographic shifts have been observed in adults during auditory figure-ground segregation, which are thought to reflect increased interaction between primary auditory cortices and higher-order associative areas (Diepenbrock et al., 2017; Mishra et al., 2021; Viswanathan et al., 2022).
These findings contribute to a growing body of evidence suggesting that the basic neural architecture supporting auditory stream segregation is present from birth. The temporally and spatially distinct ERP components observed in our study reflect an emerging capacity for detecting salient auditory events in complex soundscapes. This capacity likely serves as a foundation for the development of selective attention and speech perception (Elhilali et al., 2009; Kujala et al., 2023; Rezaeizadeh and Shamma, 2021; Stefanics et al., 2007), enabling infants to orient toward relevant signals such as the caregiver’s voice (Winkler et al., 2003; Kujala et al., 2023) and to extract wordforms from auditory streams (Kujala et al., 2023).
Future work employing source reconstruction and connectivity analysis could further clarify the cortical areas generating these early and late components, as well as their developmental trajectories. Longitudinal studies could also examine how the maturation of spatially distinct ERP components relates to the emergence of language, attention, and social communication abilities. In summary, this study provides compelling evidence that human newborns can segregate temporally coherent auditory input from background noise. The observed spatiotemporal dynamics—marked by early sensory and later associative responses—highlight the foundational neural mechanisms of auditory scene analysis at birth. These mechanisms likely support early learning through observation and interaction, laying the groundwork for the development of communication and cognition.
Acknowledgments
We thank Eszter Rozgonyiné Lányi for her assistance during data collection.
Funding Statement
The author(s) declared that financial support was received for this work and/or its publication. This work was funded by the Hungarian National Research, Development and Innovation Office (ANN131305 to BT, FK139135 to GH, and K147135 to IW); the János Bolyai Research Grant awarded to GH (BO/00523/21/2); and the New National Excellence Program of the Ministry for Innovation and Technology, from the source of the National Research, Development and Innovation Fund (ÚNKP-23-5-BME-429), for GH.
Footnotes
Edited by: Peter Sörös, University of Oldenburg, Germany
Reviewed by: Yasuki Noguchi, Kobe University, Japan
Elena Benocci, Université Libre de Bruxelles, Belgium
Data availability statement
The raw data supporting the conclusions of this article will be made available by the authors without undue reservation.
Ethics statement
The studies involving humans were approved by The Scientific and Research Ethics Committee of the Medical Research Council of Hungary (ETT TUKEB). The studies were conducted in accordance with the local legislation and institutional requirements. Written informed consent for participation in this study was provided by the participants’ legal guardians/next of kin.
Author contributions
SP: Methodology, Formal analysis, Writing – review & editing, Writing – original draft. PK: Writing – review & editing, Methodology. GH: Conceptualization, Methodology, Writing – review & editing. IS: Project administration, Data curation, Writing – review & editing, Funding acquisition. IW: Writing – review & editing, Conceptualization. BT: Funding acquisition, Project administration, Writing – review & editing, Writing – original draft, Methodology, Data curation, Conceptualization.
Conflict of interest
The author(s) declared that this work was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Generative AI statement
The author(s) declared that Generative AI was not used in the creation of this manuscript.
Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.
Publisher’s note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
Supplementary material
The Supplementary material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fnhum.2026.1719515/full#supplementary-material
References
- Adibpour P., Lebenberg J., Kabdebon C., Dehaene-Lambertz G., Dubois J. (2020). Anatomo-functional correlates of auditory development in infancy. Dev. Cogn. Neurosci. 42:100752. doi: 10.1016/j.dcn.2019.100752, [DOI] [PMC free article] [PubMed] [Google Scholar]
- Alain C., Arnott S. R., Picton T. W. (2001). Bottom–up and top–down influences on auditory scene analysis: evidence from event-related brain potentials. J. Exp. Psychol. Hum. Percept. Perform. 27:1072. doi: 10.1037/0096-1523.27.5.1072 [DOI] [PubMed] [Google Scholar]
- Alain C., Schuler B. M., McDonald K. L. (2002). Neural activity associated with distinguishing concurrent auditory objects. J. Acoust. Soc. Am. 111, 990–995. doi: 10.1121/1.1434942, [DOI] [PubMed] [Google Scholar]
- Anders T. F., Weinstein P. (1972). Sleep and its disorders in infants and children: a review. Pediatrics 50, 312–324. doi: 10.1542/peds.50.2.312 [DOI] [PubMed] [Google Scholar]
- Bendixen A., Háden G. P., Németh R., Farkas D., Török M., Winkler I. (2015). Newborn infants detect cues of concurrent sound segregation. Dev. Neurosci. 37, 172–181. doi: 10.1159/000370237, [DOI] [PubMed] [Google Scholar]
- Boncz Á., Szalárdy O., Velősy P. K., Béres L., Baumgartner R., Winkler I., et al. (2024). The effects of aging and hearing impairment on listening in noise. Iscience 27:109295. doi: 10.1016/j.isci.2024.109295, [DOI] [PMC free article] [PubMed] [Google Scholar]
- Bregman A. S. (1994). Auditory scene analysis: the perceptual organization of sound. Cambridge, MA: MIT Press. [Google Scholar]
- Cheour M., Martynova O., Näätänen R., Erkkola R., Sillanpää M., Kero P., et al. (2002). Speech sounds learned by sleeping newborns. Nature 415, 599–600. doi: 10.1038/415599b, [DOI] [PubMed] [Google Scholar]
- Dehaene-Lambertz G., Pena M. (2001). Electrophysiological evidence for automatic phonetic processing in neonates. Neuroreport 12, 3155–3158. doi: 10.1097/00001756-200110080-00034 [DOI] [PubMed] [Google Scholar]
- Delorme A., Makeig S. (2004). EEGLAB: an open source toolbox for analysis of single-trial EEG dynamics including independent component analysis. J. Neurosci. Methods 134, 9–21. doi: 10.1016/j.jneumeth.2003.10.009, [DOI] [PubMed] [Google Scholar]
- Demany L. (1982). Auditory stream segregation in infancy. Infant Behav. Dev. 5, 261–276. doi: 10.1016/S0163-6383(82)80036-2 [DOI] [Google Scholar]
- Diepenbrock J. P., Jeschke M., Ohl F. W., Verhey J. L. (2017). Comodulation masking release in the inferior colliculus by combined signal enhancement and masker reduction. J. Neurophysiol. 117, 853–867. doi: 10.1152/jn.00191.2016, [DOI] [PMC free article] [PubMed] [Google Scholar]
- Elhilali M., Ma L., Micheyl C., Oxenham A. J., Shamma S. A. (2009). Temporal coherence in the perceptual organization and cortical representation of auditory scenes. Neuron 61, 317–329. doi: 10.1016/j.neuron.2008.12.005, [DOI] [PMC free article] [PubMed] [Google Scholar]
- Fifer W. P., Byrd D. L., Kaku M., Eigsti I. M., Isler J. R., Grose-Fifer J., et al. (2010). Newborn infants learn during sleep. Proc. Natl. Acad. Sci. 107, 10320–10323. doi: 10.1073/pnas.1005061107 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Folland N. A., Butler B. E., Payne J. E., Trainor L. J. (2015). Cortical representations sensitive to the number of perceived auditory objects emerge between 2 and 4 months of age: electrophysiological evidence. J. Cogn. Neurosci. 27, 1060–1067. [PubMed] [Google Scholar]
- Hermes D. J. (2023). The perceptual structure of sound. Eindhoven: Springer. [Google Scholar]
- Johnson B. W., Hautus M. J., Duff D. J., Clapp W. C. (2007). Sequential processing of interaural timing differences for sound source segregation and spatial localization: evidence from event-related cortical potentials. Psychophysiology 44, 541–551. doi: 10.1111/j.1469-8986.2007.00535.x, [DOI] [PubMed] [Google Scholar]
- Keefe D. H., Bulen J. C., Arehart K. H., Burns E. M. (1993). Ear-canal impedance and reflection coefficient in human infants and adults. J. Acoust. Soc. Am. 94, 2617–2638. doi: 10.1121/1.407347 [DOI] [PubMed] [Google Scholar]
- Keefe D. H., Levi E. (1996). Maturation of the middle and external ears: acoustic power-based responses and reflectance tympanometry. Ear Hear. 17, 361–373, [DOI] [PubMed] [Google Scholar]
- Kleiner M., Brainard D., Pelli D. (2007). “What’s new in Psychtoolbox-3?” Perception 36 ECVP abstract supplement.
- Köhler W. (1947). Gestalt psychology: an introduction to new concepts in modern psychology. New York, NY: Liveright. [Google Scholar]
- Krishnan L., Elhilali M., Shamma S. (2014). Segregating complex sound sources through temporal coherence. PLoS Comput. Biol. 10:e1003985. doi: 10.1371/journal.pcbi.1003985, [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kujala T., Partanen E., Virtala P., Winkler I. (2023). Prerequisites of language acquisition in the newborn brain. Trends Neurosci. 46, 726–737. doi: 10.1016/j.tins.2023.05.011, [DOI] [PubMed] [Google Scholar]
- Kumaravel V. P., Farella E., Parise E., Buiatti M. (2022). NEAR: an artifact removal pipeline for human newborn EEG data. Dev. Cogn. Neurosci. 54:101068. doi: 10.1016/j.dcn.2022.101068, [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kushnerenko E. V., Van den Bergh B. R. H., Winkler I. (2013). Separating acoustic deviance from novelty during the first year of life: a review of event-related potential evidence. Front. Psychol. 4:595. doi: 10.3389/fpsyg.2013.00595, [DOI] [PMC free article] [PubMed] [Google Scholar]
- Leibold L. J. (2011). “Development of auditory scene analysis and auditory attention” in Human auditory development (New York, NY: Springer New York), 137–161. [Google Scholar]
- Long P., Wan G., Roberts M. T., Corfas G. (2018). Myelin development, plasticity, and pathology in the auditory system. Dev. Neurobiol. 78, 80–92. doi: 10.1002/dneu.22538, [DOI] [PMC free article] [PubMed] [Google Scholar]
- Lu K., Xu Y., Yin P., Oxenham A. J., Fritz J. B., Shamma S. A. (2017). Temporal coherence structure rapidly shapes neuronal interactions. Nat. Commun. 8:13900. doi: 10.1038/ncomms13900 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Marie C., Trainor L. J. (2013). Development of simultaneous pitch encoding: infants show a high voice superiority effect. Cereb. Cortex 23, 660–669. doi: 10.1093/cercor/bhs050 [DOI] [PubMed] [Google Scholar]
- McAdams S., Bertoncini J. (1997). Organization and discrimination of repeating sound sequences by newborn infants. J. Acoust. Soc. Am. 102, 2945–2953. doi: 10.1121/1.420349, [DOI] [PubMed] [Google Scholar]
- Minagawa-Kawai Y., Cristià A., Dupoux E. (2011). Cerebral lateralization and early speech acquisition: a developmental scenario. Dev. Cogn. Neurosci. 1, 217–232. doi: 10.1016/j.dcn.2011.03.005, [DOI] [PMC free article] [PubMed] [Google Scholar]
- Mishra A. P., Peng F., Li K., Harper N., Schnupp J. W. (2021). Sensitivity of the neural responses to statistical features of sound textures in the inferior colliculus. Hear. Res. 412. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Molloy K., Lavie N., Chait M. (2019). Auditory figure-ground segregation is impaired by high visual load. J. Neurosci. 39, 1699–1708. doi: 10.1523/JNEUROSCI.2518-18.2018 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Mullen T. R., Kothe C. A., Chi Y. M., Ojeda A., Kerth T., Makeig S., et al. (2015). Real-time neuroimaging and cognitive monitoring using wearable dry EEG. IEEE Trans. Biomed. Eng. 62, 2553–2567. doi: 10.1109/TBME.2015.2481482 [DOI] [PMC free article] [PubMed] [Google Scholar]
- O'Sullivan J. A., Shamma S. A., Lalor E. C. (2015). Evidence for neural computations of temporal coherence in an auditory scene and their enhancement during active listening. J. Neurosci. 35, 7256–7263. doi: 10.1523/JNEUROSCI.4973-14.2015, [DOI] [PMC free article] [PubMed] [Google Scholar]
- Polver S., Háden G. P., Bulf H., Winkler I., Tóth B. (2023). Early maturation of sound duration processing in the infant’s brain. Sci. Rep. 13:10287. doi: 10.1038/s41598-023-36794-x, [DOI] [PMC free article] [PubMed] [Google Scholar]
- Rezaeizadeh M., Shamma S. (2021). Binding the acoustic features of an auditory source through temporal coherence. Cereb. Cortex Commun. 2:tgab060. doi: 10.1093/texcom/tgab060, [DOI] [PMC free article] [PubMed] [Google Scholar]
- Shamma S., Elhilali M., Ma L., Micheyl C., Oxenham A. J., Pressnitzer D., et al. (2013). “Temporal coherence and the streaming of complex sounds” in Basic aspects of hearing: physiology and perception (New York: Springer), 535–543. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Shamma S. A., Elhilali M., Micheyl C. (2011). Temporal coherence and attention in auditory scene analysis. Trends Neurosci. 34, 114–123. doi: 10.1016/j.tins.2010.11.002, [DOI] [PMC free article] [PubMed] [Google Scholar]
- Smith N. A., Trainor L. J. (2011). Auditory stream segregation improves infants’ selective attention to target tones amid distracters. Infancy 16, 655–668. doi: 10.1111/j.1532-7078.2011.00067.x [DOI] [PMC free article] [PubMed] [Google Scholar]
- Stefanics G., Háden G., Huotilainen M., Balázs L., Sziller I., Beke A., et al. (2007). Auditory temporal grouping in newborn infants. Psychophysiology 44, 697–702. doi: 10.1111/j.1469-8986.2007.00540.x [DOI] [PubMed] [Google Scholar]
- Suppiej A., Mento G., Zanardo V., Franzoi M., Battistella P. A., Ermani M., et al. (2010). Auditory processing during sleep in preterm infants: an event related potential study. Early Hum. Dev. 86, 807–812. [DOI] [PubMed] [Google Scholar]
- Tadel F., Baillet S., Mosher J. C., Pantazis D., Leahy R. M. (2011). Brainstorm: a user-friendly application for MEG/EEG analysis. Comput. Intell. Neurosci. 2011:879716. doi: 10.1155/2011/879716, [DOI] [PMC free article] [PubMed] [Google Scholar]
- Teki S., Barascud N., Picard S., Payne C., Griffiths T. D., Chait M. (2016). Neural correlates of auditory figure-ground segregation based on temporal coherence. Cereb. Cortex 26, 3669–3680. doi: 10.1093/cercor/bhw173, [DOI] [PMC free article] [PubMed] [Google Scholar]
- Teki S., Chait M., Kumar S., Shamma S., Griffiths T. D. (2013). Segregation of complex acoustic scenes based on temporal coherence. eLife 2:e00699. doi: 10.7554/eLife.00699, [DOI] [PMC free article] [PubMed] [Google Scholar]
- Teki S., Chait M., Kumar S., von Kriegstein K., Griffiths T. D. (2011). Brain bases for auditory stimulus-driven figure–ground segregation. J. Neurosci. 31, 164–171. doi: 10.1523/JNEUROSCI.3788-10.2011, [DOI] [PMC free article] [PubMed] [Google Scholar]
- Tóth B., Kocsis Z., Háden G. P., Szerafin Á., Shinn-Cunningham B. G., Winkler I. (2016). EEG signatures accompanying auditory figure-ground segregation. NeuroImage 141, 108–119. doi: 10.1016/j.neuroimage.2016.07.028, [DOI] [PMC free article] [PubMed] [Google Scholar]
- Tóth B., Velősy P. K., Kovács P., Háden G. P., Polver S., Sziller I., et al. (2023). Auditory learning of recurrent tone sequences is present in the newborn's brain. NeuroImage 281:120384. doi: 10.1016/j.neuroimage.2023.120384, [DOI] [PubMed] [Google Scholar]
- van Noorden L. P. A. S. (1975). Temporal coherence in the perception of tone sequences. Eindhoven: Technical University. [Google Scholar]
- Viswanathan V., Shinn-Cunningham B. G., Heinz M. G. (2022). Speech categorization reveals the role of early-stage temporal-coherence processing in auditory scene analysis. J. Neurosci. 42, 240–254. doi: 10.1523/JNEUROSCI.1610-21.2021, [DOI] [PMC free article] [PubMed] [Google Scholar]
- Winkler I., Denham S. L. (2024). The role of auditory source and action representations in segmenting experience into events. Nat. Rev. Psychol. 3, 223–241. doi: 10.1038/s44159-024-00287-z [DOI] [Google Scholar]
- Winkler I., Kushnerenko E., Horváth J., Čeponienė R., Fellman V., Huotilainen M., et al. (2003). Newborn infants can organize the auditory world. Proc. Natl. Acad. Sci. 100, 11812–11815. doi: 10.1073/pnas.2031891100, [DOI] [PMC free article] [PubMed] [Google Scholar]