
Linking the neural basis of distributional statistical learning with transitional statistical learning: The paradox of attention

Julie M Schneider 1,2, Yi-Lun Weng 1, Anqi Hu 1, Zhenghan Qi 1,3

Abstract

Statistical learning, the process of tracking distributional information and discovering embedded patterns, is traditionally regarded as a form of implicit learning. However, recent studies have proposed that both implicit (attention-independent) and explicit (attention-dependent) learning systems are involved in statistical learning. To understand the role of attention in statistical learning, the current study investigates the cortical processing of distributional patterns in speech across local and global contexts. We then ask how these cortical responses relate to statistical learning behavior in a word segmentation task. We found Event-Related Potential (ERP) evidence of pre-attentive processing of both local (mismatch negativity) and global (late discriminative negativity) distributional information. However, as speech elements became less frequent and more surprising, some participants showed an involuntary attentional shift, reflected in a P3a response. Individuals who displayed attentive neural tracking of distributional information showed faster learning in a speech statistical learning task. These results suggest that an involuntary attentional shift might play a facilitatory, but not essential, role in statistical learning.

Keywords: Statistical learning, MMN, P3a, ERP, attention

Introduction

Humans are equipped with remarkable sensitivity to the frequencies, regularities, and variabilities of the information encountered in their environments. The process of detecting statistical patterns embedded in the input, known as statistical learning (SL), is posited as a foundational theoretical account for a wide range of human behaviors, from category learning, sequence learning, and sound-object mapping, to the more holistic acquisition of language and literacy skills (Aslin & Newport, 2014; Erickson & Thiessen, 2015; Frost et al., 2015; Yu & Smith, 2007). Learners’ sensitivity to statistical patterns spans hierarchical structures, including distributional probabilities and transitional/conditional probabilities (Thiessen et al., 2013). Distributional probabilities describe the frequency of occurrence of an independent event, while transitional probabilities refer to the likelihood of two or more elements co-occurring in the input and can aid in the grouping of elements (e.g., word segmentation; Thiessen, 2017). The vast majority of SL research has focused on learners’ sensitivity to transitional probabilities. In a typical SL paradigm, people learn arbitrary associations between stimuli through passive exposure to streams of information embedded with certain statistical patterns (Fiser & Aslin, 2001; Saffran, Aslin, & Newport, 1996). After a brief period of exposure, participants can correctly identify which stimuli were more likely to co-occur during the exposure phase. Successful learning of transitional probabilities has been demonstrated in infants, older children, and adults with a wide range of stimuli, including but not limited to visual sequences (e.g., visual spatial sequences, Fiser & Aslin, 2002; visual shape and color sequences, Turk-Browne et al., 2008), auditory sequences (e.g., pure tones, Paraskevopoulos et al., 2012; Saffran et al., 1999; timbre, Caclin et al., 2006; Goydke et al., 2004; Koelsch et al., 2016; Tervaniemi et al., 1997; Toiviainen et al., 1998; pitch/chord, Daikoku et al., 2014, 2015, 2016; François & Schön, 2011; Kim et al., 2011; Moreau et al., 2013; Tervaniemi et al., 2000), sequences generated from artificial grammar structures (e.g., auditory tones and visual color sequences, Conway & Christiansen, 2006), and joint audio-visual multimodal sequences (e.g., shapes and tones; Seitz et al., 2007).

An important feature of SL is that it can occur in the absence of instruction or an explicit goal, such as when participants are passively exposed to stimuli without an explicit task (Fiser & Aslin, 2001; Fiser & Aslin, 2002; Saffran et al., 1999) or when participants are engaged in an unrelated cover task (Saffran et al., 1997; Turk-Browne, Jungé, & Scholl, 2005; Turk-Browne, Scholl, Chun, & Johnson, 2009). SL takes place even when minimal attention and working memory resources are available. For example, learning of bigrams and trigrams embedded in an artificial grammar learning task was not interrupted by a concurrent working memory task that recruited most, if not all, attentional resources (Hendricks et al., 2013). Using a similarly taxing working memory task, another recent study of speech SL reported robust learning even when attentional processes directed to the speech stream were drastically reduced. Listeners were able to extract relevant statistical patterns from the speech stream and showed a similar degree of perceptual binding; that is, the brain tagged the rhythmic frequency of the segmented words in a similar way regardless of whether learners were paying full attention to the speech stream. Thus, these studies demonstrate that attention is not required for the detection of statistical patterns from speech. Interestingly, though, the benefit of attention was observable in a post-learning target detection task, which assesses the implicit memory of segmented words. Participants who paid full attention to the speech stream showed better implicit memory of the segmented words, as reflected by a greater speed-up to target syllables at later positions within the words, compared to participants who divided their attention during learning (Batterink & Paller, 2019). These studies shed light on the role of task-related attention in transitional SL.

While successful detection and recall of the embedded sequences does not require conscious awareness of the learning goals (Curran & Keele, 1993; Song et al., 2007; Goschke, 1998), this does not imply that SL can always occur in the absence of attention. Indeed, previous research comparing SL performance in tasks with or without directed attention to the stimuli suggests that attention may serve as a gate for at least some aspects of SL (e.g., Fernandes et al., 2010; Toro et al., 2005; Turk-Browne et al., 2005). For example, Toro and colleagues (2005) manipulated the task demands during a speech SL task. Half of the participants were asked to passively listen to the speech stream, while the other half were asked to perform a concurrent task. When the concurrent task taxed attentional processes or occurred within the same sensory modality as the speech stream, participants were incapable of identifying the words following the embedded pattern, indicating that, even in the absence of conscious awareness, at least some attentional resources must be available and directed to the speech stream for successful word segmentation. Similarly, Turk-Browne, Jungé, and Scholl (2005) found that successful visual SL was enabled only in the attended stream, but not in the unattended stream. The presence of attention might also explain the advantage of explicit over implicit training contexts found in some SL studies, which demonstrated that knowing the learning goals prior to training optimizes attention allocation and leads to faster learning and more accurate post-learning recall of the pattern (Batterink, Reber, & Paller, 2015; Daikoku et al., 2014).

Recent work using implicit SL tasks, where the existence of statistical structures is not explicitly revealed to the participants, also supports a reciprocal relationship between SL and attention. For example, over the course of visual SL, participants showed greater preference for predictive stimuli (Alamia & Zénon, 2016) and complicated statistical structures (Forest et al., 2021), compared to random stimuli. These findings suggest that even though learners may not be explicitly instructed about the statistical structures in the input, the learning process itself can shift learners’ endogenous attention towards the predictive inputs.

Taken together, prior research indicates a role for attention in both implicit and explicit learning contexts (Batterink, Reber, Neville, et al., 2015). While attention is not necessary for SL to occur, attention is often engaged even in implicit learning contexts and benefits learning outcomes. However, most evidence supporting the importance of attention in SL comes from reflection-based tests of learning, in which participants are required to engage some degree of conscious awareness of the task goals, such as reflecting on what they have learned and making explicit decisions about test stimuli. These reflection-based tests are problematic because they introduce additional noise across individuals (Siegelman et al., 2017) and thus, unlike processing-based measures, cannot tease apart the role of attention in SL studies (e.g., Batterink et al., 2015; Vuong, Meyer, Christiansen, & Kau, 2016). These behavioral measures are also problematic because they represent the final result of a complex chain of multiple operations that may include perceptual/cognitive mechanisms, response selection, and motor preparation and execution (Daltrozzo et al., 2017; Daltrozzo & Conway, 2014).

Neurophysiological measurements offer a great opportunity to disentangle the respective correlates of attention and SL without the influence of conscious response. Studies of Event-Related Potentials (ERPs) and Event-Related Magnetic Fields (ERFs) have revealed a variety of neural indices for SL (Daltrozzo & Conway, 2014; Daikoku, 2018), among which the mismatch negativity (MMN) and P3 (also known as P300) are two well-studied ERP components. These two components are often elicited in an auditory oddball paradigm, a basic paradigm containing rare stimuli (deviants) along with more frequent stimuli (standards). The auditory oddball paradigm has been used to investigate a simple form of SL: the learning and processing of frequency-sensitive information. The generation of expectations from structured patterns is inherent to SL and is particularly important for learning information that is distributed temporally (Conway, 2020). Occasional auditory deviants, as opposed to more frequently presented standard stimuli, elicit a frontocentral negative deflection of the ERP waveform, the mismatch negativity (MMN), emerging between 150 and 500 ms after stimulus onset. The MMN is sometimes accompanied by another large, broad, and positive ERP component, the P3 complex, emerging between 200 and 1000 ms (Näätänen et al., 2007; Polich, 2007). Both the MMN and P3 are highly sensitive to stimulus probability, each reflecting a different level of attention engagement (Duncan et al., 2009; Van Zuijen et al., 2006).

The MMN reflects a pre-attentive, nonconscious response to an auditory stimulus deviance occurring in the concurrent and local context (Chennu et al., 2013; Näätänen, Tervaniemi, Sussman, Paavilainen, & Winkler, 2001; Paavilainen, 2013), where explicit attention to the auditory stimuli is not required (Alho et al., 1994; Van Zuijen et al., 2006). Importantly, the MMN appears to only be sensitive to violations of expectation based on transient auditory experiences (local information), but not to violations of auditory prediction that accrues over time (global information; Pegado et al., 2010; Bekinschtein et al., 2009; Wacongne et al., 2011). For instance, in the sequence AAAAB, B is the local deviant of the phrase. But after the same phrase is repeated many times, the A in phrase AAAAA should become surprising given the low global frequency of the phrase. In this context, the MMN was only detectable for local deviants, but not for global deviants (e.g., Bekinschtein et al., 2009). The transient nature of the MMN is further supported by findings that MMN responses were drastically reduced when the presentation rate was slowed down (Pegado et al., 2010). Because transitional probability must be learned across a protracted time period, it is of particular interest to test whether the MMN could represent learned transitional probabilities. Several studies using paradigms containing embedded triplet structures confirmed that the MMN is indeed sensitive to transitional probability patterns (Fitzgerald & Todd, 2018; Furl et al., 2011; Koelsch et al., 2016; Tsogli et al., 2019). For example, Koelsch et al. (2016) reported that the ERP responses to triplet endings were reminiscent of the temporal and scalp distribution of the MMN and were modulated by the transitional probability of local dependencies. Such a sensitivity to deviance from transitional statistical patterns was found to be a dissociable characteristic of the MMN from its sensitivity to perceptual deviance (Tsogli et al., 2019). Interestingly, these studies also reported a lack of explicit knowledge of the learned patterns. Taken together, these studies suggest that the MMN is not only a marker of perceptual auditory prediction, but also a marker of prediction based on implicit knowledge of transitional probabilities. Given that the prediction errors for transitional probability are derived from listeners’ most recent memory traces about the root of the triplet, these findings are still consistent with the transient nature of the MMN.

In contrast, the P3 was found to be modulated by mnemonic prediction, a process based on accumulated information perceived over a longer period of time (global context; Bekinschtein et al., 2009; Chennu et al., 2013; Squires et al., 1975; Stadler et al., 2006; Sutton et al., 1965). Crucially, the P3 has been associated with an attentional shift to stimulus changes, which relies on continuous updating of one’s memory trace (Bekinschtein et al., 2009; Donchin & Coles, 1988; Sergent et al., 2005). It has been found that an explicit search for auditory “oddballs” of global statistical patterns leads to a protracted P3 response, whereas an explicit search for auditory “oddballs” of local statistical patterns leads to shorter and sharper P3 responses (Chennu et al., 2013; Bekinschtein et al., 2009). Importantly, when no attention is required during the task, or when participants are engaged in a cover task, no P3 effect is elicited by the global pattern violations (Wacongne et al., 2011; Bekinschtein et al., 2009). Therefore, the P3 reflects a conscious, attentional shift to the prediction errors derived from the regularities distributed over an extended period of time (Van Zuijen et al., 2006). While the P3 is the most common nomenclature used to refer to this late positive component, the P3 effects reported in these studies are typical P3b responses, distributed across temporal-parietal scalp sites. Recently, research examining implicit statistical learning found that transitional probability modulates the P3a (Pesnot Lerousseau & Schön, 2021), a more frontally distributed component, reflecting task-related automatic orientation of attention (Polich, 2007).

One caveat of these studies is that the global information differed from the local information both in the temporal scale of its distribution (minutes vs. seconds) and in the complexity of the statistical pattern. The global probability was manipulated at the level of transitional statistical information, that is, how likely a multi-tone phrase is to occur over an extended period of time (across trials), while information related to the local probability was distributional statistical information, that is, how likely a particular tone was to occur in the local context (within trials). Evidence from behavioral work suggests that independent learning processes underlie transitional and distributional statistical learning (Endress & Bonatti, 2007; but see Thiessen, Kronstein, & Hufnagle, 2013 for an alternative account). Therefore, the conditions under which attention is required remain opaque: is it the gradual nature of learning, the complexity of the statistical pattern, or both? Regardless, these findings suggest that the attention-dependent system plays a key role in transitional statistical learning, where learning depends on the gradual accumulation of co-occurring information over a protracted period of time. Yet, the vast literature on implicit statistical learning has presented substantial evidence suggesting that learning can be achieved without explicit attention towards the stimuli. The current study aims to address this paradox by manipulating the temporal scale of distributional information. We ask whether processing local and global distributional information involves a pre-attentive (MMN) or an attentional process (P3), and how neural prediction based on distributional information relates to higher-level transitional statistical learning.

Our research addresses the following gaps in the literature. First, existing evidence for the dissociation between the MMN and P3 during processing of statistical information relies on designs which confound two dimensions of hierarchy: the complexity of the statistical regularity (distributional vs. transitional) and the temporal delay of the distribution (local vs. global). As a result, it is not clear whether the P3 effect represents the necessity of attention for learning transitional statistical patterns, for learning patterns distributed over a longer period of time, or both. In our study, we remove the confound by focusing only on the temporal delay of the distribution (local vs. global). We ask how the MMN and P3a are modulated by distributional statistics (e.g., graded probability of occurrence). Second, previous studies demonstrated the necessity of attention when learning regularities distributed across an extended period of time by contrasting tasks with and without an explicit target-detection component. In this study, we sought to investigate the role of involuntary attention in a passive oddball paradigm with speech stimuli. Specifically, we ask whether the P3a, a fronto-parietally distributed event-related potential (ERP) component which reflects an involuntary attention shift to highly salient stimuli (Polich, 2007), can be modulated by local or global distributional statistical information. This approach is more ecologically valid, as a passive paradigm will be more suitable for future studies involving developmental comparisons or special populations. Finally, the current study is the first to examine the relationship between neural prediction and statistical learning. We ask whether attention-dependent or attention-independent neural prediction relates to statistical learning outcomes. We relate individual differences in a word segmentation task, measured by both online and offline learning measures, to the neural indices of prediction measured by ERPs.

2. Methods

2.1. Participants

Forty-five adults (mean age = 22.76 years, SD = 3.02 years, range = 18.1—34.6 years, 11 males) participated in this study. All participants were right-handed, native English speakers, with no history of neurological or psychiatric disorders, or brain damage. All had average or above-average non-verbal intelligence (age-based standard score > 85) as measured by the Matrices subtest (mean = 107.84, SD = 12.89) of the Kaufman Brief Intelligence Test (KBIT-2; Kaufman & Kaufman, 2004), and had average or above average vocabulary as measured by the Picture Vocabulary Test (PVT; mean = 114.2, SD = 13.35, range = 91–149) of the NIH Toolbox (Gershon et al., 2013). All participants gave written consent to participate, in accordance with the Institutional Review Board at the University of Delaware. All participants were compensated for their participation.

2.2. Statistical Learning Behavioral Task

Participants completed an auditory linguistic SL task hosted on a web-based platform (Qi et al., 2019; Schneider, Hu, Legault, & Qi, 2020) programmed using jsPsych (de Leeuw, 2015). The stimuli, modeled after Saffran, Aslin, & Newport (1996), were composed of a set of four triplets, each containing three syllables which always appeared in the same order. The four triplets, pa-bi-ku, da-ro-pi, ti-bu-do, and go-la-tu, were generated by a speech synthesizer in a monotone female voice. Each triplet was repeated 48 times and the triplets were concatenated in a pseudorandom order so that the same triplet did not repeat more than twice across three consecutive triplets. The transitional probability within each triplet was 1 and across triplet boundaries was 0.33. The stimulus onset asynchrony was 480 ms and each triplet lasted 1440 ms. The exposure phase had a total duration of 4 minutes and 36 seconds.
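
To make the statistical structure of the exposure stream concrete, the following R sketch generates a pseudorandom triplet stream under the ordering constraint described above and checks the within- and across-triplet transitional probabilities. The triplet labels and constraint are taken from the text; everything else (the rejection sampler, variable names) is our illustrative reconstruction, not the original jsPsych code.

```r
set.seed(1)

triplets <- list(
  c("pa", "bi", "ku"), c("da", "ro", "pi"),
  c("ti", "bu", "do"), c("go", "la", "tu")
)

# 48 repetitions of each triplet, reshuffled until no triplet occurs
# three times in a row (simple rejection sampling for the ordering constraint)
repeat {
  word_order <- sample(rep(1:4, each = 48))
  if (all(rle(word_order)$lengths <= 2)) break
}

stream <- unlist(triplets[word_order])  # 4 x 48 x 3 = 576 syllables
                                        # 576 x 480 ms SOA = 4 min 36 s of exposure

# Empirical transitional probability of syllable `b` given syllable `a`
tp <- function(a, b) {
  idx <- which(stream[-length(stream)] == a)
  mean(stream[idx + 1] == b)
}

tp("ti", "bu")   # within-triplet transition: 1
tp("do", "pa")   # across-boundary transition: low (~0.25-0.33)
```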

During the exposure phase, participants were told to track a target syllable (i.e., the alien’s favorite word) by pressing a button on the keyboard while listening to a continuous stream of speech syllables. Participants were randomly assigned to track one of the four syllables in the third position of a triplet throughout the exposure phase. For example, the participant may be asked to track do, which appeared only in the sequence ti-bu-do. They were instructed to press the spacebar on their keyboard whenever they heard this target syllable. Reaction time (RT) was recorded for the participant’s response to the target. Keypresses from trials immediately preceding or following the target were included to allow for both anticipatory and delayed responses: that is −480 ms to +960 ms relative to the onset of the target. To control for baseline RT differences across individuals in the current study, the raw RTs of an individual were first transformed into z-scores. The z-normed RTs of an individual were then entered as the dependent variable into a linear regression model. The target trial order (1 to 48) was entered as the independent variable. The reaction time slope was computed as the slope of the linear regression line (beta coefficient).
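
The RT slope measure can be sketched as follows in R; the data frame and column names (`rt_df`, `subject`, `trial`, `rt`) are hypothetical placeholders, since the original analysis script is not reproduced here.

```r
library(dplyr)

# rt_df: one row per target response, with columns
#   subject - participant ID
#   trial   - target trial order during the exposure phase (1 to 48)
#   rt      - raw reaction time in ms
rt_slopes <- rt_df %>%
  group_by(subject) %>%
  mutate(rt_z = as.numeric(scale(rt))) %>%                 # z-score RTs within participant
  summarise(rt_slope = coef(lm(rt_z ~ trial))[["trial"]])  # beta coefficient for trial order

# A more negative slope indicates progressively faster responses to the
# predictable target syllable, i.e., stronger online statistical learning.
```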

In order to validate the RT slope measure for statistical learning, we recruited an independent sample of 16 adults who completed a speech statistical learning task across two blocks. In one block, adults heard structured sequences containing embedded speech triplets mixed with random sequences of monotones. In the other block, adults heard random sequences of speech syllables mixed with structured sequences of monotone triplets. Each speech sequence contained 96 syllables. The structured sequences in this validation study contained the same embedded triplet structure as the current statistical learning task. Speech syllables and tones were intermixed to minimize the interference of the random sequences on statistical learning within each block. Adults showed significant acceleration in responding to the same target syllable in the structured (M = −1.54, SD = 3.44), but not in the random (M = 0.48, SD = 2.16), speech sequences. The condition difference in RT slope was significant (t(21.63) = 1.88, p = .037). These findings were consistent with previous work from our group demonstrating that adults responded to target syllables more quickly in structured than in random sequences of speech (Schneider et al., 2020). Importantly, the validity of the RT slope measure is further supported by steeper RT acceleration slopes in structured than in random sequences in the visual domain (Kozloff et al., 2018; Zinszer et al., 2020).

A two-alternative forced-choice task consisting of 32 test trials immediately followed the exposure phase. In each trial, the participant heard a triplet word from the exposure phase and a foil nonword. Foil nonwords were composed of three syllables drawn from different triplets (pa-ro-do, ti-la-pi, da-bi-tu, go-bu-ku); each syllable maintained its position within the word, but the three syllables never co-occurred during the exposure phase. Each trial prompted the participant to identify which of the two sequences had been heard during the exposure phase. Test accuracy was calculated as the percentage of correct trials during the two-alternative forced-choice test phase.

All behavioral analysis of the speech SL task was completed in R (R Core Team, 2017). Seven participants were removed from these behavioral analyses: six participants did not complete the task and one participant did not have enough valid key presses (< 6 trials) during the exposure phase. Therefore, all behavioral and brain-behavior correlational analyses were conducted on the remaining 38 participants (mean age = 20.55 years, SD = 2.53 years, range = 18.0—31.0 years, 11 males).

2.3. Auditory EEG Oddball Paradigm

2.3.1. Stimuli.

Two female native English speakers each produced the words “bog” and “dog” 100 times in a picture-naming task. The 100 tokens of each word were digitally recorded using a SHURE SM58 microphone and an Edirol UA-25EX sound card, sampling at 44.1 kHz. For each word, the 50 tokens with the best recording quality were chosen for the experiment. The /ba/ and /da/ sounds were manually cut from the original recordings. Each sound file was 180 ms in duration, with ramping at the beginning and end of the syllable. The intensity of each sound was normalized to 70 dB.

Two auditory streams of 1500 stimuli each (SOA = 0.7 s) were created, consisting of three conditions: standard, linguistic deviant, and non-linguistic deviant (see Figure 1 for a visualization of the EEG paradigm). The standard condition included repeated presentations of the /ba/ syllable spoken by one female speaker. The linguistic deviant was a different syllable, /da/, spoken by the same speaker as in the standard condition. The non-linguistic deviant was the same syllable, /ba/, spoken by a different female speaker. In order to investigate the neural responses associated with differences in abstract features of the stimuli, rather than merely acoustic differences, we included 50 different exemplars of each stimulus type.

Figure 1. Schematic illustration of the EEG Paradigm.

The global probability was manipulated across the two blocks. In the first block, the linguistic deviant was presented with a higher number of repetitions than the non-linguistic deviant; this pattern was reversed in the second block. The local probability was manipulated within each block. In the high condition, two standard stimuli were presented before the linguistic or non-linguistic deviant, and in the low condition, six standard stimuli were presented before the deviant. Block order was counterbalanced across participants.

To investigate listeners’ sensitivity to global probability, we manipulated the frequency of deviant presentation across two experimental blocks. In both blocks, standard stimuli were presented 1200 times, resulting in a global probability of 0.8 (1200 standards /1500 total stimuli). In one block, the linguistic deviant occurred at a high frequency (global probability = 0.13; 200 deviants /1500 total stimuli), while the non-linguistic deviant occurred at a low frequency (global probability = 0.06; 100 deviants /1500 total stimuli). In the other block, the non-linguistic deviant occurred at a high frequency (global probability = 0.13), while the linguistic deviant occurred at a low frequency (global probability = 0.06). The block order was counterbalanced across participants.

To investigate listeners’ sensitivity to local probability, we manipulated the number of standard stimuli preceding deviant stimuli within each global probability condition. In the high condition, two standard stimuli were presented before a linguistic or non-linguistic deviant (i.e., /ba/ /ba/ /deviant/). In the low condition, six standard stimuli were presented before a linguistic or non-linguistic deviant (i.e., /ba/ /ba/ /ba/ /ba/ /ba/ /ba/ /deviant/). Low and high conditions were randomly interspersed in each auditory stream. For each deviant type within a block, one-third of the deviants were in the high (local) condition, one-third were in the low (local) condition, and the remaining one-third were preceded by four standard stimuli. The local probability conditions were randomly ordered within each auditory stream so that it was impossible for the listeners to predict when a deviant would occur given the number of preceding standards. To maximize analysis power, we only analyzed the difference between the high and low local probability conditions.
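
As a sanity check on the design described above, the per-block counts and probabilities work out as follows (illustrative arithmetic only; the variable names are ours):

```r
# Per-block stimulus counts, as described in the text
n_standard <- 1200
n_dev_freq <- 200    # more frequent deviant in this block
n_dev_rare <- 100    # less frequent deviant in this block
n_total    <- n_standard + n_dev_freq + n_dev_rare   # 1500

# Global probabilities (reported in the text as 0.8, 0.13, and 0.06)
c(n_standard, n_dev_freq, n_dev_rare) / n_total      # 0.800, 0.133, 0.067

# Each deviant type is split roughly into thirds across the local conditions
# (2, 4, or 6 preceding standards), so each deviant needs 4 standards on average:
(n_dev_freq + n_dev_rare) * mean(c(2, 4, 6))          # 300 deviants x 4 = 1200 standards
```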

2.3.2. Procedure.

Participants were instructed to watch a silent animation movie while listening to the auditory streams through a pair of noise-attenuating Cortech ER-2 earphones. Participants performed a visual target detection cover task and were instructed to press a button on a Cedrus Response Pad RB-840 (Cedrus Corporation, San Pedro, CA) as quickly as possible upon seeing a target animation character in the movie. Each experimental block lasted for 17.5 minutes. All the visual and auditory stimuli were presented using Presentation® software (Version 18.0, Neurobehavioral Systems, Inc., Berkeley, CA, www.neurobs.com).

2.4. EEG Preprocessing

EEG was recorded with a 24-channel mobile EEG system (Easy Cap’s EEG-RBE 24 cap; SMARTING, mBrainTrain, Belgrade, Serbia), which features a sampling rate of 500 Hz, a resolution of 24 bits, and a bandwidth from DC to 250 Hz (SMARTING, www.mbraintrain.com). The amplifier used in this study includes a 3D gyroscope and a power supply for several hours of use (weight 64 g; size 82 × 51 × 14 mm). Data was transmitted wirelessly via Bluetooth (v2.1) to a nearby paired laptop. Electrode impedances were kept below 10 kΩ. Recordings were online referenced to electrode FCz and grounded to electrode AFz. All data was saved using the LabRecorder software, which is part of the Lab Streaming Layer (LSL).

All EEG data was pre-processed using the EEGLAB toolbox in MATLAB (Delorme & Makeig, 2004). All continuous data was high-pass filtered at 0.1 Hz, low-pass filtered at 30 Hz, and re-referenced to the mastoids. An Independent Component Analysis (ICA; Delorme, Makeig, & Sejnowski, 2001) was carried out for artifact removal. Components were automatically identified through a supervised machine learning algorithm which extracts features across the spatial, spectral, and temporal domains. Component activations and scalp maps were then inspected manually, and the components related to eye movements or muscle artifacts were identified and removed from the continuous EEG data (Jung et al., 2000). Within the ERPLAB toolbox (Lopez-Calderon & Luck, 2014), all data was then epoched from 100 ms before to 600 ms after stimulus onset and baseline corrected to the 100 ms before stimulus onset. Trials were removed from analysis if the peak-to-peak voltage between 100 ms pre-stimulus and 600 ms post-stimulus exceeded 100 μV for any of the 24 EEG channels. On average, 1.22 components (SD = .40) and 15.11 trials (SD = 12.90) were removed per participant. Given that the current study used an oddball paradigm, more trials were present in the High (global) condition than in the Low (global) condition. To ensure that differences in noise would not interfere with our results, subsequent analyses utilized mean amplitude, rather than peak amplitude, as mean amplitude is an unbiased measure that can be used when noise levels differ across conditions (Luck, 2014; Clayson, Baldwin, & Larson, 2013).
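
For illustration, the peak-to-peak rejection criterion amounts to the following per-epoch check (a language-agnostic sketch written in R, not the actual ERPLAB routine):

```r
# epoch: a channels x timepoints matrix of voltages (µV) spanning -100 to 600 ms
reject_epoch <- function(epoch, threshold_uv = 100) {
  p2p <- apply(epoch, 1, function(channel) max(channel) - min(channel))  # peak-to-peak per channel
  any(p2p > threshold_uv)   # reject the trial if any of the 24 channels exceeds 100 µV
}
```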

2.5. Event Related Potential Analysis

Our main interest is to examine whether the MMN and P3a are modulated by the local and global distributional information of the deviants. We defined the earlier and later time windows and electrodes of interest by first examining significant differences between all standards and deviants using a cluster-based permutation test within the Mass Univariate Toolbox of MATLAB (Groppe et al., 2011). This permutation test is based on a repeated measures t-statistic using every time point at each electrode from 0 to 600 msec post-stimulus, and controls for the family-wise error rate across the full set of comparisons. The cluster-based test corrects for the multiple comparisons problem by first forming clusters of neighboring extreme t-scores and building a null hypothesis distribution from the most extreme cluster statistic (Maris & Oostenveld, 2007). For our cluster-based permutation, neighbors were established as electrodes within approximately 5.44 cm of one another with a family-wise alpha level of .05. This approach capitalizes on the fact that ERP effects are more likely than noise to extend across many adjacent electrodes and time points and is probably the most powerful mass univariate procedure for detecting broadly distributed effects (Groppe et al., 2011; Maris & Oostenveld, 2007).
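
The logic of the cluster-based permutation test can be illustrated with a simplified, single-electrode sketch in R. The actual analysis used the Mass Univariate Toolbox across 24 electrodes with spatial neighbors within ~5.44 cm; the version below clusters over time only and builds the null distribution by sign-flipping subject-level difference waves, so it is a conceptual sketch rather than a reimplementation.

```r
# dmat: subjects x timepoints matrix of (deviant - standard) ERP amplitudes
cluster_perm_time <- function(dmat, n_perm = 1000, alpha = 0.05) {
  n_sub  <- nrow(dmat)
  t_crit <- qt(1 - alpha / 2, df = n_sub - 1)

  # one-sample t-statistic at every time point
  t_series <- function(d) apply(d, 2, function(x) t.test(x)$statistic)

  # cluster mass = sum of t-values within each contiguous supra-threshold run
  cluster_masses <- function(tv) {
    runs   <- rle(abs(tv) > t_crit)
    ends   <- cumsum(runs$lengths)
    starts <- ends - runs$lengths + 1
    mass   <- mapply(function(s, e) sum(tv[s:e]), starts, ends)
    mass[runs$values]            # keep only the supra-threshold clusters
  }

  observed <- cluster_masses(t_series(dmat))

  # null distribution: randomly flip the sign of each subject's difference wave
  # and record the most extreme cluster mass on each permutation
  null_max <- replicate(n_perm, {
    flips <- sample(c(-1, 1), n_sub, replace = TRUE)
    cm    <- cluster_masses(t_series(dmat * flips))
    if (length(cm) == 0) 0 else max(abs(cm))
  })

  # family-wise corrected p-value for each observed cluster
  p_cluster <- sapply(abs(observed), function(m) mean(null_max >= m))
  list(cluster_mass = observed, p = p_cluster)
}
```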

For the time windows and spatial distributions identified by the omnibus cluster permutation (standards vs. deviants), the mean amplitude of the ERP was then averaged across the significant cluster of electrodes for each deviant condition (as compared to the corresponding standard condition within the same experimental block) and extracted for each participant using the ERPLAB toolbox in MATLAB. The ERP mean amplitudes were then submitted to linear mixed-effects models using the lmer function (lme4 version 1.1-20) in R (RStudio Team, 2016). The model included fixed effects for domain (syllable vs. voice), local probability (high vs. low, i.e., deviants following a shorter vs. longer sequence of standards), and global probability (high vs. low frequency of occurrence), with by-subject random intercepts and random slopes for the interaction between local and global probability. Effect sizes are represented by Cohen’s d and were produced using the lme.dscore function (EMAtools version 0.1.30).
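
A minimal sketch of the model described above is given below, assuming a long-format data frame `erp_df` with one row per participant × condition cell; the variable names are ours, not the authors’.

```r
library(lme4)

# erp_df columns (hypothetical names):
#   amplitude   - mean ERP amplitude in the early or late time window
#   domain      - syllable vs. voice deviant
#   local_prob  - high vs. low local probability
#   global_prob - high vs. low global probability
#   subject     - participant ID
m <- lmer(
  amplitude ~ domain * local_prob * global_prob +
    (1 + local_prob:global_prob | subject),   # by-subject intercepts and slopes
  data = erp_df                               # for the local x global interaction
)
summary(m)

# Cohen's d per fixed effect, as in Tables 1-2 (EMAtools)
# EMAtools::lme.dscore(m, data = erp_df, type = "lme4")
```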

3. Results

3.1. Statistical Learning Behavioral Results

Accuracy on the behavioral statistical learning task was significantly above chance (M = 60.1%, SD = 12.05%, t(37) = 5.17, p < 0.001). Reaction times did not significantly speed up over the course of the exposure phase (mean RT slope = .004, SD = 0.02; t(37) = 0.92, p = 0.36); however, RT slope was marginally related to accuracy (R = −0.31, p = 0.055). Individuals who responded more quickly over the course of the exposure phase were more likely to correctly identify words over foils during the test phase.
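
The statistics above correspond to the following R calls on a hypothetical per-participant data frame `sl` with columns `accuracy` (proportion correct) and `rt_slope`; this is a sketch of the analysis, not the authors’ script.

```r
# Accuracy against chance (two-alternative forced choice, chance = 0.5)
t.test(sl$accuracy, mu = 0.5)        # reported: t(37) = 5.17, p < .001

# Is the average RT slope different from zero (i.e., did RTs speed up)?
t.test(sl$rt_slope, mu = 0)          # reported: t(37) = 0.92, p = .36

# Relationship between online (RT slope) and offline (accuracy) learning
cor.test(sl$rt_slope, sl$accuracy)   # reported: R = -0.31, p = .055
```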

3.2. ERP Results

We first sought to investigate the overall change in the neural response to auditory deviants in order to define the time windows and electrodes of interest for our main analyses. Therefore, we compared all deviants to all standards between 0 and 600 msec after the stimulus onset. Next, using the time windows and electrodes defined by the first step, we sought to delineate how the properties of auditory deviants (linguistic vs. nonlinguistic, high vs. low global probability, and high vs. low local probability) modulate listeners’ neural responses to the deviants.

3.2.1. Deviants compared to standards

The mass univariate analysis resulted in two significant clusters. The deviants elicited a significantly greater negativity than the standards between 60 and 214 ms at all electrodes except for F7, T7, and Cz, and between 350 and 598 ms at these same electrodes, with the added exception of P7 (Figure 2). The early effect uncovered by the cluster permutation is similar in temporal and spatial distribution to the canonical MMN (Näätänen, Paavilainen, Rinne, & Alho, 2007). The late effect, which was widely distributed, shares a similar temporal profile with the P3a but has the opposite polarity. Such an effect, termed the late discriminative negativity (LDN) or late MMN, has also been reported in passive auditory oddball tasks with speech stimuli (Wetzel & Schröger, 2014; Bishop, Hardiman, & Barry, 2011; Cheour, Korpilahti, Martynova, & Lang, 2001; Korpilahti, Lang, & Aaltonen, 1995).

Figure 2.

Waveforms recorded in response to standards and deviants. Scalpmaps represent the spatial distribution of significant differences identified by the mass univariate cluster analysis in both the early (60–214 msec) and late (350–598 msec) time windows between deviants and standards. The ERP is a temporal representation of significant differences between deviants and standards collapsed across the electrodes identified as a significant cluster by the mass univariate cluster analysis (cluster-level p < 0.05).

3.2.2. Contribution of domain, global probability, and local probability

In order to investigate how the neural responses to auditory deviants are affected by domain (non-linguistic vs linguistic), local probability (low vs. high), and global probability (low vs. high), the average amplitude in the two significant time windows for each participant was submitted to linear mixed effect models.

The linear mixed-effects model in the early MMN time window (60–214 ms), shown in Table 1, revealed a main effect of local probability (β = −0.51, SE = 0.25, p = 0.04). Participants showed a greater MMN response to deviants with a low local probability (deviants following a longer sequence of standards) than to those with a high local probability (deviants following a shorter sequence of standards; Figure 3a). The linear mixed-effects model in the late time window, shown in Table 2, revealed a significant main effect of global probability (Figure 3b; β = −0.81, SE = 0.39, p = 0.04) and a significant interaction between local and global probability (Figure 4a; β = 1.17, SE = 0.57, p = 0.04). Participants showed a greater LDN response to deviants with a low global probability (deviants occurring less frequently) than to those with a high global probability (deviants occurring more frequently; Figure 3b). To unpack the interaction between local and global probability, post-hoc pairwise analyses revealed that the group effect shifted from a significant LDN response to a non-significant P3a trend as the deviants became less frequent across the global and local contexts. We found a larger and significant LDN response to global probability (low vs. high) for the High (local) deviants (p = 0.01, Bonferroni-corrected p < 0.05), as compared to a non-significant global probability effect in the reversed direction for the Low (local) deviants (p = 0.33). There was also a larger and significant LDN response to local probability (low vs. high) for the High (global) deviants (p = 0.04), as compared to a non-significant local effect in the reversed direction for the Low (global) deviants (p = 0.17; see Figure 4b). The effect of domain (non-linguistic vs. linguistic) was not significant in either the early or the late time window.

Table 1.

Generalized linear mixed-effects model and effect size (Cohen’s d) of individual learners’ average early ERP amplitude based on Domain, Local Probability and Global Probability.

Dependent variable: Average ERP Amplitude in the Early Time Window

| Predictor | β | Std. Error | t value | p value | Cohen’s d |
| --- | --- | --- | --- | --- | --- |
| (Intercept) | 0.13 | 0.17 | 0.77 | 0.44 | -- |
| Domain | −0.43 | 0.24 | −1.83 | 0.07 | −0.23 |
| Global Probability | −0.34 | 0.24 | −1.40 | 0.16 | −0.19 |
| Local Probability | −0.51 | 0.25 | −2.07 | 0.04 | −0.31 |
| Domain × Global | 0.20 | 0.34 | 0.61 | 0.55 | 0.07 |
| Domain × Local | 0.41 | 0.34 | 1.23 | 0.22 | 0.15 |
| Global × Local | 0.55 | 0.30 | 1.58 | 0.11 | 0.23 |
| Domain × Global × Local | −0.01 | 0.47 | −0.02 | 0.99 | −0.002 |

Figure 3.

Main ERP effects of a) local probability in the early time window and b) global probability in the late time window. a) Deviants preceded by a longer sequence of standards, the Low (local) condition, elicited a greater negativity from 60–214 ms (distinguished by hatching) than deviants preceded by a shorter sequence of standards, the High (local) condition. b) Deviants that occurred less frequently, the Low (global) condition, resulted in a greater negativity from 350–598 ms (distinguished by hatching) than deviants that occurred more frequently, the High (global) condition. The scalp map represents the spatial distribution of the electrodes. Electrodes highlighted in red were identified as significant in the mass univariate analysis comparing all deviants and all standards.

Table 2.

Generalized linear mixed-effects model and effect size (Cohen’s d) of individual learners’ average late ERP amplitude based on Domain, Local Probability and Global Probability.

Dependent variable: Average ERP Amplitude in the Late Time Window

| Predictor | β | Std. Error | t value | p value | Cohen’s d |
| --- | --- | --- | --- | --- | --- |
| (Intercept) | −0.07 | 0.28 | −0.25 | 0.81 | -- |
| Domain | 0.03 | 0.39 | 0.08 | 0.94 | 0.009 |
| Global Probability | −0.81 | 0.39 | −2.10 | 0.04 | −0.26 |
| Local Probability | −0.57 | 0.39 | −1.46 | 0.14 | −0.19 |
| Domain × Global | 0.35 | 0.55 | 0.65 | 0.52 | 0.08 |
| Domain × Local | 0.18 | 0.55 | 0.34 | 0.74 | 0.04 |
| Global × Local | 1.17 | 0.57 | 2.05 | 0.04 | 0.27 |
| Domain × Global × Local | −0.39 | 0.77 | −0.50 | 0.62 | −0.06 |

Figure 4.

Interaction between local and global probability in the late time window. (a) Average ERP amplitude by local and global probability conditions. Error bars represent the standard errors. (b) ERP waveforms in each condition. The scalpmap represents the spatial distribution of the electrodes. Electrodes highlighted in red were identified as significant in the mass univariate analysis comparing all deviants and all standards in the late time window.

3.3. Correlation Results

To clarify how differences in the neural response during this auditory oddball paradigm related to learning of transitional statistics in speech, we extracted the mean amplitude of the effects identified as significant in the early and late time windows based on the above analyses. We then conducted Spearman correlation analyses between the two SL behavioral measures (RT slope and accuracy) and each main ERP effect independently, as each correlation matrix addressed a different theoretical question. We first asked whether the pre-attentive neural prediction measured by the MMN in the early time window was related to statistical learning outcomes. A greater magnitude of the local probability effect (low vs. high) was significantly associated with slower online learning, measured by RT slope (Rho = −0.33, p = 0.04). We next asked whether neural prediction measured in the late time window was related to statistical learning outcomes. We extracted five ERP measures for each participant: one from the main global probability effect, and four from the significant interaction between global and local probability reported earlier. The four interaction-based measures were: a) the local probability effect within the High (global) condition, b) the local probability effect within the Low (global) condition, c) the global probability effect within the High (local) condition, and d) the global probability effect within the Low (local) condition. We found that learners with a larger P3a-like response to local probability for Low (global) deviants exhibited faster RT acceleration during the SL task (Rho = −.42, p = 0.009, Bonferroni-corrected p = 0.09; Figure 5a; see Table 3 for all statistics). None of these ERP markers was related to offline learning as measured by accuracy.
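
These brain-behavior correlations reduce to the following R calls, shown here as a sketch on a hypothetical per-participant data frame `dat` (column names are ours). The reported Bonferroni-corrected p of .09 is consistent with a correction family of 10 tests (5 late-window ERP measures × 2 behavioral measures), which we assume below.

```r
# Spearman correlation between one ERP effect and online learning (RT slope)
cor.test(dat$local_effect_low_global, dat$rt_slope, method = "spearman")
#   reported: Rho = -0.42, p = 0.009

# Bonferroni correction assuming a family of 10 brain-behavior tests
p.adjust(0.009, method = "bonferroni", n = 10)   # 0.09, matching the reported value
```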

Figure 5.

Greater P3a-like response to local probability is related to faster statistical learning. (a) Faster acceleration of RT (more negative slope) is associated with greater P3a response to local probabilities in the Low(global) condition. (b) The P3a group demonstrated significantly faster online learning of transitional statistics as compared to the LDN group (t(36) = −8.73, p < .0001). (c) The P3a and LDN groups showed similar accuracy on the post-training statistical learning task (t(36) = 0.84, p = .41).

Table 3.

Spearman correlations between main ERP effects, RT Slope, and accuracy.

| MMN ERP Amplitude | RT Slope | Accuracy |
| --- | --- | --- |
| Local Effect | Rho = −0.33, p = 0.04 | Rho = 0.004, p = 0.98 |

| LDN/P3a ERP Amplitude | RT Slope | Accuracy |
| --- | --- | --- |
| Global Effect | Rho = 0.19, p = 0.26 | Rho = 0.10, p = 0.54 |
| Global Effect within High (local) | Rho = 0.20, p = 0.24 | Rho = 0.09, p = 0.60 |
| Global Effect within Low (local) | Rho = −0.06, p = 0.71 | Rho = 0.09, p = 0.60 |
| Local Effect within High (global) | Rho = −0.19, p = 0.25 | Rho = 0.07, p = 0.67 |
| Local Effect within Low (global) | Rho = −0.42, p = 0.009 | Rho = 0.12, p = 0.49 |

4. Discussion

The current study sought to elucidate a paradox in implicit SL research: what role does the attention-dependent system play in statistical learning if learning can be achieved without explicit attention towards the stimuli? To address this question, we designed a passive auditory EEG oddball paradigm in which we embedded two levels of distributional information in speech: low and high frequency of occurrence at the local (across neighboring syllables) and global (across the whole speech stream) levels. Our analysis extracted the early and late discriminative ERP responses to auditory deviants as compared to auditory standards, the MMN and LDN respectively, and investigated how their amplitude was modulated by the distributional statistics of the same deviants. Our findings indicate that monitoring local probabilities engages an automatic pre-attentive process, reflected as an MMN response, while global distributional information modulates an individual’s expectation of the local probability. As auditory oddballs became less frequent and more surprising, participants exhibited a shift from an LDN response to a P3a response. Individuals who showed attentive tracking of distributional information were also more likely to show faster RT acceleration in a transitional statistical learning task (word segmentation).

In the early processing stage, we found that local probabilistic information modulated the MMN amplitude, suggesting that neural prediction based on transient auditory inputs is a pre-attentive process, graded by the local frequency of occurrence. These results are broadly consistent with the classic view that the elicitation of an MMN represents pre-attentive encoding of short-duration memory traces (Näätänen et al., 2007). In particular, our finding replicates previous reports that increasing the number of preceding standards leads to a progressive enhancement of the MMN (Baldeweg, 2007; Haenschel et al., 2005), but extends these findings to the speech domain. Our findings also suggest that, unlike local probabilistic information, global probabilistic information was not encoded during this early and pre-attentive process. The blindness of the MMN to global patterns can be attributed to the transient nature of the memory trace represented by the MMN. Reduced MMN amplitude has been reported in response to lengthening the delay between deviants and standards (Pegado et al., 2010). Previous studies with deviants in both local and global contexts reported a lack of MMN responses to deviants that violated global patterns of distributional statistics (e.g., AAAAB (standard) vs. AAAAA (deviant); Bekinschtein et al., 2009; Wacongne et al., 2011). Findings from our study confirm that global probabilistic information was not processed during this early window because of the extended timescale of memory retrieval it requires, rather than because of the complexity of the pattern.

Our individual differences analyses in the early time window showed an intriguing relationship between the effect of local probability and online learning behavior during the speech SL task. Individuals who were less sensitive to local probabilities, as measured by a smaller MMN, showed faster RT acceleration during the SL task where the knowledge of transitional probability was established over an extended period of time. Does better learning of global statistical patterns reflect greater inhibition of local probabilistic processing? Our findings are consistent with the global interference effect studied in human visual and auditory perception (Bouvet et al., 2011; Navon, 1977; Poirel et al., 2008). Across these experiments, accurate identification of the global pattern (e.g., pitch changes over the whole melody) comes at a cost to identifying the local pattern (e.g., pitch changes in a three-tone group). Our finding might also suggest that processing of local probabilistic information hinders the detection of the global statistical patterns. Such a suppression effect of local distributional information on global pattern learning has been reported in the context of rule-based learning. Learners failed to generalize global statistical patterns to new items when being overloaded with local distributional information, for example through an extended familiarization phase (Endress & Bonatti, 2007; Peña et al., 2002). Therefore, our finding supports a competitive relationship between local and global probabilistic information, though it does not speak to the direction of the interference.

In the late time window, the presence of a global effect is consistent with previous reports that global probabilistic information is processed later (Chennu et al., 2013; Bekinschtein et al., 2009; Wacongne et al., 2011; Marti, Thibault, & Dehaene, 2014). However, unlike in these studies, attention was not explicitly directed towards the stimuli in our paradigm. As a result, we observed an LDN rather than a P3a response to global probabilistic information. Research in both children and adults indicates that verbal stimuli elicit a later, centrally distributed negativity which reflects the automatic processing of complex auditory, possibly even linguistic, information (Cheour et al., 2001; Hill, McArthur, & Bishop, 2004; Korpilahti et al., 1995). The LDN, sometimes also termed a late MMN (e.g., Korpilahti, Krause, Holopainen, & Lang, 2001), has been speculated to represent additional (or immature) processing of subtle features of the auditory stimulus (Bishop et al., 2011). Although there is no consensus regarding whether the LDN reflects an attention-dependent or attention-independent process (Horváth et al., 2008; Roeber et al., 2003), the fact that the LDN is sensitive to global probability in the context of passive oddball paradigms suggests that explicit attention to the stimuli is not necessary (Bishop, 2007; Wetzel & Schröger, 2014).

The interaction between the local and the global effects in the late window provides more evidence about the role of attention in the processing of distributional information in speech. We observed a shift from an LDN to a P3a response to probabilistic information as stimuli became less expected, suggesting that the global distributional pattern modulates an individual’s expectation of the local stimulus. When encountering a locally rare deviant with a relatively more frequent global probability, participants consistently showed an LDN response. However, when participants encountered a locally rare deviant that was also highly unexpected in the global context, the LDN effect was replaced by a P3a-like effect with large variability across individuals: two-thirds of the participants showed a P3a response, while the other one-third showed an LDN response. Similar shifts from an LDN to a P3a were observed for the effect of global probability. These results provide neural evidence that the mutual interference between the global and local levels might depend on an attentional process.

Individual difference analyses in the late time window revealed a facilitatory role of attention in statistical learning. Individuals with a larger P3a response, which indexes a greater involuntary attentional shift modulated by the global context, demonstrated more rapid online statistical learning. Although this relationship did not survive Bonferroni correction, it is likely that flexible top-down control of attention towards salient prediction errors allows for more efficient learning during the statistical learning task, through prioritizing the global patterns embedded in the speech stream (Zhao et al., 2013). Although selective attention to stimuli is known to boost statistical learning (e.g., Toro, Sinnett, & Soto-Faraco, 2005; Turk-Browne, Jungé, & Scholl, 2005), our findings emphasize the relationship between endogenous attentional control and real-time statistical learning. However, attention is apparently not necessary for statistical learning: participants, regardless of their ERP profiles (P3a or LDN), were equally successful in recognizing the familiar words. These findings align well with the view that both attention-independent and attention-dependent learning mechanisms are at play, both within and across individuals (Batterink et al., 2015; Conway, 2020; Daltrozzo & Conway, 2014; Jamieson & Mewhort, 2009; Ordin & Polyanskaya, 2021). Batterink and colleagues (2015) evaluated explicit knowledge of SL via a forced-choice recognition test combined with a remember/know procedure, and implicit knowledge of SL through a novel target detection task. They found that stronger subjective awareness of recollection was associated with more accurate explicit knowledge of SL. However, accurate explicit knowledge of SL was not associated with implicit knowledge of SL measured either by reaction time or by P300 effects during the target detection task. These results suggest parallel processes for the formation of attention-dependent vs. attention-independent memory traces of statistical patterns.

Walk and Conway (2016) proposed that implicit learning is sufficient for learning unimodal sequential regularities (i.e., sequential dependencies between items in the same perceptual modality) but that additional cognitive resources such as selective attention or working memory may be required to learn cross-modal sequential patterns. From the developmental perspective, a bottom-up implicit-perceptual learning system develops early in life and encodes the surface structure of input, while a second system that is dependent on attention, develops later in life, and relies to a greater extent on top-down information to encode and represent more complex patterns (Daltrozzo & Conway, 2014). Even though endogenous attentional control is not mature in children, children show adult-like performance in speech statistical learning measured by offline recognition tasks (Raviv & Arnon, 2018; Saffran et al., 1996; Shufaniya & Arnon, 2018). Processing-based measures, such as ERPs and reaction time, will have the potential to unveil the developmental shift of weights from automatic to attention-dependent learning mechanisms.

It is worth noting that previous studies vary in their definitions of local and global information. In our EEG paradigm, we focused on how frequently certain syllables or voices were presented, that is, the distributional statistics. The local and global levels refer to the temporal scale of their distribution, either in seconds or in minutes. The definitions of both our local and global information are similar to the global statistics defined as entropy/uncertainty in the framework of information theory (Daikoku, 2018; Harrison et al., 2006). For the local information, because the occurrence of a deviant is entirely unpredictable from the number of preceding standards, the local statistics only reflect the summary statistics around the time of the most recent encounter (similar to Bekinschtein et al., 2009). In contrast, the global information was derived from the summary statistics across the whole sequence, which guide the prediction of the likelihood of occurrence of future sensory inputs. In our behavioral paradigm, we investigated transitional statistical learning. Even though the learning process leads to item-based statistics, which on the surface appear to be specific to the local context, successful extraction of such information requires computation of co-occurrence frequency statistics across the whole sequence. Therefore, we think transitional statistical information shares features of global distributions. The findings from our study provide critical neural evidence linking distributional and transitional statistical learning.
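
For reference, the distinction drawn above can be written compactly. This formalization is ours, an illustration consistent with the information-theoretic framing cited above rather than an equation from the original paper:

```latex
% Global (distributional) statistic of a stimulus x over the whole stream of N events,
% and the stream's overall uncertainty (Shannon entropy)
P(x) = \frac{\operatorname{count}(x)}{N}, \qquad
H = -\sum_{x} P(x)\,\log_2 P(x)

% Transitional statistic of syllable Y given the immediately preceding syllable X,
% which must be accumulated across the whole sequence (e.g., P(bu \mid ti) = 1 within a triplet)
P(Y \mid X) = \frac{\operatorname{count}(XY)}{\operatorname{count}(X)}
```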

A limitation of the current study relates to the interpretations about implicit versus explicit attention. True implicit learning occurs over extended periods of time, requiring substantial periods of training or practice (DeJong, 2005). Studies with short exposure durations are biased towards explicit attention (Daikoku et al., 2017; Dekeyser, 2008). Thus, future studies would benefit from examining implicit learning over a longer time window than the current design. Moreover, because we did not measure EEG during the word-segmentation task, our findings do not speak to a causal relationship between attention and transitional SL. Thus, further research is needed to determine whether individuals’ attentional engagement during a lower-order SL task is a reliable indicator of their attentional engagement during a higher-order SL task. The current study was also limited by the lack of other individual difference measures in our sample. For example, the ability to perceive absolute pitch and musical training experience might affect individuals’ sensitivity to statistical patterns in both the speech and non-speech domains (Schön & François, 2012; Saffran & Griepentrog, 2001). Working memory capacity, on the other hand, might also play a role in learning transitional probability patterns (Arciuli & von Koss Torkildsen, 2012; Palmer et al., 2018).

Taken together, the current study provides evidence for a facilitatory role of attention in statistical learning: whereas neural adaptation to local, transient distributional information is automatic, computing global distributional information over a continuous stream of stimuli may engage an involuntary attentional shift, a process possibly deployed by more successful statistical learners.

Highlights:

  • Monitoring of local distributional probabilities engaged an automatic pre-attentive process (MMN).

  • Global distributional information modulated listeners’ expectation of local auditory events.

  • A larger P3a response was associated with faster online learning of transitional statistics.

  • Involuntary attention to statistical information may facilitate statistical learning.

Acknowledgements:

We thank An Nguyen and Violet Kozloff for their contributions to experiment implementation and data collection. We thank Tyler Perrachione for his advice on stimulus construction and experimental design. We thank Sara Beach and Elizabeth Norton for their contributions to the auditory stimuli recording. We thank Yoel Sanchez Araujo and Wendy Georgan for their contributions to the design of the web-based speech statistical learning task.

Funding:

This work was supported by a NARSAD Young Investigator Award (#24836) sponsored by the Brain & Behavior Research Foundation and by the National Institute on Deafness and Other Communication Disorders (R21DC017576). The first author’s time was supported by an NSF SPRF from the Social, Behavioral and Economic Sciences Directorate (#1911462).

5. References

  1. Alamia A, & Zénon A. (2016). Statistical regularities attract attention when task-relevant. Frontiers in Human Neuroscience, 10(FEB2016), 42. 10.3389/fnhum.2016.00042 [DOI] [PMC free article] [PubMed] [Google Scholar]
  2. Alho K, Woods DL, & Algazi A. (1994). Processing of auditory stimuli during auditory and visual attention as revealed by event-related potentials. Psychophysiology, 31(5), 469–479. 10.1111/j.1469-8986.1994.tb01050.x [DOI] [PubMed] [Google Scholar]
  3. Arciuli J, & von Koss Torkildsen J. (2012). Advancing our understanding of the link between statistical learning and language acquisition: The need for longitudinal data. In Frontiers in Psychology (Vol. 3, Issue AUG). 10.3389/fpsyg.2012.00324 [DOI] [PMC free article] [PubMed] [Google Scholar]
  4. Aslin RN, & Newport EL (2014). Distributional Language Learning: Mechanisms and Models of Category Formation. Language Learning , 64(2), 86–105. 10.1111/lang.12074 [DOI] [PMC free article] [PubMed] [Google Scholar]
  5. Baldeweg T. (2007). ERP repetition effects and mismatch negativity generation: A predictive coding perspective. In Journal of Psychophysiology (Vol. 21, Issues 3–4, pp. 204–213). 10.1027/0269-8803.21.34.204 [DOI] [Google Scholar]
  6. Batterink LJ, & Paller KA (2019). Statistical learning of speech regularities can occur outside the focus of attention. Cortex, 115, 56–71. 10.1016/j.cortex.2019.01.013 [DOI] [PMC free article] [PubMed] [Google Scholar]
  7. Batterink LJ, Reber PJ, Neville HJ, & Paller KA (2015). Implicit and explicit contributions to statistical learning. Journal of Memory and Language, 83, 62–78. 10.1016/j.jml.2015.04.004 [DOI] [PMC free article] [PubMed] [Google Scholar]
  8. Batterink LJ, Reber PJ, & Paller KA (2015). Functional differences between statistical learning with and without explicit training. Learning and Memory, 22(11), 544–556. 10.1101/lm.037986.114 [DOI] [PMC free article] [PubMed] [Google Scholar]
  9. Bekinschtein TA, Dehaene S, Rohaut B, Tadel F, Cohen L, & Naccache L. (2009a). Neural signature of the conscious processing of auditory regularities. Proceedings of the National Academy of Sciences of the United States of America, 106(5), 1672–1677. 10.1073/pnas.0809667106 [DOI] [PMC free article] [PubMed] [Google Scholar]
  10. Bekinschtein TA, Dehaene S, Rohaut B, Tadel F, Cohen L, & Naccache L. (2009b). Neural signature of the conscious processing of auditory regularities. Proceedings of the National Academy of Sciences of the United States of America, 106(5), 1672–1677. 10.1073/pnas.0809667106 [DOI] [PMC free article] [PubMed] [Google Scholar]
  11. Bishop D. (2007). Using mismatch negativity to study central auditory processing in developmental language and literacy impairments: Where are we, and where should we be going? Psychological Bulletin, 133(4), 651–672. 10.1037/0033-2909.133.4.651 [DOI] [PubMed] [Google Scholar]
  12. Bishop DVM, Hardiman MJ, & Barry JG (2011). Is auditory discrimination mature by middle childhood? A study using time-frequency analysis of mismatch responses from 7 years to adulthood. Developmental Science, 14(2), 402–416. 10.1111/j.1467-7687.2010.00990.x [DOI] [PMC free article] [PubMed] [Google Scholar]
  13. Bouvet L, Rousset S, Valdois S, & Donnadieu S. (2011). Global precedence effect in audition and vision: Evidence for similar cognitive styles across modalities. Acta Psychologica, 138(2), 329–335. 10.1016/j.actpsy.2011.08.004 [DOI] [PubMed] [Google Scholar]
  14. Caclin A, Brattico E, Tervaniemi M, Näätänen R, Morlet D, Giard MH, & McAdams S. (2006). Separate neural processing of timbre dimensions in auditory sensory memory. Journal of Cognitive Neuroscience, 18(12), 1959–1972. 10.1162/jocn.2006.18.12.1959 [DOI] [PubMed] [Google Scholar]
  15. Chennu S, Noreika V, Gueorguiev D, Blenkmann A, Kochen S, Ibáñez A, Owen AM, & Bekinschtein TA (2013). Expectation and attention in hierarchical auditory prediction. Journal of Neuroscience, 33(27), 11194–11205. 10.1523/JNEUROSCI.0114-13.2013 [DOI] [PMC free article] [PubMed] [Google Scholar]
  16. Cheour M, Korpilahti P, Martynova O, & Lang A-H (2001). Mismatch Negativity and Late Discriminative Negativity in Investigating Speech Perception and Learning in Children and Infants. Audiology and Neuro-Otology, 6(1), 2–11. 10.1159/000046804 [DOI] [PubMed] [Google Scholar]
  17. Conway CM (2020). How does the brain learn environmental structure? Ten core principles for understanding the neurocognitive mechanisms of statistical learning. Neuroscience and Biobehavioral Reviews, 112, 279–299. 10.1016/j.neubiorev.2020.01.032 [DOI] [PMC free article] [PubMed] [Google Scholar]
  18. Conway CM, & Christiansen MH (2006). Statistical learning within and between modalities: Pitting abstract against stimulus-specific representations. Psychological Science, 17(10), 905–912. 10.1111/j.1467-9280.2006.01801.x [DOI] [PubMed] [Google Scholar]
  19. Curran T, & Keele SW (1993). Attentional and Nonattentional Forms of Sequence Learning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 19(1), 189–202. 10.1037/0278-7393.19.1.189 [DOI] [Google Scholar]
  20. Daikoku T. (2018). Neurophysiological markers of statistical learning in music and language: Hierarchy, entropy, and uncertainty. In Brain Sciences (Vol. 8, Issue 6). 10.3390/brainsci8060114 [DOI] [PMC free article] [PubMed] [Google Scholar]
  21. Daikoku T, Yatomi Y, & Yumoto M. (2014). Implicit and explicit statistical learning of tone sequences across spectral shifts. Neuropsychologia, 63, 194–204. 10.1016/J.NEUROPSYCHOLOGIA.2014.08.028 [DOI] [PubMed] [Google Scholar]
  22. Daikoku T, Yatomi Y, & Yumoto M. (2015). Statistical learning of music- and language-like sequences and tolerance for spectral shifts. Neurobiology of Learning and Memory, 118, 8–19. 10.1016/J.NLM.2014.11.001 [DOI] [PubMed] [Google Scholar]
  23. Daikoku T, Yatomi Y, & Yumoto M. (2016). Pitch-class distribution modulates the statistical learning of atonal chord sequences. Brain and Cognition, 108, 1–10. 10.1016/j.bandc.2016.06.008 [DOI] [PubMed] [Google Scholar]
  24. Daikoku T, Yatomi Y, & Yumoto M. (2017). Statistical learning of an auditory sequence and reorganization of acquired knowledge: A time course of word segmentation and ordering. Neuropsychologia, 95, 1–10. 10.1016/j.neuropsychologia.2016.12.006 [DOI] [PubMed] [Google Scholar]
  25. Daltrozzo J, & Conway CM (2014). Neurocognitive mechanisms of statistical-sequential learning: What do event-related potentials tell us? In Frontiers in Human Neuroscience (Vol. 8, Issue JUNE, p. 437). Frontiers Media S. A. 10.3389/fnhum.2014.00437 [DOI] [PMC free article] [PubMed] [Google Scholar]
  26. Daltrozzo J, Emerson SN, Deocampo J, Singh S, Freggens M, Branum-Martin L, & Conway CM (2017). Visual statistical learning is related to natural language ability in adults: An ERP study. Brain and Language, 166, 40–51. 10.1016/j.bandl.2016.12.005 [DOI] [PMC free article] [PubMed] [Google Scholar]
  27. DeJong N. (2005). Learning Second Language Grammar By Listening. http://wwwlot.let.uu.nl/ [Google Scholar]
  28. Dekeyser R. (2008). Implicit and Explicit Learning. In The Handbook of Second Language Acquisition (pp. 312–348). wiley. 10.1002/9780470756492.ch11 [DOI] [Google Scholar]
  29. Delorme A, Makeig S, & Sejnowski T. (2001). Automatic artifact rejection for EEG data using high-order statistics and independent component analysis. Proc. of the 3rd International Workshop on ICA, 457, 462. [Google Scholar]
  30. Delorme A, & Makeig S. (2004). EEGLAB: An open source toolbox for analysis of single-trial EEG dynamics including independent component analysis. Journal of Neuroscience Methods, 134(1), 9–21. http://search.ebscohost.com/login.aspx?direct=true&db=cmedm&AN=15102499&site=ehost-live [DOI] [PubMed] [Google Scholar]
  31. Donchin E, & Coles MGH (1988). Is the P300 component a manifestation of context updating? Behavioral and Brain Sciences, 11(3), 357–374. 10.1017/S0140525X00058027 [DOI] [Google Scholar]
  32. Duncan CC, Barry RJ, Connolly JF, Fischer C, Michie PT, Näätänen R, Polich J, Reinvang I, & Van Petten C. (2009). Event-related potentials in clinical research: Guidelines for eliciting, recording, and quantifying mismatch negativity, P300, and N400. In Clinical Neurophysiology (Vol. 120, Issue 11, pp. 1883–1908). 10.1016/j.clinph.2009.07.045 [DOI] [PubMed] [Google Scholar]
  33. Endress AD, & Bonatti LL (2007). Rapid learning of syllable classes from a perceptually continuous speech stream. Cognition, 105(2), 247–299. 10.1016/j.cognition.2006.09.010 [DOI] [PubMed] [Google Scholar]
  34. Erickson LC, & Thiessen ED (2015). Statistical learning of language: Theory, validity, and predictions of a statistical learning account of language acquisition. Developmental Review. 10.1016/j.dr.2015.05.002 [DOI] [Google Scholar]
  35. Fernandes T, Kolinsky R, & Ventura P. (2010). The impact of attention load on the use of statistical information and coarticulation as speech segmentation cues. Attention, Perception, & Psychophysics, 72(6), 1522–1532. 10.3758/APP.72.6.1522 [DOI] [PubMed] [Google Scholar]
  36. Fiser J, & Aslin RN (2001). Unsupervised statistical learning of higher-order spatial structures from visual scenes. Psychological Science, 12(6), 499–504. [DOI] [PubMed] [Google Scholar]
  37. Fiser J, & Aslin RN (2002). Statistical learning of higher-order temporal structure from visual shape sequences. Journal of Experimental Psychology: Learning, Memory, and Cognition, 28(3), 458. [DOI] [PubMed] [Google Scholar]
  38. Fitzgerald K, & Todd J. (2018). Hierarchical timescales of statistical learning revealed by mismatch negativity to auditory pattern deviations. Neuropsychologia, 120, 25–34. 10.1016/j.neuropsychologia.2018.09.015 [DOI] [PubMed] [Google Scholar]
  39. Forest TA, Siegelman N, & Finn AS (2021). Attention shifts to more complex structure with experience. PsyArxiv. https://psyarxiv.com/kr5a9/ [DOI] [PubMed] [Google Scholar]
  40. François C, & Schön D. (2011). Musical expertise and statistical learning of musical and linguistic structures. In Frontiers in Psychology (Vol. 2). 10.3389/fpsyg.2011.00167 [DOI] [PMC free article] [PubMed] [Google Scholar]
  41. Frost R, Armstrong BC, Siegelman N, & Christiansen MH (2015). Domain generality versus modality specificity: The paradox of statistical learning. In Trends in Cognitive Sciences (Vol. 19, Issue 3, pp. 117–125). Elsevier Ltd. 10.1016/j.tics.2014.12.010 [DOI] [PMC free article] [PubMed] [Google Scholar]
  42. Furl N, Kumar S, Alter K, Durrant S, Shawe-Taylor J, & Griffiths TD (2011). Neural prediction of higher-order auditory sequence statistics. NeuroImage, 54(3), 2267–2277. 10.1016/j.neuroimage.2010.10.038 [DOI] [PubMed] [Google Scholar]
  43. Gershon RC, Wagster MV, Hendrie HC, Fox NA, Cook KF, & Nowinski CJ (2013). NIH toolbox for assessment of neurological and behavioral function. Neurology, 80(11 Suppl 3), S2. 10.1212/wnl.0b013e3182872e5f [DOI] [PMC free article] [PubMed] [Google Scholar]
  44. Goydke KN, Altenmüller E, Möller J, & Münte TF (2004). Changes in emotional tone and instrumental timbre are reflected by the mismatch negativity. Cognitive Brain Research, 21(3), 351–359. 10.1016/j.cogbrainres.2004.06.009 [DOI] [PubMed] [Google Scholar]
  45. Groppe DM, Urbach TP, & Kutas M. (2011). Mass univariate analysis of event-related brain potentials/fields I: A critical tutorial review. In Psychophysiology (Vol. 48, Issue 12, pp. 1711–1725). Blackwell Publishing Inc. 10.1111/j.1469-8986.2011.01273.x [DOI] [PMC free article] [PubMed] [Google Scholar]
  46. Haenschel C, Vernon DJ, Dwivedi P, Gruzelier JH, & Baldeweg T. (2005). Event-related brain potential correlates of human auditory sensory memory-trace formation. Journal of Neuroscience, 25(45), 10494–10501. 10.1523/JNEUROSCI.1227-05.2005 [DOI] [PMC free article] [PubMed] [Google Scholar]
  47. Hendricks MA, Conway CM, & Kellogg RT (2013). Using dual-task methodology to dissociate automatic from nonautomatic processes involved in artificial grammar learning. Journal of Experimental Psychology: Learning Memory and Cognition, 39(5), 1491–1500. 10.1037/a0032974 [DOI] [PubMed] [Google Scholar]
  48. Hill PR, McArthur GM, & Bishop DVM (2004). Phonological categorization of vowels: A mismatch negativity study. NeuroReport, 15(14), 2195–2199. 10.1097/00001756-200410050-00010 [DOI] [PubMed] [Google Scholar]
  49. Horváth J, Winkler I, & Bendixen A. (2008). Do N1/MMN, P3a, and RON form a strongly coupled chain reflecting the three stages of auditory distraction? Biological Psychology. https://psycnet.apa.org/record/2008-12958-001 [DOI] [PubMed] [Google Scholar]
  50. Jamieson RK, & Mewhort DJK (2009). Applying an exemplar model to the serial reaction-time task: Anticipating from experience. Quarterly Journal of Experimental Psychology, 62(9), 1757–1783. 10.1080/17470210802557637 [DOI] [PubMed] [Google Scholar]
  51. Jung T-P, Makeig S, Humphries C, Lee T-W, McKeown MJ, Iragui V, & Sejnowski TJ (2000). Removing electroencephalographic artifacts by blind source separation. Psychophysiology, 37(2), 163–178. 10.1111/1469-8986.3720163 [DOI] [PubMed] [Google Scholar]
  52. Kaufman AS, & Kaufman NL (2004). KBIT2 : Kaufman Brief Intelligence Test . Pearson/PsychCorp. [Google Scholar]
  53. Kim SG, Kim JS, & Chung CK (2011). The effect of conditional probability of chord progression on brain response: An MEG study. PLoS ONE, 6(2), e17337. 10.1371/journal.pone.0017337 [DOI] [PMC free article] [PubMed] [Google Scholar]
  54. Koelsch S, Busch T, Jentschke S, & Rohrmeier M. (2016). Under the hood of statistical learning: A statistical MMN reflects the magnitude of transitional probabilities in auditory sequences. Scientific Reports 2016 6:1, 6(1), 1–11. 10.1038/srep19741 [DOI] [PMC free article] [PubMed] [Google Scholar]
  55. Korpilahti P, Lang H, & Aaltonen O. (1995). Is there a late-latency mismatch negativity (MMN) component? Electroencephalography and Clinical Neurophysiology, 95(4), P96. 10.1016/0013-4694(95)90016-g [DOI] [PubMed] [Google Scholar]
  56. Korpilahti P, Krause CM, Holopainen I, & Lang AH (2001). Early and late mismatch negativity elicited by words and speech-like stimuli in children. Brain and Language, 76(3), 332–339. 10.1006/brln.2000.2426 [DOI] [PubMed] [Google Scholar]
  57. Lopez-Calderon J, & Luck SJ (2014). ERPLAB: An open-source toolbox for the analysis of event-related potentials. Frontiers in Human Neuroscience, 8(1 APR). 10.3389/fnhum.2014.00213 [DOI] [PMC free article] [PubMed] [Google Scholar]
  58. Maris E, & Oostenveld R. (2007). Nonparametric statistical testing of EEG- and MEG-data. Journal of Neuroscience Methods, 164(1), 177–190. http://search.ebscohost.com/login.aspx?direct=true&db=cmedm&AN=17517438&site=ehost-live [DOI] [PubMed] [Google Scholar]
  59. Marti S, Thibault L, & Dehaene S. (2014). How does the extraction of local and global auditory regularities vary with context? PLoS ONE, 9(9). 10.1371/journal.pone.0107227 [DOI] [PMC free article] [PubMed] [Google Scholar]
  60. Moreau P, Jolicoeur P, & Peretz I. (2013). Pitch discrimination without awareness in congenital amusia: Evidence from event-related potentials. Brain and Cognition. 10.1016/j.bandc.2013.01.004 [DOI] [PubMed] [Google Scholar]
  61. Näätänen R, Paavilainen P, Rinne T, & Alho K. (2007). The mismatch negativity (MMN) in basic research of central auditory processing: A review. In Clinical Neurophysiology (Vol. 118, Issue 12, pp. 2544–2590). 10.1016/j.clinph.2007.04.026 [DOI] [PubMed] [Google Scholar]
  62. Näätänen R, Tervaniemi M, Sussman E, Paavilainen P, & Winkler I. (2001). “Primitive intelligence” in the auditory cortex. In Trends in Neurosciences (Vol. 24, Issue 5, pp. 283–288). Elsevier. 10.1016/S0166-2236(00)01790-2 [DOI] [PubMed] [Google Scholar]
  63. Navon D. (1977). Forest before trees: The precedence of global features in visual perception. Cognitive Psychology, 9(3), 353–383. 10.1016/0010-0285(77)90012-3 [DOI] [Google Scholar]
  64. Ordin M, & Polyanskaya L. (2021). The role of metacognition in recognition of the content of statistical learning. Psychonomic Bulletin and Review, 28(1), 333–340. 10.3758/S13423-020-01800-0/FIGURES/1 [DOI] [PubMed] [Google Scholar]
  65. Paavilainen P. (2013). The mismatch-negativity (MMN) component of the auditory event-related potential to violations of abstract regularities: A review. International Journal of Psychophysiology, 88(2), 109–123. 10.1016/j.ijpsycho.2013.03.015 [DOI] [PubMed] [Google Scholar]
  66. Palmer SD, Hutson J, & Mattys SL (2018). Statistical learning for speech segmentation: Age-related changes and underlying mechanisms. Psychology and Aging, 33(7), 1035–1044. 10.1037/pag0000292 [DOI] [PMC free article] [PubMed] [Google Scholar]
  67. Paraskevopoulos E, Kuchenbuch A, Herholz SC, & Pantev C. (2012). Statistical learning effects in musicians and non-musicians: An MEG study. Neuropsychologia, 50(2), 341–349. 10.1016/j.neuropsychologia.2011.12.007 [DOI] [PubMed] [Google Scholar]
  68. Pegado F, Bekinschtein T, Chausson N, Dehaene S, Cohen L, & Naccache L. (2010). Probing the lifetimes of auditory novelty detection processes. Neuropsychologia, 48(10), 3145–3154. 10.1016/j.neuropsychologia.2010.06.030 [DOI] [PubMed] [Google Scholar]
  69. Peña M, Bonatti LL, Nespor M, & Mehler J. (2002). Signal-driven computations in speech processing. Science, 298(5593), 604–607. [DOI] [PubMed] [Google Scholar]
  70. Poirel N, Pineau A, & Mellet E. (2008). What does the nature of the stimuli tell us about the Global Precedence Effect? Acta Psychologica, 127(1), 1–11. 10.1016/j.actpsy.2006.12.001 [DOI] [PubMed] [Google Scholar]
  71. Polich J. (2007). Updating P300: An integrative theory of P3a and P3b. In Clinical Neurophysiology (Vol. 118, Issue 10, pp. 2128–2148). NIH Public Access. 10.1016/j.clinph.2007.04.019 [DOI] [PMC free article] [PubMed] [Google Scholar]
  72. Qi Z, Sanchez Araujo Y, Georgan WC, Gabrieli JD, & Arciuli J. (2019). Hearing matters more than seeing: A cross-modality study of statistical learning and reading ability. Scientific Studies of Reading, 23(1), 101–115. [Google Scholar]
  73. Raviv L, & Arnon I. (2018). The developmental trajectory of children’s auditory and visual statistical learning abilities: modality-based differences in the effect of age. Developmental Science, 21(4), e12593. 10.1111/desc.12593 [DOI] [PubMed] [Google Scholar]
  74. Roeber U, Berti S, & Schröger E. (2003). Auditory distraction with different presentation rates: An event-related potential and behavioral study. Clinical Neurophysiology, 114(2), 341–349. 10.1016/S1388-2457(02)00377-2 [DOI] [PubMed] [Google Scholar]
  75. Saffran JR, Aslin RN, & Newport EL (1996a). Statistical Learning by 8-Month-Old Infants. Science, 274(5294), 1926–1928. [DOI] [PubMed] [Google Scholar]
  76. Saffran JR, Aslin RN, & Newport EL (1996b). Word Segmentation: The Role of Distributional Cues. Journal of Memory and Language, 35(4), 606–621. 10.1006/jmla.1996.0032 [DOI] [Google Scholar]
  77. Saffran JR, Johnson EK, Aslin RN, & Newport EL (1999). Statistical learning of tone sequences by human infants and adults. Cognition, 70(1), 27–52. 10.1016/S0010-0277(98)00075-4 [DOI] [PubMed] [Google Scholar]
  78. Saffran JR, Newport EL, Aslin RN, Tunick RA, & Barrueco S. (1997). Incidental language learning: Listening (and learning) out of the corner of your ear. Psychological Science, 8(2), 101–105. https://journals.sagepub.com/doi/pdf/10.1111/j.1467-9280.1997.tb00690.x [Google Scholar]
  79. Schneider JM, Hu A, Legault J, & Qi Z. (2020). Measuring statistical learning across modalities and domains in school-aged children via an online platform and neuroimaging techniques. Journal of Visualized Experiments, 2020(160), 1–21. 10.3791/61474 [DOI] [PMC free article] [PubMed] [Google Scholar]
  80. Seitz AR, Kim R, van Wassenhove V, & Shams L. (2007). Simultaneous and independent acquisition of multisensory and unisensory associations. Perception, 36(10), 1445–1453. 10.1068/p5843 [DOI] [PubMed] [Google Scholar]
  81. Sergent C, Baillet S, & Dehaene S. (2005). Timing of the brain events underlying access to consciousness during the attentional blink. Nature Neuroscience, 8(10), 1391–1400. 10.1038/nn1549 [DOI] [PubMed] [Google Scholar]
  82. Shufaniya A, & Arnon I. (2018). Statistical Learning Is Not Age-Invariant During Childhood: Performance Improves With Age Across Modality. Cognitive Science, 42(8), 3100–3115. 10.1111/cogs.12692 [DOI] [PubMed] [Google Scholar]
  83. Siegelman N, Bogaerts L, & Frost R. (2017). Measuring individual differences in statistical learning: Current pitfalls and possible solutions. Behavior Research Methods, 49(2), 418–432. 10.3758/s13428-016-0719-z [DOI] [PMC free article] [PubMed] [Google Scholar]
  84. Song S, Howard JH, & Howard DV (2007). Implicit probabilistic sequence learning is independent of explicit awareness. Learning & Memory, 14(3), 167–176. 10.1101/LM.437407 [DOI] [PMC free article] [PubMed] [Google Scholar]
  85. Squires NK, Squires KC, & Hillyard SA (1975). Two varieties of long-latency positive waves evoked by unpredictable auditory stimuli in man. Electroencephalography and Clinical Neurophysiology, 38(4), 387–401. 10.1016/0013-4694(75)90263-1 [DOI] [PubMed] [Google Scholar]
  86. Stadler W, Klimesch W, Pouthas V, & Ragot R. (2006). Differential effects of the stimulus sequence on CNV and P300. Brain Research, 1123(1), 157–167. 10.1016/j.brainres.2006.09.040 [DOI] [PubMed] [Google Scholar]
  87. Sutton S, Braren M, Zubin J, & John ER (1965). Evoked-potential correlates of stimulus uncertainty. Science (New York, N.Y.), 150(3700), 1187–1188. 10.1126/science.150.3700.1187 [DOI] [PubMed] [Google Scholar]
  88. Tervaniemi M, Schröger E, Saher M, & Näätänen R. (2000). Effects of spectral complexity and sound duration on automatic complex-sound pitch processing in humans - A mismatch negativity study. Neuroscience Letters, 290(1), 66–70. 10.1016/S0304-3940(00)01290-8 [DOI] [PubMed] [Google Scholar]
  89. Tervaniemi M, Winkler I, & Näätänen R. (1997). Pre-attentive categorization of sounds by timbre as revealed by event-related potentials. NeuroReport, 8(11), 2571–2574. 10.1097/00001756-199707280-00030 [DOI] [PubMed] [Google Scholar]
  90. Thiessen ED (2017). What’s statistical about learning? Insights from modelling statistical learning as a set of memory processes. In Philosophical Transactions of the Royal Society B: Biological Sciences (Vol. 372, Issue 1711). The Royal Society. 10.1098/rstb.2016.0056 [DOI] [PMC free article] [PubMed] [Google Scholar]
  91. Thiessen ED, Kronstein AT, & Hufnagle DG (2013a). The extraction and integration framework: A two-process account of statistical learning. Psychological Bulletin, 139(4), 792–814. 10.1037/a0030801 [DOI] [PubMed] [Google Scholar]
  92. Thiessen ED, Kronstein AT, & Hufnagle DG (2013b). The extraction and integration framework: A two-process account of statistical learning. Psychological Bulletin, 139(4), 792–814. 10.1037/a0030801 [DOI] [PubMed] [Google Scholar]
  93. Toiviainen P, Tervaniemi M, Louhivuori J, Saher M, Huotilainen M, & Näätänen R. (1998). Timbre similarity: Convergence of neural, behavioral, and computational approaches. Music Perception, 16(2), 223–241. 10.2307/40285788 [DOI] [Google Scholar]
  94. Toro JM, Sinnett S, & Soto-Faraco S. (2005). Speech segmentation by statistical learning depends on attention. Cognition, 97(2). 10.1016/j.cognition.2005.01.006 [DOI] [PubMed] [Google Scholar]
  95. Tsogli V, Jentschke S, Daikoku T, & Koelsch S. (2019). When the statistical MMN meets the physical MMN. Scientific Reports, 9(1). 10.1038/s41598-019-42066-4 [DOI] [PMC free article] [PubMed] [Google Scholar]
  96. Turk-Browne NB, Isola PJ, Scholl BJ, & Treat TA (2008). Multidimensional Visual Statistical Learning. Journal of Experimental Psychology: Learning Memory and Cognition, 34(2), 399–407. 10.1037/0278-7393.34.2.399 [DOI] [PubMed] [Google Scholar]
  97. Turk-Browne NB, Jungé JA, & Scholl BJ (2005). The automaticity of visual statistical learning. Journal of Experimental Psychology: General, 134(4), 552–564. 10.1037/0096-3445.134.4.552 [DOI] [PubMed] [Google Scholar]
  98. Turk-Browne NB, Scholl BJ, Chun MM, & Johnson MK (2009). Neural evidence of statistical learning: Efficient detection of visual regularities without awareness. Journal of Cognitive Neuroscience, 21(10), 1934–1945. [DOI] [PMC free article] [PubMed] [Google Scholar]
  99. Van Zuijen TL, Simoens VL, Paavilainen P, Näätänen R, & Tervaniemi M. (2006). Implicit, intuitive, and explicit knowledge of abstract regularities in a sound sequence: An event-related brain potential study. Journal of Cognitive Neuroscience, 18(8), 1292–1303. [DOI] [PubMed] [Google Scholar]
  100. Vuong LC, Meyer AS, & Christiansen MH (2016). Concurrent statistical learning of adjacent and nonadjacent dependencies. Language Learning, 66(1). 10.1111/lang.12137 [DOI] [Google Scholar]
  101. Wacongne C, Labyt E, Van Wassenhove V, Bekinschtein T, Naccache L, & Dehaene S. (2011). Evidence for a hierarchy of predictions and prediction errors in human cortex. Proceedings of the National Academy of Sciences of the United States of America, 108(51), 20754–20759. 10.1073/pnas.1117807108 [DOI] [PMC free article] [PubMed] [Google Scholar]
  102. Walk AM, & Conway CM (2016). Cross-Domain Statistical–Sequential Dependencies Are Difficult to Learn. Frontiers in Psychology, 7, 250. 10.3389/fpsyg.2016.00250 [DOI] [PMC free article] [PubMed] [Google Scholar]
  103. Wetzel N, & Schröger E. (2014). On the development of auditory distraction: A review. PsyCh Journal, 3(1), 72–91. 10.1002/pchj.49 [DOI] [PubMed] [Google Scholar]
  104. Yu C, & Smith LB (2007). Rapid Word Learning Under Uncertainty via Cross-Situational Statistics. Psychological Science, 18(5), 414–420. 10.1111/j.1467-9280.2007.01915.x [DOI] [PubMed] [Google Scholar]
  105. Zhao J, Al-Aidroos N, & Turk-Browne NB (2013). Attention is spontaneously biased toward regularities. Psychological Science, 24(5), 667–677. 10.1177/0956797612460407 [DOI] [PMC free article] [PubMed] [Google Scholar]
