Abstract
From a theoretical perspective, most discussions of statistical learning (SL) have focused on the possible “statistical” properties which are the object of learning. Much less attention has been given to defining what “learning” is in the context of “statistical learning”. One major difficulty is that SL research has been monitoring participants’ performance in laboratory settings with a strikingly narrow set of tasks, where learning is typically assessed offline, through a set of 2-alternative-forced-choice questions, which follow a brief visual or auditory familiarization stream. Is that all there is to characterizing SL abilities? Here we adopt a novel perspective for investigating the processing of regularities in the visual modality. By tracking online performance in a self-paced SL paradigm, we focus on the trajectory of learning. In a set of three experiments we show that this paradigm provides a reliable and valid signature of SL performance, and offers important insights for understanding how statistical regularities are perceived and assimilated in the visual modality. This demonstrates the promise of integrating different operational measures into our theory of statistical learning.
Keywords: Statistical learning, Online measures, Learning dynamics, Individual differences
In the last two decades, statistical learning (SL) has become a major theoretical construct in cognitive science. Since the seminal demonstration of Saffran and her colleagues (1996) that infants display remarkable sensitivity to transitional probabilities of syllabic segments, a large and constantly growing number of studies have focused on documenting humans’ ability to exploit statistical cues to discover regularities in their environment (see Frost, Armstrong, Siegelman, & Christiansen, 2015, for a review). Following this work, SL has been commonly defined as the ability to extract the statistical properties of sensory input in time and space (e.g., Frost et al., 2015; Romberg & Saffran, 2010; Schapiro & Turk-Browne, 2015). Unsurprisingly, therefore, most experimental manipulations and theoretical discussions of SL have focused on the possible “statistical” properties which are the object of perception and assimilation (e.g., Fiser & Aslin, 2001; Newport & Aslin, 2004; Thiessen, Kronstein, & Hufnagle, 2013). Most studies have thus differed in the type of statistical contingencies embedded in their input, aiming to chart whether, and to what extent, these contingencies affect human performance. Interestingly, much less attention has been given to defining what “learning” is in the context of “statistical learning”. The present paper aims to address this gap.
As in any exploration in the cognitive or psychological sciences, a critical step in theory development is the operationalization of the theoretical construct of interest. The goal of successful operationalization is to minimize the distance between the theoretical definition of a construct and its corresponding operational proxy. Ideally, the operational measure does not leave out critical aspects of the theoretical construct, but also does not extend to cover unrelated ones. This is important because, with time, the theoretical and operational definitions are typically taken to be two sides of the same coin, and are often even used interchangeably. As we will argue, in the context of SL, narrowing the gap between the “Statistical Learning ability” and its operational definition is far from trivial.
One major difficulty is that SL research has been monitoring participants’ performance in laboratory settings with a strikingly narrow set of tasks (see Armstrong, Frost, & Christiansen, 2017, for discussion). Typically, the to-be-learned regularities (i.e., co-occurrence of elements, their transitional probabilities, etc.) are embedded in a sensory input for a relatively brief familiarization phase, and their “learning” is assessed in a subsequent test phase (typically a series of two-alternative forced-choice (2-AFC) questions). By this approach, there is evidence for learning if the mean performance of a sample of participants is significantly above chance. From an individual differences perspective, “good” statistical learners are those who obtain a high score in the test, and “bad” statistical learners are those who perform at chance or close to it. Here we ask: is this all there is to characterizing statistical learning ability? Note that this question is not confined just to characterizing “good” or “poor” individual learners. It extends to understanding SL as an ongoing process of assimilating various types of statistical properties. For if two learning conditions result in similar post-familiarization test scores, they are implicitly taken to be equal in terms of the complexity they impose on participants, with all resulting theoretical implications (e.g., Arciuli, von Koss Torkildsen, Stevens, & Simpson, 2014). In contrast, if they result in different test scores, the magnitude of the test-score difference is taken to represent the difference in complexity between conditions, possibly suggesting different mechanisms (e.g., Bogaerts, Siegelman, & Frost, 2016). Are these implicit assumptions necessarily true?
The main aim of the present research is to expand the theoretical scope of “learning” in SL by exploring other operational definitions for it. We start by reviewing the commonly used two-alternative forced-choice (2-AFC) task as a proxy for SL, highlighting both its merits and shortcomings in terms of the theoretical coverage it offers. We then consider alternative operational measures of learning, discussing their possible contribution to SL theory. Subsequently, we employ novel measures to investigate the processing of regularities in the visual modality. We show that critical insight for understanding visual SL can be gained once novel “learning” perspectives are integrated into our theory of assimilating statistical regularities. Specifically, our investigation focuses on one important aspect of SL behavior – the trajectory of learning – which has been mostly overlooked given the commonly used SL tasks.
Insights from observing offline test performance
Most SL studies have been using the same experimental procedure that was originally employed by Saffran and her colleagues1. The typical SL task comprises two parts: First, a familiarization phase, in which participants are exposed to a stream of stimuli in the auditory or visual modality. Unbeknownst to participants, the stream consists of several repeated patterns (typically, pairs or triplets of syllables or shapes), which co-occur frequently, so that the first elements in the patterns reliably predict the other elements. The patterns appear for a pre-defined number of repetitions (a parameter that varies widely between studies, from 12 repetitions of each pattern, e.g., Sell & Kaschak, 2009, to as many as 300 repetitions, Saffran, Newport, Aslin, Tunick, & Barrueco, 1997). Importantly, during familiarization, participants are typically asked to just passively attend to the sensory stream (e.g., Saffran, Johnson, Aslin, & Newport, 1999), or they perform an unrelated cover task (e.g., Arciuli & Simpson, 2012), so that no information regarding the actual learning of the statistical properties is collected during the familiarization phase itself.
In a second step, a test phase begins. Participants’ sensitivity to the statistical properties of the stream is assessed, typically via a 2-AFC recognition test. In each trial, a configuration of stimuli that appeared together in the familiarization phase (i.e., a pattern with high TPs between elements) is paired with a ‘foil’ – a configuration of stimuli that either did not appear together at all during familiarization (i.e., TPs=0), or that co-occurred less frequently than the target (i.e., a foil of relatively low TPs). Participants are required to decide which pattern of stimuli they are more familiar with, and a score based on the number of correct identifications of targets over foils is taken to reflect their SL ability.
In the following we label this common measure of SL an offline measure. We define offline measures as proxies of learning performance which do not tap participants’ accumulated knowledge throughout the presumed learning process itself (i.e., the familiarization phase, in which participants actually pick up the statistical properties of the stream), but monitor it at a later stage, once the learning process itself is already over (see also, e.g., Batterink & Paller, 2017; Franco, Gaillard, Cleeremans, & Destrebecqz, 2015). Note that the 2-AFC procedure described above constitutes but one example of possible offline measures. Other offline measures focus on familiarity ratings (e.g., Jonaitis & Saffran, 2009), or on speed of identification of targets vs. foils (e.g., Barakat, Seitz, & Shams, 2013; Bertels, Franco, & Destrebecqz, 2012), but they all assess performance once learning is over.
The reliance on offline measures, and specifically on the common 2-AFC tasks, reflects a common goal of most SL research: to demonstrate that humans can detect and extract statistical regularities embedded in a range of sensory inputs, whether in the auditory (Endress & Mehler, 2009), or visual (Kirkham, Slemmer, & Johnson, 2002) modality, over verbal (Pelucchi, Hay, & Saffran, 2009) or nonverbal (Gebhart, Newport, & Aslin, 2009) material, across time or space (Fiser & Aslin, 2002), and when contingencies are either adjacent or non-adjacent (Gómez, 2002; Newport & Aslin, 2004). For that purpose, offline measures such as the number of 2-AFC correct responses are in fact apt. If a sampled group of participants scores significantly above the 50% chance-level on a series of 2-AFC trials, then the population from which the group has been sampled is taken to possess the ability to extract, at least to some extent, the relevant statistical properties embedded in the input. In other words, such offline measures are useful for assessing whether learning has occurred or not in a given sample under certain experimental conditions, and if learning has indeed occurred, offline measures can also quantify the overall extent of learning for the sample (i.e. how much better than chance performance was). Previous research has indeed successfully used offline measures to compare the extent of SL between different populations (e.g., dyslexics vs. controls, Gabay, Thiessen, & Holt, 2015, children in different age groups, Arciuli & Simpson, 2011, etc.), and between different learning conditions (e.g., incidental vs. intentional learning conditions, Arciuli, von Koss Torkildsen, Stevens, & Simpson, 2014, under different presentation parameters, Emberson, Conway, & Christiansen, 2011, etc.).
From a theoretical perspective, however, this form of operationalization is not optimal. First and foremost, its coverage of the full scope of “learning” as a theoretical construct is relatively thin. It only assesses the extent of behavioral changes at a single, arbitrary pre-defined time point following exposure to the input. SL, in contrast, is taken to be a process of continuously assimilating the regularities in the environment, where behavior changes incrementally over time. Second, offline measures inevitably extend to cover cognitive processes unrelated to SL. Because in the testing phase participants are required to explicitly recall and decide which patterns have occurred during familiarization and which have not, offline measures cannot disentangle SL abilities per se from encoding and memory capacities, and decision-making biases. To complicate things further, the 2-AFC testing procedure often involves methodological confounds related to the recurrent repetitions of targets and foils during the test phase (see Siegelman, Bogaerts, Christiansen, & Frost, 2017, for an extended discussion). Note that these problems are particularly relevant to the recent interest in individual differences in SL as predictors of linguistic functions (e.g., Arciuli & Simpson, 2012; Conway, Bauernschmidt, Huang, & Pisoni, 2010; Frost, Siegelman, Narkiss, & Afek, 2013), and as a window on SL mechanisms (Frost et al., 2015; Siegelman & Frost, 2015). Since learning is a continuous process, a comprehensive characterization of it, for individuals as well as for specific populations, involves the manner by which it dynamically unfolds. Offline measures are by definition blind to this.
As a simple demonstration, Fig. 1 shows how a similar offline learning score can result from very different learning trajectories, which diverge in the shape of the function (linear, logarithmic, or a step-function), as well as in the speed of learning. From a theoretical perspective, knowing what statistical information is picked up at a given point in time and at what rate is an important step towards a mechanistic understanding of SL. In a nutshell, we view the learning dynamics as an integral part of the definition of SL as a theoretical construct. Thus, if similar offline performance following familiarization is consistently achieved through different learning trajectories, then this must tell us something important about the mechanisms of learning statistical regularities (see also Adini, Bonneh, Komm, Deutsch, & Israeli, 2015, for discussion in the context of procedural learning). In the same vein, if two populations with similar success rates in an offline task have different learning trajectories building up to this overall performance, then these two populations should not be considered as having identical SL abilities. Importantly, this holds not only for group-level research, but also for the study of individual differences. Individuals may differ from one another not only in their overall learning magnitude, but also in their speed of learning (fast vs. slow learners), and these two operational measures may have distinct predictive power (Siegelman et al., 2017).
Here we suggest moving from the offline measurement of the overall extent of learning to the online tracking of the learning process. The distinction between these measures reflects a different theoretical perspective: whereas current offline measures provide information regarding the representations of repeated patterns established after extensive familiarization, online measures speak directly to how these representations are formed during learning, offering insight regarding the mechanisms of prediction which operate during the learning process. In general, SL research has provided few answers to questions such as how the continuous, gradual learning of regularities results in the formation, updating, and integration of representations in memory, and how such updating leads to predictions. Online measures promise to speak to these important questions directly.
Importantly, offering a novel operationalization of learning involves not just theoretical considerations but also methodological ones. If the dynamics of learning are argued to be an essential part of our learning theory, one first has to show that their operational measures are reasonably reliable and adequately valid; if not, they cannot serve as a proxy of SL. The present paper does exactly that. In Experiment 1, we consider an online measure that tracks the dynamics of learning regularities in the visual modality. We then explicitly test its reliability and validity. These findings serve as a springboard for putting to the test our main theoretical claim, that such online measures reveal invaluable information about the mechanisms of learning visual regularities to which the typical offline measures are blind. In Experiment 2 we focused on the extent of predictability in the stream and how different TPs impact learning. In Experiment 3 we targeted learning in more complex situations, where two streams of regularities are consecutively presented within a single experiment. Together, our findings reveal novel insights on how regularities in a visual input are perceived and learned.
Experiment 1
As noted above, we define online measures of performance as measures that assess performance throughout the learning process. They typically tap participants’ responses to a large number of stimuli throughout familiarization. The behavioral measure which is the focus of the present investigation considers the difference in RTs between stimuli given their predictability. According to SL theory, predictable elements should result in faster responses compared with unpredictable stimuli. This effect has been well documented in related paradigms in the field of implicit learning (such as the Serial Reaction Time task, SRT, e.g., Cleeremans & McClelland, 1991; Schvaneveldt & Gomez, 1998, or Contextual Cueing, e.g., Chun & Jiang, 1998).
Some recent studies have applied this simple experimental strategy to the domain of SL. For example, Misyak and colleagues employed an Artificial Grammar Learning (AGL) task in which participants heard sequences composed of nonwords, and were simultaneously asked to click on corresponding written nonwords presented on the screen. RTs recorded for these mouse clicks showed that nonwords in predictable locations within sequences were recognized faster than nonwords in non-predictable locations (Misyak, Christiansen, & Tomblin, 2010b). In the same vein, Gomez and colleagues (2011) used a click-detection task, in which clicks were superimposed on a speech stream comprising tri-syllabic words. As learning proceeded, clicks at word boundaries were detected faster than clicks within words, and importantly, the RT difference between the two conditions increased throughout the familiarization phase (Gómez, Bion, & Mehler, 2011). Another recent example of an online measure is a self-paced Artificial Grammar Learning task (Karuza, Farmer, Fine, Smith, & Jaeger, 2014). Much like in the classic self-paced reading paradigm (Just, Carpenter, & Woolley, 1982), participants were asked to advance the elements in the sequences during familiarization at their own pace, by pressing the spacebar each time to advance to the next element in the stream. As predicted, presses for predictable stimuli were faster than those for unpredictable stimuli, with an increase in this RT difference over the course of familiarization (see also Amato & MacDonald, 2010, for a related self-paced reading paradigm in an Artificial Language Learning study). Another online measure of SL was offered by Dale and his colleagues in a paradigm similar to an SRT task, which continuously registered the mouse coordinates, measuring the extent to which participants anticipate the next stimulus in the sequence. Again, when stimuli in the stream were more predictable, participants tended to move the mouse in the direction of the stimulus already before it appeared, and this anticipatory behavior increased over the course of familiarization (Dale, Duran, & Morehead, 2012).
These findings raise a set of important methodological and theoretical questions. First, as we outlined above, an operational variable that is offered as a proxy for a theoretical construct should be shown to be 1) reliable – i.e., providing a stable and consistent measurement, and 2) valid – i.e., corresponding to the actual theoretical construct it presumably taps. Applying these criteria to the study of SL, a first critical question is whether the gain in RTs for predictable stimuli in the familiarization phase is a stable and reliable signature of each individual. The question of validity is somewhat more complex. Theoretically, the online gain in RTs for predictable (vs. unpredictable) stimuli as learning proceeds seems evident. However, whether this speeding of response indeed reflects stable learning is an open question. Interestingly, there is little empirical evidence that the reported speeding to predictable stimuli indeed correlates with SL performance measured subsequent to familiarization. In fact, some recent studies have shown that the obtained RT differences do not correlate with the standard offline measures (Franco, Gaillard, Cleeremans, & Destrebecqz, 2015; Misyak et al., 2010b; but see Dale et al., 2012; Karuza et al., 2014). These reports lead to a problematic state of affairs in which the current online measures of SL remain unvalidated, requiring additional scrutiny. Possibly, this lack of correlation is theoretically interesting, showing that online and offline measures perhaps tap different sub-components of SL (see Misyak, Christiansen, & Tomblin, 2010a). Alternatively, it could be due to some peripheral methodological factors. First, gains in RTs are not independent of the overall speed of response: fast responders would then show smaller gains regardless of their SL abilities. Second, it is possible that the mere presence of a secondary task employed during familiarization hinders learning because it taxes attentional resources (see Franco et al., 2014, for such direct evidence in the click-detection SL task). This again poses a serious challenge for assessing the theoretical contribution of online measures. Impaired performance may hurt both the task’s reliability (Siegelman, Bogaerts, & Frost, 2016) and its validity (the online task perhaps measures SL, but may confound it with the ability to successfully divide attention between the primary and secondary tasks, Franco et al., 2014).
The goal of Experiment 1 was to address these challenges. First, we aimed to offer an online measure that tracks the dynamics of SL and provides information about the trajectory of learning in terms of time-course. Second, we endeavored to examine whether such a measure withstands the psychometric requirement of test-retest reliability, so that it can be taken as a stable signature of the individual. Third, we sought to provide evidence for its validity in assessing SL ability.
We chose to focus on visual SL, where participants are expected to learn the transitional probabilities of visual shapes. Following recent work by Karuza and her colleagues (Karuza et al., 2014), instead of asking participants to passively watch the stream of shapes, we asked them to actively advance the shapes at their own pace. In Experiment 1a we show that this simple procedure results in an online SL measure where RTs in advancing predictable shapes become faster than RTs in advancing non-predictable ones as learning proceeds. More importantly, in Experiment 1b, we show that this RT gain is a reliable signature of an individual. Experiments 1a and 1b also provide critical information regarding the validity of the measure (its correlation with the well-established offline learning score), and novel insight regarding the time course of learning at the group level.
Experiment 1a
Experiments 1a and 1b employed the typical design of visual SL experiments, where shapes are presented sequentially, and follow each other given a pre-determined set of transitional probabilities (e.g., Kirkham et al., 2002; Turk-Browne, Junge, & Scholl, 2005; Siegelman & Frost, 2015). This experimental paradigm has been used and validated extensively, and our main modification was to make the presentation of shapes participant-determined, rather than set at a fixed rate. At the group level, this provided us with reliable information about when learning occurs during the experimental session. At the individual level, it provided for each participant a new measure of learning that reflects his/her sensitivity to the statistical regularities embedded in the input stream.
Method
Participants
Seventy students of the Hebrew University (17 males) participated in the study for payment or for course credit. Participants had a mean age of 22.96 (range: 18–32), and had no reported history of reading disabilities, ADD or ADHD.
Design, Materials, and Procedure
Similar to a typical SL paradigm, our task consisted of a familiarization phase, followed by a test phase. The latent structure of the visual input stream presented during familiarization was also similar to that of multiple previously employed SL tasks (e.g., Frost, Siegelman, Narkiss, & Afek, 2013; Glicksohn & Cohen, 2013; Turk-Browne, Junge, & Scholl, 2005): the task included 24 complex visual shapes (see Appendix A), which were randomly organized for each participant to create eight triplets, with a TP of 1 between shapes within triplets. The familiarization stream consisted of 24 blocks, with all eight triplets appearing once (in a random order) in each block. Before familiarization, participants were told that they would be shown a sequence of shapes, appearing on the screen one after the other. Participants were instructed that some of the shapes tend to follow each other and that their task is to try to notice these co-occurrences2. Importantly, in contrast to standard SL tasks, participants did not have to watch the stimuli appearing at a fixed presentation rate but were asked to advance the stream of shapes at their own pace, by pressing the space bar each time they wanted to advance to the next shape. There was no Inter Stimulus Interval (ISI) between shapes during familiarization. RTs for each press were recorded and served as the basis for the online measure of learning (see below).
Following familiarization, participants took a 2-AFC offline test, consisting of 32 trials. In each trial, participants were sequentially presented with two three-item sequences of shapes: (1) a target – three shapes that formed a triplet during the familiarization phase (TP=1), and (2) a "foil" – three shapes that never appeared together in the familiarization phase (TP=0). Foils were constructed without violating the position of the shapes within the original triplets (e.g., for the three triplets ABC, DEF and GHI, a possible foil could be AEI, but not BID). During test, shapes appeared at a fixed presentation rate of 800ms, with an ISI of 200ms between shapes within triplets, and a blank of 1000ms between triplets. Each of the eight familiarization triplets (i.e., targets) appeared four times throughout the test, with four different foils (each foil also appearing four times throughout the test, with different triplets). Before the test phase, participants were instructed that in each trial they would see two groups of shapes and that their task would be to choose the group that they are more familiar with as a whole. The offline test score ranged from 0 to 32, according to the number of correct identifications of targets over foils. Given the 2-AFC format, chance performance corresponds to a score of 16/32.
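To make the design concrete, the following is a minimal sketch (illustrative, not the original experiment code) of how the per-participant triplets, the familiarization stream, and position-preserving foils could be generated; the shape labels and function names are hypothetical.

```python
import random

# Illustrative sketch of the Experiment 1 design; shape labels are placeholders.
SHAPES = [f"shape_{i:02d}" for i in range(24)]

def make_triplets(shapes, n_triplets=8, rng=random):
    # Random per-participant grouping into triplets with within-triplet TP = 1.
    pool = list(shapes)
    rng.shuffle(pool)
    return [tuple(pool[i * 3:(i + 1) * 3]) for i in range(n_triplets)]

def make_stream(triplets, n_blocks=24, rng=random):
    stream = []
    for _ in range(n_blocks):
        block = list(triplets)
        rng.shuffle(block)          # each block: all eight triplets once, in random order
        for triplet in block:
            stream.extend(triplet)
    return stream

def make_foil(triplets, rng=random):
    # Foils preserve within-triplet positions: for triplets ABC, DEF, GHI,
    # AEI is a legal foil (positions 1, 2, 3 kept), whereas BID is not.
    t1, t2, t3 = rng.sample(triplets, 3)
    return (t1[0], t2[1], t3[2])
```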
Results and Discussion
For each participant, RTs outside the range of 2 SD from the participant’s mean were trimmed to the cutoff value to minimize the effect of outliers (3.6% of all trials). Note also that, to account for variance in baseline RTs, all analyses were conducted on log-transformed RTs (rather than raw RTs). The use of a log-scale allows us to compare SL performance across individuals with different RT baselines. This is important because in the self-paced paradigm participants determine their own speed of advancing the stream of shapes. Consider, for example, two individuals (S1 and S2) with a mean difference of 100ms between predictable and unpredictable elements, but with a different baseline RT: S1 predictable = 900ms, unpredictable = 1000ms; S2 predictable = 300ms, unpredictable = 400ms. Without log-transformation, these two individuals will have a similar difference score (of 100ms), which is problematic as the relative speed-up of S2 to predictable stimuli is much larger than that of S1. In contrast, after log-transformation, the difference between predictable and unpredictable stimuli is indeed larger for S2 compared to S1: log difference of S1 = 0.11, S2 = 0.29.
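As a quick numerical check of this worked example (a minimal sketch; we assume natural logarithms here, as in the analyses reported below):

```python
import math

# S1: 1000 ms (unpredictable) vs. 900 ms (predictable); S2: 400 ms vs. 300 ms.
print(round(math.log(1000) - math.log(900), 2))  # 0.11
print(round(math.log(400) - math.log(300), 2))   # 0.29
```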
Table 1 presents the mean RTs and standard deviations of key presses for shapes in the first, second, and third positions within triplets. A one-way repeated measures ANOVA confirmed the effect of position on log-transformed RTs (F(2, 138) = 18.79, p < 0.001). Subsequent paired t-tests revealed a difference between shapes in the first versus second position within triplets (t(69) = 4.32, p < 0.001) and between shapes in the first versus third position (t(69) = 4.84, p < 0.001), but provided no evidence for a difference between shapes in the second and third positions (t(69) = 1.53, p = 0.13). Fig. 2 presents the response latencies for shapes in the first, second and third positions over familiarization blocks, and shows the divergence between shapes in the first position and those appearing in the second and third positions.
Table 1. Mean raw and log-transformed RTs (and SDs) for shapes in the first, second, and third positions within triplets (Experiment 1a).

| | 1st position | 2nd position | 3rd position |
|---|---|---|---|
| Raw RT in ms (SD) | 834.5 (377) | 798.8 (340) | 790.6 (339) |
| Log-transformed RT (SD) | 6.43 (0.44) | 6.39 (0.42) | 6.38 (0.42) |
In light of these results, we next calculated the online measure of SL performance. This measure, formulated in (1) below, quantifies learning as the difference between the log-transformed RTs for shapes in the unpredictable position (the first position within triplets) and the mean log-transformed RTs for predictable shapes (the second and third positions within triplets). A score of zero in this online measure reflects no learning of the statistical properties of the input (i.e., no difference between predictable and unpredictable stimuli), whereas positive values reflect learning (i.e., faster responses to predictable compared to unpredictable stimuli).
(1) $\mathrm{SL}_{online} = \overline{\log(RT_{1st})} - \dfrac{\overline{\log(RT_{2nd})} + \overline{\log(RT_{3rd})}}{2}$
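A minimal sketch of how formula (1) could be computed per block is given below (hypothetical column names; `trials` is assumed to hold one row per key press with its block number, within-triplet position, and trimmed RT in ms):

```python
import numpy as np
import pandas as pd

def online_measure_per_block(trials: pd.DataFrame) -> pd.Series:
    # trials: columns 'block' (1-24), 'position' (1, 2, or 3), 'rt' (ms, already trimmed)
    trials = trials.assign(log_rt=np.log(trials["rt"]))
    mean_log = trials.groupby(["block", "position"])["log_rt"].mean().unstack("position")
    # Formula (1): unpredictable (1st position) minus the mean of the predictable positions.
    return mean_log[1] - (mean_log[2] + mean_log[3]) / 2
```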
Fig. 3 shows the time-course of SL during familiarization, as reflected by the change in the online measure across the 24 blocks of the familiarization stream. Overall, the trajectory of the online measure is well fitted by a logarithmic function – a relatively fast increase in SL until block 7 (i.e., after 7 repetitions), after which learning does not increase further, showing only random fluctuations around a fixed value. Indeed, a logarithmic curve fitted the data better than a linear function (R² = 0.29 vs. R² = 0.23)3. Relatedly, one-sample t-tests revealed that participants learned the underlying statistical structure of the input relatively early in familiarization: the mean RT difference was significantly larger than zero in all blocks from block 7 until the end of familiarization, showing robust and stable learning of the patterns already after 7 repetitions. In addition, there were some earlier signs of learning, reflected by a significant mean RT difference already in blocks 3 and 4 (all one-tailed p's < 0.05)4.
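The curve-fit comparison can be sketched as a regression of the per-block group means on block number versus log(block number); this is our assumption about how such R² values are typically obtained, not a description of the exact fitting routine used here.

```python
import numpy as np

def r_squared(x, y):
    # R^2 of a simple linear regression of y on x.
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (slope * x + intercept)
    return 1 - np.sum(residuals ** 2) / np.sum((y - np.mean(y)) ** 2)

def compare_trajectory_fits(group_means):
    # group_means: mean online measure per block, averaged over participants.
    y = np.asarray(group_means, dtype=float)
    blocks = np.arange(1, len(y) + 1)
    return {"linear": r_squared(blocks, y),
            "logarithmic": r_squared(np.log(blocks), y)}
```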
Validation
In order to validate the novel online measure of SL, we examined its correlation with the standard 2-AFC offline test score (mean performance of 22.57/32 trials correct, 70.5%, significantly above chance: t(69) = 8.59, one-tailed p < 0.001). For each individual, we calculated the overall extent of SL based on the online measure, by averaging the difference in log-transformed RTs between predictable and unpredictable shapes (formula (1) above) across blocks 7 to 24. We chose to focus on these blocks because these were the blocks in which stable, significant learning was observed for the group as a whole, and because they provided a large enough number of blocks to reduce measurement error. A strong correlation of r = 0.56 (p < 0.001, 95% CI: [0.37, 0.7]) was found between each individual's gain in RTs for predictable shapes and his/her offline test performance (see Fig. 4)5. This result suggests that the online measure we propose indeed taps SL ability, thereby validating it: participants who score higher in the offline test are, on average, faster for predictable than for unpredictable stimuli.
Taken together, the results of Experiment 1a reveal the promise of an online measure in investigating visual SL. By merely asking participants to advance the shapes at their own pace rather than watching the visual input stream passively, we obtained novel information regarding the dynamics of learning. We found that learning followed a logarithmic trajectory, and that significant learning of structure was present already after a small number of exposures to the repeated patterns. At least within our experimental parameters (eight triplets, TPs of 1.0) and dependent measure (log-transformed RT gain), the data suggest that seven or eight repetitions of the triplets are sufficient to reach significant learning. Experiment 1a also showed that for a given individual, the gain in RT for predictable vs. unpredictable shapes is highly correlated with his/her standard (2-AFC) offline measure of performance. This demonstrates that the online measure is indeed a valid proxy of SL. What remains to be shown, however, is that the gain in RTs for predictable stimuli withstands the psychometric requirement of reliability, providing a signature of individual SL performance that is stable over time. Experiment 1b was designed, therefore, to assess the test-retest reliability of this online measure.
Experiment 1b
In Experiment 1b we recalled our original sample, and retested participants on the same task, using different triplets. Again, we measured each individual's gain in response time to predictable vs. unpredictable shapes, aiming to correlate the RT gains across the two experimental sessions.
Method
All subjects of Experiment 1a were contacted after their participation and were invited to return to the lab for a follow-up study in return for course credit or payment. Forty-seven participants (11 males; mean age 23.1, range: 18–32) replied positively, and were re-tested on the self-paced visual SL task. The task was identical to the one described in Experiment 1a. Note that while the stimuli used in Experiment 1b were the same as those in Experiment 1a, the triplets during familiarization were re-randomized for each participant so that the repeated patterns were not the same in the initial test and retest. The mean interval between the initial testing session (Experiment 1a) and retest (Experiment 1b) was 90.8 days (SD = 54 days).
Results
As in Experiment 1a, RTs outside the range of 2 SD from each participant’s mean were trimmed to the cutoff value to minimize the effect of outliers (3.7% of all trials). Table 2 presents the means and standard deviations of RTs and log-transformed RTs for shapes in the first, second, and third positions. As in Experiment 1a, there was a significant effect of position (1st, 2nd, or 3rd) on response latencies (F(2, 92) = 7.46, p = 0.001), stemming from a difference between the first and second positions (t(46) = 2.46, p = 0.009), and between the first and third positions (t(46) = 2.95, p = 0.005). The online measure of SL was again calculated according to the formula in (1) above. Fig. 5 presents the learning dynamics across blocks, replicating the logarithmic function of Experiment 1a. In order to examine the correlation between the offline and online measures of performance in the retest data, the online measure score was again computed for each individual. As in Experiment 1a, this was done by averaging the difference in log-transformed RTs across blocks 7 to 24.
Table 2. Mean raw and log-transformed RTs (and SDs) for shapes in the first, second, and third positions within triplets (Experiment 1b).

| | 1st position | 2nd position | 3rd position |
|---|---|---|---|
| Raw RT in ms (SD) | 793.5 (368) | 754.2 (346) | 747.6 (339) |
| Log-transformed RT (SD) | 6.40 (0.49) | 6.36 (0.46) | 6.35 (0.47) |
In line with the results of Experiment 1a, a significant correlation between the online measure and success in the offline test was again observed (r = 0.4, p < 0.01, 95% CI: [0.13, 0.62]).
However, we were mainly concerned with the test-retest reliability of the gain in RTs for predictable stimuli. Fig. 6A shows the test-retest scatter plot, indicating a test-retest reliability of r = 0.64 (95% CI: [0.43, 0.78]). This result suggests that the extent of gain in RTs for predictable shapes is indeed a reliable signature of the individual. Offline test scores were also stable over time, with a test-retest reliability of r = 0.63 (95% CI: [0.42, 0.78]), roughly similar to a previous reliability estimation of the same task (Siegelman & Frost, 2015). Fig. 6B shows the test-retest reliability of a composite score combining the online and offline measures of SL. For both test and retest, this composite measure was calculated by averaging the Z-scores of the offline and online measures. The composite score had an even higher test-retest reliability of r = 0.77 (95% CI: [0.62, 0.86]). We return to this important point in the discussion below.
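A minimal sketch of the composite score computation (illustrative function names; scores are standardized within each session and then averaged):

```python
import numpy as np

def zscore(values):
    values = np.asarray(values, dtype=float)
    return (values - values.mean()) / values.std(ddof=1)

def composite_score(online_scores, offline_scores):
    # Average of the two standardized measures, computed separately per session.
    return (zscore(online_scores) + zscore(offline_scores)) / 2

def test_retest_reliability(scores_test, scores_retest):
    # Pearson correlation between the composite scores of the two sessions.
    return np.corrcoef(scores_test, scores_retest)[0, 1]
```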
Discussion
Taken together, the results of Experiments 1a and 1b point to a clear conclusion: the online measure of learning in the self-paced visual SL paradigm provides a promising way of assessing SL performance. In both experiments, a clear signature of learning was observed, as reflected by faster RTs for shapes in predictable relative to unpredictable positions within triplets. Moreover, this gain in RTs was found to be a valid proxy of SL performance, as reflected by its correlation with the standard offline SL test. Importantly, our data also suggest that it is a stable characteristic of the individual. To our knowledge, this is the first study to directly examine the psychometric properties of such an online measure, showing that it can indeed provide a reliable and valid assessment of SL performance.
Some readers may challenge our conclusion regarding the validity of the online measure on the grounds that the online-offline correlations we observed were only moderate (r = 0.56 in Experiment 1a, r = 0.4 in Experiment 1b), potentially reflecting mediocre validity. A point that needs emphasizing, however, is that these correlations most likely provide an underestimation of the true online-offline correlation due to the imperfect reliability of the two measures (estimated at around 0.6 for both). More formally, the correlation between two variables is bounded from above by the square root of the product of their reliabilities, $r_{xy} \leq \sqrt{r_{xx} \cdot r_{yy}}$. When taking into account the measures’ reliability (using Spearman’s correction for attenuation, $\hat{r}_{xy} = r_{xy} / \sqrt{r_{xx} \cdot r_{yy}}$), even the lower offline-online observed correlation of 0.4 (from Experiment 1b) actually points to a higher expected correlation of 0.67. Hence, the observed offline-online correlation coefficients provide strong evidence for the validity of the online measure.
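For concreteness, the disattenuation step can be checked numerically (a minimal sketch; the reliabilities of roughly 0.6 are the approximate values cited above):

```python
import math

r_observed = 0.40                     # offline-online correlation, Experiment 1b
rel_online, rel_offline = 0.6, 0.6    # approximate test-retest reliabilities
upper_bound = math.sqrt(rel_online * rel_offline)   # maximal observable correlation, ~0.6
r_corrected = r_observed / upper_bound              # ~0.67 after Spearman's correction
```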
In terms of the stability of the measurement across time, it is important to note that the highest test-retest reliability coefficient was found for the composite measure, which averages both offline and online standardized scores. From a psychometric perspective, this result is not surprising – the composite score accumulates all available information regarding each individual’s SL performance throughout the whole testing session, thus minimizing measurement error. As such, the composite measure provides a simple and promising way of achieving maximal reliability for the assessment of SL individual abilities. It may prove particularly useful for future studies examining the predictive power of SL, where high reliability is a prerequisite for observing a correlation between SL and some outcome measure.
The validation of the online measure presents what might seem a challenge of circularity. On the one hand, we aim to show that it taps into SL performance, by examining its correlation with a standard offline test. On the other hand, we aim to offer it as an alternative operational proxy of SL ability, and to highlight the unique information it provides regarding SL processes. Note, however, that to validate the online measure of SL, we examined its individual-level correlation with the offline measure, averaging across many blocks in familiarization. This learning score was found to correlate with the offline test performance, presumably because both scores tap the overall extent of learning. Once this validation procedure has been successful, the online measure can be used as a unique method to track learning across the experimental session, providing new information regarding the dynamics of learning – that is, the changes in the extent of learning across time. This is done by averaging online performance across subjects, in each block of familiarization.
Indeed, tracking the dynamics of the gain in RTs for predictable stimuli in Experiments 1a and 1b already provided us with some novel knowledge regarding how the learning of regularities in the visual modality proceeds. First, we found that the group-level learning trajectory in the task is well described by a logarithmic function, with a relatively steep curve at the onset of learning6. In addition, with the parameters employed in our design, significant learning was reached already after a relatively small number of repetitions. As most studies using identical parameters have employed familiarization phases with a much larger number of repetitions (typically 20–30, sometimes as many as 300 repetitions, e.g., Saffran et al., 1997), our findings suggest that many of these repetitions were perhaps redundant. Most importantly, this temporal information cannot be revealed by standard offline measures, exemplifying the improved sensitivity of online measures in comparison to offline tests. In Experiment 2, we further investigated the sensitivity of online measures to subtle manipulations of the extent of event predictability, and what they can reveal about learning dynamics.
Experiment 2
In Experiment 2, we harnessed our online measure to examine the trajectory of learning when patterns differ in the extent of their predictability. Using the typical 2-AFC offline test, we have recently shown that extent of predictability, operationalized as within-pattern TPs, has a positive incremental impact on SL, with higher levels of predictability resulting in better SL performance (Bogaerts et al., 2016). The use of such an offline measure, however, is inherently limited to revealing only the impact of predictability on the overall extent of SL, once exposure is completed. Our aims in Experiment 2 were, therefore, threefold. First, to test whether subtle manipulations of TPs impact the extent of the gain in RTs to predictable vs. non-predictable stimuli. This speaks to the question of whether the online measure reveals sensitivity to quasi-regularities as the offline measure does. Second, previous data regarding the impact of TPs on SL performance (Bogaerts et al., 2016) did not tell us anything about the dynamics of learning when events in the stream involve a range of quasi-regularities. Here we examined whether different levels of predictability result in similar or rather in different learning trajectories. Finally, by comparing the information regarding SL performance collected through online measures to that collected in a 2-AFC test, we could ascertain whether these two different measures of learning provide similar or non-overlapping information.
Method
Participants
Seventy-two students (26 males) participated in the study for payment or for course credit. Participants’ age ranged from 18 to 39 (M = 23.7) and all subjects had no reported history of reading disabilities, ADD or ADHD.
Design, Materials, and Procedure
The procedure was similar to that of Experiment 1, with a self-paced familiarization phase followed by an offline 2-AFC test. The task included the same 24 visual shapes from Experiment 1 (see Appendix A). These were, however, randomly organized into 12 pairs (rather than triplets) for each participant. The familiarization stream consisted of 30 blocks, with all 12 pairs appearing once (in a random order) in each block. Importantly, the 12 pairs were divided into three TP conditions: Four pairs with a TP=1, four with TP=0.8, and four with TP=0.6. The manipulation of TPs was done by including random noise in the TP=0.6 and TP=0.8 conditions: for example, for each pair AB during familiarization in the TP=0.8 condition, shape B appeared after shape A 80% of the time, while 20% of the time shape B was randomly replaced by another shape X, avoiding immediate repetitions of identical shapes (see also Bogaerts et al., 2016).
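A minimal sketch of the TP manipulation (illustrative, not the original stream-generation code): for each occurrence of a pair, the predictable shape is kept with probability equal to the pair's TP and is otherwise replaced by a randomly chosen other shape, avoiding immediate repetitions.

```python
import random

def realize_pair(pair, tp, all_shapes, prev_shape, rng=random):
    first, second = pair
    if rng.random() >= tp:
        # Noise trial: replace the predictable shape with some other shape,
        # avoiding an immediate repetition of an identical shape.
        candidates = [s for s in all_shapes if s not in (first, second, prev_shape)]
        second = rng.choice(candidates)
    return [first, second]

def build_stream(pairs_by_tp, all_shapes, n_blocks=30, rng=random):
    # pairs_by_tp: e.g., {1.0: [four pairs], 0.8: [four pairs], 0.6: [four pairs]}
    stream = []
    for _ in range(n_blocks):
        block = [(pair, tp) for tp, pairs in pairs_by_tp.items() for pair in pairs]
        rng.shuffle(block)                      # all 12 pairs once per block, random order
        for pair, tp in block:
            prev = stream[-1] if stream else None
            stream.extend(realize_pair(pair, tp, all_shapes, prev, rng))
    return stream
```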
Following familiarization, participants took a 2-AFC test, consisting of 36 trials. In each trial, they were sequentially presented with two types of two-item sequences of shapes: (1) a target – two shapes that formed a pair during the familiarization phase (TP of 0.6, 0.8, or 1, according to TP condition), and (2) a "foil" – two shapes that never appeared together in the familiarization phase (TP=0; as in Experiment 1, without violating the position of the shapes in the original pairs). During test, shapes appeared at a fixed presentation rate of 800ms, with an ISI of 200ms between shapes within pairs, and a blank of 1000ms between pairs. Each of the 12 familiarization pairs (i.e., targets) appeared three times throughout the test, with three different foils (each foil also appearing three times throughout the test, with different target pairs). Scores in the SL task ranged from 0 to 36, calculated as the number of correct identifications of target pairs during the test phase. Out of the overall 36 trials, there were 12 trials in each target TP condition – 12 trials with a target of TP=1, 12 with a target of TP=0.8, and 12 with a target of TP=0.6.
Results
As in the previous experiments, RTs outside the range of 2 SD from the participant’s mean were trimmed to the cutoff value to minimize the effect of outliers (3.7% of all trials). Note that for the TP=0.6 and TP=0.8 conditions, all analyses reported below include only the occurrences of pairs during familiarization in which there were no exceptions to the repeated pairs (i.e., trials in which the two shapes forming the target pair appeared together).
Table 3 presents the means and standard deviations of RTs and log-transformed RTs for shapes in the first and second positions within pairs, in each of the three TP conditions. As before, all statistical analyses were performed on log-transformed RTs. A two-way repeated measures ANOVA with TP condition (0.6, 0.8, or 1) and position (1st vs. 2nd) as factors revealed a marginally significant effect of position (F(1,71) = 3.01, p = 0.08), with no effect of TP and no TP by position interaction (p's > 0.1). Subsequent paired t-tests revealed an overall significant position effect for pairs with TP=1 (t(71) = 2.03, one-tailed p = 0.02), as well as TP=0.8 (t(71) = 1.77, one-tailed p = 0.04), but not for pairs in the TP=0.6 condition (t(71) = −0.29, p > 0.1).
Table 3. Mean raw and log-transformed RTs (and SDs) for shapes in the first and second positions within pairs, by TP condition (Experiment 2).

| TP condition | | 1st position | 2nd position |
|---|---|---|---|
| TP = 1 | Raw RT in ms (SD) | 817.8 (435) | 805.2 (413) |
| | Log-transformed RT (SD) | 6.390 (0.48) | 6.372 (0.47) |
| TP = 0.8 | Raw RT in ms (SD) | 829.9 (453) | 813.6 (419) |
| | Log-transformed RT (SD) | 6.394 (0.49) | 6.379 (0.47) |
| TP = 0.6 | Raw RT in ms (SD) | 826.6 (449) | 824.8 (416) |
| | Log-transformed RT (SD) | 6.391 (0.49) | 6.392 (0.47) |
We next examined how the difference in log-transformed RTs between shapes in the 1st position (i.e., unpredictable stimuli) and those in the 2nd position (i.e., predictable stimuli) evolves over time. Fig. 7 presents the time course of learning for each of the TP conditions, as reflected by the change in the online measure across the 30 blocks of the familiarization phase. The upper panel of the figure (7A) presents all TP conditions super-imposed, and the three lower panels present the three TP conditions separately.
It is clear that Fig. 7 presents a much noisier picture of the learning dynamics than the graphs of Experiments 1a and 1b (Figures 3 and 5 above), as reflected by larger standard errors as well as larger fluctuations in the online measure throughout familiarization. This is not surprising considering that each data point in Fig. 7 includes a much smaller number of trials than in Experiments 1a and 1b: patterns in the present experiment were pairs and not triplets (there was therefore only one predictable shape per pattern, instead of two), and there were only four pairs in each TP condition per block (compared to eight patterns in Experiments 1a and 1b). In order to reduce measurement error, we averaged all observations from every five adjacent blocks into a single epoch, enabling a clearer picture of the learning dynamics in each TP condition. This smoothed learning trajectory for each of the three TP conditions is presented in Fig. 8.
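The epoch smoothing amounts to a simple block-binning step; a minimal sketch (assuming a vector of per-block online scores):

```python
import numpy as np

def blocks_to_epochs(per_block_scores, blocks_per_epoch=5):
    # Average every five adjacent blocks into one epoch (30 blocks -> 6 epochs).
    scores = np.asarray(per_block_scores, dtype=float)
    n_epochs = len(scores) // blocks_per_epoch
    return scores[:n_epochs * blocks_per_epoch].reshape(n_epochs, blocks_per_epoch).mean(axis=1)
```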
Overall, the learning trajectories of the TP = 0.8 and TP = 1 conditions display a nearly identical time-course. Both present a logarithmic trajectory, as reflected by a better fit for a logarithmic curve compared to a linear function (TP = 1: logarithmic R² = 0.91 vs. linear R² = 0.81; TP = 0.8: logarithmic R² = 0.59 vs. linear R² = 0.53), similar to the one found in Experiments 1a and 1b where TPs were 1 for all patterns. Moreover, in both conditions the online learning measure reached a value of around 0.02 in epochs 2–3, and stayed more or less constant until the end of familiarization. In contrast, the TP = 0.6 condition displays a very different learning trajectory. First, it does not show a logarithmic learning curve, as reflected by a worse fit for a logarithmic compared to a linear trajectory (logarithmic R² = 0.31 vs. linear R² = 0.5). Moreover, the TP=0.6 condition does not display any learning in epochs 1 to 5 (i.e., until the end of block 25), with marginally significant learning only at epoch 6 (t(71) = 1.41, one-tailed p = 0.08).
Offline test performance
For each participant, we calculated his/her overall score in the 2-AFC test (scores ranging from 0 to 36), as well as the score on trials of each of the TP conditions (scores ranging from 0 to 12). Mean overall test performance was 24.85/36 (69%), with the sample showing significant learning of the overall latent statistical structure (t(71) = 8.05, one-tailed p < .001). As in Experiments 1a and 1b, a strong correlation of r = 0.49 (95% CI: [0.29, 0.65]) was found between individuals’ offline test scores and their online measure of learning (averaged across epochs 2–6, i.e., blocks 6–30), again validating the online measure of SL. The correlations between online and offline performance within each of the TP conditions separately were also significant (TP = 0.6: r = 0.33, p = 0.004; TP = 0.8: r = 0.37, p = 0.001; TP = 1: r = 0.25, p = 0.03). Interestingly, the offline test performance displayed a very different pattern of results in terms of the effect of predictability level on SL. As shown in Fig. 9, the effect of TP condition on offline test performance was virtually linear – with an increase of 2.9% between TP=0.6 and TP=0.8, and an increase of 3.3% between TP=0.8 and TP=1. In addition, at each of the three TP levels, a significant learning effect was observed (all one-tailed p's < 0.001). We return to these apparent differences between the offline and online measures of SL in the discussion below.
Discussion
The results of Experiment 2 provide a replication of Experiment 1, showing that predictable stimuli are responded to faster than unpredictable stimuli, and that the gain in RTs correlates with the offline test scores. This again validates the self-paced VSL paradigm as a proxy for SL performance. More importantly, however, Experiment 2 also shows a clear dissociation between the information provided by offline and online measures of performance. Whereas the offline 2-AFC test revealed a linear effect of TPs, the self-paced procedure revealed a qualitative difference between learning higher TPs (0.8, 1) and learning lower TPs (0.6). These divergent results suggest that, while both online and offline measures are indeed sensitive to the extent of predictability, online measures provide additional information regarding the dynamics of the process, information that offline measures are blind to. Specifically, our online tracking suggests that when TPs in the visual stream are as low as 0.6, learning is exceedingly slow, occurring only after extensive repetitions.
Experiment 3
So far, we have focused on the learning of a single set of regularities, where patterns were repeatedly presented from the beginning until the end of familiarization. Experiment 3 further extends our investigation to more complex settings where multiple statistical structures have to be learned. In this line of research, participants are typically exposed for some time to a first set of patterns; then the patterns change into a different set without an explicit cue regarding the change (e.g., Gebhart, Aslin, & Newport, 2009; Karuza et al., 2016). From a theoretical perspective, this procedure targets SL mechanisms in more ecologically valid, real-life situations, where the environment offers multiple statistical structures that need to be perceived and assimilated (Karuza et al., 2016; see also Weiss, Gerfen, & Mitchel, 2009).
Although theoretically important, investigating the learning of more than one stream presents a real challenge to typical 2-AFC offline tests, because knowledge of both the first and the second set of patterns has to be assessed at the end of familiarization, after both sets of statistical regularities have been presented. The typical finding in such experimental settings, at least in the auditory modality, is a primacy effect. That is, targets from the first stream seem to be recognized better than targets from the second stream, for which performance is often around chance level (Gebhart, Aslin, et al., 2009). This primacy effect was recently interpreted to reflect a non-unified sampling procedure, according to which humans decrease their sampling of regularities from the environment over time due to neural efficiency considerations (see Karuza et al., 2016, for details). This conclusion, however, requires further scrutiny because, by definition, the 2-AFC test is always administered at the end of the full familiarization phase, after the presentation of the second set of regularities. Performance at this late phase could reflect memory constraints rather than SL mechanisms (see Siegelman et al., 2017, for discussion). Moreover, as exemplified above in Experiments 1 and 2, it is possible that while offline test performance on the second stream of regularities is lower at the pre-defined, arbitrary time-point at which it is administered, the trajectory of learning building up to this point holds additional information to which the offline tests are blind.
In Experiment 3 we thus examined consecutive learning of multiple structures using the online measure of performance. Participants were presented, within a single session, with two consecutive streams of shapes. In one condition the two streams employed different sets of shapes, whereas in another, more complex condition, the two streams employed the same set of shapes but with different rules of co-occurrence. We tracked the performance of participants in these two conditions with both online and offline measures. This allowed us to examine what these two measures can tell us about the learning of complex statistical structures in the visual modality.
Methods
Participants
Ninety-nine students (24 males; mean age: 23.8, range: 19–31) took part in the experiment for payment or course credit. They had no reported history of reading disabilities, ADD or ADHD. Participants were randomly assigned to one out of two conditions: 50 students in the non-overlapping condition (henceforth Condition 1), and 49 students in the overlapping-condition (Condition 2).
Design, Materials, and Procedure
The procedure was similar to that of Experiment 1. It included a self-paced familiarization phase, followed by an offline 2-AFC test. However, in contrast to Experiments 1 and 2, the familiarization phase was comprised of two sub-streams, presented one after the other. Importantly, the instructions given to the participants were the same as in the previous experiments. Participants were not informed about the existence of two different streams, nor was there a break or any other cue indicating the switch between the streams. The materials consisted of 36 unique shapes. To reach this number we used an additional 12 visual shapes of similar complexity to those in the set used in Experiments 1 and 2 (see Appendix B).
In Condition 1, for each individual participant, the 36 shapes were randomly assigned to create 12 triplets; six constituted the first stream and the remaining six the second stream. Triplets had a TP of 1. Condition 2 differed from Condition 1 in the way the second stream was constructed. Specifically, the second stream consisted of the same 18 shapes that comprised the first stream. The shapes were, however, rearranged into different triplet patterns. Triplets were created with the constraint that no two or more shapes forming a triplet in the first stream would be grouped together in a second-stream triplet. Both conditions comprised 12 blocks in each stream, with two breaks splitting the total familiarization phase into three segments of eight blocks.
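The constraint on the second stream in Condition 2 can be implemented by rejection sampling; a minimal sketch (illustrative only, not the original randomization code):

```python
import random

def regroup_triplets(first_stream_triplets, rng=random):
    # Re-group the same 18 shapes into new triplets such that no new triplet
    # contains two (or more) shapes from the same first-stream triplet.
    shapes = [s for t in first_stream_triplets for s in t]
    old_group = {s: i for i, t in enumerate(first_stream_triplets) for s in t}
    while True:
        rng.shuffle(shapes)
        new_triplets = [tuple(shapes[i:i + 3]) for i in range(0, len(shapes), 3)]
        if all(len({old_group[s] for s in t}) == 3 for t in new_triplets):
            return new_triplets
```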
The offline 2-AFC test included 36 trials, each presenting a target and a foil. Trials with a target from the first stream alternated with trials with a target from the second stream (participants were not informed of this structure). Note that since the shapes in the first and second sub-streams of Condition 2 overlapped, we made sure that the foils never included two or more shapes that had appeared together in a triplet in either of the two streams (see Appendix C for full details).
Results
Two participants (both from Condition 2) were excluded for abnormally slow response latencies across the experimental session (average RTs more than 3 SD above the condition mean). As in the previous experiments, RTs outside the range of 2 SD from a participant's mean were trimmed to the cutoff value to minimize the effect of outliers (3.7% of all trials).
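For clarity, the trimming procedure can be sketched as follows; this is a minimal Python illustration of per-participant winsorization as described above, not the actual analysis script, and the function name is hypothetical.

```python
import numpy as np

def trim_rts(rts, n_sd=2.0):
    """Trim one participant's RTs to the cutoff: values further than n_sd
    standard deviations from that participant's mean are set to the cutoff."""
    rts = np.asarray(rts, dtype=float)
    lo = rts.mean() - n_sd * rts.std()
    hi = rts.mean() + n_sd * rts.std()
    return np.clip(rts, lo, hi)
```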
Table 4 presents the means and standard deviations of RTs and log-transformed RTs for shapes in the first, second, and third within-triplet positions, for the two sub-streams in the two conditions. Interestingly, compared to the previous experiments, our sample of participants presented slower mean RTs, with larger variance in their rate of response. This is the essence of self-paced performance: participants determine their own comfortable rate of advancing the shapes. Thus, whereas some participants were comfortable at a pace of 3 Hz, quite a few slow participants opted for a pace of 0.5 Hz. Log-transforming the RTs deals with these different baselines between samples and individuals. Importantly, despite this difference in participants' baseline RTs, the results of Experiment 3 again show a clear effect of predictability: one-way repeated-measures ANOVAs revealed an effect of position (1st, 2nd, and 3rd) on log-transformed RTs in each of the four sub-streams across the two conditions (F(2, 98) = 19.58, p < 0.001 for Condition 1, first stream; F(2, 98) = 17.89, p < 0.001 for Condition 1, second stream; F(2, 92) = 36.1, p < 0.001 for Condition 2, first stream; and F(2, 92) = 10.32, p < 0.001 for Condition 2, second stream). In line with Experiment 1, subsequent paired t-tests in all four sub-streams revealed a significant difference in response latencies between the 1st and 2nd positions and between the 1st and 3rd positions (all p's < 0.01), but provided no evidence for an RT difference between shapes in the 2nd and 3rd positions (all p's > 0.05).
Table 4. Means and standard deviations of raw and log-transformed RTs for shapes in the 1st, 2nd, and 3rd within-triplet positions, for each sub-stream in the two conditions.

a: Condition 1 (no-overlap), 1st sub-stream

| | 1st position | 2nd position | 3rd position |
|---|---|---|---|
| Raw RT (SD) | 1404.7 (733) | 1125.5 (501) | 1143.3 (513) |
| Log-transformed RT (SD) | 6.92 (0.53) | 6.69 (0.44) | 6.72 (0.45) |

b: Condition 1 (no-overlap), 2nd sub-stream

| | 1st position | 2nd position | 3rd position |
|---|---|---|---|
| Raw RT (SD) | 1050.7 (577) | 824.8 (324) | 837.3 (342) |
| Log-transformed RT (SD) | 6.63 (0.54) | 6.43 (0.41) | 6.46 (0.42) |

c: Condition 2 (overlap), 1st sub-stream

| | 1st position | 2nd position | 3rd position |
|---|---|---|---|
| Raw RT (SD) | 1364.9 (740) | 1088.1 (505) | 1077.7 (455) |
| Log-transformed RT (SD) | 6.86 (0.59) | 6.63 (0.48) | 6.63 (0.46) |

d: Condition 2 (overlap), 2nd sub-stream

| | 1st position | 2nd position | 3rd position |
|---|---|---|---|
| Raw RT (SD) | 1003.7 (636) | 870.4 (473) | 888.0 (510) |
| Log-transformed RT (SD) | 6.55 (0.64) | 6.44 (0.53) | 6.45 (0.55) |
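The position analysis reported above can be illustrated with the following sketch: a one-way repeated-measures ANOVA of within-triplet position on each participant's mean log-transformed RT, followed by pairwise paired t-tests. The column names and libraries are our own assumptions; this is not the authors' analysis code.

```python
from scipy import stats
from statsmodels.stats.anova import AnovaRM

def position_analysis(df):
    """df: one row per participant x position for a given sub-stream, with
    columns 'subject', 'position' (1, 2, 3), and 'log_rt' (mean log RT)."""
    # Repeated-measures ANOVA with position as the within-subject factor.
    anova = AnovaRM(df, depvar="log_rt", subject="subject",
                    within=["position"]).fit()
    # Pairwise paired t-tests between positions, as reported in the text.
    wide = df.pivot(index="subject", columns="position", values="log_rt")
    pairwise = {
        "1 vs 2": stats.ttest_rel(wide[1], wide[2]),
        "1 vs 3": stats.ttest_rel(wide[1], wide[3]),
        "2 vs 3": stats.ttest_rel(wide[2], wide[3]),
    }
    return anova, pairwise
```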
We next turned to examine the dynamics of learning (i.e., the difference in log-transformed RTs between shapes in the 1st position and the average of shapes in the 2nd and 3rd positions) across the two conditions and in both sub-streams. The learning dynamics are presented in Fig. 10. Overall, the learning trajectories of the 1st sub-streams in the two conditions present a nearly identical time-course, as can be seen in Fig. 10a: similar to Experiments 1a, 1b, and 2, a logarithmic trajectory was observed in both conditions (Condition 1: R2logarithmic = 0.9 vs. R2linear = 0.83; Condition 2: R2logarithmic = 0.82 vs. R2linear = 0.67). In addition, in the first sub-stream of both conditions the online measure became significantly greater than zero at a similar time point (in block 4, pone-tailed = 0.008 for Condition 1, pone-tailed = 0.002 for Condition 2) and remained significantly greater than zero in all subsequent blocks, suggesting a very similar learning trajectory. In contrast, for the second sub-stream, the online measure revealed qualitatively different learning dynamics in the two conditions (see Fig. 10b). While both trajectories were again well fitted by a logarithmic function (Condition 1: R2logarithmic = 0.52 vs. R2linear = 0.26; Condition 2: R2logarithmic = 0.72 vs. R2linear = 0.67), learning the second sub-stream of Condition 2 (the overlapping condition) was much slower. More specifically, while in the second sub-stream of Condition 1 significant learning was observed already in Block 2 (and remained significantly greater than zero throughout the session), in Condition 2 a stable learning effect was reached much later, only in Block 12.
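The curve-fitting and block-wise tests described above can be sketched as follows. This is an illustrative reconstruction under our own assumptions about the data layout, not the original analysis script: linear and logarithmic functions are fit to the mean per-block RT gain, and a one-tailed one-sample t-test asks, block by block, whether the gain exceeds zero.

```python
import numpy as np
from scipy import stats

def compare_trajectory_fits(mean_gain_per_block):
    """Fit linear and logarithmic functions to the mean learning trajectory
    (per-block difference between log RT at the 1st position and the mean of
    the 2nd/3rd positions) and return the R^2 of each fit."""
    y = np.asarray(mean_gain_per_block, dtype=float)
    blocks = np.arange(1, len(y) + 1)

    def r_squared(pred):
        ss_res = np.sum((y - pred) ** 2)
        ss_tot = np.sum((y - y.mean()) ** 2)
        return 1 - ss_res / ss_tot

    lin_coef = np.polyfit(blocks, y, 1)
    log_coef = np.polyfit(np.log(blocks), y, 1)
    return {"R2_linear": r_squared(np.polyval(lin_coef, blocks)),
            "R2_logarithmic": r_squared(np.polyval(log_coef, np.log(blocks)))}

def blockwise_learning_test(gain_matrix):
    """One-tailed (gain > 0) one-sample t-test in every block.
    gain_matrix: array of shape (n_subjects, n_blocks)."""
    t, p_two = stats.ttest_1samp(np.asarray(gain_matrix), 0.0, axis=0)
    p_one = np.where(t > 0, p_two / 2, 1 - p_two / 2)
    return t, p_one
```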
Offline measure performance
As in the previous experiments, we measured offline performance for each participant as the number of correct identifications of triplets over foils in the 2-AFC test (score range: 0–36), as well as the average offline scores for the two sub-streams in the two conditions. As in Experiments 1 and 2, a strong correlation (r = 0.53, 95% CI: [0.37, 0.66]) was found between the offline score and the online measure of performance (averaged throughout the familiarization phase, and calculated across participants in both conditions), replicating the validation of the online measure of SL. Importantly, however, the offline test again displayed a different pattern of results than the online measure with regard to the experimental manipulation. For Condition 1, the two measures of learning converged, showing similar recognition of triplets from the first and the second sub-streams (84.33% vs. 85.33%, paired-samples t(49) = −0.51, p = 0.62). In contrast, for Condition 2, the online tracking and the offline test-scores diverged. Whereas the online measure revealed a significant difficulty in learning the second sub-stream, the offline measure was practically insensitive to this difficulty, and performance on patterns from the first sub-stream did not differ from performance on patterns from the second sub-stream (72.7% vs. 74.35%, paired-samples t(46) = −0.49, p = 0.64). Note that in all four sub-streams (across the two conditions) performance was significantly above chance level (all pone-tailed < 0.001). Also note that there was an overall difference in performance between Condition 1 and Condition 2 across the two sub-streams (Condition 1: 84.8% vs. Condition 2: 73.52%, independent-samples t(95) = 3.12, p = 0.002). We return to the dissociation between the online and offline measures in the Discussion below.
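The validation analysis can be sketched in the same illustrative spirit (assumed inputs, not the original script): a Pearson correlation between each participant's averaged online RT gain and their 2-AFC score, with a Fisher-z confidence interval, plus the paired-samples comparison of the two sub-streams.

```python
import numpy as np
from scipy import stats

def online_offline_correlation(online_gain, offline_score):
    """Pearson r between the averaged online RT-gain measure and the offline
    2-AFC score, with an approximate Fisher-z 95% confidence interval."""
    x = np.asarray(online_gain, dtype=float)
    y = np.asarray(offline_score, dtype=float)
    r, p = stats.pearsonr(x, y)
    z, se = np.arctanh(r), 1.0 / np.sqrt(len(x) - 3)
    ci = (np.tanh(z - 1.96 * se), np.tanh(z + 1.96 * se))
    return r, p, ci

def compare_substream_scores(first_stream_acc, second_stream_acc):
    """Paired-samples t-test on offline accuracy for first- vs. second-stream
    triplets (one pair of values per participant)."""
    return stats.ttest_rel(first_stream_acc, second_stream_acc)
```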
Discussion
The results of Experiment 3 again demonstrate the validity of the online measure as a proxy of SL performance in the self-paced VSL paradigm: predictable stimuli elicited faster responses than unpredictable stimuli, and the average RT gain correlated strongly with 2-AFC test performance. More importantly, and in line with Experiment 2, the online and offline measures provided non-overlapping information regarding learning. Specifically, the online measure revealed a clear effect of the between-condition experimental manipulation (i.e., whether elements in the two streams overlapped or not) on the learning trajectory of the second stream, an effect the offline measure did not reflect.
Taken together, the findings of Experiment 3 again exemplify how online measures provide access to learning dynamics that cannot be observed when relying solely on offline measures of SL performance. They provide important theoretical insights regarding how learning novel regularities (i.e., the second sub-stream) is affected by the statistics of previous input (i.e., the knowledge already assimilated from the first sub-stream). This resembles findings from related bodies of research on language learning, such as the effect of prior linguistic experience on the acquisition of novel syntactic structures (e.g., Fine, Jaeger, Farmer, & Qian, 2013), and the effect of previous word-level knowledge on the acquisition of grammatical gender (Arnon & Ramscar, 2012).
General Discussion
What exactly is “learning” in the context of “statistical learning”? How should we define it, and how should we measure it? If SL is taken to be the ability to extract the distributional properties of sensory input in time and space (e.g., Frost et al., 2015; Romberg & Saffran, 2010; Schapiro & Turk-Browne, 2015), what, then, would be a good operational measure of this ability? This question is not merely methodological; rather, it is deeply theoretical. Consider a familiarization phase in which elements co-occur in some repeating patterns. SL in such a paradigm can result, for example, in: (1) perfect post-hoc recognition of a limited number of patterns, (2) some above-chance recognition of all patterns, (3) fast formation of online predictions of upcoming stimuli based on the statistical properties of some patterns, (4) slow formation of online predictions based on the statistics of the entire input, etc. These possible behavioral signatures represent not only different measures of learning, but also different mechanistic accounts of the possible representational changes incurred by exposure to a given sensory input. The operational proxies used to assess learning should maximally cover these potential accounts, to reflect the full scope of SL as a theoretical construct. Nevertheless, most SL research to date is based on one specific paradigm, with a single main operational proxy: 2-AFC performance following a familiarization stream (but see Batterink, Reber, Neville, & Paller, 2015; Bays, Turk-Browne, & Seitz, 2015; Bertels et al., 2012). This measure does a good job of covering some of the possible theoretical definitions of SL, but fails to cover others.
Our aim in the present study was to consider an alternative to the traditional 2-AFC measure, and to expand the investigation of SL by tracking learning as it unfolds. In our set of experiments we targeted the ability to use the statistics of a visual input to make online predictions. Our study revealed several important insights. First, all experiments produced an alternative measure of learning: participants formed online predictions during familiarization, as revealed by a significant RT gain for predictable compared to unpredictable stimuli. Interestingly, they did so already after a relatively small number of exposures, and learning, at least under the parameters of the current experiments, was well described by a logarithmic function. Second, this RT gain was found to be a stable characteristic of an individual, as reflected by high test-retest reliability (Experiment 1b); the extent of the RT gain for predicted stimuli thus seems to be a consistent “signature” of a given participant. Third, all three experiments demonstrate that this online measure is a valid proxy of SL, as it correlates with standard offline performance: participants who gained much from predictions, in terms of faster RTs for predicted stimuli, also scored better in the post-familiarization test. Critically, Experiments 2 and 3 revealed that the online and offline measures are correlated but not interchangeable. Thus, the RT gain for predictable stimuli does not simply mirror offline test performance; rather, it provides additional information regarding SL processes.
Tracking the dynamics of learning in Experiments 2 and 3 revealed important insights regarding the processing of regularities in the visual modality. In Experiment 2 we examined how different levels of quasi-regularity in the visual stream affect learning, and in Experiment 3 we monitored the impact of changing the structural properties of the input while it unfolds. These manipulations are theoretically important because they extend the ecological validity of typical visual SL experiments: co-occurrences of events in the environment are not necessarily characterized by fixed probabilities, and input streams often vary in their content and statistical structure. In Experiment 2, online performance revealed a qualitative difference between patterns with high predictability (TP = 1, 0.8) and those with lower regularity (TP = 0.6). This suggests that the function relating the extent of quasi-regularity in the input to learning is complex, with low TPs requiring exceedingly extensive exposure. This pattern was not reflected in the offline test, which showed a simple linear impact of degree of predictability.
In Experiment 3, the online tracking of SL revealed that participants can learn complex sequences composed of two streams differing either in their constituent shapes or in their patterns of co-occurrence. Our findings show that once the structure of the sequence changes, a period of relearning is required, but participants do eventually assimilate the novel structural properties of the input. More importantly, we found that relearning is significantly slower if the constituent shapes remain unchanged, and only their rules of co-occurrence are altered. This finding is perhaps not surprising, since in this condition participants have to update their acquired knowledge regarding the statistical structure of the stream. What is striking, however, is that this information is absent when looking only at offline test performance.
Taken together, the results of Experiments 2 and 3 exemplify the non-overlapping information provided by the different types of SL measures. Both online and offline measures are clearly sensitive to the statistical properties of the input, but this sensitivity has different characteristics, perhaps reflecting different mechanisms. Consequently, when estimating differences in SL abilities between populations, experimental conditions, or individuals, reliance on 2-AFC test scores alone may provide only a partial view of the true difference in the underlying processes.
An important question is how the online and offline measures are mechanistically related. Why do they reveal different information, such that the performance tracked by the online measure ends up in a different end-state than that captured by the offline measure? Our initial assumption is that during familiarization participants gradually form predictions regarding upcoming events in the stream. These predictions are continuously updated with repeated presentations of the stream's constituents, and become increasingly precise with additional repetitions. The behavioral result of this gradual updating process is a continuous speed-up of responses to the now well-predicted stimuli. In this sense, online tracking offers a continuous measure of learning. This view is compatible with findings regarding the neurobiological underpinnings of learning, where specific patterns of neural oscillations reflect predictions or surprise, and the power of this oscillatory activity is a continuous measure of the strength of upcoming predictions (e.g., Batterink & Paller, 2017; Farthouat et al., 2016; Roux & Uhlhaas, 2014).
At the end of familiarization, (at least some) participants have formed stable representations of the extracted patterns. The offline test targets these representations, but in contrast to online tracking, it is blind to their dynamic formation. Our findings in Experiment 2 thus resemble the theoretical curves drawn in Figure 1. While online tracking shows that TPs of 0.6 are very difficult to learn, with enough repetitions they may nonetheless end in relatively stable representations: not as stable as the representations formed by TPs of 1 or 0.8, yet stable enough. However, the offline test that targets these representations is based on a set of categorical yes/no decisions, which are coarse-grained by definition. Moreover, the test repeats, again and again, sets of targets and foils, thereby potentially changing the stability of the originally learned representations during the test phase itself. Probabilistically, at a given time point, performance with a TP of 1 will on average be higher than performance with a TP of 0.8, and performance with a TP of 0.8 will on average be higher than performance with a TP of 0.6. The measure, however, is too coarse-grained, so that the nonlinearity observed with the online measure is lost.
Another advantage of online measures is that they can provide insights regarding the nature of the information extracted through exposure to co-occurrence statistics. For example, previous research has dealt with the question of whether participants extract patterns by learning the TPs between elements (as implemented in the Simple Recurrent Network model, SRN; Elman, 1990), or by parsing the stream into smaller chunks (see, e.g., the PARSER model, Perruchet & Vinter, 1998; for discussion see Frank, Goldwater, Griffiths, & Tenenbaum, 2010). Previous work has mostly used offline measures to try to differentiate between these models based on their predictions regarding the formed representations (e.g., Giroux & Rey, 2009; Orbán, Fiser, Aslin, & Lengyel, 2008; but see Franco & Destrebecqz, 2012). Interestingly, these two accounts of learning also make contrasting predictions regarding the online formation of predictions and their temporal dynamics. In this context, it is interesting to note that in all experiments involving embedded triplets of stimuli in the current work (Experiments 1a, 1b, & 3) there was no difference in RTs between shapes in the 2nd position and those in the 3rd position, a result that is in line with predictions from chunk-based, but not TP-based, learning accounts (for discussion see Batterink, 2017; Minier, Fagot, & Rey, 2016). As this is a post-hoc null finding, it requires careful interpretation. Nevertheless, it shows how future research can benefit from combining online and offline measures, explicitly distinguishing processes related to the online formation of predictions from those related to the recognition of already-established representations.
Our current results thus seem to offer new and promising avenues for defining and assessing SL ability at both the group and the individual level. This would shift the focus of research from the question of what can be learned to the question of how exactly representations are updated online given exposure to a continuous sensory input characterized by statistical regularities, and to the question of how individuals differ in this updating process (see also Hunt & Aslin, 2001, for discussion of individual differences in an SRT task). Clearly, such research would require additional parallel online measures. An important limitation of our current measure is that it provides reliable information regarding speed of learning at the group level (i.e., after averaging online performance across all subjects), but not at the individual level. Since RT measures are inherently noisy, pinpointing exactly when learning first emerged for a given participant is not possible, at least not with the present experimental design. This is an interesting challenge, because assessing individual-level learning dynamics has the promise of revealing critical information regarding SL abilities. Individuals may differ not only in their overall extent of learning, but also in their speed of learning, with potentially non-overlapping predictive power for the two measures, specifically when predicting linguistic outcomes (Siegelman et al., 2017). For example, it is reasonable to assume that the overall extent of pattern recognition (measured by offline tests) will not predict abilities related to online language processing, whereas measures tracking the ability to efficiently form predictions regarding upcoming events might serve as better predictors. Given the shortcomings of RT measures, future research is left with the task of combining behavioral paradigms with parallel neurobiological online measures of SL performance, such as event-related potentials (e.g., Jost, Conway, Purdy, Walk, & Hendricks, 2015) or changes in rhythmic activity (Cashdollar, Ruhnau, Weisz, & Hasson, 2016; Farthouat et al., 2016), as well as eye-tracking procedures (e.g., Kidd, Piantadosi, & Aslin, 2012), in order to tap the speed of learning at the individual level and estimate its distinct predictive power. Note also that in the present investigation, tracking SL online revealed important constraints regarding the detection of regularities in the visual modality. This opens a new set of questions regarding auditory SL, where simple online tracking through self-paced methods may not necessarily work, and neurobiological tracking would then be a possible solution. This requires extensive investigation, but such lines of research have the promise of expanding the definition of SL as a theoretical construct, leading to a better understanding of its underlying mechanisms.
Acknowledgments
This paper was supported by the ERC Advanced Grant awarded to Ram Frost (project 692502), by the Israel Science Foundation (Grant 217/14 awarded to Ram Frost), and by the National Institute of Child Health and Human Development (RO1 HD 067364 awarded to Ken Pugh and Ram Frost, and PO1 HD 01994 awarded to Haskins Laboratories). We thank Alex B. Fine and Henry Brice for helpful discussions.
Appendix A
The 24 shapes used in Experiments 1 and 2.
Appendix B
The 36 shapes used in Experiment 3.
Appendix C
Structure of triplets and foils in Experiment 3. Each number (1–36) in the tables below represents a shape (assignment of shapes to numbers was randomized for each participant).
Condition 1 (no-overlap)

| Triplets (1st sub-stream) | Foils (1st sub-stream) | Triplets (2nd sub-stream) | Foils (2nd sub-stream) |
|---|---|---|---|
| 1 2 3 | 1 5 9 | 19 20 21 | 19 26 33 |
| 4 5 6 | 4 8 12 | 22 23 24 | 22 29 36 |
| 7 8 9 | 7 11 15 | 25 26 27 | 25 32 21 |
| 10 11 12 | 10 14 18 | 28 29 30 | 28 35 24 |
| 13 14 15 | 13 17 3 | 31 32 33 | 31 20 27 |
| 16 17 18 | 16 2 6 | 34 35 36 | 34 23 30 |

Condition 2 (overlap)

| Triplets (1st sub-stream) | Foils (1st sub-stream) | Triplets (2nd sub-stream) | Foils (2nd sub-stream) |
|---|---|---|---|
| 1 2 3 | 1 5 9 | 2 9 13 | 2 15 17 |
| 4 5 6 | 4 8 12 | 3 7 14 | 3 12 4 |
| 7 8 9 | 7 11 15 | 8 15 1 | 8 10 13 |
| 10 11 12 | 10 14 18 | 5 12 16 | 5 18 14 |
| 13 14 15 | 13 17 3 | 6 10 17 | 6 9 1 |
| 16 17 18 | 16 2 6 | 11 18 4 | 11 7 16 |
Footnotes
As the original research by Saffran and colleagues was conducted with infants, no explicit decisions were, of course, involved in the offline test; rather, it was based on a comparison of looking times to targets and foils. We refer here to the parallel design used extensively with adult populations (e.g., Saffran, Newport, & Aslin, 1996).
In SL paradigms participants are typically not told that the input contains patterns. However, there are contrasting reports regarding whether intentional versus incidental instructions affect performance in SL tasks (see Arciuli et al., 2014, for review and discussion, and Siegelman & Frost, 2015, for a discussion of the impact of multiple testing of SL and of participants' awareness of the manipulation on performance). In the current investigation, we opted to tell participants about the patterns in the input before the beginning of the familiarization phase in order to ensure that all subjects were similarly engaged in the task.
Note that all trajectories were fitted to response latencies after log-transformation. A logarithmic function fitted to log-transformed RTs therefore corresponds to a power function in raw-RT space (log(RT) = a + b·log(block) is equivalent to RT = e^a·block^b).
Note that throughout the manuscript we use one-tailed tests to examine whether there was a significant learning effect at the group level. We do so both for the online measure (where no learning is represented by a zero RT difference between predictable and unpredictable stimuli) and for offline test scores (where chance-level performance, 50%, reflects no learning). In both cases, negative values of overall performance are not expected, and therefore one-tailed tests are more suitable. In all other statistical tests we have no a priori hypothesis regarding the direction of the effect, and therefore two-tailed tests are used.
Note that the online-offline correlation remains strong even when the online measure is calculated across all familiarization blocks (1–24): r = 0.52 (95% CI: [0.33, 0.67]), p < 0.001.
It is important to note that the logarithmic function was fitted to the mean learning trajectory, not to the trajectory of each individual learner. Therefore, we cannot rule out the possibility that this group-level logarithmic function is an artifact, stemming from qualitatively different trajectories at the individual level that average into a logarithmic function (see, e.g., Anderson & Tweney, 1997).
References
- Adini Y, Bonneh YS, Komm S, Deutsch L, Israeli D. The time course and characteristics of procedural learning in schizophrenia patients and healthy individuals. Frontiers in Human Neuroscience. 2015;9:1–16. doi: 10.3389/fnhum.2015.00475.
- Amato MS, MacDonald MC. Sentence processing in an artificial language: Learning and using combinatorial constraints. Cognition. 2010;116(1):143–148. doi: 10.1016/j.cognition.2010.04.001.
- Anderson RB, Tweney RD. Artifactual power curves in forgetting. Memory & Cognition. 1997;25(5):724–730. doi: 10.3758/BF03211315.
- Arciuli J, Simpson IC. Statistical learning in typically developing children: The role of age and speed of stimulus presentation. Developmental Science. 2011;14:464–473. doi: 10.1111/j.1467-7687.2009.00937.x.
- Arciuli J, Simpson IC. Statistical learning is related to reading ability in children and adults. Cognitive Science. 2012;36(2):286–304. doi: 10.1111/j.1551-6709.2011.01200.x.
- Arciuli J, von Koss Torkildsen J, Stevens DJ, Simpson IC. Statistical learning under incidental versus intentional conditions. Frontiers in Psychology. 2014;5. doi: 10.3389/fpsyg.2014.00747.
- Armstrong BC, Frost R, Christiansen MH. The long road of statistical learning research: Past, present and future. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences. 2017;372(1711). doi: 10.1098/rstb.2016.0047.
- Arnon I, Ramscar M. Granularity and the acquisition of grammatical gender: How order-of-acquisition affects what gets learned. Cognition. 2012;122:292–305. doi: 10.1016/j.cognition.2011.10.009.
- Barakat BK, Seitz AR, Shams L. The effect of statistical learning on internal stimulus representations: Predictable items are enhanced even when not predicted. Cognition. 2013;129(2):205–211. doi: 10.1016/j.cognition.2013.07.003.
- Batterink LJ. Rapid statistical learning supporting word extraction from continuous speech. Psychological Science. 2017.
- Batterink LJ, Paller KA. Online neural monitoring of statistical learning. Cortex. 2017;90:31–45. doi: 10.1016/j.cortex.2017.02.004.
- Batterink LJ, Reber PJ, Neville HJ, Paller KA. Implicit and explicit contributions to statistical learning. Journal of Memory and Language. 2015;83:62–78. doi: 10.1016/j.jml.2015.04.004.
- Bays BC, Turk-Browne NB, Seitz AR. Dissociable behavioural outcomes of visual statistical learning. Visual Cognition. 2015;23(9–10):1072–1097. doi: 10.1080/13506285.2016.1139647.
- Bertels J, Franco A, Destrebecqz A. How implicit is visual statistical learning? Journal of Experimental Psychology: Learning, Memory, and Cognition. 2012;38:1425–1431. doi: 10.1037/a0027210.
- Bogaerts L, Siegelman N, Frost R. Splitting the variance of statistical learning performance: A parametric investigation of exposure duration and transitional probabilities. Psychonomic Bulletin & Review. 2016:1–7. doi: 10.3758/s13423-015-0996-z.
- Cashdollar N, Ruhnau P, Weisz N, Hasson U. The role of working memory in the probabilistic inference of future sensory events. Cerebral Cortex. 2016. doi: 10.1093/cercor/bhw138.
- Chun MM, Jiang Y. Contextual cueing: Implicit learning and memory of visual context guides spatial attention. Cognitive Psychology. 1998;36:28–71. doi: 10.1006/cogp.1998.0681.
- Cleeremans A, McClelland JL. Learning the structure of event sequences. Journal of Experimental Psychology: General. 1991;120(3):235–253. doi: 10.1037/0096-3445.120.3.235.
- Conway CM, Bauernschmidt A, Huang SS, Pisoni DB. Implicit statistical learning in language processing: Word predictability is the key. Cognition. 2010;114(3):356–371. doi: 10.1016/j.cognition.2009.10.009.
- Dale R, Duran ND, Morehead JR. Prediction during statistical learning, and implications for the implicit/explicit divide. Advances in Cognitive Psychology. 2012;8(2):196–209. doi: 10.2478/v10053-008-0115-z.
- Elman JL. Finding structure in time. Cognitive Science. 1990;14:179–211. doi: 10.1016/0364-0213(90)90002-E.
- Emberson LL, Conway CM, Christiansen MH. Timing is everything: Changes in presentation rate have opposite effects on auditory and visual implicit statistical learning. Quarterly Journal of Experimental Psychology. 2011;64:1021–1040. doi: 10.1080/17470218.2010.538972.
- Endress AD, Mehler J. The surprising power of statistical learning: When fragment knowledge leads to false memories of unheard words. Journal of Memory and Language. 2009;60:351–367. doi: 10.1016/j.jml.2008.10.003.
- Farthouat J, Franco A, Mary A, Delpouve J, Wens V, Op de Beeck M, Peigneux P. Auditory magnetoencephalographic frequency-tagged responses mirror the ongoing segmentation processes underlying statistical learning. Brain Topography. 2016:1–13. doi: 10.1007/s10548-016-0518-y.
- Fine AB, Jaeger TF, Farmer TA, Qian T. Rapid expectation adaptation during syntactic comprehension. PLoS ONE. 2013;8(10). doi: 10.1371/journal.pone.0077661.
- Fiser J, Aslin RN. Unsupervised statistical learning of higher-order spatial structures from visual scenes. Psychological Science. 2001;12:499–504. doi: 10.1111/1467-9280.00392.
- Fiser J, Aslin RN. Statistical learning of higher-order temporal structure from visual shape sequences. Journal of Experimental Psychology: Learning, Memory, and Cognition. 2002;28:458–467. doi: 10.1037/0278-7393.28.3.458.
- Franco A, Destrebecqz A. Chunking or not chunking? How do we find words in artificial language learning? Advances in Cognitive Psychology. 2012;8(2):144–154. doi: 10.2478/v10053-008-0111-3.
- Franco A, Gaillard V, Cleeremans A, Destrebecqz A. Assessing segmentation processes by click detection: Online measure of statistical learning, or simple interference? Behavior Research Methods. 2015;47(4). doi: 10.3758/s13428-014-0548-x.
- Frank MC, Goldwater S, Griffiths TL, Tenenbaum JB. Modeling human performance in statistical word segmentation. Cognition. 2010;117(2):107–125. doi: 10.1016/j.cognition.2010.07.005.
- Frost R, Armstrong BC, Siegelman N, Christiansen MH. Domain generality versus modality specificity: The paradox of statistical learning. Trends in Cognitive Sciences. 2015;19(3):117–125. doi: 10.1016/j.tics.2014.12.010.
- Frost R, Siegelman N, Narkiss A, Afek L. What predicts successful literacy acquisition in a second language? Psychological Science. 2013;24(7):1243–1252. doi: 10.1177/0956797612472207.
- Gabay Y, Thiessen ED, Holt LL. Impaired statistical learning in developmental dyslexia. Journal of Speech, Language, and Hearing Research. 2015;58:934–945. doi: 10.1044/2015.
- Gebhart AL, Aslin RN, Newport EL. Changing structures in midstream: Learning along the statistical garden path. Cognitive Science. 2009;33(6):1087–1116. doi: 10.1111/j.1551-6709.2009.01041.x.
- Gebhart AL, Newport EL, Aslin RN. Statistical learning of adjacent and nonadjacent dependencies among nonlinguistic sounds. Psychonomic Bulletin & Review. 2009;16:486–490. doi: 10.3758/PBR.16.3.486.
- Giroux I, Rey A. Lexical and sublexical units in speech perception. Cognitive Science. 2009;33(2):260–272. doi: 10.1111/j.1551-6709.2009.01012.x.
- Glicksohn A, Cohen A. The role of cross-modal associations in statistical learning. Psychonomic Bulletin & Review. 2013;20:1161–1169. doi: 10.3758/s13423-013-0458-4.
- Gómez DM, Bion RH, Mehler J. The word segmentation process as revealed by click detection. Language and Cognitive Processes. 2011;26:212–223. doi: 10.1080/01690965.2010.482451.
- Gómez RL. Variability and detection of invariant structure. Psychological Science. 2002;13:431–436. doi: 10.1111/1467-9280.00476.
- Hunt RH, Aslin RN. Statistical learning in a serial reaction time task: Access to separable statistical cues by individual learners. Journal of Experimental Psychology: General. 2001;130:658–680. doi: 10.1037/0096-3445.130.4.658.
- Jonaitis EM, Saffran JR. Learning harmony: The role of serial statistics. Cognitive Science. 2009;33(5):951–968. doi: 10.1111/j.1551-6709.2009.01036.x.
- Jost E, Conway CM, Purdy JD, Walk AM, Hendricks MA. Exploring the neurodevelopment of visual statistical learning using event-related brain potentials. Brain Research. 2015;1597:95–107. doi: 10.1016/j.brainres.2014.10.017.
- Just MA, Carpenter PA, Woolley JD. Paradigms and processes in reading comprehension. Journal of Experimental Psychology: General. 1982;111(2):228–238. doi: 10.1037/0096-3445.111.2.228.
- Karuza EA, Farmer TA, Fine AB, Smith FX, Jaeger TF. On-line measures of prediction in a self-paced statistical learning task. Proceedings of the 36th Annual Meeting of the Cognitive Science Society. 2014:725–730.
- Karuza EA, Li P, Weiss DJ, Bulgarelli F, Zinszer BD, Aslin RN. Sampling over nonuniform distributions: A neural efficiency account of the primacy effect in statistical learning. Journal of Cognitive Neuroscience. 2016;28(10):1484–1500. doi: 10.1162/jocn_a_00990.
- Kidd C, Piantadosi ST, Aslin RN. The Goldilocks effect: Human infants allocate attention to visual sequences that are neither too simple nor too complex. PLoS ONE. 2012;7(5):e36399. doi: 10.1371/journal.pone.0036399.
- Kirkham NZ, Slemmer JA, Johnson SP. Visual statistical learning in infancy: Evidence for a domain general learning mechanism. Cognition. 2002;83(2):B35–B42. doi: 10.1016/S0010-0277(02)00004-5.
- Minier L, Fagot J, Rey A. The temporal dynamics of regularity extraction in non-human primates. Cognitive Science. 2016;40(4):1019–1030. doi: 10.1111/cogs.12279.
- Misyak JB, Christiansen MH, Tomblin JB. On-line individual differences in statistical learning predict language processing. Frontiers in Psychology. 2010a;1:31. doi: 10.3389/fpsyg.2010.00031.
- Misyak JB, Christiansen MH, Tomblin JB. Sequential expectations: The role of prediction-based learning in language. Topics in Cognitive Science. 2010b;2(1):138–153. doi: 10.1111/j.1756-8765.2009.01072.x.
- Newport EL, Aslin RN. Learning at a distance I. Statistical learning of non-adjacent dependencies. Cognitive Psychology. 2004;48:127–162. doi: 10.1016/S0010-0285(03)00128-2.
- Orbán G, Fiser J, Aslin RN, Lengyel M. Bayesian learning of visual chunks by human observers. Proceedings of the National Academy of Sciences. 2008;105:2745–2750. doi: 10.1073/pnas.0708424105.
- Pelucchi B, Hay JF, Saffran JR. Statistical learning in a natural language by 8-month-old infants. Child Development. 2009;80(3):674–685. doi: 10.1111/j.1467-8624.2009.01290.x.
- Perruchet P, Vinter A. PARSER: A model for word segmentation. Journal of Memory and Language. 1998;39:246–263. doi: 10.1006/jmla.1998.2576.
- Romberg AR, Saffran JR. Statistical learning and language acquisition. Wiley Interdisciplinary Reviews: Cognitive Science. 2010;1:906–914. doi: 10.1002/wcs.78.
- Roux F, Uhlhaas PJ. Working memory and neural oscillations: Alpha-gamma versus theta-gamma codes for distinct WM information? Trends in Cognitive Sciences. 2014;18(1):16–25. doi: 10.1016/j.tics.2013.10.010.
- Saffran JR, Aslin RN, Newport EL. Statistical learning by 8-month-old infants. Science. 1996;274(5294):1926–1928. doi: 10.1126/science.274.5294.1926.
- Saffran JR, Johnson EK, Aslin RN, Newport EL. Statistical learning of tone sequences by human infants and adults. Cognition. 1999;70:27–52. doi: 10.1016/S0010-0277(98)00075-4.
- Saffran JR, Newport EL, Aslin RN. Word segmentation: The role of distributional cues. Journal of Memory and Language. 1996;35(4):606–621. doi: 10.1006/jmla.1996.0032.
- Saffran JR, Newport EL, Aslin RN, Tunick RA, Barrueco S. Incidental language learning: Listening (and learning) out of the corner of your ear. Psychological Science. 1997;8(2):101–105. doi: 10.1111/j.1467-9280.1997.tb00690.x.
- Schapiro A, Turk-Browne N. Statistical learning. In Brain Mapping. 2015:501–506. doi: 10.1016/B978-0-12-397025-1.00276-1.
- Schvaneveldt RW, Gomez RL. Attention and probabilistic sequence learning. Psychological Research. 1998;61:175–190. doi: 10.1007/s004260050023.
- Sell AJ, Kaschak MP. Does visual speech information affect word segmentation? Memory & Cognition. 2009;37(6):889–894. doi: 10.3758/MC.37.6.889.
- Siegelman N, Bogaerts L, Christiansen MH, Frost R. Towards a theory of individual differences in statistical learning. Philosophical Transactions of the Royal Society B: Biological Sciences. 2017;372(1711):20160059. doi: 10.1098/rstb.2016.0059.
- Siegelman N, Bogaerts L, Frost R. Measuring individual differences in statistical learning: Current pitfalls and possible solutions. Behavior Research Methods. 2016:1–15. doi: 10.3758/s13428-016-0719-z.
- Siegelman N, Frost R. Statistical learning as an individual ability: Theoretical perspectives and empirical evidence. Journal of Memory and Language. 2015;81:105–120. doi: 10.1016/j.jml.2015.02.001.
- Thiessen ED, Kronstein AT, Hufnagle DG. The extraction and integration framework: A two-process account of statistical learning. Psychological Bulletin. 2013;139:792–814. doi: 10.1037/a0030801.
- Turk-Browne NB, Junge JA, Scholl BJ. The automaticity of visual statistical learning. Journal of Experimental Psychology: General. 2005;134(4):552–564. doi: 10.1037/0096-3445.134.4.552.
- Weiss DJ, Gerfen C, Mitchel AD. Speech segmentation in a simulated bilingual environment: A challenge for statistical learning? Language Learning and Development. 2009;5(1):30–49. doi: 10.1080/15475440802340101.