eLife. 2025 Sep 12;13:RP102475. doi: 10.7554/eLife.102475

Sequence action representations contextualize during early skill learning

Debadatta Dash 1, Fumiaki Iwane 1, William Hayward 1, Roberto F Salamanca-Giron 1, Marlene Bönstrup 2, Ethan R Buch 1, Leonardo G Cohen 1
Editors: Juan Alvaro Gallego 3, Michael J Frank 4
PMCID: PMC12431778  PMID: 40938318

Abstract

Activities of daily living rely on our ability to acquire new motor skills composed of precise action sequences. Here, we asked in humans if the millisecond-level neural representation of an action performed at different contextual sequence locations within a skill differentiates or remains stable during early motor learning. We first optimized machine learning decoders predictive of sequence-embedded finger movements from magnetoencephalographic (MEG) activity. Using this approach, we found that the neural representation of the same action performed in different contextual sequence locations progressively differentiated—primarily during rest intervals of early learning (offline)—correlating with skill gains. In contrast, representational differentiation during practice (online) did not reflect learning. The regions contributing to this representational differentiation evolved with learning, shifting from the contralateral pre- and post-central cortex during early learning (trials 1–11) to increased involvement of the superior and middle frontal cortex once skill performance plateaued (trials 12–36). Thus, the neural substrates supporting finger movements and their representational differentiation during early skill learning differ from those supporting stable performance during the subsequent skill plateau period. Representational contextualization extended to Day 2, exhibiting specificity for the practiced skill sequence. Altogether, our findings indicate that sequence action representations in the human brain contextually differentiate during early skill learning, an issue relevant to brain-computer interface applications in neurorehabilitation.

Research organism: Human

Introduction

Motor learning is required to perform a wide array of activities of daily living, intricate athletic endeavors, and professional skills. Whether it’s learning to type more quickly on a keyboard (Bönstrup et al., 2019a), improve one’s tennis game (Schmidt, 2018), or play a piece of music on the piano (Doyon and Benali, 2005) – all these skills require the ability to execute sequences of actions with precise temporal coordination. Action sequences thus form the building blocks of fine motor skills (Dehaene et al., 2015). Practicing a new motor skill elicits rapid performance improvements (early learning; Bönstrup et al., 2019a) that precede skill performance plateaus (Walker and Stickgold, 2004). Skill gains during early learning accumulate over rest periods (micro-offline) interspersed with practice (Bönstrup et al., 2019a; Buch et al., 2021; Jacobacci et al., 2020; Mylonas et al., 2024; Hayward et al., 2024; Brooks et al., 2024), and are up to four times larger than offline performance improvements reported following overnight sleep (Bönstrup et al., 2019a). During this initial interval of prominent learning, retroactive interference immediately following each practice interval reduces learning rates relative to interference after passage of time, consistent with stabilization of the motor memory (Bönstrup et al., 2020). Micro-offline gains observed during early learning are reproducible (Jacobacci et al., 2020; Brooks et al., 2024; Bönstrup et al., 2020; Chen et al., 2024; Sjøgård, 2024) and are similar in magnitude even when practice periods are reduced by half to 5 seconds in length, thereby confirming that they are not merely a result of recovery from performance fatigue (Bönstrup et al., 2020). Additionally, they are unaffected by the random termination of practice periods, which eliminates the possibility of predictive motor slowing as a contributing factor (Bönstrup et al., 2020). 
Collectively, these behavioral findings point towards the interpretation that micro-offline gains during early learning represent a form of memory consolidation (Bönstrup et al., 2019a).

This interpretation has been further supported by brain imaging and electrophysiological studies linking known memory-related networks and consolidation mechanisms to rapid offline performance improvements. In humans, the rate of hippocampo-neocortical neural replay predicts micro-offline gains (Buch et al., 2021). Consistent with these findings, Chen et al. (2024) and Sjøgård (2024) provided direct evidence from intracranial human EEG studies, demonstrating a connection between the density of hippocampal sharp-wave ripples (80–120 Hz), which are recognized markers of neural replay, and micro-offline gains during early learning. Further, Griffin et al. reported that neural replay of task-related ensembles in the motor cortex of macaques during brief rest periods, akin to those observed in humans (Bönstrup et al., 2019a; Buch et al., 2021; Jacobacci et al., 2020; Mylonas et al., 2024; Wamsley et al., 2023), is not merely correlated with, but is a causal driver of, micro-offline learning (Griffin et al., 2025). Specifically, the same reach directions that were replayed the most during rest breaks showed the greatest reduction in path length (i.e. more efficient movement path between two locations in the reach sequence) during subsequent trials, while stimulation applied during rest intervals preceding performance plateau reduced reactivation rates and virtually abolished micro-offline gains (Griffin et al., 2025). Thus, converging evidence in humans and non-human primates across indirect non-invasive and direct invasive recording techniques links hippocampal activity, neural replay dynamics, and offline skill gains in early motor learning that precedes the performance plateau.

During skill learning, the neural representation of a sequential skill binds discrete individual actions (e.g. single piano keypress) into complex, temporally and spatially precise sequence representations (e.g. a refrain from a piece of music; Karni et al., 1995; Song and Cohen, 2014; Natraj et al., 2022; Ghilardi et al., 2009; Yokoi and Diedrichsen, 2019). After a skill is learned over extended periods (i.e. weeks), the neural representation of the sequence changes significantly (Yokoi and Diedrichsen, 2019), while the representation of its individual action components (e.g. finger movements) does not (Beukema et al., 2019). On the other hand, it is not known whether individual sequence action representations differentiate or remain stable during the early stages of skill learning, when the memory is still not fully formed (Bönstrup et al., 2019a). Furthermore, it is unknown whether the neural representations of identical movements, performed at different positions within a skill sequence (i.e. the skill context), differentiate with learning—an important consideration for advancing robust brain-computer interface (BCI) applications (Merino et al., 2023; Liu et al., 2023; Lee et al., 2022; Zhao et al., 2022; Yao et al., 2022).

Examining the millisecond-level differentiation of discrete action representations during learning is challenging, as evolving neural dynamics concurrently encode skill sequences and their individual action components (Yokoi and Diedrichsen, 2019; Hikosaka et al., 1999) across multiple spatial scales (Munn et al., 2024). To address this problem, we first optimized a multi-scale decoder aimed at predicting keypress actions from magnetoencephalographic (MEG) neural activity. Using this optimized approach, we report that an individual sequence action representation differentiates depending on the sequence context and correlates with early skill learning. This representational contextualization developed predominantly over rest rather than during practice intervals—in parallel with rapid consolidation of skill.

Results

Participants engaged in a well-characterized sequential skill learning task (Bönstrup et al., 2019a; Buch et al., 2021; Bönstrup et al., 2020) that involved repetitive typing of a sequence (4-1-3-2-4) performed with their (non-dominant) left hand over 36 trials with alternating periods of 10 s practice and 10 s rest (inter-practice rest; Day 1 Training; Figure 1A), a practice schedule that minimizes reactive inhibition effects (Bönstrup et al., 2020; Pan and Rickard, 2015; see Materials and methods). Individual keypress times and finger keypress identities were recorded and used to quantify skill as the correct sequence speed (keypresses/s; Bönstrup et al., 2019a).

Figure 1. Experimental design and behavioral performance.

(A) Skill learning task. Participants engaged in a procedural motor skill learning task, which required them to repeatedly type a keypress sequence, "4-1-3-2-4" (1=little finger, 2=ring finger, 3=middle finger, and 4=index finger) with their non-dominant, left hand. The Day 1 Training session included 36 trials, with each trial consisting of alternating 10 s practice and rest intervals. The rationale for this task design was to minimize reactive inhibition effects during the period of steep performance improvements (early learning; Bönstrup et al., 2020; Pan and Rickard, 2015; see Materials and methods). After a 24-hr break, participants were retested on performance of the same sequence (4-1-3-2-4) for nine trials (Day 2 Retest) to inform on the generalizability of the findings over time and MEG recording sessions, as well as single-trial performance on nine different control sequences (Day 2 Control; 2-1-3-4-2, 4-2-4-3-1, 3-4-2-3-1, 1-4-3-4-2, 3-2-4-3-1, 1-4-2-3-1, 3-2-4-2-1, 3-2-1-4-2, and 4-2-3-1-4) to inform on specificity of the findings to the learned skill. MEG was recorded during both Day 1 and Day 2 sessions with a 275-channel CTF MEG system (CTF Systems, Inc, Canada). (B) Skill Learning. As reported previously (Bönstrup et al., 2019a), participants on average reached 95% of peak performance by trial 11 of the Day 1 Training session (see Figure 1—figure supplement 1A for results over all Day 1 Training and Day 2 Retest trials). Shaded regions in main plot indicate the 95% confidence interval of the group mean. At the group level, total early learning was exclusively accounted for by micro-offline gains during inter-practice rest intervals (B, inset; F(2,75)=14.79, p=3.86 × 10^-6; micro-online vs. micro-offline: p=7.98 × 10^-6; micro-online vs. total: p=0.0002; micro-offline vs. total: p=0.669). These results were not impacted by potential preplanning effects on initial skill performance (Ariani and Diedrichsen, 2019) since alternative measurements of cumulative micro-online and -offline gains remain unchanged after omission of the first 3 keypresses in each trial from the correct sequence speed computation (paired t-tests; micro-online: t(25)=−0.0223, p=0.982; micro-offline: t(25)=−0.879, p=0.388). Center line of box plots shown in inset indicates the group median, while box limits indicate the 1st and 3rd quartiles. Whisker lengths are set at the extreme value ≤1.5×IQR. (C) Keypress transition time (KTT) variability. Distribution of KTTs normalized to the median correct sequence time for each participant and centered on the mid-point for each full sequence iteration during early learning (see Figure 1—figure supplement 1B for results over all Day 1 Training and Day 2 Retest trials). Note the initial variability of the relative KTT composition of the sequence (i.e. 4–1, 1–3, 3–2, 2–4, 4–4), before it stabilizes in the early learning period.

Figure 1—figure supplement 1. Behavioral performance during skill learning.

(A) Total Skill Learning over Day 1 Training (36 trials) and Day 2 Retest (9 trials). As reported previously (Bönstrup et al., 2019a), participants on average reached 95% of peak performance during Day 1 Training by trial 11. Note that after trial 11, performance stabilizes around a plateau through trial 36. Following a 24-hr break, participants displayed an upward shift in performance during the Day 2 Retest – indicative of an overnight skill consolidation effect. Shaded regions indicate the 95% confidence interval of the group mean. (B) Keypress transition time (KTT) variability. Distribution of KTTs normalized to the median correct sequence time for each participant and centered on the mid-point for each full sequence iteration during early learning. Note that the initial variability of the five component transitions in the sequence (i.e. 4–1, 1–3, 3–2, 2–4, 4–4) stabilizes by trial 6 in the early learning period and remains stable throughout the rest of Day 1 Training (through trial 36) and Day 2 Retest.

Participants reached 95% of maximal skill (i.e. early learning) within the initial 11 practice trials (Figure 1B), with improvements developing over inter-practice rest periods (micro-offline gains) accounting for almost all total learning across participants (Figure 1B, inset; Bönstrup et al., 2019a). In addition to the reduction in sequence duration during early learning, individual keypress transition times became more consistent across repeated sequence iterations (Figure 1C). On average across subjects, 2.32% ± 1.48% (mean ± SD) of all keypresses performed were errors, which were evenly distributed across the four possible keypress responses. While errors increased progressively over practice trials, they did so in proportion to the increase in correct keypresses, so that the overall ratio of correct-to-incorrect keypresses remained stable over the training session.
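The decomposition of early learning into micro-online and micro-offline gains can be illustrated with a minimal sketch. It assumes one common formulation, which is not spelled out in this section: the micro-online gain of a trial is the change in correct sequence speed from the start to the end of its practice period, and the micro-offline gain is the change from the end of one practice period to the start of the next. The function name `micro_gains` and the input layout are hypothetical.

```python
def micro_gains(trial_speeds):
    """Split total learning into micro-online and micro-offline parts.

    trial_speeds: list of (start_speed, end_speed) tuples, one per practice
    trial, in keypresses/s. This layout is an assumption for illustration.
    """
    # Change in speed within each practice period.
    online = sum(end - start for start, end in trial_speeds)
    # Change in speed across each rest period (end of trial i to start of trial i+1).
    offline = sum(
        trial_speeds[i + 1][0] - trial_speeds[i][1]
        for i in range(len(trial_speeds) - 1)
    )
    return {
        "micro_online": online,
        "micro_offline": offline,
        "total": online + offline,
    }


# Toy data for three trials (made-up numbers, not study data).
example = [(1.0, 1.2), (1.5, 1.4), (1.8, 1.9)]
gains = micro_gains(example)
print(gains)
```

Under this formulation the two components sum to the difference between the final and the initial speed, which is why they can be compared directly against total early learning.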

On the following day, participants were retested on performance of the same sequence (4-1-3-2-4) over 9 trials (Day 2 Retest), as well as on the single-trial performance of 9 different untrained control sequences (Day 2 Controls: 2-1-3-4-2, 4-2-4-3-1, 3-4-2-3-1, 1-4-3-4-2, 3-2-4-3-1, 1-4-2-3-1, 3-2-4-2-1, 3-2-1-4-2, and 4-2-3-1-4). As expected, an upward shift in performance of the trained sequence (0.68 ± SD 0.56 keypresses/s; t=7.21, p<0.001) was observed during Day 2 Retest, indicative of an overnight skill consolidation effect (Figure 1—figure supplement 1A).

Keypress actions are represented in multi-scale hybrid-space manifolds

We investigated the differentiation of neural representations of the same index finger keypress performed at different positions of the skill sequence. A set of decoders was constructed to predict keypress actions from MEG activity as a function of both the learning state and the ordinal position of the keypress within the sequence. We first characterized the spectral and spatial features of keypress state representations by comparing performance of decoders constructed around broadband (1–100 Hz) or narrowband [delta- (1–3 Hz), theta- (4–7 Hz), alpha- (8–14 Hz), beta- (15–24 Hz), gamma- (25–50 Hz), and high gamma-band (51–100 Hz)] MEG oscillatory activity. We found that decoders trained on broadband activity consistently outperformed those trained on narrowband activity. Whole-brain parcel-space (70.11% ± SD 7.11% accuracy; n=148 brain regions; t=1.89, p=0.035, df = 25, Cohen’s d=0.17, Figure 2A; also see Figure 2B for topographic map of feature importance scores) and voxel-space (74.51% ± SD 7.34% accuracy; n=15684; t=7.18, p<0.001, df = 25, Cohen’s d=0.76, Figure 2A; also see Figure 2C for topographic map of feature importance scores; Destrieux et al., 2010) decoders exhibited greater accuracy than all regional voxel-space decoders constructed from individual brain areas (Figure 2D; maximum accuracy of 68.77% ± SD 7.6%; see also Figure 2—figure supplements 1 and 2). Thus, optimal decoding required information from multiple brain regions, predominantly contralateral to the hand engaged in the skill task (Figure 2B and C).

Figure 2. Spatial and oscillatory contributions to neural decoding of finger identities.

(A) Contribution of whole-brain oscillatory frequencies to decoding. When trained on broadband activity relative to narrow frequency band features, decoding accuracy (i.e. test sample performance) was highest for whole-brain voxel-space (74.51% ± SD 7.34%, t=8.08, p<0.001) and parcel-space (70.11% ± SD 7.11%, t=13.22, p<0.001) MEG activity. Thus, decoders trained on whole-brain broadband data consistently outperformed those trained on narrowband activity. Dots depict decoding accuracy for each participant. Center line of box plots indicates the group median, while notches represent the 95% confidence interval of the group median. Box limits indicate the 1st and 3rd quartiles while whisker lengths are set at the extreme value ≤1.5×IQR. Outlier values located outside of the whisker range are marked with “+” symbols. *p<0.05, **p<0.01, ***p<0.001, n.s. - no statistical significance (p>0.05). (B) Whole-brain parcel-space decoding. Color-coded brain surface plot displaying the relative importance of individual brain regions (parcels) to broadband whole-brain parcel-space decoding performance (far-left light gray box plot in A). (C) Whole-brain voxel-space decoding. Color-coded brain surface plot displaying the relative importance of individual voxels to broadband whole-brain voxel-space decoding performance (far-left dark gray box plot in A). (D) Regional voxel-space decoding. Broadband voxel-space decoding performance for top-ranked brain regions across the group is displayed on a standard (FreeSurfer fsaverage) brain surface and color-coded by accuracy. Note that while whole-brain parcel- and voxel-space decoders relied more on information from brain regions contralateral to the engaged hand, regional voxel-space decoders performed similarly for bilateral sensorimotor regions.

Figure 2—figure supplement 1. Oscillatory contributions at individual brain regions.

Decoding performance of regional voxel-space activity patterns within individual brain areas for broadband and each narrowband oscillatory range is displayed in the form of a heatmap for both the left and right hemisphere. Optimal decoding performance for broadband regional voxel-space decoders was obtained from bilateral superior frontal (Left: 68.77% ± SD 7.6%; Right: 67.52% ± SD 6.78%), middle frontal (Left: 63.41% ± SD 7.58%; Right: 62.78% ± SD 76.94%), pre-central (Left: 62.37% ± SD 6.32%; Right: 62.69% ± SD 5.94%), and post-central (Left: 61.71% ± SD 6.62%; Right: 61.09% ± SD 6.2%) brain regions. Superior parietal, central, paracentral, anterior cingulate, and precuneus regions also showed broadband decoding performance exceeding 60%. With respect to decoders constructed from narrowband oscillatory input features, only Delta-band voxel-space activity from bilateral superior frontal regions achieved at least 60% decoding accuracy of keypresses.
Figure 2—figure supplement 2. Distribution of correlation coefficients between parcel-space time-series and their constituent voxels.

Data is shown for all subjects. Parcels represented in the regional voxel-space features of the hybrid-space decoder are marked with red vertical boxes (bilateral superior frontal, middle frontal, pre-central, and post-central regions). The y-axis indicates the absolute correlation coefficients for each voxel time series with the time series of the parcel it is a member of (1=complete redundancy; 0=orthogonality). Note that while signal in some voxels correlates strongly with parcel-space time series, others are fully orthogonal. That is, the degree to which information obtained at the two different spatial scales is complementary (or redundant) varies substantially over the regional voxel space. This finding is consistent with the documented increase in correlational structure of neural activity across larger spatial scales that does not reflect perfect dependency or orthogonality (Munn et al., 2024). The normalized cumulative distributions of parcel-to-voxel-space correlations depicted on the right show that voxels included in the hybrid-space decoder (red) are correlated less overall (two-sample Kolmogorov-Smirnov test: D=0.2484, p<1 × 10^-10) with their respective parcel-space time-series relative to excluded voxels (gray).

Next, given that the brain simultaneously processes information more efficiently across multiple spatial and temporal scales (Munn et al., 2024; Buch et al., 2017; Lisman and Buzsáki, 2008), we asked if the combination of lower resolution whole-brain and higher resolution regional brain activity patterns further improves keypress prediction accuracy. We constructed hybrid-space decoders (N=1295 ± 20 features; Figure 3A) combining whole-brain parcel-space activity (n=148 features; Figure 2B) with regional voxel-space activity from a data-driven subset of brain areas (n=1147 ± 20 features; Figure 2D). This subset covers brain regions showing the highest regional voxel-space decoding performances (top regions across all subjects shown in Figure 2D; Materials and methods – Hybrid Spatial Approach). Accuracy was higher for hybrid- (78.15% ± SD 7.03%; weighted mean F1 score of 0.78 ± SD 0.07) than for voxel- (74.51% ± SD 7.34%; paired t-test: t=6.30, p<0.001, df = 25, Cohen’s d=0.39) and parcel-space decoders (70.11% ± SD 7.48%; paired t-test: t=12.08, p<0.001, df = 25, Cohen’s d=0.98, Figure 3B, Figure 3—figure supplements 1 and 6). Note that while features from contralateral brain regions were more important for whole-brain decoding (in both parcel- and voxel-spaces), regional voxel-space decoders performed best for bilateral sensorimotor areas on average across the group. Thus, a multi-scale hybrid-space representation best characterizes the keypress action manifolds.
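The construction of a hybrid-space feature vector can be sketched as follows. This is a simplified illustration rather than the study's pipeline: it handles a single time sample, takes parcel-space features as the mean of each parcel's voxels, and appends the voxel-space features of a chosen set of top-ranked parcels. The names `parcel_features`, `hybrid_features`, `parcel_of`, and `top_parcels` are hypothetical.

```python
def parcel_features(voxels, parcel_of):
    """Average voxel activity within each parcel (lower-resolution features)."""
    sums, counts = {}, {}
    for value, parcel in zip(voxels, parcel_of):
        sums[parcel] = sums.get(parcel, 0.0) + value
        counts[parcel] = counts.get(parcel, 0) + 1
    # One feature per parcel, ordered by parcel label.
    return [sums[p] / counts[p] for p in sorted(sums)]


def hybrid_features(voxels, parcel_of, top_parcels):
    """Concatenate all parcel-space features with voxel-space features
    from the top-ranked parcels (higher-resolution features)."""
    whole_brain = parcel_features(voxels, parcel_of)
    regional = [v for v, p in zip(voxels, parcel_of) if p in top_parcels]
    return whole_brain + regional


# Toy example: five voxels in three parcels, parcel 1 is "top-ranked".
voxels = [1.0, 3.0, 2.0, 4.0, 6.0]
parcel_of = [0, 0, 1, 1, 2]
features = hybrid_features(voxels, parcel_of, top_parcels={1})
print(features)  # [2.0, 3.0, 6.0, 2.0, 4.0]
```

Note that the top-ranked parcels contribute twice in this sketch, once as a parcel average and once as individual voxels, which corresponds to the spatially overlapping arrangement of the two feature sets.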

Figure 3. Hybrid spatial approach for neural decoding during skill learning.

(A) Pipeline. Sensor-space MEG data (N=272 channels) were source-localized (voxel-space features; N=15,684 voxels), and then parcellated (parcel-space features; N=148) by averaging the activity of all voxels located within an individual region defined in a standard template space (Desikan-Killiany Atlas). Individual regional voxel-space decoders were then constructed and ranked. The final hybrid-space keypress state (i.e. 4-class) decoder was constructed using all whole-brain parcel-space features and top-ranked regional voxel-space features (see Materials and methods). (B) Decoding performance across parcel, voxel, and hybrid spaces. Note that decoding performance was highest for the hybrid space approach compared to performance obtained for whole-brain voxel- and parcel spaces. Addition of linear discriminant analysis (LDA)-based dimensionality reduction further improved decoding performance for both parcel- and hybrid-space approaches. Each dot represents accuracy for a single participant and method. Center line of box plots indicates the group median, while notches (and shaded areas) represent the 95% confidence interval of the group median. Box limits indicate the 1st and 3rd quartiles while whisker lengths are set at the extreme value ≤1.5×IQR. Outlier values located outside of the whisker range are marked with “+” symbols. ***p<0.001 and *p<0.05. (C) Confusion matrix of individual finger identity decoding for hybrid-space manifold features. True predictions are located on the main diagonal. Off-diagonal elements in each row depict false-negative predictions for each finger, while off-diagonal elements in each column indicate false-positive predictions. Please note that the index finger keypress had the highest false-negative misclassification rate (11.55%).

Figure 3—figure supplement 1. Contribution of whole-brain oscillatory frequencies to decoding.

Accuracy for decoders trained on four different input feature spaces—sensor, whole-brain parcel, whole-brain voxel, and hybrid (combination of whole-brain parcel plus regional voxel)—was highest for broadband MEG activity, followed by Delta-band activity. The hybrid approach resulted in the highest decoding accuracy, regardless of whether input features were broadband or narrowband-limited. Sensor-, parcel-, and voxel-space decoders displayed similar accuracy with respect to one another for broadband MEG activity, and also for all narrowband ranges assessed. Dots depict decoding accuracy for each participant. Center line of box plots indicates the group median, while notches represent the 95% confidence interval of the group median. Box limits indicate the 1st and 3rd quartiles while whisker lengths are set at the extreme value ≤1.5×IQR. Outlier values located outside of the whisker range are marked with “+” symbols. ***p<0.001, n.s. - no statistical significance (p>0.05).
Figure 3—figure supplement 2. Comparison of different dimensionality reduction techniques.

Dimensionality reduction was applied to the input features for each approach (parcel-space: N=148; voxel-space: N=15684; hybrid-space: N=1295; Maaten and Postma, 2009). The results with principal component analysis (PCA, in green), multi-dimensional scaling (MDS, in blue), minimum redundant maximum relevance algorithm (MRMR, in red), linear discriminant analysis (LDA, in black) are shown in comparison to performance obtained using all input features (in magenta). For parcel-space input features, all these approaches increased the mean decoding accuracy with PCA and LDA (both of which result in extraction of orthogonal features) showing statistically significant improvement (one-way ANOVA: F=13.05, p<0.001; post hoc Tukey tests: p=0.032; PCA: p<0.001; LDA: p>0.05). For voxel-space features, there was no statistically significant improvement with any of the approaches (p>0.05). While MRMR resulted in the largest voxel-space decoding accuracy improvement, it was not statistically significant (post hoc Tukey test: p=0.14), and application of LDA dimensionality reduction actually reduced performance dramatically. Uniquely for hybrid-space features, all dimensionality reduction techniques improved decoding performance significantly (one-way ANOVA: F=21.32; post hoc Tukey tests: p<0.05) with the largest improvement observed following application of LDA. Center line of box plots indicates the group median, while notches represent the 95% confidence interval of the group median. Box limits indicate the 1st and 3rd quartiles while whisker lengths are set at the extreme value ≤1.5×IQR. Outlier values located outside of the whisker range are marked with “+” symbols. ***p<0.001, **p<0.01, *p<0.05, n.s. - no statistical significance (p>0.05).
Figure 3—figure supplement 3. ICA artefacts do not contribute to decoding.

(A) Example of ICA component time-series for components labeled as artifacts from a single subject during MEG data pre-processing. The features of these components are consistent with known motion and physiological artifacts in MEG data. (B) 4-class confusion matrix and (C) decoding performance of keypress action labels from ICA components labeled as artifacts and removed from the MEG data during pre-processing. These components failed to predict keypress labels above empirically determined chance levels (as shown by decoding performance after random label shuffling). Note that in all cases, decoding performance from movement and physiological artifacts was substantially lower than 4-class MEG hybrid-space decoding for all participants. (D) Head position was assessed at the beginning and at the end of each recording and used to measure head movement. The mean measured head movement across the study group was 1.159 mm (±1.077 SD). Center line of the box plot indicates the group median, while notches represent the 95% confidence interval of the group median. Box limits indicate the 1st and 3rd quartiles while whisker lengths are set at the extreme value ≤1.5×IQR. Outlier values located outside of the whisker range are marked with “+” symbols.
Figure 3—figure supplement 4. Confusion matrices for decoding performance on Day 2 Retest (A) and Day 2 Control (B) data.

Note that the hybrid-space decoding strategy generalized to Day 2 data with 87.11% overall accuracy for keypresses embedded within the trained sequence (Day 2 Retest) and 79.44% overall accuracy for keypresses embedded within untrained control sequences (Day 2 Control).
Figure 3—figure supplement 5. Decoding performance across temporal scales.

(A) Average decoding accuracies across participants with varying window parameters. The x-axis indicates the onset of the time window (in ms) used to relate MEG activity time series to individual keypresses (i.e. KeyDown event = 0 ms), while the y-axis indicates the window duration (in ms). The heatmap color denotes the decoding accuracy for all window onset/duration pairings. The best decoding accuracy across subjects was obtained using a window duration of 200 ms with the leading edge aligned to the KeyDown event (i.e. 0 ms; marked by the dashed lines and open circle). (B) Decoder window parameters (onset and duration) used for each subject in reported decoder accuracy comparisons (Figures 2–4). Please note that the group-optimal set of parameters (window onset = 0 ms; window duration = 200 ms; LDA dimensionality reduction) was utilized for all contextualization analyses (Figure 5) to allow for comparison across participants. Center line of box plots indicates the group median, while notches represent the 95% confidence interval of the group median. Box limits indicate the 1st and 3rd quartiles while whisker lengths are set at the extreme value ≤1.5×IQR. Outlier values located outside of the whisker range are marked with “+” symbols.
Figure 3—figure supplement 6. Comparison of decoding performances with two different hybrid approaches.

HybridOverlap (regional voxel-space features from top-ranked parcels combined with all whole-brain parcel-space features as shown in Figure 3B, Figure 3—figure supplements 1; 35 of the manuscript) and HybridNon-overlap (regional voxel-space features of top-ranked parcels and spatially non-overlapping whole-brain parcel-space features). Filled circle markers represent decoding accuracy for individual subjects. Dashed lines indicate within-subject performance changes between decoding approaches. Note that the HybridOverlap (the approach used in our manuscript) significantly outperforms the HybridNon-overlap approach (Wilcoxon signed rank test, z=3.7410, p=1.8326e-04), despite the removed features (n=8) only comprising less than 1% of the overall input feature space. These results indicate that the spatially overlapping whole-brain (lower resolution) parcel-space and regional (higher resolution) voxel-space features provide complementary—as opposed to redundant—information to the hybrid-space decoder. Center line of box plots indicates the group median, while notches represent the 95% confidence interval of the group median. Box limits indicate the 1st and 3rd quartiles while whisker lengths are set at the extreme value ≤1.5×IQR.
Figure 3—figure supplement 7. Comparison of different decoder methods.

Performance for all different machine learning decoders assessed is shown for each participant. The results show that the linear discriminant analysis (LDA) classifier outperformed other methods, on average, across the group. Decoding analysis performance comparisons reported in the current study utilized the LDA decoder for all subjects. Center line of box plots indicates the group median, while notches represent the 95% confidence interval of the group median. Box limits indicate the 1st and 3rd quartiles while whisker lengths are set at the extreme value ≤1.5×IQR. Outlier values located outside of the whisker range are marked with “+” symbols.

We implemented different dimensionality reduction or manifold extraction strategies including principal component analysis (PCA), multi-dimensional scaling (MDS), minimum redundant maximum relevance (MRMR), and linear discriminant analysis (LDA; Maaten and Postma, 2009) to map the input feature (parcel, voxel, or hybrid) space to a low-dimensional latent space (Natraj et al., 2022). LDA-based manifold extraction led to the greatest classifier performance gains, improving keypress decoding accuracy to 90.47% ± SD 3.44% (Figure 3B; weighted mean F1 score = 0.91 ± SD 0.05). In comparison to the hybrid-space decoder, whole-brain parcel-space decoder performance also improved following LDA-based dimensionality reduction (82.95% ± SD 5.48%), while whole-brain voxel-space decoder accuracy dropped substantially (40.38% ± SD 6.78%; also see Figure 3—figure supplement 2).

Notably, decoding of index finger keypresses (executed at two different ordinal positions in the sequence) exhibited the highest false negative (0.115 per keypress) and lowest false positive (0.067 per prediction) misclassification rates compared with all other digits (false negative rate range = [0.067 0.114]; false positive rate range = [0.085 0.131]; Figure 3C), raising the hypothesis that the same action could be differentially represented when executed within different contexts (i.e. at different locations within the skill sequence). Testing the keypress state (4-class) hybrid decoder performance on Day 1 after randomly shuffling keypress labels for held-out test data resulted in a performance drop approaching expected chance levels (22.12% ± SD 9.1%; Figure 3—figure supplement 3C). An alternate decoder trained on ICA components labeled as movement or physiological artifacts (e.g. head movement, ECG, eye movements, and blinks; Figure 3—figure supplement 3A, D) and removed from the original input feature set during the pre-processing stage approached chance-level performance (Figure 4—figure supplement 3), indicating that the 4-class hybrid decoder results were not driven by task-related artifacts.
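The per-class misclassification rates discussed above can be read off a confusion matrix. The following Python sketch assumes one plausible reading of the reported units: the false negative rate is expressed per true keypress (misses divided by the row total) and the false positive rate per prediction (false alarms divided by the column total). The function name, labels, and counts are hypothetical illustrations, not the study's data.

```python
def misclassification_rates(confusion, labels):
    """Return {label: (false_negative_rate, false_positive_rate)}.

    confusion[i][j] = number of keypresses of true class i predicted as class j.
    False negatives are normalized per true keypress (row total);
    false positives are normalized per prediction (column total).
    """
    rates = {}
    for k, label in enumerate(labels):
        tp = confusion[k][k]
        true_count = sum(confusion[k])
        pred_count = sum(row[k] for row in confusion)
        fn_rate = (true_count - tp) / true_count if true_count else 0.0
        fp_rate = (pred_count - tp) / pred_count if pred_count else 0.0
        rates[label] = (fn_rate, fp_rate)
    return rates


# Toy 4-class example with made-up counts.
labels = ["little", "ring", "middle", "index"]
toy = [
    [9, 0, 0, 1],
    [0, 8, 1, 1],
    [0, 0, 10, 0],
    [2, 1, 1, 6],
]
for label, (fn, fp) in misclassification_rates(toy, labels).items():
    print(f"{label}: FN rate = {fn:.3f}, FP rate = {fp:.3f}")
```

Under this reading, a class that is often missed but rarely predicted in error shows a high false negative rate together with a low false positive rate, which is the pattern described for the index finger.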

Utilizing the highest performing decoders that included LDA-based manifold extraction, we assessed the robustness of hybrid-space decoding over multiple sessions by applying it to data collected on the following day during the Day 2 Retest (9-trial retest of the trained sequence) and Day 2 Control (single-trial performance of 9 different untrained sequences) blocks. The decoding accuracy for Day 2 MEG data remained high (87.11% ± SD 8.54% for the trained sequence during Retest, and 79.44% ± SD 5.54% for the untrained Control sequences; Figure 3—figure supplement 4). Thus, index finger classifiers constructed using the hybrid decoding approach robustly generalized from Day 1 to Day 2 across trained and untrained keypress sequences.

Inclusion of keypress sequence context location optimized decoding performance

Next, we tracked the trial-by-trial evolution of keypress action manifolds as training progressed. Within-subject keypress neural representations progressively differentiated during early learning. A representative example in Figure 4A (top row) depicts increased four-digit representation clustering across trials 1, 11, and 36. The cortical representation of these clusters changed over the course of training, beginning with predominant involvement of contralateral pre-central areas in trial 1 before transitioning to greater contralateral post-central, superior frontal, and middle frontal cortex contributions in trials 11 and 36 (Figure 4A, bottom row), paralleling improvements in decoding performance (see Figure 4—figure supplement 1 for trial-by-trial quantitative feature importance score changes during skill learning).

Figure 4. Evolution of keypress neural representations with skill learning.

(A) Keypress neural representations differentiate during early learning. t-SNE distribution of neural representation of each keypress (top scatter plots) is shown for trial 1 (start of training; top-left), 11 (end of early learning; top-center), and 36 (end of training; top-right) for a single representative participant. Individual keypress manifold representation clustering in trial 11 (top-center; end of early learning) depicts sub-clustering for the index finger keypress performed at the two different ordinal positions in the sequence (IndexOP1 and IndexOP5), which remains present by trial 36 (top-right). Spatial distribution of regional contributions to decoding (bottom brain surface maps). The surface color heatmap indicates feature importance scores across the brain. Note that decoding contributions shifted from contralateral right pre-central cortex at trial 1 (bottom-left) to contralateral superior and middle frontal cortex at trials 11 (bottom-center) and 36 (bottom-right). (B) Confusion matrix for 5-class decoding of individual sequence items. Decoders were trained to classify contextual representations of the keypresses (i.e. 5-class classification of the sequence elements 4-1-3-2-4). Note that the decoding accuracy increased to 94.15% ± SD 4.84% and the misclassification of keypress 4 was significantly reduced (from 141 to 82). (C) Trial-by-trial classification accuracy for 2-class decoder (IndexOP1 vs. IndexOP5). A decoder (200ms window duration aligned to the KeyDown event) was trained to differentiate between the two index finger keypresses embedded at different positions within the practiced skill sequence (IndexOP1=index finger keypress at ordinal position 1 of the sequence; IndexOP5=index finger keypress at ordinal position 5 of the sequence). Decoder accuracy progressively improved over early learning, stabilizing around 96% by trial 11 (end of early learning). 
Similar results were observed for other decoding window sizes (50, 100, 150, 250, and 300ms; see Figure 4—figure supplement 2). Taken together, these findings indicate that the neural feature space evolves over early learning to incorporate sequence location information. Shaded region indicates the 95% confidence interval of the group mean.

Figure 4—figure supplement 1. Quantification of trial-by-trial parcel-space feature importance scores during skill learning.

Trial-by-trial changes in parcel-space feature importance scores are shown for right superior frontal, middle frontal, pre-central, and post-central cortex (i.e. the contralateral regions showing the highest regional voxel-space decoding accuracy). Note that the feature importance is initially higher for the contralateral pre-central cortex in early trials before shifting towards the contralateral middle and superior frontal cortex during later trials, as can be seen with the divergence of line plots beginning around trial 11.
Figure 4—figure supplement 2. Trial-by-trial classification accuracy for 2-class decoder (IndexOP1 vs. IndexOP5).

Several decoders (with varying window durations aligned to the KeyDown event) were trained to differentiate between the two index finger keypresses embedded at different positions within the practiced skill sequence (IndexOP1 at ordinal position 1 vs. IndexOP5 at ordinal position 5). Decoding accuracy for the 200ms duration windows (i.e. the optimal window size for 5-class decoding of individual keypresses) progressively improves over early learning, stabilizing around 96% by trial 11 (end of early learning). Similar results were observed for all other decoding window sizes (50, 100, 150, 250, and 300ms), with overall accuracy slightly lower compared to 200ms. These findings indicate that the neural representations of the skill action are updated over early learning to incorporate sequence location information. Shaded regions indicate the 95% confidence interval of the group mean.
Figure 4—figure supplement 3. Eye movement features do not contribute to decoding.

(A) Scatter plot of gaze positions at the KeyDown event and 200ms after the KeyDown event (i.e. beginning and ending of window used for decoding keypress labels from MEG input features) from a representative participant. Transparent gray dots indicate all sampled gaze positions during practice trials. The overall mean gaze position during practice trials is indicated by the black filled circle marker. Colored right-pointing triangle markers indicate the gaze position at the KeyDown event for each ordinal position keypress (IndexOP1 – magenta; LittleOP2 – yellow; MiddleOP3 – blue; RingOP4 – green; IndexOP5 – brown), while left-pointing triangle markers indicate the gaze position 200ms after the KeyDown event. The mean gaze position for these two time points is indicated by the larger-sized triangle markers. On average, gaze position is largely fixed for the OP1 and OP3 keypresses, moves from left to right for OP2 and OP4 keypresses, and from right to left for OP5 keypresses (which is when the asterisk moves leftward from the last sequence item back to the first). (B) Confusion matrix showing that three eye movement features fail to predict asterisk position on the task display above chance levels (Fold 1 test accuracy = 0.21718; Fold 2 test accuracy = 0.22023; Fold 3 test accuracy = 0.21859; Fold 4 test accuracy = 0.22113; Fold 5 test accuracy = 0.21373; Overall cross-validated accuracy = 0.2181). Since the ordinal position of the asterisk on the display is highly correlated with the ordinal position of individual keypresses in the sequence, this analysis provides strong evidence that keypress decoding performance from MEG features is not explained by systematic relationships between finger movement behavior and eye movements (i.e. behavioral artifacts). (C) 5-class decoding of ordinal position keypress labels from eye movement recording features approached empirically determined chance levels (as shown by decoding performance after random label shuffling). 
Note that all decoding performances from eye movement data were substantially lower than MEG hybrid-space decoding for all participants. Sample distribution means are indicated by the solid blue horizontal line with the 95% confidence interval of the group mean indicated by the shaded blue rectangular box.

The trained skill sequence required pressing the index finger twice (4-1-3-2-4) at two contextually different ordinal positions (sequence positions 1 and 5). Inclusion of sequence location information (i.e. sequence context) for each keypress action (five sequence elements with the one keypress represented twice at two different locations) improved decoding accuracy (t=7.09, p<0.001, df = 25, Cohen’s d=0.86, Figure 4B) from 90.47% (± SD 3.44%) to 94.15% (± SD 4.84%; weighted mean F1 score: 0.94), and reduced overall misclassifications by 54.3% (from 219 to 100; Figures 3C and 4B). The improved decoding accuracy is supported by greater differentiation in neural representations of the index finger keypresses performed at positions 1 and 5 of the sequence (Figure 4A), and by the trial-by-trial increase in 2-class decoding accuracy over early learning (Figure 4C) across different decoder window durations (Figure 4—figure supplement 2). As expected, the 5-class hybrid-space decoder performance approached chance levels when tested with randomly shuffled keypress labels (18.41% ± SD 7.4% for Day 1 data; Figure 4—figure supplement 3C). Task-related eye movements did not explain these results since an alternate 5-class decoder constructed from three eye movement features (gaze position at the KeyDown event, gaze position 200ms later, and peak eye movement velocity within this window; Figure 4—figure supplement 3A) performed at chance levels (cross-validated test accuracy = 0.2181; Figure 4—figure supplement 3B, C).
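The logic of the shuffled-label control can be sketched as follows. This is a simplified illustration rather than the study's pipeline: it assumes balanced classes and a hypothetical perfect decoder, and estimates the empirical chance level by scoring fixed predictions against randomly permuted test labels. All names and parameter values are assumptions.

```python
import random


def accuracy(true_labels, predicted_labels):
    """Fraction of positions where the prediction matches the label."""
    hits = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return hits / len(true_labels)


def shuffled_label_accuracy(true_labels, predicted_labels,
                            n_permutations=1000, seed=0):
    """Mean accuracy after randomly permuting the test labels."""
    rng = random.Random(seed)          # seeded for reproducibility
    shuffled = list(true_labels)
    scores = []
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        scores.append(accuracy(shuffled, predicted_labels))
    return sum(scores) / len(scores)


# Five ordinal-position classes for the 4-1-3-2-4 sequence.
classes = ["IndexOP1", "LittleOP2", "MiddleOP3", "RingOP4", "IndexOP5"]
true_labels = classes * 40              # balanced toy test set, 40 per class
predicted_labels = list(true_labels)    # hypothetical perfect decoder

chance = shuffled_label_accuracy(true_labels, predicted_labels)
print(f"unshuffled accuracy = {accuracy(true_labels, predicted_labels):.2f}")
print(f"empirical chance after shuffling = {chance:.3f}")
```

With five balanced classes, the shuffled accuracy is expected to settle near 1/5, which is the reference against which the reported shuffled-label performance can be compared.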

On Day 2, incorporating contextual information into the hybrid-space decoder enhanced classification accuracy for the trained sequence only (improving from 87.11% for 4-class to 90.22% for 5-class), while performing at or below-chance levels for the control sequences (≤30.22% ± SD 0.44%). Thus, the accuracy improvements resulting from inclusion of contextual information in the decoding framework were specific to the trained skill sequence.

Neural representation of keypress sequence location diverged during early skill learning

We used a Euclidean distance measure to evaluate the differentiation of the neural representation manifold of the same action (i.e. an index-finger keypress) executed within different local sequence contexts (i.e. ordinal position 1 vs. ordinal position 5; Figure 5). To make these distance measures comparable across participants, a new set of classifiers was then trained with group-optimal parameters (i.e. broadband hybrid-space MEG data with subsequent manifold extraction; Figure 3—figure supplement 2) and LDA classifiers (Figure 3—figure supplement 7) trained on 200ms duration windows aligned to the KeyDown event (see Materials and methods, Figure 3—figure supplement 5).

Figure 5. Neural representation distance between index finger keypresses performed at two different ordinal positions within a sequence.

(A) Contextualization increases over Early Learning during Day 1 Training. Online (green) and offline (purple) neural representation distances (contextualization) between two index finger key presses performed at ordinal positions 1 and 5 of the trained sequence (4-1-3-2-4) are shown for each trial during Day 1 Training. Both online and offline contextualization between the two index finger representations increases sharply over Early Learning before stabilizing across later Day 1 Training trials. Shaded regions indicate the 95% confidence interval of the group mean. (B) Contextualization develops predominantly during rest periods (offline) on Day 1. The cumulative neural representation differences during early learning were significantly greater over rest (Offline contextualization; right) than during practice (Online contextualization; left) periods (t=4.84, p<0.001, df = 25, Cohen’s d=1.2). Center line of box plot indicates the group median, while notches represent the 95% confidence interval of the group median. Box limits indicate the 1st and 3rd quartiles while whisker lengths are set at the extreme value ≤1.5×IQR. (C) Contextualization acquired on Day 1 was retained on Day 2 specifically for the trained sequence. The neural representation differences assessed across both rest and practice for the trained sequence (4-1-3-2-4) were retained at Day 2 Retest. 
This is in stark contrast with the reduction in contextualization for several untrained sequences controlling for: (1) index finger keypresses located at the same ordinal positions 1 and 5 but within a different intervening sequence pattern (Pattern Specificity Control: 4-2-3-1-4, 51.05% lower contextualization); (2) use of a finger different than the index (little or ring finger) in both ordinal positions 1 and 5 (Finger Specificity Control: 2-1-3-4-2, 1-4-2-3-1 and 2-3-1-4-2; 35.80% lower contextualization); and (3) multiple index finger keypresses occurring at ordinal positions other than 1 and 5 (Position Specificity Control: 4-2-4-3-1 and 1-4-3-4-2; 22.06% lower contextualization). Note that offline contextualization cannot be measured for the Day 2 Control sequences as each sequence was only performed over a single trial. Error bars indicate S.E.M.

Figure 5—figure supplement 1. Relationship between offline neural representational changes and micro-offline learning.

(A) Relationship between offline neuronal representational changes and micro-offline learning. Offline contextualization—calculated as the Euclidean distance between the neural representations observed for the first IndexOP1 keypress from practice trial, n, and the last IndexOP5 keypress from practice trial, n-1—increased over early learning. A linear regression analysis (shown in the inset) revealed a strong temporal relationship (correlation coefficient [r]=0.903 and coefficient of variance explained [R2]=0.816) between contextualization and cumulative micro-offline gains over early learning. Shaded regions indicate the 95% confidence interval of the group mean. (B) Changes in offline contextualization for different decoding window durations as a function of rest breaks. We constructed decoders from different MEG input feature time windows (window durations of 50, 100, 150, 200, 250, and 300ms; all aligned to the KeyDown event), to assess the robustness of the offline contextualization finding with respect to this parameter selection. Offline contextualization showed similar trends for all options tested. (C) Relationship between offline neural representational changes and micro-offline learning across all window durations. The linear regression analysis from (A) was repeated for all contextualization measures from (B) obtained after varying the MEG input feature window size (50–300ms). This strong temporal relationship was observed for all window durations (0.598 ≤ R2 ≤ 0.816), except for 300ms (R2=0.284) where temporal overlap of individual keypress features was most prominent.
Figure 5—figure supplement 2. Trial-by-trial trends for different measurement approaches of offline and online contextualization changes.

(A) Offline contextualization between the last sequence of a preceding trial and the second sequence of the subsequent one (skipping the first sequence of that trial) rendered a comparable result to the measure reported in Figure 5, Figure 5—figure supplement 1 which use the first sequence—inconsistent with a possible confounding effect of pre-planning (Ariani and Diedrichsen, 2019). Shaded regions indicate the 95% confidence interval of the group mean. (B) Two different measurement approaches were used to characterize online contextualization changes. The sequence-based approach calculated the mean distance between IndexOP1 and IndexOP5 for each correct sequence iteration within a trial (green). A second trial-based approach was also implemented, which controlled for the passage of time between observations used in both online and offline distance measures (10 s between IndexOP1 and IndexOP5 observations in both cases). Note that the trial-based approach showed no increase in online contextualization over early learning. Importantly, the overall magnitude of online contextualization by the end of early learning was similar for both measurement approaches, and both showed reduced online relative to offline contextualization. Shaded regions indicate the 95% confidence interval of the group mean.
Figure 5—figure supplement 3. Online contextualization versus micro-online learning.

The relationship between online contextualization and online learning is shown for both sequence- (A, left) and trial-based (B, right) distance measurement approaches. There was no significant relationship between online learning and online contextualization regardless of the measurement approach. Shaded regions indicate the 95% confidence interval of the group mean.
Figure 5—figure supplement 4. Within-subject correlations between online and offline contextualization changes versus learning.

Pirate plots displaying individual subject correlation coefficients for offline (i.e. over rest) and online (i.e. during practice) contextualization changes versus micro-offline and -online performance gains. Zero correlation is marked by the horizontal dashed line. Distribution means are indicated by the solid black horizontal line with the 95% confidence interval of the group mean indicated by the shaded rectangular box. Within-subject correlations were significantly greater for offline contextualization changes versus micro-offline performance gains than for online contextualization changes versus either micro-offline or -online performance gains. The average correlation between offline contextualization and micro-offline gains within individuals was significantly greater than zero (left; t=3.87, p=0.00035, df = 25, Cohen’s d=0.76) and stronger than correlations between online contextualization and either micro-online (middle; t=3.28, p=0.0015, df = 25, Cohen’s d=1.2) or micro-offline gains (right; t=3.7021, p=5.3013e-04, df = 25, Cohen’s d=0.69).
Figure 5—figure supplement 5. Online versus offline changes in keypress transition patterns.

(A) Trial-by-trial Euclidean distance between the relative share of each keypress transition time to the full sequence duration (i.e. differences in typing rhythm). This distance was calculated for the first and last sequence of each trial (online pattern distance; green) and the last sequence of a trial versus the first sequence of the next (offline pattern distance; purple). Shaded regions indicate the 95% confidence interval of the group mean. (B) Cumulative online (green; left) and offline (purple; right) pattern distances recorded over all forty-five trials covering Days 1 and 2. Note the comparable online and offline typing rhythm changes do not explain differences between online and offline contextualization, which is fully developed by trial 11 (Figure 5).
Figure 5—figure supplement 6. The relationship between adjacent index finger transitions and online contextualization.

Scatter plot showing the sum of adjacent index finger keypress transition times (i.e. the 4–4 transition at the conclusion of one sequence iteration and the 4–1 transition at the beginning of the next sequence iteration) versus online contextualization distances measured during practice trials. Both the keypress transition times and online contextualization scores were z-score normalized within individual subjects and then concatenated into a single data superset. A simple linear regression between keypress transition time predictor and the online contextualization response variable showed a very weak linear relationship between the two (R2=0.00507, F[1,3202]=16.3). This result shows that contextualization of index finger representations does not reflect the amount of overlap between adjacent keypresses.
Figure 5—figure supplement 7. Between-subject differences in typing speed versus online contextualization.

(A) Between-subject relationship between plateau performance speed and online contextualization. The plateau performance typing speed showed no significant relationship with the degree of online contextualization (R2=0.028, p=0.41). Each dot represents the maximum speed attained and the corresponding degree of contextualization of each participant. Thus, the magnitude of online contextualization was not dependent on how fast individuals could perform the task at the end of early learning. (B) Trial-by-trial relationship between typing speed and degree of online contextualization. We also performed a trial-by-trial regression analysis that related the degree of online contextualization for each trial with the median typing speed for that trial. The R2 values obtained for regression analyses performed on individual trials were also low and not statistically significant (mean R2=0.06; p>0.05). Red and black horizontal lines indicate the group median and mean R2 values, respectively.

The Euclidean distance between neural representations of IndexOP1 (i.e. index finger keypress at ordinal position 1 of the sequence) and IndexOP5 (i.e. index finger keypress at ordinal position 5 of the sequence) increased progressively during early learning (Figure 5A)—predominantly during rest intervals (offline contextualization) rather than during practice (online) (t=4.84, p<0.001, df = 25, Cohen’s d=1.2; Figure 5B; Figure 5—figure supplement 1A). An alternative online contextualization measure that equalized the time interval between the observations used for online and offline comparisons (Trial-based; 10 s between IndexOP1 and IndexOP5 observations in both cases) rendered a similar result (Figure 5—figure supplement 2B).
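The two distance measures can be sketched in Python as follows. This is a minimal illustration under stated assumptions: each trial is represented as a list of correct sequence iterations, and each iteration stores one low-dimensional feature vector per index finger keypress. The data layout, function names, and toy 2-D vectors are hypothetical. Online contextualization follows the sequence-based definition (mean IndexOP1 to IndexOP5 distance within a trial), and offline contextualization compares the last IndexOP5 of one trial with the first IndexOP1 of the next.

```python
import math


def online_contextualization(trial):
    """Mean IndexOP1-IndexOP5 distance across sequence iterations in a trial."""
    distances = [math.dist(seq["IndexOP1"], seq["IndexOP5"]) for seq in trial]
    return sum(distances) / len(distances)


def offline_contextualization(previous_trial, next_trial):
    """Distance between the last IndexOP5 of trial n-1 and the first
    IndexOP1 of trial n (i.e. across the intervening rest period)."""
    return math.dist(previous_trial[-1]["IndexOP5"], next_trial[0]["IndexOP1"])


# Toy 2-D representations (made-up values, not MEG-derived features).
trial_1 = [
    {"IndexOP1": (0.0, 0.0), "IndexOP5": (0.0, 1.0)},
    {"IndexOP1": (0.0, 0.0), "IndexOP5": (0.0, 3.0)},
]
trial_2 = [
    {"IndexOP1": (4.0, 0.0), "IndexOP5": (4.0, 2.0)},
]

print(online_contextualization(trial_1))              # within-practice measure
print(offline_contextualization(trial_1, trial_2))    # across-rest measure
```

The trial-based online variant described above would instead compare observations separated by the same time interval as the offline measure; it is not implemented in this sketch.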

Offline contextualization strongly correlated with cumulative micro-offline gains (r=0.903, R2=0.816, p<0.001; Figure 5—figure supplement 1A, inset) across decoder window durations ranging from 50 to 250 ms (Figure 5—figure supplement 1B, C). The offline contextualization between the final sequence of each trial and the second sequence of the subsequent trial (excluding the first sequence) yielded comparable results. This indicates that pre-planning at the start of each practice trial did not directly influence the offline contextualization measure (Ariani and Diedrichsen, 2019; Figure 5—figure supplements 2A 1st vs. 2nd Sequence approaches). Conversely, online contextualization (using either measurement approach) did not explain early online learning gains (Figure 5—figure supplement 3). Within-subject correlations were consistent with these group-level findings. The average correlation between offline contextualization and micro-offline gains within individuals was significantly greater than zero (Figure 5—figure supplement 4, left; t=3.87, p=0.00035, df = 25, Cohen’s d=0.76) and stronger than correlations between online contextualization and either micro-online (Figure 5—figure supplement 4, middle; t=3.28, p=0.0015, df = 25, Cohen’s d=1.2) or micro-offline gains (Figure 5—figure supplement 4, right; t=3.7021, p=5.3013e-04, df = 25, Cohen’s d=0.69). These findings were not explained by behavioral changes of typing rhythm (t=–0.03, p=0.976; Figure 5—figure supplement 5), adjacent keypress transition times (R2=0.00507, F [1,3202]=16.3; Figure 5—figure supplement 6), or overall typing speed (between-subject; R2=0.028, p=0.41; Figure 5—figure supplement 7).
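The regression statistics quoted above are linked by standard identities, which the following Python sketch makes explicit. For a simple linear regression with one predictor, R2 equals the squared Pearson correlation, and the F statistic follows from R2 and the residual degrees of freedom as F = R2 / (1 - R2) × df. The function names are illustrative assumptions; only the numerical values are taken from the text.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def f_from_r_squared(r_squared, df_residual):
    """F statistic of a one-predictor regression from R^2 and residual df."""
    return r_squared / (1.0 - r_squared) * df_residual


# Reported r = 0.903 implies R^2 of about 0.815, matching the reported 0.816
# to within rounding of r.
print(round(0.903 ** 2, 3))

# Reported R^2 = 0.00507 with 3202 residual degrees of freedom gives F near 16.3.
print(round(f_from_r_squared(0.00507, 3202), 1))
```

The second calculation also shows why a relationship can be very weak in terms of explained variance yet still produce a sizeable F value: with several thousand pooled observations, even an R2 well below 0.01 yields a large test statistic.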

Finally, contextualization of IndexOP1 vs. IndexOP5 representations observed on Day 1 generalized to Day 2 Retest of the trained skill sequence. Distances between representations for the same keypress performed twice within untrained sequences were lower in magnitude (Day 2 Control)—pointing to specificity of the contextualization effect (Figure 5C).

Discussion

The main findings of this study during which subjects engaged in a naturalistic, self-paced task were that individual sequence action representations differentiate during early skill learning in a manner reflecting the local sequence context in which they were performed, and that the degree of representational differentiation—particularly prominent over rest intervals—correlated with skill gains.

Optimizing decoding of sequential finger movements from MEG activity

The initial phase of the study focused on optimizing the accuracy of decoding individual finger keypresses from MEG brain activity. Recent work showed that the brain simultaneously processes information more efficiently across multiple—rather than a single—spatial scale(s) (Munn et al., 2024; Buch et al., 2017). To this effect, we developed a novel hybrid-space approach designed to integrate neural representation dynamics over two different spatial scales: (1) whole-brain parcel-space (i.e. spatial activity patterns across all cortical brain regions) and (2) regional voxel-space (i.e. spatial activity patterns within select brain regions) activity. We found consistent spatial differences between whole-brain parcel-space feature importance (predominantly contralateral frontoparietal, Figure 2B) and regional voxel-space decoder accuracy (bilateral sensorimotor regions, Figure 2D). The whole-brain parcel-space decoder likely emphasized more stable activity patterns in contralateral frontoparietal regions that differed between individual finger movements (Beukema et al., 2019; Lemon, 2008), while the regional voxel-space decoder likely incorporated information related to adaptive interhemispheric interactions operating during motor sequence learning (Buch et al., 2017; Zimerman et al., 2014; Waters et al., 2017), particularly pertinent when the skill is performed with the non-dominant hand (Sawamura et al., 2019; Lee et al., 2019; Grafton et al., 2002). The observation of increased cross-validated test accuracy (as shown in Figure 3—figure supplement 6) indicates that the spatially overlapping information in parcel- and voxel-space time-series in the hybrid decoder was complementary, rather than redundant (Yu and Liu, 2004). 
The hybrid-space decoder, which achieved an accuracy exceeding 90%—and robustly generalized to Day 2 across trained and untrained sequences—surpassed the performance of both parcel-space and voxel-space decoders and compared favorably to other neuroimaging-based finger movement decoding strategies (Buch et al., 2021; Lee et al., 2022; Liao et al., 2014; Quandt et al., 2012; Kornysheva et al., 2019).

Evaluation of individual brain oscillatory activity revealed that low-frequency oscillations (LFOs) result in higher decoding accuracy compared to other narrow-band activity (Natraj et al., 2022; Reddy et al., 2021). Task-related movements—which also express in lower frequency ranges—did not explain these results given the near chance-level performance of alternative decoders trained on (a) artifact-related ICA components removed during MEG pre-processing (Figure 3—figure supplement 3A–C) and on (b) task-related eye movement features (Figure 4—figure supplement 3B, C). This explanation is also inconsistent with the minimal average head motion of 1.159 mm (±1.077 SD) across the MEG recording (Figure 3—figure supplement 3D). How could LFOs contribute to keypress decoding accuracy? LFOs, observed during movement onset in the cerebral cortex of animals (Bansal et al., 2011; Mollazadeh et al., 2011) and humans (Bönstrup et al., 2019b; Cruikshank et al., 2012; Tomassini et al., 2017), encode information about movement trajectories and velocity (Bansal et al., 2011; Mollazadeh et al., 2011). They also contain information related to movement timing (Ramanathan et al., 2018; Hall et al., 2014; Stefanics et al., 2010), preparation (Flint et al., 2012; Krasoulis et al., 2014), sensorimotor integration (Cruikshank et al., 2012), kinematics (Flint et al., 2012; Krasoulis et al., 2014) and may contribute to the precise temporal coordination of movements required for sequencing (Churchland et al., 2012). Within clinical contexts, LFOs in the frontoparietal regions, resulting in high decoding accuracy in the present study, have been linked to recovery of motor function after brain lesions like stroke (Bönstrup et al., 2019b; Ramanathan et al., 2018; Frohlich et al., 2021).

Neural representations of individual sequence actions differentiate during early skill learning

Next, we exploited the hybrid decoding approach to investigate if individual sequence action representations differentiate or remain stable during early skill learning, when the memory is not yet fully formed (Bönstrup et al., 2019a). The first hint of representational differentiation was the highest false-negative and lowest false-positive misclassification rates for index finger keypresses performed at different locations in the sequence compared with all other digits (Figure 3C). This was further supported by the progressive differentiation of neural representations of the index finger keypress (Figure 4A) and by the robust trial-by-trial increase in 2-class (IndexOP1 vs IndexOP5) decoding accuracy across time windows ranging between 50 and 250ms (Figure 4C; Figure 4—figure supplement 2). Further, the 5-class classifier—which directly incorporated information about the sequence location context of each keypress into the decoding pipeline—improved decoding accuracy relative to the 4-class classifier (Figure 4B). Importantly, testing on Day 2 revealed specificity of this representational differentiation for the trained skill but not for the same keypresses performed during various unpracticed control sequences (Figure 5C).

The main region contributing information to representational differentiation during early practice (trials 1–10) was the primary motor cortex, followed by the somatosensory cortex (trial 11), both of which are known to be actively engaged in skill acquisition (Buch et al., 2021; Karni et al., 1995; Classen et al., 1998; Kleim et al., 1998; Kumar et al., 2019; Pavlides et al., 1993). Concurrently, information from the superior frontal and middle frontal cortex—which encodes hierarchical structures of skill sequences (Yokoi and Diedrichsen, 2019)—steadily increased in importance and emerged as the two most crucial decoding contributors once skill performance plateau had been reached (trials 12–36; Figure 4—figure supplement 1; Hikosaka et al., 1999; Dayan and Cohen, 2011). Thus, the neural substrates supporting finger movements and their representational differentiation during early skill learning (the time period during which 95% of skill gains in the training session occur; Bönstrup et al., 2019a; Pan and Rickard, 2015; trials 1–11 in this study) differed from those supporting stable performance during the subsequent skill plateau period (Karni et al., 1995; Robertson and Cohen, 2006; trials 12–36 in this study).

Differentiation of neural representations developed predominantly during rest periods interspersed with practice

We then focused on the timeline of differentiation of index finger keypress neural representations—which we refer to as contextualization—over early learning. We found that contextualization increased progressively during early learning—predominantly during short rest breaks (offline) rather than during practice (online; Figure 5, Figure 5—figure supplement 2B). Offline contextualization consistently correlated with early learning gains across a range of decoding windows (50–250ms; Figure 5—figure supplement 1). This result remained unchanged when measuring offline contextualization between the last and second sequence of consecutive trials, inconsistent with a possible confounding effect of pre-planning (Ariani and Diedrichsen, 2019; Figure 5—figure supplement 2A). On the other hand, online contextualization did not predict learning (Figure 5—figure supplement 3). Consistent with these results, the average within-subject correlation between offline contextualization and micro-offline gains was significantly stronger than within-subject correlations between online contextualization and either micro-online or micro-offline gains (Figure 5—figure supplement 4).

Offline contextualization was not driven by trial-by-trial behavioral differences, including typing rhythm (Figure 5—figure supplement 5) and adjacent keypress transition times (Figure 5—figure supplement 6) nor by between-subject differences in overall typing speed (Figure 5—figure supplement 7)—ruling out a reliance on differences in the temporal overlap of keypresses. Importantly, offline contextualization documented on Day 1 stabilized once a performance plateau was reached (trials 11–36) and was retained on Day 2, documenting overnight consolidation of the differentiated neural representations. A possible neural mechanism supporting contextualization could be the emergence and stabilization of conjunctive ‘what–where’ representations of procedural memories (Komorowski et al., 2009) with the corresponding modulation of neuronal population dynamics (Georgopoulos, 1994; Georgopoulos et al., 1982) during early learning. Exploring the link between contextualization and neural replay could provide additional insights into this issue (Buch et al., 2021; Chen et al., 2024; Sjøgård, 2024; Griffin et al., 2025).

In this study, classifiers were trained on MEG activity recorded during or immediately after each keypress, emphasizing neural representations related to action execution, memory consolidation, and recall over those related to planning. An important direction for future research is determining whether separate decoders can be developed to distinguish the representations or networks separately supporting these processes. Ongoing work in our lab is addressing this question. The present accuracy results across varied decoding window durations and alignment with each keypress action support the feasibility of this approach (Figure 3—figure supplement 5).

Limitations

One limitation of this study is that contextualization was investigated for only one finger movement (index finger or digit 4) embedded within a relatively short 5-item skill sequence. Determining if representational contextualization is exhibited across multiple finger movements embedded within, for example, longer sequences (e.g. two index finger and two little finger keypresses performed within a short piece of piano music) will be an important extension to the present results. While a supervised manifold learning approach (LDA) was used here because it optimized hybrid-space decoder performance, unsupervised strategies (e.g. PCA and MDS, which also substantially improved decoding accuracy in the present study; Figure 3—figure supplement 2), are likely more suitable for real-time BCI applications. Finally, caution should be exercised when extrapolating findings during early skill learning, a period of steep performance improvements, to findings reported after insufficient practice (Das et al., 2024), post-plateau performance periods (Gupta and Rickard, 2022), or non-learning situations (e.g. performance of non-repeating keypress sequences in Das et al., 2024) when reactive inhibition or contextual interference effects are prominent. Ultimately, it will be important to develop new paradigms allowing one to independently estimate the different coincident or antagonistic features (e.g. memory consolidation, planning, working memory, and reactive inhibition) contributing to micro-online and micro-offline gains during and after early skill learning within a unifying framework.

Summary

In summary, individual sequence action representations contextualize during early learning of a new skill, and the degree of differentiation parallels skill gains. Differentiation of the neural representations developed during rest intervals of early learning to a larger extent than during practice in parallel with rapid consolidation of skill. It is possible that the systematic inclusion of contextualized information into sequence skill practice environments could improve learning in areas as diverse as music education, sports training, and rehabilitation of motor skills after brain lesions.

Materials and methods

Study participants

The study was approved by the Combined Neuroscience Institutional Review Board of the National Institutes of Health (NIH). A total of thirty-three young and healthy adults (16 females) with a mean age of 26.6 years (±0.87 SEM) participated in the study after providing written informed consent and undergoing a standard neurological examination. No participants were actively engaged in playing musical instruments in their daily lives, as per guidelines outlined in prior research (Ruiz et al., 2009; Maidhof et al., 2009). All study scientific data were de-identified and permanently unlinked from all personal identifiable information (PII) before the analysis. These data are publicly available (https://doi.org/10.18112/openneuro.ds006502.v1.0.0; Accession Number: ds006502). Two participants were excluded from the analysis due to MEG system malfunction during data acquisition. An additional 5 subjects were excluded because they failed to generate any correct sequences in two or more consecutive trials. The study was powered to determine the minimum sample size needed to detect a significant change in skill performance following training using a one-sample t-test (two-sided; alpha = 0.05; 95% statistical power; Cohen’s d effect size = 0.8115, calculated from previously acquired data in our lab; Censor et al., 2014). The calculated minimum sample size was 22. The included study sample size (n=26) exceeded this minimum (Bönstrup et al., 2019a).

Experimental setup

Participants practiced a procedural motor skill learning task that involved repetitively typing a 5-item numerical sequence (4-1-3-2-4) displayed on a computer screen. They were instructed to perform the task as quickly and accurately as possible on a response pad (Cedrus LS-LINE, Cedrus Corp) using their non-dominant, left hand. Each numbered sequence item corresponded to a specific finger keypress: 1 for the little finger, 2 for the ring finger, 3 for the middle finger, and 4 for the index finger. Individual keypress times and identities were recorded and used to assess skill learning and performance. The response pad was positioned in a manner that minimized wrist, arm, or more proximal body movements during the task. The head was restrained with an inflatable air bladder, and head position was assessed at the beginning and at the end of each recording. The mean measured head movement across the study group was 1.159 mm (±1.077 SD; Figure 3—figure supplement 3).

Participants practiced the skill for 36 trials. Each trial spanned a total of 20 s and included a 10 s practice round followed by a 10 s rest break. The study design followed specific recommendations by Pan and Rickard, 2015: (1) utilizing 10 s practice trials and (2) constraining analysis of micro-offline gains to early learning trials (where performance monotonically increases and 95% of overall performance gains occur) that precede the emergence of ‘scalloped’ performance dynamics strongly linked to reactive inhibition effects (Pan and Rickard, 2015; Brawn et al., 2010). This is precisely the portion of the learning curve Pan and Rickard referred to when they stated “…rapid learning during that period masks any reactive inhibition effect” (Pan and Rickard, 2015).

The five-item sequence was displayed on the computer screen for the duration of each practice round, and participants were directed to fix their gaze on the sequence. Small asterisks were displayed above a sequence item after each successive keypress, signaling the participants' present position within the sequence. Inclusion of this feedback minimizes working memory loads during task performance (Walker et al., 2002). Following the completion of a full sequence iteration, the asterisk returned to the first sequence item. The asterisk did not provide error feedback as it appeared for both correct and incorrect keypresses. At the end of each practice round, the displayed number sequence was replaced by a string of five ‘X’ symbols displayed on the computer screen, which remained for the duration of the rest break. Participants were instructed to focus their gaze on the screen during this time. The behavior in this explicit, motor learning task consists of generative action sequences rather than sequences of stimulus-induced responses as in the serial reaction time task (SRTT). A similar real-world example would be manually inputting a long password into a secure online application in which one intrinsically generates the sequence from memory and receives similar feedback about the password sequence position (also provided as asterisks), which is typically ignored by the user.

On the next day, participants were tested (Day 2 Retest) with the same trained sequence (4-1-3-2-4) for nine trials as well as for nine different unpracticed control sequences (Day 2 Control; 2-1-3-4-2; 4-2-4-3-1; 3-4-2-3-1; 1-4-3-4-2; 3-2-4-3-1; 1-4-2-3-1; 3-2-4-2-1; 2-3-1-4-2; 4-2-3-1-4) each for one trial. The practice schedule structure for Day 2 was the same as Day 1, with 10 s practice trials interleaved with 10 s rest breaks.

Behavioral data analysis

Skill

Skill, in the context of the present task, is quantified as the correct sequence typing speed (i.e. the number of correctly typed sequence keypresses per second; kp/s). That is, improvements in the speed/accuracy trade-off equate to greater skill. Keypress transition times (KTT) were calculated as the difference in time between the keyDown events recorded for consecutive keypresses. Since the sequence was repeatedly typed within a single trial, individual keypresses were marked as correct if they were members of a set of five consecutive keypresses that matched any possible circular shift of the displayed five-item sequence. The instantaneous correct sequence speed was calculated as the inverse of the average KTT across a single correct sequence iteration and was updated for each correct keypress. Trial-by-trial skill changes were assessed by computing the median correct sequence typing speed for each trial.
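The marking rule and speed computation described above can be illustrated with a minimal Python sketch. This is not the authors' MATLAB code. The function names are hypothetical, and the sketch assumes that the keyDown times passed to `sequence_speed` are those of the keypresses spanning one correct sequence iteration.

```python
SEQUENCE = [4, 1, 3, 2, 4]  # trained sequence used in the task


def circular_shifts(seq):
    """Return the set of all rotations of seq (as tuples)."""
    return {tuple(seq[i:] + seq[:i]) for i in range(len(seq))}


def mark_correct(keys, seq=SEQUENCE):
    """Flag every keypress that belongs to at least one run of len(seq)
    consecutive keypresses matching some circular shift of seq."""
    n = len(seq)
    shifts = circular_shifts(seq)
    correct = [False] * len(keys)
    for start in range(len(keys) - n + 1):
        if tuple(keys[start:start + n]) in shifts:
            for i in range(start, start + n):
                correct[i] = True
    return correct


def transition_times(times):
    """Keypress transition times: differences between consecutive keyDown times (s)."""
    return [b - a for a, b in zip(times, times[1:])]


def sequence_speed(times):
    """Speed in keypresses per second: inverse of the mean KTT over the given keypresses."""
    ktt = transition_times(times)
    return 1.0 / (sum(ktt) / len(ktt))
```

For example, in the keypress stream 4-1-3-2-4-4-1-2-2-4 the first seven keypresses are covered by at least one matching five-keypress window, whereas the last three are not.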

Early learning

The early learning period was defined as the trial range (trials 1–T) over which 95% of the total skill performance was first attained at the group level. We quantified this by fitting the group average trial-by-trial correct sequence speed data with an exponential model of the form:

$$L(t) = C_1 + C_2\left(1 - e^{-kt}\right)$$

Here, the trial number is denoted by t, and L(t) signifies the group-averaged performance at trial t. Parameters C1 and C2 correspond to the pre-training performance baseline and asymptote, respectively, while k denotes the learning rate. The values for C1, C2, and k were computed using a constrained nonlinear least-squares method (MATLAB’s lsqcurvefit function, trust-region-reflective algorithm) and were determined to be 0.5, 0.15, and 0.2, respectively. The early learning trial cut-off, denoted as T, was identified as the first trial where 95% of the learning had been achieved. In this study, T was determined to be trial 11.
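As a rough illustration of how a cut-off trial can be read off a saturating exponential of this form, the following Python sketch finds the first trial at which the exponential term reaches a given fraction of its asymptotic gain. It makes two assumptions that the text does not state explicitly: trials are numbered from 1, and "learning achieved" is measured as a fraction of C2. The parameter values in the example are placeholders chosen for illustration, not the fitted values, and the fitting step itself (done with `lsqcurvefit` in the study) is not reproduced.

```python
import math


def learning_curve(t, c1, c2, k):
    """Exponential learning model: L(t) = C1 + C2 * (1 - exp(-k * t))."""
    return c1 + c2 * (1.0 - math.exp(-k * t))


def early_learning_cutoff(k, fraction=0.95, max_trial=36):
    """First trial (1-indexed, an assumption) at which the gain above baseline
    reaches `fraction` of C2. Returns None if never reached within max_trial."""
    for t in range(1, max_trial + 1):
        if 1.0 - math.exp(-k * t) >= fraction:
            return t
    return None


# Illustrative learning rate only (hypothetical value)
example_cutoff = early_learning_cutoff(k=0.5)
```

Under this definition the cut-off depends only on k, because C1 and C2 cancel when the gain is expressed as a fraction of its asymptote.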

Micro-offline and -online gains

Performance improvements over each 10 s rest break (micro-offline gains) were calculated as the net performance change (instantaneous correct sequence typing speed) from the end of one practice period to the onset of the next, while micro-online gains were computed as the net performance change over a single practice trial. Total early learning was derived as the sum of all micro-online and micro-offline gains over trials 1–11. Cumulative micro-offline gains, micro-online gains, and total early learning were statistically compared using one-way ANOVAs and post-hoc Tukey tests. Possible pre-planning effects on initial skill performance (Ariani and Diedrichsen, 2019) were assessed by using paired t-tests to statistically compare cumulative micro-offline and micro-online gains computed for all keypresses with their measurement counterparts calculated after omitting the first 3 keypresses in each trial from the correct sequence speed computation.
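The two gain definitions can be sketched in a few lines of Python. This is a simplified illustration rather than the study code: `trial_speeds` is a hypothetical input holding, for each practice trial, the instantaneous correct sequence speeds in keypress order, and the first and last values of each trial are taken as the onset and end performance.

```python
def micro_gains(trial_speeds):
    """Return (micro_online, micro_offline) gains.

    micro_online[i]  : change in speed from the start to the end of trial i.
    micro_offline[i] : change in speed from the end of trial i to the start of
                       trial i + 1 (i.e. across the intervening rest break).
    """
    online = [s[-1] - s[0] for s in trial_speeds]
    offline = [nxt[0] - cur[-1] for cur, nxt in zip(trial_speeds, trial_speeds[1:])]
    return online, offline


def total_learning(trial_speeds):
    """Sum of all micro-online and micro-offline gains over the given trials."""
    online, offline = micro_gains(trial_speeds)
    return sum(online) + sum(offline)
```

Because the gains telescope, their sum over a run of trials equals the speed at the end of the last trial minus the speed at the start of the first.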

MRI acquisition

We acquired T1-weighted high-resolution anatomical MRI volume images (1 mm³ isotropic MPRAGE sequence) for each participant on a 3T MRI scanner (GE Excite HDxt or Siemens Skyra) equipped with a 32-channel head coil. These data allowed for spatial co-registration of an individual participant’s brain with the MEG sensors, and individual head models required for surface-based cortical dipole estimation from MEG signals (i.e. MEG source-space modeling).

MEG acquisition

We recorded continuous magnetoencephalography (MEG) at a sampling frequency of 600 Hz using a CTF 275 MEG system (CTF Systems, Inc, Canada) while participants were seated in an upright position. The MEG system comprises a whole-head array featuring 275 radial 1st-order gradiometer/SQUID channels housed in a magnetically shielded room (Vacuumschmelze, Germany). Three of the gradiometers (two non-functional and one with high channel noise after visual inspection) were excluded from the analysis resulting in a total of 272 useable MEG sensor channels. Synthetic third-order gradient balancing was applied to eliminate background noise in real-time data collection. Temporal alignment of behavioral and MEG data was achieved using a TTL trigger. Head position in the scanner coordinate space was assessed at the beginning and end of each recording using head localization coils at the nasion, left, and right pre-auricular locations. These fiducial positions were co-registered in the participants’ T1-MRI coordinate space using a stereotactic neuronavigation system (BrainSight, Rogue Research Inc). MEG data was acquired starting 6 min before the task (resting-state baseline) and continued through the end of the 12 min training session.

MEG data analysis

Preprocessing

MEG data were preprocessed using the FieldTrip (Oostenveld et al., 2011) and EEGLAB (Delorme and Makeig, 2004) toolboxes on MATLAB 2022a. Continuous raw MEG data were band-pass filtered between 1–100 Hz with a fourth-order noncausal Butterworth filter. 60 Hz line noise was removed with a narrow-band discrete Fourier transform (DFT) notch filter. Independent component analysis (ICA) was used to remove typical MEG signal artifacts associated with eye blinks or movement, muscle contraction or cardiac pulsation. All recordings were visually inspected and marked to denoise segments containing other large amplitude artifacts due to movements. Eye movements were simultaneously recorded with MEG (EyeLink 1000 Plus).

Source reconstruction and parcellation

For each participant, individual volume conduction models were computed to estimate the propagation of brain-generated currents through tissue resulting in externally measurable magnetic fields. This was accomplished through a single-shell head corrected-sphere approach based on the brain volume segmentation of the participant’s high-resolution T1 MRI. Source models and surface labels from the Desikan-Killiany Atlas (Destrieux et al., 2010) were created for each participant using inner-skull and pial layer surfaces obtained through FreeSurfer segmentation (Dale et al., 1999; Marcus et al., 2011) and Connectome Workbench resampling (Dale et al., 1999; Marcus et al., 2011). Aligning sensor positions in the MEG helmet to individual head space involved rigid-body registration of the mean MEG head coil position to the same fiducial locations marked in the MRI and applying the same affine transformation to all MEG sensors.

The individual source, volume conduction model, and sensor positions were then utilized to generate the forward solution at each source dipole location, describing the propagation of source activity from each cortical location on the grid to each MEG sensor. The Linearly Constrained Minimum-Variance (LCMV) beamformer was employed for computing the inverse solution. Each trial of MEG activity contributed to calculating the inverse solution data covariance matrix. The individual sample noise covariance matrix was derived from 6 min of pre-training rest MEG data recorded in the same subject during the same session. A total of 15,684 surface-based cortical dipoles (i.e. source-space voxels) were estimated.

Source-space parcellation was carried out by averaging all voxel time-series located within distinct anatomical regions defined in the Desikan-Killiany Atlas (Destrieux et al., 2010). Since source time-series estimated with beamforming approaches are inherently sign-ambiguous, a custom Matlab-based implementation of the mne.extract_label_time_course with ‘mean_flip’ sign-flipping procedure in MNE-Python (Gramfort et al., 2013) was applied prior to averaging to prevent within-parcel signal cancellation. All voxel time-series within each parcel were extracted, and the time series sign was flipped at locations where the orientation difference was greater than 90° from the parcel mode. A mean time series was then computed across all voxels within the parcel after sign-flipping.
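The sign-flipping step can be illustrated with a short Python sketch. It is a simplified stand-in for the custom MATLAB implementation, not a copy of it: the dominant parcel orientation is passed in as `reference` (how it is estimated is left open here), and an orientation difference greater than 90° is detected as a negative dot product with that reference.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def mean_flip(time_series, orientations, reference):
    """Average voxel time series within a parcel after sign-flipping.

    time_series  : list of per-voxel time series (lists of equal length).
    orientations : list of per-voxel 3D orientation vectors.
    reference    : 3D vector for the dominant parcel orientation (assumed given).
    """
    # A negative dot product means the angle to the reference exceeds 90 degrees
    signs = [-1.0 if dot(o, reference) < 0 else 1.0 for o in orientations]
    n_vox = len(time_series)
    n_time = len(time_series[0])
    return [
        sum(signs[v] * time_series[v][t] for v in range(n_vox)) / n_vox
        for t in range(n_time)
    ]
```

Without the flip, two voxels with opposite orientations and mirrored signals would cancel to zero when averaged; with it, the parcel mean preserves the shared time course.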

Feature selection for decoding

Several MEG activity features were extracted over different spatial, spectral, and temporal scales.

Oscillatory analysis

MEG signals were constrained to broadband (1–100 Hz) or standard neural oscillatory frequency bands defined as delta (1–3 Hz), theta (4–7 Hz), alpha (8–14 Hz), beta (15–24 Hz), gamma (25–50 Hz), and high-gamma (51–100 Hz) using a fourth-order non-causal Butterworth filter. Subsequent decoding analyses were independently conducted for each band of MEG activity.

Spatial analysis

Decoding was performed in both sensor and source spaces. The sensor-space decoding feature dimension was 272 (corresponding to the 272 usable MEG channels), while source-space decoding was carried out at both the higher feature dimension voxel (i.e. higher spatially sampled; N=15,684) and lower feature dimension parcel space (i.e. lower spatially sampled; N=148) across all oscillatory frequency bands (i.e. broadband, delta, theta, alpha, beta, gamma, and high-gamma) for comprehensive comparison.

Temporal analysis

MEG activity time series corresponding to each keypress was defined using the time window [t, t + Δt], where t ∈ [0 : 10 ms : 100 ms] and Δt ∈ [25 ms : 25 ms : 350 ms]. In other words, a sliding window of variable width (from 25 ms to 350 ms with 25 ms increments), and with onsets ranging from the keyDown event (i.e. t = 0 ms) to +100 ms after the keyDown event (with increments of 10 ms) was used. This approach generated a set of 140 different time windows associated with each keypress for each participant. MEG activity was averaged over time within each of these windows and independently analyzed for decoding. The optimal time window was selected for each subject that resulted in the maximum cross-validation performance (Figure 3—figure supplement 5). This window optimization analysis was performed for each frequency band and spatial scale.

Hybrid spatial approach

First, we evaluated the decoding performance of each individual brain region in accurately decoding finger keypresses from regional voxel-space (i.e. all voxels within a brain region as defined by the Desikan-Killiany Atlas) activity. Brain regions were then ranked from 1 to 148 based on their decoding accuracy at the group level. In a stepwise manner, we then constructed a ‘hybrid-space’ decoder by incrementally concatenating regional voxel-space activity of brain regions—starting with the top-ranked region—with whole-brain parcel-level features and assessed decoding accuracy. Subsequently, we added the regional voxel-space features of the second-ranked brain region and continued this process until decoding accuracy reached saturation. The optimal ‘hybrid-space’ input feature set over the group included the 148 parcel-space features and regional voxel-space features from a total of 8 brain regions (bilateral superior frontal, middle frontal, pre-central, and post-central; N=1295 ± 20 features).
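The stepwise construction can be sketched as a greedy forward procedure. The Python below is only an outline under stated assumptions: `evaluate` is a hypothetical callback that returns cross-validated decoding accuracy for the parcel-level features plus the voxel-level features of the listed regions, and "saturation" is taken to mean the first step at which accuracy improves by no more than `tol`. The text does not specify the exact stopping rule.

```python
def build_hybrid_space(ranked_regions, evaluate, tol=0.0):
    """Greedily add regional voxel-space features to the parcel-space features.

    ranked_regions : region names ordered from best to worst regional accuracy.
    evaluate       : hypothetical function mapping a list of regions to accuracy.
    tol            : minimum improvement required to keep adding regions.
    """
    selected = []
    best = evaluate(selected)  # parcel-level features only
    for region in ranked_regions:
        score = evaluate(selected + [region])
        if score - best <= tol:
            break  # accuracy has saturated under the assumed criterion
        selected.append(region)
        best = score
    return selected, best
```

A stricter or more lenient stopping rule (for example, requiring several consecutive non-improving steps) would fit the same loop with a small change to the break condition.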

Dimension reduction

We independently applied several supervised and unsupervised dimension reduction techniques as an additional feature extraction step for each broadband MEG activity space (i.e. sensor, parcel, voxel, and hybrid), including: linear discriminant analysis (LDA), minimum redundant maximum relevance (MRMR), principal component analysis (PCA), Autoencoder, Diffusion maps, factor analysis, large margin nearest neighbor (LMNN), multi-dimensional scaling (MDS), neighbor component analysis (NCA), spatial predictor envelope (SPE; Maaten and Postma, 2009). Among these techniques, PCA, MDS, MRMR, and LDA emerged as particularly effective in significantly improving decoding performance across all broadband MEG activity spaces.

PCA, a method for unsupervised dimensionality reduction, transforms the high-dimensional dataset into a new coordinate system of orthogonal principal components. These components, which capture the maximum variance in the data, were added iteratively to reconstruct the feature space used for decoding. MDS finds a configuration of points in a lower-dimensional space such that the distances between these points reflect the dissimilarities or similarities between the corresponding objects in the original high-dimensional space. MRMR, an approach combining relevance and redundancy metrics, ranks features based on their significance to the target variable and their non-redundancy with other features. The decoding process started with the highest-ranked feature and iteratively incorporated subsequent features until decoding accuracy reached saturation. LDA finds the linear combinations of features (dimensions) that best separate different classes in a dataset. It projects the original features onto a lower-dimensional space (number of classes – 1) while preserving the class-discriminatory information. This transformation maximizes the ratio of the between-class variance to the within-class variance. In our study, LDA transformed the features to a three- or four-dimensional space (for 4-class and 5-class decoding, respectively) that was used for decoding. Dimension reduction was first fit on the training data; the fitted dimension reduction model was then used to transform an independent test data subset for evaluation of decoder metrics. Decoding accuracies were systematically compared between the original and reduced dimension feature spaces, providing insight into the effectiveness of each dimension reduction technique. By assessing the impact of dimension reduction on decoding accuracy, the study aimed to identify techniques that not only reduced the computational burden associated with high-dimensional data but also enhanced the discriminative power of the selected features. This approach aimed at optimizing the neural representation of each finger keypress for decoding performance across various spatial contexts.

Decoding analysis

Decoding analysis was conducted for each participant individually, employing a randomized split of the data into independent training (90%) and test (10%) samples over eight iterations. For each iteration, an eightfold cross-validation was applied to the training samples to optimize decoder configuration, allowing for the fine-tuning of hyperparameters and selection of the most effective model. On average, the total number of individual keypress samples for the entire duration of training was 219 ± SD: 66 (keypress 1: little), 205 ± SD: 66 (keypress 2: ring), 209 ± SD: 66 (keypress 3: middle), and 426 ± SD: 131 (keypress 4: index) across participants. Only keypresses belonging to correctly typed sequence iterations (94.64% ± 4.04% of all keypresses) were considered. The total number of index finger keypresses (i.e. keypress 4) was approximately twice that of the others, as it was the only action that occurred more than once in the trained sequence (4-1-3-2-4), albeit at two different ordinal positions. Considering the higher (2×) number of samples for one class, we independently oversampled keypresses 1, 2, and 3 to avoid overfitting to the over-represented class. Importantly, oversampling was applied independently for each keypress class, ensuring that validation folds were never oversampled, and training folds did not share common oversampled patterns. The decoder configuration demonstrating the best validation performance was selected for each iteration, and subsequently, test accuracy was evaluated on the independent/unseen test samples. This process was repeated for the eight different iterations of train-test splitting, and the average test accuracy over all iterations was reported. This methodology aimed at generalizing decoding performance to ensure robust and reliable results across participants. Finally, decoding evaluation was also performed on the Day 2 data, for both the trained (Day 2 Retest; 9 trials) and untrained sequences (Day 2 Control; 9 different single-trial tests).
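The key point of the oversampling scheme, namely that held-out samples are set aside before any oversampling takes place, can be sketched in Python as follows. The sketch assumes simple random oversampling with replacement, which is one common choice; the specific oversampling method used in the study is not stated here, and the function name is hypothetical.

```python
import random


def split_and_oversample(labels, test_fraction=0.1, seed=0):
    """Split sample indices into train/test, then balance classes in the
    training set only by random oversampling with replacement (an assumption)."""
    rng = random.Random(seed)
    idx = list(range(len(labels)))
    rng.shuffle(idx)
    n_test = max(1, round(test_fraction * len(idx)))
    test_idx, train_idx = idx[:n_test], idx[n_test:]

    # Group training indices by class label
    by_class = {}
    for i in train_idx:
        by_class.setdefault(labels[i], []).append(i)

    # Oversample every class up to the size of the largest training class
    target = max(len(members) for members in by_class.values())
    balanced = []
    for members in by_class.values():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        balanced.extend(members + extra)
    return balanced, test_idx
```

Because duplicates are drawn only from training indices, no test sample (or copy of one) can leak into the training set. The same ordering applies when the training set is further divided into cross-validation folds.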

Machine learning classifiers

We employed a diverse set of machine learning-based decoders—including Naïve Bayes (NB), decision trees (DT), ensembles (EN), k-nearest neighbor (KNN), linear discriminant analysis (LDA), support vector machines (SVM), and artificial neural network (ANN)—to train features generated over all possible combinations of spatial and temporal scales and oscillation frequency-bands in order to carry out a comprehensive comparative analysis. The hyperparameters of these decoders underwent fine-tuning using Bayesian optimization search.

All NB classifiers were configured with a normal distribution predictor and Gaussian kernel, while KNN classifiers had a K value of 4 (for keypress decoding) and utilized the Euclidean distance metric. For DT classifiers, the maximum number of splits was set to 4 (for keypress decoding), with leaves being merged based on the sum of risk values greater than or equal to the risk associated with the parent node. The optimal sequence of pruned trees was estimated, and the predictor selection method was 'Standard CART', selecting the split predictor that maximizes the split-criterion gain over all possible splits of all predictors. The split criterion used was 'gdi' (Gini’s diversity index). EN classifiers employed the bagging method with random predictor selections at each split, forming a random forest. The maximum number of learning cycles was set to 100 with a weak learner based on discriminant analysis. For SVM, the RBF kernel was selected through cross-validation (CV), and the 'C' parameter and kernel scale were optimized using Bayesian optimization search. In the case of LDA, the linear coefficient threshold and the amount of regularization were computed based on Bayesian optimization search. Finally, all ANN decoders consisted of one hidden layer with 128 nodes, followed by a sigmoid and a softmax layer, each with four nodes (for keypress decoding). Training utilized a scaled conjugate gradient optimizer with backpropagation, employing a learning rate of 0.01 (coarse to fine tuning) for a maximum of 100 epochs, with early stopping validation patience set to 6 epochs.

Decoding performance metric

Decoding performance was assessed using several metrics, including accuracy (%), which indicates the proportion of correct classifications among all test samples. Confusion matrices provide a detailed comparison of the number of correctly predicted samples for each class against the ground truth. The F1 score, defined as the harmonic mean of precision (the percentage of positive predictions that are true positives) and recall (the percentage of actual positives that are correctly predicted as positive), was used as a comprehensive metric for each one-versus-all keypress state decoder to assess class-wise performance that accounts for both false-positive and false-negative prediction tendencies (Rijsbergen, 1979; Schütze et al., 2008). A weighted mean F1 score was then computed across all classes to assess the overall prediction performance of the multi-class model. Test accuracies based on the best decoder performance (LDA in our case) were reported and used for statistical comparisons.
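For concreteness, class-wise F1 scores and their weighted mean can be computed from a confusion matrix as in the Python sketch below. The weighting by class support (the number of true samples per class) is an assumption, since the weights are not specified in the text, and the function name is hypothetical.

```python
def f1_scores(confusion):
    """Per-class F1 and support-weighted mean F1.

    confusion[i][j] = number of samples of true class i predicted as class j.
    """
    n = len(confusion)
    scores, supports = [], []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                        # missed samples of class c
        fp = sum(confusion[r][c] for r in range(n)) - tp   # other classes labelled c
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f1 = 2 * precision * recall / (precision + recall)
        else:
            f1 = 0.0
        scores.append(f1)
        supports.append(tp + fn)
    weighted = sum(f * s for f, s in zip(scores, supports)) / sum(supports)
    return scores, weighted
```

Each per-class score treats that class as "positive" and all remaining classes as "negative", which matches the one-versus-all framing described above.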

Decoding during skill learning progression

We systematically assessed decoding performance of a 2-class decoder (IndexOP1 vs IndexOP5; i.e. decoding of two index finger keypresses occurring at different locations within the training sequence) at each trial during the skill learning process to capture the evolving relationship between differentiated index finger decoding proficiency and the acquired skill. Our approach involved evaluating decoder performance individually for each Day 1 Training trial. We ensured an equal number of samples (first k keypresses) in each trial were used to mitigate the influence of increasing samples available in later trials.

We used t-distributed stochastic neighbor embedding (t-SNE) to visualize the evolution of neural representations corresponding to each keypress at each trial of the learning period. Within t-SNE distributions, index finger keypresses were separately labeled based upon their sequence location (i.e. IndexOP1 and IndexOP5 for ordinal positions 1 and 5, respectively).

Decoding sequence elements

We performed 5-class decoding of each action based on its location within the sequence (i.e. IndexOP1, Little, Middle, Ring, and IndexOP5). The same decoding strategy was utilized as for the above 4-class keypress-based decoding (i.e. 90%–10% split for train and test, 8-fold cross-validation of training samples to select best decoder configuration, hybrid spatial features, and LDA-based dimension reduction). Note, oversampling was not needed after sub-grouping the index finger keypresses into two separate classes based on their sequence context. 5-class sequence-based decoding was evaluated for Day 1 Training, Day 2 Retest, and Day 2 Control data.

Feature importance scores

The relative contribution of source-space voxels and parcels to decoding performance (i.e. feature importance score) was calculated using minimum redundant maximum relevance (MRMR; Ding and Peng, 2005) and highlighted in topography plots. MRMR, an approach that combines both relevance and redundancy metrics, ranked individual features based upon their significance to the target variable (i.e. keypress state identity) prediction accuracy and their non-redundancy with other features.

Neural representation analysis

We evaluated the online (within-trial) and offline (between-trial) changes in the neural representation of the contextual actions (IndexOP1 and IndexOP5) for each trial during training. For offline differentiation, we evaluated the Euclidean distance between the hybrid spatial features of the last index finger keypress of a trial (IndexOP5) and those of the first index finger keypress (IndexOP1) of the subsequent trial, mirroring the approach used to calculate micro-offline gains in skill. This offline distance provided insight into the net change in contextual representation of the index finger keypress over each interleaved rest break. For online differentiation, we calculated either the mean Euclidean distance between IndexOP1 and IndexOP5 of all the correctly typed sequences (sequence-based) or the distance between the first IndexOP1 and last IndexOP5 (trial-based) within the same practice trial. Online differentiation informed on the net change in the contextual representation of the index finger keypress occurring within each practice trial. Cumulative offline and online representation distances across participants were statistically compared using paired t-tests. As a control analysis, we computed the difference in neural representation between IndexOP1 and IndexOP5 on Day 2 Retest data for the same sequence (4-1-3-2-4) as well as for different Day 2 Control untrained sequences where the same action was performed at ordinal positions 1 and 5 (2-1-3-4-2; 1-4-2-3-1; 2-3-1-4-2; 4-2-3-1-4). We also assessed the specificity of contextualization to the trained sequence by evaluating differentiation between index finger keypress representations performed at two different positions within untrained sequences (4-2-4-3-1 and 1-4-3-4-2). The cumulative differences were compared across participants with paired t-tests.
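The three distance measures can be written compactly in Python. In this sketch the data layout is hypothetical: each trial is a dictionary holding the feature vectors of its IndexOP1 and IndexOP5 keypresses in keypress order, with one pair per correctly typed sequence. It illustrates the definitions rather than the study's implementation.

```python
import math


def offline_distances(trials):
    """Distance from the last IndexOP5 of each trial to the first IndexOP1
    of the next trial (one value per rest break)."""
    return [
        math.dist(cur["op5"][-1], nxt["op1"][0])
        for cur, nxt in zip(trials, trials[1:])
    ]


def online_distance_trial(trial):
    """Trial-based: first IndexOP1 versus last IndexOP5 within one trial."""
    return math.dist(trial["op1"][0], trial["op5"][-1])


def online_distance_sequence(trial):
    """Sequence-based: mean IndexOP1-to-IndexOP5 distance over all
    correctly typed sequences within one trial."""
    d = [math.dist(a, b) for a, b in zip(trial["op1"], trial["op5"])]
    return sum(d) / len(d)
```

The offline measure pairs keypresses that straddle a rest break, while both online measures pair keypresses from within the same practice period, in parallel with the micro-offline and micro-online gain definitions.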

Finally, we computed trial-by-trial differences in offline and online representations during early learning and explored their temporal relationships with cumulative micro-offline and micro-online gains in skill, respectively, through regression analysis and Pearson correlation analysis. Linear regression models were fit using the fitlm function in MATLAB. The model employed M-estimation, formulating estimating equations and solving them with the Iteratively Reweighted Least Squares (IRLS) method (Holland and Welsch, 1977). The following metrics were computed and compared across models: the root mean squared error (RMSE), which estimates the standard deviation of the prediction error distribution; the coefficient of determination (R²); the F-statistic for the F-test on the regression model, which examines whether the model significantly outperforms a degenerate model consisting only of a constant term; and the p-value for that F-test. Together, these analyses characterized how changes in neural representation relate to skill acquisition.
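To make the IRLS idea concrete, the following is a minimal sketch of a robust straight-line fit. It assumes a single predictor, bisquare (Tukey) weights with tuning constant 4.685, and a scale estimate based on the median absolute residual. These choices and all function names are illustrative assumptions; the sketch does not reproduce the exact procedure of MATLAB's fitlm.

```python
from statistics import median


def weighted_line_fit(x, y, w):
    """Weighted least-squares fit of y = intercept + slope * x."""
    total = sum(w)
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / total
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / total
    sxx = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mean_x) * (yi - mean_y) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def irls_line_fit(x, y, tune=4.685, max_iter=50, tol=1e-10):
    """Robust line fit by iteratively reweighted least squares (sketch)."""
    weights = [1.0] * len(x)  # the first pass is ordinary least squares
    intercept, slope = weighted_line_fit(x, y, weights)
    for _ in range(max_iter):
        residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
        # Robust scale estimate from the median absolute residual.
        scale = median(abs(r) for r in residuals) / 0.6745
        if scale == 0:
            break  # at least half of the points are fit exactly
        cutoff = tune * scale
        # Bisquare weights: shrink toward 0 as |residual| approaches the cutoff.
        weights = [
            (1 - (r / cutoff) ** 2) ** 2 if abs(r) < cutoff else 0.0
            for r in residuals
        ]
        new_intercept, new_slope = weighted_line_fit(x, y, weights)
        change = abs(new_intercept - intercept) + abs(new_slope - slope)
        intercept, slope = new_intercept, new_slope
        if change < tol:
            break
    return intercept, slope
```

Compared with ordinary least squares, the reweighting step limits the influence of isolated outlying trials on the estimated slope, which is the motivation for M-estimation in small trial-by-trial data sets.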

Acknowledgements

We thank Ms. Tasneem Malik, Ms. Michele Richman, NIMH MEG Core Facility staff, and NIH NMRF and FMRIF Core Facility staff for their support. This work utilized the computational resources of the NIH HPC Biowulf cluster (http://hpc.nih.gov). This research was supported by the Intramural Research Program of the National Institutes of Health (NIH). The contributions of the NIH author(s) were made as part of their official duties as NIH federal employees, are in compliance with agency policy requirements, and are considered works of the United States Government. However, the findings and conclusions presented in this paper are those of the author(s) and do not necessarily reflect the views of the NIH or the U.S. Department of Health and Human Services.

Funding Statement

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Contributor Information

Ethan R Buch, Email: ethan.buch@nih.gov.

Leonardo G Cohen, Email: cohenl@ninds.nih.gov.

Juan Alvaro Gallego, Imperial College London, United Kingdom.

Michael J Frank, Brown University, United States.

Funding Information

This paper was supported by the following grant:

  • National Institute of Neurological Disorders and Stroke NINDS Intramural Research Program to Leonardo G Cohen.

Additional information

Competing interests

No competing interests declared.

Author contributions

Conceptualization, Software, Formal analysis, Validation, Investigation, Visualization, Methodology, Writing – original draft, Writing – review and editing.

Writing – review and editing.

Writing – review and editing.

Writing – review and editing.

Data curation, Writing – review and editing.

Conceptualization, Resources, Data curation, Software, Formal analysis, Supervision, Validation, Investigation, Visualization, Methodology, Writing – original draft, Project administration, Writing – review and editing.

Conceptualization, Resources, Data curation, Software, Formal analysis, Supervision, Funding acquisition, Validation, Investigation, Visualization, Methodology, Writing – original draft, Project administration, Writing – review and editing.

Ethics

Human subjects: The study was approved by the Combined Neuroscience Institutional Review Board of the National Institutes of Health (NIH). All participants provided written informed consent for the study.

Additional files

MDAR checklist

Data availability

All de-identified data, permanently unlinked from all personally identifiable information (PII), are publicly available on the OpenNeuro platform. All custom analysis code is available in a publicly accessible repository hosted on GitHub (copy archived at Dash and hcps-ninds, 2025).

The following dataset was generated:

Bönstrup M, Buch ER, Cohen LG. 2025. Skill learning and consolidation in healthy humans. OpenNeuro.

References

  1. Ariani G, Diedrichsen J. Sequence learning is driven by improvements in motor planning. Journal of Neurophysiology. 2019;121:2088–2100. doi: 10.1152/jn.00041.2019.
  2. Bansal AK, Vargas-Irwin CE, Truccolo W, Donoghue JP. Relationships among low-frequency local field potentials, spiking activity, and three-dimensional reach and grasp kinematics in primary motor and ventral premotor cortices. Journal of Neurophysiology. 2011;105:1603–1619. doi: 10.1152/jn.00532.2010.
  3. Beukema P, Diedrichsen J, Verstynen TD. Binding during sequence learning does not alter cortical representations of individual actions. The Journal of Neuroscience. 2019;39:6968–6977. doi: 10.1523/JNEUROSCI.2669-18.2019.
  4. Bönstrup M, Iturrate I, Thompson R, Cruciani G, Censor N, Cohen LG. A rapid form of offline consolidation in skill learning. Current Biology. 2019a;29:1346–1351. doi: 10.1016/j.cub.2019.02.049.
  5. Bönstrup M, Krawinkel L, Schulz R, Cheng B, Feldheim J, Thomalla G, Cohen LG, Gerloff C. Low-frequency brain oscillations track motor recovery in human stroke. Annals of Neurology. 2019b;86:853–865. doi: 10.1002/ana.25615.
  6. Bönstrup M, Iturrate I, Hebart MN, Censor N, Cohen LG. Mechanisms of offline motor learning at a microscale of seconds in large-scale crowdsourced data. NPJ Science of Learning. 2020;5:7. doi: 10.1038/s41539-020-0066-9.
  7. Brawn TP, Fenn KM, Nusbaum HC, Margoliash D. Consolidating the effects of waking and sleep on motor-sequence learning. The Journal of Neuroscience. 2010;30:13977–13982. doi: 10.1523/JNEUROSCI.3295-10.2010.
  8. Brooks E, Wallis S, Hendrikse J, Coxon J. Micro-consolidation occurs when learning an implicit motor sequence, but is not influenced by HIIT exercise. NPJ Science of Learning. 2024;9:23. doi: 10.1038/s41539-024-00238-6.
  9. Buch ER, Liew SL, Cohen LG. Plasticity of sensorimotor networks: multiple overlapping mechanisms. The Neuroscientist. 2017;23:185–196. doi: 10.1177/1073858416638641.
  10. Buch ER, Claudino L, Quentin R, Bönstrup M, Cohen LG. Consolidation of human skill linked to waking hippocampo-neocortical replay. Cell Reports. 2021;35:109193. doi: 10.1016/j.celrep.2021.109193.
  11. Censor N, Horovitz SG, Cohen LG. Interference with existing memories alters offline intrinsic functional brain connectivity. Neuron. 2014;81:69–76. doi: 10.1016/j.neuron.2013.10.042.
  12. Chen PC, Stritzelberger J, Walther K, Hamer H, Staresina BP. Hippocampal ripples during offline periods predict human motor sequence learning. bioRxiv. 2024 doi: 10.1101/2024.10.06.614680.
  13. Churchland MM, Cunningham JP, Kaufman MT, Foster JD, Nuyujukian P, Ryu SI, Shenoy KV. Neural population dynamics during reaching. Nature. 2012;487:51–56. doi: 10.1038/nature11129.
  14. Classen J, Liepert J, Wise SP, Hallett M, Cohen LG. Rapid plasticity of human cortical movement representation induced by practice. Journal of Neurophysiology. 1998;79:1117–1123. doi: 10.1152/jn.1998.79.2.1117.
  15. Cruikshank LC, Singhal A, Hueppelsheuser M, Caplan JB. Theta oscillations reflect a putative neural mechanism for human sensorimotor integration. Journal of Neurophysiology. 2012;107:65–77. doi: 10.1152/jn.00893.2010.
  16. Dale AM, Fischl B, Sereno MI. Cortical surface-based analysis. NeuroImage. 1999;9:179–194. doi: 10.1006/nimg.1998.0395.
  17. Das A, Karagiorgis A, Diedrichsen J, Stenner MP, Azañón E. “Micro-offline gains” convey no benefit for motor skill learning. bioRxiv. 2024 doi: 10.1101/2024.07.11.602795.
  18. Dash D, hcps-ninds. SequenceActionRepresentationsContextualizeDuringEarlySkillLearning. Software Heritage. 2025. swh:1:rev:11d27bc37017e6073e9cf44bf80d0e8856fecc64. https://archive.softwareheritage.org/swh:1:dir:0ba807c448770f0dbd2c76a7770968460c9c9457;origin=https://github.com/hcps-ninds/SequenceActionRepresentationsContextualizeDuringEarlySkillLearning;visit=swh:1:snp:365eeff833452df5c3212515d79528071d0c3e44;anchor=swh:1:rev:11d27bc37017e6073e9cf44bf80d0e8856fecc64
  19. Dayan E, Cohen LG. Neuroplasticity subserving motor skill learning. Neuron. 2011;72:443–454. doi: 10.1016/j.neuron.2011.10.008.
  20. Dehaene S, Meyniel F, Wacongne C, Wang L, Pallier C. The neural representation of sequences: From transition probabilities to algebraic patterns and linguistic trees. Neuron. 2015;88:2–19. doi: 10.1016/j.neuron.2015.09.019.
  21. Delorme A, Makeig S. EEGLAB: an open source toolbox for analysis of single-trial EEG dynamics including independent component analysis. Journal of Neuroscience Methods. 2004;134:9–21. doi: 10.1016/j.jneumeth.2003.10.009.
  22. Destrieux C, Fischl B, Dale A, Halgren E. Automatic parcellation of human cortical gyri and sulci using standard anatomical nomenclature. NeuroImage. 2010;53:1–15. doi: 10.1016/j.neuroimage.2010.06.010.
  23. Ding C, Peng H. Minimum redundancy feature selection from microarray gene expression data. Journal of Bioinformatics and Computational Biology. 2005;3:185–205. doi: 10.1142/s0219720005001004.
  24. Doyon J, Benali H. Reorganization and plasticity in the adult brain during learning of motor skills. Current Opinion in Neurobiology. 2005;15:161–167. doi: 10.1016/j.conb.2005.03.004.
  25. Flint RD, Ethier C, Oby ER, Miller LE, Slutzky MW. Local field potentials allow accurate decoding of muscle activity. Journal of Neurophysiology. 2012;108:18–24. doi: 10.1152/jn.00832.2011.
  26. Frohlich J, Toker D, Monti MM. Consciousness among delta waves: a paradox? Brain. 2021;144:2257–2277. doi: 10.1093/brain/awab095.
  27. Georgopoulos AP, Kalaska JF, Caminiti R, Massey JT. On the relations between the direction of two-dimensional arm movements and cell discharge in primate motor cortex. The Journal of Neuroscience. 1982;2:1527–1537. doi: 10.1523/JNEUROSCI.02-11-01527.1982.
  28. Georgopoulos AP. Population activity in the control of movement. International Review of Neurobiology. 1994;37:103–119. doi: 10.1016/s0074-7742(08)60241-x.
  29. Ghilardi MF, Moisello C, Silvestri G, Ghez C, Krakauer JW. Learning of a sequential motor skill comprises explicit and implicit components that consolidate differently. Journal of Neurophysiology. 2009;101:2218–2229. doi: 10.1152/jn.01138.2007.
  30. Grafton ST, Hazeltine E, Ivry RB. Motor sequence learning with the nondominant left hand: a PET functional imaging study. Experimental Brain Research. 2002;146:369–378. doi: 10.1007/s00221-002-1181-y.
  31. Gramfort A, Luessi M, Larson E, Engemann DA, Strohmeier D, Brodbeck C, Goj R, Jas M, Brooks T, Parkkonen L, Hämäläinen M. MEG and EEG data analysis with MNE-Python. Frontiers in Neuroscience. 2013;7:267. doi: 10.3389/fnins.2013.00267.
  32. Griffin S, Khanna P, Choi H, Thiesen K, Novik L, Morecraft RJ, Ganguly K. Ensemble reactivations during brief rest drive fast learning of sequences. Nature. 2025;638:1034–1042. doi: 10.1038/s41586-024-08414-9.
  33. Gupta MW, Rickard TC. Dissipation of reactive inhibition is sufficient to explain post-rest improvements in motor sequence learning. NPJ Science of Learning. 2022;7:25. doi: 10.1038/s41539-022-00140-z.
  34. Hall TM, Nazarpour K, Jackson A. Real-time estimation and biofeedback of single-neuron firing rates using local field potentials. Nature Communications. 2014;5:5462. doi: 10.1038/ncomms6462.
  35. Hayward W, Buch ER, Norato G, Iwane F, Dash D, Salamanca-Girón RF, Bartrum E, Walitt B, Nath A, Cohen LG. Procedural motor memory deficits in patients with long-COVID. Neurology. 2024;102:e208073. doi: 10.1212/WNL.0000000000208073.
  36. Hikosaka O, Nakahara H, Rand MK, Sakai K, Lu X, Nakamura K, Miyachi S, Doya K. Parallel neural networks for learning sequential procedures. Trends in Neurosciences. 1999;22:464–471. doi: 10.1016/s0166-2236(99)01439-3.
  37. Holland PW, Welsch RE. Robust regression using iteratively reweighted least-squares. Communications in Statistics - Theory and Methods. 1977;6:813–827. doi: 10.1080/03610927708827533.
  38. Jacobacci F, Armony JL, Yeffal A, Lerner G, Amaro E, Jr, Jovicich J, Doyon J, Della-Maggiore V. Rapid hippocampal plasticity supports motor sequence learning. PNAS. 2020;117:23898–23903. doi: 10.1073/pnas.2009576117.
  39. Karni A, Meyer G, Jezzard P, Adams MM, Turner R, Ungerleider LG. Functional MRI evidence for adult motor cortex plasticity during motor skill learning. Nature. 1995;377:155–158. doi: 10.1038/377155a0.
  40. Kleim JA, Barbay S, Nudo RJ. Functional reorganization of the rat motor cortex following motor skill learning. Journal of Neurophysiology. 1998;80:3321–3325. doi: 10.1152/jn.1998.80.6.3321.
  41. Komorowski RW, Manns JR, Eichenbaum H. Robust conjunctive item-place coding by hippocampal neurons parallels learning what happens where. The Journal of Neuroscience. 2009;29:9918–9929. doi: 10.1523/JNEUROSCI.1378-09.2009.
  42. Kornysheva K, Bush D, Meyer SS, Sadnicka A, Barnes G, Burgess N. Neural competitive queuing of ordinal structure underlies skilled sequential action. Neuron. 2019;101:1166–1180. doi: 10.1016/j.neuron.2019.01.018.
  43. Krasoulis A, Hall TM, Vijayakumar S, Jackson A, Nazarpour K. Generalizability of EMG decoding using local field potentials. Annual International Conference of the IEEE Engineering in Medicine and Biology Society; 2014. pp. 1630–1633.
  44. Kumar N, Manning TF, Ostry DJ. Somatosensory cortex participates in the consolidation of human motor memory. PLOS Biology. 2019;17:e3000469. doi: 10.1371/journal.pbio.3000469.
  45. Lee SH, Jin SH, An J. The difference in cortical activation pattern for complex motor skills: A functional near- infrared spectroscopy study. Scientific Reports. 2019;9:14066. doi: 10.1038/s41598-019-50644-9.
  46. Lee HS, Schreiner L, Jo S-H, Sieghartsleitner S, Jordan M, Pretl H, Guger C, Park H-S. Individual finger movement decoding using a novel ultra-high-density electroencephalography-based brain-computer interface system. Frontiers in Neuroscience. 2022;16:1009878. doi: 10.3389/fnins.2022.1009878.
  47. Lemon RN. Descending pathways in motor control. Annual Review of Neuroscience. 2008;31:195–218. doi: 10.1146/annurev.neuro.31.060407.125547.
  48. Liao K, Xiao R, Gonzalez J, Ding L. Decoding individual finger movements from one hand using human EEG signals. PLOS ONE. 2014;9:e85192. doi: 10.1371/journal.pone.0085192.
  49. Lisman J, Buzsáki G. A neural coding scheme formed by the combined function of gamma and theta oscillations. Schizophrenia Bulletin. 2008;34:974–980. doi: 10.1093/schbul/sbn060.
  50. Liu C, You J, Wang K, Zhang S, Huang Y, Xu M, Ming D. Decoding the EEG patterns induced by sequential finger movement for brain-computer interfaces. Frontiers in Neuroscience. 2023;17:1180471. doi: 10.3389/fnins.2023.1180471.
  51. Maaten L, Postma EO. Dimensionality reduction: A comparative review. Journal of Machine Learning Research. 2009;10:13
  52. Maidhof C, Rieger M, Prinz W, Koelsch S. Nobody is perfect: ERP effects prior to performance errors in musicians indicate fast monitoring processes. PLOS ONE. 2009;4:e5032. doi: 10.1371/journal.pone.0005032.
  53. Marcus DS, Harwell J, Olsen T, Hodge M, Glasser MF, Prior F, Jenkinson M, Laumann T, Curtiss SW, Van Essen DC. Informatics and data mining tools and strategies for the human connectome project. Frontiers in Neuroinformatics. 2011;5:4. doi: 10.3389/fninf.2011.00004.
  54. Merino EC, Faes A, Van Hulle MM. The role of distinct ECoG frequency features in decoding finger movement. Journal of Neural Engineering. 2023;20:ad0c5e. doi: 10.1088/1741-2552/ad0c5e.
  55. Mollazadeh M, Aggarwal V, Davidson AG, Law AJ, Thakor NV, Schieber MH. Spatiotemporal variation of multiple neurophysiological signals in the primary motor cortex during dexterous reach-to-grasp movements. The Journal of Neuroscience. 2011;31:15531–15543. doi: 10.1523/JNEUROSCI.2999-11.2011.
  56. Munn BR, Müller EJ, Favre-Bulle I, Scott E, Lizier JT, Breakspear M, Shine JM. Multiscale organization of neuronal activity unifies scale-dependent theories of brain function. Cell. 2024;187:7303–7313. doi: 10.1016/j.cell.2024.10.004.
  57. Mylonas D, Schapiro AC, Verfaellie M, Baxter B, Vangel M, Stickgold R, Manoach DS. Maintenance of procedural motor memory across brief rest periods requires the hippocampus. The Journal of Neuroscience. 2024;44:e1839232024. doi: 10.1523/JNEUROSCI.1839-23.2024.
  58. Natraj N, Silversmith DB, Chang EF, Ganguly K. Compartmentalized dynamics within a common multi-area mesoscale manifold represent a repertoire of human hand movements. Neuron. 2022;110:154–174. doi: 10.1016/j.neuron.2021.10.002.
  59. Oostenveld R, Fries P, Maris E, Schoffelen JM. FieldTrip: Open source software for advanced analysis of MEG, EEG, and invasive electrophysiological data. Computational Intelligence and Neuroscience. 2011;2011:156869. doi: 10.1155/2011/156869.
  60. Pan SC, Rickard TC. Sleep and motor learning: Is there room for consolidation? Psychological Bulletin. 2015;141:812–834. doi: 10.1037/bul0000009.
  61. Pavlides C, Miyashita E, Asanuma H. Projection from the sensory to the motor cortex is important in learning motor skills in the monkey. Journal of Neurophysiology. 1993;70:733–741. doi: 10.1152/jn.1993.70.2.733.
  62. Quandt F, Reichert C, Hinrichs H, Heinze HJ, Knight RT, Rieger JW. Single trial discrimination of individual finger movements on one hand: a combined MEG and EEG study. NeuroImage. 2012;59:3316–3324. doi: 10.1016/j.neuroimage.2011.11.053.
  63. Ramanathan DS, Guo L, Gulati T, Davidson G, Hishinuma AK, Won S-J, Knight RT, Chang EF, Swanson RA, Ganguly K. Low-frequency cortical activity is a neuromodulatory target that tracks recovery after stroke. Nature Medicine. 2018;24:1257–1267. doi: 10.1038/s41591-018-0058-y.
  64. Reddy L, Self MW, Zoefel B, Poncet M, Possel JK, Peters JC, Baayen JC, Idema S, VanRullen R, Roelfsema PR. Theta-phase dependent neuronal coding during sequence learning in human single neurons. Nature Communications. 2021;12:4839. doi: 10.1038/s41467-021-25150-0.
  65. Rijsbergen V. Reviews: Van Rijsbergen. goodreads; 1979.
  66. Robertson EM, Cohen DA. Understanding consolidation through the architecture of memories. The Neuroscientist. 2006;12:261–271. doi: 10.1177/1073858406287935.
  67. Ruiz MH, Jabusch HC, Altenmüller E. Detecting wrong notes in advance: neuronal correlates of error monitoring in pianists. Cerebral Cortex. 2009;19:2625–2639. doi: 10.1093/cercor/bhp021.
  68. Sawamura D, Sakuraba S, Suzuki Y, Asano M, Yoshida S, Honke T, Kimura M, Iwase Y, Horimoto Y, Yoshida K, Sakai S. Acquisition of chopstick-operation skills with the non-dominant hand and concomitant changes in brain activity. Scientific Reports. 2019;9:20397. doi: 10.1038/s41598-019-56956-0.
  69. Schmidt RA. Motor Control and Learning: A Behavioral Emphasis. Human kinetics; 2018.
  70. Schütze H, Manning CD, Raghavan P. Introduction to Information Retrieval. Cambridge University Press; 2008.
  71. Sjøgård M. Hippocampal ripples mediate motor learning during brief rest breaks in humans. bioRxiv. 2024 doi: 10.1101/2024.05.02.592200.
  72. Song S, Cohen LG. Practice and sleep form different aspects of skill. Nature Communications. 2014;5:3407. doi: 10.1038/ncomms4407.
  73. Stefanics G, Hangya B, Hernádi I, Winkler I, Lakatos P, Ulbert I. Phase entrainment of human delta oscillations can mediate the effects of expectation on reaction speed. The Journal of Neuroscience. 2010;30:13578–13585. doi: 10.1523/JNEUROSCI.0703-10.2010.
  74. Tomassini A, Ambrogioni L, Medendorp WP, Maris E. Theta oscillations locked to intended actions rhythmically modulate perception. eLife. 2017;6:e25618. doi: 10.7554/eLife.25618.
  75. Walker MP, Brakefield T, Morgan A, Hobson JA, Stickgold R. Practice with sleep makes perfect: sleep-dependent motor skill learning. Neuron. 2002;35:205–211. doi: 10.1016/s0896-6273(02)00746-8.
  76. Walker MP, Stickgold R. Sleep-dependent learning and memory consolidation. Neuron. 2004;44:121–133. doi: 10.1016/j.neuron.2004.08.031.
  77. Wamsley EJ, Arora M, Gibson H, Powell P, Collins M. Memory consolidation during ultra-short offline states. Journal of Cognitive Neuroscience. 2023;35:1617–1634. doi: 10.1162/jocn_a_02035.
  78. Waters S, Wiestler T, Diedrichsen J. Cooperation not competition: Bihemispheric tDCS and fMRI Show role for ipsilateral hemisphere in motor learning. The Journal of Neuroscience. 2017;37:7500–7512. doi: 10.1523/JNEUROSCI.3414-16.2017.
  79. Yao L, Zhu B, Shoaran M. Fast and accurate decoding of finger movements from ECoG through Riemannian features and modern machine learning techniques. Journal of Neural Engineering. 2022;19:016037. doi: 10.1088/1741-2552/ac4ed1.
  80. Yokoi A, Diedrichsen J. Neural organization of hierarchical motor sequence representations in the human neocortex. Neuron. 2019;103:1178–1190. doi: 10.1016/j.neuron.2019.06.017.
  81. Yu L, Liu H. Efficient feature selection via analysis of relevance and redundancy. Journal of Machine Learning Research. 2004;5:1205–1224.
  82. Zhao Y, Zhang X, Li X, Zhao H, Chen X, Chen X, Gao X. Decoding finger movement patterns from microscopic neural drive information based on deep learning. Medical Engineering & Physics. 2022;104:103797. doi: 10.1016/j.medengphy.2022.103797.
  83. Zimerman M, Heise K-F, Gerloff C, Cohen LG, Hummel FC. Disrupting the ipsilateral motor cortex interferes with training of a complex motor task in older adults. Cerebral Cortex. 2014;24:1030–1036. doi: 10.1093/cercor/bhs385.

eLife Assessment

Juan Alvaro Gallego 1

This valuable study asks how the neural representation of individual finger movements changes during the early periods of sequence learning. By combining a new method for extracting features from human magnetoencephalography data and decoding analyses, the authors provide solid evidence of an early, swift change in the brain regions correlated with sequence learning, including a set of previously unreported frontal cortical regions. The authors also show that offline contextualization during short rest periods is the basis for improved performance. Confirmation of these results on multiple movement sequences would further strengthen the key claims.

Reviewer #1 (Public review):

Anonymous

Summary:

This study addresses the issue of rapid skill learning and whether individual sequence elements (here: finger presses) are differentially represented in human MEG data. The authors use a decoding approach to classify individual finger elements, and achieve an accuracy of around 94%. A relevant finding is that the neural representations of individual finger elements dynamically change over the course of learning. This would be highly relevant for any attempts to develop better brain-machine interfaces: individual elements within a sequence can now be decoded with high precision, but these representations are not static and instead develop over the course of learning.

Strengths:

The work follows from a large body of work from the same group on the behavioural and neural foundations of sequence learning. The behavioural task is well established and neatly designed to allow for tracking learning and how individual sequence elements contribute. The inclusion of short offline rest periods between learning epochs has been influential because it has revealed that a lot, if not most, of the gains in behaviour (i.e. speed of finger movements) occur in these so-called micro-offline rest periods.

The authors use a range of new decoding techniques, and exhaustively interrogate their data in different ways, using different decoding approaches. Regardless of the approach, impressively high decoding accuracies are observed, but when using a hybrid approach that combines the MEG data in different ways, the authors observe decoding accuracies of individual sequence elements from the MEG data of up to 94%.

Reviewer #2 (Public review):

Anonymous

Summary:

The current paper consists of two parts. The first part is the rigorous feature optimization of the MEG signal to decode individual finger identity performed in a sequence (4-1-3-2-4; 1~4 corresponds to little~index fingers of the left hand). By optimizing various parameters for the MEG signal, in terms of (i) reconstructed source activity in voxel- and parcel-level resolution and their combination, (ii) frequency bands, and (iii) time window relative to press onset for each finger movement, as well as the choice of decoders, the resultant "hybrid decoder" achieved extremely high decoding accuracy (~95%).

In the second part of the paper, armed with the successful 'hybrid decoder,' the authors asked how the neural representation of an individual finger movement embedded in a sequence changes during a very early period of skill learning, and whether and how such representational change can predict skill learning. They assessed the difference in MEG feature patterns between the first and the last press 4 in sequence 41324 at each training trial and found that the pattern differentiation progressively increased over the course of early learning trials. Additionally, they found that this pattern differentiation specifically occurred during the rest period rather than during the practice trial. With a significant correlation between the trial-by-trial profile of this pattern differentiation and that for accumulation of offline learning, the authors argue that such "contextualization" of finger movement in a sequence (e.g., what-where association) underlies the early improvement of sequential skill. This is an important and timely topic for the field of motor learning and beyond.

Strengths:

The use of temporally rich neural information (MEG signal) has a significant advantage over previous studies testing sequential representations using fMRI. This allowed the authors to examine the earliest period (= the first few minutes of training) of skill learning with finer temporal resolution. Through the optimization of MEG feature extraction, the current study achieved extremely high decoding accuracy (approx. 94%) compared to previous works. The finding of the early "contextualization" of the finger movement in a sequence and its correlation to early (offline) skill improvement is interesting and important. The comparison between "online" and "offline" pattern distance is a neat idea.

Weaknesses:

One potential weakness, in terms of generality, is that the study assessed a single sequence, "41324", across all participants. Future confirmation using different sequences would be important.

Reviewer #3 (Public review):

Anonymous

Summary:

One goal of this paper is to introduce a new approach for highly accurate decoding of finger movements from human magnetoencephalography data via dimension reduction of a "multi-scale, hybrid" feature space. Following this decoding approach, the authors aim to show that early skill learning involves "contextualization" of the neural coding of individual movements, relative to their position in a sequence of consecutive movements. Furthermore, they aim to show that this "contextualization" develops primarily during short rest periods interspersed with skill training, and correlates with a performance metric which the authors interpret as an indicator of offline learning.

Strengths:

A strength of the paper is the innovative decoding approach, which achieves impressive decoding accuracies via dimension reduction of a "multi-scale, hybrid space". This hybrid-space approach follows the neurobiologically plausible idea of concurrent distribution of neural coding across local circuits as well as large-scale networks.

Weaknesses:

A clear weakness of the paper lies in the authors' conclusions regarding "contextualization". Several potential confounds, which partly arise from the experimental design, and which are described below, question the neurobiological implications proposed by the authors, and offer a simpler explanation of the results. Furthermore, the paper follows the assumption that short breaks result in offline skill learning, while recent evidence casts doubt on this assumption.

Specifically:

The authors interpret the ordinal position information captured by their decoding approach as a reflection of neural coding dedicated to the local context of a movement (Figure 4). One way to dissociate ordinal position information from information about the moving effectors is to train a classifier on one sequence, and test the classifier on other sequences that require the same movements, but in different positions (Kornysheva et al., Neuron 2019). In the present study, however, participants trained to repeat a single sequence (4-1-3-2-4). As a result, ordinal position information is potentially confounded by the fixed finger transitions around each of the two critical positions (first and fifth press). Across consecutive correct sequences, the first keypress in a given sequence was always preceded by a movement of the index finger (=last movement of the preceding sequence), and followed by a little finger movement. The last keypress, on the other hand, was always preceded by a ring finger movement, and followed by an index finger movement (=first movement of the next sequence). Figure 3 - supplement 5 shows that finger identity can be decoded with high accuracy (>70%) across a large time window around the time of the keypress, up to at least ±100 ms (and likely beyond, given that decoding accuracy is still high at the boundaries of the window depicted in that figure). This time window approaches the keypress transition times in this study. Given that distinct finger transitions characterized the first and fifth keypress, the classifier could thus rely on persistent (or "lingering") information from the preceding finger movement, and/or "preparatory" information about the subsequent finger movement, in order to dissociate the first and fifth keypress.
Currently, the manuscript provides little evidence that the context information captured by the decoding approach is more than a by-product of temporally extended, and therefore overlapping, but independent neural representations of consecutive keypresses that are executed in close temporal proximity - rather than a neural representation dedicated to context.
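
The transition structure at issue can be made concrete with a minimal sketch. It assumes back-to-back correct repetitions of the trained sequence 4-1-3-2-4; the function name `neighbours` is hypothetical and does not come from the manuscript.

```python
# Trained sequence from the study, as key numbers. Per the review text,
# key 4 is pressed with the index finger, key 1 with the little finger,
# and key 2 with the ring finger.
SEQUENCE = [4, 1, 3, 2, 4]


def neighbours(sequence, position):
    """Return (previous_key, next_key) for a 0-based ordinal position,
    assuming the sequence is repeated back to back without errors."""
    n = len(sequence)
    previous_key = sequence[(position - 1) % n]
    next_key = sequence[(position + 1) % n]
    return previous_key, next_key


# Index-finger presses occur at ordinal positions 1 and 5 (0-based 0 and 4).
op1_context = neighbours(SEQUENCE, 0)  # (4, 1): after index, before little
op5_context = neighbours(SEQUENCE, 4)  # (2, 4): after ring, before index
```

The two index-finger presses differ in both the preceding and the following key, which is why a classifier could in principle separate them using lingering or preparatory activity alone.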

During the review process, the authors pointed out that a "mixing" of temporally overlapping information from consecutive keypresses, as described above, should result in systematic misclassifications and therefore be detectable in the confusion matrices in Figures 3C and 4B, which indeed do not provide any evidence that consecutive keypresses are systematically confused. However, such absence of evidence (of systematic misclassification) should be interpreted with caution. The authors also reported that there was only a weak relation between inter-press intervals and "online contextualization" (Figure 5 - figure supplement 6). However, their analysis surprisingly includes a keypress transition that is shared between OP1 and OP5 ("4-4"), rather than focusing solely on the two distinctive transitions ("2-4" and "4-1").

Such temporal overlap of consecutive, independent finger representations may also account for the dynamics of "ordinal coding"/"contextualization", i.e., the increase in 2-class decoding accuracy, across Day 1 (Figure 4C). As learning progresses, both tapping speed and the consistency of keypress transition times increase (Figure 1), i.e., consecutive keypresses are closer in time, and more consistently so. As a result, information related to a given keypress is increasingly overlapping in time with information related to the preceding and subsequent keypresses. Furthermore, learning should increase the number of (consecutively) correct sequences, and, thus, the consistency of finger transitions. Therefore, the increase in 2-class decoding accuracy may simply reflect an increasing overlap in time of increasingly consistent information from consecutive keypresses, which allows the classifier to dissociate the first and fifth keypress more reliably as learning progresses, simply based on the characteristic finger transitions associated with each. In other words, given that the physical context of a given keypress changes as learning progresses - keypresses move closer together in time, and are more consistently correct - it seems problematic to conclude that the mental representation of that context changes. During the review process, the authors pointed to an absence of evidence of a relation between tapping speed and "ordinal coding" (Figure 5 - figure supplement 7). However, a rigorous test of the idea that the mental representation of context changes would require a task design in which the physical context remains constant.

A similar difference in physical context may explain why neural representation distances ("differentiation") differ between rest and practice (Figure 5). The authors define "offline differentiation" by comparing the hybrid space features of the last index finger movement of a trial (ordinal position 5) and the first index finger movement of the next trial (ordinal position 1). However, the latter is not only the first movement in the sequence, but also the very first movement in that trial (at least in trials that started with a correct sequence), i.e., not preceded by any recent movement. In contrast, the last index finger of the last correct sequence in the preceding trial includes the characteristic finger transition from the fourth to the fifth movement. Thus, there is more overlapping information arising from the consistent, neighbouring keypresses for the last index finger movement, compared to the first index finger movement of the next trial. A strong difference (larger neural representation distance) between these two movements is, therefore, not surprising, given the task design, and this difference is also expected to increase with learning, given the increase in tapping speed, and the consequent stronger overlap in representations for consecutive keypresses.
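
The comparison described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the dictionary keys `op1_first` and `op5_last` are hypothetical, the vectors stand in for the hybrid-space features, and the within-trial "online" comparison is an assumption modelled on the trial-based variant mentioned in the author response.

```python
import math


def offline_distances(trials):
    """Distance between the last OP5 press of trial n and the first OP1
    press of trial n + 1 (one value per rest interval)."""
    return [
        math.dist(trials[i]["op5_last"], trials[i + 1]["op1_first"])
        for i in range(len(trials) - 1)
    ]


def online_distances(trials):
    """Distance between the first OP1 press and the last OP5 press within
    the same trial (one value per practice period)."""
    return [math.dist(t["op1_first"], t["op5_last"]) for t in trials]


# Toy 2-D feature vectors, purely illustrative (not study data).
toy_trials = [
    {"op1_first": (0.0, 0.0), "op5_last": (3.0, 4.0)},
    {"op1_first": (3.0, 0.0), "op5_last": (0.0, 0.0)},
]
offline = offline_distances(toy_trials)  # one rest interval
online = online_distances(toy_trials)    # two practice periods
```

Note that in the offline comparison the OP1 press is the first movement of its trial, whereas the OP5 press follows a full run of preceding keypresses; the sketch makes that asymmetry explicit in which vectors are paired.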

A further complication in interpreting the results stems from the visual feedback that participants received during the task. Each keypress generated an asterisk shown above the string on the screen. It is not clear why the authors introduced this complicating visual feedback in their task, besides consistency with their previous studies. The resulting systematic link between the pattern of visual stimulation (the number of asterisks on the screen) and the ordinal position of a keypress makes the interpretation of "contextual information" that differentiates between ordinal positions difficult. While the authors report the surprising finding that their eye-tracking data could not predict asterisk position on the task display above chance level, the mean gaze position seemed to vary systematically as a function of ordinal position of a movement - see Figure 4 - figure supplement 3.

The authors report a significant correlation between "offline differentiation" and cumulative micro-offline gains. However, to reach the conclusion that "the degree of representational differentiation -particularly prominent over rest intervals - correlated with skill gains.", the critical question is rather whether "offline differentiation" correlates with micro-offline gains (not with cumulative micro-offline gains). That is, does the degree to which representations differentiate "during" a given rest period correlate with the degree to which performance improves from before to after the same rest period (not: does "offline differentiation" in a given rest period correlate with the degree to which performance has improved "during" all rest periods up to the current rest period - but this is what Figure 5 - figure supplements 1 and 4 show).
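
The distinction between the two correlations can be illustrated with a small sketch. The numbers are toy values chosen only to show that the per-rest and cumulative correlations can diverge; they are not study data, and the variable names are hypothetical.

```python
from itertools import accumulate
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# One value per rest interval (toy data).
differentiation = [1.0, 2.0, 3.0, 4.0]
micro_offline_gains = [0.4, 0.1, 0.3, 0.2]

# Running total of gains over all rest intervals up to the current one.
cumulative_gains = list(accumulate(micro_offline_gains))

r_per_rest = pearson(differentiation, micro_offline_gains)  # negative here
r_cumulative = pearson(differentiation, cumulative_gains)   # strongly positive here
```

Because a running total of mostly positive gains rises monotonically, it will correlate with any measure that also increases across trials, regardless of whether the two are related within a given rest period.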

The authors follow the assumption that micro-offline gains reflect offline learning. However, there is no compelling evidence in the literature, and no evidence in the present manuscript, that micro-offline gains (during any training phase) reflect offline learning. Instead, emerging evidence in the literature indicates that they do not (Das et al., bioRxiv 2024): they reflect transient performance benefits when participants train with breaks, compared to participants who train without breaks. These benefits vanish within seconds after training if both groups of participants perform under comparable conditions (Das et al., bioRxiv 2024). During the review process, the authors argued that differences in the design between Das et al. (2024) on the one hand (Experiments 1 and 2), and the study by Bönstrup et al. (2019) on the other hand, may have prevented Das et al. (2024) from finding the assumed (lasting) learning benefit by micro-offline consolidation. However, the Supplementary Material of Das et al. (2024) includes an experiment (Experiment S1) whose design closely follows the early learning phase of Bönstrup et al. (2019), and which, nevertheless, demonstrates that there is no lasting benefit of taking breaks for the acquired skill level, despite the presence of micro-offline gains.

Along these lines, the authors argue that their practice schedule "minimizes reactive inhibition effects", in particular their short practice periods of 10 seconds each. However, 10 seconds are sufficient to result in motor slowing, as reported in Bächinger et al., eLife 2019, or Rodrigues et al., Exp Brain Res 2009.

An important conceptual problem with the current study is that the authors conclude that performance improves, and representation manifolds differentiate, "during" rest periods. However, micro-offline gains (as well as offline contextualization) are computed from data obtained during practice, not rest, and may, thus, just as well reflect a change that occurs "online", e.g., at the very onset of practice (like pre-planning) or throughout practice (like fatigue, or reactive inhibition).

The authors' conclusion that "low-frequency oscillations (LFOs) result in higher decoding accuracy compared to other narrow-band activity" should be taken with caution, given that the critical decoding analysis for this conclusion was based on data averaged across a time window of 200 ms (Figure 2), essentially smoothing out higher frequency components.

eLife. 2025 Sep 12;13:RP102475. doi: 10.7554/eLife.102475.4.sa4

Author response

Debadatta Dash 1, Fumiaki Iwane 2, William Hayward 3, Roberto F Salamanca-Giron 4, Marlene Bönstrup 5, Ethan R Buch 6, Leonardo Cohen 7

The following is the authors’ response to the previous reviews

Overview of reviewer's concerns after peer review:

As for the initial submission, the reviewers' unanimous opinion is that the authors should perform additional controls to show that their key findings may not be affected by experimental or analysis artefacts, and clarify key aspects of their core methods, chiefly:

(1) The fact that their extremely high decoding accuracy is driven by frequency bands that would reflect the key press movements and that these are located bilaterally in frontal brain regions (with the task being unilateral) are seen as key concerns,

The above statement that decoding was driven by bilateral frontal brain regions is not entirely consistent with our results. The confusion was likely caused by the way we originally presented our data in Figure 2. We have revised that figure to make it clearer that decoding performance at both the parcel- (Figure 2B) and voxel-space (Figure 2C) level is predominantly driven by contralateral (as opposed to ipsilateral) sensorimotor regions. Figure 2D, which highlights bilateral sensorimotor and premotor regions, displays accuracy of individual regional voxel-space decoders assessed independently. This was the criterion used to determine which regional voxel-spaces were included in the hybrid-space decoder. This result is not surprising given that motor and premotor regions are known to display adaptive interhemispheric interactions during motor sequence learning [1, 2], and particularly so when the skill is performed with the non-dominant hand [3-5]. We now discuss this important detail in the revised manuscript:

Discussion (lines 348-353)

“The whole-brain parcel-space decoder likely emphasized more stable activity patterns in contralateral frontoparietal regions that differed between individual finger movements [21,35], while the regional voxel-space decoder likely incorporated information related to adaptive interhemispheric interactions operating during motor sequence learning [32,36,37], particularly pertinent when the skill is performed with the non-dominant hand [38-40].”

We now also include new control analyses that directly address the potential contribution of movement-related artefact to the results. These changes are reported in the revised manuscript as follows:

Results (lines 207-211):

“An alternate decoder trained on ICA components labeled as movement or physiological artefacts (e.g. – head movement, ECG, eye movements and blinks; Figure 3 – figure supplement 3A, D) and removed from the original input feature set during the pre-processing stage approached chance-level performance (Figure 4 – figure supplement 3), indicating that the 4-class hybrid decoder results were not driven by task-related artefacts.”

Results (lines 261-268):

“As expected, the 5-class hybrid-space decoder performance approached chance levels when tested with randomly shuffled keypress labels (18.41%± SD 7.4% for Day 1 data; Figure 4 – figure supplement 3C). Task-related eye movements did not explain these results since an alternate 5-class hybrid decoder constructed from three eye movement features (gaze position at the KeyDown event, gaze position 200ms later, and peak eye movement velocity within this window; Figure 4 – figure supplement 3A) performed at chance levels (cross-validated test accuracy = 0.2181; Figure 4 – figure supplement 3B, C).”
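
The logic of the shuffled-label control can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' analysis code: the function names, the number of permutations, and the toy data are hypothetical. With five balanced classes, the expected chance accuracy is 1/5 = 20%.

```python
import random


def accuracy(predicted, actual):
    """Fraction of predictions that match the labels."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def shuffled_label_accuracy(predicted, actual, n_permutations=200, seed=0):
    """Mean accuracy after randomly permuting the test labels."""
    rng = random.Random(seed)
    shuffled = list(actual)
    scores = []
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        scores.append(accuracy(predicted, shuffled))
    return sum(scores) / len(scores)


# Toy data: five balanced classes and a decoder that is always right.
labels = [k % 5 for k in range(100)]
predictions = list(labels)

true_score = accuracy(predictions, labels)
chance_score = shuffled_label_accuracy(predictions, labels)
```

A decoder whose accuracy on shuffled labels sits near 1/5 while its accuracy on the true labels is far higher is exploiting a genuine association between features and labels; the control does not, however, identify what that association is.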

Discussion (Lines 362-368):

“Task-related movements—which also express in lower frequency ranges—did not explain these results given the near chance-level performance of alternative decoders trained on (a) artefact-related ICA components removed during MEG preprocessing (Figure 3 – figure supplement 3A-C) and on (b) task-related eye movement features (Figure 4 – figure supplement 3B, C). This explanation is also inconsistent with the minimal average head motion of 1.159 mm (± 1.077 SD) across the MEG recording (Figure 3 – figure supplement 3D).”

(2) Relatedly, the use of a wide time window (~200 ms) for a 250-330 ms typing speed makes it hard to pinpoint the changes underpinning learning,

The revised manuscript now includes analyses carried out with decoding time windows ranging from 50 to 250ms in duration. These additional results are now reported in:

Results (lines 258-261):

“The improved decoding accuracy is supported by greater differentiation in neural representations of the index finger keypresses performed at positions 1 and 5 of the sequence (Figure 4A), and by the trial-by-trial increase in 2-class decoding accuracy over early learning (Figure 4C) across different decoder window durations (Figure 4 – figure supplement 2).”

Results (lines 310-312):

“Offline contextualization strongly correlated with cumulative micro-offline gains (r = 0.903, R2 = 0.816, p < 0.001; Figure 5 – figure supplement 1A, inset) across decoder window durations ranging from 50 to 250ms (Figure 5 – figure supplement 1B, C).”

Discussion (lines 382-385):

“This was further supported by the progressive differentiation of neural representations of the index finger keypress (Figure 4A) and by the robust trial-by-trial increase in 2-class decoding accuracy across time windows ranging between 50 and 250ms (Figure 4C; Figure 4 – figure supplement 2).”

Discussion (lines 408-9):

“Offline contextualization consistently correlated with early learning gains across a range of decoding windows (50–250ms; Figure 5 – figure supplement 1).”

(3) These concerns make it hard to conclude from their data that learning is mediated by "contextualisation" ---a key claim in the manuscript;

We believe the revised manuscript now addresses all concerns raised in Editor points 1 and 2.

(4) The hybrid voxel + parcel space decoder ---a key contribution of the paper--- is not clearly explained;

We now provide additional details regarding the hybrid-space decoder approach in the following sections of the revised manuscript:

Results (lines 158-172):

“Next, given that the brain simultaneously processes information more efficiently across multiple spatial and temporal scales [28, 32, 33], we asked if the combination of lower resolution whole-brain and higher resolution regional brain activity patterns further improve keypress prediction accuracy. We constructed hybrid-space decoders (N = 1295 ± 20 features; Figure 3A) combining whole-brain parcel-space activity (n = 148 features; Figure 2B) with regional voxel-space activity from a data-driven subset of brain areas (n = 1147 ± 20 features; Figure 2D). This subset covers brain regions showing the highest regional voxel-space decoding performances (top regions across all subjects shown in Figure 2D; Methods – Hybrid Spatial Approach).

[…]

Note that while features from contralateral brain regions were more important for whole-brain decoding (in both parcel- and voxel-spaces), regional voxel-space decoders performed best for bilateral sensorimotor areas on average across the group. Thus, a multi-scale hybrid-space representation best characterizes the keypress action manifolds.”

Results (lines 275-282):

“We used a Euclidian distance measure to evaluate the differentiation of the neural representation manifold of the same action (i.e. - an index-finger keypress) executed within different local sequence contexts (i.e. - ordinal position 1 vs. ordinal position 5; Figure 5). To make these distance measures comparable across participants, a new set of classifiers was then trained with group-optimal parameters (i.e. – broadband hybrid-space MEG data with subsequent manifold extraction Figure 3 – figure supplements 2) and LDA classifiers (Figure 3 – figure supplements 7) trained on 200ms duration windows aligned to the KeyDown event (see Methods, Figure 3 – figure supplements 5).”

Discussion (lines 341-360):

“The initial phase of the study focused on optimizing the accuracy of decoding individual finger keypresses from MEG brain activity. Recent work showed that the brain simultaneously processes information more efficiently across multiple—rather than a single—spatial scale(s) [28, 32]. To this effect, we developed a novel hybrid-space approach designed to integrate neural representation dynamics over two different spatial scales: (1) whole-brain parcel-space (i.e. – spatial activity patterns across all cortical brain regions) and (2) regional voxel-space (i.e. – spatial activity patterns within select brain regions) activity. We found consistent spatial differences between whole-brain parcel-space feature importance (predominantly contralateral frontoparietal, Figure 2B) and regional voxel-space decoder accuracy (bilateral sensorimotor regions, Figure 2D). The whole-brain parcel-space decoder likely emphasized more stable activity patterns in contralateral frontoparietal regions that differed between individual finger movements [21, 35], while the regional voxel-space decoder likely incorporated information related to adaptive interhemispheric interactions operating during motor sequence learning [32, 36, 37], particularly pertinent when the skill is performed with the non-dominant hand [38-40]. The observation of increased cross-validated test accuracy (as shown in Figure 3 – Figure Supplement 6) indicates that the spatially overlapping information in parcel- and voxel-space time-series in the hybrid decoder was complementary, rather than redundant [41]. The hybrid-space decoder which achieved an accuracy exceeding 90%—and robustly generalized to Day 2 across trained and untrained sequences— surpassed the performance of both parcel-space and voxel-space decoders and compared favorably to other neuroimaging-based finger movement decoding strategies [6, 24, 42-44].”

Methods (lines 636-647):

“Hybrid Spatial Approach. First, we evaluated the decoding performance of each individual brain region in accurately labeling finger keypresses from regional voxel-space (i.e. - all voxels within a brain region as defined by the Desikan-Killiany Atlas) activity. Brain regions were then ranked from 1 to 148 based on their decoding accuracy at the group level. In a stepwise manner, we then constructed a “hybrid-space” decoder by incrementally concatenating regional voxel-space activity of brain regions—starting with the top-ranked region—with whole-brain parcel-level features and assessed decoding accuracy. Subsequently, we added the regional voxel-space features of the second-ranked brain region and continued this process until decoding accuracy reached saturation. The optimal “hybrid-space” input feature set over the group included the 148 parcel-space features and regional voxel-space features from a total of 8 brain regions (bilateral superior frontal, middle frontal, pre-central and post-central; N = 1295 ± 20 features).”
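
The stepwise procedure quoted above can be sketched as a greedy forward selection. This is a minimal illustration under assumptions, not the authors' implementation: the function name, the `score` callback, the region names, and the saturation criterion (stop when the accuracy gain falls below `tolerance`) are all hypothetical, since the quoted Methods text does not specify how saturation was determined.

```python
def build_hybrid_feature_set(ranked_regions, score, tolerance=0.005):
    """Add regions in rank order until decoding accuracy stops improving.

    `score(regions)` is assumed to return the cross-validated accuracy of a
    decoder trained on parcel-space features plus the voxel-space features
    of the listed regions.
    """
    selected = []
    best = score(selected)  # parcel-space features only
    for region in ranked_regions:
        candidate = selected + [region]
        candidate_score = score(candidate)
        if candidate_score - best < tolerance:
            break  # assumed saturation criterion
        selected, best = candidate, candidate_score
    return selected, best


# Toy scoring function: accuracy depends only on how many regions were added.
TOY_ACCURACY = [0.80, 0.85, 0.88, 0.90, 0.901, 0.901]


def toy_score(regions):
    return TOY_ACCURACY[len(regions)]


regions, final_accuracy = build_hybrid_feature_set(
    ["region_a", "region_b", "region_c", "region_d", "region_e"], toy_score
)
```

In this toy run the fourth region adds only 0.001 to the accuracy, so selection stops after three regions. In practice the scoring step would need to be cross-validated so that region ranking and selection do not leak information from the test data.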

(5) More controls are needed to show that their decoder approach is capturing a neural representation dedicated to context rather than independent representations of consecutive keypresses;

These controls have been implemented and are now reported in the manuscript:

Results (lines 318-328):

“Within-subject correlations were consistent with these group-level findings. The average correlation between offline contextualization and micro-offline gains within individuals was significantly greater than zero (Figure 5 – figure supplement 4, left; t = 3.87, p = 0.00035, df = 25, Cohen's d = 0.76) and stronger than correlations between online contextualization and either micro-online (Figure 5 – figure supplement 4, middle; t = 3.28, p = 0.0015, df = 25, Cohen's d = 1.2) or micro-offline gains (Figure 5 – figure supplement 4, right; t = 3.7021, p = 5.3013e-04, df = 25, Cohen's d = 0.69). These findings were not explained by behavioral changes of typing rhythm (t = -0.03, p = 0.976; Figure 5 – figure supplement 5), adjacent keypress transition times (R2 = 0.00507, F[1,3202] = 16.3; Figure 5 – figure supplement 6), or overall typing speed (between-subject; R2 = 0.028, p = 0.41; Figure 5 – figure supplement 7).”

Results (lines 385-390):

“Further, the 5-class classifier—which directly incorporated information about the sequence location context of each keypress into the decoding pipeline—improved decoding accuracy relative to the 4-class classifier (Figure 4C). Importantly, testing on Day 2 revealed specificity of this representational differentiation for the trained skill but not for the same keypresses performed during various unpracticed control sequences (Figure 5C).”

Discussion (lines 408-423):

“Offline contextualization consistently correlated with early learning gains across a range of decoding windows (50–250ms; Figure 5 – figure supplement 1). This result remained unchanged when measuring offline contextualization between the last and second sequence of consecutive trials, inconsistent with a possible confounding effect of pre-planning [30] (Figure 5 – figure supplement 2A). On the other hand, online contextualization did not predict learning (Figure 5 – figure supplement 3). Consistent with these results the average within-subject correlation between offline contextualization and micro-offline gains was significantly stronger than within-subject correlations between online contextualization and either micro-online or micro-offline gains (Figure 5 – figure supplement 4).

Offline contextualization was not driven by trial-by-trial behavioral differences, including typing rhythm (Figure 5 – figure supplement 5) and adjacent keypress transition times (Figure 5 – figure supplement 6) nor by between-subject differences in overall typing speed (Figure 5 – figure supplement 7)—ruling out a reliance on differences in the temporal overlap of keypresses. Importantly, offline contextualization documented on Day 1 stabilized once a performance plateau was reached (trials 11-36), and was retained on Day 2, documenting overnight consolidation of the differentiated neural representations.”

(6) The need to show more convincingly that their data is not affected by head movements, e.g., by regressing out signal components that are correlated with the fiducial signal;

We now include data in Figure 3 – figure supplement 3D showing that head movement was minimal in all participants (mean of 1.159 mm ± 1.077 SD). Further, the requested additional control analyses have been carried out and are reported in the revised manuscript:

Results (lines 204-211):

“Testing the keypress state (4-class) hybrid decoder performance on Day 1 after randomly shuffling keypress labels for held-out test data resulted in a performance drop approaching expected chance levels (22.12%± SD 9.1%; Figure 3 – figure supplement 3C). An alternate decoder trained on ICA components labeled as movement or physiological artefacts (e.g. – head movement, ECG, eye movements and blinks; Figure 3 – figure supplement 3A, D) and removed from the original input feature set during the pre-processing stage approached chance-level performance (Figure 4 – figure supplement 3), indicating that the 4-class hybrid decoder results were not driven by task-related artefacts.”

Results (lines 261-268):

“As expected, the 5-class hybrid-space decoder performance approached chance levels when tested with randomly shuffled keypress labels (18.41%± SD 7.4% for Day 1 data; Figure 4 – figure supplement 3C). Task-related eye movements did not explain these results since an alternate 5-class hybrid decoder constructed from three eye movement features (gaze position at the KeyDown event, gaze position 200ms later, and peak eye movement velocity within this window; Figure 4 – figure supplement 3A) performed at chance levels (cross-validated test accuracy = 0.2181; Figure 4 – figure supplement 3B, C).”

Discussion (Lines 362-368):

“Task-related movements—which also express in lower frequency ranges—did not explain these results given the near chance-level performance of alternative decoders trained on (a) artefact-related ICA components removed during MEG preprocessing (Figure 3 – figure supplement 3A-C) and on (b) task-related eye movement features (Figure 4 – figure supplement 3B, C). This explanation is also inconsistent with the minimal average head motion of 1.159 mm (± 1.077 SD) across the MEG recording (Figure 3 – figure supplement 3D).”

(7) The offline neural representation analysis as executed is a bit odd, since it seems to be based on comparing the last key press to the first key press of the next sequence, rather than focus on the inter-sequence interval

While we previously evaluated replay of skill sequences during rest intervals, identification of how offline reactivation patterns of a single keypress state representation evolve with learning presents non-trivial challenges. First, replay events tend to occur in clusters with irregular temporal spacing as previously shown by our group and others. Second, replay of experienced sequences is intermixed with replay of sequences that have never been experienced but are possible. Finally, and perhaps the most significant issue, replay is temporally compressed up to 20x with respect to the behavior [6]. That means our decoders would need to accurately evaluate spatial pattern changes related to individual keypresses over much smaller time windows (i.e. - less than 10 ms) than evaluated here. This future work, which is undoubtedly of great interest to our research group, will require more substantial tool development before our decoders can be applied to this question. We now articulate this future direction in the Discussion:

Discussion (lines 423-427):

“A possible neural mechanism supporting contextualization could be the emergence and stabilization of conjunctive “what–where” representations of procedural memories [64] with the corresponding modulation of neuronal population dynamics [65, 66] during early learning. Exploring the link between contextualization and neural replay could provide additional insights into this issue [6, 12, 13, 15].”

(8) And this analysis could be confounded by the fact that they are comparing the last element in a sequence vs the first movement in a new one.

We have now addressed this control analysis in the revised manuscript:

Results (Lines 310-316)

“Offline contextualization strongly correlated with cumulative micro-offline gains (r = 0.903, R2 = 0.816, p < 0.001; Figure 5 – figure supplement 1A, inset) across decoder window durations ranging from 50 to 250ms (Figure 5 – figure supplement 1B, C). The offline contextualization between the final sequence of each trial and the second sequence of the subsequent trial (excluding the first sequence) yielded comparable results. This indicates that pre-planning at the start of each practice trial did not directly influence the offline contextualization measure [30] (Figure 5 – figure supplement 2A, 1st vs. 2nd Sequence approaches).”

Discussion (lines 408-416):

“Offline contextualization consistently correlated with early learning gains across a range of decoding windows (50–250ms; Figure 5 – figure supplement 1). This result remained unchanged when measuring offline contextualization between the last and second sequence of consecutive trials, inconsistent with a possible confounding effect of pre-planning [30] (Figure 5 – figure supplement 2A). On the other hand, online contextualization did not predict learning (Figure 5 – figure supplement 3). Consistent with these results the average within-subject correlation between offline contextualization and micro-offline gains was significantly stronger than within-subject correlations between online contextualization and either micro-online or micro-offline gains (Figure 5 – figure supplement 4).”

It also seems to be the case that many analyses suggested by the reviewers in the first round of revisions that could have helped strengthen the manuscript have not been included (they are only in the rebuttal). Moreover, some of the control analyses mentioned in the rebuttal seem not to be described anywhere, neither in the manuscript, nor in the rebuttal itself; please double check that.

All suggested analyses carried out and mentioned are now in the revised manuscript.

eLife Assessment

This valuable study investigates how the neural representation of individual finger movements changes during the early period of sequence learning. By combining a new method for extracting features from human magnetoencephalography data and decoding analyses, the authors provide incomplete evidence of an early, swift change in the brain regions correlated with sequence learning…

We have now included all the requested control analyses supporting “an early, swift change in the brain regions correlated with sequence learning”:

The addition of more control analyses to rule out that head movement artefacts influence the findings,

We now include data in Figure 3 – figure supplement 3D showing that head movement was minimal in all participants (mean of 1.159 mm ± 1.077 SD). Further, we have implemented the requested additional control analyses addressing this issue:

Results (lines 207-211):

“An alternate decoder trained on ICA components labeled as movement or physiological artefacts (e.g. – head movement, ECG, eye movements and blinks; Figure 3 – figure supplement 3A, D) and removed from the original input feature set during the pre-processing stage approached chance-level performance (Figure 4 – figure supplement 3), indicating that the 4-class hybrid decoder results were not driven by task-related artefacts.”

Results (lines 261-268):

“As expected, the 5-class hybrid-space decoder performance approached chance levels when tested with randomly shuffled keypress labels (18.41%± SD 7.4% for Day 1 data; Figure 4 – figure supplement 3C). Task-related eye movements did not explain these results since an alternate 5-class hybrid decoder constructed from three eye movement features (gaze position at the KeyDown event, gaze position 200ms later, and peak eye movement velocity within this window; Figure 4 – figure supplement 3A) performed at chance levels (cross-validated test accuracy = 0.2181; Figure 4 – figure supplement 3B, C).”

Discussion (Lines 362-368):

“Task-related movements—which also express in lower frequency ranges—did not explain these results given the near chance-level performance of alternative decoders trained on (a) artefact-related ICA components removed during MEG preprocessing (Figure 3 – figure supplement 3A-C) and on (b) task-related eye movement features (Figure 4 – figure supplement 3B, C). This explanation is also inconsistent with the minimal average head motion of 1.159 mm (± 1.077 SD) across the MEG recording (Figure 3 – figure supplement 3D).”

and to further explain the proposal of offline contextualization during short rest periods as the basis for improvement in performance would strengthen the manuscript.

We have edited the manuscript to clarify that the degree of representational differentiation (contextualization) parallels skill learning. We have no evidence at this point to indicate that “offline contextualization during short rest periods is the basis for improvement in performance”. The following areas of the revised manuscript now clarify this point:

Summary (Lines 455-458):

“In summary, individual sequence action representations contextualize during early learning of a new skill and the degree of differentiation parallels skill gains. Differentiation of the neural representations developed during rest intervals of early learning to a larger extent than during practice in parallel with rapid consolidation of skill.”

Additional control analyses are also provided supporting a link between offline contextualization and early learning:

Results (lines 302-318):

“The Euclidean distance between neural representations of IndexOP1 (i.e. - index finger keypress at ordinal position 1 of the sequence) and IndexOP5 (i.e. - index finger keypress at ordinal position 5 of the sequence) increased progressively during early learning (Figure 5A), predominantly during rest intervals (offline contextualization) rather than during practice (online) (t = 4.84, p < 0.001, df = 25, Cohen's d = 1.2; Figure 5B; Figure 5 – figure supplement 1A). An alternative online contextualization determination equaling the time interval between online and offline comparisons (Trial-based; 10 seconds between IndexOP1 and IndexOP5 observations in both cases) rendered a similar result (Figure 5 – figure supplement 2B).

Offline contextualization strongly correlated with cumulative micro-offline gains (r = 0.903, R2 = 0.816, p < 0.001; Figure 5 – figure supplement 1A, inset) across decoder window durations ranging from 50 to 250ms (Figure 5 – figure supplement 1B, C). The offline contextualization between the final sequence of each trial and the second sequence of the subsequent trial (excluding the first sequence) yielded comparable results. This indicates that pre-planning at the start of each practice trial did not directly influence the offline contextualization measure [30] (Figure 5 – figure supplement 2A, 1st vs. 2nd Sequence approaches). Conversely, online contextualization (using either measurement approach) did not explain early online learning gains (i.e. – Figure 5 – figure supplement 3).”
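The distance measures described in the quoted passage can be sketched as follows. This is a simplified reading, not the authors' pipeline: it assumes the trial-based variant, in which online contextualization is the distance between IndexOP1 in the first sequence and IndexOP5 in the last sequence of the same trial, and offline contextualization is the distance between IndexOP5 in the last sequence of one trial and IndexOP1 in the first sequence of the next trial. The function names and the two-dimensional feature vectors are hypothetical.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def contextualization(op1_first, op5_last):
    """Return (online, offline) distance lists.

    op1_first[n]: IndexOP1 feature vector from the first sequence of trial n.
    op5_last[n]:  IndexOP5 feature vector from the last sequence of trial n.
    """
    n_trials = len(op1_first)
    # Online: both observations come from the same practice trial.
    online = [euclidean(op1_first[n], op5_last[n]) for n in range(n_trials)]
    # Offline: the observations straddle the rest interval after trial n.
    offline = [euclidean(op5_last[n], op1_first[n + 1])
               for n in range(n_trials - 1)]
    return online, offline

# Hypothetical toy features for three trials.
op1_first = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
op5_last = [[0.0, 0.0], [3.0, 0.0], [6.0, 0.0]]
online, offline = contextualization(op1_first, op5_last)
```

With real data, each feature vector would be the decoder input for one keypress, and the offline values could then be compared with cumulative micro-offline gains.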

Public Reviews:

Reviewer #1 (Public review):

Summary:

This study addresses the issue of rapid skill learning and whether individual sequence elements (here: finger presses) are differentially represented in human MEG data. The authors use a decoding approach to classify individual finger elements and accomplish an accuracy of around 94%. A relevant finding is that the neural representations of individual finger elements dynamically change over the course of learning. This would be highly relevant for any attempts to develop better brain machine interfaces - one now can decode individual elements within a sequence with high precision, but these representations are not static but develop over the course of learning.

Strengths:

The work follows a large body of work from the same group on the behavioural and neural foundations of sequence learning. The behavioural task is well established and neatly designed to allow for tracking learning and how individual sequence elements contribute. The inclusion of short offline rest periods between learning epochs has been influential because it has revealed that a lot, if not most, of the gains in behaviour (i.e., speed of finger movements) occur in these so-called micro-offline rest periods.

The authors use a range of new decoding techniques, and exhaustively interrogate their data in different ways, using different decoding approaches. Regardless of the approach, impressively high decoding accuracies are observed, but when using a hybrid approach that combines the MEG data in different ways, the authors observe decoding accuracies of individual sequence elements from the MEG data of up to 94%.

Weaknesses:

A formal analysis and quantification of how head movement may have contributed to the results should be included in the paper or supplemental material. The type of correlated head movements coming from vigorous key presses aren't necessarily visible to the naked eye, and even if arms etc are restricted, this will not preclude shoulder, neck or head movement necessarily; if ICA was conducted, for example, the authors are in the position to show the components that relate to such movement; but eye-balling the data would not seem sufficient. The related issue of eye movements is addressed via classifier analysis. A formal analysis which directly accounts for finger/eye movements in the same analysis as the main result (ie any variance related to these factors) should be presented.

We now present additional data related to head (Figure 3 – figure supplement 3; note that average measured head movement across participants was 1.159 mm ± 1.077 SD) and eye movements (Figure 4 – figure supplement 3) and have implemented the requested control analyses addressing this issue. They are reported in the revised manuscript in the following locations: Results (lines 207-211), Results (lines 261-268), Discussion (Lines 362-368).

This reviewer recommends inclusion of a formal analysis that the intra-vs inter parcels are indeed completely independent. For example, the authors state that the inter-parcel features reflect "lower spatially resolved whole-brain activity patterns or global brain dynamics". A formal quantitative demonstration that the signals indeed show "complete independence" (as claimed by the authors) and are orthogonal would be helpful.

Please note that we never claim in the manuscript that the parcel-space and regional voxel-space features show “complete independence”. More importantly, input feature orthogonality is not a requirement for the machine learning-based decoding methods utilized in the present study, while non-redundancy is [7] (a requirement satisfied by our data, see below). Finally, our results show that the hybrid-space decoder outperformed all other methods even after input features were fully orthogonalized with LDA (the procedure used in all contextualization analyses) or PCA dimensionality reduction procedures prior to the classification step (Figure 3 – figure supplement 2).

Relevant to this issue, please note that if spatially overlapping parcel- and voxel-space time-series only provided redundant information, inclusion of both as input features should increase model over-fitting to the training dataset and decrease overall cross-validated test accuracy [8]. In the present study, however, we see the opposite effect on decoder performance. First, Figure 3 – figure supplement 1 & 2 clearly show that decoders constructed from hybrid-space features outperform the other input feature (sensor-, whole-brain parcel- and whole-brain voxel-) spaces in every case (e.g., wideband, all narrowband frequency ranges, and even after the input space is fully orthogonalized through dimensionality reduction procedures prior to the decoding step). Furthermore, Figure 3 – figure supplement 6 shows that hybrid-space decoder performance suffers when parcel time-series that spatially overlap with the included regional voxel-spaces are removed from the input feature set.

We state in the Discussion (lines 353-356):

“The observation of increased cross-validated test accuracy (as shown in Figure 3 – Figure Supplement 6) indicates that the spatially overlapping information in parcel- and voxel-space time-series in the hybrid decoder was complementary, rather than redundant [41].”

To gain insight into the complementary information contributed by the two spatial scales to the hybrid-space decoder, we first independently computed the matrix rank for whole-brain parcel- and voxel-space input features for each participant (shown in Author response image 1). The results indicate that whole-brain parcel-space input features are full rank (rank = 148) for all participants (i.e. - MEG activity is orthogonal between all parcels). The matrix rank of voxel-space input features (rank = 267 ± 17 SD) exceeded the parcel-space rank for all participants and approached the number of useable MEG sensor channels (n = 272). Thus, voxel-space features provide both additional and complementary information to representations at the parcel-space scale.

Author response image 1. Matrix rank computed for whole-brain parcel- and voxel-space time-series in individual subjects across the training run.

The results indicate that whole-brain parcel-space input features are full rank (rank = 148) for all participants (i.e. - MEG activity is orthogonal between all parcels). The matrix rank of voxel-space input features (rank = 267 ± 17 SD), on the other hand, approached the number of useable MEG sensor channels (n = 272). Although not full rank, the voxel-space rank exceeded the parcel-space rank for all participants. Thus, some voxel-space features provide additional orthogonal information to representations at the parcel-space scale. An expression of this is shown in the correlation distribution between parcel and constituent voxel time-series in Figure 2—figure Supplement 2.

Figure 2 – figure supplement 2 in the revised manuscript now shows that the degree of dependence between the two spatial scales varies over the regional voxel-space. That is, some voxels within a given parcel correlate strongly with the time-series of the parcel they belong to, while others do not. This finding is consistent with a documented increase in correlational structure of neural activity across spatial scales that does not reflect perfect dependency or orthogonality [9]. Notably, the regional voxel-spaces included in the hybrid-space decoder are significantly less correlated with the averaged parcel-space time-series than excluded voxels. We now point readers to this new figure in the results.

Taken together, these results indicate that the multi-scale information in the hybrid feature set is complementary rather than orthogonal. This is consistent with the idea that hybrid-space features better represent multi-scale temporospatial dynamics reported to be a fundamental characteristic of how the brain stores and adapts memories, and generates behavior across species [9].
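The rank comparison described in this response can be illustrated with a toy example. The sketch assumes a plain Gaussian-elimination rank routine and made-up time-series (two hypothetical parcels, each the average of two hypothetical voxels); in practice an SVD-based routine such as `numpy.linalg.matrix_rank` would normally be used. Note that matrix rank counts linearly independent time-series, which is a weaker property than mutual orthogonality.

```python
def matrix_rank(rows, tol=1e-9):
    """Rank via Gaussian elimination with partial pivoting.

    rows: list of time-series (one per channel), all of equal length.
    """
    m = [list(map(float, r)) for r in rows]
    n_rows, n_cols = len(m), len(m[0])
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        # Pick the largest remaining entry in this column as the pivot.
        pivot = max(range(rank, n_rows), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < tol:
            continue  # no usable pivot in this column
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, n_rows):
            factor = m[r][col] / m[rank][col]
            for c in range(col, n_cols):
                m[r][c] -= factor * m[rank][c]
        rank += 1
    return rank

# Hypothetical voxel time-series (4 voxels x 6 samples).
voxels = [
    [1, 0, 2, 1, 0, 3],
    [0, 1, 1, 2, 1, 0],
    [2, 1, 0, 1, 3, 1],
    [1, 1, 1, 0, 2, 2],
]
# Hypothetical parcels: each is the mean of two voxel time-series.
parcels = [
    [(a + b) / 2 for a, b in zip(voxels[0], voxels[1])],
    [(a + b) / 2 for a, b in zip(voxels[2], voxels[3])],
]

parcel_rank = matrix_rank(parcels)
voxel_rank = matrix_rank(voxels)
```

In this toy case the voxel-space matrix has a higher rank than the parcel-space matrix, mirroring the direction of the comparison reported above.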

Reviewer #2 (Public review):

Summary:

The current paper consists of two parts. The first part is the rigorous feature optimization of the MEG signal to decode individual finger identity performed in a sequence (4-1-3-2-4; 1~4 corresponds to little~index fingers of the left hand). By optimizing various parameters for the MEG signal, in terms of (i) reconstructed source activity in voxel- and parcel-level resolution and their combination, (ii) frequency bands, and (iii) time window relative to press onset for each finger movement, as well as the choice of decoders, the resultant "hybrid decoder" achieved extremely high decoding accuracy (~95%). This part seems driven almost by pure engineering interest in gaining as high decoding accuracy as possible.

In the second part of the paper, armed with the successful 'hybrid decoder,' the authors asked more scientific questions about how neural representation of individual finger movement that is embedded in a sequence, changes during a very early period of skill learning and whether and how such representational change can predict skill learning. They assessed the difference in MEG feature patterns between the first and the last press 4 in sequence 41324 at each training trial and found that the pattern differentiation progressively increased over the course of early learning trials. Additionally, they found that this pattern differentiation specifically occurred during the rest period rather than during the practice trial. With a significant correlation between the trial-by-trial profile of this pattern differentiation and that for accumulation of offline learning, the authors argue that such "contextualization" of finger movement in a sequence (e.g., what-where association) underlies the early improvement of sequential skill. This is an important and timely topic for the field of motor learning and beyond.

Strengths:

Each part has its own strength. For the first part, the use of temporally rich neural information (MEG signal) has a significant advantage over previous studies testing sequential representations using fMRI. This allowed the authors to examine the earliest period (= the first few minutes of training) of skill learning with finer temporal resolution. Through the optimization of MEG feature extraction, the current study achieved extremely high decoding accuracy (approx. 94%) compared to previous works. For the second part, the finding of the early "contextualization" of the finger movement in a sequence and its correlation to early (offline) skill improvement is interesting and important. The comparison between "online" and "offline" pattern distance is a neat idea.

Weaknesses:

Despite the strengths raised, the specific goal for each part of the current paper, i.e., achieving high decoding accuracy and answering the scientific question of early skill learning, seems not to harmonize with each other very well. In short, the current approach, which is solely optimized for achieving high decoding accuracy, does not provide enough support and interpretability for the paper's interesting scientific claim. This reminds me of the accuracy-explainability tradeoff in machine learning studies (e.g., Linardatos et al., 2020). More details follow.

There are a number of different neural processes occurring before and after a key press, such as planning of upcoming movement and ahead around premotor/parietal cortices, motor command generation in primary motor cortex, sensory feedback related processes in sensory cortices, and performance monitoring/evaluation around the prefrontal area. Some of these may show learning-dependent change and others may not.

In this paper, the focus as stated in the Introduction was to evaluate “the millisecond-level differentiation of discrete action representations during learning”, a proposal that first required the development of more accurate computational tools. Our first step, reported here, was to develop that tool. With that in hand, we then proceeded to test if neural representations differentiated during early skill learning. Our results showed they did. Addressing the question the Reviewer asks is part of exciting future work, now possible based on the results presented in this paper. We acknowledge this issue in the revised Discussion:

Discussion (Lines 428-434):

“In this study, classifiers were trained on MEG activity recorded during or immediately after each keypress, emphasizing neural representations related to action execution, memory consolidation and recall over those related to planning. An important direction for future research is determining whether separate decoders can be developed to distinguish the representations or networks separately supporting these processes. Ongoing work in our lab is addressing this question. The present accuracy results across varied decoding window durations and alignment with each keypress action support the feasibility of this approach (Figure 3—figure supplement 5).”

Given the use of whole-brain MEG features with a wide time window (up to ~200 ms after each key press) under the situation of 3~4 Hz (i.e., 250~330 ms press interval) typing speed, these different processes in different brain regions could have contributed to the expression of the "contextualization," making it difficult to interpret what really contributed to the "contextualization" and whether it is learning related. Critically, the majority of data used for decoder training has the chance of such potential overlap of signal, as the typing speed almost reached a plateau already at the end of the 11th trial and stayed until the 36th trial. Thus, the decoder could have relied on such overlapping features related to the future presses. If that is the case, a gradual increase in "contextualization" (pattern separation) during earlier trials makes sense, simply because the temporal overlap of the MEG feature was insufficient for the earlier trials due to slower typing speed. Direct ways to address the above concern, at some cost to decoding accuracy, would be either to use a shorter temporal window for the MEG features or to train the model with the early learning period data only (trials 1 through 11), and then to check whether the main results are unaffected.

We now include additional analyses carried out with decoding time windows ranging from 50 to 250ms in duration, which have been added to the revised manuscript as follows:

Results (lines 258-261):

“The improved decoding accuracy is supported by greater differentiation in neural representations of the index finger keypresses performed at positions 1 and 5 of the sequence (Figure 4A), and by the trial-by-trial increase in 2-class decoding accuracy over early learning (Figure 4C) across different decoder window durations (Figure 4 – figure supplement 2).”

Results (lines 310-312):

“Offline contextualization strongly correlated with cumulative micro-offline gains (r = 0.903, R2 = 0.816, p < 0.001; Figure 5 – figure supplement 1A, inset) across decoder window durations ranging from 50 to 250ms (Figure 5 – figure supplement 1B, C).”

Discussion (lines 382-385):

“This was further supported by the progressive differentiation of neural representations of the index finger keypress (Figure 4A) and by the robust trial-by trial increase in 2-class decoding accuracy across time windows ranging between 50 and 250ms (Figure 4C; Figure 4 – figure supplement 2).”

Discussion (lines 408-9):

“Offline contextualization consistently correlated with early learning gains across a range of decoding windows (50–250ms; Figure 5 – figure supplement 1).”

Several new control analyses are also provided addressing the question of overlapping keypresses:

Reviewer #3 (Public review):

Summary:

One goal of this paper is to introduce a new approach for highly accurate decoding of finger movements from human magnetoencephalography data via dimension reduction of a "multi-scale, hybrid" feature space. Following this decoding approach, the authors aim to show that early skill learning involves "contextualization" of the neural coding of individual movements, relative to their position in a sequence of consecutive movements.

Furthermore, they aim to show that this "contextualization" develops primarily during short rest periods interspersed with skill training and correlates with a performance metric which the authors interpret as an indicator of offline learning.

Strengths:

A strength of the paper is the innovative decoding approach, which achieves impressive decoding accuracies via dimension reduction of a "multi-scale, hybrid space". This hybrid-space approach follows the neurobiologically plausible idea of concurrent distribution of neural coding across local circuits as well as large-scale networks. A further strength of the study is the large number of tested dimension reduction techniques and classifiers.

Weaknesses:

A clear weakness of the paper lies in the authors' conclusions regarding "contextualization". Several potential confounds, which partly arise from the experimental design (mainly the use of a single sequence) and which are described below, question the neurobiological implications proposed by the authors and provide a simpler explanation of the results. Furthermore, the paper follows the assumption that short breaks result in offline skill learning, while recent evidence, described below, casts doubt on this assumption.

Please, see below for detailed response to each of these points.

Specifically: The authors interpret the ordinal position information captured by their decoding approach as a reflection of neural coding dedicated to the local context of a movement (Figure 4). One way to dissociate ordinal position information from information about the moving effectors is to train a classifier on one sequence and test the classifier on other sequences that require the same movements, but in different positions (Kornysheva et al., Neuron 2019). In the present study, however, participants trained to repeat a single sequence (4-1-3-2-4).

A crucial difference between our present study and the elegant study from Kornysheva et al. (2019) in Neuron highlighted by the Reviewer is that while ours is a learning study, the Kornysheva et al. study is not. Kornysheva et al. included an initial separate behavioral training session (i.e. – performed outside of the MEG) during which participants learned associations between fractal image patterns and different keypress sequences. Then in a separate, later MEG session—after the stimulus-response associations had been already learned in the first session—participants were tasked with recalling the learned sequences in response to a presented visual cue (i.e. – the paired fractal pattern).

Our rationale for not including multiple sequences in the same Day 1 training session of our study design was that it would lead to prominent interference effects, as widely reported in the literature [10-12]. Thus, while we had to take the issue of interference into consideration for our design, the Kornysheva et al. study did not. While Kornysheva et al. aimed to “dissociate ordinal position information from information about the moving effectors”, we tested various untrained sequences on Day 2 allowing us to determine that the contextualization result was specific to the trained sequence. By using this approach, we avoided interference effects on the learning of the primary skill caused by simultaneous acquisition of a second skill.

The revised manuscript states our findings related to the Day 2 Control data in the following locations:

Results (lines 117-122):

“On the following day, participants were retested on performance of the same sequence (4-1-3-2-4) over 9 trials (Day 2 Retest), as well as on the single-trial performance of 9 different untrained control sequences (Day 2 Controls: 2-1-3-4-2, 4-2-4-3-1, 3-4-2-3-1, 1-4-3-4-2, 3-2-4-3-1, 1-4-2-3-1, 3-2-4-2-1, 3-2-1-4-2, and 4-2-3-1-4). As expected, an upward shift in performance of the trained sequence (0.68 ± SD 0.56 keypresses/s; t = 7.21, p < 0.001) was observed during Day 2 Retest, indicative of an overnight skill consolidation effect (Figure 1 – figure supplement 1A).”

Results (lines 212-219):

“Utilizing the highest performing decoders that included LDA-based manifold extraction, we assessed the robustness of hybrid-space decoding over multiple sessions by applying it to data collected on the following day during the Day 2 Retest (9-trial retest of the trained sequence) and Day 2 Control (single-trial performance of 9 different untrained sequences) blocks. The decoding accuracy for Day 2 MEG data remained high (87.11% ± SD 8.54% for the trained sequence during Retest, and 79.44% ± SD 5.54% for the untrained Control sequences; Figure 3 – figure supplement 4). Thus, index finger classifiers constructed using the hybrid decoding approach robustly generalized from Day 1 to Day 2 across trained and untrained keypress sequences.”

Results (lines 269-273):

“On Day 2, incorporating contextual information into the hybrid-space decoder enhanced classification accuracy for the trained sequence only (improving from 87.11% for 4-class to 90.22% for 5-class), while performing at or below-chance levels for the Control sequences (≤ 30.22% ± SD 0.44%). Thus, the accuracy improvements resulting from inclusion of contextual information in the decoding framework were specific for the trained skill sequence.”

As a result, ordinal position information is potentially confounded by the fixed finger transitions around each of the two critical positions (first and fifth press). Across consecutive correct sequences, the first keypress in a given sequence was always preceded by a movement of the index finger (=last movement of the preceding sequence), and followed by a little finger movement. The last keypress, on the other hand, was always preceded by a ring finger movement, and followed by an index finger movement (=first movement of the next sequence). Figure 4 - supplement 2 shows that finger identity can be decoded with high accuracy (>70%) across a large time window around the time of the keypress, up to at least +/-100 ms (and likely beyond, given that decoding accuracy is still high at the boundaries of the window depicted in that figure). This time window approaches the keypress transition times in this study. Given that distinct finger transitions characterized the first and fifth keypress, the classifier could thus rely on persistent (or "lingering") information from the preceding finger movement, and/or "preparatory" information about the subsequent finger movement, in order to dissociate the first and fifth keypress.

Currently, the manuscript provides little evidence that the context information captured by the decoding approach is more than a by-product of temporally extended, and therefore overlapping, but independent neural representations of consecutive keypresses that are executed in close temporal proximity - rather than a neural representation dedicated to context.

During the review process, the authors pointed out that a "mixing" of temporally overlapping information from consecutive keypresses, as described above, should result in systematic misclassifications and therefore be detectable in the confusion matrices in Figures 3C and 4B, which indeed do not provide any evidence that consecutive keypresses are systematically confused. However, such absence of evidence (of systematic misclassification) should be interpreted with caution, and, of course, provides no evidence of absence. The authors also pointed out that such "mixing" would hamper the discriminability of the two ordinal positions of the index finger, given that "ordinal position 5" is systematically followed by "ordinal position 1". This is a valid point which, however, cannot rule out that "contextualization" nevertheless reflects the described "mixing".

The revised manuscript contains several control analyses which rule out this potential confound.

Results (lines 318-328):

“Within-subject correlations were consistent with these group-level findings. The average correlation between offline contextualization and micro-offline gains within individuals was significantly greater than zero (Figure 5 – figure supplement 4, left; t = 3.87, p = 0.00035, df = 25, Cohen's d = 0.76) and stronger than correlations between online contextualization and either micro-online (Figure 5 – figure supplement 4, middle; t = 3.28, p = 0.0015, df = 25, Cohen's d = 1.2) or micro-offline gains (Figure 5 – figure supplement 4, right; t = 3.7021, p = 5.3013e-04, df = 25, Cohen's d = 0.69). These findings were not explained by behavioral changes of typing rhythm (t = -0.03, p = 0.976; Figure 5 – figure supplement 5), adjacent keypress transition times (R2 = 0.00507, F[1,3202] = 16.3; Figure 5 – figure supplement 6), or overall typing speed (between-subject; R2 = 0.028, p = 0.41; Figure 5 – figure supplement 7).”

Results (lines 385-390):

“Further, the 5-class classifier—which directly incorporated information about the sequence location context of each keypress into the decoding pipeline—improved decoding accuracy relative to the 4-class classifier (Figure 4C). Importantly, testing on Day 2 revealed specificity of this representational differentiation for the trained skill but not for the same keypresses performed during various unpracticed control sequences (Figure 5C).”

Discussion (lines 408-423):

“Offline contextualization consistently correlated with early learning gains across a range of decoding windows (50–250ms; Figure 5 – figure supplement 1). This result remained unchanged when measuring offline contextualization between the last and second sequence of consecutive trials, inconsistent with a possible confounding effect of pre-planning [30] (Figure 5 – figure supplement 2A). On the other hand, online contextualization did not predict learning (Figure 5 – figure supplement 3). Consistent with these results the average within-subject correlation between offline contextualization and micro-offline gains was significantly stronger than within subject correlations between online contextualization and either micro-online or micro-offline gains (Figure 5 – figure supplement 4).

Offline contextualization was not driven by trial-by-trial behavioral differences, including typing rhythm (Figure 5 – figure supplement 5) and adjacent keypress transition times (Figure 5 – figure supplement 6) nor by between-subject differences in overall typing speed (Figure 5 – figure supplement 7)—ruling out a reliance on differences in the temporal overlap of keypresses. Importantly, offline contextualization documented on Day 1 stabilized once a performance plateau was reached (trials 11-36), and was retained on Day 2, documenting overnight consolidation of the differentiated neural representations.”
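The within-subject statistics quoted above rest on a one-sample t-test of per-participant correlation coefficients against zero. A minimal sketch of that computation follows, assuming hypothetical correlation values for five participants (the study reports df = 25, i.e. 26 participants). For a one-sample test, Cohen's d equals t divided by the square root of n, so the first reported pair of values (t = 3.87, d = 0.76) can be checked directly.

```python
import math
import statistics

def one_sample_t(values, mu=0.0):
    """One-sample t statistic, degrees of freedom, and Cohen's d against mu."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1 denominator)
    t_stat = (mean - mu) / (sd / math.sqrt(n))
    cohens_d = (mean - mu) / sd
    return t_stat, n - 1, cohens_d

# Hypothetical within-subject correlations (one value per participant).
subject_r = [0.5, 0.3, 0.4, 0.6, 0.2]
t_stat, df, cohens_d = one_sample_t(subject_r)

# Consistency check on the first reported test: d = t / sqrt(n) with n = 26.
d_from_report = 3.87 / math.sqrt(26)
```

Whether the correlations were transformed (for example with a Fisher z-transform) before testing is not stated in the quoted text, so the sketch operates on raw values.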

During the review process, the authors responded to my concern that training of a single sequence introduces the potential confound of "mixing" described above, which could have been avoided by training on several sequences, as in Kornysheva et al. (Neuron 2019), by arguing that Day 2 in their study did include control sequences. However, the authors' findings regarding these control sequences are fundamentally different from the findings in Kornysheva et al. (2019), and do not provide any indication of effector-independent ordinal information in the described contextualization - but, actually, the contrary. In Kornysheva et al. (Neuron 2019), ordinal, or positional, information refers purely to the rank of a movement in a sequence. In line with the idea of competitive queuing, Kornysheva et al. (2019) have shown that humans prepare for a motor sequence via a simultaneous representation of several of the upcoming movements, weighted by their rank in the sequence. Importantly, they could show that this gradient carries information that is largely devoid of information about the order of specific effectors involved in a sequence, or their timing, in line with competitive queuing. They showed this by training a classifier to discriminate between the five consecutive movements that constituted one specific sequence of finger movements (five classes: 1st, 2nd, 3rd, 4th, 5th movement in the sequence) and then testing whether that classifier could identify the rank (1st, 2nd, 3rd, etc) of movements in another sequence, in which the fingers moved in a different order, and with different timings. Importantly, this approach demonstrated that the graded representations observed during preparation were largely maintained after this cross decoding, indicating that the sequence was represented via ordinal position information that was largely devoid of information about the specific effectors or timings involved in sequence execution. 
This result differs completely from the findings in the current manuscript. Dash et al. report a drop in detected ordinal position information (degree of contextualization in figure 5C) when testing for contextualization in their novel, untrained sequences on Day 2, indicating that context and ordinal information as defined in Dash et al. is not at all devoid of information about the specific effectors involved in a sequence. In this regard, a main concern in my public review, as well as the second reviewer's public review, is that Dash et al. cannot tell apart, by design, whether there is truly contextualization in the neural representation of a sequence (which they claim), or whether their results regarding "contextualization" are explained by what they call "mixing" in their author response, i.e., an overlap of representations of consecutive movements, as suggested as an alternative explanation by Reviewer 2 and myself.
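The cross-sequence decoding logic discussed in this comment (train a rank classifier on one sequence, test it on a sequence with a different finger order) can be sketched with synthetic data. Everything here is an illustrative assumption: the features are scaled one-hot codes for rank and finger plus Gaussian noise, the classifier is nearest-centroid, and the gain parameters are hypothetical. The sketch contrasts a code dominated by rank information with a code carrying finger information only.

```python
import random

SEQ_TRAIN = [4, 1, 3, 2, 4]  # trained sequence from the study
SEQ_TEST = [2, 1, 3, 4, 2]   # one of the Day 2 control sequences

def one_hot(index, size):
    return [1.0 if i == index else 0.0 for i in range(size)]

def simulate(seq, rank_gain, finger_gain, rng, n_reps=40, noise=0.1):
    """Toy features: scaled one-hot rank code + scaled one-hot finger code + noise."""
    x, y = [], []
    for _ in range(n_reps):
        for rank, finger in enumerate(seq):
            clean = ([rank_gain * v for v in one_hot(rank, 5)] +
                     [finger_gain * v for v in one_hot(finger - 1, 4)])
            x.append([v + rng.gauss(0, noise) for v in clean])
            y.append(rank)
    return x, y

def cross_decode(rank_gain, finger_gain, seed=0):
    """Train rank centroids on SEQ_TRAIN, return rank accuracy on SEQ_TEST."""
    rng = random.Random(seed)
    train_x, train_y = simulate(SEQ_TRAIN, rank_gain, finger_gain, rng)
    test_x, test_y = simulate(SEQ_TEST, rank_gain, finger_gain, rng)
    cents = {}
    for r in range(5):
        rows = [x for x, y in zip(train_x, train_y) if y == r]
        cents[r] = [sum(col) / len(rows) for col in zip(*rows)]

    def predict(x):
        return min(cents,
                   key=lambda r: sum((a - b) ** 2 for a, b in zip(x, cents[r])))

    hits = sum(predict(x) == y for x, y in zip(test_x, test_y))
    return hits / len(test_y)

# Rank-dominated code: rank information should transfer across sequences.
acc_rank_code = cross_decode(rank_gain=1.0, finger_gain=0.2)
# Finger-only code: rank labels transfer only where finger and rank coincide.
acc_finger_code = cross_decode(rank_gain=0.0, finger_gain=1.0)
```

In this toy setup, high cross-sequence accuracy indicates effector-independent rank information, whereas a drop indicates that the learned "rank" boundaries were tied to the specific fingers of the training sequence.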

Again, as stated in response to a related comment by the Reviewer above, it is not surprising that our results differ from the study by Kornysheva et al. (2019). A crucial difference between the studies that the Reviewer fails to recognize is that while ours is a learning study, the Kornysheva et al. study is not. Our rationale for not including multiple sequences in the same Day 1 training session of our study design was that it would lead to prominent interference effects, as widely reported in the literature [10-12]. Thus, while we had to take the issue of interference into consideration for our design, the Kornysheva et al. study did not, since it was not concerned with learning dynamics. The strength of the elegant Kornysheva study highlighted by the Reviewer (that the pre-planned sequence queuing gradient of sequence actions was independent of the effectors or timings used) is precisely due to the fact that participants were selecting between sequence options that had been previously, and equivalently, learned. The decoders in the Kornysheva study were trained by design to classify effector- and timing-independent sequence position information, so it is not surprising that this is the information they reflect.

The questions asked in our study were different: (1) Do the neural representations of the same sequence action executed in different skill (ordinal sequence) locations differentiate (contextualize) during early learning? and (2) Is the observed contextualization specific to the learned sequence? Thus, while Kornysheva et al. aimed to “dissociate ordinal position information from information about the moving effectors”, we tested various untrained sequences on Day 2 allowing us to determine that the contextualization result was specific to the trained sequence. By using this approach, we avoided interference effects on the learning of the primary skill caused by simultaneous acquisition of a second skill.

Such temporal overlap of consecutive, independent finger representations may also account for the dynamics of "ordinal coding"/"contextualization", i.e., the increase in 2-class decoding accuracy, across Day 1 (Figure 4C). As learning progresses, both tapping speed and the consistency of keypress transition times increase (Figure 1), i.e., consecutive keypresses are closer in time, and more consistently so. As a result, information related to a given keypress is increasingly overlapping in time with information related to the preceding and subsequent keypresses. The authors seem to argue that their regression analysis in Figure 5 - figure supplement 3 speaks against any influence of tapping speed on "ordinal coding" (even though that argument is not made explicitly in the manuscript). However, Figure 5 - figure supplement 3 shows inter-individual differences in a between-subject analysis (across trials, as in panel A, or separately for each trial, as in panel B), and, therefore, says little about the within-subject dynamics of "ordinal coding" across the experiment. A regression of trial-by-trial "ordinal coding" on trial-by-trial tapping speed (either within-subject, or at a group-level, after averaging across subjects) could address this issue. Given the highly similar dynamics of "ordinal coding" on the one hand (Figure 4C), and tapping speed on the other hand (Figure 1B), I would expect a strong relationship between the two in the suggested within-subject (or group-level) regression.

The aim of the between-subject regression analysis presented in the Results (see below) and in Figure 5—figure supplement 7 (previously Figure 5—figure supplement 3) of the revised manuscript, was to rule out a general effect of tapping speed on the magnitude of contextualization observed. If temporal overlap of neural representations was driving their differentiation, then participants typing at higher speeds should also show greater contextualization scores. We made the decision to use a between-subject analysis to address this issue since within-subject skill speed variance was rather small over most of the training session.

The Reviewer’s request that we additionally carry out a “regression of trial-by-trial "ordinal coding" on trial-by-trial tapping speed (either within-subject, or at a group-level, after averaging across subjects)” is essentially the same as the request of Reviewer 2 above. That request was to perform a modified simple linear regression analysis where the predictor is the sum of the 4-4 and 4-1 transition times, since these transitions are where any temporal overlaps of neural representations would occur. A new Figure 5 – figure supplement 6 in the revised manuscript includes a scatter plot showing the sum of adjacent index finger keypress transition times (i.e. – the 4-4 transition at the conclusion of one sequence iteration and the 4-1 transition at the beginning of the next sequence iteration) versus online contextualization distances measured during practice trials. Both the keypress transition times and online contextualization scores were z-score normalized within individual subjects, and then concatenated into a single data superset. As is clear from the figure, the regression analysis showed a very weak linear relationship between the two (R2 = 0.00507, F[1,3202] = 16.3). Thus, contextualization score magnitudes do not reflect the amount of overlap between adjacent keypresses when assessed either within- or between-subject.
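
The pooling-and-regression procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis code used for the manuscript: the data are synthetic, the per-subject sample count is arbitrary, and the helper names (`zscore`, `simple_regression_r2`, `f_from_r2`) are hypothetical. The final line only checks that the reported R2 and F values agree with each other for a single-predictor model with 3202 residual degrees of freedom.

```python
import random
import statistics


def zscore(values):
    """Z-score one subject's values (sample standard deviation)."""
    mu = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mu) / sd for v in values]


def simple_regression_r2(x, y):
    """R^2 of a one-predictor least-squares fit (squared Pearson r)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)


def f_from_r2(r2, n_obs):
    """F statistic for one predictor: F = R^2 / (1 - R^2) * (n_obs - 2)."""
    return r2 / (1.0 - r2) * (n_obs - 2)


# Synthetic stand-in data (assumption): 26 subjects, 100 samples each.
rng = random.Random(0)
pooled_x, pooled_y = [], []
for _ in range(26):
    transition_sum = [rng.gauss(0.3, 0.05) for _ in range(100)]  # 4-4 + 4-1 times
    context_dist = [rng.gauss(1.0, 0.2) for _ in range(100)]     # online distances
    pooled_x += zscore(transition_sum)   # normalize within subject ...
    pooled_y += zscore(context_dist)     # ... then concatenate across subjects

r2 = simple_regression_r2(pooled_x, pooled_y)
print(r2, f_from_r2(r2, len(pooled_x)))

# Consistency check of the reported values: F[1,3202] implies 3204 observations.
print(round(f_from_r2(0.00507, 3204), 1))
```

With this many pooled observations, even a very small R2 yields a sizeable F statistic, so the effect size (R2) is the more informative number for the argument made here.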

The revised manuscript now states:

Results (lines 318-328):

“Within-subject correlations were consistent with these group-level findings. The average correlation between offline contextualization and micro-offline gains within individuals was significantly greater than zero (Figure 5 – figure supplement 4, left; t = 3.87, p = 0.00035, df = 25, Cohen's d = 0.76) and stronger than correlations between online contextualization and either micro-online (Figure 5 – figure supplement 4, middle; t = 3.28, p = 0.0015, df = 25, Cohen's d = 1.2) or micro-offline gains (Figure 5 – figure supplement 4, right; t = 3.7021, p = 5.3013e-04, df = 25, Cohen's d = 0.69). These findings were not explained by behavioral changes of typing rhythm (t = -0.03, p = 0.976; Figure 5 – figure supplement 5), adjacent keypress transition times (R2 = 0.00507, F[1,3202] = 16.3; Figure 5 – figure supplement 6), or overall typing speed (between-subject; R2 = 0.028, p = 0.41; Figure 5 – figure supplement 7).”

Furthermore, learning should increase the number of (consecutively) correct sequences, and, thus, the consistency of finger transitions. Therefore, the increase in 2-class decoding accuracy may simply reflect an increasing overlap in time of increasingly consistent information from consecutive keypresses, which allows the classifier to dissociate the first and fifth keypress more reliably as learning progresses, simply based on the characteristic finger transitions associated with each. In other words, given that the physical context of a given keypress changes as learning progresses - keypresses move closer together in time and are more consistently correct - it seems problematic to conclude that the mental representation of that context changes. To draw that conclusion, the physical context should remain stable (or any changes to the physical context should be controlled for).

The revised manuscript now addresses specifically the question of mixing of temporally overlapping information:

Results (Lines 310-328)

“Offline contextualization strongly correlated with cumulative micro-offline gains (r = 0.903, R2 = 0.816, p < 0.001; Figure 5 – figure supplement 1A, inset) across decoder window durations ranging from 50 to 250ms (Figure 5 – figure supplement 1B, C). The offline contextualization between the final sequence of each trial and the second sequence of the subsequent trial (excluding the first sequence) yielded comparable results. This indicates that pre-planning at the start of each practice trial did not directly influence the offline contextualization measure [30] (Figure 5 – figure supplement 2A, 1st vs. 2nd Sequence approaches). Conversely, online contextualization (using either measurement approach) did not explain early online learning gains (i.e. – Figure 5 – figure supplement 3). Within-subject correlations were consistent with these group-level findings. The average correlation between offline contextualization and micro-offline gains within individuals was significantly greater than zero (Figure 5 – figure supplement 4, left; t = 3.87, p = 0.00035, df = 25, Cohen's d = 0.76) and stronger than correlations between online contextualization and either micro-online (Figure 5 – figure supplement 4, middle; t = 3.28, p = 0.0015, df = 25, Cohen's d = 1.2) or micro-offline gains (Figure 5 – figure supplement 4, right; t = 3.7021, p = 5.3013e-04, df = 25, Cohen's d = 0.69). These findings were not explained by behavioral changes of typing rhythm (t = -0.03, p = 0.976; Figure 5 – figure supplement 5), adjacent keypress transition times (R2 = 0.00507, F[1,3202] = 16.3; Figure 5 – figure supplement 6), or overall typing speed (between-subject; R2 = 0.028, p = 0.41; Figure 5 – figure supplement 7).”

Discussion (Lines 417-423)

“Offline contextualization was not driven by trial-by-trial behavioral differences, including typing rhythm (Figure 5 – figure supplement 5) and adjacent keypress transition times (Figure 5 – figure supplement 6) nor by between-subject differences in overall typing speed (Figure 5 – figure supplement 7)—ruling out a reliance on differences in the temporal overlap of keypresses. Importantly, offline contextualization documented on Day 1 stabilized once a performance plateau was reached (trials 11-36), and was retained on Day 2, documenting overnight consolidation of the differentiated neural representations.”

A similar difference in physical context may explain why neural representation distances ("differentiation") differ between rest and practice (Figure 5). The authors define "offline differentiation" by comparing the hybrid space features of the last index finger movement of a trial (ordinal position 5) and the first index finger movement of the next trial (ordinal position 1). However, the latter is not only the first movement in the sequence but also the very first movement in that trial (at least in trials that started with a correct sequence), i.e., not preceded by any recent movement. In contrast, the last index finger of the last correct sequence in the preceding trial includes the characteristic finger transition from the fourth to the fifth movement. Thus, there is more overlapping information arising from the consistent, neighbouring keypresses for the last index finger movement, compared to the first index finger movement of the next trial. A strong difference (larger neural representation distance) between these two movements is, therefore, not surprising, given the task design, and this difference is also expected to increase with learning, given the increase in tapping speed, and the consequent stronger overlap in representations for consecutive keypresses. Furthermore, initiating a new sequence involves pre-planning, while ongoing practice relies on online planning (Ariani et al., eNeuro 2021), i.e., two mental operations that are dissociable at the level of neural representation (Ariani et al., bioRxiv 2023).

The revised manuscript now addresses specifically the question of pre-planning:

Results (lines 310-318):

“Offline contextualization strongly correlated with cumulative micro-offline gains (r = 0.903, R2 = 0.816, p < 0.001; Figure 5 – figure supplement 1A, inset) across decoder window durations ranging from 50 to 250ms (Figure 5 – figure supplement 1B, C). The offline contextualization between the final sequence of each trial and the second sequence of the subsequent trial (excluding the first sequence) yielded comparable results. This indicates that pre-planning at the start of each practice trial did not directly influence the offline contextualization measure [30] (Figure 5 – figure supplement 2A, 1st vs. 2nd Sequence approaches). Conversely, online contextualization (using either measurement approach) did not explain early online learning gains (i.e. – Figure 5 – figure supplement 3).”

Discussion (lines 408-416):

“Offline contextualization consistently correlated with early learning gains across a range of decoding windows (50–250ms; Figure 5 – figure supplement 1). This result remained unchanged when measuring offline contextualization between the last and second sequence of consecutive trials, inconsistent with a possible confounding effect of pre-planning [30] (Figure 5 – figure supplement 2A). On the other hand, online contextualization did not predict learning (Figure 5 – figure supplement 3). Consistent with these results the average within-subject correlation between offline contextualization and micro-offline gains was significantly stronger than within-subject correlations between online contextualization and either micro-online or micro-offline gains (Figure 5 – figure supplement 4).”
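
To make the two measurement approaches concrete, the sketch below computes an offline contextualization distance between the last index finger keypress (ordinal position 5) of one trial and the first index finger keypress (ordinal position 1) of either the first or the second sequence iteration of the next trial. It is illustrative only: the data layout, the function name `offline_contextualization`, and the use of Euclidean distance on toy two-dimensional feature vectors are assumptions, not a description of the manuscript's pipeline.

```python
import math


def offline_contextualization(trials, next_sequence_index=0):
    """Distance across each rest break between consecutive trials.

    trials: list of trials; each trial is a list of sequence iterations;
            each iteration is a dict with 'pos1' and 'pos5' feature vectors
            (hypothetical layout).
    next_sequence_index: 0 compares against the first sequence of the next
            trial; 1 skips it and uses the second sequence instead.
    """
    distances = []
    for prev_trial, next_trial in zip(trials, trials[1:]):
        if len(next_trial) <= next_sequence_index:
            distances.append(None)  # not enough sequences in the next trial
            continue
        last_pos5 = prev_trial[-1]["pos5"]
        next_pos1 = next_trial[next_sequence_index]["pos1"]
        distances.append(math.dist(last_pos5, next_pos1))  # Euclidean (assumed)
    return distances


# Toy data: two trials with two sequence iterations each.
toy_trials = [
    [{"pos1": [0.0, 0.0], "pos5": [3.0, 4.0]},
     {"pos1": [0.0, 1.0], "pos5": [1.0, 1.0]}],
    [{"pos1": [1.0, 1.0], "pos5": [2.0, 2.0]},
     {"pos1": [1.0, 5.0], "pos5": [0.0, 0.0]}],
]

first_seq = offline_contextualization(toy_trials, next_sequence_index=0)
second_seq = offline_contextualization(toy_trials, next_sequence_index=1)
print(first_seq, second_seq)
```

Setting `next_sequence_index=1` drops the first sequence iteration of each trial from the comparison, which is the part of the trial where pre-planning would be expected to matter most.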

A further complication in interpreting the results stems from the visual feedback that participants received during the task. Each keypress generated an asterisk shown above the string on the screen. It is not clear why the authors introduced this complicating visual feedback in their task, besides consistency with their previous studies. The resulting systematic link between the pattern of visual stimulation (the number of asterisks on the screen) and the ordinal position of a keypress makes the interpretation of "contextual information" that differentiates between ordinal positions difficult. During the review process, the authors reported a confusion matrix from a classification of asterisk position based on eye tracking data recorded during the task and concluded that the classifier performed at chance level and gaze was, thus, apparently not biased by the visual stimulation. However, the confusion matrix showed a huge bias that was difficult to interpret (a very strong tendency to predict one of the five asterisk positions, despite chance-level performance). Without including additional information for this analysis (or simply the gaze position as a function of the number of asterisks on the screen) in the manuscript, this important control analysis cannot be properly assessed, and is not available to the public.

We now include the gaze position data requested by the Reviewer alongside the confusion matrix results in Figure 4 – figure supplement 3.

Results (lines 207-211):

“An alternate decoder trained on ICA components labeled as movement or physiological artefacts (e.g. – head movement, ECG, eye movements and blinks; Figure 3 – figure supplement 3A, D) and removed from the original input feature set during the pre-processing stage approached chance-level performance (Figure 4 – figure supplement 3), indicating that the 4-class hybrid decoder results were not driven by task-related artefacts.”

Results (lines 261-268):

“As expected, the 5-class hybrid-space decoder performance approached chance levels when tested with randomly shuffled keypress labels (18.41% ± SD 7.4% for Day 1 data; Figure 4 – figure supplement 3C). Task-related eye movements did not explain these results since an alternate 5-class hybrid decoder constructed from three eye movement features (gaze position at the KeyDown event, gaze position 200ms later, and peak eye movement velocity within this window; Figure 4 – figure supplement 3A) performed at chance levels (cross-validated test accuracy = 0.2181; Figure 4 – figure supplement 3B, C).”

Discussion (Lines 362-368):

“Task-related movements—which also express in lower frequency ranges—did not explain these results given the near chance-level performance of alternative decoders trained on (a) artefact-related ICA components removed during MEG preprocessing (Figure 3 – figure supplement 3A-C) and on (b) task-related eye movement features (Figure 4 – figure supplement 3B, C). This explanation is also inconsistent with the minimal average head motion of 1.159 mm (± 1.077 SD) across the MEG recording (Figure 3 – figure supplement 3D).”

The rationale for the task design including the asterisks is presented below:

Methods (Lines 500-514)

“The five-item sequence was displayed on the computer screen for the duration of each practice round and participants were directed to fix their gaze on the sequence. Small asterisks were displayed above a sequence item after each successive keypress, signaling the participants' present position within the sequence. Inclusion of this feedback minimizes working memory loads during task performance [73]. Following the completion of a full sequence iteration, the asterisk returned to the first sequence item. The asterisk did not provide error feedback as it appeared for both correct and incorrect keypresses. At the end of each practice round, the displayed number sequence was replaced by a string of five "X" symbols displayed on the computer screen, which remained for the duration of the rest break. Participants were instructed to focus their gaze on the screen during this time. The behavior in this explicit, motor learning task consists of generative action sequences rather than sequences of stimulus-induced responses as in the serial reaction time task (SRTT). A similar real-world example would be manually inputting a long password into a secure online application in which one intrinsically generates the sequence from memory and receives similar feedback about the password sequence position (also provided as asterisks), which is typically ignored by the user.”

The authors report a significant correlation between "offline differentiation" and cumulative micro-offline gains. However, this does not address the question whether there is a trial-by-trial relation between the degree of "contextualization" and the amount of micro-offline gains - i.e., the question whether performance changes (micro-offline gains) are less pronounced across rest periods for which the change in "contextualization" is relatively low. The single-subject correlation between contextualization changes "during" rest and micro-offline gains (Figure 5 - figure supplement 4) addresses this question, however, the critical statistical test (are correlation coefficients significantly different from zero) is not included. Given the displayed distribution, it seems unlikely that correlation coefficients are significantly above zero.

As recommended by the Reviewer, we now include one-sample right-tailed t-test results, which provide further support for the previously reported finding. The mean of within-subject correlations between offline contextualization and cumulative micro-offline gains was significantly greater than zero (t = 3.87, p = 0.00035, df = 25, Cohen's d = 0.76; see Figure 5 – figure supplement 4, left), while correlations for online contextualization versus cumulative micro-online (t = -1.14, p = 0.8669, df = 25, Cohen's d = -0.22) or micro-offline gains (t = -0.097, p = 0.5384, df = 25, Cohen's d = -0.019) were not. We have incorporated the significant t-test result for offline contextualization and cumulative micro-offline gains in the Results section of the revised manuscript (lines 313-318) and the Figure 5 – figure supplement 4 legend.
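
As a rough illustration of this test, the sketch below computes the t statistic and Cohen's d for a set of within-subject correlation coefficients tested against zero. The input values are made up, and the function names `one_sample_t` and `cohens_d_from_t` are hypothetical; the p-value is omitted because the Python standard library has no t-distribution. For a test of a sample mean against zero, d = t / sqrt(n), and the reported values are consistent with that relation for n = 26 (df = 25).

```python
import math
import statistics


def one_sample_t(values, mu0=0.0):
    """Return (t, Cohen's d, df) for a test of the sample mean against mu0."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)          # sample standard deviation
    t = (mean - mu0) / (sd / math.sqrt(n))
    d = (mean - mu0) / sd
    return t, d, n - 1


def cohens_d_from_t(t, n):
    """Recover Cohen's d from a one-sample t statistic and sample size."""
    return t / math.sqrt(n)


# Made-up within-subject correlation coefficients (assumption).
toy_correlations = [0.10, 0.25, -0.05, 0.30, 0.15, 0.20]
t_stat, d_value, df = one_sample_t(toy_correlations)
print(t_stat, d_value, df)

# Consistency check of the reported statistics with n = 26 participants.
for reported_t in (3.87, -1.14, -0.097):
    print(reported_t, cohens_d_from_t(reported_t, 26))
```

A right-tailed test then asks only whether the t statistic is large and positive, which is why the two negative t values above correspond to p-values greater than 0.5.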

The authors follow the assumption that micro-offline gains reflect offline learning.

However, there is no compelling evidence in the literature, and no evidence in the present manuscript, that micro-offline gains (during any training phase) reflect offline learning. Instead, emerging evidence in the literature indicates that they do not (Das et al., bioRxiv 2024), and instead reflect transient performance benefits when participants train with breaks, compared to participants who train without breaks, however, these benefits vanish within seconds after training if both groups of participants perform under comparable conditions (Das et al., bioRxiv 2024). During the review process, the authors argued that differences in the design between Das et al. (2024) on the one hand (Experiments 1 and 2), and the study by Bönstrup et al. (2019) on the other hand, may have prevented Das et al. (2024) from finding the assumed (lasting) learning benefit by micro-offline consolidation. However, the Supplementary Material of Das et al. (2024) includes an experiment (Experiment S1) whose design closely follows the early learning phase of Bönstrup et al. (2019), and which, nevertheless, demonstrates that there is no lasting benefit of taking breaks for the acquired skill level, despite the presence of micro-offline gains.

We thank the Reviewer for alerting us to these new data added to the revised supplementary materials of Das et al. (2024) posted to bioRxiv. However, despite the Reviewer’s claim to the contrary, a careful comparison between the Das et al. and Bönstrup et al. studies reveals more substantive differences than similarities, and does not support the statement that Experiment S1 “closely follows a large proportion of the early learning phase of Bönstrup et al. (2019)”.

In the Das et al. Experiment S1, sixty-two participants were randomly assigned to “with breaks” or “no breaks” skill training groups. The “with breaks” group alternated 10 seconds of skill sequence practice with 10 seconds of rest over seven trials (2 min 20 sec total training duration). This amounts to 66.7% of the early learning period defined by Bönstrup et al. (2019) (i.e. - eleven 10-second-long practice periods interleaved with ten 10-second-long rest breaks; 3 min 30 sec total training duration).
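
The arithmetic behind the 66.7% figure can be laid out explicitly. The sketch assumes that each of the seven practice periods in the “with breaks” schedule is followed by a 10-second rest (seven rests in total), and that the Bönstrup et al. early learning period comprises eleven practice periods and ten rests; the function name `schedule_seconds` is hypothetical.

```python
def schedule_seconds(n_practice, n_rest, practice_s=10, rest_s=10):
    """Total duration of an alternating practice/rest schedule, in seconds."""
    return n_practice * practice_s + n_rest * rest_s


das_with_breaks = schedule_seconds(n_practice=7, n_rest=7)       # 140 s
bonstrup_early = schedule_seconds(n_practice=11, n_rest=10)      # 210 s
proportion = das_with_breaks / bonstrup_early

print(divmod(das_with_breaks, 60))   # (minutes, seconds)
print(divmod(bonstrup_early, 60))
print(f"{proportion:.1%}")
```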

Also, please note that while no performance feedback nor reward was given in the Bönstrup et al. (2019) study, participants in the Das et al. study received explicit performance-based monetary rewards, a potentially crucial driver of differentiated behavior between the two studies:

“Participants were incentivized with bonus money based on the total number of correct sequences completed throughout the experiment.”

The “no breaks” group in the Das et al. study practiced the skill sequence for 70 continuous seconds. Both groups (despite one being labeled “no breaks”) follow training with a long 3-minute break (also note that since the “with breaks” group ends with 10 seconds of rest, their break is actually longer), before finishing with a skill “test” over a continuous 50-second-long block. During the 70 seconds of training, the “with breaks” group shows more learning than the “no breaks” group. Interestingly, following the long 3-minute break the “with breaks” group displays a performance drop (relative to their performance at the end of training) that is stable over the full 50-second test, while the “no breaks” group shows an immediate performance improvement following the long break that continues to increase over the 50-second test.

Separately, there are important issues regarding the Das et al. study that should be considered through the lens of recent findings not referred to in the preprint. A major element of their experimental design is that both groups—“with breaks” and “no breaks”— actually receive quite a long 3-minute break just before the skill test. This long break is more than 2.5x the cumulative interleaved rest experienced by the “with breaks” group. Thus, although the design is intended to contrast the presence or absence of rest “breaks”, that difference between groups is no longer maintained at the point of the skill test.

The Das et al. results are most consistent with an alternative interpretation of the data: that the “no breaks” group experiences offline learning during their long 3-minute break. This is supported by the recent work of Griffin et al. (2025) where micro-array recordings from primary and premotor cortex were obtained from macaque monkeys while they performed blocks of ten continuous reaching sequences up to 81.4 seconds in duration (see source data for Extended Data Figure 1h) with 90 seconds of interleaved rest. Griffin et al. observed offline improvement in skill immediately following the rest break that was causally related to neural reactivations (i.e. – neural replay) that occurred during the rest break. Importantly, the highest density of reactivations was present in the very first 90-second break between Blocks 1 and 2 (see Fig. 2f in Griffin et al., 2025). This supports the interpretation that both the “with breaks” and “no breaks” groups express offline learning gains, with these gains being delayed in the “no breaks” group due to the practice schedule.

On the other hand, if offline learning can occur during this longer break, then why would the “with breaks” group show no benefit? Again, it could be that most of the offline gains for this group were front-loaded during the seven shorter 10-second rest breaks. Another possible, though not mutually exclusive, explanation is that the observed drop in performance in the “with breaks” group is driven by contextual interference. Specifically, similar to Experiments 1 and 2 in Das et al. (2024), the skill test is conducted under very different conditions than those under which the “with breaks” group practiced the skill (short bursts of practice alternating with equally short breaks). On the other hand, the “no breaks” group is tested (50 seconds of continuous practice) under quite similar conditions to their training schedule (70 seconds of continuous practice). Thus, it is possible that this dissimilarity between training and test could lead to reduced performance in the “with breaks” group.

We made the following manuscript revisions related to these important issues:

Introduction (Lines 26-56)

“Practicing a new motor skill elicits rapid performance improvements (early learning) [1] that precede skill performance plateaus [5]. Skill gains during early learning accumulate over rest periods (micro-offline) interspersed with practice [1, 6-10], and are up to four times larger than offline performance improvements reported following overnight sleep [1]. During this initial interval of prominent learning, retroactive interference immediately following each practice interval reduces learning rates relative to interference after passage of time, consistent with stabilization of the motor memory [11]. Micro-offline gains observed during early learning are reproducible [7, 10-13] and are similar in magnitude even when practice periods are reduced by half to 5 seconds in length, thereby confirming that they are not merely a result of recovery from performance fatigue [11]. Additionally, they are unaffected by the random termination of practice periods, which eliminates the possibility of predictive motor slowing as a contributing factor [11]. Collectively, these behavioral findings point towards the interpretation that micro-offline gains during early learning represent a form of memory consolidation [1].

This interpretation has been further supported by brain imaging and electrophysiological studies linking known memory-related networks and consolidation mechanisms to rapid offline performance improvements. In humans, the rate of hippocampo-neocortical neural replay predicts micro-offline gains [6]. Consistent with these findings, Chen et al. [12] and Sjøgård et al. [13] furnished direct evidence from intracranial human EEG studies, demonstrating a connection between the density of hippocampal sharp-wave ripples (80-120 Hz)—recognized markers of neural replay—and micro-offline gains during early learning. Further, Griffin et al. reported that neural replay of task-related ensembles in the motor cortex of macaques during brief rest periods— akin to those observed in humans [1, 6-8, 14]—are not merely correlated with, but are causal drivers of micro-offline learning [15]. Specifically, the same reach directions that were replayed the most during rest breaks showed the greatest reduction in path length (i.e. – more efficient movement path between two locations in the reach sequence) during subsequent trials, while stimulation applied during rest intervals preceding performance plateau reduced reactivation rates and virtually abolished micro-offline gains [15]. Thus, converging evidence in humans and non-human primates across indirect non-invasive and direct invasive recording techniques link hippocampal activity, neural replay dynamics and offline skill gains in early motor learning that precede performance plateau.”

Next, in the Methods, we articulate important constraints formulated by Pan and Rickard and Bönstrup et al. for meaningful measurements:

Methods (Lines 493-499)

“The study design followed specific recommendations by Pan and Rickard (2015): (1) utilizing 10-second practice trials and (2) constraining analysis of micro-offline gains to early learning trials (where performance monotonically increases and 95% of overall performance gains occur) that precede the emergence of “scalloped” performance dynamics strongly linked to reactive inhibition effects ([29, 72]). This is precisely the portion of the learning curve Pan and Rickard referred to when they stated “…rapid learning during that period masks any reactive inhibition effect” [29].”

We finally discuss the implications of neglecting some or all of these recommendations:

Discussion (Lines 444-452):

“Finally, caution should be exercised when extrapolating findings during early skill learning, a period of steep performance improvements, to findings reported after insufficient practice [67], post-plateau performance periods [68], or non-learning situations (e.g. performance of non-repeating keypress sequences in [67]) when reactive inhibition or contextual interference effects are prominent. Ultimately, it will be important to develop new paradigms allowing one to independently estimate the different coincident or antagonistic features (e.g. - memory consolidation, planning, working memory and reactive inhibition) contributing to micro-online and micro-offline gains during and after early skill learning within a unifying framework.”

Along these lines, the authors' claim, based on Bönstrup et al. 2020, that "retroactive interference immediately following practice periods reduces micro-offline learning", is not supported by that very reference. Citing Bönstrup et al. (2020), "Regarding early learning dynamics (trials 1-5), we found no differences in microscale learning parameters (micro online/offline) or total early learning between both interference groups." That is, contrary to Dash et al.'s current claim, Bönstrup et al. (2020) did not find any retroactive interference effect on the specific behavioral readout (micro-offline gains) that the authors assume to reflect consolidation.

Please, note that the Bönstrup et al. 2020 paper abstract states:

“Third, retroactive interference immediately after each practice period reduced the learning rate relative to interference after passage of time (N = 373), indicating stabilization of the motor memory at a microscale of several seconds.”

which is further supported by this statement in the Results:

“The model comprised three parameters representing the initial performance, maximum performance and learning rate (see Eq. 1, “Methods”, “Data Analysis” section). We then statistically compared the model parameters between the interference groups (Fig. 2d). The late interference group showed a higher learning rate compared with the early interference group (late: 0.26 ± 0.23, early: 2.15 ± 0.20, P=0.04). The effect size of the group difference was small to medium (Cohen’s d 0.15)[29]. Similar differences with a stronger rise in the learning curve of a late interference groups vs. an early interference group were found in a smaller sample collected in the lab environment (Supplementary Fig. 3).”

We have modified the statement in the revised manuscript to specify that the difference observed was between learning rates:

Introduction (Lines 30-32)

“During this initial interval of prominent learning, retroactive interference immediately following each practice interval reduces learning rates relative to interference after passage of time, consistent with stabilization of the motor memory [11].”

The authors conclude that performance improves, and representation manifolds differentiate, "during" rest periods (see, e.g., abstract). However, micro-offline gains (as well as offline contextualization) are computed from data obtained during practice, not rest, and may, thus, just as well reflect a change that occurs "online", e.g., at the very onset of practice (like pre-planning) or throughout practice (like fatigue, or reactive inhibition).

The Reviewer raises again the issue of a potential confound of “pre-planning” on our contextualization measures as in the comment above:

“Furthermore, initiating a new sequence involves pre-planning, while ongoing practice relies on online planning (Ariani et al., eNeuro 2021), i.e., two mental operations that are dissociable at the level of neural representation (Ariani et al., bioRxiv 2023).”

The cited studies by Ariani et al. indicate that effects of pre-planning are likely to impact the first 3 keypresses of the initial sequence iteration in each trial. As stated in the response to this comment above, we conducted a control analysis of contextualization that ignores the first sequence iteration in each trial to partial out any potential pre-planning effect. This control analysis yielded comparable results, indicating that pre-planning is not a major driver of our reported contextualization effects. We now report this in the revised manuscript (Results, lines 310-318, quoted above).

We also state in the Figure 1 legend (Lines 99-103) in the revised manuscript that preplanning has no effect on the behavioral measures of micro-offline and micro-online gains in our dataset:

The Reviewer also raises the issue of possible effects stemming from “fatigue” and “reactive inhibition” which inhibit performance and are indeed relevant to skill learning studies. We designed our task to specifically mitigate these effects. We now more clearly articulate this rationale in the description of the task design as well as the measurement constraints essential for minimizing their impact.

We also discuss the implications of fatigue and reactive inhibition effects in experimental designs that neglect to follow these recommendations formulated by Pan and Rickard in the Discussion section and propose how this issue can be better addressed in future investigations.

To summarize, the results of our study indicate that: (a) offline contextualization effects are not explained by pre-planning of the first action sequence iteration in each practice trial; and (b) the task design implemented in this study purposefully minimizes any possible effects of reactive inhibition or fatigue. Circling back to the Reviewer’s proposal that “contextualization…may just as well reflect a change that occurs "online"”, we show in this paper direct empirical evidence that contextualization develops to a greater extent across rest periods than across practice trials, contrary to the Reviewer’s proposal.

That is, the definition of micro-offline gains (as well as offline contextualization) conflates online and "offline" processes. This becomes strikingly clear in the recent Nature paper by Griffin et al. (2025), who computed micro-offline gains as the difference in average performance across the first five sequences in a practice period (a block, in their terminology) and the last five sequences in the previous practice period. Averaging across sequences in this way minimises the chance to detect online performance changes and inflates changes in performance "offline". The problem that "online" gains (or contextualization) is actually computed from data entirely generated online, and therefore subject to processes that occur online, is inherent in the very definition of micro-online gains, whether, or not, they computed from averaged performance.

We would like to make it clear that the issue raised by the Reviewer with respect to averaging across sequences done in the Griffin et al. (2025) study does not impact our study in any way. The primary skill measure used in all analyses reported in our paper is not temporally averaged. We estimated instantaneous correct sequence speed over the entire trial. Once the first sequence iteration within a trial is completed, the speed estimate is then updated at the resolution of individual keypresses. All micro-online and -offline behavioral changes are measured as the difference in instantaneous speed at the beginning and end of individual practice trials.

Methods (lines 528-530):

“The instantaneous correct sequence speed was calculated as the inverse of the average KTT across a single correct sequence iteration and was updated for each correct keypress.”
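
The speed measure quoted above can be illustrated with a minimal sketch. It assumes that every keypress in the input is correct and that the sliding window holds the most recent 5 keypresses (4 keypress transition times, KTTs); the function names and the window convention are illustrative assumptions, not the authors' code.

```python
from statistics import mean

def instantaneous_speed(key_times, seq_len=5):
    """Return (time, speed) pairs, one per keypress once a full sequence exists.

    key_times: timestamps (s) of consecutive correct keypresses in one trial.
    Assumption: the window holds the most recent `seq_len` keypresses,
    i.e. seq_len - 1 keypress transition times (KTTs).
    """
    speeds = []
    for i in range(seq_len - 1, len(key_times)):
        window = key_times[i - seq_len + 1 : i + 1]
        ktts = [b - a for a, b in zip(window, window[1:])]
        # Speed (keypresses/s) is the inverse of the average KTT.
        speeds.append((key_times[i], 1.0 / mean(ktts)))
    return speeds

def micro_online_gain(trial_speeds):
    """Speed at the end of a trial minus speed at its beginning."""
    return trial_speeds[-1][1] - trial_speeds[0][1]

def micro_offline_gain(prev_trial_speeds, next_trial_speeds):
    """Speed at the start of the next trial minus speed at the end of the previous one."""
    return next_trial_speeds[0][1] - prev_trial_speeds[-1][1]
```

Because the estimate is refreshed at every keypress rather than averaged over several sequences, within-trial (online) changes remain visible at keypress resolution.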

The instantaneous speed measure used in our analyses, in fact, maximizes the likelihood of detecting changes in online performance, as the Reviewer indicates. Despite this optimally sensitive measurement of online changes, our findings remained robust, consistently converging on the same outcome across our original analyses and the multiple controls recommended by the reviewers. Notably, online contextualization changes are significantly weaker than offline contextualization in all comparisons with different measurement approaches.

Results (lines 302-309)

“The Euclidian distance between neural representations of IndexOP1 (i.e. - index finger keypress at ordinal position 1 of the sequence) and IndexOP5 (i.e. - index finger keypress at ordinal position 5 of the sequence) increased progressively during early learning (Figure 5A)—predominantly during rest intervals (offline contextualization) rather than during practice (online) (t = 4.84, p < 0.001, df = 25, Cohen's d = 1.2; Figure 5B; Figure 5 – figure supplement 1A). An alternative online contextualization determination equalling the time interval between online and offline comparisons (Trial-based; 10 seconds between IndexOP1 and IndexOP5 observations in both cases) rendered a similar result (Figure 5 – figure supplement 2B).”

Results (lines 316-318)

“Conversely, online contextualization (using either measurement approach) did not explain early online learning gains (i.e. – Figure 5 – figure supplement 3).”

Results (lines 318-328)

“Within-subject correlations were consistent with these group-level findings. The average correlation between offline contextualization and micro-offline gains within individuals was significantly greater than zero (Figure 5 – figure supplement 4, left; t = 3.87, p = 0.00035, df = 25, Cohen's d = 0.76) and stronger than correlations between online contextualization and either micro-online (Figure 5 – figure supplement 4, middle; t = 3.28, p = 0.0015, df = 25, Cohen's d = 1.2) or micro-offline gains (Figure 5 – figure supplement 4, right; t = 3.7021, p = 5.3013e-04, df = 25, Cohen's d = 0.69). These findings were not explained by behavioral changes of typing rhythm (t = -0.03, p = 0.976; Figure 5 – figure supplement 5), adjacent keypress transition times (R2 = 0.00507, F[1,3202] = 16.3; Figure 5 – figure supplement 6), or overall typing speed (between-subject; R2 = 0.028, p = 0.41; Figure 5 – figure supplement 7).”

We disagree with the Reviewer’s statement that “the definition of micro-offline gains (as well as offline contextualization) conflates online and "offline" processes”. From a strictly behavioral point of view, it is obviously true that one can only measure skill (rather than the absence of it during rest) to determine how it changes over time. While skill changes surrounding rest are used to infer offline learning processes, recovery of skill decay following intense practice is used to infer “unmeasurable” recovery from fatigue or reactive inhibition. In other words, the alternative processes proposed by the Reviewer also rely on the same inferential reasoning.

Importantly, inferences can be validated through the identification of mechanisms. Our experiment constrained the study to evaluation of changes in neural representations of the same action in different contexts, while minimizing the impact of mechanisms related to fatigue/reactive inhibition [13, 14]. In this way, we observed that behavioral gains and neural contextualization occur to a greater extent over rest breaks than during practice trials and that offline contextualization changes strongly correlate with the offline behavioral gains, while online contextualization does not. This result was supported by the results of all control analyses recommended by the Reviewers. Specifically:

Methods (Lines 493-499)

“The study design followed specific recommendations by Pan and Rickard (2015): (1) utilizing 10-second practice trials and (2) constraining analysis of micro-offline gains to early learning trials (where performance monotonically increases and 95% of overall performance gains occur) that precede the emergence of “scalloped” performance dynamics strongly linked to reactive inhibition effects ([29, 72]). This is precisely the portion of the learning curve Pan and Rickard referred to when they stated “…rapid learning during that period masks any reactive inhibition effect” [29].”

And Discussion (Lines 444-448):

“Finally, caution should be exercised when extrapolating findings during early skill learning, a period of steep performance improvements, to findings reported after insufficient practice [67], post-plateau performance periods [68], or non-learning situations (e.g. performance of non-repeating keypress sequences in [67]) when reactive inhibition or contextual interference effects are prominent.”

Next, we show that offline contextualization is greater than online contextualization and predicts offline behavioral gains across all measurement approaches, including all controls suggested by the Reviewer’s comments and recommendations.

Results (lines 302-318):

“The Euclidian distance between neural representations of IndexOP1 (i.e. - index finger keypress at ordinal position 1 of the sequence) and IndexOP5 (i.e. - index finger keypress at ordinal position 5 of the sequence) increased progressively during early learning (Figure 5A)—predominantly during rest intervals (offline contextualization) rather than during practice (online) (t = 4.84, p < 0.001, df = 25, Cohen's d = 1.2; Figure 5B; Figure 5 – figure supplement 1A). An alternative online contextualization determination equalling the time interval between online and offline comparisons (Trial-based; 10 seconds between IndexOP1 and IndexOP5 observations in both cases) rendered a similar result (Figure 5 – figure supplement 2B).

Offline contextualization strongly correlated with cumulative micro-offline gains (r = 0.903, R2 = 0.816, p < 0.001; Figure 5 – figure supplement 1A, inset) across decoder window durations ranging from 50 to 250ms (Figure 5 – figure supplement 1B, C). The offline contextualization between the final sequence of each trial and the second sequence of the subsequent trial (excluding the first sequence) yielded comparable results. This indicates that pre-planning at the start of each practice trial did not directly influence the offline contextualization measure [30] (Figure 5 – figure supplement 2A, 1st vs. 2nd Sequence approaches). Conversely, online contextualization (using either measurement approach) did not explain early online learning gains (i.e. – Figure 5 – figure supplement 3).”
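
The distance comparisons described in this passage can be sketched as follows. The pairing of observations is an assumption inferred from the "Trial-based" description (IndexOP5 at the end of one trial versus IndexOP1 at the start of the next for the offline measure; first IndexOP1 versus last IndexOP5 within a trial for the online measure). The data layout, the dictionary keys `op1` and `op5`, and the function names are hypothetical.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def online_distances(trials):
    """Trial-based online measure: first OP1 vs. last OP5 within each trial.

    trials: list of trials; each trial is a list of sequence iterations;
    each iteration is a dict with feature vectors under 'op1' and 'op5'.
    """
    return [euclidean(t[0]["op1"], t[-1]["op5"]) for t in trials]

def offline_distances(trials, first_seq=0):
    """Last OP5 of one trial vs. OP1 of iteration `first_seq` in the next trial.

    first_seq=1 skips the first sequence iteration of the next trial,
    mirroring the pre-planning control described in the text.
    """
    return [euclidean(prev[-1]["op5"], nxt[first_seq]["op1"])
            for prev, nxt in zip(trials, trials[1:])]
```

Calling `offline_distances(trials, first_seq=1)` gives the variant that excludes the first sequence of each trial, so the two versions can be compared directly.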

Results (lines 318-324)

“Within-subject correlations were consistent with these group-level findings. The average correlation between offline contextualization and micro-offline gains within individuals was significantly greater than zero (Figure 5 – figure supplement 4, left; t = 3.87, p = 0.00035, df = 25, Cohen's d = 0.76) and stronger than correlations between online contextualization and either micro-online (Figure 5 – figure supplement 4, middle; t = 3.28, p = 0.0015, df = 25, Cohen's d = 1.2) or micro-offline gains (Figure 5 – figure supplement 4, right; t = 3.7021, p = 5.3013e-04, df = 25, Cohen's d = 0.69).”

Discussion (lines 408-416):

“Offline contextualization consistently correlated with early learning gains across a range of decoding windows (50–250ms; Figure 5 – figure supplement 1). This result remained unchanged when measuring offline contextualization between the last and second sequence of consecutive trials, inconsistent with a possible confounding effect of pre-planning [30] (Figure 5 – figure supplement 2A). On the other hand, online contextualization did not predict learning (Figure 5 – figure supplement 3). Consistent with these results the average within-subject correlation between offline contextualization and micro-offline gains was significantly stronger than within subject correlations between online contextualization and either micro-online or micro-offline gains (Figure 5 – figure supplement 4).”

We then show that offline contextualization is not explained by pre-planning of the first action sequence:

Results (lines 310-316):

“Offline contextualization strongly correlated with cumulative micro-offline gains (r = 0.903, R2 = 0.816, p < 0.001; Figure 5 – figure supplement 1A, inset) across decoder window durations ranging from 50 to 250ms (Figure 5 – figure supplement 1B, C). The offline contextualization between the final sequence of each trial and the second sequence of the subsequent trial (excluding the first sequence) yielded comparable results. This indicates that pre-planning at the start of each practice trial did not directly influence the offline contextualization measure [30] (Figure 5 – figure supplement 2A, 1st vs. 2nd Sequence approaches).”

Discussion (lines 409-412):

“This result remained unchanged when measuring offline contextualization between the last and second sequence of consecutive trials, inconsistent with a possible confounding effect of pre-planning [30] (Figure 5 – figure supplement 2A).”

In summary, none of the presented evidence in this paper—including results of the multiple control analyses carried out in response to the Reviewers’ recommendations— supports the Reviewer’s position.

Please note that the micro-offline learning "inference" has extensive mechanistic support across species and neural recording techniques (see Introduction, lines 26-56). In contrast, the reactive inhibition "inference," which is the Reviewer's alternative interpretation, has no such support yet [15].

Introduction (Lines 26-56)

“Practicing a new motor skill elicits rapid performance improvements (early learning) [1] that precede skill performance plateaus [5]. Skill gains during early learning accumulate over rest periods (micro-offline) interspersed with practice [1, 6-10], and are up to four times larger than offline performance improvements reported following overnight sleep [1]. During this initial interval of prominent learning, retroactive interference immediately following each practice interval reduces learning rates relative to interference after passage of time, consistent with stabilization of the motor memory [11]. Micro-offline gains observed during early learning are reproducible [7, 10-13] and are similar in magnitude even when practice periods are reduced by half to 5 seconds in length, thereby confirming that they are not merely a result of recovery from performance fatigue [11]. Additionally, they are unaffected by the random termination of practice periods, which eliminates the possibility of predictive motor slowing as a contributing factor [11]. Collectively, these behavioral findings point towards the interpretation that micro-offline gains during early learning represent a form of memory consolidation [1].

This interpretation has been further supported by brain imaging and electrophysiological studies linking known memory-related networks and consolidation mechanisms to rapid offline performance improvements. In humans, the rate of hippocampo-neocortical neural replay predicts micro-offline gains [6].

Consistent with these findings, Chen et al. [12] and Sjøgård et al. [13] furnished direct evidence from intracranial human EEG studies, demonstrating a connection between the density of hippocampal sharp-wave ripples (80-120 Hz)—recognized markers of neural replay—and micro-offline gains during early learning. Further, Griffin et al. reported that neural replay of task-related ensembles in the motor cortex of macaques during brief rest periods— akin to those observed in humans [1, 6-8, 14]—are not merely correlated with, but are causal drivers of micro-offline learning [15]. Specifically, the same reach directions that were replayed the most during rest breaks showed the greatest reduction in path length (i.e. – more efficient movement path between two locations in the reach sequence) during subsequent trials, while stimulation applied during rest intervals preceding performance plateau reduced reactivation rates and virtually abolished micro-offline gains [15]. Thus, converging evidence in humans and non-human primates across indirect non-invasive and direct invasive recording techniques link hippocampal activity, neural replay dynamics and offline skill gains in early motor learning that precede performance plateau.”

That said, absence of evidence is not evidence of absence, and for that reason we also state in the Discussion (lines 448-452):

A simple control analysis based on shuffled class labels could lend further support to the authors' complex decoding approach. As a control analysis that completely rules out any source of overfitting, the authors could test the decoder after shuffling class labels. Following such shuffling, decoding accuracies should drop to chance-level for all decoding approaches, including the optimized decoder. This would also provide an estimate of actual chance-level performance (which is informative over and beyond the theoretical chance level). During the review process, the authors reported this analysis to the reviewers. Given that readers may consider following the presented decoding approach in their own work, it would have been important to include that control analysis in the manuscript to convince readers of its validity.

As requested, the label-shuffling analysis was carried out for both 4- and 5-class decoders and is now reported in the revised manuscript.

Results (lines 204-207):

“Testing the keypress state (4-class) hybrid decoder performance on Day 1 after randomly shuffling keypress labels for held-out test data resulted in a performance drop approaching expected chance levels (22.12%± SD 9.1%; Figure 3 – figure supplement 3C).”

Results (lines 261-264):

“As expected, the 5-class hybrid-space decoder performance approached chance levels when tested with randomly shuffled keypress labels (18.41%± SD 7.4% for Day 1 data; Figure 4 – figure supplement 3C).”
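
A minimal sketch of a label-shuffling control of this kind is given below. It assumes a decoder whose predictions on held-out data are already available; the number of permutations, the seed and the function names are illustrative assumptions rather than the procedure used in the manuscript.

```python
import random

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def shuffled_label_accuracy(y_true, y_pred, n_perm=1000, seed=0):
    """Mean accuracy after randomly permuting the held-out labels.

    With balanced classes this should approach 1 / n_classes,
    giving an empirical estimate of chance-level performance.
    """
    rng = random.Random(seed)
    labels = list(y_true)
    scores = []
    for _ in range(n_perm):
        rng.shuffle(labels)  # break the label-to-data correspondence
        scores.append(accuracy(labels, y_pred))
    return sum(scores) / n_perm
```

For a balanced 4-class problem the shuffled accuracy should settle near 25%, and near 20% for 5 classes, which is the range the values reported above fall in.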

Furthermore, the authors' approach to cortical parcellation raises questions regarding the information carried by varying dipole orientations within a parcel (which currently seems to be ignored?) and the implementation of the mean-flipping method (given that there are two dimensions - space and time - it is unclear what the authors refer to when they talk about the sign of the "average source", line 477).

The revised manuscript now provides a more detailed explanation of the parcellation, and sign-flipping procedures implemented:

Methods (lines 604-611):

“Source-space parcellation was carried out by averaging all voxel time-series located within distinct anatomical regions defined in the Desikan-Killiany Atlas [31]. Since source time-series estimated with beamforming approaches are inherently sign-ambiguous, a custom Matlab-based implementation of the mne.extract_label_time_course with “mean_flip” sign-flipping procedure in MNE-Python [78] was applied prior to averaging to prevent within-parcel signal cancellation. All voxel time-series within each parcel were extracted and the time-series sign was flipped at locations where the orientation difference was greater than 90° from the parcel mode. A mean time-series was then computed across all voxels within the parcel after sign-flipping.”
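
The sign-flipping step described in the quoted Methods text can be sketched as below. This is not the authors' Matlab code nor the MNE-Python implementation: the reference orientation is assumed to be supplied by the caller, and an orientation difference greater than 90° is detected as a negative dot product with that reference. All names are hypothetical.

```python
def mean_flip(timeseries, orientations, reference):
    """Average voxel time-series within a parcel after sign-flipping.

    timeseries:   per-voxel lists of samples (all the same length).
    orientations: per-voxel 3-vectors (source orientations).
    reference:    3-vector for the dominant parcel orientation (assumed given).
    """
    # A dot product below zero means the angle to the reference exceeds 90 degrees.
    signs = [
        -1.0 if sum(o_i * r_i for o_i, r_i in zip(o, reference)) < 0 else 1.0
        for o in orientations
    ]
    n_vox = len(timeseries)
    n_samples = len(timeseries[0])
    return [
        sum(signs[v] * timeseries[v][t] for v in range(n_vox)) / n_vox
        for t in range(n_samples)
    ]
```

Without the flip, two voxels carrying the same signal with opposite orientation would cancel in the parcel average; with it, the shared signal is preserved.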

Recommendations for the authors:

Reviewer #1 (Recommendations for the authors):

Comments on the revision:

The authors have made large efforts to address all concerns raised. A couple of suggestions remain:

- formally show if and how movement artefacts may contribute to the signal and analysis; it seems that the authors have data to allow for such an analysis

We have implemented the requested control analyses addressing this issue. They are reported in: Results (lines 207-211 and 261-268), Discussion (Lines 362-368):

- formally show that the signals from the intra- and inter parcel spaces are orthogonal.

Please note that, despite the Reviewer’s statement above, we never claim in the manuscript that the parcel-space and regional voxel-space features show “complete independence”.

Furthermore, the machine learning-based decoding methods used in the present study do not require input feature orthogonality, but instead non-redundancy [7], which is a requirement satisfied by our data (see below and the new Figure 2 – figure supplement 2 in the revised manuscript). Finally, our results already show that the hybrid space decoder outperformed all other methods even after input features were fully orthogonalized with LDA or PCA dimensionality reduction procedures prior to the classification step (Figure 3 – figure supplement 2).

We also highlight several additional results that are informative regarding this issue. For example, if spatially overlapping parcel- and voxel-space time-series only provided redundant information, inclusion of both as input features should increase model overfitting to the training dataset and decrease overall cross-validated test accuracy [8]. In the present study, however, we see the opposite effect on decoder performance. First, Figure 3 – figure supplements 1 & 2 clearly show that decoders constructed from hybrid-space features outperform the other input feature (sensor-, whole-brain parcel- and whole-brain voxel-) spaces in every case (e.g. – wideband, all narrowband frequency ranges, and even after the input space is fully orthogonalized through dimensionality reduction procedures prior to the decoding step). Furthermore, Figure 3 – figure supplement 6 shows that hybrid-space decoder performance suffers when parcel time-series that spatially overlap with the included regional voxel-spaces are removed from the input feature set. We state in the Discussion (lines 353-356):

“The observation of increased cross-validated test accuracy (as shown in Figure 3 – Figure Supplement 6) indicates that the spatially overlapping information in parcel- and voxel-space time-series in the hybrid decoder was complementary, rather than redundant [41].”

To gain insight into the complementary information contributed by the two spatial scales to the hybrid-space decoder, we first independently computed the matrix rank for whole-brain parcel- and voxel-space input features for each participant (shown in Author response image 1). The results indicate that whole-brain parcel-space input features are full rank (rank = 148) for all participants (i.e. - MEG activity is orthogonal between all parcels). The matrix rank of voxel-space input features (rank = 267 ± 17 SD) exceeded the parcel-space rank for all participants and approached the number of useable MEG sensor channels (n = 272). Thus, voxel-space features provide both additional and complementary information to representations at the parcel-space scale.
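
For readers unfamiliar with the rank computation, a minimal sketch follows. It uses Gaussian elimination with partial pivoting on a small feature matrix (rows are features, columns are time samples); the tolerance and function name are assumptions, and in practice an SVD-based routine such as `numpy.linalg.matrix_rank` would normally be preferred. Full rank means that no feature time-series is a linear combination of the others.

```python
def matrix_rank(rows, tol=1e-10):
    """Rank of a matrix given as a list of rows (Gaussian elimination)."""
    m = [[float(x) for x in r] for r in rows]
    n_rows, n_cols = len(m), len(m[0])
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        # Partial pivoting: pick the remaining row with the largest entry.
        pivot = max(range(rank, n_rows), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < tol:
            continue  # no usable pivot in this column
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, n_rows):
            factor = m[r][col] / m[rank][col]
            for c in range(col, n_cols):
                m[r][c] -= factor * m[rank][c]
        rank += 1
    return rank
```

A rank below the number of features indicates linear dependence among them, which is why the voxel-space rank is bounded by the number of usable sensor channels.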

Figure 2—figure Supplement 2 in the revised manuscript now shows that the degree of dependence between the two spatial scales varies over the regional voxel-space. That is, some voxels within a given parcel correlate strongly with the time-series of the parcel they belong to, while others do not. This finding is consistent with a documented increase in correlational structure of neural activity across spatial scales that does not reflect perfect dependency or orthogonality [9]. Notably, the regional voxel-spaces included in the hybrid-space decoder are significantly less correlated with the averaged parcel-space time-series than excluded voxels. We now point readers to this new figure in the results.

Taken together, these results indicate that the multi-scale information in the hybrid feature set is complementary rather than orthogonal. This is consistent with the idea that hybrid-space features better represent multi-scale temporospatial dynamics reported to be a fundamental characteristic of how the brain stores and adapts memories, and generates behavior across species [9].

Reviewer #2 (Recommendations for the authors):

I appreciate the authors' efforts in addressing the concerns I raised. The responses generally made sense to me. However, I had some trouble finding several corrections/additions that the authors claim they made in the revised manuscript:

"We addressed this question by conducting a new multivariate regression analysis to directly assess whether the neural representation distance score could be predicted by the 4-1, 2-4, and 4-4 keypress transition times observed for each complete correct sequence (both predictor and response variables were z-score normalized within-subject). The results of this analysis also affirmed that the possible alternative explanation that contextualization effects are simple reflections of increased mixing is not supported by the data (Adjusted R2 = 0.00431; F = 5.62). We now include this new negative control analysis in the revised manuscript."

This approach is now reported in the manuscript in the Results (Lines 324-328 and Figure 5-Figure Supplement 6 legend).

"We strongly agree with the Reviewer that the issue of generalizability is extremely important and have added a new paragraph to the Discussion in the revised manuscript highlighting the strengths and weaknesses of our study with respect to this issue."

Discussion (Lines 436-441)

“One limitation of this study is that contextualization was investigated for only one finger movement (index finger or digit 4) embedded within a relatively short 5-item skill sequence. Determining if representational contextualization is exhibited across multiple finger movements embedded within for example longer sequences (e.g. – two index finger and two little finger keypresses performed within a short piece of piano music) will be an important extension to the present results.”

"We strongly agree with the Reviewer that any intended clinical application must carefully consider the specific input feature constraints dictated by the clinical cohort, and in turn impose appropriate and complimentary constraints on classifier parameters that may differ from the ones used in the present study. We now highlight this issue in the Discussion of the revised manuscript and relate our present findings to published clinical BCI work within this context."

Discussion (Lines 441-444)

“While a supervised manifold learning approach (LDA) was used here because it optimized hybrid-space decoder performance, unsupervised strategies (e.g. - PCA and MDS, which also substantially improved decoding accuracy in the present study; Figure 3 – figure supplement 2) are likely more suitable for real-time BCI applications.”

and

"The Reviewer makes a good point. We have now implemented the suggested normalization procedure in the analysis provided in the revised manuscript."

Results (lines 275-282)

“We used a Euclidian distance measure to evaluate the differentiation of the neural representation manifold of the same action (i.e. - an index-finger keypress) executed within different local sequence contexts (i.e. - ordinal position 1 vs. ordinal position 5; Figure 5). To make these distance measures comparable across participants, a new set of classifiers was then trained with group-optimal parameters (i.e. – broadband hybrid-space MEG data with subsequent manifold extraction Figure 3 – figure supplements 2) and LDA classifiers (Figure 3 – figure supplements 7) trained on 200ms duration windows aligned to the KeyDown event (see Methods, Figure 3 – figure supplements 5).”

Where are they in the manuscript? Did I read the wrong version? It would be more helpful to specify with page/line numbers. Please also add the detailed procedure of the control/additional analyses in the Method.

As requested, we now refer to all manuscript revisions with specific line numbers. We have also included all detailed procedures related to any additional analyses requested by reviewers.

I also have a few other comments back to the authors' following responses:

"Thus, increased overlap between the "4" and "1" keypresses (at the start of the sequence) and "2" and "4" keypresses (at the end of the sequence) could artefactually increase contextualization distances even if the underlying neural representations for the individual keypresses remain unchanged. One must also keep in mind that since participants repeat the sequence multiple times within the same trial, a majority of the index finger keypresses are performed adjacent to one another (i.e. - the "4-4" transition marking the end of one sequence and the beginning of the next). Thus, increased overlap between consecutive index finger keypresses as typing speed increased should increase their similarity and mask contextualization- related changes to the underlying neural representations." "We also re-examined our previously reported classification results with respect to this issue.

We reasoned that if mixing effects reflecting the ordinal sequence structure is an important driver of the contextualization finding, these effects should be observable in the distribution of decoder misclassifications. For example, "4" keypresses would be more likely to be misclassified as "1" or "2" keypresses (or vice versa) than as "3" keypresses. The confusion matrices presented in Figures 3C and 4B and Figure 3-figure supplement 3A display a distribution of misclassifications that is inconsistent with an alternative mixing effect explanation of contextualization."

"Based upon the increased overlap between adjacent index finger keypresses (i.e. - "4-4" transition), we also reasoned that the decoder tasked with separating individual index finger keypresses into two distinct classes based upon sequence position, should show decreased performance as typing speed increases. However, Figure 4C in our manuscript shows that this is not the case. The 2-class hybrid classifier actually displays improved classification performance over early practice trials despite greater temporal overlap. Again, this is inconsistent with the idea that the contextualization effect simply reflects increased mixing of individual keypress features."

As the time window for MEG feature is defined after the onset of each press, it is more likely that the feature overlap is the current and the future presses, rather than the current and the past presses (of course the three will overlap at very fast typing speed). Therefore, for sequence 41324, if we note the planning-related processes by a Roman numeral, the overlapping features would be '4i', '1iii', '3ii', '2iv', and '4iv'. Assuming execution-related process (e.g., 1) and planning-related process (e.g., i) are not necessarily similar, especially in finer temporal resolution, the patterns for '4i' and '4iv' are well separated in terms of process 'i' and 'iv,' and this advantage will be larger in faster typing speed. This also applies to the other presses. Thus, the author's arguments about the masking of contextualization and misclassification due to pattern overlap seem odd. The most direct and probably easiest way to resolve this would be to use a shorter time window for the MEG feature. Some decrease in decoding accuracy in this case is totally acceptable for the science purpose.

The revised manuscript now includes analyses carried out with decoding time windows ranging from 50 to 250ms in duration. These additional results are now reported in:

Results (lines 258-268):

“The improved decoding accuracy is supported by greater differentiation in neural representations of the index finger keypresses performed at positions 1 and 5 of the sequence (Figure 4A), and by the trial-by-trial increase in 2-class decoding accuracy over early learning (Figure 4C) across different decoder window durations (Figure 4 – figure supplement 2). As expected, the 5-class hybrid-space decoder performance approached chance levels when tested with randomly shuffled keypress labels (18.41%± SD 7.4% for Day 1 data; Figure 4 – figure supplement 3C). Task-related eye movements did not explain these results since an alternate 5-class hybrid decoder constructed from three eye movement features (gaze position at the KeyDown event, gaze position 200ms later, and peak eye movement velocity within this window; Figure 4 – figure supplement 3A) performed at chance levels (cross-validated test accuracy = 0.2181; Figure 4 – figure supplement 3B, C).”

Results (lines 310-316):

“Offline contextualization strongly correlated with cumulative micro-offline gains (r = 0.903, R² = 0.816, p < 0.001; Figure 5 – figure supplement 1A, inset) across decoder window durations ranging from 50 to 250ms (Figure 5 – figure supplement 1B, C). The offline contextualization between the final sequence of each trial and the second sequence of the subsequent trial (excluding the first sequence) yielded comparable results. This indicates that pre-planning at the start of each practice trial did not directly influence the offline contextualization measure [30] (Figure 5 – figure supplement 2A, 1st vs. 2nd Sequence approaches).”

Discussion (lines 380-385):

“The first hint of representational differentiation was the highest false-negative and lowest false-positive misclassification rates for index finger keypresses performed at different locations in the sequence compared with all other digits (Figure 3C). This was further supported by the progressive differentiation of neural representations of the index finger keypress (Figure 4A) and by the robust trial-by-trial increase in 2-class decoding accuracy across time windows ranging between 50 and 250 ms (Figure 4C; Figure 4 – figure supplement 2).”

Discussion (lines 408-9):

“Offline contextualization consistently correlated with early learning gains across a range of decoding windows (50–250ms; Figure 5 – figure supplement 1).”

"We addressed this question by conducting a new multivariate regression analysis to directly assess whether the neural representation distance score could be predicted by the 4-1, 2-4 and 4-4 keypress transition times observed for each complete correct sequence"

For the regression analysis, I recommend using total keypress time per sequence (or the sum of 4-1 and 4-4) instead of specific transition intervals, because there likely exists a specific correlational structure across the transition intervals. Using correlated regressors may distort the result.

This approach is now reported in the manuscript:

Results (Lines 324-328) and Figure 5-Figure Supplement 6 legend.
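The Reviewer's recommendation of a single summed-time regressor can be illustrated with a minimal sketch. This is not the analysis code used in the manuscript: the function name, argument names and array layout are hypothetical, and ordinary least squares is assumed as the fitting method.

```python
import numpy as np

def fit_distance_vs_total_time(distance, transition_times):
    """Regress a per-sequence distance score on summed keypress transition time.

    distance:         (n_sequences,) representation distance scores (hypothetical input)
    transition_times: (n_sequences, n_transitions) transition times in seconds
    Returns (intercept, slope, r_squared).
    """
    distance = np.asarray(distance, dtype=float)
    # Collapse the correlated transition intervals into one regressor per sequence.
    total_time = np.asarray(transition_times, dtype=float).sum(axis=1)
    # Design matrix with an intercept column and the single summed-time regressor.
    X = np.column_stack([np.ones_like(total_time), total_time])
    coef, *_ = np.linalg.lstsq(X, distance, rcond=None)
    residuals = distance - X @ coef
    ss_res = float(residuals @ residuals)
    ss_tot = float(((distance - distance.mean()) ** 2).sum())
    return float(coef[0]), float(coef[1]), 1.0 - ss_res / ss_tot
```

Summing the intervals before fitting avoids placing several mutually correlated regressors in the same model, which is the concern raised in the comment above.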

"We do agree with the Reviewer that the naturalistic, generative, self-paced task employed in the present study results in overlapping brain processes related to planning, execution, evaluation and memory of the action sequence. We also agree that there are several tradeoffs to consider in the construction of the classifiers depending on the study aim. Given our aim of optimizing keypress decoder accuracy in the present study, the set of tradeoffs resulted in representations reflecting more the latter three processes, and less so the planning component. Whether separate decoders can be constructed to tease apart the representations or networks supporting these overlapping processes is an important future direction of research in this area. For example, work presently underway in our lab constrains the selection of windowing parameters in a manner that allows individual classifiers to be temporally linked to specific planning, execution, evaluation or memoryrelated processes to discern which brain networks are involved and how they adaptively reorganize with learning. Results from the present study (Figure 4-figure supplement 2) showing hybrid-space decoder prediction accuracies exceeding 74% for temporal windows spanning as little as 25ms and located up to 100ms prior to the KeyDown event strongly support the feasibility of such an approach."

I recommend that the authors add this paragraph or a paragraph like this to the Discussion. This perspective is very important and still missing in the revised manuscript.

We have now included in the manuscript the following sections addressing this point:

Discussion (lines 334-338)

“The main findings of this study during which subjects engaged in a naturalistic, self-paced task were that individual sequence action representations differentiate during early skill learning in a manner reflecting the local sequence context in which they were performed, and that the degree of representational differentiation—particularly prominent over rest intervals—correlated with skill gains.”

Discussion (lines 428-434)

“In this study, classifiers were trained on MEG activity recorded during or immediately after each keypress, emphasizing neural representations related to action execution, memory consolidation and recall over those related to planning. An important direction for future research is determining whether separate decoders can be developed to distinguish the representations or networks separately supporting these processes. Ongoing work in our lab is addressing this question. The present accuracy results across varied decoding window durations and alignment with each keypress action support the feasibility of this approach (Figure 3—figure supplement 5).”

"The rapid initial skill gains that characterize early learning are followed by micro-scale fluctuations around skill plateau levels (i.e. following trial 11 in Figure 1B)" Is this a mention of Figure 1 Supplement 1 A?

The sentence was replaced with the following: Results (lines 108-110)

“Participants reached 95% of maximal skill (i.e. - Early Learning) within the initial 11 practice trials (Figure 1B), with improvements developing over inter-practice rest periods (micro-offline gains) accounting for almost all total learning across participants (Figure 1B, inset) [1].”

The citation below seems to have been selected by mistake;

"9. Chen, S. & Epps, J. Using task-induced pupil diameter and blink rate to infer cognitive load. Hum Comput Interact 29, 390-413 (2014)."

We thank the Reviewer for bringing this mistake to our attention. This citation has now been corrected.

Reviewer #3 (Recommendations for the authors):

The authors write in their response that "We now provide additional details in the Methods of the revised manuscript pertaining to the parcellation procedure and how the sign ambiguity problem was addressed in our analysis." I could not find anything along these lines in the (redlined) version of the manuscript and therefore did not change the corresponding comment in the public review.

The revised manuscript now provides a more detailed explanation of the parcellation and sign-flipping procedures implemented:

Methods (lines 604-611):

“Source-space parcellation was carried out by averaging all voxel time-series located within distinct anatomical regions defined in the Desikan-Killiany Atlas [31]. Since source time-series estimated with beamforming approaches are inherently sign-ambiguous, a custom Matlab-based implementation of the mne.extract_label_time_course “mean_flip” sign-flipping procedure in MNE-Python [78] was applied prior to averaging to prevent within-parcel signal cancellation. All voxel time-series within each parcel were extracted and the time-series sign was flipped at locations where the orientation difference was greater than 90° from the parcel mode. A mean time-series was then computed across all voxels within the parcel after sign-flipping.”
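The sign-flipping step described in this passage can be sketched as follows. This is an illustrative NumPy sketch, not the authors' Matlab implementation. The function and argument names are hypothetical. Two choices are assumptions made here: the parcel's dominant orientation is estimated as the first right singular vector of the voxel orientations, and the arbitrary overall sign is fixed so that the majority of voxels keep their original sign.

```python
import numpy as np

def mean_flip_parcel(timeseries, orientations):
    """Average voxel time-series within one parcel after sign-flipping.

    timeseries:   (n_voxels, n_times) source time-series for one parcel
    orientations: (n_voxels, 3) orientation vector for each voxel
    Returns (parcel_mean, flips), where flips holds +1 or -1 per voxel.
    """
    timeseries = np.asarray(timeseries, dtype=float)
    orientations = np.asarray(orientations, dtype=float)
    # Assumption: dominant parcel orientation = first right singular vector.
    _, _, vt = np.linalg.svd(orientations, full_matrices=False)
    dominant = vt[0]
    # A negative dot product means the voxel points more than 90 degrees away.
    flips = np.where(orientations @ dominant < 0, -1.0, 1.0)
    # Assumption: resolve the arbitrary SVD sign so most voxels stay unflipped.
    if flips.sum() < 0:
        flips = -flips
    parcel_mean = (flips[:, None] * timeseries).mean(axis=0)
    return parcel_mean, flips

# Three voxels carrying the same signal, one with opposite orientation and sign.
signal = np.array([1.0, 2.0, 3.0])
ts = np.vstack([signal, signal, -signal])
ori = np.array([[0.0, 0.0, 1.0], [0.0, 0.0, 1.0], [0.0, 0.0, -1.0]])
parcel_mean, flips = mean_flip_parcel(ts, ori)
```

In this toy example a plain average would shrink the signal to one third of its amplitude, whereas flipping the antiparallel voxel first recovers it in full. This is the within-parcel cancellation the procedure is meant to prevent.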

The control analysis based on a multivariate regression that assessed whether the neural representation distance score could be predicted by the 4-1, 2-4 and 4-4 keypress transition times, as briefly mentioned in the authors' responses to Reviewer 2 and myself, was not included in the manuscript and could not be sufficiently evaluated.

This approach is now reported in the manuscript: Results (Lines 324-328) and Figure 5-Figure Supplement 6 legend.

The authors argue that differences in the design between Das et al. (2024) on the one hand (Experiments 1 and 2), and the study by Bönstrup et al. (2019) on the other hand, may have prevented Das et al. (2024) from finding the assumed learning benefit by micro-offline consolidation. However, the Supplementary Material of Das et al. (2024) includes an experiment (Experiment S1) whose design closely follows a large proportion of the early learning phase of Bönstrup et al. (2019), and which, nevertheless, demonstrates that there is no lasting benefit of taking breaks with respect to the acquired skill level, despite the presence of micro-offline gains.

We thank the Reviewer for alerting us to these new data added to the revised supplementary materials of Das et al. (2024) posted to bioRxiv. However, despite the Reviewer’s claim to the contrary, a careful comparison between the Das et al. and Bönstrup et al. studies reveals more substantive differences than similarities, and Experiment S1 does not “closely [follow] a large proportion of the early learning phase of Bönstrup et al. (2019)” as stated.

In the Das et al. Experiment S1, sixty-two participants were randomly assigned to “with breaks” or “no breaks” skill training groups. The “with breaks” group alternated 10 seconds of skill sequence practice with 10 seconds of rest over seven trials (2 min and 20 sec total training duration). This amounts to 66.7% of the early learning period defined by Bönstrup et al. (2019) (i.e. - eleven 10-second long practice periods interleaved with ten 10-second long rest breaks; 3 min 30 sec total training duration). Also, please note that while neither performance feedback nor reward was given in the Bönstrup et al. (2019) study, participants in the Das et al. study received explicit performance-based monetary rewards, a potentially crucial driver of differentiated behavior between the two studies:

“Participants were incentivized with bonus money based on the total number of correct sequences completed throughout the experiment.”

The “no breaks” group in the Das et al. study practiced the skill sequence for 70 continuous seconds. Both groups (despite one being labeled “no breaks”) follow training with a long 3-minute break (also note that since the “with breaks” group ends with 10 seconds of rest, their break is actually longer), before finishing with a skill “test” over a continuous 50-second-long block. During the 70 seconds of training, the “with breaks” group shows more learning than the “no breaks” group. Interestingly, following the long 3-minute break, the “with breaks” group displays a performance drop (relative to their performance at the end of training) that is stable over the full 50-second test, while the “no breaks” group shows an immediate performance improvement following the long break that continues to increase over the 50-second test.

Separately, there are important issues regarding the Das et al study that should be considered through the lens of recent findings not referred to in the preprint. A major element of their experimental design is that both groups—“with breaks” and “no breaks”— actually receive quite a long 3-minute break just before the skill test. This long break is more than 2.5x the cumulative interleaved rest experienced by the “with breaks” group. Thus, although the design is intended to contrast the presence or absence of rest “breaks”, that difference between groups is no longer maintained at the point of the skill test.
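The durations and ratios quoted in this comparison can be checked with a short calculation. This is an illustrative sketch only and is not part of either study's analysis code. The variable names are hypothetical, and it assumes the “with breaks” group received seven 10-second rests, as described above.

```python
PRACTICE_S = 10  # seconds of practice per trial in both designs
REST_S = 10      # seconds of rest per interleaved break in both designs

# Das et al. Experiment S1, "with breaks" group: seven practice/rest pairs.
das_trials = 7
das_total_s = das_trials * (PRACTICE_S + REST_S)

# Bönstrup et al. (2019) early learning: eleven practice periods, ten rests.
bonstrup_total_s = 11 * PRACTICE_S + 10 * REST_S

# Share of the early learning period covered by Experiment S1 training.
fraction = das_total_s / bonstrup_total_s

# Long pre-test break relative to the cumulative interleaved rest.
long_break_s = 3 * 60
cumulative_rest_s = das_trials * REST_S
break_ratio = long_break_s / cumulative_rest_s

print(f"Das et al. training: {das_total_s // 60} min {das_total_s % 60} sec")
print(f"Bönstrup et al. early learning: {bonstrup_total_s // 60} min {bonstrup_total_s % 60} sec")
print(f"Fraction covered: {fraction:.1%}; break ratio: {break_ratio:.2f}x")
```

Under these assumptions the training period is 140 seconds, or two thirds of the 210-second early learning period, and the 3-minute break is about 2.6 times the 70 seconds of cumulative interleaved rest.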

The Das et al. results are most consistent with an alternative interpretation of the data— that the “no breaks” group experiences offline learning during their long 3-minute break. This is supported by the recent work of Griffin et al. (2025) where micro-array recordings from primary and premotor cortex were obtained from macaque monkeys while they performed blocks of ten continuous reaching sequences up to 81.4 seconds in duration (see source data for Extended Data Figure 1h) with 90 seconds of interleaved rest. Griffin et al. observed offline improvement in skill immediately following the rest break that was causally related to neural reactivations (i.e. – neural replay) that occurred during the rest break. Importantly, the highest density of reactivations was present in the very first 90-second break between Blocks 1 and 2 (see Fig. 2f in Griffin et al., 2025). This supports the interpretation that both the “with breaks” and “no breaks” groups express offline learning gains, with these gains being delayed in the “no breaks” group due to the practice schedule.

On the other hand, if offline learning can occur during this longer break, then why would the “with breaks” group show no benefit? Again, it could be that most of the offline gains for this group were front-loaded during the seven shorter 10-second rest breaks. Another possible, though not mutually exclusive, explanation is that the observed drop in performance in the “with breaks” group is driven by contextual interference. Specifically, similar to Experiments 1 and 2 in Das et al. (2024), the skill test is conducted under very different conditions than those under which the “with breaks” group practiced the skill (short bursts of practice alternating with equally short breaks). On the other hand, the “no breaks” group is tested (50 seconds of continuous practice) under quite similar conditions to their training schedule (70 seconds of continuous practice). Thus, it is possible that this dissimilarity between training and test could lead to reduced performance in the “with breaks” group.

We made the following manuscript revisions related to these important issues:

Introduction (Lines 26-56)

“Practicing a new motor skill elicits rapid performance improvements (early learning) [1] that precede skill performance plateaus [5]. Skill gains during early learning accumulate over rest periods (micro-offline) interspersed with practice [1, 6-10], and are up to four times larger than offline performance improvements reported following overnight sleep [1]. During this initial interval of prominent learning, retroactive interference immediately following each practice interval reduces learning rates relative to interference after passage of time, consistent with stabilization of the motor memory [11]. Micro-offline gains observed during early learning are reproducible [7, 10-13] and are similar in magnitude even when practice periods are reduced by half to 5 seconds in length, thereby confirming that they are not merely a result of recovery from performance fatigue [11]. Additionally, they are unaffected by the random termination of practice periods, which eliminates the possibility of predictive motor slowing as a contributing factor [11]. Collectively, these behavioral findings point towards the interpretation that micro-offline gains during early learning represent a form of memory consolidation [1].

This interpretation has been further supported by brain imaging and electrophysiological studies linking known memory-related networks and consolidation mechanisms to rapid offline performance improvements. In humans, the rate of hippocampo-neocortical neural replay predicts micro-offline gains [6]. Consistent with these findings, Chen et al. [12] and Sjøgård et al. [13] furnished direct evidence from intracranial human EEG studies, demonstrating a connection between the density of hippocampal sharp-wave ripples (80-120 Hz)—recognized markers of neural replay—and micro-offline gains during early learning. Further, Griffin et al. reported that neural replay of task-related ensembles in the motor cortex of macaques during brief rest periods— akin to those observed in humans [1, 6-8, 14]—are not merely correlated with, but are causal drivers of micro-offline learning [15]. Specifically, the same reach directions that were replayed the most during rest breaks showed the greatest reduction in path length (i.e. – more efficient movement path between two locations in the reach sequence) during subsequent trials, while stimulation applied during rest intervals preceding performance plateau reduced reactivation rates and virtually abolished micro-offline gains [15]. Thus, converging evidence in humans and non-human primates across indirect non-invasive and direct invasive recording techniques link hippocampal activity, neural replay dynamics and offline skill gains in early motor learning that precede performance plateau.”

Next, in the Methods, we articulate important constraints formulated by Pan and Rickard (2015) and Bönstrup et al. (2019) for meaningful measurements:

Methods (Lines 493-499)

“The study design followed specific recommendations by Pan and Rickard (2015): (1) utilizing 10-second practice trials and (2) constraining analysis of micro-offline gains to early learning trials (where performance monotonically increases and 95% of overall performance gains occur) that precede the emergence of “scalloped” performance dynamics strongly linked to reactive inhibition effects ([29, 72]). This is precisely the portion of the learning curve Pan and Rickard referred to when they stated “…rapid learning during that period masks any reactive inhibition effect” [29].”

We finally discuss the implications of neglecting some or all of these recommendations:

Discussion (Lines 444-452):

“Finally, caution should be exercised when extrapolating findings during early skill learning, a period of steep performance improvements, to findings reported after insufficient practice [67], post-plateau performance periods [68], or non-learning situations (e.g. performance of non-repeating keypress sequences in [67]) when reactive inhibition or contextual interference effects are prominent. Ultimately, it will be important to develop new paradigms allowing one to independently estimate the different coincident or antagonistic features (e.g. - memory consolidation, planning, working memory and reactive inhibition) contributing to micro-online and micro-offline gains during and after early skill learning within a unifying framework.”

Personally, given that the idea of (micro-offline) consolidation seems to attract a lot of interest (and therefore cause a lot of future effort and cost public money) in the scientific community, I would find it extremely important to be cautious in interpreting results in this field. For me, this would include abstaining from the claim that processes occur "during" a rest period (see abstract, for example), given that micro-offline gains (as well as offline contextualization) are computed from data obtained during practice, not rest, and may, thus, just as well reflect a change that occurs "online", e.g., at the very onset of practice (like pre-planning) or throughout practice (like fatigue, or reactive inhibition). In addition, I would suggest discussing in more depth the actual evidence not only in favour of, but also against, the assumption of micro-offline gains as a phenomenon of learning.

We agree with the reviewer that caution is warranted. Based upon these suggestions, we have now expanded the manuscript to very clearly define the experimental constraints under which different groups have successfully studied micro-offline learning and its mechanisms, the impact of fatigue/reactive inhibition on micro-offline performance changes unrelated to learning, as well as the interpretation problems that emerge when those recommendations are not followed.

We clearly articulate the crucial constraints recommended by Pan and Rickard (2015) and Bönstrup et al. (2019) for meaningful measurements and interpretation of offline gains in the revised manuscript.

Methods (Lines 493-499)

“The study design followed specific recommendations by Pan and Rickard (2015): (1) utilizing 10-second practice trials and (2) constraining analysis of micro-offline gains to early learning trials (where performance monotonically increases and 95% of overall performance gains occur) that precede the emergence of “scalloped” performance dynamics strongly linked to reactive inhibition effects ([29, 72]). This is precisely the portion of the learning curve Pan and Rickard referred to when they stated “…rapid learning during that period masks any reactive inhibition effect” [29].”

In the Introduction, we review the extensive evidence emerging from LFP and microelectrode recordings in humans and monkeys (including causality of neural replay with respect to micro-offline gains and early learning in the Griffin et al. Nature 2025 publication):

Introduction (Lines 26-56)

“Practicing a new motor skill elicits rapid performance improvements (early learning) [1] that precede skill performance plateaus [5]. Skill gains during early learning accumulate over rest periods (micro-offline) interspersed with practice [1, 6-10], and are up to four times larger than offline performance improvements reported following overnight sleep [1]. During this initial interval of prominent learning, retroactive interference immediately following each practice interval reduces learning rates relative to interference after passage of time, consistent with stabilization of the motor memory [11]. Micro-offline gains observed during early learning are reproducible [7, 10-13] and are similar in magnitude even when practice periods are reduced by half to 5 seconds in length, thereby confirming that they are not merely a result of recovery from performance fatigue [11]. Additionally, they are unaffected by the random termination of practice periods, which eliminates the possibility of predictive motor slowing as a contributing factor [11]. Collectively, these behavioral findings point towards the interpretation that micro-offline gains during early learning represent a form of memory consolidation [1].

This interpretation has been further supported by brain imaging and electrophysiological studies linking known memory-related networks and consolidation mechanisms to rapid offline performance improvements. In humans, the rate of hippocampo-neocortical neural replay predicts micro-offline gains [6]. Consistent with these findings, Chen et al. [12] and Sjøgård et al. [13] furnished direct evidence from intracranial human EEG studies, demonstrating a connection between the density of hippocampal sharp-wave ripples (80-120 Hz)—recognized markers of neural replay—and micro-offline gains during early learning. Further, Griffin et al. reported that neural replay of task-related ensembles in the motor cortex of macaques during brief rest periods— akin to those observed in humans [1, 6-8, 14]—are not merely correlated with, but are causal drivers of micro-offline learning [15]. Specifically, the same reach directions that were replayed the most during rest breaks showed the greatest reduction in path length (i.e. – more efficient movement path between two locations in the reach sequence) during subsequent trials, while stimulation applied during rest intervals preceding performance plateau reduced reactivation rates and virtually abolished micro-offline gains [15]. Thus, converging evidence in humans and non-human primates across indirect non-invasive and direct invasive recording techniques link hippocampal activity, neural replay dynamics and offline skill gains in early motor learning that precede performance plateau.”

Following the reviewer’s advice, we have expanded our discussion in the revised manuscript of alternative hypotheses put forward in the literature and call for caution when extrapolating results across studies with fundamental differences in design (e.g. – different practice and rest durations, or presence/absence of extrinsic reward, etc).

Discussion (Lines 444-452):

“Finally, caution should be exercised when extrapolating findings during early skill learning, a period of steep performance improvements, to findings reported after insufficient practice [67], post-plateau performance periods [68], or non-learning situations (e.g. performance of non-repeating keypress sequences in [67]) when reactive inhibition or contextual interference effects are prominent. Ultimately, it will be important to develop new paradigms allowing one to independently estimate the different coincident or antagonistic features (e.g. - memory consolidation, planning, working memory and reactive inhibition) contributing to micro-online and micro-offline gains during and after early skill learning within a unifying framework.”

References

(1) Zimerman, M., et al., Disrupting the Ipsilateral Motor Cortex Interferes with Training of a Complex Motor Task in Older Adults. Cereb Cortex, 2012.

(2) Waters, S., T. Wiestler, and J. Diedrichsen, Cooperation Not Competition: Bihemispheric tDCS and fMRI Show Role for Ipsilateral Hemisphere in Motor Learning. J Neurosci, 2017. 37(31): p. 7500-7512.

(3) Sawamura, D., et al., Acquisition of chopstick-operation skills with the nondominant hand and concomitant changes in brain activity. Sci Rep, 2019. 9(1): p. 20397.

(4) Lee, S.H., S.H. Jin, and J. An, The difference in cortical activation pattern for complex motor skills: A functional near-infrared spectroscopy study. Sci Rep, 2019. 9(1): p. 14066.

(5) Grafton, S.T., E. Hazeltine, and R.B. Ivry, Motor sequence learning with the nondominant left hand. A PET functional imaging study. Exp Brain Res, 2002. 146(3): p. 369-78.

(6) Buch, E.R., et al., Consolidation of human skill linked to waking hippocampo-neocortical replay. Cell Rep, 2021. 35(10): p. 109193.

(7) Wang, L. and S. Jiang, A feature selection method via analysis of relevance, redundancy, and interaction, in Expert Systems with Applications, Elsevier, Editor. 2021.

(8) Yu, L. and H. Liu, Efficient feature selection via analysis of relevance and redundancy. Journal of Machine Learning Research, 2004. 5: p. 1205-1224.

(9) Munn, B.R., et al., Multiscale organization of neuronal activity unifies scale-dependent theories of brain function. Cell, 2024.

(10) Borragan, G., et al., Sleep and memory consolidation: motor performance and proactive interference effects in sequence learning. Brain Cogn, 2015. 95: p. 54-61.

(11) Landry, S., C. Anderson, and R. Conduit, The effects of sleep, wake activity and time-on-task on offline motor sequence learning. Neurobiol Learn Mem, 2016. 127: p. 56-63.

(12) Gabitov, E., et al., Susceptibility of consolidated procedural memory to interference is independent of its active task-based retrieval. PLoS One, 2019. 14(1): p. e0210876.

(13) Pan, S.C. and T.C. Rickard, Sleep and motor learning: Is there room for consolidation? Psychol Bull, 2015. 141(4): p. 812-34.

(14) Bönstrup, M., et al., A Rapid Form of Offline Consolidation in Skill Learning. Curr Biol, 2019. 29(8): p. 1346-1351 e4.

(15) Gupta, M.W. and T.C. Rickard, Comparison of online, offline, and hybrid hypotheses of motor sequence learning using a quantitative model that incorporate reactive inhibition. Sci Rep, 2024. 14(1): p. 4661.

Associated Data

    Data Citations

    1. Bönstrup M, Buch ER, Cohen LG. 2025. Skill learning and consolidation in healthy humans. OpenNeuro. [DOI]

    Supplementary Materials

    MDAR checklist

    Data Availability Statement

    All data, de-identified and permanently unlinked from all personally identifiable information (PII), are publicly available on the OpenNeuro platform. All custom analysis code is available in a publicly accessible repository hosted on GitHub (copy archived at Dash and hcps-ninds, 2025).

    The following dataset was generated:

    Bönstrup M, Buch ER, Cohen LG. 2025. Skill learning and consolidation in healthy humans. OpenNeuro.


    Articles from eLife are provided here courtesy of eLife Sciences Publications, Ltd
